Chatting with a PDF is harder than it sounds, and ChatGPT has real limitations here. Here’s why:
The most common use of GPT-4 wrappers is the “chat with a doc/PDF” app, one of the most prominent AI chatbot use cases. Reading through a dense document is tedious, so it is more convenient to ask the language model to parse and summarize the content.
Unfortunately, ChatGPT falls short here, particularly with PDFs exceeding 10 pages. It generates brief, generic summaries and avoids elaborating further, even when asked.
Several things make this hard, or are missing from current tools:
OCR Quality: Good Optical Character Recognition (OCR) is essential, especially for tables and images within PDFs. Existing OCR tools, free or commercial, struggle here, which matters because many business and research PDFs contain intricate tables and images.
Contextual Challenges: Although large language models (LLMs) with 128K-token context windows are now available, it is unclear which model ChatGPT actually uses. Running OCR on a document and then feeding the text to ChatGPT often triggers errors, which suggests that ChatGPT requests are served by a model with a smaller context length.
Quick RAG Implementation: A potential solution is a straightforward Retrieval-Augmented Generation (RAG) approach: the document is split into chunks, the chunks are embedded, the chunks most relevant to the question are retrieved, and those chunks are presented to the LLM. However, current chatbots lack this feature.
Highlighting Document Sections: A good solution should highlight the document sections each response is derived from, which makes the answers easy to verify.
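To make the RAG and highlighting ideas concrete, here is a minimal sketch in Python. It is not how any particular product works. It assumes the PDF text has already been extracted (or OCR'd) into one string per page, and every name in it (Chunk, chunk_pages, embed, retrieve, build_prompt) is made up for illustration. The embed function is a toy bag-of-words counter standing in for a real embedding model. Each chunk keeps its page number and character offset, which is the information an app would need to highlight the source passage.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    page: int    # 1-based page number the chunk came from
    start: int   # character offset of the chunk within that page's text
    text: str


def chunk_pages(pages, size=200, overlap=50):
    """Split each page's text into overlapping character windows (size > overlap)."""
    chunks = []
    step = size - overlap
    for page_no, text in enumerate(pages, start=1):
        for start in range(0, max(len(text), 1), step):
            piece = text[start:start + size]
            if piece.strip():  # skip empty or whitespace-only windows
                chunks.append(Chunk(page_no, start, piece))
    return chunks


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c.text)), reverse=True)
    return ranked[:k]


def build_prompt(question, hits):
    """Assemble the text that would be sent to the LLM, with source locations."""
    context = "\n\n".join(
        f"[page {c.page}, offset {c.start}] {c.text}" for c in hits
    )
    return (
        "Answer using only the excerpts below and cite the page.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    # Hypothetical two-page document, already converted to text.
    pages = [
        "The quarterly revenue grew by twelve percent.",
        "Employee headcount stayed flat during the year.",
    ]
    hits = retrieve("How much did revenue grow?", chunk_pages(pages), k=1)
    print(build_prompt("How much did revenue grow?", hits))
```

Only the retrieved chunks go into the prompt, so the request stays small no matter how long the PDF is, which also sidesteps the context-length problem described above. A real app would swap embed for an actual embedding model and use the stored page and offset to highlight the passage in the viewer.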
An ideal “Chat with PDF” application should include all of these features. A standalone app in the app store that integrates them well could clearly generate revenue. However, it seems more likely to be a small-scale venture, perhaps run by a couple, than a heavily backed startup, which could make it a viable lifestyle business.
In summary, building an effective “Chat with PDF” application is challenging: it requires careful attention to OCR quality and context length, plus features such as quick RAG and section highlighting.