Multi-Agent RAG for Document Question Answering

About the Project

This project builds a multi-agent Retrieval-Augmented Generation (RAG) system that answers questions about uploaded documents while minimizing hallucinations through a reflection pattern. Each generated answer is fact-checked against the document by a separate agent and, if it fails verification, sent back for another research pass.

Architecture

The system is built using LangGraph and employs a workflow with the following agents:

Relevance Checker Agent: This agent assesses whether a user's question can be answered by the document content, classifying it as "CAN_ANSWER", "PARTIAL", or "NO_ANSWER".
Research Agent: If the Relevance Checker finds that the document can answer the question, this agent retrieves the matching passages using a hybrid retrieval approach (combining BM25 and vector-based retrieval) and generates a draft answer from the retrieved context.
Verification Agent: This agent verifies the draft answer generated by the Research Agent against the original document context. It checks for factual support, identifies unsupported claims and contradictions, and assesses relevance. Based on the verification report, the answer is either finalized or sent back to the Research Agent for re-research (reflection pattern).
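
The control flow described above can be sketched in plain Python. This is only an illustration of the routing and the reflection loop, not the project's LangGraph code: the names State, run_workflow, and the three stub agents are hypothetical, the stubs stand in for LLM calls, and two behaviours are assumptions that the description does not state (that "PARTIAL" questions are still sent to the Research Agent, and that the loop is capped by max_attempts).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class State:
    """Hypothetical shared state passed between the agents."""
    question: str
    relevance: str = ""
    draft: str = ""
    verified: bool = False
    attempts: int = 0
    trace: List[str] = field(default_factory=list)


def run_workflow(
    question: str,
    check_relevance: Callable[[str], str],
    research: Callable[[str, int], str],
    verify: Callable[[str], bool],
    max_attempts: int = 3,  # assumption: cap on reflection rounds
) -> State:
    state = State(question=question)

    # Step 1: route on the relevance label.
    state.relevance = check_relevance(question)
    state.trace.append(f"relevance:{state.relevance}")
    if state.relevance == "NO_ANSWER":
        state.draft = "The document does not contain an answer to this question."
        return state

    # Step 2: research, verify, and repeat until the draft passes (reflection).
    # Assumption: "PARTIAL" is handled like "CAN_ANSWER".
    while state.attempts < max_attempts:
        state.attempts += 1
        state.draft = research(question, state.attempts)
        state.trace.append("research")
        state.verified = verify(state.draft)
        state.trace.append(f"verify:{state.verified}")
        if state.verified:
            break
    return state


# Stub agents standing in for the LLM-backed ones.
def stub_relevance(question: str) -> str:
    return "NO_ANSWER" if "weather" in question.lower() else "CAN_ANSWER"


def stub_research(question: str, attempt: int) -> str:
    # The first draft is deliberately bad so the loop runs twice.
    return "unsupported guess" if attempt == 1 else "supported answer"


def stub_verify(draft: str) -> bool:
    return draft == "supported answer"


result = run_workflow(
    "What is the warranty period?", stub_relevance, stub_research, stub_verify
)
print(result.trace)
```

In a LangGraph implementation the same structure would typically be expressed as nodes with conditional edges; the loop here only makes the order of decisions explicit.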

Before retrieval, each PDF document is converted to Markdown using Docling, and the Markdown is split into semantic chunks along its headings using MarkdownHeaderTextSplitter. A hybrid retriever is then built over these chunks by combining Chroma (for vector search) with BM25 (for keyword search).
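
To make the two ideas in this paragraph concrete, here is a small dependency-free sketch. It does not call Docling, MarkdownHeaderTextSplitter, Chroma, or BM25: split_by_headers is a simplified stand-in for header-based chunking, and reciprocal_rank_fusion shows one common way to merge a keyword ranking with a vector ranking. The function names, the chunk ids, the equal weights, and the constant k=60 are all assumptions made for illustration; the project's actual fusion method is not specified here.

```python
import re
from collections import defaultdict
from typing import Dict, List, Sequence


def split_by_headers(markdown: str) -> List[Dict[str, str]]:
    """Split Markdown into chunks, one per heading, keeping the heading text."""
    chunks: List[Dict[str, str]] = []
    header, lines = "", []

    def flush() -> None:
        # Only emit a chunk if the section has non-blank content.
        if any(line.strip() for line in lines):
            chunks.append({"header": header, "text": "\n".join(lines).strip()})

    for line in markdown.splitlines():
        if re.match(r"^#{1,6} ", line):
            flush()
            header, lines = line.lstrip("#").strip(), []
        else:
            lines.append(line)
    flush()
    return chunks


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]],
    weights: Sequence[float],
    k: int = 60,
) -> List[str]:
    """Merge several ranked lists of chunk ids into one list, best first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += weight / (k + rank)
    # Sort by descending score; break ties by id for a stable result.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


chunks = split_by_headers("# Intro\nHello.\n## Terms\nWarranty is two years.\n")

# Hypothetical rankings, as a keyword retriever and a vector retriever
# might return them for the same query.
bm25_ranking = ["c2", "c1", "c3"]
vector_ranking = ["c1", "c4", "c2"]
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking], weights=[0.5, 0.5])
print(fused)
```

Chunks that rank well in both lists (c1 and c2 in this example) rise to the top, which is the main reason to combine keyword and vector search.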

Achievements

The following components have been implemented and integrated:

Document processing (PDF to Markdown and chunking).
Hybrid retrieval using BM25 and Chroma.
The three core agents: Relevance Checker, Research, and Verification.
A LangGraph workflow to orchestrate the interaction between the agents, including a routing mechanism based on relevance and a reflection loop for verification.

Further Development

Potential areas for further development include:

Refining the prompts and models used for each agent to improve accuracy and robustness.
Enhancing the reflection mechanism to handle more complex verification scenarios.
Implementing mechanisms to handle different document types and structures.