1. Purpose of System
This system autonomously handles document-based user queries, determining whether an answer can be extracted from the document itself or whether it requires external knowledge or actions. The system routes each query through a multi-agent workflow that summarizes, searches, or escalates the task based on the nature of the query. This supports both high-level understanding and fine-grained question answering, while allowing the system to gracefully handle out-of-scope or complex tasks via a delegated agent.
2. System Architecture

The system is implemented as a multi-agent pipeline, structured around the following components:
- Document Ingestion Pipeline: Loads and splits documents using llama_index tools, including SimpleDirectoryReader and SentenceSplitter. Two indices are built: a SummaryIndex for summarization and high-level overviews, and a VectorStoreIndex for semantic search (see the first sketch after this list).
- Tool Agent: Executes the query using a RouterQueryEngine that selects among three tools (summary, search, action) based on the query's intent. Selection is handled by an LLM-based selector (GPT-3.5-turbo); a routing sketch follows this list.
- EmbeddingAwareSelector: Because the default router (llama_index RouterQueryEngine) has no access to the document context embeddings, this custom selector was built to improve routing by incorporating vector-based relevance scores and embedding similarity to inform tool selection (see the selector sketch after this list).
- Action Agent: Triggered only when the Tool Agent deems a query out-of-scope for the given documents. This agent could be extended to integrate external APIs.
- Merge Agent: Validates the final result generated by the Tool Agent or Action Agent before sending it to the user.
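
The ingestion pipeline can be illustrated with a minimal sketch. This assumes llama_index >= 0.10 import paths; the "data" directory, chunk size, and embedding model name are illustrative, not taken from the actual configuration.

```python
from llama_index.core import SimpleDirectoryReader, SummaryIndex, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Load raw documents and split them into sentence-aware chunks.
documents = SimpleDirectoryReader("data").load_data()  # "data" dir is illustrative
nodes = SentenceSplitter(chunk_size=1024).get_nodes_from_documents(documents)

# Build both indices over the same nodes: SummaryIndex for overviews,
# VectorStoreIndex (backed by a HuggingFace embedding model) for search.
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes, embed_model=embed_model)
```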
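The Tool Agent's routing is close to stock llama_index. A minimal sketch, assuming the indices built above; the tool descriptions are illustrative, and the action tool (which would wrap whatever query engine the Action Agent exposes) is omitted for brevity.

```python
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")

summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_index.as_query_engine(response_mode="tree_summarize"),
    description="Summarization and high-level overview questions",
)
search_tool = QueryEngineTool.from_defaults(
    query_engine=vector_index.as_query_engine(similarity_top_k=3),
    description="Specific questions answerable from the document content",
)
# The action tool would be registered the same way (omitted here).

router = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(llm=llm),
    query_engine_tools=[summary_tool, search_tool],
)
print(router.query("Give a one-paragraph overview of the document."))
```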
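The internals of the EmbeddingAwareSelector are not shown in this document; the following hypothetical sketch only illustrates the core idea of scoring tool descriptions by embedding similarity to the query. The helper names (`cosine`, `select_tool`) and the model choice are assumptions.

```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

def cosine(a: list[float], b: list[float]) -> float:
    # Plain cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

def select_tool(query: str, tool_descriptions: list[str]) -> int:
    """Pick the tool whose description embeds closest to the query.

    Hypothetical core of the EmbeddingAwareSelector; per the description
    above, the real selector also folds in vector-based relevance scores.
    """
    query_emb = embed_model.get_query_embedding(query)
    scores = [
        cosine(query_emb, embed_model.get_text_embedding(desc))
        for desc in tool_descriptions
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```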
3. Tools Used
- LlamaIndex: Document parsing, index building, and query engine routing
- OpenAI GPT-3.5: LLM used for selecting tools and generating responses
- HuggingFaceEmbedding: Embedding model used to build the vector index for semantic search
- LangGraph: Orchestrates the multi-agent workflow (see the sketch below)
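
A minimal sketch of how the three agents could be wired together with LangGraph's StateGraph. The node bodies are stubs and the state fields are assumptions, not the actual implementation; real nodes would call the router, external APIs, and a validation step respectively.

```python
from typing import TypedDict

from langgraph.graph import END, StateGraph

class AgentState(TypedDict):
    # Hypothetical state schema shared by all nodes.
    query: str
    answer: str
    out_of_scope: bool

def tool_agent(state: AgentState) -> AgentState:
    # Would route the query through the RouterQueryEngine (stubbed here).
    return {**state, "answer": "...", "out_of_scope": False}

def action_agent(state: AgentState) -> AgentState:
    # Would handle out-of-scope queries, e.g. via external APIs (stubbed here).
    return {**state, "answer": "..."}

def merge_agent(state: AgentState) -> AgentState:
    # Would validate the final answer before returning it to the user.
    return state

graph = StateGraph(AgentState)
graph.add_node("tool_agent", tool_agent)
graph.add_node("action_agent", action_agent)
graph.add_node("merge_agent", merge_agent)
graph.set_entry_point("tool_agent")
# Escalate to the Action Agent only when the Tool Agent flags the query
# as out-of-scope; otherwise go straight to validation.
graph.add_conditional_edges(
    "tool_agent",
    lambda s: "action_agent" if s["out_of_scope"] else "merge_agent",
)
graph.add_edge("action_agent", "merge_agent")
graph.add_edge("merge_agent", END)

app = graph.compile()
result = app.invoke({"query": "What does the report conclude?",
                     "answer": "", "out_of_scope": False})
```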