Dense retrieval (vector search)
Embed the query, search for the nearest vectors using cosine similarity. Finds semantic matches even when exact words differ ("deadline" matches "time limit"). Requires vector DB.
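A minimal sketch of the similarity step, with hand-made toy vectors standing in for real embedding-model output (a production system stores these in a vector DB):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy 3-d embeddings; a real system gets these from an embedding model.
docs = {
    "policy_deadlines": [0.9, 0.1, 0.0],
    "office_weather":   [0.0, 0.2, 0.9],
}
query_vec = [0.85, 0.15, 0.05]  # pretend embedding of "time limit"

best = max(docs, key=lambda name: cosine(query_vec, docs[name]))
```

Even though the query text never contains the word "deadline", the nearest-vector step surfaces the semantically closest document.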
Sparse retrieval (BM25)
Keyword matching that scores by term frequency + rarity. Excellent for exact terms (form names, product codes, jargon). Misses synonyms and paraphrases.
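A toy TF-IDF-style scorer — a rough stand-in for BM25, which adds document-length normalisation and term-frequency saturation on top of the same idea:

```python
import math
from collections import Counter

def idf(term, docs):
    # Rarity: terms that appear in few documents get a higher weight.
    n = sum(1 for d in docs if term in d)
    return math.log((len(docs) + 1) / (n + 1)) + 1.0

def sparse_score(query_terms, doc, docs):
    tf = Counter(doc)  # term frequency within this document
    return sum(tf[t] * idf(t, docs) for t in query_terms)

docs = [
    ["form", "w9", "tax", "filing", "instructions"],
    ["daily", "weather", "report"],
]
scores = [sparse_score(["w9", "form"], d, docs) for d in docs]
```

The exact token "w9" scores strongly, but a paraphrase like "tax paperwork" would score zero — exactly the weakness hybrid retrieval patches.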
Hybrid retrieval
Run dense and sparse in parallel, fuse results with Reciprocal Rank Fusion (RRF). Consistently outperforms either alone. A document appearing in both lists gets the highest confidence.
Reranking (cross-encoder)
After retrieval, a cross-encoder model re-scores the top-20 candidates by reading query + document together. Expensive but much more accurate. Pattern: retrieve 20 → rerank → keep 5.
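A sketch of the retrieve-then-rerank pattern. `cross_encoder_score` here is a hypothetical stand-in (word overlap) for a real cross-encoder model:

```python
def cross_encoder_score(query, doc):
    # Placeholder: a real cross-encoder reads query + document jointly
    # through a transformer and outputs a learned relevance score.
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve_then_rerank(query, candidates, k_retrieve=20, k_keep=5):
    stage1 = candidates[:k_retrieve]  # cheap first-stage retrieval
    stage2 = sorted(stage1, key=lambda d: cross_encoder_score(query, d),
                    reverse=True)     # expensive re-scoring of few docs
    return stage2[:k_keep]

candidates = ["company picnic schedule",
              "refund policy overview",
              "how to submit the refund form"]
top = retrieve_then_rerank("refund form", candidates, k_retrieve=3, k_keep=2)
```

The cost model is the point: the expensive scorer only ever sees the small first-stage list, never the whole corpus.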
RRF (Reciprocal Rank Fusion)
Fusion formula: score = 1/(k + rank) summed across lists. A result ranked #1 in both dense and sparse wins. k=60 is the standard constant. Simple but robust.
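The formula above, as a directly runnable function (with the standard k=60):

```python
def rrf(rankings, k=60):
    # score(doc) = sum over lists of 1 / (k + rank), rank starting at 1
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["a", "b", "c"]   # ranked output of vector search
sparse = ["a", "c", "d"]   # ranked output of keyword search
fused = rrf([dense, sparse])
```

"a" sits at rank 1 in both lists, so it accumulates 2/(60+1) and tops the fused ranking; documents found by only one retriever still make the list, just lower.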
Query rewriting
Use an LLM to make the user's original query more explicit and keyword-rich before searching. "What happens after I send the form?" becomes a detailed question with relevant terms.
Multi-query retrieval
An LLM generates 3–5 phrasings of the same query. Retrieve for each, merge and deduplicate. Ensures wording variations do not cause misses.
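A sketch in which the hypothetical `rephrase` and `retrieve` helpers mock the LLM call and the search backend, so the merge-and-dedup logic stands out:

```python
def rephrase(query):
    # Stand-in for an LLM producing paraphrases of the query.
    return [query,
            query.replace("deadline", "time limit"),
            query.replace("deadline", "due date")]

def retrieve(query, corpus):
    terms = [w for w in query.split() if len(w) > 4]  # crude keyword match
    return [doc for doc in corpus if any(t in doc for t in terms)]

def multi_query_retrieve(query, corpus):
    seen, merged = set(), []
    for variant in rephrase(query):
        for doc in retrieve(variant, corpus):
            if doc not in seen:          # deduplicate across variants
                seen.add(doc)
                merged.append(doc)
    return merged

corpus = ["the filing deadline is june 1",
          "a time limit of 30 days applies",
          "office closed on friday"]
hits = multi_query_retrieve("what is the deadline", corpus)
```

The "time limit" document is only reachable through the second phrasing; with a single query it would be missed.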
Sub-question decomposition
Break complex queries into simpler sub-questions. Retrieve for each separately. Combine retrieved chunks. LLM synthesises one unified answer from all context.
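A sketch where `decompose` and `synthesize` are hypothetical stand-ins for the two LLM calls (hard-coded here so the control flow is runnable):

```python
def decompose(question):
    # Stand-in: a real LLM would split the question into sub-questions.
    return ["what is the filing deadline", "what is the late penalty"]

def retrieve(sub_question, corpus):
    terms = [w for w in sub_question.split() if len(w) > 4]
    return [doc for doc in corpus if any(t in doc for t in terms)]

def synthesize(question, context):
    # Stand-in: a real LLM would write one unified answer from the context.
    return " ".join(context)

def answer(question, corpus):
    context = []
    for sub_q in decompose(question):          # retrieve per sub-question
        for doc in retrieve(sub_q, corpus):
            if doc not in context:             # combine retrieved chunks
                context.append(doc)
    return synthesize(question, context)

corpus = ["the filing deadline is june 1.",
          "the late penalty is a 5% fee."]
final = answer("what is the deadline and what penalty applies", corpus)
```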
Metadata filtering
Apply hard filters (division, date, status) before or during vector search. Equivalent to a SQL WHERE clause on vector search. Dramatically narrows the search space.
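A sketch of pre-filtering: hard metadata predicates narrow the candidate set before any similarity scoring (a dot product stands in for the vector DB's metric; field names are illustrative):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

index = [
    {"id": 1, "division": "hr",      "status": "active",   "vec": [0.9, 0.1]},
    {"id": 2, "division": "finance", "status": "active",   "vec": [0.95, 0.05]},
    {"id": 3, "division": "hr",      "status": "archived", "vec": [0.8, 0.2]},
]

def filtered_search(query_vec, index, **filters):
    # WHERE-clause step: drop anything failing a hard filter.
    candidates = [item for item in index
                  if all(item[k] == v for k, v in filters.items())]
    # Then rank only the survivors by similarity.
    return sorted(candidates, key=lambda it: dot(query_vec, it["vec"]),
                  reverse=True)

hits = filtered_search([1.0, 0.0], index, division="hr", status="active")
```

Note that document 2 has the highest raw similarity but never enters the ranking: the filter removed it first.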
Contextual compression
After retrieval, extract only the sentences from each chunk that are relevant to the query. Strips noise, saves context window, gives the LLM cleaner input.
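A minimal lexical compressor — real systems use an LLM or embedding similarity rather than word overlap, but the shape is the same:

```python
def compress(chunk, query, min_len=4):
    # Keep only sentences sharing a substantive term with the query.
    q_terms = {w for w in query.lower().split() if len(w) >= min_len}
    kept = []
    for sentence in chunk.split(". "):
        words = set(sentence.lower().rstrip(".").split())
        if words & q_terms:
            kept.append(sentence.rstrip("."))
    return ". ".join(kept) + "."

chunk = ("The filing deadline is June 1. The cafeteria opens at noon. "
         "Late filing incurs a penalty.")
clean = compress(chunk, "filing deadline")
```

The cafeteria sentence never reaches the LLM, so the context window holds only query-relevant material.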
Small-to-big (parent document) retrieval
Small child chunks (128 tokens) for precise embedding match. When matched, return the larger parent chunk (512 tokens) to the LLM for fuller context. Best of both: precision + context.
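A sketch of the child→parent lookup (names, chunk contents, and the word-overlap matcher are all illustrative):

```python
# Parents: larger ~512-token chunks handed to the LLM.
parents = {
    "p1": "FULL SECTION: filing deadlines, extension procedure, penalties ...",
    "p2": "FULL SECTION: expense reimbursement policy and approvals ...",
}
# Children: small ~128-token chunks indexed for precise matching.
children = [
    {"parent": "p1", "text": "the filing deadline is june 1"},
    {"parent": "p1", "text": "extensions require a written request"},
    {"parent": "p2", "text": "expenses need manager approval"},
]

def small_to_big(query):
    q = set(query.lower().split())
    # Match against the precise child chunks (crude word overlap here)...
    best = max(children, key=lambda c: len(q & set(c["text"].split())))
    # ...but return the parent chunk for fuller context.
    return parents[best["parent"]]
```

The match happens on the small chunk; the LLM only ever sees the parent.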
Agentic / ReAct retrieval
Agent loops: Thought → Action (search) → Observe result → decide next step. Each retrieval informs the next. Handles multi-part questions requiring sequential lookups.
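A skeletal ReAct loop; `plan` is a hypothetical stand-in for the LLM's Thought step, hard-coded here to show a two-hop lookup where the second search depends on the first result:

```python
def plan(question, observations):
    # Thought: decide the next action from what has been observed so far.
    if not observations:
        return ("search", "capital of France")                  # hop 1
    if len(observations) == 1:
        return ("search", f"population of {observations[0]}")   # hop 2
    return ("answer", observations[-1])

def react(question, kb, max_steps=5):
    observations = []
    for _ in range(max_steps):
        action, arg = plan(question, observations)    # Thought -> Action
        if action == "answer":
            return arg
        observations.append(kb.get(arg, "not found")) # Observe, then loop
    return None

kb = {"capital of France": "Paris",
      "population of Paris": "about 2.1 million"}
result = react("How many people live in the capital of France?", kb)
```

A single retrieval could not answer this: "population of Paris" is only knowable after the first lookup returns "Paris".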
Text-to-SQL
An LLM reads the DB schema and generates SQL from natural language, which is executed against the live database. Risk: wrong SQL returns wrong numbers silently. Mitigate with read-only connections and few-shot examples.