Chunking is not a solved problem
As of March 2026, there is no universally correct chunking strategy. The optimal approach depends on document type, query distribution, embedding model, and LLM context window. Every production system should A/B test at least two strategies against a golden dataset before committing.
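For illustration, a minimal A/B harness could look like the sketch below. It assumes a golden dataset of (query, expected-substring) pairs; the bag-of-words "embedding," the cosine scorer, and both chunking strategies are toy stand-ins for your real embedding model and chunkers.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; swap in a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Strategy A: fixed-size windows of words (illustrative stand-in).
def fixed_size(doc: str, size: int = 100) -> list[str]:
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# Strategy B: split on blank lines, i.e. paragraph chunks.
def by_paragraph(doc: str) -> list[str]:
    return [p for p in doc.split("\n\n") if p.strip()]

def hit_rate(chunks: list[str], golden: list[tuple[str, str]], k: int = 5) -> float:
    """Fraction of golden queries whose expected text lands in the top-k chunks."""
    vecs = [(c, embed(c)) for c in chunks]
    hits = 0
    for query, expected in golden:
        q = embed(query)
        top = sorted(vecs, key=lambda cv: cosine(q, cv[1]), reverse=True)[:k]
        hits += any(expected in chunk for chunk, _ in top)
    return hits / len(golden)

# Usage: run both strategies over the same corpus and golden set,
# then commit to whichever scores higher.
# for name, chunker in [("fixed", fixed_size), ("paragraph", by_paragraph)]:
#     print(name, hit_rate(chunker(corpus_text), golden_pairs))
```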
Chunk size is the most important hyperparameter
Choosing the wrong chunk size is the single most common cause of RAG failure in production. Too small: chunks lose surrounding context, embeddings are noisy, and multi-sentence answers fragment across boundaries. Too large: embeddings average over too many topics, retrieval is imprecise, and the LLM receives too much irrelevant context.
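To make the size and overlap knobs concrete, here is a minimal token-window chunker. It assumes the tiktoken tokenizer is installed; the 256/32 defaults are illustrative, not recommendations.

```python
import tiktoken

def chunk_by_tokens(text: str, size: int = 256, overlap: int = 32) -> list[str]:
    """Split text into windows of `size` tokens, with `overlap` tokens shared
    between consecutive chunks so content near a boundary appears in both."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    step = size - overlap
    return [enc.decode(tokens[i:i + size]) for i in range(0, len(tokens), step)]
```

Shrinking `size` sharpens each embedding but fragments answers; growing it blurs the embedding and pads the prompt with noise, which is exactly the tradeoff described above.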
Parent-child is the enterprise default in 2025–2026
The parent-child (small-to-big) pattern is the most widely deployed enterprise chunking architecture. Small child chunks (e.g., 128 tokens) give precise embedding matches; large parent chunks (512–1024 tokens) give the LLM sufficient context to reason. Amazon Bedrock, Azure AI Search, and LlamaIndex all support this natively.
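A framework-free sketch of the pattern follows. Sizes are in words here for simplicity (the 128 and 512–1024 figures above are tokens), and the bag-of-words embedding again stands in for a real model.

```python
import math
from collections import Counter
from dataclasses import dataclass

@dataclass
class Chunk:
    id: int
    text: str
    parent_id: int | None = None

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def split_words(text: str, size: int) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def build_index(doc: str, parent_size: int = 400, child_size: int = 100):
    """Split the document into large parents, each into small children,
    and record the child -> parent link."""
    parents: list[Chunk] = []
    children: list[Chunk] = []
    for p_text in split_words(doc, parent_size):
        parent = Chunk(id=len(parents), text=p_text)
        parents.append(parent)
        for c_text in split_words(p_text, child_size):
            children.append(Chunk(id=len(children), text=c_text,
                                  parent_id=parent.id))
    return parents, children

def retrieve(query: str, parents, children, k: int = 3) -> list[Chunk]:
    # Match the query against the small children for embedding precision...
    q = embed(query)
    ranked = sorted(children, key=lambda c: cosine(q, embed(c.text)), reverse=True)
    # ...but return the deduplicated large parents so the LLM gets full context.
    seen, results = set(), []
    for child in ranked:
        if child.parent_id not in seen:
            seen.add(child.parent_id)
            results.append(parents[child.parent_id])
        if len(results) == k:
            break
    return results
```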
Metadata is as important as the chunk text
A chunk without rich metadata is only half as useful. The combination of LLM-generated summaries, keywords, hypothetical questions, and structural metadata turns retrieval from approximate similarity search into targeted knowledge lookup. Every enterprise deployment should invest in metadata enrichment at ingest time.
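As a sketch of what ingest-time enrichment can look like: the function below assumes a generic `llm` completion callable (Bedrock, Azure OpenAI, or similar), and the prompts and field names are illustrative, not any particular library's API.

```python
def enrich_chunk(chunk_text: str, *, doc_title: str, section: str,
                 position: int, llm) -> dict:
    """Attach LLM-generated and structural metadata to a chunk at ingest time."""
    summary = llm(f"Summarize this passage in one sentence:\n{chunk_text}")
    keywords = llm(f"List five comma-separated keywords for this passage:\n{chunk_text}")
    questions = llm(f"Write three questions this passage answers, one per line:\n{chunk_text}")
    return {
        "text": chunk_text,
        # LLM-generated fields: embed alongside the text, or filter on at query time.
        "summary": summary.strip(),
        "keywords": [k.strip() for k in keywords.split(",")],
        "hypothetical_questions": [q for q in questions.splitlines() if q.strip()],
        # Structural fields carried over from the source document.
        "doc_title": doc_title,
        "section": section,
        "chunk_position": position,
    }
```

Embedding the hypothetical questions alongside the raw text is what lets a user's question match a chunk even when they share almost no vocabulary.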
Evaluation closes the loop
Build a golden dataset of 20–50 query-to-expected-chunk pairs before deploying. Run RAGAS context recall and context precision metrics against it whenever you change chunking strategy, chunk size, or overlap. Without this, you are guessing.
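A sketch of that evaluation loop with RAGAS is below. The exact column names and the `evaluate()` signature vary across RAGAS versions, and the metrics need an LLM judge configured (e.g., an OpenAI API key in the environment), so treat this as a template to adapt rather than copy-paste.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_precision, context_recall

# One illustrative row; a real golden set has 20-50. The "contexts" column
# holds what YOUR retrieval pipeline returned for the question, so re-run
# retrieval and re-evaluate after every chunking change.
golden = Dataset.from_dict({
    "question":     ["What parent chunk size does the small-to-big pattern use?"],
    "contexts":     [["Large parent chunks (512-1024 tokens) give the LLM context."]],
    "ground_truth": ["Parent chunks are 512-1024 tokens."],
})

result = evaluate(golden, metrics=[context_precision, context_recall])
print(result)  # e.g. {'context_precision': 0.91, 'context_recall': 0.87}
```

Track the two scores per strategy: context recall tells you whether the expected chunks were retrieved at all, and context precision tells you how much of what was retrieved actually mattered.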