
Chunking in Agentic AI

Strategies, tradeoffs & enterprise patterns — as of March 2026

What chunking is — the foundational decision
Chunking: splitting documents into retrievable units before embedding and indexing
LLMs have a fixed context window · embedding models work best on focused, short passages · retrieval precision improves when chunks are topically coherent
Foundation
↓ five primary strategies
Strategy 1 — fixed-size chunking (baseline)
Fixed-size chunking
split every N tokens/chars regardless of content
Simple · Fast
Overlap window
repeat 10–20% of previous chunk at start of next
Context bridge
Size selection
128–256 tokens (Q&A) · 512 (general) · 1024 (summarisation)
Size guide
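A minimal sketch of the fixed-size baseline, assuming tiktoken for token-accurate sizing; the 512/64 defaults follow the size guide above.

```python
# Fixed-size chunking with overlap — minimal sketch, assuming tiktoken.
import tiktoken

def fixed_size_chunks(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    stride = chunk_size - overlap              # step = size minus the context bridge
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(enc.decode(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):  # last window reached the end
            break
    return chunks
```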
Strategy 2 — recursive character splitting (most common default)
Recursive character splitter
tries paragraph → sentence → word boundaries in order
LangChain default
Separator hierarchy
\n\n → \n → " " → "" by default · ". " commonly added for sentence breaks · always takes the deepest semantic break that fits
Priority order
Why it wins at baseline
respects natural language structure · no LLM cost · deterministic
Rationale
Strategy 3 — document structure-aware chunking
Structure-aware splitting
headings, sections, paragraphs as natural boundaries
Layout-aware
Markdown / HTML splitter
splits on # headings · preserves header hierarchy in metadata
Format-specific
Code-aware splitter
AST-based · splits on function/class boundaries · tree-sitter
Code docs
Strategy 4 — semantic chunking (topic-boundary detection)
Semantic chunking
embed sentences · measure cosine similarity · cut where similarity drops
Topic-aware
Threshold tuning
percentile-based or fixed cosine drop · controls chunk count
Hyperparameter
Cost consideration
requires embedding every sentence at ingest · 10–30× more expensive
Trade-off
Strategy 5 — proposition-based chunking (LLM-powered, 2024–2026)
Proposition chunking
LLM rewrites each paragraph into standalone atomic facts
SOTA quality
Before vs after
"it can be submitted by…" → "Form A can be submitted by the customer."
No ambiguity
Enterprise adoption
used for high-value doc corpora · compliance · legal · medical
Production 2025+
Parent-child (small-to-big) chunking — most widely adopted enterprise pattern 2025
Parent-child chunking: small child chunks for retrieval precision, large parent returned to LLM
child: 128 tokens for embedding match · parent: 512–1024 tokens for LLM context · indexed separately · child stores parent_id reference
Dual-index pattern
RAPTOR hierarchical indexing — recursive summarisation tree
RAPTOR leaf nodes
original chunks at level 0 — precise facts
Level 0
Cluster summaries
LLM summarises each embedding cluster (UMAP reduction + GMM) → level 1 nodes
Level 1
Root summaries
recursive summarisation → corpus-level abstractions
Level 2+
Query routing
broad questions → high level · specific → leaf nodes
Dual path
Late chunking — embed then chunk (2024 innovation)
Late chunking: embed the full document first, then pool token embeddings into chunk vectors
preserves full document context in every chunk embedding · resolves coreference ("it", "they", "the team") · requires long-context embedding model (jina-embeddings-v3, voyage-3) · Jina AI 2024
Context-aware embeddings
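A sketch of the pooling mechanics, assuming Hugging Face transformers; the model name is a short-context stand-in for brevity — real late chunking needs a long-context embedding model such as jina-embeddings-v3.

```python
# Late chunking sketch: run full-document attention once, then mean-pool
# token embeddings per chunk span. Model is a short-context stand-in;
# production late chunking requires a long-context embedding model.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "sentence-transformers/all-MiniLM-L6-v2"  # illustrative stand-in
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)

def late_chunk(document: str, spans: list[tuple[int, int]]) -> list[torch.Tensor]:
    """spans = (start, end) chunk boundaries in token positions."""
    inputs = tokenizer(document, return_tensors="pt", truncation=True)
    with torch.no_grad():
        token_embs = model(**inputs).last_hidden_state[0]   # (seq_len, dim)
    # Pooling happens *after* document-wide attention, so "it"/"they" in a
    # chunk are embedded with their referents in view.
    return [token_embs[s:e].mean(dim=0) for s, e in spans]
```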
Contextual retrieval — Anthropic 2024, adopted at scale 2025
Contextual retrieval: prepend LLM-generated context to each chunk before embedding
Claude reads full doc + chunk → generates 1–2 sentence context → prepended to chunk text before embedding → Anthropic reports 35% fewer retrieval failures with contextual embeddings · 49% with contextual BM25 added · 67% with reranking on top
Anthropic 2024
Sliding window + sentence-window retrieval
Sentence-window chunking
index single sentences · retrieve surrounding window of ±3 sentences
High precision
Sliding window with stride
chunk size 512 · stride 256 · 50% overlap between consecutive chunks
Dense coverage
Retrieval vs index size
more overlap = better recall · larger index · higher storage cost
Trade-off
Agentic / self-reflective chunking — frontier 2025–2026
Agentic chunking: LLM decides where to cut based on document semantics and downstream task
agent reads document → identifies entity boundaries, argument structure, claim-evidence pairs → proposes chunk boundaries → validated against embedding coherence score · used in document intelligence platforms (Reducto, LlamaParse Pro, Azure Document Intelligence 2025)
LLM-directed · 2025
Source metadata — automatically extracted at ingest
Source metadata
file name · page · type · created · last modified · author
Auto-extract
Structural metadata
section heading · parent heading · chapter · depth level
Document structure
Position metadata
chunk index · total chunks · byte offset · token start/end
Navigation
LLM-generated semantic metadata — the retrieval multiplier
LLM-generated metadata: summary, keywords, entities, hypothetical questions, topic, audience
generated at ingest time per chunk · stored alongside vector · enables pre-filtering and semantic routing · expensive but dramatically improves retrieval F1 · used by Pinecone, Weaviate, LlamaIndex as standard enterprise pattern
Semantic enrichment
Hypothetical questions metadata — dense retrieval booster
Hypothetical questions at ingest
LLM generates 3–5 questions the chunk would answer · stored as searchable metadata
HyDE variant
Why this works
user queries match pre-generated questions better than raw chunk text · closes vocabulary gap
Recall boost
Cost model
~$0.002 per chunk at Haiku pricing · amortised over query lifetime · cache in metadata store
Economics
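A sketch of ingest-time question generation with the Anthropic SDK; the model id and prompt wording are assumptions, not a fixed recipe.

```python
# Hypothetical-questions enrichment at ingest — sketch; model id and
# prompt wording are assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def hypothetical_questions(chunk_text: str, n: int = 3) -> list[str]:
    msg = client.messages.create(
        model="claude-3-5-haiku-20241022",   # cheap model keeps per-chunk cost low
        max_tokens=300,
        messages=[{
            "role": "user",
            "content": f"Write {n} short questions this passage answers, "
                       f"one per line, no numbering:\n\n{chunk_text}",
        }],
    )
    # Store alongside the chunk's vector as searchable metadata.
    return [q.strip() for q in msg.content[0].text.splitlines() if q.strip()]
```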
Metadata filtering — narrows vector search before ANN
Pre-filter by metadata
division · date range · document status · audience · classification level
WHERE clause for vectors
Performance impact
filtering out 90% of the corpus before ANN can mean up to ~10× faster search on the same hardware
Latency win
Payload indexing
Qdrant payload index · Weaviate property index · Pinecone metadata index
DB feature
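A filtered-search sketch with qdrant-client; the field names, values, and placeholder query vector are illustrative, and the classic search call is used.

```python
# Metadata pre-filtering sketch with qdrant-client — field names and the
# placeholder query vector are illustrative.
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Payload index makes the filter cheap to evaluate before ANN search.
client.create_payload_index(
    collection_name="chunks", field_name="division", field_schema="keyword"
)

query_embedding = [0.0] * 768   # placeholder — use your real query embedding

hits = client.search(
    collection_name="chunks",
    query_vector=query_embedding,
    query_filter=models.Filter(must=[
        models.FieldCondition(key="division", match=models.MatchValue(value="EMEA")),
        models.FieldCondition(key="status", match=models.MatchValue(value="published")),
    ]),
    limit=5,
)
```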
Document lineage — essential for compliance and audit
Chunk lineage tracking
chunk_id → parent_doc_id → source_system → ingestion_run_id · stored in Postgres
Audit trail
Chunk versioning
document updated → re-chunk → new chunk IDs · old chunks soft-deleted · version pointer
Change management
PII / sensitivity tagging
Microsoft Presidio or AWS Comprehend scans chunk text at ingest · tags stored in metadata
Data governance
Document type routing — different strategies per source type
Enterprise routing: classify document type → apply strategy → route to appropriate chunker
PDF policies → structure-aware + parent-child · contracts → proposition chunking · code repos → AST splitter · spreadsheets → row/cell chunking · transcripts → speaker-turn chunking · emails → thread chunking
Type-aware pipeline
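A toy routing sketch; the chunkers are stubs standing in for the strategy-specific implementations described above.

```python
# Type-aware routing sketch — the chunkers are toy stand-ins for the
# strategy-specific implementations described above.
from typing import Callable

def chunk_recursive(text: str) -> list[str]:          # fallback default
    return [p for p in text.split("\n\n") if p.strip()]

def chunk_speaker_turns(text: str) -> list[str]:      # transcripts
    return [ln for ln in text.splitlines() if ":" in ln]

CHUNKERS: dict[str, Callable[[str], list[str]]] = {
    "transcript": chunk_speaker_turns,
    # "contract": proposition chunker, "code": AST splitter, ... (see nodes above)
}

def route_and_chunk(text: str, doc_type: str) -> list[str]:
    return CHUNKERS.get(doc_type, chunk_recursive)(text)
```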
Multimodal chunking — beyond text (enterprise standard 2025)
Table chunking
Unstructured.io or Camelot · convert to markdown or JSON + text description
Structured data
Image / figure chunking
vision LLM (GPT-4o / Claude) generates caption → stored as embeddable text chunk
Vision pipeline
Audio / video chunking
Whisper transcription (+ speaker diarisation) → speaker-turn or time-window chunks with timestamps
Transcription
Slide deck chunking
per-slide chunks · slide title + body + speaker notes + extracted image captions
Presentation
Enterprise tooling stack — what is actually deployed at scale (March 2026)
Unstructured.io
de facto enterprise standard for complex PDF / HTML extraction and layout-aware chunking
Extraction
LlamaParse Pro
cloud API · handles complex PDFs with tables, headers, multi-column layouts · LlamaIndex native
Cloud parser
Reducto
document intelligence SaaS · agentic chunking · used in fintech and legal enterprise 2025
AI-native parser
Azure Document Intelligence
prebuilt models for invoices, contracts, forms · semantic chunking integrated 2025
Azure
Cloud-native chunking services — managed options per cloud
AWS Bedrock Knowledge Bases
fixed · hierarchical (parent-child) · semantic chunking · built into Bedrock ingestion pipeline
AWS managed
GCP Vertex AI Search
layout-based chunking · document AI pre-processing · grounding-native chunk metadata
GCP managed
Azure AI Search chunking
text split skill · sentence-boundary aware · integrated document cracking + OCR
Azure managed
Regulated industries — compliance-specific chunking requirements
Regulated industry requirements: clause-level chunking, citation preservation, PII isolation, audit trail
financial services: section-level with FINRA/SEC citation metadata · healthcare: HIPAA PHI isolation at chunk boundary · legal: clause chunking with Bluebook citation preservation · government: classification-level metadata per chunk · all require chunk lineage stored independently of vector store
Compliance
The fundamental tension — precision vs recall vs cost
Smaller chunks = higher retrieval precision but lower context per chunk · larger chunks = more context but noisier embeddings
the chunk size is the single most impactful hyperparameter in a RAG system · wrong chunk size is the top cause of RAG failure in production · there is no universal optimal size — it depends on query type, domain, and LLM context window
Core tension
Chunk size selection guide by use case
Factual Q&A
128–256 tokens · precise single-fact retrieval · medical, legal, policy
Small
General document Q&A
256–512 tokens · paragraph-level coherence · most enterprise RAG
Medium
Summarisation tasks
512–1024 tokens · richer context per chunk · report generation
Large
Code retrieval
whole function or class · AST-boundary split · never mid-function
Semantic unit
Overlap strategy — preventing context loss at boundaries
Why overlap is essential
without overlap: "it" at chunk start has no referent · answer splits across boundary · retrieval misses
Problem
Overlap sizing rules
10% overlap = minimal context bridge · 20% = standard · 50% = sentence-window pattern
Sizing
Storage cost of overlap
20% overlap = stride of 80% of chunk size = ~25% more chunks, vectors and storage cost
Economics
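The arithmetic as a two-line check: with overlap fraction f the stride shrinks to chunk_size × (1 − f), so the index grows by roughly 1 / (1 − f).

```python
# Chunk-count arithmetic: overlap fraction f shrinks the stride, so the
# index grows by ~1 / (1 - f), not by f itself.
import math

def n_chunks(n_tokens: int, chunk_size: int, overlap_frac: float) -> int:
    stride = int(chunk_size * (1 - overlap_frac))
    return max(1, math.ceil((n_tokens - chunk_size) / stride) + 1)

print(n_chunks(100_000, 512, 0.0))   # 196 chunks
print(n_chunks(100_000, 512, 0.2))   # 245 chunks — ~25% more vectors
```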
Strategy selection matrix
Use fixed-size when
homogeneous docs · speed priority · prototyping · baseline to beat
Choose if
Use recursive when
mixed doc types · default production choice · unknown domain structure
Choose if
Use semantic when
topic shifts are sharp · high retrieval precision required · budget allows ingest cost
Choose if
Use proposition when
high-value corpus · compliance · answer quality is business-critical · cost justified
Choose if
Evaluation — how to know your chunking is working
RAGAS evaluation
context recall · context precision · faithfulness · answer relevancy — compared per chunking strategy
Framework
Golden dataset
20–50 hand-labelled query → expected chunk pairs · used to A/B test strategies
Ground truth
Chunk utilisation rate
% of retrieved chunks actually used by LLM in final answer · low = noisy retrieval
Metric
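A sketch against ragas' classic API (field names follow its expected schema; newer versions differ); the golden row is a made-up example.

```python
# RAGAS evaluation sketch — classic API; the golden row is illustrative.
# These metrics call an LLM judge, so an API key must be configured.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_precision, context_recall

golden = Dataset.from_dict({
    "question":     ["What is the meal reimbursement limit?"],
    "contexts":     [["Employees may claim up to $75 per day for meals."]],
    "ground_truth": ["The limit is $75 per day for meals."],
    "answer":       ["$75 per day."],
})

scores = evaluate(golden, metrics=[context_precision, context_recall])
print(scores)   # re-run whenever chunk strategy, size, or overlap changes
```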
Recursive character splitting — production default (Python)
LangChain RecursiveCharacterTextSplitter — most common production baseline
chunk_size · chunk_overlap · separator hierarchy · length_function for token-accurate sizing
LangChain · Python
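A baseline sketch; the separator list is LangChain's default plus an explicit sentence break, and sizing is token-accurate via the tiktoken helper.

```python
# Recursive splitting, token-accurate — the production baseline.
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base",                    # token-accurate length function
    chunk_size=512,
    chunk_overlap=64,                               # ~12% context bridge
    separators=["\n\n", "\n", ". ", " ", ""],       # defaults + sentence break
)
document_text = "Paragraphs of extracted document text...\n\nMore text."  # placeholder
chunks = splitter.split_text(document_text)
```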
Semantic chunking — topic-boundary detection (Python)
LangChain SemanticChunker — embed sentences, cut on cosine drop
breakpoint_threshold_type: percentile / standard_deviation / gradient · requires embedding model at ingest
LangChain · Python
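A sketch assuming langchain_experimental and an OpenAI embedding model; swap in whatever embedder you use at query time.

```python
# Semantic chunking sketch — SemanticChunker lives in langchain_experimental;
# embedding model choice is an assumption.
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

chunker = SemanticChunker(
    OpenAIEmbeddings(model="text-embedding-3-small"),
    breakpoint_threshold_type="percentile",   # or standard_deviation / gradient
    breakpoint_threshold_amount=95,           # cut at the sharpest 5% of drops
)
document_text = "Topic one sentences...\n\nTopic two sentences..."  # placeholder
chunks = chunker.split_text(document_text)    # embeds every sentence at ingest
```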
Parent-child dual index — enterprise retrieval pattern (Python)
LangChain ParentDocumentRetriever — small chunks for matching, large parent returned to LLM
child_splitter 128 tokens · parent_splitter 512 tokens · child stores parent_id · retrieval fetches parent on match
LangChain · Python
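A sketch of the dual-index pattern with LangChain's ParentDocumentRetriever; the in-memory stores and character-based sizes (approximating the token guidance above) are illustrative.

```python
# Parent-child (small-to-big) sketch — children are embedded and indexed,
# parents are what the LLM receives. Stores and sizes are illustrative.
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_core.documents import Document
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

retriever = ParentDocumentRetriever(
    vectorstore=InMemoryVectorStore(embedding=OpenAIEmbeddings()),
    docstore=InMemoryStore(),                                       # holds parents
    child_splitter=RecursiveCharacterTextSplitter(chunk_size=400),  # chars ≈ 128 tokens
    parent_splitter=RecursiveCharacterTextSplitter(chunk_size=2000) # chars ≈ 512 tokens
)
retriever.add_documents([Document(page_content="Full policy document text...")])
parents = retriever.invoke("What is the reimbursement limit?")  # parents, not children
```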
Proposition chunking — LLM-powered atomic facts (Python)
Custom proposition chunker using Claude / GPT-4o — rewrites paragraphs into self-contained facts
batch paragraphs → LLM returns JSON list of propositions → each proposition embedded independently
Custom · Python
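A custom sketch with the Anthropic SDK; the model id, prompt, and bare-JSON contract are assumptions — batch and add retries in production.

```python
# Proposition chunking sketch — rewrites a paragraph into standalone facts.
# Model id, prompt, and the bare-JSON assumption are all illustrative.
import json
import anthropic

client = anthropic.Anthropic()

def to_propositions(paragraph: str) -> list[str]:
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",   # assumed model id
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": "Rewrite this paragraph as a JSON array of standalone, "
                       "self-contained factual statements. Resolve every pronoun "
                       "to its referent. Return only the JSON array.\n\n" + paragraph,
        }],
    )
    return json.loads(msg.content[0].text)  # each proposition is embedded on its own
```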
Contextual retrieval — Anthropic pattern (Python)
Contextual retrieval: prepend LLM context to each chunk before embedding
for each chunk: call Claude with (full_doc, chunk) → get 2-sentence context → prepend to chunk → embed the enriched chunk
Anthropic pattern · Python
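A sketch following the published pattern; the prompt is paraphrased from Anthropic's recipe, and prompt-caching the full document across chunks is what keeps this affordable.

```python
# Contextual retrieval sketch — prepend generated context, then embed the
# enriched text. Prompt wording is paraphrased, not Anthropic's verbatim recipe.
import anthropic

client = anthropic.Anthropic()

def contextualize(full_doc: str, chunk: str) -> str:
    msg = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=150,
        messages=[{
            "role": "user",
            "content": f"<document>\n{full_doc}\n</document>\n"
                       f"<chunk>\n{chunk}\n</chunk>\n"
                       "Give a 1-2 sentence context situating this chunk within "
                       "the document. Answer with only the context.",
        }],
    )
    return msg.content[0].text.strip() + "\n\n" + chunk   # embed this, not the raw chunk
```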
Language-aware code splitter — structure-aware code chunking (Python)
LangChain RecursiveCharacterTextSplitter.from_language — splits on per-language function/class separators
Language.PYTHON · Language.JS · Language.GO · static separator lists rather than a tree-sitter AST parse · prefers function/class boundaries but can still split an oversized function
LangChain · Python
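A sketch of the language-aware splitter; chunk_size is in characters here, and the source path is illustrative.

```python
# Language-aware code splitting sketch — from_language selects a Python-specific
# separator list (class/def boundaries). Path and sizes are illustrative.
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

code_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON,
    chunk_size=1000,          # characters; keeps whole functions where they fit
    chunk_overlap=0,
)
source = open("app.py").read()                 # illustrative path
code_chunks = code_splitter.split_text(source)
```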


Chunking is not a solved problem

As of March 2026, there is no universally correct chunking strategy. The optimal approach depends on document type, query distribution, embedding model, and LLM context window. Every production system should A/B test at least two strategies against a golden dataset before committing.

The chunk size is the most important hyperparameter

Wrong chunk size is the single most common cause of RAG failure in production. Too small: chunks lose surrounding context, embeddings are noisy, and multi-sentence answers fragment across boundaries. Too large: embeddings average over too many topics, retrieval is imprecise, and the LLM receives too much irrelevant context.

Parent-child is the enterprise default in 2025–2026

The parent-child (small-to-big) pattern is the most widely deployed enterprise chunking architecture. Small child chunks (128 tokens) give precise embedding matches. Large parent chunks (512–1024 tokens) give the LLM sufficient context to reason. AWS Bedrock, Azure AI Search, and LlamaIndex all support this natively.

Metadata is as important as the chunk text

A chunk without rich metadata is only half as useful. The combination of LLM-generated summaries, keywords, hypothetical questions, and structural metadata turns retrieval from approximate similarity search into targeted knowledge retrieval. Every enterprise deployment should invest in metadata enrichment at ingest time.

Evaluation closes the loop

Build a golden dataset of 20–50 query-to-expected-chunk pairs before deploying. Run RAGAS context recall and context precision metrics against it whenever you change chunking strategy, chunk size, or overlap. Without this, you are guessing.