Learn RAG from Scratch
RAG Chunking Strategies Explained in 5 Minutes
Video 2 of 9 · 7:29
Chapters
- 0:00 Why chunking matters
- 0:40 What chunking is
- 1:20 Chunk size comparison
- 2:30 Overlap strategy
- 3:10 Fixed vs recursive vs semantic chunking
Transcript
Auto generated by YouTube. Click any timestamp to jump to that moment.
- 0:03 … docs. You search and get garbage. The LLM hallucinates. Anyway, people blame the model. But the problem happened before any search: you chunked your documents wrong. Too big, and the embedding is diluted. Too small, and you lose the context that makes the answer useful. This video shows you exactly how to chunk your documents so retrieval actually works.
- 0:29 Here's the basic idea. You have a document: a refund policy, a spec, whatever. You can't embed the whole thing as one vector. The embedding model has a token limit, 512 or 8,192 depending on the model. Even if it fits, one embedding for 10 pages captures nothing specific. So you split the document into smaller pieces. Each piece is a chunk. Each chunk gets its own embedding.
- 1:00 Now, when someone searches "annual plan refund", the search finds the specific chunk about annual plan refunds, not a vague summary of the whole document. That's chunking.
- 1:13 The question is, how do you decide where to split? Let's see why size matters. Take a refund policy document. If you embed the whole page as one block, that's 2,000 tokens. When someone searches "annual plan refund", the query matches loosely. Similarity: 0.61. Too much unrelated content dilutes the embedding.
- 1:37 Now go to the other extreme. Chunk by single sentence, 20 tokens each. The search finds "annual plans may be refunded". Similarity: 0.89. But the chunk has zero context. Refunded how? Under what conditions? The LLM can't answer the actual question.
- 1:55 The sweet spot is paragraph level, 200 to 500 tokens. The search hits section 4.2, the paragraph about annual plan refunds. Similarity: 0.94. Enough detail to answer the question, enough context for the LLM to generate a useful response.
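The size trade-off above can be sketched by splitting one document at the three granularities. The sample policy text is made up for illustration, and word counts stand in as a rough proxy for tokens; a real pipeline would count tokens with the embedding model's own tokenizer:

```python
def chunk_lengths(chunks):
    # Word counts as a rough proxy for token counts.
    return [len(c.split()) for c in chunks]

doc = (
    "Section 4.1. Monthly plans renew automatically.\n\n"
    "Section 4.2. Annual plans are eligible for refunds. "
    "Refunds are subject to a usage deduction. "
    "Requests must be filed within 30 days.\n\n"
    "Section 4.3. Enterprise plans are billed per seat."
)

whole_doc = [doc]                                   # one giant chunk
paragraphs = [p for p in doc.split("\n\n") if p]    # paragraph-level chunks
sentences = [s for p in paragraphs
             for s in p.split(". ") if s]           # sentence-level chunks

print(len(whole_doc), len(paragraphs), len(sentences))  # → 1 3 8
```

One vector for the whole document blurs every topic together; sentence-level chunks are tiny and context-free; the paragraph-level chunks sit in between.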
- 2:17 There's another problem. When you split a document into chunks, you create hard boundaries. Information that spans two chunks gets cut in half. Take this example. One paragraph says, "Annual plans are eligible for refunds." The next paragraph says, "Subject to a … usage deduction." If your chunk boundary falls between those two paragraphs, neither chunk has the full answer.
- 2:41 The fix is overlap. You set a chunk size of 400 tokens and an overlap of 50 tokens. The last 50 tokens of chunk one repeat as the first 50 tokens of chunk two. Now both chunks contain "eligible for refunds subject to a … usage deduction". The search will find the complete answer regardless of which chunk it hits. Typical overlap is 10 to 20% of chunk size.
- 3:08 If you want to learn how to do this yourself, I run live sessions every Friday at noon Eastern. Scan the QR code on screen to join. Would love to see you there.
- 3:22 The first approach is fixed-size chunking. You pick a number, 500 tokens. You split the document every 500 tokens. That's it. No logic about paragraphs, sentences, or meaning. Just count to 500 and cut. The advantage is that every chunk is the same size. Easy to implement, easy to debug, easy to reason about storage costs. The downside: you might cut in the middle of a sentence. "Annual plans are eligible" lands in one chunk, "refunds subject to" in the next. Neither chunk makes complete sense on its own. Fixed size works fine for homogeneous content like log files or structured data. For natural language documents, you want something smarter.
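A minimal sketch of fixed-size chunking, combined with the overlap scheme from earlier (400-token chunks, 50-token overlap, matching the numbers in the video). Tokens are approximated as whitespace-separated words; a production version would use a real tokenizer:

```python
def fixed_size_chunks(text, chunk_size=400, overlap=50):
    """Split text into fixed-size chunks; consecutive chunks share `overlap` tokens."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    tokens = text.split()            # crude "tokenizer": whitespace words
    step = chunk_size - overlap      # each new chunk starts `step` tokens later
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break                    # final chunk reached; stop here
    return chunks

# Usage: 1,000 dummy tokens -> 3 chunks; the last 50 tokens of each
# chunk reappear as the first 50 tokens of the next one.
doc = " ".join(f"w{i}" for i in range(1000))
chunks = fixed_size_chunks(doc, chunk_size=400, overlap=50)
```

The `step` arithmetic is the whole trick: a stride of `chunk_size - overlap` is what makes boundary-spanning sentences land intact in at least one chunk.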
- 4:13 Here is the technique you'll use 80% of the time: recursive character chunking. Here's how it works. You give it a list of separators in priority order: double new line, single new line, period plus space, space. The algorithm tries to split on double new lines first. Those are paragraph boundaries. If a resulting chunk is still too big, it falls back to single new lines. Still too big? Split on sentences. Last resort: split on spaces.
- 4:44 Take our refund policy. The double new line split gives us four paragraphs. Section one is 280 tokens. Fits perfectly. Section two is … tokens. Fine. Section three is 620 tokens. Too big. So it falls back to sentence splitting on that section only. Now section three becomes two chunks: 3A at 340 tokens and 3B at 280 tokens. Every chunk respects natural boundaries. No mid-sentence cuts. This is the default in LangChain, LlamaIndex, and most RAG frameworks.
- 5:22 Semantic chunking takes a completely different approach. Instead of splitting by character count or punctuation, it splits by meaning.
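Before the semantic details, the recursive fallback logic just described can be sketched as follows. This is a simplified illustration, not LangChain's actual implementation: real splitters such as `RecursiveCharacterTextSplitter` also re-merge small pieces, preserve separators, and apply overlap. Sizes here are characters, not tokens, and `str.split` drops the separators:

```python
# Separators in priority order, coarse to fine.
SEPARATORS = ["\n\n", "\n", ". ", " "]

def recursive_split(text, max_size, separators=SEPARATORS):
    if len(text) <= max_size or not separators:
        return [text]                     # fits, or no finer separator left
    sep, rest = separators[0], separators[1:]
    pieces = [p for p in text.split(sep) if p]
    if len(pieces) == 1:                  # separator absent: try the next one
        return recursive_split(text, max_size, rest)
    chunks = []
    for piece in pieces:
        if len(piece) <= max_size:
            chunks.append(piece)          # respects a natural boundary
        else:
            # Still too big: fall back to the finer separators,
            # on this piece only.
            chunks.extend(recursive_split(piece, max_size, rest))
    return chunks

# Usage: the first paragraph fits; the second is too big, so only it
# falls back to sentence splitting.
doc = ("Refunds are easy.\n\n"
       "Annual plans are eligible. Deductions may apply. File within 30 days.")
chunks = recursive_split(doc, max_size=30)
```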
- 5:34 Here's the idea. You take every sentence and compute its embedding. Then you compare adjacent sentences. If two sentences have high cosine similarity, they belong in the same chunk. When the similarity drops below a threshold, you place a new chunk boundary. Take a spec. Sentences about pricing cluster together. Sentences about … cluster together. The algorithm detects the semantic shift and splits there.
- 6:03 The result is chunks that are semantically coherent. Every chunk is about one thing. The downside is cost. You're calling the embedding model for every sentence, not just every chunk. For most use cases, recursive character chunking gets you 90% of the benefit at a fraction of the cost.
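The semantic chunking loop described above, sketched end to end. The `embed()` here is a toy bag-of-words counter purely so the example runs; a real pipeline would call a sentence-embedding model, and the 0.3 threshold is an arbitrary illustration value:

```python
import math
from collections import Counter

def embed(sentence):
    # Toy embedding: bag-of-words counts. Stand-in for a real model.
    return Counter(sentence.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(sentences, threshold=0.3):
    chunks, current = [], [sentences[0]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) < threshold:
            chunks.append(" ".join(current))  # similarity dropped: new chunk
            current = []
        current.append(cur)
    chunks.append(" ".join(current))
    return chunks

# Usage: pricing sentences cluster together, API sentences cluster
# together, and the topic shift between them becomes the boundary.
sentences = [
    "our pricing starts at ten dollars",
    "our pricing includes annual discounts",
    "the api supports batch requests",
    "the api returns json data",
]
chunks = semantic_chunks(sentences)
```

The cost point from the transcript is visible in the structure: `embed()` runs once per sentence, not once per chunk.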
- 6:23 Here's how to choose. Start with recursive character chunking. Chunk size … to 600 tokens. Overlap 50 to 100. That's your baseline. If your documents are structured, like code files, log files, or CSV data, fixed-size chunking is fine. If your retrieval still isn't good enough after recursive chunking, try semantic chunking on the problem documents. And then measure. Run the same 20 test queries against each strategy. Compare the similarity scores. The right chunking strategy is the one that puts the right information in front of the LLM.
- 7:04 That's the full picture. If you want to go deeper, join my free live session every Friday at noon Eastern on Maven. I walk through this hands-on, answer questions, and show you how to build it yourself. Scan the QR code to join.
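The measurement loop from 6:51 (same test queries against each strategy, compare similarity scores) can be sketched like this. Again, `embed()` is a toy stand-in for a real embedding model, and the two strategies are tiny hand-made chunk lists for illustration:

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words counts. Stand-in for a real model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_similarity(query, chunks):
    # Top-1 retrieval score: the best-matching chunk for this query.
    q = embed(query)
    return max(cosine(q, embed(c)) for c in chunks)

def compare_strategies(strategies, queries):
    # Average top-1 similarity per strategy over the same query set.
    return {name: sum(best_similarity(q, cs) for q in queries) / len(queries)
            for name, cs in strategies.items()}

strategies = {
    "whole_doc": ["annual plans refunds usage deduction monthly billing seats"],
    "paragraphs": ["annual plans are eligible for refunds",
                   "monthly billing is charged per seat"],
}
scores = compare_strategies(strategies, ["annual plan refunds"])
```

Even with the toy embedding, the paragraph strategy scores higher on this query than the single diluted chunk, which is the comparison the video suggests running on your own documents.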