Learn RAG from Scratch
RAG vs Fine-Tuning vs Prompting - Simple Decision Guide
Video 3 of 9 · 5:09
Chapters
- 0:00 The hallucination problem
- 0:55 What RAG actually does
- 1:40 RAG vs fine-tuning vs context stuffing
- 3:20 Three failure modes of naive RAG
Transcript
Auto generated by YouTube. Click any timestamp to jump to that moment.
- 0:02 You plug in an LLM and ask it, "What's our refund policy for annual plans?" The model responds confidently: "Full refund within 60 days of purchase." Sounds reasonable. Except your actual policy says pro-rated refund minus 2 […] of service used.
- 0:21 The LLM didn't check. It didn't look anything up. It generated a plausible-sounding answer from its training data. This is called a hallucination. The model is confident and wrong.
- 0:35 And for any question about your company's specific data, this will happen. Your product docs, your pricing, your internal policies: the LLM has never seen any of it.
- 0:46 RAG stands for retrieval-augmented generation. Instead of asking the LLM to answer from memory, you first search your own data for relevant documents. Then you pass those documents to the LLM as context. The LLM reads them and generates an answer based on what it found.
- 1:07 Think of it like this. Without RAG, you're asking someone to answer a question about a book they've never read. With RAG, you hand them the right pages first, then ask the question. The model doesn't need to memorize your data. It just needs the right context at query time.
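The retrieve-then-generate loop described above can be sketched in a few lines. This is a toy, not a real implementation: the retriever scores documents by keyword overlap instead of embeddings, both documents are invented for illustration, and the final LLM call is left as a comment.

```python
# Minimal sketch of the RAG flow: search your own data first,
# then pass what you found to the model as context.
# Toy keyword-overlap retriever; document text is made up.

DOCS = {
    "refund-policy": "Annual plans: pro-rated refund based on months of service used.",
    "pricing": "Pro plan is $30/month, billed monthly or annually.",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    """Score each doc by how many query words it contains."""
    words = set(query.lower().split())
    scored = sorted(
        DOCS.values(),
        key=lambda text: len(words & set(text.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str) -> str:
    """Retrieve relevant docs and pack them into the prompt as context."""
    context = "\n".join(retrieve(query))
    return (
        "Answer using ONLY the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

prompt = build_prompt("What is the refund policy for annual plans?")
# The assembled prompt now carries your own data; an LLM call would go here.
print(prompt)
```

The point is the shape of the flow: retrieval happens before generation, and the model only ever sees the retrieved text, not your whole corpus.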
- 1:29 So you know you need to get your data into the LLM. You have three options.
- 1:31 Option one: stuff the context. Just paste your documents directly into the prompt. This works when you have a small amount of data, maybe under 50 pages. It's simple and needs no infrastructure, but it breaks when your data grows, and you're paying for it on every single call.
- 1:51 Option two: fine-tuning. You retrain the model on your data. This changes the model's behavior and knowledge permanently. It's good for teaching the model a specific style or domain vocabulary, but it's slow to update, and the model can still hallucinate.
- 2:10 Option three: RAG. You store your docs in a searchable index, retrieve what's relevant at query time, and pass it as context. It scales to millions of documents. You can update your data without retraining, and the model always cites what it found.
- 2:29 Here's the decision framework. Use context stuffing when your data fits in the context window. Use fine-tuning when you need to change the model's behavior, not just its knowledge. Use RAG when you have large or frequently changing data that the model needs to reference.
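The decision framework above condenses to a small rule of thumb. The token threshold below is an illustrative assumption (roughly the "under 50 pages" heuristic from the video expressed in tokens), not a fixed rule; adjust it to your model's context window.

```python
# The context-stuffing vs fine-tuning vs RAG decision as a rule of thumb.
# The 50k-token threshold is an illustrative assumption.

def choose_approach(doc_tokens: int, needs_behavior_change: bool,
                    data_changes_often: bool) -> str:
    if needs_behavior_change:
        return "fine-tuning"       # change style/behavior, not just knowledge
    if doc_tokens < 50_000 and not data_changes_often:
        return "context stuffing"  # small, stable corpus fits in the prompt
    return "RAG"                   # large or frequently changing data

print(choose_approach(doc_tokens=2_000_000, needs_behavior_change=False,
                      data_changes_often=True))
```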
- 2:47 If you want to learn how to do this hands-on, I run free live sessions every Friday at noon Eastern. Scan the QR code on screen to join. Would love to see you there.
- 3:02 But here's the catch. Even with RAG, things can go wrong. There are three failure modes you need to know.
- 3:10 Failure mode one: wrong document. The user asks about the refund policy, but the retriever pulls the pricing page instead. The embedding for "refund policy" sits close to "pricing" because they share vocabulary. So the LLM generates an answer about pricing, not refunds. Real data, wrong source.
- 3:31 Failure mode two: right document, wrong chunk. The retriever finds the refund policy document, but it grabs a chunk about monthly plan refunds when the user asked about annual plans. The answer is from the correct document, but the wrong chunk.
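Failure mode two is largely a splitting artifact: a naive fixed-size splitter can mix monthly and annual terms in one chunk, or cut them apart from their headings. One common mitigation, sketched here, is heading-aware chunking, so each chunk covers one topic and keeps its section title as metadata. The document text and `## ` heading convention are invented for illustration.

```python
# Heading-aware chunking sketch: split on section headings so each chunk
# stays on one topic and carries its heading. Document text is invented.

DOC = """\
## Monthly plans
Refunds within 14 days of the charge.
## Annual plans
Pro-rated refund based on months of service used.
"""

def chunk_by_heading(text: str) -> list[dict]:
    chunks, current = [], None
    for line in text.splitlines():
        if line.startswith("## "):
            current = {"heading": line[3:], "body": []}
            chunks.append(current)
        elif current is not None:
            current["body"].append(line)
    return [{"heading": c["heading"], "body": " ".join(c["body"]).strip()}
            for c in chunks]

for c in chunk_by_heading(DOC):
    print(c["heading"], "->", c["body"])
```

With the heading attached, a retriever (or a metadata filter) can distinguish the annual-plan chunk from the monthly-plan one instead of guessing from body text alone.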
- 3:45 Failure mode three: no match. The user asks, "What happens if I cancel?" Your docs use the phrase "termination." The embeddings don't match because the vocabulary is different. The retriever returns nothing, and the LLM either hallucinates or says it doesn't know.
- 4:02 Each of these failure modes has a known fix. Wrong document: that's a routing and indexing problem. Wrong chunk: that's a chunking problem. No match: that's a query translation problem.
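The no-match failure and its query-translation fix can be shown in miniature with a toy keyword retriever. The synonym table below is a hand-made stand-in for what an LLM query rewriter or embedding model would provide; the document text is invented.

```python
# Failure mode three in miniature: the query says "cancel", the docs say
# "termination", so a literal match finds nothing. Query translation
# expands the query before retrieval. SYNONYMS is purely illustrative.

DOCS = [
    "Contract termination: notice must be given 30 days in advance.",
    "Pricing: the Pro plan is billed annually.",
]

SYNONYMS = {"cancel": ["termination", "terminate"]}  # illustrative only

def retrieve(query: str) -> list[str]:
    words = set(query.lower().replace("?", "").split())
    return [d for d in DOCS if words & set(d.lower().replace(":", "").split())]

def translate(query: str) -> str:
    """Expand the query with synonyms before retrieval."""
    extra = [s for w in query.lower().split() for s in SYNONYMS.get(w, [])]
    return query + " " + " ".join(extra)

print(retrieve("cancel"))             # vocabulary mismatch: no hits
print(retrieve(translate("cancel")))  # expanded query matches "termination"
```

A real system would do this rewriting with an LLM or rely on embeddings to bridge the vocabulary gap, but the mechanism is the same: change the query, not the documents.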
- 4:17 In the next videos in this series, we'll build a RAG system step by step: query translation, indexing, re-ranking, and generation. Each video covers one module with concrete code you can ship.
- 4:33 This is video one of the RAG builder series. Subscribe so you don't miss the next one.
- 4:42 That's the full picture. If you want to go deeper, join my free live session this Friday at noon Eastern on Maven. I walk through this, answer questions, and show you how to build it yourself. Scan the QR code to join.
RAG Chunking Strategies Explained in 5 Minutes
Next: RAG Re-ranking: Bi-Encoder vs Cross-Encoder Explained