Learn RAG from Scratch
RAG vs Fine-Tuning vs Prompting - Simple Decision Guide
Video 3 of 9 · 5:09
Chapters
- 0:00 The hallucination problem
- 0:55 What RAG actually does
- 1:40 RAG vs fine-tuning vs context stuffing
- 3:20 Three failure modes of naive RAG
Transcript
Auto generated by YouTube. Click any timestamp to jump to that moment.
- 0:02 You plug in an LLM and ask it, "What's our refund policy for annual plans?" The model responds confidently: "Full refund within 60 days of purchase." Sounds reasonable. Except your actual policy says pro-rated refund minus 2 […] of service used.
- 0:21 The LLM didn't check. It didn't look anything up. It generated a plausible-sounding answer from its training data. This is called a hallucination. The model is confident and wrong.
- 0:35 And for any question about your company's specific data, this will happen. Your product docs, your pricing, your internal policies: the LLM has never seen any of it.
- 0:46 RAG stands for retrieval-augmented generation. Instead of asking the LLM to answer from memory, you first search your own data for relevant documents. Then you pass those documents to the LLM as context. The LLM reads them and generates an answer based on what it found.
- 1:07 Think of it like this. Without RAG, you're asking someone to answer a question about a book they've never read. With RAG, you hand them the right pages first, then ask the question. The model doesn't need to memorize your data. It just needs the right context at query time.
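The retrieve-then-generate loop described above can be sketched in a few lines. This is a toy, not a real implementation: the retriever scores documents by keyword overlap instead of embeddings, both documents are invented for illustration, and the final LLM call is left as a comment.

```python
# Minimal sketch of the RAG flow: search your own data first,
# then pass what you found to the model as context.
# Toy keyword-overlap retriever; document text is made up.

DOCS = {
    "refund-policy": "Annual plans: pro-rated refund based on months of service used.",
    "pricing": "Pro plan is $30/month, billed monthly or annually.",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    """Score each doc by how many query words it contains."""
    words = set(query.lower().split())
    scored = sorted(
        DOCS.values(),
        key=lambda text: len(words & set(text.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str) -> str:
    """Retrieve relevant docs and pack them into the prompt as context."""
    context = "\n".join(retrieve(query))
    return (
        "Answer using ONLY the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

prompt = build_prompt("What is the refund policy for annual plans?")
# The assembled prompt now carries your own data; an LLM call would go here.
print(prompt)
```

The point is the shape of the flow: retrieval happens before generation, and the model only ever sees the retrieved text, not your whole corpus.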
- 1:29 So you know you need to get your data into the LLM. You have three options.
- 1:31 Option one: stuff the context. Just paste your documents directly into the prompt. This works when you have a small amount of data, maybe under 50 pages. It's simple and needs no infrastructure, but it breaks when your data grows, and you're paying for it on every single call.
- 1:51 Option two: fine-tuning. You retrain the model on your data. This changes the model's behavior and knowledge permanently. It's good for teaching the model a specific style or domain vocabulary, but it's slow to update, and the model can still hallucinate.
- 2:10 Option three: RAG. You store your docs in a searchable index, retrieve what's relevant at query time, and pass it as context. It scales to millions of documents. You can update your data without retraining, and the model always cites what it found.
- 2:29 Here's the decision framework. Use context stuffing when your data fits in the context window. Use fine-tuning when you need to change the model's behavior, not just its knowledge. Use RAG when you have large or frequently changing data that the model needs to reference.
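The decision framework above condenses to a small rule of thumb. The token threshold below is an illustrative assumption (roughly the "under 50 pages" heuristic from the video expressed in tokens), not a fixed rule; adjust it to your model's context window.

```python
# The context-stuffing vs fine-tuning vs RAG decision as a rule of thumb.
# The 50k-token threshold is an illustrative assumption.

def choose_approach(doc_tokens: int, needs_behavior_change: bool,
                    data_changes_often: bool) -> str:
    if needs_behavior_change:
        return "fine-tuning"       # change style/behavior, not just knowledge
    if doc_tokens < 50_000 and not data_changes_often:
        return "context stuffing"  # small, stable corpus fits in the prompt
    return "RAG"                   # large or frequently changing data

print(choose_approach(doc_tokens=2_000_000, needs_behavior_change=False,
                      data_changes_often=True))
```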
- 2:47 If you want to learn how to do this hands-on, I run free live sessions every Friday at noon Eastern. Scan the QR code on screen to join. Would love to see you there.
- 3:02 But here's the catch. Even with RAG, things can go wrong. There are three failure modes you need to know.
- 3:10 Failure mode one: wrong document. The user asks about the refund policy, but the retriever pulls the pricing page instead. The embedding for "refund policy" sits close to "pricing" because they share vocabulary. So the LLM generates an answer about pricing, not refunds. Real data, wrong source.
- 3:31 Failure mode two: right document, wrong chunk. The retriever finds the refund policy document, but it grabs a chunk about monthly plan refunds when the user asked about annual plans. The answer is from the correct document, but the wrong chunk.
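Failure mode two is largely a splitting artifact: a naive fixed-size splitter can mix monthly and annual terms in one chunk, or cut them apart from their headings. One common mitigation, sketched here, is heading-aware chunking, so each chunk covers one topic and keeps its section title as metadata. The document text and `## ` heading convention are invented for illustration.

```python
# Heading-aware chunking sketch: split on section headings so each chunk
# stays on one topic and carries its heading. Document text is invented.

DOC = """\
## Monthly plans
Refunds within 14 days of the charge.
## Annual plans
Pro-rated refund based on months of service used.
"""

def chunk_by_heading(text: str) -> list[dict]:
    chunks, current = [], None
    for line in text.splitlines():
        if line.startswith("## "):
            current = {"heading": line[3:], "body": []}
            chunks.append(current)
        elif current is not None:
            current["body"].append(line)
    return [{"heading": c["heading"], "body": " ".join(c["body"]).strip()}
            for c in chunks]

for c in chunk_by_heading(DOC):
    print(c["heading"], "->", c["body"])
```

With the heading attached, a retriever (or a metadata filter) can distinguish the annual-plan chunk from the monthly-plan one instead of guessing from body text alone.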
- 3:45 Failure mode three: no match. The user asks, "What happens if I cancel?" Your docs use the phrase "termination." The embeddings don't match because the vocabulary is different. The retriever returns nothing, and the LLM either hallucinates or says it doesn't know.
- 4:02 Each of these failure modes has a known fix. Wrong document: that's a routing and indexing problem. Wrong chunk: that's a chunking problem. No match: that's a query translation problem.
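The no-match failure and its query-translation fix can be shown in miniature with a toy keyword retriever. The synonym table below is a hand-made stand-in for what an LLM query rewriter or embedding model would provide; the document text is invented.

```python
# Failure mode three in miniature: the query says "cancel", the docs say
# "termination", so a literal match finds nothing. Query translation
# expands the query before retrieval. SYNONYMS is purely illustrative.

DOCS = [
    "Contract termination: notice must be given 30 days in advance.",
    "Pricing: the Pro plan is billed annually.",
]

SYNONYMS = {"cancel": ["termination", "terminate"]}  # illustrative only

def retrieve(query: str) -> list[str]:
    words = set(query.lower().replace("?", "").split())
    return [d for d in DOCS if words & set(d.lower().replace(":", "").split())]

def translate(query: str) -> str:
    """Expand the query with synonyms before retrieval."""
    extra = [s for w in query.lower().split() for s in SYNONYMS.get(w, [])]
    return query + " " + " ".join(extra)

print(retrieve("cancel"))             # vocabulary mismatch: no hits
print(retrieve(translate("cancel")))  # expanded query matches "termination"
```

A real system would do this rewriting with an LLM or rely on embeddings to bridge the vocabulary gap, but the mechanism is the same: change the query, not the documents.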
- 4:17 In the next videos in this series, we'll build a RAG system step by step: query translation, indexing, re-ranking, and generation. Each video covers one module with concrete code you can ship.
- 4:33 This is video one of the RAG builder series. Subscribe so you don't miss the next one.
- 4:42 That's the full picture. If you want to go deeper, join my free live session this Friday at noon Eastern on Maven. I walk through this, answer questions, and show you how to build it yourself. Scan the QR code to join.
RAG Chunking Strategies Explained in 5 Minutes
Next: RAG Re-ranking: Bi-Encoder vs Cross-Encoder Explained