Learn RAG from Scratch

RAG vs Fine-Tuning vs Prompting - Simple Decision Guide

Video 3 of 9 · 5:09

Chapters

  • 0:00 The hallucination problem
  • 0:55 What RAG actually does
  • 1:40 RAG vs fine-tuning vs context stuffing
  • 3:20 Three failure modes of naive RAG

Transcript

Auto-generated by YouTube.
0:02 You plug in an LLM and ask it, what's our refund policy for annual plans? The model responds confidently: full refund within 60 days of purchase. Sounds reasonable. Except your actual policy says pro-rated refund minus 2 [...] of service used. The LLM didn't lie. It didn't look anything up. It generated a plausible-sounding answer from its training data. This is called a hallucination. The model is confident and wrong. And for any question about your company's specific data, this will happen. Your product docs, your pricing, your internal policies — the LLM has never seen any of it.

0:46 RAG stands for retrieval-augmented generation.
0:53 Instead of asking the LLM to answer from memory, you first search your own data for relevant documents. Then you pass those documents to the LLM as context. The LLM reads them and generates an answer based on what it found. Think of it like this. Without RAG, you're asking someone to answer a question about a book they've never read. With RAG, you hand them the right pages first, then ask the question. The model doesn't need to memorize your data. It just needs to see the right context at query time.
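The flow just described — search your own data, then hand the hits to the LLM as context — can be sketched in a few lines. Everything here is a toy stand-in: the corpus, the word-overlap "retriever," and the prompt template are illustrative, not the course's actual stack (a real system would use embeddings and a vector store).

```python
# Minimal RAG sketch: retrieve relevant text, then build an augmented prompt.
import re

def tokenize(text: str) -> set[str]:
    """Lowercase word tokens (toy stand-in for an embedding model)."""
    return set(re.findall(r"[a-z0-9$-]+", text.lower()))

def retrieve(query: str, docs: dict[str, str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query (toy retriever)."""
    scored = sorted(
        docs.values(),
        key=lambda text: len(tokenize(query) & tokenize(text)),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the augmented prompt the LLM would actually see."""
    return (
        "Answer using ONLY this context:\n"
        + "\n---\n".join(context)
        + f"\n\nQuestion: {query}"
    )

docs = {
    "refunds": "Annual plans get a pro-rated refund based on service used.",
    "pricing": "The annual plan costs $120 per year, billed up front.",
}
query = "What is the refund policy for annual plans?"
prompt = build_prompt(query, retrieve(query, docs))
# `prompt` now carries the refund passage, so the model answers from
# your data instead of from memory.
```

The LLM call itself is deliberately left out — the point is that the model only ever sees the retrieved context plus the question.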
1:29 So you know you need your data in the LLM. You have three options. Option one, stuff the context. Just paste your docs directly into the prompt. This works when you have a small amount of data, maybe under 50 pages. It's simple, no infrastructure, but it breaks when your data grows, and you're paying for those tokens on every single call. Option two, fine-tuning. You retrain the model on your data. This changes the model's behavior and knowledge permanently. It's great for teaching the model a specific style or domain vocabulary, but it's expensive, slow to update, and the model can still hallucinate. Option three, RAG. You store your docs in a searchable index, retrieve what's relevant at query time, and pass it as context. It scales to millions of documents. You can update your data without retraining, and the model always cites what it found.
2:29 Here's the decision framework. Use context stuffing when your data fits in the context window. Use fine-tuning when you need to change the model's behavior, not just its knowledge. Use RAG when you have large or frequently changing data that the model needs to reference.
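That decision framework can be written out as a tiny helper. The thresholds are the video's rule of thumb (the ~50-page cutoff for context stuffing); tune them for your own model's context window and data.

```python
# Encodes the video's decision framework as a sketch, not a rule engine.
def choose_approach(
    pages: int,
    data_changes_often: bool,
    need_behavior_change: bool,
) -> str:
    if need_behavior_change:
        # Fine-tuning changes style/behavior, not just knowledge.
        return "fine-tuning"
    if pages <= 50 and not data_changes_often:
        # Small, stable data fits directly in the prompt.
        return "context stuffing"
    # Large or frequently changing data the model must reference.
    return "RAG"
```

For example, a 10-page static FAQ lands on context stuffing, while a 5,000-page doc set that changes weekly lands on RAG.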
2:47 If you want to learn how to do this yourself, I run free live sessions every Friday at noon Eastern. Scan the QR code on screen to join. Would love to see you there.
3:02 But here's the catch. Even with RAG, things can go wrong. There are three common failure modes you need to know. Failure mode one, wrong document. The user asks about the refund policy, but the retriever pulls the pricing page instead. The embedding for refund policy sits close to pricing because they share vocabulary. So the LLM generates an answer about pricing, not refunds. Real data, wrong source.
3:31 Failure mode two, right document, wrong chunk. The retriever finds the refund policy document, but it grabs a chunk about monthly plan refunds when the user asked about annual plans. The answer is from the correct document, but the wrong section.
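This is a chunking problem: a fixed-size splitter can cut the document so the annual-plans heading lands in one chunk and its refund terms in the next. A sketch, using a hypothetical policy doc, comparing naive fixed-size chunks with splitting on section boundaries:

```python
# Failure mode two in miniature: fixed-size chunking splits a topic in half.
POLICY = (
    "## Monthly plans\n"
    "Full refund within 7 days of purchase.\n"
    "## Annual plans\n"
    "Pro-rated refund based on months of service used.\n"
)

def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Naive: split every `size` characters, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def section_chunks(text: str) -> list[str]:
    """Better: split on section headings so terms stay with their topic."""
    return ["##" + part for part in text.split("##") if part.strip()]

naive = fixed_size_chunks(POLICY, 60)
by_section = section_chunks(POLICY)
# With naive chunks, "Annual plans" and its pro-rated terms end up in
# different chunks; section-aware chunks keep them together.
```

Heading-aware splitting is one of several chunking strategies; the point is that chunk boundaries decide what the retriever can ever return as a unit.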
3:45 Failure mode three, no match. The user asks, "What happens if I cancel?" Your docs use the phrase contract termination. The embeddings don't match because the vocabulary is different. The retriever returns nothing useful, and the LLM either hallucinates or says it doesn't know.

4:02 Each of these failure modes has a known fix. Wrong document, that's a routing and indexing problem. Wrong chunk, that's a chunking strategy problem. No match, that's a query translation problem.

4:17 In the next videos in this series, we'll build a full RAG system step by step.
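The query-translation fix for the no-match failure can be sketched simply: rewrite or expand the query before retrieval so "cancel" can reach docs that say "termination." In production this rewrite step is often done by an LLM; here a hand-written synonym map stands in for it.

```python
# Query translation sketch: bridge the vocabulary gap before retrieval.
SYNONYMS = {
    "cancel": ["termination", "terminate"],
    "refund": ["credit"],
}

def expand_query(query: str) -> set[str]:
    """Add known synonyms so retrieval can bridge vocabulary gaps."""
    words = set(query.lower().replace("?", "").split())
    for word in list(words):
        words.update(SYNONYMS.get(word, []))
    return words

def matches(query_words: set[str], doc: str) -> bool:
    """Toy retriever test: any word overlap at all?"""
    return bool(query_words & set(doc.lower().split()))

doc = "Contract termination takes effect at the end of the billing period."
raw = set("what happens if i cancel".split())
# The raw query misses the doc entirely; the expanded query finds it.
```

A static synonym map is brittle; the same shape works when an LLM generates the rewrites, which is the approach the series covers under query translation.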
4:25 Query translation, indexing, re-ranking, and generation. Each video covers one module with concrete code you can ship. This is video one of the RAG builder series. Subscribe so you don't miss the next one. That's the full picture. If you want to go deeper, join my free live session this Friday at noon Eastern on Maven. I walk through this material, answer questions, and show you how to build it yourself. Scan the QR code to join.

Want the next one in your inbox?

Join 1,000+ Product Managers getting one deep dive every Friday.