Learn RAG from Scratch

From RAG to Agentic RAG (What Changed?)

Video 9 of 9 · 7:10

Chapters

  • 0:00 The problem with one-shot RAG
  • 0:45 Self-RAG: the core idea
  • 1:30 Self-RAG walkthrough
  • 3:00 CRAG and when to use it

Transcript

Auto-generated by YouTube.

  1. 0:04 your cancellation policy. The
  2. 0:07 [retriever] pulls three documents. One is
  3. 0:10 [the] pricing page. One is the onboarding
  4. 0:13 [...] One is a 2-year-old FAQ that
  5. 0:17 mentions the old policy. The LLM
  6. 0:20 [reads] all three and generates an answer
  7. 0:22 [based] on the outdated FAQ. Confidently
  8. 0:26 [wrong.] And here is the real problem.
  9. 0:28 [Basic] RAG has no way to know it failed.
  10. 0:32 [It] retrieves once, generates once, and
  11. 0:35 [...] If the retrieval was bad, the
  12. 0:38 [answer] is bad. No recovery, no second
  13. 0:41 [chance.] What if the system could check
  14. 0:43 [its] own work? Self-RAG adds a grading
  15. 0:47 [step] after generation. The system
  16. 0:50 [generates] an answer, then asks itself,
  17. 0:53 [is] this answer actually supported by the
  18. 0:55 documents? If yes, return it.
  19. 0:59 [If] no, go back, refine the query,
  20. 1:03 [retrieve] again, generate again, grade
  21. 1:06 [again.] It is a feedback loop. The system
  22. 1:09 [keeps] trying until the answer is
  23. 1:11 [grounded] or until it hits a maximum
  24. 1:13 [retry] count. Think of it like a code
  25. 1:16 [review] for your RAG pipeline, except the
  26. 1:19 [reviewer] is the system itself.
  27. 1:22 [Let's] walk through a concrete example.
  28. 1:25 [You] are building a product docs
  29. 1:27 [...] for your SaaS app. A user asks,
  30. 1:31 "[What] happens if I cancel my annual plan
  31. 1:33 [...]?" Attempt one. The retriever
  32. 1:37 [searches] your docs and pulls back three
  33. 1:39 [documents:] the pricing overview, a general
  34. 1:42 [FAQ] about billing, and a blog post about
  35. 1:45 new pricing tiers. The LLM reads
  36. 1:48 [them] and generates, "You will receive a
  37. 1:50 refund for the remaining months."
  38. 1:53 [Sounds] reasonable, but the grader
  39. 1:55 [checks] this answer against the source
  40. 1:57 [documents.] None of them actually mention
  41. 1:59 [refunds] for mid-year cancellation. The
  42. 2:02 [pricing] overview talks about plan costs.
  43. 2:05 [The] FAQ covers payment methods. The blog
  44. 2:08 [post] is about new tiers. The grader
  45. 2:11 [marks] the answer as not grounded. Score
  46. 2:14 [...] out of one. The system does not give
  47. 2:18 [up.] It takes the original question and
  48. 2:20 [the] failed attempt and reformulates the
  49. 2:23 [query.] Instead of "cancel annual plan
  50. 2:26 [...]," it tries "cancellation policy
  51. 2:28 pro-rated annual": more specific
  52. 2:32 keywords. Attempt two. The
  53. 2:34 [retriever] finds the actual cancellation
  54. 2:37 [policy] document, section 4.2. It says
  55. 2:41 [annual] plans canceled mid-term receive a
  56. 2:43 [pro-rated] refund minus 2 months of
  57. 2:46 [service.] The LLM generates a new answer.
  58. 2:49 "[You] will receive a pro-rated refund
  59. 2:51 [minus] two months of service used." The
  60. 2:54 [grader] checks again. This time, the
  61. 2:56 [answer] directly quotes the source
  62. 2:58 [document.] Grounded. Score 0.95.
  63. 3:02 [Return] the answer. Two attempts. The
  64. 3:05 [first] one would have been confidently
  65. 3:06 [wrong.] The second one is correct. If you
  66. 3:10 [want] to learn how to do this yourself, I
  67. 3:13 [run] free live sessions every Friday at
  68. 3:16 [noon] Eastern. Scan the QR code on screen
  69. 3:20 [to] join. Would love to see you there.
  70. 3:24 [Self-RAG] grades after generation.
  71. 3:27 [Corrective] RAG, or CRAG, grades before
  72. 3:30 [generation.] Here is the difference.
  73. 3:33 [Documents] come back from retrieval.
  74. 3:35 [Before] they even reach the LLM, CRAG
  75. 3:38 [grades] each one. Is this document
  76. 3:41 relevant to the question? It
  77. 3:44 [scores] each document. High-confidence
  78. 3:46 [documents] go straight to the LLM. Low-
  79. 3:49 [confidence] documents get discarded. And
  80. 3:52 [if] all documents score low, CRAG does
  81. 3:55 [something] different. It falls back to
  82. 3:57 [web] search. It goes outside your
  83. 3:59 [knowledge] base entirely to find better
  84. 4:01 [...] Think of CRAG as a bouncer at
  85. 4:04 [the] door. Only relevant documents get
  86. 4:07 [in.] Self-RAG is a reviewer after the
  87. 4:10 [fact.] CRAG prevents bad input.
  88. 4:13 [Self-RAG] catches bad output. The best
  89. 4:16 systems use both. Self-RAG
  90. 4:20 [and] CRAG are specific patterns, but the
  91. 4:23 idea is retrieval loops. The
  92. 4:27 [system] can loop multiple times.
  93. 4:29 [Retrieve,] grade the documents. If
  94. 4:32 [quality] is low, refine the query.
  95. 4:34 [Retrieve] again. Grade the new documents.
  96. 4:37 [If] quality is still low, try a different
  97. 4:40 source. Retrieve again. Grade
  98. 4:43 [again.] Generate. Grade the answer. If
  99. 4:46 [the] answer is not grounded, loop back
  100. 4:48 [one] more time. Each loop narrows the
  101. 4:51 [...] Each retry improves precision.
  102. 4:54 [In] practice, most queries resolve in one
  103. 4:57 [or] two loops. The retry mechanism is
  104. 5:00 for the hard cases: ambiguous
  105. 5:02 [questions,] outdated documents, queries
  106. 5:05 [that] span multiple topics.
  107. 5:08 Agentic RAG is not free. Every retry
  108. 5:12 adds latency. A basic RAG call
  109. 5:14 [takes] 1 to 2 seconds. With
  110. 5:17 [...], that can become 3 to 5
  111. 5:19 [seconds] per retry. Two retries and you
  112. 5:22 [are] at 10 seconds. For a chatbot where
  113. 5:25 [users] expect instant replies, that is
  114. 5:28 [too] slow. But for a support system where
  115. 5:31 [accuracy] matters more than speed, it is
  116. 5:34 [worth] every millisecond. For a medical
  117. 5:36 [knowledge] base, you cannot afford a
  118. 5:39 wrong answer. For a legal
  119. 5:41 search, one bad retrieval could
  120. 5:44 [mean] the wrong clause cited in a
  121. 5:46 [...] The rule of thumb: if a wrong
  122. 5:49 [answer] costs more than a slow answer,
  123. 5:51 [use] agentic RAG. If speed matters more
  124. 5:54 [than] perfect accuracy, stick with basic
  125. 5:57 [RAG] and invest in better indexing.
  126. 6:01 [Here] is the full picture. A question
  127. 6:04 [comes] in. CRAG evaluates the retrieved
  128. 6:07 [documents] before generation. Bad
  129. 6:09 [documents] get discarded and web search
  130. 6:12 [fills] the gap. The LLM generates an
  131. 6:15 [answer.] Self-RAG grades it for
  132. 6:17 [grounding.] If it fails, the query
  133. 6:20 [is] refined and the loop starts over.
  134. 6:24 [Two] feedback loops working together, one
  135. 6:26 [filtering] input, one verifying output.
  136. 6:30 [Build] the basic RAG pipeline first. Add
  137. 6:33 [CRAG] when you notice retrieval quality
  138. 6:34 [...] Add Self-RAG when you need
  139. 6:37 verification. Each piece is
  140. 6:40 [independently] deployable. Start simple.
  141. 6:43 [Add] correction when you need it. That's
  142. 6:46 [the] full picture. If you want to go
  143. 6:48 [deeper,] join my free live session this
  144. 6:50 [Friday] at noon Eastern on Maven. I walk
  145. 6:54 [through] this hands-on, answer questions,
  146. 6:56 [and] show you how to build it yourself.
  147. 6:59 [Scan] the QR code to join.
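
The "full picture" described in the transcript (grade documents before generation, fall back to another source if none pass, grade the answer after generation, refine the query and retry up to a limit) can be sketched roughly as below. This is a minimal illustration, not the presenter's code. The function name agentic_rag, the Result type, the thresholds 0.5 and 0.8, the limit of 3 attempts, and every toy stub (keyword-based graders, a canned retriever, a fake web fallback) are assumptions made up for the example. In a real system the graders and the generator would most likely be LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Result:
    answer: Optional[str]
    grounded: bool
    attempts: int


def agentic_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    grade_document: Callable[[str, str], float],
    generate: Callable[[str, List[str]], str],
    grade_answer: Callable[[str, List[str]], float],
    refine_query: Callable[[str, str], str],
    fallback_search: Callable[[str], List[str]],
    doc_threshold: float = 0.5,      # assumed cutoff, not from the video
    answer_threshold: float = 0.8,   # assumed cutoff, not from the video
    max_attempts: int = 3,           # assumed retry limit
) -> Result:
    query, answer = question, None
    for attempt in range(1, max_attempts + 1):
        # CRAG-style step: grade each document BEFORE generation.
        docs = [d for d in retrieve(query)
                if grade_document(question, d) >= doc_threshold]
        if not docs:
            # Every document scored low: go outside the knowledge base.
            docs = fallback_search(query)
        answer = generate(question, docs)
        # Self-RAG-style step: grade the answer AFTER generation.
        if grade_answer(answer, docs) >= answer_threshold:
            return Result(answer, True, attempt)
        # Not grounded: reformulate the query and loop again.
        query = refine_query(question, answer)
    return Result(answer, False, max_attempts)


# --- Toy stand-ins (all hypothetical) so the sketch runs end to end ---
PRICING = "Plan costs for the monthly and annual tiers."
FAQ = "We accept cards and invoices as payment methods."
POLICY = "Annual plans cancelled mid-term receive a pro-rated refund."


def retrieve(query: str) -> List[str]:
    # Canned retriever: only a query mentioning "policy" finds the policy doc.
    return [POLICY] if "policy" in query else [PRICING, FAQ]


def grade_document(question: str, doc: str) -> float:
    # Keyword check standing in for an LLM relevance grader.
    return 1.0 if "cancel" in doc.lower() else 0.0


def fallback_search(query: str) -> List[str]:
    # Stand-in for a web search call.
    return ["A generic web page about subscription pricing."]


def generate(question: str, docs: List[str]) -> str:
    # Quote a document that mentions refunds, otherwise guess.
    for doc in docs:
        if "refund" in doc:
            return doc
    return "You will probably get a refund."


def grade_answer(answer: str, docs: List[str]) -> float:
    # Grounded only if the answer appears verbatim in a source document.
    return 1.0 if any(answer in doc for doc in docs) else 0.0


def refine_query(question: str, failed_answer: str) -> str:
    # Fixed rewrite that loosely echoes the refined query in the walkthrough.
    return "cancellation policy pro-rated annual"


result = agentic_rag(
    "What happens if I cancel my annual plan?",
    retrieve, grade_document, generate, grade_answer,
    refine_query, fallback_search,
)
print(result)
```

In this toy run the first attempt fails the grounding check and the second, using the refined query, passes it, which mirrors the two-attempt walkthrough. Passing the graders in as plain callables keeps the two checks independent, so either one can be added or removed on its own, in line with the advice to start with basic RAG and add correction only when needed.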
