Learn RAG from Scratch
From RAG to Agentic RAG (What Changed?)
Video 9 of 9 · 7:10
Chapters
- 0:00 The problem with one-shot RAG
- 0:45 Self-RAG: the core idea
- 1:30 Self-RAG walkthrough
- 3:00 CRAG and when to use it
Transcript
Auto generated by YouTube.
- 0:04 A user asks about your cancellation policy. The retriever pulls three documents. One is the pricing page. One is the onboarding guide. One is a 2-year-old FAQ that still mentions the old policy. The LLM reads all three and generates an answer based on the outdated FAQ. Confidently wrong. And here is the real problem: basic RAG has no way to know it failed. It retrieves once, generates once, and stops. If the retrieval was bad, the answer is bad. No recovery, no second chance.
- 0:41 What if the system could check its own work? Self-RAG adds a grading step after generation. The system generates an answer, then asks itself: is this answer actually supported by the retrieved documents? If yes, return it. If no, go back, refine the query, retrieve again, generate again, grade again. It is a feedback loop. The system keeps trying until the answer is grounded or until it hits a maximum retry count. Think of it like a code reviewer for your RAG pipeline, except the reviewer is the system itself.
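In code, that feedback loop is only a few lines. A minimal sketch, not the video's implementation: retrieve(), generate(), grade_groundedness(), and refine_query() are hypothetical stand-ins for your retriever, LLM call, grader, and query rewriter, and the retry cap and threshold are assumed values.

```python
# Minimal Self-RAG style loop (sketch). All helper functions are
# hypothetical stand-ins; wire in your own retriever, LLM, and grader.

MAX_RETRIES = 2           # assumed retry budget
GROUNDED_THRESHOLD = 0.8  # assumed cutoff for "grounded"

def self_rag(question: str) -> str:
    query = question
    answer = ""
    for _ in range(MAX_RETRIES + 1):
        docs = retrieve(query)                    # fetch candidate documents
        answer = generate(question, docs)         # draft an answer from them
        score = grade_groundedness(answer, docs)  # is the answer supported?
        if score >= GROUNDED_THRESHOLD:
            return answer                         # grounded: return it
        query = refine_query(question, answer)    # not grounded: rewrite query
    return answer  # retry budget exhausted; flag as low confidence in production
```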
- 1:22 Let's walk through a concrete example. You are building a product docs assistant for your SaaS app. A user asks, "What happens if I cancel my annual plan mid-year?" Attempt one: the retriever searches your docs and pulls back three chunks: the pricing overview, a general FAQ about billing, and a blog post about the new pricing tiers. The LLM reads all three and generates, "You will receive a full refund for the remaining months." Sounds reasonable, but the grader checks this answer against the source documents. None of them actually mention refunds for mid-year cancellation. The pricing overview talks about plan costs. The FAQ covers payment methods. The blog post is about new tiers. The grader marks the answer as not grounded: a low score out of one. The system does not give up. It takes the original question and the failed attempt and reformulates the query. Instead of "cancel annual plan," it tries "cancellation policy pro-rated annual." More specific, better keywords.
- 2:32 Attempt two: the retriever finds the actual cancellation policy document, section 4.2. It says annual plans canceled midterm receive a pro-rated refund minus two months of service. The LLM generates a new answer: "You will receive a pro-rated refund minus two months of service used." The grader checks again. This time, the answer directly quotes the source document. Grounded. Score 0.95.
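The grading step itself is typically just another LLM call. Here is one way it might look, sketched with the OpenAI Python client; the model name, prompt wording, and number parsing are illustrative choices, not what the video prescribes.

```python
# LLM-as-grader for groundedness (sketch). Assumes the openai package;
# prompt and parsing are illustrative, not a fixed recipe.
from openai import OpenAI

client = OpenAI()

def grade_groundedness(answer: str, docs: list[str]) -> float:
    joined = "\n---\n".join(docs)
    prompt = (
        "You are a strict grader. On a scale from 0 to 1, score how well "
        "the ANSWER is supported by the DOCUMENTS. Reply with only the number.\n\n"
        f"DOCUMENTS:\n{joined}\n\nANSWER:\n{answer}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any chat model works
        messages=[{"role": "user", "content": prompt}],
        temperature=0,        # grading should be deterministic
    )
    try:
        return float(resp.choices[0].message.content.strip())
    except (TypeError, ValueError):
        return 0.0  # unparseable grade: treat as not grounded
```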
- 3:02 Return the answer. Two attempts. The first one would have been confidently wrong. The second one is correct. If you want to learn how to do this yourself, I run free live sessions every Friday at noon Eastern. Scan the QR code on screen to join. Would love to see you there.
- 3:24 Self-RAG grades after generation. Corrective RAG, or CRAG, grades before generation. Here is the difference. Documents come back from retrieval. Before they even reach the LLM, CRAG grades each one: is this document actually relevant to the question? It scores each document. High-confidence documents go straight to the LLM. Low-confidence documents get discarded. And if all documents score low, CRAG does something different. It falls back to web search. It goes outside your knowledge base entirely to find better sources. Think of CRAG as a bouncer at the door. Only relevant documents get in. Self-RAG is a reviewer after the fact.
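The bouncer is straightforward to sketch. grade_relevance() and web_search() are hypothetical helpers (the relevance grader can be the same LLM-as-grader pattern as above), and the threshold is an assumption.

```python
# CRAG-style document filter (sketch): grade every retrieved document
# before generation; fall back to web search if nothing passes.

RELEVANT_THRESHOLD = 0.7  # assumed cutoff for "high confidence"

def crag_filter(question: str, docs: list[str]) -> list[str]:
    scored = [(doc, grade_relevance(question, doc)) for doc in docs]
    kept = [doc for doc, score in scored if score >= RELEVANT_THRESHOLD]
    if kept:
        return kept  # high-confidence documents go straight to the LLM
    # Every document scored low: leave the knowledge base entirely
    # and look for better sources on the web.
    return web_search(question)
```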
- 4:10 CRAG prevents bad input. Self-RAG catches bad output. The best production systems use both. Self-RAG and CRAG are specific patterns, but the bigger idea is retrieval loops. The system can loop multiple times. Retrieve, grade the documents. If quality is low, refine the query. Retrieve again. Grade the new documents. If quality is still low, try a different source. Retrieve again. Grade again. Generate. Grade the answer. If the answer is not grounded, loop back one more time. Each loop narrows the search. Each retry improves precision. In practice, most queries resolve in one or two loops. The retry mechanism is there for the hard cases: ambiguous questions, outdated documents, queries that span multiple topics.
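That escalation ladder (refine the query first, then try a different source) is just nested control flow. A sketch with the same hypothetical helpers, except retrieve() here takes an extra source argument; the source list, refinement count, and quality threshold are all assumptions.

```python
# Generic retrieval loop (sketch): grade documents, refine the query,
# then escalate to the next source. Helpers are hypothetical.

SOURCES = ["vector_store", "keyword_index", "web"]  # assumed source order

def retrieval_loop(question: str) -> list[str]:
    query = question
    docs: list[str] = []
    for source in SOURCES:
        for _ in range(2):  # up to two query refinements per source
            docs = retrieve(query, source=source)
            if doc_quality(question, docs) >= 0.7:  # assumed threshold
                return docs                          # good enough: stop looping
            query = refine_query(question, query)    # low quality: rewrite
        # still low after refining: fall through to the next source
    return docs  # best effort after exhausting every source
```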
- 5:08 Agentic RAG is not free. Every retry loop adds latency. A basic RAG call takes 1 to 2 seconds. With grading, that can become 3 to 5 seconds per retry. Two retries and you are at 10 seconds. For a chatbot where users expect instant replies, that is too slow. But for a support system where accuracy matters more than speed, it is worth every millisecond. For a medical knowledge base, you cannot afford a confidently wrong answer. For a legal document search, one bad retrieval could mean the wrong clause cited in a contract. The rule of thumb: if a wrong answer costs more than a slow answer, use agentic RAG. If speed matters more than perfect accuracy, stick with basic RAG and invest in better indexing.
- 6:01 Here is the full picture. A question comes in. CRAG evaluates the retrieved documents before generation. Bad documents get discarded, and web search fills the gap. The LLM generates an answer. Self-RAG grades it for groundedness. If it fails, the query gets refined and the loop starts over. Two feedback loops working together: one filtering input, one verifying output.
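Composed, the two loops are a short function. This reuses the hypothetical helpers sketched above, with the same assumed thresholds.

```python
# Combined pipeline (sketch): CRAG filters input, Self-RAG verifies output.

def agentic_rag(question: str, max_retries: int = 2) -> str:
    query = question
    answer = ""
    for _ in range(max_retries + 1):
        docs = crag_filter(query, retrieve(query))   # loop 1: filter bad input
        answer = generate(question, docs)
        if grade_groundedness(answer, docs) >= 0.8:  # loop 2: verify output
            return answer
        query = refine_query(question, answer)       # refine and start over
    return answer  # out of retries; surface with a low-confidence flag
```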
- 6:30 Build the basic RAG pipeline first. Add CRAG when you notice retrieval quality issues. Add Self-RAG when you need answer verification. Each piece is independently deployable. Start simple. Add correction when you need it. That's the full picture. If you want to go deeper, join my free live session this Friday at noon Eastern on Maven. I walk through this hands-on, answer questions, and show you how to build it yourself. Scan the QR code to join.