Learn RAG from Scratch
From RAG to Agentic RAG (What Changed?)
Video 9 of 9 · 7:10
Chapters
- 0:00 The problem with one-shot RAG
- 0:45 Self-RAG: the core idea
- 1:30 Self-RAG walkthrough
- 3:00 CRAG and when to use it
Transcript
Auto generated by YouTube.
- 0:04 A user asks about your cancellation policy. The retriever pulls three documents. One is the pricing page. One is the onboarding guide. One is a 2-year-old FAQ that still mentions the old policy. The LLM reads all three and generates an answer based on the outdated FAQ. Confidently wrong. And here is the real problem: basic RAG has no way to know it failed. It retrieves once, generates once, and stops. If the retrieval was bad, the answer is bad. No recovery, no second chance.
- 0:41 What if the system could check its own work? Self-RAG adds a grading step after generation. The system generates an answer, then asks itself: is this answer actually supported by the retrieved documents? If yes, return it. If no, go back, refine the query, retrieve again, generate again, grade again. It is a feedback loop. The system keeps trying until the answer is grounded or until it hits a maximum retry count. Think of it like a code reviewer for your RAG pipeline, except the reviewer is the system itself.
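In code, that feedback loop is only a few lines. A minimal sketch, not the video's implementation: retrieve(), generate(), grade_groundedness(), and refine_query() are hypothetical stand-ins for your retriever, LLM call, grader, and query rewriter, and the retry cap and threshold are assumed values.

```python
# Minimal Self-RAG style loop (sketch). All helper functions are
# hypothetical stand-ins; wire in your own retriever, LLM, and grader.

MAX_RETRIES = 2           # assumed retry budget
GROUNDED_THRESHOLD = 0.8  # assumed cutoff for "grounded"

def self_rag(question: str) -> str:
    query = question
    answer = ""
    for _ in range(MAX_RETRIES + 1):
        docs = retrieve(query)                    # fetch candidate documents
        answer = generate(question, docs)         # draft an answer from them
        score = grade_groundedness(answer, docs)  # is the answer supported?
        if score >= GROUNDED_THRESHOLD:
            return answer                         # grounded: return it
        query = refine_query(question, answer)    # not grounded: rewrite query
    return answer  # retry budget exhausted; flag as low confidence in production
```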
- 1:22 Let's walk through a concrete example. You are building a product docs assistant for your SaaS app. A user asks, "What happens if I cancel my annual plan mid-year?" Attempt one: the retriever searches your docs and pulls back three chunks: the pricing overview, a general FAQ about billing, and a blog post about the new pricing tiers. The LLM reads all three and generates, "You will receive a full refund for the remaining months." Sounds reasonable, but the grader checks this answer against the source documents. None of them actually mention refunds for mid-year cancellation. The pricing overview talks about plan costs. The FAQ covers payment methods. The blog post is about new tiers. The grader marks the answer as not grounded: a low score out of one. The system does not give up. It takes the original question and the failed attempt and reformulates the query. Instead of "cancel annual plan," it tries "cancellation policy pro-rated annual." More specific, better keywords.
- 2:32 Attempt two: the retriever finds the actual cancellation policy document, section 4.2. It says annual plans canceled midterm receive a pro-rated refund minus two months of service. The LLM generates a new answer: "You will receive a pro-rated refund minus two months of service used." The grader checks again. This time, the answer directly quotes the source document. Grounded. Score 0.95.
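The grading step itself is typically just another LLM call. Here is one way it might look, sketched with the OpenAI Python client; the model name, prompt wording, and number parsing are illustrative choices, not what the video prescribes.

```python
# LLM-as-grader for groundedness (sketch). Assumes the openai package;
# prompt and parsing are illustrative, not a fixed recipe.
from openai import OpenAI

client = OpenAI()

def grade_groundedness(answer: str, docs: list[str]) -> float:
    joined = "\n---\n".join(docs)
    prompt = (
        "You are a strict grader. On a scale from 0 to 1, score how well "
        "the ANSWER is supported by the DOCUMENTS. Reply with only the number.\n\n"
        f"DOCUMENTS:\n{joined}\n\nANSWER:\n{answer}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any chat model works
        messages=[{"role": "user", "content": prompt}],
        temperature=0,        # grading should be deterministic
    )
    try:
        return float(resp.choices[0].message.content.strip())
    except (TypeError, ValueError):
        return 0.0  # unparseable grade: treat as not grounded
```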
- 3:02 Return the answer. Two attempts. The first one would have been confidently wrong. The second one is correct. If you want to learn how to do this yourself, I run free live sessions every Friday at noon Eastern. Scan the QR code on screen to join. Would love to see you there.
- 3:24 Self-RAG grades after generation. Corrective RAG, or CRAG, grades before generation. Here is the difference. Documents come back from retrieval. Before they even reach the LLM, CRAG grades each one: is this document actually relevant to the question? It scores each document. High-confidence documents go straight to the LLM. Low-confidence documents get discarded. And if all documents score low, CRAG does something different. It falls back to web search. It goes outside your knowledge base entirely to find better sources. Think of CRAG as a bouncer at the door. Only relevant documents get in. Self-RAG is a reviewer after the fact.
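The bouncer is straightforward to sketch. grade_relevance() and web_search() are hypothetical helpers (the relevance grader can be the same LLM-as-grader pattern as above), and the threshold is an assumption.

```python
# CRAG-style document filter (sketch): grade every retrieved document
# before generation; fall back to web search if nothing passes.

RELEVANT_THRESHOLD = 0.7  # assumed cutoff for "high confidence"

def crag_filter(question: str, docs: list[str]) -> list[str]:
    scored = [(doc, grade_relevance(question, doc)) for doc in docs]
    kept = [doc for doc, score in scored if score >= RELEVANT_THRESHOLD]
    if kept:
        return kept  # high-confidence documents go straight to the LLM
    # Every document scored low: leave the knowledge base entirely
    # and look for better sources on the web.
    return web_search(question)
```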
- 4:10 CRAG prevents bad input. Self-RAG catches bad output. The best production systems use both. Self-RAG and CRAG are specific patterns, but the bigger idea is retrieval loops. The system can loop multiple times. Retrieve, grade the documents. If quality is low, refine the query. Retrieve again. Grade the new documents. If quality is still low, try a different source. Retrieve again. Grade again. Generate. Grade the answer. If the answer is not grounded, loop back one more time. Each loop narrows the search. Each retry improves precision. In practice, most queries resolve in one or two loops. The retry mechanism is there for the hard cases: ambiguous questions, outdated documents, queries that span multiple topics.
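That escalation ladder (refine the query first, then try a different source) is just nested control flow. A sketch with the same hypothetical helpers, except retrieve() here takes an extra source argument; the source list, refinement count, and quality threshold are all assumptions.

```python
# Generic retrieval loop (sketch): grade documents, refine the query,
# then escalate to the next source. Helpers are hypothetical.

SOURCES = ["vector_store", "keyword_index", "web"]  # assumed source order

def retrieval_loop(question: str) -> list[str]:
    query = question
    docs: list[str] = []
    for source in SOURCES:
        for _ in range(2):  # up to two query refinements per source
            docs = retrieve(query, source=source)
            if doc_quality(question, docs) >= 0.7:  # assumed threshold
                return docs                          # good enough: stop looping
            query = refine_query(question, query)    # low quality: rewrite
        # still low after refining: fall through to the next source
    return docs  # best effort after exhausting every source
```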
- 5:08 Agentic RAG is not free. Every retry loop adds latency. A basic RAG call takes 1 to 2 seconds. With grading, that can become 3 to 5 seconds per retry. Two retries and you are at 10 seconds. For a chatbot where users expect instant replies, that is too slow. But for a support system where accuracy matters more than speed, it is worth every millisecond. For a medical knowledge base, you cannot afford a confidently wrong answer. For a legal document search, one bad retrieval could mean the wrong clause cited in a contract. The rule of thumb: if a wrong answer costs more than a slow answer, use agentic RAG. If speed matters more than perfect accuracy, stick with basic RAG and invest in better indexing.
- 6:01 Here is the full picture. A question comes in. CRAG evaluates the retrieved documents before generation. Bad documents get discarded, and web search fills the gap. The LLM generates an answer. Self-RAG grades it for groundedness. If it fails, the query gets refined and the loop starts over. Two feedback loops working together: one filtering input, one verifying output.
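Composed, the two loops are a short function. This reuses the hypothetical helpers sketched above, with the same assumed thresholds.

```python
# Combined pipeline (sketch): CRAG filters input, Self-RAG verifies output.

def agentic_rag(question: str, max_retries: int = 2) -> str:
    query = question
    answer = ""
    for _ in range(max_retries + 1):
        docs = crag_filter(query, retrieve(query))   # loop 1: filter bad input
        answer = generate(question, docs)
        if grade_groundedness(answer, docs) >= 0.8:  # loop 2: verify output
            return answer
        query = refine_query(question, answer)       # refine and start over
    return answer  # out of retries; surface with a low-confidence flag
```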
- 6:30 Build the basic RAG pipeline first. Add CRAG when you notice retrieval quality issues. Add Self-RAG when you need answer verification. Each piece is independently deployable. Start simple. Add correction when you need it. That's the full picture. If you want to go deeper, join my free live session this Friday at noon Eastern on Maven. I walk through this hands-on, answer questions, and show you how to build it yourself. Scan the QR code to join.