← Learn RAG from Scratch
RAG Multi-Query, HyDE & Fusion (Complete Guide)
Video 8 of 9 · 6:31
Chapters
- 0:00 The problem with vague questions
- 0:50 Multi-query: the hero technique
- 2:00 Multi-query before and after
- 2:50 RAG-Fusion, decomposition, step-back, HyDE
Transcript
Auto generated by YouTube. Click any timestamp to jump to that moment.
- 0:03 You wire up the search. Then a user types, "How do I set up auth?" and the system returns the billing FAQ. Not because your embeddings are bad, but because the question is too vague. "Auth" could mean OAuth tokens, session tokens, API keys, or SSO. Your vector database does not know which one the user means, so it guesses, and it guesses wrong. This is the most common failure in RAG: the user's question does not match how your documents phrase the same concept. Query translation fixes this. Instead of searching with the vague question directly, you transform it first into queries that actually match your stored documents. Multi-query is the single most important query translation technique.
- 0:54 Here is how it works. A user asks, "How do I set up auth?" The LLM takes that vague question and rewrites it into three specific versions. Version one: what authentication protocols does the API support, such as OAuth 2 or API keys? Version two: how do users log in and receive session tokens? Version three: what is the step-by-step process for configuring SSO with a third-party provider? Each version targets a different aspect of authentication, and each one retrieves different documents. Version one pulls the API reference. Version two pulls the login guide. Version three pulls the SSO walkthrough. You merge all three result sets and deduplicate any overlapping documents. Now you have comprehensive coverage that no single query could have achieved. The vague question retrieved the wrong document; the three specific queries retrieved exactly the right ones.
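The flow described above can be sketched in a few lines. This is a minimal illustration, not a production implementation: `generate_variants` and `search` are stubs standing in for a real LLM call and a real vector-store query, and the canned rewrites and document names are taken from the example in the transcript.

```python
# Multi-query sketch: rewrite the vague question into specific variants,
# search with each, merge the hits, and deduplicate overlapping documents.

def generate_variants(question: str) -> list[str]:
    # Stub: a real implementation prompts an LLM to rewrite the question
    # into several specific versions.
    return [
        "What authentication protocols does the API support, such as OAuth 2 or API keys?",
        "How do users log in and receive session tokens?",
        "What is the step-by-step process for configuring SSO with a third-party provider?",
    ]

def search(query: str) -> list[str]:
    # Stub vector search: a real implementation embeds the query and runs
    # a similarity search. Here a keyword lookup fakes the behavior.
    fake_index = {
        "OAuth": ["oauth-setup-guide"],
        "session": ["session-management-docs"],
        "SSO": ["sso-configuration-walkthrough"],
    }
    return [doc for key, docs in fake_index.items()
            if key.lower() in query.lower() for doc in docs]

def multi_query_retrieve(question: str) -> list[str]:
    results: list[str] = []
    for variant in generate_variants(question):
        for doc in search(variant):
            if doc not in results:  # deduplicate overlapping hits
                results.append(doc)
    return results
```

With the stubs above, `multi_query_retrieve("How do I set up auth?")` returns all three targeted documents, where searching the raw question alone would match none of the index keys.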
- 1:55 Let's see the actual difference. Without query translation, you search "how do I set up auth" and the vector database returns your billing FAQ at the top, the pricing page at 0.68 similarity, and the general overview below that. None of these are what the user needs. But with multi-query, the same question becomes three targeted searches. The top results now include the OAuth setup guide at 0.94, the session management docs at 0.91, and the SSO configuration walkthrough. Relevant documents instead of noise. The retrieval quality jumps from unusable to production ready.
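The chapter list also names RAG-Fusion, which upgrades the plain merge-and-deduplicate step with reciprocal rank fusion: each document earns a score from every ranked list it appears in, so documents that several query variants rank highly float to the top. A minimal sketch follows; the function name is mine, and `k = 60` is the conventional smoothing constant.

```python
# Reciprocal rank fusion: score(doc) = sum over lists of 1 / (k + rank).
# Documents ranked well by multiple query variants accumulate the most score.

def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest combined score first.
    return sorted(scores, key=scores.get, reverse=True)
```

For example, a document ranked first in two lists beats a document ranked first in only one, even before any similarity scores are consulted.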
- 2:46 That is why multi-query should be your default query translation strategy. It is simple to implement, works with any LLM, and dramatically improves retrieval for vague questions. If you want to learn how to do this yourself, I run free live sessions every Friday at noon Eastern. Scan the QR code on screen to register. Would love to see you there.
- 3:13 HyDE works differently. Instead of rewriting the question, you generate a fake answer. The user asks, "How do I set up auth?" The LLM writes a hypothetical answer, something like: "To configure authentication, first register your application in the OAuth dashboard, then generate client credentials and implement the token exchange flow." That answer is not real, but it reads like your actual documentation. You embed that fake answer and search for real documents with similar embeddings. The intuition is simple: a hypothetical answer is closer in embedding space to the real answer than the original vague question was. HyDE works best when your documents follow consistent formatting. If your docs are inconsistent, the hypothetical answer might not match any real document style.
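A sketch of that flow, under loud assumptions: `write_hypothetical_answer` stands in for the LLM call, and `embed` is a crude bag-of-words vector rather than a real embedding model, just so the example runs without any external service. The document names and text are illustrative.

```python
# HyDE sketch: embed a hypothetical ANSWER instead of the question,
# then rank real documents by embedding similarity to it.
from collections import Counter
import math

def write_hypothetical_answer(question: str) -> str:
    # Stub for the LLM call that drafts a fake, documentation-styled answer.
    return ("To configure authentication, first register your application "
            "in the OAuth dashboard, then generate client credentials and "
            "implement the token exchange flow.")

def embed(text: str) -> Counter:
    # Toy embedding: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

docs = {
    "oauth-setup-guide": ("Register your application in the OAuth dashboard "
                          "and generate client credentials for the token exchange flow."),
    "billing-faq": "Billing questions about invoices, pricing tiers and refunds.",
}

def hyde_search(question: str) -> str:
    query_vec = embed(write_hypothetical_answer(question))
    return max(docs, key=lambda name: cosine(query_vec, embed(docs[name])))
```

The fake answer shares almost no vocabulary with the question itself, but it overlaps heavily with the setup guide, which is exactly why searching with it works.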
- 4:07 Some questions are not vague; they are complex. "Compare the authentication methods and recommend the best one for a multi-tenant SaaS app" is actually three questions: What authentication methods exist? What are the trade-offs of each? Which one fits a multi-tenant SaaS app? Decomposition breaks the complex question into sub-questions. Each sub-question gets answered independently. Then the results get combined into a final answer.
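The decomposition loop can be sketched as below. Both helpers are stubs for LLM calls, and the canned sub-questions are the three from the example above; a real system would also retrieve context for each sub-question before answering it.

```python
# Decomposition sketch: split a complex question into sub-questions,
# answer each independently, then combine the partial answers.

def decompose(question: str) -> list[str]:
    # Stub: a real implementation prompts an LLM to list sub-questions.
    return [
        "What authentication methods exist?",
        "What are the trade-offs of each method?",
        "Which method fits a multi-tenant SaaS app?",
    ]

def answer(sub_question: str) -> str:
    # Stub: a real implementation retrieves context and asks the LLM.
    return f"[answer to: {sub_question}]"

def answer_by_decomposition(question: str) -> str:
    partial_answers = [answer(q) for q in decompose(question)]
    # Combine step: a real system would ask the LLM to synthesize these
    # partial answers into one final response.
    return "\n".join(partial_answers)
```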
- 4:37 Step-back prompting takes the opposite approach. Instead of breaking the question down, it zooms out. "How do I set up auth?" becomes "What are the general principles of web application authentication?" The broader question retrieves background context that helps the LLM reason about the specific question. Both techniques handle edge cases that multi-query misses. Use decomposition for complex multi-part questions. Use step-back when the user needs background context to get a good answer.
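Step-back can be sketched the same way: abstract the question, retrieve on the broader phrasing, then hand both the background and the original question to the LLM. Here `step_back` and `retrieve` are stubs, and the document name is made up for the example.

```python
# Step-back sketch: zoom out to general principles, retrieve background
# context on the broader query, then prompt with background + original question.

def step_back(question: str) -> str:
    # Stub: a real implementation prompts the LLM to abstract the question.
    return "What are the general principles of web application authentication?"

def retrieve(query: str) -> list[str]:
    # Stub retriever keyed on the broader phrasing.
    if "general principles" in query:
        return ["auth-concepts-overview"]
    return []

def step_back_prompt(question: str) -> str:
    background = retrieve(step_back(question))
    # A real system would send this combined prompt to the LLM.
    return f"Background: {background}\nQuestion: {question}"
```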
- 5:13 Here is how to decide which technique to use. Start with multi-query as your default. It handles the most common failure, vague questions, and it is the easiest to implement. If your documents have consistent formatting and your users ask short questions, add HyDE; the hypothetical answer approach works well with clean, structured docs. For domains where users ask complex multi-part questions, add decomposition: break the question down, answer each part, combine the results. And when users need deep background context to get a useful answer, use step-back prompting. In practice, most production systems start with multi-query and only add the others when they see specific failure patterns. Start simple. Measure what fails. Add complexity only where it helps.
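The decision guide above amounts to a small routing function. The boolean signals are hypothetical inputs you would derive from the question and from knowledge of your corpus; this is a paraphrase of the advice, not a prescribed implementation.

```python
# Routing sketch for the decision guide: decomposition for multi-part
# questions, step-back when background is needed, HyDE on top of
# multi-query for uniform docs, and multi-query as the default.

def choose_technique(is_multi_part: bool,
                     needs_background: bool,
                     docs_are_uniform: bool) -> str:
    if is_multi_part:
        return "decomposition"
    if needs_background:
        return "step-back"
    if docs_are_uniform:
        return "multi-query + HyDE"
    return "multi-query"  # the default for vague questions
```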
- 6:05 That's the full picture. If you want to go deeper, join my free live session this Friday at noon Eastern. I walk through this hands-on, answer questions, and show you how to build it yourself. Scan the QR code to register.
Want the next one in your inbox?
Join 1,000+ Product Managers getting one deep dive every Friday.