← Learn RAG from Scratch
RAG Multi-Query, HyDE & Fusion (Complete Guide)
Video 8 of 9 · 6:31
Chapters
- 0:00 The problem with vague questions
- 0:50 Multi-query: the hero technique
- 2:00 Multi-query before and after
- 2:50 RAG-Fusion, decomposition, step-back, HyDE
Transcript
Auto generated by YouTube. Click any timestamp to jump to that moment.
- 0:03 You wire up the search. Then a user types, "How do I set up auth?" and the system returns the billing FAQ. Not because your embeddings are bad, but because the question is too vague. "Auth" could mean OAuth tokens, session tokens, API keys, or SSO. Your vector database does not know which one the user means, so it guesses, and it guesses wrong. This is the most common failure in RAG: the user's question does not match how your documents phrase the same concept. Query translation fixes this. Instead of searching with the vague question directly, you transform it first into queries that actually match your stored documents. Multi-query is the single most important query translation technique.
- 0:54 Here is how it works. A user asks, "How do I set up auth?" The LLM takes that vague question and rewrites it into three specific versions. Version one: what authentication protocols does the API support, such as OAuth 2 or API keys? Version two: how do users log in and receive session tokens? Version three: what is the step-by-step process for configuring SSO with a third-party provider? Each version targets a different aspect of authentication, and each one retrieves different documents. Version one pulls the API reference. Version two pulls the login guide. Version three pulls the SSO walkthrough. You merge all three result sets and deduplicate any overlapping documents. Now you have comprehensive coverage that no single query could have achieved. The vague question retrieved the wrong document; the three specific queries retrieved exactly the right ones.
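The flow described above can be sketched in a few lines. This is a minimal illustration, not a production implementation: `generate_variants` and `search` are stubs standing in for a real LLM call and a real vector-store query, and the canned rewrites and document names are taken from the example in the transcript.

```python
# Multi-query sketch: rewrite the vague question into specific variants,
# search with each, merge the hits, and deduplicate overlapping documents.

def generate_variants(question: str) -> list[str]:
    # Stub: a real implementation prompts an LLM to rewrite the question
    # into several specific versions.
    return [
        "What authentication protocols does the API support, such as OAuth 2 or API keys?",
        "How do users log in and receive session tokens?",
        "What is the step-by-step process for configuring SSO with a third-party provider?",
    ]

def search(query: str) -> list[str]:
    # Stub vector search: a real implementation embeds the query and runs
    # a similarity search. Here a keyword lookup fakes the behavior.
    fake_index = {
        "OAuth": ["oauth-setup-guide"],
        "session": ["session-management-docs"],
        "SSO": ["sso-configuration-walkthrough"],
    }
    return [doc for key, docs in fake_index.items()
            if key.lower() in query.lower() for doc in docs]

def multi_query_retrieve(question: str) -> list[str]:
    results: list[str] = []
    for variant in generate_variants(question):
        for doc in search(variant):
            if doc not in results:  # deduplicate overlapping hits
                results.append(doc)
    return results
```

With the stubs above, `multi_query_retrieve("How do I set up auth?")` returns all three targeted documents, where searching the raw question alone would match none of the index keys.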
- 1:55 Let's see the actual difference. Without query translation, you search "how do I set up auth" and the vector database returns your billing FAQ at the top, the pricing page at 0.68 similarity, and the general overview below that. None of these are what the user needs. But with multi-query, the same question becomes three targeted searches. The top results now include the OAuth setup guide at 0.94, the session management docs at 0.91, and the SSO configuration walkthrough. Relevant documents instead of noise. The retrieval quality jumps from unusable to production ready.
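The chapter list also names RAG-Fusion, which upgrades the plain merge-and-deduplicate step with reciprocal rank fusion: each document earns a score from every ranked list it appears in, so documents that several query variants rank highly float to the top. A minimal sketch follows; the function name is mine, and `k = 60` is the conventional smoothing constant.

```python
# Reciprocal rank fusion: score(doc) = sum over lists of 1 / (k + rank).
# Documents ranked well by multiple query variants accumulate the most score.

def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest combined score first.
    return sorted(scores, key=scores.get, reverse=True)
```

For example, a document ranked first in two lists beats a document ranked first in only one, even before any similarity scores are consulted.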
- 2:46 That is why multi-query should be your default query translation strategy. It is simple to implement, works with any LLM, and dramatically improves retrieval for vague questions. If you want to learn how to do this yourself, I run free live sessions every Friday at noon Eastern. Scan the QR code on screen to register. Would love to see you there.
- 3:13 HyDE works differently. Instead of rewriting the question, you generate a fake answer. The user asks, "How do I set up auth?" The LLM writes a hypothetical answer, something like: "To configure authentication, first register your application in the OAuth dashboard, then generate client credentials and implement the token exchange flow." That answer is not real, but it reads like your actual documentation. You embed that fake answer and search for real documents with similar embeddings. The intuition is simple: a hypothetical answer is closer in embedding space to the real answer than the original vague question was. HyDE works best when your documents follow consistent formatting. If your docs are inconsistent, the hypothetical answer might not match any real document style.
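A sketch of that flow, under loud assumptions: `write_hypothetical_answer` stands in for the LLM call, and `embed` is a crude bag-of-words vector rather than a real embedding model, just so the example runs without any external service. The document names and text are illustrative.

```python
# HyDE sketch: embed a hypothetical ANSWER instead of the question,
# then rank real documents by embedding similarity to it.
from collections import Counter
import math

def write_hypothetical_answer(question: str) -> str:
    # Stub for the LLM call that drafts a fake, documentation-styled answer.
    return ("To configure authentication, first register your application "
            "in the OAuth dashboard, then generate client credentials and "
            "implement the token exchange flow.")

def embed(text: str) -> Counter:
    # Toy embedding: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

docs = {
    "oauth-setup-guide": ("Register your application in the OAuth dashboard "
                          "and generate client credentials for the token exchange flow."),
    "billing-faq": "Billing questions about invoices, pricing tiers and refunds.",
}

def hyde_search(question: str) -> str:
    query_vec = embed(write_hypothetical_answer(question))
    return max(docs, key=lambda name: cosine(query_vec, embed(docs[name])))
```

The fake answer shares almost no vocabulary with the question itself, but it overlaps heavily with the setup guide, which is exactly why searching with it works.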
- 4:07 Some questions are not vague; they are complex. "Compare the authentication methods and recommend the best one for a multi-tenant SaaS app" is actually three questions: What authentication methods exist? What are the trade-offs of each? Which one fits a multi-tenant SaaS app? Decomposition breaks the complex question into sub-questions. Each sub-question gets answered independently. Then the results get combined into a final answer.
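The decomposition loop can be sketched as below. Both helpers are stubs for LLM calls, and the canned sub-questions are the three from the example above; a real system would also retrieve context for each sub-question before answering it.

```python
# Decomposition sketch: split a complex question into sub-questions,
# answer each independently, then combine the partial answers.

def decompose(question: str) -> list[str]:
    # Stub: a real implementation prompts an LLM to list sub-questions.
    return [
        "What authentication methods exist?",
        "What are the trade-offs of each method?",
        "Which method fits a multi-tenant SaaS app?",
    ]

def answer(sub_question: str) -> str:
    # Stub: a real implementation retrieves context and asks the LLM.
    return f"[answer to: {sub_question}]"

def answer_by_decomposition(question: str) -> str:
    partial_answers = [answer(q) for q in decompose(question)]
    # Combine step: a real system would ask the LLM to synthesize these
    # partial answers into one final response.
    return "\n".join(partial_answers)
```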
- 4:37 Step-back prompting takes the opposite approach. Instead of breaking the question down, it zooms out. "How do I set up auth?" becomes "What are the general principles of web application authentication?" The broader question retrieves background context that helps the LLM reason about the specific question. Both techniques handle edge cases that multi-query misses. Use decomposition for complex multi-part questions. Use step-back when the user needs background context to get a good answer.
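Step-back can be sketched the same way: abstract the question, retrieve on the broader phrasing, then hand both the background and the original question to the LLM. Here `step_back` and `retrieve` are stubs, and the document name is made up for the example.

```python
# Step-back sketch: zoom out to general principles, retrieve background
# context on the broader query, then prompt with background + original question.

def step_back(question: str) -> str:
    # Stub: a real implementation prompts the LLM to abstract the question.
    return "What are the general principles of web application authentication?"

def retrieve(query: str) -> list[str]:
    # Stub retriever keyed on the broader phrasing.
    if "general principles" in query:
        return ["auth-concepts-overview"]
    return []

def step_back_prompt(question: str) -> str:
    background = retrieve(step_back(question))
    # A real system would send this combined prompt to the LLM.
    return f"Background: {background}\nQuestion: {question}"
```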
- 5:13 Here is how to decide which technique to use. Start with multi-query as your default. It handles the most common failure, vague questions, and it is the easiest to implement. If your documents have consistent formatting and your users ask short questions, add HyDE; the hypothetical answer approach works well with clean, structured docs. For domains where users ask complex multi-part questions, add decomposition: break the question down, answer each part, combine the results. And when users need deep background context to get a useful answer, use step-back prompting. In practice, most production systems start with multi-query and only add the others when they see specific failure patterns. Start simple. Measure what fails. Add complexity only where it helps.
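The decision guide above amounts to a small routing function. The boolean signals are hypothetical inputs you would derive from the question and from knowledge of your corpus; this is a paraphrase of the advice, not a prescribed implementation.

```python
# Routing sketch for the decision guide: decomposition for multi-part
# questions, step-back when background is needed, HyDE on top of
# multi-query for uniform docs, and multi-query as the default.

def choose_technique(is_multi_part: bool,
                     needs_background: bool,
                     docs_are_uniform: bool) -> str:
    if is_multi_part:
        return "decomposition"
    if needs_background:
        return "step-back"
    if docs_are_uniform:
        return "multi-query + HyDE"
    return "multi-query"  # the default for vague questions
```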
- 6:05 That's the full picture. If you want to go deeper, join my free live session this Friday at noon Eastern. I walk through this hands-on, answer questions, and show you how to build it yourself. Scan the QR code to register.
Want the next one in your inbox?
Join 1,000+ Product Managers getting one deep dive every Friday.