RAG Routing Explained: LLM vs Semantic Router (When to Use What)
Video 5 of 9 · 6:00
Chapters
- 0:00 The multi-source data problem
- 0:45 LLM router walkthrough
- 2:10 Semantic router walkthrough
- 3:00 When to use which
Transcript
Auto generated by YouTube. Click any timestamp to jump to that moment.
- 0:03 Your docs live in a vector store. Your numbers sit in a SQL database. Your org chart is in a graph database. Your records come from an API.
- 0:14 A user asks, "What's our refund policy?" Basic RAG searches the vector store. That works. But now someone asks, "What was Q3 revenue by segment?" The vector store has no idea. That question needs SQL. And "Who reports to the VP of engineering?" That needs a graph query.
- 0:33 Basic RAG sends every question to the same place. That's the problem. You need a router: something that reads the question, understands the intent, and sends it to the right data source.
- 0:45 The LLM router is the most common approach. Here's how it works. A question comes in. You send it to an LLM with a system prompt that says, "Classify this question into one of these categories: vector search, SQL query, graph query, API call." The LLM reads the question, understands the intent, and returns the category. Then your code routes to the right data source.
- 1:10 Let's walk through three concrete examples. First, what's our refund policy? The LLM classifies this as unstructured text. It routes to the vector store, which searches your policy documents and returns the exact refund clause.
- 1:26 Second, Q3 revenue by segment. The LLM classifies this as structured financial data. It routes to the SQL database. The system generates the SQL query, runs it, and returns $2.4 million across three segments.
- 1:44 Third, who reports to the VP of engineering? The LLM classifies this as an organizational question. It routes to the graph database. A Cypher query traverses the org chart and returns the list of direct reports.
- 2:00 Same router, three different classifications, three different data sources. The LLM does the hard part: understanding what the user actually wants.
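The flow above can be sketched in a few lines. This is a minimal illustration, not a production router: `call_llm` is a stand-in for a real LLM API call, faked here with keyword rules so the sketch runs, and the handler names are hypothetical.

```python
# Sketch of an LLM router: classify the question, then dispatch to a handler.

ROUTER_PROMPT = (
    "Classify this question into one of these categories: "
    "vector_search, sql_query, graph_query, api_call. "
    "Reply with the category only.\n\nQuestion: {question}"
)

def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would send `prompt` to an LLM API.
    # Keyword rules stand in for the model's classification here.
    q = prompt.lower()
    if "revenue" in q or "sales" in q:
        return "sql_query"
    if "reports to" in q or "org chart" in q:
        return "graph_query"
    return "vector_search"

# One handler per data source; each would run the actual retrieval.
HANDLERS = {
    "vector_search": lambda q: f"vector store answers: {q}",
    "sql_query":     lambda q: f"SQL database answers: {q}",
    "graph_query":   lambda q: f"graph database answers: {q}",
}

def route(question: str) -> str:
    category = call_llm(ROUTER_PROMPT.format(question=question))
    return HANDLERS[category](question)

print(route("What's our refund policy?"))              # → vector store
print(route("What was Q3 revenue by segment?"))        # → SQL database
print(route("Who reports to the VP of engineering?"))  # → graph database
```

The point of the pattern: the LLM only returns a label; plain code does the dispatch, so adding a new source means adding one handler and one category.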
- 2:09 The semantic router takes a different approach. Instead of asking an LLM to classify every question, you precompute embeddings for each route.
- 2:21 Here's the setup. You define your routes. Vector search gets example utterances like "explain the refund policy" and "what does the user guide say." SQL query gets "revenue last quarter" and "sales by region." Graph query gets "who reports to" and "team structure." Each set of examples gets embedded and stored.
- 2:43 Now a question comes in. You embed it using the same model. Then you compute cosine similarity against each route's embeddings. The highest score wins. For "what's our refund policy?" the vector search route scores 0.94. SQL query scores 0.31. Graph query scores 0.18. It routes to the vector store. No LLM call needed.
- 3:09 This is faster and cheaper than the LLM router, but it's less flexible. If a question doesn't match any predefined pattern, it can misroute.
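The setup-then-score loop can be sketched as follows. To keep it self-contained, a toy bag-of-words counter stands in for a real sentence-embedding model (which would produce scores like the 0.94 above); the route names and example utterances come from the walkthrough, and the routing logic is the same either way.

```python
import math
from collections import Counter

# Toy "embedding": a bag-of-words vector. A real semantic router would call
# an embedding model here; only this function would change.
def embed(text: str) -> Counter:
    return Counter(text.lower().replace("?", "").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Setup: each route is defined by example utterances, embedded once.
ROUTES = {
    "vector_search": ["explain the refund policy", "what does the user guide say"],
    "sql_query":     ["revenue last quarter", "sales by region"],
    "graph_query":   ["who reports to", "team structure"],
}
ROUTE_EMBEDDINGS = {name: [embed(u) for u in utts] for name, utts in ROUTES.items()}

def route(question: str) -> str:
    q = embed(question)  # one embedding per query, no LLM call
    scores = {
        name: max(cosine(q, e) for e in embs)  # best match among examples
        for name, embs in ROUTE_EMBEDDINGS.items()
    }
    return max(scores, key=scores.get)

print(route("what's our refund policy"))  # → vector_search
```

Routing cost is one embedding plus a handful of dot products, which is why this path is millisecond-fast; the accuracy ceiling is set entirely by how well the example utterances cover real traffic.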
- 3:19 If you want to learn how to do this yourself, I run free live sessions every Friday at noon Eastern. Scan the QR code on screen to join. Would love to see you there.
- 3:35 Some questions need data from multiple sources. Compare our refund policy with last quarter's refund volume. That question needs the policy document from the vector store and the refund totals from the SQL database.
- 3:50 A single-source router would pick one or the other. You'd get an incomplete answer. A multi-source router identifies all the data sources a question needs. The LLM tags the question with multiple labels: policy document plus financial data. The system queries both sources in parallel. The vector store returns the refund policy. The SQL database returns the refund volume breakdown. Both results get passed to the LLM together. Now it can write a complete answer that compares the policy against the actual numbers.
- 4:26 The key insight: classify intents, not destinations. One question can have multiple intents that map to multiple data sources.
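That tag-then-fan-out shape can be sketched like this. It is a minimal illustration: `tag_sources` is a keyword stand-in for a real LLM call that returns multiple labels, and the canned source outputs are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def tag_sources(question: str) -> list[str]:
    # Stand-in for an LLM prompt like: "List every data source this
    # question needs: vector_search, sql_query, graph_query."
    q = question.lower()
    tags = []
    if "policy" in q:
        tags.append("vector_search")
    if "volume" in q or "revenue" in q:
        tags.append("sql_query")
    return tags or ["vector_search"]  # default to document search

# Each source would run a real retrieval; canned strings keep the sketch small.
SOURCES = {
    "vector_search": lambda q: "refund policy clause",
    "sql_query":     lambda q: "refund volume breakdown",
}

def answer(question: str) -> str:
    tags = tag_sources(question)          # intents, not one destination
    with ThreadPoolExecutor() as pool:    # query all tagged sources in parallel
        results = list(pool.map(lambda t: SOURCES[t](question), tags))
    # In a real system, both results are passed to the LLM together
    # so it can write one answer that combines them.
    return " + ".join(results)

print(answer("Compare our refund policy with last quarter's refund volume"))
# → refund policy clause + refund volume breakdown
```

Because the classifier returns a list rather than a single label, the comparison question above fans out to both the vector store and the SQL database instead of forcing a choice.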
- 4:35 When should you use which router? Use the LLM router when your questions are diverse and unpredictable. It handles edge cases well because it reasons about intent. The trade-off: it's slower and costs money per query. You're making an LLM call before you even start retrieving.
- 4:56 Use the semantic router when your question patterns are well defined and you need speed. It's a single embedding plus a cosine similarity computation. Millisecond routing. The trade-off: it only works well for questions that match your predefined patterns.
- 5:10 Use multi-source routing when your data is fragmented and questions often span multiple sources.
- 5:17 Start simple. One data source, no router needed. Two sources, add an LLM router. High traffic, switch to semantic. Complex queries, add multi-source. Build the routing layer that matches your actual complexity. That's the full picture.
- 5:35 If you want to go deeper, join my free live session every Friday at noon Eastern on Maven. I walk through this hands-on, answer questions, and show you how to build it yourself. Scan the QR code to join.
RAG Re-ranking: Bi-Encoder vs Cross-Encoder Explained
Next: Hybrid Search for RAG: BM25 + Vector Search Explained