RAG Routing Explained: LLM vs Semantic Router (When to Use What)
Video 5 of 9 · 6:00
Chapters
- 0:00 The multi-source data problem
- 0:45 LLM router walkthrough
- 2:10 Semantic router walkthrough
- 3:00 When to use which
Transcript
Auto generated by YouTube. Click any timestamp to jump to that moment.
- 0:03 Your docs live in a vector store. Your numbers sit in a SQL database. Your org chart is in a graph database. Your records come from an API.
- 0:14 A user asks, "What's our refund policy?" Basic RAG searches the vector store. That works. But now someone asks, "What was Q3 revenue by segment?" The vector store has no idea. That question needs SQL. And "Who reports to the VP of engineering?" That needs a graph query.
- 0:33 Basic RAG sends every question to the same place. That's the problem. You need a router: something that reads the question, understands the intent, and sends it to the right data source.
- 0:45 The LLM router is the most common approach. Here's how it works. A question comes in. You send it to an LLM with a system prompt that says, "Classify this question into one of these categories: vector search, SQL query, graph query, API call." The LLM reads the question, understands the intent, and returns the category. Then your code routes to the right data source.
- 1:10 Let's walk through three concrete examples. First, what's our refund policy? The LLM classifies this as unstructured text. It routes to the vector store, which searches your policy documents and returns the exact refund clause.
- 1:26 Second, Q3 revenue by segment. The LLM classifies this as structured financial data. It routes to the SQL database. The system generates the SQL query, runs it, and returns $2.4 million across three segments.
- 1:44 Third, who reports to the VP of engineering? The LLM classifies this as an organizational question. It routes to the graph database. A Cypher query traverses the org chart and returns the list of direct reports.
- 2:00 Same router, three different classifications, three different data sources. The LLM does the hard part: understanding what the user actually wants.
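The flow above can be sketched in a few lines. This is a minimal illustration, not a production router: `call_llm` is a stand-in for a real LLM API call, faked here with keyword rules so the sketch runs, and the handler names are hypothetical.

```python
# Sketch of an LLM router: classify the question, then dispatch to a handler.

ROUTER_PROMPT = (
    "Classify this question into one of these categories: "
    "vector_search, sql_query, graph_query, api_call. "
    "Reply with the category only.\n\nQuestion: {question}"
)

def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would send `prompt` to an LLM API.
    # Keyword rules stand in for the model's classification here.
    q = prompt.lower()
    if "revenue" in q or "sales" in q:
        return "sql_query"
    if "reports to" in q or "org chart" in q:
        return "graph_query"
    return "vector_search"

# One handler per data source; each would run the actual retrieval.
HANDLERS = {
    "vector_search": lambda q: f"vector store answers: {q}",
    "sql_query":     lambda q: f"SQL database answers: {q}",
    "graph_query":   lambda q: f"graph database answers: {q}",
}

def route(question: str) -> str:
    category = call_llm(ROUTER_PROMPT.format(question=question))
    return HANDLERS[category](question)

print(route("What's our refund policy?"))              # → vector store
print(route("What was Q3 revenue by segment?"))        # → SQL database
print(route("Who reports to the VP of engineering?"))  # → graph database
```

The point of the pattern: the LLM only returns a label; plain code does the dispatch, so adding a new source means adding one handler and one category.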
- 2:09 The semantic router takes a different approach. Instead of asking an LLM to classify every question, you precompute embeddings for each route.
- 2:21 Here's the setup. You define your routes. Vector search gets example utterances like "explain the refund policy" and "what does the user guide say." SQL query gets "revenue last quarter" and "sales by region." Graph query gets "who reports to" and "team structure." Each set of examples gets embedded and stored.
- 2:43 Now a question comes in. You embed it using the same model. Then you compute cosine similarity against each route's embeddings. The highest score wins. For "what's our refund policy?" the vector search route scores 0.94. SQL query scores 0.31. Graph query scores 0.18. It routes to the vector store. No LLM call needed.
- 3:09 This is faster and cheaper than the LLM router, but it's less flexible. If a question doesn't match any predefined pattern, it can misroute.
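The setup-then-score loop can be sketched as follows. To keep it self-contained, a toy bag-of-words counter stands in for a real sentence-embedding model (which would produce scores like the 0.94 above); the route names and example utterances come from the walkthrough, and the routing logic is the same either way.

```python
import math
from collections import Counter

# Toy "embedding": a bag-of-words vector. A real semantic router would call
# an embedding model here; only this function would change.
def embed(text: str) -> Counter:
    return Counter(text.lower().replace("?", "").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Setup: each route is defined by example utterances, embedded once.
ROUTES = {
    "vector_search": ["explain the refund policy", "what does the user guide say"],
    "sql_query":     ["revenue last quarter", "sales by region"],
    "graph_query":   ["who reports to", "team structure"],
}
ROUTE_EMBEDDINGS = {name: [embed(u) for u in utts] for name, utts in ROUTES.items()}

def route(question: str) -> str:
    q = embed(question)  # one embedding per query, no LLM call
    scores = {
        name: max(cosine(q, e) for e in embs)  # best match among examples
        for name, embs in ROUTE_EMBEDDINGS.items()
    }
    return max(scores, key=scores.get)

print(route("what's our refund policy"))  # → vector_search
```

Routing cost is one embedding plus a handful of dot products, which is why this path is millisecond-fast; the accuracy ceiling is set entirely by how well the example utterances cover real traffic.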
- 3:19 If you want to learn how to do this yourself, I run free live sessions every Friday at noon Eastern. Scan the QR code on screen to join. Would love to see you there.
- 3:35 Some questions need data from multiple sources. Compare our refund policy with last quarter's refund volume. That question needs the policy document from the vector store and the refund totals from the SQL database.
- 3:50 A single-source router would pick one or the other. You'd get an incomplete answer. A multi-source router identifies all the data sources a question needs. The LLM tags the question with multiple labels: policy document plus financial data. The system queries both sources in parallel. The vector store returns the refund policy. The SQL database returns the refund volume breakdown. Both results get passed to the LLM together. Now it can write a complete answer that compares the policy against the actual numbers.
- 4:26 The key insight: classify intents, not destinations. One question can have multiple intents that map to multiple data sources.
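That tag-then-fan-out shape can be sketched like this. It is a minimal illustration: `tag_sources` is a keyword stand-in for a real LLM call that returns multiple labels, and the canned source outputs are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def tag_sources(question: str) -> list[str]:
    # Stand-in for an LLM prompt like: "List every data source this
    # question needs: vector_search, sql_query, graph_query."
    q = question.lower()
    tags = []
    if "policy" in q:
        tags.append("vector_search")
    if "volume" in q or "revenue" in q:
        tags.append("sql_query")
    return tags or ["vector_search"]  # default to document search

# Each source would run a real retrieval; canned strings keep the sketch small.
SOURCES = {
    "vector_search": lambda q: "refund policy clause",
    "sql_query":     lambda q: "refund volume breakdown",
}

def answer(question: str) -> str:
    tags = tag_sources(question)          # intents, not one destination
    with ThreadPoolExecutor() as pool:    # query all tagged sources in parallel
        results = list(pool.map(lambda t: SOURCES[t](question), tags))
    # In a real system, both results are passed to the LLM together
    # so it can write one answer that combines them.
    return " + ".join(results)

print(answer("Compare our refund policy with last quarter's refund volume"))
# → refund policy clause + refund volume breakdown
```

Because the classifier returns a list rather than a single label, the comparison question above fans out to both the vector store and the SQL database instead of forcing a choice.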
- 4:35 When should you use which router? Use the LLM router when your questions are diverse and unpredictable. It handles edge cases well because it reasons about intent. The trade-off: it's slower and costs money per query. You're making an LLM call before you even start retrieving.
- 4:56 Use the semantic router when your question patterns are well defined and you need speed. It's a single embedding plus a cosine similarity computation. Millisecond routing. The trade-off: it only works well for questions that match your predefined patterns.
- 5:10 Use multi-source routing when your data is fragmented and questions often span multiple sources.
- 5:17 Start simple. One data source, no router needed. Two sources, add an LLM router. High traffic, switch to semantic. Complex queries, add multi-source. Build the routing layer that matches your actual complexity. That's the full picture.
- 5:35 If you want to go deeper, join my free live session every Friday at noon Eastern on Maven. I walk through this hands-on, answer questions, and show you how to build it yourself. Scan the QR code to join.
RAG Re-ranking: Bi-Encoder vs Cross-Encoder Explained
Next: Hybrid Search for RAG: BM25 + Vector Search Explained