Learn RAG from Scratch

RAG Routing Explained: LLM vs Semantic Router (When to Use What)

Video 5 of 9 · 6:00

Chapters

  • 0:00 The multi-source data problem
  • 0:45 LLM router walkthrough
  • 2:10 Semantic router walkthrough
  • 3:00 When to use which

Transcript

Auto generated by YouTube. Click any timestamp to jump to that moment.

[0:03] Your docs live in a vector store. Your numbers sit in a SQL database. Your org chart is in a graph database. Your records come from an API. A user asks, "What's our refund policy?" A basic RAG searches the vector store. That works. But now someone asks, "What was Q3 revenue by segment?" The vector store has no idea. That question needs SQL. And who reports to the VP of engineering? That needs a graph query. Basic RAG sends every question to the same place. That's the problem. You need a router: something that reads the question, understands the intent, and sends it to the right data source.
[0:45] The LLM router is the most common approach. Here's how it works. A question comes in. You send it to an LLM with a system prompt that says, "Classify this question into one of these categories: vector search, SQL query, graph query, or API call." The LLM reads the question, understands the intent, and returns the category. Then your code routes to the right data source.

[1:10] Let's walk through three concrete examples. First, "What's our refund policy?" The LLM classifies this as unstructured text. It routes to the vector store, which searches your policy documents and returns the exact refund clause. Second, "Q3 revenue by segment." The LLM classifies this as structured financial data. It routes to the SQL database. The system generates the SQL query, runs it, and returns $2.4 million across three segments. Third, "Who reports to the VP of engineering?" The LLM classifies this as an organizational question. It routes to the graph database. A Cypher query traverses the org chart and returns the list of direct reports. Same router, three different questions, three different data sources. The LLM does the hard part: understanding what the user actually wants.
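The LLM-router flow described above can be sketched roughly like this. The LLM call is injected as a callable so any client can be plugged in; `fake_classify` is a stub standing in for the real model, and all names here are illustrative, not from the video:

```python
# Candidate routes and what each one handles.
ROUTES = {
    "vector_search": "unstructured text: policies, docs, guides",
    "sql_query": "structured financial or numeric data",
    "graph_query": "org charts and relationships",
    "api_call": "live records from external systems",
}

SYSTEM_PROMPT = (
    "Classify this question into one of these categories: "
    + ", ".join(ROUTES)
    + ". Reply with the category name only."
)

def llm_route(question: str, classify) -> str:
    """Ask the (injected) LLM to pick a route; fall back to vector search."""
    category = classify(SYSTEM_PROMPT, question).strip()
    return category if category in ROUTES else "vector_search"

# Stub standing in for a real LLM call, just for the demo below.
def fake_classify(system_prompt: str, question: str) -> str:
    q = question.lower()
    if "revenue" in q or "segment" in q:
        return "sql_query"
    if "reports to" in q:
        return "graph_query"
    return "vector_search"

print(llm_route("What was Q3 revenue by segment?", fake_classify))   # sql_query
print(llm_route("What's our refund policy?", fake_classify))         # vector_search
```

The fallback matters in practice: LLMs occasionally return a category name with extra words, so anything unrecognized defaults to the broadest route.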
[2:09] The semantic router takes a different approach. Instead of asking an LLM to classify every question, you precompute embeddings for each route. Here's the setup. You define your routes. Vector search gets example questions like "explain the refund policy" and "what does the user guide say." SQL query gets "revenue last quarter" and "sales by region." Graph query gets "who reports to" and "team structure." Each set of examples gets embedded and stored.

[2:43] Now a question comes in. You embed it using the same model. Then you compute cosine similarity against each route's embeddings. The highest score wins. For "What's our refund policy?", the vector search route scores 0.94, SQL query scores 0.31, and graph query scores 0.18. It routes to the vector store. No LLM call needed. This is faster and cheaper than the LLM router, but it's less flexible. If a question doesn't match any predefined pattern, well, it can misroute.
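A minimal sketch of the semantic router, using the example questions from the video. The `embed` function here is a toy hashed bag-of-words stand-in so the sketch runs without a model; a real setup would use a sentence-embedding model for both the route examples and the incoming question:

```python
import math
from collections import Counter

DIM = 64  # toy vector size; a real embedding model fixes this for you

def embed(text: str) -> list[float]:
    """Toy embedding: hashed bag-of-words. Stand-in for a real model."""
    vec = [0.0] * DIM
    for token, count in Counter(text.lower().split()).items():
        vec[hash(token) % DIM] += count
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Routes defined by example questions, embedded once at setup time.
ROUTES = {
    "vector_search": ["explain the refund policy", "what does the user guide say"],
    "sql_query": ["revenue last quarter", "sales by region"],
    "graph_query": ["who reports to", "team structure"],
}
ROUTE_EMBEDDINGS = {
    name: [embed(q) for q in examples] for name, examples in ROUTES.items()
}

def semantic_route(question: str) -> str:
    """Embed the question, score each route, return the highest scorer."""
    q = embed(question)
    scores = {
        name: max(cosine(q, e) for e in embs)
        for name, embs in ROUTE_EMBEDDINGS.items()
    }
    return max(scores, key=scores.get)

print(semantic_route("explain the refund policy"))  # vector_search
```

No LLM call happens at query time: routing is one embedding plus a handful of dot products, which is why this approach is so much cheaper per query.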
[3:19] If you want to learn how to do this yourself, I run free live sessions every Friday at noon Eastern. Scan the QR code on screen to join. Would love to see you there.
[3:35] Some questions need data from multiple sources. "Compare our refund policy with last quarter's refund volume." That question needs the policy document from the vector store and the refund totals from the SQL database. A single-source router would pick one or the other. You'd get an incomplete answer. A multi-source router identifies all the data sources a question needs. The LLM tags the question with multiple sources: policy document plus financial data. The system queries both sources in parallel. The vector store returns the refund policy. The SQL database returns the refund volume breakdown. Both results get passed to the LLM together. Now we can write a complete answer that compares the policy against the actual numbers. The key insight: classify intents, not destinations. One question can have multiple intents that map to different data sources.
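The multi-source flow can be sketched like this: the tagging step returns a set of sources rather than one, each tagged source is queried in parallel, and the results are collected to hand to the answering LLM together. The tagger and the source handlers are stubs (in practice the tagger is an LLM call and the handlers are real queries); all names are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def tag_sources(question: str) -> list[str]:
    """Stub for an LLM that tags a question with every source it needs."""
    q = question.lower()
    tags = []
    if "policy" in q:
        tags.append("vector_store")
    if "volume" in q or "revenue" in q:
        tags.append("sql_database")
    return tags or ["vector_store"]  # default to the broadest source

# Stub handlers; real ones would run a vector search and a SQL query.
SOURCES = {
    "vector_store": lambda q: "refund policy: full refund within 30 days",
    "sql_database": lambda q: "refund volume last quarter: 1,204 refunds",
}

def gather(question: str) -> dict[str, str]:
    """Query every tagged source in parallel and collect the results."""
    tags = tag_sources(question)
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda t: (t, SOURCES[t](question)), tags)
    # This dict is what gets passed to the LLM for the final answer.
    return dict(results)

print(gather("Compare our refund policy with last quarter's refund volume"))
```

Note the structure mirrors the "intents, not destinations" insight: the tagger classifies intents, and each intent maps independently onto a data source.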
[4:35] When should you use which router? Use the LLM router when your questions are diverse and unpredictable. It handles edge cases well because it reasons about intent. The trade-off: it's slower and costs money per query. You're making an LLM call before you even start retrieving. Use the semantic router when your question patterns are well defined and you need speed. It's a single embedding plus a cosine similarity computation per routing decision. The trade-off: it only works well for questions that match your predefined patterns. Use multi-source routing when your data is spread out and questions often span multiple sources.

[5:17] Start simple. One data source, no router needed. Two sources, add an LLM router. High traffic, switch to semantic. Complex queries, add multi-source. Build the routing layer that matches your actual complexity. That's the full picture. If you want to go deeper, join my free live session every Friday at noon Eastern on Maven. I walk through this hands-on, answer questions, and show you how to build it yourself. Scan the QR code to join.

Want the next one in your inbox?

Join 1,000+ Product Managers getting one deep dive every Friday.