Learn RAG from Scratch

RAG Re-ranking: Bi-Encoder vs Cross-Encoder Explained

Video 4 of 9 · 6:52

Chapters

  • 0:00 The ranking problem
  • 0:50 How bi-encoders work
  • 1:45 Cross-encoders: reading together
  • 2:45 Retrieve then re-rank pattern

Transcript

Auto generated by YouTube.

  1. 0:03 the refund policy for annual plans?
  2. 0:06 vector search returns five results.
  3. 0:09 pricing page is number one. The
  4. 0:12 refund policy buried at number
  5. 0:14 The embeddings are close, but
  6. 0:17 is not the same as correct. Vector
  7. 0:20 measures distance in
  8. 0:22 space. It does not measure
  9. 0:24 a document actually answers the
  10. 0:26 This is the gap that kills RAG
  11. 0:29 in production. The results look
  12. 0:31 The scores are high, but the
  13. 0:34 answer is not on top. To understand
  14. 0:37 fix, you need to understand the
  15. 0:40 Most RAG systems use bi-
  16. 0:42 for retrieval. A bi-encoder
  17. 0:45 the query and each document
  18. 0:47 The query goes through one
  19. 0:50 Each document goes through the
  20. 0:52 encoder independently. Both come
  21. 0:55 as vectors. Then you compare them
  22. 0:58 cosine similarity. This is fast.
  23. 1:01 can precompute all your document
  24. 1:03 once and store them. When a
  25. 1:06 comes in, you encode it and
  26. 1:08 against everything in
  27. 1:10 The trade-off: the encoder
  28. 1:13 sees the query and document
  29. 1:15 It cannot reason about how
  30. 1:18 relate. It just compares two points
  31. 1:20 space. That is why a pricing page
  32. 1:23 annual plans scores high for a
  33. 1:26 about refund policy for annual
  34. 1:29 The words overlap. A cross
  35. 1:32 takes a completely different
  36. 1:35 Instead of encoding the query
  37. 1:38 document separately, it reads them
  38. 1:40 as one input. The query and
  39. 1:43 get concatenated. They go
  40. 1:46 a single transformer as one
  41. 1:48 The model sees every word in
  42. 1:51 query next to every word in the
  43. 1:53 It can reason about their
  44. 1:55 directly. The output is a
  45. 1:58 relevance score, not a vector
  46. 2:01 but a direct prediction. How
  47. 2:04 is this document to this
  48. 2:06 This is much more accurate.
  49. 2:08 cross-encoder understands that
  50. 2:11 policy for annual plans is asking
  51. 2:14 refunds, not pricing, even though
  52. 2:17 documents mention annual plans. But
  53. 2:20 is slow. You cannot precompute
  54. 2:22 Every query-document pair
  55. 2:25 a full forward pass through the
  56. 2:27 Here is the technique that fixes
  57. 2:31 Cross-encoder reranking. It
  58. 2:34 the speed of bi-encoders with
  59. 2:37 accuracy of cross-encoders in a
  60. 2:40 pipeline. Stage one, your bi-
  61. 2:43 retrieves the top 20 candidates.
  62. 2:46 is fast: milliseconds. You get a
  63. 2:49 set of potentially relevant
  64. 2:51 Stage two, the cross-encoder
  65. 2:55 those 20 candidates. It reads
  66. 2:58 one paired with the original query.
  67. 3:01 it can reason about actual
  68. 3:03 20 forward passes instead of
  69. 3:06 your entire database. The
  70. 3:08 change dramatically. Watch the
  71. 3:11 policy example. Before
  72. 3:14 the pricing page sits at
  73. 3:16 one with a similarity score of
  74. 3:20 refund policy is stuck at number
  75. 3:22 with 0.58.
  76. 3:25 the cross-encoder reranks, the
  77. 3:28 policy jumps to number one with a
  78. 3:30 score of 0.94.
  79. 3:33 pricing page drops to number four
  80. 3:35 0.31.
  81. 3:37 documents, same query, completely
  82. 3:40 ordering. The cross-encoder
  83. 3:43 what the user was actually
  84. 3:45 for. If you want to learn how to
  85. 3:48 this yourself, I run free live
  86. 3:51 every Friday at noon Eastern.
  87. 3:55 the QR code on screen to join.
  88. 3:58 love to see you there.
  89. 4:01 not just use cross-encoders for
  90. 4:04 Math. If you have 100,000
  91. 4:07 and use a cross-encoder for
  92. 4:10 one, that is 100,000 forward passes
  93. 4:13 query at 50 milliseconds each. That
  94. 4:17 83 minutes per search. Completely
  95. 4:20 A bi-encoder, it precomputes all
  96. 4:24 embeddings once. At query time,
  97. 4:27 encode the query and do a vector
  98. 4:29 The entire search takes
  99. 4:31 100 milliseconds.
  100. 4:34 is the middle ground. Bi-
  101. 4:37 narrows 100,000 documents to 20
  102. 4:40 milliseconds.
  103. 4:42 encoder reranks 20 documents in
  104. 4:44 1 second. Total latency just over
  105. 4:48 second. That is the sweet spot. You
  106. 4:51 95% of the accuracy of a full cross
  107. 4:54 search at a fraction of the
  108. 4:56 You do not have to build a cross
  109. 5:00 from scratch. Cohere Rerank is
  110. 5:04 most popular hosted option. You send
  111. 5:07 query and a list of documents. It
  112. 5:10 them reordered by relevance with
  113. 5:12 Three lines of code. Jina
  114. 5:16 is another option, open-source
  115. 5:18 Voyage AI focuses on domain
  116. 5:22 reranking. And if you want to
  117. 5:24 the cross-encoder models on
  118. 5:27 Face work well. The MS MARCO
  119. 5:30 are the standard starting point.
  120. 5:33 the one that fits your stack. The
  121. 5:35 is the same across all of them.
  122. 5:38 broadly, then rerank precisely.
  123. 5:42 zoom out. Without reranking, your
  124. 5:45 pipeline looks like this. User asks
  125. 5:48 question. Bi-encoder retrieves the
  126. 5:51 embeddings. Results go straight
  127. 5:53 the LLM. The LLM works with whatever
  128. 5:56 gets, even if the best document is
  129. 5:58 at position 4. With re-ranking,
  130. 6:01 add one step. The cross-encoder
  131. 6:04 each candidate paired with the
  132. 6:06 and reorders them. Now, the LLM
  133. 6:09 the most relevant documents first.
  134. 6:12 context in, better answers out.
  135. 6:15 your RAG app returns technically
  136. 6:17 but not quite right answers,
  137. 6:19 is probably the fix. Retrieve
  138. 6:22 rerank precisely. That's how
  139. 6:25 find the right result. That's the
  140. 6:28 picture. If you want to go deeper,
  141. 6:31 my free live session this Friday at
  142. 6:33 Eastern on Maven. I walk through
  143. 6:36 hands-on, answer questions, and
  144. 6:39 you how to build it yourself. Scan
  145. 6:41 QR code to join.
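
The retrieve-then-rerank pattern described in the transcript can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation from the video: `embed` is a toy bag-of-words stand-in for a bi-encoder, `toy_pair_score` is a hand-written stand-in for a cross-encoder, and the four documents in `DOCS` are invented for illustration, so the positions differ from the on-screen example. The latency lines reuse the figures quoted in the transcript (50 ms per forward pass, 100,000 documents, 20 candidates).

```python
import math
from collections import Counter
from typing import Callable, List

# Hypothetical corpus, invented for illustration (not from the video).
DOCS = [
    "pricing for annual plans what is the cost",                                  # 0: pricing page
    "customers on annual plans may request a refund within 30 days of purchase",  # 1: refund policy
    "how to reset your password",                                                 # 2
    "the team plans an annual offsite",                                           # 3
]
STOPWORDS = {"what", "is", "the", "for", "a", "an", "of", "on", "to"}


def embed(text: str) -> Counter:
    """Toy stand-in for a bi-encoder: encodes one text on its own."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Document vectors are computed once, before any query arrives.
DOC_VECTORS = [embed(d) for d in DOCS]


def retrieve(query: str, k: int) -> List[int]:
    """Stage 1: encode the query alone, compare it with the stored vectors."""
    q = embed(query)
    ranked = sorted(range(len(DOCS)), key=lambda i: cosine(q, DOC_VECTORS[i]), reverse=True)
    return ranked[:k]


def toy_pair_score(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: scores query and document together.

    A real cross-encoder runs a transformer over the concatenated pair;
    this only measures how many content words of the query the document covers.
    """
    wanted = set(query.lower().split()) - STOPWORDS
    found = wanted & set(doc.lower().split())
    return len(found) / len(wanted) if wanted else 0.0


def rerank(query: str, candidates: List[int], pair_score: Callable[[str, str], float]) -> List[int]:
    """Stage 2: one pair_score call per candidate, then sort by that score."""
    return sorted(candidates, key=lambda i: pair_score(query, DOCS[i]), reverse=True)


query = "what is the refund policy for annual plans"
candidates = retrieve(query, k=3)
reranked = rerank(query, candidates, toy_pair_score)
print(candidates)  # [0, 3, 1]: the pricing page leads on word overlap
print(reranked)    # [1, 0, 3]: the refund policy moves to the top

# Latency arithmetic with the transcript's figures (50 ms per forward pass).
MS_PER_PASS = 50
full_scan_ms = 100_000 * MS_PER_PASS  # cross-encoder over every document
rerank_ms = 20 * MS_PER_PASS          # cross-encoder over 20 candidates only
print(full_scan_ms / 60_000, "minutes vs", rerank_ms / 1000, "second")
```

The two stages are kept as separate functions so the scorer can be swapped: in a real pipeline `toy_pair_score` would likely be replaced by a trained cross-encoder (for example one of the MS MARCO models the transcript mentions) or a hosted rerank API, while `retrieve` would query a vector store instead of sorting an in-memory list.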
