# Build a Production-Grade RAG Pipeline with Voyage rerank-2.5 Instruction-Following
## Why this matters for builders
Voyage AI just shipped rerank-2.5 and rerank-2.5-lite, the first rerankers with true instruction-following capabilities. They deliver +7.94% and +7.16% retrieval accuracy respectively over Cohere Rerank v3.5 across 93 datasets, jump to +12.7% on the Massive Instructed Retrieval Benchmark (MAIR), support 32K context (8× Cohere's), and come at no added cost.
The game changer is the natural-language instruction field. You can now tell the reranker exactly how to interpret relevance (“Prioritize regulatory documents and legal statutes, ignore court cases”, “This is an e-commerce site about cars — treat Jaguar as the brand”, “Focus only on the methodology section of papers”). This turns static reranking into a controllable, domain-specific relevance layer.
For builders shipping real apps, this means fewer brittle prompt hacks, better precision in legal, medical, finance, and technical search, and a cleaner separation between retrieval and generation.
## When to use it
Use rerank-2.5 when:
- You already have a first-stage retriever (BM25, Voyage embeddings, or vector search in MongoDB, Pinecone, Weaviate, etc.)
- Your domain has nuanced relevance rules that change per user, tenant, or product vertical
- Documents are long (>4k tokens) and you need full context
- You want to reduce hallucination by feeding the LLM only the truly relevant chunks
Use the -lite variant for lower latency/cost in high-QPS consumer apps.
## The full process — from idea to shipped feature
Here’s a battle-tested workflow you can follow with Cursor, Claude, or any strong coding assistant.
### 1. Define the goal (10 minutes)
Write a one-paragraph spec:
“Build a legal research assistant that retrieves from a corpus of statutes, regulations, and case law. For every user query, the system must bias results toward regulatory documents and statutes while de-emphasizing court opinions. Use Voyage rerank-2.5 with the standing instruction ‘Retrieve regulatory documents and legal statutes, not court cases.’ Return the top 5 most relevant passages to the LLM for synthesis. Support documents up to 25k tokens. Measure nDCG@5 before and after adding the instruction.”
### 2. Shape the spec & prompt your AI coder
Give your coding assistant this starter prompt (copy-paste ready):
```
You are an expert RAG engineer. We are adding Voyage AI rerank-2.5 (instruction-following) to an existing retrieval pipeline.

Requirements:
- First stage: hybrid search (BM25 + Voyage voyage-3 embeddings) against MongoDB Atlas Vector Search or a simple list for prototyping
- Second stage: rerank with rerank-2.5 using the instruction: "Prioritize regulatory documents and legal statutes, not court cases."
- Support 32k context — do NOT truncate documents
- Return top 5 results with relevance scores
- Provide a simple FastAPI endpoint: POST /legal-search with { "query": "...", "instruction": "..." }
- Include evaluation harness using nDCG@5 on a small golden dataset
- Use official Voyage SDK (check docs for exact method signature)

Output structure:
1. requirements.txt + .env.example
2. core/reranker.py with VoyageReranker class
3. api/routes.py
4. eval/evaluate.py
5. README with local testing instructions
```
### 3. Scaffold the project
Run the generated scaffold, then refine with follow-up prompts:
```python
from typing import List, Optional

from voyageai import Client


class VoyageReranker:
    def __init__(self, model: str = "rerank-2.5"):
        self.client = Client()  # reads VOYAGE_API_KEY from the environment
        self.model = model

    def rerank(
        self,
        query: str,
        documents: List[str],
        instruction: Optional[str] = None,
        top_k: int = 5,
    ):
        # Official pattern from Voyage docs: fold the instruction into the query
        if instruction:
            # Prepend or append — both work; prepend is often cleaner
            query_with_instruction = f"{instruction}\n\nQuery: {query}"
        else:
            query_with_instruction = query
        response = self.client.rerank(
            query=query_with_instruction,
            documents=documents,
            model=self.model,
            top_k=top_k,
        )
        return response.results  # each result carries index, relevance_score, document
```
### 4. Implement the full pipeline
Next prompt for your AI pair programmer:
“Now implement the hybrid retrieval + rerank flow. First retrieve top 30 candidates with MongoDB Atlas $vectorSearch and BM25, then rerank with rerank-2.5 using the legal instruction. Add a fallback to rerank-2.5-lite on timeout. Include async support.”
Validate that the code respects the 32K limit by passing full document text.
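The hybrid merge step can be sketched with reciprocal rank fusion (RRF), a common way to combine BM25 and vector rankings into one candidate list before reranking. The function name, the example document IDs, and the `k=60` constant are illustrative choices, not part of the Voyage SDK:

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into one fused ranking.

    Each list contributes 1 / (k + rank) per document, so documents ranked
    highly by both BM25 and vector search float to the top.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Fuse the two first-stage rankings, then take the top 30 as rerank candidates
bm25_hits = ["statute_12", "case_07", "reg_03"]
vector_hits = ["reg_03", "statute_12", "case_99"]
candidates = reciprocal_rank_fusion([bm25_hits, vector_hits])[:30]
```

The fused candidates then go to `rerank-2.5` with the legal instruction; on timeout, retry the same call with `rerank-2.5-lite`.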
### 5. Validate with real metrics
Create a small evaluation set (10 queries with known good passages).
```python
# eval/evaluate.py
from core.reranker import VoyageReranker  # scaffold layout from step 2

reranker = VoyageReranker("rerank-2.5")

# Tiny placeholder corpus for a smoke test; swap in passages from your golden set
docs = [
    "Sample statute text on data protection obligations.",
    "Sample court opinion discussing AI training data.",
    "Sample regulation on automated decision-making.",
]


def test_instruction_impact():
    query = "legal implications of AI training data"
    instruction = "Retrieve regulatory documents and legal statutes, not court cases."
    # Run twice — once with, once without instruction
    results_with = reranker.rerank(query, docs, instruction)
    results_without = reranker.rerank(query, docs, None)
    print("With instruction top-3 relevance:", [r.relevance_score for r in results_with[:3]])
    print("Without instruction top-3 relevance:", [r.relevance_score for r in results_without[:3]])
```
Typical gain: 7–13% in relevance alignment on domain-specific data.
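nDCG@5 itself is easy to compute for the golden set. This helper assumes graded relevance labels (e.g. 0–3) listed in the order the reranker returned the documents; the function names are mine:

```python
import math
from typing import List


def dcg(relevances: List[float]) -> float:
    # Discounted cumulative gain: rel_i / log2(i + 1) with 1-indexed positions
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg_at_k(ranked_relevances: List[float], k: int = 5) -> float:
    """nDCG@k for one query: DCG of the returned order over the ideal order."""
    ideal = sorted(ranked_relevances, reverse=True)
    idcg = dcg(ideal[:k])
    return dcg(ranked_relevances[:k]) / idcg if idcg > 0 else 0.0
```

Average `ndcg_at_k` over the 10 golden queries, with and without the instruction, to quantify the before/after gap.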
### 6. Ship it safely
Production checklist:
- Add retry + fallback to `rerank-2.5-lite`
- Cache reranking results for identical (query + instruction) pairs for 5 minutes
- Monitor average rerank latency and cost per 1k tokens (pricing unchanged)
- A/B test the instruction version vs baseline for 1–2 weeks
- Log the exact instruction used per request for debugging
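The caching item above can be a small in-process TTL cache keyed on the (query, instruction) pair; the class below is a sketch with names of my choosing, and a shared store such as Redis would replace it in multi-process deployments:

```python
import hashlib
import json
import time
from typing import Any, Dict, Optional, Tuple


class RerankCache:
    """In-memory TTL cache for rerank results, keyed on (query, instruction)."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, Any]] = {}

    def _key(self, query: str, instruction: Optional[str]) -> str:
        # Hash both fields together so distinct instructions never collide
        payload = json.dumps([query, instruction or ""])
        return hashlib.sha256(payload.encode()).hexdigest()

    def get(self, query: str, instruction: Optional[str]) -> Optional[Any]:
        key = self._key(query, instruction)
        entry = self._store.get(key)
        if entry and time.time() - entry[0] < self.ttl:
            return entry[1]
        self._store.pop(key, None)  # drop expired or missing entries
        return None

    def set(self, query: str, instruction: Optional[str], results: Any) -> None:
        self._store[self._key(query, instruction)] = (time.time(), results)
```

Check the cache before calling the reranker and populate it after; a 5-minute TTL matches the checklist above.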
## Pitfalls and guardrails
- Don’t put the instruction in the `documents` field — it belongs with the query.
- Overly long instructions can dilute signal — keep them under 100 words.
- Test your instruction on a few dozen examples before shipping. Vague instructions (“make it better”) perform worse than specific ones.
- The model still returns relevance scores; higher score = better match to both query AND instruction.
- If you see degraded performance, try prepending vs appending the instruction (some domains prefer one).
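Testing prepend vs append doesn’t need to touch the pipeline: a small query builder (a hypothetical helper, not part of the Voyage SDK) lets you A/B both placements against the same golden set:

```python
from typing import Optional


def build_reranker_query(query: str, instruction: Optional[str], mode: str = "prepend") -> str:
    """Combine a query and an instruction for the reranker.

    mode is "prepend" (instruction first) or "append" (instruction last).
    """
    if not instruction:
        return query
    if mode == "prepend":
        return f"{instruction}\n\nQuery: {query}"
    if mode == "append":
        return f"Query: {query}\n\nInstruction: {instruction}"
    raise ValueError(f"unknown mode: {mode}")
```

Run your evaluation harness once per mode and keep whichever scores higher on your domain.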
## What to do next
- Replace your current reranker (Cohere, Voyage rerank-2, or LLM-as-reranker) with `rerank-2.5`
- Add one standing instruction per product vertical or user persona
- Measure nDCG@5 or human preference on your real traffic
- Experiment with dynamic instructions generated by a small LLM based on user session context
- Write a follow-up blog post once you have before/after metrics
This single change — adding a controllable reranker — often yields bigger relevance gains than upgrading embeddings or prompt engineering the final LLM.
## Sources
- Voyage AI Official Announcement: https://blog.voyageai.com/2025/08/11/rerank-2-5/
- MongoDB mirrored post on rerank-2.5
- Voyage AI Rerankers documentation: https://docs.voyageai.com/docs/reranker
- Massive Instructed Retrieval Benchmark (MAIR) results referenced in the release

