voyage-3.5 and voyage-3.5-lite: improved quality for a new retrieval frontier — vibe-coding-guide
Vibe Coding Guide · Mar 9, 2026 · 7 min read


# How to Ship a Production RAG App with voyage-3.5: A Vibe Coding Blueprint

Voyage AI just dropped voyage-3.5 and voyage-3.5-lite. The new models deliver +2.66% retrieval quality over voyage-3 and +4.28% over voyage-3-lite while keeping the same pricing ($0.06 and $0.02 per million tokens, respectively) and 32K-token context length. For builders, this is one of the cleanest “drop-in upgrade” moments in recent memory: better relevance at zero extra cost, with no code changes required beyond swapping the model name.

This guide walks you through a reliable, AI-assisted process to build and ship a production-grade retrieval system using the new Voyage models. Whether you’re refreshing an existing RAG app or starting fresh, the workflow below keeps you moving fast while avoiding the usual embedding pitfalls.

## Why this matters for builders

Retrieval quality is the #1 determinant of whether your RAG application feels magical or mediocre. A 2–4% gain on MTEB-style benchmarks often translates to noticeably fewer hallucinated or off-topic answers in production. Because Voyage kept pricing and context identical, you can upgrade today without budget impact or re-chunking your corpus.

The voyage-3.5-lite variant is especially attractive: it now offers a compelling price/performance point for mid-sized knowledge bases where you previously had to choose between quality and cost.

## When to use voyage-3.5

Use it when you need:

  • High-stakes retrieval (legal, medical, financial, support, internal wikis)
  • Long-context documents (up to 32K tokens)
  • Cost predictability — same price as the previous generation
  • Strong performance on English technical, enterprise, and domain-specific content

Skip it (for now) if you need multimodal embeddings or non-English retrieval at the absolute bleeding edge; the announcement focuses on text retrieval improvements.

## The full process: from idea to shipped retrieval service

### 1. Define the goal (30 minutes)

Start by writing a one-paragraph spec. Good example:

“Build a retrieval service for our 4,200 internal engineering Notion pages and 18 product requirement PDFs. Goal: top-5 chunks must contain the correct answer >92% of the time on a 120-question eval set. Latency <450 ms p95 at 10 QPS. Cost target <$180/month at current query volume. Use voyage-3.5 or voyage-3.5-lite.”

Capture success metrics, expected load, and domain constraints. This paragraph becomes the north star for every prompt you give your coding assistant.
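To sanity-check a cost target like the one in the spec, a quick back-of-envelope calculation helps. The per-million-token prices below come from the announcement; the corpus size and average page length are hypothetical:

```typescript
// Published per-million-token rates from the announcement.
const PRICE_PER_M_TOKENS = { "voyage-3.5": 0.06, "voyage-3.5-lite": 0.02 };

// Embedding cost in USD for a given total token count and model.
function embeddingCostUSD(
  totalTokens: number,
  model: keyof typeof PRICE_PER_M_TOKENS
): number {
  return (totalTokens / 1_000_000) * PRICE_PER_M_TOKENS[model];
}

// Hypothetical example: 4,200 pages averaging ~1,500 tokens each ≈ 6.3M tokens.
const corpusTokens = 4200 * 1500;
console.log(embeddingCostUSD(corpusTokens, "voyage-3.5").toFixed(2)); // prints "0.38"
```

Note that the one-time ingest cost is usually negligible next to query volume, which is why the spec's monthly budget is dominated by serving, not indexing.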

### 2. Shape the spec/prompt for your AI coding tool

Use this starter prompt (copy-paste friendly):

You are an expert RAG engineer. We are building a retrieval microservice using Voyage AI's new voyage-3.5 model.

Requirements:
- Ingest Notion pages + PDF documents
- Chunk intelligently (semantic + fixed-size fallback)
- Generate embeddings with `voyage-3.5` (fallback to `voyage-3.5-lite` for cost-sensitive paths)
- Store in PostgreSQL + pgvector (or Pinecone/Qdrant if preferred)
- Expose a clean /retrieve endpoint that returns top-k chunks with scores and metadata
- Include a small evaluation harness using 120 held-out questions
- Keep everything in TypeScript + Next.js API routes or Python FastAPI

Output a complete project structure, then give me the first three files I should create.

Adjust the stack to match your environment. The more concrete you are about data sources and success criteria, the better the generated scaffolding will be.

### 3. Scaffold the project

Let your AI coding assistant generate the initial layout. A typical structure:

voyage-rag-service/
├── src/
│   ├── ingest/
│   │   ├── chunker.ts
│   │   └── embed.ts
│   ├── retrieval/
│   │   ├── index.ts
│   │   └── rerank.ts
│   ├── eval/
│   │   └── evaluate.ts
│   └── db/
├── eval_questions.json
├── package.json
└── Dockerfile

Prompt for a minimal viable Dockerfile and a docker-compose.yml that includes pgvector so you can develop locally with the same stack you’ll use in production.
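A minimal docker-compose.yml along those lines might look like the sketch below; the `pgvector/pgvector:pg16` image is one commonly used option, and the credentials are placeholders to replace with your own:

```yaml
services:
  db:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_USER: rag        # placeholder credentials
      POSTGRES_PASSWORD: rag
      POSTGRES_DB: rag
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
```

Remember to run `CREATE EXTENSION IF NOT EXISTS vector;` in your migration before creating the embeddings column.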

### 4. Implement carefully

Embedding snippet (Voyage AI SDK):

import { VoyageAIClient } from "voyageai";

const voyage = new VoyageAIClient({
  apiKey: process.env.VOYAGE_API_KEY,
});

async function embedChunks(
  texts: string[],
  model: "voyage-3.5" | "voyage-3.5-lite" = "voyage-3.5",
  inputType: "document" | "query" = "document"
) {
  const result = await voyage.embed({
    input: texts,
    model,
    // Voyage optimizes document and query embeddings separately; pass
    // "query" when embedding user queries at retrieval time.
    inputType,
  });
  return result.data.map(item => item.embedding);
}

Retrieval function (example with pgvector):

async function retrieve(query: string, k = 5) {
  const [queryEmbedding] = await embedChunks([query], "voyage-3.5");

  // pgvector expects a vector literal like "[0.1,0.2,...]"; serialize the
  // embedding array rather than passing it raw, or the ::vector cast fails.
  const { rows } = await db.query(`
    SELECT content, metadata,
           (embedding <=> $1::vector) AS distance
    FROM documents
    ORDER BY embedding <=> $1::vector
    LIMIT $2
  `, [JSON.stringify(queryEmbedding), k]);

  return rows;
}

Run the ingestion pipeline on a small subset first (50 documents) and inspect the chunk quality manually. Fix chunking logic before embedding the full corpus.
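As a concrete sketch of the fixed-size fallback with a sentence-boundary preference, here is a minimal character-based chunker. A production pipeline should count tokens rather than characters; the sizes here are illustrative:

```typescript
// Packs sentences into chunks of at most maxChars characters, carrying a
// short character overlap into the next chunk to preserve context across
// boundaries. Sentence splitting is a naive regex; swap in a proper
// sentence segmenter for real corpora.
function chunkText(text: string, maxChars = 1200, overlap = 150): string[] {
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [text];
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    if (current.length + sentence.length > maxChars && current.length > 0) {
      chunks.push(current.trim());
      // Seed the next chunk with the tail of the previous one.
      current = current.slice(-overlap) + sentence;
    } else {
      current += sentence;
    }
  }
  if (current.trim().length > 0) chunks.push(current.trim());
  return chunks;
}
```

Inspecting the output of a function like this on a handful of documents is exactly the manual check described above.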

### 5. Validate with an eval harness

Never ship without quantitative validation. Use this simple evaluation prompt with your coding assistant:

Create an evaluation script that:
- Loads 120 question/answer pairs
- For each question, retrieves top-5 chunks using voyage-3.5
- Checks if the ground-truth answer appears in any of the top-5 chunks (or is semantically similar via another embedding)
- Reports Hit@5, MRR, and average latency
- Saves results to a markdown table
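The Hit@k and MRR metrics the prompt asks for are simple to compute. Here is a minimal sketch that uses substring matching as the "contains the ground-truth answer" check; semantic matching would slot in as a different predicate:

```typescript
// One evaluation case: a ground-truth answer and the retrieved chunks
// in rank order (best first).
interface EvalCase {
  answer: string;
  retrievedChunks: string[];
}

// Fraction of cases where the answer appears in any of the top-k chunks.
function hitAtK(cases: EvalCase[], k: number): number {
  const hits = cases.filter(c =>
    c.retrievedChunks.slice(0, k).some(chunk => chunk.includes(c.answer))
  ).length;
  return hits / cases.length;
}

// Mean reciprocal rank: 1/(rank of first hit), 0 if no chunk contains it.
function meanReciprocalRank(cases: EvalCase[]): number {
  const total = cases.reduce((sum, c) => {
    const rank = c.retrievedChunks.findIndex(chunk => chunk.includes(c.answer));
    return sum + (rank >= 0 ? 1 / (rank + 1) : 0);
  }, 0);
  return total / cases.length;
}
```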

Run the eval before and after switching from voyage-3 to voyage-3.5. A lift in the 2–4% range on your own data is a reasonable expectation, but measure it rather than assume it.

Also test voyage-3.5-lite in parallel. In many internal wikis the quality difference is small enough that the lite version wins on cost.

### 6. Ship it safely

Production checklist:

  • Add retry logic with exponential backoff for the Voyage API
  • Implement embedding caching (Redis or database column) for repeated queries
  • Monitor token usage and cost per day
  • Set up a fallback to voyage-3 in case of any unexpected model behavior
  • Expose model name as a configurable parameter so you can A/B test lite vs full
  • Add observability (LangSmith, Helicone, or OpenTelemetry) to track retrieval latency and relevance

Deploy behind a feature flag so you can roll back the model switch instantly.
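For the first checklist item, a generic retry wrapper with exponential backoff and jitter is enough; the attempt count and delay schedule below are illustrative defaults, not values from the announcement:

```typescript
// Retries an async operation with exponential backoff plus jitter.
// Delay grows as baseDelayMs * 2^attempt, scaled by a random factor
// in [0.5, 1.0) to avoid thundering-herd retries.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 250
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts - 1) break;
      const delay = baseDelayMs * 2 ** attempt * (0.5 + Math.random() / 2);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

Embedding calls would then be wrapped as `withRetry(() => voyage.embed({ input: texts, model: "voyage-3.5" }))`.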

## Pitfalls and guardrails

  • Chunking is still king — even the best embedding model suffers if chunks are too large or split mid-sentence. Always validate chunk quality on a few sample documents.
  • Don’t forget metadata filtering — Voyage embeddings are powerful, but combining them with metadata (date, product, owner) usually gives bigger gains than model upgrades alone.
  • Eval set contamination — make sure your evaluation questions were never seen during development. Leakage is the #1 reason teams overestimate their improvement.
  • Over-relying on benchmarks — MTEB numbers are useful but your domain may differ. Always measure on your own data.
  • Cost creep — while per-token price is unchanged, better retrieval sometimes leads to more generous top-k or more frequent queries. Monitor spend for the first week.

## What to do next

Immediate checklist after your first deployment:

  1. Run the full eval suite and record voyage-3.5 vs voyage-3.5-lite numbers
  2. Instrument cost and latency dashboards
  3. Pick 10 real user queries and manually judge relevance
  4. Decide which model becomes default for your workload
  5. Add hybrid search (BM25 + voyage embeddings) as a future iteration
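When you get to the hybrid-search iteration, Reciprocal Rank Fusion (RRF) is a common, tuning-free way to merge BM25 and embedding rankings. The k = 60 constant is the usual convention from the RRF literature; this is a generic sketch, not Voyage-specific code:

```typescript
// Fuses multiple ranked lists of document IDs: each list contributes
// 1 / (k + rank + 1) to a document's score, and documents are returned
// in descending fused-score order.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, rank) => {
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}
```

Documents that rank well in both lists float to the top, which is why RRF is a solid default before investing in a trained reranker.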

The upgrade to voyage-3.5 is one of the highest-ROI moves available to RAG builders right now. The model is better, the price is the same, and the integration is trivial. Ship fast, measure, then iterate on chunking and metadata.


## Sources

All code snippets are starter templates. Check the official Voyage AI documentation and latest SDK reference for any parameter changes.

## Original Source

blog.voyageai.com
