Introducing the Batch API: Simpler and more efficient for large-scale workloads
Vibe Coding Guide · Mar 9, 2026 · 6 min read


Featured: Voyage AI

How to Build a Reliable Offline Embedding Pipeline with Voyage AI’s New Batch API

Why this matters for builders

Voyage AI just shipped a Batch API that makes large-scale embedding work dramatically simpler. Instead of managing queues, retries, rate limits, and partial failures yourself, you upload a JSONL file (up to 1 GB, 100K inputs), get a batch ID, and poll for completion within a 12-hour window. You also receive a 33% discount compared to the regular synchronous endpoint.

This change unlocks production-grade offline workloads that were previously painful: embedding entire documentation corpuses for RAG, running large evaluation suites, backfilling user profiles, or creating vector indexes for search. The async nature plus higher limits means you can now process millions of tokens in one shot without writing custom orchestration code.

When to use it

Use the Batch API when:

  • You have >5,000 embedding requests at once
  • You don’t need real-time responses
  • You want to avoid rate-limit headaches and retry logic
  • You are embedding static or slowly changing data (docs, historical logs, product catalogs)
  • Cost efficiency matters (33% cheaper)

Avoid it for chat, real-time search, or any latency-sensitive user-facing feature.

The full process – a vibe coder’s playbook

Here’s a repeatable workflow you can follow with Cursor, Claude, or any strong coding assistant. The goal is to ship a production-ready offline embedding job in under two hours.

1. Define the goal (10 minutes)

Write a one-paragraph spec before you touch any code.

Example goal: “Create a Python CLI + library that reads a directory of markdown files, splits them into chunks, sends them to Voyage AI’s Batch API for embedding using voyage-3, stores the resulting embeddings + metadata in a local LanceDB table, and provides a simple process_batch.py script that can be scheduled with cron or GitHub Actions.”

2. Shape the spec/prompt for your AI coding assistant

Use this starter prompt (copy-paste and adapt):

You are an expert Python engineer building production data pipelines.

We are using Voyage AI's new Batch API (announced Dec 2024).
Key facts from the announcement:
- POST /v1/embeddings/batch to create a batch
- Upload a JSONL file where each line is a valid embedding request object
- Batch can contain up to 100K inputs and 1 GB
- Returns a batch object with id and status
- Poll GET /v1/embeddings/batch/{batch_id} until status is "completed" or "failed"
- Download results from the output_file_url when complete
- 33% cheaper than synchronous calls
- 12-hour completion window

Task:
Build a complete, well-structured Python package with:
1. `voyage_batch.py` – core client wrapper that handles upload, polling, and result downloading with clean error handling and logging
2. `chunker.py` – simple markdown splitter that produces chunks with source metadata
3. `embed_pipeline.py` – end-to-end script that reads a docs/ folder, chunks, creates batch, waits, downloads, and writes to LanceDB
4. CLI entrypoint using Typer or Fire

Include retry logic with exponential backoff for polling, proper JSONL formatting, and a .env.example for VOYAGE_API_KEY.
Write clean, typed code with pydantic models where appropriate.

3. Scaffold the project

Start with this folder structure:

voyage-batch-pipeline/
├── pyproject.toml
├── .env.example
├── docs/
│   └── example.md
├── src/
│   ├── voyage_batch.py
│   ├── chunker.py
│   └── pipeline.py
├── tests/
└── process_batch.py

Let your coding assistant generate the pyproject.toml with voyageai, lancedb, pydantic, typer, and tqdm.
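As a starting point, a minimal pyproject.toml for this scaffold might look like the sketch below. Note that the code later in this guide also uses requests and tenacity, so they belong in the dependency list too; the version pins and the `process-batch` script entry are illustrative assumptions, not part of the announcement:

```toml
[project]
name = "voyage-batch-pipeline"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
    "voyageai",
    "lancedb",
    "pydantic>=2",
    "typer",
    "tqdm",
    "requests",
    "tenacity",
    "python-dotenv",
]

[project.scripts]
# Hypothetical entrypoint; adjust to wherever your Typer app object lives.
process-batch = "pipeline:app"
```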

4. Implement carefully

Core Batch Client (key snippet)

import json
from pathlib import Path
from typing import Dict, List, Optional

import requests
import voyageai
from pydantic import BaseModel
from tenacity import retry, stop_after_attempt, wait_exponential

class BatchRequest(BaseModel):
    input: str
    model: str = "voyage-3"
    # add truncation, etc. as needed

class VoyageBatchClient:
    def __init__(self, api_key: str):
        # Keep the key ourselves rather than reaching into the SDK client,
        # whose internals may change.
        self.api_key = api_key
        self.client = voyageai.Client(api_key=api_key)  # for synchronous calls
        self.base_url = "https://api.voyageai.com/v1"

    def _headers(self) -> Dict[str, str]:
        return {"Authorization": f"Bearer {self.api_key}"}

    def create_batch(self, batch_requests: List[Dict], output_path: Optional[Path] = None) -> str:
        # Don't name this parameter `requests` -- it would shadow the requests module.
        if output_path is None:
            output_path = Path("batch_input.jsonl")

        with open(output_path, "w") as f:
            for req in batch_requests:
                f.write(json.dumps(req) + "\n")

        # The official client may expose batch methods soon.
        # For now we use the raw endpoint as described in the announcement.
        with open(output_path, "rb") as f:
            response = requests.post(
                f"{self.base_url}/embeddings/batch",
                headers=self._headers(),
                files={"file": f},
                data={"model": "voyage-3"},
            )
        response.raise_for_status()
        batch_id = response.json()["id"]
        print(f"Batch created: {batch_id}")
        return batch_id

    @retry(wait=wait_exponential(multiplier=30, min=30, max=300), stop=stop_after_attempt(20))
    def get_batch_status(self, batch_id: str) -> Dict:
        # tenacity only retries transient failures here; the caller still loops
        # until status is "completed" or "failed".
        resp = requests.get(
            f"{self.base_url}/embeddings/batch/{batch_id}",
            headers=self._headers(),
        )
        resp.raise_for_status()
        return resp.json()

    def download_results(self, output_file_url: str, save_path: Path) -> List[Dict]:
        resp = requests.get(output_file_url)
        resp.raise_for_status()
        save_path.write_text(resp.text)
        # Parse the JSONL results into a list of result objects
        return [json.loads(line) for line in resp.text.splitlines() if line.strip()]

Note: Check the official Voyage AI Batch API reference for exact request/response shapes, as the client SDK may add convenience methods shortly.
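The chunker named in the scaffold can start as a small heuristic splitter. A minimal sketch, where the chunk size and heading handling are arbitrary starting choices rather than anything from the announcement:

```python
import re
from dataclasses import dataclass
from typing import List

@dataclass
class Chunk:
    text: str
    source: str   # originating file path
    heading: str  # nearest preceding markdown heading, "" if none

def chunk_markdown(text: str, source: str, max_chars: int = 1200) -> List[Chunk]:
    """Split markdown into heading-scoped chunks of at most max_chars characters."""
    chunks: List[Chunk] = []
    heading = ""
    buf: List[str] = []

    def flush() -> None:
        body = "\n".join(buf).strip()
        if body:
            # Hard-split oversized sections; a real splitter would prefer
            # paragraph or sentence boundaries here.
            for i in range(0, len(body), max_chars):
                chunks.append(Chunk(text=body[i:i + max_chars], source=source, heading=heading))
        buf.clear()

    for line in text.splitlines():
        if re.match(r"^#{1,6}\s", line):
            flush()
            heading = line.lstrip("#").strip()
        else:
            buf.append(line)
    flush()
    return chunks
```

Each chunk keeps its source path and heading, which later becomes the metadata stored alongside the embedding in LanceDB.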

5. Validate

Run a small test first:

python process_batch.py --input-dir ./docs --max-inputs 200

Validation checklist:

  • Batch is created successfully and returns a valid ID
  • Polling loop correctly detects "completed" state
  • Output file is downloaded and contains the same number of embeddings as inputs
  • Embeddings are non-zero and reasonable length (usually 1024 for voyage-3)
  • LanceDB table is created and queryable
  • Script handles network flakes gracefully (thanks to tenacity)

6. Ship it safely

  • Add logging with structlog or rich
  • Store batch IDs in a small SQLite or JSON file so you can resume or debug failed jobs
  • Add a --dry-run flag that only creates the input JSONL
  • Containerize with Docker if you plan to run this in CI or on a schedule
  • Set up GitHub Actions workflow that runs on changes to the docs/ folder
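For the batch-ID bookkeeping, a flat JSON ledger is often enough before reaching for SQLite. A minimal sketch — the file name and record fields here are illustrative choices:

```python
import json
import time
from pathlib import Path
from typing import Dict, List

LEDGER = Path("batch_ledger.json")

def record_batch(batch_id: str, status: str, ledger: Path = LEDGER) -> None:
    """Insert or update a batch record so failed jobs can be resumed or inspected."""
    entries: Dict[str, Dict] = json.loads(ledger.read_text()) if ledger.exists() else {}
    entries[batch_id] = {"status": status, "updated_at": time.time()}
    ledger.write_text(json.dumps(entries, indent=2))

def pending_batches(ledger: Path = LEDGER) -> List[str]:
    """Return IDs that never reached a terminal state."""
    if not ledger.exists():
        return []
    entries = json.loads(ledger.read_text())
    return [bid for bid, e in entries.items() if e["status"] not in ("completed", "failed")]
```

On startup, check `pending_batches()` before creating new work — resuming a 12-hour batch beats re-submitting and paying for it twice.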

Pitfalls and guardrails

  • Don’t fight the asynchrony. Accept the 12-hour window and design the pipeline around it.
  • JSONL formatting must be perfect — one malformed line fails the entire batch.
  • Always validate that the number of results matches the number of inputs.
  • Monitor costs even with the discount; 100K long documents can still add up.
  • The Batch API is async only — do not use it for anything that needs sub-minute latency.
  • Rate limits still apply to the batch creation and polling endpoints; be respectful.
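Since one malformed line can fail the entire batch, it pays to lint the JSONL before upload. A sketch — the required keys assumed here (`input`, `model`) should be checked against the API reference:

```python
import json
from pathlib import Path
from typing import List, Tuple

def lint_jsonl(path: Path, required_keys: Tuple[str, ...] = ("input", "model")) -> List[str]:
    """Return a list of problems; an empty list means the file is safe to upload."""
    problems: List[str] = []
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        if not line.strip():
            problems.append(f"line {lineno}: blank line")
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        for key in required_keys:
            if key not in obj:
                problems.append(f"line {lineno}: missing '{key}'")
    return problems
```

Wire this into the `--dry-run` flag so a bad file is caught locally instead of hours into the batch window.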

What to do next

  1. Run the pipeline on your full corpus
  2. Add metadata enrichment (document title, section, last updated date)
  3. Build a simple evaluation script that measures retrieval quality before/after the new embeddings
  4. Containerize and schedule the job weekly via GitHub Actions or Modal/Cron
  5. Explore combining with Voyage’s reranker for a complete RAG stack

The takeaway

This pattern — define goal → craft strong spec prompt → scaffold → implement core client → validate with small data → productionize — works for almost any new async AI API. Once you have the Voyage Batch pipeline working, you can reuse 80% of the orchestration code for future batch jobs from other providers.

Happy building.

Original Source

blog.voyageai.com
