Calculator

RAG Inference Cost calculator.

Estimate monthly inference cost for a RAG system in production.

How we calibrated this
We have used this model to estimate client RAG total cost of ownership (TCO) before architecture decisions.

Inputs

Tell us about your project.

This is a static reference card. For interactive calculators, talk to us — we tune the assumptions per client.

Queries per month

Range: 1,000 to 5,000,000 queries · Default: 50,000 queries

Avg input tokens per query

Range: 500 to 20,000 tokens · Default: 4,000 tokens

Avg output tokens per query

Range: 100 to 4,000 tokens · Default: 600 tokens

Model
  • Claude Haiku 4.5 · 0.2×
  • Claude Sonnet 4.6 · 1×
  • Claude Opus 4.7 · 4.5×
  • GPT mini · 0.25×
  • Self-hosted fine-tuned 7B · 0.05×

How it's calculated

The formula.

Tokens × per-token model price + retrieval costs

Output

Monthly inference cost

API + retrieval cost per month.

Output

Cost per query

Effective unit economics.

Output

Annual run-rate

12-month projection.
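
The formula and the three outputs above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation. It reads the model multipliers as price factors relative to Claude Sonnet 4.6 (1×), which is an assumption. The baseline per-token prices and the flat per-query retrieval cost are hypothetical placeholders, not vendor pricing and not figures from this card. The names estimate and Estimate are illustrative.

```python
from dataclasses import dataclass

# Multipliers from the model list, read as price factors relative to
# Claude Sonnet 4.6 (1x). That reading is an assumption.
MODEL_MULTIPLIER = {
    "Claude Haiku 4.5": 0.2,
    "Claude Sonnet 4.6": 1.0,
    "Claude Opus 4.7": 4.5,
    "GPT mini": 0.25,
    "Self-hosted fine-tuned 7B": 0.05,
}

# ASSUMPTION: placeholder baseline prices in USD per million tokens.
# These are NOT real vendor prices; replace them with your provider's rates.
BASE_INPUT_USD_PER_MTOK = 3.00
BASE_OUTPUT_USD_PER_MTOK = 15.00

# ASSUMPTION: placeholder flat retrieval cost per query
# (vector search, reranking, and so on), in USD.
RETRIEVAL_USD_PER_QUERY = 0.0005


@dataclass
class Estimate:
    monthly_usd: float    # Monthly inference cost (API + retrieval)
    per_query_usd: float  # Cost per query
    annual_usd: float     # Annual run-rate (12-month projection)


def estimate(
    queries_per_month: int = 50_000,
    input_tokens: int = 4_000,
    output_tokens: int = 600,
    model: str = "Claude Sonnet 4.6",
) -> Estimate:
    """Tokens x per-token model price + retrieval costs."""
    multiplier = MODEL_MULTIPLIER[model]
    # Token cost for one query at the baseline price, scaled by the model factor.
    token_usd = multiplier * (
        input_tokens * BASE_INPUT_USD_PER_MTOK
        + output_tokens * BASE_OUTPUT_USD_PER_MTOK
    ) / 1_000_000
    # Retrieval is added on top and is not scaled by the model multiplier.
    per_query = token_usd + RETRIEVAL_USD_PER_QUERY
    monthly = per_query * queries_per_month
    return Estimate(monthly, per_query, monthly * 12)


if __name__ == "__main__":
    print(estimate())
    print(estimate(model="Claude Haiku 4.5"))
```

Calling estimate() with no arguments uses the card's default inputs. The result is only as meaningful as the placeholder prices, so swap in real rates before reading anything into the numbers.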

Want a real estimate?

This is a band, not a quote.

For an estimate calibrated to your specific project, brief us. We get back to you within two business days.

Brief us on RAG

Got a specific project?

Brief us in three sentences. We'll send a tailored estimate.
