What is Inference?
The process of running an already-trained model to produce predictions or generations.
By Anish · Founder · Vedwix

Definition
Inference is the act of using a trained model to produce outputs, as distinct from training it. For LLMs, inference is what costs money in production: every API call, every chatbot response, every embedding lookup is an inference pass. Inference optimizations such as batching, KV caching, quantization, and speculative decoding can cut serving costs by 10x.
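Of those optimizations, KV caching is the easiest to picture: in autoregressive decoding, the key/value projections for past tokens are computed once and reused at every step, instead of reprocessing the whole prefix each time. A toy sketch (all names and the "projection" itself are illustrative, not any real framework's API):

```python
def project_kv(token):
    # Stand-in for the expensive per-token key/value projection.
    return (token * 2, token * 3)  # (key, value)

def decode_without_cache(prompt, steps):
    projections = 0
    tokens = list(prompt)
    for _ in range(steps):
        kv = [project_kv(t) for t in tokens]       # recomputed every step
        projections += len(kv)
        tokens.append(sum(v for _, v in kv) % 7)   # toy "next token" rule
    return tokens, projections

def decode_with_cache(prompt, steps):
    projections = 0
    tokens = list(prompt)
    cache = []
    for t in tokens:                               # prefill: once per prompt token
        cache.append(project_kv(t))
        projections += 1
    for _ in range(steps):
        tokens.append(sum(v for _, v in cache) % 7)
        cache.append(project_kv(tokens[-1]))       # only the new token
        projections += 1
    return tokens, projections
```

Both versions generate the same tokens, but the uncached loop does O(n) projection work per step while the cached one does O(1), which is why KV caching is table stakes for LLM serving.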
Example
An app makes 10 million LLM inference calls per month at an average of $0.001 each — $10k/month.
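The arithmetic above is easy to check, and worth extending to show what the 10x optimization figure claimed earlier would mean for this bill (numbers from the example; the savings line is a hypothetical):

```python
# Back-of-envelope monthly inference cost for the example above.
calls_per_month = 10_000_000
cost_per_call = 0.001  # USD per call, the example's average

monthly = calls_per_month * cost_per_call
optimized = monthly / 10  # hypothetical: a 10x cost reduction from serving optimizations

print(f"baseline:  ${monthly:,.0f}/month")
print(f"optimized: ${optimized:,.0f}/month")
```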
How Vedwix uses Inference in client work
Inference cost is a first-class consideration in our architecture decisions.
Building with Inference?
We ship this.
If you're building with Inference in production, we can help — from architecture review to full implementation.
Brief us