What is LLM-as-Judge?
Using one LLM to evaluate the outputs of another LLM (or itself) against criteria.
By Anish · Founder · Vedwix
Definition
LLM-as-judge is a scalable evaluation approach in which an LLM grades outputs against a rubric. It is much faster than human evaluation, but it introduces its own biases (position, length, and self-preference). Best practice: pair LLM-as-judge with human spot-checks, use rubrics that include worked examples, and validate the judge model itself against a human-labeled set.
Example
An eval harness uses Claude as a judge to grade 1,000 RAG answers on faithfulness, relevance, and citation correctness.
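A setup like the example above can be sketched in a few lines. This is a minimal illustration, not a production harness: `call_llm` is a stub standing in for a real model client, and the rubric wording, criteria names, and JSON schema are assumptions chosen to match the criteria mentioned above.

```python
# Minimal LLM-as-judge sketch. call_llm is a stub; in practice it would
# wrap a real model API (e.g. an Anthropic or OpenAI client).
import json

# Hypothetical rubric covering the three criteria from the example above.
RUBRIC = """You are grading a RAG answer. Score each criterion from 1 to 5:
- faithfulness: is every claim supported by the provided context?
- relevance: does the answer address the question?
- citation_correctness: do citations point to supporting passages?
Return JSON: {"faithfulness": int, "relevance": int, "citation_correctness": int}"""


def build_judge_prompt(question: str, context: str, answer: str) -> str:
    """Assemble rubric plus the item under evaluation into one prompt."""
    return (f"{RUBRIC}\n\nQuestion: {question}\n"
            f"Context: {context}\nAnswer: {answer}")


def parse_scores(raw: str) -> dict:
    """Parse the judge's JSON reply; reject out-of-range scores."""
    scores = json.loads(raw)
    for criterion, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{criterion} score {value} is outside 1-5")
    return scores


def call_llm(prompt: str) -> str:
    # Stub: a real implementation would send `prompt` to the judge model.
    return '{"faithfulness": 5, "relevance": 4, "citation_correctness": 5}'


def judge(question: str, context: str, answer: str) -> dict:
    """Grade one answer; run this over the corpus for a full eval round."""
    return parse_scores(call_llm(build_judge_prompt(question, context, answer)))
```

Keeping prompt assembly and score parsing as separate, testable functions also makes it easy to validate the judge itself: run `judge` over a labeled set and compare its scores to the human labels before trusting it at volume.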
How Vedwix uses LLM-as-Judge in client work
We use LLM-as-judge for high-volume eval rounds, with human review on a 5–10% sample.
Building with LLM-as-Judge?
We ship this.
If you're building with LLM-as-Judge in production, we can help — from architecture review to full implementation.
Brief us