Glossary · AI

What is LLM-as-Judge?

Using one LLM to evaluate the outputs of another LLM (or itself) against criteria.

By Anish · Founder · Vedwix

Definition

LLM-as-judge is a scalable evaluation approach where an LLM grades outputs against a rubric. It's much faster than human evaluation, but it introduces its own biases: position bias, length bias, and self-preference. Best practice: pair LLM-as-judge with human spot-checks, use rubrics with concrete examples, and validate the judge model itself against a human-labeled set.

Example

An eval harness uses Claude as a judge to grade 1,000 RAG answers on faithfulness, relevance, and citation correctness.
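The grading step in such a harness can be sketched in a few lines. This is a minimal illustration, not a real harness: the `judge` callable, the JSON reply format, and the criteria names are assumptions standing in for an actual LLM API call.

```python
import json

# Hypothetical rubric; a production rubric would include worked examples.
RUBRIC = """Score the ANSWER from 1 (poor) to 5 (excellent) on each criterion:
- faithfulness: every claim is supported by the provided context
- relevance: the answer addresses the question
Reply with JSON only, e.g. {"faithfulness": 4, "relevance": 5}."""

def build_judge_prompt(question: str, context: str, answer: str) -> str:
    """Assemble the grading prompt: rubric first, then the material to judge."""
    return f"{RUBRIC}\n\nQUESTION: {question}\nCONTEXT: {context}\nANSWER: {answer}"

def grade(question: str, context: str, answer: str, judge) -> dict:
    """Run one judged grading pass.

    `judge` is any callable prompt -> str; in practice it would wrap an
    LLM API call. Scores are parsed from JSON and clamped to the 1-5
    rubric range to guard against out-of-range judge output.
    """
    raw = judge(build_judge_prompt(question, context, answer))
    scores = json.loads(raw)
    return {k: max(1, min(5, int(v))) for k, v in scores.items()}
```

Parsing structured output and clamping scores matters more than it looks: judge models drift out of format under load, and a harness that silently accepts malformed grades corrupts the whole eval run.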

How Vedwix uses LLM-as-Judge in client work

We use LLM-as-judge for high-volume eval rounds, with human review on a 5–10% sample.
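Drawing that review sample reproducibly is a one-function job. The sketch below is a plain random sample with a fixed seed (the function name and seed are illustrative, not part of any specific tooling):

```python
import random

def spot_check_sample(record_ids, fraction=0.05, seed=0):
    """Draw a reproducible random sample of judged records for human review.

    A `fraction` of 0.05-0.10 matches the 5-10% review rate described
    above; fixing the seed keeps the sample stable across re-runs.
    """
    rng = random.Random(seed)
    k = max(1, round(len(record_ids) * fraction))
    return sorted(rng.sample(list(record_ids), k))
```

For skewed datasets, stratifying the sample by judge score (e.g. oversampling low-faithfulness grades) surfaces judge errors faster than uniform sampling.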

Building with LLM-as-Judge?

We ship this.

If you're building with LLM-as-Judge in production, we can help — from architecture review to full implementation.

Brief us

Working on an LLM-as-Judge project?

Brief Vedwix in three sentences or fewer.

Start a project