AnuRock
Gardener with a knack for plumbing bits

A pragmatic guide to LLM evals for devs

open.substack.com

TIL about evals, the LLM analogue of traditional unit/integration tests. Since running LLMs (for evaluation) in CI pipelines isn't cheap, it's good to prioritize test scenarios based on the top buckets of real-world user issues.
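Here is a minimal sketch of that prioritization in Python. It assumes user issues have already been tagged with a bucket label; the bucket names, the function names, and the `max_cases` budget are all hypothetical and not taken from the linked guide.

```python
from collections import Counter


def top_issue_buckets(issue_labels, k):
    """Return the k most frequent issue buckets, most common first."""
    return [bucket for bucket, _ in Counter(issue_labels).most_common(k)]


def select_eval_cases(cases_by_bucket, issue_labels, k, max_cases):
    """Pick eval cases from the top-k buckets until the CI budget is used up."""
    selected = []
    for bucket in top_issue_buckets(issue_labels, k):
        for case in cases_by_bucket.get(bucket, []):
            if len(selected) >= max_cases:
                return selected
            selected.append(case)
    return selected


# Hypothetical data: one label per reported user issue.
issue_labels = ["hallucination"] * 3 + ["formatting"] * 2 + ["latency"]

# Hypothetical eval case IDs, grouped by the bucket they exercise.
cases_by_bucket = {
    "hallucination": ["h1", "h2"],
    "formatting": ["f1"],
    "latency": ["l1"],
}

# Run only cases from the two biggest buckets, capped at two LLM calls.
cases_to_run = select_eval_cases(cases_by_bucket, issue_labels, k=2, max_cases=2)
```

The cap stands in for whatever cost limit the CI pipeline has; in practice it might be a token or dollar budget rather than a case count.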