AI Engineering Notes
Evaluation & Operations

Evals

Benchmark design, regression testing, and practical scorecards.

Use this page for offline evals, human review loops, and comparing changes across prompts, models, or workflows.
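As a rough illustration of what a practical scorecard for regression testing might look like, here is a minimal sketch that compares a baseline and a candidate (a changed prompt, model, or workflow) over the same fixed set of eval cases. The names `CaseResult` and `scorecard`, the pass/fail grading, and the sample results are all assumptions made for illustration; they are not taken from any particular tool or from measured data.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Outcome of one eval case under both variants (hypothetical structure)."""
    case_id: str
    baseline_pass: bool
    candidate_pass: bool


def scorecard(results):
    """Summarize a baseline-vs-candidate comparison over a fixed eval set."""
    total = len(results)
    if total == 0:
        raise ValueError("scorecard needs at least one eval case")
    # A regression is a case the baseline passed and the candidate now fails.
    regressions = [r.case_id for r in results
                   if r.baseline_pass and not r.candidate_pass]
    # A fix is a case the baseline failed and the candidate now passes.
    fixes = [r.case_id for r in results
             if not r.baseline_pass and r.candidate_pass]
    return {
        "total": total,
        "baseline_pass_rate": sum(r.baseline_pass for r in results) / total,
        "candidate_pass_rate": sum(r.candidate_pass for r in results) / total,
        "regressions": regressions,
        "fixes": fixes,
    }


# Hypothetical results for illustration only; not measured data.
example = [
    CaseResult("case-1", True, True),
    CaseResult("case-2", True, False),
    CaseResult("case-3", False, True),
    CaseResult("case-4", False, False),
]
card = scorecard(example)
print(card)
```

In this made-up sample the two pass rates are identical even though one case regressed and another was fixed, which is one reason a scorecard might list per-case regressions alongside aggregate rates.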