Experiments
Experiment 0001: Agentic Evals for Small Models
A compact evaluation suite for planning, tool choice, self-correction, and distractor resistance in smaller open models.
Published Work
Focused experiments driven by one hypothesis and a small set of key questions.