Notebook

0001 Agentic Evals Baseline Notebook

A Python notebook for prompt fixtures, scoring checks, and baseline observations for the first experiment.

2026-02-28 · Python notebook

0001 Agentic Evals Baseline

This notebook is the working scratchpad for the first experiment. It exists to keep the baseline logic simple, visible, and easy to rerun.

Python
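# Baseline settings for the first experiment (hard-coded for now).
prompt_count = 4
rubric_version = "draft-v0.1"

# Print the settings so each rerun shows what it was run against.
print(f"Prompts loaded: {prompt_count}")
print(f"Scoring rubric: {rubric_version}")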
Prompts loaded: 4
Scoring rubric: draft-v0.1

Immediate next steps

  • Add prompt fixtures.
  • Record baseline outputs.
  • Convert failures into explicit scoring checks.
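
The three steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual implementation: the PromptFixture shape, the run_baseline and score helpers, the two placeholder fixtures, and the stub_model stand-in are all hypothetical names introduced here, and the substring check is only one possible form an explicit scoring check might take.

```python
from dataclasses import dataclass


# Hypothetical fixture shape; the notebook does not define one yet.
@dataclass(frozen=True)
class PromptFixture:
    fixture_id: str
    prompt: str
    must_contain: tuple  # substrings a passing output is expected to include


def run_baseline(fixtures, model_fn):
    """Record one raw output per fixture, keyed by fixture id."""
    return {f.fixture_id: model_fn(f.prompt) for f in fixtures}


def score(fixtures, outputs):
    """Return, per failing fixture, the required substrings missing from its output."""
    failures = {}
    for f in fixtures:
        output = outputs.get(f.fixture_id, "")
        missing = [s for s in f.must_contain if s not in output]
        if missing:
            failures[f.fixture_id] = missing
    return failures


# Placeholder fixtures, for illustration only.
FIXTURES = [
    PromptFixture("p1", "Say hello.", ("hello",)),
    PromptFixture("p2", "Name a primary color.", ("red",)),
]


def stub_model(prompt: str) -> str:
    # Stand-in for a real model call: returns the prompt in lowercase.
    return prompt.lower()


baseline_outputs = run_baseline(FIXTURES, stub_model)
failures = score(FIXTURES, baseline_outputs)

print(f"Prompts loaded: {len(FIXTURES)}")
print(f"Failing fixtures: {failures}")
```

With the stub in place, the second fixture fails because its output never mentions "red", so it shows up in the failures mapping. Keeping fixtures, recorded outputs, and checks as separate plain data structures is one way to keep the baseline easy to rerun and inspect.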