0001 Agentic Evals Baseline
This notebook is the working scratchpad for the first experiment. It exists to keep the baseline logic simple, visible, and easy to rerun.
Python
prompt_count = 4  # number of prompt fixtures in the baseline set
rubric_version = "draft-v0.1"  # rubric is still a draft; bump on any change
print(f"Prompts loaded: {prompt_count}")
print(f"Scoring rubric: {rubric_version}")
Prompts loaded: 4
Scoring rubric: draft-v0.1
Immediate next steps
- Add prompt fixtures.
- Record baseline outputs.
- Convert failures into explicit scoring checks.
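The steps above could be sketched as follows. This is a minimal, hypothetical shape for the experiment, not the actual implementation: the fixture fields (`id`, `prompt`, `must_include`), the `score` function, and the stand-in baseline outputs are all assumptions made for illustration.

```python
# Hypothetical sketch of the next steps: prompt fixtures, recorded
# baseline outputs, and explicit scoring checks. Names are assumptions.

prompt_fixtures = [
    {"id": "p1", "prompt": "Summarize the changelog.", "must_include": "changelog"},
    {"id": "p2", "prompt": "List the failing tests.", "must_include": "test"},
]

def score(output: str, fixture: dict) -> bool:
    """Explicit scoring check: output must mention the fixture's keyword."""
    return fixture["must_include"] in output.lower()

# Baseline outputs recorded here as stand-in strings instead of model calls.
baseline_outputs = {"p1": "The changelog covers two fixes.", "p2": "No tests failed."}

fixtures_by_id = {f["id"]: f for f in prompt_fixtures}
results = {fid: score(out, fixtures_by_id[fid]) for fid, out in baseline_outputs.items()}
print(results)
```

A failure observed during the baseline run would then be converted into a new keyword (or stricter predicate) in the corresponding fixture, keeping each check explicit and rerunnable.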