by @plaited
This skill helps you evaluate CLI agent trajectories by capturing full runs and providing structured JSONL for downstream scoring.
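As a rough illustration of what downstream scoring over JSONL output might look like, here is a minimal Python sketch. The record shape (`id`, `trajectory`, and a step `type` of `tool_call`) is an assumption made up for this example and is not the skill's documented schema; the scoring rule is likewise a toy stand-in.

```python
import json


def load_jsonl(text):
    """Parse JSONL text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def count_tool_calls(record):
    """Toy score: count steps whose (hypothetical) type is 'tool_call'."""
    return sum(
        1
        for step in record.get("trajectory", [])
        if step.get("type") == "tool_call"
    )


# Hypothetical sample data; real captured runs will likely use other fields.
sample = "\n".join([
    json.dumps({"id": "run-1", "trajectory": [{"type": "message"}, {"type": "tool_call"}]}),
    json.dumps({"id": "run-2", "trajectory": [{"type": "tool_call"}, {"type": "tool_call"}]}),
])

records = load_jsonl(sample)
scores = {record["id"]: count_tool_calls(record) for record in records}
print(scores)
```

Because each line is an independent JSON object, a scorer like this can process one run at a time without loading a whole file into memory; swap in the actual field names once you have inspected real output.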