Your Agent Evals Are Lying to You

Most agent evals measure the clean path. Production readiness depends on the messy path: tools, time, retries, handoffs, stale state, trace evidence, and recovery.