Gradient Push

Agent Operations

Before giving your agent more access, use this checklist

A seven-question checklist for reviewing an AI agent before giving it more permission.

Issue #17

Self-Healing Tests Need an Operator, Not a Vibe

AI-assisted test repair can reduce maintenance toil, but only if teams define what may heal automatically, what requires review, and what evidence proves the test is still protecting the behavior users depend on.

AI

Your Agent Evals Are Lying to You

Most agent evals measure the clean path. Production readiness depends on the messy path: tools, time, retries, handoffs, stale state, trace evidence, and recovery.

Issue #16

Identity Is the Real Control Plane for Agents

If orchestration decides sequence, identity decides legitimacy: what an agent can do, for whom, under what authority, across which tenant boundary, and how operators recover when that authority breaks.

Issue #14

Open-Source Agent Frameworks: What's Worth Your Time

Use an escalation ladder, not a hype ladder: stay in plain code longer than the market wants you to, move to a workflow framework when state and recovery become real, and reach for multi-agent coordination only when the job genuinely needs it.

Issue #13

Persistent Agents Need an Ops Layer

Why long-running agents turn memory design into an ops problem, and what teams should govern before background workflows become invisible operational risk.

Issue #12

A2A Means Your Agent Stack Just Got Distributed

A2A turns agent-to-agent communication into a distributed-systems problem, with identity, task ownership, retries, trust, and failure handling now sitting on the critical path.

Issue #11

Your Agent Stack Is Only as Reliable as Its MCP Layer

MCP servers are becoming production dependencies for agent systems. How to inventory ownership, permissions, observability, and failure modes before they become hidden infrastructure risk.

newsletter

Model Sprawl Is the New Tech Debt

AI teams accumulate models faster than they build controls. How to manage model sprawl with registries, drift monitoring, rollbacks, and consolidation.

Issue #9

AI-Generated Code ≠ Safe Code

AI coding tools have genuinely made teams faster. The Harness 2026 State of DevOps report confirms it: AI coding adoption is up, velocity metrics are up, output is up. The same report notes that security and DevOps maturity haven't kept pace with the acceleration. More code is shipping,

Issue #8

When Your Gateway Has to Hold

A single agent handling predictable traffic is the easy case. Add a gateway, configure it correctly per Parts 1 and 2, and it works. The failure modes at scale are different in kind. An indirect prompt injection embedded in a document your agent was summarizing. A multi-agent workflow where

newsletter

Your Gateway Is Configured. Is It Working?

The gap between 'I added a gateway' and 'my gateway is actually working.' Four configuration decisions that separate coverage from false confidence.