Notes on shipping production AI.
Long-form essays on evals, agents, red-teaming, and the business of running an audit-first technical practice. Specific over vague. Audit-honest. No marketing fluff.
Most \”agents\” in 2026 are elaborate prompts
What real agent architecture actually requires — tool definitions, planning steps, trajectory observability, eval pipelines focused on end-to-end completion — versus what most teams ship and call 'agents.' Aprospective client showed me their "AI agent" in a discovery call last month. It was a system…
The eval pipeline is the product
A demo is a model. A product is a model + evals + fallback + cost cap + abuse detection + observability. The 5× effort to build the second is what determines whether it ships. Afriend sent me a Loom in March. Two minutes of…
Why I red-team every AI before launch
The math on pre-launch versus post-incident is brutal. Once a prompt injection vector lands on Twitter, three months of trust evaporate. So I red-team everything as a default, not an exception. ast year I ran an A1 audit for a Series B B2B SaaS launching…
Get new essays in your inbox.
One essay a month, roughly. No sequences, no marketing. Unsubscribe in one click.