Hire / Productized engagement
Agentic Workflow Design
I install the kind of agentic, LLM-backed workflow you keep meaning to build — the repetitive, judgment-light work in your org turned into a multi-step agent loop with a human gate, cost instrumentation, and a runbook. Fixed scope, shipped to production, handed off.
Most "AI automation" demos die in the gap between a prompt that works once and a workflow that runs every day without supervision. I live in that gap. This entire site is run by the agent workflows below — they enrich its content, answer questions over its corpus, draft its outreach, QA its games, and write essays about its own architecture, in public, with a person in the loop wherever it counts. The engagement is me installing that same machine in your stack, scoped to one workflow that earns its keep.
This isn't a slide deck — it's the site you're on
Six agentic workflows, already in production
A five-agent content pipeline
/research → Five specialized sub-agents (auditor, metadata, content, linker, blog) run over the site's corpus with retry, usage tracking, and prompt versioning; every write is human-approved before it reaches a page.
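Prompt versioning in a pipeline like this can be as simple as deriving a version id from the prompt text and stamping it on every run. A minimal Python sketch, assuming a content-hash scheme; `PromptVersion`, `tag_run`, and the template text are hypothetical and not taken from the site's code:

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    name: str
    template: str

    @property
    def version(self) -> str:
        # Short content hash: any edit to the template yields a new id.
        return hashlib.sha256(self.template.encode("utf-8")).hexdigest()[:12]


def tag_run(prompt: PromptVersion, output: str) -> dict:
    # Stamp each run so a reviewer can trace output back to the exact prompt.
    return {"prompt": prompt.name, "prompt_version": prompt.version, "output": output}


# Hypothetical templates for one sub-agent, before and after an edit.
v1 = PromptVersion("metadata", "Write a title and summary for: {page}")
v2 = PromptVersion("metadata", "Write a title and a one-line summary for: {page}")
record = tag_run(v1, "Title: ...")
```

Because the id is derived from the text, two runs with the same `prompt_version` are guaranteed to have used the same prompt, which keeps comparisons between runs honest.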
A cited answer engine over the corpus
/ask → A concierge agent loop calls read-only grounding tools (hybrid semantic + keyword search) and answers questions about the site with citations. The same loop powers the site-wide Cmd-K palette.
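One common way to combine a semantic ranking with a keyword ranking is reciprocal rank fusion. The sketch below assumes that technique purely for illustration; the page does not say how the site merges its results, and the document ids are made up:

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids into one ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for documents it ranks higher.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrievers.
semantic = ["essay-3", "essay-1", "game-2"]
keyword = ["essay-1", "faq-4", "essay-3"]
fused = reciprocal_rank_fusion([semantic, keyword])
```

Documents that both retrievers agree on rise to the top, and the fused ids double as the citation list handed back with the answer.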
A multi-agent essay org
/essays/strata → An agent organization researches, drafts, and cross-checks a long-form essay series, built in public and hydrated from committed runs so every claim is auditable.
An autonomous QA rig
/games → The game library is playtested nightly by an LLM that boots headless Chromium, screenshots every game on a cold mobile profile, and files an issue per regression, backed by a deterministic per-PR gate that blocks merges.
A read-only outreach pipeline
/hire → Lead intake → fit-scoring → an agent that researches each job with a guarded fetch tool → a drafted proposal. It all halts at needs_review, because the agents only read and a human presses send.
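The "halts at needs_review" rule can be pictured as a small state machine in which agents simply have no transition past the review stage. A sketch under stated assumptions: only `needs_review` comes from the description above, and the other stage names and the `advance` and `run_agents` helpers are hypothetical:

```python
from enum import Enum


class Stage(Enum):
    INTAKE = "intake"
    SCORED = "scored"
    RESEARCHED = "researched"
    NEEDS_REVIEW = "needs_review"  # drafted proposal waiting for a person
    SENT = "sent"


# The only transitions an agent may perform on its own.
AGENT_TRANSITIONS = {
    Stage.INTAKE: Stage.SCORED,
    Stage.SCORED: Stage.RESEARCHED,
    Stage.RESEARCHED: Stage.NEEDS_REVIEW,
}


def advance(stage: Stage, actor: str) -> Stage:
    if stage is Stage.NEEDS_REVIEW:
        if actor != "human":
            raise PermissionError("only a human can send a proposal")
        return Stage.SENT
    if stage in AGENT_TRANSITIONS:
        return AGENT_TRANSITIONS[stage]
    raise ValueError(f"no transition from {stage.value}")


def run_agents(stage: Stage) -> Stage:
    # Agents advance the lead until they run out of permitted transitions.
    while stage in AGENT_TRANSITIONS:
        stage = advance(stage, actor="agent")
    return stage


final = run_agents(Stage.INTAKE)
```

Here `final` is `Stage.NEEDS_REVIEW`: the send step is unreachable for an agent by construction, not by convention.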
The whole machine, documented
/architecture → A live system map of how every surface and agent on the site communicates: the building-in-public posture turned into a diagram you can read before you trust me with yours.
The engagement
Five phases, roughly four to six weeks
Map
Week 1
We find the one workflow worth automating first (repetitive, high-volume, judgment-light) and write down the oracle: what "good output" actually means, in examples, so the agent has something to be measured against. If nothing clears that bar, I tell you, and we stop here.
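An oracle written "in examples" can start as a list of inputs paired with the properties a good output must have. This is a minimal sketch that assumes a simple must-include check; real oracles usually need richer criteria, and `OracleExample`, `pass_rate`, and the sample inputs are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class OracleExample:
    input_text: str
    must_include: tuple[str, ...]  # phrases a good output has to contain


def pass_rate(workflow: Callable[[str], str], oracle: list[OracleExample]) -> float:
    """Fraction of oracle examples whose output contains every required phrase."""
    if not oracle:
        raise ValueError("an empty oracle cannot measure anything")
    passed = 0
    for example in oracle:
        output = workflow(example.input_text).lower()
        if all(phrase.lower() in output for phrase in example.must_include):
            passed += 1
    return passed / len(oracle)


# Made-up examples and a stand-in workflow that only echoes its input.
oracle = [
    OracleExample("refund request for order 1182", ("refund", "1182")),
    OracleExample("shipping delay complaint", ("apology",)),
]


def echo(text: str) -> str:
    return f"Summary: {text}"


rate = pass_rate(echo, oracle)
```

The echo workflow passes the first example and fails the second, so `rate` is 0.5. A workflow with no examples to score against is exactly the case where the engagement stops.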
Design
Week 1–2
The agent topology on paper: which steps are sub-agents, which tools each one gets, where the human gates sit, and what model tier each step needs so cost tracks value instead of defaulting to the biggest model everywhere. You approve the shape before any build.
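A topology "on paper" can also be written as data and checked before anything is built. The sketch below assumes hypothetical step names, tool names, and tier labels; it shows one useful check, that any step holding a write tool also has a human gate:

```python
from dataclasses import dataclass

# Hypothetical names for tools that change something outside the agent.
WRITE_TOOLS = {"publish_page", "send_email"}


@dataclass(frozen=True)
class Step:
    name: str
    model_tier: str  # e.g. "small", "medium", "large" (illustrative labels)
    tools: frozenset[str]
    human_gate: bool = False


def validate(topology: list[Step]) -> list[str]:
    """Return one message per step that can write without a human gate."""
    problems = []
    for step in topology:
        ungated = step.tools & WRITE_TOOLS
        if ungated and not step.human_gate:
            problems.append(f"{step.name}: write tools {sorted(ungated)} need a human gate")
    return problems


topology = [
    Step("research", "small", frozenset({"search_corpus"})),
    Step("draft", "medium", frozenset({"search_corpus"})),
    Step("publish", "small", frozenset({"publish_page"}), human_gate=True),
]
problems = validate(topology)
```

An empty `problems` list is what gets approved; a design where a step can write without a gate fails review before any build time is spent on it.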
Build
Week 2–4
The loop, for real, in your stack: tool-calling, structured outputs, retries and backoff, usage tracking, and an eval harness against the oracle from week 1. It runs end to end on your data before anyone calls it done.
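Retries, backoff, and usage tracking tend to live in one wrapper around every model call. A minimal sketch, assuming the call returns its text plus token counts; `TransientError`, `Usage`, and `call_with_retry` are hypothetical names, and a real client would raise its own error types:

```python
import random
import time
from dataclasses import dataclass


@dataclass
class Usage:
    calls: int = 0
    input_tokens: int = 0
    output_tokens: int = 0


class TransientError(Exception):
    """Stand-in for a retryable failure such as a rate limit or timeout."""


def call_with_retry(call, usage, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        usage.calls += 1  # count every attempt, including failed ones
        try:
            text, tokens_in, tokens_out = call()
        except TransientError:
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter: 0.5s, 1s, 2s, ... plus up to 50%.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
            continue
        usage.input_tokens += tokens_in
        usage.output_tokens += tokens_out
        return text


# Usage: a fake call that fails twice, with sleeps recorded instead of waited.
attempts = {"n": 0}


def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientError("rate limited")
    return "ok", 120, 30


usage = Usage()
delays = []
result = call_with_retry(flaky_call, usage, sleep=delays.append)
```

Passing `sleep` in as a parameter keeps the wrapper testable without real waiting, and the `Usage` object is what feeds per-run cost reporting.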
Gate & ship
Week 4–5
The part most demos skip: a human-review surface so nothing ships unattended, guardrails that keep agents read-only and route every write through a single audited path, and the monitoring to know when it drifts.
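A "single audited path" means agents can only propose changes, and one component both applies approved changes and records every decision. A sketch under stated assumptions: `Proposal`, `AuditedWriter`, and the in-memory `store` are hypothetical stand-ins for whatever your stack actually writes to:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Proposal:
    target: str
    new_content: str
    proposed_by: str


@dataclass
class AuditedWriter:
    store: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def apply(self, proposal: Proposal, approved_by: Optional[str]) -> bool:
        approved = bool(approved_by)
        # Every decision is logged, whether or not the write happens.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "target": proposal.target,
            "proposed_by": proposal.proposed_by,
            "approved_by": approved_by,
            "applied": approved,
        })
        if approved:
            self.store[proposal.target] = proposal.new_content
        return approved


writer = AuditedWriter()
draft = Proposal("pricing-page", "Updated copy", proposed_by="content-agent")
rejected = writer.apply(draft, approved_by=None)
accepted = writer.apply(draft, approved_by="reviewer")
```

The unapproved proposal leaves `store` untouched but still shows up in `audit_log`, so rejected writes stay visible when you later look for drift.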
Hand off
Close
A runbook, an architecture doc your team can own, and a walkthrough call. Optionally, a published case study held to the same bar as the essays on this site, so your win becomes a credible artifact you can point at.
What you walk away with
- A working agentic workflow running in production in your stack
- A human-in-the-loop review gate — nothing ships unattended
- Cost instrumentation: model tiering + per-run usage tracking
- An eval harness tying output to a written definition of "good"
- A runbook and an architecture doc your team can maintain
- A walkthrough call — and, if you want it, a public case study
An honest read on fit
This is for you if
- You have a repetitive, text-shaped workflow eating real hours every week
- You've prototyped something with an LLM and stalled before it was dependable
- You want it owned by your team after, not rented from a consultant forever
- You'd rather ship one workflow that works than a roadmap of ten that don't
It's not, if
- You need a chatbot bolted onto a marketing site (a fine project, but not this engagement)
- The "workflow" has no examples of a right answer to measure against
- You want a fully autonomous system with no human in the loop — I won't build that
- It's really a data or infra project with AI sprinkled on top to raise a round
How I build it
AI where it earns its place
Every step gets the cheapest model that clears the bar, and any step a script does better stays a script. No AI theater — the goal is the work getting done, not the agent count.
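"The cheapest model that clears the bar" becomes a mechanical choice once each tier has been scored against the oracle. A minimal sketch; the tier names, costs, and pass rates below are invented for illustration and are not measurements:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TierResult:
    tier: str
    cost_per_run_usd: float
    pass_rate: float  # fraction of oracle examples passed


def cheapest_passing(results: list[TierResult], bar: float) -> Optional[str]:
    """Return the lowest-cost tier whose pass rate meets the bar, or None."""
    passing = [r for r in results if r.pass_rate >= bar]
    if not passing:
        return None  # nothing clears the bar: rethink the step or use a script
    return min(passing, key=lambda r: r.cost_per_run_usd).tier


# Illustrative numbers only.
results = [
    TierResult("small", 0.002, 0.81),
    TierResult("medium", 0.010, 0.93),
    TierResult("large", 0.060, 0.95),
]
choice = cheapest_passing(results, bar=0.90)
```

With these made-up figures `choice` is "medium": the large tier also passes, but its extra two points of pass rate cost six times as much per run.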
A human gate, always
Agents read; a person (or an audited single-writer path) decides. It's how this site ships daily without shipping nonsense, and it's non-negotiable in what I build for you.
Yours when it's over
You get the code, the docs, and the runbook — not a dependency on me. The best outcome is your team extending it without a call.
Questions
What exactly is an "agentic workflow"?
A multi-step process where an LLM (or several specialized ones) calls tools, makes decisions, and produces structured output — not a single prompt, but a loop with retries, evals, and a human gate. The enrichment pipeline that writes this site's content is the canonical example: five sub-agents, each with a job, feeding a human-approved publish step.
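Stripped to its core, that loop looks something like the sketch below. It assumes a `decide` function standing in for the model call and a dictionary of read-only tools; all names here are hypothetical, and a production loop adds structured outputs, retries, evals, and the human gate around it:

```python
def run_loop(decide, tools, question, max_steps=5):
    """decide(question, observations) returns ("call", tool, arg) or ("answer", text)."""
    observations = []
    for _ in range(max_steps):
        action = decide(question, observations)
        if action[0] == "answer":
            return {"answer": action[1], "observations": observations}
        _, tool_name, arg = action
        if tool_name not in tools:
            # Report the mistake back to the model instead of crashing.
            observations.append((tool_name, "error: unknown tool"))
            continue
        observations.append((tool_name, tools[tool_name](arg)))
    # Step budget exhausted without an answer: hand off rather than guess.
    return {"answer": None, "observations": observations}


def scripted_decide(question, observations):
    # Stand-in for a model call: search once, then answer from the result.
    if not observations:
        return ("call", "search", question)
    return ("answer", f"Based on {observations[-1][1]}")


tools = {"search": lambda q: f"1 result for '{q}'"}
result = run_loop(scripted_decide, tools, "pricing")
```

The step budget is the important design choice: a loop that cannot finish returns no answer and its observations, which is something a reviewer can act on.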
How long does it take, and what does it cost?
The build is scoped to roughly four to six weeks for one workflow. Cost depends on the workflow's shape — we settle it on a call once I understand the scope, not from a price tag on a page. If it isn't a fit, that call is still free and you leave with an honest read.
Will I be locked into you afterward?
No. You get the code, an architecture doc, and a runbook — the deliverable is your team owning it. The best outcome is you extending the workflow without calling me.
Do you build fully autonomous agents with no human in the loop?
No — and I'll push back if you ask. Every workflow I ship routes its writes through a human (or an audited single-writer path). It's why this site ships daily without shipping nonsense, and it's the difference between a demo and something you can trust in production.
Which models and stack do you use?
Whatever fits your stack and budget. I default to the latest Claude models and tier them per step so cost tracks value, but the engagement is about the workflow design — the loop, the tools, the gates, the evals — not a lock-in to one vendor.
Have a workflow in mind?
Tell me the repetitive thing in a sentence or two. I'll give you an honest read on whether it's a fit for an agentic workflow — and whether I'm the right person to build it — usually within a day.
Start a conversation →
Watch the machine that sells this
An occasional note when something genuinely new ships here — often a new agent workflow running on the site itself. No schedule, no filler, easy out.