I built a podcast production line that pauses exactly once
A voice memo goes in; an edited, captioned, rule-checked episode waits at a human gate. A build log on the new episode production line: an AI editor hired onto a one-episode contract, a free ffmpeg-and-Whisper floor standing behind it, two QA rungs, and a trust posture that never believes an unsigned callback. Plus what a line like this is for, and when you should not build one.
Publishing a podcast episode by hand is about eleven chores pretending to be one. Record, level the audio, cut the filler words, transcribe, caption, run the checks, upload, stamp the feed, cut the promo clips. This week I finished a production line that does the middle of that list from a single voice memo, and it pauses exactly once: to ask a human whether the episode is any good. Here is how it works, what it costs to run, and how to tell whether you need one.
Eight stops, one pause
The first stop takes whatever the phone recorded and makes it broadcast-shaped. It validates the file, then normalizes loudness to the standard podcast window; my 26-minute test recording landed at −19.45 LUFS against a −19 target, which is the kind of number an app like Overcast expects and a laptop microphone never produces on its own.
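A rough sketch of how a loudness check like this could be scripted, using ffmpeg's `loudnorm` filter in measurement mode. This is not the pipeline's actual code: the function names, the true-peak and loudness-range settings, and the ±1 LU tolerance are assumptions for illustration. Only the −19 LUFS target comes from the text above.

```python
import json
import subprocess

TARGET_LUFS = -19.0   # target named in the post
TOLERANCE_LU = 1.0    # assumed half-width of the window; the post does not state it


def loudnorm_measure_cmd(path: str) -> list[str]:
    """Build an ffmpeg command that measures loudness and writes no audio."""
    return [
        "ffmpeg", "-hide_banner", "-nostats", "-i", path,
        # TP and LRA values here are assumed, not taken from the post.
        "-af", f"loudnorm=I={TARGET_LUFS}:TP=-1.5:LRA=11:print_format=json",
        "-f", "null", "-",
    ]


def parse_input_i(stderr_text: str) -> float:
    """Pull the measured integrated loudness out of loudnorm's JSON report.

    loudnorm prints a flat JSON object at the end of stderr, so the last
    '{' ... '}' pair is the report.
    """
    start = stderr_text.rindex("{")
    end = stderr_text.rindex("}") + 1
    return float(json.loads(stderr_text[start:end])["input_i"])


def within_window(measured_lufs: float) -> bool:
    """True when the measurement sits inside the assumed target window."""
    return abs(measured_lufs - TARGET_LUFS) <= TOLERANCE_LU


def measure(path: str) -> float:
    """Run ffmpeg on a file and return its integrated loudness in LUFS."""
    proc = subprocess.run(
        loudnorm_measure_cmd(path), capture_output=True, text=True, check=True
    )
    return parse_input_i(proc.stderr)
```

Under these assumptions, the −19.45 LUFS reading from the test recording would pass, since it is 0.45 LU from the target.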
The second stop is the interesting one: an AI editor removes the ums, cleans up the room tone, and tightens any silence longer than two seconds. More on its employment terms below.
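To make the silence rule concrete, here is a minimal sketch of how a local pass might plan its cuts from word timestamps. It is an illustration only, not how the hosted editor works. The `Word` type, the function name, and the choice to leave one second of each long pause are assumptions; only the two-second threshold comes from the text.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    start: float  # seconds
    end: float    # seconds


MAX_SILENCE_S = 2.0  # threshold from the post
KEEP_S = 1.0         # assumed: how much of a long pause survives the cut


def silence_cuts(words: list[Word]) -> list[tuple[float, float]]:
    """Return (cut_start, cut_end) ranges that shrink each long gap to KEEP_S.

    Half of the kept pause stays after the previous word and half before
    the next one, so neither word gets clipped.
    """
    cuts = []
    pad = KEEP_S / 2
    for prev, nxt in zip(words, words[1:]):
        gap = nxt.start - prev.end
        if gap > MAX_SILENCE_S:
            cuts.append((prev.end + pad, nxt.start - pad))
    return cuts
```

Note that the function only removes time between words. It never touches the words themselves, which is the same boundary the edit recipe draws.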
Captions come from Whisper, a speech-to-text model that runs on my own build machines. A 30-minute episode transcribes in about seven minutes of free CI time, and the result becomes the caption files players use, not just a transcript in a drawer.
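Turning a transcript into a caption file is mostly timestamp formatting. The sketch below writes WebVTT, one common caption format, and assumes each segment is a dict with `start`, `end`, and `text` keys. That shape and the function names are assumptions; the post does not say which caption formats or segment structure the line uses.

```python
def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def to_webvtt(segments: list[dict]) -> str:
    """Render transcript segments as the text of a WebVTT caption file."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{vtt_timestamp(seg['start'])} --> {vtt_timestamp(seg['end'])}")
        lines.append(seg["text"].strip())
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)
```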
Then two layers of quality assurance. A deterministic layer re-measures the encoded audio (spec, loudness window, caption coverage) and fails loudly on any miss. A judgment layer has a language model read the full transcript against my published-content rules and the episode brief, and it is built to overblock: anything malformed or borderline parks the episode rather than passing it.
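The two layers can be sketched as one function that lists measurable misses and one that fails closed on the model's reply. Everything named here is hypothetical: the `Measured` fields, the 95% coverage floor, the ±1 LU tolerance, and the `verdict` / `violations` reply shape are assumptions chosen to show the overblocking idea, not the pipeline's real schema.

```python
import json
from dataclasses import dataclass


@dataclass
class Measured:
    loudness_lufs: float
    duration_s: float
    captioned_s: float  # seconds of audio covered by caption cues


def deterministic_failures(
    m: Measured,
    target: float = -19.0,
    tolerance: float = 1.0,      # assumed window
    min_coverage: float = 0.95,  # assumed floor
) -> list[str]:
    """Re-check measured facts; any entry in the result fails the episode."""
    failures = []
    if abs(m.loudness_lufs - target) > tolerance:
        failures.append("loudness outside window")
    coverage = m.captioned_s / m.duration_s if m.duration_s > 0 else 0.0
    if coverage < min_coverage:
        failures.append("caption coverage too low")
    return failures


def judgment_passes(raw_reply: str) -> bool:
    """Fail closed: only a well-formed, explicit, clean pass gets through."""
    try:
        reply = json.loads(raw_reply)
    except (json.JSONDecodeError, TypeError):
        return False  # malformed output parks the episode
    if not isinstance(reply, dict):
        return False
    return reply.get("verdict") == "pass" and reply.get("violations") == []
```

The asymmetry is the point of the second function: there are many ways to return `False` and exactly one way to return `True`.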
Only then does a person enter. I listen to the edited cut against the raw one and approve or reject. Nothing in this line publishes anything on its own; the gate is the product, not a compliance sticker.
The editor is on a one-episode contract
The edit pass runs on Descript, a hosted AI audio editor, at $24 a month. Before spending a dollar I built the fallback: an editing floor made of ffmpeg and Whisper that costs nothing and will run forever. The vendor only earns its seat if the edited cut beats the raw cut at the human gate on a real episode. If it loses, the subscription dies and the floor takes over the stop. Same shape either way, one swapped part.
The edit happens in a vendor's cloud, but the recipe that made it lives in my repo.
That recipe is a versioned prompt committed next to the code. It tells the editor to remove filler, apply studio sound, and tighten silences, and it explicitly prohibits rewriting, reordering, or synthesizing any speech. The editor tightens the episode; it does not get to change what I said. Every job also reports its own meter reading (media minutes and AI credits consumed), which the pipeline writes into a per-episode manifest, so the whole lane stays inside a $50-a-month ceiling I can audit from the repo.
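One way a per-episode manifest and a ceiling check could fit together is sketched below. The field names and the `extra_usd` overage column are hypothetical, since the post does not say how media minutes or credits convert to dollars. The $24 subscription and $50 ceiling are the figures from the text.

```python
import json
from dataclasses import asdict, dataclass

MONTHLY_CEILING_USD = 50.00  # ceiling from the post
SUBSCRIPTION_USD = 24.00     # vendor subscription from the post


@dataclass
class JobUsage:
    episode: str
    media_minutes: float
    ai_credits: int
    extra_usd: float  # hypothetical: any metered charge beyond the subscription


def manifest_json(usage: JobUsage) -> str:
    """Serialize one job's meter reading for committing next to the episode."""
    return json.dumps(asdict(usage), indent=2, sort_keys=True)


def month_spend(jobs: list[JobUsage]) -> float:
    """Subscription plus any metered extras recorded in the manifests."""
    return SUBSCRIPTION_USD + sum(job.extra_usd for job in jobs)


def under_ceiling(jobs: list[JobUsage]) -> bool:
    """True while the month's recorded spend stays at or under the ceiling."""
    return month_spend(jobs) <= MONTHLY_CEILING_USD
```

Because the manifests are plain files in the repo, the audit is a read over committed data rather than a trip to a vendor dashboard.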
Trust the vendor, not its doorbell
One design decision is worth pulling out, because it applies to any pipeline that waits on someone else's cloud. When the editor finishes a job it can ring you back with a callback, a small automatic message saying "done." Descript's callbacks are unsigned, which means anyone who guesses the address could ring the same bell. So the pipeline treats a callback as a doorbell and nothing more: it may wake the runner up, but the only thing that completes a job is the pipeline calling the vendor's API itself, with its own credentials, and reading the answer.
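The doorbell pattern is small enough to sketch in two functions. The payload keys (`job_id`, `state`) and the injected `wake` and `fetch_status` callables are hypothetical stand-ins, not Descript's actual API. In a real pipeline, `fetch_status` would be whatever authenticated client call reads the job from the vendor.

```python
from typing import Callable


def handle_callback(
    payload: dict,
    pending: set[str],
    wake: Callable[[str], None],
) -> None:
    """Treat an unsigned callback as a hint only.

    Nothing in the payload is believed. At most, it names a job we already
    know is pending, and that job gets checked sooner.
    """
    job_id = payload.get("job_id")
    if isinstance(job_id, str) and job_id in pending:
        wake(job_id)


def settle_job(job_id: str, fetch_status: Callable[[str], dict]) -> bool:
    """A job completes only when our own authenticated read says so."""
    status = fetch_status(job_id)
    return status.get("state") == "done"
```

A forged callback can therefore cost one extra status read at worst. It cannot mark a job finished, because `handle_callback` never looks at the claimed state.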
What a line like this is for
Strip away the podcast specifics and the pattern is: a recurring recording goes in, a checked, captioned, publishable artifact comes out, and a human approves each unit. That fits more workflows than mine. A weekly show, obviously. Course lectures becoming captioned modules. A church or meetup publishing every talk without volunteering someone into an editing job. Client interviews turning into searchable, quotable transcripts. Internal briefings that people can actually listen to. The stops change; the shape (machine steps, one AI pass on a fixed recipe, one human gate) carries over.
What I am using it for
This line exists to feed The Legibility Desk, my critique podcast, where every episode is already a cited, verifiable page and the audio has been the missing half. It sits alongside Between Systems, the interview show, and everything it produces eventually joins the site radio, the continuous player over this site's own narrated audio. The pipeline itself follows the same house pattern as the workflow engine I wrote about in June: AI does the volume, gates do the judgment, and the human stays load-bearing.
When I would not build this
If you publish occasionally rather than on a cadence, skip all of it and edit by hand in any decent app; a production line pays for itself through repetition or not at all. If you do not already have somewhere for files to live and jobs to run, a hosted end-to-end tool is the honest choice, because the storage and the runners are most of the plumbing. And if your show is multi-speaker interviews, this exact line rejects your files on purpose; separating voices is a different problem, and mine is deferred until the solo line has earned its keep.
If your team has a pile of recordings and a publishing chore, this is a thing I build. Email me at jake@jakelawrence.xyz with what goes in and what should come out, and I will tell you honestly whether you need a pipeline or an afternoon of ffmpeg.