Content Engine — Cookbook

What this builds

From posting when you remember to a pipeline that never ships weak

The engine runs every weekday at 17:00 with zero input required. It mines the last 7 days of your actual thinking, picks the single strongest moment, drafts it against a measured voice fingerprint, fights it through an adversarial QA loop — and delivers one Slack message you answer with one word.

Before

Content depends on the founder remembering to post. The best ideas — said out loud in meetings, dictated into tools, typed into a coding agent — die where they happened. Drafts that do get written sound like a generic ghostwriter, and a multi-draft experiment just overwhelmed the review channel until it got ignored.

After

Every weekday: one draft, mined from real work, in a voice calibrated on the actual corpus, scored by a separate judge agent against posts that demonstrably went viral. Below the bar, it gets reworked or honestly skipped. Plus a Sunday newsletter, a nightly performance sweep, and a monthly voice recalibration.

A · Source mining

The day, captured

Dictation history (all-day voice)
Claude Code prompts (build thinking)
Obsidian build-log + meetings
Call transcripts + briefings
Inspiration corpus (cookbook 04)

B · Voice + drafting

Sound like you, measurably

Voice fingerprint from the corpus
5-pattern hook taxonomy
3 content pillars, tagged
10-point style check, auto-fails
One CTA per draft, never bait

C · QA + learning loops

The quality firewall

Separate judge agent, own context
7 weighted dimensions, hard gates
Rework loop + stall-escape ladder
7-day performance sweep
Monthly recalibration from outcomes

The evolution

From raw posting to a self-correcting engine

This wasn't designed on a whiteboard. Each stage exists because the previous one hit a wall — the timeline below is the build order to copy.

Raw posting era — 7 months by hand

42 LinkedIn posts shipped manually over 7 months, ~6 per month, median 1,503 characters. No system yet — but those 42 posts became the calibration set everything else is built on. You can't calibrate a voice you haven't published.

The corpus export

LinkedIn's data export, parsed into a structured JSON file plus a human-readable markdown file: 42 posts with dates and lengths, plus a 30,000-line DM history flagged as the unguarded-voice sample. First wall discovered immediately: the Basic export only contains posts with uploaded media, so text-only posts are invisible (the Complete archive is the fix).

Voice calibration — measured, not guessed

A voice-fingerprint file derived from the 42 posts, cross-checked against 60 YouTube video transcripts and 184 top-decile inspiration pieces: sentence length, numbers-per-post, paragraph rhythm, a 5-pattern hook taxonomy, vocabulary tiers, CTA patterns — every claim with a count behind it (full method below).

Themes + entertainment patterns

A weekly active-theme.md file that biases the engine toward a 5-day narrative arc (built for a dogfooding sprint where every business action doubled as content capture), and 11 entertainment patterns mined from the top-decile corpus — every draft must carry at least one (surprise, reversal, self-deprecation, punchy contrarian line, vivid detail) or it gets rewritten.

The v1.1 rebuild — mine the day, judge with a second head

Three corrections at once. The voice-debrief trigger was retired (it depended on daily input that never came, and its retry crons spammed the channel) — the engine now mines the day unconditionally. Dictation history and Claude Code prompts became the highest-priority sources. Output dropped to one draft per day on a rolling 7-day window.

And the QA gate arrived: a genuinely separate judge agent with its own context, scoring every draft on a 7-dimension rubric against viral benchmark anchors. Proven the same week in a logged demo: a competent draft scored 5.1 (FAIL); three surgical fixes later it passed at 8.6.

The inspiration frameworks as the floor

A full re-sync of the ~44-creator corpus via the inspiration engine (cookbook 04), then a teardown of the three benchmark LinkedIn creators (Welsh, Martell, Latka) and the top YouTube hooks into a frameworks file. Wired into drafting and QA as a hard rule: a draft weaker than the median top post for its pillar gets rewritten — the median is the floor, not the ceiling.

Ground-truth voice + human feedback loop

The voice file was re-derived with a hierarchy flip: all-day dictation history (70,844 tokens) is ground truth for the thinking voice; published posts are the edit constraint, not the source. Words the first pass had dismissed as wishful turned out to be frequent in real dictation and went back in. A monthly recalibration cron re-derives the whole file, and a trusted editor's thread feedback now auto-revises drafts, with no approval gate, between the draft landing and the final ship call.

Day-to-day use

One Slack message in, one word back

The owner's entire daily workload: read one draft at 17:00, then reply "ship it", "edit: <changes>" or "kill: <reason>" in the thread. Everything below happens unattended.
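The reply protocol is small enough to parse mechanically. A minimal sketch, assuming replies arrive as plain text; the `Reply` type and `parse_reply` function are hypothetical names, not part of the original build:

```python
from dataclasses import dataclass


@dataclass
class Reply:
    action: str  # "ship", "edit", "kill", or "unknown"
    detail: str  # requested changes or kill reason; raw text when unknown


def parse_reply(text: str) -> Reply:
    """Map a thread reply onto the three-verb protocol."""
    cleaned = text.strip()
    lowered = cleaned.lower()
    # Tolerate trailing punctuation such as "Ship it!"
    if lowered.rstrip(".!") == "ship it":
        return Reply("ship", "")
    for verb in ("edit", "kill"):
        prefix = verb + ":"
        if lowered.startswith(prefix):
            return Reply(verb, cleaned[len(prefix):].strip())
    # Anything else is left for a human to interpret
    return Reply("unknown", cleaned)
```

Anything that does not match one of the three verbs comes back as "unknown" rather than being guessed at, which keeps an ambiguous reply from shipping or killing a draft by accident.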

Step	What the engine does
1 · Refresh corpus	Re-pulls the Airtable inspiration tables, applies the quality floor (videos: ≥1,000 views and ≥50 likes; posts: ≥100 likes), dedupes, keeps the top 10% per creator — so new creators and fresh winners flow in automatically.
1.5 · Check theme	If an active-theme file exists and hasn't expired, it overrides source weighting and sets the week's pillar and format.
2 · Pull the day	Rolling 7-day window across sources, priority-ranked: dictation history and Claude Code prompts highest, then build-log, meetings, intentions, content-ideas, call transcripts, briefings, inbox log. Already-used moments are deduped via source hashes stored with every shipped post.
3 · Score & pick ONE	Every candidate moment scored 1–10 on four axes: specificity, pillar fit (Transition > System > Contrarian), voice authenticity, pipeline potential. Only the single highest-scoring moment gets drafted. The runner-up is held, never posted.
4 · Draft	Against the voice file: hook from the taxonomy, pillar tagged, numbers-dense, one CTA. Framework match from the inspiration file (Tension→Reframe, System + before/after, anti-trends proof).
5 · Style check	10-point checklist with auto-fail rewrites: any em dash, unknown hook pattern, >12-word average sentences, walls of text, zero numbers, corporate filler, hedges.
6 · Safety check	Forbidden-topics filter (details in the quality bar section). Reject the angle, not just the sentence.
6.5 · QA gate	The separate judge agent scores it; the writer reworks on the judge's top-3 fixes; loop until PASS or the stall ladder is exhausted.
7 · Deliver	One Slack message to the content channel: pillar, hook pattern, CTA, source attribution, the full ready-to-paste draft, and the reply protocol.
8 · Handle the reply	ship it → saved with frontmatter, 7-day performance check queued. edit: → applied, reposted in-thread, diff logged. kill: → archived + appended to the negative-training file. No reply → one nudge at 22:00, expire after 24h, no further nags.
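Steps 2 and 3 in the table above (dedupe by source hash, then pick ONE) can be sketched as follows. The source does not say how the four axes are combined, so the unweighted sum here is an assumption, as are the `Moment` type, the SHA-256 hash and the `pick_one` name. The tie-break on pipeline potential follows the one-message rule described under the walls.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass
class Moment:
    text: str
    specificity: int  # each axis scored 1-10
    pillar_fit: int
    voice: int
    pipeline: int

    @property
    def source_hash(self) -> str:
        # Assumption: a hash of the normalized text identifies a used moment
        normalized = self.text.strip().lower().encode("utf-8")
        return hashlib.sha256(normalized).hexdigest()

    @property
    def total(self) -> int:
        # Assumption: an unweighted sum of the four axes
        return self.specificity + self.pillar_fit + self.voice + self.pipeline


def pick_one(
    candidates: Iterable[Moment], used_hashes: Set[str]
) -> Tuple[Optional[Moment], Optional[Moment]]:
    """Return (winner, runner_up) after dropping already-used moments."""
    fresh = [m for m in candidates if m.source_hash not in used_hashes]
    # Highest total first; ties break on pipeline potential
    ranked = sorted(fresh, key=lambda m: (m.total, m.pipeline), reverse=True)
    winner = ranked[0] if ranked else None
    runner_up = ranked[1] if len(ranked) > 1 else None
    return winner, runner_up
```

Only the winner goes to drafting; the runner-up is returned so it can be held, not posted.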

⏰ The four crons

Mon–Fri 17:00: the daily LinkedIn pipeline.
Sun 15:00: newsletter draft (600–1,200 words synthesizing the week's shipped and killed drafts).
Daily 02:00: performance sweep on posts due for their 7-day check.
1st of month 03:00: full voice recalibration.

OOO on the calendar = the daily run skips itself, says so once, and stays quiet.
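As a rough illustration of the schedule, here is a sketch that decides which jobs fire at a given minute. The job names and the `jobs_due` function are hypothetical; the comments give the standard five-field cron expression for each slot, and the actual scheduler used in the build is not stated in the source.

```python
from datetime import datetime
from typing import List


def jobs_due(now: datetime, out_of_office: bool = False) -> List[str]:
    """Return the (hypothetical) job names scheduled for this exact minute."""
    due = []
    slot = (now.hour, now.minute)
    weekday = now.weekday()  # Monday is 0, Sunday is 6

    # 0 17 * * 1-5  daily LinkedIn pipeline, skipped when OOO
    if weekday < 5 and slot == (17, 0):
        due.append("daily_skip_notice" if out_of_office else "daily_linkedin")
    # 0 15 * * 0    Sunday newsletter draft
    if weekday == 6 and slot == (15, 0):
        due.append("newsletter_draft")
    # 0 2 * * *     nightly performance sweep
    if slot == (2, 0):
        due.append("performance_sweep")
    # 0 3 1 * *     monthly voice recalibration
    if now.day == 1 and slot == (3, 0):
        due.append("voice_recalibration")
    return due
```

The OOO branch returns a single notice job instead of nothing, matching the "says so once, and stays quiet" behavior.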

🛟 Failure modes are scripted

Airtable pull fails → draft from the last cached corpus and say so. Vault unreachable → narrower source pool, noted in the draft footer. Nothing scores high enough → one honest line ("Nothing strong enough today, skipping"), no forced draft. Slack send fails → fall back to messaging the owner directly with the draft body.

Voice calibration

Corpus → fingerprint → constraint: the method

The voice file is not a style wishlist — it's a measurement. The full guide stays private; the method is reproducible on anyone's corpus:

Collect every voice surface you have

Published posts (the LinkedIn export), spoken transcripts (60 video SRTs), unguarded messaging history, and — the richest layer — all-day dictation history (1,360 dictations, 70,844 tokens captured across every app). Each surface plays a different role.

Measure, don't vibe

Extract hard numbers: median post length and percentiles, words per sentence, paragraphs per post, numbers-per-post, punctuation habits (zero em dashes across all 42 posts — so the rule is enforced, not invented). Then classify every hook into a taxonomy with frequencies, and tier the vocabulary: core words with counts, promoted words, wishful words, and thinking-voice fillers that must be stripped before publishing.
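A minimal sketch of the measurement pass, assuming the corpus is already loaded as a list of post strings. The `fingerprint` function, its field names and the regex-based sentence splitter are illustrative choices, not the private guide's method; a real pass would add percentiles, hook classification and vocabulary tiers.

```python
import re
from statistics import median
from typing import Dict, List

EM_DASH = "\u2014"


def fingerprint(posts: List[str]) -> Dict[str, float]:
    """Compute a few hard numbers from a corpus of posts."""
    if not posts:
        raise ValueError("need at least one post to measure")
    sentence_lengths = []
    numbers_per_post = []
    paragraphs_per_post = []
    for post in posts:
        # Crude splitter: sentence punctuation followed by space, or a line break
        sentences = [s for s in re.split(r"[.!?]+\s+|\n+", post) if s.strip()]
        sentence_lengths.extend(len(s.split()) for s in sentences)
        # Count numeric tokens such as 42, 1,200 or 8.6
        numbers_per_post.append(len(re.findall(r"\d[\d,.]*", post)))
        paragraphs = [p for p in re.split(r"\n\s*\n", post) if p.strip()]
        paragraphs_per_post.append(len(paragraphs))
    return {
        "posts": len(posts),
        "median_chars": median(len(p) for p in posts),
        "avg_words_per_sentence": sum(sentence_lengths) / len(sentence_lengths),
        "median_numbers_per_post": median(numbers_per_post),
        "median_paragraphs_per_post": median(paragraphs_per_post),
        "em_dashes": sum(p.count(EM_DASH) for p in posts),
    }
```

Every value in the result is a count or a statistic, so each rule in the voice file can point at the number that justifies it.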

Dictation is ground truth; published posts are the edit constraint

The key calibration insight. What you dictate all day is how you actually think; what you published is how you edit. The first voice pass dismissed several words as "wishful" — dictation data proved they're frequent in real thinking and they went back into the active vocabulary. Drafts translate the thinking voice into the publishing constraint: strip filler, preserve exact numbers and named systems, keep the energy.

Draft against the file, every time

The voice file is read before every draft. Hook from the taxonomy, length from the distribution, at least one exact number, the line-break rhythm — and the 10-point style check auto-fails anything off-fingerprint.

Recalibrate from outcomes, monthly

The 1st-of-month job re-derives the file from the latest export, the latest 60 transcripts, every winning-pattern note, and the negative file. Killed drafts append to negative training the moment they die; winning patterns are only written back after a shipped post clears the 7-day engagement threshold (≥30 comments or ≥100 reactions or ≥5 DMs). The voice file learns from what actually traveled, not from what felt good.
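The write-back rule is asymmetric: kills are recorded immediately, wins only after the 7-day check. A small sketch of that routing, using the thresholds stated above; the `Outcome` type, the status strings and the return labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    comments: int = 0
    reactions: int = 0
    dms: int = 0


def clears_threshold(outcome: Outcome) -> bool:
    # Any one of the three signals is enough
    return outcome.comments >= 30 or outcome.reactions >= 100 or outcome.dms >= 5


def learning_target(
    status: str, days_since_ship: int = 0, outcome: Optional[Outcome] = None
) -> str:
    """Decide which learning file, if any, a draft should be written to."""
    if status == "killed":
        return "negative_file"  # appended the moment the draft dies
    if (
        status == "shipped"
        and days_since_ship >= 7
        and outcome is not None
        and clears_threshold(outcome)
    ):
        return "winning_patterns"
    return "none"  # shipped but unproven, or still inside the 7-day window
```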

The quality bar

A separate judge, seven dimensions, hard gates

The QA gate is two genuinely separate agents — not two roles in one head. The writer drafts and reworks; the judge runs in its own isolated workspace and sees only the draft and the source moment, never the writer's reasoning. That separation kills the self-justification blind spot. Both read the same rubric, hook framework, corpus and voice file, so the critique is legitimate and actionable.

#	Dimension	Weight	What it measures
1	Hook strength	25%	Does line 1 stop the scroll? Scored against the hook framework's 6 proven patterns.
2	Specificity	15%	Concrete numbers, names, dollars, dates. The viral median is 10 numbers per post.
3	Value payoff	15%	Can the reader act on it today? Lesson + concrete action.
4	Voice authenticity	15%	Sounds like the owner. Zero AI-slop, zero em dashes, zero filler. Any em dash caps this at 4.
5	Emotional / contrarian charge	10%	Surprise, reversal, vulnerability or hot take — the reason people comment.
6	Single-idea clarity	10%	One idea, nameable in one sentence. Not five.
7	Structure & scannability	10%	Short paragraphs, line-break rhythm, never a wall.

The gate — a draft PASSES only if all five hold:

Weighted total ≥ 8.0
Hook ≥ 8 (a weak hook kills the post no matter how good the body)
Voice ≥ 7 (off-voice defeats the entire purpose)
No dimension below 6
Zero em dashes (binary auto-fail)

Why 8.0: the benchmark posts that actually went viral score 8.5–9.5 on this rubric. Below 8.0 a post is competent but won't travel. The judge calibrates against three anchor posts from the corpus: a two-sentence contrarian punch (6,977 likes), a real-time news open with exact numbers (24,015 likes), and a sensory vulnerability open (6,804 likes). The draft doesn't need their like counts; it needs their score range.
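The rubric and gate reduce to a few lines of arithmetic. A sketch under stated assumptions: the weights and thresholds come from the table and gate above, while the dimension keys, the `gate` function and the sample scores in the usage note are illustrative, not the judge's real output format.

```python
from typing import Dict, List, Tuple

# Weights in percent, taken from the rubric table
WEIGHTS = {
    "hook": 25,
    "specificity": 15,
    "value": 15,
    "voice": 15,
    "charge": 10,
    "clarity": 10,
    "structure": 10,
}


def weighted_total(scores: Dict[str, int]) -> float:
    # Integer weights keep the arithmetic exact until the final division
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS) / 100


def gate(scores: Dict[str, int], em_dashes: int = 0) -> Tuple[float, List[str]]:
    """Return (weighted total, failed conditions). An empty list means PASS."""
    scores = dict(scores)
    if em_dashes > 0:
        scores["voice"] = min(scores["voice"], 4)  # any em dash caps voice at 4
    total = weighted_total(scores)
    failures = []
    if total < 8.0:
        failures.append("weighted total below 8.0")
    if scores["hook"] < 8:
        failures.append("hook below 8")
    if scores["voice"] < 7:
        failures.append("voice below 7")
    if min(scores.values()) < 6:
        failures.append("a dimension below 6")
    if em_dashes > 0:
        failures.append("em dash present")
    return total, failures
```

For example, hypothetical scores of hook 9, specificity 9, value 8, voice 9, charge 8, clarity 9, structure 8 give a weighted total of 8.65 and an empty failure list. A single em dash in an otherwise all-9 draft still fails on three conditions at once.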

Iteration 1 — 5.1 / 10, FAIL

"I've been spending a lot on AI lately…" — a competent draft a normal tool would have posted. The judge: hook buries the number (4/10), one number total (5/10), no concrete action (4/10), flat charge (4/10). Three fixes returned, most impactful first — specific enough to act on without guessing.

Iteration 2 — 8.6 / 10, PASS

Fixes applied literally: exact dollar figure in the first four words, the real before/after headcount, the soft CTA replaced with a standalone contrarian close. Six concrete anchors, hook 9/10, voice 9/10. Two passes, no human involved until the draft was already at the viral bar.

🚫 What gets auto-rejected

Style: em dashes, setup-before-payoff hooks, corporate openers, buried ledes, AI-slop phrases, hedges, walls of text, zero numbers.

Mechanics (from the inspiration frameworks): comment-gating, "like + comment + reshare", DM-bait, stacked P.S. lead magnets, 24-hour scarcity — proven to drive engagement for others, rejected as off-register. YouTube: any script not opening with a proven hook pattern in the first sentence, or opening with "in this video…" preamble.
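Some of the style auto-rejects are mechanical enough to check before the judge ever sees the draft. A sketch covering four of them; the 12-word average comes from the style check in the pipeline table, while the 60-word wall-of-text limit and the `style_failures` name are assumptions. Phrase-based checks (corporate filler, hedges, AI-slop) would need word lists that the source keeps private.

```python
import re
from typing import List

EM_DASH = "\u2014"
MAX_AVG_SENTENCE_WORDS = 12
MAX_PARAGRAPH_WORDS = 60  # assumption: the source gives no number for "wall of text"


def style_failures(draft: str) -> List[str]:
    """Return the mechanical style rules a draft breaks (empty list = clean)."""
    failures = []
    if EM_DASH in draft:
        failures.append("em dash")
    sentences = [s for s in re.split(r"[.!?]+\s+|\n+", draft) if s.strip()]
    words = sum(len(s.split()) for s in sentences)
    if sentences and words / len(sentences) > MAX_AVG_SENTENCE_WORDS:
        failures.append("average sentence over 12 words")
    if not re.search(r"\d", draft):
        failures.append("zero numbers")
    paragraphs = [p for p in re.split(r"\n\s*\n", draft) if p.strip()]
    if any(len(p.split()) > MAX_PARAGRAPH_WORDS for p in paragraphs):
        failures.append("wall of text")
    return failures
```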

🔒 The safety gate (forbidden topics)

Drafts are rejected outright — different angle, not a reword — if they contain individual salaries or departure terms, monthly revenue figures, client names (always anonymized to "a client", "an 8-figure operator I coach"), deal financials, or anything about the agent infrastructure's internals. The post describes the outcome, never the implementation. Raw dictation and coding prompts are the richest sources and the leakiest — they get the strictest pass.

🪜 The stall-escape ladder

Three reworks with no gain doesn't mean stop — it means change the move: (1) re-angle the same moment (3 structurally different framings, scored fresh), (2) re-mine the source for the missing specific (the number that breaks a plateau is usually in the source, never invented), (3) reader-simulation — a third agent role-plays the ICP reading it cold, (4) ask the owner ONE surgical question ("what did the AI catch this week that a human missed? One sentence unlocks it"), (5) only then skip.
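The ladder can be modeled as stall detection plus an ordered list of moves. A sketch, assuming the rework loop keeps a history of weighted totals; the rung names, `stalled` and `next_move` are hypothetical, and "no gain" is interpreted here as none of the last three reworks beating the best earlier score.

```python
from typing import List

LADDER = [
    "re_angle",           # 3 structurally different framings, scored fresh
    "re_mine_source",     # look for the missing specific in the source
    "reader_simulation",  # third agent reads it cold as the ICP
    "ask_owner",          # ONE surgical question
    "skip",               # only after everything else
]


def stalled(score_history: List[float], window: int = 3) -> bool:
    """True when the last `window` reworks failed to beat the best earlier score."""
    if len(score_history) <= window:
        return False  # not enough reworks yet to call it a stall
    baseline = max(score_history[:-window])
    return max(score_history[-window:]) <= baseline


def next_move(rungs_tried: int) -> str:
    """Pick the next rung; once the ladder is exhausted the answer stays 'skip'."""
    return LADDER[min(rungs_tried, len(LADDER) - 1)]
```

A loop would call `stalled` after each rework and, when it returns True, climb one rung with `next_move` instead of reworking the same angle again.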

📈 The judge gets judged

Every run logs predicted scores per dimension. The nightly sweep joins them to real 7-day outcomes and classifies: false positive (scored 9.0, flopped), false negative (shipped at 7.5 with a warning, went viral), or calibrated hit. A dimension showing consistent bias across 3+ posts gets re-weighted at the monthly recalibration. And once 10+ of the owner's own posts clear the threshold, his winners replace the inspiration creators as the benchmark anchors.
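The classification step of the sweep is a two-by-two comparison of prediction against outcome. A sketch, assuming "predicted to travel" means a weighted total of at least 8.0 and "traveled" means the post cleared the 7-day engagement threshold; the function names and labels are illustrative. How per-dimension bias is measured is not described in the source, so it is left out.

```python
from collections import Counter
from typing import Iterable, Tuple

PASS_BAR = 8.0


def classify(predicted_total: float, cleared_threshold: bool) -> str:
    """Compare the judge's prediction with the real 7-day outcome."""
    predicted_win = predicted_total >= PASS_BAR
    if predicted_win and not cleared_threshold:
        return "false_positive"   # e.g. scored 9.0, flopped
    if not predicted_win and cleared_threshold:
        return "false_negative"   # e.g. shipped at 7.5 with a warning, went viral
    return "calibrated_hit"


def tally(runs: Iterable[Tuple[float, bool]]) -> Counter:
    """Count each class across (predicted_total, cleared_threshold) pairs."""
    return Counter(classify(score, cleared) for score, cleared in runs)
```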

Exit rule: passed → ship. Converged at 7.0–7.9 → post it flagged with the score and the ceiling, owner decides. Below 7.0 → skip honestly — a skip beats a forgettable post under your name.
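The exit rule as a function, for reference. One assumption is labeled in the code: a draft that converges at 7.0 or higher without passing the gate is treated as the flagged case, since the source only names the 7.0–7.9 band. The `exit_decision` name and return labels are hypothetical.

```python
def exit_decision(passed_gate: bool, converged_total: float) -> str:
    """Decide what happens once the rework loop and ladder have finished."""
    if passed_gate:
        return "ship"
    # Assumption: anything at 7.0+ that still fails the gate is flagged,
    # posted with its score and ceiling, and left to the owner
    if converged_total >= 7.0:
        return "post_flagged"
    return "skip"
```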

Walls & gotchas

Seven walls this build actually hit

01 · The trigger nobody fed

Wall: v1 waited for a daily voice debrief from the owner. He didn't deliver one — and the hourly retry crons polluted the review channel with status spam.

Fix: mine the day unconditionally. The debrief became optional bonus signal; the retry crons were retired. Never build a pipeline whose trigger is human discipline.

02 · Volume kills attention

Wall: 2–3 draft messages per day overwhelmed the channel — the owner stopped opening it entirely.

Fix: exactly ONE message per day. Ties break on pipeline potential; the runner-up is held for tomorrow's scoring, never posted. One strong draft beats three never read.

03 · The self-grading blind spot

Wall: one agent writing and scoring its own draft passes itself — self-justification is structural, not a prompt problem.

Fix: a separate judge agent with isolated context that never sees the writer's reasoning. If the judge is unreachable, the writer self-scores as a fallback — and the delivered post is visibly tagged so the owner knows the independent gate didn't run.

04 · The export that lies by omission

Wall: LinkedIn's Basic data export only surfaces posts with uploaded media. Text-only posts from the same window are simply absent, silently skewing the calibration corpus.

Fix: request the Complete archive (takes ~24h) and treat the Basic set as the working corpus until it lands.

05 · The ranking formula that returns NaN

Wall: the corpus's Like+Share ratio formula divides by views, and LinkedIn doesn't expose views on posts, so every text post ranks as NaN.

Fix: two-layer ranking — videos by Like+Share, text posts by raw Likes. And on YouTube, apply a ≥2,000-view floor before taking the top N, or tiny-view noise tops the list.
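A sketch of the two-layer ranking, assuming the Like+Share ratio is (likes + shares) / views as the wall describes. The `Piece` type, its field names and the `rank` function are hypothetical, and the view floor is applied to all videos here for simplicity.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Piece:
    title: str
    kind: str  # "video" or "post"
    likes: int
    shares: int = 0
    views: Optional[int] = None  # missing for LinkedIn text posts


def rank(pieces: List[Piece], view_floor: int = 2000) -> Tuple[List[Piece], List[Piece]]:
    """Rank videos by Like+Share ratio and text posts by raw likes."""
    # The floor removes tiny-view videos whose ratios are noise,
    # and also guarantees the division below never hits zero or None
    videos = [
        p for p in pieces
        if p.kind == "video" and p.views is not None and p.views >= view_floor
    ]
    posts = [p for p in pieces if p.kind == "post"]
    videos.sort(key=lambda p: (p.likes + p.shares) / p.views, reverse=True)
    posts.sort(key=lambda p: p.likes, reverse=True)
    return videos, posts
```

Keeping the two lists separate avoids comparing a ratio against a raw count, which is what produced the NaN ranking in the first place.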

06 · Big creators drown the corpus

Wall: a global top-10% cut would be all mega-accounts — smaller creators' proven patterns vanish.

Fix: top 10% per creator, ceil-rounded so every creator contributes at least one piece. Each teaches what works for their audience, weighted equally.
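The per-creator cut with ceil rounding looks like this in a minimal sketch. The tuple layout `(creator, title, score)` and the function name are assumptions; the score is whatever ranking metric applies to that piece.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def top_decile_per_creator(
    pieces: List[Tuple[str, str, float]]
) -> Dict[str, List[str]]:
    """Keep the top 10% of each creator's pieces, always at least one."""
    by_creator = defaultdict(list)
    for creator, title, score in pieces:
        by_creator[creator].append((score, title))
    kept = {}
    for creator, items in by_creator.items():
        # len / 10 rather than len * 0.10, which can drift above a whole
        # number in floating point and make ceil keep one piece too many
        keep = math.ceil(len(items) / 10)
        items.sort(reverse=True)  # highest score first
        kept[creator] = [title for _, title in items[:keep]]
    return kept
```

With ceil rounding, a creator with 2 pieces contributes 1 and a creator with 25 contributes 3, so small accounts never round down to zero.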

07 · The richest sources are the leakiest

Wall: all-day dictation and coding-agent prompts carry names, internal numbers, unfinished thoughts, infrastructure details and secrets.

Fix: the hard safety gate runs strictest on these sources, and the standing rule applies: publish the outcome ("I built a system that mines my own thinking across 5 tools"), never the implementation.

One more gap worth knowing: the owner's own LinkedIn posts are not in the Airtable corpus (his written voice lives in the export; only his videos are in the base). Closing that with a posts-to-Airtable sync gives the system one source of truth — noted in the build docs, deliberately not blocking for v1.
