MarketingOS ← Library  ·  AI-OS cookbook · 05
Team — grow · voice-calibrated content production

The Content Engine

A daily drafting pipeline that mines your real day — dictations, pair-programming prompts, meetings, build logs — drafts the one strongest post in your calibrated voice, and refuses to ship until a separate judge agent scores it at the viral bar.

Agentic hire · Ghostwriter & Editor
One draft a day: ship, edit or kill in Slack
QA gate ≥ 8.0: a skip beats a weak post
Claude agents · Airtable · Obsidian · Slack

What this builds

From posting when you remember to a pipeline that never ships weak

The engine runs every weekday at 17:00 with zero input required. It mines the last 7 days of your actual thinking, picks the single strongest moment, drafts it against a measured voice fingerprint, fights it through an adversarial QA loop — and delivers one Slack message you answer with one word.

Before

Content depends on the founder remembering to post. The best ideas — said out loud in meetings, dictated into tools, typed into a coding agent — die where they happened. Drafts that do get written sound like a generic ghostwriter, and a multi-draft experiment just overwhelmed the review channel until it got ignored.

After

Every weekday: one draft, mined from real work, in a voice calibrated on the actual corpus, scored by a separate judge agent against posts that demonstrably went viral. Below the bar, it gets reworked or honestly skipped. Plus a Sunday newsletter, a nightly performance sweep, and a monthly voice recalibration.

A · Source mining

The day, captured
  • Dictation history (all-day voice)
  • Claude Code prompts (build thinking)
  • Obsidian build-log + meetings
  • Call transcripts + briefings
  • Inspiration corpus (cookbook 04)

B · Voice + drafting

Sound like you, measurably
  • Voice fingerprint from the corpus
  • 5-pattern hook taxonomy
  • 3 content pillars, tagged
  • 10-point style check, auto-fails
  • One CTA per draft, never bait

C · QA + learning loops

The quality firewall
  • Separate judge agent, own context
  • 7 weighted dimensions, hard gates
  • Rework loop + stall-escape ladder
  • 7-day performance sweep
  • Monthly recalibration from outcomes
The evolution

From raw posting to a self-correcting engine

This wasn't designed on a whiteboard. Each stage exists because the previous one hit a wall — the timeline below is the build order to copy.

1

Raw posting era — 7 months by hand

42 LinkedIn posts shipped manually over 7 months, ~6 per month, median 1,503 characters. No system yet — but those 42 posts became the calibration set everything else is built on. You can't calibrate a voice you haven't published.

2

The corpus export

LinkedIn's data export, parsed into a structured JSON file plus a human-readable markdown file: 42 posts with dates and lengths, plus a 30,000-line DM history flagged as the unguarded-voice sample. The first wall showed up immediately: the Basic export only contains posts with uploaded media, so text-only posts are invisible (the Complete archive is the fix).

3

Voice calibration — measured, not guessed

A voice-fingerprint file derived from the 42 posts, cross-checked against 60 YouTube video transcripts and 184 top-decile inspiration pieces: sentence length, numbers-per-post, paragraph rhythm, a 5-pattern hook taxonomy, vocabulary tiers, CTA patterns — every claim with a count behind it (full method below).

4

Themes + entertainment patterns

A weekly active-theme.md file that biases the engine toward a 5-day narrative arc (built for a dogfooding sprint where every business action doubled as content capture), and 11 entertainment patterns mined from the top-decile corpus. Every draft must carry at least one of them (for example surprise, reversal, self-deprecation, a punchy contrarian line, or vivid detail) or it gets rewritten.

5

The v1.1 rebuild — mine the day, judge with a second head

Three corrections at once. The voice-debrief trigger was retired (it depended on daily input that never came, and its retry crons spammed the channel) — the engine now mines the day unconditionally. Dictation history and Claude Code prompts became the highest-priority sources. Output dropped to one draft per day on a rolling 7-day window.

And the QA gate arrived: a genuinely separate judge agent with its own context, scoring every draft on a 7-dimension rubric against viral benchmark anchors. It was proven the same week in a logged demo: a competent draft scored 5.1 (FAIL), and three surgical fixes later it passed at 8.6.

6

The inspiration frameworks as the floor

A full re-sync of the ~44-creator corpus via the inspiration engine (cookbook 04), then a teardown of the three benchmark LinkedIn creators (Welsh, Martell, Latka) and the top YouTube hooks into a frameworks file. Wired into drafting and QA as a hard rule: a draft weaker than the median top post for its pillar gets rewritten — the median is the floor, not the ceiling.

7

Ground-truth voice + human feedback loop

The voice file was re-derived with a hierarchy flip: all-day dictation history (70,844 tokens) is ground truth for the thinking voice; published posts are the edit constraint, not the source. Words the first pass had dismissed as wishful turned out to be frequent in real dictation and went back in. A monthly recalibration cron re-derives the whole file. A trusted editor's thread feedback now auto-revises drafts, with no approval gate, between the draft landing and the final ship call.

Day-to-day use

One Slack message in, one word back

The owner's entire daily workload: read one draft at 17:00, then reply "ship it", "edit: <changes>" or "kill: <reason>" in the thread. Everything below happens unattended.

Step | What the engine does
1 · Refresh corpus | Re-pulls the Airtable inspiration tables, applies the quality floor (videos: ≥1,000 views and ≥50 likes; posts: ≥100 likes), dedupes, and keeps the top 10% per creator, so new creators and fresh winners flow in automatically.
1.5 · Check theme | If an active-theme file exists and hasn't expired, it overrides source weighting and sets the week's pillar and format.
2 · Pull the day | Rolling 7-day window across sources, priority-ranked: dictation history and Claude Code prompts highest, then build-log, meetings, intentions, content-ideas, call transcripts, briefings, inbox log. Already-used moments are deduped via source hashes stored with every shipped post.
3 · Score & pick ONE | Every candidate moment is scored 1–10 on four axes: specificity, pillar fit (Transition > System > Contrarian), voice authenticity, pipeline potential. Only the single highest-scoring moment gets drafted. The runner-up is held, never posted.
4 · Draft | Against the voice file: hook from the taxonomy, pillar tagged, numbers-dense, one CTA. Framework match from the inspiration file (Tension→Reframe, System + before/after, anti-trends proof).
5 · Style check | 10-point checklist with auto-fail rewrites: any em dash, unknown hook pattern, >12-word average sentences, walls of text, zero numbers, corporate filler, hedges.
6 · Safety check | Forbidden-topics filter (details in the quality bar section). Reject the angle, not just the sentence.
6.5 · QA gate | The separate judge agent scores it; the writer reworks on the judge's top-3 fixes; loop until PASS or the stall ladder is exhausted.
7 · Deliver | One Slack message to the content channel: pillar, hook pattern, CTA, source attribution, the full ready-to-paste draft, and the reply protocol.
8 · Handle the reply | ship it → saved with frontmatter, 7-day performance check queued. edit: → applied, reposted in-thread, diff logged. kill: → archived + appended to the negative-training file. No reply → one nudge at 22:00, expire after 24h, no further nags.
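
The reply protocol in step 8 can be sketched as a small parser. This is a minimal sketch, assuming the thread reply arrives as plain text; `parse_reply` and `Reply` are hypothetical names chosen for illustration, not the engine's actual code.

```python
from typing import NamedTuple, Optional


class Reply(NamedTuple):
    action: str            # "ship", "edit", "kill", or "unknown"
    detail: Optional[str]  # the requested changes or the kill reason


def parse_reply(text: str) -> Reply:
    """Map a thread reply onto the three-verb protocol (hypothetical helper)."""
    cleaned = text.strip()
    lowered = cleaned.lower()
    if lowered == "ship it":
        return Reply("ship", None)
    for verb in ("edit", "kill"):
        prefix = verb + ":"
        if lowered.startswith(prefix):
            # Keep the original casing of whatever follows the verb.
            detail = cleaned[len(prefix):].strip()
            return Reply(verb, detail or None)
    # Anything else is left for a human to interpret.
    return Reply("unknown", None)
```

Anything that is not one of the three verbs comes back as "unknown", so an ambiguous reply never triggers a ship by accident.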

The four crons

  • Mon–Fri 17:00: the daily LinkedIn pipeline.
  • Sun 15:00: newsletter draft (600–1,200 words synthesizing the week's shipped and killed drafts).
  • Daily 02:00: performance sweep on posts due for their 7-day check.
  • 1st of month 03:00: full voice recalibration.

OOO on the calendar = the daily run skips itself, says so once, and stays quiet.
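
The schedule above can be expressed as a small dispatcher. This is a sketch under assumptions: the job names and the `owner_is_ooo` flag are hypothetical, and a real deployment would more likely use the scheduler's own cron expressions (shown in the comments).

```python
from datetime import datetime
from typing import List


def due_jobs(now: datetime, owner_is_ooo: bool = False) -> List[str]:
    """Return the jobs that should fire at this minute (hypothetical job names)."""
    jobs: List[str] = []
    if now.minute != 0:
        return jobs
    # cron: 0 17 * * 1-5  (weekday() is 0 for Monday, 6 for Sunday)
    if now.hour == 17 and now.weekday() < 5 and not owner_is_ooo:
        jobs.append("daily_linkedin")
    # cron: 0 15 * * 0
    if now.hour == 15 and now.weekday() == 6:
        jobs.append("sunday_newsletter")
    # cron: 0 2 * * *
    if now.hour == 2:
        jobs.append("performance_sweep")
    # cron: 0 3 1 * *
    if now.hour == 3 and now.day == 1:
        jobs.append("voice_recalibration")
    return jobs
```

Only the daily run checks the out-of-office flag here; whether the other three jobs should also pause is not stated in the source.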

🛟 Failure modes are scripted

  • Airtable pull fails → draft from the last cached corpus and say so.
  • Vault unreachable → narrower source pool, noted in the draft footer.
  • Nothing scores high enough → one honest line ("Nothing strong enough today, skipping"), no forced draft.
  • Slack send fails → fall back to messaging the owner directly with the draft body.

Voice calibration

Corpus → fingerprint → constraint: the method

The voice file is not a style wishlist — it's a measurement. The full guide stays private; the method is reproducible on anyone's corpus:

1

Collect every voice surface you have

Published posts (the LinkedIn export), spoken transcripts (60 video SRTs), unguarded messaging history, and — the richest layer — all-day dictation history (1,360 dictations, 70,844 tokens captured across every app). Each surface plays a different role.

2

Measure, don't vibe

Extract hard numbers: median post length and percentiles, words per sentence, paragraphs per post, numbers-per-post, punctuation habits (zero em dashes across all 42 posts — so the rule is enforced, not invented). Then classify every hook into a taxonomy with frequencies, and tier the vocabulary: core words with counts, promoted words, wishful words, and thinking-voice fillers that must be stripped before publishing.
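
A few of these measurements can be sketched in code. This is a minimal sketch, assuming the posts are available as plain strings; `fingerprint` is a hypothetical helper, the sentence splitter is deliberately naive, and the real voice file tracks more than these four numbers.

```python
import re
from statistics import median
from typing import Dict, List


def fingerprint(posts: List[str]) -> Dict[str, float]:
    """Compute a handful of fingerprint numbers from raw post texts."""
    sentence_lengths: List[int] = []
    numbers_per_post: List[int] = []
    for post in posts:
        # Naive split: a sentence ends at ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", post.strip()):
            words = sentence.split()
            if words:
                sentence_lengths.append(len(words))
        # Count digit runs such as "3", "1,200" or "8.6" as one number each.
        numbers_per_post.append(len(re.findall(r"\d[\d,.]*", post)))
    return {
        "median_chars": median(len(p) for p in posts),
        "median_words_per_sentence": median(sentence_lengths),
        "median_numbers_per_post": median(numbers_per_post),
        "em_dashes": sum(p.count("\u2014") for p in posts),
    }
```

Run over the full export, the output gives every style rule a count behind it, which is what lets a rule like "zero em dashes" be enforced as a measurement.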

3

Dictation is ground truth; published posts are the edit constraint

The key calibration insight. What you dictate all day is how you actually think; what you published is how you edit. The first voice pass dismissed several words as "wishful" — dictation data proved they're frequent in real thinking and they went back into the active vocabulary. Drafts translate the thinking voice into the publishing constraint: strip filler, preserve exact numbers and named systems, keep the energy.

4

Draft against the file, every time

The voice file is read before every draft. Hook from the taxonomy, length from the distribution, at least one exact number, the line-break rhythm — and the 10-point style check auto-fails anything off-fingerprint.

5

Recalibrate from outcomes, monthly

The 1st-of-month job re-derives the file from the latest export, the latest 60 transcripts, every winning-pattern note, and the negative file. Killed drafts append to negative training the moment they die; winning patterns are only written back after a shipped post clears the 7-day engagement threshold (≥30 comments or ≥100 reactions or ≥5 DMs). The voice file learns from what actually traveled, not from what felt good.
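
The write-back rule above can be sketched as a routing function. The thresholds come from the text; the function names, the status strings and the "negative"/"winning" labels are assumptions made for illustration.

```python
from typing import Optional


def clears_threshold(comments: int, reactions: int, dms: int) -> bool:
    """7-day engagement threshold: any one of the three conditions is enough."""
    return comments >= 30 or reactions >= 100 or dms >= 5


def training_destination(status: str, comments: int = 0,
                         reactions: int = 0, dms: int = 0) -> Optional[str]:
    """Decide which training file a draft feeds, if any (hypothetical labels)."""
    if status == "killed":
        # Killed drafts feed negative training immediately.
        return "negative"
    if status == "shipped" and clears_threshold(comments, reactions, dms):
        # Winners are written back only after the 7-day numbers are in.
        return "winning"
    return None
```

A shipped post that misses the threshold feeds neither file in this sketch; the source does not say whether such posts are recorded elsewhere.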

The quality bar

A separate judge, seven dimensions, hard gates

The QA gate is two genuinely separate agents — not two roles in one head. The writer drafts and reworks; the judge runs in its own isolated workspace and sees only the draft and the source moment, never the writer's reasoning. That separation kills the self-justification blind spot. Both read the same rubric, hook framework, corpus and voice file, so the critique is legitimate and actionable.

# | Dimension | Weight | What it measures
1 | Hook strength | 25% | Does line 1 stop the scroll? Scored against the hook framework's 6 proven patterns.
2 | Specificity | 15% | Concrete numbers, names, dollars, dates. The viral median is 10 numbers per post.
3 | Value payoff | 15% | Can the reader act on it today? Lesson + concrete action.
4 | Voice authenticity | 15% | Sounds like the owner. Zero AI-slop, zero em dashes, zero filler. Any em dash caps this at 4.
5 | Emotional / contrarian charge | 10% | Surprise, reversal, vulnerability or hot take: the reason people comment.
6 | Single-idea clarity | 10% | One idea, nameable in one sentence. Not five.
7 | Structure & scannability | 10% | Short paragraphs, line-break rhythm, never a wall.

The gate. A draft PASSES only if all five hold:

  • Weighted total ≥ 8.0
  • Hook ≥ 8 (a weak hook kills the post no matter how good the body)
  • Voice ≥ 7 (off-voice defeats the entire purpose)
  • No dimension below 6
  • Zero em dashes (binary auto-fail)

Why 8.0: the benchmark posts that actually went viral score 8.5–9.5 on this rubric. Below 8.0 a post is competent but won't travel. The judge calibrates against three anchor posts from the corpus: a two-sentence contrarian punch (6,977 likes), a real-time news open with exact numbers (24,015 likes), and a sensory vulnerability open (6,804 likes). The draft doesn't need their like counts; it needs their score range.
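
The weighted total and the five gate conditions can be sketched as follows. The weights and thresholds are the ones stated in the rubric; the dimension keys and the `gate` function are hypothetical names, and the scores are assumed to arrive from the judge as numbers on a 1 to 10 scale.

```python
from typing import Dict, List, Tuple

# Weights from the rubric table; the short keys are assumed names.
WEIGHTS: Dict[str, float] = {
    "hook": 0.25,
    "specificity": 0.15,
    "value": 0.15,
    "voice": 0.15,
    "charge": 0.10,
    "clarity": 0.10,
    "structure": 0.10,
}


def gate(scores: Dict[str, float], draft: str) -> Tuple[bool, float, List[str]]:
    """Return (passed, weighted_total, failed_conditions) for one draft."""
    # Round before comparing so float noise cannot flip a draft at exactly 8.0.
    total = round(sum(WEIGHTS[d] * scores[d] for d in WEIGHTS), 2)
    failures: List[str] = []
    if total < 8.0:
        failures.append("weighted total below 8.0")
    if scores["hook"] < 8:
        failures.append("hook below 8")
    if scores["voice"] < 7:
        failures.append("voice below 7")
    if min(scores[d] for d in WEIGHTS) < 6:
        failures.append("a dimension is below 6")
    if "\u2014" in draft:
        failures.append("em dash present")
    return (not failures, total, failures)
```

The hard gates matter independently of the total: a draft scoring 9 on everything except a 7 on the hook reaches a weighted total of 8.5 and still fails.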

Iteration 1 — 5.1 / 10, FAIL

"I've been spending a lot on AI lately…" — a competent draft a normal tool would have posted. The judge: hook buries the number (4/10), one number total (5/10), no concrete action (4/10), flat charge (4/10). Three fixes returned, most impactful first — specific enough to act on without guessing.

Iteration 2 — 8.6 / 10, PASS

Fixes applied literally: exact dollar figure in the first four words, the real before/after headcount, the soft CTA replaced with a standalone contrarian close. Six concrete anchors, hook 9/10, voice 9/10. Two passes, no human involved until the draft was already at the viral bar.

🚫 What gets auto-rejected

Style: em dashes, setup-before-payoff hooks, corporate openers, buried ledes, AI-slop phrases, hedges, walls of text, zero numbers.

Mechanics (from the inspiration frameworks): comment-gating, "like + comment + reshare", DM-bait, stacked P.S. lead magnets, 24-hour scarcity. All are proven to drive engagement for others, and all are rejected as off-register.

YouTube: any script that does not open with a proven hook pattern in the first sentence, or that opens with an "in this video…" preamble.
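
A subset of the style rejections is mechanical enough to sketch. This is an illustration only: the hedge phrases and the paragraph-length cutoff are assumed values, the 12-word average comes from the style check described earlier, and checks such as hook patterns or AI-slop phrases are left out because the source does not publish their lists.

```python
import re
from typing import List

HEDGES = ("i think", "maybe", "sort of", "kind of")  # assumed examples
MAX_AVG_WORDS = 12          # average sentence length above this fails
MAX_PARAGRAPH_CHARS = 300   # assumed cutoff for a "wall of text"


def style_failures(draft: str) -> List[str]:
    """Return the names of the mechanical style checks a draft fails."""
    failures: List[str] = []
    if "\u2014" in draft:
        failures.append("em dash")
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", draft.strip()) if s]
    if sentences:
        avg_words = sum(len(s.split()) for s in sentences) / len(sentences)
        if avg_words > MAX_AVG_WORDS:
            failures.append("average sentence too long")
    if not re.search(r"\d", draft):
        failures.append("zero numbers")
    lowered = draft.lower()
    # Naive substring match; a real check would respect word boundaries.
    if any(hedge in lowered for hedge in HEDGES):
        failures.append("hedge")
    if any(len(p) > MAX_PARAGRAPH_CHARS for p in draft.split("\n\n")):
        failures.append("wall of text")
    return failures
```

An empty list means the draft clears these mechanical checks; it says nothing about the judge's score.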

🔒 The safety gate (forbidden topics)

Drafts are rejected outright — different angle, not a reword — if they contain individual salaries or departure terms, monthly revenue figures, client names (always anonymized to "a client", "an 8-figure operator I coach"), deal financials, or anything about the agent infrastructure's internals. The post describes the outcome, never the implementation. Raw dictation and coding prompts are the richest sources and the leakiest — they get the strictest pass.
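
A forbidden-topics filter of this kind could be sketched as a category scan. Everything concrete here is an assumption: the keyword patterns are placeholders, the client list is supplied by the caller, and the real filter is not published. A keyword scan is also only a first pass, since leaks can be phrased in ways no pattern catches.

```python
import re
from typing import Dict, List

# Assumed placeholder patterns per forbidden category.
FORBIDDEN_PATTERNS: Dict[str, str] = {
    "salary or departure terms": r"\b(salary|severance)\b",
    "monthly revenue": r"\b(mrr|monthly revenue)\b",
    "agent infrastructure internals": r"\b(system prompt|api key)\b",
}


def safety_violations(draft: str, client_names: List[str]) -> List[str]:
    """Return the forbidden categories a draft touches (hypothetical helper)."""
    hits = [
        category
        for category, pattern in FORBIDDEN_PATTERNS.items()
        if re.search(pattern, draft, re.IGNORECASE)
    ]
    for name in client_names:
        if re.search(r"\b" + re.escape(name) + r"\b", draft, re.IGNORECASE):
            hits.append("client name")
            break
    return hits
```

Any non-empty result should send the writer back for a different angle, in line with the rule that a reworded sentence is not enough.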

🪜 The stall-escape ladder

Three reworks with no gain don't mean stop; they mean change the move: (1) re-angle the same moment (3 structurally different framings, scored fresh), (2) re-mine the source for the missing specific (the number that breaks a plateau is usually in the source, never invented), (3) reader-simulation, where a third agent role-plays the ICP reading it cold, (4) ask the owner ONE surgical question ("what did the AI catch this week that a human missed? One sentence unlocks it"), (5) only then skip.
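
The ladder's control flow can be sketched as a function that picks the next move from the score history. The reading of "no gain" used here is an assumption: none of the last three reworks beat the score that preceded them. `next_move` and the rung labels are hypothetical.

```python
from typing import List

LADDER = [
    "re-angle the moment",
    "re-mine the source",
    "reader simulation",
    "ask the owner one question",
    "skip",
]


def next_move(totals: List[float], rungs_used: int) -> str:
    """Pick the next step from the judge's weighted totals so far."""
    # Stalled: the last three reworks all failed to beat the score before them.
    stalled = len(totals) >= 4 and max(totals[-3:]) <= totals[-4]
    if not stalled:
        return "rework"
    # Climb one rung per stall; the last rung is always "skip".
    return LADDER[min(rungs_used, len(LADDER) - 1)]
```

The caller is expected to start a fresh score history after each rung, so that a new angle gets its own three reworks before the next escalation.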

📈 The judge gets judged

Every run logs predicted scores per dimension. The nightly sweep joins them to real 7-day outcomes and classifies: false positive (scored 9.0, flopped), false negative (shipped at 7.5 with a warning, went viral), or calibrated hit. A dimension showing consistent bias across 3+ posts gets re-weighted at the monthly recalibration. And once 10+ of the owner's own posts clear the threshold, his winners replace the inspiration creators as the benchmark anchors.
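
The classification step of the nightly sweep can be sketched as below. It assumes the judge's prediction is reduced to "at or above the 8.0 bar" and the outcome to "cleared the 7-day engagement threshold"; `classify_outcome` is a hypothetical name, and the per-dimension bias tracking is not shown.

```python
PASS_BAR = 8.0  # the gate's weighted-total threshold


def classify_outcome(predicted_total: float, cleared_threshold: bool) -> str:
    """Compare the judge's prediction with the real 7-day outcome."""
    predicted_pass = predicted_total >= PASS_BAR
    if predicted_pass and not cleared_threshold:
        return "false positive"
    if not predicted_pass and cleared_threshold:
        return "false negative"
    # Prediction and outcome agree, in either direction.
    return "calibrated hit"
```

For example, `classify_outcome(9.0, False)` returns "false positive" and `classify_outcome(7.5, True)` returns "false negative", matching the two cases named above.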

Exit rule: passed → ship. Converged at 7.0–7.9 → post it flagged with the score and the ceiling, owner decides. Below 7.0 → skip honestly — a skip beats a forgettable post under your name.

Walls & gotchas

Seven walls this build actually hit

01 · The trigger nobody fed

Wall: v1 waited for a daily voice debrief from the owner. He didn't deliver one — and the hourly retry crons polluted the review channel with status spam.

Fix: mine the day unconditionally. The debrief became optional bonus signal; the retry crons were retired. Never build a pipeline whose trigger is human discipline.

02 · Volume kills attention

Wall: 2–3 draft messages per day overwhelmed the channel — the owner stopped opening it entirely.

Fix: exactly ONE message per day. Ties break on pipeline potential; the runner-up is held for tomorrow's scoring, never posted. One strong draft beats three never read.
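
The pick-one rule with its tie-break can be sketched as a sort. The four axes come from the scoring step; combining them as an unweighted sum is an assumption (the source does not say how the axes are combined), and `pick_one` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

AXES = ("specificity", "pillar_fit", "voice_authenticity", "pipeline_potential")


def pick_one(candidates: List[Dict]) -> Tuple[Optional[Dict], Optional[Dict]]:
    """Return (winner, runner_up); ties on total break on pipeline potential."""
    ranked = sorted(
        candidates,
        # Assumed scoring: unweighted sum of the four axes, then the tie-break.
        key=lambda c: (sum(c[a] for a in AXES), c["pipeline_potential"]),
        reverse=True,
    )
    winner = ranked[0] if ranked else None
    runner_up = ranked[1] if len(ranked) > 1 else None
    return winner, runner_up
```

Only the winner goes to drafting; the runner-up is returned so it can be held for the next day's scoring.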

03 · The self-grading blind spot

Wall: one agent writing and scoring its own draft passes itself — self-justification is structural, not a prompt problem.

Fix: a separate judge agent with isolated context that never sees the writer's reasoning. If the judge is unreachable, the writer self-scores as a fallback — and the delivered post is visibly tagged so the owner knows the independent gate didn't run.

04 · The export that lies by omission

Wall: LinkedIn's Basic data export only surfaces posts with uploaded media. Text-only posts from the same window are simply absent, silently skewing the calibration corpus.

Fix: request the Complete archive (takes ~24h) and treat the Basic set as the working corpus until it lands.

05 · The ranking formula that returns NaN

Wall: the corpus' Like+Share ratio formula divides by views — and LinkedIn doesn't expose views on posts, so every text post ranks as NaN.

Fix: two-layer ranking — videos by Like+Share, text posts by raw Likes. And on YouTube, apply a ≥2,000-view floor before taking the top N, or tiny-view noise tops the list.
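
The two-layer ranking can be sketched like this. The item schema (`kind`, `likes`, `shares`, `views`) and the function names are assumptions; the ratio is taken to be (likes + shares) / views, as the wall describes, and the 2,000-view floor is applied to videos only.

```python
import math
from typing import Dict, List, Optional


def engagement_ratio(likes: int, shares: int, views: Optional[int]) -> float:
    """(likes + shares) / views; NaN when views are missing or zero."""
    if not views:
        return math.nan
    return (likes + shares) / views


def rank(items: List[Dict], top_n: int, min_video_views: int = 2000) -> Dict[str, List[Dict]]:
    """Rank videos by engagement ratio and text posts by raw likes."""
    videos = [
        i for i in items
        if i["kind"] == "video" and (i.get("views") or 0) >= min_video_views
    ]
    posts = [i for i in items if i["kind"] == "post"]
    videos.sort(
        key=lambda i: engagement_ratio(i["likes"], i["shares"], i["views"]),
        reverse=True,
    )
    posts.sort(key=lambda i: i["likes"], reverse=True)
    return {"videos": videos[:top_n], "posts": posts[:top_n]}
```

Because the view floor runs before the sort, a 50-view video with a 0.9 ratio never outranks a 5,000-view video with a 0.15 ratio, and text posts never touch the ratio at all.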

06 · Big creators drown the corpus

Wall: a global top-10% cut would be all mega-accounts — smaller creators' proven patterns vanish.

Fix: top 10% per creator, ceil-rounded so every creator contributes at least one piece. Each teaches what works for their audience, weighted equally.
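
The per-creator cut with ceiling rounding can be sketched as follows. The `creator` and `likes` fields are an assumed schema, and ranking by likes alone is a simplification. The count is computed as `ceil(n / 10)` on purpose: multiplying by 0.1 can overshoot through float error (30 * 0.1 is slightly above 3 in floating point, so its ceiling would be 4).

```python
import math
from collections import defaultdict
from typing import Dict, List


def top_decile_per_creator(pieces: List[Dict]) -> List[Dict]:
    """Keep the top 10% of each creator's pieces, at least one per creator."""
    by_creator: Dict[str, List[Dict]] = defaultdict(list)
    for piece in pieces:
        by_creator[piece["creator"]].append(piece)

    kept: List[Dict] = []
    for items in by_creator.values():
        ranked = sorted(items, key=lambda p: p["likes"], reverse=True)
        # Ceiling rounding: any creator with at least one piece contributes one.
        keep = math.ceil(len(ranked) / 10)
        kept.extend(ranked[:keep])
    return kept
```

A creator with 3 pieces contributes 1, a creator with 25 contributes 3, and a creator with 30 contributes exactly 3.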

07 · The richest sources are the leakiest

Wall: all-day dictation and coding-agent prompts carry names, internal numbers, unfinished thoughts, infrastructure details and secrets.

Fix: the hard safety gate runs strictest on these sources, and the standing rule: publish the outcome ("I built a system that mines my own thinking across 5 tools"), never the implementation.

One more gap worth knowing: the owner's own LinkedIn posts are not in the Airtable corpus (his written voice lives in the export; only his videos are in the base). Closing that with a posts-to-Airtable sync gives the system one source of truth — noted in the build docs, deliberately not blocking for v1.
Put it to work

One prompt, three steps

1

Copy the bootstrap prompt

The button below puts it on your clipboard.

2

Paste it into Claude Code

With your LinkedIn export requested and your daily sources in reach (notes, transcripts, dictation).

3

Answer its questions

It calibrates your voice fingerprint first, then wires the daily mine → draft → judge → Slack loop.

Pairs with cookbook 04 — the inspiration engine supplies the benchmark corpus.