Inspiration Engine — Cookbook

What this builds

From a dead scraper stack to a self-refreshing corpus

Every drafting system is only as good as its reference set. This engine keeps a living corpus of proven content from ~44 creators across YouTube, LinkedIn and Instagram — engagement numbers attached — so the content engine (cookbook 05) can judge every draft against what demonstrably travels, not against vibes.

Before

~35 Make.com scenarios plus an n8n workflow did the scraping. The LinkedIn path needed a session cookie hard-coded into one scenario — it expired in December 2024 and the LinkedIn side silently died. Nobody noticed for months.

After

One local Node script with a per-channel map, three no-cookie Apify actors, a webhook fired by a single Airtable checkbox. First run backfills 6 months, every later run is incremental and costs ≈ $0. Verified end-to-end: posts upserted, profile refreshed, checkbox auto-cleared.

A · The Airtable hub

One base, three tables

Users — the curated creator list
Videos — YouTube + IG reels, with transcripts
Text Posts — LinkedIn posts
"Run Now" checkbox = the trigger
Upserts deduped on a stable ID

B · The ingest script

Pull what's proven

One CHANNELS map, one actor each
Backfill once, incremental forever
Profile enrich on every run
Webhook + safety-sweep triggers
CLI flags for every situation

C · The benchmark agent

Corpus → quality bar

Top posts ranked by engagement
3 LinkedIn creators deconstructed
YouTube hook standard (9 patterns)
Adopt / reject verdict per move
Your drafts judged against the floor

Part A — the Airtable hub

One base feeds both halves: ingest writes, drafting reads

Base __AIRTABLE_BASE__ holds everything. The same three tables the ingest script writes are the ones the content engine mines — no export step, no second copy.

Table	What lives there	Key fields
Users	The curated creator list — one row per creator per channel. Your own profiles sit in the same list, so your content lands in the same corpus.	Username (handle/slug), Channel, Run Now, Nickname, Signature, Followers, Friends, Platform ID, ProfileLink, BioLink, Profile Pic, Total Videos, Total Likes, Run ID, Last Modified
Videos	YouTube videos + Instagram reels, transcript attached	ID, User (link), Channel, Caption, Type, Views, Likes, Comments, Shares, Saves, Duration, Height/Width, Cover, SRT, Video Created Time, Link + workflow checkboxes (Generate Script, Repurpose, Used, Generate SRT, Delist)
Text Posts	LinkedIn posts	ID, User (link), Channel, Caption, Likes, Comments, Shares, Cover, Media, Post Created Time, Link, Delisted

01The dedup contract

Every record carries a stable ID — the YouTube video id, the LinkedIn activity URN, the Instagram shortcode. All writes go through Airtable's performUpsert (merge on ID, typecast on, chunks of 10), so re-pulling an overlap window is free: existing records update in place, nothing duplicates.

02Formulas do the ranking

Engagement-ratio formulas (Like+Share, LikeView, LikeComment) live in the base. The consuming skill sorts videos by Like+Share and text posts by raw Likes. No saved Airtable views are required — the quality floor and top-10%-per-creator filters are applied in the consumer code on every pull.

03Profiles stay current

Every run refreshes the creator row: follower counts, bio, profile link, display name — and unchecks Run Now when done. The Run ID field stores the last Apify run for traceability.

04Token scope matters

The token lives in a gitignored file (chmod 600). It needs the write scope (data.records:write) on the base — a read-only token works for the drafting side and then silently fails the moment ingest tries to upsert. Scope it right on day one.

Part B — the channel pulls

One cutoff rule, three very different actors

The cutoff is computed per creator in the main loop, so every channel behaves the same — only how each actor reaches the cutoff differs.

First run → backfill

Nothing stored yet for this creator? Pull the full lookback window — default 6 months (--months N to change it).

Every later run → incremental

Read the newest stored post for that creator (one cheap Airtable query, scoped by Channel), fetch only newer items with a 2-day overlap. The overlap re-pulls a few records — the ID-based upsert absorbs them. Incremental cost: ≈ $0.

Window client-side too

Whatever the actor returns gets filtered against the cutoff again locally (lenient: undated items are kept). Server filters where the actor supports them, local filter always.

Channel	Apify actor (public marketplace)	Cookie?	Lands in
YouTube	`streamers/youtube-scraper`	no	Videos
LinkedIn	`apimaestro/linkedin-profile-posts` + `apimaestro/linkedin-profile-detail` (enrich)	no — cookie-free	Text Posts
Instagram	`apify/instagram-reel-scraper` + `apify/instagram-profile-scraper` (enrich)	no	Videos

YTYouTube — the easy one

The actor takes a server-side oldestPostDate, so the cutoff passes straight through. Sort newest-first, download subtitles as SRT (English), keep only watch?v= items.

Store the thumbnail only, never the video file — Airtable bloats fast and the transcript already lives in the SRT field (capped at 100k chars). Channel stats from the first item refresh the profile: subscribers, description, totals.

LILinkedIn — the no-cookie fix

The old actor needed a li_at session cookie. The replacement takes a profile slug and returns posts — no session, nothing to expire. Reactions → Likes, reposts → Shares; LinkedIn exposes no view counts (stored 0).

No server-side date filter exists (only total_posts), so a cheap 12-post probe measures posting cadence, then the real pull is sized to reach the cutoff plus 30% headroom, capped at 400. Follower/connection counts come from one extra profile-detail call — run on every ingest so counts stay current.

IGInstagram — reels with transcripts

The reel scraper returns includeTranscript (→ the SRT field) and includeSharesCount natively, and onlyPostsNewerThan is a server-side date floor — same incremental model as YouTube. Reels map into the Videos table: caption, link, first frame as cover, likes/comments/shares/views, duration, dimensions.

Instagram doesn't expose saves (stored 0). Profile pic + followers + bio come from the separate profile scraper in the enrich step.

+Adding a channel

One entry in the CHANNELS map: { table, dateField, getTs, fetch, map, profile }. The backfill/incremental cutoff is handled centrally — a new channel only describes how to fetch and how to map fields.

Actors already available on the marketplace for the next ones: clockworks/tiktok-profile-scraper, apidojo/tweet-scraper.

Apify Starter plan: $29/mo in credits — the actors are pay-per-result; a 5-post test run cost ≈ $0.00.

Part C — the live trigger

Check a box, the machine does the rest

Adding a creator or refreshing one is a single checkbox in Airtable. Everything downstream is automatic.

Run Now → webhook

An Airtable automation (trigger: Run Now is checked) runs a one-line script step that fetches your endpoint with the triggering record id: https://__YOUR_MACHINE__.ts.net/trigger?token=__WEBHOOK_SECRET__&recordId=... The endpoint is the local listener exposed via Tailscale Funnel — no cloud server, your own machine answers.

Listener acks fast, runs in background

A tiny HTTP listener on localhost validates the token, responds 200 immediately (so Airtable's webhook never times out on a slow scrape), then spawns the ingest. Single-flight with coalescing: webhooks arriving mid-run queue exactly one more drain — no overlap, no missed checkbox. A launchd agent keeps it alive across reboots.

Ingest fetches the exact record

The listener passes --record <id>, so ingest fetches that one row directly instead of querying {Run Now}=1 — dodging Airtable's formula re-index lag that would otherwise miss a just-checked row. It runs the actor, upserts, refreshes the profile, unchecks the box.

✓ Verified live: posts upserted, profile updated, Run Now auto-cleared

Flag	What it does
`node ingest.mjs`	Drain every creator with Run Now checked (manual run)
`--record <id>`	Process one exact record — the webhook path
`--handle NAME`	Only that username
`--months N` / `--since YYYY-MM-DD`	Change the backfill window / set an explicit cutoff
`--backfill`	Force the full window, ignore stored data
`--limit N` / `--dry-run`	Hard-cap items per creator / fetch + map but write nothing
`--all`	Ignore the checkbox, run every creator — full re-syncs (use with care, cost). Skips creators refreshed <12h ago so an interrupted sweep resumes instead of re-probing

⏱The safety sweep

A webhook that fires while the machine sleeps is lost — Airtable doesn't retry. But the design is idempotent: the box stays checked, so a timed poller (launchd, atomic lock dir against overlapping runs, logs only when something happened) drains any missed rows later. Idle cost ≈ 0: one cheap Airtable GET and exit.

⚙Re-arming the Funnel

Tailscale Funnel persists across reboots once armed: tailscale funnel --bg 8787. Requires Funnel + HTTPS certificates enabled once in the Tailscale admin console. Health check: curl /health on the public URL.

Part D — the benchmark agent

The corpus becomes the bar: rank, deconstruct, judge

Scraping is the cheap half. The value is the teardown: rank every creator's content by engagement, deconstruct why the winners won into a frameworks file, and make that file the gate your own drafts must clear. The current teardown sits on a full re-sync of ~44 creators: 273 + 288 + 178 LinkedIn posts from the three benchmark creators and 1,855 YouTube videos.

LI·1Justin Welsh — the register benchmark

Calm authority, mindset reframes, heavy line breaks, no hype. Three repeatable shapes: Aspiration → Tension → Reframe → Resolution, the Identity Gap ("you feel like a mess, they think you're a machine"), and the Data-Drop Listicle (specificity buys the right to teach).

Adapt, don't copy: he funnels every post into his newsletter — the engine softens that to an occasional CTA, never every time.

LI·2Dan Martell — the System-post benchmark

The richer 288-post pull shows five shapes, not two: the spicy leadership reframe with a 3-tier consequence breakdown, the tactical System with a before/after block ("Email time: 2+ hrs → 15 min"), the short inspirational punch, the vulnerable origin story, and the contrarian-credential take.

Hard-avoid: his "-DM" signature, stacked P.S. lead magnets, and one-word "agree?" bait — engagement-farming, off-register.

LI·3Nathan Latka — the specificity benchmark

Worth stealing: the anti-trends proof story (specific outcome → the contrarian moves that produced it), hyper-specific never-rounded numbers ("$80m", "31,085,072 views"), and ICP-named targeting (he names the reader).

The trap: his highest-engagement posts are comment-gated lead-gen ("comment X and I'll DM you"). That's a distribution hack, not a content style — hard reject.

YTThe YouTube hook standard

Rank by Like+Share with a ≥2,000-view floor — raw ratio alone surfaces tiny-view noise. Nine observed hook types; the single biggest validated winner (494K views / 40K likes) is the time-boxed transformation ("Give me 51 seconds and…").

A passing script: hook lands in ≤5 seconds, carries a specific number, opens in the first sentence — never an "in this video…" preamble.

Creator move	Verdict for your own drafts
Tension → Reframe spine (Welsh)	Adopt — same register
Tactical System + before/after (Martell)	Adopt — the core lane
Anti-trends proof story (Latka)	Adopt
Extreme, never-rounded specificity (Latka)	Already doing — keep maxing
Newsletter funnel on every post (Welsh)	Soften — don't funnel every time
Comment-gating / "like + comment + reshare" / DM-bait	Reject — engagement-farming, salesy
"-DM" signature + stacked P.S. lead magnets	Reject

The engine rule — a LinkedIn draft passes only if it:

(1) opens with a ≤12-word hook in one of the proven patterns, (2) uses either a Tension→Reframe spine OR a concrete step-System with a before/after, (3) carries at least one exact number, (4) keeps the line-break rhythm, (5) closes with a standalone punch or ONE relevant CTA — never comment-bait. If a draft is weaker than the median top-post of the three creators for the same pillar, it gets rewritten. The median is the floor, not the ceiling.

→How your own content rides the same rails

Your own channels sit in the same Users table, so your videos land in the same Videos table and rank under the same formulas. At draft time, the content engine's QA agent (cookbook 05) scores every new draft against this frameworks file plus benchmark anchor posts — rework until it clears, never ship below the bar.

Long-term: once enough of your own posts clear the engagement threshold, your winners replace the inspiration creators as the top anchors. Their corpus seeds the bar; your real outcomes take it over.

↻Keep the teardown fresh

Ranking method: LinkedIn by engagement (Likes + 3×Shares + Comments), YouTube by Like+Share over the ≥2,000-view floor. Refresh the whole corpus with node ingest.mjs --all, then re-run the teardown whenever the creator set changes materially — the last refresh made the dataset 2-3× richer and surfaced two new Martell shapes the first pass missed.

Walls & gotchas

Eight walls this build actually hit

01The cookie that killed the pipeline

Wall: the old LinkedIn actor needed a li_at session cookie hard-coded into the automation. It expired — and the LinkedIn ingest was silently dead for over a year.

Fix: switch to a no-cookie actor that takes a handle and returns posts. No session, nothing to expire, nothing to babysit.

02Read scope can't write

Wall: the existing Airtable token only ever read (the drafting side never needed more), so the first ingest run failed on write.

Fix: mint a token with data.records:write on the base. Read alone is not enough for an upserting pipeline.

03LinkedIn's CDN blocks Airtable

Wall: handing Airtable a profile-image URL silently drops the attachment — the CDN blocks datacenter IPs.

Fix: download the image locally (residential IP works) and upload the bytes via Airtable's content API. Clear the existing pic first so re-runs don't append duplicates.

04The checkbox that isn't there yet

Wall: querying filterByFormula={Run Now}=1 right after the webhook misses the just-checked row — Airtable's formula index lags.

Fix: pass the triggering record id through the webhook and fetch that exact record directly.

05No date filter on the LinkedIn actor

Wall: the actor only accepts total_posts — pagination parameters don't advance, so you can't ask for "posts since March".

Fix: a cheap 12-post probe measures the creator's posting cadence, then the real pull is sized to reach the cutoff + headroom (capped at 400). A near-daily poster's first 6-month backfill genuinely needs ~200 posts; incrementals after that are tiny.

06Transcripts cost real money

Wall: Instagram's includeTranscript is compute-heavy — ~$0.05–0.33 per reel, scaling with video length. Backfilling a reel-heavy creator adds up.

Fix: backfill consciously, then lean on the server-side date floor for cheap incrementals.

07Webhooks die while the laptop sleeps

Wall: a webhook that fires while the machine is asleep is lost — Airtable doesn't retry.

Fix: idempotent design. The checkbox stays checked, so a timed safety sweep (lock-dir guarded) drains any missed rows on the next tick.

08Shared tables bleed cutoffs

Wall: YouTube and Instagram share the Videos table — a same-named creator on both channels would inherit the wrong "newest stored post" cutoff.

Fix: scope the latest-post query by Channel as well as by creator.

Everything on this page is the system as it runs today: the ingest script, the webhook listener, the poller, and the frameworks teardown are live in production — the legacy Make.com + n8n wiring is retired.

The Inspiration Engine