Self — save · an inbox that triages itself

The Inbox OS

A triage agent that runs before you wake: stars what matters, labels and archives the noise, unsubscribes from bulk at the protocol level — and reports in one daily briefing you answer in one line. Calibrated on 20 years of your own starred mail, improving with every reply you send it.

Agentic hire · Inbox Assistant
  • 06:30 daily: done before coffee
  • Calibrated on decades of stars
  • Never deletes: archive only
Gmail · Slack · a scheduled agent

What this builds

From anxiety surface to processed queue

Before

The inbox is an anxiety surface: hundreds of unread, the important drowning in newsletters and recap mails, a years-deep backlog you'll "sort out someday". Every glance costs focus; nothing has been decided.

After

The inbox is a processed queue: what matters is starred and waiting, noise is labeled and archived but still searchable, bulk is unsubscribed — and one morning briefing surfaces the few cases that genuinely need a human call.

A · The decision ladder

Six actions, first match wins
  • Star + keep — the human queue
  • Label + archive — searchable noise
  • Leave alone — protected senders
  • Unsubscribe — header only
  • Filter future — bulk with no exit
  • Archive — the resting default

B · Calibration

Rules from your behavior
  • Mine every star you ever set
  • Control sample for keyword lift
  • Reply-engagement whitelists
  • Read-only until you approve
  • Re-runs monthly, weighted

C · The daily loop

One run, one briefing
  • Scheduled run at 06:30
  • One Slack DM, not 40 pings
  • One-line reply commands
  • Active learning from replies
  • Full audit log per day
Part A — the decision ladder

Six actions. Every message walks the ladder.

Each new message is evaluated top to bottom — first match wins. Sender-specific overrides sit above domain rules, so one noisy address can't ride a trusted domain into the star queue.

Action | What it means | What triggers it
Label + archive | Useful but noisy: labeled, out of the inbox, searchable forever | Known recap and notice senders (e.g. meeting-recorder summaries → label fathom); per-sender overrides. Evaluated before star rules
Star + keep | The human queue: what you actually open | Whitelisted domains (company, tax/legal, property, key clients), individual trusted senders, money/legal subject keywords (urgent, contract, invoice, term sheet…), replies to threads you started
Leave alone | Looks like bulk, isn't: never filtered, never auto-starred | Banking and payment alerts, vendor billing, calendar mail, identity/security senders, and any subject carrying a 2FA or verification code
Unsubscribe | Bulk with a clean exit | List-Unsubscribe header present + bulk-looking sender (noreply/news/marketing localparts, ESP trace headers)
Filter future | Bulk with no exit: a filter archives the sender's future mail | Bulk pattern, no usable unsubscribe header, sender not on the careful list
Default | The resting state, differs per account | Work account: leave it in the inbox. Personal account: archive it (cleanup mode)
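
To make "first match wins" concrete, here is a minimal sketch of the ladder as a single function. Every address, domain and keyword list below is a hypothetical placeholder, not the production ruleset, and the "surface" outcome for careful-list senders is an assumption about how the careful list plugs in.

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    sender: str
    subject: str
    headers: dict = field(default_factory=dict)
    reply_to_own_thread: bool = False

# Hypothetical rule data; a real ruleset would come out of calibration.
LABEL_SENDERS = {"recap@recorder.example": "fathom"}
STAR_SENDERS = {"anna@client.example"}
STAR_DOMAINS = {"taxoffice.example"}
STAR_KEYWORDS = ("urgent", "contract", "invoice", "term sheet")
PROTECTED_DOMAINS = {"bank.example"}
CAREFUL_SENDERS = {"noreply@community.example"}
BULK_LOCALPARTS = ("noreply", "no-reply", "news", "marketing")

def domain_of(address: str) -> str:
    return address.rsplit("@", 1)[-1].lower()

def looks_bulk(sender: str) -> bool:
    return sender.split("@", 1)[0].lower().startswith(BULK_LOCALPARTS)

def decide(msg: Message, account: str) -> str:
    """Walk the ladder top to bottom; the first matching rung decides."""
    sender, subject = msg.sender.lower(), msg.subject.lower()
    header_names = {name.lower() for name in msg.headers}

    # 1. Per-sender label overrides sit above every star rule.
    if sender in LABEL_SENDERS:
        return f"label:{LABEL_SENDERS[sender]}+archive"
    # 2. Star: trusted sender, trusted domain, keyword, or reply to own thread.
    if (sender in STAR_SENDERS or domain_of(sender) in STAR_DOMAINS
            or any(k in subject for k in STAR_KEYWORDS)
            or msg.reply_to_own_thread):
        return "star"
    # 3. Leave alone: protected senders and verification codes.
    if domain_of(sender) in PROTECTED_DOMAINS or "verification code" in subject:
        return "leave"
    # 4 and 5. Bulk: unsubscribe if the header exists, otherwise filter.
    if looks_bulk(sender):
        if "list-unsubscribe" in header_names:
            return "unsubscribe"
        if sender in CAREFUL_SENDERS:
            return "surface"  # assumption: careful-list mail goes to the briefing
        return "filter"
    # 6. Default differs per account.
    return "leave" if account == "work" else "archive"
```

Because the label override is checked first, a recap mail whose subject happens to contain "contract" is still labeled and archived rather than starred.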

Two accounts, opposite postures

Same ladder, different defaults. The work account errs toward visible — anything unmatched stays put. The personal account runs in 20-year cleanup mode and errs toward empty — anything unmatched is archived, never deleted, always searchable.

Keywords are tripwires, not the system

Subject keywords catch the long tail — payment, legal, deadline, cancellation terms in both your languages. But the heavy lifting is sender-level: who you reply to beats what words they used.

Part B — calibrate before you automate

The rules come from your stars, not your guesses

Before the agent touches a single message, a read-only calibration pass mines what you already told your inbox over two decades — every star is a labeled training example you didn't know you were creating.

1

Mine every star you ever set

Pull all starred mail (paginated year by year when the archive is deep), with full headers: sender, domain, subject, date, thread position. Exclude stars on your own sent mail — those are follow-up bookmarks, not importance signals, and they poison the rules.

2

Pull a control sample

Thousands of recent unstarred messages as the denominator. Without it, "invoice" looks important just because it's frequent. With it, every keyword is measured as lift: how much more often it appears in starred vs ordinary mail.
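
As a rough illustration of lift, the sketch below compares how often a word appears in starred subjects versus the control sample. The tokenizer and the smoothing floor are assumptions chosen for brevity, not the calibration pass itself.

```python
import re
from collections import Counter

def keyword_rates(subjects):
    """Fraction of subjects containing each word (counted once per subject)."""
    counts = Counter()
    for subject in subjects:
        counts.update(set(re.findall(r"[a-z]+", subject.lower())))
    total = len(subjects)
    return {word: n / total for word, n in counts.items()} if total else {}

def keyword_lift(starred_subjects, control_subjects):
    """Lift = rate in starred mail / rate in ordinary mail."""
    starred = keyword_rates(starred_subjects)
    control = keyword_rates(control_subjects)
    # Assumed smoothing: a word never seen in the control sample is treated
    # as if it had appeared at a small floor rate, to avoid dividing by zero.
    floor = 1 / (len(control_subjects) + 1)
    return {w: rate / max(control.get(w, 0.0), floor) for w, rate in starred.items()}
```

A word like "invoice" that is equally common in both samples comes out at a lift of 1.0, which is exactly the "frequent, not important" case the control sample exists to expose.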

3

Derive rules with explicit thresholds

  • Domains starred 5+ times → always-star candidates.
  • Individual senders starred 3+ times (not already covered by a domain) → whitelist.
  • Subject keywords in 10+ starred mails and at ≥3× lift → keyword rules.
  • Replies to threads you started: ≥70% star rate → automatic rule, <30% → dropped, in between → ask-every-time.
  • Domains seen 20+ times with zero stars → archive candidates, surfaced but never auto-applied.

On top: 90 days of sent-mail analysis — whitelist the senders you actually answer, not just the ones you starred years ago.
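
The sender and domain thresholds above can be sketched in a few lines. The function names and the input shape (plain lists of sender addresses) are illustrative assumptions; the numbers are the ones stated in the text.

```python
from collections import Counter

def domain_of(address: str) -> str:
    return address.rsplit("@", 1)[-1].lower()

def derive_sender_rules(starred_senders, all_senders):
    """starred_senders: one entry per starred mail; all_senders: every mail seen."""
    stars_by_domain = Counter(domain_of(s) for s in starred_senders)
    stars_by_sender = Counter(starred_senders)
    seen_by_domain = Counter(domain_of(s) for s in all_senders)

    # Domains starred 5+ times become always-star candidates.
    star_domains = {d for d, n in stars_by_domain.items() if n >= 5}
    # Senders starred 3+ times, unless their domain already covers them.
    whitelist = {s for s, n in stars_by_sender.items()
                 if n >= 3 and domain_of(s) not in star_domains}
    # Domains seen 20+ times with zero stars are surfaced, never auto-applied.
    archive_candidates = {d for d, n in seen_by_domain.items()
                          if n >= 20 and stars_by_domain[d] == 0}
    return star_domains, whitelist, archive_candidates

def reply_thread_rule(star_rate: float) -> str:
    """Three-way split for replies to threads you started."""
    if star_rate >= 0.70:
        return "auto-star"
    if star_rate < 0.30:
        return "drop"
    return "ask"
```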

4

The approval gate

Calibration writes a report and sends a briefing: sample sizes, top domains, top keywords, suggested rules. Nothing acts until you reply "approved" — the daily run literally refuses to start without the approval file on disk. Hand-written rules stay a floor; calibration adds, never overrides.
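
One way to picture the approval gate is a check at the very top of the daily run. This is a sketch only: the file name calibration.approved and the exception type are hypothetical, since the text says only that an approval file must exist on disk.

```python
from pathlib import Path

class NotApprovedError(RuntimeError):
    """Raised when the daily run is started before calibration is approved."""

def require_approval(state_dir: Path) -> Path:
    # Hypothetical marker name; written once the user replies "approved".
    marker = state_dir / "calibration.approved"
    if not marker.is_file():
        raise NotApprovedError(
            f"no approval file at {marker}; refusing to act on any message"
        )
    return marker
```

Calling require_approval before any mailbox access keeps the gate in one place, so no code path can act on mail without passing it.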

Done when: the ruleset reflects your behavior — and you've signed off on it
Part C — the daily loop

One run, one briefing, one line back

1

06:30 — the run

A scheduled job processes the last 24 hours of inbox mail (never spam, never trash, never anything already starred). Each message walks the ladder; labels, stars and archives are applied. On the personal account, a historical-cleanup batch of the oldest unread runs alongside.
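
The scope of the run can be expressed as a Gmail search query. The helper below is a sketch using standard Gmail search operators; the function name is hypothetical, and newer_than works in whole days, so it only approximates "the last 24 hours".

```python
def daily_inbox_query(days: int = 1) -> str:
    # in:inbox already excludes spam and trash; the explicit negations
    # keep the guardrail visible in the query itself.
    return f"in:inbox newer_than:{days}d -is:starred -in:spam -in:trash"
```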

2

One briefing, not forty pings

A single Slack DM: counts per account, the handful of subjects it starred, pending unsubscribes tagged U1, U2…, and borderline cases tagged B1, B2… — each with a real opinion, not hedging. An empty day says so in one line.

3

You answer in one line — or not at all

Replies are commands. Silence is fine: undecided cases simply resurface tomorrow. Replying to actual emails stays yours — the agent files and flags, it never writes to your contacts on your behalf.
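
A one-line reply such as unsub U1 U3 or star B1, skip B2 is simple enough to parse with a small tokenizer. The sketch below assumes four verbs (unsub, keep, star, skip) and tags of the form U or B plus a number; anything else in the reply is ignored.

```python
import re

VERBS = {"unsub", "keep", "star", "skip"}

def parse_reply(text: str):
    """Turn a one-line reply into (verb, tag) pairs, in order."""
    commands = []
    verb = None
    for token in re.findall(r"[A-Za-z]+\d*", text):
        lowered = token.lower()
        if lowered in VERBS:
            verb = lowered  # this verb applies to the tags that follow it
        elif verb and re.fullmatch(r"[UB]\d+", token.upper()):
            commands.append((verb, token.upper()))
    return commands
```

An empty or unrelated reply yields an empty list, which matches the "silence is fine" behaviour: nothing is executed and pending cases resurface the next day.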

4

The system learns — with your replies weighted 3×

Every star/skip/keep decision is appended to a learning log. Monthly recalibration re-mines everything, weighting your explicit replies 3× heavier than passive star history. A borderline rule collapses into a firm keep-or-drop once 20+ decisions land with a ≥70% majority. Teaching the system is a by-product of reading one DM.
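
The collapse rule can be sketched as a weighted vote. Two points are assumptions, because the text does not pin them down: the 20-decision minimum is counted unweighted, and the 70% majority is computed on weighted votes.

```python
def resolve_borderline(decisions, reply_weight=3.0, min_decisions=20, majority=0.70):
    """decisions: list of (choice, source) pairs.

    choice is "star" or "skip"; source is "reply" (explicit briefing reply)
    or "passive" (inferred from star history).
    """
    if len(decisions) < min_decisions:
        return "ask"  # not enough evidence yet, keep surfacing it
    weight = {"star": 0.0, "skip": 0.0}
    for choice, source in decisions:
        weight[choice] += reply_weight if source == "reply" else 1.0
    total = weight["star"] + weight["skip"]
    if weight["star"] / total >= majority:
        return "keep"
    if weight["skip"] / total >= majority:
        return "drop"
    return "ask"
```

With ten explicit "star" replies against ten passive skips, the weighted share is 30 of 40, or 75%, so the rule collapses to keep even though the raw counts are tied.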

Done when: the morning briefing takes under a minute — most days you reply nothing
→ The entire morning ritual, verbatim

"unsub U1 U3" — execute those unsubscribes.  "keep U2" — never touch that sender again.  "star B1, skip B2" — teach the borderline rules.  That's the whole interface: a one-line Slack reply, or silence.

The cleanup gearbox

The historical backlog melts on a leash: warm-up at 200 oldest-unread per day → only after 5 days under a 1% error rate does it shift to full at 500/day → complete when the backlog drains. Throughput is earned by a clean error record, not assumed.
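
The gear shift reduces to one decision per day. In this sketch the five clean days are assumed to be the five most recent consecutive days, and the batch drops back to warm-up if a later day exceeds the threshold; the text does not say whether the real system downshifts.

```python
def batch_size(daily_error_rates, warmup=200, full=500, clean_days=5, max_error=0.01):
    """Pick today's cleanup batch size from the recorded daily error rates."""
    recent = daily_error_rates[-clean_days:]
    earned = len(recent) == clean_days and all(rate < max_error for rate in recent)
    return full if earned else warmup
```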

The audit log

Every action lands in a daily file: action, sender, subject, reason — plus stats and borderline reasoning. Not bureaucracy: it's what makes the system debuggable. Two of the three rule patches below came straight out of reading one day's log.

The unsubscribe pass

Protocol-level exits, never body links

1

Only the RFC header

Unsubscribes use the List-Unsubscribe mail header exclusively: one-click POST (RFC 8058) preferred, mailto: fallback. Links inside email bodies are never clicked — that's where the phishing lives. The one-word protocol unsubscribe is the only mail the agent ever sends.
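
A minimal sketch of the header-only decision, using Python's standard email parser. It only chooses the exit; sending the POST or the mailto message is left out. The function name and return shape are illustrative assumptions.

```python
import re
from email import message_from_string

def unsubscribe_plan(raw_message: str):
    """Return ("post", url), ("mailto", uri) or None. The body is never read."""
    msg = message_from_string(raw_message)
    header = msg.get("List-Unsubscribe", "")
    targets = re.findall(r"<([^>]+)>", header)
    https = [t for t in targets if t.lower().startswith("https://")]
    mailto = [t for t in targets if t.lower().startswith("mailto:")]

    # RFC 8058: one-click is signalled by a second header,
    # List-Unsubscribe-Post: List-Unsubscribe=One-Click
    post_header = msg.get("List-Unsubscribe-Post", "").replace(" ", "").lower()
    if post_header == "list-unsubscribe=one-click" and https:
        return ("post", https[0])
    if mailto:
        return ("mailto", mailto[0])
    return None
```

An HTTPS target without the List-Unsubscribe-Post header is deliberately not used: without the one-click signal it is just a link, and following it would amount to clicking one.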

2

No header? Filter the future

Bulk senders without a clean exit get a mail filter that auto-archives their future sends. Created idempotently — existing filters are checked first, so three mails from the same sender in one run don't stack three duplicate filters.
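
The idempotency check can be illustrated with an in-memory stand-in for the mailbox's filter list. The dictionary shape here is hypothetical; a real run would list existing filters through whatever Gmail access it has before creating one.

```python
def ensure_archive_filter(existing_filters, sender: str) -> bool:
    """Add an auto-archive filter for sender unless one already exists.

    existing_filters is a mutable list of dicts such as
    {"from": "news@shop.example", "action": "archive"}.
    Returns True if a filter was created, False if it was already there.
    """
    key = sender.strip().lower()
    for rule in existing_filters:
        if rule["from"] == key and rule["action"] == "archive":
            return False
    existing_filters.append({"from": key, "action": "archive"})
    return True
```

Three mails from the same sender in one run therefore produce one filter: the first call creates it, the next two find it and do nothing.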

3

Mixed senders get a human

Some platform addresses carry both bulk digests and real humans — community platforms, social networks, code-hosting notifications all send invites and mentions from the same noreply address that sends spam-grade digests. These live on a careful list: never auto-filtered, always surfaced as a pending decision.

4

The first week is training wheels

For 7 days after install, every unsubscribe surfaces for explicit approval instead of executing, which protects the three newsletters you actually love from an over-eager bulk pattern. From day 8, auto-execute begins for anything you haven't keep-listed.

Done when: bulk mail unsubscribes itself — and nothing you wanted ever disappeared
Guardrails & gotchas

Seven never-rules — and the walls we hit anyway

The agent acts on your mail every day, so the safety rules are absolute — and the gotchas below are real patches, each traceable to a logged false positive in the first days of operation.

Never delete. Archive only — every action is reversible.
Never click links in email bodies. Protocol headers only.
Never act during calibration. Read-only until explicit approval.
Never unsubscribe a real human. A weekly-refreshed work-contacts cache guards the personal cleanup — known contacts get starred, not unsubscribed.
Never touch already-starred mail. Your stars outrank its rules.
Never act on spam or trash. Every query excludes them.
Borderline → surface, don't act. Doubt goes to the briefing, not to autopilot.

01 · Domain whitelists are too blunt

A whitelisted client domain carried both a key contact and a newsletter sender — the newsletter got starred for days. Fix: drop the domain rule, whitelist the human, label-and-archive the newsletter address. Sender rules now evaluate before domain rules.

02 · Keywords misfire across contexts

The tax keyword faithfully starred a platform vendor's "tax and price updates" notices. Fix: a sender-level label-and-archive override that beats any keyword. Same lesson from a membership org: real correspondence stars, the same org's noreply digests get labeled away.

03 · The CLI lies politely

The Gmail CLI's metadata format returned zero headers on this version — silently. Unsubscribe detection needs headers, so everything fetches raw. Cost a debugging session; now one line in the runbook.

04 · Security mail must be untouchable

Identity senders and anything carrying a verification or 2FA code look exactly like bulk — noreply sender, templated subject. They're hard-coded leave-alone: a triage system that archives your login codes gets uninstalled the same day.

05 · Fail soft, run anyway

Rate-limited? Process what landed, resume tomorrow. One account's auth expires? Surface it, skip that account, still run the other — a real run did exactly this. Unhandled exception? Error trace to disk plus a DM. The run never silently skips, and it doesn't pause for vacations.
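
A sketch of the per-account isolation described above. The process and notify callables and the AuthExpired exception are hypothetical stand-ins; writing the trace to disk is folded into notify here to keep the example short.

```python
import traceback

class AuthExpired(Exception):
    """Hypothetical signal that an account's credentials need refreshing."""

def run_all(accounts, process, notify):
    """Run every account; one account's failure never stops the others."""
    results = {}
    for account in accounts:
        try:
            results[account] = process(account)
        except AuthExpired:
            results[account] = "skipped: auth expired"
            notify(f"{account}: auth expired, skipped this run")
        except Exception:
            # Unhandled error: report the full trace and carry on.
            results[account] = "failed"
            notify(f"{account}: unhandled error\n{traceback.format_exc()}")
    return results
```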

06 · Dry-run before live, post-mortem after

A read-only harness mirroring every rule ran against the live inbox before the skill was allowed to act. Then: ship narrow, read the audit log, patch. Two dated rule versions in the first week — each one a logged false positive turned into a fix.

Distilled from a triage skill in daily production since May 2026 — ruleset, calibration files, learning log and daily audit logs. Senders, subjects and counts stay private; the patterns are the cookbook.
Put it to work

One prompt, three steps

1

Copy the bootstrap prompt

The button below puts it on your clipboard.

2

Paste it into Claude Code

With Gmail access (CLI or MCP) and a Slack DM route set up.

3

Approve the calibration

It mines your stars read-only first; nothing touches a message until you reply "approved".

Calibration is read-only — the system earns autonomy one approval at a time.