Inbox OS — Cookbook · MarketingOS

What this builds

From anxiety surface to processed queue

Before

The inbox is an anxiety surface: hundreds of unread, the important drowning in newsletters and recap mails, a years-deep backlog you'll "sort out someday". Every glance costs focus; nothing has been decided.

After

The inbox is a processed queue: what matters is starred and waiting, noise is labeled and archived but still searchable, bulk is unsubscribed — and one morning briefing surfaces the few cases that genuinely need a human call.

A · The decision ladder

Six actions, first match wins

Label + archive: searchable noise
Star + keep: the human queue
Leave alone: protected senders
Unsubscribe: header only
Filter future: bulk with no exit
Default: the per-account resting state

B · Calibration

Rules from your behavior

Mine every star you ever set
Control sample for keyword lift
Reply-engagement whitelists
Read-only until you approve
Re-runs monthly, weighted

C · The daily loop

One run, one briefing

Scheduled run at 06:30
One Slack DM, not 40 pings
One-line reply commands
Active learning from replies
Full audit log per day

Part A — the decision ladder

Six actions. Every message walks the ladder.

Each new message is evaluated top to bottom — first match wins. Sender-specific overrides sit above domain rules, so one noisy address can't ride a trusted domain into the star queue.

Action	What it means	What triggers it
Label + archive	Useful but noisy — labeled, out of the inbox, searchable forever	Known recap and notice senders (e.g. meeting-recorder summaries → label `fathom`); per-sender overrides — evaluated before star rules
Star + keep	The human queue — what you actually open	Whitelisted domains (company, tax/legal, property, key clients), individual trusted senders, money/legal subject keywords (`urgent`, `contract`, `invoice`, `term sheet`…), replies to threads you started
Leave alone	Looks like bulk, isn't — never filtered, never auto-starred	Banking and payment alerts, vendor billing, calendar mail, identity/security senders — and any subject carrying a 2FA or verification code
Unsubscribe	Bulk with a clean exit	`List-Unsubscribe` header present + bulk-looking sender (noreply/news/marketing localparts, ESP trace headers)
Filter future	Bulk with no exit — a filter archives the sender's future mail	Bulk pattern, no usable unsubscribe header, sender not on the careful list
Default	The resting state — differs per account	Work account: leave it in the inbox. Personal account: archive it (cleanup mode)
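A minimal sketch of the first-match-wins walk, in the table's order. Everything named here is an assumption for illustration: the `Message` shape, the rule sets, the example addresses and the action names are hypothetical, and real lists would come out of calibration rather than being typed in.

```python
from dataclasses import dataclass, field

# Hypothetical rule data; none of these addresses come from the real ruleset.
LABEL_SENDERS = {"recap@recorder.example": "fathom"}   # sender -> label
STAR_DOMAINS = {"client.example"}
STAR_KEYWORDS = ("urgent", "contract", "invoice", "term sheet")
PROTECTED_DOMAINS = {"bank.example"}
CAREFUL_SENDERS = {"noreply@community.example"}
BULK_LOCALPARTS = {"noreply", "no-reply", "news", "marketing"}


@dataclass
class Message:
    sender: str                                  # full address, assumed lowercased
    subject: str
    headers: dict = field(default_factory=dict)  # assumed canonical header names

    @property
    def domain(self) -> str:
        return self.sender.rsplit("@", 1)[-1]

    @property
    def localpart(self) -> str:
        return self.sender.split("@", 1)[0]


def classify(msg: Message, account: str = "work") -> str:
    """Walk the ladder top to bottom and return the first matching action."""
    subject = msg.subject.lower()
    # 1. Per-sender label-and-archive overrides sit above every star rule.
    if msg.sender in LABEL_SENDERS:
        return "label_archive"
    # 2. Star: trusted domains or money/legal subject keywords.
    if msg.domain in STAR_DOMAINS or any(k in subject for k in STAR_KEYWORDS):
        return "star"
    # 3. Leave alone: protected senders and verification-code subjects.
    if msg.domain in PROTECTED_DOMAINS or "verification code" in subject:
        return "leave_alone"
    # 4-5. Bulk: unsubscribe if the header exists, otherwise filter the future,
    #      unless the sender is on the careful list (then surface it).
    if msg.localpart in BULK_LOCALPARTS:
        if "List-Unsubscribe" in msg.headers:
            return "unsubscribe"
        if msg.sender not in CAREFUL_SENDERS:
            return "filter_future"
        return "surface"
    # 6. Default differs per account.
    return "keep_in_inbox" if account == "work" else "archive"
```

The point of the ordering shows in one case: a recap sender whose subject happens to contain "invoice" is labeled away, because the sender override is checked before any keyword gets a vote.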

Two accounts, opposite postures

Same ladder, different defaults. The work account errs toward visible — anything unmatched stays put. The personal account runs in 20-year cleanup mode and errs toward empty — anything unmatched is archived, never deleted, always searchable.

Keywords are tripwires, not the system

Subject keywords catch the long tail — payment, legal, deadline, cancellation terms in both your languages. But the heavy lifting is sender-level: who you reply to beats what words they used.

Part B — calibrate before you automate

The rules come from your stars, not your guesses

Before the agent touches a single message, a read-only calibration pass mines what you already told your inbox over two decades — every star is a labeled training example you didn't know you were creating.

Mine every star you ever set

Pull all starred mail (paginated year by year when the archive is deep), with full headers: sender, domain, subject, date, thread position. Exclude stars on your own sent mail — those are follow-up bookmarks, not importance signals, and they poison the rules.

Pull a control sample

Thousands of recent unstarred messages as the denominator. Without it, "invoice" looks important just because it's frequent. With it, every keyword is measured as lift: how much more often it appears in starred vs ordinary mail.
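Lift is a ratio of two rates, which a few lines make concrete. This is a sketch under assumptions: the function name is hypothetical, matching is plain case-insensitive substring search, and a keyword that never appears in the control sample is reported as infinite lift rather than dividing by zero.

```python
def keyword_lift(keyword, starred_subjects, control_subjects):
    """Rate of the keyword in starred subjects divided by its rate in control subjects."""
    if not starred_subjects or not control_subjects:
        raise ValueError("both samples must be non-empty")
    kw = keyword.lower()
    starred_rate = sum(kw in s.lower() for s in starred_subjects) / len(starred_subjects)
    control_rate = sum(kw in s.lower() for s in control_subjects) / len(control_subjects)
    if control_rate == 0:
        # Never seen in ordinary mail: infinite lift if it is starred at all.
        return float("inf") if starred_rate > 0 else 0.0
    return starred_rate / control_rate


starred = ["Invoice 1", "Invoice 2", "Lunch", "Contract"]
control = ["Invoice promo", "News", "News", "News"]
print(keyword_lift("invoice", starred, control))  # 0.5 / 0.25
```

In this toy sample "invoice" appears in half the starred mail but also in a quarter of the ordinary mail, so its lift is only 2×, not enough on its own to call it a signal.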

Derive rules with explicit thresholds

Domains starred 5+ times → always-star candidates. Individual senders starred 3+ times (not already covered by a domain) → whitelist. Subject keywords in 10+ starred mails and at ≥3× lift → keyword rules. Replies to threads you started: ≥70% star rate → automatic rule, <30% → dropped, in between → ask-every-time. Domains seen 20+ times with zero stars → archive candidates, surfaced but never auto-applied.

On top: 90 days of sent-mail analysis — whitelist the senders you actually answer, not just the ones you starred years ago.
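The thresholds translate almost directly into code. A sketch, assuming each message is reduced to its sender address and that the starred and control samples arrive as plain lists; the function names and return shapes are hypothetical.

```python
from collections import Counter


def _domain(address):
    return address.rsplit("@", 1)[-1]


def derive_sender_rules(starred_senders, control_senders):
    """Return (always-star domains, whitelisted senders, archive-candidate domains)."""
    star_by_domain = Counter(_domain(s) for s in starred_senders)
    star_by_sender = Counter(starred_senders)
    seen_by_domain = Counter(_domain(s) for s in starred_senders + control_senders)

    # Domains starred 5+ times become always-star candidates.
    star_domains = {d for d, n in star_by_domain.items() if n >= 5}
    # Senders starred 3+ times, unless a domain rule already covers them.
    star_senders = {
        s for s, n in star_by_sender.items()
        if n >= 3 and _domain(s) not in star_domains
    }
    # Domains seen 20+ times with zero stars: surfaced, never auto-applied.
    archive_candidates = {
        d for d, n in seen_by_domain.items()
        if n >= 20 and star_by_domain[d] == 0
    }
    return star_domains, star_senders, archive_candidates


def is_keyword_rule(starred_hits, lift):
    """Keyword rule: in 10+ starred mails and at least 3x lift."""
    return starred_hits >= 10 and lift >= 3.0


def reply_thread_rule(star_rate):
    """Replies to threads you started: automatic, dropped, or ask-every-time."""
    if star_rate >= 0.70:
        return "auto_star"
    if star_rate < 0.30:
        return "drop"
    return "ask"
```

Everything this returns is a candidate list for the report. Nothing here applies a rule.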

The approval gate

Calibration writes a report and sends a briefing: sample sizes, top domains, top keywords, suggested rules. Nothing acts until you reply "approved" — the daily run literally refuses to start without the approval file on disk. Hand-written rules stay a floor; calibration adds, never overrides.
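The gate itself can be as small as a file check at the top of the daily run. A sketch only: the exception name, function name and file path are assumptions, not the skill's actual layout.

```python
from pathlib import Path


class NotApprovedError(RuntimeError):
    """Raised when the daily run is started before calibration was approved."""


def require_approval(approval_file: Path) -> None:
    # Hypothetical convention: the file is written only after the user
    # replies "approved" to the calibration briefing.
    if not approval_file.is_file():
        raise NotApprovedError(f"no approval file at {approval_file}; refusing to run")
```

Calling `require_approval(Path("state/approved.txt"))` (a made-up path) as the first statement of the run means a missing sign-off stops everything before a single message is read.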

Done when: the ruleset reflects your behavior — and you've signed off on it

Part C — the daily loop

One run, one briefing, one line back

06:30 — the run

A scheduled job processes the last 24 hours of inbox mail (never spam, never trash, never anything already starred). Each message walks the ladder; labels, stars and archives are applied. On the personal account, a historical-cleanup batch of the oldest unread runs alongside.

One briefing, not forty pings

A single Slack DM: counts per account, the handful of subjects it starred, pending unsubscribes tagged U1, U2…, and borderline cases tagged B1, B2… — each with a real opinion, not hedging. An empty day says so in one line.

You answer in one line — or not at all

Replies are commands. Silence is fine: undecided cases simply resurface tomorrow. Replying to actual emails stays yours — the agent files and flags, it never writes to your contacts on your behalf.
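A sketch of how a one-line reply such as "unsub U1 U3" or "star B1, skip B2" might be parsed into (verb, tag) pairs. The grammar is an assumption inferred from those examples: comma-separated clauses, one verb per clause, then one or more U or B tags. An empty reply parses to nothing, which matches silence being fine.

```python
import re

VERBS = {"unsub", "keep", "star", "skip"}
TAG = re.compile(r"^[UB]\d+$")


def parse_reply(text):
    """Turn a one-line reply into a list of (verb, tag) commands."""
    commands = []
    for clause in text.split(","):
        tokens = clause.split()
        if not tokens:
            continue  # empty clause or empty reply: nothing to do
        verb = tokens[0].lower()
        tags = [t.upper() for t in tokens[1:]]
        if verb not in VERBS or not tags or not all(TAG.match(t) for t in tags):
            # Reject rather than guess; a misread command acts on real mail.
            raise ValueError(f"unrecognized command: {clause.strip()!r}")
        commands.extend((verb, tag) for tag in tags)
    return commands
```

Rejecting anything unrecognized is the conservative choice here: the case simply stays undecided and resurfaces tomorrow.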

The system learns — with your replies weighted 3×

Every star/skip/keep decision is appended to a learning log. Monthly recalibration re-mines everything, weighting your explicit replies 3× heavier than passive star history. A borderline rule collapses into a firm keep-or-drop once 20+ decisions land with a ≥70% majority. Teaching the system is a by-product of reading one DM.
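One way the weighting and the collapse rule could fit together, as a sketch. Two assumptions are baked in and may not match the real skill: the 20-decision minimum counts raw decisions, and the 70% majority is measured on the weighted rate. Function names and the (starred, explicit) tuple shape are hypothetical.

```python
EXPLICIT_WEIGHT = 3.0  # an explicit reply counts three times a passive star


def weighted_star_rate(decisions):
    """decisions: iterable of (starred: bool, explicit: bool) pairs."""
    total = starred_weight = 0.0
    for starred, explicit in decisions:
        weight = EXPLICIT_WEIGHT if explicit else 1.0
        total += weight
        if starred:
            starred_weight += weight
    return starred_weight / total if total else None


def resolve_borderline(decisions):
    """Collapse a borderline rule once enough decisions agree; otherwise keep asking."""
    decisions = list(decisions)
    if len(decisions) < 20:
        return "ask"
    rate = weighted_star_rate(decisions)
    if rate >= 0.70:
        return "star"
    if rate <= 0.30:
        return "skip"
    return "ask"
```

With 15 explicit "star" replies and 5 passive skips, the weighted rate is 45/50, so the rule firms up to "star" even though a quarter of the raw history disagreed.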

Done when: the morning briefing takes under a minute — most days you reply nothing

→ The entire morning ritual, verbatim

"unsub U1 U3" — execute those unsubscribes. "keep U2" — never touch that sender again. "star B1, skip B2" — teach the borderline rules. That's the whole interface: a one-line Slack reply, or silence.

The cleanup gearbox

The historical backlog melts on a leash: warm-up at 200 oldest-unread per day → only after 5 days under a 1% error rate does it shift to full at 500/day → complete when the backlog drains. Throughput is earned by a clean error record, not assumed.
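The gearbox reduces to one small function. A sketch, assuming the error rate is recorded once per day as a fraction and that "5 days under a 1% error rate" means the five most recent days in a row; the function name is hypothetical.

```python
WARMUP_BATCH = 200
FULL_BATCH = 500
CLEAN_DAYS_REQUIRED = 5
MAX_ERROR_RATE = 0.01


def daily_batch_size(daily_error_rates, backlog_remaining):
    """How many oldest-unread messages today's cleanup batch may process."""
    if backlog_remaining <= 0:
        return 0  # backlog drained: cleanup is complete
    recent = daily_error_rates[-CLEAN_DAYS_REQUIRED:]
    earned_full = (
        len(recent) == CLEAN_DAYS_REQUIRED
        and all(rate < MAX_ERROR_RATE for rate in recent)
    )
    batch = FULL_BATCH if earned_full else WARMUP_BATCH
    return min(batch, backlog_remaining)
```

Under this reading, a single bad day inside the window drops the batch back to 200 until five clean days accumulate again.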

The audit log

Every action lands in a daily file: action, sender, subject, reason — plus stats and borderline reasoning. Not bureaucracy: it's what makes the system debuggable. Two of the three rule patches below came straight out of reading one day's log.
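A daily audit file could be as simple as one JSON object per line. This is a sketch under assumptions: the JSON Lines format, the date-based filename and the function name are illustrative choices, not the skill's documented layout.

```python
import json
from datetime import date
from pathlib import Path


def log_action(log_dir, action, sender, subject, reason, day=None):
    """Append one audit entry to the file for the given day and return its path."""
    day = day or date.today()
    path = Path(log_dir) / f"{day.isoformat()}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    entry = {"action": action, "sender": sender, "subject": subject, "reason": reason}
    # Append-only: earlier entries are never rewritten.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return path
```

Keeping the `reason` next to every action is what makes a false positive traceable to the rule that caused it.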

The unsubscribe pass

Protocol-level exits, never body links

Only the RFC header

Unsubscribes use the List-Unsubscribe mail header exclusively: one-click POST (RFC 8058) preferred, mailto: fallback. Links inside email bodies are never clicked — that's where the phishing lives. The one-word protocol unsubscribe is the only mail the agent ever sends.
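Choosing the exit is a pure header decision, sketched here without any network call. Under RFC 8058 a one-click unsubscribe needs an HTTPS URI in `List-Unsubscribe` plus a `List-Unsubscribe-Post: List-Unsubscribe=One-Click` header. The function name and the plain-dict header shape are assumptions.

```python
import re


def pick_unsubscribe(headers):
    """Return ("post", url), ("mailto", uri), or None when there is no clean exit."""
    raw = headers.get("List-Unsubscribe", "")
    uris = re.findall(r"<([^>]+)>", raw)  # URIs are angle-bracketed, comma-separated
    https = [u for u in uris if u.lower().startswith("https://")]
    mailto = [u for u in uris if u.lower().startswith("mailto:")]
    post_header = headers.get("List-Unsubscribe-Post", "").strip().lower()
    one_click = post_header == "list-unsubscribe=one-click"
    if https and one_click:
        return ("post", https[0])      # preferred: one-click POST
    if mailto:
        return ("mailto", mailto[0])   # fallback: protocol unsubscribe mail
    return None                        # an HTTPS link alone is treated like a body link


headers = {
    "List-Unsubscribe": "<mailto:unsub@news.example>, <https://news.example/u/abc>",
    "List-Unsubscribe-Post": "List-Unsubscribe=One-Click",
}
print(pick_unsubscribe(headers))
```

An HTTPS URI without the one-click header is deliberately not followed in this sketch, since opening it would behave like clicking a link.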

No header? Filter the future

Bulk senders without a clean exit get a mail filter that auto-archives their future sends. Created idempotently — existing filters are checked first, so three mails from the same sender in one run don't stack three duplicate filters.
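Idempotent creation is a check-then-create step. The sketch uses an in-memory list as a stand-in for the mail provider's filter store; the dict shape and function name are hypothetical, and a real version would list existing filters through the provider's API first.

```python
def ensure_archive_filter(existing_filters, sender):
    """Create an auto-archive filter for sender unless one already exists.

    Returns True if a filter was created, False if it was already there.
    """
    sender = sender.lower()
    for f in existing_filters:
        if f["from"] == sender and f["action"] == "archive":
            return False  # already covered: do not stack a duplicate
    existing_filters.append({"from": sender, "action": "archive"})
    return True


filters = []
for _ in range(3):  # three mails from the same sender in one run
    ensure_archive_filter(filters, "news@shop.example")
print(len(filters))  # one filter, not three
```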

Mixed senders get a human

Some platform addresses carry both bulk digests and real humans — community platforms, social networks, code-hosting notifications all send invites and mentions from the same noreply address that sends spam-grade digests. These live on a careful list: never auto-filtered, always surfaced as a pending decision.

The first week is training wheels

For 7 days after install, every unsubscribe surfaces for explicit approval instead of executing, which protects the three newsletters you actually love from an over-eager bulk pattern. From day 8, auto-execute begins for anything you haven't keep-listed.

Done when: bulk mail unsubscribes itself — and nothing you wanted ever disappeared

Guardrails & gotchas

Seven never-rules — and the walls we hit anyway

The agent acts on your mail every day, so the safety rules are absolute — and the gotchas below are real patches, each traceable to a logged false positive in the first days of operation.

Never delete. Archive only — every action is reversible.

Never click links in email bodies. Protocol headers only.

Never act during calibration. Read-only until explicit approval.

Never unsubscribe a real human. A weekly-refreshed work-contacts cache guards the personal cleanup — known contacts get starred, not unsubscribed.

Never touch already-starred mail. Your stars outrank its rules.

Never act on spam or trash. Every query excludes them.

Borderline → surface, don't act. Doubt goes to the briefing, not to autopilot.

01 Domain whitelists are too blunt

A whitelisted client domain carried both a key contact and a newsletter sender — the newsletter got starred for days. Fix: drop the domain rule, whitelist the human, label-and-archive the newsletter address. Sender rules now evaluate before domain rules.

02 Keywords misfire across contexts

The tax keyword faithfully starred a platform vendor's "tax and price updates" notices. Fix: a sender-level label-and-archive override that beats any keyword. Same lesson from a membership org: real correspondence stars, the same org's noreply digests get labeled away.

03 The CLI lies politely

The Gmail CLI's metadata format returned zero headers on this version — silently. Unsubscribe detection needs headers, so everything fetches raw. Cost a debugging session; now one line in the runbook.

04 Security mail must be untouchable

Identity senders and anything carrying a verification or 2FA code look exactly like bulk — noreply sender, templated subject. They're hard-coded leave-alone: a triage system that archives your login codes gets uninstalled the same day.

05 Fail soft, run anyway

Rate-limited? Process what landed, resume tomorrow. One account's auth expires? Surface it, skip that account, still run the other — a real run did exactly this. Unhandled exception? Error trace to disk plus a DM. The run never silently skips, and it doesn't pause for vacations.
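The per-account isolation can be sketched as a loop that records every outcome instead of letting one failure end the run. Assumptions: `process` is a hypothetical callable that handles one account, and `PermissionError` stands in for whatever the real client raises on expired auth. Writing the trace to disk and sending the DM are left out.

```python
def run_all(accounts, process):
    """Process each account independently; return a status per account."""
    report = {}
    for account in accounts:
        try:
            report[account] = ("ok", process(account))
        except PermissionError as exc:
            # Stand-in for expired auth: surface it, skip this account, keep going.
            report[account] = ("skipped", f"auth: {exc}")
        except Exception as exc:
            # Anything unexpected is recorded, never swallowed silently.
            report[account] = ("error", repr(exc))
    return report
```

Because every account ends up in the report with a status, the briefing can say exactly what ran and what did not.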

06 Dry-run before live, post-mortem after

A read-only harness mirroring every rule ran against the live inbox before the skill was allowed to act. Then: ship narrow, read the audit log, patch. Two dated rule versions in the first week — each one a logged false positive turned into a fix.

Distilled from a triage skill in daily production since May 2026 — ruleset, calibration files, learning log and daily audit logs. Senders, subjects and counts stay private; the patterns are the cookbook.
