MarketingOS ← Library  ·  AI-OS cookbook · 03
Team — save · content production infrastructure

The B-Roll AI System

An automated B-roll bank: Apple Photos → AI-tagged Google Drive library, organized by category × location × year, with Obsidian sidecars, a weekly shot-list, and incremental Sunday syncs. Feed this whole page to Claude Code and skip the 18 iterations + 4 production bugs we hit.

Agentic hire · Digital Asset Manager Ships with the code — 41-file production bundle ~$25–30 one-time tagging (5k files) Apple Photos · SQLite · Claude vision · Drive · Obsidian Runs itself every Sunday via launchd
What this builds

From buried photos to an editor-ready asset bank

Before

Thousands of photos and clips buried in Apple Photos. The social team re-asks for the same shots, duplicates get re-uploaded, and nobody knows what B-roll exists.

After

Every file AI-tagged and auto-routed to Photos/{Cat}/{Cat – Specifier YYYY}/ and Videos/{Cat}/… with a descriptive filename, Obsidian sidecars for personal search, a weekly SHOT-LIST of gaps — refreshed every Sunday without you.

→ Hand this to Claude Code with this prompt

"Read this document as my full briefing. Build the B-Roll AI System for me end-to-end. Do not deviate from the architecture or naming conventions. Replicate every script. After all scripts are in place and tested with smoke runs, ask me to:

  1. Run rclone config and create a gdrive remote
  2. Add my Anthropic API key to ~/.openclaw/agents/main/agent/auth-profiles.json under profiles."anthropic:default".key
  3. Open System Settings and grant Full Disk Access to the .app bundles we'll create

After my confirmation, do a full archive ingest from my Apple Photos library, then wire up the weekly launchd automation."

Two ways to build this. Path A — turnkey: download the bundle below; your Claude personalizes the real production code and you're running in one sitting. Path B — from the briefing: this page alone is a complete architecture brief; your Claude writes the ~35 scripts itself — a multi-session build with judgment calls (vision-prompt wording, dedup thresholds, scoring heuristics), equivalent but not identical. Personalization is fully covered on both paths: your name, your reference images, your people, your Drive folder.
Inside the bundle

The real production code ships with this cookbook

Not pseudocode, not a tutorial repo — the same ~8,600 lines running this system in production every Sunday, with every personal value lifted out as a __PLACEHOLDER__.

37Production scripts

The full pipeline: ingest, dedup, NSFW filter, Claude-vision tagging, structure, naming, 8 backfills, Obsidian sidecars, shot-list analytics, launchd ops — with the 18 iterations + 4 bug fixes already baked in.

+schema.sql & docs

The exact SQLite catalog (every column), the README, and the rclone OAuth setup guide.

SETUP-FOR-CLAUDE.md

The installer brief: your Claude asks for your name, Drive folder, vault path and people — then personalizes by find-and-replace instead of rewriting. Works for any owner, not just the author.

Sanitized & safe

Two scrub passes: no API keys, no folder IDs, no names, no personal catalogs. Your data never touches the author's; the author's never touches yours.

Get it — one download, three steps ↓ (at the end of this page)

Architecture — read first

SQLite is the truth, everything else is a surface

┌─────────────────┐          ┌──────────────────┐          ┌──────────────────┐
│  Apple Photos   │ extract  │  Local SQLite    │  upload  │   Google Drive   │
│  (Mac library)  │─────────▶│  media_index     │─────────▶│   B-Roll/        │
└─────────────────┘ osxphotos└──────────────────┘ rclone + │   ├ Photos/{C}/  │
                                      │           Drive API│   └ Videos/{C}/  │
                                      │ tag                └──────────────────┘
                                      ▼
                             ┌──────────────────┐          ┌──────────────────┐
                             │  Claude Haiku    │          │  Obsidian Vault  │
                             │  vision API      │          │  _media/photos/  │
                             └──────────────────┘          └──────────────────┘
                                                                    ▲
                                                          write     │
                                                          sidecars  │
                                   weekly_shot_briefing.py ─────────┘
Key principle: SQLite (media_index.sqlite) is the source of truth. Drive is the editor-facing surface. Obsidian is the personal-search surface. Apple Photos is the upstream input.
Prerequisites

Mac, Drive & API setup

01Mac requirements

  • macOS 13+ (Ventura) for tccutil + Privacy & Security API
  • Apple Photos library with your face confirmed in persons
  • Homebrew at /opt/homebrew
  • brew install python@3.11 ffmpeg

/opt/homebrew/bin/python3.11 -m pip install --user --break-system-packages osxphotos anthropic pillow_heif reverse_geocoder Pillow requests

02Drive / API setup

  • brew install rclone
  • Personal Google Drive OAuth client — NOT rclone's shared default (it gets rate-limited mid-run). Create at console.cloud.google.com → APIs & Services → Credentials → OAuth client ID → Desktop app; configure the rclone remote gdrive with this client_id + secret
  • Anthropic API key at ~/.openclaw/agents/main/agent/auth-profiles.jsonprofiles."anthropic:default".key

03Drive folder structure

  • One top-level B-Roll/ folder in your Drive
  • Note its ID from the URL: drive.google.com/drive/folders/<ID>
  • Hardcode this ID in every script as BROLL_ROOT_ID

04Reference images

Create references/ with 5 representative images per known person (filename = {Person Name}_{1..5}.jpg). These are cached in the Claude API as a system prefix — ~22K tokens cached per call.

Project layout

Every script, one line each

Create at ~/Documents/Claude Code/media-pipeline/ (any path; adjust). Every file below ships in the bundle — this is the box contents, not a shopping list.

media-pipeline/
├── schema.sql                       # SQLite schema (files, file_tags, folders, dedup_pairs, sync_runs)
├── drive_api.py                     # Drive REST API wrapper (uses rclone token)
│
│  # ── ingest & references ──
├── ingest_walk.py                   # Phase 1: walk an EXISTING Drive library → one row per file
├── ingest_rclone.py                 # Phase 1 variant: index existing Drive via rclone listing
├── extract_archive.py               # osxphotos export from Apple Photos
├── extract_references.py            # Build references/ — 5 images per known person for the vision prompt
├── dump_apple_metadata.py           # Apple's persons/face/date/GPS dump
├── ingest_drone.py                  # DJI Mini/Neo 2 from external SSD (SRT GPS extraction)
│
│  # ── dedup & filters ──
├── dedup_plan.py                    # Plan duplicates: md5 + perceptual hash (staging vs Drive) → dedup_plan.json
├── dedup_apply.py                   # Apply the reviewed dedup plan
├── cleanup_doubles.py               # Remove double-uploads already in Drive# NSFW filter: NudeNet pass → is_nsfw column (exclude from upload)
│
│  # ── tagging ──
├── tagger.py                        # Photo tagging (Claude vision + cached reference prefix)
├── tag_videos.py                    # Video tagging (3-frame ffmpeg → Claude vision)
├── tag_owner.py                      # Owner-visibility pass → owner_visibility column (face_clear … not_visible)
├── retag_drive_files.py             # Utility: re-run tagging over files already in Drive
├── process_archive.py               # Tag + upload + DB insert for archive files
│
│  # ── structure & naming ──
├── derive_structure.py              # Cluster by category + leaf splitter → proposed_structure.md
├── apply_structure.py               # Idempotent Drive moves (images)
├── move_videos_to_videos_root.py    # Videos to separate Videos/ root
├── phase2_videos_top_level.py       # Legacy phase-2 video move (superseded)
├── rename_descriptive.py            # YYYY_MM_aspect_Owner_with_cat_loc_stem.ext convention
├── rename_owner.py                   # Legacy OWNER_ prefix (deprecated)
├── fill_empty_folders.py            # Search Misc folders for matches to empty leaves
├── redistribute_misc_by_location.py # Route Misc files by filename location keywords
│
│  # ── backfills ──
├── backfill_dimensions.py           # Drive EXIF → width/height
├── backfill_exif.py                 # Drive EXIF → taken_time (images)
├── backfill_dates.py                # Multi-source date orchestrator (priority order below)
├── backfill_dates_from_apple.py     # Apple cross-check (4-layer disambiguation)
├── backfill_dates_via_ffprobe.py    # Download + ffprobe for videos with no other date source
├── backfill_face_count.py           # Apple's face_info count (drives Owner_alone vs Owner_with)
├── backfill_gps.py                  # GPS columns (lat/lon/country/admin1/city) — the location naming token needs this
├── set_date.py                      # Manual one-off date correction
│
│  # ── outputs & ops ──
├── sync_new_from_apple.py           # Incremental: Apple Photos → Drive pipeline (the Sunday job)
├── write_obsidian.py                # Per-photo sidecars + hub pages
├── gap_analysis.py                  # SHOT-LIST.md heat map + person × activity + format inventory
├── weekly_shot_briefing.py          # Monday top-5-gaps note in _inbox/
├── swap_rclone_oauth.py             # Swap rclone to your personal OAuth client (prereq 02)
├── system_health_check.py           # Pipeline health check
└── media_index.sqlite               # The SQLite catalog
Critical bugs — these cost iterations 7–15

Four walls we hit, so you don't

BUG 01

SQL ordering for primary category

Without ORDER BY rowid the primary category is non-deterministic — files end up in the wrong folder. Must be fixed in 3 places: derive_structure.fetch, rename_descriptive.build_tag_lookups, move_videos_to_videos_root.fetch_videos.

✗ Wrong
conn.execute("SELECT drive_id, tag_kind,
  tag_value FROM file_tags WHERE tag_kind
  IN ('broll_category','person')")
✓ Right
conn.execute("SELECT drive_id, tag_kind,
  tag_value FROM file_tags WHERE tag_kind
  IN ('broll_category','person')
  ORDER BY rowid")
BUG 02

Date fallback in rename

Never fall back to modified_time when taken_time is NULL — that's Drive's upload timestamp, not the photo's date. It produces wrong YYYY_MM filenames silently.

✗ Wrong
tt = row["taken_time"]
  or row["modified_time"] or ""
✓ Right — 0000_00 sorts to top for review
tt = row["taken_time"] or ""
if len(tt) >= 7 and tt[:4].isdigit()
    and tt[5:7].isdigit():
  date = f"{tt[:4]}_{tt[5:7]}"
else:
  date = "0000_00"
BUG 03

COALESCE blocks the --all-rows override

When updating dates from Apple Photos with --all-rows, do not COALESCE — it protects existing (wrong) values, so wrong dates never get fixed.

✗ Wrong
conn.execute("UPDATE files SET taken_time
  = COALESCE(taken_time, ?)
  WHERE drive_id = ?", (taken, did))
✓ Right
if args.all_rows:
  conn.execute("UPDATE files SET
    taken_time = ? WHERE drive_id = ?",
    (taken, did))
else:
  conn.execute("UPDATE files SET taken_time
    = COALESCE(taken_time, ?)
    WHERE drive_id = ?", (taken, did))
BUG 04

Same-category-only fill misses cross-category files

The fill for "Epic Moments – Corvette" must search all {Cat} – Misc folders, not just Epic Moments – Misc: the Corvette video is tagged primary=Cars, so it lives in Cars – Misc. Cross-category search via filename keywords is required.

The naming convention — final

One filename tells the whole story

{YYYY}_{MM}_{aspect}_{Owner_alone|Owner_with}_{category}_{location}_{originalstem}.{ext}
TokenRuleExample
YYYY_MMfiles.taken_time first 7 chars; 0000_00 if NULL2024_03_
aspectOne of 16x9 · 9x16 · 3x2 · 2x3 · 4x3 · 3x4 · 1x1; omit if unknown9x16_
Owner tokenOwner_alone (face_count ≤ 1), Owner_with (>1); omit if the owner isn't visibleOwner_with_
categoryLowercased first broll_category from AI tag (ORDER BY rowid!)talking_to_cam_bts_
locationgps_city → gps_admin1 → CC-name → parsed leaf folder → omitberkeley_
originalstemSlugified original stem (preserved as suffix for uniqueness)img_4223
# Examples
2026_03_9x16_talking_to_cam_bts_img_4171.mov          # vertical Shorts content
2020_09_3x2_Owner_with_wedding_eg0i4473.jpg            # DSLR wedding shot
2016_06_9x16_Owner_alone_sitting_standing_berkeley_img_4223.jpg

Folder structure & splitters

B-Roll/
├── Photos/
│   ├── Awards/
│   │   └── Awards - 2020/, Awards - 2025/, Awards - Misc/
│   ├── Boating/
│   │   ├── Boating - Bavaria 2024/
│   │   └── Boating - Misc/
│   ├── Epic Moments/
│   │   ├── Epic Moments - USA, LA - Corvette with Dan/
│   │   └── Epic Moments - Misc/
│   ├── Events/, Friends/, Hiking/, …        # 30 categories
│   └── …
└── Videos/
    ├── Drone/
    │   ├── Drone - Bernried/                 # filled from DJI SRT GPS
    │   ├── Drone - Bavaria 2026/
    │   └── Drone - Misc/
    ├── Talking To Cam BTS/, Boating/, Hiking/, …
    └── …
Splitter strategyCategories (in derive_structure.SPLITTER)
trip (location + year)Travelling Plane · Hiking · Boating · Driving · Drone · Scenery · Nature · Venues · Eating
yearEvents · Awards · Epic Moments · Wedding · Talking To Cam BTS · Working · Team · Having Fun · Still Shots · Sitting Standing · Baby Owner · Moving
person_yearParenting Family · Friends
Under the hood — what your Claude runs

Eight phases, in run order

This is the reference walkthrough — the exact commands the setup brief makes your Claude run, phase by phase. You don't type these yourself. Ready to start? Jump to Download & install ↓

1

Schema bootstrap

sqlite3 ~/Documents/Claude\ Code/media-pipeline/media_index.sqlite < schema.sql
sqlite3 media_index.sqlite "PRAGMA journal_mode=WAL"   # concurrent reads during long runs
2

Reference images for Claude vision

references/ with 5 images per known person — cached as a system prefix (~22K tokens cached per call).

3

First archive ingest

# Export owner-tagged from Apple Photos (run from Terminal with FDA)
/opt/homebrew/bin/python3.11 extract_archive.py --plan      # preview
/opt/homebrew/bin/python3.11 extract_archive.py --export

# Dump Apple's metadata (persons, face quality, date, GPS)
/opt/homebrew/bin/python3.11 dump_apple_metadata.py

# Tag + upload to Drive
python3 process_archive.py --smoke     # 10 files first
python3 process_archive.py --all       # ~$10–15, ~5–10h depending on count
3b

Dedup, NSFW filter & owner-visibility — before anything ships to Drive

Dedup staging vs Drive with md5 + perceptual hash: dedup_plan.py → review dedup_plan.jsondedup_apply.py; cleanup_doubles.py removes double-uploads already in Drive (the dedup_pairs table + phash/phash_mid/phash_end columns exist for exactly this). NSFW filter via NudeNet → sets is_nsfw, excluded from upload. Owner-visibility pass (tag_owner.py) → owner_visibility (face_clear … not_visible) — this drives the name token and the weekly "owner-visible only" sync filter.

If you already have a messy Drive library: run phase-1 indexing first (ingest_walk.py / ingest_rclone.py) so dedup can compare against it. Starting fresh from Apple Photos only → skip.

4

Derive + apply structure

Event cap: diversity-sample oversized events before structuring — e.g. the Sep-2020 wedding is capped at the 100 best/most-varied items instead of flooding a folder.

python3 derive_structure.py --min-leaf 5 --out /tmp/proposed.md
python3 apply_structure.py --all                  # photos only
python3 move_videos_to_videos_root.py --all       # videos to Videos/
5

Backfills — run in this order

python3 backfill_dimensions.py --all
python3 backfill_exif.py --all
/opt/homebrew/bin/python3.11 backfill_dates_from_apple.py --apply --all-rows
python3 backfill_dates_via_ffprobe.py --all       # videos with no other date source
/opt/homebrew/bin/python3.11 backfill_face_count.py --apply
python3 backfill_gps.py                            # GPS → country/admin1/city — REQUIRED before rename (location token)
# backfill_dates.py = the multi-source orchestrator (priority order in "Date intelligence")
6

Final rename + cleanup

python3 rename_descriptive.py --all
python3 fill_empty_folders.py --apply --min-score 7    # then again with --min-score 5
python3 redistribute_misc_by_location.py --apply
7

Obsidian sidecars

# Nuke year folders + hub subfolders first (sidecars are name-tied)
rm -rf "$VAULT/_media/photos/[0-9]*/"
rm -rf "$VAULT/_media/photos/_hubs/by-"*
python3 write_obsidian.py --apply
8

SHOT-LIST + weekly briefing

python3 gap_analysis.py --apply
python3 weekly_shot_briefing.py --write
Weekly automation — launchd

Two app bundles, zero manual runs

Both follow the TCC pattern: Homebrew python, not /usr/bin/python3 — Apple's system binaries don't inherit TCC attribution from launchd-spawned bundles; Homebrew's python inherits cleanly.

A~/Applications/AppleSync.app — Sunday 10:00

Pulls new owner-tagged + favorited (owner-visible only) from Apple Photos, full pipeline through to sidecars.

B~/Applications/ShotListBriefing.app — Monday 09:00

Refreshes _inbox/shot-list-week-NN.md with the top-5 personal gaps from the heat map.

<!-- Plist template (adjust path to runner) -->
<plist version="1.0"><dict>
    <key>Label</key><string>com.YOURNAME.apple-sync</string>
    <key>ProgramArguments</key><array>
        <string>/Users/YOURNAME/Applications/AppleSync.app/Contents/MacOS/runner</string>
    </array>
    <key>StartCalendarInterval</key><dict>
        <key>Weekday</key><integer>0</integer>    <!-- 0=Sun, 1=Mon, … -->
        <key>Hour</key><integer>10</integer>
        <key>Minute</key><integer>0</integer>
    </dict>
    <key>RunAtLoad</key><false/>
</dict></plist>
# Bundle runner template
#!/bin/zsh
set -e
SCRIPT="/Users/YOURNAME/path/to/media-pipeline/sync_new_from_apple.py"
LOG="$HOME/.local/share/apple-sync/sync-$(date +%Y-W%V).log"
mkdir -p "$(dirname "$LOG")"
{
    echo "=== $(date -u +%Y-%m-%dT%H:%M:%SZ) ==="
    /opt/homebrew/bin/python3.11 "$SCRIPT" --run --download-missing
} >> "$LOG" 2>&1
# After creating the bundle:
chmod +x runner
codesign -s - --deep --force /path/to/Bundle.app
launchctl load -w ~/Library/LaunchAgents/com.YOURNAME.apple-sync.plist

Permissions: grant Full Disk Access to the .app bundle — System Settings → Privacy → Full Disk Access → add via ⌘⇧G → ~/Applications. First manual run from Terminal triggers the macOS Automation prompts ("Terminal wants to control Photos.app" → Allow) — persistent after grant.

SQLite schema — essentials

The catalog that drives everything

CREATE TABLE files (
    drive_id TEXT PRIMARY KEY,
    parent_id TEXT, parent_path TEXT,
    name TEXT, mime TEXT, extension TEXT,
    size_bytes INTEGER, md5 TEXT, sha256 TEXT,
    created_time TEXT, modified_time TEXT, taken_time TEXT,
    width INTEGER, height INTEGER, duration_sec REAL,
    aspect_ratio TEXT, orientation TEXT,
    phash TEXT, phash_mid TEXT, phash_end TEXT,
    source TEXT,
    drive_url TEXT,
    staging_path TEXT,
    ai_tagged_at TEXT,
    scene_description TEXT, is_nsfw INTEGER, crop_candidate_9_16 INTEGER,
    location_guess TEXT,
    owner_visibility TEXT,   -- face_clear | face_partial | profile_or_back | body_only | not_visible
    owner_checked_at TEXT,
    detected_face_count INTEGER,  -- from Apple's face_info (drives Owner_alone vs Owner_with)
    latitude REAL, longitude REAL,
    gps_country TEXT, gps_admin1 TEXT, gps_city TEXT,
    moved_to_parent_id TEXT, moved_at TEXT,   -- idempotency for apply_structure
    renamed_with_owner_prefix INTEGER DEFAULT 0,
    renamed_descriptive_at TEXT,
    original_name TEXT,     -- backup of pre-rename name
    deleted_at TEXT
);

CREATE TABLE file_tags (
    drive_id TEXT,
    tag_kind TEXT,    -- broll_category | person | mood | shot_type
    tag_value TEXT,
    confidence REAL,
    PRIMARY KEY (drive_id, tag_kind, tag_value)
);

BROLL_CATEGORIES enum — final

No "Still Shots" as a tag — it's a meta-category that fails to describe content. The enum ships owner-neutral — extend it with your own people and categories.

BROLL_CATEGORIES = [
    "Awards", "Baby Owner", "Boating", "Cars", "Driving", "Drone", "Eating",
    "Epic Moments", "Events", "Friends", "Golfcart", "Having Fun", "Hiking",
    "Housework", "Moving", "Nature", "Parenting Family", "Preparing", "Props",
    "Reading", "Reflecting", "Scenery", "Sitting Standing", "Sports Yoga Meditation",
    "Talking To Cam BTS", "Team", "Travelling Plane", "Venues",
    "Wedding", "Wellness", "Working",
]
Date intelligence

The 4-layer date matcher (+ priority order)

iPhone IMG counters wrap every ~10K shots — IMG_4223 is potentially dozens of different photos. Layered disambiguation when matching DB stems to Apple Photos:

LayerSignal
1Pixel dimensions match (width × height) — strongest single signal
2Location cross-check — filename location keyword → reverse-geocoded Apple GPS. The key insight: works even for downsized exports with no dim/size match
3Date proximity (Drive upload time vs Apple capture year)
4Byte size (2% tolerance)
5 (fallback)Same-month grouping

Plus a modality filter: a .mov in the DB must never match an Apple .HEIC — different physical files, same IMG counter.

PriorityDate source (each fills NULLs; only #3 with --all-rows overrides)
1DJI filename parser — dji_YYYYMMDDhhmmss_NNNN_d.mp4
2Drive EXIF — imageMediaMetadata.time
3Apple Photos cross-check — the 4-layer matcher above
4ffprobe creation_time — last resort for videos
5Manual override — set_date.py
Costs & customization

Budget & make it yours

ItemCost
Claude Haiku 4.5 tagging (with prompt caching)~$0.003–0.005 per file
5,000-file library, initial tagging~$25–30 total
Weekly incremental~$0.50–2 depending on new files
Drive APIfree tier sufficient
FileWhat to customize
tagger.pyBROLL_CATEGORIES, MOOD_OPTIONS, SHOT_TYPES enums
tagger.py referencesReplace the author's reference images with your own
extract_archive.pyOWNER_NAME constant
rename_descriptive.pyowner_token logic (replace with the user's name)
derive_structure.pySPLITTER dict — splitter strategy per category
gap_analysis.pypersons_focus list — family member names
weekly_shot_briefing.pyPRIORITY_CATEGORIES list
All scriptsBROLL_ROOT_ID constant — your Drive folder ID
API key location~/.openclaw/agents/main/agent/auth-profiles.json is this setup's path — use ANTHROPIC_API_KEY as an env var instead if you don't run OpenClaw
Confirm it worked

You're done when you can show this

The post-install confirmation — after the build finishes, your Claude reports these numbers back to you.

Files ingested: count
Files moved to year-correct folders: count
Sidecars generated: count
Total Claude API cost spent
Launchd jobs registered: launchctl list | grep YOURNAME
URL to the SHOT-LIST in the Obsidian vault
Skip the iterations. Build it right the first time using this doc. Original briefing lives as a Google Doc ("Fotos B-Roll Google Drive AI System") — this page is the canonical, completed version.
Download & install

One download, three steps

1

Download the bundle41 files — the sanitized production code + setup brief. The button below.

2

Unzip & open in Claude CodeOpen the unzipped folder in Claude Code (or Cowork).

3

Say the magic sentence"Read SETUP-FOR-CLAUDE.md and install." — it asks for your name, Drive folder & people, personalizes everything, then builds.

Download the bundle Free · 41 files · ~130 KB · owner-neutral