An automated B-roll bank: Apple Photos → AI-tagged Google Drive library, organized by category × location × year, with Obsidian sidecars, a weekly shot-list, and incremental Sunday syncs. Feed this whole page to Claude Code and skip the 18 iterations + 4 production bugs we hit.
Thousands of photos and clips buried in Apple Photos. The social team re-asks for the same shots, duplicates get re-uploaded, and nobody knows what B-roll exists.
Every file AI-tagged and auto-routed to Photos/{Cat}/{Cat – Specifier YYYY}/ and Videos/{Cat}/… with a descriptive filename, Obsidian sidecars for personal search, a weekly SHOT-LIST of gaps — refreshed every Sunday without you.
"Read this document as my full briefing. Build the B-Roll AI System for me end-to-end. Do not deviate from the architecture or naming conventions. Replicate every script. After all scripts are in place and tested with smoke runs, ask me to:
rclone config and create a gdrive remote~/.openclaw/agents/main/agent/auth-profiles.json under profiles."anthropic:default".keyAfter my confirmation, do a full archive ingest from my Apple Photos library, then wire up the weekly launchd automation."
Not pseudocode, not a tutorial repo — the same ~8,600 lines running this system in production every Sunday, with every personal value lifted out as a __PLACEHOLDER__.
The full pipeline: ingest, dedup, NSFW filter, Claude-vision tagging, structure, naming, 8 backfills, Obsidian sidecars, shot-list analytics, launchd ops — with the 18 iterations + 4 bug fixes already baked in.
The exact SQLite catalog (every column), the README, and the rclone OAuth setup guide.
The installer brief: your Claude asks for your name, Drive folder, vault path and people — then personalizes by find-and-replace instead of rewriting. Works for any owner, not just the author.
Two scrub passes: no API keys, no folder IDs, no names, no personal catalogs. Your data never touches the author's; the author's never touches yours.
Get it — one download, three steps ↓ (at the end of this page)
┌─────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │ Apple Photos │ extract │ Local SQLite │ upload │ Google Drive │ │ (Mac library) │─────────▶│ media_index │─────────▶│ B-Roll/ │ └─────────────────┘ osxphotos└──────────────────┘ rclone + │ ├ Photos/{C}/ │ │ Drive API│ └ Videos/{C}/ │ │ tag └──────────────────┘ ▼ ┌──────────────────┐ ┌──────────────────┐ │ Claude Haiku │ │ Obsidian Vault │ │ vision API │ │ _media/photos/ │ └──────────────────┘ └──────────────────┘ ▲ write │ sidecars │ weekly_shot_briefing.py ─────────┘
media_index.sqlite) is the source of truth. Drive is the editor-facing surface. Obsidian is the personal-search surface. Apple Photos is the upstream input.tccutil + Privacy & Security API/opt/homebrewbrew install python@3.11 ffmpeg/opt/homebrew/bin/python3.11 -m pip install --user --break-system-packages osxphotos anthropic pillow_heif reverse_geocoder Pillow requests
brew install rclonegdrive with this client_id + secret~/.openclaw/agents/main/agent/auth-profiles.json → profiles."anthropic:default".keyB-Roll/ folder in your Drivedrive.google.com/drive/folders/<ID>BROLL_ROOT_IDCreate references/ with 5 representative images per known person (filename = {Person Name}_{1..5}.jpg). These are cached in the Claude API as a system prefix — ~22K tokens cached per call.
Create at ~/Documents/Claude Code/media-pipeline/ (any path; adjust). Every file below ships in the bundle — this is the box contents, not a shopping list.
media-pipeline/ ├── schema.sql # SQLite schema (files, file_tags, folders, dedup_pairs, sync_runs) ├── drive_api.py # Drive REST API wrapper (uses rclone token) │ │ # ── ingest & references ── ├── ingest_walk.py # Phase 1: walk an EXISTING Drive library → one row per file ├── ingest_rclone.py # Phase 1 variant: index existing Drive via rclone listing ├── extract_archive.py # osxphotos export from Apple Photos ├── extract_references.py # Build references/ — 5 images per known person for the vision prompt ├── dump_apple_metadata.py # Apple's persons/face/date/GPS dump ├── ingest_drone.py # DJI Mini/Neo 2 from external SSD (SRT GPS extraction) │ │ # ── dedup & filters ── ├── dedup_plan.py # Plan duplicates: md5 + perceptual hash (staging vs Drive) → dedup_plan.json ├── dedup_apply.py # Apply the reviewed dedup plan ├── cleanup_doubles.py # Remove double-uploads already in Drive │ # NSFW filter: NudeNet pass → is_nsfw column (exclude from upload) │ │ # ── tagging ── ├── tagger.py # Photo tagging (Claude vision + cached reference prefix) ├── tag_videos.py # Video tagging (3-frame ffmpeg → Claude vision) ├── tag_owner.py # Owner-visibility pass → owner_visibility column (face_clear … not_visible) ├── retag_drive_files.py # Utility: re-run tagging over files already in Drive ├── process_archive.py # Tag + upload + DB insert for archive files │ │ # ── structure & naming ── ├── derive_structure.py # Cluster by category + leaf splitter → proposed_structure.md ├── apply_structure.py # Idempotent Drive moves (images) ├── move_videos_to_videos_root.py # Videos to separate Videos/ root ├── phase2_videos_top_level.py # Legacy phase-2 video move (superseded) ├── rename_descriptive.py # YYYY_MM_aspect_Owner_with_cat_loc_stem.ext convention ├── rename_owner.py # Legacy OWNER_ prefix (deprecated) ├── fill_empty_folders.py # Search Misc folders for matches to empty leaves ├── redistribute_misc_by_location.py # Route Misc files by filename location keywords │ │ # ── backfills ── ├── backfill_dimensions.py # Drive EXIF → width/height ├── backfill_exif.py # Drive EXIF → taken_time (images) ├── backfill_dates.py # Multi-source date orchestrator (priority order below) ├── backfill_dates_from_apple.py # Apple cross-check (4-layer disambiguation) ├── backfill_dates_via_ffprobe.py # Download + ffprobe for videos with no other date source ├── backfill_face_count.py # Apple's face_info count (drives Owner_alone vs Owner_with) ├── backfill_gps.py # GPS columns (lat/lon/country/admin1/city) — the location naming token needs this ├── set_date.py # Manual one-off date correction │ │ # ── outputs & ops ── ├── sync_new_from_apple.py # Incremental: Apple Photos → Drive pipeline (the Sunday job) ├── write_obsidian.py # Per-photo sidecars + hub pages ├── gap_analysis.py # SHOT-LIST.md heat map + person × activity + format inventory ├── weekly_shot_briefing.py # Monday top-5-gaps note in _inbox/ ├── swap_rclone_oauth.py # Swap rclone to your personal OAuth client (prereq 02) ├── system_health_check.py # Pipeline health check └── media_index.sqlite # The SQLite catalog
Without ORDER BY rowid the primary category is non-deterministic — files end up in the wrong folder. Must be fixed in 3 places: derive_structure.fetch, rename_descriptive.build_tag_lookups, move_videos_to_videos_root.fetch_videos.
conn.execute("SELECT drive_id, tag_kind,
tag_value FROM file_tags WHERE tag_kind
IN ('broll_category','person')")conn.execute("SELECT drive_id, tag_kind,
tag_value FROM file_tags WHERE tag_kind
IN ('broll_category','person')
ORDER BY rowid")Never fall back to modified_time when taken_time is NULL — that's Drive's upload timestamp, not the photo's date. It produces wrong YYYY_MM filenames silently.
tt = row["taken_time"] or row["modified_time"] or ""
tt = row["taken_time"] or ""
if len(tt) >= 7 and tt[:4].isdigit()
and tt[5:7].isdigit():
date = f"{tt[:4]}_{tt[5:7]}"
else:
date = "0000_00"When updating dates from Apple Photos with --all-rows, do not COALESCE — it protects existing (wrong) values, so wrong dates never get fixed.
conn.execute("UPDATE files SET taken_time
= COALESCE(taken_time, ?)
WHERE drive_id = ?", (taken, did))if args.all_rows:
conn.execute("UPDATE files SET
taken_time = ? WHERE drive_id = ?",
(taken, did))
else:
conn.execute("UPDATE files SET taken_time
= COALESCE(taken_time, ?)
WHERE drive_id = ?", (taken, did))The fill for "Epic Moments – Corvette" must search all {Cat} – Misc folders, not just Epic Moments – Misc: the Corvette video is tagged primary=Cars, so it lives in Cars – Misc. Cross-category search via filename keywords is required.
{YYYY}_{MM}_{aspect}_{Owner_alone|Owner_with}_{category}_{location}_{originalstem}.{ext}| Token | Rule | Example |
|---|---|---|
YYYY_MM | files.taken_time first 7 chars; 0000_00 if NULL | 2024_03_ |
| aspect | One of 16x9 · 9x16 · 3x2 · 2x3 · 4x3 · 3x4 · 1x1; omit if unknown | 9x16_ |
| Owner token | Owner_alone (face_count ≤ 1), Owner_with (>1); omit if the owner isn't visible | Owner_with_ |
| category | Lowercased first broll_category from AI tag (ORDER BY rowid!) | talking_to_cam_bts_ |
| location | gps_city → gps_admin1 → CC-name → parsed leaf folder → omit | berkeley_ |
| originalstem | Slugified original stem (preserved as suffix for uniqueness) | img_4223 |
# Examples 2026_03_9x16_talking_to_cam_bts_img_4171.mov # vertical Shorts content 2020_09_3x2_Owner_with_wedding_eg0i4473.jpg # DSLR wedding shot 2016_06_9x16_Owner_alone_sitting_standing_berkeley_img_4223.jpg
B-Roll/ ├── Photos/ │ ├── Awards/ │ │ └── Awards - 2020/, Awards - 2025/, Awards - Misc/ │ ├── Boating/ │ │ ├── Boating - Bavaria 2024/ │ │ └── Boating - Misc/ │ ├── Epic Moments/ │ │ ├── Epic Moments - USA, LA - Corvette with Dan/ │ │ └── Epic Moments - Misc/ │ ├── Events/, Friends/, Hiking/, … # 30 categories │ └── … └── Videos/ ├── Drone/ │ ├── Drone - Bernried/ # filled from DJI SRT GPS │ ├── Drone - Bavaria 2026/ │ └── Drone - Misc/ ├── Talking To Cam BTS/, Boating/, Hiking/, … └── …
| Splitter strategy | Categories (in derive_structure.SPLITTER) |
|---|---|
| trip (location + year) | Travelling Plane · Hiking · Boating · Driving · Drone · Scenery · Nature · Venues · Eating |
| year | Events · Awards · Epic Moments · Wedding · Talking To Cam BTS · Working · Team · Having Fun · Still Shots · Sitting Standing · Baby Owner · Moving |
| person_year | Parenting Family · Friends |
This is the reference walkthrough — the exact commands the setup brief makes your Claude run, phase by phase. You don't type these yourself. Ready to start? Jump to Download & install ↓
sqlite3 ~/Documents/Claude\ Code/media-pipeline/media_index.sqlite < schema.sql
sqlite3 media_index.sqlite "PRAGMA journal_mode=WAL" # concurrent reads during long runsreferences/ with 5 images per known person — cached as a system prefix (~22K tokens cached per call).
# Export owner-tagged from Apple Photos (run from Terminal with FDA) /opt/homebrew/bin/python3.11 extract_archive.py --plan # preview /opt/homebrew/bin/python3.11 extract_archive.py --export # Dump Apple's metadata (persons, face quality, date, GPS) /opt/homebrew/bin/python3.11 dump_apple_metadata.py # Tag + upload to Drive python3 process_archive.py --smoke # 10 files first python3 process_archive.py --all # ~$10–15, ~5–10h depending on count
Dedup staging vs Drive with md5 + perceptual hash: dedup_plan.py → review dedup_plan.json → dedup_apply.py; cleanup_doubles.py removes double-uploads already in Drive (the dedup_pairs table + phash/phash_mid/phash_end columns exist for exactly this). NSFW filter via NudeNet → sets is_nsfw, excluded from upload. Owner-visibility pass (tag_owner.py) → owner_visibility (face_clear … not_visible) — this drives the name token and the weekly "owner-visible only" sync filter.
If you already have a messy Drive library: run phase-1 indexing first (ingest_walk.py / ingest_rclone.py) so dedup can compare against it. Starting fresh from Apple Photos only → skip.
Event cap: diversity-sample oversized events before structuring — e.g. the Sep-2020 wedding is capped at the 100 best/most-varied items instead of flooding a folder.
python3 derive_structure.py --min-leaf 5 --out /tmp/proposed.md python3 apply_structure.py --all # photos only python3 move_videos_to_videos_root.py --all # videos to Videos/
python3 backfill_dimensions.py --all python3 backfill_exif.py --all /opt/homebrew/bin/python3.11 backfill_dates_from_apple.py --apply --all-rows python3 backfill_dates_via_ffprobe.py --all # videos with no other date source /opt/homebrew/bin/python3.11 backfill_face_count.py --apply python3 backfill_gps.py # GPS → country/admin1/city — REQUIRED before rename (location token) # backfill_dates.py = the multi-source orchestrator (priority order in "Date intelligence")
python3 rename_descriptive.py --all
python3 fill_empty_folders.py --apply --min-score 7 # then again with --min-score 5
python3 redistribute_misc_by_location.py --apply# Nuke year folders + hub subfolders first (sidecars are name-tied)
rm -rf "$VAULT/_media/photos/[0-9]*/"
rm -rf "$VAULT/_media/photos/_hubs/by-"*
python3 write_obsidian.py --applypython3 gap_analysis.py --apply python3 weekly_shot_briefing.py --write
Both follow the TCC pattern: Homebrew python, not /usr/bin/python3 — Apple's system binaries don't inherit TCC attribution from launchd-spawned bundles; Homebrew's python inherits cleanly.
Pulls new owner-tagged + favorited (owner-visible only) from Apple Photos, full pipeline through to sidecars.
Refreshes _inbox/shot-list-week-NN.md with the top-5 personal gaps from the heat map.
<!-- Plist template (adjust path to runner) --> <plist version="1.0"><dict> <key>Label</key><string>com.YOURNAME.apple-sync</string> <key>ProgramArguments</key><array> <string>/Users/YOURNAME/Applications/AppleSync.app/Contents/MacOS/runner</string> </array> <key>StartCalendarInterval</key><dict> <key>Weekday</key><integer>0</integer> <!-- 0=Sun, 1=Mon, … --> <key>Hour</key><integer>10</integer> <key>Minute</key><integer>0</integer> </dict> <key>RunAtLoad</key><false/> </dict></plist>
# Bundle runner template
#!/bin/zsh
set -e
SCRIPT="/Users/YOURNAME/path/to/media-pipeline/sync_new_from_apple.py"
LOG="$HOME/.local/share/apple-sync/sync-$(date +%Y-W%V).log"
mkdir -p "$(dirname "$LOG")"
{
echo "=== $(date -u +%Y-%m-%dT%H:%M:%SZ) ==="
/opt/homebrew/bin/python3.11 "$SCRIPT" --run --download-missing
} >> "$LOG" 2>&1# After creating the bundle:
chmod +x runner
codesign -s - --deep --force /path/to/Bundle.app
launchctl load -w ~/Library/LaunchAgents/com.YOURNAME.apple-sync.plistPermissions: grant Full Disk Access to the .app bundle — System Settings → Privacy → Full Disk Access → add via ⌘⇧G → ~/Applications. First manual run from Terminal triggers the macOS Automation prompts ("Terminal wants to control Photos.app" → Allow) — persistent after grant.
CREATE TABLE files (
drive_id TEXT PRIMARY KEY,
parent_id TEXT, parent_path TEXT,
name TEXT, mime TEXT, extension TEXT,
size_bytes INTEGER, md5 TEXT, sha256 TEXT,
created_time TEXT, modified_time TEXT, taken_time TEXT,
width INTEGER, height INTEGER, duration_sec REAL,
aspect_ratio TEXT, orientation TEXT,
phash TEXT, phash_mid TEXT, phash_end TEXT,
source TEXT,
drive_url TEXT,
staging_path TEXT,
ai_tagged_at TEXT,
scene_description TEXT, is_nsfw INTEGER, crop_candidate_9_16 INTEGER,
location_guess TEXT,
owner_visibility TEXT, -- face_clear | face_partial | profile_or_back | body_only | not_visible
owner_checked_at TEXT,
detected_face_count INTEGER, -- from Apple's face_info (drives Owner_alone vs Owner_with)
latitude REAL, longitude REAL,
gps_country TEXT, gps_admin1 TEXT, gps_city TEXT,
moved_to_parent_id TEXT, moved_at TEXT, -- idempotency for apply_structure
renamed_with_owner_prefix INTEGER DEFAULT 0,
renamed_descriptive_at TEXT,
original_name TEXT, -- backup of pre-rename name
deleted_at TEXT
);
CREATE TABLE file_tags (
drive_id TEXT,
tag_kind TEXT, -- broll_category | person | mood | shot_type
tag_value TEXT,
confidence REAL,
PRIMARY KEY (drive_id, tag_kind, tag_value)
);No "Still Shots" as a tag — it's a meta-category that fails to describe content. The enum ships owner-neutral — extend it with your own people and categories.
BROLL_CATEGORIES = [
"Awards", "Baby Owner", "Boating", "Cars", "Driving", "Drone", "Eating",
"Epic Moments", "Events", "Friends", "Golfcart", "Having Fun", "Hiking",
"Housework", "Moving", "Nature", "Parenting Family", "Preparing", "Props",
"Reading", "Reflecting", "Scenery", "Sitting Standing", "Sports Yoga Meditation",
"Talking To Cam BTS", "Team", "Travelling Plane", "Venues",
"Wedding", "Wellness", "Working",
]iPhone IMG counters wrap every ~10K shots — IMG_4223 is potentially dozens of different photos. Layered disambiguation when matching DB stems to Apple Photos:
| Layer | Signal |
|---|---|
| 1 | Pixel dimensions match (width × height) — strongest single signal |
| 2 | Location cross-check — filename location keyword → reverse-geocoded Apple GPS. The key insight: works even for downsized exports with no dim/size match |
| 3 | Date proximity (Drive upload time vs Apple capture year) |
| 4 | Byte size (2% tolerance) |
| 5 (fallback) | Same-month grouping |
Plus a modality filter: a .mov in the DB must never match an Apple .HEIC — different physical files, same IMG counter.
| Priority | Date source (each fills NULLs; only #3 with --all-rows overrides) |
|---|---|
| 1 | DJI filename parser — dji_YYYYMMDDhhmmss_NNNN_d.mp4 |
| 2 | Drive EXIF — imageMediaMetadata.time |
| 3 | Apple Photos cross-check — the 4-layer matcher above |
| 4 | ffprobe creation_time — last resort for videos |
| 5 | Manual override — set_date.py |
| Item | Cost |
|---|---|
| Claude Haiku 4.5 tagging (with prompt caching) | ~$0.003–0.005 per file |
| 5,000-file library, initial tagging | ~$25–30 total |
| Weekly incremental | ~$0.50–2 depending on new files |
| Drive API | free tier sufficient |
| File | What to customize |
|---|---|
tagger.py | BROLL_CATEGORIES, MOOD_OPTIONS, SHOT_TYPES enums |
tagger.py references | Replace the author's reference images with your own |
extract_archive.py | OWNER_NAME constant |
rename_descriptive.py | owner_token logic (replace with the user's name) |
derive_structure.py | SPLITTER dict — splitter strategy per category |
gap_analysis.py | persons_focus list — family member names |
weekly_shot_briefing.py | PRIORITY_CATEGORIES list |
| All scripts | BROLL_ROOT_ID constant — your Drive folder ID |
| API key location | ~/.openclaw/agents/main/agent/auth-profiles.json is this setup's path — use ANTHROPIC_API_KEY as an env var instead if you don't run OpenClaw |
The post-install confirmation — after the build finishes, your Claude reports these numbers back to you.
launchctl list | grep YOURNAMEDownload the bundle41 files — the sanitized production code + setup brief. The button below.
Unzip & open in Claude CodeOpen the unzipped folder in Claude Code (or Cowork).
Say the magic sentence"Read SETUP-FOR-CLAUDE.md and install." — it asks for your name, Drive folder & people, personalizes everything, then builds.
Download the bundle Free · 41 files · ~130 KB · owner-neutral