We Let AI Manage Our Inbox for Two Weeks: What We Learned

Updated · May 17, 2026

Most productivity advice about AI email management reads like a press release. “Reclaim hours!” “Reach inbox zero!” We’d read plenty of it. What we hadn’t done was hand over control and document what happened when things went wrong. So in late April, two of us on the Take The AI editorial team did exactly that: we ran our work inboxes through an AI-managed stack for 14 days, on real client correspondence, vendor threads, and the usual flood of newsletters and cold outreach.

According to a McKinsey Global Institute report, knowledge workers spend an average of 28% of their workweek managing email. For the two of us, it added up to roughly 90 minutes of combined email time per day. We wanted to cut that in half. Here’s what actually happened.

The setup: what we chose and what we almost used instead

We set a hard constraint before starting: if AI caused us to miss or delay responding to a real person by more than 24 hours, it counted as a failure. Not a learning moment. A failure. That framing changed how seriously we took the training process.

SaneBox at the Personal tier ($7/month per account) handled filtering. It works by watching what you open, reply to, and star, then sorting incoming mail accordingly — higher-priority threads stay in your inbox, everything else files into labeled folders like @SaneLater and @SaneNews.

Shortwave on the paid plan ($18/month per user) became our primary email client. The feature we cared most about was AI thread summarization, which compresses a 30-message thread into a 200-word catch-up in seconds. We also wanted its draft reply suggestions for routine emails.

Zapier on the Starter plan ($19.99/month) routed specific categories — vendor invoices, receipts, automated platform notifications — out of our inboxes entirely and into a shared Notion database for weekly review.

We almost used Superhuman. It’s a fast email client with solid AI drafting, and at around $30/month per user it’s not unreasonable. But Superhuman’s real strength is interface speed, not smart triage — and triage intelligence was what we were testing. We also briefly considered routing emails through the Claude API for draft replies, but the setup time for a 14-day experiment wasn’t worth it.

Week one: the mess we didn’t plan for

Day two of the experiment, a client email went into SaneBox’s @SaneLater folder. We found it 36 hours later. The client had asked a simple clarifying question; we looked unresponsive. First failure. It happened because we went live before properly training the system.

SaneBox learns from your behavior — specifically, what you reply to quickly, what you star, what you move to the inbox. On day one it had none of that history. An email from a client whose domain it had never seen got sorted the same way as a newsletter from a new sender: cautiously, toward the lower-priority folder. Makes sense in retrospect. Felt bad in the moment.
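
The cold-start problem is easier to see in a toy model. What follows is a minimal sketch and not SaneBox’s actual algorithm, which we have no visibility into; the `SenderHistory` fields, the weights, and the 0.5 threshold are all hypothetical, chosen only to show why a sender with no history ends up in the lower-priority folder.

```python
from dataclasses import dataclass


@dataclass
class SenderHistory:
    """Hypothetical per-sender interaction counts (not SaneBox's real model)."""
    received: int = 0
    opened: int = 0
    replied: int = 0
    starred: int = 0


def priority_score(h: SenderHistory) -> float:
    """Return a score in [0, 1]; a sender with no history scores 0."""
    if h.received == 0:
        return 0.0  # cold start: there is nothing to learn from yet
    # Illustrative weights: replies count most, then stars, then opens.
    raw = (0.5 * h.replied + 0.3 * h.starred + 0.2 * h.opened) / h.received
    return min(raw, 1.0)


def folder_for(h: SenderHistory, threshold: float = 0.5) -> str:
    """Keep high scorers in the inbox; send everything else to @SaneLater."""
    return "Inbox" if priority_score(h) >= threshold else "@SaneLater"


new_client = SenderHistory()  # first email from a domain never seen before
known_client = SenderHistory(received=10, opened=10, replied=8, starred=2)

print(folder_for(new_client))    # @SaneLater
print(folder_for(known_client))  # Inbox
```

In this toy version, the only way out of the cold start is to feed the system behavior, which is roughly what our manual corrections in the first days amounted to.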

We spent about 40 minutes over those first three days manually correcting misclassifications. By day five, the filtering had calibrated noticeably. By day seven, we trusted it enough to stop compulsively checking @SaneLater every hour.

Shortwave’s summaries, meanwhile, worked almost immediately. A 22-message client thread we’d been putting off reviewing took 90 seconds to catch up on. That feature alone changed how we handled long threads — we stopped avoiding them.

Where did the time savings actually show up?

The biggest measurable win was newsletters and marketing email. Before the experiment, skimming and triaging newsletters was taking 20 to 25 minutes per day — more than we’d admitted to ourselves. With SaneBox routing all of it to @SaneNews, we checked that folder once at day’s end and spent maybe three minutes on it. Consistent, real savings, every single day.

Shortwave’s draft suggestions worked well for transactional replies: confirming receipt, agreeing to a meeting time, answering a single-question follow-up. We accepted the draft with light edits roughly 60% of the time on those categories. For anything requiring actual judgment (project feedback, negotiation, introductions), we ignored the drafts and wrote from scratch. The AI’s versions weren’t wrong; they just weren’t us.

The Zapier routing was quieter but real. Vendor invoices, order confirmations, and automated platform notifications stopped appearing in our main inboxes after the first week. That cleared a low-level noise we hadn’t fully noticed until it disappeared.

End result: combined daily email time dropped from roughly 90 minutes to about 58 minutes. Not the 45-minute target we’d set, but a measurable reduction. Shortwave’s team reports that users typically save 45 to 60 minutes per week; our results landed at the lower end of that range, likely because our inbound volume runs high.
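
For anyone who wants to check the daily numbers, here is the arithmetic as a short sketch. The three figures come straight from the article; the variable names are ours.

```python
BASELINE_MIN = 90  # combined daily email time before the experiment
RESULT_MIN = 58    # combined daily email time at the end of the two weeks
TARGET_MIN = 45    # the "cut it in half" goal

saved = BASELINE_MIN - RESULT_MIN
pct_reduction = 100 * saved / BASELINE_MIN
shortfall = RESULT_MIN - TARGET_MIN

print(f"Saved {saved} min/day ({pct_reduction:.1f}% reduction)")
# Saved 32 min/day (35.6% reduction)
print(f"Missed the target by {shortfall} min/day")
# Missed the target by 13 min/day
```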

Where did the AI actually get it wrong?

Cold outreach from legitimate contacts was a persistent problem. SaneBox treated obvious vendor spam and a first email from a conference organizer we actually wanted to hear from the same way: both landed in @SaneBlackHole, SaneBox’s spam equivalent. Domain whitelists helped for known contacts, but new legitimate senders kept getting filtered out.

Unlike SaneBox, which learns passively in the background, Shortwave’s AI drafts require an active choice each time — you see the suggestion, decide to use it or not. That design kept us from accidentally sending something unreviewed. One draft opened with “I would be delighted to connect,” which was technically appropriate but completely wrong for that thread’s casual tone. We caught it before sending. Still: these tools don’t read relationship context, and they don’t know when matching someone’s informal register matters.

The harder failure mode was ambiguity. “Can we jump on a quick call?” from someone we barely knew, with whom we’d exchanged only two emails, got sorted as low-priority cold outreach. Twice. These edge cases don’t have clean behavioral patterns, and filtering AI that depends on such patterns handles them poorly.

Final count over 14 days: two genuine misses. Both in week one, before the system had calibrated. Zero missed emails in week two.

What we’d change next time

The biggest mistake was running three new systems simultaneously. When something went wrong — and things did go wrong — we couldn’t isolate which layer had caused the problem. Next time: start with SaneBox alone, let it train for a full week, then layer in a second tool.

We’d also add a manual override for specific high-stakes signals. Any email containing words like “invoice,” “contract,” or “proposal,” or a recognizable name from our contacts list, would be flagged for human review regardless of AI classification. The cost of a missed email in those categories outweighs whatever time the automation saves on everything else.
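
A rough sketch of what that override could look like, assuming each message is available as plain sender, subject, and body strings. The three keywords are the ones named above; the contact names are hypothetical placeholders, and `needs_human_review` and `final_folder` are our own illustrative helpers, not part of SaneBox, Shortwave, or Zapier.

```python
HIGH_STAKES_WORDS = {"invoice", "contract", "proposal"}  # from the article
KNOWN_CONTACTS = {"Dana Reyes", "Sam Okafor"}  # hypothetical placeholder names


def needs_human_review(sender: str, subject: str, body: str) -> bool:
    """Return True if the message trips any high-stakes signal."""
    haystack = f"{sender} {subject} {body}".lower()
    # Substring matching also catches "invoices" and "contractor": it
    # deliberately over-flags, since a false alarm is cheaper than a miss.
    if any(word in haystack for word in HIGH_STAKES_WORDS):
        return True
    return any(name.lower() in haystack for name in KNOWN_CONTACTS)


def final_folder(ai_folder: str, sender: str, subject: str, body: str) -> str:
    """Override the AI's folder choice whenever the message looks high-stakes."""
    if needs_human_review(sender, subject, body):
        return "Inbox"
    return ai_folder


print(final_folder("@SaneLater", "billing@vendor.example",
                   "Invoices for April", "See attached."))  # Inbox
print(final_folder("@SaneNews", "news@letters.example",
                   "Weekly digest", "Top stories"))          # @SaneNews
```

The design choice is that the override only ever promotes a message to the inbox. It never demotes one, so the worst case is a few extra emails to look at.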

For teams of more than two or three people, we’d look seriously at Front or Missive as the primary client — both have AI features built around shared inboxes, which SaneBox and Shortwave can’t replicate when multiple people need to act on the same thread.

The final stack

  • SaneBox Personal — $7/month per account — filtering, priority sorting, newsletter quarantine
  • Shortwave paid — $18/month per user — AI thread summaries and draft reply suggestions
  • Zapier Starter — $19.99/month — routing invoices and receipts to Notion
  • Total: around $70/month across two accounts (SaneBox and Shortwave billed per account, plus one Zapier subscription)
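
The monthly total can be checked by summing the listed prices. This sketch assumes SaneBox and Shortwave are each billed once per account for two accounts, and that Zapier is a single shared subscription; if your billing is set up differently, adjust the dictionaries.

```python
ACCOUNTS = 2
PER_ACCOUNT = {"SaneBox Personal": 7.00, "Shortwave paid": 18.00}
SHARED = {"Zapier Starter": 19.99}  # assumed to be one shared subscription

per_account_total = sum(PER_ACCOUNT.values())
monthly_total = ACCOUNTS * per_account_total + sum(SHARED.values())

print(f"${monthly_total:.2f}/month")  # $69.99/month
```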

Frequently asked questions

Can AI fully manage your inbox without you checking it?

Not for external business communication, at least not yet. Filtering and routine triage can be largely automated, but anything requiring relationship judgment or contextual nuance still needs a human in the loop — especially during the first week before the system has learned your patterns.

Does SaneBox work with Gmail and Outlook?

Yes. SaneBox connects to any IMAP-compatible service, including Gmail, Outlook, iCloud Mail, Fastmail, and others. There’s no software to install; it creates folders that sync automatically with whatever client you already use.

Is Shortwave worth paying for if you already have Gmail’s Gemini AI?

If your inbox is mostly short threads, probably not — Gmail’s Gemini integration handles basic summarization and draft suggestions at no extra cost. Shortwave earns its $18/month when you’re regularly dealing with long, multi-person threads where context is spread across dozens of messages.

The experiment confirmed something we suspected but hadn’t properly tested: AI email management is genuinely useful for the parts of inbox time that feel mindless because they largely are — triage, filtering, low-stakes replies. The parts requiring relationship awareness and contextual judgment are still yours to own. Build your stack around that division, train it for a week before trusting it, and you’ll reclaim something real. Expect it to run unsupervised from day one and you’ll miss something that matters.

This article contains affiliate links. If you subscribe through one, we may earn a commission at no extra cost to you. It never changes what we recommend — we only link to tools we actually use. Full disclosure.
