We Tested AI Cold Outreach: Which Approach Got Meetings

Updated · May 21, 2026

Everyone in your LinkedIn feed is claiming their AI outreach tool tripled reply rates. We decided to check. Over six weeks, we sent 500 cold emails across five distinct approaches, from a two-second ChatGPT prompt to a fully enriched Clay pipeline, targeting sales leaders at B2B SaaS companies. We tracked every open, every reply, and every booked meeting. Two approaches genuinely worked. One made things measurably worse. Here’s the data.

How did we run the test?

Target persona: VP Sales, Head of Sales, and SDR Managers at B2B SaaS companies with 50 to 500 employees. All 500 contacts were sourced from Apollo.io and pulled from the same segment to keep the persona consistent across approaches. We sent 100 emails per approach from the same warmed domain using Instantly for delivery, so infrastructure was constant. Every approach used the same subject line formula. The test ran for six weeks across March and April.

We measured four things: open rate, reply rate, positive reply rate (excluding “unsubscribe me” and “wrong person”), and meetings booked. That last number is the only one we actually care about — everything else is a proxy.
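For anyone replicating the scoring, here is a minimal Python sketch of how those four numbers can be computed from a per-email log. The `SendOutcome` fields and the `summarize` helper are hypothetical names for illustration, not an export format from Instantly or any other tool, and the sketch assumes rates are taken over emails sent rather than emails delivered.

```python
from dataclasses import dataclass


@dataclass
class SendOutcome:
    """One sent email. Field names are illustrative, not a real tool's export."""
    opened: bool = False
    replied: bool = False
    positive_reply: bool = False  # excludes "unsubscribe me" and "wrong person"
    meeting_booked: bool = False


def summarize(outcomes: list[SendOutcome]) -> dict[str, float]:
    """Return the four metrics for one approach; rates are percentages of emails sent."""
    sent = len(outcomes)
    if sent == 0:
        raise ValueError("no sends to summarize")

    def pct(count: int) -> float:
        return round(100 * count / sent, 1)

    return {
        "open_rate": pct(sum(o.opened for o in outcomes)),
        "reply_rate": pct(sum(o.replied for o in outcomes)),
        "positive_reply_rate": pct(sum(o.positive_reply for o in outcomes)),
        "meetings_booked": sum(o.meeting_booked for o in outcomes),
    }


# Made-up example data: 10 sends, 4 opens, 1 positive reply that led to a meeting.
sample = [SendOutcome(opened=True, replied=True, positive_reply=True, meeting_booked=True)]
sample += [SendOutcome(opened=True) for _ in range(3)]
sample += [SendOutcome() for _ in range(6)]
print(summarize(sample))
```

Keeping meetings booked as a raw count rather than a rate matches how we report it: it is the outcome itself, while the three rates are proxies for it.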

The five approaches we tested:

  • A (Prompt-and-blast): simple ChatGPT prompt, no research, no personalization
  • B (Research + AI): 10–20 minutes of manual LinkedIn and website research per prospect, then fed into ChatGPT
  • C (Clay pipeline): automated enrichment (company news, LinkedIn activity, open job postings) piped into an AI personalization column inside Clay
  • D (Lavender-coached): human first draft, refined in real time using Lavender’s AI suggestions
  • E (Human control): written from scratch, no AI tools whatsoever

Does AI write better cold emails than a human?

Not by default — and the gap is wider than we expected.

Approach A (prompt-and-blast) produced emails that looked like this:

Hi [Name], I saw that you’re scaling your team at [Company] and wanted to reach out. We help B2B SaaS companies increase pipeline by 40%. Would love 15 minutes to show you how. Are you free this week?

That approach got one meeting out of 100 sends and a 2.1% reply rate. Three emails landed in spam instead of the inbox, two of which Instantly had flagged as deliverability risks before sending, and by the end of the six weeks our domain health score had ticked down slightly. Experienced sales leaders have seen the “40% pipeline increase” claim so many times they’ve developed immunity to it. The emails had the texture of AI: vague value props, no specifics, forced urgency.

The human control (Approach E) did considerably better without any tooling: 6.1% reply rate, 5 meetings. The emails were specific, referenced something the person had actually done or said, and didn’t feel like a template. Time invested: roughly 20 minutes per contact.

Does personalization at scale actually move the needle?

Yes — but the quality of the data source matters more than the AI model writing the email.

Approach B (research + AI) took 10–20 minutes per contact and produced solid first lines. Here’s what a good one looked like:

Saw your post about the challenges of hiring senior SDRs in a remote-first model — we’ve been hearing the same thing from a lot of VPs this quarter, which is part of why I’m reaching out now.

That approach produced a 5.3% reply rate and 3 meetings, worse than the human control, which surprised us. Looking at the emails side by side, the issue was the closing paragraphs: the AI padded them with generic language even when the opening line was sharp. Good start, weak finish.

Approach C (Clay pipeline) changed the math. We connected LinkedIn activity, company news via Diffbot, and open job postings into Clay, then used an AI column to write the opening line. One example output:

Saw you’re hiring two more SDRs in Austin right now — figured this might actually be a good moment to talk about how you’re thinking about ramping them efficiently.

That line took roughly two seconds per contact to generate at scale. The Clay approach got a 7.2% reply rate and 6 meetings — outperforming both the 20-minute human approach and the manual research + AI hybrid. The enrichment data gives the model something real to work with. Without it, you’re asking an AI to be specific about nothing.

Setup time for the Clay pipeline: about half a workday on the first build. After that, roughly 3 minutes per contact to run through the workflow.
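To make the "something real to work with" point concrete, here is a minimal Python sketch of the idea behind an AI personalization step: build the prompt only from enrichment facts, and skip the contact when there are none. This is not Clay's API or our actual column configuration. The `Enrichment` record, its field names, and `build_opening_line_prompt` are hypothetical, and the resulting prompt would still need to be sent to whatever model you use.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Enrichment:
    """Hypothetical enrichment record; field names are illustrative, not Clay's schema."""
    first_name: str
    company: str
    open_roles: list[str] = field(default_factory=list)
    recent_post: str = ""
    company_news: str = ""


def build_opening_line_prompt(e: Enrichment) -> Optional[str]:
    """Return a prompt grounded in enrichment facts, or None if nothing specific exists."""
    facts = []
    if e.open_roles:
        facts.append(f"Open job postings: {', '.join(e.open_roles)}")
    if e.recent_post:
        facts.append(f"Recent LinkedIn post: {e.recent_post}")
    if e.company_news:
        facts.append(f"Company news: {e.company_news}")
    if not facts:
        # Nothing real to reference: skip rather than let the model write filler.
        return None
    fact_block = "\n".join(f"- {fact}" for fact in facts)
    return (
        f"Write one opening line for a cold email to {e.first_name} at {e.company}.\n"
        "Reference exactly one of the facts below and make no claims beyond them.\n"
        f"{fact_block}"
    )


# Placeholder contact, not a real prospect from the test.
contact = Enrichment(first_name="Dana", company="ExampleCo", open_roles=["SDR (Austin)"])
print(build_opening_line_prompt(contact))
```

Returning `None` for an empty record is the design choice that matters here: a contact with no enrichment data falls back to manual research or gets dropped, instead of receiving a prompt-and-blast email by accident.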

What surprised us

Three findings we didn’t predict going in.

Lavender outperformed everything. Approach D — a human wrote the email and Lavender flagged weak phrases, suggested cutting sentences, and pushed back on vague subject lines — got an 8.8% reply rate and 7 meetings. More than Clay. More than hand-crafted human emails. Lavender doesn’t write emails; it coaches you into writing better ones. The emails were tighter, more confident, and had clearly been edited rather than generated. That distinction is audible to the reader, apparently.

Open rates varied far less than everything downstream. The spread was 38% to 51%, roughly a 1.3x gap between worst and best, while meetings booked ranged from 1 to 7. Subject lines get opens. The body decides everything else.

Prompt-and-blast may actively damage your sender reputation. Three of the 100 prompt-and-blast emails hit spam folders; none of the other approaches triggered a single one. Instantly’s pre-send warnings flagged two of them before they went out. Domain health damage is a slow-burn risk — a few weeks of high-volume AI slop can erode reputation that takes months to rebuild. This doesn’t show up in a single campaign’s numbers.

Which approach got the most meetings?

Here’s the full picture:

| Approach | Emails sent | Open rate | Reply rate | Meetings booked | Avg. time per email |
|---|---|---|---|---|---|
| A: Prompt-and-blast (ChatGPT) | 100 | 38% | 2.1% | 1 | ~1 min |
| B: Research + AI (ChatGPT) | 100 | 44% | 5.3% | 3 | ~15 min |
| C: Clay enrichment pipeline | 100 | 51% | 7.2% | 6 | ~3 min* |
| D: Lavender-coached human | 100 | 47% | 8.8% | 7 | ~12 min |
| E: Human control (no AI) | 100 | 42% | 6.1% | 5 | ~20 min |

*Clay time is per contact after initial pipeline setup (~4 hours).

The efficiency winner is Clay: 6 meetings at roughly 3 minutes per contact once the pipeline is running. The quality winner is Lavender: 7 meetings, but it requires a writer who can take AI feedback and act on it. If you can write a decent first draft, Lavender will make it notably better. If your team lacks strong writers, Clay gets you about 86% of the meetings (6 versus 7) at roughly a quarter of the time per email (about 3 minutes versus 12).
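The efficiency comparison can be checked with a few lines of arithmetic. The Python sketch below turns the table's figures into minutes of work per booked meeting, counting Clay's one-time setup of about 4 hours. The per-email times are the approximate values from the table, and `minutes_per_meeting` is just an illustrative helper, so treat the output as rough.

```python
# (approach, emails sent, meetings booked, approx. minutes per email), from the results table
RESULTS = [
    ("A", 100, 1, 1),
    ("B", 100, 3, 15),
    ("C", 100, 6, 3),
    ("D", 100, 7, 12),
    ("E", 100, 5, 20),
]
CLAY_SETUP_MINUTES = 4 * 60  # one-time pipeline build, per the table footnote


def minutes_per_meeting(sent, meetings, minutes_per_email, setup_minutes=0):
    """Total minutes spent on a campaign divided by meetings booked."""
    return (sent * minutes_per_email + setup_minutes) / meetings


for name, sent, meetings, per_email in RESULTS:
    setup = CLAY_SETUP_MINUTES if name == "C" else 0
    cost = minutes_per_meeting(sent, meetings, per_email, setup)
    print(f"{name}: {cost:.0f} min per meeting")

# Clay's meetings as a share of Lavender's (6 of 7)
print(f"Clay vs Lavender meetings: {6 / 7:.0%}")
```

On these figures, prompt-and-blast comes to 100 minutes per meeting, Research + AI to 500, Clay to 90 with setup included (50 without), Lavender to about 171, and the human control to 400. Prompt-and-blast looks cheap per meeting only because it costs almost no time; it still produced a single meeting, and the calculation ignores the deliverability cost.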

Prompt-and-blast is not a strategy. It’s burning a domain for one meeting.

One tool we didn’t include but would test next round: Copy.ai’s outreach sequences, which now pull in CRM data for personalization. The architecture is similar to Clay but with less flexibility. Worth a round two.

Frequently asked questions

Does using AI for cold email hurt deliverability?

Prompt-and-blast approaches did show measurable deliverability damage in our test — three emails hit spam, and our domain health score declined over six weeks. AI-assisted approaches (Clay, Lavender) showed no deliverability impact. The problem isn’t AI authorship; it’s generic content that spam filters have learned to recognize.

How long does a Clay enrichment pipeline actually take to set up?

Our first build took about four hours, including connecting data sources and dialing in the AI prompt. A second campaign would take under an hour to configure. The learning curve is real but front-loaded — after the first build, iteration is fast.

Can ChatGPT or Claude work for cold outreach if you prompt them well?

Yes, but only if you feed them specific research on the prospect — recent LinkedIn posts, company news, a specific trigger event. The model is only as specific as the inputs you give it. A good prompt without real prospect data produces the same vague output every time.

Is Lavender worth it for a small team or solo founder?

Probably yes if you’re sending fewer than 100 emails a week and doing your own writing. At around $29/month for the individual plan, one extra meeting per month more than covers it. For high-volume SDR teams, Clay’s pipeline approach scales better.

The main takeaway: AI improves cold outreach when it has real data to work with and a human in the loop to catch what it gets wrong. Without either of those constraints, it generates content that experienced buyers recognize and ignore. The tools that performed best — Clay and Lavender — both force you to do something before the AI does its job.

This article contains affiliate links. If you subscribe through one, we may earn a commission at no extra cost to you. It never changes what we recommend — we only link to tools we actually use. Full disclosure.
