Case Study

I built an internal brand audit that replaced six figures of assumptions.

I used AI-augmented mixed methods to audit Plaid's brand perception across four product areas — rewriting the product marketing strategy and unlocking a $250K external study.

Brayden Decker · Lead Researcher · ~3–4 months · 2025–2026 · Plaid

900+ Gong sales calls coded

800 decision-makers surveyed

652 calls matched to won/lost outcomes

6 hypotheses tested

Should we spend $250K on a brand tracker we can't design yet?

The Problem

Plaid had an existing brand survey (n=800, external panel), but the team had concerns about panel quality and about whether it measured the right things. Product marketing (PMM) was evaluating a $130K–$250K investment in a more tailored external tracker, but the proposed instrument's competitor sets, attribute lists, and segment definitions were all based on assumptions. My research question: could I validate those inputs with internal data before the org committed six figures?

Hypothesis framework

Six hypotheses, four product areas, three segments. Every finding traceable to a specific call, survey response, or CRM record.

H1 · Supported · FM & Payments awareness saturated in fintech

H2 · Supported · Fraud & Credit awareness has significant room to grow

H3 · Nuanced · SMB, not Enterprise, is the most confused segment

H4a · Unresolved · Negative sentiment in banking: insufficient evidence

H4b · Supported · Platform intermediaries block shortlisting

H4c · Supported · Compliance decision-makers unreachable

Four sources measuring different populations

The Method

No single source answers the brand question. The existing brand survey gave directional signal but had panel quality concerns — so I triangulated it against Gong transcripts (what happens when prospects find Plaid) and won/lost analysis (which signals predict revenue). Convergence across sources = actionable. Single-source = directional only.
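The convergence rule can be made explicit as a small decision function. This is a sketch under stated assumptions: the source names, the "two or more agreeing sources" cut-off, and the choice to downgrade any contradicted finding to directional are illustrative, not the study's documented rule.

```python
from enum import Enum
from typing import Optional


class Evidence(Enum):
    ACTIONABLE = "actionable"    # two or more sources agree, none contradict
    DIRECTIONAL = "directional"  # a single source, or sources disagree
    UNSUPPORTED = "unsupported"  # no source supports the claim


def classify_finding(signals: dict[str, Optional[bool]]) -> Evidence:
    """Grade a finding by cross-source convergence (assumed rule).

    `signals` maps a source name to True (supports the claim),
    False (contradicts it), or None (the source does not speak to it).
    """
    supporting = sum(1 for s in signals.values() if s is True)
    contradicting = sum(1 for s in signals.values() if s is False)
    if supporting >= 2 and contradicting == 0:
        return Evidence.ACTIONABLE
    if supporting >= 1:
        return Evidence.DIRECTIONAL
    return Evidence.UNSUPPORTED


# Hypothetical finding: supported by the survey and Gong, not addressed by won/lost.
print(classify_finding({"survey": True, "gong": True, "won_lost": None}).value)
# → actionable
```

Treating "the source doesn't address this" (None) differently from "the source contradicts this" (False) is the main design choice here: silence leaves a finding's grade alone, while a contradiction caps it at directional.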

Source · What it measures · Population

Brand Survey · Unaided/aided recall, preference, willingness to pay (WTP) · n=800 decision-makers (existing panel; quality concerns)

Gong Transcripts · Comprehension, confusion type, attribute mentions · 900+ prospect calls in Plaid's funnel

Won/Lost Bayesian · Do call-time signals predict deal outcomes? · 652 calls matched to CRM (revenue correlation)

OOH (out-of-home) Campaign · Do brand campaigns lift Plaid or the category? · 500 per market, exposed vs. control (campaign effectiveness)

Statistical approach

Bayesian, not frequentist. Beta-Binomial posteriors with 100K Monte Carlo draws — because "P(won > lost) = 0.97" is actionable in a way that "p < 0.05" is not.
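A minimal sketch of that comparison, assuming independent uniform Beta(1, 1) priors on the won and lost rates (the priors actually used aren't stated here). With k "successes" (for example, calls mentioning an attribute) out of n, the posterior for the rate is Beta(1 + k, 1 + n − k); drawing from both posteriors and counting how often the won-deal rate exceeds the lost-deal rate estimates P(won > lost). The function name and the counts in the usage line are hypothetical, chosen only to show the mechanics.

```python
import random


def prob_rate_greater(k_a: int, n_a: int, k_b: int, n_b: int,
                      draws: int = 100_000, alpha: float = 1.0,
                      beta: float = 1.0, seed: int = 0) -> float:
    """Monte Carlo estimate of P(rate_a > rate_b) under a Beta-Binomial model.

    With a Beta(alpha, beta) prior and k successes in n trials, the
    posterior for the underlying rate is Beta(alpha + k, beta + n - k).
    We sample both posteriors `draws` times and count how often a beats b.
    """
    rng = random.Random(seed)  # seeded so reruns give identical estimates
    a_post = (alpha + k_a, beta + n_a - k_a)
    b_post = (alpha + k_b, beta + n_b - k_b)
    wins = sum(
        rng.betavariate(*a_post) > rng.betavariate(*b_post)
        for _ in range(draws)
    )
    return wins / draws


# Hypothetical counts: an attribute mentioned in 40 of 50 won calls
# and in 16 of 30 lost calls.
print(prob_rate_greater(40, 50, 16, 30))
```

The output reads directly as "how likely is it that won deals mention this attribute more often than lost deals?", which is the framing the paragraph above argues is more actionable than a p-value.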

One researcher, 900 calls

AI-Augmented Research

This project would not have been feasible without AI augmentation. I used Claude Code with Glean MCP to search semantically across Plaid's entire Gong corpus, code transcripts against structured rubrics, and run the Bayesian analysis pipeline.

Glean search → Gong transcript → LLM coding (10 attr × 4 dims) → CSV evidence tables → Bayesian analysis → Synthesis
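The "CSV evidence tables" stage implies one row per coded decision, carrying the quote that justifies it so every score stays traceable to a call. A minimal sketch of such a table, assuming a row-per-attribute-per-call layout: the field names are hypothetical, and only `present` and `valence` match dimensions named in this case study (the other two scoring dimensions aren't specified, so they're left out).

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class CodedMention:
    """One evidence-table row: one attribute judged on one call (illustrative)."""
    call_id: str
    product: str
    attribute: str
    present: bool
    valence: str   # e.g. "positive", "neutral", "negative"
    quote: str     # verbatim excerpt supporting the score


def write_evidence_table(rows: list[CodedMention], out) -> None:
    """Write coded rows as CSV (header first) to a text file-like object."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(CodedMention)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))


# Hypothetical row; the call ID and quote are invented for illustration.
buf = io.StringIO()
write_evidence_table(
    [CodedMention("call-001", "Fraud", "integration_ease", True, "positive",
                  "We had it running in a day.")],
    buf,
)
print(buf.getvalue())
```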

AI added value (10x throughput)

599 calls coded against 10 attributes × 4 scoring dimensions

Structured evidence trails: CSVs, numbered flags, cross-cutting trackers

Bayesian scripts: Beta-Binomial posteriors, era contrasts, fuzzy Salesforce matching

Consistent rubric (v3, 13 edge-case rules) applied identically across all calls

Human judgment required (0x AI capability)

Hypothesis interpretation: does this pattern support or challenge a hypothesis?

40+ flags requiring researcher decisions on competitor sets, sample sizes, contradictions

Translating "P=0.97 for integration ease" into product marketing recommendations

Catching 27 overcoded calls, correcting scores, re-running all three analyses
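The "fuzzy Salesforce matching" listed above links account names from Gong calls to CRM records that rarely match character for character. The sketch below swaps in Python's standard-library `difflib.SequenceMatcher` as the similarity measure; the study's actual matcher isn't described, and the suffix list, the 0.85 threshold, and the function names are assumptions for illustration.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Legal suffixes stripped before comparison (illustrative, not exhaustive).
_SUFFIXES = re.compile(r"\b(inc|llc|ltd|corp|corporation|co|company)\b\.?")


def normalize(name: str) -> str:
    """Lowercase, drop legal suffixes and punctuation, collapse whitespace."""
    name = _SUFFIXES.sub(" ", name.lower())
    name = re.sub(r"[^a-z0-9 ]", " ", name)
    return " ".join(name.split())


def best_match(call_account: str, crm_accounts: list[str],
               threshold: float = 0.85) -> Optional[str]:
    """Return the most similar CRM account name, or None if no candidate's
    similarity ratio (0 to 1) reaches `threshold`."""
    target = normalize(call_account)
    best_name, best_score = None, 0.0
    for candidate in crm_accounts:
        score = SequenceMatcher(None, target, normalize(candidate)).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    return best_name if best_score >= threshold else None


print(best_match("Acme, Inc.", ["ACME Corporation", "Apex Lending LLC"]))
```

Calls that return None would count against the match rate. The threshold is the design lever: lowering it matches more calls but risks attaching a call to the wrong deal.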

The research protocol: a 3,000-word instruction file specifying hypothesis frameworks, output formats, citation rules, flag conventions, and explicit boundaries between AI tasks and human judgment. It was the key architectural decision of the project.

LLM coding calibration: 200 calls × 10 attributes = 2,000 decisions

Present: 77% agreement (target 85%, not met)

Valence: 80% agreement (target met)

Present threshold not met. All prevalence estimates carry ±5–10pp systematic error, documented in every output. Three attributes flagged as directional only.
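A sketch of how per-dimension agreement against a target could be computed, assuming simple percent agreement between LLM codes and human reference codes (consistent with the "77% agreement rate" cited later, though the exact metric isn't specified). The 85% Present target comes from the text; the 80% Valence target and the toy data are assumptions.

```python
def percent_agreement(llm: list, human: list) -> float:
    """Share of decisions where the LLM code equals the human code."""
    if len(llm) != len(human) or not llm:
        raise ValueError("need two equal-length, non-empty code lists")
    return sum(a == b for a, b in zip(llm, human)) / len(llm)


def calibration_report(pairs: dict, targets: dict) -> dict:
    """Map each scoring dimension to (agreement rate, target met?)."""
    report = {}
    for dim, (llm, human) in pairs.items():
        rate = percent_agreement(llm, human)
        report[dim] = (rate, rate >= targets[dim])
    return report


# Toy data: 1 = attribute present, 0 = absent; valence as string labels.
report = calibration_report(
    pairs={
        "present": ([1, 1, 0, 0, 1], [1, 0, 0, 0, 1]),
        "valence": (["pos", "neg", "pos"], ["pos", "neg", "neu"]),
    },
    targets={"present": 0.85, "valence": 0.80},  # valence target assumed
)
for dim, (rate, met) in report.items():
    print(f"{dim}: {rate:.0%} agreement, target met: {met}")
```

Raw percent agreement does not correct for matches expected by chance, which matters for skewed attributes that are absent on most calls; Cohen's kappa is the usual chance-corrected alternative.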

Comprehension follows a gradient

Key Finding 1

The pattern

Awareness, comprehension, preference, and win rate all degrade with distance from the core product.

FM · 80% · Hold

Payments · 67% · Reframe

Credit · 56% · Build

Fraud · 44% · Activate

Percentage of net-new prospects who understood the product at call time. 1,044 coded calls, 2023–2026.

Four products, four plays

Key Finding 2

FM Hold

confusion — down from 25%

The brand works. Confusion fading every year. No new investment. Feed the cross-sell engine.

Protect

Payments Reframe

30%

confused about how, not what

Win rate dropped 67%→49%. $10.9M pricing losses. Lead with "pay-by-bank" framing.

$10.9M losses · ACH +47%

Fraud Activate

57pp

aided-to-unaided recall gap

67.6% recognize when prompted, 10.9% unprompted. Association exists — doesn't fire at buying time.

Trending wrong · Operational efficiency P = 0.99

Credit Build

18/20

existing customers unaware

38.3% prompted association. Every miss = a 157% NRR customer lost. $109M base untapped.

Deepest gap · 157% NRR

Confusion doesn't predict revenue. Integration ease does.

Key Finding 3

Won/lost Bayesian · 652 matched calls

Won and lost deals arrive similarly confused (10.6% vs. 12.2%), and AEs resolve that confusion during the call. Comprehension investment improves AE efficiency; it does not move win rate directly.

But two attributes predict winning in Fraud

Revenue lever 1 · Operational Efficiency

Won deals 80% · Lost deals 53% · +21pp · P = 0.99

Revenue lever 2 · Integration Ease

Won deals 53% · Lost deals 36% · +20pp · P = 0.98

Both confirmed in recent (2025–26) and historical (2023–24) data. Both fraud-specific — no other product shows a confirmed won/lost attribute signal at current sample sizes.
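The era check described above amounts to re-running the won/lost comparison separately for each period and requiring the signal in every one. A minimal sketch, assuming uniform Beta(1, 1) priors and a hypothetical 0.95 cut-off for "confirmed"; the function names and counts are illustrative, not the study's code or data.

```python
import random


def p_won_gt_lost(won_k: int, won_n: int, lost_k: int, lost_n: int,
                  draws: int = 100_000, seed: int = 0) -> float:
    """P(won-deal rate > lost-deal rate) with Beta(1, 1) priors, by Monte Carlo."""
    rng = random.Random(seed)
    hits = sum(
        rng.betavariate(1 + won_k, 1 + won_n - won_k)
        > rng.betavariate(1 + lost_k, 1 + lost_n - lost_k)
        for _ in range(draws)
    )
    return hits / draws


def confirmed_in_all_eras(eras: dict, threshold: float = 0.95,
                          draws: int = 100_000) -> bool:
    """True only if the won > lost signal clears `threshold` in every era.

    `eras` maps an era label to (won_k, won_n, lost_k, lost_n).
    """
    return all(p_won_gt_lost(*counts, draws=draws) >= threshold
               for counts in eras.values())


# Hypothetical per-era counts of calls mentioning an attribute.
eras = {"2023-24": (40, 50, 20, 50), "2025-26": (42, 50, 22, 50)}
print(confirmed_in_all_eras(eras))
```

Requiring the signal in each era separately is stricter than pooling all calls: a pattern driven by one period alone fails the check.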

"I've known Plaid as a way to connect bank accounts. I didn't know there was the potential to do identity verification."

— SMB Founder, Gong call

Three outcomes

Impact

Positioning strategy adopted

The Hold / Reframe / Activate / Build framework became PMM's organizing logic, and general brand campaigns were redirected toward product-specific interventions.

$250K brand tracker approved

External study greenlit with a revised instrument — competitor sets, attributes, and segments updated based on findings.

Follow-up study planned

New qualitative study on value propositions scoped as a direct follow-on. The audit identified what buyers value; the next study tests how to message it.

Reflections

Learnings

What worked

Measuring multiple populations — market awareness (survey), in-funnel comprehension (Gong), revenue correlation (won/lost). The headline finding that confusion doesn't predict revenue only emerged because I had both the comprehension data and the outcome data.

What I'd change

More upfront LLM coding calibration. The 77% agreement rate means systematic error I can document but not precisely quantify. A fourth calibration pass (2–3 days) would have tightened findings on the exact attributes that mattered most. I'd also build the Salesforce matching pipeline earlier to improve the 62.5% match rate. And I'd loop product-level PMMs into more of the process, not just the CMO. I had executive buy-in from the start (ironically, the opposite of my usual challenge), but the PMMs who'd actually use the playbook daily weren't seeing findings until late. Earlier involvement would have sharpened the recommendations and accelerated adoption.

What I learned

AI augmentation changed what I could do, but not what I needed to decide. Claude Code was a 10x throughput multiplier on structured tasks and a 0x capability addition on judgment tasks. Designing my research to account for that asymmetry was the key architectural decision.