Case Study
I used AI-augmented mixed methods to audit Plaid's brand perception across four product areas — rewriting the product marketing strategy and unlocking a $250K external study.
Plaid had an existing brand survey (n=800, external panel) but the team had concerns about panel quality and whether it was measuring the right things. PMM was evaluating a $130K–$250K investment in a more tailored external tracker — but the proposed instrument's competitor sets, attribute lists, and segment definitions were all assumption-based. My research question: can I validate those inputs with internal data before the org commits six figures?
Hypothesis framework
Six hypotheses, four product areas, three segments. Every finding traceable to a specific call, survey response, or CRM record.
H1 (Supported): FM & Payments awareness saturated in fintech
H2 (Supported): Fraud & Credit awareness has significant room to grow
H3 (Nuanced): SMB is the most confused segment, not Enterprise
H4a (Unresolved): Negative sentiment in banking, insufficient evidence
H4b (Supported): Platform intermediaries block shortlisting
H4c (Supported): Compliance decision-makers unreachable
No single source answers the brand question. The existing brand survey gave directional signal but had panel quality concerns — so I triangulated it against Gong transcripts (what happens when prospects find Plaid) and won/lost analysis (which signals predict revenue). Convergence across sources = actionable. Single-source = directional only.
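The convergence rule can be sketched as a small function. This is an illustration only, not the actual pipeline: the function name, the source labels, and the "unsupported" grade for findings with no source are assumptions added here.

```python
def evidence_grade(supporting_sources):
    """Grade a finding by how many distinct sources support it.

    Hypothetical helper: two or more distinct sources count as
    convergence (actionable); exactly one is directional only.
    The "unsupported" grade is an assumption added for completeness.
    """
    distinct = set(supporting_sources)
    if len(distinct) >= 2:
        return "actionable"
    if len(distinct) == 1:
        return "directional"
    return "unsupported"


# Illustrative source labels, not real findings.
print(evidence_grade(["survey", "gong", "won_lost"]))
print(evidence_grade(["gong"]))
```

Counting distinct sources rather than citations means several calls from the same corpus still grade as single-source.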
Statistical approach
Bayesian, not frequentist. Beta-Binomial posteriors with 100K Monte Carlo draws — because "P(won > lost) = 0.97" is actionable in a way that "p < 0.05" is not.
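A minimal sketch of how a statement like "P(won > lost)" can be estimated from Beta-Binomial posteriors with Monte Carlo draws. The uniform Beta(1, 1) prior, the function name, and the counts in the usage example are assumptions for illustration; the case study does not state the prior or the underlying counts.

```python
import random


def prob_won_greater(won_hits, won_n, lost_hits, lost_n,
                     draws=100_000, prior=(1.0, 1.0), seed=0):
    """Estimate P(rate_won > rate_lost) by sampling both posteriors.

    With a Beta(a, b) prior and k hits in n calls, the posterior is
    Beta(a + k, b + n - k). The Beta(1, 1) default is an assumption.
    """
    rng = random.Random(seed)
    a, b = prior
    wins = 0
    for _ in range(draws):
        won = rng.betavariate(a + won_hits, b + won_n - won_hits)
        lost = rng.betavariate(a + lost_hits, b + lost_n - lost_hits)
        wins += won > lost
    return wins / draws


# Hypothetical counts: attribute present in 45 of 100 won calls
# and 30 of 100 lost calls. These are not figures from the study.
p = prob_won_greater(won_hits=45, won_n=100, lost_hits=30, lost_n=100)
print(f"P(won > lost) = {p:.2f}")
```

The output is a direct probability that the attribute is more prevalent in won deals, which is the quantity the decision depends on.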
This project would not have been feasible without AI augmentation. I used Claude Code with Glean MCP to search semantically across Plaid's entire Gong corpus, code transcripts against structured rubrics, and run the Bayesian analysis pipeline.
AI added value (10x throughput)
Human judgment required (0x AI capability)
The research protocol: A 3,000-word instruction file specifying hypothesis frameworks, output formats, citation rules, flag conventions, and explicit boundaries between AI tasks and human judgment. The key architectural decision of the project.
LLM coding calibration: 200 calls × 10 attributes = 2,000 decisions
Present threshold not met. All prevalence estimates carry ±5–10pp systematic error, documented in every output. Three attributes flagged as directional only.
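A calibration check of this kind can be sketched as per-attribute agreement between human and LLM codes, with attributes below a threshold flagged as directional only. The attribute names, labels, sample rows, and the 0.80 threshold below are hypothetical; the actual rubric and threshold are not given here.

```python
from collections import defaultdict


def agreement_by_attribute(decisions):
    """Return the human/LLM agreement rate for each attribute.

    decisions: iterable of (attribute, human_label, llm_label) tuples.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for attribute, human_label, llm_label in decisions:
        totals[attribute] += 1
        matches[attribute] += human_label == llm_label
    return {attr: matches[attr] / totals[attr] for attr in totals}


def flag_directional(rates, threshold):
    """List attributes whose agreement rate falls below the threshold."""
    return sorted(attr for attr, rate in rates.items() if rate < threshold)


# Hypothetical sample, not real calibration data.
sample = [
    ("integration_ease", "present", "present"),
    ("integration_ease", "absent", "absent"),
    ("operational_efficiency", "present", "absent"),
    ("operational_efficiency", "present", "present"),
]
rates = agreement_by_attribute(sample)
flags = flag_directional(rates, threshold=0.80)  # threshold is an assumption
print(rates, flags)
```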
The pattern
Awareness, comprehension, preference, and win rate all degrade with distance from the core product.
Percentage of net-new prospects who understood the product at call time. 1,044 coded calls, 2023–2026.
5% confusion, down from 25%. The brand works. Confusion fading every year. No new investment. Feed the cross-sell engine.
30% confused about how, not what. Win rate dropped 67%→49%. $10.9M pricing losses. Lead with "pay-by-bank" framing.
57pp aided-to-unaided recall gap. 67.6% recognize when prompted, 10.9% unprompted. Association exists but doesn't fire at buying time.
18/20 existing customers unaware. 38.3% prompted association. Every miss = a 157% NRR customer lost. $109M base untapped.
Won/lost Bayesian · 652 matched calls
Won and lost deals arrive equally confused (10.6% vs. 12.2%). AEs resolve the confusion in-call. Comprehension investment improves AE efficiency; it does not move win rate directly.
But two attributes predict winning in Fraud
Revenue lever 1: Operational Efficiency
Revenue lever 2: Integration Ease
Both confirmed in recent (2025–26) and historical (2023–24) data. Both fraud-specific — no other product shows a confirmed won/lost attribute signal at current sample sizes.
"I've known Plaid as a way to connect bank accounts. I didn't know there was the potential to do identity verification."
— SMB Founder, Gong call
Positioning strategy adopted
The Hold / Reframe / Activate / Build framework became PMM's organizing logic. General brand campaigns redirected toward product-specific interventions.
$250K brand tracker approved
External study greenlit with a revised instrument — competitor sets, attributes, and segments updated based on findings.
Follow-up study planned
New qualitative study on value propositions scoped as a direct follow-on. The audit identified what buyers value; the next study tests how to message it.
What worked
Measuring multiple populations — market awareness (survey), in-funnel comprehension (Gong), revenue correlation (won/lost). The headline finding that confusion doesn't predict revenue only emerged because I had both the comprehension data and the outcome data.
What I'd change
More upfront LLM coding calibration. The 77% agreement rate leaves a systematic error that I can bound (±5–10pp) but not correct for. A fourth calibration pass (2–3 days) would have tightened findings on the attributes that mattered most. I'd also build the Salesforce matching pipeline earlier to improve the 62.5% match rate (652 of 1,044 coded calls). And I'd loop product-level PMMs into more of the process, not just the CMO. I had executive buy-in from the start (ironically, the opposite of my usual challenge), but the PMMs who would use the playbook daily didn't see findings until late. Earlier involvement would have sharpened the recommendations and accelerated adoption.
What I learned
AI augmentation changed what I could do, but not what I needed to decide. Claude Code was a 10x throughput multiplier on structured tasks and a 0x capability addition on judgment tasks. Designing my research to account for that asymmetry was the key architectural decision.