Product Analytics Agent Framework
How to structure a complete analytical response to any product question — from ambiguous prompt to actionable recommendation.
The Problem with Single-Pass Answers
Product analytics interview questions are intentionally vague: "Instagram engagement dropped 10% — investigate." Common failure modes are jumping straight to a conclusion ("It's probably a bug") or listing metrics without structure. Interviewers are evaluating how you think, not just what you know.
The solution is to treat your answer as an orchestrated workflow: break the problem into specialist sub-tasks, execute each with discipline, then synthesize into a recommendation. This mirrors how senior data scientists actually work.
The Four-Agent Mental Model
Think of your analytical response as four specialist "agents" operating in sequence. In a real AI system (such as the MCP server's run_product_analytics_framework tool), the independent middle agents run in parallel between the orchestrator and synthesis steps — in your interview answer, you execute all four sequentially out loud.
| Agent | Question it answers | Output |
|---|---|---|
| 1. Orchestrator | What type of problem is this? What framework applies? | Framework selection (HEART vs AARRR), scope definition |
| 2. Metric Definer | What metrics matter? What do we protect? | Primary metric, counter-metrics, segments to analyze |
| 3. Experiment Designer | If we test a fix, how? What are the risks? | Randomization unit, duration, network-effect risks, guardrails |
| 4. Synthesis Agent | What do we recommend and how do we communicate it? | Investigation order, root cause hypotheses, decision criteria |
Framework Selection: HEART vs AARRR vs North Star
When to Use HEART
HEART (Happiness, Engagement, Adoption, Retention, Task Success) was developed by Google UX Research. It is best for evaluating existing feature quality and user experience improvements.
- Use when: diagnosing engagement drops, evaluating a redesign, improving retention
- Strength: covers both quantitative metrics (retention, engagement) and qualitative signals (happiness, task success)
- Meta context: core to product sense interviews — "How would you measure the success of Stories?"
| Dimension | What it measures | Example signals |
|---|---|---|
| Happiness | User satisfaction and sentiment | NPS, CSAT, app store rating |
| Engagement | Frequency and depth of use | DAU, sessions/day, actions/session |
| Adoption | New users reaching core value | Feature adoption rate, time-to-first-use |
| Retention | Users returning over time | D7/D30 retention, churn rate, stickiness (DAU/MAU) |
| Task Success | Users completing intended goals | Completion rate, error rate, funnel conversion |
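To make two of these signals concrete, here is a minimal pandas sketch computing stickiness (DAU/MAU) and D7 retention from a toy event log. The table shape and column names are invented for illustration, not a real schema.

```python
import pandas as pd

# Toy event log: one row per (user, day) of activity. All values invented.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "day": pd.to_datetime([
        "2024-06-01", "2024-06-02", "2024-06-08",
        "2024-06-01", "2024-06-15",
        "2024-06-01", "2024-06-02", "2024-06-03", "2024-06-08",
    ]),
})

# Stickiness: average daily actives divided by monthly actives (DAU/MAU).
stickiness = (events.groupby("day")["user_id"].nunique().mean()
              / events["user_id"].nunique())
print(f"stickiness (avg DAU / MAU): {stickiness:.2f}")

# D7 retention: share of users active exactly 7 days after their first day.
first_day = events.groupby("user_id")["day"].min().rename("first_day")
joined = events.join(first_day, on="user_id")
retained = joined[joined["day"] == joined["first_day"] + pd.Timedelta(days=7)]
d7 = retained["user_id"].nunique() / events["user_id"].nunique()
print(f"D7 retention: {d7:.2f}")
```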
When to Use AARRR
AARRR (Acquisition, Activation, Retention, Referral, Revenue) is the growth framework. Use it for evaluating growth levers, new market expansion, or monetization decisions.
- Use when: evaluating a new market launch, improving onboarding, increasing virality, monetization analysis
- Strength: maps the full user lifecycle from discovery to revenue
- Meta context: common in "How would you grow WhatsApp in India?" type questions
| Stage | Key Question | Core Metrics |
|---|---|---|
| Acquisition | How do users find us? | Installs, signups, CPA by channel |
| Activation | Do users experience core value? | Onboarding completion rate, time-to-aha-moment |
| Retention | Do users come back? | D7/D30 retention, churn rate, resurrection rate |
| Referral | Do users bring others? | Viral coefficient (K-factor), invite acceptance rate |
| Revenue | Do we monetize effectively? | ARPU, LTV, conversion to paid, ARPDAU |
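As a sketch of how these stages compose, the snippet below walks a hypothetical funnel snapshot stage by stage and computes the viral coefficient. Every number in it is invented for illustration.

```python
# Hypothetical AARRR funnel snapshot; all counts invented.
funnel = {
    "acquisition": 100_000,  # installs
    "activation":   42_000,  # completed onboarding
    "retention":    18_000,  # active at D7
    "referral":      5_400,  # sent at least one invite
    "revenue":       2_100,  # converted to paid
}

# Stage-over-stage conversion highlights where the funnel leaks.
stages = list(funnel)
for prev, cur in zip(stages, stages[1:]):
    print(f"{prev:>11} -> {cur:<10} {funnel[cur] / funnel[prev]:6.1%}")

# Viral coefficient: K = invites per user x invite acceptance rate.
invites_per_user = 2.5   # assumed
acceptance_rate = 0.12   # assumed
k_factor = invites_per_user * acceptance_rate
print(f"K-factor: {k_factor:.2f}  (K > 1 means self-sustaining growth)")
```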
North Star Metric
Every product has one metric that best captures its core value delivery. For complex interviews, anchor your entire answer to the relevant North Star — then explain how your proposed actions affect it.
| Product | North Star | Why |
|---|---|---|
| Facebook Feed | DAU/MAU (stickiness) | Captures daily habit formation at scale |
| Instagram Reels | Watch-through rate × share rate | Both consumption quality and viral distribution |
| WhatsApp | Messages sent per DAU | Depth of engagement, not just presence |
| Marketplace | Successful transactions per MAU | End-to-end value delivery (listing → sale) |
Worked Example: End-to-End Answer
Prompt: "Instagram Stories engagement is down 10% week-over-week. Investigate."
Agent 1 — Orchestrator: Frame the problem
"Before diving in, I want to clarify a few things: Is this a rate drop (engagement per story view) or an absolute drop? Is it global or region/platform specific? And what's the measurement window? I'll assume engagement rate = reactions + replies per story impression, global, last 7 days vs. prior 7 days."
"This is a diagnostics problem. I'll use the HEART framework with a focus on the Engagement and Task Success dimensions."
Agent 2 — Metric Definer: Define what to measure
- Primary metric: Story engagement rate = (reactions + replies) / story impressions
- Counter-metrics (guardrails): Story creation rate (did creators stop posting?), story view rate (did reach change?), spam rate (quality signal)
- Segments to break down: Platform (iOS/Android/Web), country, user cohort (new vs. established), content type (text vs. photo vs. video)
- Leading indicators to check: Story creation rate (if creators dropped, impressions follow), notification delivery rate (push notifications drive story views)
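A minimal sketch of how these definitions turn into a week-over-week read-out. The aggregates are invented; the engagement_rate function is just the primary-metric formula above.

```python
# Invented weekly aggregates; in practice these come from your metrics store.
this_week = {"reactions": 9.0e6, "replies": 2.1e6,
             "impressions": 140e6, "stories_created": 30e6}
last_week = {"reactions": 10.2e6, "replies": 2.4e6,
             "impressions": 142e6, "stories_created": 31e6}

def engagement_rate(w):
    # Primary metric: (reactions + replies) / story impressions.
    return (w["reactions"] + w["replies"]) / w["impressions"]

metrics = {
    "engagement rate (primary)": engagement_rate,
    "stories created (guardrail)": lambda w: w["stories_created"],
}
for name, fn in metrics.items():
    now, prior = fn(this_week), fn(last_week)
    print(f"{name}: {(now - prior) / prior:+.1%} week-over-week")
```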
Agent 3 — Experiment Designer: Plan a test if we identify a fix
"If our investigation identifies a fixable cause — say, the reaction tray is harder to reach in a new UI — we'd A/B test the fix. Key considerations:"
- Randomization unit: User-level (not story-level), since story consumption is tied to individual user behavior patterns
- Network effects: Stories are social — if I see fewer stories (treatment), my network activity also changes. Use cluster randomization by social graph partition to minimize spillover
- Duration: Minimum 2 weeks to capture the full weekly usage cycle and control for novelty effects
- Decision criteria: Ship if engagement rate improves ≥ 2% (MDE) with p < 0.05, no regression in story creation rate or spam rate
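For context on feasibility, here is the standard two-proportion sample-size calculation behind those decision criteria, assuming a baseline engagement rate of 8% and treating the 2% MDE as a relative lift (both assumptions, not real Instagram numbers).

```python
from scipy.stats import norm

# Assumed baseline 8% engagement rate; 2% relative MDE; standard alpha/power.
p1 = 0.08
p2 = p1 * 1.02          # 2% relative lift

alpha, power = 0.05, 0.80
z_alpha = norm.ppf(1 - alpha / 2)
z_beta = norm.ppf(power)

# Standard two-proportion sample size per group (unpooled variance).
n = ((z_alpha + z_beta) ** 2
     * (p1 * (1 - p1) + p2 * (1 - p2))
     / (p2 - p1) ** 2)
print(f"~{n:,.0f} users per group")  # about 455,000 with these assumptions
```

At Instagram scale that sample is available within days, which is why the two-week minimum above is driven by the weekly usage cycle and novelty effects rather than statistical power.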
Agent 4 — Synthesis: Order of investigation and recommendation
- Check data pipeline first: Is the drop in the data or the product? Verify event ingestion lag and logging completeness
- Pinpoint timing: Plot daily engagement rate over last 30 days. If the drop is a step-change on a specific date → look for releases or infra changes
- Segment isolation: Run segment breakdown (platform × country). If 100% of the drop comes from Android users in Europe → likely a release bug (quantified in the sketch after this list)
- Funnel diagnosis: Did story views drop (reach problem) or did reactions per view drop (interaction problem)? These have different fixes
- Root cause hypotheses (ordered by likelihood):
  - Product release changed the reaction UI → test on that release date
  - Algorithm change reduced story distribution → check organic reach rate
  - Seasonal behavior change → compare to same week prior year
  - Competitor launch pulling engagement elsewhere → cross-app usage data
- Communication: "We observed a 10% drop in story engagement rate starting [date], concentrated in [segment]. Our primary hypothesis is [cause] because [evidence]. We recommend [action] and will monitor story creation rate as a guardrail."
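To show the segment-isolation step in practice, here is a minimal sketch that decomposes an overall rate drop into exact per-segment contributions. All aggregates are invented; in this toy example 100% of the drop falls on Android-EU, mirroring the scenario above.

```python
import pandas as pd

# Invented per-segment weekly aggregates (platform × region).
df = pd.DataFrame({
    "segment":    ["iOS-EU", "iOS-US", "Android-EU", "Android-US"],
    "eng_prior":  [2.0e6, 3.0e6, 3.6e6, 4.0e6],
    "impr_prior": [25e6, 35e6, 40e6, 42e6],
    "eng_now":    [2.0e6, 3.0e6, 2.4e6, 4.0e6],
    "impr_now":   [25e6, 35e6, 40e6, 42e6],
})

prior_rate = df["eng_prior"].sum() / df["impr_prior"].sum()
now_rate = df["eng_now"].sum() / df["impr_now"].sum()
delta = now_rate - prior_rate

# Overall rate = sum(eng_s) / sum(impressions), so the change decomposes
# exactly across segments through the numerator.
df["contribution"] = (df["eng_now"] / df["impr_now"].sum()
                      - df["eng_prior"] / df["impr_prior"].sum())
df["share_of_drop"] = df["contribution"] / delta

print(f"overall: {prior_rate:.4f} -> {now_rate:.4f} ({delta / prior_rate:+.1%})")
print(df[["segment", "contribution", "share_of_drop"]])
```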
AI Agent Orchestration in the MCP Server
How This Maps to the run_product_analytics_framework Tool
The MCP server implements this four-agent pattern as a single orchestrated tool. When you call run_product_analytics_framework, it fans out to three specialist sub-components (metric definition, experiment design, diagnostic SQL) and then synthesizes the outputs.
Example MCP tool call (via Claude or VS Code):

    {
      "name": "run_product_analytics_framework",
      "arguments": {
        "question": "Instagram Stories engagement is down 10% week-over-week",
        "product_area": "engagement",
        "framework": "HEART",
        "include_sql": true
      }
    }
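If you are calling the tool programmatically, a minimal client-side sketch using the official MCP Python SDK might look like the following. The server launch command (python -m product_analytics_mcp) is hypothetical; substitute however your server is actually started.

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Hypothetical launch command for the analytics MCP server.
    params = StdioServerParameters(command="python",
                                   args=["-m", "product_analytics_mcp"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "run_product_analytics_framework",
                arguments={
                    "question": "Instagram Stories engagement is down 10% week-over-week",
                    "product_area": "engagement",
                    "framework": "HEART",
                    "include_sql": True,
                },
            )
            for block in result.content:  # metric framework, experiment, SQL, synthesis
                print(block)

asyncio.run(main())
```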
The tool returns:
- Metric framework — HEART dimensions relevant to engagement, with primary signals and guardrails
- Experiment design — Randomization unit, network-effect risks, decision criteria
- Diagnostic SQL templates — Time-series, segment breakdown, funnel, cohort comparison queries
- Synthesis — Ordered investigation steps and communication template
Additional Specialist Tools
| Tool | When to Use |
|---|---|
| define_product_metrics | Only need metric definition (e.g., preparing for a metrics-focused interview round) |
| design_product_experiment | Only need experiment design (e.g., evaluating a specific A/B test plan) |
| generate_diagnostic_sql | Only need SQL templates (e.g., practicing diagnostic queries) |
| design_ab_experiment | Statistical experiment design with sample size calculation (from A/B testing tools) |
Why Orchestrator Pattern (Not Pipeline)
Metric definition, experiment design, and SQL generation are independent tasks — they don't depend on each other's outputs. The orchestrator pattern runs them in parallel and passes all results to the synthesis step, which depends on all three. This is faster and more modular than a linear pipeline where each step must wait for the previous one. A runnable sketch of this shape follows the diagram below.
    User Question
          │
          ▼
    Orchestrator Agent            ← classifies problem, selects framework
          │ fans out (parallel)
     ┌────┴──────────┬──────────────┐
     ▼               ▼              ▼
     Metric          Experiment     SQL Query
     Definer         Designer       Generator
     └────┬──────────┴──────────────┘
          │ aggregates
          ▼
    Synthesis Agent               ← ordered investigation + communication plan
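Here is a minimal asyncio sketch of the same fan-out/aggregate shape. The specialist functions and their return values are stubs invented for illustration, not the MCP server's actual implementation.

```python
import asyncio

# Stub specialists; each stands in for a real sub-agent call.
async def define_metrics(q: str) -> dict:
    return {"primary": "engagement rate", "guardrails": ["creation rate"]}

async def design_experiment(q: str) -> dict:
    return {"unit": "user", "duration_weeks": 2}

async def generate_sql(q: str) -> dict:
    return {"templates": ["time_series", "segment_breakdown"]}

def synthesize(metrics: dict, experiment: dict, sql: dict) -> str:
    return (f"Investigate {metrics['primary']} via {sql['templates'][0]}; "
            f"test fixes at the {experiment['unit']} level over "
            f"{experiment['duration_weeks']} weeks.")

async def run_framework(question: str) -> str:
    # Orchestrator pattern: the three specialists are independent, so they
    # fan out in parallel; only synthesis waits on all of them.
    metrics, experiment, sql = await asyncio.gather(
        define_metrics(question),
        design_experiment(question),
        generate_sql(question),
    )
    return synthesize(metrics, experiment, sql)

print(asyncio.run(run_framework("Stories engagement down 10% WoW")))
```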
Framework Quick Reference
Which Framework for Which Question?
| Question Type | Framework | Key Dimensions to Emphasize |
|---|---|---|
| "Metric X dropped — investigate" | HEART | Engagement, Task Success (funnel), then Happiness |
| "Measure success of feature Y" | HEART | Adoption, Engagement, Retention (post-feature) |
| "Grow product Z in new market" | AARRR | Acquisition, Activation, Referral |
| "Should we add monetization?" | AARRR + guardrails | Revenue vs. Retention trade-off |
| "Define the North Star for X" | North Star | Value delivery, frequency, breadth of impact |
| "Design an A/B test for Y" | Experiment Design | Randomization, network effects, duration, MDE |
Network Effect Risks — Cheat Sheet
| Risk | What It Is | Mitigation |
|---|---|---|
| Interference / Spillover | Treatment users interact with control via social graph | Cluster randomization by social graph partition or geography |
| Novelty Effect | Engagement spike from excitement, not real lift | Run 2–4 weeks; analyze engagement by days-in-experiment |
| Primacy Effect | Users resist change initially, then adapt | Segment by days-in-experiment; look for behavior stabilization |
| Sample Ratio Mismatch | Groups aren't the expected size → logging bug or selection bias | Chi-square test on group sizes within 24h of launch |
| Multiple Testing | Many metrics → inflated false positive rate | Pre-register primary metric; Bonferroni correction for secondary metrics |
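For the Sample Ratio Mismatch row above, the check is a one-liner with scipy; the observed assignment counts here are invented.

```python
from scipy.stats import chisquare

# Observed assignment counts vs. a planned 50/50 split (counts invented).
observed = [50_912, 49_088]
expected = [sum(observed) / 2] * 2

stat, p = chisquare(observed, f_exp=expected)
# A very small p-value (a common threshold is p < 0.001) signals SRM:
# investigate logging or assignment bugs before trusting any results.
print(f"chi2 = {stat:.1f}, p = {p:.2e}")
```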