Product Analytics Agent Framework

How to build a complete, structured analytical response to any product question — from ambiguous prompt to actionable recommendation.

The Problem with Single-Pass Answers

Product analytics interview questions are intentionally vague: "Instagram engagement dropped 10% — investigate." Common failure modes are jumping to a conclusion ("It's probably a bug") or listing metrics without structure. Interviewers are evaluating how you think, not just what you know.

The solution is to treat your answer as an orchestrated workflow: break the problem into specialist sub-tasks, execute each with discipline, then synthesize into a recommendation. This mirrors how senior data scientists actually work.

The Four-Agent Mental Model

Think of your analytical response as four specialist "agents" operating in sequence. In a real AI system (or the MCP server's run_product_analytics_framework tool), the specialist steps in the middle run in parallel — in your interview answer, you execute all four sequentially, out loud.

| Agent | Question it answers | Output |
| --- | --- | --- |
| 1. Orchestrator | What type of problem is this? What framework applies? | Framework selection (HEART vs. AARRR), scope definition |
| 2. Metric Definer | What metrics matter? What do we protect? | Primary metric, counter-metrics, segments to analyze |
| 3. Experiment Designer | If we test a fix, how? What are the risks? | Randomization unit, duration, network-effect risks, guardrails |
| 4. Synthesis Agent | What do we recommend and how do we communicate it? | Investigation order, root-cause hypotheses, decision criteria |

Framework Selection: HEART vs AARRR vs North Star

When to Use HEART

HEART (Happiness, Engagement, Adoption, Retention, Task Success) was developed by Google UX Research. It is best for evaluating existing feature quality and user experience improvements.

  • Use when: diagnosing engagement drops, evaluating a redesign, improving retention
  • Strength: covers both quantitative metrics (retention, engagement) and qualitative signals (happiness, task success)
  • Meta context: core to product sense interviews — "How would you measure the success of Stories?"
| Dimension | What it measures | Example signals |
| --- | --- | --- |
| Happiness | User satisfaction and sentiment | NPS, CSAT, app store rating |
| Engagement | Frequency and depth of use | DAU, sessions/day, actions/session |
| Adoption | New users reaching core value | Feature adoption rate, time-to-first-use |
| Retention | Users returning over time | D7/D30 retention, churn rate, stickiness (DAU/MAU) (sketched below) |
| Task Success | Users completing intended goals | Completion rate, error rate, funnel conversion |
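
Two of these signals are easy to hand-wave in an interview, so it helps to be able to compute them. A minimal sketch of stickiness (DAU/MAU) and strict D7 retention, assuming a hypothetical in-memory activity log (a real pipeline would read event tables):

# Python sketch: stickiness and D7 retention from a toy activity log (hypothetical data).
from datetime import date, timedelta

# user -> set of active dates (assumed sample, not real numbers)
activity = {
    "u1": {date(2024, 1, 1), date(2024, 1, 8)},
    "u2": {date(2024, 1, 1)},
    "u3": {date(2024, 1, 2), date(2024, 1, 9)},
}

def dau(day):
    return sum(day in days for days in activity.values())

def mau(day):
    window = {day - timedelta(days=d) for d in range(30)}  # trailing 30-day window
    return sum(bool(days & window) for days in activity.values())

def d7_retained(cohort_day):
    # strict D7: active exactly 7 days after first activity
    cohort = [u for u, days in activity.items() if min(days) == cohort_day]
    back = [u for u in cohort if cohort_day + timedelta(days=7) in activity[u]]
    return len(back) / len(cohort) if cohort else 0.0

day = date(2024, 1, 8)
print(f"stickiness {dau(day) / mau(day):.2f}")               # DAU/MAU on Jan 8
print(f"D7 retention {d7_retained(date(2024, 1, 1)):.0%}")   # Jan 1 cohort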

When to Use AARRR

AARRR (Acquisition, Activation, Retention, Referral, Revenue) is the growth framework. Use it for evaluating growth levers, new market expansion, or monetization decisions.

  • Use when: evaluating a new market launch, improving onboarding, increasing virality, monetization analysis
  • Strength: maps the full user lifecycle from discovery to revenue
  • Meta context: common in "How would you grow WhatsApp in India?" type questions
| Stage | Key Question | Core Metrics |
| --- | --- | --- |
| Acquisition | How do users find us? | Installs, signups, CPA by channel |
| Activation | Do users experience core value? | Onboarding completion rate, time-to-aha-moment |
| Retention | Do users come back? | D7/D30 retention, churn rate, resurrection rate |
| Referral | Do users bring others? | Viral coefficient (K-factor, sketched below), invite acceptance rate |
| Revenue | Do we monetize effectively? | ARPU, LTV, conversion to paid, ARPDAU |
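
The one formula here worth knowing cold is the viral coefficient: K = invites sent per user × invite acceptance rate. A quick arithmetic sketch (both inputs are hypothetical numbers, not benchmarks):

# Python sketch: K-factor arithmetic with hypothetical inputs.
invites_per_user = 2.5     # avg invites each new user sends (assumed)
acceptance_rate = 0.12     # fraction of invites that convert (assumed)
k = invites_per_user * acceptance_rate          # K = 0.30

# For K < 1, a seed cohort's total downstream size follows a geometric series:
seed = 1000
total = seed / (1 - k)     # 1000 * (1 + K + K^2 + ...) ≈ 1429 users
print(f"K = {k:.2f}, {seed} seeds grow to ≈ {total:.0f} users")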

North Star Metric

Every product has one metric that best captures its core value delivery. For complex interviews, anchor your entire answer to the relevant North Star — then explain how your proposed actions affect it.

| Product | North Star | Why |
| --- | --- | --- |
| Facebook Feed | DAU/MAU (stickiness) | Captures daily habit formation at scale |
| Instagram Reels | Watch-through rate × share rate | Both consumption quality and viral distribution |
| WhatsApp | Messages sent per DAU | Depth of engagement, not just presence |
| Marketplace | Successful transactions per MAU | End-to-end value delivery (listing → sale) |

Worked Example: End-to-End Answer

Prompt: "Instagram Stories engagement is down 10% week-over-week. Investigate."

Agent 1 — Orchestrator: Frame the problem

"Before diving in, I want to clarify a few things: Is this a rate drop (engagement per story view) or an absolute drop? Is it global or region/platform specific? And what's the measurement window? I'll assume engagement rate = reactions + replies per story impression, global, last 7 days vs. prior 7 days."

"This is a diagnostics problem. I'll use the HEART framework with a focus on the Engagement and Task Success dimensions."

Agent 2 — Metric Definer: Define what to measure

  • Primary metric: Story engagement rate = (reactions + replies) / story impressions (computed in the sketch after this list)
  • Counter-metrics (guardrails): Story creation rate (did creators stop posting?), story view rate (did reach change?), spam rate (quality signal)
  • Segments to break down: Platform (iOS/Android/Web), country, user cohort (new vs. established), content type (text vs. photo vs. video)
  • Leading indicators to check: Story creation rate (if creators dropped, impressions follow), notification delivery rate (push notifications drive story views)
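
As a concreteness check, here is a minimal pandas sketch of the primary metric broken down by platform, using hypothetical weekly totals:

# Python sketch: engagement rate by segment from hypothetical weekly totals.
import pandas as pd

weekly = pd.DataFrame({
    "platform":    ["ios", "android", "web"],
    "impressions": [1_400_000, 1_000_000, 100_000],
    "reactions":   [84_000, 48_000, 3_000],
    "replies":     [21_000, 12_000, 700],
})

# primary metric per segment: (reactions + replies) / impressions
weekly["engagement_rate"] = (weekly["reactions"] + weekly["replies"]) / weekly["impressions"]

# overall rate: sum the components first, then divide (don't average per-segment rates)
overall = (weekly["reactions"].sum() + weekly["replies"].sum()) / weekly["impressions"].sum()
print(weekly[["platform", "engagement_rate"]])
print(f"overall: {overall:.4f}")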

Agent 3 — Experiment Designer: Plan a test if we identify a fix

"If our investigation identifies a fixable cause — say, the reaction tray is harder to reach in a new UI — we'd A/B test the fix. Key considerations:"

  • Randomization unit: User-level (not story-level), since story consumption is tied to individual user behavior patterns
  • Network effects: Stories are social — a change in one user's behavior (treatment) alters what their friends (possibly in control) see and do. Use cluster randomization by social-graph partition to minimize spillover
  • Duration: Minimum 2 weeks to capture the full weekly usage cycle and control for novelty effects
  • Decision criteria: Ship if engagement rate improves ≥ 2% (MDE) with p < 0.05, no regression in story creation rate or spam rate (sample-size arithmetic sketched after this list)
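
A quick way to sanity-check that decision criterion is a power calculation. A sketch using statsmodels, where the baseline rate and traffic figure are assumptions to be replaced with real numbers:

# Python sketch: sample size per arm for a 2% relative MDE on a rate metric.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.05                      # assumed baseline engagement rate
lift = baseline * 0.02               # 2% relative MDE
effect = proportion_effectsize(baseline + lift, baseline)
n_per_arm = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.8)

daily_eligible_per_arm = 2_000_000   # assumed eligible traffic; tune to the product
days_for_power = n_per_arm / daily_eligible_per_arm
# hold the 2-week floor from the list above even if power is reached sooner
print(f"{n_per_arm:,.0f} users/arm ≈ {max(14, days_for_power):.0f} days")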

Agent 4 — Synthesis: Order of investigation and recommendation

  1. Check data pipeline first: Is the drop in the data or the product? Verify event ingestion lag and logging completeness
  2. Pinpoint timing: Plot daily engagement rate over last 30 days. If the drop is a step-change on a specific date → look for releases or infra changes
  3. Segment isolation: Run segment breakdown (platform × country). If 100% of the drop comes from Android users in Europe → likely a release bug (see the sketch after this list)
  4. Funnel diagnosis: Did story views drop (reach problem) or did reactions per view drop (interaction problem)? These have different fixes
  5. Root cause hypotheses (ordered by likelihood):
    • Product release changed the reaction UI → test on that release date
    • Algorithm change reduced story distribution → check organic reach rate
    • Seasonal behavior change → compare to same week prior year
    • Competitor launch pulling engagement elsewhere → cross-app usage data
  6. Communication: "We observed a 10% drop in story engagement rate starting [date], concentrated in [segment]. Our primary hypothesis is [cause] because [evidence]. We recommend [action] and will monitor story creation rate as a guardrail."
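
Step 3 is mechanical enough to script. A sketch of the segment-isolation arithmetic with hypothetical weekly totals (impressions held flat week-over-week so the entire rate drop comes from engagements):

# Python sketch: which segment accounts for the engagement-rate drop?
segments = {
    # segment: impressions (flat WoW), engagements prev week, engagements this week
    "android-EU": {"impr": 1_000_000, "prev": 60_000, "curr": 48_000},
    "android-US": {"impr": 1_200_000, "prev": 72_000, "curr": 71_500},
    "ios-EU":     {"impr":   900_000, "prev": 54_000, "curr": 53_800},
    "ios-US":     {"impr": 1_400_000, "prev": 84_000, "curr": 83_900},
}

total_impr = sum(s["impr"] for s in segments.values())
rate_drop = sum(s["prev"] - s["curr"] for s in segments.values()) / total_impr

for name, s in segments.items():
    # each segment's share of the overall rate drop (impression-weighted)
    share = ((s["prev"] - s["curr"]) / total_impr) / rate_drop
    print(f"{name}: {share:.0%} of the drop")

If one cell carries nearly all of the drop, as android-EU does in these made-up numbers, the release-bug hypothesis moves to the top of the list.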

AI Agent Orchestration in the MCP Server

How This Maps to the run_product_analytics_framework Tool

The MCP server implements this four-agent pattern as a single orchestrated tool. When you call run_product_analytics_framework, it fans out to three specialist sub-components (metric definition, experiment design, diagnostic SQL) and then synthesizes the outputs.

Example MCP tool call (via Claude or VS Code):
{
  "name": "run_product_analytics_framework",
  "arguments": {
    "question": "Instagram Stories engagement is down 10% week-over-week",
    "product_area": "engagement",
    "framework": "HEART",
    "include_sql": true
  }
}

The tool returns:

  • Metric framework — HEART dimensions relevant to engagement, with primary signals and guardrails
  • Experiment design — Randomization unit, network-effect risks, decision criteria
  • Diagnostic SQL templates — Time-series, segment breakdown, funnel, cohort comparison queries
  • Synthesis — Ordered investigation steps and communication template

Additional Specialist Tools

| Tool | When to Use |
| --- | --- |
| define_product_metrics | Only need metric definition (e.g., preparing for a metrics-focused interview round) |
| design_product_experiment | Only need experiment design (e.g., evaluating a specific A/B test plan) |
| generate_diagnostic_sql | Only need SQL templates (e.g., practicing diagnostic queries) |
| design_ab_experiment | Statistical experiment design with sample-size calculation (from the A/B testing tools) |

Why Orchestrator Pattern (Not Pipeline)

Metric definition, experiment design, and SQL generation are independent tasks — they don't depend on each other's outputs. The orchestrator pattern runs them in parallel and passes all results to the synthesis step, which depends on all three. This is faster and more modular than a linear pipeline where each step must wait for the previous one (a minimal sketch follows the diagram below).

User Question
      │
      ▼
 Orchestrator Agent   ← classifies problem, selects framework
      │ fans out (parallel)
 ┌────┴────────────────────────┐
 ▼              ▼              ▼
Metric       Experiment    SQL Query
Definer      Designer      Generator
 └────┬────────────────────────┘
      │ aggregates
      ▼
 Synthesis Agent   ← ordered investigation + communication plan
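
In code, this is a plain fan-out/gather. A minimal asyncio sketch; the specialist functions are stubs with hypothetical signatures, not the MCP server's actual implementation:

# Python sketch: orchestrator fan-out with asyncio (stub agents, hypothetical outputs).
import asyncio

async def define_metrics(question):
    return {"primary": "engagement_rate", "guardrails": ["creation_rate", "spam_rate"]}

async def design_experiment(question):
    return {"unit": "user", "duration_days": 14}

async def generate_sql(question):
    return ["-- time-series query", "-- segment breakdown query"]

async def run_framework(question):
    # fan out: the three specialists are independent, so run them concurrently
    metrics, experiment, sql = await asyncio.gather(
        define_metrics(question), design_experiment(question), generate_sql(question)
    )
    # synthesis is the only step with an ordering constraint: it needs all three outputs
    return {"metrics": metrics, "experiment": experiment, "sql": sql,
            "synthesis": "ordered investigation + communication plan"}

print(asyncio.run(run_framework("Stories engagement down 10% WoW")))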

Framework Quick Reference

Which Framework for Which Question?

| Question Type | Framework | Key Dimensions to Emphasize |
| --- | --- | --- |
| "Metric X dropped — investigate" | HEART | Engagement, Task Success (funnel), then Happiness |
| "Measure success of feature Y" | HEART | Adoption, Engagement, Retention (post-feature) |
| "Grow product Z in new market" | AARRR | Acquisition, Activation, Referral |
| "Should we add monetization?" | AARRR + guardrails | Revenue vs. Retention trade-off |
| "Define the North Star for X" | North Star | Value delivery, frequency, breadth of impact |
| "Design an A/B test for Y" | Experiment Design | Randomization, network effects, duration, MDE |

Network Effect Risks — Cheat Sheet

| Risk | What It Is | Mitigation |
| --- | --- | --- |
| Interference / Spillover | Treatment users interact with control users via the social graph | Cluster randomization by social-graph partition or geography |
| Novelty Effect | Engagement spike from excitement, not real lift | Run 2–4 weeks; analyze engagement by days-in-experiment |
| Primacy Effect | Users resist change initially, then adapt | Segment by days-in-experiment; look for behavior stabilization |
| Sample Ratio Mismatch | Groups aren't the expected size → logging bug or selection bias | Chi-square test on group sizes within 24h of launch (sketched below) |
| Multiple Testing | Many metrics → inflated false-positive rate | Pre-register the primary metric; Bonferroni correction for secondary metrics |
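
The SRM check is the cheapest of these mitigations to automate. A sketch using scipy's chi-square goodness-of-fit test, with hypothetical counts; the strict alarm threshold reflects the very large samples these checks run on:

# Python sketch: sample ratio mismatch check (hypothetical bucket counts).
from scipy.stats import chisquare

observed = [50_391, 49_609]   # users actually bucketed into control / treatment
expected = [50_000, 50_000]   # the 50/50 split we configured

stat, p = chisquare(f_obs=observed, f_exp=expected)
# alarm on a strict threshold; at these sample sizes even tiny biases are detectable
print(f"SRM alarm: {p < 0.001} (p = {p:.4f})")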