Analytical Execution Interview (Data Analysis/Case Study)

Frameworks and practice for real-world case studies and data analysis problems.

What to Expect

This interview assesses your ability to conduct quantitative analysis, draw meaningful conclusions from data, and communicate your findings effectively. You'll typically be presented with a business problem or a dataset and asked to analyze it, focusing on hypothesis generation, quantitative analysis, goal setting aligned with business objectives, and adaptability to dynamic situations.

Key Areas

  • Crafting Hypotheses and Statistical Knowledge: Formulating strong, testable hypotheses, particularly for new features or data-driven decisions. Understanding null (H0) and alternative (H1) hypotheses, framing hypotheses with quantifiable metrics, and prioritizing based on business impact.
  • Proficiency in Quantitative Analysis: Quantifying trade-offs, understanding the impact of features on relevant metrics, and using data to support your arguments.
  • Goal Setting Aligned with Business Objectives: Defining clear goals and success metrics that directly contribute to broader business objectives. Consider success metrics, counter metrics, and overall ecosystem stability metrics.
  • Adapting Analytical Approaches in Dynamic Situations: Demonstrating flexibility when facing data challenges, changing requirements, or confounding factors.

How to Prepare

  • Master Foundational Statistical Concepts: Focus on defining, applying, and explaining the limitations of statistical concepts.
  • Practice with Case Studies: Focus on hypothesis generation, quantitative analysis, and structured problem-solving.
  • Develop a Structured Approach to Trade-offs: Use a structured framework (see table below).
  • Clear and Concise Communication: Explain the "why" behind decisions, provide context, and use visualizations.
  • Share Behavioral Stories Demonstrating Adaptability: Use the STAR method.
  • Familiarize Yourself with Meta's Context: Research Meta's products, user base, and business objectives.
  • A/B Testing Deep Dive: Understand sample size calculation, statistical significance, power, metrics to track, and interpreting results.
  • Asking Clarifying Questions: Practice asking clarifying questions to fully understand the problem.

Trade-off Evaluation Framework

| Option | Metric A | Impact on Metric A | Metric B | Impact on Metric B | Overall Assessment |
| --- | --- | --- | --- | --- | --- |
| A | CTR | +5% | Load Time | +20ms | Acceptable Trade-off |
| B | CTR | +10% | Load Time | +100ms | Unacceptable Trade-off |

Example Scenario (Brief)

Problem: A social media platform has seen a recent decline in user engagement. How would you investigate the cause?

Possible Approach:

  1. Clarifying Questions:
    • What is the timeframe of the decline?
    • Is the decline observed across all regions/markets?
    • Is the decline observed across all user segments or only specific ones?
    • Have there been any recent product changes or marketing campaigns?
    • Have DAU and MAU metrics also been declining?
  2. Define Key Metrics and Business Objectives: Define "user engagement" with specific, measurable metrics (DAU, MAU, session duration, content creation rate, etc.).
  3. Craft Hypotheses: Formulate testable hypotheses about potential causes.
  4. Analyze Trends and Segment Users: Analyze trends, segmenting by demographics, behavior, and platform.
  5. Quantitative Analysis and Trade-off Evaluation: Use A/B testing, cohort analysis, or regression.
  6. Adapt to Dynamic Situations: Discuss how you'd adapt if data quality issues arose.
  7. Communicate Findings and Recommendations: Present findings clearly with visualizations.

📊 Full Worked Case Study: "Instagram Reels Engagement Drop"


🎯 The Prompt

Interviewer: "Reels watch time per user dropped 15% week-over-week. Walk me through how you'd investigate."


Step 1: Clarifying Questions (2-3 minutes)

Always ask before diving in. This shows structured thinking and prevents wasted effort.

| Question | Why You're Asking | Hypothetical Answer |
| --- | --- | --- |
| "Is this global or specific regions?" | Localize the problem | "It's global, but worse in NA and EU" |
| "New users, existing, or both?" | Acquisition vs retention issue | "Existing users are driving the drop" |
| "Any product launches or bugs last week?" | Rule out obvious causes | "We launched a new recommendation algo" |
| "Is this just watch time, or also opens/sessions?" | Understand where in funnel the problem is | "Opens are flat; watch time per session is down" |
| "How are we measuring watch time?" | Rule out measurement issues | "Total seconds watched / DAU" |

What we learned: The issue is global, affecting existing users, watch time per session is down (not opens), and there was a new recommendation algorithm launch.


Step 2: Hypothesis Generation (2 minutes)

Generate multiple hypotheses, then prioritize.

| # | Hypothesis | How We'd Test | Priority |
| --- | --- | --- | --- |
| 1 | New algo is recommending less engaging content | Compare avg video completion rate pre/post launch | 🔴 High |
| 2 | New algo shows less variety, causing fatigue | Measure content diversity score per session | 🔴 High |
| 3 | Bug in video player (buffering, crashes) | Check error rates, buffer time metrics | 🟡 Medium |
| 4 | Seasonal effect (holiday week, exams) | Compare to same period last year | 🟡 Medium |
| 5 | Competition (TikTok launched new feature) | External research; less actionable | 🟢 Low |

Prioritization logic: Start with the algo change since it's the most likely cause given timing. Rule out bugs quickly. Seasonal and competition are backup hypotheses.


Step 3: Data Analysis (5-7 minutes)

Analysis 1: Algo Impact on Engagement
-- Compare key metrics pre/post algo launch
WITH weekly_metrics AS (
  SELECT 
    CASE 
      WHEN event_date < '2025-12-15' THEN 'pre_launch'
      ELSE 'post_launch'
    END AS period,
    user_id,
    SUM(watch_time_seconds) AS total_watch_time,
    COUNT(DISTINCT video_id) AS videos_watched,
    AVG(completion_rate) AS avg_completion
  FROM reels_events
  WHERE event_date BETWEEN '2025-12-08' AND '2025-12-22'
  GROUP BY 1, 2
)
SELECT 
  period,
  COUNT(DISTINCT user_id) AS users,
  AVG(total_watch_time) AS avg_watch_time,
  AVG(videos_watched) AS avg_videos,
  AVG(avg_completion) AS avg_completion_rate
FROM weekly_metrics
GROUP BY period;

Finding:

| Period | Avg Watch Time | Avg Videos | Completion Rate |
| --- | --- | --- | --- |
| Pre-launch | 420 sec | 28 | 45% |
| Post-launch | 357 sec (-15%) | 24 (-14%) | 38% (-16%) |

Insight: Completion rate dropped significantly: users are swiping away faster. The algo is showing content users don't want to finish.

Analysis 2: Content Diversity
-- Measure content diversity: unique creators per session
WITH session_diversity AS (
  SELECT 
    CASE 
      WHEN event_date < '2025-12-15' THEN 'pre_launch'
      ELSE 'post_launch'
    END AS period,
    user_id,
    session_id,
    COUNT(DISTINCT creator_id) AS unique_creators,
    COUNT(DISTINCT content_category) AS unique_categories
  FROM reels_events
  WHERE event_date BETWEEN '2025-12-08' AND '2025-12-22'
  GROUP BY 1, 2, 3
)
SELECT 
  period,
  AVG(unique_creators) AS avg_creators_per_session,
  AVG(unique_categories) AS avg_categories_per_session
FROM session_diversity
GROUP BY period;

Finding:

| Period | Creators/Session | Categories/Session |
| --- | --- | --- |
| Pre-launch | 18 | 6 |
| Post-launch | 12 (-33%) | 4 (-33%) |

Insight: New algo is over-optimizing for a narrow set of creators/categories. Users get bored faster.

Analysis 3: Segment Deep-Dive
-- Which user segments are most affected?
SELECT 
  user_tenure_bucket,
  CASE 
    WHEN event_date < '2025-12-15' THEN 'pre_launch'
    ELSE 'post_launch'
  END AS period,
  AVG(watch_time_seconds) AS avg_watch_time
FROM reels_events
JOIN users USING (user_id)
WHERE event_date BETWEEN '2025-12-08' AND '2025-12-22'
GROUP BY 1, 2
ORDER BY user_tenure_bucket, period;

Finding:

| Tenure | Pre-Launch | Post-Launch | Change |
| --- | --- | --- | --- |
| 0-30 days | 380 sec | 350 sec | -8% |
| 31-180 days | 420 sec | 340 sec | -19% |
| 180+ days | 450 sec | 370 sec | -18% |

Insight: Veteran users are hit hardest. They have more refined tastes; the algo's lack of diversity bothers them most.


Step 4: Root Cause Identification (1 minute)

The Story:

The new recommendation algorithm is over-indexing on short-term engagement signals (e.g., "user watched 80% of this video, show more like it"). This creates a "filter bubble" that reduces content diversity by 33%. Users, especially veterans, experience content fatigue faster, leading to lower completion rates (-16%) and shorter sessions (-15%).

Step 5: Recommendations (2 minutes)

| Recommendation | Expected Impact | Effort | Priority |
| --- | --- | --- | --- |
| Immediate: Roll back to previous algo while we fix | Recover 15% watch time | Low | 🔴 Do now |
| Short-term: Add diversity constraint to algo (min 15 unique creators per session) | Improve completion rate | Medium | 🟡 This sprint |
| Medium-term: A/B test diversity parameter levels | Find optimal balance | Medium | 🟢 Next month |
| Long-term: Build content fatigue model | Proactive diversity adjustment | High | 🟢 Roadmap |

Trade-off Discussion

Interviewer: "But the new algo increased ad revenue per session by 5%. How do you think about that trade-off?"

Answer framework:

  1. Quantify both sides:
    • +5% ad revenue per session
    • -15% sessions per user (watch time proxy)
    • Net impact: 0.95 × 1.05 = 0.9975, roughly breakeven short-term
  2. Consider long-term effects:
    • Fatigued users may churn or reduce frequency
    • Short-term ad revenue ≠ long-term DAU health
  3. Recommendation: The 5% ad revenue gain is likely a short-term boost at the cost of user experience. I'd prioritize watch time recovery to protect long-term engagement, and explore ad optimization separately.
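The "quantify both sides" arithmetic above can be sketched in a few lines of Python. This is a minimal illustration of the multiplicative decomposition (total revenue = sessions × revenue per session), using watch time as a stand-in for sessions as the answer does; the function name is ours, not from any real codebase.

```python
def net_revenue_multiplier(sessions_mult, rev_per_session_mult):
    """Net multiplicative effect on total ad revenue, assuming
    total revenue = sessions x revenue per session."""
    return sessions_mult * rev_per_session_mult

# -15% sessions (watch time proxy), +5% ad revenue per session
net = net_revenue_multiplier(0.95, 1.05)
print(f"Net revenue multiplier: {net:.4f}")  # 0.9975 -> roughly breakeven
```

The point of writing it out is that the two percentages do not cancel: a -15% volume hit needs more than +15% per-unit gain to break even.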

Step 6: Summarize for Interviewer (30 seconds)

"To summarize: The 15% watch time drop is caused by the new recommendation algorithm reducing content diversity by 33%. Users, especially veterans, are completing fewer videos and ending sessions earlier. My recommendation is to roll back immediately, then add a diversity constraint before re-launching. I'd also run an A/B test to find the optimal diversity level that balances engagement and ad revenue."

🧠 More Practice Case Studies

Case 2: "Facebook Marketplace listing volume dropped 20%"

Clarifying Questions to Ask:

  • Is this new listings, or total active listings?
  • Is this sellers listing less, or fewer unique sellers?
  • Is this specific to certain categories?
  • Any changes to listing flow or policies?

Hypotheses to Generate:

  1. Listing flow has more friction (new verification step?)
  2. Sellers are migrating to other platforms
  3. Fewer buyers → sellers see less value
  4. Category-specific issue (e.g., auto listings down)
  5. Seasonality (post-holiday cleanup is done)

Analysis Approach:

-- Decompose: Is it fewer sellers or fewer listings per seller?
SELECT 
  date_trunc('week', listed_at) AS week,
  COUNT(DISTINCT seller_id) AS unique_sellers,
  COUNT(*) AS total_listings,
  COUNT(*) / COUNT(DISTINCT seller_id)::float AS listings_per_seller
FROM marketplace_listings
WHERE listed_at >= CURRENT_DATE - 60
GROUP BY 1
ORDER BY 1;

Key Insight (Hypothetical):

Unique sellers is flat; listings per seller dropped. A new "enhanced photos" requirement is adding friction: sellers with phones that don't support it are dropping off.

Recommendation:

Make enhanced photos optional with a "recommended" badge; A/B test impact on buyer engagement.

Case 3: "WhatsApp message send failure rate increased"

Clarifying Questions:

  • What's the failure rate now vs baseline?
  • Is this specific to certain message types (text, media, voice)?
  • Is this regional or global?
  • Any infra changes or deployments?

Hypotheses:

  1. Server-side issue (deployment, capacity)
  2. Client-side issue (app update, device compatibility)
  3. Network issues (carrier, ISP)
  4. Media-specific issue (image/video sending)

Analysis Approach:

-- Segment by message type and region
SELECT 
  message_type,
  region,
  COUNT(*) AS attempts,
  SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) AS failures,
  SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS failure_rate
FROM message_events
WHERE event_date >= CURRENT_DATE - 7
GROUP BY 1, 2
ORDER BY failure_rate DESC;

Key Insight (Hypothetical):

Image messages in India have 25% failure rate (vs 2% baseline). A new image compression library in the latest Android app update is crashing on older devices.

Recommendation:

Rollback the library for devices below Android 10; push a hotfix within 24 hours.

Case 4: "LinkedIn connection acceptance rate dropped"

Clarifying Questions:

  • Is this overall or for cold outreach specifically?
  • Is the drop in sends, accepts, or both?
  • Any changes to the connection request UI?

Hypotheses:

  1. New UI makes "Ignore" easier than "Accept"
  2. Increase in spam/recruiter requests
  3. People are more selective (behavior change)
  4. Measurement changed (definition of "accept")

Analysis Approach:

-- Funnel analysis: request → view → action
SELECT 
  week,
  COUNT(*) AS requests_sent,
  SUM(CASE WHEN viewed THEN 1 ELSE 0 END) AS viewed,
  SUM(CASE WHEN accepted THEN 1 ELSE 0 END) AS accepted,
  SUM(CASE WHEN ignored THEN 1 ELSE 0 END) AS ignored,
  AVG(CASE WHEN accepted THEN 1 ELSE 0 END) AS accept_rate
FROM connection_requests
GROUP BY week;

Key Insight (Hypothetical):

View rate is flat; accept rate dropped 10%, ignore rate up 10%. New mobile UI made "Ignore" the default swipe gesture.

Recommendation:

A/B test reversing the swipe direction; if validated, ship the fix.

📈 Descriptive Analytics Deep Dive

Descriptive analytics is the foundation of every analytical execution interview. Before forming hypotheses or running experiments, you must describe what the data looks like. Interviewers test whether you reach for the right summary statistics and whether you understand when each one is appropriate.


The Five-Number Summary + Beyond

For any metric you encounter in an interview, your first instinct should be to characterize its distribution:

| Statistic | What It Tells You | When to Use | Watch Out For |
| --- | --- | --- | --- |
| Mean | Average level | Symmetric distributions, aggregated metrics | Sensitive to outliers: one whale user can skew it |
| Median | Typical value | Skewed data (revenue, session length) | Ignores tail behavior entirely |
| Mode | Most common value | Categorical data, discrete counts | May not be unique; not useful for continuous data |
| Standard Deviation | Spread around the mean | Normal-ish distributions | Meaningless for highly skewed data |
| IQR (Q3 − Q1) | Spread of the middle 50% | Skewed distributions, outlier detection | Ignores tails by design |
| Skewness | Asymmetry direction | Assessing if mean or median is appropriate | Positive = right tail; most product metrics are right-skewed |
| Kurtosis | Tail heaviness | Assessing outlier risk | High kurtosis = more extreme outliers than normal |
| Coefficient of Variation (CV) | Relative variability (σ/μ) | Comparing variability across metrics with different scales | Undefined when mean is 0 |

Practical SQL for Descriptive Analytics

-- Comprehensive descriptive stats for any metric
SELECT
  COUNT(*)                                    AS n,
  AVG(metric_value)                           AS mean,
  PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY metric_value) AS median,
  STDDEV(metric_value)                        AS stddev,
  VARIANCE(metric_value)                      AS variance,
  MIN(metric_value)                           AS min_val,
  PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY metric_value) AS p25,
  PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY metric_value) AS p75,
  PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY metric_value) AS p90,
  PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY metric_value) AS p95,
  PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY metric_value) AS p99,
  MAX(metric_value)                           AS max_val,
  -- Skewness approximation
  AVG(POWER((metric_value - sub.mu) / sub.sd, 3)) AS skewness,
  -- Coefficient of variation
  STDDEV(metric_value) / NULLIF(AVG(metric_value), 0) AS cv
FROM my_table
CROSS JOIN (
  SELECT AVG(metric_value) AS mu, STDDEV(metric_value) AS sd FROM my_table
) sub;

When Mean vs. Median Matters: A Concrete Example

Suppose you're analyzing revenue per user per month for a freemium app:

| User | Revenue ($) |
| --- | --- |
| User 1 | 0 |
| User 2 | 0 |
| User 3 | 0 |
| User 4 | 5 |
| User 5 | 10 |
| User 6 | 12 |
| User 7 | 15 |
| User 8 | 20 |
| User 9 | 50 |
| User 10 | 2,500 |

  • Mean: $261.20, dominated by the whale user
  • Median: $11.00, represents the "typical" paying experience
  • P90: $50, what a high-spending (but not extreme) user looks like
  • P99: $2,500, whale territory

Interview tip: Always state which central tendency you're using and why. Saying "I'd look at the median here because revenue is heavily right-skewed" shows distribution awareness.
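The numbers in the example above can be reproduced with the Python standard library. This is a sketch, not a production utility; the nearest-rank percentile below is one of several common percentile definitions and happens to match the P90/P99 figures quoted above.

```python
import math
import statistics

# Monthly revenue per user from the table above (Users 1..10)
revenue = [0, 0, 0, 5, 10, 12, 15, 20, 50, 2500]

mean = statistics.mean(revenue)      # 261.2 -- pulled up by the $2,500 whale
median = statistics.median(revenue)  # 11.0  -- the "typical" user

def pct_nearest_rank(data, p):
    """Nearest-rank percentile: value at rank ceil(p * n) in sorted order."""
    s = sorted(data)
    return s[math.ceil(p * len(s)) - 1]

p90 = pct_nearest_rank(revenue, 0.90)  # 50
p99 = pct_nearest_rank(revenue, 0.99)  # 2500 (rank rounds up to the max)
print(mean, median, p90, p99)
```

One whale user moves the mean by an order of magnitude while leaving the median untouched, which is exactly why the choice of central tendency must be stated explicitly.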

Segmented Descriptive Analytics

Raw descriptive stats across all users can mask important patterns. Always segment:

-- Descriptive stats segmented by user cohort
SELECT
  user_segment,
  COUNT(*)                                     AS n,
  AVG(metric)                                  AS mean,
  PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY metric) AS median,
  PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY metric) AS p90,
  PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY metric) AS p95,
  STDDEV(metric)                               AS stddev
FROM user_events
GROUP BY user_segment
ORDER BY mean DESC;

Common segmentations:

  • New vs. returning users
  • Platform (iOS / Android / Web)
  • Geography / market maturity
  • Subscription tier (free / premium)
  • Power users vs. casual users (e.g., top 10% by activity)

📊 Estimating P90, P95, and P99 from Limited Data

In interviews (and in practice) you often have limited data: maybe 20-50 observations, a summary table, or just a mean and standard deviation. You still need to estimate extreme percentiles like the 90th, 95th, or 99th. Here's how.


Method 1: Parametric Estimation (Assume a Distribution)

If you can assume a distribution family, you can estimate any percentile from just the mean (μ) and standard deviation (σ).

For Normal Data

Use the z-score formula: Percentile = μ + z × σ

| Percentile | z-score | Formula | Example (μ=100, σ=15) |
| --- | --- | --- | --- |
| P90 | 1.282 | μ + 1.282σ | 100 + 1.282 × 15 = 119.2 |
| P95 | 1.645 | μ + 1.645σ | 100 + 1.645 × 15 = 124.7 |
| P99 | 2.326 | μ + 2.326σ | 100 + 2.326 × 15 = 134.9 |

When to use: Aggregated metrics (sample means), measurement data, anything you've confirmed is approximately symmetric.

When NOT to use: Revenue, session duration, time-to-event; these are almost never normal.
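Rather than memorizing z-scores, the table above can be reproduced with the standard library's `statistics.NormalDist`; a quick sketch:

```python
from statistics import NormalDist

mu, sigma = 100, 15  # the example values from the table
for p in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(p)  # standard normal quantile: 1.282, 1.645, 2.326
    print(f"P{int(p * 100)}: mu + {z:.3f} * sigma = {mu + z * sigma:.1f}")
```

`inv_cdf` inverts the standard normal CDF, so the printed values match the table's 119.2 / 124.7 / 134.9 to one decimal place.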

For Log-Normal Data (Most Product Metrics)

Most product metrics (revenue, session duration, time on page, purchase amounts) are right-skewed and follow a log-normal distribution. The trick: take the log, estimate percentiles as if normal, then exponentiate back.

Step-by-step:

  1. Compute μ_log = mean of ln(x) and σ_log = stddev of ln(x)
  2. Estimate log-space percentile: P_log = μ_log + z × σ_log
  3. Convert back: P = exp(P_log)

Worked example (revenue per user):

| Statistic | Raw ($) | Log-transformed |
| --- | --- | --- |
| Mean | $45.20 | μ_log = 3.12 |
| Std Dev | $120.50 | σ_log = 1.45 |

| Percentile | Log-space Calculation | Result |
| --- | --- | --- |
| P90 | exp(3.12 + 1.282 × 1.45) = exp(4.98) | $145.47 |
| P95 | exp(3.12 + 1.645 × 1.45) = exp(5.51) | $246.61 |
| P99 | exp(3.12 + 2.326 × 1.45) = exp(6.49) | $660.52 |

Interview tip: If you're told "revenue per user has mean $45 and std dev $120," immediately note that σ >> μ, which signals a right-skewed distribution. State: "This looks log-normal. I'd estimate percentiles in log-space."
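The three-step recipe above fits in a few lines of stdlib Python. This sketch uses the worked example's log-space parameters and agrees with the table up to rounding of the exponent:

```python
import math
from statistics import NormalDist

mu_log, sd_log = 3.12, 1.45  # mean and stddev of ln(revenue), from the example

for p in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(p)            # normal quantile in log-space
    estimate = math.exp(mu_log + z * sd_log)  # exponentiate back to dollars
    print(f"P{int(p * 100)} ~ ${estimate:,.2f}")
```

The exponentiation step is what makes the tail estimates so sensitive to σ_log: a small error in the log-space stddev becomes a large error in dollar terms.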

For Exponential Data (Time-to-Event)

Time between events (purchases, support tickets, logins) often follows an exponential distribution with rate λ = 1/mean.

Percentile = −ln(1 − p) / λ, or equivalently −mean × ln(1 − p)

| Percentile | Formula (mean = 7 days) | Result |
| --- | --- | --- |
| P90 | −7 × ln(0.10) | 16.1 days |
| P95 | −7 × ln(0.05) | 21.0 days |
| P99 | −7 × ln(0.01) | 32.2 days |

When to use: Days between purchases, time to first action, inter-arrival times, server response times.
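The exponential quantile formula is a one-liner; a quick stdlib sketch that reproduces the table:

```python
import math

mean_days = 7.0  # mean time-to-event from the example table
for p in (0.90, 0.95, 0.99):
    # Exponential quantile: -mean * ln(1 - p)
    estimate = -mean_days * math.log(1 - p)
    print(f"P{int(p * 100)}: {estimate:.1f} days")  # 16.1, 21.0, 32.2
```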


Method 2: Non-Parametric (Order Statistics)

When you have limited raw data (say, n = 20-50 observations) but no distribution assumption:

  • Sort the data
  • P90 ≈ value at rank ceiling(0.90 × n)
  • For n=20: P90 = value at rank 18, P95 = value at rank 19
  • For n=50: P90 = value at rank 45, P95 = rank 48, P99 = rank 50

Limitations: With small n, you can't reliably estimate P99 (you'd need ~100+ observations for a single P99 data point). State this limitation in interviews.
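The rank arithmetic above is easy to get wrong under time pressure; a tiny helper (illustrative only) that reproduces the ranks quoted for n=20 and n=50:

```python
import math

def percentile_rank(n, p):
    """1-indexed rank (in sorted order) of the p-th percentile, nearest-rank rule."""
    return math.ceil(p * n)

for n in (20, 50):
    ranks = {f"P{int(p * 100)}": percentile_rank(n, p) for p in (0.90, 0.95, 0.99)}
    print(n, ranks)
```

Note that for n=20, P99 falls on rank 20, i.e. the sample maximum, which is exactly why a single small sample cannot reliably estimate P99.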


Method 3: Chebyshev's Inequality (Distribution-Free Bound)

When you know nothing about the distribution shape, Chebyshev gives a worst-case upper bound:

At most 1/k² of data lies beyond k standard deviations from the mean.

| Target | k | Chebyshev Bound | Normal Actual | Use Case |
| --- | --- | --- | --- | --- |
| P90 | 3.16 | μ ± 3.16σ | μ ± 1.28σ | Very conservative; safety-critical |
| P95 | 4.47 | μ ± 4.47σ | μ ± 1.65σ | SLA guarantees with unknown distributions |
| P99 | 10.0 | μ ± 10σ | μ ± 2.33σ | Extreme worst-case budgeting |

Interview tip: Mention Chebyshev when the interviewer says "assume nothing about the distribution." It shows depth. But immediately note it's very conservative and offer to refine if you can assume log-normal or similar.
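The k values in the Chebyshev table follow from solving 1/k² = 1 − p; a sketch comparing them against the normal quantiles (the comparison the table makes):

```python
import math
from statistics import NormalDist

# Chebyshev: at most 1/k^2 of data lies beyond k std devs from the mean,
# so covering a fraction p of the data needs k = 1 / sqrt(1 - p).
for p in (0.90, 0.95, 0.99):
    k_cheb = 1 / math.sqrt(1 - p)       # 3.16, 4.47, 10.0
    z_norm = NormalDist().inv_cdf(p)    # what a normal assumption would give
    print(f"cover {p:.0%}: Chebyshev k = {k_cheb:.2f} vs normal z = {z_norm:.2f}")
```

The gap between the two columns (roughly 2-4x) is the price of assuming nothing about the distribution.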


Method 4: Bootstrapping (Small Sample, No Assumptions)

When you have limited data and can't assume a distribution, bootstrapping gives confidence intervals for percentile estimates:

import numpy as np

# Your limited data (e.g., 30 page load times in seconds)
data = np.array([0.8, 1.1, 1.2, 1.3, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
                 2.1, 2.2, 2.3, 2.5, 2.7, 2.9, 3.0, 3.2, 3.5, 3.8,
                 4.0, 4.2, 4.5, 5.0, 5.5, 6.0, 7.0, 8.5, 10.0, 15.0])

n_bootstrap = 10000
p90_estimates = []
p95_estimates = []
p99_estimates = []

for _ in range(n_bootstrap):
    sample = np.random.choice(data, size=len(data), replace=True)
    p90_estimates.append(np.percentile(sample, 90))
    p95_estimates.append(np.percentile(sample, 95))
    p99_estimates.append(np.percentile(sample, 99))

print(f"P90: {np.median(p90_estimates):.1f} "
      f"[{np.percentile(p90_estimates, 2.5):.1f}, "
      f"{np.percentile(p90_estimates, 97.5):.1f}]")
print(f"P95: {np.median(p95_estimates):.1f} "
      f"[{np.percentile(p95_estimates, 2.5):.1f}, "
      f"{np.percentile(p95_estimates, 97.5):.1f}]")
print(f"P99: {np.median(p99_estimates):.1f} "
      f"[{np.percentile(p99_estimates, 2.5):.1f}, "
      f"{np.percentile(p99_estimates, 97.5):.1f}]")

Interview tip: You won't code this live, but mentioning "I'd bootstrap percentile estimates and report a confidence interval" shows sophistication and practical awareness of uncertainty.


Decision Framework: Which Method to Use

| Situation | Best Method | Example |
| --- | --- | --- |
| Large sample, symmetric data | Normal parametric (z-score) | NPS scores, satisfaction ratings |
| Right-skewed metric, any sample size | Log-normal parametric | Revenue, session duration, purchase amount |
| Time-to-event data | Exponential parametric | Days between purchases, time to churn |
| Small sample (n < 50), no assumptions | Bootstrap or order statistics | Pilot test results, early A/B test data |
| Unknown distribution, worst-case bound | Chebyshev inequality | SLA commitments, safety budgets |
| Only summary stats given (mean, σ) | Parametric with stated assumption | "The mean load time is 2s with σ = 1.5s" |

🔬 Real-World Data: Distributions, Behavior, and Assumptions

Different product metrics follow different distributions. Knowing which distribution applies β€” and why β€” is critical for choosing the right analysis technique and communicating results accurately.


1. Page Load Times → Log-Normal (Right-Skewed)

| Property | Value |
| --- | --- |
| Typical range | 0.5s – 3s |
| Mean | ~2.1s |
| Median | ~1.6s |
| P90 | ~4.5s |
| P95 | ~6.0s |
| P99 | ~12s |
| Skewness | ~2.5 (strongly right-skewed) |

Why log-normal: Most requests are fast, but network retries, cold starts, and database timeouts create a long right tail. There's a hard lower bound (can't be < 0) but no practical upper bound.

Assumption to state: "Page load times are typically log-normal. I'd report the median and P95/P99 rather than the mean, since the mean is inflated by tail latencies."

P95 estimation with limited data: If you have mean_log = 0.47 and sd_log = 0.80, then P95 = exp(0.47 + 1.645 × 0.80) = exp(1.79) ≈ 5.98 seconds.


2. Daily Revenue per User → Zero-Inflated Log-Normal

| Property | Value |
| --- | --- |
| % of zeros | 85-95% (non-paying users) |
| Mean (all users) | $0.15 |
| Mean (payers only) | $3.00 |
| Median (payers) | $1.50 |
| P90 (payers) | $8.00 |
| P99 (payers) | $150 |

Why zero-inflated: Most users never pay. Among payers, amounts follow a log-normal pattern. This two-part structure requires special handling.

Assumption to state: "Revenue is zero-inflated. I'd analyze it as a two-part model: (1) probability of paying (binomial) and (2) amount given paying (log-normal). Reporting a single mean across all users is misleading."

-- Two-part decomposition for revenue
SELECT
  segment,
  COUNT(*) AS total_users,
  SUM(CASE WHEN revenue > 0 THEN 1 ELSE 0 END) AS payers,
  AVG(CASE WHEN revenue > 0 THEN 1.0 ELSE 0.0 END) AS payer_rate,
  AVG(CASE WHEN revenue > 0 THEN revenue END) AS avg_revenue_payers,
  PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY CASE WHEN revenue > 0 THEN revenue END) AS p90_payers,
  PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY CASE WHEN revenue > 0 THEN revenue END) AS p99_payers
FROM user_daily_revenue
GROUP BY segment;
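The two-part model implies a simple consistency check: the overall mean must equal payer rate × mean among payers. A sketch using a hypothetical 5% payer rate (consistent with the 85-95% zeros quoted above) and the table's $3.00 payer mean:

```python
# Two-part view of zero-inflated revenue (hypothetical numbers from the table above)
payer_rate = 0.05          # assume 95% of users pay nothing
mean_given_paying = 3.00   # mean revenue among payers

# Overall mean decomposes as P(pay) * E[revenue | pay]
overall_mean = payer_rate * mean_given_paying
print(f"Overall mean revenue per user: ${overall_mean:.2f}")  # $0.15
```

This reconciliation is also a useful interview move: if the three numbers an interviewer gives you don't satisfy it, something about the metric definitions is off.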

3. Session Duration → Log-Normal / Weibull

| Property | Value |
| --- | --- |
| Typical range | 10s – 15 min |
| Mean | ~5.2 min |
| Median | ~3.1 min |
| P90 | ~12 min |
| P95 | ~18 min |
| P99 | ~45 min |
| Skewness | ~3.0 |

Why: Quick bounces create a mass near zero; deep engagement creates the right tail. The Weibull distribution can model a decreasing hazard rate (users who stay longer tend to stay even longer).

Assumption to state: "Session duration is right-skewed with a floor at 0. I'd use median and percentiles rather than mean. If I need to model it, log-normal or Weibull fit well."


4. Daily Active Users (DAU) → Approximately Normal (Aggregated Count)

| Property | Value |
| --- | --- |
| Mean | 2.1M |
| Std Dev | 150K |
| CV | ~7% |
| Skewness | ~0.1 (near symmetric) |

Why normal: DAU is a count of independent events aggregated over millions of users. By the Central Limit Theorem, it's approximately normal. Day-of-week seasonality creates predictable humps, not skew.

Assumption to state: "DAU is a large aggregate count, so by CLT it's approximately normal. I can use z-scores for percentile estimation. But I'd detrend for day-of-week effects first."


5. Number of Events per User per Day → Poisson / Negative Binomial

| Property | Value |
| --- | --- |
| Mean (likes per user/day) | 4.2 |
| Variance | 18.5 |
| % with 0 events | 40% |
| P90 | 12 |
| P99 | 28 |

Why negative binomial over Poisson: Poisson assumes mean = variance. When variance >> mean (overdispersion), which is typical in user event counts, the negative binomial is more appropriate.

Assumption to state: "Event counts are overdispersed; the variance is 4× the mean. A Poisson model would underestimate the tails. I'd use negative binomial, or if I just need percentiles, I'd use the empirical distribution."
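The "Poisson underestimates the tails" claim can be demonstrated directly: a Poisson distribution with the table's mean of 4.2 puts P90 far below the observed P90 of 12. A stdlib sketch (the quantile helper is ours, written for illustration):

```python
import math

mean, variance = 4.2, 18.5  # event-count stats from the table above
print(f"Dispersion ratio: {variance / mean:.1f}")  # ~4.4 -> overdispersed

def poisson_percentile(lam, p):
    """Smallest k with Poisson(lam) CDF(k) >= p, via the pmf recurrence."""
    k = 0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    while cdf < p:
        k += 1
        term *= lam / k    # P(X = k) from P(X = k-1)
        cdf += term
    return k

print("Poisson P90:", poisson_percentile(mean, 0.90))  # 7, well below observed 12
```

The empirical P90 of 12 sits in a region a same-mean Poisson model considers nearly impossible, which is the practical meaning of overdispersion.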


6. Time Between Purchases → Exponential / Gamma

| Property | Value |
| --- | --- |
| Mean | 14 days |
| Median | 9.7 days |
| P90 | 32.2 days |
| P95 | 41.9 days |
| P99 | 64.5 days |

Why exponential: If purchases are memoryless (constant hazard rate), inter-purchase time is exponential. In practice, the Gamma distribution fits better when users have a "warming up" period or increasing purchase propensity.

Quick estimation: With exponential assumption and mean = 14 days:

  • P90 = −14 × ln(0.10) = 32.2 days
  • P95 = −14 × ln(0.05) = 41.9 days
  • P99 = −14 × ln(0.01) = 64.5 days

7. Conversion Rates (per user segment) → Beta Distribution

| Property | Value |
| --- | --- |
| Overall rate | 3.2% |
| Range across segments | 0.5% – 12% |
| Distribution of segment rates | Right-skewed (most segments have low conversion) |

Why Beta: Conversion rates are bounded [0, 1] and represent proportions. The Beta distribution naturally models proportions and is the conjugate prior for binomial data, making it useful in Bayesian A/B testing.

Assumption to state: "Conversion rates follow a Beta distribution. For A/B testing with low conversion rates, I need larger sample sizes because the variance of a proportion is p(1-p)/n, which is maximized near p = 0.5."
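The p(1-p)/n variance formula above also explains why low conversion rates need larger samples in relative terms: the absolute standard error shrinks near 0, but so does the rate itself. A quick stdlib sketch (sample sizes are illustrative):

```python
import math

def proportion_se(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

n = 10_000
for p in (0.032, 0.5):
    se = proportion_se(p, n)
    # Relative SE is what matters for detecting, say, a 5% relative lift
    print(f"p={p}: SE={se:.4f}, relative SE={se / p:.1%}")
```

At the same n, the 3.2% rate has a relative standard error several times larger than the 50% rate, so detecting a given relative lift requires a correspondingly bigger sample.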


Quick Reference: Distribution Cheat Sheet for Interviews

| Metric Type | Likely Distribution | Use Mean? | Key Percentile Method |
| --- | --- | --- | --- |
| Load time, latency | Log-normal | No: use median + P95 | Log-transform, then z-score |
| Revenue per user | Zero-inflated log-normal | Conditional on payers | Separate payers; log-normal for payers |
| Session duration | Log-normal / Weibull | No: use median | Log-transform, then z-score |
| DAU / MAU | Normal (CLT) | Yes | z-score directly |
| Events per user | Negative binomial | Yes, but report variance too | Empirical or parametric NB |
| Time between events | Exponential / Gamma | Mean is valid | Exponential formula: −μ ln(1−p) |
| Conversion rate | Beta (Bayesian) / Binomial | Yes (it IS the mean) | Beta quantile function |
| NPS / Satisfaction | Bimodal or discrete | Depends on shape | Empirical percentiles only |

🎯 Full Mock Interview: "Uber Eats Delivery Time Optimization"

This is a complete, realistic 35-minute mock interview for an analytical execution round. It demonstrates every skill an interviewer evaluates: structured thinking, descriptive analytics, distribution awareness, percentile estimation, hypothesis generation, SQL analysis, and clear recommendations.


🎬 Setting the Scene

Interviewer: "You're a data scientist at Uber Eats. Our operations team has flagged that customer satisfaction scores dropped from 4.2 to 3.8 (out of 5) over the past month. They suspect delivery times are the issue. You have access to all delivery, order, and customer data. Walk me through how you'd investigate this and what you'd recommend."


Step 1: Clarifying Questions (3 minutes)

Candidate: "Before I dive in, I'd like to understand the problem space better."

| # | Question | Why I'm Asking | Interviewer's Answer |
| --- | --- | --- | --- |
| 1 | "Is the CSAT drop consistent across all markets, or concentrated in specific cities?" | Localize the problem | "It's worse in large cities like NYC, LA, and Chicago. Smaller markets are relatively stable." |
| 2 | "Is this across all restaurant types, or specific categories like fast food vs. fine dining?" | Segment the problem | "Seems to be across the board, but we haven't dug into that yet." |
| 3 | "What's the CSAT survey methodology? Is it post-delivery, and what's the response rate?" | Rule out measurement changes | "Post-delivery push notification, ~15% response rate, methodology hasn't changed." |
| 4 | "Have there been any operational changes (new driver onboarding, batched deliveries, routing algorithm updates)?" | Identify potential root causes | "We launched batched deliveries (2 orders per trip) in large markets 5 weeks ago to reduce driver costs." |
| 5 | "What does 'delivery time' mean here: from order placed, or from restaurant pickup?" | Define the metric precisely | "We track both. Total time = order_placed → delivered. Let's focus on total time." |
| 6 | "Do we show an estimated delivery time to the customer? And how does actual compare?" | Expectation vs. reality is key to satisfaction | "Yes, we show an ETA. Great question; that's probably relevant." |
Key insight from clarifying questions: Batched deliveries launched 5 weeks ago in the same markets where CSAT dropped. This is our primary suspect. But I also want to check whether the issue is absolute delivery time or delivery time relative to the ETA shown.


Step 2: Define Metrics & Descriptive Analytics (5 minutes)

Candidate: "Let me first understand the delivery time distribution before and after the batching launch."

Key Metrics to Examine
  • Total delivery time (order_placed → delivered)
  • ETA accuracy = (actual_time − estimated_time); positive means late
  • Late delivery rate = % of orders where actual > ETA
  • Delivery time percentiles: P50, P90, P95, P99
  • CSAT score distribution, not just the mean
Descriptive Analysis Query
-- Delivery time distribution: pre vs post batching launch
WITH delivery_stats AS (
  SELECT
    CASE
      WHEN order_date < '2026-02-15' THEN 'pre_batch'
      ELSE 'post_batch'
    END AS period,
    market,
    total_delivery_minutes,
    actual_delivery_minutes - estimated_delivery_minutes AS eta_miss_minutes
  FROM orders o
  JOIN deliveries d ON o.order_id = d.order_id
  WHERE order_date BETWEEN '2026-01-15' AND '2026-03-15'
    AND market IN ('NYC', 'LA', 'Chicago')
)
SELECT
  period,
  COUNT(*) AS orders,
  -- Central tendency
  AVG(total_delivery_minutes) AS mean_delivery,
  PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY total_delivery_minutes) AS median_delivery,
  -- Spread
  STDDEV(total_delivery_minutes) AS stddev_delivery,
  -- Tail behavior (critical for customer experience)
  PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY total_delivery_minutes) AS p90_delivery,
  PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY total_delivery_minutes) AS p95_delivery,
  PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY total_delivery_minutes) AS p99_delivery,
  -- ETA accuracy
  AVG(eta_miss_minutes) AS avg_eta_miss,
  PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY eta_miss_minutes) AS p90_eta_miss,
  AVG(CASE WHEN eta_miss_minutes > 0 THEN 1.0 ELSE 0.0 END) AS late_rate,
  AVG(CASE WHEN eta_miss_minutes > 10 THEN 1.0 ELSE 0.0 END) AS very_late_rate
FROM delivery_stats
GROUP BY period;
Hypothetical Results
| Metric | Pre-Batch | Post-Batch | Change |
|---|---|---|---|
| Median delivery (min) | 32 | 35 | +9% |
| Mean delivery (min) | 36 | 42 | +17% |
| Std Dev (min) | 12 | 19 | +58% ⚠️ |
| P90 delivery (min) | 52 | 68 | +31% ⚠️ |
| P95 delivery (min) | 58 | 82 | +41% ⚠️ |
| P99 delivery (min) | 72 | 105 | +46% ⚠️ |
| Avg ETA miss (min) | -2 (early) | +8 (late) | ⚠️ Expectation gap |
| Late rate (> ETA) | 28% | 55% | +27pp ⚠️ |
| Very late rate (>10min) | 8% | 31% | +23pp ⚠️ |

Candidate: "There are two critical findings here:"

  1. The tails exploded. The median only went up 9%, but P90 went up 31% and P99 went up 46%. Batching is creating high-variance outcomes. This tells me delivery time is becoming more right-skewed — likely shifting from a log-normal with σ_log ≈ 0.3 to σ_log ≈ 0.5. The mean is misleading here; P95 is the metric that matters for customer experience.
  2. The ETA model isn't calibrated for batching. We went from 2 minutes early on average to 8 minutes late. More than half of orders are now arriving after the promised time. That expectation gap is likely what's driving CSAT down more than the absolute time increase."
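The tail-versus-median divergence is easy to demonstrate with a quick simulation. This sketch uses the σ_log values assumed in the discussion and a pre-batch median of roughly 32 minutes from the table; it holds the median fixed while widening the log-normal spread:

```python
import math
import random

random.seed(42)

def simulate(mu_log: float, sigma_log: float, n: int = 200_000) -> dict:
    """Draw n log-normal delivery times and summarize the median and P95."""
    xs = sorted(random.lognormvariate(mu_log, sigma_log) for _ in range(n))
    return {"median": xs[n // 2], "p95": xs[int(0.95 * n)]}

mu_log = math.log(32)            # median delivery time of ~32 minutes
pre = simulate(mu_log, 0.3)      # tighter pre-batch spread
post = simulate(mu_log, 0.5)     # wider post-batch spread, same median

# Median stays ~32 minutes in both cases; P95 jumps from ~52 to ~73 minutes.
print(pre, post)
```

The exact figures differ from the hypothetical table, but the qualitative point matches: widening σ_log barely moves the median while the tail percentiles balloon.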

Step 3: Hypothesis Generation & Prioritization (2 minutes)

| # | Hypothesis | Evidence So Far | How to Test | Priority |
|---|---|---|---|---|
| H1 | Batched deliveries increase tail delivery times, and the ETA model doesn't account for the second stop | Strong: P90/P95 ballooned; late rate doubled | Compare batched vs. single deliveries | 🔴 High |
| H2 | The second customer in a batch is getting cold food (long sit time after restaurant pickup) | Plausible: second stop adds 10-15 min | Compare CSAT for 1st vs. 2nd delivery in batch | 🔴 High |
| H3 | Batching algorithm pairs distant restaurants or delivery addresses | Unknown | Analyze distance between paired orders | 🟡 Medium |
| H4 | Drivers are rejecting batched orders, causing reassignment delays | Unknown | Check driver acceptance rates pre/post | 🟡 Medium |
| H5 | CSAT survey timing/selection bias — unhappy users respond more | Unlikely: methodology unchanged, response rate stable | Check response rate trends | 🟢 Low |

Step 4: Deep-Dive Analysis (7 minutes)

Analysis 1: Batched vs. Single Delivery Comparison
-- Compare batched vs single deliveries
SELECT
  delivery_type,     -- 'single' or 'batched'
  batch_position,    -- NULL for single, 1 or 2 for batched
  COUNT(*) AS orders,
  AVG(total_delivery_minutes) AS mean_delivery,
  PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY total_delivery_minutes) AS p50,
  PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY total_delivery_minutes) AS p90,
  PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY total_delivery_minutes) AS p95,
  AVG(actual_delivery_minutes - estimated_delivery_minutes) AS avg_eta_miss,
  AVG(CASE WHEN actual_delivery_minutes > estimated_delivery_minutes THEN 1.0 ELSE 0.0 END) AS late_rate,
  AVG(csat_score) AS avg_csat
FROM orders o
JOIN deliveries d ON o.order_id = d.order_id
LEFT JOIN csat_surveys c ON o.order_id = c.order_id
WHERE order_date >= '2026-02-15'
  AND market IN ('NYC', 'LA', 'Chicago')
GROUP BY delivery_type, batch_position;
Hypothetical Results
| Delivery Type | Orders | P50 (min) | P90 (min) | P95 (min) | Avg ETA Miss | Late Rate | Avg CSAT |
|---|---|---|---|---|---|---|---|
| Single | 180K | 31 | 48 | 55 | -1 min | 26% | 4.1 |
| Batch — 1st drop | 90K | 33 | 55 | 65 | +5 min | 48% | 3.8 |
| Batch — 2nd drop | 90K | 45 | 78 | 92 | +18 min | 72% | 3.2 |

Candidate: "This confirms Hypotheses H1 and H2. The second customer in a batch is getting hit hardest: P95 is 92 minutes (vs. 55 for single), the average ETA miss is 18 minutes, and CSAT is 3.2. That 3.2 for 2nd-drop customers is dragging the overall CSAT from 4.1 down to 3.8."

Analysis 2: Quantify the CSAT-Lateness Relationship
-- CSAT as a function of ETA miss
SELECT
  CASE
    WHEN actual_minutes - estimated_minutes <= -5  THEN '5+ min early'
    WHEN actual_minutes - estimated_minutes <= 0   THEN '0-5 min early'
    WHEN actual_minutes - estimated_minutes <= 5   THEN '0-5 min late'
    WHEN actual_minutes - estimated_minutes <= 10  THEN '5-10 min late'
    WHEN actual_minutes - estimated_minutes <= 20  THEN '10-20 min late'
    ELSE '20+ min late'
  END AS lateness_bucket,
  COUNT(*) AS orders,
  AVG(csat_score) AS avg_csat,
  STDDEV(csat_score) AS sd_csat
FROM orders o
JOIN deliveries d ON o.order_id = d.order_id
JOIN csat_surveys c ON o.order_id = c.order_id
WHERE order_date >= '2026-02-15'
GROUP BY 1
ORDER BY MIN(actual_minutes - estimated_minutes);
Hypothetical Results
| Lateness Bucket | Orders | Avg CSAT | Threshold Effect |
|---|---|---|---|
| 5+ min early | 45K | 4.5 | |
| 0-5 min early | 70K | 4.3 | |
| 0-5 min late | 55K | 4.0 | Small dip — tolerable |
| 5-10 min late | 40K | 3.5 | ⚠️ Sharp drop — the pain threshold |
| 10-20 min late | 30K | 2.8 | 🔴 Severe impact |
| 20+ min late | 20K | 2.1 | 🔴 Nearly guaranteed bad review |

Candidate: "There's a clear non-linear threshold: CSAT drops sharply once a delivery is more than 5 minutes late. This suggests that CSAT is driven more by expectation violation than absolute delivery time. A 45-minute delivery that arrives on time scores better than a 35-minute delivery that's 10 minutes late."

Analysis 3: Estimating the Cost of Batching with Percentile Modeling

Candidate: "Let me estimate the P95 delivery time for batched orders using the distribution properties. The post-batch delivery time data shows:"

  • Mean_log (2nd drop) ≈ ln(45) ≈ 3.81
  • From the spread, σ_log ≈ 0.55
  • P95 estimate = exp(3.81 + 1.645 × 0.55) = exp(4.71) ≈ 111 minutes

"This means about 5% of 2nd-drop customers are waiting nearly 2 hours. Those customers will likely churn. If we can tighten the distribution by improving routing (reducing σ_log from 0.55 to 0.35), the estimated P95 drops to exp(3.81 + 1.645 × 0.35) = exp(4.39) ≈ 80 minutes — still long, but a 28% improvement at the tail."
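The parametric arithmetic above can be checked in a few lines; this is a sketch using the stated assumptions (μ_log = ln(45), σ_log of 0.55 today vs. 0.35 after routing improvements):

```python
import math

def lognormal_p95(median_minutes: float, sigma_log: float) -> float:
    """Parametric P95 of a log-normal: exp(mu + 1.645 * sigma), mu = ln(median)."""
    return math.exp(math.log(median_minutes) + 1.645 * sigma_log)

current = lognormal_p95(45, 0.55)    # today's 2nd-drop distribution
improved = lognormal_p95(45, 0.35)   # after tightening routing variance
tail_gain = 1 - improved / current   # relative improvement at the tail

print(round(current), round(improved), round(tail_gain, 2))  # → 111 80 0.28
```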


Step 5: Root Cause Summary (1 minute)

Root cause: The batched delivery program launched 5 weeks ago is creating two problems:

  1. The 2nd-drop customer experiences ~45% longer delivery times (P50: 45 vs. 31 min), with extreme tail cases — P95 of 92 minutes.
  2. The ETA model was not recalibrated for batching, creating an expectation gap: 72% of 2nd-drop orders arrive late. Since CSAT drops sharply once lateness exceeds 5 minutes, this expectation gap — not the absolute time — is the primary driver of the CSAT decline.

In aggregate, the 90K 2nd-drop orders per month with a 3.2 CSAT are pulling the blended average from 4.1 to 3.8.


Step 6: Recommendations (3 minutes)

| Timeframe | Recommendation | Expected Impact | Effort | Trade-off |
|---|---|---|---|---|
| Immediate (this week) | Recalibrate ETA model to add estimated batching delay for 2nd-drop customers. Show "Your order is part of a multi-stop delivery" with adjusted ETA. | Reduce late rate from 72% → ~35% for 2nd-drop; CSAT recovery ~0.2-0.3 points | Low | Showing longer ETAs may reduce order volume slightly |
| Short-term (2 weeks) | Add distance/time constraints to batching algorithm: only batch if 2nd drop adds <10 min and distance <1.5 miles | Reduce 2nd-drop P90 from 78 → ~55 min | Medium | Reduces % of eligible batches, partially reducing cost savings |
| Medium-term (1 month) | A/B test dynamic batching rules: batch only during low-demand hours when driver supply is ample; single delivery during peak hours when speed matters most | Optimize cost savings vs. CSAT trade-off | Medium | Requires demand forecasting model |
| Long-term (quarter) | Build a real-time delivery experience score model that predicts CSAT from (estimated time, actual time, food type, weather) and uses it to gate batching decisions — only batch when model predicts CSAT ≥ 4.0 | Systematic quality floor for all deliveries | High | Model complexity; needs continuous retraining |

Step 7: Handling the Follow-Up (Trade-off Discussion)

Interviewer: "The operations team says batching saves $2.50 per delivery in driver costs. That's $225K/month in the three cities. How do you think about the trade-off?"

Candidate:

  1. Quantify the CSAT cost:
    • 90K 2nd-drop orders per month with avg CSAT 3.2 (vs. 4.1 baseline)
    • Research shows a 1-point CSAT drop increases churn rate by ~15%
    • If these users have LTV of $200 and churn rate increases by 15%: 90K × 0.15 × $200 = $2.7M at risk annually
  2. Compare to savings:
    • Cost savings: $225K/month = $2.7M/year
    • CSAT-driven churn risk: ~$2.7M/year
    • Roughly breakeven, but the churn effect compounds — lost customers don't come back
  3. Recommendation: "The batching savings roughly equal the churn risk, but churn is irreversible while cost savings can be optimized. I'd keep batching but with the tighter constraints I suggested — limit the 2nd-drop penalty. A constrained batching model might save $1.80 per delivery (vs $2.50) while keeping CSAT for 2nd-drop orders above 3.8. That's $162K/month in savings with much lower churn risk — a better long-term outcome."
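The back-of-envelope comparison can be laid out explicitly. All inputs below are the interview's hypothetical figures ($2.50 savings per delivery, 15% churn lift, $200 LTV), not real benchmarks:

```python
ORDERS_PER_MONTH = 90_000      # 2nd-drop orders affected each month
SAVINGS_PER_DELIVERY = 2.50    # full batching savings (hypothetical)
CONSTRAINED_SAVINGS = 1.80     # savings under tighter batching constraints
CHURN_LIFT = 0.15              # assumed churn increase from the CSAT drop
LTV = 200                      # assumed lifetime value per affected customer

annual_savings = ORDERS_PER_MONTH * SAVINGS_PER_DELIVERY * 12
annual_churn_risk = ORDERS_PER_MONTH * CHURN_LIFT * LTV
constrained_monthly = ORDERS_PER_MONTH * CONSTRAINED_SAVINGS

# Full batching: ~$2.7M/year saved vs. ~$2.7M/year of LTV at risk (a wash).
# Constrained batching keeps ~$162K/month of savings with less churn exposure.
print(round(annual_savings), round(annual_churn_risk), round(constrained_monthly))
```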

Step 8: Closing Summary (30 seconds)

"To wrap up: The CSAT decline from 4.2 to 3.8 is primarily driven by the batched delivery program. The 2nd-drop customer experiences ~45% longer delivery times and 72% of their orders arrive late because the ETA model wasn't updated. The non-linear relationship between lateness and CSAT means even small improvements in ETA accuracy will have outsized effects. My recommendation is a three-pronged approach: (1) immediately fix the ETA model, (2) tighten batching constraints over the next two weeks, and (3) A/B test dynamic batching rules to find the cost-quality sweet spot. The batching program can be profitable without destroying customer experience."


πŸ“ Interviewer Scorecard: What This Candidate Demonstrated

| Evaluation Criterion | Rating | Evidence |
|---|---|---|
| Problem structuring | ✅ Strong | Asked 6 clarifying questions; identified batching immediately as the likely cause |
| Descriptive analytics | ✅ Strong | Used median + percentiles (not just mean); noted distribution shift; explained mean vs. median choice |
| Distribution awareness | ✅ Strong | Identified delivery time as log-normal; estimated P95 parametrically; explained tail behavior |
| Hypothesis generation | ✅ Strong | 5 hypotheses with prioritization and testing approach |
| SQL & quantitative analysis | ✅ Strong | 3 well-structured queries; segmented by delivery type and lateness bucket |
| Insight generation | ✅ Strong | Key insight: CSAT driven by expectation gap, not absolute time — non-obvious finding |
| Recommendations | ✅ Strong | 4 prioritized recommendations with trade-offs; quantified cost vs. churn risk |
| Communication | ✅ Strong | Clear narrative arc; concise summary; used data to support every claim |

📱 Social Media Metrics: Expected Behavior & Distributions

In interviews you'll often be asked: "A social media metric changed — is that expected?" or "How would you model this metric?". Answering well requires knowing how each metric typically behaves, what distribution it follows, and what assumptions are safe to make.


Quick Reference: Social Media Metric Distributions

| Metric | Distribution | Typical Shape | Use Mean? | Key Percentiles to Watch |
|---|---|---|---|---|
| DAU / MAU | Normal (CLT) | Symmetric with day-of-week seasonality | Yes | Not applicable — it's a single aggregate |
| DAU/MAU ratio (stickiness) | Beta | Bounded [0,1], often left-skewed for healthy products | Yes | P10 (floor of engagement) |
| Likes per post | Log-normal / Power-law | Most posts get few; viral posts get orders of magnitude more | No — use median | P90, P99 (viral threshold) |
| Comments per post | Zero-inflated negative binomial | 50-70% of posts get 0 comments; heavy right tail | No — use median or % with ≥1 comment | P75, P90 |
| Shares/Reshares per post | Zero-inflated power-law | 90%+ posts get 0 shares; extreme outliers | No — use share rate (% with ≥1 share) | P95, P99 (virality) |
| Time spent per session | Log-normal | Right-skewed; most sessions are short | No — use median | P90, P95 |
| Time spent per day (total) | Log-normal / Gamma | Right-skewed; power users dominate the tail | No — use median | P90, P95 |
| Sessions per user per day | Poisson / Negative binomial | Discrete counts, overdispersed | Yes (as rate) | P90 |
| CTR (click-through rate) | Beta | Bounded [0,1], usually low (1-5%), right-skewed | Yes (it IS the proportion) | Segment by position/content type |
| Follow/connection requests per day | Poisson (well-behaved users) / Power-law (spammers) | Most send 0-2; bot/spam accounts send hundreds | Report median + P99 | P99 (spam detection) |
| Impressions per piece of content | Log-normal / Power-law | Extreme right skew; algo-boosted content dominates | No — use median | P90, P99 |
| Video watch time per view | Bimodal / Beta | Spike at 0-3 seconds (scroll-past) and near completion (engaged) | No — use completion rate | P25 (quick-abandon), P75 |
| Ad revenue per user (ARPU) | Zero-inflated log-normal | Many users see no ads or don't click; heavy right tail | Conditional on exposure | P90, P99 (high-value users) |
| Notification open rate | Beta | Bounded [0,1]; varies heavily by notification type | Yes | P10 (worst-performing types) |
| Friend/follower count | Power-law (Pareto) | Most users have tens-hundreds; influencers have millions | No — use median | P90, P99, P99.9 |

Case Study Questions: "How Should This Metric Behave?"

Case A: "Likes per post on Instagram dropped 8% week-over-week. Should we be concerned?"

Expected distribution: Likes per post follow a log-normal / power-law distribution. Most posts receive a small number of likes, while a few viral posts receive orders of magnitude more.

Why the distribution matters here:

  • Because of the extreme right skew, the mean likes per post is heavily influenced by viral outliers. A single viral post (or the absence of one) can swing the mean by 5-10%.
  • An 8% drop in the mean could simply mean there were fewer viral posts this week — not a systemic problem.
  • The median likes per post is the better signal for "typical" post performance.
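A toy example makes the outlier sensitivity concrete (all counts are illustrative, not Instagram data): 10,000 posts with a stable body of ~130 likes each, where a single mega-viral post accounts for the entire difference between two weeks.

```python
import statistics

body = [130] * 9_999                 # the "typical" posts, unchanged week to week
with_viral = body + [121_300]        # one mega-viral post last week
without_viral = body + [130]         # no viral hit this week

mean_with = statistics.mean(with_viral)        # ~142 likes per post
mean_without = statistics.mean(without_viral)  # 130 likes per post
swing = 1 - mean_without / mean_with           # ~8.5% mean drop from one post

# The median is identical in both weeks: the "typical" post didn't change.
assert statistics.median(with_viral) == statistics.median(without_viral) == 130
print(round(mean_with, 1), round(swing, 3))
```

One absent outlier moves the mean by the 5-10% range mentioned above while leaving the median untouched, which is exactly why the median is the better signal for typical post performance.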

How to investigate:

-- Decompose: Is the drop in the body or the tail?
WITH posts_labeled AS (
  SELECT
    likes,
    CASE WHEN posted_at < '2026-03-09' THEN 'prev_week' ELSE 'this_week' END AS period
  FROM posts
  WHERE posted_at BETWEEN '2026-03-02' AND '2026-03-15'
)
SELECT
  period,
  COUNT(*) AS posts,
  AVG(likes) AS mean_likes,
  PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY likes) AS median_likes,
  PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY likes) AS p90_likes,
  PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY likes) AS p99_likes,
  -- Share of posts above the global P95 likes threshold ("viral" posts)
  COUNT(*) FILTER (WHERE likes > (SELECT PERCENTILE_CONT(0.95)
    WITHIN GROUP (ORDER BY likes) FROM posts_labeled)) * 100.0 / COUNT(*) AS pct_viral
FROM posts_labeled
GROUP BY period;
| Period | Mean Likes | Median Likes | P90 | P99 | Interpretation |
|---|---|---|---|---|---|
| Previous week | 142 | 28 | 310 | 8,500 | Baseline |
| This week (Scenario 1) | 131 | 27 | 290 | 5,200 | ⚠️ Tail shrank — fewer viral posts; body is stable. Likely algorithmic or content-mix issue. |
| This week (Scenario 2) | 131 | 22 | 250 | 8,100 | 🔴 Body AND tail dropped — systemic engagement decline. Investigate feed ranking, app bugs, or external factors. |

Key interview insight: "An 8% drop in mean likes doesn't tell me much on its own because the distribution is extremely right-skewed. I'd look at the median and the P90/P99 separately to determine if this is a tail phenomenon or a systemic shift."


Case B: "Time spent per session on TikTok increased by 12%. The PM is celebrating. Is this good?"

Expected distribution: Session duration follows a log-normal distribution. Most sessions are 1-5 minutes (quick scrolls), with a long right tail of 30+ minute deep-engagement sessions.

Why an increase might NOT be good:

  • An increase in mean session time can be driven by the right tail growing (power users spending even longer) while casual users stay flat or churn.
  • If the median didn't move but the P90/P95 jumped — a small group is spending much more time. This can indicate addictive loops rather than broad engagement improvement.
  • If total sessions per user decreased while time per session increased — users might be coming less often but getting "stuck" when they do. Net time spent per day could be flat or down.

How to investigate:

-- Decompose time spent: Per session vs. daily total
SELECT
  period,
  -- Per-session metrics
  AVG(session_duration_min) AS mean_session_min,
  PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY session_duration_min) AS median_session,
  PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY session_duration_min) AS p90_session,
  -- Daily totals per user
  AVG(daily_total_min) AS avg_daily_total,
  -- Frequency
  AVG(sessions_per_day) AS avg_sessions_per_day
FROM (
  SELECT
    user_id,
    session_id,
    CASE WHEN session_date < '2026-03-09' THEN 'prev_week' ELSE 'this_week' END AS period,
    session_duration_min,
    SUM(session_duration_min) OVER (PARTITION BY user_id, session_date) AS daily_total_min,
    COUNT(*) OVER (PARTITION BY user_id, session_date) AS sessions_per_day
  FROM sessions
  WHERE session_date BETWEEN '2026-03-02' AND '2026-03-15'
) sub
GROUP BY period;
| Metric | Prev Week | This Week | Change | Signal |
|---|---|---|---|---|
| Mean session (min) | 8.2 | 9.2 | +12% | The headline number |
| Median session (min) | 4.5 | 4.6 | +2% | Typical user barely changed |
| P90 session (min) | 22 | 28 | +27% | ⚠️ Heavy users are going deeper |
| Sessions per day | 5.1 | 4.6 | -10% | ⚠️ Users are opening the app less |
| Total daily time (min) | 42 | 42 | 0% | 🔴 Net engagement is flat |

Key interview insight: "A 12% increase in mean session time is misleading here. The median barely moved (+2%), meaning typical users aren't more engaged. The increase is driven by the P90 tail — power users going deeper while coming back less often. Total daily engagement is flat. I wouldn't celebrate; I'd investigate whether the longer sessions are healthy engagement or whether users are having trouble finding content quickly."
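The decomposition identity behind this table is worth internalizing: daily time per user ≈ mean session length × sessions per day. Plugging in the hypothetical figures shows the offset directly:

```python
prev_daily = 8.2 * 5.1   # mean session (min) × sessions/day, previous week
curr_daily = 9.2 * 4.6   # this week: longer sessions, fewer of them

# +12% per session and -10% sessions/day net out to roughly flat daily time.
print(round(prev_daily, 1), round(curr_daily, 1))  # → 41.8 42.3
```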


Case C: "Comments per post on Facebook dropped 25%. How would you investigate?"

Expected distribution: Comments per post follow a zero-inflated negative binomial distribution. Most posts (50-70%) get zero comments. Among posts that do get comments, counts are overdispersed (variance >> mean).

Why this distribution matters:

  • A 25% drop in mean comments could mean: (a) more posts getting zero comments (rate shift) or (b) commented posts getting fewer comments (intensity shift) — these have different causes.
  • Because of zero-inflation, the mean is driven by the fraction of posts that get any comments at all more than by comment depth.
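For zero-inflated counts, the mean factorizes as mean_overall = P(any comment) × E[comments | at least one]. A two-line check with the hypothetical figures from the results table:

```python
prev = 0.38 * 6.2   # 38% of posts get a comment × 6.2 comments given any
curr = 0.30 * 6.0   # incidence fell to 30%; intensity barely moved

drop = 1 - curr / prev   # ~24% drop, driven almost entirely by incidence
print(round(prev, 2), round(curr, 2), round(drop, 2))  # → 2.36 1.8 0.24
```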

Two-part decomposition:

-- Decompose: Is it fewer posts getting comments, or fewer comments per post?
SELECT
  period,
  COUNT(*) AS total_posts,
  -- Part 1: Comment incidence
  AVG(CASE WHEN comment_count > 0 THEN 1.0 ELSE 0.0 END) AS pct_with_comments,
  -- Part 2: Intensity (among commented posts)
  AVG(CASE WHEN comment_count > 0 THEN comment_count END) AS avg_comments_given_any,
  PERCENTILE_CONT(0.50) WITHIN GROUP (
    ORDER BY CASE WHEN comment_count > 0 THEN comment_count END
  ) AS median_comments_given_any,
  -- Combined
  AVG(comment_count) AS mean_comments_overall
FROM posts_with_engagement
GROUP BY period;
| Metric | Prev Month | This Month | Change |
|---|---|---|---|
| % posts with ≥1 comment | 38% | 30% | -8pp ⚠️ |
| Avg comments (given any) | 6.2 | 6.0 | -3% |
| Mean comments (overall) | 2.4 | 1.8 | -25% |

Key interview insight: "The drop is almost entirely in comment incidence, not intensity. Fewer posts are sparking conversations at all, but posts that do get comments are performing similarly. I'd investigate changes to the feed algorithm that may be deprioritizing conversation-starter content, or look at whether content mix shifted toward Stories/Reels (which have lower comment rates by format)."


Case D: "DAU/MAU ratio (stickiness) declined from 0.55 to 0.48 over 3 months. What's happening?"

Expected distribution: At the product level, DAU/MAU is a single ratio. But across user segments or markets, the stickiness values follow a Beta distribution — bounded between 0 and 1, often left-skewed for healthy apps (most segments have decent stickiness) or right-skewed for apps with retention issues.

Why this matters:

  • DAU/MAU = 0.55 means a typical user opens the app ~17 days per month. Dropping to 0.48 means ~14 days. That's roughly 2 fewer days of engagement per user per month — significant.
  • This ratio can decline due to: (a) MAU growing faster than DAU (acquisition without activation), (b) DAU declining (engagement loss), or (c) both.
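The "active days per month" reading above is a common approximation (assuming a 30-day month and treating the ratio as an average across users); it can be sanity-checked directly:

```python
def active_days_per_month(dau_mau: float, days_in_month: int = 30) -> float:
    """Interpret DAU/MAU as the expected number of active days per month."""
    return dau_mau * days_in_month

before = active_days_per_month(0.55)   # ~16.5 active days
after = active_days_per_month(0.48)    # ~14.4 active days
print(round(before, 1), round(after, 1))  # → 16.5 14.4
```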

How to decompose:

-- Is DAU falling, MAU rising, or both?
SELECT
  month,
  dau_avg,
  mau,
  dau_avg * 1.0 / mau AS stickiness,
  -- Decompose: new vs returning DAU
  new_user_dau_avg,
  returning_user_dau_avg,
  new_user_dau_avg * 1.0 / dau_avg AS pct_dau_new
FROM monthly_metrics
WHERE month >= '2025-12-01'
ORDER BY month;
| Month | DAU (M) | MAU (M) | Stickiness | % DAU from New Users | Interpretation |
|---|---|---|---|---|---|
| Dec 2025 | 18.7 | 34.0 | 0.55 | 12% | Baseline |
| Jan 2026 | 19.1 | 36.5 | 0.52 | 18% | ⚠️ MAU growing from marketing push; new users are less sticky |
| Feb 2026 | 19.0 | 38.2 | 0.50 | 22% | ⚠️ Acquisition continues but activation/retention lagging |
| Mar 2026 | 18.5 | 38.5 | 0.48 | 20% | 🔴 DAU starting to decline; new users not converting to daily habit |

Key interview insight: "The stickiness decline isn't because existing users are disengaging — it's driven by rapid MAU growth from acquisition campaigns bringing in users who don't form a daily habit. The existing user base's stickiness is stable. The fix isn't an engagement feature; it's improving the new user onboarding and activation flow. I'd look at D1, D7, D30 retention curves for the new cohorts to find where they drop off."


Case E: "Shares per post on LinkedIn are way up — does that mean content quality improved?"

Expected distribution: Shares per post follow a zero-inflated power-law. 90%+ of posts get zero shares. Among shared posts, the distribution follows a power law — a tiny fraction accounts for the vast majority of total shares.

Why this is tricky:

  • Because of the extreme zero-inflation and heavy tail, small changes in the viral tail can massively shift the mean.
  • "Shares are way up" could mean: (a) a few posts went mega-viral (noise), (b) share rate (% of posts shared at least once) increased (signal), or (c) the share button got repositioned in the UI (behavior change).

How to decompose:

-- Separate share rate from share intensity
SELECT
  period,
  COUNT(*) AS total_posts,
  AVG(CASE WHEN shares > 0 THEN 1.0 ELSE 0.0 END) AS share_rate,
  AVG(shares) AS mean_shares_all,
  AVG(CASE WHEN shares > 0 THEN shares END) AS mean_shares_shared,
  PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY shares) AS p99_shares,
  -- Concentration: what % of total shares come from top 1% of posts?
  SUM(CASE WHEN shares >= (SELECT PERCENTILE_CONT(0.99) 
    WITHIN GROUP (ORDER BY shares) FROM posts_engagement) 
    THEN shares ELSE 0 END) * 100.0 / SUM(shares) AS pct_shares_from_top1
FROM posts_engagement
GROUP BY period;
| Metric | Prev Month | This Month | Change | Signal |
|---|---|---|---|---|
| Share rate (≥1 share) | 7.2% | 7.5% | +0.3pp | Basically flat — broad quality didn't change |
| Mean shares (all posts) | 1.8 | 2.9 | +61% | The headline number — misleading |
| Mean shares (shared posts) | 25 | 39 | +56% | Intensity up among shared posts |
| P99 shares | 1,200 | 4,800 | +300% | ⚠️ A few mega-viral posts |
| % of shares from top 1% | 42% | 68% | +26pp | ⚠️ Extreme concentration |

Key interview insight: "Content quality didn't broadly improve. The share rate is flat at 7.5% — the same proportion of posts are resonating. The '61% increase' in mean shares is driven almost entirely by a handful of mega-viral posts (the top 1% now drives 68% of all shares vs. 42% before). This is noise from the power-law tail, not a real content quality improvement. I'd report the share rate as the stable metric and flag the viral outliers as one-time events."


Case F: "Video completion rate on Reels shows a bimodal pattern. Is this a reporting bug?"

Expected distribution: Video completion rate (% of video watched per view) is bimodal — NOT normal, NOT log-normal. There's a spike near 0% (users who scroll past immediately) and another spike near 100% (users who watch to completion or loop).

Why bimodal:

  • Short-form video is a binary decision: within 1-2 seconds, the user either scrolls past (0-5% watched) or commits to watching most/all of it (70-100% watched).
  • The middle range (20-60%) is relatively sparse — few users watch exactly half a video then leave.
  • This is NOT a bug — it reflects genuine binary user behavior.
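A small mixture simulation shows why the mean fails on this shape. The 55/45 split and the uniform ranges below are assumed purely for illustration, not measured values:

```python
import random

random.seed(7)

def completion_rate() -> float:
    """Assumed bimodal mix: 55% scroll-past views, 45% committed views."""
    if random.random() < 0.55:
        return random.uniform(0.00, 0.05)   # scrolled past almost immediately
    return random.uniform(0.70, 1.00)       # watched most or all of the video

views = [completion_rate() for _ in range(100_000)]

mean_completion = sum(views) / len(views)                      # ~0.40
hook_rate = sum(v >= 0.05 for v in views) / len(views)         # ~0.45
full_view_rate = sum(v >= 0.90 for v in views) / len(views)    # ~0.15
mid_range = sum(0.20 <= v <= 0.60 for v in views) / len(views) # empty middle

# The mean (~40%) describes almost nobody; the 20-60% band is deserted.
print(round(mean_completion, 2), round(hook_rate, 2), round(full_view_rate, 2), mid_range)
```

Here the hook is proxied by "≥5% watched"; the production definition in the text (≥3 seconds) would additionally need video duration.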

Practical implications:

| Approach | Why It Fails / Works |
|---|---|
| ❌ Report mean completion rate | A mean of 45% implies a "typical" viewer watches about half — but almost no one does. It averages two completely different behaviors. |
| ✅ Report hook rate (% who watch ≥3 sec) | Measures how many users get past the initial scroll decision. |
| ✅ Report full-view rate (% who watch ≥90%) | Measures engaged viewers. This is what the recommendation algo should optimize for. |
| ✅ Segment by viewer cohort | New users have lower hook rates; returning followers have higher full-view rates. |
-- Bimodal analysis: separate the two populations
SELECT
  period,
  COUNT(*) AS total_views,
  -- Hook rate: first decision point
  AVG(CASE WHEN pct_watched >= 0.05 THEN 1.0 ELSE 0.0 END) AS hook_rate,
  -- Full view rate: engagement signal
  AVG(CASE WHEN pct_watched >= 0.90 THEN 1.0 ELSE 0.0 END) AS full_view_rate,
  -- Mean is misleading for bimodal data, but report for reference
  AVG(pct_watched) AS mean_completion,
  -- Loop rate: users who rewatch
  AVG(CASE WHEN loops > 1 THEN 1.0 ELSE 0.0 END) AS loop_rate
FROM video_views
GROUP BY period;

Key interview insight: "This is expected bimodal behavior, not a bug. Short-form video completion rate has two modes — quick-scroll (~0%) and full-watch (~100%). The mean is meaningless here. I'd always report hook rate (≥3s watched) and full-view rate (≥90%) separately. Any A/B test measuring 'average completion rate' on this data needs to be restructured around these two distinct metrics."


Case G: "Our notification open rate dropped from roughly 14% to 11.5%. What do we look at?"

Expected distribution: Notification open rate follows a Beta distribution bounded between 0% and 100%. At the aggregate level it's a single proportion, but across notification types and user segments, rates form a Beta-distributed cloud with wide variance.

The decomposition approach:

-- Decompose by notification type and urgency
SELECT
  notification_type,
  period,
  COUNT(*) AS notifs_sent,
  AVG(CASE WHEN opened THEN 1.0 ELSE 0.0 END) AS open_rate,
  -- Volume shift: are we sending more low-quality notifs?
  COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY period) AS pct_of_volume
FROM notifications
WHERE sent_date BETWEEN '2026-02-01' AND '2026-03-15'
GROUP BY notification_type, period
ORDER BY notification_type, period;
| Notification Type | Open Rate (Feb) | Volume % (Feb) | Open Rate (Mar) | Volume % (Mar) |
|---|---|---|---|---|
| Direct message | 42% | 15% | 41% | 12% |
| Friend request | 35% | 10% | 34% | 8% |
| Post like | 8% | 30% | 8% | 22% |
| Suggested content | 3% | 20% | 3% | 38% ⚠️ |
| Engagement reminder | 5% | 25% | 5% | 20% |
| Blended | ~14% | 100% | ~11.5% | 100% |

Key interview insight: "This is Simpson's paradox. None of the individual notification types had a meaningful open rate change — they're all within ±1pp. The blended rate dropped because the volume mix shifted: suggested content notifications (3% open rate) went from 20% to 38% of volume, diluting the overall rate. The fix isn't improving open rates — it's reducing the volume of low-value notifications. I'd recommend a notification relevance model that gates sending based on predicted open probability."
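Recomputing the blend from the per-type rows confirms the mix-shift story. With these hypothetical figures the volume-weighted blend works out to roughly 14% before and 11.5% after, even though no individual type moved more than about 1pp:

```python
# (open_rate, share_of_volume) per notification type, from the table above
feb = [(0.42, 0.15), (0.35, 0.10), (0.08, 0.30), (0.03, 0.20), (0.05, 0.25)]
mar = [(0.41, 0.12), (0.34, 0.08), (0.08, 0.22), (0.03, 0.38), (0.05, 0.20)]

def blended(rows):
    """Volume-weighted average open rate across notification types."""
    return sum(rate * share for rate, share in rows)

# Blend falls ~2.5pp purely from the shift in volume toward low-open types.
print(round(blended(feb), 4), round(blended(mar), 4))
```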


🎯 Interview Pattern: How to Answer "A Metric Changed" Questions

Every social media metric question follows the same analytical structure:

  1. State the expected distribution. "This metric typically follows a [distribution] because [reason]."
  2. Challenge the summary statistic. "The mean/median/rate can be misleading because [distribution property]. I'd look at [better statistic]."
  3. Decompose the change. Use one of:
    • Body vs. tail: Did the median shift, or just the extremes? (likes, shares, revenue)
    • Incidence vs. intensity: Did the rate of occurrence change, or the magnitude given occurrence? (comments, shares, purchases)
    • Mix shift: Did the composition of the population change? (notification types, user segments, content formats)
  4. Identify root cause. Map the decomposition to potential causes (algo change, UI change, content mix, seasonality, measurement change).
  5. Recommend action with trade-offs acknowledged.

⚠️ Common Analytical Execution Mistakes

| Mistake | Why It Hurts | Fix |
|---|---|---|
| Diving into data without asking questions | You may solve the wrong problem | Always ask 3-5 clarifying questions first |
| Only one hypothesis | Shows narrow thinking | Generate 3-5 hypotheses, then prioritize |
| Complex analysis, no insight | Impressive SQL ≠ business value | End every query with "so what?" |
| No recommendation | Analysis without action is useless | Always propose next steps |
| Ignoring trade-offs | Real decisions have costs | Acknowledge what you'd give up |

✅ Self-Assessment Checklist

Before your interview, confirm you can:

  • ☐ Ask 5+ clarifying questions before analyzing
  • ☐ Generate multiple hypotheses and prioritize them
  • ☐ Write SQL to segment and diagnose issues
  • ☐ Summarize findings in one "so what" sentence
  • ☐ Recommend actions with trade-offs acknowledged
  • ☐ Discuss a case study for 15+ minutes without notes