Role-Level Learning Path
Select your target role level to get a customized study plan, curated exercises, and level-appropriate examples.
E3 — Data Analyst
Suggested timeline: 6–8 weeks
Entry-level analytical role. Focus on clean SQL, descriptive statistics, data storytelling, and translating business questions into queries.
Who this is for: Graduating students, career changers, and analysts with fewer than 2 years of experience targeting IC3/E3 roles at Meta, Google, or comparable companies.
What to Expect in the Interview Loop
- SQL coding screen (30–45 min, medium SELECT/JOIN/GROUP BY)
- Case study — define a metric, interpret a chart
- Behavioral — "tell me about a project you worked on"
Key Skills to Master
SQL — Core Queries
Challenge bar: 🟢 Easy–Medium
- SELECT, WHERE, ORDER BY, LIMIT
- GROUP BY + aggregate functions (COUNT, SUM, AVG, MAX, MIN)
- INNER JOIN, LEFT JOIN across 2 tables
- Subqueries for simple filtering
- Basic date functions (DATE_DIFF, DATE_TRUNC)
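The core pattern behind most E3 SQL screens is an aggregate over a join. A minimal sketch using Python's built-in sqlite3 module (the customers/orders tables and all values are invented for illustration):

```python
import sqlite3

# Hypothetical two-table schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cal');
INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.0), (3, 2, 10.0);
""")

# LEFT JOIN keeps customers with no orders; GROUP BY + aggregates summarize.
rows = conn.execute("""
SELECT c.name,
       COUNT(o.id)                AS n_orders,
       COALESCE(SUM(o.amount), 0) AS total_spent
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.name
ORDER BY total_spent DESC
""").fetchall()
print(rows)  # Cal still appears, with zero orders, thanks to the LEFT JOIN
```

Note the COALESCE: SUM over zero rows returns NULL, which is exactly the kind of edge case interviewers probe.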
Python — Exploratory Analysis
- pandas: read_csv, head, describe, value_counts
- DataFrame selection, filtering, sorting
- groupby + aggregation
- Basic matplotlib / seaborn charts
- Handling missing values with fillna / dropna
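The pandas skills above compose into a few lines. A sketch on a tiny made-up frame standing in for a read_csv result:

```python
import pandas as pd

# Tiny in-memory frame standing in for read_csv output (made-up data).
df = pd.DataFrame({
    "city": ["NY", "SF", "NY", "SF", "NY"],
    "spend": [10.0, None, 30.0, 25.0, 5.0],
})

# Quick-look tools: value_counts() for frequencies, describe() for summaries.
print(df["city"].value_counts())

# Handle missing values, then aggregate per group.
clean = df.fillna({"spend": 0.0})
per_city = clean.groupby("city")["spend"].agg(["count", "mean"]).sort_values("mean")
print(per_city)
```

In an interview, narrate each step: what the missing value means, why fillna(0) is or is not defensible, and what the groupby grain is.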
Statistics — Descriptive
- Mean, median, mode — when to use each
- Variance and standard deviation
- Distributions: normal, skewed, bimodal
- Correlation vs causation
- Percentiles and outlier detection
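All of the descriptive measures above are in Python's standard statistics module, which is handy for checking your mental math. A sketch with invented session counts containing one obvious outlier:

```python
import statistics as st

# Hypothetical daily session counts, with one obvious outlier.
data = [4, 5, 5, 6, 7, 8, 9, 40]

print(st.mean(data), st.median(data), st.mode(data))
print(st.pstdev(data))  # population standard deviation

# IQR outlier rule: flag points beyond 1.5 * IQR from the quartiles.
q1, _, q3 = st.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
print(outliers)
```

Note how the outlier drags the mean (10.5) well above the median (6.5), which is exactly the "when to use each" talking point.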
Product Sense — Fundamentals
- Defining a North Star metric for a simple product
- Interpreting a drop in a metric (basic)
- User funnel basics (awareness → activation → retention)
Your Phased Study Plan
Phase 1 — Foundations (Weeks 1–2)
1–1.5 hours/day · Focus: Statistics and core SQL
- Work through Statistics & Probability module — focus on descriptive stats
- Complete SQL Basics: SELECT, WHERE, GROUP BY, JOINs
- Practice 3 Easy SQL problems on LeetCode or StrataScratch daily
- Read the SQL Cheat Sheet; reproduce every example by hand
Phase 2 — Analysis Skills (Weeks 3–4)
1.5–2 hours/day · Focus: Python EDA and metric thinking
- Python module: pandas groupby, merge, pivot
- Build a simple EDA notebook on a Kaggle dataset
- Practice defining a North Star + 3 supporting metrics for a familiar app
- Work through 2 beginner analytical execution case studies
Phase 3 — Interview Simulation (Weeks 5–8)
2 hours/day · Focus: Timed practice and behavioral prep
- Timed SQL: solve 2 Medium problems in <15 min each
- Write and refine 5 STAR stories using the Behavioral Interview guide
- Complete the Interview Day Checklist
- Do 2 mock analytical case studies with a study partner
E4 — Senior Analyst / Data Scientist
Suggested timeline: 4–6 weeks
Mid-level individual contributor. Owns analysis end-to-end, designs and interprets A/B tests, and partners cross-functionally to influence product decisions.
Who this is for: 2–5 years experience. Analysts targeting IC4/E4 at Meta, Google, Airbnb, or data science roles at growth-stage companies.
What to Expect in the Interview Loop
- SQL coding screen (45 min, window functions, CTEs, multi-table joins)
- A/B testing — design an experiment, interpret results, flag pitfalls
- Product sense — define metrics, diagnose a metric drop
- Behavioral — impact, collaboration, data-driven decision making
Key Skills to Master
SQL — Window Functions & CTEs
Challenge bar: 🟡 Medium
- Window functions: ROW_NUMBER, RANK, LAG, LEAD
- CTEs for multi-step logic
- Self-joins for sequential event analysis
- CASE WHEN for conditional aggregation
- Date spine / calendar table patterns
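ROW_NUMBER and LAG are the two window functions that show up most often at this level. A runnable sketch via Python's bundled sqlite3 (SQLite ≥ 3.25 supports window functions; the events table is invented):

```python
import sqlite3

# Hypothetical events table: one row per user action.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (user_id INTEGER, ts INTEGER, page TEXT);
INSERT INTO events VALUES
  (1, 100, 'home'), (1, 160, 'search'), (1, 400, 'home'),
  (2, 120, 'home');
""")

# ROW_NUMBER orders events per user; LAG exposes the previous timestamp,
# the building block for gap, funnel, and sequential-event analysis.
rows = conn.execute("""
WITH ordered AS (
  SELECT user_id, ts,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ts) AS rn,
         LAG(ts)      OVER (PARTITION BY user_id ORDER BY ts) AS prev_ts
  FROM events
)
SELECT user_id, rn, ts - prev_ts AS gap_s
FROM ordered
ORDER BY user_id, rn
""").fetchall()
print(rows)  # first event per user has a NULL gap
```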
Python — Production-Ready ETL
- pandas: merge, pivot_table, apply, lambda
- Cohort retention matrix with groupby
- Basic regex for text field parsing
- scipy.stats for t-test and chi-square
- Clean, reusable function design
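It is worth being able to compute a two-sample (Welch's) t statistic by hand, not just call scipy. A sketch with made-up samples, using only the standard library:

```python
import math
import statistics as st

# Made-up metric samples for two groups (illustrative only).
a = [12.1, 11.8, 12.5, 12.0, 11.9]
b = [11.2, 11.5, 11.0, 11.4, 11.3]

# Welch's t: difference in means over the unpooled standard error.
m1, m2 = st.mean(a), st.mean(b)
v1, v2 = st.variance(a), st.variance(b)  # sample variances (n - 1 denominator)
se = math.sqrt(v1 / len(a) + v2 / len(b))
t = (m1 - m2) / se
print(round(t, 2))
```

scipy.stats.ttest_ind(a, b, equal_var=False) returns the same statistic along with the p-value; in the interview, say which variance assumption you are making and why.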
Statistics — Inferential
- Hypothesis testing: null/alternative, p-value, alpha
- Two-sample t-test and proportion z-test
- Statistical power and sample size calculation
- Type I vs Type II errors
- Confidence intervals and effect sizes
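Sample size questions follow one common approximation for a two-proportion z-test; the function below is a sketch of it (names and the 10% → 12% scenario are invented), using statistics.NormalDist for the z quantiles:

```python
import math
from statistics import NormalDist

def sample_size_per_arm(p1, p2, alpha=0.05, power=0.8):
    """Approximate per-arm n for a two-proportion z-test:
    n = (z_{1-a/2} + z_power)^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)          # ~0.84 for 80% power
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_a + z_b) ** 2 * var / (p1 - p2) ** 2)

# Detecting a lift from 10% to 12% conversion needs roughly 3.8k users per arm:
n = sample_size_per_arm(0.10, 0.12)
print(n)
```

The qualitative takeaways matter as much as the formula: smaller effects need quadratically more users, and higher power or lower alpha both push n up.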
Product Sense — Metrics & Experiments
- North Star + Leading + Guardrail metric tiers
- A/B test design: unit of randomization, SRM checks
- Novelty effects and user bias
- Root cause analysis: 4-layer framework
- Funnel analysis and cohort retention
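An SRM check is a chi-square goodness-of-fit test against the intended split. A stdlib-only sketch for the 50/50, one-degree-of-freedom case (the user counts are invented; for 1 df, P(chi² > x) = 2(1 − Φ(√x))):

```python
import math
from statistics import NormalDist

def srm_pvalue(n_control, n_treatment):
    """Chi-square goodness-of-fit p-value vs an intended 50/50 split.
    With 1 degree of freedom, P(chi2 > x) = 2 * (1 - Phi(sqrt(x)))."""
    total = n_control + n_treatment
    expected = total / 2
    chi2 = ((n_control - expected) ** 2 / expected
            + (n_treatment - expected) ** 2 / expected)
    return 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))

# A 50.4% / 49.6% split over a million users is a red flag:
p = srm_pvalue(504_000, 496_000)
print(p)  # far below the usual 0.001 SRM alert threshold
```

If the p-value trips the threshold, the standard move is to stop reading results and debug assignment before trusting anything downstream.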
Your Phased Study Plan
Phase 1 — Technical Refresh (Days 1–7)
1.5–2 hours/day · Focus: Window functions and inferential statistics
- SQL module: window functions — write 10 queries using ROW_NUMBER, LAG, LEAD
- Statistics module: hypothesis testing and confidence intervals
- Solve 5 Medium SQL problems on LeetCode (Database) under timed conditions
- Read Advanced SQL Patterns & Techniques supplementary guide
Phase 2 — Analytical Depth (Days 8–21)
2–2.5 hours/day · Focus: A/B testing, product sense, Python ETL
- A/B Testing module: sample size calc, SRM, novelty effects
- Python: build a cohort retention matrix from scratch
- Practice 3 analytical execution case studies with the SPSIL (Situation, Problem, Solution, Impact, Lessons) framework
- Define metrics for 2 unfamiliar products (pick from app stores)
- Complete the 21-Day Sprint Days 8–14 exercises
Phase 3 — Interview Simulation (Days 22–42)
2 hours/day · Focus: Mock interviews and behavioral polish
- Timed SQL: 5 Medium + 2 Hard problems per week
- Write 8 STAR stories — ensure each covers a Meta value
- Practice the 4-layer RCA framework on 3 real anomaly scenarios
- Complete behavioral mock interview guide
- Review SQL Interview Problems — all 15 problems with self-grading
E5 — Senior Data Scientist
Suggested timeline: 6–8 weeks
Senior IC. Drives complex multi-team experiments, mentors junior analysts, designs measurement frameworks, and contributes to technical architecture decisions.
Who this is for: 5+ years experience. Targeting IC5/E5/Senior DS at Meta, Google, or Staff/Lead Analyst at companies like Stripe, Shopify, or Airbnb.
What to Expect in the Interview Loop
- SQL coding screen (60 min, optimization, complex window functions, performance reasoning)
- Advanced A/B testing — causal inference, network effects, switchback experiments
- Product sense — system-level metric design, trade-off analysis
- Analytical execution — ambiguous open-ended case, structure and present findings
- Cross-functional leadership — behavioral + influence stories
Key Skills to Master
SQL — Optimization & Scale
Challenge bar: 🔴 Hard
- Query optimization: EXPLAIN, index usage, predicate pushdown
- Sessionization with LAG + cumulative SUM patterns
- Multiple CTEs chained for complex multi-step pipelines
- Approximate aggregations (APPROX_COUNT_DISTINCT)
- Performance trade-offs: broadcast vs shuffle joins
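Chained CTEs are how you keep a multi-step pipeline legible: each CTE owns one grain change. A toy, runnable sketch via sqlite3 (tables and values invented):

```python
import sqlite3

# Toy orders table; each CTE below performs exactly one transformation.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (user_id INTEGER, ds TEXT, amount REAL);
INSERT INTO orders VALUES
  (1, '2024-01-01', 10), (1, '2024-01-02', 15),
  (2, '2024-01-01', 50), (3, '2024-01-02', 5);
""")

rows = conn.execute("""
WITH daily AS (                      -- step 1: aggregate to user-day grain
  SELECT user_id, ds, SUM(amount) AS spend
  FROM orders GROUP BY user_id, ds
),
per_user AS (                        -- step 2: roll up to user grain
  SELECT user_id, SUM(spend) AS total FROM daily GROUP BY user_id
)
SELECT user_id, total                -- step 3: filter on the derived column
FROM per_user WHERE total >= 20
ORDER BY total DESC
""").fetchall()
print(rows)
```

At E5 you are expected to narrate the grain of each CTE and where the optimizer can prune or push predicates down.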
Python — Statistical & Scalable
- Difference-in-differences implementation
- Propensity score matching for observational studies
- Custom A/B test variance reduction (CUPED)
- Recursive JSON flattening and log parsing
- Scalability discussion: from pandas to Spark
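The DiD point estimate itself is one line of arithmetic; the hard part is defending the parallel-trends assumption. A stdlib sketch with invented per-user engagement scores:

```python
from statistics import mean

def diff_in_diff(pre_t, post_t, pre_c, post_c):
    """DiD point estimate from unit-level outcomes: the treated group's
    change minus the control group's change. Valid only under the
    parallel-trends assumption."""
    return (mean(post_t) - mean(pre_t)) - (mean(post_c) - mean(pre_c))

# Hypothetical engagement scores per user (made-up numbers):
effect = diff_in_diff(
    pre_t=[10, 11, 9],  post_t=[14, 15, 13],
    pre_c=[10, 12, 11], post_c=[11, 13, 12],
)
print(effect)  # treated moved +4, control moved +1, so the estimate is +3
```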
Statistics — Causal Inference
- Causal inference: RCT vs quasi-experiments
- Difference-in-differences (DiD)
- Propensity score matching (PSM)
- CUPED / variance reduction
- Multiple testing corrections (Bonferroni, FDR)
- Network effects and interference
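CUPED is worth implementing once to internalize: regress the experiment metric on a pre-experiment covariate and subtract the predictable part. A stdlib sketch on made-up pre/post spend data:

```python
from statistics import mean

def cuped_adjust(y, x):
    """CUPED adjustment: y_adj = y - theta * (x - mean(x)), where x is a
    pre-experiment covariate and theta = cov(x, y) / var(x).
    The mean of y is preserved; its variance shrinks by a factor (1 - r^2)."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var_x = sum((xi - mx) ** 2 for xi in x)
    theta = cov / var_x
    return [yi - theta * (xi - mx) for xi, yi in zip(x, y)]

def pvar(v):
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v) / len(v)

# Hypothetical: pre-period spend (x) strongly predicts in-experiment spend (y).
x = [10, 20, 30, 40, 50]
y = [12, 21, 33, 41, 55]
adjusted = cuped_adjust(y, x)
print(pvar(y), pvar(adjusted))  # same mean, much smaller variance
```

Smaller metric variance means smaller required sample sizes, which is the whole point; be ready to say why the covariate must be measured pre-experiment.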
Product Sense — Systems Thinking
- Metric ecosystem design for a whole product area
- Experiment trade-offs: speed vs accuracy vs power
- Long-term vs short-term metric misalignment
- Stakeholder alignment on ambiguous goals
- Identifying leading vs lagging indicators at scale
Your Phased Study Plan
Phase 1 — Advanced Technical (Weeks 1–2)
2 hours/day · Focus: Hard SQL, causal inference, scalable Python
- SQL: solve 5 Hard problems per week; annotate each with performance notes
- Statistics: implement DiD and PSM in Python from scratch
- Python: build a recursive log parser and a stateful event validator
- Read and implement the Advanced SQL Patterns & Techniques module
Phase 2 — Strategic Depth (Weeks 3–5)
2–2.5 hours/day · Focus: Complex experimentation and metric design
- A/B testing: implement CUPED variance reduction on a sample dataset
- Design a metric framework for a multi-sided marketplace
- Practice 4 hard analytical execution cases with written outputs
- Work through the network effects experiment design section
- Simulate a data-driven pushback conversation with a PM
Phase 3 — Leadership & Communication (Weeks 6–8)
1.5–2 hours/day · Focus: SPSIL stories, cross-team influence, and architectural awareness
- Write 3 SPSIL stories for cross-functional technical projects
- Prepare a metric retrospective: what would you change in a past project?
- Practice presenting complex findings to a non-technical audience in <5 minutes
- Review data engineering best practices (strategy + architecture)
E6 — Staff Data Engineer / Staff Data Scientist
Suggested timeline: 8–12 weeks
Staff-level IC. Acts as a technical force multiplier. Designs data systems at exabyte scale, drives architectural decisions, and leads through influence across multiple teams.
Who this is for: 8+ years experience. Targeting IC6/E6/Staff at Meta, Google, or Principal/Staff Eng at top-tier companies.
What to Expect in the Interview Loop
- SQL velocity screen (30 min, 5 problems, exabyte-scale optimization)
- Python ETL — productionalization, scalability, state management
- Data modeling — grain, SCD, Star vs OBT, mini-dimensions
- Distributed systems design — Lambda/Kappa/Medallion, exactly-once
- Product sense + metrics — North Star, RCA framework
- Leadership & ownership — SPSIL (Situation, Problem, Solution, Impact, Lessons) stories, architectural defense
Key Skills to Master
SQL — Exabyte-Scale Velocity
Challenge bar: 🔴 Hard (5 problems in 30 min)
- Partition pruning on ds column — always filter before any JOIN
- Salting skewed keys for hot-partition mitigation
- Broadcast join hints for small dimension tables
- Sessionization: LAG + cumulative SUM session ID pattern
- COALESCE, CAST — production guardrails
- EXPLAIN plan reasoning — state out loud
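The LAG + cumulative SUM sessionization pattern in the list above can be sketched end-to-end with sqlite3 (the events table and the 30-minute timeout are illustrative):

```python
import sqlite3

# Sessionization with a 30-minute (1800 s) inactivity timeout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (user_id INTEGER, ts INTEGER);
INSERT INTO events VALUES (1, 0), (1, 600), (1, 4000), (1, 4300), (2, 50);
""")

rows = conn.execute("""
WITH flagged AS (
  SELECT user_id, ts,
         -- 1 when the gap to the previous event exceeds the timeout;
         -- LAG is NULL on the first event, so the comparison yields 0.
         CASE WHEN ts - LAG(ts) OVER (PARTITION BY user_id ORDER BY ts) > 1800
              THEN 1 ELSE 0 END AS new_session
  FROM events
),
sessions AS (
  SELECT user_id, ts,
         -- cumulative sum of boundary flags numbers the sessions per user
         SUM(new_session) OVER (PARTITION BY user_id ORDER BY ts
                                ROWS UNBOUNDED PRECEDING) AS session_id
  FROM flagged
)
SELECT user_id, ts, session_id FROM sessions ORDER BY user_id, ts
""").fetchall()
print(rows)
```

In the real screen, state out loud that the flag CTE and the running sum must share the same PARTITION BY and ORDER BY, or the session IDs silently break.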
Python — Production ETL
- collections.Counter / defaultdict for O(1) aggregation
- Log parsing with re.compile (named groups)
- Recursive JSON flattening with max_depth guard
- State-machine event sequence validation
- 100× scalability: pandas → Spark / Kafka → Redis for state
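Recursive JSON flattening with a max_depth guard is a favorite E6 Python prompt. A self-contained sketch (the dotted-key convention and the sample event are invented):

```python
def flatten(obj, prefix="", max_depth=10):
    """Recursively flatten nested dicts/lists into dotted keys.
    The max_depth guard fails fast on pathological deeply nested payloads."""
    if max_depth < 0:
        raise ValueError("max depth exceeded")
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}          # leaf value: emit the accumulated key
    out = {}
    for k, v in items:
        key = f"{prefix}.{k}" if prefix else str(k)
        out.update(flatten(v, key, max_depth - 1))
    return out

event = {"user": {"id": 7, "tags": ["a", "b"]}, "ok": True}
flat = flatten(event)
print(flat)
```

Interviewers usually follow up with edge cases (empty containers, key collisions, depth bombs), so have an answer for each before they ask.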
Data Modeling — Architectural
- Grain definition before any table design
- Fact table types: Transactional, Periodic Snapshot, Accumulating
- SCD Type 1 vs Type 2 trade-off defense
- Star schema vs One Big Table (OBT)
- Mini-dimensions for rapidly changing attributes
- Bridge tables for many-to-many with weighting factors
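The SCD Type 2 mechanics (expire the open row, insert a new version) are easy to hold in your head once you have simulated them. A sketch where plain dicts stand in for warehouse rows (the 9999-12-31 sentinel is a common convention, not a requirement):

```python
from datetime import date

# Open rows carry a far-future end_date sentinel (common SCD2 convention).
OPEN_END = date(9999, 12, 31)

def scd2_apply(history, key, new_attrs, as_of):
    """Close the open row for `key` if its attributes changed, then append
    a new open row. A no-op when nothing changed."""
    current = next((r for r in history
                    if r["key"] == key and r["end_date"] == OPEN_END), None)
    if current and current["attrs"] == new_attrs:
        return history                  # no change: keep the current row open
    if current:
        current["end_date"] = as_of     # expire the old version
    history.append({"key": key, "attrs": new_attrs,
                    "start_date": as_of, "end_date": OPEN_END})
    return history

dim = []
scd2_apply(dim, "cust-1", {"tier": "free"}, date(2024, 1, 1))
scd2_apply(dim, "cust-1", {"tier": "paid"}, date(2024, 3, 1))
print(dim)  # two versions: the first closed, the second open
```

The trade-off to defend: Type 2 preserves history for point-in-time joins at the cost of row explosion, which is exactly why mini-dimensions exist for fast-changing attributes.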
Distributed Systems — Architecture
- Lambda vs Kappa vs Medallion architecture trade-offs
- Exactly-once via at-least-once + idempotent MERGE
- Log-based CDC (Debezium) vs query polling
- Backpressure, schema drift, small-file problem
- Reprocessing strategy and data lineage
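"Exactly-once via at-least-once + idempotent MERGE" compresses to: key the write on the event ID so replays are no-ops. A minimal simulation where a dict stands in for the target table:

```python
def merge_batch(table, batch):
    """Idempotent MERGE sketch: upsert keyed by event_id.
    Redelivering the same batch leaves the table unchanged, which is how
    at-least-once delivery yields exactly-once *effects* at the sink."""
    for event in batch:
        table[event["event_id"]] = event["value"]
    return table

table = {}
batch = [{"event_id": "e1", "value": 10}, {"event_id": "e2", "value": 20}]
merge_batch(table, batch)
merge_batch(table, batch)   # redelivery of the same batch (at-least-once)
print(table)                # state identical to a single delivery
```

In a real warehouse this is a MERGE INTO on a unique event key; the interview follow-up is usually about what happens when the same key arrives with different payloads, so decide on a last-write-wins or reject policy up front.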
Your Phased Study Plan
Phase 1 — SQL & Python Velocity (Weeks 1–2)
2–2.5 hours/day · Focus: Exabyte-scale SQL and production ETL patterns
- SQL: 5 Hard problems per day, timed at 6 min each — no IDE
- Annotate each SQL solution with partition pruning and JOIN strategy
- Python: implement log parser, JSON flattener, state validator from the E6 guide
- Write a 100× scale discussion for each Python exercise
Phase 2 — Architecture Mastery (Weeks 3–5)
2.5 hours/day · Focus: Data modeling and distributed systems
- Data modeling: design 3 star schemas from scratch — state grain first
- Implement SCD Type 2 logic in SQL (MERGE INTO with effective/expiry dates)
- Design a full Medallion pipeline for one of the system design scenarios
- Write an architectural comparison doc: Lambda vs Kappa for your domain
- Review the Staff Data Engineer (E6) six-module syllabus thoroughly
Phase 3 — Product Sense & Leadership (Weeks 6–8)
2 hours/day · Focus: Metrics framework and cross-team influence
- Define North Star + Leading + Guardrail metrics for 3 Meta products
- Practice the 4-layer RCA framework — write out SQL for each layer
- Prepare 4 SPSIL stories with architectural alternatives explicitly stated
- Conduct a granular project retrospective: what broke, why, and what you would change
Phase 4 — Mock & Polish (Weeks 9–12)
2–3 hours/day · Focus: Full-loop simulation and gap remediation
- Run a full mock interview loop (all 6 rounds) with a peer
- Timed SQL: 5 problems in 30 minutes — plain text, no autocomplete
- Architectural defense: have a peer probe your past systems for weaknesses
- Review the interview-day strategy table in the E6 syllabus