Role-Level Learning Path
Select your target role level to get a customized study plan, curated exercises, and level-appropriate examples.
E3 — Data Analyst
Suggested timeline: 6–8 weeks
Entry-level analytical role. Focus on clean SQL, descriptive statistics, data storytelling, and translating business questions into queries.
Who this is for: Graduating students, career changers, and analysts with fewer than 2 years of experience targeting IC3/E3 roles at Meta, Google, or comparable companies.
What to Expect in the Interview Loop
- SQL coding screen (30–45 min, medium SELECT/JOIN/GROUP BY)
- Case study — define a metric, interpret a chart
- Behavioral — "tell me about a project you worked on"
Key Skills to Master
SQL — Core Queries
Challenge bar: 🟢 Easy–Medium
- SELECT, WHERE, ORDER BY, LIMIT
- GROUP BY + aggregate functions (COUNT, SUM, AVG, MAX, MIN)
- INNER JOIN, LEFT JOIN across 2 tables
- Subqueries for simple filtering
- Basic date functions (DATE_DIFF, DATE_TRUNC)
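The core pattern behind most E3 SQL screens is an aggregate over a join. A minimal sketch using Python's built-in sqlite3 module (the customers/orders tables and all values are invented for illustration):

```python
import sqlite3

# Hypothetical two-table schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cal');
INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.0), (3, 2, 10.0);
""")

# LEFT JOIN keeps customers with no orders; GROUP BY + aggregates summarize.
rows = conn.execute("""
SELECT c.name,
       COUNT(o.id)                AS n_orders,
       COALESCE(SUM(o.amount), 0) AS total_spent
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.name
ORDER BY total_spent DESC
""").fetchall()
print(rows)  # Cal still appears, with zero orders, thanks to the LEFT JOIN
```

Note the COALESCE: SUM over zero rows returns NULL, which is exactly the kind of edge case interviewers probe.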
Python — Exploratory Analysis
- pandas: read_csv, head, describe, value_counts
- DataFrame selection, filtering, sorting
- groupby + aggregation
- Basic matplotlib / seaborn charts
- Handling missing values with fillna / dropna
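The pandas skills above compose into a few lines. A sketch on a tiny made-up frame standing in for a read_csv result:

```python
import pandas as pd

# Tiny in-memory frame standing in for read_csv output (made-up data).
df = pd.DataFrame({
    "city": ["NY", "SF", "NY", "SF", "NY"],
    "spend": [10.0, None, 30.0, 25.0, 5.0],
})

# Quick-look tools: value_counts() for frequencies, describe() for summaries.
print(df["city"].value_counts())

# Handle missing values, then aggregate per group.
clean = df.fillna({"spend": 0.0})
per_city = clean.groupby("city")["spend"].agg(["count", "mean"]).sort_values("mean")
print(per_city)
```

In an interview, narrate each step: what the missing value means, why fillna(0) is or is not defensible, and what the groupby grain is.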
Statistics — Descriptive
- Mean, median, mode — when to use each
- Variance and standard deviation
- Distributions: normal, skewed, bimodal
- Correlation vs causation
- Percentiles and outlier detection
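All of the descriptive measures above are in Python's standard statistics module, which is handy for checking your mental math. A sketch with invented session counts containing one obvious outlier:

```python
import statistics as st

# Hypothetical daily session counts, with one obvious outlier.
data = [4, 5, 5, 6, 7, 8, 9, 40]

print(st.mean(data), st.median(data), st.mode(data))
print(st.pstdev(data))  # population standard deviation

# IQR outlier rule: flag points beyond 1.5 * IQR from the quartiles.
q1, _, q3 = st.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
print(outliers)
```

Note how the outlier drags the mean (10.5) well above the median (6.5), which is exactly the "when to use each" talking point.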
Product Sense — Fundamentals
- Defining a North Star metric for a simple product
- Interpreting a drop in a metric (basic)
- User funnel basics (awareness → activation → retention)
Your Phased Study Plan
Phase 1 — Foundations (Weeks 1–2)
1–1.5 hours/day · Focus: Statistics and core SQL
- Work through Statistics & Probability module — focus on descriptive stats
- Complete SQL Basics: SELECT, WHERE, GROUP BY, JOINs
- Practice 3 Easy SQL problems on LeetCode or StrataScratch daily
- Read the SQL Cheat Sheet; reproduce every example by hand
Phase 2 — Analysis Skills (Weeks 3–4)
1.5–2 hours/day · Focus: Python EDA and metric thinking
- Python module: pandas groupby, merge, pivot
- Build a simple EDA notebook on a Kaggle dataset
- Practice defining a North Star + 3 supporting metrics for a familiar app
- Work through 2 beginner analytical execution case studies
Phase 3 — Interview Simulation (Weeks 5–8)
2 hours/day · Focus: Timed practice and behavioral prep
- Timed SQL: solve 2 Medium problems in <15 min each
- Write and refine 5 STAR stories using the Behavioral Interview guide
- Complete the Interview Day Checklist
- Do 2 mock analytical case studies with a study partner
E4 — Senior Analyst / Data Scientist
Suggested timeline: 4–6 weeks
Mid-level individual contributor. Owns analysis end-to-end, designs and interprets A/B tests, and partners cross-functionally to influence product decisions.
Who this is for: 2–5 years experience. Analysts targeting IC4/E4 at Meta, Google, Airbnb, or data science roles at growth-stage companies.
What to Expect in the Interview Loop
- SQL coding screen (45 min, window functions, CTEs, multi-table joins)
- A/B testing — design an experiment, interpret results, flag pitfalls
- Product sense — define metrics, diagnose a metric drop
- Behavioral — impact, collaboration, data-driven decision making
Key Skills to Master
SQL — Window Functions & CTEs
Challenge bar: 🟡 Medium
- Window functions: ROW_NUMBER, RANK, LAG, LEAD
- CTEs for multi-step logic
- Self-joins for sequential event analysis
- CASE WHEN for conditional aggregation
- Date spine / calendar table patterns
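ROW_NUMBER and LAG are the two window functions that show up most often at this level. A runnable sketch via Python's bundled sqlite3 (SQLite ≥ 3.25 supports window functions; the events table is invented):

```python
import sqlite3

# Hypothetical events table: one row per user action.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (user_id INTEGER, ts INTEGER, page TEXT);
INSERT INTO events VALUES
  (1, 100, 'home'), (1, 160, 'search'), (1, 400, 'home'),
  (2, 120, 'home');
""")

# ROW_NUMBER orders events per user; LAG exposes the previous timestamp,
# the building block for gap, funnel, and sequential-event analysis.
rows = conn.execute("""
WITH ordered AS (
  SELECT user_id, ts,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ts) AS rn,
         LAG(ts)      OVER (PARTITION BY user_id ORDER BY ts) AS prev_ts
  FROM events
)
SELECT user_id, rn, ts - prev_ts AS gap_s
FROM ordered
ORDER BY user_id, rn
""").fetchall()
print(rows)  # first event per user has a NULL gap
```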
Python — Production-Ready ETL
- pandas: merge, pivot_table, apply, lambda
- Cohort retention matrix with groupby
- Basic regex for text field parsing
- scipy.stats for t-test and chi-square
- Clean, reusable function design
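It is worth being able to compute a two-sample (Welch's) t statistic by hand, not just call scipy. A sketch with made-up samples, using only the standard library:

```python
import math
import statistics as st

# Made-up metric samples for two groups (illustrative only).
a = [12.1, 11.8, 12.5, 12.0, 11.9]
b = [11.2, 11.5, 11.0, 11.4, 11.3]

# Welch's t: difference in means over the unpooled standard error.
m1, m2 = st.mean(a), st.mean(b)
v1, v2 = st.variance(a), st.variance(b)  # sample variances (n - 1 denominator)
se = math.sqrt(v1 / len(a) + v2 / len(b))
t = (m1 - m2) / se
print(round(t, 2))
```

scipy.stats.ttest_ind(a, b, equal_var=False) returns the same statistic along with the p-value; in the interview, say which variance assumption you are making and why.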
Statistics — Inferential
- Hypothesis testing: null/alternative, p-value, alpha
- Two-sample t-test and proportion z-test
- Statistical power and sample size calculation
- Type I vs Type II errors
- Confidence intervals and effect sizes
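Sample size questions follow one common approximation for a two-proportion z-test; the function below is a sketch of it (names and the 10% → 12% scenario are invented), using statistics.NormalDist for the z quantiles:

```python
import math
from statistics import NormalDist

def sample_size_per_arm(p1, p2, alpha=0.05, power=0.8):
    """Approximate per-arm n for a two-proportion z-test:
    n = (z_{1-a/2} + z_power)^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)          # ~0.84 for 80% power
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_a + z_b) ** 2 * var / (p1 - p2) ** 2)

# Detecting a lift from 10% to 12% conversion needs roughly 3.8k users per arm:
n = sample_size_per_arm(0.10, 0.12)
print(n)
```

The qualitative takeaways matter as much as the formula: smaller effects need quadratically more users, and higher power or lower alpha both push n up.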
Product Sense — Metrics & Experiments
- North Star + Leading + Guardrail metric tiers
- A/B test design: unit of randomization, SRM checks
- Novelty effects and user bias
- Root cause analysis: 4-layer framework
- Funnel analysis and cohort retention
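An SRM check is a chi-square goodness-of-fit test against the intended split. A stdlib-only sketch for the 50/50, one-degree-of-freedom case (the user counts are invented; for 1 df, P(chi² > x) = 2(1 − Φ(√x))):

```python
import math
from statistics import NormalDist

def srm_pvalue(n_control, n_treatment):
    """Chi-square goodness-of-fit p-value vs an intended 50/50 split.
    With 1 degree of freedom, P(chi2 > x) = 2 * (1 - Phi(sqrt(x)))."""
    total = n_control + n_treatment
    expected = total / 2
    chi2 = ((n_control - expected) ** 2 / expected
            + (n_treatment - expected) ** 2 / expected)
    return 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))

# A 50.4% / 49.6% split over a million users is a red flag:
p = srm_pvalue(504_000, 496_000)
print(p)  # far below the usual 0.001 SRM alert threshold
```

If the p-value trips the threshold, the standard move is to stop reading results and debug assignment before trusting anything downstream.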
Your Phased Study Plan
Phase 1 — Technical Refresh (Days 1–7)
1.5–2 hours/day · Focus: Window functions and inferential statistics
- SQL module: window functions — write 10 queries using ROW_NUMBER, LAG, LEAD
- Statistics module: hypothesis testing and confidence intervals
- Solve 5 Medium SQL problems on LeetCode (Database) under timed conditions
- Read Advanced SQL Patterns & Techniques supplementary guide
Phase 2 — Analytical Depth (Days 8–21)
2–2.5 hours/day · Focus: A/B testing, product sense, Python ETL
- A/B Testing module: sample size calc, SRM, novelty effects
- Python: build a cohort retention matrix from scratch
- Practice 3 analytical execution case studies with the SPSIL (Situation, Problem, Solution, Impact, Lessons) framework
- Define metrics for 2 unfamiliar products (pick from app stores)
- Complete the 21-Day Sprint Days 8–14 exercises
Phase 3 — Interview Simulation (Days 22–42)
2 hours/day · Focus: Mock interviews and behavioral polish
- Timed SQL: 5 Medium + 2 Hard problems per week
- Write 8 STAR stories — ensure each covers a Meta value
- Practice the 4-layer RCA framework on 3 real anomaly scenarios
- Complete behavioral mock interview guide
- Review SQL Interview Problems — all 15 problems with self-grading
E5 — Senior Data Scientist
Suggested timeline: 6–8 weeks
Senior IC. Drives complex multi-team experiments, mentors junior analysts, designs measurement frameworks, and contributes to technical architecture decisions.
Who this is for: 5+ years experience. Targeting IC5/E5/Senior DS at Meta, Google, or Staff/Lead Analyst at companies like Stripe, Shopify, or Airbnb.
What to Expect in the Interview Loop
- SQL coding screen (60 min, optimization, complex window functions, performance reasoning)
- Advanced A/B testing — causal inference, network effects, switchback experiments
- Product sense — system-level metric design, trade-off analysis
- Analytical execution — ambiguous open-ended case, structure and present findings
- Cross-functional leadership — behavioral + influence stories
Key Skills to Master
SQL — Optimization & Scale
Challenge bar: 🔴 Hard
- Query optimization: EXPLAIN, index usage, predicate pushdown
- Sessionization with LAG + cumulative SUM patterns
- Multiple CTEs chained for complex multi-step pipelines
- Approximate aggregations (APPROX_COUNT_DISTINCT)
- Performance trade-offs: broadcast vs shuffle joins
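Chained CTEs are how you keep a multi-step pipeline legible: each CTE owns one grain change. A toy, runnable sketch via sqlite3 (tables and values invented):

```python
import sqlite3

# Toy orders table; each CTE below performs exactly one transformation.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (user_id INTEGER, ds TEXT, amount REAL);
INSERT INTO orders VALUES
  (1, '2024-01-01', 10), (1, '2024-01-02', 15),
  (2, '2024-01-01', 50), (3, '2024-01-02', 5);
""")

rows = conn.execute("""
WITH daily AS (                      -- step 1: aggregate to user-day grain
  SELECT user_id, ds, SUM(amount) AS spend
  FROM orders GROUP BY user_id, ds
),
per_user AS (                        -- step 2: roll up to user grain
  SELECT user_id, SUM(spend) AS total FROM daily GROUP BY user_id
)
SELECT user_id, total                -- step 3: filter on the derived column
FROM per_user WHERE total >= 20
ORDER BY total DESC
""").fetchall()
print(rows)
```

At E5 you are expected to narrate the grain of each CTE and where the optimizer can prune or push predicates down.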
Python — Statistical & Scalable
- Difference-in-differences implementation
- Propensity score matching for observational studies
- Custom A/B test variance reduction (CUPED)
- Recursive JSON flattening and log parsing
- Scalability discussion: from pandas to Spark
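The DiD point estimate itself is one line of arithmetic; the hard part is defending the parallel-trends assumption. A stdlib sketch with invented per-user engagement scores:

```python
from statistics import mean

def diff_in_diff(pre_t, post_t, pre_c, post_c):
    """DiD point estimate from unit-level outcomes: the treated group's
    change minus the control group's change. Valid only under the
    parallel-trends assumption."""
    return (mean(post_t) - mean(pre_t)) - (mean(post_c) - mean(pre_c))

# Hypothetical engagement scores per user (made-up numbers):
effect = diff_in_diff(
    pre_t=[10, 11, 9],  post_t=[14, 15, 13],
    pre_c=[10, 12, 11], post_c=[11, 13, 12],
)
print(effect)  # treated moved +4, control moved +1, so the estimate is +3
```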
Statistics — Causal Inference
- Causal inference: RCT vs quasi-experiments
- Difference-in-differences (DiD)
- Propensity score matching (PSM)
- CUPED / variance reduction
- Multiple testing corrections (Bonferroni, FDR)
- Network effects and interference
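CUPED is worth implementing once to internalize: regress the experiment metric on a pre-experiment covariate and subtract the predictable part. A stdlib sketch on made-up pre/post spend data:

```python
from statistics import mean

def cuped_adjust(y, x):
    """CUPED adjustment: y_adj = y - theta * (x - mean(x)), where x is a
    pre-experiment covariate and theta = cov(x, y) / var(x).
    The mean of y is preserved; its variance shrinks by a factor (1 - r^2)."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var_x = sum((xi - mx) ** 2 for xi in x)
    theta = cov / var_x
    return [yi - theta * (xi - mx) for xi, yi in zip(x, y)]

def pvar(v):
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v) / len(v)

# Hypothetical: pre-period spend (x) strongly predicts in-experiment spend (y).
x = [10, 20, 30, 40, 50]
y = [12, 21, 33, 41, 55]
adjusted = cuped_adjust(y, x)
print(pvar(y), pvar(adjusted))  # same mean, much smaller variance
```

Smaller metric variance means smaller required sample sizes, which is the whole point; be ready to say why the covariate must be measured pre-experiment.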
Product Sense — Systems Thinking
- Metric ecosystem design for a whole product area
- Experiment trade-offs: speed vs accuracy vs power
- Long-term vs short-term metric misalignment
- Stakeholder alignment on ambiguous goals
- Identifying leading vs lagging indicators at scale
Your Phased Study Plan
Phase 1 — Advanced Technical (Weeks 1–2)
2 hours/day · Focus: Hard SQL, causal inference, scalable Python
- SQL: solve 5 Hard problems per week; annotate each with performance notes
- Statistics: implement DiD and PSM in Python from scratch
- Python: build a recursive log parser and a stateful event validator
- Read and implement the Advanced SQL Patterns & Techniques module
Phase 2 — Strategic Depth (Weeks 3–5)
2–2.5 hours/day · Focus: Complex experimentation and metric design
- A/B testing: implement CUPED variance reduction on a sample dataset
- Design a metric framework for a multi-sided marketplace
- Practice 4 hard analytical execution cases with written outputs
- Work through the network effects experiment design section
- Simulate a data-driven pushback conversation with a PM
Phase 3 — Leadership & Communication (Weeks 6–8)
1.5–2 hours/day · Focus: SPSIL stories, cross-team influence, and architectural awareness
- Write 3 SPSIL stories for cross-functional technical projects
- Prepare a metric retrospective: what would you change in a past project?
- Practice presenting complex findings to a non-technical audience in <5 minutes
- Review data engineering best practices (strategy + architecture)
E6 — Staff Data Engineer / Staff Data Scientist
Suggested timeline: 8–12 weeks
Staff-level IC. Acts as a technical force multiplier. Designs data systems at exabyte scale, drives architectural decisions, and leads through influence across multiple teams.
Who this is for: 8+ years experience. Targeting IC6/E6/Staff at Meta, Google, or Principal/Staff Eng at top-tier companies.
What to Expect in the Interview Loop
- SQL velocity screen (30 min, 5 problems, exabyte-scale optimization)
- Python ETL — productionalization, scalability, state management
- Data modeling — grain, SCD, Star vs OBT, mini-dimensions
- Distributed systems design — Lambda/Kappa/Medallion, exactly-once
- Product sense + metrics — North Star, RCA framework
- Leadership & ownership — SPSIL (Situation, Problem, Solution, Impact, Lessons) stories, architectural defense
Key Skills to Master
SQL — Exabyte-Scale Velocity
Challenge bar: 🔴 Hard (5 problems in 30 min)
- Partition pruning on ds column — always filter before any JOIN
- Salting skewed keys for hot-partition mitigation
- Broadcast join hints for small dimension tables
- Sessionization: LAG + cumulative SUM session ID pattern
- COALESCE, CAST — production guardrails
- EXPLAIN plan reasoning — state out loud
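The LAG + cumulative SUM sessionization pattern in the list above can be sketched end-to-end with sqlite3 (the events table and the 30-minute timeout are illustrative):

```python
import sqlite3

# Sessionization with a 30-minute (1800 s) inactivity timeout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (user_id INTEGER, ts INTEGER);
INSERT INTO events VALUES (1, 0), (1, 600), (1, 4000), (1, 4300), (2, 50);
""")

rows = conn.execute("""
WITH flagged AS (
  SELECT user_id, ts,
         -- 1 when the gap to the previous event exceeds the timeout;
         -- LAG is NULL on the first event, so the comparison yields 0.
         CASE WHEN ts - LAG(ts) OVER (PARTITION BY user_id ORDER BY ts) > 1800
              THEN 1 ELSE 0 END AS new_session
  FROM events
),
sessions AS (
  SELECT user_id, ts,
         -- cumulative sum of boundary flags numbers the sessions per user
         SUM(new_session) OVER (PARTITION BY user_id ORDER BY ts
                                ROWS UNBOUNDED PRECEDING) AS session_id
  FROM flagged
)
SELECT user_id, ts, session_id FROM sessions ORDER BY user_id, ts
""").fetchall()
print(rows)
```

In the real screen, state out loud that the flag CTE and the running sum must share the same PARTITION BY and ORDER BY, or the session IDs silently break.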
Python — Production ETL
- collections.Counter / defaultdict for O(1) aggregation
- Log parsing with re.compile (named groups)
- Recursive JSON flattening with max_depth guard
- State-machine event sequence validation
- 100× scalability: pandas → Spark / Kafka → Redis for state
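Recursive JSON flattening with a max_depth guard is a favorite E6 Python prompt. A self-contained sketch (the dotted-key convention and the sample event are invented):

```python
def flatten(obj, prefix="", max_depth=10):
    """Recursively flatten nested dicts/lists into dotted keys.
    The max_depth guard fails fast on pathological deeply nested payloads."""
    if max_depth < 0:
        raise ValueError("max depth exceeded")
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}          # leaf value: emit the accumulated key
    out = {}
    for k, v in items:
        key = f"{prefix}.{k}" if prefix else str(k)
        out.update(flatten(v, key, max_depth - 1))
    return out

event = {"user": {"id": 7, "tags": ["a", "b"]}, "ok": True}
flat = flatten(event)
print(flat)
```

Interviewers usually follow up with edge cases (empty containers, key collisions, depth bombs), so have an answer for each before they ask.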
Data Modeling — Architectural
- Grain definition before any table design
- Fact table types: Transactional, Periodic Snapshot, Accumulating
- SCD Type 1 vs Type 2 trade-off defense
- Star schema vs One Big Table (OBT)
- Mini-dimensions for rapidly changing attributes
- Bridge tables for many-to-many with weighting factors
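The SCD Type 2 mechanics (expire the open row, insert a new version) are easy to hold in your head once you have simulated them. A sketch where plain dicts stand in for warehouse rows (the 9999-12-31 sentinel is a common convention, not a requirement):

```python
from datetime import date

# Open rows carry a far-future end_date sentinel (common SCD2 convention).
OPEN_END = date(9999, 12, 31)

def scd2_apply(history, key, new_attrs, as_of):
    """Close the open row for `key` if its attributes changed, then append
    a new open row. A no-op when nothing changed."""
    current = next((r for r in history
                    if r["key"] == key and r["end_date"] == OPEN_END), None)
    if current and current["attrs"] == new_attrs:
        return history                  # no change: keep the current row open
    if current:
        current["end_date"] = as_of     # expire the old version
    history.append({"key": key, "attrs": new_attrs,
                    "start_date": as_of, "end_date": OPEN_END})
    return history

dim = []
scd2_apply(dim, "cust-1", {"tier": "free"}, date(2024, 1, 1))
scd2_apply(dim, "cust-1", {"tier": "paid"}, date(2024, 3, 1))
print(dim)  # two versions: the first closed, the second open
```

The trade-off to defend: Type 2 preserves history for point-in-time joins at the cost of row explosion, which is exactly why mini-dimensions exist for fast-changing attributes.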
Distributed Systems — Architecture
- Lambda vs Kappa vs Medallion architecture trade-offs
- Exactly-once via at-least-once + idempotent MERGE
- Log-based CDC (Debezium) vs query polling
- Backpressure, schema drift, small-file problem
- Reprocessing strategy and data lineage
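"Exactly-once via at-least-once + idempotent MERGE" compresses to: key the write on the event ID so replays are no-ops. A minimal simulation where a dict stands in for the target table:

```python
def merge_batch(table, batch):
    """Idempotent MERGE sketch: upsert keyed by event_id.
    Redelivering the same batch leaves the table unchanged, which is how
    at-least-once delivery yields exactly-once *effects* at the sink."""
    for event in batch:
        table[event["event_id"]] = event["value"]
    return table

table = {}
batch = [{"event_id": "e1", "value": 10}, {"event_id": "e2", "value": 20}]
merge_batch(table, batch)
merge_batch(table, batch)   # redelivery of the same batch (at-least-once)
print(table)                # state identical to a single delivery
```

In a real warehouse this is a MERGE INTO on a unique event key; the interview follow-up is usually about what happens when the same key arrives with different payloads, so decide on a last-write-wins or reject policy up front.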
Your Phased Study Plan
Phase 1 — SQL & Python Velocity (Weeks 1–2)
2–2.5 hours/day · Focus: Exabyte-scale SQL and production ETL patterns
- SQL: 5 Hard problems per day, timed at 6 min each — no IDE
- Annotate each SQL solution with partition pruning and JOIN strategy
- Python: implement log parser, JSON flattener, state validator from the E6 guide
- Write a 100× scale discussion for each Python exercise
Phase 2 — Architecture Mastery (Weeks 3–5)
2.5 hours/day · Focus: Data modeling and distributed systems
- Data modeling: design 3 star schemas from scratch — state grain first
- Implement SCD Type 2 logic in SQL (MERGE INTO with effective/expiry dates)
- Design a full Medallion pipeline for one of the system design scenarios
- Write an architectural comparison doc: Lambda vs Kappa for your domain
- Review the Staff Data Engineer (E6) six-module syllabus thoroughly
Phase 3 — Product Sense & Leadership (Weeks 6–8)
2 hours/day · Focus: Metrics framework and cross-team influence
- Define North Star + Leading + Guardrail metrics for 3 Meta products
- Practice the 4-layer RCA framework — write out SQL for each layer
- Prepare 4 SPSIL stories with architectural alternatives explicitly stated
- Conduct a granular project retrospective: what broke, why, and what you would change
Phase 4 — Mock & Polish (Weeks 9–12)
2–3 hours/day · Focus: Full-loop simulation and gap remediation
- Run a full mock interview loop (all 6 rounds) with a peer
- Timed SQL: 5 problems in 30 minutes — plain text, no autocomplete
- Architectural defense: have a peer probe your past systems for weaknesses
- Review the interview-day strategy table in the E6 syllabus