SQL Practice Problems for Meta Data Science Interviews

This section provides complex SQL practice problems designed to simulate real-world scenarios at a social media company like Meta. These problems require a deeper understanding of SQL and analytical thinking. Solutions will be provided separately.

Problem Index

Problem	Difficulty	Type of Problem	Link
1	Medium	Joins, Aggregations	#problem-1-department-highest-salary
2	Medium	Self Joins, Consecutive Rows	#problem-2-consecutive-numbers
3	Medium	Subqueries, NOT IN	#problem-3-customers-who-never-order
4	Medium	Subqueries, MAX	#problem-4-second-highest-salary
5	Medium	Self Joins, Date Functions	#problem-5-rising-temperature
6	Advanced	Joins, Aggregations, Dates	#problem-6-engagement-rate-by-content-type
7	Advanced	Self Joins, Date Functions, Churn	#problem-7-churn-rate-by-user-segment
8	Advanced	Self Joins, Date Differences	#problem-8-average-time-between-posts
9	Advanced	String Manipulation, Window Functions, Dates	#problem-9-most-popular-hashtags-by-week
10	Advanced	Cohort Analysis, Date Functions	#problem-10-retained-users-by-cohort
11	Advanced	Self Joins, Counting Connections	#problem-11-mutual-connections
12	Advanced	Joins, Aggregations, Age Grouping	#problem-12-average-post-engagement-by-user-age-group
13	Advanced	Time Series Analysis, Percentage Change	#problem-13-trending-topics-over-time
14	Advanced	Funnel Analysis, Conversion Rates	#problem-14-user-engagement-funnel

Problem 1: Department Highest Salary (LeetCode 184)

Problem Statement: Write an SQL query to find employees who have the highest salary in each of their departments.

Table Schema:

Employee table:
- Id (int): Primary key.
- Name (varchar): Employee name.
- Salary (int): Employee salary.
- DepartmentId (int): Foreign key referencing the Department table.
Department table:
- Id (int): Primary key.
- Name (varchar): Department name.

Example:

Employee table:

Id	Name	Salary	DepartmentId
1	Joe	70000	1
2	Henry	80000	2
3	Sam	60000	2
4	Max	90000	1
5	Janet	69000	1
6	Randy	85000	1

Department table:

Id	Name
1	IT
2	Sales

Expected Output:

Department	Employee	Salary
IT	Max	90000
Sales	Henry	80000

Problem 2: Consecutive Numbers (LeetCode 180)

Problem Statement: Write an SQL query to find all numbers that appear at least three times consecutively.

Table Schema:

Logs table:
- Id (int): Primary key.
- Num (int): The number in the log.

Example:

Logs table:

Id	Num
1	1
2	1
3	1
4	2
5	1
6	2
7	2

Expected Output:

Num
1

Problem 3: Customers Who Never Order (LeetCode 183)

Problem Statement: Write an SQL query to find all customers who never place an order.

Table Schema:

Customers table:
- Id (int): Primary key.
- Name (varchar): Customer name.
Orders table:
- Id (int): Primary key.
- CustomerId (int): Foreign key referencing the Customers table.

Example:

Customers table:

Id	Name
1	Joe
2	Henry
3	Sam
4	Max

Orders table:

Id	CustomerId
1	3
2	1

Expected Output:

Name
Henry
Max

Problem 4: Second Highest Salary (LeetCode 176)

Problem Statement: Write an SQL query to get the second highest salary from the Employee table. If there is no second highest salary, then the query should return null.

Table Schema:

Employee table:
- Id (int): Primary key.
- Salary (int): Employee salary.

Example:

Employee table:

Id	Salary
1	100
2	200
3	300

Expected Output:

SecondHighestSalary
200

Problem 5: Rising Temperature (LeetCode 197)

Problem Statement: Write an SQL query to find all dates’ Id with higher temperatures compared to its previous dates (yesterday’s dates).

Table Schema:

Weather table:
- Id (int): Primary key.
- RecordDate (date): Date of the weather record.
- Temperature (int): Temperature on that day.

Example:

Weather table:

Id	RecordDate	Temperature
1	2015-01-01	10
2	2015-01-02	25
3	2015-01-03	20
4	2015-01-04	30

Expected Output:

Id
2
4

Problem 6: Engagement Rate by Content Type

Problem Statement: You are a data scientist at a social media platform. The product team is considering investing in more video content. You need to analyze user engagement across different content types (photo, video, text) to determine if this is a worthwhile investment. Write an SQL query to calculate the average engagement rate (likes + comments + shares / impressions) for each content type, broken down by month.

Table Schema:

Posts table:
- post_id (INT, PRIMARY KEY): Unique identifier for each post.
- user_id (INT): ID of the user who created the post.
- content_type (VARCHAR): Type of content (‘photo’, ‘video’, ‘text’).
- created_at (TIMESTAMP): Timestamp of post creation.
- impressions (INT): Number of times the post was shown to users.
Reactions table:
- reaction_id (INT, PRIMARY KEY): Unique identifier for each reaction.
- post_id (INT): ID of the post that received the reaction.
- reaction_type (VARCHAR): Type of reaction (‘like’, ‘comment’, ‘share’).

Example: (Simplified for brevity)

Posts table:

post_id	user_id	content_type	created_at	impressions
1	101	photo	2024-01-15 10:00:00	1000
2	102	video	2024-01-20 12:00:00	500
3	101	text	2024-02-01 09:00:00	2000
4	103	video	2024-02-10 15:00:00	1500

Reactions table:

reaction_id	post_id	reaction_type
1	1	like
2	1	comment
3	2	like
4	2	share
5	3	comment
6	4	like
7	4	like
8	4	share

Expected Output (Conceptual):

month	content_type	avg_engagement_rate
2024-01	photo	0.002
2024-01	video	0.004
2024-02	text	0.0005
2024-02	video	0.00133

Problem 7: Churn Rate by User Segment

Problem Statement: You are tasked with analyzing user churn. Calculate the churn rate for different user segments based on their signup date (users who signed up in the first half of the month vs. the second half). Churn is defined as users who were active in the previous month but did not have any activity in the current month.

Table Schema:

Users table:
- user_id (INT, PRIMARY KEY): Unique identifier for each user.
- signup_date (DATE): Date when the user signed up.
UserActivity table:
- user_id (INT): ID of the user.
- activity_date (DATE): Date of user activity.

Example: (Simplified)

Users table:

user_id	signup_date
1	2024-01-05
2	2024-01-20
3	2024-02-02
4	2024-02-18

UserActivity table:

user_id	activity_date
1	2024-01-10
1	2024-02-05
2	2024-01-25
3	2024-02-15

Expected Output (Conceptual for February 2024):

signup_segment	churn_rate
First Half	0.0
Second Half	1.0

Problem 8: Average Time Between Posts

Problem Statement: Analyze how frequently users post. Calculate the average time (in days) between posts for users who have made at least two posts.

Table Schema:

Posts table:
- post_id (INT, PRIMARY KEY): Unique identifier for each post.
- user_id (INT): ID of the user who created the post.
- created_at (TIMESTAMP): Timestamp of post creation.

Example:

Posts table:

post_id	user_id	created_at
1	101	2024-01-01 10:00:00
2	101	2024-01-05 12:00:00
3	102	2024-01-10 15:00:00
4	101	2024-01-12 08:00:00

Expected Output:

user_id	avg_days_between_posts
101	3

Problem 9: Most Popular Hashtags by Week

Problem Statement: Determine the most popular hashtags used each week. “Popularity” is defined by the number of posts containing the hashtag.

Table Schema:

Posts table:
- post_id (INT, PRIMARY KEY): Unique identifier for each post.
- content (TEXT): The content of the post (may contain hashtags).
- created_at (TIMESTAMP): Timestamp of post creation.

Example:

Posts table:

post_id	content	created_at
1	Check out this #amazing pic	2024-01-01 10:00:00
2	Another #amazing day	2024-01-03 12:00:00
3	Just a #random thought	2024-01-08 15:00:00
4	#amazing view	2024-01-10 08:00:00

Expected Output (Conceptual - assuming week starts on Monday):

week_start	hashtag	post_count
2023-12-31	#amazing	2
2024-01-07	#random	1

Problem 10: Retained Users by Cohort

Problem Statement: Analyze user retention by cohort. A cohort is defined by the month a user signed up. Calculate the percentage of users from each cohort who are still active (have at least one activity) in the current month (let’s assume it’s March 2024 for this example).

Table Schema:

Users table:
- user_id (INT, PRIMARY KEY): Unique identifier for each user.
- signup_date (DATE): Date when the user signed up.
UserActivity table:
- user_id (INT): ID of the user.
- activity_date (DATE): Date of user activity.

Example:

Users table:

user_id	signup_date
1	2024-01-15
2	2024-01-28
3	2024-02-05
4	2024-02-20
5	2024-03-10

UserActivity table:

user_id	activity_date
1	2024-01-20
1	2024-03-01
2	2024-01-30
3	2024-02-10
3	2024-03-15
5	2024-03-20

Expected Output (for March 2024):

signup_month	retention_rate
2024-01	50.0	(1 out of 2 Jan users active in Mar)
2024-02	50.0	(1 out of 2 Feb users active in Mar)
2024-03	100.0	(1 out of 1 Mar users active in Mar)

Clarification:

The signup_month should be formatted as ‘YYYY-MM’.
The retention_rate should be a percentage (e.g., 50.0, 75.0, 100.0).
If a cohort has no users active in the current month, the retention rate should be 0.0.

Problem 11: Mutual Connections

Problem Statement: You are analyzing user connections on the platform. Write an SQL query to find all pairs of users who have at least three mutual connections.

Table Schema:

Connections table:
- user_id1 (INT): ID of the first user in the connection.
- user_id2 (INT): ID of the second user in the connection.
- (Note: Connections are undirected, meaning if (1, 2) exists, (2, 1) does not need to exist. You should treat them as the same connection.)

Example:

Connections table:

user_id1	user_id2
1	2
1	3
1	4
2	3
2	4
2	5
3	4
3	5
4	5

Expected Output:

user1	user2
2	3
2	4
2	5
3	4
3	5
4	5

(Users 2, 3, and 4 have 3 mutual connections with user 5)

Problem 12: Average Post Engagement by User Age Group

Problem Statement: You are analyzing post engagement based on user demographics. Calculate the average number of likes, comments, and shares per post for different user age groups (e.g., 18-24, 25-34, 35+).

Table Schema:

Users table:
- user_id (INT, PRIMARY KEY): Unique identifier for each user.
- age (INT): User’s age.
Posts table:
- post_id (INT, PRIMARY KEY): Unique identifier for each post.
- user_id (INT): ID of the user who created the post.
Reactions table:
- reaction_id (INT, PRIMARY KEY): Unique identifier for each reaction.
- post_id (INT): ID of the post that received the reaction.
- reaction_type (VARCHAR): Type of reaction (‘like’, ‘comment’, ‘share’).

Example: (Simplified)

(Example data would be provided similarly to previous problems)

Expected Output (Conceptual):

age_group	avg_likes	avg_comments	avg_shares
18-24	10.5	2.2	1.1
25-34	15.2	3.8	2.5
35+	8.7	1.5	0.8

Problem Statement: Identify trending topics on the platform over time. A topic is represented by a keyword. A topic is considered trending in a given week if its usage (number of posts containing the keyword) increases by at least 20% compared to the previous week.

Table Schema:

Posts table:
- post_id (INT, PRIMARY KEY): Unique identifier for each post.
- content (TEXT): The content of the post.
- created_at (TIMESTAMP): Timestamp of post creation.

Example: (Simplified)

(Example data would be provided similarly to previous problems)

Expected Output (Conceptual - assuming week starts on Monday):

week_start	keyword	percent_increase
2024-01-07	#newtopic	25.0

Problem 14: User Engagement Funnel

Problem Statement: Analyze the user engagement funnel for a specific feature (e.g., creating a story). Calculate the conversion rates between different stages of the funnel:

Viewed the feature introduction.
Started creating a story.
Successfully posted a story.

Table Schema:

UserActions table:
- user_id (INT): ID of the user.
- action (VARCHAR): Type of user action (‘viewed_story_intro’, ‘started_story_creation’, ‘posted_story’).
- action_timestamp (TIMESTAMP): Timestamp of the action.

Example: (Simplified)

(Example data would be provided similarly to previous problems)

Expected Output (Conceptual):

stage	conversion_rate
Viewed Intro -> Started Creation	60.0
Started Creation -> Posted Story	75.0

SQL Practice Problems for Meta Data Science Interviews

Problem Index

Problem 1: Department Highest Salary (LeetCode 184)

Problem 2: Consecutive Numbers (LeetCode 180)

Problem 3: Customers Who Never Order (LeetCode 183)

Problem 4: Second Highest Salary (LeetCode 176)

Problem 5: Rising Temperature (LeetCode 197)

Problem 6: Engagement Rate by Content Type

Problem 7: Churn Rate by User Segment

Problem 8: Average Time Between Posts

Problem 9: Most Popular Hashtags by Week

Problem 10: Retained Users by Cohort

Problem 11: Mutual Connections

Problem 12: Average Post Engagement by User Age Group

Problem 13: Trending Topics Over Time

Problem 14: User Engagement Funnel