Bellabeat Case Study on User Activity with SQL & Tableau

Amit Kumar
Analyst’s corner
Published in
7 min readFeb 5, 2024

--

Introduction

Bellabeat is a high-tech company that manufactures health-focused smart products including the Bellabeat app, Leaf, Time, and Spring. It also offers a subscription-based membership program for users giving them access to personalized guidance on having a healthy lifestyle. Bellabeat has positioned itself as a tech-driven wellness company for women.

Bellabeat Logo

Bellabeat has been investing extensively in digital marketing including Google Search and being active on social media platforms. The co-founder Sršen knows that an analysis of Bellabeat’s available consumer data would reveal more growth opportunities.

I conducted a casе study using thе R languagе and Tablеau. In this study, I thoroughly еxplainеd the analysis process. If you’rе intеrеstеd, you can chеck it out for a morе in dеpth undеrstanding.

Statement of business task

Smart devices are a big part of people’s everyday life. As a smart device manufacturer, Bellabeat can benefit from learning the trend of smart device usage and make data-driven business strategies to explore growth opportunities.

Key stakeholders

  • Urška Sršen: Bellabeat’s co-founder and Chief Creative Officer
  • Sando Mur: Mathematician and Bellabeat’s co-founder; a key member of the Bellabeat executive team

Good starting points would be:

  1. Investigating the products that have similar functionality
  2. Drill down to explore user behaviors when using the product to identify any usage trends
  3. Apply the trend on Bellabeat products to identify recommendations on functionality and marketing strategies

Insights from the investigation can help Bellabeat to identify weaknesses of their products and new functions or even inspire ideas on new products.

It can also help to inform marketing strategies like knowing the segment of the age group that is the most active user of smart devices. Bellabeat can invest in focusing on a specific age group when doing marketing campaigns in the future.

Prepare the data

  • The data used in this analysis is the Fitbit Fitness Tracker Data made available by Mobius and stored on Kaggle.
  • This dataset is under CC0: Public Domain license meaning the creator has waived his right to the work under the copyright law.
  • The data contains personal fitness trackers from thirty Fitbit users. Thirty eligible Fitbit users consented to submit personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.
  • These datasets were generated by respondents to a distributed survey via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016.
  • The dataset has in total of 18 files in .csv format organized in long format.

ROCCC analysis

  • Reliability: LOW — The dataset was collected from 30 individuals whose gender is unknown.
  • Originality: LOW — third-party data collected using Amazon Mechanical Turk.
  • Comprehensive: MEDIUM — dataset contains multiple fields on daily activity intensity, calories used, daily steps taken, daily sleep time, and weight record.
  • Current: MEDIUM — data is 5 years old but the habit of how people live does not change over a few years
  • Cited: HIGH — data collector and the source is well documented

Data selection

The focus is on the daily usage of the Fitbit device as it should provide high-level insight into the usage pattern of smart devices. Thus the following files from the dataset have been selected:

  • dailyActivity_merged.csv
  • dailyCalories_merged.csv
  • dailyIntensities_merged.csv
  • dailySteps_merged.csv
  • sleepDay_merged.csv
  • weightLogInfo_merged.csv

Process the data

Google Spreadsheet and Google BigQuery will be used to process the data as the tool functionality fits the purpose.

Data cleaning

sleepDay_merged.csv and weightLogInfo_merged.csv are loaded into Google Sheets for data cleaning. The fields “SleepDay” and “Date” were not correctly formatted. The following steps have been taken:

  1. The date column has been selected and formatted to ‘Date’ using the spreadsheet function
  2. Time in the column has been removed as time is irrelevant in this analysis
  3. AM/PM indicator is also removed

Data Integrity

The selected data has been loaded into Google BigQuery for analysis. The following queries have been run to check the number of unique Id in each table

SELECT DISTINCT Id
FROM `first - sandbox - 100001. Fitbit_data.dailyActivity_merged`

SELECT DISTINCT Id
FROM `first - sandbox - 100001. Fitbit_data.dailyCalories_merged`

SELECT DISTINCT Id
FROM `first - sandbox - 100001. Fitbit_data.dailyIntensities_merged`

SELECT DISTINCT Id
FROM `first - sandbox - 100001. Fitbit_data.dailySteps_merged`

SELECT DISTINCT Id
FROM `first - sandbox - 100001. Fitbit_data.sleepDay_merged`

SELECT DISTINCT Id
FROM `first - sandbox - 100001. weightLogInfo_merged`

Result (distinct ID in each table):

  • 33
  • 33
  • 33
  • 33
  • 24
  • 8

The result shows the dataset is inconsistent as we expect 30 unique IDs on all tables. The sleepDay_merged table and the weigthLogInfo_merged table have the highest inconsistencies with 6 and 22 input missing. This would affect the result of the analysis.

Data Analysis

The hypothesis has been made with the data available on activity, sleep time, and weight.

  • There is a relationship between activity level and calories burnt.
  • There is a relationship between activity level and sleep time
  • There is a relationship between activity level and weight

Four queries have been constructed to aggregate the data for analysis to find out the relation and validate the hypothesis.

-- For finding activity level and calories burnt.

SELECT Id
,ActivityDate
,Calories
,TotalSteps
,TotalDistance
,TrackerDistance
,LoggedActivitiesDistance
,VeryActiveDistance
,ModeratelyActiveDistance
,LightActiveDistance
,SedentaryActiveDistance
,VeryActiveMinutes
,FairlyActiveMinutes
,LightlyActiveMinutes
,SedentaryMinutes
FROM `first - sandbox - 100001. Fitbit_data.dailyActivity_merged`
WHERE VeryActiveDistance + ModeratelyActiveDistance + LightActiveDistance <> 0
AND VeryActiveMinutes + FairlyActiveMinutes + LightlyActiveMinutes <> 0
ORDER BY TotalSteps DESC;
-- For finding activity level and calories burnt.

SELECT Id
,ActivityDate
,Calories
,TotalSteps
,TotalDistance
,TrackerDistance
,LoggedActivitiesDistance
,(VeryActiveDistance + ModeratelyActiveDistance + LightActiveDistance) AS TotalActiveDistance
,SedentaryActiveDistance
,(VeryActiveMinutes + FairlyActiveMinutes + LightlyActiveMinutes) AS TotalActiveMinutes
,SedentaryMinutes
FROM `first - sandbox - 100001. Fitbit_data.dailyActivity_merged`;

TotalActiveDistance is the sum of the VeryActiveDistance, ModeratelyActiveDistance, and LightActiveDistance that can be useful to find the relation between calories burnt and activity level.

-- For finding relationship between activity level and sleep time.

SELECT activity.Id
,ActivityDate
,Calories
,TotalSleepRecords
,TotalMinutesAsleep
,TotalTimeInBed
,TotalSteps
,TotalDistance
,TrackerDistance
,LoggedActivitiesDistance
,(VeryActiveDistance + ModeratelyActiveDistance) AS ActiveDistance
,(LightActiveDistance + SedentaryActiveDistance) AS non_ActiveDistance
,(VeryActiveMinutes + FairlyActiveMinutes) AS ActiveMinutes
,(LightlyActiveMinutes + SedentaryMinutes) AS non_ActiveMinutes
FROM `first - sandbox - 100001. Fitbit_data.dailyActivity_merged` AS activity
INNER JOIN `first - sandbox - 100001. Fitbit_data.sleepDay_merged` AS sleep ON activity.Id = sleep.Id
AND activity.ActivityDate = sleep.SleepDay;

ActiveDistance, non-ActiveDistance, ActiveMinutes, and non-ActiveMinutes have been calculated to find out the relationship between sleep quality versus a person’s activity in a day.

-- For finding relationship between activiy and weight/BMI.

SELECT activity.Id
,Calories
,BMI
,TotalSteps
,TotalDistance
,TrackerDistance
,LoggedActivitiesDistance
,VeryActiveDistance
,ModeratelyActiveDistance
,LightActiveDistance
,SedentaryActiveDistance
,VeryActiveMinutes
,FairlyActiveMinutes
,LightlyActiveMinutes
,SedentaryMinutes
FROM `first - sandbox - 100001. Fitbit_data.dailyActivity_merged` AS activity
INNER JOIN `first - sandbox - 100001. Fitbit_data.weightLogInfo_merged` AS weight ON activity.Id = weight.Id
AND activity.ActivityDate = weight.DATE;

When comparing the relationship between activity and weight, using BMI is a more consistent metric as weight can also be affected by height. BMI also accounts for height, which is a more universal metric to compare whether a person is underweight or overweight.

Visualization

Activity level and calories burnt in relation

Coefficient of determination (Very active minutes vs Calories burnt)= 0.375

From the above chart, we can see that a person who has higher active minutes tends to burn more calories in a day, the more time they spend inactive, the lower calories they tend to burn in a day.

Activity level and sleep quality relation

Coefficient of determination = 0.345

This visualization compares a person’s non-active minutes versus the minutes asleep. We can see that the more time a person spends non-active, the lower the time they are asleep in bed. It is a negative relationship which implies that non-activity negatively impacts sleep quality.

Activity level and BMI relation

The visualization above compares the average active minutes and average non-active minutes versus the average BMI of the users. It shows a relationship that a person who has a higher average non-active minute tends to have a higher average BMI. The small size sample in this particular comparison hinders the accuracy though.

Recommendation & Act

From the analysis result, it is clear that there is a clear trend in non-active people having a negative lifestyle. The three relations we found during the analysis include:

  • Very active minutes have a positive relation to calories burnt
  • An active person has a positive relation to sleep quality
  • A non-active person is more likely to have a high BMI

Recommendations to business

As these relations are the analysis results of participants who use smart devices to track their activity statistics, we can apply these to make data-driven decisions on Bellabeat future products/functionality:

  1. Bellabeat can include a function in the Bellabeat app to alert the user who tends to have a high number of sedentary minutes
  2. Bellabeat can include timely notifications in Leaf/Time to motivate users to move around regularly to reduce their sedentary minutes
  3. Bellabeat can use the relation between high sedentary minutes and BMI to promote an active lifestyle reduce body fat and create better health with Bellabeat products
  4. Bellabeat can further enhance their sleep tracking function to promote the sleep/non-active relation. Use this as an incentive to purchase Bellabeat products: create a better sleeping habit by being more active in everyday life.

If you’re interested in connecting with me, you can find me through the following link: Click here to get in touch

Connect me Here

--

--

Amit Kumar
Analyst’s corner

Google Certified Data Analytics & Business Intelligence | Skilled in Data analysis and Data Visualization| Passionate about Sketching & Travel |