30 September 2024
Ever wondered how many calories you’re burning during a workout? With just a few key data points and the help of linear regression, you can gain some clear insights. In this post, I’ll show that predicting calories can be both effective and straightforward: using simple data and basic statistical techniques, I’ll demonstrate how a linear relationship between exercise metrics and calorie burn can serve as a useful guide.
DATA OVERVIEW
The data used for our calorie prediction model was sourced from the GitHub repository. Below is a snapshot of the data.
A brief snapshot of the data.
After a quick review of the data, it was clear that the User_ID column wasn’t adding any value to the analysis, so I dropped it from the dataframe. The dataset was clean, with no null or duplicate values to worry about. The remaining columns split into categorical and numerical variables as follows:
cat = ['Gender']
num = ['Age', 'Height', 'Weight', 'Duration', 'Heart_Rate', 'Body_Temp', 'Calories']
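The cleaning steps above can be sketched in pandas roughly as follows. The post doesn’t reproduce the raw file, so a tiny hypothetical DataFrame with the same column names (and made-up values) stands in for it; with the real data you would load the CSV from the repository instead.

```python
import pandas as pd

# Hypothetical stand-in rows with the post's column names; values are made up.
df = pd.DataFrame({
    "User_ID": [1, 2, 3],
    "Gender": ["male", "female", "male"],
    "Age": [30, 45, 52],
    "Height": [180.0, 165.0, 172.0],
    "Weight": [80.0, 62.0, 75.0],
    "Duration": [20.0, 10.0, 25.0],
    "Heart_Rate": [100.0, 92.0, 104.0],
    "Body_Temp": [40.2, 39.6, 40.5],
    "Calories": [120.0, 45.0, 150.0],
})

# User_ID is just an identifier with no predictive value, so drop it.
df = df.drop(columns=["User_ID"])

# Confirm there are no missing values and no duplicate rows.
n_missing = int(df.isnull().sum().sum())
n_duplicates = int(df.duplicated().sum())
print(f"missing values: {n_missing}, duplicate rows: {n_duplicates}")

# Numeric columns by dtype; everything else is treated as categorical.
num = df.select_dtypes(include="number").columns.tolist()
cat = [c for c in df.columns if c not in num]
print(cat, num)
```

Deriving the lists from dtypes rather than typing them out keeps them in sync if columns are added or dropped later.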
OUTLIER DETECTION
The only variable that exhibited outliers was Body_Temp, so I addressed them using the Interquartile Range (IQR) method. First, I calculated the first quartile (Q1) and third quartile (Q3) for the Body_Temp data. The IQR is simply the difference between Q3 and Q1. To identify potential outliers, I determined the lower and upper bounds: any value below Q1 minus 1.5 times the IQR or above Q3 plus 1.5 times the IQR was considered an outlier.
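Written out, the bounds are as follows. The quartiles in the worked line are illustrative values chosen for easy arithmetic, not figures taken from the dataset.

```latex
\mathrm{IQR} = Q_3 - Q_1, \qquad
\text{lower} = Q_1 - 1.5\,\mathrm{IQR}, \qquad
\text{upper} = Q_3 + 1.5\,\mathrm{IQR}

% Illustrative only: Q_1 = 39.6, Q_3 = 40.6
\mathrm{IQR} = 40.6 - 39.6 = 1.0, \qquad
\text{lower} = 39.6 - 1.5 = 38.1, \qquad
\text{upper} = 40.6 + 1.5 = 42.1
```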
I then capped these outliers at the calculated bounds: any temperature above the upper bound was replaced with the upper bound, and any temperature below the lower bound was replaced with the lower bound. This removes the extreme values without dropping any rows, and values already inside the bounds are left unchanged.
Outliers in Body Temperature
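The capping step described above can be sketched with pandas’ `Series.quantile` and `Series.clip`. The function name `cap_outliers_iqr` and the sample temperatures are hypothetical, for illustration only; the post doesn’t show the author’s actual code.

```python
import pandas as pd


def cap_outliers_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] at those bounds (hypothetical helper)."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # clip() replaces values below `lower` with `lower` and values above `upper`
    # with `upper`; everything in between is unchanged and no rows are dropped.
    return series.clip(lower=lower, upper=upper)


# Made-up body temperatures with one low and one high extreme value.
temps = pd.Series([37.0, 39.5, 39.8, 40.0, 40.2, 40.5, 43.0])
capped = cap_outliers_iqr(temps)
print(capped.tolist())
```

Applied to the real dataframe, this would be `df["Body_Temp"] = cap_outliers_iqr(df["Body_Temp"])`. Capping (rather than deleting rows) keeps every observation available for the regression.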
LINEAR REGRESSION MODEL
Fitting a linear regression model to the data was straightforward given how simple and clean the dataset is. The fitted coefficients below show how each variable affects the predicted calories burned:
The coefficient for Age is 0.1412518405810665
The coefficient for Height is -0.004312338502297175
The coefficient for Weight is 0.04338946908874136
The coefficient for Duration is 0.910147462690394
The coefficient for Heart_Rate is 0.30157259886169346
The coefficient for Body_Temp is -0.23489060281965535
The coefficient for Gender_male is -0.017258070653486182
The intercept for our model is 0.013502001430897742
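A minimal sketch of how such a fit can be produced, using synthetic data because the original dataset isn’t reproduced here, so the printed numbers will not match the coefficients listed. It assumes Gender was one-hot encoded with the first level dropped (consistent with the single Gender_male coefficient), and it solves ordinary least squares with NumPy’s `lstsq`; the post doesn’t say which library was used, though scikit-learn’s `LinearRegression` fits the same OLS model.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in data with the post's column names; all values are made up.
df = pd.DataFrame({
    "Gender": rng.choice(["female", "male"], size=n),
    "Age": rng.integers(20, 80, size=n).astype(float),
    "Height": rng.normal(175, 10, size=n),
    "Weight": rng.normal(75, 12, size=n),
    "Duration": rng.uniform(1, 30, size=n),
    "Heart_Rate": rng.normal(95, 9, size=n),
    "Body_Temp": rng.normal(40, 0.7, size=n),
})
# Made-up target driven mainly by Duration and Heart_Rate, plus noise.
df["Calories"] = 4.0 * df["Duration"] + 0.8 * df["Heart_Rate"] - 60 + rng.normal(0, 5, size=n)

# One-hot encode Gender, keeping only the Gender_male indicator (drop_first=True).
X = pd.get_dummies(df.drop(columns=["Calories"]), columns=["Gender"],
                   drop_first=True, dtype=float)
y = df["Calories"].to_numpy()

# Ordinary least squares: a leading column of ones makes weights[0] the intercept.
A = np.column_stack([np.ones(len(X)), X.to_numpy()])
weights, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, coefs = weights[0], weights[1:]

for name, coef in zip(X.columns, coefs):
    print(f"The coefficient for {name} is {coef}")
print(f"The intercept for our model is {intercept}")

# A prediction is the intercept plus each feature value times its coefficient.
row = X.iloc[0].to_numpy()
prediction = intercept + row @ coefs
print(f"Predicted calories for the first row: {prediction:.1f}")
```

Each coefficient is the change in predicted calories for a one-unit increase in that feature with the others held fixed, so its size depends on the units (or scaling) of the inputs.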
As is evident from the numbers above,