| skim_type | skim_variable | n_missing | complete_rate | factor.ordered | factor.n_unique | factor.top_counts | numeric.mean | numeric.sd | numeric.p0 | numeric.p25 | numeric.p50 | numeric.p75 | numeric.p100 | numeric.hist |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| factor | Gender | 0 | 1 | FALSE | 3 | M: 4, F: 3, O: 2 | NA | NA | NA | NA | NA | NA | NA | NA |
| factor | ActivityLevel | 0 | 1 | FALSE | 3 | low: 4, med: 3, hig: 2 | NA | NA | NA | NA | NA | NA | NA | NA |
| numeric | Height | 0 | 1 | NA | NA | NA | 165.66667 | 15.976545 | 133 | 156 | 166 | 178 | 183 | ▂▁▃▃▇ |
| numeric | Weight | 0 | 1 | NA | NA | NA | 70.11111 | 21.245261 | 45 | 55 | 70 | 80 | 110 | ▇▂▃▂▂ |
| numeric | SleepHours | 0 | 1 | NA | NA | NA | 6.00000 | 1.224745 | 4 | 5 | 6 | 7 | 8 | ▂▅▇▅▂ |
Example Report Template for a Data Analysis Project
The structure below is one possible setup for a report stemming from a data analysis project. It loosely follows the structure of a standard scientific manuscript. Adjust as needed. You don’t need to have exactly these sections, but the content covering those sections should be addressed.
This uses HTML as output format. See the Quarto documentation for instructions on how to use other formats.
1 Introduction
1.1 General Background Information
Here we explore relationships between different variables, which were assessed using basic statistical models.
1.2 Description of data and data source
An updated version of the dataset was created by adding two new variables: SleepHours, a numeric variable representing average hours of sleep per night, and ActivityLevel, a categorical variable describing general activity level (Low, Medium, High).
2 Methods
In this analysis, we used a dataset containing measurements from 14 individuals and focused on five variables: height, weight, gender, sleep duration, and activity level. The data were cleaned by removing missing values and ensuring that variables were in appropriate formats, with sleep duration treated as a numeric variable and activity level treated as a categorical variable. Exploratory data analysis was performed using histograms to examine the distributions of height and weight, and scatter plots and boxplots were used to explore relationships between height, weight, sleep duration, gender, and activity level. Finally, linear regression models were fitted to examine whether height could be predicted by the other variables.
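The cleaning and exploration steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data frame `raw` and all of its values are invented, and only the column names are taken from the report. The project's actual cleaning script may differ.

```r
# Hypothetical raw data; every value here is invented for illustration.
raw <- data.frame(
  Height        = c(180, 175, NA, 160, 155, 170, 165, 185),
  Weight        = c(80, 70, 60, NA, 50, 65, 72, 90),
  Gender        = c("M", "M", "F", "F", "F", "O", "M", "O"),
  SleepHours    = c("7", "6", "8", "5", "6", "7", "4", "6"),
  ActivityLevel = c("low", "medium", "high", "low",
                    "medium", "high", "low", "medium"),
  stringsAsFactors = FALSE
)

# Remove rows that contain any missing value.
clean <- na.omit(raw)

# Put each variable into an appropriate format:
# sleep duration as numeric, gender and activity level as factors.
clean$SleepHours    <- as.numeric(clean$SleepHours)
clean$Gender        <- factor(clean$Gender)
clean$ActivityLevel <- factor(clean$ActivityLevel,
                              levels = c("low", "medium", "high"))

# Exploratory plots: distributions first, then relationships.
hist(clean$Height, main = "Distribution of Height", xlab = "Height")
hist(clean$Weight, main = "Distribution of Weight", xlab = "Weight")
plot(Height ~ Weight, data = clean, main = "Height vs. Weight")
boxplot(Height ~ Gender, data = clean, main = "Height by Gender")
boxplot(SleepHours ~ ActivityLevel, data = clean,
        main = "SleepHours by ActivityLevel")
```

Setting the factor levels explicitly makes "low" the reference category, which is consistent with the ActivityLevelmedium and ActivityLevelhigh terms in the regression output.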
3 Results
3.1 Exploratory/Descriptive analysis
Use a combination of text/tables/figures to explore and describe your data. Show the most important descriptive results here. Additional ones should go in the supplement. Even more can be in the R and Quarto files that are part of your project.
Table 1 shows a summary of the data.
Note that the data are loaded using a relative path with the ../../ notation (two dots mean one folder up). You never want to specify an absolute path like C:\yourname\yourproject\results\ because if you share this with someone, it won’t work for them since they don’t have that path. You can also use the here R package to create paths. See examples of that below. I generally recommend the here package.
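As a minimal sketch of the two approaches to building a path, consider the snippet below. The folder and file names (`data`, `processed-data`, `processeddata.rds`) are placeholders chosen for illustration, not necessarily the names used in this project. The `here` package is only used if it is installed.

```r
# Placeholder folder and file names; adjust them to your own project layout.
if (requireNamespace("here", quietly = TRUE)) {
  # here::here() builds the path starting from the project root,
  # so it does not depend on the folder the document is rendered from.
  data_path <- here::here("data", "processed-data", "processeddata.rds")
} else {
  # Base R alternative: a relative path that goes two folders up.
  data_path <- file.path("..", "..", "data", "processed-data",
                         "processeddata.rds")
}

# Only read the file if it is actually there.
if (file.exists(data_path)) {
  mydata <- readRDS(data_path)
}
```

Neither version contains a user-specific absolute path, so the same code can run on another machine as long as the project folder structure is kept intact.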
3.2 Basic statistical analysis
We next applied linear regression models to further examine relationships between height and potential predictors. These models allow us to assess whether variables are associated with height.
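Judging by the terms in the coefficient tables, two models were fitted: Height on Weight and Gender, and Height on SleepHours and ActivityLevel. A rough sketch of how such models can be fitted and summarized in base R is shown here. The data frame `dat` contains invented values, so the numbers it produces will not match the reported tables. The helper `coef_table()` is hypothetical and simply mimics the term/estimate/std.error/statistic/p.value layout, which resembles `broom::tidy()` output.

```r
# Invented data for illustration only; not the project's dataset.
dat <- data.frame(
  Height     = c(180, 175, 162, 158, 155, 170, 165, 185, 168),
  Weight     = c(80, 70, 60, 55, 50, 65, 72, 90, 68),
  Gender     = factor(c("M", "M", "F", "F", "F", "O", "M", "M", "O")),
  SleepHours = c(7, 6, 8, 5, 6, 7, 4, 6, 5),
  ActivityLevel = factor(
    c("low", "medium", "high", "low", "medium",
      "high", "low", "medium", "low"),
    levels = c("low", "medium", "high")
  )
)

# Model 1: height as a function of weight and gender.
fit1 <- lm(Height ~ Weight + Gender, data = dat)

# Model 2: height as a function of sleep duration and activity level.
fit2 <- lm(Height ~ SleepHours + ActivityLevel, data = dat)

# Hypothetical helper: turn the coefficient matrix of a fitted model
# into a data frame with one row per term.
coef_table <- function(fit) {
  out <- as.data.frame(summary(fit)$coefficients)
  names(out) <- c("estimate", "std.error", "statistic", "p.value")
  data.frame(term = rownames(out), out,
             row.names = NULL, stringsAsFactors = FALSE)
}

print(coef_table(fit1))
print(coef_table(fit2))
```

For factor predictors, R drops the first level as the reference category, which is why the output has rows such as GenderM and GenderO but no row for the reference level.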
3.3 Full analysis
Summary of a linear model fit with Weight and Gender as predictors of height.
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 149.2726967 | 23.3823360 | 6.3839942 | 0.0013962 |
| Weight | 0.2623972 | 0.3512436 | 0.7470519 | 0.4886517 |
| GenderM | -2.1244913 | 15.5488953 | -0.1366329 | 0.8966520 |
| GenderO | -4.7644739 | 19.0114155 | -0.2506112 | 0.8120871 |

Summary of a second linear model fit with SleepHours and ActivityLevel as predictors of height.

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 166.9578947 | 38.786998 | 4.3044810 | 0.0076827 |
| SleepHours | -0.5157895 | 7.136227 | -0.0722776 | 0.9451831 |
| ActivityLevelmedium | 2.1473684 | 18.367989 | 0.1169082 | 0.9114836 |
| ActivityLevelhigh | 4.8947368 | 19.543363 | 0.2504552 | 0.8122012 |
In this dataset, linear regression models did not reveal strong evidence of associations between height and the examined predictors: every predictor coefficient in both models had a p-value of about 0.49 or higher.
4 Discussion
In this exploratory analysis, linear regression models did not reveal strong evidence of associations between height and the predictors. This is likely due to the limited sample size, which reduces statistical power.
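To make the sample-size point concrete, the short calculation below uses only numbers already reported above. It assumes that 9 observations entered each model (the factor counts in the summary table add up to 9) and that each model estimates 4 coefficients. The Weight estimate and standard error are copied from the first coefficient table. This is a back-of-the-envelope sketch, not a formal power analysis.

```r
# Assumption: 9 observations per model, taken from the summary table counts.
n_obs  <- 9
n_coef <- 4                      # intercept plus three slope terms
df_resid <- n_obs - n_coef       # residual degrees of freedom

# Critical t value for a two-sided test at the 5% level.
t_crit <- qt(0.975, df = df_resid)

# Weight coefficient as reported in the first coefficient table.
est <- 0.2623972
se  <- 0.3512436

# 95% confidence interval and two-sided p-value for the Weight coefficient.
ci    <- est + c(-1, 1) * t_crit * se
p_val <- 2 * pt(-abs(est / se), df = df_resid)

print(df_resid)
print(t_crit)
print(ci)
print(p_val)
```

With so few residual degrees of freedom, the critical t value is well above the large-sample value of about 1.96, and the confidence interval for the Weight coefficient is wide enough to include zero. The recomputed p-value agrees with the reported one, which supports the assumption that 9 observations were used.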
5 Conclusions
This analysis demonstrated a reproducible workflow for exploratory data analysis and basic statistical modeling. While no strong associations were detected, the approach illustrates how visualizations and regression models can be used to explore relationships in datasets.