Example Report Template for a Data Analysis Project

Authors

Saraa Al Jawad; LiangLai

The structure below is one possible setup for a report stemming from a data analysis project. It loosely follows the structure of a standard scientific manuscript. Adjust as needed. You don’t need exactly these sections, but the content they cover should be addressed.

This report uses HTML as its output format. See the Quarto documentation for instructions on how to use other formats.

1 Introduction

1.1 General Background Information

Here we explore relationships between several variables (height, weight, gender, sleep duration, and activity level) and assess them using basic statistical models.

1.2 Description of data and data source

An updated version of the dataset was created by adding two new variables: SleepHours, a numeric variable representing average hours of sleep per night, and ActivityLevel, a categorical variable describing general activity level (Low, Medium, High).

2 Methods

In this analysis, we used a dataset containing measurements from 14 individuals and focused on several variables, including height, weight, gender, sleep duration, and activity level. The data were cleaned by removing missing values and ensuring that each variable was in an appropriate format, with sleep duration treated as a numeric variable and activity level treated as a categorical variable. Exploratory data analysis used histograms to examine the distributions of height and weight, and scatter plots and boxplots to explore relationships among height, weight, sleep duration, gender, and activity level. Finally, linear regression models were fitted to examine whether height could be predicted by weight and gender (Table 2) or by sleep duration and activity level (Table 3).
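
The cleaning steps described above might look roughly like the following base R sketch. The data frame `raw` and all of its values are invented purely for illustration (the project's actual data and cleaning script are not reproduced here); only the column names follow the variables named in this report.

```r
# Hypothetical raw data: every value below is made up for illustration.
raw <- data.frame(
  Height = c(180, 165, NA, 172),
  Weight = c(80, 60, 70, NA),
  Gender = c("M", "F", "O", "F"),
  SleepHours = c("7", "6", "5", "8"),   # read in as text by mistake
  ActivityLevel = c("high", "low", "medium", "low"),
  stringsAsFactors = FALSE
)

# Drop every row that contains at least one missing value.
clean <- na.omit(raw)

# Put each variable into an appropriate format.
clean$SleepHours <- as.numeric(clean$SleepHours)   # numeric
clean$Gender <- factor(clean$Gender)               # categorical
clean$ActivityLevel <- factor(clean$ActivityLevel,
                              levels = c("low", "medium", "high"))

# Inspect the resulting column types.
str(clean)
```

Setting the factor levels explicitly keeps "low" as the reference category, which matches the ActivityLevelmedium and ActivityLevelhigh terms reported in Table 3.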

3 Results

3.1 Exploratory/Descriptive analysis

Use a combination of text/tables/figures to explore and describe your data. Show the most important descriptive results here. Additional ones should go in the supplement. Even more can be in the R and Quarto files that are part of your project.

Table 1 shows a summary of the data.

Note that the data are loaded using a relative path with the ../../ notation (two dots mean one folder up). You never want to specify an absolute path like C:\yourname\yourproject\results\ because if you share the project with someone, it won’t work for them since they don’t have that path. You can also use the here R package to create paths. See examples of that below. I generally recommend the here package.
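
As a minimal sketch of the two approaches, the snippet below builds the same kind of path first with the relative ../../ notation and then with the here package. The folder and file names ("data", "processed-data", "processeddata.rds") are hypothetical placeholders, not necessarily the ones used in this project.

```r
# Relative path: each ".." climbs one folder up from this file's location.
# Folder and file names are hypothetical placeholders.
rel_path <- file.path("..", "..", "data", "processed-data", "processeddata.rds")
print(rel_path)

# Alternative: build the path starting from the project root with here::here().
# Guarded so the sketch still runs when the here package is not installed.
if (requireNamespace("here", quietly = TRUE)) {
  here_path <- here::here("data", "processed-data", "processeddata.rds")
  print(here_path)
}
```

The relative version depends on where the calling file sits inside the project, while here::here() resolves from the project root, so it keeps working if the file is moved to another folder.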

Table 1: Summary of the data collected from 14 individuals. The factor counts sum to 9, which suggests that 9 complete records remained after cleaning. Level names in top_counts appear truncated to three characters (med = medium, hig = high).

Factor variables
skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts
Gender         0          1              FALSE    3         M: 4, F: 3, O: 2
ActivityLevel  0          1              FALSE    3         low: 4, med: 3, hig: 2

Numeric variables
skim_variable  n_missing  complete_rate  mean       sd         p0   p25  p50  p75  p100  hist
Height         0          1              165.66667  15.976545  133  156  166  178  183   ▂▁▃▃▇
Weight         0          1              70.11111   21.245261  45   55   70   80   110   ▇▂▃▂▂
SleepHours     0          1              6.00000    1.224745   4    5    6    7    8     ▂▅▇▅▂

3.2 Basic statistical analysis

Figure 1: Height and weight stratified by gender.
Figure 2: Height distribution across activity level categories.
Figure 3: Scatter plot of sleep duration versus weight.

We next applied linear regression models to further examine relationships between height and potential predictors. These models allow us to assess whether weight, gender, sleep duration, and activity level are associated with height.
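
A rough sketch of how two such models could be fitted in base R is shown below. The model formulas are inferred from the term names in Tables 2 and 3, and the data frame `toy` contains made-up values for illustration only; it is not the project's data, so its output will not match the tables.

```r
# Made-up data for illustration only; NOT the project's data.
toy <- data.frame(
  Height = c(160, 172, 181, 158, 175, 168, 190, 163, 170),
  Weight = c(55, 70, 85, 50, 78, 64, 95, 58, 72),
  Gender = factor(c("F", "M", "M", "F", "M", "O", "M", "F", "O")),
  SleepHours = c(6, 7, 5, 8, 6, 7, 4, 6, 5),
  ActivityLevel = factor(
    c("low", "medium", "high", "low", "medium", "low", "high", "low", "medium"),
    levels = c("low", "medium", "high")
  )
)

# Model 1: height as a function of weight and gender.
fit1 <- lm(Height ~ Weight + Gender, data = toy)

# Model 2: height as a function of sleep duration and activity level.
fit2 <- lm(Height ~ SleepHours + ActivityLevel, data = toy)

# Coefficient tables: estimate, standard error, t statistic, p-value.
print(coef(summary(fit1)))
print(coef(summary(fit2)))
```

For categorical predictors, lm() reports one coefficient per non-reference level, which is why the tables list GenderM and GenderO (relative to F) and ActivityLevelmedium and ActivityLevelhigh (relative to low).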

3.3 Full analysis

Tables 2 and 3 summarize the two linear model fits.

Table 2: Linear regression predicting Height from Weight and Gender.
term                 estimate     std.error   statistic   p.value
(Intercept)          149.2726967  23.3823360  6.3839942   0.0013962
Weight               0.2623972    0.3512436   0.7470519   0.4886517
GenderM              -2.1244913   15.5488953  -0.1366329  0.8966520
GenderO              -4.7644739   19.0114155  -0.2506112  0.8120871

Table 3: Linear regression predicting Height from SleepHours and ActivityLevel.
term                 estimate     std.error   statistic   p.value
(Intercept)          166.9578947  38.786998   4.3044810   0.0076827
SleepHours           -0.5157895   7.136227    -0.0722776  0.9451831
ActivityLevelmedium  2.1473684    18.367989   0.1169082   0.9114836
ActivityLevelhigh    4.8947368    19.543363   0.2504552   0.8122012

In this dataset, all predictor coefficients had p-values above 0.48 (Tables 2 and 3), so the linear regression models did not reveal strong evidence of associations between height and the examined predictors.
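
To make the table columns easier to read, the short sketch below recomputes the statistic and p.value entries for the (Intercept) row of Table 2 from its estimate and std.error. It assumes 5 residual degrees of freedom (9 records minus 4 coefficients); the figure of 9 records is inferred from the counts in Table 1 rather than stated explicitly in the report.

```r
# Values copied from the (Intercept) row of Table 2.
estimate  <- 149.2726967
std_error <- 23.3823360

# The t statistic is the estimate divided by its standard error.
t_stat <- estimate / std_error

# Assumption: 9 records and 4 estimated coefficients give 5 residual df.
df_resid <- 9 - 4

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

print(c(statistic = t_stat, p.value = p_value))
```

The same calculation applies to every other row: a small estimate relative to its standard error gives a t statistic near zero and therefore a large p-value, as seen for all predictor terms.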

4 Discussion

In this exploratory analysis, linear regression models did not reveal strong evidence of associations between height and the examined predictors. This is likely due, at least in part, to the limited sample size, which reduces statistical power.

5 Conclusions

This analysis demonstrated a reproducible workflow for exploratory data analysis and basic statistical modeling. While no strong associations were detected, the approach illustrates how visualizations and regression models can be used to explore relationships in datasets.