Example Report Template for a Data Analysis Project

Author

Saraa Al Jawad; LiangLai

The structure below is one possible setup for a report stemming from a data analysis project. It loosely follows the structure of a standard scientific manuscript. Adjust as needed. You don’t need to have exactly these sections, but the content covering those sections should be addressed.

This uses HTML as output format. See the Quarto documentation for instructions on how to use other formats.

1 Introduction

1.1 General Background Information

Here we explore relationships between several variables, which were assessed using basic statistical models.

1.2 Description of data and data source

An updated version of the dataset was created by adding two new variables: SleepHours, a numeric variable representing average hours of sleep per night, and ActivityLevel, a categorical variable describing general activity level (Low, Medium, High).
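As a minimal sketch of this step, the two variables could be appended to a data frame as shown here. The data frame `dat` and every value in it are made up for illustration; only the column names SleepHours and ActivityLevel and the levels Low, Medium, High come from the description above.

```r
# Hypothetical stand-in for the original dataset (values are illustrative only)
dat <- data.frame(
  Height = c(180, 175, 160),
  Weight = c(80, 70, 55),
  Gender = factor(c("M", "O", "F"))
)

# Add average hours of sleep per night as a numeric variable
dat$SleepHours <- c(7, 6, 5)

# Add general activity level as a categorical variable with an explicit level order
dat$ActivityLevel <- factor(
  c("High", "Medium", "Low"),
  levels = c("Low", "Medium", "High")
)

# Inspect the resulting column types
str(dat)
```

Setting `levels` explicitly keeps Low as the reference category in later models, instead of relying on alphabetical ordering.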

2 Methods

In this analysis, we used a dataset containing measurements from 14 individuals and focused on several variables: height, weight, gender, sleep duration, and activity level. The data were cleaned by removing missing values and ensuring that variables were in appropriate formats, with sleep duration treated as a numeric variable and activity level treated as a categorical variable. Exploratory data analysis used histograms to examine the distributions of height and weight, and scatter plots and boxplots to explore relationships among height, weight, sleep duration, gender, and activity level. Finally, linear regression models were fitted to examine whether height could be predicted from weight and gender, or from sleep duration and activity level.
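The cleaning and exploration steps described above might look roughly like the following base R sketch. The data frame `raw` and its values are hypothetical, and the project's actual cleaning script may differ; the sketch only illustrates dropping incomplete rows, enforcing variable types, and drawing a histogram and a boxplot.

```r
# Hypothetical raw data with some missing values (illustrative only)
raw <- data.frame(
  Height = c(180, NA, 160, 170, 175, 165),
  Weight = c(80, 70, NA, 65, 90, 55),
  Gender = c("M", "F", "F", "O", "M", "F"),
  SleepHours = c("7", "6", "5", "8", "6", "7"),
  ActivityLevel = c("high", "low", "medium", "low", "medium", "high"),
  stringsAsFactors = FALSE
)

# Remove rows that contain any missing value
clean <- na.omit(raw)

# Enforce types: sleep duration as numeric, gender and activity level as factors
clean$SleepHours <- as.numeric(clean$SleepHours)
clean$Gender <- factor(clean$Gender)
clean$ActivityLevel <- factor(
  clean$ActivityLevel,
  levels = c("low", "medium", "high")
)

# Exploratory plots: distribution of height, and height by activity level
hist(clean$Height, main = "Height distribution", xlab = "Height")
boxplot(Height ~ ActivityLevel, data = clean, ylab = "Height")
```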

3 Results

3.1 Exploratory/Descriptive analysis

Use a combination of text/tables/figures to explore and describe your data. Show the most important descriptive results here. Additional ones should go in the supplement. Even more can be in the R and Quarto files that are part of your project.

Table 1 shows a summary of the data.

Note that the data are loaded using a relative path with the ../../ notation (two dots mean one folder up). You never want to specify an absolute path like C:\yourname\yourproject\results\ because if you share the project with someone, it won’t work for them since they don’t have that path. You can also use the here R package to create paths. See examples of that below. I generally recommend the here package.
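A small sketch of the two ways to build a path follows. The folder and file names (`data`, `processed-data`, `processeddata.rds`) are hypothetical placeholders, not necessarily the ones used in this project, and the here call only runs if that package is installed.

```r
# Relative path: two folders up from the folder that contains this document
# (folder and file names are hypothetical)
rel_path <- file.path("..", "..", "data", "processed-data", "processeddata.rds")
print(rel_path)

# Path built from the project root with the here package, if it is available
if (requireNamespace("here", quietly = TRUE)) {
  here_path <- here::here("data", "processed-data", "processeddata.rds")
  print(here_path)
}

# Either path could then be passed to a reader such as readRDS()
```

The relative path depends on where the document sits inside the project, while `here::here()` resolves from the project root, so it keeps working if the document is moved to another folder.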

Table 1: Summary of the data collected from 14 individuals.

skim_type	skim_variable	complete_rate	factor.ordered	factor.n_unique	factor.top_counts	numeric.mean	numeric.sd	numeric.p0	numeric.p25	numeric.p50	numeric.p75	numeric.p100	numeric.hist
factor	Gender	1	FALSE	3	M: 4, F: 3, O: 2	NA	NA	NA	NA	NA	NA	NA	NA
factor	ActivityLevel	1	FALSE	3	low: 4, med: 3, hig: 2	NA	NA	NA	NA	NA	NA	NA	NA
numeric	Height	1	NA	NA	NA	165.66667	15.976545	133	156	166	178	183	▂▁▃▃▇
numeric	Weight	1	NA	NA	NA	70.11111	21.245261	45	55	70	80	110	▇▂▃▂▂
numeric	SleepHours	1	NA	NA	NA	6.00000	1.224745	4	5	6	7	8	▂▅▇▅▂

3.2 Basic statistical analysis

Figure 1: Height and weight stratified by gender.

Figure 2: Height distribution across activity level categories.

Figure 3: Scatter plot of sleep duration versus weight.

We next applied linear regression models to further examine relationships between height and potential predictors. These models allow us to assess whether weight, gender, sleep duration, and activity level are associated with height.
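For readers unfamiliar with the approach, a linear model of this kind can be fitted in base R roughly as follows. The data below are invented for illustration and are not the values analysed in this report. The column names in the model tables of this report resemble the output of `broom::tidy()`, but that is an assumption; the sketch uses `summary()` from base R instead.

```r
# Hypothetical data (illustrative only; not the values used in this report)
dat <- data.frame(
  Height = c(180, 175, 160, 165, 170, 155, 183, 168, 150),
  Weight = c(80, 75, 55, 60, 70, 50, 110, 65, 45),
  Gender = factor(c("M", "M", "F", "F", "O", "F", "M", "O", "M"))
)

# Fit height as a linear function of weight and gender
fit <- lm(Height ~ Weight + Gender, data = dat)

# Coefficient table: estimate, standard error, t statistic, p-value
coef_table <- summary(fit)$coefficients
print(coef_table)
```

Because Gender is a factor, R reports one coefficient per non-reference level (here GenderM and GenderO), each measured relative to the first level, F.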

3.3 Full analysis

Summary of a linear model fit.

Table 2: Linear regression predicting Height from Weight and Gender.

term	estimate	std.error	statistic	p.value
(Intercept)	149.2726967	23.3823360	6.3839942	0.0013962
Weight	0.2623972	0.3512436	0.7470519	0.4886517
GenderM	-2.1244913	15.5488953	-0.1366329	0.8966520
GenderO	-4.7644739	19.0114155	-0.2506112	0.8120871

Table 3: Linear regression predicting Height from SleepHours and ActivityLevel.

term	estimate	std.error	statistic	p.value
(Intercept)	166.9578947	38.786998	4.3044810	0.0076827
SleepHours	-0.5157895	7.136227	-0.0722776	0.9451831
ActivityLevelmedium	2.1473684	18.367989	0.1169082	0.9114836
ActivityLevelhigh	4.8947368	19.543363	0.2504552	0.8122012

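In this dataset, the linear regression models did not reveal strong evidence of associations between height and the examined predictors: none of the predictor coefficients in Tables 2 and 3 was statistically significant (all p-values above 0.48).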
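The statistic and p.value columns can be checked by hand from the estimates and standard errors. The sketch below does this for Table 2. It assumes 9 complete observations, because the category counts in Table 1 sum to 9, which leaves 5 residual degrees of freedom for a model with 4 coefficients; that sample size is an inference from the tables, not something stated in the text. If the assumption holds, the results should be close to the reported columns.

```r
# Estimates and standard errors copied from Table 2
estimate  <- c(149.2726967, 0.2623972, -2.1244913, -4.7644739)
std_error <- c(23.3823360, 0.3512436, 15.5488953, 19.0114155)

# The t statistic is the estimate divided by its standard error
t_stat <- estimate / std_error

# Assumption: 9 complete observations and 4 coefficients give 5 residual df
df_resid <- 9 - 4

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

print(round(cbind(t_stat, p_value), 4))
```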

4 Discussion

In this exploratory analysis, linear regression models did not reveal strong evidence of associations, likely because of the limited sample size, which reduces statistical power.

5 Conclusions

This analysis demonstrated a reproducible workflow for exploratory data analysis and basic statistical modeling. While no strong associations were detected, the approach illustrates how visualizations and regression models can be used to explore relationships in datasets.