The first week was dedicated to exploring our original dataset. We imported and cleaned the data and ran some initial analysis: variable correlations, plots of correlated variables, and the distribution of the number of students per school district.
Margin of Error Analysis: Additionally, we took a deep dive into understanding the margins of error.
Week two was dedicated to joining our race data with our school districts.
Imported and cleaned the initial data, then joined it with household conditions.
Initial Graduation Rates Analysis: Looked at summary statistics, outliers, and correlation matrices, and created some visualizations.
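As a rough illustration of the outlier and correlation checks described above, here is a minimal Python sketch. The data values, the variable names, and the 1.5 x IQR outlier rule are all assumptions made for the example; they are not taken from the project, which may have used different tools and thresholds.

```python
import statistics

# Hypothetical graduation rates (%) and poverty rates (%) for seven districts.
grad_rates = [88.0, 91.5, 79.0, 85.5, 93.0, 40.0, 87.0]
poverty_rates = [12.0, 8.0, 21.0, 15.0, 6.0, 35.0, 13.0]


def iqr_outliers(values):
    """Return values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    norm_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (norm_x * norm_y)


summary = {
    "mean": statistics.fmean(grad_rates),
    "median": statistics.median(grad_rates),
    "stdev": statistics.stdev(grad_rates),
}
outliers = iqr_outliers(grad_rates)
correlation = pearson(grad_rates, poverty_rates)
```

With these made-up numbers, the single very low graduation rate is the only value flagged, and the correlation with the poverty rate is negative.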
High schools within each school district, used later to filter the data for our models.
Averaged the revenue, joined it to household conditions, looked at total revenue per child, and looked at initial correlations.
Assessments Analysis: Averaged assessment scores from 2014-2018 (reading scores, math scores, and total), joined them to household conditions, added a predominant race indicator, and looked at correlations.
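The averaging, joining, and indicator steps above can be sketched as follows. This is only an illustration in plain Python: the record layout, district IDs, race categories, and helper names are hypothetical, and defining the predominant race as the group with the largest share is an assumption about how the indicator was built.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical records: (district_id, year, reading_score, math_score).
scores = [
    ("D1", 2014, 70.0, 60.0),
    ("D1", 2015, 74.0, 64.0),
    ("D2", 2014, 50.0, 55.0),
    ("D2", 2016, 54.0, 57.0),
]

# Hypothetical racial composition (population shares) per district.
race_shares = {
    "D1": {"white": 0.55, "black": 0.25, "hispanic": 0.20},
    "D2": {"white": 0.30, "black": 0.45, "hispanic": 0.25},
}


def average_scores(records, first_year=2014, last_year=2018):
    """Average reading and math scores per district over the year range."""
    by_district = defaultdict(list)
    for district, year, reading, math in records:
        if first_year <= year <= last_year:
            by_district[district].append((reading, math))
    return {
        district: {
            "reading": fmean([r for r, _ in rows]),
            "math": fmean([m for _, m in rows]),
        }
        for district, rows in by_district.items()
    }


def add_predominant_race(averages, shares):
    """Inner-join averages with race shares and label the largest group."""
    joined = {}
    for district, avg in averages.items():
        if district in shares:  # keep only districts present in both tables
            predominant = max(shares[district], key=shares[district].get)
            joined[district] = {**avg, "predominant_race": predominant}
    return joined


averages = average_scores(scores)
combined = add_predominant_race(averages, race_shares)
```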
The goal of this analysis is to fit a simple linear regression model to predict average high school district graduation rate based on household conditions and racial composition of the surrounding area.
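For readers unfamiliar with the technique, a least-squares fit with a single predictor can be sketched in a few lines of Python. The poverty-rate predictor and all data values below are hypothetical and chosen to be exactly linear; the actual analysis draws on several household and racial composition variables, so this is a simplification and not the project's model.

```python
# Hypothetical data: district poverty rate (%) and graduation rate (%).
poverty_rates = [5.0, 10.0, 15.0, 20.0]
grad_rates = [95.0, 90.0, 85.0, 80.0]


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted response for a single predictor value."""
    return intercept + slope * x


slope, intercept = fit_simple_linear_regression(poverty_rates, grad_rates)
predicted = predict(slope, intercept, 12.0)
```

On this toy data the fitted slope is -1 and the intercept is 100, so a district with a 12% poverty rate is predicted to have an 88% graduation rate.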
Tidymodels Regression Analysis: Used the Tidymodels package to run multiple regression models on our combined datasets.
Predictor Distributions Analysis: Looked at variable distributions, the most predominant race, graduation rates by predominant race, predictor distributions by predominant race, and variable correlations per predominant racial group.
Final Predictor Analysis: Final models.