Household Conditions by Geographic School District

Data and Society Capstone Project
Seattle Pacific University

By: Jon Geiger, Noel Goodwin, and Abigail Joppa


Analysis

Week 1:

Preliminary Analysis

The first week was dedicated to exploring our original dataset. We imported and cleaned the data and ran some initial analysis, looking at variable correlations, plots of correlated variables, and the distribution of the number of students per school district.

Margin of Error Analysis

Additionally, we took a deep dive into understanding the margins of error.

Week 2:

Race-Households Join

Week two was dedicated to joining our race data with our school districts.

Week 4/5:

Graduation Rates Join

Imported and cleaned the initial graduation rates data, then joined it with household conditions.

Initial Graduation Rates Analysis

Looked at summary statistics, outliers, and correlation matrices, and created some visualizations.

Week 6:

High School Districts

Identified the high schools within each school district; this is used to filter for our models later on.

Week 7:

Revenue Analysis

Averaged the revenue, joined it to household conditions, looked at total revenue per child, and looked at initial correlations.
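
The averaging, join, and per-child steps can be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's own code; the district IDs, field names (`children`, `avg_revenue`, `revenue_per_child`), and all numbers are made up for the example.

```python
from statistics import mean

# Hypothetical yearly revenue records: (district_id, year, total_revenue).
revenue_rows = [
    ("D1", 2016, 1_000_000.0),
    ("D1", 2017, 1_200_000.0),
    ("D2", 2016, 500_000.0),
    ("D2", 2017, 700_000.0),
]

# Hypothetical household-conditions table keyed by district.
household = {
    "D1": {"children": 1100},
    "D2": {"children": 400},
}


def average_revenue(rows):
    """Average total revenue across years for each district."""
    by_district = {}
    for district, _year, revenue in rows:
        by_district.setdefault(district, []).append(revenue)
    return {d: mean(values) for d, values in by_district.items()}


def join_revenue(household_table, avg_revenue):
    """Inner-join averaged revenue onto household conditions by district."""
    joined = {}
    for district, conditions in household_table.items():
        if district in avg_revenue:
            revenue = avg_revenue[district]
            joined[district] = {
                **conditions,
                "avg_revenue": revenue,
                "revenue_per_child": revenue / conditions["children"],
            }
    return joined


joined = join_revenue(household, average_revenue(revenue_rows))
for district, row in joined.items():
    print(district, row["revenue_per_child"])
```

An inner join is assumed here, so districts missing from either table are dropped; the actual analysis may have handled unmatched districts differently.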

Assessments Analysis

Averaged assessment scores from 2014-2018 (reading scores, math scores, and total), joined to household conditions, added predominant race indicator, and looked at correlations.
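
One plausible way to build a predominant race indicator is to take the group with the largest population share in each district. The sketch below assumes that definition; the group labels and shares are invented for illustration, and the project may have defined the indicator differently.

```python
# Hypothetical population shares by racial group for each district.
# Labels and values are made up; they are not taken from the project data.
race_shares = {
    "D1": {"group_a": 0.55, "group_b": 0.30, "group_c": 0.15},
    "D2": {"group_a": 0.20, "group_b": 0.25, "group_c": 0.55},
}


def predominant_race(shares):
    """Return the group with the largest share (first one wins on ties)."""
    return max(shares, key=shares.get)


# Attach the indicator to each district.
indicator = {district: predominant_race(s) for district, s in race_shares.items()}
print(indicator)
```

Note that this rule labels a district by its largest group even when no group holds a majority, which is worth keeping in mind when comparing results across groups.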

Week 8/9 (& 10/11):

Initial Regression Modeling

The goal of this analysis is to fit a simple linear regression model to predict average high school district graduation rate based on household conditions and racial composition of the surrounding area.
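
As a rough illustration of the idea, the sketch below fits an ordinary least squares line with a single predictor using the closed-form formulas. It is written in plain Python with made-up numbers and a hypothetical predictor; it is not the project's model, which uses several household and racial composition variables.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope is the sum of cross-deviations over the sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predicted response for a new predictor value."""
    return intercept + slope * x_new


# Hypothetical data: share of households with some condition (x)
# and district graduation rate in percent (y). Values are invented.
x = [0.1, 0.2, 0.3, 0.4]
y = [90.0, 85.0, 80.0, 75.0]

intercept, slope = fit_simple_ols(x, y)
print(intercept, slope, predict(intercept, slope, 0.25))
```

With more than one predictor, the same least squares criterion applies but the coefficients are solved jointly rather than with these one-variable formulas.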

Tidymodels Regression Analysis

Used the Tidymodels package to run multiple regression models on our combined datasets.

Predictor Distributions Analysis

Looked at variable distributions, most predominant race, grad rates by most predominant race, predictor distributions by most prevalent race, and variable correlations per predominant racial group.

Final Predictor Analysis

Final models