To complete the Data & Society Capstone course at Seattle Pacific University, our class was asked to follow the guidelines of the 2022 Data Challenge Expo put on by the American Statistical Association.
These guidelines include:
Sourced from the American Community Survey (ACS) and provided by the Urban Institute, this dataset describes the share of households within a geographic school district between 2014 and 2018 under conditions that may affect remote K-12 learning environments.
These five-year estimates of household conditions are calculated from aggregate Census survey data for households within each school district’s geographic location. Because of this, some estimates may include households without school-aged children (noted in the descriptions), which is partly accounted for in each estimate’s corresponding margin of error.
Variables | Description |
---|---|
School ID | Distinct district ID |
State | Includes all 50 States of the US. |
Geographic School District | A geographic school district is defined as a public-school district that has geographic boundaries reported by a state. This does not include private schools or charter school systems unless they have geographic boundaries that are reported by the state. |
Children 5-17 (SAIPE* estimate) | An estimate of children between the ages of 5 and 17 who are enrolled in school within a certain geographic school district. A child is estimated to be enrolled in a school district if they live within the boundaries of the district and their “assigned grade is within the grade range for which the district is financially responsible” (EDGE). This estimate does not account for children who are enrolled in private school or those who attend school outside the boundaries of their geographically assigned public-school district. *Yearly estimate made by the US Census Bureau’s Small Area Income and Poverty Estimates (SAIPE) Program based on Census responses. |
Poverty (SAIPE estimate) | A student is considered to be in poverty if their family’s income is at or below 100 percent of the federal poverty level. The poverty level changes each year and is calculated based on how many people are living in a household. (See poverty rates between 2014 and 2018 according to the HHS Poverty Guidelines) |
Single Parents | Students have single parents if they are living in a household with only one father or one mother. |
Linguistically Isolated | A student is considered linguistically isolated if no one in their household at or above the age of 14 speaks English as a first language or speaks English “very well” as a second language. |
Children with Disability | Students who have cognitive, ambulatory, independent living, self-care, vision, or hearing difficulties are considered to be children with disability. |
Parents in vulnerable economic sectors | Parents are considered to be in vulnerable economic sectors if they earn less than 800 dollars a week and work in industries whose workers are most likely to be laid off. This includes those working in the entertainment, service, and retail industries. Parents of a household are defined as the householder and his or her spouse or partner. |
Crowded Conditions | Students are considered to be living in crowded conditions if there is less than one room per household member. A room is a space enclosed by walls, a floor, and a ceiling. This excludes bathrooms, porches, balconies, foyers, halls, and unfinished basements. This estimate is calculated for all occupied households, including households without students. |
Lack of computer or broadband internet | Students living in a household without a computer or without a broadband internet connection. This estimate counts any non-dial-up internet connection as broadband. It also counts desktop computers, laptops, smartphones, and tablets as computers. |
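As a rough illustration of how three of the definitions above turn into a district-level share, here is a minimal Python sketch. The function names, the household fields, and the sample households are hypothetical and are not taken from the dataset or from the ACS. The poverty level is passed in as a parameter because it varies by year and household size.

```python
def is_crowded(rooms, household_members):
    """Crowded: less than one room per household member."""
    return rooms < household_members


def lacks_computer_or_broadband(has_computer, has_broadband):
    """True if the household is missing a computer, broadband, or both."""
    return not (has_computer and has_broadband)


def in_poverty(family_income, poverty_level):
    """In poverty: income at or below 100 percent of the poverty level."""
    return family_income <= poverty_level


def share(flags):
    """Share of households for which a condition holds."""
    flags = list(flags)
    return sum(flags) / len(flags)


# Hypothetical households, for illustration only: (rooms, household members)
households = [(3, 4), (5, 2), (4, 4), (2, 5)]
crowded_share = share(is_crowded(r, m) for r, m in households)
print(f"Share of households in crowded conditions: {crowded_share:.2f}")
```

The published estimates are survey-weighted, so this unweighted share only shows the logic of the definitions, not the ACS calculation.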
Because each household condition is an estimate, this dataset includes a margin of error variable for each estimate. The Census Bureau has documentation of the methodology used by the ACS to calculate estimates and margins of error.
ACS Design and Methodology – Chapter 12: Variance Estimation
The Margin of Error columns for the estimates were originally formatted 0%-100%. We transformed these MOEs into one-sided margins of error for easier use later on.
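A one-sided margin of error can be read as the half-width of an interval around its estimate. The Python sketch below shows one way such a value might be used. It assumes that shares and margins of error are stored as proportions between 0 and 1, which the source does not state. The function names and sample values are hypothetical. The 1.645 factor reflects the 90 percent confidence level that the ACS uses for its published margins of error.

```python
Z_90 = 1.645  # z-score for the 90 percent confidence level used by the ACS


def interval(estimate, moe):
    """Return (lower, upper) bounds, clipped to [0, 1] because estimates are shares."""
    lower = max(0.0, estimate - moe)
    upper = min(1.0, estimate + moe)
    return lower, upper


def standard_error(moe):
    """Convert a 90 percent margin of error into a standard error."""
    return moe / Z_90


# Hypothetical district values, for illustration only
share_no_internet = 0.12
moe_no_internet = 0.03

low, high = interval(share_no_internet, moe_no_internet)
print(f"Estimated share: {share_no_internet:.2f} (90% interval {low:.2f} to {high:.2f})")
print(f"Standard error: {standard_error(moe_no_internet):.4f}")
```

Clipping to [0, 1] is a design choice for shares near the boundaries, not part of the ACS methodology.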
Our data process was neither uniform nor organized. Our final research question was informed by the exploration of our given dataset as well as many other school district topics, a process that took most of the quarter. You can follow the evolution of our project and research question through our weekly PowerPoint updates. Along with the PowerPoint updates, we completed analyses each week to document our data exploration and findings as we started deciding on our research question. Here are the analyses in the order they were completed. (These can also be found in the analysis directory, but they are not in order there.)
Based upon our observations throughout the quarter, we became interested in graduation rates, and specifically in how we could predict a district's graduation rate based upon different indicators. This led us to our main research question:
The first step in answering this question was to determine which regression model would best reflect the data we had. Through the tidymodels package in R, we ran eight different models to assess which one would be the most accurate. These trial runs showed that the gradient boosted tree and random forest models were the most accurate for predicting graduation rates.
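The general idea of comparing candidate models on held-out data can be sketched as follows. This is not our tidymodels workflow. It is a minimal Python illustration that swaps in two simple stand-in models (a mean baseline and a one-predictor linear regression) and synthetic data. The function names, the choice of RMSE as the accuracy metric, and the relationship between poverty and graduation rate in the made-up data are all assumptions for illustration.

```python
import random
from statistics import mean


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Least-squares line with a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cv_rmse(fit, xs, ys, k=5, seed=0):
    """Average root mean squared error over k held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_err = [(model(xs[i]) - ys[i]) ** 2 for i in fold]
        scores.append(mean(sq_err) ** 0.5)
    return mean(scores)


# Synthetic, made-up districts: poverty share and graduation rate
rng = random.Random(42)
poverty = [rng.uniform(0.0, 0.4) for _ in range(200)]
grad_rate = [0.95 - 0.5 * p + rng.gauss(0, 0.03) for p in poverty]

candidates = {"mean baseline": fit_mean, "linear regression": fit_linear}
results = {name: cv_rmse(fit, poverty, grad_rate) for name, fit in candidates.items()}
best = min(results, key=results.get)

for name, score in results.items():
    print(f"{name}: cross-validated RMSE = {score:.4f}")
print(f"Lowest error: {best}")
```

Scoring every candidate on the same folds keeps the comparison fair, since each model is evaluated on identical held-out districts.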