Projects

Team Coin Flip: Travel Fatigue and Performance

This group project looked to determine whether an MLB team’s travel schedule and fatigue had any effect on a team’s in game performance. Initially through our exploratory data analysis, we found that there was significant evidence for the prevelance of home-field advantage, and, through a Granger Causality Test, we found that there was no evidence to support causality between the distance travelled by a team, and their offensive and defensive on-base percentage(OBA).

We looked to get a baseline understanding of a team’s ability by developing an ELO ranking system derived from 538’s NFL ELO system, accounting for margin of victory and pitching form. Additionally, two scalars which controlled the extent of ELO change were found by a grid-search parameter tuning method.

Using the ELO and a group of fatigue metrics(distance travelled, jet lag, days away team has been on the road, and the difference in travel direction for both teams), we built an XGBoost model which looked to predict the probability of the home team winning for each game. Through a feature analysis, we found that the fatigue metrics had very little effect on the model’s predictions, concluding that fatigue and travel had very little effect on a team’s in game performance.

This project was a submission for the 36-hour 2024 Rice Datathon, where we finished 2nd place overall out of 59 teams and received a $700 team prize.

Breaking the Cycle: Reducing Recidivism in Iowa State Prisons

This group project evaluates the factors contributing to prison recidivism and looks to predict the total fiscal cost of recidivism for the May 2023 population of the Iowa state prison system. We created a binary classifier using a Feedforward Neural Network which would predict the probability that an inmate, given certain variables, would re-offend.

Since it was found that time-sensitive, county-level data (e.g. unemployment rate) affected recidivism rates, we fit distributions to model the length of an inmate’s sentence, given the severity of the crime, and created regressions to model the county-level data at release time. We also used previous data to find the odds of an inmate committing a different severity of crime, given that they re-offend.

These models were used in a Monte Carlo simulation, where for each trial, the simulation predicts the release month of each inmate and the county-level parameters at that time. Using these variables, a Feedforward Neural Network predicts the odds of recidivism. We also find what level of crime would be committed, given re-offense, for each crime. We would then find the expected cost of recidivism by multiplying all the odds of recidivism by the fiscal cost of the predicted severity of crime.

For factor analysis, we used the SHAP (SHapley Additive exPlantations) library which used Shapley values from game theory to find which variables had the most effect on an inmate’s probability of re-offense.

This project was created as a submission for the 2023 Modeling the Future Challenge (MTFC), where we finished 2nd place out of 227 teams, gaining a $15,000 team award as well as a publication in the 2023 edition of the Actuarial Clearing House publication.