Abstract
In designing risk assessment algorithms, many scholars promote a “kitchen sink” approach, reasoning that more information yields more accurate predictions. We show, however, that this rationale often fails when algorithms are trained to predict a proxy of the true outcome, as is typically the case. With such “label bias”, one should exclude a feature if its correlation with the proxy and its correlation with the true outcome have opposite signs, conditional on the other model features. This criterion is often satisfied when a feature is weakly correlated with the true outcome and, in addition, both that feature and the true outcome are direct causes of the remaining features. For example, due to patterns of police deployment, criminal behavior and geography may be only weakly correlated with each other while both being direct causes of one’s criminal record. This suggests that one should exclude geography from criminal risk assessments trained to predict arrest as a proxy for behavior.
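The exclusion criterion can be illustrated with a small simulation. What follows is a minimal sketch under assumed conditions, not the authors' analysis: the linear Gaussian data-generating process, the coefficient 1.5, the variable names, and the helpers `residualize`, `partial_corr`, and `fit_predict` are all hypothetical choices made for illustration. Geography and behavior are drawn independently, both feed into the criminal record, and arrest (the proxy) depends on geography more strongly than behavior alone would warrant.

```python
import numpy as np

def residualize(v, controls):
    """Return the part of v not linearly explained by the controls."""
    A = np.column_stack([np.ones(len(v)), controls])
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ coef

def partial_corr(a, b, controls):
    """Correlation between a and b after regressing both on the controls."""
    ra, rb = residualize(a, controls), residualize(b, controls)
    return float(np.corrcoef(ra, rb)[0, 1])

def fit_predict(features, target):
    """Least-squares fit of target on features; returns in-sample scores."""
    A = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return A @ coef

# Assumed (hypothetical) data-generating process.
rng = np.random.default_rng(0)
n = 200_000
behavior = rng.normal(size=n)    # true outcome, unobserved in practice
geography = rng.normal(size=n)   # candidate feature, independent of behavior
record = behavior + geography + rng.normal(size=n)        # caused by both
arrest = behavior + 1.5 * geography + rng.normal(size=n)  # proxy label

# Criterion: compare signs of the two correlations, conditional on the record.
pc_proxy = partial_corr(geography, arrest, record)
pc_true = partial_corr(geography, behavior, record)
exclude_geography = pc_proxy * pc_true < 0

# Train on the proxy with and without geography; evaluate on the true outcome.
score_without = fit_predict(record, arrest)
score_with = fit_predict(np.column_stack([record, geography]), arrest)
corr_without = float(np.corrcoef(score_without, behavior)[0, 1])
corr_with = float(np.corrcoef(score_with, behavior)[0, 1])

print(f"partial corr with proxy:        {pc_proxy:+.3f}")
print(f"partial corr with true outcome: {pc_true:+.3f}")
print(f"exclude geography:              {exclude_geography}")
print(f"corr(score, behavior) without geography: {corr_without:.3f}")
print(f"corr(score, behavior) with geography:    {corr_with:.3f}")
```

In this assumed setup, conditioning on the record makes geography negatively associated with behavior (the record is a common effect of both), while geography remains positively associated with arrest. The signs disagree, so the criterion recommends exclusion, and the score trained without geography tracks true behavior more closely than the kitchen-sink score. In practice the true outcome is unobserved, so the sign of its conditional correlation has to be argued from domain knowledge rather than computed as it is here.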
Citation
Zanger-Tishler, Michael, Julian Nyarko, and Sharad Goel. "Risk Scores, Label Bias, and Everything but the Kitchen Sink."