The following is a summary of their research, written by Elishka Johnson and Kaylee Meyers
Based on research by Mason Moore, Elishka Johnson, Kaylee Meyers, Jensen Coombs, Brian Arcos-Flores, Amanda Cleary, Anisa Habib, and Austin Fashimpaur.
Analysis and History of PSA Software in Utah
What to do with defendants before their trial is a complicated question full of value judgements and ethical dilemmas. Courts typically wouldn’t want to release someone who would commit a new crime, especially a violent crime, before their trial date, but they also wouldn’t want to incarcerate someone who would not commit any crime while awaiting trial. These are people who have not yet had their day in court, and should be presumed innocent, after all. A less serious scenario that courts nonetheless also want to avoid is releasing people who will flee in an attempt to escape justice, which would necessitate the expenditure of time and money to track a defendant down and bring them back to trial.
Before the advent of pretrial risk assessment software, like the Public Safety Assessment (PSA) analyzed here, a monetary bail system was used to determine who would go free. The “judicial officer [established] probable cause and then [set] monetary bail per the level of the offense charged according to the bail schedule…Judicial officers [exercised] their discretion in determining the bail amount as well as the form of payment” (A Performance Audit of Utah’s Monetary Bail System, p. 3). However, the monetary bail system is seriously flawed. Cash bail criminalizes poverty. Instead of keeping dangerous people in jail and letting safe people go free until trial, it merely keeps poor people in jail. Considering that even short jail times can have severe impacts on the lives of defendants and their families, bail only exacerbates economic inequality. Defendants who cannot pay bail may lose their jobs, housing, or child custody, undermining their ability to earn a living and keep their families together. Because poverty in America is not distributed evenly across racial lines, criminalizing poverty tends to mean criminalizing Blackness too, resulting in widening racial inequality as well. Not only is the bail system anti-poor and potentially racist, but it is also expensive. Because it’s based on who can pay rather than who presents an actual risk to society, states end up incarcerating many people unnecessarily. Keeping defendants in jail pre-trial is a cost that bail schedules do little to reduce.
It was in response to these very real problems that Arnold Ventures (AV) developed the Public Safety Assessment (PSA). Arnold Ventures promoted the PSA as better than a bail schedule, promising that it would make the pretrial system fairer while protecting the public against individuals who were likely to commit additional crimes before their trial. The PSA capitalized on a wider shift in pretrial processes, where pretrial decisions moved from being based on the charges the defendant faced to being based on the risk an individual posed (A Performance Audit of Utah’s Monetary Bail System, p. 3). The output of the PSA is two numbers: scores that are designed to help judges assess risk in two areas. First, the PSA attempts to predict the likelihood that a defendant will commit a new crime if released before their trial. Second, it attempts to predict the likelihood that the defendant will fail to appear for their court hearing. The PSA also flags defendants who pose an elevated risk of committing a new, violent crime while awaiting the resolution of their case (Laura and John Arnold Foundation). The claim is that pretrial risk assessment software can be better than bail schedules in three ways: 1) It can reduce the rate of pretrial incarceration, by not incarcerating individuals who pose little risk to society, thereby saving states money. 2) It will reduce the rates of crimes committed by those awaiting trial by not releasing individuals who are likely to commit new crimes. 3) It will be fair. Because it’s not based on who can pay, but instead on who is likely to commit a new crime or fail to appear, it won’t criminalize poverty or Blackness.
Studying the PSA
The PSA was adopted in Utah in 2018. Given the importance and stakes regarding pretrial policy reform, it is vital to assess whether the PSA is performing as promised and providing the expected benefits. A study of the PSA would need to address two essential questions. First, we should know whether the PSA is lowering rates of pretrial detention. Second, we should know if the algorithm is fair and unbiased.
The problem is that the PSA was implemented without a concurrent data collection plan, and Utah was not collecting any data with which we could assess either of these questions for at least the first two years after implementation. While it was hoped that using the PSA would decrease the number of people held in jail pretrial and save money, there is currently no data regarding the efficacy of the algorithm in Utah. While there are validation studies performed outside of Utah, these are insufficient, because they do not reflect the same demographic or institutional situation as our state; the effects of the algorithm in Louisiana likely would not be the same as in Utah, because there it would be deployed in a different judicial system with a different demographic makeup. It may cause serious and long-lasting harm to individuals and their families if we continue to use the algorithm without local validation.
In addition to not being able to assess whether the PSA is lowering rates of pretrial detention and thereby saving Utah money, there is no available data on pretrial incarceration rates or PSA scores by race, making it equally impossible to check for racial bias. The consequences of these knowledge gaps are severe. If we cannot determine whether the PSA is safe and effective, we risk exacerbating inequalities in the criminal justice system. If the algorithm over-predicts risk for Black defendants, yet under-predicts risk for White defendants, like other pre-trial risk assessment software is known to do, then Black defendants would disproportionately face pre-trial incarceration, and a racist algorithm would be amplifying inequality instead of ameliorating it. If we could establish that the PSA is unfair, we could work to find alternatives for the pretrial release and detention system that prevent further harm to individuals and properly address longstanding social issues.
Additionally, any audit of the PSA will have to deal with the inability to detect false positives. Because the people with high scores are not released, there is no way to know how many of them would not have gone on to commit new crimes or fail to appear. There is no way to tell how many people are being jailed unnecessarily.
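The false-positive problem can be made concrete with a small simulation. The sketch below is illustrative only: the 1 to 6 score scale, the detention cutoff, and the failure probabilities are assumptions chosen for the example, not properties of the PSA or of Utah data. Inside a simulation we can see what every defendant would have done; a real auditor sees outcomes only for the people who were released.

```python
import random

def simulate_cohort(n=10_000, detain_at=5, seed=0):
    """Simulate defendants with hypothetical risk scores and detention decisions."""
    rng = random.Random(seed)
    released = detained = observed_failures = hidden_false_positives = 0
    for _ in range(n):
        score = rng.randint(1, 6)  # assumed score scale, for illustration only
        # Assumed: the chance of a new offense or missed hearing rises with score.
        would_fail = rng.random() < 0.08 * score
        if score >= detain_at:
            detained += 1
            if not would_fail:
                # Detained although they would not have failed. No real
                # dataset can record this, because the person was never released.
                hidden_false_positives += 1
        else:
            released += 1
            observed_failures += would_fail
    return {
        "auditor_sees": {
            "released": released,
            "observed_failures": observed_failures,
            "detained": detained,
        },
        "known_only_inside_the_simulation": {
            "hidden_false_positives": hidden_false_positives,
        },
    }

result = simulate_cohort()
print(result)
```

Everything under `auditor_sees` could in principle be collected by a court system. The count of unnecessary detentions exists only because the simulation knows the counterfactual, which is exactly the information a real audit lacks.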
Qualitative Audit: Is the PSA biased?
Given that the data required to perform a quantitative audit of the PSA is not available, it is important to explore other means of assessing whether the PSA is racially biased. While it may not be possible to tell if the risk assessment algorithm is lowering pre-trial detention rates in Utah without hard numbers, racial bias can be assessed by evaluating the nine inputs used to calculate the risk scores and determining whether they act as proxies for demographic characteristics such as race, gender, or income. The nine inputs are: 1) Age at current arrest, 2) Current violent offense, 3) Pending charges at the time of offense, 4) Prior misdemeanor conviction, 5) Prior felony conviction, 6) Prior violent conviction, 7) Prior failure to appear in the past two years, 8) Prior failure to appear older than two years, 9) Prior sentence to incarceration. (INSERT CITATION)
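To make the shape of these inputs concrete, the nine factors can be written down as a simple record. This is a sketch for discussion, not the PSA's actual data format: the class name, field names, and the choice to model every factor other than age as yes/no are assumptions made here, and no scoring weights are reproduced.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class PSAInputs:
    """Hypothetical record of the nine inputs listed above (types are assumed)."""
    age_at_current_arrest: int
    current_violent_offense: bool
    pending_charge_at_time_of_offense: bool
    prior_misdemeanor_conviction: bool
    prior_felony_conviction: bool
    prior_violent_conviction: bool
    prior_fta_past_two_years: bool
    prior_fta_older_than_two_years: bool
    prior_sentence_to_incarceration: bool

input_names = [f.name for f in fields(PSAInputs)]
print(len(input_names), input_names)
```

Race, income, and geography do not appear as fields. Whether the remaining fields carry that information indirectly is the question a qualitative audit has to answer.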
While Arnold Ventures claims that race, ethnicity, and geography are not considered by the algorithm, the nine variables above can still perpetuate bias without directly asking about race. This is especially concerning for inputs 4, 5, and 9, which ask about prior convictions for misdemeanors, felonies, and prior incarceration. Studies have shown that Black Americans experience more severe outcomes than White Americans at every step in the justice process. They are more likely to be stopped by police, arrested, charged, detained pretrial, and sentenced to prison. For example, drug use rates are relatively constant across racial groups, but Black people are arrested and sentenced for drug-related offenses at a much higher rate than White people. Given these racial disparities in conviction rates across all types of crime, using prior misdemeanors and felonies to calculate a pretrial risk score will mean that Black defendants, as a group, will receive disproportionately high risk scores. While they may not directly address or mention race, factors 4 and 5 will inevitably perpetuate racial biases.
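The proxy mechanism described above can be illustrated with synthetic numbers. In the sketch below, two groups behave identically, but an offense leads to a recorded conviction more often in one group than in the other. The offense rate, the conviction rates, and the one-point toy score are all assumptions invented for illustration; they are not estimates from any real dataset and this is not the PSA's scoring formula.

```python
import random

def mean_toy_score(conviction_rate_given_offense, n=20_000, seed=1):
    """Average toy score for a group; one point is added for a prior conviction."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        # Assumed: the underlying offense rate is the same in both groups.
        offended = rng.random() < 0.30
        # Only the chance that an offense becomes a recorded conviction differs.
        convicted = offended and rng.random() < conviction_rate_given_offense
        total += 1 if convicted else 0
    return total / n

group_a = mean_toy_score(conviction_rate_given_offense=0.25)
group_b = mean_toy_score(conviction_rate_given_offense=0.50)
print(group_a, group_b)
```

The score never sees group membership, yet the group that is convicted more often for the same behavior ends up with a higher average score. Any input built on prior convictions inherits whatever disparities produced those convictions.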
There are additional concerns about inputs 7 and 8 because they do not provide a full or detailed enough history of a defendant to be able to draw conclusions about them. These factors ask whether a defendant has failed to appear at a prior court hearing in the past two years or failed to appear at a hearing more than two years ago. Reducing failure to appear to a yes-or-no question strips away all context surrounding that failure. Failing to appear because you’re on the run in an effort to escape justice is very different from failing to appear because of a lack of transportation or a family emergency. Appearing in court can be a major hurdle for an individual who cannot miss work out of fear of losing their job, who must care for children, or who does not have access to proper legal advice. Appearing in court is simply harder for poor individuals.
Furthermore, these inputs are also potentially proxies for race. Because of historical systemic racism, legacies from slavery, and redlining, Black Americans are overrepresented in poverty. The racial wealth gap means that Black defendants are more likely to be poor and therefore might also have a higher chance of having a failure to appear on their record. Because the PSA strips failures to appear of their context and reduces them to a single yes/no input, it has no way to account for or mitigate these racial disparities. The algorithm is likely to simply exacerbate them.
Even without the data necessary to perform a statistically robust audit of the PSA in Utah, it is possible to see how this algorithm amplifies the very problems it purportedly ameliorates.
PSA Implementation in Utah
In addition to investigating the algorithm itself to check for possible bias, it is important to examine the manner of its implementation. The implementation of an AI or software system is another area where responsible AI can break down. Even if the core technology displays no biases, a botched implementation could make the use of an AI reckless. We therefore provide here a detailed history of the implementation of the PSA in Utah courts.
The Pretrial Release and Supervision Committee (PRSC) was created by the Utah Judicial Council in the Fall of 2014 to oversee a series of pretrial release and detention reforms and provide recommendations to the Council. Composed of District and Justice Court judges, prosecutors, defense attorneys, police representatives, and state legislators, the committee was charged with analyzing Utah’s then-current practices and assessing their effectiveness, determining how to improve the information provided to judges when making release decisions, reviewing the history of release and bail legislation, and evaluating pretrial release alternatives (Judicial Council Meeting Minutes, 2015, p. 7). The committee offered and pursued several recommendations for reform, including developing pretrial risk assessments and pretrial supervision systems (Judicial Council Meeting Minutes, 2015, p. 7). This is reflected in the committee’s “Report to the Utah Judicial Council on Pretrial Release and Supervision Practices,” which was released in 2015. In the report, the committee makes a notable observation about a problem that would later hamper the implementation of the PSA and other pretrial reforms:
“Pretrial release and supervision data is spotty and inconsistent in Utah. In part, this is because there are different data systems in the different branches designed to accomplish different things. The committee recommends that all pretrial release and supervision stakeholders work to create uniform, statewide data collection systems or to improve or modify existing systems. First, and perhaps most important, accurate and up-to-date data is necessary for accurate and up-to-date pretrial risk assessments. These assessments rely on data that resides within systems maintained by the courts, systems maintained by the executive branch, and systems maintained by the counties.”
(Report to the Utah Judicial Council on Pretrial Release and Supervision Practices, 2015, p.52, our italics and bolding).
The committee acknowledges that up-to-date, uniform, and accurate data collection systems are essential to ensure that pretrial reforms can be reviewed and changed if necessary. This acknowledgment suggests awareness of best practices for algorithm implementation. Even so, the PSA was implemented before most of these fundamental data collection issues were addressed or resolved, and this hasty implementation caused an array of technical issues. For example, in early 2020, the PSA could not interface with national criminal databases, and the Judicial Council had to allocate $400,000 to have the problem fixed before June of that year. In fall of 2020, an audit of Utah’s judicial information systems found that the PSA was only given to judges in approximately 30% of cases because the system could not draw information from other states’ databases, and the PSA score would not be generated if a state ID was not provided at the time of the defendant’s booking.
In addition to being implemented in the state before the necessary technical and data-collection infrastructure was in place, the PSA was not locally validated as U.S. federal guidelines recommend. The Bureau of Justice Assistance under the U.S. Department of Justice recommends that a validation study consider and answer the following questions for each location of implementation:
- “How well does a tool separate those who experience an outcome of interest (i.e., recidivism) from those who do not?”
- “How accurately does the tool predict the likelihood of such an outcome?”
- “How frequently does the tool inaccurately predict a low-risk individual to be at high-risk (i.e., false positive errors) and vice versa (i.e., false negative errors)?”
- “How sensitive are validation results to different test settings (i.e., different samples, methods)? How does the tool perform across subgroups by race, ethnicity, and gender?” (https://bja.ojp.gov/sites/g/files/xyckuh186/files/media/document/pb-risk-validation.pdf)
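As a rough illustration of what answering these questions involves, the sketch below computes three common validation measures on a handful of made-up records: discrimination (the area under the ROC curve), the observed outcome rate at each score, and false positive and false negative rates, each of which can then be broken out by subgroup. The record layout `(score, outcome, group)`, the group labels, and the high-risk cutoff of 5 are hypothetical choices for this example, not part of the Bureau of Justice Assistance guidance or the PSA. In practice such records would exist only for released defendants, since outcomes for detained defendants are never observed.

```python
from collections import defaultdict

def auc(records):
    """Chance that a person with the outcome scored higher than one without (ties count half)."""
    pos = [s for s, y, _ in records if y]
    neg = [s for s, y, _ in records if not y]
    if not pos or not neg:
        return None
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def error_rates(records, high_risk_at):
    """False positive and false negative rates at a given high-risk cutoff."""
    fp = sum(1 for s, y, _ in records if s >= high_risk_at and not y)
    fn = sum(1 for s, y, _ in records if s < high_risk_at and y)
    neg = sum(1 for _, y, _ in records if not y)
    pos = sum(1 for _, y, _ in records if y)
    return {
        "false_positive_rate": fp / neg if neg else None,
        "false_negative_rate": fn / pos if pos else None,
    }

def observed_rate_by_score(records):
    """Calibration check: share of people at each score who had the outcome."""
    by_score = defaultdict(list)
    for s, y, _ in records:
        by_score[s].append(y)
    return {s: sum(ys) / len(ys) for s, ys in sorted(by_score.items())}

def by_group(records, high_risk_at):
    """Repeat the discrimination and error-rate checks within each subgroup."""
    groups = defaultdict(list)
    for record in records:
        groups[record[2]].append(record)
    return {g: {"auc": auc(rs), **error_rates(rs, high_risk_at)}
            for g, rs in groups.items()}

# Made-up records: (score, had_outcome, group). Not real data.
records = [
    (1, False, "A"), (2, False, "A"), (5, True, "A"), (6, True, "A"),
    (2, True, "B"), (5, False, "B"), (3, False, "B"), (6, True, "B"),
]
print(auc(records))
print(error_rates(records, high_risk_at=5))
print(observed_rate_by_score(records))
print(by_group(records, high_risk_at=5))
```

In this toy data the tool looks reasonable overall but performs no better than chance for group B, which is the kind of gap that only a subgroup breakdown reveals. A real validation would also need enough records per county and per subgroup for these rates to be statistically meaningful.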
The PRSC and the Judicial Council have not clearly pursued answers to these questions.
Using locally collected data, the PSA should have been validated for each county periodically; at least once a year, its operations should have been formally reviewed. If it did not lower costs or pre-trial detention rates, or if it showed bias toward certain demographics, its operations should have been paused.
Some members of the PRSC were not concerned about studying the effects of the PSA in Utah because it had already been validated elsewhere, despite the clear recommendation that algorithms be validated locally each time they are put in place. Local validation matters most when the algorithm was developed in a different area, without any local customization. When the committee discussed a study by Harvard’s Access to Justice Lab that would further examine the PSA, the minutes record:
“[One member] explained that the PSA tool has already been validated and the Harvard study isn’t about seeing if the tool works. The study is about what effect the PSA has on judicial decision-making…If the data (the state’s or Harvard’s) shows that the processes are not helpful, then the study or the PSA will not continue to be used. As far as the study is concerned, it is a nice add-on, but if it ever inhibits public safety and the courts’ ability to serve the public, then the study will not continue in Utah.”
(PRSC Meeting Minutes, 2018, p. 3)
While the members of the PRSC were concerned about the validity and efficacy of the PSA, it is troubling that federal guidelines were not followed and that, as far as has been made publicly available, the PSA still has not been locally validated. In other words, we are not sure if the PSA is more or less effective than the preexisting status quo or judges’ choices before the PSA was introduced. The PSA was created to provide judges with more information about a defendant before they decide to release them or hold them in jail until their hearing. We should be certain that the PSA does what was intended: that judges have adequate information to make decisions and that the pretrial release and detention system is fairer.
Finally, the Judicial Council circumvented the legislative process in implementing the PSA; the PSA was discussed and approved within only the judicial branch. This raises important questions about which branches of government should be consulted on the implementation of algorithms that will be used on the state’s population. It also makes it unclear who is accountable when things go wrong and whether a small body should set the rules of precedence when it operates outside of its regulatory body’s capacity to oversee its activities. If it were discovered that the PSA was exhibiting bias against a certain group, it is not clear who would be held accountable for the consequences, what accountability would look like, and how it would be enforced. The Judicial Council membership is made up of district court judges. While judges must be elected for retention, they are initially appointed without direct input from the public, leaving the members of the Judicial Council relatively insulated from public criticism.
Ideal Legislative Involvement
Because the Judicial Council implemented the PSA without any legislative process, there are no codified rules about minimum standards in Utah for risk assessment algorithms. To our knowledge, without a Utah law, any actions made by courts or judges relating to the PSA, no matter how egregious, would not be punishable by the force of law. This is dangerous given that no local validation has been reported about the PSA’s actual effects.
As citizens of an electoral-representative democracy, the public also has a right to elect representatives to make decisions about how their communities are run. Circumventing legislation to implement such a questionable tool could be seen as an evasion of that right. If the PSA does what the Laura and John Arnold Foundation (LJAF) claims, there would surely be little legislative opposition to its use.
Moving Forward
Revisitation: is it working?
As part of its plan for using the PSA, the state of Utah should set out a specific process, codified in law, for periodically re-evaluating the tool’s performance. Each year a comprehensive, formal report should be made and presented to different councils. If certain standards are not met, keeping the program only harms the communities it judges. Even if this risk assessment tool were entirely perfect in the first few years, its usefulness is likely to degrade as communities change over time and the algorithm does not. Thus, codified standards for periodic evaluation are absolutely necessary.
Ideal Public Availability
The data collected both before and during the implementation should be made available to the public, as well as the PSA software itself.
Court data is often publicly available if it’s recorded – one of the ways that courts fulfill their duty of serving the public. When the government uses tools like the PSA on local communities, the public has the right to 1) know that an algorithm is being used on them and 2) audit the algorithm in question. Because algorithms simplify immensely complex cultural phenomena, they often behave unexpectedly in hidden ways. Having more eyes and perspectives focused on the actual operation of the PSA would allow us to catch and fix these flaws more quickly and effectively.
Conclusion
The story we see with the PSA is a story we think state governments, corporations, and nonprofits will see frequently in the future. The story goes like this: somebody identifies a complex and persistent societal problem. A well-intentioned company comes along and pitches their AI as a solution to this problem. The developers claim that this solution will perform better than does the current system, and at the same time, save the customer money. This deal is attractive to leaders looking to solve this problem, so the AI is adopted. But it is adopted without any plan to collect the data needed to assess if the AI is performing as promised. Additionally, many times, the problem is too abstract, value-laden, and nuanced for AI to function well, and so it takes shortcuts to fulfill its program but leaves harmful, unintended, and unexpected effects. With the rapid development of AI, this pattern will certainly become commonplace—unless we arm ourselves with the tools necessary to make wise and responsible decisions about implementing AI systems. Performing local validation studies, gathering the data required to test the efficacy of the AI, thinking through unforeseen costs, and making democratic decisions about AI adoption would go a long way toward making AI more responsible.