Biased Programing in Law Enforcement AI Programs – Garbage In – Garbage Out | The Centre for the Study of New Security Challenges

by Richard Hoskins

I generally approach invitations to publicly share my views with a degree of trepidation; however, in this instance I am somewhat more comfortable sharing my thoughts. The use of AI for law enforcement has the potential to seriously impact two communities that I am a part of: the African American and law enforcement communities. My personal and professional experiences allow me to reflect and offer my opinions on the potential issues arising from using biased data to program law enforcement AI systems. I cannot overemphasize that I am not presenting the official views of any agency. The opinions expressed are mine alone.

As a sci-fi fan, I find this topic particularly fascinating. AI is a natural extension of the science that has been predicted in fiction for decades: think of HAL 9000 (Heuristically Programmed ALgorithmic Computer) from 2001: A Space Odyssey, Skynet from the Terminator movies, or one of my favorite books growing up, Colossus: The Forbin Project. Today we are faced with the onset of real robocops and super crime-fighting computers. We are even moving in the direction of AI-managed policing to predict who will commit crime, not unlike the supercomputers depicted in the movie Minority Report. But are we qualified to teach these computers how to make fair assessments of human behavior? Does the historical data we depend on as a template represent fair, unbiased law enforcement practices? Are we the best role models, given the history of questionable law enforcement practices directed against certain segments of our society and the continued, unflattering way the public often interprets police practices?

Even deep learning algorithms are only as good as the source data used to program them: garbage in–garbage out. In this post, I will discuss the impacts of programming AI algorithms with historical data that is suspected of reflecting, or proven to reflect, biased human perspectives.

For those of us of a certain age, it is both awe-inspiring and terrifying to watch how much influence computers have on our daily lives. I can’t imagine having to start a career in today’s world. In order to compete in today’s market, you have to have an online profile. Doing so means your online personality and profile are subject to the same scrutiny that used to be reserved for face-to-face encounters.

Algorithms can be used to determine which candidates should be awarded an interview, which means an AI program might eliminate a person before they are even given a chance to meet potential employers face to face. If AI as it exists now has that level of influence over our lives, we should absolutely be concerned that it is fair and unbiased.

Is it Possible to Eliminate Bias?

How do we eliminate bias in machines when we have not learned to master it as humans? We often attach negative connotations to the term bias, but part of the dilemma is the complicated way bias functions in our society. After all, isn’t bias just another way of expressing preference? We cannot simply classify bias as something to be avoided in every situation. We have to teach AI to discern when bias is acceptable and, more specifically, when it is inappropriate and potentially harmful.

If someone were to say that a law enforcement agency was demonstrating a preference for hiring white agents, the initial reaction might be to suspect racial discrimination. But what if new investigative initiatives required hiring agents who could go undercover and infiltrate white supremacist groups or develop informants from within those organizations? In a sense, any algorithm designed to help law enforcement agencies identify candidates with specific qualifications would need to be programmed with a bias against those who do not possess the preferred traits. My first observation is that we should resist speaking in terms of “eliminating bias” in AI and instead master our understanding of its nature, so that it can be managed effectively and does not disenfranchise anyone.

Where Does Bias Come From?

There are many things in our environment that shape our bias, such as significant events or encounters with particularly impactful individuals that leave a strong impression. We also see shared biases based on religious upbringing and biases shared by people in an environment with a prevailing political perspective. If you examine a political map of the United States indicating red (conservative) and blue (liberal) states, you will notice clusters of large geographic areas where a common political philosophy is dominant. These things shape our values, beliefs, and the very manner in which we view the world. This, in turn, informs our decisions.

Humans’ susceptibility to environmental influences amplifies concerns about bias-infused AI. AI algorithms may be exposed to varying biases depending on who programmed them. Factors such as the programmer’s age, gender, race, ethnicity, and geographic origins can impact the AI decision-making process. We should also question the impact of limiting the human perspectives to which the AI programs are exposed. Similarly, we should not assume that the viewpoints held by those who program algorithms are the only accepted viewpoints, or that their viewpoints won’t change over time.

What Is the Impact of Programmer and Data Representation?

Interviewed in a Forbes Magazine article entitled “How Bias Distorts AI,” expert Dr. Rebecca Parsons noted that the teams responsible for the development, training, and deployment of AI systems are largely not representative of society at large. She referenced an NYU research report which at that time determined that women comprised only 10% of AI research staff at Google and that only 2.5% of Google’s workforce was black. She opined that this lack of representation is what leads to biased datasets and, ultimately, to algorithms that are much more likely to perpetuate systemic biases. This concern extends to the predominant political affiliations held by the programmers. These observations are consistent with some criticisms directed at companies like Twitter, Facebook, and Google regarding their perceived bias against certain political perspectives. They are often accused of banning more conservative and right-wing users than liberal users and of supporting content that favors liberal users and leftist views.

There have been attempts to better understand the nature of bias in humans. One tool researchers use is the Implicit Association Test (IAT), which measures the responses of human participants who are asked to pair word concepts displayed on a computer screen. A related approach applies the same idea to written text: an algorithm provides statistical analysis of which words are often used in association with one another and which are rarely associated. A program was used to index content containing nearly 100 billion words. Researchers examined sets of target words, looking for patterns and evidence of the potential biases that humans unwittingly possess. The program associated feminine names more strongly than masculine names with family-related words such as ‘parents’ and ‘wedding.’ Masculine names, on the other hand, had stronger associations with career-related words such as ‘professional’ and ‘salary.’ The project highlighted that the biases in the word embeddings are in fact closely aligned with social conceptions of gender stereotypes.
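To make the idea of measuring word associations concrete, here is a minimal sketch of how such a score might be computed. It is not the researchers' actual method or data: the three-dimensional vectors, the names "amy" and "john," and the function names are all hypothetical, chosen by hand so that the pattern described above is visible. Real word embeddings are learned from very large bodies of text and have hundreds of dimensions.

```python
import math

# Hypothetical toy "embeddings", hand-picked for illustration only.
TOY_VECTORS = {
    "amy":          (0.9, 0.1, 0.2),
    "john":         (0.1, 0.9, 0.2),
    "wedding":      (0.8, 0.2, 0.1),
    "parents":      (0.7, 0.3, 0.2),
    "salary":       (0.2, 0.8, 0.1),
    "professional": (0.3, 0.7, 0.2),
}

FAMILY = ["wedding", "parents"]
CAREER = ["salary", "professional"]


def cosine(u, v):
    """Cosine similarity: near 1.0 means the two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity(word, attribute_words, vectors=TOY_VECTORS):
    """Average similarity between one word and a set of attribute words."""
    sims = [cosine(vectors[word], vectors[a]) for a in attribute_words]
    return sum(sims) / len(sims)


def association(word, set_a, set_b, vectors=TOY_VECTORS):
    """Positive if the word sits closer to set_a, negative if closer to set_b."""
    return mean_similarity(word, set_a, vectors) - mean_similarity(word, set_b, vectors)


amy_score = association("amy", FAMILY, CAREER)
john_score = association("john", FAMILY, CAREER)
print(f"amy:  {amy_score:+.3f}")   # positive: leans toward the family words
print(f"john: {john_score:+.3f}")  # negative: leans toward the career words
```

With these made-up vectors, the feminine name scores positive (closer to the family words) and the masculine name scores negative (closer to the career words). The point of the sketch is only that such a score is learned from whatever associations the source text contains; nobody has to program the stereotype in deliberately.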

It is not hard to find evidence that supports these findings in the real world. One large company based in the U.S. discovered that its internal recruiting tool was dismissing female candidates because it was fed data from a period when men were treated more favorably than women in hiring and promotion. When they examined the data, they realized that the computer had unintentionally been taught to associate certain key terms with a specific gender. Correcting the bias involved repeated efforts to identify every element of the programming where these associations existed and eliminate them.

If the IAT findings are accepted as a possible indicator of how bias manifests in our society, we might consider that it provides insight into other instances of potential bias resulting in unfair treatment. That being said, the IAT has received criticism from some in the academic community. But even though its findings may not be universally accepted, there is little debate that it has inspired a much-needed dialog that warrants further consideration.

What Are the Implications when Predicting Crime?

I mentioned the crime-predicting AI supercomputers in the movie Minority Report. This type of algorithm is currently being used in the U.S. and other countries. One such program is named the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS). COMPAS was designed as an aid in determining parole by assessing the likelihood of recidivism. Over the last couple of years, the program’s accuracy has been questioned, following a ProPublica investigation that indicated the system might contain strong racial bias. The investigation found that black defendants who did not recidivate over a two-year period were nearly twice as likely to be misclassified as higher risk compared to their white counterparts (45 percent vs. 23 percent). White offenders were ranked as lower risk than black offenders even when their criminal histories indicated a higher probability of reoffending.
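The comparison at the heart of that finding can be sketched in a few lines. The following is a minimal illustration, not ProPublica's analysis: the records are made up, the group labels "A" and "B" are placeholders, and the counts were chosen only so that the resulting rates echo the 45 percent and 23 percent figures quoted above. The function name is hypothetical.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Share of people who did NOT reoffend but were flagged high risk, per group.

    records: iterable of (group, flagged_high_risk, reoffended) tuples.
    """
    flagged = defaultdict(int)
    total = defaultdict(int)
    for group, high_risk, reoffended in records:
        if reoffended:
            continue  # only people who did not reoffend can be false positives
        total[group] += 1
        if high_risk:
            flagged[group] += 1
    return {group: flagged[group] / total[group] for group in total}


# Made-up records: 100 non-reoffenders per group, plus a few reoffenders
# that the calculation ignores.
records = (
    [("A", True, False)] * 45 + [("A", False, False)] * 55
    + [("B", True, False)] * 23 + [("B", False, False)] * 77
    + [("A", True, True)] * 10 + [("B", False, True)] * 10
)

rates = false_positive_rates(records)
for group in sorted(rates):
    print(f"group {group}: {rates[group]:.0%} of non-reoffenders flagged high risk")
```

A tool can look reasonably accurate overall and still make this kind of mistake far more often for one group than another, which is why error rates need to be checked group by group and not only in aggregate.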

The ProPublica findings have been contested by the company that created the algorithm. Even so, just the possibility that a racially biased AI program is advising judges and parole panels on whether to release people from incarceration is concerning.

How Can We Resolve Biases in Historical Data?

The examples used thus far are not meant to accuse programmers of malicious intent to teach AI to treat some people unfairly. Additionally, biased activity that does lead to unfair treatment often changes as we evolve as a society. Certainly, the manner in which women are treated in the workplace has evolved tremendously over the past several decades, and the police practices I witnessed as a rookie cop in the 70s would not be tolerated now. However, even though many of these acts of overt discrimination are no longer taking place, there is evidence of less obvious, bias-based law enforcement practices impacting minorities today.

Former New York Mayor Michael Bloomberg, one of the early candidates for President of the United States in 2020, has frequently been forced to address a law enforcement policy from his time as mayor, under which officers were deployed to stop and frisk people, particularly in minority communities. More than 80 percent of the people stopped were people of color, mostly young African American or Latino men. The stops were often violent, with police throwing young people against walls or to the ground. The practice failed to reduce crime and was discontinued after a federal district court ruled in 2013 that it was being carried out unconstitutionally. Mr. Bloomberg has since apologized profusely for his policies, admitting that he was wrong.

What happens when we do determine that past police practices unfairly targeted people of color? Do we discontinue the use of data collected as a result of those practices? During the Stop and Frisk campaign, police made numerous misdemeanor arrests for low-level offenses, such as jaywalking and possessing open alcohol containers. They even increased arrests among students of color by deploying more police in schools in minority neighborhoods. Even though the practice has been discontinued, what happens to the volumes of data collected on those who were stopped and released, arrested for minor infractions, or convicted? Should data that was collected through unconstitutional practices be used to program predictive law enforcement AI?

If those records are the basis for predictive algorithms, then would not the resulting predictions be biased? How do we address the continued use of historical data possibly resulting from unfair targeting of certain segments of the population?
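One way to see why the answer is likely yes is a deliberately simple toy model. It is a sketch under stated assumptions and not a description of any real predictive policing system: there are two hypothetical neighborhoods, A and B, with identical true offense rates; recorded arrests in each are assumed to be proportional to the share of patrols sent there; and the "algorithm" simply assigns next year's patrols in proportion to last year's recorded arrests. The function and parameter names are illustrative.

```python
def simulate_patrols(initial_share_a, true_rate_a, true_rate_b, rounds):
    """Track the share of patrols sent to neighborhood A over several rounds.

    Assumptions of this toy model:
      * recorded arrests = patrol share * true offense rate
      * next round's patrol share is proportional to recorded arrests
    """
    share_a = initial_share_a
    history = [share_a]
    for _ in range(rounds):
        arrests_a = share_a * true_rate_a
        arrests_b = (1 - share_a) * true_rate_b
        share_a = arrests_a / (arrests_a + arrests_b)
        history.append(share_a)
    return history


# Historical practice sent 80% of patrols to neighborhood A, even though
# both neighborhoods have the same underlying offense rate.
history = simulate_patrols(
    initial_share_a=0.8, true_rate_a=0.05, true_rate_b=0.05, rounds=5
)
print([round(share, 3) for share in history])
```

Under these assumptions, the patrol share for neighborhood A stays at about 80 percent in every round. The records keep "confirming" the original allocation because arrests are only recorded where officers are sent, so the inherited skew never corrects itself even though the two neighborhoods are identical.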

Considering our capability to evolve both individually and as a society, is it not possible that some of the data used to program the AI no longer reflects current attitudes and policies? Some of the data may reflect arrests that would not even be made today, considering changes such as the end of Stop and Frisk, the decriminalization of marijuana, or prison reform regarding some nonviolent offenses. We should also consider the specifics of geographic region, neighborhood demographics, unemployment rates, the training available to law enforcement, etc. Going back to the Google staffing example, what if, at the time the data was collected, only 10% of the officers on the force were black, with even fewer in the upper ranks and the prosecutor’s office? The insensitivity of some of these law enforcement practices might be attributed to a lack of diversity within the agencies and among their leadership.

Are There Consequences for AI when its Biases Are Deemed Inappropriate and Potentially Harmful?

It is important to acknowledge and understand that the roots of biased practices are not necessarily malicious. In fact, what the public may see as biased police practices may be sanctioned techniques supported by U.S. Supreme Court decisions and, unlike New York’s Stop and Frisk campaign, deemed constitutional. In Terry v. Ohio (1968), the Court held that a police officer’s observation of what he or she reasonably judged to be suspicious behavior was sufficient grounds to stop a person and conduct a cursory investigation. I would go so far as to offer that a police officer who cannot recognize possible criminal activity on his or her beat is not very good. But what a police officer deems suspicious can easily be impacted by personal bias. For this reason, there is potential for this important law enforcement tool to unfairly impact some in society more than others.

According to a study by the Washington Post, nearly 1,000 people were fatally shot by police in the United States in 2015. A disproportionately high share (40 percent) of the unarmed people killed by police were black men. To appreciate how disproportionate this is, one must consider that only 13 percent of the U.S. population is black, and black males represent only about 6 to 7 percent. Yet they represent 40 percent of the unarmed people killed by police. These statistics have given rise to a movement in the United States called “Black Lives Matter,” which is often mistaken as just a plea for police to value black lives as much as they would white lives. But it has a second intent: to call attention to how frequently these cases fail to end in convictions for the police officers who committed the killings. Black lives are worth criminal convictions of those who take those lives.

As a law enforcement officer, I am grateful that the American public and its juries tend to give police the benefit of the doubt and are not quick to assume bias-driven malice. Given the environment within which they operate, the public can be sympathetic to the occasional fear police have for their lives. The fact that they have that reasonable fear is often the key to their defense. But what if the fear has its origins in biased beliefs? A common saying among American police that spans training to retirement instructs: “It is better to be tried by 12 than carried by 6,” which addresses the need to not be killed because you hesitate. But does it also relay a hidden expectation that juries will not convict?

Regardless of the outcome of past trials, every police shooting is subject to immediate review and may lead to a grand jury, an indictment, and possibly a conviction. As humans, we can face consequences if our biases are deemed inappropriate and potentially harmful. This warrants several questions about the consequences of machine bias:

What remedies do we have for disciplining AI that has been found guilty of biased enforcement?

Who should be held liable? In keeping with my earlier observations, should we place the blame on the programmers?

Can cops be held responsible for biased patrolling or stop and frisk practices if they are following AI recommendations that are later proven to be biased?

If facial recognition algorithms direct police to target a particular race, age, or gender and are later determined to be responsible for false arrest, who do we blame?

If the AI is to be considered a valuable witness to the process leading up to this event, how do we question it or subject it to cross examination?

If the algorithm is recognized as intellectual property, does this interfere with the courts’ ability to order reverse engineering in pursuit of the truth of what went wrong?

As an example, recall the challenges to the accuracy of the COMPAS predictive algorithm. The company that created the software contested the findings and refused to disclose details about the algorithm, claiming that revealing proprietary information of that nature would harm its ability to compete as a business. This indicates the kinds of challenges we face moving forward as our existing laws struggle to keep up with the unique nature of our ever-growing dependence on AI.

How Should We Address AI Biases in Law Enforcement?

As I conclude, I admit that I have done little to provide solutions for the issues I have raised. That remains the domain of engineers and sociologists with a far better understanding of the problems than I have. However, I leave you with some final recommendations from my personal perspective. The remedies for AI bias should mirror the consequences we would expect if we observed inappropriate and damaging biased behavior in a human. That said, at the very least we should consider the following:

We should approach the bias with a firm understanding of its nature, not to villainize it, but to properly manage, limit, and account for its inevitable presence.

Because we change with our environment and evolve, both individually and as a human society, our systems must have the same capacity to evolve.

Our AI and its governing algorithms must be constantly monitored, updated, and tested. I am encouraged that tech giants like Apple, Facebook, and Google have formed a partnership that encourages research on AI ethics and the potential threat of bias. There are also ongoing efforts to develop algorithms designed specifically for detecting the presence of bias in other AI programs.

The data must have the benefit of a myriad of perspectives representing the population it’s designed to serve regarding race, ethnicity, gender, age, etc. Garbage in—garbage out.

It’s important that those perspectives are not limited to the programming phase. It is now recognized that some past difficulty in eliminating bias from programs stemmed from a flawed testing phase, in which members of the programming team were simply reassigned to oversee testing. If the programming team was deficient in representation, then the same lack of awareness that shaped the programming would limit its ability to detect bias during testing as well.

This will not be an easy undertaking, but it is crucial for the protection of our human communities that are impacted more and more each day by the decisions informed by AI algorithms.