Bias in Machine Learning

As the healthcare industry’s ability to collect digital data increases, a new wave of machine learning (ML), artificial intelligence (AI) and deep learning technologies is offering the promise of helping improve patient outcomes. As exciting as these new ML and AI capabilities in healthcare are, there are significant considerations to keep in mind when planning, implementing and deploying machine learning in healthcare. One of the biggest challenges our industry faces, and in fact any industry considering ML, is bias in machine learning.

Is Bias in Healthcare New?

Before diving into the specifics of bias in machine learning, let us consider the tried and true process of observational studies. Bias in clinical studies is a well-researched and well-known challenge, and these biases stem from a wide range of factors.

“Factors that may bias the results of observational studies can be broadly categorized as: selection bias resulting from the way study subjects are recruited or from differing rates of study participation depending on the subjects’ cultural background, age, or socioeconomic status, information bias, measurement error, confounders, and further factors.”
- Avoiding Bias in Observational Studies 

When planning a new clinical study, defining and understanding the potential biases that may impact the results is a critical requirement for a successful outcome.

What is Bias in Machine Learning?

The same bias traps found in observational studies can lead to similar issues when developing new ML solutions. In essence: human bias in, human bias out. But deep learning bias also presents unique challenges that need to be understood in order to properly review results and prevent biased data from unexpectedly impacting your patient outcomes.

There are many different kinds of machine learning bias; some are inherent in all deep learning models, while others are specific to the healthcare industry. When looking at the types of bias in machine learning, it’s important to understand that bias can enter at many different stages of the process.

Bias can be introduced when collecting the data used to build the models. It can appear when testing the outputs of the models to verify their validity. It can even creep in when interpreting valid or invalid results from an approved model. Nearly all of the common types of machine learning bias come from our own cognitive biases.

What Are Common Bias Types?

Anchoring bias occurs when choices on metrics and data are based on personal experience or a preference for a specific set of data. By “anchoring” to this preference, models are built on the preferred set, which could be incomplete or even contain incorrect data, leading to invalid results. Because this is the “preferred” standard, invalid or contradictory outcomes can be hard to recognize and discover.

Availability bias, similar to anchoring, occurs when the data set contains information based on what the modeler is most aware of. For example, if the facility collecting the data specializes in a particular demographic or comorbidity, the data set will be heavily weighted towards that information. If this set is then applied elsewhere, the generated model may recommend incorrect procedures or ignore possible outcomes because of the limited availability of the original data source.
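
As a minimal sketch of how to spot this kind of skew, the check below compares the demographic mix of a single-facility training set against the population where the model will actually be used. The file name, the age_group column and the deployment proportions are all hypothetical, for illustration only.

```python
import pandas as pd

# Hypothetical single-facility training data; "age_group" is an assumed column name.
train = pd.read_csv("facility_encounters.csv")

# Assumed demographic mix of the population where the model will actually be deployed.
deployment_mix = pd.Series({"18-39": 0.35, "40-64": 0.40, "65+": 0.25})

# Proportion of each group actually present in the training data.
train_mix = train["age_group"].value_counts(normalize=True)

# Large positive gaps flag groups the model will rarely (or never) have seen during training.
comparison = pd.DataFrame({"training": train_mix, "deployment": deployment_mix}).fillna(0.0)
comparison["gap"] = comparison["deployment"] - comparison["training"]
print(comparison.sort_values("gap", ascending=False))
```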

Confirmation bias is the tendency to choose source data or model results that align with currently held beliefs or hypotheses. The generated results and output of the model can in turn strengthen the confirmation bias of the end user, leading to bad outcomes.

Stability bias is driven by the belief that large changes typically do not occur, so non-conforming results are ignored, thrown out or re-modeled to conform back to the expected behavior. Even if we are feeding our models good data, the results may not align with our beliefs, and it can be easy to dismiss the real results.

What Are the Current Bias Trends?

In the 2022 research paper “Preventing Bias and Inequities in AI-Enabled Health Tools”, the Duke-Margolis team determined that the healthcare industry continues to face bias and that “...all stakeholders must commit to ensuring that current and past inequities and biases do not become more ingrained.” Similar to the concerns in a previous study, “Potential Biases in Machine Learning Algorithms Using Electronic Health Record Data”, the Duke-Margolis researchers point out four main areas of concern with healthcare data in ML, AI and deep learning: Inequitable Framing of the Health Care Challenge or the Users’ Next Steps, Unrepresentative Data, Biased Training Data, and Choices in Data Selection, Curation, Preparation, and Model Development.

How Does Reframing Prevent Bias?

When considering how biased data can impact patient outcomes, it feels natural to home in on the inputs that can lead ML and AI systems to go wrong. This is an important area to consider, but the Duke-Margolis research found that we also need to step back, look at the entire ecosystem around the AI solutions, and consider whether developers are asking the right questions for the issues they are trying to solve.

In the paper’s use case, the researchers looked at predictive appointment booking systems that tried to optimize scheduling to limit provider downtime. A common approach was to double-book an appointment slot if the system determined one of the patients was likely not to show up. Unfortunately, the system was trained using data that had industry bias built into it. The people most likely to cancel appointments “... are often Black, Latino, and American Indian/Alaskan Native patients, who disproportionately experience systemic barriers to accessing care such as lack of reliable transportation, limited access to paid sick leave or affordable health insurance”.

By using this data to build the ML predictive models, the systems could sometimes predict no-shows correctly, but when both patients showed up, it impacted patients and staff by extending wait times and increasing staff workload. A better approach for this system would be to reframe the goal: instead of trying to predict no-shows and double-book appointments, first understanding the root causes of why patients miss appointments leads to a completely different solution.

Understanding that the industry has “systemic barriers”, the University of California, San Francisco reframed the solution by trying to prevent cancellations before they can occur. They were able to build ML and AI models that looked for opportunities to move patients off the standard self-travel, in-practice appointment to other options, such as telehealth or assisted travel solutions. By understanding the entire spectrum of bias in the healthcare industry, they were able to create a more holistic approach that prevents known bias from entering the solution from the very start of the project.
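
To make the reframing concrete, here is an illustrative sketch, not UCSF’s actual system: the risk threshold, field names and intervention options are assumptions. The same no-show risk score that could have driven double-booking instead triggers supportive options for the patient.

```python
from dataclasses import dataclass

@dataclass
class Appointment:
    patient_id: str
    no_show_risk: float        # score from an upstream risk model (assumed to exist)
    telehealth_eligible: bool
    has_transportation: bool

def plan_intervention(appt: Appointment, risk_threshold: float = 0.6) -> str:
    """Suggest a supportive action for high-risk appointments instead of double-booking the slot."""
    if appt.no_show_risk < risk_threshold:
        return "send standard reminder"
    if appt.telehealth_eligible:
        return "offer telehealth visit"
    if not appt.has_transportation:
        return "offer assisted transportation"
    return "schedule personal outreach call"

# Example: a high-risk patient without reliable transportation gets help getting to the visit.
print(plan_intervention(Appointment("p001", 0.72, telehealth_eligible=False, has_transportation=False)))
```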

What Happens When Data is Missing?

When collecting data to build a training set for machine learning solutions, it is important to understand the breadth and depth of the data available to you. This includes considering how the data aligns with the target patient cohort the ML models and AI tools will be used with.

Many data sets are limited along a multitude of dimensions. Examples include regional limitations, such as data collected from a single medical center; data built from a narrow socioeconomic level (SEL); or data containing systemic bias, e.g. studies which limit or completely exclude a specific gender.

In their research, the Duke-Margolis team calls out that “...there are geographic biases to much of the data used to train AI. One study showed that the majority of papers describing AI tools in health relied on data from just 3 states (California, New York, and Massachusetts) for training those tools.” If the tools being built are deployed in more rural or regionally varied populations, the data may not represent those populations in the same way, which can lead to unexpected outcomes from the biased training sets.

The population structure of the source data can also be weighted by who is included or excluded based on SEL. A lower SEL for a patient can mean a lack of access to healthcare, or visiting multiple providers across networks, causing gaps in the patient record. SEL can also impact “...data flowing from devices such as FitBits and biometric sensors. Such data are very rich, but they are sparse—you have them only for certain people.” When models are built upon this data, bias can arise because there are gaps in the data set, especially weighted away from lower-SEL patients.
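
A minimal sketch of quantifying those gaps before they silently shape a model; the file name, the sel_group column and the device-derived columns below are hypothetical, for illustration only.

```python
import pandas as pd

# Hypothetical patient-level table; column names are assumptions for illustration.
records = pd.read_csv("patient_records.csv")
device_cols = ["daily_steps", "resting_heart_rate", "spo2"]

# Fraction of missing device-derived values within each socioeconomic group.
missing_by_sel = records.groupby("sel_group")[device_cols].apply(lambda g: g.isna().mean())

# Consistently higher rates in lower-SEL groups signal data the model simply never sees for them.
print(missing_by_sel)
```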

Data quality and consistency among practitioners can also create biased machine learning models. Examples include quality of care varying by SEL, which can impact whether and how data is entered into the EHR, leading to gaps or inconsistencies. Existing biases in the medical field and/or among practitioners can also trickle down into the data. For example:

 “Women were less likely than men to receive optimal care at discharge. The observed sex disparity in mortality could potentially be reduced by providing equitable and optimal care.”

- Sex and Race/Ethnicity-Related Disparities in Care and Outcomes After Hospitalization for Coronary Artery Disease Among Older Adults.

How Does Data Become Biased?

Even when a team collecting data makes an extended effort to build an inclusive and representative set for their target population, bias can still bleed into the data models. The Duke-Margolis team referenced studies which found “...several tools that predicted Covid-19 deterioration used oxygen saturation sensor data from fingertip devices, which use laser-based sensors that have less accurate performance in individuals with darker skin or other attributes.” Even though the data had a broad representation of the population, the individual data records had a built-in bias created by the tools that captured the data.

When collecting and analyzing data for ML model generation, every aspect of the content and collection process needs to be considered. Were the tools and systems used to collect the data bias-free? If not, how does that bias impact the data? When the data was collected, did the team include or exclude sources based on biased methodology? Was data that was partial, or missing dimensions, accepted or rejected in ways that could impact systems downstream?
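
One practical check is to stratify model performance by a patient attribute and look for gaps, the kind of audit that can surface device-induced bias like the oximeter example above. A minimal sketch, assuming a held-out evaluation table with hypothetical y_true, y_score and skin_tone_group columns:

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical held-out evaluation set with model scores already attached.
eval_df = pd.read_csv("deterioration_eval.csv")  # assumed columns: y_true, y_score, skin_tone_group

# A large performance gap between groups suggests the inputs (e.g., SpO2 readings)
# are less reliable for some patients, not that their underlying risk differs.
for group, sub in eval_df.groupby("skin_tone_group"):
    auc = roc_auc_score(sub["y_true"], sub["y_score"])
    print(f"{group}: AUC = {auc:.3f} (n = {len(sub)})")
```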

Bias can be unconsciously introduced at many different levels. In today’s complex, multi-tiered systems, data is almost always combined with many other data sets, each with its own potential bias built in. As we combine and recombine data, bias can be carried along without our knowledge, and until we take time to deep-dive into all aspects of the data we will not understand how ML and AI systems inherit bias.

How Can We Fix Biased Machine Learning Models?

ML models can only find a pattern if the pattern is present in the data. If the data presented to the model does not contain enough information, or reflects only a specific time range, then out-of-bounds changes cannot be predicted or discovered. If the historical data doesn’t contain it, ML can’t detect it.

 “... if models trained at one institution are applied to data at another institution, inaccurate analyses and outputs may result”
- Potential Biases in Machine Learning Algorithms Using Electronic Health Record Data
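
External validation is one of the simpler guards against this. The sketch below, with hypothetical file names, feature columns and outcome label, trains a basic scikit-learn model on one institution’s data and checks whether its performance holds on another’s before trusting it there.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

FEATURES = ["age", "bmi", "systolic_bp"]   # assumed feature columns for illustration
TARGET = "readmitted_30d"                  # assumed outcome label

site_a = pd.read_csv("institution_a.csv")  # training institution (hypothetical)
site_b = pd.read_csv("institution_b.csv")  # deployment institution (hypothetical)

model = LogisticRegression(max_iter=1000).fit(site_a[FEATURES], site_a[TARGET])

# A sharp drop from internal to external AUC warns that the model learned patterns
# specific to institution A rather than transferable clinical signal.
print("Internal AUC:", roc_auc_score(site_a[TARGET], model.predict_proba(site_a[FEATURES])[:, 1]))
print("External AUC:", roc_auc_score(site_b[TARGET], model.predict_proba(site_b[FEATURES])[:, 1]))
```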

Similar to observational studies, how the deep learning and machine learning models are planned, developed, tested, analyzed, and deployed determines whether the bias inherent in all systems can be removed. At ForeSee Medical, we have a dedicated team of clinicians, medical NLP linguists and machine learning experts focused on understanding, tracking and mitigating bias within our HCC risk adjustment coding data models. We have developed rigorous testing standards to continually improve and review our results against both gold standards and blind tests to verify accuracy, precision and recall.

It is critical that business owners understand their space and invest time in understanding the underlying medical algorithms that drive ML. With a deep understanding of the space and the potential for bias, bias can be addressed before the models are built and reviewed, so that the results best reflect the outcomes the ML systems are meant to drive.

 
 

Are you looking to take advantage of the latest precision machine learning technology? See how ForeSee Medical can empower you with accurate, unbiased Medicare risk adjustment coding support and integrate it seamlessly with your EHR.

 

by James Polanco, Director of Software Development