By Swathi Young

What is data bias in Artificial Intelligence and why should we care?

Updated: Feb 19, 2019

Machine learning and Artificial Intelligence have moved from being buzzwords to practical applications. Google and Amazon have brought AI-powered devices such as Google Home and Alexa into our living rooms. As AI and machine learning are used more and more widely, the question of biased data becomes crucial.

To understand how biased data in machine learning can affect us, let us look at how an AI system works.

Machine learning is the most common way of building an AI program today. Fundamentally, it teaches machines to learn from vast quantities of data, known as the “training dataset”. The output of a machine learning algorithm depends on this training dataset. If the training dataset is wrong, the error propagates to the solution as well.

Take the example of a training dataset made up of labeled dog photos, where the algorithm is trying to identify dogs in arbitrary pictures. Now suppose a wrong image slips into the training dataset by mistake: a picture of a wolf labeled as a dog. Whenever the algorithm comes across a picture of a wolf, or of a dog that resembles one, it may get the answer wrong, for instance by identifying the wolf as a dog. The old adage of "garbage in, garbage out" still holds true.
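The effect of a single mislabeled image can be sketched with a toy classifier. This is only an illustration under stated assumptions: the two-number "features", the labels, and the `nearest_label` helper are all made up for the example, and a 1-nearest-neighbor rule stands in for whatever algorithm a real system would use.

```python
# Toy 1-nearest-neighbor classifier on made-up 2D features.
# All feature values and labels are illustrative assumptions, not real data.

def nearest_label(train, point):
    """Return the label of the training example closest to `point`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda example: sq_dist(example[0], point))
    return label

clean_train = [
    ((1.0, 1.0), "dog"),
    ((1.2, 0.9), "dog"),
    ((5.0, 5.0), "wolf"),
    ((5.2, 4.8), "wolf"),
]

# Same data, plus one wolf-like example that was labeled "dog" by mistake.
noisy_train = clean_train + [((4.9, 5.1), "dog")]

query = (4.9, 5.08)  # a wolf-like picture the model has not seen

print(nearest_label(clean_train, query))  # trained on clean labels
print(nearest_label(noisy_train, query))  # trained with the mislabeled example
```

With the clean data the wolf-like picture is matched to a wolf; with the mislabeled example included, the closest training point is the wrongly labeled one, so the same picture comes back as a dog.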

Hence, it is very important that datasets are unbiased and not manipulated if the AI program is to produce truly objective outcomes. This is easier said than done, because data can be biased intentionally or unintentionally. Moreover, the creators of the algorithm are human, and as humans we tend to have inherent perceptions and biases.

But first, what is biased data?

Cognitive biases are systematic patterns of deviation from rational judgment. We fall back on them as mental shortcuts, to cope with information overload, or even to gain social acceptance. These biases can sometimes lead to inaccurate results. There are 188 known biases, as represented in the diagram below:

A training dataset will most probably include these biases, since the first step of creating and labelling the data is manual.

If such a dataset is used to train an AI system, the resulting actions or decisions will carry these biases and may produce results that are not beneficial.

Besides the above, there are other types of bias, such as sample bias. This bias occurs when the training data does not cover the actual conditions under which the model will run. For example, if we want autonomous vehicles to operate both day and night, providing only data collected during the day would not be enough.
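One simple way to look for this kind of sample bias is to compare the conditions present in the training data with the conditions the model is expected to handle. The sketch below assumes each training image carries a `time_of_day` tag; that field, the sample records, and both helper functions are hypothetical and only meant to show the idea.

```python
from collections import Counter

def condition_shares(samples, key):
    """Fraction of samples observed under each condition."""
    counts = Counter(sample[key] for sample in samples)
    total = sum(counts.values())
    return {condition: n / total for condition, n in counts.items()}

def missing_conditions(samples, expected_conditions, key):
    """Conditions the model must handle that never appear in the samples."""
    seen = {sample[key] for sample in samples}
    return sorted(set(expected_conditions) - seen)

# Hypothetical training set: every image was captured during the day.
training_images = [
    {"id": 1, "time_of_day": "day"},
    {"id": 2, "time_of_day": "day"},
    {"id": 3, "time_of_day": "day"},
]

# Conditions we assume the vehicle will meet once deployed.
expected = ["day", "night"]

print(condition_shares(training_images, "time_of_day"))
print(missing_conditions(training_images, expected, "time_of_day"))
```

A check like this only catches conditions someone thought to record as metadata; it cannot reveal gaps in dimensions that were never tagged.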

From the above examples, it is very clear that biased data will affect the functioning of your AI system. Here are some of these effects in detail.

1. Lack of transparency in the decision-making process

Machine learning systems are the result of the algorithms that are designed to govern them. These algorithms are then trained on data to behave in a certain manner and produce the desired results. This means that if the training data was biased, the end user would never know about it. Take the example of Amazon's recruitment AI: a candidate would never know that the AI was biased towards male applicants. Even the HR managers might not be aware of the issue until a pattern is recognized.

2. Biased outcomes

AI systems are generally used to automate manual tasks. An AI application is assumed to be efficient and objective, and to give the desired results. A biased dataset can undermine that objectivity, resulting in flawed, less than desirable results and biased outcomes. There are many examples of this issue: facial recognition software that performs worse on non-white people, speech recognition software that doesn’t recognize women’s voices as well as men’s, and, even more worrying, claims of discrimination in the AI used by credit agencies and parole boards.
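Disparities like these often stay hidden behind a single overall accuracy figure, so one common first step is to break the metric down by group. The following is a minimal sketch with invented records; the group names "A" and "B", the predictions, and the `accuracy_by_group` helper are assumptions for illustration, not data from any real system.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Compute prediction accuracy separately for each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        total[record["group"]] += 1
        if record["predicted"] == record["actual"]:
            correct[record["group"]] += 1
    return {group: correct[group] / total[group] for group in total}

# Invented evaluation records for two hypothetical groups.
records = [
    {"group": "A", "predicted": 1, "actual": 1},
    {"group": "A", "predicted": 0, "actual": 0},
    {"group": "A", "predicted": 1, "actual": 1},
    {"group": "A", "predicted": 0, "actual": 0},
    {"group": "B", "predicted": 1, "actual": 0},
    {"group": "B", "predicted": 0, "actual": 0},
    {"group": "B", "predicted": 0, "actual": 1},
    {"group": "B", "predicted": 1, "actual": 1},
]

scores = accuracy_by_group(records)
gap = max(scores.values()) - min(scores.values())

print(scores)  # accuracy per group
print(gap)     # difference between best and worst served group
```

Here the overall accuracy is 75%, which looks acceptable, while the per-group view shows that group B is served far worse than group A.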

3. Insensitive content

Artificial Intelligence is widely used in chatbots for commercial purposes. These chatbots are developed to engage with customers or the public at large. They are built to learn and adapt once you start engaging with them, using your responses as a dataset to learn from.

The most famous example of a chatbot gone wrong is Microsoft's Tay. This interactive chatbot on Twitter was an experiment in “conversational understanding”. When people started tweeting misogynistic remarks at it, it repeated such remarks back. It went from a friendly bot to writing hate messages within 24 hours of its launch, showing that if we are not careful when developing these bots, we can create rogue bots.
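The underlying risk, a bot that learns from whatever users send it, can be sketched in a few lines. This is not how Tay or any real chatbot was built; the `LearningBot` class, the placeholder blocklist, and the keyword filter are all hypothetical, and a keyword list is a deliberately weak stand-in for real content moderation.

```python
class LearningBot:
    """Toy bot that stores user messages as future training material."""

    def __init__(self, is_acceptable):
        # `is_acceptable` decides whether a message may be learned from.
        self.is_acceptable = is_acceptable
        self.learned = []

    def hear(self, message):
        """Learn from the message if it passes the filter; report the result."""
        if self.is_acceptable(message):
            self.learned.append(message)
            return True
        return False

# Placeholder blocklist; a real system would need far more than keywords.
BLOCKED_TERMS = {"badword"}

def simple_filter(message):
    words = set(message.lower().split())
    return not (words & BLOCKED_TERMS)

unfiltered = LearningBot(lambda message: True)  # learns from everything
filtered = LearningBot(simple_filter)           # screens input first

for msg in ["hello there", "you are a badword"]:
    unfiltered.hear(msg)
    filtered.hear(msg)

print(unfiltered.learned)
print(filtered.learned)
```

The unfiltered bot absorbs both messages, while the filtered one keeps only the harmless greeting. The point is the placement of the check: screening happens before anything enters the data the bot learns from.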

4. Threat to business

AI is all about data, and a lot of this data is personal. If an AI malfunctions or starts making harmful decisions, every party connected to it, including the business that employs it, is put at risk. For-profit organizations may alienate their customer base if they do not create inclusive products. Take the case of credit checks: if a credit-checking financial institution does not provide transparent results or feedback, the customer will tend to go to a competitor. If an AI application discriminates against women, you miss out on a whole demographic that could be using your product. Hence, ethical design of AI applications can affect both the top and bottom line of organizations.

5. Impact on society

We saw from the Cambridge Analytica and Facebook fiasco that machine learning data can have a huge impact on society. Hyper-targeted advertising built on false statements has been reported to have had an effect on the 2016 election results in the US. Bots disguised as human accounts are used to spread misinformation or manufacture the illusion of public support. This issue has implications for society at large and can even become a threat to democracy.


In conclusion, the question of ethics in Artificial Intelligence applications is not to be taken lightly if we want to build AI systems that are beneficial to us. While AI has immense potential to build a better future, we know that biases can creep in unnoticed. Addressing this issue will require a focus on ethical design, transparency, and questioning our motivations while building these systems.

