Any decision-making process is limited by the data available to the decision maker. AI is no different: a system is only as good as the data we train it with. Where data sets are biased, either because they lack diversity (e.g. the data only relates to white males) or because the data encodes biased preferences (e.g. the only examples of success are white males), the algorithm learns the bias and delivers a biased outcome.
Further, many algorithms ignore or reduce the impact of outlier data. However, the outliers may represent minority groups in the data, and ignoring them makes the data less diverse.
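To make the outlier point concrete, here is a minimal sketch in Python. It assumes a simple z-score filter as the outlier rule, which is only one of many possible rules, and the data, the group labels "A" and "B", and the function names are all hypothetical, made up for illustration.

```python
from statistics import mean, stdev


def drop_outliers(records, key, z_max=2.0):
    """Keep records whose value lies within z_max standard deviations of the mean."""
    values = [r[key] for r in records]
    mu, sigma = mean(values), stdev(values)
    return [r for r in records if abs(r[key] - mu) <= z_max * sigma]


def group_counts(records, group_key):
    """Count how many records belong to each group."""
    counts = {}
    for r in records:
        counts[r[group_key]] = counts.get(r[group_key], 0) + 1
    return counts


# Hypothetical data: 18 records from a majority group "A" with scores of 48-52,
# and 2 records from a minority group "B" with much higher scores.
records = [{"group": "A", "score": 48 + (i % 5)} for i in range(18)]
records += [{"group": "B", "score": 90}, {"group": "B", "score": 92}]

before = group_counts(records, "group")
after = group_counts(drop_outliers(records, "score"), "group")

print(before)  # {'A': 18, 'B': 2}
print(after)   # {'A': 18}
```

In this toy data set the filter removes every record from group "B" and none from group "A", so a step meant to clean the data also removes its only diversity. Comparing group counts before and after any cleaning step is a cheap way to notice this.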
Interrogating data sets to ensure they are clear of inherent bias is a big step towards achieving a neutral outcome, and many cloud providers, such as IBM, are developing tools to search for bias.
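As a rough illustration of what interrogating a data set can mean in practice, the sketch below checks two things for each group: how much of the data it makes up, and how often its records carry the "success" label. It is not how any vendor's tool works; the function name, field names, and data are hypothetical.

```python
from collections import defaultdict


def audit_groups(records, group_key, label_key):
    """Return each group's share of the data and its rate of positive labels."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for r in records:
        totals[r[group_key]] += 1
        if r[label_key]:
            positives[r[group_key]] += 1
    n = len(records)
    return {
        g: {"share": totals[g] / n, "positive_rate": positives[g] / totals[g]}
        for g in totals
    }


# Hypothetical hiring data: group "A" is 80% of the records and holds
# every example of success; group "B" has no successful examples at all.
data = [{"group": "A", "hired": True} for _ in range(6)]
data += [{"group": "A", "hired": False} for _ in range(2)]
data += [{"group": "B", "hired": False} for _ in range(2)]

report = audit_groups(data, "group", "hired")
for group, stats in report.items():
    print(group, stats)
```

A low share points to a lack of diversity in the data, while a large gap in positive rates between groups points to biased preferences in the examples. Neither number proves unfairness on its own, but both flag where a closer look is needed before training.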
But is this enough? What if the algorithm itself is biased?
The challenge for many AI systems is that they are a "black box", with only those who developed the algorithm understanding how they work. How would we know whether the unconscious biases of the developers have been (unintentionally) built into the systems they design? Calls for transparency in machine learning are increasing, but research in this area is still in the early stages.
Until we have transparent models, in my view it is doubly important that we encourage diversity in software development teams. In addition, where decisions relate to people and not objects, we should use AI to supplement (non-biased) human decision making, not replace it.
Inherent bias in the use of algorithms is a common problem in the technology industry. Algorithms are not told to be biased, but they can become unfair through the data they use.