For years, scientists have struggled to explain the decisions made by neural networks, and tech giants are now releasing software tools to help. One example is the TCAV technique announced last week at the Google developers conference.
A deep neural network has many layers, and nodes in those layers are activated whenever the network is used to make a decision. To recognize objects in images, for example, an image is fed into the first layer, triggering computations at that layer's nodes. The outputs of those computations, called activations by analogy with neurons in the human brain that "fire", are passed on to the nodes in the next layer. The problem is that the activations in a given layer have no human-understandable meaning, so it is not possible to explain how the network reaches its decision.
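As a minimal sketch of what "activations" are (a toy two-layer network with made-up weights, not TCAV itself), each layer's output is a vector of numbers that feeds the next layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network with made-up weights (illustration only).
W1 = rng.normal(size=(4, 3))   # input (4 features) -> first layer (3 nodes)
W2 = rng.normal(size=(3, 2))   # first layer -> output layer (2 classes)

def relu(z):
    return np.maximum(z, 0.0)  # a node "fires" when its input is positive

x = rng.normal(size=4)         # stand-in for a flattened input image

a1 = relu(x @ W1)              # activations of the first layer
a2 = a1 @ W2                   # activations of the output layer (class scores)

print(a1.shape, a2.shape)
```

The numbers in `a1` are exactly the kind of activations TCAV tries to interpret: they determine the final scores in `a2`, but individually they mean nothing to a human.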
The TCAV technique announced last week aims to give human-understandable meaning to those activations. A human first chooses some human-friendly concepts related to the task the network was trained for. For example, if the network was trained to recognize zebras in images of animals, a human might choose concepts such as "stripes" and "horse-like", because those concepts are probably important for recognizing zebras.
TCAV then collects the activations at a chosen layer produced by input examples that illustrate a human-friendly concept (stripes, horse-like) and by random examples. It defines a "concept activation vector" (CAV) as the normal to a hyperplane that separates, in the network's activation space, the examples with the concept from the examples without it. The CAVs are then used to compute how sensitive the network's predictions are to each concept at any layer of the model.
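The steps above can be sketched in a few lines. This is a simplified stand-in, not the real tcav library: the "activations" are synthetic, the linear separator is a hand-rolled logistic regression, and a fixed linear head plays the role of the network's prediction gradient:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8  # dimensionality of the layer's activation vectors

# Synthetic activations: concept examples are shifted along a hidden direction.
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)
concept_acts = rng.normal(size=(100, d)) + 2.0 * true_dir  # e.g. "stripes" images
random_acts = rng.normal(size=(100, d))                    # random images

# Train a linear separator (logistic regression via gradient descent);
# its weight vector is normal to the separating hyperplane.
X = np.vstack([concept_acts, random_acts])
y = np.concatenate([np.ones(100), np.zeros(100)])
w, b = np.zeros(d), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

cav = w / np.linalg.norm(w)  # the concept activation vector

# Sensitivity of a prediction to the concept: the directional derivative of
# the class score along the CAV. With a linear head h(a) = v @ a (a stand-in
# for the real network's gradient), that derivative is simply v @ cav.
v = rng.normal(size=d)
sensitivity = float(v @ cav)
print(sensitivity)
```

In the real method the gradient of the class score with respect to the layer's activations replaces `v`, and the fraction of class examples with positive sensitivity gives the TCAV score for the concept.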
If the human-friendly concepts include things like "female" or "disabled", then it is possible to measure how sensitive the network's predictions are to those concepts. Ideally, the predictions are insensitive to concepts where society values fairness.
TCAV is one of a variety of useful tools for assessing the fairness of predictions made by deep neural networks. But humans still need to decide which concepts to use when assessing fairness. Human rights law can help us choose those concepts, but translating human rights law into software is a big challenge.
Google describes TCAV, or testing with concept activation vectors, as a research approach for addressing bias in machine learning and making models more interpretable. For example, TCAV could reveal whether a model trained to detect images of "doctors" mistakenly learned that being male was an important characteristic of being a doctor, simply because there were more images of male doctors in the training data.