I am frequently asked whether the data used to train a neural network is safe. Suppose I have lots of photographs of faces and I use them to train a neural network to do face recognition. During training, the neural network stores some of the information from the training examples but throws much of it away. After training, the neural network can accurately recognise faces in images it has not seen before, because it generalises from what it learnt from the training examples. So it would seem logical to assume that, starting from the trained neural network on its own, it is not possible to get back the original training data.
But researchers have shown that, in some circumstances, it is possible to get the training examples back.
In their 2015 paper, "Model inversion attacks that exploit confidence information and basic countermeasures", Fredrikson et al. describe how to recover recognizable images of people's faces given only their names and access to the trained neural network. Fortunately, they also suggest ways engineers can make the technology more secure. The attack relies on the confidence values that the neural network outputs together with its predictions (e.g. the neural network predicts that a face is Rachel's with 67% confidence). It was the confidence values, taken together with the predictions, that enabled the images of people's faces to be recovered. Keeping the confidence values secret therefore makes the training data more secure.
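To make the idea concrete, here is a minimal sketch of this kind of attack on a toy model. It is not the method or the code from the paper. Everything in it is an assumption made up for illustration: the 6-pixel "faces" in TEMPLATES, the SCALE constant, and the query and invert functions. The attacker only ever calls query, which stands in for the trained network and returns one confidence value per person. By nudging each pixel slightly and watching how the confidence for the chosen person changes, the attacker climbs towards an image the model is very confident about. Rounding the confidence values is used here as one simple way of withholding the detail the attacker depends on.

```python
import numpy as np

# Hypothetical 6-pixel "faces", one row per person (illustrative data only).
TEMPLATES = np.array([
    [1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 1],
], dtype=float)
SCALE = 4.0  # assumed sharpness of the toy classifier


def query(x, decimals=None):
    """Return the toy model's confidence for each person, optionally rounded."""
    logits = SCALE * (TEMPLATES @ x)
    e = np.exp(logits - logits.max())  # numerically stable softmax
    p = e / e.sum()
    return p if decimals is None else np.round(p, decimals)


def invert(target, decimals=None, steps=100, lr=0.5, eps=1e-4):
    """Climb the target's confidence using nothing but calls to query()."""
    x = np.full(TEMPLATES.shape[1], 0.5)  # start from a uniform grey image
    for _ in range(steps):
        base = query(x, decimals)[target]
        grad = np.zeros_like(x)
        for i in range(x.size):
            probe = x.copy()
            probe[i] += eps  # nudge one pixel and re-query the model
            grad[i] = (query(probe, decimals)[target] - base) / eps
        x = np.clip(x + lr * grad, 0.0, 1.0)  # keep pixels in [0, 1]
    return x


recovered = invert(target=0)            # attacker sees full-precision confidences
blocked = invert(target=0, decimals=1)  # attacker sees confidences rounded to 0.1
print(np.round(recovered, 2))
print(blocked)
```

In this toy, the image recovered from full-precision confidences, thresholded at 0.5, matches the first row of TEMPLATES, while the attack against rounded confidences never leaves its grey starting image because every nudge produces the same rounded value. That result is specific to this sketch: a real attacker could try larger nudges, so coarse rounding should be read as reducing the leak rather than eliminating it.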