I recently read a research paper about the use of machine learning to predict premature death. The research shows that machine learning is effective at making highly accurate predictions, but it also highlights one of its limitations: it can't tell us about cause and effect.

The researchers, Weng et al. from the University of Nottingham, used data from the UK Biobank. This is a database of historical medical data from more than half a million people, of whom about 14,000 died, mainly from cancer, respiratory disease or heart disease.

Neural networks gave the best results at predicting premature death, closely followed by random decision forests. Both were more accurate than traditional non-machine-learning methods.

The neural network emphasised some factors (waist circumference and skin tone) that differed from those emphasised by the random decision forest and the traditional method. It is tempting to conclude that the factors the neural network found important are the ones that cause premature death. That conclusion is wrong. As the research article explains, the machine learning models "only give an indication of whether there may be a “signal” in the data and not the direction of association, and should thus be interpreted with caution. Further analysis using causal epidemiological study designs is recommended."

Having said that, the authors conclude that the "study shows the value of using ML, to explore a wide array of individual clinical, demographic, lifestyle and environmental risk factors, to produce a novel and holistic model that was not possible to achieve using standard approaches. This work suggests that use of ML should be more routinely considered when developing models for prognosis or diagnosis."

Perhaps a strength of machine learning will be its ability to find potential signals in medical (and other) data which can then be tested using causal study designs such as randomised clinical trials.