Which common statistical pitfalls in machine learning are easy to fall into? You may already know some of them as trade-offs, general biases, or cognitive nuances.
Here is a reminder list worth repeating.
📣 Correlation is not Causation
👉 Resist the urge to explain findings about correlated variables as though one variable causes the other.
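A minimal sketch of how this can go wrong, using made-up synthetic data: a hidden third variable (here a hypothetical `temperature`) drives two others (`ice_cream` and `sunburn`, also hypothetical), so they correlate strongly even though neither causes the other. All names and numbers are assumptions for illustration only.

```python
import math
import random

random.seed(0)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Synthetic, hypothetical data: temperature drives BOTH other variables.
n = 5000
temperature = [random.gauss(0, 1) for _ in range(n)]
ice_cream = [t + random.gauss(0, 1) for t in temperature]
sunburn = [t + random.gauss(0, 1) for t in temperature]

# The two "effects" are clearly correlated with each other...
raw_corr = pearson(ice_cream, sunburn)

# ...but once the confounder's contribution is subtracted, little is left.
# (We built the data, so we know the contribution is exactly `t`;
# with real data it would have to be estimated.)
ice_cream_resid = [i - t for i, t in zip(ice_cream, temperature)]
sunburn_resid = [s - t for s, t in zip(sunburn, temperature)]
adjusted_corr = pearson(ice_cream_resid, sunburn_resid)

print(f"raw correlation:           {raw_corr:.2f}")
print(f"after removing confounder: {adjusted_corr:.2f}")
```

In real projects the confounder is often not even in the dataset, which is why the correlation alone cannot settle the causal question.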
📣 Beware of non-representative samples and non-representative training data
👉 Carefully examine whether bad data is giving you false comfort.
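A rough sketch of the false comfort a skewed sample can give, assuming a made-up population with two hypothetical groups, `"A"` and `"B"`. Collecting data from only one group produces an estimate that looks precise but is far from the truth.

```python
import random
import statistics

random.seed(1)

# Hypothetical population: 80% in group A (values near 50),
# 20% in group B (values near 80).
population = [("A", random.gauss(50, 5)) for _ in range(8000)]
population += [("B", random.gauss(80, 5)) for _ in range(2000)]

true_mean = statistics.fmean([value for _, value in population])

# Representative: every member is equally likely to be picked.
random_sample = random.sample(population, 500)
random_mean = statistics.fmean([value for _, value in random_sample])

# Non-representative: we only managed to collect data from group B.
group_b = [row for row in population if row[0] == "B"]
biased_sample = random.sample(group_b, 500)
biased_mean = statistics.fmean([value for _, value in biased_sample])

print(f"population mean:    {true_mean:.1f}")
print(f"random sample mean: {random_mean:.1f}")
print(f"biased sample mean: {biased_mean:.1f}")
```

Both samples have the same size, so sample size alone says nothing about whether the data is representative.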
📣 Oooof… Data Leakage!
👉 Ensure that every kind of data used to train the model WILL also be available at prediction time! (There are two types of data leakage: target leakage and train-test contamination.)
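One way train-test contamination sneaks in is preprocessing. The sketch below assumes a made-up feature that drifts upward over time and hypothetical helpers `fit_scaler` and `transform`. It compares scaling statistics computed on all rows (contaminated) with statistics computed on the training rows only (clean).

```python
import random
import statistics

random.seed(2)

# Hypothetical feature whose level drifts upward over time.
values = [random.gauss(10 + 0.01 * i, 1) for i in range(1000)]

# Split FIRST (chronologically here), before computing any statistics.
train, test = values[:800], values[800:]

def fit_scaler(data):
    """Return the (mean, standard deviation) used for standardization."""
    return statistics.fmean(data), statistics.pstdev(data)

def transform(data, mean, std):
    """Standardize data with previously fitted statistics."""
    return [(v - mean) / std for v in data]

# Contaminated: the statistics have already "seen" the test rows.
leaky_mean, leaky_std = fit_scaler(values)

# Clean: the statistics come from the training rows only.
clean_mean, clean_std = fit_scaler(train)

train_scaled = transform(train, clean_mean, clean_std)
test_scaled = transform(test, clean_mean, clean_std)

print(f"mean fitted on all rows:   {leaky_mean:.2f}")
print(f"mean fitted on train only: {clean_mean:.2f}")
```

The two fitted means differ because the first one quietly encodes information from the test period, which would not exist yet at prediction time. Target leakage, the other type, is a feature-selection question rather than a splitting question, so it is not shown here.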
📣 The best for last… Overfitting
👉 Be rigorous about examining the model's ability to predict not only the training data but also new observations it has never seen.
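A small sketch of that check on synthetic data, assuming a noisy linear relationship and two hypothetical models: `memorizer`, which simply returns the target of the nearest training point, and `line`, a least-squares straight line. The point is to compare the error on training data with the error on fresh data for each model.

```python
import random

random.seed(3)

def make_data(n):
    """Synthetic data: y = 2x plus noise (an assumption for illustration)."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + random.gauss(0, 2) for x in xs]
    return xs, ys

x_train, y_train = make_data(500)
x_test, y_test = make_data(500)  # fresh observations the models never saw

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def memorizer(x):
    """Overfit model: return the target of the nearest training point."""
    nearest = min(range(len(x_train)), key=lambda i: abs(x_train[i] - x))
    return y_train[nearest]

# Simple model: ordinary least-squares line fitted on the training data.
mean_x = sum(x_train) / len(x_train)
mean_y = sum(y_train) / len(y_train)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_train, y_train))
slope /= sum((x - mean_x) ** 2 for x in x_train)
intercept = mean_y - slope * mean_x

def line(x):
    return intercept + slope * x

memorizer_train = mse(memorizer, x_train, y_train)
memorizer_test = mse(memorizer, x_test, y_test)
line_train = mse(line, x_train, y_train)
line_test = mse(line, x_test, y_test)

print(f"memorizer: train MSE {memorizer_train:.2f}, test MSE {memorizer_test:.2f}")
print(f"line:      train MSE {line_train:.2f}, test MSE {line_test:.2f}")
```

The memorizer looks perfect on the training data because it has stored the noise along with the signal; the gap between its training and test error is the warning sign. The straight line has similar error on both sets.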
What other pitfalls should we keep in mind?
Check out what others have said about this subject on LinkedIn.