What is the difference between statistics and machine learning?
There is a subtle difference between statistical learning models and machine learning models.
Statistical learning involves forming a hypothesis before we proceed with building a model. The hypothesis could involve making certain assumptions which we validate after building the models.
For example, consider Linear Regression (LR), which is a statistical model. When building an LR model, we make a set of three assumptions:
- The residuals are normally distributed around a mean of zero.
- The attributes (predictors) in the dataset are independent of one another, i.e. there is no multicollinearity.
- The residuals have constant variance (homoscedasticity).
With these assumptions in place, we define a cost function (typically the mean squared error) and minimize it with a method such as gradient descent to arrive at an LR model. We then diagnose the fitted model to check whether the data actually satisfy the assumptions. If they do not, we reject the initial hypothesis and start over.
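The fit-then-diagnose workflow described above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the data are synthetic (generated as y = 3x + 2 plus Gaussian noise), there is a single predictor, and the helper name `fit_line` and the three rough diagnostics are hypothetical choices made for this example, not a standard diagnostic procedure.

```python
import random
import statistics


def fit_line(xs, ys, learning_rate=0.01, epochs=5000):
    """Fit y = w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Synthetic data (assumed for illustration): y = 3x + 2 plus Gaussian noise.
random.seed(0)
xs = [i / 10 for i in range(100)]
ys = [3.0 * x + 2.0 + random.gauss(0.0, 1.0) for x in xs]

w, b = fit_line(xs, ys)
residuals = [y - (w * x + b) for x, y in zip(xs, ys)]

# Diagnostic 1: residuals should be centred on zero.
mean_residual = statistics.mean(residuals)

# Diagnostic 2 (rough homoscedasticity check): the residual spread should be
# similar for small and large x. xs is already sorted, so split it in half.
half = len(residuals) // 2
spread_ratio = statistics.stdev(residuals[:half]) / statistics.stdev(residuals[half:])

# Diagnostic 3 (rough normality check): about 68% of normally distributed
# residuals fall within one standard deviation of their mean.
sd = statistics.stdev(residuals)
share_within_one_sd = sum(abs(r - mean_residual) <= sd for r in residuals) / len(residuals)

print(f"w = {w:.3f}, b = {b:.3f}")
print(f"mean residual = {mean_residual:.2e}")
print(f"spread ratio (low x / high x) = {spread_ratio:.2f}")
print(f"share of residuals within one sd = {share_within_one_sd:.2f}")
```

If the spread ratio were far from 1, or the share within one standard deviation far from 0.68, that would be a hint that the homoscedasticity or normality assumption does not hold and that the model should be revisited. In practice, formal tests and residual plots would replace these rough checks.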
So, our initial hypothesis certainly plays an important role in the case of statistical learning models.
In machine learning (ML), by contrast, we run the algorithm directly on the data, letting the data speak for itself instead of steering it in a particular direction with our initial hypotheses or assumptions.
For example, when building a decision tree or random forest, we state no hypothesis up front and simply run the algorithm. The algorithm reports which features matter and how important each one is. Because we set up no hypotheses that could shape the final model, the model learns from the data without any user-imposed conditions.
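To make the idea of a tree reporting feature importance concrete, here is a small from-scratch sketch of a single regression tree. It is an illustration under assumptions, not any library's implementation: the data are synthetic (the target depends strongly on feature 0, weakly on feature 1, and not at all on feature 2), and the helpers `sse`, `best_split`, and `grow` are hypothetical names. Importance is measured here as the total reduction in squared error credited to each feature, normalized to sum to 1.

```python
import random


def sse(ys):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(rows, ys):
    """Return (gain, feature, threshold) for the split that most reduces SSE."""
    parent = sse(ys)
    best = (0.0, None, None)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, ys) if r[f] <= t]
            right = [y for r, y in zip(rows, ys) if r[f] > t]
            if not left or not right:
                continue
            gain = parent - sse(left) - sse(right)
            if gain > best[0]:
                best = (gain, f, t)
    return best


def grow(rows, ys, depth, importance):
    """Recursively split the data, crediting each split's gain to its feature."""
    if depth == 0 or len(ys) < 4:
        return
    gain, f, t = best_split(rows, ys)
    if f is None:
        return
    importance[f] += gain
    left = [(r, y) for r, y in zip(rows, ys) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, ys) if r[f] > t]
    for part in (left, right):
        grow([r for r, _ in part], [y for _, y in part], depth - 1, importance)


# Synthetic data (assumed for illustration): feature 0 matters a lot,
# feature 1 a little, and feature 2 is pure noise.
random.seed(1)
rows = [[random.random() for _ in range(3)] for _ in range(200)]
ys = [5.0 * r[0] + 1.0 * r[1] + random.gauss(0.0, 0.1) for r in rows]

importance = [0.0, 0.0, 0.0]
grow(rows, ys, depth=3, importance=importance)

# Normalize so the importances sum to 1.
total = sum(importance)
importance = [g / total for g in importance]

for f, share in enumerate(importance):
    print(f"feature {f}: importance = {share:.3f}")
```

Note that nothing about the relationship between the features and the target was specified in advance; the ranking of the features comes entirely from the splits the algorithm chose. A random forest repeats this over many trees grown on resampled data and averages the importances.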