This article contains affiliate links. For more, please read the T&Cs.
Importance of Model Evaluation
Being able to correctly measure the performance of a machine learning model is a critical skill for every machine learning practitioner. In order to assess the performance of the model, we use evaluation metrics.
Depending on the type of problem we want to solve, we can use classification (where a categorical variable is predicted) or regression (where a real number is predicted). Luckily, the scikit-learn library allows us to build regression models easily, without having to deal with the underlying mathematical theory.
In this article, we will demonstrate how to perform linear regression on a given dataset and evaluate its performance using:
- Mean absolute error
- Mean squared error
- R2 score (the coefficient of determination)
Regression Metrics
Regression metrics are different from classification metrics because we are predicting a continuous quantity rather than a discrete class. Furthermore, regression typically has simpler evaluation needs than classification.
The fundamental metrics used to assess regression models are presented below.
Mean absolute error
Mean absolute error (MAE) is one of the most common metrics used to measure the prediction error of a model. The prediction error of a single row of data is:
$$e_i = y_i - \hat{y}_i$$
where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value for row $i$. We need to calculate the prediction error for each row of data, take its absolute value, and then find the mean of all the absolute prediction errors.
MAE is given by the following formula:
$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
where $\hat{y}_i$ represents the predicted value of $y_i$ and $n$ is the number of rows.
The plot above shows the residuals: the differences between the predicted values (the regression line) and the actual output values. Because MAE uses the absolute value of the residuals, it cannot indicate whether the model is under-predicting or over-predicting. Each residual contributes linearly to the total error, since we simply sum the absolute residuals. A small MAE suggests that the model predicts well, while a large MAE suggests that the model may have trouble generalizing. An MAE of 0 means that the model outputs perfect predictions, but this is unlikely to happen in real scenarios.
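The procedure described above (compute each prediction error, take its absolute value, then average) can be sketched in a few lines of NumPy and checked against scikit-learn's mean_absolute_error. The arrays y_true and y_pred are made-up illustrative values, not data from this article.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Illustrative values (hypothetical, not from the Boston dataset)
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Prediction error for each row: e_i = y_i - y_hat_i
errors = y_true - y_pred
# MAE: mean of the absolute prediction errors
mae_manual = np.mean(np.abs(errors))

print(mae_manual)                           # 0.5
print(mean_absolute_error(y_true, y_pred))  # 0.5
```

The residuals here are 0.5, -0.5, 0 and -1, so their absolute values sum to 2 and the mean over 4 rows is 0.5.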
Mean squared error
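Mean squared error (MSE) is the mean of the squared differences between the target and predicted values. It is widely used for regression problems, and because the differences are squared, larger errors contribute disproportionately to the total.
MSE is given by the following formula:
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$
where $\hat{y}_i$ represents the predicted value of $y_i$ and $n$ is the number of rows.
MSE is not directly comparable with MAE because it is expressed in squared units of the target. In MAE each residual contributes linearly to the total error, while in MSE its contribution grows quadratically: residuals larger than 1 are amplified and residuals smaller than 1 are shrunk. As a result, MSE is not always bigger than MAE, although its square root (the root mean squared error, RMSE) is never smaller than MAE. Because it strongly penalizes large residuals, MSE is useful when large errors are especially costly, but this also makes it sensitive to outliers.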
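Since MSE and MAE are in different units, which one comes out larger depends on the scale of the residuals. The sketch below uses two hypothetical sets of predictions: one where every residual is below 1 (squaring shrinks them) and one with a single large residual (squaring amplifies it). It also computes RMSE as the square root of MSE so it can be compared with MAE in the same units.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([1.0, 2.0, 3.0, 4.0])

# Hypothetical predictions: every residual has magnitude 0.5
y_pred_small = np.array([1.5, 2.5, 2.5, 3.5])
# Hypothetical predictions: one residual of magnitude 4, the rest exact
y_pred_large = np.array([1.0, 2.0, 3.0, 8.0])

results = {}
for name, y_pred in [("small", y_pred_small), ("large", y_pred_large)]:
    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)  # back in the same units as the target
    results[name] = (mae, mse, rmse)
    print(f"{name}: MAE={mae:.3f} MSE={mse:.3f} RMSE={rmse:.3f}")

# Expected output:
# small: MAE=0.500 MSE=0.250 RMSE=0.500
# large: MAE=1.000 MSE=4.000 RMSE=2.000
```

In the first case MSE is smaller than MAE; in the second it is four times larger. In both cases RMSE is at least as large as MAE.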
The coefficient of determination (R2 score)
R2 score determines how well the regression predictions approximate the real data points.
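The value of R2 is calculated with the following formula:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}$$
where $\hat{y}_i$ represents the predicted value of $y_i$ and $\bar{y}$ is the mean of the observed data, which is calculated as
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
R2 is at most 1, and a value of 1 indicates that the regression predictions perfectly fit the data. A value of 0 means the model does no better than always predicting the mean $\bar{y}$. On data the model was not fitted to (such as a test set), R2 can even be negative, which means the model performs worse than that constant baseline.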
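A minimal sketch of the R2 formula in NumPy, compared with scikit-learn's r2_score. The helper name r2_manual and all the values are hypothetical. The second set of predictions is deliberately worse than always predicting the mean of y_true, to show how R2 behaves in that situation.

```python
import numpy as np
from sklearn.metrics import r2_score

def r2_manual(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_good = np.array([2.5, 0.0, 2.0, 8.0])   # close to y_true
y_bad = np.array([7.0, 2.0, -0.5, 3.0])   # worse than predicting the mean

print(r2_manual(y_true, y_good), r2_score(y_true, y_good))
print(r2_manual(y_true, y_bad), r2_score(y_true, y_bad))
```

For y_good the residual sum of squares (1.5) is small relative to the total sum of squares (29.1875), giving an R2 of about 0.95. For y_bad the residual sum of squares (44.5) exceeds the total sum of squares, so R2 drops below zero.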
Tips For Using Regression Metrics
- We always need to make sure that the evaluation metric we choose for a regression problem penalizes errors in a way that reflects the consequences of those errors for the business, organizational, or user needs of our application.
- If there are outliers in the data, they can have an unwanted influence on the overall R2 or MSE scores. MAE is more robust to outliers than MSE because each residual contributes in proportion to its absolute value rather than its square. Hence, we can use MAE if we want outliers to have less influence on the evaluation.
- MAE is convenient for comparing different models because it is expressed in the same units as the target and does not give disproportionate weight to large residuals.
- If we want the evaluation to penalize large errors (and therefore outliers) more heavily, we should use MSE.
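The outlier tips above can be checked with a small experiment: start from hypothetical predictions with modest errors, turn a single target value into an extreme outlier, and see how much each metric moves. The data is synthetic and chosen only for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(0)
# Synthetic targets and hypothetical predictions with small noise
y_true = rng.normal(loc=20.0, scale=5.0, size=100)
y_pred = y_true + rng.normal(scale=1.0, size=100)

# Same predictions, but one target value becomes an extreme outlier
y_true_outlier = y_true.copy()
y_true_outlier[0] += 50.0

mae_clean = mean_absolute_error(y_true, y_pred)
mae_outlier = mean_absolute_error(y_true_outlier, y_pred)
mse_clean = mean_squared_error(y_true, y_pred)
mse_outlier = mean_squared_error(y_true_outlier, y_pred)

print(f"MAE: {mae_clean:.2f} -> {mae_outlier:.2f}")
print(f"MSE: {mse_clean:.2f} -> {mse_outlier:.2f}")
```

Shifting one residual by 50 can raise MAE by at most 50 / 100 = 0.5, because each row contributes linearly, whereas MSE grows by roughly 50² / 100 = 25, which is many times its value on the clean data.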
Hands-On Example of Regression Metrics
In order to understand regression metrics, it’s best to get hands-on experience with a real dataset. In this tutorial, we will show you how to make a simple linear regression model in scikit-learn and then calculate the metrics that we have previously explained.
The rationale behind the model
The dataset that we will use is the Boston Housing Dataset, and the task of our model will be to predict house prices. Let's say that we are a real estate agent and want to quickly estimate the price of a house in Boston. We can do this by creating a model that considers the features of a property, such as the number of rooms, the air quality, and the crime rate in the town. We will train the model on the features of the training data and then use it to predict prices for new data.
Implementation
Now, let’s start our coding! First, we need to import the necessary libraries:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
from sklearn import datasets
%matplotlib inline
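We will use the Boston Housing Dataset, which can be loaded from the scikit-learn library. Note that load_boston was deprecated in scikit-learn 1.0 and removed in version 1.2, so the code below requires an older version of scikit-learn.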
boston_data = datasets.load_boston()
The returned object is a dictionary-like Bunch that contains key: value pairs. We can check them out by printing the keys.
print(boston_data.keys())
This dataset contains 506 samples and 13 feature variables. The objective is to predict house prices using the given features.
print(boston_data.data.shape)
By printing the description of the dataset, we can see more information about it and the features that it contains.
print(boston_data.DESCR)
Our features will be stored in the x variable and the output values (property prices) will be stored in y. The values stored in y are the targets that our model will try to predict.
# Input Data
x = boston_data.data
# Output Data
y = boston_data.target
After that, we split our data into training and test sets, holding out one third of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=1/3, random_state=0)
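As a quick sanity check on what test_size=1/3 does, the sketch below splits placeholder arrays with the same shape as the Boston feature matrix (506 rows, 13 columns); only the shapes matter here, not the values. For a fractional test_size, scikit-learn rounds the number of test samples up.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same shape as the Boston features and target
x = np.zeros((506, 13))
y = np.zeros(506)

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=1/3, random_state=0)
print(X_train.shape, X_test.shape)  # (337, 13) (169, 13)
```

Since 506 / 3 is about 168.67, the test set gets 169 rows and the training set the remaining 337.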
For prediction, we will use the LinearRegression model, which is available as part of the sklearn.linear_model module. We will fit the model using the training data.
model = LinearRegression()
model.fit(X_train, y_train)
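Fitting a LinearRegression model estimates one coefficient per feature plus an intercept by ordinary least squares, and stores them in coef_ and intercept_. The sketch below uses synthetic data generated from a known linear rule (an assumption for illustration: y = 3 + 2·x1 - 1·x2 plus small noise), so you can see the fitted values land close to the true ones.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 2))
# Known linear rule plus small Gaussian noise (synthetic data)
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = LinearRegression()
model.fit(X, y)

print("intercept:", model.intercept_)  # close to 3
print("coefficients:", model.coef_)    # close to [2, -1]
```

On the Boston data the same attributes hold one coefficient for each of the 13 features, which can help when deciding which features to add or drop.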
Once we train our model, we can use it for prediction. We will predict the prices of properties from our test set.
y_predicted = model.predict(X_test)
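For a linear model, predict simply returns the intercept plus a weighted sum of the features. The following sketch, again on synthetic data rather than the Boston set, checks that model.predict matches the matrix product X_test · coef_ + intercept_.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
# Synthetic training data from a known linear rule (illustrative only)
X_train = rng.normal(size=(50, 3))
y_train = X_train @ np.array([1.5, -2.0, 0.5]) + 4.0
X_test = rng.normal(size=(10, 3))

model = LinearRegression().fit(X_train, y_train)

y_predicted = model.predict(X_test)
# Same predictions computed by hand: X_test . coef_ + intercept_
y_manual = X_test @ model.coef_ + model.intercept_

print(np.allclose(y_predicted, y_manual))  # True
```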
Actual vs Predicted graph
Before looking at the metrics and plain numbers, we should first plot the actual versus predicted values for our test set. This is one of the most useful plots because it can tell us a lot about the performance of our model. The code below uses Matplotlib to draw it; a perfect model would place every point on the red dashed diagonal.
fig, ax = plt.subplots()
ax.scatter(y_predicted, y_test, edgecolors=(0, 0, 1))
ax.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=3)
ax.set_xlabel('Predicted')
ax.set_ylabel('Actual')
plt.show()
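A residuals-versus-predicted plot is a common companion to the Actual vs Predicted graph: if the linear model fits well, the points should scatter randomly around zero with no visible pattern. The sketch below uses synthetic stand-ins for y_test and y_predicted and the non-interactive Agg backend so it runs outside a notebook; in the tutorial's notebook you would use the real arrays and call plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for y_test and y_predicted (hypothetical values)
y_test = rng.uniform(5, 50, size=100)
y_predicted = y_test + rng.normal(scale=4.0, size=100)

# Residual = actual - predicted, as in the MAE and MSE formulas
residuals = y_test - y_predicted

fig, ax = plt.subplots()
ax.scatter(y_predicted, residuals, edgecolors=(0, 0, 1))
ax.axhline(0, color="r", linestyle="--", lw=3)  # zero-residual reference line
ax.set_xlabel("Predicted")
ax.set_ylabel("Residual (actual - predicted)")
# In a notebook, call plt.show() here to display the figure
```

A funnel shape or a curve in this plot would suggest that a plain linear model is missing some structure in the data.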
On this plot, we can check where the points lie. Some points are far from the diagonal line, which suggests that the R2 score will be noticeably below 1 and that the model doesn't fit the data very well. Next, we will check whether the regression metrics lead to the same conclusion.
Evaluation of the model
In this step, we will evaluate the model using the standard metrics available in sklearn.metrics. The quality of a model is reflected in how well its predictions match the actual values, so we will assess its performance on the test data using the metrics introduced earlier.
# model evaluation for testing set
mae = metrics.mean_absolute_error(y_test, y_predicted)
mse = metrics.mean_squared_error(y_test, y_predicted)
r2 = metrics.r2_score(y_test, y_predicted)
print("The model performance for testing set")
print("--------------------------------------")
print('MAE is {}'.format(mae))
print('MSE is {}'.format(mse))
print('R2 score is {}'.format(r2))
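If you evaluate several models, it can help to wrap the metrics in one function. The helper below, regression_report, is a hypothetical name (it is not part of scikit-learn); it returns the three metrics used here plus RMSE, the square root of MSE, which is in the same units as the target and therefore easier to compare with MAE.

```python
import numpy as np
from sklearn import metrics

def regression_report(y_true, y_pred):
    """Hypothetical helper: return MAE, MSE, RMSE and R^2 for a set of predictions."""
    mse = metrics.mean_squared_error(y_true, y_pred)
    return {
        "MAE": metrics.mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "R2": metrics.r2_score(y_true, y_pred),
    }

# Usage with hypothetical values; in the tutorial you would pass y_test and y_predicted
report = regression_report(np.array([3.0, -0.5, 2.0, 7.0]),
                           np.array([2.5, 0.0, 2.0, 8.0]))
for name, value in report.items():
    print(f"{name}: {value:.2f}")
```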
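The mean absolute error is 3.55. Since the target values in this dataset are median home values in thousands of dollars, the predictions are off by about $3,550 on average: not very accurate, but still usable for rough estimates.
The mean squared error is 26.36, which corresponds to an RMSE of about 5.13. Because RMSE is noticeably larger than MAE, a few predictions have large errors, which could indicate outliers.
The R2 score is 0.66, which means the model explains about 66% of the variance in house prices on the test set, leaving roughly a third unexplained.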
Considering our regression metrics, we can conclude that the model can be further improved. At this point, we could consider adding more features or trying to fit a different regression model.
Summary
In this post, we covered the fundamental metrics used to measure the performance of regression models. In job interviews, you may be asked to evaluate a model, choose appropriate metrics, and explain your choice. This is why it is important to fully grasp the logic behind these concepts and to learn when to use each evaluation metric.
In the table below we give an overview of the regression evaluation metrics that we explored.
ABBREVIATION | FULL NAME | ROBUST TO OUTLIERS |
MAE | MEAN ABSOLUTE ERROR | YES |
MSE | MEAN SQUARED ERROR | NO |
R2 | R SQUARED (COEFFICIENT OF DETERMINATION) | NO |
For further study, we suggest checking out these resources to find out even more about regression evaluation!
- Examples of model evaluation can be found in Chapter 5 of the book Introduction to Machine Learning with Python.
- Getting back to basics is the best way to understand more complex things. Therefore, we suggest you check out the scikit-learn official tutorial on Linear regression.
- Check the scikit-learn official documentation and read more about regression metrics that are part of this library.
- This post on Towards Data Science explains mathematical formulations and usage cases for different regression metrics.
- The tutorial on GeeksforGeeks explains Mean squared error in detail.