/ Methods / Statistics / Data-reduction

# Prevent overfitting of statistical models by using regularization or test-set evaluation

To prevent overfitting, two common techniques involve penalizing models for their complexity, which is also called regularization and test set.

## Regularization

Instead of optimizing an objective function that simply measures how accurate the model is, e.g., via the likelihood or the accuracy, a regularized objective function has a penalty parameter that is proportional to the model complexity.

Two instances of regularized methods are the ridge regression (norm l2) and the lasso (norm l1).

## Test set

The test set approach is, in practice, made of (at least) two sets of data points: the test set and the training set.

1. First, the points from the training set are used to fit the statistical model.
2. Second, the (previously unseen) points in the test set are used to evaluate the performance of the model.

In practice, the process of evaluating the model’s performance is often done several times. Various methods to split the training and test set exist, such as the holdout, cross-validation, and jackknife (or hold one out) methods.

### Holdout

This method simply creates 5, 7, 10, or more training and test set partitions of the data set at random. The measure of the model performance is then repeated over each training and test set split. The average performance is reported with its confidence interval.

### Cross validation

In contrast to the holdout method, which splits in two, the cross-validation method splits data sets in 5, 7, 10 or more partitions of the data set. Then, the model is iteratively trained on all but one partition; the left-out partition is rotated. The average performance is also reported with its confidence interval.

### Jackknife or hold one out

Finally, the jackknife or hold-one-out method is an extension of the cross-validation method, where the number of partitions is the number of data points in the data set. In turn, this means that each test set is composed of only one data point and that the rotation will occur as many times as there are data points.

By MartinThoma (Own work) [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC BY 3.0 (http://creativecommons.org/licenses/by/3.0)]