
# Statistical learning: Methods

Statistical methods, particularly those that teach machines to learn concepts, are based on algorithms, i.e., step-by-step procedures that take input values, like numbers, and that yield, in a finite number of steps, an output value such as a payment fraud risk score (%) or a class (fraud, not fraud) [1,2].
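The definition above can be made concrete with a minimal sketch: a finite sequence of steps mapping input values to a risk score or a class. The features, thresholds, and weights below are hypothetical, chosen purely for illustration.

```python
def fraud_risk_score(amount, n_recent_txns, foreign_ip):
    """Return a payment-fraud risk score in [0, 100] (%)."""
    score = 0.0
    if amount > 1000:       # step 1: large amounts raise the score
        score += 40
    if n_recent_txns > 10:  # step 2: bursts of activity raise it further
        score += 30
    if foreign_ip:          # step 3: an unusual origin adds risk
        score += 30
    return min(score, 100.0)


def fraud_class(amount, n_recent_txns, foreign_ip, cutoff=50.0):
    """Turn the numeric score into a class label (fraud / not fraud)."""
    score = fraud_risk_score(amount, n_recent_txns, foreign_ip)
    return "fraud" if score >= cutoff else "not fraud"
```

The same inputs can thus yield either output type mentioned above: a continuous score or a discrete class, depending on whether a cutoff is applied.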

In statistical learning, two notable subtypes of algorithms are used: supervised and unsupervised algorithms.

## Supervised

Supervised algorithms learn from labeled examples; a typical task is classification into two or more classes.

Identifying payment fraud is an instance of a binary classification task, but multi-class problems also exist.

Optical character recognition is an instance of a multi-class problem. Taxonomic multi-class prediction is another, e.g., when posting on a classified-ads website like eBay, product categories are automatically proposed given the title.
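A multi-class prediction of this kind can be sketched with a tiny k-NN classifier, which (as noted in the table below) predicts the most common class among the nearest training points. The 2-D feature vectors and the three product categories here are made-up stand-ins.

```python
from collections import Counter
from math import dist

# Hypothetical training data: 2-D feature vectors, each labeled with one
# of three product categories (the multi-class setting).
train = [
    ((1.0, 1.0), "books"), ((1.2, 0.8), "books"),
    ((5.0, 5.0), "toys"),  ((5.2, 4.8), "toys"),
    ((9.0, 1.0), "tools"), ((8.8, 1.2), "tools"),
]


def knn_predict(x, k=3):
    """Predict the most common class among the k nearest training points."""
    neighbors = sorted(train, key=lambda item: dist(x, item[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]
```

Nothing in the procedure limits it to two classes, which is why k-NN handles multi-class problems natively.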

There are also sets of regression methods, some of which can, in turn, be used for classification.

## Unsupervised

There are also algorithms that focus on unsupervised tasks. These algorithms exist to identify structure in data, i.e., groups or clusters of data points that share common characteristics.

This is especially useful when a complex phenomenon, e.g., payment fraud or a complex pathology like Parkinson’s disease, is broken down into a set of simpler and more homogeneous subtypes.
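Clustering is the canonical way such groups are found. A minimal 1-D k-means sketch, alternating an assignment step and an update step, illustrates the idea; the data points and starting centers are made up.

```python
from statistics import mean


def kmeans_1d(points, centers, n_iter=10):
    """Toy 1-D k-means: returns final centers and the resulting clusters."""
    for _ in range(n_iter):
        # assignment step: attach each point to its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # update step: move each center to the mean of its cluster
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters
```

Note that no labels are involved: the groups emerge from the data alone, which is what makes the method unsupervised.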

## Common supervised and unsupervised algorithms

| Type | Name | Classification algorithms |
| --- | --- | --- |
| Supervised | SVM | One vs. one and one vs. all |
| Supervised | Naive Bayes | Most likely class |
| Supervised | k-NN | Most common class |
| Supervised | Decision trees | Default to multi-class |
| Supervised | Random Forest | - |
| Supervised | Neural networks (deep learning) | Cutoff (e.g., 50%), one vs. all, and most likely class |
| Supervised | Logistic and lasso regression | - |
| Unsupervised | K-means | |
| Unsupervised | Model-based clustering | |
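The "one vs. all" strategy in the table can be sketched as follows: one binary scorer is trained per class, and the predicted class is the one whose scorer is most confident. The three classes and their linear scorers below are hypothetical and assumed to be already fitted.

```python
# Hypothetical pre-fitted one-vs-all scorers, one per class.
SCORERS = {
    "low":    lambda x: -x + 2.0,
    "medium": lambda x: -abs(x - 5.0) + 3.0,
    "high":   lambda x: x - 8.0,
}


def one_vs_all_predict(x):
    """Return the class whose binary scorer gives the highest score."""
    return max(SCORERS, key=lambda cls: SCORERS[cls](x))
```

This is how inherently binary methods such as SVMs are extended to the multi-class problems discussed above.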