R: 6 ML algorithms

Here is a summary of six machine learning techniques implemented in R. It is just a quick reminder, with some of my comments.

1. Linear regression and multiple regression

This regression algorithm fits a model to the dataset under the assumption that the relationship between the variables is linear. It is a good way to get a first look at your dataset. However, it tends to underfit, because it cannot represent nonlinear relationships between variables unless you add transformed predictors yourself (for example squared terms).

Simple linear regression: one response (Y), one predictor (X)

A short example with the iris dataset: if we plot petal length against sepal length, the points suggest that a linear model could fit these values.
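A minimal sketch of that first example, assuming only base R and the built-in iris data frame. The variable name fit_simple is ours, chosen for illustration; lm() fits the line by ordinary least squares.

```r
# Simple linear regression on the built-in iris data set (no packages needed)
data(iris)

# Scatter plot of petal length against sepal length
plot(iris$Sepal.Length, iris$Petal.Length,
     xlab = "Sepal length (cm)", ylab = "Petal length (cm)")

# Fit Petal.Length = b0 + b1 * Sepal.Length by ordinary least squares
fit_simple <- lm(Petal.Length ~ Sepal.Length, data = iris)

# Draw the fitted line on top of the points
abline(fit_simple, col = "red")

# Coefficients, standard errors and R-squared
summary(fit_simple)
```

A positive slope on Sepal.Length matches the upward trend in the plot, and the R-squared reported by summary() tells how much of the variance the straight line explains.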

Multiple regression: one response (Y), several predictors (X)

Base R library: lm() from the stats package, no installation needed
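A sketch of multiple regression with the same lm() function, assuming the iris columns as predictors. The choice of predictors, the names fit_multi and new_flowers, and the two "new" flowers are hypothetical values picked for illustration.

```r
# Multiple regression: one response, several predictors (base R, stats package)
data(iris)

fit_multi <- lm(Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Width,
                data = iris)
summary(fit_multi)

# Compare with the one-predictor model using adjusted R-squared
fit_simple <- lm(Petal.Length ~ Sepal.Length, data = iris)
c(simple   = summary(fit_simple)$adj.r.squared,
  multiple = summary(fit_multi)$adj.r.squared)

# Predictions for new flowers (hypothetical measurements)
new_flowers <- data.frame(Sepal.Length = c(5.0, 6.5),
                          Sepal.Width  = c(3.4, 3.0),
                          Petal.Width  = c(0.2, 2.0))
predict(fit_multi, newdata = new_flowers)
```

Adjusted R-squared is used for the comparison because plain R-squared never decreases when predictors are added.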

2. Logistic regression

Logistic regression is used to predict a category from a set of features. Unlike linear regression, which predicts a continuous value, it estimates the probability that an element belongs to a category (in its basic form, one of two categories).
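A minimal sketch using base R's glm() with family = binomial. The binary target "is virginica or not" is an assumption made to get a two-category problem out of iris, and the 0.5 threshold is a common default rather than a rule; is_virginica, prob and pred_class are our own names.

```r
# Logistic regression with glm(): predict whether an iris is "virginica"
data(iris)

# Binary target (1 = virginica, 0 = other species); hypothetical column name
iris$is_virginica <- as.integer(iris$Species == "virginica")

fit_logit <- glm(is_virginica ~ Petal.Length + Petal.Width,
                 data = iris, family = binomial)
summary(fit_logit)

# Predicted probabilities of belonging to the "virginica" category
prob <- predict(fit_logit, type = "response")

# Turn probabilities into categories with a 0.5 threshold
pred_class <- ifelse(prob > 0.5, 1L, 0L)

# Confusion table and training accuracy
table(predicted = pred_class, actual = iris$is_virginica)
mean(pred_class == iris$is_virginica)
```

type = "response" returns probabilities; without it, predict() returns values on the log-odds scale.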

3. Support Vector Machine

Used for regression, support vector machines (support vector regression, SVR) can model more complex relationships between variables than linear regression, thanks to nonlinear kernels. By tuning some parameters (for example cost, epsilon and the kernel parameters) you can also reduce the overall prediction error. We need to install the e1071 package to use svm(): install.packages("e1071").
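A sketch of SVR followed by a small parameter search, assuming e1071 is installed. svm() and tune() are e1071 functions; the grid values, the seed and the rmse() helper are hypothetical choices for illustration, and the RMSE values are computed on the training data, so they only give a rough comparison.

```r
# Support vector regression with e1071 (install.packages("e1071") first)
library(e1071)
data(iris)

# With a numeric response, svm() performs eps-regression (radial kernel by default)
fit_svr <- svm(Petal.Length ~ Sepal.Length, data = iris)
pred_svr <- predict(fit_svr, iris)

# Compare training RMSE with plain linear regression
fit_lm <- lm(Petal.Length ~ Sepal.Length, data = iris)
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))  # helper, our own
c(lm  = rmse(iris$Petal.Length, predict(fit_lm)),
  svr = rmse(iris$Petal.Length, pred_svr))

# Grid search over epsilon and cost (10-fold cross-validation by default)
set.seed(1)
tuned <- tune(svm, Petal.Length ~ Sepal.Length, data = iris,
              ranges = list(epsilon = c(0.05, 0.1, 0.5),
                            cost    = c(1, 10, 100)))
tuned$best.parameters        # best epsilon/cost combination found
tuned$best.performance       # its cross-validated mean squared error
best_svr <- tuned$best.model # model refitted with the best parameters
rmse(iris$Petal.Length, predict(best_svr, iris))
```

In practice you would start with a coarse grid like this one and then refine it around the best values.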

4. K-means

This algorithm needs very little input from us. It is an unsupervised learning method: we do not define the categories in advance. We just let the algorithm split the examples into groups and decide where the boundaries between them lie. You only have to set the number of centers when launching the calculation, that is, your estimate of the number of categories.

Base R library: kmeans() from the stats package, no installation needed
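A sketch with base R's kmeans() on the four iris measurements, assuming 3 centers as the estimate of the number of categories. The seed, nstart = 20 and the blue palette are arbitrary choices; the species labels are only used afterwards, to check the clusters.

```r
# K-means clustering of the iris measurements (base R, stats package)
data(iris)
set.seed(42)  # kmeans() starts from random centers

features <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]

# centers = our estimate of the number of categories; nstart reruns the
# algorithm from several random starts and keeps the best solution
km <- kmeans(features, centers = 3, nstart = 20)

# Cluster sizes and cluster centers
km$size
km$centers

# Species were not used for fitting; compare them with the clusters afterwards
table(cluster = km$cluster, species = iris$Species)

# Plot two of the features, one shade of blue per cluster
blues <- c("lightblue", "steelblue", "navy")
plot(iris$Petal.Length, iris$Petal.Width, col = blues[km$cluster], pch = 19,
     xlab = "Petal length (cm)", ylab = "Petal width (cm)")
points(km$centers[, c("Petal.Length", "Petal.Width")], pch = 4, cex = 2)
```

Several starts (nstart) matter because a single run can get stuck in a poor local solution.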

(Figure: clusters)

In the cluster plot, we can see the three clusters in slightly different shades of blue; each one corresponds to a category.

5. Random Forest

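Random forests can perform classification tasks (and regression as well). Each tree is grown deep on a bootstrap sample, so the forest usually fits the training set almost perfectly: judge it on held-out data or on the out-of-bag (OOB) error rather than on training accuracy.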

randomForest library: install.packages("randomForest")
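A sketch of species classification with randomForest(), assuming the package is installed. The 100/50 train/test split, the seed and ntree = 500 are illustrative choices; the point is to compare training accuracy with held-out accuracy.

```r
# Random forest classification with the randomForest package
library(randomForest)
data(iris)
set.seed(2024)

# Hold out a third of the data so the error is not measured on the training set
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

rf <- randomForest(Species ~ ., data = train, ntree = 500, importance = TRUE)
print(rf)  # includes the out-of-bag (OOB) error estimate

# Accuracy on the training set vs the held-out set
train_acc <- mean(predict(rf, train) == train$Species)
pred <- predict(rf, test)
test_acc <- mean(pred == test$Species)
c(train = train_acc, test = test_acc)

table(predicted = pred, actual = test$Species)
importance(rf)  # which measurements matter most
```

Calling predict(rf) without new data returns out-of-bag predictions, which are a fairer estimate than predictions on the training rows.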

6. Neural Network

Neural networks can model complex relationships between variables, but they are harder to tune (number of hidden neurons, input scaling, convergence). The neuralnet package is suited to building simple feed-forward models between inputs and outputs.

neuralnet library: install.packages("neuralnet")
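A sketch of a small regression network with neuralnet(), assuming neuralnet 1.44.2 or later (where predict() works on the fitted nn object; older versions use compute()). The min-max scaling helper scale01(), the choice of 3 hidden neurons, the split and the seed are hypothetical choices for illustration.

```r
# A small feed-forward network with the neuralnet package
library(neuralnet)
data(iris)
set.seed(123)

# Min-max scaling to [0, 1]; networks train poorly on unscaled inputs
scale01 <- function(x) (x - min(x)) / (max(x) - min(x))
df <- as.data.frame(lapply(iris[, 1:4], scale01))

train_idx <- sample(nrow(df), 100)
train <- df[train_idx, ]
test  <- df[-train_idx, ]

# One hidden layer with 3 neurons; linear.output = TRUE for regression
nn <- neuralnet(Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Width,
                data = train, hidden = 3, linear.output = TRUE,
                stepmax = 1e6)

# Predictions on held-out rows (still on the 0-1 scale)
pred <- predict(nn, test)
rmse <- sqrt(mean((pred[, 1] - test$Petal.Length)^2))
rmse

plot(nn)  # draws the network with its fitted weights
```

If training stops with a "did not converge" warning, raising stepmax or changing the seed or the number of hidden neurons usually helps.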
