Here is a summary of machine learning techniques in R. It is meant as a quick reminder, with some of my own comments.
1. Linear regression and multiple regression
This regression algorithm tries to fit a model to the dataset, assuming that the relationship between variables is linear. It is a good way to get a first look at your data, but it is likely to underfit and it cannot represent nonlinear relationships between variables.
Linear regression: 1 Y, 1 X
A short example with the iris dataset. If we plot the petal length against the sepal length, it looks like a linear model could fit these values.
library(ggplot2) # needed for the chart below
fit <- lm(Petal.Length ~ Sepal.Length, data = iris) # fit the linear model
fit # display the coefficients of the formula (y = ax + b)
chart <- ggplot(iris) + geom_jitter(aes(x = Sepal.Length, y = Petal.Length)) + ggtitle("Petal vs Sepal Length")
chart
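To judge the quality of the fit, you can inspect the model with summary() and overlay the regression line on the chart with geom_smooth(). A minimal sketch, reusing the fit model and the ggplot2 import from the block above:

summary(fit) # coefficients, R-squared, p-values
ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_jitter() +
  geom_smooth(method = "lm") + # draws the fitted line with a confidence band
  ggtitle("Petal vs Sepal Length")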
Multiple regression: 1 Y, several X
R base library
fit <- lm(y ~ x1 + x2, dataset) # generate the model
plot(fit) # diagnostic plots (residuals, QQ plot, ...)
predict(fit, data.frame(x1 = value1, x2 = value2), interval = "confidence") # predictions
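As a concrete illustration (my example, not part of the original snippet), predicting fuel consumption from weight and horsepower in the built-in mtcars dataset:

fit <- lm(mpg ~ wt + hp, data = mtcars) # mpg explained by weight (wt) and horsepower (hp)
summary(fit) # coefficients and R-squared
predict(fit, data.frame(wt = 3.0, hp = 120), interval = "confidence") # prediction with its confidence interval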
2. Logistic regression
Logistic regression is used to predict a category from a set of features. Unlike linear regression, which predicts a continuous value, it predicts the category an element is most likely to belong to.
fit <- glm(y ~ x1 + x2, family = binomial(link = "logit"), dataset) # generate the model
plot(fit) # diagnostic plots
predict(fit, data.frame(x1 = value1, x2 = value2), type = "response") # predicted probabilities
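For instance (my example), predicting the transmission type in mtcars, where am is already coded as 0/1:

fit <- glm(am ~ wt + hp, family = binomial(link = "logit"), data = mtcars)
summary(fit)
predict(fit, data.frame(wt = 2.5, hp = 110), type = "response") # probability that am = 1 (manual)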
3. Support Vector Machine
Support Vector Machines can model more complex relationships between parameters than linear regression. By tuning parameters such as the kernel, cost and gamma, you can also reduce the overall prediction error. We need to install the e1071 package in order to use svm(); a usage sketch follows the installation snippet below.
install.packages("e1071") # installing the library
library(e1071) # loading the library
4. Kmeans
We can qualify this algorithm as "lazy": for a set of data we do not presume the definitions of the categories (unsupervised learning). We let the algorithm separate the examples and group them into clusters, deciding the boundaries itself. You only have to set the number of centers when launching the computation, i.e. estimate the number of categories (one heuristic for choosing it is sketched at the end of this section).
R base library
k <- kmeans(data, centers = 3)
plot(data, col = k$cluster) # colour each point by its cluster
library(dplyr)
library(ggplot2)
data <- select(iris, Sepal.Length, Sepal.Width)
k <- kmeans(data, centers = 3)
data <- cbind(data, cluster = k$cluster) # we bind the cluster label to each point of the dataset
chart <- ggplot(data, aes(x = Sepal.Length, y = Sepal.Width, col = factor(cluster))) + geom_jitter()
chart
Each point is coloured according to its cluster, which makes the 3 different categories clearly visible.
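One way to estimate the number of categories (my addition, the classic elbow heuristic): compute the total within-cluster sum of squares for several values of k and look for the bend in the curve:

data <- iris[, c("Sepal.Length", "Sepal.Width")]
wss <- sapply(1:8, function(k) kmeans(data, centers = k, nstart = 10)$tot.withinss)
plot(1:8, wss, type = "b", xlab = "number of clusters k", ylab = "total within-cluster sum of squares")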
5. Random Forest
Random Forest algorithms are able to perform classification tasks. They may overfit the training set, so evaluate them on held-out data.
randomForest library: install.packages("randomForest")
library(randomForest)
rf <- randomForest(x, y, ntree = 50) # x: matrix of input features, y: vector of output labels
p <- predict(rf, x) # predictions
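A concrete run on iris (my example), where the first four columns are the features and Species is the label:

library(randomForest)
set.seed(42) # reproducibility: the forest is built from random samples
rf <- randomForest(iris[, -5], iris$Species, ntree = 50)
p <- predict(rf, iris[, -5])
table(p, iris$Species) # confusion matrix on the training set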
6. Neural Network
Neural networks are able to model complex relations between parameters, but they are difficult to tune. The neuralnet library can be used to build simple models between inputs and outputs.
neuralnet library: install.packages("neuralnet")
library(neuralnet)
nn <- neuralnet(y ~ x1 + x2 + x3, data, hidden = c(3, 2)) # hidden gives the number of neurons per hidden layer (here 3 in the first hidden layer and 2 in the second)
p <- compute(nn, test)$net.result # predictions (compute returns a list; net.result holds the outputs)
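A small runnable sketch on mtcars (my example; the variables are scaled first, since neural networks train poorly on raw, unscaled inputs):

library(neuralnet)
d <- as.data.frame(scale(mtcars[, c("mpg", "wt", "hp", "disp")])) # scale all variables
set.seed(42)
nn <- neuralnet(mpg ~ wt + hp + disp, data = d, hidden = c(3, 2))
plot(nn) # draws the network with its weights
p <- compute(nn, d[, c("wt", "hp", "disp")])$net.result # predictions, in scaled units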