Time series

 

Dataset for this study : UCI

The dataset is composed of different clients (columns), and for each of them the electric consumption has been recorded every 15 minutes over several years. I expect to see seasonality and trend in the signal. I’ll try to identify them and build a regression model on the data. This article aims to be an introduction to the data analysis of time series.

Loading data and dependencies

Time series

Time series data are a particular form of signal data. Usually, a time series consists of a sequence of measurements. In the simplest cases every data point is recorded at a regular interval, but in practice we often face irregularly sampled sequences. Here are two examples, one regular and one irregular:

  • A thermometer sensor is linked to a computer, which records the temperature every hour (regular).
  • An athlete uses a sports app, which records each activity whenever it takes place (irregular).

a. Scalar product on time series : signal decomposition

An orthonormal basis must be composed of unit vectors that are pairwise orthogonal. We can verify these properties by computing the magnitude of each vector (the norm must equal 1) and the scalar product of every pair of distinct vectors in the basis (it must equal 0).
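Here is a minimal sketch of that check with numpy. The function name `is_orthonormal` and the tolerance are my own choices for illustration. Both conditions are tested at once through the Gram matrix (all pairwise scalar products), which must equal the identity.

```python
import numpy as np

def is_orthonormal(basis, tol=1e-9):
    """Return True if the rows of `basis` are unit vectors that are pairwise orthogonal."""
    basis = np.asarray(basis, dtype=float)
    # Gram matrix: entry (i, j) is the scalar product of vector i with vector j.
    gram = basis @ basis.T
    # Orthonormal <=> 1 on the diagonal (unit norms) and 0 elsewhere (orthogonality).
    return np.allclose(gram, np.eye(len(basis)), atol=tol)

canonical = np.eye(3)                          # canonical basis of R^3
skewed = np.array([[1.0, 0.0], [1.0, 1.0]])    # second vector is neither unit nor orthogonal

print(is_orthonormal(canonical))  # True
print(is_orthonormal(skewed))     # False
```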

b. Dimension of a signal

The dimension of a signal is the number of parameters that have an effect on the signal.

c. Mean value

mean(signal)=\frac{1}{N}\sum_{t=1}^{N}signal(t)

 

d. RMS value

Since the mean value can be zero (for a sine wave, for instance), we would like another way to describe the amplitude of the signal. The RMS (root mean square) value is one way to do it:

rms(signal)=\sqrt{\frac{1}{N}\sum_{t=1}^{N}(signal(t))^{2}}

 

e. Energy of a signal

The energy depends on both the amplitude and the length (number of points) of the signal.

E(signal)=\sum_{t=1}^{N}(signal(t))^{2}

 

 

f. Power of a signal

The power is the square of the RMS value, i.e. the energy divided by the number of points.

P(signal)=\frac{1}{N}\sum_{t=1}^{N}(signal(t))^{2}
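The four quantities from c. to f. translate almost literally into numpy. This is a small sketch on a synthetic signal (one full period of a sine of amplitude 2, my own choice), not on the electricity data.

```python
import numpy as np

def mean_value(signal):
    return np.sum(signal) / len(signal)

def rms(signal):
    return np.sqrt(np.sum(np.square(signal)) / len(signal))

def energy(signal):
    return np.sum(np.square(signal))

def power(signal):
    # Energy divided by the number of points, i.e. the square of the RMS value.
    return energy(signal) / len(signal)

# Synthetic example: exactly one period of a sine of amplitude 2, 1000 points.
t = np.arange(1000) / 1000
s = 2.0 * np.sin(2 * np.pi * t)

print("mean  :", mean_value(s))   # close to 0
print("rms   :", rms(s))          # close to 2 / sqrt(2)
print("energy:", energy(s))
print("power :", power(s))
```

The mean is close to zero while the RMS value is not, which is exactly why the RMS value is the more useful amplitude measure here.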

 

 

g. Euclidean distance between 2 signals

Two signals of the same length, represented in the same space, can be compared using the Euclidean distance. Taken alone this value is difficult to interpret, but it becomes meaningful when compared to another Euclidean distance. That makes it a good tool to evaluate the performance of our models and to compare them.

 d=\sqrt{\sum_{t=1}^{N}(signal_{1}(t)-signal_{2}(t))^{2}}
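As a quick sketch, here is how the distance can rank two candidate predictions against the same target. The signals `model_a` and `model_b` are hypothetical, built only for this illustration.

```python
import numpy as np

def euclidean_distance(signal_1, signal_2):
    signal_1 = np.asarray(signal_1, dtype=float)
    signal_2 = np.asarray(signal_2, dtype=float)
    if signal_1.shape != signal_2.shape:
        raise ValueError("signals must have the same length")
    return np.sqrt(np.sum((signal_1 - signal_2) ** 2))

t = np.linspace(0, 1, 200)
target = np.sin(2 * np.pi * t)
model_a = target + 0.05   # hypothetical prediction, small constant error
model_b = target + 0.5    # hypothetical prediction, larger constant error

d_a = euclidean_distance(target, model_a)
d_b = euclidean_distance(target, model_b)
print(d_a, d_b)  # the smaller distance identifies the closer prediction
```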

 

 

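h. Decomposition and recomposition of a signal (projection on a basis)

A signal can be decomposed on an orthonormal basis: each coefficient is the scalar product of the signal with one basis vector, and the signal is recomposed as the sum of the basis vectors weighted by these coefficients. Let’s build an orthonormal basis made of several sine signals.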
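Here is a minimal sketch of the idea. The number of points, the frequencies in `freqs` and the test signal `u` are assumptions chosen for the illustration. Note that the recomposition is only exact because `u` is itself a combination of the sines in the basis.

```python
import numpy as np

N = 256
t = np.arange(N) / N
freqs = [1, 2, 3, 4]   # hypothetical choice of frequencies (cycles over the window)

# One sine per row, then normalise each row to unit norm.
basis = np.array([np.sin(2 * np.pi * f * t) for f in freqs])
basis /= np.linalg.norm(basis, axis=1, keepdims=True)

# Signal built from the sines of frequency 1 and 3.
u = 3.0 * np.sin(2 * np.pi * 1 * t) + 0.5 * np.sin(2 * np.pi * 3 * t)

# Decomposition: scalar product of u with every basis vector.
coeffs = basis @ u
# Recomposition: weighted sum of the basis vectors.
u_rebuilt = coeffs @ basis

print(np.round(coeffs, 3))          # non-zero only for frequencies 1 and 3
print(np.allclose(u, u_rebuilt))    # True
```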

 

Resampling

It’s unlikely that our time series is evenly distributed, with a constant gap between two consecutive points. Resampling techniques allow us to decrease or increase the number of time steps in the series and, more importantly, to give it a regular distribution. Different rules can be applied to compute the value of each new time step (mean, max, min, etc.).

Let’s resample our data to a daily frequency.
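A minimal sketch with pandas, assuming the data sit in a DataFrame indexed by timestamp. The frame below is a random stand-in for the real dataset, and the column name `client_1` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 3 days of readings every 15 minutes.
index = pd.date_range("2014-01-01", periods=3 * 96, freq="15min")
rng = np.random.default_rng(0)
df = pd.DataFrame({"client_1": rng.uniform(0, 5, size=len(index))}, index=index)

# Downsample to one value per day; the aggregation rule is up to us.
daily_mean = df.resample("D").mean()
daily_max = df.resample("D").max()

print(daily_mean)
```

Each day gathers 96 readings of 15 minutes, so the daily frame has one row per calendar day.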

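Additive or multiplicative model?

An observed signal can be decomposed into 3 components: a trend, a seasonality, and noise. In an additive model the components are summed (signal = trend + seasonality + noise); in a multiplicative model they are multiplied (signal = trend × seasonality × noise).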
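To make the additive case concrete, here is a rough sketch on a synthetic signal (linear trend, period of 7 points, Gaussian noise, all my own assumptions). The trend is estimated with a moving average over one period, and the seasonality by averaging the detrended values at each position in the period. For a strictly positive signal, a multiplicative model can be handled the same way after taking the logarithm.

```python
import numpy as np

period = 7
n = 140
t = np.arange(n)
rng = np.random.default_rng(0)

# Synthetic additive signal: trend + seasonality + noise.
trend = 0.1 * t
season = 2.0 * np.sin(2 * np.pi * t / period)
noise = rng.normal(scale=0.1, size=n)
y = trend + season + noise

# Trend: moving average over exactly one period (the seasonality averages out).
half = period // 2
kernel = np.ones(period) / period
trend_est = np.convolve(y, kernel, mode="valid")   # length n - period + 1
idx = t[half:n - half]                             # time steps where the average is centred

# Seasonality: mean of the detrended signal for each position in the period.
detrended = y[half:n - half] - trend_est
season_est = np.array([detrended[idx % period == p].mean() for p in range(period)])

# Noise: what is left once trend and seasonality are removed.
residual = detrended - season_est[idx % period]
print(np.round(season_est, 2))
```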

 

Autocorrelation
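Autocorrelation measures how similar a signal is to a copy of itself shifted by a given lag; a peak at lag k suggests a seasonality of k time steps. Here is a minimal numpy sketch on a synthetic signal with a period of 20 points. The normalisation by the lag-0 value and the search range for the peak are choices made for this illustration.

```python
import numpy as np

def autocorrelation(signal, max_lag):
    """Autocorrelation for lags 0..max_lag, normalised so that lag 0 equals 1."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    denom = np.sum(x * x)
    return np.array([np.sum(x[:len(x) - lag] * x[lag:]) / denom
                     for lag in range(max_lag + 1)])

t = np.arange(200)
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * t / 20) + 0.05 * rng.normal(size=t.size)   # period of 20 points

acf = autocorrelation(signal, max_lag=40)
# Look for the peak away from lag 0, which is always the maximum.
best_lag = 10 + int(np.argmax(acf[10:31]))
print(best_lag)
```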

Frequency domain

a. Change of basis. Fourier basis (f1 to fn)

Construction of the basis

e^{j2\pi ft}=\cos(2\pi ft)+j\sin(2\pi ft)

 

b. Signal decomposition in Fourier basis

The transform of a signal u in the Fourier basis is given by the scalar product of u with the complex conjugate of each vector of the Fourier basis.

TF[u]_{n}=\sum_{k=0}^{N-1}u_{k}e^{-j2\pi\frac{n}{N}k}

Each coefficient of the Fourier transform is a complex number.

c. Spectrogram

The spectrum is the modulus (magnitude) of the Fourier transform.

N : length of our signal

We begin by building the Fourier basis Fn, composed of N vectors. We then check that our basis is “almost” an orthogonal basis (almost, because of floating-point rounding) by computing the dot product of every pair of distinct vectors and comparing its magnitude to a threshold value.

Finally, the decomposition of the signal u is given by the dot product of u with the conjugate of the Fn vectors.
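The three steps can be sketched as follows. The function names, the test signal and the threshold value are assumptions of mine; `np.fft.fft` is used as the reference implementation.

```python
import numpy as np

def fourier_basis(N):
    """Row n holds the basis vector exp(j*2*pi*n*k/N) for k = 0..N-1."""
    k = np.arange(N)
    n = k.reshape(-1, 1)
    return np.exp(2j * np.pi * n * k / N)

def dft(u):
    u = np.asarray(u, dtype=complex)
    Fn = fourier_basis(len(u))
    # Dot product of u with the conjugate of every basis vector.
    return np.conj(Fn) @ u

N = 64
Fn = fourier_basis(N)

# Orthogonality check: pairwise complex scalar products, off-diagonal terms ~ 0.
gram = Fn @ np.conj(Fn).T
off_diagonal = gram - np.diag(np.diag(gram))
threshold = 1e-9
is_orthogonal = np.max(np.abs(off_diagonal)) < threshold

# Test signal: frequencies 4 and 10 (cycles over the window).
t = np.arange(N) / N
u = np.sin(2 * np.pi * 4 * t) + 0.5 * np.cos(2 * np.pi * 10 * t)

ours = dft(u)
reference = np.fft.fft(u)
spectrum = np.abs(ours)

print(is_orthogonal, np.allclose(ours, reference))
```

The vectors are orthogonal but not unit-norm (each has squared norm N), which is why no normalisation appears in the forward transform.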

In blue, our Fourier function; in orange, the one included in numpy. We can see that the values match, which validates our algorithm.

d. Convolution of 2 signals

Convolution is an operation which transforms 2 signals (u1, u2) into an output signal (s) such that:

s(t)=\int_{-\infty }^{+\infty }u_{1}(\tau )u_{2}(t-\tau)d\tau

We often write the convolution as follows:

s = u1 * u2 = u2 * u1
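For sampled signals the integral becomes a sum. As a small sketch, the function `convolve_direct` below (my own helper, written for illustration) applies the definition term by term, and `np.convolve` gives the same result in either order.

```python
import numpy as np

def convolve_direct(u1, u2):
    """Discrete convolution: s[t] = sum over tau of u1[tau] * u2[t - tau]."""
    s = np.zeros(len(u1) + len(u2) - 1)
    for t in range(len(s)):
        for tau in range(len(u1)):
            if 0 <= t - tau < len(u2):
                s[t] += u1[tau] * u2[t - tau]
    return s

u1 = np.array([1.0, 2.0, 3.0])
u2 = np.array([0.0, 1.0, 0.5])

s12 = np.convolve(u1, u2)   # holds 0, 1, 2.5, 4, 1.5
s21 = np.convolve(u2, u1)   # same values: convolution is commutative

print(s12, s21, convolve_direct(u1, u2))
```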

e. Smoothing a signal with convolution

Let’s start from a noisy signal.

We want to use convolution to smooth the signal. Let’s start by averaging the signal over a small window of width T.

u_{m}(t)=\frac{1}{T} \int_{-T/2}^{T/2}u(t-\tau)d\tau

We can write this equation as the convolution of the signal u with a gate signal defined as:

gate(t)=\frac{1}{T}\Pi(t/T)
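In discrete form the gate is simply a vector of T equal weights that sum to 1. Here is a minimal sketch; the noisy signal and the window width of 21 samples are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 500)
clean = np.sin(2 * np.pi * 2 * t)
noisy = clean + 0.3 * rng.normal(size=t.size)

T = 21                      # window width in samples (hypothetical value)
gate = np.ones(T) / T       # discrete gate: T equal weights summing to 1

# mode="same" keeps the output the same length as the input; the first and
# last T // 2 points are biased because the signal is padded with zeros.
smoothed = np.convolve(noisy, gate, mode="same")

# Compare the error to the clean signal away from the edges.
inner = slice(T, -T)
err_noisy = np.sqrt(np.mean((noisy[inner] - clean[inner]) ** 2))
err_smoothed = np.sqrt(np.mean((smoothed[inner] - clean[inner]) ** 2))
print(err_noisy, err_smoothed)
```

A wider window removes more noise but also flattens the signal itself, so T is a trade-off.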

Forecasting series with LSTM

LSTM networks can be used to predict the next output of a series given the previous inputs. In the following, I am just going to describe a simple implementation of such a model.

We are going to predict a sine signal with an added trend.

Given a number of previous time steps, we would like to predict the next value of the signal. We need to define some parameters and to prepare the data so that it can be fed properly into a Keras LSTM layer.

Our signal contains len(x)=300 input points. Given the previous 10 points, we are going to predict the value of the next point.

 

Preparation of the data.

The network will be fed with 10 inputs (time_steps) and 1 output. For each window of ten values in our series, we are going to send to the network a vector containing the values of the window (inputs) and the value that immediately follows the window (output or label).
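The windowing can be sketched as follows. The exact formula of the signal is an assumption of mine (a sine plus a linear trend over 300 points), and `make_windows` is a helper written for this illustration. A Keras LSTM layer expects inputs of shape (samples, time steps, features), hence the final reshape.

```python
import numpy as np

def make_windows(series, time_steps):
    """Slice a 1-D series into (inputs, label) pairs for next-step prediction."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for start in range(len(series) - time_steps):
        X.append(series[start:start + time_steps])   # the window
        y.append(series[start + time_steps])         # the value right after it
    # LSTM layers expect (samples, time_steps, features); we have 1 feature.
    X = np.array(X).reshape(-1, time_steps, 1)
    return X, np.array(y)

x = np.arange(300)
signal = np.sin(2 * np.pi * x / 50) + 0.01 * x   # hypothetical sine + trend
time_steps = 10

X, y = make_windows(signal, time_steps)
print(X.shape, y.shape)   # (290, 10, 1) (290,)
```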

Now that the pairs of inputs and labels are ready, we can feed them into the network.

Building and training the model in Keras.

I chose an LSTM network with one hidden layer of 50 units.

 

Plot the loss on the training set.

The loss on the training set decreases steadily, which suggests that our network is learning properly.

Make the prediction.
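Here is a minimal end-to-end sketch of the steps described above, assuming TensorFlow's bundled Keras. The signal formula, the optimizer, the loss, the number of epochs and the batch size are all assumptions of mine; only the 10 time steps and the single LSTM layer of 50 units come from the description.

```python
import numpy as np
from tensorflow import keras

time_steps = 10
x = np.arange(300)
signal = np.sin(2 * np.pi * x / 50) + 0.01 * x   # hypothetical sine + trend

# Windows of 10 values as inputs, the following value as label.
X = np.array([signal[i:i + time_steps] for i in range(len(signal) - time_steps)])
X = X.reshape(-1, time_steps, 1)
y = signal[time_steps:]

# One LSTM layer of 50 units followed by a single output neuron.
model = keras.Sequential([
    keras.Input(shape=(time_steps, 1)),
    keras.layers.LSTM(50),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")   # assumed optimizer and loss

history = model.fit(X, y, epochs=20, batch_size=32, verbose=0)
losses = history.history["loss"]              # one training-loss value per epoch

# Predict the next value for every window of the training set.
predictions = model.predict(X, verbose=0)
print(predictions.shape)
```

Plotting `losses` against the epoch number gives the training curve, and plotting `predictions` against `y` shows how closely the network follows the signal. Since the prediction is made on the training windows, this only shows that the network fits the data, not how it generalises.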