Time Series Data Clustering: Unsupervised Sequential Data Separation with tslearn
Clustering is an important machine learning technique that divides data points into groups. Common clustering algorithms include K-means, Mean-shift, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Expectation–Maximization (EM) clustering. With time series data, however, we cannot use these algorithms directly, because we are separating whole sequences, which may differ in length or be shifted in time, rather than individual data points. One solution is to replace the Euclidean distance that K-means uses for data points with Dynamic Time Warping (DTW).
1. Dynamic Time Warping (DTW)
As the picture shows, Euclidean matching leaves some data points unmatched when the two sequences have different durations. Dynamic Time Warping instead matches each point to the nearest corresponding point or points in the other sequence. Euclidean matching is one-to-one, while DTW matching can be one-to-many.
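To make the one-to-many matching concrete, here is a minimal pure-Python sketch of DTW using the classic dynamic-programming table. It is an illustration, not tslearn's implementation: the function name `dtw_with_path` is hypothetical, and it assumes 1-D sequences, a squared-difference local cost, and a square root taken over the accumulated cost.

```python
import math


def dtw_with_path(a, b):
    """Return (distance, path) for two 1-D sequences.

    Hypothetical helper for illustration. `path` is a list of (i, j)
    index pairs saying that a[i] is matched with b[j].
    """
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
            )

    # Walk back from the last cell to recover which points were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return math.sqrt(cost[n][m]), path


short = [1, 2, 3]
long = [1, 2, 2, 3]  # same shape, but the middle value lasts longer
distance, path = dtw_with_path(short, long)
print(distance)  # 0.0
print(path)      # [(0, 0), (1, 1), (1, 2), (2, 3)]
```

In the path, index 1 of the shorter sequence is paired with both index 1 and index 2 of the longer one. That is the one-to-many matching that lets DTW compare sequences of different durations without leaving points unmatched.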
The formal way to express the measurement is:
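One standard formulation defines DTW as a minimum over alignment paths between a sequence x of length n and a sequence y of length m. The notation below is an assumption chosen for illustration: π is a path of index pairs that starts at the first pair of points, ends at the last pair, and never moves backwards in either sequence, and d is the distance between two individual points.

```latex
\mathrm{DTW}(x, y) = \min_{\pi} \sqrt{\sum_{(i, j) \in \pi} d(x_i, y_j)^2}

% Equivalent recurrence on the accumulated cost D, with D(0, 0) = 0
% and D(i, 0) = D(0, j) = \infty for i, j > 0:
D(i, j) = d(x_i, y_j)^2 + \min\bigl\{ D(i-1, j-1),\; D(i-1, j),\; D(i, j-1) \bigr\},
\qquad \mathrm{DTW}(x, y) = \sqrt{D(n, m)}
```

The recurrence is what makes the minimum tractable: instead of enumerating every path, each cell of D is filled once, so the cost is proportional to n times m.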
The calculation is implemented in tslearn (a library that focuses on time series data analysis):