Time Series Data Clustering: Unsupervised Sequential Data Separation with tslearn

2 min read · Aug 28, 2022

https://towardsdatascience.com/preprocessing-iot-data-linear-resampling-dde750910531

Clustering is an important machine learning technique that divides data points into groups. Common clustering algorithms include K-means, Mean-shift, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Expectation–Maximization (EM) clustering. However, these algorithms cannot be applied directly to time series data, because the objects being separated are whole sequences rather than individual data points. One solution is to replace the Euclidean distance that K-means uses for data points with Dynamic Time Warping (DTW).

1. Dynamic Time Warping (DTW)

As the picture shows, Euclidean matching leaves some data points unmatched when the sequences differ in duration. Dynamic Time Warping instead finds, for each point, the nearest corresponding point in the other sequence. Euclidean matching is one-to-one, while DTW matching can be one-to-many.
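To make the one-to-many matching concrete, here is a minimal from-scratch sketch of DTW for one-dimensional sequences using the classic dynamic programming recurrence. The function name dtw_with_path and the two toy sequences are hypothetical and chosen only for illustration; this is not the tslearn implementation.

```python
import math


def dtw_with_path(a, b):
    """Return (distance, path) for two 1-D sequences using classic DTW."""
    n, m = len(a), len(b)
    # cost[i][j] holds the smallest accumulated squared cost of aligning
    # the first i points of a with the first j points of b.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] is matched to one more point of b
                cost[i][j - 1],      # b[j-1] is matched to one more point of a
                cost[i - 1][j - 1],  # both sequences advance together
            )

    # Walk back from the end to recover which points were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return math.sqrt(cost[n][m]), path


shorter = [0.0, 1.0, 2.0]
longer = [0.0, 1.0, 1.0, 2.0, 2.0]
distance, path = dtw_with_path(shorter, longer)

print(distance)  # 0.0
print(path)      # [(0, 0), (1, 1), (1, 2), (2, 3), (2, 4)]
```

In the recovered path, index 1 of the shorter sequence is matched to indices 1 and 2 of the longer one, and index 2 to indices 3 and 4. A one-to-one Euclidean matching would have left two points of the longer sequence without a partner.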

The formal definition of the measure is given in the tslearn user guide:

https://tslearn.readthedocs.io/en/stable/user_guide/dtw.html
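Following the tslearn user guide linked above, the DTW measure between a sequence x of length n and a sequence y of length m can be written as below. The symbols \pi, \mathcal{A}(x, y), and d are notation assumed here: d is taken to be the Euclidean distance between two individual points.

```latex
\mathrm{DTW}(x, y) = \min_{\pi \in \mathcal{A}(x, y)}
  \sqrt{\sum_{(i, j) \in \pi} d(x_i, y_j)^2}
```

Here \mathcal{A}(x, y) is the set of admissible alignment paths. Each path \pi is a list of index pairs (i, j) that starts at (0, 0), ends at (n - 1, m - 1), and moves forward by at most one step in each index at a time. This is why a single point in one sequence can be matched to several points in the other.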

The calculation is implemented in tslearn (a library that focuses on time series data analysis).

Written by Renee LIN
