Authors: Keogh, E., Lonardi, S and Chiu, W.

Title: Finding Surprising Patterns in a Time Series Database In Linear Time and Space.

Conference: In the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. July 23 - 26,

Year: 2002

Abstract: The problem of finding a specified pattern in a time series database (i.e. query by content) has received much attention and is now a relatively mature field. In contrast, the important problem of enumerating all surprising or interesting patterns has received far less attention. This problem requires a meaningful definition of surprising, and an efficient search technique. All previous attempts at finding surprising patterns in time series use a very limited definition of surprise, and/or do not scale to massive datasets. To overcome these limitations we introduce a novel technique that defines a pattern surprising if the frequency of its occurrence differs from that expected by chance, given some previously seen data. This definition has the advantage of not requiring an explicit definition of surprise, which may be impossible to elicit from a domain expert. Instead the user simply gives the algorithm a collection of previously observed normal data. Our algorithm uses a suffix tree to encode the relative frequency of all observed patterns and allows a Markov Model to predict the expected frequency of unobserved patterns. Once the suffix tree has been constructed, the surprisingness of all patterns in a new database can be determined in time and space linear in the size of the database. We demonstrate the utility of our approach with an extensive experimental evaluation.

[Download]

Back