Authors: Dimitris Papadopoulos, Carlotta Domeniconi, Dimitrios Gunopulos, Sheng Ma
Title: Clustering Gene Expression Data in SQL Using Locally Adaptive Metrics
Conference: 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2003, pp. 35-41
Year: 2003
Abstract: The clustering problem concerns the discovery of homogeneous
groups of data according to a certain similarity measure.
Clustering suffers from the curse of dimensionality. It
is not meaningful to look for clusters in high-dimensional
spaces as the average density of points anywhere in input
space is likely to be low. As a consequence, distance functions
that equally use all input features may be ineffective.
We introduce an algorithm that discovers clusters in subspaces
spanned by different combinations of dimensions via
local weightings of features. This approach avoids the risk
of loss of information encountered in global dimensionality
reduction techniques. Our method associates with each cluster
a weight vector, whose values capture the relevance of
features within the corresponding cluster. In this paper we
present an efficient SQL implementation of our algorithm
that enables the discovery of clusters on data residing inside
a relational DBMS.
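
The idea of a per-cluster weight vector used inside the database can be illustrated with a minimal sketch. It is not the authors' implementation: the table layout (`points`, `centroids`, `weights`, `dist`), the function name `assign_clusters`, and the use of a weighted squared Euclidean distance are assumptions made for illustration, and only the step that assigns each point to its nearest cluster is shown (how the weights are learned is not covered). SQLite, via Python's standard `sqlite3` module, stands in for a relational DBMS.

```python
import sqlite3


def assign_clusters(points, centroids, weights):
    """Assign each point to the cluster with the smallest weighted distance.

    points:    {point_id: [x_1, ..., x_d]}
    centroids: {cluster_id: [c_1, ..., c_d]}
    weights:   {cluster_id: [w_1, ..., w_d]}  (one weight vector per cluster)
    Hypothetical schema: one row per (id, dimension) pair.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE points(pid INTEGER, dim INTEGER, val REAL);
        CREATE TABLE centroids(cid INTEGER, dim INTEGER, val REAL);
        CREATE TABLE weights(cid INTEGER, dim INTEGER, w REAL);
    """)

    def rows(table):
        # Flatten {id: vector} into (id, dimension, value) tuples.
        return [(k, d, v) for k, vec in table.items() for d, v in enumerate(vec)]

    conn.executemany("INSERT INTO points VALUES (?, ?, ?)", rows(points))
    conn.executemany("INSERT INTO centroids VALUES (?, ?, ?)", rows(centroids))
    conn.executemany("INSERT INTO weights VALUES (?, ?, ?)", rows(weights))

    # Weighted squared distance from every point to every centroid,
    # using the weight vector of the centroid's own cluster.
    conn.execute("""
        CREATE TABLE dist AS
        SELECT p.pid AS pid, c.cid AS cid,
               SUM(w.w * (p.val - c.val) * (p.val - c.val)) AS d
        FROM points p
        JOIN centroids c ON c.dim = p.dim
        JOIN weights w ON w.cid = c.cid AND w.dim = c.dim
        GROUP BY p.pid, c.cid
    """)

    # Keep, for each point, the cluster with the minimum distance.
    result = conn.execute("""
        SELECT a.pid, a.cid FROM dist a
        WHERE a.d = (SELECT MIN(b.d) FROM dist b WHERE b.pid = a.pid)
        ORDER BY a.pid
    """).fetchall()
    conn.close()
    return dict(result)
```

With uniform weights this reduces to an ordinary nearest-centroid assignment; with non-uniform weights, a cluster can ignore dimensions that are irrelevant to it, so the same point may be assigned differently.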