Secondary Indexing on Key-Value Stores

NoSQL databases like key-value stores achieve fast write throughput and fast lookups on the primary key. However, many applications also require queries on non-primary attributes. For that, several NoSQL databases have added support for secondary indexes. To our best knowledge, little work has studied how to support secondary indexing on pure key-value stores, which are a fundamental and popular category within the NoSQL databases range.

We propose a novel lightweight secondary indexing technique on log-structure merge-tree (LSM tree)-based key-value stores, which we refer as "embedded index". The embedded index, which utilizes Bloom filters, is very space-efficient, and achieves very high write-throughput rates, comparable to non-indexed key-value stores. It is embedded inside the data files, that is, no separate index structure is maintained. To our best knowledge, this is the first work to use Bloom filters for secondary indexing. The embedded index also simplifies the transaction management, compared to alternative stand-alone secondary indexing schemes.


This project is sponsored by a Samsung GRO grant.