Increase the Throughput of Non-Relational Databases through Theoretical Modeling and Optimization

The explosive growth of data is driving the rapid evolution of massive data-storage systems. These systems are widely used, not only in large-scale Internet services, but also in scientific projects in diverse areas such as astronomy, geography, and genetics. This project will increase the efficiency of these data-storage systems, allowing more data to be processed at lower cost. Because this makes science and engineering research more cost-effective, the project has the potential for a large societal impact.

More specifically, this project will work on improving non-relational databases with log-structured merge-tree (LSM-tree) storage architectures. One main focus will be a key component of such systems: the compaction policy, which decides when and which on-disk sorted files are merged. Compaction policies are crucial for system performance but are not yet well understood. To date, they have been designed by trial and error, guided mainly by empirical experience. The project will develop analytical models for compaction, validate and refine the models with empirical testing, design improved policies that are optimal according to the models, and deploy these policies in live systems. Further, the resulting theoretical models will be used to optimize non-relational databases for handling high volumes of dynamic continuous queries, which arrive and expire rapidly.
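
To make the idea of an analytical model for compaction concrete, here is a toy simulation in Python. It is a minimal sketch under simplifying assumptions, not the project's model: each memtable flush adds a new sorted run of unit size, a merge rewrites every byte of the runs it combines, and the number of runs left on disk stands in for read cost (a lookup may have to probe each run). The names `simulate`, `never_merge`, `merge_all`, and `tiered` are hypothetical and used only for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical policy interface: given the sizes of the current on-disk runs
# (oldest first), return how many of the newest runs to merge into one.
# Returning 0 or 1 means "do not merge now".
Policy = Callable[[List[int]], int]


def simulate(policy: Policy, flushes: int, flush_size: int = 1) -> Tuple[int, int]:
    """Toy cost model: returns (total bytes written, runs left on disk)."""
    runs: List[int] = []
    written = 0
    for _ in range(flushes):
        runs.append(flush_size)      # a memtable flush creates a new sorted run
        written += flush_size
        while True:                  # let the policy cascade merges
            k = policy(runs)
            if k < 2:
                break
            merged = sum(runs[-k:])  # a merge rewrites every byte of its inputs
            runs[-k:] = [merged]     # replace the k newest runs with one run
            written += merged
    return written, len(runs)


def never_merge(runs: List[int]) -> int:
    return 0                         # cheapest writes, most runs for reads


def merge_all(runs: List[int]) -> int:
    return len(runs)                 # always keep a single run


def tiered(fanout: int) -> Policy:
    """Merge the newest `fanout` runs whenever they all have the same size."""
    def policy(runs: List[int]) -> int:
        tail = runs[-fanout:]
        if len(tail) == fanout and len(set(tail)) == 1:
            return fanout
        return 0
    return policy


for name, p in [("never", never_merge), ("all", merge_all), ("tiered(2)", tiered(2))]:
    print(name, simulate(p, 6))
# never (6, 6)
# all (26, 1)
# tiered(2) (16, 2)
```

Even this crude model exposes the trade-off a compaction policy must balance: never merging writes the least but leaves many runs for reads to probe, merging everything keeps a single run at a write cost that grows quadratically with the number of flushes, and a tiered policy sits in between, writing roughly n log n bytes while keeping about log n runs.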


This project is sponsored by NSF (IIS 1619463, 2016-2019).

Publications

  • Mohiuddin Abdul Qader, Shiwen Cheng, Vagelis Hristidis. A Comparative Study of Secondary Indexing Techniques in LSM-based NoSQL Databases. ACM SIGMOD International Conference on Management of Data (SIGMOD), 2018.
  • Mohiuddin Abdul Qader, Vagelis Hristidis. DualDB: An Efficient LSM-based Publish-Subscribe Storage System. International Conference on Scientific and Statistical Database Management (SSDBM), 2017.
  • Claire Mathieu, Carl Staelin, Neal E. Young, Arman Yousefi. Bigtable Merge Compaction. arXiv:1407.3008, 2014.
  • Steven Jacobs, Md Yusuf Sarwar Uddin, Michael Carey, Vagelis Hristidis, Vassilis J. Tsotras, N. Venkatasubramanian, Yao Wu, Syed Safir, Purvi Kaul, Xikui Wang, Mohiuddin Abdul Qader, Yawei Li. A BAD Demonstration: Towards Big Active Data. Demo at VLDB, 2017.