Ordered and Unordered Top-K Range Reporting in Large Data Sets

Peyman Afshani, Gerth Stølting Brodal, and Norbert Zeh

In Proc. 22nd Annual ACM-SIAM Symposium on Discrete Algorithms, pages 390-400, 2011.

Abstract

We study the following problem: Given an array A storing N real numbers, preprocess it to allow fast reporting of the K smallest elements in the subarray A[i, j] in sorted order, for any triple (i, j, K) with 1 ≤ ijN and 1 ≤ Kj - i + 1. We are interested in scenarios where the array A is large, necessitating an I/O-efficient solution.

For a parameter f with 1 ≤ f ≤ logm n, we construct a data structure that uses O((N/f) logm n) space and achieves a query bound of O(logB N + f K/B) I/Os, where B is the block size, M is the size of the main memory, n := N/B, and m := M/B. Our main contribution is to show that this solution is nearly optimal. To be precise, we show that achieving a query bound of O(log^α n + fK/B) I/Os, for any constant α, requires Ω(N (f-1logM n)(log(f-1 logM n))) space, assuming B = Ω(log N). For MB1+ε, this is within a loglogm n factor of the upper bound. The lower bound assumes indivisibility of records and holds even if we assume K is always set to j-1+1.

We also show that it is the requirement that the K smallest elements be reported in sorted order which makes the problem hard. If the K smallest elements in the query range can be reported in any order, then we can obtain a linear-size data structure with a query bound of O(logB N + K/B) I/Os.

Copyright notice

Copyright © 2011 by the Association for Computer Machinery, Inc., and the Society for Industrial and Applied Mathematics.

Online version

soda11.pdf (241 Kb)

DOI

www.siam.org/proceedings/soda/2011/SODA11_031_afshanip.pdf

BIBTEX entry

@inproceedings{soda11,
  author = "Peyman Afshani and Gerth St{\o}lting Brodal and Norbert Zeh",
  booktitle = "Proc. 22nd Annual ACM-SIAM Symposium on Discrete Algorithms",
  doi = "www.siam.org/proceedings/soda/2011/SODA11_031_afshanip.pdf",
  pages = "390-400",
  title = "Ordered and Unordered Top-K Range Reporting in Large Data Sets",
  year = "2011"
}