Worst Case Efficient Data Structures

Gerth Stølting Brodal

PhD Thesis, Department of Computer Science, Aarhus University, Denmark, x+121 pages, January 1997. BRICS-DS-97-1.

Abstract

We study the design of efficient data structures. In particular we focus on the design of data structures where each operation has a worst case efficient implementations. The concrete problems we consider are partial persistence, implementation of priority queues, and implementation of dictionaries.

The first problem we consider is how to make bounded in-degree and out-degree data structures partially persistent, i.e., how to remember old versions of a data structure for later access. A node copying technique of Driscoll et al. supports update steps in amortized constant time and access steps in worst case constant time. The worst case time for an update step can be linear in the size of the structure. We show how to extend the technique of Driscoll et al. such that update steps can be performed in worst case constant time on the pointer machine model.

We present two new comparison based priority queue implementations, with the following properties. The first implementation supports the operations FindMin, Insert and Meld in worst case constant time and Delete and DeleteMin in worst case time O(log n). The priority queues can be implemented on the pointer machine and require linear space. The second implementation achieves the same worst case performance, but furthermore supports DecreaseKey in worst case constant time. The space requirement is again linear, but the implementation requires auxiliary arrays of size O(log n). Our bounds match the best known amortized bounds (achieved by respectively binomial queues and Fibonacci heaps). The data structures presented are the first achieving these worst case bounds, in particular supporting Meld in worst case constant time. We show that these time bounds are optimal for all implementations supporting Meld in worst case time o(n). We also present a tradeoff between the update time and the query time of comparison based priority queue implementations. Finally we show that any randomized implementation with expected amortized cost t comparisons per Insert and Delete operation has expected cost at least n/2O(t) comparisons for FindMin.

Next we consider how to implement priority queues on parallel (comparison based) models. We present time and work optimal priority queues for the CREW PRAM, supporting FindMin, Insert, Meld, DeleteMin, Delete and DecreaseKey in constant time with O(log n) processors. Our implementation is the first supporting all of the listed operations in constant time. To be able to speed up Dijkstra's algorithm for the single-source shortest path problem we present a different parallel priority data structure. With this specialized data structure we give a parallel implementation of Dijkstra's algorithm which runs in O(n) time and performs O(mlog n) work on a CREW PRAM. This represents a logarithmic factor improvement for the running time compared with previous approaches.

We also consider priority queues on a RAM model which is stronger than the comparison model. The specific problem is the maintenance of a set of n integers in the range 0..2w-1 under the operations Insert, Delete, FindMin, FindMax and Pred (predecessor query) on a unit cost RAM with word size w bits. The RAM operations used are addition, left and right bit shifts, and bit-wise boolean operations. For any function f(n) satisfying loglog nf(n)≤ \sqrt{log n}, we present a data structure supporting FindMin and FindMax in worst case constant time, Insert and Delete in worst case O(f(n)) time, and Pred in worst case O((log n)/f(n)) time. This represents the first priority queue implementation for a RAM which supports Insert, Delete and FindMin in worst case time O(loglog n) - previous bounds were only amortized. The data structure is also the first dictionary implementation for a RAM which supports Pred in worst case O(log n/loglog n) time while having worst case O(loglog n) update time. Previous sublogarithmic dictionary implementations do not provide for updates that are significantly faster than queries. The best solutions known support both updates and queries in worst case time O(\sqrt{log n}).

The last problem consider is the following dictionary problem over binary strings. Given a set of n binary strings of length m each, we want to answer d-queries, i.e., given a binary query string of length m to report if there exists a string in the set within Hamming distance d of the query string. We present a data structure of size O(nm) supporting 1-queries in time O(m) and the reporting of all strings within Hamming distance 1 of the query string in time O(m). The data structure can be constructed in time O(nm). The implementation presented is the first achieving these optimal time bounds for the preprocessing of the dictionary and for 1-queries. The data structure can be extended to support the insertion of new strings in amortized time O(m).

Copyright notice

Copyright © 1997, BRICS, Department of Computer Science, Aarhus University. All rights reserved.

Online version

brics-ds-97-1.pdf (944 Kb)

Slides

thesis.pdf (206 Kb)

BIBTEX entry

@phdthesis{ThesisBrodal,
  author = "Gerth St{\o}lting Brodal",
  issn = "1396-7002",
  month = "January",
  number = "BRICS-DS-97-1",
  pages = "x+121",
  school = "Department of Computer Science, Aarhus University, Denmark",
  title = "Worst Case Efficient Data Structures",
  year = "1997"
}