NEC Research Institute Technical Report | 1998 |

**Abstract: **
The excluded middle vantage point forest is a practical new
data structure that supports * worst case * sublinear time
searches in a metric space for nearest neighbors within a
fixed radius $\tau$ of arbitrary queries. Worst case
performance depends on the dataset but is not affected by the
distribution of queries.

Our analysis predicts vp-forest performance in simple settings
such as $L_p$ spaces with uniform random datasets --- and
experiments confirm these predictions. Another contribution
of the analysis is a new perspective on the * curse of
dimensionality *. In our idealized setting the dataset is
organized into a forest of $O(N^{1-\rho})$ trees, each of
depth $O(\log{N})$. Here $\rho$ may be viewed as depending on
$\tau$, the distance function, and on the dataset. The radius
of interest $\tau$ is an input to the organization process and
the result is a linear space data structure specialized to
answer queries within this distance. Searches then require
$O(N^{1-\rho} \log{N})$ time, or $O(\log{N})$ time given
$O(N^{1-\rho})$ processors.

We also discuss design variations including an approach to trading space for reductions in search time.

**Keywords: **
Nearest neighbor search,
Vantage point tree (vp-tree),
Kd-tree,
Computational geometry,
Metric space.

- View the complete paper (converted to HTML)
- The complete paper (PDF)
- The complete paper (PostScript)
- BibTex entry
- My homepage ( Peter N. Yianilos )
- See also:

This work was largely done during 1996-1997 and was presented as part of the CRP (Computing Research at Princeton) talk series during the Summer of 1997.