next up previous contents
Next: The k-DT algorithm. Up: k-NN classification and k-distance Previous: k-NN classification and k-distance

Introduction

The k-Nearest Neighbors (k-NN) classification rule is a technique for non-parametric supervised pattern classification. Given the knowledge of N prototype patterns (vectors of dimension D) and their correct classification into several classes, it assigns an unclassified pattern to the class that is most heavily represented among its k nearest neighbors in the pattern space.
The first formulation of the above rule appears to have been made by Fix and Hodges [53] in 1951. They established the consistency of the rule for sequences such that $k
\rightarrow \infty$ and $k/N \rightarrow 0$. In [54], they investigate the small sample performance of the rule numerically, under the assumption of normal statistics.
The k-NN decision rule makes no assumption on the underlying probabilistic distribution of the samples points and of their classification. Therefore, the probability of error R of this rule must be at least as large as the Bayes probability of error R* - the minimum probability of error when the distribution is known. Cover and Hart [24] show that, with certain statistical assumptions, the conditional risk for the 1-NN rule is $R \leq 2.R^*$. For the k-NN rule, the risk is bounded by (1+1/k)R* [23]. Thus, when $k
\rightarrow \infty$, $R \rightarrow R^*$.
If one implements the k-NN rule with a brute-force method to classify F patterns using N prototypes, it requires F.N distance computations and o( $F.N.\log(N))$ comparisons. This is often unpractical with large data sets, and has led to the research of efficient algorithms.
Hart [71], Gates [63] and Tomek [157] reduce the number of prototypes to consider while trying not to affect the accuracy of the resulting classification. Hart's Condensed Nearest Neighbor (CNN) and Gates' Reduced Nearest Neighbor (RNN) rules only apply to 1-NN classification according to their authors.
Belkasim [4] proposes to make use of the natural clustering of the training data to reduce the number of prototypes to which distances must be computed. In the worst case, this partitioning of the prototype data has no benefit.
Fukunaga and Narendra [60] propose a branch and bound approach, later improved by Kamgar-Parsi and Kanal [87], Niemann and Goppert [115] and finally Jiang and Zhang [83]. In this approach, the prototypes are hierarchically decomposed into disjoint subsets. A powerful tree-search algorithm, the branch and bound method (see survey by Lawler [91]), is then applied to the resulting groups.
Friedman [59] orders the training data along the axis with the maximal sparsity for each pattern. This restricts the computations to a band around the projection of the test data onto this axis. The expected number of distance computations is reduced to O (F.k1/D.N1-1/D) with D dimensional vectors.
Finally, Warfield [175] considers a particular type of applications where the number of possible patterns is much smaller than the number of patterns to classify. One such application is the classification of MRI data, where patterns consists of 2-3 channels (D) of data quantified over a small range of values, for a 3D volume including typically $1-6 \times
10^6$ voxels. Then, it becomes efficient to compute a lookup table for every possible pattern, then to classify the voxels by accessing the location of their values in the lookup table.
The computation of this lookup table is essentially a k-distance transformation problem, where the image is the pattern space and the object pixels are the prototype patterns. Warfield's k-DT algorithm is based on Borgefors' chamfer DT [10] in 2D and on Ragnelmam's corner EDT [128] in higher dimensions. The difference with those methods is that the k nearest patterns (object pixels) are considered, instead of 1. It goes as follow:

Algorithm 11   Warfield's k-distance transformation  


\begin{algorithmic}\STATE Insert training data patterns identifiers into the map...
...s of the $k$\space nearest patterns
\ENDFOR
\ENDFOR
\STATE
\end{algorithmic}
The method requires O (2D F ( D + 1 ) k) distance computations and O $(F (D+1) k \log((D+1) k))$ comparisons. Warfield shows that - for this type of applications - it is an order of magnitude faster than the above k-NN methods.

next up previous contents
Next: The k-DT algorithm. Up: k-NN classification and k-distance Previous: k-NN classification and k-distance
Olivier Cuisenaire
1999-10-05