Several attempts to evaluate and compare algorithms in different research areas have already been published: edge detection schemes [Fram, 1975], global thresholding methods [Lee, 1990], optical flow estimation [Barron, 1994], [Bolles, 1993] and shape from shading [Zhang, 1994]. Attempts to evaluate segmentation algorithms can be split into two major categories: analytical and empirical methods [Zhang, 1996]. Analytical methods directly examine and assess the segmentation techniques themselves by analysing the algorithms' principles and properties. Empirical methods indirectly judge segmentation algorithms by applying them to test images and measuring the quality of the segmentation results. Zhang [Zhang, 1996] further splits the empirical methods into goodness methods and discrepancy methods. Goodness methods measure some desirable property of the segmented image (e.g. intra-region uniformity [Weszka, 1978], inter-region contrast [Levine, 1985], region shape [Sahoo, 1988], etc.), while discrepancy methods measure differences to a pre-defined reference segmentation (ground truth).
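To make the goodness-method idea concrete, the following is a minimal sketch of an intra-region uniformity score in the spirit of [Weszka, 1978]: the area-weighted grey-level variance inside each segmented region, where lower values indicate more uniform regions. The function name, the flat-list representation of the image, and the normalisation by total pixel count are illustrative assumptions, not taken from the cited paper.

```python
from collections import defaultdict

def intra_region_uniformity(pixels, labels):
    """Area-weighted grey-level variance within each region (lower
    is better). `pixels` and `labels` are flat lists of equal length
    holding the grey value and region id of each pixel; this
    representation and the normalisation are illustrative choices.
    """
    regions = defaultdict(list)
    for value, region in zip(pixels, labels):
        regions[region].append(value)
    total = 0.0
    for values in regions.values():
        mean = sum(values) / len(values)
        # sum of squared deviations = variance weighted by region area
        total += sum((v - mean) ** 2 for v in values)
    return total / len(pixels)
```

A segmentation whose regions are perfectly homogeneous scores 0; mixing distinct grey levels inside one region raises the score.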
In the published literature, the most popular approach to support the authors' argumentation is the analytical one. This category avoids a concrete implementation of the algorithm, and the results are exempt from influences caused by the arrangement of evaluation experiments. Often this comparison method is the only possible choice, because published algorithms lack the details needed for a re-implementation. However, analytical evaluations that yield a quantitative result are rare. Abdou and Pratt [Abdou, 1979] analysed the performance of several edge detectors using a precisely defined and calculated detection probability ratio. This analytical method results in a quantitative measure, but it can only be calculated for simple edge detectors. In general, it is harder to find a quantitative measure for algorithm comparison using analytical methods than using empirical methods.
In this report we apply an adapted version of the framework of Hoover et al. [Hoover, 1996] for the evaluation of range image segmentation algorithms. The framework compares a segmentation with a corresponding set of ground truth segmentations. An objective comparison tool performs the evaluation using specified metrics for correct detection, over-segmentation, under-segmentation, missed regions and noise regions. A large number of real test images, the comparison tool and the results of four different research groups working on range image segmentation are publicly available, which allows an objective and competitive comparison of one's own work with state-of-the-art algorithms.
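The core of the Hoover-style metrics is a mutual-overlap test between machine-segmented (MS) and ground-truth (GT) regions. The sketch below illustrates only the simplest part of that scheme: a pair counts as a correct detection if each region covers at least a fraction T of the other, and unmatched GT/MS regions are reported as missed/noise. The set-of-pixels representation, the default T, and the omission of the over- and under-segmentation cases are simplifying assumptions; the full framework handles those cases as well.

```python
def classify_regions(ms, gt, T=0.8):
    """Simplified Hoover-style classification. `ms` and `gt` are
    lists of regions, each region a set of pixel indices. A pair is
    a correct detection if the overlap covers at least fraction T
    of both regions; leftovers become missed (GT) or noise (MS).
    Over-/under-segmentation are deliberately omitted in this sketch.
    """
    correct = []
    used_ms, used_gt = set(), set()
    for gi, g in enumerate(gt):
        for mi, m in enumerate(ms):
            if mi in used_ms:
                continue
            overlap = len(g & m)
            # mutual overlap test: both directions must exceed T
            if overlap >= T * len(g) and overlap >= T * len(m):
                correct.append((mi, gi))
                used_ms.add(mi)
                used_gt.add(gi)
                break
    missed = [gi for gi in range(len(gt)) if gi not in used_gt]
    noise = [mi for mi in range(len(ms)) if mi not in used_ms]
    return correct, missed, noise
```

For example, an MS region identical to a GT region is a correct detection at any T, while a GT region with no sufficiently overlapping MS partner is counted as missed.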