On the selection of the globally optimal prototype subset for Nearest-Neighbor classification

Carrizosa, Emilio, Martin-Barragan, Belen, Plastria, Frank and Romero-Morales, Dolores (2007) On the selection of the globally optimal prototype subset for Nearest-Neighbor classification. INFORMS Journal on Computing, 19 (3). pp. 470-479.

[img] PDF - Published Version
Download (162kB)

Abstract

The nearest-neighbor classifier has been shown to be a powerful tool for multiclass classification. We explore both theoretical properties and empirical behavior of a variant method, in which the nearest-neighbor rule is applied to a reduced set of prototypes. This set is selected a priori by fixing its cardinality and minimizing the empirical misclassification cost. In this way we alleviate the two serious drawbacks of the nearest-neighbor method: high storage requirements and time-consuming queries. Finding this reduced set is shown to be NP-hard. We provide mixed integer programming (MIP) formulations, which are theoretically compared and solved by a standard MIP solver for small problem instances. We show that the classifiers derived from these formulations are comparable to benchmark procedures. We solve large problem instances by a metaheuristic that yields good classification rules in reasonable time. Additional experiments indicate that prototype-based nearest-neighbor classifiers remain quite stable in the presence of missing values.

Item Type: Article
Keywords: classification; optimal prototype subset; nearest neighbor; dissimilarities; integer programming; variable neighborhood search; missing values; management science
Subject(s): Management science
Date Deposited: 27 Aug 2010 12:37
Last Modified: 09 Mar 2017 16:22
Funders: N/A
URI: http://eureka.sbs.ox.ac.uk/id/eprint/425

View statistics

Actions (login required)

Edit View Edit View