Skip to content

Instance-Based Classifier

These classifiers do not build models but classify new records based on their similarity to the examples in the training set.

They are called lazy-diligent learners as opposed to impatient learners (rule-based, decision trees).

K-Nearest Neighbor

K-Nearest Neighbor is a simple algorithm that stores all the available cases and classifies the new data or case based on a similarity measure.

It is mostly used to classify a data point based on how its neighbors are classified.

Requirements:

  • A training set
  • A metric to calculate the distance between records
  • The value of k (the number of neighbors to be used)

The classification process calculates the distance to the records in the training set, it identifies k nearest neighbors and uses nearest neighbor class labels to determine the class of the unknown record.

The choice of k is important because:

  • If k is too small, the approach is sensitive to noise
  • If k is too large, the surround may include examples belonging to other classes

Remember to normalize attributes in pre-processing, because to operate correctly, they should have the same scale of values.

Pros of KNN:

  • Do not require the construction of a model
  • Compared with rule-based or decision tree systems, they allow the construction of nonlinear class (more flexible)

Cons of KNN:

  • Require a similarity or distance measure to assess closeness
  • Require a pre-processing step to normalize the range of variation of attributes
  • Class is locally determined and therefore susceptible to data noise
  • Very sensitive to the presence of irrelevant or related attributes that will distort distances between objects
  • Classification cost can be high and depends linearly on the size of the training set in the absence of appropriate index structures

The R-Tree Index Structure

R-trees are extensions of B+-trees to multi-dimensional spaces:

  • B+-trees organize objects into a set of non-overlapping one-dimensional intervals, applying this principle recursively from the leaves of the root
  • R-trees organize objects into a set of overlapping multi-dimensional intervals, applying this principle recursively from the leaves to the root


Last update: January 15, 2023 13:58:25
Created: October 11, 2022 10:44:03
Authors: Francesca Neri