Instance-Based Classifier

These classifiers do not build models but classify new records based on their similarity to the examples in the training set.

They are called lazy-diligent learners as opposed to impatient learners (rule-based, decision trees).

K-Nearest Neighbor

K-Nearest Neighbor is a simple algorithm that stores all the available cases and classifies the new data or case based on a similarity measure.

It is mostly used to classify a data point based on how its neighbors are classified.

Requirements:

A training set
A metric to calculate the distance between records
The value of k (the number of neighbors to be used)

The classification process calculates the distance to the records in the training set, it identifies k nearest neighbors and uses nearest neighbor class labels to determine the class of the unknown record.

The choice of k is important because:

If k is too small, the approach is sensitive to noise
If k is too large, the surround may include examples belonging to other classes

Remember to normalize attributes in pre-processing, because to operate correctly, they should have the same scale of values.

Pros of KNN:

Do not require the construction of a model
Compared with rule-based or decision tree systems, they allow the construction of nonlinear class (more flexible)

Cons of KNN:

Require a similarity or distance measure to assess closeness
Require a pre-processing step to normalize the range of variation of attributes
Class is locally determined and therefore susceptible to data noise
Very sensitive to the presence of irrelevant or related attributes that will distort distances between objects
Classification cost can be high and depends linearly on the size of the training set in the absence of appropriate index structures

The R-Tree Index Structure

R-trees are extensions of B+-trees to multi-dimensional spaces:

B+-trees organize objects into a set of non-overlapping one-dimensional intervals, applying this principle recursively from the leaves of the root
R-trees organize objects into a set of overlapping multi-dimensional intervals, applying this principle recursively from the leaves to the root

Last update: January 15, 2023 13:58:25
Created: October 11, 2022 10:44:03

Authors: Francesca Neri