Definition:

  • Simple approach:
    • Store all training examples
    • Classify a new example based on most similar stored examples
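
A minimal version of this approach might look like the Python sketch below (helper names are illustrative; Euclidean distance and majority voting are one common set of choices, not the only ones):

```python
import math
from collections import Counter

def euclidean(a, b):
    # Straight-line (L2) distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    # "Training" is just storing the examples; prediction compares
    # the query against every stored example.
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda ex: euclidean(ex[0], query))
    top_k = [label for _, label in neighbors[:k]]
    # Majority vote among the k nearest stored examples
    return Counter(top_k).most_common(1)[0][0]

# Usage: two tiny 2-D classes
X = [(0, 0), (1, 1), (5, 5), (6, 5)]
y = ["a", "a", "b", "b"]
print(knn_predict(X, y, (0.5, 0.5), k=3))  # -> "a"
```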

Components of a KNN classifier:

  • Distance function hyperparameter
    • How do we measure distance between instances?
    • Determines the layout of the example space (see the distance sketch after this list)
  • The k hyperparameter
    • Determines the complexity of the hypothesis space
      • If k = 1, every training example has its own neighborhood
      • If k = N, the entire feature space is one neighborhood
    • Higher k averages over more neighbors, which yields smoother decision boundaries
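
Different distance functions can disagree about which stored example is nearest, which is what "layout of the example space" means in practice. A small sketch, assuming the euclidean helper from the first snippet plus a Manhattan (L1) variant:

```python
def manhattan(a, b):
    # L1 distance: sum of absolute coordinate differences
    return sum(abs(x - y) for x, y in zip(a, b))

query = (0, 0)
p1, p2 = (3, 3), (0, 5)
print(euclidean(p1, query), euclidean(p2, query))  # ~4.24 vs 5.0 -> p1 is nearer
print(manhattan(p1, query), manhattan(p2, query))  # 6 vs 5      -> p2 is nearer
```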

Decision boundary of a classifier:

  • After plotting the training points, we draw a line, plane, or higher-dimensional hyperplane that decides which region of the space belongs to which class
    • For a linear classifier, this hyperplane passes through the origin unless we add a bias scalar
    • To classify a point, we project it onto the hyperplane's normal (weight) vector and check whether it lands on the positive or negative half of the space (see the sketch below)
  • Once the boundary is known, we no longer need to loop through all training examples to classify a new point
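
A sketch of that projection step for a linear boundary (the weight vector w and bias b below are illustrative values, not learned parameters):

```python
def linear_classify(w, x, b=0.0):
    # Project x onto the weight vector w (the hyperplane's normal);
    # the sign of w.x + b says which half-space x falls in.
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "positive" if score >= 0 else "negative"

w = (1.0, -2.0)                               # normal vector of the hyperplane
print(linear_classify(w, (3.0, 1.0)))         # 3 - 2 = 1.0        -> positive half
print(linear_classify(w, (1.0, 2.0), b=0.5))  # 1 - 4 + 0.5 = -2.5 -> negative half
```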

Inductive bias of KNN:

  • Prediction of an instance should be most similar to its nearby instances
  • All features are equally important
  • Complexity is tuned by the k parameter

Variations on KNN:

Weighted voting:
  • Weight each neighbor's vote by (inverse) distance, so closer neighbors count more
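
A sketch of distance-weighted voting, reusing the hypothetical euclidean helper from the first snippet; weighting each vote by 1 / (distance + eps) is one common choice among several:

```python
def weighted_knn_predict(train_X, train_y, query, k=3, eps=1e-9):
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda ex: euclidean(ex[0], query))
    votes = {}
    for x, label in neighbors[:k]:
        # Closer neighbors cast larger votes; eps avoids division by zero
        votes[label] = votes.get(label, 0.0) + 1.0 / (euclidean(x, query) + eps)
    return max(votes, key=votes.get)
```
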
Epsilon Ball Nearest Neighbors:
  • Change the method for selecting which training examples vote
  • Instead of using the k nearest neighbors, use all training examples x such that dist(x, query) ≤ ε
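
A sketch under the same assumptions as the earlier snippets (euclidean helper, majority vote among the in-ball examples); falling back to the single nearest neighbor when the ball is empty is just one possible design choice:

```python
from collections import Counter

def epsilon_ball_predict(train_X, train_y, query, epsilon):
    # Every stored example within distance epsilon of the query gets a vote
    in_ball = [label for x, label in zip(train_X, train_y)
               if euclidean(x, query) <= epsilon]
    if not in_ball:
        # Design choice (assumption): fall back to the single nearest
        # neighbor when no example lies inside the ball
        return min(zip(train_X, train_y),
                   key=lambda ex: euclidean(ex[0], query))[1]
    return Counter(in_ball).most_common(1)[0][0]
```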