Definition:
- Simple approach:
- Store all training examples
- Classify a new example by majority label among its most similar stored examples
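A minimal sketch of this definition in Python (assuming NumPy, Euclidean distance, and majority vote; `knn_predict` is an illustrative name, not from the notes):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest stored examples."""
    # "Store all training examples": X_train / y_train simply stay in memory.
    distances = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distance to each example
    nearest = np.argsort(distances)[:k]                  # indices of the k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy usage: two clusters, query lands near the second one
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([5.5, 5.0])))  # -> 1
```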
Components of a KNN classifier:
- Distance function hyperparameter
- How do we measure distance between instances?
- Determines the layout of the example space
- The k hyperparameter
- Determines the complexity of the hypothesis space
- If k = 1, every training example has its own neighborhood
- If k = N, the entire feature space is one neighborhood
- Higher k yields smoother decision boundaries (see the sketch after this list)
- Inductive Bias of k-NN
- The prediction for an instance should be similar to the labels of its nearby instances
- All features are equally important
- Complexity is tuned by the k parameter
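A hedged sketch of both hyperparameters in code: the distance function is a swappable argument, and k sets how many neighbors vote (the Manhattan alternative is an illustrative choice, not prescribed by the notes):

```python
import numpy as np

def euclidean(A, x):  # L2 distance: circular neighborhoods
    return np.sqrt(np.sum((A - x) ** 2, axis=1))

def manhattan(A, x):  # L1 distance: diamond-shaped neighborhoods
    return np.sum(np.abs(A - x), axis=1)

def knn_predict(X_train, y_train, x_new, k, distance=euclidean):
    d = distance(X_train, x_new)   # distance function: layout of the example space
    nearest = np.argsort(d)[:k]    # k: how large each neighborhood is
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# k = 1: the single closest example decides (each training point rules its own cell).
# k = len(X_train): everyone votes, so every query gets the overall majority class.
```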
Decision boundary of a classifier:
- After plotting all training points, we draw a line/plane/hyperplane/… that decides which region of the feature space belongs to which class
- For a linear classifier, the boundary is a hyperplane with weight vector $w$ that passes through the origin, unless we add a bias scalar $b$
- Then $\hat{y} = \mathrm{sign}(w^\top x + b)$: project $x$ onto $w$ to see whether it falls in the negative or positive half-space
- At prediction time there is no longer any need to loop through all training examples, unlike in k-NN
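A minimal sketch of that prediction rule; the weight vector and bias below are illustrative placeholders, not learned values:

```python
import numpy as np

def linear_predict(w, b, x):
    # Project x onto w and check which half-space it lands in:
    # constant-time work, with no loop over stored training examples.
    return np.sign(w @ x + b)

w = np.array([1.0, -2.0])  # made-up weights for illustration
b = 0.5
print(linear_predict(w, b, np.array([3.0, 1.0])))  # 1*3 - 2*1 + 0.5 = 1.5 -> 1.0
```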
Variations on KNN:
Weighted voting:
- Weight each neighbor's vote by inverse distance, so closer neighbors count more
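A sketch of inverse-distance weighting; the small `eps` term guarding against division by zero when a neighbor sits exactly on the query point is my own safeguard, not part of the notes:

```python
import numpy as np
from collections import defaultdict

def weighted_knn_predict(X_train, y_train, x_new, k=3, eps=1e-8):
    d = np.linalg.norm(X_train - x_new, axis=1)
    scores = defaultdict(float)
    for i in np.argsort(d)[:k]:
        scores[y_train[i]] += 1.0 / (d[i] + eps)  # closer neighbors cast heavier votes
    return max(scores, key=scores.get)
```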
Epsilon Ball Nearest Neighbors:
- Changes the method for selecting which training examples vote
- Instead of using the k nearest neighbors, use all examples $x$ such that $\mathrm{distance}(\hat{x}, x) \le \epsilon$
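A sketch of the ε-ball rule; the notes don't specify what happens when no example falls inside the ball, so the 1-NN fallback here is an assumption:

```python
import numpy as np
from collections import Counter

def epsilon_ball_predict(X_train, y_train, x_new, epsilon):
    d = np.linalg.norm(X_train - x_new, axis=1)
    in_ball = d <= epsilon                   # every example within the ball votes
    if not np.any(in_ball):                  # empty ball: fall back to 1-NN (assumption)
        return y_train[np.argmin(d)]
    return Counter(y_train[in_ball]).most_common(1)[0][0]
```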