What is the Formula for KNN Classifier?

KNN, the K-nearest neighbors algorithm, is most often used for classification. It is a supervised learning method, although nearest-neighbor search can also be used in unsupervised settings. KNN relies on a distance formula to find the training points closest to a new data point. The Euclidean distance formula for KNN is d = √((x1′ – x1)^2 + (x2′ – x2)^2 + … + (xn′ – xn)^2), and the Minkowski distance formula is d = (|x1′ – x1|^p + |x2′ – x2|^p + … + |xn′ – xn|^p)^(1/p). We will explore these distance formulae in more detail in the upcoming sections.

What is the formula for KNN Classifier?

Before looking at the formulae the KNN classifier uses, we need to understand what KNN is and how it actually works. KNN is a supervised learning algorithm, also known as a lazy learner because it does almost no work at training time: it stores the training data and defers the real computation until a new data point has to be classified.

Implementing a KNN model in Python is simple. We import the KNN classifier (KNeighborsClassifier) from the sklearn module and then call its fit function to train the model. The important part, however, is understanding how the classifier works.
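A minimal sketch of that workflow, assuming scikit-learn and NumPy are installed. The toy data and the variable names are made up for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training set (illustrative values): two well-separated clusters
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                    [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# k = 3 neighbors; fit() stores the training data (and may build a search index)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Classify two new points, one close to each cluster
X_new = np.array([[1.2, 1.4], [6.4, 6.6]])
predictions = knn.predict(X_new)
print(predictions)
```

Each new point should receive the label of the cluster it sits next to, because all three of its nearest neighbors come from that cluster.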

You might have heard that the KNN classifier classifies an incoming data point by majority voting. But how does this majority voting actually work? This is where the distance formulae come in. The two most well-known distance formulae used in KNN are the Minkowski distance and the Euclidean distance.

The classifier memorizes the training dataset. For a new input, it uses one of these distance formulae to calculate the distance from the input to every training point, picks the K closest points, and assigns the class that the majority of those K neighbors belong to.
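The voting procedure can be sketched from scratch with NumPy. This is an illustrative implementation rather than the code sklearn runs internally; the function name knn_predict and the toy data are assumptions made for this example:

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3, p=2):
    """Classify x_new by majority vote among its k nearest training points."""
    # Minkowski distance of order p from x_new to every training point
    distances = np.sum(np.abs(X_train - x_new) ** p, axis=1) ** (1 / p)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbors
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy training set (illustrative values)
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                    [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])
y_train = np.array(["red", "red", "red", "blue", "blue", "blue"])

label = knn_predict(X_train, y_train, np.array([2.5, 2.5]), k=3)
print(label)
```

With an even K, two classes can tie; this sketch then returns whichever tied label it counted first, so an odd K is the usual choice for two-class problems.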


What is Minkowski distance in KNN?

The Minkowski distance is a generalized form of the Euclidean and Manhattan distance formulae. It is a distance metric used to find the distance between two points in a multi-dimensional space. Setting p = 1 gives the Manhattan distance, and p = 2 gives the Euclidean distance.

Here is the formula for the Minkowski distance:

d = (|x1′ – x1|^p + |x2′ – x2|^p + … + |xn′ – xn|^p)^(1/p)
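To make the role of p concrete, here is a small sketch that evaluates the formula directly for a few values of p. The helper name minkowski is an assumption for this example; the two points are the ones used in the visualization further down:

```python
import numpy as np


def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return np.sum(np.abs(a - b) ** p) ** (1 / p)


point_A = np.array([0.0, 0.0])
point_B = np.array([2.0, 2.0])

# p = 1 is the Manhattan distance, p = 2 is the Euclidean distance
for p in (1, 2, 3):
    print(p, minkowski(point_A, point_B, p))
```

For these two points, the distance shrinks as p grows: the Manhattan distance (p = 1) is the largest, and the Euclidean distance (p = 2) is smaller.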

How to visualize the Minkowski distance in Python?

For the visualization of the Minkowski distance formula, we will use the Matplotlib module. If you are looking to visualize the KNN-trained model, then go through the model visualization article.

To visualize the Minkowski distance formula, we first need to write a function for it in Python.

import numpy as np
import matplotlib.pyplot as plt

#  Minkowski distance function
def minkowski_distance(point_A, point_B, p):
    distance = np.linalg.norm(point_B - point_A, ord=p)
    return distance

The next step is to generate a grid of data points using NumPy's linspace and meshgrid functions, and to choose the points for the calculation:

# Generate grid points for visualization
x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
X, Y = np.meshgrid(x, y)

# Choose two points for distance calculation
point_A = np.array([0, 0])
point_B = np.array([2, 2])

# Choose the order p, then prepare the grid points and the output array
p = 2  # p=2 gives the Euclidean distance; p=1 gives the Manhattan distance
grid_points = np.c_[X.ravel(), Y.ravel()]
distances = np.zeros_like(X)

Once the data points are ready, we calculate the Minkowski distance from Point A to every grid point and plot the result as filled contours:

for i in range(len(grid_points)):
    distances.ravel()[i] = minkowski_distance(point_A, grid_points[i], p)

# Reshape the distances to match the grid shape
distances = distances.reshape(X.shape)

# Plot the Minkowski distance contours
plt.figure(figsize=(8, 6))
plt.contourf(X, Y, distances, levels=20, cmap='viridis')
plt.scatter(point_A[0], point_A[1], color='red', label='Point A')
plt.scatter(point_B[0], point_B[1], color='blue', label='Point B')
plt.colorbar(label='Minkowski Distance')
plt.xlabel('X')
plt.ylabel('Y')
plt.title(f'Minkowski Distance (p={p})')
plt.legend()
plt.show()

As you can see, the plot visualizes the distance formula: every point within the same contour band lies at roughly the same Minkowski distance from Point A.

What is the Euclidean Distance Formula?

The Euclidean distance formula is another formula used to find the distance between two points in space. It is the straight-line distance between them and is derived from the Pythagorean theorem.

Here is the Euclidean distance formula:

d = √((x1′ – x1)^2 + (x2′ – x2)^2 + … + (xn′ – xn)^2)
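As a quick worked example of the formula, the sketch below applies it term by term to two illustrative points whose coordinate differences are 3 and 4, so the Pythagorean theorem gives a distance of 5. The names a and b are chosen for this example only:

```python
import math

import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

# Apply the formula term by term: the coordinate differences are 3 and 4
squared_diffs = (b - a) ** 2                # [9, 16]
distance_ab = math.sqrt(squared_diffs.sum())  # sqrt(25) = 5.0

# np.linalg.norm with its default order computes the same 2-norm
same_as_norm = math.isclose(distance_ab, np.linalg.norm(b - a))
print(distance_ab, same_as_norm)
```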

How to visualize the Euclidean Distance in Python?

As with the Minkowski distance, we will visualize the Euclidean formula in Python using the Matplotlib module, reusing the numpy and matplotlib imports from above. First, we define two points and calculate the distance between them:

# Define the two points
point_A = np.array([2, 3])
point_B = np.array([4, 5])

# Calculate the Euclidean distance
distance = np.linalg.norm(point_B - point_A)

Once the distance is calculated, the next step is to visualize it:

plt.figure()
plt.plot([point_A[0], point_B[0]], [point_A[1], point_B[1]], 'ro-')
plt.scatter([point_A[0], point_B[0]], [point_A[1], point_B[1]], color='red')
plt.text(point_A[0] + 0.1, point_A[1], 'A', fontsize=12)
plt.text(point_B[0] + 0.1, point_B[1], 'B', fontsize=12)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Euclidean Distance')
plt.text((point_A[0] + point_B[0]) / 2, (point_A[1] + point_B[1]) / 2, f'd = {distance:.2f}', ha='center')
plt.grid(True)
plt.axis('equal')
plt.show()

Hopefully, this article was helpful in understanding the formula for the KNN model.

Final Words

The formula most commonly used for KNN is the Euclidean one. In sklearn, it is the default: KNeighborsClassifier uses the Minkowski metric with p = 2, which is the Euclidean distance. However, you can change the metric and p parameters to use other distance formulae as well.
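A short sketch of inspecting and changing that setting, assuming scikit-learn is installed. The toy data is made up for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Default settings: metric="minkowski" with p=2, i.e. the Euclidean distance
default_knn = KNeighborsClassifier(n_neighbors=3)
params = default_knn.get_params()
print(params["metric"], params["p"])

# Toy training set (illustrative values)
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                    [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# Setting p=1 switches the classifier to the Manhattan distance
manhattan_knn = KNeighborsClassifier(n_neighbors=3, p=1)
manhattan_knn.fit(X_train, y_train)
manhattan_prediction = manhattan_knn.predict(np.array([[1.2, 1.4]]))
print(manhattan_prediction)
```

On data this cleanly separated, both settings give the same answer; the choice of distance matters more when classes overlap or features are on different scales.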
