KNN algorithm

Sonalee Bhattacharyya
2 min read · Jun 22, 2022


Today I learned about the KNN (k-nearest neighbors) algorithm. I stumbled upon a post about this algorithm on the app Blind, where someone was explaining their journey into data science.

The algorithm itself is pretty cool. Basically, you can use it to classify a data point based on its neighbors. The main assumption is that points close to each other are likely to share a characteristic. I always feel like I understand something better if I try to explain it, so I am going to try to create an example.

Let’s say you want to predict whether someone will join a gym. You have some training data: for a group of people, you know whether or not they joined the gym, and you also know their income and how far they live from the gym. So maybe we assume that income and distance from the gym will help predict whether someone will join.

Picture a scatter plot of these points, with a blue circle around the person we want to make the prediction about (let’s call him Bob). All the other dots are part of the training set: red circles indicate people who joined the gym, and the rest did not. We are assuming Bob will behave similarly to his closest neighbors. You want to choose an odd number of neighbors for this, otherwise you might end up with a 50/50 split. You find the distance between Bob and every point in the training set, pick the five closest neighbors, and then check the status of each of those. In this case three joined the gym and two did not, so I would predict that Bob is going to be working out. Good for him! Now I need to learn how to implement this in Python!



Written by Sonalee Bhattacharyya

Mathematics lecturer transitioning to a career in data analysis
