Understanding the Nearest Neighbour Classifier

What Is This Tutorial About?

Imagine you’re shown a picture of a handwritten number — say a messy, wobbly-looking “3” — and asked to figure out what number it is. For a human, that’s usually easy. But for a computer? Not so simple.

In this lesson, we’ll explore how a machine can learn to recognize handwritten digits using a classic and surprisingly intuitive approach called the Nearest Neighbour classifier. Along the way, we’ll introduce some key concepts in machine learning such as:

  • What a classification problem is
  • How computers “see” images as numbers
  • How to compare and classify using distance
  • The idea of training vs test data

Let’s Start with the Problem: Recognising Digits

We want the machine to do what your eyes and brain do naturally: look at an image and decide what digit (0–9) it shows.

Here are some example images from a famous dataset used for this task. Each one is a 28×28 pixel grayscale image of a digit:

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ]
[ 5 ] [ 6 ] [ 7 ] [ 8 ] [ 9 ]

Some are clean and easy to recognize. Others are messier, like scribbles on a sticky note. Still, we want the machine to make an accurate guess — just like you would.

A Smarter Idea: Let the Machine Learn

Instead of trying to write rules ourselves, we can let the data teach the computer what digits look like.

We use a method from machine learning called Nearest Neighbour classification. Here’s the idea:

  1. Show the computer thousands of examples of handwritten digits, each labeled correctly.
  2. When a new image appears, the machine:
    • Looks through the known examples
    • Finds the most similar one
    • Says: “This new image looks closest to this ‘5’, so I think it’s also a ‘5’.”

That’s it. No rules about loops or lines — just similarity.

How Images Become Data

To a computer, an image is not a picture — it’s a grid of numbers.

Each 28×28 image has 784 pixels. Each pixel is a number from 0 (black) to 255 (white). So a handwritten “3” might look like:

[ 0, 0, 255, 128, … ] → a vector with 784 numbers

This long list of numbers is how the computer “sees” the image.
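To make this concrete, here is a minimal sketch of the flattening step. It uses a made-up 4×4 grid rather than a real 28×28 MNIST image, but the operation is identical:

```python
import numpy as np

# A toy "image": a 4x4 grid of pixel intensities (0 = black, 255 = white).
# Real MNIST images are 28x28; the flattening works the same way.
image = np.array([
    [  0,   0, 255,   0],
    [  0, 255, 128,   0],
    [  0, 255,   0,   0],
    [  0, 255, 255,   0],
], dtype=np.uint8)

# Flatten the grid into one long vector, row by row.
vector = image.flatten()

print(vector.shape)  # (16,) -- a 28x28 image would give (784,)
```

Once every image is a flat vector like this, two images can be compared number by number.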

Comparing Images: Measuring Distance

Once images are turned into vectors of numbers, we can compare them using a mathematical formula. The most common is Euclidean distance — think of it as the length of a straight line between two points in space.

Even though we’re in 784-dimensional space (not just 2D), the concept is the same:

  • Subtract the numbers
  • Square the differences
  • Add them up
  • Take the square root

The smaller the result, the more similar the two images.
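The four steps above translate directly into code. This is a small sketch using short made-up vectors in place of full 784-pixel images:

```python
import numpy as np

def euclidean_distance(a, b):
    """Subtract the numbers, square the differences,
    add them up, take the square root."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return np.sqrt(np.sum(diff ** 2))

a = np.array([0, 0, 255, 128])
b = np.array([0, 0, 255, 130])   # almost identical to a
c = np.array([255, 255, 0, 0])   # very different from a

print(euclidean_distance(a, b))  # 2.0 -- small, so a and b are similar
print(euclidean_distance(a, c))  # much larger
```

The same function works unchanged on 784-dimensional vectors; only the length of the arrays changes.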

The Nearest Neighbour Classifier (1-NN)

The 1-NN method is simple:

  1. Store all labeled training images in memory.
  2. When a new image comes in:
    • Compare it to every image in the training set.
    • Find the one that’s closest (smallest distance).
    • Copy its label as the prediction.

It’s called “lazy learning” because the model does no work at training time: it simply stores the data and defers all computation until it’s asked to classify a new point.
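The whole classifier fits in a few lines. The sketch below uses a tiny made-up training set of 4-pixel “images” so the output is easy to check; real MNIST vectors would simply be longer:

```python
import numpy as np

def predict_1nn(train_images, train_labels, new_image):
    """Classify new_image by copying the label of the closest
    training example (smallest Euclidean distance)."""
    diffs = train_images.astype(np.float64) - new_image.astype(np.float64)
    distances = np.sqrt((diffs ** 2).sum(axis=1))  # one distance per example
    nearest = np.argmin(distances)                 # index of the closest one
    return train_labels[nearest]

# Tiny made-up "training set": three 4-pixel images with labels.
train_images = np.array([
    [  0,   0, 255, 255],
    [255, 255,   0,   0],
    [  0, 255, 255,   0],
])
train_labels = np.array([0, 1, 2])

query = np.array([0, 10, 250, 240])  # closest to the first training image
print(predict_1nn(train_images, train_labels, query))  # 0
```

Note that `predict_1nn` compares the query against every stored example, which is exactly why the method slows down as the training set grows.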

How Well Does It Work?

Let’s test it!

The MNIST dataset gives us:

  • 60,000 training images
  • 10,000 test images

When we run 1-NN on the test set, we get a test error of 3.09%. That means 96.91% of the time, the model guesses correctly — pretty impressive for such a simple approach!

By contrast, a random guesser (picking one of the ten digits uniformly at random) would be wrong about 90% of the time.
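Computing a test error rate is itself a one-liner: count the fraction of predictions that disagree with the true labels. The labels below are made up purely to show the mechanics:

```python
import numpy as np

# Hypothetical true labels and model predictions for ten test images.
true_labels      = np.array([3, 7, 1, 9, 4, 4, 0, 8, 2, 5])
predicted_labels = np.array([3, 7, 1, 9, 4, 9, 0, 8, 2, 5])

# Fraction of test images the model got wrong.
error_rate = np.mean(predicted_labels != true_labels)
print(f"test error: {error_rate:.2%}")  # test error: 10.00%
```

Running the same comparison over all 10,000 MNIST test images is what produces the 3.09% figure quoted above.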

Strengths and Weaknesses

Strengths:

  • No training required — just store and compare
  • Intuitive and easy to implement
  • Works well on clean, well-labeled data

Weaknesses:

  • Slow with large datasets (needs to check every example)
  • No learning: it doesn’t understand anything, just memorizes
  • Sensitive to irrelevant features or noise

Wrapping Up

The Nearest Neighbour classifier is a great first step in machine learning. It teaches us:

  • How to turn images into numerical data
  • How to define similarity using distance
  • Why we need training and test data to evaluate performance

But most importantly, it shows that machines can learn from data — even without complex rules.

The Nearest Neighbour algorithm stands as a powerful yet intuitive introduction to classification in machine learning. Its strengths lie in simplicity and direct application, particularly with datasets like MNIST. However, its limitations — especially in scalability and generalisation — highlight the need for more sophisticated models. As machine learning continues to evolve, understanding these foundational methods remains essential for building toward more advanced solutions.
