How to Build a Neural Network from Scratch in Go: Part 1
Understanding and Reading the MNIST Dataset

Hey there! Welcome to the first part of our exciting series on building a neural network from scratch using Go. If you’re new to machine learning or Go, don’t worry—we’re going to take this step by step, starting with the foundation: the data. In this article, we’ll dive into the MNIST dataset, a classic collection of handwritten digits that’s perfect for learning the ropes of neural networks. Our goal today is to get the dataset, understand its structure, and write some Go code to read it. Plus, we’ll unpack some key concepts like big and little endian and one-hot encoding along the way.
Here’s what we’ll cover:
What is the MNIST Dataset? A quick intro to this famous dataset.
How to Get the MNIST Dataset: Where and how to download it.
The Structure of MNIST: Breaking down its binary format.
Reading MNIST in Go: Walking through the code to load the data.
Key Concepts: Explaining big and little endian, one-hot encoding, and alternatives.
Conclusion: Wrapping up and looking ahead.
Let’s get started!
What is the MNIST Dataset?
Imagine you’re teaching a kid to recognize numbers. You’d show them lots of examples of handwritten digits, right? That’s exactly what the MNIST dataset does for machines. MNIST stands for Modified National Institute of Standards and Technology dataset, and it’s a collection of 70,000 grayscale images of handwritten digits (0 through 9). Each image is a tidy 28x28 pixels, and it comes with a label telling us which digit it is.
The dataset is split into two parts:
Training Set: 60,000 images to teach our neural network.
Test Set: 10,000 images to check how well it learned.
MNIST is like the "Hello World" of machine learning—simple, well-organized, and widely used. It’s perfect for our first adventure into building a neural network from scratch.
How to Get the MNIST Dataset
Ready to grab the data? Head over to Yann LeCun’s MNIST page. You’ll find four files to download:
train-images-idx3-ubyte.gz: The 60,000 training images.
train-labels-idx1-ubyte.gz: The labels for the training images.
t10k-images-idx3-ubyte.gz: The 10,000 test images.
t10k-labels-idx1-ubyte.gz: The labels for the test images.
These files are compressed with gzip, so after downloading, you’ll need to unzip them. On a Mac or Linux, you can run gunzip filename.gz in the terminal. On Windows, tools like 7-Zip or WinRAR work great. Since gunzip simply strips the .gz suffix, you’ll end up with four binary files named like train-images-idx3-ubyte. Place them in a folder called mnist/ in your project directory, and we’re good to go!
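If you’d rather stay in Go for the whole workflow, here’s a minimal sketch that decompresses the downloaded files using the standard library’s compress/gzip package. The gunzip function name and the file list are just for illustration; adjust the paths to wherever you saved the downloads.

package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"os"
	"strings"
)

// gunzip decompresses src (e.g., "mnist/train-images-idx3-ubyte.gz")
// into a file of the same name without the .gz suffix.
func gunzip(src string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()

	gz, err := gzip.NewReader(in)
	if err != nil {
		return err
	}
	defer gz.Close()

	out, err := os.Create(strings.TrimSuffix(src, ".gz"))
	if err != nil {
		return err
	}
	defer out.Close()

	_, err = io.Copy(out, gz)
	return err
}

func main() {
	files := []string{
		"mnist/train-images-idx3-ubyte.gz",
		"mnist/train-labels-idx1-ubyte.gz",
		"mnist/t10k-images-idx3-ubyte.gz",
		"mnist/t10k-labels-idx1-ubyte.gz",
	}
	for _, f := range files {
		if err := gunzip(f); err != nil {
			fmt.Println("failed to unzip", f, ":", err)
		}
	}
}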
The Structure of MNIST
Before we jump into coding, let’s peek under the hood of these binary files. MNIST uses a custom format called IDX, and it’s pretty straightforward once you get the hang of it.
Image Files
The image files (train-images-idx3-ubyte and t10k-images-idx3-ubyte) start with a 16-byte header:
Magic Number (4 bytes): A special number (2051) that identifies the file type.
Number of Images (4 bytes): 60,000 for training, 10,000 for testing.
Rows (4 bytes): Always 28.
Columns (4 bytes): Also 28.
After the header, the pixel data follows: 784 bytes per image (28 × 28), with each byte representing a pixel’s grayscale value from 0 (background, white) to 255 (foreground, black). Picture it as one long row of pixels for each image, like a flattened version of a 28x28 grid.
Label Files
The label files (train-labels-idx1-ubyte and t10k-labels-idx1-ubyte) are simpler, with an 8-byte header:
Magic Number (4 bytes): 2049 this time.
Number of Labels (4 bytes): Matches the number of images (60,000 or 10,000).
Then, it’s just one byte per label, giving us the digit (0–9) for each corresponding image.
One catch: all these multi-byte numbers (like the magic number or number of images) are stored in big-endian byte order. Don’t worry if that sounds unfamiliar—we’ll explain it soon!
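To make the layout concrete, here’s the offset arithmetic spelled out as two tiny helpers. These are purely illustrative (our reader below streams through the files rather than seeking), but they show exactly where each byte lives.

// pixelOffset returns the byte offset of pixel (row, col) of image i
// in an MNIST image file: 16 header bytes, then 784 bytes per image.
func pixelOffset(i, row, col int) int {
	return 16 + i*28*28 + row*28 + col
}

// labelOffset returns the byte offset of label i in an MNIST label
// file: 8 header bytes, then 1 byte per label.
func labelOffset(i int) int {
	return 8 + i
}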
Reading MNIST in Go
Now for the fun part: let’s write some Go code to read this data. We’ll create functions to load the images and labels, with proper error handling and a clean structure, so they’re ready for our neural network.
Create a file called reader.go in a reader package:
package reader

import (
	"encoding/binary"
	"errors"
	"fmt"
	"io"
	"os"
)

const (
	trainingImagesPath = "mnist/train-images-idx3-ubyte"
	trainingLabelsPath = "mnist/train-labels-idx1-ubyte"
	testImagesPath     = "mnist/t10k-images-idx3-ubyte"
	testLabelsPath     = "mnist/t10k-labels-idx1-ubyte"
)

// ReadImages reads both training and test images from the MNIST dataset.
func ReadImages() (trainingImages [][]float64, testImages [][]float64, err error) {
	trainingImages, err = readMnistImages(trainingImagesPath, 60000)
	if err != nil {
		return nil, nil, err
	}
	testImages, err = readMnistImages(testImagesPath, 10000)
	if err != nil {
		return nil, nil, err
	}
	return trainingImages, testImages, nil
}

// ReadLabels reads both training and test labels from the MNIST dataset.
func ReadLabels() (trainingLabels [][]float64, testLabels [][]float64, err error) {
	trainingLabels, err = readMnistLabels(trainingLabelsPath, 60000)
	if err != nil {
		return nil, nil, err
	}
	testLabels, err = readMnistLabels(testLabelsPath, 10000)
	if err != nil {
		return nil, nil, err
	}
	return trainingLabels, testLabels, nil
}

// readMnistImages reads image data from a single IDX file.
func readMnistImages(filepath string, expectedImages uint32) ([][]float64, error) {
	// Open the file; defer ensures it closes when the function exits.
	file, err := os.Open(filepath)
	if err != nil {
		return nil, err
	}
	defer file.Close()

	// Read the 16-byte header. io.ReadFull guarantees we get all 16
	// bytes (a plain file.Read is allowed to return fewer).
	header := make([]byte, 16)
	if _, err := io.ReadFull(file, header); err != nil {
		return nil, err
	}

	// Parse the header fields, which are stored big-endian.
	magicNumber := binary.BigEndian.Uint32(header[0:4])
	numImages := binary.BigEndian.Uint32(header[4:8])
	numRows := binary.BigEndian.Uint32(header[8:12])
	numCols := binary.BigEndian.Uint32(header[12:16])
	fmt.Println("Image file:", filepath)
	fmt.Println("Magic number:", magicNumber, "Images:", numImages, "Rows:", numRows, "Columns:", numCols)

	// Validate the header.
	if magicNumber != 2051 {
		return nil, errors.New("invalid magic number for image file")
	}
	if numImages != expectedImages {
		return nil, fmt.Errorf("expected %d images, but found %d", expectedImages, numImages)
	}
	if numRows != 28 || numCols != 28 {
		return nil, fmt.Errorf("expected 28x28 images, but found %dx%d", numRows, numCols)
	}

	// Read the image data: 784 bytes per image.
	images := make([][]float64, numImages)
	tempImage := make([]byte, numRows*numCols)
	for i := range images {
		images[i] = make([]float64, numRows*numCols)
		if _, err := io.ReadFull(file, tempImage); err != nil {
			return nil, err
		}
		// Normalize pixel values from 0-255 to [0,1].
		for j := range tempImage {
			images[i][j] = float64(tempImage[j]) / 255.0
		}
	}
	return images, nil
}

// readMnistLabels reads label data from a single IDX file.
func readMnistLabels(filepath string, expectedLabels uint32) ([][]float64, error) {
	// Open the file; defer ensures it closes when the function exits.
	file, err := os.Open(filepath)
	if err != nil {
		return nil, err
	}
	defer file.Close()

	// Read the 8-byte header.
	header := make([]byte, 8)
	if _, err := io.ReadFull(file, header); err != nil {
		return nil, err
	}

	// Parse the header fields, which are stored big-endian.
	magicNumber := binary.BigEndian.Uint32(header[0:4])
	numLabels := binary.BigEndian.Uint32(header[4:8])
	fmt.Println("Label file:", filepath)
	fmt.Println("Magic number:", magicNumber, "Labels:", numLabels)

	// Validate the header.
	if magicNumber != 2049 {
		return nil, errors.New("invalid magic number for label file")
	}
	if numLabels != expectedLabels {
		return nil, fmt.Errorf("expected %d labels, but found %d", expectedLabels, numLabels)
	}

	// Read the label data: one byte per label, one-hot encoded into 10 slots.
	labels := make([][]float64, numLabels)
	for i := range labels {
		labels[i] = make([]float64, 10)
		var label uint8
		if err := binary.Read(file, binary.BigEndian, &label); err != nil {
			return nil, err
		}
		labels[i][label] = 1.0
	}
	return labels, nil
}

// ReadData reads both images and labels from the MNIST dataset in one call.
func ReadData() (trainingImages [][]float64, trainingLabels [][]float64,
	testImages [][]float64, testLabels [][]float64, err error) {
	trainingImages, testImages, err = ReadImages()
	if err != nil {
		return nil, nil, nil, nil, err
	}
	trainingLabels, testLabels, err = ReadLabels()
	if err != nil {
		return nil, nil, nil, nil, err
	}
	return trainingImages, trainingLabels, testImages, testLabels, nil
}
How It Works
Let’s break this down:
readMnistImages: Opens the image file and reads the 16-byte header.
Checks the magic number (2051) and ensures we have the expected number of images and dimensions (28x28).
Reads 784 bytes per image, normalizes each pixel to a value between 0 and 1 (by dividing by 255), and stores them in a 2D slice.
readMnistLabels: Opens the label file and reads the 8-byte header.
Verifies the magic number (2049) and number of labels.
Reads one byte per label and converts each into a 10-element one-hot encoded vector (e.g., digit 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]).
ReadData: Ties it all together by calling ReadImages and ReadLabels, which use the file-path constants and expected counts.
Pro Tip: Notice the defer file.Close()? In Go, this ensures the file closes when the function exits, even if an error occurs. Also, we’ve replaced log.Fatal with proper error returns so the caller can decide what to do—much friendlier for reusable code!
Key Concepts
Big and Little Endian
Ever wondered how computers store numbers bigger than a single byte? It’s all about byte order, and there are two flavors:
Big-Endian: The "big" end (most significant byte) comes first. For example, the number 2051 is stored as 00 00 08 03. It’s like reading a number the way we normally write it, left to right.
Little-Endian: The "little" end (least significant byte) comes first, so 2051 would be 03 08 00 00. Think of it as right to left.
MNIST uses big-endian, which is why we use binary.BigEndian in our Go code to read those 4-byte integers correctly. Your computer might be little-endian (most modern PCs are), so this conversion is crucial.
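Here’s a tiny runnable sketch showing what happens if you pick the wrong byte order when parsing the magic number:

package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	// The first four bytes of an MNIST image file: the magic number 2051.
	header := []byte{0x00, 0x00, 0x08, 0x03}

	fmt.Println(binary.BigEndian.Uint32(header))    // prints 2051, what we want
	fmt.Println(binary.LittleEndian.Uint32(header)) // prints 50855936, nonsense
}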
One-Hot Encoding
Our labels are digits from 0 to 9, but neural networks like to work with probabilities. One-hot encoding turns each label into a 10-element vector where only one position is 1, and the rest are 0s. For example:
Label 2 becomes [0, 0, 1, 0, 0, 0, 0, 0, 0, 0].
Label 7 becomes [0, 0, 0, 0, 0, 0, 0, 1, 0, 0].
It’s like flipping on one light switch out of ten to signal the digit. This matches our network’s output layer, making it easier to compute errors during training.
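In isolation, the encoding (and its inverse, which we’ll want later to turn network outputs back into digits) looks like this. These helpers mirror the loop inside readMnistLabels:

// oneHot turns a digit 0-9 into a 10-element vector with a single 1.0.
func oneHot(digit uint8) []float64 {
	v := make([]float64, 10)
	v[digit] = 1.0
	return v
}

// argmax recovers the digit: the index of the largest value. It also
// works on the probability vectors our network will output later.
func argmax(v []float64) int {
	best := 0
	for i, x := range v {
		if x > v[best] {
			best = i
		}
	}
	return best
}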
Alternatives to One-Hot Encoding
One-hot encoding isn’t the only game in town. Here are some other options:
Label Encoding: Just use the integer itself (e.g., 0 for 0, 1 for 1, etc.). Simple, but it implies an order (like 9 is "greater" than 0), which can confuse neural networks for classification tasks.
Binary Encoding: Represent digits in binary (e.g., 3 as 011, 9 as 1001). Uses fewer dimensions but still assumes some structure (see the sketch below).
Embeddings: Fancy learned vectors that capture relationships between categories. Great for complex data like words, less so for our simple digits.
For MNIST, one-hot encoding is the go-to because it’s clear and works seamlessly with our network’s output.
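For the curious, a binary encoding of a digit might look like this. This is a hypothetical helper for illustration only, not something we’ll use in this series:

// binaryEncode represents a digit 0-9 as a 4-bit vector,
// e.g., 3 -> [0 0 1 1] and 9 -> [1 0 0 1].
func binaryEncode(digit uint8) []float64 {
	v := make([]float64, 4)
	for i := 0; i < 4; i++ {
		v[3-i] = float64((digit >> i) & 1)
	}
	return v
}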
Conclusion
Phew, we did it! In this first part, we’ve grabbed the MNIST dataset, decoded its binary structure, and written Go code to read it into memory. We’ve got our images normalized and our labels one-hot encoded, ready for action. Plus, we’ve demystified big and little endian and explored why one-hot encoding is our friend.
Next time, we’ll start building the neural network itself—layers, weights, and all that jazz. So stick around, keep coding, and let’s see where this journey takes us. Have questions or ideas? Drop them below—let’s learn together!