Session 4 – Pre-read
This pre-read contains information on the models we’ll be using for the in-class examples on classifying penguin species. This information will not be covered during the class, so it is here for your reference.
Neural Network Classifier vs K-Means: Supervised vs Unsupervised Learning
We will be using PyTorch for the neural network and scikit-learn for the K-Means clustering.
Neural Network Classifiers
A neural network classifier is a type of supervised learning model that learns to assign input examples to classes (like penguin species) by finding patterns in labeled training data.

- Neural networks are inspired by biological brains: they consist of connected units called nodes (or sometimes ‘neurons’) organized in layers. Each node processes inputs and passes an output to the next layer.
- A simple network has an input layer (input features), one or more hidden layers (which transform and combine features), and an output layer (which produces scores for each class).
- Each connection between nodes has a weight that the network learns during training. The network uses these weights to decide how strongly to respond to different inputs.
Training a neural network involves:
- Feeding the training data through the network (the forward pass),
- Comparing the predictions to the true labels using a loss function,
- Adjusting the weights to reduce the loss using techniques like backpropagation and gradient-based optimization.
A neural network classifier learns a function that maps input features to output classes. Once trained, you give it a new example and it predicts the most likely class for that example using learned representations from many training examples.
Example: Given a penguin’s bill length and bill depth, a trained neural network will produce a set of output scores for each species and select the species with the highest score as its prediction.
What Happens Inside a Neural Network
To understand the PenguinNet model, it helps to know what actually happens inside each layer of a neural network.
The Basic Computation
Each node (neuron) in a neural network performs a simple mathematical operation:
- Multiply inputs by weights
- Add a bias
- Apply an activation function
Mathematically this looks like:
\(z = w_1x_1 + w_2x_2 + ... + w_nx_n + b\)
\(a = g(z)\)
Where:
- \(x_1, \dots, x_n\) = input features
- \(w_1, \dots, w_n\) = learned weights
- \(b\) = bias term
- \(g\) = activation function
- \(a\) = neuron output
The weights determine how strongly each input affects the neuron, and they are learned during training. The bias allows the model to shift the decision boundary (the line or surface where the model changes its predicted class) so it can better fit the data.
The output from one layer becomes the input to the next layer. This process continues until the network produces class predictions for the input data.
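As a concrete illustration, here is a minimal sketch of a single node's computation in Python. The feature values, weights, and bias are made up for illustration, and the activation used is ReLU, which is introduced in the next section.

```python
import numpy as np

# Made-up input features for one penguin (bill length, bill depth)
x = np.array([39.1, 18.7])
# Made-up learned weights and bias for a single neuron
w = np.array([0.5, -0.3])
b = 2.0

z = np.dot(w, x) + b   # weighted sum: w1*x1 + w2*x2 + b
a = max(0.0, z)        # activation g(z); here ReLU (see the next section)
print(z, a)            # both are about 15.94 for these numbers
```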
Activation Functions
After computing the weighted sum of inputs, neural networks apply an activation function to that sum.
Activation functions introduce non-linearity into the model. Without them, the network would behave like a simple linear model, no matter how many layers it had.
One of the most common activation functions is ReLU (Rectified Linear Unit):
\(\text{ReLU}(x) = \max(0, x)\)
This means:
- negative values become 0
- positive values remain unchanged
ReLU is popular because it is simple, efficient, and works well in deep neural networks.
Other activation functions you might see include:
- Sigmoid – outputs values between 0 and 1 (often used for binary classification)
- Tanh – outputs values between −1 and 1
- Softmax – converts output scores into probabilities for multi-class classification
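A quick way to see what these functions do is to apply them to a small tensor of scores. This is just a sketch using PyTorch's built-in activations; the numbers are arbitrary.

```python
import torch

scores = torch.tensor([-1.0, 0.5, 2.0])

print(torch.relu(scores))            # tensor([0.0000, 0.5000, 2.0000]): negatives become 0
print(torch.sigmoid(scores))         # each value squashed between 0 and 1
print(torch.tanh(scores))            # each value squashed between -1 and 1
print(torch.softmax(scores, dim=0))  # probabilities for 3 classes, summing to 1
```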
Layers in a Neural Network
A typical neural network contains three types of layers:
Input layer
- Receives the input features from the dataset
- In the penguin example: bill length and bill depth
Hidden layers
- Transform and combine features
- Each layer learns increasingly complex patterns in the data
Output layer
- Produces scores for each class
- For a 3-species penguin classifier, there are 3 output neurons
How This Relates to the PyTorch Model
The model below defines a small feedforward neural network using PyTorch.
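A minimal sketch of such a model, consistent with the description in the rest of this section, might look like the code below. The class name PenguinNet and the hidden_units argument come from the surrounding text; the default value of 16 and the attribute name layers are assumptions, and the in-class code may differ in detail.

```python
import torch.nn as nn

class PenguinNet(nn.Module):
    """Small feedforward classifier: 2 bill measurements -> 3 species scores."""

    def __init__(self, hidden_units: int = 16):
        super().__init__()
        # Pipeline of layers applied in order (see nn.Sequential below)
        self.layers = nn.Sequential(
            nn.Linear(2, hidden_units),             # input features -> first hidden layer
            nn.ReLU(),
            nn.Linear(hidden_units, hidden_units),  # second hidden layer
            nn.ReLU(),
            nn.Linear(hidden_units, 3),             # output layer: one score per species
        )

    def forward(self, x):
        # Forward pass: compute class scores (logits) for a batch of inputs
        return self.layers(x)
```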
nn.Module
All neural network models in PyTorch inherit from nn.Module. This provides the infrastructure for storing parameters, tracking gradients, and running training.
nn.Sequential
nn.Sequential creates a pipeline of layers that are applied in order.
The data flows through them like this:
input → Linear → ReLU → Linear → ReLU → Linear → output
Linear Layers
nn.Linear(in_features, out_features) represents a fully connected layer.
It performs the operation:
\(Wx + b\)
where:
- \(W\) = weight matrix
- \(b\) = bias vector
For example, a layer like nn.Linear(2, hidden_units) means:
- 2 input features
- hidden_units neurons in the next layer
Each neuron learns its own weights and bias.
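A quick sketch showing the shapes involved (the 16 here matches the hidden layer size used later; the input values are random placeholders):

```python
import torch
import torch.nn as nn

layer = nn.Linear(2, 16)     # 2 input features -> 16 neurons
x = torch.randn(5, 2)        # a batch of 5 examples with 2 features each
out = layer(x)               # computes x @ W.T + b for every example

print(out.shape)             # torch.Size([5, 16])
print(layer.weight.shape)    # torch.Size([16, 2]): one row of weights per neuron
print(layer.bias.shape)      # torch.Size([16]): one bias per neuron
```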
Hidden Layers
Your network has two hidden layers:
2 inputs → 16 neurons → 16 neurons → 3 outputs
These layers allow the model to learn complex feature combinations.
For example:
- bill length + bill depth interactions
- nonlinear boundaries between species
Output Layer
The final layer, nn.Linear(hidden_units, 3), produces three scores, one for each penguin species.
These values are called logits (unnormalized scores). During training they are typically passed into a loss function like:
nn.CrossEntropyLoss()
which internally applies softmax to convert the scores into class probabilities.
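A small sketch of how the logits and a true label are passed into this loss (the numbers are made up):

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 0.5, -1.0]])  # raw scores for one penguin, 3 species
target = torch.tensor([0])                 # index of the true species

loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(logits, target)             # applies softmax, then negative log-likelihood
print(loss.item())
```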
The Forward Pass
The forward() method defines how data flows through the network.
When the model is called on input data, for example model(x), PyTorch automatically runs:
input → forward() → prediction
This is the forward pass, where the network computes predictions layer by layer.
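For example, predicting the species for a new penguin might look like this sketch. The measurements are made up, and model stands for a PenguinNet-style classifier like the one sketched earlier (in practice it would already be trained).

```python
import torch

model = PenguinNet()                # model sketched earlier; in practice, a trained model

# Hypothetical new penguin: bill length 44.0 mm, bill depth 17.2 mm
x_new = torch.tensor([[44.0, 17.2]])

logits = model(x_new)               # calling the model runs forward() under the hood
predicted = logits.argmax(dim=1)    # index of the highest-scoring species
```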
Training the Network
During training (which happens inside the fit() method in our Classifier example), the process looks like this:
- Forward pass: input features move through the network to produce predictions.
- Loss calculation: the model compares predictions with the true labels.
- Backpropagation: gradients are computed to determine how the weights should change.
- Optimization step: an optimizer (like Adam or SGD) updates the weights to reduce the loss.
Over many training iterations, the model learns weights that correctly separate the classes.
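Put together, one possible training loop might look like the sketch below. It assumes X_train and y_train are tensors of features and integer species labels, reuses the PenguinNet sketch from earlier, and picks Adam with a learning rate of 0.01 and 200 epochs purely for illustration.

```python
import torch
import torch.nn as nn

model = PenguinNet()                  # model sketched earlier in this pre-read
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(200):
    optimizer.zero_grad()             # clear gradients from the previous step
    logits = model(X_train)           # forward pass: features -> class scores
    loss = loss_fn(logits, y_train)   # loss calculation against the true labels
    loss.backward()                   # backpropagation: compute gradients
    optimizer.step()                  # optimization step: update the weights
```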
Summary
Your PenguinNet model is a small multilayer perceptron (MLP) classifier:
- 2 input features (bill length, bill depth)
- 2 hidden layers with ReLU activations
- 3 output neurons (one for each species)
The linear layers learn weighted combinations of features, and the activation functions allow the network to model complex nonlinear relationships in the data.
K-Means Clustering
K-Means is an unsupervised learning algorithm used for clustering:
- You do not provide the true labels.
- The algorithm tries to split your data into k groups based on similarity.
- It randomly initializes cluster centers, assigns points to the nearest one, then updates the centers iteratively.
Example: Given penguin data without species labels, group them into 3 clusters based on bill length and depth.
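A minimal scikit-learn sketch of this idea; the measurements below are made up to stand in for the unlabeled penguin data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up (bill length, bill depth) pairs standing in for unlabeled penguins
X = np.array([[39.1, 18.7], [46.5, 17.9], [50.0, 15.2],
              [38.8, 19.0], [45.2, 16.6], [49.5, 14.8]])

kmeans = KMeans(n_clusters=3, random_state=0, n_init=10)
labels = kmeans.fit_predict(X)       # cluster index (0, 1, or 2) for each penguin
print(labels)
print(kmeans.cluster_centers_)       # coordinates of the 3 cluster centers
```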
Plotting in Python
Please read the ‘Parts of a Figure’ and ‘Coding Styles’ sections of Quick Start Guide (Matplotlib). We will briefly cover plotting with Seaborn (which is built on top of the Matplotlib package), but will not spend much time talking about base Matplotlib.
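As a taste of the Seaborn style we will use, here is a minimal sketch that plots bill measurements colored by species, using the copy of the penguins dataset that ships with Seaborn:

```python
import seaborn as sns
import matplotlib.pyplot as plt

penguins = sns.load_dataset("penguins")  # built-in copy of the penguins dataset

sns.scatterplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", hue="species")
plt.show()
```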
Optional Reading
Introduction to Object-Oriented Programming (OOP) in Python
We will cover the basics of object-oriented programming and how it relates to analysis workflows in Python during session 4, but Introduction to OOP in Python (Real Python) explains it in greater detail and includes some practice exercises.