Session 4 – Pre-read
This pre-read contains information on the models we’ll be using for the in-class examples on classifying penguin species. This information will not be covered during the class, so it is here for your reference.
Neural Network Classifier vs K-Means: Supervised vs Unsupervised Learning
We will use PyTorch for the neural network and scikit-learn for K-Means clustering.
Neural Network Classifiers
A neural network classifier is a type of supervised learning model that learns to assign input examples to classes (like penguin species) by finding patterns in labeled training data.

- Neural networks are inspired by biological brains: they consist of connected units called nodes (or sometimes ‘neurons’) organized in layers. Each node processes inputs and passes an output to the next layer.
- A simple network has an input layer (input features), one or more hidden layers (which transform and combine features), and an output layer (which produces scores for each class).
- Each connection between nodes has a weight that the network learns during training. The network uses these weights to decide how strongly to respond to different inputs.
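The layered structure described above can be sketched in PyTorch as follows. This is a minimal illustration, not the model used in class: the sizes (2 input features, one hidden layer of 16 nodes, 3 output scores) and the ReLU activation are assumptions chosen for the penguin example. Counting the parameters shows that every connection carries a learnable weight (plus one bias per node).

```python
import torch
from torch import nn

# Assumed sizes: 2 input features (e.g. bill length and bill depth),
# one hidden layer of 16 nodes, and 3 output scores (one per species).
model = nn.Sequential(
    nn.Linear(2, 16),   # input layer -> hidden layer: 2*16 weights + 16 biases
    nn.ReLU(),          # non-linearity so hidden nodes can combine features non-linearly
    nn.Linear(16, 3),   # hidden layer -> output layer: 16*3 weights + 3 biases
)

# Every weight and bias is a learnable parameter.
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 2*16 + 16 + 16*3 + 3 = 99

# A forward pass on a batch of 4 made-up examples gives 4 rows of 3 scores.
x = torch.randn(4, 2)
scores = model(x)
print(scores.shape)  # torch.Size([4, 3])
```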
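Training a neural network repeats the following steps many times:
- Feeding the training data through the network (the forward pass),
- Comparing the predictions to the true labels using a loss function,
- Adjusting the weights to reduce the loss: backpropagation computes how much each weight contributed to the loss, and a gradient-based optimizer (such as stochastic gradient descent) updates the weights in the direction that lowers it.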
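The training steps listed above map onto a short PyTorch loop. This sketch uses synthetic, made-up 2-D data in three groups as a stand-in for penguin measurements (it is not real penguin data), and the choices of cross-entropy loss, plain SGD, a learning rate of 0.1 and 200 passes are assumptions for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic stand-in data: 3 classes, 50 points each, around made-up centers.
centers = torch.tensor([[-2.0, 0.0], [2.0, 0.0], [0.0, 2.5]])
labels = torch.arange(3).repeat_interleave(50)          # true class of each point
features = centers[labels] + 0.5 * torch.randn(150, 2)  # 150 x 2 input features

model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 3))
loss_fn = nn.CrossEntropyLoss()                          # compares scores with true labels
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(200):
    scores = model(features)          # 1. forward pass
    loss = loss_fn(scores, labels)    # 2. loss against the true labels
    optimizer.zero_grad()             # clear gradients left over from the last step
    loss.backward()                   # 3a. backpropagation: compute gradients
    optimizer.step()                  # 3b. gradient-based update of the weights
    losses.append(loss.item())

print(f"loss: first {losses[0]:.3f}, last {losses[-1]:.3f}")
```

The loss should fall over the passes as the weights are adjusted; in practice you would also hold out a test set to check that the model generalizes.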
A neural network classifier learns a function that maps input features to output classes. Once trained, it can take a new example and predict its most likely class, based on the patterns it learned from many training examples.
Example: Given a penguin’s bill length and bill depth, a trained neural network produces one output score per species and selects the species with the highest score as its prediction.
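Picking the highest score, as in the example above, is an `argmax` over the output scores. In this sketch the network is untrained (a stand-in with the same shape as a 2-input, 3-output classifier), the input values are made up, and the species order in the list is an assumption; in practice the order comes from how the labels were encoded.

```python
import torch
from torch import nn

# Assumed class order; the real order depends on how labels were encoded.
species = ["Adelie", "Chinstrap", "Gentoo"]

# Untrained stand-in network (2 inputs -> 3 scores); use a trained model in practice.
model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 3))
model.eval()

# One new example: made-up, standardized bill length and bill depth.
new_penguin = torch.tensor([[0.5, -1.2]])

with torch.no_grad():                          # gradients are not needed for prediction
    scores = model(new_penguin)                # shape (1, 3): one score per species
    probs = torch.softmax(scores, dim=1)       # optional: rescale scores to probabilities
    predicted = scores.argmax(dim=1).item()    # index of the highest score

print(species[predicted], probs)
```

Because softmax preserves the ordering of the scores, the species with the highest score is also the one with the highest probability.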
K-Means Clustering
K-Means is an unsupervised learning algorithm used for clustering:
- You do not provide the true labels.
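- The algorithm tries to split your data into k groups so that points in the same group are close to each other (closeness is usually measured as Euclidean distance).
- It initializes k cluster centers (for example, by picking random data points), assigns each point to its nearest center, then moves each center to the mean of the points assigned to it, repeating the assign and update steps until the centers stop changing.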
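The initialize, assign and update loop described in the list above can be sketched from scratch in NumPy. This is a plain version of the standard algorithm (often called Lloyd's algorithm) with random initialization; scikit-learn's `KMeans` defaults to the smarter "k-means++" initialization instead. The function name `kmeans` and the synthetic blobs are hypothetical, for illustration only.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical helper: plain K-Means on an (n_samples, n_features) array."""
    rng = np.random.default_rng(seed)
    # 1. Initialize: pick k distinct data points at random as the starting centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # 2. Assign each point to its nearest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Move each center to the mean of its assigned points
        #    (keep the old center if a cluster ends up empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Stop once the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Made-up, unlabeled 2-D data in three blobs (not penguin measurements).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc, 0.3, size=(50, 2)) for loc in ([0, 0], [3, 0], [0, 3])])
labels, centers = kmeans(X, k=3)
print(centers.round(1))
```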
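Example: Given penguin data without species labels, K-Means groups the penguins into 3 clusters based on bill length and depth. The clusters are numbered (0, 1, 2) rather than named, so any match to species has to be checked against the true labels afterwards.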
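The clustering example above can be sketched with scikit-learn's `KMeans`. The data here are made-up, unlabeled 2-D points standing in for bill length and depth; they are not real penguin measurements. Note that `fit_predict` receives only the features, never labels.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Made-up, unlabeled data in three blobs (stand-in for bill length/depth).
X = np.vstack([rng.normal(loc, 0.3, size=(50, 2)) for loc in ([0, 0], [3, 0], [0, 3])])

# n_clusters is k; n_init runs several initializations and keeps the best result.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = km.fit_predict(X)      # unsupervised: no labels are passed

print(km.cluster_centers_)           # one center per cluster
print(np.bincount(cluster_ids))      # number of points in each cluster
```

Because K-Means works with distances, features on very different scales are often standardized first (for example with scikit-learn's `StandardScaler`) so that one feature does not dominate.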
Plotting in Python
Please read the ‘Parts of a Figure’ and ‘Coding Styles’ sections of the Matplotlib Quick Start Guide. We will briefly cover plotting with Seaborn (which is built on top of Matplotlib), but will not spend much time on base Matplotlib.
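Because Seaborn is built on Matplotlib, Seaborn plotting functions can draw onto a Matplotlib Axes you create yourself, which is the explicit ("object-oriented") style described in the Matplotlib guide. This minimal sketch uses made-up data and assumes an off-screen backend so it runs outside a notebook; drop the `matplotlib.use("Agg")` line when working interactively.

```python
import matplotlib
matplotlib.use("Agg")                # off-screen rendering; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

rng = np.random.default_rng(0)
x = rng.normal(size=100)             # made-up data, not penguin measurements
y = x + rng.normal(scale=0.5, size=100)

# Explicit style: create the Figure and its Axes, then draw on the Axes.
fig, ax = plt.subplots(figsize=(5, 4))
sns.scatterplot(x=x, y=y, ax=ax)     # Seaborn draws onto the Axes we pass in
ax.set_xlabel("feature 1")
ax.set_ylabel("feature 2")
ax.set_title("Seaborn plot on a Matplotlib Axes")
# fig.savefig("scatter.png")         # uncomment to write the figure to disk
```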
Optional Reading
Introduction to Object-Oriented Programming (OOP) in Python
We will cover the basics of object-oriented programming and how it relates to analysis workflows in Python during session 4, but Introduction to OOP in Python (Real Python) explains it in greater detail and includes some practice exercises.