Lesson 2.1: Convolutional Neural Network
What is a CNN?
A Convolutional Neural Network (CNN) is a specialized type of artificial neural network designed for processing structured grid data like images. CNNs are particularly effective for computer vision tasks because they automatically and adaptively learn spatial hierarchies of features through backpropagation.
Why CNNs Over Traditional Neural Networks?
Traditional neural networks face several challenges when processing images:
Reduce the Number of Input Nodes
- In a traditional (fully connected) neural network, every pixel of the input image is connected to every neuron in the first hidden layer. A small 6×6 grayscale image needs only 36 input nodes, but a typical 256×256 RGB image needs 256×256×3 = 196,608 input nodes, and every hidden neuron then needs one weight per input node. This leads to:
- Computational inefficiency - too many parameters to learn
- Memory constraints - storing all these weights requires significant memory
- Overfitting - with so many parameters, the model may memorize training data rather than learn general features
CNNs solve this by using local connectivity: each neuron in a convolutional layer is connected only to a small region of the input (called the receptive field), dramatically reducing the number of parameters.
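To make the parameter savings concrete, here is a minimal sketch that counts weights and biases for one fully connected layer versus one convolutional layer on the 256×256 RGB input described above. The layer sizes (128 hidden units, 32 filters of size 3×3) and the helper names are illustrative assumptions, not values from this lesson.

```python
def dense_params(n_inputs: int, n_hidden: int) -> int:
    """Weights plus biases for one fully connected layer."""
    return n_inputs * n_hidden + n_hidden


def conv_params(kernel_h: int, kernel_w: int, in_channels: int, n_filters: int) -> int:
    """Weights plus biases for one convolutional layer.

    The count does not depend on the image height or width,
    because each filter is reused at every position.
    """
    return kernel_h * kernel_w * in_channels * n_filters + n_filters


n_inputs = 256 * 256 * 3                    # 196,608 input values
dense = dense_params(n_inputs, n_hidden=128)  # assumed: 128 hidden units
conv = conv_params(3, 3, in_channels=3, n_filters=32)  # assumed: 32 filters of 3x3

print(f"fully connected layer: {dense:,} parameters")
print(f"convolutional layer:   {conv:,} parameters")
```

Under these assumptions the fully connected layer needs roughly 25 million parameters, while the convolutional layer needs fewer than a thousand.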
Tolerate Small Shifts in Pixel Locations
- In traditional NNs, if an object in an image shifts slightly, all the pixel values go to different input nodes, making the network see it as a completely different input. CNNs are:
- Translation invariant (approximately) - the same filter is applied across the entire image, so a learned feature is detected wherever it appears; the filter's response simply moves with the object
- Robust to small transformations - max pooling provides some invariance to small translations
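The effect of a small shift can be sketched in plain Python. The example below is a toy illustration under stated assumptions: a hypothetical 2×2 "blob" filter, a bias of -3 followed by ReLU so that only an exact match fires, and non-overlapping 2×2 max pooling. It moves the blob one pixel to the right and compares what a flattened input, the convolutional feature map, and the pooled output look like.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu_with_bias(fmap, bias):
    """Add a bias to every response, then clip negatives to zero."""
    return [[max(0, v + bias) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; leftover rows/columns are dropped."""
    rows, cols = len(fmap) // size, len(fmap[0]) // size
    return [
        [
            max(fmap[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def blob_image(top, left, size=6):
    """size x size image of zeros with a 2x2 block of ones at (top, left)."""
    img = [[0] * size for _ in range(size)]
    for i in range(2):
        for j in range(2):
            img[top + i][left + j] = 1
    return img


kernel = [[1, 1], [1, 1]]             # assumed filter: responds to a 2x2 bright blob
original = blob_image(top=2, left=2)
shifted = blob_image(top=2, left=3)   # same blob, one pixel to the right

# A fully connected layer sees the flattened image: count inputs that changed.
changed_inputs = sum(
    a != b
    for row_a, row_b in zip(original, shifted)
    for a, b in zip(row_a, row_b)
)

features_original = relu_with_bias(conv2d(original, kernel), bias=-3)
features_shifted = relu_with_bias(conv2d(shifted, kernel), bias=-3)
pooled_original = max_pool(features_original)
pooled_shifted = max_pool(features_shifted)

print("input nodes that changed:", changed_inputs)
print("feature maps identical:  ", features_original == features_shifted)
print("pooled maps identical:   ", pooled_original == pooled_shifted)
```

In this sketch the feature map still contains the detection, just one column over, and the pooled output is unchanged because both positions fall inside the same pooling window. A shift that crosses a pooling-window boundary would change the pooled output, which is why the invariance is only approximate.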
Take Advantage of Spatial Correlation
- In images, nearby pixels are highly correlated. Traditional NNs ignore this spatial structure by flattening the image into a 1D vector. CNNs preserve and exploit this structure by:
- Local connectivity - focusing on small regions at a time
- Parameter sharing - using the same weights (filters) across the entire image
- Hierarchical learning - building complex features from simple ones
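Local connectivity and parameter sharing can be illustrated with a single hand-written filter. The sketch below assumes a 6×6 image with a vertical edge and a 3×3 vertical-edge kernel chosen for illustration (in a real CNN the kernel values would be learned). Each output value looks at one 3×3 patch, and the same 9 weights are reused for every patch.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# 6x6 image: dark left half (0) and bright right half (1),
# so a vertical edge runs down the middle.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Assumed hand-written kernel: right column minus left column of each patch.
kernel = [[-1, 0, 1] for _ in range(3)]

feature_map = conv2d(image, kernel)
for row in feature_map:
    print(row)

n_outputs = len(feature_map) * len(feature_map[0])
shared_weights = len(kernel) * len(kernel[0])       # reused at every position
dense_weights = len(image) * len(image[0]) * n_outputs  # one weight per input-output pair

print("outputs:", n_outputs)
print("weights with sharing:", shared_weights)
print("weights if fully connected:", dense_weights)
```

The feature map is nonzero only in the columns whose 3×3 patch straddles the edge, and producing all 16 outputs takes 9 shared weights instead of the 576 a fully connected mapping from 36 inputs to 16 outputs would use.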
Aspect | Traditional Neural Networks | Convolutional Neural Networks (CNNs) |
---|---|---|
Architecture | Fully connected layers; dense connectivity between neurons. | Convolutional + pooling layers + fully connected layers; designed for grid-like data (e.g., images). |
Local Connectivity | Global connectivity: Each neuron connected to all neurons in the previous layer. | Local connectivity: Neurons in a convolutional layer connected to a small input region (receptive field). |
Weight Sharing | Unique weights per neuron; no parameter sharing. | Shared weights via filters/kernels across input regions; reduces parameters. |
Pooling Layers | No pooling; rely on fully connected layers for dimensionality reduction. | Include pooling (e.g., max pooling) to downsample feature maps and retain key information. |
Applications | Structured data (e.g., tabular data) with simple, well-defined feature relationships. | Image/video processing; tasks requiring spatial hierarchies and local pattern recognition. |
Parameter Efficiency | High parameter count; prone to overfitting and computational inefficiency for high-dimensional data. | Parameter-efficient due to weight sharing and local connectivity; scalable for large datasets. |
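The "convolutional + pooling + fully connected" architecture in the table can be traced shape by shape. The sketch below uses the standard output-size formula, floor((size + 2·padding − kernel) / stride) + 1, on a hypothetical stack of layers. The specific layer sizes (3×3 convolutions with 16 and 32 filters, 2×2 pooling) are assumptions chosen for illustration, not an architecture from this lesson.

```python
def out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(height, width, channels, layers):
    """Return the (height, width, channels) shape after each layer."""
    shapes = [(height, width, channels)]
    for kind, arg in layers:
        if kind == "conv":            # arg = (kernel size, number of filters)
            k, filters = arg
            height, width = out_size(height, k), out_size(width, k)
            channels = filters
        elif kind == "pool":          # arg = window size, also used as stride
            height = out_size(height, arg, stride=arg)
            width = out_size(width, arg, stride=arg)
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        shapes.append((height, width, channels))
    return shapes


# Assumed example stack: conv -> pool -> conv -> pool.
layers = [("conv", (3, 16)), ("pool", 2), ("conv", (3, 32)), ("pool", 2)]
shapes = trace_shapes(256, 256, 3, layers)

for shape in shapes:
    print(shape)

# Number of values handed to the fully connected layers after flattening.
h, w, c = shapes[-1]
flattened = h * w * c
print("flattened size:", flattened)
```

Pooling halves the spatial size at each step, so the fully connected layers at the end receive a much smaller spatial grid than the raw 256×256 image.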