Course: Deep Learning for Computer Vision
What Computers See
- Images are numbers
- Tasks in Computer Vision
- Regression
- Classification
- High-level feature detection:
- Learn features that describe the image at a high level.
- Manual Feature Extraction
- Domain knowledge
- Define features
- Detect features to classify
- Learning Feature Representations
- Can we learn a hierarchy of features directly from the data instead of hand engineering?
- Low level features
- Mid level features
- High level features
Learning Visual Features
- Fully connected Neural Network
- Input
- 2D image
- Vector of pixel values
- Fully Connected:
- Connect neuron in hidden layer to all neurons in input layer
- No spatial information!
- And many, many parameters!
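To make "many, many parameters" concrete, here is a rough count for a single fully connected layer applied to a flattened image. The image size and hidden-layer width are hypothetical values chosen only for illustration; they are not from the lecture.

```python
def dense_parameter_count(height, width, hidden_units):
    """Weights plus biases for one fully connected layer on a flattened image."""
    inputs = height * width          # every pixel becomes one input value
    weights = inputs * hidden_units  # each hidden neuron connects to every pixel
    biases = hidden_units            # one bias per hidden neuron
    return weights + biases


# Hypothetical sizes: a 100x100 grayscale image and 1000 hidden neurons.
print(dense_parameter_count(100, 100, 1000))  # 10001000
```

Even this small grayscale image needs about ten million parameters for one layer, and flattening discards which pixels were neighbours.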
- How can we use spatial structure in the input to inform the architecture of the network?
- Idea: connect patches of input to neurons in hidden layer
- Connect a patch in the input layer to a single neuron in the subsequent layer. Use a sliding window to define connections.
- How can we weight the patch to detect particular features?
- Applying Filters to Extract Features.
- 1) Apply a set of weights – a filter – to extract local features
- 2) Use multiple filters to extract different features
- 3) Spatially share parameters of each filter (features that matter in one part of the input should matter elsewhere)
- Feature Extraction with Convolution
- This “patchy” operation is convolution
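The sliding-window idea above can be sketched as a generator that walks over an image and yields one patch per position, where each patch would feed a single hidden neuron. This is a minimal illustration using plain Python lists; the function name `sliding_patches` and the 4x4 example image are assumptions made for this sketch.

```python
def sliding_patches(image, size, stride=1):
    """Yield (row, col, patch) for every size x size window in a 2D list."""
    rows, cols = len(image), len(image[0])
    for r in range(0, rows - size + 1, stride):
        for c in range(0, cols - size + 1, stride):
            # Cut out the window whose top-left corner is (r, c).
            patch = [row[c:c + size] for row in image[r:r + size]]
            yield r, c, patch


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
patches = list(sliding_patches(image, size=3))
print(len(patches))   # 4
print(patches[0][2])  # [[1, 2, 3], [5, 6, 7], [9, 10, 11]]
```

Because the same filter weights are applied to every patch, the number of learned weights depends on the filter size (9 for a 3x3 filter) rather than on the image size.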
Feature Extraction and Convolution
A Case Study
- Detect an "X" in an image regardless of its rotation
- The Convolution Operation
- We slide the 3x3 filter over the input image, element-wise multiply, and add the outputs:
- Padding (the padding argument of TensorFlow's conv2d)
- Pad the input so that the output keeps the same spatial shape as the input
- Producing Feature Maps
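The slide, multiply, and add procedure can be sketched in plain Python as follows. This is an illustrative implementation, not how TensorFlow computes it; the function name `conv2d`, the `padding` parameter (number of zero rings added around the image), and the example image and filter values are assumptions for this sketch. As in most deep learning libraries, the filter is applied without flipping.

```python
def conv2d(image, kernel, padding=0):
    """Slide a square kernel over a 2D list: multiply element-wise, then sum."""
    k = len(kernel)
    # Surround the image with `padding` rings of zeros.
    width = len(image[0]) + 2 * padding
    padded = [[0] * width for _ in range(padding)]
    padded += [[0] * padding + list(row) + [0] * padding for row in image]
    padded += [[0] * width for _ in range(padding)]

    feature_map = []
    for r in range(len(padded) - k + 1):
        out_row = []
        for c in range(width - k + 1):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += padded[r + i][c + j] * kernel[i][j]
            out_row.append(total)
        feature_map.append(out_row)
    return feature_map


image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
print(conv2d(image, kernel))  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]

same = conv2d(image, kernel, padding=1)
print(len(same), len(same[0]))  # 5 5
```

Without padding, the 5x5 input shrinks to a 3x3 feature map. With one ring of zeros, the output keeps the 5x5 shape of the input.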
Convolutional Neural Networks (CNNs)
- CNNs for Classification
- Convolution: Apply filters with learned weights to generate feature maps.
- Non-linearity: Often ReLU.
- Pooling: Downsampling operation on each feature map
- Train model with image data.
- Learn weights of filters in convolutional layers.
- Convolutional Layers: Local Connectivity
- For a neuron in hidden layer:
- Take inputs from patch
- Compute weighted sum
- Apply bias
- 1) applying a window of weights
- 2) computing linear combinations
- 3) activating with non-linear function
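The three steps for one hidden neuron can be written as a single expression. The symbols here are assumptions introduced for illustration: a k x k filter with weights w_{ij}, a bias b, input pixels x, a non-linear activation g, and (p, q) as the position of the neuron in the hidden layer.

```latex
\[
  z_{p,q} = \sum_{i=1}^{k} \sum_{j=1}^{k} w_{ij}\, x_{i+p,\; j+q} + b,
  \qquad
  a_{p,q} = g\left(z_{p,q}\right)
\]
```

The double sum is the window of weights applied to the patch, adding b completes the linear combination, and g supplies the non-linearity.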
- CNNs: Spatial Arrangement of Output Volume
- Layer Dimensions: h x w x d, where h and w are spatial dimensions and d (depth) = number of filters
- Stride: Filter step size
- Receptive Field: Locations in input image that a node is path connected to
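How filter size, stride, and padding determine the output volume can be sketched with the usual size formula, (input + 2 * padding - filter) / stride + 1, rounded down. The function names and the example sizes are assumptions for this sketch; real libraries expose padding through their own options.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Output length along one spatial dimension of a convolutional layer."""
    span = input_size + 2 * padding - filter_size
    if span < 0:
        raise ValueError("filter is larger than the padded input")
    # Floor division drops positions where the filter would run off the edge.
    return span // stride + 1


def conv_output_shape(h, w, filter_size, num_filters, stride=1, padding=0):
    """Return (h, w, d) of the output volume; depth d is the number of filters."""
    return (
        conv_output_size(h, filter_size, stride, padding),
        conv_output_size(w, filter_size, stride, padding),
        num_filters,
    )


print(conv_output_shape(32, 32, filter_size=5, num_filters=16))             # (28, 28, 16)
print(conv_output_shape(32, 32, filter_size=3, num_filters=16, padding=1))  # (32, 32, 16)
print(conv_output_shape(32, 32, filter_size=2, num_filters=8, stride=2))    # (16, 16, 8)
```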
- Introducing Non-linearity
- ReLU: pixel-by-pixel operation that replaces all negative values by zero. Non-linear operation!
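As a minimal sketch, ReLU on a feature map stored as a 2D list looks like this. The function name and the example values are chosen only for illustration.

```python
def relu(feature_map):
    """Replace every negative value in a 2D list with zero."""
    return [[max(0, value) for value in row] for row in feature_map]


print(relu([[-3, 2], [0, -1]]))  # [[0, 2], [0, 0]]
```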
- Pooling
- 1) Reduced dimensionality
- 2) Spatial invariance
- How else can we downsample and preserve spatial invariance?
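One common pooling choice is max pooling, which keeps the largest value in each window. The notes do not say which pooling variant is meant, so treat the choice of max pooling, the 2x2 window, and the stride of 2 as assumptions for this sketch.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Downsample a 2D list by taking the maximum of each size x size window."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        pooled_row = []
        for c in range(0, cols - size + 1, stride):
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


feature_map = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
print(max_pool2d(feature_map))  # [[6, 8], [3, 4]]
```

The 4x4 map shrinks to 2x2 (reduced dimensionality), and moving a strong activation by one pixel inside its window leaves the output unchanged (a limited form of spatial invariance).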
- CNNs for Classification: Feature Learning
- Learn features in input image through convolution
- Introduce non-linearity through activation function (real-world data is non-linear!)
- Reduce dimensionality and preserve spatial invariance with pooling
- CONV and POOL layers output high-level features of input
- Fully connected layer uses these features for classifying input image
- Express output as probability of image belonging to a particular class
- Train with Backpropagation: cross-entropy loss
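Expressing the output as class probabilities and scoring it with cross-entropy loss can be sketched as below. Softmax is assumed here as the way to turn the final layer's scores into probabilities; the scores and the true class index are made-up values for illustration.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, true_class):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probabilities[true_class])


scores = [2.0, 1.0, 0.1]  # hypothetical outputs of the fully connected layer
probs = softmax(scores)
loss = cross_entropy(probs, true_class=0)
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(round(loss, 3))                # 0.417
```

The loss shrinks toward zero as the probability of the correct class approaches 1. Backpropagation adjusts the filter and fully connected weights to reduce it.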
CNNs for Classification: ImageNet
- ImageNet Dataset
- ImageNet Challenge
- Classification task: produce a list of object categories present in image
- 1000 categories
- Human performance: 5.1% top-5 error
- 2012: AlexNet. First CNN to win.
- 8 layers, 61 million parameters
- 2013: ZFNet
- 8 layers, more filters
- 2014: VGG
- 19 layers
- 2014: GoogLeNet
- “Inception” modules
- 22 layers, 5 million parameters
- 2015: ResNet
- 152 layers
An Architecture for Many Applications
- Object detection with R-CNNs
- Segmentation with fully convolutional network
- FCN: Fully Convolutional Network.
- Network designed with all convolutional layers, with downsampling and upsampling operations
- Semantic Segmentation: FCNs
- Driving Scene Segmentation
- Object Detection with R-CNNs
- Image Captioning using RNNs
Deep Learning for Computer Vision: Impact and Summary
- Data, Data, Data
- Deep Learning for Computer Vision: Impact
- Impact: Face Detection
- Impact: Self-Driving Cars
- Impact: Healthcare