Deeplearning-Course3

Course: Deep Learning for Computer Vision

What the computer sees
- Images are numbers
- Tasks in Computer Vision
  - Regression
  - Classification
- High-level feature detection:
  - Learn features that describe the image at a high level.
- Manual Feature Extraction
  - Domain knowledge
  - Define features
  - Detect features to classify
- Learning Feature Representations
  - Can we learn a hierarchy of features directly from the data instead of hand-engineering them?
    - Low-level features
    - Mid-level features
    - High-level features
Learning Visual Features
- Fully connected Neural Network
- Input
  - 2D image
  - Vector of pixel values
- Fully Connected:
  - Connect neuron in hidden layer to all neurons in input layer
  - No spatial information!
  - And many, many parameters!
- How can we use spatial structure in the input to inform the architecture of the network?
  - Idea: connect patches of input to neurons in hidden layer
  - Connect a patch in the input layer to a single neuron in the subsequent layer. Use a sliding window to define connections.
  - How can we weight the patch to detect particular features?
- Applying Filters to Extract Features.
  - 1) Apply a set of weights – a filter – to extract local features
  - 2) Use multiple filters to extract different features
  - 3) Spatially share parameters of each filter (features that matter in one part of the input should matter elsewhere)
- Feature Extraction with Convolution
- This “patchy” operation is convolution
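
To make the "many, many parameters" point concrete, here is a minimal sketch that counts learnable parameters for a fully connected layer versus a convolutional layer whose filters are shared across positions. The image size, hidden-layer width, and filter count are hypothetical values chosen only for illustration, and the helper names are not from the course.

```python
def dense_params(height, width, hidden_units):
    """Weights plus biases when every hidden neuron sees every pixel."""
    inputs = height * width
    return inputs * hidden_units + hidden_units


def conv_params(kernel_size, in_channels, num_filters):
    """Weights plus biases when each filter is shared across all positions."""
    return kernel_size * kernel_size * in_channels * num_filters + num_filters


# Hypothetical sizes: a 100x100 grayscale image, 1000 hidden neurons,
# and 32 filters of size 3x3.
fc = dense_params(100, 100, 1000)
cv = conv_params(3, 1, 32)
print(fc, cv)  # 10001000 320
```

Because the filter weights are reused at every position, the convolutional count does not depend on the image size at all.
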
Feature Extraction and Convolution: A Case Study
- Detect an "X" regardless of how it is rotated
- The Convolution Operation
  - We slide the 3x3 filter over the input image, multiply element-wise, and add the outputs.
  - TensorFlow: the padding argument of conv2d controls the output size.
  - Use padding to keep the output shape the same as the input shape.
- Producing Feature Maps
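
The slide, multiply, and add steps above can be sketched in plain Python. This is an illustrative implementation, not TensorFlow's; the function name `convolve2d`, the 5x5 "X" image, and the diagonal filter are assumptions made for the example. As in most deep learning libraries, the filter is not flipped, so strictly speaking this computes cross-correlation.

```python
def convolve2d(image, kernel, padding=0, stride=1):
    """Slide a square kernel over image (lists of lists); return the feature map."""
    k = len(kernel)
    # Zero-pad the border so the output can keep the input's size.
    width = len(image[0]) + 2 * padding
    padded = [[0] * width for _ in range(padding)]
    for row in image:
        padded.append([0] * padding + list(row) + [0] * padding)
    padded += [[0] * width for _ in range(padding)]

    feature_map = []
    for i in range(0, len(padded) - k + 1, stride):
        out_row = []
        for j in range(0, width - k + 1, stride):
            # Element-wise multiply the patch by the kernel and add the results.
            total = 0
            for di in range(k):
                for dj in range(k):
                    total += padded[i + di][j + dj] * kernel[di][dj]
            out_row.append(total)
        feature_map.append(out_row)
    return feature_map


# A 5x5 "X" drawn with +1 on the strokes and -1 elsewhere.
image = [
    [ 1, -1, -1, -1,  1],
    [-1,  1, -1,  1, -1],
    [-1, -1,  1, -1, -1],
    [-1,  1, -1,  1, -1],
    [ 1, -1, -1, -1,  1],
]
# A 3x3 filter that responds to a top-left to bottom-right diagonal stroke.
diagonal = [
    [ 1, -1, -1],
    [-1,  1, -1],
    [-1, -1,  1],
]

valid = convolve2d(image, diagonal)            # no padding: 3x3 output
same = convolve2d(image, diagonal, padding=1)  # padding of 1: 5x5 output
print(valid[0][0])  # 9, the patch matches the filter exactly
```

With a 3x3 filter and stride 1, one pixel of zero padding on each side keeps the feature map the same size as the input.
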
Convolutional Neural Networks (CNNs)
- CNNs for Classification
  - Convolution: Apply filters with learned weights to generate feature maps.
  - Non-linearity: Often ReLU.
  - Pooling: Downsampling operation on each feature map
  - Train model with image data.
  - Learn weights of filters in convolutional layers.
- Convolutional Layers: Local Connectivity
  - For a neuron in hidden layer:
    - Take inputs from patch
    - Compute weighted sum
    - Apply bias
  - 1) applying a window of weights
  - 2) computing linear combinations
  - 3) activating with non-linear function
- CNNs: Spatial Arrangement of Output Volume
  - Layer dimensions: h x w x d, where h and w are the spatial dimensions and d (depth) is the number of filters
  - Stride: Filter step size
  - Receptive Field: Locations in input image that a node is path connected to
- Introducing Non-linearity
  - ReLU: pixel-by-pixel operation that replaces all negative
    values by zero. Non-linear operation!
- Pooling
  - 1) Reduced dimensionality
  - 2) Spatial invariance
  - How else can we downsample and preserve spatial invariance?
- CNNs for Classification: Feature Learning
  - Learn features in input image through convolution
  - Introduce non-linearity through activation function (real-world data is non-linear!)
  - Reduce dimensionality and preserve spatial invariance with pooling
  - CONV and POOL layers output high-level features of input
  - Fully connected layer uses these features for classifying input image
  - Express output as probability of image belonging to a particular class
- Train with Backpropagation: cross-entropy loss
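
The stages after convolution (ReLU, pooling, a fully connected layer, a probability output, and cross-entropy loss) can be sketched as follows. This is a minimal forward pass under stated assumptions: the feature map, the fully connected weights, and all function names are hypothetical, and training by backpropagation is not shown. The output-size helper uses the standard formula floor((n + 2p - k) / s) + 1 for input size n, window size k, padding p, and stride s.

```python
import math


def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window."""
    return (n + 2 * padding - kernel) // stride + 1


def relu(feature_map):
    """Replace every negative value with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    rows = range(0, len(feature_map) - size + 1, stride)
    cols = range(0, len(feature_map[0]) - size + 1, stride)
    return [[max(feature_map[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in cols] for i in rows]


def softmax(scores):
    """Turn class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[true_class])


# Hypothetical 4x4 feature map produced by some convolutional filter.
feature_map = [
    [ 1.0, -2.0,  3.0,  0.0],
    [-1.0,  5.0, -3.0,  2.0],
    [ 0.5, -0.5,  4.0, -4.0],
    [-6.0,  2.5, -1.0,  1.0],
]
pooled = max_pool(relu(feature_map))  # [[5.0, 3.0], [2.5, 4.0]]

# Flatten and apply a fully connected layer with made-up weights (2 classes).
flat = [v for row in pooled for v in row]
weights = [[0.1, 0.0, -0.1, 0.2], [-0.2, 0.1, 0.3, 0.0]]
scores = [sum(w * x for w, x in zip(ws, flat)) for ws in weights]
probs = softmax(scores)
loss = cross_entropy(probs, true_class=0)
print(pooled, probs, loss)
```

Pooling halves each spatial dimension here, which matches `conv_output_size(4, 2, stride=2) == 2`.
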
CNNs for Classification: ImageNet
- ImageNet Dataset
- ImageNet Challenge
  - Classification task: produce a list of object categories present in image (1000 categories).
  - Human performance: 5.1% top-5 error
  - 2012: AlexNet. First CNN to win.
    - 8 layers, 61 million parameters
  - 2013: ZFNet
    - 8 layers, more filters
  - 2014: VGG
    - 19 layers
  - 2014: GoogLeNet
    - “Inception” modules, 22 layers, 5 million parameters
  - 2015: ResNet
    - 152 layers
An Architecture for Many Applications
- Object detection with R-CNNs
- Segmentation with fully convolutional networks
- FCN (Fully Convolutional Network): a network designed with all convolutional layers, with downsampling and upsampling operations
- Semantic Segmentation: FCNs
- Driving Scene Segmentation
- Object Detection with R-CNNs
- Image Captioning using RNNs
Deep Learning for Computer Vision: Impact and Summary
- Data, Data, Data
- Deep Learning for Computer Vision: Impact
- Impact: Face Detection
- Impact: Self-Driving Cars
- Impact: Healthcare