Deeplearning-Course3

Course: Deep Learning for Computer Vision

  • What computers see

    • Images are numbers (see the sketch at the end of this section)
    • Tasks in Computer Vision
      • Regression
      • Classification
    • High-level feature detection:
      • Identify the high-level features that are most informative for the task
    • Manual Feature Extraction
      • Domain knowledge
      • Define features
      • Detect features to classify
    • Learning Feature Representations
      • Can we learn a hierarchy of features directly from the data instead of hand engineering?
        • Low level features
        • Mid level features
        • High level features
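    • A minimal sketch of the "images are numbers" idea, assuming a hypothetical 4x4 grayscale image stored as a NumPy array (a color image would simply add a third, channel dimension):

```python
import numpy as np

# A grayscale image is just a 2D grid of numbers; these are made-up
# intensity values in [0, 255] for a hypothetical 4x4 image.
image = np.array([
    [  0,  50, 100, 150],
    [ 50, 100, 150, 200],
    [100, 150, 200, 250],
    [150, 200, 250, 255],
], dtype=np.uint8)

print(image.shape)      # (4, 4) -> height x width
print(image.flatten())  # the same image as a vector of pixel values
```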
  • Learning Visual Features

    • Fully connected Neural Network
    • Input
      • 2D image
      • Vector of pixel values
    • Fully Connected:
      • Connect neuron in hidden layer to all neurons in input layer
      • No spatial information!
      • And many, many parameters!
    • How can we use spatial structure in the input to inform the architecture of the network?
      • Idea: connect patches of input to neurons in hidden layer
      • Connect a patch in the input layer to a single neuron in the subsequent layer. Use a sliding window to define connections.
      • How can we weight the patch to detect particular features?
    • Applying Filters to Extract Features.
      • 1) Apply a set of weights – a filter – to extract local features
      • 2) Use multiple filters to extract different features
      • 3) Spatially share parameters of each filter (features that matter in one part of the input should matter elsewhere)
    • Feature Extraction with Convolution
    • This “patchy” operation is convolution
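    • A hedged sketch of why this "patchy", weight-sharing approach avoids the parameter blow-up noted above: a fully connected layer on a flattened image versus a small bank of shared 3x3 filters (the 28x28 input size and layer widths are illustrative assumptions, not from the lecture):

```python
import tensorflow as tf

# Fully connected: flatten the image and connect every pixel to every
# hidden neuron -> spatial structure is lost and parameters blow up.
fc = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
])

# Convolutional: a bank of 3x3 filters slid over the image; weights are
# shared across spatial locations, so the parameter count stays small.
conv = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu"),
])

print("fully connected params:", fc.count_params())   # 784*128 + 128 = 100,480
print("convolutional params:  ", conv.count_params()) # 32*(3*3*1 + 1) = 320
```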
  • Feature Extraction and Convolution: A Case Study

    • Goal: detect an X in the image regardless of its rotation
    • The Convolution Operation
      • We slide the 3x3 filter over the input image, element-wise multiply, and add the outputs:
      • In TensorFlow, this is tf.nn.conv2d (or tf.keras.layers.Conv2D), which takes a padding argument
      • Use padding="same" to keep the output the same spatial shape as the input (see the sketch after this list)
    • Producing Feature Maps
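    • A minimal sketch of the sliding-window operation described above, with a made-up 5x5 input and a hand-picked 3x3 filter; it also shows the padding="SAME" option keeping the output the same size as the input:

```python
import numpy as np
import tensorflow as tf

image = np.random.rand(5, 5).astype(np.float32)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=np.float32)  # a simple edge-like filter

# Slide the 3x3 filter over the image, element-wise multiply, and sum.
# Without padding ("valid"), the 5x5 input shrinks to a 3x3 feature map.
out = np.zeros((3, 3), dtype=np.float32)
for i in range(3):
    for j in range(3):
        patch = image[i:i + 3, j:j + 3]
        out[i, j] = np.sum(patch * kernel)

# The same operation with tf.nn.conv2d; padding="SAME" keeps the output
# spatial shape equal to the input's.
x = image.reshape(1, 5, 5, 1)   # batch, height, width, channels
w = kernel.reshape(3, 3, 1, 1)  # height, width, in channels, out channels
tf_valid = tf.nn.conv2d(x, w, strides=1, padding="VALID")
tf_same = tf.nn.conv2d(x, w, strides=1, padding="SAME")

print(np.allclose(out, tf_valid.numpy()[0, :, :, 0]))  # True
print(tf_same.shape)                                   # (1, 5, 5, 1)
```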
  • Convolutional Neural Networks (CNNs)

    • CNNs for Classification
        1. Convolution: Apply filters with learned weights to generate feature maps.
        2. Non-linearity: Often ReLU.
        3. Pooling: Downsampling operation on each feature map.
      • Train model with image data.
      • Learn weights of filters in convolutional layers.
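      • A minimal sketch of the three operations above applied in order to a batch of hypothetical 32x32 RGB images (the sizes and filter counts are illustrative assumptions):

```python
import tensorflow as tf

x = tf.random.normal((8, 32, 32, 3))  # a fake batch of 8 RGB images

feature_maps = tf.keras.layers.Conv2D(16, 3, padding="same")(x)  # 1) convolution
activated = tf.nn.relu(feature_maps)                             # 2) non-linearity
pooled = tf.keras.layers.MaxPooling2D(pool_size=2)(activated)    # 3) pooling

print(feature_maps.shape)  # (8, 32, 32, 16)
print(pooled.shape)        # (8, 16, 16, 16)
```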
    • Convolutional Layers: Local Connectivity
      • For a neuron in hidden layer:
        • Take inputs from patch
        • Compute weighted sum
        • Apply bias
      • 1) applying a window of weights
      • 2) computing linear combinations
      • 3) activating with non-linear function
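      • A minimal NumPy sketch of one locally connected neuron following the steps above (the patch size, weights, and bias are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((28, 28))  # hypothetical input image
weights = rng.random((4, 4))  # the neuron's window of weights
bias = 0.1

patch = image[0:4, 0:4]             # take inputs from a 4x4 patch
z = np.sum(patch * weights) + bias  # weighted sum, then apply bias
activation = max(0.0, z)            # activate with a non-linearity (ReLU)
print(activation)
```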
    • CNNs: Spatial Arrangement of Output Volume
      • Layer dimensions: h × w × d, where h and w are spatial dimensions and d (depth) = number of filters
      • Stride: Filter step size
      • Receptive Field: Locations in input image that a node is path connected to
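      • A small sketch of the standard output-size calculation for the spatial dimensions (the formula and example numbers follow the usual convention rather than the lecture itself):

```python
def conv_output_size(input_size, filter_size, stride, padding):
    """Spatial output size of a conv layer: (W - F + 2P) // S + 1."""
    return (input_size - filter_size + 2 * padding) // stride + 1

# e.g. a 32x32 input with 3x3 filters: stride 1 and no padding -> 30x30;
# stride 2 with padding 1 -> 16x16. The depth d of the output volume is
# simply the number of filters.
print(conv_output_size(32, 3, stride=1, padding=0))  # 30
print(conv_output_size(32, 3, stride=2, padding=1))  # 16
```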
    • Introducing Non-linearity
      • ReLU: pixel-by-pixel operation that replaces all negative values with zero. Non-linear operation!
    • Pooling
      • 1) Reduced dimensionality
      • 2) Spatial invariance
      • How else can we downsample and preserve spatial invariance?
      • Checkerboard artifacts (note: a term to look up)
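      • A minimal sketch of 2x2 max pooling on a single made-up feature map; each output value is the maximum over a 2x2 patch, halving each spatial dimension:

```python
import tensorflow as tf

fmap = tf.constant([[1., 3., 2., 0.],
                    [4., 6., 1., 1.],
                    [0., 2., 5., 7.],
                    [1., 2., 3., 4.]])
x = tf.reshape(fmap, (1, 4, 4, 1))  # batch, height, width, channels

pooled = tf.keras.layers.MaxPooling2D(pool_size=2)(x)
print(tf.reshape(pooled, (2, 2)).numpy())
# [[6. 2.]
#  [2. 7.]]
```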
    • CNNs for Classification: Feature Learning
        1. Learn features in input image through convolution
        2. Introduce non-linearity through activation function (real-world data is non-linear!)
        3. Reduce dimensionality and preserve spatial invariance with pooling
      • CONV and POOL layers output high-level features of input
      • Fully connected layer uses these features for classifying input image
      • Express output as probability of image belonging to a particular class
    • Train with Backpropagation: cross-entropy loss
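    • A hedged end-to-end sketch of the pipeline above: CONV/POOL feature learning, a fully connected head, class probabilities via softmax, and training by backpropagation with a cross-entropy loss (the input size, layer sizes, and 10 classes are illustrative assumptions):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    # feature learning: convolution + non-linearity + pooling
    tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(64, 3, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D(2),
    # classification head: fully connected layers on the learned features
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # class probabilities
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # cross-entropy loss
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # backpropagation happens here
```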
  • CNNs for Classification: ImageNet

    • ImageNet Dataset
    • ImageNet Challenge
      • Classification task: produce a list of object categories present in the image (1,000 categories)
      • Human performance: ~5.1% top-5 error
      • 2012: AlexNet. First CNN to win.
        • 8 layers, 61 million parameters
      • 2013: ZFNet
        • 8 layers, more filters
      • 2014: VGG
        • 19 layers
      • 2014: GoogLeNet
        • “Inception” modules: 22 layers, 5 million parameters
      • 2015: ResNet
        • 152 layers
  • An Architecture for Many Applications

    • Object detection with R-CNNs
    • Segmentation with fully convolutional network
    • FCN: Fully Convolutional Network
      • Network designed with all convolutional layers, with downsampling and upsampling operations (a minimal sketch at the end of this section)
    • Semantic Segmentation: FCNs
    • Driving Scene Segmentation
    • Object Detection with R-CNNs
    • Image Captioning using RNNs
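    • A minimal sketch of an FCN for semantic segmentation as described above: all convolutional layers, downsampling then upsampling back to the input resolution, giving a class prediction per pixel (the input size and 21 classes are illustrative assumptions):

```python
import tensorflow as tf

num_classes = 21  # assumed number of semantic classes

fcn = tf.keras.Sequential([
    tf.keras.Input(shape=(128, 128, 3)),
    # downsampling path (strided convolutions)
    tf.keras.layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(64, 3, strides=2, padding="same", activation="relu"),
    # upsampling path (transposed convolutions) back to full resolution
    tf.keras.layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu"),
    tf.keras.layers.Conv2DTranspose(num_classes, 3, strides=2, padding="same"),
    tf.keras.layers.Softmax(axis=-1),  # per-pixel class probabilities
])

print(fcn.output_shape)  # (None, 128, 128, 21): one class distribution per pixel
```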
  • Deep Learning for Computer Vision: Impact and Summary

    • Data, Data, Data
    • Deep Learning for Computer Vision: Impact
    • Impact: Face Detection
    • Impact: Self-Driving Cars
    • Impact: Healthcare