Deeplearning-Course2

Click here for the course website.

  • Author: Yiqing Ma (Hong Kong University of Science and Technology)
  • Deep Learning Course


Course Requirements:

  • In-class presentation
  • We will send out a list of papers in a couple of days
  • Send us three papers from the list that you would like to present, in prioritized order
  • You may add a paper not in the list to your preference list
  • If you do not send us the preference list, we will assign a paper to you
  • Each student will present one paper for up to 12 minutes, followed by a 3-minute Q&A
  • 5 students will present in each lecture
  • Face Recognition

    • Low level features
    • Mid level features
    • High level features
  • The Perceptron: Forward Propagation

    • Linear Combination of Inputs
    • Non-linear activation function
    • Activation Functions: sigmoid function
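A minimal sketch of the forward pass just described, in plain NumPy (the input, weight, and bias values are made up for illustration): a linear combination of the inputs plus a bias, passed through a sigmoid non-linearity.
import numpy as np

def sigmoid(z):
    # squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0])      # inputs (made-up values)
w = np.array([0.5, -1.0])     # weights (made-up values)
b = 0.1                       # bias

z = np.dot(w, x) + b          # linear combination of inputs
y = sigmoid(z)                # non-linear activation
print(z, y)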
  • Common Activation Functions

    • Sigmoid Function
    • Hyperbolic Tangent
    • Rectified Linear Unit (ReLU)
    • Purpose of activation functions: introduce non-linearities into the network
    • Linear Activation functions produce linear decisions no matter the network size
    • Non-linearities allow us to approximate arbitrarily complex functions
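The three activation functions listed above, written out as a small NumPy sketch (the pre-activation values are arbitrary illustration numbers):
import numpy as np

z = np.array([-2.0, 0.0, 3.0])        # example pre-activations

sigmoid = 1.0 / (1.0 + np.exp(-z))    # output in (0, 1)
tanh = np.tanh(z)                     # output in (-1, 1)
relu = np.maximum(0.0, z)             # max(0, z), zero for negative inputs

print(sigmoid, tanh, relu)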
  • The Perceptron: Example (simple perceptron)

  • Building Neural Networks with Perceptrons
  • Single Layer Neural Network
  • Multi Output Perceptron

    • Implementing this in C++ is very complex; with TensorFlow it can be done in a few lines (see the sketch below)
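A sketch of a single-layer, multi-output perceptron in TensorFlow, assuming the TF 1.x API used elsewhere in these notes; the layer sizes (2 inputs, 3 outputs) are arbitrary.
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 2])   # a batch of 2-dimensional inputs

# one dense layer = several perceptrons sharing the same inputs: y = g(xW + b)
W = tf.Variable(tf.random_normal([2, 3]))
b = tf.Variable(tf.zeros([3]))
y = tf.nn.sigmoid(tf.matmul(x, W) + b)            # 3 outputs per input example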
  • Train a network with 1000 layers

    • a huge deep network
    • more complex networks form a DAG (Directed Acyclic Graph); see the layer-stacking sketch below
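A sketch of stacking layers into a deeper network (TF 1.x style; the depth and widths are arbitrary illustration values, not literally 1000 layers). A simple chain like this is one special case of the DAG structure mentioned above.
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 2])

# each hidden layer feeds the next one; the whole graph of operations is a DAG
hidden = x
for _ in range(4):                                        # arbitrary depth
    hidden = tf.layers.dense(hidden, units=16, activation=tf.nn.relu)
output = tf.layers.dense(hidden, units=1)                 # final prediction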
  • Question: Will I pass this class?

    • Probability
    • [Predicted: 0.1]
    • [Actual: 1]
    • Objective Function
    • Cost Function
    • Empirical Risk
    • Mean Square Error Loss
loss = tf.reduce_mean(tf.square(tf.subtract(model.y, model.pred)))
  • Training Neural Networks
    • Loss Optimization
    • The loss landscape is like a mountain range; start from randomly initialized weights (a random point on that landscape).
    • Do gradient descent, repeat until convergence.
weights = tf.Variable(tf.random_normal(shape, stddev=sigma))  # random initialization
grads = tf.gradients(ys=loss, xs=[weights])[0]                # gradient of the loss w.r.t. the weights
weights_new = weights.assign(weights - lr * grads)            # step in the opposite direction of the gradient
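Putting the three lines above into the "repeat until convergence" loop, as a self-contained TF 1.x sketch; the toy data, learning rate, and number of steps are made up for illustration.
import numpy as np
import tensorflow as tf

# toy 1-D regression data (made up): y is roughly 2x + 1
x_data = np.array([[0.0], [1.0], [2.0], [3.0]], dtype=np.float32)
y_data = np.array([[1.0], [3.0], [5.0], [7.0]], dtype=np.float32)

x = tf.placeholder(tf.float32, shape=[None, 1])
y = tf.placeholder(tf.float32, shape=[None, 1])

weights = tf.Variable(tf.random_normal([1, 1], stddev=0.1))
pred = tf.matmul(x, weights)
loss = tf.reduce_mean(tf.square(y - pred))

lr = 0.05
grads = tf.gradients(ys=loss, xs=[weights])[0]
update = weights.assign(weights - lr * grads)     # one gradient descent step

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(200):                       # "repeat until convergence"
        _, current_loss = sess.run([update, loss],
                                   feed_dict={x: x_data, y: y_data})
    print(current_loss)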
  • Computing Gradients: Backpropagation
  • Apply the chain rule repeatedly from the loss back to each weight; at each step one factor is a vector and the other is a matrix (see the expression below).
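Written out for one weight (notation chosen here for illustration: J(W) is the loss, ŷ the prediction, z₁ the hidden unit's pre-activation, w₁ a weight feeding into it), the chain rule reads:
\frac{\partial J(W)}{\partial w_1} = \frac{\partial J(W)}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z_1} \cdot \frac{\partial z_1}{\partial w_1}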
  • Optimization through gradient descent
  • Use a stable learning rate: too small converges slowly, too large overshoots and may diverge
  • Do something smarter! Design an adaptive learning rate that “adapts” to the landscape
  • Adaptive Learning Rate Algorithms
tf.train.MomentumOptimizer 
tf.train.AdagradOptimizer
tf.train.AdadeltaOptimizer
tf.train.AdamOptimizer
tf.train.RMSPropOptimizer
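A sketch of how any of these optimizers is typically wired up in TF 1.x; the toy variable, loss, and learning rate here are made up just to have something to minimize.
import tensorflow as tf

w = tf.Variable(5.0)                  # toy parameter
loss = tf.square(w - 2.0)             # toy loss, minimized at w = 2

# any optimizer from the list above can be swapped in here
train_op = tf.train.AdamOptimizer(learning_rate=0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(200):
        sess.run(train_op)            # one adaptive gradient step
    print(sess.run(w))                # approaches 2.0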
  • Neural Networks in Practice: Mini-batches
    • Computing the gradient over the entire dataset can be very computationally expensive!
    • Stochastic gradient descent (easy to compute but noisy)
    • Use mini-batches (fast to compute and a much better estimate of the true gradient; see the sketch after the list below)
    • Pros:
  1. More accurate estimation of gradient
  2. Smoother convergence
  3. Allows for larger learning rates
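A sketch of mini-batch gradient descent in plain NumPy for a linear model with MSE loss; the dataset, batch size, and learning rate are made-up illustration values. Each update uses the gradient averaged over one small batch rather than over the whole dataset.
import numpy as np

# made-up dataset: 1000 examples, 10 features
X = np.random.randn(1000, 10).astype(np.float32)
Y = np.random.randn(1000, 1).astype(np.float32)

w = np.zeros((10, 1), dtype=np.float32)
lr, batch_size = 0.01, 32

indices = np.random.permutation(len(X))                   # shuffle once per epoch
for start in range(0, len(X), batch_size):
    idx = indices[start:start + batch_size]
    x_b, y_b = X[idx], Y[idx]                             # one mini-batch
    grad = 2.0 * x_b.T.dot(x_b.dot(w) - y_b) / len(x_b)   # MSE gradient, averaged over the batch
    w -= lr * grad                                         # one gradient descent step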
  • Neural Networks in Practice: Overfitting
    • The Problem of Overfitting
    • Regularization: dropout, early stopping (see the sketch below)
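A sketch of the two regularizers just mentioned, assuming the TF 1.x API used in these notes; the layer sizes, keep probability, and patience value are illustrative assumptions.
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 10])
keep_prob = tf.placeholder(tf.float32)                # e.g. 0.5 during training, 1.0 at test time

h = tf.layers.dense(x, units=64, activation=tf.nn.relu)
h_drop = tf.nn.dropout(h, keep_prob=keep_prob)        # dropout: randomly zero activations
out = tf.layers.dense(h_drop, units=1)

# early stopping (sketch): stop training when validation loss stops improving
best_val, patience, bad_epochs = float("inf"), 5, 0
# inside the training loop:
#     if val_loss < best_val:
#         best_val, bad_epochs = val_loss, 0
#     else:
#         bad_epochs += 1
#         if bad_epochs >= patience:
#             break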