- Author: Yiqing Ma (Hong Kong University of Science and Technology), Deep Learning course
Course Requirement:
- In-class presentation
- We will send out a list of papers in a couple of days
- Send us three papers from the list that you would like to present, in prioritized order
- You may add a paper not in the list to your preference list
- If you do not send us the preference list, we will assign a paper to you
- Each student will present one paper for up to 12 minutes, followed by a 3-minute Q&A
- 5 students will present in each lecture
Face Recognition
- Low level features
- Mid level features
- High level features
The Perceptron: Forward Propagation
- Linear Combination of Inputs
- Non-linear activation function
- Example activation function: the sigmoid (see the sketch below)
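A minimal sketch of a single perceptron's forward pass, assuming NumPy; the weights and inputs are illustrative, not values from the lecture:

```python
import numpy as np

def perceptron_forward(x, w, b):
    """Forward pass of one perceptron: linear combination of inputs, then a sigmoid non-linearity."""
    z = np.dot(w, x) + b              # linear combination: w^T x + b
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid activation squashes z into (0, 1)

# Illustrative numbers only
y = perceptron_forward(x=np.array([1.0, 2.0]), w=np.array([0.5, -0.3]), b=0.1)
print(y)
```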
Common Activation Functions
- Sigmoid Function
- Hyperbolic Tangent
- Rectified Linear Unit (ReLU)
- Purpose: introduce non-linearities (see the examples after this list)
- Linear Activation functions produce linear decisions no matter the network size
- Non-linearities allow us to approximate arbitrarily complex functions
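As a quick illustration, all three activations above are available in TensorFlow; a small sketch with arbitrary input values:

```python
import tensorflow as tf

z = tf.constant([-2.0, 0.0, 2.0])  # arbitrary pre-activation values
s = tf.sigmoid(z)                  # squashes into (0, 1)
t = tf.tanh(z)                     # squashes into (-1, 1)
r = tf.nn.relu(z)                  # max(0, z): zero for negative inputs, identity for positive
```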
The Perceptron: Example (simple perceptron)
- Building Neural Networks with Perceptrons
- Single Layer Neural Network
Multi Output Perceptron
- Implementing this from scratch (e.g., in C++) is complex; with TensorFlow it can be done in a few lines, as sketched below.
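A minimal sketch of a multi-output (dense) layer in TensorFlow; the layer sizes and the use of `tf.keras.layers.Dense` are illustrative assumptions, not the course's exact code:

```python
import tensorflow as tf

# One dense layer mapping 4 inputs to 3 outputs: each output is its own perceptron
layer = tf.keras.layers.Dense(units=3, activation="sigmoid")

x = tf.ones([1, 4])   # a dummy batch: 1 example with 4 input features
y = layer(x)          # shape (1, 3): one activation per output unit
```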
Train a network with 1000 layers
- A very deep network
- More complex networks form a DAG (Directed Acyclic Graph) rather than a simple chain; see the sketch below
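A small sketch of a non-chain (DAG-shaped) network using the Keras functional API; the layer sizes and the skip connection are illustrative assumptions:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(64,))
h1 = tf.keras.layers.Dense(128, activation="relu")(inputs)
h2 = tf.keras.layers.Dense(128, activation="relu")(h1)
merged = tf.keras.layers.concatenate([h1, h2])   # skip connection: the graph branches and re-joins
outputs = tf.keras.layers.Dense(1)(merged)

model = tf.keras.Model(inputs=inputs, outputs=outputs)  # a directed acyclic graph of layers
```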
Question: will I pass this class?
- Probability
- [Predicted: 0.1]
- [Actual: 1]
- Objective Function
- Cost Function
- Empirical Risk
- Mean Squared Error (MSE) loss (formula given after the snippet below)
loss = tf.reduce_mean(tf.square(tf.subtract(model.y, model.pred)))
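The MSE loss above, viewed as an empirical risk over n training examples, can be written as (standard formulation, not copied from the slides):

$$ J(W) = \frac{1}{n}\sum_{i=1}^{n}\left(y^{(i)} - \hat{y}^{(i)}\right)^2 $$

where $y^{(i)}$ is the true label and $\hat{y}^{(i)}$ the network's prediction.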
- Training Neural Networks
- Loss Optimization
- The loss landscape is like a mountain range: initialize the weights at a random point on it.
- Then repeatedly take gradient-descent steps until convergence (a fuller sketch follows the snippet below).
weights = tf.Variable(tf.random_normal(shape, stddev=sigma))  # random weight initialization
grads = tf.gradients(ys=loss, xs=weights)[0]                  # gradient of the loss w.r.t. the weights
weights_new = weights.assign(weights - lr * grads)            # one gradient-descent step
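A fuller, self-contained sketch of the loop, assuming the TF 1.x API used above; the toy data, shapes, and learning rate are illustrative assumptions:

```python
import numpy as np
import tensorflow as tf  # TF 1.x-style API, matching the snippet above

# Toy data (illustrative only): targets follow y = 2*x1 - x2
X_data = np.random.rand(100, 2).astype(np.float32)
y_data = (2 * X_data[:, :1] - X_data[:, 1:]).astype(np.float32)

x = tf.placeholder(tf.float32, [None, 2])
y = tf.placeholder(tf.float32, [None, 1])
weights = tf.Variable(tf.random_normal([2, 1], stddev=0.1))  # random point on the loss landscape
pred = tf.matmul(x, weights)
loss = tf.reduce_mean(tf.square(y - pred))                   # MSE loss

lr = 0.1
grads = tf.gradients(ys=loss, xs=weights)[0]                 # compute the gradient
train_op = weights.assign(weights - lr * grads)              # take a step downhill

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(500):                                  # repeat until convergence
        _, current_loss = sess.run([train_op, loss], feed_dict={x: X_data, y: y_data})
```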
- Computing Gradients: Backpropagation
- Apply the chain rule repeatedly; in the vectorized form, one factor is a gradient (vector) and the other is a Jacobian (matrix), as in the decomposition below
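A sketch of the chain-rule decomposition for a single weight $w_1$, writing $z_1$ for the pre-activation it feeds and $\hat{y}$ for the network output (standard notation, not copied from the slides):

$$ \frac{\partial J(W)}{\partial w_1} = \frac{\partial J(W)}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z_1} \cdot \frac{\partial z_1}{\partial w_1} $$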
- Optimization through gradient descent
- A single fixed (stable) learning rate is hard to choose: too small converges slowly, too large overshoots
- Do something smarter! Design an adaptive learning rate that "adapts" to the landscape
- Adaptive Learning Rate Algorithms
tf.train.MomentumOptimizer
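A brief usage sketch with the TF 1.x optimizer API; the learning-rate and momentum values are illustrative, and `loss` is assumed to be defined as in the earlier snippets:

```python
import tensorflow as tf  # TF 1.x-style API

# Momentum keeps a running average of past gradients to smooth the descent direction
optimizer = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9)
train_op = optimizer.minimize(loss)  # `loss` assumed defined as in the earlier snippets

# Other adaptive optimizers in the same namespace include:
# tf.train.AdagradOptimizer, tf.train.AdadeltaOptimizer,
# tf.train.AdamOptimizer, tf.train.RMSPropOptimizer
```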
- Neural Networks in Practice: Mini-batches
- Computing the gradient over the entire training set can be very expensive!
- Stochastic gradient descent (easy to compute but noisy)
- Use mini-batches (fast to compute and a much better estimate of the true gradient!); see the sketch after this list
- Pros:
- More accurate estimation of gradient
- Smoother convergence
- Allows for larger learning rates
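A small sketch of mini-batching: shuffle the data each epoch and feed one slice at a time into the gradient-descent step. The array names, batch size, and random data are illustrative assumptions:

```python
import numpy as np

# Illustrative training data (in practice, the real training set)
X_train = np.random.rand(1000, 2).astype(np.float32)
y_train = np.random.rand(1000, 1).astype(np.float32)

batch_size = 32
for epoch in range(10):
    order = np.random.permutation(len(X_train))          # reshuffle every epoch
    for start in range(0, len(X_train), batch_size):
        idx = order[start:start + batch_size]
        X_batch, y_batch = X_train[idx], y_train[idx]     # one mini-batch
        # feed X_batch / y_batch into the gradient-descent step from the earlier sketch,
        # e.g. sess.run(train_op, feed_dict={x: X_batch, y: y_batch})
```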
- Neural Networks in Practice: Overfitting
- The Problem of Overfitting
- Regularization: dropout, early stopping (sketched below)
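A minimal sketch of both techniques; the keep probability, tensor shapes, and the early-stopping helper are illustrative assumptions, not the course's code:

```python
import tensorflow as tf  # TF 1.x-style dropout API

# Dropout: randomly zero activations during training so the network
# cannot rely too heavily on any single unit
activations = tf.random_normal([32, 128])            # stand-in for a hidden layer's output
dropped = tf.nn.dropout(activations, keep_prob=0.5)  # keep each unit with probability 0.5

# Early stopping: monitor validation loss and stop once it stops improving
def should_stop(val_losses, patience=5):
    """Return True if the last `patience` validation losses are all worse than the best before them."""
    if len(val_losses) <= patience:
        return False
    best_so_far = min(val_losses[:-patience])
    return all(l >= best_so_far for l in val_losses[-patience:])
```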