Computer Vision
https://teachablemachine.withgoogle.com/
https://portal.vision.cognitive.azure.com/gallery/featured
DNNs are Feature Extractors
What do computers 'see'?
Images are just numbers...
2-D Array of Pixel Values per Channel
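As a quick illustration (a minimal NumPy sketch; the image and its values are made up), an RGB image is just a stack of 2-D arrays of pixel intensities, one per channel:

```python
import numpy as np

# A hypothetical 4x4 RGB image: height x width x channels,
# with integer pixel intensities in [0, 255].
image = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

red_channel = image[:, :, 0]   # one 2-D array of red intensities
print(image.shape)             # (4, 4, 3)
print(red_channel)             # a 4x4 grid of plain numbers
```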
How do we extract features from images?
Consider the MNIST Dataset
Input dimension - 28 x 28 = 784 (flattened)
How many parameters?
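To see why fully connected layers get expensive here, count the weights of a single dense layer on the flattened 28 x 28 input (a minimal sketch; the hidden size of 100 is an arbitrary choice for illustration):

```python
input_dim = 28 * 28        # 784 flattened pixel values
hidden_dim = 100           # hypothetical hidden-layer size

weights = input_dim * hidden_dim   # one weight per (input, neuron) pair
biases = hidden_dim                # one bias per neuron
print(weights + biases)            # 78500 parameters for just one layer
```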
Using Local Spatial Structure
Key Idea: Connect parts of the image to a neuron in the next layer
Learn weights for this filter to detect specific features of interest
Learn multiple such filters!
The Convolution Operation
Elementwise multiplication, then sum within each channel
Add the totals for R, G, and B, then add a bias before the activation
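A minimal NumPy sketch of that operation at a single output location (the 3x3 filter size and random values are just for illustration): multiply elementwise within each channel, sum everything across R, G, and B, then add the bias before the activation.

```python
import numpy as np

def conv_at(patch, filt, bias):
    """Elementwise multiply a patch with the filter, sum over all
    positions and all channels (R, G, B), then add the bias."""
    return np.sum(patch * filt) + bias

patch = np.random.rand(3, 3, 3)   # a 3x3 RGB region of the input image
filt = np.random.rand(3, 3, 3)    # one learned filter, one slice per channel
bias = 0.1

pre_activation = conv_at(patch, filt, bias)
activation = max(0.0, pre_activation)   # e.g. ReLU applied afterwards
```

Sliding this over every location of the image produces one 2-D feature map; learning multiple filters produces multiple feature maps.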
The Convolution Operation
But what are these filters doing?
Let's visualize them!
https://deeplizard.com/resource/pavq7noze2
The Max-Pooling Operation
Downscaling outputs of Convolutions
How would we do this?
https://deeplizard.com/resource/pavq7noze3
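One natural answer is to keep only the largest value in each small window. A minimal NumPy sketch of 2x2 max-pooling with stride 2 (assumes nothing beyond NumPy; odd edges are simply dropped):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Downscale a 2-D feature map by taking the max of each 2x2 block."""
    h, w = feature_map.shape
    cropped = feature_map[:h - h % 2, :w - w % 2]   # drop any odd edge
    blocks = cropped.reshape(h // 2, 2, w // 2, 2)  # group into 2x2 blocks
    return blocks.max(axis=(1, 3))                  # max within each block

fm = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(fm))   # [[ 5.  7.]  [13. 15.]]
```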
Convolutional Neural Networks
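Putting the pieces together, a hedged PyTorch sketch of a small CNN for MNIST-sized inputs (the filter counts and layer sizes are arbitrary illustrative choices, not a prescribed architecture): convolutions extract local features, max-pooling downscales them, and a final dense layer classifies.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 16 learned 3x3 filters
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                   # 10 digit classes
)

logits = model(torch.randn(1, 1, 28, 28))        # one fake MNIST-sized image
print(logits.shape)                              # torch.Size([1, 10])
```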
Deep Reinforcement Learning
Q-Learning: Estimating Q-tables from data
$\hat{Q}^t_{opt}(s,a) = (1-\eta)\ \hat{Q}^{t-1}_{opt}(s,a) + \eta\Big[ r + \gamma \hat{V}_{opt}^{t-1}(s') \Big]$
$\hat{V}^t_{opt}(s) = \max_a \hat{Q}^t_{opt}(s,a)$
Optimal Policy: $$\pi^*(s) = \arg\max_a \hat{Q}_{opt}(s,a)$$
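A minimal sketch of these tabular updates (the small MDP sizes and the interaction loop are assumptions for illustration only):

```python
import numpy as np

n_states, n_actions = 10, 4          # a hypothetical small MDP
Q = np.zeros((n_states, n_actions))  # the Q-table estimate
eta, gamma = 0.1, 0.99               # learning rate and discount factor

def q_update(s, a, r, s_next):
    """One Q-learning step: blend the old estimate with the new target."""
    target = r + gamma * np.max(Q[s_next])        # r + gamma * V_opt(s')
    Q[s, a] = (1 - eta) * Q[s, a] + eta * target

def optimal_policy(s):
    """Act greedily with respect to the learned Q-table."""
    return np.argmax(Q[s])
```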
Deep Reinforcement Learning
Use a neural network to capture $\hat{Q}_{opt}(s,a)$
The Q Function
Visualizing the learned Q function at 400, 500, and 600 training iterations
Deep Q Networks (DQN)
Target
$r + \max_{a'} \gamma\ Q(s', a')$
Predicted
$\hat{Q}_{opt}(s, a)$
Q-Loss
$\mathbb{E}\Big[\|\text{Target}-\text{Predicted}\|^2\Big]$
$\mathbb{E}\Big[\|r + \max_{a'} \gamma\ Q(s', a') - \hat{Q}_{opt}(s, a)\|^2\Big]$
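A hedged PyTorch sketch of that loss on a batch of transitions (the `q_net` and the transition tensors are placeholders; the target is held fixed with `no_grad`, so only the predicted term is backpropagated):

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, states, actions, rewards, next_states, gamma=0.99):
    """MSE between the bootstrapped target and the predicted Q(s, a)."""
    predicted = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():                          # do not backprop through the target
        target = rewards + gamma * q_net(next_states).max(dim=1).values
    return F.mse_loss(predicted, target)
```

Full DQN additionally uses a replay buffer and a separate, slowly updated target network, omitted here for brevity.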
Deep Q Networks (DQN)
How do we use these networks to play the game?
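At play time the network itself defines the policy: feed in the current state and act greedily on the predicted Q-values, typically with some epsilon-greedy exploration during training. A minimal sketch, assuming a `q_net` like the one above:

```python
import random
import torch

def select_action(q_net, state, epsilon=0.05, n_actions=4):
    """Epsilon-greedy action selection from the Q-network's outputs."""
    if random.random() < epsilon:
        return random.randrange(n_actions)        # explore: random action
    with torch.no_grad():
        q_values = q_net(state.unsqueeze(0))      # add a batch dimension
    return int(q_values.argmax(dim=1).item())     # exploit: best Q-value
```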
DQN Atari Results
DQN - Limitations
Suited only to discrete action spaces
Cannot model stochastic policies
Enter
Policy Gradient Methods
Policy Gradient Methods
In Value Learning, we learn $Q(s,a)$
In Policy Learning, we learn $\pi(s)$
Sample $a \sim \pi(s)$
Policy Gradient Methods
Determining $\pi(s)$ from $\hat{Q}(s,a)$
Directly optimize $\pi(s)$
Sample the action with probabilities from $\pi(s)$
Policy Gradient Methods
Discrete vs. continuous action spaces
Policy Gradient Methods
Model Training
1. Initialize model
2. Run episode until termination
3. Updates similar to Q-Learning, via stochastic gradient descent
Policy Gradient Methods
Model Training
Loss function
$$\mathcal{L} = -\log P(a_t|s_t)R_t$$
Remind you of something?
Policy Gradient Methods
Model Training
$$\mathcal{L} = -\log P(a_t|s_t)R_t$$
Gradient Update
$w = w - \nabla_w \mathcal{L}$
$w = w + \nabla_w \log P(a_t|s_t) R_t$
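A hedged PyTorch sketch of one such update for a single timestep (the `policy_net` and `optimizer` are placeholders; in practice the returns $R_t$ come from a full episode). The optimizer's descent step on this loss applies exactly the $w = w + \nabla_w \log P(a_t|s_t) R_t$ rule above, scaled by the learning rate.

```python
import torch
from torch.distributions import Categorical

def reinforce_step(policy_net, optimizer, state, return_t):
    """One policy-gradient update with loss = -log P(a_t | s_t) * R_t."""
    probs = torch.softmax(policy_net(state), dim=-1)   # pi(s): action probabilities
    dist = Categorical(probs)
    action = dist.sample()                             # sample a ~ pi(s)
    loss = -dist.log_prob(action) * return_t           # weighted negative log-likelihood

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                   # gradient descent on the loss
    return action.item()
```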
RL in the Real World
Model Training
1. Initialize model
2. Run episode until termination
3. Updates similar to Q-Learning, via stochastic gradient descent
Is there a step here that may pose challenges?
CNNs learn powerful features
...so much so that sometimes, we can exploit them!
Remember Gradient Descent?
To attack a network, do gradient ascent on the input instead!
Fast Gradient Sign Method (FGSM)
$$ x_{adv} = x + \alpha\ \mathrm{sign}(\nabla_x L(f_\theta(x), y)) $$
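A hedged PyTorch sketch of FGSM (the model, input, label, and the step size `alpha` are placeholders): compute the loss gradient with respect to the input, then take one ascent step in its sign direction.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, alpha=0.03):
    """One ascent step on the input: x_adv = x + alpha * sign(grad_x L)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()                                   # gradient w.r.t. the input
    return (x_adv + alpha * x_adv.grad.sign()).detach()
```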
Projected Gradient Descent (PGD)
Repeated FGSM, but clip the resulting outputs to $[x-\epsilon, x+\epsilon]$
$$x_{adv}^{t+1} = \mathrm{Clip}_{x, \epsilon} \Big(x_{adv}^{t} + \alpha\ \mathrm{sign}(\nabla_x L(f_\theta(x_{adv}^{t}), y))\Big) $$
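A matching sketch of PGD, reusing the `fgsm_attack` sketch above (same assumptions): repeat the FGSM step and, after each one, clip the result back into the $\epsilon$-ball around the original input.

```python
import torch

def pgd_attack(model, x, y, alpha=0.01, epsilon=0.03, steps=10):
    """Iterated FGSM with projection onto [x - epsilon, x + epsilon]."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv = fgsm_attack(model, x_adv, y, alpha)                    # one ascent step
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)  # project back
    return x_adv.detach()
```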
Attacks can be:
1. targeted or untargeted
2. black-box or white-box
3. evasion or poisoning
4. digital or physically realizable