Computer Vision
https://teachablemachine.withgoogle.com/
https://portal.vision.cognitive.azure.com/gallery/featured
DNNs are Feature Extractors
What do computers 'see'?
Images are just numbers...
2-D Array of Pixel Values per Channel
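As a quick illustration (a minimal NumPy sketch; the image and its values are made up), an RGB image is just a stack of 2-D arrays of pixel intensities, one per channel:

```python
import numpy as np

# A hypothetical 4x4 RGB image: height x width x channels,
# with integer pixel intensities in [0, 255].
image = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

red_channel = image[:, :, 0]   # one 2-D array of red intensities
print(image.shape)             # (4, 4, 3)
print(red_channel)             # a 4x4 grid of plain numbers
```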
How do we extract features from images?
Consider the MNIST Dataset
Input dimension - 28 x 28 = 784 (flattened)
How many parameters?
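To see why fully connected layers get expensive here, count the weights of a single dense layer on the flattened 28 x 28 input (a minimal sketch; the hidden size of 100 is an arbitrary choice for illustration):

```python
input_dim = 28 * 28        # 784 flattened pixel values
hidden_dim = 100           # hypothetical hidden-layer size

weights = input_dim * hidden_dim   # one weight per (input, neuron) pair
biases = hidden_dim                # one bias per neuron
print(weights + biases)            # 78500 parameters for just one layer
```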
Using Local Spatial Structure
Key Idea: Connect parts of the image to a neuron in the next layer
Learn weights for this filter to detect specific features of interest
Learn multiple such filters!
The Convolution Operation
Elementwise multiplication, then sum within each channel
Add the totals for R, G, and B, then add a bias before the activation
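A minimal NumPy sketch of that operation at a single output location (the 3x3 filter size and random values are just for illustration): multiply elementwise within each channel, sum everything across R, G, and B, then add the bias before the activation.

```python
import numpy as np

def conv_at(patch, filt, bias):
    """Elementwise multiply a patch with the filter, sum over all
    positions and all channels (R, G, B), then add the bias."""
    return np.sum(patch * filt) + bias

patch = np.random.rand(3, 3, 3)   # a 3x3 RGB region of the input image
filt = np.random.rand(3, 3, 3)    # one learned filter, one slice per channel
bias = 0.1

pre_activation = conv_at(patch, filt, bias)
activation = max(0.0, pre_activation)   # e.g. ReLU applied afterwards
```

Sliding this over every location of the image produces one 2-D feature map; learning multiple filters produces multiple feature maps.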
The Convolution Operation
But what are these filters doing?
Let's visualize them!
https://deeplizard.com/resource/pavq7noze2
The Max-Pooling Operation
Downscaling outputs of Convolutions
How would we do this?
https://deeplizard.com/resource/pavq7noze3
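One natural answer is to keep only the largest value in each small window. A minimal NumPy sketch of 2x2 max-pooling with stride 2 (assumes nothing beyond NumPy; odd edges are simply dropped):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Downscale a 2-D feature map by taking the max of each 2x2 block."""
    h, w = feature_map.shape
    cropped = feature_map[:h - h % 2, :w - w % 2]   # drop any odd edge
    blocks = cropped.reshape(h // 2, 2, w // 2, 2)  # group into 2x2 blocks
    return blocks.max(axis=(1, 3))                  # max within each block

fm = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(fm))   # [[ 5.  7.]  [13. 15.]]
```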
Convolutional Neural Networks
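Putting the pieces together, a hedged PyTorch sketch of a small CNN for MNIST-sized inputs (the filter counts and layer sizes are arbitrary illustrative choices, not a prescribed architecture): convolutions extract local features, max-pooling downscales them, and a final dense layer classifies.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 16 learned 3x3 filters
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                   # 10 digit classes
)

logits = model(torch.randn(1, 1, 28, 28))        # one fake MNIST-sized image
print(logits.shape)                              # torch.Size([1, 10])
```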
Deep Reinforcement Learning
Q-Learning: Estimating Q-tables from data
$\hat{Q}^t_{opt}(s,a) = (1-\eta)\ \hat{Q}^{t-1}_{opt}(s,a) + \eta\Big[ r + \gamma \hat{V}_{opt}^{t-1}(s') \Big]$
$\hat{V}^t_{opt}(s) = \max_a \hat{Q}^t_{opt}(s,a)$
Optimal Policy: $$\pi^*(s) = \arg\max_a \hat{Q}_{opt}(s,a)$$
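A minimal sketch of these tabular updates (the small MDP sizes and the interaction loop are assumptions for illustration only):

```python
import numpy as np

n_states, n_actions = 10, 4          # a hypothetical small MDP
Q = np.zeros((n_states, n_actions))  # the Q-table estimate
eta, gamma = 0.1, 0.99               # learning rate and discount factor

def q_update(s, a, r, s_next):
    """One Q-learning step: blend the old estimate with the new target."""
    target = r + gamma * np.max(Q[s_next])        # r + gamma * V_opt(s')
    Q[s, a] = (1 - eta) * Q[s, a] + eta * target

def optimal_policy(s):
    """Act greedily with respect to the learned Q-table."""
    return np.argmax(Q[s])
```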
Deep Reinforcement Learning
Use a neural network to capture $\hat{Q}_{opt}(s,a)$
The Q Function
Visualizing the learned Q function at 400, 500, and 600 training iterations
Deep Q Networks (DQN)
Target
$r + \max_{a'} \gamma\ Q(s', a')$
Predicted
$\hat{Q}_{opt}(s, a)$
Q-Loss
$\mathbb{E}\Big[\|\text{Target}-\text{Predicted}\|^2\Big]$
$\mathbb{E}\Big[\|r + \max_{a'} \gamma\ Q(s', a') - \hat{Q}_{opt}(s, a)\|^2\Big]$
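A hedged PyTorch sketch of that loss on a batch of transitions (the `q_net` and the transition tensors are placeholders; the target is held fixed with `no_grad`, so only the predicted term is backpropagated):

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, states, actions, rewards, next_states, gamma=0.99):
    """MSE between the bootstrapped target and the predicted Q(s, a)."""
    predicted = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():                          # do not backprop through the target
        target = rewards + gamma * q_net(next_states).max(dim=1).values
    return F.mse_loss(predicted, target)
```

Full DQN additionally uses a replay buffer and a separate, slowly updated target network, omitted here for brevity.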
Deep Q Networks (DQN)
How do we use these networks to play the game?
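At play time the network itself defines the policy: feed in the current state and act greedily on the predicted Q-values, typically with some epsilon-greedy exploration during training. A minimal sketch, assuming a `q_net` like the one above:

```python
import random
import torch

def select_action(q_net, state, epsilon=0.05, n_actions=4):
    """Epsilon-greedy action selection from the Q-network's outputs."""
    if random.random() < epsilon:
        return random.randrange(n_actions)        # explore: random action
    with torch.no_grad():
        q_values = q_net(state.unsqueeze(0))      # add a batch dimension
    return int(q_values.argmax(dim=1).item())     # exploit: best Q-value
```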
DQN Atari Results
DQN - Limitations
Suited only to discrete action spaces
Cannot model stochastic policies
Enter
Policy Gradient Methods
Policy Gradient Methods
In Value Learning, we learn $Q(s,a)$
In Policy Learning, we learn $\pi(s)$
Sample $a \sim \pi(s)$
Policy Gradient Methods
Determining $\pi(s)$ from $\hat{Q}(s,a)$
Directly optimize $\pi(s)$
Sample the action with probabilities from $\pi(s)$
Policy Gradient Methods
Discrete vs. continuous action spaces
Policy Gradient Methods
Model Training
1. Initialize model
2. Run episode until termination
3. Updates similar to Q-Learning, via stochastic gradient descent
Policy Gradient Methods
Model Training
Loss function
$$\mathcal{L} = -\log P(a_t|s_t)R_t$$
Remind you of something?
Policy Gradient Methods
Model Training
$$\mathcal{L} = -\log P(a_t|s_t)R_t$$
Gradient Update
$w = w - \nabla_w \mathcal{L}$
$w = w + \nabla_w \log P(a_t|s_t) R_t$
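A hedged PyTorch sketch of one such update for a single timestep (the `policy_net` and `optimizer` are placeholders; in practice the returns $R_t$ come from a full episode). The optimizer's descent step on this loss applies exactly the $w = w + \nabla_w \log P(a_t|s_t) R_t$ rule above, scaled by the learning rate.

```python
import torch
from torch.distributions import Categorical

def reinforce_step(policy_net, optimizer, state, return_t):
    """One policy-gradient update with loss = -log P(a_t | s_t) * R_t."""
    probs = torch.softmax(policy_net(state), dim=-1)   # pi(s): action probabilities
    dist = Categorical(probs)
    action = dist.sample()                             # sample a ~ pi(s)
    loss = -dist.log_prob(action) * return_t           # weighted negative log-likelihood

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                   # gradient descent on the loss
    return action.item()
```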
RL in the Real World
Model Training
1. Initialize model
2. Run episode until termination
3. Updates similar to Q-Learning, via stochastic gradient descent
Is there a step here that may pose challenges?
CNNs learn powerful features
...so much so that sometimes, we can exploit them!
Remember Gradient Descent?
To attack a network, do gradient ascent on the input instead!
Fast Gradient Sign Method (FGSM)
$$ x_{adv} = x + \alpha\ \mathrm{sign}(\nabla_x L(f_\theta(x), y)) $$
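A hedged PyTorch sketch of FGSM (the model, input, label, and the step size `alpha` are placeholders): compute the loss gradient with respect to the input, then take one ascent step in its sign direction.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, alpha=0.03):
    """One ascent step on the input: x_adv = x + alpha * sign(grad_x L)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()                                   # gradient w.r.t. the input
    return (x_adv + alpha * x_adv.grad.sign()).detach()
```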
Projected Gradient Descent (PGD)
Repeated FGSM, but clip the resulting outputs to $[x-\epsilon, x+\epsilon]$
$$x_{adv}^{t+1} = \mathrm{Clip}_{x, \epsilon} \Big(x_{adv}^{t} + \alpha\ \mathrm{sign}(\nabla_x L(f_\theta(x_{adv}^{t}), y))\Big) $$
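A matching sketch of PGD, reusing the `fgsm_attack` sketch above (same assumptions): repeat the FGSM step and, after each one, clip the result back into the $\epsilon$-ball around the original input.

```python
import torch

def pgd_attack(model, x, y, alpha=0.01, epsilon=0.03, steps=10):
    """Iterated FGSM with projection onto [x - epsilon, x + epsilon]."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv = fgsm_attack(model, x_adv, y, alpha)                    # one ascent step
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)  # project back
    return x_adv.detach()
```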
Attacks can be:
1. targeted or untargeted
2. black-box or white-box
3. evasion or poisoning
4. digital or physically realizable