Computer Vision

DNNs are Feature Extractors

What do computers 'see'?

Images are just numbers...

2-D array of pixel values per channel

How do we extract features from images?

Consider the MNIST Dataset

Input dimension - 28 x 28 = 784 (flattened)

How many parameters?
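To make the question concrete, here is a rough count for a fully connected network on flattened MNIST. The hidden width of 128 and the 10-way output layer are assumptions chosen for illustration, not values from the slides.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

input_dim = 28 * 28   # 784 flattened pixels
hidden_units = 128    # hypothetical hidden layer width
num_classes = 10      # MNIST digits 0-9

hidden_layer = dense_params(input_dim, hidden_units)
output_layer = dense_params(hidden_units, num_classes)

print(hidden_layer)                 # 100480
print(hidden_layer + output_layer)  # 101770
```

Even this small network spends about 100k parameters on the first layer alone, because every pixel connects to every hidden unit.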

Using Local Spatial Structure

Key Idea: Connect a small patch of the image to a single neuron in the next layer

Learn weights for this filter to detect specific features of interest

Learn multiple such filters!

The Convolution Operation

Multiply the filter elementwise with the image patch, then sum, separately for each channel

Add the per-channel totals for R, G and B together, then add the bias before applying the activation
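A minimal sketch of the steps above for a single filter, assuming stride 1, no padding, and ReLU as the activation (the slides do not fix any of these choices). The function name is hypothetical.

```python
import numpy as np

def conv2d_single_filter(image, kernel, bias=0.0):
    """image: (H, W, C); kernel: (kH, kW, C). Stride 1, no padding (assumed)."""
    H, W, C = image.shape
    kH, kW, kC = kernel.shape
    assert C == kC, "filter must have one slice per image channel"
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kH, j:j + kW, :]
            # Elementwise multiply, then sum within each channel
            per_channel = (patch * kernel).sum(axis=(0, 1))
            # Add the channel totals together, then add the bias
            out[i, j] = per_channel.sum() + bias
    # Activation (ReLU is an assumption)
    return np.maximum(out, 0.0)

image = np.ones((4, 4, 3))    # toy RGB image
kernel = np.ones((3, 3, 3))   # toy 3x3 filter
feature_map = conv2d_single_filter(image, kernel, bias=1.0)
print(feature_map.shape)      # (2, 2)
```

Each output value here is 27 products of 1 plus the bias of 1, so the feature map is filled with 28.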

But what are these filters doing?

Let's visualize them!

The Max-Pooling Operation

Downscaling outputs of Convolutions

How would we do this?
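One possible answer, sketched in NumPy: split the feature map into non-overlapping windows and keep the largest value in each. The 2 x 2 window size is an assumption, and the function name is hypothetical.

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping max pooling over a 2-D feature map."""
    H, W = x.shape
    H2, W2 = H // size, W // size
    # Drop any rows/columns that do not fill a complete window
    x = x[:H2 * size, :W2 * size]
    # Group pixels into (size x size) windows and take the max of each
    return x.reshape(H2, size, W2, size).max(axis=(1, 3))

feature_map = np.arange(16).reshape(4, 4)
pooled = max_pool2d(feature_map)
print(pooled)
# [[ 5  7]
#  [13 15]]
```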

Convolutional Neural Networks

Deep Reinforcement Learning

Q-Learning: Estimating Q-tables from data

$\hat{Q}^t_{opt}(s,a) = (1-\eta)\ \hat{Q}^{t-1}_{opt}(s,a) + \eta\Big[ r + \gamma \hat{V}_{opt}^{t-1}(s') \Big]$
$\hat{V}^t_{opt}(s) = \max_a \hat{Q}^t_{opt}(s,a)$

Optimal Policy

$$\pi^*(s) = \arg\max_a \hat{Q}(s,a)$$
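The update and the greedy policy above can be sketched for a small table. The table size, the step size `eta`, and the discount `gamma` below are illustrative assumptions.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, eta=0.1, gamma=0.9):
    """One tabular Q-learning step on the table Q of shape (num_states, num_actions)."""
    v_next = Q[s_next].max()  # V(s') = max_a Q(s', a)
    Q[s, a] = (1 - eta) * Q[s, a] + eta * (r + gamma * v_next)
    return Q

def greedy_policy(Q):
    """pi*(s) = argmax_a Q(s, a), computed for every state."""
    return Q.argmax(axis=1)

Q = np.zeros((2, 2))  # hypothetical 2-state, 2-action problem
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
print(Q[0, 1])            # 0.1
print(greedy_policy(Q))   # [1 0]
```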

Deep Reinforcement Learning

Use a neural network to capture $\hat{Q}_{opt}(s,a)$

The Q Function

Figure: panels at iterations 400, 500, and 600

Deep Q Networks (DQN)


Target: $r + \gamma \max_{a'} Q(s', a')$

Prediction: $\hat{Q}_{opt}(s, a)$

Loss: $\mathbb{E}\Big[\big\| r + \gamma \max_{a'} Q(s', a') - \hat{Q}_{opt}(s, a) \big\|^2\Big]$
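As a rough sketch, the expectation can be estimated by averaging the squared error over a batch of transitions. The arrays below stand in for network outputs; in practice they would come from forward passes of the Q-network. The function name and batch layout are assumptions.

```python
import numpy as np

def dqn_loss(q_pred, q_next, actions, rewards, gamma=0.99):
    """
    q_pred:  (B, A) predicted Q(s, .) for each transition
    q_next:  (B, A) Q(s', .) used to build the target
    actions: (B,)   action taken in each transition
    rewards: (B,)   reward received in each transition
    """
    targets = rewards + gamma * q_next.max(axis=1)      # r + gamma * max_a' Q(s', a')
    chosen = q_pred[np.arange(len(actions)), actions]   # Q(s, a) for the taken action
    return np.mean((targets - chosen) ** 2)             # batch estimate of the expectation

q_pred = np.array([[1.0, 2.0], [0.0, 0.0]])
q_next = np.array([[0.0, 1.0], [2.0, 0.0]])
actions = np.array([1, 0])
rewards = np.array([1.0, 0.0])
loss = dqn_loss(q_pred, q_next, actions, rewards, gamma=0.5)
print(loss)  # 0.625
```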

How do we use these networks to play the game?
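One common answer, sketched here under assumptions: feed the current state to the network and take the action with the highest predicted Q-value. The optional `epsilon` exploration below is a standard addition that the slides do not mention, and the function name is hypothetical.

```python
import numpy as np

def select_action(q_values, epsilon=0.0, rng=None):
    """Pick argmax_a Q(s, a); with probability epsilon pick a random action instead."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore (assumed extension)
    return int(np.argmax(q_values))              # exploit the learned Q-values

q_values = np.array([0.1, 0.9, 0.3])  # stand-in for the network output at state s
action = select_action(q_values)
print(action)  # 1
```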

DQN Atari Results

DQN - Limitations

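Suited only to discrete action spaces, since the max over $a'$ enumerates every action

Cannot model stochastic policies, since the policy is a deterministic argmax over Q-values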


Policy Gradient Methods

In Value Learning, we learn $Q(s,a)$

In Policy Learning, we learn $\pi(s)$

Sample $a \sim \pi(s)$

Policy Gradient Methods

Determine $\pi(s)$ from $\hat{Q}(s,a)$
Directly optimize $\pi(s)$
Sample the action according to the probabilities given by $\pi(s)$
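A minimal sketch of sampling $a \sim \pi(s)$, assuming the policy network outputs one logit per discrete action and a softmax turns them into probabilities. The logits below are made up for illustration.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

def sample_action(logits, rng):
    """Sample an action index with probabilities pi(s) = softmax(logits)."""
    probs = softmax(logits)
    action = int(rng.choice(len(probs), p=probs))
    return action, probs

rng = np.random.default_rng(0)
logits = np.array([2.0, 0.5, -1.0])  # hypothetical policy network output
action, probs = sample_action(logits, rng)
print(action, probs)
```

Unlike the argmax used in value learning, repeated calls can return different actions for the same state.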

Policy Gradient Methods

Discrete vs. continuous action spaces

Policy Gradient Methods

Model Training

  • Initialize model

  • Run episode until termination

  • Updates similar to Q-Learning

  • Stochastic Gradient Descent

Policy Gradient Methods

Model Training

Loss function

$$\mathcal{L} = -\log P(a_t|s_t)R_t$$

Remind you of something?
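A sketch of the loss for one episode. The slides do not define $R_t$; the code assumes it is the discounted return from step $t$ onward, and the discount `gamma` is likewise an assumption.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """R_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... (assumed definition)."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def policy_gradient_loss(action_probs, returns):
    """Per-step loss -log P(a_t | s_t) * R_t."""
    return -np.log(action_probs) * returns

rewards = [1.0, 1.0, 1.0]
returns = discounted_returns(rewards, gamma=0.5)
print(returns)  # [1.75 1.5  1.  ]

# Probabilities the policy assigned to the actions actually taken (made up)
action_probs = np.array([0.5, 0.8, 0.9])
losses = policy_gradient_loss(action_probs, returns)
print(losses)
```

The $-\log P$ term has the same form as a cross-entropy loss on the taken action, weighted by how well that action turned out.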

Gradient Update

$w = w - \nabla_w \mathcal{L}$

$w = w + \nabla_w \log P(a_t|s_t) R_t$
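The update can be sketched for a linear softmax policy, where the gradient of $\log P(a_t|s_t)$ has a closed form. The linear policy, the weight shape, and the step size `lr` are assumptions for illustration; the update written above uses an implicit step size of 1.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def policy_gradient_step(W, state, action, ret, lr=1.0):
    """
    One update w <- w + lr * grad_w log P(a|s) * R for a linear softmax policy.
    W: (num_actions, state_dim), state: (state_dim,), ret: return R_t.
    """
    probs = softmax(W @ state)
    # d log P(a|s) / d logits = onehot(a) - probs
    grad_logits = -probs
    grad_logits[action] += 1.0
    grad_W = np.outer(grad_logits, state)
    return W + lr * ret * grad_W

W = np.zeros((2, 2))
state = np.array([1.0, 0.0])
W_new = policy_gradient_step(W, state, action=0, ret=1.0)
print(softmax(W @ state)[0])      # 0.5 before the update
print(softmax(W_new @ state)[0])  # larger than 0.5 after a positive return
```

A positive return raises the probability of the action that was taken; a negative return would lower it.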

RL in the Real World

Model Training

  • Initialize model

  • Run episode until termination

  • Updates similar to Q-Learning

  • Stochastic Gradient Descent

Is there a step here that may pose challenges?

CNNs learn features so powerful that sometimes we can exploit them!

Remember Gradient Descent?

To attack a network, do gradient ascent instead!

Fast Gradient Sign Method (FGSM)

$$ x_{adv} = x + \alpha\ \mathrm{sign}(\nabla_x L(f_\theta(x), y)) $$
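A minimal sketch of the FGSM step. To keep it self-contained, a logistic regression model stands in for $f_\theta$ and binary cross-entropy for $L$, so the input gradient can be written by hand; with a CNN the gradient would come from automatic differentiation. All weights and inputs below are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad_x(x, y, w, b):
    """Gradient of binary cross-entropy w.r.t. the input x for a logistic model."""
    return (sigmoid(w @ x + b) - y) * w

def fgsm(x, y, w, b, alpha=0.1):
    """x_adv = x + alpha * sign(grad_x L): one gradient ascent step on the loss."""
    return x + alpha * np.sign(loss_grad_x(x, y, w, b))

w = np.array([1.0, -2.0])  # hypothetical trained weights
b = 0.0
x = np.array([0.5, 0.5])
y = 1.0
x_adv = fgsm(x, y, w, b, alpha=0.1)
print(x_adv)  # [0.4 0.6]
```

Every input dimension moves by exactly `alpha`, in the direction that increases the loss.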

Projected Gradient Descent (PGD)

Repeat FGSM steps, clipping the result after each step to $[x-\epsilon, x+\epsilon]$

$$x^{(t+1)}_{adv} = \mathrm{Clip}_{x, \epsilon} \Bigg(x^{(t)}_{adv} + \alpha\ \mathrm{sign}(\nabla_x L(f_\theta(x^{(t)}_{adv}), y))\Bigg), \quad x^{(0)}_{adv} = x $$
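A sketch of the iteration, again assuming a logistic regression model in place of $f_\theta$ so the gradient is available in closed form. The step size, $\epsilon$, and number of steps are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad_x(x, y, w, b):
    """Gradient of binary cross-entropy w.r.t. the input x for a logistic model."""
    return (sigmoid(w @ x + b) - y) * w

def pgd(x, y, w, b, alpha=0.05, epsilon=0.1, steps=10):
    """Repeated sign-gradient steps, clipped to [x - epsilon, x + epsilon] each time."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(loss_grad_x(x_adv, y, w, b))
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)
    return x_adv

w = np.array([1.0, -2.0])  # hypothetical trained weights
b = 0.0
x = np.array([0.5, 0.5])
y = 1.0
x_adv = pgd(x, y, w, b)
print(x_adv)  # [0.4 0.6]
```

However many steps are taken, the clip keeps the perturbation within `epsilon` of the original input in every dimension.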

Attacks can be:

    1. targeted or untargeted

    2. black-box or white-box

    3. evasion or poisoning

    4. digital or physically realizable