Computer Vision



https://teachablemachine.withgoogle.com/
https://portal.vision.cognitive.azure.com/gallery/featured

DNNs are Feature Extractors


What do computers 'see'?


Images are just numbers...


2-D Array of Pixel Values (per channel)

How do we extract features from images?



Consider the MNIST Dataset


Input dimension: 28 × 28 = 784 (flattened)


How many parameters?
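As a rough worked example (the 100-unit hidden layer is an assumed size, not from the slides): a single fully connected layer from the 784 flattened pixels to 100 hidden units already needs

$$784 \times 100 + 100 = 78{,}500 \text{ parameters}$$

and none of those weights exploit the spatial structure of the image.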

Using Local Spatial Structure



Key Idea: Connect a local patch of the image to a single neuron in the next layer

Using Local Spatial Structure


Key Idea: Connect a local patch of the image to a single neuron in the next layer

Learn weights for this filter
to detect specific features of interest

Using Local Spatial Structure


Learn weights for this filter
to detect specific features of interest


Learn multiple such filters!

The Convolution Operation


Elementwise multiplication, then sum for each channel


Add the totals for the R, G, and B channels, then add a bias before applying the activation
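A minimal sketch of this operation in NumPy (the image size, filter values, and ReLU activation are illustrative assumptions):

```python
import numpy as np

def conv2d_single(image, kernel, bias=0.0):
    """Valid convolution of a (H, W, C) image with one (kH, kW, C) filter,
    producing a single (H-kH+1, W-kW+1) feature map."""
    H, W, C = image.shape
    kH, kW, _ = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kH, j:j + kW, :]       # local patch, all channels
            out[i, j] = np.sum(patch * kernel) + bias  # elementwise multiply, sum over R, G, B, add bias
    return np.maximum(out, 0.0)                        # activation (ReLU here)

# Example: random 8x8 RGB image, one 3x3 filter
image = np.random.rand(8, 8, 3)
kernel = np.random.randn(3, 3, 3)
feature_map = conv2d_single(image, kernel)             # shape (6, 6)
```

A CNN learns many such kernels per layer; stacking their feature maps gives the multi-channel input to the next layer.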

The Convolution Operation



But what are these filters doing?



Let's visualize them!

https://deeplizard.com/resource/pavq7noze2

The Max-Pooling Operation



Downscaling the outputs of convolutions

How would we do this?

https://deeplizard.com/resource/pavq7noze3
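One way to implement 2×2 max-pooling with stride 2 (a sketch; odd-sized inputs are handled here by dropping the last row/column):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Downscale a (H, W) feature map by taking the max over each 2x2 block."""
    H, W = feature_map.shape
    cropped = feature_map[:H - H % 2, :W - W % 2]   # drop odd row/column if present
    blocks = cropped.reshape(H // 2, 2, W // 2, 2)  # group into 2x2 blocks
    return blocks.max(axis=(1, 3))                  # max within each block

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(x))   # [[ 5.  7.]
                         #  [13. 15.]]
```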

Convolutional Neural Networks

Deep Reinforcement Learning



Deep Reinforcement Learning



Q-Learning: Estimating Q-tables from data



$\hat{Q}^t_{opt}(s,a) = (1-\eta)\ \hat{Q}^{t-1}_{opt}(s,a) + \eta\Big[ r + \gamma \hat{V}_{opt}^{t-1}(s') \Big]$
$\hat{V}^t_{opt}(s) = \max_a \hat{Q}^t_{opt}(s,a)$



Optimal Policy: $$\pi^*(s) = \arg\max_a \hat{Q}_{opt}(s,a)$$
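A minimal sketch of the tabular update above (the 5-state, 2-action table and the transition values are made-up examples):

```python
import numpy as np

def q_update(Q, s, a, r, s_next, eta=0.1, gamma=0.99):
    """Q(s,a) <- (1 - eta) Q(s,a) + eta [r + gamma * V(s')], with V(s') = max_a' Q(s', a')."""
    v_next = np.max(Q[s_next])                       # \hat{V}_opt(s')
    Q[s, a] = (1 - eta) * Q[s, a] + eta * (r + gamma * v_next)
    return Q

Q = np.zeros((5, 2))                                 # Q-table over 5 states and 2 actions
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)           # one observed transition (s, a, r, s')
policy = np.argmax(Q, axis=1)                        # pi*(s) = argmax_a Q(s, a)
```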

Deep Reinforcement Learning



Use a neural network to capture $\hat{Q}_{opt}(s,a)$



The Q Function



The Q Function



Estimated Q-function after 400, 500, and 600 iterations

Deep Q Networks (DQN)



Deep Q Networks (DQN)



Deep Q Networks (DQN)



Target

$r + \gamma \max_{a'} Q(s', a')$


Predicted

$\hat{Q}_{opt}(s, a)$


Deep Q Networks (DQN)



Target

$r + \gamma \max_{a'} Q(s', a')$


Predicted

$\hat{Q}_{opt}(s, a)$


Q-Loss

$\mathbb{E}\Big[||\text{Target}-\text{Predicted}||^2\Big]$

$\mathbb{E}\Big[||r + \gamma \max_{a'} Q(s', a') - \hat{Q}_{opt}(s, a)||^2\Big]$
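A sketch of this loss in PyTorch (the network sizes, the 4-dim state / 2-action setup, and the use of a separate frozen target network are illustrative assumptions, not from the slides):

```python
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))       # hypothetical Q-network
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))  # copy used for the target
target_net.load_state_dict(q_net.state_dict())

def q_loss(s, a, r, s_next, gamma=0.99):
    """E[ || r + gamma max_a' Q(s', a') - Q_hat(s, a) ||^2 ]"""
    predicted = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # \hat{Q}_opt(s, a)
    with torch.no_grad():                                       # target is held fixed
        target = r + gamma * target_net(s_next).max(dim=1).values
    return nn.functional.mse_loss(predicted, target)

# One batch of fake transitions (s, a, r, s')
s = torch.randn(32, 4); a = torch.randint(0, 2, (32,))
r = torch.randn(32); s_next = torch.randn(32, 4)
loss = q_loss(s, a, r, s_next)
loss.backward()
```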

Deep Q Networks (DQN)


How do we use these networks to play the game?

Deep Q Networks (DQN)


DQN Atari Results


DQN - Limitations



Suited only to discrete action spaces

Cannot model stochastic policies


Enter

Policy Gradient Methods

Policy Gradient Methods



In Value Learning, we learn $Q(s,a)$


In Policy Learning, we learn $\pi(s)$


Sample $a \sim \pi(s)$


Policy Gradient Methods



Determining $\pi(s)$ from $\hat{Q}(s,a)$
Directly optimize $\pi(s)$
Sample the action with probabilities from $\pi(s)$

Policy Gradient Methods

Discrete vs. continuous action spaces



Policy Gradient Methods

Discrete vs. continuous action spaces



Policy Gradient Methods

Model Training



  • Initialize model

  • Run episode until termination

  • Updates similar to Q-Learning


  • Stochastic Gradient Descent

Policy Gradient Methods

Model Training



Loss function

$$\mathcal{L} = -\log P(a_t|s_t)R_t$$

Remind you of something?
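A sketch of this loss for a discrete policy network in PyTorch (the network shape and the way returns are supplied are illustrative assumptions):

```python
import torch
import torch.nn as nn

policy_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))  # hypothetical: 4-dim state, 2 actions

def pg_loss(states, actions, returns):
    """L = - log P(a_t | s_t) * R_t, averaged over the episode."""
    logits = policy_net(states)
    log_probs = torch.log_softmax(logits, dim=1)                    # log P(a | s) for every action
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)   # log P(a_t | s_t)
    return -(chosen * returns).mean()

states = torch.randn(10, 4)
actions = torch.randint(0, 2, (10,))
returns = torch.randn(10)                                           # discounted returns R_t
loss = pg_loss(states, actions, returns)
loss.backward()
```

(The resemblance the slide is hinting at is presumably the cross-entropy loss from supervised classification, here weighted by the return $R_t$.)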

Policy Gradient Methods

Model Training

$$\mathcal{L} = -\log P(a_t|s_t)R_t$$

Gradient Update

$w = w - \eta\ \nabla_w \mathcal{L}$

$w = w + \eta\ \nabla_w \log P(a_t|s_t)\ R_t$

$w = w + \eta\ \textcolor{#BA8CA4}{\nabla_w \log P(a_t|s_t)\ R_t}$
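Putting the pieces together, a sketch of one REINFORCE-style training episode (the toy environment, its reset/step interface, and all sizes are assumed placeholders, not a specific library API):

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class DummyEnv:
    """Toy stand-in environment: random 4-dim states, reward 1 per step, 10-step episodes."""
    def reset(self):
        self.t = 0
        return torch.randn(4)
    def step(self, action):
        self.t += 1
        return torch.randn(4), 1.0, self.t >= 10

policy_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.SGD(policy_net.parameters(), lr=1e-3)   # stochastic gradient descent

def run_episode(env, gamma=0.99):
    log_probs, rewards = [], []
    state, done = env.reset(), False
    while not done:                                              # run episode until termination
        dist = Categorical(logits=policy_net(state))
        action = dist.sample()                                   # a ~ pi(s)
        state, reward, done = env.step(action.item())
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)
    returns, R = [], 0.0
    for r in reversed(rewards):                                  # discounted return R_t at every step
        R = r + gamma * R
        returns.insert(0, R)
    loss = -(torch.stack(log_probs) * torch.tensor(returns)).sum()  # sum_t -log P(a_t|s_t) R_t
    optimizer.zero_grad(); loss.backward(); optimizer.step()

run_episode(DummyEnv())
```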

RL in the Real World

Model Training


  • Initialize model

  • Run episode until termination

  • Updates similar to Q-Learning


  • Stochastic Gradient Descent



Is there a step here that may pose challenges?

CNNs learn powerful features



...so much so that sometimes, we can exploit them!

Remember Gradient Descent?


To attack a network, do gradient ascent on the loss with respect to the input instead!



Fast Gradient Sign Method (FGSM)

$$ x_{adv} = x + \alpha\ \mathrm{sign}(\nabla_x L(f_\theta(x), y)) $$

Fast Gradient Sign Method (FGSM)

$$ x_{adv} = x + \alpha\ \mathrm{sign}(\nabla_x L(f_\theta(x), y)) $$
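A sketch of FGSM in PyTorch (the classifier, the input shape, and the step size $\alpha$ are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))   # hypothetical MNIST-sized classifier
loss_fn = nn.CrossEntropyLoss()

def fgsm(x, y, alpha=0.03):
    """x_adv = x + alpha * sign(grad_x L(f(x), y))"""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()                                            # gradient w.r.t. the input, not the weights
    return (x + alpha * x.grad.sign()).detach()

x = torch.rand(1, 1, 28, 28)    # a fake image in [0, 1]
y = torch.tensor([3])           # its true label
x_adv = fgsm(x, y)
```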



Projected Gradient Descent (PGD)

Repeated FGSM, but clip the resulting outputs to $[x-\epsilon, x+\epsilon]$

$$x_{adv}^{t+1} = \mathrm{Clip}_{x, \epsilon} \Big(x_{adv}^{t} + \alpha\ \mathrm{sign}(\nabla_x L(f_\theta(x_{adv}^{t}), y))\Big) $$
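Continuing the FGSM sketch above (reusing its `model`, `fgsm`, `x`, and `y`; the step size, $\epsilon$, and step count are again placeholders):

```python
import torch

def pgd(x, y, alpha=0.01, eps=0.05, steps=10):
    """Repeated FGSM steps, clipped back into the epsilon-ball around the original x."""
    x_orig = x.clone().detach()
    x_adv = x_orig.clone()
    for _ in range(steps):
        x_adv = fgsm(x_adv, y, alpha)                                    # one FGSM step from the current iterate
        x_adv = torch.min(torch.max(x_adv, x_orig - eps), x_orig + eps)  # Clip_{x, eps}
        x_adv = x_adv.clamp(0.0, 1.0)                                    # keep a valid image in [0, 1]
    return x_adv

x_adv = pgd(x, y)
```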

Attacks can be:



    1. targeted or untargeted

    2. black-box or white-box

    3. evasion or poisoning

    4. digital or physically realizable