Lecture 16 | Adversarial Examples and Adversarial Training

Ian Goodfellow originally assumed that adversarial examples were caused by some degree of overfitting. If that were true, training a second model should produce different mistakes.

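Experiments showed, however, that different models tend to misclassify the same adversarial example, and often even assign it the same wrong class.
It was also found that the offset vector between an original example and its adversarial example, when added to other clean examples, consistently yields another adversarial example. This points to a systematic effect rather than a random one; it looks more like underfitting than overfitting.
The models may be too linear. Even a neural network can be viewed as a piecewise linear model:
- ReLU: piecewise linear
- Maxout: piecewise linear
- sigmoid: we usually tune carefully so that the activations stay in the middle region, and that region is very close to linear
- LSTM: the memory cell is a sum of its past state and the current input, and addition is linear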
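
The piecewise linearity of a ReLU network with respect to its input can be checked numerically. The following is a minimal sketch, assuming a tiny one-hidden-layer network with random weights (`W1`, `b1`, `w2` are illustrative, not from the lecture): as long as a step along direction `d` does not flip any hidden unit, the output changes exactly as the input gradient predicts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny ReLU network: 8 inputs, 16 hidden units, scalar output.
W1 = rng.normal(size=(16, 8))
b1 = rng.normal(size=16)
w2 = rng.normal(size=16)

def f(x):
    """Network output for input x."""
    return w2 @ np.maximum(W1 @ x + b1, 0.0)

def input_gradient(x):
    """Gradient of f with respect to x inside the current linear region."""
    active = (W1 @ x + b1 > 0).astype(float)
    return W1.T @ (w2 * active)

x = rng.normal(size=8)
d = rng.normal(size=8)
d /= np.linalg.norm(d)

# Step size at which each hidden unit's pre-activation would cross zero.
pre = W1 @ x + b1
slope = W1 @ d
t_cross = -pre / slope
t_max = t_cross[t_cross > 0].min(initial=np.inf)

# Stay strictly inside the current linear region.
t = 0.5 * min(t_max, 1.0)

linear_prediction = f(x) + t * (input_gradient(x) @ d)
actual = f(x + t * d)
print(actual, linear_prediction)
```

Inside one region the two printed values agree up to floating-point error; they only start to differ once a step is large enough to switch a hidden unit on or off.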

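The mapping from model parameters to model output is nonlinear, which is why neural networks are so hard to train. The mapping from model input to model output, however, is piecewise linear, so optimizing the input (to obtain a desired output) is a simpler problem than optimizing the parameters.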

(img 22:) I did not understand this figure.

(img 23:40)

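The perturbations in all three rows have the same L2 norm.
When there are many pixels (a large image), a very small change to every pixel adds up, so a barely perceptible difference can produce a large change in the model output.
- Original quote: That means that you can actually make changes that are almost imperceptible but actually move you really far and get a large dot product with the coefficients of the linear function that the model represents
When constructing an adversarial example, make sure it does not end up like the first row, where the perturbation changes the true class.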
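
The effect of dimensionality on that dot product can be illustrated with a linear score $w^T x$. This is a sketch under assumptions: the weights are random Gaussian stand-ins, and `eps` and the dimensions are arbitrary choices. Each coordinate is moved by only `eps`, yet the shift in the score grows in proportion to the number of dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.01  # maximum change allowed per coordinate (illustrative value)

def score_shift(n):
    """Shift in w @ x caused by the perturbation eps * sign(w) in n dimensions."""
    w = rng.normal(size=n)       # stand-in for the coefficients of the linear function
    eta = eps * np.sign(w)       # every coordinate changes by exactly eps
    return float(w @ eta)        # equals eps * ||w||_1

shifts = {n: score_shift(n) for n in (100, 10_000, 1_000_000)}
for n, shift in shifts.items():
    print(n, shift)
```

The per-pixel change is the same in all three cases; only the number of pixels differs.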

The Fast Gradient Sign Method

$J(\tilde x , \theta)\approx J(x,\theta)+(\tilde x - x)^T \nabla_x J(x)$

This is an approximation using a first-order Taylor expansion.

Maximize $J(x,\theta)+(\tilde x-x)^T \nabla_x J(x)$

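We want to maximize the cost function $J(\tilde x,\theta)$.
It is enough to find a direction whose inner product with the gradient $\nabla_x J(x)$ is as large as possible; adding it to the original example fools the network.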

subject to $||\tilde x-x||_\infty \leq \epsilon$

That is, the change in each dimension cannot exceed $\epsilon$ in absolute value.
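
The constrained maximization has a closed-form solution. A short derivation, writing $\eta = \tilde x - x$ and $g = \nabla_x J(x)$ (shorthand introduced here for brevity, not notation from the lecture):

$\eta^T g = \sum_i \eta_i g_i \leq \sum_i |\eta_i|\,|g_i| \leq \epsilon \sum_i |g_i| = \epsilon\,||g||_1$

Both inequalities become equalities when $\eta_i = \epsilon\ \text{sign}(g_i)$ for every $i$, so this choice maximizes the linearized cost under the constraint $||\eta||_\infty \leq \epsilon$.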

=> $\tilde x = x + \epsilon\ \text{sign}(\nabla_x J(x))$

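In other words, every dimension of the original image is perturbed by $\epsilon$, in the direction given by the sign of the gradient.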
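
The update rule can be sketched in a few lines. The example below assumes a logistic regression model with random weights as a stand-in for a trained network (the model, `eps`, and all names are illustrative); for this model the gradient of the cross-entropy loss with respect to the input is $(\sigma(w^T x + b) - y)\,w$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(x, y, w, b):
    """Binary cross-entropy of logistic regression, in a numerically stable form."""
    z = w @ x + b
    s = 2.0 * y - 1.0            # map label {0, 1} to {-1, +1}
    return np.logaddexp(0.0, -s * z)

def loss_grad_x(x, y, w, b):
    """Gradient of the loss with respect to the input x."""
    return (sigmoid(w @ x + b) - y) * w

def fgsm(x, y, w, b, eps):
    """Fast gradient sign method: x + eps * sign(grad_x J)."""
    return x + eps * np.sign(loss_grad_x(x, y, w, b))

rng = np.random.default_rng(0)
w = 0.1 * rng.normal(size=100)   # stand-in weights, not a trained model
b = 0.0
x = rng.uniform(size=100)        # stand-in "image" with pixels in [0, 1]
y = 1.0
eps = 0.1

x_adv = fgsm(x, y, w, b, eps)
print("loss before:", loss(x, y, w, b))
print("loss after: ", loss(x_adv, y, w, b))
print("max change: ", np.max(np.abs(x_adv - x)))
```

For a deep network the only difference is that the input gradient comes from backpropagation instead of a closed-form expression. This sketch does not clip `x_adv` back to the valid pixel range, which a practical attack on images would usually do.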

Applied to an ordinary neural network, this method can reach a 99% attack success rate, which supports the hypothesis that the models are too linear.

There are other attacks as well, such as Nicholas Carlini's attack, which is based on running several steps of the Adam optimizer.

Maps of Adversarial and Random Cross-Sections

(img 29:)

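Each cell shows the decision boundaries around one CIFAR-10 test example.
The center of each cell is the original example.
White is the correct class; the other colors are wrong classes.
The left-to-right axis is the FGSM direction, that is, $\text{sign}(\nabla_x J(x))$.
The up-down axis is a random direction orthogonal to the FGSM direction.
The right side of every cell is misclassified: whenever the inner product with this direction is large, we get an adversarial example.
This more or less supports the claim that the model is too linear.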
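The earlier intuition was that adversarial examples are like rational numbers hidden among the reals: every real number has a rational number nearby.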

In fact, every real example lies close to some linear decision boundary.

(img 31:) This time, both the x-axis and the y-axis are directions found by FGSM (presumably rightward and downward?)

(img 31:52)

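When random noise is added to an original example, the model still classifies it correctly.
- Exception: third row, third column
When random noise is added to an adversarial example, it remains adversarial.
This is because random noise has, on average, zero inner product with $\nabla_x J(x)$.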
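
The contrast between random noise and the FGSM direction can be checked numerically. This is a sketch under assumptions: `g` is a random vector standing in for $\nabla_x J(x)$, the dimension 3072 matches a 32×32×3 CIFAR-10 image, and the noise is random signs scaled to the same max norm `eps` as the FGSM step.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3072                      # 32 * 32 * 3, the size of a CIFAR-10 image
eps = 0.03                    # illustrative max-norm budget

g = rng.normal(size=n)        # stand-in for the input gradient

# FGSM step: aligned with the sign of every gradient coordinate.
fgsm_step = eps * np.sign(g)
fgsm_dot = fgsm_step @ g

# 1000 random sign perturbations with the same max norm.
noise = eps * rng.choice([-1.0, 1.0], size=(1000, n))
noise_dots = noise @ g

print("FGSM inner product:         ", fgsm_dot)
print("mean noise inner product:   ", noise_dots.mean())
print("largest |noise| inner prod.:", np.abs(noise_dots).max())
```

The random perturbations have inner products scattered around zero, while the FGSM step, with the same per-pixel magnitude, has a far larger one.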

Estimating the Subspace Dimensionality

(img 34:)

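The goal is finding out just how many dimensions there are to these subspaces where the adversarial examples lie in a thick contiguous region.
On average, the adversarial region has about 25 dimensions.
The dimensionality here tells you how easy it is to find an adversarial example using random noise:
- If every direction were adversarial, any change would cause a misclassification
- If most directions were adversarial, a random direction would be adversarial most of the time
Different models often share the same adversarial examples. The higher the dimensionality of the adversarial subspace, the more likely the adversarial subspaces of two models overlap.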

Adversarial Attacks Can Also Target RL

Paper: Universal Adversarial Perturbations

A single perturbation that, applied to examples of any class, sends them to the same wrong class.

RBFs behave more intuitively

Some quadratic models resist adversarial examples quite well.

Cross-model, Cross-dataset generalization

Transferability Attack

(img 50:)

Enhancing Transfer with Ensembles

If an adversarial example can fool every model in an ensemble (for example, five models), it is very likely to fool other machine learning models as well.
