Lecture 3 | Loss Functions and Optimization

3.1 Loss Functions

Multiclass SVM loss (a type of hinge loss)

Given an example $(x_i,y_i)$

$L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$
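- $s_j$ is the score the classifier predicts for class $j$
- $s_{y_i}$: $y_i$ is the correct label of this example, so $s_{y_i}$ is the score of its true class
- The margin $1$ is an arbitrary choice; any positive constant works, because only the differences between scores matter and the overall scale of W can absorb the constant
- Q3: If W is initialized with random values very close to 0, all scores are close to 0, so the loss should be close to $C-1$, where $C$ is the number of classes
  - This is a handy trick for checking whether the code has a bug
- Q4: What if the sum was over all classes (including $j=y_i$)?
  - The loss increases by 1, because the $j=y_i$ term is always $\max(0, 0+1) = 1$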
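The two sanity checks above (Q3 and Q4) can be verified numerically. The following is a minimal sketch, assuming NumPy and working directly on a score vector; the helper name `svm_loss_from_scores` and its `include_correct` flag are hypothetical, introduced only for illustration.

```python
import numpy as np

def svm_loss_from_scores(scores, y, include_correct=False):
    """Multiclass SVM loss for one example, given its class scores.

    include_correct=True also sums the j == y term (the Q4 variant).
    """
    margins = np.maximum(0, scores - scores[y] + 1)
    if not include_correct:
        margins[y] = 0  # skip the j == y_i term
    return float(np.sum(margins))

# Q3: with W close to 0 every score is close to 0, so each of the
# C - 1 incorrect classes contributes max(0, 0 - 0 + 1) = 1.
C = 10
loss_q3 = svm_loss_from_scores(np.zeros(C), y=3)

# Q4: the j == y_i term is always max(0, 0 + 1) = 1, so the loss grows by 1.
scores = np.array([3.2, 5.1, -1.7])  # illustrative scores, class 0 is correct
loss_default = svm_loss_from_scores(scores, y=0)
loss_q4 = svm_loss_from_scores(scores, y=0, include_correct=True)
```

Here `loss_q3` equals $C-1$, and `loss_q4` exceeds `loss_default` by exactly 1.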

Example Code

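import numpy as np

def L_i_vectorized(x, y, W):
    # x: one example, y: index of its correct class, W: weights with one row per class
    scores = W.dot(x)  # shape (C,) for a 1-D x
    margins = np.maximum(0, scores - scores[y] + 1)  # same shape as scores
    margins[y] = 0  # trick: drop the j == y_i term from the sum
    loss_i = np.sum(margins)
    return loss_i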
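The per-example function above can be extended to a whole mini-batch. This is only a sketch under assumed shapes (X is (N, D) with one example per row, y is (N,) integer labels, W is (C, D)); the name `svm_loss_batch` is hypothetical and not part of the lecture code.

```python
import numpy as np

def svm_loss_batch(X, y, W):
    """Mean multiclass SVM loss over a batch (assumed shapes in the lead-in)."""
    N = X.shape[0]
    scores = X.dot(W.T)                          # (N, C)
    correct = scores[np.arange(N), y][:, None]   # (N, 1), score of the true class
    margins = np.maximum(0, scores - correct + 1)
    margins[np.arange(N), y] = 0                 # drop the j == y_i terms
    return float(margins.sum() / N)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))
y = np.array([0, 2, 1, 2, 0])
W = np.zeros((3, 4))                  # all-zero weights, so every score is 0
batch_loss = svm_loss_batch(X, y, W)  # each example contributes C - 1 = 2
```

With all-zero weights the result matches the $C-1$ sanity check for every example in the batch.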

Occam's Razor

"Among competing hypotheses, the simplest is the best", William of Ockham, 1285 - 1347

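In scientific discovery, a theory should apply as broadly as possible. When several hypotheses can explain your observations, you should generally choose the simplest one, because it is more likely to carry over to new observations in the future.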

Regularization

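L2 regularization encourages the weights to be spread out across all inputs instead of relying on a few specific elements of x.
L1 regularization does the opposite: it encourages sparse weights, so the model may be less complex but depends more heavily on a few specific features.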
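L2 regularization also corresponds to MAP inference using a Gaussian prior on W: when the data loss is a negative log-likelihood, the negative log of a zero-mean Gaussian prior on W is, up to a constant, proportional to the sum of squared weights, so maximizing the posterior adds exactly an L2 penalty to the data loss.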
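The L2 versus L1 contrast above can be illustrated with two weight vectors that give the same score. This is a small sketch, assuming NumPy; the vectors and the helper names `l2_penalty` and `l1_penalty` are chosen only for illustration.

```python
import numpy as np

x = np.ones(4)
w_sparse = np.array([1.0, 0.0, 0.0, 0.0])      # relies on a single feature
w_spread = np.array([0.25, 0.25, 0.25, 0.25])  # spreads weight over all features

# Both weight vectors produce the same score for this x.
score_sparse = float(w_sparse.dot(x))
score_spread = float(w_spread.dot(x))

def l2_penalty(w):
    """Sum of squared weights."""
    return float(np.sum(w ** 2))

def l1_penalty(w):
    """Sum of absolute weights."""
    return float(np.sum(np.abs(w)))

l2_sparse, l2_spread = l2_penalty(w_sparse), l2_penalty(w_spread)
l1_sparse, l1_spread = l1_penalty(w_sparse), l1_penalty(w_spread)
```

In this example the L2 penalty is smaller for the spread-out vector, while the L1 penalty is the same for both, so L1 does nothing to discourage putting all the weight on one feature.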

Softmax Loss

$P(Y=k|X=x_i) = \dfrac{e^{s_k}}{\sum_j e^{s_j}}$
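- For the correct class, we want the output probability to be as large as possible, so we minimize its negative log: $L_i = -\log P(Y=y_i|X=x_i)$
Combining the two expressions above, $L_i = -\log(\dfrac{e^{s_{y_i}}}{\sum_j e^{s_j}})$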
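Q: min & max of Softmax Loss?
- The minimum is 0 and the maximum is infinity; both are only approached in the limit, never reached with finite scores
Q: If W is initialized with random values very close to 0, all classes get probability close to $1/C$, so the loss should be close to $-\ln(1/C) = \ln(C)$
Softmax keeps pushing the score of the correct class toward infinity and the scores of the incorrect classes toward negative infinity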
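The softmax loss and its sanity checks can be sketched in a few lines. This assumes NumPy and works directly on a score vector; the helper name `softmax_loss_from_scores` is hypothetical, and subtracting the maximum score is a common numerical-stability step added here, not something stated in the notes.

```python
import numpy as np

def softmax_loss_from_scores(scores, y):
    """Softmax (cross-entropy) loss for one example, given its class scores."""
    shifted = scores - np.max(scores)  # shifting all scores does not change the softmax
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return float(-log_probs[y])

# Near-zero W gives near-zero scores, so the loss is ln(C).
C = 10
loss_at_init = softmax_loss_from_scores(np.zeros(C), y=3)

# A large score for the correct class drives the loss toward 0 ...
loss_confident = softmax_loss_from_scores(np.array([20.0, 0.0, 0.0]), y=0)
# ... while a confident wrong prediction makes the loss large.
loss_wrong = softmax_loss_from_scores(np.array([20.0, 0.0, 0.0]), y=1)
```

The loss stays strictly positive even for a very confident correct prediction, which matches the observation that the minimum of 0 is only approached in the limit.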
