Using word embeddings
Transfer learning and word embeddings
- Learn word embeddings from a large text corpus, or download a pre-trained embedding model.
- Transfer the embedding to a new task with a smaller training set (see the sketch after this list).
- (Optional) Continue to fine-tune the word embeddings on the new data (only worthwhile if the new dataset is large).
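A minimal PyTorch sketch of this workflow; the sizes and the random stand-in matrix are hypothetical, and a real pipeline would load GloVe/Word2Vec vectors from disk:

```python
import torch
import torch.nn as nn

# Stand-in for a pre-trained embedding matrix (vocab_size x embed_dim);
# in practice this would be loaded from e.g. a GloVe file.
vocab_size, embed_dim = 10000, 300
pretrained = torch.randn(vocab_size, embed_dim)

# Steps 1-2: reuse the pre-trained embeddings in the new, smaller task.
# freeze=True keeps them fixed; set freeze=False to fine-tune (step 3)
# when the new dataset is large enough.
embedding = nn.Embedding.from_pretrained(pretrained, freeze=True)

word_ids = torch.tensor([42, 7, 9])   # token indices from the new task
features = embedding(word_ids)        # (3, 300) embedding vectors
print(features.shape)
```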
Properties of word embeddings
Question: what does cosine similarity = -1 mean? It does not seem to mean anything special here; it would indicate two vectors pointing in exactly opposite directions, but word pairs rarely embed as exact opposites.
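A small numpy sketch of cosine similarity and the analogy check from this section. The vectors are random stand-ins, so the printed score itself is meaningless; with real embeddings, e_queen would score highest against e_king - e_man + e_woman:

```python
import numpy as np

def cosine_similarity(u, v):
    # 1 = same direction, 0 = orthogonal, -1 = exactly opposite directions
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical toy embeddings, just to make the computation concrete.
rng = np.random.default_rng(0)
emb = {w: rng.standard_normal(50) for w in ["man", "woman", "king", "queen"]}

target = emb["king"] - emb["man"] + emb["woman"]
print(cosine_similarity(target, emb["queen"]))
```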
Embedding Matrix
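In the lecture's notation, E is the (embed_dim x vocab_size) embedding matrix and e_j = E · o_j picks out column j via the one-hot vector o_j; in practice, frameworks skip the matrix multiply and do a direct column lookup. A minimal numpy illustration:

```python
import numpy as np

vocab_size, embed_dim = 10000, 300
E = np.random.randn(embed_dim, vocab_size)  # embedding matrix (random stand-in)

j = 1234                      # index of some word in the vocabulary
o_j = np.zeros(vocab_size)    # one-hot vector o_j
o_j[j] = 1.0

e_matmul = E @ o_j            # e_j = E · o_j  (the definition)
e_lookup = E[:, j]            # what frameworks actually do: a column lookup
assert np.allclose(e_matmul, e_lookup)
```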
Learning word embeddings
If you really want to learn a language model, it is natural to use the last few words as the context. If the goal is only to learn word embeddings, all of these kinds of contexts work well (see the sketch below).
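A minimal PyTorch sketch of such a fixed-window neural language model (all sizes are hypothetical): the embeddings of the last few words are concatenated and fed to a softmax over the vocabulary.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, context_size, hidden = 10000, 300, 4, 128

class WindowLM(nn.Module):
    """Predict the next word from the embeddings of the previous few words."""
    def __init__(self):
        super().__init__()
        self.E = nn.Embedding(vocab_size, embed_dim)
        self.net = nn.Sequential(
            nn.Linear(context_size * embed_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, vocab_size),  # logits for a softmax over the vocab
        )

    def forward(self, context_ids):         # (batch, context_size)
        e = self.E(context_ids).flatten(1)  # concatenate the context embeddings
        return self.net(e)                  # logits for the next word

model = WindowLM()
logits = model(torch.randint(0, vocab_size, (8, context_size)))
print(logits.shape)  # (8, 10000)
```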
Word2Vec
Two versions of Word2Vec (a pair-generation sketch follows the list):
- Skip-gram
- CBOW
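A plain-Python sketch of how the two versions differ in the (input, target) pairs they train on. Skip-gram predicts each context word from the center word; CBOW predicts the center word from its context:

```python
def training_pairs(tokens, window=2, mode="skipgram"):
    """Generate (input, target) pairs from a token sequence."""
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window): i] + tokens[i + 1: i + 1 + window]
        if not context:
            continue
        if mode == "skipgram":
            for c in context:
                yield center, c        # center word -> one context word
        else:                          # CBOW
            yield context, center      # all context words -> center word

sentence = "I want a glass of orange juice".split()
print(list(training_pairs(sentence, window=2, mode="skipgram"))[:4])
```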
Negative Sampling
Because the Word2Vec setup above requires training a 10,000-way softmax classifier, which is too slow, a simpler and effective alternative is negative sampling. In short, turn the problem into 10,000 simple binary classifiers, and on each training step update only a handful of them rather than iterating over all of them. (Question: can this be viewed as a kind of dropout? Not really: dropout randomly drops units as a regularizer, whereas negative sampling updates only a few output classifiers per step purely to save computation.)
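A PyTorch sketch of the negative-sampling loss (sizes hypothetical; negatives are drawn uniformly here, whereas the paper samples them in proportion to unigram frequency^(3/4)):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, embed_dim, k = 10000, 300, 5   # k = number of negative samples

E_in = nn.Embedding(vocab_size, embed_dim)   # embeddings for context words
E_out = nn.Embedding(vocab_size, embed_dim)  # one logistic "classifier" per word

def negative_sampling_loss(context_id, positive_id):
    c = E_in(context_id)                              # (embed_dim,)
    pos = E_out(positive_id)                          # the true target word
    neg = E_out(torch.randint(0, vocab_size, (k,)))   # k random negative words
    # Only 1 + k of the 10,000 binary classifiers are touched on this step.
    pos_loss = F.logsigmoid(pos @ c)                  # label 1 for the true pair
    neg_loss = F.logsigmoid(-(neg @ c)).sum()         # label 0 for the negatives
    return -(pos_loss + neg_loss)

loss = negative_sampling_loss(torch.tensor(42), torch.tensor(7))
loss.backward()
```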
GloVe word vectors
GloVe fits word vectors directly to global co-occurrence statistics: with X_ij counting how often word i appears in the context of word j, it minimizes Σ_{i,j} f(X_ij) (θ_i^T e_j + b_i + b'_j - log X_ij)^2, where the weighting f(X_ij) is 0 when X_ij = 0 (so empty counts drop out) and damps very frequent pairs.
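A numpy sketch of that objective on toy counts; the weighting f uses the paper's x_max = 100 and α = 0.75:

```python
import numpy as np

def glove_loss(theta, e, b, b_prime, X):
    """Sum of f(X_ij) * (theta_i . e_j + b_i + b'_j - log X_ij)^2 over all i, j."""
    def f(x):  # weighting: 0 when counts are 0, damped for frequent pairs
        return np.where(x < 100, (x / 100) ** 0.75, 1.0) * (x > 0)
    pred = theta @ e.T + b[:, None] + b_prime[None, :]
    logX = np.log(np.where(X > 0, X, 1.0))  # f(0) = 0 removes the X = 0 terms
    return np.sum(f(X) * (pred - logX) ** 2)

V, d = 50, 10
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(V, V)).astype(float)   # toy co-occurrence counts
print(glove_loss(rng.standard_normal((V, d)), rng.standard_normal((V, d)),
                 rng.standard_normal(V), rng.standard_normal(V), X))
```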
Sentiment Classification
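One simple baseline from the lecture averages the embeddings of all words in a review and feeds the result to a softmax classifier. Averaging ignores word order, which is why a review like "completely lacking in good taste" can fool it. A sketch with random stand-in embeddings:

```python
import numpy as np

# Random stand-in embeddings for the example's words;
# a real model would use pre-trained vectors.
rng = np.random.default_rng(0)
vocab = "completely lacking in good taste service and ambience".split()
emb = {w: rng.standard_normal(50) for w in vocab}

def featurize(review):
    # Average the embeddings of all words in the review (order is lost).
    vecs = [emb[w] for w in review.split() if w in emb]
    return np.mean(vecs, axis=0)

x = featurize("completely lacking in good taste")
print(x.shape)  # (50,) -> fed to a softmax / logistic-regression classifier
```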
Debiasing word embeddings
Steps to debias word embeddings, using gender bias as an example (a code sketch follows the list):
Identify the bias direction, e.g.
- ..., then average the differences
Neutralize: project words that should not carry the bias onto the non-bias direction, removing their bias component, e.g.
- doctor
- nurse
- engineer
Equalize: for word pairs that legitimately carry the attribute, adjust the vectors so both words in a pair are equidistant from the non-bias direction, e.g.
- grandmother, grandfather
- son, daughter
- brother, sister ...
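A numpy sketch of the neutralize and equalize steps. The equalize here is a simplified variant; the full method in Bolukbasi et al. also renormalizes the vectors to unit length, which is omitted:

```python
import numpy as np

def neutralize(e, g):
    """Remove the component of e along the bias direction g,
    so words like 'doctor' carry no gender component."""
    return e - (e @ g) / (g @ g) * g

def equalize(e_a, e_b, g):
    """Keep the pair's shared bias-free part and give each word an
    equal-and-opposite component along g, so both words end up the
    same distance from the non-bias subspace (simplified)."""
    mu = (e_a + e_b) / 2
    mu_perp = mu - (mu @ g) / (g @ g) * g       # shared, bias-free part
    beta = ((e_a - e_b) @ g) / (2 * (g @ g))    # half the gap along g
    return mu_perp + beta * g, mu_perp - beta * g

# Toy demo with random stand-in vectors.
rng = np.random.default_rng(0)
g = rng.standard_normal(50)                     # bias (gender) direction
doctor = rng.standard_normal(50)
print(abs(neutralize(doctor, g) @ g) < 1e-10)   # True: no gender component left
gm, gf = equalize(rng.standard_normal(50), rng.standard_normal(50), g)
print(np.isclose(gm @ g, -(gf @ g)))            # True: equal and opposite
```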