
Using word embeddings

Transfer learning and word embeddings

  1. Learn word embeddings from a large text corpus, or download a pre-trained embedding model.
  2. Transfer the embeddings to a new task that has a smaller training set.
  3. (Optional) Continue to fine-tune the word embeddings with the new data (only worthwhile if the new dataset is fairly large); see the sketch after this list.
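
A minimal PyTorch sketch of steps 2 and 3 (my own illustration, not the course's code; the 10,000 x 300 shape, the toy classifier, and all names are assumptions):

```python
import torch
import torch.nn as nn

# Hypothetical pre-trained matrix: 10,000 words x 300 dimensions.
# In practice this would be loaded from GloVe/Word2Vec files.
pretrained = torch.randn(10000, 300)

# Step 2 (transfer): use the embeddings, frozen, on the new smaller task.
embedding = nn.Embedding.from_pretrained(pretrained, freeze=True)

# Toy downstream classifier over 10-word inputs (shapes are illustrative).
model = nn.Sequential(embedding, nn.Flatten(), nn.Linear(10 * 300, 2))

# Step 3 (optional): unfreeze to fine-tune, only if the new dataset is large.
embedding.weight.requires_grad = True
```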

Properties of word embeddings

Question: what does a cosine similarity of -1 mean? Geometrically, the two embedding vectors point in exactly opposite directions; in practice this does not seem to carry any clean semantic meaning (it is not simply "antonym").
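
For reference, a minimal numpy sketch of cosine similarity and the analogy reasoning it supports ("man is to woman as king is to ?"); the random vectors are placeholders for real embeddings:

```python
import numpy as np

def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (||u|| ||v||), ranging from -1 to 1."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical embeddings; in practice these come from a trained model.
emb = {w: np.random.randn(300) for w in ["man", "woman", "king", "queen"]}

# Solve the analogy by maximizing similarity to e_king - e_man + e_woman.
target = emb["king"] - emb["man"] + emb["woman"]
best = max((w for w in emb if w != "king"),
           key=lambda w: cosine_similarity(emb[w], target))
```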

Embedding Matrix

Learning word embeddings

If you really want to learn a language model, it is natural to use the last few words as the context.

If the goal is only to learn word embeddings, then all of these kinds of contexts (the last few words, a window on both sides, or even a single nearby word) work well.
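
Tying this back to the embedding matrix E above: a minimal numpy sketch of the language-model setup, assuming the course's running sizes (10,000-word vocabulary, 300-dimensional embeddings) and a 4-word context. E maps a one-hot column o_j to e_j = E @ o_j, but in code one simply slices the column:

```python
import numpy as np

vocab_size, emb_dim, context_len = 10000, 300, 4

E = np.random.randn(emb_dim, vocab_size) * 0.01  # embedding matrix, learned

def embed(j):
    # e_j = E @ o_j with o_j one-hot; slicing the column is equivalent
    # and avoids a wasteful 300 x 10000 matrix-vector product.
    return E[:, j]

# Predict the next word from the last 4 words (indices are hypothetical).
context = [4512, 801, 6093, 23]
x = np.concatenate([embed(j) for j in context])   # shape (1200,)

W = np.random.randn(vocab_size, context_len * emb_dim) * 0.01
logits = W @ x
p = np.exp(logits - logits.max())
p /= p.sum()                                      # softmax over 10,000 words
```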

Word2Vec

Two versions of Word2Vec:

  • Skip-gram: use a context word to predict a target word sampled from a window around it.
  • CBOW (continuous bag of words): use the surrounding words to predict the middle word.
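
A minimal sketch of how skip-gram builds its (context, target) training pairs (the window size and word indices are assumptions); each pair then trains a softmax classifier p(t | c) = exp(theta_t . e_c) / sum_j exp(theta_j . e_c):

```python
window = 5                                 # sample targets within +-5 words
corpus = [4512, 801, 6093, 23, 77, 1504]   # hypothetical word indices

def skipgram_pairs(corpus, window):
    """Yield (context, target) pairs for skip-gram training."""
    for i, c in enumerate(corpus):
        lo, hi = max(0, i - window), min(len(corpus), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield c, corpus[j]

pairs = list(skipgram_pairs(corpus, window))
```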

Negative Sampling

Training the 10,000-way softmax classifier in the Word2Vec model above is too slow, so a simpler and effective alternative is negative sampling. In short, it turns the problem into 10,000 binary classifiers, and each training step updates only a few of them (the one positive target plus k sampled negatives) rather than iterating over all of them. (Question: can this be viewed as a kind of dropout? Not quite: dropout randomly masks hidden units for regularization, whereas negative sampling changes the training objective itself so that only k + 1 output units are even computed per example.)
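
A numpy sketch under assumed sizes: draw k negatives from the heuristic distribution P(w) proportional to f(w)^(3/4) used in the Word2Vec paper (f(w) is the word's corpus frequency; k is typically 5-20 for small corpora, 2-5 for large ones), then compute the k + 1 sigmoid losses:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

vocab_size, emb_dim, k = 10000, 300, 5

# Sampling distribution: P(w) proportional to f(w)^(3/4).
freq = np.random.rand(vocab_size)          # hypothetical unigram frequencies
p_neg = freq ** 0.75
p_neg /= p_neg.sum()

E = np.random.randn(vocab_size, emb_dim) * 0.01      # context embeddings e_c
Theta = np.random.randn(vocab_size, emb_dim) * 0.01  # target parameters theta_t

def loss(context, target):
    """Negative-sampling loss: 1 positive pair plus k sampled negatives."""
    negatives = np.random.choice(vocab_size, size=k, p=p_neg)
    e_c = E[context]
    pos = -np.log(sigmoid(Theta[target] @ e_c))
    neg = -np.log(sigmoid(-Theta[negatives] @ e_c)).sum()
    return pos + neg
```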

GloVe word vectors

GloVe fits word vectors directly to global co-occurrence statistics: minimize, over all pairs (i, j), f(X_ij) * (theta_i . e_j + b_i + b_j' - log X_ij)^2, where X_ij counts how often word i appears in the context of word j. The weight f(X_ij) is 0 when X_ij = 0 (so log 0 is never evaluated) and damps very frequent pairs. Since theta and e end up playing symmetric roles, the final embedding is usually taken as (theta_w + e_w) / 2.

Sentiment Classification

Debiasing word embeddings

Steps to debias word embeddings, using gender bias as the example (see the sketch after this list):

  1. Identify the bias direction, e.g.

    • compute difference vectors such as ... and then average them
  2. Neutralize: project words that should not carry the bias onto the non-bias direction, e.g.

    • doctor
    • nurse
    • engineer
  3. Equalize: for word pairs that legitimately differ in gender, adjust the vectors so that both are equidistant from the non-bias direction, e.g.

    • grandmother, grandfather
    • son, daughter
    • brother, sister ...
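
A numpy sketch of all three steps, following the neutralize/equalize approach of Bolukbasi et al. (2016) in simplified form; the 50-dimensional random vectors are placeholders, and unit-length embeddings are assumed:

```python
import numpy as np

words = ["he", "she", "male", "female", "doctor", "grandmother", "grandfather"]
emb = {}
for w in words:
    v = np.random.randn(50)         # placeholder for a real embedding
    emb[w] = v / np.linalg.norm(v)  # unit length, as assumed by equalize below

# Step 1: bias direction = average of a few definitional difference vectors.
g = np.mean([emb["he"] - emb["she"], emb["male"] - emb["female"]], axis=0)

def neutralize(e, g):
    """Step 2: remove the component of e that lies along the bias direction g."""
    return e - (e @ g) / (g @ g) * g

emb["doctor"] = neutralize(emb["doctor"], g)

def equalize(e1, e2, g):
    """Step 3: make a definitional pair equidistant from the non-bias axis."""
    mu = (e1 + e2) / 2
    mu_b = (mu @ g) / (g @ g) * g              # midpoint's component along g
    mu_orth = mu - mu_b                        # shared non-bias component
    scale = np.sqrt(abs(1 - mu_orth @ mu_orth))
    e1_b = (e1 @ g) / (g @ g) * g
    e2_b = (e2 @ g) / (g @ g) * g
    e1_new = mu_orth + scale * (e1_b - mu_b) / np.linalg.norm(e1_b - mu_b)
    e2_new = mu_orth + scale * (e2_b - mu_b) / np.linalg.norm(e2_b - mu_b)
    return e1_new, e2_new

emb["grandmother"], emb["grandfather"] = equalize(
    emb["grandmother"], emb["grandfather"], g)
```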
