Chinese Notes

The notes below are supplementary only.

Various sequence to sequence architectures

Basic Models

Image Captioning

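A pre-trained AlexNet encodes the image, and its output is fed into an RNN that generates the caption.

This works reasonably well for short captions.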
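
To make the data flow concrete, here is a minimal sketch of the encoder-to-decoder hand-off, written in plain Python. It is not the method from any of the papers: the weights are random and untrained, the vocabulary, the hidden size, and the helper names (`rand_matrix`, `matvec`, `caption`) are hypothetical, and a random vector stands in for the 4096-dimensional feature that AlexNet's penultimate fully connected layer would produce.

```python
import math
import random

random.seed(0)

FEAT_DIM = 4096  # AlexNet's penultimate fully connected layer outputs 4096 values
HIDDEN = 16      # hypothetical RNN hidden size
VOCAB = ["<eos>", "a", "cat", "on", "mat"]  # toy vocabulary; index 0 ends the caption

def rand_matrix(rows, cols, scale):
    """Random (untrained) weight matrix, for illustrating shapes only."""
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(W, v):
    """Matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

W_init = rand_matrix(HIDDEN, FEAT_DIM, 0.01)  # image feature -> initial hidden state
W_xh = rand_matrix(HIDDEN, len(VOCAB), 0.1)   # previous word (one-hot) -> hidden
W_hh = rand_matrix(HIDDEN, HIDDEN, 0.1)       # hidden -> hidden
W_hy = rand_matrix(len(VOCAB), HIDDEN, 0.1)   # hidden -> word scores

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def caption(feature, max_len=10):
    """Generate words one at a time, conditioning the RNN on the image feature."""
    h = [math.tanh(v) for v in matvec(W_init, feature)]
    x = [0.0] * len(VOCAB)  # zero vector as the start-of-sequence input
    words = []
    for _ in range(max_len):
        pre = [a + b for a, b in zip(matvec(W_xh, x), matvec(W_hh, h))]
        h = [math.tanh(v) for v in pre]
        p = softmax(matvec(W_hy, h))
        idx = max(range(len(VOCAB)), key=p.__getitem__)  # most likely next word
        if idx == 0:  # <eos>
            break
        words.append(VOCAB[idx])
        x = [1.0 if i == idx else 0.0 for i in range(len(VOCAB))]
    return words

# Stand-in for the CNN encoding of one image (an assumption, not real AlexNet output).
feature = [random.gauss(0.0, 1.0) for _ in range(FEAT_DIM)]
print(caption(feature))
```

Because the weights are untrained, the printed words are meaningless. The point is only the structure: the image enters once, through the initial hidden state, and the RNN then emits one word per step.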

Several different labs published papers proposing this approach at around the same time:

Deep Captioning with Multimodal Recurrent Neural Networks
Show and Tell: A Neural Image Caption Generator
Deep Visual-Semantic Alignments for Generating Image Descriptions

Picking the most likely sentence

Why not a greedy search?

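Greedy search: pick the most likely first word under the conditional probability $P(\hat y^{<1>} \mid x)$, then the most likely second word under $P(\hat y^{<2>} \mid \hat y^{<1>}, x)$, and so on, one word at a time.

However, choosing the best word at each step is not guaranteed to produce the sentence that maximizes the joint probability $P(\hat y^{<1>}, \hat y^{<2>}, \dots, \hat y^{<T_y>} \mid x)$: a word that looks best locally can lead into continuations with low probability.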
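
A small numeric sketch can make this failure concrete. The two probability tables below are invented for illustration (they do not come from a trained model), and `greedy` and `exhaustive` are hypothetical helper names. Greedy decoding commits to the more likely first word, while enumerating every two-word output finds a sentence with higher joint probability.

```python
# Hypothetical toy model for a fixed input x:
# P_FIRST[y1] = P(y1 | x), P_SECOND[y1][y2] = P(y2 | y1, x).
P_FIRST = {"going": 0.6, "visiting": 0.4}
P_SECOND = {
    "going": {"to": 0.5, "home": 0.3, "away": 0.2},
    "visiting": {"Africa": 0.9, "Asia": 0.1},
}

def greedy():
    """Pick the most likely word at each step, never revisiting a choice."""
    y1 = max(P_FIRST, key=P_FIRST.get)
    y2 = max(P_SECOND[y1], key=P_SECOND[y1].get)
    return (y1, y2), P_FIRST[y1] * P_SECOND[y1][y2]

def exhaustive():
    """Score every two-word output by its joint probability and keep the best."""
    candidates = {
        (y1, y2): P_FIRST[y1] * p2
        for y1 in P_FIRST
        for y2, p2 in P_SECOND[y1].items()
    }
    best = max(candidates, key=candidates.get)
    return best, candidates[best]

print("greedy:    ", greedy())
print("exhaustive:", exhaustive())
```

In this toy model greedy search returns "going to" with joint probability 0.6 × 0.5 = 0.30, while "visiting Africa" scores 0.4 × 0.9 = 0.36. Exhaustive enumeration is only feasible here because the example is tiny.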

Beam Search

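Suppose the dictionary contains $n$ words and the output sentence has length $T_y$. At each step, Beam Search scores the $n$ possible next words for each of the $B$ candidates it keeps, so it evaluates roughly $B \times n \times T_y$ candidates in total (the first step scores only $n$), far fewer than the $n^{T_y}$ sentences an exhaustive search would have to score.
When $B = 1$, Beam Search reduces to greedy search.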
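
The procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production decoder: `next_probs` is a hypothetical stand-in for the RNN decoder with made-up probabilities, every output has a fixed length, and there is no end-of-sentence handling or length normalization. Scores are summed as log-probabilities to avoid multiplying many small numbers.

```python
import math

VOCAB = ["a", "b", "c"]  # toy dictionary, so n = 3

def next_probs(prefix):
    """Hypothetical stand-in for the decoder: P(next word | prefix, x)."""
    table = {
        (): {"a": 0.5, "b": 0.4, "c": 0.1},
        ("a",): {"a": 0.4, "b": 0.3, "c": 0.3},
        ("b",): {"a": 0.9, "b": 0.05, "c": 0.05},
    }
    # Any prefix not listed falls back to a uniform distribution.
    return table.get(tuple(prefix), {w: 1 / len(VOCAB) for w in VOCAB})

def beam_search(beam_width, length):
    """Return the kept beams and the number of candidates scored."""
    beams = [((), 0.0)]  # (prefix, log-probability)
    scored = 0
    for _ in range(length):
        candidates = []
        for prefix, logp in beams:
            for word, p in next_probs(prefix).items():
                candidates.append((prefix + (word,), logp + math.log(p)))
                scored += 1
        # Keep only the beam_width highest-scoring prefixes.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams, scored

for B in (1, 2):
    beams, scored = beam_search(B, 2)
    best, logp = beams[0]
    print(B, best, round(math.exp(logp), 2), scored)
```

With `beam_width=1` the search follows the single most likely word at each step, which is exactly greedy decoding, and it ends at ("a", "a"). With `beam_width=2` it also keeps "b" after the first step and finds the higher-probability output ("b", "a"). The `scored` counter stays at or below $B \times n \times T_y$ because the first step expands only one empty prefix.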
