I've recently been looking into NLP-related models and found them quite interesting. I'm planning to try some hands-on text classification with a dataset from Kaggle, so I'm also jotting down some of my study notes here.
This article summarizes a few key concepts behind XLNet. Since I'm still an NLP beginner, I can only scratch the surface; if you're interested, feel free to dig deeper on your own.
You will know…
- GPT-2 vs. Bert
- Key concepts of XLNet
- Permutation Language Model (Attention Mask)
- Two-Stream Attention
GPT-2 vs. Bert
GPT-2 – Auto Regressive
The model reads the text in a single direction (reading forward is much like how a human reads an article), and along the way it learns to guess the content that comes next (or, when reading backward, the content that came before).
(The drawback is that there are only two directions in which it can read and learn, whereas the text we encounter in real life doesn't always follow that logic.)
Suppose we have the following text:
The discussion includes a critical evaluation of the documentary sources.
Reading and learning in the forward direction looks roughly like the following table:
| round | known words | model's guess | correct answer |
| --- | --- | --- | --- |
| 1 | The | xxx | discussion |
| 2 | The discussion | xox | includes |
| 3 | The discussion includes | ooo | a |
| 4 | The discussion includes a | … | critical |
| 5 | The discussion includes a critical | … | evaluation |
| 6 | The discussion includes a critical evaluation | … | of |
| 7 | The discussion includes a critical evaluation of | … | the |
| 8 | The discussion includes a critical evaluation of the | … | documentary |
| 9 | The discussion includes a critical evaluation of the documentary | … | sources |
This guessing process is exactly the training process: after training on many such examples, the trained model can help us predict the words likely to come next (or before).
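To make the table above concrete, here is a minimal Python sketch (my own illustration, not GPT-2's actual training code) that generates the same (known words, correct answer) pairs an auto-regressive model is trained on:

```python
# Auto-regressive training pairs: at each step the model only sees the prefix
# and has to guess the next word.
sentence = "The discussion includes a critical evaluation of the documentary sources"
tokens = sentence.split()

for i in range(1, len(tokens)):
    known = " ".join(tokens[:i])   # what the model is allowed to see
    answer = tokens[i]             # what it has to predict
    print(f"round {i}: '{known}' -> '{answer}'")
```

Reading backward simply reverses the token order before forming the pairs.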
Bert – Auto Encoding
Parts of the text are hidden with masks, and the model learns by guessing what the masked content is.
For example, given the text: The kids were all wearing animal masks.
During training, because of the masks, the model might only see: The <mask> were all <mask> animal masks.
The advantage is that the model sees context from both directions at once; the downsides are the extra handling the masks require and the fact that masked positions cannot reference one another.
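As a rough illustration, here is a minimal Python sketch of this kind of masking (simplified; the real BERT recipe also sometimes keeps the original token or swaps in a random one instead of always using <mask>):

```python
import random

# Randomly hide ~15% of the tokens; the model is trained to recover them.
tokens = "The kids were all wearing animal masks .".split()

masked, targets = [], {}
for i, tok in enumerate(tokens):
    if random.random() < 0.15:     # mask roughly 15% of positions
        masked.append("<mask>")
        targets[i] = tok           # the answers the model must guess
    else:
        masked.append(tok)

print(" ".join(masked))
print(targets)                     # masked positions and their original words
```

Each <mask> is predicted independently from the visible words, which is exactly the "masks cannot reference one another" limitation mentioned above.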
Key concepts of XLNet
Put simply, it's a hybrid that borrows the strengths of both GPT-2 and Bert!?
Permutation Language Model (Attention Mask)
When reading a text, XLNet permutes the words into different orders; each permutation is turned into an Attention Mask, which the model then learns from.
| original text | I have a cat called Yaya |
| --- | --- |
| permutation 1 | have a cat called Yaya I |
| permutation 2 | a cat called Yaya I have |
| permutation 3 | cat called Yaya I have a |
| … | … |
Each permutation produces a two-dimensional Attention Mask[i][j], where i indexes the words of the original text (the original order is kept) and Mask[i][j] records whether word i is allowed to see word j under that permutation: 0 means it cannot, 1 means it can.
Take permutation 2 from the table above as an example:
a cat called Yaya I have
- I can see a cat called Yaya
- have can see a cat called Yaya I
- a cannot see anything
- …
| Attention Mask | idx | 0 | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| I | 0 | x | 0 | 1 | 1 | 1 | 1 |
| have | 1 | 1 | x | 1 | 1 | 1 | 1 |
| a | 2 | 0 | 0 | x | 0 | 0 | 0 |
| cat | 3 | 0 | 0 | 1 | x | 0 | 0 |
| called | 4 | 0 | 0 | 1 | 1 | x | 0 |
| Yaya | 5 | 0 | 0 | 1 | 1 | 1 | x |
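Here is a minimal Python sketch (my own illustration, not XLNet's implementation) that rebuilds the table above from permutation 2:

```python
# Build Attention Mask[i][j]: can word i (in the original order) see word j,
# given that a word may only see words that come before it in the permutation?
original = ["I", "have", "a", "cat", "called", "Yaya"]
permutation = ["a", "cat", "called", "Yaya", "I", "have"]   # "permutation 2"

rank = {word: r for r, word in enumerate(permutation)}      # position within the permutation

n = len(original)
mask = [[0] * n for _ in range(n)]
for i, wi in enumerate(original):
    for j, wj in enumerate(original):
        if i != j and rank[wj] < rank[wi]:   # wj appears earlier in the permutation
            mask[i][j] = 1

for word, row in zip(original, mask):
    print(f"{word:>6} {row}")
```

The diagonal is simply left as 0 here; whether a position is allowed to attend to itself is exactly where the two streams in the next section differ.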
Two-Stream Attention
| permuted order | a | cat | called | Yaya | I | have |
| --- | --- | --- | --- | --- | --- | --- |
| original index | 2 | 3 | 4 | 5 | 0 | 1 |
Content-Stream Attention
Every word it attends to, including itself, contributes both context information + position information
P (called) = (called+4, a+2, cat+3)
Query-Stream Attention
The word itself (called) contributes no context information; only its position information is visible
P (called) = (4, a+2, cat+3)
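To make the difference concrete, here is a minimal conceptual sketch (my own notation, not XLNet's actual two-stream implementation) of what each stream is allowed to use when representing "called" under permutation 2:

```python
# Content stream: may use its own content + position.
# Query stream: may use only its own position (no content), plus earlier words' content + position.
original = ["I", "have", "a", "cat", "called", "Yaya"]
permutation = ["a", "cat", "called", "Yaya", "I", "have"]
rank = {w: r for r, w in enumerate(permutation)}

def visible(word, stream):
    """Which (content, position) pairs the given word may attend to."""
    seen = [(w, original.index(w)) for w in original if rank[w] < rank[word]]
    if stream == "content":
        seen.append((word, original.index(word)))   # self: content + position
    else:
        seen.append((None, original.index(word)))   # self: position only
    return seen

print(visible("called", "content"))   # [('a', 2), ('cat', 3), ('called', 4)]
print(visible("called", "query"))     # [('a', 2), ('cat', 3), (None, 4)]
```

The query stream is the one used to predict the word at that position, which is why it must not see the word's own content.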

(Figure from the XLNet paper: the attention block at the lower left is the Query-Stream Attention.)