3 Word Representation
Language Model
N-gram: the distribution of the next word is categorical, conditioned on the previous N-1 words (see the bigram sketch after this list), e.g.
I visited San ____
- Unigram: P(w); all words treated as mutually independent
- Bigram: P(w|San)
- 3-gram: P(w|visited San)
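A minimal sketch of the bigram MLE estimate P(w | prev) = count(prev, w) / count(prev); the toy corpus and function name are illustrative:

```python
from collections import Counter, defaultdict

# Toy corpus for the "I visited San ____" completion example
corpus = [
    "i visited san francisco".split(),
    "i visited san diego".split(),
    "we visited san francisco".split(),
]

bigram_counts = defaultdict(Counter)   # count(prev, w)
unigram_counts = Counter()             # count(prev)
for sentence in corpus:
    for prev, word in zip(sentence, sentence[1:]):
        bigram_counts[prev][word] += 1
        unigram_counts[prev] += 1

def p_bigram(word, prev):
    """MLE estimate P(word | prev) = count(prev, word) / count(prev)."""
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[prev][word] / unigram_counts[prev]

print(p_bigram("francisco", "san"))  # 2/3
print(p_bigram("diego", "san"))      # 1/3
```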
Smoothing
- MLE assigns zero probability to any unseen N-gram; smoothing reserves probability mass for them (e.g. add-k / Laplace)
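A minimal add-k sketch, reusing bigram_counts, unigram_counts, and corpus from the bigram sketch above:

```python
def p_bigram_addk(word, prev, vocab_size, k=1.0):
    """Add-k smoothing (Laplace for k=1):
    P(word | prev) = (count(prev, word) + k) / (count(prev) + k * |V|)"""
    return (bigram_counts[prev][word] + k) / (unigram_counts[prev] + k * vocab_size)

vocab = {w for s in corpus for w in s}
print(p_bigram_addk("jose", "san", len(vocab)))       # unseen bigram, now > 0 (1/9)
print(p_bigram_addk("francisco", "san", len(vocab)))  # 3/9 = 1/3
```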
Word Embedding
Problems with WordNet
- Requires human labor to create and adapt
- Impossible to keep up-to-date
- Can’t be used to accurately compute word similarity
Word2Vec
- CBOW: predict the center word from its context words
- Skip-gram (SG): predict the context words from the center word (pair sketch below)
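A minimal sketch of how both models get their training examples from a local sliding window (function name illustrative); Skip-gram trains on each (center, context) pair, while CBOW predicts the center from all context words in the same window:

```python
def skipgram_pairs(sentence, window=2):
    """Generate (center, context) pairs from a sliding window over a sentence."""
    pairs = []
    for i, center in enumerate(sentence):
        lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sentence[j]))
    return pairs

print(skipgram_pairs("i visited san francisco".split(), window=1))
# [('i', 'visited'), ('visited', 'i'), ('visited', 'san'), ('san', 'visited'),
#  ('san', 'francisco'), ('francisco', 'san')]
```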
Hierarchical Softmax
- Matmul + softmax over |V| (vocabulary size) is very slow to compute for CBOW and SG
- Huffman-encode the vocabulary; a binary classifier at each internal node decides which branch to take: O(log |V|) per word (sketch below)
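A minimal sketch of hierarchical softmax (Huffman tree built with heapq; the toy frequencies and function names are illustrative). The probability of a word is a product of per-node binary decisions along its root-to-leaf path, so evaluation costs O(log |V|) instead of a softmax over |V|:

```python
import heapq
import numpy as np

def build_huffman(freqs):
    """Huffman-encode the vocabulary: returns {word: (internal_nodes, codes)},
    the root-to-leaf path and the 0/1 branch taken at each internal node."""
    heap = [(f, i) for i, f in enumerate(freqs.values())]
    heapq.heapify(heap)
    next_id = len(freqs)                 # internal nodes get ids |V|..2|V|-2
    parent, side = {}, {}
    while len(heap) > 1:
        f1, n1 = heapq.heappop(heap)
        f2, n2 = heapq.heappop(heap)
        parent[n1], side[n1] = next_id, 0
        parent[n2], side[n2] = next_id, 1
        heapq.heappush(heap, (f1 + f2, next_id))
        next_id += 1
    root = next_id - 1
    paths = {}
    for i, w in enumerate(freqs):        # leaves keep ids 0..|V|-1
        nodes, codes, n = [], [], i
        while n != root:
            nodes.append(parent[n]); codes.append(side[n])
            n = parent[n]
        paths[w] = (nodes[::-1], codes[::-1])
    return paths

def p_word(h, word, paths, node_vecs):
    """P(word | hidden vector h): product of sigmoid branch probabilities
    along the Huffman path -- O(log |V|) binary decisions."""
    p = 1.0
    for n, code in zip(*paths[word]):
        s = 1.0 / (1.0 + np.exp(-node_vecs[n] @ h))
        p *= s if code == 1 else 1.0 - s
    return p

# The branch decisions define a valid distribution over the vocabulary:
freqs = {"san": 5, "visited": 3, "i": 2, "francisco": 2, "diego": 1}
paths = build_huffman(freqs)
rng = np.random.default_rng(0)
node_vecs = rng.normal(0, 0.1, size=(2 * len(freqs) - 1, 8))  # internal-node vectors
h = rng.normal(size=8)
print(sum(p_word(h, w, paths, node_vecs) for w in freqs))     # ~1.0
```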
GloVe
Constructs Global Vectors for Word Representation from corpus-wide co-occurrence counts (objective sketched after the lists below)
Efficiency
- Pros: Efficient for large corpora
- Cons: Relatively slow for small or medium corpora
Effectiveness
- Can be seen as an aggregated/global variant of word2vec/CBOW
- word2vec mainly focuses on local sliding windows
- GloVe is able to combine global and local features
- More flexible in the values stored in the co-occurrence matrix
- log, PMI variants, ... many tricks can be played!
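A minimal SGD sketch of the GloVe objective J = sum over nonzero X_ij of f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2, with weighting f(x) = min(1, (x / x_max)^alpha). The co-occurrence matrix X and the function name are illustrative; x_max=100 and alpha=0.75 follow the paper's defaults:

```python
import numpy as np

def glove(X, dim=8, x_max=100.0, alpha=0.75, lr=0.05, epochs=50, seed=0):
    """Fit w_i . w~_j + b_i + b~_j ~ log X_ij by weighted least squares,
    weighting each pair by f(X_ij) = min(1, (X_ij / x_max) ** alpha)."""
    V = X.shape[0]
    rng = np.random.default_rng(seed)
    W, Wt = rng.normal(0, 0.1, (V, dim)), rng.normal(0, 0.1, (V, dim))
    b, bt = np.zeros(V), np.zeros(V)
    pairs = list(zip(*np.nonzero(X)))          # only nonzero co-occurrences
    for _ in range(epochs):
        for i, j in pairs:
            f = min(1.0, (X[i, j] / x_max) ** alpha)   # caps frequent pairs
            diff = W[i] @ Wt[j] + b[i] + bt[j] - np.log(X[i, j])
            g = 2.0 * f * diff
            gWi, gWtj = g * Wt[j], g * W[i]    # grads computed before updates
            W[i] -= lr * gWi
            Wt[j] -= lr * gWtj
            b[i] -= lr * g
            bt[j] -= lr * g
    return W + Wt                              # the paper sums both vector sets

X = np.array([[0, 4, 1],
              [4, 0, 2],
              [1, 2, 0]], dtype=float)         # toy co-occurrence counts
vecs = glove(X)
```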