Using TSNE to Plot a Subset of Similar Words from Word2Vec
I needed to display a spatial map (i.e., a scatterplot) of the words most similar to a given word in Word2Vec. I could only find code that would display all the words, or an indexed subset of them, using either TSNE or PCA. I’m sharing the Python code I wrote as a Gist. The code uses the fantastic gensim library, as it provides easy access to the raw word vectors and a great API for similarity queries. The code performs the following tasks:
- Loads a pre-trained word2vec embedding
- Finds similar words and appends each similar word’s embedding vector to a matrix
- Applies TSNE to the matrix to project each word into a 2D space (i.e., dimensionality reduction)
- Plots the 2D position of each word with a label
Figure 1 shows the words most similar to “Madonna”. No surprise that “Lady_Gaga” shows up. As gensim can load GloVe pre-trained vectors, the code can easily be adapted to support GloVe as well. I’ll be using the code in a follow-up blog post on adding lexicon knowledge to an embedding. Have fun!