Using TSNE to Plot a Subset of Similar Words from Word2Vec

Aneesha Bakharia
1 min read · Nov 27, 2017

I needed to display a spatial map (i.e., a scatterplot) of the words most similar to a given word in a Word2Vec model. The code I could find only displayed all the words, or an indexed subset, using either TSNE or PCA. I'm sharing the Python code I wrote as a Gist. The code uses the fantastic gensim library, as it provides easy access to the raw word vectors and a great API for similarity queries. The code performs the following tasks:

  1. Loads a pre-trained word2vec embedding
  2. Finds similar words and appends each similar word's embedding vector to a matrix
  3. Applies TSNE to the matrix to project each word into a 2D space (i.e., dimension reduction)
  4. Plots the 2D position of each word with a label
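The Gist itself isn't reproduced in this text, so the following is a minimal sketch of the four steps above, not the original code. It assumes gensim's `KeyedVectors` API (`load_word2vec_format`, `most_similar`, and indexing by word), scikit-learn's `TSNE`, and matplotlib. The model file name `GoogleNews-vectors-negative300.bin` and the helper names `similar_words_matrix`, `project_2d` and `plot_words` are illustrative assumptions.

```python
import numpy as np


def similar_words_matrix(kv, word, topn=20):
    """Return (labels, matrix) for `word` and its `topn` nearest neighbours.

    `kv` is assumed to behave like gensim's KeyedVectors: it must support
    kv.most_similar(word, topn=...) and kv[word] returning a 1-D vector.
    """
    # Step 2: start with the query word, then add each similar word's vector.
    neighbours = kv.most_similar(word, topn=topn)
    labels = [word] + [w for w, _score in neighbours]
    matrix = np.vstack([np.asarray(kv[w], dtype=np.float32) for w in labels])
    return labels, matrix


def project_2d(matrix, perplexity=5.0, seed=0):
    """Step 3: reduce each row of `matrix` to a 2-D point with t-SNE."""
    # Imported here so the matrix helper stays usable without scikit-learn.
    from sklearn.manifold import TSNE

    # Recent scikit-learn versions require perplexity < number of samples.
    perplexity = min(perplexity, matrix.shape[0] - 1)
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(matrix)


def plot_words(labels, coords, title=None):
    """Step 4: scatter-plot the 2-D points and label each one with its word."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(8, 6))
    ax.scatter(coords[:, 0], coords[:, 1])
    for label, (x, y) in zip(labels, coords):
        # Offset each label slightly so it doesn't sit on top of its marker.
        ax.annotate(label, xy=(x, y), xytext=(3, 3), textcoords="offset points")
    if title:
        ax.set_title(title)
    plt.show()


if __name__ == "__main__":
    from gensim.models import KeyedVectors

    # Step 1: load a pre-trained word2vec model. The file name is an
    # assumption (Google's 300-d News vectors); use whichever model you have.
    kv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)
    labels, matrix = similar_words_matrix(kv, "Madonna", topn=20)
    coords = project_2d(matrix)
    plot_words(labels, coords, title="Words close to 'Madonna'")
```

Keeping the query word as the first row means it appears on the plot alongside its neighbours. The perplexity is kept small because only a couple of dozen points are projected, and fixing `random_state` makes the otherwise run-to-run variable t-SNE layout reproducible.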
Figure 1: Words close to ‘Madonna’

Figure 1 shows the words most similar to “Madonna”. It's no surprise that “Lady_Gaga” shows up. As gensim can load GloVe pre-trained vectors, the code can easily be adapted to support GloVe as well. I'll be using the code in a follow-up blog post on adding lexicon knowledge to an embedding. Have fun!
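GloVe's text files differ from word2vec's text format only in lacking the first line, `<vocab_size> <dimensions>`. Since gensim 4.0, `KeyedVectors.load_word2vec_format` accepts `no_header=True` to read them directly, and older gensim versions ship a `glove2word2vec` conversion script. As a minimal sketch of the adaptation, the hypothetical helper `glove_to_word2vec_text` below writes that header itself, so the result loads with the same call as any word2vec text file. The file name `glove.6B.100d.txt` (one of the Stanford GloVe downloads) is an assumption.

```python
def glove_to_word2vec_text(glove_path, w2v_path):
    """Copy a GloVe .txt file to word2vec text format by prepending the
    "<vocab_size> <dimensions>" header line that word2vec files start with."""
    vocab_size, dims = 0, None
    # First pass: count the words and read the vector dimension.
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) < 2:
                continue  # skip blank lines
            vocab_size += 1
            dims = len(parts) - 1  # first token is the word, the rest are floats
    # Second pass: write the header, then copy the non-blank lines unchanged.
    with open(glove_path, encoding="utf-8") as src, open(w2v_path, "w", encoding="utf-8") as dst:
        dst.write(f"{vocab_size} {dims}\n")
        for line in src:
            if line.strip():
                dst.write(line)
    return vocab_size, dims


if __name__ == "__main__":
    from gensim.models import KeyedVectors

    # Assumed file names; adjust to the GloVe download you are using.
    glove_to_word2vec_text("glove.6B.100d.txt", "glove.6B.100d.w2v.txt")
    kv = KeyedVectors.load_word2vec_format("glove.6B.100d.w2v.txt", binary=False)
    # The 6B GloVe vectors are lowercased, hence "madonna" rather than "Madonna".
    print(kv.most_similar("madonna", topn=5))
```

The loaded object is a regular `KeyedVectors` instance, so the same similarity queries and plotting steps apply to it unchanged.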

Written by Aneesha Bakharia

Data Science, Topic Modelling, Deep Learning, Algorithm Usability and Interpretation, Learning Analytics, Electronics — Brisbane, Australia
