Inside Deep Learning, an Intro to NLP


In the previous article we were introduced to the general concepts of Machine Learning and Deep Learning. We also discovered what a Neural Network is and implemented one using the amazing Deep Learning library TensorFlow. The implementation was simple, but it gave us a strong understanding of the underlying concepts as well as of TensorFlow's computational model. In the present article we will extend that idea and apply it to a particular problem, viz. Natural Language Processing.

Before we venture deep into Deep Learning, let us pause here and ask ourselves: "Why is it important for a computer to understand human language?" Well, the reasons are many, but it is sufficient to say that language is one of the primary modes of communication that comes naturally to humans, and if a machine understands conversational language, that opens up a multitude of possibilities for how a human can interact with the machine and get the job done.

Natural Language Processing, which includes both understanding and generating natural languages (like English, German etc.), has been one of the holy grails of AI since its inception. The idea is so strong that almost every futuristic Hollywood movie and every sci-fi book involving a futuristic society has a computer able to converse with a human, as a human. But the problem itself is a very difficult one. In fact, many feel that the problem of Natural Language is harder to crack than problems in some other domains of AI. Let us try to see a part of it and implement some really cool and interesting models to deal with this problem. In particular, from here onwards, spanning a few articles, we will develop a model and its implementation to detect the positive or negative sentiment of a given movie review. Pretty cool, huh?

Eliza

This is Eliza, one of the very first efforts to create a conversational bot. NLP has come a long way since then.

Word2Vec or How to model Language as a Vector

We have not yet discussed much about the underlying mathematical concepts of Machine Learning. But to take a deep dive into the world of Natural Language, we need a basic understanding of the mathematical ideas behind these models, so I will try to give you a very brief overview.

The first thing I will talk about is a vector. A vector is just an n×1 matrix, which means it has n rows and 1 column. This is one of the very basic things that you have to keep in mind, and actually that is all you need to know about them at this point. Certainly there is a lot more one can learn about vectors and about Linear Algebra in general, and for the curious ones, I suggest going through the course materials of MIT OCW 18-06-linear-algebra.

So how do vectors help in Deep Learning? Well, we try to express all our input data as vectors. Imagine an image of 28×28 pixels, i.e., 784 pixels in total. Each pixel is nothing but a numerical value (consider a greyscale image for simplicity), so if we take all of them and stack them one after another in a single column, we get a 784×1 vector that contains all the information we need to know about the image. We can then construct a Neural Net, feed this data in with some activation function, and get a classification out. It is that simple!
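To make that concrete, here is a tiny sketch (plain NumPy, with a random array standing in for a real image) of how a 28×28 greyscale image becomes a 784×1 column vector:

```python
import numpy as np

# A toy stand-in for a 28x28 greyscale image: one numerical value per pixel.
image = np.random.rand(28, 28)

# Stack all the pixels into a single column: a 784x1 vector.
image_vector = image.reshape(784, 1)

print(image.shape)         # (28, 28)
print(image_vector.shape)  # (784, 1)
```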

However, for words the situation is a little bit different. We will not know everything about a word from a vector like this. Everything we need to know about human language may not always be encoded inside the small span of a single sentence: there are contexts, particular usages and specific meanings that we will not be able to capture using the same method we applied for images. It is not a self-contained encoding system.

To counter these issues, and generally to model language in a better way, Mikolov et al. proposed the famous model called Word2Vec while they were working at Google in 2013.

 

Linear relationships captured by word vectors: pictorial representation of the Word2Vec model. (source: TensorFlow website)

What is it, exactly?

The idea behind Word2Vec can be understood from a mathematical standpoint, and for that you need to understand Vector Space Models and the related theory. You are free to check those out, but here we will present an intuition.

Let’s take the famous sentence "The quick brown fox jumps over the lazy dog". What is our objective? It is simple: we are going to train the neural network to do the following. Given a specific word in the middle of a sentence (the input word), look at the words nearby and pick one at random. The network is going to tell us the probability, for every word in our vocabulary, of being the "nearby word" that we chose. Here "nearby" actually means a window of n words to the left and n words to the right.

So what we are saying is: given an input word, calculate the probability of each of the other words in the vocabulary being a "nearby" word of the input word. And if our model is right and the training is successful, then given the input word "fox", the probability of "jumps" being a nearby word will be way higher than the probability of "lazy". The main intuition behind why this model works is the assumption that related words tend to occur near each other much more often than unrelated words. This is the concept behind the model we are going to implement. Formally, it is called the Skip-Gram model.
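To build some intuition, here is a tiny sketch (plain Python, separate from the model we will train below) that lists the (input word, nearby word) pairs our example sentence produces with a window of 2 words on each side:

```python
sentence = "the quick brown fox jumps over the lazy dog".split()
window = 2  # number of words to consider to the left and right of the input word

pairs = []
for i, center in enumerate(sentence):
    # every word within `window` positions of the center word is a "nearby" word
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((center, sentence[j]))

print(pairs[:6])
# [('the', 'quick'), ('the', 'brown'), ('quick', 'the'),
#  ('quick', 'brown'), ('quick', 'fox'), ('brown', 'the')]
```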

It is easy to see that computing probabilities over the whole vocabulary at every step would be very expensive, as we would be comparing each word against every other word in the vocabulary. So we also use something called negative sampling: instead of updating the score of every word in the vocabulary at each step, we update the true "nearby" word plus only a small random sample of "negative" words. In the code below this shows up as TensorFlow's sampled softmax loss, a closely related sampling trick.
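As a rough illustration of the sampling idea (this is not the exact loss TensorFlow computes, just the gist, and the index used here is purely hypothetical), at each step we score the true nearby word against only a handful of randomly drawn negative words instead of the whole 50,000-word vocabulary:

```python
import random

vocabulary_size = 50000
num_sampled = 64  # negative words drawn per training example

true_context_index = 3137          # hypothetical index of the true "nearby" word
negative_indices = random.sample(range(vocabulary_size), num_sampled)

# The loss is computed only over this small candidate set instead of all 50,000 words.
candidate_indices = [true_context_index] + negative_indices
print(len(candidate_indices))  # 65
```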

One final note before we jump into code: as we cannot feed strings as input vectors into a neural network, we create something called a "one-hot" vector, which has a 1 in the place of the target word and 0 everywhere else. That helps us transform a string into an initial vector representation. The vector’s dimension (the number of rows it has) is equal to the vocabulary size.
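As a quick sketch (a toy vocabulary and plain NumPy, not the 50,000-word vocabulary we build below), a one-hot vector for a given word looks like this:

```python
import numpy as np

vocabulary = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]

def one_hot(word, vocabulary):
    """Return a len(vocabulary) x 1 column vector with a 1 at the word's index."""
    vector = np.zeros((len(vocabulary), 1))
    vector[vocabulary.index(word)] = 1
    return vector

print(one_hot("fox", vocabulary).ravel())  # 1 at index 3 ("fox"), 0 everywhere else
```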

Skip-gram network architecture: pictorial representation of the Word2Vec model. (source)

Let’s dive into the code

Equipped with the knowledge we already have, let us see some code now. Before we do that, I would like to mention that for a more detailed and thorough understanding of this model, it is useful to check out TensorFlow's own documentation as well.

%matplotlib inline
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import collections
import math
import os
import random
import zipfile

import numpy as np
from six.moves import urllib
from six.moves import xrange # pylint: disable=redefined-builtin
import tensorflow as tf
## Our data source
url = 'http://mattmahoney.net/dc/'

def maybe_download(filename, expected_bytes):
  """Download a file if not present, and make sure it's the right size."""
  if not os.path.exists(filename):
    ## Checking if the file is already downloaded. We come here only the first time we run this code
    filename, _ = urllib.request.urlretrieve(url + filename, filename)
  statinfo = os.stat(filename)
  if statinfo.st_size == expected_bytes:
    print('Found and verified', filename)
  else:
    print(statinfo.st_size)
    # The expectations did not match. Raise exception.
    raise Exception(
        'Failed to verify ' + filename + '. Can you get to it with a browser?')
  return filename

filename = maybe_download('text8.zip', 31344016)
Found and verified text8.zip
# Read the data into a list of strings.
def read_data(filename):
  """Extract the first file enclosed in a zip file as a list of words"""
  with zipfile.ZipFile(filename) as f:
    data = tf.compat.as_str(f.read(f.namelist()[0])).split()
  return data

words = read_data(filename)
print('Data size', len(words))
Data size 17005207
# Going to make a vocabulary of 50000 words and then we have to replace the rare words with UNK tokens

vocabulary_size = 50000

def build_dataset(words):
  count = [['UNK', -1]]
  count.extend(collections.Counter(words).most_common(vocabulary_size - 1))
  dictionary = dict()
  for word, _ in count:
    dictionary[word] = len(dictionary)
  data = list()
  unk_count = 0
  for word in words:
    if word in dictionary:
      index = dictionary[word]
    else:
      index = 0  # dictionary['UNK']
      unk_count = unk_count + 1
    data.append(index)
  count[0][1] = unk_count
  reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys()))
  return data, count, dictionary, reverse_dictionary

data, count, dictionary, reverse_dictionary = build_dataset(words)
print('Most common words (+UNK)', count[:5])
print('Sample data', data[:10])
del words  # Hint to reduce memory.
Most common words (+UNK) [['UNK', 418391], ('the', 1061396), ('of', 593677), ('and', 416629), ('one', 411764)]
Sample data [5239, 3084, 12, 6, 195, 2, 3137, 46, 59, 156]
# Let us print out some variables
print(data[:10])
print(count[:10])
print(list(dictionary.items())[:10])
print(list(reverse_dictionary.items())[:10])
[5239, 3084, 12, 6, 195, 2, 3137, 46, 59, 156]
[['UNK', 418391], ('the', 1061396), ('of', 593677), ('and', 416629), ('one', 411764), ('in', 372201), ('a', 325873), ('to', 316376), ('zero', 264975), ('nine', 250430)]
[('fawn', 45848), ('homomorphism', 9648), ('nordisk', 39343), ('nunnery', 36075), ('chthonic', 33554), ('sowell', 40562), ('sonja', 38175), ('showa', 32906), ('woods', 6263), ('hsv', 44222)]
[(0, 'UNK'), (1, 'the'), (2, 'of'), (3, 'and'), (4, 'one'), (5, 'in'), (6, 'a'), (7, 'to'), (8, 'zero'), (9, 'nine')]
## Here, we are going to generate a batch for the skip-gram model
data_index = 0

def generate_batch(batch_size, num_skips, skip_window):
  global data_index
  assert batch_size % num_skips == 0
  assert num_skips <= 2 * skip_window
  batch = np.ndarray(shape=(batch_size), dtype=np.int32)
  labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
  span = 2 * skip_window + 1  # [ skip_window target skip_window ]
  buffer = collections.deque(maxlen=span)
  for _ in range(span):
    buffer.append(data[data_index])
    data_index = (data_index + 1) % len(data)
  for i in range(batch_size // num_skips):
    target = skip_window  # target label at the center of the buffer
    targets_to_avoid = [skip_window]
    for j in range(num_skips):
      while target in targets_to_avoid:
        target = random.randint(0, span - 1)
      targets_to_avoid.append(target)
      batch[i * num_skips + j] = buffer[skip_window]
      labels[i * num_skips + j, 0] = buffer[target]
    buffer.append(data[data_index])
    data_index = (data_index + 1) % len(data)
  return batch, labels

print('data:', [reverse_dictionary[di] for di in data[:32]])

for num_skips, skip_window in [(2, 1), (4, 2)]:
  data_index = 0
  batch, labels = generate_batch(batch_size=16, num_skips=num_skips, skip_window=skip_window)
  print('\nwith num_skips = %d and skip_window = %d:' % (num_skips, skip_window))
  print('  batch:', [reverse_dictionary[bi] for bi in batch])
  print('  labels:', [reverse_dictionary[li] for li in labels.reshape(16)])

for num_skips, skip_window in [(2, 1), (4, 2)]:
  data_index = 1
  batch, labels = generate_batch(batch_size=16, num_skips=num_skips, skip_window=skip_window)
  print('\nwith num_skips = %d and skip_window = %d:' % (num_skips, skip_window))
  print('  batch:', [reverse_dictionary[bi] for bi in batch])
  print('  labels:', [reverse_dictionary[li] for li in labels.reshape(16)])

data: ['anarchism', 'originated', 'as', 'a', 'term', 'of', 'abuse', 'first', 'used', 'against', 'early', 'working', 'class', 'radicals', 'including', 'the', 'diggers', 'of', 'the', 'english', 'revolution', 'and', 'the', 'sans', 'UNK', 'of', 'the', 'french', 'revolution', 'whilst', 'the', 'term']

with num_skips = 2 and skip_window = 1:
  batch: ['originated', 'originated', 'as', 'as', 'a', 'a', 'term', 'term', 'of', 'of', 'abuse', 'abuse', 'first', 'first', 'used', 'used']
  labels: ['anarchism', 'as', 'a', 'originated', 'as', 'term', 'a', 'of', 'abuse', 'term', 'first', 'of', 'used', 'abuse', 'against', 'first']

with num_skips = 4 and skip_window = 2:
  batch: ['as', 'as', 'as', 'as', 'a', 'a', 'a', 'a', 'term', 'term', 'term', 'term', 'of', 'of', 'of', 'of']
  labels: ['anarchism', 'a', 'originated', 'term', 'of', 'as', 'term', 'originated', 'of', 'abuse', 'a', 'as', 'a', 'first', 'term', 'abuse']

with num_skips = 2 and skip_window = 1:
  batch: ['as', 'as', 'a', 'a', 'term', 'term', 'of', 'of', 'abuse', 'abuse', 'first', 'first', 'used', 'used', 'against', 'against']
  labels: ['originated', 'a', 'as', 'term', 'a', 'of', 'term', 'abuse', 'first', 'of', 'abuse', 'used', 'first', 'against', 'early', 'used']

with num_skips = 4 and skip_window = 2:
  batch: ['a', 'a', 'a', 'a', 'term', 'term', 'term', 'term', 'of', 'of', 'of', 'of', 'abuse', 'abuse', 'abuse', 'abuse']
  labels: ['of', 'as', 'originated', 'term', 'a', 'abuse', 'as', 'of', 'abuse', 'term', 'first', 'a', 'of', 'used', 'first', 'term']

Let us print out some more stuff

print(batch)
print(labels)

[   6    6    6    6  195  195  195  195    2    2    2    2 3137 3137 3137 3137]
[[   2]
 [  12]
 [3084]
 [ 195]
 [   6]
 [3137]
 [  12]
 [   2]
 [3137]
 [ 195]
 [  46]
 [   6]
 [   2]
 [  59]
 [  46]
 [ 195]]

Looks like a Vector 🙂

## Define the hyper parameters
batch_size = 128
embedding_size = 128  # Dimension of the embedding vector.
skip_window = 1       # How many words to consider left and right.
num_skips = 2         # How many times to reuse an input to generate a label.

# We pick a random validation set to sample nearest neighbors. Here we limit the
# validation samples to the words that have a low numeric ID, which by
# construction are also the most frequent.
valid_size = 16     # Random set of words to evaluate similarity on.
valid_window = 100  # Only pick dev samples in the head of the distribution.
valid_examples = np.random.choice(valid_window, valid_size, replace=False)
num_sampled = 64    # Number of negative examples to sample.

## Creating the TensorFlow graph object
graph = tf.Graph()

with graph.as_default():
  # Input data.
  train_dataset = tf.placeholder(tf.int32, shape=[batch_size])
  train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
  valid_dataset = tf.constant(valid_examples, dtype=tf.int32)

  # Variables.
  embeddings = tf.Variable(
      tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
  softmax_weights = tf.Variable(
      tf.truncated_normal([vocabulary_size, embedding_size],
                          stddev=1.0 / math.sqrt(embedding_size)))
  softmax_biases = tf.Variable(tf.zeros([vocabulary_size]))

  # Model.
  # Look up embeddings for inputs.
  embed = tf.nn.embedding_lookup(embeddings, train_dataset)
  # Compute the softmax loss, using a sample of the negative labels each time.
  loss = tf.reduce_mean(
      tf.nn.sampled_softmax_loss(softmax_weights, softmax_biases, embed,
                                 train_labels, num_sampled, vocabulary_size))

  # Optimizer.
  optimizer = tf.train.AdagradOptimizer(1.0).minimize(loss)

  # Compute the similarity between minibatch examples and all embeddings.
  # We use the cosine distance:
  norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
  normalized_embeddings = embeddings / norm
  valid_embeddings = tf.nn.embedding_lookup(normalized_embeddings, valid_dataset)
  similarity = tf.matmul(valid_embeddings, tf.transpose(normalized_embeddings))
num_steps = 100001
final_embeddings = None

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print('Initialized')
  average_loss = 0
  for step in range(num_steps):
    batch_data, batch_labels = generate_batch(batch_size, num_skips, skip_window)
    feed_dict = {train_dataset: batch_data, train_labels: batch_labels}
    _, l = session.run([optimizer, loss], feed_dict=feed_dict)
    average_loss += l
    if step % 2000 == 0:
      if step > 0:
        average_loss = average_loss / 2000
      # The average loss is an estimate of the loss over the last 2000 batches.
      print('Average loss at step %d: %f' % (step, average_loss))
      average_loss = 0
    # note that this is expensive (~20% slowdown if computed every 500 steps)
    if step % 10000 == 0:
      sim = similarity.eval()
      for i in range(valid_size):
        valid_word = reverse_dictionary[valid_examples[i]]
        top_k = 8  # number of nearest neighbors
        nearest = (-sim[i, :]).argsort()[1:top_k + 1]
        log = 'Nearest to %s:' % valid_word
        for k in range(top_k):
          close_word = reverse_dictionary[nearest[k]]
          log = '%s %s,' % (log, close_word)
        print(log)
  final_embeddings = normalized_embeddings.eval()
WARNING:tensorflow:From :5 in .: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
Initialized
Average loss at step 0: 7.923677
Nearest to such: farina, golfer, sbc, decomposed, arlo, classical, catalan, fort,
Nearest to three: chamorros, popeye, raman, lucent, synthesize, businessweek, namesake, cris,
Nearest to who: anaphase, outgassing, mixolydian, quicksort, tanzania, coolant, localities, lemurs,
Nearest to as: recreationally, adaptive, vatican, abyss, fiume, freelance, establishments, seigenthaler,
Nearest to first: postcard, ims, cmb, biomechanics, abner, hoffer, aerobic, aimed,
Nearest to was: bertolucci, milestones, u, cartoons, garonne, assaulting, deforest, stomp,
Nearest to in: levinson, craggy, proportions, verdean, steamers, fashioning, corporation, drift,
Nearest to their: thames, gadgets, sandstorms, economic, panoramic, sloth, outfielders, endings,
Nearest to most: folketing, pragmatism, entirely, melisende, narn, turning, rebirth, series,
Nearest to s: vikernes, ching, blessings, lampooned, scotch, missal, fdr, shootout,
Nearest to but: gunto, corwin, fayed, grotesque, masala, avl, bombers, beauharnais,
Nearest to united: uploaded, insect, bioinformatics, waiter, medicare, leg, geometrically, valdivia,
Nearest to years: roy, kis, memetic, pullback, zoster, trombone, midlands, scheldt,
Nearest to called: suspicions, responds, retires, analyses, subpixels, explode, quotes, generators,
Nearest to states: runic, vague, insoluble, encyclopedist, oppressed, murbella, amplifier, genesis,
Nearest to also: disagreeable, complained, phantom, rh, greenwich, varepsilon, valve, coliseum,
Average loss at step 2000: 4.362807
Average loss at step 4000: 3.864294
Average loss at step 6000: 3.790421
Average loss at step 8000: 3.684613
Average loss at step 10000: 3.615146
[... training log truncated: the average loss keeps falling and the nearest-neighbor lists improve every 10000 steps ...]
Average loss at step 92000: 3.394156
Average loss at step 94000: 3.246415
Average loss at step 96000: 3.355701
Average loss at step 98000: 3.245200
Average loss at step 100000: 3.352572
Nearest to such: well, known, regarded, these, follows, including, certain, perceptive,
Nearest to three: five, four, eight, seven, six, two, nine, one,
Nearest to who: he, she, waldseem, ventspils, blok, which, typically, petrus,
Nearest to as: bonuses, boss, sands, nanaimo, like, lemurs, lifeson, when,
Nearest to first: last, second, next, third, final, fourth, original, occupancy,
Nearest to was: is, became, were, has, had, been, neely, remained,
Nearest to in: within, during, near, from, until, on, among, of,
Nearest to their: its, his, her, your, our, the, my, these,
Nearest to most: more, less, particularly, among, some, many, louth, especially,
Nearest to s: his, isbn, stapleton, detergents, consultants, whose, grimaldi, mutagenic,
Nearest to but: however, although, and, though, while, where, which, swanson,
Nearest to united: netherlands, babylonians, commonwealth, asparagales, regiment, uploaded, university, senate,
Nearest to years: days, months, year, centuries, seconds, decades, minutes, hours,
Nearest to called: named, used, known, sanger, referred, worthless, hynek, affidavit,
Nearest to states: nations, us, kingdom, governments, countries, state, hypoxia, venetian,
Nearest to also: sometimes, now, still, never, often, generally, which, usually,

## Let us print the final embeddings
print(final_embeddings[0])
## It is normalized
print(np.sum(np.square(final_embeddings[0])))
[-0.07728009 0.03517218 -0.18228957 -0.00207584 0.01873561 -0.02486203 0.12193949 -0.01250643 0.03940032 -0.10482316 0.05367707 0.09266467 0.02690543 0.12172425 0.0893938 0.15757319 0.01846896 -0.03505716 0.08713429 -0.05382324 -0.14333765 -0.1535188 0.11208815 -0.10859789 -0.07721507 -0.03312055 -0.040875 0.11993434 0.07586625 0.17596677 0.09209873 -0.03513331 0.02667043 -0.04645929 -0.08966226 0.0830242 0.09781107 0.03958929 0.01682834 -0.13307127 -0.11134324 -0.03878304 -0.04923664 0.05671014 -0.04505058 0.12830359 0.01437388 0.00210578 0.05953723 -0.08915968 0.02201243 0.04068568 -0.19690314 -0.00525693 -0.01564837 0.20656994 0.09800821 -0.0333547 -0.05508048 -0.10616101 -0.09900458 0.07546395 -0.11319965 0.02302235 0.02436264 0.04583932 -0.09559632 -0.05123037 -0.05044485 -0.13655767 -0.00899678 -0.05789658 0.08641094 -0.0448037 0.20354451 0.07395704 -0.04202218 0.09748953 -0.06520791 -0.10238893 -0.04608236 -0.12708355 -0.05455118 0.04730448 -0.05347774 -0.01954027 -0.03856944 0.06425887 0.1085251 0.01073437 0.1009686 0.1589793 -0.05503578 0.05784431 0.04731087 -0.06389701 -0.11122342 0.00800457 0.0315149 0.18039903 0.07648252 -0.21616213 -0.07552645 0.04326632 0.02655593 -0.1331623 0.08607479 -0.10952291 -0.07767888 0.03432884 -0.08338465 0.04243781 0.01068177 0.04234843 -0.07459798 -0.0171709 0.02318435 0.03449012 0.05400365 0.11374544 -0.10990858 -0.06443553 -0.09917897 -0.19554527 0.0870312 -0.06549358 0.05170379 0.09347247] 1.0 
# Going to generate the same plot...
from matplotlib import pylab
from six.moves import range
from six.moves.urllib.request import urlretrieve
from sklearn.manifold import TSNE

num_points = 400

tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
two_d_embeddings = tsne.fit_transform(final_embeddings[1:num_points+1, :])

def plot(embeddings, labels):
  assert embeddings.shape[0] >= len(labels), 'More labels than embeddings'
  pylab.figure(figsize=(15, 15))  # in inches
  for i, label in enumerate(labels):
    x, y = embeddings[i, :]
    pylab.scatter(x, y)
    pylab.annotate(label, xy=(x, y), xytext=(5, 2), textcoords='offset points',
                   ha='right', va='bottom')
  pylab.show()

words = [reverse_dictionary[i] for i in range(1, num_points+1)]
plot(two_d_embeddings, words)

t-SNE visualization of the first 400 word embeddings.

Wrapping it up

We have successfully modeled and trained a Neural Net to represent words as vectors. Using these vectors we can derive relationships between words that are often contextual and complex. For example, we can write something like Longest – Long + Short = Shortest. This is amazing! It means our program can capture, at least at a basic level, relationships between words in a way that feels almost human.
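As a sketch of how such a relationship could be read off the embeddings we just trained (assuming the final_embeddings, dictionary and reverse_dictionary variables from the code above, and assuming the words we query actually made it into the 50,000-word vocabulary), an analogy query boils down to vector arithmetic plus cosine similarity:

```python
import numpy as np

def analogy(a, b, c, embeddings, dictionary, reverse_dictionary, top_k=5):
    """Return the words whose vectors are closest to vector(b) - vector(a) + vector(c)."""
    query = (embeddings[dictionary[b]]
             - embeddings[dictionary[a]]
             + embeddings[dictionary[c]])
    query = query / np.linalg.norm(query)
    # final_embeddings are already L2-normalized, so a dot product is the cosine similarity.
    similarity = np.dot(embeddings, query)
    nearest = (-similarity).argsort()[:top_k + 3]
    candidates = [reverse_dictionary[i] for i in nearest]
    return [w for w in candidates if w not in (a, b, c)][:top_k]

# Hypothetical usage, provided these words are in the vocabulary:
# print(analogy('long', 'longest', 'short',
#               final_embeddings, dictionary, reverse_dictionary))
```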

In later articles we will take this powerful idea and use it to create a Sentiment Analyzer. We will see how this reasoning and logical / mathematical framework gives us the power we need to define and determine the underlying meaning of a sentence. We will dive even deeper into the world of Deep Learning.

Shubhadeep Roychowdhury

Shubhadeep Roychowdhury was born in 1979 in a secluded small village in Eastern India. He has always been a curious soul and studied Physics and Computer Science at the Bachelor's and Master's level. He has been working as a professional software developer for the last 11 years and has designed and developed a number of successful and highly scalable backend systems across India, the USA and Europe. He has recently been working on serious deep learning projects, although NLP has always been a passion of his. Living in Paris with his wife, Shubhadeep can often be found trying to write, or travelling with his wife, when he is not working.

Linkedin profile
