Long Short-Term Memory (LSTM)

In this video we introduce several LSTM architectures that are especially useful for text:


  • How LSTMs perform classification
  • Deep LSTMs
  • Bidirectional LSTMs
  • CNN/LSTM hybrids

Long Short-Term Memory

We have looked at how to do text classification using time series data and LSTMs, and also using convolutional neural networks. In this tutorial, we are going to put it all together and see how to use LSTMs and hybrid models to do text classification on the IMDB movie reviews dataset.

Classic LSTM

Let’s first go over how to use an LSTM to do classification, using concepts we have already seen. First, we translate the words in our text into word embeddings and feed them into an LSTM. The LSTM produces an output at every time step, but we only care about the final output, which is a vector the same size as the hidden layer. To turn this output into a classification, we run it through a dense layer and interpret the result as either ‘positive’ or ‘negative’.
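To make these steps concrete, here is a minimal pure-Python sketch of the pipeline: look up embeddings, run an LSTM over them, keep only the final hidden state, and pass it through a dense layer with a sigmoid. It is not the code from imdb-lstm.py. The vocabulary, the sizes, and the random (untrained) weights are all made up for illustration, and biases are left out to keep it short.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(matrix, vector):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def lstm_final_state(embeddings, weights, hidden_dims):
    """Run one LSTM over a sequence of embedding vectors and return the final hidden state."""
    h = [0.0] * hidden_dims  # hidden state
    c = [0.0] * hidden_dims  # cell state
    for x in embeddings:
        xh = x + h  # concatenate the current input with the previous hidden state
        i = [sigmoid(z) for z in matvec(weights["input"], xh)]
        f = [sigmoid(z) for z in matvec(weights["forget"], xh)]
        o = [sigmoid(z) for z in matvec(weights["output"], xh)]
        g = [math.tanh(z) for z in matvec(weights["candidate"], xh)]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h

rng = random.Random(0)
embedding_dims, hidden_dims = 4, 3  # hypothetical sizes
vocabulary = ["this", "movie", "was", "great"]

# Untrained, random parameters: the prediction below is meaningless, only the shapes matter.
embedding_table = {word: [rng.uniform(-1, 1) for _ in range(embedding_dims)] for word in vocabulary}
weights = {
    gate: [[rng.uniform(-0.1, 0.1) for _ in range(embedding_dims + hidden_dims)]
           for _ in range(hidden_dims)]
    for gate in ("input", "forget", "output", "candidate")
}
dense_weights = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dims)]

review = ["this", "movie", "was", "great"]
final_h = lstm_final_state([embedding_table[word] for word in review], weights, hidden_dims)

# Dense layer with a single sigmoid unit turns the final state into a probability.
probability = sigmoid(sum(w * v for w, v in zip(dense_weights, final_h)))
label = "positive" if probability >= 0.5 else "negative"
print(len(final_h), label)
```

Notice that the final hidden state has exactly hidden_dims entries, no matter how long the review is. That fixed size is what lets us attach a dense layer on top.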

Let’s see how this works in code.


Go into the directory lang-classifier, and open up imdb-lstm.py. This code looks very similar to imdb-cnn.py, but we change the architecture of our model to use an LSTM instead of a CNN.

What if we wanted to make our model more complex, or go deeper? One way is to increase the amount of information passed from one step to the next by increasing the size of the hidden dimension. But if we want to start going into ‘deep learning’, we can actually create a deep LSTM.
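To get a feel for what increasing the hidden dimension costs, we can count the weights in a single LSTM layer. The sketch below uses the standard formula for an LSTM with four gates, each with input weights, recurrent weights, and a bias. The embedding size of 50 is an assumption for illustration, not a value taken from the tutorial code.

```python
def lstm_param_count(input_dims, hidden_dims):
    """Number of weights in one standard LSTM layer."""
    # Each of the four gates has input weights, recurrent weights, and a bias vector.
    per_gate = hidden_dims * input_dims + hidden_dims * hidden_dims + hidden_dims
    return 4 * per_gate

embedding_dims = 50  # hypothetical embedding size
for hidden_dims in (50, 100, 200):
    print(hidden_dims, lstm_param_count(embedding_dims, hidden_dims))
```

Because of the hidden_dims * hidden_dims term, the count grows faster than linearly: each doubling of the hidden size roughly triples the number of weights in this example.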


A deep LSTM is two (or more) LSTMs stacked on top of each other, where the output of the first LSTM at every time step is fed into the second LSTM as its input sequence.

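How should we do this with our code? Copy line 40, where we add our first LSTM layer, and paste it directly below. We then have to tell Keras that the two layers are hooked together by adding the flag return_sequences=True to the first one, so that it passes on its output at every time step rather than only the last.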

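# line 40
# return_sequences=True makes the first LSTM emit its output at every time step,
# which the second LSTM needs as its input sequence.
model.add(LSTM(config.hidden_dims, activation="sigmoid", return_sequences=True))
model.add(LSTM(config.hidden_dims, activation="sigmoid"))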
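To see why the return_sequences flag is needed, it helps to trace the shapes. The helper below is a hypothetical stand-in written for this explanation, not a Keras function. It only mimics the shape rule: an LSTM layer expects a 3D input of (batch, timesteps, features) and returns either the whole sequence or just the last step. The batch size, review length, and layer sizes are made-up values.

```python
def lstm_output_shape(input_shape, units, return_sequences=False):
    """Shape an LSTM layer produces for a (batch, timesteps, features) input."""
    if len(input_shape) != 3:
        raise ValueError("expected (batch, timesteps, features), got %r" % (input_shape,))
    batch, timesteps, _ = input_shape
    return (batch, timesteps, units) if return_sequences else (batch, units)

embedded = (32, 400, 50)  # hypothetical: 32 reviews, 400 words each, 50-dim embeddings
hidden_dims = 100         # hypothetical hidden size

# With the flag, the first layer hands a full sequence to the second layer.
first = lstm_output_shape(embedded, hidden_dims, return_sequences=True)
second = lstm_output_shape(first, hidden_dims)
print(first, second)

# Without the flag, the first layer returns only its last step, and stacking fails.
try:
    lstm_output_shape(lstm_output_shape(embedded, hidden_dims), hidden_dims)
except ValueError as error:
    print("stacking failed:", error)
```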

This takes a long time to run. On this dataset the model does not get great results, because it is too powerful for such a small amount of data and tends to overfit. If you have a larger dataset, however, you may want to try deep LSTMs.

Bidirectional LSTM

This is another type of LSTM in which we take two LSTMs and run them in opposite directions. For text, we might want to do this because there is information running from left to right, but there is also information running from right to left. We then take the output of the last node of the LSTM running left to right and the output of the first node of the LSTM running right to left, concatenate them, and feed the result into a dense layer.
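The concatenation step can be sketched in a few lines of plain Python. To keep it short, the recurrence below is a toy elementwise tanh update standing in for a real LSTM, and the weights and input values are invented for illustration. The point is only the wiring: one pass reads the sequence left to right, the other reads it right to left with its own weights, and the two final states are joined.

```python
import math

def run_recurrent(sequence, hidden_dims, input_weight, recurrent_weight):
    """Toy recurrence (a stand-in for an LSTM); returns the state after the last step."""
    h = [0.0] * hidden_dims
    for x in sequence:
        # The small offset 0.1 * j just makes each hidden unit behave differently.
        h = [math.tanh(input_weight * x + recurrent_weight * hj + 0.1 * j)
             for j, hj in enumerate(h)]
    return h

def bidirectional_summary(sequence, hidden_dims):
    # Left-to-right pass: its final state is computed at the last word.
    forward = run_recurrent(sequence, hidden_dims, 0.5, 0.8)
    # Right-to-left pass with separate weights: its final state is computed at the first word.
    backward = run_recurrent(list(reversed(sequence)), hidden_dims, -0.3, 0.6)
    return forward + backward  # list concatenation

sequence = [0.2, -1.0, 0.7, 0.1]  # hypothetical one-dimensional inputs
summary = bidirectional_summary(sequence, hidden_dims=3)
print(len(summary))
```

The dense layer that follows therefore sees a vector twice the size of the hidden layer.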

This is quite easy to do in Keras: we just wrap the LSTM layer in the Bidirectional wrapper. On line 40, change the code to:

# line 40
# Bidirectional must be imported from the same Keras layers module as LSTM.
model.add(Bidirectional(LSTM(config.hidden_dims, activation="sigmoid")))


Another architecture that has been getting popular recently is a hybrid CNN and LSTM. We can start with a convolution and pooling layer, and then feed the result into an LSTM. The hope is to keep most of the power of the LSTM while the convolution and pooling layers shorten the sequence the LSTM has to process, so that the model runs faster.
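A quick calculation shows where the speed-up comes from. The sketch below assumes a stride-1 convolution with no padding followed by non-overlapping pooling with no padding. The review length, kernel size, and pool size are hypothetical values chosen for illustration.

```python
def conv1d_output_length(length, kernel_size):
    """Sequence length after a stride-1 1D convolution with no padding."""
    return length - kernel_size + 1

def pool1d_output_length(length, pool_size):
    """Sequence length after non-overlapping pooling with no padding."""
    return (length - pool_size) // pool_size + 1

review_length = 400  # hypothetical number of words per review
kernel_size = 3      # hypothetical convolution width
pool_size = 4        # hypothetical pooling width

after_conv = conv1d_output_length(review_length, kernel_size)
after_pool = pool1d_output_length(after_conv, pool_size)
print(review_length, after_conv, after_pool)
```

With these numbers the LSTM steps through 99 positions instead of 400. Since an LSTM has to process its input one step at a time, a shorter sequence translates fairly directly into less work.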

To create a hybrid CNN/LSTM, add the following lines to your code:

// line 40
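One possible version of such a hybrid, written as a standalone sketch rather than as the exact lines for imdb-lstm.py, might look like this. It assumes the tensorflow.keras API, and every size below (vocabulary, review length, filters, and so on) is a made-up placeholder, not a value from the tutorial's config.

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, Dense, Embedding, LSTM, MaxPooling1D

# Hypothetical hyperparameters, chosen only for illustration.
vocab_size = 1000
maxlen = 400
embedding_dims = 50
filters = 64
kernel_size = 3
pool_size = 4
hidden_dims = 100

model = Sequential([
    Input(shape=(maxlen,)),                           # a review as a sequence of word indices
    Embedding(vocab_size, embedding_dims),            # word indices -> word embeddings
    Conv1D(filters, kernel_size, activation="relu"),  # local patterns over neighboring words
    MaxPooling1D(pool_size=pool_size),                # shorten the sequence before the LSTM
    LSTM(hidden_dims, activation="sigmoid"),          # final output only
    Dense(1, activation="sigmoid"),                   # probability that the review is positive
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()
```

In the tutorial file, the equivalent change would be to add the convolution and pooling layers with model.add just before the LSTM layer.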


In this tutorial, we looked at some variations of LSTMs, including deep LSTMs, bidirectional LSTMs, and hybrid CNN/LSTMs. These architectures apply to many kinds of text, including text in other languages, and in future tutorials we will show how to apply these models to larger datasets.