
Showing posts from November, 2021

Long Short-Term Memory- LSTM

Recurrent neural networks have problems associated with vanishing and exploding gradients. This is a common problem in neural network updates, where successive multiplication by the matrix $W^{(k)}$ is inherently unstable; it either results in the gradient disappearing during backpropagation, or in it blowing up to large values in an unstable way. This type of instability is the direct result of successive multiplication with the (recurrent) weight matrix at various time-stamps. One way of viewing this problem is that a neural network that uses only multiplicative updates is good only at learning over short sequences, and is therefore inherently endowed with good short-term memory but poor long-term memory. To address this problem, one solution is to change the recurrence equation for the hidden vector by using the LSTM, which introduces an explicit long-term memory (the cell state). The operations of the LSTM are designed to have fine-grained control over the data written into this long-term memory.
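
To make the gating idea concrete, here is a minimal NumPy sketch of a single LSTM step. It is only an illustration, not the exact formulation above: the stacked parameter layout, names, and toy dimensions are assumptions. The key point is that the cell state $c_t$ is updated additively through the forget and input gates, rather than by repeated multiplication with a recurrent weight matrix.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step: gates control what is written to and read from the
    long-term cell state c, which is updated additively."""
    W, U, b = params["W"], params["U"], params["b"]  # parameters stacked for the 4 gates
    z = W @ x_t + U @ h_prev + b                     # pre-activations, shape (4*d,)
    d = h_prev.shape[0]
    i = sigmoid(z[0*d:1*d])        # input gate: how much new content to write
    f = sigmoid(z[1*d:2*d])        # forget gate: how much old memory to keep
    o = sigmoid(z[2*d:3*d])        # output gate: how much memory to expose
    g = np.tanh(z[3*d:4*d])        # candidate content
    c_t = f * c_prev + i * g       # additive long-term memory update
    h_t = o * np.tanh(c_t)         # short-term (hidden) state
    return h_t, c_t

# Toy dimensions (hypothetical): input size 4, hidden size 2.
rng = np.random.default_rng(0)
d_in, d_h = 4, 2
params = {"W": rng.normal(size=(4*d_h, d_in)) * 0.1,
          "U": rng.normal(size=(4*d_h, d_h)) * 0.1,
          "b": np.zeros(4*d_h)}
h, c = np.zeros(d_h), np.zeros(d_h)
x = np.eye(d_in)[0]                # one-hot input for the first word
h, c = lstm_step(x, h, c, params)
print(h, c)
```

Because the cell state is carried forward through an elementwise, gated sum, information (and gradient) can survive across many time-stamps without being repeatedly multiplied by the recurrent weight matrix.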

Applications of Recurrent Neural Networks

Recurrent neural networks have numerous applications in machine learning, associated with tasks such as information retrieval, speech recognition, and handwriting recognition. Text data forms the predominant setting for applications of RNNs, although there are several applications to computational biology as well. Most of the applications of RNNs fall into one of two categories: 1. Conditional language modeling: When the output of a recurrent network is a language model, one can enhance it with context in order to provide an output that is relevant to that context. In most of these cases, the context is the neural output of another neural network. To provide one example, in image captioning the context is the neural representation of an image provided by a convolutional network, and the language model provides a caption for the image. In machine translation, the context is the representation of a sentence in a source language (produced by another RNN), and the language model in the target language generates a translation.
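
As a rough illustration of the conditional language modeling idea (a sketch, not the exact captioning or translation architectures mentioned above), the context vector produced by the other network can simply be used to initialize the hidden state of the decoder RNN. All names and dimensions below are made up for the example.

```python
import numpy as np

def rnn_decoder_step(x_t, h_prev, W_xh, W_hh, W_hy):
    """One step of a decoder RNN used as a conditional language model."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev)
    scores = W_hy @ h_t
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                     # softmax over the vocabulary
    return h_t, probs

vocab, d_h = 4, 2
rng = np.random.default_rng(1)
W_xh = rng.normal(size=(d_h, vocab)) * 0.1
W_hh = rng.normal(size=(d_h, d_h)) * 0.1
W_hy = rng.normal(size=(vocab, d_h)) * 0.1

# The "context" would come from another network (e.g. a CNN for image
# captioning, or an encoder RNN for translation); here it is just random.
context = rng.normal(size=d_h)

h = context                                  # condition the language model on the context
x = np.zeros(vocab); x[0] = 1.0              # hypothetical one-hot start token
h, probs = rnn_decoder_step(x, h, W_xh, W_hh, W_hy)
print(probs)
```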

Language Modelling Example of RNN

In order to illustrate the workings of the RNN, we will use a toy example of a single sequence defined on a vocabulary of four words. Consider the sentence: “The cat chased the mouse.” In this case, we have a lexicon of four words, which are {“the,” “cat,” “chased,” “mouse”}. In the figure below, we have shown the probabilistic prediction of the next word at each of the timestamps from 1 to 4. Ideally, we would like the probability of the next word to be predicted correctly from the probabilities of the previous words. Each one-hot encoded input vector $\bar{x}_t$ has length four, in which only one bit is 1 and the remaining bits are 0s. The main flexibility here is in the dimensionality $p$ of the hidden representation, which we set to 2 in this case. As a result, the matrix $W_{xh}$ will be a 2 × 4 matrix, so that it maps a one-hot encoded input vector into a hidden vector $\bar{h}_t$ of size 2. As a practical matter, each column of $W_{xh}$ corresponds to one of the four words, and one of these columns is selected by the multiplication $W_{xh}\bar{x}_t$ when the corresponding word is the input at timestamp $t$.
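
The following NumPy sketch walks through this toy example with untrained (random) weights, just to show the shapes and the column-selection effect of the one-hot inputs. A tanh hidden activation and a softmax output layer are assumed here; they are not stated explicitly in the excerpt above.

```python
import numpy as np

# Toy lexicon from the example; the index order is an assumption.
lexicon = ["the", "cat", "chased", "mouse"]
sentence = ["the", "cat", "chased", "the", "mouse"]

def one_hot(word, vocab):
    v = np.zeros(len(vocab)); v[vocab.index(word)] = 1.0
    return v

p, d = 2, 4                            # hidden dimensionality 2, vocabulary size 4
rng = np.random.default_rng(42)        # untrained weights, for shapes only
W_xh = rng.normal(size=(p, d)) * 0.1   # 2 x 4: maps one-hot word -> hidden
W_hh = rng.normal(size=(p, p)) * 0.1   # 2 x 2: hidden -> hidden
W_hy = rng.normal(size=(d, p)) * 0.1   # 4 x 2: hidden -> word scores

h = np.zeros(p)
for t, word in enumerate(sentence[:-1], start=1):
    x_t = one_hot(word, lexicon)
    # Because x_t is one-hot, W_xh @ x_t simply selects the column of
    # W_xh that corresponds to the current word.
    h = np.tanh(W_xh @ x_t + W_hh @ h)
    scores = W_hy @ h
    probs = np.exp(scores - scores.max()); probs /= probs.sum()
    print(f"t={t}: after '{word}', P(next word) =", dict(zip(lexicon, probs.round(3))))
```

With trained weights, the printed distribution at each of the four timestamps would concentrate on the correct next word of the sentence.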

The Architecture of Recurrent Neural Network

The simplest recurrent neural network is shown in Figure (a) below. A key point here is the presence of the self-loop in Figure (a), which causes the hidden state of the neural network to change after the input of each word in the sequence. In practice, one only works with sequences of finite length, and it makes sense to unfold the loop into a “time-layered” network that looks more like a feed-forward network. This network is shown in Figure (b). Note that in this case, we have a different node for the hidden state at each time-stamp, and the self-loop has been unfurled into a feed-forward network. This representation is mathematically equivalent to Figure (a), but is much easier to comprehend because of its similarity to a traditional network. The weight matrices in different temporal layers are shared to ensure that the same function is used at each time-stamp. The annotations $W_{xh}$, $W_{hh}$, and $W_{hy}$ of the weight matrices in Figure (b) make this sharing evident.
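
In equation form, the unfolded network of Figure (b) computes the following recurrence at every time-stamp $t$, with the same shared matrices throughout (the tanh hidden activation and softmax output shown here are a common choice, not something the excerpt specifies):

$$\bar{h}_t = \tanh\!\left(W_{xh}\,\bar{x}_t + W_{hh}\,\bar{h}_{t-1}\right), \qquad \bar{y}_t = \mathrm{softmax}\!\left(W_{hy}\,\bar{h}_t\right)$$

Because $W_{xh}$, $W_{hh}$, and $W_{hy}$ do not depend on $t$, the same function is applied at every time-stamp, which is what allows one network to process sequences of arbitrary length.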