Posts

Case Studies - Convolutional Architectures - AlexNet

In the following, we provide some case studies of convolutional architectures, derived from successful entries to the ILSVRC competition in recent years. These are instructive because they provide an understanding of the factors in neural network design that make these networks work well. Even though recent years have seen some changes in architectural design (like the ReLU activation), it is striking how similar modern architectures are to the basic design of LeNet-5. The main changes from LeNet-5 to modern architectures are the explosion of depth, the use of ReLU activation, and the training efficiency enabled by modern hardware and optimization enhancements. Modern architectures are deeper, and they use a variety of computational, architectural, and hardware tricks to efficiently train these networks with large amounts of data. Hardware advancements should not be underestimated; modern GPU-based platforms are 10,000 times faster than ...

Gated Recurrent Units - GRUs

The Gated Recurrent Unit (GRU) can be viewed as a simplification of the LSTM, which does not use explicit cell states. Another difference is that the LSTM directly controls the amount of information changed in the hidden state using separate forget and output gates. On the other hand, a GRU uses a single reset gate to achieve the same goal. However, the basic idea in the GRU is quite similar to that of an LSTM, in terms of how it partially resets the hidden states. It was introduced by Kyunghyun Cho et al. in 2014. The GRU does not have a separate cell state ($C_t$); it only has a hidden state ($H_t$). Due to the simpler architecture, GRUs are faster to train. They are similar to LSTMs except that they have two gates: a reset gate and an update gate. The reset gate determines how to combine the new input with the previous memory, and the update gate determines how much of the previous state to keep. The update gate in a GRU plays the role that the input and forget gates play in an LSTM. We don't have the...
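To make the two gates concrete, here is a minimal NumPy sketch of a single GRU step, assuming the common formulation in which the reset gate $r$ partially resets the previous memory before the candidate state is computed, and the update gate $z$ decides how much of the previous state to keep. The parameter names (Wz, Uz, etc.) and toy dimensions are illustrative only, not a reference implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU step with a reset gate r and an update gate z."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x_t + Uz @ h_prev + bz)             # update gate: how much of the previous state to keep
    r = sigmoid(Wr @ x_t + Ur @ h_prev + br)             # reset gate: how to combine new input with previous memory
    h_cand = np.tanh(Wh @ x_t + Uh @ (r * h_prev) + bh)  # candidate state built from a partially reset memory
    return z * h_prev + (1.0 - z) * h_cand               # blend the old state with the candidate

# illustrative toy dimensions: input size d = 4, hidden size p = 3
d, p = 4, 3
rng = np.random.default_rng(0)
shapes = [(p, d), (p, p), (p,), (p, d), (p, p), (p,), (p, d), (p, p), (p,)]
params = [rng.standard_normal(s) for s in shapes]
h = np.zeros(p)
for x_t in rng.standard_normal((5, d)):                  # a sequence of five random input vectors
    h = gru_step(x_t, h, params)
print(h)
```

Note that there is no separate cell state anywhere in the step: the single hidden vector `h` carries all the memory, which is exactly the simplification relative to the LSTM.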

Long Short-Term Memory - LSTM

Recurrent neural networks have problems associated with vanishing and exploding gradients. This is a common problem in neural network updates where successive multiplication by the matrix $W^{(k)}$ is inherently unstable; it either results in the gradient disappearing during backpropagation, or in it blowing up to large values in an unstable way. This type of instability is the direct result of successive multiplication with the (recurrent) weight matrix at various time-stamps. One way of viewing this problem is that a neural network that uses only multiplicative updates is good only at learning over short sequences, and is therefore inherently endowed with good short-term memory but poor long-term memory. To address this problem, a solution is to change the recurrence equation for the hidden vector with the use of the LSTM, which introduces an additional long-term memory (the cell state). The operations of the LSTM are designed to have fine-grained control over the data written into this long-term memory. LSTM ...
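As a rough illustration of that fine-grained control, here is a minimal NumPy sketch of one LSTM step, assuming the standard formulation with input, forget, and output gates; the stacked parameter layout and toy dimensions are choices made here for brevity. The key point is that the cell state is updated additively (gated copy plus gated write) rather than by repeated multiplication with a weight matrix, which is what helps with the gradient instability described above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step; W, U, b stack the parameters of the three gates and the cell update."""
    p = h_prev.shape[0]
    gates = W @ x_t + U @ h_prev + b          # all four pre-activations at once, shape (4p,)
    i = sigmoid(gates[0:p])                   # input gate: what to write into the cell state
    f = sigmoid(gates[p:2*p])                 # forget gate: what to erase from the cell state
    o = sigmoid(gates[2*p:3*p])               # output gate: what to expose as the hidden state
    c_cand = np.tanh(gates[3*p:4*p])          # candidate cell update
    c = f * c_prev + i * c_cand               # additive update of the long-term memory
    h = o * np.tanh(c)                        # short-term (hidden) state read out from the cell
    return h, c

# illustrative toy dimensions: input size d = 4, hidden size p = 3
d, p = 4, 3
rng = np.random.default_rng(1)
W = rng.standard_normal((4 * p, d))
U = rng.standard_normal((4 * p, p))
b = np.zeros(4 * p)
h, c = np.zeros(p), np.zeros(p)
for x_t in rng.standard_normal((6, d)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h, c)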

Applications of Recurrent Neural Networks

Recurrent neural networks have numerous applications in machine learning, associated with information retrieval, speech recognition, and handwriting recognition. Text data forms the predominant setting for applications of RNNs, although there are several applications to computational biology as well. Most of the applications of RNNs fall into one of two categories: 1. Conditional language modeling: When the output of a recurrent network is a language model, one can enhance it with context in order to provide an output relevant to that context. In most of these cases, the context is the neural output of another neural network. To provide one example, in image captioning the context is the neural representation of an image provided by a convolutional network, and the language model provides a caption for the image. In machine translation, the context is the representation of a sentence in a source language (produced by another RNN), and the language model in the t...
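To illustrate the conditioning idea, here is a minimal NumPy sketch of a decoder RNN whose initial hidden state is the context vector produced by another network (a CNN in captioning, an encoder RNN in translation). Feeding the context in as the initial state is only one common way of conditioning (other designs inject it at every step); the matrix names, dimensions, and random weights are purely illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def conditioned_decoder(context, x_seq, Wxh, Whh, Why):
    """Decode a sequence conditioned on a context vector used as the initial hidden state."""
    h = np.tanh(context)                  # context: e.g. CNN image features or an encoder RNN state
    outputs = []
    for x in x_seq:
        h = np.tanh(Wxh @ x + Whh @ h)    # standard recurrence
        outputs.append(softmax(Why @ h))  # distribution over the output vocabulary
    return outputs

# illustrative dimensions: vocabulary size 10, hidden size 8, three decoding steps
V, p = 10, 8
rng = np.random.default_rng(2)
Wxh, Whh, Why = rng.standard_normal((p, V)), rng.standard_normal((p, p)), rng.standard_normal((V, p))
context = rng.standard_normal(p)          # stands in for the output of another network
x_seq = np.eye(V)[[0, 3, 5]]              # three one-hot input tokens
for dist in conditioned_decoder(context, x_seq, Wxh, Whh, Why):
    print(dist.argmax())
```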

Language Modelling Example of RNN

In order to illustrate the workings of the RNN, we will use a toy example of a single sequence defined on a vocabulary of four words. Consider the sentence: "The cat chased the mouse." In this case, we have a lexicon of four words, which is {"the", "cat", "chased", "mouse"}. In the figure below, we have shown the probabilistic prediction of the next word at each of the timestamps from 1 to 4. Ideally, we would like the probability of the next word to be predicted correctly from the probabilities of the previous words. Each one-hot encoded input vector $\bar{x}_t$ has length four, in which only one bit is 1 and the remaining bits are 0s. The main flexibility here is in the dimensionality $p$ of the hidden representation, which we set to 2 in this case. As a result, the matrix $W_{xh}$ will be a 2 × 4 matrix, so that it maps a one-hot encoded input vector into a hidden vector $h_t$ of size 2. As a practical matter, each column of $W_{xh}$ corresponds to one of the four words, and one of th...
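The shapes in this toy example can be checked directly in code. The sketch below is a minimal NumPy forward pass over the sentence, assuming a tanh hidden activation and a softmax output layer; the weights are random and untrained, so the printed next-word probabilities are not meaningful, only the dimensions (a 2 × 4 matrix $W_{xh}$, a hidden vector of size 2, and a length-4 output distribution) are the point.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

vocab = ["the", "cat", "chased", "mouse"]
sentence = ["the", "cat", "chased", "the", "mouse"]

d, p = len(vocab), 2                      # vocabulary size 4, hidden dimensionality p = 2
rng = np.random.default_rng(0)            # untrained, randomly initialized weights
Wxh = rng.standard_normal((p, d))         # 2 x 4: each column corresponds to one of the four words
Whh = rng.standard_normal((p, p))
Why = rng.standard_normal((d, p))

h = np.zeros(p)
for t, word in enumerate(sentence[:-1]):
    x = np.eye(d)[vocab.index(word)]      # one-hot encoded input vector of length four
    h = np.tanh(Wxh @ x + Whh @ h)        # hidden vector of size 2
    y = softmax(Why @ h)                  # predicted probabilities of the next word
    print(f"t={t+1}", dict(zip(vocab, y.round(2))))
```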

The Architecture of a Recurrent Neural Network

The simplest recurrent neural network is shown in Figure (a) below. A key point here is the presence of the self-loop in Figure (a), which will cause the hidden state of the neural network to change after the input of each word in the sequence. In practice, one only works with sequences of finite length, and it makes sense to unfold the loop into a "time-layered" network that looks more like a feed-forward network. This network is shown in Figure (b). Note that in this case, we have a different node for the hidden state at each time-stamp, and the self-loop has been unfurled into a feed-forward network. This representation is mathematically equivalent to Figure (a), but is much easier to comprehend because of its similarity to a traditional network. The weight matrices in different temporal layers are shared to ensure that the same function is used at each time-stamp. The annotations $W_{xh}$, $W_{hh}$, and $W_{hy}$ of the weight matrices in Figure (b) make the sharing evident. It is notew...
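Concretely, with the shared matrices the unfolded network applies the same update at every time-stamp. A minimal form of that update, assuming the common choice of a tanh hidden activation, is

$$\bar{h}_t = \tanh\left(W_{xh}\,\bar{x}_t + W_{hh}\,\bar{h}_{t-1}\right), \qquad \bar{y}_t = W_{hy}\,\bar{h}_t$$

The absence of a time index on $W_{xh}$, $W_{hh}$, and $W_{hy}$ is exactly the parameter sharing that the unfolded diagram in Figure (b) makes explicit; for discrete outputs, $\bar{y}_t$ would typically be passed through a softmax to obtain probabilities.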

Recursive Neural Network

Recursive neural networks represent yet another generalization of recurrent networks, with a different kind of computational graph, which is structured as a deep tree rather than the chain-like structure of RNNs. The typical computational graph for a recursive network is illustrated in the figure below. Recursive neural networks were introduced by Pollack (1990), and their potential use for learning to reason was described by Bottou (2011). Recursive networks have been successfully applied to processing data structures as input to neural nets (Frasconi et al., 1997, 1998), in natural language processing (Socher et al., 2011a,c, 2013a), as well as in computer vision (Socher et al., 2011b). One clear advantage of recursive nets over recurrent nets is that, for a sequence of the same length $\tau$, the depth (measured as the number of compositions of nonlinear operations) can be drastically reduced from $\tau$ to $O(\log \tau)$, which might help deal with long-term dependencies. An open question is h...
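To make the tree-structured computation concrete, here is a minimal sketch of a recursive composition over a binary tree, in the spirit of the Socher et al. models: every internal node applies the same shared composition function to the representations of its two children. The word vectors, the parse tree, the dimensions, and the tanh composition are all assumptions made for illustration, not a faithful reproduction of any of the cited architectures.

```python
import numpy as np

def compose(node, W, b):
    """Recursively compose a binary tree bottom-up; leaves are vectors, internal nodes are pairs."""
    if isinstance(node, np.ndarray):
        return node                                            # leaf: already a vector
    left, right = (compose(child, W, b) for child in node)
    return np.tanh(W @ np.concatenate([left, right]) + b)      # shared composition at every node

# illustrative: 4-dimensional word vectors and a made-up parse of "the cat chased the mouse"
d = 4
rng = np.random.default_rng(3)
vec = {w: rng.standard_normal(d) for w in ["the", "cat", "chased", "mouse"]}
W, b = rng.standard_normal((d, 2 * d)), np.zeros(d)

tree = ((vec["the"], vec["cat"]), (vec["chased"], (vec["the"], vec["mouse"])))
print(compose(tree, W, b))                                     # root representation of the whole sentence
```

For a balanced tree over $\tau$ leaves, the number of nested compositions from any leaf to the root is $O(\log \tau)$, which is the depth reduction mentioned above.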