Posts

Recursive Neural Network

Recursive neural networks represent yet another generalization of recurrent networks, with a different kind of computational graph, which is structured as a deep tree rather than the chain-like structure of RNNs. The typical computational graph for a recursive network is illustrated in the figure below. Recursive neural networks were introduced by Pollack (1990), and their potential use for learning to reason was described by Bottou (2011). Recursive networks have been successfully applied to processing data structures as input to neural nets (Frasconi et al., 1997, 1998), in natural language processing (Socher et al., 2011a,c, 2013a), as well as in computer vision (Socher et al., 2011b). One clear advantage of recursive nets over recurrent nets is that for a sequence of the same length $\tau$, the depth (measured as the number of compositions of nonlinear operations) can be drastically reduced from $\tau$ to $O(\log \tau)$, which might help deal with long-term dependencies. An open question is h...
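To make the tree-structured computation concrete, here is a minimal sketch of a recursive network over a binary tree, assuming a single shared composition matrix applied at every internal node. The names (`Node`, `compose`) and the representation size are illustrative choices, not something specified above.

```python
import numpy as np

# Minimal sketch of a recursive network over a binary tree.
# Each leaf holds an input vector; each internal node combines its children's
# representations with one shared weight matrix W and bias b:
#     h = tanh(W [h_left; h_right] + b)

d = 4                                        # representation size (arbitrary)
rng = np.random.default_rng(0)
W = rng.standard_normal((d, 2 * d)) * 0.1    # shared composition weights
b = np.zeros(d)

class Node:
    def __init__(self, left=None, right=None, x=None):
        self.left, self.right, self.x = left, right, x   # leaf iff x is given

def compose(node):
    """Return the representation of a (sub)tree by recursive composition."""
    if node.x is not None:                   # leaf: use its input vector
        return node.x
    h_left = compose(node.left)
    h_right = compose(node.right)
    return np.tanh(W @ np.concatenate([h_left, h_right]) + b)

# A balanced tree over 4 inputs has depth 2, roughly O(log 4), versus
# depth 4 for a chain-structured RNN over the same sequence.
leaves = [Node(x=rng.standard_normal(d)) for _ in range(4)]
root = Node(Node(leaves[0], leaves[1]), Node(leaves[2], leaves[3]))
print(compose(root))                         # root representation
```

The depth reduction mentioned above is visible here: the root is only two nonlinear compositions away from any leaf, rather than four.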

Multilayer Recurrent Networks (Deep Recurrent Networks)

In all the aforementioned applications, a single-layer RNN architecture is used for ease of understanding. However, in practical applications, a multilayer architecture is used in order to build models of greater complexity. Furthermore, this multilayer architecture can be used in combination with advanced variations of the RNN, such as the LSTM architecture or the gated recurrent unit. These advanced architectures are introduced in later sections. An example of a deep network containing three layers is shown in the figure below. Note that nodes in higher-level layers receive input from those in lower-level layers. The relationships among the hidden states can be generalized directly from the single-layer network. First, we rewrite the recurrence equation of the hidden layer (for single-layer networks) in a form that can be adapted easily to multilayer networks: $\bar{h}_t = \tanh\left(W \begin{bmatrix} \bar{x}_t \\ \bar{h}_{t-1} \end{bmatrix}\right)$. Here, we have put together a larger matrix $W = [W_{xh},W_{hh}]$ that includes the columns of $W_{xh}$ and $W_{hh}$. Similarl...
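The following is a short sketch of a three-layer deep RNN built from the combined-matrix recurrence above: layer 1 reads the external input, and each higher layer reads the hidden state of the layer below it at the same time step. Layer sizes, initialization, and variable names are illustrative assumptions.

```python
import numpy as np

# Sketch of a three-layer (deep) RNN with tanh units, using the combined
# form h_t = tanh(W [input_t; h_{t-1}]) from the text.

d_in, d_h, L, T = 3, 5, 3, 6             # input size, hidden size, layers, steps
rng = np.random.default_rng(1)

# One combined matrix W^(k) = [W_xh^(k), W_hh^(k)] per layer; layer 1 reads
# the external input, layers 2..L read the hidden state of the layer below.
Ws = [rng.standard_normal((d_h, (d_in if k == 0 else d_h) + d_h)) * 0.1
      for k in range(L)]

xs = rng.standard_normal((T, d_in))       # an arbitrary input sequence
h = [np.zeros(d_h) for _ in range(L)]     # hidden states, one per layer

for t in range(T):
    bottom_up = xs[t]                     # input to layer 1 at time t
    for k in range(L):
        # concatenate the input from below with this layer's previous state
        h[k] = np.tanh(Ws[k] @ np.concatenate([bottom_up, h[k]]))
        bottom_up = h[k]                  # feeds the layer above

print(h[-1])                              # top-layer state after the last step
```

Note how the single-layer recurrence is reused unchanged at every layer; only the source of the "input" term differs between the first layer and the layers above it.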

Introduction to Recurrent Neural Networks

Recurrent neural networks, or RNNs (Rumelhart et al., 1986a), are a family of neural networks for processing sequential data. Much as a convolutional network is a neural network that is specialized for processing a grid of values $X$ such as an image, a recurrent neural network is a neural network that is specialized for processing a sequence of values $x^{(1)}, \ldots, x^{(\tau)}$. Just as convolutional networks can readily scale to images with large width and height, and some convolutional networks can process images of variable size, recurrent networks can scale to much longer sequences than would be practical for networks without sequence-based specialization. Most recurrent networks can also process sequences of variable length. All the neural architectures discussed earlier are inherently designed for multidimensional data in which the attributes are largely independent of one another. However, certain data types such as time-series, text, and biological data contain sequential de...
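As a minimal illustration, here is a sketch of the basic RNN recurrence $\bar{h}_t = \tanh(W_{xh}\bar{x}_t + W_{hh}\bar{h}_{t-1})$ applied over a sequence. The sizes and names are assumptions for the example; the key point is that the same weights are reused at every time step, which is what lets the network handle sequences of variable length.

```python
import numpy as np

# Minimal sketch of the basic RNN recurrence over a sequence x^(1), ..., x^(tau):
#     h_t = tanh(W_xh x_t + W_hh h_{t-1})
# The same weight matrices are shared across all time steps.

d_in, d_h, tau = 3, 4, 5
rng = np.random.default_rng(2)
W_xh = rng.standard_normal((d_h, d_in)) * 0.1
W_hh = rng.standard_normal((d_h, d_h)) * 0.1

xs = rng.standard_normal((tau, d_in))     # an arbitrary length-tau input sequence
h = np.zeros(d_h)                         # initial hidden state

for x_t in xs:                            # one update per time step
    h = np.tanh(W_xh @ x_t + W_hh @ h)

print(h)                                  # hidden state after the final step
```

Because the loop body does not depend on $\tau$, the same code processes sequences of any length without changing the number of parameters.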

Bidirectional Recurrent Networks

One disadvantage of recurrent networks is that the state at a particular time step only has knowledge of the past inputs up to a certain point in the sequence, but it has no knowledge of future states. In certain applications like language modeling, the results are vastly improved with knowledge about both past and future states. A specific example is handwriting recognition, in which there is a clear advantage in using knowledge about both the past and future symbols, because it provides a better idea of the underlying context. In the bidirectional recurrent network, we have separate hidden states $\bar{h}_t^{(f)}$ and $\bar{h}_t^{(b)}$ for the forward and backward directions. The forward hidden states interact only with each other, and the same is true for the backward hidden states. The main difference is that the forward states interact in the forward direction, while the backward states interact in the backward direction. Both $\bar{h}_t^{(f)}$ and...
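Below is a sketch of this idea, assuming tanh units and separate weight matrices for the two directions: the forward states $\bar{h}_t^{(f)}$ are computed left to right, the backward states $\bar{h}_t^{(b)}$ right to left, and at each step both are available to the output. The sizes, names, and the concatenation at the end are illustrative assumptions, not prescribed by the text.

```python
import numpy as np

# Sketch of a bidirectional RNN: two independent recurrences over the same
# input sequence, one per direction, with separate weights.

d_in, d_h, tau = 3, 4, 5
rng = np.random.default_rng(3)
W_xh_f = rng.standard_normal((d_h, d_in)) * 0.1   # forward weights
W_hh_f = rng.standard_normal((d_h, d_h)) * 0.1
W_xh_b = rng.standard_normal((d_h, d_in)) * 0.1   # backward weights
W_hh_b = rng.standard_normal((d_h, d_h)) * 0.1

xs = rng.standard_normal((tau, d_in))

# Forward pass: h_t^(f) depends only on x_1, ..., x_t.
h_f = np.zeros((tau, d_h))
h = np.zeros(d_h)
for t in range(tau):
    h = np.tanh(W_xh_f @ xs[t] + W_hh_f @ h)
    h_f[t] = h

# Backward pass: h_t^(b) depends only on x_t, ..., x_tau.
h_b = np.zeros((tau, d_h))
h = np.zeros(d_h)
for t in reversed(range(tau)):
    h = np.tanh(W_xh_b @ xs[t] + W_hh_b @ h)
    h_b[t] = h

# At each time step the output can use both directions, e.g. by concatenation.
h_both = np.concatenate([h_f, h_b], axis=1)       # shape (tau, 2 * d_h)
print(h_both.shape)
```

The two recurrences never feed into each other, which matches the statement above that forward states interact only with forward states and backward states only with backward states.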