The Basic Structure of a Convolutional Network

In convolutional neural networks, the states in each layer are arranged according to a spatial grid structure. These spatial relationships are inherited from one layer to the next, because each feature value is based on a small local spatial region in the previous layer. It is important to maintain these spatial relationships among the grid cells, because the convolution operation and the transformation to the next layer depend critically on them. Each layer in the convolutional network is a 3-dimensional grid structure with a height, a width, and a depth. The depth of a layer in a convolutional neural network should not be confused with the depth of the network itself. The word "depth" (when used in the context of a single layer) refers to the number of channels in each layer, such as the number of primary color channels (e.g., blue, green, and red) in the input image or the number of feature maps in the hidden layers. The use of the word "depth" to refer to both the number of channels in a layer and the number of layers in the network is an unfortunate overloading of terminology, but the intended meaning is usually clear from context.
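As a concrete illustration, the following minimal NumPy sketch (with illustrative 32×32 shapes, not ones prescribed by the text) shows how each layer is simply a 3-dimensional array whose third axis is the depth:

```python
import numpy as np

# A minimal sketch of the 3-dimensional grid structure of CNN layers.
# Shapes are illustrative: a 32x32 RGB input and a hypothetical hidden layer.
input_layer = np.zeros((32, 32, 3))    # height x width x depth (3 color channels)
hidden_layer = np.zeros((28, 28, 64))  # depth is now the number of feature maps

print(input_layer.shape)   # (32, 32, 3)
print(hidden_layer.shape)  # (28, 28, 64)
```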

Strides

There are other ways in which convolution can reduce the spatial footprint of the image (or hidden layer). The approach above performs the convolution at every position in the spatial extent of the feature map. However, it is not necessary to perform the convolution at every spatial position in the layer. One can reduce the level of granularity of the convolution by using the notion of strides. The description above corresponds to the case in which a stride of 1 is used. When a stride of $S_q$ is used in the $q$th layer, the convolution is performed at the locations $1, S_q + 1, 2S_q + 1$, and so on along both spatial dimensions of the layer. The spatial output of this convolution has a height of $(L_q - F_q)/S_q + 1$ and a width of $(B_q - F_q)/S_q + 1$ (assuming these quantities are integers). As a result, the use of strides reduces each spatial dimension of the layer by a factor of approximately $S_q$, and the area by a factor of approximately $S_q^2$.
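The output-size formula above is easy to verify in code. The following sketch assumes, as the formula does, that the stride evenly divides $L_q - F_q$ and that no padding is used:

```python
def conv_output_size(L_q, F_q, S_q):
    """Spatial output size of a convolution with filter size F_q and stride S_q.

    Assumes (L_q - F_q) is exactly divisible by S_q; no padding is applied.
    """
    assert (L_q - F_q) % S_q == 0, "stride must evenly divide L_q - F_q"
    return (L_q - F_q) // S_q + 1

# A stride of 2 roughly halves each spatial dimension (area shrinks by ~4).
print(conv_output_size(L_q=32, F_q=5, S_q=1))  # 28
print(conv_output_size(L_q=33, F_q=5, S_q=2))  # 15
```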

ReLU Layer

The convolution operation is interleaved with the pooling and ReLU operations. The ReLU activation is applied much as it is in a traditional neural network: the ReLU function is applied to each of the $L_q \times B_q \times d_q$ values in a layer to create $L_q \times B_q \times d_q$ thresholded values, which are then passed on to the next layer. Therefore, applying the ReLU does not change the dimensions of a layer, because it is a simple one-to-one mapping of activation values. In traditional neural networks, the activation function is combined with a linear transformation with a matrix of weights to create the next layer of activations. Similarly, a ReLU typically follows a convolution operation (which is the rough equivalent of the linear transformation in traditional neural networks), and the ReLU layer is often not explicitly shown in pictorial illustrations of convolutional neural network architectures.
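Since the ReLU is a one-to-one mapping, a short NumPy sketch suffices to illustrate that the layer's dimensions are preserved (the 28×28×64 shape is illustrative):

```python
import numpy as np

# ReLU is applied elementwise, so the layer's dimensions are unchanged.
layer = np.random.randn(28, 28, 64)    # L_q x B_q x d_q activations
activated = np.maximum(layer, 0.0)     # thresholds negative values to zero

assert activated.shape == layer.shape  # same L_q x B_q x d_q shape
```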

Pooling

The pooling operation is quite different. It works on small grid regions of size $P_q \times P_q$ in each layer, and produces another layer with the same depth (unlike filters). For each square region of size $P_q \times P_q$ in each of the $d_q$ activation maps, the maximum of the values in that region is returned. This approach is referred to as max-pooling. If a stride of 1 is used, then this produces a new layer of size $(L_q - P_q + 1) \times (B_q - P_q + 1) \times d_q$. However, it is more common to use a stride $S_q > 1$ in pooling. In such cases, the length of the new layer will be $(L_q - P_q)/S_q + 1$ and the breadth will be $(B_q - P_q)/S_q + 1$. Therefore, pooling drastically reduces the spatial dimensions of each activation map. Unlike convolution, pooling is done at the level of each activation map: whereas a convolution operation simultaneously uses all $d_q$ feature maps in combination with a filter to produce a single feature value, pooling operates independently on each activation map, producing $d_q$ output maps from $d_q$ input maps.
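The following NumPy sketch of max-pooling (with illustrative shapes, assuming the stride evenly divides $L_q - P_q$) makes the independence across activation maps explicit:

```python
import numpy as np

def max_pool(layer, P_q, S_q):
    """Max-pooling over P_q x P_q regions with stride S_q.

    Operates independently on each of the d_q activation maps, so the
    depth of the output equals the depth of the input.
    """
    L_q, B_q, d_q = layer.shape
    out_h = (L_q - P_q) // S_q + 1
    out_w = (B_q - P_q) // S_q + 1
    out = np.empty((out_h, out_w, d_q))
    for i in range(out_h):
        for j in range(out_w):
            region = layer[i*S_q:i*S_q+P_q, j*S_q:j*S_q+P_q, :]
            out[i, j, :] = region.max(axis=(0, 1))  # one max per activation map
    return out

pooled = max_pool(np.random.randn(28, 28, 64), P_q=2, S_q=2)
print(pooled.shape)  # (14, 14, 64): spatial size halved, depth preserved
```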

Fully Connected Layers

Each feature in the final spatial layer is connected to each hidden state in the first fully connected layer. This layer functions in exactly the same way as a traditional feed-forward network. In most cases, more than one fully connected layer is used to increase the power of the computations towards the end. The connections among these layers are structured exactly like those of a traditional feed-forward network. Since the fully connected layers are densely connected, the vast majority of parameters lie in them. For example, if each of two fully connected layers has 4096 hidden units, then the connections between them have more than 16 million weights. Similarly, the connections from the last spatial layer to the first fully connected layer have a large number of parameters. Even though the convolutional layers have a larger number of activations (and a larger memory footprint), the fully connected layers often have a larger number of connections (and parameters).
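A quick back-of-the-envelope computation illustrates this imbalance. The 7×7×512 size of the final spatial layer below is a hypothetical choice, not one prescribed by the text:

```python
# Parameter counts for the fully connected layers (biases ignored for brevity).
last_spatial = 7 * 7 * 512  # features in a hypothetical final spatial layer
fc1 = 4096                  # hidden units in the first fully connected layer
fc2 = 4096                  # hidden units in the second fully connected layer

print(last_spatial * fc1)   # 102760448 weights into the first FC layer
print(fc1 * fc2)            # 16777216 weights between the two FC layers
```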

Local Response Normalization

A trick introduced in early convolutional networks is local response normalization, which is always used immediately after the ReLU layer. The use of this trick aids generalization. The basic idea of this normalization approach is inspired by biological principles, and it is intended to create competition among different filters. First, we describe the normalization formula using all filters, and then we describe how it is actually computed using only a subset of filters. Consider a situation in which a layer contains $N$ filters, and the activation values of these $N$ filters at a particular spatial position $(x, y)$ are given by $a_1 \ldots a_N$. Then, each $a_i$ is converted into a normalized value $b_i$ using the following formula: $b_i = \frac{a_i}{\left(k + \alpha \sum_j a_j^2\right)^{\beta}}$ The values of the underlying parameters used in the original paper are $k = 2$, $\alpha = 10^{-4}$, and $\beta = 0.75$. However, in practice, one does not normalize over all $N$ filters. Rather, the filters are ordered arbitrarily up front, and each value is normalized over only a small window of adjacent filters (the original paper used a window of size 5).
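The following sketch shows how this windowed normalization might be computed at a single spatial position. The truncation of windows at the filter boundaries is an implementation choice of this sketch, not a detail from the original paper:

```python
import numpy as np

def local_response_norm(a, k=2.0, alpha=1e-4, beta=0.75, n=5):
    """A minimal sketch of local response normalization at one spatial position.

    `a` holds the activations of the N filters at position (x, y); each a_i is
    divided by a term computed over a window of n adjacent filters. The window
    size n = 5 follows the original paper; edge windows are simply truncated.
    """
    N = len(a)
    b = np.empty(N)
    for i in range(N):
        lo = max(0, i - n // 2)
        hi = min(N, i + n // 2 + 1)
        b[i] = a[i] / (k + alpha * np.sum(a[lo:hi] ** 2)) ** beta
    return b

print(local_response_norm(np.random.randn(64)).shape)  # (64,)
```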

Hierarchical Feature Engineering

It is instructive to examine the activations of the filters created by real-world images in different layers. The activations of the filters in the early layers are low-level features like edges, whereas those in later layers put together these low-level features. For example, a mid-level feature might put together edges to create a hexagon, whereas a higher-level feature might put together the mid-level hexagons to create a honeycomb. It is fairly easy to see why a low-level filter might detect edges. Consider a situation in which the color of the image changes along an edge. As a result, the difference between neighboring pixel values will be non-zero only across the edge, and this difference can be captured by choosing the appropriate weights in the corresponding low-level filter. Note that the filter to detect a horizontal edge will not be the same as that to detect a vertical edge. This brings us back to Hubel and Wiesel's experiments, in which different neurons in the cat's visual cortex responded to edges of different orientations.
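To make this concrete, the following sketch constructs a hand-crafted horizontal-edge filter (whose weights take differences between neighboring pixel rows), its transposed vertical-edge counterpart, and applies both to a toy two-tone image; the filter values and image are illustrative, not taken from the text:

```python
import numpy as np

# Hand-constructed filters whose weights take differences between neighboring
# pixels: responses are non-zero only where the image changes across an edge.
horizontal_edge = np.array([[-1.0, -1.0, -1.0],
                            [ 0.0,  0.0,  0.0],
                            [ 1.0,  1.0,  1.0]])
vertical_edge = horizontal_edge.T  # the transpose detects the other orientation

def convolve2d(img, filt):
    """Plain stride-1 convolution (no padding) of a grayscale image."""
    F = filt.shape[0]
    H, W = img.shape
    out = np.empty((H - F + 1, W - F + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+F, j:j+F] * filt)
    return out

# A toy image that is dark on top and bright below: only the horizontal
# filter responds strongly at the boundary row.
img = np.vstack([np.zeros((4, 8)), np.ones((4, 8))])
print(np.abs(convolve2d(img, horizontal_edge)).max())  # 3.0 at the edge
print(np.abs(convolve2d(img, vertical_edge)).max())    # 0.0 everywhere
```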