Showing posts from March, 2022

The Basic Structure of a Convolutional Network

In convolutional neural networks, the states in each layer are arranged according to a spatial grid structure. These spatial relationships are inherited from one layer to the next because each feature value is based on a small local spatial region in the previous layer. It is important to maintain these spatial relationships among the grid cells, because the convolution operation and the transformation to the next layer are critically dependent on them. Each layer in the convolutional network is a 3-dimensional grid structure, which has a height, a width, and a depth. The depth of a layer in a convolutional neural network should not be confused with the depth of the network itself. The word “depth” (when used in the context of a single layer) refers to the number of channels in that layer, such as the number of primary color channels (e.g., blue, green, and red) in the input image or the number of feature maps in a hidden layer. The use of the word “depth” to refer to b
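
The grid structure described above can be pictured with a small sketch. This is only an illustration under stated assumptions: the layer is stored as a nested Python list indexed as `layer[row][col][channel]`, the 4 x 5 x 3 size is arbitrary, and the helper names `make_layer`, `layer_shape`, and `local_region` are hypothetical, not taken from any library.

```python
# Assumed layout: layer[row][col][channel], i.e. height x width x depth.
# The 4 x 5 x 3 size is an arbitrary choice for illustration.
HEIGHT, WIDTH, DEPTH = 4, 5, 3


def make_layer(height, width, depth, fill=0.0):
    """Build a height x width x depth grid filled with a constant value."""
    return [[[fill for _ in range(depth)] for _ in range(width)]
            for _ in range(height)]


def layer_shape(layer):
    """Return (height, width, depth) of a nested-list layer."""
    return (len(layer), len(layer[0]), len(layer[0][0]))


def local_region(layer, top, left, size):
    """Extract the size x size spatial patch at (top, left), keeping all channels."""
    return [row[left:left + size] for row in layer[top:top + size]]


image = make_layer(HEIGHT, WIDTH, DEPTH)
patch = local_region(image, top=1, left=2, size=2)

print(layer_shape(image))  # (4, 5, 3)
print(layer_shape(patch))  # (2, 2, 3)
```

The patch is small along height and width but keeps the full depth of the layer, which is one way to read the statement that a feature value depends on a small local spatial region of the previous layer.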

Strides

There are other ways in which convolution can reduce the spatial footprint of the image (or hidden layer). The above approach performs the convolution at every spatial position of the feature map. However, it is not necessary to perform the convolution at every spatial position in the layer. One can reduce the level of granularity of the convolution by using the notion of strides. The description above corresponds to the case when a stride of 1 is used. When a stride of $S_q$ is used in the $q$th layer, the convolution is performed at the locations $1, S_q + 1, 2S_q + 1$, and so on along both spatial dimensions of the layer. The spatial size of the output on performing this convolution has a height of $(L_q - F_q)/S_q + 1$ and a width of $(B_q - F_q)/S_q + 1$. As a result, the use of strides will result in a reduction of each spatial dimension of the layer by a factor of approximately $S_q$ and the area by $S_q^2$, although the actual factor may vary becaus
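
The size formula can be checked with a short sketch. It assumes that $F_q$ is the spatial size of a square filter (this excerpt does not define it) and that $(L_q - F_q)$ is exactly divisible by $S_q$. The function names `strided_output_size` and `strided_positions` are hypothetical, and raising an error in the non-divisible case is a choice made here for illustration, not something the text prescribes.

```python
def strided_output_size(length, filter_size, stride):
    """Output length along one spatial dimension: (length - filter_size) / stride + 1."""
    span = length - filter_size
    if span < 0 or stride < 1:
        raise ValueError("filter must fit in the layer and stride must be >= 1")
    if span % stride != 0:
        # Assumption of this sketch: only the exactly divisible case is handled.
        raise ValueError("(length - filter_size) is not divisible by stride")
    return span // stride + 1


def strided_positions(length, filter_size, stride):
    """1-indexed start positions 1, S + 1, 2S + 1, ... at which the filter still fits."""
    return list(range(1, length - filter_size + 2, stride))


# Example values chosen for illustration: height 7, width 9, filter size 3.
L_q, B_q, F_q = 7, 9, 3

for S_q in (1, 2):
    height = strided_output_size(L_q, F_q, S_q)
    width = strided_output_size(B_q, F_q, S_q)
    print(S_q, strided_positions(L_q, F_q, S_q), (height, width))
```

With these values, a stride of 1 gives a 5 x 7 output and a stride of 2 gives a 3 x 4 output, so each dimension shrinks by a factor of roughly 2 rather than exactly 2.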