Padding
One observation is that the convolution operation reduces the size of the $(q + 1)$th layer in comparison with the size of the $q$th layer. This type of reduction in size is not desirable in general, because it tends to lose some information along the borders of the image (or of the feature map, in the case of hidden layers). This problem can be resolved by using padding. In padding, one adds $(F_q - 1)/2$ “pixels” all around the borders of the feature map in order to maintain the spatial footprint. Note that these “pixels” are really feature values when hidden layers are padded. Each of these padded values is set to 0, irrespective of whether the input or the hidden layers are being padded. As a result, the spatial height and width of the input volume both increase by $(F_q - 1)$, which is exactly the amount by which they shrink (in the output volume) after the convolution is performed. The padded portions do not contribute to the final dot product because their values are set to 0. In a sense, padding allows the convolution operation to be performed with a portion of the filter “sticking out” from the borders of the layer, while the dot product is computed only over the portion of the layer where the values are defined. This type of padding is referred to as half-padding because (almost) half the filter sticks out from the spatial input when the filter is placed at its extreme spatial position along the edges. Half-padding is designed to maintain the spatial footprint exactly.
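To make this concrete, the following minimal NumPy sketch uses a hypothetical 7 × 7 feature map and a 3 × 3 filter, so $(3 - 1)/2 = 1$ zero is padded on each border. The sliding dot product over the padded map then produces an output with the same spatial footprint as the input.

```python
import numpy as np

# Hypothetical 7x7 feature map and 3x3 filter for illustration.
feature_map = np.random.rand(7, 7)
kernel = np.random.rand(3, 3)
F = kernel.shape[0]
P = (F - 1) // 2                          # half-padding amount

padded = np.pad(feature_map, P)           # zero-pad by P on every border
out_h = padded.shape[0] - F + 1           # output height of the sliding window
out_w = padded.shape[1] - F + 1
output = np.empty((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        # Dot product of the filter with each receptive field of the padded map
        output[i, j] = np.sum(padded[i:i + F, j:j + F] * kernel)

print(feature_map.shape, output.shape)    # (7, 7) (7, 7): footprint preserved
```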
When padding is not used, the resulting “padding” is also referred to as valid padding. Valid padding generally does not work well from an experimental point of view. Using half-padding ensures that some of the critical information at the borders of the layer is represented in a standalone way. In the case of valid padding, the contributions of the pixels on the borders of the layer will be under-represented compared to the central pixels in the next hidden layer, which is undesirable. Furthermore, this under-representation will be compounded over multiple layers. Therefore, padding is typically performed in all layers, and not just in the first layer where the spatial locations correspond to input values. Consider a situation in which the layer has size 32 × 32 × 3 and the filter is of size 5 × 5 × 3. Therefore, $(5 - 1)/2 = 2$ zeros are padded on all sides of the image. As a result, the 32 × 32 spatial footprint first increases to 36 × 36 because of padding, and then it reduces back to 32 × 32 after performing the convolution. An example of the padding of a single feature map is shown in the figure below, where two zeros are padded on all sides of the image (or feature map). This is similar to the situation discussed above (in terms of the addition of two zeros), except that the spatial dimensions of the image are much smaller than 32 × 32 in order to enable illustration in a reasonable amount of space.
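The 32 × 32 × 3 example can be checked with a short sketch. Assuming PyTorch is available, `nn.Conv2d` with `padding=0` (valid padding) and `padding=2` (half-padding) yields the 28 × 28 and 32 × 32 footprints described above; the single output channel is an arbitrary choice for illustration.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)                            # one 32x32 input with 3 channels

valid_conv = nn.Conv2d(3, 1, kernel_size=5, padding=0)   # valid padding: no zeros added
half_conv = nn.Conv2d(3, 1, kernel_size=5, padding=2)    # half-padding: (5 - 1)/2 = 2 zeros

print(valid_conv(x).shape)   # torch.Size([1, 1, 28, 28]) -- footprint shrinks by F - 1 = 4
print(half_conv(x).shape)    # torch.Size([1, 1, 32, 32]) -- footprint preserved
```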
Another useful form of padding is full-padding. In full-padding, we allow (almost) the full filter to stick out from various sides of the input. In other words, a portion of the filter of size $F_q - 1$ is allowed to stick out from any side of the input while overlapping the input in only one spatial position. For example, the kernel and the input image might overlap at a single pixel at an extreme corner. Therefore, the input is padded with $(F_q - 1)$ zeros on each side. In other words, each spatial dimension of the input increases by $2(F_q - 1)$. Therefore, if the input dimensions in the original image are $L_q$ and $B_q$, the padded spatial dimensions in the input volume become $L_q + 2(F_q - 1)$ and $B_q + 2(F_q - 1)$. After performing the convolution, the feature-map dimensions in layer $(q+1)$ become $L_q + F_q - 1$ and $B_q + F_q - 1$, respectively. While convolution normally reduces the spatial footprint, full-padding increases it. Interestingly, full-padding increases each spatial dimension by the same amount $(F_q - 1)$ by which no-padding decreases it. This relationship is not a coincidence, because a “reverse” convolution operation can be implemented by applying another convolution on the fully padded output (of the original convolution) with an appropriately defined kernel of the same size. This type of “reverse” convolution occurs frequently in the backpropagation and autoencoder algorithms for convolutional neural networks. Fully padded inputs are useful because they increase the spatial footprint, which is required in several types of convolutional autoencoders.
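The shape relationship between valid and full padding can be illustrated with SciPy's `correlate2d`, whose `mode` argument directly corresponds to the two cases; the 10 × 10 feature map and 3 × 3 kernel here are hypothetical choices for illustration.

```python
import numpy as np
from scipy.signal import correlate2d

x = np.random.rand(10, 10)                # hypothetical 10x10 feature map
k = np.random.rand(3, 3)                  # 3x3 kernel, so F = 3

valid = correlate2d(x, k, mode='valid')   # no padding: 10 - 3 + 1 = 8
full = correlate2d(x, k, mode='full')     # full padding: 10 + 3 - 1 = 12

print(valid.shape, full.shape)            # (8, 8) (12, 12)

# A fully padded convolution applied to the valid output restores the original
# 10x10 spatial footprint -- the shape relationship exploited by backpropagation
# and convolutional autoencoders.
restored = correlate2d(valid, k, mode='full')
print(restored.shape)                     # (10, 10)
```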