Local Response Normalization
A trick introduced in convolutional neural networks is local response normalization, which is always used immediately after the ReLU layer. The use of this trick aids generalization. The basic idea of this normalization approach is inspired by biological principles, and it is intended to create competition among different filters. First, we describe the normalization formula using all filters, and then we describe how it is actually computed using only a subset of filters. Consider a situation in which a layer contains $N$ filters, and the activation values of these $N$ filters at a particular spatial position $(x, y)$ are given by $a_1 \ldots a_N$. Then, each $a_i$ is converted into a normalized value $b_i$ using the following formula:
$b_i=\frac{a_i}{\left(k+\alpha \sum_{j=1}^{N} a_j^2\right)^\beta}$
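As a concrete illustration, the short NumPy sketch below applies this full-filter normalization to the vector of activations at a single spatial position. The function name and array shapes are illustrative, not taken from any particular library; the parameter values follow the text.

```python
import numpy as np

# Minimal sketch of the full-filter version: the activations of all N filters
# at one spatial position are divided by a term built from the sum of squares
# of all N activations.  Defaults follow the text (k = 2, alpha = 1e-4, beta = 0.75).
def lrn_all_filters(a, k=2.0, alpha=1e-4, beta=0.75):
    a = np.asarray(a, dtype=float)                 # activations a_1 ... a_N at one (x, y)
    denom = (k + alpha * np.sum(a ** 2)) ** beta   # shared denominator for all filters
    return a / denom                               # normalized values b_1 ... b_N
```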
The values of the underlying parameters used in the original paper are $k = 2$, $\alpha = 10^{-4}$, and $\beta = 0.75$. However, in practice, one does not normalize over all $N$ filters. Rather, the filters are ordered arbitrarily up front to define “adjacency” among filters. Then, the normalization is performed over each set of $n$ “adjacent” filters for some parameter $n$. The value of $n$ used in the original paper is 5. Therefore, we have the following formula:
$b_i=\frac{a_i}{\left(k+\alpha \sum_{j=\lfloor i-n/2 \rfloor}^{\lfloor i+n/2 \rfloor} a_j^2\right)^\beta}$
In the above formula, any value of $i - n/2$ that is less than 1 is set to 1, and any value of $i + n/2$ that is greater than $N$ is set to $N$, so that the summation window is clipped to the valid range of filter indices.
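A minimal NumPy sketch of this windowed computation is given below. It uses 0-based indexing, so the window is clipped with `max(0, ...)` and `min(N, ...)` rather than with 1 and $N$ as in the text; the function name is illustrative.

```python
import numpy as np

# Sketch of the windowed version: each filter i is normalized using only the
# n "adjacent" filters in the arbitrary but fixed filter ordering, with the
# window clipped to the valid index range.
def lrn_adjacent_filters(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    a = np.asarray(a, dtype=float)      # activations a_1 ... a_N at one (x, y)
    N = a.shape[0]
    b = np.empty_like(a)
    for i in range(N):
        lo = max(0, i - n // 2)         # clip window at the first filter
        hi = min(N, i + n // 2 + 1)     # clip window at the last filter (exclusive bound)
        denom = (k + alpha * np.sum(a[lo:hi] ** 2)) ** beta
        b[i] = a[i] / denom
    return b

# Example: normalize the activations of 8 filters at one spatial position.
b = lrn_adjacent_filters(np.array([0.5, 1.2, 0.0, 3.1, 0.7, 2.4, 0.9, 1.1]))
```

Setting $n = N$ in this sketch recovers the full-filter formula given earlier.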