Differentiation of the Sigmoid activation and cross-entropy loss function
A step-by-step differentiation of the Sigmoid activation and the cross-entropy loss function is discussed here.
Understanding the derivatives of these two functions is essential in machine learning, since both are needed when performing back-propagation during model training.
Derivative of Sigmoid Function
The Sigmoid/Logistic function is defined as:
$g(x)=\frac{1}{1+e^{-x}} \in (0,1)$
$g(-5.5)=0.0041$
$g(6.5)=0.9985$
$g(0.4)=0.5987$
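Working out the derivative from the definition above (a standard application of the chain rule):
$g'(x)=\frac{d}{dx}\left(1+e^{-x}\right)^{-1}=\frac{e^{-x}}{\left(1+e^{-x}\right)^{2}}=\frac{1}{1+e^{-x}}\cdot\frac{e^{-x}}{1+e^{-x}}=g(x)\left(1-g(x)\right)$
so the derivative can be written entirely in terms of the sigmoid output itself.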
Why do we use this version of the derivative?
In the forward propagation step, you compute the sigmoid function $g(x)$ and have its value handy. While computing the derivative in the backpropagation step, all you have to do is plug the value of $g(x)$ into the formula derived above.
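A minimal NumPy sketch of this pattern, assuming the forward pass caches the activation (the function names here are illustrative, not from any particular library):

```python
import numpy as np

def sigmoid(x):
    """Forward pass: g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(g_x):
    """Backward pass: reuse the cached activation g(x); no extra exp() is needed."""
    return g_x * (1.0 - g_x)

x = np.array([-5.5, 6.5, 0.4])
g = sigmoid(x)            # computed (and cached) during forward propagation
dg_dx = sigmoid_grad(g)   # g(x) * (1 - g(x)), evaluated from the cached value
print(g)                  # ~[0.0041, 0.9985, 0.5987]
print(dg_dx)
```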
Derivative of Cross-Entropy Function
The Cross-Entropy loss function is a very important cost function used for classification problems. The concept of cross-entropy traces back to the field of Information Theory, where Claude Shannon introduced the concept of entropy in 1948. Before diving into the cross-entropy cost function, let us first introduce entropy.
Entropy
The entropy of a random variable $X$ is the level of uncertainty inherent in the variable's possible outcomes.
For a random variable $X$ with probability distribution $p(x)$, entropy is defined as
$H(X) = -\sum_{x} p(x) \ln p(x)$
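For example, a fair coin with two equally likely outcomes has entropy $H=-\left(0.5\ln 0.5+0.5\ln 0.5\right)=\ln 2\approx 0.693$ nats, while a variable whose outcome is certain has entropy $0$.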
The Cross-Entropy loss function is also called logarithmic loss, log loss or logistic loss. Each predicted class probability is compared to the actual class's desired output (0 or 1), and a score/loss is calculated that penalizes the probability based on how far it is from the actual expected value. The penalty is logarithmic in nature: predictions far from the true label yield a large loss, while predictions close to it yield a loss that tends to 0.
Cross-entropy loss is used when adjusting model weights during training. The aim is to minimize the loss, i.e., the smaller the loss, the better the model. A perfect model has a cross-entropy loss of 0.
Cross-entropy is defined as
$E_g= - \sum_{i=1}^{n} t_i \ln (p_i)$
where $t_i$ is the truth value and $p_i$ is the probability of the $i$ᵗʰ class. For binary classification, this reduces to the binary cross-entropy loss
$E_b=-\left[\, t \ln(\hat{y}) + (1-t) \ln (1-\hat{y}) \,\right]$
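A small NumPy sketch of both forms, with clipping added for numerical safety (the function names are illustrative):

```python
import numpy as np

def cross_entropy(t, p, eps=1e-12):
    """Multiclass form: E_g = -sum_i t_i * ln(p_i)."""
    p = np.clip(p, eps, 1.0)
    return -np.sum(t * np.log(p))

def binary_cross_entropy(t, y_hat, eps=1e-12):
    """Binary form: E_b = -[t*ln(y_hat) + (1 - t)*ln(1 - y_hat)]."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -(t * np.log(y_hat) + (1.0 - t) * np.log(1.0 - y_hat))

t = np.array([0.0, 1.0, 0.0])          # one-hot truth vector
p = np.array([0.1, 0.7, 0.2])          # predicted class probabilities
print(cross_entropy(t, p))             # -ln(0.7), roughly 0.357

print(binary_cross_entropy(1.0, 0.9))  # confident and correct: small loss (~0.105)
print(binary_cross_entropy(1.0, 0.1))  # confident and wrong: large loss (~2.303)
```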
The truth label, $t$, in the binary loss is a known value, whereas $\hat{y}$ is a variable. This means the function will be differentiated with respect to $\hat{y}$, treating $t$ as a constant. Let's go ahead and work out the derivative now.
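In outline, differentiating each term of $E_b$ with respect to $\hat{y}$ while holding $t$ constant gives
$\frac{\partial}{\partial \hat{y}}\left[t\ln(\hat{y})\right]=\frac{t}{\hat{y}}, \qquad \frac{\partial}{\partial \hat{y}}\left[(1-t)\ln(1-\hat{y})\right]=-\frac{1-t}{1-\hat{y}}$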
And therefore, the derivative of the binary cross-entropy loss function becomes
$\frac{\partial E_b}{\partial \hat{y}} = -\frac{t}{\hat{y}} + \frac{1-t}{1-\hat{y}} = \frac{\hat{y}-t}{\hat{y}\left(1-\hat{y}\right)}$
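As a quick sanity check, the closed-form derivative can be compared against a finite-difference approximation (a short verification sketch):

```python
import numpy as np

def binary_cross_entropy(t, y_hat):
    return -(t * np.log(y_hat) + (1.0 - t) * np.log(1.0 - y_hat))

def bce_grad(t, y_hat):
    """Closed-form derivative: (y_hat - t) / (y_hat * (1 - y_hat))."""
    return (y_hat - t) / (y_hat * (1.0 - y_hat))

t, y_hat, h = 1.0, 0.3, 1e-6
numeric = (binary_cross_entropy(t, y_hat + h) - binary_cross_entropy(t, y_hat - h)) / (2 * h)
print(numeric, bce_grad(t, y_hat))  # both approximately -3.3333
```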