
Showing posts from August, 2022

Training Deep Models

In the early years, methods for training multilayer networks were not known. In their influential book, Minsky and Papert strongly argued against the prospects of neural networks because of the inability to train multilayer networks. Therefore, neural networks stayed out of favor as a general area of research until the eighties. The first significant breakthrough in this respect was proposed by Rumelhart et al. in the form of the backpropagation algorithm. The proposal of this algorithm rekindled interest in neural networks. However, several computational, stability, and overfitting challenges were found in the use of this algorithm. As a result, research in the field of neural networks again fell from favor. At the turn of the century, several advances again brought popularity to neural networks. Not all of these advances were algorithm-centric. For example, increased data availability and computational power have played the primary role in this resurrection. …

Introduction to neural networks

Artificial neural networks are popular machine learning techniques that simulate the mechanism of learning in biological organisms. The human nervous system contains cells, which are referred to as neurons. The neurons are connected to one another with the use of axons and dendrites, and the connecting regions between axons and dendrites are referred to as synapses. The strengths of synaptic connections often change in response to external stimuli; this change is how learning takes place in living organisms. This biological mechanism is simulated in artificial neural networks, which contain computation units that are also referred to as neurons. The computational units are connected to one another through weights, which serve the same role as the strengths of synaptic connections in biological organisms. Each input to a neuron is scaled with a weight, which affects the function computed at that unit. The following figure shows both the biological and artificial neural networks. …
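As a minimal sketch of the computation just described, consider a single artificial neuron in plain Python. The sigmoid activation and the example weights below are illustrative assumptions, not taken from the post: each input is scaled by its weight, the results are summed together with a bias, and an activation function is applied.

    import math

    def sigmoid(z):
        """Squash a real-valued signal into (0, 1)."""
        return 1.0 / (1.0 + math.exp(-z))

    def neuron(inputs, weights, bias):
        """One artificial neuron: weighted sum of inputs plus bias,
        passed through an activation function."""
        z = sum(w * x for w, x in zip(weights, inputs)) + bias
        return sigmoid(z)

    # Three inputs, each scaled by its own (made-up) weight.
    print(neuron([0.5, -1.0, 2.0], weights=[0.4, 0.3, -0.2], bias=0.1))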

Teaching Deep Learners to Generalize

Neural networks are powerful learners that have repeatedly proven to be capable of learning complex functions in many domains. However, the great power of neural networks is also their greatest weakness; neural networks often simply overfit the training data if care is not taken to design the learning process carefully. In practical terms, overfitting means that a neural network will provide excellent prediction performance on the training data that it is built on, but will perform poorly on unseen test instances. This happens because the learning process often remembers random artifacts of the training data that do not generalize well to the test data. Extreme forms of overfitting are referred to as memorization. A helpful analogy is to think of a child who can solve all the analytical problems for which he or she has seen the solutions, but is unable to provide useful solutions to a new problem. …
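The gap between training and test performance is easy to reproduce on a toy problem. The sketch below is my own illustration rather than anything from the post; it uses NumPy polynomial fitting as a stand-in for a flexible learner, and shows the training error collapsing while the test error grows as model capacity increases.

    import numpy as np

    rng = np.random.default_rng(0)

    # Ten noisy samples of a simple underlying function.
    x_train = np.linspace(0.0, 1.0, 10)
    y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.2, size=10)

    # A clean held-out set from the same function.
    x_test = np.linspace(0.0, 1.0, 100)
    y_test = np.sin(2 * np.pi * x_test)

    for degree in (3, 9):
        coeffs = np.polyfit(x_train, y_train, degree)
        train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
        test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
        print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")

The degree-9 fit passes almost exactly through the ten training points (near-zero training error) yet strays badly between them: memorization in miniature.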

Differentiation of the Sigmoid activation and cross-entropy loss function

A step-by-step differentiation of the sigmoid activation and cross-entropy loss function is discussed here. Understanding the derivatives of these two functions is essential in machine learning when performing back-propagation during model training.

Derivative of the Sigmoid Function

The sigmoid (logistic) function is defined as $g(x)=\frac{1}{1+e^{-x}} \in (0,1)$. For any value of $x$, the sigmoid function $g(x)$ falls in the range $(0, 1)$. As the value of $x$ decreases, $g(x)$ approaches 0, whereas as $x$ grows bigger, $g(x)$ tends to 1. Examples: $g(-5.5)=0.0040$, $g(6.5)=0.9984$, $g(0.4)=0.5986$. Note that the derivative can be further simplified and written in terms of $g(x)$ itself: $g'(x)=g(x)(1-g(x))$. Why do we use this version of the derivative? In the forward-propagation step, you compute the sigmoid function $g(x)$ and have its value handy. While computing the derivative in the back-propagation step, all you have to do is plug the value of $g(x)$ into the formula above. …
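For reference, the simplification mentioned above takes only a couple of steps; this is the standard derivation, reconstructed here rather than quoted from the post:

$$g'(x)=\frac{d}{dx}\left(1+e^{-x}\right)^{-1}=\frac{e^{-x}}{\left(1+e^{-x}\right)^{2}}=\frac{1}{1+e^{-x}}\cdot\frac{e^{-x}}{1+e^{-x}}=g(x)\,\bigl(1-g(x)\bigr),$$

since $\frac{e^{-x}}{1+e^{-x}}=1-\frac{1}{1+e^{-x}}=1-g(x)$.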

Activation Functions

The activation function defines the output of a neuron/node given an input or a set of inputs (the outputs of multiple neurons). It mimics the stimulation of a biological neuron: the activation function decides whether a neuron should be activated or not by calculating the weighted sum of its inputs and adding a bias to it. The purpose of the activation function is to introduce non-linearity into the output of a neuron. A neural network has neurons that operate according to their weights, biases, and respective activation functions. In a neural network, we update the weights and biases of the neurons on the basis of the error at the output; this process is known as back-propagation. Activation functions make back-propagation possible, since the gradients are supplied along with the error to update the weights and biases. Activation functions can basically be divided into two types: linear activation functions and non-linear activation functions. …
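A small sketch of both families, in plain Python with NumPy. The post's list is truncated here, so treating tanh and ReLU as the non-linear examples is my assumption rather than the post's:

    import numpy as np

    def linear(z):
        """Linear (identity) activation: output is proportional to input."""
        return z

    def sigmoid(z):
        """Non-linear: squashes input into (0, 1)."""
        return 1.0 / (1.0 + np.exp(-z))

    def tanh(z):
        """Non-linear: squashes input into (-1, 1)."""
        return np.tanh(z)

    def relu(z):
        """Non-linear: passes positive inputs, zeroes out negatives."""
        return np.maximum(0.0, z)

    z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    for fn in (linear, sigmoid, tanh, relu):
        print(f"{fn.__name__:8s} {fn(z)}")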