
NEURAL NETWORKS AND DEEP LEARNING CST 395 CS 5TH SEMESTER HONORS COURSE NOTES - Dr Binu V P, 9847390760


Machine Learning Algorithm

A machine learning algorithm is an algorithm that is able to learn from data. But what do we mean by learning? Mitchell (1997) provides the definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." One can imagine a very wide variety of experiences E, tasks T, and performance measures P. The following sections provide intuitive descriptions and examples of the different kinds of tasks, performance measures and experiences that can be used to construct machine learning algorithms.

The Task, T

Machine learning allows us to tackle tasks that are too difficult to solve with fixed programs written and designed by human beings. From a scientific and philosophical point of view, machine learning is interesting because developing our understanding of machine learning entails developing our understanding of the principles that underlie intelligence.
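As a concrete illustration of the (T, E, P) framing, here is a minimal Python sketch for a toy spam-classification task; the data and the hand-written rule below are purely illustrative stand-ins, not part of these notes:

```python
# A minimal sketch of Mitchell's (T, E, P) framing, assuming a toy
# spam-classification task (all data and the rule below are illustrative).

# Experience E: labeled examples, (message length, num_links) -> label.
E = [
    ((120, 0), "ham"),
    ((45, 3), "spam"),
    ((200, 1), "ham"),
    ((30, 5), "spam"),
]

# Task T: classify a message as spam or ham. This is a hand-written
# stand-in "program"; a learning algorithm would instead adapt the
# rule from E so that P improves with experience.
def classify(features):
    length, num_links = features
    return "spam" if num_links >= 2 else "ham"

# Performance measure P: accuracy on the examples.
correct = sum(classify(x) == y for x, y in E)
print(f"P (accuracy) = {correct / len(E):.2f}")
```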

Capacity, Overfitting and Underfitting

The central challenge in machine learning is that we must perform well on new, previously unseen inputs—not just those on which our model was trained. The ability to perform well on previously unobserved inputs is called generalization. Typically, when training a machine learning model, we have access to a training set, we can compute some error measure on the training set called the training error, and we reduce this training error. So far, what we have described is simply an optimization problem. What separates machine learning from optimization is that we want the generalization error, also called the test error, to be low as well. The generalization error is defined as the expected value of the error on a new input. Here the expectation is taken across different possible inputs, drawn from the distribution of inputs we expect the system to encounter in practice. We typically estimate the generalization error of a machine learning model by measuring its performance on a test set of examples collected separately from the training set.
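To make the gap between training error and test error concrete, here is a minimal sketch (assuming NumPy is available; the sine data generator and noise level are illustrative choices) that fits polynomials of increasing degree and reports both errors:

```python
# A minimal sketch contrasting training error with generalization (test)
# error as model capacity grows; data-generating function is illustrative.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)               # true underlying function
x_train = rng.uniform(0, 1, 15)
y_train = f(x_train) + rng.normal(0, 0.2, 15)
x_test = rng.uniform(0, 1, 200)
y_test = f(x_test) + rng.normal(0, 0.2, 200)

for degree in [1, 3, 9]:                          # capacity: polynomial degree
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

Degree 1 typically underfits (both errors high), while degree 9 drives the training error down at the cost of a much larger test error—overfitting.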

Estimators, Bias, Variance and Consistency

The field of statistics gives us many tools that can be used to achieve the machine learning goal of solving a task not only on the training set but also to generalize. Foundational concepts such as parameter estimation, bias and variance are useful to formally characterize notions of generalization, underfitting and overfitting.

Point Estimation

Point estimation is the attempt to provide the single "best" prediction of some quantity of interest. In general the quantity of interest can be a single parameter or a vector of parameters in some parametric model, such as the weights in our linear regression example, but it can also be a whole function. In order to distinguish estimates of parameters from their true value, our convention will be to denote a point estimate of a parameter $\theta$ by $\hat{\theta}$. Let $\{x^{(1)},\ldots, x^{(m)}\}$ be a set of $m$ independent and identically distributed (i.i.d.) data points. A point estimator or statistic is any function of the data: $$\hat{\theta}_m = g(x^{(1)},\ldots, x^{(m)}).$$
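The following short sketch (assuming NumPy; the Gaussian source and its parameters are illustrative) treats the sample mean as the point estimator $g$ and estimates its bias and variance empirically by repeated sampling:

```python
# A minimal sketch of a point estimator and its bias/variance, assuming a
# Gaussian data source with known true mean (all values are illustrative).
import numpy as np

rng = np.random.default_rng(1)
true_theta, sigma, m, trials = 5.0, 2.0, 20, 10_000

# Estimator g: the sample mean of m i.i.d. points, theta_hat = g(x1, ..., xm).
estimates = np.array(
    [rng.normal(true_theta, sigma, m).mean() for _ in range(trials)]
)

bias = estimates.mean() - true_theta     # bias = E[theta_hat] - theta
variance = estimates.var()               # Var(theta_hat) ~ sigma^2 / m here
print(f"empirical bias {bias:+.4f}, variance {variance:.4f} "
      f"(theory: {sigma**2 / m:.4f})")
```

For the sample mean the empirical bias comes out near zero (it is an unbiased estimator) and the variance close to $\sigma^2/m$, shrinking as $m$ grows—which is the intuition behind consistency.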

Syllabus CST 395 Neural Network and Deep Learning

Syllabus

Module - 1 (Basics of Machine Learning)
Machine Learning basics - Learning algorithms - Supervised, Unsupervised, Reinforcement, Overfitting, Underfitting, Hyperparameters and Validation sets, Estimators - Bias and Variance. Challenges in machine learning. Simple Linear Regression, Logistic Regression, Performance measures - Confusion matrix, Accuracy, Precision, Recall, Sensitivity, Specificity, Receiver Operating Characteristic curve (ROC), Area Under Curve (AUC).

Module - 2 (Neural Networks)
Introduction to neural networks - Single layer perceptrons, Multi Layer Perceptrons (MLPs), Representation Power of MLPs, Activation functions - Sigmoid, Tanh, ReLU, Softmax. Risk minimization, Loss function, Training MLPs with backpropagation, Practical issues in neural network training - The Problem of Overfitting, Vanishing and exploding gradient problems, Difficulties in convergence, Local and spurious Optima, Computational Challenges. Applications of neural networks.

Module 3 (Deep …

Hyperparameters and Validation Sets

Most machine learning algorithms have several settings that we can use to control the behavior of the learning algorithm. These settings are called hyperparameters. The values of hyperparameters are not adapted by the learning algorithm itself (though we can design a nested learning procedure where one learning algorithm learns the best hyperparameters for another learning algorithm). In the polynomial regression example we saw earlier, there is a single hyperparameter: the degree of the polynomial, which acts as a capacity hyperparameter. The $\lambda$ value used to control the strength of weight decay is another example of a hyperparameter. Sometimes a setting is chosen to be a hyperparameter that the learning algorithm does not learn because it is difficult to optimize. More frequently, the setting must be a hyperparameter because it is not appropriate to learn that hyperparameter on the training set. This applies to all hyperparameters that control model capacity. If learned on the training set, such hyperparameters would always choose the maximum possible model capacity, resulting in overfitting.
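The sketch below (again assuming NumPy, with illustrative split sizes and candidate degrees) selects the capacity hyperparameter—the polynomial degree—on a held-out validation set rather than on the training set:

```python
# A minimal sketch of hyperparameter selection on a held-out validation set;
# the split sizes and candidate degrees are illustrative choices.
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: np.sin(2 * np.pi * x)
x = rng.uniform(0, 1, 60)
y = f(x) + rng.normal(0, 0.2, 60)

# Hold out part of the training data as a validation set (never the test set).
x_tr, y_tr = x[:40], y[:40]
x_val, y_val = x[40:], y[40:]

best_degree, best_err = None, float("inf")
for degree in range(1, 10):                  # candidate capacity hyperparameters
    coeffs = np.polyfit(x_tr, y_tr, degree)
    val_err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    if val_err < best_err:
        best_degree, best_err = degree, val_err
print(f"selected degree {best_degree} with validation MSE {best_err:.3f}")
```

Had the degree been chosen by training error instead, the loop would always pick the largest candidate degree, which is exactly the overfitting failure described above.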

Regularization

The no free lunch theorem implies that we must design our machine learning algorithms to perform well on a specific task. We do so by building a set of preferences into the learning algorithm. When these preferences are aligned with the learning problems we ask the algorithm to solve, it performs better. So far, the only method of modifying a learning algorithm that we have discussed concretely is to increase or decrease the model's representational capacity by adding or removing functions from the hypothesis space of solutions the learning algorithm is able to choose. We gave the specific example of increasing or decreasing the degree of a polynomial for a regression problem. The behavior of our algorithm is strongly affected not just by how large we make the set of functions allowed in its hypothesis space, but by the specific identity of those functions. The learning algorithm we have studied so far, linear regression, has a hypothesis space consisting of the set of linear functions of its input.
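As one illustration of building such a preference into the algorithm—here, a preference for smaller weights—the following sketch adds weight decay to high-degree polynomial regression via the closed-form ridge solution; the $\lambda$ values and data generator are illustrative assumptions, not from the notes:

```python
# A minimal sketch of weight decay (L2 regularization) on polynomial
# regression using the closed-form ridge solution; lambdas are illustrative.
import numpy as np

rng = np.random.default_rng(3)
f = lambda x: np.sin(2 * np.pi * x)
x = rng.uniform(0, 1, 15)
y = f(x) + rng.normal(0, 0.2, 15)

degree = 9
X = np.vander(x, degree + 1)                 # high-capacity design matrix

for lam in [0.0, 1e-3, 1.0]:
    # Minimize ||Xw - y||^2 + lam * ||w||^2  =>  w = (X^T X + lam I)^-1 X^T y
    w = np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)
    train_mse = np.mean((X @ w - y) ** 2)
    print(f"lambda={lam:g}: train MSE {train_mse:.4f}, "
          f"||w|| = {np.linalg.norm(w):.1f}")
```

Increasing $\lambda$ shrinks the weight norm and raises training error slightly, trading some capacity for a preference toward smoother solutions.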