Hyperparameters and Validation Sets
Most machine learning algorithms have several settings that we can use to control the behavior of the learning algorithm. These settings are called hyperparameters. The values of hyperparameters are not adapted by the learning algorithm itself (though we can design a nested learning procedure where one learning algorithm learns the best hyperparameters for another learning algorithm). In the polynomial regression example we saw earlier, there is a single hyperparameter: the degree of the polynomial, which acts as a capacity hyperparameter. The $\lambda$ value used to control the strength of weight decay is another example of a hyperparameter. Sometimes a setting is chosen to be a hyperparameter that the learning algorithm does not learn because it is difficult to optimize. More frequently, the setting must be a hyperparameter because it is not appropriate to learn that hyperparameter on the training set. This applies to all hyperparameters that control model capacity....
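To make the point about capacity hyperparameters concrete, the following is a minimal sketch, not a definitive recipe. It fits polynomials of increasing degree to synthetic data and compares two ways of choosing the degree: by training error and by error on a held-out validation set. The quadratic ground truth, the noise level, the sample sizes, and the helper names `make_data` and `fit_and_score` are all assumptions introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_data(n):
    # Hypothetical data: a quadratic plus Gaussian noise (an assumption,
    # chosen only so that a low degree is the "right" capacity).
    x = rng.uniform(-1.0, 1.0, n)
    y = x ** 2 + rng.normal(0.0, 0.1, n)
    return x, y


def fit_and_score(x_fit, y_fit, x_eval, y_eval, degree):
    # Least-squares polynomial fit on (x_fit, y_fit),
    # then mean squared error on (x_eval, y_eval).
    coeffs = np.polyfit(x_fit, y_fit, degree)
    pred = np.polyval(coeffs, x_eval)
    return float(np.mean((pred - y_eval) ** 2))


x_train, y_train = make_data(20)
x_val, y_val = make_data(20)  # held out; never used for fitting

degrees = list(range(1, 10))  # candidate values of the capacity hyperparameter
train_mse = [fit_and_score(x_train, y_train, x_train, y_train, d) for d in degrees]
val_mse = [fit_and_score(x_train, y_train, x_val, y_val, d) for d in degrees]

# Choosing the degree by training error favors the highest capacity,
# because each higher-degree model contains the lower-degree ones.
degree_by_train = degrees[int(np.argmin(train_mse))]

# Choosing by validation error uses data the fit has not seen.
degree_by_val = degrees[int(np.argmin(val_mse))]

print("degree chosen by training error:  ", degree_by_train)
print("degree chosen by validation error:", degree_by_val)
```

Because the candidate models are nested, the least-squares training error cannot increase as the degree grows (up to floating-point error), so selecting the degree on the training set would be expected to pick the largest degree tried. The validation error carries no such guarantee, which is what makes it usable for choosing capacity.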