
Which of the two methods is more effective at avoiding overfitting: regularization, or searching the model's hyperparameters through cross-validation? Is one of these methods preferable for large or small data sets? Is there any experimental evidence concerning this question?

Thanks.


The two methods are generally used together, since they serve different purposes:

  • cross-validation makes it possible to measure how well a parameterized learning algorithm generalizes to data unseen at training time

  • regularization is a parameter of a learning algorithm that trades off two types of potential generalization error: bias vs. variance. Highly biased models cannot fit the training data as well (fewer degrees of freedom), but on the other hand they learn simpler models and are therefore less likely to over-fit the training data, hence potentially less variance (see the sketch after this list).
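
A minimal sketch of both points, assuming scikit-learn and synthetic data (neither is named in the thread): the weakly regularized model fits the training set almost perfectly but generalizes worse under cross-validation, while the strongly regularized one shows the opposite pattern.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data, chosen for illustration only.
X, y = make_classification(n_samples=200, n_features=50, random_state=0)

# In scikit-learn, smaller C means stronger L2 regularization
# (more bias, less variance).
for C in (100.0, 0.01):
    model = LogisticRegression(C=C, max_iter=1000)
    train_acc = model.fit(X, y).score(X, y)       # fit on everything
    cv_acc = cross_val_score(model, X, y, cv=5).mean()  # held-out estimate
    print(f"C={C}: train acc={train_acc:.2f}, 5-fold CV acc={cv_acc:.2f}")
```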

It is possible to combine the two by doing a cross-validated grid search for the optimal value of the regularization parameter (e.g. C in SVMs), as sketched below.
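
One hedged way to write this with scikit-learn (an assumption; the answer names no library): GridSearchCV scores each candidate C by 5-fold cross-validation, then refits on the full training set with the best value.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic data, for illustration only.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Each candidate C is evaluated by 5-fold cross-validation.
search = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.01, 0.1, 1, 10, 100]},
    cv=5,
)
search.fit(X, y)
print("best C:", search.best_params_["C"],
      "CV accuracy:", round(search.best_score_, 3))
```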

Edit: arguably, if the algorithm is able to scale to large datasets (linear training time) and such data is cheaply available (generally not true for supervised learning), then regularization is less important, as the redundancy in the training set will generally act as a natural regularizer that prevents over-fitting. It is still interesting to do cross-validation (perhaps online cross-validation, to keep it scalable) so as to measure the remaining amount of over-fitting.
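
One possible reading of "online cross-validation" is progressive (prequential) validation: score each incoming batch before training on it, so evaluation scales with the stream. This sketch assumes scikit-learn's SGDClassifier and synthetic data, neither of which the answer specifies.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic "stream" of data, for illustration only.
X, y = make_classification(n_samples=10_000, n_features=30, random_state=0)
model = SGDClassifier(alpha=1e-4, random_state=0)  # alpha = regularization strength

classes = np.unique(y)
batch = 2_000
for start in range(0, len(X), batch):
    Xb, yb = X[start:start + batch], y[start:start + batch]
    if start > 0:
        # Score each batch *before* training on it: a running, scalable
        # estimate of generalization on genuinely unseen data.
        print(f"batch {start // batch}: held-out accuracy = {model.score(Xb, yb):.2f}")
    model.partial_fit(Xb, yb, classes=classes)
```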

