To determine the fit of a model and to be able to compare models, you need to calculate some measure of fit. A common measure, besides Mean Absolute Deviation (MAD), is the root mean squared error (RMSE). Note that in regression models, the error is often called a residual. The residuals measure how far off each prediction is from the observed value, and the predicted values are referred to as fitted values. So, to get the RMSE we compute the average of the squared residuals and then take the square root. We square the residuals so that the sign does not matter. In R, after you fit a model m you can get access to the fitted values with m$fitted.values and the residuals with m$residuals. The magnitude of the typical residual gives us a measure of how far off a typical prediction is from the true value. Note that you should calculate the residuals on a validation set rather than on the training data; otherwise an overfitted model will look better than it really is.
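As a rough illustration of the steps above, here is a minimal sketch in R. The built-in mtcars data set, the formula mpg ~ wt + hp, the seed, and the 22/10 split are all assumptions picked for the example, not something prescribed by this post.

```r
# Illustrative split of the built-in mtcars data (32 rows) into
# a training set and a validation set; seed and sizes are arbitrary.
set.seed(42)
idx   <- sample(nrow(mtcars), size = 22)
train <- mtcars[idx, ]
valid <- mtcars[-idx, ]

# Fit a linear regression model on the training data only.
m <- lm(mpg ~ wt + hp, data = train)

# In-sample RMSE and MAD from the model's own residuals.
rmse_train <- sqrt(mean(m$residuals^2))
mad_train  <- mean(abs(m$residuals))

# Validation residuals: observed value minus predicted value.
valid_res  <- valid$mpg - predict(m, newdata = valid)
rmse_valid <- sqrt(mean(valid_res^2))

rmse_train
rmse_valid
```

If rmse_valid comes out much larger than rmse_train, that is a hint the model may be overfitting the training data.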
Regression models may be based on different numbers of cases, so the RMSE is often adjusted by dividing the sum of squared residuals by the residual degrees of freedom instead of by the number of cases, which gives an unbiased estimate of the error variance. You can find the degrees of freedom of model m with m$df.residual. The proper way to calculate the RMSE is therefore to take the square root of the sum of squared residuals divided by m$df.residual. Of course, if the models being compared were trained on the same number of cases, then the simple RMSE works just fine.
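A minimal sketch of the degrees-of-freedom version might look like the following. Again, the mtcars data and the formula are assumptions chosen only for illustration.

```r
# Illustrative model on the built-in mtcars data set.
m <- lm(mpg ~ wt + hp, data = mtcars)

# Simple RMSE: divide the sum of squared residuals by the number of cases.
rmse_simple <- sqrt(mean(m$residuals^2))

# Adjusted RMSE: divide by the residual degrees of freedom instead
# (number of cases minus number of estimated coefficients).
rmse_df <- sqrt(sum(m$residuals^2) / m$df.residual)

rmse_simple
rmse_df
```

For a model fit with lm, the adjusted value should match the residual standard error that summary(m) reports as summary(m)$sigma.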
Author: Martin Schedlbauer