Re: [R] Calculating RMSE in R from hurdle regression object

Achim Zeileis Wed, 12 Mar 2014 14:53:30 -0700

On Wed, 12 Mar 2014, Tim Marcella wrote:

Hi,


My data is characterized by many zeros (82%) and overdispersion. I have
chosen to model with hurdle regression (pscl package) with a negative
binomial distribution for the count data. In an effort to validate the
model I would like to calculate the RMSE of the predicted vs. the observed
values. From my reading I understand that this is calculated on the raw
residuals generated from the model output.

In count regressions (and other GLM-type models) the raw residuals are not
necessarily a good measure because the observations are always
heteroscedastic. Low predicted counts also have low variances while higher
counts have high variances.
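
To illustrate the point about heteroscedasticity, here is a minimal base R sketch. It is a simulation under assumed values, not the poster's data: the dispersion parameter theta and the two means are hypothetical, and the Pearson residuals are computed from the known simulation means rather than from a fitted model.

```r
# Simulate negative binomial counts at a low and a high mean.
# All numbers here are illustrative assumptions, not taken from the thread.
set.seed(1)
theta <- 2                        # hypothetical dispersion parameter
mu <- rep(c(1, 20), each = 5000)  # two settings with known means
y <- rnbinom(length(mu), size = theta, mu = mu)

# Raw residuals: observed minus mean
raw <- y - mu

# Pearson residuals: raw residuals scaled by the NB standard deviation,
# using Var(Y) = mu + mu^2 / theta
pearson <- raw / sqrt(mu + mu^2 / theta)

# Spread of each residual type within the low-mean and high-mean group
sd_raw <- tapply(raw, mu, sd)
sd_pearson <- tapply(pearson, mu, sd)
print(round(sd_raw, 2))
print(round(sd_pearson, 2))
```

The raw residuals spread much more widely in the high-mean group, so an RMSE on raw residuals is dominated by those observations, while the Pearson residuals have roughly unit spread in both groups.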

This is the formula I used

H1.RMSE <- sqrt(mean(H1$residuals^2))     # where H1 is my fitted hurdle model
I get 46.7 as the RMSE. This seems high to me based on the model
results. Assuming my formula and my understanding of RMSE is correct
(and please correct me if I am wrong) I question whether this is an
appropriate use of validation for this particular structure of model.
The hurdle model correctly predicts all of my zeros. The predictions I
get from the fitted model are all values greater than zero. From my
readings I understand that the predictions from the fitted hurdle model
are means generated for the particular covariate environment based on
the model coefficients.


Yes.
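
As a sketch of what this means in practice, the following assumes the pscl package is installed and uses its bundled bioChemists data as a stand-in, since the poster's data and model formula are not shown. The object name fit is hypothetical and plays the role of H1 above.

```r
library(pscl)  # assumes the pscl package is installed

# Stand-in data shipped with pscl; not the poster's data
data("bioChemists", package = "pscl")
fit <- hurdle(art ~ ., data = bioChemists, dist = "negbin")

# Fitted means, one per observation
mu_hat <- predict(fit, type = "response")

# RMSE of observed counts against the fitted means
rmse_raw <- sqrt(mean((bioChemists$art - mu_hat)^2))

# Same quantity from the raw (response) residuals of an unweighted fit
rmse_stored <- sqrt(mean(residuals(fit, type = "response")^2))

print(c(min_fitted_mean = min(mu_hat),
        share_zeros = mean(bioChemists$art == 0),
        rmse_raw = rmse_raw,
        rmse_stored = rmse_stored))
```

Every fitted mean is positive even though a sizable share of the observations are zero, because each fitted value is the expectation over both the zero and the positive outcomes for that covariate setting.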

If this is truly the case it does not make sense to compare these means
to the observations. This will generate large residuals (only 18% of the
observations contain counts greater than 0, while the predicted counts
all exceed 0). It seems like comparing apples to oranges.

Well, it compares the predicted means to the observations. It's not apples
and oranges but they're also not exactly the same thing. Looking at this
thread where a similar question was asked might help:


https://stat.ethz.ch/pipermail/r-help/2011-June/279765.html
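
One possible complement to comparing means with observations, sketched here without claiming it is what the linked thread recommends, is to compare the observed number of zeros with the number expected under the fitted model. The sketch assumes pscl is installed and again uses the bundled bioChemists data as a stand-in; fit is a hypothetical object name.

```r
library(pscl)  # assumes the pscl package is installed

# Stand-in data shipped with pscl; not the poster's data
data("bioChemists", package = "pscl")
fit <- hurdle(art ~ ., data = bioChemists, dist = "negbin")

# Matrix of predicted probabilities:
# rows are observations, columns are the counts 0, 1, 2, ...
p <- predict(fit, type = "prob")

observed_zeros <- sum(bioChemists$art == 0)
expected_zeros <- sum(p[, 1])  # first column is P(Y = 0)

print(c(observed = observed_zeros, expected = expected_zeros))
```

For a hurdle model with the default binomial zero part, the two numbers should agree closely, which is a different sense of "predicting the zeros" than having any individual fitted mean equal zero.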

Other correlative tests (Pearson's r, Spearman's rho) would seem to be
comparing the mean predicted value for a particular covariate environment
to the observed value, which again is heavily dominated by zeros.

Any tips on how best to validate hurdle models in R?

Thanks


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


