Good Day All,

I have a negative binomial model that I fitted with glm.nb() from the MASS package, and I am cross-validating it with cv.glm() from the boot package. I am really interested in determining the performance of this model so I can have confidence (or not) when it might be applied elsewhere.

If I understand the cv.glm() procedure correctly, the default cost function is the average squared error, and by running cv.glm() in a loop many times I can calculate PRESS (PRedictive Error Sum of Squares = 1/n*Sum(all PEs)) from the default output.

When I run the loop 10 times, my PRESS is ~25.
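A minimal sketch of the procedure described above, run on simulated data rather than the real data set. The names `dat`, `x`, `y` and `cv_err` are hypothetical stand-ins, and the choice of K = 10 folds is an assumption made only so that repeated runs differ.

```r
library(MASS)  # glm.nb()
library(boot)  # cv.glm()

set.seed(1)

# Simulated stand-in data; `dat`, `x` and `y` are hypothetical names
n   <- 200
dat <- data.frame(x = runif(n))
dat$y <- rnbinom(n, size = 2, mu = exp(0.5 + 1.2 * dat$x))

fit <- glm.nb(y ~ x, data = dat)

# Repeat 10-fold cross-validation 10 times. delta[1] is the raw
# cross-validated estimate of the default cost (average squared error).
cv_err <- replicate(10, cv.glm(dat, fit, K = 10)$delta[1])

mean(cv_err)  # average over the 10 repeats
```

The second element of `delta` is the bias-adjusted version of the same estimate. Because the default cost is an average of squared errors, its square root is on the scale of the response (counts), which may be easier to read than the squared value.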

I have a few questions:

1) I must now confess my ignorance: how does one interpret my PRESS of 25? Are there some internet resources that someone could point me to that would help with the interpretation? I've spent most of yesterday studying up on things but feel like I am chasing my tail. Most of the resources are either so heavy in theory that I can't puzzle them out, or are a couple of paragraphs long and don't have examples with data in them. Is my PRESS in essence saying that my model performance is ~75%? (I suspect not, but I don't know, thus I ask.)

2) All my observations are spatial in nature, so I would like to map where the model performs well and where it does not. This would be somewhat akin to inspecting residuals in OLS. Is there a way to output the PEs for individual data points from cv.glm()?

3) My previous idea was to look at AIC, BIC, McFadden R2 and pseudo-R2 as goodness-of-fit (GOF) measures for each subset model. It appears that I can modify the cost function of cv.glm(), but I am not too confident in my ability to write a correct cost function. Are there other valid GOF measures for my negative binomial model that I can substitute into the cost function of cv.glm()? Would anyone care to recommend one (or many)?
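To make questions 2 and 3 concrete, here is a rough sketch on simulated data, not a recommendation of any particular measure. It shows the shape a custom cost function needs (observed responses first, predictions second, one non-negative number returned) and a hand-written leave-one-out loop that keeps each point's prediction error, since cv.glm() itself returns only the aggregated cost. The names `dat`, `mae_cost`, `cv_mae` and `pe` are hypothetical, and mean absolute error is just a placeholder cost.

```r
library(MASS)  # glm.nb()
library(boot)  # cv.glm()

set.seed(2)

# Simulated stand-in data; `dat`, `x` and `y` are hypothetical names
n   <- 100
dat <- data.frame(x = runif(n))
dat$y <- rnbinom(n, size = 2, mu = exp(0.5 + 1.2 * dat$x))

fit <- glm.nb(y ~ x, data = dat)

# Custom cost: takes observed responses and predicted means,
# returns a single non-negative number (here, mean absolute error)
mae_cost <- function(y, yhat) mean(abs(y - yhat))
cv_mae   <- cv.glm(dat, fit, cost = mae_cost, K = 10)$delta[1]

# Manual leave-one-out loop that keeps every point's prediction error
pe <- numeric(n)
for (i in seq_len(n)) {
  # Refit without observation i, then predict its mean response
  fit_i  <- glm.nb(y ~ x, data = dat[-i, , drop = FALSE])
  pred_i <- predict(fit_i, newdata = dat[i, , drop = FALSE],
                    type = "response")
  pe[i]  <- dat$y[i] - pred_i
}

mean(pe^2)  # leave-one-out average squared prediction error
```

The vector `pe` is in the same row order as `dat`, so it could be joined back to the coordinates of each observation and mapped.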

Thanks in advance for your patience!

-Don

PS - if you've seen my previous posts, I've abandoned my 80/20 split validation scheme.

--

-Don
Don Catanzaro, PhD                  Landscape Ecologist
[EMAIL PROTECTED]               16144 Sigmond Lane
479-751-3616                        Lowell, AR 72745

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
