I am fitting bagged regression trees to count data (bird abundance in Britain and Ireland, in relation to climate and land cover variables). Predictions from the final model look believable, but I would really like a diagnostic, equivalent to classification success, that can be used to decide whether a model is adequate. With a classification response an error rate is returned, whereas with a continuous response only the root mean squared error (RMSE) is returned. The RMSE is helpful for comparing different models for the same species and deciding which is best, but as far as I can tell it offers no absolute measure of how good that best model is.
At present I am using the final model to make predictions for the original dataset and then computing a correlation coefficient between observed and predicted values, but I expect this is biased high because the same data were used to fit the model. Ideally, I think I need the correlation coefficient between the predicted and observed values for the out-of-bag sample of each of the n trees, but I don't see this reported anywhere. Does anyone know of a way to get a useful, unbiased diagnostic for assessing overall fit?

Thanks,
Simon

Dr Simon Gillings
Senior Research Ecologist - Land Use
British Trust for Ornithology
The Nunnery, Thetford, Norfolk, IP24 2PU, UK
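To make the request concrete, here is a minimal sketch in Python of the kind of out-of-bag diagnostic I have in mind. It is an illustration under stated assumptions, not the output of any particular package. The data are simulated, `fit_stump` is a hypothetical one-split stand-in for a full regression tree, and all function names are invented for this example. It also pools the out-of-bag predictions per observation (averaging over the trees that did not see that observation) rather than computing one correlation per tree, which is a design choice on my part.

```python
import random

def mean(v):
    return sum(v) / len(v)

def fit_stump(xs, ys):
    """One-split regression tree on one predictor (hypothetical stand-in for a full tree)."""
    best = None
    for t in sorted(set(xs))[1:]:      # each candidate split leaves both sides non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:                   # no split possible: predict the overall mean
        m = mean(ys)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x < t else mr

def oob_predictions(xs, ys, n_trees=100, seed=1):
    """Average, for each observation, the predictions of trees that did not train on it."""
    rng = random.Random(seed)
    n = len(xs)
    sums, counts = [0.0] * n, [0] * n
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample, with replacement
        in_bag = set(idx)
        tree = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        for i in range(n):
            if i not in in_bag:                      # observation i is out-of-bag here
                sums[i] += tree(xs[i])
                counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

def oob_diagnostics(ys, preds):
    """RMSE, pseudo R-squared (1 - SSE/SST) and Pearson r on out-of-bag predictions."""
    pairs = [(y, p) for y, p in zip(ys, preds) if p is not None]
    obs = [y for y, _ in pairs]
    fit = [p for _, p in pairs]
    mo, mf = mean(obs), mean(fit)
    sst = sum((y - mo) ** 2 for y in obs)
    sse = sum((y - p) ** 2 for y, p in pairs)
    sf = sum((p - mf) ** 2 for p in fit)
    cov = sum((y - mo) * (p - mf) for y, p in pairs)
    r = cov / (sst * sf) ** 0.5 if sst > 0 and sf > 0 else float("nan")
    return {"rmse": (sse / len(obs)) ** 0.5, "pseudo_r2": 1 - sse / sst, "r": r}

# Simulated counts (an assumption for illustration, not real atlas data).
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(150)]
y = [max(0, round(rng.gauss(5 if xi < 5 else 20, 2))) for xi in x]

preds = oob_predictions(x, y)
diag = oob_diagnostics(y, preds)
print(diag)
```

The pseudo R-squared rescales the out-of-bag mean squared error by the variance of the observed counts, so it reads as a proportion of variance explained and can be compared across species. The squared correlation computed from the same predictions can never be lower than it, because correlation ignores any systematic bias in the predictions.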