On Thu, Feb 10, 2011 at 8:45 AM, Simon Gillings <simon.gilli...@bto.org> wrote:
> I am using bagging to perform Bagged Regression Trees on count data (bird
> abundance in Britain and Ireland, in relation to climate and land cover
> variables). Predictions from the final model are visually believable but I
> would really like a diagnostic equivalent to classification success that can
> be used to decide if a model is adequate. Whereas with classification data an
> error rate is returned, with continuous data only the root mean squared error
> is returned. The RMSE is helpful for comparing different models for the same
> species and deciding which is best, but as far as I can tell it offers no
> absolute measure of how good that best model is.
>
> At present I am using the final model to make predictions for the original
> dataset and then computing a correlation coefficient between observed and
> predicted values but I expect this is probably biased high due to
> non-independence. Ideally I think I need the correlation coefficient between
> the predictions and observed values for the out of bag sample for each of the
> n trees produced, but I don't see this produced anywhere.
>
> Does anyone know of a means of getting a useful unbiased diagnostic for
> assessing overall fit?

I'm not sure this suggestion will help you, but you could switch to a Random
Forest ensemble of regression trees (package randomForest). The Random Forest
predictor automatically computes a predicted value for each observation using
only the trees for which that observation was out-of-bag, so those predictions
give you a basis for calculating an unbiased estimate of accuracy.

Peter

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
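
To illustrate the suggestion above, here is a minimal sketch of how out-of-bag (OOB) predictions from randomForest could be compared with the observed counts. The data are simulated and the variable names (temp, rain, woodland, count) are hypothetical stand-ins, not the original survey data. Setting mtry to the number of predictors makes the forest consider every predictor at each split, which should correspond to plain bagging of regression trees. Note that the OOB prediction for an observation is averaged over all trees that left it out, so this gives one aggregated diagnostic rather than a separate correlation per tree.

```r
library(randomForest)

set.seed(1)

# Simulated stand-in for the real data (hypothetical predictors and response)
n <- 500
dat <- data.frame(temp = rnorm(n), rain = rnorm(n), woodland = runif(n))
dat$count <- rpois(n, lambda = exp(0.5 + 0.6 * dat$temp + 0.8 * dat$woodland))

# Number of predictors; mtry = p tries all of them at each split (bagging)
p <- ncol(dat) - 1
fit <- randomForest(count ~ ., data = dat, ntree = 500, mtry = p)

# Out-of-bag predictions: each value comes only from trees that did not
# see that observation in their bootstrap sample
oob <- fit$predicted

# Diagnostics based on OOB predictions
cor_oob  <- cor(dat$count, oob)
rmse_oob <- sqrt(mean((dat$count - oob)^2))
r2_oob   <- 1 - mean((dat$count - oob)^2) / var(dat$count)

# Resubstitution correlation for comparison: predictions for the training
# data from all trees, expected to be optimistically high
cor_resub <- cor(dat$count, predict(fit, newdata = dat))

print(c(cor_oob = cor_oob, cor_resub = cor_resub,
        rmse_oob = rmse_oob, r2_oob = r2_oob))
```

The fitted object also stores the running OOB mean squared error and a pseudo R-squared in fit$mse and fit$rsq (one value per number of trees), so tail(fit$rsq, 1) gives a ready-made figure on a 0 to 1 style scale that may be closer to the "absolute" measure asked for than RMSE alone.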