Hi all,

I've been using a custom summary function to optimise regression model methods with the caret package, and this has worked smoothly with the default bootstrap resampling method. For bagging models (specifically randomForest in this case) caret can, in theory, use the out-of-bag (oob) error estimate from the model instead of resampling, since resampling is largely redundant for such models. Because these models take a while to build in the first place, estimating performance via bootstrap really slows things down.
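For concreteness, here is a minimal sketch of the two setups that each work on their own (the toy data and the metric name 'MedAE' are just illustrative, not anything caret defines):

library(caret)

set.seed(1)
x <- data.frame(matrix(rnorm(200 * 5), ncol = 5))
y <- x[, 1] + rnorm(200)

## custom summary function: median absolute error
medianAE <- function(data, lev = NULL, model = NULL) {
  c(MedAE = median(abs(data$obs - data$pred)))
}

## (1) bootstrap resampling + custom metric: works, but slow for rf
ctrl_boot <- trainControl(method = "boot", number = 25,
                          summaryFunction = medianAE)
fit_boot <- train(x, y, method = "rf", trControl = ctrl_boot,
                  metric = "MedAE", maximize = FALSE)

## (2) oob 'resampling' + default RMSE: works, and is fast
ctrl_oob <- trainControl(method = "oob")
fit_oob <- train(x, y, method = "rf", trControl = ctrl_oob)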
I can successfully run either the oob 'resampling method' with the default RMSE optimisation, or bootstrap resampling with my custom summaryFunction as the quantity to optimise, but the two don't work together: if I try to use oob and also supply a summaryFunction, caret throws an error saying it can't find the relevant metric.

If caret is simply polling the randomForest object for the stored oob error, I can understand this limitation. However, in the case of randomForest (and probably other bagging methods?) the training function can be asked to return the individual tree predictions and whether each data point was out-of-bag for each tree. With that information you can reconstruct an oob 'error' using whatever function you choose to target for optimisation (a rough sketch of what I mean is at the end of this message). As far as I can tell, caret is not doing this, and I can't see anywhere that it can be coerced to do so.

Have I missed something? Can anyone suggest how this could be achieved? It wouldn't be *that* hard to code up something that essentially operates in the same way as caret::train but handles this feature for bagging models, but if it is already there and I've missed it, please let me know.

Thanks,
Matt Francis
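P.S. A rough sketch of the reconstruction I have in mind, done outside caret with randomForest directly (the toy data and the metric are again just illustrative):

library(randomForest)

set.seed(1)
x <- matrix(rnorm(200 * 5), ncol = 5)
y <- x[, 1] + rnorm(200)

rf <- randomForest(x, y, ntree = 500, keep.inbag = TRUE)

## per-tree predictions on the training data: an n x ntree matrix
ind <- predict(rf, x, predict.all = TRUE)$individual

## rf$inbag is also n x ntree; zero means the row was oob for that tree
ind[rf$inbag > 0] <- NA

## aggregate each row over only the trees for which it was out-of-bag
oob_pred <- rowMeans(ind, na.rm = TRUE)

## any target function can now be evaluated on the oob predictions,
## e.g. median absolute error rather than RMSE
median(abs(y - oob_pred))

(For plain regression, rf$predicted already holds the oob-aggregated predictions, so the per-tree route would only be needed if the metric can't be computed from the aggregated predictions alone.)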