Dear Frank,

Thanks a lot for your response, and apologies for the question, since the answer was obviously in the help.
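Below is a minimal sketch of the switch the help page describes. It assumes the MASS and rms packages are installed. fastbw's type argument moves from the default pooled "residual" test to per-factor "individual" tests, which is the behaviour closest to step(). The bwt data frame is rebuilt by hand along the lines of example(birthwt), with ptd stored as a 0/1 integer (an assumption made for brevity). The object names are purely illustrative.

```r
library(MASS)  # birthwt data, stepAIC
library(rms)   # lrm, fastbw

# Approximate rebuild of the bwt data frame created by example(birthwt);
# ptd is kept as a 0/1 integer rather than a factor (assumption for brevity)
bwt <- with(birthwt, {
  ftv <- factor(ftv)
  levels(ftv)[-(1:2)] <- "2+"          # pool 2 or more physician visits
  data.frame(low, age, lwt,
             race = factor(race, labels = c("white", "black", "other")),
             smoke, ptd = as.integer(ptl > 0), ht, ui, ftv)
})

fml <- low ~ age + lwt + race + smoke + ptd + ht + ui + ftv
bwt.glm <- glm(fml, family = binomial, data = bwt)
bwt.lrm <- lrm(fml, data = bwt)

# AIC-based selection that refits every candidate model (stats::step)
step.fit <- step(bwt.glm, trace = 0)
kept.step <- attr(terms(step.fit), "term.labels")

# fastbw: default pooled test on all deleted factors vs. per-factor tests
bw.res <- fastbw(bwt.lrm, rule = "aic", type = "residual")
bw.ind <- fastbw(bwt.lrm, rule = "aic", type = "individual")

print(kept.step)
print(bw.res$names.kept)
print(bw.ind$names.kept)
```

Comparing the three printed sets of retained factors shows directly how much the pooled "residual" test shrinks the model relative to step().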
As for the caveats on selection: yes, thanks. I think I am actually following your book closely (e.g., pp. 249 to 253), and one of the points I am trying to make to my colleagues is that by doing variable selection we actually end up with a worse model (as evidenced by the bias-corrected AUC, which is smaller when variable selection is performed).

Best,

R.

On Fri, Feb 12, 2010 at 3:13 PM, Frank E Harrell Jr <f.harr...@vanderbilt.edu> wrote:
> Ramon Diaz-Uriarte wrote:
>>
>> Dear All,
>>
>> For logistic regression models: is it possible to use validate (rms
>> package) to compute bias-corrected AUC, but have variable selection
>> with AIC use step (or stepAIC, from MASS), instead of fastbw?
>>
>>
>> More details:
>>
>> I've been using the validate function (in the rms package, by Frank
>> Harrell) to obtain, among other things, bootstrap bias-corrected
>> estimates of the AUC, when variable selection is carried out (using
>> AIC as criterion). validate calls predab.resample, which in turn calls
>> fastbw (from the Design package, by Harrell). fastbw "Performs a
>> slightly inefficient but numerically stable version of fast backward
>> elimination on factors, using a method based on Lawless and Singhal
>> (1978). This method uses the fitted complete model (...)". However, I
>> am finding that the models returned by fastbw are much smaller than
>> those returned by stepAIC or step (a simple example is shown below),
>> probably because of the approximation and using the complete model.
>>
>> I'd like to use step instead of fastbw. I think this can be done by
>> hacking predab.resample in a couple of places but I am wondering if
>> this is a bad idea (why?) or if I am reinventing the wheel.
>>
>>
>> Best,
>>
>> R.
>>
>>
>> P.S.
>> Simple example of fastbw compared to step:
>>
>> library(MASS) ## for stepAIC and bwt data
>> example(birthwt)
>> library(rms)
>>
>> bwt.glm <- glm(low ~ ., family = binomial, data = bwt)
>> bwt.lrm <- lrm(low ~ ., data = bwt)
>>
>> step(bwt.glm)
>> ## same as stepAIC(bwt.glm)
>>
>> fastbw(bwt.lrm)
>
> Hi Ramon,
>
> By default fastbw uses type='residual' to compute test statistics on all
> deleted variables combined. Use type='individual' to get the behavior in
> step. In your example fastbw(..., type='ind') gives the same model as
> step() and comes surprisingly close to estimating the MLEs without
> refitting. Of course you refit the reduced model to get MLEs. Both true
> and approximate MLEs are biased by the variable selection so beware. type=
> can be passed from calibrate or validate to fastbw.
>
> Note that none of the statistics computed by step or fastbw were designed to
> be used with more than two completely pre-specified models. Variable
> selection is hazardous both to inference and to prediction. There is no free
> lunch; we are torturing data to confess its own sins.
>
> Frank
>
> --
> Frank E Harrell Jr Professor and Chairman School of Medicine
> Department of Biostatistics Vanderbilt University

--
Ramon Diaz-Uriarte
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz
Phone: +34-91-732-8000 ext. 3019
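One hedged way to put numbers on the point about selection is to use validate to get bootstrap bias-corrected AUCs with and without backward elimination. The sketch below assumes the MASS and rms packages are installed and refits the model with x = TRUE, y = TRUE, which validate needs in order to resample. It converts the corrected Somers' Dxy into an AUC with the identity C = (Dxy + 1)/2 and passes type = "individual" through to fastbw, as Frank describes. The number of resamples (B = 100), the seed and the helper dxy_to_auc are arbitrary, illustrative choices.

```r
library(MASS)  # birthwt data
library(rms)   # lrm, validate

# Approximate rebuild of the bwt data frame from example(birthwt);
# ptd is kept as a 0/1 integer (assumption for brevity)
bwt <- with(birthwt, {
  ftv <- factor(ftv)
  levels(ftv)[-(1:2)] <- "2+"
  data.frame(low, age, lwt,
             race = factor(race, labels = c("white", "black", "other")),
             smoke, ptd = as.integer(ptl > 0), ht, ui, ftv)
})

# validate() resamples the design matrix and response, so store them in the fit
fit <- lrm(low ~ age + lwt + race + smoke + ptd + ht + ui + ftv,
           data = bwt, x = TRUE, y = TRUE)

# Hypothetical helper: Somers' Dxy to c-index (AUC), C = (Dxy + 1) / 2
dxy_to_auc <- function(dxy) (dxy + 1) / 2

set.seed(2010)  # arbitrary seed so the bootstrap is reproducible
v.full <- validate(fit, method = "boot", B = 100)
v.bw   <- validate(fit, method = "boot", B = 100, bw = TRUE,
                   rule = "aic", type = "individual")

# Optimism-corrected Dxy sits in the "index.corrected" column
auc.full <- dxy_to_auc(v.full["Dxy", "index.corrected"])
auc.bw   <- dxy_to_auc(v.bw["Dxy", "index.corrected"])
print(c(no.selection = auc.full, with.selection = auc.bw))
```

With bw = TRUE, the backward elimination is repeated inside every bootstrap sample, so the corrected index includes the optimism that the selection step itself introduces.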