Frank, let me make sure I understand:
On Fri, Feb 12, 2010 at 5:57 PM, Frank E Harrell Jr
<f.harr...@vanderbilt.edu> wrote:
> Ramon Diaz-Uriarte wrote:
>>
>> Dear Frank,
>>
>> Thanks a lot for your response. And apologies for the question,
>> because the answer was obviously in the help.
>>
>> As for the caveats on selection: yes, thanks. I think I am actually
>> closely following your book (e.g., pp. 249 to 253), and one of the
>> points I am trying to make to my colleagues is that by doing variable
>> selection, we are actually getting a worse model (as evidenced by the
>> bias-corrected AUC, which is smaller if attempting variable
>> selection).
>>
>>
>> Best,
>>
>> R.
>
> Thanks Ramon.
>
> Bias-corrected measures need to be penalized for all variable selection
> steps and for univariable screening. When the penalization is complete, you
> usually see worse model performance as compared with full model fits, as you
> wrote.
>

I thought that by using validate, starting from the original
(non-screened) model, and passing "bw = TRUE" in the call to validate,
the bias-corrected statistics already include that penalization. After
all, in each bootstrap iteration the selection process is carried out
only with the in-bag bootstrap sample, but the "test" is conducted
with the out-of-bag sample. So my understanding was that, by using the
Dxy under the "corrected index" column, I had accounted for the
screening involved in the variable selection.

Thanks,

R.

> Cheers
> Frank
>
>>
>>
>> On Fri, Feb 12, 2010 at 3:13 PM, Frank E Harrell Jr
>> <f.harr...@vanderbilt.edu> wrote:
>>>
>>> Ramon Diaz-Uriarte wrote:
>>>>
>>>> Dear All,
>>>>
>>>> For logistic regression models: is it possible to use validate (rms
>>>> package) to compute bias-corrected AUC, but have variable selection
>>>> with AIC use step (or stepAIC, from MASS), instead of fastbw?
>>>>
>>>>
>>>> More details:
>>>>
>>>> I've been using the validate function (in the rms package, by Frank
>>>> Harrell) to obtain, among other things, bootstrap bias-corrected
>>>> estimates of the AUC, when variable selection is carried out (using
>>>> AIC as criterion). validate calls predab.resample, which in turn calls
>>>> fastbw (from the Design package, by Harrell). fastbw "Performs a
>>>> slightly inefficient but numerically stable version of fast backward
>>>> elimination on factors, using a method based on Lawless and Singhal
>>>> (1978). This method uses the fitted complete model (...)". However, I
>>>> am finding that the models returned by fastbw are much smaller than
>>>> those returned by stepAIC or step (a simple example is shown below),
>>>> probably because of the approximation and using the complete model.
>>>>
>>>> I'd like to use step instead of fastbw. I think this can be done by
>>>> hacking predab.resample in a couple of places but I am wondering if
>>>> this is a bad idea (why?) or if I am reinventing the wheel.
>>>>
>>>>
>>>> Best,
>>>>
>>>> R.
>>>>
>>>>
>>>> P.S. Simple example of fastbw compared to step:
>>>>
>>>> library(MASS) ## for stepAIC and bwt data
>>>> example(birthwt)
>>>> library(rms)
>>>>
>>>> bwt.glm <- glm(low ~ ., family = binomial, data = bwt)
>>>> bwt.lrm <- lrm(low ~ ., data = bwt)
>>>>
>>>> step(bwt.glm)
>>>> ## same as stepAIC(bwt.glm)
>>>>
>>>> fastbw(bwt.lrm)
>>>
>>> Hi Ramon,
>>>
>>> By default fastbw uses type='residual' to compute test statistics on all
>>> deleted variables combined. Use type='individual' to get the behavior in
>>> step. In your example fastbw(..., type='ind') gives the same model as
>>> step() and comes surprisingly close to estimating the MLEs without
>>> refitting. Of course you refit the reduced model to get MLEs. Both true
>>> and approximate MLEs are biased by the variable selection so beware. type=
>>> can be passed from calibrate or validate to fastbw.
>>>
>>> Note that none of the statistics computed by step or fastbw were designed
>>> to be used with more than two completely pre-specified models. Variable
>>> selection is hazardous both to inference and to prediction. There is no
>>> free lunch; we are torturing data to confess its own sins.
>>>
>>> Frank
>>>
>>> --
>>> Frank E Harrell Jr   Professor and Chairman        School of Medicine
>>>                      Department of Biostatistics   Vanderbilt University
>>>
>>
>>
>

--
Ramon Diaz-Uriarte
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz
Phone: +34-91-732-8000 ext. 3019

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
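
P.S. To make the question above concrete, here is a minimal sketch of the comparison being discussed: the bias-corrected Dxy (and hence AUC) from validate() for the full model, against the same statistic when backward elimination is repeated inside every bootstrap resample via bw = TRUE. It is only an illustration under some assumptions, not a recommended recipe: the data frame is rebuilt along the lines of the MASS ?birthwt example but with the logical columns coded as 0/1, the formula is written out instead of using "low ~ .", and the seed, the number of resamples B and the helper auc() are arbitrary choices of this sketch.

```r
library(MASS)  # birthwt data
library(rms)   # lrm, validate, fastbw

# Rebuild 'bwt' along the lines of the MASS ?birthwt example.
# Assumption of this sketch: logicals are coded as 0/1 integers.
bwt <- with(birthwt, {
  race <- factor(race, labels = c("white", "black", "other"))
  ptd  <- factor(ptl > 0)
  ftv  <- factor(ftv)
  levels(ftv)[-(1:2)] <- "2+"
  data.frame(low = factor(low), age, lwt, race,
             smoke = as.integer(smoke > 0), ptd,
             ht = as.integer(ht > 0), ui = as.integer(ui > 0), ftv)
})

# Full, pre-specified model; x = TRUE, y = TRUE are required by validate().
full <- lrm(low ~ age + lwt + race + smoke + ptd + ht + ui + ftv,
            data = bwt, x = TRUE, y = TRUE)

# Backward elimination on the full fit with per-variable tests,
# the variant described in the thread as matching step().
sel <- fastbw(full, rule = "aic", type = "individual")

set.seed(1)  # arbitrary seed, for reproducibility of the sketch only
B <- 200     # number of bootstrap resamples (illustrative choice)

# Bootstrap validation of the full model, no selection.
val_full <- validate(full, method = "boot", B = B)

# Same, but with backward elimination redone in every resample;
# rule= and type= are passed on to fastbw.
val_bw <- validate(full, method = "boot", B = B, bw = TRUE,
                   rule = "aic", type = "individual")

# AUC is a linear transform of Somers' Dxy: AUC = (Dxy + 1) / 2.
auc <- function(v) unname((v["Dxy", "index.corrected"] + 1) / 2)
auc_full <- auc(val_full)
auc_bw   <- auc(val_bw)
print(c(full = auc_full, with_selection = auc_bw))
```

The "index.corrected" column of val_bw is the one referred to above as the corrected index. One caveat, as far as I understand predab.resample: with method = "boot" each bootstrap fit is evaluated on the original sample (Efron's optimism estimate), and evaluation restricted to the omitted observations corresponds to method = ".632". Any univariable screening done before fitting "full" would not be covered by either.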