Ramon Diaz-Uriarte wrote:
Frank, let me make sure I understand:
On Fri, Feb 12, 2010 at 5:57 PM, Frank E Harrell Jr
<f.harr...@vanderbilt.edu> wrote:
Ramon Diaz-Uriarte wrote:
Dear Frank,
Thanks a lot for your response. And apologies for the question,
because the answer was obviously in the help.
As for the caveats on selection: yes, thanks. I think I am actually
closely following your book (e.g., pp. 249 to 253), and one of the
points I am trying to make to my colleagues is that by doing variable
selection we are actually getting a worse model (as evidenced by the
bias-corrected AUC, which is smaller when variable selection is
attempted).
Best,
R.
Thanks Ramon.
Bias-corrected measures need to be penalized for all variable selection
steps and for univariable screening. When the penalization is complete, you
usually see worse model performance as compared with full model fits, as you
wrote.
I thought that by using validate, starting from the original
(non-screened) model and passing "bw = TRUE" in the call to validate,
the bias-corrected statistics would already include that penalization.
After all, for each bootstrap iteration the selection process is carried
out only on the in-bag bootstrap sample, while the "test" is conducted
on the out-of-bag sample. So my understanding was that by using the Dxy
in the "index.corrected" column I had accounted for the screening
involved in the variable selection.
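A minimal sketch of the kind of call being described (the model formula,
data name, and B = 200 are illustrative, not from this thread):

library(rms)
## full, non-screened model; x=TRUE, y=TRUE store the design matrix and
## response so that validate() can resample
fit <- lrm(y ~ x1 + x2 + x3, data = mydata, x = TRUE, y = TRUE)
## bw=TRUE repeats the backward selection (rule='aic' by default) inside
## every bootstrap replication, so its optimism is included in the correction
val <- validate(fit, B = 200, bw = TRUE)
val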
Thanks,
R.
Ramon,
Yes, you have it right, assuming there was no univariable or other
screening done that bw=TRUE would not know about. [Note that test and
training samples overlap with the ordinary bootstrap procedure, though.]
I wasn't familiar with "bias-corrected AUC" and assumed that came from
another function. validate() produces the proper corrected indexes for
the indexes it prints.
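Since validate() reports Somers' Dxy rather than the AUC itself, the
bias-corrected AUC can be recovered from the corrected Dxy as
C = (Dxy + 1) / 2. A minimal sketch, assuming 'val' holds the result of
a validate() call like the one sketched above:

corrected.Dxy <- val["Dxy", "index.corrected"]
corrected.AUC <- (corrected.Dxy + 1) / 2  ## c-index (AUC) from Somers' Dxy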
Frank
Cheers
Frank
On Fri, Feb 12, 2010 at 3:13 PM, Frank E Harrell Jr
<f.harr...@vanderbilt.edu> wrote:
Ramon Diaz-Uriarte wrote:
Dear All,
For logistic regression models: is it possible to use validate (rms
package) to compute bias-corrected AUC, but have variable selection
with AIC use step (or stepAIC, from MASS), instead of fastbw?
More details:
I've been using the validate function (in the rms package, by Frank
Harrell) to obtain, among other things, bootstrap bias-corrected
estimates of the AUC, when variable selection is carried out (using
AIC as the criterion). validate calls predab.resample, which in turn calls
fastbw (from the Design package, by Harrell). fastbw "Performs a
slightly inefficient but numerically stable version of fast backward
elimination on factors, using a method based on Lawless and Singhal
(1978). This method uses the fitted complete model (...)". However, I
am finding that the models returned by fastbw are much smaller than
those returned by stepAIC or step (a simple example is shown below),
probably because of the approximation and the use of the complete model.
I'd like to use step instead of fastbw. I think this can be done by
hacking predab.resample in a couple of places but I am wondering if
this is a bad idea (why?) or if I am reinventing the wheel.
Best,
R.
P.S. Simple example of fastbw compared to step:
library(MASS) ## for stepAIC and the birthwt data
example(birthwt) ## running the example creates the 'bwt' data frame
library(rms)
bwt.glm <- glm(low ~ ., family = binomial, data = bwt)
bwt.lrm <- lrm(low ~ ., data = bwt)
step(bwt.glm)
## same as stepAIC(bwt.glm)
fastbw(bwt.lrm) ## keeps fewer variables than step() here
Hi Ramon,
By default fastbw uses type='residual' to compute test statistics on all
deleted variables combined. Use type='individual' to get the behavior of
step. In your example fastbw(..., type='ind') gives the same model as
step() and comes surprisingly close to estimating the MLEs without
refitting. Of course you can refit the reduced model to get the MLEs.
Both true and approximate MLEs are biased by the variable selection, so
beware. type= can be passed from calibrate or validate to fastbw.
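A short sketch of that suggestion, reusing the birthwt example from the
question (B = 200 is arbitrary):

## refit storing the design matrix and response so validate() can resample
bwt.lrm <- lrm(low ~ ., data = bwt, x = TRUE, y = TRUE)
fastbw(bwt.lrm, type = 'individual')  ## individual-variable tests, as in step()
validate(bwt.lrm, B = 200, bw = TRUE, type = 'individual')  ## type= is passed on to fastbw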
Note that none of the statistics computed by step or fastbw were designed
to be used with more than two completely pre-specified models. Variable
selection is hazardous both to inference and to prediction. There is no
free lunch; we are torturing data to confess its own sins.
Frank