Re: [R] Coefficients of Logistic Regression from bootstrap - how to get them?

Frank E Harrell Jr Wed, 23 Jul 2008 16:14:34 -0700

Gustaf Rydevik wrote:

On Wed, Jul 23, 2008 at 4:08 PM, Michal Figurski
<[EMAIL PROTECTED]> wrote:

Gustaf,


I am sorry, but I don't get the point. Let's just focus on predictive
performance from the cited passage, that is the number of values predicted
within 15% of the original value.
So, the predictive performance of the model fit on the entire dataset was
56% of profiles, while for the bootstrapped model it was 82% of profiles.
Well - I see a stunning purpose in the bootstrap step here: it turns a
useless equation into a clinically applicable model!
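
For concreteness, the "within 15% of the original value" criterion discussed above could be computed roughly as follows. This is only a minimal sketch in Python: the function name `fraction_within` and the numbers are made up for illustration and are not taken from the paper.

```python
def fraction_within(predicted, observed, tolerance=0.15):
    """Share of predictions whose relative error is at most `tolerance`."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    hits = sum(
        1
        for p, o in zip(predicted, observed)
        if abs(p - o) <= tolerance * abs(o)
    )
    return hits / len(observed)


# Hypothetical values, for illustration only.
observed = [100.0, 80.0, 60.0, 40.0]
predicted = [110.0, 95.0, 58.0, 47.0]

# Two of the four predictions fall within 15% of the observed value.
share = fraction_within(predicted, observed)
```

Note that such a percentage says nothing about how the model was selected or validated; it only summarises agreement on whatever data it is computed on.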

Honestly, I also can't see how this can be better than fitting on the entire
dataset, but here you have proof that it is.

I think that another argument supporting this approach is model validation.
If you fit the model on the entire data, you have no data left to validate
its predictions.

On the other hand, I agree with you that the passage in the methods section
looks awkward.

In my work on a similar problem, which is going to appear in August in Ther
Drug Monit, I used medians from the beginning, and all the comparisons were
based on models with median coefficients. I think this is what the authors
of that paper did, though they might just have had a problem describing
it correctly, and unfortunately it passed through the review process
unchanged.




Hi,

I believe that you misunderstand the passage. Do you know what
multiple stepwise regression is?

Since they used SPSS, I copied from
http://www.visualstatistics.net/SPSS%20workbook/stepwise_multiple_regression.htm

"Stepwise selection is a combination of forward and backward procedures.
Step 1

The first predictor variable is selected in the same way as in forward
selection. If the probability associated with the test of significance
is less than or equal to the default .05, the predictor variable with
the largest correlation with the criterion variable enters the
equation first.


Step 2

The second variable is selected based on the highest partial
correlation. If it can pass the entry requirement (PIN=.05), it also
enters the equation.

Step 3

From this point, stepwise selection differs from forward selection:

the variables already in the equation are examined for removal
according to the removal criterion (POUT=.10) as in backward
elimination.

Step 4

Variables not in the equation are examined for entry. Variable
selection ends when no more variables meet entry and removal criteria."
-----------


It is the outcome of this *entire process*, steps 1-4, that they compare
with the outcome of their *entire bootstrap/crossvalidation/selection
process*, Steps 1-4 in the methods section, and find that their approach
gives better results.
What you are doing is only step 4 in the article's methods
section, estimating the parameters of a model *when you already know
which variables to include*. It is the way this step is conducted that
I am sceptical about.

Regards,

Gustaf

Perfectly stated, Gustaf. This is a great example of needing to truly
understand a method to be able to use it in the right context.

After having read most of the paper by Pawinski et al now, there are
other problems.

1. The paper nowhere uses bootstrapping. It uses repeated 2-fold
cross-validation, a procedure not usually recommended.

2. The resampling procedure used in the paper treated the 50
pharmacokinetic profiles on 21 renal transplant patients as if these
were from 50 patients. The cluster bootstrap should have been used instead.

3. Figure 2 showed the fitted regression line to the predicted vs.
observed AUCs. It should have shown the line of identity instead. In
other words, the authors allowed a subtle recalibration to creep into
the analysis (and inverted the x- and y-variables in the plots). The
fitted lines are far enough away from the line of identity as to show
that the predicted values are not well calibrated. The r^2 values
claimed by the authors used the wrong formulas, which allowed an
automatic after-the-fact recalibration (new overall slope and intercept
are estimated in the test dataset). Hence the achieved r^2 are misleading.
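
Points 2 and 3 above can be illustrated with a small sketch. It is written in plain Python with made-up data, and all function names are hypothetical; it is not the paper's code nor any particular package's API. `cluster_bootstrap` resamples whole patients, so that all profiles from one patient stay together. The two r^2 functions contrast an r^2 measured against the line of identity with one that silently refits a slope and intercept to the test data.

```python
import random


def cluster_bootstrap(patient_ids, rng):
    """Resample patients (not profiles) with replacement.

    Returns the profile indices of every drawn patient, so profiles
    belonging to the same patient always enter or leave together.
    """
    by_patient = {}
    for idx, pid in enumerate(patient_ids):
        by_patient.setdefault(pid, []).append(idx)
    patients = list(by_patient)
    drawn = [rng.choice(patients) for _ in patients]
    return [idx for pid in drawn for idx in by_patient[pid]]


def r2_identity(predicted, observed):
    """1 - SSE/SST, with errors taken against the line of identity."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def r2_recalibrated(predicted, observed):
    """Squared Pearson correlation, i.e. r^2 after refitting a new
    slope and intercept of observed on predicted in the test data."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    sxy = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    syy = sum((o - mean_o) ** 2 for o in observed)
    return sxy ** 2 / (sxx * syy)


# Hypothetical data: predictions are perfectly correlated with the
# observations but badly calibrated (predicted = 20 + 0.5 * observed).
observed = [10.0, 20.0, 30.0, 40.0, 50.0]
predicted = [25.0, 30.0, 35.0, 40.0, 45.0]

r2_cal = r2_identity(predicted, observed)        # 0.625
r2_recal = r2_recalibrated(predicted, observed)  # 1.0

# Six profiles from three patients; one cluster bootstrap draw.
patient_ids = ["a", "a", "b", "c", "c", "c"]
sample = cluster_bootstrap(patient_ids, random.Random(0))
```

In this toy example the recalibrated r^2 is perfect even though the predictions themselves are systematically off, which is the kind of discrepancy point 3 describes.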



--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University

