Thank you for your advice, Tim. I have been reading your paper and the other materials on your website, but I could not find an R package for your bootknife method. Is there an R package for this procedure?
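In the meantime I tried to implement the resampling scheme myself from the description in the paper: for each replicate, omit one observation at random (the jackknife step), then draw n observations with replacement from the remaining n-1. This is only my own sketch, not code from any package, and bootknife_se is a name I made up; please correct me if I have misread the method.

## Bootknife resampling (after Hesterberg 2004) -- my own sketch, not
## from any published package. For each replicate: jackknife-delete one
## observation at random, then bootstrap n values from the remaining n-1.
bootknife_se <- function(x, statistic = mean, B = 1000) {
  n <- length(x)
  stats <- replicate(B, {
    omit <- sample.int(n, 1)                     # jackknife: drop one point
    idx  <- sample.int(n - 1, n, replace = TRUE) # resample n from the n-1 left
    statistic(x[-omit][idx])
  })
  sd(stats)
}

set.seed(1)
x <- rnorm(30)
bootknife_se(x)   # close to sd(x)/sqrt(30), i.e. the (n-1)-divisor SE

If I read the paper correctly, this removes the bias you mention below: the expected bootstrap variance then matches the usual unbiased (n-1) formula rather than the n-divisor one.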
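Also, to convince myself of the interval-width point in your message below, I ran a quick check in base R comparing the classical t interval with the bootstrap percentile interval for a sample mean (a rough sketch; exact numbers will vary with the seed and data):

## Formula t interval vs. bootstrap percentile interval for the mean.
## The percentile interval should be narrower: z instead of t quantiles,
## and a standard error with divisor n instead of n-1.
set.seed(1)
x <- rnorm(20)
n <- length(x)

t.ci <- mean(x) + qt(c(0.025, 0.975), df = n - 1) * sd(x) / sqrt(n)

boot.means <- replicate(10000, mean(sample(x, n, replace = TRUE)))
perc.ci <- quantile(boot.means, c(0.025, 0.975))

diff(t.ci)      # width of the formula interval
diff(perc.ci)   # typically narrower

At n = 20 the two factors together predict a width ratio of about (1.96/2.09) * sqrt(19/20), roughly 0.91, so the percentile interval should come out about 9% narrower.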
(11/05/17 14:13), Tim Hesterberg wrote:
> My usual rule is that whatever gives the widest confidence intervals
> in a particular problem is most accurate for that problem :-)
>
> Bootstrap percentile intervals tend to be too narrow.
> Consider the case of the sample mean; the usual formula CI is
>     xbar +- t_alpha * sqrt( (1/(n-1)) * sum((x_i - xbar)^2) ) / sqrt(n)
> The bootstrap percentile interval for symmetric data is roughly
>     xbar +- z_alpha * sqrt( (1/n) * sum((x_i - xbar)^2) ) / sqrt(n)
> It is narrower than the formula CI because
> * it uses z quantiles rather than t quantiles, and
> * the standard error uses a divisor of n rather than (n-1).
>
> In stratified sampling, the narrowness factor depends on the
> stratum sizes, not the overall n.
> In regression, estimates for some quantities may be based on a small
> subset of the data (e.g. coefficients related to rare factor levels).
>
> This doesn't mean we should give up on the bootstrap.
> There are remedies for the bootstrap biases; see e.g.
> Hesterberg, Tim C. (2004), "Unbiasing the Bootstrap - Bootknife Sampling
> vs. Smoothing", Proceedings of the Section on Statistics and the
> Environment, American Statistical Association, 2924-2930.
> http://home.comcast.net/~timhesterberg/articles/JSM04-bootknife.pdf
>
> And other methods have their own biases, particularly in nonlinear
> applications such as logistic regression.
>
> Tim Hesterberg
>
>> Thank you for your reply, Prof. Harrell.
>>
>> I agree with you. Dropping only one variable does not actually help a lot.
>>
>> I have one more question. During analysis of this model I found that
>> the confidence intervals (CIs) of some coefficients provided by
>> bootstrapping (the bootcov function in the rms package) were narrower
>> than the CIs provided by the usual variance-covariance matrix, while
>> the CIs of other coefficients were wider. My data have no cluster
>> structure. I am wondering which CIs are better. I guess the bootstrap
>> ones are, but is that right?
>>
>> I would appreciate your help in advance.
>> --
>> KH
>>
>> (11/05/16 12:25), Frank Harrell wrote:
>>> I think you are doing this correctly except for one thing. The
>>> validation and other inferential calculations should be done on the
>>> full model. Use the approximate model to get a simpler nomogram but
>>> not to get standard errors. Since you are dropping only one variable,
>>> you might consider just running the nomogram on the entire model.
>>> Frank
>>>
>>> KH wrote:
>>>>
>>>> Hi,
>>>> I am trying to construct a logistic regression model from my data
>>>> (104 patients and 25 events). I built a full model consisting of five
>>>> predictors with penalization, using the rms package (lrm, pentrace,
>>>> etc.), because of the events-per-variable issue. Then I tried to
>>>> approximate the full model by a step-down technique, predicting the
>>>> linear predictor L from all of the component variables using ordinary
>>>> least squares (ols in the rms package), as follows. I would like to
>>>> know whether I am doing this right or not.
>>>>
>>>>> library(rms)
>>>>> plogit <- predict(full.model)
>>>>> full.ols <- ols(plogit ~ stenosis+x1+x2+ClinicalScore+procedure, sigma=1)
>>>>> fastbw(full.ols, aics=1e10)
>>>>
>>>>  Deleted       Chi-Sq d.f. P      Residual d.f. P      AIC    R2
>>>>  stenosis        1.41 1    0.2354   1.41   1    0.2354  -0.59 0.991
>>>>  x2             16.78 1    0.0000  18.19   2    0.0001  14.19 0.882
>>>>  procedure      26.12 1    0.0000  44.31   3    0.0000  38.31 0.711
>>>>  ClinicalScore  25.75 1    0.0000  70.06   4    0.0000  62.06 0.544
>>>>  x1             83.42 1    0.0000 153.49   5    0.0000 143.49 0.000
>>>>
>>>> Then I fitted an approximation to the full model using the most
>>>> important variables (keeping variables until the R^2 for predictions
>>>> from the reduced model against the original Y would drop below 0.95),
>>>> that is, dropping "stenosis".
>>>>
>>>>> full.ols.approx <- ols(plogit ~ x1+x2+ClinicalScore+procedure)
>>>>> full.ols.approx$stats
>>>>            n  Model L.R.       d.f.         R2          g      Sigma
>>>>  104.0000000 487.9006640  4.0000000  0.9908257  1.3341718  0.1192622
>>>>
>>>> This approximate model had an R^2 against the full model of 0.99.
>>>> Therefore, I updated the original full logistic model, dropping
>>>> "stenosis" as a predictor.
>>>>
>>>>> full.approx.lrm <- update(full.model, ~ . -stenosis)
>>>>> validate(full.model, bw=F, B=1000)
>>>>           index.orig training   test optimism index.corrected    n
>>>> Dxy           0.6425   0.7017 0.6131   0.0887          0.5539 1000
>>>> R2            0.3270   0.3716 0.3335   0.0382          0.2888 1000
>>>> Intercept     0.0000   0.0000 0.0821  -0.0821          0.0821 1000
>>>> Slope         1.0000   1.0000 1.0548  -0.0548          1.0548 1000
>>>> Emax          0.0000   0.0000 0.0263   0.0263          0.0263 1000
>>>>
>>>>> validate(full.approx.lrm, bw=F, B=1000)
>>>>           index.orig training   test optimism index.corrected    n
>>>> Dxy           0.6446   0.6891 0.6265   0.0626          0.5820 1000
>>>> R2            0.3245   0.3592 0.3428   0.0164          0.3081 1000
>>>> Intercept     0.0000   0.0000 0.1281  -0.1281          0.1281 1000
>>>> Slope         1.0000   1.0000 1.1104  -0.1104          1.1104 1000
>>>> Emax          0.0000   0.0000 0.0444   0.0444          0.0444 1000
>>>>
>>>> Validation revealed this approximation was not bad. Then I made a
>>>> nomogram:
>>>>
>>>>> full.approx.lrm.nom <- nomogram(full.approx.lrm,
>>>>+     fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)
>>>>> plot(full.approx.lrm.nom)
>>>>
>>>> and another nomogram using the ols model:
>>>>
>>>>> full.ols.approx.nom <- nomogram(full.ols.approx,
>>>>+     fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)
>>>>> plot(full.ols.approx.nom)
>>>>
>>>> These two nomograms are very similar but a little bit different.
>>>>
>>>> My questions are:
>>>>
>>>> 1. Am I doing this right?
>>>>
>>>> 2. Which nomogram is correct?
>>>>
>>>> I would appreciate your help in advance.
>>>>
>>>> --
>>>> KH
>>>
>>> -----
>>> Frank Harrell
>>> Department of Biostatistics, Vanderbilt University
>>
>> E-mail address
>> Office: khos...@med.kobe-u.ac.jp
>> Home : khos...@venus.dti.ne.jp
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.