Probably a good idea for you. The R help list is useful for both programming AND statistical advice for those who want it.
> -----Original Message-----
> From: Michal Figurski [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, July 22, 2008 10:44 AM
> To: Doran, Harold; r-help@r-project.org
> Subject: Re: [R] Coefficients of Logistic Regression from bootstrap - how to get them?
>
> Hmm...
>
> It sounds like ideology to me. I was asking for technical help. I know
> what I want to do, just don't know how to do it in R. I'll go back to
> SAS then. Thank you.
>
> --
> Michal J. Figurski
>
> Doran, Harold wrote:
> > I think the answer has been given to you. If you want to continue to
> > ignore that advice and use bootstrap for point estimates rather than
> > the properties of those estimates (which is what bootstrap is for)
> > then you are on your own.
> >
> >> -----Original Message-----
> >> From: [EMAIL PROTECTED]
> >> [mailto:[EMAIL PROTECTED] On Behalf Of Michal Figurski
> >> Sent: Tuesday, July 22, 2008 9:52 AM
> >> To: r-help@r-project.org
> >> Subject: Re: [R] Coefficients of Logistic Regression from bootstrap - how to get them?
> >>
> >> Dear all,
> >>
> >> I don't want to argue with anybody about words or about what
> >> bootstrap is suitable for - I know too little for that.
> >>
> >> All I need is help to get the *equation coefficients* optimized by
> >> bootstrap - either by one of the functions or by simple median.
> >>
> >> Please help,
> >>
> >> --
> >> Michal J. Figurski
> >> HUP, Pathology & Laboratory Medicine
> >> Xenobiotics Toxicokinetics Research Laboratory
> >> 3400 Spruce St. 7 Maloney
> >> Philadelphia, PA 19104
> >> tel. (215) 662-3413
> >>
> >> Frank E Harrell Jr wrote:
> >>> Michal Figurski wrote:
> >>>> Frank,
> >>>>
> >>>> "How does bootstrap improve on that?"
> >>>>
> >>>> I don't know, but I have an idea.
> >>>> Since the data in my set are just a small sample of a big
> >>>> population, if I use my whole dataset to obtain max likelihood
> >>>> estimates, these estimates may be best for this dataset but far
> >>>> from ideal for the whole population.
> >>> The bootstrap, being a resampling procedure from your sample, has the
> >>> same issues about the population as MLEs.
> >>>
> >>>> I used bootstrap to virtually increase the size of my dataset; it
> >>>> should result in estimates closer to those from the population -
> >>>> isn't that the purpose of bootstrap?
> >>> No
> >>>
> >>>> When I use such median coefficients on another dataset (another
> >>>> sample from the population), the predictions are better than those
> >>>> from the max likelihood estimates. I have already tested that and it
> >>>> worked!
> >>> Then your testing procedure is probably not valid.
> >>>
> >>>> I am not a statistician and I don't have a feel for what
> >>>> "overfitting" is, but it may be just another word for the same idea.
> >>>>
> >>>> Nevertheless, I would still like to know how I can get the
> >>>> coefficients for the model that gives the "nearly unbiased
> >>>> estimates". I greatly appreciate your help.
> >>> More info in my book Regression Modeling Strategies.
> >>>
> >>> Frank
> >>>
> >>>> --
> >>>> Michal J. Figurski
> >>>>
> >>>> Frank E Harrell Jr wrote:
> >>>>> Michal Figurski wrote:
> >>>>>> Hello all,
> >>>>>>
> >>>>>> I am trying to optimize my logistic regression model by using
> >>>>>> bootstrap. I was previously using SAS for this kind of task, but I
> >>>>>> am now switching to R.
> >>>>>>
> >>>>>> My data frame consists of 5 columns and has 109 rows.
> >>>>>> Each row is a single record composed of the following values:
> >>>>>> Subject_name, numeric1, numeric2, numeric3 and outcome (yes or no).
> >>>>>> All three numerics are used to predict outcome using LR.
> >>>>>>
> >>>>>> In SAS I wrote a macro that split the dataset, ran LR on one half
> >>>>>> of the data and made predictions on the other half. Then it
> >>>>>> collected the equation coefficients from each iteration of
> >>>>>> bootstrap. Later I just took the medians of these coefficients from
> >>>>>> all iterations and used them as an optimal model - it really worked
> >>>>>> well!
> >>>>> Why not use maximum likelihood estimation, i.e., the coefficients
> >>>>> from the original fit? How does the bootstrap improve on that?
> >>>>>
> >>>>>> Now I want to do the same in R. I tried to use the 'validate' or
> >>>>>> 'calibrate' functions from package "Design", and I also
> >>>>>> experimented with function 'sm.binomial.bootstrap' from package
> >>>>>> "sm". I also tried the function 'boot' from package "boot", though
> >>>>>> without success - in my case it randomly selected _columns_ from my
> >>>>>> data frame, while I wanted it to select _rows_.
> >>>>> validate and calibrate in Design do resampling on the rows.
> >>>>>
> >>>>> Resampling is mainly used to get a nearly unbiased estimate of the
> >>>>> model performance, i.e., to correct for overfitting.
> >>>>>
> >>>>> Frank Harrell
> >>>>>
> >>>>>> The main point here, though, is the optimized LR equation. I would
> >>>>>> appreciate any help on how to extract the LR equation coefficients
> >>>>>> from any of these bootstrap functions, in the same form as given by
> >>>>>> 'glm' or 'lrm'.
> >>>>>>
> >>>>>> Many thanks in advance!
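A minimal sketch of the row-resampling part of the question, using the boot package. It assumes simulated stand-in data (hypothetical, with the column names described above); coef_stat is an illustrative name. One likely cause of boot() appearing to pick columns is a statistic that indexes the data frame as d[i]: on a data frame that selects columns, whereas d[i, ] selects rows. boot() calls the statistic with the data and a vector of resampled row indices.

```r
# Minimal sketch: bootstrap logistic-regression coefficients with boot(),
# resampling ROWS. 'dat' is SIMULATED stand-in data (hypothetical) using the
# column names described in the thread; substitute your own data frame.
library(boot)

set.seed(1)
n <- 109
dat <- data.frame(
  Subject_name = paste0("S", seq_len(n)),
  numeric1 = rnorm(n),
  numeric2 = rnorm(n),
  numeric3 = rnorm(n)
)
# Hypothetical "true" coefficients, used only to simulate a yes/no outcome
p <- plogis(-0.5 + 1.0 * dat$numeric1 - 0.7 * dat$numeric2 + 0.3 * dat$numeric3)
dat$outcome <- factor(ifelse(runif(n) < p, "yes", "no"), levels = c("no", "yes"))

# boot() calls statistic(data, idx), where idx holds the resampled row numbers.
# Note d[idx, ] selects rows; d[idx] on a data frame would select COLUMNS.
coef_stat <- function(d, idx) {
  fit <- glm(outcome ~ numeric1 + numeric2 + numeric3,
             data = d[idx, ], family = binomial)
  coef(fit)
}

b <- boot(data = dat, statistic = coef_stat, R = 999)

b$t0                          # coefficients of the fit to all rows (the MLEs)
colnames(b$t) <- names(b$t0)  # label the matrix of bootstrap replicates
apply(b$t, 2, sd)             # bootstrap standard errors
apply(b$t, 2, median)         # medians of the replicates (analogous to the SAS macro)
boot.ci(b, type = "perc", index = 2)  # percentile CI for the numeric1 coefficient
```

b$t0 is the same named coefficient vector that coef(glm(...)) returns for the full data, and each row of b$t is one bootstrap replicate. Because every replicate is drawn from the same 109 rows, the medians usually land near b$t0; as Harold and Frank say, the replicates are better used for standard errors and confidence intervals than as replacement point estimates.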
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
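Frank's point that resampling is mainly used to get a nearly unbiased estimate of model performance, and so to correct for overfitting, can be sketched by hand. This is a minimal base-R sketch of Efron's optimism bootstrap, the idea that validate() in Design (later rms) automates; it is not that function's implementation. Assumptions: simulated data, the C statistic as the performance measure, B = 200 replicates, and the hypothetical helper name cstat.

```r
# Minimal sketch of the optimism bootstrap (Efron), the idea behind
# validate(); this is NOT the Design/rms implementation.
# Data are SIMULATED (hypothetical); cstat() is a helper written here.
set.seed(2)
n <- 109
dat <- data.frame(numeric1 = rnorm(n), numeric2 = rnorm(n), numeric3 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + 1.0 * dat$numeric1 -
                               0.7 * dat$numeric2 + 0.3 * dat$numeric3))
form <- y ~ numeric1 + numeric2 + numeric3

# C statistic (area under the ROC curve) via the Mann-Whitney rank formula
cstat <- function(p, y) {
  r <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

full <- glm(form, data = dat, family = binomial)  # the model you keep (MLEs)
apparent <- cstat(fitted(full), dat$y)            # optimistic: scored on its own data

B <- 200
optimism <- numeric(B)
for (k in seq_len(B)) {
  idx <- sample.int(n, n, replace = TRUE)       # resample rows with replacement
  boot_dat <- dat[idx, ]
  fit_k <- glm(form, data = boot_dat, family = binomial)
  c_train <- cstat(fitted(fit_k), boot_dat$y)   # scored on its own bootstrap sample
  c_test <- cstat(predict(fit_k, newdata = dat, type = "response"), dat$y)
  optimism[k] <- c_train - c_test               # how much the training score flatters
}

corrected <- apparent - mean(optimism)
round(c(apparent = apparent, optimism = mean(optimism), corrected = corrected), 3)
```

The coefficients reported are still those of full, the maximum likelihood fit to all rows; the bootstrap only adjusts the estimate of how well that model should perform on new data from the same population.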