Hi Michal,

This paper by John Fox may help you pin down what you are looking for and carry out your analyses:

http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-bootstrapping.pdf

Nael
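For the mechanics asked about in the quoted thread (resampling rows and collecting the logistic regression coefficients from each replicate), here is a minimal sketch using 'boot' and 'glm'. The data are simulated stand-ins for the 109-row data frame described below; the column names, the true coefficients, the seed and R = 500 are all assumptions made for illustration. Note that data[idx, ] selects rows, whereas data[idx] on a data frame selects columns, which may be one explanation for the column resampling reported below. As Frank points out in the thread, the median of the bootstrap coefficients is not expected to improve on the maximum likelihood fit, so this shows only how to extract the numbers.

```r
library(boot)  # recommended package, shipped with standard R installations

set.seed(1)

# Simulated stand-in for the data described in the thread: 109 rows, three
# numeric predictors and a binary outcome. All names and values are hypothetical.
n <- 109
d <- data.frame(numeric1 = rnorm(n), numeric2 = rnorm(n), numeric3 = rnorm(n))
d$outcome <- rbinom(n, 1, plogis(-0.5 + 0.8 * d$numeric1 -
                                   0.6 * d$numeric2 + 0.4 * d$numeric3))

# Maximum likelihood fit on the full sample
fit <- glm(outcome ~ numeric1 + numeric2 + numeric3,
           data = d, family = binomial)

# Statistic for boot(): 'idx' holds the resampled ROW indices, so the
# data frame must be indexed as data[idx, ] (rows), not data[idx] (columns).
coef_fun <- function(data, idx) {
  coef(glm(outcome ~ numeric1 + numeric2 + numeric3,
           data = data[idx, ], family = binomial))
}

b <- boot(d, coef_fun, R = 500)

# b$t has one row per bootstrap replicate and one column per coefficient
boot_median <- apply(b$t, 2, median)
names(boot_median) <- names(coef(fit))

# Side-by-side comparison, in the same order as coef(fit)
print(cbind(mle = coef(fit), boot_median = boot_median))
```

The columns of b$t are in the same order as coef(fit), so quantiles or standard deviations of each coefficient can be computed from them in the same way.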
On Tue, Jul 22, 2008 at 3:51 PM, Michal Figurski <[EMAIL PROTECTED]> wrote:

> Dear all,
>
> I don't want to argue with anybody about words or about what bootstrap is
> suitable for - I know too little for that.
>
> All I need is help to get the *equation coefficients* optimized by
> bootstrap - either by one of the functions or by simple median.
>
> Please help,
>
> --
> Michal J. Figurski
> HUP, Pathology & Laboratory Medicine
> Xenobiotics Toxicokinetics Research Laboratory
> 3400 Spruce St. 7 Maloney
> Philadelphia, PA 19104
> tel. (215) 662-3413
>
> Frank E Harrell Jr wrote:
>
>> Michal Figurski wrote:
>>
>>> Frank,
>>>
>>> "How does bootstrap improve on that?"
>>>
>>> I don't know, but I have an idea. Since the data in my set are just a
>>> small sample of a big population, then if I use my whole dataset to obtain
>>> max likelihood estimates, these estimates may be best for this dataset, but
>>> far from ideal for the whole population.
>>>
>>
>> The bootstrap, being a resampling procedure from your sample, has the same
>> issues about the population as MLEs.
>>
>>
>>> I used bootstrap to virtually increase the size of my dataset, it should
>>> result in estimates more close to that from the population - isn't it the
>>> purpose of bootstrap?
>>>
>>
>> No
>>
>>
>>> When I use such median coefficients on another dataset (another sample
>>> from population), the predictions are better, than using max likelihood
>>> estimates. I have already tested that and it worked!
>>>
>>
>> Then your testing procedure is probably not valid.
>>
>>
>>> I am not a statistician and I don't feel what "overfitting" is, but it
>>> may be just another word for the same idea.
>>>
>>> Nevertheless, I would still like to know how can I get the coefficients
>>> for the model that gives the "nearly unbiased estimates". I greatly
>>> appreciate your help.
>>>
>>
>> More info in my book Regression Modeling Strategies.
>>
>> Frank
>>
>>
>>> --
>>> Michal J. Figurski
>>> HUP, Pathology & Laboratory Medicine
>>> Xenobiotics Toxicokinetics Research Laboratory
>>> 3400 Spruce St. 7 Maloney
>>> Philadelphia, PA 19104
>>> tel. (215) 662-3413
>>>
>>> Frank E Harrell Jr wrote:
>>>
>>>> Michal Figurski wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> I am trying to optimize my logistic regression model by using
>>>>> bootstrap. I was previously using SAS for this kind of tasks, but I am
>>>>> now switching to R.
>>>>>
>>>>> My data frame consists of 5 columns and has 109 rows. Each row is a
>>>>> single record composed of the following values: Subject_name, numeric1,
>>>>> numeric2, numeric3 and outcome (yes or no). All three numerics are used
>>>>> to predict outcome using LR.
>>>>>
>>>>> In SAS I have written a macro, that was splitting the dataset, running
>>>>> LR on one half of data and making predictions on second half. Then it
>>>>> was collecting the equation coefficients from each iteration of
>>>>> bootstrap. Later I was just taking medians of these coefficients from
>>>>> all iterations, and used them as an optimal model - it really worked
>>>>> well!
>>>>>
>>>>
>>>> Why not use maximum likelihood estimation, i.e., the coefficients from
>>>> the original fit. How does the bootstrap improve on that?
>>>>
>>>>
>>>>> Now I want to do the same in R. I tried to use the 'validate' or
>>>>> 'calibrate' functions from package "Design", and I also experimented
>>>>> with function 'sm.binomial.bootstrap' from package "sm". I tried also
>>>>> the function 'boot' from package "boot", though without success - in my
>>>>> case it randomly selected _columns_ from my data frame, while I wanted
>>>>> it to select _rows_.
>>>>>
>>>>
>>>> validate and calibrate in Design do resampling on the rows
>>>>
>>>> Resampling is mainly used to get a nearly unbiased estimate of the model
>>>> performance, i.e., to correct for overfitting.
>>>>
>>>> Frank Harrell
>>>>
>>>>
>>>>> Though the main point here is the optimized LR equation. I would
>>>>> appreciate any help on how to extract the LR equation coefficients from
>>>>> any of these bootstrap functions, in the same form as given by 'glm' or
>>>>> 'lrm'.
>>>>>
>>>>> Many thanks in advance!
>>>>>
>>>>
>>>
>>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.