Re: [R] Coefficients of Logistic Regression from bootstrap - how to get them?

Michal Figurski Tue, 22 Jul 2008 07:27:24 -0700

Dear all,

I don't want to argue with anybody about words or about what the bootstrap is suitable for - I know too little for that.

All I need is help getting the *equation coefficients* optimized by bootstrap, either with one of the functions or by simply taking the median.
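Whatever procedure produces them, the coefficients of a logistic regression are a named vector in the same layout that `coef(glm(...))` returns, and the equation itself is p = 1 / (1 + exp(-X %*% beta)). The following is a minimal sketch of applying such a vector to new data. The column names mirror the original post, but the coefficient values and the data frame `new_dat` are hypothetical and made up for illustration only.

```r
# Hypothetical coefficient vector in the same layout as coef(glm(...)).
# The values are made up for illustration; they are not estimates from any data.
coefs <- c("(Intercept)" = -0.4, numeric1 = 0.9, numeric2 = -0.6, numeric3 = 0.3)

# Hypothetical new data with the same predictor columns as the original data frame
new_dat <- data.frame(
  numeric1 = c(0, 1, -1),
  numeric2 = c(0, 0.5, 1),
  numeric3 = c(0, 2, -2)
)

# Build the design matrix as glm would (intercept column plus the predictors)
X <- model.matrix(~ numeric1 + numeric2 + numeric3, data = new_dat)

# Apply the logistic equation: linear predictor X %*% beta, then the inverse logit
p_yes <- plogis(drop(X %*% coefs[colnames(X)]))
print(round(p_yes, 3))
```

Indexing `coefs[colnames(X)]` matches coefficients to design-matrix columns by name, so the result does not depend on the order in which the coefficients were stored.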


Please help,

--
Michal J. Figurski
HUP, Pathology & Laboratory Medicine
Xenobiotics Toxicokinetics Research Laboratory
3400 Spruce St. 7 Maloney
Philadelphia, PA 19104
tel. (215) 662-3413

Frank E Harrell Jr wrote:

Michal Figurski wrote:
Frank,

"How does bootstrap improve on that?"
I don't know, but I have an idea. Since the data in my set are just a small sample of a big population, if I use my whole dataset to obtain maximum likelihood estimates, these estimates may be best for this dataset but far from ideal for the whole population.
The bootstrap, being a resampling procedure from your sample, has the same issues about the population as MLEs.
I used the bootstrap to virtually increase the size of my dataset; it should result in estimates closer to those from the population. Isn't that the purpose of the bootstrap?
No
When I use such median coefficients on another dataset (another sample from the population), the predictions are better than when using maximum likelihood estimates. I have already tested that and it worked!
Then your testing procedure is probably not valid.
I am not a statistician and I don't have a feel for what "overfitting" is, but it may be just another word for the same idea.
Nevertheless, I would still like to know how I can get the coefficients for the model that gives the "nearly unbiased estimates". I greatly appreciate your help.
More info in my book Regression Modeling Strategies.

Frank

Frank E Harrell Jr wrote:
Michal Figurski wrote:
Hello all,
I am trying to optimize my logistic regression model by using the bootstrap. I was previously using SAS for this kind of task, but I am now switching to R.
My data frame consists of 5 columns and has 109 rows. Each row is a single record composed of the following values: Subject_name, numeric1, numeric2, numeric3 and outcome (yes or no). All three numerics are used to predict the outcome using LR.
In SAS I wrote a macro that split the dataset, ran LR on one half of the data and made predictions on the second half. It then collected the equation coefficients from each iteration of the bootstrap. Later I just took the medians of these coefficients from all iterations and used them as an optimal model - it really worked well!
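For readers following along in R, the SAS procedure just described (random half-splits, fit on one half, collect coefficients, take per-coefficient medians) can be sketched in base R roughly as below. Note that, as described, each split samples rows without replacement, which makes it a repeated subsampling scheme rather than the bootstrap proper (the bootstrap draws n rows with replacement). This is a minimal sketch under assumptions: the data are simulated, the number of splits `B` is arbitrary, and the prediction step on the held-out half is omitted because only the coefficients are collected.

```r
# Simulated stand-in for the 109-row data frame described above (hypothetical data)
set.seed(1)
n <- 109
dat <- data.frame(
  Subject_name = paste0("S", seq_len(n)),
  numeric1 = rnorm(n),
  numeric2 = rnorm(n),
  numeric3 = rnorm(n)
)
lp <- -0.5 + 1.0 * dat$numeric1 - 0.7 * dat$numeric2 + 0.3 * dat$numeric3
# "no" is the first factor level, so glm models the probability of "yes"
dat$outcome <- factor(ifelse(runif(n) < plogis(lp), "yes", "no"), levels = c("no", "yes"))

form <- outcome ~ numeric1 + numeric2 + numeric3
B <- 200  # number of random splits (arbitrary choice)

# Each iteration: draw a random half of the rows (without replacement),
# fit LR on that half, and keep the coefficient vector
split_coefs <- t(replicate(B, {
  train_rows <- sample(n, size = floor(n / 2))
  coef(glm(form, data = dat[train_rows, ], family = binomial))
}))

# Per-coefficient medians across splits, as in the SAS macro
median_coefs <- apply(split_coefs, 2, median)

# Maximum likelihood estimates from a single fit to all rows, for comparison
mle_coefs <- coef(glm(form, data = dat, family = binomial))
print(rbind(split_median = median_coefs, mle = mle_coefs))
```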
Why not use maximum likelihood estimation, i.e., the coefficients from the original fit? How does the bootstrap improve on that?
Now I want to do the same in R. I tried to use the 'validate' or 'calibrate' functions from package "Design", and I also experimented with the function 'sm.binomial.bootstrap' from package "sm". I also tried the function 'boot' from package "boot", though without success - in my case it randomly selected _columns_ from my data frame, while I wanted it to select _rows_.
validate and calibrate in Design do resampling on the rows
Resampling is mainly used to get a nearly unbiased estimate of the model performance, i.e., to correct for overfitting.
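The overfitting correction mentioned above can be illustrated with the optimism bootstrap: refit the model on each bootstrap sample of rows, measure how much better it scores on that sample than on the original data, and subtract the average of that "optimism" from the apparent performance. The following is a minimal base-R sketch of that general idea, not the code behind `validate` in Design. It uses only the C statistic (area under the ROC curve) as the performance index, the data are simulated, and `cstat` is a hypothetical helper written for this example.

```r
# Hypothetical simulated data: three numeric predictors, binary outcome (1 = "yes")
set.seed(3)
n <- 109
dat <- data.frame(numeric1 = rnorm(n), numeric2 = rnorm(n), numeric3 = rnorm(n))
lp <- 0.2 + 0.8 * dat$numeric1 - 0.5 * dat$numeric2 + 0.4 * dat$numeric3
dat$y <- rbinom(n, 1, plogis(lp))
form <- y ~ numeric1 + numeric2 + numeric3

# C statistic via the Mann-Whitney rank formula (ties get mid-ranks)
cstat <- function(p, y) {
  r <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Apparent performance: model fitted and evaluated on the same data
full_fit <- glm(form, data = dat, family = binomial)
apparent <- cstat(fitted(full_fit), dat$y)

# Optimism for each resample: performance on the bootstrap sample it was fitted to
# minus performance of the same fit on the original data
B <- 200
optimism <- replicate(B, {
  idx <- sample(n, replace = TRUE)  # resample ROWS with replacement
  boot_fit <- glm(form, data = dat[idx, ], family = binomial)
  c_boot <- cstat(fitted(boot_fit), dat$y[idx])
  c_orig <- cstat(predict(boot_fit, newdata = dat, type = "response"), dat$y)
  c_boot - c_orig
})

corrected <- apparent - mean(optimism)
print(c(apparent = apparent, optimism = mean(optimism), corrected = corrected))
```

The corrected value estimates how the model fitted to all rows would perform on new data from the same population; the coefficients themselves are still the ordinary maximum likelihood estimates in `full_fit`.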
Frank Harrell
The main point here, though, is the optimized LR equation. I would appreciate any help on how to extract the LR equation coefficients from any of these bootstrap functions, in the same form as given by 'glm' or 'lrm'.
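The `boot` problem described above (columns rather than rows being resampled) is consistent with a common slip, though whether it was the cause here is an assumption. `boot()` from the boot package treats each row of a data frame as one observation and calls the statistic function with the original data plus a vector of resampled row indices, so the function has to subset with `d[i, ]`; writing `d[i]` on a data frame selects columns instead. The sketch below collects the coefficients from every resample in the same named layout as `coef(glm(...))`. The data are simulated, and the names `dat` and `coef_stat` are hypothetical.

```r
library(boot)  # recommended package shipped with standard R installations

# Hypothetical simulated data mirroring the post: three numeric predictors, yes/no outcome
set.seed(2)
n <- 109
dat <- data.frame(numeric1 = rnorm(n), numeric2 = rnorm(n), numeric3 = rnorm(n))
lp <- 0.2 + 0.8 * dat$numeric1 - 0.5 * dat$numeric2 + 0.4 * dat$numeric3
dat$outcome <- factor(ifelse(runif(n) < plogis(lp), "yes", "no"), levels = c("no", "yes"))

# boot() calls this with the original data and a vector of resampled ROW indices.
# d[i, ] selects rows; d[i] on a data frame would select columns instead.
coef_stat <- function(d, i) {
  fit <- glm(outcome ~ numeric1 + numeric2 + numeric3, data = d[i, ], family = binomial)
  coef(fit)
}

b <- boot(dat, statistic = coef_stat, R = 500)

b$t0                          # coefficients from the fit to the original data (the MLEs)
boot_coefs <- b$t             # one row per resample, one column per coefficient
colnames(boot_coefs) <- names(b$t0)
apply(boot_coefs, 2, median)  # per-coefficient medians, as asked for in this thread
apply(boot_coefs, 2, sd)      # bootstrap standard errors, a more conventional use
```

`b$t0` is the ordinary maximum likelihood fit to the full data; whether the medians of `b$t` improve on it is exactly the point disputed in this thread. For interval estimates, `boot.ci(b, type = "perc", index = 2)` gives a percentile interval for the second coefficient.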
Many thanks in advance!


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
