Thank you all for your words of wisdom.

I am starting to understand what you mean by bootstrap. Not surprisingly, it seems to be something different from what I mean. The bootstrap is a tool, and I would rather compare it to a hammer than to a gun. People say a hammer is for driving nails; this situation is as if I planned to use it to break rocks.

The key point is that I don't really care about the bias or variance of the mean in the model. These things are useful to statisticians; regular people (like me, also a chemist) do not understand them and have little use for them (well, now I somewhat understand). My goal is very practical: I need an equation that can predict a patient's outcome, based on some data, with maximum reliability and accuracy.

I have found from the mentioned paper (and from my own experience) that re-sampling the dataset and running the regression on it multiple times does improve predictions. There is proof of that in the paper, on page 1502, and to me it is rather stunning proof: compare 56% to 82% of correctly predicted values (where "correct" means within 15% of the original value).

I can understand that this is somewhat new to many of you, and that some tried to discourage me from this approach (shooting my foot). The concept was devised, I believe, by Mr Michael Hale, a respectable biostatistician. It utilises the bootstrap concept of resampling, though after the recent discussion I think it should be called by another name.

In addition to better predictive performance, with this concept I also get a second dataset on each iteration that can be used for validation of the model. In this approach the validation data are accumulated throughout the bootstrap and then used at the end to calculate log residuals, using the equation with median coefficients. I am sure you can question that in many ways, but to me this is as good as it gets.
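If it helps to make the procedure concrete, here is a rough sketch of how I understand it, on made-up data. The model (y ~ x), the number of iterations, and all variable names are my own assumptions for illustration, not those from the paper:

```r
# Sketch of the resampling procedure on made-up data (all names
# and the model are illustrative assumptions, not from the paper).
set.seed(1)
n <- 50
x <- runif(n, 1, 10)
y <- 2 + 3 * x + rnorm(n)           # hypothetical outcome, positive here
dat <- data.frame(x = x, y = y)

B <- 1000                           # number of resampling iterations
coefs <- matrix(NA, nrow = B, ncol = 2)
val.idx <- integer(0)               # accumulated validation indices

for (b in 1:B) {
  in.bag  <- sample(n, n, replace = TRUE)
  out.bag <- setdiff(1:n, in.bag)   # the "second dataset" of this iteration
  coefs[b, ] <- coef(lm(y ~ x, data = dat[in.bag, ]))
  val.idx <- c(val.idx, out.bag)    # accumulate validation data
}

# Median coefficients define the final equation...
med.coef <- apply(coefs, 2, median)

# ...and the accumulated validation data give the log residuals.
pred <- med.coef[1] + med.coef[2] * dat$x[val.idx]
log.resid <- log(dat$y[val.idx] / pred)
summary(log.resid)
```

On this toy data the median coefficients land close to the true values, but of course that is no proof for real data; the real dataset would show whether it holds.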

To be more practical, I will ask the authors of the paper whether I can post their original dataset in this forum (I have it somewhere), if you think it's interesting enough. Then any one of you could use it, follow the procedure, and criticise it, if you wish.

--
Michal J. Figurski
HUP, Pathology & Laboratory Medicine
Xenobiotics Toxicokinetics Research Laboratory
3400 Spruce St. 7 Maloney
Philadelphia, PA 19104
tel. (215) 662-3413

S Ellison wrote:
jeez, but you've kicked up a storm!

My penn'orth on the bootstrap; and since I'm a chemist, you can ignore it at will.

The bootstrap starts with your data and the model you developed with
it. Resampling gives a fair idea of what the variance _around your
current estimate_ is. But it cannot tell you how biased you are or
improve your estimate, because there is no more information in your
data.
Toy example. Let's say I get some results from some measurement
procedure, like this.

set.seed(408) #so we get the same random sample (!)

y<-rnorm(12,5) #OK, not a very convincing measurement, but....

#Now let's add a bit of bias
y<-y+3

mean(y) #... is my (biased) estimate of the mean value.

#Now let's pretend I don't know the true answer OR the bias, which is
#what happens in the real world, and try bootstrapping. Let's get a
#rather generous 10000 resamples from my data:

m<-matrix(sample(y, length(y)*10000, replace=T), ncol=length(y))
#This gives me a matrix with 10000 rows, each of which is a resample
#of my 12 data.
#And now we can calculate 10000 bootstrapped means in one shot:
bs.mean<-apply(m,1,mean) #which applies 'mean' to each row.

#We hope the variance of these things is about 1/12, 'cos we got y
#from a normal distribution with var 1 and we had 12 of them. Let's see...
var(bs.mean)

#which should resemble
1/12

#and does... roughly.
#And for interest, compare with what we get direct from the data:
var(y)/12
#which in this case was slightly further from the 'true' variance.
#It won't always be, though; that depends on the data.

#Anyway, the bootstrap variance looks about right. So ... on to bias

#Now, where would we expect the bootstrapped mean value to be?
#At the true value, or where we started?
mean(bs.mean)

#Oh dear. It's still biased. And it looks very much like the mean of y.
#It's clearly told us nothing about the true mean.

#Bottom line: All you have is your data. Bootstrapping uses your data.
#Therefore, bootstrapping can tell you no more than you can get from
#your data.
#But it's still useful if you have some rather more complicated
#statistic derived from a non-linear fit, because it lets you get
#some idea of the variance.
#But not the bias.

This may be why some folk felt that your solution as worded (an
ever-present peril, wording) was not an answer to the right question.
The fitting procedure already gives you the 'best estimate' (where
'best' means max likelihood, this time), and bootstrapping really cannot
improve on that. It can only start at your current 'best' and move away
from it in a random direction.  That can't possibly improve the
estimated coefficients. And the more you bootstrap, the closer the mean
gets to where you started. So "how does the bootstrap improve on that?" was a very pertinent
question - to which the correct answer was "it can't - but it can
suggest what the variance might be".
As to whether you wanted advice on whether to bootstrap or not; well,
it's an open forum and aid is voluntary. R help always generates at
least three replies, one of which is "tell me more about the problem",
one of which is "why are you doing it that way?" and one of which is
"that is probably not the problem you should be trying to solve". On a
good day you also get the one that goes "this might solve it".

Incidentally, the boot package and the simpleboot package both do
bootstrapping; they might solve your problem.
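For instance, a minimal sketch of the toy example above using the boot package might look like this (boot's statistic function takes the data and a vector of resampled indices):

```r
library(boot)

set.seed(408)
y <- rnorm(12, 5) + 3               # the same biased toy data as above

# statistic: the mean of the resampled data
b <- boot(y, statistic = function(d, i) mean(d[i]), R = 10000)
b                                   # prints original estimate, bias, std. error
var(b$t[, 1])                       # bootstrap variance of the mean, roughly 1/12
```

As before, the bootstrap mean stays near mean(y); only the variance estimate is informative.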
Then there's advice. Folk obviously can't impose unless you let them -
but they do know a lot about statistics, and if they say something is
silly, it is at least worth finding out why, so that you (and I, for
that matter) can better defend our silliness.

Also, of course, if you see someone trying to do something silly - e.g.
pull the trigger while the gun is pointed at their foot - would you
really give them the instruction they asked for on how to get the
safety catch off? Or tell them that what they are doing is silly? (Me,
well, it's their foot, but if I help them, they may sue me later.)


If any of the above helps without sounding horribly patronising, I win.
If not, well, you have another email to burn!

happy booting

Steve Ellison

Michal Figurski <[EMAIL PROTECTED]> 22/07/2008 20:42 >>>



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
