Re: [R] Fwd: Strange results : bootstrap CIs

2024-01-14 Thread Duncan Murdoch

On 13/01/2024 8:58 p.m., Rolf Turner wrote:

> On Sat, 13 Jan 2024 17:59:16 -0500
> Duncan Murdoch wrote:
>
>> My guess is that one of the bootstrap samples had a different
>> selection of countries, so factor(Country) had different levels, and
>> that would really mess things up.
>>
>> You'll need to decide how to handle that:  If you are trying to
>> estimate the coefficient for Italy in a sample that contains no data
>> from Italy, what should the coefficient be?
>
> Perhaps NA?  Ben Bolker conjectured that boot() might be able to handle
> this.  Getting the NAs into the coefficients is a bit of a chore, though.  I
> tried:


My question was really intended as a statistical question.  From a
statistical perspective, if I have a sampling scheme that sometimes
generates sample size 0, should my CI be (-Inf, Inf) for a high enough
confidence level?
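To get a feel for how often the sample-size-0 case arises: in an ordinary nonparametric bootstrap each of the n resampled rows misses a given country's n_k rows with probability 1 - n_k/n, so the chance that a resample contains no rows from that country is (1 - n_k/n)^n. The R sketch below computes this and checks it by simulation. The group sizes (8 countries with 2 rows each) are hypothetical, since the thread does not show the data.

```r
# Chance that one bootstrap resample (n rows drawn with replacement)
# contains none of a given country's n_k rows: (1 - n_k/n)^n
p_missing <- function(n_k, n) (1 - n_k / n)^n

# Hypothetical group sizes: 8 countries with 2 rows each (n = 16)
n_k <- rep(2, 8)
n <- sum(n_k)
p_each <- p_missing(n_k, n)   # 0.875^16, about 0.118 per country

# Monte Carlo check: resample row indices and record which countries appear
set.seed(1)
country <- rep(seq_along(n_k), times = n_k)
sim <- replicate(10000, {
  present <- unique(country[sample.int(n, n, replace = TRUE)])
  c(first = !(1 %in% present),              # country 1 absent
    any   = length(present) < length(n_k))  # at least one country absent
})
rowMeans(sim)   # "first" should be close to p_each[1]
```

For large n the per-country probability approaches exp(-n_k), so it is driven mainly by how many rows each country has, not by the total sample size.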


A Bayesian might say that inference should be entirely based on the
prior in the case of no relevant data.  You could get similar numerical
results by adding some fake data to every bootstrap sample, e.g. a
single weighted observation for each country at your prior mean for that
country, with weight chosen to match the strength of the prior.  But
Bayesian methods don't give confidence intervals; they give credible
intervals, and those aren't the same thing even if they are sometimes
numerically similar.
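A minimal sketch of the fake-data suggestion, assuming the same Score, Time and Country columns as Rolf's code. The prior means, the prior weight, and placing each pseudo-observation at the average Time are all hypothetical choices made for illustration; as noted above, intervals computed this way only mimic a Bayesian analysis numerically and are not credible intervals.

```r
library(boot)

# Build a boot() statistic that appends one weighted pseudo-observation per
# country to every resample. prior_mean must be a vector named by country;
# prior_mean, prior_weight and the pseudo-rows' Time value are assumptions.
make_prior_stat <- function(data, prior_mean, prior_weight) {
  countries <- levels(factor(data$Country))
  pseudo_time <- mean(data$Time)  # hypothetical: pseudo-rows at the average Time
  function(d, idx) {
    d <- d[idx, ]
    aug <- data.frame(
      Score   = c(d$Score, unname(prior_mean[countries])),
      Time    = c(d$Time, rep(pseudo_time, length(countries))),
      Country = factor(c(as.character(d$Country), countries), levels = countries),
      w       = c(rep(1, nrow(d)), rep(prior_weight, length(countries)))
    )
    # Every country now has at least one (pseudo) row, so its
    # coefficient is estimable in every resample.
    coef(lm(Score ~ Time + Country, data = aug, weights = w))
  }
}

# Usage on hypothetical toy data (not the poster's data set e)
set.seed(1)
toy <- data.frame(Score   = rnorm(16, 500, 50),
                  Time    = rep(1:2, 8),
                  Country = rep(LETTERS[1:8], each = 2))
stat <- make_prior_stat(toy, prior_mean = setNames(rep(500, 8), LETTERS[1:8]),
                        prior_weight = 0.5)
B <- boot(toy, stat, R = 200)
```

A larger prior_weight pulls coefficients for sparsely sampled countries harder toward their prior means, which is the "strength of the prior" knob described above.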


Duncan Murdoch



> func <- function(data, idx) {
>   # Full-data fit: used only to get the complete set of coefficient names
>   clyde <- coef(lm(Score ~ Time + factor(Country), data = data))
>   # Fit to the bootstrap resample
>   ccc <- coef(lm(Score ~ Time + factor(Country), data = data[idx, ]))
>   # Start from all-NA and fill in whatever the resample could estimate,
>   # so a country absent from the resample gets NA
>   urk <- rep(NA, length(clyde))
>   names(urk) <- names(clyde)
>   urk[names(ccc)] <- ccc
>   urk
> }
>
> It produced a result:
>
> library(boot)
> set.seed(42)
> B <- boot(e, func, R = 1000)
> B
>
> ORDINARY NONPARAMETRIC BOOTSTRAP
>
>
> Call:
> boot(data = e, statistic = func, R = 1000)
>
>
> Bootstrap Statistics :
>       original        bias  std. error
> t1*  609.62500   3.6204052    95.39452
> t2*  -54.81250  -1.6624704    36.32911
> t3*  -41.3      -2.7337992   100.72113
> t4*  -96.0      -1.0995718    99.78864
> t5* -126.0      -0.6548886    63.47076
> t6*  -26.3      -1.6516683    87.80483
> t7*  -15.7      -0.8391170    91.72467
> t8*  -21.7      -5.4544013    83.69211
> t9*   18.3      -0.7711001    85.57278
>
> However, I have no idea whether the result is correct, or even
> meaningful.  I have no idea what I'm doing.  Just hammering and
> hoping. 😊️
>
> cheers,
>
> Rolf
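Whether the output above is meaningful depends partly on the reference level. Matching coefficients by name, as func does, gives NA for a country missing from the resample. But factor(Country) only has the levels present in the resample, so when the first (reference) country is missing, the next country silently becomes the baseline and the intercept and country coefficients in that replicate are measured against a different country. The R sketch below is one possible guard, not something proposed in the thread: it sets those terms to NA in such replicates and computes percentile intervals that skip NA values. The toy data frame is hypothetical because the thread's e is not shown. Skipping NAs means each country's interval uses only the resamples that contain that country, which is one answer to the statistical question above, not a settled one.

```r
library(boot)

# Sketch: like func, but the full-data coefficient names are computed once,
# and replicates lacking the reference country get NA for the intercept
# and country terms (their baseline would otherwise have shifted).
make_func <- function(data) {
  ref  <- levels(factor(data$Country))[1]   # baseline level in the full fit
  full <- names(coef(lm(Score ~ Time + factor(Country), data = data)))
  function(d, idx) {
    d <- d[idx, ]
    out <- setNames(rep(NA_real_, length(full)), full)
    cf <- coef(lm(Score ~ Time + factor(Country), data = d))
    out[names(cf)] <- cf                     # absent countries stay NA
    if (!(ref %in% d$Country)) out[names(out) != "Time"] <- NA
    out
  }
}

# Percentile intervals for each coefficient, ignoring NA replicates
pct_ci <- function(B, level = 0.95) {
  a <- (1 - level) / 2
  t(apply(B$t, 2, quantile, probs = c(a, 1 - a), na.rm = TRUE))
}

# Usage on hypothetical toy data
set.seed(42)
toy <- data.frame(Score   = rnorm(16, 500, 50),
                  Time    = rep(1:2, 8),
                  Country = rep(LETTERS[1:8], each = 2))
f <- make_func(toy)
B <- boot(toy, f, R = 1000)
colSums(is.na(B$t))   # how many replicates lacked each term
pct_ci(B)
```

Computing the full-data names once in make_func(), rather than refitting the full model inside every replicate as func does, yields the same names with less work.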



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fwd: Strange results : bootstrap CIs

2024-01-14 Thread varin sacha via R-help
Dear R-experts,

Thank you all very much for your responses.

Best,



In reply to Duncan Murdoch's message of Sunday, 14 January 2024 at 10:22:12 UTC+1.


[R] Plotting extrapolation with R like AUTOBOX does

2024-01-14 Thread varin sacha via R-help
Dear R-experts,

I am writing to ask whether anyone knows of an R package (or function) that can
plot graphs for extrapolation.

I need to be clear about what extrapolation means to me: using the model at
values of the X variables outside the range of X values that were used to
construct the model and its estimates.

What I am really looking for goes beyond confidence intervals for predictions
from the one estimated model: I want intervals that also reflect uncertainty
over the other models that could have fit the data as well. I know that Clive
Granger worried about this a lot and wrote a few papers on it (he called these
'thick' confidence intervals, as they take model uncertainty into account).
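A minimal base-R sketch of the idea, assuming that a handful of candidate models (here polynomials of degree 1 to 3, an arbitrary choice) can stand in for "the models that could have fit the data as well": compute each model's prediction interval over an extended x range and plot their pointwise envelope. This is only an illustration of widening intervals for model uncertainty; it is not Granger's procedure and not what AUTOBOX does, and the data are simulated.

```r
# Hypothetical data: fit on x in [0, 10], predict out to x = 15
set.seed(123)
train <- data.frame(x = seq(0, 10, length.out = 40))
train$y <- 2 + 0.5 * train$x + rnorm(nrow(train), sd = 1)
new <- data.frame(x = seq(0, 15, length.out = 61))

# Candidate models (degrees 1-3 chosen arbitrarily for illustration)
fits <- lapply(1:3, function(deg) lm(y ~ poly(x, deg, raw = TRUE), data = train))

# 95% prediction intervals from each model at the new x values
pis <- lapply(fits, predict, newdata = new, interval = "prediction", level = 0.95)

# "Thick" band: pointwise envelope of all the models' intervals
lower <- do.call(pmin, lapply(pis, function(p) p[, "lwr"]))
upper <- do.call(pmax, lapply(pis, function(p) p[, "upr"]))

# Plot data, envelope and each model's fit; the dashed line marks the
# edge of the training range, beyond which we are extrapolating
plot(train$x, train$y, xlim = range(new$x), ylim = range(lower, upper, train$y),
     xlab = "x", ylab = "y", pch = 16)
polygon(c(new$x, rev(new$x)), c(lower, rev(upper)),
        col = adjustcolor("grey", alpha.f = 0.4), border = NA)
for (p in pis) lines(new$x, p[, "fit"])
abline(v = max(train$x), lty = 2)
```

Inside the training range the candidate models mostly agree and the band stays close to a single model's interval; beyond the dashed line their fits diverge and the envelope widens.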

I found the AUTOBOX software (https://autobox.com/cms/), which allows multiple
time series to be modeled as possible inputs without requiring time as a
predictor, so it can reproduce ordinary regression.

More precisely, I am looking for an R package (or function) that can produce the
same four graphs (made with AUTOBOX) as in the answer here:
https://stats.stackexchange.com/questions/327454/right-way-to-extrapolate-data

Best,
