It seems to me that your two approaches are calculating CIs for
different quantities. The bootstrap methods are calculating a CI for
the difference in medians, while the Wilcoxon approach is calculating a
CI for the median of the differences. If this were the mean, those
would be the same, but not for the median:
A =c(619, 600, 490, 1076, 654, 955, 563, 955, 827, 873, 1253)
B =c(346, 507, 598, 228, 576, 338, 1153, 354, 560, 517, 381)
> median(A)-median(B)
[1] 320
> median(A-B)
[1] 273
--
Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky
On 02/18/2012 05:05 AM, Vittorio Colagrande wrote:
Dear R-group,
I have run into a problem in estimating confidence intervals for the median
difference.
I want to establish a confidence interval at (1- alpha) level for the
difference between the
medians of two indipendent samples (size n and m), by using the Wilcoxon
distribution
or with bootstrap methods.
First method, we consider the z matrix of d=n∙m differences of the first and
second sample
data and we order these differences in a y vector. By the Wilcoxon
distribuition (W) we
determine the q quantile such that Prob(W<q)= alpha/2, inf and sup of
confidence interval are
respectively the q-th and (d-q+1)-th elements of the y vector, while we obtain
the difference
median by the y distribution. This method is also used to establish the CI by
wilcox.test.
Example. Two indipendent sample A and B (n=11, m=13) of CD4 count cells
(T-helper cells):
A =c(619, 600, 490, 1076, 654, 955, 563, 955, 827, 873, 1253)
B =c(346, 507, 598, 228, 576, 338, 1153, 354, 560, 517, 381, 415, 626)
1) CI 95% by matrix z and y vector:
n=length(A)
m=length(B)
d=n*m
z=matrix(0,m,n)
for(j in seq_len(n))
z[, j]=A[j] - B
y=sort(as.vector(z))
q=qwilcox(0.05/2,n,m,lower.tail = TRUE, log.p = FALSE)
inf=y[q]
sup=y[d-q+1]
med=median(y)
results: inf = 100, sup = 516 and med = 300.
2) CI 95% by wilcox.test:
I=wilcox.test(A,B,conf.lev=0.95,conf.int=TRUE,exact=F,correct=T)
inf=I$conf.int[1]
sup=I$conf.int[2].
results: inf = 99.9, sup = 516.
Second method, bootstrap each sample separately, creating the sampling
distribution for
each median. Then calculate the difference between the two medians, and create
the
sampling distribution of those differences. This is the sampling distribution
we care about.
Once we have that distribution we can establish a confidence interval. Some CI
95%,
with reference to the CD4 example, one given below.
1) First procedure (package boot):
library(boot)
n=length(A)
m=length(B)
y=c(A,B)
camp=data.frame(group=rep(c(1,2),c(n,m)),y)
dif.median=function(data,i) {
d=data[i,]
n1=n+1
m1=n+m
median(d$y[1:n])-median(d$y[n1:m1]) }
dif.boot=boot(camp,dif.median,R=10000, strata=camp$group)
boot.ci(dif.boot, conf =0.95, type="bca")
results: inf = 59, sup = 574.
2) Second procedure (package pairwiseCI):
library(pairwiseCI)
MedDiff=Median.diff(A, B, conf.level=0.95, alternative="two.sided",R=10000)
MedDiff$conf.int
MedDiff$estimate
results: inf = 56, sup = 574, median=320
3) Third procedure (stratified bootstrap):
dif<- numeric(10000)
for(i in seq_len(10000))
dif[i]<- median(sample(A, replace=TRUE)) - median(sample(B,
replace=TRUE))
quantile(dif,prob=c(0.5,(1-0.95)/2,(1-(1-0.95)/2)))
results: inf = 56, sup = 574, median = 313.
4) Fourth procedure (package simpleboot)
library(simpleboot)
boot_diff<- two.boot(A, B, median, R = 10000)
boot.ci(boot_diff,conf=0.95,type="bca")
results: inf = 59, sup = 574.
The bootstrap procedures do get the same results, but the confidence intervals
are
significantly different from those obtained using the method that refers to the
Wilcoxon
distribution.
Problem: does this difference depend on really "different" methods
or on incorrect implementation of the bootstrap technique?
I will greatly appreciate any clarification you could provide.
Best regards.
Vittorio Colagrande
[[alternative HTML version deleted]]
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.