I'm trying to implement the two-sample Wald-Wolfowitz runs test.  Daniel
(1990) suggests a way to deal with ties across samples: prepare two ordered
arrangements, one yielding the fewest runs and one yielding the most runs,
and then take the mean of the two run counts.  The code below counts 9 runs
for my example data, where 60 is tied across samples.

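X <- c(58, 62, 55, 60, 60, 67)
n1 <- length(X)
Y <- c(60, 59, 72, 73, 56, 53, 50, 50)
n2 <- length(Y)
data <- c(X, Y)
names(data) <- c(rep("X", n1), rep("Y", n2))  # label each value by its sample
data <- sort(data)                            # pooled and sorted; names move with values
runs <- rle(names(data))                      # runs of identical sample labels
r <- length(runs$lengths)                     # number of runs
r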

Y  Y  Y  X  Y  X  Y  X  X  Y  X  X  Y  Y
50 50 53 55 56 58 59 60 60 60 62 67 72 73 --> r = 9 runs

The other possible orderings are:

Y  Y  Y  X  Y  X  Y  X  Y  X  X  X  Y  Y  --> 9 runs
50 50 53 55 56 58 59 60 60 60 62 67 72 73

Y  Y  Y  X  Y  X  Y  Y  X  X  X  X  Y  Y  --> 7 runs
50 50 53 55 56 58 59 60 60 60 62 67 72 73
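As a sanity check on the three counts above, here is a minimal sketch that recounts the runs with rle() and applies Daniel's suggestion as I understand it (the mean of the smallest and largest run counts). The names `arrangements`, `count_runs` and `r_all` are just mine for illustration.

```r
# Label sequences for the three arrangements listed above
arrangements <- list(
  c("Y","Y","Y","X","Y","X","Y","X","X","Y","X","X","Y","Y"),
  c("Y","Y","Y","X","Y","X","Y","X","Y","X","X","X","Y","Y"),
  c("Y","Y","Y","X","Y","X","Y","Y","X","X","X","X","Y","Y")
)

# Number of runs = number of blocks of identical consecutive labels
count_runs <- function(labels) length(rle(labels)$lengths)

r_all <- sapply(arrangements, count_runs)
r_all               # 9 9 7

# Mean of the fewest and the most runs
mean(range(r_all))  # 8
```

So for this example the value of r to carry into the test would be 8, if I have read the suggestion correctly.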

How do I generate the other possible orderings?  Thus far, I've found a way
to identify cross-sample duplicates...

# find the ties across samples
dd <- data[duplicated(data)]    # all duplicated values in the pooled data
idd <- dd %in% X & dd %in% Y    # TRUE where the value occurs in both X and Y
duplicates <- dd[idd]           # duplicates that are tied across samples
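One brute-force idea, offered only as a sketch and not as an established method: for each value tied across samples, enumerate every distinct X/Y labelling of that tied block with combn(), substitute each labelling into the sorted label vector, and count runs for every resulting arrangement. The helper `block_patterns` and the other names are hypothetical, and I'm assuming exact equality (`==`) is adequate for spotting ties in these data.

```r
X <- c(58, 62, 55, 60, 60, 67)
Y <- c(60, 59, 72, 73, 56, 53, 50, 50)

# Pool the samples and sort values and labels together
vals <- c(X, Y)
labs <- c(rep("X", length(X)), rep("Y", length(Y)))
ord  <- order(vals)
vals <- vals[ord]
labs <- labs[ord]

# Values that occur in both samples (ties across samples)
tied <- intersect(X, Y)

# All distinct label patterns for a tied block with a X's and b Y's:
# choose which of the a + b positions hold an "X"
block_patterns <- function(a, b) {
  lapply(combn(a + b, a, simplify = FALSE), function(pos) {
    p <- rep("Y", a + b)
    p[pos] <- "X"
    p
  })
}

# Start from the sorted labels and expand once per tied value
arrangements <- list(labs)
for (v in tied) {
  idx  <- which(vals == v)
  pats <- block_patterns(sum(labs[idx] == "X"), sum(labs[idx] == "Y"))
  arrangements <- unlist(
    lapply(arrangements, function(l) {
      lapply(pats, function(p) { l[idx] <- p; l })
    }),
    recursive = FALSE
  )
}

# Runs in each arrangement, then the fewest and the most
r_all <- sapply(arrangements, function(l) length(rle(l)$lengths))
r_all         # 9 9 7
range(r_all)  # 7 9
```

The number of arrangements multiplies with every tied block, so this would only be practical when there are few cross-sample ties.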

Thanks!  --Dale


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
