This sounds very much like a recursive problem: something like this seems to get the gist of what you want.
DataSplits <- function(Data, alpha = 0.05) { DataSplitsCore <- function(Data, alpha, level) { tt <- t.test(Data[,1],Data[,2]) print(tt) if (tt$p.value > alpha) { print(paste("Stopped at level", level)) return(invisible(TRUE)) } else { nr = floor(NROW(Data)/2) if (nr == 1) {print(paste("Reached Samples of Size 1")); stop} d1 = DataSplitsCore(Data[(1:nr),], alpha = alpha, level = level + 1) if (d1) return(invisible(TRUE)) d2 = DataSplitsCore(Data[-(1:nr),], alpha = alpha, level = level +1) if (d2) return(invisible(TRUE)) return(invisible(FALSE)) } } DataSplitsCore(Data, alpha = alpha, level = 1) } Your description wasn't the clearest about what to do when the data sizes didn't match, but this should give you a start. Let me know if this doesn't do as desired and I can help tweak it. Hope this can be of help, Michael Weylandt PS -- You might as well use R's built in t.test function. On Thu, Aug 11, 2011 at 5:17 AM, Marina de Wolff <marinadewo...@hotmail.com>wrote: > > I want to implement the following algorithm in R: > > I want to split my data, use a t test to compare both means of the groups > to see if they significantly differ from each other. If this is a yes (p < > alpha) I want to split again (into 4 groups) and do the same procedure > twice, and stop otherwise (here the problem arises). As a final result I > would have different groups of data. > > I made some code where the data is splitted, until no splitting is > possible. So for 16 datapoints, we can split 4 times with a final result of > 16 groups (p is NA for the 4th split since sd cannot be calculated..). > > The code calculated all p values, but I don't want this. I want it to stop > when p > alpha. I tried while, but didn't succeed. > > I hope someone can help me to acchieve my goal. > > This is what I tried so far with test data: > > a = rnorm(9,0,0.1) > b = rnorm(7,1,0.1) > data = c(a,b) > plot(data) > > # Want to calculate max of groups/split for the data > d = seq(1,100,1) > n = 2^d > m <- which(n <=length(data)) > n = n[m[1]:m[length(m)]] > > # All groups > i=0 > j=0 > dx = 0 > dy = > for (i in 1:length(n)){ > split <- length(data)/(n[i]) > for (j in 1:(n[i]/2)){ > x = data[(1 + (j-1)*(2*split)):(round(split) + (j-1)*(2*split))] > dx = cbind(dx,x) > y = data[((round(split)+1) + (j-1)*(2*split)):(2*j*split)] > dy = cbind(dy,y) > }} > > dx = dx[,2:dim(dx)[2]] > dy = dy[,2:dim(dy)[2]] > > k=0 > meanx=0 > meany=0 > sdx=0 > sdy=0 > nx=0 > ny=0 > for (k in 1:dim(dx)[2]) { > meanx[k] = mean(unique(dx[,k])) > meany[k] = mean(unique(dy[,k])) > sdx[k] = sd(unique(dx[,k])) > sdy[k] = sd(unique(dy[,k])) > nx[k] = length(unique(dx[,k])) > ny[k] = length(unique(dy[,k])) > } > > t = (meanx-meany)/sqrt((sdx^2/nx) + (sdy^2/ny)) > df = ((sdx^2/nx) + (sdy^2/ny))^2/((sdx^2/nx)^2/(nx-1) + > (sdy^2/ny)^2/(ny-1)) > p = 2*pt(-abs(t),df=df) > alpha = 0.05 > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.