Hi Matthew, First - I fully support Greg Snow proposition. Sampling is the way to go here.
But besides that: 1) Try to avoid using data.frames as much as possible (use vectors and matrixes instead - they are usually faster) 2) Since you are running on a loop, you can try running it in parallel (if you have more then one core on your computer). I recently wrote how to do this on Windows (here: http://www.r-statistics.com/2010/04/parallel-multicore-processing-with-r-on-windows/) , and there are other ways for doing it on other OS. But again - sampling is probably going to be the only real solution... Good luck, Tal ----------------Contact Details:------------------------------------------------------- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) ---------------------------------------------------------------------------------------------- On Tue, Apr 27, 2010 at 1:09 AM, Matthew Finkbeiner < matthew.finkbei...@maccs.mq.edu.au> wrote: > Yes, I suspect that I will end up using a sampling approach, but I'd like > to use an exact test if it's at all feasible. > > Here are two samples of data from 3 subjects: > Sample Subj C1 C2 > 44 1 0.0093 0.0077 > 44 2 0.0089 0.0069 > 44 3 0.051 0.0432 > 44 4 0.014 0.0147 > 44 5 0.0161 0.0117 > 45 1 0.0103 0.0086 > 45 2 0.0099 0.0078 > 45 3 0.0542 0.0458 > 45 4 0.0154 0.0163 > 45 5 0.0175 0.0129 > > > and then here is the script I've pieced together from things I've found on > the web (sorry for not citing the snippets!). any pointers on how to speed > it up would be greatly appreciated. > > #---------------------------------- > # Utility function > # that returns binary representation of 1:(2^n) X SubjN > binary.v <- > function(n) > { > x <- 1:(2^n) > mx <- max(x) > digits <- floor(log2(mx)) > ans <- 0:(digits-1); lx <- length(x) > x <- matrix(rep(x,rep(digits, lx)),ncol=lx) > x <- (x %/% 2^ans) %% 2 > } > > library(plyr) > > > #first some global variables > TotalSubjects <- 5 > TotalSamples <- 2 > StartSample <- 44 > EndSample <- ((StartSample + TotalSamples)-1) > maxTs <- NULL > obsTs <- NULL > > > > > #create index array that drives the permuations for all samples > ind <- binary.v(TotalSubjects) > > #transpose ind so that the first 2^N items correspond to S1, > #the second 2^N correspond to S2 and so on... > transind <- t(ind) > > #get data file that is organized first by sample then by subj (e.g. sample1 > subject1 > # sample1 subject 2 ... sample 1 subject N) > #sampledatafile <- file.choose() > > samples <- read.table(sampledatafile, header=T) > > #this is the progress bar > pb <- txtProgressBar(min = StartSample, max = EndSample, style = 3) > setTxtProgressBar(pb, 1) > > start.t <- proc.time() > > #begin loop that analyzes data sample by sample > for (s in StartSample:EndSample) { > > S <- samples[samples$Sample==s,] #pick up data for current sample > > #reproduce data frame rows once for each permutation to be done > expanddata <- S[rep(1:nrow(S), each = 2^TotalSubjects),] > > > #create new array to hold the flipped (permuted) data > permdata = expanddata > > #permute the data > permdata[transind==1,3] <- expanddata[transind==1,4] #Cnd1 <- Cnd2 > permdata[transind==1,4] <- expanddata[transind==1,3] #Cnd2 <- Cnd1 > > #create permutation # as a factor in dataframe > PermN <- rep(rep(1:2^TotalSubjects, TotalSubjects),2) > > #create Sample# as a factor > Sample <- rep(permdata[,1],2) #Sample# is in the 1st Column > > #create subject IDs as a factor > Subj <- rep(permdata[,2],2) #Subject ID is in the 2nd Column > > #stack the permutated data > StackedPermData <- stack(permdata[,3:4]) > > #bind all the factors together > StackedPermData <- as.data.frame(cbind(Sample, Subj, PermN, > StackedPermData)) > > > #sort by perm > sortedstack <- > as.data.frame(StackedPermData[order(StackedPermData$PermN, > StackedPermData$Sample),]) > > > #clear up some memory > rm(expanddata, permdata, StackedPermData) > > #pull out data 1 perm at a time > res<-ddply(sortedstack, c("Sample", "PermN"), function(.data){ > > # Type combinations by Class > combs<-t(combn(sort(unique(.data[,5])),2)) > > # Applying the t-test for them > aaply(combs,1, function(.r){ > x1<-.data[.data[,5]==.r[1],4] # select first column > x2<-.data[.data[,5]==.r[2],4] # select first column > > tvalue <- t.test(x1,x2, paired = T) > > res <- c(tvalue$statistic,tvalue$parameter,tvalue$p.value) > names(res) <- c('stat','df','pvalue') > res > } > ) > } > ) > > # update progress bar > setTxtProgressBar(pb, s) > > #get max T vals > maxTs <- c(maxTs, tapply (res$stat, list (res$Sample), max)) > > #get observed T vals > obsTs <- c(obsTs, res$stat[length(res$stat)]) > > #here we need to save res to a binary file > > > } > #close out the progress bar > close(pb) > > end.t <- proc.time() - start.t > print(end.t) > > #get cutoffs > #these are the 2-tailed t-vals that maintain experimentwise error at the > 0.05 level > lowerT <- quantile(maxTs, .025) > upperT <- quantile(maxTs, .975) > > > > > > > > > > > > > On 4/27/2010 6:53 AM, Greg Snow wrote: > >> The usual way to speed up permutation testing is to sample from the >> set of possible permutations rather than looking at all possible >> ones. >> >> If you show some code then we may be able to find some inefficiencies >> for you, but there is not general solution, poorly written uses of >> apply will be slower than well written for loops. In some cases >> rewriting critical pieces in C or fortran will help quite a bit, but >> we need to see what you are already doing to know if that will help >> or not. >> >> > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.