Here is a way to speed up your toy example: _________________ a1 <- data.frame(id = 1:6, cat = paste('cat', rep(1:3, c(2,3,1))), st = c(1, 7, 30, 40, 59, 91), en = c(5, 25, 39, 55, 70, 120))
a2 <- data.frame(id = paste('probe', 1:8), cat = paste('cat', rep(1:3, c(2,3,3))), st = c(1, 9, 20, 38, 53, 70, 80, 95), en = c(6, 15, 36, 43, 58, 75, 85, 98)) c1 <- outer(a2$st , a1$en , "<=") c2 <- outer(a2$en , a1$st , ">=") c3 <- outer(a2$cat, a1$cat, "==") a1$coverage <- colSums(c1*c2*c3) __________________ This won't work in one step if a1 has 30000 rows and a2 has 200000, unless you memory size is approximately infinite, so you will need a loop. Suppose you can handle 1000 probes at a time. You might be able to get away with something like this: __________________ chunk <- 1000 ### make as large as possible! checkCoverage <- function(a1, a2) colSums(outer(a2$st , a1$en , "<=") * outer(a2$en , a1$st , ">=") * outer(a2$cat, a1$cat, "==")) coverage <- numeric(N <- nrow(a2)) m2 <- 0 while(m2 < N) { m1 <- m2 + 1 m2 <- min(m2 + chunk, N) coverage[m1:m2] <- checkCoverage(a1, a2[m1:m2, ]) } a1$coverage <- coverage __________________ (Warning: untested code.) Failing that, go to C code and be done with it. BTW, why did you think apply() was going to be useful here? Bill Venables CSIRO Laboratories PO Box 120, Cleveland, 4163 AUSTRALIA Office Phone (email preferred): +61 7 3826 7251 Fax (if absolutely necessary): +61 7 3826 7304 Mobile: +61 4 8819 4402 Home Phone: +61 7 3286 7700 mailto:[EMAIL PROTECTED] http://www.cmis.csiro.au/bill.venables/ -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Anh Tran Sent: Saturday, 2 August 2008 4:04 PM To: rlist Subject: [R] convert for loop into apply() Hi all,I know this topic has came up multiple times, but I've never fully understand the apply() function. Anyway, I'm here asking for your help again to convert this loop to apply(). I have 2 data frames with the following information: a1 is the fragment that is need to be covered, a2 is the probes that cover the specific fragment. I need to count the number of probes cover every given fragment (they need to have the same cat ID to be on the same fragment) a1<-data.frame(id=c(1:6), cat=c('cat 1','cat 1','cat 2','cat 2','cat 2','cat 3'), st=c(1,7,30,40,59,91), en=c(5,25,39,55,70,120)); a2<-data.frame(id=paste('probe',c(1:8)), cat=c('cat 1','cat 1','cat 2','cat 2','cat 2','cat 3','cat 3','cat 3'), st=c(1,9,20,38,53,70,80,95), en=c(6,15,36,43,58,75,85,98)); a1$coverage<-NULL; I came up with this for loop (basically, if a probe starts before the fragment end, and end after a fragment start, it cover that fragment) for (i in 1:length(a1$id)) { a1$coverage[i]<-length(a2[a2$st<=a1$en[i]&a2$en>=a1$st[i]&a2$cat==a1$cat [i],]$id); } > a1$coverage [1] 1 1 2 2 0 1 This loop runs awefully slow when I have 200,000 probes and 30,000 fragments. Is there anyway I can speed this up with apply()? This is the time for my for loop to scan through the first 20 record of my dataset: user system elapsed 2.264 0.501 2.770 I think there is room for improvement here. Any idea? Thanks -- Regards, Anh Tran [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.