Hi all,

I know this topic has come up multiple times, but I've never fully understood the apply() function.
Anyway, I'm here asking for your help again to convert this loop to apply(). I have 2 data frames with the following information: a1 holds the fragments that need to be covered, and a2 holds the probes that cover them. I need to count the number of probes covering each fragment (a probe must have the same cat ID as the fragment to be on that fragment).

    a1 <- data.frame(id = c(1:6),
                     cat = c('cat 1','cat 1','cat 2','cat 2','cat 2','cat 3'),
                     st = c(1,7,30,40,59,91),
                     en = c(5,25,39,55,70,120));
    a2 <- data.frame(id = paste('probe', c(1:8)),
                     cat = c('cat 1','cat 1','cat 2','cat 2','cat 2','cat 3','cat 3','cat 3'),
                     st = c(1,9,20,38,53,70,80,95),
                     en = c(6,15,36,43,58,75,85,98));
    a1$coverage <- NULL;

I came up with this for loop (basically, if a probe starts before the fragment ends, and ends after the fragment starts, it covers that fragment):

    for (i in 1:length(a1$id)) {
      # count the probes in the same cat that overlap fragment i
      a1$coverage[i] <- length(a2[a2$st <= a1$en[i] & a2$en >= a1$st[i] & a2$cat == a1$cat[i], ]$id);
    }

    > a1$coverage
    [1] 1 1 2 2 0 1

This loop runs awfully slowly when I have 200,000 probes and 30,000 fragments. Is there any way I can speed this up with apply()?

This is the time for my for loop to scan through the first 20 records of my dataset:

       user  system elapsed
      2.264   0.501   2.770

I think there is room for improvement here. Any ideas?

Thanks

--
Regards,
Anh Tran
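
One possible direction, offered as a sketch rather than a definitive answer: apply() and sapply() still evaluate the comparison once per fragment against all probes, so swapping them in for the for loop would probably not change the timing much. The sketch below instead works within each cat, counting the probes that start at or before the fragment end and subtracting the probes that end before the fragment start, using base R's findInterval() on sorted coordinates. It assumes every probe has st <= en. The helper name count_overlaps is hypothetical (not from the post), and the left.open argument needs R 3.3.0 or later.

```r
# Example data from the post.
a1 <- data.frame(id = 1:6,
                 cat = c('cat 1','cat 1','cat 2','cat 2','cat 2','cat 3'),
                 st = c(1,7,30,40,59,91),
                 en = c(5,25,39,55,70,120))
a2 <- data.frame(id = paste('probe', 1:8),
                 cat = c('cat 1','cat 1','cat 2','cat 2','cat 2','cat 3','cat 3','cat 3'),
                 st = c(1,9,20,38,53,70,80,95),
                 en = c(6,15,36,43,58,75,85,98))

# Hypothetical helper: for each fragment, count the probes that overlap it.
# Assumes probe_st <= probe_en for every probe, so a probe that ended before
# the fragment start necessarily started before the fragment end.
count_overlaps <- function(frag_st, frag_en, probe_st, probe_en) {
  # probes with st <= fragment end
  started <- findInterval(frag_en, sort(probe_st))
  # probes with en < fragment start (strictly less, hence left.open = TRUE)
  ended_before <- findInterval(frag_st, sort(probe_en), left.open = TRUE)
  started - ended_before
}

coverage  <- integer(nrow(a1))                 # default 0 for every fragment
frag_rows <- split(seq_len(nrow(a1)), a1$cat)  # fragment row indices per cat
probes    <- split(a2[, c("st", "en")], a2$cat) # probe coordinates per cat

for (k in names(frag_rows)) {
  p <- probes[[k]]
  if (is.null(p)) next                         # no probes in this cat
  i <- frag_rows[[k]]
  coverage[i] <- count_overlaps(a1$st[i], a1$en[i], p$st, p$en)
}
a1$coverage <- coverage
```

The remaining loop runs once per cat rather than once per fragment, and each findInterval() call is vectorized over all fragments of that cat, so the cost should grow roughly with the sorting of the probes rather than with fragments times probes. On the example data this gives the same counts as the original loop.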