Hi all, I know this topic has come up multiple times, but I've never fully
understood the apply() function.

Anyway, I'm here asking for your help again to convert this loop to apply().

I have 2 data frames with the following information: a1 holds the fragments
that need to be covered, and a2 holds the probes that cover those fragments.

I need to count the number of probes that cover each fragment (a probe must
have the same cat ID as a fragment to be on that fragment).

a1 <- data.frame(id = 1:6,
                 cat = c('cat 1', 'cat 1', 'cat 2', 'cat 2', 'cat 2', 'cat 3'),
                 st = c(1, 7, 30, 40, 59, 91),
                 en = c(5, 25, 39, 55, 70, 120))
a2 <- data.frame(id = paste('probe', 1:8),
                 cat = c('cat 1', 'cat 1', 'cat 2', 'cat 2', 'cat 2',
                         'cat 3', 'cat 3', 'cat 3'),
                 st = c(1, 9, 20, 38, 53, 70, 80, 95),
                 en = c(6, 15, 36, 43, 58, 75, 85, 98))
a1$coverage <- NULL

I came up with this for loop (basically, if a probe starts at or before the
fragment's end and ends at or after the fragment's start, it covers that
fragment):

for (i in 1:length(a1$id))
{
  # count the probes in the same cat that overlap fragment i
  a1$coverage[i] <- length(a2[a2$st <= a1$en[i] &
                              a2$en >= a1$st[i] &
                              a2$cat == a1$cat[i], ]$id)
}

> a1$coverage
[1] 1 1 2 2 0 1
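
A direct translation of the loop into the apply family might look like the
sketch below. It repeats the example data so it runs on its own, and
count_cover is a hypothetical helper name, not a function from any package.
It counts matches with sum() on the logical mask instead of subsetting a2,
but it still scans every probe for every fragment, so treat it as a
restatement of the loop rather than a guaranteed speed-up.

```r
# Example data from this post, repeated so the snippet is self-contained.
a1 <- data.frame(id = 1:6,
                 cat = c("cat 1", "cat 1", "cat 2", "cat 2", "cat 2", "cat 3"),
                 st = c(1, 7, 30, 40, 59, 91),
                 en = c(5, 25, 39, 55, 70, 120),
                 stringsAsFactors = FALSE)
a2 <- data.frame(id = paste("probe", 1:8),
                 cat = c("cat 1", "cat 1", "cat 2", "cat 2", "cat 2",
                         "cat 3", "cat 3", "cat 3"),
                 st = c(1, 9, 20, 38, 53, 70, 80, 95),
                 en = c(6, 15, 36, 43, 58, 75, 85, 98),
                 stringsAsFactors = FALSE)

# count_cover() is a hypothetical helper: the number of rows in `probes`
# that share fragment i's cat and overlap its [st, en] interval.
count_cover <- function(i, frags, probes) {
  sum(probes$st <= frags$en[i] &
      probes$en >= frags$st[i] &
      probes$cat == frags$cat[i])
}

# vapply() calls count_cover() once per fragment and checks that each
# result is a single integer.
a1$coverage <- vapply(seq_len(nrow(a1)), count_cover, integer(1),
                      frags = a1, probes = a2)
```

On the example data this should give the same a1$coverage as the loop.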


This loop runs awfully slowly when I have 200,000 probes and 30,000
fragments. Is there any way I can speed this up with apply()?

This is the time my for loop takes to scan through the first 20 records of my
dataset:
   user  system elapsed
  2.264   0.501   2.770

I think there is room for improvement here. Any ideas?

Thanks
-- 
Regards,
Anh Tran
