Re: [Bioc-devel] BiocParallel

Ryan C. Thompson Thu, 15 Nov 2012 15:06:51 -0800

You can probably parallelize the findOverlaps function, but you'd have to write the code yourself, and that code would be mostly bookkeeping to get the indices right (for example, shifting each chunk's query indices back to positions in the full query). Maybe there's a case for adding a parallelized findOverlaps function to BiocParallel?
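A minimal sketch of that bookkeeping, assuming the query is split into contiguous chunks and each chunk is sent to parallel::mclapply(). The helper name parFindOverlaps and its n.chunks argument are hypothetical, and it returns a plain two-column index matrix rather than a Hits object to keep the example short.

```r
library(GenomicRanges)  # findOverlaps(), queryHits(), subjectHits()
library(parallel)       # mclapply()

## Hypothetical helper, not part of any package. Splits 'query' into
## contiguous chunks, runs findOverlaps() on each chunk in parallel and
## maps the chunk-local query indices back to positions in 'query'.
## Subject indices need no adjustment: every worker sees all of 'subject'.
parFindOverlaps <- function(query, subject, n.chunks = 4L,
                            mc.cores = getOption("mc.cores", 2L)) {
    n <- length(query)
    if (n == 0L) {
        return(cbind(query = integer(0), subject = integer(0)))
    }
    ## each chunk holds a contiguous block of query indices
    idx <- split(seq_len(n), ceiling(seq_len(n) * n.chunks / n))
    hits <- mclapply(idx, function(i) {
        h <- findOverlaps(query[i], subject)
        ## the bookkeeping step: chunk-local index -> global index
        cbind(query = i[queryHits(h)], subject = subjectHits(h))
    }, mc.cores = mc.cores)
    ## chunks come back in query order, so rows stay sorted by query
    do.call(rbind, unname(hits))
}
```

Because mclapply() forks, mc.cores greater than 1 is not supported on Windows; with mc.cores = 1 it simply runs serially.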

You can't parallelize the disjoin operation with something like "mclapply", since it is not a data-parallel operation (each output range can depend on many input ranges). Maybe you could speed things up by writing a recursive version of disjoin that splits its argument into subsets, runs disjoin on each one, and then calls disjoin on the results. However, I'm not sure whether this would actually result in a speedup in practice. More naively, if your arguments are GRanges, you can split by chromosome, run disjoin on each chromosome in parallel, and then merge the results. But that will also put the ranges out of order, in case you care.
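Both ideas can be sketched with parallel::mclapply(). The helper names disjoinByChrom and disjoinTwoLevel are hypothetical; the second implements the recursive idea only one level deep, and neither has been benchmarked, so whether either is faster than a plain disjoin() is an open question.

```r
library(GenomicRanges)  # disjoin(), seqnames(), sort()
library(parallel)       # mclapply()

## Hypothetical helper: ranges on different chromosomes never overlap, so
## disjoining each chromosome separately yields exactly the pieces that
## disjoin(gr) would produce for that chromosome.
disjoinByChrom <- function(gr, mc.cores = getOption("mc.cores", 2L)) {
    if (length(gr) == 0L) return(disjoin(gr))
    per.chrom <- as.list(split(gr, seqnames(gr)))
    pieces <- mclapply(per.chrom, disjoin, mc.cores = mc.cores)
    ## merge; sort() restores genome order if you care about it
    sort(do.call(c, unname(pieces)))
}

## Hypothetical helper for the "recursive" idea, one level deep: disjoin
## arbitrary chunks in parallel, then disjoin the concatenated results.
## Every start and end of a chunk's input remains a piece boundary in that
## chunk's disjoin, so the final serial pass reproduces disjoin(gr).
disjoinTwoLevel <- function(gr, n.chunks = 4L,
                            mc.cores = getOption("mc.cores", 2L)) {
    n <- length(gr)
    if (n == 0L) return(disjoin(gr))
    idx <- split(seq_len(n), ceiling(seq_len(n) * n.chunks / n))
    pieces <- mclapply(idx, function(i) disjoin(gr[i]), mc.cores = mc.cores)
    disjoin(do.call(c, unname(pieces)))
}
```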

You can probably parallelize subsetByOverlaps using pvec, but you might have to ignore the warning about the output length not being the same as the input length, since subsetByOverlaps returns only the query ranges that overlap the subject.
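The base parallel::pvec() expects an ordinary vector, so this sketch swaps in an explicit split over chunks of the query with parallel::mclapply() and concatenates the results. The helper name parSubsetByOverlaps and its n.chunks argument are hypothetical.

```r
library(GenomicRanges)  # subsetByOverlaps()
library(parallel)       # mclapply()

## Hypothetical helper: subsetByOverlaps() over contiguous chunks of
## 'query', run with mclapply() instead of pvec(). Each chunk returns
## fewer ranges than it received (the length mismatch pvec warns about);
## concatenating the chunks in order gives subsetByOverlaps(query, subject).
parSubsetByOverlaps <- function(query, subject, n.chunks = 4L,
                                mc.cores = getOption("mc.cores", 2L)) {
    n <- length(query)
    if (n == 0L) return(query)
    idx <- split(seq_len(n), ceiling(seq_len(n) * n.chunks / n))
    kept <- mclapply(idx, function(i) subsetByOverlaps(query[i], subject),
                     mc.cores = mc.cores)
    do.call(c, unname(kept))
}
```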


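Your for loop can be parallelized as follows:

overlapping.byState <- mclapply(byState, function(x)
    queryHits(findOverlaps(disjoint, x)))

# lengths() replaces the older elementLengths(); later states win, as in the loop
mcols(disjoint)[unlist(overlapping.byState, use.names = FALSE), "state"] <-
    rep(names(overlapping.byState), lengths(overlapping.byState))
mcols(disjoint)[, "state"] <- factor(mcols(disjoint)[, "state"])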

All of the above assumes you are using the pvec and mclapply from the new BiocParallel package, which support operations on non-primitive vector-ish objects.
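In current BiocParallel releases the lapply-style function is bplapply(), which takes a BPPARAM back end. The following sketch wraps the parallelized loop above in a hypothetical helper, labelStates, assuming a forking back end such as MulticoreParam() (or SerialParam() for testing); a socket back end may additionally need GenomicRanges loaded on the workers.

```r
library(GenomicRanges)  # findOverlaps(), queryHits(), mcols()
library(BiocParallel)   # bplapply(), bpparam(), SerialParam()

## Hypothetical helper mirroring the parallelized loop: one findOverlaps()
## per state, run in parallel, then a single vectorized assignment. When a
## range overlaps several states, the state that comes last in
## names(byState) wins, as it does in the original for loop.
labelStates <- function(disjoint, byState, BPPARAM = bpparam()) {
    hits <- bplapply(as.list(byState),
                     function(x) queryHits(findOverlaps(disjoint, x)),
                     BPPARAM = BPPARAM)
    state <- rep("", length(disjoint))
    state[unlist(hits, use.names = FALSE)] <- rep(names(byState), lengths(hits))
    mcols(disjoint)$state <- factor(state)
    disjoint
}
```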


On 11/15/2012 11:02 AM, Tim Triche, Jr. wrote:

As an aside, if I want to do the following:

         ol <- findOverlaps(object, x)
         so <- object[queryHits(ol)]
         sx <- x[subjectHits(ol)]
         disjoint <- subsetByOverlaps(disjoin(c(sx, so, ignore.mcols = T)),
             so)
         mcols(disjoint)[, "state"] <- rep("", length(disjoint))
         byState <- split(so, mcols(so)[, "state"])
         for (state in names(byState)) {
             overlapping <- queryHits(findOverlaps(disjoint, byState[[state]]))
             if (length(overlapping) > 0)
                 mcols(disjoint[overlapping])[, "state"] <- state
         }
         mcols(disjoint)[, "state"] <- as.factor(mcols(disjoint)[, "state"])

often and fast, where 'object' and 'x' are ranges with large numbers of
intervals, is there a clever way to speed it up a lot?


_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
