You can probably parallelize findOverlaps, but you'd have to write the code yourself, and most of it would be bookkeeping to get the indices right. Maybe there's a case for adding a parallelized findOverlaps function to BiocParallel?
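To make the index bookkeeping concrete, here is a minimal sketch. It assumes a non-empty query and uses mclapply from the base "parallel" package rather than BiocParallel; the helper name chunkedOverlaps and its arguments are made up for illustration.

```r
library(GenomicRanges)
library(parallel)

# Hypothetical helper: run findOverlaps() on contiguous chunks of the query
# and return a two-column matrix of hits indexed against the full query.
chunkedOverlaps <- function(query, subject, n.chunks = 2L, mc.cores = 1L) {
  idx <- seq_along(query)
  # Assign each query index to one of n.chunks contiguous chunks.
  chunks <- split(idx, ceiling(idx * n.chunks / length(idx)))
  hits <- mclapply(chunks, function(i) {
    ol <- findOverlaps(query[i], subject)
    # queryHits(ol) indexes into the chunk, so map it back through 'i'.
    cbind(queryHits = i[queryHits(ol)], subjectHits = subjectHits(ol))
  }, mc.cores = mc.cores)
  do.call(rbind, unname(hits))
}

# Toy data, invented for this example.
query   <- GRanges("chr1", IRanges(start = c(1, 20, 40, 60), width = 10))
subject <- GRanges("chr1", IRanges(start = c(5, 45, 100), width = 10))
hits <- chunkedOverlaps(query, subject, n.chunks = 2L)
```

The result is a plain matrix rather than a Hits object, which keeps the sketch short. Whether this beats a single findOverlaps call depends on how large the inputs are relative to the cost of forking.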

You can't parallelize disjoin with something like mclapply, since it is not a data-parallel operation: the result for one range depends on the other ranges it overlaps. Maybe you could speed things up by writing a recursive version of disjoin that splits its argument into subsets, runs disjoin on each one, and then calls disjoin on the combined results. However, I'm not sure this would give a speedup in practice. More naively, if your arguments are GRanges, you can split by chromosome, run disjoin on each chromosome in parallel, and then merge the results. Note that the merged result may not be in the same order as a single disjoin call, in case you care.
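A rough sketch of the split-by-chromosome idea, assuming GRanges input and mclapply from the base "parallel" package (the toy ranges and variable names are invented for illustration):

```r
library(GenomicRanges)
library(parallel)

# Toy input: overlapping ranges on two chromosomes.
gr <- GRanges(
  seqnames = c("chr1", "chr1", "chr2", "chr2"),
  ranges   = IRanges(start = c(1, 5, 10, 15), end = c(10, 20, 20, 30))
)

# Ranges on different chromosomes never interact in disjoin(), so each
# chromosome can be handled independently. as.list() gives a plain list
# of GRanges that parallel::mclapply can iterate over.
by.chrom <- as.list(split(gr, seqnames(gr)))

# mc.cores > 1 relies on forking, so it must be 1 on Windows.
pieces <- mclapply(by.chrom, disjoin, mc.cores = 1L)

# Merge the per-chromosome results back into a single GRanges.
merged <- unlist(GRangesList(pieces), use.names = FALSE)

# Sort explicitly if the order of the result matters.
merged <- sort(merged)
```

This only helps when the work is spread over several chromosomes; a single very large chromosome still runs serially.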

You can probably parallelize subsetByOverlaps using pvec, but you might have to ignore the warning about the output length not being the same as the input.

Your for loop can be parallelized as follows:

overlapping.byState <- mclapply(byState, function(x) unique(queryHits(findOverlaps(disjoint, x))))
# Assign as character; convert the column to a factor afterwards, as in your loop.
mcols(disjoint)[unlist(overlapping.byState, use.names = FALSE), "state"] <-
    rep(names(overlapping.byState), elementLengths(overlapping.byState))

All of the above assumes you are using the pvec and mclapply from the new BiocParallel package, which support operations on non-primitive vector-like objects.

On 11/15/2012 11:02 AM, Tim Triche, Jr. wrote:
As an aside, if I want to do the following:

         ol <- findOverlaps(object, x)
         so <- object[queryHits(ol)]
         sx <- x[subjectHits(ol)]
         disjoint <- subsetByOverlaps(disjoin(c(sx, so, ignore.mcols = T)),
             so)
         mcols(disjoint)[, "state"] <- rep("", length(disjoint))
         byState <- split(so, mcols(so)[, "state"])
         for (state in names(byState)) {
             overlapping <- queryHits(findOverlaps(disjoint, byState[[state]]))
             if (length(overlapping) > 0)
                 mcols(disjoint[overlapping])[, "state"] <- state
         }
         mcols(disjoint)[, "state"] <- as.factor(mcols(disjoint)[, "state"])

often and fast, where 'object' and 'x' are ranges with large numbers of
intervals, is there a clever way to speed it up a lot?


_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel