A little late, but I can report that this speeds up my "many seqlevels" problem by roughly three orders of magnitude.

library(IRanges, lib.loc = "library")
library(GenomicRanges, lib.loc = "library")
library(BSgenome.Amellifera.BeeBase.assembly4)

Un <- Amellifera$GroupUn
gr <- GRanges(seqnames = names(Un),
              ranges = IRanges(start = 1, width = width(Un)))
## gr has a length of 9244, but each interval is in a new seqname.
## this makes traditional findOverlaps extremely slow

system.time({
    findOverlaps(gr, gr)
})
## roughly 240 secs

system.time({
    grF <- as(gr, "GIntervalTree")
})
system.time({
    findOverlaps(grF, grF)
})
## roughly 0.1 secs

## speedup (for this example): roughly 2400-fold

Kasper

On Thu, May 30, 2013 at 6:51 AM, Hector Corrada Bravo <hcorr...@umiacs.umd.edu> wrote:

> Great. I already have unit tests there for IntervalForest and
> GIntervalTree.
> Hector
>
> On Wed, May 29, 2013 at 8:31 PM, Vincent Carey <st...@channing.harvard.edu> wrote:
>
> > Fine with me, as long as he is acquainted with the build/test before commit
> > practices that we are supposed to follow. Breaking IRanges can have severe
> > repercussions.
> >
> > On Wed, May 29, 2013 at 6:36 PM, Michael Lawrence <lawrence.mich...@gene.com> wrote:
> >
> > > Would it be feasible/acceptable to give Hector permission to commit?
> > >
> > > Michael
> > >
> > > On Wed, May 29, 2013 at 2:12 PM, Hector Corrada Bravo <hcorr...@gmail.com> wrote:
> > >
> > > > That's great! There's some cleaning up to do there; how should we do
> > > > this post-merge?
> > > >
> > > > On Wed, May 29, 2013 at 4:19 PM, Valerie Obenchain <voben...@fhcrc.org> wrote:
> > > >
> > > >> Hi Hector, Michael,
> > > >>
> > > >> This sounds great. Bringing these into svn is fine with us. Michael, do
> > > >> you want to merge these in?
> > > >>
> > > >> Val
> > > >>
> > > >> On 05/24/2013 07:30 AM, Hector Corrada Bravo wrote:
> > > >> > Thanks Michael,
> > > >> >
> > > >> > It has made a significant difference for our visualization project.
> > > >> > I would like to merge this into svn asap. Can I get a ruling from the
> > > >> > rest of the core group? Please let me know if/when/how to proceed.
> > > >> >
> > > >> > Cheers,
> > > >> > Hector
> > > >> >
> > > >> > On Wed, May 22, 2013 at 1:00 PM, Michael Lawrence <lawrence.mich...@gene.com> wrote:
> > > >> >
> > > >> >> *Added bioc-devel; hope you don't mind*
> > > >> >>
> > > >> >> Hector,
> > > >> >>
> > > >> >> This is great stuff. The overall design is on the right track. As you
> > > >> >> said, there's a bit of cleaning to do, but I think we should merge this
> > > >> >> into svn and work the rest out from there. This will really benefit
> > > >> >> performance, especially for visualization. Of course, I can't speak for
> > > >> >> the others.
> > > >> >>
> > > >> >> Michael
> > > >> >>
> > > >> >> On Tue, May 21, 2013 at 11:52 AM, Hector Corrada Bravo <hcorr...@umiacs.umd.edu> wrote:
> > > >> >>
> > > >> >>> Since the semester is over I finally finished this...
> > > >> >>>
> > > >> >>> Recall that I wanted a persistent set of IntervalTrees for GRanges
> > > >> >>> objects for repeated querying. (The application is this:
> > > >> >>> http://epiviz.cbcb.umd.edu/help/?page_id=62 which I hope to get out
> > > >> >>> soon). Folding this into IRanges and GenomicRanges would make our life
> > > >> >>> easier come installation time.
> > > >> >>>
> > > >> >>> I've implemented class 'IntervalForest' within IRanges following
> > > >> >>> Michael's suggestion of storing this as an array of rbTree on the C side.
> > > >> >>> I've implemented findOverlaps that operates with this array in C. There is
> > > >> >>> code duplication in IntervalTree.c that could be reduced but that's if this
> > > >> >>> makes it into the package.
> > > >> >>>
> > > >> >>> I've also implemented a 'GIntervalTree' that uses 'IntervalForest'
> > > >> >>> underneath. findOverlaps-GenomicRanges-GIntervalTree-method is implemented
> > > >> >>> for this class. I didn't touch the existing
> > > >> >>> findOverlaps-GenomicRanges-GenomicRanges-method.
> > > >> >>>
> > > >> >>> You can pull these here:
> > > >> >>> http://github.com/hcorrada/IRanges
> > > >> >>> http://github.com/hcorrada/GenomicRanges
> > > >> >>>
> > > >> >>> These track the devel branch of the two packages. Let me know the best
> > > >> >>> way to propagate to svn if you guys want this. It needs documentation, but
> > > >> >>> I'll add that once implementation is settled.
> > > >> >>>
> > > >> >>> Kasper, I'm not sure if this would help with the 'too many seqlevels'
> > > >> >>> problem but I'd be curious to know if you try it.
> > > >> >>>
> > > >> >>> Cheers,
> > > >> >>> Hector

[[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel