A little late, but I can report that this speeds up my "many seqlevels"
problem by roughly 3 orders of magnitude.

library(IRanges, lib.loc = "library")
library(GenomicRanges, lib.loc = "library")
library(BSgenome.Amellifera.BeeBase.assembly4)
Un <- Amellifera$GroupUn
gr <- GRanges(seqnames = names(Un),
              ranges = IRanges(start = 1, width = width(Un)))

## gr has a length of 9244, but each interval is in a new seqname.
## this makes traditional findOverlaps extremely slow

system.time({
    findOverlaps(gr, gr)
})  ## roughly 240 secs

system.time({
    grF <- as(gr, "GIntervalTree")
})
system.time({
    findOverlaps(grF, grF)
}) ## roughly 0.1 secs

## speedup (for this example): roughly 2400-fold (240 secs / 0.1 secs)
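
For readers without the BSgenome package at hand, here is a minimal base R
sketch of the shape of this benchmark. It is not the IRanges or
GenomicRanges code: toy_overlaps() and the widths below are hypothetical,
made up for illustration. It only shows that when every range sits on its
own seqname, the only hits are self-hits, so the work is spread over as many
tiny groups as there are seqnames. Assuming the seqnames in gr are unique
and findOverlaps() is called with default arguments, findOverlaps(gr, gr)
above should likewise return only self-hits.

```r
## Toy stand-in for findOverlaps(gr, gr); base R only, hypothetical helper.
## Two ranges overlap when they share a seqname and their closed
## intervals [start, end] intersect.
toy_overlaps <- function(seqnames, start, end) {
    groups <- split(seq_along(seqnames), seqnames)  # indices per seqname
    hits <- lapply(groups, function(idx) {
        ## all (query, subject) pairs within one seqname
        pairs <- expand.grid(query = idx, subject = idx)
        keep <- start[pairs$query] <= end[pairs$subject] &
                start[pairs$subject] <= end[pairs$query]
        pairs[keep, , drop = FALSE]
    })
    hits <- do.call(rbind, hits)
    hits <- hits[order(hits$query, hits$subject), , drop = FALSE]
    rownames(hits) <- NULL
    hits
}

## Made-up data shaped like the benchmark: one range per seqname.
n <- 5
widths <- c(100, 250, 80, 40, 300)
solo <- toy_overlaps(paste0("Group", seq_len(n)), rep(1, n), widths)

## The same ranges on a single seqname: all start at 1, so every pair overlaps.
shared <- toy_overlaps(rep("Group1", n), rep(1, n), widths)

nrow(solo)    # 5: self-hits only
nrow(shared)  # 25: every pair
```

The toy loops over seqnames one group at a time, which is only an analogy
for where per-seqname overhead could come from; it says nothing about how
either findOverlaps method is actually implemented.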

Kasper


On Thu, May 30, 2013 at 6:51 AM, Hector Corrada Bravo <
hcorr...@umiacs.umd.edu> wrote:

> Great. I already have unit tests there for IntervalForest and
> GIntervalTree.
> Hector
>
>
> On Wed, May 29, 2013 at 8:31 PM, Vincent Carey
> <st...@channing.harvard.edu> wrote:
>
> > Fine with me, as long as he is acquainted with the build/test before
> > commit practices that we are supposed to follow.  Breaking IRanges can
> > have severe repercussions.
> >
> > On Wed, May 29, 2013 at 6:36 PM, Michael Lawrence
> > <lawrence.mich...@gene.com> wrote:
> >
> > > Would it be feasible/acceptable to give Hector permission to commit?
> > >
> > > Michael
> > >
> > >
> > > On Wed, May 29, 2013 at 2:12 PM, Hector Corrada Bravo
> > > <hcorr...@gmail.com> wrote:
> > >
> > > > That's great! There's some cleaning up to do there. How should we do
> > > > this post-merge?
> > > >
> > > >
> > > > On Wed, May 29, 2013 at 4:19 PM, Valerie Obenchain
> > > > <voben...@fhcrc.org> wrote:
> > > >
> > > >> Hi Hector, Michael,
> > > >>
> > > >> This sounds great. Bringing these into svn is fine with us. Michael,
> > > >> do you want to merge these in?
> > > >>
> > > >> Val
> > > >>
> > > >> On 05/24/2013 07:30 AM, Hector Corrada Bravo wrote:
> > > >> > Thanks Michael,
> > > >> >
> > > >> > It has made a significant difference for our visualization project.
> > > >> > I would like to merge this into svn asap. Can I get a ruling from
> > > >> > the rest of the core group? Please let me know if/when/how to
> > > >> > proceed.
> > > >> >
> > > >> > Cheers,
> > > >> > Hector
> > > >> >
> > > >> >
> > > >> > On Wed, May 22, 2013 at 1:00 PM, Michael Lawrence
> > > >> > <lawrence.mich...@gene.com> wrote:
> > > >> >
> > > >> >> *Added bioc-devel; hope you don't mind*
> > > >> >>
> > > >> >> Hector,
> > > >> >>
> > > >> >> This is great stuff. The overall design is on the right track. As
> > > >> >> you said, there's a bit of cleaning to do, but I think we should
> > > >> >> merge this into svn and work the rest out from there. This will
> > > >> >> really benefit performance, especially for visualization. Of
> > > >> >> course, I can't speak for the others.
> > > >> >>
> > > >> >> Michael
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> On Tue, May 21, 2013 at 11:52 AM, Hector Corrada Bravo <
> > > >> >> hcorr...@umiacs.umd.edu> wrote:
> > > >> >>
> > > >> >>> Since the semester is over I finally finished this...
> > > >> >>>
> > > >> >>> Recall that I wanted a persistent set of IntervalTrees for
> > > >> >>> GRanges objects for repeated querying. (The application is this:
> > > >> >>> http://epiviz.cbcb.umd.edu/help/?page_id=62 which I hope to get
> > > >> >>> out soon). Folding this into IRanges and GenomicRanges would make
> > > >> >>> our life easier come installation time.
> > > >> >>>
> > > >> >>> I've implemented class 'IntervalForest' within IRanges following
> > > >> >>> Michael's suggestion of storing this as an array of rbTree on the
> > > >> >>> C side. I've implemented findOverlaps that operates with this
> > > >> >>> array in C. There is code duplication in IntervalTree.c that
> > > >> >>> could be reduced, but that's if this makes it into the package.
> > > >> >>>
> > > >> >>> I've also implemented a 'GIntervalTree' that uses 'IntervalForest'
> > > >> >>> underneath. findOverlaps-GenomicRanges-GIntervalTree-method is
> > > >> >>> implemented for this class. I didn't touch the existing
> > > >> >>> findOverlaps-GenomicRanges-GenomicRanges-method.
> > > >> >>>
> > > >> >>> You can pull these here:
> > > >> >>> http://github.com/hcorrada/IRanges
> > > >> >>> http://github.com/hcorrada/GenomicRanges
> > > >> >>>
> > > >> >>> These track the devel branch of the two packages. Let me know the
> > > >> >>> best way to propagate to svn if you guys want this. It needs
> > > >> >>> documentation, but I'll add that once implementation is settled.
> > > >> >>>
> > > >> >>> Kasper, I'm not sure if this would help with the 'too many
> > > >> >>> seqlevels' problem but I'd be curious to know if you try it.
> > > >> >>>
> > > >> >>> Cheers,
> > > >> >>> Hector
> > > >> >>>
> > > >> >>
> > > >> >>
> > > >> >
> > > >> > [[alternative HTML version deleted]]
> > > >> >
> > > >> > _______________________________________________
> > > >> > Bioc-devel@r-project.org mailing list
> > > >> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> > > >> >
> > > >>
> > > >
> > > >
> > >
> > >
> >
> >
>
>

