Thanks everyone for their comments. I think the easiest (non-hack) implementation would do the following:
1) For now, use the newly created IntervalTreeList class to implement Michael's hash idea. I'm weary of pushing this into the IntervalTree class since it would make this class depart from the Ranges interface (but see point A below). 2) On the C side, nodes in the IntervalTree have a slot that stores its location in the original Ranges object. For GIntervalTree it seems it would be sufficient to make this store the location in the original GRanges object. We would only need to reimplement the constructor on the C side to take an arbitrary set of indexes to store in nodes instead of the specific location in the Ranges object. This is trivial, and would make storing the index vector in GIntervalTree unnecessary. 3) For now I will implement subsetting i.e. [ as done in the Ranges case by constructing a new GIntervalTree object from subsetting the corresponding GRanges object. I'll write again after updating considering these three points. A couple of other things: A) I'm not clear if Ranges object have the notion of range "spaces" which conceptually correspond to seqlevels (chromosomes) in GRanges objects. Comments? B) I would like to hear comments on the notion of having GenomicRanges@ranges inherit from Ranges instead of IRanges or conversely, having IntervalTree inherit from IRanges instead of Ranges. Either of these would make some of the work needed for GIntervalTree easier. I'm keeping track of discussion here: https://github.com/hcorrada/GenomicIntervalTree/wiki Thanks again! Hector On Wed, Apr 3, 2013 at 3:17 PM, Laurent Gautier <lgaut...@gmail.com> wrote: > On 2013-04-03 19:28, Kasper Daniel Hansen wrote: > >> Making IntervalTree chromosome would also be a great addition for >> organisms >> with many sequences, like bee (due to an incomplete genome; 10,000s of >> sequences). It does not matter for humans, but findOverlaps is >> excruciatingly slow for bee's. I have a couple of posts on this in the >> archive. >> >> I am predicting this will be the case in the future for most non-model >> organisms; finishing a genome is expensive and time consuming. >> > > I can only confirm: many many people do not close genomes, because for a > number of applications this is just not worth the trouble... for now. > Given the number of cases I have seen, that's now concerning the very > large majority of genomes. > > > L. > >> >> Kasper >> >> >> On Wed, Apr 3, 2013 at 12:45 PM, Michael Lawrence < >> lawrence.mich...@gene.com >> >>> wrote: >>> Some ideas: >>> >>> - Turn the IntervalTree into a list/array of nodes that can be >>> subset/reordered with shallow copying (just copy the pointers to the >>> nodes), and the index would be secondary. The index in the array could be >>> stored in each node, for lookup during overlap queries. Right now, as far >>> as I can tell, GIntervalTree will get confused if the user reorders e.g. >>> via [. >>> >>> - Make IntervalTree aware of the sequence/chromosome, e.g., have a hash >>> of >>> trees, which is trivial since seqnames is already a factor. >>> >>> Michael >>> >>> >>> >>> On Wed, Apr 3, 2013 at 9:29 AM, Hector Corrada Bravo <hcorr...@gmail.com >>> >>>> wrote: >>>> Yep, I didn't comment on that, but I agree that abstracting how >>>> GRanges stores ranges would make this more elegant. Right now >>>> ranges(GRanges) is specified to be of IRanges class instead of the >>>> abstract Ranges class. >>>> >>>> If it were the latter then GIntervalTree can be a subclass of >>>> GenomicRanges, in a similar way that IntervalTree is a subclass of >>>> Ranges. >>>> >>>> On Wed, Apr 3, 2013 at 12:23 PM, Michael Lawrence >>>> <lawrence.mich...@gene.com> wrote: >>>> >>>>> Hi Hector, >>>>> >>>>> That's interesting, thanks for passing this along. I'm still wishing >>>>> >>>> that >>> >>>> somehow GRanges itself could abstract the way it stores ranges. I know >>>>> >>>> that >>>> >>>>> Herve/Patrick had some reasons for depending specifically on GRanges. >>>>> >>>> One >>> >>>> reason was probably convenience at the C level, but it wouldn't be hard >>>>> >>>> to >>>> >>>>> create a Ranges abstraction at the C level, as well. >>>>> >>>>> Michael >>>>> >>>>> >>>>> >>>>> On Tue, Apr 2, 2013 at 5:40 PM, Hector Corrada Bravo < >>>>> >>>> hcorr...@gmail.com >>> >>>> wrote: >>>>> >>>>>> Hello bioc-develers, >>>>>> >>>>>> I'm writing an application where lots findOverlap calls are made on >>>>>> static GRanges objects. For IRanges we can create persistent >>>>>> IntervalTree objects that would serve the multiple overlap query >>>>>> use-case. There is no equivalent for GenomicRanges objects, so I'm >>>>>> proposing an implementation for this. >>>>>> >>>>>> Please check >>>>>> http://github.com/hcorrada/**GenomicIntervalTree<http://github.com/hcorrada/GenomicIntervalTree> >>>>>> >>>>>> There's a first cut implementation there you can test by installing >>>>>> this skeleton package. E.g, >>>>>> >>>>>> library(devtools) >>>>>>> install_github("**GenomicIntervalTree", username="hcorrada", >>>>>>> >>>>>> subdir="pkg") >>>> >>>>> library(GenomicIntervalTree) >>>>>>> >>>>>> Let me know what you think. >>>>>> >>>>>> Cheers, >>>>>> Hector >>>>>> >>>>>> ______________________________**_________________ >>>>>> Bioc-devel@r-project.org mailing list >>>>>> https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel> >>>>>> >>>>> >>>>> [[alternative HTML version deleted]] >>> >>> ______________________________**_________________ >>> Bioc-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel> >>> >>> [[alternative HTML version deleted]] >> >> ______________________________**_________________ >> Bioc-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel> >> > > ______________________________**_________________ > Bioc-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel> > [[alternative HTML version deleted]] _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel