Thanks everyone for their comments. I think the easiest (non-hack)
implementation would do the following:

1) For now, use the newly created IntervalTreeList class to implement
Michael's hash idea. I'm weary of pushing this into the IntervalTree class
since it would make this class depart from the Ranges interface (but see
point A below).

2) On the C side, nodes in the IntervalTree have a slot that stores its
location in the original Ranges object. For GIntervalTree it seems it would
be sufficient to make this store the location in the original GRanges
object. We would only need to reimplement the constructor on the C side to
take an arbitrary set of indexes to store in nodes instead of the specific
location in the Ranges object. This is trivial, and  would make storing the
index vector in GIntervalTree unnecessary.

3) For now I will implement subsetting i.e. [ as done in the Ranges case by
constructing a new GIntervalTree object from subsetting the corresponding
GRanges object.

I'll write again after updating considering these three points.

A couple of other things:

A) I'm not clear if Ranges object have the notion of range "spaces" which
conceptually correspond to seqlevels (chromosomes) in GRanges objects.
Comments?

B) I would like to hear comments on the notion of having
GenomicRanges@ranges inherit from Ranges instead of IRanges or conversely,
having IntervalTree inherit from IRanges instead of Ranges. Either of these
would make some of the work needed for GIntervalTree easier.

I'm keeping track of discussion here:
https://github.com/hcorrada/GenomicIntervalTree/wiki

Thanks again!
Hector


On Wed, Apr 3, 2013 at 3:17 PM, Laurent Gautier <lgaut...@gmail.com> wrote:

> On 2013-04-03 19:28, Kasper Daniel Hansen wrote:
>
>> Making IntervalTree chromosome would also be a great addition for
>> organisms
>> with many sequences, like bee (due to an incomplete genome; 10,000s of
>> sequences).  It does not matter for humans, but findOverlaps is
>> excruciatingly slow for bee's.  I have a couple of posts on this in the
>> archive.
>>
>> I am predicting this will be the case in the future for most non-model
>> organisms; finishing a genome is expensive and time consuming.
>>
>
> I can only confirm: many many people do not close genomes, because for a
> number of applications this is just not worth the trouble... for now.
> Given the number of cases I have seen, that's now concerning the very
> large majority of genomes.
>
>
> L.
>
>>
>> Kasper
>>
>>
>> On Wed, Apr 3, 2013 at 12:45 PM, Michael Lawrence <
>> lawrence.mich...@gene.com
>>
>>> wrote:
>>> Some ideas:
>>>
>>> - Turn the IntervalTree into a list/array of nodes that can be
>>> subset/reordered with shallow copying (just copy the pointers to the
>>> nodes), and the index would be secondary. The index in the array could be
>>> stored in each node, for lookup during overlap queries. Right now, as far
>>> as I can tell, GIntervalTree will get confused if the user reorders e.g.
>>> via [.
>>>
>>> - Make IntervalTree aware of the sequence/chromosome, e.g., have a hash
>>> of
>>> trees, which is trivial since seqnames is already a factor.
>>>
>>> Michael
>>>
>>>
>>>
>>> On Wed, Apr 3, 2013 at 9:29 AM, Hector Corrada Bravo <hcorr...@gmail.com
>>>
>>>> wrote:
>>>> Yep, I didn't comment on that, but I agree that abstracting how
>>>> GRanges stores ranges would make this more elegant. Right now
>>>> ranges(GRanges) is specified to be of IRanges class instead of the
>>>> abstract Ranges class.
>>>>
>>>> If it were the latter then GIntervalTree can be a subclass of
>>>> GenomicRanges, in a similar way that IntervalTree is a subclass of
>>>> Ranges.
>>>>
>>>> On Wed, Apr 3, 2013 at 12:23 PM, Michael Lawrence
>>>> <lawrence.mich...@gene.com> wrote:
>>>>
>>>>> Hi Hector,
>>>>>
>>>>> That's interesting, thanks for passing this along. I'm still wishing
>>>>>
>>>> that
>>>
>>>> somehow GRanges itself could abstract the way it stores ranges. I know
>>>>>
>>>> that
>>>>
>>>>> Herve/Patrick had some reasons for depending specifically on GRanges.
>>>>>
>>>> One
>>>
>>>> reason was probably convenience at the C level, but it wouldn't be hard
>>>>>
>>>> to
>>>>
>>>>> create a Ranges abstraction at the C level, as well.
>>>>>
>>>>> Michael
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Apr 2, 2013 at 5:40 PM, Hector Corrada Bravo <
>>>>>
>>>> hcorr...@gmail.com
>>>
>>>> wrote:
>>>>>
>>>>>> Hello bioc-develers,
>>>>>>
>>>>>> I'm writing an application where lots findOverlap calls are made on
>>>>>> static GRanges objects. For IRanges we can create persistent
>>>>>> IntervalTree objects that would serve the multiple overlap query
>>>>>> use-case. There is no equivalent for GenomicRanges objects, so I'm
>>>>>> proposing an implementation for this.
>>>>>>
>>>>>> Please check
>>>>>> http://github.com/hcorrada/**GenomicIntervalTree<http://github.com/hcorrada/GenomicIntervalTree>
>>>>>>
>>>>>> There's a first cut implementation there you can test by installing
>>>>>> this skeleton package. E.g,
>>>>>>
>>>>>>  library(devtools)
>>>>>>> install_github("**GenomicIntervalTree", username="hcorrada",
>>>>>>>
>>>>>> subdir="pkg")
>>>>
>>>>> library(GenomicIntervalTree)
>>>>>>>
>>>>>> Let me know what you think.
>>>>>>
>>>>>> Cheers,
>>>>>> Hector
>>>>>>
>>>>>> ______________________________**_________________
>>>>>> Bioc-devel@r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>>>>>>
>>>>>
>>>>>           [[alternative HTML version deleted]]
>>>
>>> ______________________________**_________________
>>> Bioc-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>>>
>>>          [[alternative HTML version deleted]]
>>
>> ______________________________**_________________
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>>
>
> ______________________________**_________________
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to