On 2013-04-03 19:28, Kasper Daniel Hansen wrote:
Making IntervalTree chromosome would also be a great addition for organisms
with many sequences, like bee (due to an incomplete genome; 10,000s of
sequences).  It does not matter for humans, but findOverlaps is
excruciatingly slow for bee's.  I have a couple of posts on this in the
archive.

I am predicting this will be the case in the future for most non-model
organisms; finishing a genome is expensive and time consuming.

I can only confirm: many many people do not close genomes, because for a number of applications this is just not worth the trouble... for now. Given the number of cases I have seen, that's now concerning the very large majority of genomes.

L.

Kasper


On Wed, Apr 3, 2013 at 12:45 PM, Michael Lawrence <lawrence.mich...@gene.com
wrote:
Some ideas:

- Turn the IntervalTree into a list/array of nodes that can be
subset/reordered with shallow copying (just copy the pointers to the
nodes), and the index would be secondary. The index in the array could be
stored in each node, for lookup during overlap queries. Right now, as far
as I can tell, GIntervalTree will get confused if the user reorders e.g.
via [.

- Make IntervalTree aware of the sequence/chromosome, e.g., have a hash of
trees, which is trivial since seqnames is already a factor.

Michael



On Wed, Apr 3, 2013 at 9:29 AM, Hector Corrada Bravo <hcorr...@gmail.com
wrote:
Yep, I didn't comment on that, but I agree that abstracting how
GRanges stores ranges would make this more elegant. Right now
ranges(GRanges) is specified to be of IRanges class instead of the
abstract Ranges class.

If it were the latter then GIntervalTree can be a subclass of
GenomicRanges, in a similar way that IntervalTree is a subclass of
Ranges.

On Wed, Apr 3, 2013 at 12:23 PM, Michael Lawrence
<lawrence.mich...@gene.com> wrote:
Hi Hector,

That's interesting, thanks for passing this along. I'm still wishing
that
somehow GRanges itself could abstract the way it stores ranges. I know
that
Herve/Patrick had some reasons for depending specifically on GRanges.
One
reason was probably convenience at the C level, but it wouldn't be hard
to
create a Ranges abstraction at the C level, as well.

Michael



On Tue, Apr 2, 2013 at 5:40 PM, Hector Corrada Bravo <
hcorr...@gmail.com
wrote:
Hello bioc-develers,

I'm writing an application where lots findOverlap calls are made on
static GRanges objects. For IRanges we can create persistent
IntervalTree objects that would serve the multiple overlap query
use-case. There is no equivalent for GenomicRanges objects, so I'm
proposing an implementation for this.

Please check
http://github.com/hcorrada/GenomicIntervalTree

There's a first cut implementation there you can test by installing
this skeleton package. E.g,

library(devtools)
install_github("GenomicIntervalTree", username="hcorrada",
subdir="pkg")
library(GenomicIntervalTree)
Let me know what you think.

Cheers,
Hector

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

         [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to