Re: [Bioc-devel] range-directed metadata management

Vincent Carey Thu, 10 Jul 2014 16:16:43 -0700

On Thu, Jul 10, 2014 at 7:05 PM, Michael Lawrence <lawrence.mich...@gene.com> wrote:

>
> On Thu, Jul 10, 2014 at 2:16 PM, Steve Lianoglou <lianoglou.st...@gene.com> wrote:
>
>> Hi,
>>
>> On Thu, Jul 10, 2014 at 1:52 PM, Vincent Carey
>> <st...@channing.harvard.edu> wrote:
>> > a new, more inclusive GWAS catalog is available (GRASP, from Andrew Johnson
>> > at NHLBI), with 6 million records and voluminous metadata (though it seems
>> > sparse and perhaps can be trimmed/reshaped)
>> >
>> > i made a GRanges and it takes 3 minutes to load.  even after stripping all
>> > the metadata, a GRanges with 6 million records takes 20 seconds to load.
>> > that's probably acceptable, but a managed chromosome-specific distribution
>> > might be closer to interactive availability.
>> >
>> > the metadata probably would be best kept in SQLite.  it occurred to me to
>> > consider an arrangement in which we have the GRanges managing the ranges
>> > and a key to the database.  range operations can engender queries to
>> > retrieve metadata, metadata queries in the db can generate indices to
>> > retrieve matching ranges.
>> >
>> > is anyone doing something along these lines?
>>
>> You might consider just stuffing it all in the database.
>>
>> SQLite supports R-trees, a kind of spatial index, so you could in
>> theory get the fast overlap stuff baked in w/o a need to have a
>> parallel GRanges object to index into the database:
>> http://www.sqlite.org/rtree.html
>>
>> Before the reboot of the GenomicFeatures package (we're talking around
>> 2008/2009?) I was doing something like that for genomic annotations.
>>
>> The way that Hadley has abstracted db access in dplyr to make a
>> database look like a data.frame and respond to all the "data
>> manipulation verbs" in the same way gives me inspiration to believe
>> that we can do the same and make the database look essentially like a
>> GRanges / VRanges object and get cooking that way.
>>
>>
> This would be useful and was part of the intent of DynamicGRanges in the
> MutableRanges package (in svn for years but never released). A short-term
> solution might be an indexed VCF. The parser in VariantAnnotation supports
> multiple modes of restriction that should enable efficient loading.
>

i'll take a look; my impression was that ad hoc parsing and modeling of the
unstructured metadata elements in a VCF would be too costly.
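
For concreteness, here is a minimal sketch of the arrangement proposed in the original post: ranges held in memory with a key into a SQLite metadata table. It is written in Python rather than R, and everything in it is an assumption made for illustration: the toy records, the table name `meta`, and the helper names `build_store`, `keys_overlapping`, `metadata_for` and `ranges_for_phenotype`. A sorted list per chromosome stands in for a GRanges; none of this is the GenomicRanges API.

```python
import sqlite3
from bisect import bisect_right

# Hypothetical toy records: (key, chrom, start, end, phenotype, pvalue).
RECORDS = [
    (1, "chr1", 100, 100, "LDL cholesterol", 1e-9),
    (2, "chr1", 250, 250, "height", 3e-8),
    (3, "chr2", 500, 500, "LDL cholesterol", 5e-12),
    (4, "chr1", 900, 900, "asthma", 2e-7),
]


def build_store(records):
    """Split records into in-memory ranges and a SQLite metadata table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE meta (key INTEGER PRIMARY KEY, phenotype TEXT, pvalue REAL)"
    )
    conn.executemany(
        "INSERT INTO meta VALUES (?, ?, ?)",
        [(key, phenotype, pvalue) for key, _, _, _, phenotype, pvalue in records],
    )
    by_chrom = {}  # chrom -> list of (start, end, key), sorted by start
    by_key = {}    # key -> (chrom, start, end)
    for key, chrom, start, end, _, _ in records:
        by_chrom.setdefault(chrom, []).append((start, end, key))
        by_key[key] = (chrom, start, end)
    for rows in by_chrom.values():
        rows.sort()
    return conn, by_chrom, by_key


def keys_overlapping(by_chrom, chrom, qstart, qend):
    """Range operation: keys of stored ranges overlapping [qstart, qend]."""
    rows = by_chrom.get(chrom, [])
    # Only ranges whose start is <= qend can overlap the query.
    hi = bisect_right(rows, (qend, float("inf"), float("inf")))
    return [key for start, end, key in rows[:hi] if end >= qstart]


def metadata_for(conn, keys):
    """Range-directed query: fetch metadata rows for the given keys."""
    if not keys:
        return []
    marks = ",".join("?" * len(keys))
    sql = (
        "SELECT key, phenotype, pvalue FROM meta "
        f"WHERE key IN ({marks}) ORDER BY key"
    )
    return conn.execute(sql, list(keys)).fetchall()


def ranges_for_phenotype(conn, by_key, phenotype):
    """Metadata-directed query: keys from the db select matching ranges."""
    rows = conn.execute(
        "SELECT key FROM meta WHERE phenotype = ? ORDER BY key", (phenotype,)
    )
    return [by_key[key] for (key,) in rows]


conn, by_chrom, by_key = build_store(RECORDS)
hits = keys_overlapping(by_chrom, "chr1", 200, 1000)
print(metadata_for(conn, hits))
print(ranges_for_phenotype(conn, by_key, "LDL cholesterol"))
```

The linear filter after the bisect is only adequate for a toy; at 6 million records a proper interval index (or the SQLite R-tree Steve mentions) would be the thing to measure.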


> Michael
>
>
>> Hopefully this answer was at least minimally aligned in the direction
>> of what you were asking ;-)
>>
>> -steve
>>
>> --
>> Steve Lianoglou
>> Computational Biologist
>> Genentech
>>
>> _______________________________________________
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
>

