On Thu, Jul 10, 2014 at 7:05 PM, Michael Lawrence <lawrence.mich...@gene.com > wrote:
>
> On Thu, Jul 10, 2014 at 2:16 PM, Steve Lianoglou <lianoglou.st...@gene.com>
> wrote:
>
>> Hi,
>>
>> On Thu, Jul 10, 2014 at 1:52 PM, Vincent Carey
>> <st...@channing.harvard.edu> wrote:
>> > a new, more inclusive GWAS catalog is available (GRASP, from Andrew Johnson
>> > at NHLBI), with 6 million records and voluminous metadata (though it seems
>> > sparse and perhaps can be trimmed/reshaped)
>> >
>> > i made a GRanges and it takes 3 minutes to load. even after stripping all
>> > the metadata, a GRanges with 6 million records takes 20 seconds to load.
>> > that's probably acceptable, but a managed chromosome-specific distribution
>> > might be closer to interactive availability.
>> >
>> > the metadata probably would be best kept in SQLite. it occurred to me to
>> > consider an arrangement in which we have the GRanges managing the ranges
>> > and a key to the database. range operations can engender queries to
>> > retrieve metadata, metadata queries in the db can generate indices to
>> > retrieve matching ranges.
>> >
>> > is anyone doing something along these lines?
>>
>> You might consider just stuffing it all in the database.
>>
>> SQLite supports RTrees, which is a spatial index, so you could in
>> theory get the fast overlap stuff baked in w/o a need to have a
>> parallel GRanges object to index into the database:
>> http://www.sqlite.org/rtree.html
>>
>> Before the reboot of the GenomicFeatures package (we're talking around
>> 2008/2009?) I was doing something like that for genomic annotations.
>>
>> The way that Hadley has abstracted db access in dplyr to make a
>> database look like a data.frame and respond to all the "data
>> manipulation verbs" in the same way gives me inspiration to believe
>> that we can do the same and make the database look essentially like a
>> GRanges / VRanges object and get cooking that way.
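To make the "stuff it all in the database" idea above concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is only an illustration under stated assumptions, not anyone's actual implementation: the table names (snp_index, snp_meta), the column names, and the toy records are all hypothetical, and it assumes the SQLite library behind sqlite3 was built with the R*Tree module enabled. It uses the rtree_i32 variant so that genomic coordinates are stored as integers rather than 32-bit floats.

```python
import sqlite3

# Hypothetical toy records: (id, chromosome, start, end, phenotype, p-value).
RECORDS = [
    (1, "chr1", 100, 100, "height", 1e-8),
    (2, "chr1", 250, 250, "lipids", 3e-6),
    (3, "chr2", 120, 120, "height", 5e-9),
    (4, "chr1", 900, 900, "asthma", 2e-7),
]


def build_db(records):
    """Create an in-memory database with a 1-D R*Tree plus a metadata table."""
    con = sqlite3.connect(":memory:")
    # One-dimensional R*Tree: an integer id and a (min, max) coordinate pair.
    con.execute(
        "CREATE VIRTUAL TABLE snp_index USING rtree_i32(id, start_pos, end_pos)"
    )
    # Metadata lives in an ordinary table that shares the same id as key.
    con.execute(
        "CREATE TABLE snp_meta ("
        "id INTEGER PRIMARY KEY, chrom TEXT, phenotype TEXT, pvalue REAL)"
    )
    for rid, chrom, start, end, phenotype, pvalue in records:
        con.execute("INSERT INTO snp_index VALUES (?, ?, ?)", (rid, start, end))
        con.execute(
            "INSERT INTO snp_meta VALUES (?, ?, ?, ?)",
            (rid, chrom, phenotype, pvalue),
        )
    con.commit()
    return con


def overlaps(con, chrom, start, end):
    """Range query -> metadata: records overlapping [start, end] on chrom."""
    sql = """
        SELECT m.id, m.phenotype, m.pvalue
        FROM snp_index AS i JOIN snp_meta AS m ON m.id = i.id
        WHERE i.start_pos <= ? AND i.end_pos >= ? AND m.chrom = ?
        ORDER BY m.id
    """
    return con.execute(sql, (end, start, chrom)).fetchall()


def ranges_for_phenotype(con, phenotype):
    """Metadata query -> ranges: coordinates of records with this phenotype."""
    sql = """
        SELECT m.chrom, i.start_pos, i.end_pos
        FROM snp_meta AS m JOIN snp_index AS i ON i.id = m.id
        WHERE m.phenotype = ?
        ORDER BY m.id
    """
    return con.execute(sql, (phenotype,)).fetchall()


if __name__ == "__main__":
    con = build_db(RECORDS)
    print(overlaps(con, "chr1", 50, 300))
    print(ranges_for_phenotype(con, "height"))
```

The two query functions mirror the two directions described in the original question: a range operation that pulls back metadata, and a metadata query that pulls back matching ranges. The chromosome is kept in the metadata table here for simplicity; a real schema might instead encode it as a second R*Tree dimension or use one index per chromosome.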
>>
>
> This would be useful and was part of the intent of DynamicGRanges in the
> MutableRanges package (in svn for years but never released). A short-term
> solution might be an indexed VCF. The parser in VariantAnnotation supports
> multiple modes of restriction that should enable efficient loading.
>

i'll take a look, my impression was that ad hoc parsing and modeling the
unstructured metadata elements in a vcf would be too costly.

> Michael
>
>
>> Hopefully this answer was at least minimally aligned in the direction
>> of what you were asking ;-)
>>
>> -steve
>>
>> --
>> Steve Lianoglou
>> Computational Biologist
>> Genentech
>>
>> _______________________________________________
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>