Hi,

I just went through this approach in yeast:
        regions = gene promoters
        assays = H3K3ME1, H3K3ME2, H3K3ME3 ChIP-seq
        experimental conditions = 7 recombinant knock-outs and knock-ins of
                different domains of different genes
        replicates = two

So, what I first reached for was something like the IRanges views pattern, but 
for a collection of bigWig files, such as we are discussing.

And, not finding it in the wild, I rolled my own using existing BioC tools, 
allowing me to curry a function down to each Rle corresponding to each region 
in each bigWig, ultimately returning a matrix of values.
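
To make the shape of that computation concrete, here is a minimal sketch. It is 
in Python rather than R, it is not the code described above, and plain lists of 
per-base values stand in for bigWig coverage. The function name 
summarize_regions, the toy track names, and the half-open 0-based region 
convention are all assumptions made for illustration.

```python
from statistics import mean


def summarize_regions(tracks, regions, summary=mean):
    """Apply `summary` to each region of each track.

    tracks  : dict mapping track name -> list of per-base values
              (a stand-in for the coverage stored in one bigWig)
    regions : list of (start, end) half-open, 0-based intervals
    Returns (names, matrix) where matrix[i][j] is the summary of
    region i in track names[j].
    """
    names = sorted(tracks)
    matrix = [
        [summary(tracks[name][start:end]) for name in names]
        for start, end in regions
    ]
    return names, matrix


# Toy data: two tracks of ten positions each and two promoter-like regions.
tracks = {
    "sampleA": [0, 0, 2, 4, 6, 0, 0, 1, 1, 1],
    "sampleB": [1, 1, 1, 1, 1, 5, 5, 0, 0, 0],
}
regions = [(2, 5), (7, 10)]

names, matrix = summarize_regions(tracks, regions)
for region, row in zip(regions, matrix):
    print(region, dict(zip(names, row)))
```

The result has one row per region and one column per track, which is the 
matrix layout referred to above.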

This worked, but, in retrospect, I think my need would have been better served 
by serializing the results for each bigWig and saving them in a corresponding 
file. 

Why?

Because I now find myself needing to slice and dice different subsets of 
samples and assays, and also to add in new samples from more recent 
experiments.

The one thing that has NOT changed is the list of regions (in my case, gene 
promoters).

If I instead adopt an approach where I create one serialized (.RDS or .csv) 
file per bigWig storing the tabulation of my summary function at the promoter 
level, I can easily load just the ones I need, in any combination, for a later 
analysis.  I can even load them into a multi-dimensional array, indexed by, 
say, [sample,assay].
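
A minimal sketch of the one-file-per-bigWig idea, again in Python with only the 
standard library rather than in R. The file naming scheme 
(<sample>__<assay>.csv), the two-column CSV layout, and all sample, assay, and 
promoter names are hypothetical, and the values are made-up numbers standing in 
for real per-promoter summaries.

```python
import csv
import tempfile
from pathlib import Path


def result_path(root, sample, assay):
    # Hypothetical naming scheme: one file per (sample, assay) pair.
    return Path(root) / f"{sample}__{assay}.csv"


def save_summary(root, sample, assay, values_by_region):
    """Write the per-region summary for one bigWig to its own CSV file."""
    with open(result_path(root, sample, assay), "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["region", "value"])
        for region, value in values_by_region.items():
            writer.writerow([region, value])


def load_summary(root, sample, assay):
    """Read one file back as {region: value}."""
    with open(result_path(root, sample, assay), newline="") as fh:
        return {row["region"]: float(row["value"]) for row in csv.DictReader(fh)}


def load_subset(root, samples, assays):
    """Load only the requested combinations, keyed by (sample, assay)."""
    return {(s, a): load_summary(root, s, a) for s in samples for a in assays}


# Write one file per combination, using placeholder values.
root = tempfile.mkdtemp()
all_samples = ["wt", "ko1", "ko2"]
all_assays = ["assay1", "assay2"]
for i, s in enumerate(all_samples):
    for j, a in enumerate(all_assays):
        save_summary(root, s, a, {"promoter_1": 10 * i + j,
                                  "promoter_2": 10 * i + j + 0.5})

# Later analysis: pull in just two samples and one assay.
subset = load_subset(root, samples=["wt", "ko2"], assays=["assay2"])
print(subset[("ko2", "assay2")]["promoter_2"])
```

Adding a sample from a newer experiment then only means writing one more file; 
nothing already computed has to be redone.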

What I think I want now, instead, is some interface to a multi-dimensional 
(virtual) array, where two of the dimensions (say, assay and sample) determine 
the filepath containing the third dimension, being the computed and serialized 
value at each promoter.
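
One way such an interface could look is sketched here in Python, not in R, and 
it is not an existing Bioconductor class. The class name VirtualSummaryArray, 
the <sample>__<assay>.csv naming scheme, and the CSV layout are assumptions 
made for illustration. The first two index positions pick the file, the third 
picks the promoter, and a file is read only the first time it is touched.

```python
import csv
import tempfile
from pathlib import Path


class VirtualSummaryArray:
    """Array-like view indexed as arr[sample, assay] or arr[sample, assay, region]."""

    def __init__(self, root, regions):
        self.root = Path(root)
        self.regions = list(regions)  # the fixed third dimension
        self._index = {r: i for i, r in enumerate(self.regions)}
        self._cache = {}  # (sample, assay) -> list of values, in region order

    def path_for(self, sample, assay):
        # Hypothetical scheme: the first two dimensions determine the file.
        return self.root / f"{sample}__{assay}.csv"

    def _column(self, sample, assay):
        key = (sample, assay)
        if key not in self._cache:  # read lazily, then keep in memory
            with open(self.path_for(sample, assay), newline="") as fh:
                by_region = {row["region"]: float(row["value"])
                             for row in csv.DictReader(fh)}
            self._cache[key] = [by_region[r] for r in self.regions]
        return self._cache[key]

    def __getitem__(self, key):
        if len(key) == 2:  # arr[sample, assay] -> values for all regions
            return list(self._column(*key))
        sample, assay, region = key  # arr[sample, assay, region] -> one value
        return self._column(sample, assay)[self._index[region]]


# Demo with placeholder files and values.
root = Path(tempfile.mkdtemp())
regions = ["promoter_1", "promoter_2", "promoter_3"]
for sample, assay, offset in [("wt", "assay1", 0.0), ("ko1", "assay1", 100.0)]:
    with open(root / f"{sample}__{assay}.csv", "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["region", "value"])
        for i, region in enumerate(regions):
            writer.writerow([region, offset + i])

arr = VirtualSummaryArray(root, regions)
whole = arr["ko1", "assay1"]                   # all promoters for one file
single = arr["wt", "assay1", "promoter_3"]     # one promoter from another file
print(whole, single)
```

Because the region list is fixed, every file can be reordered to the same 
region order on load, so values from different files line up by position.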

Your mileage may vary.

~ Malcolm

 >
 >Retrieving the data for a genomic range is efficient, doing this for
 >thousands of samples might get tricky, but could probably be vectorized
 >through clever use of matrices. But millions of regions by thousands of
 >samples might need some support in native code, along the lines of
 >viewSums, etc, but iterating over the bigwigs directly. Maybe you guys
 >could implement something in R and then we could profile and optimize it.
 >
 >
 >On Mon, Nov 18, 2013 at 4:33 PM, Kasper Daniel Hansen <
 >kasperdanielhan...@gmail.com> wrote:
 >
 >> (Michael Love and I had some discussion on this Friday)
 >>
 >> I also think it would be a very convenient class/method.  A lot of data
 >> these days are naturally represented (and are available from say GEO) as
 >> bigWig files (essentially coverage tracks), for example ChIP-seq.  This
 >> would be much more efficient than converting BAM to coverage on the fly.
 >>
 >> It seems to me that bigWig ought to be efficient for this, but I am not
 >> very familiar with its performance.  What we want is really to be able to
 >> chunk multiple coverage profiles over the genome, and do computations on
 >> each of the chunks.  Any idea on efficiency?  I am happy to contribute a
 >> bit, at least with design.
 >>
 >> Best,
 >> Kasper
 >>
 >>
 >> On Mon, Nov 18, 2013 at 6:11 PM, Michael Lawrence <
 >> lawrence.mich...@gene.com> wrote:
 >>
 >>> Aggregating coverage over multiple samples is a popular request recently.
 >>> I'm happy to support this effort, but I think someone in Seattle is going
 >>> to have to take the lead on it.
 >>>
 >>>
 >>> On Mon, Nov 18, 2013 at 2:36 PM, Michael Love
 >>> <michaelisaiahl...@gmail.com> wrote:
 >>>
 >>> > a discussion came up on devel last year about looking at a genomic range
 >>> > over multiple samples and multiple experiments (
 >>> > https://stat.ethz.ch/pipermail/bioc-devel/attachments/20120920/93a4fb61/attachment.pl
 >>> > )
 >>> >
 >>> > setting aside the multiple experiment part, I'm interested in
 >>> > BigWigViews() with fixed ranges across samples. Have there been any more
 >>> > thoughts in this direction?
 >>> >
 >>> > BigWigViews would be incredibly useful for genomics applications where we
 >>> > want to scan along the genome looking at lots of samples. BigWig offers a
 >>> > concise representation of the information compared to BAM files.
 >>> >
 >>> > What I am trying now is using import(BigWigFile, which=gr) on files one by
 >>> > one, and then binding the coverage together.
 >>> >
 >>> > best,
 >>> >
 >>> > Mike
 >>> >
 >>> >         [[alternative HTML version deleted]]
 >>> >
 >>> > _______________________________________________
 >>> > Bioc-devel@r-project.org mailing list
 >>> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
 >>> >
