I guess the point is that we want to see what kinds of analyses can be
supported by streaming along the genome, avoiding intermediate files for
each sample. But maybe that's not ideal for your case, where you add a
single sample and then have to repeat some summarization task at the level
of individual samples.

I've started porting over code from BamViews to BigWigViews here:
https://github.com/mikelove/bigwigviews

Some useful overhead is already taken care of for different applications,
e.g. indexing and the *Views_delegate() function.
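The message doesn't spell out the BigWigViews interface, and what follows is not it. As a rough, language-neutral illustration of the "views" idea being ported from BamViews (a fixed set of regions crossed with a set of files, subsettable in both dimensions, plus a delegate that runs a function file by file), here is a minimal Python sketch. `TrackViews`, `delegate()`, the 0-based half-open region convention, and the file names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple, Union

# A region is (chrom, start, end), 0-based half-open (an assumption of this sketch).
Region = Tuple[str, int, int]
Index = Union[slice, Sequence[int]]


def _pick(items: Tuple, idx: Index) -> Tuple:
    """Select by slice or by a list of integer positions."""
    if isinstance(idx, slice):
        return tuple(items[idx])
    return tuple(items[i] for i in idx)


@dataclass(frozen=True)
class TrackViews:
    """Hypothetical views object: fixed regions x a set of track files.

    Indexing only narrows the regions and files; nothing is read until
    delegate() hands each file, with the shared regions, to a worker function.
    """
    regions: Tuple[Region, ...]
    files: Tuple[str, ...]

    def __getitem__(self, key: Tuple[Index, Index]) -> "TrackViews":
        rows, cols = key
        return TrackViews(_pick(self.regions, rows), _pick(self.files, cols))

    def delegate(self, worker: Callable[[str, Tuple[Region, ...]], object]) -> Dict[str, object]:
        # One call per file, so a worker can open, query, and close each file in turn.
        return {path: worker(path, self.regions) for path in self.files}


views = TrackViews(
    regions=(("chrI", 0, 100), ("chrI", 500, 600), ("chrII", 10, 50)),
    files=("wt_rep1.bw", "wt_rep2.bw", "ko_rep1.bw"),
)
sub = views[[0, 2], 1:]  # regions 0 and 2, every file except the first
print(sub.files)  # ('wt_rep2.bw', 'ko_rep1.bw')
print(sub.delegate(lambda path, regions: len(regions)))  # {'wt_rep2.bw': 2, 'ko_rep1.bw': 2}
```

The worker here only counts regions; a real one would open the file and query those regions.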


On Tue, Nov 19, 2013 at 2:57 PM, Cook, Malcolm <m...@stowers.org> wrote:

>
> Hi,
>
> I just went through this approach in yeast,
>         regions = gene promoters
>         assays =  H3K3ME1, H3K3ME2, H3K3ME3 ChipSeq
>         experimental conditions: 7 recombinant knock-outs and knock-ins of
> different domains of different genes.
>         two replicates
>
> So, what I first reached for was something like the IRanges views pattern,
> but for a collection of bigWig files, such as we are discussing.
>
> And, not finding it in the wild, I rolled my own using existing BioC
> infrastructure, allowing me to curry a function down to each Rle
> corresponding to each region in each bigWig, ultimately returning a matrix
> of values.
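The pattern described here (fix a summary function, apply it to every region of every track, collect a regions-by-samples matrix) can be sketched in a few lines. This is a Python illustration, not the original BioC code: tracks are toy in-memory lists standing in for per-chromosome Rle coverage, and `view_matrix` and `summarize_region` are hypothetical names.

```python
from functools import partial
from typing import Callable, Dict, List, Sequence, Tuple

Region = Tuple[str, int, int]     # (chrom, start, end), 0-based half-open
Track = Dict[str, List[float]]    # toy in-memory coverage: chrom -> per-base values
Summary = Callable[[Sequence[float]], float]


def summarize_region(track: Track, region: Region, fn: Summary) -> float:
    """Apply the summary function to one region of one track."""
    chrom, start, end = region
    return fn(track[chrom][start:end])


def view_matrix(tracks: Sequence[Track], regions: Sequence[Region], fn: Summary) -> List[List[float]]:
    """Rows are regions, columns are tracks, each cell is fn(coverage in region)."""
    summarize = partial(summarize_region, fn=fn)  # fix ("curry") the summary function
    return [[summarize(track, region) for track in tracks] for region in regions]


def avg(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


tracks = [
    {"chrI": [0, 1, 2, 3, 4, 5, 6, 7]},
    {"chrI": [1, 1, 1, 1, 2, 2, 2, 2]},
]
promoters = [("chrI", 0, 4), ("chrI", 4, 8)]
print(view_matrix(tracks, promoters, avg))  # [[1.5, 1.0], [5.5, 2.0]]
print(view_matrix(tracks, promoters, sum))  # [[6, 4], [22, 8]]
```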
>
> This worked, but in retrospect, I think my need would have been better
> served by serializing the results for each bigWig, saving them in a
> corresponding file.
>
> Why?
>
> Because I now find myself needing to slice and dice different subsets of
> samples and assays, and also to add in new samples from more recent
> experiments.
>
> The one thing that has NOT changed is the list of regions (in my case,
> gene promoters).
>
> If I instead adopt an approach where I create one serialized (.RDS or
> .csv) file per bigWig storing the tabulation of my summary function at the
> promoter level, I can easily load just the ones I need, in any combination,
> for a later analysis. I can even load them into a multi-dimensional array,
> indexed by, say, [sample, assay].
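A minimal sketch of the one-file-per-bigWig approach, assuming CSV (the .RDS option is R-specific): each (sample, assay) gets its own per-promoter table, and any subset can later be stacked into `arr[sample][assay][promoter]`. The helper names, file naming scheme, sample labels, promoter IDs, and values are hypothetical toy data; only the assay labels come from the message above.

```python
import csv
import os
import tempfile
from typing import Dict, List, Sequence, Tuple


def write_summary(path: str, promoters: Sequence[str], values: Sequence[float]) -> None:
    """Write one bigWig's per-promoter summaries as a two-column CSV."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["promoter", "value"])
        writer.writerows(zip(promoters, values))


def read_summary(path: str) -> Dict[str, float]:
    with open(path, newline="") as fh:
        return {row["promoter"]: float(row["value"]) for row in csv.DictReader(fh)}


def load_array(paths: Dict[Tuple[str, str], str], samples: Sequence[str],
               assays: Sequence[str], promoters: Sequence[str]) -> List[List[List[float]]]:
    """Load only the requested files into arr[sample][assay][promoter]."""
    arr = []
    for s in samples:
        per_assay = []
        for a in assays:
            table = read_summary(paths[(s, a)])               # one read per (sample, assay)
            per_assay.append([table[p] for p in promoters])   # fixed promoter order
        arr.append(per_assay)
    return arr


promoters = ["p1", "p2", "p3"]  # stand-ins for the fixed promoter list
outdir = tempfile.mkdtemp()
toy = {
    ("wt", "H3K3ME1"): [1.0, 0.0, 2.0],
    ("wt", "H3K3ME3"): [4.0, 1.0, 0.5],
    ("ko1", "H3K3ME3"): [0.5, 1.0, 0.0],
}
paths = {}
for (sample, assay), values in toy.items():
    paths[(sample, assay)] = os.path.join(outdir, f"{sample}_{assay}.csv")
    write_summary(paths[(sample, assay)], promoters, values)

arr = load_array(paths, ["wt", "ko1"], ["H3K3ME3"], promoters)
print(arr)  # [[[4.0, 1.0, 0.5]], [[0.5, 1.0, 0.0]]]
```

Because the promoter list never changes, adding a new sample means writing one more file; nothing already on disk is recomputed.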
>
> What I think I want now, rather, is some interface to a multi-dimensional
> (virtual) array, where two of the dimensions (say, assay and sample)
> determine the file path containing the third dimension, this being the
> computed and serialized value at each promoter.
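One way to read the "virtual array" idea, sketched in Python under stated assumptions: indexing `[sample, assay]` resolves to a file path, and the promoter dimension is the content of that file, read lazily on first access and cached. The class name, the `<root>/<assay>/<sample>.json` layout, and the JSON encoding are hypothetical choices for illustration.

```python
import json
import os
import tempfile
from typing import Dict, List, Sequence, Tuple


class VirtualSummaryArray:
    """Hypothetical lazy 3-D array indexed [sample, assay, promoter].

    (sample, assay) resolves to a file path; the promoter dimension lives inside
    that file as a JSON list in a fixed promoter order. Files are read on first
    access and cached.
    """

    def __init__(self, root: str, promoters: Sequence[str]):
        self.root = root
        self.promoters = list(promoters)
        self._pos = {p: i for i, p in enumerate(self.promoters)}
        self._cache: Dict[Tuple[str, str], List[float]] = {}

    def path(self, sample: str, assay: str) -> str:
        # Assumed on-disk layout: <root>/<assay>/<sample>.json
        return os.path.join(self.root, assay, f"{sample}.json")

    def store(self, sample: str, assay: str, values: Sequence[float]) -> None:
        """Add (or replace) one slice, e.g. a newly processed sample."""
        if len(values) != len(self.promoters):
            raise ValueError("expected one value per promoter")
        path = self.path(sample, assay)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as fh:
            json.dump(list(values), fh)
        self._cache.pop((sample, assay), None)  # drop any stale cached copy

    def _load(self, sample: str, assay: str) -> List[float]:
        key = (sample, assay)
        if key not in self._cache:
            with open(self.path(sample, assay)) as fh:
                self._cache[key] = json.load(fh)
        return self._cache[key]

    def __getitem__(self, key):
        if len(key) == 2:                  # arr[sample, assay] -> all promoters
            return self._load(*key)
        sample, assay, promoter = key      # arr[sample, assay, promoter] -> scalar
        return self._load(sample, assay)[self._pos[promoter]]


arr = VirtualSummaryArray(tempfile.mkdtemp(), ["p1", "p2", "p3"])
arr.store("wt", "H3K3ME3", [4.0, 1.0, 0.5])
arr.store("ko1", "H3K3ME3", [0.5, 1.0, 0.0])
print(arr["ko1", "H3K3ME3"])       # [0.5, 1.0, 0.0]
print(arr["wt", "H3K3ME3", "p3"])  # 0.5
```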
>
> Your mileage may vary.
>
> ~ Malcolm
>
>  >
>  >Retrieving the data for a genomic range is efficient; doing this for
>  >thousands of samples might get tricky, but could probably be vectorized
>  >through clever use of matrices. But millions of regions by thousands of
>  >samples might need some support in native code, along the lines of
>  >viewSums, etc., but iterating over the bigWigs directly. Maybe you guys
>  >could implement something in R and then we could profile and optimize it.
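The message leaves the vectorization strategy open. One standard trick for "many regions by many samples" sums (not necessarily what viewSums does internally) is a prefix-sum array per sample: scan each track once, then every region/sample cell is a single subtraction. A minimal Python sketch with toy data; `region_sums` is a hypothetical name, and `accumulate(..., initial=0)` needs Python 3.8+.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def region_sums(coverage: Sequence[Sequence[float]],
                regions: Sequence[Tuple[int, int]]) -> List[List[float]]:
    """sums[r][j] = total coverage of sample j over region r.

    coverage[j] is sample j's per-base track for one chromosome; regions are
    0-based half-open (start, end). Each track is scanned once to build a
    prefix-sum array, after which every region/sample cell costs O(1).
    """
    # cs[i] == sum(track[:i]); len(cs) == len(track) + 1
    cums = [list(accumulate(track, initial=0)) for track in coverage]
    return [[cs[end] - cs[start] for cs in cums] for start, end in regions]


coverage = [
    [0, 1, 2, 3, 4, 5, 6, 7],  # sample 1
    [1, 1, 1, 1, 2, 2, 2, 2],  # sample 2
]
regions = [(0, 4), (4, 8), (2, 6)]
print(region_sums(coverage, regions))  # [[6, 4], [22, 8], [14, 6]]
```

The cost is one pass per track plus one operation per cell, instead of re-summing every region for every sample; a native implementation would apply the same idea over run-length data.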
>  >
>  >
>  >On Mon, Nov 18, 2013 at 4:33 PM, Kasper Daniel Hansen <
>  >kasperdanielhan...@gmail.com> wrote:
>  >
>  >> (Michael Love and I had some discussion on this Friday)
>  >>
>  >> I also think it would be a very convenient class/method. A lot of data
>  >> these days are naturally represented (and are available from, say, GEO)
>  >> as bigWig files (essentially coverage tracks), for example ChIP-seq. This
>  >> would be much more efficient than converting BAM to coverage on the fly.
>  >>
>  >> It seems to me that bigWig ought to be efficient for this, but I am not
>  >> very familiar with its performance. What we want is really to be able to
>  >> chunk multiple coverage profiles over the genome, and do computations on
>  >> each of the chunks. Any idea on efficiency? I am happy to contribute a
>  >> bit, at least with design.
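The chunking described here (walk the genome in windows, pull the same window from every sample, compute, move on) can be outlined as a generator, so only one window's worth of data across all samples is held at a time. A Python sketch with toy in-memory data; `genome_chunks`, `stream_chunks`, and the caller-supplied `read_region` are hypothetical stand-ins for a real bigWig reader.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

Chunk = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def genome_chunks(chrom_lengths: Dict[str, int], chunk_size: int) -> Iterator[Chunk]:
    """Tile each chromosome into consecutive windows of at most chunk_size bases."""
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, chunk_size):
            yield chrom, start, min(start + chunk_size, length)


def stream_chunks(read_region: Callable[[str, str, int, int], List[float]],
                  samples: Sequence[str], chrom_lengths: Dict[str, int],
                  chunk_size: int, fn: Callable[[List[List[float]]], object]):
    """Yield (chunk, fn(block)) where block[j] is sample j's coverage in the chunk."""
    for chrom, start, end in genome_chunks(chrom_lengths, chunk_size):
        block = [read_region(s, chrom, start, end) for s in samples]
        yield (chrom, start, end), fn(block)


# Toy in-memory "files" standing in for bigWigs.
data = {"s1": {"chrI": [1, 2, 3, 4, 5]}, "s2": {"chrI": [0, 0, 1, 1, 1]}}


def read(sample: str, chrom: str, start: int, end: int) -> List[float]:
    return data[sample][chrom][start:end]


def per_base_total(block: List[List[float]]) -> List[float]:
    return [sum(col) for col in zip(*block)]


for chunk, result in stream_chunks(read, ["s1", "s2"], {"chrI": 5}, 2, per_base_total):
    print(chunk, result)
# ('chrI', 0, 2) [1, 2]
# ('chrI', 2, 4) [4, 5]
# ('chrI', 4, 5) [6]
```

The chunk size trades memory against the number of reads per file; the per-chunk function sees a samples-by-bases block and can compute whatever cross-sample statistic is needed.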
>  >>
>  >> Best,
>  >> Kasper
>  >>
>  >>
>  >> On Mon, Nov 18, 2013 at 6:11 PM, Michael Lawrence <
>  >> lawrence.mich...@gene.com> wrote:
>  >>
>  >>> Aggregating coverage over multiple samples has been a popular request
>  >>> recently. I'm happy to support this effort, but I think someone in
>  >>> Seattle is going to have to take the lead on it.
>  >>>
>  >>>
>  >>> On Mon, Nov 18, 2013 at 2:36 PM, Michael Love
>  >>> <michaelisaiahl...@gmail.com> wrote:
>  >>>
>  >>> > a discussion came up on devel last year about looking at a genomic
>  >>> > range over multiple samples and multiple experiments
>  >>> > (https://stat.ethz.ch/pipermail/bioc-devel/attachments/20120920/93a4fb61/attachment.pl).
>  >>> >
>  >>> > setting aside the multiple-experiment part, I'm interested in
>  >>> > BigWigViews() with fixed ranges across samples. Have there been any
>  >>> > more thoughts in this direction?
>  >>> >
>  >>> > BigWigViews would be incredibly useful for genomics applications where
>  >>> > we want to scan along the genome looking at lots of samples. BigWig
>  >>> > offers a concise representation of the information compared to BAM
>  >>> > files.
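To make the "concise representation" point concrete: a bigWig holds a coverage track as compressed intervals carrying a value (plus precomputed zoom-level summaries), rather than one record per read as in BAM, so a run of identical per-base values costs a single record. The Python sketch below shows only the run-length idea behind that, not the bigWig file format; `to_intervals` is a hypothetical name and the coverage vector is toy data.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def to_intervals(coverage: Sequence[float], drop_zero: bool = True) -> List[Tuple[int, int, float]]:
    """Collapse per-base coverage into (start, end, value) runs, 0-based half-open."""
    intervals = []
    pos = 0
    for value, run in groupby(coverage):
        length = sum(1 for _ in run)
        if not (drop_zero and value == 0):  # uncovered stretches need no record
            intervals.append((pos, pos + length, value))
        pos += length
    return intervals


coverage = [0, 0, 3, 3, 3, 1, 1, 0, 0, 0]
print(to_intervals(coverage))  # [(2, 5, 3), (5, 7, 1)]
```

Ten bases collapse to two records here; for typical ChIP-seq coverage, long flat or empty stretches make the savings much larger.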
>  >>> >
>  >>> > What I am trying now is using import(BigWigFile, which=gr) on files
>  >>> > one by one, and then binding the coverage together.
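In outline, the import-then-bind loop described here expands each file's data over the query region separately and then binds the results as columns. A Python sketch under stated assumptions: the interval lists are toy stand-ins for the ranges and scores each `import()` call returns, `dense_in_region` and `bind_coverage` are hypothetical names, and a real version would keep the data run-length encoded (as Rle) rather than expanding it base by base.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[int, int, float]  # (start, end, value), 0-based half-open


def dense_in_region(intervals: Sequence[Interval], start: int, end: int) -> List[float]:
    """Expand one file's intervals to per-base values over [start, end); gaps are 0."""
    out = [0.0] * (end - start)
    for s, e, v in intervals:
        for pos in range(max(s, start), min(e, end)):  # empty range if no overlap
            out[pos - start] = v
    return out


def bind_coverage(per_file: Dict[str, Sequence[Interval]], start: int, end: int):
    """Expand each file separately, then bind as columns.

    Returns (names, rows) where rows[i][j] is file j's value at base start + i.
    """
    names = list(per_file)
    columns = [dense_in_region(per_file[name], start, end) for name in names]
    rows = [list(r) for r in zip(*columns)]
    return names, rows


per_file = {
    "a.bw": [(2, 5, 3.0), (5, 7, 1.0)],
    "b.bw": [(0, 4, 2.0)],
}
names, rows = bind_coverage(per_file, 3, 6)
print(names)  # ['a.bw', 'b.bw']
print(rows)   # [[3.0, 2.0], [3.0, 0.0], [1.0, 0.0]]
```

Each file is touched once per region, which is why this pattern gets slow with thousands of files and motivates reading many files per region inside a single views object.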
>  >>> >
>  >>> > best,
>  >>> >
>  >>> > Mike
>  >>> >
>  >>> > _______________________________________________
>  >>> > Bioc-devel@r-project.org mailing list
>  >>> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>  >>> >
>  >>>
>  >>>
>  >>
>  >>
>  >
>
>
