Hi Herve, I think you read my mind. However, now I have managed to come to a slightly different bump. I don't want to use averaged values, per se (I think there's a UCSC tool of some sort that would do that), but rather I want to let Gviz or trackViewer plot both the individiual data points and also the aggregate per-group statistics for them. So I followed your instructions (as I understood them), and did
##... chr <- unique(seqnames(theSelection)) dat <- lapply(lapply(bigWigs, import, as='RleList', genome=theGenome, selection=theSelection), `[`, chr) ##... and I get back a lovely list of Rles (of RPMs) as expected: $tAML01 RleList of length 1 $chr5 numeric-Rle of length 180915260 with 1 run Lengths: 180915260 Values : 0 $tAML03 RleList of length 1 $chr5 numeric-Rle of length 180915260 with 6 runs Lengths: 3599672 63 ... 177315407 Values : 0 0.0348724015057087 ... 0 $tAML05 RleList of length 1 $chr5 numeric-Rle of length 180915260 with 7 runs Lengths: 3599651 40 ... 177314180 Values : 0 0.0769871026277542 ... 0 But when I try to combine them into something I can hand over to Gviz, bad things happen: > do.call(cbind, dat) CD34_011113-1 CD34_012413-2 CD34_021513-1 CD34_031913-1 CD34_041113-1 [1,] ? ? ? ? ? CD34_050213-1 dnAML19 dnAML20 dnAML21 dnAML35 dnAML36 dnAML44 dnAML53 [1,] ? ? ? ? ? ? ? ? dnAML56 dnAML65 dnAML72 dnAML-TCGA-AB-2820 dnAML-TCGA-AB-2838 [1,] ? ? ? ? ? dnAML-TCGA-AB-3000 dnAML-TCGA-AB-3007 dnAML-TCGA-AB-3008 [1,] ? ? ? dnAML-TCGA-AB-3009 tAML01 tAML03 tAML05 tAML12 tAML16 tAML68 tAML_MrT [1,] ? ? ? ? ? ? ? ? So perhaps instead of Rles, I should be using tileGenome() to retrieve binned counts? I suppose it won't hurt me to reduce the granularity to, say, 50bp or even 200bp bins, but if I can get base resolution I'd be ecstatic. Anyways, thanks for the pointers. I think I am at least on the right track now. The remark about SplicingGraphs arose via: http://search.bioconductor.jp/codes/13406 Where I noted that some of the code goes (or appears to go) back and forth from GRanges[List] <-> RleList. But that does not mean I have any clue what the One True Way to do any of these things might be. Also, my use case might just be weird. It wouldn't be the first time. Thanks again, --t Statistics is the grammar of science. Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science> On Tue, Apr 1, 2014 at 3:32 PM, Hervé Pagès <hpa...@fhcrc.org> wrote: > Hi Tim, > > There is probably too much guess work for me to really be able to > help... However, and FWIW, in Bioc-devel the 'asRle' argument of > import() has been replaced by the 'as' argument and it can be set > to "GRanges", "RleList", or "NumericList". Be aware that, surprisingly, > if you specify a 'selection', then using as="RleList" vs > as="NumericList" not only changes the returned type but also the semantic > of the function: the returned List object has 1 list element > per chromosome for the former and 1 list element per range in > 'selection' for the latter. > > (IMO it would probably be less confusing and less error-prone if > switching between these 2 semantics was decoupled from choosing > the returned type.) > > Anyway, in your case maybe you want to use as="RleList". > > Then for averaging the score over each of the range in your initial > 'selection", maybe you'll find the examples section of the tileGenome() > function helpful (this is in the GenomicRanges package). > > Am I on the right track with this? > > Cheers, > H. > > > On 04/01/2014 10:40 AM, Tim Triche, Jr. wrote: > >> Hi all, >> >> The following is tangentially related, but hopefully the answer will be >> useful to others (both directly and via my package, which prompts this)... >> >> Suppose I do this: >> >> dat <- GRangesList( >> lapply( bigWigFileNames, import, >> selection=someRanges ) ) >> >> Now I have a GRangesList of values, some of which have 0 ranges, some of >> which may have hundreds or thousands. I would like to aggregate and smooth >> over various subgroups of these for Gviz/trackViewer plots, so I was >> thinking of getting an RleList or similar out of the mcols() from each of >> the GRangesList atoms. >> >> However, >> >> 1) What is the "right" (fast, idiomatic, future-safe) way to extract and >> combine these ranges into columns of Rles might be. I found something >> not-dissimilar in SpliceGraphs (or was it spliceGrapher?) but I imagine >> this is a common operation with some efficient "one true way" to do it. >> >> 2) should I be sucking these into a genoset or SummarizedExperiment >> instead? I'll take the hit if I have to, once, but I don't want it to eat >> up all available RAM since I eventually wish to make the plotting process >> at least somewhat interactive (even if that means calling IGV or >> interactiveDisplay to do it). >> >> Thanks for any guidance you may have to offer, >> >> --t >> >> On Apr 1, 2014, at 7:21 AM, Michael Lawrence <lawrence.mich...@gene.com> >>> wrote: >>> >>> Mostly to Herve: >>> >>> Sometimes we want to pluck the first 1, or 10, or whatever elements from >>> each element of a list. If I had a list 'x', I thought I could do this >>> with: >>> >>> x[IntegerList(1:5)] >>> >>> But it only gives elements 1:5 from x[[1]], not each element of 'x'. In >>> other words, I thought the index would be repped out. Instead, 'x' is >>> subset to the length of 'i', and I'm not sure if that makes sense? >>> >>> But maybe what we really want are pluckHead/Tail, which would be robust >>> to >>> the case that < n elements are in an element. And of course a more >>> general >>> pluck(x, i) to select 'i' from each element, but I wanted the line above >>> to >>> do that. >>> >>> Michael >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioc-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/bioc-devel >>> >> >> _______________________________________________ >> Bioc-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/bioc-devel >> >> > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpa...@fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > [[alternative HTML version deleted]]
_______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel