Thank you all very much!

Jesper

On Wed, Apr 1, 2015 at 9:54 PM, Martin Morgan <mtmor...@fredhutch.org> wrote:

> On 04/01/2015 07:07 AM, Martin Morgan wrote:
>
>> On 04/01/2015 05:08 AM, Michael Lawrence wrote:
>>
>>> It would be nice if someone from Seattle would weigh in on this.
>>
>> I was hoping to weigh in with 'it's done' but will instead with 'it will
>> be done'.
>
> 4-dimensional assays, advisable or otherwise, are available in
> GenomicRanges 1.19.49. Thanks for your patience, and for the discussion.
>
> Martin
>
>> A second aspect of Jesper's data that took me a little by surprise and is
>> related to Michael's comment below was that assays() can simultaneously
>> hold arrays of 2, 3, (and 4) dimensions.
>>
>> Martin
>>
>>> Also, we might want to consider an assayMatrix() accessor that always
>>> returns an assay in 2D, except, as you suggest, it might be a matrix of
>>> multiples (vectors, matrices, etc) by putting dimensions on a list. That
>>> way, generic code can at least assume consistent dimensionality, even if
>>> the values are complex. I don't really have any use cases though; just
>>> seems possibly beneficial in the abstract.
>>>
>>> On Wed, Apr 1, 2015 at 1:19 AM, Jesper Gådin <jesper.ga...@gmail.com>
>>> wrote:
>>>
>>>> Hi Wolfgang and Michael,
>>>>
>>>> As Michael says, there is no redundant information in the 4D array I
>>>> have, and all the values are integers.
>>>>
>>>> Of course I can simulate 4D by e.g. creating extra 3D arrays as assays
>>>> equal to the length of the fourth dimension, but that makes the assay
>>>> list a mess. It would also require me to write accessor functions that
>>>> transform the data into 4D before subsequent calculations (or to use a
>>>> for loop..).
>>>>
>>>> Another option would be to include the 4D as a multiple in the 3D, which
>>>> would not require a later transformation into 4D.
>>>> If I understood correctly, the array is just a long vector, which is
>>>> indexed into different dimensions, and so everything in an SE object
>>>> could as well be written as 2D. But (my belief is that) it is actually
>>>> convenient to use the properties of dimensions for arrays.
>>>>
>>>> So if there is not a problem extending to 4D, I would be extremely
>>>> grateful if you could take a look at it. :)
>>>>
>>>> Regards,
>>>> Jesper
>>>>
>>>> On Tue, Mar 31, 2015 at 2:16 PM, Michael Lawrence
>>>> <lawrence.mich...@gene.com> wrote:
>>>>
>>>>> One would need a long-form colData that aligns with the array.
>>>>>
>>>>> But now I realize that's not what Jesper wants to do here, and is not
>>>>> how SE is currently designed. Jesper is using the third (and now
>>>>> fourth) dimension to store an additional dimension of information
>>>>> about the same sample. We already support 3D arrays for this,
>>>>> presumably motivated by VCF, where, for example, each sample can have
>>>>> a probability for WT, het, or hom at each position. In that case, all
>>>>> of the values are genotype likelihoods, i.e., they all measure the
>>>>> same thing, so they seem to belong in the same assay. But they're also
>>>>> the same biological "sample". Essentially, we have complex
>>>>> measurements that might be a vector, or for Jesper even a matrix.
>>>>>
>>>>> The important question for interoperability is whether we want there
>>>>> to be a contract that assays are always two dimensions. I guess we've
>>>>> already violated that with VCF. Extending to a fourth is not really
>>>>> hurting anything.
>>>>>
>>>>> On Tue, Mar 31, 2015 at 4:52 AM, Wolfgang Huber <whu...@embl.de>
>>>>> wrote:
>>>>>
>>>>>> Hi Michael
>>>>>>
>>>>>> where would you put the “colData”-style metadata for the 3rd, 4th, …
>>>>>> dimensions?
>>>>>>
>>>>>> As an (ex-)physicist of course I like arrays, and the more dimensions
>>>>>> the better, but in practical work I’ve consistently been bitten by the
>>>>>> rigidity of such a design choice too early in a process.
>>>>>>
>>>>>> Wolfgang
>>>>>>
>>>>>> On 31 Mar 2015, at 13:32, Michael Lawrence
>>>>>> <lawrence.mich...@gene.com> wrote:
>>>>>>
>>>>>> Taken in the abstract, the tidy data argument is one for consistent
>>>>>> data structures that enable interoperability, which is what we have
>>>>>> with SummarizedExperiment. The "long form" or "tidy" data frame is an
>>>>>> effective general representation, but if there is additional
>>>>>> structure in your data, why not represent it formally? Given the way
>>>>>> R lays out the data in arrays, it should be possible to add that
>>>>>> fourth dimension, in an assay array, while still using the colData to
>>>>>> annotate that structure. It does not make the data any less "tidy",
>>>>>> but it does make it more structured.
>>>>>>
>>>>>> On Tue, Mar 31, 2015 at 4:14 AM, Wolfgang Huber <whu...@embl.de>
>>>>>> wrote:
>>>>>>
>>>>>>> Dear Jesper
>>>>>>>
>>>>>>> this is maybe not the answer you want to hear, but stuffing in 4, 5,
>>>>>>> … dimensions may not be all that useful, as you can always roll out
>>>>>>> these higher dimensions into the existing third (or even into the
>>>>>>> second, the SummarizedExperiment columns). There is Hadley’s concept
>>>>>>> of “tidy data” (see e.g. http://www.jstatsoft.org/v59/i10, a paper
>>>>>>> that is really worthwhile to read), which implies that the tidy way
>>>>>>> forward is to stay with 2 (or maybe 3) dimensions in
>>>>>>> SummarizedExperiment, and to record the information that you’d
>>>>>>> otherwise stuff into the higher dimensions in the colData covariates.
>>>>>>>
>>>>>>> Wolfgang
>>>>>>>
>>>>>>> Wolfgang Huber
>>>>>>> Principal Investigator, EMBL Senior Scientist
>>>>>>> Genome Biology Unit
>>>>>>> European Molecular Biology Laboratory (EMBL)
>>>>>>> Heidelberg, Germany
>>>>>>>
>>>>>>> T +49-6221-3878823
>>>>>>> wolfgang.hu...@embl.de
>>>>>>> http://www.huber.embl.de
>>>>>>>
>>>>>>> On 30 Mar 2015, at 12:38, Jesper Gådin <jesper.ga...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> The SummarizedExperiment class is an extremely powerful container for
>>>>>>>> biological data (thank you!), and all my thinking nowadays is just
>>>>>>>> circling around how to stuff it as effectively as possible.
>>>>>>>>
>>>>>>>> I have been using 3 dimensions for a long time, which has been very
>>>>>>>> successful. Now I also have a case for using 4 dimensions. Everything
>>>>>>>> seemed to work as expected until I tried to subset my object, see
>>>>>>>> example.
>>>>>>>>
>>>>>>>> library(GenomicRanges)
>>>>>>>>
>>>>>>>> rowRanges <- GRanges(
>>>>>>>>     seqnames="chrx",
>>>>>>>>     ranges=IRanges(start=1:3,end=4:6),
>>>>>>>>     strand="*"
>>>>>>>> )
>>>>>>>>
>>>>>>>> coldata <- DataFrame(row.names=paste("s",1:3, sep=""))
>>>>>>>>
>>>>>>>> assays <- SimpleList()
>>>>>>>>
>>>>>>>> #two dim
>>>>>>>> assays[["dim2"]] <- array(0,dim=c(3,3))
>>>>>>>> se <- SummarizedExperiment(assays, rowRanges = rowRanges, colData=coldata)
>>>>>>>> se[1]
>>>>>>>> #works
>>>>>>>>
>>>>>>>> #three dim
>>>>>>>> assays[["dim3"]] <- array(0,dim=c(3,3,3))
>>>>>>>> se <- SummarizedExperiment(assays, rowRanges = rowRanges, colData=coldata)
>>>>>>>> se[1]
>>>>>>>> #works
>>>>>>>>
>>>>>>>> #four dim
>>>>>>>> assays[["dim4"]] <- array(0,dim=c(3,3,3,3))
>>>>>>>> se <- SummarizedExperiment(assays, rowRanges = rowRanges, colData=coldata)
>>>>>>>> se[1]
>>>>>>>> #does not work
>>>>>>>> #Error in x[i, , , drop = FALSE] : incorrect number of dimensions
>>>>>>>>
>>>>>>>> This is also the case for rbind and cbind. Would it be appropriate to
>>>>>>>> ask you to update the SE functions to handle subset, rbind, cbind also
>>>>>>>> for 4 dimensions? I know the time for next release is very soon, so
>>>>>>>> maybe it is better to wait until after April 16. Just let me know your
>>>>>>>> thoughts about it.
>>>>>>>>
>>>>>>>> Jesper
>>>>>>>>
>>>>>>>>         [[alternative HTML version deleted]]
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Bioc-devel@r-project.org mailing list
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
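
The error quoted in the original report (`x[i, , , drop = FALSE]`) suggests a subsetting call written with a fixed number of index positions, which fits 3D arrays but not 4D ones. One possible way to subset the first dimension regardless of how many dimensions an array has is to build the index list from `length(dim(x))`. The sketch below uses base R only; `subset_first_dim` is a hypothetical helper introduced for illustration, and it is not taken from GenomicRanges, whose actual implementation in 1.19.49 may well differ. It also shows the point made in the thread that an array is a plain vector with a `dim` attribute.

```r
# Hypothetical helper (not part of GenomicRanges): subset the first
# dimension of an array with any number of dimensions, keeping all of them.
subset_first_dim <- function(x, i) {
  n_dim <- length(dim(x))
  stopifnot(n_dim >= 1L)
  # TRUE as an index selects everything along that dimension, so this
  # builds the equivalent of x[i, TRUE, ..., TRUE, drop = FALSE].
  index <- c(list(i), rep(list(TRUE), n_dim - 1L))
  do.call(`[`, c(list(x), index, list(drop = FALSE)))
}

# An array is a plain vector plus a "dim" attribute.
v <- seq_len(81)
x4 <- v
dim(x4) <- c(3, 3, 3, 3)

# The same call works for 2, 3 and 4 dimensions.
rows_2d <- subset_first_dim(array(0, dim = c(3, 3)), 1)
rows_3d <- subset_first_dim(array(0, dim = c(3, 3, 3)), 1)
rows_4d <- subset_first_dim(x4, 1)

print(dim(rows_2d))
print(dim(rows_3d))
print(dim(rows_4d))
```

Using `TRUE` for the remaining dimensions avoids constructing empty arguments by hand; the result for the 4D case matches writing `x4[1, , , , drop = FALSE]` explicitly.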