You can just implement this by having reserved column names in the colData slot; that will work and will take approximately 23 seconds to implement. I agree it is not as clean from a design perspective, but you get 100% of the functionality, and you can write a separate checker for the colData argument.
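A minimal sketch of the reserved-column approach, assuming two hypothetical reserved names, `propAligned` and `totalDuplicates` (modelled on the quality measures Davide lists below). `checkColData()` and `qualityScores()` are illustrative helpers and not part of SummarizedExperiment. The sketch uses only base R. Because it relies only on `names()`, `[[` and `[ , , drop = FALSE]`, it should also work on the S4Vectors DataFrame returned by `colData(se)`.

```r
## Hypothetical reserved names for the numeric quality columns; this is a
## convention assumed for the sketch, not a Bioconductor API.
qualityColumns <- c("propAligned", "totalDuplicates")

## Validate a colData-like table (data.frame or S4Vectors DataFrame):
## every reserved column must exist and be numeric.
checkColData <- function(cd, reserved = qualityColumns) {
    missingCols <- setdiff(reserved, names(cd))
    if (length(missingCols) > 0L)
        stop("missing quality column(s): ", paste(missingCols, collapse = ", "))
    isNum <- vapply(reserved, function(nm) is.numeric(cd[[nm]]), logical(1))
    if (!all(isNum))
        stop("non-numeric quality column(s): ",
             paste(reserved[!isNum], collapse = ", "))
    invisible(TRUE)
}

## Return only the quality scores (e.g. for plotting or QC),
## validating them first.
qualityScores <- function(cd, reserved = qualityColumns) {
    checkColData(cd, reserved)
    cd[, reserved, drop = FALSE]
}

## Toy sample table; the values are made up for illustration.
cd <- data.frame(batch = c("b1", "b1", "b2"),
                 propAligned = c(0.91, 0.88, 0.93),
                 totalDuplicates = c(1200, 950, 1430),
                 row.names = c("s1", "s2", "s3"))
qualityScores(cd)
```

With a real object, the call would presumably be `qualityScores(colData(se))`. The same checker could be run on the colData argument before the SummarizedExperiment is constructed.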
On Thu, Jun 18, 2015 at 2:00 PM, davide risso <risso.dav...@gmail.com> wrote:

> Thank you all for the responses.
>
> I didn't think about the nested DataFrame solution. It should work.
> I agree that an extension might be cleaner, but I clearly need to give it
> more thought.
>
> One of the reasons I wanted to have quality and metadata as separate slots
> is that one could enforce that all the qualities are numeric, and have a
> quality() method to extract just the quality scores (e.g., for plotting /
> quality control). Having them in the same slot makes it harder for the user
> to extract just the scores (if the column order and/or names are not
> standardized).
>
> Best,
> davide
>
>
> On Thu, Jun 18, 2015 at 6:35 AM Vincent Carey <st...@channing.harvard.edu>
> wrote:
>
>> Yes, if a formal extension is warranted. The metadata slot could also be
>> used.
>>
>> On Thu, Jun 18, 2015 at 2:59 PM, Kasper Daniel Hansen <
>> kasperdanielhan...@gmail.com> wrote:
>>
>> > I think the cleaner solution for Davide (if he insists on having
>> > separate objects; I decided against it in minfi) is to extend the class
>> > to allow this.
>> >
>> > Kasper
>> >
>> > On Thu, Jun 18, 2015 at 12:25 AM, Ryan <r...@thompsonclan.org> wrote:
>> >
>> > > Oh wow, I didn't know you could put a DataFrame into a single column
>> > > of another DataFrame. That actually solves a problem for me too (I
>> > > don't intend to expose nested DataFrames to the users, though).
>> > >
>> > >
>> > > On 6/17/15 7:23 PM, Martin Morgan wrote:
>> > >
>> > >> On 06/17/2015 11:41 AM, davide risso wrote:
>> > >>
>> > >>> Dear list,
>> > >>>
>> > >>> I'm creating an R package to store RNA-seq data of a somewhat large
>> > >>> project in which I'm involved.
>> > >>>
>> > >>> One of the initial goals is to compare different pre-processing
>> > >>> pipelines, hence I have multiple expression matrices corresponding to
>> > >>> the same samples.
>> > >>>
>> > >>> The SummarizedExperiment class seems a good candidate, since I have
>> > >>> multiple expression matrices with the same rowData and colData
>> > >>> information.
>> > >>>
>> > >>> I have several sample-specific variables that I want to store with
>> > >>> the object, namely, experimental information (e.g., batch, date,
>> > >>> experimental condition, ...) and sample quality (e.g., proportion of
>> > >>> aligned reads, total duplicate reads, etc.).
>> > >>>
>> > >>> Of course, I can always create one big data frame concatenating the
>> > >>> two (experimental info + sample quality), but it seems that both
>> > >>> conceptually and practically, it might be useful to have two separate
>> > >>> data frames. Since this seems to be a fairly standard type of
>> > >>> information that one would want to carry along, I was wondering if it
>> > >>> would be possible / useful to allow the user to have multiple
>> > >>> data.frames in the colData slot of SummarizedExperiment.
>> > >>
>> > >> Actually, colData() is a DataFrame, and a DataFrame column can contain
>> > >> a DataFrame. So after
>> > >>
>> > >>   example(SummarizedExperiment)
>> > >>
>> > >> we could make some faux sample quality data
>> > >>
>> > >>   quality = DataFrame(x=1:6, y=6:1, row.names=colnames(se1))
>> > >>
>> > >> add this as a column in the colData()
>> > >>
>> > >>   colData(se1)$quality = quality
>> > >>
>> > >> (or create the SummarizedExperiment from a similar DataFrame up-front)
>> > >> and manage our grouped data
>> > >>
>> > >>   > colData(se1)
>> > >>   DataFrame with 6 rows and 2 columns
>> > >>       Treatment     quality
>> > >>     <character> <DataFrame>
>> > >>   A        ChIP    ########
>> > >>   B       Input    ########
>> > >>   C        ChIP    ########
>> > >>   D       Input    ########
>> > >>   E        ChIP    ########
>> > >>   F       Input    ########
>> > >>   > colData(se1[,1:2])$quality
>> > >>   DataFrame with 2 rows and 2 columns
>> > >>             x         y
>> > >>     <integer> <integer>
>> > >>   A         1         6
>> > >>   B         2         5
>> > >>
>> > >> I'm not sure that this is any less confusing to the end user than
>> > >> having to manage a DataFrameList(), but it does not require any new
>> > >> features.
>> > >>
>> > >> Martin
>> > >>>
>> > >>> Best,
>> > >>> Davide
>> > >>>
>> > >>> _______________________________________________
>> > >>> Bioc-devel@r-project.org mailing list
>> > >>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
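Martin's nested-DataFrame suggestion can also give Davide the quality() extractor he asks for, one that insists the scores are numeric. A rough sketch follows. It assumes the Bioconductor SummarizedExperiment package is installed. The toy assay and sample names are hypothetical, the nested column is named `quality` as in Martin's example, and `quality()` is an illustrative helper rather than an existing method.

```r
suppressPackageStartupMessages(library(SummarizedExperiment))

## Toy object: 2 features x 6 samples (hypothetical data).
samples <- LETTERS[1:6]
counts <- matrix(1:12, nrow = 2, dimnames = list(c("g1", "g2"), samples))
se <- SummarizedExperiment(
    assays = list(counts = counts),
    colData = DataFrame(Treatment = rep(c("ChIP", "Input"), 3),
                        row.names = samples))

## Store the quality measures as one nested DataFrame column,
## as in Martin's example.
colData(se)$quality <- DataFrame(x = 1:6, y = 6:1, row.names = colnames(se))

## Hypothetical accessor: return the nested quality table after checking
## that it exists and that every column in it is numeric.
quality <- function(se) {
    q <- colData(se)$quality
    if (!is(q, "DataFrame"))
        stop("colData(se)$quality is missing or not a DataFrame")
    isNum <- vapply(as.list(q), is.numeric, logical(1))
    if (!all(isNum))
        stop("non-numeric quality column(s): ",
             paste(names(q)[!isNum], collapse = ", "))
    q
}

## Subsetting the samples of se carries the nested table along.
quality(se[, 1:2])
```

Because the quality table lives inside colData, it is subset together with the samples. This is the main advantage over storing it in the metadata slot, which is not subset by column.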