Re: [Bioc-devel] SummarizedExperiment vs ExpressionSet

Hervé Pagès Wed, 26 Nov 2014 12:12:39 -0800

Hi guys,

I like the idea of separating the row data from the row ranges.
This could be formalized with 2 distinct accessors: rowData() and
rowRanges(). The former would return a DataFrame, and the latter
NULL or a range-based object (GRanges or GRangesList).
I don't think there is the need for an emptyRanges class.


H.

On 11/26/2014 11:40 AM, Hector Corrada Bravo wrote:

One thing that’s become apparent working on epivizr is that it may be useful to 
think about ‘rowData’ in a SummarizedExperiment as having two distinct 
components: row coordinates and row metadata. In the current class rowData is a 
‘GenomicRanges’ which contains both coordinates (the ranges) and metadata 
(mcols(rowData)). In metagenomics (the other application my group works a lot 
with), we think of the taxonomy as providing coordinates. The distinction is 
worthwhile thinking about since there are certain operations we do on 
coordinates that we don’t do with metadata (and conversely).




Thinking about it this way, the ‘ExpressionSet’ object would be data without 
coordinates. So, I would avoid making ‘GenomicRanges’ behave like ‘DataFrame’ 
since this distinction between coordinates and metadata is lost. The 
‘emptyRanges’ proposal gets closer to this since this corresponds to ‘no 
coordinates’, but it may be worth thinking in the long term on making the 
coordinate/metadata distinction more general.




Hector

On Wed, Nov 26, 2014 at 12:38 PM, Tim Triche, Jr. <tim.tri...@gmail.com>
wrote:

so as a simple experiment, I did the following:
library(GenomicRanges)
bar <- matrix(rnorm(100), ncol=10)
colnames(bar) <- as.character(1:10)
rownames(bar) <- letters[1:10]
foo <- SummarizedExperiment(assays=list(bar=bar))
rowData(foo)
## GRangesList object of length 10:
## $a
## GRanges object with 0 ranges and 0 metadata columns:
##    seqnames    ranges strand
##       <Rle> <IRanges>  <Rle>
##
## $b
## GRanges object with 0 ranges and 0 metadata columns:
##      seqnames ranges strand
##
## $c
## GRanges object with 0 ranges and 0 metadata columns:
##      seqnames ranges strand
##
## ...
## <7 more elements>
colData(foo)
## DataFrame with 10 rows and 0 columns
This got me to thinking, why not have an emptyRanges class, or else the
ability to index a bunch of NULL ranges without a lot of hoohah?  The
defaults mostly do what they're supposed to; why not have a compact
representation of empty rowData as for empty colData (i.e., a DataFrame
with 0 rows)?  Or is a GRangesList of empty GRanges as compact as it is
practicable to get for this purpose?
Just pondering what the lowest-impact solution to the problem at hand might
be.
Statistics is the grammar of science.
Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>
On Wed, Nov 26, 2014 at 9:07 AM, Peter Haverty <haverty.pe...@gene.com>
wrote:

Hi all,

I believe there is a strong need for an object that organizes a collection
of rectangular data (matrices, etc.) with metadata on the rows and
columns.  Can SummarizedExperiment inherit from something simpler that has
a DataFrame as rowData?  (I believe GenomicRanges should inherit from
DataTable, rather than Vector, and subset as x[i,j], but maybe that's
getting a bit off topic.)  I often see people stuffing arbitrary data into
an ExpressionSet and calling one of the assays "exprs" as a work-around.

Regards,

Pete

____________________
Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com

On Wed, Nov 26, 2014 at 7:19 AM, Laurent Gatto <lg...@cam.ac.uk> wrote:


On 26 November 2014 14:59, Wolfgang Huber wrote:

A colleague and I are designing a package for quantitative proteomics
data, and we are debating whether to base it on the
SummarizedExperiment or the ExpressionSet class.

There is no immediate use for the ranges aspect of
SummarizedExperiment, so that would have to be carried around with
NAs, and this is a parsimony argument for using ExpressionSet
instead. OTOH, the interface of SummarizedExperiment is cleaner, its
code more modern and more likely to be updated, and users of the
Bioconductor project are likely to benefit from having to deal with a
single interface that works the same or similarly across packages,
rather than a variety of formats; which argues that new packages
should converge towards SummarizedExperiment('s interface).

Are there any pertinent insights from this group?


Instead of ExpressionSet, you could use MSnbase::MSnSet, which is
essentially an ExpressionSet for quantitative proteomics (i.e it has a
MIAPE slot, instead of MIAME for example).

Ideally, a SummarizedExperiment for proteomics would use peptide/protein
ranges, which is in the pipeline, as far as I am concerned. When that
becomes available, there should be infrastructure to coerce and MSnSet
(and/or other relevant data) into an SummarizedExperiment.

Hope this helps.

Best wishes,

Laurent

Thanks and best wishes
Wolfgang

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
Laurent Gatto
http://cpu.sysbiol.cam.ac.uk/

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


         [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

        [[alternative HTML version deleted]]
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] SummarizedExperiment vs ExpressionSet

Reply via email to