Re: [Bioc-devel] assay dimnames in SingleCellExperiment / SummarizedExperiment

Aaron Lun Fri, 15 Sep 2017 22:45:07 -0700

I'll leave the first point to the SummarizedExperiment maintainers, though I  
note that your code seems to be about the names of the dimnames rather than the 
dimnames themselves. (I'm under the impression that consistency in the actual 
dimnames is enforced somehow by the SE constructor.)



As for the second point, I suppose we could set the second name for the 
dimnames as "Cells" in SingleCellExperiment, though the choice for the first 
name is more ambiguous. This request has come up before, and I've never been 
entirely convinced that it is necessary. It seems mostly aesthetic to me, and 
honestly, if a user doesn't already know that rows are genes and columns are 
cells, I can't see them flailing away at the keyboard until they call dim() to 
tell them what the dimensions correspond to.


But I guess other people like aesthetics, so if you want, you can put in a PR 
to override dim() and dimnames() for SingleCellExperiment to put names on the 
returned vectors or lists. If I had to choose, I would go with "Features" 
and "Cells" for the rows and columns, respectively. (We already build on 
RangedSummarizedExperiment, so we're implicitly assuming genomic features.)
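For illustration only, here is a minimal base-R sketch of the values such an override might return. The helper names `labelled_dim()` and `labelled_dimnames()` are hypothetical; an actual PR would presumably define S4 `dim()`/`dimnames()` methods for SingleCellExperiment instead, which is not shown here. The sketch works on a plain matrix:

```r
# Hypothetical helpers (not part of SingleCellExperiment): attach
# "Features"/"Cells" labels to the outputs of dim() and dimnames().
labelled_dim <- function(x, labels = c("Features", "Cells")) {
  d <- dim(x)
  names(d) <- labels   # named integer vector, e.g. Features = 3, Cells = 2
  d
}

labelled_dimnames <- function(x, labels = c("Features", "Cells")) {
  dn <- dimnames(x)
  if (is.null(dn)) {
    return(NULL)       # nothing to label if the object has no dimnames
  }
  names(dn) <- labels  # overwrites any existing names, e.g. "Tags"/"Samples"
  dn
}

m <- matrix(0, nrow = 3, ncol = 2,
            dimnames = list(c("g1", "g2", "g3"), c("A", "B")))
labelled_dim(m)
labelled_dimnames(m)
```

Returning NULL when there are no dimnames keeps the usual R convention rather than inventing labels for unnamed dimensions.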


-Aaron

________________________________
From: Kevin RUE <kevinru...@gmail.com>
Sent: Thursday, 14 September 2017 10:57:39 PM
To: bioc-devel
Cc: da...@ebi.ac.uk; risso.dav...@gmail.com; Aaron Lun; Maintainer
Subject: assay dimnames in SingleCellExperiment / SummarizedExperiment

Dear all,

I cc-ed the individual package maintainers on this email to notify them of 
this thread directly and hear their respective opinions, but I thought the 
widespread use of SummarizedExperiment made it worth involving the community 
as well.

Background: I was updating one of my workflows from SCESet to the 
SingleCellExperiment class recently introduced in the development branch.

1)
One thing leading to another, I noticed that there is no validity check on 
the dimnames of the various assays in SummarizedExperiment. In other words, 
different assays can have different `dimnames` (or some assays can even have 
NULL dimnames). Using example code adapted from the SummarizedExperiment 
documentation:

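# SummarizedExperiment() and assay() come from SummarizedExperiment;
# DataFrame() and SimpleList() come from S4Vectors, attached as a dependency
library(SummarizedExperiment)

nrows <- 200; ncols <- 6
counts3 <- counts2 <- counts <-
  matrix(runif(nrows * ncols, 1, 1e4), nrows)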

rnames <- paste0("F_", sprintf("%03.f", seq_len(nrows)))
cnames <- LETTERS[1:6]

dimnames(counts) <- list(rnames, cnames)
dimnames(counts2) <- list(Tags = rnames, Samples = cnames)
dimnames(counts3) <- list(Features = rnames, Cells = cnames)

colData <- DataFrame(row.names=cnames)

rse <- SummarizedExperiment(
  assays = SimpleList(c1 = counts, c2 = counts2, c3 = counts3),
  colData = colData)

assayNames(rse)
names(dimnames(assay(rse, "c1"))) # NULL
names(dimnames(assay(rse, "c2"))) # [1] "Tags"    "Samples"
names(dimnames(assay(rse, "c3"))) # [1] "Features" "Cells"

Although not critical, it would probably be best practice to have a validity 
check enforcing identical dimnames across all assays, so that one does not 
have to worry later about `melt` calls returning different column names 
depending on whether each assay has named dimnames or not.
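A minimal sketch of the proposed check, written as a standalone base-R function over a plain list of matrices. `check_assay_dimnames()` is hypothetical and not part of SummarizedExperiment; it merely follows the convention of validity functions registered with `methods::setValidity()`, returning TRUE or a character message. Because it compares whole dimnames with identical(), it flags both differing names(dimnames) and NULL dimnames:

```r
# Hypothetical validity helper (not part of SummarizedExperiment).
# Returns TRUE if every assay has exactly the same dimnames as the first one,
# including names(dimnames); otherwise a message naming the offending assays.
check_assay_dimnames <- function(assay_list) {
  if (length(assay_list) < 2L) {
    return(TRUE)
  }
  ref <- dimnames(assay_list[[1L]])
  same <- vapply(assay_list, function(a) identical(dimnames(a), ref),
                 logical(1))
  if (all(same)) {
    TRUE
  } else {
    paste("assays with dimnames differing from the first assay:",
          paste(names(assay_list)[!same], collapse = ", "))
  }
}

m <- matrix(1:4, nrow = 2, dimnames = list(c("F_001", "F_002"), c("A", "B")))
m2 <- m
dimnames(m2) <- list(Tags = rownames(m), Samples = colnames(m))

check_assay_dimnames(list(c1 = m, c1b = m))  # TRUE
check_assay_dimnames(list(c1 = m, c2 = m2))  # message naming "c2"
```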


2)
The initial glitch that prompted this email involved the `reshape2::melt` 
call in the `scater::plotHighestExprs` function, which extracts dimnames if 
available. Davis has already prepared a fix for the scenario in which the 
assay does have dimnames (e.g. counts from the edgeR::DGEList class that I 
generally use to import counts). Somehow that wasn't an issue with the SCESet 
that I was using previously (probably a side-effect of ExpressionSet).
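To illustrate how names(dimnames) leak into downstream data frames, here is a base-R sketch that uses `as.data.frame(as.table(x))` as a dependency-free stand-in for `reshape2::melt()` (an assumption made to keep the example self-contained; `long_format()` is a hypothetical helper). Like melt() on a matrix, it names the id columns after names(dimnames(x)), falling back to Var1/Var2 when the dimnames are unnamed:

```r
# Long-format conversion in base R, used here as a stand-in for
# reshape2::melt(): the id column names come from names(dimnames(x)).
long_format <- function(x) {
  as.data.frame(as.table(x), stringsAsFactors = FALSE)
}

m <- matrix(1:4, nrow = 2,
            dimnames = list(c("F_001", "F_002"), c("A", "B")))
named <- m
dimnames(named) <- list(Features = rownames(m), Cells = colnames(m))

names(long_format(m))      # "Var1" "Var2" "Freq"
names(long_format(named))  # "Features" "Cells" "Freq"
```

Plotting code that hard-codes one set of column names will therefore break on the other, which is the kind of mismatch described above.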

The point is, the glitch prompted me to consider whether standardising 
names(dimnames) could be beneficial, particularly in the new 
`SingleCellExperiment` class (as SummarizedExperiment has a much more general 
purpose). Given the fairly specific purpose of the former, I was wondering 
whether it would be worth:

  *   setting names(dimnames(x)) to "Features" and "Cells" (bearing in mind 
that features could still be genes, transcripts, ...), or
  *   dropping dimnames altogether and storing them only once elsewhere 
(although a slot for that seems overkill).

There may be other possibilities that I haven't thought of yet, but I thought 
I'd get the ball rolling.
Having well-defined dimnames sounds like good practice, with the added benefit 
of producing aesthetically pleasing column names in melted data frames as a 
by-product.
However, I can't tell whether the handling of dimnames should be left to 
individual downstream package developers, or whether standards should be set 
in the parent classes.


Thanks for your time!

Best,
Kevin


_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
