On 02/11/2014 05:03 PM, Benilton Carvalho wrote:
Hi,

I'm trying to understand why FeatureSet objects behave slightly differently
from eSet objects.

There's a combination of things going on, some of which are unfortunate / unintended.

The basic problem is that, with regard to row names, subsetting a matrix with duplicate indexes behaves differently from subsetting a data.frame: the matrix keeps the duplicated row names, whereas the data.frame makes them unique.

    > matrix(0, 2, 2, dimnames=list(1:2, 3:4))[c(1,1),]
      3 4
    1 0 0
    1 0 0
    > data.frame(x=1:2, y=3:4)[c(1, 1),]
        x y
    1   1 3
    1.1 1 3

The creation of artificial row names is particularly bad when the row name identifier has an integer component, like an Ensembl gene id, because then the row name appears somehow legitimate but really isn't.
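
To make this concrete, here is a small sketch in base R. The Ensembl-style identifiers and the `score` column are made up for illustration only; the point is just that the munged name looks like a versioned id.

```r
## Ensembl-style identifiers, used here only for illustration.
ids <- c("ENSG00000139618", "ENSG00000141510")
df <- data.frame(score=c(0.5, 1.2), row.names=ids)

## Select the first row twice; drop=FALSE keeps the data.frame shape.
dup <- df[c(1, 1), , drop=FALSE]
rownames(dup)
## [1] "ENSG00000139618"   "ENSG00000139618.1"
```

The second row name was never in the original data, yet it reads like a legitimate identifier with a version suffix.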

What happens when subsetting an ExpressionSet? Some of each, unfortunately:

    m = matrix(0, 2, 2, dimnames=list(1:2, 3:4))
    e = ExpressionSet(m)[c(1, 1),]
    rownames(fData(e))    ## featureNames(featureData(e))
    ## [1] "1"   "1.1"
    rownames(exprs(e))    ## featureNames(assayData(e))
    ## [1] "1" "1"

and, perhaps more unfortunately, the validity of the object returned by subsetting is not checked:

    validObject(e)
    ## Error in validObject(e) :
    ##   invalid class "ExpressionSet" object: featureNames differ
    ##   between assayData and featureData

NChannelSet seems to behave better: it detects the conflicting labels and fails.

Because the row identifiers need to be munged, and munged identifiers are bad, it seems like the NChannelSet failure is the desired behavior. The behavior of ExpressionSet needs to be cleaned up. It seems like the identifiers could be managed separately from the row names, and the validity of returned objects checked. The latter is likely to break code that currently works, because an early paradigm was to update an object incrementally.
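
As a rough sketch of what "check the validity of returned objects" could look like, here is a toy S4 class using only the methods package. `ToySet`, its slots, and its validity rule are hypothetical stand-ins, not Biobase code; the subsetting method simply calls `validObject()` before returning.

```r
library(methods)

## Hypothetical toy container: an assay matrix plus per-row annotation.
setClass("ToySet",
    representation(assay="matrix", features="data.frame"),
    validity=function(object) {
        if (!identical(rownames(object@assay), rownames(object@features)))
            "row names differ between assay and features"
        else TRUE
    })

## Subset both slots by row, then validate the result before returning it.
setMethod("[", "ToySet", function(x, i, j, ..., drop=TRUE) {
    x@assay <- x@assay[i, , drop=FALSE]
    x@features <- x@features[i, , drop=FALSE]
    validObject(x)
    x
})

m <- matrix(0, 2, 2, dimnames=list(c("a", "b"), c("s1", "s2")))
ts <- new("ToySet", assay=m,
          features=data.frame(g=1:2, row.names=c("a", "b")))

ok <- ts[1:2, ]                                       ## distinct rows: valid
bad <- tryCatch(ts[c(1, 1), ], error=function(e) e)   ## duplicate rows: error
```

With duplicate indexes the matrix keeps "a", "a" while the data.frame produces "a", "a.1", so the check fails at the point of subsetting rather than later.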

An alternative is to 'start again' using the much more well-designed IRanges infrastructure, along the lines of

    .ExpressionExperiment <- setClass("ExpressionExperiment",
        representation(exptData="List",
                       rowData="DataFrame",
                       colData="DataFrame",
                       assays="SimpleList"))

Simon Anders will recognize this design from an earlier suggestion of his.


Martin


Here's the one example I'm trying to work out:

if (!require(pd.hugene.1.0.st.v1)){
   library(BiocInstaller)
   biocLite('pd.hugene.1.0.st.v1')
}
library(oligoData)
data(affyGeneFS)
affyGeneFS
data(sample.ExpressionSet)
sample.ExpressionSet

## subset ExpressionSet
## everything ok
sample.ExpressionSet[c(1, 1),]

## subset FeatureSet
## error: featureNames differ between assayData and featureData
affyGeneFS[c(1, 1),]

But FeatureSets are derived from NChannelSet objects... so:

example('NChannelSet-class')
obj
obj[c(1, 2),] ## OK
obj[c(1, 1),] ## not OK

I was wondering why/if this is intended (i.e., it works on "single channel"
eSets, but fails on NChannelSets)?

Thank you so much for any insight,

benilton


_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
