Thanks a lot, Martin! Will work on fixing things on my end.

b

On Feb 12, 2014 12:49 PM, "Martin Morgan" <mtmor...@fhcrc.org> wrote:
> On 02/11/2014 05:03 PM, Benilton Carvalho wrote:
>
>> Hi,
>>
>> I'm trying to understand why FeatureSet objects behave slightly differently
>> from eSet objects.
>>
>
> There's a combination of things going on, some of which are unfortunate /
> unintended.
>
> The basic problem is that, with regard to row names, subsetting a matrix
> with duplicate indexes behaves differently from subsetting a data.frame:
>
>   > matrix(0, 2, 2, dimnames=list(1:2, 3:4))[c(1,1),]
>     3 4
>   1 0 0
>   1 0 0
>   > data.frame(x=1:2, y=3:4)[c(1, 1),]
>       x y
>   1   1 3
>   1.1 1 3
>
> The creation of artificial row names is particularly bad when the row name
> identifier has an integer component, like an Ensembl gene id, because then
> the row name appears somehow legitimate but really isn't.
>
> What happens with subsetting an ExpressionSet? Some of each, unfortunately:
>
>   m = matrix(0, 2, 2, dimnames=list(1:2, 3:4))
>   e = ExpressionSet(m)[c(1, 1),]
>   rownames(fData(e))      ## featureNames(featureData(e))
>   ## [1] "1"   "1.1"
>   rownames(exprs(e))      ## featureNames(assayData(e))
>   ## [1] "1" "1"
>
> and, perhaps more unfortunately, the validity of the object returned by
> subsetting is not checked:
>
>   validObject(e)
>   ## Error in validObject(e) :
>   ##   invalid class "ExpressionSet" object: featureNames differ
>   ##   between assayData and featureData
>
> NChannelSet seems to behave better, checking that there are confusing
> labels and failing.
>
> Because the row identifiers need to be munged, and munged identifiers are
> bad, it seems like the NChannelSet failure is desired. The behavior of
> ExpressionSet needs to be cleaned up. It seems like the identifiers could
> be managed separately from the row names, and the validity of returned
> objects checked. The latter is likely to break code that currently works,
> because an early paradigm was to update an object incrementally.
>
> An alternative is to 'start again' using the much better designed
> IRanges infrastructure, along the lines of
>
>   .ExpressionExperiment <- setClass("ExpressionExperiment",
>       representation(exptData="List",
>                      rowData="DataFrame",
>                      colData="DataFrame",
>                      assays="SimpleList"))
>
> Simon Anders will recognize this design from an earlier suggestion of his.
>
> Martin
>
>> Here's the one example I'm trying to work out:
>>
>> if (!require(pd.hugene.1.0.st.v1)){
>>   library(BiocInstaller)
>>   biocLite('pd.hugene.1.0.st.v1')
>> }
>> library(oligoData)
>> data(affyGeneFS)
>> affyGeneFS
>> data(sample.ExpressionSet)
>> sample.ExpressionSet
>>
>> ## subset ExpressionSet
>> ## everything ok
>> sample.ExpressionSet[c(1, 1),]
>>
>> ## subset FeatureSet
>> ## error: featureNames differ between assayData and featureData
>> affyGeneFS[c(1, 1),]
>>
>> But FeatureSets are derived from NChannelSet objects... so:
>>
>> example('NChannelSet-class')
>> obj
>> obj[c(1, 2),] ## OK
>> obj[c(1, 1),] ## not OK
>>
>> I was wondering why/if this is intended (i.e., it works on "single channel"
>> eSets, but fails on NChannelSets)?
>>
>> Thank you so much for any insight,
>>
>> benilton
>>
>> _______________________________________________
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
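
For anyone following the thread, the points above can be sketched in base R alone (no Biobase needed). This is only a rough illustration: it reproduces the matrix / data.frame difference, then shows a hypothetical helper, subset_rows_strict, that refuses duplicate row indexes in the spirit of the NChannelSet failure described above, and finally one possible reading of "identifiers managed separately from the row names", namely keeping them in an ordinary column. The helper name and the `id` column are assumptions made for this example and are not part of any Bioconductor package.

```r
# Base R only. `subset_rows_strict` and the `id` column are hypothetical,
# introduced purely for illustration; they are not Biobase API.
m   <- matrix(0, 2, 2, dimnames = list(1:2, 3:4))
df  <- data.frame(x = 1:2, y = 3:4)
idx <- c(1, 1)

# A matrix keeps duplicated row names; a data.frame makes them unique.
rownames(m[idx, , drop = FALSE])    # "1" "1"
rownames(df[idx, , drop = FALSE])   # "1" "1.1"

# Fail early rather than let identifiers be munged silently.
subset_rows_strict <- function(x, i) {
  # Resolve character, logical, or numeric indexes to integer positions.
  pos <- if (is.character(i)) {
    match(i, rownames(x))
  } else {
    seq_len(nrow(x))[i]
  }
  if (anyNA(pos))
    stop("row index out of range or unknown row name")
  if (anyDuplicated(pos) > 0)
    stop("duplicate row indexes would require munged row names")
  x[pos, , drop = FALSE]
}

ok  <- subset_rows_strict(df, c(1, 2))                      # works
bad <- try(subset_rows_strict(df, c(1, 1)), silent = TRUE)  # fails

# Identifiers kept in a column survive duplication unchanged,
# whatever happens to the row names.
fd     <- data.frame(id = c("1", "2"), stringsAsFactors = FALSE)
fd_sub <- fd[idx, , drop = FALSE]
fd_sub$id                           # "1" "1"
```

Whether failing outright or carrying identifiers in a column is the better choice depends on how much existing code relies on duplicated subsetting, which is the compatibility concern raised in the quoted message.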