Re: [Bioc-devel] Subsetting eSet-like objects with duplicated indices

Benilton Carvalho Wed, 12 Feb 2014 07:31:21 -0800

Thanks a lot, Martin! Will work on fixing things on my end.

b
On Feb 12, 2014 12:49 PM, "Martin Morgan" <mtmor...@fhcrc.org> wrote:


> On 02/11/2014 05:03 PM, Benilton Carvalho wrote:
>
>> Hi,
>>
>> I'm trying to understand why FeatureSet objects behave slightly differently
>> from eSet objects.
>>
>
> There's a combination of things going on, some of which are unfortunate /
> unintended.
>
> The basic problem is that, with regard to row names, subsetting a matrix
> with duplicate indexes behaves differently from subsetting a data.frame:
>
>     > matrix(0, 2, 2, dimnames=list(1:2, 3:4))[c(1,1),]
>       3 4
>     1 0 0
>     1 0 0
>     > data.frame(x=1:2, y=3:4)[c(1, 1),]
>         x y
>     1   1 3
>     1.1 1 3
>
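The difference can be reproduced in base R alone. A minimal sketch (the variable names are illustrative only): the matrix keeps the duplicated row name, while the data.frame result carries a suffixed name that matches what make.unique() returns for the same input.

```r
# Matrix subsetting keeps duplicated row names as they are.
m <- matrix(0, 2, 2, dimnames = list(1:2, 3:4))
m_names <- rownames(m[c(1, 1), ])

# data.frame subsetting de-duplicates row names by appending a suffix.
d <- data.frame(x = 1:2, y = 3:4)
d_names <- rownames(d[c(1, 1), ])

# The suffix follows the same pattern as make.unique().
u_names <- make.unique(c("1", "1"))

print(m_names)
print(d_names)
print(u_names)
```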
> The creation of artificial row names is particularly bad when the row name
> identifier has an integer component, like an Ensembl gene id, because then
> the row name appears somehow legitimate but really isn't.
>
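To illustrate why this is misleading, here is a small sketch with a hypothetical feature table. The ids and the 'value' column are used purely for illustration. Selecting the same row twice produces a row name that reads like a versioned Ensembl id, although no such row existed in the input.

```r
# Hypothetical feature table keyed by unversioned Ensembl-style gene ids.
fd <- data.frame(value = c(1.5, 2.5),
                 row.names = c("ENSG00000139618", "ENSG00000012048"))

# Selecting the first row twice invents a second, suffixed row name.
dup <- fd[c(1, 1), , drop = FALSE]
print(rownames(dup))
```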
> What happens when subsetting an ExpressionSet? Some of each, unfortunately:
>
>     m = matrix(0, 2, 2, dimnames=list(1:2, 3:4))
>     e = ExpressionSet(m)[c(1, 1),]
>     rownames(fData(e))    ## featureNames(featureData(e))
>     ## [1] "1"   "1.1"
>     rownames(exprs(e))    ## featureNames(assayData(e))
>     ## [1] "1" "1"
>
> and, perhaps more unfortunately, the validity of the object returned by
> subsetting is not checked:
>
>     validObject(e)
>     ## Error in validObject(e) :
>     ##   invalid class "ExpressionSet" object: featureNames differ
>     ##   between assayData and featureData
>
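The same pattern can be shown without Biobase. The sketch below uses a toy S4 class, "ToySet", and a helper, toySubset(); both are hypothetical stand-ins and not the actual ExpressionSet code. The point is only that updating slots one at a time does not re-run the validity method, so the inconsistency surfaces only when validObject() is called explicitly.

```r
library(methods)

# Toy stand-in for an eSet-like class (hypothetical, not Biobase code):
# the assay matrix and the feature table must share row names.
setClass("ToySet",
    representation(assay = "matrix", features = "data.frame"),
    validity = function(object) {
        if (!identical(rownames(object@assay), rownames(object@features)))
            "featureNames differ between assay and features"
        else TRUE
    })

# Subset both slots by direct slot assignment; nothing re-validates.
toySubset <- function(x, i) {
    x@assay <- x@assay[i, , drop = FALSE]
    x@features <- x@features[i, , drop = FALSE]
    x
}

m <- matrix(0, 2, 2, dimnames = list(1:2, 3:4))
x <- new("ToySet", assay = m,
         features = data.frame(id = 1:2, row.names = rownames(m)))
bad <- toySubset(x, c(1, 1))

# The mismatch is reported only when validity is requested explicitly.
check <- tryCatch(validObject(bad), error = function(e) conditionMessage(e))
print(check)
```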
> NChannelSet seems to behave better, detecting the inconsistent labels and
> failing.
>
> Because the row identifiers need to be munged, and munged identifiers are
> bad, it seems like the NChannelSet failure is the desired behavior. The
> behavior of ExpressionSet needs to be cleaned up. It seems like the
> identifiers could be managed separately from the row names, and the validity
> of returned objects checked. The latter is likely to break code that
> currently works, because an early paradigm was to update an object
> incrementally.
>
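One way to picture the "fail rather than munge" behavior is a guard in front of the subsetting step. This is a sketch under stated assumptions, not a proposed Biobase patch: subsetFeatures() is a hypothetical helper working on a plain matrix and data.frame, and it simply refuses duplicated indices and checks the result before returning it.

```r
# Hypothetical helper: subset a matrix and its feature table together,
# refusing any index that would force row names to be munged.
subsetFeatures <- function(assay, features, i) {
    if (anyDuplicated(i))
        stop("duplicated feature indices are not supported")
    out <- list(assay = assay[i, , drop = FALSE],
                features = features[i, , drop = FALSE])
    # Check the result before handing it back.
    stopifnot(identical(rownames(out$assay), rownames(out$features)))
    out
}

m <- matrix(0, 2, 2, dimnames = list(1:2, 3:4))
fd <- data.frame(id = 1:2, row.names = rownames(m))

# Reordering is fine; row names stay in sync.
ok <- subsetFeatures(m, fd, c(2, 1))

# Duplicated indices fail up front instead of producing "1.1".
err <- tryCatch(subsetFeatures(m, fd, c(1, 1)),
                error = function(e) conditionMessage(e))
print(err)
```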
> An alternative is to 'start again' using the much more well-designed
> IRanges infrastructure, along the lines of
>
> .ExpressionExperiment <- setClass("ExpressionExperiment",
>     representation(exptData="List",
>                    rowData="DataFrame",
>                    colData="DataFrame",
>                    assays="SimpleList"))
>
> Simon Anders will recognize this design from an earlier suggestion of his.
>
>
> Martin
>
>
>> Here's the one example I'm trying to work out:
>>
>> if (!require(pd.hugene.1.0.st.v1)){
>>    library(BiocInstaller)
>>    biocLite('pd.hugene.1.0.st.v1')
>> }
>> library(oligoData)
>> data(affyGeneFS)
>> affyGeneFS
>> data(sample.ExpressionSet)
>> sample.ExpressionSet
>>
>> ## subset ExpressionSet
>> ## everything ok
>> sample.ExpressionSet[c(1, 1),]
>>
>> ## subset FeatureSet
>> ## error: featureNames differ between assayData and featureData
>> affyGeneFS[c(1, 1),]
>>
>> But FeatureSets are derived from NChannelSet objects... so:
>>
>> example('NChannelSet-class')
>> obj
>> obj[c(1, 2),] ## OK
>> obj[c(1, 1),] ## not OK
>>
>> I was wondering why/if this is intended (i.e., it works on "single channel"
>> eSets, but fails on NChannelSets)?
>>
>> Thank you so much for any insight,
>>
>> benilton
>>
>>
>> _______________________________________________
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>
