Re: [Bioc-devel] Modeling (statistic, p-value) pairs in MultiAssayExperiment

Levi Waldron Tue, 24 Oct 2017 15:29:22 -0700

OK, I think I'm understanding better now. The best immediate solution that
I can think of is one SummarizedExperiment per signature database, then
combining those SummarizedExperiments in a MultiAssayExperiment.
Something like this:


# Load packages first: SummarizedExperiment attaches S4Vectors, which provides DataFrame()
library(SummarizedExperiment)
library(MultiAssayExperiment)

# Simulated statistics and p-values: 20 pathways x 5 cells
set.seed(1)
statvals <- matrix(rnorm(100), ncol=5)
rownames(statvals) <- paste0("pathway", 1:nrow(statvals))
colnames(statvals) <- paste0("cell", 1:ncol(statvals))
pvals <- pnorm(statvals)

# Column metadata, stored once in the MultiAssayExperiment
coldat <- DataFrame(name=letters[1:ncol(statvals)])
rownames(coldat) <- colnames(statvals)

# One SummarizedExperiment per database, each holding both assays
se1 <- SummarizedExperiment(list(statvals = statvals[1:12, ], pvals = pvals[1:12, ]))
se2 <- SummarizedExperiment(list(statvals = statvals[13:20, ], pvals = pvals[13:20, ]))

mae <- MultiAssayExperiment(list(database1=se1, database2=se2),
                            colData=coldat)

Then you can extract individual assays with assays() or integrate across
databases with wideFormat(); examples are below. The wideFormat() example
currently extracts only the statvals, but you should be able to select
between assays for wideFormat() too; I've just opened an issue
<https://github.com/waldronlab/MultiAssayExperiment/issues/221> for this.

> assays(mae, i="statvals")
List of length 2
names(2): database1 database2
> assays(mae, i="pvals")
List of length 2
names(2): database1 database2
> head(assays(mae, i="pvals")[["database2"]])
               cell1      cell2     cell3     cell4     cell5
pathway13 0.26722067 0.65087047 0.6334933 0.7293096 0.8770575
pathway14 0.01339034 0.47854525 0.1293723 0.1751268 0.7581031
pathway15 0.86969085 0.08424692 0.9240745 0.1049876 0.9437248
pathway16 0.48208011 0.33907294 0.9761707 0.6146450 0.7117439
pathway17 0.49354130 0.34668349 0.3567269 0.3287773 0.1008731
pathway18 0.82737332 0.47635125 0.1482116 0.5004410 0.2832325

> (res <- wideFormat(mae[1, , ], colDataCols="name"))
DataFrame with 5 rows and 4 columns
   primary        name database1_pathway1 database2_pathway13
  <factor> <character>          <numeric>           <numeric>
1    cell1           a         -0.6264538          -0.6212406
2    cell2           b          0.9189774           0.3876716
3    cell3           c         -0.1645236           0.3411197
4    cell4           d          2.4016178           0.6107264
5    cell5           e         -0.5686687           1.1604026
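
Until wideFormat() can select an assay, a wide table of p-values could be
assembled by hand from the per-database matrices. The following is only a
minimal base-R sketch, not part of the MultiAssayExperiment API: it assumes
the plain list `pv` stands in for what as.list(assays(mae, i="pvals")) would
give you, and the names `pv`, `wide_parts` and `wide_p` are made up for
illustration.

```r
# Recreate the simulated p-value matrix from the example (base R only).
set.seed(1)
statvals <- matrix(rnorm(100), ncol = 5)
rownames(statvals) <- paste0("pathway", 1:nrow(statvals))
colnames(statvals) <- paste0("cell", 1:ncol(statvals))
pvals <- pnorm(statvals)

# Assumption: stand-in for as.list(assays(mae, i = "pvals")),
# i.e. one pathways-x-cells matrix per database.
pv <- list(database1 = pvals[1:12, ], database2 = pvals[13:20, ])

# Transpose each matrix so cells are rows, and prefix every pathway
# column with its database name to keep the databases distinguishable.
wide_parts <- lapply(names(pv), function(db) {
  m <- t(pv[[db]])
  colnames(m) <- paste(db, colnames(m), sep = "_")
  m
})

# Bind the per-database blocks side by side; all share the same cells.
wide_p <- data.frame(primary = colnames(pvals), do.call(cbind, wide_parts))
```

The result has one row per cell and one column per (database, pathway)
pair, analogous to the wideFormat() output but holding p-values; for
example, wide_p[, c("primary", "database1_pathway1", "database2_pathway13")].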


On Tue, Oct 24, 2017 at 9:43 AM, Francesco Napolitano <franap...@gmail.com>
wrote:

> Thank you!
>
> Fig 1 shows the pipeline for a single database of pathways, but we
> used 10 different databases (GO, KEGG, Reactome...). Currently we use
> all of MSigDB, which includes 24 subcategories, and we have a matrix
> of ES and a matrix of pvalues for each. You always have the same drugs
> over columns, but different pathways over rows. Keeping them separated
> is necessary (you don't want to rank pathways across unrelated
> databases). On the other hand, if I build one SummarizedExperiment for
> each database, I have to replicate the common metadata across all of
> them, and I also lose most of the features that justified the burden
> of modeling my data with SE in the first place :-/.
>
> Note I'm considering all this for a package under review to possibly
> improve its interoperability with existing packages.
>
>
> On Tue, Oct 24, 2017 at 2:45 PM, Levi Waldron
> <lwaldron.resea...@gmail.com> wrote:
> > On Oct 24, 2017 6:14 AM, "Francesco Napolitano" <franap...@gmail.com> wrote:
> >
> > I'm converting gene expression profiles to "pathway expression
> > profiles" (https://doi.org/10.1093/bioinformatics/btv536), so for each
> > pathway I have an enrichment score and a p-value. I guess it would be
> > like modeling gene expression data where limma-like preprocessing was
> > performed, so you have a fold change - p-value pair for each gene.
> > Isn't there a data model for that?
> >
> >
> > Nice paper, thanks for the link! Could you explain the problem a little
> > more using the terminology of your paper? I see your enrichment values
> > matrix (fig 1c ESij) of pathways x cell lines, and imagine additional
> > associated matrices of p-values and ranks, but where do assays with
> > different rows come in?
> >
>



-- 
Levi Waldron
http://www.waldronlab.org
Assistant Professor of Biostatistics     CUNY School of Public Health
US: +1 646-364-9616                                           Skype:
levi.waldron
