Thank you! Fig 1 shows the pipeline for a single database of pathways, but we used 10 different databases (GO, KEGG, Reactome, ...). Currently we use all of MSigDB, which includes 24 subcategories, and for each one we have a matrix of enrichment scores (ES) and a matrix of p-values. The columns are always the same drugs, but the rows are different pathways. Keeping the databases separate is necessary (you don't want to rank pathways across unrelated databases). On the other hand, if I build one SummarizedExperiment for each database, I have to replicate the common metadata across all of them, and I also lose most of the benefits that justified the effort of modeling my data with SE in the first place :-/.
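To make the layout concrete, here is a minimal sketch in plain Python (deliberately not Bioconductor code). It assumes toy values throughout: the drug names, pathway names, numbers, the `note` metadata field, and the helpers `validate` and `rank_pathways` are all hypothetical. It only illustrates the shape described above: drug metadata stored once, one ES/p-value pair of matrices per database, and ranking confined to a single database.

```python
# Hypothetical toy data: every name and number here is illustrative only.
drugs = ["drugA", "drugB", "drugC"]

# Column (drug) metadata, stored once and shared by all databases.
drug_metadata = {
    "drugA": {"note": "example annotation"},
    "drugB": {"note": "example annotation"},
    "drugC": {"note": "example annotation"},
}

# One entry per pathway database. Each holds an ES matrix and a p-value
# matrix as {pathway: [one value per drug]}; rows differ between databases.
databases = {
    "GO": {
        "ES": {
            "GO_pathway_1": [0.8, -0.2, 0.1],
            "GO_pathway_2": [-0.5, 0.6, 0.3],
        },
        "pvalue": {
            "GO_pathway_1": [0.01, 0.40, 0.70],
            "GO_pathway_2": [0.03, 0.02, 0.20],
        },
    },
    "KEGG": {
        "ES": {"KEGG_pathway_1": [0.4, 0.9, -0.7]},
        "pvalue": {"KEGG_pathway_1": [0.10, 0.001, 0.05]},
    },
}


def validate(databases, drugs):
    """Check that ES and p-value matrices share rows and match the drug columns."""
    for name, assays in databases.items():
        if set(assays["ES"]) != set(assays["pvalue"]):
            raise ValueError(f"{name}: ES and pvalue rows differ")
        for assay_name, matrix in assays.items():
            for pathway, row in matrix.items():
                if len(row) != len(drugs):
                    raise ValueError(f"{name}/{assay_name}/{pathway}: wrong width")


def rank_pathways(databases, drugs, database, drug):
    """Rank the pathways of ONE database for one drug by ascending p-value."""
    col = drugs.index(drug)
    pvals = databases[database]["pvalue"]
    return sorted(pvals, key=lambda pathway: pvals[pathway][col])


validate(databases, drugs)
print(rank_pathways(databases, drugs, "GO", "drugB"))
```

Because `rank_pathways` takes the database name as an argument, pathways from unrelated databases never end up in the same ranking, while `drug_metadata` is not duplicated per database.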
Note I'm considering all this for a package under review to possibly improve its interoperability with existing packages.

On Tue, Oct 24, 2017 at 2:45 PM, Levi Waldron <lwaldron.resea...@gmail.com> wrote:
> On Oct 24, 2017 6:14 AM, "Francesco Napolitano" <franap...@gmail.com> wrote:
>
> > I'm converting gene expression profiles to "pathway expression
> > profiles" (https://doi.org/10.1093/bioinformatics/btv536), so for each
> > pathway I have an enrichment score and a p-value. I guess it would be
> > like modeling gene expression data where limma-like preprocessing was
> > performed, so you have a fold change - p-value pair for each gene.
> > Isn't there a data model for that?
>
> Nice paper, thanks for the link! Could you explain the problem a little more
> using the terminology of your paper? I see your enrichment values matrix
> (fig 1c ESij) of pathways x cell lines, and imagine additional associated
> matrices of p-values and ranks, but where do assays with different rows come
> in?

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel