[Bioc-devel] Use of SummarizedExperiments or MultiAssayExperiments of many many Dataframes/ nested List objects

Krutik Patel (PGR) Fri, 31 Jan 2020 05:31:38 -0800

Hello Bioc-Devel,

This will be a long-winded question and I apologise for that; I just want to be 
thorough.


I recently submitted a package to Bioconductor for review, and received a 
response asking me to use SummarizedExperiments or MultiAssayExperiments as the 
standard format for my package. I looked into the usage of SE/MAE and do think 
they are very useful. I just find it difficult to envision using these objects 
in my package, namely because I do not use sequencing data and so I do not have 
any phenoData.

The input to my package is differential expression results from microRNA and 
mRNA data, and I feel that these should stay as data frames to make the package 
easier to use. From these data frames, many other data frames and nested lists 
are created. I will give a short demonstration of how my package functions 
below, and I would appreciate it if anyone could demonstrate to me how to 
incorporate SEs or MAEs.

# This will load test data
> miR <- mm_miR
> mRNA <- mm_mRNA
# Visualise the data
> head(miR [1:5, 1:5])

                 D1.Log2FC D1.adjPVal    D2.Log2FC   D2.adjPVal  D3.Log2FC
mmu-let-7b-3p -0.008006934 0.97706031 -0.008296431 0.9503666129 -0.1153951
mmu-let-7c-5p  0.299802302 0.30094186  0.511083040 0.0489321072  0.4663393
mmu-let-7d-3p  0.430125310 0.06476131  0.483677350 0.0228474958  0.4301441
mmu-let-7e-3p  0.417901606 0.06543412  0.448677130 0.0301945611  0.3051121
mmu-let-7e-5p  0.637167321 0.01010895  0.984529549 0.0001462246  0.8917273

> head(mRNA [1:5, 1:5])

         D1.Log2FC   D1.adjPVal  D2.Log2FC   D2.adjPVal D3.Log2FC
A2m       1.336002 0.4627700063  4.0470385 0.0114355180  3.688919
AA986860 -1.886142 0.0239685308 -0.8686382 0.2892313624 -1.115943
Aadac    -2.493883 0.0022213531 -2.1678098 0.0051038251 -1.338884
Aadat    -3.647727 0.0006583596 -3.3660043 0.0011145806 -2.616356
Aass     -1.283668 0.0101430103 -1.9567394 0.0004421697 -1.315752
# As you can see, creating this type of data would be quite simple for a user
# if it is kept as data frames
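
As a rough illustration of how a wide table like the ones above relates to the matrices an SE stores, the columns can be split by measurement type into one genes-by-time-points matrix each. This is only a sketch in base R: the toy values are rounded from the output above, and `toy` and `split_by_suffix` are hypothetical names, not part of the package.

```r
# Toy stand-in for the miR table above (values rounded for illustration).
toy <- data.frame(
  D1.Log2FC  = c(-0.008, 0.300, 0.430),
  D1.adjPVal = c(0.977, 0.301, 0.065),
  D2.Log2FC  = c(-0.008, 0.511, 0.484),
  D2.adjPVal = c(0.950, 0.049, 0.023),
  row.names  = c("mmu-let-7b-3p", "mmu-let-7c-5p", "mmu-let-7d-3p")
)

# Hypothetical helper: keep the columns ending in ".<suffix>" and
# rename them to the time point (the part before the dot).
split_by_suffix <- function(df, suffix) {
  pattern <- paste0("\\.", suffix, "$")
  keep <- grepl(pattern, colnames(df))
  mat <- as.matrix(df[, keep, drop = FALSE])
  colnames(mat) <- sub(pattern, "", colnames(mat))
  mat
}

log2fc  <- split_by_suffix(toy, "Log2FC")   # genes x time points
adjpval <- split_by_suffix(toy, "adjPVal")  # same dimensions and dimnames
```

The two matrices share row and column names, which is the shape that the assays of a single SE need to have.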

# We use the following functions to retrieve annotation IDs
# They will produce several data frames each
> getIDs_miR_mouse(miR)

> head(miR_ensembl)

         GENENAME   ID
1   mmu-let-7b-3p <NA>
2   mmu-let-7c-5p <NA>
3   mmu-let-7d-3p <NA>
4   mmu-let-7e-3p <NA>
5   mmu-let-7e-5p <NA>
6 mmu-let-7f-1-3p <NA>

> head(miR_entrez)
         GENENAME   ID
1   mmu-let-7b-3p <NA>
2   mmu-let-7c-5p <NA>
3   mmu-let-7d-3p <NA>
4   mmu-let-7e-3p <NA>
5   mmu-let-7e-5p <NA>
6 mmu-let-7f-1-3p <NA>

> getIDs_miR_mouse(mRNA)

  GENENAME                 ID
1      A2m ENSMUSG00000030111
2 AA986860 ENSMUSG00000042510
3    Aadac ENSMUSG00000027761
4    Aadat ENSMUSG00000057228
5     Aass ENSMUSG00000029695
6     Abat ENSMUSG00000057880

  GENENAME     ID
1      A2m 232345
2 AA986860 212439
3    Aadac  67758
4    Aadat  23923
5     Aass  30956
6     Abat 268860

# The following function will combine the two data frames into a new one
genetic_data <- CombineGenes(miR_data = miR, mRNA_data = mRNA)

# This function will alter the new data frame into a nested list separated by a
# common string
genelist <- GenesList(method = "c", genetic_data = genetic_data, timeString = "D")
> as.data.frame(lapply(genelist, function(x) dim(x)))

    D1   D2   D3   D7  D14
1 2278 2278 2278 2278 2278
2    2    2    2    2    2

# Then we can filter out "non-significant" values
> as.data.frame(lapply(filtered_genelist, function(x) dim(x)))

    D1   D2   D3   D7 D14
1 1108 1389 1037 1196 380
2    2    2    2    2   2


I could go on, but I think the point is clear. This package is full of data 
frames and nested lists, and it would be nice to use SE or MAE to tidy up the 
global environment. Is there a way of turning many data frames/nested lists 
into an SE or MAE object? If there is, please do let me know. I am not sure how 
to do this, and I feel it is something I need to (at least) explore if I want 
my package on Bioconductor.
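
One possible direction, offered only as a sketch under assumptions: each input table could become one SummarizedExperiment with `Log2FC` and `adjPVal` as two assays over the time points, and the miR and mRNA objects could then be collected in a single MultiAssayExperiment. The toy values and the names `make_se`, `toy_miR` and `toy_mRNA` are made up for illustration, and storing `timeString` in the metadata is just one idea for where derived settings might live.

```r
library(SummarizedExperiment)
library(MultiAssayExperiment)

# Hypothetical helper: build one SE per data type from a wide data frame
# whose columns are named "<time>.Log2FC" and "<time>.adjPVal".
make_se <- function(df) {
  fc_cols <- grep("\\.Log2FC$", colnames(df), value = TRUE)
  times   <- sub("\\.Log2FC$", "", fc_cols)
  fc <- as.matrix(df[, fc_cols, drop = FALSE])
  p  <- as.matrix(df[, paste0(times, ".adjPVal"), drop = FALSE])
  colnames(fc) <- colnames(p) <- times
  SummarizedExperiment(
    assays  = list(Log2FC = fc, adjPVal = p),
    colData = DataFrame(time = times, row.names = times)
  )
}

# Toy inputs with made-up values, shaped like the miR and mRNA tables.
toy_miR <- data.frame(
  D1.Log2FC = c(0.30, 0.43), D1.adjPVal = c(0.30, 0.06),
  D2.Log2FC = c(0.51, 0.48), D2.adjPVal = c(0.05, 0.02),
  row.names = c("mmu-let-7c-5p", "mmu-let-7d-3p")
)
toy_mRNA <- data.frame(
  D1.Log2FC = c(1.34, -1.89, -2.49), D1.adjPVal = c(0.46, 0.02, 0.002),
  D2.Log2FC = c(4.05, -0.87, -2.17), D2.adjPVal = c(0.01, 0.29, 0.005),
  row.names = c("A2m", "AA986860", "Aadac")
)

# One container holding both data types, aligned on the shared time points.
mae <- MultiAssayExperiment(
  experiments = ExperimentList(miR = make_se(toy_miR), mRNA = make_se(toy_mRNA)),
  metadata    = list(timeString = "D")
)

# Pull a single matrix back out when a function needs it.
miR_fc <- assay(mae[["miR"]], "Log2FC")
```

With a layout like this, users could still supply plain data frames, and the conversion would happen once inside the package. Whether the ID tables and filtered lists fit better in `rowData`, in `metadata`, or as separate objects is a design choice this sketch does not settle.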

Many Thanks, Krutik.



_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
