Hi!

Please excuse the long email, but there are a number of things I would like to
clarify before submitting the notame package, which I have been developing to
meet the Bioconductor guidelines. It currently passes almost all of the
automated checks, except for formatting and some functions that are longer than
50 lines.

Background 1:
The notame package already has a significant following. It was published in
2020, together with an associated protocol article in the "Metabolomics Data
Processing and Data Analysis—Current Best Practices" special issue of the 
Metabolites journal (https://www.mdpi.com/2218-1989/10/4/135). The original 
package relies on the MetaboSet container class, which extends ExpressionSet 
with three slots, namely group_col, time_col and subject_col. These slots are 
used to store the names of the corresponding sample data columns, and are used 
as default arguments to most functions. This makes for a more streamlined 
experience. However, the submission guidelines state that existing classes,
such as SummarizedExperiment, should be preferred. We will be implementing
support for SummarizedExperiment over the summer. In the meantime, we have
included a converter between MetaboSet and SummarizedExperiment for
interoperability.

Q1: Can an initial Bioconductor submission rely on the MetaboSet container
class? Support for MetaboSet would be worth keeping anyway for existing users
until it is phased out.

Q2: Is it OK to extend the SummarizedExperiment class with the three
aforementioned slots, for example as a new class called MetaboExperiment? Or
should the functions instead be modified so that these columns are specified
explicitly as arguments, using plain SummarizedExperiment?

Background 2:
The notame package caters to the analysis of data from untargeted LC-MS
metabolic profiling experiments, encompassing data pretreatment (quality
control, normalization, imputation and other steps leading up to feature
selection) and feature selection (univariate analysis and supervised learning).
Raw data preprocessing is not supported. Instead, the package offers utilities
for flexibly reading peak tables from Excel files exported by various
point-and-click software such as MS-DIAL. Example data in Excel format
therefore needs to be included. Such data is not available in any Bioconductor
package, although it could be derived from existing Bioconductor data. However,
existing untargeted LC-MS data in Bioconductor cannot be used as is to
demonstrate the full functionality of the notame package. With regard to
feature data, there need to be several analytical modes. Sample data needs to
include study group, time point, subject ID and several batches. Blank samples
would be desirable as well. Packages I have checked for data meeting the above
specifications include FaahKO, MetaMSdata, msdata, msqc1, mtbls2, pmp,
PtH2O2lipids, and ropls. Currently, the example data is not realistic, as it
has been scrambled, and I have not yet been informed of its origin or of how it
was modified.

Q3: If I obtain information about the origin and modification of the currently
used data, may I modify it further to meet the needs of the package for an
initial Bioconductor release, or does it need to be realistic? Please consider
this the explicit pre-approval inquiry for including data in the notame package.

Q4: Do you think a separate ExperimentData package satisfying the
specifications laid out in Background 2 is warranted? It could be introduced in
a future version alongside SummarizedExperiment/MetaboExperiment support.

Q5: The instructions state that the data needs to be documented
(https://contributions.bioconductor.org/docs.html#doc-inst-script). Is the
availability of the original data strictly necessary? I notice that many
packages do not document how their data was obtained.

Thanks,
Vilhelm Suksi
Turku Data Science Group
vks...@utu.fi

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
