Dear Vilhelm,

notame seems to be an interesting package that fills some gaps currently existing in the untargeted metabolomics workflow. I would strongly suggest supporting the SummarizedExperiment classes (in the future). I would suggest keeping it as generic as possible, without dedicated additional slots (group_col, time_col, subject_col), since this information would anyway be available in the `colData` of the SummarizedExperiment. Keeping the object as generic as possible would simplify integration with other Bioconductor packages.
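Johannes's suggestion can be illustrated with a minimal sketch, assuming the SummarizedExperiment and S4Vectors packages are installed. The study design (group, time point, subject) sits in ordinary `colData` columns, and a notame-style function takes the column name as an argument with a default. The function `group_means()`, the column names and the random toy data are hypothetical; this is not notame's API.

```r
library(SummarizedExperiment)
library(S4Vectors)  # DataFrame()

## Toy data: 4 features x 6 samples of random values (illustration only)
set.seed(1)
abund <- matrix(rlnorm(24, meanlog = 10), nrow = 4,
                dimnames = list(paste0("F", 1:4), paste0("S", 1:6)))

## The study design is kept in ordinary colData columns; no extra slots
design <- DataFrame(group   = rep(c("A", "B"), each = 3),
                    time    = rep(1:3, times = 2),
                    subject = paste0("P", rep(1:3, times = 2)),
                    row.names = colnames(abund))
se <- SummarizedExperiment(assays = list(abundance = abund), colData = design)

## Hypothetical notame-style function: the design column is an argument with
## a default, so no dedicated slot is needed
group_means <- function(object, group_col = "group", assay_name = "abundance") {
    stopifnot(group_col %in% colnames(colData(object)))
    grp <- colData(object)[[group_col]]
    x <- assay(object, assay_name)
    ## one column of per-feature means for each level of the grouping column
    vapply(split(seq_len(ncol(x)), grp),
           function(idx) rowMeans(x[, idx, drop = FALSE]),
           numeric(nrow(x)))
}

group_means(se)                          # groups by colData(se)$group
group_means(se, group_col = "subject")   # any other colData column works too
```

The trade-off is that users pass the column name explicitly when it differs from the default, instead of storing it once in the object.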
We are still actively working on the xcms package and recently updated it to use more modern classes too (see https://jorainer.github.io/xcmsTutorials/index.html for an up-to-date tutorial of the new xcms preprocessing). As the final result, the preprocessed data can currently be extracted as a SummarizedExperiment. In our workflows we use this object to perform data normalization etc. (adding, e.g., the normalized abundance matrix as an additional assay to the SummarizedExperiment). This works extremely well, but being able to use the notame package directly in these workflows, in addition or as an alternative, would be great. We also have subsequent workflows for annotation (in the MetaboAnnotation package) that work on both SummarizedExperiment objects and the XcmsExperiment class (which extends the MsExperiment class from the MsExperiment package).

I would be very much interested in discussing this further, maybe in the #metabolomics channel of the Bioconductor Slack. It would be great to better integrate the various packages for metabolomics data analysis.

cheers, jo

Johannes Rainer, PhD
Eurac Research
Institute for Biomedicine
Via A.-Volta 21, I-39100 Bolzano, Italy
email: johannes.rai...@eurac.edu
github: jorainer
mastodon: jorai...@fosstodon.org

Hervé Pagès wrote:

Hi,

On 5/21/24 01:58, Vilhelm Suksi wrote:
> Hi!
>
> Excuse the long email, but there are a number of things to be clarified in
> preparation for submitting the notame package, which I have been developing to
> meet Bioconductor guidelines. As of now it passes almost all of the automatic
> checks, with the exception of formatting and some functions that are over 50
> lines long.
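Johannes mentions adding the normalized abundance matrix as an additional assay of the SummarizedExperiment. A minimal sketch of that pattern, assuming SummarizedExperiment is installed: the median-scaling step below is only a placeholder chosen for illustration, not the normalization used by xcms, notame or the workflows mentioned above. The point is that `assay(se, "normalized") <- ...` stores the result next to the raw values instead of overwriting them.

```r
library(SummarizedExperiment)

## Toy raw abundances: 5 features x 4 samples of random values
set.seed(2)
raw <- matrix(rlnorm(20, meanlog = 8), nrow = 5,
              dimnames = list(paste0("F", 1:5), paste0("S", 1:4)))
se <- SummarizedExperiment(assays = list(raw = raw))

## Placeholder normalization (illustration only): rescale every sample so
## that its median equals the median of all sample medians
col_med <- apply(assay(se, "raw"), 2, median)
norm <- sweep(assay(se, "raw"), 2, col_med / median(col_med), FUN = "/")

## Keep the raw values and store the result as an additional assay
assay(se, "normalized") <- norm
assayNames(se)  # "raw" "normalized"
```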
>
> Background 1:
> The notame package already has a significant following. It was published in
> 2020, with an associated protocol article in the "Metabolomics Data
> Processing and Data Analysis—Current Best Practices" special issue of the
> Metabolites journal (https://www.mdpi.com/2218-1989/10/4/135).
> The original package relies on the MetaboSet container class, which extends
> ExpressionSet with three slots, namely group_col, time_col and subject_col.
> These slots store the names of the corresponding sample data columns and
> are used as default arguments to most functions. This makes for a more
> streamlined experience. However, the submission guidelines state that
> existing classes, such as SummarizedExperiment, should be preferred. We will
> be implementing support for SummarizedExperiment over the summer. We have
> included a MetaboSet-SummarizedExperiment converter for interoperability.
>
> Q1: Can an initial Bioconductor submission rely on the MetaboSet container
> class? It would be good to include support for MetaboSet anyway, for
> existing users, until it is phased out.

Since you already have a user base, you will need a roadmap for the transition from MetaboSet to MetaboExperiment. Bioconductor's 6-month release cycle facilitates this. More on this below.

> Q2: Is it OK to extend the SummarizedExperiment class to utilize the three
> aforementioned slots? It could be called MetaboExperiment.
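The slot-based defaults described in Background 1 can be sketched with base R's methods package alone. `ToyMetaboSet`, `group_col()` and `count_per_group()` are hypothetical stand-ins (the sample data is a plain data.frame rather than an ExpressionSet); they mimic the idea described for MetaboSet, not notame's actual code.

```r
library(methods)

## Hypothetical stand-in for MetaboSet: sample data as a plain data.frame plus
## three slots holding the names of the design columns
setClass("ToyMetaboSet",
         slots = c(sample_data = "data.frame",
                   group_col = "character",
                   time_col = "character",
                   subject_col = "character"))

## Getter for the stored group column name
group_col <- function(object) object@group_col

## Hypothetical analysis function: the default for `group` is read from the
## object, so users rarely have to spell the column name out
count_per_group <- function(object, group = group_col(object)) {
    table(object@sample_data[[group]])
}

toy <- new("ToyMetaboSet",
           sample_data = data.frame(Group = c("A", "A", "B", "QC"),
                                    Time = c(1, 2, 1, NA),
                                    Subject = c("P1", "P1", "P2", NA)),
           group_col = "Group", time_col = "Time", subject_col = "Subject")

count_per_group(toy)                     # counts by the stored "Group" column
count_per_group(toy, group = "Subject")  # explicit override
```

The argument is called `group` rather than `group_col` on purpose: a default such as `group_col = group_col(object)` makes R fail with a "promise already under evaluation: recursive default argument reference" error.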
> Or should the functions be modified such that said columns are specified
> explicitly, using SummarizedExperiment?

It's better to define your own SummarizedExperiment extension with the three additional slots. This way you will have a container (MetaboExperiment) that is semantically equivalent (or close) to MetaboSet. This means that: (1) in principle you won't need to modify the interface of your existing functions, and (2) you'll be able to provide coercion methods to go back and forth between the MetaboExperiment and MetaboSet representations (see ?setAs). Overall this should make the transition from MetaboSet to MetaboExperiment easier and smoother.

This transition would roughly look like this:

1. Submit the MetaboSet-based version of the package for inclusion in BioC 3.20.

2. After the 3.20 release (next Fall), make the following changes in the devel branch of the package:
   - Implement the MetaboExperiment class + accessors (getters/setters) + constructor function(s) + show() method.
   - Implement the coercion methods to go from MetaboSet to MetaboExperiment and vice versa.
   - Modify the implementation of all the functions that deal with MetaboSet objects so that they deal with MetaboExperiment objects. This will be the primary representation that they handle. If they receive a MetaboSet, they will immediately replace it with a MetaboExperiment using as(..., "MetaboExperiment").
   - Modify all the documentation, unit tests, and serialized objects accordingly.

3. Now you are ready to deprecate the MetaboSet class. I recommend that you also do this in the devel branch before the 3.21 release. There are no well-established guidelines for deprecating an S4 class. I recommend that you use .Deprecated() to display a deprecation message in its show() method, constructor function(s), getters/setters, and the coercion method from MetaboExperiment to MetaboSet.

4. After the 3.21 release (Spring 2025), make the MetaboSet class defunct by replacing all the .Deprecated() calls with .Defunct() calls.

> Background 2:
> The notame package caters to untargeted LC-MS metabolic profiling
> experiments, encompassing data pretreatment (quality control,
> normalization, imputation and other steps leading up to feature selection)
> and feature selection (univariate analysis and supervised learning). Raw data
> preprocessing is not supported. Instead, the package offers utilities for
> flexibly reading peak tables from an Excel file, as produced by various
> point-and-click software such as MS-DIAL. As such, data in Excel format needs
> to be included, but is not available in any Bioconductor package, although
> such Excel data could be procured from existing data in Bioconductor.
> However, existing untargeted LC-MS data in Bioconductor cannot be used as
> is to demonstrate the full functionality of the notame package. With regard
> to feature data, there need to be several analytical modes. Sample data
> needs to include study group, time point, subject ID and several batches.
> Blank samples would be good as well. Packages I have checked for data with
> the above specifications include FaahKO, MetaMSdata, msdata, msqc1, mtbls2,
> pmp, PtH2O2lipids, and ropls. As of now, the example data is not realistic in
> that it is scrambled, and I have not yet been informed of the origin and
> modification of the data.
>
> Q3: If I get access to information about the origin and modification of the
> currently used data, can I further modify it to satisfy the needs of the
> package for an initial Bioconductor release? Or does it need to be realistic?
> Consider this the explicit pre-approval inquiry for including data in the
> notame package.

I'm not sure I fully understand the question (or its connection with Excel), but yes, you can include unrealistic data in the package.
That is fine as long as it allows you to properly illustrate the basic usage of your functions in the man pages and/or vignette(s). It can also be useful to have small (and unrealistic) data for the unit tests. The important thing here is that the data must be small.

> Q4: Do you think a separate ExperimentData package satisfying the
> specifications laid out in Background 2 is warranted? This could be included
> in a future version with SummarizedExperiment/MetaboExperiment support.

It depends on the size of the data. For a software package, we limit the size of the source tarball to 5 MB. So if you're going to exceed that limit, the datasets need to go in an experiment data package.

>
> Q5: The instructions state that the data needs to be documented
> (https://contributions.bioconductor.org/docs.html#doc-inst-script).
> Is the availability of the original data strictly necessary? I notice many
> packages don't include documentation on how the data was procured.

The availability of the original data is not strictly necessary, but the data still needs to be documented, i.e. what its nature is, where it comes from, how it was imported/transformed, etc.

Best,
H.
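Hervé's transition plan (a SummarizedExperiment extension with three slots, coercion via `setAs()`, then `.Deprecated()` on the old class) can be sketched in S4 code. This is a minimal sketch under several assumptions: it needs SummarizedExperiment (which also attaches S4Vectors for `DataFrame()`); `MetaboExperiment` follows the name proposed in the thread, but its validity check, constructor and the `n_groups()` helper are hypothetical; and because MetaboSet lives in notame, a tiny hypothetical `LegacySet` class stands in for it. Setters and a show() method are omitted for brevity.

```r
library(SummarizedExperiment)  # also attaches S4Vectors (DataFrame)

## New container: a SummarizedExperiment plus three slots that name
## design columns of colData (hypothetical sketch)
setClass("MetaboExperiment",
         contains = "SummarizedExperiment",
         slots = c(group_col = "character",
                   time_col = "character",
                   subject_col = "character"))

## Validity: every non-NA column name must exist in colData
setValidity("MetaboExperiment", function(object) {
    cols <- c(object@group_col, object@time_col, object@subject_col)
    bad <- setdiff(cols[!is.na(cols)], colnames(colData(object)))
    if (length(bad))
        paste("columns not found in colData:", paste(bad, collapse = ", "))
    else TRUE
})

## Constructor: build a plain SummarizedExperiment, then add the slots
MetaboExperiment <- function(assays, colData, group_col = NA_character_,
                             time_col = NA_character_,
                             subject_col = NA_character_) {
    se <- SummarizedExperiment(assays = assays, colData = colData)
    new("MetaboExperiment", se, group_col = group_col,
        time_col = time_col, subject_col = subject_col)
}

## Getter (setters omitted)
group_col <- function(object) object@group_col

## Legacy container standing in for MetaboSet (hypothetical)
setClass("LegacySet",
         slots = c(exprs = "matrix", pheno = "data.frame",
                   group_col = "character"))

LegacySet <- function(exprs, pheno, group_col) {
    ## Step 3: deprecation warning; step 4 would call .Defunct() instead
    .Deprecated("MetaboExperiment")
    new("LegacySet", exprs = exprs, pheno = pheno, group_col = group_col)
}

## Coercion from the legacy class to the new one (see ?setAs)
setAs("LegacySet", "MetaboExperiment", function(from) {
    MetaboExperiment(assays = list(abundance = from@exprs),
                     colData = DataFrame(from@pheno),
                     group_col = from@group_col)
})

## Analysis function: legacy input is converted on entry, and the default
## design column is read from the (converted) object's slot
n_groups <- function(object, group = group_col(object)) {
    if (is(object, "LegacySet"))
        object <- as(object, "MetaboExperiment")
    length(unique(colData(object)[[group]]))
}

## Usage with small toy data
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("F1", "F2"), c("S1", "S2", "S3")))
pd <- data.frame(Group = c("A", "B", "B"), row.names = c("S1", "S2", "S3"))

me <- MetaboExperiment(list(abundance = m), DataFrame(pd), group_col = "Group")
n_groups(me)

old <- suppressWarnings(LegacySet(m, pd, "Group"))
n_groups(old)
```

Because `group` is evaluated lazily, its default is computed only after a legacy object has been replaced by its MetaboExperiment counterpart, so the same function body serves both representations during the deprecation period.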
>
> Thanks,
> Vilhelm Suksi
> Turku Data Science Group
> vks...@utu.fi
>
> _______________________________________________
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com