This is a continuation of the discussion at

https://support.bioconductor.org/p/114814/#114824

where Wolfgang asks about "creating a corner in the Bioconductor package 
ecosystem for packages that are only ever supposed to build and check with a 
single release".

I think this would be quite challenging to implement correctly. One challenge 
is ensuring that the user of an appropriate version of R can easily install 
the intended dependencies. Another is deciding what exactly it means for a 
package to be restricted to a single release. CRAN packages, for example, are 
updated without versioned releases: a user of Bioc 3.7 gets the current 
version of a CRAN package, not the version that was available at the beginning 
or end of the 3.7 release. So presumably the idea is that there is a snapshot 
of the package versions that one requires. This part sounds as much like a job 
for packrat / switchr etc. as for Bioconductor. Maybe 'our' job is to ensure 
that the appropriate information is discoverable?

I took as an example the defunct package BioMedR. Our friend Google 
("Bioconductor BioMedR") took me to the last-known-good landing page 
(initially by way of a mirror in Japan...). The DOI on the bioconductor.org 
version of that page took me to the 'Removed packages' page ( 
https://bioconductor.org/about/removed-packages/ ), which again points to the 
last-known-good page, as does https://bioconductor.org/packages/BioMedR . The 
'In bioc since' tag on the last-known-good page allowed me to find the version 
of Bioconductor where the package was introduced. With some work I can find 
the AMI ( https://bioconductor.org/help/bioconductor-cloud-ami/ ) and Docker 
images ( https://hub.docker.com/r/bioconductor/release_base2/tags/ ) for that 
release of Bioconductor. Neither of these would be sufficient for 
reproducibility: I could get the relevant Bioconductor package versions simply 
by installing the package from our archive via BiocInstaller / BiocManager, 
but non-Bioconductor R packages would be more challenging. The package has an 
(impressively extensive!) vignette, but the vignette does not include 
sessionInfo(), so one has to do considerable extra work to find the relevant 
packages. Again, maybe packrat / switchr would help with this...
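
To make the sessionInfo() point concrete, here is a rough sketch of what a 
final vignette chunk could record. This is not something BioMedR does; the 
file name 'package-versions.csv' and the manifest layout are made up for 
illustration, and only base / utils functions are used.

```r
# Sketch only: record the R and package versions in use when the vignette
# was built. The output file name is a hypothetical choice.
manifest_file <- "package-versions.csv"

# Human-readable summary, conventionally the last chunk of a vignette
print(sessionInfo())

# Machine-readable manifest of every namespace loaded in this session
pkgs <- sort(loadedNamespaces())
manifest <- data.frame(
    Package = pkgs,
    Version = vapply(
        pkgs,
        function(p) as.character(packageVersion(p)),
        character(1)
    ),
    R = as.character(getRversion()),
    stringsAsFactors = FALSE
)
write.csv(manifest, manifest_file, row.names = FALSE)
```

A manifest like this only says which versions were used; restoring them 
(especially the CRAN ones) is still the part where packrat / switchr or 
similar would have to come in.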

I think 'incoming' versions of such packages would go through the usual review 
process, in an attempt to hew to some sort of overall Bioconductor standard of 
quality; the return on this investment would be limited by the short intended 
shelf-life of the package. These packages often have unique considerations, 
too, e.g., 'large' data and long build times, maintainer concerns about when 
the package is released relative to publication, etc. Also of interest would be 
commitment to the actual data storage and transfer costs and to the management 
costs of this type of package, coupled with appropriate consideration on scope 
of the repository (not just the Bioconductor cognoscenti, presumably) and 
advertising of availability, e.g., via 
https://www.nature.com/sdata/policies/repositories .

Contemplating this type of package repository suggests a number of small items 
that provide 'cosmetic' improvements to the current situation (e.g., the 
removed-packages page could be organized in a tabular fashion to include from / 
to versions); a more meaningful attempt would probably require efforts to 
embrace packrat / switchr to avoid reinventing the reproducibility wheel, as 
well as commitment to reviewing and managing these packages for their long-term 
contribution. These are certainly noble goals and align with Bioconductor's 
emphasis on reproducibility; is this something that rises to the level of 
securing separate funding?

Martin
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
