On 16/03/2022 5:01 p.m., Henrik Bengtsson wrote:
Hi,
I think this is a valid concern and feature request, and I believe it
has been raised by others previously on one of our mailing lists.
And what solution or resources for producing one did they offer?
Here's a trivial solution that could even be implemented by a
pharmaceutical company: rename the file to include its SHA when you
download it, and keep a copy and a record of the new name as part of any
document that is produced with it.
There, it's solved.
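
A minimal sketch of that suggestion, in Python for illustration. The function names, the `<sha>-<filename>` naming scheme, and the `records.jsonl` record file are assumptions made for this example; none of them come from CRAN or R tooling.

```python
import hashlib
import json
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_with_sha(download: Path, archive_dir: Path) -> Path:
    """Copy a downloaded tarball into archive_dir under a name that embeds
    its SHA-256, and append the mapping to a JSON-lines record file."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    sha = sha256_of(download)
    # Hypothetical naming scheme: "<sha256>-<original file name>".
    target = archive_dir / f"{sha}-{download.name}"
    if not target.exists():
        shutil.copy2(download, target)
    record = {
        "original": download.name,
        "sha256": sha,
        "archived_as": target.name,
    }
    # Hypothetical record file; one JSON object per archived download.
    with (archive_dir / "records.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return target
```

Calling `archive_with_sha(Path("MASS_7.3-54.tar.gz"), Path("archive"))` right after the download would keep both builds side by side if the file on CRAN later changed, since each would be stored under a different name.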
Duncan Murdoch
Related to this, there's also been discussion (here or on R-devel) of
having `R CMD build` produce identical tarballs when the input doesn't
change, but the injection of `Packaged: <timestamp>; <user>` into the
`DESCRIPTION` file prevents this. If I recall correctly, there was at
least some discussion about being able to control, or anonymize, the
`<user>` part.
MRAN (https://mran.microsoft.com/timemachine) provides a daily
snapshot of CRAN, and it goes back several years, but I'm not sure if
that would solve your problem. Each snapshot is only stable for its
particular date, so I'd guess that in this case one day's snapshot could
contain one build and the next day's snapshot the other.
There are a few working groups over at the R Consortium
(https://www.r-consortium.org/projects/isc-working-groups) who are
interested in reproducibility of R packages. I suspect the 'R
Validation Hub' working group (https://www.pharmar.org/overview/)
would be interested in these types of hiccups, even if it's just to
collect rare "incidents" like this one. I suggest you ping them as
well.
/Henrik
On Wed, Mar 16, 2022 at 12:45 PM Duncan Murdoch
<murdoch.dun...@gmail.com> wrote:
On 16/03/2022 2:51 p.m., Borini, Stefano wrote:
Hello,
Validated software needs to ensure consistency and reproducibility of its
environment, potentially years later, when the audit comes. For this reason,
we record the SHA of every package we download from CRAN so that we can verify
that the package has not changed after the fact. A change may signal that the
package has been corrupted or that malicious code has been added, whereas an
unchanged SHA assures the auditors that the packages are indeed the correct
ones, as they were at the time of release.
Currently I am dealing with a package that I downloaded once in the past,
MASS_7.3-54. This package used to have SHA256
b800ccd5b5c2709b1559cf5eab126e4935c4f8826cf7891253432bb6a056e821
MASS_7.3-54.tar.gz
The current package instead has SHA256:
eb644c0e94b447c46387aa22436ef5a43192960ee9cfd0df2940f4a4116179ae
MASS_7.3-54.tar.gz
This triggers all sorts of alarms. It is established poor practice to replace a
package after the fact, exactly for these reasons. Once a package is released, it
should remain immutable. Subsequent builds can be introduced with a different
build number.
The change appears to be due to the fact that CRAN occasionally rebuilds
packages, for reasons unknown to me. Diffing the old and the new
MASS_7.3-54.tar.gz reveals the change to be due to this:
$ diff -Naur MASS_1/ MASS_2/
diff -Naur MASS_1/DESCRIPTION MASS_2/DESCRIPTION
--- MASS_1/DESCRIPTION 2021-05-03 10:03:00.000000000 +0100
+++ MASS_2/DESCRIPTION 2021-05-03 10:03:50.000000000 +0100
@@ -33,4 +33,4 @@
David Firth [ctb]
Maintainer: Brian Ripley <rip...@stats.ox.ac.uk>
Repository: CRAN
-Date/Publication: 2021-05-03 09:03:00 UTC
+Date/Publication: 2021-05-03 09:03:50 UTC
diff -Naur MASS_1/MD5 MASS_2/MD5
--- MASS_1/MD5 2021-05-03 10:03:00.000000000 +0100
+++ MASS_2/MD5 2021-05-03 10:03:50.000000000 +0100
@@ -1,4 +1,4 @@
-560f72bfd93ac57532d2cf113078d2e7 *DESCRIPTION
+ecf84f78aac3c625898be45513307d79 *DESCRIPTION
35aff05a505ecf7e81e0473767794ca9 *INDEX
c7acdc0fa828f781a0a5586ab9d4fa1b *LICENCE.note
0ac7b30ad35a4c19ea69d76a6a366b02 *NAMESPACE
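
The same comparison can be sketched without unpacking either tarball, by hashing each member and listing the ones that differ. This is an illustrative Python sketch; `member_digests` and `changed_members` are hypothetical helper names, and it assumes both tarballs use the same top-level directory name.

```python
import hashlib
import tarfile


def member_digests(tar_path: str) -> dict:
    """Map each regular file inside a tarball to the SHA-256 of its contents."""
    digests = {}
    # "r:*" lets tarfile detect the compression (gzip for CRAN source tarballs).
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and other special entries
            data = tar.extractfile(member).read()
            digests[member.name] = hashlib.sha256(data).hexdigest()
    return digests


def changed_members(old_tar: str, new_tar: str) -> list:
    """Return the sorted names of members that were added, removed, or
    whose contents differ between the two tarballs."""
    old = member_digests(old_tar)
    new = member_digests(new_tar)
    names = old.keys() | new.keys()
    return sorted(n for n in names if old.get(n) != new.get(n))
```

For two builds that differ as in the diff above, the result would be expected to contain only the `DESCRIPTION` and `MD5` entries, which helps separate a rebuild with a new timestamp from a change in the package's code.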
Please prevent SHA changes of released packages on CRAN. Once a package is
released, it should not be touched again.
--
Stefano Borini
Principal Analytical Tools Developer
AstraZeneca R&D BioPharmaceuticals | Data Science & AI | Early Biometrics &
Statistical Innovation
I don't know the reason that MASS was built again 50 seconds after the
first build, and it would be more convenient for you and some other
people if it hadn't been, but your request comes across as unreasonably
demanding.
You work for a company with a very large budget. CRAN is run by
volunteers, and as far as I know, your company has not contributed
financially to running it.
If you want to guarantee that a CRAN package can be re-installed years
from now, *you* should be archiving a copy of it. You may be negligent
by not doing so: there's no guarantee that CRAN will still be
distributing *any* version of MASS when the auditors show up.
Duncan Murdoch
______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel