On 1/22/23 22:36, Uri Simonsohn wrote:
This list may not be a perfect fit for this question, but it is possibly a good one.
I think this is a good match.
I maintain 'groundhog', a package that aims to simplify the reproducibility
of R code that depends on R packages.
It has so far relied on MRAN for binaries of older/archived versions of
packages, but MRAN is shutting down.
Posit (RStudio) also has archived binaries, but they are less
transparent about them, they do not have Mac binaries, and I am a little
uncomfortable relying on a third party again, especially because their
archive is more difficult to navigate and it is part of a for-profit
venture, so access is far from guaranteed. So...

I will create an independent archive of all package binaries for
Windows and Mac machines.

Instead of keeping daily backups like MRAN does/did, I will keep just one
binary per combination of package, version, R version, and operating system.
So, for example, a single 'rio' 0.5.0 binary for Windows for R-4.2.x
(MRAN instead keeps a daily copy of such a file, possibly with 100+
identical or nearly identical copies).
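The one-binary-per-combination rule can be sketched as a simple group-and-pick step. This is a minimal sketch, not groundhog's actual code: it assumes each daily copy can be described by a hypothetical `Snapshot` record (package, version, R series, OS, snapshot date, file path), and the `policy` parameter ("first", "middle", "last") is an illustrative stand-in for the open question of which copy to keep.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record for one daily copy of a package binary."""
    package: str
    version: str
    r_version: str      # R minor series, e.g. "4.2"
    os: str             # e.g. "windows" or "macos"
    snapshot_date: date
    path: str           # hypothetical location of the binary file


Key = Tuple[str, str, str, str]


def select_one_per_key(snapshots: Iterable[Snapshot],
                       policy: str = "last") -> Dict[Key, Snapshot]:
    """Keep a single binary per (package, version, R version, OS).

    policy: "first" keeps the earliest snapshot, "last" the most recent,
    "middle" the one in the middle of the date-sorted copies.
    """
    if policy not in ("first", "middle", "last"):
        raise ValueError("policy must be 'first', 'middle' or 'last'")

    # Group all daily copies by the combination that identifies a binary.
    groups: Dict[Key, List[Snapshot]] = {}
    for s in snapshots:
        key = (s.package, s.version, s.r_version, s.os)
        groups.setdefault(key, []).append(s)

    # Within each group, order copies by date and pick one per the policy.
    chosen: Dict[Key, Snapshot] = {}
    for key, items in groups.items():
        items.sort(key=lambda s: s.snapshot_date)
        if policy == "first":
            chosen[key] = items[0]
        elif policy == "last":
            chosen[key] = items[-1]
        else:
            chosen[key] = items[len(items) // 2]
    return chosen
```

With made-up dates, two Windows copies and one Mac copy of 'rio' 0.5.0 for R 4.2 reduce to two archived files, one per OS; only the Windows choice depends on the policy.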

I need to decide whether to keep the first binary that was uploaded to
CRAN, the last one, one in the middle, etc.
In principle binaries should work regardless of which file is chosen, but
I guess there is a reason they are rebuilt so often, so at the margin it
may make a difference which of the many builds available in MRAN is
chosen to be preserved. I think it has to do with changes in the underlying
packages used to build them, but I am not sure.
This decision will also guide future archiving, i.e. which of the many
binaries built for future CRAN uploads are preserved.

I think the date-based snapshots made sense, as they corresponded to how packages were tested and built together. I am afraid you cannot get close to that by choosing a single version. If you really must keep only one, probably the latest.

If I wanted reasonably reliable repeatability of computations, though, I would keep a virtual machine snapshot with the matching OS, the matching version of R, the packages installed (probably from source, in case debugging became necessary), and all external dependencies. Ideally with the network disconnected, to make sure the computation works fully locally and, as I would not be updating the OS, for security as well. I would keep that snapshot intact and never save the machine state back to it. And if possible I would convert it to several image formats, in case the virtualization software stops being supported or can no longer read certain formats.

Tomas


So, if you have experience or knowledge on this, which of the many
previously created binaries for a given package version would you choose
to archive long-term?
Groundhog will always attempt to install from source if a binary fails,
so a certain error rate is tolerable.

Uri

----------------------------------

Uri Simonsohn (urisohn.com)

Professor of Behavioral Science, ESADE, Barcelona

Senior Fellow, Wharton School, University of Pennsylvania

Blog at: DataColada.org <www.DataColada.org>

Easy data sharing: ResearchBox.org

Twitter: @uri_sohn




______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
