Hi Scott,

Let me share my view, which is probably shared by others in the Debian-Med and NeuroDebian projects, where reproducibility questions come up from time to time, even though it diverges a bit from the original question about versioned packages.
So, if the goal is reproducible results over time, the only way to achieve it 100% with versioned packages is to also build them without dynamic linking to any of the 3rd-party libraries you might use, and never rebuild them again unless you can guarantee exactly the same versions of that 3rd-party software. That boils down to versioning EVERY package in the system, an unacceptable burden imho...

If the goal is to make major, still officially supported series of a piece of software available simultaneously (e.g. what we do in PyMVPA: supporting the stable 0.4 series while pushing forward the 2.0 series), then versioned packages imho are a reasonable way to achieve that, and you could look at the examples we already have (e.g. pymvpa/pymvpa2, fsl-4.1, etc.). But even then we don't want to spit out a versioned package for every minor release.

Coming back to reproducible research: various solutions already exist which are, imho, superior to the burden of maintaining a universe of versioned packages:

- run the final analysis in a virtual machine and take a snapshot
- take a file system snapshot (where supported, e.g. btrfs?)
- snapshot.debian.org: gather the versions of all packages installed on the system (or use strace and record only those of relevance), and then regenerate the complete environment using snapshot.debian.org. It could be even simpler if your system is up to date for a given Debian release: just remember the date and the list of packages (and configuration settings, if they require any). It would be cool if we eventually automated this, e.g. enter a shell which traces/records information on the packages which have been used, and then provides a protocol which could be fed in later on to reincarnate such an environment.
  Given the name and version of a piece of software of interest, you can already reincarnate such a full environment with a single debootstrap command pointing to snapshot.debian.org for the relevant date.
  With the help of schroot, you could then run those scripts inside the chroot pretty much as natively as on the main system, thus mimicking "versioned packages".
- there also was a tool to create a virtual machine image from the current Debian installation; can't recall its name...
- yet another alternative: cdepack (http://www.stanford.edu/~pgbovine/cde.html), which wraps everything related to the execution of a specific script/software into a self-contained directory that can be run on any other Linux system (btw, it is packaged; I just need to do final tune-ups/review and will upload it into Debian).

There are probably even more solutions.

Best regards,
Yaroslav

> Has Debian-Med encountered the need for this sort of thing from others in
> the scientific community?
> Do you have recommendations for how best to approach the problem in a way
> which will help us, and let us contribute back to you, most effectively?
>
> BACKGROUND:
> The Genome Institute is attempting to shift to using formal packaging for
> internal software distribution across our compute cluster. All of our blades
> run Ubuntu Lucid now, so we have an interest in getting Debian packages for
> all of the popular bioinformatics tools we use, or helping to make them. Our
> biggest difficulty is that our pipelines are expected to produce reproducible
> results over time, and as such only call "versioned executables". We
> currently custom-compile everything, and install each version next to the
> others. The actual executable in the PATH is something like "cufflinks1.2.0",
> which prepends the directory where that version's code lives to the PATH, and
> executes the "cufflinks" binary in that directory.
>
> In many cases the apps we are packaging are apps which you have already
> packaged, and we only need to change things for the per-version packaging.
> In other cases you have an older version of the code. In others, you have
> not packaged it at all.
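A minimal Python sketch of the versioned-wrapper strategy described in the quoted message. It assumes a hypothetical layout where each version is installed under /opt/versioned/<tool>/<version>/bin and the wrapper itself is installed under its versioned name (e.g. cufflinks1.2.0); INSTALL_ROOT, split_versioned_name and versioned_env are illustrative names, not part of any existing tool.

```python
import os
import re
import sys

# Hypothetical install root: each version lives in <ROOT>/<tool>/<version>/bin.
INSTALL_ROOT = "/opt/versioned"

# "cufflinks1.2.0" -> tool "cufflinks", version "1.2.0". The lazy tool group
# takes the shortest prefix, so a tool name that itself ends in a digit
# (e.g. "bowtie2") is ambiguous under this naming scheme.
VERSIONED_NAME = re.compile(r"^(?P<tool>[A-Za-z][\w-]*?)(?P<version>\d+(?:\.\d+)+)$")


def split_versioned_name(name):
    """Split a versioned executable name into (tool, version)."""
    match = VERSIONED_NAME.match(name)
    if match is None:
        raise ValueError(f"not a versioned executable name: {name!r}")
    return match.group("tool"), match.group("version")


def versioned_env(tool, version, environ, root=INSTALL_ROOT):
    """Return (bin_dir, env) where env has bin_dir prepended to PATH."""
    bin_dir = os.path.join(root, tool, version, "bin")
    env = dict(environ)
    old_path = env.get("PATH", "")
    env["PATH"] = bin_dir + (os.pathsep + old_path if old_path else "")
    return bin_dir, env


def main(argv):
    # The name the wrapper was invoked as selects the version to run.
    tool, version = split_versioned_name(os.path.basename(argv[0]))
    bin_dir, env = versioned_env(tool, version, os.environ)
    target = os.path.join(bin_dir, tool)
    # Replace this process with the real binary, passing arguments through.
    os.execve(target, [tool] + argv[1:], env)


if __name__ == "__main__":
    main(sys.argv)
```

Installing this script (or symlinks to it) as cufflinks1.2.0, cufflinks2.0.0, etc. gives one entry point per version. A generic "cufflinks" symlink managed through /etc/alternatives could then point at whichever wrapper should be the default.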
>
> Our current trajectory is to hand-repackage whatever we depend on, making the
> version number part of the package name, and to formalize the above
> shell-wrapper strategy. For our first few attempts we use /etc/alternatives to
> manage a symlink with the generic app name. Upgrades will not happen as a
> user might expect, since each version is its own package, but we tentatively
> planned to make a meta-package with a name like "cufflinks-versioned" which
> depends on a regularly changing "cufflinksN.N.N" package, and would provide
> the common user experience for anyone who did not request a specific version.
>
> Thanks in advance for your advice,
> Scott

-- 
=------------------------------------------------------------------=
Keep in touch                                     www.onerussian.com
Yaroslav Halchenko                 www.ohloh.net/accounts/yarikoptic
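A minimal Python sketch of the snapshot.debian.org route suggested above: record the installed package versions together with a date, then derive the commands that would recreate that environment in a chroot. It assumes a Debian system (snapshot.debian.org archives the Debian archive, not Ubuntu), root privileges to run the resulting commands, and an illustrative suite name and chroot path. The function names are hypothetical, and the Valid-Until override is included because archived Release files are typically expired.

```python
import subprocess
from datetime import datetime, timezone

SNAPSHOT_BASE = "http://snapshot.debian.org/archive/debian/"
# Timestamp format used in snapshot.debian.org archive URLs.
SNAPSHOT_STAMP = "%Y%m%dT%H%M%SZ"
# One tab-separated line per package known to dpkg. On multi-arch systems
# ${binary:Package} may be preferable to ${Package}.
DPKG_FORMAT = "${Status}\t${Package}\t${Version}\n"


def parse_manifest(dpkg_output):
    """Return {package: version} for fully installed packages only."""
    manifest = {}
    for line in dpkg_output.splitlines():
        if not line.strip():
            continue
        status, package, version = line.split("\t")
        # Skip e.g. "deinstall ok config-files" (removed but not purged).
        if status.split()[-1] == "installed":
            manifest[package] = version
    return manifest


def record_manifest():
    """Query dpkg on the running system."""
    out = subprocess.run(
        ["dpkg-query", "-W", "--showformat=" + DPKG_FORMAT],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_manifest(out)


def snapshot_url(when):
    """snapshot.debian.org mirror URL for a timezone-aware datetime."""
    return SNAPSHOT_BASE + when.astimezone(timezone.utc).strftime(SNAPSHOT_STAMP) + "/"


def rebuild_commands(suite, target, when, manifest):
    """Commands (not executed here) to recreate the environment at `target`."""
    mirror = snapshot_url(when)
    pins = [f"{pkg}={ver}" for pkg, ver in sorted(manifest.items())]
    return [
        ["debootstrap", suite, target, mirror],
        ["chroot", target, "apt-get", "-o", "Acquire::Check-Valid-Until=false", "update"],
        ["chroot", target, "apt-get", "install", "-y"] + pins,
    ]
```

Storing the manifest and the date alongside the analysis results is what makes the later "reincarnation" possible. The resulting chroot could then be registered with schroot, as mentioned above.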

