Let me add my opinion: we do not have perfect (easy) reproducibility with Bioc because we can only (easily) download the tarball corresponding to the latest commit in a given branch. I am OK with that. What I (and Alejandro) am concerned about is the inability to install even that.
There is a clear candidate for which version of the CRAN package we should store: the version we use when we run R CMD check. This is the version we implicitly say things are working with.

Best,
Kasper

On Fri, Apr 25, 2014 at 7:41 AM, Hervé Pagès <hpa...@fhcrc.org> wrote:

> Hi,
>
> See the latest software builds for BioC 2.13:
>
> http://bioconductor.org/checkResults/2.13/bioc-20140405/
>
> The number of packages that needed to be installed on the build
> system in order to build and check the 750 BioC software packages
> is displayed in the right-most column of the top table:
>
> 1510 on zin1 (Linux)
> 1486 on moscato1 (Windows)
> 1500 on perceval (Mac)
>
> If you click on these numbers, you get the full list of packages
> plus their version.
>
> Once you've subtracted the 750 software packages + the number of data
> annotation and data experiment packages (a few more hundreds) from
> these numbers, that gives you the number of CRAN packages that
> BioC 2.13 depends on. Not that many really (only a very small fraction
> of the 5400 CRAN packages).
>
> If we hosted only this small subset of CRAN packages under
>
> http://bioconductor.org/packages/2.13/cran
>
> next to the other 4 frozen repos
>
> http://bioconductor.org/packages/2.13/bioc
> http://bioconductor.org/packages/2.13/data/annotation
> http://bioconductor.org/packages/2.13/experiment
> http://bioconductor.org/packages/2.13/extra
>
> and have biocLite() modified to point to
>
> http://bioconductor.org/packages/2.13/cran
>
> instead of
>
> http://cran.fhcrc.org
>
> then anybody that has R 3.0.3 could *easily* install and run
> BioC 2.13 now or in 5 years from now.
>
> Cheers,
> H.
>
> On 04/24/2014 08:09 AM, Steve Lianoglou wrote:
>
>> Hi all,
>>
>> Just saw this tangentially related link to "packrat" which seems something
>> analogous to a virtualenv (of sorts) for R by the Rstudio folks, which I
>> thought might be useful
>>
>> It actually doesn't solve anybody's problem here, but as I said ...
>> tangential :-)
>>
>> http://rstudio.github.io/packrat/
>>
>> On Thursday, April 24, 2014, Wolfgang Huber <whu...@embl.de> wrote:
>>
>>> Hi Kasper
>>>
>>> you are right, I had misunderstood the problem.
>>> In that case I agree with Martin that the problem resolves into components
>>> that are either intractable, already addressed by deprecation policies, or
>>> not very important.
>>> Sorry for the noise.
>>>
>>> Wolfgang
>>>
>>> On 24 Apr 2014, at 15:18, Kasper Daniel Hansen <kasperdanielhan...@gmail.com> wrote:
>>>
>>>> Wolfgang,
>>>>
>>>> Alejandro did not have a problem with the current release, but with the
>>>> most recent prior release. His issue is precisely because it is no longer
>>>> the current (stable) release.
>>>>
>>>> Kasper
>>>>
>>>> On Thu, Apr 24, 2014 at 3:05 PM, Wolfgang Huber <whu...@embl.de> wrote:
>>>> Hi Martin
>>>> to come back to the original trigger for this thread: it was not
>>>> concerns for reproducibility, but the fact that a Bioc package in the
>>>> current release stopped working because a CRAN package has changed in the
>>>> meanwhile.
>>>> What's the most practical solution to this specific problem?
>>>> Best wishes
>>>> Wolfgang
>>>>
>>>> On 23 Apr 2014, at 19:41, Martin Morgan <mtmor...@fhcrc.org> wrote:
>>>>
>>>>> On 04/22/2014 09:47 AM, Kasper Daniel Hansen wrote:
>>>>>
>>>>>> I think we should have a CRAN snapshot (or a subset of CRAN used in
>>>>>> Bioc) inside each Bioc release; I don't know how hard that is to manage
>>>>>> from a technical point of view.
>>>>>
>>>>> I followed this thread with some interest.
>>>>>
>>>>> It would be surprisingly challenging to update even a 2.13 package --
>>>>> the build machines have moved on to other tasks, unconstrained by the
>>>>> unique system dependencies needed for 2.13 builds.
>>>>>
>>>>> The idea of a 'forever' repository snapshot seems possible, but would
>>>>> the snapshot be at the beginning of the release and hence miss the few but
>>>>> important bug fixes introduced during the release, or at the end of the
>>>>> release, which might be after the time required for the purposes of
>>>>> replication? Either way it is certain that the peanut butter would land
>>>>> face down for one's particular need. Also, the need for the user to satisfy
>>>>> system dependencies becomes increasingly challenging, even with a binary
>>>>> repository. I don't think a central 'Bioc' solution would really address
>>>>> the problem of reproducibility.
>>>>>
>>>>> It is not that 'hard' for an individual group to create a snapshot of
>>>>> Bioc and CRAN, using rsync
>>>>>
>>>>> http://www.bioconductor.org/about/mirrors/mirror-how-to/
>>>>> http://cran.r-project.org/mirror-howto.html
>>>>>
>>>>> and to use install.packages() or even biocLite to access these (see
>>>>> ?setRepositories). This would again require that the system dependencies
>>>>> for these packages are satisfied in some kind of frozen fashion.
>>>>>
>>>>> A more robust possibility is of course a virtual machine, such as the
>>>>> AMI (or a customized version) we provide
>>>>>
>>>>> http://www.bioconductor.org/help/bioconductor-cloud-ami/#ami_ids
>>>>>
>>>>> although these have only a subset of packages installed by default.
>>>>>
>>>>> The CRAN thread referenced earlier included this post
>>>>>
>>>>> https://stat.ethz.ch/pipermail/r-devel/2014-March/068605.html
>>>>>
>>>>> which I think makes an important distinction between exact replication
>>>>> and scientific reproducibility; it is the latter that must be the most
>>>>> interesting, and the former that we somehow seem to stumble over. The
>>>>> thread also mentions best practices -- version control
>>>>>
>>>>> http://bioconductor.org/developers/how-to/source-control/
>>>>>
>>>>> disciplined approach to deprecation
>>>>>
>>>>> http://bioconductor.org/developers/how-to/deprecation/
>>>>>
>>>>> package versioning
>>>>>
>>>>> http://bioconductor.org/developers/how-to/version-numbering/
>>>>>
>>>>> and the Bioc-style approach to release that we as developers can act
>>>>> on to enhance reproducibility. What other best pract
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
> _______________________________________________
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
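
For readers who want to see what the frozen-repository idea in Hervé's quoted message might look like from the user's side, here is a minimal sketch in R. It assumes the proposed http://bioconductor.org/packages/2.13/cran repository exists, which the thread only suggests; the helper name `frozen_repos()` and the repository labels are hypothetical and are not part of biocLite() or any Bioconductor package. The other four URLs are copied from the message as written.

```r
# Hypothetical helper: build the repository vector for one frozen BioC release.
# The "cran" entry is the repository PROPOSED in the thread; it is an assumption
# that it exists. The labels (BioCsoft, ...) are illustrative names only.
frozen_repos <- function(bioc_version,
                         base = "http://bioconductor.org/packages") {
  root <- paste0(base, "/", bioc_version)
  c(
    BioCsoft  = paste0(root, "/bioc"),
    BioCann   = paste0(root, "/data/annotation"),
    BioCexp   = paste0(root, "/experiment"),
    BioCextra = paste0(root, "/extra"),
    CRAN      = paste0(root, "/cran")   # proposed frozen CRAN subset
  )
}

repos <- frozen_repos("2.13")

# Make install.packages() resolve every dependency against the frozen
# repositories instead of a live CRAN mirror.
options(repos = repos)

# Not run here: needs network access and the proposed repository.
# "somePackage" is a placeholder, not a real package name.
# install.packages("somePackage", repos = repos)
```

The same vector could point at a local rsync mirror (for example `file://` URLs) along the lines Martin describes; either way, system dependencies still have to be satisfied separately, as noted in the thread.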