Sounds great.

Kasper

On Wed, Apr 30, 2014 at 9:02 PM, Martin Morgan <mtmor...@fhcrc.org> wrote:

> On 04/30/2014 05:30 PM, Kasper Daniel Hansen wrote:
>
>> Let me add my opinion: we do not have perfect (easy) reproducibility with
>> Bioc because we can only (easily) download the tar ball corresponding to
>> the latest commit in a given branch. I am ok with that. What I (and
>> Alejandro) is concerned about is the inability to install even that.
>>
>> There is a clear candidate for which version of the CRAN package we should
>> store: the version we use when we run R CMD check. This is the version we
>> implicitly say things are working with.
>
> We discussed this internally and are likely to create snapshots at the end
> of each release cycle of all Bioc packages and their CRAN dependencies.
> Perhaps these will be available too as an AMI. A snapshot facilitates
> (though hardly guarantees) reproducibility without too much cost, and is
> consistent with project objectives.
>
> Martin
>
>> Best,
>> Kasper
>>
>> On Fri, Apr 25, 2014 at 7:41 AM, Hervé Pagès <hpa...@fhcrc.org> wrote:
>>
>>> Hi,
>>>
>>> See the latest software builds for BioC 2.13:
>>>
>>> http://bioconductor.org/checkResults/2.13/bioc-20140405/
>>>
>>> The number of packages that needed to be installed on the build
>>> system in order to build and check the 750 BioC software packages
>>> is displayed in the right-most column of the top table:
>>>
>>> 1510 on zin1 (Linux)
>>> 1486 on moscato1 (Windows)
>>> 1500 on perceval (Mac)
>>>
>>> If you click on these numbers, you get the full list of packages
>>> plus their version.
>>>
>>> Once you've subtracted the 750 software packages + the number of data
>>> annotation and data experiment packages (a few more hundreds) from
>>> these numbers, that gives you the number of CRAN packages that
>>> BioC 2.13 depends on. Not that many really (only a very small fraction
>>> of the 5400 CRAN packages).
>>>
>>> If we hosted only this small subset of CRAN packages under
>>>
>>> http://bioconductor.org/packages/2.13/cran
>>>
>>> next to the other 4 frozen repos
>>>
>>> http://bioconductor.org/packages/2.13/bioc
>>> http://bioconductor.org/packages/2.13/data/annotation
>>> http://bioconductor.org/packages/2.13/experiment
>>> http://bioconductor.org/packages/2.13/extra
>>>
>>> and have biocLite() modified to point to
>>>
>>> http://bioconductor.org/packages/2.13/cran
>>>
>>> instead of
>>>
>>> http://cran.fhcrc.org
>>>
>>> then anybody that has R 3.0.3 could *easily* install and run
>>> BioC 2.13 now or in 5 years from now.
>>>
>>> Cheers,
>>> H.
>>>
>>> On 04/24/2014 08:09 AM, Steve Lianoglou wrote:
>>>
>>>> Hi all,
>>>>
>>>> Just saw this tangentially related link to "packrat" which seems
>>>> something analogous to a virtualenv (of sorts) for R by the Rstudio
>>>> folks, which I thought might be useful
>>>>
>>>> It actually doesn't solve anybody's problem here, but as I said ...
>>>> tangential :-)
>>>>
>>>> http://rstudio.github.io/packrat/
>>>>
>>>> On Thursday, April 24, 2014, Wolfgang Huber <whu...@embl.de> wrote:
>>>>
>>>>> Hi Kasper
>>>>>
>>>>> you are right, I had misunderstood the problem.
>>>>> In that case I agree with Martin that the problem resolves into
>>>>> components that are either intractable, already addressed by
>>>>> deprecation policies, or not very important.
>>>>> Sorry for the noise.
>>>>>
>>>>> Wolfgang
>>>>>
>>>>> On 24 Apr 2014, at 15:18, Kasper Daniel Hansen
>>>>> <kasperdanielhan...@gmail.com> wrote:
>>>>>
>>>>>> Wolfgang,
>>>>>>
>>>>>> Alejandro did not have a problem with the current release, but with
>>>>>> the most recent prior release. His issue is precisely because it is
>>>>>> no longer the current (stable) release.
>>>>>>
>>>>>> Kasper
>>>>>>
>>>>>> On Thu, Apr 24, 2014 at 3:05 PM, Wolfgang Huber <whu...@embl.de>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Martin
>>>>>>
>>>>>> to come back to the original trigger for this thread: it was not
>>>>>> concerns for reproducibility, but the fact that a Bioc package in the
>>>>>> current release stopped working because a CRAN package has changed in
>>>>>> the meanwhile.
>>>>>>
>>>>>> What's the most practical solution to this specific problem?
>>>>>>
>>>>>> Best wishes
>>>>>> Wolfgang
>>>>>>
>>>>>> On 23 Apr 2014, at 19:41, Martin Morgan <mtmor...@fhcrc.org> wrote:
>>>>>>
>>>>>>> On 04/22/2014 09:47 AM, Kasper Daniel Hansen wrote:
>>>>>>>
>>>>>>>> I think we should have a CRAN snapshot (or a subset of CRAN used in
>>>>>>>> Bioc) inside each Bioc release; I don't know how hard that is to
>>>>>>>> manage from a technical point of view.
>>>>>>>
>>>>>>> I followed this thread with some interest.
>>>>>>>
>>>>>>> It would be surprisingly challenging to update even a 2.13 package --
>>>>>>> the build machines have moved on to other tasks, unconstrained by the
>>>>>>> unique system dependencies needed for 2.13 builds.
>>>>>>>
>>>>>>> The idea of a 'forever' repository snapshot seems possible, but would
>>>>>>> the snapshot be at the beginning of the release and hence miss the
>>>>>>> few but important bug fixes introduced during the release, or at the
>>>>>>> end of the release, which might be after the time required for the
>>>>>>> purposes of replication? Either way it is certain that the peanut
>>>>>>> butter would land face down for one's particular need. Also, the need
>>>>>>> for the user to satisfy system dependencies becomes increasingly
>>>>>>> challenging, even with a binary repository.
>>>>>>> I don't think a central 'Bioc' solution would really address the
>>>>>>> problem of reproducibility.
>>>>>>>
>>>>>>> It is not that 'hard' for an individual group to create a snapshot of
>>>>>>> Bioc and CRAN, using rsync
>>>>>>>
>>>>>>> http://www.bioconductor.org/about/mirrors/mirror-how-to/
>>>>>>> http://cran.r-project.org/mirror-howto.html
>>>>>>>
>>>>>>> and to use install.packages() or even biocLite to access these (see
>>>>>>> ?setRepositories). This would again require that the system
>>>>>>> dependencies for these packages are satisfied in some kind of frozen
>>>>>>> fashion.
>>>>>>>
>>>>>>> A more robust possibility is of course a virtual machine, such as the
>>>>>>> AMI (or a customized version) we provide
>>>>>>>
>>>>>>> http://www.bioconductor.org/help/bioconductor-cloud-ami/#ami_ids
>>>>>>>
>>>>>>> although these have only a subset of packages installed by default.
>>>>>>>
>>>>>>> The CRAN thread referenced earlier included this post
>>>>>>>
>>>>>>> https://stat.ethz.ch/pipermail/r-devel/2014-March/068605.html
>>>>>>>
>>>>>>> which I think makes an important distinction between exact
>>>>>>> replication and scientific reproducibility; it is the latter that
>>>>>>> must be the most interesting, and the former that we somehow seem to
>>>>>>> stumble over. The thread also mentions best practices -- version
>>>>>>> control
>>>>>>>
>>>>>>> http://bioconductor.org/developers/how-to/source-control/
>>>>>>>
>>>>>>> disciplined approach to deprecation
>>>>>>>
>>>>>>> http://bioconductor.org/developers/how-to/deprecation/
>>>>>>>
>>>>>>> package versioning
>>>>>>>
>>>>>>> http://bioconductor.org/developers/how-to/version-numbering/
>>>>>>>
>>>>>>> and the Bioc-style approach to release that we as developers can act
>>>>>>> on to enhance reproducibility.
>>>>>>> What other best pract
>>>
>>> --
>>> Hervé Pagès
>>>
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>>
>>> E-mail: hpa...@fhcrc.org
>>> Phone: (206) 667-5791
>>> Fax: (206) 667-1319
>>>
>>> _______________________________________________
>>> Bioc-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
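
A minimal R sketch of the frozen-repository idea in Hervé's message quoted above: it points an R session at a release-specific set of repositories instead of a live CRAN mirror. The first four paths are the ones listed in the thread; the `/cran` path is only the proposal and is assumed here, not known to exist, and the names used in `frozen_repos` are illustrative labels.

```r
# Sketch only, under the assumptions stated above.
bioc_version <- "2.13"
base_url <- paste0("http://bioconductor.org/packages/", bioc_version)

frozen_repos <- c(
  BioCsoft  = paste0(base_url, "/bioc"),
  BioCann   = paste0(base_url, "/data/annotation"),
  BioCexp   = paste0(base_url, "/experiment"),
  BioCextra = paste0(base_url, "/extra"),
  CRAN      = paste0(base_url, "/cran")  # hypothetical frozen CRAN subset
)

# Make the frozen set the default for this session, replacing any
# live CRAN mirror that was configured before.
options(repos = frozen_repos)

# install.packages() looks up packages and their dependencies in
# getOption("repos"), so a call like the one below would only see the
# frozen versions. It is commented out because the "cran" repository
# above is hypothetical and "somePackage" is a placeholder name.
# install.packages("somePackage")
```

The same vector could be passed directly as the `repos` argument of install.packages() if changing the session-wide option is not wanted. As noted in the thread, this does nothing about system dependencies, which would still have to be satisfied separately.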