Just to chime in, we have adopted the same snapshot policy within
Genentech; we might actually write something up on our policies, since
we've invested a lot of thought into them.

We're also set to release our repository management system (GRAN) on GitHub
for public consumption. It was written by Gabe Becker and has all sorts of
bells and whistles: total abstraction of where packages live (CRAN,
Bioconductor, GitHub, R-Forge, arbitrary svn, no problem), easy integration
with CI systems like Jenkins, and something along the same lines as packrat,
except focused on ease of reproducibility. That includes parsing and diffing
session infos, and constructing a lib path or repository from a session
info. Supporting specific package versions took a lot of work (for Bioc we
walk the svn logs, for CRAN we rely on the source package archives, for
GitHub we record the commit hash, etc.). It was designed from the ground up
to be deployed anywhere. Gabe expects that eventually there will be public
repositories associated with DOIs, so that, e.g., a paper would have a
corresponding repository. Fun stuff.
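The session-info diffing mentioned above can be illustrated with a minimal
Python sketch. It is not GRAN's code or API: it assumes each session info
has already been parsed into a plain mapping from package name to version
string (the helper names and placeholder packages are hypothetical), and it
relies on CRAN's standard layout, where the current source tarball sits
under src/contrib/ and superseded versions under src/contrib/Archive/<pkg>/.

```python
from typing import Dict, Optional, Tuple

# Assumption: any CRAN mirror with the standard repository layout.
CRAN = "http://cran.r-project.org"

Change = Tuple[Optional[str], Optional[str]]


def diff_session_infos(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, Change]:
    """Compare two parsed session infos ({package: version}).

    Returns {package: (old_version, new_version)} for every package that was
    added, removed, or changed; the missing side of an addition or removal
    is None. Packages are reported in alphabetical order.
    """
    changes: Dict[str, Change] = {}
    for pkg in sorted(set(old) | set(new)):
        before, after = old.get(pkg), new.get(pkg)
        if before != after:
            changes[pkg] = (before, after)
    return changes


def cran_source_url(pkg: str, version: str, archived: bool = True) -> str:
    """Build the URL of a CRAN source tarball.

    Superseded versions live under src/contrib/Archive/<pkg>/; the current
    version lives directly under src/contrib/.
    """
    tarball = f"{pkg}_{version}.tar.gz"
    if archived:
        return f"{CRAN}/src/contrib/Archive/{pkg}/{tarball}"
    return f"{CRAN}/src/contrib/{tarball}"


if __name__ == "__main__":
    # Hypothetical session infos; package names are placeholders.
    old = {"pkgA": "1.0", "pkgB": "2.1", "pkgC": "0.3"}
    new = {"pkgA": "1.1", "pkgB": "2.1", "pkgD": "0.9"}
    for pkg, (before, after) in diff_session_infos(old, new).items():
        print(pkg, before, "->", after)
    # To rebuild the old library, fetch the archived tarball for pkgA 1.0.
    print(cran_source_url("pkgA", "1.0"))
```

A real tool would also have to know which repository each package came from
(Bioc packages are not in the CRAN archive), which is the kind of
bookkeeping GRAN is described as doing.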

Michael



On Wed, Apr 30, 2014 at 6:02 PM, Martin Morgan <mtmor...@fhcrc.org> wrote:

> On 04/30/2014 05:30 PM, Kasper Daniel Hansen wrote:
>
>> Let me add my opinion: we do not have perfect (easy) reproducibility with
>> Bioc because we can only (easily) download the tarball corresponding to
>> the latest commit in a given branch. I am ok with that. What Alejandro and
>> I are concerned about is the inability to install even that.
>>
>> There is a clear candidate for which version of each CRAN package we
>> should store: the version we use when we run R CMD check. This is the
>> version we implicitly say things are working with.
>>
>
> We discussed this internally and are likely, at the end of each release
> cycle, to create snapshots of all Bioc packages and their CRAN
> dependencies. Perhaps these will also be available as an AMI. A snapshot
> facilitates (though hardly guarantees) reproducibility without too much
> cost, and is consistent with project objectives.
>
> Martin
>
>
>> Best,
>> Kasper
>>
>>
>>
>> On Fri, Apr 25, 2014 at 7:41 AM, Hervé Pagès <hpa...@fhcrc.org> wrote:
>>
>>> Hi,
>>>
>>> See the latest software builds for BioC 2.13:
>>>
>>>    http://bioconductor.org/checkResults/2.13/bioc-20140405/
>>>
>>> The number of packages that needed to be installed on the build
>>> system in order to build and check the 750 BioC software packages
>>> is displayed in the right-most column of the top table:
>>>
>>>    1510 on zin1 (Linux)
>>>    1486 on moscato1 (Windows)
>>>    1500 on perceval (Mac)
>>>
>>> If you click on these numbers, you get the full list of packages
>>> plus their version.
>>>
>>> Once you've subtracted the 750 software packages and the data annotation
>>> and data experiment packages (a few hundred more) from these numbers,
>>> what remains is the number of CRAN packages that BioC 2.13 depends on.
>>> Not that many, really (only a very small fraction of the 5400 CRAN
>>> packages).
>>>
>>> If we hosted only this small subset of CRAN packages under
>>>
>>>    http://bioconductor.org/packages/2.13/cran
>>>
>>> next to the other 4 frozen repos
>>>
>>>    http://bioconductor.org/packages/2.13/bioc
>>>    http://bioconductor.org/packages/2.13/data/annotation
>>>    http://bioconductor.org/packages/2.13/experiment
>>>    http://bioconductor.org/packages/2.13/extra
>>>
>>> and have biocLite() modified to point to
>>>
>>>    http://bioconductor.org/packages/2.13/cran
>>>
>>> instead of
>>>
>>>    http://cran.fhcrc.org
>>>
>>> then anybody who has R 3.0.3 could *easily* install and run
>>> BioC 2.13, now or 5 years from now.
>>>
>>> Cheers,
>>> H.
>>>
>>>
>>>
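Hervé's subtraction amounts to a set difference. The Python sketch below is
only an illustration under stated assumptions: the package list behind the
build report is taken as hypothetical "name version" lines, the BioC
software, annotation and experiment package names are supplied as sets, and
all package names are placeholders. Whatever remains would be the pinned
CRAN subset to host under the proposed packages/2.13/cran repository; if the
list also contains packages shipped with R itself, they can be passed as one
more exclusion set.

```python
from typing import Dict, Iterable, Set


def parse_installed(lines: Iterable[str]) -> Dict[str, str]:
    """Parse hypothetical 'name version' lines into {name: version},
    skipping blank or malformed lines."""
    installed: Dict[str, str] = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            installed[fields[0]] = fields[1]
    return installed


def cran_subset(installed: Dict[str, str], *exclude: Set[str]) -> Dict[str, str]:
    """Drop every package named in any exclusion set (BioC software,
    annotation, experiment, ...); the remainder is assumed to come from CRAN.
    The result is ordered by package name."""
    excluded = set().union(*exclude)
    return {name: ver for name, ver in sorted(installed.items()) if name not in excluded}


if __name__ == "__main__":
    # Placeholder entries standing in for one build machine's package list.
    report = ["biocPkg 1.2.0", "annoPkg 2.10.0", "expPkg 1.0.0", "cranPkg 0.4-1", ""]
    software, annotation, experiment = {"biocPkg"}, {"annoPkg"}, {"expPkg"}
    pinned = cran_subset(parse_installed(report), software, annotation, experiment)
    print(len(pinned), pinned)
```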
>>> On 04/24/2014 08:09 AM, Steve Lianoglou wrote:
>>>
>>>> Hi all,
>>>>
>>>> Just saw this tangentially related link to "packrat", which seems to be
>>>> something analogous to a virtualenv (of sorts) for R by the RStudio
>>>> folks, which I thought might be useful.
>>>>
>>>> It actually doesn't solve anybody's problem here, but as I said ...
>>>> tangential :-)
>>>>
>>>> http://rstudio.github.io/packrat/
>>>>
>>>>
>>>> On Thursday, April 24, 2014, Wolfgang Huber <whu...@embl.de> wrote:
>>>>
>>>>> Hi Kasper
>>>>>
>>>>> you are right, I had misunderstood the problem.
>>>>> In that case I agree with Martin that the problem resolves into
>>>>> components that are either intractable, already addressed by
>>>>> deprecation policies, or not very important.
>>>>> Sorry for the noise.
>>>>>
>>>>>           Wolfgang
>>>>>
>>>>> On 24 Apr 2014, at 15:18, Kasper Daniel Hansen <kasperdanielhan...@gmail.com> wrote:
>>>>>
>>>>>> Wolfgang,
>>>>>>
>>>>>> Alejandro did not have a problem with the current release, but with
>>>>>> the most recent prior release. His issue arises precisely because it
>>>>>> is no longer the current (stable) release.
>>>>>>
>>>>>> Kasper
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 24, 2014 at 3:05 PM, Wolfgang Huber <whu...@embl.de> wrote:
>>>>>> Hi Martin
>>>>>> to come back to the original trigger for this thread: it was not
>>>>>> concern for reproducibility, but the fact that a Bioc package in the
>>>>>> current release stopped working because a CRAN package had changed in
>>>>>> the meantime.
>>>>>> What's the most practical solution to this specific problem?
>>>>>>           Best wishes
>>>>>>           Wolfgang
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 23 Apr 2014, at 19:41, Martin Morgan <mtmor...@fhcrc.org> wrote:
>>>>>>
>>>>>>> On 04/22/2014 09:47 AM, Kasper Daniel Hansen wrote:
>>>>>>>
>>>>>>>> I think we should have a CRAN snapshot (or a subset of CRAN used in
>>>>>>>> Bioc) inside each Bioc release; I don't know how hard that is to
>>>>>>>> manage from a technical point of view.
>>>>>>>>
>>>>>>> I followed this thread with some interest.
>>>>>>>
>>>>>>> It would be surprisingly challenging to update even a 2.13 package;
>>>>>>> the build machines have moved on to other tasks, unconstrained by the
>>>>>>> unique system dependencies needed for 2.13 builds.
>>>>>>>
>>>>>>> The idea of a 'forever' repository snapshot seems possible, but would
>>>>>>> the snapshot be taken at the beginning of the release, and hence miss
>>>>>>> the few but important bug fixes introduced during the release, or at
>>>>>>> the end of the release, which might be later than the time required
>>>>>>> for the purposes of replication? Either way, it is certain that the
>>>>>>> peanut butter would land face down for one's particular need. Also,
>>>>>>> satisfying system dependencies becomes increasingly challenging for
>>>>>>> the user, even with a binary repository. I don't think a central
>>>>>>> 'Bioc' solution would really address the problem of reproducibility.
>>>>>>>
>>>>>>> It is not that 'hard' for an individual group to create a snapshot of
>>>>>>> Bioc and CRAN using rsync
>>>>>>>
>>>>>>>    http://www.bioconductor.org/about/mirrors/mirror-how-to/
>>>>>>>    http://cran.r-project.org/mirror-howto.html
>>>>>>>
>>>>>>> and to use install.packages() or even biocLite() to access these (see
>>>>>>> ?setRepositories). This would again require that the system
>>>>>>> dependencies for these packages be satisfied in some kind of frozen
>>>>>>> fashion.
>>>>>>>
>>>>>>> A more robust possibility is, of course, a virtual machine, such as
>>>>>>> the AMI (or a customized version) we provide
>>>>>>>
>>>>>>>    http://www.bioconductor.org/help/bioconductor-cloud-ami/#ami_ids
>>>>>>>
>>>>>>> although these have only a subset of packages installed by default.
>>>>>>>
>>>>>>> The CRAN thread referenced earlier included this post
>>>>>>>
>>>>>>>    https://stat.ethz.ch/pipermail/r-devel/2014-March/068605.html
>>>>>>>
>>>>>>> which I think makes an important distinction between exact
>>>>>>> replication and scientific reproducibility; it is the latter that must
>>>>>>> be the more interesting, and the former that we somehow seem to
>>>>>>> stumble over. The thread also mentions best practices: version control
>>>>>>>
>>>>>>>    http://bioconductor.org/developers/how-to/source-control/
>>>>>>>
>>>>>>> a disciplined approach to deprecation
>>>>>>>
>>>>>>>    http://bioconductor.org/developers/how-to/deprecation/
>>>>>>>
>>>>>>> package versioning
>>>>>>>
>>>>>>>    http://bioconductor.org/developers/how-to/version-numbering/
>>>>>>>
>>>>>>> and the Bioc-style approach to release, which we as developers can
>>>>>>> act on to enhance reproducibility. What other best pract
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>> --
>>> Hervé Pagès
>>>
>>>
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>>
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>>
>>> E-mail: hpa...@fhcrc.org
>>> Phone:  (206) 667-5791
>>> Fax:    (206) 667-1319
>>>
>>>
>>> _______________________________________________
>>> Bioc-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>>
>>
>>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>
>
