Sounds great.

Kasper


On Wed, Apr 30, 2014 at 9:02 PM, Martin Morgan <mtmor...@fhcrc.org> wrote:

> On 04/30/2014 05:30 PM, Kasper Daniel Hansen wrote:
>
>> Let me add my opinion: we do not have perfect (easy) reproducibility with
>> Bioc because we can only (easily) download the tarball corresponding to
>> the latest commit in a given branch.  I am OK with that.  What Alejandro
>> and I are concerned about is the inability to install even that.
>>
>> There is a clear candidate for which version of each CRAN package we
>> should store: the version we use when we run R CMD check.  This is the
>> version we implicitly say things are working with.
>>
>
> We discussed this internally and are likely to create, at the end of each
> release cycle, snapshots of all Bioc packages and their CRAN dependencies.
> Perhaps these will also be available as an AMI. A snapshot facilitates
> (though hardly guarantees) reproducibility without too much cost, and is
> consistent with project objectives.
>
> Martin
>
>
>> Best,
>> Kasper
>>
>>
>> On Fri, Apr 25, 2014 at 7:41 AM, Hervé Pagès <hpa...@fhcrc.org> wrote:
>>
>>> Hi,
>>>
>>> See the latest software builds for BioC 2.13:
>>>
>>>    http://bioconductor.org/checkResults/2.13/bioc-20140405/
>>>
>>> The number of packages that needed to be installed on the build
>>> system in order to build and check the 750 BioC software packages
>>> is displayed in the right-most column of the top table:
>>>
>>>    1510 on zin1 (Linux)
>>>    1486 on moscato1 (Windows)
>>>    1500 on perceval (Mac)
>>>
>>> If you click on these numbers, you get the full list of packages
>>> plus their version.
>>>
>>> Subtracting the 750 software packages and the data annotation and data
>>> experiment packages (a few hundred more) from these numbers gives the
>>> number of CRAN packages that BioC 2.13 depends on. Not that many really
>>> (only a very small fraction of the 5400 CRAN packages).
>>>
>>> If we hosted only this small subset of CRAN packages under
>>>
>>>    http://bioconductor.org/packages/2.13/cran
>>>
>>> next to the other 4 frozen repos
>>>
>>>    http://bioconductor.org/packages/2.13/bioc
>>>    http://bioconductor.org/packages/2.13/data/annotation
>>>    http://bioconductor.org/packages/2.13/data/experiment
>>>    http://bioconductor.org/packages/2.13/extra
>>>
>>> and have biocLite() modified to point to
>>>
>>>     http://bioconductor.org/packages/2.13/cran
>>>
>>> instead of
>>>
>>>    http://cran.fhcrc.org
>>>
>>> then anybody who has R 3.0.3 could *easily* install and run
>>> BioC 2.13, now or 5 years from now.
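A minimal sketch in R of what this proposal would mean on the user's side. It assumes the frozen CRAN subset exists at http://bioconductor.org/packages/2.13/cran, which at the time of this thread was only a proposal. The other four URLs are the frozen BioC 2.13 repositories. The sketch uses base R's `options(repos = ...)` rather than a modified `biocLite()`, whose internals are not shown here.

```r
## Sketch only: the "CRAN" entry is the repository *proposed* in this
## thread (an assumption), not something R or biocLite() provides.
frozen_bioc_2.13 <- c(
  BioCsoft  = "http://bioconductor.org/packages/2.13/bioc",
  BioCann   = "http://bioconductor.org/packages/2.13/data/annotation",
  BioCexp   = "http://bioconductor.org/packages/2.13/data/experiment",
  BioCextra = "http://bioconductor.org/packages/2.13/extra",
  CRAN      = "http://bioconductor.org/packages/2.13/cran"  # hypothetical
)

## Make these the only repositories install.packages() consults, so no
## dependency can be resolved against the live CRAN.
options(repos = frozen_bioc_2.13)

## Under R 3.0.3 one would then install as usual (requires network access):
## install.packages("limma")
```

Pinning the `repos` option like this fixes which package versions are resolved. It does nothing about system dependencies.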
>>>
>>> Cheers,
>>> H.
>>>
>>>
>>>
>>> On 04/24/2014 08:09 AM, Steve Lianoglou wrote:
>>>
>>>> Hi all,
>>>>
>>>> Just saw this tangentially related link to "packrat", which seems to be
>>>> something analogous to a virtualenv (of sorts) for R, by the RStudio
>>>> folks; I thought it might be useful.
>>>>
>>>> It actually doesn't solve anybody's problem here, but as I said ...
>>>> tangential :-)
>>>>
>>>> http://rstudio.github.io/packrat/
>>>>
>>>>
>>>> On Thursday, April 24, 2014, Wolfgang Huber <whu...@embl.de> wrote:
>>>>
>>>>> Hi Kasper,
>>>>>
>>>>> you are right, I had misunderstood the problem.
>>>>> In that case I agree with Martin that the problem resolves into
>>>>> components that are either intractable, already addressed by
>>>>> deprecation policies, or not very important.
>>>>> Sorry for the noise.
>>>>>
>>>>>           Wolfgang
>>>>>
>>>>> On 24 Apr 2014, at 15:18, Kasper Daniel Hansen <kasperdanielhan...@gmail.com> wrote:
>>>>>
>>>>>> Wolfgang,
>>>>>>
>>>>>> Alejandro did not have a problem with the current release, but with
>>>>>> the most recent prior release.  His issue is precisely because it is
>>>>>> no longer the current (stable) release.
>>>>>>
>>>>>> Kasper
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 24, 2014 at 3:05 PM, Wolfgang Huber <whu...@embl.de>
>>>>>> wrote:
>>>>>> Hi Martin,
>>>>>> to come back to the original trigger for this thread: it was not
>>>>>> concerns about reproducibility, but the fact that a Bioc package in
>>>>>> the current release stopped working because a CRAN package had
>>>>>> changed in the meantime.
>>>>>> What's the most practical solution to this specific problem?
>>>>>>           Best wishes
>>>>>>           Wolfgang
>>>>>>
>>>>>> On 23 Apr 2014, at 19:41, Martin Morgan <mtmor...@fhcrc.org> wrote:
>>>>>>
>>>>>>> On 04/22/2014 09:47 AM, Kasper Daniel Hansen wrote:
>>>>>>>
>>>>>>>> I think we should have a CRAN snapshot (or a subset of CRAN used in
>>>>>>>> Bioc) inside each Bioc release; I don't know how hard that is to
>>>>>>>> manage from a technical point of view.
>>>>>>>>
>>>>>>> I followed this thread with some interest.
>>>>>>>
>>>>>>> It would be surprisingly challenging to update even a 2.13 package --
>>>>>>> the build machines have moved on to other tasks, unconstrained by the
>>>>>>> unique system dependencies needed for 2.13 builds.
>>>>>>>
>>>>>>> The idea of a 'forever' repository snapshot seems possible, but would
>>>>>>> the snapshot be at the beginning of the release and hence miss the few
>>>>>>> but important bug fixes introduced during the release, or at the end
>>>>>>> of the release, which might be after the time required for the
>>>>>>> purposes of replication? Either way it is certain that the peanut
>>>>>>> butter would land face down for one's particular need. Also, the need
>>>>>>> for the user to satisfy system dependencies becomes increasingly
>>>>>>> challenging, even with a binary repository. I don't think a central
>>>>>>> 'Bioc' solution would really address the problem of reproducibility.
>>>>>>>
>>>>>>> It is not that 'hard' for an individual group to create a snapshot of
>>>>>>> Bioc and CRAN, using rsync
>>>>>>>
>>>>>>>    http://www.bioconductor.org/about/mirrors/mirror-how-to/
>>>>>>>    http://cran.r-project.org/mirror-howto.html
>>>>>>>
>>>>>>> and to use install.packages() or even biocLite to access these (see
>>>>>>> ?setRepositories). This would again require that the system
>>>>>>> dependencies for these packages are satisfied in some kind of frozen
>>>>>>> fashion.
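A minimal sketch of the second half of this suggestion: pointing R at a snapshot that a group has already mirrored with rsync. The directory `/srv/snapshots/bioc-2.13` and its `cran/` and `bioc/` subdirectories are hypothetical. `contrib.url()` (from base R's utils package) shows where `install.packages()` would look for source packages, under `src/contrib` of each repository.

```r
## Hypothetical root of a locally rsync'd snapshot; adjust to your layout.
snapshot_root <- "file:///srv/snapshots/bioc-2.13"

local_repos <- c(
  CRAN     = paste0(snapshot_root, "/cran"),
  BioCsoft = paste0(snapshot_root, "/bioc")
)
options(repos = local_repos)

## The source-package index locations install.packages() will read,
## i.e. each repository URL followed by "/src/contrib".
contrib.url(getOption("repos"), type = "source")

## install.packages("somePackage", type = "source")  # hypothetical package
```

`setRepositories()` can also edit the `repos` option, and a file:// snapshot like this stays fixed for as long as the directory is left untouched.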
>>>>>>>
>>>>>>> A more robust possibility is of course a virtual machine, such as the
>>>>>>> AMI (or a customized version) we provide
>>>>>>>
>>>>>>>    http://www.bioconductor.org/help/bioconductor-cloud-ami/#ami_ids
>>>>>>>
>>>>>>> although these have only a subset of packages installed by default.
>>>>>>>
>>>>>>> The CRAN thread referenced earlier included this post
>>>>>>>
>>>>>>>    https://stat.ethz.ch/pipermail/r-devel/2014-March/068605.html
>>>>>>>
>>>>>>> which I think makes an important distinction between exact
>>>>>>> replication and scientific reproducibility; it is the latter that
>>>>>>> must be the most interesting, and the former that we somehow seem to
>>>>>>> stumble over. The thread also mentions best practices -- version
>>>>>>> control
>>>>>>>
>>>>>>>    http://bioconductor.org/developers/how-to/source-control/
>>>>>>>
>>>>>>> disciplined approach to deprecation
>>>>>>>
>>>>>>>    http://bioconductor.org/developers/how-to/deprecation/
>>>>>>>
>>>>>>> package versioning
>>>>>>>
>>>>>>>    http://bioconductor.org/developers/how-to/version-numbering/
>>>>>>>
>>>>>>> and the Bioc-style approach to release that we as developers can act
>>>>>>> on to enhance reproducibility. What other best pract
>>>
>>> --
>>> Hervé Pagès
>>>
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>>
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>>
>>> E-mail: hpa...@fhcrc.org
>>> Phone:  (206) 667-5791
>>> Fax:    (206) 667-1319
>>>
>>>
>>> _______________________________________________
>>> Bioc-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>
