Re: [Bioc-devel] Dependencies in Bioconductor dockers

Nathan Sheffield Mon, 31 Aug 2015 23:51:06 -0700


On 08/31/2015 09:52 AM, Laurent Gatto wrote:

On 29 August 2015 01:19, Martin Morgan wrote:

On 08/28/2015 02:51 PM, Dan Tenenbaum wrote:


----- Original Message -----

From: "Laurent Gatto" <lg...@cam.ac.uk>
To: "Dan Tenenbaum" <dtene...@fredhutch.org>
Cc: "Kasper Daniel Hansen" <kasperdanielhan...@gmail.com>, "bioC-devel" <bioc-de...@stat.math.ethz.ch>, "Laurent Gatto" <lg...@cam.ac.uk>
Sent: Friday, August 28, 2015 2:28:29 PM
Subject: Re: [Bioc-devel] Dependencies in Bioconductor dockers


On 28 August 2015 20:42, Dan Tenenbaum wrote:

----- Original Message -----

From: "Kasper Daniel Hansen" <kasperdanielhan...@gmail.com>
To: "Laurent Gatto" <lg...@cam.ac.uk>
Cc: "bioC-devel" <bioc-de...@stat.math.ethz.ch>
Sent: Wednesday, August 26, 2015 2:36:08 PM
Subject: Re: [Bioc-devel] Dependencies in Bioconductor dockers

This might be especially nice if we use the docker containers for
R CMD check.

In this case, you would be checking your own package, right, so the
docker image cannot know in advance what the Suggests dependencies of
your package are.
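Dan's point can be made concrete. The Suggests field lives only in the DESCRIPTION file of the package being checked, so a checking image would have to read it at check time. The Python sketch below shows one way to do that. It assumes a standard DESCRIPTION file (Debian Control File format, where indented lines continue the previous field), strips version constraints such as `(>= 1.2.0)`, and skips R itself and the base packages that ship with R. The function names are hypothetical; this is an illustration, not part of any Bioconductor tooling.

```python
import re

# R itself and the base packages ship with every R install; never install them.
BASE_PACKAGES = {
    "R", "base", "compiler", "datasets", "graphics", "grDevices", "grid",
    "methods", "parallel", "splines", "stats", "stats4", "tcltk", "tools",
    "utils",
}


def parse_dcf(text):
    """Parse DESCRIPTION text (Debian Control File format) into a dict.

    Lines that start with whitespace continue the previous field.
    """
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and key is not None:
            # Continuation line: append to the field currently being read.
            fields[key] += " " + line.strip()
        else:
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
    return fields


def declared_packages(description_text, field="Suggests"):
    """Return the sorted, non-base package names listed in one field."""
    value = parse_dcf(description_text).get(field, "")
    names = set()
    for entry in value.split(","):
        # Drop version constraints such as "(>= 1.2.0)".
        name = re.sub(r"\(.*?\)", "", entry).strip()
        if name and name not in BASE_PACKAGES:
            names.add(name)
    return sorted(names)
```

For a hypothetical `myPkg` whose DESCRIPTION contains `Suggests: knitr, BiocStyle,` followed by an indented `testthat (>= 0.10.0)` line, `declared_packages(text)` returns `['BiocStyle', 'knitr', 'testthat']`; passing `field="Imports"` or `field="Depends"` covers the other dependency fields.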

[More below].

On Wed, Aug 26, 2015 at 10:56 PM, Laurent Gatto <lg...@cam.ac.uk> wrote:

Dear all,

As far as I can see, the Suggests dependencies of a package are not
included in the docker containers. Would you consider adding these? It
would be nice to be able to run all examples and vignette code of the
packages available in a container.


Adding the Suggests dependencies of all packages installed on the
image is going to make the image much bigger. This request comes soon
after other requests to reduce the size of the images. We should
probably have a wider discussion and decide exactly what type of
docker images we want to have.

Use cases that have been mentioned are:

    - an image for building/checking with Travis (sounds similar to
      Kasper's request above). For this one in particular, small size
      is important, as Travis has to build its environment from
      scratch every time, and loading large images takes too long.
    - an image that has the Suggests dependencies of all installed
      packages installed.

We might want to pick a different way to decide what packages are
installed on a given image. Currently we install all packages with a
given biocView (Sequencing, for example), and this leads to very large
images (sequencing = ~7.5GB).

Thank you for these clarifications, Dan.

If some users want full/complete containers while others require
light ones, would it make sense to distribute both? Would that be much
overhead?

I think it definitely makes sense to distribute the light containers. (And even
then, I want to see how small a 'light' container is: one that contains R,
LaTeX, and every system dependency that we know about.)
I am a little hesitant to make the existing bloated containers even bigger by 
adding all the Suggests dependencies. That's why I said we might want to 
revisit the way we decide what packages are on a given container. Right now we 
use biocViews (Microarray, Sequencing, Proteomics, FlowCytometry) but that 
results in huge containers containing many packages that people arguably don't 
use that much but just happen to have the correct biocView. Of course it does 
have the benefit of being a somewhat democratic method.

I don't really know what I'm talking about, but does it make sense to think of
the docker images provided by Bioconductor as building blocks for more
specialized containers? i.e., that it should not be 'hard' for a developer to
make an image that is appropriate for their particular needs?

It seems like there's value to some level of nimbleness provided by small
container size. I also wonder about LaTeX; it seems like HTML vignettes are
way better, and since docker images are forward-looking, maybe the images should
be provisioned with the notion that they'll support HTML?

Maybe there could be a docker-factory script that would take the name of a base
image and the path to a package repository, and create a derived image with the
additional necessary dependencies?
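A docker-factory along these lines could be quite small. The Python sketch below assumes the package names have already been collected (for example from the repository's DESCRIPTION file) and renders a Dockerfile that starts `FROM` a given base image and installs those packages with `biocLite()`, the Bioconductor installer in use at the time of this thread. The function names are hypothetical and this is not an existing Bioconductor tool.

```python
from pathlib import Path


def render_dockerfile(base_image, packages):
    """Render a Dockerfile that installs `packages` on top of `base_image`.

    Installation goes through biocLite(), assumed to be the installer of
    choice; swap in a different R installer call if needed.
    """
    lines = [f"FROM {base_image}"]
    if packages:
        # Build an R character vector such as c("knitr", "testthat").
        r_vector = ", ".join(f'"{name}"' for name in packages)
        install = (
            'source("https://bioconductor.org/biocLite.R"); '
            f"biocLite(c({r_vector}), ask = FALSE)"
        )
        # Single-quote the R expression so the shell passes it intact.
        lines.append(f"RUN Rscript -e '{install}'")
    return "\n".join(lines) + "\n"


def write_dockerfile(base_image, packages, out_dir):
    """Write the rendered Dockerfile into `out_dir` and return its path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / "Dockerfile"
    path.write_text(render_dockerfile(base_image, packages))
    return path
```

Calling `write_dockerfile("bioconductor/release_base", ["knitr", "testthat"], "my-image")` and then `docker build -t my-image my-image/` would produce the derived image; the base tag and directory name here are assumptions for illustration. Keeping one `RUN` line for all packages means the additions land in a single image layer.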

That sounds like a great idea. It would still be nice if Bioconductor
kept the topic specific containers (flow, microarrays, proteomics,
sequencing).

Laurent

I can pitch in a viewpoint here... I'm doing basically exactly this. I've created several of my own Dockerfiles, which essentially use the base bioconductor images, and then build on these various combinations of packages that I need: one for production, one for development, etc.

I even wrote a few "R setup" scripts that just take a list of packages, and then install these into a new container on top of the bioconductor base images. Seems like almost exactly what you're describing, actually.

I don't think it's really reasonable to expect bioconductor to create docker images like this for every possible use case; but providing a base image is very useful, and then people (like me) can use this to build our own containers, with whatever packages we require. We could even write a tutorial on how to do this...

I don't think it's particularly useful to make huge, even democratic containers with all packages of a certain type, honestly.

It's a work in progress, but my repo with a couple of Dockerfiles and setup scripts that do this is here, if anyone is interested: https://github.com/sheffien/docker


-Nathan

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
