Hi everyone,

I would like to start a discussion about integrating publication of the
Flink Docker images hosted on Docker Hub[1] more tightly with the Flink
release process. Apologies in advance for the long post.

More than two and a half years ago (time flies!) we introduced “official”
Docker images for Flink[2]. Since then, the popularity of running
containerized applications in general and containerized Flink in particular
has continued to grow. Today, Flink is one of the most popular “official”
images on Docker Hub[3].

> A graph of Flink Docker image pulls over time:
https://gist.githubusercontent.com/patricklucas/7312444b1056ff82528e9a129e74e2b3/raw/9c8e139c1abc70b2b3fb34aadd7f44d46a540fe8/docker-flink-pulls.png

“Official” is in quotation marks because while that’s how the Docker
community refers to top-level images on Docker Hub (i.e. those that can be
run with just <docker run foo>), they are not official in the sense of
being officially endorsed by the Flink PMC.

I think it’s time for that to change.

Currently, the Dockerfiles that produce these images are maintained in a
repository called docker-flink[4] in a separate, community-managed GitHub
organization of the same name. When a new release of Flink is available, or
when other changes are necessary, these Dockerfiles—one per image—are
updated, and then a pull request[5] is made to the Docker Hub
official-images repo with an updated manifest of images and tags, after
which infrastructure run by Docker Hub builds, checks, and publishes the
images.
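
To make the manifest step concrete, here is a rough sketch of rendering one image entry. The field names (Tags, GitCommit, Directory) reflect my understanding of the official-images manifest format; the helper name, tag list, commit hash, and directory are placeholders for illustration, not values from a real release.

```python
def manifest_entry(tags, git_commit, directory):
    """Render one manifest stanza mapping a set of tags to a Dockerfile directory.

    Assumption: the field names follow the official-images manifest format as
    I understand it; check the docker-library documentation before relying on it.
    """
    return "\n".join([
        "Tags: " + ", ".join(tags),
        "GitCommit: " + git_commit,
        "Directory: " + directory,
    ])


# Placeholder values for illustration only.
entry = manifest_entry(
    tags=["1.10.0", "1.10", "latest"],
    git_commit="0" * 40,  # placeholder, not a real commit
    directory="1.10",
)
print(entry)
```

The pull request to the official-images repo would then amount to updating stanzas like this one so that they point at the new commit of the Dockerfiles repository.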

A question that has come up regularly is “Why are the Dockerfiles in a
separate repository from Flink?”, and there are a few different answers:

   - These Dockerfiles package only released, published distributions of
     Flink, and are therefore decoupled from a particular commit in the Flink
     repo
   - All the Dockerfiles for supported versions (and the corresponding Scala
     version variants) should be available in one Git tree for discoverability
   - The master branch of Flink is not the right place to encode what the
     supported versions are, or how to run previous versions of Flink; it
     should be concerned with the point-in-time of the code represented in
     that commit

But mostly, having a dedicated repo for Dockerfiles is a convention shared
by nearly every other “official” image on Docker Hub[6]. If the Flink
community wants to do this differently, we will need to work with the
Docker Hub maintainers to make sure we continue to work within their
guidelines and expectations.

While it seems intuitive that integrating these images into the Flink
release process is a good thing, I don’t believe it is strictly necessary,
since the images only package approved and signed Flink releases, and do
not themselves build Flink from source. However, there are some concrete
advantages:

   - Putting the Docker images on (almost) equal footing with Flink binary
     release artifacts will help the legitimacy of and user confidence in
     running Flink in containerized environments
   - By publishing release candidate (and possibly nightly) images, the
     release testing and automated testing processes could be improved
   - The delay between Flink releases and when the corresponding Docker
     images are available will be reduced


Considering all of this, I propose the following:

   - We move the Git repository containing the Dockerfiles from the
     docker-flink GitHub organization to Apache, placing it under control of
     the Flink PMC
   - We codify updating these Dockerfiles and notifying Docker Hub into the
     Flink release process
      - For release candidates, Dockerfiles should be added to a special
        directory which will be automatically built and pushed to the Apache
        Docker Hub organization[7], e.g. apache/flink-rc:1.10.0-rc1
      - Upon release, the appropriate “release” Dockerfiles are added (e.g.
        under the 1.10 directory) and release candidate Dockerfiles removed,
        and then a pull request opened on the docker-library/official-images
        repository
   - Optionally, we introduce “nightly” builds, with an automated process
     building and pushing images to the Apache Docker Hub organization, e.g.
     apache/flink-dev:1.10-SNAPSHOT
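
As a minimal sketch of the naming scheme proposed above, the following shows how release tooling might derive image names and directories from a Flink version. The function names are hypothetical; only the apache/flink-rc and apache/flink-dev names and the example tags come from the proposal itself.

```python
import re

APACHE_ORG = "apache"  # the Apache Docker Hub organization[7]


def minor_version(version):
    """Return the 'major.minor' prefix of a version string, e.g. '1.10.0' -> '1.10'."""
    match = re.match(r"^(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("not a Flink version: " + repr(version))
    return match.group(1) + "." + match.group(2)


def rc_image(version, rc_number):
    """Image name for a release candidate, following the example in the proposal."""
    return f"{APACHE_ORG}/flink-rc:{version}-rc{rc_number}"


def nightly_image(version):
    """Image name for a nightly snapshot build of the given release line."""
    return f"{APACHE_ORG}/flink-dev:{minor_version(version)}-SNAPSHOT"


def release_directory(version):
    """Directory holding the 'release' Dockerfiles for the given version."""
    return minor_version(version)


print(rc_image("1.10.0", 1))
print(nightly_image("1.10.0"))
print(release_directory("1.10.0"))
```

Whether nightly tags should track the release line (1.10-SNAPSHOT) or something finer-grained is an open question; the sketch simply mirrors the example given above.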


If we choose to move forward in this direction, there are some further
steps we could take to improve the experience of both developing and using
Flink with Docker (these are actually mostly orthogonal to the proposed
changes above, but I think this is a natural first step and should make the
following ideas easier to implement).

First, there are important differences between images meant for running
Flink and those meant for development: the former should strictly package
only released distributions of software and be as thin a layer as
possible over the software itself, while the latter can be used during
development and testing, and can easily be rebuilt from a “working copy” of
the software’s source code.

By standardizing on defining such “production” images in the docker-flink
repository and “development” image(s) in the Flink repository itself, it
becomes much clearer to developers and users which Dockerfile or image they
should use for a given purpose. To that end, we could introduce one or
more documented Maven goals or Make targets for building a Docker image
from the current source tree or a specific release (including unreleased or
unsupported versions).
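
For illustration, the core of such a build target could be as small as assembling a docker build invocation. This is only a sketch: the image tag and Dockerfile path below are hypothetical and do not refer to anything that exists in the Flink repository today.

```python
def docker_build_command(tag, dockerfile, context="."):
    """Assemble the argument list for building an image from a source tree.

    The command is only constructed here, not executed; a Make target or
    Maven goal could run it with the Flink checkout as the build context.
    """
    return ["docker", "build", "--tag", tag, "--file", dockerfile, context]


# Hypothetical tag and Dockerfile location, for illustration only.
command = docker_build_command(
    tag="flink-dev:local",
    dockerfile="docker/dev/Dockerfile",
)
print(" ".join(command))
```

Building a specific release instead of the working copy would presumably mean passing the version through to the Dockerfile, for example as a build argument, but that design is left open here.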

Additionally, there has been discussion among Flink contributors for some
time about the confusing state of Dockerfiles within the Flink repository,
each meant for a different way of running Flink. I’m not completely up to
speed about these different efforts, but we could possibly solve this by
either building additional “official” images with different entrypoints for
these various purposes, or by developing an improved entrypoint script that
conveniently supports all cases. I defer to Till Rohrmann, Konstantin
Knauf, or Stephan Ewen for further discussion on this point.

I apologize again for the wall of text, but if you made it this far, thank
you! These improvements have been a long time coming, and I hope we can
find a solution that serves the Flink and Docker communities well. Please
don’t hesitate to ask any questions.

--

Patrick Lucas

[1] https://hub.docker.com/_/flink

[2]
https://lists.apache.org/thread.html/c50297f8659aaa59d4f2ae327b69c4d46d1ab8ecc53138e659e4fe91%40%3Cdev.flink.apache.org%3E

[3] On page 2 at the time of writing:
https://hub.docker.com/search?q=&type=image&image_filter=official

[4] https://github.com/docker-flink/docker-flink

[5]
https://github.com/docker-library/official-images/pulls?q=is%3Apr+label%3Alibrary%2Fflink

[6] I looked at the 25 most popular “official” images (see [3]) as well as
“official” images of Apache software from the top 125; all use a dedicated
repo.

[7] https://hub.docker.com/u/apache
