Big +1 for
* official images in a separate repository
* unified images (session cluster vs. application cluster)
* images for development in the Apache Flink repository
On Fri, Jan 10, 2020 at 7:14 PM Till Rohrmann <trohrm...@apache.org> wrote:
> Thanks a lot for starting this discussion, Patrick! I think it is a very good idea to move Flink's Docker image more under the jurisdiction of the Flink PMC and to make releasing new Docker images part of Flink's release process (not saying that we cannot release new Docker images independently of Flink's release cycle).
>
> One thing I have no strong opinion about is where to place the Dockerfiles (apache/flink.git vs. apache/flink-docker.git). I see the point that one wants to separate concerns (Flink code vs. Dockerfiles) and, hence, that having separate repositories might help with this objective. On the other hand, I don't have a lot of experience with Docker Hub and how best to host Dockerfiles. Consequently, it would be helpful if others who have some experience could share it with us.
>
> Cheers,
> Till
>
> On Sat, Dec 21, 2019 at 2:28 PM Hequn Cheng <chenghe...@gmail.com> wrote:
>
> > Hi Patrick,
> >
> > Thanks a lot for your continued work on the Docker images. That’s really a great job! And I have also benefited from it.
> >
> > Big +1 for integrating Docker image publication into the Flink release process, since we can leverage the Flink release process to make the Docker publication more legitimate. We can also check and vote on it during the release.
> >
> > I think the most important thing we need to discuss first is whether to have a dedicated Git repo for the Dockerfiles.
> >
> > Although having a dedicated repo is a convention shared by nearly every other “official” image on Docker Hub, I'm still not sure about it. Maybe I have missed something important. From my point of view, it’s better to have the Dockerfiles in the main Flink repo.
> > - First, I think the Dockerfiles can be treated as part of the release.
> >   And it is also natural to put the corresponding version of the Dockerfile in the corresponding Flink release.
> > - Second, we can put the Dockerfiles in a path like flink/docker-flink/version/, where the version varies across releases. For example, for release 1.8.3, we would have a flink/docker-flink/1.8.3 folder (or maybe flink/docker-flink/1.8). Even though the Dockerfiles for all supported versions are not in one path, they are still in one Git tree, under different refs.
> > - Third, it seems Docker Hub also supports specifying different refs. In the file[1], we can change the GitRepo link from https://github.com/docker-flink/docker-flink.git to https://github.com/apache/flink.git and add a GitFetch for each tag, e.g., GitFetch: refs/tags/release-1.8.3. There are some examples in the ubuntu file[2].
> >
> > If the above assumptions are right and there are no more obstacles, I'm inclined to keep these Dockerfiles in the main Flink repo. That way, we can reduce the number of repos and the management overhead.
> > What do you think?
> >
> > Best,
> > Hequn
> >
> > [1] https://github.com/docker-library/official-images/blob/master/library/flink
> > [2] https://github.com/docker-library/official-images/blob/master/library/ubuntu
> >
> > On Fri, Dec 20, 2019 at 5:29 PM Yang Wang <danrtsey...@gmail.com> wrote:
> >
> > > Big +1 for this effort.
> > >
> > > It is really exciting that we have started this great work. More and more companies are starting to use Flink in container environments (Docker, Kubernetes, Mesos, even YARN 3.x), so it is very important that we have a unified process for building and releasing official images.
> > >
> > > The image building process in this proposal is really good, and I just have the following thoughts.
> > >
> > > >> Keep a dedicated repo for Dockerfiles to build official image
> > > I think this is a good approach, and it means we do not need to make unnecessary changes to the Flink repository.
> > >
> > > >> Integrate building image into the Flink release process
> > > It will bring a better experience for users in container environments. In my opinion, a complete release includes the official image, and it should be verified to work well.
> > >
> > > >> Nightly building
> > > Do we support this for all release branches or just the master branch?
> > >
> > > >> Multiple purpose Flink images
> > > This is really needed. During development and testing, we need some profiling tools to help us investigate problems. Currently, we do not even have jstack/jmap in the image.
> > >
> > > >> Unify the Dockerfile in Flink repository
> > > In the current code base, we have flink-contrib/docker-flink/Dockerfile to build an image for a session cluster. However, it is not kept up to date. For a per-job cluster, flink-container/docker/Dockerfile can be used to build a Flink image with user artifacts. I think we need to unify them and provide a more powerful build script and entry point.
> > >
> > > Best,
> > > Yang
> > >
> > > On Thu, Dec 19, 2019 at 9:20 PM Patrick Lucas <patr...@ververica.com> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I would like to start a discussion about integrating publication of the Flink Docker images hosted on Docker Hub[1] more tightly with the Flink release process. Apologies in advance for the long post.
> > > >
> > > > More than two and a half years ago (time flies!) we introduced “official” Docker images for Flink[2]. Since then, the popularity of running containerized applications in general and containerized Flink in particular has continued to grow.
> > > > Today, Flink is one of the most popular “official” images on Docker Hub[3].
> > > >
> > > > A graph of Flink Docker image pulls over time:
> > > > https://gist.githubusercontent.com/patricklucas/7312444b1056ff82528e9a129e74e2b3/raw/9c8e139c1abc70b2b3fb34aadd7f44d46a540fe8/docker-flink-pulls.png
> > > >
> > > > “Official” is in quotation marks because, while that’s how the Docker community refers to top-level images on Docker Hub (i.e., those that can be run with just <docker run foo>), they are not official in the sense of being officially endorsed by the Flink PMC.
> > > >
> > > > I think it’s time for that to change.
> > > >
> > > > Currently, the Dockerfiles that produce these images are maintained in a repository called docker-flink[4] in a separate, community-managed GitHub organization of the same name. When a new release of Flink is available, or when other changes are necessary, these Dockerfiles (one per image) are updated, and then a pull request[5] is made to the Docker Hub official-images repo with an updated manifest of images and tags, after which infrastructure run by Docker Hub builds, checks, and publishes the images.
> > > >
> > > > A question that has come up regularly is “Why are the Dockerfiles in a separate repository from Flink?”, and there are a few different answers:
> > > >
> > > > - These Dockerfiles package only released, published distributions of Flink, and are therefore decoupled from a particular commit in the Flink repo
> > > > - All the Dockerfiles for supported versions (and the corresponding Scala version variants) should be available in one Git tree for discoverability
> > > > - The master branch of Flink is not the right place to encode what the supported versions are, or how to run previous versions of Flink; it should be concerned with the point-in-time of the code represented in that commit
> > > >
> > > > But mostly, having a dedicated repo for Dockerfiles is a convention shared by nearly every other “official” image on Docker Hub[6]. If the Flink community wants to do this differently, we will need to work with the Docker Hub maintainers to make sure we continue to work within their guidelines and expectations.
> > > >
> > > > While it seems intuitive that integrating these images into the Flink release process is a good thing, I don’t believe it is strictly necessary, since the images only package approved and signed Flink releases, and do not themselves build Flink from source.
> > > > However, there are some concrete advantages:
> > > >
> > > > - Putting the Docker images on (almost) equal footing with Flink binary release artifacts will help the legitimacy of and user confidence in running Flink in containerized environments
> > > > - By publishing release candidate (and possibly nightly) images, the release testing and automated testing processes could be improved
> > > > - The delay between Flink releases and when the corresponding Docker images are available will be reduced
> > > >
> > > > Considering all of this, I propose the following:
> > > >
> > > > - We move the Git repository containing the Dockerfiles from the docker-flink GitHub organization to Apache, placing it under the control of the Flink PMC
> > > > - We codify updating these Dockerfiles and notifying Docker Hub into the Flink release process
> > > >   - For release candidates, Dockerfiles should be added to a special directory, which will be automatically built and pushed to the Apache Docker Hub organization[7], e.g. apache/flink-rc:1.10.0-rc1
> > > >   - Upon release, the appropriate “release” Dockerfiles are added (e.g. under the 1.10 directory) and the release candidate Dockerfiles removed, and then a pull request is opened on the docker-library/official-images repository
> > > > - Optionally, we introduce “nightly” builds, with an automated process building and pushing images to the Apache Docker Hub organization, e.g.
> > > >   apache/flink-dev:1.10-SNAPSHOT
> > > >
> > > > If we choose to move forward in this direction, there are some further steps we could take to improve the experience of both developing and using Flink with Docker (these are actually mostly orthogonal to the proposed changes above, but I think this is a natural first step and should make the following ideas easier to implement).
> > > >
> > > > First, there are important differences between images meant for running Flink and those meant for development: the former should strictly package only released distributions of software and be as thin a layer as possible over the software itself, while the latter can be used during development and testing, and can easily be rebuilt from a “working copy” of the software’s source code.
> > > >
> > > > By standardizing on defining such “production” images in the docker-flink repository and “development” image(s) in the Flink repository itself, it becomes much clearer to developers and users which Dockerfile or image they should use for a given purpose. To that end, we could introduce one or more documented Maven goals or Make targets for building a Docker image from the current source tree or a specific release (including unreleased or unsupported versions).
> > > >
> > > > Additionally, there has been discussion among Flink contributors for some time about the confusing state of Dockerfiles within the Flink repository, each meant for a different way of running Flink.
> > > > I’m not completely up to speed on these different efforts, but we could possibly solve this either by building additional “official” images with different entrypoints for these various purposes, or by developing an improved entrypoint script that conveniently supports all cases. I defer to Till Rohrmann, Konstantin Knauf, or Stephan Ewen for further discussion on this point.
> > > >
> > > > I apologize again for the wall of text, but if you made it this far, thank you! These improvements have been a long time coming, and I hope we can find a solution that serves the Flink and Docker communities well. Please don’t hesitate to ask any questions.
> > > >
> > > > --
> > > >
> > > > Patrick Lucas
> > > >
> > > > [1] https://hub.docker.com/_/flink
> > > > [2] https://lists.apache.org/thread.html/c50297f8659aaa59d4f2ae327b69c4d46d1ab8ecc53138e659e4fe91%40%3Cdev.flink.apache.org%3E
> > > > [3] On page 2 at the time we went to press: https://hub.docker.com/search?q=&type=image&image_filter=official
> > > > [4] https://github.com/docker-flink/docker-flink
> > > > [5] https://github.com/docker-library/official-images/pulls?q=is%3Apr+label%3Alibrary%2Fflink
> > > > [6] I looked at the 25 most popular “official” images (see [3]) as well as “official” images of Apache software from the top 125; all use a dedicated repo
> > > > [7] https://hub.docker.com/u/apache

--
Konstantin Knauf | Solutions Architect
+49 160 91394525

Follow us @VervericaData Ververica <https://www.ververica.com/>

--
Join Flink Forward <https://flink-forward.org/> - The Apache Flink Conference
Stream Processing | Event Driven | Real Time

--
Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht
Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Tony) Cheng
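To make Hequn's third point concrete: if the GitRepo in the official-images manifest were switched to https://github.com/apache/flink.git, each image entry would name the release tag to fetch and the directory holding its Dockerfile. The Python sketch below renders such a manifest. It rests on several assumptions. Following Hequn, and without verifying it here, it assumes the manifest accepts a refs/tags/... value in GitFetch. It uses the docker-flink/<version> layout from his message, which is hypothetical. It also includes a GitCommit per entry, because the real manifest pins one. The commit SHA, the maintainer line, and the helper names (ImageEntry, entry_for_release, render_manifest) are placeholders invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImageEntry:
    """One image stanza in an official-images style manifest (sketch)."""
    tags: List[str]   # Docker tags, e.g. ["1.8.3", "1.8"]
    git_fetch: str    # ref to fetch, e.g. "refs/tags/release-1.8.3"
    git_commit: str   # commit the ref points to (placeholder in the example)
    directory: str    # Dockerfile location inside the repository


def entry_for_release(version: str, extra_tags: List[str], commit: str) -> ImageEntry:
    """Build an entry from a Flink release tag (release-X.Y.Z) and the
    hypothetical docker-flink/<version> directory layout."""
    return ImageEntry(
        tags=[version] + extra_tags,
        git_fetch=f"refs/tags/release-{version}",
        git_commit=commit,
        directory=f"docker-flink/{version}",
    )


def render_manifest(maintainers: List[str], git_repo: str,
                    entries: List[ImageEntry]) -> str:
    """Render global fields, then one blank-line-separated stanza per entry."""
    lines = [
        "Maintainers: " + ", ".join(maintainers),
        "GitRepo: " + git_repo,  # one repository shared by every entry
    ]
    for entry in entries:
        lines += [
            "",  # a blank line starts a new stanza
            "Tags: " + ", ".join(entry.tags),
            "GitFetch: " + entry.git_fetch,
            "GitCommit: " + entry.git_commit,
            "Directory: " + entry.directory,
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_manifest(
        maintainers=["Jane Doe <jane@example.com> (@janedoe)"],  # hypothetical
        git_repo="https://github.com/apache/flink.git",
        entries=[entry_for_release("1.8.3", ["1.8"], "<sha-of-release-1.8.3>")],
    ), end="")
```

In this sketch GitRepo is a single global field, and only GitFetch and Directory vary per entry. That matches Hequn's argument that all supported versions would live in one Git tree under different refs.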