+1 for Ufuk's proposal on how to proceed. I guess the immediate next step would be a VOTE on accepting the Dockerfiles and on where to store them.

Cheers,
Till

On Wed, Jan 22, 2020 at 4:05 PM Fabian Hueske <fhue...@gmail.com> wrote:

> Hi everyone,
>
> First of all, thank you very much Patrick for maintaining and publishing the Flink Docker images so far and for starting this discussion!
>
> I'm in favor of adding the Dockerfiles in a separate repository and not in the main Flink repository.
> I also think that it makes sense to first focus on the contribution of the Dockerfiles and consolidation of existing Dockerfiles before discussing special cases for development and testing.
>
> In addition to the Dockerfiles in the Flink main repo, there is also one in the flink-playgrounds repo [1] to build a customized Docker image for the playground.
>
> Besides building and publishing "official" Flink images via DockerHub, there is also the option to let ASF Infra build Docker images and publish them under https://hub.docker.com/u/apache.
> These images would not be "official" DockerHub images anymore, but available under the Apache DockerHub user.
> However, I think it would be a good idea to keep the current setup for the main Flink images (those that depend on Flink releases) for better visibility and to not confuse our users.
> We might want to publish less critical images (playground images, dev images, nightly builds, etc.) via Infra under the Apache DockerHub user.
>
> Best,
> Fabian
>
> On Mon, Jan 13, 2020 at 11:38 AM Ufuk Celebi <u...@apache.org> wrote:
>
> > Hey all,
> >
> > first of all a big thank you for driving many of the Docker image releases in the last two years.
> >
> > *(1) Moving docker-flink/docker-flink to apache/docker-flink*
> >
> > +1 to do this as you outlined. I would propose to aim for a first integration with the 1.10 release without major changes to the existing Dockerfiles. The work items would be to move the Dockerfiles and update the release process documentation so everyone is on the same page.
> >
> > *(2) Consolidate Dockerfiles in apache/flink*
> >
> > +1 to start the process for this. I think this requires a bit of thinking about what the requirements are and which problems we want to solve. From skimming the existing Dockerfiles, it seems to me that the Docker image builds fulfil quite a few different tasks. We have a script that can bundle Hadoop, can copy an existing Flink distribution, can include user jars, etc. The scope of this is quite broad and would warrant a design document/a FLIP.
> >
> > I would move the questions about nightly builds, using a different base image or having image variants with debug tooling to after (1) and (2) or make it part of (2).
> >
> > *(3) Next steps*
> >
> > If there are no objections, I would propose to tackle (1) and (2) separately and to continue as follows:
> >
> > (i) Create tickets for (1) and aim to align with the 1.10 release timeline (ideally before the first RC). Since this does not touch any code in the release branches, I think this would not be affected by the feature freeze. The major work item would be to update the docs and potential refactorings of the existing process and Dockerfiles. I can help with the process to create a new repo.
> >
> > (ii) Create a first draft for consolidation of existing Dockerfiles. After this proposal is done, I would propose to bring it up for a separate discussion on the ML.
> >
> > What do you think? @Patrick: would you be interested in working on both (1) + (2) or did you mainly have (1) in mind?
> >
> > Best,
> >
> > Ufuk
> >
> > On Sun, Jan 12, 2020 at 8:30 PM Konstantin Knauf <konstan...@ververica.com> wrote:
> >
> > > Big +1 for
> > >
> > > * official images in a separate repository
> > > * unified images (session cluster vs application cluster)
> > > * images for development in Apache flink repository
> > >
> > > On Fri, Jan 10, 2020 at 7:14 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > >
> > > > Thanks a lot for starting this discussion Patrick! I think it is a very good idea to move Flink's docker image more under the jurisdiction of the Flink PMC and to make releasing new docker images part of Flink's release process (not saying that we cannot release new docker images independent of Flink's release cycle).
> > > >
> > > > One thing I have no strong opinion about is where to place the Dockerfiles (apache/flink.git vs. apache/flink-docker.git). I see the point that one wants to separate concerns (Flink code vs. Dockerfiles) and, hence, that having separate repositories might help with this objective. But on the other hand, I don't have a lot of experience with Docker Hub and how to best host Dockerfiles. Consequently, it would be helpful if others who have gained some experience could share it with us.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Sat, Dec 21, 2019 at 2:28 PM Hequn Cheng <chenghe...@gmail.com> wrote:
> > > >
> > > > > Hi Patrick,
> > > > >
> > > > > Thanks a lot for your continued work on the Docker images. That’s really a great job! And I have also benefited from it.
> > > > >
> > > > > Big +1 for integrating Docker image publication into the Flink release process, since we can leverage the Flink release process to ensure a more legitimate Docker publication. We can also check and vote on it during the release.
> > > > >
> > > > > I think the most important thing we need to discuss first is whether to have a dedicated git repo for the Dockerfiles.
> > > > >
> > > > > Although it is a convention shared by nearly every other “official” image on Docker Hub to have a dedicated repo, I'm still not sure about it. Maybe I have missed something important. From my point of view, I think it’s better to have the Dockerfiles in the (main) Flink repo.
> > > > > - First, I think the Dockerfiles can be treated as part of the release. And it is also natural to put the corresponding version of the Dockerfile in the corresponding Flink release.
> > > > > - Second, we can put the Dockerfiles in a path like flink/docker-flink/version/, where the version varies across releases. For example, for release 1.8.3, we have a flink/docker-flink/1.8.3 folder (or maybe flink/docker-flink/1.8). Even though the Dockerfiles for all supported versions are not in one path, they are still in one Git tree with different refs.
> > > > > - Third, it seems that Docker Hub also supports specifying different refs. For the file [1], we can change the GitRepo link from https://github.com/docker-flink/docker-flink.git to https://github.com/apache/flink.git and add a GitFetch for each tag, e.g., GitFetch: refs/tags/release-1.8.3. There are some examples in the file for ubuntu [2].
> > > > >
> > > > > If the above assumptions are right and there are no more obstacles, I'm inclined to have these Dockerfiles in the main Flink repo. In this case, we can reduce the number of repos and reduce the management overhead.
> > > > > What do you think?
> > > > >
> > > > > Best,
> > > > > Hequn
> > > > >
> > > > > [1] https://github.com/docker-library/official-images/blob/master/library/flink
> > > > > [2] https://github.com/docker-library/official-images/blob/master/library/ubuntu
> > > > >
> > > > > On Fri, Dec 20, 2019 at 5:29 PM Yang Wang <danrtsey...@gmail.com> wrote:
> > > > >
> > > > > > Big +1 for this effort.
> > > > > >
> > > > > > It is really exciting that we have started this great work. More and more companies are starting to use Flink in container environments (Docker, Kubernetes, Mesos, even Yarn-3.x). So it is very important that we have a unified official image building and releasing process.
> > > > > >
> > > > > > The image building process in this proposal is really good, and I just have the following thoughts.
> > > > > >
> > > > > > >> Keep a dedicated repo for Dockerfiles to build official image
> > > > > > I think this is a good way and we do not need to make unnecessary changes to the Flink repository.
> > > > > >
> > > > > > >> Integrate building image into the Flink release process
> > > > > > It will bring a better experience for container environment users. In my opinion, a complete release includes the official image. It should be verified to work well.
> > > > > >
> > > > > > >> Nightly building
> > > > > > Do we support this for all the release branches or just the master branch?
> > > > > >
> > > > > > >> Multiple purpose Flink images
> > > > > > It is really needed. In the development and testing process, we need some profiling tools to help us investigate problems. Currently, we do not even have jstack/jmap in the image.
> > > > > >
> > > > > > >> Unify the Dockerfile in Flink repository
> > > > > > In the current code base, we have flink-contrib/docker-flink/Dockerfile to build an image for a session cluster. However, it is not kept up to date. For a per-job cluster, flink-container/docker/Dockerfile could be used to build a Flink image with user artifacts. I think we need to unify them and provide a more powerful build script and entry point.
> > > > > >
> > > > > > Best,
> > > > > > Yang
> > > > > >
> > > > > > On Thu, Dec 19, 2019 at 9:20 PM Patrick Lucas <patr...@ververica.com> wrote:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > I would like to start a discussion about integrating publication of the Flink Docker images hosted on Docker Hub[1] more tightly with the Flink release process. Apologies in advance for the long post.
> > > > > > >
> > > > > > > More than two and a half years ago (time flies!) we introduced “official” Docker images for Flink[2]. Since then, the popularity of running containerized applications in general and containerized Flink in particular has continued to grow. Today, Flink is one of the most popular “official” images on Docker Hub[3].
> > > > > > >
> > > > > > > A graph of Flink Docker image pulls over time:
> > > > > > >
> > > > > > > https://gist.githubusercontent.com/patricklucas/7312444b1056ff82528e9a129e74e2b3/raw/9c8e139c1abc70b2b3fb34aadd7f44d46a540fe8/docker-flink-pulls.png
> > > > > > >
> > > > > > > “Official” is in quotation marks because while that’s how the Docker community refers to top-level images on Docker Hub (i.e. those that can be run with just <docker run foo>), they are not official in the sense of being officially endorsed by the Flink PMC.
> > > > > > >
> > > > > > > I think it’s time for that to change.
> > > > > > >
> > > > > > > Currently, the Dockerfiles that produce these images are maintained in a repository called docker-flink[4] in a separate, community-managed GitHub organization of the same name. When a new release of Flink is available, or when other changes are necessary, these Dockerfiles (one per image) are updated, and then a pull request[5] is made to the Docker Hub official-images repo with an updated manifest of images and tags, after which infrastructure run by Docker Hub builds, checks, and publishes the images.
> > > > > > >
> > > > > > > A question that has come up regularly is “Why are the Dockerfiles in a separate repository from Flink?”, and there are a few different answers:
> > > > > > >
> > > > > > >    - These Dockerfiles package only released, published distributions of Flink, and are therefore decoupled from a particular commit in the Flink repo
> > > > > > >    - All the Dockerfiles for supported versions (and the corresponding Scala version variants) should be available in one Git tree for discoverability
> > > > > > >    - The master branch of Flink is not the right place to encode what the supported versions are, or how to run previous versions of Flink; it should be concerned with the point-in-time of the code represented in that commit
> > > > > > >
> > > > > > > But mostly, having a dedicated repo for Dockerfiles is a convention shared by nearly every other “official” image on Docker Hub[6]. If the Flink community wants to do this differently, we will need to work with the Docker Hub maintainers to make sure we continue to work within their guidelines and expectations.
> > > > > > >
> > > > > > > While it seems intuitive that integrating these images into the Flink release process is a good thing, I don’t believe it is strictly necessary, since the images only package approved and signed Flink releases, and do not themselves build Flink from source. However, there are some concrete advantages:
> > > > > > >
> > > > > > >    - Putting the Docker images on (almost) equal footing with Flink binary release artifacts will help the legitimacy of and user confidence in running Flink in containerized environments
> > > > > > >    - By publishing release candidate (and possibly nightly) images, the release testing and automated testing processes could be improved
> > > > > > >    - The delay between Flink releases and when the corresponding Docker images are available will be reduced
> > > > > > >
> > > > > > > Considering all of this, I propose the following:
> > > > > > >
> > > > > > >    - We move the Git repository containing the Dockerfiles from the docker-flink GitHub organization to Apache, placing it under control of the Flink PMC
> > > > > > >    - We codify updating these Dockerfiles and notifying Docker Hub into the Flink release process
> > > > > > >       - For release candidates, Dockerfiles should be added to a special directory which will be automatically built and pushed to the Apache Docker Hub organization[7], e.g. apache/flink-rc:1.10.0-rc1
> > > > > > >       - Upon release, the appropriate “release” Dockerfiles are added (e.g. under the 1.10 directory) and release candidate Dockerfiles removed, and then a pull request opened on the docker-library/official-images repository
> > > > > > >    - Optionally, we introduce “nightly” builds, with an automated process building and pushing images to the Apache Docker Hub organization, e.g. apache/flink-dev:1.10-SNAPSHOT
> > > > > > >
> > > > > > > If we choose to move forward in this direction, there are some further steps we could take to improve the experience of both developing and using Flink with Docker (these are actually mostly orthogonal to the proposed changes above, but I think this is a natural first step and should make the following ideas easier to implement).
> > > > > > >
> > > > > > > First, there are important differences between images meant for running Flink and those meant for development: the former should strictly package only released distributions of software and be as thin of a layer as possible over the software itself, while the latter can be used during development and testing, and can easily be rebuilt from a “working copy” of the software’s source code.
> > > > > > >
> > > > > > > By standardizing on defining such “production” images in the docker-flink repository and “development” image(s) in the Flink repository itself, it becomes much clearer to developers and users which Dockerfile or image they should use for a given purpose. To that end, we could introduce one or more documented Maven goals or Make targets for building a Docker image from the current source tree or a specific release (including unreleased or unsupported versions).
> > > > > > >
> > > > > > > Additionally, there has been discussion among Flink contributors for some time about the confusing state of Dockerfiles within the Flink repository, each meant for a different way of running Flink. I’m not completely up to speed about these different efforts, but we could possibly solve this by either building additional “official” images with different entrypoints for these various purposes, or by developing an improved entrypoint script that conveniently supports all cases. I defer to Till Rohrmann, Konstantin Knauf, or Stephan Ewen for further discussion on this point.
> > > > > > >
> > > > > > > I apologize again for the wall of text, but if you made it this far, thank you! These improvements have been a long time coming, and I hope we can find a solution that serves the Flink and Docker communities well. Please don’t hesitate to ask any questions.
> > > > > > >
> > > > > > > --
> > > > > > > Patrick Lucas
> > > > > > >
> > > > > > > [1] https://hub.docker.com/_/flink
> > > > > > > [2] https://lists.apache.org/thread.html/c50297f8659aaa59d4f2ae327b69c4d46d1ab8ecc53138e659e4fe91%40%3Cdev.flink.apache.org%3E
> > > > > > > [3] On page 2 at the time we went to press: https://hub.docker.com/search?q=&type=image&image_filter=official
> > > > > > > [4] https://github.com/docker-flink/docker-flink
> > > > > > > [5] https://github.com/docker-library/official-images/pulls?q=is%3Apr+label%3Alibrary%2Fflink
> > > > > > > [6] I looked at the 25 most popular “official” images (see [3]) as well as “official” images of Apache software from the top 125; all use a dedicated repo
> > > > > > > [7] https://hub.docker.com/u/apache
> > >
> > > --
> > > Konstantin Knauf | Solutions Architect