Big +1 for

* official images in a separate repository
* unified images (session cluster vs application cluster)
* images for development in the Apache Flink repository

On Fri, Jan 10, 2020 at 7:14 PM Till Rohrmann <trohrm...@apache.org> wrote:

> Thanks a lot for starting this discussion, Patrick! I think it is a very
> good idea to move Flink's Docker images more under the jurisdiction of the
> Flink PMC and to make releasing new Docker images part of Flink's
> release process (not saying that we cannot release new Docker images
> independently of Flink's release cycle).
>
> One thing I have no strong opinion about is where to place the Dockerfiles
> (apache/flink.git vs. apache/flink-docker.git). I see the point that one
> wants to separate concerns (Flink code vs. Dockerfiles) and, hence, that
> having separate repositories might help with this objective. On the
> other hand, I don't have a lot of experience with Docker Hub or with how
> best to host Dockerfiles. Consequently, it would be helpful if others who
> have experience with this could share it with us.
>
> Cheers,
> Till
>
> On Sat, Dec 21, 2019 at 2:28 PM Hequn Cheng <chenghe...@gmail.com> wrote:
>
> > Hi Patrick,
> >
> > Thanks a lot for your continued work on the Docker images. That’s really
> > a great job! I have also benefited from it.
> >
> > Big +1 for integrating Docker image publication into the Flink release
> > process, since we can leverage the release process to ensure a more
> > legitimate Docker publication. We can also check and vote on it during
> > the release.
> >
> > I think the most important thing we need to discuss first is whether to
> > have a dedicated Git repo for the Dockerfiles.
> >
> > Although it is a convention shared by nearly every other “official” image
> > on Docker Hub to have a dedicated repo, I'm still not sure about it. Maybe
> > I have missed something important. From my point of view, it’s better to
> > have the Dockerfiles in the (main) Flink repo.
> >   - First, I think the Dockerfiles can be treated as part of the release,
> >     and it is natural to put the corresponding version of the Dockerfile
> >     in the corresponding Flink release.
> >   - Second, we can put the Dockerfiles in a path like
> >     flink/docker-flink/version/, where the version varies between releases.
> >     For example, for release 1.8.3, we would have a flink/docker-flink/1.8.3
> >     folder (or maybe flink/docker-flink/1.8). The Dockerfiles for all
> >     supported versions would then not be in one path, but they would still
> >     be in one Git tree under different refs.
> >   - Third, it seems that Docker Hub also supports specifying different
> >     refs. For the file [1], we can change the GitRepo link from
> >     https://github.com/docker-flink/docker-flink.git to
> >     https://github.com/apache/flink.git and add a GitFetch for each tag,
> >     e.g., GitFetch: refs/tags/release-1.8.3. There are some examples in
> >     the ubuntu file [2].
> >
> > If the above assumptions are right and there are no further obstacles, I'm
> > inclined to keep these Dockerfiles in the main Flink repo. That way, we
> > can reduce the number of repos and the management overhead.
> > What do you think?
> >
> > Best,
> > Hequn
> >
> > [1] https://github.com/docker-library/official-images/blob/master/library/flink
> > [2] https://github.com/docker-library/official-images/blob/master/library/ubuntu
> >
> >
> > On Fri, Dec 20, 2019 at 5:29 PM Yang Wang <danrtsey...@gmail.com> wrote:
> >
> > >  Big +1 for this effort.
> > >
> > > It is really exciting that we have started this great work. More and
> > > more companies are starting to use Flink in container environments
> > > (Docker, Kubernetes, Mesos, even Yarn-3.x), so it is very important
> > > that we have a unified process for building and releasing official
> > > images.
> > >
> > > The image building process in this proposal is really good, and I just
> > > have the following thoughts.
> > >
> > > >> Keep a dedicated repo for Dockerfiles to build official image
> > > I think this is a good approach, and it means we do not need to make
> > > unnecessary changes to the Flink repository.
> > >
> > > >> Integrate building image into the Flink release process
> > > It will bring a better experience for container environment users. In
> > > my opinion, a complete release includes the official image, and it
> > > should be verified to work well.
> > >
> > > >> Nightly building
> > > Would we support this for all release branches or just the master
> > > branch?
> > >
> > > >> Multiple purpose Flink images
> > > This is really needed. During development and testing, we need some
> > > profiling tools to help us investigate problems. Currently, we do not
> > > even have jstack/jmap in the image.
> > >
> > > >> Unify the Dockerfile in Flink repository
> > > In the current code base, we have flink-contrib/docker-flink/Dockerfile
> > > to build an image for a session cluster. However, it is not kept up to
> > > date. For a per-job cluster, flink-container/docker/Dockerfile can be
> > > used to build a Flink image with user artifacts. I think we need to
> > > unify them and provide a more powerful build script and entry point.
> > >
> > >
> > >
> > > Best,
> > > Yang
> > >
> > > On Thu, Dec 19, 2019 at 9:20 PM Patrick Lucas <patr...@ververica.com> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > >
> > > > I would like to start a discussion about integrating publication of
> > > > the Flink Docker images hosted on Docker Hub[1] more tightly with the
> > > > Flink release process. Apologies in advance for the long post.
> > > >
> > > > More than two and a half years ago (time flies!) we introduced
> > > > “official” Docker images for Flink[2]. Since then, the popularity of
> > > > running containerized applications in general and containerized Flink
> > > > in particular has continued to grow. Today, Flink is one of the most
> > > > popular “official” images on Docker Hub[3].
> > > >
> > > > A graph of Flink Docker image pulls over time:
> > > > https://gist.githubusercontent.com/patricklucas/7312444b1056ff82528e9a129e74e2b3/raw/9c8e139c1abc70b2b3fb34aadd7f44d46a540fe8/docker-flink-pulls.png
> > > >
> > > > “Official” is in quotation marks because while that’s how the Docker
> > > > community refers to top-level images on Docker Hub (i.e. those that
> > > > can be run with just <docker run foo>), they are not official in the
> > > > sense of being officially endorsed by the Flink PMC.
> > > >
> > > > I think it’s time for that to change.
> > > >
> > > > Currently, the Dockerfiles that produce these images are maintained
> > > > in a repository called docker-flink[4] in a separate,
> > > > community-managed GitHub organization of the same name. When a new
> > > > release of Flink is available, or when other changes are necessary,
> > > > these Dockerfiles (one per image) are updated, and then a pull
> > > > request[5] is made to the Docker Hub official-images repo with an
> > > > updated manifest of images and tags, after which infrastructure run
> > > > by Docker Hub builds, checks, and publishes the images.
> > > >
> > > > A question that has come up regularly is “Why are the Dockerfiles in
> > > > a separate repository from Flink?”, and there are a few different
> > > > answers:
> > > >
> > > >    - These Dockerfiles package only released, published distributions
> > > >      of Flink, and are therefore decoupled from a particular commit in
> > > >      the Flink repo
> > > >    - All the Dockerfiles for supported versions (and the corresponding
> > > >      Scala version variants) should be available in one Git tree for
> > > >      discoverability
> > > >    - The master branch of Flink is not the right place to encode what
> > > >      the supported versions are, or how to run previous versions of
> > > >      Flink; it should be concerned with the point-in-time of the code
> > > >      represented in that commit
> > > >
> > > >
> > > > But mostly, having a dedicated repo for Dockerfiles is a convention
> > > > shared by nearly every other “official” image on Docker Hub[6]. If the
> > > > Flink community wants to do this differently, we will need to work
> > > > with the Docker Hub maintainers to make sure we continue to work
> > > > within their guidelines and expectations.
> > > >
> > > > While it seems intuitive that integrating these images into the Flink
> > > > release process is a good thing, I don’t believe it is strictly
> > > > necessary, since the images only package approved and signed Flink
> > > > releases, and do not themselves build Flink from source. However,
> > > > there are some concrete advantages:
> > > >
> > > >    - Putting the Docker images on (almost) equal footing with Flink
> > > >      binary release artifacts will help the legitimacy of and user
> > > >      confidence in running Flink in containerized environments
> > > >    - By publishing release candidate (and possibly nightly) images,
> > > >      the release testing and automated testing processes could be
> > > >      improved
> > > >    - The delay between Flink releases and when the corresponding
> > > >      Docker images are available will be reduced
> > > >
> > > >
> > > > Considering all of this, I propose the following:
> > > >
> > > >    - We move the Git repository containing the Dockerfiles from the
> > > >      docker-flink GitHub organization to Apache, placing it under
> > > >      control of the Flink PMC
> > > >    - We codify updating these Dockerfiles and notifying Docker Hub
> > > >      into the Flink release process
> > > >       - For release candidates, Dockerfiles should be added to a
> > > >         special directory which will be automatically built and pushed
> > > >         to the Apache Docker Hub organization[7], e.g.
> > > >         apache/flink-rc:1.10.0-rc1
> > > >       - Upon release, the appropriate “release” Dockerfiles are added
> > > >         (e.g. under the 1.10 directory) and release candidate
> > > >         Dockerfiles removed, and then a pull request opened on the
> > > >         docker-library/official-images repository
> > > >    - Optionally, we introduce “nightly” builds, with an automated
> > > >      process building and pushing images to the Apache Docker Hub
> > > >      organization, e.g. apache/flink-dev:1.10-SNAPSHOT
> > > >
> > > >
> > > > If we choose to move forward in this direction, there are some
> > > > further steps we could take to improve the experience of both
> > > > developing and using Flink with Docker (these are actually mostly
> > > > orthogonal to the proposed changes above, but I think this is a
> > > > natural first step and should make the following ideas easier to
> > > > implement).
> > > >
> > > > First, there are important differences between images meant for
> > > > running Flink and those meant for development: the former should
> > > > strictly package only released distributions of software and be as
> > > > thin a layer as possible over the software itself, while the latter
> > > > can be used during development and testing, and can easily be rebuilt
> > > > from a “working copy” of the software’s source code.
> > > >
> > > > By standardizing on defining such “production” images in the
> > > > docker-flink repository and “development” image(s) in the Flink
> > > > repository itself, it becomes much clearer to developers and users
> > > > which Dockerfile or image they should use for a given purpose. To
> > > > that end, we could introduce one or more documented Maven goals or
> > > > Make targets for building a Docker image from the current source tree
> > > > or a specific release (including unreleased or unsupported versions).
> > > >
> > > > Additionally, there has been discussion among Flink contributors for
> > > > some time about the confusing state of Dockerfiles within the Flink
> > > > repository, each meant for a different way of running Flink. I’m not
> > > > completely up to speed about these different efforts, but we could
> > > > possibly solve this by either building additional “official” images
> > > > with different entrypoints for these various purposes, or by
> > > > developing an improved entrypoint script that conveniently supports
> > > > all cases. I defer to Till Rohrmann, Konstantin Knauf, or Stephan
> > > > Ewen for further discussion on this point.
> > > >
> > > > I apologize again for the wall of text, but if you made it this far,
> > > > thank you! These improvements have been a long time coming, and I
> > > > hope we can find a solution that serves the Flink and Docker
> > > > communities well. Please don’t hesitate to ask any questions.
> > > >
> > > > --
> > > >
> > > > Patrick Lucas
> > > >
> > > > [1] https://hub.docker.com/_/flink
> > > >
> > > > [2] https://lists.apache.org/thread.html/c50297f8659aaa59d4f2ae327b69c4d46d1ab8ecc53138e659e4fe91%40%3Cdev.flink.apache.org%3E
> > > >
> > > > [3] On page 2 at the time we went to press:
> > > > https://hub.docker.com/search?q=&type=image&image_filter=official
> > > >
> > > > [4] https://github.com/docker-flink/docker-flink
> > > >
> > > > [5] https://github.com/docker-library/official-images/pulls?q=is%3Apr+label%3Alibrary%2Fflink
> > > >
> > > > [6] I looked at the 25 most popular “official” images (see [3]) as
> > > > well as “official” images of Apache software from the top 125; all
> > > > use a dedicated repo
> > > >
> > > > [7] https://hub.docker.com/u/apache


-- 

Konstantin Knauf | Solutions Architect