+1 for Ufuk's proposal on how to proceed. I guess the immediate next step
would be a VOTE on accepting the Dockerfiles and on where to store them.

Cheers,
Till

On Wed, Jan 22, 2020 at 4:05 PM Fabian Hueske <fhue...@gmail.com> wrote:

> Hi everyone,
>
> First of all, thank you very much Patrick for maintaining and publishing
> the Flink Docker images so far and for starting this discussion!
>
> I'm in favor of adding the Dockerfiles in a separate repository and not in
> the main Flink repository.
> I also think that it makes sense to first focus on the contribution of the
> Dockerfiles and consolidation of existing Dockerfiles before discussing
> special cases for development and testing.
>
> In addition to the Dockerfiles in the Flink main repo, there is also one in
> the flink-playgrounds repo [1] to build a customized Docker image for the
> playground.
>
> Besides building and publishing "official" Flink images via DockerHub,
> there is also the option to let ASF Infra build Docker images and publish
> them under https://hub.docker.com/u/apache.
> These images would not be "official" DockerHub images anymore, but
> available under the Apache DockerHub user.
> However, I think it would be a good idea to keep the current setup for the
> main Flink images (those that depend on Flink releases) for better
> visibility and to not confuse our users.
> We might want to publish less critical images (playground images, dev
> images, nightly builds, etc) via Infra under the Apache DockerHub user.
>
> Best,
> Fabian
>
> On Mon, Jan 13, 2020 at 11:38 AM Ufuk Celebi <u...@apache.org> wrote:
>
> > Hey all,
> >
> > first of all, a big thank you for driving many of the Docker image
> > releases in the last two years.
> >
> > *(1) Moving docker-flink/docker-flink to apache/docker-flink*
> >
> > +1 to do this as you outlined. I would propose to aim for a first
> > integration with the 1.10 release without major changes to the existing
> > Dockerfiles. The work items would be to move the Dockerfiles and update
> > the release process documentation so everyone is on the same page.
> >
> > *(2) Consolidate Dockerfiles in apache/flink*
> >
> > +1 to start the process for this. I think this requires a bit of thinking
> > about what the requirements are and which problems we want to solve. From
> > skimming the existing Dockerfiles, it seems to me that the Docker image
> > builds fulfil quite a few different tasks. We have a script that can
> > bundle Hadoop, can copy an existing Flink distribution, can include user
> > jars, etc. The scope of this is quite broad and would warrant a design
> > document/a FLIP.
> >
> > I would move the questions about nightly builds, using a different base
> > image or having image variants with debug tooling to after (1) and (2) or
> > make it part of (2).
> >
> > *(3) Next steps*
> >
> > If there are no objections, I would propose to tackle (1) and (2)
> > separately and to continue as follows:
> >
> > (i) Create tickets for (1) and aim to align with the 1.10 release timeline
> > (ideally before the first RC). Since this does not touch any code in the
> > release branches, I think this would not be affected by the feature
> > freeze. The major work items would be to update the docs and potentially
> > refactor the existing process and Dockerfiles. I can help with the process
> > to create a new repo.
> >
> > (ii) Create first draft for consolidation of existing Dockerfiles. After
> > this proposal is done, I would propose to bring it up for a separate
> > discussion on the ML.
> >
> >
> > What do you think? @Patrick: would you be interested in working on both
> > (1) + (2) or did you mainly have (1) in mind?
> >
> > Best,
> >
> > Ufuk
> >
> > On Sun, Jan 12, 2020 at 8:30 PM Konstantin Knauf
> > <konstan...@ververica.com> wrote:
> >
> > > Big +1 for
> > >
> > > * official images in a separate repository
> > > * unified images (session cluster vs application cluster)
> > > * images for development in the Apache Flink repository
> > >
> > > On Fri, Jan 10, 2020 at 7:14 PM Till Rohrmann <trohrm...@apache.org>
> > > wrote:
> > >
> > > > Thanks a lot for starting this discussion, Patrick! I think it is a
> > > > very good idea to move Flink's Docker image more under the
> > > > jurisdiction of the Flink PMC and to make releasing new Docker images
> > > > part of Flink's release process (not saying that we cannot release new
> > > > Docker images independently of Flink's release cycle).
> > > >
> > > > One thing I have no strong opinion about is where to place the
> > > > Dockerfiles (apache/flink.git vs. apache/flink-docker.git). I see the
> > > > point that one wants to separate concerns (Flink code vs. Dockerfiles)
> > > > and, hence, that having separate repositories might help with this
> > > > objective. But on the other hand, I don't have a lot of experience
> > > > with Docker Hub and how to best host Dockerfiles. Consequently, it
> > > > would be helpful if others who have experience with this could share
> > > > it with us.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Sat, Dec 21, 2019 at 2:28 PM Hequn Cheng <chenghe...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Patrick,
> > > > >
> > > > > Thanks a lot for your continued work on the Docker images. That’s
> > > > > really, really a great job! And I have also benefited from it.
> > > > >
> > > > > Big +1 for integrating Docker image publication into the Flink
> > > > > release process, since we can leverage the release process to give
> > > > > the Docker publication more legitimacy. We can also check and vote
> > > > > on it during the release.
> > > > >
> > > > > I think the most important thing we need to discuss first is whether
> > > > > to have a dedicated Git repo for the Dockerfiles.
> > > > >
> > > > > Although it is a convention shared by nearly every other “official”
> > > > > image on Docker Hub to have a dedicated repo, I'm still not sure
> > > > > about it. Maybe I have missed something important. From my point of
> > > > > view, I think it’s better to have the Dockerfiles in the (main) Flink
> > > > > repo.
> > > > >   - First, I think the Dockerfiles can be treated as part of the
> > > > >     release. It is also natural to put the corresponding version of
> > > > >     the Dockerfile in the corresponding Flink release.
> > > > >   - Second, we can put the Dockerfiles in a path like
> > > > >     flink/docker-flink/version/, where the version varies between
> > > > >     releases. For example, for release 1.8.3, we have a
> > > > >     flink/docker-flink/1.8.3 folder (or maybe flink/docker-flink/1.8).
> > > > >     Even though the Dockerfiles for all supported versions are not in
> > > > >     one path, they are still in one Git tree under different refs.
> > > > >   - Third, it seems that Docker Hub also supports specifying different
> > > > >     refs. For the file [1], we can change the GitRepo link from
> > > > >     https://github.com/docker-flink/docker-flink.git to
> > > > >     https://github.com/apache/flink.git and add a GitFetch for each
> > > > >     tag, e.g., GitFetch: refs/tags/release-1.8.3. There are some
> > > > >     examples in the file for ubuntu [2].
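
A minimal sketch of the manifest change described in the third point, written in Python purely for illustration. GitRepo and GitFetch are the fields named above; Tags, GitCommit and Directory are other fields that appear in the official-images library files. The helper names, the Directory layout and the all-zero commit hash are hypothetical placeholders, not real values.

```python
from typing import Iterable, Tuple

FLINK_REPO = "https://github.com/apache/flink.git"


def manifest_entry(version: str, commit: str) -> str:
    """Render one per-release stanza as 'Key: value' lines."""
    fields = [
        ("Tags", version),
        # One GitFetch per release tag, as suggested in the thread.
        ("GitFetch", f"refs/tags/release-{version}"),
        # Placeholder: a real entry needs the actual commit hash.
        ("GitCommit", commit),
        # Hypothetical path inside the repo: docker-flink/<version>.
        ("Directory", f"docker-flink/{version}"),
    ]
    return "\n".join(f"{key}: {value}" for key, value in fields)


def manifest(releases: Iterable[Tuple[str, str]], repo: str = FLINK_REPO) -> str:
    """Join a global GitRepo header with one stanza per (version, commit)."""
    stanzas = [f"GitRepo: {repo}"]
    stanzas += [manifest_entry(version, commit) for version, commit in releases]
    return "\n\n".join(stanzas) + "\n"


if __name__ == "__main__":
    # "0" * 40 stands in for a commit hash; it is not a real commit.
    print(manifest([("1.8.3", "0" * 40)]))
```

Whether Docker Hub would accept such entries for a repository as large as apache/flink is exactly the open question raised here; the sketch only shows the shape of the text.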
> > > > >
> > > > > If the above assumptions are right and there are no more obstacles,
> > > > > I'm inclined to have these Dockerfiles in the main Flink repo. In
> > > > > this case, we can reduce the number of repos and the management
> > > > > overhead. What do you think?
> > > > >
> > > > > Best,
> > > > > Hequn
> > > > >
> > > > > [1]
> > > > > https://github.com/docker-library/official-images/blob/master/library/flink
> > > > > [2]
> > > > > https://github.com/docker-library/official-images/blob/master/library/ubuntu
> > > > >
> > > > >
> > > > > On Fri, Dec 20, 2019 at 5:29 PM Yang Wang <danrtsey...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Big +1 for this effort.
> > > > > >
> > > > > > It is really exciting that we have started this great work. More
> > > > > > and more companies are starting to use Flink in container
> > > > > > environments (Docker, Kubernetes, Mesos, even Yarn 3.x). So it is
> > > > > > very important that we have a unified process for building and
> > > > > > releasing official images.
> > > > > >
> > > > > >
> > > > > > The image building process in this proposal is really good, and I
> > > > > > just have the following thoughts.
> > > > > >
> > > > > > >> Keep a dedicated repo for Dockerfiles to build official image
> > > > > > I think this is a good approach, and we do not need to make
> > > > > > unnecessary changes to the Flink repository.
> > > > > >
> > > > > > >> Integrate building image into the Flink release process
> > > > > > It will bring a better experience for users of container
> > > > > > environments. In my opinion, a complete release includes the
> > > > > > official image. It should be verified to work well.
> > > > > >
> > > > > > >> Nightly building
> > > > > > Would we support all release branches or just the master branch?
> > > > > >
> > > > > > >> Multiple purpose Flink images
> > > > > > This is really needed. During development and testing, we need
> > > > > > profiling tools to help us investigate problems. Currently, we do
> > > > > > not even have jstack/jmap in the image.
> > > > > >
> > > > > > >> Unify the Dockerfile in Flink repository
> > > > > > In the current code base, we have
> > > > > > flink-contrib/docker-flink/Dockerfile to build an image for a
> > > > > > session cluster. However, it is not kept up to date. For a per-job
> > > > > > cluster, flink-container/docker/Dockerfile can be used to build a
> > > > > > Flink image with user artifacts. I think we need to unify them and
> > > > > > provide a more powerful build script and entry point.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Yang
> > > > > >
> > > > > > On Thu, Dec 19, 2019 at 9:20 PM Patrick Lucas
> > > > > > <patr...@ververica.com> wrote:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > >
> > > > > > > I would like to start a discussion about integrating publication
> > > > > > > of the Flink Docker images hosted on Docker Hub[1] more tightly
> > > > > > > with the Flink release process. Apologies in advance for the long
> > > > > > > post.
> > > > > > >
> > > > > > > More than two and a half years ago (time flies!) we introduced
> > > > > > > “official” Docker images for Flink[2]. Since then, the popularity
> > > > > > > of running containerized applications in general and
> > > > > > > containerized Flink in particular has continued to grow. Today,
> > > > > > > Flink is one of the most popular “official” images on Docker
> > > > > > > Hub[3].
> > > > > > >
> > > > > > > > A graph of Flink Docker image pulls over time:
> > > > > > >
> > > > > > > https://gist.githubusercontent.com/patricklucas/7312444b1056ff82528e9a129e74e2b3/raw/9c8e139c1abc70b2b3fb34aadd7f44d46a540fe8/docker-flink-pulls.png
> > > > > > >
> > > > > > > “Official” is in quotation marks because while that’s how the
> > > > > > > Docker community refers to top-level images on Docker Hub (i.e.
> > > > > > > those that can be run with just <docker run foo>), they are not
> > > > > > > official in the sense of being officially endorsed by the Flink
> > > > > > > PMC.
> > > > > > >
> > > > > > > I think it’s time for that to change.
> > > > > > >
> > > > > > > Currently, the Dockerfiles that produce these images are
> > > > > > > maintained in a repository called docker-flink[4] in a separate,
> > > > > > > community-managed GitHub organization of the same name. When a
> > > > > > > new release of Flink is available, or when other changes are
> > > > > > > necessary, these Dockerfiles (one per image) are updated, and
> > > > > > > then a pull request[5] is made to the Docker Hub official-images
> > > > > > > repo with an updated manifest of images and tags, after which
> > > > > > > infrastructure run by Docker Hub builds, checks, and publishes
> > > > > > > the images.
> > > > > > >
> > > > > > > A question that has come up regularly is “Why are the
> > > > > > > Dockerfiles in a separate repository from Flink?”, and there are
> > > > > > > a few different answers:
> > > > > > >
> > > > > > >    - These Dockerfiles package only released, published
> > > > > > >      distributions of Flink, and are therefore decoupled from a
> > > > > > >      particular commit in the Flink repo
> > > > > > >    - All the Dockerfiles for supported versions (and the
> > > > > > >      corresponding Scala version variants) should be available in
> > > > > > >      one Git tree for discoverability
> > > > > > >    - The master branch of Flink is not the right place to encode
> > > > > > >      what the supported versions are, or how to run previous
> > > > > > >      versions of Flink; it should be concerned with the
> > > > > > >      point-in-time of the code represented in that commit
> > > > > > >
> > > > > > >
> > > > > > > But mostly, having a dedicated repo for Dockerfiles is a
> > > > > > > convention shared by nearly every other “official” image on
> > > > > > > Docker Hub[6]. If the Flink community wants to do this
> > > > > > > differently, we will need to work with the Docker Hub maintainers
> > > > > > > to make sure we continue to work within their guidelines and
> > > > > > > expectations.
> > > > > > >
> > > > > > > While it seems intuitive that integrating these images into the
> > > > > > > Flink release process is a good thing, I don’t believe it is
> > > > > > > strictly necessary, since the images only package approved and
> > > > > > > signed Flink releases, and do not themselves build Flink from
> > > > > > > source. However, there are some concrete advantages:
> > > > > > >
> > > > > > >    - Putting the Docker images on (almost) equal footing with
> > > > > > >      Flink binary release artifacts will help the legitimacy of
> > > > > > >      and user confidence in running Flink in containerized
> > > > > > >      environments
> > > > > > >    - By publishing release candidate (and possibly nightly)
> > > > > > >      images, the release testing and automated testing processes
> > > > > > >      could be improved
> > > > > > >    - The delay between Flink releases and when the corresponding
> > > > > > >      Docker images are available will be reduced
> > > > > > >
> > > > > > >
> > > > > > > Considering all of this, I propose the following:
> > > > > > >
> > > > > > >    - We move the Git repository containing the Dockerfiles from
> > > > > > >      the docker-flink GitHub organization to Apache, placing it
> > > > > > >      under control of the Flink PMC
> > > > > > >    - We codify updating these Dockerfiles and notifying Docker Hub
> > > > > > >      into the Flink release process
> > > > > > >       - For release candidates, Dockerfiles should be added to a
> > > > > > >         special directory which will be automatically built and
> > > > > > >         pushed to the Apache Docker Hub organization[7], e.g.
> > > > > > >         apache/flink-rc:1.10.0-rc1
> > > > > > >       - Upon release, the appropriate “release” Dockerfiles are
> > > > > > >         added (e.g. under the 1.10 directory) and release candidate
> > > > > > >         Dockerfiles removed, and then a pull request opened on the
> > > > > > >         docker-library/official-images repository
> > > > > > >    - Optionally, we introduce “nightly” builds, with an automated
> > > > > > >      process building and pushing images to the Apache Docker Hub
> > > > > > >      organization, e.g. apache/flink-dev:1.10-SNAPSHOT
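
The tag naming in the proposal above can be sketched as follows, assuming the two example references (apache/flink-rc:1.10.0-rc1 and apache/flink-dev:1.10-SNAPSHOT) generalize in the obvious way. The function names are hypothetical, and the rule that nightly tags use only the major.minor part of the version is an assumption drawn from the single example given.

```python
import re

RC_REPO = "apache/flink-rc"    # repository name taken from the example above
DEV_REPO = "apache/flink-dev"  # repository name taken from the example above


def rc_image(version: str, rc: int) -> str:
    """Return the image reference for release candidate `rc` of `version`."""
    if not re.fullmatch(r"\d+\.\d+\.\d+", version):
        raise ValueError(f"expected a full version like 1.10.0, got {version!r}")
    if rc < 1:
        raise ValueError("release candidate numbers start at 1")
    return f"{RC_REPO}:{version}-rc{rc}"


def nightly_image(version: str) -> str:
    """Return the nightly image reference for the release line of `version`."""
    match = re.fullmatch(r"(\d+\.\d+)(?:\.\d+)?", version)
    if match is None:
        raise ValueError(f"expected a version like 1.10 or 1.10.0, got {version!r}")
    # Assumption: nightly tags carry only major.minor plus -SNAPSHOT.
    return f"{DEV_REPO}:{match.group(1)}-SNAPSHOT"


if __name__ == "__main__":
    print(rc_image("1.10.0", 1))
    print(nightly_image("1.10"))
```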
> > > > > > >
> > > > > > >
> > > > > > > If we choose to move forward in this direction, there are some
> > > > > > > further steps we could take to improve the experience of both
> > > > > > > developing and using Flink with Docker (these are actually mostly
> > > > > > > orthogonal to the proposed changes above, but I think this is a
> > > > > > > natural first step and should make the following ideas easier to
> > > > > > > implement).
> > > > > > >
> > > > > > > First, there are important differences between images meant for
> > > > > > > running Flink and those meant for development: the former should
> > > > > > > strictly package only released distributions of software and be
> > > > > > > as thin a layer as possible over the software itself, while the
> > > > > > > latter can be used during development and testing, and can
> > > > > > > easily be rebuilt from a “working copy” of the software’s source
> > > > > > > code.
> > > > > > >
> > > > > > > By standardizing on defining such “production” images in the
> > > > > > > docker-flink repository and “development” image(s) in the Flink
> > > > > > > repository itself, it becomes much clearer to developers and
> > > > > > > users which Dockerfile or image they should use for a given
> > > > > > > purpose. To that end, we could introduce one or more documented
> > > > > > > Maven goals or Make targets for building a Docker image from the
> > > > > > > current source tree or a specific release (including unreleased
> > > > > > > or unsupported versions).
> > > > > > >
> > > > > > > Additionally, there has been discussion among Flink contributors
> > > > > > > for some time about the confusing state of Dockerfiles within
> > > > > > > the Flink repository, each meant for a different way of running
> > > > > > > Flink. I’m not completely up to speed on these different
> > > > > > > efforts, but we could possibly solve this by either building
> > > > > > > additional “official” images with different entrypoints for
> > > > > > > these various purposes, or by developing an improved entrypoint
> > > > > > > script that conveniently supports all cases. I defer to Till
> > > > > > > Rohrmann, Konstantin Knauf, or Stephan Ewen for further
> > > > > > > discussion on this point.
> > > > > > >
> > > > > > > I apologize again for the wall of text, but if you made it this
> > > > > > > far, thank you! These improvements have been a long time coming,
> > > > > > > and I hope we can find a solution that serves the Flink and
> > > > > > > Docker communities well. Please don’t hesitate to ask any
> > > > > > > questions.
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Patrick Lucas
> > > > > > >
> > > > > > > [1] https://hub.docker.com/_/flink
> > > > > > >
> > > > > > > [2]
> > > > > > > https://lists.apache.org/thread.html/c50297f8659aaa59d4f2ae327b69c4d46d1ab8ecc53138e659e4fe91%40%3Cdev.flink.apache.org%3E
> > > > > > >
> > > > > > > [3] On page 2 at the time we went to press:
> > > > > > > https://hub.docker.com/search?q=&type=image&image_filter=official
> > > > > > >
> > > > > > > [4] https://github.com/docker-flink/docker-flink
> > > > > > >
> > > > > > > [5]
> > > > > > > https://github.com/docker-library/official-images/pulls?q=is%3Apr+label%3Alibrary%2Fflink
> > > > > > >
> > > > > > > [6] I looked at the 25 most popular “official” images (see [3]) as
> > > > > > > well as “official” images of Apache software from the top 125;
> > > > > > > all use a dedicated repo
> > > > > > >
> > > > > > > [7] https://hub.docker.com/u/apache
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > >
> > > Konstantin Knauf | Solutions Architect
> > >
> >
>
