Hi everyone,

As you might know, some of us are currently working on Docker-based
playgrounds that make it very easy for first-time Flink users to try out
and play with Flink [0].

Our current setup (still work in progress with some parts merged to the
master branch) looks as follows:
* The playground is a Docker Compose environment [1] consisting of Flink,
Kafka, and Zookeeper images (ZK for Kafka). The playground is based on a
specific Flink job.
* We had planned to add the example job of the playground as an example to
the flink main repository to bundle it with the Flink distribution. Hence,
it would have been included in the Docker-hub-official (soon to be
published) Flink 1.9 Docker image [2].
* The main motivation for adding the job to the examples module in the flink
main repo was to avoid the maintenance overhead of a customized Docker
image.
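For illustration, the setup described above roughly corresponds to a Compose
file like the following. This is a minimal sketch, not the actual
flink-playgrounds configuration: the service names, image tags (I'm assuming
the Kafka/ZK images from wurstmeister here), and the Kafka environment
settings may differ from what ends up in the repo.

```yaml
version: "2.1"
services:
  jobmanager:
    image: flink:1.9-scala_2.11
    command: jobmanager
    ports:
      - "8081:8081"            # Flink Web UI
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
  taskmanager:
    image: flink:1.9-scala_2.11
    command: taskmanager
    depends_on:
      - jobmanager
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
  zookeeper:
    image: wurstmeister/zookeeper:3.4.6   # ZK only needed by Kafka
  kafka:
    image: wurstmeister/kafka:2.12-2.2.1
    depends_on:
      - zookeeper
    environment:
      KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
      KAFKA_ADVERTISED_HOST_NAME: "kafka"
```

The point of the discussion below is the `image:` lines for the Flink
services: as long as the playground job ships inside the Flink distribution,
they can point at the Docker-hub-official `flink` image.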

When we discussed backporting the playground job (and its data generator) to
include it in the Flink 1.9 examples, concerns were raised about its Kafka
dependency, which would become a problem if the community agrees on the
recently proposed repository split, which would remove flink-kafka from the
main repository [3]. I think this is a fair concern that we did not consider
when designing the playground (the repo split had not been proposed yet
either).

If we don't add the playground job to the examples, we need to put it
somewhere else. The obvious choice would be the flink-playgrounds [4]
repository, which was intended for the docker-compose configuration files.
However, we would no longer be able to include it in the Docker-hub-official
Flink image and would need to maintain a custom Docker image, which is what
we tried to avoid. The custom image would of course be based on the
Docker-hub-official Flink image.
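Such a custom image would be very thin, essentially the official image plus
the pre-built job artifact. A sketch (the job JAR name and target path are
hypothetical, and the exact tag would track the Flink release):

```dockerfile
# Minimal custom playground image: official Flink image + job JAR.
FROM flink:1.9-scala_2.11

# Copy the pre-built playground job into the image so the
# docker-compose setup can submit it without a local build step.
COPY target/playground-job.jar /opt/playground-job.jar
```

So the maintenance overhead is less about the Dockerfile itself and more
about who builds, publishes, and updates the image for each Flink release.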

There are different approaches for this:

1) Building one (or more) official ASF images
There is an official Apache Docker Hub user [5] and a bunch of projects
publish Docker images via this user. Apache Infra seems to support a
process that automatically builds and publishes Docker images when a
release tag is added to a repository. This feature needs to be enabled. I
haven't found detailed documentation on this, but there are a bunch of INFRA
Jira tickets that discuss this mechanism.
This approach would mean that we need a formal Apache release for
flink-playgrounds (similar to flink-shaded). The obvious benefits are that
these images would be ASF-official Docker images. If we can publish
more than one image per repo, we could also publish images for other
playgrounds (like the SQL playground, which could be based on the SQL
training that I built [6], which uses an image that is published under my
user [7]).

2) Rely on an external image
This image could be built by somebody in the community (like me). The
problem is, of course, that the image would not be an official image and we
would rely on a volunteer to build it.
OTOH, the overhead would be pretty small: no need to run full
releases, no integration with Infra's build process, etc.

IMO, the first approach is clearly the better choice but also needs a bunch
of things to be put into place.

What do others think?
Does somebody have another idea?

Cheers,
Fabian

[0]
https://ci.apache.org/projects/flink/flink-docs-master/getting-started/docker-playgrounds/flink_cluster_playground.html
[1]
https://ci.apache.org/projects/flink/flink-docs-master/getting-started/docker-playgrounds/flink_cluster_playground.html#anatomy-of-this-playground
[2] https://hub.docker.com/_/flink
[3]
https://lists.apache.org/thread.html/eb841f610ef2c191b8d00b6c07b2eab513da2e4eb2d7da5c5e6846f4@%3Cdev.flink.apache.org%3E
[4] https://github.com/apache/flink-playgrounds
[5] https://hub.docker.com/u/apache
[6] https://github.com/ververica/sql-training/
[7] https://hub.docker.com/r/fhueske/flink-sql-client-training-1.7.2