Hi everyone,

As you might know, some of us are currently working on Docker-based playgrounds that make it very easy for first-time Flink users to try out and play with Flink [0].
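To make the setup described below a bit more concrete, here is a rough sketch of what such a Compose environment could look like. Note that the service names, image tags, ports, and environment variables here are my illustrative assumptions, not the actual playground files:

```yaml
# Sketch of a Flink + Kafka + ZooKeeper playground (illustrative only).
version: "2.1"
services:
  jobmanager:
    image: flink:1.9          # would become the Docker-Hub-official image
    command: jobmanager
    ports:
      - "8081:8081"           # Flink Web UI
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
  taskmanager:
    image: flink:1.9
    command: taskmanager
    depends_on:
      - jobmanager
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
  zookeeper:
    image: wurstmeister/zookeeper:3.4.6   # ZK only needed for Kafka
  kafka:
    image: wurstmeister/kafka:2.12-2.2.1
    depends_on:
      - zookeeper
    environment:
      - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
      - KAFKA_ADVERTISED_HOST_NAME=kafka
```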
Our current setup (still work in progress, with some parts merged to the master branch) looks as follows:

* The playground is a Docker Compose environment [1] consisting of Flink, Kafka, and ZooKeeper images (ZK only for Kafka).
* The playground is based on a specific Flink job.
* We had planned to add the playground's job to the examples of the flink main repository, so that it would be bundled with the Flink distribution. Hence, it would have been included in the (soon to be published) Docker-Hub-official Flink 1.9 Docker image [2].
* The main motivation for adding the job to the examples module of the flink main repo was to avoid the maintenance overhead of a customized Docker image.

When discussing backporting the playground job (and its data generator) to include it in the Flink 1.9 examples, concerns were raised about its Kafka dependency, which would become a problem if the community agrees on the recently proposed repository split, which would remove flink-kafka from the main repository [3]. I think this is a fair concern that we did not consider when designing the playground (also, the repo split had not been proposed yet).

If we don't add the playground job to the examples, we need to put it somewhere else. The obvious choice would be the flink-playgrounds repository [4], which was intended for the docker-compose configuration files. However, we would then no longer be able to include the job in the Docker-Hub-official Flink image and would need to maintain a custom Docker image, which is what we tried to avoid. The custom image would of course be based on the Docker-Hub-official Flink image.

There are different approaches for this:

1) Building one (or more) official ASF images

There is an official Apache Docker Hub user [5], and a bunch of projects publish Docker images via this user. Apache Infra seems to support a process that automatically builds and publishes Docker images when a release tag is added to a repository. This feature needs to be enabled.
I haven't found detailed documentation on this, but there is a bunch of INFRA Jira tickets that discuss this mechanism. This approach would mean that we need a formal Apache release for flink-playgrounds (similar to flink-shaded). The obvious benefit is that these images would be ASF-official Docker images. In case we can publish more than one image per repo, we could also publish images for other playgrounds (like the SQL playground, which could be based on the SQL training that I built [6], which uses an image that is published under my user [7]).

2) Rely on an external image

This image could be built by somebody in the community (like me). The problem is, of course, that the image would not be an official image and we would rely on a volunteer to build it. OTOH, the overhead would be pretty small: no need to run full releases, no integration with Infra's build process, etc.

IMO, the first approach is clearly the better choice, but it also needs a bunch of things to be put into place.

What do others think? Does somebody have another idea?

Cheers,
Fabian

[0] https://ci.apache.org/projects/flink/flink-docs-master/getting-started/docker-playgrounds/flink_cluster_playground.html
[1] https://ci.apache.org/projects/flink/flink-docs-master/getting-started/docker-playgrounds/flink_cluster_playground.html#anatomy-of-this-playground
[2] https://hub.docker.com/_/flink
[3] https://lists.apache.org/thread.html/eb841f610ef2c191b8d00b6c07b2eab513da2e4eb2d7da5c5e6846f4@%3Cdev.flink.apache.org%3E
[4] https://github.com/apache/flink-playgrounds
[5] https://hub.docker.com/u/apache
[6] https://github.com/ververica/sql-training/
[7] https://hub.docker.com/r/fhueske/flink-sql-client-training-1.7.2