Hey all,

thanks for the proposal and the detailed discussion. In particular, thanks
to Andrey for starting this thread and to Patrick for the additional ideas
in the linked Google doc.
I find many of the improvements proposed during the discussion (such as the
unified entrypoint in Flink, proper configuration via environment
variables, Dockerfiles for development, etc.) really important. At the same
time, I believe that these improvements have quite a large scope and could
be tackled independently as Till already suggested. I think we should
ideally split the discussions for those improvements out of this thread and
focus on the main target of FLIP-111.

To me the major point of this FLIP is to consolidate existing Dockerfiles
into apache/flink-docker and document typical usage scenarios (e.g. linking
plugins, installing shaded Hadoop, running a job cluster, etc.).

In order to achieve this, I think we could move forward as follows:

(1) Extend the entrypoint script in apache/flink-docker to start the job
cluster entry point
=> this is currently missing and would block removal of the Dockerfile in
flink-container

(2) Extend the example log4j-console configuration
=> support log retrieval from the Flink UI out of the box

(3) Document typical usage scenarios in apache/flink-docker
=> this should replace the proposed flink_docker_utils helper

(4) Remove the existing Dockerfiles from apache/flink
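
To make (1) a bit more concrete, here is a rough sketch of the dispatch the entrypoint would need. It is only an illustration under stated assumptions: the real entrypoint is a shell script, Python is used here just for brevity, and the mode names, the /opt/flink default, and the build_command helper are hypothetical rather than anything the FLIP prescribes.

```python
import posixpath

# Assumption: Flink's launcher scripts live under <flink_home>/bin and accept
# "start-foreground". The mode names are illustrative; "standalone-job" stands
# in for the job cluster entry point that item (1) says is currently missing.
LAUNCHERS = {
    "jobmanager": "jobmanager.sh",
    "taskmanager": "taskmanager.sh",
    "standalone-job": "standalone-job.sh",
}


def build_command(args, flink_home="/opt/flink"):
    """Return the argv the entrypoint would exec for the given container args."""
    if args and args[0] in LAUNCHERS:
        script = posixpath.join(flink_home, "bin", LAUNCHERS[args[0]])
        # Everything after the mode is handed to the launcher untouched.
        return [script, "start-foreground", *args[1:]]
    # Unknown commands are passed through unchanged (e.g. an ad-hoc shell).
    return list(args)
```

The point is just that the job cluster becomes one more mode of the same image, which is what would make the Dockerfile in flink-container redundant.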


I really like the convenience of a script such as flink_docker_utils, but I
think we should avoid it for now, because most of the desired usage
scenarios can be covered by documentation. After we have concluded (1)-(4)
we can take a holistic look and identify what would benefit the most from
such a script and how it would interact with the other planned improvements.

I think this will give us a good basis to tackle the other major
improvements that were proposed.

– Ufuk

On Thu, Apr 2, 2020 at 4:34 PM Patrick Lucas <patr...@ververica.com> wrote:
>
> Thanks Andrey for working on this, and everyone else for your feedback.
>
> This FLIP inspired me to discuss and write down some ideas I've had for a
> while about configuring and running Flink (especially in Docker) that go
> beyond the scope of this FLIP, but don't contradict what it sets out to do.
>
> The crux of it is that Flink should be maximally configurable using
> environment variables, and not require manipulation of the filesystem
> (i.e. moving/linking JARs or editing config files) in order to run in a
> large majority of cases. Beyond that, particularly for running Flink in
> Docker, as much logic as possible should be a part of Flink itself and
> not, for instance, in the docker-entrypoint.sh script. I've resisted
> adding additional logic to the Flink Docker images except where necessary
> since the beginning, and I believe we can get to the point where the only
> thing the entrypoint script does is drop privileges before invoking a
> script included in Flink.
>
> > Ultimately, my ideal end-goal for running Flink in containers would
> > fulfill the following points:
> >
> >    - A user can configure all “start-time” aspects of Flink with
> >    environment variables, including additions to the classpath
> >    - Flink automatically adapts to the resources available to the
> >    container (such as what BashJavaUtils helps with today)
> >    - A user can include additional JARs using a mounted volume, or at
> >    image build time with convenient tooling
> >    - The role/mode (jobmanager, session) is specified as a command line
> >    argument, with a single entrypoint program sufficing for all uses of
> >    the image
> >
> > As a bonus, if we could eliminate some or most of the layers of shell
> > scripts that are involved in starting a Flink server, perhaps by
> > re-implementing this part of the stack in Java, and exec-ing to actually
> > run Flink with the proper java CLI arguments, I think it would be a big
> > win for the project.
>
>
> You can read the rest of my notes here:
>
> https://docs.google.com/document/d/1JCACSeDaqeZiXD9G1XxQBunwi-chwrdnFm38U1JxTDQ/edit
>
> On Wed, Mar 4, 2020 at 10:34 AM Andrey Zagrebin <azagre...@apache.org>
> wrote:
>
> > Hi All,
> >
> > If you have ever touched the docker topic in Flink, you
> > probably noticed that we have multiple places in docs and repos which
> > address its various concerns.
> >
> > We have prepared a FLIP [1] to simplify how users perceive the Docker
> > topic in Flink. It mostly advocates for an approach of extending the
> > official Flink image from Docker Hub. For convenience, it can come with
> > a set of bash utilities and documented examples of their usage. The
> > utilities allow users to:
> >
> >    - run the docker image in various modes (single job, session master,
> >    task manager etc)
> >    - customise the extending Dockerfile
> >    - and its entry point
> >
> > Eventually, the FLIP suggests removing all other user-facing Dockerfiles
> > and build scripts from the Flink repo, moving all Docker docs to
> > apache/flink-docker, and adjusting existing Docker use cases to refer to
> > this new approach (mostly Kubernetes now).
> >
> > The first contributed version of the Flink Docker integration also
> > contained an example and docs for the integration with Bluemix in IBM
> > Cloud. We also suggest maintaining it outside of the Flink repository
> > (cc Markus Müller).
> >
> > Thanks,
> > Andrey
> >
> > [1]
> >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-111%3A+Docker+image+unification
> >
