Re: [DISCUSS] FLIP-111: Docker image unification

Yang Wang Thu, 02 Apr 2020 20:12:15 -0700

Hi Ufuk,

Thanks for drawing the conclusion and pointing out directly what needs to be
done in FLIP-111. I agree with you that we should narrow down the scope and
focus on the most important and basic parts of the Docker image unification.


> (1) Extend the entrypoint script in apache/flink-docker to start the job
> cluster entry point

I want to add a small requirement for the entry point script. Currently, for
the native K8s integration, we are using the apache/flink-docker image, but
with a different entry point ("kubernetes-entry.sh"). The java command is
generated in KubernetesUtils and run in that entry point. I really hope it
could be merged into the apache/flink-docker "docker-entrypoint.sh".

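To make the request concrete, here is a rough sketch of how a single entry
point could dispatch between the standalone modes and the native K8s case.
The mode name "native-k8s" and the printed markers are assumptions for
illustration only, not the actual contents of "docker-entrypoint.sh"; a real
script would exec the command instead of printing it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a unified entry point. Mode names and output
# markers are illustrative assumptions, not the real apache/flink-docker script.

# Decide what to run based on the first argument.
dispatch() {
    local mode="$1"
    shift || true   # tolerate being called without arguments
    case "$mode" in
        jobmanager|taskmanager)
            # Standalone modes: a real script would start the Flink process here.
            echo "standalone:$mode"
            ;;
        native-k8s)
            # Native K8s mode: the remaining arguments are the java command
            # generated on the client side; a real script would exec them.
            echo "exec:$*"
            ;;
        *)
            # Anything else is treated as a plain command and passed through.
            echo "passthrough:$mode${*:+ $*}"
            ;;
    esac
}
```

With this shape, the native K8s integration would only need to pass its
generated command after the mode argument instead of shipping its own script.
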
> (2) Extend the example log4j-console configuration
> => support log retrieval from the Flink UI out of the box

If you mean updating "flink-dist/conf/log4j-console.properties" to support
both the console and local log files, I will say "+1". But we need to find a
proper way to make the stdout/stderr output available both on the console and
in the log files. Maybe Till's proposal could help to solve this:
"`program 2>&1 | tee flink-user-taskexecutor.out`"

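As a minimal sketch of that idea (using the standard "2>&1" redirection), the
following sends both streams to the console and to a file. The "program"
function and the temporary file are stand-ins for illustration, not part of
Flink.

```shell
#!/usr/bin/env bash
# Stand-in for the real process: writes one line to stdout and one to stderr.
program() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Stand-in for a file such as flink-user-taskexecutor.out.
out_file="$(mktemp)"

# 2>&1 merges stderr into stdout before the pipe, so tee receives both
# streams, prints them to the console, and writes the same lines to the file.
program 2>&1 | tee "$out_file"
```

One caveat: the exit status of the pipeline is that of tee, not of the
program, unless "set -o pipefail" is enabled in bash.
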
> (3) Document typical usage scenarios in apache/flink-docker
> => this should replace the proposed flink_docker_utils helper

I agree with you that, as a first step, documentation is enough for the
typical usage scenarios (e.g. standalone session, standalone per-job, native,
plugins, Python, etc.).


Best,
Yang


Ufuk Celebi <u...@apache.org> wrote on Fri, Apr 3, 2020 at 1:03 AM:

> Hey all,
>
> thanks for the proposal and the detailed discussion. In particular, thanks
> to Andrey for starting this thread and to Patrick for the additional ideas
> in the linked Google doc.
>
> I find many of the improvements proposed during the discussion (such as the
> unified entrypoint in Flink, proper configuration via environment
> variables, Dockerfiles for development, etc.) really important. At the same
> time, I believe that these improvements have quite a large scope and could
> be tackled independently as Till already suggested. I think we should
> ideally split the discussions for those improvements out of this thread and
> focus on the main target of FLIP-111.
>
> To me the major point of this FLIP is to consolidate existing Dockerfiles
> into apache/flink-docker and document typical usage scenarios (e.g. linking
> plugins, installing shaded Hadoop, running a job cluster, etc.).
>
> In order to achieve this, I think we could move forward as follows:
>
> (1) Extend the entrypoint script in apache/flink-docker to start the job
> cluster entry point
> => this is currently missing and would block removal of the Dockerfile in
> flink-container
>
> (2) Extend the example log4j-console configuration
> => support log retrieval from the Flink UI out of the box
>
> (3) Document typical usage scenarios in apache/flink-docker
> => this should replace the proposed flink_docker_utils helper
>
> (4) Remove the existing Dockerfiles from apache/flink
>
>
> I really like the convenience of a script such as flink_docker_utils, but I
> think we should avoid it for now, because most of the desired usage
> scenarios can be covered by documentation. After we have concluded (1)-(4)
> we can take a holistic look and identify what would benefit the most from
> such a script and how it would interact with the other planned
> improvements.
>
> I think this will give us a good basis to tackle the other major
> improvements that were proposed.
>
> – Ufuk
>
> On Thu, Apr 2, 2020 at 4:34 PM Patrick Lucas <patr...@ververica.com>
> wrote:
> >
> > Thanks Andrey for working on this, and everyone else for your feedback.
> >
> > This FLIP inspired me to discuss and write down some ideas I've had for a
> > while about configuring and running Flink (especially in Docker) that go
> > beyond the scope of this FLIP, but don't contradict what it sets out to
> > do.
> >
> > The crux of it is that Flink should be maximally configurable using
> > environment variables, and not require manipulation of the filesystem
> > (i.e. moving/linking JARs or editing config files) in order to run in a
> > large majority of cases. And beyond that, particularly for running Flink
> > in Docker, as much logic as possible should be a part of Flink itself and
> > not, for instance, in the docker-entrypoint.sh script. I've resisted
> > adding additional logic to the Flink Docker images except where necessary
> > since the beginning, and I believe we can get to the point where the only
> > thing the entrypoint script does is drop privileges before invoking a
> > script included in Flink.
> >
> > > Ultimately, my ideal end-goal for running Flink in containers would
> > > fulfill the following points:
> > >
> > >    - A user can configure all “start-time” aspects of Flink with
> > >    environment variables, including additions to the classpath
> > >    - Flink automatically adapts to the resources available to the
> > >    container (such as what BashJavaUtils helps with today)
> > >    - A user can include additional JARs using a mounted volume, or at
> > >    image build time with convenient tooling
> > >    - The role/mode (jobmanager, session) is specified as a command line
> > >    argument, with a single entrypoint program sufficing for all uses of
> > >    the image
> > >
> > > As a bonus, if we could eliminate some or most of the layers of shell
> > > scripts that are involved in starting a Flink server, perhaps by
> > > re-implementing this part of the stack in Java, and exec-ing to
> > > actually run Flink with the proper java CLI arguments, I think it would
> > > be a big win for the project.
> >
> >
> > You can read the rest of my notes here:
> >
> > https://docs.google.com/document/d/1JCACSeDaqeZiXD9G1XxQBunwi-chwrdnFm38U1JxTDQ/edit
> >
> > On Wed, Mar 4, 2020 at 10:34 AM Andrey Zagrebin <azagre...@apache.org>
> > wrote:
> >
> > > Hi All,
> > >
> > > If you have ever touched the docker topic in Flink, you
> > > probably noticed that we have multiple places in docs and repos which
> > > address its various concerns.
> > >
> > > We have prepared a FLIP [1] to simplify how users perceive the docker
> > > topic in Flink. It mostly advocates for an approach of extending the
> > > official Flink image from Docker Hub. For convenience, it can come with
> > > a set of bash utilities and documented examples of their usage. The
> > > utilities allow users to:
> > >
> > >    - run the docker image in various modes (single job, session master,
> > >    task manager etc)
> > >    - customise the extending Dockerfile
> > >    - and its entry point
> > >
> > > Eventually, the FLIP suggests removing all other user-facing
> > > Dockerfiles and build scripts from the Flink repo, moving all docker
> > > docs to apache/flink-docker, and adjusting existing docker use cases to
> > > refer to this new approach (mostly Kubernetes now).
> > >
> > > The first contributed version of the Flink docker integration also
> > > contained an example and docs for the integration with Bluemix in IBM
> > > Cloud. We also suggest maintaining it outside of the Flink repository
> > > (cc Markus Müller).
> > >
> > > Thanks,
> > > Andrey
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-111%3A+Docker+image+unification
> > >
>
