Hi Andrey,

Thanks for your explanation.
> About the logging

What I mean is that we could not forward the stdout/stderr to local files
and the docker stdout at the same time by using log4j. For
jobmanager.log/taskmanager.log it works quite well, since we only need to
add a console appender in log4j.properties. I am just curious how to
forward the stdout/stderr to local files and the docker stdout at the same
time by using log4j :)

Best,
Yang

Andrey Zagrebin <azagre...@apache.org> wrote on Mon, Mar 16, 2020 at 4:58 PM:

> Thanks for the further feedback, Thomas and Yangze.
>
> > A generic, dynamic configuration mechanism based on environment
> > variables is essential and it is already supported via envsubst and an
> > environment variable that can supply a configuration fragment
>
> True, we already have this. As I understand it, this was introduced for
> the flexibility of templating a custom flink-conf.yaml with env vars:
> put it into FLINK_PROPERTIES and merge it with the default one.
> Could we achieve the same with dynamic properties (-Drpc.port=1234),
> passed as image args to run it, instead of FLINK_PROPERTIES?
> They could also be parametrised with env vars. This would require
> jobmanager.sh to properly propagate them to
> the StandaloneSessionClusterEntrypoint, though:
> https://github.com/docker-flink/docker-flink/pull/82#issuecomment-525285552
> cc @Till
> This would provide a unified configuration approach.
>
> > On the flip side, attempting to support a fixed subset of configuration
> > options is brittle and will probably lead to compatibility issues down
> > the road
>
> I agree with that. The idea was to have just some shortcut scripted
> functions to set options in flink-conf.yaml for a custom Dockerfile or
> entry point script.
> TASK_MANAGER_NUMBER_OF_TASK_SLOTS could be set as a dynamic property of
> the started JM.
> I am not sure how many users depend on it. Maybe we could remove it.
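One way to get the log4j logs into a local file and onto the docker console at the same time, as asked above, is to register two appenders on the root logger. A hedged sketch in log4j 1.x properties syntax (the `${log.file}` system property is an assumption borrowed from Flink's standard logging setup; patterns are illustrative):

```properties
# Root logger fans out to both appenders: 'console' reaches `docker logs`,
# 'file' keeps a local copy on disk.
log4j.rootLogger=INFO, console, file

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n

log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.file=${log.file}
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
```

Note this only covers what goes through log4j; the raw stdout/stderr of the JVM process would still need to be duplicated at the entry-point level (e.g. with `tee`).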
> It also looks like we already have a somewhat unclean state in
> docker-entrypoint.sh, where some ports are set to hardcoded values
> and then FLINK_PROPERTIES is applied, potentially duplicating options in
> the resulting flink-conf.yaml.
>
> I can see some potential usage of env vars as standard entry point args,
> but only for purposes which cannot be achieved by passing entry point
> args, like changing flink-conf.yaml options. Nothing comes to mind at
> the moment. It could be some setting specific to the running mode of the
> entry point. The mode itself can stay the first arg of the entry point.
>
> > I would second that it is desirable to support Java 11
>
> > Regarding supporting JAVA 11:
> > - Not sure if it is necessary to ship JAVA. Maybe we could just change
> > the base image from openjdk:8-jre to openjdk:11-jre in the template
> > docker file[1]. Correct me if I understand incorrectly. Also, I agree
> > to move this out of the scope of this FLIP if it indeed takes much
> > extra effort.
>
> This is what I meant by bumping up the Java version in the docker hub
> Flink image:
> FROM openjdk:8-jre -> FROM openjdk:11-jre
> This could be polled separately in the user mailing list.
>
> > and in general use a base image that allows the (straightforward) use
> > of more recent versions of other software (Python etc.)
>
> This can be polled as well: whether to always include some version of
> python in the docker hub image.
> A potential problem here is that once it is there, it is some hassle to
> remove/change it in a custom extended Dockerfile.
>
> It would also be nice to avoid maintaining images for various
> combinations of installed Java/Scala/Python in docker hub.
>
> > Regarding building from local dist:
> > - Yes, I bring this up mostly for development purposes. Since k8s is
> > popular, I believe more and more developers would like to test their
> > work on a k8s cluster.
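Both points above (a newer Java base and optional Python) illustrate the "extend the base image" approach rather than baking every combination into docker hub. A hedged Dockerfile sketch; the image tag and package names are assumptions for illustration:

```dockerfile
# Bumping Java means rebuilding the template with a different parent,
# e.g. in Dockerfile-debian.template:
#   FROM openjdk:8-jre   ->   FROM openjdk:11-jre

# Extra software (e.g. Python) can instead live in a user-owned
# Dockerfile that extends the published image:
FROM flink:1.10.0-scala_2.11
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 && \
    rm -rf /var/lib/apt/lists/*
```

Keeping Python out of the base image avoids the "hassle to remove/change it" problem mentioned above, at the cost of one extra build step for users who need it.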
> > I'm not sure all developers should write a custom docker file
> > themselves in this scenario. Thus, I still prefer to provide a script
> > for devs.
> > - I agree to keep the scope of this FLIP mostly for those normal
> > users. But as far as I can see, supporting building from a local dist
> > would not take much extra effort.
> > - The maven docker plugin sounds good. I'll take a look at it.
>
> I would see any scripts introduced in this FLIP also as potential
> building blocks for a custom dev Dockerfile.
> Maybe this will be all we need for dev images, or we write a dev
> Dockerfile, highly parametrised for building a dev image.
> If the scripts stay in apache/flink-docker, it is also somewhat
> inconvenient to use them in the main Flink repo, but possible.
> If we move them to apache/flink, then we will have to e.g. include them
> in the release to make them easily available in apache/flink-docker, and
> maintain them in the main repo although they are only docker specific.
> All in all, I would say that once we implement them, we can revisit this
> topic.
>
> Best,
> Andrey
>
> On Wed, Mar 11, 2020 at 8:58 AM Yangze Guo <karma...@gmail.com> wrote:
>
>> Thanks for the reply, Andrey.
>>
>> Regarding building from local dist:
>> - Yes, I bring this up mostly for development purposes. Since k8s is
>> popular, I believe more and more developers would like to test their
>> work on a k8s cluster. I'm not sure all developers should write a
>> custom docker file themselves in this scenario. Thus, I still prefer
>> to provide a script for devs.
>> - I agree to keep the scope of this FLIP mostly for those normal
>> users. But as far as I can see, supporting building from a local dist
>> would not take much extra effort.
>> - The maven docker plugin sounds good. I'll take a look at it.
>>
>> Regarding supporting JAVA 11:
>> - Not sure if it is necessary to ship JAVA. Maybe we could just change
>> the base image from openjdk:8-jre to openjdk:11-jre in the template
>> docker file[1].
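A dev-oriented script along the lines discussed could stage a locally built flink-dist into a Docker build context and then run the normal image build. A minimal sketch, where all paths, the script name, and the `flink:dev` tag are assumptions, and the actual `docker build` is left commented out so the staging logic can be tried standalone:

```shell
#!/bin/sh
# build_dev_image.sh (hypothetical): package a local Flink dist into a
# Docker build context instead of downloading a released tarball.
set -e

DIST_DIR=${DIST_DIR:-/tmp/demo-flink-dist}   # local build output (assumed)
CTX_DIR=${CTX_DIR:-/tmp/demo-docker-ctx}     # Docker build context (assumed)

# Stand-in for flink-dist/target/...: create a tiny fake dist for the demo.
mkdir -p "$DIST_DIR/bin" "$CTX_DIR"
printf '#!/bin/sh\n' > "$DIST_DIR/bin/flink"

# Archive the dist into the build context, where a Dockerfile could COPY it.
tar -C "$(dirname "$DIST_DIR")" -czf "$CTX_DIR/flink.tgz" "$(basename "$DIST_DIR")"
echo "staged: $(ls "$CTX_DIR")"

# docker build -t flink:dev "$CTX_DIR"   # the actual image build step
```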
>> Correct me if I understand incorrectly. Also, I agree to move this out
>> of the scope of this FLIP if it indeed takes much extra effort.
>>
>> Regarding the custom configuration, the mechanism that Thomas mentioned
>> LGTM.
>>
>> [1]
>> https://github.com/apache/flink-docker/blob/master/Dockerfile-debian.template
>>
>> Best,
>> Yangze Guo
>>
>> On Wed, Mar 11, 2020 at 5:52 AM Thomas Weise <t...@apache.org> wrote:
>> >
>> > Thanks for working on improvements to the Flink Docker container
>> > images. This will be important as more and more users are looking to
>> > adopt Kubernetes and other deployment tooling that relies on Docker
>> > images.
>> >
>> > A generic, dynamic configuration mechanism based on environment
>> > variables is essential and it is already supported via envsubst and
>> > an environment variable that can supply a configuration fragment:
>> >
>> > https://github.com/apache/flink-docker/blob/09adf2dcd99abfb6180e1e2b5b917b288e0c01f6/docker-entrypoint.sh#L88
>> > https://github.com/apache/flink-docker/blob/09adf2dcd99abfb6180e1e2b5b917b288e0c01f6/docker-entrypoint.sh#L85
>> >
>> > This gives the necessary control for infrastructure use cases that
>> > aim to supply deployment tooling to other users. An example in this
>> > category is the FlinkK8sOperator:
>> >
>> > https://github.com/lyft/flinkk8soperator/tree/master/examples/wordcount
>> >
>> > On the flip side, attempting to support a fixed subset of
>> > configuration options is brittle and will probably lead to
>> > compatibility issues down the road:
>> >
>> > https://github.com/apache/flink-docker/blob/09adf2dcd99abfb6180e1e2b5b917b288e0c01f6/docker-entrypoint.sh#L97
>> >
>> > Besides the configuration, it may be worthwhile to see in which other
>> > ways the base Docker images can provide more flexibility to
>> > incentivize wider adoption.
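The mechanism referenced above boils down to appending a user-supplied configuration fragment to flink-conf.yaml before starting the process. A simplified, hedged sketch of that idea; the real docker-entrypoint.sh differs in details, and the conf path here is an assumption:

```shell
#!/bin/sh
set -e

# Stand-in for $FLINK_HOME/conf/flink-conf.yaml (path is an assumption).
CONF=${CONF:-/tmp/demo-flink-conf.yaml}
printf 'jobmanager.rpc.address: localhost\n' > "$CONF"

# FLINK_PROPERTIES carries a multi-line configuration fragment, e.g.
# supplied via `docker run -e FLINK_PROPERTIES=...`.
FLINK_PROPERTIES='rest.port: 8081
taskmanager.numberOfTaskSlots: 4'

# Append the fragment to the default configuration.
if [ -n "$FLINK_PROPERTIES" ]; then
  echo "$FLINK_PROPERTIES" >> "$CONF"
fi
cat "$CONF"
```

The fragment itself can in turn be templated with env vars (the envsubst step Thomas points at), which is what makes this composable with Kubernetes Secrets and similar tooling.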
>> >
>> > I would second that it is desirable to support Java 11 and in general
>> > use a base image that allows the (straightforward) use of more recent
>> > versions of other software (Python etc.)
>> >
>> > https://github.com/apache/flink-docker/blob/d3416e720377e9b4c07a2d0f4591965264ac74c5/Dockerfile-debian.template#L19
>> >
>> > Thanks,
>> > Thomas
>> >
>> > On Tue, Mar 10, 2020 at 12:26 PM Andrey Zagrebin <azagre...@apache.org> wrote:
>> >>
>> >> Hi All,
>> >>
>> >> Thanks a lot for the feedback!
>> >>
>> >> *@Yangze Guo*
>> >>
>> >> > - Regarding the flink_docker_utils#install_flink function, I think
>> >> > it should also support building from a local dist and from a
>> >> > user-defined archive.
>> >>
>> >> I suppose you bring this up mostly for development purposes or for
>> >> power users.
>> >> Most normal users are usually interested in mainstream released
>> >> versions of Flink.
>> >> Although you bring up a valid concern, my idea was to keep the scope
>> >> of this FLIP mostly for those normal users.
>> >> The power users are usually capable of designing a completely custom
>> >> Dockerfile themselves.
>> >> At the moment, we already have custom Dockerfiles, e.g. for tests in
>> >> flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile.
>> >> We can add something similar for development purposes and maybe
>> >> introduce a special maven goal. There is a maven docker plugin, afaik.
>> >> I will add this to the FLIP as a next step.
>> >>
>> >> > - It seems that install_shaded_hadoop could be an option of
>> >> > install_flink
>> >>
>> >> I would rather think of this as a separate, independent, optional step.
>> >>
>> >> > - Should we support JAVA 11? Currently, most of the docker files
>> >> > are based on JAVA 8.
>> >>
>> >> Indeed, it is a valid concern. The Java version is a fundamental
>> >> property of the docker image.
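The `flink_docker_utils#install_flink` function under discussion is not specified in detail yet; the following is a purely hypothetical sketch of how it could accept either a released version or a local dist. The function name comes from the FLIP discussion, but the flags and internals are assumptions, with the real download/copy steps replaced by echoes:

```shell
#!/bin/sh
# Hypothetical flink_docker_utils-style helper (internals are assumptions).
install_flink() {
  case "$1" in
    --from-release)
      # A real implementation would download and unpack the release tarball.
      echo "install Flink release $2" ;;
    --from-local-dist)
      # A real implementation would copy the locally built flink-dist.
      echo "install local dist from $2" ;;
    *)
      echo "usage: install_flink --from-release <version> | --from-local-dist <path>" >&2
      return 1 ;;
  esac
}

install_flink --from-release 1.10.0
install_flink --from-local-dist /opt/flink-build
```

Supporting `--from-local-dist` is exactly the dev use case Yangze raises; keeping it behind a separate flag keeps the normal-user path simple.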
>> >> To customise this in the current mainstream image is difficult; it
>> >> would require shipping it without Java at all.
>> >> Or it is a separate discussion whether we want to distribute docker
>> >> hub images with different Java versions or just bump it to Java 11.
>> >> This should be easy in a custom Dockerfile for development purposes
>> >> though, as mentioned before.
>> >>
>> >> > - I do not understand how to set config options through
>> >> > "flink_docker_utils configure". Does this step happen during the
>> >> > image build or the container start? If it happens during the image
>> >> > build, there would be a new image every time we change the config.
>> >> > If it is just a part of the container entrypoint, I think there is
>> >> > no need to add a configure command; we could just add all dynamic
>> >> > config options to the args list of
>> >> > "start_jobmaster"/"start_session_jobmanager". Am I understanding
>> >> > this correctly?
>> >>
>> >> `flink_docker_utils configure ...` can be called anywhere:
>> >> - while building a custom image (`RUN flink_docker_utils configure ..`)
>> >> by extending our base image from docker hub (`FROM flink`)
>> >> - in a custom entry point as well
>> >> I will check this, but if the user can also pass a dynamic config
>> >> option, that also sounds like a good option.
>> >> Our standard entry point script in the base image could just properly
>> >> forward the arguments to the Flink process.
>> >>
>> >> @Yang Wang
>> >>
>> >> > About docker utils
>> >> > I really like the idea to provide some utils for the docker file
>> >> > and entry point. The `flink_docker_utils` will help to build the
>> >> > image more easily. I am not sure about `flink_docker_utils
>> >> > start_jobmaster`. Do you mean that when we build a docker image,
>> >> > we need to add `RUN flink_docker_utils start_jobmaster` in the
>> >> > docker file? Why do we need this?
>> >>
>> >> This is a scripted action to start the JM.
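Whichever way it is invoked (image build or entry point), a `configure` action would essentially upsert one key in flink-conf.yaml. A hedged sketch of such a helper; the command name comes from the FLIP, while the implementation and conf path are assumptions:

```shell
#!/bin/sh
set -e

# Stand-in conf file (the real one would be $FLINK_HOME/conf/flink-conf.yaml).
CONF=${CONF:-/tmp/demo-conf.yaml}

# Hypothetical `flink_docker_utils configure <key> <value>`: drop any
# existing entry for the key, then append the new value.
configure() {
  key=$1; value=$2
  if [ -f "$CONF" ]; then
    grep -v "^${key}:" "$CONF" > "$CONF.tmp" || true
    mv "$CONF.tmp" "$CONF"
  fi
  echo "${key}: ${value}" >> "$CONF"
}

printf 'rest.port: 8081\n' > "$CONF"
configure rest.port 9091                    # overrides the existing entry
configure taskmanager.numberOfTaskSlots 4   # appends a new entry
cat "$CONF"
```

Because the helper is idempotent per key, calling it from a `RUN` line at build time or from an entry point at start time produces the same resulting file, which is what makes the "can be called anywhere" claim work.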
>> >> It can be called anywhere.
>> >> Indeed, it does not make too much sense to run it in a Dockerfile.
>> >> Mostly, the idea was to use it in a custom entry point. When our
>> >> base docker hub image is started, its entry point can also be
>> >> completely overridden.
>> >> The actions are also sorted in the FLIP: for the Dockerfile or for
>> >> the entry point.
>> >> E.g. our standard entry point script in the base docker hub image
>> >> can already use it.
>> >> Anyway, it was just an example; the details are to be defined in
>> >> Jira, imo.
>> >>
>> >> > About docker entry point
>> >> > I agree with you that the docker entry point could be more
>> >> > powerful with more functionality.
>> >> > Mostly, it is about overriding the config options. If we support
>> >> > dynamic properties, I think it is more convenient for users,
>> >> > without any learning curve.
>> >> > `docker run flink session_jobmanager -D rest.bind-port=8081`
>> >>
>> >> Indeed, as mentioned before, it can be a better option.
>> >> The standard entry point also decides at least what to run, JM or
>> >> TM. I think we will see what else makes sense to include there
>> >> during the implementation.
>> >> Some specifics may be more convenient to set with env vars, as
>> >> Konstantin mentioned.
>> >>
>> >> > About the logging
>> >> > Updating the `log4j-console.properties` to support multiple
>> >> > appenders is a better option.
>> >> > Currently, the native K8s setup is suggesting users debug the
>> >> > logs in this way[1]. However, there are also some problems. The
>> >> > stderr and stdout of the JM/TM processes could not be forwarded
>> >> > to the docker container console.
>> >>
>> >> Strange, we should check; maybe there is a docker option to query
>> >> the container's stderr output as well.
>> >> If we forward the Flink process stdout as usual in a bash console,
>> >> it should not be a problem. Why can it not be forwarded?
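If the concern is that the JM/TM process writes its .out/.err files to disk while docker only sees the console, one entry-point-level workaround is to duplicate the stream with `tee`. A minimal sketch, assuming the process runs in the foreground; the log path and the stand-in process are assumptions:

```shell
#!/bin/sh
set -e

LOG_DIR=${LOG_DIR:-/tmp/demo-flink-log}
mkdir -p "$LOG_DIR"

# Stand-in for a foreground Flink process; a real entry point would run
# e.g. `jobmanager.sh start-foreground` here instead.
start_process() {
  echo "Starting JobManager"
  echo "some error output" >&2
}

# 2>&1 merges stderr into stdout; tee writes every line to the local
# .out file AND passes it through, so `docker logs` still sees it.
start_process 2>&1 | tee "$LOG_DIR/jobmanager.out"
```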
>> >>
>> >> @Konstantin Knauf
>> >>
>> >> > For the entrypoint, have you considered also allowing to set
>> >> > configuration via environment variables, as in "docker run -e
>> >> > FLINK_REST_BIN_PORT=8081 ..."? This is quite common and more
>> >> > flexible, e.g. it makes it very easy to pass values of Kubernetes
>> >> > Secrets into the Flink configuration.
>> >>
>> >> This is indeed an interesting option to pass arguments to the entry
>> >> point in general.
>> >> For the config options, the dynamic args can be a better option, as
>> >> mentioned above.
>> >>
>> >> > With respect to logging, I would opt to keep this very basic and
>> >> > to only support logging to the console (maybe with a fix for the
>> >> > web user interface). For everything else, users can easily build
>> >> > their own images based on library/flink (provide the dependencies,
>> >> > change the logging configuration).
>> >>
>> >> Agreed.
>> >>
>> >> Thanks,
>> >> Andrey
>> >>
>> >> On Sun, Mar 8, 2020 at 8:55 PM Konstantin Knauf <konstan...@ververica.com> wrote:
>> >>
>> >> > Hi Andrey,
>> >> >
>> >> > thanks a lot for this proposal. The variety of Docker files in the
>> >> > project has been causing quite some confusion.
>> >> >
>> >> > For the entrypoint, have you considered also allowing to set
>> >> > configuration via environment variables, as in "docker run -e
>> >> > FLINK_REST_BIN_PORT=8081 ..."? This is quite common and more
>> >> > flexible, e.g. it makes it very easy to pass values of Kubernetes
>> >> > Secrets into the Flink configuration.
>> >> >
>> >> > With respect to logging, I would opt to keep this very basic and
>> >> > to only support logging to the console (maybe with a fix for the
>> >> > web user interface). For everything else, users can easily build
>> >> > their own images based on library/flink (provide the dependencies,
>> >> > change the logging configuration).
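For the env-var approach, the entry point needs some convention for mapping variable names to config keys. One possible, entirely assumed convention is sketched below: strip a `FLINK_` prefix, lowercase, and turn underscores into dots. It is not what any Flink script actually implements, and the comment notes one inherent limitation:

```shell
#!/bin/sh
# Hypothetical mapping: FLINK_REST_PORT -> rest.port. The convention is an
# assumption; note it cannot express keys containing dashes or camelCase
# (e.g. rest.bind-port, taskmanager.numberOfTaskSlots), which is one
# reason a FLINK_PROPERTIES fragment or -D dynamic args may be preferable.
env_to_key() {
  echo "$1" | sed 's/^FLINK_//' | tr 'A-Z_' 'a-z.'
}

env_to_key FLINK_REST_PORT
```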
>> >> >
>> >> > Cheers,
>> >> >
>> >> > Konstantin
>> >> >
>> >> > On Thu, Mar 5, 2020 at 11:01 AM Yang Wang <danrtsey...@gmail.com> wrote:
>> >> >
>> >> >> Hi Andrey,
>> >> >>
>> >> >> Thanks for driving this significant FLIP. From the user ML, we
>> >> >> can also see there are many users running Flink in container
>> >> >> environments, so the docker image will be the very basic
>> >> >> requirement. Just as you say, we should provide a unified place
>> >> >> for all the various usages (e.g. session, job, native k8s,
>> >> >> swarm, etc.).
>> >> >>
>> >> >> > About docker utils
>> >> >>
>> >> >> I really like the idea to provide some utils for the docker file
>> >> >> and entry point. The `flink_docker_utils` will help to build the
>> >> >> image more easily. I am not sure about `flink_docker_utils
>> >> >> start_jobmaster`. Do you mean that when we build a docker image,
>> >> >> we need to add `RUN flink_docker_utils start_jobmaster` in the
>> >> >> docker file? Why do we need this?
>> >> >>
>> >> >> > About docker entry point
>> >> >>
>> >> >> I agree with you that the docker entry point could be more
>> >> >> powerful with more functionality.
>> >> >> Mostly, it is about overriding the config options. If we support
>> >> >> dynamic properties, I think it is more convenient for users,
>> >> >> without any learning curve.
>> >> >> `docker run flink session_jobmanager -D rest.bind-port=8081`
>> >> >>
>> >> >> > About the logging
>> >> >>
>> >> >> Updating the `log4j-console.properties` to support multiple
>> >> >> appenders is a better option.
>> >> >> Currently, the native K8s setup is suggesting users debug the
>> >> >> logs in this way[1]. However, there are also some problems. The
>> >> >> stderr and stdout of the JM/TM processes could not be forwarded
>> >> >> to the docker container console.
>> >> >>
>> >> >> [1].
>> >> >>
>> >> >> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html#log-files
>> >> >>
>> >> >> Best,
>> >> >> Yang
>> >> >>
>> >> >> Andrey Zagrebin <azagre...@apache.org> wrote on Wed, Mar 4, 2020 at 5:34 PM:
>> >> >>
>> >> >>> Hi All,
>> >> >>>
>> >> >>> If you have ever touched the docker topic in Flink, you probably
>> >> >>> noticed that we have multiple places in the docs and repos which
>> >> >>> address its various concerns.
>> >> >>>
>> >> >>> We have prepared a FLIP [1] to simplify the perception of the
>> >> >>> docker topic in Flink for users. It mostly advocates an approach
>> >> >>> of extending the official Flink image from the docker hub. For
>> >> >>> convenience, it can come with a set of bash utilities and
>> >> >>> documented examples of their usage. The utilities allow users to:
>> >> >>>
>> >> >>> - run the docker image in various modes (single job, session
>> >> >>> master, task manager, etc.)
>> >> >>> - customise the extending Dockerfile
>> >> >>> - and its entry point
>> >> >>>
>> >> >>> Eventually, the FLIP suggests removing all other user-facing
>> >> >>> Dockerfiles and building scripts from the Flink repo, moving all
>> >> >>> docker docs to apache/flink-docker, and adjusting existing
>> >> >>> docker use cases to refer to this new approach (mostly
>> >> >>> Kubernetes now).
>> >> >>>
>> >> >>> The first contributed version of the Flink docker integration
>> >> >>> also contained an example and docs for the integration with
>> >> >>> Bluemix in the IBM cloud. We also suggest maintaining it outside
>> >> >>> of the Flink repository (cc Markus Müller).
>> >> >>> >> >> >>> Thanks, >> >> >>> Andrey >> >> >>> >> >> >>> [1] >> >> >>> >> >> >>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-111%3A+Docker+image+unification >> >> >>> >> >> >> >> >> > >> >> > -- >> >> > >> >> > Konstantin Knauf | Head of Product >> >> > >> >> > +49 160 91394525 >> >> > >> >> > >> >> > Follow us @VervericaData Ververica <https://www.ververica.com/> >> >> > >> >> > >> >> > -- >> >> > >> >> > Join Flink Forward <https://flink-forward.org/> - The Apache Flink >> >> > Conference >> >> > >> >> > Stream Processing | Event Driven | Real Time >> >> > >> >> > -- >> >> > >> >> > Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany >> >> > >> >> > -- >> >> > Ververica GmbH >> >> > Registered at Amtsgericht Charlottenburg: HRB 158244 B >> >> > Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, >> Ji >> >> > (Tony) Cheng >> >> > >> >