Hi Andrey,

Thanks for your explanation.
> About the logging

What I mean is that we could not forward the stdout/stderr to local files
and the docker stdout at the same time by using log4j. For
jobmanager.log/taskmanager.log it works quite well, since we only need to
add a console appender in log4j.properties. I am just curious how to
forward the stdout/stderr to local files and the docker stdout at the same
time by using log4j :)

Best,
Yang

Andrey Zagrebin <azagre...@apache.org> wrote on Mon, Mar 16, 2020 at 4:58 PM:

> Thanks for the further feedback, Thomas and Yangze.
>
> > A generic, dynamic configuration mechanism based on environment
> > variables is essential and it is already supported via envsubst and an
> > environment variable that can supply a configuration fragment
>
> True, we already have this. As I understand it, this was introduced for
> the flexibility of templating a custom flink-conf.yaml with env vars:
> put it into FLINK_PROPERTIES and merge it with the default one.
> Could we achieve the same with dynamic properties (-Drpc.port=1234),
> passed as image args to run it, instead of FLINK_PROPERTIES?
> They could also be parametrised with env vars. This would require
> jobmanager.sh to properly propagate them to
> the StandaloneSessionClusterEntrypoint, though:
> https://github.com/docker-flink/docker-flink/pull/82#issuecomment-525285552
> cc @Till
> This would provide a unified configuration approach.
>
> > On the flip side, attempting to support a fixed subset of configuration
> > options is brittle and will probably lead to compatibility issues down
> > the road
>
> I agree with that. The idea was to have just some shortcut scripted
> functions to set options in flink-conf.yaml for a custom Dockerfile or
> entry point script.
> TASK_MANAGER_NUMBER_OF_TASK_SLOTS could be set as a dynamic property of
> the started JM.
> I am not sure how many users depend on it. Maybe we could remove it.
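One way to get the log4j logs into a local file and onto the docker console at the same time, as asked above, is to register two appenders on the root logger. A hedged sketch in log4j 1.x properties syntax (the `${log.file}` system property is an assumption borrowed from Flink's standard logging setup; patterns are illustrative):

```properties
# Root logger fans out to both appenders: 'console' reaches `docker logs`,
# 'file' keeps a local copy on disk.
log4j.rootLogger=INFO, console, file

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n

log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.file=${log.file}
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
```

Note this only covers what goes through log4j; the raw stdout/stderr of the JVM process would still need to be duplicated at the entry-point level (e.g. with `tee`).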
> It also looks like we already have a somewhat unclean state in
> docker-entrypoint.sh, where some ports are set to hardcoded values
> and then FLINK_PROPERTIES is applied, potentially duplicating options in
> the resulting flink-conf.yaml.
>
> I can see some potential usage of env vars as standard entry point args,
> but only for purposes which cannot be achieved by passing entry point
> args, like changing flink-conf.yaml options. Nothing comes to mind at
> the moment. It could be some setting specific to the running mode of the
> entry point. The mode itself can stay the first arg of the entry point.
>
> > I would second that it is desirable to support Java 11
>
> > Regarding supporting JAVA 11:
> > - Not sure if it is necessary to ship JAVA. Maybe we could just change
> > the base image from openjdk:8-jre to openjdk:11-jre in the template
> > docker file[1]. Correct me if I understand incorrectly. Also, I agree
> > to move this out of the scope of this FLIP if it indeed takes much
> > extra effort.
>
> This is what I meant by bumping up the Java version in the docker hub
> Flink image:
> FROM openjdk:8-jre -> FROM openjdk:11-jre
> This could be polled separately in the user mailing list.
>
> > and in general use a base image that allows the (straightforward) use
> > of more recent versions of other software (Python etc.)
>
> This can be polled as well: whether to always include some version of
> python in the docker hub image.
> A potential problem here is that once it is there, it is some hassle to
> remove/change it in a custom extended Dockerfile.
>
> It would also be nice to avoid maintaining images for various
> combinations of installed Java/Scala/Python in docker hub.
>
> > Regarding building from local dist:
> > - Yes, I bring this up mostly for development purposes. Since k8s is
> > popular, I believe more and more developers would like to test their
> > work on a k8s cluster.
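Both points above (a newer Java base and optional Python) illustrate the "extend the base image" approach rather than baking every combination into docker hub. A hedged Dockerfile sketch; the image tag and package names are assumptions for illustration:

```dockerfile
# Bumping Java means rebuilding the template with a different parent,
# e.g. in Dockerfile-debian.template:
#   FROM openjdk:8-jre   ->   FROM openjdk:11-jre

# Extra software (e.g. Python) can instead live in a user-owned
# Dockerfile that extends the published image:
FROM flink:1.10.0-scala_2.11
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 && \
    rm -rf /var/lib/apt/lists/*
```

Keeping Python out of the base image avoids the "hassle to remove/change it" problem mentioned above, at the cost of one extra build step for users who need it.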
> > I'm not sure all developers should write a custom docker file
> > themselves in this scenario. Thus, I still prefer to provide a script
> > for devs.
> > - I agree to keep the scope of this FLIP mostly for those normal
> > users. But as far as I can see, supporting building from a local dist
> > would not take much extra effort.
> > - The maven docker plugin sounds good. I'll take a look at it.
>
> I would see any scripts introduced in this FLIP also as potential
> building blocks for a custom dev Dockerfile.
> Maybe this will be all we need for dev images, or we write a dev
> Dockerfile, highly parametrised for building a dev image.
> If the scripts stay in apache/flink-docker, it is also somewhat
> inconvenient to use them in the main Flink repo, but possible.
> If we move them to apache/flink, then we will have to e.g. include them
> in the release to make them easily available in apache/flink-docker, and
> maintain them in the main repo although they are only docker specific.
> All in all, I would say that once we implement them, we can revisit this
> topic.
>
> Best,
> Andrey
>
> On Wed, Mar 11, 2020 at 8:58 AM Yangze Guo <karma...@gmail.com> wrote:
>
>> Thanks for the reply, Andrey.
>>
>> Regarding building from local dist:
>> - Yes, I bring this up mostly for development purposes. Since k8s is
>> popular, I believe more and more developers would like to test their
>> work on a k8s cluster. I'm not sure all developers should write a
>> custom docker file themselves in this scenario. Thus, I still prefer
>> to provide a script for devs.
>> - I agree to keep the scope of this FLIP mostly for those normal
>> users. But as far as I can see, supporting building from a local dist
>> would not take much extra effort.
>> - The maven docker plugin sounds good. I'll take a look at it.
>>
>> Regarding supporting JAVA 11:
>> - Not sure if it is necessary to ship JAVA. Maybe we could just change
>> the base image from openjdk:8-jre to openjdk:11-jre in the template
>> docker file[1].
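A dev-oriented script along the lines discussed could stage a locally built flink-dist into a Docker build context and then run the normal image build. A minimal sketch, where all paths, the script name, and the `flink:dev` tag are assumptions, and the actual `docker build` is left commented out so the staging logic can be tried standalone:

```shell
#!/bin/sh
# build_dev_image.sh (hypothetical): package a local Flink dist into a
# Docker build context instead of downloading a released tarball.
set -e

DIST_DIR=${DIST_DIR:-/tmp/demo-flink-dist}   # local build output (assumed)
CTX_DIR=${CTX_DIR:-/tmp/demo-docker-ctx}     # Docker build context (assumed)

# Stand-in for flink-dist/target/...: create a tiny fake dist for the demo.
mkdir -p "$DIST_DIR/bin" "$CTX_DIR"
printf '#!/bin/sh\n' > "$DIST_DIR/bin/flink"

# Archive the dist into the build context, where a Dockerfile could COPY it.
tar -C "$(dirname "$DIST_DIR")" -czf "$CTX_DIR/flink.tgz" "$(basename "$DIST_DIR")"
echo "staged: $(ls "$CTX_DIR")"

# docker build -t flink:dev "$CTX_DIR"   # the actual image build step
```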
>> Correct me if I understand incorrectly. Also, I agree to move this out
>> of the scope of this FLIP if it indeed takes much extra effort.
>>
>> Regarding the custom configuration, the mechanism that Thomas mentioned
>> LGTM.
>>
>> [1]
>> https://github.com/apache/flink-docker/blob/master/Dockerfile-debian.template
>>
>> Best,
>> Yangze Guo
>>
>> On Wed, Mar 11, 2020 at 5:52 AM Thomas Weise <t...@apache.org> wrote:
>> >
>> > Thanks for working on improvements to the Flink Docker container
>> > images. This will be important as more and more users are looking to
>> > adopt Kubernetes and other deployment tooling that relies on Docker
>> > images.
>> >
>> > A generic, dynamic configuration mechanism based on environment
>> > variables is essential and it is already supported via envsubst and
>> > an environment variable that can supply a configuration fragment:
>> >
>> > https://github.com/apache/flink-docker/blob/09adf2dcd99abfb6180e1e2b5b917b288e0c01f6/docker-entrypoint.sh#L88
>> > https://github.com/apache/flink-docker/blob/09adf2dcd99abfb6180e1e2b5b917b288e0c01f6/docker-entrypoint.sh#L85
>> >
>> > This gives the necessary control for infrastructure use cases that
>> > aim to supply deployment tooling to other users. An example in this
>> > category is the FlinkK8sOperator:
>> >
>> > https://github.com/lyft/flinkk8soperator/tree/master/examples/wordcount
>> >
>> > On the flip side, attempting to support a fixed subset of
>> > configuration options is brittle and will probably lead to
>> > compatibility issues down the road:
>> >
>> > https://github.com/apache/flink-docker/blob/09adf2dcd99abfb6180e1e2b5b917b288e0c01f6/docker-entrypoint.sh#L97
>> >
>> > Besides the configuration, it may be worthwhile to see in which other
>> > ways the base Docker images can provide more flexibility to
>> > incentivize wider adoption.
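The mechanism referenced above boils down to appending a user-supplied configuration fragment to flink-conf.yaml before starting the process. A simplified, hedged sketch of that idea; the real docker-entrypoint.sh differs in details, and the conf path here is an assumption:

```shell
#!/bin/sh
set -e

# Stand-in for $FLINK_HOME/conf/flink-conf.yaml (path is an assumption).
CONF=${CONF:-/tmp/demo-flink-conf.yaml}
printf 'jobmanager.rpc.address: localhost\n' > "$CONF"

# FLINK_PROPERTIES carries a multi-line configuration fragment, e.g.
# supplied via `docker run -e FLINK_PROPERTIES=...`.
FLINK_PROPERTIES='rest.port: 8081
taskmanager.numberOfTaskSlots: 4'

# Append the fragment to the default configuration.
if [ -n "$FLINK_PROPERTIES" ]; then
  echo "$FLINK_PROPERTIES" >> "$CONF"
fi
cat "$CONF"
```

The fragment itself can in turn be templated with env vars (the envsubst step Thomas points at), which is what makes this composable with Kubernetes Secrets and similar tooling.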
>> >
>> > I would second that it is desirable to support Java 11 and in general
>> > use a base image that allows the (straightforward) use of more recent
>> > versions of other software (Python etc.)
>> >
>> > https://github.com/apache/flink-docker/blob/d3416e720377e9b4c07a2d0f4591965264ac74c5/Dockerfile-debian.template#L19
>> >
>> > Thanks,
>> > Thomas
>> >
>> > On Tue, Mar 10, 2020 at 12:26 PM Andrey Zagrebin <azagre...@apache.org> wrote:
>> >>
>> >> Hi All,
>> >>
>> >> Thanks a lot for the feedback!
>> >>
>> >> *@Yangze Guo*
>> >>
>> >> > - Regarding the flink_docker_utils#install_flink function, I think
>> >> > it should also support building from a local dist and from a
>> >> > user-defined archive.
>> >>
>> >> I suppose you bring this up mostly for development purposes or for
>> >> power users.
>> >> Most normal users are usually interested in mainstream released
>> >> versions of Flink.
>> >> Although you bring up a valid concern, my idea was to keep the scope
>> >> of this FLIP mostly for those normal users.
>> >> The power users are usually capable of designing a completely custom
>> >> Dockerfile themselves.
>> >> At the moment, we already have custom Dockerfiles, e.g. for tests in
>> >> flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile.
>> >> We can add something similar for development purposes and maybe
>> >> introduce a special maven goal. There is a maven docker plugin, afaik.
>> >> I will add this to the FLIP as a next step.
>> >>
>> >> > - It seems that install_shaded_hadoop could be an option of
>> >> > install_flink
>> >>
>> >> I would rather think of this as a separate, independent, optional step.
>> >>
>> >> > - Should we support JAVA 11? Currently, most of the docker files
>> >> > are based on JAVA 8.
>> >>
>> >> Indeed, it is a valid concern. The Java version is a fundamental
>> >> property of the docker image.
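The `flink_docker_utils#install_flink` function under discussion is not specified in detail yet; the following is a purely hypothetical sketch of how it could accept either a released version or a local dist. The function name comes from the FLIP discussion, but the flags and internals are assumptions, with the real download/copy steps replaced by echoes:

```shell
#!/bin/sh
# Hypothetical flink_docker_utils-style helper (internals are assumptions).
install_flink() {
  case "$1" in
    --from-release)
      # A real implementation would download and unpack the release tarball.
      echo "install Flink release $2" ;;
    --from-local-dist)
      # A real implementation would copy the locally built flink-dist.
      echo "install local dist from $2" ;;
    *)
      echo "usage: install_flink --from-release <version> | --from-local-dist <path>" >&2
      return 1 ;;
  esac
}

install_flink --from-release 1.10.0
install_flink --from-local-dist /opt/flink-build
```

Supporting `--from-local-dist` is exactly the dev use case Yangze raises; keeping it behind a separate flag keeps the normal-user path simple.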
>> >> To customise this in the current mainstream image is difficult; it
>> >> would require shipping it without Java at all.
>> >> Or it is a separate discussion whether we want to distribute docker
>> >> hub images with different Java versions or just bump it to Java 11.
>> >> This should be easy in a custom Dockerfile for development purposes
>> >> though, as mentioned before.
>> >>
>> >> > - I do not understand how to set config options through
>> >> > "flink_docker_utils configure". Does this step happen during the
>> >> > image build or the container start? If it happens during the image
>> >> > build, there would be a new image every time we change the config.
>> >> > If it is just a part of the container entrypoint, I think there is
>> >> > no need to add a configure command; we could just add all dynamic
>> >> > config options to the args list of
>> >> > "start_jobmaster"/"start_session_jobmanager". Am I understanding
>> >> > this correctly?
>> >>
>> >> `flink_docker_utils configure ...` can be called anywhere:
>> >> - while building a custom image (`RUN flink_docker_utils configure ..`)
>> >> by extending our base image from docker hub (`FROM flink`)
>> >> - in a custom entry point as well
>> >> I will check this, but if the user can also pass a dynamic config
>> >> option, that also sounds like a good option.
>> >> Our standard entry point script in the base image could just properly
>> >> forward the arguments to the Flink process.
>> >>
>> >> @Yang Wang
>> >>
>> >> > About docker utils
>> >> > I really like the idea to provide some utils for the docker file
>> >> > and entry point. The `flink_docker_utils` will help to build the
>> >> > image more easily. I am not sure about `flink_docker_utils
>> >> > start_jobmaster`. Do you mean that when we build a docker image,
>> >> > we need to add `RUN flink_docker_utils start_jobmaster` in the
>> >> > docker file? Why do we need this?
>> >>
>> >> This is a scripted action to start the JM.
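Whichever way it is invoked (image build or entry point), a `configure` action would essentially upsert one key in flink-conf.yaml. A hedged sketch of such a helper; the command name comes from the FLIP, while the implementation and conf path are assumptions:

```shell
#!/bin/sh
set -e

# Stand-in conf file (the real one would be $FLINK_HOME/conf/flink-conf.yaml).
CONF=${CONF:-/tmp/demo-conf.yaml}

# Hypothetical `flink_docker_utils configure <key> <value>`: drop any
# existing entry for the key, then append the new value.
configure() {
  key=$1; value=$2
  if [ -f "$CONF" ]; then
    grep -v "^${key}:" "$CONF" > "$CONF.tmp" || true
    mv "$CONF.tmp" "$CONF"
  fi
  echo "${key}: ${value}" >> "$CONF"
}

printf 'rest.port: 8081\n' > "$CONF"
configure rest.port 9091                    # overrides the existing entry
configure taskmanager.numberOfTaskSlots 4   # appends a new entry
cat "$CONF"
```

Because the helper is idempotent per key, calling it from a `RUN` line at build time or from an entry point at start time produces the same resulting file, which is what makes the "can be called anywhere" claim work.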
>> >> It can be called anywhere.
>> >> Indeed, it does not make too much sense to run it in a Dockerfile.
>> >> Mostly, the idea was to use it in a custom entry point. When our
>> >> base docker hub image is started, its entry point can also be
>> >> completely overridden.
>> >> The actions are also sorted in the FLIP: for the Dockerfile or for
>> >> the entry point.
>> >> E.g. our standard entry point script in the base docker hub image
>> >> can already use it.
>> >> Anyway, it was just an example; the details are to be defined in
>> >> Jira, imo.
>> >>
>> >> > About docker entry point
>> >> > I agree with you that the docker entry point could be more
>> >> > powerful with more functionality.
>> >> > Mostly, it is about overriding the config options. If we support
>> >> > dynamic properties, I think it is more convenient for users,
>> >> > without any learning curve.
>> >> > `docker run flink session_jobmanager -D rest.bind-port=8081`
>> >>
>> >> Indeed, as mentioned before, it can be a better option.
>> >> The standard entry point also decides at least what to run, JM or
>> >> TM. I think we will see what else makes sense to include there
>> >> during the implementation.
>> >> Some specifics may be more convenient to set with env vars, as
>> >> Konstantin mentioned.
>> >>
>> >> > About the logging
>> >> > Updating the `log4j-console.properties` to support multiple
>> >> > appenders is a better option.
>> >> > Currently, the native K8s setup is suggesting users debug the
>> >> > logs in this way[1]. However, there are also some problems. The
>> >> > stderr and stdout of the JM/TM processes could not be forwarded
>> >> > to the docker container console.
>> >>
>> >> Strange, we should check; maybe there is a docker option to query
>> >> the container's stderr output as well.
>> >> If we forward the Flink process stdout as usual in a bash console,
>> >> it should not be a problem. Why can it not be forwarded?
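If the concern is that the JM/TM process writes its .out/.err files to disk while docker only sees the console, one entry-point-level workaround is to duplicate the stream with `tee`. A minimal sketch, assuming the process runs in the foreground; the log path and the stand-in process are assumptions:

```shell
#!/bin/sh
set -e

LOG_DIR=${LOG_DIR:-/tmp/demo-flink-log}
mkdir -p "$LOG_DIR"

# Stand-in for a foreground Flink process; a real entry point would run
# e.g. `jobmanager.sh start-foreground` here instead.
start_process() {
  echo "Starting JobManager"
  echo "some error output" >&2
}

# 2>&1 merges stderr into stdout; tee writes every line to the local
# .out file AND passes it through, so `docker logs` still sees it.
start_process 2>&1 | tee "$LOG_DIR/jobmanager.out"
```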
>> >>
>> >> @Konstantin Knauf
>> >>
>> >> > For the entrypoint, have you considered also allowing to set
>> >> > configuration via environment variables, as in "docker run -e
>> >> > FLINK_REST_BIN_PORT=8081 ..."? This is quite common and more
>> >> > flexible, e.g. it makes it very easy to pass values of Kubernetes
>> >> > Secrets into the Flink configuration.
>> >>
>> >> This is indeed an interesting option to pass arguments to the entry
>> >> point in general.
>> >> For the config options, the dynamic args can be a better option, as
>> >> mentioned above.
>> >>
>> >> > With respect to logging, I would opt to keep this very basic and
>> >> > to only support logging to the console (maybe with a fix for the
>> >> > web user interface). For everything else, users can easily build
>> >> > their own images based on library/flink (provide the dependencies,
>> >> > change the logging configuration).
>> >>
>> >> Agreed.
>> >>
>> >> Thanks,
>> >> Andrey
>> >>
>> >> On Sun, Mar 8, 2020 at 8:55 PM Konstantin Knauf <konstan...@ververica.com> wrote:
>> >>
>> >> > Hi Andrey,
>> >> >
>> >> > thanks a lot for this proposal. The variety of Docker files in the
>> >> > project has been causing quite some confusion.
>> >> >
>> >> > For the entrypoint, have you considered also allowing to set
>> >> > configuration via environment variables, as in "docker run -e
>> >> > FLINK_REST_BIN_PORT=8081 ..."? This is quite common and more
>> >> > flexible, e.g. it makes it very easy to pass values of Kubernetes
>> >> > Secrets into the Flink configuration.
>> >> >
>> >> > With respect to logging, I would opt to keep this very basic and
>> >> > to only support logging to the console (maybe with a fix for the
>> >> > web user interface). For everything else, users can easily build
>> >> > their own images based on library/flink (provide the dependencies,
>> >> > change the logging configuration).
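For the env-var approach, the entry point needs some convention for mapping variable names to config keys. One possible, entirely assumed convention is sketched below: strip a `FLINK_` prefix, lowercase, and turn underscores into dots. It is not what any Flink script actually implements, and the comment notes one inherent limitation:

```shell
#!/bin/sh
# Hypothetical mapping: FLINK_REST_PORT -> rest.port. The convention is an
# assumption; note it cannot express keys containing dashes or camelCase
# (e.g. rest.bind-port, taskmanager.numberOfTaskSlots), which is one
# reason a FLINK_PROPERTIES fragment or -D dynamic args may be preferable.
env_to_key() {
  echo "$1" | sed 's/^FLINK_//' | tr 'A-Z_' 'a-z.'
}

env_to_key FLINK_REST_PORT
```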
>> >> >
>> >> > Cheers,
>> >> >
>> >> > Konstantin
>> >> >
>> >> > On Thu, Mar 5, 2020 at 11:01 AM Yang Wang <danrtsey...@gmail.com> wrote:
>> >> >
>> >> >> Hi Andrey,
>> >> >>
>> >> >> Thanks for driving this significant FLIP. From the user ML, we
>> >> >> can also see there are many users running Flink in container
>> >> >> environments, so the docker image will be the very basic
>> >> >> requirement. Just as you say, we should provide a unified place
>> >> >> for all the various usages (e.g. session, job, native k8s,
>> >> >> swarm, etc.).
>> >> >>
>> >> >> > About docker utils
>> >> >>
>> >> >> I really like the idea to provide some utils for the docker file
>> >> >> and entry point. The `flink_docker_utils` will help to build the
>> >> >> image more easily. I am not sure about `flink_docker_utils
>> >> >> start_jobmaster`. Do you mean that when we build a docker image,
>> >> >> we need to add `RUN flink_docker_utils start_jobmaster` in the
>> >> >> docker file? Why do we need this?
>> >> >>
>> >> >> > About docker entry point
>> >> >>
>> >> >> I agree with you that the docker entry point could be more
>> >> >> powerful with more functionality.
>> >> >> Mostly, it is about overriding the config options. If we support
>> >> >> dynamic properties, I think it is more convenient for users,
>> >> >> without any learning curve.
>> >> >> `docker run flink session_jobmanager -D rest.bind-port=8081`
>> >> >>
>> >> >> > About the logging
>> >> >>
>> >> >> Updating the `log4j-console.properties` to support multiple
>> >> >> appenders is a better option.
>> >> >> Currently, the native K8s setup is suggesting users debug the
>> >> >> logs in this way[1]. However, there are also some problems. The
>> >> >> stderr and stdout of the JM/TM processes could not be forwarded
>> >> >> to the docker container console.
>> >> >>
>> >> >> [1].
>> >> >>
>> >> >> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html#log-files
>> >> >>
>> >> >> Best,
>> >> >> Yang
>> >> >>
>> >> >> Andrey Zagrebin <azagre...@apache.org> wrote on Wed, Mar 4, 2020 at 5:34 PM:
>> >> >>
>> >> >>> Hi All,
>> >> >>>
>> >> >>> If you have ever touched the docker topic in Flink, you probably
>> >> >>> noticed that we have multiple places in the docs and repos which
>> >> >>> address its various concerns.
>> >> >>>
>> >> >>> We have prepared a FLIP [1] to simplify the perception of the
>> >> >>> docker topic in Flink for users. It mostly advocates an approach
>> >> >>> of extending the official Flink image from the docker hub. For
>> >> >>> convenience, it can come with a set of bash utilities and
>> >> >>> documented examples of their usage. The utilities allow users to:
>> >> >>>
>> >> >>> - run the docker image in various modes (single job, session
>> >> >>> master, task manager, etc.)
>> >> >>> - customise the extending Dockerfile
>> >> >>> - and its entry point
>> >> >>>
>> >> >>> Eventually, the FLIP suggests removing all other user-facing
>> >> >>> Dockerfiles and building scripts from the Flink repo, moving all
>> >> >>> docker docs to apache/flink-docker, and adjusting existing
>> >> >>> docker use cases to refer to this new approach (mostly
>> >> >>> Kubernetes now).
>> >> >>>
>> >> >>> The first contributed version of the Flink docker integration
>> >> >>> also contained an example and docs for the integration with
>> >> >>> Bluemix in the IBM cloud. We also suggest maintaining it outside
>> >> >>> of the Flink repository (cc Markus Müller).
>> >> >>> >> >> >>> Thanks, >> >> >>> Andrey >> >> >>> >> >> >>> [1] >> >> >>> >> >> >>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-111%3A+Docker+image+unification >> >> >>> >> >> >> >> >> > >> >> > -- >> >> > >> >> > Konstantin Knauf | Head of Product >> >> > >> >> > +49 160 91394525 >> >> > >> >> > >> >> > Follow us @VervericaData Ververica <https://www.ververica.com/> >> >> > >> >> > >> >> > -- >> >> > >> >> > Join Flink Forward <https://flink-forward.org/> - The Apache Flink >> >> > Conference >> >> > >> >> > Stream Processing | Event Driven | Real Time >> >> > >> >> > -- >> >> > >> >> > Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany >> >> > >> >> > -- >> >> > Ververica GmbH >> >> > Registered at Amtsgericht Charlottenburg: HRB 158244 B >> >> > Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, >> Ji >> >> > (Tony) Cheng >> >> > >> >