zentol edited a comment on pull request #14630: URL: https://github.com/apache/flink/pull/14630#issuecomment-762796099
> How much of that is actually specific to using Flink in Docker[...]?

As it stands, I'd say everything. Let's go through it, shall we:

1. Setting of certain options:
   * jobmanager.rpc.address: Outside of Docker this must be set manually depending on the way you deploy Flink, but we can make use of knowledge about Docker's networking to simplify the setup.
   * *.port: Outside of Docker these are determined randomly by default, but we set them to static values here since we know that with Docker they cannot conflict with others, and it simplifies the network rules setup for users.
   * taskmanager.numberOfTaskSlots: This is essentially legacy behavior that we inherited. Outside of Docker this is set manually by users based on their resources/requirements.
2. Copying plugins
   * It was recommended to us that any modification that is usually done manually should also be possible via environment variables. This is one such example. There are no plans to move this upstream.
3. FLINK_PROPERTIES
   * Similarly to copying plugins, this was added as a convenience for modifying the configuration. Outside of Docker this file is modified manually, and there is little benefit in making this generally available.
4. envsubst stuff
   * Again some inherited stuff we can't throw out as of yet. There is a current discussion on the Flink dev mailing list to support this in Flink itself.
5. jemalloc stuff
   * This switch was introduced due to a problem that is specific to the distribution used by the Docker image. While it is something that is potentially useful in other cases, we have no interest in accommodating all possible platforms in such detail.
6. drop_privs_cmd
   * Relies on the existence of a user account that is set up in the Dockerfile.
7. Wrapping of commands
   * Primarily exists to set the `start-foreground` flag, as by default Flink processes run in the background.
     This is easier to do if there is some abstraction layer in between the user and Flink; in your model we'd have to inject a new parameter into the script arguments.

There are some things we can certainly simplify, like deduplicating the copy_plugins calls, or controlling jemalloc via an environment variable (that is in fact already implemented and just needs to be merged).

The idea of having users call scripts directly is generally a good one, but it does bear the risk of users relying on functionality that needs some Docker-specific logic that we have yet to set up. Ultimately, our intent neither is nor ever was to provide an image that allows everything in the distribution to be used (it handles way too many things for this to make sense imo), but only to ease the setup of singular Flink processes. Maybe this is where some of the dissonance comes from.

Nevertheless I'd be interested in trying this out and checking in with others on what they think about it.

> This isn't as much a matter of "acceptable" vs "unacceptable".

I'd beg to differ, given that https://github.com/docker-library/official-images/pull/9249 has now been sitting around for over a month, forcing us to set up a secondary Docker image distribution channel. Instead, such cleanups could've simply been relegated to the next version.
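As a rough illustration of points 1 and 3 in the comment above, the sketch below shows how an entrypoint might pin a few options to Docker-friendly values and then append whatever is passed in FLINK_PROPERTIES. This is not the actual docker-entrypoint.sh logic. The function names, the `JOB_MANAGER_RPC_ADDRESS` variable, the `jobmanager` default host, and the port value are all assumptions made for the example.

```python
def set_config_option(conf_text: str, key: str, value: str) -> str:
    """Replace an existing `key: value` line, or append one if the key is absent."""
    lines = conf_text.splitlines()
    new_line = f"{key}: {value}"
    for i, line in enumerate(lines):
        # Compare only the part before the first colon; commented-out
        # lines (starting with '#') will not match and are left alone.
        if line.split(":", 1)[0].strip() == key:
            lines[i] = new_line
            break
    else:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


def render_config(conf_text: str, env: dict) -> str:
    """Apply container-specific defaults, then append FLINK_PROPERTIES verbatim."""
    # Assumption: inside a container network the JobManager is reachable
    # under a fixed hostname, so the address does not need manual setup.
    address = env.get("JOB_MANAGER_RPC_ADDRESS", "jobmanager")
    conf_text = set_config_option(conf_text, "jobmanager.rpc.address", address)

    # Assumption: a static port is safe because nothing else runs in the
    # container; the concrete value here is illustrative only.
    conf_text = set_config_option(conf_text, "blob.server.port", "6124")

    # Convenience hook: extra `key: value` lines supplied by the user.
    extra = env.get("FLINK_PROPERTIES", "")
    if extra:
        conf_text += extra.rstrip("\n") + "\n"
    return conf_text


if __name__ == "__main__":
    base = "jobmanager.rpc.address: localhost\n"
    print(render_config(base, {"FLINK_PROPERTIES": "taskmanager.numberOfTaskSlots: 4"}))
```

Outside of a container none of these assumptions hold (the address depends on the deployment and ports may clash), which is the sense in which such logic is Docker-specific.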