zentol edited a comment on pull request #14630:
URL: https://github.com/apache/flink/pull/14630#issuecomment-762796099


   > How much of that is actually specific to using Flink in Docker[...]?
   
   As it stands, I'd say everything. Let's go through it, shall we:
   
   1. Setting of certain options:
   * jobmanager.rpc.address: Outside of docker this must be set manually 
depending on the way you deploy Flink, but we can make use of knowledge about 
docker's networking to simplify the setup.
   * *.port: Outside of docker these are determined randomly by default, but we 
set them to static values here since we know that with docker they cannot 
conflict with others, and it simplifies the network rules setup for users.
   * taskmanager.numberOfTaskSlots: This is essentially legacy behavior that we 
inherited. Outside of docker this is set manually by users based on their 
resources/requirements.
   2. copying plugins
   * It was recommended to us to allow any modifications that are usually done 
manually be made possible via environment variables. This is one such example. 
There are no plans to move this upstream.
   3. FLINK_PROPERTIES
   * Similarly to copying plugins, this was added as a convenience for 
modifying the configuration. Outside of docker this file is modified manually, 
and there is little benefit making this generally available.
   4. envsubst stuff: Again some inherited stuff we can't throw out as of yet. 
There is a current discussion on the Flink dev mailing list to support this in 
Flink itself.
   5. jemalloc stuff
   * This switch was introduced due to a problem that is specific to the 
distribution used by the docker image. While it is something that is 
potentially useful in other cases, we have no interest in accommodating all 
possible platforms in such detail.
   6. drop_privs_cmd
   * Relies on the existence of a user account that is set up in the Dockerfile.
   7. wrapping of commands
   * Primarily exists to set the `start-foreground` flag, as by default Flink 
processes run in the background. This is easier to do if there is some 
abstraction layer in between the user and Flink; in your model we'd have to 
inject a new parameter into the script arguments.
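
   To make items 1, 3 and 7 more concrete, here is a rough Python sketch of 
that kind of logic. It is not the actual entrypoint script: the function names, 
the `JOB_MANAGER_RPC_ADDRESS` variable, the `bin/<role>.sh` layout, the example 
port value, appending `FLINK_PROPERTIES` verbatim, and the pass-through for 
other commands are all assumptions made for illustration.

```python
# Illustrative sketch only: names, paths and values below are assumptions,
# not the logic of the real entrypoint script.

def set_config_option(lines, key, value):
    """Overwrite an existing `key: value` entry, or append a new one."""
    entry = f"{key}: {value}"
    for i, line in enumerate(lines):
        if line.split(":", 1)[0].strip() == key:
            lines[i] = entry
            return
    lines.append(entry)


def prepare_configuration(conf_text, env):
    """Return the config text after applying docker-specific settings."""
    lines = conf_text.splitlines()
    # Item 1: the address comes from what we know about the docker network.
    if "JOB_MANAGER_RPC_ADDRESS" in env:
        set_config_option(
            lines, "jobmanager.rpc.address", env["JOB_MANAGER_RPC_ADDRESS"]
        )
    # Item 1: a static port instead of a random one (the value is an example).
    set_config_option(lines, "blob.server.port", 6124)
    # Item 3: FLINK_PROPERTIES is assumed to be appended as-is.
    if env.get("FLINK_PROPERTIES"):
        lines.extend(env["FLINK_PROPERTIES"].splitlines())
    return "\n".join(lines) + "\n"


def wrap_command(args):
    """Item 7: inject `start-foreground` so the process is not backgrounded."""
    if args and args[0] in ("jobmanager", "taskmanager"):
        return [f"bin/{args[0]}.sh", "start-foreground", *args[1:]]
    return list(args)  # assumed: anything else is passed through unchanged
```

   With this layer in between, a user only passes `taskmanager` plus their own 
arguments, and the wrapper decides where `start-foreground` goes. Calling the 
scripts directly would push that decision onto the user.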
   
   There are some things we can certainly simplify (like deduplicating the 
copy_plugins calls, or jemalloc being controlled by an environment variable 
(that is in fact already implemented and just needs to be merged)).
   The idea to have users call scripts directly is generally a good one, but it 
does bear the risk of users using functionality that needs some docker-specific 
logic that we have yet to set up. Ultimately, our intent neither is nor ever 
was to provide an image that allows everything in the distribution to be 
used (it handles way too many things for this to make sense imo), but to only 
ease the setup of singular Flink processes. Maybe this is where some of the 
dissonance comes from.
   Nevertheless I'd be interested in trying this out and checking in with 
others on what they think about it.
   
   > This isn't as much a matter of "acceptable" vs "unacceptable".
   
   I'd beg to differ, given that 
https://github.com/docker-library/official-images/pull/9249 has now been 
sitting around for over a month, forcing us to set up a secondary docker image 
distribution channel. Instead, such cleanups could've simply been relegated to 
the next version.



