I am currently getting Flink 1.5 running on a Mesos cluster using Docker.
I have come across a few improvements that I think would help with this configuration (and that will probably also apply to future containerized deployments, such as Kubernetes). I have created two issues to track them: https://issues.apache.org/jira/browse/FLINK-9611 and https://issues.apache.org/jira/browse/FLINK-9612.

A quick summary:

FLINK-9611 - Allow for a configuration option to add user-defined artifacts to be downloaded into the container. This is useful when you want to add credentials to pull a private Docker image (but it probably has many other use cases). While this could easily be done via config, it *might* allow for better extensibility to dynamically load a user-defined overlay class that could tweak the container specification as needed.

FLINK-9612 - Add an option to disable pulling most of the FlinkDistributionOverlay. Currently, if you deploy many TaskManagers from a pre-built Docker image that already contains a Flink distribution, every container re-downloads all the dependencies, which is very wasteful. This can flood the MesosArtifactServer with requests, and it does not take many nodes deploying at once before some downloads fail.

I am willing to implement these two features, but would be interested in getting some feedback first.

Some questions:

- Would a limited (but simple) property like `mesos.resourcemanager.tasks.uris`, holding a comma-separated list of URIs, be preferable to a more powerful (but more complex) `mesos.resourcemanager.tasks.user-overlay` property that, when defined, would use a classloader to dynamically add another overlay?
- Are there any files generated by Flink that would always need to be downloaded as an artifact into the container? As best I can tell, that isn't the case, at least in the `FlinkDistributionOverlay`.
- Are there any other overlay layers that are redundant in a container deployment using pre-built Docker images?

Thanks for your feedback!
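
P.S. To make the first question concrete, here is a minimal sketch of how the simple `mesos.resourcemanager.tasks.uris` option might be parsed. The class name `TaskUriParser` and the example URIs are hypothetical, and none of this is existing Flink code. It only shows splitting the property value into URIs that could then be added to the container specification.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Flink. */
final class TaskUriParser {

    /** Splits a comma-separated property value into URIs, skipping blank entries. */
    static List<URI> parse(String value) {
        List<URI> uris = new ArrayList<>();
        if (value == null) {
            return uris; // option not set: nothing extra to download
        }
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas and whitespace
            }
            try {
                uris.add(new URI(trimmed));
            } catch (URISyntaxException e) {
                // Fail fast so a typo in the config is caught at startup.
                throw new IllegalArgumentException(
                        "Invalid URI in mesos.resourcemanager.tasks.uris: " + trimmed, e);
            }
        }
        return uris;
    }

    public static void main(String[] args) {
        // Example value only; these locations are made up.
        for (URI uri : parse("file:///etc/docker/docker.tar.gz, hdfs:///flink/extra.jar")) {
            System.out.println(uri);
        }
    }
}
```

The appeal of this variant is that it needs no classloading and is easy to validate up front. The trade-off is that it can only add downloads and cannot otherwise change the container specification, which is what the `user-overlay` variant would allow.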