GitHub user EronWright commented on the pull request:

https://github.com/apache/flink/pull/1978#issuecomment-222285524

This PR dovetails nicely with the Mesos work and I'll be sure to build on it. Here are a few suggestions to align it even further.

The problem of _managing_ a Flink cluster is mostly independent from _using_ a cluster to submit and manage jobs. I would like to see the two concerns cleanly separated. In this PR, the `ClusterDescriptor` handles creating the cluster, then produces a `Client` with which to manage jobs and to handle shutdown. I suggest introducing a new component, the `YarnDispatcher`, to handle all lifecycle operations for a cluster. The `ClusterDescriptor` would then become an entity class that is given to the dispatcher.

A related issue is that it's only possible to use the `YarnClusterClient` to interact with a newly created YARN session, not a pre-existing one. When submitting a job to an existing YARN session, it seems the `StandaloneClusterClient` is used (by supplying a JM endpoint). Is that true? Eventually the CLI should provide a nice way to discover and use existing YARN sessions.

The `detached` flags could use clarification. In the `Client` context, the detached concept seems related to interactivity with the job (tailing the status messages, etc.). I don't think it should imply anything about the lifecycle of the cluster; leave that to the dispatcher. The `stopAfterJob` method should move to the dispatcher accordingly.

This relates to Mesos as follows: the `MesosDispatcher` component will run in the Mesos cluster and be accessed remotely by the CLI. The `ClusterDescriptor` will be passed to it via REST. Everything will fit nicely. :)
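To make the proposed split concrete, here is a minimal sketch, not code from the PR. Every type and signature below (`Dispatcher`, `InMemoryDispatcher`, the fields of `ClusterDescriptor`, the `submit` method, the endpoint format) is a hypothetical stand-in chosen for illustration; the real Flink classes differ. The dispatcher owns the cluster lifecycle, the descriptor is plain data, and the client only talks to a JobManager endpoint.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical entity class: describes the desired cluster, holds no behavior.
final class ClusterDescriptor {
    final String name;
    final int taskManagers;

    ClusterDescriptor(String name, int taskManagers) {
        this.name = name;
        this.taskManagers = taskManagers;
    }
}

// Hypothetical job-facing client: knows only a JobManager endpoint.
final class ClusterClient {
    final String jobManagerEndpoint;

    ClusterClient(String jobManagerEndpoint) {
        this.jobManagerEndpoint = jobManagerEndpoint;
    }

    // "detached" affects only how the caller follows the job,
    // never the lifecycle of the cluster itself.
    String submit(String jobName, boolean detached) {
        return jobName + "@" + jobManagerEndpoint
                + (detached ? " (detached)" : " (attached)");
    }
}

// Hypothetical lifecycle component: start, rediscover, and shut down clusters.
interface Dispatcher {
    String start(ClusterDescriptor descriptor);        // returns a cluster id
    Optional<ClusterClient> connect(String clusterId); // works for existing sessions too
    boolean shutdown(String clusterId);
}

// In-memory stand-in for a YARN- or Mesos-backed dispatcher.
final class InMemoryDispatcher implements Dispatcher {
    private final Map<String, String> endpoints = new HashMap<>();
    private int nextId = 0;

    @Override
    public String start(ClusterDescriptor descriptor) {
        String id = descriptor.name + "-" + (nextId++);
        endpoints.put(id, "jm-" + id + ":6123"); // illustrative endpoint only
        return id;
    }

    @Override
    public Optional<ClusterClient> connect(String clusterId) {
        return Optional.ofNullable(endpoints.get(clusterId)).map(ClusterClient::new);
    }

    @Override
    public boolean shutdown(String clusterId) {
        return endpoints.remove(clusterId) != null;
    }
}

final class DispatcherSketch {
    public static void main(String[] args) {
        Dispatcher dispatcher = new InMemoryDispatcher();
        String id = dispatcher.start(new ClusterDescriptor("session", 2));
        ClusterClient client = dispatcher.connect(id)
                .orElseThrow(IllegalStateException::new);
        System.out.println(client.submit("wordcount", true));
        dispatcher.shutdown(id); // lifecycle stays with the dispatcher
    }
}
```

In this shape, connecting to a pre-existing session and connecting to a freshly started one go through the same `connect` call, and a `stopAfterJob`-style operation would naturally be added to the dispatcher interface rather than to the client.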