The above applies to Mesos/DCOS as well. So if someone could also share insights into automatic job deployment in that setup, that would be very useful. Thanks. M
On Fri, Dec 22, 2017 at 6:56 PM, Maximilian Bode <maximilian.b...@tngtech.com> wrote:
> Hi everyone,
>
> We are beginning to run Flink on K8s and found the basic templates [1] as
> well as the example Helm chart [2] very helpful. Also the discussion about
> JobManager HA [3] and Patrick's talk [4] were very interesting. All in all,
> it is delightful how easily everything can be set up and how well it works
> out of the box.
>
> Now we are looking for some best practices as far as job submission is
> concerned. Having played with a few alternative options, we would like to
> get some input on what other people are using. What we have looked into so
> far:
>
> 1. Packaging the job jar into e.g. the JM image and submitting manually
> (either from the UI or via `kubectl exec`). Ideally, we would like to
> establish a more automated setup, preferably using native Kubernetes
> objects.
> 2. Building a separate image whose responsibility it is to submit the job
> and keep it running. This could either use the REST API [5] or share the
> Flink config so that CLI calls connect to the existing cluster. When
> scheduling this as a Kubernetes deployment [6] and, e.g., the node running
> this client pod fails, one ends up with duplicate jobs, since the restarted
> client submits again. One could build custom logic (poll whether the job
> exists, only submit if it does not), but this seems fragile, and it is
> conceivable that it could lead to weird timing issues, such as different
> containers trying to submit at the same time (see the sketch below). One
> solution would be an atomic submit-if-not-exists, but I suppose this would
> need to involve some level of locking on the JM.
> 3. Scheduling the client container from the step above as a Kubernetes job
> [7]. This seems somewhat unidiomatic for streaming jobs that are not
> expected to terminate, but one would not have to deal with duplicate Flink
> jobs. In the failure scenario described above, the (Flink) job would still
> be running on the Flink cluster; there just would not be a client attached
> to it (as the Kubernetes job would not be restarted). On the other hand,
> should the (Flink) job fail for some reason, there is no way of restarting
> it automatically.
>
> Are we missing something obvious? Has the Flink community come up with a
> default way of submitting Flink jobs on Kubernetes yet, or are there people
> willing to share their experiences?
>
> Best regards and happy holidays,
> Max
>
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/deployment/kubernetes.html
> [2] https://github.com/docker-flink/examples/tree/master/helm/flink
> [3] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-HA-with-Kubernetes-without-Zookeeper-td15033.html
> [4] https://www.youtube.com/watch?v=w721NI-mtAA Slides:
>     https://www.slideshare.net/FlinkForward/flink-forward-berlin-2017-patrick-lucas-flink-in-containerland
> [5] https://ci.apache.org/projects/flink/flink-docs-master/monitoring/rest_api.html#submitting-programs
> [6] https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
> [7] https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
> --
> Maximilian Bode * maximilian.b...@tngtech.com
> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> Geschäftsführer: Henrik Klagges, Dr. Robert Dahlke, Gerhard Müller
> Sitz: Unterföhring * Amtsgericht München * HRB 135082
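For reference, here is a minimal sketch of the poll-then-submit logic from point 2, written in Python against the REST endpoints from [5] (the legacy, pre-1.5 monitoring API). The JobManager address, job name, and jar path are placeholder assumptions. Note that the check and the submit are two separate HTTP calls, so this is not atomic: two client replicas racing through the gap can still produce duplicate jobs, which is exactly why a true submit-if-not-exists would need some locking on the JM.

    import os
    import requests

    JM = "http://flink-jobmanager:8081"  # assumed in-cluster service address
    JOB_NAME = "my-streaming-job"        # hypothetical job name to look for
    JAR_PATH = "/opt/job/my-job.jar"     # hypothetical path to the job jar

    def job_is_running(name):
        # /joboverview lists jobs grouped by status; each entry carries "name".
        overview = requests.get(JM + "/joboverview").json()
        return any(job["name"] == name for job in overview.get("running", []))

    def submit(jar_path):
        # Upload the jar, then run the main class from its manifest.
        with open(jar_path, "rb") as f:
            upload = requests.post(JM + "/jars/upload", files={"jarfile": f}).json()
        jar_id = os.path.basename(upload["filename"])
        requests.post(JM + "/jars/" + jar_id + "/run").raise_for_status()

    if __name__ == "__main__":
        # Race window: another client may submit between this check and ours.
        if not job_is_running(JOB_NAME):
            submit(JAR_PATH)

Used as the entrypoint of the client image from point 2, this at least avoids blind resubmission on pod restarts, but it narrows the duplicate-job window rather than closing it.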