The above applies to Mesos/DCOS as well. So if someone could also share insights into automatic job deployment in that setup, that would be very useful. Thanks. M
On Fri, Dec 22, 2017 at 6:56 PM, Maximilian Bode <maximilian.b...@tngtech.com> wrote:
> Hi everyone,
>
> We are beginning to run Flink on K8s and found the basic templates [1] as
> well as the example Helm chart [2] very helpful. Also the discussion about
> JobManager HA [3] and Patrick's talk [4] were very interesting. All in all,
> it is delightful how easily everything can be set up and how well it works
> out of the box.
>
> Now we are looking for some best practices as far as job submission is
> concerned. Having played with a few alternative options, we would like to
> get some input on what other people are using. What we have looked into so
> far:
>
> 1. Packaging the job jar into e.g. the JM image and submitting manually
> (either from the UI or via `kubectl exec`). Ideally, we would like to
> establish a more automated setup, preferably using native Kubernetes
> objects.
> 2. Building a separate image whose responsibility it is to submit the job
> and keep it running. This could either use the REST API [5] or share the
> Flink config so that CLI calls connect to the existing cluster. When
> scheduling this as a Kubernetes deployment [6] and, e.g., the node running
> this client pod fails, one ends up with duplicate jobs, since the restarted
> client submits again. One could build custom logic (poll whether the job
> exists, only submit if it does not), but this seems fragile, and it is
> conceivable that it could lead to weird timing issues, such as different
> containers trying to submit at the same time (see the sketch below). One
> solution would be an atomic submit-if-not-exists, but I suppose this would
> need to involve some level of locking on the JM.
> 3. Scheduling the client container from the step above as a Kubernetes job
> [7]. This seems somewhat unidiomatic for streaming jobs that are not
> expected to terminate, but one would not have to deal with duplicate Flink
> jobs. In the failure scenario described above, the (Flink) job would still
> be running on the Flink cluster; there just would not be a client attached
> to it (as the Kubernetes job would not be restarted). On the other hand,
> should the (Flink) job fail for some reason, there is no way of restarting
> it automatically.
>
> Are we missing something obvious? Has the Flink community come up with a
> default way of submitting Flink jobs on Kubernetes yet, or are there people
> willing to share their experiences?
>
> Best regards and happy holidays,
> Max
>
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/deployment/kubernetes.html
> [2] https://github.com/docker-flink/examples/tree/master/helm/flink
> [3] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-HA-with-Kubernetes-without-Zookeeper-td15033.html
> [4] https://www.youtube.com/watch?v=w721NI-mtAA Slides:
>     https://www.slideshare.net/FlinkForward/flink-forward-berlin-2017-patrick-lucas-flink-in-containerland
> [5] https://ci.apache.org/projects/flink/flink-docs-master/monitoring/rest_api.html#submitting-programs
> [6] https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
> [7] https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
> --
> Maximilian Bode * maximilian.b...@tngtech.com
> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> Geschäftsführer: Henrik Klagges, Dr. Robert Dahlke, Gerhard Müller
> Sitz: Unterföhring * Amtsgericht München * HRB 135082
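For reference, here is a minimal sketch of the poll-then-submit logic from point 2, written in Python against the REST endpoints from [5] (the legacy, pre-1.5 monitoring API). The JobManager address, job name, and jar path are placeholder assumptions. Note that the check and the submit are two separate HTTP calls, so this is not atomic: two client replicas racing through the gap can still produce duplicate jobs, which is exactly why a true submit-if-not-exists would need some locking on the JM.

    import os
    import requests

    JM = "http://flink-jobmanager:8081"  # assumed in-cluster service address
    JOB_NAME = "my-streaming-job"        # hypothetical job name to look for
    JAR_PATH = "/opt/job/my-job.jar"     # hypothetical path to the job jar

    def job_is_running(name):
        # /joboverview lists jobs grouped by status; each entry carries "name".
        overview = requests.get(JM + "/joboverview").json()
        return any(job["name"] == name for job in overview.get("running", []))

    def submit(jar_path):
        # Upload the jar, then run the main class from its manifest.
        with open(jar_path, "rb") as f:
            upload = requests.post(JM + "/jars/upload", files={"jarfile": f}).json()
        jar_id = os.path.basename(upload["filename"])
        requests.post(JM + "/jars/" + jar_id + "/run").raise_for_status()

    if __name__ == "__main__":
        # Race window: another client may submit between this check and ours.
        if not job_is_running(JOB_NAME):
            submit(JAR_PATH)

Used as the entrypoint of the client image from point 2, this at least avoids blind resubmission on pod restarts, but it narrows the duplicate-job window rather than closing it.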