Hi Derek,

there is an issue [1] which tracks the active Kubernetes integration work.
Jin Sun has already started implementing some parts of it, and there should
also be some open PRs for it. Please check them out.
[1] https://issues.apache.org/jira/browse/FLINK-9953

Cheers,
Till

On Wed, Dec 5, 2018 at 6:39 PM Derek VerLee <derekver...@gmail.com> wrote:

> Sounds good.
>
> Is someone working on this automation today?
>
> If not, although my time is tight, I may be able to work on a PR for
> getting us started down the path of Kubernetes native cluster mode.
>
> On 12/4/18 5:35 AM, Till Rohrmann wrote:
>
> > Hi Derek,
> >
> > what I would recommend is to trigger the cancel-with-savepoint command
> > [1]. This will create a savepoint and terminate the job execution.
> > Next you simply need to respawn the job cluster, providing it with the
> > savepoint to resume from.
> >
> > [1]
> > https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html#cancel-job-with-savepoint
> >
> > Cheers,
> > Till
> >
> > On Tue, Dec 4, 2018 at 10:30 AM Andrey Zagrebin <and...@data-artisans.com>
> > wrote:
> >
>> Hi Derek,
>>
>> I think your automation steps look good.
>> Recreating deployments should not take long,
>> and as you mention, this way you can avoid unpredictable old/new
>> version collisions.
>>
>> Best,
>> Andrey
>>
>> > On 4 Dec 2018, at 10:22, Dawid Wysakowicz <dwysakow...@apache.org> wrote:
>> >
>> > Hi Derek,
>> >
>> > I am not an expert in Kubernetes, so I will cc Till, who should be
>> > able to help you more.
>> >
>> > As for automating a similar process, I would recommend having a look
>> > at the dA platform [1], which is built on top of Kubernetes.
>> >
>> > Best,
>> >
>> > Dawid
>> >
>> > [1] https://data-artisans.com/platform-overview
>> >
>> > On 30/11/2018 02:10, Derek VerLee wrote:
>> >>
>> >> I'm looking at the job cluster mode; it looks great, and I am
>> >> considering migrating our jobs off our "legacy" session cluster and
>> >> into Kubernetes.
>> >>
>> >> I do need to ask some questions, because I haven't found a lot of
>> >> details in the documentation about how it works yet, and I gave up
>> >> following the DI around in the code after a while.
>> >>
>> >> Let's say I have a deployment for the job "leader" in HA with ZK, and
>> >> another deployment for the taskmanagers.
>> >>
>> >> I want to upgrade the code or configuration and start from a
>> >> savepoint, in an automated way.
>> >>
>> >> Best I can figure, I cannot just update the deployment resources in
>> >> Kubernetes and allow the containers to restart in an arbitrary order.
>> >> Instead, I expect sequencing is important, something along the lines
>> >> of this:
>> >>
>> >> 1. issue savepoint command on leader
>> >> 2. wait for savepoint
>> >> 3. destroy all leader and taskmanager containers
>> >> 4. deploy new leader, with savepoint url
>> >> 5. deploy new taskmanagers
>> >>
>> >> Otherwise, I imagine old taskmanagers (with an old version of my
>> >> job) attaching to the new leader and causing a problem.
>> >>
>> >> Does that sound right, or am I overthinking it?
>> >>
>> >> If not, has anyone tried implementing any automation for this yet?
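
[Editor's note] The cancel-with-savepoint flow Till recommends can be sketched as a pair of shell functions. Everything here is illustrative: `SAVEPOINT_DIR`, `JOB_JAR`, and the `FLINK_CMD` wrapper are placeholders (the wrapper defaults to `echo flink` so the sketch can be dry-run without a live cluster). The `flink cancel -s <dir> <jobId>` and `flink run -s <savepointPath>` invocations follow the CLI usage described in the savepoint docs linked above.

```shell
# Hedged sketch of cancel-with-savepoint followed by a resume.
# SAVEPOINT_DIR and JOB_JAR are placeholder values, and FLINK_CMD
# defaults to `echo flink` so the functions dry-run without a cluster.
FLINK_CMD="${FLINK_CMD:-echo flink}"
SAVEPOINT_DIR="${SAVEPOINT_DIR:-s3://my-bucket/flink-savepoints}"
JOB_JAR="${JOB_JAR:-my-job.jar}"

# Cancel the running job, taking a savepoint first. The real CLI
# prints the resulting savepoint path; we capture the last output line.
cancel_with_savepoint() {
    job_id="$1"
    $FLINK_CMD cancel -s "$SAVEPOINT_DIR" "$job_id" | tail -n 1
}

# Respawn the job (possibly with a new jar version), resuming from
# the savepoint returned by cancel_with_savepoint.
resume_from_savepoint() {
    savepoint_path="$1"
    $FLINK_CMD run -s "$savepoint_path" "$JOB_JAR"
}
```

A typical invocation would be `sp=$(cancel_with_savepoint "$JOB_ID")` followed by `resume_from_savepoint "$sp"`.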
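
[Editor's note] Derek's five-step sequence can be sketched as a single shell function around kubectl. Every name here is hypothetical (deployment names, manifest files, savepoint directory), the way the savepoint path reaches the job cluster entrypoint varies by setup, and `KUBECTL`/`FLINK` default to `echo ...` so the sketch dry-runs; it illustrates the ordering only, not a production upgrade tool.

```shell
# Hedged sketch of the five upgrade steps listed above. Deployment
# and manifest names are hypothetical, and KUBECTL/FLINK default to
# `echo ...` so the sequence can be dry-run without a cluster.
KUBECTL="${KUBECTL:-echo kubectl}"
FLINK="${FLINK:-echo flink}"
SAVEPOINT_DIR="${SAVEPOINT_DIR:-s3://my-bucket/flink-savepoints}"

upgrade_job_cluster() {
    job_id="$1"
    # Steps 1-2: issue the savepoint via cancel-with-savepoint; the
    # real CLI waits for the savepoint to complete and prints its path.
    savepoint_path=$($FLINK cancel -s "$SAVEPOINT_DIR" "$job_id" | tail -n 1)

    # Step 3: destroy leader and taskmanager containers together, so
    # no old-version taskmanager can attach to the new leader.
    $KUBECTL delete deployment flink-jobmanager flink-taskmanager

    # Step 4: deploy the new leader with the savepoint URL. How the
    # path reaches the entrypoint (args, env var, config map) depends
    # on your manifests; exporting it for substitution is one option.
    SAVEPOINT_PATH="$savepoint_path" \
        $KUBECTL apply -f jobmanager-deployment.yaml

    # Step 5: deploy the new taskmanagers only after the leader.
    $KUBECTL apply -f taskmanager-deployment.yaml
}
```

Keeping steps 3-5 strictly ordered is what prevents the old/new version collisions Andrey mentions.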