Has any progress been made on this? There are a number of folks in the community looking to help out.
-H

On Wed, Dec 5, 2018 at 10:00 AM Till Rohrmann <trohrm...@apache.org> wrote:
>
> Hi Derek,
>
> there is this issue [1] which tracks the active Kubernetes integration. Jin
> Sun already started implementing some parts of it. There should also be some
> PRs open for it. Please check them out.
>
> [1] https://issues.apache.org/jira/browse/FLINK-9953
>
> Cheers,
> Till
>
> On Wed, Dec 5, 2018 at 6:39 PM Derek VerLee <derekver...@gmail.com> wrote:
>>
>> Sounds good.
>>
>> Is someone working on this automation today?
>>
>> If not, although my time is tight, I may be able to work on a PR for getting
>> us started down the path to Kubernetes native cluster mode.
>>
>>
>> On 12/4/18 5:35 AM, Till Rohrmann wrote:
>>
>> Hi Derek,
>>
>> what I would recommend is to trigger the cancel with savepoint
>> command [1]. This will create a savepoint and terminate the job execution.
>> Next you simply need to respawn the job cluster which you provide with the
>> savepoint to resume from.
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html#cancel-job-with-savepoint
>>
>> Cheers,
>> Till
>>
>> On Tue, Dec 4, 2018 at 10:30 AM Andrey Zagrebin <and...@data-artisans.com>
>> wrote:
>>>
>>> Hi Derek,
>>>
>>> I think your automation steps look good.
>>> Recreating deployments should not take long
>>> and as you mention, this way you can avoid unpredictable old/new version
>>> collisions.
>>>
>>> Best,
>>> Andrey
>>>
>>> > On 4 Dec 2018, at 10:22, Dawid Wysakowicz <dwysakow...@apache.org> wrote:
>>> >
>>> > Hi Derek,
>>> >
>>> > I am not an expert in kubernetes, so I will cc Till, who should be able
>>> > to help you more.
>>> >
>>> > As for automation of a similar process, I would recommend having a
>>> > look at dA platform[1] which is built on top of kubernetes.
>>> >
>>> > Best,
>>> >
>>> > Dawid
>>> >
>>> > [1] https://data-artisans.com/platform-overview
>>> >
>>> > On 30/11/2018 02:10, Derek VerLee wrote:
>>> >>
>>> >> I'm looking at the job cluster mode, it looks great and I am
>>> >> considering migrating our jobs off our "legacy" session cluster and
>>> >> into Kubernetes.
>>> >>
>>> >> I do need to ask some questions because I haven't found a lot of
>>> >> details in the documentation about how it works yet, and I gave up
>>> >> following the DI around in the code after a while.
>>> >>
>>> >> Let's say I have a deployment for the job "leader" in HA with ZK, and
>>> >> another deployment for the taskmanagers.
>>> >>
>>> >> I want to upgrade the code or configuration and start from a
>>> >> savepoint, in an automated way.
>>> >>
>>> >> Best I can figure, I cannot just update the deployment resources in
>>> >> kubernetes and allow the containers to restart in an arbitrary order.
>>> >>
>>> >> Instead, I expect sequencing is important, something along the lines
>>> >> of this:
>>> >>
>>> >> 1. issue savepoint command on leader
>>> >> 2. wait for savepoint
>>> >> 3. destroy all leader and taskmanager containers
>>> >> 4. deploy new leader, with savepoint url
>>> >> 5. deploy new taskmanagers
>>> >>
>>> >>
>>> >> For example, I imagine old taskmanagers (with an old version of my
>>> >> job) attaching to the new leader and causing a problem.
>>> >>
>>> >> Does that sound right, or am I overthinking it?
>>> >>
>>> >> If not, has anyone tried implementing any automation for this yet?
>>> >>
>>> >
>>>
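
For anyone scripting the five-step sequence from Derek's original message quoted above, here is a rough Python sketch of just the ordering. It is not taken from Flink or from any PR on FLINK-9953. Every name in it (upgrade_from_savepoint and the five callables passed to it) is hypothetical: the caller would have to implement each hook against their own Flink and Kubernetes setup. The one design choice worth noting is that nothing is torn down until the savepoint location is known.

```python
import time
from typing import Callable, Optional


def upgrade_from_savepoint(
    trigger_savepoint: Callable[[], str],
    poll_savepoint: Callable[[str], Optional[str]],
    destroy_cluster: Callable[[], None],
    deploy_leader: Callable[[str], None],
    deploy_taskmanagers: Callable[[], None],
    timeout_s: float = 300.0,
    poll_interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run the five upgrade steps in order and return the savepoint location.

    All callables are hypothetical hooks supplied by the caller; none of them
    is a Flink or Kubernetes API.
    """
    # 1. Issue the savepoint command on the leader.
    request_id = trigger_savepoint()

    # 2. Wait for the savepoint; poll_savepoint returns None until it is done.
    waited = 0.0
    location = poll_savepoint(request_id)
    while location is None:
        if waited >= timeout_s:
            # Abort before touching the running cluster.
            raise TimeoutError("savepoint did not complete in time")
        sleep(poll_interval_s)
        waited += poll_interval_s
        location = poll_savepoint(request_id)

    # 3. Destroy all old leader and taskmanager containers, so that no old
    #    taskmanager can attach to the new leader.
    destroy_cluster()

    # 4. Deploy the new leader, pointing it at the savepoint.
    deploy_leader(location)

    # 5. Deploy the new taskmanagers.
    deploy_taskmanagers()
    return location
```

If the savepoint never completes, the function raises before step 3, so the old job cluster is left running and the upgrade can be retried.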