Hi Derek,

there is an issue [1] which tracks the active Kubernetes integration work.
Jin Sun has already started implementing some parts of it, and there should
also be some open PRs for it. Please check them out.
[1] https://issues.apache.org/jira/browse/FLINK-9953

Cheers,
Till

On Wed, Dec 5, 2018 at 6:39 PM Derek VerLee <derekver...@gmail.com> wrote:

> Sounds good.
>
> Is someone working on this automation today?
>
> If not, although my time is tight, I may be able to work on a PR for
> getting us started down the path of Kubernetes native cluster mode.
>
> On 12/4/18 5:35 AM, Till Rohrmann wrote:
>
> > Hi Derek,
> >
> > what I would recommend is to trigger the cancel-with-savepoint command
> > [1]. This will create a savepoint and terminate the job execution.
> > Next you simply need to respawn the job cluster, providing it with the
> > savepoint to resume from.
> >
> > [1]
> > https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html#cancel-job-with-savepoint
> >
> > Cheers,
> > Till
> >
> > On Tue, Dec 4, 2018 at 10:30 AM Andrey Zagrebin <and...@data-artisans.com>
> > wrote:
> >
>> Hi Derek,
>>
>> I think your automation steps look good.
>> Recreating deployments should not take long,
>> and as you mention, this way you can avoid unpredictable old/new
>> version collisions.
>>
>> Best,
>> Andrey
>>
>> > On 4 Dec 2018, at 10:22, Dawid Wysakowicz <dwysakow...@apache.org> wrote:
>> >
>> > Hi Derek,
>> >
>> > I am not an expert in Kubernetes, so I will cc Till, who should be
>> > able to help you more.
>> >
>> > As for automating a similar process, I would recommend having a look
>> > at the dA platform [1], which is built on top of Kubernetes.
>> >
>> > Best,
>> >
>> > Dawid
>> >
>> > [1] https://data-artisans.com/platform-overview
>> >
>> > On 30/11/2018 02:10, Derek VerLee wrote:
>> >>
>> >> I'm looking at the job cluster mode; it looks great, and I am
>> >> considering migrating our jobs off our "legacy" session cluster and
>> >> into Kubernetes.
>> >>
>> >> I do need to ask some questions, because I haven't found a lot of
>> >> details in the documentation about how it works yet, and I gave up
>> >> following the DI around in the code after a while.
>> >>
>> >> Let's say I have a deployment for the job "leader" in HA with ZK, and
>> >> another deployment for the taskmanagers.
>> >>
>> >> I want to upgrade the code or configuration and start from a
>> >> savepoint, in an automated way.
>> >>
>> >> Best I can figure, I cannot just update the deployment resources in
>> >> Kubernetes and allow the containers to restart in an arbitrary order.
>> >> Instead, I expect sequencing is important, something along the lines
>> >> of this:
>> >>
>> >> 1. issue savepoint command on leader
>> >> 2. wait for savepoint
>> >> 3. destroy all leader and taskmanager containers
>> >> 4. deploy new leader, with savepoint url
>> >> 5. deploy new taskmanagers
>> >>
>> >> Otherwise, I imagine old taskmanagers (with an old version of my
>> >> job) attaching to the new leader and causing a problem.
>> >>
>> >> Does that sound right, or am I overthinking it?
>> >>
>> >> If not, has anyone tried implementing any automation for this yet?
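
[Editor's note] The cancel-with-savepoint flow Till recommends can be sketched as a pair of shell functions. Everything here is illustrative: `SAVEPOINT_DIR`, `JOB_JAR`, and the `FLINK_CMD` wrapper are placeholders (the wrapper defaults to `echo flink` so the sketch can be dry-run without a live cluster). The `flink cancel -s <dir> <jobId>` and `flink run -s <savepointPath>` invocations follow the CLI usage described in the savepoint docs linked above.

```shell
# Hedged sketch of cancel-with-savepoint followed by a resume.
# SAVEPOINT_DIR and JOB_JAR are placeholder values, and FLINK_CMD
# defaults to `echo flink` so the functions dry-run without a cluster.
FLINK_CMD="${FLINK_CMD:-echo flink}"
SAVEPOINT_DIR="${SAVEPOINT_DIR:-s3://my-bucket/flink-savepoints}"
JOB_JAR="${JOB_JAR:-my-job.jar}"

# Cancel the running job, taking a savepoint first. The real CLI
# prints the resulting savepoint path; we capture the last output line.
cancel_with_savepoint() {
    job_id="$1"
    $FLINK_CMD cancel -s "$SAVEPOINT_DIR" "$job_id" | tail -n 1
}

# Respawn the job (possibly with a new jar version), resuming from
# the savepoint returned by cancel_with_savepoint.
resume_from_savepoint() {
    savepoint_path="$1"
    $FLINK_CMD run -s "$savepoint_path" "$JOB_JAR"
}
```

A typical invocation would be `sp=$(cancel_with_savepoint "$JOB_ID")` followed by `resume_from_savepoint "$sp"`.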
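
[Editor's note] Derek's five-step sequence can be sketched as a single shell function around kubectl. Every name here is hypothetical (deployment names, manifest files, savepoint directory), the way the savepoint path reaches the job cluster entrypoint varies by setup, and `KUBECTL`/`FLINK` default to `echo ...` so the sketch dry-runs; it illustrates the ordering only, not a production upgrade tool.

```shell
# Hedged sketch of the five upgrade steps listed above. Deployment
# and manifest names are hypothetical, and KUBECTL/FLINK default to
# `echo ...` so the sequence can be dry-run without a cluster.
KUBECTL="${KUBECTL:-echo kubectl}"
FLINK="${FLINK:-echo flink}"
SAVEPOINT_DIR="${SAVEPOINT_DIR:-s3://my-bucket/flink-savepoints}"

upgrade_job_cluster() {
    job_id="$1"
    # Steps 1-2: issue the savepoint via cancel-with-savepoint; the
    # real CLI waits for the savepoint to complete and prints its path.
    savepoint_path=$($FLINK cancel -s "$SAVEPOINT_DIR" "$job_id" | tail -n 1)

    # Step 3: destroy leader and taskmanager containers together, so
    # no old-version taskmanager can attach to the new leader.
    $KUBECTL delete deployment flink-jobmanager flink-taskmanager

    # Step 4: deploy the new leader with the savepoint URL. How the
    # path reaches the entrypoint (args, env var, config map) depends
    # on your manifests; exporting it for substitution is one option.
    SAVEPOINT_PATH="$savepoint_path" \
        $KUBECTL apply -f jobmanager-deployment.yaml

    # Step 5: deploy the new taskmanagers only after the leader.
    $KUBECTL apply -f taskmanager-deployment.yaml
}
```

Keeping steps 3-5 strictly ordered is what prevents the old/new version collisions Andrey mentions.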