Hi Heath,

I just learned that people from Alibaba have already made good progress on FLINK-9953. I'm currently talking to them to see how we can merge this contribution into Flink as quickly as possible. Since I'm quite busy due to the upcoming release, I hope that other community members will help out with the reviewing once the PRs are opened.
Cheers,
Till

On Fri, Feb 8, 2019 at 8:50 PM Heath Albritton <halbr...@harm.org> wrote:

> Has any progress been made on this? There are a number of folks in
> the community looking to help out.
>
> -H
>
> On Wed, Dec 5, 2018 at 10:00 AM Till Rohrmann <trohrm...@apache.org> wrote:
>
>> Hi Derek,
>>
>> there is this issue [1] which tracks the active Kubernetes integration.
>> Jin Sun has already started implementing some parts of it. There should
>> also be some PRs open for it. Please check them out.
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-9953
>>
>> Cheers,
>> Till
>>
>> On Wed, Dec 5, 2018 at 6:39 PM Derek VerLee <derekver...@gmail.com> wrote:
>>
>>> Sounds good.
>>>
>>> Is someone working on this automation today?
>>>
>>> If not, although my time is tight, I may be able to work on a PR for
>>> getting us started down the path of Kubernetes native cluster mode.
>>>
>>> On 12/4/18 5:35 AM, Till Rohrmann wrote:
>>>
>>>> Hi Derek,
>>>>
>>>> what I would recommend is to trigger the cancel-with-savepoint command
>>>> [1]. This will create a savepoint and terminate the job execution. Next
>>>> you simply need to respawn the job cluster, providing it with the
>>>> savepoint to resume from.
>>>>
>>>> [1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html#cancel-job-with-savepoint
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Tue, Dec 4, 2018 at 10:30 AM Andrey Zagrebin <and...@data-artisans.com> wrote:
>>>>
>>>>> Hi Derek,
>>>>>
>>>>> I think your automation steps look good.
>>>>> Recreating deployments should not take long, and as you mention, this
>>>>> way you can avoid unpredictable old/new version collisions.
>>>>>
>>>>> Best,
>>>>> Andrey
>>>>>
>>>>> On 4 Dec 2018, at 10:22, Dawid Wysakowicz <dwysakow...@apache.org> wrote:
>>>>>
>>>>>> Hi Derek,
>>>>>>
>>>>>> I am not an expert in Kubernetes, so I will cc Till, who should be able
>>>>>> to help you more.
>>>>>>
>>>>>> As for automating a similar process, I would recommend having a look at
>>>>>> the dA Platform [1], which is built on top of Kubernetes.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Dawid
>>>>>>
>>>>>> [1] https://data-artisans.com/platform-overview
>>>>>>
>>>>>> On 30/11/2018 02:10, Derek VerLee wrote:
>>>>>>
>>>>>>> I'm looking at the job cluster mode; it looks great and I am
>>>>>>> considering migrating our jobs off our "legacy" session cluster and
>>>>>>> into Kubernetes.
>>>>>>>
>>>>>>> I do need to ask some questions, because I haven't found a lot of
>>>>>>> details in the documentation about how it works yet, and I gave up
>>>>>>> following the DI around in the code after a while.
>>>>>>>
>>>>>>> Let's say I have a deployment for the job "leader" in HA with ZK, and
>>>>>>> another deployment for the taskmanagers.
>>>>>>>
>>>>>>> I want to upgrade the code or configuration and start from a
>>>>>>> savepoint, in an automated way.
>>>>>>>
>>>>>>> Best I can figure, I cannot just update the deployment resources in
>>>>>>> Kubernetes and allow the containers to restart in an arbitrary order.
>>>>>>>
>>>>>>> Instead, I expect sequencing is important, something along the lines
>>>>>>> of this:
>>>>>>>
>>>>>>> 1. issue savepoint command on leader
>>>>>>> 2. wait for savepoint
>>>>>>> 3. destroy all leader and taskmanager containers
>>>>>>> 4. deploy new leader, with savepoint url
>>>>>>> 5. deploy new taskmanagers
>>>>>>>
>>>>>>> For example, I imagine old taskmanagers (with an old version of my
>>>>>>> job) attaching to the new leader and causing a problem.
>>>>>>>
>>>>>>> Does that sound right, or am I overthinking it?
>>>>>>>
>>>>>>> If not, has anyone tried implementing any automation for this yet?
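To make steps 1 and 2 of the sequence above concrete, here is a minimal, untested sketch using the cancel-with-savepoint CLI command Till points to. The JobManager address (flink-jobmanager:8081) and the s3:// savepoint target are placeholders, and the output parsing is only illustrative:

    # Steps 1-2: trigger a savepoint, cancel the job, and capture the savepoint path.
    # A job cluster runs exactly one job, so taking the first RUNNING id is enough.
    JOB_ID=$(flink list -m flink-jobmanager:8081 -r | grep '(RUNNING)' | awk '{print $4}')

    # "flink cancel -s <targetDirectory> <jobId>" creates the savepoint and then cancels.
    # Grepping the CLI output for the savepoint URI is brittle, but shows the idea.
    SAVEPOINT=$(flink cancel -m flink-jobmanager:8081 -s s3://my-bucket/savepoints "$JOB_ID" \
      | grep -o 's3://[^ ]*' | tail -n 1)
    echo "Cancelled ${JOB_ID}, savepoint at ${SAVEPOINT}"

Waiting for the savepoint (step 2) is implicit here, since the cancel call should only return once the savepoint has been written or the operation has failed.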
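The Kubernetes half (steps 3-5) could then look roughly like the following. The deployment names, the app=flink label, and the SAVEPOINT_PATH token in the manifest are all assumptions, and whether the job-cluster entrypoint accepts the savepoint path via a --fromSavepoint argument depends on your Flink version, so check the job cluster documentation for the release you run:

    # Step 3: tear down the old leader and taskmanagers and wait until their pods
    # are gone, so no stale taskmanager can attach to the new leader.
    kubectl delete deployment flink-job-cluster flink-taskmanager
    kubectl wait --for=delete pod -l app=flink --timeout=300s

    # Steps 4-5: deploy the new job cluster first, with the savepoint path substituted
    # into its manifest (here via a hypothetical SAVEPOINT_PATH placeholder, e.g. passed
    # to the entrypoint as "--fromSavepoint ${SAVEPOINT}"), then the taskmanagers.
    sed "s|SAVEPOINT_PATH|${SAVEPOINT}|" job-cluster.yaml | kubectl apply -f -
    kubectl apply -f taskmanager-deployment.yaml

Deleting and explicitly waiting before applying anything new is what enforces the ordering above; a plain rolling update of the deployments gives no such guarantee, which is exactly the old/new version collision the thread is worried about.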