Hi Heath,

I think some of the PRs are already open and ready for review [1, 2].

[1] https://issues.apache.org/jira/browse/FLINK-10932
[2] https://issues.apache.org/jira/browse/FLINK-10935

Cheers,
Till

On Wed, Feb 27, 2019 at 10:48 AM Heath Albritton <halbr...@harm.org> wrote:

> Great, my team is eager to get started. I’m curious what progress has
> been made so far?
>
> -H
>
> On Feb 26, 2019, at 14:43, Chunhui Shi <c...@apache.org> wrote:
>
> Hi Heath and Till, thanks for offering help on reviewing this feature. I
> just reassigned the JIRAs to myself after offline discussion with Jin. Let
> us work together to get kubernetes integrated natively with flink. Thanks.
>
> On Fri, Feb 15, 2019 at 12:19 AM Till Rohrmann <trohrm...@apache.org>
> wrote:
>
>> Alright, I'll get back to you once the PRs are open. Thanks a lot for
>> your help :-)
>>
>> Cheers,
>> Till
>>
>> On Thu, Feb 14, 2019 at 5:45 PM Heath Albritton <halbr...@harm.org>
>> wrote:
>>
>>> My team and I are keen to help out with testing and review as soon as
>>> there is a pull request.
>>>
>>> -H
>>>
>>> On Feb 11, 2019, at 00:26, Till Rohrmann <trohrm...@apache.org> wrote:
>>>
>>> Hi Heath,
>>>
>>> I just learned that people from Alibaba already made some good progress
>>> with FLINK-9953. I'm currently talking to them in order to see how we can
>>> merge this contribution into Flink as fast as possible. Since I'm quite
>>> busy due to the upcoming release I hope that other community members will
>>> help out with the reviewing once the PRs are opened.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Fri, Feb 8, 2019 at 8:50 PM Heath Albritton <halbr...@harm.org>
>>> wrote:
>>>
>>>> Has any progress been made on this? There are a number of folks in
>>>> the community looking to help out.
>>>>
>>>>
>>>> -H
>>>>
>>>> On Wed, Dec 5, 2018 at 10:00 AM Till Rohrmann <trohrm...@apache.org>
>>>> wrote:
>>>> >
>>>> > Hi Derek,
>>>> >
>>>> > there is this issue [1] which tracks the active Kubernetes
>>>> > integration. Jin Sun already started implementing some parts of it.
>>>> > There should also be some PRs open for it. Please check them out.
>>>> >
>>>> > [1] https://issues.apache.org/jira/browse/FLINK-9953
>>>> >
>>>> > Cheers,
>>>> > Till
>>>> >
>>>> > On Wed, Dec 5, 2018 at 6:39 PM Derek VerLee <derekver...@gmail.com>
>>>> > wrote:
>>>> >>
>>>> >> Sounds good.
>>>> >>
>>>> >> Is someone working on this automation today?
>>>> >>
>>>> >> If not, although my time is tight, I may be able to work on a PR for
>>>> >> getting us started down the path to Kubernetes native cluster mode.
>>>> >>
>>>> >>
>>>> >> On 12/4/18 5:35 AM, Till Rohrmann wrote:
>>>> >>
>>>> >> Hi Derek,
>>>> >>
>>>> >> what I would recommend is to trigger the cancel with savepoint
>>>> >> command [1]. This will create a savepoint and terminate the job
>>>> >> execution. Next you simply need to respawn the job cluster, which you
>>>> >> provide with the savepoint to resume from.
>>>> >>
>>>> >> [1]
>>>> >> https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html#cancel-job-with-savepoint
>>>> >>
>>>> >> Cheers,
>>>> >> Till
>>>> >>
>>>> >> On Tue, Dec 4, 2018 at 10:30 AM Andrey Zagrebin <
>>>> >> and...@data-artisans.com> wrote:
>>>> >>>
>>>> >>> Hi Derek,
>>>> >>>
>>>> >>> I think your automation steps look good.
>>>> >>> Recreating deployments should not take long
>>>> >>> and as you mention, this way you can avoid unpredictable old/new
>>>> >>> version collisions.
>>>> >>>
>>>> >>> Best,
>>>> >>> Andrey
>>>> >>>
>>>> >>> > On 4 Dec 2018, at 10:22, Dawid Wysakowicz <dwysakow...@apache.org>
>>>> >>> > wrote:
>>>> >>> >
>>>> >>> > Hi Derek,
>>>> >>> >
>>>> >>> > I am not an expert in kubernetes, so I will cc Till, who should be able
>>>> >>> > to help you more.
>>>> >>> >
>>>> >>> > As for the automation of a similar process I would recommend having a
>>>> >>> > look at dA platform[1] which is built on top of kubernetes.
>>>> >>> >
>>>> >>> > Best,
>>>> >>> >
>>>> >>> > Dawid
>>>> >>> >
>>>> >>> > [1] https://data-artisans.com/platform-overview
>>>> >>> >
>>>> >>> > On 30/11/2018 02:10, Derek VerLee wrote:
>>>> >>> >>
>>>> >>> >> I'm looking at the job cluster mode, it looks great and I am
>>>> >>> >> considering migrating our jobs off our "legacy" session cluster and
>>>> >>> >> into Kubernetes.
>>>> >>> >>
>>>> >>> >> I do need to ask some questions because I haven't found a lot of
>>>> >>> >> details in the documentation about how it works yet, and I gave up
>>>> >>> >> following the DI around in the code after a while.
>>>> >>> >>
>>>> >>> >> Let's say I have a deployment for the job "leader" in HA with ZK, and
>>>> >>> >> another deployment for the taskmanagers.
>>>> >>> >>
>>>> >>> >> I want to upgrade the code or configuration and start from a
>>>> >>> >> savepoint, in an automated way.
>>>> >>> >>
>>>> >>> >> Best I can figure, I can not just update the deployment resources in
>>>> >>> >> kubernetes and allow the containers to restart in an arbitrary order.
>>>> >>> >>
>>>> >>> >> Instead, I expect sequencing is important, something along the lines
>>>> >>> >> of this:
>>>> >>> >>
>>>> >>> >> 1. issue savepoint command on leader
>>>> >>> >> 2. wait for savepoint
>>>> >>> >> 3. destroy all leader and taskmanager containers
>>>> >>> >> 4. deploy new leader, with savepoint url
>>>> >>> >> 5. deploy new taskmanagers
>>>> >>> >>
>>>> >>> >>
>>>> >>> >> For example, I imagine old taskmanagers (with an old version of my
>>>> >>> >> job) attaching to the new leader and causing a problem.
>>>> >>> >>
>>>> >>> >> Does that sound right, or am I overthinking it?
>>>> >>> >>
>>>> >>> >> If not, has anyone tried implementing any automation for this yet?
>>>> >>> >>
>>>> >>> >
>>>> >>>
>>>>
>>>
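
For anyone who wants to script the sequence from Derek's original mail (savepoint, tear down, redeploy the leader, redeploy the taskmanagers) together with the cancel-with-savepoint command suggested further up the thread, a rough sketch follows. It is not an official tool and makes several assumptions: `flink` and `kubectl` are on the PATH, the deployment names, pod label selector, and manifest file names are hypothetical placeholders, and the new leader manifest has already been edited to resume from the savepoint that the cancel command reports. How the savepoint path is handed to the job cluster is deliberately not shown.

```python
import subprocess
from typing import List, Tuple

# A step is a human-readable description plus the command to run.
Step = Tuple[str, List[str]]


def build_upgrade_steps(job_id: str, savepoint_dir: str,
                        leader: str, taskmanagers: str, pod_selector: str,
                        leader_manifest: str, tm_manifest: str) -> List[Step]:
    """Return the upgrade commands in the order they should run.

    All names passed in (deployments, selector, manifests) are
    hypothetical; substitute the ones used in your own cluster.
    """
    return [
        # Steps 1 and 2: "cancel with savepoint" takes the savepoint and
        # stops the job, so no old code keeps processing afterwards.
        ("cancel job with savepoint",
         ["flink", "cancel", "-s", savepoint_dir, job_id]),
        # Step 3: remove the old leader and taskmanager deployments.
        ("delete old deployments",
         ["kubectl", "delete", "deployment", leader, taskmanagers]),
        # Make sure no old pod is left to attach to the new leader.
        ("wait for old pods to terminate",
         ["kubectl", "wait", "--for=delete", "pod",
          "-l", pod_selector, "--timeout=120s"]),
        # Step 4: the leader manifest is assumed to reference the savepoint.
        ("deploy new leader", ["kubectl", "apply", "-f", leader_manifest]),
        # Step 5: taskmanagers come up last, against the new leader only.
        ("deploy new taskmanagers", ["kubectl", "apply", "-f", tm_manifest]),
    ]


def run_steps(steps: List[Step], execute: bool = False) -> None:
    """Print each step; run it only when execute=True (dry run by default)."""
    for description, argv in steps:
        print(f"{description}: {' '.join(argv)}")
        if execute:
            # check=True aborts the sequence at the first failing command.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    plan = build_upgrade_steps(
        job_id="<job-id>", savepoint_dir="<savepoint-dir>",
        leader="flink-leader", taskmanagers="flink-taskmanager",
        pod_selector="app=flink-job",
        leader_manifest="leader.yaml", tm_manifest="taskmanager.yaml",
    )
    run_steps(plan)  # dry run: only prints the commands
```

Because the savepoint path is only known once the first command has finished, a real run would probably execute `plan[:1]` first, update the leader manifest with the reported path, and then execute `plan[1:]`.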