Great, my team is eager to get started. I’m curious what progress has been made so far?
-H

> On Feb 26, 2019, at 14:43, Chunhui Shi <c...@apache.org> wrote:
>
> Hi Heath and Till, thanks for offering help on reviewing this feature. I just reassigned the JIRAs to myself after an offline discussion with Jin. Let us work together to get Kubernetes integrated natively with Flink. Thanks.
>
>> On Fri, Feb 15, 2019 at 12:19 AM Till Rohrmann <trohrm...@apache.org> wrote:
>> Alright, I'll get back to you once the PRs are open. Thanks a lot for your help :-)
>>
>> Cheers,
>> Till
>>
>>> On Thu, Feb 14, 2019 at 5:45 PM Heath Albritton <halbr...@harm.org> wrote:
>>> My team and I are keen to help out with testing and review as soon as there is a pull request.
>>>
>>> -H
>>>
>>>> On Feb 11, 2019, at 00:26, Till Rohrmann <trohrm...@apache.org> wrote:
>>>>
>>>> Hi Heath,
>>>>
>>>> I just learned that people from Alibaba have already made some good progress with FLINK-9953. I'm currently talking to them in order to see how we can merge this contribution into Flink as fast as possible. Since I'm quite busy due to the upcoming release, I hope that other community members will help out with the reviewing once the PRs are opened.
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>>> On Fri, Feb 8, 2019 at 8:50 PM Heath Albritton <halbr...@harm.org> wrote:
>>>>> Has any progress been made on this? There are a number of folks in the community looking to help out.
>>>>>
>>>>> -H
>>>>>
>>>>> On Wed, Dec 5, 2018 at 10:00 AM Till Rohrmann <trohrm...@apache.org> wrote:
>>>>> >
>>>>> > Hi Derek,
>>>>> >
>>>>> > There is this issue [1] which tracks the active Kubernetes integration. Jin Sun has already started implementing some parts of it. There should also be some PRs open for it. Please check them out.
>>>>> >
>>>>> > [1] https://issues.apache.org/jira/browse/FLINK-9953
>>>>> >
>>>>> > Cheers,
>>>>> > Till
>>>>> >
>>>>> > On Wed, Dec 5, 2018 at 6:39 PM Derek VerLee <derekver...@gmail.com> wrote:
>>>>> >>
>>>>> >> Sounds good.
>>>>> >>
>>>>> >> Is someone working on this automation today?
>>>>> >>
>>>>> >> If not, although my time is tight, I may be able to work on a PR to get us started down the path of Kubernetes native cluster mode.
>>>>> >>
>>>>> >> On 12/4/18 5:35 AM, Till Rohrmann wrote:
>>>>> >>
>>>>> >> Hi Derek,
>>>>> >>
>>>>> >> What I would recommend is to trigger the cancel-with-savepoint command [1]. This will create a savepoint and terminate the job execution. Next, you simply need to respawn the job cluster and provide it with the savepoint to resume from.
>>>>> >>
>>>>> >> [1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html#cancel-job-with-savepoint
>>>>> >>
>>>>> >> Cheers,
>>>>> >> Till
>>>>> >>
>>>>> >> On Tue, Dec 4, 2018 at 10:30 AM Andrey Zagrebin <and...@data-artisans.com> wrote:
>>>>> >>>
>>>>> >>> Hi Derek,
>>>>> >>>
>>>>> >>> I think your automation steps look good. Recreating deployments should not take long, and as you mention, this way you can avoid unpredictable old/new version collisions.
>>>>> >>>
>>>>> >>> Best,
>>>>> >>> Andrey
>>>>> >>>
>>>>> >>> > On 4 Dec 2018, at 10:22, Dawid Wysakowicz <dwysakow...@apache.org> wrote:
>>>>> >>> >
>>>>> >>> > Hi Derek,
>>>>> >>> >
>>>>> >>> > I am not an expert in Kubernetes, so I will cc Till, who should be able to help you more.
>>>>> >>> >
>>>>> >>> > As for automating a similar process, I would recommend having a look at dA Platform [1], which is built on top of Kubernetes.
>>>>> >>> >
>>>>> >>> > Best,
>>>>> >>> >
>>>>> >>> > Dawid
>>>>> >>> >
>>>>> >>> > [1] https://data-artisans.com/platform-overview
>>>>> >>> >
>>>>> >>> > On 30/11/2018 02:10, Derek VerLee wrote:
>>>>> >>> >>
>>>>> >>> >> I'm looking at the job cluster mode; it looks great, and I am considering migrating our jobs off our "legacy" session cluster and into Kubernetes.
>>>>> >>> >>
>>>>> >>> >> I do need to ask some questions, because I haven't found a lot of details in the documentation about how it works yet, and I gave up following the DI around in the code after a while.
>>>>> >>> >>
>>>>> >>> >> Let's say I have a deployment for the job "leader" in HA with ZK, and another deployment for the taskmanagers.
>>>>> >>> >>
>>>>> >>> >> I want to upgrade the code or configuration and start from a savepoint, in an automated way.
>>>>> >>> >>
>>>>> >>> >> Best I can figure, I cannot just update the deployment resources in Kubernetes and allow the containers to restart in an arbitrary order.
>>>>> >>> >>
>>>>> >>> >> Instead, I expect sequencing is important, something along the lines of this:
>>>>> >>> >>
>>>>> >>> >> 1. issue savepoint command on leader
>>>>> >>> >> 2. wait for savepoint
>>>>> >>> >> 3. destroy all leader and taskmanager containers
>>>>> >>> >> 4. deploy new leader, with savepoint url
>>>>> >>> >> 5. deploy new taskmanagers
>>>>> >>> >>
>>>>> >>> >> For example, I imagine old taskmanagers (with an old version of my job) attaching to the new leader and causing a problem.
>>>>> >>> >>
>>>>> >>> >> Does that sound right, or am I overthinking it?
>>>>> >>> >>
>>>>> >>> >> If not, has anyone tried implementing any automation for this yet?
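For reference, a minimal sketch of the upgrade sequence Derek outlines above (cancel with savepoint, tear everything down, redeploy from the savepoint) might look roughly like the following Python script. The job id, deployment names, manifest path, and savepoint directory are hypothetical placeholders, and parsing the savepoint path out of the `flink cancel -s` output is only an assumption about the CLI's text output; adapt all of it to your own setup.

    #!/usr/bin/env python3
    # Sketch only: job id, deployment names, and paths below are placeholders.
    import re
    import subprocess

    JOB_ID = "<job-id>"
    SAVEPOINT_DIR = "s3://my-bucket/savepoints"   # hypothetical savepoint target
    JM_DEPLOYMENT = "flink-job-cluster"           # hypothetical k8s deployment names
    TM_DEPLOYMENT = "flink-taskmanager"
    NEW_MANIFESTS = "k8s/"                        # hypothetical dir with new manifests

    def run(cmd):
        """Run a command, fail loudly on error, and return its stdout."""
        print("+", " ".join(cmd))
        return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

    # Steps 1 + 2: cancel with savepoint; the CLI blocks until the savepoint is done.
    out = run(["flink", "cancel", "-s", SAVEPOINT_DIR, JOB_ID])

    # Assumption: the completed savepoint path appears as a URI in the CLI output.
    match = re.search(r"\b\w+://\S+", out)
    if match is None:
        raise RuntimeError("no savepoint path found in output:\n" + out)
    savepoint = match.group(0)
    print("savepoint completed at", savepoint)

    # Step 3: tear down the old leader and taskmanagers so stale containers
    # cannot attach to the new leader.
    run(["kubectl", "delete", "deployment", JM_DEPLOYMENT, TM_DEPLOYMENT])

    # Steps 4 + 5: deploy the new version. The new job-cluster manifest is assumed
    # to receive `savepoint` (e.g. templated into the container args); doing that
    # substitution is left to whatever templating tool you use.
    run(["kubectl", "apply", "-f", NEW_MANIFESTS])
    print("redeployed; job should resume from", savepoint)

Deleting both deployments before applying the new manifests is what avoids the old/new version collision Derek and Andrey mention; how the savepoint path gets into the new job cluster's arguments depends on how your manifests and entrypoint are set up.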