Thank you all for the votes and your comments! KIP-415 has been accepted with +4 binding votes (Guozhang, Jason, Randall, Ewen) and +4 non-binding votes (Ryanne, Rhys, Robert, Satish).
Best,
Konstantine

On Thu, Mar 14, 2019 at 10:24 PM Satish Duggana <satish.dugg...@gmail.com> wrote:

> Nice work Konstantine!
> +1 (non-binding)
>
> On Fri, Mar 15, 2019 at 7:48 AM Ewen Cheslack-Postava <e...@confluent.io> wrote:
>
> > +1 (binding)
> >
> > -Ewen
> >
> > On Wed, Mar 13, 2019 at 2:04 PM Randall Hauch <rha...@gmail.com> wrote:
> >
> > > Excellent work, Konstantine!
> > >
> > > +1 (binding)
> > >
> > > On Mon, Mar 11, 2019 at 8:05 PM Konstantine Karantasis <konstant...@confluent.io> wrote:
> > >
> > > > Thanks Jason!
> > > > That makes perfect sense. The change is reflected in the KIP now.
> > > > "compatible" will be the default mode for "connect.protocol"
> > > >
> > > > Cheers,
> > > > Konstantine
> > > >
> > > > On Mon, Mar 11, 2019 at 4:31 PM Jason Gustafson <ja...@confluent.io> wrote:
> > > >
> > > > > +1 Thanks for all the work on this. My only minor comment is that `connect.protocol` probably should be `compatible` by default. The cost is low and it will save upgrade confusion.
> > > > >
> > > > > Best,
> > > > > Jason
> > > > >
> > > > > On Fri, Mar 8, 2019 at 10:37 AM Robert Yokota <rayok...@gmail.com> wrote:
> > > > >
> > > > > > Thanks for the great KIP Konstantine!
> > > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > Robert
> > > > > >
> > > > > > On Thu, Mar 7, 2019 at 2:56 PM Guozhang Wang <wangg...@gmail.com> wrote:
> > > > > >
> > > > > > > Thanks Konstantine, I've read the updated section on
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-415%3A+Incremental+Cooperative+Rebalancing+in+Kafka+Connect
> > > > > > > and it lgtm.
> > > > > > >
> > > > > > > I'm +1 on the KIP.
> > > > > > >
> > > > > > > Guozhang
> > > > > > >
> > > > > > > On Thu, Mar 7, 2019 at 2:35 PM Konstantine Karantasis <konstant...@confluent.io> wrote:
> > > > > > >
> > > > > > > > Thanks Guozhang. This is a valid observation regarding the current status of the PR.
> > > > > > > >
> > > > > > > > I updated the KIP to explicitly call out how the downgrade process should work in the section "Compatibility, Deprecation, and Migration".
> > > > > > > >
> > > > > > > > Additionally, I reduced the configuration modes for connect.protocol to only two: eager and compatible.
> > > > > > > > That's because there's currently no way to select a protocol based on a simple majority; selection requires unanimity, i.e. all members must support at least one common option for the sub-protocol.
> > > > > > > > Therefore, if we account for accidental joins of workers running an older version, there's currently no way to lock a group of workers into a cooperative-only mode.
> > > > > > > >
> > > > > > > > The changes are reflected in the KIP doc and will be applied to the PR in a subsequent commit.
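The unanimity constraint described above can be illustrated with a minimal Python sketch. It is a simplified, hypothetical model, loosely based on our understanding of how the Kafka group coordinator picks a group protocol: only protocols supported by every member are candidates, and each member then votes for its most preferred candidate. The function name `select_protocol` and the "V0"/"V1" labels are illustrative only and do not correspond to Kafka's actual code.

```python
from collections import Counter
from typing import Dict, List, Optional


def select_protocol(members: Dict[str, List[str]]) -> Optional[str]:
    """Pick a sub-protocol that every member supports (unanimity, not majority).

    `members` maps a worker id to the protocols it supports, most preferred
    first. Returns None when the members share no protocol at all.
    Hypothetical, simplified model; not Kafka's actual implementation.
    """
    if not members:
        return None
    # Only protocols supported by *all* members are candidates.
    candidates = set.intersection(*(set(prefs) for prefs in members.values()))
    if not candidates:
        return None
    # Each member votes for its most preferred candidate; most votes wins.
    votes = Counter(
        next(p for p in prefs if p in candidates) for prefs in members.values()
    )
    return votes.most_common(1)[0][0]
```

With two "compatible" workers advertising `["V1", "V0"]` the model settles on V1, but adding a single worker that only knows `["V0"]` leaves V0 as the only common option, no matter how many members prefer V1.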
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Konstantine
> > > > > > > >
> > > > > > > > On Thu, Mar 7, 2019 at 1:17 PM Guozhang Wang <wangg...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Konstantine,
> > > > > > > > >
> > > > > > > > > Thanks for the updated KIP and the PR as well (which is huge :)). I briefly looked through it as well as the KIP, and I have one minor comment to add about backward compatibility (otherwise I'm a binding +1 on it as well). I'll use one example to illustrate the issue:
> > > > > > > > >
> > > > > > > > > 1) Suppose you have workerA and workerB on the newer version, both with connect.protocol configured as "compatible". They will send both V0 and V1 to the leader (say it's workerA), who will choose V1 as the current protocol. This is sent back to A and B, who remember that the current protocol version is already V1. So after this rebalance everyone remembers that V1 can be used, which means that upon prepareJoin they will not revoke all of their assigned tasks.
> > > > > > > > >
> > > > > > > > > 2) Now let's say a new worker joins, but with the old version V0 (in practice this is rare, but for illustration purposes, some common scenarios may fall into this, e.g. an existing worker being downgraded, which is essentially the same as being kicked out of the group and then rejoining as a new member on the older version). The leader realizes that at least one of the members does not know V1 and hence falls back to version V0 to perform the assignment. The V0 algorithm does an eager rebalance, which may immediately move some tasks from the existing members to the newcomer, because it assumes that everyone revoked everything before joining (a.k.a. the sync barrier). But this is actually not true, since everyone other than the old-version newcomer still follows the behavior of V1 (not revoking anything) before sending the join group request.
> > > > > > > > >
> > > > > > > > > This could be solved, though. For example, when the leader realizes that it needs to use V0 while the previous "currentProtocol" value is V1, instead of blindly following the V0 algorithm it could just reassign the existing partitions without migrating anything, while at the same time telling everyone that the currentProtocol version has been downgraded to V0. Then they can trigger another rebalance based on V0, where every worker revokes its tasks before sending its join group request.
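Guozhang's proposed fix can be sketched as a leader-side decision in Python. This is a hypothetical illustration, not the KIP-415 implementation: the names `leader_assign`, `LeaderDecision`, and `round_robin` are made up, tasks are plain strings, the eager path uses a simple round-robin, and the V1 incremental cooperative algorithm is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Assignment = Dict[str, List[str]]  # worker id -> task ids (hypothetical model)


@dataclass
class LeaderDecision:
    protocol: str          # protocol version the leader announces to all members
    assignment: Assignment
    rebalance_again: bool  # True if members should rejoin under the new protocol


def round_robin(members: List[str], tasks: List[str]) -> Assignment:
    """Eager (V0-style) assignment: spread every task across all members."""
    if not members:
        return {}
    result: Assignment = {m: [] for m in members}
    for i, task in enumerate(sorted(tasks)):
        result[members[i % len(members)]].append(task)
    return result


def leader_assign(
    members: List[str],
    tasks: List[str],
    selected: str,
    previous: Optional[str],
    current: Assignment,
) -> LeaderDecision:
    """Hypothetical leader logic for a rebalance in which V0 was selected."""
    if selected != "V0":
        raise NotImplementedError("V1 incremental cooperative assignment is out of scope")
    if previous == "V1":
        # Members that last ran V1 did not revoke their tasks before joining,
        # so moving a task now could leave it running on two workers.
        # Keep current ownership (newcomers get nothing), announce the
        # downgrade, and ask everyone to rejoin under V0.
        frozen = {m: list(current.get(m, [])) for m in members}
        return LeaderDecision("V0", frozen, rebalance_again=True)
    # Every member followed V0 last time and revoked all tasks before
    # joining (the sync barrier), so a full eager reassignment is safe.
    return LeaderDecision("V0", round_robin(members, tasks), rebalance_again=False)
```

In this model, a V0 newcomer such as `workerC` receives no tasks in the first round; tasks only move in the follow-up V0 rebalance, once every worker has revoked everything before joining.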
> > > > > > > > >
> > > > > > > > > Guozhang
> > > > > > > > >
> > > > > > > > > On Wed, Mar 6, 2019 at 2:28 PM Konstantine Karantasis <konstant...@confluent.io> wrote:
> > > > > > > > >
> > > > > > > > > > I'd like to open the vote on KIP-415: Incremental Cooperative Rebalancing in Kafka Connect
> > > > > > > > > >
> > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-415%3A+Incremental+Cooperative+Rebalancing+in+Kafka+Connect
> > > > > > > > > >
> > > > > > > > > > a proposal that will allow Kafka Connect to significantly scale the number of connectors and tasks it can run in a cluster of Connect workers.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Konstantine
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > -- Guozhang
> > > > > > >
> > > > > > > --
> > > > > > > -- Guozhang