Thanks for the great KIP Konstantine! +1 (non-binding)
Robert

On Thu, Mar 7, 2019 at 2:56 PM Guozhang Wang <wangg...@gmail.com> wrote:

> Thanks Konstantine, I've read the updated section on
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-415%3A+Incremental+Cooperative+Rebalancing+in+Kafka+Connect
> and it lgtm.
>
> I'm +1 on the KIP.
>
> Guozhang
>
> On Thu, Mar 7, 2019 at 2:35 PM Konstantine Karantasis <konstant...@confluent.io> wrote:
>
> > Thanks Guozhang. This is a valid observation regarding the current status
> > of the PR.
> >
> > I updated the KIP to explicitly call out how the downgrade process should
> > work in the section Compatibility, Deprecation, and Migration.
> >
> > Additionally, I reduced the configuration modes for the connect.protocol to
> > only two: eager and compatible.
> > That's because there's no way at the moment to select a protocol based on
> > simple majority and not unanimity across at least one option for the
> > sub-protocol.
> > Therefore there's no way to lock a group of workers in a cooperative-only
> > mode at the moment, if we account for accidental joins of workers running
> > at an older version.
> >
> > The changes have been reflected in the KIP doc and will be reflected in the
> > PR in a subsequent commit.
> >
> > Thanks,
> > Konstantine
> >
> > On Thu, Mar 7, 2019 at 1:17 PM Guozhang Wang <wangg...@gmail.com> wrote:
> >
> > > Hi Konstantine,
> > >
> > > Thanks for the updated KIP and the PR as well (which is huge :) I briefly
> > > looked through it as well as the KIP, and I have one minor comment to add
> > > (otherwise I'm binding +1 on it as well) about the backward compatibility.
> > > I'll use one example to illustrate the issue:
> > >
> > > 1) Suppose you have workerA and B on the newer version, configured with
> > > connect.protocol as "compatible". They will send both V0/V1 to the leader
> > > (say it's workerA), who will choose V1 as the current protocol. This will be
> > > sent back to A and B, who will remember that the current protocol version is
> > > already V1. So after this rebalance everyone remembers that V1 can be used,
> > > which means that upon prepareJoin they will not revoke all the assigned
> > > tasks.
> > >
> > > 2) Now let's say a new worker joins but with the old version V0 (practically
> > > this is rare, but for illustration purposes some common scenarios may fall
> > > into this, e.g. an existing worker being downgraded, which is essentially
> > > the same as being kicked out of the group and then rejoining as a new member
> > > on the older version). The leader realizes that at least one of the members
> > > does not know V1 and hence falls back to using version V0 to perform
> > > assignment. The V0 algorithm would do an eager rebalance, which may move some
> > > tasks to the newcomer immediately from the existing members, as it assumes
> > > that everyone would revoke everything before joining (a.k.a. the
> > > sync-barrier). But this is actually not true, since everyone other than the
> > > old-versioned newcomer would still follow the behavior of V1 --- not
> > > revoking anything --- before sending the join group request.
> > >
> > > This could be solvable though, e.g.
> > > when the leader realizes that it needs to use V0 while the previous
> > > "currentProtocol" value is V1, instead of just blindly following the
> > > algorithm of V0 it could just reassign the existing partitions without
> > > migrating anything, while at the same time telling everyone that the
> > > currentProtocol version is downgraded to V0; and then they can trigger
> > > another rebalance based on V0 where everyone will revoke their tasks
> > > before sending join group requests.
> > >
> > > Guozhang
> > >
> > > On Wed, Mar 6, 2019 at 2:28 PM Konstantine Karantasis <konstant...@confluent.io> wrote:
> > >
> > > > I'd like to open the vote on KIP-415: Incremental Cooperative Rebalancing
> > > > in Kafka Connect
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-415%3A+Incremental+Cooperative+Rebalancing+in+Kafka+Connect
> > > >
> > > > a proposal that will allow Kafka Connect to scale significantly the number
> > > > of connectors and tasks it can run in a cluster of Connect workers.
> > > >
> > > > Thanks,
> > > > Konstantine
> > >
> > > --
> > > -- Guozhang
>
> --
> -- Guozhang
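
For readers of the archive, the downgrade handling discussed in this thread can be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not code from the KIP or the PR: the class DowngradeSketch, the Action enum, and the methods selectProtocol, decide and keepAssignment are hypothetical names, and V0/V1 are modeled as plain integers. Selection is by unanimity, as described above: V1 is used only if every member supports it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names are illustrative and do not come from the Connect code base.
public class DowngradeSketch {
    static final int V0 = 0; // eager: everyone revokes everything before joining
    static final int V1 = 1; // cooperative: members keep their tasks while rejoining

    enum Action { COOPERATIVE_ASSIGN, EAGER_ASSIGN, KEEP_ASSIGNMENT_THEN_REBALANCE }

    // Unanimity rule: pick V1 only if every member advertises it, otherwise fall back to V0.
    static int selectProtocol(Map<String, Set<Integer>> supportedByMember) {
        for (Set<Integer> versions : supportedByMember.values()) {
            if (!versions.contains(V1)) {
                return V0;
            }
        }
        return V1;
    }

    // Leader-side decision for one rebalance round.
    static Action decide(int previousProtocol, int selectedProtocol) {
        if (selectedProtocol == V1) {
            return Action.COOPERATIVE_ASSIGN;
        }
        if (previousProtocol == V1) {
            // Downgrade V1 -> V0: existing members did not revoke anything before joining,
            // so do not move tasks now; announce V0 and let a second, eager round follow.
            return Action.KEEP_ASSIGNMENT_THEN_REBALANCE;
        }
        return Action.EAGER_ASSIGN;
    }

    // Re-issue the current assignment unchanged; members without one (e.g. a new joiner) get nothing yet.
    static Map<String, List<String>> keepAssignment(Set<String> members,
                                                    Map<String, List<String>> current) {
        Map<String, List<String>> next = new LinkedHashMap<>();
        for (String member : members) {
            List<String> kept = current.getOrDefault(member, Collections.<String>emptyList());
            next.put(member, new ArrayList<>(kept));
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> group = new LinkedHashMap<>();
        group.put("workerA", new HashSet<>(Arrays.asList(V0, V1)));
        group.put("workerB", new HashSet<>(Arrays.asList(V0, V1)));
        group.put("workerC", new HashSet<>(Arrays.asList(V0))); // old-version joiner

        Map<String, List<String>> current = new LinkedHashMap<>();
        current.put("workerA", Arrays.asList("task-0", "task-1"));
        current.put("workerB", Arrays.asList("task-2"));

        int selected = selectProtocol(group);
        Action action = decide(V1, selected);
        System.out.println("selected=V" + selected + " action=" + action);
        if (action == Action.KEEP_ASSIGNMENT_THEN_REBALANCE) {
            System.out.println(keepAssignment(group.keySet(), current));
        }
    }
}
```

In this sketch the old-version joiner receives an empty assignment in the first round and only picks up tasks in the follow-up V0 rebalance, once every member has revoked its tasks before rejoining.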