Thanks Konstantine, I've read the updated section on
https://cwiki.apache.org/confluence/display/KAFKA/KIP-415%3A+Incremental+Cooperative+Rebalancing+in+Kafka+Connect
and it lgtm.

I'm +1 on the KIP.


Guozhang


On Thu, Mar 7, 2019 at 2:35 PM Konstantine Karantasis <
konstant...@confluent.io> wrote:

> Thanks Guozhang. This is a valid observation regarding the current status
> of the PR.
>
> I updated the KIP to explicitly call out how the downgrade process should
> work in the section Compatibility, Deprecation, and Migration.
>
> Additionally, I reduced the configuration modes for the connect.protocol to
> only two: eager and compatible.
> That's because there's no way at the moment to select a protocol based on
> simple majority and not unanimity across at least one option for the
> sub-protocol.
> Therefore there's no way to lock a group of workers in a cooperative-only
> mode at the moment, if we account for accidental joins of workers running
> at an older version.
>
> The changes have been reflected in the KIP doc and will be reflected in the
> PR in a subsequent commit.
>
> Thanks,
> Konstantine
>
>
> On Thu, Mar 7, 2019 at 1:17 PM Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Hi Konstantine,
> >
> > Thanks for the updated KIP and the PR as well (which is huge :) I briefly
> > looked through it as well as the KIP, and I have one minor comment to add
> > (otherwise I'm binding +1 on it as well) about the backward
> compatibility.
> > I'll use one example to illustrate the issue:
> >
> > 1) Suppose you have workerA and B on newer version and configured the
> > connect.protocol as "compatible", they will send both V0/V1 to the leader
> > (say it's workerA) who will choose V1 as the current protocol, this will
> be
> > sent back to A and B who would remember the current protocol version is
> > already V1. So after this rebalance everyone remembers that V1 can be
> used,
> > which means that upon prepareJoin they will not revoke all the assigned
> > tasks.
> >
> > 2) Now let's say a new worker joins but with old version V0 (practically
> > this is rare, but for illustration purposes some common scenarios may
> falls
> > into this, e.g. an existing worker being downgraded, which is essentially
> > as being kicked out of the group, and then rejoined as a new member on
> the
> > older version), the leader realized that at least one of the member does
> > not know V1 and hence would fall back to use version V0 to perform
> > assignment. V0 algorithm would do eager rebalance which may move some
> tasks
> > to the new comer immediately from the existing members, as it assumes
> that
> > everyone would revoke everything before join (a.k.a the sync-barrier) but
> > this is actually not true, since everyone other than the old versioned
> new
> > comer would still follow the behavior of V1 --- not revoking anything ---
> > before sending the join group request.
> >
> > This could be solvable though, e.g. when leader realized that he needs to
> > use V0, while the previous "currentProtocol" value is V1, instead of just
> > blindly follow the algorithm of V0 it could just reassign the existing
> > partitions without migrating anything, while at the same time tell
> everyone
> > that the currentProtocol version is downgraded to V0; and then they can
> > trigger another rebalance based on V0 where everything will revoke the
> > tasks before sending join group requests.
> >
> >
> > Guozhang
> >
> > On Wed, Mar 6, 2019 at 2:28 PM Konstantine Karantasis <
> > konstant...@confluent.io> wrote:
> >
> > > I'd like to open the vote on KIP-415: Incremental Cooperative
> Rebalancing
> > > in Kafka Connect
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-415%3A+Incremental+Cooperative+Rebalancing+in+Kafka+Connect
> > >
> > > a proposal that will allow Kafka Connect to scale significantly the
> > number
> > > of connectors and tasks it can run in a cluster of Connect workers.
> > >
> > > Thanks,
> > > Konstantine
> > >
> >
> >
> > --
> > -- Guozhang
> >
>


-- 
-- Guozhang

Reply via email to