Re: [DISCUSS] KIP-415: Incremental Cooperative Rebalancing in Kafka Connect

Boyang Chen Mon, 14 Jan 2019 11:24:29 -0800

Hey Konstantine,

Great work making this happen! Incremental rebalancing is very important 
for avoiding unnecessary resource shuffling and improving the overall 
stability of the Connect framework.

After a first pass, two questions come to mind:

  1.  As I understand it, the general rebalancing case could be covered by 
configuring the workers as static members, so that we don't need to worry about 
a worker temporarily leaving the group. Basically, KIP-345 could help avoid 
unexpected rebalances during a rolling bounce of the cluster, so I agree with 
Stanislav that parts of the KIP-415 logic could be simplified. It would be 
great if we could look at these two initiatives holistically to reduce the 
overlapping work.
  2.  Since I have never used Connect before, I hope you can enlighten me on 
the effort involved in transferring a task between workers. I ask in order to 
estimate how much overhead we introduce by starting a task on a brand-new 
worker. Is there any local state to be replayed? It would be good to also 
provide this background in the KIP motivation so that people can better 
understand the symptoms and give constructive feedback.

Thanks a lot!

Boyang
________________________________
From: Stanislav Kozlovski <stanis...@confluent.io>
Sent: Monday, January 14, 2019 3:15 PM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-415: Incremental Cooperative Rebalancing in Kafka 
Connect

Hey Konstantine,

This is a very exciting KIP and a fundamental improvement, thanks a lot for
working on it!

Have you seen KIP-345
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-345>? I was
wondering whether Connect would support static group membership,
potentially limiting the need to handle "node bounce" cases through a
rebalance (even though there wouldn't be downtime). I find it somewhat
related to the `scheduled.rebalance.max.delay.ms` config described in
KIP-415. The main difference, I think, is that the rebalance delay in KIP-345
is configurable through `session.timeout.ms`, which is tied to the liveness
heartbeat, whereas here we have a separate config.
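
To make the comparison concrete, here is a minimal sketch of what a rebalance
delay buys us in either design: the tasks of a departed worker are held back
for a window, and only redistributed if the worker has not returned when the
window closes. This is a toy model, not Connect's implementation; the class
`DelayedRebalance` and its methods are hypothetical names, and `max_delay_ms`
merely stands in for whichever config bounds the window.

```python
class DelayedRebalance:
    """Toy model: a departed worker's tasks stay unassigned until a delay expires."""

    def __init__(self, max_delay_ms, assignments):
        self.max_delay_ms = max_delay_ms
        # worker -> set of tasks currently running on it
        self.assignments = {w: set(t) for w, t in assignments.items()}
        # departed worker -> (tasks held back, deadline in ms)
        self.pending = {}

    def worker_left(self, worker, now_ms):
        tasks = self.assignments.pop(worker, set())
        if tasks:
            self.pending[worker] = (tasks, now_ms + self.max_delay_ms)

    def worker_joined(self, worker, now_ms):
        self.expire(now_ms)  # first hand out anything whose window has closed
        tasks, _ = self.pending.pop(worker, (set(), None))
        # A worker that returns in time gets its old tasks back untouched.
        self.assignments.setdefault(worker, set()).update(tasks)

    def expire(self, now_ms):
        if not self.assignments:
            return  # nobody left to take the tasks; keep them pending
        for worker, (tasks, deadline) in list(self.pending.items()):
            if now_ms >= deadline:
                del self.pending[worker]
                for task in sorted(tasks):
                    # Give each task to the least loaded remaining worker.
                    target = min(sorted(self.assignments),
                                 key=lambda w: len(self.assignments[w]))
                    self.assignments[target].add(task)


group = DelayedRebalance(1000, {"W1": {"T1"}, "W2": {"T2"}})
group.worker_left("W2", now_ms=0)
group.worker_joined("W2", now_ms=500)   # back within the window
```

In this model a bounce shorter than the window leaves every other worker's
assignment unchanged, which is the effect both KIPs are after.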

The original design document suggested:
> Assignment response includes usual assignment information. Start
> processing any new partitions. (Since we expect sticky assignment, we could
> also optimize this and omit the assignment when it is just repeating a
> previous assignment)
Have we decided whether we would make use of this optimization, so as not to
send an assignment that the worker already knows about?
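
As I read it, the optimization amounts to something like the sketch below.
The function name and the use of `None` to mean "assignment omitted" are my
own assumptions for illustration, not anything taken from the KIP or the
design document.

```python
def assignment_to_send(previous, current):
    """Return the assignment to put in the response, or None to omit it.

    `previous` is what the worker already holds (None if it has never
    received an assignment); `current` is what the leader just computed.
    """
    if previous is not None and set(previous) == set(current):
        return None  # sticky assignment repeated: nothing new to tell the worker
    return sorted(current)


unchanged = assignment_to_send(["T1", "T2"], ["T2", "T1"])
changed = assignment_to_send(["T1"], ["T1", "T3"])
```

Here `unchanged` is `None` because the worker would learn nothing new, while
`changed` carries the full new assignment.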

I enjoyed reading the rebalancing examples. As a small readability
improvement, could I suggest we clarify which worker (W1, W2, W3) is the
leader in the "Initial group and assignment" part? For example, in the
`Leader bounces` example I was left wondering whether the departing W2 was
the initial leader or not.

Thanks,
Stanislav

On Sat, Jan 12, 2019 at 1:44 AM Konstantine Karantasis <
konstant...@confluent.io> wrote:

> Hi all,
>
> I just published KIP-415: Incremental Cooperative Rebalancing in Kafka
> Connect
> on the wiki here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>
> This is the first KIP to suggest an implementation of incremental and
> cooperative rebalancing in the context of Kafka Connect. It aims to provide
> an adequate solution to the stop-the-world effect that occurs in a Connect
> cluster whenever a new connector configuration is submitted or a Connect
> Worker is added to or removed from the cluster.
>
> Looking forward to your insightful feedback!
>
> Regards,
> Konstantine
>

--
Best,
Stanislav
