[jira] [Commented] (KAFKA-13816) Downgrading Connect rebalancing protocol from incremental to eager causes duplicate task instances

Chris Egerton (Jira) Tue, 19 Apr 2022 14:53:05 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-13816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524601#comment-17524601
 ]


Chris Egerton commented on KAFKA-13816:
---------------------------------------

[~sagarrao] I think you're on the right track, but it's a little trickier than 
that since the leader won't necessarily die if it fails to read the config 
topic during a rebalance. In some (probably most) cases, it'll just issue an 
assignment to everyone with the error field set to 
[CONFIG_MISMATCH|https://github.com/apache/kafka/blob/9c3f605fc78f297ecf5accdcdec18471c19cf7d6/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/ConnectProtocol.java#L279].
 This also means that the leader (unless it dies for unrelated reasons) will 
participate in the next rebalance and remain the leader.
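
To make that behavior concrete, here is a rough sketch of a leader that flags 
an error on every assignment instead of failing. It is a simplified model, not 
the actual Connect code: the class, the method, the round-robin distribution, 
and the numeric error values are all made up for illustration, and it assumes 
that an assignment carrying the error has no tasks in it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model for illustration; none of these types are the real Connect classes.
class ConfigMismatchSketch {
    static final short NO_ERROR = 0;
    static final short CONFIG_MISMATCH = 1; // value chosen for this sketch

    // Hypothetical stand-in for the assignment sent to a single worker.
    static final class Assignment {
        final short error;
        final List<String> tasks;

        Assignment(short error, List<String> tasks) {
            this.error = error;
            this.tasks = tasks;
        }
    }

    // The leader is just another entry in `members`. When it could not read the
    // config topic, it does not throw: every member, the leader included, gets an
    // empty assignment with the error field set.
    static Map<String, Assignment> assign(List<String> members,
                                          List<String> tasks,
                                          boolean leaderReadConfigTopic) {
        Map<String, Assignment> result = new LinkedHashMap<>();
        if (members.isEmpty()) {
            return result;
        }
        short error = leaderReadConfigTopic ? NO_ERROR : CONFIG_MISMATCH;
        for (String member : members) {
            result.put(member, new Assignment(error, new ArrayList<>()));
        }
        if (!leaderReadConfigTopic) {
            return result;
        }
        // Healthy path: spread the tasks round-robin across the members.
        for (int i = 0; i < tasks.size(); i++) {
            String owner = members.get(i % members.size());
            result.get(owner).tasks.add(tasks.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> members = List.of("leader", "worker-2");
        List<String> tasks = List.of("conn-0", "conn-1", "conn-2");
        for (Map.Entry<String, Assignment> e : assign(members, tasks, false).entrySet()) {
            System.out.println(e.getKey() + " error=" + e.getValue().error
                    + " tasks=" + e.getValue().tasks);
        }
    }
}
```

The point of the sketch is only that the failed read shows up as data in the 
assignment rather than as an exception, so the leader is still a group member 
when the next rebalance starts.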

Beyond that, I'm sorry but I don't really have time to focus on a design for 
this right now; it's why I've created this ticket but haven't assigned it to 
myself.

> Downgrading Connect rebalancing protocol from incremental to eager causes 
> duplicate task instances
> --------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-13816
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13816
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>            Reporter: Chris Egerton
>            Assignee: Sagar Rao
>            Priority: Major
>
> The rebalancing protocol for a Kafka Connect cluster can be downgraded from 
> incremental to eager by adding a worker to the cluster with 
> {{connect.protocol}} set to {{eager}}, or by stopping an existing worker 
> in that cluster, reconfiguring it with the new protocol, and restarting it.
> When the worker (re)joins the cluster, a rebalance takes place using the 
> eager protocol, and duplicate task instances are created on the cluster.
> This occurs because:
>  * The leader does not send out an assignment that revokes all connectors and 
> tasks for the cluster during that round
>  * Workers do not respond to the downgrade in protocol by revoking all 
> connectors and tasks that they were running before the rebalance that are not 
> included in the new assignment they received during the rebalance
> It's likely that this bug hasn't surfaced sooner because any subsequent 
> rebalance should cause all connectors and tasks on each worker in the cluster 
> to be proactively revoked before the worker rejoins the group.
> [KIP-415|https://cwiki.apache.org/confluence/display/KAFKA/KIP-415%3A+Incremental+Cooperative+Rebalancing+in+Kafka+Connect#KIP415:IncrementalCooperativeRebalancinginKafkaConnect-Compatibility,Deprecation,andMigrationPlan]
>  provides one way to address this:
> {quote}To downgrade your cluster to use protocol version 0 from version 1 or 
> higher with {{eager}} rebalancing policy what is required is to switch one of 
> the workers back to {{eager}} mode. 
> {{connect.protocol = eager}}
> Once this worker joins, the group will downgrade to protocol version 0 and 
> {{eager}} rebalancing policy, with immediately release of resources upon 
> joining the group. This process will require a one-time double rebalancing, 
> with the leader detecting the downgrade and first sending a downgraded 
> assignment with empty assigned connectors and tasks and from then on just 
> regular downgraded assignments. 
> {quote}
> However, it's unclear how to accomplish the second round in the double 
> rebalance described above.
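
The second cause listed in the description (workers keeping connectors and 
tasks that are missing from their new assignment) comes down to a set 
difference that is never acted on. A minimal sketch of that difference follows, 
assuming tasks can be identified by plain strings; the class and method names 
are hypothetical, and this is not a proposed fix for the ticket.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative only; not part of the Connect runtime.
class DowngradeRevocationSketch {
    // Returns the tasks that were running before the rebalance but are absent
    // from the assignment received during it. These are the instances that end
    // up duplicated if the worker leaves them running after the downgrade.
    static Set<String> staleTasks(Set<String> runningBefore, Set<String> newlyAssigned) {
        Set<String> stale = new LinkedHashSet<>(runningBefore);
        stale.removeAll(newlyAssigned);
        return stale;
    }

    public static void main(String[] args) {
        Set<String> running = new LinkedHashSet<>(Set.of("a-0", "a-1", "b-0"));
        Set<String> assigned = new LinkedHashSet<>(Set.of("a-1", "c-0"));
        System.out.println("stale tasks: " + staleTasks(running, assigned));
    }
}
```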



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
