[
https://issues.apache.org/jira/browse/KAFKA-13816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524601#comment-17524601
]
Chris Egerton commented on KAFKA-13816:
---------------------------------------
[~sagarrao] I think you're on the right track, but it's a little trickier than
that since the leader won't necessarily die if it fails to read the config
topic during a rebalance. In some (probably most) cases, it'll just issue an
assignment to everyone with the error field set to
[CONFIG_MISMATCH|https://github.com/apache/kafka/blob/9c3f605fc78f297ecf5accdcdec18471c19cf7d6/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/ConnectProtocol.java#L279].
This also means that the leader (unless it dies for unrelated reasons) will
participate in the next rebalance and remain the leader.
Beyond that, I'm sorry but I don't really have time to focus on a design for
this right now; it's why I've created this ticket but haven't assigned it to
myself.
> Downgrading Connect rebalancing protocol from incremental to eager causes
> duplicate task instances
> --------------------------------------------------------------------------------------------------
>
> Key: KAFKA-13816
> URL: https://issues.apache.org/jira/browse/KAFKA-13816
> Project: Kafka
> Issue Type: Bug
> Components: KafkaConnect
> Reporter: Chris Egerton
> Assignee: Sagar Rao
> Priority: Major
>
> The rebalancing protocol for a Kafka Connect cluster can be downgraded from
> incremental to eager by adding a worker to the cluster with
> {{connect.protocol}} set to {{{}eager{}}}, or by stopping an existing worker
> in that cluster, reconfiguring it with the new protocol, and restarting it.
> When the worker (re)joins the cluster, a rebalance takes place using the
> eager protocol, and duplicate task instances are created on the cluster.
> This occurs because:
> * The leader does not send out an assignment that revokes all connectors and
> tasks for the cluster during that round
> * Workers do not respond to the downgrade in protocol by revoking all
> connectors and tasks that they were running before the rebalance that are not
> included in the new assignment they received during the rebalance
> It's likely that this bug hasn't surfaced sooner because any subsequent
> rebalance should cause all connectors and tasks on each worker in the cluster
> to be proactively revoked before the worker rejoins the group.
> [KIP-415|https://cwiki.apache.org/confluence/display/KAFKA/KIP-415%3A+Incremental+Cooperative+Rebalancing+in+Kafka+Connect#KIP415:IncrementalCooperativeRebalancinginKafkaConnect-Compatibility,Deprecation,andMigrationPlan]
> provides one way to address this:
> {quote}To downgrade your cluster to use protocol version 0 from version 1 or
> higher with {{eager}} rebalancing policy what is required is to switch one of
> the workers back to {{eager}} mode.
> {{connect.protocol = eager}}
> Once this worker joins, the group will downgrade to protocol version 0 and
> {{eager}} rebalancing policy, with immediately release of resources upon
> joining the group. This process will require a one-time double rebalancing,
> with the leader detecting the downgrade and first sending a downgraded
> assignment with empty assigned connectors and tasks and from then on just
> regular downgraded assignments.
> {quote}
> However, it's unclear how to accomplish the second round in the double
> rebalance described above.
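
To make the failure mode and the KIP-415 remedy concrete, here is a toy model in plain Java. It is only a sketch of the behavior described in this ticket, not Connect runtime code: the {{Worker}} class, {{eagerRound}}, {{duplicates}}, the worker names and the task names are all hypothetical. It also assumes that a worker coming from the incremental protocol keeps whatever it was already running when it receives its first eager assignment (the second bullet above), and it sidesteps the open question of how the second round gets triggered by simply calling {{eagerRound}} twice.

```java
import java.util.*;

// Toy model only: none of these types or methods exist in Kafka Connect.
class DowngradeSketch {
    static final class Worker {
        final Set<String> running = new TreeSet<>();
        boolean eager; // protocol this worker followed going into the rebalance

        Worker(boolean eager, String... tasks) {
            this.eager = eager;
            running.addAll(Arrays.asList(tasks));
        }
    }

    // One rebalance round in which the group has settled on the eager protocol.
    static void eagerRound(Map<String, Worker> workers, Map<String, Set<String>> assignment) {
        for (Map.Entry<String, Worker> e : workers.entrySet()) {
            Worker w = e.getValue();
            // Workers already on eager revoke everything before rejoining.
            // Workers coming from incremental do not (the assumed bug).
            if (w.eager) w.running.clear();
            w.running.addAll(assignment.getOrDefault(e.getKey(), Set.of()));
            w.eager = true; // from now on the worker follows the eager protocol
        }
    }

    // Tasks that are running on more than one worker.
    static Set<String> duplicates(Map<String, Worker> workers) {
        Set<String> seen = new TreeSet<>(), dups = new TreeSet<>();
        for (Worker w : workers.values())
            for (String t : w.running)
                if (!seen.add(t)) dups.add(t);
        return dups;
    }

    // Two incremental workers with tasks, plus a new worker configured as eager.
    static Map<String, Worker> cluster() {
        Map<String, Worker> workers = new TreeMap<>();
        workers.put("w1", new Worker(false, "t0", "t1"));
        workers.put("w2", new Worker(false, "t2", "t3"));
        workers.put("w3", new Worker(true));
        return workers;
    }

    // A from-scratch eager assignment that moves some tasks between workers.
    static Map<String, Set<String>> regularAssignment() {
        return Map.of("w1", Set.of("t0", "t3"), "w2", Set.of("t1"), "w3", Set.of("t2"));
    }

    public static void main(String[] args) {
        // Single round: the leader sends a regular assignment right away.
        Map<String, Worker> buggy = cluster();
        eagerRound(buggy, regularAssignment());
        System.out.println("single round:  " + duplicates(buggy)); // [t1, t2, t3]

        // KIP-415 style: empty assignment first, regular assignment second.
        Map<String, Worker> kip = cluster();
        eagerRound(kip, Map.of());
        System.out.println("after round 1: " + duplicates(kip));   // []
        eagerRound(kip, regularAssignment());
        System.out.println("after round 2: " + duplicates(kip));   // []
    }
}
```

In this model the empty first round is what prevents duplicates: nobody starts anything new while the old tasks are still running, and by the second round every worker is on the eager protocol and revokes everything before rejoining. The same effect explains why the duplicates in the single-round case would disappear at the next rebalance.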