[ 
https://issues.apache.org/jira/browse/KAFKA-17232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17897827#comment-17897827
 ] 

Asker commented on KAFKA-17232:
-------------------------------

[~frankvicky] [~gharris1727]
Hello,
I believe we're experiencing the issue described here after upgrading to Kafka 
3.9.0.

After the upgrade, we started seeing the following error repeatedly:
{code:bash}
ERROR [Worker clientId=A->C, groupId=A-mm2] Failed to reconfigure connector's tasks (MirrorCheckpointConnector), retrying after backoff.
org.apache.kafka.connect.errors.RetriableException: Timeout while loading consumer groups.
    at org.apache.kafka.connect.mirror.MirrorCheckpointConnector.taskConfigs(MirrorCheckpointConnector.java:138)
…
{code}
Despite increasing {{admin.timeout.ms}} and related timeouts, the error 
persisted. We also confirmed that ACLs and authentication were correctly 
configured.

{*}Resolution{*}:
We downgraded to Kafka 3.8.1. After the downgrade, the error no longer appeared 
and the {{MirrorCheckpointConnector}} worked correctly.

{*}Conclusion{*}:
It seems that the changes made in Kafka 3.9.0 for this issue may have 
inadvertently introduced this problem. Our clusters are not particularly 
large, so we did not expect the initial consumer group load to time out.

{*}Questions{*}:
1. Is there a known workaround or configuration change that can prevent this 
error in Kafka 3.9.0?
2. Will there be a fix or patch available in an upcoming release?

Please let me know if I can provide any additional information to help resolve 
this issue.

Thank you for your attention to this matter.

Best regards,
Asker Kakhramanov

> MirrorCheckpointConnector does not generate task configs if initial consumer 
> group load times out
> -------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-17232
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17232
>             Project: Kafka
>          Issue Type: Bug
>          Components: mirrormaker
>    Affects Versions: 3.9.0
>            Reporter: Greg Harris
>            Assignee: TengYao Chi
>            Priority: Major
>             Fix For: 3.9.0
>
>
> The MirrorCheckpointConnector has two operations that read the source 
> consumer groups:
>  * loadInitialConsumerGroups
>  * refreshConsumerGroups
> loadInitialConsumerGroups blocks the start() method of the connector, while 
> refreshConsumerGroups is asynchronous and runs periodically while the 
> connector is running.
> loadInitialConsumerGroups may take a long time to execute, and may exceed the 
> configured "admin.timeout.ms" used by the Scheduler. This timeout is logged 
> and the start() method returns normally. If this happens, the framework will 
> generate task configs immediately after start(), before 
> loadInitialConsumerGroups can finish, and will generate an empty set of task 
> configs: 
> [https://github.com/apache/kafka/blob/e2494e6ffb89f8288ed2aeb9b5596c755210bffd/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorCheckpointConnector.java#L118-L121].
> Later, when loadInitialConsumerGroups completes, it will not request task 
> reconfiguration, believing it is the initial load operation.
> Later still, when refreshConsumerGroups completes, it will not request task 
> reconfiguration, as the set of consumer groups has not changed since the 
> initial load: 
> [https://github.com/apache/kafka/blob/e2494e6ffb89f8288ed2aeb9b5596c755210bffd/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorCheckpointConnector.java#L173-L180]
>  
> This leads to a situation where the MirrorCheckpointConnector believes it has 
> converged with nothing to update, but actually has consumer groups that are 
> not allocated to tasks.
> This happens particularly for large, stable Kafka clusters with many consumer 
> groups that are not being actively created or deleted.
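
For readers following the thread, the sequence in the quoted description can be sketched as a small standalone Java model. This is only an illustration under stated assumptions, not the actual MirrorCheckpointConnector code: the class CheckpointRaceSketch and its methods completeInitialLoad, refresh, and reconfigRequests are hypothetical names, and each task config is reduced to a single group name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Simplified, hypothetical model of the behavior described in the issue.
// It does not use or reproduce the real Kafka Connect classes.
final class CheckpointRaceSketch {
    // Consumer groups the connector currently knows about; empty until a load completes.
    private Set<String> knownGroups = Set.of();
    // Counts how often the connector would have asked the framework to reconfigure tasks.
    private int reconfigRequests = 0;

    // Stand-in for the framework asking for task configs: one entry per known group.
    public List<String> taskConfigs() {
        return new ArrayList<>(new TreeSet<>(knownGroups));
    }

    // Stand-in for the initial load finishing late: it stores the groups but,
    // being the "initial" load, never requests task reconfiguration.
    public void completeInitialLoad(Set<String> found) {
        knownGroups = found;
    }

    // Stand-in for the periodic refresh: it requests reconfiguration only
    // when the set of groups differs from what is already known.
    public void refresh(Set<String> found) {
        if (!found.equals(knownGroups)) {
            knownGroups = found;
            reconfigRequests++;
        }
    }

    public int reconfigRequests() {
        return reconfigRequests;
    }

    public static void main(String[] args) {
        CheckpointRaceSketch connector = new CheckpointRaceSketch();
        Set<String> sourceGroups = Set.of("group-a", "group-b");

        // 1. start() has returned after the load timed out; task configs are generated now.
        System.out.println("task configs right after start: " + connector.taskConfigs());

        // 2. The initial load completes later, without requesting reconfiguration.
        connector.completeInitialLoad(sourceGroups);

        // 3. The periodic refresh sees the same groups, so nothing appears to have changed.
        connector.refresh(sourceGroups);
        System.out.println("reconfiguration requests: " + connector.reconfigRequests());
    }
}
```

In this model the connector ends up knowing both groups, yet the only task configs ever generated were the empty set from step 1, which corresponds to the "converged with nothing to update" state in the description.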



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
