George Yang created KAFKA-18386:
-----------------------------------

             Summary: Mirror Maker2 Pod CrashLoopBackoff When one DC is powered off
                 Key: KAFKA-18386
                 URL: https://issues.apache.org/jira/browse/KAFKA-18386
             Project: Kafka
          Issue Type: Bug
          Components: mirrormaker
    Affects Versions: 3.7.1
            Reporter: George Yang


In a Kubernetes deployment running MirrorMaker v3.7.1 with one Kafka node in 
each data center (DC1 and DC2), powering off DC1 causes the MirrorMaker 2 pod 
in DC2 to enter CrashLoopBackOff. This issue is different from the one 
described in KAFKA-17784. The log from the crashing pod is below:


```log
[2025-01-01 08:05:53,432] WARN [AdminClient clientId=dc64->dc88] Connection to node -1 (/192.168.2.88:13399) could not be established. Node may not be available. (org.apache.kafka.clients.NetworkClient:830)[kafka-admin-client-thread | dc64->dc88]
[2025-01-01 08:05:55,652] INFO [AdminClient clientId=dc64->dc88] Metadata update failed (org.apache.kafka.clients.admin.internals.AdminMetadataManager:267)[kafka-admin-client-thread | dc64->dc88]
org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: fetchMetadata
[2025-01-01 08:05:55,653] INFO App info kafka.admin.client for dc64->dc88 unregistered (org.apache.kafka.common.utils.AppInfoParser:88)[kafka-admin-client-thread | dc64->dc88]
[2025-01-01 08:05:55,653] INFO [AdminClient clientId=dc64->dc88] Metadata update failed (org.apache.kafka.clients.admin.internals.AdminMetadataManager:267)[kafka-admin-client-thread | dc64->dc88]
org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: fetchMetadata
[2025-01-01 08:05:55,653] INFO [AdminClient clientId=dc64->dc88] Timed out 1 remaining operation(s) during close. (org.apache.kafka.clients.admin.KafkaAdminClient:1450)[kafka-admin-client-thread | dc64->dc88]
[2025-01-01 08:05:55,657] INFO Metrics scheduler closed (org.apache.kafka.common.metrics.Metrics:684)[kafka-admin-client-thread | dc64->dc88]
[2025-01-01 08:05:55,658] INFO Closing reporter org.apache.kafka.common.metrics.JmxReporter (org.apache.kafka.common.metrics.Metrics:688)[kafka-admin-client-thread | dc64->dc88]
[2025-01-01 08:05:55,658] INFO Metrics reporters closed (org.apache.kafka.common.metrics.Metrics:694)[kafka-admin-client-thread | dc64->dc88]
[2025-01-01 08:05:55,658] ERROR Stopping due to error (org.apache.kafka.connect.mirror.MirrorMaker:360)[main]
org.apache.kafka.connect.errors.ConnectException: Failed to connect to and describe Kafka cluster. Check worker's broker connection and security properties.
        at org.apache.kafka.connect.runtime.WorkerConfig.lookupKafkaClusterId(WorkerConfig.java:305)
        at org.apache.kafka.connect.runtime.WorkerConfig.lookupKafkaClusterId(WorkerConfig.java:285)
        at org.apache.kafka.connect.runtime.WorkerConfig.kafkaClusterId(WorkerConfig.java:415)
        at org.apache.kafka.connect.mirror.MirrorMaker.addHerder(MirrorMaker.java:252)
        at java.base/java.lang.Iterable.forEach(Unknown Source)
        at org.apache.kafka.connect.mirror.MirrorMaker.<init>(MirrorMaker.java:158)
        at org.apache.kafka.connect.mirror.MirrorMaker.<init>(MirrorMaker.java:170)
        at org.apache.kafka.connect.mirror.MirrorMaker.<init>(MirrorMaker.java:174)
        at org.apache.kafka.connect.mirror.MirrorMaker.main(MirrorMaker.java:347)
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: listNodes
        at java.base/java.util.concurrent.CompletableFuture.reportGet(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.get(Unknown Source)
        at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
        at org.apache.kafka.connect.runtime.WorkerConfig.lookupKafkaClusterId(WorkerConfig.java:299)
        ... 8 more
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: listNodes
[2025-01-01 08:05:55,687] INFO Stopped http_8083@6705fb02{HTTP/1.1, (http/1.1)}{0.0.0.0:8083} (org.eclipse.jetty.server.AbstractConnector:383)[JettyShutdownThread]
```
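
For anyone triaging similar logs, the fatal record and its exception chain can be pulled out with a small script. This is only a sketch: the helper name `root_causes` and the embedded `sample` are illustrative, and it assumes each log record and each `Caused by:` entry sits on a single line (any email line wrapping has to be undone first).

```python
import re

# Matches the timestamped header of a log record, e.g.
# "[2025-01-01 08:05:55,658] ERROR Stopping due to error ..."
RECORD = re.compile(r"^\[(?P<ts>[\d\- :,]+)\] (?P<level>[A-Z]+) (?P<msg>.*)$")


def root_causes(log_text):
    """Return (error_records, caused_by_messages) found in the log text.

    Hypothetical helper for illustration; not part of Kafka or MirrorMaker.
    """
    errors, caused_by = [], []
    for raw in log_text.splitlines():
        line = raw.strip()
        match = RECORD.match(line)
        if match and match.group("level") == "ERROR":
            errors.append((match.group("ts"), match.group("msg")))
        elif line.startswith("Caused by:"):
            caused_by.append(line[len("Caused by:"):].strip())
    return errors, caused_by


# A few records copied from the log in this report, one per line.
sample = """\
[2025-01-01 08:05:55,658] INFO Metrics reporters closed (org.apache.kafka.common.metrics.Metrics:694)[kafka-admin-client-thread | dc64->dc88]
[2025-01-01 08:05:55,658] ERROR Stopping due to error (org.apache.kafka.connect.mirror.MirrorMaker:360)[main]
org.apache.kafka.connect.errors.ConnectException: Failed to connect to and describe Kafka cluster. Check worker's broker connection and security properties.
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: listNodes
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: listNodes
"""

errors, causes = root_causes(sample)
for ts, msg in errors:
    print("ERROR at", ts, "->", msg)
if causes:
    # The last "Caused by" entry is the innermost exception in the chain.
    print("innermost cause:", causes[-1])
```

In this report the innermost cause is the `listNodes` timeout raised while `WorkerConfig.lookupKafkaClusterId` runs inside the `MirrorMaker` constructor.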

The MirrorMaker configuration is:
```
clusters = dc64, dc88

dc64.bootstrap.servers = 192.168.2.64:13399
dc88.bootstrap.servers = 192.168.2.88:13399
dc64->dc88.enabled = true
dc64->dc88.topics = .*

dc88->dc64.enabled = true
dc88->dc64.topics = .*

replication.factor=1

tasks.max=6
emit.checkpoints.interval.seconds=5
dc64.producer.acks=all
dc64.producer.batch.size=50000

dc64.consumer.auto.offset.reset=latest
dc88.consumer.auto.offset.reset=latest
dc64.consumer.max.poll.interval.ms=20000
dc88.consumer.max.poll.interval.ms=20000

refresh.topics.enabled=true
refresh.topics.interval.seconds=5

refresh.groups.enabled=true
refresh.groups.interval.seconds=5

dedicated.mode.enable.internal.rest = true
dc64.scheduled.rebalance.max.delay.ms=20000
dc88.scheduled.rebalance.max.delay.ms=20000

checkpoints.topic.replication.factor=1
heartbeats.topic.replication.factor=1
offset-syncs.topic.replication.factor=1

offset.storage.replication.factor=1
status.storage.replication.factor=1
config.storage.replication.factor=1
```
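
To help reproduce the condition, the sketch below lists the enabled replication flows from a configuration like the one above and can probe whether a bootstrap address accepts TCP connections. The function names are hypothetical, the parser handles only simple `key = value` lines, and probing the target cluster's address is an assumption based on the log, where the `dc64->dc88` admin client connects to `192.168.2.88:13399`.

```python
import socket


def parse_mm2_properties(text):
    """Parse simple 'key = value' properties lines into a dict."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def enabled_flows(props):
    """Return (source, target, target_bootstrap) for each enabled flow."""
    clusters = [c.strip() for c in props.get("clusters", "").split(",") if c.strip()]
    flows = []
    for src in clusters:
        for dst in clusters:
            enabled = props.get(f"{src}->{dst}.enabled", "false").lower() == "true"
            if src != dst and enabled:
                flows.append((src, dst, props.get(f"{dst}.bootstrap.servers")))
    return flows


def is_reachable(bootstrap, timeout=3.0):
    """Best-effort TCP probe of the first host:port in a bootstrap list."""
    host, _, port = bootstrap.split(",")[0].strip().rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        return False


# Excerpt of the configuration from this report.
CONFIG = """\
clusters = dc64, dc88
dc64.bootstrap.servers = 192.168.2.64:13399
dc88.bootstrap.servers = 192.168.2.88:13399
dc64->dc88.enabled = true
dc88->dc64.enabled = true
"""

flows = enabled_flows(parse_mm2_properties(CONFIG))
for src, dst, bootstrap in flows:
    print(f"{src}->{dst}: target bootstrap {bootstrap}")
    # Uncomment to probe the network from inside the pod:
    # print("  reachable:", is_reachable(bootstrap))
```

A TCP probe only shows that a port accepts connections; it does not confirm that the broker can serve metadata requests.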



--
This message was sent by Atlassian Jira
(v8.20.10#820010)