[ https://issues.apache.org/jira/browse/KAFKA-18386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Greg Harris resolved KAFKA-18386.
---------------------------------
    Resolution: Won't Fix

> Mirror Maker2 Pod CrashLoopBackoff When one DC is powered off
> -------------------------------------------------------------
>
>                 Key: KAFKA-18386
>                 URL: https://issues.apache.org/jira/browse/KAFKA-18386
>             Project: Kafka
>          Issue Type: Bug
>          Components: mirrormaker
>    Affects Versions: 3.7.1
>            Reporter: George Yang
>            Priority: Major
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When using Kubernetes deployment with MirrorMaker v3.7.1 and deploying one
> Kafka node in each data center (DC1 and DC2), if DC1 is powered off, DC2 will
> encounter a CrashLoopBackOff error. This issue is different from the one
> described in KAFKA-17784. Please find the report log below:
> ```log
> [2025-01-01 08:05:53,432] WARN [AdminClient clientId=dc64->dc88] Connection to node -1 (/192.168.2.88:13399) could not be established. Node may not be available. (org.apache.kafka.clients.NetworkClient:830)[kafka-admin-client-thread | dc64->dc88]
> [2025-01-01 08:05:55,652] INFO [AdminClient clientId=dc64->dc88] Metadata update failed (org.apache.kafka.clients.admin.internals.AdminMetadataManager:267)[kafka-admin-client-thread | dc64->dc88]
> org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: fetchMetadata
> [2025-01-01 08:05:55,653] INFO App info kafka.admin.client for dc64->dc88 unregistered (org.apache.kafka.common.utils.AppInfoParser:88)[kafka-admin-client-thread | dc64->dc88]
> [2025-01-01 08:05:55,653] INFO [AdminClient clientId=dc64->dc88] Metadata update failed (org.apache.kafka.clients.admin.internals.AdminMetadataManager:267)[kafka-admin-client-thread | dc64->dc88]
> org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: fetchMetadata
> [2025-01-01 08:05:55,653] INFO [AdminClient clientId=dc64->dc88] Timed out 1 remaining operation(s) during close. (org.apache.kafka.clients.admin.KafkaAdminClient:1450)[kafka-admin-client-thread | dc64->dc88]
> [2025-01-01 08:05:55,657] INFO Metrics scheduler closed (org.apache.kafka.common.metrics.Metrics:684)[kafka-admin-client-thread | dc64->dc88]
> [2025-01-01 08:05:55,658] INFO Closing reporter org.apache.kafka.common.metrics.JmxReporter (org.apache.kafka.common.metrics.Metrics:688)[kafka-admin-client-thread | dc64->dc88]
> [2025-01-01 08:05:55,658] INFO Metrics reporters closed (org.apache.kafka.common.metrics.Metrics:694)[kafka-admin-client-thread | dc64->dc88]
> [2025-01-01 08:05:55,658] ERROR Stopping due to error (org.apache.kafka.connect.mirror.MirrorMaker:360)[main]
> org.apache.kafka.connect.errors.ConnectException: Failed to connect to and describe Kafka cluster. Check worker's broker connection and security properties.
>     at org.apache.kafka.connect.runtime.WorkerConfig.lookupKafkaClusterId(WorkerConfig.java:305)
>     at org.apache.kafka.connect.runtime.WorkerConfig.lookupKafkaClusterId(WorkerConfig.java:285)
>     at org.apache.kafka.connect.runtime.WorkerConfig.kafkaClusterId(WorkerConfig.java:415)
>     at org.apache.kafka.connect.mirror.MirrorMaker.addHerder(MirrorMaker.java:252)
>     at java.base/java.lang.Iterable.forEach(Unknown Source)
>     at org.apache.kafka.connect.mirror.MirrorMaker.<init>(MirrorMaker.java:158)
>     at org.apache.kafka.connect.mirror.MirrorMaker.<init>(MirrorMaker.java:170)
>     at org.apache.kafka.connect.mirror.MirrorMaker.<init>(MirrorMaker.java:174)
>     at org.apache.kafka.connect.mirror.MirrorMaker.main(MirrorMaker.java:347)
> Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: listNodes
>     at java.base/java.util.concurrent.CompletableFuture.reportGet(Unknown Source)
>     at java.base/java.util.concurrent.CompletableFuture.get(Unknown Source)
>     at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
>     at org.apache.kafka.connect.runtime.WorkerConfig.lookupKafkaClusterId(WorkerConfig.java:299)
>     ... 8 more
> Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: listNodes
> [2025-01-01 08:05:55,687] INFO Stopped http_8083@6705fb02\{HTTP/1.1, (http/1.1)}{0.0.0.0:8083} (org.eclipse.jetty.server.AbstractConnector:383)[JettyShutdownThread]
> ```
> The configuration of mirrormaker is:
> ```
> clusters = dc64, dc88
> dc64.bootstrap.servers = 192.168.2.64:13399
> dc88.bootstrap.servers = 192.168.2.88:13399
> dc64->dc88.enabled = true
> dc64->dc88.topics = .*
> dc88->dc64.enabled = true
> dc88->dc64.topics = .*
> replication.factor=1
> tasks.max=6
> emit.checkpoints.interval.seconds=5
> dc64.producer.acks=all
> dc64.producer.batch.size=50000
> dc64.consumer.auto.offset.reset=latest
> dc88.consumer.auto.offset.reset=latest
> dc64.consumer.max.poll.interval.ms=20000
> dc88.consumer.max.poll.interval.ms=20000
> refresh.topics.enabled=true
> refresh.topics.interval.seconds=5
> refresh.groups.enabled=true
> refresh.groups.interval.seconds=5
> dedicated.mode.enable.internal.rest = true
> dc64.scheduled.rebalance.max.delay.ms=20000
> dc88.scheduled.rebalance.max.delay.ms=20000
> checkpoints.topic.replication.factor=1
> heartbeats.topic.replication.factor=1
> offset-syncs.topic.replication.factor=1
> offset.storage.replication.factor=1
> status.storage.replication.factor=1
> config.storage.replication.factor=1
> ```

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
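
The stack trace in the report shows MirrorMaker stopping at startup because the cluster-ID lookup against an unreachable bootstrap server timed out. To confirm which cluster alias is unreachable before the pod starts, something like the following rough Python sketch could be used. It is not part of MirrorMaker or Kafka. The helper names (`parse_properties`, `bootstrap_endpoints`, `is_reachable`, `report`) are hypothetical, and it assumes the configuration uses simple `key = value` lines as in the report. It only tests TCP reachability, not whether the broker is healthy.

```python
import socket


def parse_properties(text):
    """Parse simple 'key = value' lines into a dict, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def bootstrap_endpoints(props):
    """Map each alias listed in 'clusters' to its (host, port) bootstrap endpoints."""
    endpoints = {}
    for alias in (a.strip() for a in props.get("clusters", "").split(",")):
        if not alias:
            continue
        pairs = []
        for server in props.get(alias + ".bootstrap.servers", "").split(","):
            server = server.strip()
            if server:
                host, _, port = server.rpartition(":")
                pairs.append((host, int(port)))
        endpoints[alias] = pairs
    return endpoints


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(config_text):
    """Print one line per bootstrap endpoint with its reachability (hypothetical helper)."""
    for alias, pairs in bootstrap_endpoints(parse_properties(config_text)).items():
        for host, port in pairs:
            state = "reachable" if is_reachable(host, port) else "UNREACHABLE"
            print(f"{alias}: {host}:{port} {state}")


# Subset of the configuration quoted in the report.
MM2_CONFIG = """
clusters = dc64, dc88
dc64.bootstrap.servers = 192.168.2.64:13399
dc88.bootstrap.servers = 192.168.2.88:13399
dc64->dc88.enabled = true
"""
```

Calling `report(MM2_CONFIG)` from a host in DC2 while DC1 is powered off would be expected to flag the DC1 endpoint as unreachable, which corresponds to the `Connection to node -1 ... could not be established` warning in the log.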