fujian created KAFKA-19371:
------------------------------

             Summary: When a broker restarts, it should not attempt to create the __remote_log_metadata topic if it already exists.
                 Key: KAFKA-19371
                 URL: https://issues.apache.org/jira/browse/KAFKA-19371
             Project: Kafka
          Issue Type: Bug
          Components: Tiered-Storage
    Affects Versions: 4.0.0, 3.9.0
            Reporter: fujian

*[Precondition]*
The Kafka cluster already has the remote storage feature enabled, using the implementation based on an internal topic. The core internal topic "__remote_log_metadata" has already been created.

*[Steps]*
1. Restart one broker of the Kafka cluster.
2. Check the log and the code path that creates "__remote_log_metadata" while the broker is restarting.

*[Expected result]*
The broker should not call the API to create the topic, because the topic already exists.

*[Actual result]*
The result depends on how long the broker takes to start:

*Case 1: Happy path, when the restart takes a short time*
[2025-06-03 22:35:11,648] INFO Topic __remote_log_metadata exists. TopicId: 4CT2TTC-R6u7fNo_njYlDA, numPartitions: 50,

*Case 2: Unhappy path 1, when the restart takes some time*
[2025-06-03 23:59:40,505] INFO Topic __remote_log_metadata does not exist. Error: Timed out waiting for a node assignment. Call: listNodes
[2025-06-04 00:00:36,938] INFO Topic [__remote_log_metadata] already exists

*Case 3: Unhappy path 2, when the restart takes a long time*
[2025-06-03 21:57:21,151] INFO Topic __remote_log_metadata does not exist. Error: Timed out waiting for a node assignment. Call: listNodes at
[2025-06-03 21:58:21,153] ERROR Encountered error while creating __remote_log_metadata topic. java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: createTopics at

From the log and the current code, we can see that cases 2 and 3 both report "the topic does not exist" and then try to call the topic creation API. This is useless and contradicts the fact that the topic already exists. In particular, the log in case 2 reports that the topic both does not exist and already exists.

*[Root cause analysis]*
After reviewing the related code (TopicBasedRemoteLogMetadataManager#doesTopicExist), the way it decides whether a topic exists is wrong: as the logs above show, a timeout during the lookup is reported as "does not exist". I will create a PR to fix this minor bug. Thanks.

FYI: Why do we get the timeout exception? This is expected. While the broker is restarting, the connection used to query/create the topic in "TopicBasedRemoteLogMetadataManager#initializeResources" fails until the broker is ready.
[2025-06-03 23:21:20,752] WARN [AdminClient clientId=adminclient-1] Connection to node -1 (10.20.1.125:9559) could not be established. Node may not be available.
[2025-06-03 23:21:21,282] INFO [BrokerServer id=2] Transition from STARTING to STARTED (kafka.server.BrokerServer)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
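
For illustration only, here is a minimal sketch of the distinction the root cause analysis points at: a lookup that fails with a timeout means "unknown", not "missing", so it should not trigger topic creation. This is not the actual patch and does not use the Kafka client library. TopicExistenceCheck, TopicState, UnknownTopicException and LookupTimeoutException are hypothetical names; the two exception classes are stand-ins for Kafka's UnknownTopicOrPartitionException and TimeoutException, and the Future stands in for the result of an admin-client topic lookup.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical sketch, not Kafka code: every name in this class is illustrative.
final class TopicExistenceCheck {

    // Three-way result, so "could not find out" is never reported as "does not exist".
    enum TopicState { EXISTS, MISSING, UNKNOWN }

    // Stand-in for org.apache.kafka.common.errors.UnknownTopicOrPartitionException.
    static final class UnknownTopicException extends RuntimeException {
        UnknownTopicException(String message) { super(message); }
    }

    // Stand-in for org.apache.kafka.common.errors.TimeoutException.
    static final class LookupTimeoutException extends RuntimeException {
        LookupTimeoutException(String message) { super(message); }
    }

    // Classifies the outcome of an asynchronous topic lookup.
    static TopicState classify(Future<?> lookup) {
        try {
            lookup.get();
            return TopicState.EXISTS;
        } catch (ExecutionException e) {
            // Only an explicit "unknown topic" error proves the topic is absent.
            // Timeouts and connection failures say nothing about the topic itself.
            return (e.getCause() instanceof UnknownTopicException)
                    ? TopicState.MISSING
                    : TopicState.UNKNOWN;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return TopicState.UNKNOWN;
        }
    }

    // Creation is attempted only when the topic is known to be missing.
    static boolean shouldCreateTopic(TopicState state) {
        return state == TopicState.MISSING;
    }

    // Helper for the demo: a lookup that has already failed with the given cause.
    static Future<Void> failed(RuntimeException cause) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        future.completeExceptionally(cause);
        return future;
    }

    public static void main(String[] args) {
        Future<Void> timedOut =
                failed(new LookupTimeoutException("Timed out waiting for a node assignment."));
        TopicState state = classify(timedOut);
        System.out.println("state=" + state + ", create=" + shouldCreateTopic(state));
    }
}
```

With this shape, the timeout seen in cases 2 and 3 would map to UNKNOWN, so the caller could retry the lookup once the broker is ready instead of logging "does not exist" and calling the creation API.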