[ https://issues.apache.org/jira/browse/KAFKA-14074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Colin McCabe updated KAFKA-14074: --------------------------------- Summary: Restarting a broker during re-assignment can leave log directory entries in ZK mode (was: Restarting a broker during re-assignment can leave log directory entries) > Restarting a broker during re-assignment can leave log directory entries in > ZK mode > ----------------------------------------------------------------------------------- > > Key: KAFKA-14074 > URL: https://issues.apache.org/jira/browse/KAFKA-14074 > Project: Kafka > Issue Type: Bug > Affects Versions: 2.8.0, 3.1.0 > Reporter: Adrian Preston > Priority: Major > > Re-starting a broker while replicas are being assigned away from the broker > can result in topic partition directories being left in the broker’s log > directory. This can trigger further problems if such a topic is deleted and > re-created. These problems occur when replicas for the new topic are placed > on a broker that hosts a “stale” topic partition directory of the same name, > causing the on-disk topic partition state held by different brokers in the > cluster to diverge. > We have also been able to re-produce variants this problem using Kafka 2.8 > and 3.1, as well as Kafka built from the head of the apache/kafka repository > (at the time of writing this is commit: > 94d4fdeb28b3cd4d474d943448a7ef653eaa145d). We have *not* being able to > re-produce this problem with Kafka running in KRaft mode. > A minimal re-create for topic directories being left on disk is as follows: > # Start ZooKeeper and a broker (both using the sample config) > # Create 100 topics: each with 1 partition, and with replication factor 1 > # Add a second broker to the Kafka cluster (with minor edits to the sample > config for: {{{}broker.id{}}}, {{{}listeners{}}}, and {{{}log.dirs{}}}) > # Issue a re-assignment that moves all of the topic partition replicas from > the first broker to the second broker > # While this re-assignment is taking place shutdown the first broker (you > need to be quick with only two brokers and 100 topics…) > # Wait a few seconds for the re-assignment to stall > # Restart the first broker and wait for the re-assignment to complete and it > to remove any partially deleted topics (e.g. those with a “-delete” suffix). > Inspecting the logs directory for the first broker should show directories > corresponding to topic partitions that are owned by the second broker. These > are not cleaned up when the re-assignment completes, and also remain in the > logs directory even if the first broker is restarted. Deleting the topic > also does not clean up the topic partitions left behind on the first broker - > which leads to a second potential problem. > For topics that have more than one replica: a new topic that has the same > name as a previously deleted topic might have replicas created on a broker > with “stale” topic partition directories. If this happens these topics will > remain in an under-replicated state. > A minimal re-create for this is as follows: > # Create a three node Kafka cluster (backed by ZK) based off the sample > config (to avoid confusion let’s call these kafka-0, kafka-1, and kafka-2) > # Create 100 topics: each with 1 partition, and with replication factor 2 > # Submit a re-assignment to move all of the topic partition replicas to > kafka-0 and kafka-1, and wait for it to complete > # Submit a re-assignment to move all of the topic partition replicas on > kafka-0 to kafka-2. > # While this re-assignment is taking place shutdown and re-start kafka-0. > # Wait for the re-assignment to complete, and check that there’s unexpected > topic partition directories in kafka-0’s logs directory > # Delete all 100 topics, and re-create 100 new topics with the same name and > configuration as the deleted topics. > In this state kafka-1 and kafka-2 continually generate log messages similar > to: > {{[2022-07-14 13:07:49,118] WARN [ReplicaFetcher replicaId=2, leaderId=0, > fetcherId=0] Received INCONSISTENT_TOPIC_ID from the leader for partition > test-039-0. This error may be returned transiently when the partition is > being created or deleted, but it is not expected to persist. > (kafka.server.ReplicaFetcherThread)}} > Topics that have had replicas created on kafka-0 are under-replicated with > kafka-0 missing from the ISR list. Performing a rolling restart of each > broker in turn does not resolve the problem, in fact more partitions are > listed as under-replicated, as before kafka-0 is missing from their ISR list. > I also tried to re-create this with Kafka running in Kraft mode, but was > unable to do so. My test configuration was three brokers configured based on > /config/kraft/server.properties. All three brokers were part of the > controller quorum. Interestingly I see log lines like the following when > re-starting the broker that I stopped mid-reassignment: > {{[2022-07-14 13:44:42,705] INFO Found stray log dir > Log(dir=/tmp/kraft-2/test-029-0, topicId=DMGA3zxyQqGUfeV6cmkcmg, > topic=test-029, partition=0, highWatermark=0, lastStableOffset=0, > logStartOffset=0, logEndOffset=0): the current replica assignment [I@530d4c70 > does not contain the local brokerId 2. > (kafka.server.metadata.BrokerMetadataPublisher$)}} > With later log lines showing the topic being deleted. Looking at the > corresponding code: KRaft mode explicitly checks that the topic ID on disk > matches the expected value, and deletes the directory if it does not. -- This message was sent by Atlassian Jira (v8.20.10#820010)