[ https://issues.apache.org/jira/browse/KAFKA-12493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mickael Maison resolved KAFKA-12493.
------------------------------------
    Resolution: Won't Fix

We're removing ZooKeeper support, closing.

> The controller should handle the consistency between the controllerContext and the partition replicas assignment on zookeeper
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-12493
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12493
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0
>            Reporter: Wenbing Shen
>            Assignee: Wenbing Shen
>            Priority: Major
>
> This issue is related to this email thread:
> [https://lists.apache.org/thread.html/redf5748ec787a9c65fc48597e3d2256ffdd729de14afb873c63e6c5b%40%3Cusers.kafka.apache.org%3E]
>
> This problem is 100% reproducible.
>
> Problem description:
> In the production environment at our customer's site, code written by
> colleagues in another department reassigned the replicas of existing
> partitions and wrote the result into zookeeper. When processing the resulting
> partition modification event, the controller only considered the newly added
> partitions: it registered the new partitions and replicas in the partition
> state machine and replica state machine and issued LeaderAndISR and other
> control requests for them.
> The controller did not check whether the replica assignment of the existing
> partitions held in the controllerContext still matched the assignment stored
> on the znode in zookeeper. At first this appears to cause no problem, but when
> a broker has to be restarted for some reason, such as a configuration update
> or an upgrade, the affected topics in real-time production become abnormal:
> the controller cannot complete the election of a new leader, and the original
> leader cannot correctly identify the replicas currently assigned in zookeeper.
> The real-time business in our customer's on-site environment was interrupted
> and some data was lost.
>
> This problem can be reliably reproduced as follows:
> Adding partitions to, or modifying the replicas of, an existing topic through
> the following code causes the original partition replicas to be reassigned
> and written to zookeeper. The controller does not process this event
> correctly. After the brokers hosting the topic are restarted, the topic can no
> longer be produced to or consumed from.
>
> {code:java}
> public void updateKafkaTopic(KafkaTopicVO kafkaTopicVO) {
>     ZkUtils zkUtils = ZkUtils.apply(ZK_LIST, SESSION_TIMEOUT,
>             CONNECTION_TIMEOUT, JaasUtils.isZkSecurityEnabled());
>     try {
>         if (kafkaTopicVO.getPartitionNum() >= 0 &&
>                 kafkaTopicVO.getReplicationNum() >= 0) {
>             // Get the original broker data information
>             Seq<BrokerMetadata> brokerMetadata =
>                     AdminUtils.getBrokerMetadatas(zkUtils,
>                             RackAwareMode.Enforced$.MODULE$,
>                             Option.apply(null));
>             // Generate a new partition replica allocation plan
>             // (this recomputes the assignment for ALL partitions, not only new ones)
>             scala.collection.Map<Object, Seq<Object>> replicaAssign =
>                     AdminUtils.assignReplicasToBrokers(brokerMetadata,
>                             kafkaTopicVO.getPartitionNum(),   // Number of partitions
>                             kafkaTopicVO.getReplicationNum(), // Number of replicas per partition
>                             -1,
>                             -1);
>             // Modify the partition replica allocation plan
>             // (overwrites the existing assignment stored in zookeeper)
>             AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK(zkUtils,
>                     kafkaTopicVO.getTopicNameList().get(0),
>                     replicaAssign,
>                     null,
>                     true);
>         }
>     } catch (Exception e) {
>         System.out.println("Adjust partition abnormal");
>         System.exit(0);
>     } finally {
>         zkUtils.close();
>     }
> }
> {code}
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
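
The consistency check that the issue title asks for can be illustrated with a small, self-contained sketch. This is not Kafka controller code: the class `AssignmentDiff`, its `diff` method, and the use of plain `Map<Integer, List<Integer>>` (partition id to ordered replica broker ids) are hypothetical stand-ins for the cached controllerContext assignment and the assignment read from the zookeeper znode. It only shows the idea of separating newly added partitions from existing partitions whose replica list has changed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in Kafka.
final class AssignmentDiff {

    /** Result of comparing the cached assignment with the one stored in ZooKeeper. */
    static final class Result {
        final List<Integer> added = new ArrayList<>();   // partitions present only in ZooKeeper
        final List<Integer> changed = new ArrayList<>(); // existing partitions with a different replica list
    }

    /**
     * Compares two assignments (partition id -> ordered replica broker ids).
     * The comparison is order-sensitive, on the assumption that replica order matters.
     */
    static Result diff(Map<Integer, List<Integer>> cached,
                       Map<Integer, List<Integer>> inZk) {
        Result result = new Result();
        // Iterate in partition-id order so the output is deterministic.
        Map<Integer, List<Integer>> sorted = new TreeMap<>(inZk);
        for (Map.Entry<Integer, List<Integer>> entry : sorted.entrySet()) {
            List<Integer> cachedReplicas = cached.get(entry.getKey());
            if (cachedReplicas == null) {
                result.added.add(entry.getKey());
            } else if (!cachedReplicas.equals(entry.getValue())) {
                result.changed.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> cached = new TreeMap<>();
        cached.put(0, Arrays.asList(1, 2));
        cached.put(1, Arrays.asList(2, 3));

        Map<Integer, List<Integer>> inZk = new TreeMap<>();
        inZk.put(0, Arrays.asList(1, 2)); // unchanged
        inZk.put(1, Arrays.asList(3, 1)); // existing partition, replicas rewritten
        inZk.put(2, Arrays.asList(2, 3)); // newly added partition

        Result r = diff(cached, inZk);
        System.out.println("added=" + r.added + " changed=" + r.changed);
        // prints: added=[2] changed=[1]
    }
}
```

In the scenario described in the issue, only the `added` set was acted on. A non-empty `changed` set is the condition that went undetected until a broker restart.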