Guozhang Wang created KAFKA-6098:
------------------------------------

             Summary: Delete and Re-create topic operation could result in race 
condition
                 Key: KAFKA-6098
                 URL: https://issues.apache.org/jira/browse/KAFKA-6098
             Project: Kafka
          Issue Type: Bug
            Reporter: Guozhang Wang
             Fix For: 1.0.0


The following steps reproduce this issue:

1. Delete a topic using the delete topic request.
2. Confirm the topic is deleted using the list topics request.
3. Create the topic using the create topic request.

In step 3, a race condition can cause the response to return a 
{{TOPIC_ALREADY_EXISTS}} error code, indicating that the topic still exists.

The root cause of the above issue is in the {{TopicDeletionManager}} class:

{code}
controller.partitionStateMachine.handleStateChanges(partitionsForDeletedTopic.toSeq,
 OfflinePartition)
controller.partitionStateMachine.handleStateChanges(partitionsForDeletedTopic.toSeq,
 NonExistentPartition)
topicsToBeDeleted -= topic
partitionsToBeDeleted.retain(_.topic != topic)
kafkaControllerZkUtils.deleteTopicZNode(topic)
kafkaControllerZkUtils.deleteTopicConfigs(Seq(topic))
kafkaControllerZkUtils.deleteTopicDeletions(Seq(topic))
controllerContext.removeTopic(topic)
{code}

That is, it first updates the brokers' metadata caches through the ISR and 
metadata update requests, then deletes the topic zk path, and then deletes the 
topic-deletion zk path. However, upon handling the create topic request, the 
broker simply tries to write to the topic zk path directly. Hence there is a 
race window between the brokers updating their metadata caches (after which the 
list topics request no longer returns this topic) and the zk path for the topic 
being deleted (after which the create topic request succeeds). A create topic 
request that arrives inside this window fails with {{TOPIC_ALREADY_EXISTS}}.
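The ordering described above can be illustrated with a small toy model. This is only a sketch: the class and method names ({{DeleteRecreateRace}}, {{propagateDeletionToBrokers}}, and so on) are hypothetical and are not Kafka classes, and the two sets merely stand in for the brokers' metadata cache and the topic zk paths.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the race; every name here is hypothetical, not a Kafka class.
class DeleteRecreateRace {
    // Topics the brokers report in list topics responses (metadata cache).
    final Set<String> metadataCache = new HashSet<>();
    // Topic zk paths currently present in ZooKeeper.
    final Set<String> topicZNodes = new HashSet<>();

    void addTopic(String topic) {
        metadataCache.add(topic);
        topicZNodes.add(topic);
    }

    // Deletion, first step: brokers drop the topic from their metadata cache.
    void propagateDeletionToBrokers(String topic) {
        metadataCache.remove(topic);
    }

    // Deletion, second step: the controller removes the topic zk path.
    void deleteTopicZNode(String topic) {
        topicZNodes.remove(topic);
    }

    boolean listTopicsContains(String topic) {
        return metadataCache.contains(topic);
    }

    // Create writes the zk path directly and fails if the path is still there.
    String createTopic(String topic) {
        if (!topicZNodes.add(topic)) {
            return "TOPIC_ALREADY_EXISTS";
        }
        metadataCache.add(topic);
        return "NONE";
    }

    public static void main(String[] args) {
        DeleteRecreateRace cluster = new DeleteRecreateRace();
        cluster.addTopic("t");

        cluster.propagateDeletionToBrokers("t");
        // The topic is no longer listed, so the client believes it is deleted.
        System.out.println(cluster.listTopicsContains("t"));
        // The zk path still exists, so the create request is rejected.
        System.out.println(cluster.createTopic("t"));

        cluster.deleteTopicZNode("t");
        // Deletion has now completed and the topic is still not listed.
        System.out.println(cluster.listTopicsContains("t"));
    }
}
```

In this model the create call lands between the two deletion steps, so it is rejected even though the topic is already absent from the list topics view, and once the zk path is removed nothing creates the topic again.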

This problem is exposed by the current handling logic for the create topic 
response: most callers treat {{TOPIC_ALREADY_EXISTS}} as "OK" and move on. The 
zk path is then deleted later, leaving the topic not created at all:

https://github.com/apache/kafka/blob/249e398bf84cdd475af6529e163e78486b43c570/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamsKafkaClient.java#L221

https://github.com/apache/kafka/blob/1a653c813c842c0b67f26fb119d7727e272cf834/connect/runtime/src/main/java/org/apache/kafka/connect/util/TopicAdmin.java#L232

And the user may not retry.
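
As a rough illustration of what a retry could look like on the caller side, the sketch below only accepts {{TOPIC_ALREADY_EXISTS}} when the topic is also visible to a list topics request, and otherwise tries again. This is an assumption for illustration, not the fix proposed in this ticket: {{CreateWithRetry}}, {{ensureCreated}}, and the two callbacks are hypothetical and do not correspond to any Kafka API, and a real caller would also back off between attempts.

```java
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical caller-side helper; not part of any Kafka API.
class CreateWithRetry {
    // create:   issues a create topic request and returns the error code name,
    //           for example "NONE" or "TOPIC_ALREADY_EXISTS".
    // isListed: reports whether a list topics request returns the topic.
    static boolean ensureCreated(String topic,
                                 Function<String, String> create,
                                 Predicate<String> isListed,
                                 int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String error = create.apply(topic);
            if (error.equals("NONE")) {
                return true;
            }
            if (!error.equals("TOPIC_ALREADY_EXISTS")) {
                return false; // some other failure; give up
            }
            // Only trust "already exists" if the topic is actually listed.
            // Otherwise a deletion may still be in flight, so try again.
            if (isListed.test(topic)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stub: the first request hits the stale zk path, the second succeeds.
        int[] calls = {0};
        boolean ok = ensureCreated(
            "t",
            t -> calls[0]++ == 0 ? "TOPIC_ALREADY_EXISTS" : "NONE",
            t -> false,
            3);
        System.out.println(ok + " after " + calls[0] + " attempts");
    }
}
```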



