[ 
https://issues.apache.org/jira/browse/KAFKA-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhruvil Shah reassigned KAFKA-6098:
-----------------------------------

    Assignee: Dhruvil Shah

> Delete and Re-create topic operation could result in race condition
> -------------------------------------------------------------------
>
>                 Key: KAFKA-6098
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6098
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>            Assignee: Dhruvil Shah
>            Priority: Major
>              Labels: reliability
>
> Here is the following process to re-produce this issue:
> 1. Delete a topic using the delete topic request.
> 2. Confirm the topic is deleted using the list topics request.
> 3. Create the topic using the create topic request.
> In step 3) a race condition can happen that the response returns a 
> {{TOPIC_ALREADY_EXISTS}} error code, indicating the topic has already existed.
> The root cause of the above issue is in the {{TopicDeletionManager}} class:
> {code}
> controller.partitionStateMachine.handleStateChanges(partitionsForDeletedTopic.toSeq,
>  OfflinePartition)
> controller.partitionStateMachine.handleStateChanges(partitionsForDeletedTopic.toSeq,
>  NonExistentPartition)
> topicsToBeDeleted -= topic
> partitionsToBeDeleted.retain(_.topic != topic)
> kafkaControllerZkUtils.deleteTopicZNode(topic)
> kafkaControllerZkUtils.deleteTopicConfigs(Seq(topic))
> kafkaControllerZkUtils.deleteTopicDeletions(Seq(topic))
> controllerContext.removeTopic(topic)
> {code}
> I.e. it first update the broker's metadata cache through the ISR and metadata 
> update request, then delete the topic zk path, and then delete the 
> topic-deletion zk path. However, upon handling the create topic request, the 
> broker will simply try to write to the topic zk path directly. Hence there is 
> a race condition that between brokers update their metadata cache (hence list 
> topic request not returning this topic anymore) and zk path for the topic be 
> deleted (hence the create topic succeed).
> The reason this problem could be exposed, is through current handling logic 
> of the create topic response, most of which takes {{TOPIC_ALREADY_EXISTS}} as 
> "OK" and moves on, and the zk path will be deleted later, hence leaving the 
> topic to be not created at all:
> https://github.com/apache/kafka/blob/249e398bf84cdd475af6529e163e78486b43c570/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamsKafkaClient.java#L221
> https://github.com/apache/kafka/blob/1a653c813c842c0b67f26fb119d7727e272cf834/connect/runtime/src/main/java/org/apache/kafka/connect/util/TopicAdmin.java#L232
> Looking at the code history, it seems this race condition always exist, but 
> testing on trunk / 1.0 with the above steps it is more likely to happen than 
> before. I wonder if the ZK async calls have an effect here. cc [~junrao] 
> [~onurkaraman]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to