Hi Stevo,

Let me know if we want to reopen KAFKA-2937. I can include the above finding in the patch, or would you prefer to create a separate JIRA for this?

Thanks,
Mayuresh

On Fri, Mar 11, 2016 at 7:53 AM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:

> kafka-2937 is different from this I think. Kafka-2937 deals with the
> delete topic getting stuck because the LeaderAndISR in ZK was updated by a
> controller and then the controller dies and the new controller gets in to
> the exception and never completes deleting the topic. The topic existed in
> the cluster and was also marked for delete.
> The case reported here is that the topic does not exist in cluster but is
> marked for delete.
> Am I right in understanding?
>
> Thanks,
>
> Mayuresh
>
> On Fri, Mar 11, 2016 at 5:30 AM, Stevo Slavić <ssla...@gmail.com> wrote:
>
>> Topic it seems would get deleted but request in ZK to delete topic would
>> not get cleared even after restarting Kafka cluster.
>>
>> I'm still investigating why deletion did not complete in the first place
>> without restarting any nodes. It seems something smelly happens when there
>> is request to delete more than one topic.
>>
>> Anyway, I think I found one potential bug in
>> ReplicaStateMachine.areAllReplicasForTopicDeleted check which could be
>> cause for not clearing deletion request from ZK even after restart of
>> whole cluster.
>> Line ReplicaStateMachine.scala#L285
>> <https://github.com/sslavic/kafka/blob/trunk/core/src/main/scala/kafka/controller/ReplicaStateMachine.scala#L285>
>>
>> replicaStatesForTopic.forall(_._2 == ReplicaDeletionSuccessful)
>>
>> which is return value of that function/check, probably should better be
>> checking for
>>
>> replicaStatesForTopic.isEmpty || replicaStatesForTopic.forall(_._2 == ReplicaDeletionSuccessful)
>>
>> I noticed it because in controller logs I found entries like:
>>
>> [2016-03-04 13:27:29,115] DEBUG [Replica state machine on controller 1]:
>> Are all replicas for topic foo deleted Map()
>> (kafka.controller.ReplicaStateMachine)
>>
>> even though normally they look like:
>>
>> [2016-03-04 09:33:41,036] DEBUG [Replica state machine on controller 1]:
>> Are all replicas for topic foo deleted
>> Map([Topic=foo,Partition=0,Replica=0] -> ReplicaDeletionStarted,
>> [Topic=foo,Partition=0,Replica=3] -> ReplicaDeletionStarted,
>> [Topic=foo,Partition=0,Replica=1] -> ReplicaDeletionSuccessful)
>> (kafka.controller.ReplicaStateMachine)
>>
>> Kind regards,
>> Stevo Slavic.
>>
>> On Sun, Mar 6, 2016 at 12:31 AM, Guozhang Wang <wangg...@gmail.com> wrote:
>>
>> > Thanks Stevo,
>> >
>> > Feel free to paste your findings in KAFKA-2937, we can re-open that ticket
>> > if necessary.
>> >
>> > Guozhang
>> >
>> > On Fri, Mar 4, 2016 at 4:38 AM, Stevo Slavić <ssla...@gmail.com> wrote:
>> >
>> > > Hello Apache Kafka community,
>> > >
>> > > I'm still investigating an incident; from initial findings topic deletion
>> > > doesn't seem to work well still with Kafka 0.9.0.1, likely some edge case
>> > > not covered.
>> > >
>> > > Before with 0.8.2.x it used to happen that non-lead replica would be stuck
>> > > in topic deletion process, and workaround was just to restart that node.
>> > >
>> > > If I'm not mistaken, that edge case got (or at least is expected to be)
>> > > fixed in 0.9.0.1 via KAFKA-2937
>> > > <https://issues.apache.org/jira/browse/KAFKA-2937>
>> > >
>> > > Request to delete topic continued to be there in ZK even after whole
>> > > cluster restart - topic seemed not to exist, seemed to actually be deleted,
>> > > but request to delete topic would remain. Had to manually delete request
>> > > node in ZK.
>> > >
>> > > When I have more details, and reproducible use case, will report back.
>> > >
>> > > Kind regards,
>> > > Stevo Slavic.
>> > >
>> >
>> > --
>> > -- Guozhang
>> >
>>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>

--
-Regards,
Mayuresh R. Gharat
(862) 250-7125
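
A note on the areAllReplicasForTopicDeleted expression quoted in this thread: in Scala, forall on an empty collection returns true, so the quoted expression and the suggested "isEmpty ||" variant should evaluate the same way for Map(). Below is a minimal sketch to check that, assuming simplified stand-in types. The ReplicaState trait, the string keys, and the ForallOnEmptyMap object are hypothetical; only the two state names and the two expressions come from the thread.

```scala
// Hypothetical stand-ins for the controller's replica states.
// Only the two state names are taken from the thread.
sealed trait ReplicaState
case object ReplicaDeletionStarted extends ReplicaState
case object ReplicaDeletionSuccessful extends ReplicaState

object ForallOnEmptyMap {
  // The expression as quoted from ReplicaStateMachine.scala#L285.
  def original(states: Map[String, ReplicaState]): Boolean =
    states.forall(_._2 == ReplicaDeletionSuccessful)

  // The variant suggested in the thread.
  def proposed(states: Map[String, ReplicaState]): Boolean =
    states.isEmpty || states.forall(_._2 == ReplicaDeletionSuccessful)

  def main(args: Array[String]): Unit = {
    // Corresponds to the "Map()" log entry.
    val empty: Map[String, ReplicaState] = Map.empty
    // Corresponds to the "normal" log entry with mixed states.
    val inProgress: Map[String, ReplicaState] = Map(
      "foo-0-0" -> ReplicaDeletionStarted,
      "foo-0-1" -> ReplicaDeletionSuccessful
    )

    println(s"empty:      original=${original(empty)} proposed=${proposed(empty)}")
    println(s"inProgress: original=${original(inProgress)} proposed=${proposed(inProgress)}")
  }
}
```

If the same holds in the real controller code, the Map() case would already count as "all replicas deleted" under the quoted expression, and the leftover delete request in ZK may have a different cause.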