Hi!

We're running a cluster of 3 kafka-0.8.2.2 nodes, and delete.topic.enable is 
set to true on all of them.

Today we tried to delete one of the topics. I waited ~20 minutes after 
kafka-topics.sh --delete was executed, but the topic was still there.

For every partition, --describe showed Leader: -1 and only one of the three 
brokers (1, 2 or 3) in the ISR.

The topic had the following configuration:
partitions: 50
replication factor: 3
unclean.leader.election.enable: false
min.insync.replicas: 2

Then I restarted one of the nodes, hoping to trigger a retry of the topic 
deletion. After that things went nuts and we had a short cluster downtime 
before I managed to recover and get back to the same point we were at before 
the restart. The topic was still there.

Since we have successfully deleted topics on the same cluster before, the only 
thing that could be different is that someone might be holding an open 
connection or a consumer group linked to the topic. According to similar 
issues described earlier in this group, this can cause topic deletion to fail. 
We shut down the service responsible for this topic and triggered a preferred 
replica election, but that didn't help. We also set the retention.bytes topic 
config to 1 and waited for retention.interval.ms, but nothing changed.

The only recovery option we see is to shut down all nodes, delete the topic 
from the file system, remove the topic from ZooKeeper under /config/topics/ 
and /admin/delete_topics, and start the nodes again.
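
To make that manual procedure concrete, here is a rough dry-run sketch in 
Python of what would have to be removed. The helper name, the topic name and 
the log directory are placeholders, not our real values, and the script only 
builds the list of paths; it deletes nothing.

```python
def manual_cleanup_plan(topic, partitions, log_dir):
    """Return (zookeeper_paths, partition_dirs) for a manual topic cleanup.

    Dry run only: nothing is deleted. All arguments are placeholders.
    """
    # ZooKeeper nodes mentioned above: the topic config and the
    # pending-deletion marker.
    zookeeper_paths = [
        f"/config/topics/{topic}",
        f"/admin/delete_topics/{topic}",
    ]
    # Kafka keeps each partition in a directory named <topic>-<partition>
    # under the broker's log directory.
    partition_dirs = [f"{log_dir}/{topic}-{p}" for p in range(partitions)]
    return zookeeper_paths, partition_dirs


if __name__ == "__main__":
    # "my-topic" and "/var/kafka-logs" are hypothetical example values.
    zk_paths, dirs = manual_cleanup_plan("my-topic", 50, "/var/kafka-logs")
    for path in zk_paths:
        print("zookeeper node to remove:", path)
    print(len(dirs), "partition directories on each broker holding a replica")
```

With 50 partitions and replication factor 3 on 3 brokers, that would mean 50 
directories on every node, which is why we would rather avoid doing it by hand.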

So my question is: is it possible to recover from this situation without 
cluster downtime, either by cancelling the topic deletion or by finally 
completing it?

Thanks.

/Ivan
