Hello Kafka-Users,

I just found another problem in the 3-node-cluster I'm running on Kafka 2.3.1 with Zookeeper 3.4.14.

When running the command
./kafka-topics.sh --bootstrap-server localhost:9092 --describe --under-replicated-partitions

the cluster told me that partition 0 of one topic is under-replicated:
        Topic: my-great-topic       Partition: 0    Leader: 2       Replicas: 2,1,3 Isr: 2,3

So I checked the topic with
./kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic my-great-topic

and got:
Topic:my-great-topic        PartitionCount:12 ReplicationFactor:3 Configs:segment.bytes=1073741824,message.format.version=2.3-IV1,retention.bytes=1073741824
        Topic: my-great-topic       Partition: 0    Leader: 2       Replicas: 2,1,3 Isr: 2,3
        Topic: my-great-topic       Partition: 1    Leader: 1       Replicas: 1,3,2 Isr: 1,3,2
        Topic: my-great-topic       Partition: 2    Leader: 2       Replicas: 2,1,3 Isr: 2,1,3
        Topic: my-great-topic       Partition: 3    Leader: 3       Replicas: 3,1,2 Isr: 3,1,2
        Topic: my-great-topic       Partition: 4    Leader: 1       Replicas: 1,2,3 Isr: 1,2,3
        Topic: my-great-topic       Partition: 5    Leader: 2       Replicas: 2,3,1 Isr: 2,3,1
        Topic: my-great-topic       Partition: 6    Leader: 3       Replicas: 3,2,1 Isr: 3,2,1
        Topic: my-great-topic       Partition: 7    Leader: 1       Replicas: 1,3,2 Isr: 1,3,2
        Topic: my-great-topic       Partition: 8    Leader: 2       Replicas: 2,1,3 Isr: 2,1,3
        Topic: my-great-topic       Partition: 9    Leader: 3       Replicas: 3,1,2 Isr: 3,1,2
        Topic: my-great-topic       Partition: 10   Leader: 1       Replicas: 1,2,3 Isr: 1,2,3
        Topic: my-great-topic       Partition: 11   Leader: 2       Replicas: 2,3,1 Isr: 2,3,1

So only that one partition was out of sync: broker 1 is listed as a replica of partition 0 but is missing from its ISR.
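
For anyone who wants to script the comparison of Replicas and Isr instead of reading it by eye, here is a minimal sketch. The function name missing_isr is made up for illustration, and it assumes the one-partition-per-line layout that kafka-topics.sh prints in 2.3.x; other versions may format the output differently.

```shell
# Hypothetical helper (not part of Kafka): reads `kafka-topics.sh --describe`
# output on stdin and prints, for each partition, the replica IDs that are
# absent from the ISR. Assumes one partition per line.
missing_isr() {
  awk '
    {
      topic = ""; part = ""; replicas = ""; isr = ""
      # Each label ("Replicas:" etc.) is followed by its value in the next field.
      for (i = 1; i < NF; i++) {
        if ($i == "Topic:")     topic    = $(i + 1)
        if ($i == "Partition:") part     = $(i + 1)
        if ($i == "Replicas:")  replicas = $(i + 1)
        if ($i == "Isr:")       isr      = $(i + 1)
      }
      # Skip the summary line and anything else without a replica list.
      if (replicas == "") next
      n = split(replicas, r, ",")
      missing = ""
      for (j = 1; j <= n; j++) {
        # Wrap in commas so that broker 1 does not match broker 11.
        if (index("," isr ",", "," r[j] ",") == 0) {
          missing = missing (missing == "" ? "" : ",") r[j]
        }
      }
      if (missing != "") print topic, part, "missing:", missing
    }
  '
}
```

It would be used by piping the describe command from above into it, for example `./kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic my-great-topic | missing_isr`. For the output shown above it should report only partition 0 with broker 1 missing.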

The last time we had this problem, I tried everything, including manually reassigning the partitions using the exported and modified file. Nothing helped. So that time I deleted the topic, and with auto-create enabled it was immediately recreated and everything was fine again. But this time, when I ran
./kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic my-great-topic
and then
./kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic my-great-topic

the cluster told me that only partition 0 was still there:
Topic:my-great-topic        PartitionCount:1 ReplicationFactor:3 Configs:segment.bytes=1073741824,message.format.version=2.3-IV1,retention.bytes=1073741824
        Topic: my-great-topic       Partition: 0    Leader: 2       Replicas: 2,1,3 Isr: 2,3

Running the delete command again on any node returned this error:
ERROR java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.

Running any alter command told me that the topic doesn't exist. When I tried to run it with --create, it said that the topic already exists.

I checked the Kafka logs and found that it had successfully renamed the log files for all partitions on all nodes to *-delete and had also removed the log files from all nodes, even the ones for partition 0.
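
To double-check this on disk rather than only in the broker logs, something like the following could be run on each node. It is only a sketch: the function name topic_dirs is made up, the log directory has to be whatever log.dirs points to in server.properties, and the naming of the renamed directories (<topic>-<partition>.<id>-delete) is my understanding of what Kafka does, not something I verified for every version.

```shell
# Hypothetical helper (not part of Kafka): lists the partition directories of
# a topic that still exist under a Kafka log directory, including ones that
# were already renamed to *-delete.
# Usage: topic_dirs <log_dir> <topic>
topic_dirs() {
  log_dir=$1
  topic=$2
  # Live directories are assumed to be named <topic>-<partition>, and
  # directories queued for deletion <topic>-<partition>.<id>-delete.
  # Note: a dot in the topic name is treated as a regex wildcard here.
  ls -1 "$log_dir" | grep -E "^${topic}-[0-9]+(\..*-delete)?\$" | sort
}
```

For example, `topic_dirs /var/lib/kafka/data my-great-topic` (the path is an assumption) should print nothing once the topic is really gone from that node.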

So I stopped all three nodes and started them again. Fortunately, this recovered the topic, and all partitions are now shown as replicated to all three nodes. Fortunately, this was not (yet) in production, as recovering it caused a downtime of the whole system. Unfortunately, we are running the same Kafka and Zookeeper versions in production at the moment.

Any suggestions on what I should check or do next time this happens?

Best regards

Sebastian

