Hello Kafka-Users,
I just found another problem in the 3-node cluster I'm running with Kafka
2.3.1 and ZooKeeper 3.4.14.
When running the command
./kafka-topics.sh --bootstrap-server localhost:9092 --describe --under-replicated-partitions
the cluster told me that partition 0 of one topic is under-replicated:
Topic: my-great-topic Partition: 0 Leader: 2 Replicas: 2,1,3 Isr: 2,3
So I checked the topic with
./kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic my-great-topic
and got:
Topic:my-great-topic PartitionCount:12 ReplicationFactor:3 Configs:segment.bytes=1073741824,message.format.version=2.3-IV1,retention.bytes=1073741824
Topic: my-great-topic Partition: 0 Leader: 2 Replicas: 2,1,3 Isr: 2,3
Topic: my-great-topic Partition: 1 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
Topic: my-great-topic Partition: 2 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
Topic: my-great-topic Partition: 3 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
Topic: my-great-topic Partition: 4 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: my-great-topic Partition: 5 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
Topic: my-great-topic Partition: 6 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
Topic: my-great-topic Partition: 7 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
Topic: my-great-topic Partition: 8 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
Topic: my-great-topic Partition: 9 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
Topic: my-great-topic Partition: 10 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: my-great-topic Partition: 11 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
So only that one partition was out of sync, and only on one node (broker 1
is listed in Replicas but is missing from the Isr)...
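
For reference, the check I did by eye above amounts to comparing the Replicas list with the Isr list for each partition. Below is a minimal Python sketch of that comparison, assuming the --describe output format shown in this mail (including the line wrapping my mail client adds). The function name find_under_replicated and the SAMPLE text are made up for illustration and are not part of any Kafka tooling.

```python
import re

# Matches one partition entry of `kafka-topics.sh --describe` output.
# \s+ also matches newlines, so entries wrapped over two lines still match.
# The summary line ("Topic:... PartitionCount:...") does not match.
PARTITION_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>\S+)\s+"
    r"Replicas:\s*(?P<replicas>[\d,]+)\s+Isr:\s*(?P<isr>[\d,]*)"
)


def find_under_replicated(describe_output):
    """Return (topic, partition, missing_broker_ids) for every partition
    whose ISR does not contain all assigned replicas."""
    results = []
    for m in PARTITION_RE.finditer(describe_output):
        replicas = [int(b) for b in m.group("replicas").split(",") if b]
        isr = {int(b) for b in m.group("isr").split(",") if b}
        missing = [b for b in replicas if b not in isr]
        if missing:
            results.append((m.group("topic"), int(m.group("partition")), missing))
    return results


# Shortened copy of the output from this mail (assumption: same layout).
SAMPLE = """\
Topic:my-great-topic PartitionCount:12 ReplicationFactor:3
Topic: my-great-topic Partition: 0 Leader: 2
Replicas: 2,1,3 Isr: 2,3
Topic: my-great-topic Partition: 1 Leader: 1
Replicas: 1,3,2 Isr: 1,3,2
"""

print(find_under_replicated(SAMPLE))
# prints [('my-great-topic', 0, [1])]
```

In practice the text would come from the real kafka-topics.sh output rather than a pasted sample, and the regular expression may need adjusting for other Kafka versions.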
Last time we had this problem I tried everything I could think of, including
manually reassigning partitions using the exported and modified file.
Nothing helped, so I deleted the topic; with auto-create enabled it was
recreated immediately and everything was fine again. But
this time when I ran
./kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic my-great-topic
and then
./kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic my-great-topic
the cluster told me that only partition 0 was still there:
Topic:my-great-topic PartitionCount:1 ReplicationFactor:3 Configs:segment.bytes=1073741824,message.format.version=2.3-IV1,retention.bytes=1073741824
Topic: my-great-topic Partition: 0 Leader: 2 Replicas: 2,1,3 Isr: 2,3
Running the delete command again on any node returned this error:
ERROR java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
Running any --alter command told me that the topic doesn't exist. When I
ran kafka-topics.sh with --create, it said that the topic already exists.
I checked the Kafka logs and found that the log files for all partitions
were successfully renamed to *-delete on all nodes and then removed from
all nodes, even the ones for partition 0.
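
Next time I could also confirm on disk what the broker logs claim. Below is a rough Python sketch that lists partition directories of a topic that are still marked for deletion in a broker's data directory. It assumes the directories are named like "<topic>-<partition>.<some id>-delete", which is my reading of the *-delete renaming mentioned above. The function name pending_delete_dirs and the path in the usage note are hypothetical.

```python
import os
import re


def pending_delete_dirs(log_dir, topic):
    """List directories in log_dir that belong to `topic` and still carry
    the -delete suffix (assumed layout: <topic>-<partition>.<id>-delete)."""
    pattern = re.compile(r"^" + re.escape(topic) + r"-\d+\..+-delete$")
    return sorted(
        name
        for name in os.listdir(log_dir)
        if pattern.match(name) and os.path.isdir(os.path.join(log_dir, name))
    )
```

It would be run on each broker against whatever log.dirs points to, for example pending_delete_dirs("/path/to/kafka-logs", "my-great-topic"). An empty list would mean no leftover *-delete directories for that topic on that node.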
So I stopped all three nodes and started them again... Fortunately this
recovered the topic and all partitions are now shown as replicated to
all three nodes.
Fortunately this was not (yet) in production, as recovering required a
downtime of the whole system. Unfortunately we are running the same Kafka
and ZooKeeper versions in production at the moment.
Any suggestions on what I should check or do next time this happens?
Best regards
Sebastian