Hello,

Last week I upgraded one relatively large Kafka 0.10.0.1 cluster (EC2, 10 brokers, ~30 TB of data, 100-300 Mbps in/out per instance) to 1.0 and ran into some issues.
Out of ~100 topics with 2 to 20 partitions each, 9 partitions in 8 topics became "unavailable" across 3 brokers: the leader was shown as -1 and the ISR was empty. A Java service using 0.10.0.1 clients was unable to send any data to these partitions, so that data got dropped. The affected partitions showed up in the output of `kafka/bin/kafka-topics.sh --zookeeper <zk's> --unavailable-partitions --describe`. There was nothing special about these partitions; among them were both big ones (hundreds of gigabytes) and tiny ones (megabytes).

The fix was to enable unclean leader election for each affected topic and restart one of the brokers hosting replicas of each affected partition: `kafka/bin/kafka-configs.sh --zookeeper <zk's> --entity-type topics --entity-name <topicname> --add-config unclean.leader.election.enable=true --alter`.

Has anyone seen something like this, and is there a way to avoid it during the next upgrade? Maybe it would be better if the cluster received no traffic during the upgrade, but we cannot take a maintenance break since everything runs 24/7. The cluster is for analytics data, some of which is consumed by real-time applications, mostly by secor.

BR,
Mika

--
*Mika Linnanoja*
Senior Cloud Engineer
Games Technology
Rovio Entertainment Corp
Keilaranta 7, FIN - 02150 Espoo, Finland
mika.linnan...@rovio.com
www.rovio.com
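PS. For anyone copying the workaround: once the ISRs are back you probably want to remove the per-topic override again, and checking for under-replicated partitions is a handy way to confirm replication has fully caught up. Roughly something like the following (the --under-replicated-partitions and --delete-config flags are from memory, so verify them against your tooling version):

`kafka/bin/kafka-topics.sh --zookeeper <zk's> --under-replicated-partitions --describe`
`kafka/bin/kafka-configs.sh --zookeeper <zk's> --entity-type topics --entity-name <topicname> --delete-config unclean.leader.election.enable --alter`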