Hi, has anyone successfully upgraded Kafka from 0.8.0 to 0.8.1, one broker at a time, on a live cluster?
I am seeing strange behavior where many of my Kafka topics become unusable by both consumers and producers. When that happens, I see lots of errors in the server logs that look like this:

[2014-04-09 10:38:14,669] WARN [KafkaApi-1007] Fetch request with correlation id 2455 from client ReplicaFetcherThread-15-1007 on partition [risk,0] failed due to Topic risk either doesn't exist or is in the process of being deleted (kafka.server.KafkaApis)
[2014-04-09 10:38:14,669] WARN [KafkaApi-1007] Fetch request with correlation id 2455 from client ReplicaFetcherThread-7-1007 on partition [message,0] failed due to Topic message either doesn't exist or is in the process of being deleted (kafka.server.KafkaApis)

When I try to consume a message from a topic that triggered the above warning, I get the exception below:

....topic message --from-beginning
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
[2014-04-09 10:40:30,571] WARN [console-consumer-90716_dkafkadatahub07.tag-dev.com-1397065229615-7211ba72-leader-finder-thread], Failed to add leader for partitions [message,0]; will retry (kafka.consumer.ConsumerFetcherManager$LeaderFinderThread)
kafka.common.UnknownTopicOrPartitionException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at java.lang.Class.newInstance0(Class.java:355)
        at java.lang.Class.newInstance(Class.java:308)
        at kafka.common.ErrorMapping$.exceptionFor(ErrorMapping.scala:79)
        at kafka.consumer.SimpleConsumer.earliestOrLatestOffset(SimpleConsumer.scala:167)
        at kafka.consumer.ConsumerFetcherThread.handleOffsetOutOfRange(ConsumerFetcherThread.scala:60)
        at kafka.server.AbstractFetcherThread$$anonfun$addPartitions$2.apply(AbstractFetcherThread.scala:179)
        at kafka.server.AbstractFetcherThread$$anonfun$addPartitions$2.apply(AbstractFetcherThread.scala:174)
        at scala.collection.immutable.Map$Map1.foreach(Map.scala:119)
        at kafka.server.AbstractFetcherThread.addPartitions(AbstractFetcherThread.scala:174)
        at kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:86)
        at kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:76)
        at scala.collection.immutable.Map$Map1.foreach(Map.scala:119)
        at kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:76)
        at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:95)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)

----------
*More details about my issues:*

My current configuration in the environment where I am testing the upgrade is 4 physical servers, each running 2 brokers, with the controlled shutdown feature enabled. When I shut down the 2 brokers on one of the existing Kafka 0.8.0 machines, upgrade that machine to 0.8.1, and restart it, all is fine for a bit. Once the new brokers come up, I run kafka-preferred-replica-election.sh to make sure the restarted brokers become leaders of the existing topics. The replication factor on the topics is set to 4.
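For reference, this is roughly how controlled shutdown and the election are set up on my side. The ZooKeeper address below is a placeholder for my actual ensemble, and the controlled-shutdown option names are the ones documented for the 0.8.1 broker (on the 0.8.0 brokers the feature is invoked differently, through the kafka.admin.ShutdownBroker tool):

    # server.properties on the upgraded 0.8.1 brokers
    controlled.shutdown.enable=true
    controlled.shutdown.max.retries=3
    controlled.shutdown.retry.backoff.ms=5000

    # run after the upgraded brokers rejoin; without --path-to-json-file
    # the tool runs the election for all topic partitions
    bin/kafka-preferred-replica-election.sh --zookeeper zk1:2181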
I tested both producing and consuming messages against brokers that were leaders on both Kafka 0.8.0 and 0.8.1, and no issues were encountered. Later, I performed a controlled shutdown of the 2 remaining brokers on a server still running Kafka 0.8.0, and after those brokers shut down and new leaders were assigned, all of my server logs started filling up with the above exceptions and most of my topics became unusable.

I pulled and built the 0.8.1 Kafka code from git last Thursday, so I should be pretty much up to date. I am not sure whether I am doing something wrong or whether migrating from 0.8.0 to 0.8.1 on a live cluster one server at a time is simply not supported. Is there a recommended approach for migrating a live 0.8.0 cluster to 0.8.1?

The leader of one of the topics that became unusable is the broker that was successfully upgraded to 0.8.1:

Topic:message   PartitionCount:1        ReplicationFactor:4     Configs:
        Topic: message  Partition: 0    Leader: 1007    Replicas: 1007,8,9,1001 Isr: 1001,1007,8

Brokers 9 and 1009 were shut down on one physical server that had Kafka 0.8.0 installed when these problems started occurring (I was planning to upgrade them to 0.8.1). The only way I can recover from this state is to shut down all brokers, delete all of the Kafka topic logs plus the ZooKeeper Kafka directory, and start over with a fresh cluster.

Your help in this matter is greatly appreciated.

Thanks,
Martin
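P.S. For completeness, here are roughly the commands behind the output above (zk1:2181 again stands in for my actual ZooKeeper ensemble; the topic description is from the 0.8.1 topic tool, and the consumer is the stock console consumer whose command line I truncated earlier):

    bin/kafka-topics.sh --describe --zookeeper zk1:2181 --topic message
    bin/kafka-console-consumer.sh --zookeeper zk1:2181 --topic message --from-beginning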