Was there any error in the controller and the state-change logs?
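For reference, with the stock log4j.properties that ships with the broker, the
controller and state-change loggers go to their own files (controller.log and
state-change.log) under the Kafka logs directory, so a quick grep along these
lines should surface anything relevant; the path below is only a placeholder
for wherever your logs actually live:

    # assumes the default log4j.properties layout; adjust the path to your install
    grep -iE "error|exception" /path/to/kafka/logs/controller.log
    grep -iE "error|exception" /path/to/kafka/logs/state-change.log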
Thanks,

Jun

On Wed, Apr 9, 2014 at 11:18 AM, Marcin Michalski <mmichal...@tagged.com> wrote:

> Hi, has anyone upgraded their kafka from 0.8.0 to 0.8.1 successfully one
> broker at a time on a live cluster?
>
> I am seeing strange behaviors where many of my kafka topics become
> unusable (by both consumers and producers). When that happens, I see lots
> of errors in the server logs that look like this:
>
> [2014-04-09 10:38:14,669] WARN [KafkaApi-1007] Fetch request with correlation id 2455 from client ReplicaFetcherThread-15-1007 on partition [risk,0] failed due to Topic risk either doesn't exist or is in the process of being deleted (kafka.server.KafkaApis)
> [2014-04-09 10:38:14,669] WARN [KafkaApi-1007] Fetch request with correlation id 2455 from client ReplicaFetcherThread-7-1007 on partition [message,0] failed due to Topic message either doesn't exist or is in the process of being deleted (kafka.server.KafkaApis)
>
> When I try to consume a message from a topic that complained about the
> Topic not existing (above warning), I get the below exception:
>
> ....topic message --from-beginning
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> [2014-04-09 10:40:30,571] WARN [console-consumer-90716_dkafkadatahub07.tag-dev.com-1397065229615-7211ba72-leader-finder-thread], Failed to add leader for partitions [message,0]; will retry (kafka.consumer.ConsumerFetcherManager$LeaderFinderThread)
> kafka.common.UnknownTopicOrPartitionException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at java.lang.Class.newInstance0(Class.java:355)
>     at java.lang.Class.newInstance(Class.java:308)
>     at kafka.common.ErrorMapping$.exceptionFor(ErrorMapping.scala:79)
>     at kafka.consumer.SimpleConsumer.earliestOrLatestOffset(SimpleConsumer.scala:167)
>     at kafka.consumer.ConsumerFetcherThread.handleOffsetOutOfRange(ConsumerFetcherThread.scala:60)
>     at kafka.server.AbstractFetcherThread$$anonfun$addPartitions$2.apply(AbstractFetcherThread.scala:179)
>     at kafka.server.AbstractFetcherThread$$anonfun$addPartitions$2.apply(AbstractFetcherThread.scala:174)
>     at scala.collection.immutable.Map$Map1.foreach(Map.scala:119)
>     at kafka.server.AbstractFetcherThread.addPartitions(AbstractFetcherThread.scala:174)
>     at kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:86)
>     at kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:76)
>     at scala.collection.immutable.Map$Map1.foreach(Map.scala:119)
>     at kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:76)
>     at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:95)
>     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
> ----------
>
> *More details about my issues:*
> My current configuration in the environment where I am testing the
> upgrade is 4 physical servers running 2 brokers each with controlled
> shutdown feature enabled.
> When I shut down the 2 brokers on one of the existing Kafka 0.8.0
> machines, upgrade that machine to 0.8.1 and restart it, all is fine for a
> bit. Once the new brokers came up, I ran kafka-preferred-replica-election.sh
> to make sure the newly started brokers became leaders of the existing
> topics. The replication factor on the topics is set to 4. I tested both
> producing and consuming messages against brokers that were leaders with
> kafka 0.8.0 and 0.8.1, and no issues were encountered.
>
> Later, I tried to perform the controlled shutdown of the 2 additional
> brokers on the Kafka server that still has 0.8.0 installed, and after
> those brokers shut down and new leaders were assigned, all of my server
> logs are getting filled up with the above exceptions and most of my
> topics are not usable. I pulled and built the 0.8.1 kafka code from git
> last Thursday, so I should be pretty much up to date. So I am not sure
> whether I am doing something wrong or whether migrating from 0.8.0 to
> 0.8.1 on a live cluster one server at a time is simply not supported. Is
> there a recommended migration approach one should take when migrating a
> live 0.8.0 cluster to 0.8.1?
>
> The leader of one of the topics that became unusable is the broker that
> was successfully upgraded to 0.8.1:
>
> Topic:message  PartitionCount:1  ReplicationFactor:4  Configs:
>     Topic: message  Partition: 0  *Leader: 1007*  Replicas: 1007,8,9,1001  Isr: 1001,1007,8
>
> Brokers 9 and 1009 were shut down on one physical server that had kafka
> 0.8.0 installed when these problems started occurring (I was planning to
> upgrade them to 0.8.1). The only way I can recover from this state is to
> shut down all brokers, delete all of the kafka topic logs plus the
> zookeeper kafka directory, and start with a new cluster.
>
> Your help in this matter is greatly appreciated.
>
> Thanks,
> Martin
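For anyone following the thread, the per-broker cycle described above maps
roughly to the commands below. This is only a sketch of the procedure as
described, not a confirmed upgrade recipe: the ZooKeeper connect string
(zkhost:2181) and the relative bin/ and config/ paths are placeholders for
the local setup, and controlled.shutdown.enable is the 0.8.1 broker setting
assumed to be behind the "controlled shutdown feature enabled" mentioned
above.

    # config/server.properties on the broker being bounced (0.8.1)
    controlled.shutdown.enable=true

    # stop the broker cleanly, swap in the 0.8.1 build, then start it again
    bin/kafka-server-stop.sh
    bin/kafka-server-start.sh config/server.properties

    # once the restarted brokers are back in the ISR, move leadership back to them
    bin/kafka-preferred-replica-election.sh --zookeeper zkhost:2181

    # check leader / replica / ISR state for an affected topic
    bin/kafka-topics.sh --describe --zookeeper zkhost:2181 --topic message

Whether this rolling sequence is actually safe on a mixed 0.8.0/0.8.1
cluster is the open question in this thread.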