Have you tried using the latest stable version of Kafka (0.8.1.1) with controlled shutdown?
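For reference, a minimal sketch of the broker settings involved, assuming the 0.8.1.x property names. The retry values below are illustrative, not recommendations, so please check them against the broker configuration docs for your version:

```properties
# config/server.properties on each broker (sketch, assumes Kafka 0.8.1.x)

# Ask the broker to move partition leadership away before it stops.
# As far as I know this is off by default in 0.8.1.x.
controlled.shutdown.enable=true

# Illustrative values: how often to retry a failed controlled shutdown
# and how long to wait between attempts.
controlled.shutdown.max.retries=3
controlled.shutdown.retry.backoff.ms=5000
```

Note that controlled shutdown can only hand leadership to another replica that is still in the ISR, so it helps most when brokers are stopped one at a time rather than three at once.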
On Fri, Dec 5, 2014 at 2:39 PM, Haeley Yao <hae...@quantifind.com> wrote:

> Hi, Kafka group
>
> We are trying to improve the fault tolerance of our Kafka cluster. We set
> up a 4-node Kafka cluster and a 3-node ZooKeeper cluster.
>
> ubuntu version: Ubuntu 14.04.1
> zookeeper version: 3.4.5-1392090, built on 09/30/2012 17:52 GMT
> kafka version: kafka_2.8.0-0.8.0
>
> kafka0.x.x.x
> broker:9092
> broker.id = 11
> zookeeper.connect=zk-01.x.x.x:2182,zk-02.x.x.x:2182,zk-03.x.x.x:2182
>
> kafka1.x.x.x
> broker:9092
> broker.id = 12
> zookeeper.connect=zk-01.x.x.x:2182,zk-02.x.x.x:2182,zk-03.x.x.x:2182
>
> kafka2.x.x.x
> broker:9092
> broker.id = 13
> zookeeper.connect=zk-01.x.x.x:2182,zk-02.x.x.x:2182,zk-03.x.x.x:2182
>
> kafka3.x.x.x
> broker:9092
> broker.id = 14
> zookeeper.connect=zk-01.x.x.x:2182,zk-02.x.x.x:2182,zk-03.x.x.x:2182
>
> 1. Start both the Kafka and ZooKeeper servers; everything is OK.
>
> 2. Create 2 topics:
> bin/kafka-create-topic.sh --zookeeper zk-01.x.x.x:2182,zk-02.x.x.x:2182,zk-03.x.x.x:2182 --partition 3 --replica 3 --topic zerg.hydra
>
> bin/kafka-create-topic.sh --zookeeper zk-01.x.x.x:2182,zk-02.x.x.x:2182,zk-03.x.x.x:2182 --partition 2 --replica 3 --topic zerg.test
>
> 3. Start a producer and a consumer on brokerId 11. The inputs sent to the
> producer are received by the consumer.
>
> 4. Stop the Kafka servers on brokerId 12, 13, 14 via supervisor:
> /etc/init.d/supervisor stop
>
> Only brokerId 11 is running.
> root@kafka0:/home/dev/kafka# bin/kafka-list-topic.sh --topic zerg.hydra --zookeeper zk-01.dev.quantifind.com:2182,zk-02.dev.quantifind.com:2182,zk-03.dev.quantifind.com:2182
> topic: zerg.hydra  partition: 0  leader: 11  replicas: 11,14,12  isr: 11
> topic: zerg.hydra  partition: 1  leader: 11  replicas: 11,14,12  isr: 11
> topic: zerg.hydra  partition: 2  leader: 11  replicas: 12,11,13  isr: 11
>
> root@kafka0:/home/dev/kafka# bin/kafka-list-topic.sh --topic zerg.test --zookeeper zk-01.dev.quantifind.com:2182,zk-02.dev.quantifind.com:2182,zk-03.dev.quantifind.com:2182
> topic: zerg.test  partition: 0  leader: 11  replicas: 13,14,11  isr: 11
> topic: zerg.test  partition: 1  leader: 11  replicas: 14,11,12  isr: 11
>
> 5. start kafka on brokerId 13, kafka2.x.x.x
> bin/kafka-server-start.sh config/server.properties
>
> [2014-12-05 14:34:45,607] ERROR [KafkaApi-13] error when handling request Name: FetchRequest; Version: 0; CorrelationId: 222; ClientId: ReplicaFetcherThread-0-11; ReplicaId: 13; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo: [zerg.hydra,2] -> PartitionFetchInfo(0,1048576),[zerg.test,0] -> PartitionFetchInfo(15,1048576) (kafka.server.KafkaApis)
> kafka.common.KafkaException: Shouldn't set logEndOffset for replica 13 partition [zerg.hydra,2] since it's local
>     at kafka.cluster.Replica.logEndOffset_$eq(Replica.scala:46)
>     at kafka.cluster.Partition.updateLeaderHWAndMaybeExpandIsr(Partition.scala:227)
>
> Could you help on it?
>
> Thank you!
>
> Haeley
> --
> Work hard, stay humble.

--
Thanks,
Neha