Hi Kafka group,

We are trying to improve the fault tolerance of our Kafka cluster. We set up a 4-node Kafka cluster and a 3-node ZooKeeper cluster.

Ubuntu version: Ubuntu 14.04.1
ZooKeeper version: 3.4.5-1392090, built on 09/30/2012 17:52 GMT
Kafka version: kafka_2.8.0-0.8.0

kafka0.x.x.x
    broker:9092
    broker.id = 11
    zookeeper.connect=zk-01.x.x.x:2182,zk-02.x.x.x:2182,zk-03.x.x.x:2182

kafka1.x.x.x
    broker:9092
    broker.id = 12
    zookeeper.connect=zk-01.x.x.x:2182,zk-02.x.x.x:2182,zk-03.x.x.x:2182

kafka2.x.x.x
    broker:9092
    broker.id = 13
    zookeeper.connect=zk-01.x.x.x:2182,zk-02.x.x.x:2182,zk-03.x.x.x:2182

kafka3.x.x.x
    broker:9092
    broker.id = 14
    zookeeper.connect=zk-01.x.x.x:2182,zk-02.x.x.x:2182,zk-03.x.x.x:2182

Steps to reproduce:

1. Start both the Kafka and ZooKeeper servers. Everything is OK.

2. Create 2 topics:

    bin/kafka-create-topic.sh --zookeeper zk-01.x.x.x:2182,zk-02.x.x.x:2182,zk-03.x.x.x:2182 --partition 3 --replica 3 --topic zerg.hydra
    bin/kafka-create-topic.sh --zookeeper zk-01.x.x.x:2182,zk-02.x.x.x:2182,zk-03.x.x.x:2182 --partition 2 --replica 3 --topic zerg.test

3. Start a producer and a consumer on brokerId 11. The inputs sent to the producer are received by the consumer.

4. Stop the Kafka servers on brokerId 12, 13 and 14 via supervisor:

    /etc/init.d/supervisor stop

   Now only brokerId 11 is running:

    root@kafka0:/home/dev/kafka# bin/kafka-list-topic.sh --topic zerg.hydra --zookeeper zk-01.dev.quantifind.com:2182,zk-02.dev.quantifind.com:2182,zk-03.dev.quantifind.com:2182
    topic: zerg.hydra partition: 0 leader: 11 replicas: 11,14,12 isr: 11
    topic: zerg.hydra partition: 1 leader: 11 replicas: 11,14,12 isr: 11
    topic: zerg.hydra partition: 2 leader: 11 replicas: 12,11,13 isr: 11

    root@kafka0:/home/dev/kafka# bin/kafka-list-topic.sh --topic zerg.test --zookeeper zk-01.dev.quantifind.com:2182,zk-02.dev.quantifind.com:2182,zk-03.dev.quantifind.com:2182
    topic: zerg.test partition: 0 leader: 11 replicas: 13,14,11 isr: 11
    topic: zerg.test partition: 1 leader: 11 replicas: 14,11,12 isr: 11

5. Start Kafka on brokerId 13 (kafka2.x.x.x):

    bin/kafka-server-start.sh config/server.properties

   The broker then logs the following error:

    [2014-12-05 14:34:45,607] ERROR [KafkaApi-13] error when handling request Name: FetchRequest; Version: 0; CorrelationId: 222; ClientId: ReplicaFetcherThread-0-11; ReplicaId: 13; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo: [zerg.hydra,2] -> PartitionFetchInfo(0,1048576),[zerg.test,0] -> PartitionFetchInfo(15,1048576) (kafka.server.KafkaApis)
    kafka.common.KafkaException: Shouldn't set logEndOffset for replica 13 partition [zerg.hydra,2] since it's local
        at kafka.cluster.Replica.logEndOffset_$eq(Replica.scala:46)
        at kafka.cluster.Partition.updateLeaderHWAndMaybeExpandIsr(Partition.scala:227)

Could you help with this?

Thank you!
Haeley

Work hard, stay humble.
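
P.S. In case it helps anyone reproducing this: the listing in step 4 shows every partition with only broker 11 left in the ISR. Below is a minimal Python sketch for flagging such partitions. It assumes the line format printed by kafka-list-topic.sh in this post; the names `under_replicated` and `parse_ids` are my own helpers, not part of Kafka, and the sample text is copied from the output in step 4.

```python
import re

# Line format assumed from the kafka-list-topic.sh output quoted in step 4.
LINE_RE = re.compile(
    r"topic:\s*(?P<topic>\S+)\s+partition:\s*(?P<partition>\d+)\s+"
    r"leader:\s*(?P<leader>-?\d+)\s+replicas:\s*(?P<replicas>[\d,]*)\s+"
    r"isr:\s*(?P<isr>[\d,]*)"
)


def parse_ids(field):
    """Turn a comma-separated broker id list such as '11,14,12' into ints."""
    return [int(x) for x in field.split(",") if x]


def under_replicated(listing):
    """Return (topic, partition, missing) for partitions whose ISR lacks assigned replicas."""
    result = []
    for m in LINE_RE.finditer(listing):
        replicas = parse_ids(m["replicas"])
        isr = set(parse_ids(m["isr"]))
        # Replicas that are assigned to the partition but not currently in sync.
        missing = [r for r in replicas if r not in isr]
        if missing:
            result.append((m["topic"], int(m["partition"]), missing))
    return result


# Sample lines copied from the listing in step 4.
SAMPLE = """\
topic: zerg.hydra partition: 0 leader: 11 replicas: 11,14,12 isr: 11
topic: zerg.hydra partition: 2 leader: 11 replicas: 12,11,13 isr: 11
topic: zerg.test partition: 0 leader: 11 replicas: 13,14,11 isr: 11
"""

if __name__ == "__main__":
    for topic, partition, missing in under_replicated(SAMPLE):
        print(f"{topic}-{partition}: replicas not in ISR: {missing}")
```

Running it against the output captured after step 5 should show whether broker 13 ever rejoins the ISR for [zerg.hydra,2] and [zerg.test,0], the two partitions named in the failing FetchRequest.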