Hi there,

ZooKeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT
Kafka version: kafka_2.8.0-0.8.1.1
I have the following architecture/configuration:

    staging2.mtl.shopmedia.com (broker.id=1)  zookeeper:2181  kafka:9092
    staging3.mtl.shopmedia.com (broker.id=2)  zookeeper:2181  kafka:9092
    centos.mtl.shopmedia.com   (broker.id=3)  zookeeper:2181  kafka:9092

Each Kafka server has the same configuration except for broker.id and log.dirs:

    broker.id=XXX
    port=9092
    num.network.threads=2
    num.io.threads=8
    socket.send.buffer.bytes=1048576
    socket.receive.buffer.bytes=1048576
    socket.request.max.bytes=104857600
    log.dirs=/home/shopmedia/nfs/logs/XXX/kafka
    num.partitions=1
    log.retention.hours=1
    log.segment.bytes=536870912
    log.retention.check.interval.ms=60000
    log.cleaner.enable=false
    zookeeper.connect=staging2.mtl.shopmedia.com:2181,staging3.mtl.shopmedia.com:2181,centos.mtl.shopmedia.com:2181
    zookeeper.connection.timeout.ms=1000000
    auto.create.topics.enable=true
    default.replication.factor=3

The ZooKeeper configuration is also the same on all servers:

    dataDir=/home/shopmedia/apps/zookeeper/data
    clientPort=2181
    maxClientCnxns=0

I have only 1 topic with 1 partition, and 3 servers (staging2, staging3 and centos) for failover. Each partition should be replicated on all Kafka brokers (replication factor = 3).

I created my topic like this:

    kafka-topics.sh --create --zookeeper staging2.mtl.shopmedia.com:2181,staging3.mtl.shopmedia.com:2181,centos.mtl.shopmedia.com:2181 --topic hibe-user-server-event --partitions 1 --replication-factor 3

Topic configuration:

    [shopmedia@staging3:~] $kafka-topics.sh --describe --zookeeper staging2.mtl.shopmedia.com:2181,staging3.mtl.shopmedia.com:2181,centos.mtl.shopmedia.com:2181 --topic hibe-user-server-event
    Topic:hibe-user-server-event  PartitionCount:1  ReplicationFactor:3  Configs:
        Topic: hibe-user-server-event  Partition: 0  Leader: 2  Replicas: 1,2,3  Isr: 2

According to the describe output, the partition leader is broker 2 (staging3).

QUESTIONS

1) Why is the ISR (in-sync replicas) only 2 and not 1,2,3?
This way, if leader 2 crashes, the other brokers won't have any data.

2) I am running a consumer on each machine (staging2, staging3 and centos) with the following command:

    kafka-console-consumer.sh --zookeeper staging2.mtl.shopmedia.com:2181,staging3.mtl.shopmedia.com:2181,centos.mtl.shopmedia.com:2181 --topic hibe-user-server-event

All my servers are up and running (ZooKeeper + Kafka). I start a producer from staging2:

    kafka-console-producer.sh --topic hibe-user-server-event --broker-list=staging2.mtl.shopmedia.com:9092,staging3.mtl.shopmedia.com:9092,centos.mtl.shopmedia.com:9092

All my consumers receive the messages properly.

I shut down 1 and 3 (staging2 and centos). My consumers still receive the messages from the producer (good!).

I restart 1 and 3 (so all servers are running as before).

I shut down 2 only (the leader becomes 1, ISR: 1). My consumers don't receive any more messages, and stdout shows the following:

staging2:

    [2014-09-25 04:23:57,602] ERROR [ConsumerFetcherThread-console-consumer-4903_staging2.hibe.com-1411630863195-cbe7a1e8-0-1], Error for partition [hibe-user-server-event,0] to broker 1:class kafka.common.UnknownTopicOrPartitionException (kafka.consumer.ConsumerFetcherThread)

staging3:

    [2014-09-25 04:23:58,459] ERROR [ConsumerFetcherThread-console-consumer-99699_staging3.hibe.com-1411630877045-98f884fa-0-1], Error for partition [hibe-user-server-event,0] to broker 1:class kafka.common.NotLeaderForPartitionException (kafka.consumer.ConsumerFetcherThread)

centos:

    [2014-09-25 04:21:42,393] ERROR [ConsumerFetcherThread-console-consumer-38882_centos.mtl.shopmedia.com-1411630833934-e6ceffde-0-1], Error for partition [hibe-user-server-event,0] to broker 1:class kafka.common.NotLeaderForPartitionException (kafka.consumer.ConsumerFetcherThread)

Conclusion: when I shut down the leader broker, my consumers can't catch up (I suspect this is because the ISR is not up to date).

Any idea?
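To make question 1 concrete: in the describe output, every broker listed under Replicas but missing from Isr is a replica the leader does not currently consider caught up. A minimal Python sketch that reads one partition line, assuming the `Key: value` layout kafka-topics.sh 0.8.1 prints; `parse_partition_line` and `out_of_sync` are hypothetical helpers, not part of Kafka.

```python
import re


def _to_ids(value: str) -> list:
    """Turn a comma-separated broker list like '1,2,3' into [1, 2, 3]."""
    return [int(x) for x in value.split(",") if x]


def parse_partition_line(line: str) -> dict:
    """Parse one partition line of `kafka-topics.sh --describe` output.

    Assumes the layout shown in the post:
    "Topic: <name> Partition: <n> Leader: <id> Replicas: <ids> Isr: <ids>"
    """
    fields = dict(re.findall(r"(\w+):\s*(\S*)", line))
    return {
        "topic": fields["Topic"],
        "partition": int(fields["Partition"]),
        "leader": int(fields["Leader"]),
        "replicas": _to_ids(fields["Replicas"]),
        "isr": _to_ids(fields.get("Isr", "")),
    }


def out_of_sync(info: dict) -> list:
    """Assigned replicas that are not in the ISR, in assignment order."""
    isr = set(info["isr"])
    return [r for r in info["replicas"] if r not in isr]


if __name__ == "__main__":
    line = "Topic: hibe-user-server-event  Partition: 0  Leader: 2  Replicas: 1,2,3  Isr: 2"
    info = parse_partition_line(line)
    print(out_of_sync(info))  # [1, 3]
```

With the output from the post, brokers 1 and 3 are the replicas that have fallen out of the ISR.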
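For context on why a replica can drop out of the ISR: as I understand the Kafka 0.8.x broker settings, the leader removes a follower from the ISR when it has not fetched within `replica.lag.time.max.ms` (documented default 10000) or has fallen more than `replica.lag.max.messages` (documented default 4000) behind. A simplified Python model of that rule follows; the `Follower` type and `in_sync_followers` function are hypothetical, and the real broker logic has more details (for example, the leader itself is always in the ISR).

```python
from dataclasses import dataclass

# Documented 0.8.x defaults; assumptions here, check your server.properties.
REPLICA_LAG_TIME_MAX_MS = 10_000   # replica.lag.time.max.ms
REPLICA_LAG_MAX_MESSAGES = 4_000   # replica.lag.max.messages


@dataclass
class Follower:
    broker_id: int
    last_fetch_ms: int    # when this follower last fetched from the leader
    log_end_offset: int   # follower's log end offset


def in_sync_followers(followers, leader_leo, now_ms,
                      lag_time_ms=REPLICA_LAG_TIME_MAX_MS,
                      lag_messages=REPLICA_LAG_MAX_MESSAGES):
    """Simplified model: a follower stays in the ISR only if it fetched
    recently (not 'stuck') AND is not too far behind the leader (not 'slow')."""
    keep = []
    for f in followers:
        stuck = now_ms - f.last_fetch_ms > lag_time_ms
        slow = leader_leo - f.log_end_offset > lag_messages
        if not (stuck or slow):
            keep.append(f.broker_id)
    return keep


if __name__ == "__main__":
    followers = [
        Follower(1, last_fetch_ms=0, log_end_offset=0),        # never fetched
        Follower(3, last_fetch_ms=59_000, log_end_offset=95),  # recent, close
    ]
    print(in_sync_followers(followers, leader_leo=100, now_ms=60_000))  # [3]
```

In this model, a follower that never manages to fetch from the leader (for instance because it cannot reach it) is dropped as "stuck" and never rejoins until it catches up.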
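The observed "Leader becomes 1, ISR: 1" after shutting down broker 2 looks consistent with the failover choice described in the Kafka 0.8 design documentation: when no ISR member is alive, Kafka favors availability and elects a replica that was not in the ISR, so any messages only broker 2 had are lost. A simplified Python model of that choice; `elect_leader` is a hypothetical function, not the controller's actual code.

```python
def elect_leader(assigned, isr, live):
    """Simplified model of choosing a new partition leader after a failure.

    assigned: replica assignment in order, e.g. [1, 2, 3]
    isr:      current in-sync replicas, e.g. [2]
    live:     set of brokers that are currently up
    Returns (new_leader, new_isr, clean_election).
    """
    # Preferred: a live member of the current ISR (no data loss).
    live_isr = [b for b in isr if b in live]
    if live_isr:
        return live_isr[0], live_isr, True
    # Fallback (0.8.1 behavior as documented): any live assigned replica,
    # even if it was out of sync; its log becomes the truth.
    live_assigned = [b for b in assigned if b in live]
    if live_assigned:
        return live_assigned[0], [live_assigned[0]], False
    # No replica alive: the partition stays offline.
    return None, [], False


if __name__ == "__main__":
    # Scenario from the post: Replicas 1,2,3, ISR 2, broker 2 shut down.
    print(elect_leader([1, 2, 3], isr=[2], live={1, 3}))  # (1, [1], False)
```

The `False` flag marks an "unclean" election, which matches the symptom that broker 1 becomes leader without having the data broker 2 had.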
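One thing that may be worth ruling out: the ZooKeeper configuration above has no `server.N` lines. As far as I know, ZooKeeper 3.4 only forms a replicated ensemble when `server.N` entries are present (together with `tickTime`, `initLimit`, `syncLimit` and a `myid` file in `dataDir`); otherwise each node runs standalone, so brokers connected to different nodes would not share the same cluster state. A minimal Python check of a zoo.cfg text, with a hypothetical `zk_mode` helper:

```python
def zk_mode(cfg_text: str) -> str:
    """Return 'ensemble' if the zoo.cfg text declares server.N entries,
    otherwise 'standalone'. Comment and blank lines are ignored."""
    servers = []
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        # Only keys of the form server.<digits> declare ensemble members.
        if key.startswith("server.") and key[len("server."):].isdigit():
            servers.append(key)
    return "ensemble" if servers else "standalone"


if __name__ == "__main__":
    posted = """dataDir=/home/shopmedia/apps/zookeeper/data
clientPort=2181
maxClientCnxns=0
"""
    print(zk_mode(posted))  # standalone

    # Hypothetical ensemble entries, using the conventional 2888:3888 ports.
    ensemble = posted + """server.1=staging2.mtl.shopmedia.com:2888:3888
server.2=staging3.mtl.shopmedia.com:2888:3888
server.3=centos.mtl.shopmedia.com:2888:3888
"""
    print(zk_mode(ensemble))  # ensemble
```

This only checks the configuration text; the ZooKeeper logs on each node would confirm which mode it actually started in.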