Hi,

I am running some tests to understand how Kafka behaves when partitions are added to a topic while producing and consuming.

My test goes like this:

I launch 3 brokers and create a topic with 3 partitions and replication factor = 2:

$ ./bin/kafka-topics.sh --zookeeper zookeeper-1 --create --topic topic1 --partitions 3 --replication-factor 2
$ ./bin/kafka-topics.sh --zookeeper zookeeper-1 --describe --topic topic1
Topic:topic1  PartitionCount:3  ReplicationFactor:2  Configs:
    Topic: topic1  Partition: 0  Leader: 2  Replicas: 2,3  Isr: 2,3
    Topic: topic1  Partition: 1  Leader: 3  Replicas: 3,1  Isr: 3,1
    Topic: topic1  Partition: 2  Leader: 1  Replicas: 1,2  Isr: 1,2

I start a single producer and 3 consumers in a single consumer group. The producer uses the default partitioning. The consumers use automatic rebalancing, with AUTO_OFFSET_RESET_CONFIG = earliest for test 1 and AUTO_OFFSET_RESET_CONFIG = latest for test 2.

While the producer and consumers are running, I modify the topic using kafka-topics.sh:

$ ./bin/kafka-topics.sh --zookeeper zookeeper-1 --alter --topic topic1 -partitions 6; date
WARNING: If partitions are increased for a topic that has a key, the partition logic or ordering of the messages will be affected
Adding partitions succeeded!
*Tue Apr 26 10:34:46 ART 2016*

What I observe is that the producer does not "see" the new partitions, and the consumer group does not rebalance, until approximately 4 minutes later. Is this the expected behavior? Is this delay configurable? I could not find a property to change it.

$ date; ./bin/kafka-topics.sh --zookeeper zookeeper-1 --describe --topic topic1
Tue Apr 26 10:37:54 ART 2016
Topic:topic1  PartitionCount:6  ReplicationFactor:2  Configs:
    Topic: topic1  Partition: 0  Leader: 2  Replicas: 2,3  Isr: 2,3
    Topic: topic1  Partition: 1  Leader: 3  Replicas: 3,1  Isr: 3,1
    Topic: topic1  Partition: 2  Leader: 1  Replicas: 1,2  Isr: 1,2
    Topic: topic1  Partition: 3  Leader: 1  Replicas: 1,2  Isr: 1,2
    Topic: topic1  Partition: 4  Leader: 2  Replicas: 2,3  Isr: 2,3
    Topic: topic1  Partition: 5  Leader: 3  Replicas: 3,1  Isr: 3,1

$ date; ./bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server kafka-1:9092 --describe --group group1
*Tue Apr 26 10:38:04 ART 2016*
GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
group1, topic1, 2, 80, 80, 0, consumer-3_/127.0.0.1
group1, topic1, 0, 91, 91, 0, consumer-1_/127.0.0.1
group1, topic1, 1, 88, 88, 0, consumer-2_/127.0.0.1

$ date; ./bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server kafka-1:9092 --describe --group group1
*Tue Apr 26 10:39:40 ART 2016*
GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
group1, topic1, 4, 15, 16, 1, consumer-3_/127.0.0.1
group1, topic1, 5, 9, 9, 0, consumer-3_/127.0.0.1
group1, topic1, 0, 108, 108, 0, consumer-1_/127.0.0.1
group1, topic1, 1, 117, 117, 0, consumer-1_/127.0.0.1
group1, topic1, 2, 99, 99, 0, consumer-2_/127.0.0.1
group1, topic1, 3, 6, 6, 0, consumer-2_/127.0.0.1

Another observation is that when the consumers use AUTO_OFFSET_RESET_CONFIG = latest, some messages are not received by the group. I understand this is expected behavior, because the producer "sees" the new partitions before the consumer group rebalance completes, so the producer writes to partitions not yet assigned to the group. When the consumers use AUTO_OFFSET_RESET_CONFIG = earliest, all messages are received (with no duplicates so far in my tests).

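As a rough check on the "approximately 4 minutes" figure, the delay can be bounded from the timestamps in the output above: the alter finished at 10:34:46, the group still had the old assignment at 10:38:04, and the new assignment was visible at 10:39:40. Below is a minimal shell sketch of that arithmetic. The to_seconds helper is a hypothetical name introduced only for this illustration, and it assumes zero-padded HH:MM:SS values that all fall on the same day.

```shell
# Convert a zero-padded HH:MM:SS time of day to seconds since midnight.
# to_seconds is an illustrative helper, not part of any Kafka tooling.
to_seconds() {
  h=${1%%:*}          # hours: everything before the first colon
  rest=${1#*:}        # remaining "MM:SS"
  m=${rest%%:*}       # minutes
  s=${rest#*:}        # seconds
  # Strip one leading zero so values like "08" are not parsed as octal.
  h=${h#0}; m=${m#0}; s=${s#0}
  echo $(( $h * 3600 + $m * 60 + $s ))
}

altered=$(to_seconds 10:34:46)     # "Adding partitions succeeded!"
old_seen=$(to_seconds 10:38:04)    # group still owns only partitions 0-2
new_seen=$(to_seconds 10:39:40)    # group owns partitions 0-5

echo "rebalance happened between $(( old_seen - altered )) and $(( new_seen - altered )) seconds after the alter"
```

With the timestamps from this test, the rebalance falls somewhere between 198 and 294 seconds (3m18s to 4m54s) after the alter command returned.
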
If the consumers use latest, they will lose messages. If they use earliest, they can handle rebalances, but what happens when they crash and are restarted? They will get all the messages in the topic. Is there any recommendation regarding this issue?

Regards,
Luciano