Hi

I am running some tests to understand how Kafka behaves when partitions are
added to a topic while clients are producing and consuming.

My test is as follows.

I launch 3 brokers.
I create a topic with 3 partitions and replication factor = 2:
$ ./bin/kafka-topics.sh --zookeeper zookeeper-1 --create --topic topic1 --partitions 3 --replication-factor 2
$ ./bin/kafka-topics.sh --zookeeper zookeeper-1 --describe --topic topic1
Topic:topic1 PartitionCount:3 ReplicationFactor:2 Configs:
 Topic: topic1 Partition: 0 Leader: 2 Replicas: 2,3 Isr: 2,3
 Topic: topic1 Partition: 1 Leader: 3 Replicas: 3,1 Isr: 3,1
 Topic: topic1 Partition: 2 Leader: 1 Replicas: 1,2 Isr: 1,2

I start a single producer and 3 consumers in a single consumer group.
The producer uses default partitioning.
The consumers use automatic rebalancing, with AUTO_OFFSET_RESET_CONFIG =
earliest in test 1 and AUTO_OFFSET_RESET_CONFIG = latest in test 2.

While the producer and consumers are running, I modify the topic using
kafka-topics.sh:

$ ./bin/kafka-topics.sh --zookeeper zookeeper-1 --alter --topic topic1 --partitions 6; date
WARNING: If partitions are increased for a topic that has a key, the
partition logic or ordering of the messages will be affected
Adding partitions succeeded!
*Tue Apr 26 10:34:46 ART 2016*

Now, what I observe is that the producer does not "see" the new partitions
and the consumer group does not rebalance until approximately 4 minutes
later. Is this the expected behavior? Is this delay configurable? I could not
find a property to change it.
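
One setting that may be relevant here is `metadata.max.age.ms`, which exists
for both the producer and the new consumer and defaults to 300000 ms (5
minutes). Clients refresh topic metadata at least that often, so a delay of a
few minutes before new partitions are noticed would be consistent with it.
The sketch below only builds the client properties; the class name, the
helper `withMetadataMaxAge`, and the 30000 ms value are illustrative
assumptions, and I have not verified the effect against this test.

```java
import java.util.Properties;

public class MetadataRefreshSketch {
    // Hypothetical helper: client properties with a shorter metadata refresh
    // interval. The same key would be set on producer and consumer configs.
    public static Properties withMetadataMaxAge(String bootstrapServers, long maxAgeMs) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Documented default is 300000 ms; a lower value forces earlier refreshes.
        props.setProperty("metadata.max.age.ms", Long.toString(maxAgeMs));
        return props;
    }

    public static void main(String[] args) {
        // 30000 ms is an arbitrary illustrative value, not a recommendation.
        Properties props = withMetadataMaxAge("kafka-1:9092", 30_000L);
        System.out.println(props);
    }
}
```

These properties would be passed, together with the usual serializer and
group settings, to the producer and consumer constructors. A shorter interval
means more metadata requests to the brokers, so it is a trade-off.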

$ date; ./bin/kafka-topics.sh --zookeeper zookeeper-1 --describe --topic topic1
Tue Apr 26 10:37:54 ART 2016
Topic:topic1 PartitionCount:6 ReplicationFactor:2 Configs:
 Topic: topic1 Partition: 0 Leader: 2 Replicas: 2,3 Isr: 2,3
 Topic: topic1 Partition: 1 Leader: 3 Replicas: 3,1 Isr: 3,1
 Topic: topic1 Partition: 2 Leader: 1 Replicas: 1,2 Isr: 1,2
 Topic: topic1 Partition: 3 Leader: 1 Replicas: 1,2 Isr: 1,2
 Topic: topic1 Partition: 4 Leader: 2 Replicas: 2,3 Isr: 2,3
 Topic: topic1 Partition: 5 Leader: 3 Replicas: 3,1 Isr: 3,1
$ date; ./bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server kafka-1:9092 --describe --group group1
*Tue Apr 26 10:38:04 ART 2016*
GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
group1, topic1, 2, 80, 80, 0, consumer-3_/127.0.0.1
group1, topic1, 0, 91, 91, 0, consumer-1_/127.0.0.1
group1, topic1, 1, 88, 88, 0, consumer-2_/127.0.0.1

$ date; ./bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server kafka-1:9092 --describe --group group1
*Tue Apr 26 10:39:40 ART 2016*
GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
group1, topic1, 4, 15, 16, 1, consumer-3_/127.0.0.1
group1, topic1, 5, 9, 9, 0, consumer-3_/127.0.0.1
group1, topic1, 0, 108, 108, 0, consumer-1_/127.0.0.1
group1, topic1, 1, 117, 117, 0, consumer-1_/127.0.0.1
group1, topic1, 2, 99, 99, 0, consumer-2_/127.0.0.1
group1, topic1, 3, 6, 6, 0, consumer-2_/127.0.0.1
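
To make the "approximately 4 minutes" concrete, the timestamps above bound
the rebalance delay: the group still had 3 partitions at 10:38:04 and had 6
at 10:39:40, with the alter command finishing at 10:34:46. A small sketch of
that arithmetic (the class and method names are only illustrative):

```java
import java.time.Duration;
import java.time.LocalTime;

public class RebalanceWindowSketch {
    // Time printed by `date` right after the --alter command.
    static final LocalTime ALTERED_AT = LocalTime.of(10, 34, 46);

    // Seconds elapsed between the alter command and a later observation.
    public static long secondsSinceAlter(LocalTime observedAt) {
        return Duration.between(ALTERED_AT, observedAt).getSeconds();
    }

    public static void main(String[] args) {
        // Last check that still showed only partitions 0-2 assigned.
        long lower = secondsSinceAlter(LocalTime.of(10, 38, 4));
        // First check that showed all 6 partitions assigned.
        long upper = secondsSinceAlter(LocalTime.of(10, 39, 40));
        System.out.println("rebalance happened between " + lower + " s and " + upper + " s after the alter");
    }
}
```

So the rebalance took place somewhere between 3 min 18 s and 4 min 54 s after
the partitions were added.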

Another observation is that when the consumers use AUTO_OFFSET_RESET_CONFIG =
latest, some messages are not received by the group. I understand this is
expected behavior: the producer "sees" the new partitions before the consumer
group rebalance completes, so it writes to partitions that are not yet
assigned to the group.
When the consumers use AUTO_OFFSET_RESET_CONFIG = earliest, all messages are
received (with no duplicates so far in my tests).
So if the consumers use latest they lose messages, and if they use earliest
they can handle rebalances. But what happens when they crash and are
restarted? They will get all the messages in the topic. Is there any
recommendation regarding this?
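
On the restart question, the consumer documentation describes
`auto.offset.reset` as applying only when the group has no committed offset
for a partition (or the committed offset is out of range). If that holds, a
restarted consumer in a group that has already committed offsets should
resume from those offsets rather than re-read the topic. The following is a
simplified model of that rule, not Kafka client code: the class, the enum,
and the offset numbers are all illustrative assumptions, and the out-of-range
case is ignored.

```java
import java.util.OptionalLong;

public class OffsetResetSketch {
    enum ResetPolicy { EARLIEST, LATEST }

    // Simplified model: a committed offset always wins; the reset policy is
    // consulted only when the group has no committed offset for the partition.
    public static long startPosition(OptionalLong committed, ResetPolicy policy,
                                     long logStartOffset, long logEndOffset) {
        if (committed.isPresent()) {
            return committed.getAsLong();
        }
        return policy == ResetPolicy.EARLIEST ? logStartOffset : logEndOffset;
    }

    public static void main(String[] args) {
        // Restart with a committed offset (illustrative value 108): resume there.
        System.out.println(startPosition(OptionalLong.of(108), ResetPolicy.EARLIEST, 0, 120));
        // Newly added partition, nothing committed yet, 6 messages already written.
        System.out.println(startPosition(OptionalLong.empty(), ResetPolicy.EARLIEST, 0, 6));
        System.out.println(startPosition(OptionalLong.empty(), ResetPolicy.LATEST, 0, 6));
    }
}
```

Under this model, earliest only causes a full re-read for partitions the
group has never committed an offset for, which is exactly the newly added
partitions in this test.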

Regards
Luciano
