We're currently trying out Kafka 2.1.0 and are pretty much in a
proof-of-concept phase. We've only just started to look into it and are
trying to figure out whether it's what we need.

The current setup is as follows:

   - 3 Kafka brokers on different hosts: kafka1, kafka2, kafka3
   - 3 Zookeeper nodes: zkhost1, zkhost2, zkhost3
   - One topic, "myTopic" (created roughly as shown below)
   - The topic has 4 partitions
   - The replication factor is 1
   - One producer and three consumers; the consumers are all in the same
   consumer group "myGroup"

Now I was trying to reassign the partitions with the
kafka-reassign-partitions.sh script. For this I created the following JSON
file (increase-replication-factor.json):

{"version":1,
"partitions":[
    {"topic":"myTopic","partition":0,"replicas":[0]},
    {"topic":"myTopic","partition":1,"replicas":[0]},
    {"topic":"myTopic","partition":2,"replicas":[1]},
    {"topic":"myTopic","partition":3,"replicas":[2]}
    ]
}
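
(As an aside, I understand kafka-reassign-partitions.sh can also propose an
assignment itself via --generate, given a topics file and a broker list --
something like the sketch below, where topics.json is a hypothetical file
containing just {"version":1,"topics":[{"topic":"myTopic"}]} -- but for this
test I wrote the assignment by hand:)

kafka/bin/kafka-reassign-partitions.sh \
  --zookeeper zkhost1:2181,zkhost2:2181,zkhost3:2181 \
  --topics-to-move-json-file topics.json \
  --broker-list "0,1,2" \
  --generate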

...and then executed the script:

kafka/bin/kafka-reassign-partitions.sh \
  --zookeeper zkhost1:2181,zkhost2:2181,zkhost3:2181 \
  --reassignment-json-file increase-replication-factor.json \
  --execute

This ran smoothly, and afterwards I got the replica assignment I expected:

Topic:myTopic   PartitionCount:4    ReplicationFactor:1 Configs:
Topic: myTopic  Partition: 0    Leader: 0   Replicas: 0 Isr: 0
Topic: myTopic  Partition: 1    Leader: 0   Replicas: 0 Isr: 0
Topic: myTopic  Partition: 2    Leader: 0   Replicas: 1 Isr: 1
Topic: myTopic  Partition: 3    Leader: 0   Replicas: 2 Isr: 2
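
(The listing above is the output of kafka-topics.sh --describe, i.e.
roughly:)

kafka/bin/kafka-topics.sh --describe \
  --zookeeper zkhost1:2181,zkhost2:2181,zkhost3:2181 \
  --topic myTopic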

What I don't understand is what happened to the partitions during that
reassignment. When I looked at the ConsumerOffsetChecker output, this is
what I saw *before* the reassignment:

kafka/bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker \
  --group myGroup --zookeeper zkhost1:2181 --topic myTopic

Group           Topic       Pid  Offset   logSize  Lag  Owner
myGroup         myTopic     0    925230   925230   0    none
myGroup         myTopic     1    925230   925230   0    none
myGroup         myTopic     2    925230   925230   0    none
myGroup         myTopic     3    925230   925230   0    none

...and this is what I saw *after* the reassignment:

Group           Topic       Pid  Offset   logSize  Lag  Owner
myGroup         myTopic     0    23251    23252    1    none
myGroup         myTopic     1    41281    41281    0    none
myGroup         myTopic     2    23260    23260    0    none
myGroup         myTopic     3    41270    41270    0    none
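
(Side note: I assume the newer kafka-consumer-groups.sh tool would report
equivalent numbers, e.g. something like the following -- assuming the
brokers listen on port 9092 and the consumers use the new consumer API --
but I have not cross-checked that here:)

kafka/bin/kafka-consumer-groups.sh --describe \
  --bootstrap-server kafka1:9092 \
  --group myGroup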

For me this raised a few questions:

   - Why is the logSize now heavily reduced? Does the reassignment trigger
   some cleanup? (We have not set a byte limit.)
   - Why did all 4 partitions have roughly the same size before the
   reassignment, whereas after the reassignment there is this big difference
   between partitions 0, 2 and 1, 3? Shouldn't all partitions of one topic
   have the same logSize, or am I misunderstanding the concept here?
   - Can something like this (i.e. reassigning partitions) lead to data
   loss? (I couldn't see any loss on our consumers in this case.) And if so,
   is there a way to do this without that risk?
   - Should we stop the producer and the consumers while
   kafka-reassign-partitions.sh is running for a topic?
   - How can we tell, after running kafka-reassign-partitions.sh, that the
   reassignment is complete? (Is re-running it with --verify, as sketched
   below, the intended way?)
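
Regarding the last question, my assumption is that re-running the script
with --verify instead of --execute is the intended way to check whether the
reassignment has finished, i.e. something like:

kafka/bin/kafka-reassign-partitions.sh \
  --zookeeper zkhost1:2181,zkhost2:2181,zkhost3:2181 \
  --reassignment-json-file increase-replication-factor.json \
  --verify

Is that correct, or is there a better way?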
