Here is a case of data duplication that should be avoidable. It is observed 
when leadership of a partition changes from the current leader back to the 
preferred leader.

Steps to reproduce:
 - Use a 3-broker setup.
 - Create a topic with 1 partition, replication factor=3, and ISR count=2, and 
set leader.imbalance.check.interval.seconds=3 on the brokers.
 - Make a note of which broker is the leader; this is the preferred leader. 
Stop this broker to trigger a leader switch.
 - Client data will be sent to the newly elected leader.
 - Using the perf producer client, pump data into the topic with these 
settings: acks=-1, batch.size=1, retries=900000. Using multiple clients makes 
the issue easier to reproduce. (A rough Java equivalent of this producer is 
sketched after this list.)
 - Start the preferred leader back up.
 - Wait for leadership to switch back to the preferred leader and for the 
client to get redirected to it.
 - Stop the producer client.
 - Check for duplicates by dumping the partition using the console consumer. 
(A programmatic duplicate check is also sketched after this list.)
 - You should see a duplicate of the first message that received the 
NOT_LEADER_FOR_PARTITION error.
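
For reference, a minimal Java producer matching the settings above; the broker 
addresses and topic name are placeholders, not from the original report:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DupRepro {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
        props.put("acks", "-1");          // wait for all in-sync replicas
        props.put("batch.size", "1");     // effectively one message per batch
        props.put("retries", "900000");   // retry aggressively on errors
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (long i = 0; ; i++) {     // pump data until the client is stopped
                producer.send(new ProducerRecord<>("test-topic", Long.toString(i)));
            }
        }
    }
}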
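
And a sketch of a programmatic duplicate check, as an alternative to eyeballing 
the console-consumer dump; again, the topic name, partition number, and broker 
address are assumptions:

import java.time.Duration;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class DupCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("test-topic", 0);
            consumer.assign(List.of(tp));          // read the partition directly
            consumer.seekToBeginning(List.of(tp)); // dump it from offset 0

            Set<String> seen = new HashSet<>();
            long end = consumer.endOffsets(List.of(tp)).get(tp);
            while (consumer.position(tp) < end) {
                for (ConsumerRecord<String, String> rec :
                        consumer.poll(Duration.ofSeconds(1))) {
                    if (!seen.add(rec.value())) {  // value already seen earlier
                        System.out.println("duplicate at offset " + rec.offset()
                                + ": " + rec.value());
                    }
                }
            }
        }
    }
}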

Theory:
After experimenting with this using some custom instrumentation, the following 
seems to be the most likely cause (a toy illustration follows the list):

 - Around the time leadership is switching, the first leader, which is still 
accepting data, appends the message to its log; around the time it is ready to 
respond, it realizes it is no longer the leader and responds to the client with 
a NOT_LEADER_FOR_PARTITION error.
 - The client notices the error and resends the message to the new leader (the 
preferred leader that has come back up).
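
A purely illustrative toy of this ordering; this is NOT Kafka's actual broker 
code, just the append-then-check race the theory implies, and why a retrying 
client ends up writing the message twice:

import java.util.ArrayList;
import java.util.List;

public class LeaderRaceToy {
    static List<String> log = new ArrayList<>(); // stands in for the partition log
    static boolean isLeader = true;

    // Returns an error string, or null on success.
    static String produce(String msg) {
        log.add(msg);     // 1. broker appends while it is still the leader
        isLeader = false; // 2. leadership moves away before the response is sent
        return isLeader ? null : "NOT_LEADER_FOR_PARTITION"; // 3. post-append check fails
    }

    public static void main(String[] args) {
        if (produce("m1") != null) {
            // 4. the retrying client resends; in this single-log toy the resend
            //    lands as a second copy of an already-appended message
            log.add("m1");
        }
        System.out.println(log); // prints [m1, m1]: the duplicate
    }
}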


Thoughts around solution:
If the leader has already accepted a message, it should not return the 
NOT_LEADER_FOR_PARTITION error for it. However, the purpose of this leadership 
check after the message has been accepted is not clear.



