Here is a case of data duplication that should be avoidable. It is observed when leadership of a partition changes from the current leader back to the preferred leader.
Steps to reproduce:
- Use a 3-broker setup.
- Create a topic with 1 partition, replication factor=3, min.insync.replicas=2, and leader.imbalance.check.interval.seconds=3.
- Make a note of which broker is the leader; this is the preferred leader. Stop this broker. This triggers a leader switch.
- Client data will now be sent to the newly elected leader.
- Using the perf producer client, pump data into the topic with these settings: acks=-1 batch.size=1 retries=900000. Using multiple clients makes the issue easier to reproduce. (An equivalent Java producer is sketched at the end of this section.)
- Start the preferred leader back up.
- Wait for leadership to switch back to the preferred leader and for the client to get redirected to it.
- Stop the producer client.
- Check for duplicates by dumping the partition using the console consumer. (A small duplicate-scanning consumer is also sketched at the end of this section.)
- You should see that the first message that received the NOT_LEADER_FOR_PARTITION error is duplicated.

Theory:
After experimenting with this using some custom instrumentation, the following seems to be the most likely cause (a toy model of the ordering appears at the end of this section):
- Around the time leadership is switching, the first leader is still the leader when it accepts the data, so it appends the message to its log; by the time it is ready to respond, it realizes it is no longer the leader and responds to the client with a NOT_LEADER_FOR_PARTITION error.
- The client notices the error and resends the value to the new leader (the preferred leader that has come back up), producing the duplicate.

Thoughts around solution:
If the leader has already accepted a message, it should not return the NOT_LEADER_FOR_PARTITION error for it. However, it is not clear what purpose this leadership check serves after the message has been accepted.
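For reference, the steps above drive load with the perf producer CLI; a minimal Java producer using the same settings might look like the sketch below. The broker addresses and the topic name repro-topic are assumptions, not part of the original report.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DuplicateReproProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumed addresses for the 3-broker test cluster; adjust as needed.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                "localhost:9092,localhost:9093,localhost:9094");
        // Settings from the reproduction steps: wait for all in-sync replicas,
        // tiny batches, and retry (almost) forever on NOT_LEADER_FOR_PARTITION.
        props.put(ProducerConfig.ACKS_CONFIG, "-1");
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 1);
        props.put(ProducerConfig.RETRIES_CONFIG, 900000);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Sequence-numbered values make duplicates easy to spot later.
            // Runs until killed, matching the "stop the producer client" step.
            for (long i = 0; ; i++) {
                producer.send(new ProducerRecord<>("repro-topic", Long.toString(i)));
            }
        }
    }
}
```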
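Similarly, instead of eyeballing the console-consumer dump, a small assigned-mode consumer could scan the partition and flag repeated values. This is only a sketch: it reuses the assumed repro-topic name from the producer above and treats one empty poll as the end of the partition.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class DuplicateChecker {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // The topic has a single partition, so scanning partition 0 covers it all.
        TopicPartition tp = new TopicPartition("repro-topic", 0);
        Set<String> seen = new HashSet<>();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(tp));
            consumer.seekToBeginning(Collections.singletonList(tp));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                if (records.isEmpty()) break; // crude end-of-partition check for this sketch
                for (ConsumerRecord<String, String> r : records) {
                    if (!seen.add(r.value())) {
                        System.out.printf("duplicate value %s at offset %d%n",
                                r.value(), r.offset());
                    }
                }
            }
        }
    }
}
```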
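To make the suspected ordering in the theory concrete, here is a self-contained toy model of the race. The names (handleProduce, isLeader, the in-memory log) are invented for illustration and do not correspond to Kafka's actual broker internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Toy model of the suspected race on the old leader. Illustrative only;
 * not Kafka's real broker code.
 */
public class LeaderSwitchRace {
    static final AtomicBoolean isLeader = new AtomicBoolean(true);
    static final List<String> log = new ArrayList<>();

    /** The hook stands in for the leadership change landing between
     *  the log append and the response to the client. */
    static String handleProduce(String record, Runnable betweenAppendAndResponse) {
        log.add(record);                 // append succeeds while still leader; the
                                         // record is replicated and kept by the new leader
        betweenAppendAndResponse.run();  // leadership moves to the preferred replica
        if (!isLeader.get()) {
            return "NOT_LEADER_FOR_PARTITION"; // error despite a successful append,
                                               // so the client retries the same record
        }
        return "OK";
    }

    public static void main(String[] args) {
        String response = handleProduce("msg-0", () -> isLeader.set(false));
        System.out.println("response=" + response + ", log=" + log);
        // Prints: response=NOT_LEADER_FOR_PARTITION, log=[msg-0]
        // The client, seeing the error, resends msg-0 to the new leader -> duplicate.
    }
}
```

In this model the record is already in the log before the leadership check runs, which is exactly why returning NOT_LEADER_FOR_PARTITION at that point pushes the client into a duplicating retry.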