Jun Rao created KAFKA-18625:
-------------------------------

             Summary: consumer client could get duplicated records if assigned 
partitions change quickly
                 Key: KAFKA-18625
                 URL: https://issues.apache.org/jira/browse/KAFKA-18625
             Project: Kafka
          Issue Type: Bug
          Components: consumer
            Reporter: Jun Rao


When a partition is unassigned from a consumer, we don't clear the buffered 
records in the client immediately. Instead, the next time the client calls poll(), 
[FetchCollector.fetchRecords()|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/FetchCollector.java#L156-L158]
 calls {{nextInLineFetch.drain()}} to discard the fetched data for that 
partition if it is no longer assigned. In the common case, the buffered data 
for an unassigned partition is drained before the partition is assigned back 
to the same consumer.
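
The lazy cleanup described above can be illustrated with a toy model. This is 
only a sketch: {{LazyDrainSketch}}, {{BufferedFetch}} and their methods are 
hypothetical names, not Kafka's internal classes, and the model assumes a single 
in-memory queue in place of the real fetch buffer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Toy model only: hypothetical names, not Kafka's internal classes.
final class LazyDrainSketch {
    // Stand-in for a buffered fetch result for one partition.
    static final class BufferedFetch {
        final String partition;
        final List<String> records;

        BufferedFetch(String partition, List<String> records) {
            this.partition = partition;
            this.records = records;
        }
    }

    private final Set<String> assigned = new HashSet<>();
    private final Queue<BufferedFetch> buffer = new ArrayDeque<>();

    void assign(String partition) {
        assigned.add(partition);
    }

    // Unassigning does not touch the buffer; stale data stays until the next poll().
    void unassign(String partition) {
        assigned.remove(partition);
    }

    void bufferFetch(String partition, List<String> records) {
        buffer.add(new BufferedFetch(partition, records));
    }

    int bufferedCount() {
        return buffer.size();
    }

    // On poll, data for unassigned partitions is dropped; the rest is returned.
    List<String> poll() {
        List<String> out = new ArrayList<>();
        BufferedFetch next;
        while ((next = buffer.poll()) != null) {
            if (assigned.contains(next.partition)) {
                out.addAll(next.records);
            }
            // Otherwise the entry is drained without being returned.
        }
        return out;
    }
}
```

In this model, calling {{unassign("p1")}} after {{bufferFetch("p1", ...)}} leaves 
{{bufferedCount()}} at 1, and only the following {{poll()}} discards the entry.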

However, in rare cases, the following sequence seems possible in theory: (1) 
partition1 is assigned to client1; (2) a CompletedFetch for partition1 is 
buffered in client1; (3) partition1 is unassigned from client1 and reassigned to 
client2; (4) client2 consumes the same data that was buffered in step (2); (5) 
partition1 is reassigned back to client1; (6) client1 calls poll() and consumes 
the data buffered in step (2), causing the same records to be returned a second 
time.
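
Steps (1) to (6) can be replayed in a toy model to make the interleaving 
concrete. This is a sketch under strong assumptions: {{StaleBufferScenario}} and 
{{ToyClient}} are hypothetical names, not Kafka classes, and the model 
deliberately ignores offsets, positions and commits, which the real client also 
tracks. It only shows that an assignment check made at poll() time cannot tell a 
stale buffer apart from a fresh one.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical names, not Kafka's internal classes.
// Offsets, positions and commits are intentionally left out.
final class StaleBufferScenario {
    static final class ToyClient {
        final Set<String> assigned = new HashSet<>();
        final List<String[]> buffer = new ArrayList<>(); // {partition, record}

        // Buffers every record of the log for the given partition.
        void fetch(String partition, List<String> log) {
            for (String r : log) {
                buffer.add(new String[] {partition, r});
            }
        }

        // Returns buffered records of currently assigned partitions, then clears the buffer.
        List<String> poll() {
            List<String> out = new ArrayList<>();
            for (String[] e : buffer) {
                if (assigned.contains(e[0])) {
                    out.add(e[1]);
                }
            }
            buffer.clear();
            return out;
        }
    }

    // Replays steps (1)-(6) and returns every record handed to the application, in order.
    static List<String> replay() {
        List<String> log = List.of("r0", "r1");
        List<String> delivered = new ArrayList<>();
        ToyClient client1 = new ToyClient();
        ToyClient client2 = new ToyClient();

        client1.assigned.add("partition1");      // (1)
        client1.fetch("partition1", log);        // (2) buffered, not yet polled
        client1.assigned.remove("partition1");   // (3) client1's buffer is left untouched
        client2.assigned.add("partition1");
        client2.fetch("partition1", log);        // (4) client2 consumes the same data
        delivered.addAll(client2.poll());
        client2.assigned.remove("partition1");   // (5) partition1 goes back to client1
        client1.assigned.add("partition1");
        delivered.addAll(client1.poll());        // (6) stale buffer passes the assignment check
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints [r0, r1, r0, r1]
    }
}
```

If client1 had called poll() between steps (3) and (5), the buffered entries 
would have been dropped and the second pair of records would not appear.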



