[ https://issues.apache.org/jira/browse/KAFKA-18625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17916172#comment-17916172 ]
Jun Rao commented on KAFKA-18625:
---------------------------------

[~hachikuji] : Do you think this is a real issue? Thanks.

> consumer client could get duplicated records if assigned partitions change quickly
> ----------------------------------------------------------------------------------
>
>                 Key: KAFKA-18625
>                 URL: https://issues.apache.org/jira/browse/KAFKA-18625
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>            Reporter: Jun Rao
>            Priority: Major
>
> When a partition is unassigned from a consumer, we don't clear the buffered
> records in the client immediately. Instead, when the client next calls poll(),
> [FetchCollector.fetchRecords()|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/FetchCollector.java#L156-L158]
> calls {{nextInLineFetch.drain()}} to discard the fetched data for that
> partition if the partition is unassigned at that point. In the common case,
> the buffered data for an unassigned partition is drained before the partition
> is assigned back again.
>
> However, in rare cases, the following sequence seems possible in theory:
> (1) partition1 is assigned to client1;
> (2) a CompletedFetch for partition1 is buffered in client1;
> (3) partition1 is unassigned from client1 and reassigned to client2;
> (4) client2 consumes the same data buffered in step (2);
> (5) partition1 is reassigned back to client1;
> (6) client1 calls poll() and consumes the data buffered in step (2), causing
> duplicated data to be returned to the application.
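The six steps above can be replayed with a small toy model. This is only a sketch under stated assumptions: {{ToyConsumer}}, {{bufferFetch}} and the {{clearOnUnassign}} flag are hypothetical names invented for illustration, not Kafka classes or APIs, and the "eager clear" variant is one conceivable mitigation, not necessarily the fix the project would choose. It also assumes client1 had not committed any offset past the buffered records, so client2 starts from the same position.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of the scenario; none of these types are Kafka classes. */
final class StaleBufferDemo {

    /** Hypothetical stand-in for a consumer holding buffered offsets of partition1. */
    static final class ToyConsumer {
        private final Deque<Long> buffered = new ArrayDeque<>(); // offsets in the fetch buffer
        private final boolean clearOnUnassign;
        private boolean assigned;

        ToyConsumer(boolean clearOnUnassign) {
            this.clearOnUnassign = clearOnUnassign;
        }

        void assign() {
            assigned = true;
        }

        void unassign() {
            assigned = false;
            if (clearOnUnassign) {
                buffered.clear(); // eager cleanup (assumed mitigation, for comparison only)
            }
        }

        void bufferFetch(long from, long toExclusive) {
            for (long offset = from; offset < toExclusive; offset++) {
                buffered.add(offset);
            }
        }

        /** Models the described behavior: the buffer is dropped only if unassigned at poll time. */
        List<Long> poll() {
            List<Long> out = new ArrayList<>();
            if (assigned) {
                out.addAll(buffered);
            }
            buffered.clear();
            return out;
        }
    }

    /** Replays steps (1)-(6) and returns every offset handed to an application. */
    static List<Long> run(boolean clearOnUnassign) {
        List<Long> delivered = new ArrayList<>();
        ToyConsumer client1 = new ToyConsumer(clearOnUnassign);
        ToyConsumer client2 = new ToyConsumer(clearOnUnassign);

        client1.assign();                  // (1) partition1 assigned to client1
        client1.bufferFetch(0, 3);         // (2) offsets 0..2 buffered, not yet polled
        client1.unassign();                // (3) partition1 moves to client2
        client2.assign();
        client2.bufferFetch(0, 3);         // (4) client2 fetches and consumes the same offsets
        delivered.addAll(client2.poll());
        client2.unassign();                // (5) partition1 moves back before client1 polls
        client1.assign();
        delivered.addAll(client1.poll());  // (6) client1 polls with its old buffer, if any
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println("lazy drain:  " + run(false));
        System.out.println("eager clear: " + run(true));
    }
}
```

With the lazy drain, offsets 0..2 are delivered twice because client1 never polls while the partition is unassigned, so its stale buffer survives the round trip. In the eager variant the buffer is already empty at step (6); a real client would then re-fetch from its current position, which this model does not simulate.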