[ https://issues.apache.org/jira/browse/KAFKA-18625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kirk True updated KAFKA-18625:
------------------------------
    Labels: consumer-threading-refactor  (was: )

> consumer client could get duplicated records if assigned partitions change quickly
> ----------------------------------------------------------------------------------
>
>                 Key: KAFKA-18625
>                 URL: https://issues.apache.org/jira/browse/KAFKA-18625
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>            Reporter: Jun Rao
>            Assignee: TengYao Chi
>            Priority: Major
>              Labels: consumer-threading-refactor
>
> When a partition is unassigned from a consumer, we don't clear the
> buffered records in the client immediately. When the client calls poll(),
> [FetchCollector.fetchRecords()|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/FetchCollector.java#L156-L158]
> will call {{nextInLineFetch.drain()}} to drain the fetched data for that
> partition if the partition is unassigned. In the common case, the buffered
> data for the unassigned partition will be drained before the partition is
> assigned back again.
> However, in rare cases the following sequence seems possible in theory:
> (1) partition1 is assigned to client1;
> (2) a CompletedFetch for partition1 is buffered in client1;
> (3) partition1 is reassigned to client2 and unassigned from client1;
> (4) client2 consumes the same data buffered in step (2);
> (5) partition1 is reassigned back to client1;
> (6) client1 calls poll() and consumes the data buffered in step (2),
> causing duplicated data to be returned to the client.
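
The six steps in the description can be replayed with a toy model. The sketch below is not Kafka code: StaleFetchSketch, ToyConsumer, BufferedFetch and the clearBufferOnUnassign flag are hypothetical names invented for illustration, and it assumes client1 never committed the buffered offsets, so client2 starts from the same position. It only shows why a buffer that survives unassignment can hand out the same offsets twice, and that dropping the buffered data at unassignment time (one possible mitigation, not necessarily the fix chosen for this issue) avoids it in this model.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StaleFetchSketch {

    /** Hypothetical stand-in for a buffered CompletedFetch: one partition, a run of offsets. */
    static final class BufferedFetch {
        final String partition;
        final long firstOffset;
        final int count;

        BufferedFetch(String partition, long firstOffset, int count) {
            this.partition = partition;
            this.firstOffset = firstOffset;
            this.count = count;
        }
    }

    /** Toy consumer: optionally keeps buffered fetches across an unassignment. */
    static final class ToyConsumer {
        private final Set<String> assigned = new HashSet<>();
        private final Deque<BufferedFetch> buffer = new ArrayDeque<>();
        private final boolean clearBufferOnUnassign;

        ToyConsumer(boolean clearBufferOnUnassign) {
            this.clearBufferOnUnassign = clearBufferOnUnassign;
        }

        void assign(String partition) {
            assigned.add(partition);
        }

        void unassign(String partition) {
            assigned.remove(partition);
            if (clearBufferOnUnassign) {
                // Assumed mitigation: drop buffered data as soon as the partition goes away.
                buffer.removeIf(f -> f.partition.equals(partition));
            }
        }

        void bufferFetch(BufferedFetch fetch) {
            buffer.add(fetch);
        }

        /** Returns the offsets handed to the application; fetches for unassigned partitions are drained. */
        List<Long> poll() {
            List<Long> out = new ArrayList<>();
            while (!buffer.isEmpty()) {
                BufferedFetch f = buffer.poll();
                if (!assigned.contains(f.partition)) {
                    continue; // partition is not assigned at poll time: discard the data
                }
                for (int i = 0; i < f.count; i++) {
                    out.add(f.firstOffset + i);
                }
            }
            return out;
        }
    }

    /** Replays steps (1)-(6) and returns every offset delivered by either client. */
    static List<Long> replay(boolean clearBufferOnUnassign) {
        ToyConsumer client1 = new ToyConsumer(clearBufferOnUnassign);
        ToyConsumer client2 = new ToyConsumer(clearBufferOnUnassign);
        List<Long> delivered = new ArrayList<>();

        client1.assign("partition1");                                  // (1)
        client1.bufferFetch(new BufferedFetch("partition1", 0L, 3));   // (2)
        client1.unassign("partition1");                                // (3) no poll() happens in between
        client2.assign("partition1");
        client2.bufferFetch(new BufferedFetch("partition1", 0L, 3));   // (4) client2 fetches the same offsets
        delivered.addAll(client2.poll());
        client2.unassign("partition1");                                // (5)
        client1.assign("partition1");
        delivered.addAll(client1.poll());                              // (6) stale buffer is served if still present
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println("buffer kept on unassign:    " + replay(false)); // [0, 1, 2, 0, 1, 2]
        System.out.println("buffer cleared on unassign: " + replay(true));  // [0, 1, 2]
    }
}
```

In this model the duplicate appears only because client1 never calls poll() between steps (3) and (5); a single poll() in that window would have drained the stale fetch, which matches the "common case" in the description.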