[ 
https://issues.apache.org/jira/browse/KAFKA-18625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-18625:
------------------------------
    Labels: consumer-threading-refactor  (was: )

> consumer client could get duplicated records if assigned partitions change 
> quickly
> ----------------------------------------------------------------------------------
>
>                 Key: KAFKA-18625
>                 URL: https://issues.apache.org/jira/browse/KAFKA-18625
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>            Reporter: Jun Rao
>            Assignee: TengYao Chi
>            Priority: Major
>              Labels: consumer-threading-refactor
>
> When a partition is unassigned from a consumer, we don't clear the buffered
> records for that partition in the client immediately. Instead, when the
> client calls poll(),
> [FetchCollector.fetchRecords()|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/FetchCollector.java#L156-L158]
>  calls {{nextInLineFetch.drain()}} to drain the fetched data for that
> partition if the partition is no longer assigned. In the common case, the
> buffered data for an unassigned partition is drained before the partition
> is assigned back again.
> However, in rare cases, the following sequence seems theoretically possible:
> (1) partition1 is assigned to client1;
> (2) a CompletedFetch for partition1 is buffered in client1;
> (3) partition1 is unassigned from client1 and reassigned to client2;
> (4) client2 consumes the same data that was buffered in step (2);
> (5) partition1 is reassigned back to client1 before that buffered data has
>     been drained;
> (6) client1 calls poll() and returns the data buffered in step (2), so the
>     same records are delivered to the application twice.
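
The sequence above can be replayed with a small, hypothetical model of a consumer that buffers fetched records per partition and only drains stale data lazily inside poll(). This is a sketch under stated assumptions, not the real client: ModelConsumer, BufferedFetch, bufferFetch() and the drainOnUnassign flag are invented for illustration and do not exist in Kafka's FetchCollector or FetchBuffer. The drainOnUnassign=true path shows one hypothetical mitigation (discarding buffered data at unassignment time); it is not necessarily the fix adopted for this ticket.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the race in steps (1)-(6); not Kafka's real consumer internals. */
public class BufferedFetchRace {

    /** Hypothetical buffered fetch: a partition name plus the record offsets it holds. */
    record BufferedFetch(String partition, List<Long> offsets) {}

    static final class ModelConsumer {
        private final Set<String> assigned = new HashSet<>();
        private final Deque<BufferedFetch> buffer = new ArrayDeque<>();
        private final boolean drainOnUnassign; // hypothetical mitigation switch

        ModelConsumer(boolean drainOnUnassign) {
            this.drainOnUnassign = drainOnUnassign;
        }

        void assign(String partition) {
            assigned.add(partition);
        }

        void unassign(String partition) {
            assigned.remove(partition);
            if (drainOnUnassign) {
                // Hypothetical mitigation: drop buffered data as soon as the partition is revoked.
                buffer.removeIf(f -> f.partition().equals(partition));
            }
        }

        void bufferFetch(BufferedFetch fetch) {
            buffer.add(fetch);
        }

        /** Returns buffered offsets; data for partitions unassigned *at poll time* is drained lazily. */
        List<Long> poll() {
            List<Long> out = new ArrayList<>();
            while (!buffer.isEmpty()) {
                BufferedFetch f = buffer.poll();
                if (assigned.contains(f.partition())) {
                    out.addAll(f.offsets());
                }
                // Otherwise the fetch is discarded, loosely analogous to nextInLineFetch.drain().
            }
            return out;
        }
    }

    /** Replays steps (1)-(6) from client1's point of view and returns what its poll() yields. */
    static List<Long> replay(boolean drainOnUnassign) {
        ModelConsumer client1 = new ModelConsumer(drainOnUnassign);
        client1.assign("partition1");                                                // (1)
        client1.bufferFetch(new BufferedFetch("partition1", List.of(0L, 1L, 2L)));  // (2)
        client1.unassign("partition1");                                              // (3)
        // (4) client2 consumes offsets 0-2; not modelled here.
        client1.assign("partition1");                                                // (5) no poll() in between
        return client1.poll();                                                       // (6)
    }

    public static void main(String[] args) {
        System.out.println("lazy drain:  " + replay(false)); // stale offsets 0-2 are returned again
        System.out.println("eager drain: " + replay(true));  // nothing stale is returned
    }
}
```

With lazy draining, client1's poll() in step (6) still finds partition1 assigned, so the stale offsets 0, 1 and 2 are returned even though client2 already consumed them. Draining at unassignment time removes the window between steps (3) and (5).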



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
