Jun Rao created KAFKA-18625:
-------------------------------

             Summary: consumer client could get duplicated records if assigned partitions change quickly
                 Key: KAFKA-18625
                 URL: https://issues.apache.org/jira/browse/KAFKA-18625
             Project: Kafka
          Issue Type: Bug
          Components: consumer
            Reporter: Jun Rao
When a partition is unassigned from a consumer, the client does not clear the records it has already buffered for that partition. Instead, when the client calls poll(), [FetchCollector.fetchRecords()|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/FetchCollector.java#L156-L158] calls {{nextInLineFetch.drain()}} to drain the fetched data for that partition if the partition is no longer assigned. In the common case, the buffered data for an unassigned partition is drained before the partition is assigned back again. However, in a rare case, the following sequence seems theoretically possible:

(1) partition1 is assigned to client1;
(2) a CompletedFetch for partition1 is buffered in client1;
(3) partition1 is reassigned to client2 and unassigned from client1;
(4) client2 consumes the same data buffered in step (2);
(5) partition1 is reassigned back to client1;
(6) client1 calls poll() and consumes the data buffered in step (2), causing duplicated data to be returned to the application.

Because partition1 is assigned to client1 again by the time client1 polls, the assignment check in step (6) passes and the stale CompletedFetch is never drained.
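The race can be replayed in a toy, single-threaded model. The sketch below is hypothetical and is not Kafka code: `ToyConsumer`, `bufferFetch`, `runScenario`, and the per-partition "assignment epoch" are illustrative names, and the model covers only the poll-time assignment check described above, not the rest of the consumer's logic. It shows that asking "is the partition assigned right now?" cannot distinguish a fetch buffered under an earlier assignment from one buffered under the current one. It also tries one possible mitigation, tagging each buffered fetch with an assignment epoch and dropping it on mismatch; that mitigation is an assumption for illustration, not the fix adopted for KAFKA-18625.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of a consumer-side fetch buffer; NOT Kafka's implementation.
// ToyConsumer, bufferFetch and the per-partition assignment epoch are illustrative names.
public class BufferedFetchSim {

    // A buffered batch of record offsets for one partition, tagged with the
    // (hypothetical) assignment epoch in effect when it was buffered.
    record BufferedFetch(String partition, long epoch, List<Long> offsets) {}

    static final class ToyConsumer {
        private final Set<String> assigned = new HashSet<>();
        private final Map<String, Long> epochs = new HashMap<>();
        private final Deque<BufferedFetch> pending = new ArrayDeque<>();
        private final boolean checkEpoch;

        ToyConsumer(boolean checkEpoch) {
            this.checkEpoch = checkEpoch;
        }

        // Every (re)assignment bumps the partition's epoch.
        void assign(String partition) {
            assigned.add(partition);
            epochs.merge(partition, 1L, Long::sum);
        }

        // Mirrors the behavior described in the issue: unassigning a partition
        // does not clear its buffered data right away.
        void unassign(String partition) {
            assigned.remove(partition);
        }

        void bufferFetch(String partition, List<Long> offsets) {
            pending.add(new BufferedFetch(partition, epochs.getOrDefault(partition, 0L), offsets));
        }

        // Returns buffered offsets; fetches that fail the check are drained (dropped).
        List<Long> poll() {
            List<Long> returned = new ArrayList<>();
            while (!pending.isEmpty()) {
                BufferedFetch fetch = pending.poll();
                boolean isAssigned = assigned.contains(fetch.partition());
                boolean sameEpoch = fetch.epoch() == epochs.getOrDefault(fetch.partition(), 0L);
                if (isAssigned && (!checkEpoch || sameEpoch)) {
                    returned.addAll(fetch.offsets());
                }
            }
            return returned;
        }
    }

    // Replays steps (1)-(6) from client1's point of view.
    static List<Long> runScenario(boolean checkEpoch) {
        ToyConsumer client1 = new ToyConsumer(checkEpoch);
        client1.assign("partition1");                           // (1)
        client1.bufferFetch("partition1", List.of(100L, 101L)); // (2)
        client1.unassign("partition1");                         // (3) moves to client2
        // (4) client2 consumes offsets 100-101 (not modelled here)
        client1.assign("partition1");                           // (5) moves back before client1 polls
        return client1.poll();                                  // (6)
    }

    public static void main(String[] args) {
        System.out.println("assignment check only: " + runScenario(false)); // [100, 101]
        System.out.println("with epoch check:      " + runScenario(true));  // []
    }
}
```

With only the assignment check, step (6) returns the stale offsets `[100, 101]` that client2 already consumed; with the epoch check, the fetch buffered before the reassignment is dropped and the list is empty. A real client would likely also have to account for fetches that are still in flight when the assignment changes, which this model ignores.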