[ 
https://issues.apache.org/jira/browse/KAFKA-18469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu-Lin Chen updated KAFKA-18469:
--------------------------------
    Description: 
In AsyncConsumer, a ListOffsetRequest is retried only after a metadata
update [1]. However, not every retriable error is followed by a metadata
update; the ReplicaNotAvailable error from remote storage is one example. When
no update arrives, the request is never re-sent, so Consumer#offsetsForTimes
fails only after the API timeout (60 seconds).

This issue does not occur with ClassicConsumer, which always triggers a
metadata update before retrying [2].
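The difference between the two retry policies can be illustrated with a small, self-contained Java simulation. Everything in it is hypothetical: `SimulatedBroker`, `SimulatedMetadata`, `fetchOffsets`, the 2-second readiness delay, and the 100 ms backoff are illustrative assumptions, not Kafka classes or configuration, and the ClassicConsumer side is modeled only loosely. It shows the control flow described above: if a retriable error such as ReplicaNotAvailable is never followed by a metadata update, a retry that waits for that update never fires and the call runs into the 60-second API timeout.

```java
import java.util.Optional;

// Hypothetical simulation of the two retry policies. None of these types are
// Kafka APIs; they only model the control flow described in this issue.
public class ListOffsetsRetrySketch {

    // Local stand-in for the relevant error codes (not Kafka's Errors enum).
    enum ErrorCode { NONE, REPLICA_NOT_AVAILABLE }

    // Simulated broker: returns REPLICA_NOT_AVAILABLE until the (assumed)
    // remote metadata cache becomes ready at readyAtMs.
    static final class SimulatedBroker {
        private final long readyAtMs;
        SimulatedBroker(long readyAtMs) { this.readyAtMs = readyAtMs; }
        ErrorCode listOffsets(long nowMs) {
            return nowMs >= readyAtMs ? ErrorCode.NONE : ErrorCode.REPLICA_NOT_AVAILABLE;
        }
    }

    // Simulated metadata: an update completes only if one was requested.
    static final class SimulatedMetadata {
        private boolean updateRequested = false;
        void requestUpdate() { updateRequested = true; }
        // Returns true if a requested update completes now, clearing the request.
        boolean pollUpdate() {
            boolean completed = updateRequested;
            updateRequested = false;
            return completed;
        }
    }

    /**
     * Returns the simulated time at which offsets were obtained, or empty if the
     * API timeout expired first.
     *
     * @param requestUpdateOnRetriable true: a retriable error requests a metadata
     *        update (ClassicConsumer-like); false: the retry waits for an update
     *        nobody requested (the AsyncConsumer behavior in this issue).
     */
    static Optional<Long> fetchOffsets(SimulatedBroker broker,
                                       boolean requestUpdateOnRetriable,
                                       long timeoutMs,
                                       long retryBackoffMs) {
        SimulatedMetadata metadata = new SimulatedMetadata();
        long nowMs = 0;
        boolean awaitingUpdate = false;
        while (nowMs < timeoutMs) {
            if (!awaitingUpdate) {
                ErrorCode error = broker.listOffsets(nowMs);
                if (error == ErrorCode.NONE) return Optional.of(nowMs);
                // Retriable error: the request is re-sent only after a
                // metadata update completes.
                awaitingUpdate = true;
                if (requestUpdateOnRetriable) metadata.requestUpdate();
            } else if (metadata.pollUpdate()) {
                awaitingUpdate = false; // update done, retry immediately
                continue;
            }
            nowMs += retryBackoffMs; // advance the simulated clock
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        SimulatedBroker broker = new SimulatedBroker(2_000); // ready after 2 s
        long timeoutMs = 60_000;                            // 60 s API timeout
        long backoffMs = 100;
        System.out.println("gated on metadata update: "
                + fetchOffsets(broker, false, timeoutMs, backoffMs));
        System.out.println("update requested on error: "
                + fetchOffsets(broker, true, timeoutMs, backoffMs));
    }
}
```

Running `main` should print `Optional.empty` for the policy that waits on an unrequested metadata update and `Optional[2000]` when the retriable error itself requests the update. A simulated clock is used instead of `Thread.sleep` so that the 60-second timeout is reached instantly.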

This issue is the root cause of the flaky test KAFKA-18036, in which
Consumer#offsetsForTimes is called before the remote metadata cache is initialized.

[1] https://github.com/apache/kafka/blob/5684fc7a2ee1a4f29cb6d69d713233ed3c297882/clients/src/main/java/org/apache/kafka/clients/consumer/internals/OffsetsRequestManager.java#L529-L534

[2] https://github.com/apache/kafka/blob/5684fc7a2ee1a4f29cb6d69d713233ed3c297882/clients/src/main/java/org/apache/kafka/clients/consumer/internals/OffsetFetcher.java#L180

  was:
In AsyncConsumer, a ListOffsetRequest is retried only after a metadata
update [1]. However, not every retriable error is followed by a metadata
update; the ReplicaNotAvailable error from remote storage is one example. When
no update arrives, the request is never re-sent, so Consumer#offsetsForTimes
fails only after the API timeout (60 seconds).

This issue does not happen in ClassicConsumer [2]. We should keep the behavior
aligned.

This issue is the root cause of the flaky test KAFKA-18036, in which
Consumer#offsetsForTimes is called before the remote metadata cache is initialized.

[1] https://github.com/apache/kafka/blob/5684fc7a2ee1a4f29cb6d69d713233ed3c297882/clients/src/main/java/org/apache/kafka/clients/consumer/internals/OffsetsRequestManager.java#L529-L534

[2] https://github.com/apache/kafka/blob/5684fc7a2ee1a4f29cb6d69d713233ed3c297882/clients/src/main/java/org/apache/kafka/clients/consumer/internals/OffsetFetcher.java#L144-L153


> AsyncConsumer fails to retry ListOffsetRequest on ReplicaNotAvailable error 
> without metadata update
> ---------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-18469
>                 URL: https://issues.apache.org/jira/browse/KAFKA-18469
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>            Reporter: Yu-Lin Chen
>            Assignee: Yu-Lin Chen
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)