Lianet Magrans created KAFKA-17696:
--------------------------------------

             Summary: New consumer background operations unaware of metadata errors
                 Key: KAFKA-17696
                 URL: https://issues.apache.org/jira/browse/KAFKA-17696
             Project: Kafka
          Issue Type: Bug
          Components: clients, consumer
            Reporter: Lianet Magrans

When a metadata error happens (e.g. an unauthorized topic), the network layer is the one that detects it, and it only propagates the error to the app thread via an ErrorEvent.
[https://github.com/apache/kafka/blob/0edf5dbd204df9eb62bfea1b56993e95737df5a3/clients/src/main/java/org/apache/kafka/clients/consumer/internals/NetworkClientDelegate.java#L153]

That allows API calls that processBackgroundEvents to throw the error in the app thread (e.g. poll, unsubscribe and close, which are currently the only API calls that processBackgroundEvents). This means that all other API calls, which do not processBackgroundEvents, will never know about errors like unauthorized topics.

Moreover, it means that the background operations are not notified or aborted when a metadata error (e.g. an auth error) happens. For example, a call to position blocks waiting for updateFetchPositions ([here|https://github.com/apache/kafka/blob/0edf5dbd204df9eb62bfea1b56993e95737df5a3/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java#L1586]) and will leave a pendingOffsetFetchEvent waiting to complete, even when the background thread has already received an Unauthorized exception (it only passed it to the app thread via an ErrorEvent).

I wonder if we should ensure that metadata errors are not only propagated to the app thread via ErrorEvents, but that we also notify all request managers in the background (so that each can decide whether to completeExceptionally its outstanding events). For example,
OffsetsRequestManager.onMetadataError should completeExceptionally the pendingOffsetFetchEvent. (This is just a first thought and there could be other approaches, but note that calling processBackgroundEvents in API calls like position will not do, because we would block first on the CheckAndUpdatePositions, and processBackgroundEvents would only run after the CheckAndUpdatePositions completes.)

This behaviour can be reproduced with the integration test testOffsetFetchWithNoAccess with the new consumer enabled.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
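
A minimal sketch of the proposal in this report, not the actual Kafka code: it only illustrates how fanning a metadata error out to the request managers could let a pending offset fetch fail fast instead of waiting. The MetadataErrorListener interface, the OffsetsManager class and notifyManagers are hypothetical stand-ins for the real request managers and network layer, and SecurityException stands in for Kafka's authorization exception so that the example needs no Kafka dependency.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class MetadataErrorSketch {

    /** Hypothetical callback; assumed here, not taken from the existing request manager API. */
    interface MetadataErrorListener {
        void onMetadataError(Exception error);
    }

    /** Stand-in for OffsetsRequestManager, reduced to a single pending event. */
    static class OffsetsManager implements MetadataErrorListener {
        // Plays the role of pendingOffsetFetchEvent; null when nothing is in flight.
        private CompletableFuture<Long> pendingOffsetFetch;

        synchronized CompletableFuture<Long> fetchPosition() {
            if (pendingOffsetFetch == null || pendingOffsetFetch.isDone()) {
                pendingOffsetFetch = new CompletableFuture<>();
            }
            return pendingOffsetFetch;
        }

        @Override
        public synchronized void onMetadataError(Exception error) {
            if (pendingOffsetFetch != null) {
                // No effect if the future has already completed.
                pendingOffsetFetch.completeExceptionally(error);
            }
        }
    }

    /** Stand-in for the network layer notifying every manager of the error. */
    static void notifyManagers(List<? extends MetadataErrorListener> managers, Exception error) {
        for (MetadataErrorListener manager : managers) {
            manager.onMetadataError(error);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        OffsetsManager offsets = new OffsetsManager();
        CompletableFuture<Long> position = offsets.fetchPosition();

        // The background thread detects the metadata error and notifies the managers.
        notifyManagers(List.of(offsets), new SecurityException("Not authorized to access topic"));

        try {
            position.get(); // returns immediately because the future already failed
        } catch (ExecutionException e) {
            System.out.println("position failed fast: " + e.getCause().getMessage());
        }
    }
}
```

With this shape, the caller blocked on the future sees the error as soon as the background thread reports it, without depending on a later processBackgroundEvents call. Whether every manager should abort its outstanding events on every metadata error is a design question the sketch does not answer.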