[ https://issues.apache.org/jira/browse/KAFKA-15776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811071#comment-17811071 ]

Jorge Esteban Quilcate Otoya commented on KAFKA-15776:
------------------------------------------------------

Thanks [~showuon]! Sure, I can prepare a KIP if there's an initial agreement on the path to follow. I will prepare something for next week.

On not interrupting the thread:
My understanding is that, currently, on a consumer remote fetch, requests are submitted to the thread pool and cancelled on timeout – only then retried. This means only 1 task is submitted per consumer-partition remote fetch at any time.
If we opt for not interrupting the tasks, the future would be cancelled but the thread would keep running until completion. On timeout, the consumer would retry fetching, allocating yet another task on the thread pool. Potentially, we would have more than one task submitted per consumer-partition remote fetch, holding more resources than needed to serve a single consumer-partition consumption from remote storage.

Let me know if it makes sense. This is mostly speculation, so I can dive further if some of my reasoning is incorrect.

> Update delay timeout for DelayedRemoteFetch request
> ---------------------------------------------------
>
>                 Key: KAFKA-15776
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15776
>             Project: Kafka
>          Issue Type: Task
>            Reporter: Kamal Chandraprakash
>            Assignee: Kamal Chandraprakash
>            Priority: Major
>
> We are reusing the {{fetch.max.wait.ms}} config as a delay timeout for
> DelayedRemoteFetchPurgatory. The purpose of {{fetch.max.wait.ms}} is to wait for the
> given amount of time when there is no data available to serve the FETCH
> request.
> {code:java}
> The maximum amount of time the server will block before answering the fetch
> request if there isn't sufficient data to immediately satisfy the requirement
> given by fetch.min.bytes.
> {code}
> [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/DelayedRemoteFetch.scala#L41]
> Using the same timeout in the DelayedRemoteFetchPurgatory can confuse the
> user on how to configure an optimal value for each purpose. Moreover, the config
> is of *LOW* importance, so most users won't configure it and will use the
> default value of 500 ms.
> Having a delay timeout of 500 ms in DelayedRemoteFetchPurgatory can lead to a
> higher number of expired delayed remote fetch requests when the remote
> storage has any degradation.
> We should introduce one {{fetch.remote.max.wait.ms}} config (preferably a
> server config) to define the delay timeout for DelayedRemoteFetch requests
> (or) take it from the client, similar to {{request.timeout.ms}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
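
The concern raised in the comment about not interrupting the thread can be illustrated with a small standalone sketch. This is not Kafka code: the class {{RemoteFetchRetrySketch}}, the method {{peakConcurrentTasks}}, the pool size, and all timings are hypothetical, and mapping "not interrupting" to {{Future.cancel(false)}} is an assumption made for illustration. The sketch submits a slow task, times out, cancels, retries once, and records how many tasks were running at the same time.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

public class RemoteFetchRetrySketch {

    // Returns the peak number of simultaneously running "remote read" tasks
    // for one simulated consumer-partition across two fetch attempts.
    static int peakConcurrentTasks(boolean interruptOnCancel) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4); // hypothetical reader pool
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        // Hypothetical stand-in for a remote storage read that is slower
        // than the fetch timeout.
        Callable<Void> slowRemoteRead = () -> {
            int now = running.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            try {
                Thread.sleep(1000);
            } finally {
                running.decrementAndGet();
            }
            return null;
        };

        try {
            for (int attempt = 0; attempt < 2; attempt++) {
                Future<Void> future = pool.submit(slowRemoteRead);
                try {
                    future.get(200, TimeUnit.MILLISECONDS); // simulated delay timeout
                } catch (TimeoutException e) {
                    // cancel(true) interrupts the worker thread;
                    // cancel(false) only marks the future as cancelled.
                    future.cancel(interruptOnCancel);
                    Thread.sleep(100); // let an interrupted task unwind before retrying
                }
            }
            return peak.get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("peak tasks, cancel(false): " + peakConcurrentTasks(false));
        System.out.println("peak tasks, cancel(true):  " + peakConcurrentTasks(true));
    }
}
```

Under these assumptions, the run without interruption should show the first task still occupying a pool thread when the retry is submitted, while the run with interruption keeps at most one task active. Whether the real remote read responds to interruption depends on the storage plugin, which the sketch does not model.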