[ https://issues.apache.org/jira/browse/KAFKA-19236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chia-Ping Tsai reassigned KAFKA-19236:
--------------------------------------

    Assignee: Chia-Ping Tsai

> Auto.offset.reset "by-duration" should reset a single time
> ----------------------------------------------------------
>
>                 Key: KAFKA-19236
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19236
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients, consumer, streams
>    Affects Versions: 4.0.0
>            Reporter: Matthias J. Sax
>            Assignee: Chia-Ping Tsai
>            Priority: Major
>
> KIP-1106 introduced a new option "by-duration" for config `auto.offset.reset`
> (https://cwiki.apache.org/confluence/display/KAFKA/KIP-1106%3A+Add+duration+based+offset+reset+option+for+consumer+clients)
>
> If a consumer tries to reset to a "future" time, the observed behavior is
> somewhat odd, and we should change it:
>
> Assume there is a topic to which no new data was written for the last hour.
> A new consumer starts up at 1pm and tries to reset by 10 minutes. There is
> no data for 12:50pm, so the consumer won't complete the reset, but will keep
> retrying (every 30 seconds by default) until it can resolve offsets. The
> issue is that the "seek ts" (i.e., 12:50pm) is recomputed on every retry and
> thus moves while the consumer waits.
>
> Because the consumer could not resolve offsets and still has `offsets=null`,
> it keeps re-executing the reset logic.
>
> Hence, if there is no data for another 30 minutes, the consumer would now
> retry to find offsets at 1:20pm. This is rather unexpected: if one starts
> the consumer at 1:00pm and resets by 10 minutes, it's reasonable to assume
> that data would start flowing when the topic reaches 12:50pm, even if the
> consumer was idling for 30 minutes.
>
> Thus, instead of executing the reset logic every 30 seconds, the reset logic
> should be executed once, to compute 12:50pm as "seek ts". If the request
> returns `null` (i.e., no offset found), the same request should be resent
> every 30 seconds without re-triggering the reset logic itself, so that the
> "seek ts" stays at 12:50pm.
>
> Kafka Streams, which re-implements the by-duration reset logic itself, has
> the same behavior as the consumer and should be updated as well.
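
The difference between the observed and the proposed behavior can be illustrated with a minimal Java sketch. This is not the consumer's actual code: the class `ByDurationResetSketch`, the method `seekTsAfterRetries`, and the calendar date are hypothetical names and values chosen for illustration, and the sketch assumes the offset lookup finds nothing on every attempt. It uses the numbers from the description (start at 1:00pm, reset by 10 minutes, retry every 30 seconds, 30 minutes without data).

```java
import java.time.Duration;
import java.time.Instant;

public class ByDurationResetSketch {

    /**
     * Returns the "seek ts" used on the last attempt after the given number of
     * failed retries. Hypothetical helper: every lookup is assumed to find no offset.
     *
     * @param computeOnce true  = proposed behavior (seek ts computed a single time)
     *                    false = observed behavior (seek ts recomputed on every retry)
     */
    static Instant seekTsAfterRetries(Instant start, Duration resetBy,
                                      Duration retryInterval, int retries,
                                      boolean computeOnce) {
        Instant now = start;
        // The reset logic runs once here and yields the initial seek timestamp.
        Instant fixedSeekTs = start.minus(resetBy);
        Instant seekTs = fixedSeekTs;
        for (int attempt = 0; attempt < retries; attempt++) {
            // No offset was found, so the consumer waits for the next retry.
            now = now.plus(retryInterval);
            // Observed: re-run the reset logic against the new "now".
            // Proposed: resend the same request with the unchanged timestamp.
            seekTs = computeOnce ? fixedSeekTs : now.minus(resetBy);
        }
        return seekTs;
    }

    public static void main(String[] args) {
        // Assumed date; only the time of day (1:00pm) comes from the description.
        Instant start = Instant.parse("2025-01-01T13:00:00Z");
        Duration resetBy = Duration.ofMinutes(10);
        Duration retryInterval = Duration.ofSeconds(30);
        int retries = 60; // 60 retries * 30 seconds = 30 minutes without data

        // Seek ts has drifted to 1:20pm.
        System.out.println("recomputed:    "
                + seekTsAfterRetries(start, resetBy, retryInterval, retries, false));
        // Seek ts is still 12:50pm.
        System.out.println("computed once: "
                + seekTsAfterRetries(start, resetBy, retryInterval, retries, true));
    }
}
```

With the timestamp computed once, the consumer would begin reading as soon as the topic contains data at or after 12:50pm, regardless of how long it idled.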