[ https://issues.apache.org/jira/browse/KAFKA-6108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955071#comment-16955071 ]
Matthias J. Sax commented on KAFKA-6108: ---------------------------------------- Didn't your PR not also change when we refresh the committed offsets? We do it more intelligently (and less frequently) now IIRC. The main question is really, if we think that this ticket still applies or not. \cc [~guozhang] WDYT? > Synchronizing on commits and StandbyTasks can be improved > --------------------------------------------------------- > > Key: KAFKA-6108 > URL: https://issues.apache.org/jira/browse/KAFKA-6108 > Project: Kafka > Issue Type: Bug > Components: streams > Affects Versions: 1.0.0 > Reporter: Matthias J. Sax > Priority: Major > > In Kafka Streams, we use an optimization that allows us to reuse a source > topic as changelog topic (and thus, avoid unnecessary data duplication) if we > read a topic directly as {{KTable}}. To guarantee that {{StandbyTasks}} > provide a correct state, we need to synchronize the read progress of the > {{StandbyTasks}} with the processing progress of the main {{StreamTask}} --- > otherwise, the {{StandbyTasks}} might restore state too much into the future. > For this, we limit the allowed restore offsets of the {{StandbyTasks}} to be > not larger than the committed offsets of the {{StreamTask}}. > Furthermore, we buffer all data returned by the restore consumer that is > beyond the allowed restore-offsets in-memory. > To achieve both goals, we regularly update the max allowed restore offsets > (this is done within task internally) and we also use a flag > {{processStandbyRecords}} within {{StreamThread}} with the purpose to not > call {{poll()}} on the restore consumer if our in-memory buffer has already > data beyond the allowed max restore offsets. > We should consider: > - unify both places in the code and put the whole logic into a single place > (suggestion is to use the {{StreamThread}} -- a tasks, does not need to know > about this optimization) > - feed only those data into the task, that the task is allowed to restore > (instead of everything) -- This message was sent by Atlassian Jira (v8.3.4#803005)