[ https://issues.apache.org/jira/browse/KAFKA-6108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias J. Sax resolved KAFKA-6108. ------------------------------------ Resolution: Fixed > Synchronizing on commits and StandbyTasks can be improved > --------------------------------------------------------- > > Key: KAFKA-6108 > URL: https://issues.apache.org/jira/browse/KAFKA-6108 > Project: Kafka > Issue Type: Bug > Components: streams > Affects Versions: 1.0.0 > Reporter: Matthias J. Sax > Priority: Major > > In Kafka Streams, we use an optimization that allows us to reuse a source > topic as changelog topic (and thus, avoid unnecessary data duplication) if we > read a topic directly as {{KTable}}. To guarantee that {{StandbyTasks}} > provide a correct state, we need to synchronize the read progress of the > {{StandbyTasks}} with the processing progress of the main {{StreamTask}} --- > otherwise, the {{StandbyTasks}} might restore state too much into the future. > For this, we limit the allowed restore offsets of the {{StandbyTasks}} to be > not larger than the committed offsets of the {{StreamTask}}. > Furthermore, we buffer all data returned by the restore consumer that is > beyond the allowed restore-offsets in-memory. > To achieve both goals, we regularly update the max allowed restore offsets > (this is done within task internally) and we also use a flag > {{processStandbyRecords}} within {{StreamThread}} with the purpose to not > call {{poll()}} on the restore consumer if our in-memory buffer has already > data beyond the allowed max restore offsets. > We should consider: > - unify both places in the code and put the whole logic into a single place > (suggestion is to use the {{StreamThread}} -- a tasks, does not need to know > about this optimization) > - feed only those data into the task, that the task is allowed to restore > (instead of everything) -- This message was sent by Atlassian Jira (v8.3.4#803005)