[jira] [Resolved] (KAFKA-6108) Synchronizing on commits and StandbyTasks can be improved

Matthias J. Sax (Jira) Wed, 23 Oct 2019 21:20:00 -0700


     [ 
https://issues.apache.org/jira/browse/KAFKA-6108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Matthias J. Sax resolved KAFKA-6108.
------------------------------------
    Resolution: Fixed

> Synchronizing on commits and StandbyTasks can be improved
> ---------------------------------------------------------
>
>                 Key: KAFKA-6108
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6108
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 1.0.0
>            Reporter: Matthias J. Sax
>            Priority: Major
>
> In Kafka Streams, we use an optimization that allows us to reuse a source 
> topic as changelog topic (and thus, avoid unnecessary data duplication) if we 
> read a topic directly as {{KTable}}. To guarantee that {{StandbyTasks}} 
> provide a correct state, we need to synchronize the read progress of the 
> {{StandbyTasks}} with the processing progress of the main {{StreamTask}} --- 
> otherwise, the {{StandbyTasks}} might restore state too much into the future. 
> For this, we limit the allowed restore offsets of the {{StandbyTasks}} to be 
> not larger than the committed offsets of the {{StreamTask}}.
> Furthermore, we buffer all data returned by the restore consumer that is 
> beyond the allowed restore-offsets in-memory.
> To achieve both goals, we regularly update the max allowed restore offsets 
> (this is done within task internally) and we also use a flag 
> {{processStandbyRecords}} within {{StreamThread}} with the purpose to not 
> call {{poll()}} on the restore consumer if our in-memory buffer has already 
> data beyond the allowed max restore offsets.
> We should consider:
>  - unify both places in the code and put the whole logic into a single place 
> (suggestion is to use the {{StreamThread}} -- a tasks, does not need to know 
> about this optimization)
>  - feed only those data into the task, that the task is allowed to restore 
> (instead of everything)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Resolved] (KAFKA-6108) Synchronizing on commits and StandbyTasks can be improved

Reply via email to