[ 
https://issues.apache.org/jira/browse/KAFKA-6108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955550#comment-16955550
 ] 

Guozhang Wang commented on KAFKA-6108:
--------------------------------------

I think Chris' PR did improve on the current situation already, such that we 
only try to refresh committed offsets when we commit and hence likely that 
offset limit can be extended. I think we can resolve this ticket now.

> Synchronizing on commits and StandbyTasks can be improved
> ---------------------------------------------------------
>
>                 Key: KAFKA-6108
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6108
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 1.0.0
>            Reporter: Matthias J. Sax
>            Priority: Major
>
> In Kafka Streams, we use an optimization that allows us to reuse a source 
> topic as changelog topic (and thus, avoid unnecessary data duplication) if we 
> read a topic directly as {{KTable}}. To guarantee that {{StandbyTasks}} 
> provide a correct state, we need to synchronize the read progress of the 
> {{StandbyTasks}} with the processing progress of the main {{StreamTask}} --- 
> otherwise, the {{StandbyTasks}} might restore state too much into the future. 
> For this, we limit the allowed restore offsets of the {{StandbyTasks}} to be 
> not larger than the committed offsets of the {{StreamTask}}.
> Furthermore, we buffer all data returned by the restore consumer that is 
> beyond the allowed restore-offsets in-memory.
> To achieve both goals, we regularly update the max allowed restore offsets 
> (this is done within task internally) and we also use a flag 
> {{processStandbyRecords}} within {{StreamThread}} with the purpose to not 
> call {{poll()}} on the restore consumer if our in-memory buffer has already 
> data beyond the allowed max restore offsets.
> We should consider:
>  - unify both places in the code and put the whole logic into a single place 
> (suggestion is to use the {{StreamThread}} -- a tasks, does not need to know 
> about this optimization)
>  - feed only those data into the task, that the task is allowed to restore 
> (instead of everything)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to