[jira] [Commented] (KAFKA-7994) Improve Partition-Time for rebalances and restarts

Sophie Blee-Goldman (Jira) Mon, 06 Jan 2020 11:51:03 -0800


    [ 
https://issues.apache.org/jira/browse/KAFKA-7994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009128#comment-17009128
 ]


Sophie Blee-Goldman commented on KAFKA-7994:
--------------------------------------------

We fixed only part of this issue: partition-time is now preserved (fixed for 
2.4), but because partition-time is no longer used to determine stream-time, 
that fix does not cover the full original scope. I've updated the description 
to cover only the preservation of partition-time. The remaining work on 
preserving stream-time was broken out into a new ticket so we can track it 
separately; see KAFKA-9368.

> Improve Partition-Time for rebalances and restarts
> --------------------------------------------------
>
>                 Key: KAFKA-7994
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7994
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Matthias J. Sax
>            Assignee: Richard Yu
>            Priority: Major
>             Fix For: 2.4.0
>
>         Attachments: possible-patch.diff
>
>
> We compute a per-partition partition-time as the maximum timestamp over all 
> records processed so far. Before 2.3 this was used to determine the logical 
> stream-time, which is used to decide whether to process out-of-order records 
> or to drop them if they are late (i.e., timestamp < stream-time - grace-period). 
> Preserving the stream-time is necessary to ensure deterministic results (see 
> KAFKA-9368), and although the processor-time is now used instead of 
> partition-time, preserving the partition-time is a first step towards 
> improving the overall stream-time semantics.
> The partition-time is also used by the TimestampExtractor. It gets passed in 
> to #extract and can be used to determine a rough timestamp estimate if the 
> actual timestamp is missing, corrupt, etc. This means that in the corner case 
> where the actual timestamp of the next record to be processed after a 
> rebalance/restart cannot be determined, we have no way of coming up with a 
> reasonable guess, and the record will likely have to be dropped.
>  
> A potential fix would be to store the latest observed partition-time in the 
> metadata of committed offsets. This way, on restart/rebalance we can 
> re-initialize partition-time correctly.
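
The idea in the description above can be sketched roughly as follows. This is 
a minimal illustration, not the code that was merged for 2.4: the class name, 
the Base64 encoding, and the use of -1 as an "unknown" marker are all 
assumptions made for the example. It relies only on the fact that an offset 
commit can carry a metadata string next to the offset, and it has no Kafka 
dependency so that it can be run on its own.

```java
import java.nio.ByteBuffer;
import java.util.Base64;

// Hypothetical helper: shows how a partition-time could round-trip through
// the metadata string of a committed offset. Not Kafka's actual code.
final class PartitionTimeMetadataSketch {

    // Assumed marker for "no partition-time known yet".
    static final long UNKNOWN = -1L;

    // Partition-time is the maximum timestamp over all records seen so far.
    static long observe(long partitionTime, long recordTimestamp) {
        return Math.max(partitionTime, recordTimestamp);
    }

    // Encode the partition-time as a Base64 string for the commit metadata.
    static String encode(long partitionTime) {
        byte[] bytes = ByteBuffer.allocate(Long.BYTES).putLong(partitionTime).array();
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Decode the metadata read back on restart/rebalance. Missing or
    // malformed metadata yields UNKNOWN instead of failing.
    static long decode(String metadata) {
        if (metadata == null || metadata.isEmpty()) {
            return UNKNOWN;
        }
        try {
            byte[] bytes = Base64.getDecoder().decode(metadata);
            return bytes.length == Long.BYTES ? ByteBuffer.wrap(bytes).getLong() : UNKNOWN;
        } catch (IllegalArgumentException e) {
            return UNKNOWN;
        }
    }

    // Fallback an extractor might apply: use the record's own timestamp if
    // it is valid (non-negative), otherwise fall back to partition-time.
    static long resolveTimestamp(long recordTimestamp, long partitionTime) {
        return recordTimestamp >= 0 ? recordTimestamp : partitionTime;
    }

    public static void main(String[] args) {
        long partitionTime = observe(observe(UNKNOWN, 1_000L), 750L);
        String committed = encode(partitionTime);   // stored with the offset
        long restored = decode(committed);          // read back after restart
        System.out.println(resolveTimestamp(-1L, restored));
    }
}
```

With the restored value available, a record whose own timestamp is invalid 
can still be given an estimate; without it, the fallback would return the 
"unknown" marker and the record would likely be dropped.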



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
