[ https://issues.apache.org/jira/browse/KAFKA-7994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009128#comment-17009128 ]
Sophie Blee-Goldman commented on KAFKA-7994: -------------------------------------------- Since we fixed part of this issue but not the full scope since partition-time is no longer used to determine stream-time, I've updated the description to cover only the preservation of partition-time (which was fixed for 2.4). The remaining work w.r.t preserving stream-time was broken out into a new ticket so we can track that separately. See KAFKA-9368 > Improve Partition-Time for rebalances and restarts > -------------------------------------------------- > > Key: KAFKA-7994 > URL: https://issues.apache.org/jira/browse/KAFKA-7994 > Project: Kafka > Issue Type: Bug > Components: streams > Reporter: Matthias J. Sax > Assignee: Richard Yu > Priority: Major > Fix For: 2.4.0 > > Attachments: possible-patch.diff > > > We compute a per-partition partition-time as the maximum timestamp over all > records processed so far. Before 2.3 this was used to determine the logical > stream-time used to make decisions about processing out-of-order records or > drop them if they are late (ie, timestamp < stream-time - grace-period). > Preserving the stream-time is necessary to ensure deterministic results (see > KAFKA-9368), and although the processor-time is now used instead of > partition-time, preserving the partition-time is a first step towards > improving the overall stream-time semantics. > The partition-time is also used by the TimestampExtractor. It gets passed in > to #extract and can be used to determine a rough timestamp estimate if the > actual timestamp is missing, corrupt, etc. This means in the corner case > where the next record to be processed after a rebalance/restart cannot have > its actual timestamp determined, we have no idea way of coming up with a > reasonable guess and the record will likely have to be dropped. > > A potential fix would be, to store latest observed partition-time in the > metadata of committed offsets. This way, on restart/rebalance we can > re-initialize partition-time correctly. -- This message was sent by Atlassian Jira (v8.3.4#803005)