[ https://issues.apache.org/jira/browse/KAFKA-7994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sophie Blee-Goldman updated KAFKA-7994: --------------------------------------- Summary: Improve Partition-Time for rebalances and restarts (was: Improve Stream-Time for rebalances and restarts) > Improve Partition-Time for rebalances and restarts > -------------------------------------------------- > > Key: KAFKA-7994 > URL: https://issues.apache.org/jira/browse/KAFKA-7994 > Project: Kafka > Issue Type: Bug > Components: streams > Reporter: Matthias J. Sax > Assignee: Richard Yu > Priority: Major > Fix For: 2.4.0 > > Attachments: possible-patch.diff > > > We compute a per-partition partition-time as the maximum timestamp over all > records processed so far. Before 2.3 this was used to determine the logical > stream-time used to make decisions about processing out-of-order records or > drop them if they are late (ie, timestamp < stream-time - grace-period). > Preserving the stream-time is necessary to ensure deterministic results (see > KAFKA-9368), and although the processor-time is now used instead of > partition-time, preserving the partition-time is a first step towards > improving the overall stream-time semantics. > The partition-time is also used by the TimestampExtractor. It gets passed in > to #extract and can be used to determine a rough timestamp estimate if the > actual timestamp is missing, corrupt, etc. This means in the corner case > where the next record to be processed after a rebalance/restart cannot have > its actual timestamp determined, we have no idea way of coming up with a > reasonable guess and the record will likely have to be dropped. > > A potential fix would be, to store latest observed partition-time in the > metadata of committed offsets. This way, on restart/rebalance we can > re-initialize partition-time correctly. -- This message was sent by Atlassian Jira (v8.3.4#803005)