Matthias J. Sax created KAFKA-20099:
---------------------------------------
Summary: Clarify how "stream-time" works in the docs
Key: KAFKA-20099
URL: https://issues.apache.org/jira/browse/KAFKA-20099
Project: Kafka
Issue Type: Improvement
Components: docs, streams
Reporter: Matthias J. Sax
In Kafka Streams, we have the high-level concept of "stream-time" which we
briefly explain in the docs. However, we do not fully cover the details. Given
other external material on the topic (blog post; talks), the high level concept
of "stream-time" is well established and known by users.
However, the implementation details matter and we have seen user
mis-understanding how it work in particular. We should update the docs
accordingly.
The most important notion is, that KS runtime implements "stream-time" on a per
task level based in input records for the task (we can call this "task time").
This implies that there is no global notion of "stream time", and each task
tacks it's own "stream time" independently. Because "task time" is a runtime
concept, KS also preserves "task time" across rebalances and application
restarts.
In contrast, some DSL operators (which use a grace period) implement their own
"stream time" tracking (we can call this "operator time"). An individual
Processor might not get all record from the task input topics (upstream
operators might filter, or cache records), and most importantly, operators
track their internal "stream time" only in-memory, and thus it's not preserved
across rebalances / restarts. "Operator time" would be re-established based on
the first input record after a rebalance / restart. – This is an important
difference to "task time" as it impact how grace-periods are applied and should
be documented.
Cf https://issues.apache.org/jira/browse/KAFKA-9368
--
This message was sent by Atlassian Jira
(v8.20.10#820010)