Max Gekk created SPARK-57843:
--------------------------------
Summary: Support nanosecond-precision timestamps in streaming
stateful operators
Key: SPARK-57843
URL: https://issues.apache.org/jira/browse/SPARK-57843
Project: Spark
Issue Type: Sub-task
Components: Structured Streaming
Affects Versions: 4.3.0
Reporter: Max Gekk
This sub-task is part of the umbrella SPARK-56822 (timestamps with nanosecond
precision).
h2. Problem
Streaming stateful operators assume that event times are microsecond-precision
{{Long}} values:
* {{StreamingSymmetricHashJoinExec}} (~L747, ~L980-984) reads event times with
{{getLong}} and scales the watermark as {{watermarkMs * 1000}}.
* {{StreamingSessionWindowStateManager}} (~L135) hard-codes {{TimestampType}}
in the state key schema.
* {{SymmetricHashJoinStateManager}} reads event times via {{getLong}}.

{{RocksDBStateEncoder}} is already schema-generic; the operators above are not.
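
The effect of the microsecond assumption on eviction can be illustrated with a small standalone sketch. It does not use Spark; the object and helper names ({{MicrosTruncationSketch}}, {{nanosToMicros}}, {{evictableMicros}}, {{evictableNanos}}) are hypothetical, and the "evict when event time <= watermark" predicate is an assumption made for illustration only. The only detail taken from the operators is the {{watermarkMs * 1000}} scaling.

```scala
object MicrosTruncationSketch {
  // Hypothetical helper (not a Spark API): truncate epoch nanoseconds to epoch microseconds.
  def nanosToMicros(epochNanos: Long): Long = Math.floorDiv(epochNanos, 1000L)

  // Microsecond path: the same milliseconds-to-microseconds scaling as `watermarkMs * 1000`.
  // Assumed predicate: a row is evictable when its event time is <= the watermark.
  def evictableMicros(eventNanos: Long, watermarkMs: Long): Boolean =
    nanosToMicros(eventNanos) <= watermarkMs * 1000L

  // Nanosecond path: compare at full resolution (milliseconds scaled to nanoseconds).
  def evictableNanos(eventNanos: Long, watermarkMs: Long): Boolean =
    eventNanos <= watermarkMs * 1000000L

  def main(args: Array[String]): Unit = {
    val watermarkMs = 1700000000000L
    val atWatermark = watermarkMs * 1000000L // exactly at the watermark, in nanoseconds
    val justAfter = atWatermark + 1L         // one nanosecond later

    // Both comparisons agree for the row exactly at the watermark.
    println(evictableNanos(atWatermark, watermarkMs))  // true
    println(evictableMicros(atWatermark, watermarkMs)) // true

    // They disagree for the row one nanosecond later: truncation hides the difference.
    println(evictableNanos(justAfter, watermarkMs))    // false
    println(evictableMicros(justAfter, watermarkMs))   // true
  }
}
```

In this sketch, a row that arrives one nanosecond after the watermark is treated as evictable once its event time has been truncated to microseconds, which is the kind of resolution loss this sub-task aims to avoid.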
h2. Goal
Allow nanosecond event-time columns to flow through stream-stream join eviction
and session-window state, preserving nanosecond resolution in state keys and
eviction comparisons.
h2. Scope
Update the state schema and eviction/read paths in the listed operators to
handle {{TimestampNanosVal}}.
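
As a rough sketch of what a nanosecond-aware eviction comparison could look like, the standalone example below models an event time as a pair of fields. {{NanosEventTime}} and its two fields are hypothetical stand-ins chosen for illustration; they are not the layout of {{TimestampNanosVal}}, which this ticket does not describe. A two-field form is used here only because a single {{Long}} of epoch nanoseconds covers roughly 292 years on either side of 1970. The "evict when event time <= watermark" rule is likewise an assumption.

```scala
object NanosEvictionSketch {
  // Hypothetical stand-in for a nanosecond event time; NOT the layout of TimestampNanosVal.
  final case class NanosEventTime(epochMicros: Long, nanosWithinMicro: Int) {
    require(nanosWithinMicro >= 0 && nanosWithinMicro < 1000,
      "nanosWithinMicro must be in [0, 1000)")
  }

  // Order by microseconds first, then by the sub-microsecond remainder.
  implicit val nanosEventTimeOrdering: Ordering[NanosEventTime] =
    Ordering.by((t: NanosEventTime) => (t.epochMicros, t.nanosWithinMicro))

  // A millisecond watermark has no sub-microsecond part, so the remainder is zero.
  def fromWatermarkMs(watermarkMs: Long): NanosEventTime =
    NanosEventTime(watermarkMs * 1000L, 0)

  // Split state into (evicted, retained) using the assumed `eventTime <= watermark` rule.
  def evict(
      state: Seq[NanosEventTime],
      watermark: NanosEventTime): (Seq[NanosEventTime], Seq[NanosEventTime]) =
    state.partition(t => nanosEventTimeOrdering.lteq(t, watermark))

  def main(args: Array[String]): Unit = {
    val watermark = fromWatermarkMs(1700000000000L)
    val atWatermark = NanosEventTime(watermark.epochMicros, 0)
    val oneNanoLater = NanosEventTime(watermark.epochMicros, 1)

    val (evicted, retained) = evict(Seq(atWatermark, oneNanoLater), watermark)
    println(evicted)  // only the row exactly at the watermark
    println(retained) // the row one nanosecond later is kept
  }
}
```

The point of the sketch is that both the stored key and the comparison carry the sub-microsecond part, so two rows in the same microsecond can fall on opposite sides of the watermark.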
h2. Acceptance criteria
* Stream-stream joins and session windows keyed on / bounded by nanosecond
event time evict and emit correctly.
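
One way to picture a boundary case for the session-window criterion is the standalone sketch below. It is not Spark's session-merging code: {{SessionGapSketch}}, {{Session}}, and {{sessionize}} are hypothetical, and the merge rule (an event joins the current half-open session {{[start, end)}} when its time is strictly before {{end}}) is an assumption for illustration. The same function is run once on nanosecond values and once on the same events truncated to microseconds.

```scala
object SessionGapSketch {
  // Hypothetical half-open session [start, end) in an arbitrary integer time unit.
  final case class Session(start: Long, end: Long)

  // Assumed rule: an event extends the current session when it falls strictly
  // before the session end; otherwise it opens a new session of length `gap`.
  def sessionize(eventTimes: Seq[Long], gap: Long): List[Session] =
    eventTimes.sorted.foldLeft(List.empty[Session]) {
      case (cur :: done, t) if t < cur.end =>
        Session(cur.start, math.max(cur.end, t + gap)) :: done
      case (acc, t) =>
        Session(t, t + gap) :: acc
    }.reverse

  def main(args: Array[String]): Unit = {
    // Two events 950 ns apart with a 1000 ns (1 microsecond) gap.
    val nanos = Seq(100L, 1050L)
    println(sessionize(nanos, 1000L)) // List(Session(100,2050)): one merged session

    // The same events truncated to microseconds (0 and 1) with a gap of 1 microsecond.
    val micros = nanos.map(n => Math.floorDiv(n, 1000L))
    println(sessionize(micros, 1L))   // List(Session(0,1), Session(1,2)): two sessions
  }
}
```

Under these assumptions the two resolutions produce different session boundaries for the same input, which is the sort of case a nanosecond-keyed test in {{StreamingSessionWindowSuite}} might cover.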
h2. Testing
{{StreamingJoinSuite}}, {{StreamingSessionWindowSuite}}.
h2. Dependencies
Implement after SPARK-57830 (event-time watermark on nanosecond columns) and
SPARK-57829 (window/session_window over nanosecond timestamps).