Zeyu Chen created SPARK-51097:
---------------------------------
Summary: Adding partition-level metrics for last uploaded snapshot
version in RocksDB
Key: SPARK-51097
URL: https://issues.apache.org/jira/browse/SPARK-51097
Project: Spark
Issue Type: Improvement
Components: Structured Streaming
Affects Versions: 4.0, 4.1
Reporter: Zeyu Chen
We currently lack detailed visibility into partition-level state maintenance in
RocksDB. This makes it difficult to diagnose performance degradation that
originates in maintenance tasks.
To remediate this, we will add partition-level metrics to
StreamingQueryProgress that track the latest snapshot version uploaded in RocksDB.
This improvement addresses three observability challenges:
* Uneven partition starvation: identifying partitions with slow state
maintenance.
* Missing snapshots across versions: finding gaps so that we minimize extensive
replays during recovery.
* Performance instability: gaining insight into snapshot upload patterns.
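
As a rough illustration of how such metrics might be consumed, the sketch below flags partitions whose last uploaded snapshot version trails the current version by more than a threshold. The metric-name prefix, the flat name-to-version map, and the `lagging_partitions` helper are all assumptions made for this example; the issue does not specify the final metric naming or where in StreamingQueryProgress the values will appear.

```python
from typing import Dict, List

# Hypothetical metric-name prefix; the real naming is not specified in this issue.
PREFIX = "snapshotLastUploaded.partition_"


def lagging_partitions(
    custom_metrics: Dict[str, int], current_version: int, max_lag: int
) -> List[int]:
    """Return ids of partitions whose last uploaded snapshot version
    trails current_version by more than max_lag versions."""
    lagging = []
    for name, version in custom_metrics.items():
        if not name.startswith(PREFIX):
            continue  # skip unrelated metrics
        partition_id = int(name[len(PREFIX):])
        if current_version - version > max_lag:
            lagging.append(partition_id)
    return sorted(lagging)


# Made-up sample values, shaped like a name -> value metrics map.
metrics = {
    "snapshotLastUploaded.partition_0": 98,
    "snapshotLastUploaded.partition_1": 40,
    "snapshotLastUploaded.partition_2": 100,
    "numRowsTotal": 12345,
}
print(lagging_partitions(metrics, current_version=100, max_lag=10))  # prints [1]
```

A partition that stays in this list across several progress updates would be a candidate for the starvation case described above, and a large gap hints at a long replay on recovery.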