[ 
https://issues.apache.org/jira/browse/SPARK-51097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zeyu Chen updated SPARK-51097:
------------------------------
    Description: 
We currently lack detailed visibility into state maintenance at the state 
store level in RocksDB. This makes it difficult to identify the cause of 
performance degradation in maintenance tasks. 

To remediate this, we will add state store "instance" metrics to 
StreamingQueryProgress that track the latest snapshot version uploaded in 
RocksDB.

This improvement addresses three challenges in observability:
 * Uneven partition starvation, where we need to identify partitions with slow 
state maintenance,
 * Missing snapshots across versions, which we need to find so that we minimize 
extensive replays during recovery,
 * Performance instability, where we need insight into snapshot upload 
patterns.
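
As a rough illustration of how such instance metrics might be used to spot 
the first challenge, the sketch below flags state stores whose last uploaded 
snapshot version trails the current version by more than a threshold. The 
metric keys ("partition_0", ...), the function name, and the threshold are 
hypothetical and are not taken from this issue or from the Spark API; the 
actual metric names and how they are exposed in StreamingQueryProgress may 
differ.

```python
from typing import Dict, List


def find_lagging_stores(
    last_uploaded: Dict[str, int], current_version: int, max_lag: int
) -> List[str]:
    """Return ids of stores whose last uploaded snapshot version trails
    current_version by more than max_lag versions."""
    return sorted(
        store_id
        for store_id, version in last_uploaded.items()
        if current_version - version > max_lag
    )


# Hypothetical per-instance metrics: store id -> last uploaded snapshot version.
# These keys and values are made up for illustration only.
metrics = {"partition_0": 98, "partition_1": 41, "partition_2": 100}

lagging = find_lagging_stores(metrics, current_version=100, max_lag=10)
print(lagging)  # ['partition_1']
```

A store that shows up repeatedly in such a list would be a candidate for 
slow or starved maintenance, and a large gap also hints at how many versions 
might need to be replayed on recovery.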

  was:
We currently lack detailed visibility into partition-level state maintenance in 
RocksDB. This limitation affects the ability to identify performance 
degradation issues behind maintenance tasks. 

To remediate this, we will add the partition-level metrics to 
StreamingQueryProgress to track the latest snapshot version uploaded in RocksDB.

This improvement addresses three challenges in observability:
 * Uneven partition starvation, where we need to identify partitions with slow 
state maintenance,
 * Finding missing snapshots across versions, so we minimize extensive replays 
during recovery,
 * Identify performance instability, such as gaining insights into snapshot 
upload patterns


> Adding state store level metrics for last uploaded snapshot version in RocksDB
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-51097
>                 URL: https://issues.apache.org/jira/browse/SPARK-51097
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 4.0.0, 4.1.0
>            Reporter: Zeyu Chen
>            Priority: Minor
>              Labels: pull-request-available


