Zeyu Chen created SPARK-51252:
---------------------------------

             Summary: Adding state store level metrics for last uploaded 
snapshot version in HDFS State Stores
                 Key: SPARK-51252
                 URL: https://issues.apache.org/jira/browse/SPARK-51252
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 4.0.0, 4.1.0
            Reporter: Zeyu Chen


As with SPARK-51097, we would like to bring the same level of observability to 
HDFSBackedStateStore.

Adding state store "instance" metrics to StreamingQueryProgress that track the 
latest snapshot version uploaded by HDFS state stores should address three 
observability challenges:
 * Identifying uneven partition starvation, i.e. partitions with slow state 
maintenance,
 * Finding missing snapshots across versions, so that recovery requires less 
replay,
 * Identifying performance instability, for example through insight into 
snapshot upload patterns.
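
As an illustration of the first challenge, the following is a minimal sketch of 
how a consumer of such metrics might flag partitions whose snapshot uploads are 
falling behind. It assumes the metrics arrive as a flat name-to-value map; the 
key format "snapshotLastUploaded.partition_<id>", the function name and the 
max_lag threshold are all hypothetical and are not specified by this issue.

```python
from typing import Dict, List

# Hypothetical key prefix; the real metric names are not given in this issue.
PREFIX = "snapshotLastUploaded.partition_"


def lagging_partitions(
    metrics: Dict[str, int], current_version: int, max_lag: int
) -> List[int]:
    """Return ids of partitions whose last uploaded snapshot version
    trails current_version by more than max_lag versions."""
    lagging = []
    for name, version in metrics.items():
        if not name.startswith(PREFIX):
            continue  # skip metrics that are not per-partition snapshot versions
        partition_id = int(name[len(PREFIX):])
        if current_version - version > max_lag:
            lagging.append(partition_id)
    return sorted(lagging)


# Made-up sample values, for illustration only.
sample = {
    "snapshotLastUploaded.partition_0": 120,
    "snapshotLastUploaded.partition_1": 95,
    "snapshotLastUploaded.partition_2": 118,
    "numRowsTotal": 5000,
}
print(lagging_partitions(sample, current_version=121, max_lag=10))  # prints [1]
```

With these sample values only partition 1 is reported, since it trails the 
current version by 26 versions while the others trail by 1 and 3.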

The instance metrics should be kept as generalized as possible, so that future 
instance metrics for observability can be added with minimal refactoring.
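
One way to picture a "generalized" instance metric is a small descriptor that 
carries a name prefix, an initial value and a rule for combining reported 
values, so that a new metric only needs a new descriptor. This is a sketch 
under that assumption, not the design proposed for Spark; InstanceMetric, 
report, the key layout and the "default" store name are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class InstanceMetric:
    """Hypothetical descriptor for a per-state-store-instance metric."""
    prefix: str
    initial: int
    combine: Callable[[int, int], int]  # how to merge a new report with the old value

    def key(self, partition_id: int, store_name: str) -> str:
        # Hypothetical naming scheme: one key per (partition, store) instance.
        return f"{self.prefix}.partition_{partition_id}_{store_name}"


# The snapshot version should never move backwards, so combine with max.
SNAPSHOT_LAST_UPLOADED = InstanceMetric("snapshotLastUploaded", -1, max)


def report(
    metric: InstanceMetric,
    partition_id: int,
    store_name: str,
    value: int,
    sink: Dict[str, int],
) -> None:
    """Record value for one store instance, merging with any earlier report."""
    key = metric.key(partition_id, store_name)
    sink[key] = metric.combine(sink.get(key, metric.initial), value)


progress: Dict[str, int] = {}
report(SNAPSHOT_LAST_UPLOADED, 0, "default", 7, progress)
report(SNAPSHOT_LAST_UPLOADED, 0, "default", 5, progress)  # stale report is ignored
report(SNAPSHOT_LAST_UPLOADED, 1, "default", 6, progress)
print(progress)
```

Under this layout, adding another instance metric would mean defining another 
descriptor with its own prefix and combine rule, leaving report unchanged.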




