Zeyu Chen created SPARK-51252:
---------------------------------

Summary: Adding state store level metrics for last uploaded snapshot version in HDFS State Stores
Key: SPARK-51252
URL: https://issues.apache.org/jira/browse/SPARK-51252
Project: Spark
Issue Type: Improvement
Components: Structured Streaming
Affects Versions: 4.0.0, 4.1.0
Reporter: Zeyu Chen
As in SPARK-51097, we would like to bring the same level of observability to HDFSBackedStateStore. Adding state store "instance" metrics to StreamingQueryProgress that track the latest snapshot version uploaded by each HDFS state store should address three observability challenges:

* Uneven partition starvation: identifying partitions whose state maintenance is slow.
* Missing snapshots across versions: finding gaps so that extensive replays during recovery are minimized.
* Performance instability: gaining insight into snapshot upload patterns.

The instance metrics should be kept as general as possible, so that future instance metrics for observability can be added with minimal refactoring.
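To illustrate how a per-partition "last uploaded snapshot version" metric could surface the first two challenges, here is a minimal sketch. It assumes the instance metrics have already been extracted from StreamingQueryProgress into a plain mapping from partition ID to last uploaded snapshot version, with None meaning no snapshot has been uploaded yet. The function name `find_lagging_partitions`, the `max_lag` threshold and that input shape are hypothetical illustrations, not part of Spark's API.

```python
from typing import Dict, List, Optional


def find_lagging_partitions(
    last_snapshot_versions: Dict[int, Optional[int]],
    committed_version: int,
    max_lag: int,
) -> List[int]:
    """Return partition IDs whose last uploaded snapshot trails the
    committed state version by more than max_lag versions.

    Hypothetical helper: the input mapping stands in for the proposed
    per-partition instance metrics; it is not a real Spark API.
    """
    lagging = []
    for partition_id, snapshot_version in sorted(last_snapshot_versions.items()):
        if snapshot_version is None:
            # No snapshot uploaded yet: recovery would have to replay every
            # delta file for this partition, so always flag it.
            lagging.append(partition_id)
        elif committed_version - snapshot_version > max_lag:
            # The snapshot is stale: many deltas would be replayed on recovery.
            lagging.append(partition_id)
    return lagging


if __name__ == "__main__":
    # Hypothetical metric values for four partitions.
    metrics = {0: 118, 1: 120, 2: 95, 3: None}
    print(find_lagging_partitions(metrics, committed_version=120, max_lag=10))
    # [2, 3]
```

Comparing each partition against the committed version, rather than only against the other partitions, also catches the case where snapshot uploads have stalled on every partition at once.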