[ https://issues.apache.org/jira/browse/SPARK-51252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-51252:
-----------------------------------
    Labels: pull-request-available  (was: )

> Adding state store level metrics for last uploaded snapshot version in HDFS State Stores
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-51252
>                 URL: https://issues.apache.org/jira/browse/SPARK-51252
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 4.0.0, 4.1.0
>            Reporter: Zeyu Chen
>            Priority: Minor
>              Labels: pull-request-available
>
> As with SPARK-51097, we would like to introduce a similar level of
> observability to HDFSBackedStateStore.
> Introducing state store "instance" metrics in StreamingQueryProgress
> to track the latest snapshot version uploaded in HDFS state stores should
> address three observability challenges:
> * Identifying uneven partition starvation, that is, partitions with
> slow state maintenance,
> * Finding missing snapshots across versions, so that we minimize extensive
> replays during recovery,
> * Identifying performance instability, for example by gaining insight into
> snapshot upload patterns.
> The instance metrics should be kept as general as possible, so that
> future instance metrics for observability can be added with minimal
> refactoring.
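To make the first challenge above concrete, here is a minimal sketch of how a consumer of such per-instance metrics might flag partitions whose snapshot uploads are falling behind. It is not taken from the issue or from Spark: the key prefix `snapshotLastUploaded.partition_`, the helper names, and the lag threshold are all hypothetical, and the input is a plain dictionary standing in for whatever metric map a progress report would expose.

```python
from typing import Dict, List

# Hypothetical key layout, for illustration only. The issue does not say
# how instance metrics are named inside StreamingQueryProgress.
PREFIX = "snapshotLastUploaded.partition_"


def parse_snapshot_versions(metrics: Dict[str, int]) -> Dict[int, int]:
    """Map partition id -> last uploaded snapshot version.

    Keys that do not start with the assumed prefix are ignored.
    """
    versions: Dict[int, int] = {}
    for key, value in metrics.items():
        if key.startswith(PREFIX):
            partition_id = int(key[len(PREFIX):])
            versions[partition_id] = value
    return versions


def lagging_partitions(
    versions: Dict[int, int], current_version: int, max_lag: int
) -> List[int]:
    """Return partitions whose last snapshot trails current_version by more than max_lag."""
    return sorted(
        partition
        for partition, uploaded in versions.items()
        if current_version - uploaded > max_lag
    )


# Example input: three state store partitions plus one unrelated metric.
sample_metrics = {
    "snapshotLastUploaded.partition_0": 40,
    "snapshotLastUploaded.partition_1": 12,
    "snapshotLastUploaded.partition_2": 38,
    "numRowsTotal": 1000,
}

versions = parse_snapshot_versions(sample_metrics)
slow = lagging_partitions(versions, current_version=42, max_lag=10)
print(slow)  # partition 1 trails version 42 by 30, so it is the only one flagged
```

A large gap between the committed version and the last uploaded snapshot for a partition means recovery for that partition would have to replay more changes, which is the situation the second bullet describes.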