Hi Community,

SCM maintains a DeleteBlockTransaction table [1]. Each transaction record in this table carries a retry count [2]. The retry count increases every time SCM retries the delete transaction, until it exceeds the maximum limit; SCM then stops retrying, and an admin can analyze why those blocks failed to delete.
Because the count is written to the DB on every retry, I want to discuss whether an optimization is worthwhile: maintain the retry count as in-memory state and write it to the DB only once it exceeds the limit (so the final count is still left behind for later analysis). The motivation is that in SCM HA we replicate DB changes over Ratis, so continuing to persist the retry count on every increase would cost roughly 3x compared to now. The drawback of only persisting the retry count at the limit is that if SCM restarts, the in-memory count is cleared and counting starts over. A rough sketch of the idea is in the P.S. below.

[1]: https://github.com/apache/ozone/blob/master/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/metadata/SCMMetadataStore.java#L70
[2]: https://github.com/apache/ozone/blob/master/hadoop-hdds/interface-server/src/main/proto/ScmServerDatanodeHeartbeatProtocol.proto#L331

Thanks,
Rui Wang
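P.S. To make the idea concrete, here is a rough sketch of the in-memory bookkeeping, assuming we keep a map from transaction ID to retry count and flush to the DB only when a transaction hits the limit. All names below (InMemoryRetryTracker, RetryStore, persistFailedTransaction) are made up for illustration and are not the actual Ozone API:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch only: RetryStore and persistFailedTransaction are hypothetical
// names, standing in for whatever writes through Ratis into RocksDB.
public class InMemoryRetryTracker {

  /** Hypothetical sink for transactions that hit the retry limit. */
  public interface RetryStore {
    void persistFailedTransaction(long txId, int retryCount);
  }

  private final ConcurrentMap<Long, Integer> retryCounts =
      new ConcurrentHashMap<>();
  private final RetryStore store;
  private final int maxRetryCount;

  public InMemoryRetryTracker(RetryStore store, int maxRetryCount) {
    this.store = store;
    this.maxRetryCount = maxRetryCount;
  }

  /**
   * Called each time a delete transaction is retried. Returns true if
   * the transaction may be retried again, false once it has exceeded
   * the limit, at which point the count is persisted for analysis.
   * That persist is the only time we pay the Ratis replication and
   * DB write cost.
   */
  public boolean onRetry(long txId) {
    int count = retryCounts.merge(txId, 1, Integer::sum);
    if (count > maxRetryCount) {
      store.persistFailedTransaction(txId, count);
      retryCounts.remove(txId);
      return false;
    }
    return true;
  }

  /** Called when a delete transaction succeeds; drop its counter. */
  public void onSuccess(long txId) {
    retryCounts.remove(txId);
  }
}

Note that the map is empty after an SCM restart, which is exactly the drawback above: counting starts over for in-flight transactions. Whether that is acceptable is part of what I'd like to discuss.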