Hi Community,

SCM maintains a DeleteBlockTransaction table [1]. For each transaction
record in this table, there is a retry count [2]. This retry count
increases every time SCM retries the delete transaction; once it
exceeds the maximum limit, SCM stops retrying so that an admin can analyze
why those blocks failed to delete.

Because the count is written to the DB on every retry, I want to
discuss whether it is worth an optimization: keep the retry count as an
in-memory state and only write to the DB once the retry count exceeds the
limit (so the final record is still left for later analysis). A rough
sketch of this idea is included below.

The motivation is that in SCM HA we replicate DB changes over Ratis, so
continuing to persist the retry count on every increment would cost
roughly 3x compared to now.

The drawback of only persisting the retry count at the limit is that if
SCM restarts, the in-memory counts are lost and counting starts over from
zero.
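To make the proposal concrete, here is a rough Java sketch of the
in-memory tracking. The TransactionStore interface and all names below
are hypothetical placeholders for illustration, not the actual SCM code
paths:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    /**
     * Sketch of the proposed in-memory retry tracking. All names here
     * are illustrative; in SCM the markFailed() write would still be
     * one DB update replicated over Ratis, but only once per failed
     * transaction instead of once per retry.
     */
    public class InMemoryRetryTracker {

      /** Hypothetical persistence hook for the single over-limit write. */
      public interface TransactionStore {
        void markFailed(long txId, int retryCount);
      }

      private final Map<Long, Integer> retryCounts = new ConcurrentHashMap<>();
      private final TransactionStore store;
      private final int maxRetry;

      public InMemoryRetryTracker(TransactionStore store, int maxRetry) {
        this.store = store;
        this.maxRetry = maxRetry;
      }

      /**
       * Called on each retry of a delete transaction. Returns true if
       * the transaction may be retried again, false once the limit is
       * exceeded. Intermediate increments never touch the DB, so they
       * incur no Ratis round trips.
       */
      public boolean onRetry(long txId) {
        int count = retryCounts.merge(txId, 1, Integer::sum);
        if (count > maxRetry) {
          store.markFailed(txId, count); // single DB write, kept for analysis
          retryCounts.remove(txId);
          return false;
        }
        return true;
      }

      /** Drop in-memory state once the transaction succeeds. */
      public void onSuccess(long txId) {
        retryCounts.remove(txId);
      }
    }

Note that after an SCM restart the map is empty, so counting starts over;
that is exactly the drawback described above.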


[1]:
https://github.com/apache/ozone/blob/master/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/metadata/SCMMetadataStore.java#L70
[2]:
https://github.com/apache/ozone/blob/master/hadoop-hdds/interface-server/src/main/proto/ScmServerDatanodeHeartbeatProtocol.proto#L331


Thanks,
Rui Wang
