[ https://issues.apache.org/jira/browse/HDDS-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923476#comment-17923476 ]
Ethan Rose commented on HDDS-11643:
-----------------------------------

Hi [~izlenko], I think the repro would be different from what you described. The original issue from HDDS-8508 is that with {{ozone.scm.ha.ratis.server.snapshot.creation.gap}} set high on an inactive cluster, snapshot flushes happen very infrequently. Setting this to a low value as you described would mitigate the issue, but this Jira concerns a different solution added in HDDS-8508: flushing transactions based on time, regardless of cluster activity. The interval is controlled by the config {{ozone.scm.ha.dbtransactionbuffer.flush.interval}} and defaults to 10 minutes.

To repro the issue described here more easily, you can use a more extreme config combination: leave {{ozone.scm.ha.ratis.server.snapshot.creation.gap}} at a fairly high value, maybe 100k, to make sure flushes are not triggered by the number of transactions. Then set {{ozone.scm.ha.dbtransactionbuffer.flush.interval}} pretty low, maybe 10 seconds. We would expect block deletions to be flushed within 10 seconds on an inactive cluster, but based on the original report this will not be observed.

> Investigate time based SCM snapshot/transaction flush not working
> -----------------------------------------------------------------
>
>                 Key: HDDS-11643
>                 URL: https://issues.apache.org/jira/browse/HDDS-11643
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM
>            Reporter: Ethan Rose
>            Priority: Major
>
> Per [this GitHub discussion|https://github.com/apache/ozone/discussions/7239#discussioncomment-11027931],
> it seems that the time based snapshot implemented in HDDS-8508 may not be
> working correctly. The observed behavior is that block deletes in SCM do not
> progress until {{ozone admin containerbalancer start}} is called. This
> coincidentally invokes a [manual buffer flush|https://github.com/apache/ozone/blob/bd8bb39468ed1693b94b75ef85db4710cd693618/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/StatefulServiceStateManagerImpl.java#L69],
> and only after this do deletes resume from SCM. It seems that the time based
> flush is not taking effect automatically.
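
For reference, the repro configuration suggested in the comment above might look roughly like this in {{ozone-site.xml}} on the SCM nodes. This is only a sketch: the two property names come from the comment, while the exact values ({{100000}} for "100k" and the {{10s}} duration format for "10 seconds") are assumptions and should be checked against the {{ozone-default.xml}} of the version under test.

```xml
<!-- Sketch of the suggested repro settings; values are illustrative assumptions. -->
<configuration>
  <!-- Keep the transaction-count trigger high so flushes are not
       caused by the number of transactions on an inactive cluster. -->
  <property>
    <name>ozone.scm.ha.ratis.server.snapshot.creation.gap</name>
    <value>100000</value>
  </property>
  <!-- Lower the time-based flush interval from its 10 minute default.
       The "10s" suffix form is assumed, not confirmed in this thread. -->
  <property>
    <name>ozone.scm.ha.dbtransactionbuffer.flush.interval</name>
    <value>10s</value>
  </property>
</configuration>
```

With these settings on an otherwise idle cluster, the expectation from the comment is that block deletions are flushed within about 10 seconds. If they only progress after a manual flush, that would match the behavior described in the original report.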