[ 
https://issues.apache.org/jira/browse/HDDS-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923476#comment-17923476
 ] 

Ethan Rose commented on HDDS-11643:
-----------------------------------

Hi [~izlenko], I think the repro would be different from what you described. 
The original issue in HDDS-8508 is that with 
{{ozone.scm.ha.ratis.server.snapshot.creation.gap}} set high on an inactive 
cluster, snapshot flushes happen very infrequently. Setting this to a low 
value as you described would mitigate that issue, but this Jira is about a 
different solution added in HDDS-8508: flushing transactions based on time, 
regardless of cluster activity. The interval is controlled by the config 
{{ozone.scm.ha.dbtransactionbuffer.flush.interval}} and defaults to 10 minutes.

To repro the issue described here more easily, you can use a more extreme 
config combination: leave {{ozone.scm.ha.ratis.server.snapshot.creation.gap}} 
at a fairly high value, maybe 100k, so that flushes are not triggered by the 
number of transactions. Then set 
{{ozone.scm.ha.dbtransactionbuffer.flush.interval}} pretty low, maybe 10 
seconds. We would expect block deletions to get flushed within 10 seconds 
on an inactive cluster, but based on the original report they will not be.
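
As a rough sketch, that combination might look like this in 
{{ozone-site.xml}} on the SCM nodes. The file placement and the exact value 
formats are assumptions: in particular, this assumes the flush interval 
accepts a duration with a unit suffix such as {{10s}}, so check the property 
descriptions in your Ozone version before relying on it.

{code:xml}
<!-- Sketch only: values are illustrative, taken from the suggestion above. -->
<property>
  <!-- High enough that flushes are not triggered by transaction count. -->
  <name>ozone.scm.ha.ratis.server.snapshot.creation.gap</name>
  <value>100000</value>
</property>
<property>
  <!-- Assumed duration format; the default is 10 minutes. -->
  <name>ozone.scm.ha.dbtransactionbuffer.flush.interval</name>
  <value>10s</value>
</property>
{code}

With these values, any flush seen on an idle cluster should be attributable 
to the time-based path and not to the transaction count.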

> Investigate time based SCM snapshot/transaction flush not working
> -----------------------------------------------------------------
>
>                 Key: HDDS-11643
>                 URL: https://issues.apache.org/jira/browse/HDDS-11643
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM
>            Reporter: Ethan Rose
>            Priority: Major
>
> Per [this GitHub 
> discussion|https://github.com/apache/ozone/discussions/7239#discussioncomment-11027931],
>  it seems that the time based snapshot implemented in HDDS-8508 may not be 
> working correctly. The observed behavior is that block deletes in SCM do not 
> progress until {{ozone admin containerbalancer start}} is called. This 
> coincidentally invokes a [manual buffer 
> flush|https://github.com/apache/ozone/blob/bd8bb39468ed1693b94b75ef85db4710cd693618/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/StatefulServiceStateManagerImpl.java#L69],
>  and deletes from SCM resume only after that. It seems 
> that the time-based flush is not taking effect automatically.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
