Attendees: Guo Hao, Bing Hong, George Huang, Xi Chen, Guangbao Zhao, Jackson
Yao, Hualong Zhang, Minyu Liu, Yuanben Wang, Sammi Chen

Agenda:

   1. George raised a question about the 1.3.1 release. After discussion,
   we reached consensus that it is better to put all our resources into
   1.4.0 to speed it up. Most partners plan to upgrade to 1.4.0 once it is
   released. George will send a notice to the community dev mailing list
   later announcing that the 1.3.1 release is cancelled.
   2. Bing raised a question about block deletion. They deleted a lot of
   files recently and found that block deletion was still in progress
   several days later. Tuning the block-deletion-related properties of OM,
   SCM and DN did not help. It was eventually found that there were too many
   closeContainerCommand entries in the queue and the thread was busy
   handling them, so the block deletion commands waited a long time to be
   executed. For details, please refer to
   https://docs.google.com/document/d/1g1h-63fvA-be-clvyVRAHLWehoadCnjNRWyX8Bp-UIU/edit#heading=h.358h6n518zuj
   3. Xi shared their block deletion tuning experience. After some tuning,
   1K block deletions per second can be achieved on one DN. Based on real
   environment cases, they have proposed the following improvements to the
   community:


   - HDDS-8888. Consider Datanode queue capacity when sending DeleteBlocks
   command #4939 <https://github.com/apache/ozone/pull/4939>
   - HDDS-8869. Make DN DeleteBlocksCommandHandler wait for the lock can
   timeout. #4913 <https://github.com/apache/ozone/pull/4913>
   - HDDS-8690. Ozone Support deletion related parameter dynamic
   configuration #4798 <https://github.com/apache/ozone/pull/4798>
   - HDDS-8882. Add status management of SCM's DeleteBlocksCommand to avoid
   sending duplicate delete transactions to the DN #4988
   <https://github.com/apache/ozone/pull/4988>

   Review help is needed for the above patches.
   4. Xi also mentioned that when the SCM RocksDB had 2 billion keys in the
   delete transaction table, it took nearly 60-70s for the RocksDB iterator
   to seek to the first entry.
   5. Hao found that their SCM failed to start up because of a gap in the
   Raft log. It is very hard for SCM to recover from that state without
   losing SCM metadata. He raised the ticket RATIS-1887.
