Attendees: Guo Hao, Bing Hong, George Huang, Xi Chen, Guangbao Zhao, Jackson Yao, Hualong Zhang, Minyu Liu, Yuanben Wang, Sammi Chen
Agenda:
1. George raised the question about the 1.3.1 release. After discussion, we reached a consensus that it is better to put all our resources into 1.4.0 to speed it up; once 1.4.0 is released, most partners plan to upgrade to it. George will send a notice to the community dev mailing list later announcing that we are going to cancel the 1.3.1 release.
2. Bing raised a question about block deletion. They deleted a lot of files recently and found that block deletion was still going on after several days. Tuning the block-deletion-related properties of OM, SCM, and DN did not help. It was finally found that there were too many closeContainerCommands in the queue, and because the handler thread was busy with those commands, the block deletion commands waited a long time to get executed (see the first sketch after this list). Please refer to https://docs.google.com/document/d/1g1h-63fvA-be-clvyVRAHLWehoadCnjNRWyX8Bp-UIU/edit#heading=h.358h6n518zuj for details.
3. Xi shared their block deletion tuning experience: after some tuning, deletion of 1K blocks per second can be achieved on one DN. Based on these real environment cases, improvements were proposed to the community (see the lock-timeout sketch after this list):
   - HDDS-8888. Consider Datanode queue capacity when sending DeleteBlocks command #4939 <https://github.com/apache/ozone/pull/4939>
   - HDDS-8869. Make DN DeleteBlocksCommandHandler wait for the lock can timeout. #4913 <https://github.com/apache/ozone/pull/4913>
   - HDDS-8690. Ozone Support deletion related parameter dynamic configuration #4798 <https://github.com/apache/ozone/pull/4798>
   - HDDS-8882. Add status management of SCM's DeleteBlocksCommand to avoid sending duplicate delete transactions to the DN #4988 <https://github.com/apache/ozone/pull/4988>
   Review help is needed for the above patches.
4. Xi also mentioned that when the SCM RocksDB had 2 billion keys in the delete transaction table, it took nearly 60-70 seconds for the RocksDB iterator to seek to the first key (see the timing sketch after this list).
5. Hao found that their SCM failed to start up because there was a Raft log gap, and it is very hard for SCM to recover from that state without any SCM metadata loss. He raised RATIS-1887.
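
For item 2, a minimal sketch of the head-of-line blocking that was observed: a single handler thread drains one FIFO queue, so a backlog of closeContainerCommands delays any deleteBlocks command queued behind it. The class and enum names here are illustrative only, not Ozone's actual datanode code.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative only: one handler thread, one FIFO queue, so a flood of
// close-container commands starves the deletion command behind them.
public class CommandQueueStarvation {
  enum Cmd { CLOSE_CONTAINER, DELETE_BLOCKS }

  public static void main(String[] args) {
    Queue<Cmd> queue = new ArrayDeque<>();
    for (int i = 0; i < 1_000_000; i++) {
      queue.add(Cmd.CLOSE_CONTAINER);   // large backlog of close commands
    }
    queue.add(Cmd.DELETE_BLOCKS);       // the deletion arrives last

    int handledBefore = 0;
    while (queue.poll() != Cmd.DELETE_BLOCKS) {
      handledBefore++;                  // each real command costs handler time
    }
    System.out.println("deleteBlocks ran only after " + handledBefore
        + " closeContainer commands were handled first");
  }
}
```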
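For item 3, a generic sketch of the bounded-wait locking pattern that HDDS-8869 describes ("wait for the lock can timeout"): instead of blocking indefinitely on a contended lock, the handler gives up after a timeout and lets the command be retried later. This is not Ozone's actual DeleteBlocksCommandHandler, just the pattern using the standard java.util.concurrent API.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of a delete handler that waits for a container lock with a
// timeout rather than blocking forever behind other commands.
public class BoundedLockWait {
  private final ReentrantLock containerLock = new ReentrantLock();

  boolean tryDeleteBlocks(long timeoutMillis) throws InterruptedException {
    if (!containerLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
      return false; // lock is busy; skip for now and retry on the next run
    }
    try {
      // ... apply the delete transactions while holding the lock ...
      return true;
    } finally {
      containerLock.unlock();
    }
  }
}
```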
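For item 4, a minimal sketch of the access pattern whose cost Xi reported: timing the first seek of a RocksDB iterator. With around 2 billion keys in the table, seekToFirst() can take tens of seconds. The DB path is a placeholder, and a plain single-column-family database is assumed for simplicity; SCM's actual DB layout uses column families, so this is illustrative only.

```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.RocksIterator;

// Times the initial seek of an iterator over a large RocksDB instance.
public class SeekFirstTiming {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (Options opts = new Options().setCreateIfMissing(false);
         RocksDB db = RocksDB.open(opts, "/path/to/db");   // placeholder path
         RocksIterator it = db.newIterator()) {
      long start = System.nanoTime();
      it.seekToFirst();                                    // the slow step
      System.out.printf("seekToFirst took %.1f s (valid=%b)%n",
          (System.nanoTime() - start) / 1e9, it.isValid());
    }
  }
}
```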