Ozone Community Meeting (APAC, 2024 Jan 19th)

Sammi Chen Fri, 19 Jan 2024 05:09:54 -0800

Attendees: Guohao, Minyu, Yiyang, Xi, Hongbin, Jianghua, Hualong, Yuanben,
Kangchen, Sammi


1. Shopee
    -  The 1.4.0 RC1 vote has passed. Yiyang is handling the post-vote release
steps.

2. DiDi
     -  Hit a DN data read performance issue after a large number of files
were deleted. The slowdown is caused by lock contention between data reads
and block deletion: jstack shows that all read threads are waiting to
acquire the container read lock. Filed
https://issues.apache.org/jira/browse/HDDS-10146 to continue the
investigation and find the right solution.
        HDDS-9107 offers one way to alleviate the lock contention. It
introduces the property
"hdds.datanode.block.deleting.max.lock.holding.time", which limits how long
block deletion holds the container lock in one iteration.
     -  DN decommission becomes slow after most of the containers on the DN
have been replicated. It is not yet clear whether the decommission is stuck.
The investigation is ongoing. It was suggested to check the two umbrella
JIRAs below for related fixes that could be applied.
        HDDS-8699 Further Replication Manager Improvements
        HDDS-7759 Improve Ozone Replication Manager
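To illustrate the idea behind the HDDS-9107 property mentioned above, here is a minimal sketch of bounded lock holding, assuming a deletion task that works through a queue of pending blocks. It is not Ozone's implementation: the function names, the plain threading.Lock standing in for the container lock, and the seconds-based budget are all hypothetical. Each iteration deletes blocks while holding the lock until the time budget is used up (always deleting at least one block so deletion makes progress), then releases the lock so waiting readers can acquire it before the next iteration.

```python
import threading
import time
from collections import deque
from typing import Callable, Deque, List


def delete_blocks_bounded(
    container_lock: threading.Lock,
    pending: Deque[str],
    delete_block: Callable[[str], None],
    max_hold_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Hypothetical sketch: delete pending blocks while holding the lock for
    at most roughly max_hold_seconds, then release it.

    Always deletes at least one block per call so that deletion makes
    progress even with a very small budget. Returns the number of blocks
    deleted in this iteration.
    """
    deleted = 0
    with container_lock:
        start = clock()
        while pending:
            # Stop once the time budget is spent (after at least one block).
            if deleted > 0 and clock() - start >= max_hold_seconds:
                break
            delete_block(pending.popleft())
            deleted += 1
    # The lock is released here; readers blocked on it can now proceed.
    return deleted


def run_deletion(
    container_lock: threading.Lock,
    pending: Deque[str],
    delete_block: Callable[[str], None],
    max_hold_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> List[int]:
    """Run bounded iterations until no pending blocks remain.

    Returns how many blocks each iteration deleted.
    """
    per_iteration: List[int] = []
    while pending:
        per_iteration.append(
            delete_blocks_bounded(
                container_lock, pending, delete_block, max_hold_seconds, clock
            )
        )
    return per_iteration
```

A smaller budget shortens each lock hold, so reads wait less, at the cost of more lock acquire/release cycles for the deletion task.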

3. Qihoo
     -  Found a container balancer performance bottleneck: when balancing
a DN with 200K containers, selecting a single candidate container to move
takes around 2 to 3 hours. The most time-consuming part is
ContainerBalancerSelectionCriteria#getCandidateContainers. Yiyang has a
solution for this and will submit a PR later.
     -  Working on a gRPC zero-copy improvement for container replication;
will submit a PR later.
