Attendees: Yiyang, Xi (Pony), Hao, Minyu, Bin, Hualong, Yuanben, Jianghua,
Conway, Jeff, Wei-Chiu, Sammi

Cloudera:

   - Shared the initial design for Ozone append.
   - Presented the new website under development.
   - Suggested that the community pay attention to the "Ozone Storage
   Container Reconciliation" design and review it.

DiDi:

   - Suggested an improvement to the Recon UI: fold the pipeline info for a
   DN on the Datanode page. When a Datanode has more than 10 pipelines, the
   page currently displays all of them by default, which is unnecessary most
   of the time. @devmadhuu <https://github.com/devmadhuu>
   - The OM transaction ID crash bug was fixed in HDDS-9876
   <https://issues.apache.org/jira/browse/HDDS-9876>. A further step to help
   recover the OM DB would be a CLI tool to update the transaction ID in
   RocksDB; filed HDDS-10295
   <https://issues.apache.org/jira/browse/HDDS-10295>. A sketch of such a
   tool follows this list.
   - Tested follower-read performance. One observation about follower reads:
   the client always reads from the first SCM defined in the HA group. It
   would be better to start from a randomly selected SCM, so read requests
   are served evenly across SCM instances (see the second sketch after this
   list). @szetszwo <https://github.com/szetszwo>.
   - Reported a missing-block issue. Further investigation after the meeting
   showed it was caused by customized lock changes on the DN side; reverting
   those changes should fix the issue.
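
A minimal sketch, assuming the HDDS-10295 tool boils down to rewriting one
key in the OM RocksDB while OM is offline. The table name
"transactionInfoTable", the key "#TRANSACTIONINFO", and the value encoding
are placeholders; the real tool must match the schema OM actually persists.

    import org.rocksdb.*;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;

    public class UpdateTxnIdTool {
      public static void main(String[] args) throws RocksDBException {
        String dbPath = args[0];   // path to om.db; OM must be stopped
        byte[] newValue = args[1].getBytes(StandardCharsets.UTF_8);
        RocksDB.loadLibrary();
        // RocksDB requires listing every existing column family on open.
        List<byte[]> cfNames = RocksDB.listColumnFamilies(new Options(), dbPath);
        List<ColumnFamilyDescriptor> descriptors = new ArrayList<>();
        for (byte[] name : cfNames) {
          descriptors.add(new ColumnFamilyDescriptor(name));
        }
        List<ColumnFamilyHandle> handles = new ArrayList<>();
        try (DBOptions opts = new DBOptions();
             RocksDB db = RocksDB.open(opts, dbPath, descriptors, handles)) {
          for (int i = 0; i < handles.size(); i++) {
            String cf = new String(cfNames.get(i), StandardCharsets.UTF_8);
            if (cf.equals("transactionInfoTable")) {   // placeholder name
              db.put(handles.get(i),
                  "#TRANSACTIONINFO".getBytes(StandardCharsets.UTF_8),
                  newValue);
            }
          }
        }
      }
    }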
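
And an illustrative take on the follower-read observation: spread reads
across SCMs by shuffling the configured HA endpoints instead of always
starting with the first. The endpoint list here is a stand-in for the real
SCM HA configuration.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    public class ScmEndpointSelector {
      // Return the endpoints in a random order: try the first, fall back to
      // the rest in sequence.
      static List<String> readOrder(List<String> configured) {
        List<String> shuffled = new ArrayList<>(configured);
        Collections.shuffle(shuffled);
        return shuffled;
      }

      public static void main(String[] args) {
        List<String> scms = Arrays.asList("scm1:9860", "scm2:9860", "scm3:9860");
        System.out.println(readOrder(scms)); // e.g. [scm2:9860, scm3:9860, scm1:9860]
      }
    }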

Shopee:

   - Discussed HCFS APIs that Ozone does not support:
      - append, truncate
      - access time
      - EC policy change API
      - no rename of a directory that contains files open for write.
   - Discussed ways to share files between different users.
      - S3 has this capability via presigned URLs; Ozone should support it
      too (a sketch of the S3 capability follows this list).
      - Symbolic links are not supported in Ozone so far, and there is no
      plan to support them yet.
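
For reference, a minimal sketch of the presigned-URL capability being
discussed, using the AWS SDK for Java v1 against an S3 endpoint. The gateway
address and bucket/key names are made up, and whether Ozone's S3 gateway can
honor such URLs is exactly the open question from the meeting.

    import com.amazonaws.HttpMethod;
    import com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.model.GeneratePresignedUrlRequest;
    import java.net.URL;
    import java.util.Date;

    public class PresignExample {
      public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
            .withEndpointConfiguration(
                new EndpointConfiguration("http://s3g:9878", "us-east-1"))
            .withPathStyleAccessEnabled(true)
            .build();
        // Link valid for 15 minutes; the holder needs no credentials.
        Date expiry = new Date(System.currentTimeMillis() + 15 * 60 * 1000);
        URL url = s3.generatePresignedUrl(
            new GeneratePresignedUrlRequest("bucket1", "key1")
                .withMethod(HttpMethod.GET)
                .withExpiration(expiry));
        System.out.println(url);
      }
    }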

Qihoo:

   - Found a performance issue with the volume check: if a disk is not
   physically healthy, the volume check process, which writes and reads a
   small data file, can itself hang for around 1 hour, so the disk is not
   marked as failed in time. During that hour many read/write operations are
   still scheduled on the disk, making the whole DN's read/write latency
   extremely high. A solution has been found and a patch will be submitted
   later (see the first sketch after this list).
   - DELETED-state containers in memory: after a burst of data writes that
   are deleted shortly afterwards, info for many DELETED-state containers is
   held in memory. This data is never freed, nor is it removed from the SCM
   RocksDB table. Proposed a lazy way to delete these containers from memory
   and RocksDB (see the second sketch after this list); a simple design
   document will be shared with the community.
   - Block location cache (XceiverClientGrpc.getBlockDNcache): the cache is
   never cleared, so a client always fetches the same block from the same
   DN. This is a problem for long-running services like the S3 gateway,
   which usually hold a client for their entire lifetime, so some DNs become
   read hot spots (see the third sketch after this list).
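
A minimal sketch of one way to bound the volume check, assuming the fix runs
the disk probe in a worker thread and treats a timeout as a failure. The file
name and the timeout handling are illustrative, not the actual patch.

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Arrays;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;

    public class TimedVolumeCheck {
      private static final ExecutorService CHECKER =
          Executors.newSingleThreadExecutor();

      static boolean isHealthy(Path volumeDir, long timeoutSeconds) {
        Future<Boolean> probe = CHECKER.submit(() -> {
          Path f = volumeDir.resolve("disk.check");
          byte[] data = "probe".getBytes();
          Files.write(f, data);                   // may hang on a bad disk
          byte[] back = Files.readAllBytes(f);
          Files.deleteIfExists(f);
          return Arrays.equals(data, back);
        });
        try {
          return probe.get(timeoutSeconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
          probe.cancel(true);  // give up promptly instead of waiting ~1 hour
          return false;        // caller can mark the volume as failed
        } catch (InterruptedException | ExecutionException e) {
          return false;
        }
      }
    }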
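
A rough sketch of the lazy-cleanup idea for DELETED containers, pending the
design document; all names here are placeholders.

    import java.util.Iterator;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class DeletedContainerSweeper {
      enum State { OPEN, CLOSED, DELETED }
      record ContainerInfo(long id, State state) {}

      // Thin placeholder for the SCM RocksDB container table.
      interface ContainerStore { void delete(long containerId); }

      private final Map<Long, ContainerInfo> containers = new ConcurrentHashMap<>();

      // Run periodically: drop DELETED containers from RocksDB first, then
      // free the in-memory entry.
      void sweep(ContainerStore store) {
        Iterator<Map.Entry<Long, ContainerInfo>> it =
            containers.entrySet().iterator();
        while (it.hasNext()) {
          Map.Entry<Long, ContainerInfo> e = it.next();
          if (e.getValue().state() == State.DELETED) {
            store.delete(e.getKey());
            it.remove();
          }
        }
      }
    }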
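
And an illustrative alternative to the unbounded block-to-DN cache: a Guava
cache with size and TTL bounds, so a long-lived client periodically
re-resolves locations instead of pinning one DN forever. Types are simplified
stand-ins for the real ones around XceiverClientGrpc.

    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;
    import java.util.concurrent.TimeUnit;

    public class BlockLocationCache {
      private final Cache<String, String> blockToDn = CacheBuilder.newBuilder()
          .maximumSize(10_000)                    // bound memory use
          .expireAfterWrite(10, TimeUnit.MINUTES) // re-pick a DN periodically
          .build();

      String dnFor(String blockId) {
        String dn = blockToDn.getIfPresent(blockId);
        if (dn == null) {
          dn = resolveReplica(blockId);
          blockToDn.put(blockId, dn);
        }
        return dn;
      }

      private String resolveReplica(String blockId) {
        // Placeholder: in practice, pick among the block's replicas,
        // e.g. at random, to spread read load.
        return "dn-" + Math.floorMod(blockId.hashCode(), 3);
      }
    }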
