July 28th, 2023

Attendees: Engineers from Shopee, DiDi, Qihoo 360 and Cloudera


Agenda:

   1. Cloudera
      1. Give an update of community status and features' progress, such as
         snapshot, quota, EC, block token, etc.

   2. Shopee
      1. Previously saw a lot of the following failure during SCM leader
         transfer:

         Inconsistent read for blockID=conID: 1 locID: 111677748019200007 bcsId: 0
         length=2 position=1 numBytesToRead=1 numBytesRead=-1

         This is root-caused and resolved by HDDS-8973
         <https://issues.apache.org/jira/browse/HDDS-8973>.

   3. 1.4.0 release progress discussion.

   4. DiDi
      1. A non-production cluster was recently upgraded to the community
         master branch. After upgrading, it took around 30 minutes to boot up;
         it was doing the bucket usage calculation over ~200 million keys.
         DiDi would like to know if there is any way to reduce this time
         before they next upgrade their Ozone production cluster.
      2. Found a memory leak issue in the Ozone S3 Gateway (s3g) service, the
         same problem as RATIS-1705
         <https://issues.apache.org/jira/browse/RATIS-1705>.
      3. DiDi's HDFS cluster is adding 1 PB high-density storage DataNodes.
         Ozone can learn from this how to support high-density nodes in the
         future.
      4. DN decommission. A DN with 50 GB of data has been decommissioning for
         more than 3 days, and the decommission is still going on. Two
         suggestions were shared (see the example configuration after the
         notes):
         1. Check the SCM log to find whether there are any unhealthy
            containers.
         2. Increase "hdds.datanode.replication.streams.limit".

   5. Qihoo 360
      1. Developing one big file to store multiple small files for an EC type
         container.
      2. Recently synced the internal branch with the Ozone master branch, and
         found some complexity during the code merge. The two major points are:
         1. The concrete class KeyValueContainer is used instead of the
            Container interface in many places. As a new Container
            implementation type is introduced internally, it takes some time
            to complete the code merge (see the interface sketch after the
            notes).
         2. Internally, Apache Cassandra is used to replace RocksDB. The
            replacement is mainly based on the Table interface. With the new
            snapshot feature, the coupling between the upper-level logic and
            RocksDB is tighter, so this part of the code merge is still going
            on (see the backend sketch after the notes).
      3. Plan to submit some refactoring patches later for the above two
         related modules.
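
Regarding DiDi's DN decommission item: a minimal example of the second
suggestion, assuming the property is set in ozone-site.xml on the DataNodes;
the value 20 is only an illustration, not a recommended setting.

  <property>
    <name>hdds.datanode.replication.streams.limit</name>
    <value>20</value>
  </property>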
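
Regarding Qihoo 360's first code-merge point: a minimal sketch of handlers
depending on a Container abstraction instead of the concrete KeyValueContainer,
so a new container type can be merged without touching every call site. The
interface, class, and method names below are simplified illustrations, not
Ozone's actual API.

  import java.io.IOException;

  // Simplified stand-in for a container abstraction (illustrative only).
  interface Container {
    long getContainerId();
    void writeChunk(long localId, byte[] data) throws IOException;
  }

  // Existing container type.
  class KeyValueContainer implements Container {
    private final long id;
    KeyValueContainer(long id) { this.id = id; }
    public long getContainerId() { return id; }
    public void writeChunk(long localId, byte[] data) { /* write to chunk files */ }
  }

  // Hypothetical new container type, e.g. one big file packing small files.
  class PackedFileContainer implements Container {
    private final long id;
    PackedFileContainer(long id) { this.id = id; }
    public long getContainerId() { return id; }
    public void writeChunk(long localId, byte[] data) { /* append into the big file */ }
  }

  class ContainerHandler {
    // Coded against the interface, so both container types are handled the
    // same way; this is what keeps merging a new container type cheap.
    void handleWrite(Container container, long localId, byte[] data) throws IOException {
      container.writeChunk(localId, data);
    }
  }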
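
Regarding Qihoo 360's second code-merge point: a minimal sketch of keeping
upper-level logic behind a Table-like key-value contract so the backend can be
swapped, e.g. RocksDB for Cassandra. The interface below is a simplified
illustration in the spirit of a Table abstraction, not Ozone's exact API; the
in-memory class stands in for an alternative backend.

  import java.io.IOException;
  import java.util.Map;
  import java.util.concurrent.ConcurrentHashMap;

  // Simplified key-value table contract (illustrative only).
  interface Table<K, V> {
    void put(K key, V value) throws IOException;
    V get(K key) throws IOException;
    void delete(K key) throws IOException;
  }

  // Stand-in backend; a RocksDB- or Cassandra-backed class would implement the
  // same contract, so callers coded against Table would not need to change.
  class InMemoryTable<K, V> implements Table<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();
    public void put(K key, V value) { store.put(key, value); }
    public V get(K key) { return store.get(key); }
    public void delete(K key) { store.remove(key); }
  }

The part that remains coupled, per the discussion, is the snapshot feature's
tighter dependency on RocksDB itself, which this kind of abstraction does not
cover.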
