Hello,

We have a Ceph cluster (version 12.2.4, Luminous) with 10 hosts and 21
OSDs per host.


An EC pool was created with the following commands:


ceph osd erasure-code-profile set profile_jerasure_4_3_reed_sol_van \
  plugin=jerasure \
  k=4 \
  m=3 \
  technique=reed_sol_van \
  packetsize=2048 \
  crush-device-class=hdd \
  crush-failure-domain=host


ceph osd pool create pool_jerasure_4_3_reed_sol_van 2048 2048 erasure \
  profile_jerasure_4_3_reed_sol_van
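
For reference, the profile and the resulting pool parameters can be
inspected with the standard ceph subcommands below (a sketch; it assumes
the pool was created as shown above, and the expected values in the
comments follow from k=4, m=3):

```shell
# Show the erasure-code profile as stored by the cluster
ceph osd erasure-code-profile get profile_jerasure_4_3_reed_sol_van

# Show the pool's replication parameters derived from the profile
ceph osd pool get pool_jerasure_4_3_reed_sol_van size      # expect 7 (k+m)
ceph osd pool get pool_jerasure_4_3_reed_sol_van min_size  # expect 5 (k+1)
```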



Here are my questions:

   1. The EC pool was created with k=4, m=3, and crush-failure-domain=host,
   so we disabled the network interfaces of some hosts (using the "ifdown"
   command) to verify the pool's fault tolerance while running 'rados
   bench'.
   However, the IO rate drops to 0 immediately when a single host goes
   offline, and it takes a long time (~100 seconds) for the IO rate to
   return to normal.
   As far as I know, the default min_size for this pool is k+1 = 5, which
   means the pool should keep serving IO even with two hosts offline.
   Is there something wrong with my understanding?
   2. According to our observations, the IO rate seems to return to normal
   only once Ceph has detected (marked down) all of the OSDs on the failed
   host.
   Is there any way to reduce the time Ceph needs to detect all of the
   failed OSDs?
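
For context, the benchmark was run roughly as follows (a sketch; the
duration, block size, and --no-cleanup flag here are illustrative, not
necessarily the exact parameters we used):

```shell
# Run a sustained write benchmark against the EC pool; while it runs,
# take a host offline with "ifdown" on that host and watch the IO rate.
rados bench -p pool_jerasure_4_3_reed_sol_van 300 write --no-cleanup
```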



Thanks for any help.


Best regards,

Majia Xiao
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io