If I start to use all the available space the pool offers (4.5T) and the first OSD (2.7T) fails, I'm sure I'll end up with lost data, since it's not possible to fit 4.5T on the 2 remaining drives, whose total raw capacity is 3.6T.

I'm wondering why Ceph isn't complaining now. I thought it would place data among the disks in such a way that losing any OSD would keep the data safe for read-only access (by wasting the excess 0.9T of capacity on the first drive).
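Here's a rough sketch of the capacity math I have in mind. The individual drive sizes are my assumption (the thread only gives 2.7T for the first OSD and 3.6T for the other two combined):

```python
# Assumed layout: one 2.7T OSD plus two 1.8T OSDs (a guess consistent
# with the 2.7T / 3.6T figures above).
K, M = 2, 1          # erasure-code profile: 2 data + 1 coding chunk
osds = [2.7, 1.8, 1.8]

# With failure-domain=osd and exactly k+m=3 OSDs, every PG places one
# chunk on each OSD, so the pool fills up when the smallest OSD is full.
usable = min(osds) * K            # data capacity with balanced chunks
raw_needed = 4.5 * (K + M) / K    # raw space 4.5T of data would occupy

print(f"usable with balanced placement: {usable}T")     # 3.6T
print(f"raw footprint of 4.5T of data:  {raw_needed}T")  # 6.75T
```

So under these assumptions a balanced k=2/m=1 layout tops out well below the 4.5T the pool advertises, which is what puzzles me.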


Oh, and here are my rule and profile - by mistake I sent them in a PM:


rule ceph3_ec_low_k2_m1-data {
    id 2
    type erasure
    min_size 3
    max_size 3
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default class low_hdd
    step choose indep 0 type osd
    step emit
}

crush-device-class=low_hdd
crush-failure-domain=osd
crush-root=default
jerasure-per-chunk-alignment=false
k=2
m=1
plugin=jerasure
technique=reed_sol_van
w=8
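For reference, a profile and pool like the ones above would typically be created with something like the following (untested sketch; the names are taken from my rule above, and I've left out optional pg_num arguments):

```shell
# Create the EC profile matching the dump above
ceph osd erasure-code-profile set ceph3_ec_low_k2_m1 \
    k=2 m=1 \
    plugin=jerasure technique=reed_sol_van \
    crush-failure-domain=osd crush-device-class=low_hdd

# Create an erasure-coded pool using that profile
ceph osd pool create ceph3_ec_low_k2_m1-data erasure ceph3_ec_low_k2_m1
```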


Paweł


W dniu 8.11.2022 o 15:47, Danny Webb pisze:
With an m value of 1, if you lose a single OSD/failure domain you'd end up with a
read-only PG or cluster. Usually you need at least k+1 to survive a
failure-domain failure, depending on your min_size setting. The other thing you
need to take into consideration is that the m value covers both failure domains
*and* OSDs in an unlucky scenario (e.g. a PG that happened to be on a downed
host plus a failed OSD elsewhere in the cluster). For a 3-OSD configuration the
minimum fault-tolerant setup would be k=1, m=2, and then you're effectively
doing replica 3 anyway. At least this is my understanding of it. Hope that
helps
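The min_size arithmetic above can be sketched in a few lines (the k/m values are the ones from this thread; the k+1 rule of thumb is the usual Ceph recommendation):

```python
K, M = 2, 1
min_size = K + 1        # common recommendation: k+1
acting = K + M - 1      # chunks still present after losing one OSD

# I/O is allowed only while the acting set is at least min_size...
io_allowed = acting >= min_size      # False: PGs go inactive
# ...but the data itself survives as long as k chunks remain.
data_intact = acting >= K            # True: recoverable, no data loss

print(io_allowed, data_intact)       # False True
```

Which matches the point above: with m=1 a single OSD loss doesn't destroy data, but it does take the PGs below min_size and blocks I/O.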
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io