Hi,

I have 5 data nodes (bluestore, kraken), each with 24 OSDs.
I enabled the optimal crush tunables.
I'd like to start "really" using EC pools, but until now I've run into cluster 
lockups when using 3+2 EC pools with a host failure domain, for instance when a 
host was down ;)

Since I'd like erasure coding to be more than a "nice to have once you have 12+ 
ceph data nodes" feature, I wanted to try this:


- use a 14+6 EC rule
- and for each PG:
  - select 4 hosts
  - on each of these hosts, select 5 OSDs

In order to do that, I created this rule in the crush map:

rule 4hosts_20shards {
        ruleset 3
        type erasure
        min_size 20
        max_size 20
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        step choose indep 4 type host
        step chooseleaf indep 5 type osd
        step emit
}
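
(In case it's useful: a rule like this can be sanity-checked offline with 
crushtool before injecting it; the file names below are just placeholders.)

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
# add the rule to crush.txt, then recompile and simulate 20-shard placements
crushtool -c crush.txt -o crush.new
crushtool -i crush.new --test --rule 3 --num-rep 20 --show-mappings
ceph osd setcrushmap -i crush.new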

I then created an EC pool with this erasure profile:
ceph osd erasure-code-profile set erasurep14_6_osd ruleset-failure-domain=osd k=14 m=6
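
The pool itself was then created with that profile and the custom rule, along 
these lines (pool name and PG counts are just placeholders; the rule can also 
be attached afterwards with the crush_ruleset pool setting):

ceph osd pool create ecpool14_6 1024 1024 erasure erasurep14_6_osd 4hosts_20shards
# or, to attach the rule after the fact (pre-luminous naming):
ceph osd pool set ecpool14_6 crush_ruleset 3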

I hoped this would allow losing one host completely without locking up the 
cluster, and I have the impression this is working...
But. There's always a but ;)

I tried to take all the OSDs on one node down by stopping their ceph-osd daemons, 
and according to ceph the cluster is unhealthy.
ceph health detail gives me, for instance, this (for the 3+2 and 14+6 pools):

pg 5.18b is active+undersized+degraded, acting [57,47,2147483647,23,133]
pg 9.186 is active+undersized+degraded, acting [2147483647,2147483647,2147483647,2147483647,2147483647,133,142,125,131,137,50,48,55,65,52,16,13,18,22,3]

My question therefore is: why aren't the down PGs remapped onto my 5th data 
node, since I made sure the 20 EC shards are spread over only 4 hosts?
I thought/hoped that because the OSDs were down, the data would be rebuilt onto 
another OSD/host.
I can understand that the 3+2 EC pool cannot allocate OSDs on another host, 
since 3+2 = 5 hosts already, but I don't understand why the 14+6 EC PGs do not 
rebuild somewhere else.

I do not find anything worthwhile in a "ceph pg query": the up and acting sets 
are equal and do contain the 2147483647 value (which means "none", as far as I 
understood).
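
(For reference, a quick and dirty way to list the PGs carrying these empty 
shards; 2147483647 is simply 2^31 - 1, the placeholder CRUSH uses when it 
cannot map a shard to any OSD:)

ceph pg dump pgs_brief 2>/dev/null | grep 2147483647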

I've also tried to "ceph osd out" all the OSDs from one host: in that case, the 
3+2 EC PGs behave as before, but the 14+6 EC PGs seem happy, despite the fact 
that they still list the out OSDs as up and acting.
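
(Something like the loop below does that; the host name is a placeholder, and 
if "ceph osd ls-tree" isn't available on kraken, the OSD ids can be read from 
"ceph osd tree" instead.)

for id in $(ceph osd ls-tree node5); do ceph osd out $id; done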
Is my crush rule that wrong?
Is it possible to do what I want?

Thanks for any hints...

Regards
Frederic
