Re: [ceph-users] will crush rule be used during object relocation in OSD failure ?

Maged Mokhtar Sat, 24 Nov 2018 05:02:02 -0800


On 23/11/18 18:00, ST Wong (ITSC) wrote:

Hi all,


We have 8 OSD hosts: 4 in room1 and 4 in room2.
We created a pool with size = 3 using the following CRUSH rule, to cater for room failure.
rule multiroom {
        id 0
        type replicated
        min_size 2
        max_size 4
        step take default
        step choose firstn 2 type room
        step chooseleaf firstn 2 type host
        step emit
}
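A toy simulation gives a rough picture of what this rule does to a size=3 pool. It is a simplified model, not Ceph's real CRUSH code. It assumes one OSD per host and equal weights, and it uses a SHA-256 draw as a stand-in for straw2 hashing. The host names and helper functions are hypothetical.

```python
import hashlib

# Hypothetical topology from the post: 2 rooms x 4 hosts, one OSD per host assumed.
TOPOLOGY = {
    "room1": ["host1", "host2", "host3", "host4"],
    "room2": ["host5", "host6", "host7", "host8"],
}


def draw(pg_id: int, item: str) -> int:
    """Deterministic pseudo-random draw per (PG, bucket item); a stand-in for straw2."""
    digest = hashlib.sha256(f"{pg_id}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def ranked(pg_id: int, items) -> list:
    """Order items by their draw, highest first (equal weights assumed)."""
    return sorted(items, key=lambda item: draw(pg_id, item), reverse=True)


def place(pg_id: int, size: int = 3, out_hosts=frozenset()) -> list:
    """Toy version of: choose firstn 2 type room; chooseleaf firstn 2 type host; emit."""
    emitted = []
    for room in ranked(pg_id, TOPOLOGY)[:2]:  # step choose firstn 2 type room
        alive = [h for h in TOPOLOGY[room] if h not in out_hosts]
        # step chooseleaf firstn 2 type host (hosts marked out are skipped)
        emitted += [(room, h) for h in ranked(pg_id, alive)[:2]]
    # The rule emits up to 4 OSDs; a size=3 pool keeps only the first 3,
    # so the first-drawn room gets 2 copies and the other room gets 1.
    return emitted[:size]


for pg in range(4):
    print(pg, place(pg))
```

Because the room order is drawn per PG, which room ends up with 2 copies varies from PG to PG, which matches point 1 below.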



We're expecting:
1. For each object, there are always 2 replicas in one room and 1 replica in the other room, making size=3. But we can't control which room has 1 or 2 replicas.
2. In case an OSD host fails, Ceph will assign remaining OSDs to the same PG to hold the replicas that were on the failed host. Selection is based on the CRUSH rule of the pool, thus maintaining the same failure domain; it won't put all replicas in the same room.
3. In case the entire room with 1 replica fails, the pool will remain degraded but won't do any replica relocation.
4. In case the entire room with 2 replicas fails, Ceph will make use of OSDs in the surviving room, making 2 replicas. The pool will not be writeable until all objects have 2 copies (unless we make the pool size=4?). Then, when recovery is complete, the pool will remain in a degraded state until the failed room recovers.
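A small per-PG model can check the room-failure cases (3 and 4). It is only a sketch under stated assumptions. The failed room's OSDs are marked out, and the rule can put at most 2 copies in the one surviving room. A PG keeps serving I/O only while it has at least min_size copies. `room_failure` is a hypothetical helper, not a Ceph API.

```python
def room_failure(acting_rooms, failed_room, size=3, min_size=2, max_per_room=2):
    """Toy model of one PG losing a whole room.

    Assumes the failed room's OSDs are marked out and that the rule
    (chooseleaf firstn 2 type host) can place at most `max_per_room`
    copies in the single surviving room.
    Returns (copies left, copies CRUSH aims for afterwards, still accepts I/O).
    """
    surviving = sum(1 for room in acting_rooms if room != failed_room)
    target = min(size, max_per_room)
    return surviving, target, surviving >= min_size


pg = ["room1", "room1", "room2"]  # 2 copies in room1, 1 in room2
print(room_failure(pg, "room2"))  # (2, 2, True): case 3, degraded, nothing to move
print(room_failure(pg, "room1"))  # (1, 2, False): case 4, below min_size until a 2nd copy exists
print(room_failure(pg + ["room2"], "room1", size=4))  # (2, 2, True): the size=4 variant
```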
Is our understanding correct?  Thanks a lot.
Will do some simulation later to verify.

Regards,
/stwong


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

I think this is correct. To re-phrase 2): all PGs on the failed node will be re-distributed to several other hosts within the same room.
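A toy model can illustrate this: mark one host out, re-run the placement, and compare each PG's copies before and after. It uses the same simplifying assumptions as a plain CRUSH sketch would: one OSD per host, equal weights, a SHA-256 draw standing in for straw2, and hypothetical host names. `place` and `remap` are hypothetical helpers, not Ceph APIs.

```python
import hashlib
from collections import Counter

# Hypothetical topology: 2 rooms x 4 hosts, one OSD per host assumed.
TOPOLOGY = {
    "room1": ["host1", "host2", "host3", "host4"],
    "room2": ["host5", "host6", "host7", "host8"],
}


def draw(pg_id: int, item: str) -> int:
    """Deterministic pseudo-random draw per (PG, item); a stand-in for straw2."""
    return int.from_bytes(hashlib.sha256(f"{pg_id}:{item}".encode()).digest()[:8], "big")


def ranked(pg_id: int, items) -> list:
    return sorted(items, key=lambda item: draw(pg_id, item), reverse=True)


def place(pg_id: int, size: int = 3, out_hosts=frozenset()) -> list:
    """Toy version of: choose firstn 2 type room; chooseleaf firstn 2 type host."""
    emitted = []
    for room in ranked(pg_id, TOPOLOGY)[:2]:
        alive = [h for h in TOPOLOGY[room] if h not in out_hosts]
        emitted += [(room, h) for h in ranked(pg_id, alive)[:2]]
    return emitted[:size]


def remap(pg_id: int, failed_host: str):
    """Return (copies lost, copies gained) for one PG when `failed_host` is marked out."""
    before = place(pg_id)
    after = place(pg_id, out_hosts=frozenset({failed_host}))
    lost = [slot for slot in before if slot not in after]
    gained = [slot for slot in after if slot not in before]
    return lost, gained


# Where do host1's PG copies go once host1 is out? Count the replacement hosts.
moved_to = Counter()
for pg in range(256):
    _, gained = remap(pg, "host1")
    moved_to.update(gained)
print(moved_to)
```

In this model every replacement lands on another room1 host, spread over several of them, because the room draw for a PG does not depend on which hosts are out.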

Since some PGs will have 2 replicas in one room whereas other PGs will have 2 replicas in the other room, I tend to dislike such a setup as it is not symmetric: some PGs will suffer more than others in case of a room failure, so your failure domain is not symmetric. More importantly, as you stated in 4), your cluster will be down while these unfortunate PGs recover (statistically that is half your data). In such a case I would prefer that you use a size=4, min_size=2 setup.
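A rough check of the "half your data" estimate: the sketch below counts how many PGs drop below min_size=2 when room1 is lost, for size=3 and size=4 under the same rule. It models only the room choice, with a SHA-256 draw standing in for CRUSH's straw2 hashing and equal room weights assumed. The helper names are hypothetical.

```python
import hashlib

ROOMS = ["room1", "room2"]


def draw(pg_id: int, item: str) -> int:
    """Deterministic pseudo-random draw per (PG, room); a stand-in for straw2."""
    return int.from_bytes(hashlib.sha256(f"{pg_id}:{item}".encode()).digest()[:8], "big")


def copies_per_room(pg_id: int, size: int) -> dict:
    # Rule: choose 2 rooms, then 2 hosts in each -> up to 4 slots; the pool keeps the first `size`.
    first, second = sorted(ROOMS, key=lambda r: draw(pg_id, r), reverse=True)
    slots = [first, first, second, second][:size]
    return {room: slots.count(room) for room in ROOMS}


def inactive_fraction(size: int, min_size: int = 2, failed_room: str = "room1", pgs: int = 1024) -> float:
    """Share of PGs left with fewer than min_size copies when `failed_room` goes down."""
    inactive = 0
    for pg in range(pgs):
        surviving = size - copies_per_room(pg, size)[failed_room]
        if surviving < min_size:
            inactive += 1
    return inactive / pgs


for size in (3, 4):
    print(f"size={size} min_size=2: {inactive_fraction(size):.0%} of PGs below min_size")
```

With size=4 each room always holds 2 copies, so losing either room leaves every PG at min_size. With size=3, roughly half the PGs (those whose first-drawn room failed) are left with a single copy.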


/Maged
