Re: [ceph-users] best practices for EC pools

Caspar Smit Fri, 08 Feb 2019 02:48:28 -0800

On Fri, 8 Feb 2019 at 11:31, Scheurer François <
francois.scheu...@everyware.ch> wrote:


> Dear Eugen Block
> Dear Alan Johnson
>
>
> Thank you for your answers.
>
> So we will use EC 3+2 on 6 nodes.
> For now there are only 4 OSDs per node; later 8, and eventually 20.
>
>
> > Just to add: a more general formula is that the number of nodes N
> > should be greater than or equal to k+m+m (N>=k+m+m) for full recovery
>
> Understood.
> EC k+m assumes the case of losing m nodes, and that would require m
> 'spare' nodes to recover, so k+m+m in total.
> But the loss of a single node should allow a full recovery, shouldn't it?
>
> With 3+2 on 6 nodes, we should be able to:
> - survive the loss of at most 2 nodes simultaneously
>

Yes and no. Technically you can survive a 2-node failure, but with the usual
EC min_size of K+1, a PG needs at least K+1 of its chunks available to serve
I/O. So when the second node is lost, all I/O to the affected PGs freezes
until they are recovered to at least K+1 chunks.
So yes, you survive, but you can't use the cluster for a while during this;
if you want to keep using your cluster at all times, you can only tolerate
1 node failure.
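The arithmetic behind this can be sketched in Python, assuming one chunk per host (failure-domain=host) and min_size = k+1 for the EC pool. `pg_state_after_failures` is a hypothetical helper for illustration, not part of Ceph, and it only models hosts failing at the same time with no recovery in between.

```python
from typing import Optional


def pg_state_after_failures(k: int, m: int, failed_hosts: int,
                            min_size: Optional[int] = None) -> str:
    """Classify one EC k+m PG after `failed_hosts` of its hosts fail at once.

    Assumes each of the k+m chunks lives on a different host and, unless
    given, min_size = k + 1 (an assumption; check your pool's setting).
    """
    if min_size is None:
        min_size = k + 1
    available = k + m - failed_hosts
    if available < k:
        return "data lost"   # fewer than k chunks: objects cannot be decoded
    if available < min_size:
        return "inactive"    # data still decodable, but I/O blocks until recovery
    return "active"          # PG keeps serving I/O (degraded if chunks are missing)


# EC 3+2: one host down keeps the PG active, a second one blocks I/O.
for failed in range(4):
    print(failed, pg_state_after_failures(3, 2, failed))
```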


> - survive the loss of at most 3 nodes, if recovery has enough time to
> complete between failures
>

I think this kind of scenario shouldn't even be considered.


> - recover from the loss of at most 1 node
>

Only if there's enough free disk space left to hold all the data.
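This condition can be turned into a rough capacity check. The sketch assumes data is spread evenly over nodes of equal size and that backfill stops once OSDs reach a fill ratio `threshold`; the 0.90 used here is an assumed value (it matches the usual backfillfull ratio, but verify the ratios shown by `ceph osd dump` on your cluster). The function name is hypothetical.

```python
def max_fill_before_failure(nodes: int, failed: int = 1, threshold: float = 0.90) -> float:
    """Highest average fill ratio at which losing `failed` nodes still leaves room to recover.

    Assumes equal-sized nodes with evenly spread data: after the failure the same
    data must fit on `nodes - failed` nodes without exceeding `threshold`.
    """
    if not 0 <= failed < nodes:
        raise ValueError("failed must be between 0 and nodes - 1")
    return threshold * (nodes - failed) / nodes


# 6 nodes, 1 node lost: keep the cluster below roughly 75% full.
print(f"{max_fill_before_failure(6):.0%}")
```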

Kind regards,
Caspar


> > If the pools are empty, I wouldn't expect that either. Is restarting one OSD
> > also that slow, or is it just when you reboot the whole cluster?
> It also happens after rebooting a single node.
>
> In the mon logs we see a lot of messages such as these:
>
> 2019-02-06 23:07:46.003473 7f14d8ed6700  1 mon.ewos1-osd1-prod@0(leader).osd e116 prepare_failure osd.17 10.38.66.71:6803/76983 from osd.1 10.38.67.72:6800/75206 is reporting failure:1
> 2019-02-06 23:07:46.003486 7f14d8ed6700  0 log_channel(cluster) log [DBG] : osd.17 10.38.66.71:6803/76983 reported failed by osd.1 10.38.67.72:6800/75206
> 2019-02-06 23:07:57.948959 7f14d8ed6700  1 mon.ewos1-osd1-prod@0(leader).osd e116 prepare_failure osd.17 10.38.66.71:6803/76983 from osd.1 10.38.67.72:6800/75206 is reporting failure:0
> 2019-02-06 23:07:57.948971 7f14d8ed6700  0 log_channel(cluster) log [DBG] : osd.17 10.38.66.71:6803/76983 failure report canceled by osd.1 10.38.67.72:6800/75206
> 2019-02-06 23:08:54.632356 7f14d8ed6700  1 mon.ewos1-osd1-prod@0(leader).osd e116 prepare_failure osd.0 10.38.65.72:6800/72872 from osd.17 10.38.66.71:6803/76983 is reporting failure:1
> 2019-02-06 23:08:54.632374 7f14d8ed6700  0 log_channel(cluster) log [DBG] : osd.0 10.38.65.72:6800/72872 reported failed by osd.17 10.38.66.71:6803/76983
> 2019-02-06 23:10:21.333513 7f14d8ed6700  1 mon.ewos1-osd1-prod@0(leader).osd e116 prepare_failure osd.23 10.38.66.71:6807/79639 from osd.18 10.38.67.72:6806/79121 is reporting failure:1
> 2019-02-06 23:10:21.333527 7f14d8ed6700  0 log_channel(cluster) log [DBG] : osd.23 10.38.66.71:6807/79639 reported failed by osd.18 10.38.67.72:6806/79121
> 2019-02-06 23:10:57.660468 7f14d8ed6700  1 mon.ewos1-osd1-prod@0(leader).osd e116 prepare_failure osd.23 10.38.66.71:6807/79639 from osd.18 10.38.67.72:6806/79121 is reporting failure:0
> 2019-02-06 23:10:57.660481 7f14d8ed6700  0 log_channel(cluster) log [DBG] : osd.23 10.38.66.71:6807/79639 failure report canceled by osd.18 10.38.67.72:6806/79121
>
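To see at a glance which OSDs were reported down and whether those reports were later withdrawn, a small parser can tally the `prepare_failure` lines from the monitor log. This is a hypothetical helper, assuming the log line format shown above (where `failure:0` pairs with a "failure report canceled" entry); it is not a Ceph tool.

```python
import re
from collections import Counter

# The prepare_failure entries from the monitor log quoted in the thread.
LOG_LINES = [
    "2019-02-06 23:07:46.003473 7f14d8ed6700  1 mon.ewos1-osd1-prod@0(leader).osd e116 prepare_failure osd.17 10.38.66.71:6803/76983 from osd.1 10.38.67.72:6800/75206 is reporting failure:1",
    "2019-02-06 23:07:57.948959 7f14d8ed6700  1 mon.ewos1-osd1-prod@0(leader).osd e116 prepare_failure osd.17 10.38.66.71:6803/76983 from osd.1 10.38.67.72:6800/75206 is reporting failure:0",
    "2019-02-06 23:08:54.632356 7f14d8ed6700  1 mon.ewos1-osd1-prod@0(leader).osd e116 prepare_failure osd.0 10.38.65.72:6800/72872 from osd.17 10.38.66.71:6803/76983 is reporting failure:1",
    "2019-02-06 23:10:21.333513 7f14d8ed6700  1 mon.ewos1-osd1-prod@0(leader).osd e116 prepare_failure osd.23 10.38.66.71:6807/79639 from osd.18 10.38.67.72:6806/79121 is reporting failure:1",
    "2019-02-06 23:10:57.660468 7f14d8ed6700  1 mon.ewos1-osd1-prod@0(leader).osd e116 prepare_failure osd.23 10.38.66.71:6807/79639 from osd.18 10.38.67.72:6806/79121 is reporting failure:0",
]

# prepare_failure <target> <addr> from <reporter> <addr> is reporting failure:<0|1>
PATTERN = re.compile(
    r"prepare_failure (osd\.\d+) \S+ from (osd\.\d+) \S+ is reporting failure:([01])"
)


def summarize(lines):
    """Count failure reports (failure:1) and cancellations (failure:0) per target OSD."""
    reported, canceled = Counter(), Counter()
    for line in lines:
        match = PATTERN.search(line)
        if not match:
            continue  # not a prepare_failure line
        target, _reporter, flag = match.groups()
        if flag == "1":
            reported[target] += 1
        else:
            canceled[target] += 1
    return reported, canceled


reported, canceled = summarize(LOG_LINES)
print("reported:", dict(reported))
print("canceled:", dict(canceled))
```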
>
>
> Best Regards
> Francois Scheurer
>
>
>
>
>
> ________________________________________
> From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Alan
> Johnson <al...@supermicro.com>
> Sent: Thursday, February 7, 2019 8:11 PM
> To: Eugen Block; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] best practices for EC pools
>
> Just to add: a more general formula is that the number of nodes N
> should be greater than or equal to k+m+m (N>=k+m+m) for full recovery
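Alan's rule generalizes: with one chunk per host, fully re-creating every chunk after losing f hosts needs k+m+f hosts, and f = m gives k+m+m. A minimal sketch (the function name is hypothetical):

```python
def min_hosts_for_full_recovery(k: int, m: int, failed_hosts: int) -> int:
    """Hosts needed so every PG can again place all k+m chunks on distinct hosts.

    With failure-domain=host each chunk needs its own host, so k+m hosts must
    survive the loss of `failed_hosts` hosts.
    """
    return k + m + failed_hosts


for k, m in [(3, 2), (4, 2)]:
    print(f"{k}+{m}: 1 host lost -> {min_hosts_for_full_recovery(k, m, 1)} hosts, "
          f"{m} hosts lost -> {min_hosts_for_full_recovery(k, m, m)} hosts")
```

With 6 hosts, 3+2 can fully recover from one host failure (6 hosts needed), while 4+2 cannot fully recover from any host failure (7 needed).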
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Eugen Block
> Sent: Thursday, February 7, 2019 8:47 AM
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] best practices for EC pools
>
> Hi Francois,
>
> > Is it correct that recovery will be prevented by the CRUSH rule if a
> > node is down?
>
> Yes, that is correct: failure-domain=host means no two chunks of the same
> PG can be on the same host. So if your PG is divided into 6 chunks, they're
> all on different hosts, and with one host down there is no other host left
> to rebuild the missing chunk on, so no recovery is possible (for the
> EC pool).
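The effect can be illustrated with a toy placement function that, roughly like `chooseleaf indep 0 type host`, gives each chunk position a distinct up host and leaves a position empty when it runs out of hosts. This is a deliberately simplified stand-in, not CRUSH: real CRUSH picks hosts pseudo-randomly by weight, while this sketch simply takes them in order. The host names are made up.

```python
from typing import List, Optional, Sequence, Set


def place_chunks(hosts: Sequence[str], down: Set[str], k: int, m: int) -> List[Optional[str]]:
    """Toy placement: one distinct up host per chunk position, None if none is left."""
    up = [h for h in hosts if h not in down]
    return [up[i] if i < len(up) else None for i in range(k + m)]


hosts = [f"node{i}" for i in range(1, 7)]          # hypothetical 6-host cluster
print(place_chunks(hosts, set(), 4, 2))            # all 6 positions filled
print(place_chunks(hosts, {"node3"}, 4, 2))        # last position stays None
print(place_chunks(hosts, {"node3"}, 3, 2))        # 3+2 still fits on 5 hosts
```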
>
> > After rebooting all nodes, we noticed that recovery was slow, maybe
> > half an hour, even though all pools are currently empty (new install).
> > This is odd...
>
> If the pools are empty, I wouldn't expect that either. Is restarting one OSD
> also that slow, or is it just when you reboot the whole cluster?
>
> > Which k&m values are preferred on 6 nodes?
>
> It depends on the failures you expect and how many concurrent failures you
> need to cover.
> I think I would keep failure-domain=host (with only 4 OSDs per host).
> As for the k and m values, 3+2 would make sense, I guess. That profile
> would leave one host for recovery, and two OSDs of one PG's acting set could
> fail without data loss, so it is as resilient as the 4+2 profile. This is one
> approach, so please don't read this as *the* solution for your environment.
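Eugen's comparison can be summarized per profile. A minimal sketch, assuming failure-domain=host with one chunk per host; the helper and its field names are hypothetical:

```python
def ec_profile_on_hosts(k: int, m: int, hosts: int) -> dict:
    """Spare hosts and per-PG loss tolerance of an EC k+m profile on `hosts` hosts."""
    if k + m > hosts:
        raise ValueError("failure-domain=host needs at least k+m hosts")
    return {
        "spare_hosts": hosts - (k + m),   # hosts left to rebuild a lost chunk onto
        "losses_without_data_loss": m,    # any m chunks of one PG may be lost
    }


print("3+2:", ec_profile_on_hosts(3, 2, 6))   # one spare host, tolerates 2 losses
print("4+2:", ec_profile_on_hosts(4, 2, 6))   # no spare host, tolerates 2 losses
```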
>
> Regards,
> Eugen
>
>
> Quoting Scheurer François <francois.scheu...@everyware.ch>:
>
> > Dear All
> >
> >
> > We created an erasure-coded pool with k=4 m=2 and failure-domain=host,
> > but we have only 6 OSD nodes.
> > Is it correct that recovery will be prevented by the CRUSH rule if a
> > node is down?
> >
> > After rebooting all nodes, we noticed that recovery was slow, maybe
> > half an hour, even though all pools are currently empty (new install).
> > This is odd...
> >
> > Could it be related to k+m being equal to the number of nodes (4+2=6)?
> > "step set_choose_tries 100" was already in the EC CRUSH rule.
> >
> > rule ewos1-prod_cinder_ec {
> >       id 2
> >       type erasure
> >       min_size 3
> >       max_size 6
> >       step set_chooseleaf_tries 5
> >       step set_choose_tries 100
> >       step take default class nvme
> >       step chooseleaf indep 0 type host
> >       step emit
> > }
> >
> > ceph osd erasure-code-profile set ec42 k=4 m=2 crush-root=default crush-failure-domain=host crush-device-class=nvme
> > ceph osd pool create ewos1-prod_cinder_ec 256 256 erasure ec42
> >
> > ceph version 12.2.10-543-gfc6f0c7299
> > (fc6f0c7299e3442e8a0ab83260849a6249ce7b5f) luminous (stable)
> >
> >   cluster:
> >     id:     b5e30221-a214-353c-b66b-8c37b4349123
> >     health: HEALTH_WARN
> >             noout flag(s) set
> >             Reduced data availability: 125 pgs inactive, 32 pgs peering
> >
> >   services:
> >     mon: 3 daemons, quorum ewos1-osd1-prod,ewos1-osd3-prod,ewos1-osd5-prod
> >     mgr: ewos1-osd5-prod(active), standbys: ewos1-osd3-prod, ewos1-osd1-prod
> >     osd: 24 osds: 24 up, 24 in
> >          flags noout
> >
> >   data:
> >     pools:   4 pools, 1600 pgs
> >     objects: 0 objects, 0B
> >     usage:   24.3GiB used, 43.6TiB / 43.7TiB avail
> >     pgs:     7.812% pgs not active
> >              1475 active+clean
> >              93   activating
> >              32   peering
> >
> >
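The PG counts in this status output are consistent with each other: the 93 activating and 32 peering PGs make up the 125 inactive PGs in the health summary, and 125 of 1600 is the 7.812% reported. A quick check of that arithmetic:

```python
total_pgs = 1600
pg_states = {"active+clean": 1475, "activating": 93, "peering": 32}

# Every PG is in exactly one of the listed states.
assert sum(pg_states.values()) == total_pgs

not_active = pg_states["activating"] + pg_states["peering"]
print(f"{not_active} PGs not active = {100 * not_active / total_pgs:.4f}%")
```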
> > Which k&m values are preferred on 6 nodes?
> > BTW, we plan to use this EC pool as a second RBD pool in OpenStack,
> > alongside the main RBD pool, which is replicated with size=3; it is NVMe
> > SSD only.
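For this mix of pools, the raw capacity consumed per byte of data differs considerably. A minimal sketch that ignores per-object and allocation overhead (the function names are hypothetical):

```python
from fractions import Fraction


def raw_per_usable_ec(k: int, m: int) -> Fraction:
    """Raw bytes written per byte of data in an EC k+m pool."""
    return Fraction(k + m, k)


def raw_per_usable_replicated(size: int) -> Fraction:
    """Raw bytes written per byte of data in a replicated pool."""
    return Fraction(size)


for label, ratio in [("EC 3+2", raw_per_usable_ec(3, 2)),
                     ("EC 4+2", raw_per_usable_ec(4, 2)),
                     ("replicated size=3", raw_per_usable_replicated(3))]:
    print(f"{label}: {float(ratio):.2f}x raw, {float(1 / ratio):.0%} of raw usable")
```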
> >
> >
> > Thanks for your help!
> >
> >
> >
> > Best Regards
> > Francois Scheurer
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
>
> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.ceph.com_listinfo.cgi_ceph-2Dusers-2Dceph.com&d=DwIGaQ&c=4DxX-JX0i28X6V65hK0ftwVK1xnmwcYC0vo7GVya1JY&r=sgFiQgvQASiGFaHpitF5P9M9QDCRkgKGttwwMFt2VIU&m=pTchIHDm3u6d1bmWBYKGF0Akb9UelYSeP1pnEbEw85Q&s=FV0ocIQ2LDiwIdGtKE36tH50px_KHyRvz14eDP1qptI&e=
