Whoops, sent that too early. Let me try again.

On Wed, Jun 7, 2017 at 3:24 AM Jonas Jaszkowic <
jonasjaszko...@googlemail.com> wrote:

> Thank you for your feedback! Do you have more information on *why* at
> least
> k+1 nodes need to be active in order for the cluster to work at this point?
>
>
Actually, I misread your email, so my earlier diagnosis was more specific
than it should have been.
In your case, you've got a 2+3 EC pool and killed 3 OSDs.

Roughly:
We prevent PGs from going active (and serving writes or reads) when they
have fewer than "min_size" OSDs participating. min_size is generally set so
that there is enough redundancy left to recover from at least one more OSD
failing.

In your case, each PG is down to only 2 OSDs (exactly k), and the failure of
either one of them would mean the loss of all data written to it. So we don't
let the PGs go active, because it isn't safe.
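
(If you mainly want the pool readable again for this test, one option is to
lower min_size on the pool so PGs can go active with only the k surviving
chunks. A rough sketch, assuming your pool is actually named "ecpool" like
the crush rule below; note that at min_size=k one more failure loses data,
and depending on your release this may still not be enough to get EC PGs
active with exactly k chunks, per the tracker issue quoted further down:)

ceph health detail                    # lists the stuck/incomplete PGs
ceph pg <pgid> query                  # shows why a given PG won't go active
ceph osd pool get ecpool min_size     # check what the pool is currently set to
ceph osd pool set ecpool min_size 2   # k only: zero redundancy, test use only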


> I am particularly interested in any material on the erasure coding
> implementations in Ceph and how they work in depth. Sometimes the official
> documentation doesn’t supply the needed information on problems beyond the
> point of a default cluster setup. Is there any technical documentation on
> the implementation or something similar?
>

See http://docs.ceph.com/docs/master/dev/osd_internals/erasure_coding/ and
the pages it links to.
-Greg


>
> Any help is appreciated.
>
> Best regards,
> Jonas
>
>
> On 07.06.2017 at 08:00, Gregory Farnum <gfar...@redhat.com> wrote:
>
> On Tue, Jun 6, 2017 at 10:12 AM, Jonas Jaszkowic
> <jonasjaszko...@googlemail.com> wrote:
>
> I set up a simple Ceph cluster with 5 OSD nodes and 1 monitor node. Each
> OSD is on a different host. The erasure-coded pool has 64 PGs and an
> initial state of HEALTH_OK.
>
> The goal is to deliberately break as many OSDs as possible, up to the
> number of coding chunks m, in order to evaluate the read performance when
> these chunks are missing. By definition of Reed-Solomon coding, any m of
> the n = k+m total chunks can be missing. To simulate the loss of an OSD
> I’m doing the following:
>
> ceph osd set noup
> ceph osd down <ID>
> ceph osd out <ID>
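
(Side note: to undo that simulated failure later in your test, clearing the
flag and marking the OSDs back in should be enough, since the daemons are
still running and will mark themselves up again once noup is cleared.
Roughly, for each <ID> you took out:)

ceph osd unset noup   # let the still-running daemons mark themselves up again
ceph osd in <ID>      # mark the OSD back in
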
>
> With the above procedure I should be able to kill up to m = 3 OSDs without
> losing any data. However, when I kill m = 3 randomly selected OSDs, all
> requests to the cluster are blocked and HEALTH_ERR is shown. The OSD on
> which the requests are blocked is working properly and marked [up,in] in
> the cluster.
>
> My question: Why is it not possible to kill m = 3 OSDs and still operate
> the cluster? Isn’t that equivalent to losing data, which shouldn’t happen
> in this particular configuration? Is my cluster set up properly, or am I
> missing something?
>
>
> Sounds like http://tracker.ceph.com/issues/18749, which, yeah, we need
> to fix. By default, with a k+m EC code, it currently insists on
> at least one chunk more than the minimum k to go active.
> -Greg
>
>
> Thank you for your help!
>
> I have attached all relevant information about the cluster and status
> outputs:
>
> Erasure coding profile:
>
> jerasure-per-chunk-alignment=false
> k=2
> m=3
> plugin=jerasure
> ruleset-failure-domain=host
> ruleset-root=default
> technique=reed_sol_van
> w=8
>
> Content of ceph.conf:
>
> [global]
> fsid = 6353b831-22c3-424c-a8f1-495788e6b4e2
> mon_initial_members = ip-172-31-27-142
> mon_host = 172.31.27.142
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> osd_pool_default_min_size = 2
> osd_pool_default_size = 2
> mon_allow_pool_delete = true
>
> Crush rule:
>
> rule ecpool {
> ruleset 1
> type erasure
> min_size 2
> max_size 5
> step set_chooseleaf_tries 5
> step set_choose_tries 100
> step take default
> step chooseleaf indep 0 type host
> step emit
> }
>
> Output of 'ceph -s' while the cluster is degraded:
>
>    cluster 6353b831-22c3-424c-a8f1-495788e6b4e2
>     health HEALTH_ERR
>            38 pgs are stuck inactive for more than 300 seconds
>            26 pgs degraded
>            38 pgs incomplete
>            26 pgs stuck degraded
>            38 pgs stuck inactive
>            64 pgs stuck unclean
>            26 pgs stuck undersized
>            26 pgs undersized
>            2 requests are blocked > 32 sec
>            recovery 3/5 objects degraded (60.000%)
>            recovery 1/5 objects misplaced (20.000%)
>            noup flag(s) set
>     monmap e2: 1 mons at {ip-172-31-27-142=172.31.27.142:6789/0}
>            election epoch 6, quorum 0 ip-172-31-27-142
>        mgr no daemons active
>     osdmap e194: 5 osds: 2 up, 2 in; 64 remapped pgs
>            flags noup,sortbitwise,require_jewel_osds,require_kraken_osds
>      pgmap v970: 64 pgs, 1 pools, 592 bytes data, 1 objects
>            79668 kB used, 22428 MB / 22505 MB avail
>            3/5 objects degraded (60.000%)
>            1/5 objects misplaced (20.000%)
>                  38 incomplete
>                  15 active+undersized+degraded
>                  11 active+undersized+degraded+remapped
>
> Output of 'ceph health' while the cluster is degraded:
>
> HEALTH_ERR 38 pgs are stuck inactive for more than 300 seconds; 26 pgs
> degraded; 38 pgs incomplete; 26 pgs stuck degraded; 38 pgs stuck inactive;
> 64 pgs stuck unclean; 26 pgs stuck undersized; 26 pgs undersized; 2
> requests
> are blocked > 32 sec; recovery 3/5 objects degraded (60.000%); recovery 1/5
> objects misplaced (20.000%); noup flag(s) set
>
>
