2 of 3 monitors are enough to maintain quorum, as is any majority.
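If you want to double-check that next time a mon drops out, the standard
CLI should show the surviving monitors still forming quorum (run from any
node with an admin keyring):

    ceph mon stat
    ceph quorum_status --format json-pretty

So losing the first monitor by itself should not have blocked IO.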

However, EC pools have a default min_size of k+1 chunks.
This can be adjusted to k, but that has its own dangers.
I assume you are using failure domain = "host"?
Since you had k=6,m=2 and lost 2 failure domains, only k chunks were
left per PG, below min_size, so all IO on those PGs stopped.
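For reference, checking and (carefully, at your own risk) lowering
min_size on the EC data pool looks roughly like this; 'vms-ec' is just a
placeholder pool name:

    ceph osd pool get vms-ec min_size
    ceph osd pool set vms-ec min_size 6    # = k; lets IO resume with zero redundancy left

With min_size dropped to k, any further failure means data loss and new
writes have no surviving parity, so treat it only as an emergency measure
to get PGs active again while you bring the down hosts back.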

Currently, EC pools that still have k chunks but fewer than min_size do not rebuild.
This is being worked on for Octopus: https://github.com/ceph/ceph/pull/17619

k=6,m=2 is therefore somewhat slim for a 10-host cluster.
I do not currently use EC myself, as I have only 3 failure domains, so
others here may know better than I do, but I might have gone with
k=6,m=3. That would let the cluster rebuild back to HEALTH_OK after 1
host failure and remain available (in HEALTH_WARN) with 2 hosts down.
k=4,m=4 would be very safe, but potentially too expensive; rough
overhead numbers below.
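Usable capacity is k/(k+m) of raw, so roughly:

    k=6,m=2:  6/8 = 75% usable, tolerates 2 host failures
    k=6,m=3:  6/9 = ~67% usable, tolerates 3 host failures
    k=4,m=4:  4/8 = 50% usable, tolerates 4 host failures

If you did decide to switch, k and m cannot be changed on an existing
pool, so the data would have to be copied into a new pool built on a new
profile. Creating that would look something like this ('ec-6-3' and
'vms-ec-new' are placeholder names, and the PG count is just an example):

    ceph osd erasure-code-profile set ec-6-3 k=6 m=3 crush-failure-domain=host
    ceph osd erasure-code-profile get ec-6-3
    ceph osd pool create vms-ec-new 128 128 erasure ec-6-3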


On Wed, Jul 24, 2019 at 1:31 PM Jean-Philippe Méthot
<jp.met...@planethoster.info> wrote:
>
> Hi,
>
> I’m running in production a 3 monitors, 10 osdnodes ceph cluster. This 
> cluster is used to host Openstack VM rbd. My pools are set to use a k=6 m=2 
> erasure code profile with a 3 copy metadata pool in front. The cluster runs 
> well, but we recently had a short outage which triggered unexpected behaviour 
> in the cluster.
>
> I’ve always been under the impression that Ceph would continue working 
> properly even if nodes would go down. I tested it several months ago with 
> this configuration and it worked fine as long as only 2 nodes went down. 
> However, this time, the first monitor as well as two osd nodes went down. As 
> a result, Openstack VMs were able to mount their rbd volume but unable to 
> read from it, even after the cluster had recovered with the following message 
> : Reduced data availability: 599 pgs inactive, 599 pgs incomplete .
>
> I believe the cluster should have continued to work properly despite the 
> outage, so what could have prevented it from functioning? Is it because there 
> were only two monitors remaining? Or is it that reduced data availability 
> message? In that case, is my erasure coding configuration fine for that 
> number of nodes?
>
> Jean-Philippe Méthot
> Openstack system administrator
> Administrateur système Openstack
> PlanetHoster inc.