2/3 monitors is enough to maintain quorum, as is any majority. However, EC pools have a default min_size of k+1 chunks. This can be adjusted to k, but that has its own dangers. I assume you are using failure domain = "host"? As you had k=6,m=2 and lost 2 failure domains, you had only k chunks left, which is below min_size, so all IO stopped.
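
To make the arithmetic concrete, here is a quick sketch of why the IO stopped. This is just my own illustration (not Ceph code), assuming failure domain = host and at most one chunk per host:

    # Sketch of the EC availability arithmetic, assuming failure domain = host
    # and one chunk per host; min_size defaults to k + 1 for EC pools.
    def ec_state(k, m, failed_hosts, min_size=None):
        if min_size is None:
            min_size = k + 1
        surviving = k + m - failed_hosts
        serving_io = surviving >= min_size   # PGs stay active only at or above min_size
        recoverable = surviving >= k         # k chunks are enough to reconstruct the data
        return serving_io, recoverable

    # k=6, m=2, two hosts down: 6 chunks left. 6 >= k, so nothing is lost,
    # but 6 < min_size (7), so the PGs go inactive and IO stops.
    print(ec_state(6, 2, failed_hosts=2))    # -> (False, True)
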
Currently, EC pools that still have at least k chunks but fewer than min_size do not rebuild. This is being worked on for Octopus: https://github.com/ceph/ceph/pull/17619

k=6,m=2 is therefore somewhat slim for a 10-host cluster. I do not currently use EC myself, as I have only 3 failure domains, so others here may know better than me, but I might have gone with k=6,m=3. That would let the cluster rebuild back to OK after 1 host failure and remain available (in a WARN state) with 2 hosts down. k=4,m=4 would be very safe, but potentially too expensive; a rough comparison of these profiles follows below, after the quoted message.

On Wed, Jul 24, 2019 at 1:31 PM Jean-Philippe Méthot <jp.met...@planethoster.info> wrote:
>
> Hi,
>
> I’m running in production a 3 monitors, 10 osd nodes ceph cluster. This cluster is used to host Openstack VM rbd. My pools are set to use a k=6 m=2 erasure code profile with a 3 copy metadata pool in front. The cluster runs well, but we recently had a short outage which triggered unexpected behaviour in the cluster.
>
> I’ve always been under the impression that Ceph would continue working properly even if nodes would go down. I tested it several months ago with this configuration and it worked fine as long as only 2 nodes went down. However, this time, the first monitor as well as two osd nodes went down. As a result, Openstack VMs were able to mount their rbd volume but unable to read from it, even after the cluster had recovered with the following message: Reduced data availability: 599 pgs inactive, 599 pgs incomplete.
>
> I believe the cluster should have continued to work properly despite the outage, so what could have prevented it from functioning? Is it because there was only two monitors remaining? Or is it that reduced data availability message? In that case, is my erasure coding configuration fine for that number of nodes?
>
> Jean-Philippe Méthot
> Openstack system administrator
> Administrateur système Openstack
> PlanetHoster inc.
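
As promised above, a rough comparison of the three profiles, again just my own back-of-the-envelope arithmetic (not Ceph code), assuming failure domain = host, one chunk per host, and the default min_size = k+1:

    # Rough comparison of candidate EC profiles. Assumes failure domain = host,
    # one chunk per host, and Ceph's default min_size = k + 1 for EC pools.
    def profile_summary(k, m):
        overhead = (k + m) / k    # raw bytes written per usable byte stored
        hosts_down_io = m - 1     # most hosts down while keeping >= min_size chunks
        hosts_down_data = m       # most hosts down while keeping >= k chunks
        return overhead, hosts_down_io, hosts_down_data

    for k, m in [(6, 2), (6, 3), (4, 4)]:
        overhead, io_ok, data_ok = profile_summary(k, m)
        print(f"k={k},m={m}: {overhead:.2f}x raw space, "
              f"IO keeps flowing with up to {io_ok} host(s) down, "
              f"data remains recoverable with up to {data_ok} host(s) down")

Which of those trade-offs is acceptable mostly comes down to how much raw capacity you can spare across your 10 hosts.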