It is inherently dangerous to accept client IO - particularly writes -
when only k chunks remain, just as it is dangerous to accept IO with a
single replica in replicated mode. Recovery at k is not inherently
dangerous, but apparently the recovery path was originally written to
require min_size rather than k.
Looking at the PR, the actual code change is fairly small, ~30 lines,
but it is a critical change and comes with several pages of test code.
It also requires explicitly setting
"osd_allow_recovery_below_min_size", just in case. So it is clearly
being treated with caution.
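
If/when that lands, I would expect enabling it to look roughly like
this (just a sketch, untested; check the option name and its default
against the docs for whatever release it ships in):

    ceph config set osd osd_allow_recovery_below_min_size true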


On Wed, Jul 24, 2019 at 2:28 PM Jean-Philippe Méthot
<jp.met...@planethoster.info> wrote:
>
> Thank you, that does make sense. I was completely unaware that min size was 
> k+1 and not k. Had I known that, I would have designed this pool differently.
>
> Regarding that feature for Octopus, I’m guessing it shouldn't be dangerous 
> for data integrity to recover at less than min_size?
>
> Jean-Philippe Méthot
> Openstack system administrator
> Administrateur système Openstack
> PlanetHoster inc.
>
>
>
>
> Le 24 juill. 2019 à 13:49, Nathan Fish <lordci...@gmail.com> a écrit :
>
> 2/3 monitors is enough to maintain quorum, as is any majority.
>
> However, EC pools have a default min_size of k+1 chunks.
> This can be adjusted to k, but that has its own dangers.
> I assume you are using failure domain = "host"?
> Since you had k=6,m=2 and lost 2 failure domains, you were left with
> only k chunks, so all IO stopped.
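>
> If you do decide to lower it, it is just a pool setting, something
> like (pool name is a placeholder):
>
>     ceph osd pool get <your-ec-pool> min_size
>     ceph osd pool set <your-ec-pool> min_size 6
>
> But at min_size = k there is no margin left: one more lost chunk and
> the data in that PG can no longer be reconstructed.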
>
> Currently, EC pools that have k chunks but less than min_size do not rebuild.
> This is being worked on for Octopus: https://github.com/ceph/ceph/pull/17619
>
> k=6,m=2 is therefore somewhat slim for a 10-host cluster.
> I do not currently use EC, as I have only 3 failure domains, so
> others here may know better than I do, but I might have gone with
> k=6,m=3. Since min_size would be k+1 = 7, losing 1 host still leaves
> enough hosts to rebuild back to OK, and losing 2 hosts leaves exactly
> 7 chunks, so the pool stays available in a WARN state.
> k=4,m=4 would be very safe, but potentially too expensive.
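>
> For reference, creating a profile like that would look something like
> this (names are just examples; k and m cannot be changed on an
> existing pool, so it would mean a new pool and a migration):
>
>     ceph osd erasure-code-profile set ec-6-3 k=6 m=3 crush-failure-domain=host
>     ceph osd pool create vms-data-ec 128 128 erasure ec-6-3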
>
>
> On Wed, Jul 24, 2019 at 1:31 PM Jean-Philippe Méthot
> <jp.met...@planethoster.info> wrote:
>
>
> Hi,
>
> I’m running a production Ceph cluster with 3 monitors and 10 OSD
> nodes. The cluster hosts RBD volumes for OpenStack VMs. My pools use
> a k=6 m=2 erasure code profile, with a 3-replica metadata pool in
> front. The cluster runs well, but we recently had a short outage that
> triggered unexpected behaviour.
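>
> For reference, the data/metadata split looks roughly like this (pool
> names and PG counts are just examples, not the real ones):
>
>     ceph osd erasure-code-profile set ec-6-2 k=6 m=2 crush-failure-domain=host
>     ceph osd pool create vms-data 512 512 erasure ec-6-2
>     ceph osd pool set vms-data allow_ec_overwrites true
>     ceph osd pool create vms-metadata 512 512 replicated
>     rbd create --size 50G --data-pool vms-data vms-metadata/some-image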
>
> I’ve always been under the impression that Ceph would keep working
> properly even if nodes went down. I tested this configuration several
> months ago and it worked fine as long as no more than 2 nodes went
> down. This time, however, the first monitor as well as two OSD nodes
> went down. As a result, OpenStack VMs were able to mount their RBD
> volumes but unable to read from them, even after the cluster had
> recovered, with the following message: Reduced data availability:
> 599 pgs inactive, 599 pgs incomplete.
>
> I believe the cluster should have continued to work properly despite
> the outage, so what could have prevented it from functioning? Is it
> because there were only two monitors remaining? Or is it that reduced
> data availability message? If so, is my erasure coding configuration
> suitable for this number of nodes?
>
> Jean-Philippe Méthot
> Openstack system administrator
> Administrateur système Openstack
> PlanetHoster inc.
>
>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
