No, there is no split-brain problem even with size/min_size 2/1. A PG will
not go active if it might be missing the latest data, i.e., when all other
OSDs that might have seen writes are currently offline.
That's effectively what the history_ignore_les_bounds option does: it tells
Ceph to take the PG active anyway in that situation.
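
The activation rule described above can be sketched as a toy model (illustrative only; this is not Ceph's actual peering code, and all function and variable names here are made up):

```python
# Toy model of the rule: a PG may go active only if an OSD that is
# currently up has seen the most recent write interval. The override flag
# models the option named in the mail, which activates the PG anyway,
# accepting the loss of the newest writes.

def can_go_active(up_osds, last_interval_seen, latest_interval,
                  ignore_history=False):
    """up_osds: set of OSD ids currently up.
    last_interval_seen: dict osd_id -> newest write interval that OSD saw.
    latest_interval: newest interval recorded anywhere in the PG's history."""
    if ignore_history:
        return True  # operator explicitly accepts potential data loss
    return any(last_interval_seen.get(osd) == latest_interval
               for osd in up_osds)

# size=2/min_size=1 scenario: OSD 1 flapped, writes went to OSD 0 alone
# (interval 7), then OSD 0's disk died. OSD 1 is up but only saw interval 6.
history = {0: 7, 1: 6}
print(can_go_active({1}, history, latest_interval=7))   # False: PG stays inactive
print(can_go_active({1}, history, latest_interval=7,
                    ignore_history=True))               # True: forced active
```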

That's why you end up with inactive PGs if you run 2/1 and a disk dies
while OSDs flap. You then have to set history_ignore_les_bounds if the
dead disk is really unrecoverable, losing the latest modifications to the
affected objects.
But Ceph will not compromise your data without you manually telling it to
do so; it will just block IO instead.
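
To avoid that situation entirely, run size=3/min_size=2 instead. A sketch with the standard CLI (the pool name `mypool` is a placeholder; adjust to your pool):

```shell
# Safe replicated pool settings: 3 copies, writes only acked with 2 up
ceph osd pool set mypool size 3
ceph osd pool set mypool min_size 2

# If PGs are already stuck inactive, inspect them before touching anything:
ceph health detail
ceph pg dump_stuck inactive
```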


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Mon, May 20, 2019 at 10:04 PM Frank Schilder <fr...@dtu.dk> wrote:

> If min_size=1 and you lose the last disk, that's the end of any data that
> was only on this disk.
>
> Apart from this, using size=2 and min_size=1 is a really bad idea. This
> has nothing to do with data replication but rather with an inherent problem
> with high availability and the number 2. You need at least 3 members of an
> HA group to ensure stable operation with proper majorities. There are
> numerous stories about OSD flapping caused by size-2 min_size-1 pools,
> leading to situations that are extremely hard to recover from. My favourite
> is this one:
> https://blog.noc.grnet.gr/2016/10/18/surviving-a-ceph-cluster-outage-the-hard-way/
> . You will easily find more. The deeper problem here is called
> "split-brain" and there is no real solution to it except to avoid it at all
> cost.
>
> Best regards,
>
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Florent B <flor...@coppint.com>
> Sent: 20 May 2019 21:33
> To: Paul Emmerich; Frank Schilder
> Cc: ceph-users
> Subject: Re: [ceph-users] Default min_size value for EC pools
>
> I understand better thanks to Frank & Paul messages.
>
> Paul, when min_size=k, is it the same problem as with a replicated pool
> with size=2 & min_size=1?
>
> On 20/05/2019 21:23, Paul Emmerich wrote:
> Yeah, the current situation with recovery and min_size is... unfortunate :(
>
> The reason why min_size = k is bad is just that it means you are accepting
> writes without guaranteeing durability while you are in a degraded state.
> A durable storage system should never tell a client "okay, I've written
> your data" if losing a single disk leads to data loss.
>
> Yes, that is the default behavior of traditional raid 5 and raid 6 systems
> during rebuild (with 1 or 2 disk failures for raid 5/6), but that doesn't
> mean it's a good idea.
>
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
