For the most part I'm assuming min_size=2, size=3. In the min_size=3
and size=3 case this changes.

size is how many replicas of an object to maintain, min_size is how many
writes need to succeed before the primary can ack the operation to the
client.

A larger min_size most likely means higher write latency.
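To illustrate the latency point, here's a toy sketch (not Ceph code, just the model described above): if the primary can ack once min_size replicas have persisted the write, ack latency is the min_size-th smallest of the per-replica write latencies, so raising min_size makes you wait on slower replicas.

```python
# Toy model, not Ceph code: the primary acks a write once min_size
# replicas have persisted it, so ack latency is the min_size-th
# smallest of the per-replica write latencies.

def ack_latency(replica_latencies_ms, min_size):
    """Latency until the write can be acked to the client."""
    assert 1 <= min_size <= len(replica_latencies_ms)
    return sorted(replica_latencies_ms)[min_size - 1]

lat = [5, 9, 40]            # one slow replica
print(ack_latency(lat, 2))  # 9  - ack after the 2nd-fastest replica
print(ack_latency(lat, 3))  # 40 - must wait for the slowest replica
```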

On Wed, Mar 22, 2017 at 8:05 AM, Adam Carheden <[email protected]> wrote:

> On Tue, Mar 21, 2017 at 1:54 PM, Kjetil Jørgensen <[email protected]>
> wrote:
>
> >> c. Reads can continue from the single online OSD even in pgs that
> >> happened to have two of 3 osds offline.
> >>
> >
> > Hypothetically (This is partially informed guessing on my part):
> > If the survivor happens to be the acting primary and it were up-to-date
> at
> > the time,
> > it can in theory serve reads. (Only the primary serves reads).
>
> It makes no sense that only the primary could serve reads. That would
> mean that even if only a single OSD failed, all PGs for which that OSD
> was primary would be unreadable.
>

Acting [1, 2, 3] - primary is 1, and only 1 serves reads. If 1 fails, 2 is
the new primary. It will probably check with 3 to determine whether there
were any writes it is itself unaware of - and peer if there were. Promotion
should be near instantaneous (well, you'd in all likelihood be able to
measure it).
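A minimal sketch of the promotion rule described above (this is my simplified model, not Ceph's actual peering logic): the first surviving OSD in the acting set takes over as primary.

```python
# Simplified model, not Ceph's actual peering logic: when the primary
# fails, the first surviving OSD in the acting set becomes the new
# primary (after checking peers for writes it might have missed).

def new_primary(acting, failed):
    """Return the first OSD in the acting set that is still up."""
    survivors = [osd for osd in acting if osd not in failed]
    if not survivors:
        raise RuntimeError("no surviving OSD in the acting set")
    return survivors[0]

acting = [1, 2, 3]
print(new_primary(acting, failed={1}))     # 2
print(new_primary(acting, failed={1, 2}))  # 3
```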


> There must be an algorithm to appoint a new primary. So in a 2 OSD
> failure scenario, a new primary should be appointed after the first
> failure, no? Would the final remaining OSD not appoint itself as
> primary after the 2nd failure?
>
>
Assuming min_size=2 and size=3 - if 2 OSDs fail at the same instant,
you have no guarantee that the survivor has all writes.

Assuming min_size=3 and size=3 - then yes, you're good: the PG is severely
degraded, but the surviving OSD can safely be promoted.

If you genuinely worry about concurrent failures of 2 machines, run with
min_size=3; the price you pay is slightly increased mean/median write
latency.
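The reasoning above can be written out as a one-line check (an illustration of the argument, not anything from Ceph itself): an acked write lives on at least min_size OSDs, so it can be missing from every survivor only if all of those holders are among the failed OSDs.

```python
# Illustration of the argument, not Ceph code: an acked write is on at
# least min_size OSDs, so every copy can be lost only if the number of
# concurrent failures reaches min_size.

def acked_write_may_be_lost(size, min_size, failures):
    """True if all OSDs holding an acked write could be among the failed."""
    assert min_size <= size and failures <= size
    return failures >= min_size

print(acked_write_may_be_lost(3, 2, 2))  # True  - survivor may miss writes
print(acked_write_may_be_lost(3, 3, 2))  # False - survivor has every acked write
```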

> This makes sense in the context of CEPH's synchronous writes too. A
> write isn't complete until all 3 OSDs in the PG have the data,
> correct? So shouldn't any one of them be able to act as primary at any
> time?


See distinction between size and min_size.


> I don't see how that would change even if 2 of 3 ODS fail at exactly
> the same time.
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Kjetil Joergensen <[email protected]>
SRE, Medallia Inc
Phone: +1 (650) 739-6580
