For the most part I'm assuming min_size=2, size=3. With min_size=3 and size=3 this changes.
size is how many replicas of an object to maintain; min_size is how many writes need to succeed before the primary can ack the operation to the client. A larger min_size most likely means higher latency for writes.

On Wed, Mar 22, 2017 at 8:05 AM, Adam Carheden <[email protected]> wrote:
> On Tue, Mar 21, 2017 at 1:54 PM, Kjetil Jørgensen <[email protected]>
> wrote:
>
> >> c. Reads can continue from the single online OSD even in PGs that
> >> happened to have two of 3 OSDs offline.
> >>
> >
> > Hypothetically (this is partially informed guessing on my part):
> > If the survivor happens to be the acting primary and it was up-to-date
> > at the time, it can in theory serve reads. (Only the primary serves
> > reads.)
>
> It makes no sense that only the primary could serve reads. That would
> mean that even if only a single OSD failed, all PGs for which that OSD
> was primary would be unreadable.

Acting [1, 2, 3] - primary is 1, and only 1 serves reads. If 1 fails, 2 is now the new primary. It'll probably check with 3 to determine whether there were any writes it is unaware of - and peer if there were. Promotion should be near instantaneous (well, you'd in all likelihood be able to measure it).

> There must be an algorithm to appoint a new primary. So in a 2 OSD
> failure scenario, a new primary should be appointed after the first
> failure, no? Would the final remaining OSD not appoint itself as
> primary after the 2nd failure?

Assuming min_size=2 and size=3 - if 2 OSDs fail at the same instant, you have no guarantee that the survivor has all writes. Assuming min_size=3 and size=3 - then yes, you're good: the surviving OSD can safely be promoted. You're severely degraded, but it can safely be promoted. If you genuinely worry about concurrent failures of 2 machines, run with min_size=3; the price you pay is slightly increased mean/median latency for writes. This makes sense in the context of Ceph's synchronous writes too.
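To make the distinction concrete, here's a toy sketch (not Ceph code, just an illustration of the rule described above): the primary acks a write once min_size replica writes have succeeded, which is why min_size=2 with size=3 can leave a lone survivor missing acked writes, while min_size=3 cannot.

```python
# Toy model of the size/min_size ack rule described above. Not Ceph's
# actual implementation; function name and shape are illustrative only.

def ack_write(successful_replica_writes: int, min_size: int) -> bool:
    """Primary acks the client once min_size replica writes succeeded."""
    return successful_replica_writes >= min_size

# size=3, min_size=2: two replica writes are enough to ack the client...
assert ack_write(2, min_size=2) is True
# ...so if those two replicas then fail, the single survivor may be
# missing writes the client already saw acknowledged.

# size=3, min_size=3: every acked write is on all three replicas, so any
# single survivor can safely be promoted to primary.
assert ack_write(3, min_size=3) is True
assert ack_write(2, min_size=3) is False
```

The trade-off is exactly the one stated above: with min_size=3 every write waits for the slowest of three replicas, so mean/median write latency goes up.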
> A write isn't complete until all 3 OSDs in the PG have the data,
> correct? So shouldn't any one of them be able to act as primary at any
> time?

See distinction between size and min_size.

> I don't see how that would change even if 2 of 3 OSDs fail at exactly
> the same time.

--
Kjetil Joergensen <[email protected]>
SRE, Medallia Inc
Phone: +1 (650) 739-6580
_______________________________________________ ceph-users mailing list [email protected] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
