The status will be a WARN in this case.
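
For anyone who wants to check that programmatically rather than eyeballing
'ceph -s', here is a minimal sketch that just prints the overall status
string. It assumes the 'ceph' CLI is on PATH; the JSON key that carries the
status has changed name between releases, so both candidates are tried.

#!/usr/bin/env python
# Minimal sketch: print the cluster's overall health (HEALTH_OK/WARN/ERR).
# Assumes 'ceph status -f json' carries a "health" object; the status key is
# "overall_status" on older releases and "status" on newer ones.
import json
import subprocess

out = subprocess.check_output(["ceph", "status", "-f", "json"]).decode("utf-8")
health = json.loads(out).get("health", {})
print(health.get("status") or health.get("overall_status") or "UNKNOWN")

Anything other than HEALTH_OK is worth a 'ceph health detail' to see which
check is firing.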

On Fri, Apr 14, 2017 at 12:01 PM Adam Carheden <carhe...@ucar.edu> wrote:

> Thanks for your replies.
>
> I think the short version is "guaranteed": Ceph will always either store
> 'size' copies of your data or set health to a WARN and/or ERR state to
> let you know that it can't. I think that's probably the most desirable
> answer.
>
> --
> Adam Carheden
>
> On 04/14/2017 09:51 AM, David Turner wrote:
> > If you have replica size 3, your failure domain is host, and you have
> > 3 servers, you will NEVER have 2 copies of the data on 1 server. If
> > you weight your OSDs poorly on one of your servers, then one of the
> > drives will fill up to the full ratio in its config and stop receiving
> > writes. You should always monitor your OSDs so that you can fix the
> > weights before an OSD becomes nearfull, and definitely so that no OSD
> > ever reaches the full setting and stops receiving writes. Note that
> > when an OSD stops receiving writes, it will block those write requests
> > until it has space to fulfill them, and the cluster will be stuck.
> >
> > Also, to truly answer your question: if you have replica size 3, your
> > failure domain is host, and you only have 2 servers in your cluster,
> > you will only be storing 2 copies of the data and every single PG in
> > your cluster will be degraded. Ceph will never breach the boundary of
> > your failure domain.
> >
> > When dealing with 3-node clusters you want to be careful never to fill
> > your cluster so far that it can no longer absorb the loss of a drive
> > in one of your nodes. For example, if you have 3 nodes with 3x 4TB
> > drives each and you lose a drive, the other 2 OSDs in that node need
> > to be able to take the data from the dead drive without going over 80%
> > (the default nearfull setting). So in this scenario you shouldn't fill
> > the cluster to more than 53% unless you're planning to tell the
> > cluster not to backfill until the dead OSD is replaced.
> >
> > I will never recommend that anyone go into production with fewer
> > failure domains than their replica size plus 2. So if you have the
> > default replica size of 3, you should go into production with at least
> > 5 servers. This gives you enough failure domains to handle drive
> > failures without the situation becoming critical.
> >
> > On Fri, Apr 14, 2017 at 11:25 AM Adam Carheden <carhe...@ucar.edu> wrote:
> >
> > > Is redundancy across failure domains guaranteed or best effort?
> > >
> > > Note: The best answer to the questions below is obviously to avoid
> > > the situation by properly weighting drives and not approaching the
> > > full ratio. I'm just curious how Ceph works.
> > >
> > > Hypothetical situation:
> > > Say you have 1 pool of size=3 and 3 servers, each with 2 OSDs. Say
> > > you weighted the OSDs poorly such that the OSDs on one server filled
> > > up but the OSDs on the others still had space. Ceph could still store
> > > 3 replicas of your data, but two of them would be on the same server.
> > > What happens?
> > >
> > > (select all that apply)
> > > a. [ ] Clients can still read data
> > > b. [ ] Clients can still write data
> > > c. [ ] health = HEALTH_WARN
> > > d. [ ] health = HEALTH_OK
> > > e. [ ] PGs are degraded
> > > f. [ ] ceph stores only two copies of data
> > > g. [ ] ceph stores 3 copies of data, two of which are on the same server
> > > h. [ ] something else?
> > >
> > > If the answer is "best effort" (a+b+d+g), how would you detect if
> > > that scenario is occurring?
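
One way to check for exactly that is to compare each PG's acting set
against the CRUSH host each OSD sits under. A minimal sketch, assuming the
'ceph' CLI is on PATH and the JSON field names noted in the comments (the
exact layout varies a bit between Ceph releases, so treat it as
illustrative rather than canonical):

#!/usr/bin/env python
# Sketch: flag any PG whose acting set places two replicas under the same
# host. Assumes OSDs are direct children of host buckets in the CRUSH tree
# and that 'ceph osd tree -f json' / 'ceph pg dump -f json' expose the
# fields used below ("nodes"/"type"/"children"/"name", "pg_stats"/"pgid"/
# "acting").
import json
import subprocess


def ceph_json(*args):
    out = subprocess.check_output(("ceph",) + args + ("-f", "json"))
    return json.loads(out.decode("utf-8"))


# Map each OSD id to the host bucket it sits under in the CRUSH tree.
osd_to_host = {}
for node in ceph_json("osd", "tree")["nodes"]:
    if node.get("type") == "host":
        for osd_id in node.get("children", []):
            osd_to_host[osd_id] = node["name"]

# Newer releases nest pg_stats under "pg_map"; older ones keep it top level.
dump = ceph_json("pg", "dump")
pg_stats = dump.get("pg_stats") or dump.get("pg_map", {}).get("pg_stats", [])

for pg in pg_stats:
    acting = pg["acting"]
    hosts = [osd_to_host.get(osd) for osd in acting]
    if len(set(hosts)) < len(acting):
        print("PG %s has co-located replicas: OSDs %s on hosts %s"
              % (pg["pgid"], acting, hosts))

If CRUSH really never breaches the failure domain (the "guaranteed"
answer), a rule that does 'chooseleaf ... type host' should never make this
print anything; if it does, the CRUSH map or rule isn't separating replicas
by host the way you expect.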

> > > If the answer is "guaranteed" (f+e+c+...) and you lose a drive while
> > > in that scenario, is there any way to tell Ceph to temporarily store
> > > 2 copies on a single server just in case? I suspect the answer is to
> > > remove the host bucket from the crushmap, but that's a really bad
> > > idea because it would trigger a rebuild and the extra disk activity
> > > increases the likelihood of additional drive failures, correct?
> > >
> > > --
> > > Adam Carheden
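
The 53% figure quoted above is just arithmetic: with 3 OSDs per host and a
host-level failure domain, a dead OSD's data can only be backfilled onto
the two surviving OSDs in the same host, so the steady-state fill level has
to leave them room to absorb it without crossing nearfull. A small sketch
with the thread's numbers (3x 4TB OSDs per host, nearfull at 0.80), purely
illustrative:

#!/usr/bin/env python
# Backfill-headroom arithmetic for the 3-node example discussed above.
osds_per_host = 3
osd_size_tb = 4.0
nearfull_ratio = 0.80

# If every OSD is filled to fraction f and one OSD dies, its data is spread
# over the (osds_per_host - 1) survivors in the same host, pushing each of
# them to f * osds_per_host / (osds_per_host - 1). Keep that below nearfull:
max_fill = nearfull_ratio * (osds_per_host - 1) / float(osds_per_host)

print("Maximum safe fill level: %.0f%%" % (max_fill * 100))           # ~53%
print("Survivor fill after losing one OSD at that level: %.0f%%"
      % (max_fill * osds_per_host / (osds_per_host - 1.0) * 100))     # 80%

# With size=3 across 3 hosts, each host holds one full copy of the data, so
# the usable data at that fill level is one host's worth:
print("Usable data at that fill level: %.1f TB"
      % (max_fill * osds_per_host * osd_size_tb))                     # 6.4 TB

If the cluster is already fuller than that, the only way to keep the
survivors out of nearfull/full is the option mentioned above: telling the
cluster not to backfill until the dead OSD is replaced.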

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com