The status will be a WARN in this case.
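
For anyone who wants to check that programmatically rather than eyeballing
'ceph -s', here is a minimal sketch that just prints the overall status
string. It assumes the 'ceph' CLI is on PATH; the JSON key that carries the
status has changed name between releases, so both candidates are tried.

#!/usr/bin/env python
# Minimal sketch: print the cluster's overall health (HEALTH_OK/WARN/ERR).
# Assumes 'ceph status -f json' carries a "health" object; the status key is
# "overall_status" on older releases and "status" on newer ones.
import json
import subprocess

out = subprocess.check_output(["ceph", "status", "-f", "json"]).decode("utf-8")
health = json.loads(out).get("health", {})
print(health.get("status") or health.get("overall_status") or "UNKNOWN")

Anything other than HEALTH_OK is worth a 'ceph health detail' to see which
check is firing.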

On Fri, Apr 14, 2017 at 12:01 PM Adam Carheden <carhe...@ucar.edu> wrote:

> Thanks for your replies.
>
> I think the short version is "guaranteed": Ceph will always either store
> 'size' copies of your data or set health to a WARN and/or ERR state to
> let you know that it can't. I think that's probably the most desirable
> answer.
>
> --
> Adam Carheden
>
> On 04/14/2017 09:51 AM, David Turner wrote:
> > If you have replica size 3, your failure domain is host, and you have
> > 3 servers, you will NEVER have 2 copies of the data on 1 server. If
> > you weight your OSDs poorly on one of your servers, then one of the
> > drives will fill up to the full ratio in its config and stop receiving
> > writes. You should always monitor your OSDs so that you can fix the
> > weights before an OSD becomes nearfull, and definitely so that no OSD
> > ever reaches the full setting and stops receiving writes. Note that
> > when an OSD stops receiving writes, it will block those write requests
> > until it has space to fulfill them, and the cluster will be stuck.
> >
> > Also, to truly answer your question: if you have replica size 3, your
> > failure domain is host, and you only have 2 servers in your cluster,
> > you will only be storing 2 copies of the data and every single PG in
> > your cluster will be degraded. Ceph will never breach the boundary of
> > your failure domain.
> >
> > When dealing with 3-node clusters you want to be careful never to fill
> > your cluster so far that it can no longer absorb the loss of a drive
> > in one of your nodes. For example, if you have 3 nodes with 3x 4TB
> > drives each and you lose a drive, the other 2 OSDs in that node need
> > to be able to take the data from the dead drive without going over 80%
> > (the default nearfull setting). So in this scenario you shouldn't fill
> > the cluster to more than 53% unless you're planning to tell the
> > cluster not to backfill until the dead OSD is replaced.
> >
> > I will never recommend that anyone go into production with fewer
> > failure domains than their replica size plus 2. So if you have the
> > default replica size of 3, you should go into production with at least
> > 5 servers. This gives you enough failure domains to handle drive
> > failures without the situation becoming critical.
> >
> > On Fri, Apr 14, 2017 at 11:25 AM Adam Carheden <carhe...@ucar.edu> wrote:
> >
> > > Is redundancy across failure domains guaranteed or best effort?
> > >
> > > Note: The best answer to the questions below is obviously to avoid
> > > the situation by properly weighting drives and not approaching the
> > > full ratio. I'm just curious how Ceph works.
> > >
> > > Hypothetical situation:
> > > Say you have 1 pool of size=3 and 3 servers, each with 2 OSDs. Say
> > > you weighted the OSDs poorly such that the OSDs on one server filled
> > > up but the OSDs on the others still had space. Ceph could still store
> > > 3 replicas of your data, but two of them would be on the same server.
> > > What happens?
> > >
> > > (select all that apply)
> > > a. [ ] Clients can still read data
> > > b. [ ] Clients can still write data
> > > c. [ ] health = HEALTH_WARN
> > > d. [ ] health = HEALTH_OK
> > > e. [ ] PGs are degraded
> > > f. [ ] ceph stores only two copies of data
> > > g. [ ] ceph stores 3 copies of data, two of which are on the same server
> > > h. [ ] something else?
> > >
> > > If the answer is "best effort" (a+b+d+g), how would you detect if
> > > that scenario is occurring?
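
One way to check for exactly that is to compare each PG's acting set
against the CRUSH host each OSD sits under. A minimal sketch, assuming the
'ceph' CLI is on PATH and the JSON field names noted in the comments (the
exact layout varies a bit between Ceph releases, so treat it as
illustrative rather than canonical):

#!/usr/bin/env python
# Sketch: flag any PG whose acting set places two replicas under the same
# host. Assumes OSDs are direct children of host buckets in the CRUSH tree
# and that 'ceph osd tree -f json' / 'ceph pg dump -f json' expose the
# fields used below ("nodes"/"type"/"children"/"name", "pg_stats"/"pgid"/
# "acting").
import json
import subprocess


def ceph_json(*args):
    out = subprocess.check_output(("ceph",) + args + ("-f", "json"))
    return json.loads(out.decode("utf-8"))


# Map each OSD id to the host bucket it sits under in the CRUSH tree.
osd_to_host = {}
for node in ceph_json("osd", "tree")["nodes"]:
    if node.get("type") == "host":
        for osd_id in node.get("children", []):
            osd_to_host[osd_id] = node["name"]

# Newer releases nest pg_stats under "pg_map"; older ones keep it top level.
dump = ceph_json("pg", "dump")
pg_stats = dump.get("pg_stats") or dump.get("pg_map", {}).get("pg_stats", [])

for pg in pg_stats:
    acting = pg["acting"]
    hosts = [osd_to_host.get(osd) for osd in acting]
    if len(set(hosts)) < len(acting):
        print("PG %s has co-located replicas: OSDs %s on hosts %s"
              % (pg["pgid"], acting, hosts))

If CRUSH really never breaches the failure domain (the "guaranteed"
answer), a rule that does 'chooseleaf ... type host' should never make this
print anything; if it does, the CRUSH map or rule isn't separating replicas
by host the way you expect.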

> > > If the answer is "guaranteed" (f+e+c+...) and you lose a drive while
> > > in that scenario, is there any way to tell Ceph to temporarily store
> > > 2 copies on a single server just in case? I suspect the answer is to
> > > remove the host bucket from the crushmap, but that's a really bad
> > > idea because it would trigger a rebuild and the extra disk activity
> > > increases the likelihood of additional drive failures, correct?
> > >
> > > --
> > > Adam Carheden
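
The 53% figure quoted above is just arithmetic: with 3 OSDs per host and a
host-level failure domain, a dead OSD's data can only be backfilled onto
the two surviving OSDs in the same host, so the steady-state fill level has
to leave them room to absorb it without crossing nearfull. A small sketch
with the thread's numbers (3x 4TB OSDs per host, nearfull at 0.80), purely
illustrative:

#!/usr/bin/env python
# Backfill-headroom arithmetic for the 3-node example discussed above.
osds_per_host = 3
osd_size_tb = 4.0
nearfull_ratio = 0.80

# If every OSD is filled to fraction f and one OSD dies, its data is spread
# over the (osds_per_host - 1) survivors in the same host, pushing each of
# them to f * osds_per_host / (osds_per_host - 1). Keep that below nearfull:
max_fill = nearfull_ratio * (osds_per_host - 1) / float(osds_per_host)

print("Maximum safe fill level: %.0f%%" % (max_fill * 100))           # ~53%
print("Survivor fill after losing one OSD at that level: %.0f%%"
      % (max_fill * osds_per_host / (osds_per_host - 1.0) * 100))     # 80%

# With size=3 across 3 hosts, each host holds one full copy of the data, so
# the usable data at that fill level is one host's worth:
print("Usable data at that fill level: %.1f TB"
      % (max_fill * osds_per_host * osd_size_tb))                     # 6.4 TB

If the cluster is already fuller than that, the only way to keep the
survivors out of nearfull/full is the option mentioned above: telling the
cluster not to backfill until the dead OSD is replaced.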

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com