Re: [ceph-users] RGW lifecycle bucket stuck processing?

2017-04-14 Thread Ben Hines
Interesting - the state went back to 'UNINITIAL' eventually, possibly because the first run never finished. Will see if it ever completes during a nightly run. -Ben On Thu, Apr 13, 2017 at 11:10 AM, Ben Hines wrote: > I initiated a manual lifecycle cleanup with: > > radosgw-admin lc process > >
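A minimal sketch of how the lifecycle state being discussed can be inspected and a run triggered by hand, assuming a Kraken-or-later radosgw-admin where the lc subcommands exist:

    # list buckets with a lifecycle configuration and their current state
    # (the status typically cycles UNINITIAL -> PROCESSING -> COMPLETE)
    radosgw-admin lc list

    # kick off lifecycle processing manually instead of waiting for the nightly run
    radosgw-admin lc process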

Re: [ceph-users] Is redundancy across failure domains guaranteed or best effort?

2017-04-14 Thread David Turner
The status will be a WARN in this case. On Fri, Apr 14, 2017 at 12:01 PM Adam Carheden wrote: > Thanks for your replies. > > I think the short version is "guaranteed": CEPH will always either store > 'size' copies of your data or set health to a WARN and/or ERR state to > let you know that it can'

Re: [ceph-users] Is redundancy across failure domains guaranteed or best effort?

2017-04-14 Thread Adam Carheden
Thanks for your replies. I think the short version is "guaranteed": CEPH will always either store 'size' copies of your data or set health to a WARN and/or ERR state to let you know that it can't. I think that's probably the most desirable answer. -- Adam Carheden On 04/14/2017 09:51 AM, David Tu

Re: [ceph-users] PG calculator improvement

2017-04-14 Thread Frédéric Nass
Hi Michael, David, Actually, we did start with a lot of work (and then a lot of work :-)) and with the help of an RHCS consultant (an Inktank pioneer :-)) during a 5-day on-site Jumpstart. With his invaluable help, we deployed our production cluster, set the right options in ceph.conf, the rig

Re: [ceph-users] Is redundancy across failure domains guaranteed or best effort?

2017-04-14 Thread David Turner
If you have Replica size 3, your failure domain is host, and you have 3 servers... you will NEVER have 2 copies of the data on 1 server. If you weight your OSDs poorly on one of your servers, then one of the drives will fill up to the full ratio in its config and stop receiving writes. You should
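A short sketch of the checks this implies, assuming a recent ceph CLI; osd.7 and the weight value are placeholders:

    # show per-OSD utilization and CRUSH weight, grouped by host
    ceph osd df tree

    # adjust the CRUSH weight of an over-weighted OSD (example values)
    ceph osd crush reweight osd.7 1.6

    # the thresholds mentioned come from the mon settings, e.g.
    # mon_osd_full_ratio (default 0.95) and mon_osd_nearfull_ratio (default 0.85)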

Re: [ceph-users] Degraded: OSD failure vs crushmap change

2017-04-14 Thread David Turner
The PG just counts as degraded in both scenarios, but if you look at the objects in the degraded PGs (visible in ceph status) some of them are degraded objects and others are misplaced objects. Degraded objects have less than your replica size of copies, like what happens when you lose an OSD. Wh
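A rough way to see the two object classes side by side, assuming a Jewel-or-later ceph CLI:

    # the summary distinguishes the two cases, reporting counts of
    # "objects degraded" and "objects misplaced" separately
    ceph status

    # list the affected PGs and why they are flagged
    ceph health detail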

[ceph-users] Is redundancy across failure domains guaranteed or best effort?

2017-04-14 Thread Adam Carheden
Is redundancy across failure domains guaranteed or best effort? Note: The best answer to the questions below is obviously to avoid the situation by properly weighting drives and not approaching the full ratio. I'm just curious how CEPH works. Hypothetical situation: Say you have 1 pool of size=3 and
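A small sketch of how to confirm where the copies actually land, assuming a default replicated CRUSH rule; the PG id is a placeholder:

    # show the CRUSH rules and their failure domain (the "type host" step)
    ceph osd crush rule dump

    # check which OSDs (and therefore hosts) a given PG maps to
    ceph pg map 1.2f

    # undersized/degraded PGs show up here if a copy could not be placed
    ceph health detail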

[ceph-users] Degraded: OSD failure vs crushmap change

2017-04-14 Thread Adam Carheden
Is there a difference between the degraded states triggered by an OSD failure vs a crushmap change? When an OSD fails the cluster is obviously degraded in the sense that you have fewer copies of your data than the pool size mandates. But when you change the crush map, say by adding an OSD, ceph a
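A sketch of watching the effect of a CRUSH change as it happens; the OSD id and host name are placeholders:

    # move a (hypothetical) new OSD into the CRUSH map under a host bucket
    ceph osd crush add osd.12 1.0 host=node3

    # watch the cluster log; the counters drain as backfill relocates the
    # misplaced objects, while fully replicated copies remain elsewhere
    ceph -w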

Re: [ceph-users] slow requests and short OSD failures in small cluster

2017-04-14 Thread mj
Ah right: _during_ the actual removal, you mean. :-) clear now. mj On 04/13/2017 05:50 PM, Lionel Bouton wrote: On 13/04/2017 at 17:47, mj wrote: Hi, On 04/13/2017 04:53 PM, Lionel Bouton wrote: We use rbd snapshots on Firefly (and Hammer now) and I didn't see any measurable impact on per
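A minimal sketch of the snapshot lifecycle under discussion, with a hypothetical pool/image name; the osd_snap_trim_sleep throttle is an assumption about how such a cluster might be tuned, not something stated in the thread:

    # create and later remove an RBD snapshot
    rbd snap create rbd/vm-100-disk-1@before-maintenance
    rbd snap rm rbd/vm-100-disk-1@before-maintenance

    # snapshot deletion queues object trimming on the OSDs; the trim rate
    # can be throttled at runtime if removal causes load, e.g.
    ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.05'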