Re: [ceph-users] PG Recovery: HEALTH_ERR to HEALTH_OK

2014-06-05 Thread Jason Harley
Just wanted to close this open loop: I gave up attempting to recover pool 4 as it was just test data, and the PGs with unfound objects were localized to that pool. After I destroyed and recreated the pool, things were fine. Thank you for your help, Florian. ./JRH On Jun 3, 2014, at 6:30 PM, Jas…
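For anyone landing on this thread from a search, the teardown Jason describes is roughly the following (a minimal sketch, not his exact commands; the pool name 'testpool' and the pg count of 512 are assumptions, and the long flag is the Dumpling-era safety switch for pool deletion):

  # 'testpool' stands in for whatever pool id 4 was actually named
  ceph osd pool delete testpool testpool --yes-i-really-really-mean-it
  ceph osd pool create testpool 512 512
  # the unfound-object errors should clear once the old PGs are gone
  ceph -s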

Re: [ceph-users] PG Recovery: HEALTH_ERR to HEALTH_OK

2014-06-03 Thread Jason Harley
On Jun 3, 2014, at 5:58 PM, Smart Weblications GmbH - Florian Wiessner wrote: > I think it would be less painful if you had removed and then immediately > recreated the corrupted osd to avoid 'holes' in the osd ids. It should > work with your configuration anyhow, though. I agree with…
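The remove-and-recreate dance Florian refers to follows the standard documented sequence, roughly as below (a sketch; that the dead OSD is osd.3 is an assumption for illustration):

  # tear the dead OSD out of the cluster maps
  ceph osd out 3
  ceph osd crush remove osd.3
  ceph auth del osd.3
  ceph osd rm 3
  # 'ceph osd create' hands back the lowest free id, so recreating
  # immediately reuses 3 and avoids holes in the id space
  ceph osd create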

Re: [ceph-users] PG Recovery: HEALTH_ERR to HEALTH_OK

2014-06-03 Thread Smart Weblications GmbH - Florian Wiessner
Hi, On 03.06.2014 23:24, Jason Harley wrote: > On Jun 3, 2014, at 4:17 PM, Smart Weblications GmbH - Florian Wiessner > <f.wiess...@smart-weblications.de> wrote: >> You could try to recreate the osds and start them. Then I think the recovery >> should proceed. If it does not, you co…
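Recreating and starting an OSD on this vintage (Dumpling on Ubuntu 13.10, which used upstart) would look roughly like this; the osd id, keyring path, and caps follow the manual-deployment docs and are assumptions for this particular cluster:

  # initialize a fresh data dir and key for the recreated osd.3
  ceph-osd -i 3 --mkfs --mkkey
  ceph auth add osd.3 osd 'allow *' mon 'allow rwx' \
      -i /var/lib/ceph/osd/ceph-3/keyring
  # upstart job name on Ubuntu 13.10
  start ceph-osd id=3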

Re: [ceph-users] PG Recovery: HEALTH_ERR to HEALTH_OK

2014-06-03 Thread Smart Weblications GmbH - Florian Wiessner
Hi, On 03.06.2014 22:04, Jason Harley wrote:
> # ceph pg 4.ff3 query
>> { "state": "active+recovering",
>>   "epoch": 1642,
>>   "up": [
>>     7,
>>     26],
>>   "acting": [
>>     7,
>>     26],
[...]
>> "recovery_state": [
>>     { "name": "Started\/Primary\/Active", …
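When a PG sits in active+recovering like this, the useful bits are buried deep in the query output. A couple of ways to dig them out (jq is an assumption here, any JSON tool works; list_missing is the Dumpling-era subcommand name):

  # just the recovery-state machine, not the whole dump
  ceph pg 4.ff3 query | jq '.recovery_state'
  # enumerate the objects the PG cannot find
  ceph pg 4.ff3 list_missing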

Re: [ceph-users] PG Recovery: HEALTH_ERR to HEALTH_OK

2014-06-03 Thread Jason Harley
# ceph pg 4.ff3 query
> { "state": "active+recovering",
>   "epoch": 1642,
>   "up": [
>     7,
>     26],
>   "acting": [
>     7,
>     26],
>   "info": { "pgid": "4.ffe",
>       "last_update": "339'96",
>       "last_complete": "339'89",
>       "log_tail": "0'0",
>       "last_…
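If the objects never turn up because the OSDs that might have held them are truly gone, the documented escape hatch for a PG stuck like this is to mark the unfound objects lost. Note this is irreversible, and 'revert' (roll back to the prior version) is the option available on Dumpling; the 'delete' argument arrived in later releases:

  ceph pg 4.ff3 mark_unfound_lost revert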

Re: [ceph-users] PG Recovery: HEALTH_ERR to HEALTH_OK

2014-06-03 Thread Smart Weblications GmbH - Florian Wiessner
Hi, On 03.06.2014 21:46, Jason Harley wrote: > Howdy — > > I’ve had a failure on a small, Dumpling (0.67.4) cluster running on Ubuntu > 13.10 machines. I had three OSD nodes (running 6 OSDs each), and lost two of > them in a beautiful failure. One of these nodes even went so far as to > sc…
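Triage on a loss like that usually starts with pinning down exactly which PGs and OSDs are affected before touching anything (nothing cluster-specific here, just the stock status commands):

  ceph health detail           # names the stuck/unfound PGs
  ceph pg dump_stuck unclean   # also accepts inactive / stale
  ceph osd tree                # which OSDs are down or out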