Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-12 Thread joel.merr...@gmail.com
Thanks Sam, I'll take a look. Seems sensible enough and worth a shot. We'll probably call it a day after this and flatten it, but I'm wondering if it's possible some rbd devices may miss these pg's and could be exportable? Will have a tinker! On Wed, Mar 11, 2015 at 7:06 PM, Samuel Just wrote:

Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-12 Thread joel.merr...@gmail.com
Sure thing, n.b. I increased pg count to see if it would help. Alas not. :) Thanks again! health_detail https://gist.github.com/199bab6d3a9fe30fbcae osd_dump https://gist.github.com/499178c542fa08cc33bb osd_tree https://gist.github.com/02b62b2501cbd684f9b2 Random selected queries: queries/0.19

Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-11 Thread Samuel Just
For each of those pgs, you'll need to identify the pg copy you want to be the winner and either 1) Remove all of the other ones using ceph-objectstore-tool and hopefully the winner you left alone will allow the pg to recover and go active. 2) Export the winner using ceph-objectstore-tool, use c
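A rough sketch of option 1 above, written as a small Python helper that only builds the ceph-objectstore-tool command lines and does not run anything. The data and journal paths assume the default /var/lib/ceph/osd/ceph-<id> layout, the OSD ids and backup file names are hypothetical, and exporting each copy before removing it is an extra precaution added here, not something stated in the message. ceph-objectstore-tool needs the OSD in question to be stopped.

```python
from typing import Iterable, List


def objectstore_tool_cmd(osd_id: int, pgid: str, op: str, file: str = "") -> List[str]:
    """Build one ceph-objectstore-tool invocation as an argv list."""
    # Assumption: default on-disk layout; adjust for your deployment.
    data_path = f"/var/lib/ceph/osd/ceph-{osd_id}"
    cmd = [
        "ceph-objectstore-tool",
        "--data-path", data_path,
        "--journal-path", f"{data_path}/journal",
        "--pgid", pgid,
        "--op", op,
    ]
    if file:
        cmd += ["--file", file]
    return cmd


def keep_winner_plan(pgid: str, winner: int, holders: Iterable[int]) -> List[List[str]]:
    """For every OSD holding a copy except the winner: export, then remove."""
    plan = []
    for osd in sorted(holders):
        if osd == winner:
            continue  # leave the chosen copy untouched
        # Hypothetical backup file name; kept so the removal can be undone.
        backup = f"pg-{pgid}-osd{osd}.export"
        plan.append(objectstore_tool_cmd(osd, pgid, "export", backup))
        plan.append(objectstore_tool_cmd(osd, pgid, "remove"))
    return plan


if __name__ == "__main__":
    # pg 0.19 is one of the queried pgs in this thread; OSD ids 2 and 5 are made up.
    for argv in keep_winner_plan("0.19", winner=2, holders=[2, 5]):
        print(" ".join(argv))
```

Printing the plan first and running the commands by hand, one stopped OSD at a time, seems safer than automating a destructive step.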

Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-11 Thread joel.merr...@gmail.com
I'd like to not have to null them if possible. There's nothing outlandishly valuable; it's more the time to reprovision (users have stuff on there, mainly testing, but I have a nasty feeling some users won't have backed up their test instances). When you say complicated and fragile, could you expand?

Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-11 Thread Samuel Just
Ok, you lost all copies from an interval where the pgs went active. The recovery from this is going to be complicated and fragile. Are the pools valuable? -Sam On 03/11/2015 03:35 AM, joel.merr...@gmail.com wrote: For clarity too, I've tried to drop the min_size before as suggested, doesn't m

Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-11 Thread joel.merr...@gmail.com
For clarity too, I've tried dropping min_size before, as suggested; it doesn't make a difference, unfortunately. On Wed, Mar 11, 2015 at 9:50 AM, joel.merr...@gmail.com wrote: > Sure thing, n.b. I increased pg count to see if it would help. Alas not. :) > > Thanks again! > > health_detail > https://

Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-10 Thread Samuel Just
Yeah, get a ceph pg query on one of the stuck ones. -Sam On Tue, 2015-03-10 at 14:41 +, joel.merr...@gmail.com wrote: > Stuck unclean and stuck inactive. I can fire up a full query and > health dump somewhere useful if you want (full pg query info on ones > listed in health detail, tree, osd d
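For anyone following along, the output of `ceph pg <pgid> query` is JSON, so the OSDs that peering is waiting on can be pulled out with a few lines of Python. This is a minimal sketch: the field names `recovery_state`, `down_osds_we_would_probe` and `peering_blocked_by` are what query output typically contains, but treat them as assumptions and check them against your own dump. The sample document below is invented for illustration and is not from this cluster.

```python
import json
from typing import Set

# Hypothetical, heavily trimmed pg query output (not from the real cluster).
SAMPLE = """
{
  "state": "incomplete",
  "recovery_state": [
    {
      "name": "Started/Primary/Peering",
      "down_osds_we_would_probe": [7, 3],
      "peering_blocked_by": [{"osd": 12, "comment": "hypothetical entry"}]
    },
    {"name": "Started"}
  ]
}
"""


def blocking_osds(pg_query: dict) -> Set[int]:
    """Collect OSD ids that the peering state machine reports waiting on."""
    found: Set[int] = set()
    for state in pg_query.get("recovery_state", []):
        # OSDs that are down but would be probed if they came back.
        found.update(int(osd) for osd in state.get("down_osds_we_would_probe", []))
        # Explicit blockers, each an object carrying an "osd" id.
        for blocker in state.get("peering_blocked_by", []):
            if "osd" in blocker:
                found.add(int(blocker["osd"]))
    return found


if __name__ == "__main__":
    print(sorted(blocking_osds(json.loads(SAMPLE))))
```

On a live cluster the same function could be fed the saved output of the query, for example a file written with `ceph pg 0.19 query > 0.19.json`.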

Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-10 Thread joel.merr...@gmail.com
Stuck unclean and stuck inactive. I can fire up a full query and health dump somewhere useful if you want (full pg query info on ones listed in health detail, tree, osd dump etc). There were blocked_by operations that no longer exist after doing the OSD addition. Side note, spent some time yesterd

Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-10 Thread Samuel Just
What do you mean by "unblocked" but still "stuck"? -Sam On Mon, 2015-03-09 at 22:54 +, joel.merr...@gmail.com wrote: > On Mon, Mar 9, 2015 at 2:28 PM, Samuel Just wrote: > > You'll probably have to recreate osds with the same ids (empty ones), > > let them boot, stop them, and mark them lost.

Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-09 Thread joel.merr...@gmail.com
On Mon, Mar 9, 2015 at 2:28 PM, Samuel Just wrote: > You'll probably have to recreate osds with the same ids (empty ones), > let them boot, stop them, and mark them lost. There is a feature in the > tracker to improve this behavior: http://tracker.ceph.com/issues/10976 > -Sam Thanks Sam, I've re

Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-09 Thread Samuel Just
You'll probably have to recreate osds with the same ids (empty ones), let them boot, stop them, and mark them lost. There is a feature in the tracker to improve this behavior: http://tracker.ceph.com/issues/10976 -Sam On Mon, 2015-03-09 at 12:24 +, joel.merr...@gmail.com wrote: > Hi, > > I'm
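The last step of that procedure can be sketched as below. The helper only builds the `ceph osd lost` command lines for a list of OSD ids and runs nothing; recreating the empty OSDs with the same ids, booting them and stopping them again depends on how the cluster was deployed and is not shown. The ids in the example are hypothetical.

```python
from typing import Iterable, List


def mark_lost_commands(osd_ids: Iterable[int]) -> List[List[str]]:
    """Build one `ceph osd lost` argv per OSD id, in ascending id order."""
    commands = []
    for osd_id in sorted(set(osd_ids)):
        # The confirmation flag is required because marking an OSD lost
        # tells the cluster to stop waiting for any data it held.
        commands.append(["ceph", "osd", "lost", str(osd_id), "--yes-i-really-mean-it"])
    return commands


if __name__ == "__main__":
    # Hypothetical ids of the removed OSDs that the pgs still reference.
    for argv in mark_lost_commands([7, 3]):
        print(" ".join(argv))
```

Each OSD should already be recreated, booted once and stopped before its command is run, as described above.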

[ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-09 Thread joel.merr...@gmail.com
Hi, I'm trying to fix an issue within 0.93 on our internal cloud related to incomplete pg's (yes, I realise the folly of having the dev release - it's a not-so-test env now, so I need to recover this really). I'll detail the current outage info; 72 initial (now 65) OSDs 6 nodes * Update to 0.92
