You'll probably have to recreate OSDs with the same IDs (empty ones),
let them boot, stop them, and mark them lost.  There is a feature in the
tracker to improve this behavior:
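
For the last step of the sequence above (recreate, boot, stop, mark lost), here is a minimal Python sketch that only builds the `ceph osd lost` command lines and runs nothing. The helper name `mark_lost_commands` is made up for illustration, and the id 36 is taken from the osd.36 mentioned in the quoted message. Recreating, booting and stopping the empty OSDs depends on how the cluster is deployed, so that part is deliberately left out.

```python
def mark_lost_commands(osd_ids):
    """Return one `ceph osd lost` command line per unique OSD id, ascending.

    Hypothetical helper: it only formats strings, it never calls ceph.
    """
    for osd_id in osd_ids:
        # Reject anything that is not a plain non-negative integer id.
        if not isinstance(osd_id, int) or isinstance(osd_id, bool) or osd_id < 0:
            raise ValueError(f"not a valid OSD id: {osd_id!r}")
    return [
        f"ceph osd lost {osd_id} --yes-i-really-mean-it"
        for osd_id in sorted(set(osd_ids))
    ]


if __name__ == "__main__":
    # Review the printed lines by hand before running any of them.
    for line in mark_lost_commands([36]):
        print(line)
```

Printing instead of executing keeps a human in the loop, which seems prudent given that marking an OSD lost tells the cluster to give up on whatever data it held.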

On Mon, 2015-03-09 at 12:24 +0000, wrote:
> Hi,
> I'm trying to fix an issue with 0.93 on our internal cloud related
> to incomplete PGs (yes, I realise the folly of having the dev release
> - it's a not-so-test env now, so I really need to recover this). I'll
> detail the current outage info:
> 72 initial (now 65) OSDs
> 6 nodes
> * Update to 0.92 from Giant.
> * Fine for a day
> * MDS outage overnight and subsequent node failure
> * Massive increase in RAM utilisation (10G per OSD!)
> * More failure
> * OSDs marked 'out' to try to alleviate new large cluster requirements,
> and a couple died under additional load
> * 'superfluous and faulty' OSDs removed, auth keys deleted
> * RAM added to nodes (96GB each - serving 10-12 OSDs)
> * Upgrade to 0.93
> * Fix broken journals due to 0.92 update
> * No more missing objects or degradation
> So, that brings me to today, I still have 73/2264 PGs listed as stuck
> incomplete/inactive. I also have requests that are blocked.
> Upon querying said placement groups, I notice that they are
> 'blocked_by' non-existent OSDs (ones I have removed due to issues).
> I have no way to tell them the OSD is lost (as it has already been
> removed, both from osdmap and crushmap).
> Exporting the crushmap shows non-existent OSDs as deviceN (i.e.
> device36 for the removed osd.36)
> Deleting those and reimporting the crush map has no effect
> Some further pg detail -
> So I'm stuck, I can't recover the PGs as I can't remove a
> non-existent OSD that the PG thinks is blocking it.
> Help graciously accepted!
> Joel
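
To work out which OSD ids would need recreating, one option is to scan the JSON from the PG queries for 'blocked_by' entries and compare them with the ids that still exist. The sketch below is only an illustration: the function names are invented here, the sample document is hand-written rather than real ceph output, and the only assumption about the real output is that the ids appear as a list of integers under a key named `blocked_by` somewhere in the structure.

```python
import json


def collect_blocked_by(node):
    """Recursively gather every integer listed under a 'blocked_by' key."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "blocked_by" and isinstance(value, list):
                found.update(v for v in value if isinstance(v, int))
            else:
                found |= collect_blocked_by(value)
    elif isinstance(node, list):
        for item in node:
            found |= collect_blocked_by(item)
    return found


def missing_blockers(pg_query, existing_osd_ids):
    """Return blocking OSD ids that are absent from the set of existing ids."""
    return sorted(collect_blocked_by(pg_query) - set(existing_osd_ids))


# Hypothetical, hand-written sample; the nesting is NOT taken from real output.
sample = json.loads(
    '{"state": "incomplete", "example_section": {"blocked_by": [36, 12]}}'
)
print(missing_blockers(sample, existing_osd_ids={12, 13}))  # prints [36]
```

The recursive walk avoids depending on where exactly the key sits in the query output, which may differ between releases.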
