I know this thread has been silent for a while, but for various reasons I 
have been forced to work specifically on this issue this weekend.

As it turns out, you were partly right: the fix for this state is to use 
ceph-objectstore-tool. However, the fix was not to remove the PG in question 
but to inject the missing OSD map epoch. Once the OSD has the required epoch, 
it can start successfully and resume its download of OSD maps through the 
normal mechanism.

As an example, osd id 123 on storage1 with missing epoch 9876:

On a monitor:
  ceph osd getmap 9876 > e9876

Copy the file e9876 from the monitor to storage1 using scp (or another 
mechanism).

Then forcibly inject the epoch into the stopped OSD (our system is 
configured with the cluster name txc1, so your cluster name and paths may 
differ):

  sudo ceph-objectstore-tool --cluster=txc1 \
    --data-path /var/lib/ceph/osd/txc1-123 \
    --journal-path /var/lib/ceph/osd/txc1-123/journal \
    --op set-osdmap --file /path/to/e9876 --epoch 9876 --force
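
For anyone who wants to script this, here is a rough sketch of the same three 
steps as a dry-run helper. It only prints the commands and runs nothing. The 
function name, its arguments and the /tmp destination are placeholders of my 
own, not anything provided by ceph, and the flags are simply the ones used 
above, so check them against your own version before running the output.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands needed to inject one missing osdmap epoch.
# print_inject_steps and its arguments are illustrative placeholders.

print_inject_steps() {
    cluster=$1            # cluster name, e.g. txc1
    osd_id=$2             # id of the stopped OSD, e.g. 123
    epoch=$3              # missing osdmap epoch, e.g. 9876
    host=$4               # storage host carrying the OSD, e.g. storage1
    dest_dir=${5:-/tmp}   # assumed destination directory on the storage host

    map="e${epoch}"
    data="/var/lib/ceph/osd/${cluster}-${osd_id}"

    # 1. On a monitor: export the osdmap for the missing epoch.
    echo "ceph osd getmap ${epoch} > ${map}"
    # 2. Copy the map file to the storage host.
    echo "scp ${map} ${host}:${dest_dir}/${map}"
    # 3. On the storage host, with the OSD stopped: inject the epoch.
    echo "sudo ceph-objectstore-tool --cluster=${cluster} --data-path ${data} --journal-path ${data}/journal --op set-osdmap --file ${dest_dir}/${map} --epoch ${epoch} --force"
}

print_inject_steps txc1 123 9876 storage1
```

Printing rather than executing is deliberate: the first command belongs on a 
monitor and the last on the storage host, so the output is meant to be read 
and pasted where it applies.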

I wanted to share this nugget of information for posterity, as I cannot be the 
only person who has run across this, and there appears to be limited 
documentation on it (what documentation of ceph-objectstore-tool does exist is 
slightly inconsistent with the realities of its use). Thanks also to Wido for 
the poke in the right direction elsewhere; he filled in the missing bits.

Regards,

Stuart 


 − Stuart Harland: 
Infrastructure Engineer
Email: s.harl...@livelinktechnology.net
Tel: +44 (0) 207 183 1411



LiveLink Technology Ltd
McCormack House
56A East Street
Havant
PO9 1BS



> On 26 May 2017, at 22:53, Gregory Farnum <gfar...@redhat.com> wrote:
> 
> Yeah, not sure. It might just be that the restarting is newly exposing old 
> issues, but I don't see how. I gather from skimming that ticket that it was a 
> disk state bug earlier on that was going undetected until Jewel, which is why 
> I was wondering about the upgrades.
> 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
