[ceph-users] Re: Help: corrupt pg

2020-03-27 Thread Jake Grimmett
Hi Greg, Yes, this was caused by a chain of events. As a cautionary tale, the main ones were: 1) a minor Nautilus point-release upgrade, followed by a rolling node restart script that mistakenly relied on "ceph -s" for cluster health info, i.e. it didn't wait for the cluster to return to health before …
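[Not part of the original message, but for readers following the cautionary tale: the missing step was a health-wait between node restarts. A minimal sketch of such a check follows, assuming a host with a working `ceph` CLI and admin keyring; it is illustrative, not the restart script discussed in the thread.]

```python
#!/usr/bin/env python3
"""Illustrative health-wait helper for a rolling restart.

Assumes a working `ceph` CLI and keyring on this host. This is a
sketch of the check the thread says was missing, not the actual
restart script from the list.
"""
import json
import subprocess
import time


def cluster_is_healthy() -> bool:
    # `ceph health --format json` returns a JSON object whose
    # "status" field is HEALTH_OK / HEALTH_WARN / HEALTH_ERR.
    out = subprocess.check_output(["ceph", "health", "--format", "json"])
    return json.loads(out).get("status") == "HEALTH_OK"


def wait_for_health_ok(poll_seconds: int = 30, timeout_seconds: int = 3600) -> None:
    """Block until the cluster reports HEALTH_OK, or raise on timeout."""
    deadline = time.time() + timeout_seconds
    while not cluster_is_healthy():
        if time.time() > deadline:
            raise TimeoutError("cluster did not return to HEALTH_OK in time")
        time.sleep(poll_seconds)


if __name__ == "__main__":
    # Call this between node restarts, rather than only glancing at `ceph -s`.
    wait_for_health_ok()
    print("HEALTH_OK - safe to restart the next node")
```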

[ceph-users] Re: Help: corrupt pg

2020-03-26 Thread Gregory Farnum
On Wed, Mar 25, 2020 at 5:19 AM Jake Grimmett wrote:
> Dear All,
> We are "in a bit of a pickle"...
> No reply to my message (23/03/2020), subject "OSD: FAILED ceph_assert(clone_size.count(clone))"
> So I'm presuming it's not possible to recover the crashed OSD
From your later email i…

[ceph-users] Re: Help: corrupt pg

2020-03-25 Thread Jake Grimmett
Hi Eugen, Many thanks for your reply. The other two OSDs are up and running, and are being used by other pgs with no problem; for some reason this pg refuses to use these OSDs. The other two OSDs that are missing from this pg crashed at different times last month; each OSD crashed when we trie…
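[Not part of the original message, but as context for "this pg refuses to use these OSDs": one way to see which OSDs a PG is mapped to versus which are actually serving it is `ceph pg <pgid> query`. A minimal sketch follows, assuming a working `ceph` CLI; the PG id shown is a placeholder, not the PG from this thread.]

```python
#!/usr/bin/env python3
"""Sketch: inspect a PG's up/acting OSD sets.

Assumes a working `ceph` CLI; the PG id below is hypothetical.
"""
import json
import subprocess


def pg_query(pgid: str) -> dict:
    # `ceph pg <pgid> query` prints JSON describing the PG,
    # including its current state and the up/acting OSD sets.
    out = subprocess.check_output(["ceph", "pg", pgid, "query"])
    return json.loads(out)


if __name__ == "__main__":
    info = pg_query("2.1ab")              # hypothetical PG id
    print("state: ", info.get("state"))
    print("up:    ", info.get("up"))      # OSDs CRUSH wants for this PG
    print("acting:", info.get("acting"))  # OSDs actually serving the PG
```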

[ceph-users] Re: Help: corrupt pg

2020-03-25 Thread Eugen Block
Hi, is there any chance to recover the other failing OSDs that seem to have one chunk of this PG? Do the other OSDs fail with the same error? Zitat von Jake Grimmett : Dear All, We are "in a bit of a pickle"... No reply to my message (23/03/2020),  subject  "OSD: FAILED ceph_assert(clo