[ceph-users] Re: [Ceph incident] PG stuck in peering.

Frank Schilder Thu, 26 Sep 2024 02:35:32 -0700

Hi Loan,

thanks for the detailed post-mortem to the list!


I misread your first message, unfortunately. On our cluster we also had issues 
with 1-2 PGs stuck in peering, resulting in blocked IO and warnings piling up. 
We identified the "bad" OSD by shutting down one member OSD of the PG at a time 
and setting it out, so that it was in state down+out. As soon as the bad OSD was 
down+out, the PG recovered and became active. In our case the disks were bad 
and we replaced them.
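A minimal sketch of that one-OSD-at-a-time procedure. It only prints a command plan for a human to review and run step by step; it executes nothing. Assumptions: a package-based deployment where each OSD runs as the systemd unit ceph-osd@<id> (cephadm clusters manage daemons differently), and the PG id and member OSD list passed in are hypothetical placeholders. Take the real ones from `ceph pg map <pgid>`. Setting an OSD out triggers data movement, so let peering settle before judging each step.

```python
from typing import List


def isolation_plan(pgid: str, member_osds: List[int]) -> List[str]:
    """Build an ordered command plan that takes one member OSD of a stuck PG
    down+out at a time, checks the PG, and restores the OSD if it was not the culprit.

    Assumes systemd units named ceph-osd@<id> (package-based install, not cephadm).
    """
    plan: List[str] = []
    for osd in member_osds:
        plan += [
            f"# --- test osd.{osd} ---",
            f"systemctl stop ceph-osd@{osd}",  # OSD goes down
            f"ceph osd out {osd}",             # now down+out: CRUSH remaps the PG away from it
            f"ceph pg {pgid} query",           # after peering settles: is the PG active now?
            f"# if the PG is active, osd.{osd} is the suspect: keep it down+out and stop here",
            f"ceph osd in {osd}",              # otherwise undo and move on to the next member
            f"systemctl start ceph-osd@{osd}",
        ]
    return plan


if __name__ == "__main__":
    # Hypothetical PG id and member OSDs, for illustration only.
    for line in isolation_plan("2.1f", [12, 37, 58]):
        print(line)
```

Generating a plan instead of calling the CLI directly keeps a human in the loop, which matters because each "out" moves data and the PG state has to be judged by eye between steps.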

I thought you had done that, but on re-reading I see you only restarted OSDs, 
which does not force a remapping: a restarted OSD stays "in", so CRUSH keeps 
mapping the PG to the same OSDs. Sorry for the confusion, and hopefully our 
experience reports here help other users.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
