[ceph-users] Re: Weird behavior for 2 OSDs in our cluster

Eugen Block via ceph-users Sun, 31 May 2026 02:12:44 -0700

Hi,

the MGR doesn't always report the correct PG status, so don't rely on that too much. Sometimes it's necessary to restart the primary OSDs for stuck PGs, although a repeer could have been sufficient. Your Ceph clients had to refresh their osdmap, and that's when they noticed that OSDs had been down. It's not a real-time log in this case, so no need to worry. It's a common question though, I think we also asked it 8 to 10 years ago. ;-)
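
To illustrate the point about the osdmap refresh, here is a minimal Python sketch that parses the four libceph lines quoted below and groups them by kernel timestamp. The regular expression is derived only from those four lines, so treat it as an assumption about the message format rather than a stable interface; the function name `summarize` is made up for this example. All four messages carry the same timestamp while their osdmap epochs span e179035 to e179074, which is consistent with the client working through a batch of maps in one go rather than logging events as they happened.

```python
import re
from collections import defaultdict

# The four libceph messages quoted in this thread, as printed by the kernel client.
DMESG = """\
[Sat May 30 22:03:46 2026] libceph(e8020818-2100-11f0-8a12-9cdc71772100 e179035): osd53 down
[Sat May 30 22:03:46 2026] libceph(e8020818-2100-11f0-8a12-9cdc71772100 e179050): osd53 up
[Sat May 30 22:03:46 2026] libceph(e8020818-2100-11f0-8a12-9cdc71772100 e179057): osd86 down
[Sat May 30 22:03:46 2026] libceph(e8020818-2100-11f0-8a12-9cdc71772100 e179074): osd86 up
"""

# Assumption: this pattern only mirrors the lines above; other kernels may format differently.
LINE_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] libceph\((?P<fsid>[0-9a-f-]+) e(?P<epoch>\d+)\): "
    r"osd(?P<osd>\d+) (?P<state>up|down)"
)


def summarize(text):
    """Group libceph OSD state messages by kernel timestamp (hypothetical helper)."""
    by_ts = defaultdict(list)
    for m in LINE_RE.finditer(text):
        by_ts[m["ts"]].append((int(m["epoch"]), int(m["osd"]), m["state"]))

    summary = {}
    for ts, events in by_ts.items():
        epochs = [epoch for epoch, _, _ in events]
        summary[ts] = {
            "first_epoch": min(epochs),
            "last_epoch": max(epochs),
            "epoch_span": max(epochs) - min(epochs),
            "events": events,
        }
    return summary


if __name__ == "__main__":
    for ts, info in summarize(DMESG).items():
        print(
            f"{ts}: epochs {info['first_epoch']}..{info['last_epoch']} "
            f"({info['epoch_span']} epochs apart, {len(info['events'])} messages)"
        )
```

To try it on a real client, the `DMESG` string could be replaced with the output of `dmesg -T` from a PVE node. A large epoch span under a single timestamp would point to a delayed osdmap catch-up, not to OSDs flapping at that moment.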


Regards,
Eugen

Quoting Wannes Smet via ceph-users <[email protected]>:

Hi,
I'm running a Ceph cluster on 19.2.2: 23 nodes, 152 OSDs, deployed with cephadm. Mostly SAS SSDs, plus 12 NVMe SSDs.
Yesterday we experienced a total power failure and everything went down hard, including our Ceph cluster. There were a couple of issues, but this one stood out after it came back up:
[ERR] OSD_UNREACHABLE: 2 osds(s) are not reachable
 osd.53's public address is not in '192.168.11.0/24' subnet
 osd.86's public address is not in '192.168.11.0/24' subnet
ceph -s did not say reduced data {availability,redundancy}, which is a bit "off", given that both OSDs are in separate hosts and the failure domain is host. There must have been PGs with fewer than 3 replicas, and also PGs with just one replica left?
So I manually restarted those OSDs with systemctl, a recovery process started, and all our VMs "magically" started booting. I'm also surprised that the recovery process only started when those OSDs got back up.
I didn't make too much of the above, but this morning I was looking at the kernel ring buffer of our PVE nodes and noticed the logs below. Just a single "blip", at the same time on all of our PVE nodes (Ceph clients):
[Sat May 30 22:03:46 2026] libceph(e8020818-2100-11f0-8a12-9cdc71772100 e179035): osd53 down
[Sat May 30 22:03:46 2026] libceph(e8020818-2100-11f0-8a12-9cdc71772100 e179050): osd53 up
[Sat May 30 22:03:46 2026] libceph(e8020818-2100-11f0-8a12-9cdc71772100 e179057): osd86 down
[Sat May 30 22:03:46 2026] libceph(e8020818-2100-11f0-8a12-9cdc71772100 e179074): osd86 up
I don't see anything weird in the Ceph cluster itself, nor in the log files of the OSDs.
I'm not sure what to make of this. Why would this happen, and what would you do?
Thanks for your insights,

Wannes Smet

_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]


