[ceph-users] Re: OSD stuck down

2023-06-16 Thread Nino Kotur
Once the cluster returns to a healthy state, the mgr should re-check for stray daemons; many activities are on hold while the cluster is in a warning state. If the warning does not disappear after the cluster is healthy, then a mgr restart should help. Kind regards, Nino On Fri, Jun 16, 2023 at 10:24 PM Nicola Mori w
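
A minimal sketch of the re-check described in this message, assuming a host with an admin keyring. The function name `recheck_stray_daemons` is hypothetical, and calling `ceph mgr fail` with no argument assumes a release recent enough to fail the active mgr by default (older releases need the mgr name).

```shell
# Hypothetical helper: show the current warnings, then restart the active
# mgr by failing it over to a standby.
recheck_stray_daemons() {
    # Lists active health checks, e.g. CEPHADM_STRAY_DAEMON if still raised.
    ceph health detail
    # Assumption: with no argument this fails the currently active mgr.
    ceph mgr fail
}
```

Running `ceph health detail` again a minute or so after the failover shows whether the stray-daemon warning has cleared.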

[ceph-users] Re: OSD stuck down

2023-06-16 Thread Nicola Mori
The osd daemon finally disappeared without further intervention. I guess I should have had more patience and waited for the purge process to finish. Thanks to everybody who helped. Nicola On 15 June 2023 15:02:16 CEST, Nicola Mori wrote: > >I have been able to (sort-of) fix the problem by remo

[ceph-users] Re: OSD stuck down

2023-06-15 Thread Nicola Mori
I have been able to (sort-of) fix the problem by removing the problematic OSD, zapping the disk and starting a new OSD. The new OSD is backfilling, but now the problem is that some parts of Ceph are still waiting for the OSD removal, and the OSD (despite not running anymore on the host) is se

[ceph-users] Re: OSD stuck down

2023-06-15 Thread Nicola Mori
Hi Curt, I increased the debug level but the OSD daemon still doesn't log anything more than I already posted. dmesg does not report anything suspicious (the osd disk has the very same messages as other disks for working osds), and SMART is not very helpful: # smartctl -a /dev/sdf smartctl 7.1

[ceph-users] Re: OSD stuck down

2023-06-15 Thread Curt
Hello, Have you increased the osd debug level to get more output? Does dmesg on the host machine report anything? Are there any SMART errors on the drive? Regards, Curt On Thu, Jun 15, 2023, 13:30 Nicola Mori wrote: > Hi Dario, > > I think the connectivity is ok. My cluster has just a public
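
The three checks asked about in this message could be gathered roughly as follows. This is only a sketch: the function name `collect_osd_diagnostics` is hypothetical, the debug level `20/20` is an assumed (very verbose) value, and the OSD id and device are passed in by the caller.

```shell
# Hypothetical helper: raise OSD logging, then look at kernel and drive health.
# Usage: collect_osd_diagnostics <osd-id> <device>
collect_osd_diagnostics() {
    osd_id="$1"
    dev="$2"
    # Store a higher debug level for this OSD in the cluster config database.
    ceph config set "osd.${osd_id}" debug_osd 20/20
    # Kernel messages with human-readable timestamps.
    dmesg -T
    # Overall SMART health self-assessment of the drive.
    smartctl -H "$dev"
}
```

For the OSD in this thread the call would be `collect_osd_diagnostics 34 /dev/sdf`; the debug setting should be lowered again afterwards, since level 20 logging grows quickly.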

[ceph-users] Re: OSD stuck down

2023-06-15 Thread Nicola Mori
Hi Dario, I think the connectivity is ok. My cluster has just a public interface, and all of the other services on the same machine (osds and mgr) work flawlessly so I guess the connectivity is ok. Or in other words, I don't know what to look for in the network since all the other services wor

[ceph-users] Re: OSD stuck down

2023-06-15 Thread Dario Graña
Hi, I have seen this behaviour when the OSD host cluster interface was down but the public interface was up. I suggest checking the network interfaces and the connectivity. Regards! On Thu, Jun 15, 2023 at 11:08 AM Nicola Mori wrote: > I have restarted all the monitors and managers, but still t
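
One way to act on this suggestion is sketched below. The function name `check_osd_networks` is hypothetical, and the sketch assumes the networks are stored in the cluster config database rather than only in a local ceph.conf.

```shell
# Hypothetical helper: print the networks Ceph is configured to use, then
# the state of the interfaces on this host.
check_osd_networks() {
    # An empty cluster_network means replication traffic shares the
    # public network.
    echo "public_network:  $(ceph config get osd public_network)"
    echo "cluster_network: $(ceph config get osd cluster_network)"
    # Brief per-interface state and addresses.
    ip -br addr show
}
```

An interface listed as DOWN, or one with no address inside the configured cluster network, would match the behaviour described in this message.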

[ceph-users] Re: OSD stuck down

2023-06-15 Thread Nicola Mori
I have restarted all the monitors and managers, but the osd still remains down. However, I found that cephadm actually sees it running:
# ceph orch ps | grep osd.34
osd.34 balin running (14m) 108s ago 8M 75.3M 793M 17.2.6 b1a23658afad 5b9dbea262c7
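
The mismatch reported here (cephadm sees the daemon running while the cluster marks the OSD down) can be put side by side with a small sketch like the following. The function name `compare_osd_views` is hypothetical; the first command is the one from this message and the second is the standard OSD tree filtered to down OSDs.

```shell
# Hypothetical helper: compare cephadm's view of the daemon with the
# monitors' view of the OSD.
# Usage: compare_osd_views <osd-id>
compare_osd_views() {
    osd_id="$1"
    # cephadm's view: state of the daemon's container on its host.
    # The trailing space keeps osd.34 from also matching osd.340.
    ceph orch ps | grep "osd.${osd_id} "
    # Cluster view: only the OSDs the monitors currently consider down.
    ceph osd tree down
}
```

If the OSD shows as running in the first output and still appears in the second, the process is alive but is not being accepted as up by the monitors.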

[ceph-users] Re: OSD stuck down

2023-06-13 Thread Eugen Block
Hi, did you check the MON logs? They should contain some information about the reason why the OSD is marked down and out. You could also just try to mark it in yourself; does that change anything?
$ ceph osd in 34
I would also take another look into the OSD logs:
cephadm logs --name osd.34