Once the cluster returns to a healthy state, the mgr should re-check for
stray daemons; many activities are on hold while the cluster is in a
warning state. If the stray daemon warning does not disappear after the
cluster is healthy, then restarting the mgr should help.
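
The check-then-restart idea above could be sketched as a small shell helper. This is only a sketch under assumptions: the function name fail_mgr_if_stray is hypothetical, it assumes the warning shows up in `ceph health detail` under the CEPHADM_STRAY_DAEMON health code, and it assumes a release where `ceph mgr fail` without an argument fails over the active manager.

```shell
# Hypothetical helper: fail over the active mgr only if the cluster still
# reports a stray-daemon warning. Assumes an admin keyring is available.
fail_mgr_if_stray() {
    if ceph health detail | grep -q 'CEPHADM_STRAY_DAEMON'; then
        echo "stray daemon warning present, failing over active mgr"
        ceph mgr fail
    else
        echo "no stray daemon warning"
    fi
}
```

Calling fail_mgr_if_stray only after the cluster has otherwise returned to HEALTH_OK avoids restarting the mgr while recovery is still in progress.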
Kind regards,
Nino
On Fri, Jun 16, 2023 at 10:24 PM Nicola Mori w
The osd daemon finally disappeared without further intervention. I guess I
should have been more patient and waited for the purge process to finish.
Thanks to everybody who helped.
Nicola
On 15 June 2023 15:02:16 CEST, Nicola Mori wrote:
I have been able to (sort-of) fix the problem by removing the
problematic OSD, zapping the disk and starting a new OSD. The new OSD is
backfilling, but now the problem is that some parts of Ceph are still
waiting for the OSD removal, and the OSD (despite not running anymore on
the host) is se
Hi Curt,
I increased the debug level, but the OSD daemon still doesn't log
anything more than I already posted. dmesg does not report anything
suspicious (the osd disk has the very same messages as the disks of
working osds), and SMART is not very helpful:
# smartctl -a /dev/sdf
smartctl 7.1
Hello,
Have you increased the osd debug level to get more output? Does dmesg on
the host machine report anything? Are there any smart errors on the drive?
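
The three checks asked about above could be collected in one helper, sketched here under assumptions. The function name osd_basic_checks is hypothetical, the debug level 20/20 is an illustrative choice, and the sketch assumes `ceph config set` is used so that the setting is stored in the cluster configuration database rather than sent to the (possibly down) daemon.

```shell
# Hypothetical helper collecting the three checks for one OSD.
# $1: OSD id (e.g. 34)   $2: block device backing it (e.g. /dev/sdf)
osd_basic_checks() {
    osd_id="$1"
    dev="$2"
    # Raise OSD debug logging through the central config store.
    ceph config set "osd.${osd_id}" debug_osd 20/20
    # Kernel messages that mention the device name (sdf for /dev/sdf).
    dmesg -T | grep -i "$(basename "$dev")" || echo "no kernel messages for $dev"
    # Overall SMART health verdict for the drive.
    smartctl -H "$dev"
}
```

For example, `osd_basic_checks 34 /dev/sdf` on the OSD host. Afterwards the debug override can be dropped again with `ceph config rm osd.34 debug_osd`.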
Regards,
Curt
On Thu, Jun 15, 2023, 13:30 Nicola Mori wrote:
Hi Dario,
I think the connectivity is ok. My cluster has just a public interface,
and all of the other services on the same machine (osds and mgr) work
flawlessly so I guess the connectivity is ok. Or in other words, I don't
know what to look for in the network since all the other services wor
Hi, I have seen this behaviour when the OSD host cluster interface was down
but the public interface was up. I suggest checking the network interfaces
and the connectivity.
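
A quick way to spot a downed interface on the OSD host might look like the sketch below. The function name down_interfaces is hypothetical, and the sketch assumes the brief output of `ip -br link show`, where the first column is the interface name and the second its operational state. Some virtual interfaces report UNKNOWN even when working, so the result is only a hint.

```shell
# Hypothetical helper: list interfaces (other than loopback) whose
# operational state is not UP, based on `ip -br link show`.
down_interfaces() {
    ip -br link show | awk '$1 != "lo" && $2 != "UP" { print $1 }'
}
```

If the list is empty, the next step would be to compare the addresses from `ip -br addr show` with the networks returned by `ceph config get osd public_network` and `ceph config get osd cluster_network`.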
Regards!
On Thu, Jun 15, 2023 at 11:08 AM Nicola Mori wrote:
I have restarted all the monitors and managers, but still the osd
remains down. But I found that cephadm actually sees it running:
# ceph orch ps | grep osd.34
osd.34  balin  running (14m)  108s ago  8M  75.3M  793M  17.2.6  b1a23658afad  5b9dbea262c7
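
The mismatch described here (cephadm reports the daemon as running while the cluster marks the OSD down) could be shown side by side with a sketch like the following. The function name osd_state_summary is hypothetical. It assumes the PORTS column of `ceph orch ps` is empty for OSDs, so that the status is the third field, and that `ceph osd dump` prints one line per OSD starting with its name followed by up or down.

```shell
# Hypothetical helper: print cephadm's view and the OSD map's view of one OSD.
# $1: OSD id (e.g. 34)
osd_state_summary() {
    id="$1"
    # Status as reported by the orchestrator (third field when PORTS is empty).
    orch_state=$(ceph orch ps | awk -v n="osd.${id}" '$1 == n { print $3 }')
    # up/down state as recorded in the OSD map.
    map_state=$(ceph osd dump | awk -v n="osd.${id}" '$1 == n { print $2 }')
    echo "osd.${id}: cephadm=${orch_state} osdmap=${map_state}"
}
```

A line such as "cephadm=running osdmap=down" would indicate that the container is alive but the OSD is not being accepted by the cluster, which points toward the MON and OSD logs rather than the orchestrator.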
Hi,
Did you check the MON logs? They should contain some information about
the reason why the OSD is marked down and out. You could also just try
marking it in yourself; does that change anything?
$ ceph osd in 34
I would also take another look into the OSD logs:
cephadm logs --name osd.34