[ceph-users] Re: Schödinger's OSD

2024-07-16 Thread Eugen Block
Hi,

> The final machine is operational and I'm going to leave it, but it does show 1 quirk. Dashboard and osd tree show its OSD as up/running, but "ceph orch ps" shows it as "stopped". My guess is that ceph orch is looking for the container OSD and doesn't notice the legacy OSD.

I assume so, yes
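A hedged way to see the mismatch from both sides (flags per the Pacific-era docs; as far as I know, `cephadm ls` marks non-containerized daemons with "style": "legacy"):

  ceph osd tree                    # cluster-map view: the OSD shows as up/in
  ceph orch ps --daemon-type osd   # orchestrator view: only cephadm-managed daemons
  cephadm ls                       # run on the host itself: legacy daemons
                                   # are listed too, with "style": "legacy"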

[ceph-users] Re: Schödinger's OSD

2024-07-16 Thread Tim Holloway
OK. I deleted the questionable stuff with this command:

dnf erase ceph-mgr-modules-core-16.2.15-1.el9s.noarch ceph-mgr-diskprediction-local-16.2.15-1.el9s.noarch ceph-mgr-16.2.15-1.el9s.x86_64 ceph-mds-16.2.15-1.el9s.x86_64 ceph-mon-16.2.15-1.el9s.x86_64

That left these two: centos-release-
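A quick follow-up check to confirm what actually remains installed (rpm accepts glob patterns):

  rpm -qa 'ceph*' 'centos-release-ceph*' | sort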

[ceph-users] Re: Schödinger's OSD

2024-07-16 Thread Tim Holloway
Interesting. And thanks for the info. I did a quick look-around. The admin node, which is one of the mixed-osd machines, has these packages installed:

centos-release-ceph-pacific-1.0-2.el9.noarch
cephadm-16.2.14-2.el9s.noarch
libcephfs2-16.2.15-1.el9s.x86_64
python3-ceph-common-16.2.15-1.el9s.x86

[ceph-users] Re: Schödinger's OSD

2024-07-15 Thread Eugen Block
Do you have more ceph packages installed than just cephadm? If you have ceph-osd packages (or ceph-mon, ceph-mds etc.), I would remove them and clean up the directories properly. To me it looks like a mixup of "traditional" package-based installation and cephadm deployment. Only you can tell
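A minimal sketch of that inspection and cleanup, assuming an RPM-based system like the one in this thread (take the actual package names from the rpm output, not from this list):

  rpm -qa | grep -i ceph                  # inventory what is package-based
  dnf remove ceph-osd ceph-mon ceph-mds   # daemon packages don't belong on a
                                          # cephadm host; cephadm itself and
                                          # the client libraries can stay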

[ceph-users] Re: Schödinger's OSD

2024-07-15 Thread Tim Holloway
The problem with merely disabling or masking the non-cephadm OSD is that the offending systemd service unit lives under /run, not under /lib/systemd or /etc/systemd. As far as I know, essentially the entire /run directory's contents get destroyed when you reboot, and that would include the disabled
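One hedged workaround: units under /etc/systemd/system take precedence over /run/systemd/system, so a mask (which symlinks the unit name to /dev/null under /etc) should survive reboots even if the runtime unit is regenerated. Assuming the unit name from this thread:

  systemctl cat ceph-osd@4.service    # the header shows where the unit file lives
  systemctl mask ceph-osd@4.service   # the mask lives in /etc, so it persists
                                      # across reboots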

[ceph-users] Re: Schödinger's OSD

2024-07-15 Thread Alwin Antreich
Hi Tim,

On Mon, 15 Jul 2024 at 07:51, Eugen Block wrote:
> If the OSD is already running in a container, adopting it won't work,
> as you already noticed. I don't have an explanation how the
> non-cephadm systemd unit has been created, but that should be fixed by
> disabling it.
>
> > I have con

[ceph-users] Re: Schödinger's OSD

2024-07-14 Thread Eugen Block
If the OSD is already running in a container, adopting it won't work, as you already noticed. I don't have an explanation how the non-cephadm systemd unit has been created, but that should be fixed by disabling it.

> I have considered simply doing a brute-force removal of the OSD files in /
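For the genuinely legacy case (a daemon not yet running in a container), the documented conversion path is cephadm adopt; a sketch per the Pacific cephadm docs, using the OSD id from this thread:

  cephadm adopt --style legacy --name osd.4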

[ceph-users] Re: Schödinger's OSD

2024-07-13 Thread Tim Holloway
OK. Phantom hosts are gone. Many thanks! I'll have to review my checklist for decommissioning hosts to make sure that step is on it. On the legacy/container OSD stuff, that is a complete puzzle. While the first thing that I see when I look up "creating an OSD" in the system documentation is the

[ceph-users] Re: Schödinger's OSD

2024-07-12 Thread Eugen Block
Okay, it looks like you just need some further cleanup regarding your phantom hosts, for example:

ceph osd crush remove www2
ceph osd crush remove docker0

and so on. Regarding the systemd unit (well, cephadm also generates one, but with the fsid as already mentioned), you could just stop an
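A hedged sketch of that remaining step, assuming the stray unit is the ceph-osd@4.service discussed elsewhere in this thread:

  systemctl stop ceph-osd@4.service
  systemctl disable ceph-osd@4.service
  ceph osd tree   # confirm the phantom hosts are gone from the CRUSH map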

[ceph-users] Re: Schödinger's OSD

2024-07-12 Thread Tim Holloway
This particular system has it both ways and neither wants to work. The peculiar thing was that when I first re-created the OSD with cephadm, it was reported that this was an "unmanaged node". So I ran the same cephadm again and THAT time it showed up. So I suspect that the ceph-osd@4.service was th
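One hedged way to see why cephadm might report a daemon as unmanaged (the orchestrator flags daemons whose service spec has unmanaged: true, or that have no matching spec):

  ceph orch ls osd --export   # dumps the OSD service specs, including any
                              # "unmanaged: true" setting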

[ceph-users] Re: Schödinger's OSD

2024-07-12 Thread Eugen Block
Hi, containerized daemons usually have the fsid in the systemd unit, like ceph-{fsid}@osd.5. Is it possible that you have those confused? Check the /var/lib/ceph/osd/ directory to find possible orphaned daemons and clean them up. And as previously stated, it would help to see your osd tree
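A hedged sketch of that check, side by side (the fsid in the cephadm paths comes from `ceph fsid`):

  ceph fsid                          # the cluster fsid used in cephadm unit names
  systemctl list-units 'ceph*'       # both naming styles show up here
  ls /var/lib/ceph/osd/              # legacy layout: ceph-<id> directories
  ls /var/lib/ceph/$(ceph fsid)/     # cephadm layout: osd.<id> directories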