That's quite a large number of storage units per machine.
My suspicion is that, since you apparently have an unusually high number of LVs coming online at boot, activating them one after another takes long enough to overlap with the point at which Ceph starts bringing up its storage-dependent components. That likely includes not only OSDs but also other resources that keep internal databases and the like.
The cure for that under systemd would be to make Ceph - or at least its storage-dependent services - wait on LV availability.
The fun part is figuring out how to do that. Offhand, I don't know what in systemd controls the activation of LVM resources, and it's almost certainly being done asynchronously, so you'd need to provide a detector service that could determine when things were available. Then you'd have to tweak Ceph not to start until the safe time has arrived. You might be able to add such a dependency to the master Ceph target using an override under /etc/systemd/system, but admittedly that is all-or-nothing: it doesn't let each component come up as soon as its own storage is ready, but no sooner.
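To make the detector idea a bit more concrete, here is a rough Python sketch of what such a check could look like. It is only an illustration under assumptions of mine: the function names are made up, it treats the fifth character of `lv_attr` in `lvs` output as the state flag ('a' for active), and it can only see LVs that LVM has already discovered, so a PV that has not been scanned yet would go unnoticed. The timeout and polling interval are arbitrary.

```python
import subprocess
import sys
import time


def inactive_lvs(lvs_output):
    """Return vg/lv names whose lv_attr state flag (5th character) is not 'a'."""
    pending = []
    for line in lvs_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        vg, lv, attr = fields
        if len(attr) < 5 or attr[4] != "a":
            pending.append(f"{vg}/{lv}")
    return pending


def query_lvs():
    """Ask LVM for every known LV; needs root and the lvs binary on PATH."""
    result = subprocess.run(
        ["lvs", "--noheadings", "-o", "vg_name,lv_name,lv_attr"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def wait_for_lvs(query=query_lvs, timeout=300.0, interval=2.0):
    """Poll until no known LV is inactive; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        pending = inactive_lvs(query())
        if not pending:
            return True
        if time.monotonic() >= deadline:
            print("still inactive: " + ", ".join(pending), file=sys.stderr)
            return False
        time.sleep(interval)
```

A oneshot unit could run a small wrapper that calls `wait_for_lvs()` and exits non-zero on failure, and the Ceph target could then be ordered after that unit via a drop-in. I have not tested this against a real Ceph host, so treat it as a starting point rather than a recipe.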
In particular, it would be hard to edit the individual OSDs to wait on their LVs, as the systemd components for OSDs on an administered system are constructed dynamically and do not persist when the system reboots, so it would likely require a worst-case delay.
Regards,
Tim

On 4/10/25 07:45, Alex from North wrote:
Hello Dominique! The OS is quite new: Ubuntu 22.04 with all the latest upgrades.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io