It seems that the shutdown procedure which we had documented (and many others as well) can potentially cause problems when the cluster starts up again.
There is a blog article from croit explaining it in detail [0] if you
are interested.

The shorter explanation is that when unsetting 'pause', it might take
half an hour to get out of that state. Unsetting 'nodown' could also
take a long time. During all this, OSDs and MONs might consume a lot of
CPU resources, increasing the CPU load on the host considerably.

'nobackfill' and 'norecover' are not needed, but also not harmful.

Archeology revealed that the procedure with all the other flags (which
this commit removes) originated in the Red Hat documentation, but was
never part of Ceph's shutdown procedure, which is tested by the Ceph
team.

[0] https://web.archive.org/web/20250624082830/https://www.croit.io/blog/how-not-to-shut-down-a-ceph-cluster

Signed-off-by: Aaron Lauterer <a.laute...@proxmox.com>
---
 pveceph.adoc | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/pveceph.adoc b/pveceph.adoc
index 79aa045..a049612 100644
--- a/pveceph.adoc
+++ b/pveceph.adoc
@@ -1133,34 +1133,25 @@ or the CLI:
 ceph -s
 ----
 
-To disable all self-healing actions, and to pause any client IO in the Ceph
-cluster, enable the following OSD flags in the **Ceph -> OSD** panel or via the
-CLI:
+In order to not cause any recovery during the shut down and later power on
+phases, enable the 'noout' OSD flag. Either in the **Ceph -> OSD** panel behind
+the **Manage Global Flags** button or the CLI:
 
 [source,bash]
 ----
 ceph osd set noout
-ceph osd set norecover
-ceph osd set norebalance
-ceph osd set nobackfill
-ceph osd set nodown
-ceph osd set pause
 ----
 
 Start powering down your nodes without a monitor (MON). After these nodes are
 down, continue by shutting down nodes with monitors on them.
 
 When powering on the cluster, start the nodes with monitors (MONs) first. Once
-all nodes are up and running, confirm that all Ceph services are up and running
-before you unset the OSD flags again:
+all nodes are up and running, confirm that all Ceph services are up and running.
+In the end, the only warning you should see for Ceph is that the 'noout' flag
+is still set. You can disable it via the web UI or via the CLI:
 
 [source,bash]
 ----
-ceph osd unset pause
-ceph osd unset nodown
-ceph osd unset nobackfill
-ceph osd unset norebalance
-ceph osd unset norecover
 ceph osd unset noout
 ----
 
-- 
2.39.5

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel