On Tue, Oct 25, 2022 at 3:48 AM Frank Schilder <fr...@dtu.dk> wrote:
>
> Hi Patrick,
>
> thanks for your answer. This is exactly the behaviour we need.
>
> For future reference, some more background:
>
> We need to prepare a quite large installation for planned power outages. Even
> though they are called planned, we will not be able to handle these manually
> in good time, for reasons irrelevant here. Our installation is protected by a
> UPS, but the guaranteed uptime on outage is only 6 minutes. So we are talking
> more about transient protection than uninterrupted power supply. Although we
> have survived power outages of more than 20 minutes without loss of power to
> the DC, we need to plan with these 6 minutes.
>
> In these 6 minutes, we need to wait at least 1-2 minutes to avoid
> unintended shut-downs. In the remaining 4 minutes, we need to take down a 500
> node HPC cluster and a 1000-OSD+12-MDS+2-MON Ceph sub-cluster. Part of this
> Ceph cluster will continue running on another site with higher power
> redundancy. This gives maybe 1-2 minutes response time for the Ceph cluster,
> and the best we can do is to try to achieve a "consistent at rest" state and
> hope we can cleanly power down the system before the power is cut.
>
> Why am I so concerned about a "consistent at rest" state?
>
> It's because, while not all instances of a power loss lead to data loss, all
> instances of data loss I know of that were not caused by admin errors were
> caused by a power loss (see https://tracker.ceph.com/issues/46847). We were
> asked to prepare for a worst case of weekly power cuts, so there is no room
> for taking too many chances here. Our approach is: unmount as much as
> possible, quickly fail the FS to stop all remaining IO, give OSDs and MDSes a
> chance to flush pending operations to disk or journal, and then try a clean
> shut down.
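For reference, a rough sketch of the sequence you describe on the command line;
the file system name, the flag selection and the systemd unit below are
placeholders/assumptions and depend on your deployment, they are not taken from
your message:

    # Take CephFS down first (see the note below on `fs fail` vs the down flag).
    ceph fs set cephfs down true

    # Keep the cluster from reacting to OSDs going away, and stop client I/O.
    ceph osd set noout
    ceph osd set norecover
    ceph osd set norebalance
    ceph osd set nobackfill
    ceph osd set pause

    # Cleanly stop the Ceph daemons on each node before power is cut
    # (the systemd target differs for cephadm/containerized deployments).
    systemctl stop ceph.target
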
To be clear in case there is any confusion: once you do `fs fail`, the MDS are
removed from the cluster and they will respawn. They are not given any time to
flush remaining I/O.

FYI as this may interest you: we have a ticket to set a flag on the file system
to prevent new client mounts: https://tracker.ceph.com/issues/57090

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
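For readers finding this thread later, the difference Patrick describes can be
illustrated like this; "cephfs" is a placeholder file system name, and the
behaviour of the down flag is as documented for CephFS administration:

    # Fails all MDS ranks immediately and marks the file system as not joinable;
    # the daemons respawn as standbys and get no time to flush their journals.
    ceph fs fail cephfs

    # Sets the down flag instead: ranks are brought down gracefully, journals
    # are flushed to the metadata pool and client I/O is stopped first.
    ceph fs set cephfs down true

    # Bring the file system back up afterwards (restores the previous max_mds).
    ceph fs set cephfs down false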