On Tue, Oct 25, 2022 at 3:48 AM Frank Schilder <fr...@dtu.dk> wrote:
>
> Hi Patrick,
>
> thanks for your answer. This is exactly the behaviour we need.
>
> For future reference, some more background:
>
> We need to prepare a quite large installation for planned power outages. Even
> though they are called planned, we will not be able to handle these manually
> in good time, for reasons irrelevant here. Our installation is protected by a
> UPS, but the guaranteed uptime on outage is only 6 minutes. So we are talking
> more about transient protection than uninterrupted power supply. Although we
> have survived power outages of more than 20 minutes without loss of power to
> the DC, we need to plan with these 6 minutes.
>
> In these 6 minutes, we need to wait at least 1-2 minutes to avoid
> unintended shut-downs. In the remaining 4 minutes, we need to take down a 500
> node HPC cluster and a 1000-OSD+12-MDS+2-MON Ceph sub-cluster. Part of this
> Ceph cluster will continue running on another site with higher power
> redundancy. This gives maybe 1-2 minutes response time for the Ceph cluster,
> and the best we can do is to try to achieve a "consistent at rest" state and
> hope we can cleanly power down the system before the power is cut.
>
> Why am I so concerned about a "consistent at rest" state?
>
> It's because, while not all instances of a power loss lead to data loss, all
> instances of data loss I know of that were not caused by admin errors were
> caused by a power loss (see https://tracker.ceph.com/issues/46847). We were
> asked to prepare for a worst case of weekly power cuts, so there is no room
> for taking too many chances here. Our approach is: unmount as much as
> possible, quickly fail the FS to stop all remaining IO, give OSDs and MDSes a
> chance to flush pending operations to disk or journal, and then try a clean
> shut down.
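For reference, a rough sketch of the sequence you describe on the command line;
the file system name, the flag selection and the systemd unit below are
placeholders/assumptions and depend on your deployment, they are not taken from
your message:

    # Take CephFS down first (see the note below on `fs fail` vs the down flag).
    ceph fs set cephfs down true

    # Keep the cluster from reacting to OSDs going away, and stop client I/O.
    ceph osd set noout
    ceph osd set norecover
    ceph osd set norebalance
    ceph osd set nobackfill
    ceph osd set pause

    # Cleanly stop the Ceph daemons on each node before power is cut
    # (the systemd target differs for cephadm/containerized deployments).
    systemctl stop ceph.target
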
To be clear in case there is any confusion: once you do `fs fail`, the MDS are
removed from the cluster and they will respawn. They are not given any time to
flush remaining I/O.

FYI as this may interest you: we have a ticket to set a flag on the file system
to prevent new client mounts: https://tracker.ceph.com/issues/57090

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
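For readers finding this thread later, the difference Patrick describes can be
illustrated like this; "cephfs" is a placeholder file system name, and the
behaviour of the down flag is as documented for CephFS administration:

    # Fails all MDS ranks immediately and marks the file system as not joinable;
    # the daemons respawn as standbys and get no time to flush their journals.
    ceph fs fail cephfs

    # Sets the down flag instead: ranks are brought down gracefully, journals
    # are flushed to the metadata pool and client I/O is stopped first.
    ceph fs set cephfs down true

    # Bring the file system back up afterwards (restores the previous max_mds).
    ceph fs set cephfs down false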