Hi Frank,

On Fri, Jan 10, 2025 at 12:31 PM Frank Schilder <fr...@dtu.dk> wrote:
>
> Hi all,
>
> we seem to have a serious issue with our file system; the Ceph version is
> the latest Pacific. After a large cleanup operation we had an MDS rank with
> 100 million stray entries (yes, one hundred million). Today we restarted
> this daemon, which cleans up the stray entries.
... why would you restart the daemon? I can't stress this question enough.
Usually when CephFS has a "meltdown", the trigger was "I restarted the MDS"
in the hope that "X relatively minor problem" would go away.

> It seems that this leads to a restart loop due to OOM. The rank becomes
> active and then starts pulling in dentry (DNS) and inode (INOS) entries
> until all memory is exhausted.
>
> I have no idea if there is at least progress removing the stray items or if
> it starts from scratch every time. If it needs to pull as many DNS/INOS into
> cache as there are stray items, we don't have a server at hand with enough
> RAM.

Some strays may not be eligible for removal due to hard links or snapshots.

> Q1: Is the MDS at least making progress in every restart iteration?

Probably not.

> Q2: If not, how do we get this rank up again?

I don't see an easy way to circumvent this problem with any type of
hacks/configs. One option you have is to allocate a suitably large swap file
for the MDS node to see if it can chew through the stray directories. (More
RAM would be better...)

> Q3: If we can't get this rank up soon, can we at least move directories away
> from this rank by pinning it to another rank?

Afraid not. You cannot migrate strays, and it wouldn't take effect in time
anyway.

> Currently, the rank in question reports .mds_cache.num_strays=0 in perf dump.

That's probably out-of-date; a quick way to re-poll the stray counters is
sketched below. Checking: this MDS runs out of memory shortly after becoming
active, right?

--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
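In case it helps, here is a minimal sketch of that re-check, meant to be run
on the host where the affected MDS is running. The daemon name
"mds.ceph-mds-01" and the 10-second sampling interval are placeholders, not
details from your cluster; the script only polls the stray-related mds_cache
counters from "ceph daemon <name> perf dump" a few times:

#!/usr/bin/env python3
# Poll the stray-related mds_cache perf counters of one MDS a few times
# via "ceph daemon <mds> perf dump" and print them, to see whether the
# values change between samples (i.e. whether stray purging progresses).

import json
import subprocess
import time

MDS = "mds.ceph-mds-01"  # placeholder; use the daemon currently holding the rank

def stray_counters(mds):
    """Return all mds_cache counters whose name contains 'stray'."""
    out = subprocess.check_output(["ceph", "daemon", mds, "perf", "dump"])
    cache = json.loads(out)["mds_cache"]
    return {k: v for k, v in cache.items() if "stray" in k}

for _ in range(5):
    print(stray_counters(MDS))
    time.sleep(10)

If num_strays drops or strays_enqueued keeps climbing between samples, the
rank is at least making some progress before it runs out of memory.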