Hi Frank,

Can you try `perf top` to find out what the ceph-mds process is doing with
that CPU time? Also Mark's profiler is super useful to find those busy loops:
https://github.com/markhpc/uwpmp
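
For example, something like this should show where that time is going (just a
sketch; the PID is the ceph-mds PID from the `top` output quoted further down
in the thread, and it will change after a restart):

  perf top -p 59495

uwpmp attaches to the same PID; see the README in that repo for the exact
invocation.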
Cheers, Dan

--
Dan van der Ster
CTO @ CLYSO
Try our Ceph Analyzer -- https://analyzer.clyso.com/
https://clyso.com | dan.vanders...@clyso.com

On Fri, Jan 10, 2025 at 2:06 PM Frank Schilder <fr...@dtu.dk> wrote:
>
> Hi Bailey,
>
> I already set that value very high:
>
> # ceph config get mds.ceph-12 mds_beacon_grace
> 600000.000000
>
> To no avail. The 15s heartbeat timeout comes from somewhere else. What I
> observe is that the MDS loads the stray buckets (up to 87Mio DNS/INOS) and as
> soon as that happened it seems to start doing something (RAM usage grows
> without the DNS/INOS changing any more). However, shortly after the timeout
> happens, everything comes to a standstill. I think the MONs keep the MDS
> assigned, but it's no longer part of the file system, or the actual MDS worker
> thread terminates with this timeout.
>
> It's reported as up and active, but this report seems just outdated, as all
> status queries to the MDS just hang. My suspicion is that the MONs don't kick
> it out yet (no fail-over triggered), but the rank is actually not really
> active. The report just doesn't update.
>
> I'm stuck here and am out of ideas what to do about it. Increasing the thread
> timeout would probably help, but I can't find a config option for that.
>
> I'm afraid I need to take a break. I will be looking at my e-mail in about 4h
> again. Would be great if there are some further ideas for how to proceed.
>
> Thanks so far and best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Bailey Allison <balli...@45drives.com>
> Sent: Friday, January 10, 2025 10:23 PM
> To: Frank Schilder; ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: Help needed, ceph fs down due to large stray dir
>
> Hi Frank,
>
> The value for that is mds_beacon_grace. The default is 15, but you can jack
> it up. Apply it to the monitor or global to take effect.
>
> Just to clarify too, does the MDS daemon come into up:active? If it
> does, are you able to also access that portion of the filesystem in that
> time?
>
> If you can access the filesystem, try running a stat on that portion
> with something like 'find . -ls' in a directory and see if the strays
> decrease.
>
> Regards,
>
> Bailey Allison
> Service Team Lead
> 45Drives, Ltd.
> 866-594-7199 x868
>
> On 1/10/25 17:18, Frank Schilder wrote:
> > Hi Bailey,
> >
> > thanks for your response. The MDS was actually unresponsive and I had to
> > restart it (ceph tell and ceph daemon commands were hanging, except for
> > "help"). It's currently in clientreplay and loading all the stuff again. I'm
> > really worried that this here is the rescue killer:
> >
> > heartbeat_map is_healthy 'MDSRank' had timed out after 15.000000954s
> >
> > Do you have any idea how to deal with this timeout? Somewhere in the process
> > the MDS seems to become unresponsive for too long and then stays
> > unresponsive after that.
> >
> > I have 4T swap now and the MDS comes up to the point where it actually
> > reports back a number for the stray items. However, some time later it
> > becomes unresponsive and the heartbeat messages start showing up. I don't
> > know how to get past this point.
> >
> > Best regards,
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > ________________________________________
> > From: Bailey Allison <balli...@45drives.com>
> > Sent: Friday, January 10, 2025 10:05 PM
> > To: ceph-users@ceph.io; Frank Schilder
> > Subject: Re: [ceph-users] Re: Help needed, ceph fs down due to large stray dir
> >
> > Frank,
> >
> > You mentioned previously a large number of strays on the MDS rank. Are
> > you able to check the rank again to see how many strays there are now?
> > We've previously had a similar issue, and once the MDS came back up we
> > had to stat the filesystem to decrease the number of strays; after
> > doing so everything returned to normal.
> >
> > ceph tell mds.X perf dump | jq .mds_cache
> >
> > Bailey Allison
> > Service Team Lead
> > 45Drives, Ltd.
> > 866-594-7199 x868
> >
> > On 1/10/25 16:42, Frank Schilder wrote:
> >> Hi all,
> >>
> >> I got the MDS up. However, after quite some time it's sitting with almost
> >> no CPU load:
> >>
> >> top - 21:40:02 up  2:49,  1 user,  load average: 0.00, 0.02, 0.34
> >> Tasks: 606 total,   1 running, 247 sleeping,   0 stopped,   0 zombie
> >> %Cpu(s):  0.0 us,  0.1 sy,  0.0 ni, 99.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> >> GiB Mem :  503.7 total,   12.3 free,  490.3 used,    1.1 buff/cache
> >> GiB Swap: 3577.0 total, 3367.0 free,  210.0 used.    2.9 avail Mem
> >>
> >>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> >>   59495 ceph      20   0  685.8g 477.9g   0.0g S   1.0  94.9  53:47.57 ceph-mds
> >>
> >> I'm not sure if it's doing anything at all. Only messages like these keep
> >> showing up in the log:
> >>
> >> 2025-01-10T21:38:08.459+0100 7f87ccd5f700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15.000000954s
> >> 2025-01-10T21:38:08.459+0100 7f87ccd5f700  0 mds.beacon.ceph-12 Skipping beacon heartbeat to monitors (last acked 3019.23s ago); MDS internal heartbeat is not healthy!
> >>
> >> The MDS cluster looks healthy from this output:
> >>
> >> # ceph fs status
> >> con-fs2 - 1554 clients
> >> =======
> >> RANK  STATE   MDS      ACTIVITY      DNS    INOS   DIRS   CAPS
> >>  0    active  ceph-15  Reqs:  0 /s   255k   248k   5434   1678
> >>  1    active  ceph-14  Reqs:  2 /s   402k   396k   26.7k  144k
> >>  2    active  ceph-12  Reqs:  0 /s   86.9M  86.9M  46.2k  3909
> >>  3    active  ceph-08  Reqs:  0 /s   637k   630k   2663   7457
> >>  4    active  ceph-11  Reqs:  0 /s   1496k  1492k  113k   103k
> >>  5    active  ceph-16  Reqs:  2 /s   775k   769k   65.3k  12.9k
> >>  6    active  ceph-24  Reqs:  0 /s   130k   113k   7294   8670
> >>  7    active  ceph-13  Reqs: 65 /s   3619k  3609k  469k   47.2k
> >>        POOL           TYPE      USED   AVAIL
> >>  con-fs2-meta1        metadata  4078G  7269G
> >>  con-fs2-meta2        data      0      7258G
> >>  con-fs2-data         data      1225T  2476T
> >>  con-fs2-data-ec-ssd  data      794G   22.6T
> >>  con-fs2-data2        data      5747T  2253T
> >> STANDBY MDS
> >>   ceph-09
> >>   ceph-10
> >>   ceph-23
> >>   ceph-17
> >> MDS version: ceph version 16.2.15 (618f440892089921c3e944a991122ddc44e60516) pacific (stable)
> >>
> >> Did it mark itself out of the cluster and is it waiting for the MON to fail
> >> it?? Please help.
> >>
> >> Best regards,
> >> =================
> >> Frank Schilder
> >> AIT Risø Campus
> >> Bygning 109, rum S14
> >>
> >> ________________________________________
> >> From: Frank Schilder <fr...@dtu.dk>
> >> Sent: Friday, January 10, 2025 8:51 PM
> >> To: Spencer Macphee
> >> Cc: ceph-users@ceph.io
> >> Subject: Re: [ceph-users] Help needed, ceph fs down due to large stray dir
> >>
> >> Hi all,
> >>
> >> I seem to have gotten the MDS up to the point that it reports stats.
> >> Does this mean anything:
> >>
> >> 2025-01-10T20:50:25.256+0100 7f87ccd5f700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15.000000954s
> >> 2025-01-10T20:50:25.256+0100 7f87ccd5f700  0 mds.beacon.ceph-12 Skipping beacon heartbeat to monitors (last acked 156.027s ago); MDS internal heartbeat is not healthy!
> >>
> >> I hope it doesn't get failed by some kind of timeout now.
> >>
> >> Best regards,
> >> =================
> >> Frank Schilder
> >> AIT Risø Campus
> >> Bygning 109, rum S14
> >>
> >> ________________________________________
> >> From: Spencer Macphee <spencerofsyd...@gmail.com>
> >> Sent: Friday, January 10, 2025 7:16 PM
> >> To: Frank Schilder
> >> Cc: ceph-users@ceph.io
> >> Subject: Re: [ceph-users] Help needed, ceph fs down due to large stray dir
> >>
> >> I had a similar issue some months ago where the MDS ended up using around 300
> >> gigabytes of RAM for a similar number of strays.
> >>
> >> You can get an idea of the strays kicking around by checking the omap keys
> >> of the stray objects in the cephfs metadata pool. Strays are tracked in
> >> objects: 600.00000000, 601.00000000, 602.00000000, etc... etc... That
> >> would also give you an indication if it's progressing at each restart.
> >> [A concrete sketch of this check is included at the end of this thread.]
> >>
> >> On Fri, Jan 10, 2025 at 1:30 PM Frank Schilder <fr...@dtu.dk> wrote:
> >> Hi all,
> >>
> >> we seem to have a serious issue with our file system; the ceph version is
> >> latest pacific. After a large cleanup operation we had an MDS rank with
> >> 100Mio stray entries (yes, one hundred million). Today we restarted this
> >> daemon, which cleans up the stray entries. It seems that this leads to a
> >> restart loop due to OOM. The rank becomes active and then starts pulling
> >> in DNS and INOS entries until all memory is exhausted.
> >>
> >> I have no idea if there is at least progress removing the stray items or
> >> if it starts from scratch every time. If it needs to pull as many DNS/INOS
> >> into cache as there are stray items, we don't have a server at hand with
> >> enough RAM.
> >>
> >> Q1: Is the MDS at least making progress in every restart iteration?
> >> Q2: If not, how do we get this rank up again?
> >> Q3: If we can't get this rank up soon, can we at least move directories
> >> away from this rank by pinning them to another rank?
> >>
> >> Currently, the rank in question reports .mds_cache.num_strays=0 in perf
> >> dump.
> >>
> >> =================
> >> Frank Schilder
> >> AIT Risø Campus
> >> Bygning 109, rum S14
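
For reference, the omap-key check Spencer describes above could look something
like the sketch below. It is only a sketch: the pool name comes from the
`ceph fs status` output earlier in the thread (con-fs2-meta1), and the object
names are the rank-0 stray objects he listed; other ranks keep their strays
under different inode numbers, so adjust the object names for the affected rank.

  for obj in 600.00000000 601.00000000 602.00000000; do   # ...and so on for the remaining stray objects
      echo -n "$obj: "
      rados -p con-fs2-meta1 listomapkeys "$obj" | wc -l
  done

If those counts shrink between restarts, the MDS is making progress on the
strays even while it looks idle in top.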