Hi Frank,

Can you try `perf top` to find out what the ceph-mds process is doing with
that CPU time? Also Mark's profiler is super useful to find those busy loops:
https://github.com/markhpc/uwpmp
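
For example, something like this should show where that time is going (just a
sketch; the PID is the ceph-mds PID from the `top` output quoted further down
in the thread, and it will change after a restart):

  perf top -p 59495

uwpmp attaches to the same PID; see the README in that repo for the exact
invocation.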
Cheers, Dan

--
Dan van der Ster
CTO @ CLYSO
Try our Ceph Analyzer -- https://analyzer.clyso.com/
https://clyso.com | dan.vanders...@clyso.com

On Fri, Jan 10, 2025 at 2:06 PM Frank Schilder <fr...@dtu.dk> wrote:
>
> Hi Bailey,
>
> I already set that value very high:
>
> # ceph config get mds.ceph-12 mds_beacon_grace
> 600000.000000
>
> To no avail. The 15s heartbeat timeout comes from somewhere else. What I
> observe is that the MDS loads the stray buckets (up to 87Mio DNS/INOS) and as
> soon as that happened it seems to start doing something (RAM usage grows
> without the DNS/INOS changing any more). However, shortly after the timeout
> happens, everything comes to a standstill. I think the MONs keep the MDS
> assigned, but it's no longer part of the file system, or the actual MDS worker
> thread terminates with this timeout.
>
> It's reported as up and active, but this report seems just outdated, as all
> status queries to the MDS just hang. My suspicion is that the MONs don't kick
> it out yet (no fail-over triggered), but the rank is actually not really
> active. The report just doesn't update.
>
> I'm stuck here and am out of ideas what to do about it. Increasing the thread
> timeout would probably help, but I can't find a config option for that.
>
> I'm afraid I need to take a break. I will be looking at my e-mail in about 4h
> again. Would be great if there are some further ideas for how to proceed.
>
> Thanks so far and best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Bailey Allison <balli...@45drives.com>
> Sent: Friday, January 10, 2025 10:23 PM
> To: Frank Schilder; ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: Help needed, ceph fs down due to large stray dir
>
> Hi Frank,
>
> The value for that is mds_beacon_grace. The default is 15, but you can jack
> it up. Apply it to the monitor or global to take effect.
>
> Just to clarify too, does the MDS daemon come into up:active? If it
> does, are you able to also access that portion of the filesystem in that
> time?
>
> If you can access the filesystem, try running a stat on that portion
> with something like 'find . -ls' in a directory and see if the strays
> decrease.
>
> Regards,
>
> Bailey Allison
> Service Team Lead
> 45Drives, Ltd.
> 866-594-7199 x868
>
> On 1/10/25 17:18, Frank Schilder wrote:
> > Hi Bailey,
> >
> > thanks for your response. The MDS was actually unresponsive and I had to
> > restart it (ceph tell and ceph daemon commands were hanging, except for
> > "help"). It's currently in clientreplay and loading all the stuff again. I'm
> > really worried that this here is the rescue killer:
> >
> > heartbeat_map is_healthy 'MDSRank' had timed out after 15.000000954s
> >
> > Do you have any idea how to deal with this timeout? Somewhere in the process
> > the MDS seems to become unresponsive for too long and then stays
> > unresponsive after that.
> >
> > I have 4T swap now and the MDS comes up to the point where it actually
> > reports back a number for the stray items. However, some time later it
> > becomes unresponsive and the heartbeat messages start showing up. I don't
> > know how to get past this point.
> >
> > Best regards,
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > ________________________________________
> > From: Bailey Allison <balli...@45drives.com>
> > Sent: Friday, January 10, 2025 10:05 PM
> > To: ceph-users@ceph.io; Frank Schilder
> > Subject: Re: [ceph-users] Re: Help needed, ceph fs down due to large stray dir
> >
> > Frank,
> >
> > You mentioned previously a large number of strays on the MDS rank. Are
> > you able to check the rank again to see how many strays there are now?
> > We've previously had a similar issue, and once the MDS came back up we
> > had to stat the filesystem to decrease the number of strays; after
> > doing so everything returned to normal.
> >
> > ceph tell mds.X perf dump | jq .mds_cache
> >
> > Bailey Allison
> > Service Team Lead
> > 45Drives, Ltd.
> > 866-594-7199 x868
> >
> > On 1/10/25 16:42, Frank Schilder wrote:
> >> Hi all,
> >>
> >> I got the MDS up. However, after quite some time it's sitting with almost
> >> no CPU load:
> >>
> >> top - 21:40:02 up  2:49,  1 user,  load average: 0.00, 0.02, 0.34
> >> Tasks: 606 total,   1 running, 247 sleeping,   0 stopped,   0 zombie
> >> %Cpu(s):  0.0 us,  0.1 sy,  0.0 ni, 99.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> >> GiB Mem :  503.7 total,   12.3 free,  490.3 used,    1.1 buff/cache
> >> GiB Swap: 3577.0 total, 3367.0 free,  210.0 used.    2.9 avail Mem
> >>
> >>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> >>   59495 ceph      20   0  685.8g 477.9g   0.0g S   1.0  94.9  53:47.57 ceph-mds
> >>
> >> I'm not sure if it's doing anything at all. Only messages like these keep
> >> showing up in the log:
> >>
> >> 2025-01-10T21:38:08.459+0100 7f87ccd5f700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15.000000954s
> >> 2025-01-10T21:38:08.459+0100 7f87ccd5f700  0 mds.beacon.ceph-12 Skipping beacon heartbeat to monitors (last acked 3019.23s ago); MDS internal heartbeat is not healthy!
> >>
> >> The MDS cluster looks healthy from this output:
> >>
> >> # ceph fs status
> >> con-fs2 - 1554 clients
> >> =======
> >> RANK  STATE   MDS      ACTIVITY      DNS    INOS   DIRS   CAPS
> >>  0    active  ceph-15  Reqs:  0 /s   255k   248k   5434   1678
> >>  1    active  ceph-14  Reqs:  2 /s   402k   396k   26.7k  144k
> >>  2    active  ceph-12  Reqs:  0 /s   86.9M  86.9M  46.2k  3909
> >>  3    active  ceph-08  Reqs:  0 /s   637k   630k   2663   7457
> >>  4    active  ceph-11  Reqs:  0 /s   1496k  1492k  113k   103k
> >>  5    active  ceph-16  Reqs:  2 /s   775k   769k   65.3k  12.9k
> >>  6    active  ceph-24  Reqs:  0 /s   130k   113k   7294   8670
> >>  7    active  ceph-13  Reqs: 65 /s   3619k  3609k  469k   47.2k
> >>        POOL           TYPE      USED   AVAIL
> >>  con-fs2-meta1        metadata  4078G  7269G
> >>  con-fs2-meta2        data      0      7258G
> >>  con-fs2-data         data      1225T  2476T
> >>  con-fs2-data-ec-ssd  data      794G   22.6T
> >>  con-fs2-data2        data      5747T  2253T
> >> STANDBY MDS
> >>   ceph-09
> >>   ceph-10
> >>   ceph-23
> >>   ceph-17
> >> MDS version: ceph version 16.2.15 (618f440892089921c3e944a991122ddc44e60516) pacific (stable)
> >>
> >> Did it mark itself out of the cluster and is it waiting for the MON to fail
> >> it?? Please help.
> >>
> >> Best regards,
> >> =================
> >> Frank Schilder
> >> AIT Risø Campus
> >> Bygning 109, rum S14
> >>
> >> ________________________________________
> >> From: Frank Schilder <fr...@dtu.dk>
> >> Sent: Friday, January 10, 2025 8:51 PM
> >> To: Spencer Macphee
> >> Cc: ceph-users@ceph.io
> >> Subject: Re: [ceph-users] Help needed, ceph fs down due to large stray dir
> >>
> >> Hi all,
> >>
> >> I seem to have gotten the MDS up to the point that it reports stats.
> >> Does this mean anything:
> >>
> >> 2025-01-10T20:50:25.256+0100 7f87ccd5f700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15.000000954s
> >> 2025-01-10T20:50:25.256+0100 7f87ccd5f700  0 mds.beacon.ceph-12 Skipping beacon heartbeat to monitors (last acked 156.027s ago); MDS internal heartbeat is not healthy!
> >>
> >> I hope it doesn't get failed by some kind of timeout now.
> >>
> >> Best regards,
> >> =================
> >> Frank Schilder
> >> AIT Risø Campus
> >> Bygning 109, rum S14
> >>
> >> ________________________________________
> >> From: Spencer Macphee <spencerofsyd...@gmail.com>
> >> Sent: Friday, January 10, 2025 7:16 PM
> >> To: Frank Schilder
> >> Cc: ceph-users@ceph.io
> >> Subject: Re: [ceph-users] Help needed, ceph fs down due to large stray dir
> >>
> >> I had a similar issue some months ago where the MDS ended up using around 300
> >> gigabytes of RAM for a similar number of strays.
> >>
> >> You can get an idea of the strays kicking around by checking the omap keys
> >> of the stray objects in the cephfs metadata pool. Strays are tracked in
> >> objects: 600.00000000, 601.00000000, 602.00000000, etc... etc... That
> >> would also give you an indication if it's progressing at each restart.
> >> [A concrete sketch of this check is included at the end of this thread.]
> >>
> >> On Fri, Jan 10, 2025 at 1:30 PM Frank Schilder <fr...@dtu.dk> wrote:
> >> Hi all,
> >>
> >> we seem to have a serious issue with our file system; the ceph version is
> >> latest pacific. After a large cleanup operation we had an MDS rank with
> >> 100Mio stray entries (yes, one hundred million). Today we restarted this
> >> daemon, which cleans up the stray entries. It seems that this leads to a
> >> restart loop due to OOM. The rank becomes active and then starts pulling
> >> in DNS and INOS entries until all memory is exhausted.
> >>
> >> I have no idea if there is at least progress removing the stray items or
> >> if it starts from scratch every time. If it needs to pull as many DNS/INOS
> >> into cache as there are stray items, we don't have a server at hand with
> >> enough RAM.
> >>
> >> Q1: Is the MDS at least making progress in every restart iteration?
> >> Q2: If not, how do we get this rank up again?
> >> Q3: If we can't get this rank up soon, can we at least move directories
> >> away from this rank by pinning them to another rank?
> >>
> >> Currently, the rank in question reports .mds_cache.num_strays=0 in perf
> >> dump.
> >>
> >> =================
> >> Frank Schilder
> >> AIT Risø Campus
> >> Bygning 109, rum S14
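
For reference, the omap-key check Spencer describes above could look something
like the sketch below. It is only a sketch: the pool name comes from the
`ceph fs status` output earlier in the thread (con-fs2-meta1), and the object
names are the rank-0 stray objects he listed; other ranks keep their strays
under different inode numbers, so adjust the object names for the affected rank.

  for obj in 600.00000000 601.00000000 602.00000000; do   # ...and so on for the remaining stray objects
      echo -n "$obj: "
      rados -p con-fs2-meta1 listomapkeys "$obj" | wc -l
  done

If those counts shrink between restarts, the MDS is making progress on the
strays even while it looks idle in top.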