Hi Edouard,

I had the same problem with one of my setups. I used this article to fix it:
https://www.suse.com/support/kb/doc/?id=000019740

I changed these values, increasing/decreasing them by 10%:

ceph config set mds mds_cache_trim_threshold xxK (should initially be increased)
ceph config set mds mds_cache_trim_decay_rate x.x (should initially be decreased)
ceph config set mds mds_cache_memory_limit xxxxxxxxxx (should initially be increased)
ceph config set mds mds_recall_max_caps xxxx (should initially be increased)
ceph config set mds mds_recall_max_decay_rate x.xx (should initially be decreased)
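For scale, using the defaults Edouard reports further down the thread (262144, 1.0, 4294967296, 30000 and 1.5 respectively), a first 10% step would look roughly like this, and you can iterate from there. These numbers are only illustrative, computed from those defaults, not values from my cluster:

# ~ +10% of the 262144 (256K) default
ceph config set mds mds_cache_trim_threshold 288358
# -10% of the 1.0 default
ceph config set mds mds_cache_trim_decay_rate 0.9
# ~ +10% of the 4294967296 (4 GiB) default
ceph config set mds mds_cache_memory_limit 4724464025
# +10% of the 30000 default
ceph config set mds mds_recall_max_caps 33000
# -10% of the 1.5 default
ceph config set mds mds_recall_max_decay_rate 1.35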
Dominique.

> -----Original Message-----
> From: Frédéric Nass <frederic.n...@univ-lorraine.fr>
> Sent: Tuesday, 13 May 2025 15:46
> To: Edouard FAZENDA <e.faze...@csti.ch>
> Cc: ceph-users <ceph-users@ceph.io>; Kelian SAINT-BONNET <k.saintbon...@csti.ch>
> Subject: [ceph-users] Re: 2 MDSs behind on trimming on my Ceph Cluster since the upgrade from 18.2.6 (reef) to 19.2.2 (squid)
>
> Hi Edouard,
>
> Sorry to hear that, although I'm not that surprised. I think you'll have to wait for the fix.
>
> Regards,
> Frédéric.
>
> ----- On 13 May 25, at 15:41, Edouard FAZENDA <e.faze...@csti.ch> wrote:
>
> > Dear Frédéric,
> >
> > I have applied the settings you provided; unfortunately, the cluster went back to green and then yellow again, with the MDSs behind on trimming once more.
> >
> > Thanks for the help
> >
> > Best Regards,
> >
> > Edouard Fazenda
> > Technical Support
> > Swiss Cloud Provider
> > Chemin du Curé-Desclouds, 2
> > CH-1226 Thonex
> > +41 22 869 04 40
> > www.csti.ch
> >
> > From: Frédéric Nass <frederic.n...@univ-lorraine.fr>
> > Sent: Friday, 9 May 2025 14:20
> > To: Edouard FAZENDA <e.faze...@csti.ch>
> > Cc: ceph-users <ceph-users@ceph.io>; Kelian SAINT-BONNET <k.saintbon...@csti.ch>
> > Subject: Re: [ceph-users] 2 MDSs behind on trimming on my Ceph Cluster since the upgrade from 18.2.6 (reef) to 19.2.2 (squid)
> >
> > ----- On 9 May 25, at 14:10, Frédéric Nass <frederic.n...@univ-lorraine.fr> wrote:
> >
> >> Hi Edouard,
> >>
> >> ----- On 8 May 25, at 10:15, Edouard FAZENDA <e.faze...@csti.ch> wrote:
> >>
> >>> Dear all,
> >>>
> >>> I have the following issue on my Ceph cluster: MDSs behind on trimming since the upgrade (done with cephadm) from 18.2.6 to 19.2.2.
> >>>
> >>> Here are some cluster logs:
> >>>
> >>> 8/5/25 09:00 AM [WRN] overall HEALTH_WARN 2 MDSs behind on trimming
> >>> 8/5/25 08:50 AM [WRN] overall HEALTH_WARN 2 MDSs behind on trimming
> >>> 8/5/25 08:40 AM [WRN] mds.cephfs.node2.isqjza(mds.0): Behind on trimming (326/128) max_segments: 128, num_segments: 326
> >>> 8/5/25 08:40 AM [WRN] mds.cephfs.node1.ojmpnk(mds.0): Behind on trimming (326/128) max_segments: 128, num_segments: 326
> >>> 8/5/25 08:40 AM [WRN] [WRN] MDS_TRIM: 2 MDSs behind on trimming
> >>> 8/5/25 08:40 AM [WRN] Health detail: HEALTH_WARN 2 MDSs behind on trimming
> >>> 8/5/25 08:33 AM [WRN] Health check update: 2 MDSs behind on trimming (MDS_TRIM)
> >>> 8/5/25 08:33 AM [WRN] Health check failed: 1 MDSs behind on trimming (MDS_TRIM)
> >>> 8/5/25 08:30 AM [INF] overall HEALTH_OK
> >>> 8/5/25 08:22 AM [INF] Cluster is now healthy
> >>> 8/5/25 08:22 AM [INF] Health check cleared: MDS_TRIM (was: 1 MDSs behind on trimming)
> >>> 8/5/25 08:22 AM [INF] MDS health message cleared (mds.?): Behind on trimming (525/128)
> >>> 8/5/25 08:22 AM [WRN] Health check update: 1 MDSs behind on trimming (MDS_TRIM)
> >>> 8/5/25 08:22 AM [INF] MDS health message cleared (mds.?): Behind on trimming (525/128)
> >>> 8/5/25 08:20 AM [WRN] overall HEALTH_WARN 2 MDSs behind on trimming
> >>> 8/5/25 08:10 AM [WRN] mds.cephfs.node2.isqjza(mds.0): Behind on trimming (332/128) max_segments: 128, num_segments: 332
> >>> 8/5/25 08:10 AM [WRN] mds.cephfs.node1.ojmpnk(mds.0): Behind on trimming (332/128) max_segments: 128, num_segments: 332
> >>> 8/5/25 08:10 AM [WRN] [WRN] MDS_TRIM: 2 MDSs behind on trimming
> >>> 8/5/25 08:10 AM [WRN] Health detail: HEALTH_WARN 2 MDSs behind on trimming
> >>> 8/5/25 08:03 AM [WRN] Health check update: 2 MDSs behind on trimming (MDS_TRIM)
> >>> 8/5/25 08:03 AM [WRN] Health check failed: 1 MDSs behind on trimming (MDS_TRIM)
> >>> 8/5/25 08:00 AM [INF] overall HEALTH_OK
> >>>
> >>> # ceph fs status
> >>> cephfs - 50 clients
> >>> ======
> >>> RANK  STATE           MDS                  ACTIVITY     DNS   INOS   DIRS   CAPS
> >>>  0    active          cephfs.node1.ojmpnk  Reqs: 10 /s  305k  294k  91.8k  6818
> >>> 0-s   standby-replay  cephfs.node2.isqjza  Evts:  0 /s  551k  243k  90.6k     0
> >>>       POOL           TYPE      USED   AVAIL
> >>> cephfs_metadata    metadata    2630M  2413G
> >>>   cephfs_data        data      12.7T  3620G
> >>> STANDBY MDS
> >>> cephfs.node3.vdicdn
> >>> MDS version: ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)
> >>>
> >>> # ceph versions
> >>> {
> >>>     "mon": {
> >>>         "ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)": 3
> >>>     },
> >>>     "mgr": {
> >>>         "ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)": 2
> >>>     },
> >>>     "osd": {
> >>>         "ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)": 18
> >>>     },
> >>>     "mds": {
> >>>         "ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)": 3
> >>>     },
> >>>     "rgw": {
> >>>         "ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)": 6
> >>>     },
> >>>     "overall": {
> >>>         "ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)": 32
> >>>     }
> >>> }
> >>>
> >>> # ceph orch ps --daemon-type mds
> >>> NAME                     HOST       PORTS  STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
> >>> mds.cephfs.node1.ojmpnk  rke-sh1-1         running (18h)  4m ago     19M  1709M    -        19.2.2   4892a7ef541b  8dd8db30a1de
> >>> mds.cephfs.node2.isqjza  rke-sh1-2         running (18h)  2m ago     3y   1720M    -        19.2.2   4892a7ef541b  7b9d5b692764
> >>> mds.cephfs.node3.vdicdn  rke-sh1-3         running (18h)  108s ago   18M  27.9M    -        19.2.2   4892a7ef541b  d2de22a15e18
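As a side note, the (326/128) in those warnings is num_segments against mds_log_max_segments. A quick way to watch the backlog directly, as a sketch (assuming root access to the host running the active MDS and the daemon names from the ceph orch ps output above; with cephadm the admin socket lives inside the daemon's container):

ceph health detail
# enter the active MDS container on its host...
cephadm enter --name mds.cephfs.node1.ojmpnk
# ...then read the journal counters; "seg" is the current segment count
ceph daemon mds.cephfs.node1.ojmpnk perf dump mds_log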
> >>>
> >>> root@node1:~# ceph config show-with-defaults mds.cephfs.rke-sh1-3.vdicdn | egrep "mds_cache_trim_threshold|mds_cache_trim_decay_rate|mds_cache_memory_limit|mds_recall_max_caps|mds_recall_max_decay_rate"
> >>> mds_cache_memory_limit     4294967296  default
> >>> mds_cache_trim_decay_rate  1.000000    default
> >>> mds_cache_trim_threshold   262144      default
> >>> mds_recall_max_caps        30000       default
> >>> mds_recall_max_decay_rate  1.500000    default
> >>>
> >>> root@node2:~# ceph config show-with-defaults mds.cephfs.rke-sh1-2.isqjza | egrep "mds_cache_trim_threshold|mds_cache_trim_decay_rate|mds_cache_memory_limit|mds_recall_max_caps|mds_recall_max_decay_rate"
> >>> mds_cache_memory_limit     4294967296  default
> >>> mds_cache_trim_decay_rate  1.000000    default
> >>> mds_cache_trim_threshold   262144      default
> >>> mds_recall_max_caps        30000       default
> >>> mds_recall_max_decay_rate  1.500000    default
> >>>
> >>> root@node3:~# ceph config show-with-defaults mds.cephfs.rke-sh1-1.ojmpnk | egrep "mds_cache_trim_threshold|mds_cache_trim_decay_rate|mds_cache_memory_limit|mds_recall_max_caps|mds_recall_max_decay_rate"
> >>> mds_cache_memory_limit     4294967296  default
> >>> mds_cache_trim_decay_rate  1.000000    default
> >>> mds_cache_trim_threshold   262144      default
> >>> mds_recall_max_caps        30000       default
> >>> mds_recall_max_decay_rate  1.500000    default
> >>>
> >>> # ceph mds stat
> >>> cephfs:1 {0=cephfs.node1.ojmpnk=up:active} 1 up:standby-replay 1 up:standby
> >>>
> >>> Do you have an idea of what could be happening? Should I increase mds_cache_trim_decay_rate?
> >>>
> >>> I saw the following issue, maybe related:
> >>> https://tracker.ceph.com/issues/66948 (Bug #66948: "mon.a (mon.0) 326 : cluster [WRN] Health check failed: 1 MDSs behind on trimming (MDS_TRIM)" in cluster log - CephFS - Ceph)
> >>> and its fix:
> >>> https://github.com/ceph/ceph/pull/60838 (squid: mds: trim mdlog when segments exceed threshold and trim was idle, by vshankar)
> >>
> >> There's a fair chance, yes. As you said, the MDS_TRIM alert came after the upgrade; Reef is immune to this bug (it does not use the major/minor log segment changes) and Squid v19.2.2 does not contain the fix.
> >>
> >> You could try decreasing mds_recall_max_decay_rate to 1 (instead of 1.5).
> >>
> >> Regards,
> >>
> >> Frédéric.
> >
> > Apologies, the email was sent prematurely...
> >
> > What you could try while waiting for the fix is to:
> >
> > - increase mds_log_max_segments to 256 (defaults to 128)
> >
> > And possibly:
> >
> > - reduce mds_recall_max_decay_rate to 1 (defaults to 1.5)
> > - reduce mds_recall_max_decay_threshold to 32K (defaults to 128K)
> > - increase mds_recall_global_max_decay_threshold to 256K (defaults to 128K)
> >
> > to allow the MDS to reclaim client caps more aggressively, though I'm not sure this will prevent the MDS_TRIM alert from being triggered.
> >
> > Regards,
> >
> > Frédéric.
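Spelling Frédéric's four suggestions out as commands, one possible way to apply them (32K and 256K written out in bytes; any of them can be reverted with "ceph config rm mds <option>"):

ceph config set mds mds_log_max_segments 256
ceph config set mds mds_recall_max_decay_rate 1.0
# 32K
ceph config set mds mds_recall_max_decay_threshold 32768
# 256K
ceph config set mds mds_recall_global_max_decay_threshold 262144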
> >>> Thanks for the help 😊
> >>>
> >>> Best Regards, Edouard Fazenda.

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io