Hi Edouard,
----- On 8 May 25, at 10:15, Edouard FAZENDA <[email protected]> wrote:
> Dear all,
> I have the following issue on my Ceph cluster: MDSs behind on trimming, since upgrading with cephadm from 18.2.6 to 19.2.2.
> Here are some cluster logs:
> 8/5/25 09:00 AM [WRN] overall HEALTH_WARN 2 MDSs behind on trimming
> 8/5/25 08:50 AM [WRN] overall HEALTH_WARN 2 MDSs behind on trimming
> 8/5/25 08:40 AM [WRN] mds.cephfs.node2.isqjza(mds.0): Behind on trimming (326/128) max_segments: 128, num_segments: 326
> 8/5/25 08:40 AM [WRN] mds.cephfs.node1.ojmpnk(mds.0): Behind on trimming (326/128) max_segments: 128, num_segments: 326
> 8/5/25 08:40 AM [WRN] [WRN] MDS_TRIM: 2 MDSs behind on trimming
> 8/5/25 08:40 AM [WRN] Health detail: HEALTH_WARN 2 MDSs behind on trimming
> 8/5/25 08:33 AM [WRN] Health check update: 2 MDSs behind on trimming (MDS_TRIM)
> 8/5/25 08:33 AM [WRN] Health check failed: 1 MDSs behind on trimming (MDS_TRIM)
> 8/5/25 08:30 AM [INF] overall HEALTH_OK
> 8/5/25 08:22 AM [INF] Cluster is now healthy
> 8/5/25 08:22 AM [INF] Health check cleared: MDS_TRIM (was: 1 MDSs behind on trimming)
> 8/5/25 08:22 AM [INF] MDS health message cleared (mds.?): Behind on trimming (525/128)
> 8/5/25 08:22 AM [WRN] Health check update: 1 MDSs behind on trimming (MDS_TRIM)
> 8/5/25 08:22 AM [INF] MDS health message cleared (mds.?): Behind on trimming (525/128)
> 8/5/25 08:20 AM [WRN] overall HEALTH_WARN 2 MDSs behind on trimming
> 8/5/25 08:10 AM [WRN] mds.cephfs.node2.isqjza(mds.0): Behind on trimming (332/128) max_segments: 128, num_segments: 332
> 8/5/25 08:10 AM [WRN] mds.cephfs.node1.ojmpnk(mds.0): Behind on trimming (332/128) max_segments: 128, num_segments: 332
> 8/5/25 08:10 AM [WRN] [WRN] MDS_TRIM: 2 MDSs behind on trimming
> 8/5/25 08:10 AM [WRN] Health detail: HEALTH_WARN 2 MDSs behind on trimming
> 8/5/25 08:03 AM [WRN] Health check update: 2 MDSs behind on trimming (MDS_TRIM)
> 8/5/25 08:03 AM [WRN] Health check failed: 1 MDSs behind on trimming (MDS_TRIM)
> 8/5/25 08:00 AM [INF] overall HEALTH_OK
> # ceph fs status
> cephfs - 50 clients
> ======
> RANK  STATE           MDS                  ACTIVITY     DNS    INOS   DIRS   CAPS
>  0    active          cephfs.node1.ojmpnk  Reqs: 10 /s  305k   294k   91.8k  6818
> 0-s   standby-replay  cephfs.node2.isqjza  Evts:  0 /s  551k   243k   90.6k     0
>       POOL         TYPE      USED   AVAIL
> cephfs_metadata    metadata  2630M  2413G
> cephfs_data        data      12.7T  3620G
> STANDBY MDS
> cephfs.node3.vdicdn
> MDS version: ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)
> # ceph versions
> {
>     "mon": {
>         "ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)": 3
>     },
>     "mgr": {
>         "ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)": 2
>     },
>     "osd": {
>         "ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)": 18
>     },
>     "mds": {
>         "ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)": 3
>     },
>     "rgw": {
>         "ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)": 6
>     },
>     "overall": {
>         "ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)": 32
>     }
> }
> # ceph orch ps --daemon-type mds
> NAME                     HOST       PORTS  STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
> mds.cephfs.node1.ojmpnk  rke-sh1-1         running (18h)  4m ago     19M  1709M    -        19.2.2   4892a7ef541b  8dd8db30a1de
> mds.cephfs.node2.isqjza  rke-sh1-2         running (18h)  2m ago     3y   1720M    -        19.2.2   4892a7ef541b  7b9d5b692764
> mds.cephfs.node3.vdicdn  rke-sh1-3         running (18h)  108s ago   18M  27.9M    -        19.2.2   4892a7ef541b  d2de22a15e18
> root@node1:~# ceph config show-with-defaults mds.cephfs.rke-sh1-3.vdicdn | egrep "mds_cache_trim_threshold|mds_cache_trim_decay_rate|mds_cache_memory_limit|mds_recall_max_caps|mds_recall_max_decay_rate"
> mds_cache_memory_limit     4294967296  default
> mds_cache_trim_decay_rate  1.000000    default
> mds_cache_trim_threshold   262144      default
> mds_recall_max_caps        30000       default
> mds_recall_max_decay_rate  1.500000    default
> root@node2:~# ceph config show-with-defaults mds.cephfs.rke-sh1-2.isqjza | egrep "mds_cache_trim_threshold|mds_cache_trim_decay_rate|mds_cache_memory_limit|mds_recall_max_caps|mds_recall_max_decay_rate"
> mds_cache_memory_limit     4294967296  default
> mds_cache_trim_decay_rate  1.000000    default
> mds_cache_trim_threshold   262144      default
> mds_recall_max_caps        30000       default
> mds_recall_max_decay_rate  1.500000    default
> root@node3:~# ceph config show-with-defaults mds.cephfs.rke-sh1-1.ojmpnk | egrep "mds_cache_trim_threshold|mds_cache_trim_decay_rate|mds_cache_memory_limit|mds_recall_max_caps|mds_recall_max_decay_rate"
> mds_cache_memory_limit     4294967296  default
> mds_cache_trim_decay_rate  1.000000    default
> mds_cache_trim_threshold   262144      default
> mds_recall_max_caps        30000       default
> mds_recall_max_decay_rate  1.500000    default
> # ceph mds stat
> cephfs:1 {0=cephfs.node1.ojmpnk=up:active} 1 up:standby-replay 1 up:standby
> Do you have an idea of what could be happening? Should I increase
> mds_cache_trim_decay_rate?
> I saw the following issue: https://tracker.ceph.com/issues/66948 (Bug #66948: "Health check failed: 1 MDSs behind
> on trimming (MDS_TRIM)" in cluster log) and the related PR https://github.com/ceph/ceph/pull/60838 (squid: mds:
> trim mdlog when segments exceed threshold and trim was idle, by vshankar). Maybe it is related?
There's a fair chance, yes: as you said, the MDS_TRIM alert appeared right after the upgrade. Reef is immune to
this bug (it does not use the major/minor log segment changes), and Squid v19.2.2 does not contain the fix.
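In the meantime you can keep an eye on how far behind the journal actually is via the MDS perf counters; something
along these lines should do (just a sketch: the mds_log section filter and the exact counter names may vary a bit
between releases, so fall back to a plain perf dump piped through grep or jq if needed):

# ceph tell mds.cephfs.node1.ojmpnk perf dump mds_log
# ceph health detail | grep -i trim

In the mds_log output, "seg" is the current number of journal segments and should track the num_segments value
reported in the warning.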
You could try decreasing mds_recall_max_decay_rate to 1 (instead of 1.5).
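If you want to test that, it can be pushed through the central config, for example (this should apply at runtime,
no MDS restart needed as far as I know):

# ceph config set mds mds_recall_max_decay_rate 1.0
# ceph config get mds mds_recall_max_decay_rate

and then watch whether num_segments drops back below max_segments (128 here) over the next few minutes.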
Regards,
Frédéric.
> Thanks for the help 😊
> Best Regards, Edouard Fazenda.
> Swiss Cloud Provider
> Edouard Fazenda
> Technical Support
> Chemin du Curé-Desclouds, 2
> CH-1226 Thonex
> +41 22 869 04 40
> https://www.csti.ch/
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]