I have a CephFS where workloads use many small files. I see cache pressure / MDS_CLIENT_RECALL warnings once in a while (due to clients exceeding mds_max_caps_per_client), and it seems that if they linger too long, they escalate into further warnings, e.g. MDS_SLOW_REQUEST, and some directories lock up.
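For context, the cache is actually quite effective for this workload; the hit ratios can be computed from the counter dump quoted below. A minimal Python sketch, with the numbers hard-coded from that dump (adjust for your own output):

```python
# Hit ratios from the MDS perf counters quoted in this post
# ("counter dump" output); values are hard-coded for illustration.
cap_hits = 8_122_912_454
cap_miss = 497_593
dentry_lease_hits = 5_630_994_071
dentry_lease_miss = 174_816_044

cap_hit_ratio = cap_hits / (cap_hits + cap_miss)
dentry_hit_ratio = dentry_lease_hits / (dentry_lease_hits + dentry_lease_miss)

print(f"cap hit ratio:          {cap_hit_ratio:.6f}")
print(f"dentry lease hit ratio: {dentry_hit_ratio:.6f}")
```

So the cap hit ratio is well above 99.99%, which is part of why aggressive recall feels counterproductive here.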
Anyway - currently I have mds_max_caps_per_client set to 2M which, looking at the output of

    sudo ceph tell mds.<name> counter dump

    ..
    ..
    "counters": {
        "cap_hits": 8122912454,
        "cap_miss": 497593,
        "avg_read_latency": 0.000000028,
        "avg_write_latency": 0.000000000,
        "avg_metadata_latency": 0.000000000,
        "dentry_lease_hits": 5630994071,
        "dentry_lease_miss": 174816044,
        "opened_files": 65,
        "opened_inodes": 2106823,
        "pinned_icaps": 2106823,
        "total_inodes": 2106823,
        "total_read_ops": 309938,
        "total_read_size": 191662499168,
        "total_write_ops": 371242,
        "total_write_size": 414398493835
    ..
    ..

is not enough. There are, however, not many open files. Checking the "ceph_mds_client_metrics_<fs_name>_pinned_icaps" gauge in Prometheus tells the same story: the client is constantly hitting the max-caps ceiling for days at a time (clients have long-running jobs).

Can anyone share experience with raising mds_max_caps_per_client to accommodate such workloads? When changing it, should other config variables be taken into account, like:

    mds_cache_memory_limit - currently: 36GB
    mds_cache_trim_decay_rate - currently: 0.9
    mds_cache_trim_threshold - currently: 288358
    mds_recall_max_caps - currently: 33000
    mds_recall_max_decay_rate - currently: 1.35

Or should they be tuned on an observe-and-change-as-needed basis?

Thanks in advance.

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
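P.S. For reference, the knobs above can be changed at runtime with `ceph config set`; a sketch (the values here are placeholders for illustration, not recommendations):

```shell
# Sketch only - hypothetical values, not tuning advice.
# Raise the per-client cap limit (currently 2M in this setup):
ceph config set mds mds_max_caps_per_client 4000000

# Optionally let the MDS recall more caps per interval (placeholder value):
ceph config set mds mds_recall_max_caps 60000

# Verify what is actually in effect:
ceph config get mds mds_max_caps_per_client
```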