I have a CephFS where workloads use many small files. I see cache pressure / MDS_CLIENT_RECALL warnings once in a while (due to clients exceeding mds_max_caps_per_client), and it seems that if they linger too long, they escalate into further warnings, e.g. MDS_SLOW_REQUEST, and some directories lock up.
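For context, the cache is actually quite effective for this workload; the hit ratios can be computed from the counter dump quoted below. A minimal Python sketch, with the numbers hard-coded from that dump (adjust for your own output):

```python
# Hit ratios from the MDS perf counters quoted in this post
# ("counter dump" output); values are hard-coded for illustration.
cap_hits = 8_122_912_454
cap_miss = 497_593
dentry_lease_hits = 5_630_994_071
dentry_lease_miss = 174_816_044

cap_hit_ratio = cap_hits / (cap_hits + cap_miss)
dentry_hit_ratio = dentry_lease_hits / (dentry_lease_hits + dentry_lease_miss)

print(f"cap hit ratio:          {cap_hit_ratio:.6f}")
print(f"dentry lease hit ratio: {dentry_hit_ratio:.6f}")
```

So the cap hit ratio is well above 99.99%, which is part of why aggressive recall feels counterproductive here.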
Anyway - currently I have mds_max_caps_per_client set to 2M which, looking at the output of

    sudo ceph tell mds.<name> counter dump

    ..
    ..
    "counters": {
        "cap_hits": 8122912454,
        "cap_miss": 497593,
        "avg_read_latency": 0.000000028,
        "avg_write_latency": 0.000000000,
        "avg_metadata_latency": 0.000000000,
        "dentry_lease_hits": 5630994071,
        "dentry_lease_miss": 174816044,
        "opened_files": 65,
        "opened_inodes": 2106823,
        "pinned_icaps": 2106823,
        "total_inodes": 2106823,
        "total_read_ops": 309938,
        "total_read_size": 191662499168,
        "total_write_ops": 371242,
        "total_write_size": 414398493835
    ..
    ..

is not enough. There are, however, not many open files. Checking the "ceph_mds_client_metrics_<fs_name>_pinned_icaps" gauge in Prometheus tells the same story: the client is constantly hitting the max-caps ceiling for days at a time (clients have long-running jobs).

Can anyone share experience with raising mds_max_caps_per_client to accommodate such workloads? When changing it, should other config variables be taken into account, like:

    mds_cache_memory_limit - currently: 36GB
    mds_cache_trim_decay_rate - currently: 0.9
    mds_cache_trim_threshold - currently: 288358
    mds_recall_max_caps - currently: 33000
    mds_recall_max_decay_rate - currently: 1.35

Or should they be tuned on an observe-and-change-as-needed basis?

Thanks in advance.

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
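P.S. For reference, the knobs above can be changed at runtime with `ceph config set`; a sketch (the values here are placeholders for illustration, not recommendations):

```shell
# Sketch only - hypothetical values, not tuning advice.
# Raise the per-client cap limit (currently 2M in this setup):
ceph config set mds mds_max_caps_per_client 4000000

# Optionally let the MDS recall more caps per interval (placeholder value):
ceph config set mds mds_recall_max_caps 60000

# Verify what is actually in effect:
ceph config get mds mds_max_caps_per_client
```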