Hello all,

We use Ceph (v18.2.2) and Rook (1.14.3) as the CSI for a Kubernetes environment. Last week, we had a problem with the MDS falling behind on trimming every 4-5 days (GitHub issue link <https://github.com/rook/rook/issues/14220>). We resolved the issue using the steps outlined in the GitHub issue.

We have 3 hosts (I know, I need to increase this as soon as possible, and I will!) and 6 OSDs. The commands we ran were:

    ceph config set mds mds_dir_max_commit_size 80
    ceph fs fail <fs_name>
    ceph fs set <fs_name> joinable true

Since then, the snaptrim queue for our PGs has stopped decreasing. All PGs of our CephFS are in either the active+clean+snaptrim_wait or the active+clean+snaptrim state. For example, PG 3.12 is in the active+clean+snaptrim state, and its snap_trimq_len was 4077 yesterday but has grown to 4538 today.
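
For reference, this is roughly how I'm watching the queue, in case I'm reading the wrong counter (the jq paths match what I see on Reef and may differ on other releases):

    # queue length for a single PG
    ceph pg 3.12 query | grep -m1 snap_trimq_len

    # or for every PG at once, sorted by queue length
    ceph pg dump --format json 2>/dev/null \
      | jq -r '.pg_map.pg_stats[] | [.pgid, .state, .snap_trimq_len] | @tsv' \
      | sort -k3 -n -r | head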

I increased osd_snap_trim_priority to 10 (ceph config set osd osd_snap_trim_priority 10), but it didn't help. Only the PGs of our CephFS have this problem.
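
In case one of the other throttles is the real bottleneck, these are the knobs I understand to gate snap trimming and how I'm checking them (option names as I read them in the Reef docs; corrections welcome):

    # confirm the priority change actually reached a running OSD (osd.0 as an example)
    ceph config show osd.0 osd_snap_trim_priority

    # other settings that, as far as I understand, throttle snap trimming
    ceph config get osd osd_snap_trim_sleep_hdd
    ceph config get osd osd_snap_trim_sleep_ssd
    ceph config get osd osd_pg_max_concurrent_snap_trims
    ceph config get osd osd_max_trimming_pgs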

Do you have any ideas on how we can resolve this issue?

Thanks in advance,

Giovanna