Hello,
As an update: we were able to clear the queue by repeering all PGs
which had outstanding entries in their snaptrim queues. After this
process completed and we confirmed that no PGs remained with non-zero
length queues, we re-enabled our snapshot schedule. Several days have
now passed and
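
For anyone hitting the same backlog, the repeer pass described above can be
sketched roughly like this. It's only a sketch: it assumes jq is installed
and that your Ceph release reports snap_trimq_len at the top level of
"ceph pg dump pgs --format json" (some releases nest the stats under
.pg_map, so adjust the jq path to taste):

```shell
# Repeer every PG that still has entries in its snaptrim queue.
ceph pg dump pgs --format json 2>/dev/null \
  | jq -r '.pg_stats[] | select(.snap_trimq_len > 0) | .pgid' \
  | while read -r pgid; do
      echo "repeering ${pgid}"
      ceph pg repeer "${pgid}"
    done
```

Re-run it until the dump shows no PGs with a non-zero queue, then it
should be safe to re-enable the snapshot schedule.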

Hi,
Yes, restarting an OSD also works to re-peer and "kick" the
snaptrimming process.
(In the ticket we first noticed this because snap trimming restarted
after an unrelated OSD crashed/restarted).
Please feel free to add your experience to that ticket.
> monitoring snaptrimq
This is from our lo
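
For a rough cluster-wide gauge of the backlog, something like the
following works (again assuming jq, and the snap_trimq_len field that
recent Ceph releases expose in the pg dump):

```shell
# Total snaptrim queue length across all PGs; should trend to 0
# once trimming is healthy. Adjust the jq path if your release
# nests the stats under .pg_map.
ceph pg dump pgs --format json 2>/dev/null \
  | jq '[.pg_stats[].snap_trimq_len] | add'
```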

Dan,
Thank you for replying. Since I posted I did some more digging. It
really seemed as if snaptrim simply wasn't being processed. The output
of "ceph health detail" showed that PG 3.9b had the longest queue. I
examined this PG and saw that its primary was osd.8, so I manually
restarted that daemon.
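
The lookup-and-restart step can be sketched as follows. The PG id 3.9b
comes from the health output above; the ceph-osd@ unit name assumes a
package-based (non-cephadm) deployment, where cephadm setups use a
different systemd unit:

```shell
# Find the acting primary of the slow PG...
primary=$(ceph pg map 3.9b --format json | jq -r '.acting_primary')
echo "primary of 3.9b is osd.${primary}"

# ...then, on the host carrying that OSD, restart the daemon to
# force a repeer and kick snaptrimming:
sudo systemctl restart "ceph-osd@${primary}"
```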

Hi David,
We observed the same here: https://tracker.ceph.com/issues/52026
You can poke the trimming by repeering the PGs.
Also, depending on your hardware, the defaults for osd_snap_trim_sleep
might be far too conservative.
We use osd_snap_trim_sleep = 0.1 on our mixed hdd block / ssd block.db OSDs.
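
On releases with the central config store, that value can be applied
cluster-wide along these lines (note that a non-zero osd_snap_trim_sleep
overrides the per-device-class _hdd/_ssd/_hybrid defaults, so only set
it if that's what you want):

```shell
ceph config set osd osd_snap_trim_sleep 0.1
ceph config get osd osd_snap_trim_sleep
```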