On 1/29/23 00:50, Matt Vandermeulen wrote:
I've observed a similar horror when upgrading a cluster from Luminous to
Nautilus, which had the same effect of an overwhelming amount of
snaptrim making the cluster unusable.
In our case, we held its hand by setting all OSDs to have zero max
trimming PGs, unsetting nosnaptrim, and then slowly
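
For reference, here is a minimal sketch of how I read the procedure
quoted above. osd_max_trimming_pgs and the nosnaptrim flag are the real
names; the CEPH variable and both function names are made up for this
sketch, and using "ceph config set" (rather than injectargs) is my
assumption. The quote is cut off after "slowly", so the per-OSD helper is
only a guess at one way to raise the limit gradually.

```shell
#!/bin/sh
# Sketch only. CEPH defaults to "echo ceph", so commands are printed and
# nothing is changed; set CEPH=ceph to run them against a real cluster.
# CEPH is a hypothetical variable introduced for this example.
CEPH="${CEPH:-echo ceph}"

# Hypothetical helper: forbid every OSD from trimming any PG, then lift
# the cluster-wide nosnaptrim flag. With the limit at 0, no trimming
# should start yet.
throttle_then_unset() {
    $CEPH config set osd osd_max_trimming_pgs 0
    $CEPH osd unset nosnaptrim
}

# Hypothetical helper (assumption, not from the quoted mail): allow a
# single OSD, given by numeric id, to trim one PG at a time.
allow_trim_on() {
    $CEPH config set "osd.$1" osd_max_trimming_pgs 1
}
```

Calling throttle_then_unset and then allow_trim_on with one OSD id at a
time, while watching load in between, keeps the dry-run default until
CEPH=ceph is set explicitly.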
After some investigation this is what I'm seeing:
- OSD processes get stuck at 100% CPU or more as soon as I run "ceph osd
unset nosnaptrim". They stay at 100% CPU even after I run "ceph osd set
nosnaptrim" again.
They stayed like that for at least 26 hours. Some quick benchmarks don't
show a reduction of the performance