Hey Igor,

On 07/07/2021 14:59, Igor Fedotov wrote:
after an upgrade from Ceph Nautilus to Octopus we ran into extreme performance issues leading to an unusable cluster after deleting a larger snapshot, with the cluster doing snaptrims; see e.g. https://tracker.ceph.com/issues/50511#note-13. Since this was not an issue prior to the upgrade, maybe the OMAP conversion of the OSDs caused this degradation of the RocksDB data structures, maybe not. (We were already running with bluefs_buffered_io=true, so that was NOT the issue here.)
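(For anyone wanting to compare their own setup: the value the OSDs actually run with can be checked with something like the following; osd.0 is just a placeholder id, not specific to our cluster.)

    # centrally configured value
    ceph config get osd bluefs_buffered_io

    # value currently in effect on one running OSD
    ceph daemon osd.0 config get bluefs_buffered_io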
It's hard to say what exactly caused the issue this time. Indeed, the OMAP conversion could have had some impact, since it performed bulk removals during the upgrade process - so the DB could have gained the critical mass to start lagging.
But I presume this is a one-time effect - it should vaporize after a DB compaction. Which doesn't mean that snaptrims or any other bulk removals are absolutely safe from then on, though.
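For reference, such a compaction can be triggered online, one OSD at a time, with something along these lines (osd.0 is just an example id; compaction adds extra I/O load, so doing it gradually is advisable):

    # ask a running OSD to compact its RocksDB
    ceph tell osd.0 compact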
Thank you very much for your quick and extensive reply!

If the OMAP conversion can have this effect, maybe it would be sensible to either trigger an immediate online compaction at the end of the conversion or at least add a note about it to the upgrade instructions. I suppose that with the EoL of Nautilus more and more clusters will now make the jump to the Octopus release and convert their OSDs' OMAP data in the process. Even if not every cluster's RocksDB would go over the edge, running a compaction should not hurt in any case, right?
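If one wanted to do it right after the conversion, before the OSD rejoins the cluster, an offline compaction should work as well; a rough sketch, assuming the default OSD data path and that the OSD is still stopped (id 0 again only as an example):

    # offline RocksDB compaction of a stopped OSD
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 compact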


Thanks again,


Christian

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
