[ceph-users] High io wait when osd rocksdb is compacting

2020-07-29 Thread Raffael Bachmann
Hi All, I'm kind of crossposting this from here: https://forum.proxmox.com/threads/i-o-wait-after-upgrade-5-x-to-6-2-and-ceph-luminous-to-nautilus.73581/ But since I'm more and more sure that it's a Ceph problem, I'll try my luck here. Since updating from Luminous to Nautilus I have a big prob…
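A quick way to confirm that the io-wait spikes line up with RocksDB compaction is to watch the OSD log while a stall happens. This is only a sketch: the OSD id, log path and grep pattern are examples, and it assumes the OSD's debug_rocksdb level is high enough that RocksDB's compaction summaries end up in the log (raise debug_rocksdb if nothing shows up):

    # Timestamps of compaction activity for one OSD (osd.3 is an example id)
    grep -i 'compaction' /var/log/ceph/ceph-osd.3.log | less

    # Compare those timestamps with when the guests report high io wait.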

[ceph-users] Re: High io wait when osd rocksdb is compacting

2020-07-29 Thread Raffael Bachmann
… On 29/07/2020 15:04, Wido den Hollander wrote: On 29/07/2020 14:52, Raffael Bachmann wrote: Hi All, I'm kind of crossposting this from here: https://forum.proxmox.com/threads/i-o-wait-after-upgrade-5-x-to-6-2-and-ceph-luminous-to-nautilus.73581/ But since I'm more and more sure that i…

[ceph-users] Re: High io wait when osd rocksdb is compacting

2020-07-29 Thread Raffael Bachmann
…compaction events by running this script: https://github.com/ceph/cbt/blob/master/tools/ceph_rocksdb_log_parser.py That can give you an idea of how long your compaction events are lasting and what they are doing. Mark On 7/29/20 7:52 AM, Raffael Bachmann wrote: Hi All, I'm kind of…
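For reference, one way to run that parser against a single OSD's log. This is a sketch under a couple of assumptions: that the script takes the log file path as its argument, and that the OSD log (which carries the RocksDB output) is a valid input; the OSD id and paths are examples:

    # Grab the cbt repository, which ships the parser under tools/
    git clone https://github.com/ceph/cbt.git
    cd cbt

    # Feed it one OSD's log; use python2 instead if the script is python2-era
    python3 tools/ceph_rocksdb_log_parser.py /var/log/ceph/ceph-osd.3.log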

[ceph-users] Re: High io wait when osd rocksdb is compacting

2020-07-29 Thread Raffael Bachmann
"log_latency_fn slow operation observed for" lines? Have you tried "osd bench" command for your OSDs? Does it show similar numbers for every OSD? You might want to try manual offline DB compaction using ceph-kvstore-tool. Any improvements after that? Thanks, Igor On 7

[ceph-users] Re: High io wait when osd rocksdb is compacting

2020-07-29 Thread Raffael Bachmann
…TQo/edit?usp=sharing No wonder you are seeing periodic stalls. How many DBs per NVMe drive? What's your cluster workload typically like? Also, can you check whether the NVMe drive's aqu-sz is getting big while requests wait to be serviced? Mark On 7/29/20 8:35 AM, Raffael Bachmann wrote…
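To watch that queue depth during a compaction window, extended iostat statistics are enough; the device name is an example, and on older sysstat versions the column is labelled avgqu-sz rather than aqu-sz:

    # Extended per-device stats every 5 seconds; aqu-sz is the average
    # request queue length, %util the device saturation
    iostat -x nvme0n1 5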