[ceph-users] Re: High io wait when osd rocksdb is compacting

2020-07-29 Thread Mark Nelson
On 7/29/20 7:47 PM, Raffael Bachmann wrote: Hi Mark, I think it's 15 hours, not 15 days. But the compaction time really does seem slow. I'm destroying and recreating all NVMe OSDs one by one, and the recreated ones don't have latency problems and are also much faster at compacting the disk. This…

[ceph-users] Re: High io wait when osd rocksdb is compacting

2020-07-29 Thread Raffael Bachmann
Hi Mark, I think it's 15 hours, not 15 days. But the compaction time really does seem slow. I'm destroying and recreating all NVMe OSDs one by one, and the recreated ones don't have latency problems and are also much faster at compacting the disk. This is from the last two hours: Compaction Statistics…
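
For reference, the one-at-a-time rebuild on Nautilus usually boils down to something like the following (a sketch, not Raffael's exact commands; OSD id 1 and /dev/nvme0n1 are placeholders):

    # Drain, destroy and recreate a single OSD; wait for HEALTH_OK
    # between OSDs so you never lose more than one replica at a time.
    ceph osd out 1                              # start draining PGs off osd.1
    systemctl stop ceph-osd@1                   # once rebalancing is done
    ceph osd destroy 1 --yes-i-really-mean-it   # keep the id, drop the auth key
    ceph-volume lvm zap /dev/nvme0n1 --destroy  # wipe the old LVM/bluestore data
    ceph-volume lvm create --osd-id 1 --data /dev/nvme0n1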

[ceph-users] Re: High io wait when osd rocksdb is compacting

2020-07-29 Thread Raffael Bachmann
Hi Igor, Thanks for your answer. All the disks had latency warnings. "Had", because I think the problem is solved. After moving some data and almost losing the nearfull NVMe pool, because one disk had so much latency that Ceph decided to mark it out, I could start destroying and recreating each…
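
A quick way to check which disks are throwing those warnings (a sketch, assuming the default /var/log/ceph log layout):

    # Count the bluestore slow-operation warnings per OSD log.
    for f in /var/log/ceph/ceph-osd.*.log; do
        echo "$f: $(grep -c 'log_latency_fn slow operation observed' "$f")"
    done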

[ceph-users] Re: High io wait when osd rocksdb is compacting

2020-07-29 Thread Mark Nelson
Wow, that's crazy.  You only had 13 compaction events for that OSD over roughly 15 days, but the average compaction time was 116 seconds!  Notice too, though, that the average compaction output size is 422MB with an average output throughput of 3.5MB/s!  That's really slow with RocksDB sitting on an…
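
The throughput figure follows directly from the two averages:

    422 MB average output / 116 s average compaction time ≈ 3.6 MB/s

That is orders of magnitude below what an NVMe device can sustain for sequential writes, which is what makes the numbers so suspicious.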

[ceph-users] Re: High io wait when osd rocksdb is compacting

2020-07-29 Thread Igor Fedotov
Hi Raffael, wondering if all OSDs are suffering from slow compaction or just the one which is "near full"? Do other OSDs have those "log_latency_fn slow operation observed for" lines? Have you tried the "osd bench" command for your OSDs? Does it show similar numbers for every OSD? You might want…
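
The bench command Igor refers to is run through ceph tell; by default it writes 1 GiB in 4 MiB blocks and reports the throughput:

    # Bench osd.0 with the defaults (1 GiB total, 4 MiB writes).
    ceph tell osd.0 bench
    # Same total in 4 KiB blocks to stress small writes:
    ceph tell osd.0 bench 1073741824 4096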

[ceph-users] Re: High io wait when osd rocksdb is compacting

2020-07-29 Thread Raffael Bachmann
Hi Mark, Unfortunately it is the production cluster and I don't have another one :-( This is the output of the log parser. I have nothing to compare them to. Stupid me has no more logs from before the upgrade. python ceph_rocksdb_log_parser.py ceph-osd.1.log Compaction Statistics  ceph-osd.1…
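
For anyone without the parser script handy, the raw material it summarizes can be eyeballed directly (a rough sketch, assuming the default debug_rocksdb level so RocksDB's event-log lines land in the OSD log):

    # Pull the per-compaction summary events out of an OSD log.
    grep 'compaction_finished' /var/log/ceph/ceph-osd.1.log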

[ceph-users] Re: High io wait when osd rocksdb is compacting

2020-07-29 Thread Mark Nelson
Hi Raffael, Adam made a PR this year that shards RocksDB data across different column families to help reduce compaction overhead.  The goal is to reduce write amplification during compaction by storing multiple small LSM hierarchies rather than one big one.  We've seen evidence that this lowers…
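
The column-family sharding Mark describes did not ship in Nautilus; on releases that do have it, an existing OSD can be resharded offline with ceph-bluestore-tool. A sketch only (the sharding spec below is the default that later shipped, shown for illustration):

    # Offline reshard of osd.1's RocksDB into multiple column families.
    systemctl stop ceph-osd@1
    ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-1 \
        --sharding "m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} L P" \
        reshard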

[ceph-users] Re: High io wait when osd rocksdb is compacting

2020-07-29 Thread Raffael Bachmann
Hi Wido, Thanks for the quick answer. They are all Intel P3520: https://ark.intel.com/content/www/us/en/ark/products/88727/intel-ssd-dc-p3520-series-2-0tb-2-5in-pcie-3-0-x4-3d1-mlc.html And this is ceph df: RAW STORAGE: CLASS SIZE AVAIL USED RAW USED %RAW USED n…
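
When one pool is nearfull, the per-OSD numbers are usually more telling than the pool totals; for example:

    # Show utilization and PG count per OSD, grouped by the CRUSH tree.
    ceph osd df tree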

[ceph-users] Re: High io wait when osd rocksdb is compacting

2020-07-29 Thread Wido den Hollander
On 29/07/2020 14:52, Raffael Bachmann wrote: Hi All, I'm kind of crossposting this from here: https://forum.proxmox.com/threads/i-o-wait-after-upgrade-5-x-to-6-2-and-ceph-luminous-to-nautilus.73581/ But since I'm more and more sure that it's a Ceph problem, I'll try my luck here. Since up…
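
A simple way to watch the io wait at the device level while the problem happens (iostat is part of the sysstat package):

    # Extended per-device stats every second; watch await and %util
    # on the NVMe devices backing the OSDs.
    iostat -x 1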