On 7/29/20 7:47 PM, Raffael Bachmann wrote:
Hi Mark
I think it's 15 hours, not 15 days. But the compaction time really does seem slow. I'm destroying and recreating all NVMe OSDs one by one, and the recreated ones don't have latency problems and also compact the disk much faster.
This is from the last two hours:
Compaction Statistics
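For reference, the destroy-and-recreate cycle described above looks roughly like this per OSD. This is only a sketch; the OSD id (1) and device path (/dev/nvme0n1) are placeholders, not values from this cluster:

ceph osd out 1
systemctl stop ceph-osd@1
ceph osd destroy 1 --yes-i-really-mean-it
ceph-volume lvm zap /dev/nvme0n1 --destroy
ceph-volume lvm create --osd-id 1 --data /dev/nvme0n1

Using "destroy" rather than "purge" keeps the OSD id in the CRUSH map, so ceph-volume can recreate the OSD under the same id.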
Hi Igor
Thanks for your answer. All the disks had the slow latency warnings. "Had", because I think the problem is solved.
After moving some data, and almost losing the nearfull NVMe pool because one disk had so much latency that Ceph decided to mark it out, I could start destroying and recreating ea
Wow, that's crazy. You only had 13 compaction events for that OSD over roughly 15 days, but the average compaction time was 116 seconds! Notice, too, that the average compaction output size is 422MB with an average output throughput of only 3.5MB/s! That's really slow with RocksDB sitting on an
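As a back-of-the-envelope check on those figures: 422 MB per compaction at an average of 116 s per compaction works out to 422 / 116 ≈ 3.6 MB/s, so the two quoted averages agree with each other, and either number is far below what an NVMe drive should sustain.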
Hi Raffael,
wondering if all OSDs are suffering from slow compaction, or just the one which is "near full"?
Do other OSDs have those "log_latency_fn slow operation observed for" lines?
Have you tried the "osd bench" command for your OSDs? Does it show similar
numbers for every OSD?
You might want
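For concreteness, both checks can be run roughly like this (the log path and OSD id below are only examples):

grep "log_latency_fn slow operation observed" /var/log/ceph/ceph-osd.*.log
ceph tell osd.1 bench

The bench command writes a default 1 GiB of data to the OSD's object store and reports the achieved throughput, so comparing its output across OSDs should show whether only the nearfull one is slow.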
Hi Mark
Unfortunately it is the production cluster and I don't have another one :-(
This is the output of the log parser. I have nothing to compare it to.
Stupid me has no more logs from before the upgrade.
python ceph_rocksdb_log_parser.py ceph-osd.1.log
Compaction Statistics ceph-osd.1.
Hi Raffael,
Adam made a PR this year that shards rocksdb data across different
column families to help reduce compaction overhead. The goal is to
reduce write-amplification during compaction by storing multiple small
LSM hierarchies rather than 1 big one. We've seen evidence that this
lowe
Hi Wido
Thanks for the quick answer. They are all Intel P3520:
https://ark.intel.com/content/www/us/en/ark/products/88727/intel-ssd-dc-p3520-series-2-0tb-2-5in-pcie-3-0-x4-3d1-mlc.html
And this is ceph df
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
n
On 29/07/2020 14:52, Raffael Bachmann wrote:
Hi All,
I'm kind of crossposting this from here:
https://forum.proxmox.com/threads/i-o-wait-after-upgrade-5-x-to-6-2-and-ceph-luminous-to-nautilus.73581/
But since I'm more and more sure that it's a Ceph problem, I'll try my
luck here.
Since up