Interesting, a few weeks ago I added a new disk to each node of my 3-node cluster
and saw the same 2 MB/s recovery. What I had noticed was that one OSD was
using very high CPU and seemed to be the primary OSD for the affected
PGs. I couldn't find anything overly wrong with the OSD, network …
Hi Zoltan,
Sadly it looks like some of the debug symbols are messed up, which makes
things a little rough to debug from this. On the write path, if you look
at the bstore_kv_sync thread:
Good state write test:
    + 86.00% FileJournal::_open_file(long, long, bool)
    |+ 86.00% ???
    +
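Frames like FileJournal::_open_file under a BlueStore write path, together
with the ??? entries, are what suggest the symbols resolved against the
wrong addresses. One way to get a cleaner profile, assuming an EL-style
host (the pid is a placeholder):

    # install matching debug symbols (CentOS 7; on EL8: dnf debuginfo-install ceph-osd)
    debuginfo-install ceph-osd
    # sample the OSD for 30 seconds with call graphs, then inspect
    perf record -g -p <osd-pid> -- sleep 30
    perf report --stdio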
An update on my testing.
I have a 6-node test Ceph cluster deployed as 1 admin and 5 OSD nodes. Each
node is running CentOS 7 + podman with a cephadm deployment of Octopus.
Other than scale, this mirrors my production setup.
On one of the OSD nodes I did a fresh install of Rocky Linux 8, being sure
to …
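For reference, a cluster like that is typically stood up along these lines
(the IPs and hostname are placeholders; this is a minimal cephadm sketch,
not the exact commands used here):

    # bootstrap the cluster on the admin node
    cephadm bootstrap --mon-ip 10.0.0.10
    # register each OSD host with the orchestrator
    ceph orch host add osd1 10.0.0.11
    # create OSDs on every unused disk cephadm discovers
    ceph orch apply osd --all-available-devices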
Hi,
I created a tracker issue: https://tracker.ceph.com/issues/57115
Thanks,
Eugen
Quoting Dhairya Parmar:
Hi there,
This thread contains some really insightful information. Thanks Eugen for
sharing the explanation from the SUSE team. The doc can definitely be updated
with this; it might help …
6 hosts with 2 x 10G NICs, data in a 2+2 EC pool. 17.2.0, upgraded from
Pacific.
  cluster:
    id:
    health: HEALTH_WARN
            2 host(s) running different kernel versions
            2071 pgs not deep-scrubbed in time
            837 pgs not scrubbed in time

  services:
    mon: 5 da…
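If useful, the scrub backlog itself can be inspected and nudged along like
this (the PG id is a placeholder and the config value is only an example,
not a recommendation):

    # list exactly which PGs are behind on (deep-)scrub
    ceph health detail | grep -E 'not (deep-)?scrubbed'
    # allow more concurrent scrubs per OSD
    ceph config set osd osd_max_scrubs 2
    # kick an individual PG manually
    ceph pg deep-scrub 2.1a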