Hello Nima,

Not sure if you have found the root cause of the problem in the meantime.
Off the top of my head, in case any of it is useful:
- Quincy changes the default OSD op scheduler (osd_op_queue) from wpq to mclock_scheduler
- The default number of concurrent scrubs per OSD (osd_max_scrubs) is increased from 1 to 3
- There's a new RocksDB compact-on-deletion option that triggers compaction more frequently, based on the number of tombstones seen over a sliding window.

The last one, however, is very workload-dependent. What type of workload does the cluster serve?
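
If you want to rule out the first two, checking them and reverting to the
Pacific defaults for a test is cheap. A rough sketch (osd.0-style ids and the
exact values shown here are just examples; the scheduler change only takes
effect after the OSDs are restarted):

  # Check what the OSDs currently run with
  ceph config get osd osd_op_queue
  ceph config get osd osd_max_scrubs

  # Revert to the Pacific defaults for a test, then restart the OSDs
  # so the scheduler change takes effect
  ceph config set osd osd_op_queue wpq
  ceph config set osd osd_max_scrubs 1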

Cheers,
Enrico


On 4/15/25 12:13, Nima AbolhassanBeigi wrote:
Hello again,
We have done some investigating. Since our first message did not include much
information or context, let me add some details.

I looked through our dashboards, in particular the write latency panels in the
OSD Overview dashboard.
The latency for write operations has increased significantly, but notably the
physical write latency stayed more or less the same, while the write process
and write prepare latencies have increased dramatically.
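
For reference, these panels map to the OSD perf counters; a rough way to read
the same numbers directly on one OSD host (osd.0 here just as an example, jq
only for readability) would be:

  # Write-path latency counters of a single OSD
  ceph daemon osd.0 perf dump | jq '.osd | {op_w_latency, op_w_prepare_latency, op_w_process_latency}'

  # Recent ops with per-stage timestamps, useful for seeing where time is spent
  ceph daemon osd.0 dump_historic_ops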

In the end, the RocksDB tunings helped our situation and brought the latency
back down to more or less the pre-upgrade level.
We changed the default parameters to the following:
compression=kNoCompression,max_write_buffer_number=128,min_write_buffer_number_to_merge=16,compaction_style=kCompactionStyleLevel,write_buffer_size=8388608,max_background_jobs=4,level0_file_num_compaction_trigger=8,max_bytes_for_level_base=1073741824,max_bytes_for_level_multiplier=8,compaction_readahead_size=2MB,max_total_wal_size=1073741824,writable_file_max_buffer_size=0
This seems to have done the trick; however, we are still worried that we
haven't found the root cause of this problem and have only mitigated the
symptoms.
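
For completeness, a change like this is applied roughly as follows; the OSDs
need a restart to pick up a new bluestore_rocksdb_options value:

  # Set the tuned RocksDB options cluster-wide, then restart the OSDs
  ceph config set osd bluestore_rocksdb_options "<the full options string above>"

  # Verify what the OSDs will pick up
  ceph config get osd bluestore_rocksdb_options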

Another probably important detail is that the BlueFS allocator for our OSDs is
not the default hybrid but is set to the bitmap option. In addition, the
BlueStore allocator block (fragmentation) score of most of these OSDs is
almost 0.9, which I know is terrible, but what I don't understand is why this
problem only appeared post-upgrade. Could the terrible block scores be the
cause?
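
In case the exact numbers matter, this is roughly how such values can be read
(osd.0 as an example; the score command goes through the OSD admin socket):

  # Allocator settings the OSDs are configured with
  ceph config get osd bluestore_allocator
  ceph config get osd bluefs_allocator

  # Fragmentation score of the block allocator (a fragmentation_rating
  # between 0 and 1, where higher means more fragmented)
  ceph daemon osd.0 bluestore allocator score block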

Another interesting change was the memory usage of each OSD post-upgrade,
which dropped by nearly half, while the node's memory cache/buffer usage
increased.
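
A rough way to compare where each OSD's memory goes before and after (again
with osd.0 as an example):

  # Configured memory target per OSD
  ceph config get osd osd_memory_target

  # Breakdown of the OSD's memory pools (bluestore cache, pglog, ...)
  ceph daemon osd.0 dump_mempools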

I've put screenshots of the Grafana panels in this Google Doc:
<https://docs.google.com/document/d/1DSf4MJoze_BTSWAJWoYyQetA33P6fwpMavYXPdAk5kU/edit?usp=sharing>
We started our upgrade process on March 6 and continued it on March 7. We
tuned the RocksDB parameters on March 9, as the screenshots show, which is
when the latency drops.

We are still looking for the root cause, and any help in finding it would be
much appreciated.

Regards
Nima

On Mon, Apr 7, 2025 at 3:27 PM Nima AbolhassanBeigi <
nima.abolhassanbe...@gmail.com> wrote:

Hi dear Ceph community,

We have encountered an issue with our Ceph cluster after upgrading from
v16.2.13 to v17.2.7: the write latency on the OSDs has increased significantly
and doesn't seem to come back down.
The average write latency has almost doubled, and this has persisted since we
upgraded the OSDs.

We would appreciate it if anybody could help us figure this out.

Kind regards
Nima


--
Enrico Bocchi
CERN European Laboratory for Particle Physics
IT - Storage & Data Management  - General Storage Services
Mailbox: G20500 - Office: 31-2-010
1211 Genève 23
Switzerland
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
