Hello, ceph users,

TL;DR: how can I look into ceph cluster write latency issues?

Details: we have an HDD-based cluster (with NVMe for metadata), about 20 hosts,
2 OSDs per host, mostly used as RBD storage for QEMU/KVM virtual machines.
From time to time, our users complain about write latencies inside their VMs.

I would like to be able to see when the cluster is overloaded or when
the write latency is bad.

What I have tried so far:

1) fio inside the KVM virtual machine:
fio --ioengine=libaio --direct=1 --rw=write --numjobs=1 --bs=1M --iodepth=16 \
    --size=5G --name=/var/tmp/fio-test
[...]
  write: IOPS=63, BW=63.3MiB/s (66.4MB/s)(5120MiB/80863msec); 0 zone resets

- I usually get about 60 to 150 IOPS for 1 MB writes.
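
fio can also report completion-latency percentiles directly, which is closer
to what I actually want to know. Something like the following should do it,
with a small block size and iodepth=1 to expose per-write latency (the flags
are from my reading of the fio man page and the file name is illustrative,
so treat this as a sketch, not something I have verified on this cluster):

fio --ioengine=libaio --direct=1 --rw=randwrite --numjobs=1 --bs=4k \
    --iodepth=1 --runtime=60 --time_based --size=1G \
    --percentile_list=50:95:99:99.9 --name=/var/tmp/fio-lat-test

and then read the "clat percentiles" section of its output.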

2) PostgreSQL inside the KVM virtual machine, running many tiny INSERTs
as separate transactions for about 10 seconds (a sketch of the loop follows
the numbers below). This is where I clearly see the latency spikes:

Wed Dec 18 09:20:21 PM CET 2024 406.062 txn/s
Wed Dec 18 09:25:21 PM CET 2024 318.974 txn/s
Wed Dec 18 09:30:21 PM CET 2024 285.591 txn/s
Wed Dec 18 09:35:21 PM CET 2024 191.804 txn/s
Wed Dec 18 09:40:22 PM CET 2024 246.679 txn/s
Wed Dec 18 09:45:22 PM CET 2024 201.005 txn/s
Wed Dec 18 09:50:22 PM CET 2024 153.206 txn/s
Wed Dec 18 09:55:22 PM CET 2024 124.546 txn/s
Wed Dec 18 10:00:23 PM CET 2024 33.094 txn/s
Wed Dec 18 10:05:23 PM CET 2024 82.659 txn/s
Wed Dec 18 10:10:23 PM CET 2024 292.544 txn/s
Wed Dec 18 10:15:24 PM CET 2024 453.366 txn/s
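
The loop is roughly the following. The table name latency_test and the
connection details are simplified for illustration; each psql invocation
commits as its own transaction thanks to autocommit:

count=0
end=$((SECONDS + 10))
while [ $SECONDS -lt $end ]; do
    # one tiny INSERT per transaction
    psql -q -c "INSERT INTO latency_test (ts) VALUES (now());"
    count=$((count + 1))
done
echo "$(date) $(echo "scale=3; $count / 10" | bc) txn/s"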

The drawback of both the fio and the PostgreSQL benchmarks is that I am
unnecessarily loading the cluster with additional work just to measure
latency. Also, I am not covering the whole cluster, only the OSDs on which
that particular VM happens to have its data.

3) ceph osd perf
I don't see any single obviously overloaded OSD here, but the latencies
vary nevertheless. Here are statistics computed across all OSDs from the
"ceph osd perf" output (the two columns are commit_latency(ms) and
apply_latency(ms); they are always identical on our cluster):

Fri Jan  3 10:12:41 AM CET 2025
average 9       9
median  5       5
3rd-q   13      13
max     70      70
Fri Jan  3 10:13:42 AM CET 2025
average 5       5
median  3       3
3rd-q   10      10
max     31      31
Fri Jan  3 10:14:42 AM CET 2025
average 3       3
median  2       2
3rd-q   4       4
max     19      19
Fri Jan  3 10:15:42 AM CET 2025
average 5       5
median  1       1
3rd-q   3       3
max     63      63
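
For completeness, the statistics above come from a small wrapper roughly
like the following, shown for the commit_latency column (column 2 of the
"ceph osd perf" output as I see it here; the quartile indexing is
approximate):

date
ceph osd perf | awk 'NR > 1 { print $2 }' | sort -n | awk '
    { v[NR] = $1; sum += $1 }          # collect sorted latencies
    END {
        printf "average\t%d\n", sum / NR
        printf "median\t%d\n", v[int(NR / 2)]
        printf "3rd-q\t%d\n", v[int(NR * 3 / 4)]
        printf "max\t%d\n", v[NR]
    }'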

However, I am not sure what these numbers actually mean: what timespan
do they cover? I would like to have something like "in the last 5 minutes,
99% of all writes committed under XXX ms". Can Ceph tell me that?
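
The closest thing I have found so far are the per-OSD perf counters from
the admin socket, e.g. (if I read the JSON structure right):

ceph daemon osd.0 perf dump | jq '.osd.op_w_latency'

which prints avgcount/sum/avgtime for write operations. But as far as
I can tell, these counters are cumulative since the OSD started, so a
5-minute window would mean sampling and diffing them myself, and there
is no percentile in there anyway.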

Apart from buying faster hardware, what else can I try in order to improve
the write latency of QEMU/KVM-based VMs with RBD images?

Thanks for any hints.

-Yenya

-- 
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| https://www.fi.muni.cz/~kas/                        GPG: 4096R/A45477D5 |
    We all agree on the necessity of compromise. We just can't agree on
    when it's necessary to compromise.                     --Larry Wall