On 2025-03-20 15:15, Chris Palmer wrote:
Hi,
* Ceph cluster 19.2.1 with 3 nodes, 4 x SATA disks with shared NVMe DB/WAL, single 10G NICs
* Proxmox 8.3.5 cluster with 2 nodes (separate from the Ceph nodes), single 10G NICs, single 1G NICs for corosync
* Test VM was using KRBD, R3 pool on HDD, iothread=1, aio=io_uring, cache=writeback
With "cache=writeback" you are basically measuring the memory bandwidth of the
KVM host, not Ceph performance. This gets even worse with --size=1G
instead of the more realistic 10G.
Please disable the writeback cache and increase the file size to at
least 10G in order to measure Ceph RBD performance.
Last but not least, to make the benchmark independent of
previous runs, please drop the caches.
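A minimal sketch of that procedure inside the guest, assuming bash and root privileges. The job parameters mirror the fio runs in this thread; --direct=1 (bypass the guest page cache) and the DRY_RUN switch are additions for illustration and were not part of the original runs, and the function name run_registry_read is hypothetical. Disabling the writeback cache itself happens on the Proxmox side in the VM's disk options, which this script does not touch.

```shell
#!/usr/bin/env bash
# Sketch: cold-cache 4k random-read benchmark with a realistic file size.
# Assumptions: run as root inside the test VM; --direct=1 is an addition,
# not taken from the original fio runs.
run_registry_read() {
    local size="${1:-10G}"   # at least 10G, per the advice above
    set -- --name=registry-read --ioengine=libaio --direct=1 --rw=randread \
        --bs=4k --numjobs=4 --size="$size" --runtime=60 \
        --group_reporting --iodepth=16
    if [ -n "${DRY_RUN:-}" ]; then
        echo "fio $*"        # print the command instead of running it
        return 0
    fi
    sync                              # flush dirty pages; drop_caches only frees clean ones
    echo 3 > /proc/sys/vm/drop_caches # drop page cache plus dentries and inodes
    fio "$@"
}
```

`DRY_RUN=1 run_registry_read 10G` only prints the fio command line; `run_registry_read 10G` flushes, drops the caches and runs the benchmark.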
Overview Results for blocksize=4K
=================================
NVMe with cache=none      (10 Gb fibre) | read: IOPS = 12.9k, BW =  50.3 MiB/s
NVMe with cache=writeback (10 Gb fibre) | read: IOPS = 58.3k, BW = 228.0 MiB/s
HDD  with cache=none      (10 Gb fibre) | read: IOPS = 0.34k, BW =   1.3 MiB/s
HDD  with cache=writeback (10 Gb fibre) | read: IOPS = 53.8k, BW = 210.0 MiB/s
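The bandwidth column is simply IOPS multiplied by the 4 KiB block size, which makes the table easy to sanity-check. A minimal sketch; the helper name iops_to_mib is hypothetical:

```shell
#!/usr/bin/env bash
# Convert fio IOPS at a fixed block size into MiB/s: BW = IOPS * block size.
iops_to_mib() {
    local iops="$1" bs_bytes="$2"
    awk -v i="$iops" -v b="$bs_bytes" 'BEGIN { printf "%.1f\n", i * b / 1048576 }'
}

# IOPS values from the overview table above, at bs=4k (4096 bytes).
for iops in 12900 58300 335 53800; do
    echo "$iops IOPS x 4 KiB = $(iops_to_mib "$iops" 4096) MiB/s"
done
```

Small differences from the table (for example 227.7 vs 228.0 MiB/s) come from fio rounding the displayed IOPS to three significant digits.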
Detailed reports
================
NVMe with cache=none
# echo 3 > /proc/sys/vm/drop_caches ; fio --name=registry-read --ioengine=libaio --rw=randread --bs=4k --numjobs=4 --size=10G --runtime=60 --group_reporting --iodepth=16
registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.36
Starting 4 processes
Jobs: 4 (f=4): [r(4)][100.0%][r=39.0MiB/s][r=9981 IOPS][eta 00m:00s]
registry-read: (groupid=0, jobs=4): err= 0: pid=443550: Tue Mar 25 13:25:10 2025
read: IOPS=12.9k, BW=50.3MiB/s (52.7MB/s)(3017MiB/60001msec)
slat (usec): min=2, max=13134, avg=307.13, stdev=154.43
clat (usec): min=2, max=28147, avg=4662.15, stdev=1048.17
lat (usec): min=384, max=28432, avg=4969.28, stdev=1106.45
clat percentiles (usec):
| 1.00th=[ 2737], 5.00th=[ 3261], 10.00th=[ 3589], 20.00th=[ 4080],
| 30.00th=[ 4293], 40.00th=[ 4424], 50.00th=[ 4555], 60.00th=[ 4621],
| 70.00th=[ 4817], 80.00th=[ 5080], 90.00th=[ 5932], 95.00th=[ 6325],
| 99.00th=[ 8225], 99.50th=[ 9634], 99.90th=[12911], 99.95th=[14353],
| 99.99th=[17957]
bw ( KiB/s): min=26816, max=58136, per=100.00%, avg=51622.66, stdev=1587.61, samples=476
iops : min= 6704, max=14534, avg=12905.66, stdev=396.90, samples=476
lat (usec) : 4=0.01%, 10=0.01%, 500=0.01%, 1000=0.01%
lat (msec) : 2=0.03%, 4=18.10%, 10=81.47%, 20=0.40%, 50=0.01%
cpu : usr=1.52%, sys=8.43%, ctx=726528, majf=0, minf=109
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=772457,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
READ: bw=50.3MiB/s (52.7MB/s), 50.3MiB/s-50.3MiB/s (52.7MB/s-52.7MB/s), io=3017MiB (3164MB), run=60001-60001msec
Disk stats (read/write):
sdc: ios=725474/0, sectors=5803792/0, merge=0/0, ticks=223185/0, in_queue=223185, util=99.89%
NVMe with cache=writeback
# echo 3 > /proc/sys/vm/drop_caches ; fio --name=registry-read --ioengine=libaio --rw=randread --bs=4k --numjobs=4 --size=10G --runtime=60 --group_reporting --iodepth=16
registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.36
Starting 4 processes
Jobs: 3 (f=3): [r(3),_(1)][65.9%][r=31.9MiB/s][r=8165 IOPS][eta 00m:31s]
registry-read: (groupid=0, jobs=4): err= 0: pid=442595: Tue Mar 25 13:19:11 2025
read: IOPS=58.3k, BW=228MiB/s (239MB/s)(13.3GiB/60001msec)
slat (usec): min=2, max=5185, avg=58.95, stdev=131.23
clat (nsec): min=1440, max=15294k, avg=908789.46, stdev=1764896.35
lat (usec): min=3, max=15367, avg=967.74, stdev=1880.61
clat percentiles (usec):
| 1.00th=[ 55], 5.00th=[ 58], 10.00th=[ 67], 20.00th=[ 117],
| 30.00th=[ 133], 40.00th=[ 165], 50.00th=[ 196], 60.00th=[ 239],
| 70.00th=[ 347], 80.00th=[ 873], 90.00th=[ 4146], 95.00th=[ 6063],
| 99.00th=[ 6652], 99.50th=[ 6980], 99.90th=[ 8717], 99.95th=[ 9503],
| 99.99th=[11076]
bw ( KiB/s): min=203632, max=602000, per=100.00%, avg=387963.51, stdev=30229.81, samples=420
iops : min=50908, max=150500, avg=96990.60, stdev=7557.47, samples=420
lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
lat (usec) : 100=15.15%, 250=46.78%, 500=10.58%, 750=5.17%, 1000=6.62%
lat (msec) : 2=3.90%, 4=1.46%, 10=10.29%, 20=0.03%
cpu : usr=2.96%, sys=15.72%, ctx=1029435, majf=0, minf=108
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=3499461,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
READ: bw=228MiB/s (239MB/s), 228MiB/s-228MiB/s (239MB/s-239MB/s), io=13.3GiB (14.3GB), run=60001-60001msec
Disk stats (read/write):
sdb: ios=1028680/0, sectors=8229440/0, merge=0/0, ticks=180767/0, in_queue=180766, util=99.60%
HDD with cache=none
# echo 3 > /proc/sys/vm/drop_caches ; fio --name=registry-read --ioengine=libaio --rw=randread --bs=4k --numjobs=4 --size=10G --runtime=60 --group_reporting --iodepth=16
registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.36
Starting 4 processes
Jobs: 4 (f=4): [r(4)][100.0%][r=1033KiB/s][r=258 IOPS][eta 00m:00s]
registry-read: (groupid=0, jobs=4): err= 0: pid=442579: Tue Mar 25 13:15:51 2025
read: IOPS=335, BW=1340KiB/s (1373kB/s)(78.6MiB/60011msec)
slat (usec): min=160, max=361240, avg=11930.20, stdev=15819.52
clat (usec): min=4, max=957176, avg=178747.09, stdev=89162.82
lat (msec): min=5, max=977, avg=190.68, stdev=93.50
clat percentiles (msec):
| 1.00th=[ 86], 5.00th=[ 101], 10.00th=[ 109], 20.00th=[ 121],
| 30.00th=[ 130], 40.00th=[ 140], 50.00th=[ 150], 60.00th=[ 163],
| 70.00th=[ 180], 80.00th=[ 218], 90.00th=[ 300], 95.00th=[ 372],
| 99.00th=[ 514], 99.50th=[ 558], 99.90th=[ 726], 99.95th=[ 802],
| 99.99th=[ 885]
bw ( KiB/s): min= 248, max= 2160, per=99.89%, avg=1339.50, stdev=117.47, samples=476
iops : min= 62, max= 540, avg=334.87, stdev=29.37, samples=476
lat (usec) : 10=0.02%
lat (msec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=4.87%, 250=79.56%
lat (msec) : 500=14.36%, 750=1.08%, 1000=0.08%
cpu : usr=0.06%, sys=0.27%, ctx=20110, majf=0, minf=110
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.2%, 16=99.7%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=20110,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
READ: bw=1340KiB/s (1373kB/s), 1340KiB/s-1340KiB/s (1373kB/s-1373kB/s), io=78.6MiB (82.4MB), run=60011-60011msec
Disk stats (read/write):
sdd: ios=20096/0, sectors=160768/0, merge=0/0, ticks=238953/0, in_queue=238952, util=99.89%
HDD with cache=writeback
# echo 3 > /proc/sys/vm/drop_caches ; fio --name=registry-read --ioengine=libaio --rw=randread --bs=4k --numjobs=4 --size=10G --runtime=60 --group_reporting --iodepth=16
registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.36
Starting 4 processes
Jobs: 4 (f=4): [r(4)][100.0%][r=210MiB/s][r=53.7k IOPS][eta 00m:00s]
registry-read: (groupid=0, jobs=4): err= 0: pid=943730: Thu Mar 20 14:51:33 2025
read: IOPS=53.8k, BW=210MiB/s (220MB/s)(12.3GiB/60001msec)
slat (usec): min=26, max=4995, avg=71.31, stdev=21.64
clat (usec): min=3, max=8707, avg=1116.26, stdev=141.55
lat (usec): min=79, max=8769, avg=1187.56, stdev=148.56
clat percentiles (usec):
| 1.00th=[ 938], 5.00th=[ 979], 10.00th=[ 1004], 20.00th=[ 1029],
| 30.00th=[ 1045], 40.00th=[ 1074], 50.00th=[ 1090], 60.00th=[ 1106],
| 70.00th=[ 1139], 80.00th=[ 1188], 90.00th=[ 1254], 95.00th=[ 1336],
| 99.00th=[ 1582], 99.50th=[ 1811], 99.90th=[ 2474], 99.95th=[ 2802],
| 99.99th=[ 3982]
bw ( KiB/s): min=167800, max=230352, per=100.00%, avg=215502.18, stdev=2114.67, samples=476
iops : min=41950, max=57588, avg=53875.55, stdev=528.67, samples=476
lat (usec) : 4=0.01%, 10=0.01%, 100=0.01%, 250=0.01%, 500=0.01%
lat (usec) : 750=0.01%, 1000=10.02%
lat (msec) : 2=89.68%, 4=0.28%, 10=0.01%
cpu : usr=4.83%, sys=37.00%, ctx=3232089, majf=0, minf=101
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=3230027,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
READ: bw=210MiB/s (220MB/s), 210MiB/s-210MiB/s (220MB/s-220MB/s), io=12.3GiB (13.2GB), run=60001-60001msec
Disk stats (read/write):
sdd: ios=3224017/2, sectors=25792136/3, merge=0/0, ticks=168114/14, in_queue=168141, util=99.03%
This was HDD (3/2 replication).
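As a rough consistency check on these reports, Little's law says that with a fixed number of I/Os in flight, IOPS is approximately that number divided by the mean latency. Here fio keeps numjobs x iodepth = 4 x 16 = 64 I/Os outstanding (the "IO depths" lines show 16 at about 100%). A minimal sketch, using the hypothetical helper littles_iops and the mean "lat" values (slat + clat) reported above:

```shell
#!/usr/bin/env bash
# Rough consistency check using Little's law: IOPS ~= I/Os in flight / mean latency.
# Assumption: fio keeps numjobs * iodepth I/Os outstanding for the whole run.
littles_iops() {
    local jobs="$1" depth="$2" lat_usec="$3"
    awk -v j="$jobs" -v d="$depth" -v l="$lat_usec" \
        'BEGIN { printf "%.0f\n", (j * d) / (l / 1000000) }'
}

# Mean total latency ("lat" line) from the reports above, in microseconds.
littles_iops 4 16 4969.28   # NVMe, cache=none
littles_iops 4 16 190680    # HDD, cache=none (lat avg=190.68 msec)
littles_iops 4 16 1187.56   # HDD, cache=writeback
```

For these three runs the estimate comes out at roughly 12.9k, 336 and 53.9k IOPS, close to fio's reported 12.9k, 335 and 53.8k, so latency and IOPS agree with each other. The NVMe cache=writeback run fits less well; its status line shows one of the four jobs had already finished, so fewer than 64 I/Os were in flight for part of that run.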