On 2025-03-20 15:15, Chris Palmer wrote:

Hi,

 * Ceph cluster 19.2.1 with 3 nodes, 4 x SATA disks with shared NVMe
   DB/WAL, single 10g NICs
 * Proxmox 8.3.5 cluster with 2 nodes (separate nodes to Ceph), single
   10g NICs, single 1g NICs for corosync
 * Test VM was using KRBD R3 pool on HDD, iothread=1, aio=io_uring,
   cache=writeback

With "cache=writeback" the benchmark mostly measures the memory bandwidth of the KVM host, not Ceph performance. The effect is even stronger with --size=1G instead of a more realistic 10G.

Please disable the writeback cache and increase the file size to at least 10G in order to measure Ceph RBD performance.

Last but not least, please drop the caches before each run so that the benchmark is independent of previous runs.
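The procedure above can be sketched as a small wrapper. This is only a sketch: the function names are hypothetical, the fio options are the ones used in the runs below, and the sync before dropping the caches is my addition so that dirty pages are written out first.

```shell
#!/bin/sh
# Hypothetical helper names; the fio options are those used in this thread.

# Print the fio command line for a 4k random-read run.
# $1 = file size, defaults to the recommended 10G.
fio_randread_cmd() {
    size="${1:-10G}"
    printf 'fio --name=registry-read --ioengine=libaio --rw=randread --bs=4k --numjobs=4 --size=%s --runtime=60 --group_reporting --iodepth=16\n' "$size"
}

# Write out dirty pages, drop the page cache, then run the benchmark.
# Must be run as root inside the test VM.
run_cold_randread() {
    sync
    echo 3 > /proc/sys/vm/drop_caches
    eval "$(fio_randread_cmd "$1")"
}
```

Calling run_cold_randread without an argument uses 10G; run_cold_randread 20G would use a larger file.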

Overview Results for blocksize=4K
=================================

NVMe with cache=none      (10 GB Fibre) | read: IOPS = 12.9k, BW =  50.3 MiB/s
NVMe with cache=writeback (10 GB Fibre) | read: IOPS = 58.3k, BW = 228.0 MiB/s
HDD  with cache=none      (10 GB Fibre) | read: IOPS = 0.34k, BW =   1.3 MiB/s
HDD  with cache=writeback (10 GB Fibre) | read: IOPS = 53.8k, BW = 210.0 MiB/s
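As a quick cross-check of the overview, bandwidth should equal IOPS times block size, and the ratio between the two cache modes shows how much of the result comes from the cache. The helper names below are hypothetical; the input numbers are taken from the detailed reports.

```shell
#!/bin/sh
# Hypothetical helpers for sanity-checking the overview numbers.

# Bandwidth in MiB/s from IOPS and block size in bytes.
bw_mib() {
    awk -v iops="$1" -v bs="$2" 'BEGIN { printf "%.1f\n", iops * bs / 1048576 }'
}

# Speedup factor of one result over another (both in IOPS).
speedup() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.1f\n", fast / slow }'
}

bw_mib 12900 4096     # NVMe, cache=none: 50.4 (fio reports 50.3 from unrounded IOPS)
speedup 58300 12900   # NVMe, writeback vs. none: 4.5
speedup 53800 335     # HDD,  writeback vs. none: 160.6
```

A factor of roughly 160 on the HDD pool is far more than spinning disks could plausibly deliver for 4k random reads, which fits the reading that the writeback runs are served largely from host memory.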


Detailed reports
================
NVMe with cache=none

# echo 3 > /proc/sys/vm/drop_caches ; fio --name=registry-read --ioengine=libaio --rw=randread --bs=4k --numjobs=4 --size=10G --runtime=60 --group_reporting --iodepth=16
registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.36
Starting 4 processes
Jobs: 4 (f=4): [r(4)][100.0%][r=39.0MiB/s][r=9981 IOPS][eta 00m:00s]
registry-read: (groupid=0, jobs=4): err= 0: pid=443550: Tue Mar 25 13:25:10 2025
  read: IOPS=12.9k, BW=50.3MiB/s (52.7MB/s)(3017MiB/60001msec)
    slat (usec): min=2, max=13134, avg=307.13, stdev=154.43
    clat (usec): min=2, max=28147, avg=4662.15, stdev=1048.17
     lat (usec): min=384, max=28432, avg=4969.28, stdev=1106.45
    clat percentiles (usec):
     |  1.00th=[ 2737],  5.00th=[ 3261], 10.00th=[ 3589], 20.00th=[ 4080],
     | 30.00th=[ 4293], 40.00th=[ 4424], 50.00th=[ 4555], 60.00th=[ 4621],
     | 70.00th=[ 4817], 80.00th=[ 5080], 90.00th=[ 5932], 95.00th=[ 6325],
     | 99.00th=[ 8225], 99.50th=[ 9634], 99.90th=[12911], 99.95th=[14353],
     | 99.99th=[17957]
   bw (  KiB/s): min=26816, max=58136, per=100.00%, avg=51622.66, stdev=1587.61, samples=476
   iops        : min= 6704, max=14534, avg=12905.66, stdev=396.90, samples=476
  lat (usec)   : 4=0.01%, 10=0.01%, 500=0.01%, 1000=0.01%
  lat (msec)   : 2=0.03%, 4=18.10%, 10=81.47%, 20=0.40%, 50=0.01%
  cpu          : usr=1.52%, sys=8.43%, ctx=726528, majf=0, minf=109
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=772457,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
READ: bw=50.3MiB/s (52.7MB/s), 50.3MiB/s-50.3MiB/s (52.7MB/s-52.7MB/s), io=3017MiB (3164MB), run=60001-60001msec

Disk stats (read/write):
sdc: ios=725474/0, sectors=5803792/0, merge=0/0, ticks=223185/0, in_queue=223185, util=99.89%
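The report above can be cross-checked with Little's law: with 4 jobs at iodepth 16 there are 64 requests in flight, so the achievable IOPS is roughly 64 divided by the average total latency. The function name below is hypothetical; the latencies are the "lat" averages from the cache=none reports.

```shell
#!/bin/sh
# Hypothetical helper: expected IOPS from concurrency and average latency.
# $1 = numjobs, $2 = iodepth, $3 = average total latency in microseconds.
expected_iops() {
    awk -v jobs="$1" -v depth="$2" -v lat="$3" \
        'BEGIN { printf "%.0f\n", jobs * depth * 1000000 / lat }'
}

expected_iops 4 16 4969.28   # NVMe, cache=none: 12879 (fio reports 12.9k)
expected_iops 4 16 190680    # HDD,  cache=none: 336   (fio reports 335)
```

Both cache=none runs agree with the reported IOPS, so with 64 requests in flight these results are bounded by per-request latency.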

NVMe with cache=writeback
# echo 3 > /proc/sys/vm/drop_caches ; fio --name=registry-read --ioengine=libaio --rw=randread --bs=4k --numjobs=4 --size=10G --runtime=60 --group_reporting --iodepth=16
registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.36
Starting 4 processes
Jobs: 3 (f=3): [r(3),_(1)][65.9%][r=31.9MiB/s][r=8165 IOPS][eta 00m:31s]
registry-read: (groupid=0, jobs=4): err= 0: pid=442595: Tue Mar 25 13:19:11 2025
  read: IOPS=58.3k, BW=228MiB/s (239MB/s)(13.3GiB/60001msec)
    slat (usec): min=2, max=5185, avg=58.95, stdev=131.23
    clat (nsec): min=1440, max=15294k, avg=908789.46, stdev=1764896.35
     lat (usec): min=3, max=15367, avg=967.74, stdev=1880.61
    clat percentiles (usec):
     |  1.00th=[   55],  5.00th=[   58], 10.00th=[   67], 20.00th=[  117],
     | 30.00th=[  133], 40.00th=[  165], 50.00th=[  196], 60.00th=[  239],
     | 70.00th=[  347], 80.00th=[  873], 90.00th=[ 4146], 95.00th=[ 6063],
     | 99.00th=[ 6652], 99.50th=[ 6980], 99.90th=[ 8717], 99.95th=[ 9503],
     | 99.99th=[11076]
   bw (  KiB/s): min=203632, max=602000, per=100.00%, avg=387963.51, stdev=30229.81, samples=420
   iops        : min=50908, max=150500, avg=96990.60, stdev=7557.47, samples=420
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=15.15%, 250=46.78%, 500=10.58%, 750=5.17%, 1000=6.62%
  lat (msec)   : 2=3.90%, 4=1.46%, 10=10.29%, 20=0.03%
  cpu          : usr=2.96%, sys=15.72%, ctx=1029435, majf=0, minf=108
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=3499461,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
READ: bw=228MiB/s (239MB/s), 228MiB/s-228MiB/s (239MB/s-239MB/s), io=13.3GiB (14.3GB), run=60001-60001msec

Disk stats (read/write):
sdb: ios=1028680/0, sectors=8229440/0, merge=0/0, ticks=180767/0, in_queue=180766, util=99.60%

HDD with cache=none
# echo 3 > /proc/sys/vm/drop_caches ; fio --name=registry-read --ioengine=libaio --rw=randread --bs=4k --numjobs=4 --size=10G --runtime=60 --group_reporting --iodepth=16
registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.36
Starting 4 processes
Jobs: 4 (f=4): [r(4)][100.0%][r=1033KiB/s][r=258 IOPS][eta 00m:00s]
registry-read: (groupid=0, jobs=4): err= 0: pid=442579: Tue Mar 25 13:15:51 2025
  read: IOPS=335, BW=1340KiB/s (1373kB/s)(78.6MiB/60011msec)
    slat (usec): min=160, max=361240, avg=11930.20, stdev=15819.52
    clat (usec): min=4, max=957176, avg=178747.09, stdev=89162.82
     lat (msec): min=5, max=977, avg=190.68, stdev=93.50
    clat percentiles (msec):
     |  1.00th=[   86],  5.00th=[  101], 10.00th=[  109], 20.00th=[  121],
     | 30.00th=[  130], 40.00th=[  140], 50.00th=[  150], 60.00th=[  163],
     | 70.00th=[  180], 80.00th=[  218], 90.00th=[  300], 95.00th=[  372],
     | 99.00th=[  514], 99.50th=[  558], 99.90th=[  726], 99.95th=[  802],
     | 99.99th=[  885]
   bw (  KiB/s): min=  248, max= 2160, per=99.89%, avg=1339.50, stdev=117.47, samples=476
   iops        : min=   62, max=  540, avg=334.87, stdev=29.37, samples=476
  lat (usec)   : 10=0.02%
  lat (msec)   : 10=0.01%, 20=0.01%, 50=0.01%, 100=4.87%, 250=79.56%
  lat (msec)   : 500=14.36%, 750=1.08%, 1000=0.08%
  cpu          : usr=0.06%, sys=0.27%, ctx=20110, majf=0, minf=110
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.2%, 16=99.7%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=20110,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
READ: bw=1340KiB/s (1373kB/s), 1340KiB/s-1340KiB/s (1373kB/s-1373kB/s), io=78.6MiB (82.4MB), run=60011-60011msec

Disk stats (read/write):
sdd: ios=20096/0, sectors=160768/0, merge=0/0, ticks=238953/0, in_queue=238952, util=99.89%

HDD with cache=writeback
# echo 3 > /proc/sys/vm/drop_caches ; fio --name=registry-read --ioengine=libaio --rw=randread --bs=4k --numjobs=4 --size=10G --runtime=60 --group_reporting --iodepth=16
registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.36
Starting 4 processes
Jobs: 4 (f=4): [r(4)][100.0%][r=210MiB/s][r=53.7k IOPS][eta 00m:00s]
registry-read: (groupid=0, jobs=4): err= 0: pid=943730: Thu Mar 20 14:51:33 2025
  read: IOPS=53.8k, BW=210MiB/s (220MB/s)(12.3GiB/60001msec)
    slat (usec): min=26, max=4995, avg=71.31, stdev=21.64
    clat (usec): min=3, max=8707, avg=1116.26, stdev=141.55
     lat (usec): min=79, max=8769, avg=1187.56, stdev=148.56
    clat percentiles (usec):
     |  1.00th=[  938],  5.00th=[  979], 10.00th=[ 1004], 20.00th=[ 1029],
     | 30.00th=[ 1045], 40.00th=[ 1074], 50.00th=[ 1090], 60.00th=[ 1106],
     | 70.00th=[ 1139], 80.00th=[ 1188], 90.00th=[ 1254], 95.00th=[ 1336],
     | 99.00th=[ 1582], 99.50th=[ 1811], 99.90th=[ 2474], 99.95th=[ 2802],
     | 99.99th=[ 3982]
   bw (  KiB/s): min=167800, max=230352, per=100.00%, avg=215502.18, stdev=2114.67, samples=476
   iops        : min=41950, max=57588, avg=53875.55, stdev=528.67, samples=476
  lat (usec)   : 4=0.01%, 10=0.01%, 100=0.01%, 250=0.01%, 500=0.01%
  lat (usec)   : 750=0.01%, 1000=10.02%
  lat (msec)   : 2=89.68%, 4=0.28%, 10=0.01%
  cpu          : usr=4.83%, sys=37.00%, ctx=3232089, majf=0, minf=101
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=3230027,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
READ: bw=210MiB/s (220MB/s), 210MiB/s-210MiB/s (220MB/s-220MB/s), io=12.3GiB (13.2GB), run=60001-60001msec

Disk stats (read/write):
sdd: ios=3224017/2, sectors=25792136/3, merge=0/0, ticks=168114/14, in_queue=168141, util=99.03%

This was hdd (3/2 replication).


--
--martin konold
ppa. Martin Konold

--
Martin Konold - Prokurist, CTO
KONSEC GmbH -⁠ make things real
Amtsgericht Stuttgart, HRB 23690
Geschäftsführer: Andreas Mack
Im Köller 3, 70794 Filderstadt, Germany
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io