Hi,
don't expect a full solution from the list, just a direction.
Here is a link to the blog post:
https://ceph.io/en/news/blog/2024/ceph-a-journey-to-1tibps/
On YouTube there is also the presentation from Ceph Days NYC.


To view performance from the client's perspective, run the measurement tools 
from inside the virtual machine. This approach gives you the performance as 
experienced by the client.
The most commonly used tool for performance measurement is fio, and I strongly 
recommend using it for your evaluation.
Also use ioping to measure latency. While fio provides IOPS and latency 
metrics under load, ioping offers a view of latency behavior when the machine 
is not under heavy load.
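For example, a quick check from inside the VM could look like this (device 
and file names are only examples, adjust them to your setup):

# ioping -c 10 /dev/vda
# fio --name=vmtest --ioengine=libaio --direct=1 --rw=randwrite --bs=4k --iodepth=1 --size=1G --runtime=60 --time_based --filename=/root/fio.test

The ioping run shows the idle latency of the disk, the fio run shows 
random-write IOPS and latency under load.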
Based on my previous experience (not only mine, but also my team's), many 
performance issues were related to network configuration or problems in the 
network infrastructure. As an example, we encountered a situation where a 
change made by the network team to the spine switches caused disk latency to 
increase from 3 ms to 80-120 ms.
Another example, which almost burned me, was an issue with one spine line 
card that was not fully broken: monitoring did not catch it and tests showed 
everything was OK, but on the Ceph side we had many, many issues, like 
flapping OSDs (at one point about half of our 500 OSDs went down) and latency 
spikes from time to time. The card misbehaved intermittently, but never 
during tests :)
And of course the AMD nodes, before I discovered the iommu=pt kernel 
parameter.
Believe me, C-states and power management on the nodes are important.
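For reference, this is roughly what it looks like (a sketch only, the exact 
values depend on your hardware, distro, and how aggressive you want to be):

Add iommu=pt to the kernel command line, e.g. in /etc/default/grub:
GRUB_CMDLINE_LINUX="... iommu=pt"

And keep the CPUs awake:
# cpupower frequency-set -g performance
# cpupower idle-set -D 0   (disables deep C-states until reboot)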

You have already received very good advice from others, so there is not much 
to add; look at your network drivers and the RX/TX queues.
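For example with ethtool (eth0 is a placeholder for your interface):

# ethtool -g eth0   (current vs. maximum RX/TX ring sizes)
# ethtool -l eth0   (number of RX/TX channels/queues)
# ethtool -G eth0 rx 4096 tx 4096   (raise rings to the maximum reported by -g)

The defaults are sometimes too small for this kind of traffic.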

For your information, this cluster is not fine-tuned, and end-to-end 
encryption is enabled:
6-node cluster, all NVMe, 8x NVMe per node, 512 GB RAM, 4x25 GbE LACP for the 
public network and another 4x25 GbE for the cluster network (Mellanox cards).
# rados bench -p test 10 write -t 8 -b 16K
Rados bench results:
Total time run:         10.0003
Total writes made:      113195
Write size:             16384
Object size:            16384
Bandwidth (MB/sec):     176.862
Stddev Bandwidth:       27.047
Max bandwidth (MB/sec): 195.828
Min bandwidth (MB/sec): 107.906
Average IOPS:           11319
Stddev IOPS:            1731.01
Max IOPS:               12533
Min IOPS:               6906
Average Latency(s):     0.000705734
Stddev Latency(s):      0.00224331
Max latency(s):         0.325178
Min latency(s):         0.000413413

This is a test with fio using librbd; it shows more or less the performance a 
VM would get.

[test]
ioengine=rbd
clientname=admin
pool=test
rbdname=bench
rw=randwrite
bs=4k
iodepth=256
direct=1
numjobs=1
fsync=0
size=10G
runtime=300
time_based
invalidate=0
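
To reproduce it, create the image first and point fio at the job file (names 
match the config above, assuming it is saved as bench.fio):

# rbd create test/bench --size 10G
# fio bench.fio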

test: (groupid=0, jobs=1): err= 0: pid=3495143: Tue Jun 11 11:56:04 2024
  write: IOPS=83.6k, BW=326MiB/s (342MB/s)(95.6GiB/300002msec); 0 zone resets
    slat (nsec): min=975, max=2665.0k, avg=3943.68, stdev=2820.21
    clat (usec): min=399, max=225434, avg=3058.67, stdev=1801.25

And the same job with iodepth=1:

test: (groupid=0, jobs=1): err= 0: pid=3503647: Tue Jun 11 11:57:48 2024
  write: IOPS=1845, BW=7382KiB/s (7559kB/s)(159MiB/22033msec); 0 zone resets
    slat (nsec): min=2966, max=41133, avg=4381.81, stdev=1062.40
    clat (usec): min=367, max=202364, avg=537.05, stdev=1009.49

And with iodepth=256 and bs=16k:

test: (groupid=0, jobs=1): err= 0: pid=3505339: Tue Jun 11 12:03:27 2024
  write: IOPS=79.6k, BW=1244MiB/s (1305MB/s)(365GiB/300002msec); 0 zone resets
    slat (nsec): min=1815, max=4497.4k, avg=5671.20, stdev=3540.33
    clat (usec): min=446, max=267567, avg=3208.34, stdev=2038.58
     lat (usec): min=451, max=267571, avg=3214.01, stdev=2038.60


BR,
Sebastian

> On 11 Jun 2024, at 02:23, Mark Lehrer <leh...@gmail.com> wrote:
> 
> If they can do 1 TB/s with a single 16K write thread, that will be
> quite impressive :D    Otherwise not really applicable.  Ceph scaling
> has always been good.
> 
> More seriously, would you mind sending a link to this?
> 
> 
> Thanks!
> 
> Mark
> 
> On Mon, Jun 10, 2024 at 12:01 PM Anthony D'Atri <anthony.da...@gmail.com> 
> wrote:
>> 
>> Eh?  cf. Mark and Dan's 1TB/s presentation.
>> 
>> On Jun 10, 2024, at 13:58, Mark Lehrer <leh...@gmail.com> wrote:
>> 
>> It
>> seems like Ceph still hasn't adjusted to SSD performance.
>> 
>> 
