Please do let me know how that strategy works out. When you change an OSD service spec (osd_spec), out of an abundance of caution the orchestrator does not retroactively apply it to existing OSDs; only OSDs created after the change pick it up. That behavior is exactly what you can exploit for a rolling migration: apply the new spec, then zap one OSD at a time and let it be redeployed under the new layout.
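Roughly, on a cephadm-managed cluster the workflow would look something like the sketch below. This is an untested outline, not a recipe: the service_id, host_pattern and device filters are placeholders you would replace with whatever your exported spec already contains, and note that osds_per_device is a field of the OSD service spec (drive group) rather than a ceph config key.

    # per the earlier suggestion, raise the PG target used by the autoscaler
    ceph config set global mon_target_pg_per_osd 250

    # export the current OSD service spec, then edit it
    ceph orch ls --export osd > osd-spec.yaml

    # osd-spec.yaml (illustrative; your service_id/placement will differ)
    service_type: osd
    service_id: all-flash
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        rotational: 0
      osds_per_device: 2

    # apply the new spec -- existing OSDs are deliberately left untouched
    ceph orch apply -i osd-spec.yaml

    # then, one OSD at a time: drain, remove and zap it; the orchestrator
    # should redeploy the freed device as two OSDs under the new spec
    ceph orch osd rm <osd_id> --zap
    ceph orch osd rm status      # watch the drain/removal progress

Waiting for the cluster to return to HEALTH_OK between OSDs keeps the data movement bounded to one device's worth at a time.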
> On Apr 11, 2025, at 3:29 PM, Giovanna Ratini > <giovanna.rat...@uni-konstanz.de> wrote: > > Hello Eneko, > > I switched to KRDB, and I’m seeing slightly better performance now. > > For Switching: > https://forum.proxmox.com/threads/how-to-safely-enable-krbd-in-a-5-node-production-environment-running-7-4-19.159186/ > > NVMe performance remains disappointing, though... > They went from 35MB/s to 45MB/s. > > I’m planning to apply the change that Anthony recommended: > setting mon_target_pg_per_osd to 250 and configuring 2 osds_per_device. > This will take a bit of time. > ceph config set global mon_target_pg_per_osd 250 > ceph config set global osds_per_device 2 > > To split the drives into 2 OSDs each, > I’ll need to update the ceph orch ls --export OSD service spec, > > then zap an existing OSD, allow it to be rebuilt as two, and repeat the > process for the remaining ones. > > We'll see if this change helps. I’ll write the results here once it's done. > > Cheers, > > Gio > > root@gitlab:~# fio --name=registry-read --ioengine=libaio --rw=randread > --bs=4k --numjobs=4 --iodepth=16 --size=1G --runtime=60 > > registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) > 4096B-4096B, ioengine=libaio, iodepth=16 > ... > fio-3.33 > Starting 4 processes > Jobs: 4 (f=4): [r(4)][100.0%][r=91.7MiB/s][r=23.5k IOPS][eta 00m:00s] > registry-read: (groupid=0, jobs=1): err= 0: pid=2547: Fri Apr 11 21:02:31 2025 > read: IOPS=2756, BW=10.8MiB/s (11.3MB/s)(646MiB/60001msec) > slat (usec): min=50, max=8619, avg=360.14, stdev=217.84 > clat (usec): min=2, max=17259, avg=5441.99, stdev=1633.01 > lat (usec): min=108, max=17721, avg=5802.13, stdev=1728.71 > clat percentiles (usec): > | 1.00th=[ 1909], 5.00th=[ 2507], 10.00th=[ 2966], 20.00th=[ 3818], > | 30.00th=[ 4621], 40.00th=[ 5342], 50.00th=[ 5932], 60.00th=[ 6259], > | 70.00th=[ 6456], 80.00th=[ 6718], 90.00th=[ 6980], 95.00th=[ 7308], > | 99.00th=[ 9241], 99.50th=[10290], 99.90th=[13173], 99.95th=[13698], > | 99.99th=[16450] > bw ( KiB/s): min= 8456, max=22296, per=24.64%, avg=10937.08, > stdev=3222.24, samples=119 > iops : min= 2114, max= 5574, avg=2734.27, stdev=805.56, samples=119 > lat (usec) : 4=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01% > lat (msec) : 2=1.33%, 4=20.70%, 10=77.32%, 20=0.65% > cpu : usr=0.78%, sys=6.75%, ctx=165432, majf=0, minf=27 > IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0% > submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, > >=64=0.0% > complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, > >=64=0.0% > issued rwts: total=165408,0,0,0 short=0,0,0,0 dropped=0,0,0,0 > latency : target=0, window=0, percentile=100.00%, depth=16 > registry-read: (groupid=0, jobs=1): err= 0: pid=2548: Fri Apr 11 21:02:31 2025 > read: IOPS=2807, BW=11.0MiB/s (11.5MB/s)(658MiB/60001msec) > slat (usec): min=50, max=8950, avg=353.61, stdev=213.68 > clat (usec): min=2, max=17110, avg=5344.32, stdev=1642.90 > lat (usec): min=93, max=17575, avg=5697.93, stdev=1740.41 > clat percentiles (usec): > | 1.00th=[ 1844], 5.00th=[ 2409], 10.00th=[ 2868], 20.00th=[ 3687], > | 30.00th=[ 4490], 40.00th=[ 5276], 50.00th=[ 5866], 60.00th=[ 6194], > | 70.00th=[ 6390], 80.00th=[ 6587], 90.00th=[ 6915], 95.00th=[ 7242], > | 99.00th=[ 8979], 99.50th=[10159], 99.90th=[13042], 99.95th=[13829], > | 99.99th=[15926] > bw ( KiB/s): min= 8536, max=23624, per=25.10%, avg=11138.08, > stdev=3441.69, samples=119 > iops : min= 2134, max= 5906, avg=2784.52, stdev=860.42, samples=119 > lat (usec) : 4=0.01%, 
100=0.01%, 250=0.01%, 500=0.01%, 750=0.01% > lat (usec) : 1000=0.01% > lat (msec) : 2=1.80%, 4=22.21%, 10=75.40%, 20=0.58% > cpu : usr=0.98%, sys=6.72%, ctx=168450, majf=0, minf=25 > IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0% > submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, > >=64=0.0% > complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, > >=64=0.0% > issued rwts: total=168432,0,0,0 short=0,0,0,0 dropped=0,0,0,0 > latency : target=0, window=0, percentile=100.00%, depth=16 > registry-read: (groupid=0, jobs=1): err= 0: pid=2549: Fri Apr 11 21:02:31 2025 > read: IOPS=2773, BW=10.8MiB/s (11.4MB/s)(650MiB/60001msec) > slat (usec): min=46, max=8246, avg=357.89, stdev=213.33 > clat (usec): min=2, max=19652, avg=5408.19, stdev=1641.03 > lat (usec): min=411, max=20124, avg=5766.08, stdev=1738.36 > clat percentiles (usec): > | 1.00th=[ 1909], 5.00th=[ 2474], 10.00th=[ 2933], 20.00th=[ 3752], > | 30.00th=[ 4555], 40.00th=[ 5342], 50.00th=[ 5932], 60.00th=[ 6259], > | 70.00th=[ 6456], 80.00th=[ 6652], 90.00th=[ 6980], 95.00th=[ 7242], > | 99.00th=[ 9110], 99.50th=[10421], 99.90th=[12911], 99.95th=[14353], > | 99.99th=[16909] > bw ( KiB/s): min= 8432, max=22520, per=24.79%, avg=11004.77, > stdev=3330.83, samples=119 > iops : min= 2108, max= 5630, avg=2751.19, stdev=832.71, samples=119 > lat (usec) : 4=0.01%, 500=0.01%, 750=0.01%, 1000=0.01% > lat (msec) : 2=1.40%, 4=21.56%, 10=76.44%, 20=0.60% > cpu : usr=0.99%, sys=6.58%, ctx=166457, majf=0, minf=25 > IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0% > submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, > >=64=0.0% > complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, > >=64=0.0% > issued rwts: total=166442,0,0,0 short=0,0,0,0 dropped=0,0,0,0 > latency : target=0, window=0, percentile=100.00%, depth=16 > registry-read: (groupid=0, jobs=1): err= 0: pid=2550: Fri Apr 11 21:02:31 2025 > read: IOPS=2757, BW=10.8MiB/s (11.3MB/s)(646MiB/60001msec) > slat (usec): min=49, max=7497, avg=360.11, stdev=212.22 > clat (usec): min=2, max=19699, avg=5441.22, stdev=1616.73 > lat (usec): min=390, max=20175, avg=5801.33, stdev=1712.21 > clat percentiles (usec): > | 1.00th=[ 1909], 5.00th=[ 2540], 10.00th=[ 2999], 20.00th=[ 3818], > | 30.00th=[ 4621], 40.00th=[ 5407], 50.00th=[ 5932], 60.00th=[ 6259], > | 70.00th=[ 6456], 80.00th=[ 6652], 90.00th=[ 6980], 95.00th=[ 7308], > | 99.00th=[ 8979], 99.50th=[10159], 99.90th=[13042], 99.95th=[13829], > | 99.99th=[16057] > bw ( KiB/s): min= 8512, max=23152, per=24.65%, avg=10941.71, > stdev=3229.43, samples=119 > iops : min= 2128, max= 5788, avg=2735.43, stdev=807.36, samples=119 > lat (usec) : 4=0.01%, 500=0.01%, 1000=0.01% > lat (msec) : 2=1.39%, 4=20.78%, 10=77.28%, 20=0.54% > cpu : usr=0.80%, sys=6.75%, ctx=165463, majf=0, minf=27 > IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0% > submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, > >=64=0.0% > complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, > >=64=0.0% > issued rwts: total=165432,0,0,0 short=0,0,0,0 dropped=0,0,0,0 > latency : target=0, window=0, percentile=100.00%, depth=16 > > Run status group 0 (all jobs): > READ: bw=43.3MiB/s (45.4MB/s), 10.8MiB/s-11.0MiB/s (11.3MB/s-11.5MB/s), > io=2600MiB (2727MB), run=60001-60001msec > > Disk stats (read/write): > dm-0: ios=663651/273, merge=0/0, ticks=221100/28, in_queue=221128, > util=99.88%, aggrios=666145/189, aggrmerge=202/85, aggrticks=206340/50, > aggrin_queue=206423, 
aggrutil=66.45% > sda: ios=666145/189, merge=202/85, ticks=206340/50, in_queue=206423, > util=66.45% > > > Am 20.03.2025 um 16:57 schrieb Eneko Lacunza: >> Hi Chris, >> >> I tried KRBD, even with a newly created disk and after shuting down and >> starting VM again, but no measurable difference. >> >> Our Ceph is 18.2.4, that may be a factor to consider, but 9k -> 273k?! >> >> Maybe Giovanna can test KRBD option and report back... :) >> >> Cheers >> >> El 20/3/25 a las 16:19, Chris Palmer escribió: >>> HI Eneko >>> >>> No containers. In the Promox console go to Datacenter\Storage, click on the >>> storage you are using, then Edit. There is a tick box KRBD. With that set, >>> any virtual disks created in that storage will use KRBD rather than librbd. >>> So it applies to all VMs that use that storage. >>> >>> Chris >>> >>> On 20/03/2025 15:00, Eneko Lacunza wrote: >>>> >>>> Chris, you tested from a container? Or how do you configure a KRBD disk >>>> for a VM? >>>> >>>> El 20/3/25 a las 15:15, Chris Palmer escribió: >>>>> I just ran that command on one of my VMs. Salient details: >>>>> >>>>> * Ceph cluster 19.2.1 with 3 nodes, 4 x SATA disks with shared NVMe >>>>> DB/WAL, single 10g NICs >>>>> * Promox 8.3.5 cluster with 2 nodes (separate nodes to Ceph), single >>>>> 10g NICs , single 1g NICs for corosync >>>>> * Test VM was using KRBD R3 pool on HDD, iothread=1, aio=io_uring, >>>>> cache=writeback >>>>> >>>>> The results are very different: >>>>> >>>>> # fio --name=registry-read --ioengine=libaio --rw=randread --bs=4k >>>>> --numjobs=4 --size=1G --runtime=60 --group_reporting --iodepth=16 >>>>> registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, >>>>> (T) 4096B-4096B, ioengine=libaio, iodepth=16 >>>>> ... >>>>> fio-3.37 >>>>> Starting 4 processes >>>>> Jobs: 4 (f=4): [r(4)][-.-%][r=1080MiB/s][r=277k IOPS][eta 00m:00s] >>>>> registry-read: (groupid=0, jobs=4): err= 0: pid=13355: Thu Mar 20 >>>>> 13:57:05 2025 >>>>> read: IOPS=273k, BW=1068MiB/s (1120MB/s)(4096MiB/3835msec) >>>>> slat (usec): min=7, max=3802, avg=13.77, stdev= 6.41 >>>>> clat (nsec): min=599, max=4395.1k, avg=215298.68, stdev=38131.71 >>>>> lat (usec): min=11, max=4408, avg=229.07, stdev=40.01 >>>>> clat percentiles (usec): >>>>> | 1.00th=[ 194], 5.00th=[ 200], 10.00th=[ 202], 20.00th=[ 204], >>>>> | 30.00th=[ 206], 40.00th=[ 208], 50.00th=[ 210], 60.00th=[ 212], >>>>> | 70.00th=[ 215], 80.00th=[ 217], 90.00th=[ 227], 95.00th=[ 243], >>>>> | 99.00th=[ 367], 99.50th=[ 420], 99.90th=[ 594], 99.95th=[ 668], >>>>> | 99.99th=[ 963] >>>>> bw ( MiB/s): min= 920, max= 1118, per=100.00%, avg=1068.04, >>>>> stdev=16.81, samples=28 >>>>> iops : min=235566, max=286286, avg=273417.14, stdev=4303.79, >>>>> samples=28 >>>>> lat (nsec) : 750=0.01%, 1000=0.01% >>>>> lat (usec) : 20=0.01%, 50=0.01%, 100=0.01%, 250=96.06%, 500=3.67% >>>>> lat (usec) : 750=0.24%, 1000=0.02% >>>>> lat (msec) : 2=0.01%, 4=0.01%, 10=0.01% >>>>> cpu : usr=4.68%, sys=29.99%, ctx=1048987, majf=0, minf=102 >>>>> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >>>>> >=64=0.0% >>>>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >>>>> >=64=0.0% >>>>> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >>>>> >=64=0.0% >>>>> issued rwts: total=1048576,0,0,0 short=0,0,0,0 dropped=0,0,0,0 >>>>> latency : target=0, window=0, percentile=100.00%, depth=16 >>>>> >>>>> Run status group 0 (all jobs): >>>>> READ: bw=1068MiB/s (1120MB/s), 1068MiB/s-1068MiB/s >>>>> (1120MB/s-1120MB/s), io=4096MiB (4295MB), 
run=3835-3835msec >>>>> >>>>> Disk stats (read/write): >>>>> sdc: ios=999346/0, sectors=7994768/0, merge=0/0, ticks=10360/0, >>>>> in_queue=10361, util=95.49% >>>>> >>>>> >>>>> >>>>> On 20/03/2025 12:23, Eneko Lacunza wrote: >>>>>> Hi Giovanna, >>>>>> >>>>>> I just tested one of my VMs: >>>>>> # fio --name=registry-read --ioengine=libaio --rw=randread --bs=4k >>>>>> --numjobs=4 --size=1G --runtime=60 --group_reporting >>>>>> registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, >>>>>> (T) 4096B-4096B, ioengine=libaio, iodepth=1 >>>>>> registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, >>>>>> (T) 4096B-4096B, ioengine=libaio, iodepth=1 >>>>>> ... >>>>>> fio-3.33 >>>>>> Starting 4 processes >>>>>> registry-read: Laying out IO file (1 file / 1024MiB) >>>>>> registry-read: Laying out IO file (1 file / 1024MiB) >>>>>> registry-read: Laying out IO file (1 file / 1024MiB) >>>>>> registry-read: Laying out IO file (1 file / 1024MiB) >>>>>> Jobs: 4 (f=0): [f(4)][100.0%][r=33.5MiB/s][r=8578 IOPS][eta 00m:00s] >>>>>> registry-read: (groupid=0, jobs=4): err= 0: pid=24261: Thu Mar 20 >>>>>> 12:57:26 2025 >>>>>> read: IOPS=8538, BW=33.4MiB/s (35.0MB/s)(2001MiB/60001msec) >>>>>> slat (usec): min=309, max=4928, avg=464.54, stdev=73.15 >>>>>> clat (nsec): min=602, max=1532.4k, avg=1999.15, stdev=3724.16 >>>>>> lat (usec): min=310, max=4931, avg=466.54, stdev=73.36 >>>>>> clat percentiles (nsec): >>>>>> | 1.00th=[ 812], 5.00th=[ 884], 10.00th=[ 940], 20.00th=[ >>>>>> 1096], >>>>>> | 30.00th=[ 1368], 40.00th=[ 1576], 50.00th=[ 1720], 60.00th=[ >>>>>> 1832], >>>>>> | 70.00th=[ 1944], 80.00th=[ 2096], 90.00th=[ 2480], 95.00th=[ >>>>>> 3024], >>>>>> | 99.00th=[12480], 99.50th=[15808], 99.90th=[47360], >>>>>> 99.95th=[61696], >>>>>> | 99.99th=[90624] >>>>>> bw ( KiB/s): min=30448, max=35868, per=100.00%, avg=34155.76, >>>>>> stdev=269.75, samples=476 >>>>>> iops : min= 7612, max= 8966, avg=8538.87, stdev=67.43, >>>>>> samples=476 >>>>>> lat (nsec) : 750=0.06%, 1000=14.94% >>>>>> lat (usec) : 2=59.18%, 4=23.07%, 10=1.28%, 20=1.17%, 50=0.21% >>>>>> lat (usec) : 100=0.08%, 250=0.01%, 500=0.01% >>>>>> lat (msec) : 2=0.01% >>>>>> cpu : usr=1.04%, sys=5.50%, ctx=537639, majf=0, minf=36 >>>>>> IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >>>>>> >=64=0.0% >>>>>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >>>>>> >=64=0.0% >>>>>> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >>>>>> >=64=0.0% >>>>>> issued rwts: total=512316,0,0,0 short=0,0,0,0 dropped=0,0,0,0 >>>>>> latency : target=0, window=0, percentile=100.00%, depth=1 >>>>>> >>>>>> Run status group 0 (all jobs): >>>>>> READ: bw=33.4MiB/s (35.0MB/s), 33.4MiB/s-33.4MiB/s >>>>>> (35.0MB/s-35.0MB/s), io=2001MiB (2098MB), run=60001-60001msec >>>>>> >>>>>> Results are worse than yours, but this is on a production (not very >>>>>> busy) pool with 4x3.84TB SATA disks (4 disks total vs ~15 disks in your >>>>>> case) and 10G network. >>>>>> >>>>>> VM cpu is x86_64_v3 and host CPU Ryzen 1700. >>>>>> >>>>>> I gest almost the same IOPS with --iodepth=16 . >>>>>> >>>>>> I tried moving the VM to a Ryzen 5900X and results are somewhat better: >>>>>> >>>>>> # fio --name=registry-read --ioengine=libaio --rw=randread --bs=4k >>>>>> --numjobs=4 --size=1G --runtime=60 --group_reporting --iodepth=16 >>>>>> registry-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, >>>>>> (T) 4096B-4096B, ioengine=libaio, iodepth=16 >>>>>> ... 
>>>>>> fio-3.33 >>>>>> Starting 4 processes >>>>>> Jobs: 4 (f=4): [r(4)][100.0%][r=45.4MiB/s][r=11.6k IOPS][eta 00m:00s] >>>>>> registry-read: (groupid=0, jobs=4): err= 0: pid=24282: Thu Mar 20 >>>>>> 13:18:23 2025 >>>>>> read: IOPS=11.6k, BW=45.5MiB/s (47.7MB/s)(2730MiB/60001msec) >>>>>> slat (usec): min=110, max=21206, avg=341.21, stdev=79.69 >>>>>> clat (nsec): min=1390, max=42395k, avg=5147009.08, stdev=475506.40 >>>>>> lat (usec): min=335, max=42779, avg=5488.22, stdev=498.03 >>>>>> clat percentiles (usec): >>>>>> | 1.00th=[ 4621], 5.00th=[ 4752], 10.00th=[ 4817], 20.00th=[ >>>>>> 4948], >>>>>> | 30.00th=[ 5014], 40.00th=[ 5080], 50.00th=[ 5080], 60.00th=[ >>>>>> 5145], >>>>>> | 70.00th=[ 5211], 80.00th=[ 5276], 90.00th=[ 5407], 95.00th=[ >>>>>> 5538], >>>>>> | 99.00th=[ 6194], 99.50th=[ 6783], 99.90th=[ 9765], >>>>>> 99.95th=[12125], >>>>>> | 99.99th=[24249] >>>>>> bw ( KiB/s): min=36434, max=48352, per=100.00%, avg=46612.18, >>>>>> stdev=300.09, samples=476 >>>>>> iops : min= 9108, max=12088, avg=11653.04, stdev=75.03, >>>>>> samples=476 >>>>>> lat (usec) : 2=0.01%, 500=0.01%, 750=0.01%, 1000=0.01% >>>>>> lat (msec) : 2=0.01%, 4=0.01%, 10=99.90%, 20=0.08%, 50=0.01% >>>>>> cpu : usr=0.98%, sys=4.18%, ctx=706399, majf=0, minf=99 >>>>>> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >>>>>> >=64=0.0% >>>>>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >>>>>> >=64=0.0% >>>>>> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >>>>>> >=64=0.0% >>>>>> issued rwts: total=698956,0,0,0 short=0,0,0,0 dropped=0,0,0,0 >>>>>> latency : target=0, window=0, percentile=100.00%, depth=16 >>>>>> >>>>>> Run status group 0 (all jobs): >>>>>> READ: bw=45.5MiB/s (47.7MB/s), 45.5MiB/s-45.5MiB/s >>>>>> (47.7MB/s-47.7MB/s), io=2730MiB (2863MB), run=60001-60001msec >>>>>> >>>>>> I think we're limited by the IO thread. I suggest you try multiple disks >>>>>> with SCSI Virtio single. >>>>>> >>>>>> My VM conf: >>>>>> agent: 1 >>>>>> boot: order=scsi0;ide2;net0 >>>>>> cores: 2 >>>>>> cpu: x86-64-v3 >>>>>> ide2: none,media=cdrom >>>>>> memory: 2048 >>>>>> meta: creation-qemu=9.0.2,ctime=1739888364 >>>>>> name: elacunza-btrfs-test >>>>>> net0: virtio=BC:24:11:47:9B:58,bridge=vmbr0,firewall=1 >>>>>> numa: 0 >>>>>> ostype: l26 >>>>>> scsi0: proxmox_r3_ssd2:vm-112-disk-0,discard=on,iothread=1,size=15G >>>>>> scsihw: virtio-scsi-single >>>>>> smbios1: uuid=263ab229-4379-4abf-b6bf-615b98ccd3d4 >>>>>> sockets: 1 >>>>>> vmgenid: 13b7f2a4-2a42-4600-845a-da88f96ae6e8 >>>>>> >>>>>> I think this is a KVM/QEMU issue, not a Ceph issue :) Maybe you can get >>>>>> better suggestions in pve-user mailing list. >>>>>> >>>>>> Cheers >>>>>> >>>>>> El 20/3/25 a las 12:29, Giovanna Ratini escribió: >>>>>>> Hello Eneko, >>>>>>> >>>>>>> this is my configuration. The performance is similar across all VMs. I >>>>>>> am now checking GitLab, as that is where people are complaining the >>>>>>> most. 
>>>>>>> >>>>>>> agent: 1 >>>>>>> balloon: 65000 >>>>>>> bios: ovmf >>>>>>> boot: order=scsi0;net0 >>>>>>> cores: 10 >>>>>>> cpu: host >>>>>>> efidisk0: cephvm:vm-6506-disk-0,efitype=4m,size=528K >>>>>>> memory: 130000 >>>>>>> meta: creation-qemu=9.0.2,ctime=1734995123 >>>>>>> name: gitlab02 >>>>>>> net0: virtio=BC:24:11:6E:28:71,bridge=vmbr1,firewall=1 >>>>>>> numa: 0 >>>>>>> ostype: l26 >>>>>>> scsi0: >>>>>>> cephvm:vm-6506-disk-1,aio=native,cache=writeback,iothread=1,size=64G,ssd=1 >>>>>>> scsi1: >>>>>>> cephvm:vm-6506-disk-2,aio=native,cache=writeback,iothread=1,size=10T,ssd=1 >>>>>>> scsihw: virtio-scsi-single >>>>>>> smbios1: uuid=0a5294c0-c82a-40f2-aae4-f5880022a2ac >>>>>>> sockets: 2 >>>>>>> vmgenid: ea610fde-6c71-4b7f-9257-fa431a428e16 >>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> Gio >>>>>>> >>>>>>> Am 20.03.2025 um 10:23 schrieb Eneko Lacunza: >>>>>>>> Hi Giovanna, >>>>>>>> >>>>>>>> Can you post VM's full config? >>>>>>>> >>>>>>>> Also, can you test with IO thread enabled and SCSI virtio single, and >>>>>>>> multiple disks? >>>>>>>> >>>>>>>> Cheers >>>>>>>> >>>>>>>> El 19/3/25 a las 17:27, Giovanna Ratini escribió: >>>>>>>>> >>>>>>>>> hello Eneko, >>>>>>>>> >>>>>>>>> Yes I did. No significant changes. :-( >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> Gio >>>>>>>>> >>>>>>>>> >>>>>>>>> Am Mittwoch, März 19, 2025 13:09 CET, schrieb Eneko Lacunza >>>>>>>>> <elacu...@binovo.es>: >>>>>>>>> >>>>>>>>>> Hi Giovanna, >>>>>>>>>> >>>>>>>>>> Have you tried increasing iothreads option for the VM? >>>>>>>>>> >>>>>>>>>> Cheers >>>>>>>>>> >>>>>>>>>> El 18/3/25 a las 19:13, Giovanna Ratini escribió: >>>>>>>>>> > Hello Antony, >>>>>>>>>> > >>>>>>>>>> > no, no QoS applied to Vms. >>>>>>>>>> > >>>>>>>>>> > The Server has PCIe Gen 4 >>>>>>>>>> > >>>>>>>>>> > ceph osd dump | grep pool >>>>>>>>>> > pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash >>>>>>>>>> > rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags >>>>>>>>>> > hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application >>>>>>>>>> > mgr >>>>>>>>>> > read_balance_score 13.04 >>>>>>>>>> > pool 2 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 >>>>>>>>>> > object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on >>>>>>>>>> > last_change 598 lfor 0/598/596 flags hashpspool stripe_width 0 >>>>>>>>>> > application cephfs read_balance_score 2.02 >>>>>>>>>> > pool 3 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 >>>>>>>>>> > object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on >>>>>>>>>> > last_change 50 flags hashpspool stripe_width 0 pg_autoscale_bias 4 >>>>>>>>>> > pg_num_min 16 recovery_priority 5 application cephfs >>>>>>>>>> > read_balance_score 2.42 >>>>>>>>>> > pool 4 'cephvm' replicated size 3 min_size 2 crush_rule 0 >>>>>>>>>> > object_hash >>>>>>>>>> > rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 16386 >>>>>>>>>> > lfor 0/644/2603 flags hashpspool,selfmanaged_snaps stripe_width 0 >>>>>>>>>> > application rbd read_balance_score 1.52 >>>>>>>>>> > >>>>>>>>>> > I think, this is the default config. 🙈 >>>>>>>>>> > >>>>>>>>>> > I will search for my chassies supermicro upgrade. >>>>>>>>>> > >>>>>>>>>> > Thank you >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > Am 18.03.2025 um 17:57 schrieb Anthony D'Atri: >>>>>>>>>> >>> Then I tested on the *Proxmox host*, and the results were >>>>>>>>>> >>> significantly better. 
>>>>>>>>>> >> My Proxmox prowess is limited, but from my experience with other >>>>>>>>>> >> virtualization platforms, I have to ask if there is any QoS >>>>>>>>>> >> throttling applied to VMs. With OpenStack or DO there is often >>>>>>>>>> >> IOPS >>>>>>>>>> >> and/or throughput throttling via libvirt to mitigate noisy >>>>>>>>>> >> neighbors. >>>>>>>>>> >> >>>>>>>>>> >>> fio --name=host-test --filename=/dev/rbd0 --ioengine=libaio >>>>>>>>>> >>> --rw=randread --bs=4k --numjobs=4 --iodepth=32 --size=1G >>>>>>>>>> >>> --runtime=60 --group_reporting >>>>>>>>>> >>> >>>>>>>>>> >>> *IOPS*: *1.54M* >>>>>>>>>> >>> >>>>>>>>>> >>> # *Bandwidth*: *6032MiB/s (6325MB/s)* >>>>>>>>>> >>> # *Latency*: >>>>>>>>>> >>> >>>>>>>>>> >>> * *Avg*: *39.8µs* >>>>>>>>>> >>> * *99.9th percentile*: *71µs* >>>>>>>>>> >>> >>>>>>>>>> >>> # *CPU Usage*: *usr=22.60%, sys=77.13%* >>>>>>>>>> >>> # >>>>>>>>>> >>> >>>>>>>>>> >>> Am 18.03.2025 um 15:27 schrieb Anthony D'Atri: >>>>>>>>>> >>>> Which NVMe drive SKUs specifically? >>>>>>>>>> >>> # */dev/nvme6n1* – *KCD61LUL15T3* – 15.36 TB – SN: 6250A02QT5A8 >>>>>>>>>> >>> # */dev/nvme5n1* – *KCD61LUL15T3* – 15.36 TB – SN: 42R0A036T5A8 >>>>>>>>>> >>> # */dev/nvme4n1* – *KCD61LUL15T3* – 15.36 TB – SN: 6250A02UT5A8 >>>>>>>>>> >> Kioxia CD6. If you were using client-class drives all manner of >>>>>>>>>> >> performance issues would be expected. >>>>>>>>>> >> >>>>>>>>>> >> Is your server chassis at least PCIe Gen 4? If it’s Gen 3 that may >>>>>>>>>> >> hamper these drives. >>>>>>>>>> >> >>>>>>>>>> >> Also, how many of these are in your cluster? If it’s a small >>>>>>>>>> >> number >>>>>>>>>> >> you might still benefit from chopping each into at least 2 >>>>>>>>>> >> separate >>>>>>>>>> >> OSDs. >>>>>>>>>> >> >>>>>>>>>> >> And please send `ceph osd dump | grep pool`, having too few PGs >>>>>>>>>> >> wouldn’t do you any favors. >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >>>> Are you running a recent kernel? >>>>>>>>>> >>> penultimate: 6.8.12-8-pve (VM, yes) >>>>>>>>>> >> Groovy. If you were running like a CentOS 6 or CentOS 7 kernel >>>>>>>>>> >> then >>>>>>>>>> >> NVMe issues might be expected as old kernels had rudimentary NVMe >>>>>>>>>> >> support. >>>>>>>>>> >> >>>>>>>>>> >>>> Have you updated firmware on the NVMe devices? >>>>>>>>>> >>> No. >>>>>>>>>> >> Kioxia appears to not release firmware updates publicly but your >>>>>>>>>> >> chassis brand (Dell, HP, SMCI, etc) might have an update. >>>>>>>>>> >> e.g.https://www.dell.com/support/home/en-vc/drivers/driversdetails?driverid=7ny55 >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> If there is an available update I would strongly suggest >>>>>>>>>> >> applying. >>>>>>>>>> > >>>>>>>>>> >> >>>>>>>>>> >>> Thanks again, >>>>>>>>>> >>> >>>>>>>>>> >>> best regards, >>>>>>>>>> >>> Gio >>>>>>>>>> >>> >>>>>>>>>> >>> _______________________________________________ >>>>>>>>>> >>> ceph-users mailing list --ceph-users@ceph.io >>>>>>>>>> >>> To unsubscribe send an email toceph-users-le...@ceph.io >>>>>>>>>> > _______________________________________________ >>>>>>>>>> > ceph-users mailing list -- ceph-users@ceph.io >>>>>>>>>> > To unsubscribe send an email to ceph-users-le...@ceph.io >>>>>>>>>> >>>>>>>>>> Eneko Lacunza >>>>>>>>>> Zuzendari teknikoa | Director técnico >>>>>>>>>> Binovo IT Human Project >>>>>>>>>> >>>>>>>>>> Tel. +34 943 569 206 <tel:+34 943 569 206> | https://www.binovo.es >>>>>>>>>> Astigarragako Bidea, 2 - 2º izda. 
Oficina 10-11, 20180 Oiartzun >>>>>>>>>> >>>>>>>>>> https://www.youtube.com/user/CANALBINOVO >>>>>>>>>> https://www.linkedin.com/company/37269706/ >>>>>>>>>> _______________________________________________ >>>>>>>>>> ceph-users mailing list -- ceph-users@ceph.io >>>>>>>>>> To unsubscribe send an email to ceph-users-le...@ceph.io >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> EnekoLacunza >>>>>>>> >>>>>>>> Director Técnico | Zuzendari teknikoa >>>>>>>> >>>>>>>> Binovo IT Human Project >>>>>>>> >>>>>>>> 943 569 206 <tel:943 569 206> >>>>>>>> >>>>>>>> elacu...@binovo.es <mailto:elacu...@binovo.es> >>>>>>>> >>>>>>>> binovo.es <//binovo.es> >>>>>>>> >>>>>>>> Astigarragako Bidea, 2 - 2 izda. Oficina 10-11, 20180 Oiartzun >>>>>>>> >>>>>>>> >>>>>>>> youtube <https://www.youtube.com/user/CANALBINOVO/> >>>>>>>> linkedin <https://www.linkedin.com/company/37269706/> >>>>>>>> _______________________________________________ >>>>>>>> ceph-users mailing list -- ceph-users@ceph.io >>>>>>>> To unsubscribe send an email to ceph-users-le...@ceph.io >>>>>>> _______________________________________________ >>>>>>> ceph-users mailing list -- ceph-users@ceph.io >>>>>>> To unsubscribe send an email to ceph-users-le...@ceph.io >>>>>> >>>>>> Eneko Lacunza >>>>>> Zuzendari teknikoa | Director técnico >>>>>> Binovo IT Human Project >>>>>> >>>>>> Tel. +34 943 569 206 | https://www.binovo.es >>>>>> Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun >>>>>> >>>>>> https://www.youtube.com/user/CANALBINOVO >>>>>> https://www.linkedin.com/company/37269706/ >>>>>> _______________________________________________ >>>>>> ceph-users mailing list -- ceph-users@ceph.io >>>>>> To unsubscribe send an email to ceph-users-le...@ceph.io >>>>> >>>> >>>> Eneko Lacunza >>>> Zuzendari teknikoa | Director técnico >>>> Binovo IT Human Project >>>> >>>> Tel. +34 943 569 206 |https://www.binovo.es >>>> Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun >>>> >>>> https://www.youtube.com/user/CANALBINOVO >>>> https://www.linkedin.com/company/37269706/ >>>> _______________________________________________ >>>> ceph-users mailing list -- ceph-users@ceph.io >>>> To unsubscribe send an email to ceph-users-le...@ceph.io >>> >> >> Eneko Lacunza >> Zuzendari teknikoa | Director técnico >> Binovo IT Human Project >> >> Tel. +34 943 569 206 | https://www.binovo.es >> Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun >> >> https://www.youtube.com/user/CANALBINOVO >> https://www.linkedin.com/company/37269706/ >> _______________________________________________ >> ceph-users mailing list -- ceph-users@ceph.io >> To unsubscribe send an email to ceph-users-le...@ceph.io > _______________________________________________ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io _______________________________________________ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io