Re: [ceph-users] All SSD Pool - Odd Performance

Alexandre DERUMIER Sun, 22 Nov 2015 05:29:27 -0800

>>One test with proxmox ve 4 (qemu 2.4, iothread for device, and 
>>cache=writeback) gives 14856 iops


Please also note that qemu in proxmox ve 4 is compiled with jemalloc.


----- Original Message -----
From: "Udo Lembke" <ulem...@polarzone.de>
To: "Sean Redmond" <sean.redmo...@gmail.com>
Cc: "ceph-users" <ceph-us...@ceph.com>
Sent: Sunday, 22 November 2015 04:29:29
Subject: Re: [ceph-users] All SSD Pool - Odd Performance

Hi Sean, 
Haomai is right that qemu can make a huge performance difference. 

I have run two tests against the same ceph cluster (different pools, but this should 
not make any difference). 
One test with proxmox ve 4 (qemu 2.4, iothread for device, and cache=writeback) 
gives 14856 iops 
Same test with proxmox ve 3.4 (qemu 2.2.1, cache=writethrough) gives 5070 iops 
only. 
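
As a rough cross-check of these two figures, the sketch below computes the ratio between the runs and the bandwidth each IOPS value implies at a 4k block size. The helper names are illustrative and not part of fio or Proxmox. Note that the two setups differ in qemu version, cache mode and iothread, so the ratio does not isolate any single one of them.

```python
# IOPS figures quoted in this message; the helpers are illustrative only.
IOPS_PVE4 = 14856  # proxmox ve 4: qemu 2.4, iothread, cache=writeback
IOPS_PVE3 = 5070   # proxmox ve 3.4: qemu 2.2.1, cache=writethrough
BLOCK_KB = 4       # fio --blocksize=4k


def speedup(new_iops: int, old_iops: int) -> float:
    """Ratio of the faster run to the slower run."""
    return new_iops / old_iops


def implied_bandwidth_kb_s(iops: int, block_kb: int = BLOCK_KB) -> int:
    """Bandwidth in KB/s implied by an IOPS figure at a fixed block size."""
    return iops * block_kb


if __name__ == "__main__":
    print(f"speedup: {speedup(IOPS_PVE4, IOPS_PVE3):.2f}x")
    print(f"pve 4 implied bandwidth: {implied_bandwidth_kb_s(IOPS_PVE4)} KB/s")
    print(f"pve 3.4 implied bandwidth: {implied_bandwidth_kb_s(IOPS_PVE3)} KB/s")
```

The implied 59424 KB/s and 20280 KB/s agree with the aggrb values of 59424KB/s and 20282KB/s in the fio output, so the IOPS and bandwidth numbers are consistent with each other.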

Here are the full results: 
############### proxmox ve 3.x ############### 
kvm --version 
QEMU emulator version 2.2.1, Copyright (c) 2003-2008 Fabrice Bellard 

VM: 
virtio2: ceph_file:vm-405-disk-1,cache=writethrough,backup=no,size=4096G 

root@fileserver:/daten/support/test# fio --time_based --name=benchmark 
--size=4G --filename=/mnt/test.bin --ioengine=libaio --randrepeat=0 
--iodepth=128 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4 
--rw=randwrite --blocksize=4k --group_reporting 
fio: time_based requires a runtime/timeout setting 
benchmark: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, 
iodepth=128 
... 
fio-2.1.11 
Starting 4 processes 
benchmark: Laying out IO file(s) (1 file(s) / 4096MB) 
Jobs: 1 (f=1): [_(1),w(1),_(2)] [100.0% done] [0KB/40024KB/0KB /s] [0/10.6K/0 
iops] [eta 00m:00s] 
benchmark: (groupid=0, jobs=4): err= 0: pid=7821: Sun Nov 22 04:07:47 2015 
write: io=16384MB, bw=20282KB/s, iops=5070, runt=827178msec 
slat (usec): min=0, max=2531.7K, avg=778.68, stdev=12757.26 
clat (usec): min=508, max=2755.2K, avg=99980.14, stdev=146967.17 
lat (msec): min=1, max=2755, avg=100.76, stdev=147.54 
clat percentiles (msec): 
| 1.00th=[ 10], 5.00th=[ 14], 10.00th=[ 19], 20.00th=[ 28], 
| 30.00th=[ 36], 40.00th=[ 43], 50.00th=[ 51], 60.00th=[ 63], 
| 70.00th=[ 81], 80.00th=[ 128], 90.00th=[ 237], 95.00th=[ 367], 
| 99.00th=[ 717], 99.50th=[ 889], 99.90th=[ 1516], 99.95th=[ 1713], 
| 99.99th=[ 2573] 
bw (KB /s): min= 4, max=30726, per=26.90%, avg=5456.84, stdev=3014.45 
lat (usec) : 750=0.01%, 1000=0.01% 
lat (msec) : 2=0.01%, 4=0.01%, 10=1.11%, 20=10.18%, 50=37.74% 
lat (msec) : 100=26.45%, 250=15.22%, 500=6.66%, 750=1.74%, 1000=0.55% 
lat (msec) : 2000=0.29%, >=2000=0.03% 
cpu : usr=0.36%, sys=2.31%, ctx=1148702, majf=0, minf=30 
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1% 
issued : total=r=0/w=4194304/d=0, short=r=0/w=0/d=0 
latency : target=0, window=0, percentile=100.00%, depth=128 

Run status group 0 (all jobs): 
WRITE: io=16384MB, aggrb=20282KB/s, minb=20282KB/s, maxb=20282KB/s, 
mint=827178msec, maxt=827178msec 

Disk stats (read/write): 
dm-0: ios=0/4483641, merge=0/0, ticks=0/104928824, in_queue=105927128, 
util=100.00%, aggrios=1/4469640, aggrmerge=0/14788, aggrticks=64/103711096, 
aggrin_queue=104165356, aggrutil=100.00% 
vda: ios=1/4469640, merge=0/14788, ticks=64/103711096, in_queue=104165356, 
util=100.00% 

############################################## 

############### proxmox ve 4.x ############### 
kvm --version 
QEMU emulator version 2.4.0.1 pve-qemu-kvm_2.4-12, Copyright (c) 2003-2008 
Fabrice Bellard 

grep ceph /etc/pve/qemu-server/102.conf 
virtio1: ceph_test:vm-102-disk-1,cache=writeback,iothread=on,size=100G 

root@fileserver-test:/daten/tv01/test# fio --time_based --name=benchmark 
--size=4G --filename=/mnt/test.bin --ioengine=libaio --randrepeat=0 
--iodepth=128 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4 
--rw=randwrite --blocksize=4k --group_reporting 
fio: time_based requires a runtime/timeout setting 
benchmark: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, 
iodepth=128 
... 
fio-2.1.11 
Starting 4 processes 
Jobs: 4 (f=4): [w(4)] [99.6% done] [0KB/56148KB/0KB /s] [0/14.4K/0 iops] [eta 
00m:01s] 
benchmark: (groupid=0, jobs=4): err= 0: pid=26131: Sun Nov 22 03:51:04 2015 
write: io=0B, bw=59425KB/s, iops=14856, runt=282327msec 
slat (usec): min=6, max=216925, avg=261.78, stdev=1802.78 
clat (msec): min=1, max=330, avg=34.04, stdev=27.78 
lat (msec): min=1, max=330, avg=34.30, stdev=27.87 
clat percentiles (msec): 
| 1.00th=[ 10], 5.00th=[ 13], 10.00th=[ 14], 20.00th=[ 16], 
| 30.00th=[ 18], 40.00th=[ 19], 50.00th=[ 21], 60.00th=[ 24], 
| 70.00th=[ 33], 80.00th=[ 62], 90.00th=[ 81], 95.00th=[ 87], 
| 99.00th=[ 95], 99.50th=[ 100], 99.90th=[ 269], 99.95th=[ 277], 
| 99.99th=[ 297] 
bw (KB /s): min= 3, max=42216, per=25.10%, avg=14917.03, stdev=2990.50 
lat (msec) : 2=0.01%, 4=0.01%, 10=1.13%, 20=45.52%, 50=28.23% 
lat (msec) : 100=24.61%, 250=0.35%, 500=0.16% 
cpu : usr=2.20%, sys=14.42%, ctx=2462199, majf=0, minf=40 
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1% 
issued : total=r=0/w=4194304/d=0, short=r=0/w=0/d=0 
latency : target=0, window=0, percentile=100.00%, depth=128 

Run status group 0 (all jobs): 
WRITE: io=16384MB, aggrb=59424KB/s, minb=59424KB/s, maxb=59424KB/s, 
mint=282327msec, maxt=282327msec 

Disk stats (read/write): 
dm-0: ios=0/4192044, merge=0/0, ticks=0/35093432, in_queue=35116888, 
util=99.70%, aggrios=0/4194626, aggrmerge=0/14, aggrticks=0/34902692, 
aggrin_queue=34903976, aggrutil=99.65% 
vda: ios=0/4194626, merge=0/14, ticks=0/34902692, in_queue=34903976, 
util=99.65% 
############################################## 
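
To compare runs like the two above without reading the numbers off by hand, a small parser can pull bw, iops and runt from the fio summary line. This is a minimal sketch that assumes the fio 2.x text layout shown in this thread; the regular expression and function names are my own, not part of fio, and other fio versions format this line differently.

```python
import re

# Matches the "write:" summary line as printed by the fio 2.x runs in this thread.
SUMMARY_RE = re.compile(
    r"bw=(?P<bw_kb_s>\d+)KB/s,\s*iops=(?P<iops>\d+)\s*,\s*runt=\s*(?P<runt_ms>\d+)msec"
)


def parse_summary(line: str) -> dict:
    """Extract bandwidth (KB/s), IOPS and runtime (ms) as integers."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a fio summary line: {line!r}")
    return {key: int(value) for key, value in match.groupdict().items()}


def iops_from_issued(issued_writes: int, runt_ms: int) -> float:
    """Average IOPS derived from the issued write count and the runtime."""
    return issued_writes * 1000 / runt_ms


if __name__ == "__main__":
    runs = {
        "proxmox ve 3.x": "write: io=16384MB, bw=20282KB/s, iops=5070, runt=827178msec",
        "proxmox ve 4.x": "write: io=0B, bw=59425KB/s, iops=14856, runt=282327msec",
    }
    issued = 4194304  # w= value from the "issued" line, identical in both runs
    for name, line in runs.items():
        summary = parse_summary(line)
        derived = iops_from_issued(issued, summary["runt_ms"])
        print(f"{name}: reported iops={summary['iops']}, derived iops={derived:.1f}")
```

Dividing the issued write count by the runtime reproduces the reported iops for both runs. That suggests the io=0B in the proxmox ve 4 summary line is a display quirk rather than a real difference in the amount written; the group summary reports io=16384MB for both.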

regards 

Udo 

On 19.11.2015 11:46, Sean Redmond wrote: 



Hi Mike/Warren, 

Thanks for helping out here. I am running the fio command below to test this 
with 4 jobs and an iodepth of 128: 

fio --time_based --name=benchmark --size=4G --filename=/mnt/test.bin 
--ioengine=libaio --randrepeat=0 --iodepth=128 --direct=1 --invalidate=1 
--verify=0 --verify_fatal=0 --numjobs=4 --rw=randwrite --blocksize=4k 
--group_reportin 
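
The fio output later in this message prints "fio: time_based requires a runtime/timeout setting", which suggests that --time_based had no effect and each job simply wrote its full 4G file. If a fixed-duration run is wanted, fio's --runtime option can be added. The sketch below only assembles and prints such a command line; the 60 second value and the helper name are assumptions for illustration, not settings used in this thread.

```python
import shlex


def build_fio_command(runtime_s: int, filename: str = "/mnt/test.bin") -> list:
    """Return the fio argv from this thread with an explicit --runtime added."""
    return [
        "fio",
        "--time_based",
        f"--runtime={runtime_s}",  # assumption: duration in seconds, chosen by the caller
        "--name=benchmark",
        "--size=4G",
        f"--filename={filename}",
        "--ioengine=libaio",
        "--randrepeat=0",
        "--iodepth=128",
        "--direct=1",
        "--invalidate=1",
        "--verify=0",
        "--verify_fatal=0",
        "--numjobs=4",
        "--rw=randwrite",
        "--blocksize=4k",
        "--group_reporting",
    ]


if __name__ == "__main__":
    # Print a shell-safe command line; nothing is executed here.
    print(shlex.join(build_fio_command(60)))
```

With a runtime set, results from hosts of different speed cover the same wall-clock window instead of the same amount of data, which may or may not be what a given comparison needs.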

The QEMU instance is created using nova; the settings I can see in the config 
are below: 

<disk type='network' device='disk'> 
<driver name='qemu' type='raw' cache='writeback'/> 
<auth username='$$'> 
<secret type='ceph' uuid='$$'/> 
</auth> 
<source protocol='rbd' name='ssd_volume/volume-$$'> 
<host name='$$' port='6789'/> 
<host name='$$' port='6789'/> 
<host name='$$' port='6789'/> 
</source> 
<target dev='vde' bus='virtio'/> 
<serial>$$</serial> 
<address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/> 
</disk> 


The output from running fio is below: 

# fio --time_based --name=benchmark --size=4G --filename=/mnt/test.bin 
--ioengine=libaio --randrepeat=0 --iodepth=128 --direct=1 --invalidate=1 
--verify=0 --verify_fatal=0 --numjobs=4 --rw=randwrite --blocksize=4k 
--group_reporting 
fio: time_based requires a runtime/timeout setting 
benchmark: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, 
iodepth=128 
... 
benchmark: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, 
iodepth=128 
fio-2.0.13 
Starting 4 processes 
Jobs: 3 (f=3): [_www] [99.7% done] [0K/36351K/0K /s] [0 /9087 /0 iops] [eta 
00m:03s] 
benchmark: (groupid=0, jobs=4): err= 0: pid=8547: Thu Nov 19 05:16:31 2015 
write: io=16384MB, bw=19103KB/s, iops=4775 , runt=878269msec 
slat (usec): min=4 , max=2339.4K, avg=807.17, stdev=12460.02 
clat (usec): min=1 , max=2469.6K, avg=106265.05, stdev=138893.39 
lat (usec): min=67 , max=2469.8K, avg=107073.04, stdev=139377.68 
clat percentiles (usec): 
| 1.00th=[ 1928], 5.00th=[ 9408], 10.00th=[12352], 20.00th=[18816], 
| 30.00th=[43776], 40.00th=[64768], 50.00th=[78336], 60.00th=[89600], 
| 70.00th=[102912], 80.00th=[123392], 90.00th=[216064], 95.00th=[370688], 
| 99.00th=[733184], 99.50th=[782336], 99.90th=[1044480], 99.95th=[2088960], 
| 99.99th=[2342912] 
bw (KB/s) : min= 4, max=14968, per=26.11%, avg=4987.39, stdev=1947.67 
lat (usec) : 2=0.01%, 20=0.01%, 50=0.01%, 100=0.05%, 250=0.30% 
lat (usec) : 500=0.24%, 750=0.11%, 1000=0.08% 
lat (msec) : 2=0.23%, 4=0.46%, 10=4.47%, 20=15.08%, 50=11.28% 
lat (msec) : 100=35.47%, 250=23.52%, 500=5.92%, 750=1.96%, 1000=0.70% 
lat (msec) : 2000=0.06%, >=2000=0.06% 
cpu : usr=0.62%, sys=2.42%, ctx=1602209, majf=1, minf=101 
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1% 
issued : total=r=0/w=4194304/d=0, short=r=0/w=0/d=0 

Run status group 0 (all jobs): 
WRITE: io=16384MB, aggrb=19102KB/s, minb=19102KB/s, maxb=19102KB/s, 
mint=878269msec, maxt=878269msec 

Disk stats (read/write): 
vde: ios=1119/4330437, merge=0/105599, ticks=556/121755054, in_queue=121749666, 
util=99.86 

The lspci output from within the guest is below: 

# lspci | grep -i scsi 
00:04.0 SCSI storage controller: Red Hat, Inc Virtio block devic 

Thanks 

On Wed, Nov 18, 2015 at 7:05 PM, Warren Wang - ISD <warren.w...@walmart.com> 
wrote: 

What were you using for iodepth and numjobs? If you’re getting an average of 
2ms per operation, and you’re single threaded, I’d expect about 500 IOPS / 
thread, until you hit the limit of your QEMU setup, which may be a single IO 
thread. That’s also what I think Mike is alluding to. 
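
Warren's estimate follows from the relation IOPS = outstanding I/Os / average latency (Little's law). A minimal sketch of that arithmetic is below. The function name is illustrative; the second pair of inputs (512 outstanding I/Os from 4 jobs at iodepth 128, with average lat values of 100.76 ms and 34.30 ms) is taken from the fio runs reported elsewhere in this thread.

```python
def expected_iops(outstanding_ios: int, avg_latency_ms: float) -> float:
    """Little's law: throughput = concurrency / average latency."""
    return outstanding_ios * 1000 / avg_latency_ms


if __name__ == "__main__":
    # One thread with one I/O in flight at 2 ms per operation.
    print(expected_iops(1, 2.0))  # prints 500.0

    # 4 jobs x iodepth 128 = 512 I/Os in flight, using the reported average lat.
    outstanding = 4 * 128
    print(round(expected_iops(outstanding, 100.76)))  # proxmox ve 3.x run
    print(round(expected_iops(outstanding, 34.30)))   # proxmox ve 4.x run
```

With one outstanding I/O at 2 ms this gives 500 IOPS, as Warren says. For the later runs it gives roughly 5081 and 14927 IOPS, close to the 5070 and 14856 that fio reported, so the guest-side results look latency-bound at a fixed queue depth.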

Warren 

From: Sean Redmond <sean.redmo...@gmail.com> 
Date: Wednesday, November 18, 2015 at 6:39 AM 
To: "ceph-us...@ceph.com" <ceph-us...@ceph.com> 
Subject: [ceph-users] All SSD Pool - Odd Performance 

Hi, 

I have a performance question for anyone running an SSD-only pool. Let me 
detail the setup first. 

12 X Dell PowerEdge R630 (2 X 2620v3, 64GB RAM) 
8 X Intel DC 3710 800GB 
Dual port Solarflare 10Gb/s NIC (one front and one back) 
Ceph 0.94.5 
Ubuntu 14.04 (3.13.0-68-generic) 

The above is in one pool that is used for QEMU guests. A 4k fio test on the SSD 
directly yields around 55k IOPS; the same test inside a QEMU guest seems to hit 
a limit around 4k IOPS. If I deploy multiple guests, they can all reach 4k IOPS 
simultaneously. 

I don't see any evidence of a bottleneck on the OSD hosts. Is this limit inside 
the guest expected, or am I just not looking deep enough yet? 

Thanks 







_______________________________________________
ceph-users mailing list ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 


