It would have been more interesting if you had tweaked only one option, as now we can’t be sure which change had what impact… :-)
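For example, two extra runs against the PVE 4 setup that each change a single option would separate the cache-mode effect from the iothread effect. These are hypothetical variants of the qemu-server line quoted further down, differing only in the device options:

    virtio1: ceph_test:vm-102-disk-1,cache=writethrough,iothread=on,size=100G
    virtio1: ceph_test:vm-102-disk-1,cache=writeback,size=100G

Comparing each of those against the two results below would show how much of the 5070 -> 14856 iops jump comes from writeback caching and how much from the dedicated IO thread (the qemu 2.2.1 vs 2.4 difference is a third variable that would still be mixed in on the PVE 3.4 side).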
> On 22 Nov 2015, at 04:29, Udo Lembke <ulem...@polarzone.de> wrote:
>
> Hi Sean,
> Haomai is right that qemu can make a huge performance difference.
>
> I have done two tests against the same ceph cluster (different pools, but this should not make any difference).
> One test with proxmox ve 4 (qemu 2.4, iothread for the device, and cache=writeback) gives 14856 iops.
> The same test with proxmox ve 3.4 (qemu 2.2.1, cache=writethrough) gives only 5070 iops.
>
> Here are the results in full:
>
> ############### proxmox ve 3.x ###############
> kvm --version
> QEMU emulator version 2.2.1, Copyright (c) 2003-2008 Fabrice Bellard
>
> VM:
> virtio2: ceph_file:vm-405-disk-1,cache=writethrough,backup=no,size=4096G
>
> root@fileserver:/daten/support/test# fio --time_based --name=benchmark --size=4G --filename=/mnt/test.bin --ioengine=libaio --randrepeat=0 --iodepth=128 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4 --rw=randwrite --blocksize=4k --group_reporting
> fio: time_based requires a runtime/timeout setting
> benchmark: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
> ...
> fio-2.1.11
> Starting 4 processes
> benchmark: Laying out IO file(s) (1 file(s) / 4096MB)
> Jobs: 1 (f=1): [_(1),w(1),_(2)] [100.0% done] [0KB/40024KB/0KB /s] [0/10.6K/0 iops] [eta 00m:00s]
> benchmark: (groupid=0, jobs=4): err= 0: pid=7821: Sun Nov 22 04:07:47 2015
>   write: io=16384MB, bw=20282KB/s, iops=5070, runt=827178msec
>     slat (usec): min=0, max=2531.7K, avg=778.68, stdev=12757.26
>     clat (usec): min=508, max=2755.2K, avg=99980.14, stdev=146967.17
>     lat (msec): min=1, max=2755, avg=100.76, stdev=147.54
>     clat percentiles (msec):
>      |  1.00th=[ 10],  5.00th=[ 14], 10.00th=[ 19], 20.00th=[ 28],
>      | 30.00th=[ 36], 40.00th=[ 43], 50.00th=[ 51], 60.00th=[ 63],
>      | 70.00th=[ 81], 80.00th=[ 128], 90.00th=[ 237], 95.00th=[ 367],
>      | 99.00th=[ 717], 99.50th=[ 889], 99.90th=[ 1516], 99.95th=[ 1713],
>      | 99.99th=[ 2573]
>     bw (KB /s): min= 4, max=30726, per=26.90%, avg=5456.84, stdev=3014.45
>     lat (usec) : 750=0.01%, 1000=0.01%
>     lat (msec) : 2=0.01%, 4=0.01%, 10=1.11%, 20=10.18%, 50=37.74%
>     lat (msec) : 100=26.45%, 250=15.22%, 500=6.66%, 750=1.74%, 1000=0.55%
>     lat (msec) : 2000=0.29%, >=2000=0.03%
>   cpu          : usr=0.36%, sys=2.31%, ctx=1148702, majf=0, minf=30
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
>      issued    : total=r=0/w=4194304/d=0, short=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=128
>
> Run status group 0 (all jobs):
>   WRITE: io=16384MB, aggrb=20282KB/s, minb=20282KB/s, maxb=20282KB/s, mint=827178msec, maxt=827178msec
>
> Disk stats (read/write):
>   dm-0: ios=0/4483641, merge=0/0, ticks=0/104928824, in_queue=105927128, util=100.00%, aggrios=1/4469640, aggrmerge=0/14788, aggrticks=64/103711096, aggrin_queue=104165356, aggrutil=100.00%
>   vda: ios=1/4469640, merge=0/14788, ticks=64/103711096, in_queue=104165356, util=100.00%
> ##############################################
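(A side note on the "fio: time_based requires a runtime/timeout setting" line above: without a runtime, --time_based is ignored and each job simply writes its full --size, which is why these runs finish after 16384MB (4 jobs x 4G) rather than after a fixed time. A time-bounded version would just add a runtime to the same command; the 60 seconds here is an arbitrary example value:

    fio --time_based --runtime=60 --name=benchmark --size=4G --filename=/mnt/test.bin --ioengine=libaio --randrepeat=0 --iodepth=128 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4 --rw=randwrite --blocksize=4k --group_reporting

That doesn't change the comparison, it only makes the run length predictable.)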
> ############### proxmox ve 4.x ###############
> kvm --version
> QEMU emulator version 2.4.0.1 pve-qemu-kvm_2.4-12, Copyright (c) 2003-2008 Fabrice Bellard
>
> grep ceph /etc/pve/qemu-server/102.conf
> virtio1: ceph_test:vm-102-disk-1,cache=writeback,iothread=on,size=100G
>
> root@fileserver-test:/daten/tv01/test# fio --time_based --name=benchmark --size=4G --filename=/mnt/test.bin --ioengine=libaio --randrepeat=0 --iodepth=128 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4 --rw=randwrite --blocksize=4k --group_reporting
> fio: time_based requires a runtime/timeout setting
> benchmark: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
> ...
> fio-2.1.11
> Starting 4 processes
> Jobs: 4 (f=4): [w(4)] [99.6% done] [0KB/56148KB/0KB /s] [0/14.4K/0 iops] [eta 00m:01s]
> benchmark: (groupid=0, jobs=4): err= 0: pid=26131: Sun Nov 22 03:51:04 2015
>   write: io=0B, bw=59425KB/s, iops=14856, runt=282327msec
>     slat (usec): min=6, max=216925, avg=261.78, stdev=1802.78
>     clat (msec): min=1, max=330, avg=34.04, stdev=27.78
>     lat (msec): min=1, max=330, avg=34.30, stdev=27.87
>     clat percentiles (msec):
>      |  1.00th=[ 10],  5.00th=[ 13], 10.00th=[ 14], 20.00th=[ 16],
>      | 30.00th=[ 18], 40.00th=[ 19], 50.00th=[ 21], 60.00th=[ 24],
>      | 70.00th=[ 33], 80.00th=[ 62], 90.00th=[ 81], 95.00th=[ 87],
>      | 99.00th=[ 95], 99.50th=[ 100], 99.90th=[ 269], 99.95th=[ 277],
>      | 99.99th=[ 297]
>     bw (KB /s): min= 3, max=42216, per=25.10%, avg=14917.03, stdev=2990.50
>     lat (msec) : 2=0.01%, 4=0.01%, 10=1.13%, 20=45.52%, 50=28.23%
>     lat (msec) : 100=24.61%, 250=0.35%, 500=0.16%
>   cpu          : usr=2.20%, sys=14.42%, ctx=2462199, majf=0, minf=40
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
>      issued    : total=r=0/w=4194304/d=0, short=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=128
>
> Run status group 0 (all jobs):
>   WRITE: io=16384MB, aggrb=59424KB/s, minb=59424KB/s, maxb=59424KB/s, mint=282327msec, maxt=282327msec
>
> Disk stats (read/write):
>   dm-0: ios=0/4192044, merge=0/0, ticks=0/35093432, in_queue=35116888, util=99.70%, aggrios=0/4194626, aggrmerge=0/14, aggrticks=0/34902692, aggrin_queue=34903976, aggrutil=99.65%
>   vda: ios=0/4194626, merge=0/14, ticks=0/34902692, in_queue=34903976, util=99.65%
> ##############################################
>
> regards
>
> Udo
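For anyone not on Proxmox: iothread=on in the qemu-server config corresponds to giving the virtio-blk device its own IO thread. In libvirt terms, and this is only a rough sketch assuming a libvirt recent enough to support per-disk iothreads, it would look something like the following; compare the nova-generated <disk> element Sean quotes below, which has no iothread at all:

    <iothreads>1</iothreads>
    ...
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback' iothread='1'/>
      ...
    </disk>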
> On 19.11.2015 11:46, Sean Redmond wrote:
>> Hi Mike/Warren,
>>
>> Thanks for helping out here. I am running the below fio command to test this, with 4 jobs and an iodepth of 128:
>>
>> fio --time_based --name=benchmark --size=4G --filename=/mnt/test.bin --ioengine=libaio --randrepeat=0 --iodepth=128 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4 --rw=randwrite --blocksize=4k --group_reporting
>>
>> The QEMU instance is created using nova; the settings I can see in the config are below:
>>
>> <disk type='network' device='disk'>
>>   <driver name='qemu' type='raw' cache='writeback'/>
>>   <auth username='$$'>
>>     <secret type='ceph' uuid='$$'/>
>>   </auth>
>>   <source protocol='rbd' name='ssd_volume/volume-$$'>
>>     <host name='$$' port='6789'/>
>>     <host name='$$' port='6789'/>
>>     <host name='$$' port='6789'/>
>>   </source>
>>   <target dev='vde' bus='virtio'/>
>>   <serial>$$</serial>
>>   <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
>> </disk>
>>
>> The below shows the output from running fio:
>>
>> # fio --time_based --name=benchmark --size=4G --filename=/mnt/test.bin --ioengine=libaio --randrepeat=0 --iodepth=128 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4 --rw=randwrite --blocksize=4k --group_reporting
>> fio: time_based requires a runtime/timeout setting
>> benchmark: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
>> ...
>> benchmark: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
>> fio-2.0.13
>> Starting 4 processes
>> Jobs: 3 (f=3): [_www] [99.7% done] [0K/36351K/0K /s] [0/9087/0 iops] [eta 00m:03s]
>> benchmark: (groupid=0, jobs=4): err= 0: pid=8547: Thu Nov 19 05:16:31 2015
>>   write: io=16384MB, bw=19103KB/s, iops=4775, runt=878269msec
>>     slat (usec): min=4, max=2339.4K, avg=807.17, stdev=12460.02
>>     clat (usec): min=1, max=2469.6K, avg=106265.05, stdev=138893.39
>>     lat (usec): min=67, max=2469.8K, avg=107073.04, stdev=139377.68
>>     clat percentiles (usec):
>>      |  1.00th=[ 1928],  5.00th=[ 9408], 10.00th=[12352], 20.00th=[18816],
>>      | 30.00th=[43776], 40.00th=[64768], 50.00th=[78336], 60.00th=[89600],
>>      | 70.00th=[102912], 80.00th=[123392], 90.00th=[216064], 95.00th=[370688],
>>      | 99.00th=[733184], 99.50th=[782336], 99.90th=[1044480], 99.95th=[2088960],
>>      | 99.99th=[2342912]
>>     bw (KB/s) : min= 4, max=14968, per=26.11%, avg=4987.39, stdev=1947.67
>>     lat (usec) : 2=0.01%, 20=0.01%, 50=0.01%, 100=0.05%, 250=0.30%
>>     lat (usec) : 500=0.24%, 750=0.11%, 1000=0.08%
>>     lat (msec) : 2=0.23%, 4=0.46%, 10=4.47%, 20=15.08%, 50=11.28%
>>     lat (msec) : 100=35.47%, 250=23.52%, 500=5.92%, 750=1.96%, 1000=0.70%
>>     lat (msec) : 2000=0.06%, >=2000=0.06%
>>   cpu          : usr=0.62%, sys=2.42%, ctx=1602209, majf=1, minf=101
>>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
>>      issued    : total=r=0/w=4194304/d=0, short=r=0/w=0/d=0
>>
>> Run status group 0 (all jobs):
>>   WRITE: io=16384MB, aggrb=19102KB/s, minb=19102KB/s, maxb=19102KB/s, mint=878269msec, maxt=878269msec
>>
>> Disk stats (read/write):
>>   vde: ios=1119/4330437, merge=0/105599, ticks=556/121755054, in_queue=121749666, util=99.86%
>>
>> The below shows lspci from within the guest:
>>
>> # lspci | grep -i scsi
>> 00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device
>>
>> Thanks
>>
>> On Wed, Nov 18, 2015 at 7:05 PM, Warren Wang - ISD <warren.w...@walmart.com> wrote:
>> What were you using for iodepth and numjobs? If you’re getting an average of 2ms per operation, and you’re single threaded, I’d expect about 500 IOPS / thread, until you hit the limit of your QEMU setup, which may be a single IO thread. That’s also what I think Mike is alluding to.
>>
>> Warren
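Warren's 2ms figure is also a handy sanity check on the numbers above, via Little's law (iops is roughly outstanding IOs divided by average latency). With 4 jobs x iodepth 128 = 512 IOs in flight, the PVE 3.4 run at ~100ms average completion latency works out to roughly 512 / 0.100 = 5100 iops, the PVE 4 run at ~34ms to roughly 512 / 0.034 = 15000 iops, and Sean's in-guest run at ~106ms to roughly 4800 iops. All three land very close to the 5070, 14856 and 4775 that fio reports, so the queue is being kept full and the differences really are per-request latency, consistent with the cache-mode / IO-thread changes rather than with fio not pushing hard enough.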
>>
>> From: Sean Redmond <sean.redmo...@gmail.com>
>> Date: Wednesday, November 18, 2015 at 6:39 AM
>> To: "ceph-us...@ceph.com" <ceph-us...@ceph.com>
>> Subject: [ceph-users] All SSD Pool - Odd Performance
>>
>> Hi,
>>
>> I have a performance question for anyone running an SSD-only pool. Let me detail the setup first:
>>
>> 12 x Dell PowerEdge R630 (2 x 2620v3, 64GB RAM)
>> 8 x Intel DC 3710 800GB
>> Dual-port Solarflare 10Gb/s NIC (one front and one back)
>> Ceph 0.94.5
>> Ubuntu 14.04 (3.13.0-68-generic)
>>
>> The above is in one pool that is used for QEMU guests. A 4k fio test on the SSD directly yields around 55k iops; the same test inside a QEMU guest seems to hit a limit around 4k iops. If I deploy multiple guests they can all reach 4k iops simultaneously.
>>
>> I don't see any evidence of a bottleneck on the OSD hosts. Is this limit inside the guest expected, or am I just not looking deep enough yet?
>>
>> Thanks
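Sean: one way to narrow down where the ~4k ceiling sits would be to take QEMU out of the picture and drive the same pool directly from a hypervisor node with fio's rbd engine. A rough sketch only, assuming fio is built with rbd support and using a hypothetical scratch image named fio-test in your ssd_volume pool:

    rbd create ssd_volume/fio-test --size 10240
    fio --name=rbd-bench --ioengine=rbd --clientname=admin --pool=ssd_volume --rbdname=fio-test --rw=randwrite --blocksize=4k --iodepth=128 --numjobs=4 --time_based --runtime=60 --group_reporting

If that gets well past 4k iops, the limit is in the guest/QEMU path (IO thread, cache mode, librbd cache settings) rather than in the pool; if it doesn't, the OSD side is the place to keep digging.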
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com