Re: [ceph-users] fio librbd result is poor

mazhongming Sun, 18 Dec 2016 23:06:07 -0800

Hi Christian,
Thanks for your reply.


At 2016-12-19 14:01:57, "Christian Balzer" <ch...@gol.com> wrote:
>
>Hello,
>
>On Mon, 19 Dec 2016 13:29:07 +0800 (CST) 马忠明 wrote:
>
>> Hi guys,
>> 
>> Recently I have been testing our Ceph cluster, which is mainly used for block 
>> storage (RBD).
>> 
>> We have 30 SSDs in total (5 storage nodes, 6 SSDs per node). However, 
>> the fio results are very poor.
>>
>All relevant details are missing.
>SSD exact models, CPU/RAM config, network config, Ceph, OS/kernel, fio
>versions, the config you tested this with, as in replication.
SSD: Intel® SSD DC S3510 Series 1.2TB 2.5"
CPU: 2x Intel E5-2630 v4
MEM: 128GB
Network config: 2x 10G, bond mode 4 (LACP)
Ceph: Hammer 0.94.6
OS/kernel: Ubuntu 14.04.5 LTS / 3.13.0-96-generic
fio: 2.12


>
>> We tested the workload on the SSD pool with the following parameters:
>> 
>> "fio --size=50G \
>>        --ioengine=rbd \
>>        --direct=1 \
>>        --numjobs=1 \
>>        --rw=randwrite \
>>        --name=com_ssd_4k_randwrite \
>>        --bs=4k \
>>        --iodepth=32 \
>>        --pool=ssd_volumes \
>>        --runtime=60 \
>>        --ramp_time=30 \
>>        --rbdname=4k_test_image"
>> 
>> (and the same again with randread in place of randwrite)
>> 
>> and here is the result:
>> 
>> random write: 4631 IOPS; random read: 21127 IOPS
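
The fio run above comes in two variants, randwrite and randread, which differ only in --rw and --name. A minimal Python sketch for generating both command lines, using only the options quoted in the original post; the names BASE_ARGS and fio_command are hypothetical and are not taken from the poster's tooling:

```python
import shlex

# Options copied from the fio invocation quoted in this thread.
BASE_ARGS = {
    "size": "50G",
    "ioengine": "rbd",
    "direct": "1",
    "numjobs": "1",
    "bs": "4k",
    "iodepth": "32",
    "pool": "ssd_volumes",
    "runtime": "60",
    "ramp_time": "30",
    "rbdname": "4k_test_image",
}


def fio_command(rw):
    """Return the fio argument list for one access pattern (hypothetical helper)."""
    args = dict(BASE_ARGS, rw=rw, name=f"com_ssd_4k_{rw}")
    return ["fio"] + [f"--{key}={value}" for key, value in args.items()]


# Print one shell-ready command line per access pattern.
for rw in ("randwrite", "randread"):
    print(shlex.join(fio_command(rw)))
```

The lists can be passed to subprocess.run() as they are; printing them first makes it easy to confirm that both runs use identical parameters apart from the access pattern.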
>> 
>> I also tested a pool (size=1, min_size=1, pg_num=256) consisting of 
>> only a single SSD with the same workload pattern, and the result is more 
>> acceptable (random write: 8303; random read: 27859).
>> 
>I'm only going to comment on the write part.
>
>On my staging cluster (* see below) I ran your fio against the cache tier
>(so only SSDs involved) with this result:
>
>  write: io=4206.3MB, bw=71784KB/s, iops=17945, runt= 60003msec
>    slat (usec): min=0, max=531, avg= 3.26, stdev=11.33
>    clat (usec): min=5, max=41996, avg=1770.23, stdev=2260.61
>     lat (usec): min=9, max=41997, avg=1773.36, stdev=2260.60
>
>So more than 2 times better than your non-replicated test.
>
>4k randwrites stress the CPUs (run atop or such on your OSD nodes
>when doing a test run), so this might be your limit here.
>Along with less than optimal SSDs or a high latency network.
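
As a rough cross-check of these figures, Little's law relates queue depth, latency and throughput: with a constant number of I/Os in flight, IOPS is approximately iodepth divided by mean completion latency. The sketch below applies that to the numbers quoted in this thread. The function names are illustrative only, and it assumes the queue of 32 stays full for the whole run.

```python
def iops_from_latency(iodepth, mean_lat_usec):
    """Estimate IOPS from queue depth and mean latency in microseconds."""
    return iodepth / (mean_lat_usec / 1_000_000)


def latency_ms_from_iops(iodepth, iops):
    """Estimate mean latency in milliseconds from queue depth and IOPS."""
    return iodepth / iops * 1000


def bandwidth_kb_per_s(iops, block_kb=4):
    """Bandwidth implied by an IOPS figure at a fixed block size."""
    return iops * block_kb


# Staging cluster: iodepth=32, avg lat 1773.36 usec, reported iops=17945.
print(round(iops_from_latency(32, 1773.36)))
# Original poster: 4631 IOPS (size=3 pool) and 8303 IOPS (single-SSD pool).
print(round(latency_ms_from_iops(32, 4631), 2))
print(round(latency_ms_from_iops(32, 8303), 2))
# 4k blocks at 17945 IOPS, to compare with the reported bw=71784KB/s.
print(bandwidth_kb_per_s(17945))
```

Under this approximation the 17945 IOPS result is consistent with its reported average latency of about 1.77 ms, and 4631 IOPS at iodepth=32 corresponds to a mean latency of roughly 6.9 ms per 4k write.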

>
Yes, CPU usage might be the bottleneck of the whole system. BTW, our Ceph 
cluster is integrated with Mirantis OpenStack, and the above result was run 
from one compute node. I also ran a stress test from all 10 compute nodes. 
The result is almost the same, and CPU usage on every storage node is about 
50-60%. The CPU usage of each SSD OSD process is about 250-300%.
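
For scale, the per-OSD figure can be converted into core equivalents. A small sketch, assuming 6 OSD processes per node (from the 6 SSDs per node mentioned earlier) and 40 logical CPUs per node (two E5-2630 v4 at 10 cores / 20 threads each, with Hyper-Threading enabled, which this thread does not confirm). The function names are illustrative.

```python
def osd_core_equivalents(osds_per_node, cpu_percent_per_osd):
    """Total CPU used by OSD processes, expressed in whole cores."""
    return osds_per_node * cpu_percent_per_osd / 100.0


def node_utilization(core_equivalents, logical_cpus):
    """Fraction of the node's logical CPUs consumed."""
    return core_equivalents / logical_cpus


OSDS_PER_NODE = 6   # from the thread: 6 SSDs per storage node
LOGICAL_CPUS = 40   # assumption: 2 sockets x 20 threads, HT enabled

for percent in (250, 300):
    cores = osd_core_equivalents(OSDS_PER_NODE, percent)
    share = node_utilization(cores, LOGICAL_CPUS)
    print(f"{percent}% per OSD -> {cores:.0f} cores, {share:.1%} of the node")
```

On those assumptions the OSD processes alone account for 15 to 18 cores' worth of CPU per node, which is in the same general range as the 50-60% node usage reported above.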


Pool parameters for ssd_volumes: size=3, min_size=1, pg_num=2048, pgp_num=2048
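
One quick way to read these pool parameters is the number of PG replicas each OSD carries, pg_num x size / number of OSDs. A sketch, assuming all 30 SSD OSDs back this pool and counting this pool only (the function name is illustrative):

```python
def pg_replicas_per_osd(pg_num, size, num_osds):
    """Average number of PG replicas hosted by each OSD for one pool."""
    return pg_num * size / num_osds


# ssd_volumes: pg_num=2048, size=3, spread over 30 SSD OSDs (assumed).
print(pg_replicas_per_osd(2048, 3, 30))
```

Ceph's documentation has generally suggested a target on the order of 100 PGs per OSD, so a value of about 205 is on the high side, although that alone would not necessarily explain the write numbers.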




>Christian
>
>
>* Staging cluster:
>---
>4 nodes running latest Hammer under Debian Jessie (with sysvinit, kernel
>4.6) and manually created OSDs. 
>Infiniband (IPoIB) QDR (40Gb/s, about 30Gb/s effective) between all nodes.
>
>2 HDD OSD nodes with 32GB RAM, fast enough CPU (E5-2620 v3), 2x 200GB DC S3610
>for OS and journals (2 per SSD), 4x 1GB 2.5" SATAs for OSDs.
>For my amusement and edification the OSDs of one node are formatted with
>XFS, the other one EXT4 (as all my production clusters).
>
>The 2 SSD OSD nodes have 1x 200GB DC S3610 (OS and 4 journal partitions)
>and 2x 400GB DC S3610s (2 180GB partitions, so 8 SSD OSDs total), same
>specs as the HDD nodes otherwise.
>Also one node with XFS, the other EXT4.
>
>Pools are size=2, min_size=1, obviously. 
>---
>
>> 
>> 
>> 
>> We have optimized the Linux 
>> kernel (read_ahead, disk_scheduler, numa, swappiness) and 
>> ceph.conf (client_message, filestore_queue, journal_queue, rbd_cache), and 
>> checked the RAID cache settings.
>> 
>> 
>> 
>> 
>> The only deficiency in the architecture is the unbalanced weight across the 
>> three racks: one rack has only one storage node.
>> 
>> 
>> 
>> 
>> So can anybody tell us whether these numbers are reasonable? If not, any 
>> suggestions for improving them would be appreciated.
>
>
>-- 
>Christian Balzer        Network/Systems Engineer                
>ch...@gol.com          Global OnLine Japan/Rakuten Communications
>http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
