For the record

--direct=1 (or any O_DIRECT I/O anywhere) is by itself not guaranteed to be 
unbuffered and synchronous.
You need to add
--direct=1 --sync=1 --fsync=1 to make sure you are actually flushing the data 
somewhere. (This does put additional ops in the queue, though.)
In the case of RBD this is important because an O_DIRECT write by itself 
could actually end up in the RBD cache.
I'm not sure how it is with different kernels; I believe this behaviour has 
changed several times, as applications have different assumptions about the 
durability of O_DIRECT writes.
I can probably dig up a reference for that if you want...
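
For example, a minimal fio invocation with those flags might look like the 
following (the device path and job parameters here are illustrative, adjust 
to your setup; note that writing to a mapped device destroys its contents):

    fio --name=durable-write --filename=/dev/rbd0 --rw=randwrite --bs=4k \
        --ioengine=libaio --iodepth=1 --direct=1 --sync=1 --fsync=1 \
        --runtime=60 --time_based

With --fsync=1 fio issues an fsync() after every write, so a write only 
completes once the data has been pushed past any intermediate cache.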

Jan

> On 09 Sep 2015, at 11:06, Nick Fisk <n...@fisk.me.uk> wrote:
> 
> It looks like you are using the kernel RBD client, i.e. you ran "rbd map ....", 
> in which case the librbd settings in ceph.conf won't have any effect, as they 
> only apply if you are using fio with the librbd engine.
> 
> There are several things you may have to do to improve kernel client 
> performance, but the first thing is to pass the "direct=1" flag to your fio 
> job to get a realistic idea of your cluster's performance. But be warned: if 
> you thought you had bad performance now, you will likely be shocked after you 
> enable it.
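> 
> As a sketch, a direct random-read job against the mapped device would look 
> something like this (device name assumed, adjust to yours):
> 
>     fio --name=direct-read --filename=/dev/rbd0 --rw=randread --bs=4k \
>         --ioengine=libaio --iodepth=32 --direct=1 --runtime=60 --time_based
> 
> With direct=1 the client page cache is bypassed, so you measure the cluster 
> rather than local RAM.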
> 
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Daleep Bais
>> Sent: 09 September 2015 09:37
>> To: Nick Fisk <n...@fisk.me.uk>
>> Cc: Ceph-User <ceph-us...@ceph.com>
>> Subject: Re: [ceph-users] Poor IOPS performance with Ceph
>> 
>> Hi Nick,
>> 
>> I don't have a separate SSD/HDD for the journal. I am using a 10 GB 
>> partition on the same HDD for journaling. They are rotating HDDs, not SSDs.
>> 
>> I am using the command below to run the test:
>> 
>> fio --name=test --filename=test --bs=4k --size=4G --readwrite=read
>> (and the same again with --readwrite=write)
>> 
>> I did some kernel tuning and that has improved my write IOPS. For reads I 
>> am using rbd_readahead and have also tuned the read_ahead_kb kernel 
>> parameter.
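>> 
>> For reference, the tuning looks roughly like this (device name and values 
>> are examples, not necessarily the exact ones used):
>> 
>>     # kernel-side readahead for the mapped device
>>     echo 4096 > /sys/block/rbd0/queue/read_ahead_kb
>> 
>> and the rbd readahead settings in ceph.conf (librbd options, shown with 
>> example values):
>> 
>>     [client]
>>     rbd readahead trigger requests = 10
>>     rbd readahead max bytes = 4194304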
>> 
>> Also, I should mention that it's not x86; it's armv7 32-bit.
>> 
>> Thanks.
>> 
>> Daleep Singh Bais
>> 
>> 
>> 
>> On Wed, Sep 9, 2015 at 1:55 PM, Nick Fisk <n...@fisk.me.uk> wrote:
>>> -----Original Message-----
>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>>> Daleep Bais
>>> Sent: 09 September 2015 09:18
>>> To: Ceph-User <ceph-us...@ceph.com>
>>> Subject: [ceph-users] Poor IOPS performance with Ceph
>>> 
>>> Hi,
>>> 
>>> I have made a test Ceph cluster of 6 OSDs and 3 MONs. I am testing the 
>>> read and write performance of the test cluster, and the read IOPS are 
>>> poor. When I test each HDD individually, I get good performance, whereas 
>>> when I test the Ceph cluster, it is poor.
>> 
>> Can you give any further details about your cluster? Are your HDDs backed 
>> by SSD journals?
>> 
>>> 
>>> Between nodes, using iperf, I get good bandwidth.
>>> 
>>> My cluster info :
>>> 
>>> root@ceph-node3:~# ceph --version
>>> ceph version 9.0.2-752-g64d37b7
>>> (64d37b70a687eb63edf69a91196bb124651da210)
>>> root@ceph-node3:~# ceph -s
>>>    cluster 9654468b-5c78-44b9-9711-4a7c4455c480
>>>     health HEALTH_OK
>>>     monmap e9: 3 mons at {ceph-node10=192.168.1.210:6789/0,ceph-
>>> node17=192.168.1.217:6789/0,ceph-node3=192.168.1.203:6789/0}
>>>            election epoch 442, quorum 0,1,2 ceph-node3,ceph-node10,ceph-
>>> node17
>>>     osdmap e1850: 6 osds: 6 up, 6 in
>>>      pgmap v17400: 256 pgs, 2 pools, 9274 MB data, 2330 objects
>>>            9624 MB used, 5384 GB / 5394 GB avail
>>>                 256 active+clean
>>> 
>>> 
>>> I have mapped an RBD block device to a client machine (Ubuntu 14) and 
>>> from there, when I run tests using fio, I get good write IOPS; however, 
>>> reads are comparatively poor.
>>> 
>>> Write IOPS : 44618 approx
>>> 
>>> Read IOPS : 7356 approx
>> 
>> The first thing that strikes me is that your numbers are too good, unless 
>> these are actually SSDs and not spinning HDDs. I would expect a maximum of 
>> around 600 read IOPS for 6x 7.2k disks (roughly 100 IOPS per disk, times 
>> six), so I guess you are either hitting the page cache on the OSD node(s) 
>> or the librbd cache.
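>> 
>> One way to rule out the page cache is to drop it on the OSD nodes before 
>> the read test, e.g. (as root):
>> 
>>     sync
>>     echo 3 > /proc/sys/vm/drop_caches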
>> 
>> The writes are even higher. Are you using the "direct=1" option in the fio 
>> job?
>> 
>>> 
>>> Pool replica - single
>>> pool 1 'test1' replicated size 1 min_size 1
>>> 
>>> I have also implemented rbd_readahead in my ceph.conf file.
>>> Any suggestions in this regard will help me.
>>> 
>>> Thanks.
>>> 
>>> Daleep Singh Bais
>> 
>> 
>> 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
