For the record, --direct=1 (or any O_DIRECT I/O anywhere) is by itself not guaranteed to be unbuffered and synchronous. You need to add --direct=1 --sync=1 --fsync=1 to make sure you are actually flushing the data somewhere (this puts additional ops in the queue, though). In the case of RBD this is important because an O_DIRECT write by itself could actually end up in the rbd cache. Not sure how it is with different kernels; I believe this behaviour has changed several times, as applications have different assumptions about the durability of O_DIRECT writes. I can probably dig up some reference to that if you want...
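For example, building on the fio command Daleep posted further down the thread, a sketch of a durable write test could look something like this (randwrite, the block size and the file size are just placeholders, adjust to your setup):

fio --name=test --filename=test --bs=4k --size=4G --rw=randwrite \
    --direct=1 --sync=1 --fsync=1

Here --sync=1 opens the file with O_SYNC and --fsync=1 makes fio issue an fsync() after every write, so expect the reported IOPS to drop again compared to plain --direct=1.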
Jan

> On 09 Sep 2015, at 11:06, Nick Fisk <n...@fisk.me.uk> wrote:
>
> It looks like you are using the kernel RBD client, ie you ran "rbd map ...."
> In which case the librbd settings in the ceph.conf won't have any affect as
> they are only for if you are using fio with the librbd engine.
>
> There are several things you may have to do to improve Kernel client
> performance, but 1st thing you need to pass the "direct=1" flag to your fio
> job to get a realistic idea of your clusters performance. But be warned if
> you thought you had bad performance now, you will likely be shocked after
> you enable it.
>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Daleep Bais
>> Sent: 09 September 2015 09:37
>> To: Nick Fisk <n...@fisk.me.uk>
>> Cc: Ceph-User <ceph-us...@ceph.com>
>> Subject: Re: [ceph-users] Poor IOPS performance with Ceph
>>
>> Hi Nick,
>>
>> I dont have separate SSD / HDD for journal. I am using a 10 G partition on
>> the same HDD for journaling. They are rotating HDD's and not SSD's.
>>
>> I am using below command to run the test:
>>
>> fio --name=test --filename=test --bs=4k --size=4G --readwrite=read / write
>>
>> I did few kernel tuning and that has improved my write IOPS. For read I am
>> using rbd_readahead and also used read_ahead_kb kernel tuning parameter.
>>
>> Also I should mention that its not x86, its on armv7 32bit.
>>
>> Thanks.
>>
>> Daleep Singh Bais
>>
>> On Wed, Sep 9, 2015 at 1:55 PM, Nick Fisk <n...@fisk.me.uk> wrote:
>>> -----Original Message-----
>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>>> Daleep Bais
>>> Sent: 09 September 2015 09:18
>>> To: Ceph-User <ceph-us...@ceph.com>
>>> Subject: [ceph-users] Poor IOPS performance with Ceph
>>>
>>> Hi,
>>>
>>> I have made a test ceph cluster of 6 OSD's and 03 MON. I am testing the
>>> read write performance for the test cluster and the read IOPS is poor.
>>> When I individually test it for each HDD, I get good performance, whereas,
>>> when I test it for ceph cluster, it is poor.
>>
>> Can you give any further details about your cluster. Are your HDD's backed
>> by SSD journals?
>>
>>> Between nodes, using iperf, I get good bandwidth.
>>>
>>> My cluster info :
>>>
>>> root@ceph-node3:~# ceph --version
>>> ceph version 9.0.2-752-g64d37b7 (64d37b70a687eb63edf69a91196bb124651da210)
>>> root@ceph-node3:~# ceph -s
>>> cluster 9654468b-5c78-44b9-9711-4a7c4455c480
>>> health HEALTH_OK
>>> monmap e9: 3 mons at {ceph-node10=192.168.1.210:6789/0,ceph-node17=192.168.1.217:6789/0,ceph-node3=192.168.1.203:6789/0}
>>> election epoch 442, quorum 0,1,2 ceph-node3,ceph-node10,ceph-node17
>>> osdmap e1850: 6 osds: 6 up, 6 in
>>> pgmap v17400: 256 pgs, 2 pools, 9274 MB data, 2330 objects
>>> 9624 MB used, 5384 GB / 5394 GB avail
>>> 256 active+clean
>>>
>>> I have mapped an RBD block device to client machine (Ubuntu 14) and from
>>> there, when I run tests using FIO, i get good write IOPS, however, read is
>>> poor comparatively.
>>>
>>> Write IOPS : 44618 approx
>>>
>>> Read IOPS : 7356 approx
>>
>> 1st thing that strikes me is that your numbers are too good, unless these
>> are actually SSD's and not spinning HDD's? I would expect to get around a
>> max of 600 read IOPs for 6x 7.2k disks, so I guess either you are hitting
>> the page cache on the OSD node(s) or the librbd cache.
>>
>> The writes are even higher, are you using the "direct=1" option in the Fio
>> job?
>>
>>> Pool replica - single
>>> pool 1 'test1' replicated size 1 min_size 1
>>>
>>> I have implemented rbd_readahead in my ceph conf file also.
>>> Any suggestions in this regard with help me..
>>>
>>> Thanks.
>>>
>>> Daleep Singh Bais

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com