I can reproduce this in:
ceph version 0.81-423-g1fb4574
on Ubuntu 14.04. I have a two-OSD cluster with data on two SATA spinners (WD Blacks) and journals on two SSDs (Crucial m4's). I'm getting about 3.5 MB/s (kernel and librbd) using your dd command with direct on. Leaving off direct I'm seeing about 140 MB/s (librbd) and 90 MB/s (kernel 3.11 [2]). The SSDs can do writes at about 180 MB/s each... which is something to look at another day [1].
It would be interesting to know what version of Ceph Tyler is using, as his setup seems to be nowhere near as impacted by adding direct. It might also be useful to know what make and model of SSD you are both using (some of them do not like a series of essentially sync writes). Having said that, testing my Crucial m4's shows they can do the dd command (with direct *on*) at about 180 MB/s... hmmm... so it *is* the Ceph layer, it seems.
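For reference, that raw-SSD check was essentially your dd run against a filesystem on the m4 itself rather than an RBD volume; a minimal sketch (the mount point is just an example from my setup):

# same 16k direct writes, but straight at the SSD -- no Ceph in the path
dd if=/dev/zero of=/mnt/ssd-test/ddtest bs=16k count=65535 oflag=direct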
Regards
Mark
[1] I set filestore_max_sync_interval = 100 (30G journal, and the SSDs are able to do 180 MB/s, etc.); however, I am still seeing writes to the spinners during the 8 s or so that the above dd tests take.
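For completeness, that tweak is just one line in ceph.conf; mine sits in the [osd] section (the default interval is 5 seconds, so 100 is a fairly aggressive stretch):

[osd]
# let the SSD journal absorb more of a write burst before the filestore syncs to the spinners
filestore_max_sync_interval = 100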
[2] Ubuntu 13.10 VM - I'll upgrade it to 14.04 and see if that helps at all.
On 21/06/14 09:17, Greg Poirier wrote:
Thanks Tyler. So, I'm not totally crazy. There is something weird going on.
I've looked into things about as much as I can:
- We have tested with collocated journals and dedicated journal disks.
- We have bonded 10Gb NICs and have verified that the network configuration and connectivity are sound
- We have run dd independently on the SSDs in the cluster and they are
performing fine
- We have tested both in a VM and with the RBD kernel module and get
identical performance
- We have pool size = 3 and pool min size = 2, and have tested with a min size of 2 and 3 -- the performance impact is not significant
- osd_op times are approximately 6-12 ms (see the commands sketched after this list)
- osd_sub_op times are 6-12 ms
- iostat reports service times of 6-12 ms
- Latency between the storage and the rbd client is approximately 0.1-0.2 ms
- Disabling replication entirely did not help significantly
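For what it's worth, the op and service times above came roughly from the following (the OSD id and device are placeholders; adjust for your hosts):

# per-OSD op latency counters from the admin socket (run on the host where osd.0 lives)
ceph daemon osd.0 perf dump | grep -A 3 '"op_latency"'
# quick cluster-wide commit/apply latency view
ceph osd perf
# extended device stats on an OSD host; watch the service time column
iostat -x 1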
On Fri, Jun 20, 2014 at 2:13 PM, Tyler Wilson <k...@linuxdigital.net> wrote:
Greg,
Not a real fix for you, but I too run a full-SSD cluster and am able to get 112 MB/s with your command:
[root@plesk-test ~]# dd if=/dev/zero of=testfilasde bs=16k count=65535 oflag=direct
65535+0 records in
65535+0 records out
1073725440 bytes (1.1 GB) copied, 9.59092 s, 112 MB/s
This of course is in a VM; here is my ceph config:
[global]
fsid = <hidden>
mon_initial_members = node-1 node-2 node-3
mon_host = 192.168.0.3 192.168.0.4 192.168.0.5
auth_supported = cephx
osd_journal_size = 2048
filestore_xattr_use_omap = true
osd_pool_default_size = 2
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 1024
public_network = 192.168.0.0/24
osd_mkfs_type = xfs
cluster_network = 192.168.1.0/24
On Fri, Jun 20, 2014 at 11:08 AM, Greg Poirier <greg.poir...@opower.com> wrote:
I recently created a 9-node Firefly cluster backed by all SSDs.
We have had some pretty severe performance degradation when
using O_DIRECT in our tests (as this is how MySQL will be
interacting with RBD volumes, this makes the most sense for a
preliminary test). Running the following test:
dd if=/dev/zero of=testfilasde bs=16k count=65535 oflag=direct
779829248 bytes (780 MB) copied, 604.333 s, 1.3 MB/s
Shows us only about 1.5 MB/s throughput and 100 IOPS from the single dd thread. Running a second dd process does show increased throughput, which is encouraging, but I am still concerned by the low throughput of a single thread with O_DIRECT.
Two threads:
779829248 bytes (780 MB) copied, 604.333 s, 1.3 MB/s
126271488 bytes (126 MB) copied, 99.2069 s, 1.3 MB/s
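The second run is just another copy of the same dd pointed at a different file on the same volume; roughly:

# two concurrent direct writers (file names are arbitrary)
dd if=/dev/zero of=testfile1 bs=16k count=65535 oflag=direct &
dd if=/dev/zero of=testfile2 bs=16k count=65535 oflag=direct &
wait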
I am testing with an RBD volume mounted with the kernel module
(I have also tested from within KVM, similar performance).
If we allow caching, we start to see reasonable numbers from a single dd process:
dd if=/dev/zero of=testfilasde bs=16k count=65535
65535+0 records in
65535+0 records out
1073725440 bytes (1.1 GB) copied, 2.05356 s, 523 MB/s
I can get >1GB/s from a single host with three threads.
Rados bench produces similar results.
Is there something I can do to increase the performance of O_DIRECT? I expected some performance degradation, but this much?
If I increase the blocksize to 4M, I'm able to get significantly
higher throughput:
3833593856 bytes (3.8 GB) copied, 44.2964 s, 86.5 MB/s
This still seems very low.
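That run was the same command with a larger block size, roughly (the count here is illustrative):

dd if=/dev/zero of=testfilasde bs=4M count=1000 oflag=direct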
I'm using the deadline scheduler everywhere. With the noop scheduler, I do not see a performance improvement.
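In case it's relevant, the scheduler was checked and switched with the usual sysfs knob on each data disk (sdb is a placeholder):

cat /sys/block/sdb/queue/scheduler        # shows e.g. noop [deadline] cfq
echo noop > /sys/block/sdb/queue/scheduler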
Suggestions?
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com