I can reproduce this in:
ceph version 0.81-423-g1fb4574
on Ubuntu 14.04. I have a two-OSD cluster with data on two SATA spinners (WD Blacks) and journals on two SSDs (Crucial m4's). I'm getting about 3.5 MB/s (kernel and librbd) using your dd command with direct on. Leaving off direct I'm seeing about 140 MB/s (librbd) and 90 MB/s (kernel 3.11 [2]). The SSDs can do writes at about 180 MB/s each... which is something to look at another day [1].
It would be interesting to know what version of Ceph Tyler is using, as his setup seems to be nowhere near as impacted by adding direct. It might also be useful to know what make and model of SSD you are both using (some of them do not like a series of essentially sync writes). Having said that, testing my Crucial m4's shows they can do the dd command (with direct *on*) at about 180 MB/s... hmmm... so it *is* the Ceph layer, it seems.
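For reference, that raw-SSD check was essentially your dd run against a filesystem on the m4 itself rather than an RBD volume; a minimal sketch (the mount point is just an example from my setup):

# same 16k direct writes, but straight at the SSD -- no Ceph in the path
dd if=/dev/zero of=/mnt/ssd-test/ddtest bs=16k count=65535 oflag=direct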
Regards
Mark
[1] I set filestore_max_sync_interval = 100 (30G journal, and the SSDs are able to do 180 MB/s, etc.); however, I am still seeing writes to the spinners during the 8 s or so that the above dd tests take.
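For completeness, that tweak is just one line in ceph.conf; mine sits in the [osd] section (the default interval is 5 seconds, so 100 is a fairly aggressive stretch):

[osd]
# let the SSD journal absorb more of a write burst before the filestore syncs to the spinners
filestore_max_sync_interval = 100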
[2] Ubuntu 13.10 VM - I'll upgrade it to 14.04 and see if that helps at all.
On 21/06/14 09:17, Greg Poirier wrote:
Thanks Tyler. So, I'm not totally crazy. There is something weird going on.
I've looked into things about as much as I can:
- We have tested with collocated journals and dedicated journal disks.
- We have bonded 10Gb NICs and have verified that the network configuration and connectivity are sound
- We have run dd independently on the SSDs in the cluster and they are
performing fine
- We have tested both in a VM and with the RBD kernel module and get
identical performance
- We have pool size = 3 and pool min size = 2, and have tested with a min size of 2 and 3 -- the performance impact is not significant
- osd_op times are approximately 6-12 ms (see the commands sketched after this list)
- osd_sub_op times are 6-12 ms
- iostat reports service times of 6-12 ms
- Latency between the storage and the rbd client is approximately 0.1-0.2 ms
- Disabling replication entirely did not help significantly
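For what it's worth, the op and service times above came roughly from the following (the OSD id and device are placeholders; adjust for your hosts):

# per-OSD op latency counters from the admin socket (run on the host where osd.0 lives)
ceph daemon osd.0 perf dump | grep -A 3 '"op_latency"'
# quick cluster-wide commit/apply latency view
ceph osd perf
# extended device stats on an OSD host; watch the service time column
iostat -x 1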
On Fri, Jun 20, 2014 at 2:13 PM, Tyler Wilson <k...@linuxdigital.net> wrote:
Greg,
Not a real fix for you, but I too run a full-SSD cluster and am able to get 112 MB/s with your command:
[root@plesk-test ~]# dd if=/dev/zero of=testfilasde bs=16k count=65535 oflag=direct
65535+0 records in
65535+0 records out
1073725440 bytes (1.1 GB) copied, 9.59092 s, 112 MB/s
This of course is in a VM; here is my ceph config:
[global]
fsid = <hidden>
mon_initial_members = node-1 node-2 node-3
mon_host = 192.168.0.3 192.168.0.4 192.168.0.5
auth_supported = cephx
osd_journal_size = 2048
filestore_xattr_use_omap = true
osd_pool_default_size = 2
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 1024
public_network = 192.168.0.0/24
osd_mkfs_type = xfs
cluster_network = 192.168.1.0/24
On Fri, Jun 20, 2014 at 11:08 AM, Greg Poirier <greg.poir...@opower.com> wrote:
I recently created a 9-node Firefly cluster backed by all SSDs.
We have had some pretty severe performance degradation when
using O_DIRECT in our tests (as this is how MySQL will be
interacting with RBD volumes, this makes the most sense for a
preliminary test). Running the following test:
dd if=/dev/zero of=testfilasde bs=16k count=65535 oflag=direct
779829248 bytes (780 MB) copied, 604.333 s, 1.3 MB/s
Shows us only about 1.5 MB/s throughput and 100 IOPS from the single dd thread. Running a second dd process does show increased throughput, which is encouraging, but I am still concerned by the low throughput of a single thread with O_DIRECT.
Two threads:
779829248 bytes (780 MB) copied, 604.333 s, 1.3 MB/s
126271488 bytes (126 MB) copied, 99.2069 s, 1.3 MB/s
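The second run is just another copy of the same dd pointed at a different file on the same volume; roughly:

# two concurrent direct writers (file names are arbitrary)
dd if=/dev/zero of=testfile1 bs=16k count=65535 oflag=direct &
dd if=/dev/zero of=testfile2 bs=16k count=65535 oflag=direct &
wait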
I am testing with an RBD volume mounted with the kernel module
(I have also tested from within KVM, similar performance).
If we allow caching, we start to see reasonable numbers from a single dd process:
dd if=/dev/zero of=testfilasde bs=16k count=65535
65535+0 records in
65535+0 records out
1073725440 bytes (1.1 GB) copied, 2.05356 s, 523 MB/s
I can get >1GB/s from a single host with three threads.
Rados bench produces similar results.
Is there something I can do to increase the performance of O_DIRECT? I expected some performance degradation, but this much?
If I increase the blocksize to 4M, I'm able to get significantly
higher throughput:
3833593856 bytes (3.8 GB) copied, 44.2964 s, 86.5 MB/s
This still seems very low.
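That run was the same command with a larger block size, roughly (the count here is illustrative):

dd if=/dev/zero of=testfilasde bs=4M count=1000 oflag=direct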
I'm using the deadline scheduler everywhere. With the noop scheduler, I do not see a performance improvement.
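In case it's relevant, the scheduler was checked and switched with the usual sysfs knob on each data disk (sdb is a placeholder):

cat /sys/block/sdb/queue/scheduler        # shows e.g. noop [deadline] cfq
echo noop > /sys/block/sdb/queue/scheduler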
Suggestions?
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com