That cannot be correct.

Check it on your cluster with dstat, as I said...

You will see parallel IO on every OSD and journal on every node...


On 21.07.16 at 15:02, Jake Young wrote:
I think the answer is that with 1 thread you can only ever write to one journal at a time. Theoretically, you would need 10 threads to be able to write to 10 nodes at the same time.
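
One quick way to see this effect on any cluster (same pool and block size as used elsewhere in this thread): run the bench once with a single thread and once with one thread per node, and compare the aggregate bandwidth.

rados bench -p rbd 60 write -b 4M -t 1    # one IO in flight at a time, latency-bound
rados bench -p rbd 60 write -b 4M -t 10   # roughly one outstanding IO per node

The single-threaded run is limited by per-write latency no matter how many nodes there are; the 10-thread run can keep several journals busy in parallel and usually scales far better.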

Jake

On Thursday, July 21, 2016, w...@globe.de wrote:

    What I don't really understand is:

    Let's say the Intel P3700 achieves 200 MByte/s with a single-threaded
    rados bench... See Nick's results below...

    Assume we have multiple OSD nodes, for example 10 nodes.

    Every node has exactly one P3700 NVMe built in.

    Why is the single-thread performance on the RBD client still exactly
    200 MByte/s with a 10-node cluster?

    I would have expected 10 nodes * 200 MByte/s = 2000 MByte/s.


    Everyone can check this on their own cluster.

    dstat -D sdb,sdc,sdd,sdX ....

    You will see that Ceph stripes the data across all OSDs in the
    cluster when you test from the client side with rados bench...

    rados bench -p rbd 60 write -b 4M -t 1
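
    As a rough sanity check of what dstat should show during that run:
    200 MByte/s at the client * 3 replicas = ~600 MByte/s to the data
    disks, plus roughly another 600 MByte/s to the journals (filestore
    writes everything twice), so about 1200 MByte/s of raw writes spread
    over all nodes. Each individual device therefore only sees a modest
    rate even though every OSD and journal is active in parallel.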



    On 21.07.16 at 14:38, w...@globe.de wrote:
    Isn't there a way to enable the Linux page cache, i.e. not use
    O_DSYNC for the journal writes...

    That would improve performance dramatically.


    On 21.07.16 at 14:33, Nick Fisk wrote:
    -----Original Message-----
    From: w...@globe.de [mailto:w...@globe.de]
    Sent: 21 July 2016 13:23
    To: n...@fisk.me.uk; 'Horace Ng' <hor...@hkisl.net>
    Cc: ceph-users@lists.ceph.com
    Subject: Re: [ceph-users] Ceph + VMware + Single Thread Performance

    Okay, and what is your plan now to speed things up?
    Now that I have come up with a lower-latency hardware design, there
    is not much further improvement possible until persistent RBD caching
    is implemented, as that will move the SSD/NVMe closer to the client.
    But I'm happy with what I can achieve at the moment. You could also
    experiment with bcache on top of the RBD.
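
    A very rough sketch of that bcache experiment, assuming a spare NVMe
    partition on the gateway and an existing image (pool, image and device
    names here are only examples):

    rbd map rbd/vmware_lun                       # maps to e.g. /dev/rbd0
    make-bcache -C /dev/nvme0n1p1 -B /dev/rbd0   # cache + backing device, attached automatically
    echo writeback > /sys/block/bcache0/bcache/cache_mode
    # then export /dev/bcache0 via the iSCSI target instead of /dev/rbd0

    Writeback on a single local device obviously trades data safety for
    latency, so treat this as an experiment, not a recommendation.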

    Would it help to put multiple P3700s into each OSD node to improve
    performance for a single thread (for example Storage vMotion)?
    Most likely not, it's all the other parts of the puzzle which
    are causing the latency. ESXi was designed for storage arrays
    that service IOs in the 100us-1ms range; Ceph is probably about 10x
    slower than this, hence the problem. Disable the BBWC on a RAID
    controller or SAN and you will see the same behaviour.

    Regards


    On 21.07.16 at 14:17, Nick Fisk wrote:
    -----Original Message-----
    From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
    Of w...@globe.de
    Sent: 21 July 2016 13:04
    To: n...@fisk.me.uk; 'Horace Ng' <hor...@hkisl.net>
    Cc: ceph-users@lists.ceph.com
    Subject: Re: [ceph-users] Ceph + VMware + Single Thread Performance

    Hi,

    hmm, I think 200 MByte/s is really bad. Is your cluster in
    production right now?
    It's just been built, not running yet.

    So if you start a storage migration you only get 200 MByte/s,
    right?
    I wish. My current cluster (not this new one) would storage-migrate at
    ~10-15MB/s. Serial latency is the problem: without being able to
    buffer, ESXi waits on an ack for each IO before sending the next. It
    also submits the migrations in 64kB chunks, unless you get VAAI
    working; I think ESXi will then try and do them in parallel, which
    will help as well.
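
    As a rough sanity check of those numbers (the ~5ms per IO is only an
    assumed round-trip time through iSCSI + Ceph here, not a measured
    value): with one 64kB IO in flight at a time, 64kB / 5ms = ~12.8
    MByte/s, which lines up with the 10-15MB/s observed above.
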
    I think it would be awesome if you could get 1000 MByte/s

    Where is the bottleneck?
    Latency serialisation: without a buffer, you can't drive the devices
    to 100%. With buffered IO (or high queue depths) I can max out
    the journals.
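
    One way to see that queue-depth effect directly against an RBD image,
    if fio is built with RBD support (pool, image and client names below
    are just placeholders):

    fio --name=qd1 --ioengine=rbd --clientname=admin --pool=rbd \
        --rbdname=testimg --rw=write --bs=4M --iodepth=1 --runtime=60 --time_based
    fio --name=qd32 --ioengine=rbd --clientname=admin --pool=rbd \
        --rbdname=testimg --rw=write --bs=4M --iodepth=32 --runtime=60 --time_based

    The first run is bounded by per-IO latency; the second should get much
    closer to the aggregate journal bandwidth.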

    A fio test from Sebastien Han gives us 400 MByte/s raw
    performance from the P3700.

    https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
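
    From memory, the journal test in that article is along these lines
    (check the post for the exact parameters before comparing numbers;
    the device path is just an example):

    fio --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4k \
        --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting \
        --name=journal-test

    i.e. small synchronous writes at queue depth 1, which approximates the
    write pattern the filestore journal produces.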

    How can it be that the RBD client performance is 50% slower?

    Regards


    On 21.07.16 at 12:15, Nick Fisk wrote:
    I've had a lot of pain with this; smaller block sizes are even worse.
    You want to try and minimize latency at every point, as there is no
    buffering happening in the iSCSI stack. This means:

    1. Fast journals (NVMe or NVRAM)
    2. 10Gb or better networking
    3. Fast CPUs (GHz)
    4. Fix CPU C-states to C1
    5. Fix CPU frequency to max (see the example below for 4 and 5)
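
    For points 4 and 5, one common approach (assuming Intel CPUs and a
    grub-based setup; double-check the exact parameters for your distro):

    # kernel boot parameters to cap C-states at C1
    intel_idle.max_cstate=1 processor.max_cstate=1
    # pin the frequency governor to performance
    cpupower frequency-set -g performance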

    Also I can't be sure, but I think there is a metadata update
    happening with VMFS, particularly if you are using thin VMDKs; this
    can also be a major bottleneck. For my use case, I've switched over
    to NFS as it has given much more performance at scale and less
    headache.
    For the RADOS run, here you go (400GB P3700):

    Total time run:         60.026491
    Total writes made:      3104
    Write size:             4194304
    Object size:            4194304
    Bandwidth (MB/sec):     206.842
    Stddev Bandwidth:       8.10412
    Max bandwidth (MB/sec): 224
    Min bandwidth (MB/sec): 180
    Average IOPS:           51
    Stddev IOPS:            2
    Max IOPS:               56
    Min IOPS:               45
    Average Latency(s):     0.0193366
    Stddev Latency(s):      0.00148039
    Max latency(s):         0.0377946
    Min latency(s):         0.015909

    Nick

    -----Original Message-----
    From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
    Behalf Of Horace
    Sent: 21 July 2016 10:26
    To: w...@globe.de
    Cc: ceph-users@lists.ceph.com
    Subject: Re: [ceph-users] Ceph + VMware + Single Thread Performance

    Hi,

    Same here. I've read a blog post saying that VMware will frequently
    verify the locking on VMFS over iSCSI, hence it will have much slower
    performance than NFS (which uses a different locking mechanism).

    Regards,
    Horace Ng

    ----- Original Message -----
    From: w...@globe.de
    To: ceph-users@lists.ceph.com
    Sent: Thursday, July 21, 2016 5:11:21 PM
    Subject: [ceph-users] Ceph + VMware + Single Thread
    Performance

    Hi everyone,

    we are seeing relatively slow single-thread performance on the
    iSCSI nodes of our cluster.


    Our setup:

    3 Racks:

    18x data nodes, 3 mon nodes, 3 iSCSI gateway nodes with tgt
    (rbd cache off).

    2x Samsung SM863 enterprise SSDs for the journals (3 OSDs per SSD)
    and 6x WD Red 1TB per data node as OSDs.

    Replication = 3

    chooseleaf type rack (across the 3 racks) in the crush map; see the
    rule sketch below
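
    For reference, a rack-level replicated rule of that shape looks
    roughly like this in the crush map (rule name and ruleset number are
    just examples):

    rule replicated_rack {
            ruleset 1
            type replicated
            min_size 1
            max_size 10
            step take default
            step chooseleaf firstn 0 type rack
            step emit
    }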


    We only get about 90 MByte/s on the iSCSI gateway servers with:

    rados bench -p rbd 60 write -b 4M -t 1


    If we test with:

    rados bench -p rbd 60 write -b 4M -t 32

    we get about 600-700 MByte/s.


    We plan to replace the Samsung SSDs with Intel DC P3700 PCIe NVMe
    drives for the journal to get better single-thread performance.

    Is there anyone out there who has an Intel P3700 as journal and can
    share test results with:


    rados bench -p rbd 60 write -b 4M -t 1


    Thank you very much !!

    Kind Regards !!



_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
