Okay, that should be the answer...

I think it would be great to use an Intel P3700 1.6TB as a bcache caching device in the iSCSI RBD client gateway nodes.

caching device: Intel P3700 1.6TB

backing device: RBD from Ceph Cluster
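
A rough sketch of what I have in mind, assuming bcache-tools is
installed (the device names /dev/nvme0n1 and /dev/rbd0 and the
cache-set UUID are only placeholders):

make-bcache -C /dev/nvme0n1       # format the NVMe as the cache device
make-bcache -B /dev/rbd0          # format the mapped RBD as the backing device
bcache-super-show /dev/nvme0n1    # read the cache set UUID from the superblock
echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach
echo writeback > /sys/block/bcache0/bcache/cache_mode   # default is writethrough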

What do you mean? I think this setup should improve performance dramatically, shouldn't it?

If I enable writeback on these nodes and use tgt for VMware, what happens if iSCSI node 1 goes offline, e.g. a power loss or a Linux kernel crash?



On 21.07.16 15:57, Nick Fisk wrote:

What you are seeing is probably averaged over 1 second or something like that. So yes, within 1 second IO would have run on all OSDs. But at any one point in time a single thread will only run against 1 OSD (+2 replicas), assuming the IO size isn't bigger than the object size.

For RBD, if data is striped in 4MB chunks, then you will have to read/write more than 4MB at a time to cross over to the next object. You get exactly the same problem with reads if you don't set the readahead above 4MB.
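
E.g. something like this on the client, assuming the image is mapped
as /dev/rbd0 (blockdev takes the value in 512-byte sectors):

blockdev --setra 16384 /dev/rbd0   # 16384 * 512B = 8MB readahead, above the 4MB object size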

*From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf Of* w...@globe.de
*Sent:* 21 July 2016 14:05
*To:* ceph-users@lists.ceph.com
*Subject:* Re: [ceph-users] Ceph + VMware + Single Thread Performance

That cannot be correct.

Check it on your cluster with dstat, as I said...

You will see parallel IO on every OSD and journal on every node...

On 21.07.16 15:02, Jake Young wrote:

    I think the answer is that with 1 thread you can only ever write
    to one journal at a time. Theoretically, you would need 10 threads
    to be able to write to 10 nodes at the same time.
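
    E.g. the same rados bench command as below, just with 10 threads:

    rados bench -p rbd 60 write -b 4M -t 10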

    Jake

    On Thursday, July 21, 2016, w...@globe.de wrote:

        What I don't really understand is:

        Let's say the Intel P3700 does 200 MByte/s with rados bench at
        one thread... see Nick's results below...

        Say we have multiple OSD nodes, for example 10 nodes.

        Every node has exactly 1x P3700 NVMe built in.

        Why is the single-thread performance on the RBD client still
        exactly 200 MByte/s with a 10-node OSD cluster?

        I would expect 10 nodes * 200 MByte/s = 2000 MByte/s.

        Everyone can check this on their own cluster:

        dstat -D sdb,sdc,sdd,sdX ....

        You will see that Ceph stripes the data over all OSDs in the
        cluster if you test on the client side with rados bench...

        *rados bench -p rbd 60 write -b 4M -t 1*

        On 21.07.16 14:38, w...@globe.de wrote:

            Is there no way to enable the Linux page cache, i.e. not
            to use O_DSYNC?

            That would improve performance dramatically.


            On 21.07.16 14:33, Nick Fisk wrote:

                    -----Original Message-----
                    From: w...@globe.de
                    Sent: 21 July 2016 13:23
                    To: n...@fisk.me.uk; 'Horace Ng' <hor...@hkisl.net>
                    Cc: ceph-users@lists.ceph.com
                    Subject: Re: [ceph-users] Ceph + VMware + Single
                    Thread Performance

                    Okay, and what is your plan now to speed things up?

                Now that I have come up with a lower-latency hardware
                design, there is not much further improvement to be had
                until persistent RBD caching is implemented, as that
                will move the SSD/NVMe closer to the client. But I'm
                happy with what I can achieve at the moment. You could
                also experiment with bcache on the RBD.


                    Would it help to put multiple P3700s in each OSD
                    node to improve performance for a single thread
                    (for example, Storage vMotion)?

                Most likely not; it's all the other parts of the
                puzzle that are causing the latency. ESXi was
                designed for storage arrays that service IOs in the
                100us-1ms range; Ceph is probably about 10x slower
                than this, hence the problem. Disable the BBWC on a
                RAID controller or SAN and you will see the same
                behaviour.


                    Regards


                    On 21.07.16 14:17, Nick Fisk wrote:

                            -----Original Message-----
                            From: ceph-users
                            [mailto:ceph-users-boun...@lists.ceph.com]
                            On Behalf Of w...@globe.de
                            Sent: 21 July 2016 13:04
                            To: n...@fisk.me.uk; 'Horace Ng' <hor...@hkisl.net>
                            Cc: ceph-users@lists.ceph.com
                            Subject: Re: [ceph-users] Ceph + VMware +
                            Single Thread Performance

                            Hi,

                            hmm, I think 200 MByte/s is really bad. Is
                            your cluster in production right now?

                        It's just been built, not running yet.


                            So if you start a storage migration you
                            only get 200 MByte/s, right?

                        I wish. My current cluster (not this new one)
                        would storage migrate at ~10-15MB/s. Serial
                        latency is the problem: without being able to
                        buffer, ESXi waits on an ack for each IO
                        before sending the next. Also, it submits the
                        migrations in 64kb chunks unless you get VAAI
                        working. I think ESXi will try to do them in
                        parallel, which will help as well.
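
                        You can check whether VAAI is active for the
                        datastore's backing device on the ESXi host
                        with something like:

                        esxcli storage core device vaai status get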

                            I think it would be awesome if you got
                            1000 MByte/s.

                            Where is the bottleneck?

                        Latency serialisation: without a buffer, you
                        can't drive the devices to 100%. With buffered
                        IO (or high queue depths) I can max out the
                        journals.
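
                        E.g. with fio, roughly (parameters only as an
                        illustration; careful, this writes to the raw
                        device and destroys its data):

                        fio --name=qd32 --filename=/dev/nvme0n1 \
                            --ioengine=libaio --direct=1 --rw=write \
                            --bs=4M --iodepth=32 --runtime=60 --time_based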


                            A FIO test from Sebastien Han gives us 400
                            MByte/s raw performance from the P3700.

                            https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
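
                            If I remember right, the test there is a
                            single-job O_DSYNC write along these lines
                            (device name only as a placeholder):

                            fio --filename=/dev/nvme0n1 --direct=1 --sync=1 \
                                --rw=write --bs=4k --numjobs=1 --iodepth=1 \
                                --runtime=60 --time_based --group_reporting \
                                --name=journal-test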

                            How can it be that the RBD client
                            performance is 50% slower?

                            Regards


                            On 21.07.16 12:15, Nick Fisk wrote:

                                I've had a lot of pain with this;
                                smaller block sizes are even worse.
                                You want to try and minimize latency
                                at every point, as there is no
                                buffering happening in the iSCSI
                                stack. This means:

                                1. Fast journals (NVMe or NVRAM)
                                2. 10GbE or better networking
                                3. Fast CPUs (GHz)
                                4. Fix CPU C-states to C1
                                5. Fix CPU frequency to max
                                   (for 4 and 5, see the sketch after
                                   this list)
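
                                One way to do 4 and 5, as a sketch
                                (assuming the cpupower tool; the C-state
                                cap can also be set on the kernel cmdline
                                with intel_idle.max_cstate=1):

                                cpupower idle-set -D 2    # disable idle states deeper than ~C1
                                cpupower frequency-set -g performance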

                                Also, I can't be sure, but I think
                                there is a metadata update
                                happening with VMFS, particularly if
                                you are using thin VMDKs; this
                                can also be a major bottleneck. For my
                                use case, I've switched over to NFS, as
                                it has given much more performance at
                                scale and less headache.

                                For the rados bench run, here you go
                                (400GB P3700):

                                Total time run:         60.026491
                                Total writes made:      3104
                                Write size:             4194304
                                Object size:            4194304
                                Bandwidth (MB/sec):     206.842
                                Stddev Bandwidth:       8.10412
                                Max bandwidth (MB/sec): 224
                                Min bandwidth (MB/sec): 180
                                Average IOPS:           51
                                Stddev IOPS:            2
                                Max IOPS:               56
                                Min IOPS:               45
                                Average Latency(s):     0.0193366
                                Stddev Latency(s):      0.00148039
                                Max latency(s):         0.0377946
                                Min latency(s):         0.015909

                                Nick


                                    -----Original Message-----
                                    From: ceph-users
                                    [mailto:ceph-users-boun...@lists.ceph.com]
                                    On Behalf Of Horace
                                    Sent: 21 July 2016 10:26
                                    To: w...@globe.de
                                    Cc: ceph-users@lists.ceph.com
                                    Subject: Re: [ceph-users] Ceph +
                                    VMware + Single Thread Performance

                                    Hi,

                                    Same here. I've read some blog
                                    saying that VMware will frequently
                                    verify the locking on VMFS over
                                    iSCSI, hence it will have much
                                    slower performance than NFS (which
                                    uses a different locking mechanism).

                                    Regards,
                                    Horace Ng

                                    ----- Original Message -----
                                    From: w...@globe.de
                                    To: ceph-users@lists.ceph.com
                                    Sent: Thursday, July 21, 2016
                                    5:11:21 PM
                                    Subject: [ceph-users] Ceph +
                                    VMware + Single Thread Performance

                                    Hi everyone,

                                    we are seeing relatively slow
                                    single-thread performance on the
                                    iSCSI nodes of our cluster.


                                    Our setup:

                                    3 racks:

                                    18x data nodes, 3 mon nodes, 3
                                    iSCSI gateway nodes with tgt (rbd
                                    cache off).

                                    2x Samsung SM863 enterprise SSDs
                                    for journals (3 OSDs per SSD) and
                                    6x WD Red 1TB per data node as
                                    OSDs.

                                    Replication = 3

                                    chooseleaf type rack in the
                                    crush map
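
                                    In the crush rule this is roughly:

                                    step chooseleaf firstn 0 type rack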


                                    We get only ca. 90 MByte/s on the
                                    iSCSI gateway servers with:

                                    rados bench -p rbd 60 write -b 4M
                                    -t 1


                                    If we test with:

                                    rados bench -p rbd 60 write -b 4M
                                    -t 32

                                    we get ca. 600-700 MByte/s.


                                    We plan to replace the Samsung SSDs
                                    with Intel DC P3700 PCIe NVMe
                                    drives for the journal to get
                                    better single-thread performance.

                                    Is there anyone out there who has
                                    an Intel P3700 for the journal and
                                    can share test results for:


                                    rados bench -p rbd 60 write -b 4M
                                    -t 1


                                    Thank you very much!!

                                    Kind regards!!

                                    

                                    











--
Yours sincerely,

Wilhelm Redbrake - Network Engineer
voice: +49-251-5205-355
mail: w...@globe.de

Globe Development GmbH  Königsberger Straße 260  48157 Münster
voice: +49-251-5205-20  fax: +49-251-5205-299
mail: i...@globe.de web: http://www.globe.de
Registration court: Amtsgericht Münster/Westfalen, HRB 5523
Managing Director: Martin Stein

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
