I will try both suggestions. Thank you for your input.

On Tue, Sep 17, 2013 at 5:06 PM, Josh Durgin <josh.dur...@inktank.com> wrote:

> Also enabling rbd writeback caching will allow requests to be merged,
> which will help a lot for small sequential I/O.
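>
> A minimal sketch of how that might look in ceph.conf on the client side
> (option names are an assumption to verify against the docs for your release):
>
> [client]
>     rbd cache = true
>     rbd cache writethrough until flush = true
>
> The second option keeps the cache in writethrough mode until the guest
> issues its first flush, as a safety net for guests that never flush.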
>
>
> On 09/17/2013 02:03 PM, Gregory Farnum wrote:
>
>> Try it with oflag=dsync instead? I'm curious what kind of variation
>> these disks will provide.
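>>
>> A minimal sketch of that test, assuming the same file and block size as
>> before (count reduced, since per-write syncs are slow):
>>
>> dd of=ddbenchfile if=/dev/zero bs=8K count=100000 oflag=dsync
>>
>> dsync makes dd wait for each 8K write to be committed before issuing the
>> next one, which is much closer to what rados bench and SQL do.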
>>
>> Anyway, you're not going to get the same kind of performance with
>> RADOS on 8k sync IO that you will with a local FS. It needs to
>> traverse the network and go through work queues in the daemon; your
>> primary limiter here is probably the per-request latency that you're
>> seeing (average ~30 ms, looking at the rados bench results). The good
>> news is that means you should be able to scale out to a lot of
>> clients, and if you don't force those 8k sync IOs (which RBD won't,
>> unless the application asks for them by itself using directIO or
>> frequent fsync or whatever) your performance will go way up.
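>>
>> Rough back-of-the-envelope from the 8K rados bench output in this thread,
>> assuming its default of 16 concurrent ops: 16 / ~0.035 s average latency is
>> roughly 460 IOPS, and 460 x 8 KB is roughly 3.6 MB/s, which matches the
>> ~3.58 MB/s reported. A single queue-depth-1 sync writer would be limited to
>> about 1 / 0.035, i.e. ~30 IOPS, so more parallel clients is where the
>> aggregate throughput comes from.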
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>> On Tue, Sep 17, 2013 at 1:47 PM, Jason Villalta <ja...@rubixnet.com>
>> wrote:
>>
>>>
>>> Here are the stats with direct io.
>>>
>>> dd of=ddbenchfile if=/dev/zero bs=8K count=1000000 oflag=direct
>>> 8192000000 bytes (8.2 GB) copied, 68.4789 s, 120 MB/s
>>>
>>> dd if=ddbenchfile of=/dev/null bs=8K
>>> 8192000000 bytes (8.2 GB) copied, 19.7318 s, 415 MB/s
>>>
>>> These numbers are still overall much faster than when using RADOS bench.
>>> The replica is set to 2.  The journals are on the same disk but in separate
>>> partitions.
>>>
>>> I kept the block size the same, 8K.
>>>
>>>
>>>
>>>
>>> On Tue, Sep 17, 2013 at 11:37 AM, Campbell, Bill <
>>> bcampbell@axcess-financial.com>
>>> wrote:
>>>
>>>>
>>>> As Gregory mentioned, your 'dd' test looks to be reading from the cache
>>>> (you are writing 8GB in, and then reading that 8GB out, so the reads are
>>>> all cached reads) so the performance is going to seem good.  You can add
>>>> the 'oflag=direct' to your dd test to try and get a more accurate reading
>>>> from that.
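>>>>
>>>> A minimal sketch, assuming the same file name; using iflag=direct on the
>>>> read (or dropping the page cache first, which needs root) avoids
>>>> measuring cached reads:
>>>>
>>>> dd of=ddbenchfile if=/dev/zero bs=8K count=1000000 oflag=direct
>>>> echo 3 > /proc/sys/vm/drop_caches
>>>> dd if=ddbenchfile of=/dev/null bs=8K iflag=direct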
>>>>
>>>> RADOS performance from what I've seen is largely going to hinge on
>>>> replica size and journal location.  Are your journals on separate disks or
>>>> on the same disk as the OSD?  What is the replica size of your pool?
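>>>>
>>>> If you're not sure, something like the following should show both,
>>>> assuming the pool name pbench from your bench runs:
>>>>
>>>> ceph osd pool get pbench size
>>>> ls -l /var/lib/ceph/osd/ceph-*/journal
>>>>
>>>> (The journal path above is the default location; adjust if your OSDs
>>>> live elsewhere.)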
>>>>
>>>> ________________________________
>>>> From: "Jason Villalta" <ja...@rubixnet.com>
>>>> To: "Bill Campbell" 
>>>> <bcampbell@axcess-financial.**com<bcampb...@axcess-financial.com>
>>>> >
>>>> Cc: "Gregory Farnum" <g...@inktank.com>, "ceph-users" <
>>>> ceph-users@lists.ceph.com>
>>>> Sent: Tuesday, September 17, 2013 11:31:43 AM
>>>>
>>>> Subject: Re: [ceph-users] Ceph performance with 8K blocks.
>>>>
>>>> Thanks for your feedback; it is helpful.
>>>>
>>>> I may have been wrong about the default Windows block size.  What would
>>>> be the best tests to compare native performance of the SSD disks at 4K
>>>> blocks vs Ceph performance with 4K blocks?  It just seems there is a huge
>>>> difference in the results.
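>>>>
>>>> One way to keep that comparison apples-to-apples might be synchronous 4K
>>>> I/O on both sides, for example (the target path and count here are just
>>>> placeholders):
>>>>
>>>> dd of=/mnt/ssd/testfile if=/dev/zero bs=4K count=100000 oflag=dsync
>>>> rados bench -b 4096 -p pbench 30 write
>>>>
>>>> so that both the local disk and RADOS have to commit each 4K write before
>>>> the next one is issued.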
>>>>
>>>>
>>>> On Tue, Sep 17, 2013 at 10:56 AM, Campbell, Bill <
>>>> bcampbell@axcess-financial.com>
>>>> wrote:
>>>>
>>>>>
>>>>> Windows default (NTFS) is a 4k block.  Are you changing the allocation
>>>>> unit to 8k as a default for your configuration?
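>>>>>
>>>>> (For reference, the NTFS allocation unit is chosen at format time; a
>>>>> sketch, with the drive letter as a placeholder: format D: /FS:NTFS /A:8192)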
>>>>>
>>>>> ________________________________
>>>>> From: "Gregory Farnum" <g...@inktank.com>
>>>>> To: "Jason Villalta" <ja...@rubixnet.com>
>>>>> Cc: ceph-users@lists.ceph.com
>>>>> Sent: Tuesday, September 17, 2013 10:40:09 AM
>>>>> Subject: Re: [ceph-users] Ceph performance with 8K blocks.
>>>>>
>>>>>
>>>>> Your 8k-block dd test is not nearly the same as your 8k-block rados
>>>>> bench or SQL tests. Both rados bench and SQL require the write to be
>>>>> committed to disk before moving on to the next one; dd is simply writing
>>>>> into the page cache. So you're not going to get 460 or even 273MB/s with
>>>>> sync 8k writes regardless of your settings.
>>>>>
>>>>> However, I think you should be able to tune your OSDs into somewhat
>>>>> better numbers -- that rados bench is giving you ~300 IOPS on every OSD
>>>>> (with a small pipeline!), and an SSD-based daemon should be going faster.
>>>>> What kind of logging are you running with and what configs have you set?
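>>>>>
>>>>> If the debug levels are still at their defaults, one common first step
>>>>> for a benchmark run is to turn them down; a minimal ceph.conf sketch,
>>>>> offered as an assumption rather than a tuned recommendation:
>>>>>
>>>>> [osd]
>>>>>     debug osd = 0
>>>>>     debug filestore = 0
>>>>>     debug journal = 0
>>>>>     debug ms = 0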
>>>>>
>>>>> Hopefully you can get Mark or Sam or somebody who's done some
>>>>> performance tuning to offer some tips as well. :)
>>>>> -Greg
>>>>>
>>>>> On Tuesday, September 17, 2013, Jason Villalta wrote:
>>>>>
>>>>>>
>>>>>> Hello all,
>>>>>> I am new to the list.
>>>>>>
>>>>>> I have a single machine set up for testing Ceph.  It has dual 6-core
>>>>>> processors (12 cores total) and 128GB of RAM.  I also have 3 Intel 520
>>>>>> 240GB SSDs, with an OSD set up on each disk and the OSD data and journal
>>>>>> in separate partitions formatted with ext4.
>>>>>>
>>>>>> My goal here is to prove just how fast Ceph can go and what kind of
>>>>>> performance to expect when using it as back-end storage for virtual
>>>>>> machines, mostly Windows.  I would also like to try to understand how it
>>>>>> will scale IO by removing one of the three disks and rerunning the
>>>>>> benchmark tests, but that is secondary.  So far, here are my results.  I
>>>>>> am aware this is all sequential; I just want to know how fast it can go.
>>>>>>
>>>>>> DD IO test of the SSD disks:  I am testing 8K blocks since that is the
>>>>>> default block size of Windows.
>>>>>> dd of=ddbenchfile if=/dev/zero bs=8K count=1000000
>>>>>> 8192000000 bytes (8.2 GB) copied, 17.7953 s, 460 MB/s
>>>>>>
>>>>>> dd if=ddbenchfile of=/dev/null bs=8K
>>>>>> 8192000000 bytes (8.2 GB) copied, 2.94287 s, 2.8 GB/s
>>>>>>
>>>>>> RADOS bench test with 3 SSD disks and 4MB object size (default):
>>>>>> rados --no-cleanup bench -p pbench 30 write
>>>>>> Total writes made:      2061
>>>>>> Write size:             4194304
>>>>>> Bandwidth (MB/sec):     273.004
>>>>>>
>>>>>> Stddev Bandwidth:       67.5237
>>>>>> Max bandwidth (MB/sec): 352
>>>>>> Min bandwidth (MB/sec): 0
>>>>>> Average Latency:        0.234199
>>>>>> Stddev Latency:         0.130874
>>>>>> Max latency:            0.867119
>>>>>> Min latency:            0.039318
>>>>>> -----
>>>>>> rados bench -p pbench 30 seq
>>>>>> Total reads made:     2061
>>>>>> Read size:            4194304
>>>>>> Bandwidth (MB/sec):    956.466
>>>>>>
>>>>>> Average Latency:       0.0666347
>>>>>> Max latency:           0.208986
>>>>>> Min latency:           0.011625
>>>>>>
>>>>>> This all looks like I would expect from using three disks.  The
>>>>>> problems appear to come with the 8K blocks/object size.
>>>>>>
>>>>>> RADOS bench test with 3 SSD disks and 8K object size (8K blocks):
>>>>>> rados --no-cleanup bench -b 8192 -p pbench 30 write
>>>>>> Total writes made:      13770
>>>>>> Write size:             8192
>>>>>> Bandwidth (MB/sec):     3.581
>>>>>>
>>>>>> Stddev Bandwidth:       1.04405
>>>>>> Max bandwidth (MB/sec): 6.19531
>>>>>> Min bandwidth (MB/sec): 0
>>>>>> Average Latency:        0.0348977
>>>>>> Stddev Latency:         0.0349212
>>>>>> Max latency:            0.326429
>>>>>> Min latency:            0.0019
>>>>>> ------
>>>>>> rados bench -b 8192 -p pbench 30 seq
>>>>>> Total reads made:     13770
>>>>>> Read size:            8192
>>>>>> Bandwidth (MB/sec):    52.573
>>>>>>
>>>>>> Average Latency:       0.00237483
>>>>>> Max latency:           0.006783
>>>>>> Min latency:           0.000521
>>>>>>
>>>>>> So are these performance numbers correct, or is there something I missed
>>>>>> in the testing procedure?  The RADOS bench numbers with 8K block size are
>>>>>> the same as we see when testing performance in a VM with SQLIO.  Does
>>>>>> anyone know of any configuration changes that are needed to get the Ceph
>>>>>> performance closer to native performance with 8K blocks?
>>>>>>
>>>>>> Thanks in advance.
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> --
>>>>>> Jason Villalta
>>>>>> Co-founder
>>>>>> 800.799.4407x1230 | www.RubixTechnology.com
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> --
>>>> Jason Villalta
>>>> Co-founder
>>>> 800.799.4407x1230 | www.RubixTechnology.com
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> --
>>> Jason Villalta
>>> Co-founder
>>> 800.799.4407x1230 | www.RubixTechnology.com
>>>
>>
>>
>


-- 
-- 
*Jason Villalta*
Co-founder
800.799.4407x1230 | www.RubixTechnology.com


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
