I will try both suggestions. Thank you for your input.
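For reference, here is my reading of the two suggestions; the exact dd
invocation and the ceph.conf snippet below are just my interpretation of the
thread, so please correct me if I have either of them wrong.

# Re-run the 8K write test with synchronous writes (Greg's suggestion):
dd of=ddbenchfile if=/dev/zero bs=8K count=1000000 oflag=dsync

# Enable RBD writeback caching for librbd clients (Josh's suggestion),
# in the [client] section of ceph.conf on the client side:
[client]
    rbd cache = true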
On Tue, Sep 17, 2013 at 5:06 PM, Josh Durgin <josh.dur...@inktank.com> wrote:

> Also enabling rbd writeback caching will allow requests to be merged,
> which will help a lot for small sequential I/O.
>
> On 09/17/2013 02:03 PM, Gregory Farnum wrote:
>
>> Try it with oflag=dsync instead? I'm curious what kind of variation
>> these disks will provide.
>>
>> Anyway, you're not going to get the same kind of performance with
>> RADOS on 8k sync IO that you will with a local FS. It needs to
>> traverse the network and go through work queues in the daemon; your
>> primary limiter here is probably the per-request latency that you're
>> seeing (average ~30 ms, looking at the rados bench results). The good
>> news is that means you should be able to scale out to a lot of
>> clients, and if you don't force those 8k sync IOs (which RBD won't,
>> unless the application asks for them itself using direct IO, frequent
>> fsync, or the like), your performance will go way up.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>> On Tue, Sep 17, 2013 at 1:47 PM, Jason Villalta <ja...@rubixnet.com> wrote:
>>
>>> Here are the stats with direct IO.
>>>
>>> dd of=ddbenchfile if=/dev/zero bs=8K count=1000000 oflag=direct
>>> 8192000000 bytes (8.2 GB) copied, 68.4789 s, 120 MB/s
>>>
>>> dd if=ddbenchfile of=/dev/null bs=8K
>>> 8192000000 bytes (8.2 GB) copied, 19.7318 s, 415 MB/s
>>>
>>> These numbers are still much faster overall than when using RADOS
>>> bench. The replica count is set to 2, and the journals are on the same
>>> disk but in separate partitions.
>>>
>>> I kept the block size the same, 8K.
>>>
>>> On Tue, Sep 17, 2013 at 11:37 AM, Campbell, Bill
>>> <bcampbell@axcess-financial.com> wrote:
>>>
>>>> As Gregory mentioned, your 'dd' test looks to be reading from the
>>>> cache (you are writing 8 GB in and then reading that 8 GB out, so the
>>>> reads are all cached reads), so the performance is going to seem good.
>>>> You can add 'oflag=direct' to your dd test to try to get a more
>>>> accurate reading.
>>>>
>>>> RADOS performance, from what I've seen, is largely going to hinge on
>>>> replica size and journal location. Are your journals on separate disks
>>>> or on the same disk as the OSD? What is the replica size of your pool?
>>>>
>>>> ________________________________
>>>> From: "Jason Villalta" <ja...@rubixnet.com>
>>>> To: "Bill Campbell" <bcampbell@axcess-financial.com>
>>>> Cc: "Gregory Farnum" <g...@inktank.com>, "ceph-users" <ceph-users@lists.ceph.com>
>>>> Sent: Tuesday, September 17, 2013 11:31:43 AM
>>>> Subject: Re: [ceph-users] Ceph performance with 8K blocks.
>>>>
>>>> Thanks for your feedback; it is helpful.
>>>>
>>>> I may have been wrong about the default Windows block size. What would
>>>> be the best tests to compare native performance of the SSD disks at 4K
>>>> blocks vs. Ceph performance with 4K blocks? It just seems there is a
>>>> huge difference in the results.
>>>>
>>>> On Tue, Sep 17, 2013 at 10:56 AM, Campbell, Bill
>>>> <bcampbell@axcess-financial.com> wrote:
>>>>
>>>>> Windows default (NTFS) is a 4k block. Are you changing the allocation
>>>>> unit to 8k as a default for your configuration?
>>>>> ________________________________
>>>>> From: "Gregory Farnum" <g...@inktank.com>
>>>>> To: "Jason Villalta" <ja...@rubixnet.com>
>>>>> Cc: ceph-users@lists.ceph.com
>>>>> Sent: Tuesday, September 17, 2013 10:40:09 AM
>>>>> Subject: Re: [ceph-users] Ceph performance with 8K blocks.
>>>>>
>>>>> Your 8k-block dd test is not nearly the same as your 8k-block rados
>>>>> bench or SQL tests. Both rados bench and SQL require the write to be
>>>>> committed to disk before moving on to the next one; dd is simply
>>>>> writing into the page cache. So you're not going to get 460 or even
>>>>> 273 MB/s with sync 8k writes regardless of your settings.
>>>>>
>>>>> However, I think you should be able to tune your OSDs into somewhat
>>>>> better numbers -- that rados bench is giving you ~300 IOPS on every
>>>>> OSD (with a small pipeline!), and an SSD-based daemon should be going
>>>>> faster. What kind of logging are you running with and what configs
>>>>> have you set?
>>>>>
>>>>> Hopefully you can get Mark or Sam or somebody who's done some
>>>>> performance tuning to offer some tips as well. :)
>>>>> -Greg
>>>>>
>>>>> On Tuesday, September 17, 2013, Jason Villalta wrote:
>>>>>
>>>>>> Hello all,
>>>>>> I am new to the list.
>>>>>>
>>>>>> I have a single machine set up for testing Ceph. It has dual 6-core
>>>>>> processors (12 cores total) and 128 GB of RAM. I also have 3 Intel
>>>>>> 520 240 GB SSDs, with an OSD on each disk and the OSD data and
>>>>>> journal in separate partitions formatted with ext4.
>>>>>>
>>>>>> My goal here is to prove just how fast Ceph can go and what kind of
>>>>>> performance to expect when using it as back-end storage for virtual
>>>>>> machines, mostly Windows. I would also like to understand how it
>>>>>> scales IO by removing one of the three disks and repeating the
>>>>>> benchmark tests, but that is secondary. So far here are my results.
>>>>>> I am aware this is all sequential; I just want to know how fast it
>>>>>> can go.
>>>>>>
>>>>>> DD IO test of the SSD disks (I am testing 8K blocks since that is
>>>>>> the default block size of Windows):
>>>>>> dd of=ddbenchfile if=/dev/zero bs=8K count=1000000
>>>>>> 8192000000 bytes (8.2 GB) copied, 17.7953 s, 460 MB/s
>>>>>>
>>>>>> dd if=ddbenchfile of=/dev/null bs=8K
>>>>>> 8192000000 bytes (8.2 GB) copied, 2.94287 s, 2.8 GB/s
>>>>>>
>>>>>> RADOS bench test with 3 SSD disks and 4MB object size (default):
>>>>>> rados --no-cleanup bench -p pbench 30 write
>>>>>> Total writes made:      2061
>>>>>> Write size:             4194304
>>>>>> Bandwidth (MB/sec):     273.004
>>>>>> Stddev Bandwidth:       67.5237
>>>>>> Max bandwidth (MB/sec): 352
>>>>>> Min bandwidth (MB/sec): 0
>>>>>> Average Latency:        0.234199
>>>>>> Stddev Latency:         0.130874
>>>>>> Max latency:            0.867119
>>>>>> Min latency:            0.039318
>>>>>> -----
>>>>>> rados bench -p pbench 30 seq
>>>>>> Total reads made:       2061
>>>>>> Read size:              4194304
>>>>>> Bandwidth (MB/sec):     956.466
>>>>>> Average Latency:        0.0666347
>>>>>> Max latency:            0.208986
>>>>>> Min latency:            0.011625
>>>>>>
>>>>>> This all looks like what I would expect from using three disks. The
>>>>>> problems appear to come with the 8K block/object size.
>>>>>> RADOS bench test with 3 SSD disks and 8K object size (8K blocks):
>>>>>> rados --no-cleanup bench -b 8192 -p pbench 30 write
>>>>>> Total writes made:      13770
>>>>>> Write size:             8192
>>>>>> Bandwidth (MB/sec):     3.581
>>>>>> Stddev Bandwidth:       1.04405
>>>>>> Max bandwidth (MB/sec): 6.19531
>>>>>> Min bandwidth (MB/sec): 0
>>>>>> Average Latency:        0.0348977
>>>>>> Stddev Latency:         0.0349212
>>>>>> Max latency:            0.326429
>>>>>> Min latency:            0.0019
>>>>>> ------
>>>>>> rados bench -b 8192 -p pbench 30 seq
>>>>>> Total reads made:       13770
>>>>>> Read size:              8192
>>>>>> Bandwidth (MB/sec):     52.573
>>>>>> Average Latency:        0.00237483
>>>>>> Max latency:            0.006783
>>>>>> Min latency:            0.000521
>>>>>>
>>>>>> So are these performance numbers correct, or is there something I
>>>>>> missed in the testing procedure? The RADOS bench numbers with an 8K
>>>>>> block size are the same as what we see when testing performance in a
>>>>>> VM with SQLIO. Does anyone know of any configuration changes that
>>>>>> are needed to get Ceph performance closer to native performance with
>>>>>> 8K blocks?
>>>>>>
>>>>>> Thanks in advance.
>>>>>>
>>>>>> --
>>>>>> Jason Villalta
>>>>>> Co-founder
>>>>>> 800.799.4407x1230 | www.RubixTechnology.com
>>>>>
>>>>> --
>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com

--
Jason Villalta
Co-founder
800.799.4407x1230 | www.RubixTechnology.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com