We redid the test with a 4MB block size (the same command as before, but with
bs=4M; see the command right after the results) and we are getting better
results from all devices:

Intel DC S3500 120GB =          148 MB/s
Samsung Pro 128GB =             187 MB/s
Intel 520 120GB =               154 MB/s
Samsung EVO 1TB =               186 MB/s
Intel DC S3500 300GB =          250 MB/s
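
For reference, the 4MB variant of the command would look something like the
line below (the device name and count are illustrative rather than the exact
values we used, and randfile has to be at least bs x count in size):

dd if=randfile of=/dev/sdX bs=4M count=1000 oflag=direct,dsync
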
I have not tested the DC S3610 yet, but I will be ordering some soon.
Since we previously had the journal and OSD on the same SSD, I'm still
wondering if having the journal on a separate SSD (with a journal-to-OSD ratio
of 1:3 or 1:4) will actually bring more write speed.
This is the configuration I was thinking of if we separate the journal from
the OSD (see the layout sketch right after the spec):

Each OSD node:
Dual E5-2620v2 with 64GB of RAM
-------------------
HBA 9207-8i #1
3x Samsung 1TB for the storage layer + 1x Intel S3610 200GB for the journal
3x Samsung 1TB for the storage layer + 1x Intel S3610 200GB for the journal
-------------------
HBA 9207-8i #2
3x Samsung 1TB for the storage layer + 1x Intel S3610 200GB for the journal
3x Samsung 1TB for the storage layer + 1x Intel S3610 200GB for the journal
-------------------
1x LSI RAID Card + 2x 120GB SSD (For OS)
2x 10GbE dual port
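
To make the 1:3 journal-to-OSD ratio concrete, here is a rough sketch of how
one group of three OSDs plus its journal SSD could be created with ceph-deploy
(the hostname and device names are placeholders, and it assumes the three
journal partitions on the S3610 have already been created):

# /dev/sdb-sdd = Samsung 1TB OSDs, /dev/sde1-3 = journal partitions on the S3610
ceph-deploy osd create osd-node1:/dev/sdb:/dev/sde1
ceph-deploy osd create osd-node1:/dev/sdc:/dev/sde2
ceph-deploy osd create osd-node1:/dev/sdd:/dev/sde3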

There would be 6-8 OSD nodes like this to start the cluster.

My goal would be to max out at least 20 Gbps of switch port bandwidth in
writes to a single OpenStack compute node. (I'm still not sure about the CPU
capacity.)
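
Since 20 Gbps works out to roughly 2500 MB/s of writes, I would measure the
aggregate throughput with something like rados bench from the client side
(the pool name and thread count below are just examples; the default 4MB
object size matches what we tested above):

rados bench -p testpool 60 write -t 32 --no-cleanup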

Has anyone tested a similar environment?

Anyway, let me know what you think, since we are still testing this POC.
---
Anthony Lévesque


> On Apr 25, 2015, at 11:46 PM, Christian Balzer <ch...@gol.com> wrote:
> 
> 
> Hello,
> 
> I think that the dd test isn't a 100% replica of what Ceph actually does
> then. 
> My suspicion would be the 4k blocks, since when people test the maximum
> bandwidth they do it with rados bench or other tools that write the
> optimum sized "blocks" for Ceph, 4MB ones.
> 
> I currently have no unused DC S3700s to do a realistic comparison and the
> DC S3500 I have aren't used in any Ceph environment.
> 
> When testing a 200GB DC S3700 that has specs of 35K write IOPS and 365MB/s
> sequential writes on mostly idle system (but on top of Ext4, not the raw
> device) with a 4k dd dsync test run, atop and iostat show a 70% SSD
> utilization, 30k IOPS and 70MB/s writes. 
> Which matches the specs perfectly.
> If I do that test with 4MB blocks, the speed goes up to 330MB/s and 90%
> SSD utilization according to atop, again on par with the specs.
> 
> Lastly on existing Ceph clusters with DC S3700 SSDs as journals and rados
> bench and its 4MB default size that pattern continues.
> Smaller sizes with rados naturally (at least on my hardware and Ceph
> version, Firefly) run into the limitations of Ceph long before they hit
> the SSDs (nearly 100% busy cores, journals at 4-8%, OSD HDDs anywhere from
> 50-100%).
> 
> Of course using the same dd test over all brands will still give you a
> good comparison of the SSDs capabilities.
> But translating that into actual Ceph journal performance is another thing.
> 
> Christian
> 
> On Sat, 25 Apr 2015 18:32:30 +0200 (CEST) Alexandre DERUMIER wrote:
> 
>> I'm able to reach around 20000-25000 iops with 4k blocks with the s3500 (with
>> o_dsync) (so yes, around 80-100 MB/s).
>> 
>> I'll bench the new s3610 soon to compare.
>> 
>> 
>> ----- Original message -----
>> From: "Anthony Levesque" <aleves...@gtcomm.net>
>> To: "Christian Balzer" <ch...@gol.com>
>> Cc: "ceph-users" <ceph-users@lists.ceph.com>
>> Sent: Friday, 24 April 2015 22:00:44
>> Subject: Re: [ceph-users] Possible improvements for a slow write
>> speed        (excluding independent SSD journals)
>> 
>> Hi Christian, 
>> 
>> We tested some DC S3500 300GB using dd if=randfile of=/dev/sda bs=4k
>> count=100000 oflag=direct,dsync 
>> 
>> we got 96 MB/s which is far from the 315 MB/s from the website. 
>> 
>> Can I ask you or anyone on the mailing list how you are testing the
>> write speed for journals? 
>> 
>> Thanks 
>> --- 
>> Anthony Lévesque 
>> GloboTech Communications 
>> Phone: 1-514-907-0050 x 208 
>> Toll Free: 1-(888)-GTCOMM1 x 208 
>> Phone Urgency: 1-(514) 907-0047 
>> 1-(866)-500-1555 
>> Fax: 1-(514)-907-0750 
>> aleves...@gtcomm.net 
>> http://www.gtcomm.net 
>> 
>> 
>> 
>> 
>> On Apr 23, 2015, at 9:05 PM, Christian Balzer < ch...@gol.com > wrote: 
>> 
>> 
>> Hello, 
>> 
>> On Thu, 23 Apr 2015 18:40:38 -0400 Anthony Levesque wrote: 
>> 
>> 
>> To update you on the current test in our lab: 
>> 
>> 1. We tested the Samsung OSDs in recovery mode and the speed was able to 
>> max out 2x 10GbE ports (transferring data at 2200+ MB/s during recovery). 
>> So for normal write operations without O_DSYNC writes the Samsung drives 
>> seem ok. 
>> 
>> 2.We then tested a couple of different model of SSD we had in stock with 
>> the following command: 
>> 
>> dd if=randfile of=/dev/sda bs=4k count=100000 oflag=direct,dsync 
>> 
>> This was from a blog written by Sebastien Han and I think it should be able 
>> to show how the drives would perform with O_DSYNC writes. For people 
>> interested, here are the results of what we tested: 
>> 
>> Intel DC S3500 120GB = 114 MB/s 
>> Samsung Pro 128GB = 2.4 MB/s 
>> WD Black 1TB (HDD) = 409 KB/s 
>> Intel 330 120GB = 105 MB/s 
>> Intel 520 120GB = 9.4 MB/s 
>> Intel 335 80GB = 9.4 MB/s 
>> Samsung EVO 1TB = 2.5 MB/s 
>> Intel 320 120GB = 78 MB/s 
>> OCZ Revo Drive 240GB = 60.8 MB/s 
>> 4x Samsung EVO 1TB LSI RAID0 HW + BBU = 28.4 MB/s 
>> 
>> 
>> 
>> No real surprises here, but a nice summary nonetheless. 
>> 
>> You _really_ want to avoid consumer SSDs for journals and have a good
>> idea on how much data you'll write per day and how long you expect your
>> SSDs to last (the TBW/$ ratio). 
>> 
>> 
>> Please let us know if the command we ran was not optimal to test O_DSYNC 
>> writes 
>> 
>> We ordered larger drives from the Intel DC series to see if we could get more 
>> than 200 MB/s per SSD. We will keep you posted on the tests if that 
>> interests you guys. We didn't run multiple parallel tests yet (to 
>> simulate multiple journals on one SSD). 
>> 
>> 
>> You can totally trust the numbers on Intel's site: 
>> http://ark.intel.com/products/family/83425/Data-Center-SSDs 
>> 
>> The S3500s are by far the slowest and have the lowest endurance. 
>> Again, depending on your expected write level the S3610 or S3700 models 
>> are going to be a better fit regarding price/performance. 
>> Especially when you consider that losing a journal SSD will result in 
>> several dead OSDs. 
>> 
>> 
>> 3. We removed the journal from all Samsung OSDs and put 2x Intel 330 120GB 
>> in all 6 nodes to test. The overall speed we were getting from the rados 
>> bench went from 1000 MB/s (approx.) to 450 MB/s, which might only be 
>> because the Intels cannot do much in terms of journaling (they were tested 
>> at around 100 MB/s). It will be interesting to test with bigger Intel 
>> DC S3500 drives (and more journals) per node to see if I can get back up 
>> to 1000 MB/s or even surpass it. 
>> 
>> We also wanted to test if the CPU could be a huge bottleneck, so we swapped 
>> the dual E5-2620v2 from node #6 and replaced them with dual 
>> E5-2609v2 (which have fewer cores and lower clock speed), and the 450 MB/s 
>> we got from the rados bench went even lower, to 180 MB/s. 
>> 
>> 
>> You really don't have to swap CPUs around, monitor things with atop or 
>> other tools to see where your bottlenecks are. 
>> 
>> 
>> So I'm wondering if the 1000 MB/s we got when the journal was shared on 
>> the OSD SSD was limited by the CPUs (even though the Samsungs are not 
>> good for journals in the long run) and not just by the fact that Samsung 
>> SSDs are bad at O_DSYNC writes (or maybe both). It is probable that 16 SSD 
>> OSDs per node in a full-SSD cluster is too much and the major bottleneck 
>> will be the CPU. 
>> 
>> 
>> That's what I kept saying. ^.^ 
>> 
>> 
>> 4. I'm wondering, if we find good SSDs for the journal and keep the Samsungs 
>> for normal writes and reads (we can saturate 20GbE easily with read 
>> benchmarks; we will test 40GbE soon), whether the cluster will stay 
>> healthy, since the Samsungs seem to get burnt by O_DSYNC writes. 
>> 
>> 
>> They will get burned, as in have their cells worn out by any and all 
>> writes. 
>> 
>> 
>> 5. In terms of HBA controllers, have you guys made any tests for a full 
>> SSD cluster, or even just for SSD journals? 
>> 
>> 
>> If you have separate journals and OSDs, it often makes good sense to
>> have them on separate controllers as well. 
>> It all depends on density of your setup and capabilities of the 
>> controllers. 
>> LSI HBAs in IT mode are a known and working entity. 
>> 
>> Christian 
> 
> 
> -- 
> Christian Balzer        Network/Systems Engineer                
> ch...@gol.com         Global OnLine Japan/Fusion Communications
> http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
