We redid the test with a 4MB block size (using the same command as before, but with 4MB for the bs) and we are getting better results from all devices.
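For reference, the command looks like this (randfile is the same pre-generated random input file as in the earlier 4k test, and the count shown here is only an example, so the total amount written depends on the size of randfile):

# 4MB O_DSYNC writes straight to the raw device (this wipes data on /dev/sda)
dd if=randfile of=/dev/sda bs=4M count=1000 oflag=direct,dsync

The results: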
Intel DC S3500 120GB = 148 MB/s
Samsung Pro 128GB = 187 MB/s
Intel 520 120GB = 154 MB/s
Samsung EVO 1TB = 186 MB/s
Intel DC S3500 300GB = 250 MB/s

I have not tested the DC S3610 yet, but I will be ordering some soon.

Since we previously had the journal and OSD on the same SSD, I'm still wondering whether having the journal on a separate SSD (with a journal-to-OSD ratio of 1:3 or 1:4) will actually bring more write speed. This is the configuration I was thinking of if we separate the journal from the OSD (see the rough provisioning sketch at the bottom of this mail):

Each OSD node:
Dual E5-2620v2 with 64GB of RAM
-------------------
HBA 9207-8i #1
3x Samsung 1TB for the storage layer + 1x Intel S3610 200GB for the journal
3x Samsung 1TB for the storage layer + 1x Intel S3610 200GB for the journal
-------------------
HBA 9207-8i #2
3x Samsung 1TB for the storage layer + 1x Intel S3610 200GB for the journal
3x Samsung 1TB for the storage layer + 1x Intel S3610 200GB for the journal
-------------------
1x LSI RAID card + 2x 120GB SSDs (for the OS)
2x 10GbE dual-port NICs

There would be between 6 and 8 OSD nodes like this to start the cluster. My goal would be to max out at least 20 Gbps of switch ports in writes to a single OpenStack compute node (I'm still not sure about the CPU capacity).

Has anyone tested a similar environment? Anyway guys, let me know what you think, since we are still testing this POC.
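To make the journal layout concrete, here is roughly how I picture provisioning one group of 3 OSDs + 1 journal SSD. This is only a sketch: the host and device names are placeholders, and it assumes the HOST:DISK[:JOURNAL] form of ceph-deploy osd create with filestore journals sized by "osd journal size" in ceph.conf.

# One HBA group: sdb/sdc/sdd = Samsung 1TB data disks, sde = Intel S3610 200GB
# Passing the whole SSD as the journal should let ceph-disk carve one
# journal partition per OSD on it, sized by "osd journal size" from ceph.conf.
ceph-deploy osd create osd-node1:sdb:sde
ceph-deploy osd create osd-node1:sdc:sde
ceph-deploy osd create osd-node1:sdd:sde
# Repeat for the other three groups on each node (4 groups = 12 OSDs + 4 journal SSDs).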
---
Anthony Lévesque

> On Apr 25, 2015, at 11:46 PM, Christian Balzer <ch...@gol.com> wrote:
>
> Hello,
>
> I think that the dd test isn't a 100% replica of what Ceph actually does
> then.
> My suspicion would be the 4k blocks, since when people test the maximum
> bandwidth they do it with rados bench or other tools that write the
> optimum sized "blocks" for Ceph, 4MB ones.
>
> I currently have no unused DC S3700s to do a realistic comparison and the
> DC S3500s I have aren't used in any Ceph environment.
>
> When testing a 200GB DC S3700 that has specs of 35K write IOPS and 365MB/s
> sequential writes on a mostly idle system (but on top of Ext4, not the raw
> device) with a 4k dd dsync test run, atop and iostat show a 70% SSD
> utilization, 30k IOPS and 70MB/s writes.
> Which matches the specs perfectly.
> If I do that test with 4MB blocks, the speed goes up to 330MB/s and 90%
> SSD utilization according to atop, again on par with the specs.
>
> Lastly, on existing Ceph clusters with DC S3700 SSDs as journals and rados
> bench and its 4MB default size, that pattern continues.
> Smaller sizes with rados naturally (at least on my hardware and Ceph
> version, Firefly) run into the limitations of Ceph long before they hit
> the SSDs (nearly 100% busy cores, journals at 4-8%, OSD HDDs anywhere from
> 50-100%).
>
> Of course using the same dd test over all brands will still give you a
> good comparison of the SSDs' capabilities.
> But translating that into actual Ceph journal performance is another thing.
>
> Christian
>
> On Sat, 25 Apr 2015 18:32:30 +0200 (CEST) Alexandre DERUMIER wrote:
>
>> I'm able to reach around 20000-25000 iops with 4k blocks with the s3500
>> (with o_dsync) (so yes, around 80-100 MB/s).
>>
>> I'll bench the new s3610 soon to compare.
>>
>> ----- Original Message -----
>> From: "Anthony Levesque" <aleves...@gtcomm.net>
>> To: "Christian Balzer" <ch...@gol.com>
>> Cc: "ceph-users" <ceph-users@lists.ceph.com>
>> Sent: Friday, 24 April 2015 22:00:44
>> Subject: Re: [ceph-users] Possible improvements for a slow write
>> speed (excluding independent SSD journals)
>>
>> Hi Christian,
>>
>> We tested some DC S3500 300GB using
>> dd if=randfile of=/dev/sda bs=4k count=100000 oflag=direct,dsync
>>
>> We got 96 MB/s, which is far from the 315 MB/s from the website.
>>
>> Can I ask you or anyone on the mailing list how you are testing the
>> write speed for journals?
>>
>> Thanks
>> ---
>> Anthony Lévesque
>> GloboTech Communications
>> Phone: 1-514-907-0050 x 208
>> Toll Free: 1-(888)-GTCOMM1 x 208
>> Phone Urgency: 1-(514) 907-0047
>> 1-(866)-500-1555
>> Fax: 1-(514)-907-0750
>> aleves...@gtcomm.net
>> http://www.gtcomm.net
>>
>> On Apr 23, 2015, at 9:05 PM, Christian Balzer <ch...@gol.com> wrote:
>>
>> Hello,
>>
>> On Thu, 23 Apr 2015 18:40:38 -0400 Anthony Levesque wrote:
>>
>>> To update you on the current tests in our lab:
>>>
>>> 1. We tested the Samsung OSDs in recovery mode and the speed was able to
>>> max out 2x 10GbE ports (transferring data at 2200+ MB/s during recovery).
>>> So for normal write operations without O_DSYNC writes, the Samsung drives
>>> seem ok.
>>>
>>> 2. We then tested a couple of different models of SSD we had in stock
>>> with the following command:
>>>
>>> dd if=randfile of=/dev/sda bs=4k count=100000 oflag=direct,dsync
>>>
>>> This was from a blog written by Sebastien Han and I think should be able
>>> to show how the drives would perform in O_DSYNC writes. For people
>>> interested, here are some results of what we tested:
>>>
>>> Intel DC S3500 120GB = 114 MB/s
>>> Samsung Pro 128GB = 2.4 MB/s
>>> WD Black 1TB (HDD) = 409 KB/s
>>> Intel 330 120GB = 105 MB/s
>>> Intel 520 120GB = 9.4 MB/s
>>> Intel 335 80GB = 9.4 MB/s
>>> Samsung EVO 1TB = 2.5 MB/s
>>> Intel 320 120GB = 78 MB/s
>>> OCZ Revo Drive 240GB = 60.8 MB/s
>>> 4x Samsung EVO 1TB LSI RAID0 HW + BBU = 28.4 MB/s
>>
>> No real surprises here, but a nice summary nonetheless.
>>
>> You _really_ want to avoid consumer SSDs for journals and have a good
>> idea on how much data you'll write per day and how long you expect your
>> SSDs to last (the TBW/$ ratio).
>>
>>> Please let us know if the command we ran was not optimal to test O_DSYNC
>>> writes.
>>>
>>> We ordered larger drives from the Intel DC series to see if we could get
>>> more than 200 MB/s per SSD. We will keep you posted on tests if that
>>> interests you guys. We didn't test multiple parallel runs yet (to
>>> simulate multiple journals on one SSD).
>>
>> You can totally trust the numbers on Intel's site:
>> http://ark.intel.com/products/family/83425/Data-Center-SSDs
>>
>> The S3500s are by far the slowest and have the lowest endurance.
>> Again, depending on your expected write level, the S3610 or S3700 models
>> are going to be a better fit regarding price/performance.
>> Especially when you consider that losing a journal SSD will result in
>> several dead OSDs.
>>
>>> 3. We removed the journal from all Samsung OSDs and put 2x Intel 330
>>> 120GB in all 6 nodes to test. The overall speed we were getting from the
>>> rados bench went from 1000 MB/s (approx.) to 450 MB/s, which might only
>>> be because the Intels cannot do too much in terms of journaling (they
>>> were tested at around 100 MB/s).
>>> It will be interesting to test with bigger Intel DC S3500 drives (and
>>> more journals) per node to see if I can get back up to 1000 MB/s or even
>>> surpass it.
>>>
>>> We also wanted to test if the CPU could be a huge bottleneck, so we
>>> swapped the Dual E5-2620v2 from node #6 and replaced them with Dual
>>> E5-2609v2 (which are much smaller in cores and speed), and the 450 MB/s
>>> we got from the rados bench went even lower, to 180 MB/s.
>>
>> You really don't have to swap CPUs around, monitor things with atop or
>> other tools to see where your bottlenecks are.
>>
>>> So I'm wondering if the 1000 MB/s we got when the journal was shared on
>>> the OSD SSD was not limited by the CPUs (even though the Samsungs are not
>>> good for journals in the long run) and not just by the fact that Samsung
>>> SSDs are bad in O_DSYNC writes (or maybe both). It is probable that 16
>>> SSD OSDs per node in a full SSD cluster is too much and the major
>>> bottleneck will be the CPU.
>>
>> That's what I kept saying. ^.^
>>
>>> 4. I'm wondering, if we find good SSDs for the journal and keep the
>>> Samsungs for normal writes and reads (we can saturate 20GbE easily with
>>> the read benchmark; we will test 40GbE soon), whether the cluster will
>>> stay healthy, since the Samsungs seem to get burnt by O_DSYNC writes.
>>
>> They will get burned, as in have their cells worn out by any and all
>> writes.
>>
>>> 5. In terms of HBA controllers, have you guys made any tests for a full
>>> SSD cluster, or even just for SSD journals?
>>
>> If you have separate journals and OSDs, it often makes good sense to
>> have them on separate controllers as well.
>> It all depends on the density of your setup and the capabilities of the
>> controllers.
>> LSI HBAs in IT mode are a known and working entity.
>>
>> Christian
>
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com           Global OnLine Japan/Fusion Communications
> http://www.gol.com/