Hi Sebastien,

> On 2 Sep 2014, at 10:41, Sebastien Han <sebastien....@enovance.com> wrote:
>
> Hey,
>
> Well, I ran an fio job that simulates (more or less) what Ceph is doing
> (journal writes with dsync and o_direct) and the SSD gave me 29K IOPS too.
> I could do this, but to me it definitely looks like a major waste since we
> don't even get a third of the SSD's performance.
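Just to be sure we are benchmarking the same thing, something along these lines should reproduce that journal-style write pattern. This is only a sketch: /dev/sdX is a placeholder for the raw SSD (or journal partition), it assumes the device holds no data you care about, and numjobs/iodepth may need to match what you actually ran.

    # 4K writes with O_DIRECT + synced I/O, queue depth 1: roughly the
    # pattern the OSD journal generates
    fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
        --iodepth=1 --numjobs=1 --runtime=60 --time_based \
        --group_reporting --name=journal-test

    # in a second terminal, watch what the raw device itself reports
    iostat -x 1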
Did you have a look whether the raw SSD IOPS (using iostat -x, for example) show the same results during the fio bench?

Cheers

>
>> On 02 Sep 2014, at 09:38, Alexandre DERUMIER <aderum...@odiso.com> wrote:
>>
>> Hi Sebastien,
>>
>>>> I got 6340 IOPS on a single OSD SSD. (journal and data on the same
>>>> partition).
>>
>> Shouldn't it be better to have 2 partitions, 1 for the journal and 1 for the data?
>>
>> (I'm thinking about filesystem write syncs)
>>
>> ----- Original Message -----
>>
>> From: "Sebastien Han" <sebastien....@enovance.com>
>> To: "Somnath Roy" <somnath....@sandisk.com>
>> Cc: ceph-users@lists.ceph.com
>> Sent: Tuesday, 2 September 2014 02:19:16
>> Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS
>>
>> Mark and all, Ceph IOPS performance has definitely improved with Giant.
>> With this version: ceph version 0.84-940-g3215c52
>> (3215c520e1306f50d0094b5646636c02456c9df4) on Debian 7.6 with kernel 3.14-0.
>>
>> I got 6340 IOPS on a single OSD SSD (journal and data on the same partition),
>> so basically twice the amount of IOPS that I was getting with Firefly.
>>
>> 4K random reads went from 12431 to 10201 IOPS, so I'm a bit disappointed here.
>>
>> The SSD is still under-utilised:
>>
>> Device:  rrqm/s  wrqm/s   r/s      w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>> sdp1       0.00  540.37  0.00  5902.30   0.00  47.14     16.36      0.87   0.15     0.00     0.15   0.07  40.15
>> sdp2       0.00    0.00  0.00  4454.67   0.00  49.16     22.60      0.31   0.07     0.00     0.07   0.07  30.61
>>
>> Thanks a ton for all your comments and assistance, guys :).
>>
>> One last question for Sage (or others that might know): what's the status of
>> the f2fs implementation? (Or maybe we are waiting for f2fs to provide atomic
>> transactions?)
>> I tried to run the OSD on f2fs, however ceph-osd mkfs got stuck on an xattr test:
>>
>> fremovexattr(10, "user.test@5848273") = 0
>>
>>> On 01 Sep 2014, at 11:13, Sebastien Han <sebastien....@enovance.com> wrote:
>>>
>>> Mark, thanks a lot for experimenting with this for me.
>>> I'm gonna try master soon and will tell you how much I can get.
>>>
>>> It's interesting to see that using 2 SSDs brings more performance, even
>>> though both SSDs are under-utilized…
>>> They should be able to sustain both loads at the same time (journal and OSD data).
>>>
>>>> On 01 Sep 2014, at 09:51, Somnath Roy <somnath....@sandisk.com> wrote:
>>>>
>>>> As I said, 107K with the IOs served from memory, not hitting the disk.
>>>>
>>>> From: Jian Zhang [mailto:amberzhan...@gmail.com]
>>>> Sent: Sunday, August 31, 2014 8:54 PM
>>>> To: Somnath Roy
>>>> Cc: Haomai Wang; ceph-users@lists.ceph.com
>>>> Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS
>>>>
>>>> Somnath,
>>>> on the small workload performance: 107K is higher than the theoretical
>>>> IOPS of 520, any idea why?
>>>>
>>>>>> Single client is ~14K iops, but scaling as number of clients increases.
>>>>>> 10 clients ~107K iops. ~25 cpu cores are used.
>>>>
>>>> 2014-09-01 11:52 GMT+08:00 Jian Zhang <amberzhan...@gmail.com>:
>>>> Somnath,
>>>> on the small workload performance,
>>>>
>>>> 2014-08-29 14:37 GMT+08:00 Somnath Roy <somnath....@sandisk.com>:
>>>>
>>>> Thanks Haomai!
>>>>
>>>> Here is some of the data from my setup.
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> Set up:
>>>> --------
>>>>
>>>> 32-core CPU with HT enabled, 128 GB RAM, one SSD (both journal and data)
>>>> -> one OSD. 5 client machines with 12-core CPUs, each running two
>>>> instances of ceph_smalliobench (10 clients total). Network is 10GbE.
>>>>
>>>> Workload:
>>>> -------------
>>>>
>>>> Small workload – 20K objects of 4K size, and io_size is also 4K RR. The
>>>> intent is to serve the IOs from memory so that it can uncover the
>>>> performance problems within a single OSD.
>>>>
>>>> Results from Firefly:
>>>> --------------------------
>>>>
>>>> Single-client throughput is ~14K iops, but as the number of clients
>>>> increases the aggregated throughput does not increase. 10 clients ~15K
>>>> iops. ~9-10 cpu cores are used.
>>>>
>>>> Result with latest master:
>>>> ------------------------------
>>>>
>>>> Single client is ~14K iops, but it scales as the number of clients
>>>> increases. 10 clients ~107K iops. ~25 cpu cores are used.
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> More realistic workload:
>>>> -----------------------------
>>>>
>>>> Let's see how it performs while > 90% of the IOs are served from disks.
>>>>
>>>> Setup:
>>>> -------
>>>>
>>>> 40-cpu-core server as a cluster node (single-node cluster) with 64 GB RAM.
>>>> 8 SSDs -> 8 OSDs. One similar node for monitor and rgw. Another node for
>>>> the client running fio/vdbench. 4 rbds are configured with the 'noshare'
>>>> option. 40GbE network.
>>>>
>>>> Workload:
>>>> ------------
>>>>
>>>> 8 SSDs are populated, so 8 * 800GB = ~6.4 TB of data. io_size = 4K RR.
>>>>
>>>> Results from Firefly:
>>>> ------------------------
>>>>
>>>> Aggregated output while 4 rbd clients stress the cluster in parallel is
>>>> ~20-25K IOPS; cpu cores used ~8-10 cores (may be less, can't remember
>>>> precisely).
>>>>
>>>> Results from latest master:
>>>> --------------------------------
>>>>
>>>> Aggregated output while 4 rbd clients stress the cluster in parallel is
>>>> ~120K IOPS; cpu is 7% idle, i.e. ~37-38 cpu cores are used.
>>>>
>>>> Hope this helps.
>>>>
>>>> Thanks & Regards
>>>> Somnath
>>>>
>>>> -----Original Message-----
>>>> From: Haomai Wang [mailto:haomaiw...@gmail.com]
>>>> Sent: Thursday, August 28, 2014 8:01 PM
>>>> To: Somnath Roy
>>>> Cc: Andrey Korolyov; ceph-users@lists.ceph.com
>>>> Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS
>>>>
>>>> Hi Roy,
>>>>
>>>> I have already scanned your merged code about "fdcache" and "optimizing
>>>> for lfn_find/lfn_open"; could you give some performance improvement data
>>>> for it? I fully agree with your direction. Do you have any update on it?
>>>>
>>>> As for the messenger level, I have some very early work on it
>>>> (https://github.com/yuyuyu101/ceph/tree/msg-event); it contains a new
>>>> messenger implementation which supports different event mechanisms.
>>>>
>>>> It looks like it needs at least one more week to make it work.
>>>>
>>>>> On Fri, Aug 29, 2014 at 5:48 AM, Somnath Roy <somnath....@sandisk.com> wrote:
>>>>>
>>>>> Yes, from what I saw the messenger-level bottleneck is still huge!
>>>>> Hopefully the RDMA messenger will resolve that, and the performance gain
>>>>> will be significant for reads (on SSDs). For writes we need to uncover
>>>>> the OSD bottlenecks first to take advantage of the improved upstream.
>>>>> What I experienced is that until you remove the very last bottleneck the
>>>>> performance improvement will not be visible, and that could be confusing
>>>>> because you might think that the upstream improvement you did is not
>>>>> valid (which is not the case).
>>>>>
>>>>> Thanks & Regards
>>>>> Somnath
>>>>>
>>>>> -----Original Message-----
>>>>> From: Andrey Korolyov [mailto:and...@xdel.ru]
>>>>> Sent: Thursday, August 28, 2014 12:57 PM
>>>>> To: Somnath Roy
>>>>> Cc: David Moreau Simard; Mark Nelson; ceph-users@lists.ceph.com
>>>>> Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS
>>>>>
>>>>> On Thu, Aug 28, 2014 at 10:48 PM, Somnath Roy <somnath....@sandisk.com> wrote:
>>>>>> Nope, this will not be backported to Firefly, I guess.
>>>>>>
>>>>>> Thanks & Regards
>>>>>> Somnath
>>>>>
>>>>> Thanks for sharing this; the first thing that came to mind when I looked
>>>>> at this thread was your patches :)
>>>>>
>>>>> If Giant incorporates them, both the RDMA support and those patches
>>>>> should give a huge performance boost for RDMA-enabled Ceph back-end
>>>>> networks.
>>>>
>>>> --
>>>> Best Regards,
>>>>
>>>> Wheat
>
> Cheers.
> ––––
> Sébastien Han
> Cloud Architect
>
> "Always give 100%. Unless you're giving blood."
>
> Phone: +33 (0)1 49 70 99 72
> Mail: sebastien....@enovance.com
> Address: 11 bis, rue Roquépine - 75008 Paris
> Web: www.enovance.com - Twitter: @enovance
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com