Hi Sebastien,

>> I got 6340 IOPS on a single OSD SSD. (journal and data on the same
>> partition).

Shouldn't it be better to have 2 partitions, one for the journal and one
for the data? (I'm thinking about filesystem write syncs.)
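Something like the following is what I have in mind (just a sketch; the
device name and OSD id are made up for the example, not taken from your
setup):

  # /dev/sdX1: small raw partition used as the journal
  # /dev/sdX2: XFS partition mounted as the OSD data directory
  [osd.0]
  osd journal = /dev/sdX1                # raw block-device journal
  osd data = /var/lib/ceph/osd/ceph-0    # /dev/sdX2 mounted here

The idea being that journal writes then go straight to the raw partition
and bypass the data filesystem entirely.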
----- Original Message -----
From: "Sebastien Han" <sebastien....@enovance.com>
To: "Somnath Roy" <somnath....@sandisk.com>
Cc: ceph-users@lists.ceph.com
Sent: Tuesday, 2 September 2014 02:19:16
Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

Mark and all,

Ceph IOPS performance has definitely improved with Giant. With this
version: ceph version 0.84-940-g3215c52
(3215c520e1306f50d0094b5646636c02456c9df4) on Debian 7.6 with kernel
3.14-0, I got 6340 IOPS on a single OSD SSD (journal and data on the same
partition). So basically twice the amount of IOPS that I was getting with
Firefly.

Random 4K reads went from 12431 to 10201 IOPS, so I'm a bit disappointed
here. The SSD is still under-utilised:

Device:  rrqm/s  wrqm/s   r/s      w/s    rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdp1      0.00   540.37   0.00  5902.30    0.00  47.14     16.36      0.87   0.15     0.00     0.15   0.07  40.15
sdp2      0.00     0.00   0.00  4454.67    0.00  49.16     22.60      0.31   0.07     0.00     0.07   0.07  30.61

Thanks a ton for all your comments and assistance, guys :).

One last question for Sage (or others that might know): what's the status
of the f2fs implementation? (Or maybe we are waiting for f2fs to provide
atomic transactions?) I tried to run the OSD on f2fs, however ceph-osd
mkfs got stuck on an xattr test:

fremovexattr(10, "user.test@5848273") = 0

On 01 Sep 2014, at 11:13, Sebastien Han <sebastien....@enovance.com> wrote:

> Mark, thanks a lot for experimenting with this for me.
> I'm gonna try master soon and will tell you how much I can get.
>
> It's interesting to see that using 2 SSDs brings more performance, even
> though both SSDs are under-utilized...
> They should be able to sustain both loads at the same time (journal and
> OSD data).
>
> On 01 Sep 2014, at 09:51, Somnath Roy <somnath....@sandisk.com> wrote:
>
>> As I said, that is 107K with IOs served from memory, not hitting the disk.
>>
>> From: Jian Zhang [mailto:amberzhan...@gmail.com]
>> Sent: Sunday, August 31, 2014 8:54 PM
>> To: Somnath Roy
>> Cc: Haomai Wang; ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS
>>
>> Somnath,
>> on the small-workload performance: 107K is higher than the theoretical
>> IOPS of the 520, any idea why?
>>
>>>> Single client is ~14K iops, but it scales as the number of clients
>>>> increases. 10 clients ~107K iops. ~25 cpu cores are used.
>>
>> 2014-09-01 11:52 GMT+08:00 Jian Zhang <amberzhan...@gmail.com>:
>> Somnath,
>> on the small workload performance,
>>
>> 2014-08-29 14:37 GMT+08:00 Somnath Roy <somnath....@sandisk.com>:
>>
>> Thanks Haomai !
>>
>> Here is some of the data from my setup.
>>
>> --------------------------------------------------------------------
>>
>> Set up:
>> --------
>>
>> 32-core CPU with HT enabled, 128 GB RAM, one SSD (both journal and data)
>> -> one OSD. 5 client machines with 12-core CPUs, each running two
>> instances of ceph_smalliobench (10 clients total). Network is 10GbE.
>>
>> Workload:
>> -------------
>>
>> Small workload: 20K objects of 4K size, and io_size is also 4K random
>> read. The intent is to serve the IOs from memory so that it can uncover
>> the performance problems within a single OSD.
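>> (As an aside: if you want to approximate this kind of 4K random-read
>> load without building ceph_smalliobench, plain rados bench against a
>> test pool gets reasonably close. The pool name and queue depth below are
>> just examples:
>>
>>   rados bench -p testpool 120 write -b 4096 -t 32 --no-cleanup
>>   rados bench -p testpool 120 rand -t 32
>>
>> The first run populates the pool with 4K objects, the second reads them
>> back in random order.)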
>> Results from Firefly:
>> --------------------------
>>
>> Single-client throughput is ~14K iops, but as the number of clients
>> increases the aggregated throughput does not increase. 10 clients ~15K
>> iops. ~9-10 cpu cores are used.
>>
>> Results with latest master:
>> ------------------------------
>>
>> Single client is ~14K iops, but it scales as the number of clients
>> increases. 10 clients ~107K iops. ~25 cpu cores are used.
>>
>> --------------------------------------------------------------------
>>
>> More realistic workload:
>> -----------------------------
>>
>> Let's see how it performs while >90% of the IOs are served from disks.
>>
>> Setup:
>> -------
>>
>> 40-core server as a cluster node (single-node cluster) with 64 GB RAM.
>> 8 SSDs -> 8 OSDs. One similar node for monitor and rgw. Another node for
>> the client running fio/vdbench. 4 rbds are configured with the 'noshare'
>> option. 40GbE network.
>>
>> Workload:
>> ------------
>>
>> 8 SSDs are populated, so 8 * 800 GB = ~6.4 TB of data. io_size = 4K
>> random read.
>>
>> Results from Firefly:
>> ------------------------
>>
>> Aggregated output with 4 rbd clients stressing the cluster in parallel
>> is ~20-25K IOPS; ~8-10 cpu cores are used (maybe less, I can't remember
>> precisely).
>>
>> Results from latest master:
>> --------------------------------
>>
>> Aggregated output with 4 rbd clients stressing the cluster in parallel
>> is ~120K IOPS; the cpu is 7% idle, i.e. ~37-38 cpu cores are used.
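>> (The fio side is nothing exotic: plain 4K random reads against the rbd
>> images. A sketch of the kind of job file, assuming the images are mapped
>> through krbd; the device name, queue depth and runtime here are
>> illustrative only:
>>
>>   [global]
>>   ioengine=libaio
>>   direct=1
>>   rw=randread
>>   bs=4k
>>   iodepth=32
>>   runtime=300
>>   time_based
>>
>>   [rbd0]
>>   filename=/dev/rbd0
>>
>> Run one such job section per mapped image to stress all 4 rbds in
>> parallel.)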
>> Hope this helps.
>>
>> Thanks & Regards
>> Somnath
>>
>> -----Original Message-----
>> From: Haomai Wang [mailto:haomaiw...@gmail.com]
>> Sent: Thursday, August 28, 2014 8:01 PM
>> To: Somnath Roy
>> Cc: Andrey Korolyov; ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS
>>
>> Hi Roy,
>>
>> I have already scanned your merged code for "fdcache" and "optimizing
>> lfn_find/lfn_open"; could you give some performance-improvement data for
>> it? I fully agree with your direction. Do you have any update on it?
>>
>> As for the messenger level, I have some very early work on it
>> (https://github.com/yuyuyu101/ceph/tree/msg-event); it contains a new
>> messenger implementation that supports different event mechanisms.
>> It looks like it will take at least one more week to make it work.
>>
>> On Fri, Aug 29, 2014 at 5:48 AM, Somnath Roy <somnath....@sandisk.com> wrote:
>>
>>> Yes, what I saw is that the messenger-level bottleneck is still huge!
>>> Hopefully the RDMA messenger will resolve that, and the performance
>>> gain will be significant for reads (on SSDs). For writes we need to
>>> uncover the OSD bottlenecks first to take advantage of the improved
>>> upstream.
>>>
>>> What I experienced is that until you remove the very last bottleneck,
>>> the performance improvement will not be visible, and that can be
>>> confusing because you might think that the upstream improvement you
>>> made is not valid (which is not the case).
>>>
>>> Thanks & Regards
>>> Somnath
>>>
>>> -----Original Message-----
>>> From: Andrey Korolyov [mailto:and...@xdel.ru]
>>> Sent: Thursday, August 28, 2014 12:57 PM
>>> To: Somnath Roy
>>> Cc: David Moreau Simard; Mark Nelson; ceph-users@lists.ceph.com
>>> Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS
>>>
>>> On Thu, Aug 28, 2014 at 10:48 PM, Somnath Roy <somnath....@sandisk.com> wrote:
>>>
>>>> Nope, this will not be backported to Firefly, I guess.
>>>>
>>>> Thanks & Regards
>>>> Somnath
>>>
>>> Thanks for sharing this; the first thing that came to mind when I
>>> looked at this thread was your patches :)
>>>
>>> If Giant incorporates them, both the RDMA support and those patches
>>> should give a huge performance boost for RDMA-enabled Ceph backends.
>>
>> --
>> Best Regards,
>>
>> Wheat
>
> Cheers.
> ––––
> Sébastien Han
> Cloud Architect

Cheers.
––––
Sébastien Han
Cloud Architect

"Always give 100%. Unless you're giving blood."

Phone: +33 (0)1 49 70 99 72
Mail: sebastien....@enovance.com
Address: 11 bis, rue Roquépine - 75008 Paris
Web: www.enovance.com - Twitter: @enovance

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com