>> For the Crucial, I'll try to apply the patch from Stefan Priebe to ignore 
>> flushes (as the Crucial m550 has supercaps) 
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-November/035707.html 
Here are the results with cache flushes disabled:

crucial m550
------------
#fio --filename=/dev/sdb --direct=1 --rw=write --bs=4k --numjobs=2 
--group_reporting --invalidate=0 --name=ab --sync=1
bw=177575KB/s, iops=44393 


----- Original Message ----- 

From: "Alexandre DERUMIER" <aderum...@odiso.com> 
To: "Cedric Lemarchand" <ced...@yipikai.org> 
Cc: ceph-users@lists.ceph.com 
Sent: Friday, September 12, 2014 04:55:21 
Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3,2K IOPS 

Hi, 
it seems that the Intel S3500 performs a lot better with O_DSYNC: 

crucial m550 
------------ 
#fio --filename=/dev/sdb --direct=1 --rw=write --bs=4k --numjobs=2 
--group_reporting --invalidate=0 --name=ab --sync=1 
bw=1249.9KB/s, iops=312 

intel s3500 
----------- 
#fio --filename=/dev/sdb --direct=1 --rw=write --bs=4k --numjobs=2 
--group_reporting --invalidate=0 --name=ab --sync=1 
bw=41794KB/s, iops=10448 

ok, so 30x faster. 



For the Crucial, I have tried to apply the patch from Stefan Priebe to ignore 
flushes (as the Crucial m550 has supercaps): 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-November/035707.html 
Coming from ZFS, this sounds like "zfs_nocacheflush".
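
(For reference, a similar effect can apparently be had at the block layer on recent 
kernels, without patching ceph, by telling the kernel the device has no volatile 
cache so it stops sending FLUSH/FUA; kernel support and the device name are 
assumptions here, I haven't tested this path:)

# check whether the drive advertises a volatile write cache
hdparm -W /dev/sdb
# on kernels that expose it, mark the device as write-through so flushes are skipped
echo "write through" > /sys/block/sdb/queue/write_cache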

Now the results:

crucial m550 
------------ 
#fio --filename=/dev/sdb --direct=1 --rw=write --bs=4k --numjobs=2 
--group_reporting --invalidate=0 --name=ab --sync=1 
bw=177575KB/s, iops=44393  



fio rbd, Crucial m550, 1 OSD, Ceph 0.85 (osd_enable_op_tracker true or false, same 
result):
---------------------------
bw=12327KB/s, iops=3081

So not much better than before, but this time iostat shows only 15% util, and 
latencies are lower: 

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0,00    29,00    0,00 3075,00     0,00 36748,50    23,90     0,29    0,10    0,00    0,10   0,05  15,20
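
(For what it's worth, %util is just the share of wall-clock time the device had at 
least one request in flight, taken from the io_ticks counter in /proc/diskstats, so 
an SSD that completes requests in parallel can be far from saturated at 100%. A 
quick sketch to compute it by hand over a 1 s window, device name as above:)

DEV=sdb
t1=$(awk -v d="$DEV" '$3==d {print $13}' /proc/diskstats)   # io_ticks, in ms
sleep 1
t2=$(awk -v d="$DEV" '$3==d {print $13}' /proc/diskstats)
echo "util: $(( (t2 - t1) / 10 ))%"                         # delta ms over 1000 ms elapsed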


So the write bottleneck seems to be in Ceph.



I will send the S3500 results today.

----- Original Message ----- 

From: "Cedric Lemarchand" <ced...@yipikai.org> 
To: ceph-users@lists.ceph.com 
Sent: Thursday, September 11, 2014 21:23:23 
Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3,2K IOPS 


On 11/09/2014 19:33, Cedric Lemarchand wrote: 
> On 11/09/2014 08:20, Alexandre DERUMIER wrote: 
>> Hi Sebastien, 
>> 
>> here are my first results with the Crucial m550 (I'll send results with the 
>> Intel S3500 later): 
>> 
>> - 3 nodes 
>> - dell r620 without expander backplane 
>> - sas controller : lsi LSI 9207 (no hardware raid or cache) 
>> - 2 x E5-2603v2 1.8GHz (4cores) 
>> - 32GB ram 
>> - network : 2xgigabit link lacp + 2xgigabit lacp for cluster replication. 
>> 
>> -os : debian wheezy, with kernel 3.10 
>> 
>> os + ceph mon : 2x intel s3500 100gb linux soft raid 
>> osd : crucial m550 (1TB). 
>> 
>> 
>> 3 mons in the ceph cluster, 
>> and 1 OSD (journal and data on the same disk) 
>> 
>> 
>> ceph.conf 
>> --------- 
>> debug_lockdep = 0/0 
>> debug_context = 0/0 
>> debug_crush = 0/0 
>> debug_buffer = 0/0 
>> debug_timer = 0/0 
>> debug_filer = 0/0 
>> debug_objecter = 0/0 
>> debug_rados = 0/0 
>> debug_rbd = 0/0 
>> debug_journaler = 0/0 
>> debug_objectcacher = 0/0 
>> debug_client = 0/0 
>> debug_osd = 0/0 
>> debug_optracker = 0/0 
>> debug_objclass = 0/0 
>> debug_filestore = 0/0 
>> debug_journal = 0/0 
>> debug_ms = 0/0 
>> debug_monc = 0/0 
>> debug_tp = 0/0 
>> debug_auth = 0/0 
>> debug_finisher = 0/0 
>> debug_heartbeatmap = 0/0 
>> debug_perfcounter = 0/0 
>> debug_asok = 0/0 
>> debug_throttle = 0/0 
>> debug_mon = 0/0 
>> debug_paxos = 0/0 
>> debug_rgw = 0/0 
>> osd_op_threads = 5 
>> filestore_op_threads = 4 
>> 
>> ms_nocrc = true 
>> cephx sign messages = false 
>> cephx require signatures = false 
>> 
>> ms_dispatch_throttle_bytes = 0 
>> 
>> #0.85 
>> throttler_perf_counter = false 
>> filestore_fd_cache_size = 64 
>> filestore_fd_cache_shards = 32 
>> osd_op_num_threads_per_shard = 1 
>> osd_op_num_shards = 25 
>> osd_enable_op_tracker = true 
>> 
>> 
>> 
>> Fio disk 4K benchmark 
>> ------------------ 
>> rand read 4k : fio --filename=/dev/sdb --direct=1 --rw=randread --bs=4k 
>> --iodepth=32 --group_reporting --invalidate=0 --name=abc --ioengine=aio 
>> bw=271755KB/s, iops=67938 
>> 
>> rand write 4k : fio --filename=/dev/sdb --direct=1 --rw=randwrite --bs=4k 
>> --iodepth=32 --group_reporting --invalidate=0 --name=abc --ioengine=aio 
>> bw=228293KB/s, iops=57073 
>> 
>> 
>> 
>> fio osd benchmark (through librbd) 
>> ---------------------------------- 
>> [global] 
>> ioengine=rbd 
>> clientname=admin 
>> pool=test 
>> rbdname=test 
>> invalidate=0 # mandatory 
>> rw=randwrite 
>> rw=randread 
>> bs=4k 
>> direct=1 
>> numjobs=4 
>> group_reporting=1 
>> 
>> [rbd_iodepth32] 
>> iodepth=32 
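>> 
>> (For completeness, the pool and image this job file points at can be created with 
>> something like the following; the pg count here is only an example: 
>> # ceph osd pool create test 128 
>> # rbd create test/test --size 10240 ) 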
>> 
>> 
>> 
>> FIREFLY RESULTS 
>> ---------------- 
>> fio randwrite : bw=5009.6KB/s, iops=1252 
>> 
>> fio randread: bw=37820KB/s, iops=9455 
>> 
>> 
>> 
>> 0.85 RESULTS 
>> ------------ 
>> 
>> fio randwrite : bw=11658KB/s, iops=2914 
>> 
>> fio randread : bw=38642KB/s, iops=9660 
>> 
>> 
>> 
>> 0.85 + osd_enable_op_tracker=false 
>> ----------------------------------- 
>> fio randwrite : bw=11630KB/s, iops=2907 
>> fio randread : bw=80606KB/s, iops=20151, (cpu 100% - GREAT !) 
>> 
>> 
>> 
>> So, for reads, it seems that osd_enable_op_tracker is the bottleneck. 
>> 
>> 
>> Now for writes, I really don't understand why they're so low. 
>> 
>> 
>> I have done some iostat: 
>> 
>> 
>> FIO directly on /dev/sdb 
>> bw=228293KB/s, iops=57073 
>> 
>> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util 
>> sdb 0,00 0,00 0,00 63613,00 0,00 254452,00 8,00 31,24 0,49 0,00 0,49 0,02 100,00 
>> 
>> 
>> FIO directly on osd through librbd 
>> bw=11658KB/s, iops=2914 
>> 
>> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util 
>> sdb 0,00 355,00 0,00 5225,00 0,00 29678,00 11,36 57,63 11,03 0,00 11,03 0,19 99,70 
>> 
>> 
>> (I don't understand what exactly %util means; it's 100% in both cases, even 
>> though it's 10x slower with Ceph.) 
> It would be interesting if you could catch the size of the writes on the SSD 
> during the bench through librbd (I know nmon can do that). 
Replying to myself ... I asked a bit too quickly, as we already have this 
information (29678 / 5225 = 5.68 KB), but this is irrelevant. 

Cheers 

>> It could be a dsync problem; the results seem pretty poor: 
>> 
>> # dd if=rand.file of=/dev/sdb bs=4k count=65536 oflag=direct 
>> 65536+0 records in 
>> 65536+0 records out 
>> 268435456 bytes (268 MB) copied, 2.77433 s, 96.8 MB/s 
>> 
>> 
>> # dd if=rand.file of=/dev/sdb bs=4k count=65536 oflag=dsync,direct 
>> ^C17228+0 records in 
>> 17228+0 records out 
>> 70565888 bytes (71 MB) copied, 70.4098 s, 1.0 MB/s 
>> 
>> 
>> 
>> I'll do tests with the Intel S3500 tomorrow to compare. 
>> 
>> ----- Original Message ----- 
>> 
>> From: "Sebastien Han" <sebastien....@enovance.com> 
>> To: "Warren Wang" <warren_w...@cable.comcast.com> 
>> Cc: ceph-users@lists.ceph.com 
>> Sent: Monday, September 8, 2014 22:58:25 
>> Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3,2K IOPS 
>> 
>> They definitely are Warren! 
>> 
>> Thanks for bringing this here :). 
>> 
>> On 05 Sep 2014, at 23:02, Wang, Warren <warren_w...@cable.comcast.com> 
>> wrote: 
>> 
>>> +1 to what Cedric said. 
>>> 
>>> Anything more than a few minutes of heavy sustained writes tended to get 
>>> our solid state devices into a state where garbage collection could not 
>>> keep up. Originally we used small SSDs and did not overprovision the 
>>> journals by much. Manufacturers publish their SSD stats, and then in very 
>>> small font, state that the attained IOPS are with empty drives, and the 
>>> tests are only run for very short amounts of time. Even if the drives are 
>>> new, it's a good idea to perform an hdparm secure erase on them (so that 
>>> the SSD knows that the blocks are truly unused), and then overprovision 
>>> them. You'll know if you have a problem by watching for utilization and 
>>> wait data on the journals. 
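>>> 
>>> For reference, the secure erase itself is a short hdparm dance; a rough sketch, 
>>> with device name and password as placeholders, and only on a drive that is not 
>>> frozen and holds no data you care about: 
>>> # hdparm -I /dev/sdX | grep -A8 Security 
>>> # hdparm --user-master u --security-set-pass p /dev/sdX 
>>> # hdparm --user-master u --security-erase p /dev/sdX 
>>> After that, overprovision by simply leaving a chunk of the drive unpartitioned. 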
>>> 
>>> One of the other interesting performance issues is that the Intel 10GbE 
>>> NICs + the default kernel that we typically use max out around 1 million 
>>> packets/sec. It's worth tracking this metric to see if you are close. 
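>>> A quick way to watch that is sar from sysstat (interface name is a placeholder): 
>>> # sar -n DEV 1 | grep -E 'IFACE|eth0' 
>>> and compare rxpck/s + txpck/s against that ~1M packets/sec ceiling. 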
>>> 
>>> I know these aren't necessarily relevant to the test parameters you gave 
>>> below, but they're worth keeping in mind. 
>>> 
>>> -- 
>>> Warren Wang 
>>> Comcast Cloud (OpenStack) 
>>> 
>>> 
>>> From: Cedric Lemarchand <ced...@yipikai.org> 
>>> Date: Wednesday, September 3, 2014 at 5:14 PM 
>>> To: "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com> 
>>> Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3,2K IOPS 
>>> 
>>> 
>>> On 03/09/2014 22:11, Sebastien Han wrote: 
>>>> Hi Warren, 
>>>> 
>>>> What do you mean exactly by secure erase? At the firmware level with the 
>>>> manufacturer's tools? 
>>>> The SSDs were pretty new, so I don't think we hit that sort of thing. I believe 
>>>> only aged SSDs show this behaviour, but I might be wrong. 
>>>> 
>>> Sorry, I forgot to reply to the real question ;-) 
>>> So yes, it only matters after some time; in your case, if the SSD still 
>>> delivers the write IOPS specified by the manufacturer, it won't help in 
>>> any way. 
>>> 
>>> But it seems this practice is increasingly common nowadays. 
>>> 
>>> Cheers 
>>>> On 02 Sep 2014, at 18:23, Wang, Warren <warren_w...@cable.comcast.com> 
>>>> wrote: 
>>>> 
>>>> 
>>>>> Hi Sebastien, 
>>>>> 
>>>>> Something I didn't see in the thread so far, did you secure erase the 
>>>>> SSDs before they got used? I assume these were probably repurposed for 
>>>>> this test. We have seen some pretty significant garbage collection issues 
>>>>> on various SSDs and other forms of solid state storage, to the point where 
>>>>> we are overprovisioning pretty much every solid state device now. By as 
>>>>> much as 50% to handle sustained write operations. Especially important 
>>>>> for the journals, as we've found. 
>>>>> 
>>>>> Maybe not an issue on the short fio run below, but certainly evident on 
>>>>> longer runs or lots of historical data on the drives. The max transaction 
>>>>> time looks pretty good for your test. Something to consider though. 
>>>>> 
>>>>> Warren 
>>>>> 
>>>>> -----Original Message----- 
>>>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Sebastien Han 
>>>>> Sent: Thursday, August 28, 2014 12:12 PM 
>>>>> To: ceph-users 
>>>>> Cc: Mark Nelson 
>>>>> Subject: [ceph-users] [Single OSD performance on SSD] Can't go over 3,2K IOPS 
>>>>> 
>>>>> Hey all, 
>>>>> 
>>>>> It has been a while since the last performance-related thread on the ML :p 
>>>>> I've been running some experiments to see how much I can get from an SSD 
>>>>> on a Ceph cluster. 
>>>>> To achieve that I did something pretty simple: 
>>>>> 
>>>>> * Debian wheezy 7.6 
>>>>> * kernel from debian 3.14-0.bpo.2-amd64 
>>>>> * 1 cluster, 3 mons (I'd like to keep this realistic since in a real 
>>>>> deployment I'll use 3) 
>>>>> * 1 OSD backed by an SSD (journal and osd data on the same device) 
>>>>> * replica count of 1 
>>>>> * partitions are perfectly aligned 
>>>>> * io scheduler is set to noop (see the sketch below), but deadline was showing 
>>>>> the same results 
>>>>> * no updatedb running 
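>>>>> 
>>>>> (The scheduler bit as a quick sketch; the device name is taken from the dd test 
>>>>> further down: 
>>>>> # cat /sys/block/sdo/queue/scheduler 
>>>>> # echo noop > /sys/block/sdo/queue/scheduler ) 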
>>>>> 
>>>>> About the box: 
>>>>> 
>>>>> * 32GB of RAM 
>>>>> * 12 cores with HT @ 2,4 GHz 
>>>>> * WB cache is enabled on the controller 
>>>>> * 10Gbps network (doesn't help here) 
>>>>> 
>>>>> The SSD is a 200G Intel DC S3700 and is capable of delivering around 29K 
>>>>> IOPS with random 4k writes (my fio results). As a benchmark tool I used 
>>>>> fio with the rbd engine (thanks Deutsche Telekom guys!). 
>>>>> 
>>>>> O_DIRECT and O_DSYNC don't seem to be a problem for the SSD: 
>>>>> 
>>>>> # dd if=/dev/urandom of=rand.file bs=4k count=65536 
>>>>> 65536+0 records in 
>>>>> 65536+0 records out 
>>>>> 268435456 bytes (268 MB) copied, 29.5477 s, 9.1 MB/s 
>>>>> 
>>>>> # du -sh rand.file 
>>>>> 256M rand.file 
>>>>> 
>>>>> # dd if=rand.file of=/dev/sdo bs=4k count=65536 oflag=dsync,direct 
>>>>> 65536+0 records in 
>>>>> 65536+0 records out 
>>>>> 268435456 bytes (268 MB) copied, 2.73628 s, 98.1 MB/s 
>>>>> 
>>>>> See my ceph.conf: 
>>>>> 
>>>>> [global] 
>>>>> auth cluster required = cephx 
>>>>> auth service required = cephx 
>>>>> auth client required = cephx 
>>>>> fsid = 857b8609-8c9b-499e-9161-2ea67ba51c97 
>>>>> osd pool default pg num = 4096 
>>>>> osd pool default pgp num = 4096 
>>>>> osd pool default size = 2 
>>>>> osd crush chooseleaf type = 0 
>>>>> 
>>>>> debug lockdep = 0/0 
>>>>> debug context = 0/0 
>>>>> debug crush = 0/0 
>>>>> debug buffer = 0/0 
>>>>> debug timer = 0/0 
>>>>> debug journaler = 0/0 
>>>>> debug osd = 0/0 
>>>>> debug optracker = 0/0 
>>>>> debug objclass = 0/0 
>>>>> debug filestore = 0/0 
>>>>> debug journal = 0/0 
>>>>> debug ms = 0/0 
>>>>> debug monc = 0/0 
>>>>> debug tp = 0/0 
>>>>> debug auth = 0/0 
>>>>> debug finisher = 0/0 
>>>>> debug heartbeatmap = 0/0 
>>>>> debug perfcounter = 0/0 
>>>>> debug asok = 0/0 
>>>>> debug throttle = 0/0 
>>>>> 
>>>>> [mon] 
>>>>> mon osd down out interval = 600 
>>>>> mon osd min down reporters = 13 
>>>>> [mon.ceph-01] 
>>>>> host = ceph-01 
>>>>> mon addr = 172.20.20.171 
>>>>> [mon.ceph-02] 
>>>>> host = ceph-02 
>>>>> mon addr = 172.20.20.172 
>>>>> [mon.ceph-03] 
>>>>> host = ceph-03 
>>>>> mon addr = 172.20.20.173 
>>>>> 
>>>>> debug lockdep = 0/0 
>>>>> debug context = 0/0 
>>>>> debug crush = 0/0 
>>>>> debug buffer = 0/0 
>>>>> debug timer = 0/0 
>>>>> debug journaler = 0/0 
>>>>> debug osd = 0/0 
>>>>> debug optracker = 0/0 
>>>>> debug objclass = 0/0 
>>>>> debug filestore = 0/0 
>>>>> debug journal = 0/0 
>>>>> debug ms = 0/0 
>>>>> debug monc = 0/0 
>>>>> debug tp = 0/0 
>>>>> debug auth = 0/0 
>>>>> debug finisher = 0/0 
>>>>> debug heartbeatmap = 0/0 
>>>>> debug perfcounter = 0/0 
>>>>> debug asok = 0/0 
>>>>> debug throttle = 0/0 
>>>>> 
>>>>> [osd] 
>>>>> osd mkfs type = xfs 
>>>>> osd mkfs options xfs = -f -i size=2048 
>>>>> osd mount options xfs = rw,noatime,logbsize=256k,delaylog 
>>>>> osd journal size = 20480 
>>>>> cluster_network = 172.20.20.0/24 
>>>>> public_network = 172.20.20.0/24 
>>>>> osd mon heartbeat interval = 30 
>>>>> # Performance tuning 
>>>>> filestore merge threshold = 40 
>>>>> filestore split multiple = 8 
>>>>> osd op threads = 8 
>>>>> # Recovery tuning 
>>>>> osd recovery max active = 1 
>>>>> osd max backfills = 1 
>>>>> osd recovery op priority = 1 
>>>>> 
>>>>> 
>>>>> debug lockdep = 0/0 
>>>>> debug context = 0/0 
>>>>> debug crush = 0/0 
>>>>> debug buffer = 0/0 
>>>>> debug timer = 0/0 
>>>>> debug journaler = 0/0 
>>>>> debug osd = 0/0 
>>>>> debug optracker = 0/0 
>>>>> debug objclass = 0/0 
>>>>> debug filestore = 0/0 
>>>>> debug journal = 0/0 
>>>>> debug ms = 0/0 
>>>>> debug monc = 0/0 
>>>>> debug tp = 0/0 
>>>>> debug auth = 0/0 
>>>>> debug finisher = 0/0 
>>>>> debug heartbeatmap = 0/0 
>>>>> debug perfcounter = 0/0 
>>>>> debug asok = 0/0 
>>>>> debug throttle = 0/0 
>>>>> 
>>>>> Disabling all debugging gained me 200-300 more IOPS. 
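>>>>> 
>>>>> (These can also be injected at runtime to compare with/without a restart, e.g.: 
>>>>> # ceph tell osd.* injectargs '--debug_osd 0/0 --debug_ms 0/0 --debug_filestore 0/0' 
>>>>> same keys as in the config above.) 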
>>>>> 
>>>>> See my fio template: 
>>>>> 
>>>>> [global] 
>>>>> #logging 
>>>>> #write_iops_log=write_iops_log 
>>>>> #write_bw_log=write_bw_log 
>>>>> #write_lat_log=write_lat_lo 
>>>>> 
>>>>> time_based 
>>>>> runtime=60 
>>>>> 
>>>>> ioengine=rbd 
>>>>> clientname=admin 
>>>>> pool=test 
>>>>> rbdname=fio 
>>>>> invalidate=0 # mandatory 
>>>>> #rw=randwrite 
>>>>> rw=write 
>>>>> bs=4k 
>>>>> #bs=32m 
>>>>> size=5G 
>>>>> group_reporting 
>>>>> 
>>>>> [rbd_iodepth32] 
>>>>> iodepth=32 
>>>>> direct=1 
>>>>> 
>>>>> See my fio output: 
>>>>> 
>>>>> rbd_iodepth32: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, 
>>>>> iodepth=32 fio-2.1.11-14-gb74e Starting 1 process rbd engine: RBD 
>>>>> version: 0.1.8 
>>>>> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/12876KB/0KB /s] [0/3219/0 iops] 
>>>>> [eta 00m:00s] 
>>>>> rbd_iodepth32: (groupid=0, jobs=1): err= 0: pid=32116: Thu Aug 28 
>>>>> 00:28:26 2014 
>>>>> write: io=771448KB, bw=12855KB/s, iops=3213, runt= 60010msec 
>>>>> slat (usec): min=42, max=1578, avg=66.50, stdev=16.96 
>>>>> clat (msec): min=1, max=28, avg= 9.85, stdev= 1.48 
>>>>> lat (msec): min=1, max=28, avg= 9.92, stdev= 1.47 
>>>>> clat percentiles (usec): 
>>>>> | 1.00th=[ 6368], 5.00th=[ 8256], 10.00th=[ 8640], 20.00th=[ 9152], 
>>>>> | 30.00th=[ 9408], 40.00th=[ 9664], 50.00th=[ 9792], 60.00th=[10048], 
>>>>> | 70.00th=[10176], 80.00th=[10560], 90.00th=[10944], 95.00th=[11456], 
>>>>> | 99.00th=[13120], 99.50th=[16768], 99.90th=[25984], 99.95th=[27008], 
>>>>> | 99.99th=[28032] 
>>>>> bw (KB /s): min=11864, max=13808, per=100.00%, avg=12864.36, stdev=407.35 
>>>>> lat (msec) : 2=0.03%, 4=0.54%, 10=59.79%, 20=39.24%, 50=0.41% 
>>>>> cpu : usr=19.15%, sys=4.69%, ctx=326309, majf=0, minf=426088 
>>>>> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=33.9%, 32=66.1%, >=64=0.0% 
>>>>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>>>>> complete : 0=0.0%, 4=99.6%, 8=0.4%, 16=0.1%, 32=0.1%, 64=0.0%, >=64=0.0% 
>>>>> issued : total=r=0/w=192862/d=0, short=r=0/w=0/d=0 
>>>>> latency : target=0, window=0, percentile=100.00%, depth=32 
>>>>> 
>>>>> Run status group 0 (all jobs): 
>>>>> WRITE: io=771448KB, aggrb=12855KB/s, minb=12855KB/s, maxb=12855KB/s, 
>>>>> mint=60010msec, maxt=60010msec 
>>>>> 
>>>>> Disk stats (read/write): 
>>>>> dm-1: ios=0/49, merge=0/0, ticks=0/12, in_queue=12, util=0.01%, 
>>>>> aggrios=0/22, aggrmerge=0/27, aggrticks=0/12, aggrin_queue=12, 
>>>>> aggrutil=0.01% 
>>>>> sda: ios=0/22, merge=0/27, ticks=0/12, in_queue=12, util=0.01% 
>>>>> 
>>>>> I tried to tweak several parameters like: 
>>>>> 
>>>>> filestore_wbthrottle_xfs_ios_start_flusher = 10000 
>>>>> filestore_wbthrottle_xfs_ios_hard_limit = 10000 
>>>>> filestore_wbthrottle_btrfs_ios_start_flusher = 10000 
>>>>> filestore_wbthrottle_btrfs_ios_hard_limit = 10000 
>>>>> filestore queue max ops = 2000 
>>>>> 
>>>>> But I didn't see any improvement. 
>>>>> 
>>>>> Then I tried other things: 
>>>>> 
>>>>> * Increasing the io_depth up to 256 or 512 gave me between 50 and 100 more 
>>>>> IOPS, but it's not a realistic workload anymore and not that significant. 
>>>>> * adding another SSD for the journal, still getting 3,2K IOPS 
>>>>> * I tried with rbd bench and I also got 3K IOPS 
>>>>> * I ran the test on a client machine and then locally on the server, 
>>>>> still getting 3,2K IOPS 
>>>>> * put the journal in memory, still getting 3,2K IOPS 
>>>>> * with 2 clients running the test in parallel I got a total of 3,6K IOPS 
>>>>> but I don't seem to be able to go over 
>>>>> * I tried to add another OSD to that SSD, so I had 2 OSDs and 2 journals 
>>>>> on 1 SSD, and got 4,5K IOPS, YAY! (see the partitioning sketch below) 
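>>>>> 
>>>>> (For the 2-OSDs-on-1-SSD case, a sketch of one way to carve the SSD; sizes and 
>>>>> device name are placeholders, partitions 1/2 as journals and 3/4 as data: 
>>>>> # sgdisk -n 1:0:+10G -n 2:0:+10G -n 3:0:+80G -n 4:0:0 /dev/sdo ) 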
>>>>> 
>>>>> Given the results of the last test, it seems that something is limiting 
>>>>> the number of IOPS per OSD process. 
>>>>> 
>>>>> Running the test on a client or locally didn't show any difference. 
>>>>> So it looks to me that there is some contention within Ceph that might 
>>>>> cause this. 
>>>>> 
>>>>> I also ran perf and looked at the output, everything looks decent, but 
>>>>> someone might want to have a look at it :). 
>>>>> 
>>>>> We have been able to reproduce this on 3 distinct platforms with some 
>>>>> deviations (because of the hardware) but the behaviour is the same. 
>>>>> Any thoughts would be highly appreciated; only getting 3,2K out of a 29K 
>>>>> IOPS SSD is a bit frustrating :). 
>>>>> 
>>>>> Cheers. 
>>>>> ---- 
>>>>> Sébastien Han 
>>>>> Cloud Architect 
>>>>> 
>>>>> "Always give 100%. Unless you're giving blood." 
>>>>> 
>>>>> Phone: +33 (0)1 49 70 99 72 
>>>>> Mail: 
>>>>> sebastien....@enovance.com 
>>>>> 
>>>>> Address : 11 bis, rue Roquépine - 75008 Paris Web : 
>>>>> www.enovance.com 
>>>>> - Twitter : @enovance 
>>>>> 
>>>>> 
>>>> Cheers. 
>>>> –––– 
>>>> Sébastien Han 
>>>> Cloud Architect 
>>>> 
>>>> "Always give 100%. Unless you're giving blood." 
>>>> 
>>>> Phone: +33 (0)1 49 70 99 72 
>>>> Mail: 
>>>> sebastien....@enovance.com 
>>>> 
>>>> Address : 11 bis, rue Roquépine - 75008 Paris 
>>>> Web : 
>>>> www.enovance.com 
>>>> - Twitter : @enovance 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________ 
>>>> ceph-users mailing list 
>>>> 
>>>> ceph-us...@lists.ceph.com 
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
>>>>  
>>> -- 
>>> Cédric 
>>> 
>>> _______________________________________________ 
>>> ceph-users mailing list 
>>> ceph-users@lists.ceph.com 
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
>> Cheers. 
>> –––– 
>> Sébastien Han 
>> Cloud Architect 
>> 
>> "Always give 100%. Unless you're giving blood." 
>> 
>> Phone: +33 (0)1 49 70 99 72 
>> Mail: sebastien....@enovance.com 
>> Address : 11 bis, rue Roquépine - 75008 Paris 
>> Web : www.enovance.com - Twitter : @enovance 
>> 
>> 
>> _______________________________________________ 
>> ceph-users mailing list 
>> ceph-users@lists.ceph.com 
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

-- 
Cédric 

_______________________________________________ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
