Not a bad idea.. which reminds me, there might be some benefits to toying with 
the MTU settings as well.
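
Just to note it down for later, roughly what I have in mind (interface name
and MTU value are only examples, and the switches would need jumbo frames
enabled end to end too):

# check the current MTU on the 10 Gbps interface (eth2 is a placeholder)
ip link show eth2
# bump it to jumbo frames on every node and on the iSCSI proxy
ip link set eth2 mtu 9000
# and verify that the larger frames actually pass without fragmenting
ping -M do -s 8972 <address-of-another-node>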

I'll check when I have a chance.
--
David Moreau Simard

> On Dec 8, 2014, at 2:13 PM, Nick Fisk <n...@fisk.me.uk> wrote:
> 
> Hi David,
> 
> This is a long shot, but have you checked the max queue depth on the iSCSI 
> side? I've got a feeling that LIO might be set to 32 by default.
> 
> This would definitely have an effect at the high queue depths you are testing 
> with.
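> 
> Something like this from the targetcli shell should show it (the IQN below
> is made up, and the attribute names are worth double checking on your
> version):
> 
> cd /iscsi/iqn.2014-12.com.example:rbd0/tpg1
> get attribute default_cmdsn_depth
> # if it does turn out to be low, try bumping it, e.g.
> set attribute default_cmdsn_depth=128
> # the per-initiator ACLs carry their own cmdsn_depth as well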
> 
> On 8 Dec 2014 16:53, David Moreau Simard <dmsim...@iweb.com> wrote:
>> 
>> Haven't tried other iSCSI implementations (yet).
>> 
>> LIO/targetcli makes it very easy to implement/integrate/wrap/automate around 
>> so I'm really trying to get this right.
>> 
>> A PCI-E SSD cache tier in front of a spindle-backed erasure coded pool, with 
>> 10 Gbps networking across the board, yields results slightly better than or 
>> very similar to two spindles in hardware RAID-0 with writeback caching.
>> With that in mind, the performance is not outright awful by any means, 
>> there's just a lot of overhead we have to be reminded about.
>> 
>> What I'd like to test further, but am unable to right now, is what 
>> happens if you scale up the cluster. Right now I'm testing on only two nodes.
>> Do the IOPS scale linearly with an increasing number of OSDs/servers? Or is 
>> it more about capacity?
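>> 
>> (If anyone with a bigger cluster wants to try, even a quick rados bench run
>> after each added node would be telling - something along these lines,
>> against a throwaway pool, "scratch" being just an example name:)
>> 
>> # small-block writes with some concurrency
>> rados bench -p scratch 60 write -b 4096 -t 32 --no-cleanup
>> # random reads over the objects left behind by the write run
>> rados bench -p scratch 60 rand -t 32
>> # then clean up the benchmark objects
>> rados -p scratch cleanup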
>> 
>> Perhaps someone else can chime in, I'm really curious.
>> --
>> David Moreau Simard
>> 
>>> On Dec 6, 2014, at 11:18 AM, Nick Fisk <n...@fisk.me.uk> wrote:
>>> 
>>> Hi David,
>>> 
>>> Very strange, but I'm glad you managed to finally get the cluster working
>>> normally. Thank you for posting the benchmark figures, it's interesting to
>>> see the overhead of LIO over pure RBD performance.
>>> 
>>> I should have the hardware for our cluster up and running early next year, I
>>> will be in a better position to test the iSCSI performance then. I will
>>> report back once I have some numbers.
>>> 
>>> Just out of interest, have you tried any of the other iSCSI implementations
>>> to see if they show the same performance drop?
>>> 
>>> Nick
>>> 
>>> -----Original Message-----
>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>>> David Moreau Simard
>>> Sent: 05 December 2014 16:03
>>> To: Nick Fisk
>>> Cc: ceph-users@lists.ceph.com
>>> Subject: Re: [ceph-users] Poor RBD performance as LIO iSCSI target
>>> 
>>> I've flushed everything - data, pools, configs - and reconfigured the whole
>>> thing.
>>> 
>>> I was particularly careful with the cache tiering configuration (leaving
>>> defaults where possible) and it's not locking anymore.
>>> It looks like the cache tiering configuration I had was causing the problem?
>>> I can't put my finger on exactly what/why, and I don't have the luxury of
>>> time to do this lengthy testing again.
>>> 
>>> Here's what I dumped as far as config goes before wiping:
>>> ========
>>> # for var in size min_size pg_num pgp_num crush_ruleset
>>> erasure_code_profile; do ceph osd pool get volumes $var; done
>>> size: 5
>>> min_size: 2
>>> pg_num: 7200
>>> pgp_num: 7200
>>> crush_ruleset: 1
>>> erasure_code_profile: ecvolumes
>>> 
>>> # for var in size min_size pg_num pgp_num crush_ruleset hit_set_type
>>> hit_set_period hit_set_count target_max_objects target_max_bytes
>>> cache_target_dirty_ratio cache_target_full_ratio cache_min_flush_age
>>> cache_min_evict_age; do ceph osd pool get volumecache $var; done
>>> size: 2
>>> min_size: 1
>>> pg_num: 7200
>>> pgp_num: 7200
>>> crush_ruleset: 4
>>> hit_set_type: bloom
>>> hit_set_period: 3600
>>> hit_set_count: 1
>>> target_max_objects: 0
>>> target_max_bytes: 100000000000
>>> cache_target_dirty_ratio: 0.5
>>> cache_target_full_ratio: 0.8
>>> cache_min_flush_age: 600
>>> cache_min_evict_age: 1800
>>> 
>>> # ceph osd erasure-code-profile get ecvolumes
>>> directory=/usr/lib/ceph/erasure-code
>>> k=3
>>> m=2
>>> plugin=jerasure
>>> ruleset-failure-domain=osd
>>> technique=reed_sol_van
>>> ========
>>> 
>>> And now:
>>> ========
>>> # for var in size min_size pg_num pgp_num crush_ruleset
>>> erasure_code_profile; do ceph osd pool get volumes $var; done
>>> size: 5
>>> min_size: 3
>>> pg_num: 2048
>>> pgp_num: 2048
>>> crush_ruleset: 1
>>> erasure_code_profile: ecvolumes
>>> 
>>> # for var in size min_size pg_num pgp_num crush_ruleset hit_set_type
>>> hit_set_period hit_set_count target_max_objects target_max_bytes
>>> cache_target_dirty_ratio cache_target_full_ratio cache_min_flush_age
>>> cache_min_evict_age; do ceph osd pool get volumecache $var; done
>>> size: 2
>>> min_size: 1
>>> pg_num: 2048
>>> pgp_num: 2048
>>> crush_ruleset: 4
>>> hit_set_type: bloom
>>> hit_set_period: 3600
>>> hit_set_count: 1
>>> target_max_objects: 0
>>> target_max_bytes: 150000000000
>>> cache_target_dirty_ratio: 0.5
>>> cache_target_full_ratio: 0.8
>>> cache_min_flush_age: 0
>>> cache_min_evict_age: 1800
>>> 
>>> # ceph osd erasure-code-profile get ecvolumes
>>> directory=/usr/lib/ceph/erasure-code
>>> k=3
>>> m=2
>>> plugin=jerasure
>>> ruleset-failure-domain=osd
>>> technique=reed_sol_van
>>> ========
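>>> 
>>> (For reference, the cache pool knobs above were just set with the usual
>>> pool commands - pool name and values exactly as in the dump:)
>>> ========
>>> ceph osd pool set volumecache hit_set_type bloom
>>> ceph osd pool set volumecache hit_set_period 3600
>>> ceph osd pool set volumecache hit_set_count 1
>>> ceph osd pool set volumecache target_max_bytes 150000000000
>>> ceph osd pool set volumecache cache_target_dirty_ratio 0.5
>>> ceph osd pool set volumecache cache_target_full_ratio 0.8
>>> ceph osd pool set volumecache cache_min_flush_age 0
>>> ceph osd pool set volumecache cache_min_evict_age 1800
>>> ========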
>>> 
>>> Crush map hasn't really changed before and after.
>>> 
>>> FWIW, the benchmarks I pulled out of the setup:
>>> https://gist.github.com/dmsimard/2737832d077cfc5eff34
>>> Definite overhead going from krbd to krbd + LIO...
>>> --
>>> David Moreau Simard
>>> 
>>> 
>>>> On Nov 20, 2014, at 4:14 PM, Nick Fisk <n...@fisk.me.uk> wrote:
>>>> 
>>>> Here you go:-
>>>> 
>>>> Erasure Profile
>>>> k=2
>>>> m=1
>>>> plugin=jerasure
>>>> ruleset-failure-domain=osd
>>>> ruleset-root=hdd
>>>> technique=reed_sol_van
>>>> 
>>>> Cache Settings
>>>> hit_set_type: bloom
>>>> hit_set_period: 3600
>>>> hit_set_count: 1
>>>> target_max_objects: 0
>>>> target_max_bytes: 1000000000
>>>> cache_target_dirty_ratio: 0.4
>>>> cache_target_full_ratio: 0.8
>>>> cache_min_flush_age: 0
>>>> cache_min_evict_age: 0
>>>> 
>>>> Crush Dump
>>>> # begin crush map
>>>> tunable choose_local_tries 0
>>>> tunable choose_local_fallback_tries 0
>>>> tunable choose_total_tries 50
>>>> tunable chooseleaf_descend_once 1
>>>> 
>>>> # devices
>>>> device 0 osd.0
>>>> device 1 osd.1
>>>> device 2 osd.2
>>>> device 3 osd.3
>>>> 
>>>> # types
>>>> type 0 osd
>>>> type 1 host
>>>> type 2 chassis
>>>> type 3 rack
>>>> type 4 row
>>>> type 5 pdu
>>>> type 6 pod
>>>> type 7 room
>>>> type 8 datacenter
>>>> type 9 region
>>>> type 10 root
>>>> 
>>>> # buckets
>>>> host ceph-test-hdd {
>>>>      id -5           # do not change unnecessarily
>>>>      # weight 2.730
>>>>      alg straw
>>>>      hash 0  # rjenkins1
>>>>      item osd.1 weight 0.910
>>>>      item osd.2 weight 0.910
>>>>      item osd.0 weight 0.910
>>>> }
>>>> root hdd {
>>>>      id -3           # do not change unnecessarily
>>>>      # weight 2.730
>>>>      alg straw
>>>>      hash 0  # rjenkins1
>>>>      item ceph-test-hdd weight 2.730
>>>> }
>>>> host ceph-test-ssd {
>>>>      id -6           # do not change unnecessarily
>>>>      # weight 1.000
>>>>      alg straw
>>>>      hash 0  # rjenkins1
>>>>      item osd.3 weight 1.000
>>>> }
>>>> root ssd {
>>>>      id -4           # do not change unnecessarily
>>>>      # weight 1.000
>>>>      alg straw
>>>>      hash 0  # rjenkins1
>>>>      item ceph-test-ssd weight 1.000
>>>> }
>>>> 
>>>> # rules
>>>> rule hdd {
>>>>      ruleset 0
>>>>      type replicated
>>>>      min_size 0
>>>>      max_size 10
>>>>      step take hdd
>>>>      step chooseleaf firstn 0 type osd
>>>>      step emit
>>>> }
>>>> rule ssd {
>>>>      ruleset 1
>>>>      type replicated
>>>>      min_size 0
>>>>      max_size 4
>>>>      step take ssd
>>>>      step chooseleaf firstn 0 type osd
>>>>      step emit
>>>> }
>>>> rule ecpool {
>>>>      ruleset 2
>>>>      type erasure
>>>>      min_size 3
>>>>      max_size 20
>>>>      step set_chooseleaf_tries 5
>>>>      step take hdd
>>>>      step chooseleaf indep 0 type osd
>>>>      step emit
>>>> }
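>>>> 
>>>> (If you want to compare or tweak yours, that dump is just the standard
>>>> decompile/recompile round trip - file names are arbitrary:)
>>>> 
>>>> ceph osd getcrushmap -o crush.bin
>>>> crushtool -d crush.bin -o crush.txt
>>>> # edit crush.txt, then recompile and inject it back
>>>> crushtool -c crush.txt -o crush.new
>>>> ceph osd setcrushmap -i crush.new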
>>>> 
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
>>>> Of David Moreau Simard
>>>> Sent: 20 November 2014 20:03
>>>> To: Nick Fisk
>>>> Cc: ceph-users@lists.ceph.com
>>>> Subject: Re: [ceph-users] Poor RBD performance as LIO iSCSI target
>>>> 
>>>> Nick,
>>>> 
>>>> Can you share more details on the configuration you are using? I'll
>>>> try to duplicate those configurations in my environment and see what
>>>> happens.
>>>> I'm mostly interested in:
>>>> - Erasure code profile (k, m, plugin, ruleset-failure-domain)
>>>> - Cache tiering pool configuration (ex: hit_set_type, hit_set_period,
>>>> hit_set_count, target_max_objects, target_max_bytes,
>>>> cache_target_dirty_ratio, cache_target_full_ratio,
>>>> cache_min_flush_age,
>>>> cache_min_evict_age)
>>>> 
>>>> The crush rulesets would also be helpful.
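>>>> 
>>>> (The raw output of something like the following would do fine, if that's
>>>> easier than writing it up:)
>>>> 
>>>> ceph osd crush rule dump
>>>> ceph osd erasure-code-profile ls
>>>> ceph osd erasure-code-profile get <profile-name>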
>>>> 
>>>> Thanks,
>>>> --
>>>> David Moreau Simard
>>>> 
>>>>> On Nov 20, 2014, at 12:43 PM, Nick Fisk <n...@fisk.me.uk> wrote:
>>>>> 
>>>>> Hi David,
>>>>> 
>>>>> I've just finished running the 75GB fio test you posted a few days
>>>>> back on my new test cluster.
>>>>> 
>>>>> The cluster is as follows:-
>>>>> 
>>>>> Single server with 3x hdd and 1 ssd
>>>>> Ubuntu 14.04 with 3.16.7 kernel
>>>>> 2+1 EC pool on HDDs below a 10G SSD cache pool. The SSD is also
>>>>> partitioned to provide journals for the HDDs.
>>>>> 150G RBD mapped locally
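>>>>> 
>>>>> (Roughly how I put it together, from memory, so treat pool names and PG
>>>>> counts as placeholders:)
>>>>> 
>>>>> ceph osd erasure-code-profile set ec21 k=2 m=1 ruleset-failure-domain=osd ruleset-root=hdd
>>>>> ceph osd pool create ecpool 128 128 erasure ec21
>>>>> ceph osd pool create cache 128 128 replicated
>>>>> ceph osd pool set cache crush_ruleset 1
>>>>> ceph osd tier add ecpool cache
>>>>> ceph osd tier cache-mode cache writeback
>>>>> ceph osd tier set-overlay ecpool cache
>>>>> rbd create -p ecpool test --size 153600
>>>>> rbd map ecpool/test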
>>>>> 
>>>>> The fio test seemed to run without any problems. I want to run a few
>>>>> more tests with different settings to see if I can reproduce your
>>>>> problem. I will let you know if I find anything.
>>>>> 
>>>>> If there is anything you would like me to try, please let me know.
>>>>> 
>>>>> Nick
>>>>> 
>>>>> -----Original Message-----
>>>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
>>>>> Of David Moreau Simard
>>>>> Sent: 19 November 2014 10:48
>>>>> To: Ramakrishna Nishtala (rnishtal)
>>>>> Cc: ceph-users@lists.ceph.com; Nick Fisk
>>>>> Subject: Re: [ceph-users] Poor RBD performance as LIO iSCSI target
>>>>> 
>>>>> Rama,
>>>>> 
>>>>> Thanks for your reply.
>>>>> 
>>>>> My end goal is to use iSCSI (with LIO/targetcli) to export rbd block
>>>>> devices.
>>>>> 
>>>>> I was encountering issues with iSCSI which are explained in my
>>>>> previous emails.
>>>>> I ended up being able to reproduce the problem at will on various
>>>>> kernel and OS combinations, even on raw RBD devices - thus ruling out
>>>>> the hypothesis that the problem was with iSCSI and pointing to Ceph instead.
>>>>> I'm even running 0.88 now and the issue is still there.
>>>>> 
>>>>> I haven't isolated the issue just yet.
>>>>> My next tests involve disabling the cache tiering.
>>>>> 
>>>>> I do have client krbd cache as well; I'll try to disable it too if
>>>>> disabling cache tiering isn't enough.
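>>>>> 
>>>>> (The plan for that is roughly the standard cache tier teardown, with my
>>>>> pool names - flush the cache, drop the overlay, then remove the tier:)
>>>>> 
>>>>> ceph osd tier cache-mode volumecache forward
>>>>> rados -p volumecache cache-flush-evict-all
>>>>> ceph osd tier remove-overlay volumes
>>>>> ceph osd tier remove volumes volumecache
>>>>> # and on the client side, rbd cache = false under [client] in ceph.conf
>>>>> # (that's the librbd knob; I'll double check what applies to krbd)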
>>>>> --
>>>>> David Moreau Simard
>>>>> 
>>>>> 
>>>>>> On Nov 18, 2014, at 8:10 PM, Ramakrishna Nishtala (rnishtal)
>>>>> <rnish...@cisco.com> wrote:
>>>>>> 
>>>>>> Hi Dave
>>>>>> Did you say iSCSI only? The tracker issue does not say, though.
>>>>>> I am on Giant, with both the client and Ceph on RHEL 7, and it seems to
>>>>>> work OK, unless I am missing something here. RBD on bare metal with
>>>>>> kmod-rbd and caching disabled.
>>>>>> 
>>>>>> [root@compute4 ~]# time fio --name=writefile --size=100G
>>>>>> --filesize=100G --filename=/dev/rbd0 --bs=1M --nrfiles=1 --direct=1
>>>>>> --sync=0 --randrepeat=0 --rw=write --refill_buffers --end_fsync=1
>>>>>> --iodepth=200 --ioengine=libaio
>>>>>> writefile: (g=0): rw=write, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio,
>>>>>> iodepth=200
>>>>>> fio-2.1.11
>>>>>> Starting 1 process
>>>>>> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/853.0MB/0KB /s] [0/853/0
>>>>>> iops] [eta 00m:00s] ...
>>>>>> Disk stats (read/write):
>>>>>> rbd0: ios=184/204800, merge=0/0, ticks=70/16164931,
>>>>>> in_queue=16164942, util=99.98%
>>>>>> 
>>>>>> real    1m56.175s
>>>>>> user    0m18.115s
>>>>>> sys     0m10.430s
>>>>>> 
>>>>>> Regards,
>>>>>> 
>>>>>> Rama
>>>>>> 
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
>>>>>> Behalf Of David Moreau Simard
>>>>>> Sent: Tuesday, November 18, 2014 3:49 PM
>>>>>> To: Nick Fisk
>>>>>> Cc: ceph-users@lists.ceph.com
>>>>>> Subject: Re: [ceph-users] Poor RBD performance as LIO iSCSI target
>>>>>> 
>>>>>> Testing without the cache tiering is the next test I want to do when I
>>>>>> have time.
>>>>>> 
>>>>>> When it's hanging, there is no activity at all on the cluster.
>>>>>> Nothing in "ceph -w", nothing in "ceph osd pool stats".
>>>>>> 
>>>>>> I'll provide an update when I have a chance to test without tiering.
>>>>>> --
>>>>>> David Moreau Simard
>>>>>> 
>>>>>> 
>>>>>>> On Nov 18, 2014, at 3:28 PM, Nick Fisk <n...@fisk.me.uk> wrote:
>>>>>>> 
>>>>>>> Hi David,
>>>>>>> 
>>>>>>> Have you tried on a normal replicated pool with no cache? I've seen a
>>>>>>> number of threads recently where caching is causing various things to
>>>>>>> block/hang.
>>>>>>> It would be interesting to see if this still happens without the
>>>>>>> caching layer, at least it would rule it out.
>>>>>>> 
>>>>>>> Also, is there any sign that, as the test passes ~50GB, the cache
>>>>>>> might start flushing to the backing pool and causing slow performance?
>>>>>>> 
>>>>>>> I am planning a deployment very similar to yours so I am following
>>>>>>> this with great interest. I'm hoping to build a single node test
>>>>>>> "cluster" shortly, so I might be in a position to work with you on
>>>>>>> this issue and hopefully get it resolved.
>>>>>>> 
>>>>>>> Nick
>>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
>>>>>>> Behalf Of David Moreau Simard
>>>>>>> Sent: 18 November 2014 19:58
>>>>>>> To: Mike Christie
>>>>>>> Cc: ceph-users@lists.ceph.com; Christopher Spearman
>>>>>>> Subject: Re: [ceph-users] Poor RBD performance as LIO iSCSI target
>>>>>>> 
>>>>>>> Thanks guys. I looked at http://tracker.ceph.com/issues/8818 and
>>>>>>> chatted with "dis" on #ceph-devel.
>>>>>>> 
>>>>>>> I ran a LOT of tests on a LOT of combinations of kernels (sometimes
>>>>>>> with tunables legacy). I haven't found a magical combination in
>>>>>>> which the following test does not hang:
>>>>>>> fio --name=writefile --size=100G --filesize=100G
>>>>>>> --filename=/dev/rbd0 --bs=1M --nrfiles=1 --direct=1 --sync=0
>>>>>>> --randrepeat=0 --rw=write --refill_buffers --end_fsync=1
>>>>>>> --iodepth=200 --ioengine=libaio
>>>>>>> 
>>>>>>> Either directly on a mapped RBD device, on a mounted filesystem (over
>>>>>>> RBD), or exported through iSCSI.. nothing.
>>>>>>> I guess that rules out a potential issue with iSCSI overhead.
>>>>>>> 
>>>>>>> Now, something I noticed out of pure luck is that I am unable to
>>>>>>> reproduce the issue if I drop the size of the test to 50GB. Tests
>>>>>>> will complete in under 2 minutes.
>>>>>>> 75GB will hang right at the end and take more than 10 minutes.
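>>>>>>> 
>>>>>>> (Next time it hangs I'll try to catch whether requests are still in
>>>>>>> flight in the kernel client, with something like this - debugfs paths
>>>>>>> may vary a little by kernel version:)
>>>>>>> 
>>>>>>> # any hung-task warnings from the block layer
>>>>>>> dmesg | grep -i "blocked for more than"
>>>>>>> # outstanding requests held by the kernel RBD/Ceph client (debugfs must be mounted)
>>>>>>> cat /sys/kernel/debug/ceph/*/osdc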
>>>>>>> 
>>>>>>> TL;DR of tests:
>>>>>>> - 3x fio --name=writefile --size=50G --filesize=50G
>>>>>>> --filename=/dev/rbd0 --bs=1M --nrfiles=1 --direct=1 --sync=0
>>>>>>> --randrepeat=0 --rw=write --refill_buffers --end_fsync=1
>>>>>>> --iodepth=200 --ioengine=libaio
>>>>>>> -- 1m44s, 1m49s, 1m40s
>>>>>>> 
>>>>>>> - 3x fio --name=writefile --size=75G --filesize=75G
>>>>>>> --filename=/dev/rbd0 --bs=1M --nrfiles=1 --direct=1 --sync=0
>>>>>>> --randrepeat=0 --rw=write --refill_buffers --end_fsync=1
>>>>>>> --iodepth=200 --ioengine=libaio
>>>>>>> -- 10m12s, 10m11s, 10m13s
>>>>>>> 
>>>>>>> Details of tests here: http://pastebin.com/raw.php?i=3v9wMtYP
>>>>>>> 
>>>>>>> Does that ring a bell for you guys?
>>>>>>> 
>>>>>>> --
>>>>>>> David Moreau Simard
>>>>>>> 
>>>>>>> 
>>>>>>>> On Nov 13, 2014, at 3:31 PM, Mike Christie <mchri...@redhat.com> wrote:
>>>>>>>> 
>>>>>>>> On 11/13/2014 10:17 AM, David Moreau Simard wrote:
>>>>>>>>> Running into weird issues here as well in a test environment. I don't
>>>>>>>>> have a solution either, but perhaps we can find some things in common..
>>>>>>>>> 
>>>>>>>>> Setup in a nutshell:
>>>>>>>>> - Ceph cluster: Ubuntu 14.04, Kernel 3.16.7, Ceph 0.87-1 (OSDs
>>>>>>>>> with separate public/cluster network in 10 Gbps)
>>>>>>>>> - iSCSI Proxy node (targetcli/LIO): Ubuntu 14.04, Kernel 3.16.7,
>>>>>>>>> Ceph
>>>>>>>>> 0.87-1 (10 Gbps)
>>>>>>>>> - Client node: Ubuntu 12.04, Kernel 3.11 (10 Gbps)
>>>>>>>>> 
>>>>>>>>> Relevant cluster config: Writeback cache tiering with NVME PCI-E cards
>>>>>>>>> (2 replicas) in front of an erasure coded pool (k=3,m=2) backed by spindles.
>>>>>>>>> 
>>>>>>>>> I'm following the instructions here:
>>>>>>>>> http://www.hastexo.com/resources/hints-and-kinks/turning-ceph-rbd-images-san-storage-devices
>>>>>>>>> No issues with creating and mapping a 100GB RBD image and then
>>>>>>>>> creating the target.
>>>>>>>>> 
>>>>>>>>> I'm interested in finding out the overhead/performance impact of
>>>>>>>>> re-exporting through iSCSI, so the idea is to run benchmarks.
>>>>>>>>> Here's a fio test I'm trying to run on the client node on the mounted
>>>>>>>>> iSCSI device:
>>>>>>>>> fio --name=writefile --size=100G --filesize=100G
>>>>>>>>> --filename=/dev/sdu --bs=1M --nrfiles=1 --direct=1 --sync=0
>>>>>>>>> --randrepeat=0 --rw=write --refill_buffers --end_fsync=1
>>>>>>>>> --iodepth=200 --ioengine=libaio
>>>>>>>>> 
>>>>>>>>> The benchmark will eventually hang towards the end of the test for
>>>>>>>>> some long seconds before completing.
>>>>>>>>> On the proxy node, the kernel complains about an iSCSI portal login
>>>>>>>>> timeout: http://pastebin.com/Q49UnTPr and I also see irqbalance
>>>>>>>>> errors in syslog: http://pastebin.com/AiRTWDwR
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> You are hitting a different issue. German Anders is most likely
>>>>>>>> correct and you hit the rbd hang. That then caused the iSCSI/SCSI
>>>>>>>> command to time out, which caused the SCSI error handler to run. In
>>>>>>>> your logs we see the LIO error handler received a task abort
>>>>>>>> from the initiator, and that timed out, which caused the escalation
>>>>>>>> (the iSCSI portal login related messages).
>>>>>>> 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
