Not a bad idea... which reminds me, there might be some benefit to toying with the MTU settings as well.
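A quick sanity check for both would look something like this -- the interface name and IQN below are only placeholders for whatever the proxy node actually uses, and if I remember right the LIO knob is the target portal group's default_cmdsn_depth attribute:

# Jumbo frames on the 10 GbE interfaces (the switches need to allow it too)
ip link show dev eth2 | grep -o 'mtu [0-9]*'
ip link set dev eth2 mtu 9000

# Current and bumped queue depth on the LIO target portal group
targetcli /iscsi/iqn.2014-12.com.example:rbd/tpg1 get attribute default_cmdsn_depth
targetcli /iscsi/iqn.2014-12.com.example:rbd/tpg1 set attribute default_cmdsn_depth=128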
I'll check when I have a chance.
--
David Moreau Simard

> On Dec 8, 2014, at 2:13 PM, Nick Fisk <n...@fisk.me.uk> wrote:
>
> Hi David,
>
> This is a long shot, but have you checked the max queue depth on the iSCSI
> side? I've got a feeling that LIO might be set to 32 by default.
>
> This would definitely have an effect at the high queue depths you are
> testing with.
>
> On 8 Dec 2014 16:53, David Moreau Simard <dmsim...@iweb.com> wrote:
>>
>> Haven't tried other iSCSI implementations (yet).
>>
>> LIO/targetcli makes it very easy to implement/integrate/wrap/automate
>> around, so I'm really trying to get this right.
>>
>> A PCI-E SSD cache tier in front of a spindle-backed erasure coded pool,
>> with 10 Gbps across the board, yields results slightly better than or
>> very similar to two spindles in hardware RAID-0 with writeback caching.
>> With that in mind, the performance is not outright awful by any means;
>> there's just a lot of overhead we have to be reminded about.
>>
>> What I'd like to test further but am unable to right now is what happens
>> when you scale up the cluster. Right now I'm testing on only two nodes.
>> Do the IOPS scale linearly with an increasing number of OSDs/servers, or
>> is it more about capacity?
>>
>> Perhaps someone else can chime in - I'm really curious.
>> --
>> David Moreau Simard
>>
>>> On Dec 6, 2014, at 11:18 AM, Nick Fisk <n...@fisk.me.uk> wrote:
>>>
>>> Hi David,
>>>
>>> Very strange, but I'm glad you managed to finally get the cluster
>>> working normally. Thank you for posting the benchmark figures; it's
>>> interesting to see the overhead of LIO over pure RBD performance.
>>>
>>> I should have the hardware for our cluster up and running early next
>>> year, so I will be in a better position to test the iSCSI performance
>>> then. I will report back once I have some numbers.
>>>
>>> Just out of interest, have you tried any of the other iSCSI
>>> implementations to see if they show the same performance drop?
>>>
>>> Nick
>>>
>>> -----Original Message-----
>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>>> David Moreau Simard
>>> Sent: 05 December 2014 16:03
>>> To: Nick Fisk
>>> Cc: ceph-users@lists.ceph.com
>>> Subject: Re: [ceph-users] Poor RBD performance as LIO iSCSI target
>>>
>>> I've flushed everything - data, pools, configs - and reconfigured the
>>> whole thing.
>>>
>>> I was particularly careful with the cache tiering configuration (mostly
>>> leaving defaults where possible) and it's not locking anymore.
>>> It looks like the cache tiering configuration I had was causing the
>>> problem? I can't put my finger on exactly what or why, and I don't have
>>> the luxury of time to do this lengthy testing again.
>>>
>>> Here's what I dumped as far as config goes before wiping:
>>> ========
>>> # for var in size min_size pg_num pgp_num crush_ruleset erasure_code_profile; do ceph osd pool get volumes $var; done
>>> size: 5
>>> min_size: 2
>>> pg_num: 7200
>>> pgp_num: 7200
>>> crush_ruleset: 1
>>> erasure_code_profile: ecvolumes
>>>
>>> # for var in size min_size pg_num pgp_num crush_ruleset hit_set_type hit_set_period hit_set_count target_max_objects target_max_bytes cache_target_dirty_ratio cache_target_full_ratio cache_min_flush_age cache_min_evict_age; do ceph osd pool get volumecache $var; done
>>> size: 2
>>> min_size: 1
>>> pg_num: 7200
>>> pgp_num: 7200
>>> crush_ruleset: 4
>>> hit_set_type: bloom
>>> hit_set_period: 3600
>>> hit_set_count: 1
>>> target_max_objects: 0
>>> target_max_bytes: 100000000000
>>> cache_target_dirty_ratio: 0.5
>>> cache_target_full_ratio: 0.8
>>> cache_min_flush_age: 600
>>> cache_min_evict_age: 1800
>>>
>>> # ceph osd erasure-code-profile get ecvolumes
>>> directory=/usr/lib/ceph/erasure-code
>>> k=3
>>> m=2
>>> plugin=jerasure
>>> ruleset-failure-domain=osd
>>> technique=reed_sol_van
>>> ========
>>>
>>> And now:
>>> ========
>>> # for var in size min_size pg_num pgp_num crush_ruleset erasure_code_profile; do ceph osd pool get volumes $var; done
>>> size: 5
>>> min_size: 3
>>> pg_num: 2048
>>> pgp_num: 2048
>>> crush_ruleset: 1
>>> erasure_code_profile: ecvolumes
>>>
>>> # for var in size min_size pg_num pgp_num crush_ruleset hit_set_type hit_set_period hit_set_count target_max_objects target_max_bytes cache_target_dirty_ratio cache_target_full_ratio cache_min_flush_age cache_min_evict_age; do ceph osd pool get volumecache $var; done
>>> size: 2
>>> min_size: 1
>>> pg_num: 2048
>>> pgp_num: 2048
>>> crush_ruleset: 4
>>> hit_set_type: bloom
>>> hit_set_period: 3600
>>> hit_set_count: 1
>>> target_max_objects: 0
>>> target_max_bytes: 150000000000
>>> cache_target_dirty_ratio: 0.5
>>> cache_target_full_ratio: 0.8
>>> cache_min_flush_age: 0
>>> cache_min_evict_age: 1800
>>>
>>> # ceph osd erasure-code-profile get ecvolumes
>>> directory=/usr/lib/ceph/erasure-code
>>> k=3
>>> m=2
>>> plugin=jerasure
>>> ruleset-failure-domain=osd
>>> technique=reed_sol_van
>>> ========
>>>
>>> Crush map hasn't really changed before and after.
>>>
>>> FWIW, the benchmarks I pulled out of the setup:
>>> https://gist.github.com/dmsimard/2737832d077cfc5eff34
>>> Definite overhead going from krbd to krbd + LIO...
>>> --
>>> David Moreau Simard
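(Noting this inline for the archives: most of the cache tier knobs in the dumps above can be changed in place with "ceph osd pool set" - it's really pg_num that forces a rebuild, since on this version it can only ever be increased, never decreased. Something along these lines reproduces the other differences between the two dumps:)

ceph osd pool set volumes min_size 3
ceph osd pool set volumecache target_max_bytes 150000000000
ceph osd pool set volumecache cache_min_flush_age 0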
>>>
>>>> On Nov 20, 2014, at 4:14 PM, Nick Fisk <n...@fisk.me.uk> wrote:
>>>>
>>>> Here you go:-
>>>>
>>>> Erasure Profile
>>>> k=2
>>>> m=1
>>>> plugin=jerasure
>>>> ruleset-failure-domain=osd
>>>> ruleset-root=hdd
>>>> technique=reed_sol_van
>>>>
>>>> Cache Settings
>>>> hit_set_type: bloom
>>>> hit_set_period: 3600
>>>> hit_set_count: 1
>>>> target_max_objects: 0
>>>> target_max_bytes: 1000000000
>>>> cache_target_dirty_ratio: 0.4
>>>> cache_target_full_ratio: 0.8
>>>> cache_min_flush_age: 0
>>>> cache_min_evict_age: 0
>>>>
>>>> Crush Dump
>>>> # begin crush map
>>>> tunable choose_local_tries 0
>>>> tunable choose_local_fallback_tries 0
>>>> tunable choose_total_tries 50
>>>> tunable chooseleaf_descend_once 1
>>>>
>>>> # devices
>>>> device 0 osd.0
>>>> device 1 osd.1
>>>> device 2 osd.2
>>>> device 3 osd.3
>>>>
>>>> # types
>>>> type 0 osd
>>>> type 1 host
>>>> type 2 chassis
>>>> type 3 rack
>>>> type 4 row
>>>> type 5 pdu
>>>> type 6 pod
>>>> type 7 room
>>>> type 8 datacenter
>>>> type 9 region
>>>> type 10 root
>>>>
>>>> # buckets
>>>> host ceph-test-hdd {
>>>>     id -5        # do not change unnecessarily
>>>>     # weight 2.730
>>>>     alg straw
>>>>     hash 0       # rjenkins1
>>>>     item osd.1 weight 0.910
>>>>     item osd.2 weight 0.910
>>>>     item osd.0 weight 0.910
>>>> }
>>>> root hdd {
>>>>     id -3        # do not change unnecessarily
>>>>     # weight 2.730
>>>>     alg straw
>>>>     hash 0       # rjenkins1
>>>>     item ceph-test-hdd weight 2.730
>>>> }
>>>> host ceph-test-ssd {
>>>>     id -6        # do not change unnecessarily
>>>>     # weight 1.000
>>>>     alg straw
>>>>     hash 0       # rjenkins1
>>>>     item osd.3 weight 1.000
>>>> }
>>>> root ssd {
>>>>     id -4        # do not change unnecessarily
>>>>     # weight 1.000
>>>>     alg straw
>>>>     hash 0       # rjenkins1
>>>>     item ceph-test-ssd weight 1.000
>>>> }
>>>>
>>>> # rules
>>>> rule hdd {
>>>>     ruleset 0
>>>>     type replicated
>>>>     min_size 0
>>>>     max_size 10
>>>>     step take hdd
>>>>     step chooseleaf firstn 0 type osd
>>>>     step emit
>>>> }
>>>> rule ssd {
>>>>     ruleset 1
>>>>     type replicated
>>>>     min_size 0
>>>>     max_size 4
>>>>     step take ssd
>>>>     step chooseleaf firstn 0 type osd
>>>>     step emit
>>>> }
>>>> rule ecpool {
>>>>     ruleset 2
>>>>     type erasure
>>>>     min_size 3
>>>>     max_size 20
>>>>     step set_chooseleaf_tries 5
>>>>     step take hdd
>>>>     step chooseleaf indep 0 type osd
>>>>     step emit
>>>> }
>>>>
>>>> -----Original Message-----
>>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>>>> David Moreau Simard
>>>> Sent: 20 November 2014 20:03
>>>> To: Nick Fisk
>>>> Cc: ceph-users@lists.ceph.com
>>>> Subject: Re: [ceph-users] Poor RBD performance as LIO iSCSI target
>>>>
>>>> Nick,
>>>>
>>>> Can you share more details on the configuration you are using? I'll try
>>>> to duplicate those configurations in my environment and see what happens.
>>>> I'm mostly interested in:
>>>> - Erasure code profile (k, m, plugin, ruleset-failure-domain)
>>>> - Cache tiering pool configuration (ex: hit_set_type, hit_set_period,
>>>>   hit_set_count, target_max_objects, target_max_bytes,
>>>>   cache_target_dirty_ratio, cache_target_full_ratio, cache_min_flush_age,
>>>>   cache_min_evict_age)
>>>>
>>>> The crush rulesets would also be helpful.
>>>>
>>>> Thanks,
>>>> --
>>>> David Moreau Simard
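(For anyone following along who wants to reproduce a setup like Nick's above, the wiring is roughly the following. The pool names and PG counts are made up; only the profile and cache values come from his dump, and the ruleset number is the "ssd" rule from his crush map:)

ceph osd erasure-code-profile set ecprofile k=2 m=1 plugin=jerasure ruleset-failure-domain=osd ruleset-root=hdd
ceph osd pool create ecpool 256 256 erasure ecprofile
ceph osd pool create cachepool 256 256 replicated
ceph osd pool set cachepool crush_ruleset 1
ceph osd tier add ecpool cachepool
ceph osd tier cache-mode cachepool writeback
ceph osd tier set-overlay ecpool cachepool
ceph osd pool set cachepool hit_set_type bloom
ceph osd pool set cachepool hit_set_period 3600
ceph osd pool set cachepool hit_set_count 1
ceph osd pool set cachepool target_max_bytes 1000000000
ceph osd pool set cachepool cache_target_dirty_ratio 0.4
ceph osd pool set cachepool cache_target_full_ratio 0.8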
>>>>> On Nov 20, 2014, at 12:43 PM, Nick Fisk <n...@fisk.me.uk> wrote:
>>>>>
>>>>> Hi David,
>>>>>
>>>>> I've just finished running the 75GB fio test you posted a few days back
>>>>> on my new test cluster.
>>>>>
>>>>> The cluster is as follows:-
>>>>>
>>>>> Single server with 3x hdd and 1 ssd
>>>>> Ubuntu 14.04 with 3.16.7 kernel
>>>>> 2+1 EC pool on hdds below a 10G ssd cache pool. The SSD is also
>>>>> partitioned to provide journals for the hdds.
>>>>> 150G RBD mapped locally
>>>>>
>>>>> The fio test seemed to run without any problems. I want to run a few
>>>>> more tests with different settings to see if I can reproduce your
>>>>> problem. I will let you know if I find anything.
>>>>>
>>>>> If there is anything you would like me to try, please let me know.
>>>>>
>>>>> Nick
>>>>>
>>>>> -----Original Message-----
>>>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>>>>> David Moreau Simard
>>>>> Sent: 19 November 2014 10:48
>>>>> To: Ramakrishna Nishtala (rnishtal)
>>>>> Cc: ceph-users@lists.ceph.com; Nick Fisk
>>>>> Subject: Re: [ceph-users] Poor RBD performance as LIO iSCSI target
>>>>>
>>>>> Rama,
>>>>>
>>>>> Thanks for your reply.
>>>>>
>>>>> My end goal is to use iSCSI (with LIO/targetcli) to export rbd block
>>>>> devices.
>>>>>
>>>>> I was encountering issues with iSCSI which are explained in my previous
>>>>> emails.
>>>>> I ended up being able to reproduce the problem at will on various kernel
>>>>> and OS combinations, even on raw RBD devices - thus ruling out an iSCSI
>>>>> issue and pointing at Ceph itself.
>>>>> I'm even running 0.88 now and the issue is still there.
>>>>>
>>>>> I haven't isolated the issue just yet.
>>>>> My next tests involve disabling the cache tiering.
>>>>>
>>>>> I do have client krbd cache as well; I'll try to disable it too if
>>>>> disabling cache tiering isn't enough.
>>>>> --
>>>>> David Moreau Simard
>>>>>
>>>>>> On Nov 18, 2014, at 8:10 PM, Ramakrishna Nishtala (rnishtal)
>>>>>> <rnish...@cisco.com> wrote:
>>>>>>
>>>>>> Hi Dave,
>>>>>>
>>>>>> Did you say iSCSI only? The tracker issue does not say so, though.
>>>>>> I am on giant, with both the client and ceph on RHEL 7, and it seems to
>>>>>> work OK unless I am missing something here. RBD on bare metal with
>>>>>> kmod-rbd and caching disabled.
>>>>>>
>>>>>> [root@compute4 ~]# time fio --name=writefile --size=100G --filesize=100G --filename=/dev/rbd0 --bs=1M --nrfiles=1 --direct=1 --sync=0 --randrepeat=0 --rw=write --refill_buffers --end_fsync=1 --iodepth=200 --ioengine=libaio
>>>>>> writefile: (g=0): rw=write, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=200
>>>>>> fio-2.1.11
>>>>>> Starting 1 process
>>>>>> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/853.0MB/0KB /s] [0/853/0 iops] [eta 00m:00s]
>>>>>> ...
>>>>>> Disk stats (read/write):
>>>>>>   rbd0: ios=184/204800, merge=0/0, ticks=70/16164931, in_queue=16164942, util=99.98%
>>>>>>
>>>>>> real    1m56.175s
>>>>>> user    0m18.115s
>>>>>> sys     0m10.430s
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Rama
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
>>>>>> Of David Moreau Simard
>>>>>> Sent: Tuesday, November 18, 2014 3:49 PM
>>>>>> To: Nick Fisk
>>>>>> Cc: ceph-users@lists.ceph.com
>>>>>> Subject: Re: [ceph-users] Poor RBD performance as LIO iSCSI target
>>>>>>
>>>>>> Testing without the cache tiering is the next test I want to do when I
>>>>>> have time.
>>>>>>
>>>>>> When it's hanging, there is no activity at all on the cluster.
>>>>>> Nothing in "ceph -w", nothing in "ceph osd pool stats".
>>>>>>
>>>>>> I'll provide an update when I have a chance to test without tiering.
>>>>>> --
>>>>>> David Moreau Simard
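(Also for the archives, since a few of us are about to run the same experiment: taking the cache tier out of the picture for a test run should be something along these lines, using the pool names from earlier in the thread. As far as I know the librbd "rbd cache" setting doesn't apply to the kernel client anyway, so the tier is the main caching suspect here:)

ceph osd tier cache-mode volumecache forward     # stop taking new writes in the cache
rados -p volumecache cache-flush-evict-all       # flush and evict whatever is already dirty
ceph osd tier remove-overlay volumes             # clients now hit the base pool directly
ceph osd tier remove volumes volumecache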
>>>>>>
>>>>>>> On Nov 18, 2014, at 3:28 PM, Nick Fisk <n...@fisk.me.uk> wrote:
>>>>>>>
>>>>>>> Hi David,
>>>>>>>
>>>>>>> Have you tried on a normal replicated pool with no cache? I've seen a
>>>>>>> number of threads recently where caching is causing various things to
>>>>>>> block/hang.
>>>>>>> It would be interesting to see if this still happens without the
>>>>>>> caching layer; at least it would rule it out.
>>>>>>>
>>>>>>> Also, is there any sign that as the test passes ~50GB the cache might
>>>>>>> start flushing to the backing pool, causing slow performance?
>>>>>>>
>>>>>>> I am planning a deployment very similar to yours, so I am following
>>>>>>> this with great interest. I'm hoping to build a single node test
>>>>>>> "cluster" shortly, so I might be in a position to work with you on
>>>>>>> this issue and hopefully get it resolved.
>>>>>>>
>>>>>>> Nick
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
>>>>>>> Of David Moreau Simard
>>>>>>> Sent: 18 November 2014 19:58
>>>>>>> To: Mike Christie
>>>>>>> Cc: ceph-users@lists.ceph.com; Christopher Spearman
>>>>>>> Subject: Re: [ceph-users] Poor RBD performance as LIO iSCSI target
>>>>>>>
>>>>>>> Thanks guys. I looked at http://tracker.ceph.com/issues/8818 and
>>>>>>> chatted with "dis" on #ceph-devel.
>>>>>>>
>>>>>>> I ran a LOT of tests on a LOT of combinations of kernels (sometimes
>>>>>>> with tunables legacy). I haven't found a magical combination in which
>>>>>>> the following test does not hang:
>>>>>>> fio --name=writefile --size=100G --filesize=100G --filename=/dev/rbd0 --bs=1M --nrfiles=1 --direct=1 --sync=0 --randrepeat=0 --rw=write --refill_buffers --end_fsync=1 --iodepth=200 --ioengine=libaio
>>>>>>>
>>>>>>> Either directly on a mapped rbd device, on a mounted filesystem (over
>>>>>>> rbd), or exported through iSCSI... nothing.
>>>>>>> I guess that rules out a potential issue with iSCSI overhead.
>>>>>>>
>>>>>>> Now, something I noticed out of pure luck is that I am unable to
>>>>>>> reproduce the issue if I drop the size of the test to 50GB. Those
>>>>>>> tests will complete in under 2 minutes.
>>>>>>> 75GB will hang right at the end and take more than 10 minutes.
>>>>>>>
>>>>>>> TL;DR of tests:
>>>>>>> - 3x fio --name=writefile --size=50G --filesize=50G --filename=/dev/rbd0 --bs=1M --nrfiles=1 --direct=1 --sync=0 --randrepeat=0 --rw=write --refill_buffers --end_fsync=1 --iodepth=200 --ioengine=libaio
>>>>>>>   -- 1m44s, 1m49s, 1m40s
>>>>>>>
>>>>>>> - 3x fio --name=writefile --size=75G --filesize=75G --filename=/dev/rbd0 --bs=1M --nrfiles=1 --direct=1 --sync=0 --randrepeat=0 --rw=write --refill_buffers --end_fsync=1 --iodepth=200 --ioengine=libaio
>>>>>>>   -- 10m12s, 10m11s, 10m13s
>>>>>>>
>>>>>>> Details of the tests here: http://pastebin.com/raw.php?i=3v9wMtYP
>>>>>>>
>>>>>>> Does that ring a bell for you guys?
>>>>>>>
>>>>>>> --
>>>>>>> David Moreau Simard
>>>>>>>
>>>>>>>> On Nov 13, 2014, at 3:31 PM, Mike Christie <mchri...@redhat.com> wrote:
>>>>>>>>
>>>>>>>> On 11/13/2014 10:17 AM, David Moreau Simard wrote:
>>>>>>>>> Running into weird issues here as well in a test environment. I
>>>>>>>>> don't have a solution either, but perhaps we can find some things
>>>>>>>>> in common...
>>>>>>>>>
>>>>>>>>> Setup in a nutshell:
>>>>>>>>> - Ceph cluster: Ubuntu 14.04, kernel 3.16.7, Ceph 0.87-1 (OSDs with
>>>>>>>>>   separate public/cluster networks in 10 Gbps)
>>>>>>>>> - iSCSI proxy node (targetcli/LIO): Ubuntu 14.04, kernel 3.16.7,
>>>>>>>>>   Ceph 0.87-1 (10 Gbps)
>>>>>>>>> - Client node: Ubuntu 12.04, kernel 3.11 (10 Gbps)
>>>>>>>>>
>>>>>>>>> Relevant cluster config: writeback cache tiering with NVME PCI-E
>>>>>>>>> cards (2 replicas) in front of an erasure coded pool (k=3, m=2)
>>>>>>>>> backed by spindles.
>>>>>>>>>
>>>>>>>>> I'm following the instructions here:
>>>>>>>>> http://www.hastexo.com/resources/hints-and-kinks/turning-ceph-rbd-images-san-storage-devices
>>>>>>>>> No issues with creating and mapping a 100GB RBD image and then
>>>>>>>>> creating the target.
>>>>>>>>>
>>>>>>>>> I'm interested in finding out the overhead/performance impact of
>>>>>>>>> re-exporting through iSCSI, so the idea is to run benchmarks.
>>>>>>>>> Here's a fio test I'm trying to run on the client node against the
>>>>>>>>> mounted iSCSI device:
>>>>>>>>> fio --name=writefile --size=100G --filesize=100G --filename=/dev/sdu --bs=1M --nrfiles=1 --direct=1 --sync=0 --randrepeat=0 --rw=write --refill_buffers --end_fsync=1 --iodepth=200 --ioengine=libaio
>>>>>>>>>
>>>>>>>>> The benchmark will eventually hang towards the end of the test for
>>>>>>>>> some time before completing.
>>>>>>>>> On the proxy node, the kernel complains with iSCSI portal login
>>>>>>>>> timeouts: http://pastebin.com/Q49UnTPr and I also see irqbalance
>>>>>>>>> errors in syslog: http://pastebin.com/AiRTWDwR
>>>>>>>>
>>>>>>>> You are hitting a different issue. German Anders is most likely
>>>>>>>> correct and you hit the rbd hang. That then caused the iscsi/scsi
>>>>>>>> command to time out, which caused the scsi error handler to run. In
>>>>>>>> your logs we see the LIO error handler has received a task abort from
>>>>>>>> the initiator, and that timed out, which caused the escalation (the
>>>>>>>> iscsi portal login related messages).
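For anyone landing on this thread later and wanting to reproduce the LIO side of it: the export on the proxy node boils down to roughly the following. The pool/image, backstore name and IQNs are placeholders for whatever your setup uses, and depending on the targetcli version the portal and ACL steps may differ slightly.

rbd map volumes/bench-image                    # whatever image you benchmark against
targetcli /backstores/block create name=rbd0 dev=/dev/rbd0
targetcli /iscsi create iqn.2014-12.com.example:rbd0
targetcli /iscsi/iqn.2014-12.com.example:rbd0/tpg1/luns create /backstores/block/rbd0
targetcli /iscsi/iqn.2014-12.com.example:rbd0/tpg1/portals create 0.0.0.0 3260
targetcli /iscsi/iqn.2014-12.com.example:rbd0/tpg1/acls create iqn.2014-12.com.example:client
targetcli saveconfig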