Sagi,
Here is an example of the different types of tests, all run on a single
kernel. The first two runs are to set a baseline. The lines starting
with buffer are fio with direct=0, the lines starting with direct are
fio with direct=1, and the lines starting with block are fio running
against a raw block device (technically 40 partitions on a single
drive) with direct=0. I also reduced the tests to only test one path
per port instead of four like before.
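For reference, the fio jobs in run_path_tests.sh look roughly like the
following (the file/partition paths, block size, size and runtime shown
here are placeholders, not the exact values from my script):

buffer: fio --name=buffer --filename=/mnt/sdc/fio.dat --rw=read \
            --bs=4k --size=4g --runtime=60 --direct=0
direct: fio --name=direct --filename=/mnt/sdc/fio.dat --rw=read \
            --bs=4k --size=4g --runtime=60 --direct=1
block:  fio --name=block --filename=/dev/sdc1 --rw=read \
            --bs=4k --size=4g --runtime=60 --direct=0
        (repeated for each of the 40 partitions)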
# /root/run_path_tests.sh check-paths
Test all iSER paths individually
4.5.0-rc5-5adabdd1-00023-g5adabdd
buffer;sdc;10.218.128.17;3815778;953944;21984
buffer;sdd;10.219.128.17;3743744;935936;22407
buffer;sde;10.220.128.17;4915392;1228848;17066
direct;sdc;10.218.128.17;876644;219161;95690
direct;sdd;10.219.128.17;881684;220421;95143
direct;sde;10.220.128.17;892215;223053;94020
block;sdc;10.218.128.17;3890459;972614;21562
block;sdd;10.219.128.17;4127642;1031910;20323
block;sde;10.220.128.17;4939705;1234926;16982
# /root/run_path_tests.sh check-paths
Test all iSER paths individually
4.5.0-rc5-5adabdd1-00023-g5adabdd
buffer;sdc;10.218.128.17;3983572;995893;21058
buffer;sdd;10.219.128.17;3774231;943557;6
buffer;sde;10.220.128.17;4856204;1214051;17274
direct;sdc;10.218.128.17;875820;218955;95780
direct;sdd;10.219.128.17;884072;221018;94886
direct;sde;10.220.128.17;902486;225621;92950
block;sdc;10.218.128.17;3790433;947608;22131
block;sdd;10.219.128.17;3860025;965006;21732
block;sde;10.220.128.17;4946404;1236601;16959
For the following test, I set the IRQs on the initiator using mlx_tune
-p HIGH_THROUGHPUT with irqbalance disabled.
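That amounts to something like the following on the initiator (assuming
the Mellanox tuning tool and the irqbalance service go by these names on
the system; irqbalance is stopped first so it does not rewrite the
affinities the profile sets):

systemctl stop irqbalance
mlx_tune -p HIGH_THROUGHPUT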
# /root/run_path_tests.sh check-paths
Test all iSER paths individually
4.5.0-rc5-5adabdd1-00023-g5adabdd
buffer;sdc;10.218.128.17;3742742;935685;22413
buffer;sdd;10.219.128.17;3786327;946581;22155
buffer;sde;10.220.128.17;5009619;1252404;16745
direct;sdc;10.218.128.17;871942;217985;96206
direct;sdd;10.219.128.17;883467;220866;94951
direct;sde;10.220.128.17;901138;225284;93089
block;sdc;10.218.128.17;3911319;977829;21447
block;sdd;10.219.128.17;3758168;939542;22321
block;sde;10.220.128.17;4968377;1242094;16884
For the following test, I also set the IRQs on the target using
mlx_tune -p HIGH_THROUGHPUT and disabled irqbalance.
# /root/run_path_tests.sh check-paths
Test all iSER paths individually
4.5.0-rc5-5adabdd1-00023-g5adabdd
buffer;sdc;10.218.128.17;3804357;951089;22050
buffer;sdd;10.219.128.17;3767113;941778;22268
buffer;sde;10.220.128.17;4966612;1241653;16890
direct;sdc;10.218.128.17;879742;219935;95353
direct;sdd;10.219.128.17;886641;221660;94611
direct;sde;10.220.128.17;886857;221714;94588
block;sdc;10.218.128.17;3760864;940216;22305
block;sdd;10.219.128.17;3763564;940891;22289
block;sde;10.220.128.17;4965436;1241359;16894
It seems that mlx_tune helps marginally, but it doesn't really provide
anything groundbreaking.
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
On Wed, Jun 22, 2016 at 11:46 AM, Robert LeBlanc wrote:
> Sagi,
>
> Yes, you are understanding the data correctly, and that is what I'm
> seeing. I think you are also seeing the confusion that I've been
> running into while trying to figure this out. As for your questions
> about SRP, the performance data is from the initiator and the CPU info
> is from the target (all fio threads on the initiator showed low CPU
> utilization).
>
> I spent a good day tweaking the IRQ assignments (spreading IRQs to all
> cores, spreading to all cores on the NUMA node the card is attached
> to, and spreading to all non-hyperthreaded cores on the NUMA node).
> None of these provided any substantial gains/detriments (irqbalance
> was not running). I don't know if there is IRQ steering going on, but
> in some cases with irqbalance not running the IRQs would get pinned
> back to the previous core(s) and I'd have to set them again. I did not
> use the Mellanox scripts; I just did it by hand based on their
> documents/scripts. I also offlined all cores on the second NUMA node
> which didn't help either. I got more performance gains with nomerges
> (1 or 2 provided about the same gain, 2 slightly more) and the queue.
> It seems that something in 1aaa57f5 was going right as both cards
> performed very well without needing any IRQ fudging.
>
> I understand that there are many moving parts to try and figure this
> out; it could be anywhere in the IB drivers, LIO, or even the SCSI
> subsystem, RAM disk implementation or file system. However, since the
> performance is bouncing between cards, it seems unlikely to be
> something common to both (except when both cards show a loss/gain),
> but as you mentioned, there doesn't seem to be any rhyme or reason to
> the shifts.
>
> I haven't been using the straight block device in these tests. Before,
> when I did, once one thread had read the data, another thread reading
> the same block would get it from cache, invalidating the test. I
> could only saturate the path/port