Re: Connect-IB not performing as well as ConnectX-3 with iSER

2016-06-24 Thread Robert LeBlanc
Sagi,

Here is an example of the different types of tests. This was only on one kernel.

The first two runs are to set a baseline. The lines starting with
buffer are fio with direct=0, and the lines starting with direct are
fio with direct=1. The lines starting with block are fio running
against a raw block device (technically 40 partitions on a single
drive) with direct=0. I also reduced the tests to only one path per
port instead of four like before.
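
Roughly, the three cases correspond to fio invocations along these
lines (a minimal sketch; the file paths, block size, I/O pattern and
job count here are placeholders rather than the exact parameters
run_path_tests.sh uses):

# "buffer": buffered I/O through the file system on the iSER LUN
fio --name=buffer --filename=/mnt/sdc/testfile --rw=read --bs=4k \
    --size=4g --numjobs=40 --direct=0 --group_reporting

# "direct": the same job with the page cache bypassed
fio --name=direct --filename=/mnt/sdc/testfile --rw=read --bs=4k \
    --size=4g --numjobs=40 --direct=1 --group_reporting

# "block": buffered I/O against the raw partitions (40 on one drive)
fio --name=block --filename=/dev/sdc1 --rw=read --bs=4k \
    --size=4g --numjobs=40 --direct=0 --group_reporting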

# /root/run_path_tests.sh check-paths
 Test all iSER paths individually 
4.5.0-rc5-5adabdd1-00023-g5adabdd
buffer;sdc;10.218.128.17;3815778;953944;21984
buffer;sdd;10.219.128.17;3743744;935936;22407
buffer;sde;10.220.128.17;4915392;1228848;17066
direct;sdc;10.218.128.17;876644;219161;95690
direct;sdd;10.219.128.17;881684;220421;95143
direct;sde;10.220.128.17;892215;223053;94020
block;sdc;10.218.128.17;3890459;972614;21562
block;sdd;10.219.128.17;4127642;1031910;20323
block;sde;10.220.128.17;4939705;1234926;16982
# /root/run_path_tests.sh check-paths
 Test all iSER paths individually 
4.5.0-rc5-5adabdd1-00023-g5adabdd
buffer;sdc;10.218.128.17;3983572;995893;21058
buffer;sdd;10.219.128.17;3774231;943557;6
buffer;sde;10.220.128.17;4856204;1214051;17274
direct;sdc;10.218.128.17;875820;218955;95780
direct;sdd;10.219.128.17;884072;221018;94886
direct;sde;10.220.128.17;902486;225621;92950
block;sdc;10.218.128.17;3790433;947608;22131
block;sdd;10.219.128.17;3860025;965006;21732
block;sde;10.220.128.17;4946404;1236601;16959

For the following test, I set the IRQs on the initiator using mlx_tune
-p HIGH_THROUGHPUT with irqbalance disabled.
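
Doing the same thing by hand (as described in the quoted message
below) amounts to roughly the following; this is only a sketch, and
the CPU list 0-7 is an assumption standing in for the cores on the
NUMA node the card is attached to:

systemctl stop irqbalance

# restrict each mlx5 completion-vector IRQ to the card's NUMA node
for irq in $(grep mlx5 /proc/interrupts | awk -F: '{print $1}'); do
    echo 0-7 > /proc/irq/$irq/smp_affinity_list
done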

# /root/run_path_tests.sh check-paths
 Test all iSER paths individually 
4.5.0-rc5-5adabdd1-00023-g5adabdd
buffer;sdc;10.218.128.17;3742742;935685;22413
buffer;sdd;10.219.128.17;3786327;946581;22155
buffer;sde;10.220.128.17;5009619;1252404;16745
direct;sdc;10.218.128.17;871942;217985;96206
direct;sdd;10.219.128.17;883467;220866;94951
direct;sde;10.220.128.17;901138;225284;93089
block;sdc;10.218.128.17;3911319;977829;21447
block;sdd;10.219.128.17;3758168;939542;22321
block;sde;10.220.128.17;4968377;1242094;16884

For the following test, I also set the IRQs on the target using
mlx_tune -p HIGH_THROUGHPUT and disabled irqbalance.

# /root/run_path_tests.sh check-paths
 Test all iSER paths individually 
4.5.0-rc5-5adabdd1-00023-g5adabdd
buffer;sdc;10.218.128.17;3804357;951089;22050
buffer;sdd;10.219.128.17;3767113;941778;22268
buffer;sde;10.220.128.17;4966612;1241653;16890
direct;sdc;10.218.128.17;879742;219935;95353
direct;sdd;10.219.128.17;886641;221660;94611
direct;sde;10.220.128.17;886857;221714;94588
block;sdc;10.218.128.17;3760864;940216;22305
block;sdd;10.219.128.17;3763564;940891;22289
block;sde;10.220.128.17;4965436;1241359;16894

It seems that mlx_tune helps marginally, but it isn't providing
anything groundbreaking.

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jun 22, 2016 at 11:46 AM, Robert LeBlanc  wrote:
> Sagi,
>
> Yes you are understanding the data correctly and what I'm seeing. I
> think you are also seeing the confusion that I've been running into
> trying to figure this out as well. As far as your questions about SRP,
> the performance data is from the initiator and the CPU info is from
> the target (all fio threads on the initiator were low CPU
> utilization).
>
> I spent a good day tweaking the IRQ assignments (spreading IRQs to all
> cores, spreading to all cores on the NUMA node the card is attached
> to, and spreading to all non-hyperthreaded cores on the NUMA node).
> None of these provided any substantial gains/detriments (irqbalance
> was not running). I don't know if there is IRQ steering going on, but
> in some cases with irqbalance not running the IRQs would get pinned
> back to the previous core(s) and I'd have to set them again. I did not
> use the Mellanox scripts; I just did it by hand based on the
> documents/scripts. I also offlined all cores on the second NUMA node,
> which didn't help either. I got more performance gains with nomerges
> (1 or 2 provided about the same gain, 2 slightly more; sketched below
> the quote) and with the queue settings.
> It seems that something in 1aaa57f5 was going right as both cards
> performed very well without needing any IRQ fudging.
>
> I understand that there are many moving parts to try to figure this
> out; it could be anywhere in the IB drivers, LIO, or even the SCSI
> subsystem, RAM disk implementation, or file system. However, since
> the performance is bouncing between cards, it seems unlikely to be
> something common to both (except when both cards show a loss/gain),
> but as you mentioned, there doesn't seem to be any rhyme or reason to
> the shifts.
>
> I haven't been using the straight block device in these tests.
> Before, when I did, once one thread had read the data, any other
> thread reading the same block got it from cache, invalidating the
> test. I could only saturate the path/port 

[PATCH] scsi: ufs: remove unnecessary goto label

2016-06-24 Thread Tiezhu Yang
When the buff_ascii kmalloc fails, there is no need to call kfree
(kfree of a NULL pointer is a no-op anyway); the code should return
-ENOMEM directly. This patch fixes it.

Signed-off-by: Tiezhu Yang 
---
 drivers/scsi/ufs/ufshcd.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 8e8989a..f08d41a 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -2137,7 +2137,7 @@ int ufshcd_read_string_desc(struct ufs_hba *hba, int desc_index, u8 *buf,
 		buff_ascii = kmalloc(ascii_len, GFP_KERNEL);
 		if (!buff_ascii) {
 			err = -ENOMEM;
-			goto out_free_buff;
+			goto out;
 		}
 
 		/*
@@ -2156,7 +2156,6 @@ int ufshcd_read_string_desc(struct ufs_hba *hba, int desc_index, u8 *buf,
 				size - QUERY_DESC_HDR_SIZE);
 		memcpy(buf + QUERY_DESC_HDR_SIZE, buff_ascii, ascii_len);
 		buf[QUERY_DESC_LENGTH_OFFSET] = ascii_len + QUERY_DESC_HDR_SIZE;
-out_free_buff:
 		kfree(buff_ascii);
 	}
 out:
-- 
1.8.3.1