------- Comment From cls...@us.ibm.com 2018-10-23 19:47 EDT-------
(In reply to comment #8)
> I built a test kernel with commit 37fdffb217a45609edccbb8b407d031143f551c0.
> The test kernel can be downloaded from:
> http://kernel.ubuntu.com/~jsalisbury/lp1799393
>
> Can you test this kernel and see if it resolves this bug?
>
> Note about installing test kernels:
> - If the test kernel is prior to 4.15(Bionic) you need to install the
>   linux-image and linux-image-extra .deb packages.
> - If the test kernel is 4.15(Bionic) or newer, you need to install the
>   linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.
>
> Thanks in advance!

Hi,

I was able to verify the fix with this kernel:

4.18.0-10-generic #12~lp1799393 SMP Tue Oct 23 19:04:13 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

I ran a ping flood and I no longer get wqe_err right away as before.

# netstat -in
Kernel Interface table
Iface       MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
enP2p1s0   1500     4295      0      9      0      271      0      0      0 BMRU
enp1s0f0   1500  5608322      0      0      0  5606566      0      0      0 BMRU
lo        65536       12      0      0      0       12      0      0      0 LRU
virbr0     1500        0      0      0      0        0      0      0      0 BMU

# ethtool -S enp1s0f0 | grep rx_wqe_err
rx_wqe_err: 0

Thanks.

--
https://bugs.launchpad.net/bugs/1799393

Title:
  Mellanox CX5 stops pinging with rx_wqe_err (mlx5_core)

Status in The Ubuntu-power-systems project:
  In Progress
Status in linux package in Ubuntu:
  In Progress

Bug description:
  == Comment: #0 - Michael Ranweiler - 2018-10-18 11:34:40 ==
  ---Problem Description---
  On the system, running

  ethtool -S enP48p1s0f0 | grep wqe_err
  rx_wqe_err: 1
  rx0_wqe_err: 0
  rx1_wqe_err: 0
  rx2_wqe_err: 0
  rx3_wqe_err: 1
  rx4_wqe_err: 0
  rx5_wqe_err: 0
  rx6_wqe_err: 0
  rx7_wqe_err: 0
  rx8_wqe_err: 0
  rx9_wqe_err: 0
  rx10_wqe_err: 0
  rx11_wqe_err: 0
  rx12_wqe_err: 0
  rx13_wqe_err: 0
  rx14_wqe_err: 0
  rx15_wqe_err: 0

  shows that the RX side is hitting an issue.

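The counter check from the problem description can be scripted. This is a minimal sketch, assuming a POSIX shell with awk; the helper name `nonzero_wqe_err` is hypothetical and not part of the original report. It reads `ethtool -S` output on stdin and prints only the wqe_err counters that are non-zero, so empty output means no RX WQE errors were counted.

```shell
# Hypothetical helper (an assumption, not from the original report):
# print every *wqe_err counter whose value is non-zero.
# Input: `ethtool -S <iface>` output on stdin.
nonzero_wqe_err() {
    awk -F': *' '
        /wqe_err/ {
            gsub(/^[ \t]+/, "", $1)        # strip the indentation ethtool adds
            if ($2 + 0 != 0) print $1 ": " $2
        }'
}

# Usage on a live system (interface name taken from the report):
#   ethtool -S enP48p1s0f0 | nonzero_wqe_err
```

Filtering on the counter value rather than grepping for a fixed line keeps the check independent of how many RX queues the interface exposes.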
  ---Additional Hardware Info---
  Mellanox CX5 Ethernet 100G

  lspci
  0030:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
  0030:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]

  Machine Type = P9

  ---Debugger---
  A debugger is not configured

  ---Steps to Reproduce---
  Using a CX5 Ethernet 100G card

  lspci
  0030:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
  0030:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]

  just configure an IP address:

  ifconfig enP48p1s0f0 33.33.33.33 netmask 255.255.255.0 up

  Then configure an IP address on the partner system and try ping -f:

  ping -f 33.33.33.33
  PING 33.33.33.33 (33.33.33.33) 56(84) bytes of data.
  ........................................^C
  --- 33.33.33.33 ping statistics ---
  5413 packets transmitted, 5373 received, 0% packet loss, time 934ms
  rtt min/avg/max/mdev = 0.015/0.019/0.669/0.010 ms, ipg/ewma 0.172/0.020 ms

  # ping 33.33.33.33
  PING 33.33.33.33 (33.33.33.33) 56(84) bytes of data.
  ^C
  --- 33.33.33.33 ping statistics ---
  2 packets transmitted, 0 received, 100% packet loss, time 1071ms

  Then, on the receiving system, run:

  ethtool -S enP48p1s0f0 | grep wqe_err
  rx_wqe_err: 1
  rx0_wqe_err: 0
  rx1_wqe_err: 0
  rx2_wqe_err: 0
  rx3_wqe_err: 1
  rx4_wqe_err: 0
  rx5_wqe_err: 0
  rx6_wqe_err: 0
  rx7_wqe_err: 0
  rx8_wqe_err: 0
  rx9_wqe_err: 0
  rx10_wqe_err: 0
  rx11_wqe_err: 0
  rx12_wqe_err: 0
  rx13_wqe_err: 0
  rx14_wqe_err: 0
  rx15_wqe_err: 0

  You will see rx_wqe_err with a non-zero counter.

  This is fixed by this patch:
  https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=37fdffb217a45609edccbb8b407d031143f551c0

  == Comment: #1 - Carol L. Soto - 2018-10-18 11:46:00 ==
  I did a git clone of the cosmic tree and loaded the kernel on a system. With kernel 4.18.12 I can recreate it.

  lspci | grep Mell | grep ConnectX-5
  0000:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
  0000:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
  0030:01:00.0 Infiniband controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
  0030:01:00.1 Infiniband controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]

  :~# ethtool -S enp1s0f0 | grep wqe_err
  rx_wqe_err: 2
  rx0_wqe_err: 1
  rx1_wqe_err: 1
  rx2_wqe_err: 0
  rx3_wqe_err: 0
  rx4_wqe_err: 0
  rx5_wqe_err: 0
  rx6_wqe_err: 0
  rx7_wqe_err: 0
  rx8_wqe_err: 0
  rx9_wqe_err: 0
  rx10_wqe_err: 0
  ...

  Let me check whether the proposed patch needs a backport.

  == Comment: #3 - Carol L. Soto - 2018-10-18 13:34:46 ==
  I was able to apply the proposed patch as is to the cosmic git tree with no issues (no backport needed), using kernel 4.18.12+. With the proposed patch I do not see wqe_err and ping does not stop.

  ethtool -S enp1s0f0 | grep wqe_err
  rx_wqe_err: 0
  rx0_wqe_err: 0
  rx1_wqe_err: 0
  rx2_wqe_err: 0
  rx3_wqe_err: 0
  rx4_wqe_err: 0
  rx5_wqe_err: 0
  rx6_wqe_err: 0
  rx7_wqe_err: 0
  rx8_wqe_err: 0
  rx9_wqe_err: 0
  rx10_wqe_err: 0
  ...