RE: iSCSI failing, MLX rx_ring errors ?

2017-01-01 Thread Meny Yossefi
Hi Ben,



Those are not HW errors. Note that:



hw.mlxen1.stat.rx_dropped: 0

hw.mlxen1.stat.rx_errors: 0



The counter seems to be incremented when the driver fails to allocate a replacement buffer.

Any chance you ran out of mbufs in the system?
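
One way to check for mbuf exhaustion is to look at the "denied" lines printed by `netstat -m` on FreeBSD. The C sketch below is only an illustration: the helper names parse_denied and any_denied are hypothetical, and it assumes each such line starts with three slash-separated counters followed by text containing " denied". Check that assumption against the actual output on your system.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of `netstat -m` output that is assumed to look like
 *   "A/B/C requests for <something> denied (...)".
 * Returns 1 and fills counts[0..2] if the line is a "denied" line,
 * otherwise returns 0 and leaves counts in an unspecified state.
 */
static int parse_denied(const char *line, unsigned long counts[3])
{
    if (strstr(line, " denied") == NULL)
        return 0;
    if (sscanf(line, "%lu/%lu/%lu",
               &counts[0], &counts[1], &counts[2]) != 3)
        return 0;
    return 1;
}

/* Returns 1 if the line reports at least one denied allocation request. */
static int any_denied(const char *line)
{
    unsigned long c[3];

    return parse_denied(line, c) && (c[0] | c[1] | c[2]) != 0;
}
```

Feeding each line of the command's output to any_denied() and watching for a return value of 1 would indicate that some allocation requests were refused, which would be consistent with the failure path in the driver.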



en_rx.c:

mlx4_en_process_rx_cq():

    mb = mlx4_en_rx_mb(priv, rx_desc, mb_list, length);
    if (!mb) {
        ring->errors++;
        goto next;
    }

mlx4_en_rx_mb() --> mlx4_en_complete_rx_desc():

    /* Allocate a replacement page */
    if (mlx4_en_alloc_buf(priv, rx_desc, mb_list, nr))
        goto fail;
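
To make the path in the excerpt above concrete, here is a small standalone C sketch of the same pattern: each received packet needs a replacement buffer, and when that allocation fails the packet is skipped and the ring's error counter is incremented. This is not driver code. The types rx_ring and buf_pool and the functions alloc_replacement and process_rx are hypothetical stand-ins, and the pool is just a counter.

```c
#include <stdio.h>

/* Hypothetical stand-in for a receive ring's statistics. */
struct rx_ring {
    unsigned long packets;
    unsigned long errors;
};

/* Hypothetical stand-in for the system's buffer pool. */
struct buf_pool {
    int free_bufs;  /* replacement buffers still available */
};

/* Returns 0 on success, -1 when the pool is exhausted. */
static int alloc_replacement(struct buf_pool *pool)
{
    if (pool->free_bufs <= 0)
        return -1;
    pool->free_bufs--;
    return 0;
}

/* Process n completions; each one needs a replacement buffer. */
static void process_rx(struct rx_ring *ring, struct buf_pool *pool, int n)
{
    for (int i = 0; i < n; i++) {
        if (alloc_replacement(pool) != 0) {
            ring->errors++;  /* packet is skipped, like "goto next" */
            continue;
        }
        ring->packets++;
    }
}
```

With a pool of 3 buffers and 5 completions, process_rx() leaves packets at 3 and errors at 2. The sketch shows why such a counter can grow even though the hardware reports no drops or errors.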



-Meny



-------- Forwarded Message --------

Subject: iSCSI failing, MLX rx_ring errors ?

Date: Fri, 30 Dec 2016 22:55:19 +0100

From: Ben RUBSON <ben.rub...@gmail.com>

To: FreeBSD Net <freebsd-net@freebsd.org>



Hello,



2 FreeBSD 11.0-p3 servers, one iSCSI initiator, one target.

Both with Mellanox ConnectX-3 40G.



For the past few days, under circumstances I have not yet identified, as soon
as there is some (very low) iSCSI traffic, some of the disks occasionally get
disconnected:

kernel: WARNING: 192.168.2.2 (iqn..): no ping reply (NOP-Out) after 5 seconds; dropping connection



At the same moment, the sysctl counters hw.mlxen1.stat.rx_ring*.error grow on
the initiator side.
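
To track how much these counters grow between two moments, it can help to total them across all rings. The C sketch below assumes the text format of `sysctl hw.mlxen1` output shown further down ("hw.mlxen1.stat.rx_ringN.error: V"). The function names ring_error_value and sum_ring_errors are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * If the line looks like "hw.mlxenN.stat.rx_ringM.error: V",
 * store V in *value and return 1; otherwise return 0.
 */
static int ring_error_value(const char *line, unsigned long *value)
{
    const char *p = strstr(line, ".stat.rx_ring");
    unsigned ring;

    if (p == NULL)
        return 0;
    /* Lines for .bytes or .packets stop matching at ".error". */
    if (sscanf(p, ".stat.rx_ring%u.error: %lu", &ring, value) != 2)
        return 0;
    return 1;
}

/* Sum the per-ring error counters found in n lines of sysctl output. */
static unsigned long sum_ring_errors(const char *const lines[], size_t n)
{
    unsigned long total = 0, v;

    for (size_t i = 0; i < n; i++)
        if (ring_error_value(lines[i], &v))
            total += v;
    return total;
}
```

Taking one total before and one after a disconnect and subtracting them gives the number of failed receives during that window. For example, the rx_ring15 and rx_ring14 lines from the output below contribute 52 + 77 = 129.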



I then tried to reproduce these network errors by saturating the link at 40G
full-duplex with iPerf, but I did not manage to increase these error counters.



It's strange because the issue is sporadic: I can have traffic on the iSCSI
disks without any problem, and at other times they get disconnected while the
error counters grow.



What should I look at?

What do these rx_ring*.error counters mean? Are they hardware errors?



Below are some numbers to help with the investigation.

(Something odd for the MLX folks: all hw.mlxen*.stat.tx_*_bytes_packets
counters are 0.)



Thank you very much for your help & support!



Best regards,



Ben







# uname -r

11.0-RELEASE-p3



# ifconfig mlxen1

mlxen1: flags=8843 metric 0 mtu 9020 
options=ed07bb

ether XX:XX:XX:XX:XX:XX

inet 192.168.2.1 netmask 0x broadcast 192.168.255.255 nd6 
options=29

media: Ethernet autoselect (40Gbase-CR4 )

status: active



# mst status

MST devices:



pci0:133:0:0 - MT27500 Family [ConnectX-3]



# flint -d pci0:133:0:0 q

Image type:  FS2

FW Version:  2.36.5000

FW Release Date: 26.1.2016

Product Version: 02.36.50.00

Rom Info:type=PXE version=3.4.718 devid=4099

PSID:MT_1090110023



[initiator]# netstat -I mlxen1

Name   Mtu   Network        Address            Ipkts       Ierrs  Idrop  Opkts       Oerrs  Coll
mlxen  9020                 XX:XX:XX:XX:XX:XX  4095609916  0      0      3321316930  0      0
mlxen  -     192.168.0.0/1  initiator          2020732710  -      -      3242031277  -      -



[target]# netstat -I mlxen1

Name   Mtu   Network        Address            Ipkts       Ierrs  Idrop  Opkts       Oerrs  Coll
mlxen  9020                 XX:XX:XX:XX:XX:XX  3798170324  0      0      5098312540  0      0
mlxen  -     192.168.0.0/1  target             2462248779  -      -      5057776404  -      -



[initiator]# sysctl hw.mlxen1

hw.mlxen1.stat.rx_ring15.error: 52

hw.mlxen1.stat.rx_ring15.bytes: 3477976760

hw.mlxen1.stat.rx_ring15.packets: 7360524

hw.mlxen1.stat.rx_ring14.error: 77

hw.mlxen1.stat.rx_ring14.bytes: 791762343420

hw.mlxen1.stat.rx_ring14.packets: 142943349

hw.mlxen1.stat.rx_ring13.error: 33

hw.mlxen1.stat.rx_ring13.bytes: 2284826126

hw.mlxen1.stat.rx_ring13.packets: 7781479

hw.mlxen1.stat.rx_ring12.error: 20

hw.mlxen1.stat.rx_ring12.bytes: 730312221216

hw.mlxen1.stat.rx_ring12.packets: 155950019

hw.mlxen1.stat.rx_ring11.error: 57

hw.mlxen1.stat.rx_ring11.bytes: 114233581104

hw.mlxen1.stat.rx_ring11.packets: 69633934

hw.mlxen1.stat.rx_ring10.error: 49

hw.mlxen1.stat.rx_ring10.bytes: 10775291086886

hw.mlxen1.stat.rx_ring10.packets: 1389173314

hw.mlxen1.stat.rx_ring9.error: 68

hw.mlxen1.stat.rx_ring9.bytes: 35171979154

hw.mlxen1.stat.rx_ring9.packets: 86633073

hw.mlxen1.stat.rx_ring8.error: 81

hw.mlxen1.stat.rx_ring8.bytes: 23210482350

hw.mlxen1.stat.rx_ring8.packets: 68058961

hw.mlxen1.stat.rx_ring7.error: 49

hw.mlxen1.stat.rx_ring7.bytes: 5093871869318

hw.mlxen1.stat.rx_ring7.packets: 744833265

hw.mlxen1.stat.rx_ring6.error: 37

hw.mlxen1.stat.rx_ring6.bytes: 90764137790

hw.mlxen1.stat.rx_ring6.packets: 130626431

hw.mlxen1.stat.rx_ring5.error: 7

hw.mlxen1.stat.rx_ring5.bytes: 641902292152

hw.mlxen1.stat.rx_ring5.packets: 76754874

hw.mlxen1.stat.rx_ring4.error: 59

hw.mlxen1.stat.rx_ring4.bytes: 28894253498

hw.mlxen1.stat.rx_ring4.packets: 12545685

hw.mlxen1.stat.rx_ring3.error: 87

hw.mlxen1.stat.rx_ring3.bytes: 1581250152646

hw.mlxen1.stat.rx_ring3.packets: 255027061

hw.mlxen1.stat.rx_ring2.error: 19

hw.mlxen1.stat.rx_ring2.bytes: 47056101376

hw.mlxen1.stat.rx_ring2.packets: 11670049

hw.mlxen1.stat.rx_ring1.error: 76

hw.mlxen1.stat.rx_r

Problem reports for freebsd-net@FreeBSD.org that need special attention

2017-01-01 Thread bugzilla-noreply
To view an individual PR, use:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=(Bug Id).

The following is a listing of current problems submitted by FreeBSD users,
which need special attention. These represent problem reports covering
all versions including experimental development code and obsolete releases.

Status      | Bug Id | Description
------------+--------+---------------------------------------------------
In Progress | 165622 | [ndis][panic][patch] Unregistered use of FPU in k
In Progress | 200361 | net.inet.tcp.hostcache.list is jail information l
In Progress | 203422 | mpd/ppoe not working with re(4) with revision 285
In Progress | 206581 | bxe_ioctl_nvram handler is faulty
New         | 204438 | setsockopt() handling of kern.ipc.maxsockbuf limi
New         | 205592 | TCP processing in IPSec causes kernel panic
New         | 206053 | kqueue support code of netmap causes panic
New         | 213410 | [carp] service netif restart causes hang only whe
Open        | 148807 | [panic] "panic: sbdrop" and "panic: sbsndptr: soc
Open        | 193452 | Dell PowerEdge 210 II -- Kernel panic bce (broadc
Open        | 194485 | Userland cannot add IPv6 prefix routes
Open        | 194515 | Fatal Trap 12 Kernel with vimage
Open        | 199136 | [if_tap] Added down_on_close sysctl variable to t
Open        | 202510 | [CARP] advertisements sourced from CARP IP cause
Open        | 206544 | sendmsg(2) (sendto(2) too?) can fail with EINVAL;
Open        | 211031 | [panic] in ng_uncallout when argument is NULL
Open        | 211962 | bxe driver queue soft hangs and flooding tx_soft_
Open        | 212018 | Enable IPSEC_NAT_T in GENERIC kernel configuratio
Open        | 213257 | Crash in IGB driver with ALTQ

19 problems total for which you should take action.
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"