Hi Ben,
Those are not HW errors. Note that:
hw.mlxen1.stat.rx_dropped: 0
hw.mlxen1.stat.rx_errors: 0
The rx_ring error counter seems to be incremented when the driver fails to allocate a replacement buffer.
Any chance you ran out of mbufs in the system?
en_rx.c:

mlx4_en_process_rx_cq():
        mb = mlx4_en_rx_mb(priv, rx_desc, mb_list, length);
        if (!mb) {
                ring->errors++;
                goto next;
        }

mlx4_en_rx_mb() --> mlx4_en_complete_rx_desc():
        /* Allocate a replacement page */
        if (mlx4_en_alloc_buf(priv, rx_desc, mb_list, nr))
                goto fail;
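
To make the path above concrete, here is a minimal userspace sketch of it. It is a model under stated assumptions, not the driver code: `struct model_ring`, `model_rx_mb()` and `model_process_rx_cq()` are hypothetical stand-ins, and the allocation result is passed in as a flag instead of coming from a real mbuf allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's ring; not the real mlx4 type. */
struct model_ring {
    unsigned long packets;  /* frames handed up to the stack */
    unsigned long errors;   /* what rx_ringN.error would report */
};

/*
 * Models mlx4_en_rx_mb(): the received buffer is only returned when a
 * replacement buffer could be allocated for the descriptor.
 */
static bool model_rx_mb(bool alloc_ok)
{
    return alloc_ok;
}

/*
 * Models the loop in mlx4_en_process_rx_cq(). alloc_ok[i] says whether
 * the replacement allocation for completion i succeeds (an assumption
 * of this sketch; the driver asks the allocator instead).
 */
static void model_process_rx_cq(struct model_ring *ring,
    const bool *alloc_ok, size_t ncompletions)
{
    assert(ring != NULL && alloc_ok != NULL);
    for (size_t i = 0; i < ncompletions; i++) {
        if (!model_rx_mb(alloc_ok[i])) {
            ring->errors++;  /* software counter only */
            continue;        /* plays the role of "goto next" */
        }
        ring->packets++;
    }
}
```

In this model the error count rises only when an allocation fails, independent of the traffic rate, which would fit errors showing up at very low iSCSI load while rx_errors and rx_dropped stay at 0.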
-Meny
Forwarded Message
Subject: iSCSI failing, MLX rx_ring errors ?
Date: Fri, 30 Dec 2016 22:55:19 +0100
From: Ben RUBSON <ben.rub...@gmail.com>
To: FreeBSD Net <freebsd-net@freebsd.org>
Hello,
Two FreeBSD 11.0-p3 servers: one iSCSI initiator and one target.
Both have Mellanox ConnectX-3 40G adapters.
For the past few days, intermittently and under circumstances I have not yet
identified, as soon as there is some (very low) iSCSI traffic, some of the
disks get disconnected:
kernel: WARNING: 192.168.2.2 (iqn..): no ping reply (NOP-Out) after 5 seconds; dropping connection
At the same moment, the hw.mlxen1.stat.rx_ring*.error sysctl counters increase
on the initiator side.
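
To see how fast these counters grow between two disconnections, the per-ring values can be totalled from saved `sysctl hw.mlxen1` output. Below is a minimal C sketch; the function name `sum_rx_ring_errors` is hypothetical, and it assumes the output is available as newline-separated text in the `name: value` form shown further down in this message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sums the values of all "hw.mlxen1.stat.rx_ring<N>.error: <value>"
 * lines in a newline-separated text buffer. Any other line (bytes,
 * packets, truncated entries) is ignored.
 */
static unsigned long long sum_rx_ring_errors(const char *text)
{
    unsigned long long total = 0;

    assert(text != NULL);
    while (*text != '\0') {
        int ring;
        unsigned long long value;

        /* Both conversions succeed only on an rx_ring<N>.error line. */
        if (sscanf(text, "hw.mlxen1.stat.rx_ring%d.error: %llu",
            &ring, &value) == 2)
            total += value;

        /* Advance to the start of the next line, if there is one. */
        const char *eol = strchr(text, '\n');
        if (eol == NULL)
            break;
        text = eol + 1;
    }
    return total;
}
```

Calling it on two snapshots taken some time apart and subtracting the results gives the number of new errors across all rings in that interval.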
I then tried to reproduce these network errors by saturating the link at 40G
full-duplex with iPerf, but I could not get these error counters to increase.
It's strange because the issue is sporadic: I can have traffic on the iSCSI
disks without any problem, and at other times they get disconnected while the
error counters grow.
What should I look at?
What do these rx_ring*.error counters mean? Are they hardware errors?
Below are some numbers to help with the investigation.
(Something odd for the MLX guys: all hw.mlxen*.stat.tx_*_bytes_packets counters
are 0.)
Thank you very much for your help & support!
Best regards,
Ben
# uname -r
11.0-RELEASE-p3
# ifconfig mlxen1
mlxen1: flags=8843 metric 0 mtu 9020
options=ed07bb
ether XX:XX:XX:XX:XX:XX
inet 192.168.2.1 netmask 0x broadcast 192.168.255.255 nd6
options=29
media: Ethernet autoselect (40Gbase-CR4 )
status: active
# mst status
MST devices:
pci0:133:0:0 - MT27500 Family [ConnectX-3]
# flint -d pci0:133:0:0 q
Image type: FS2
FW Version: 2.36.5000
FW Release Date: 26.1.2016
Product Version: 02.36.50.00
Rom Info:type=PXE version=3.4.718 devid=4099
PSID:MT_1090110023
[initiator]# netstat -I mlxen1
Name    Mtu Network       Address                Ipkts Ierrs Idrop      Opkts Oerrs  Coll
mlxen  9020               XX:XX:XX:XX:XX:XX 4095609916     0     0 3321316930     0     0
mlxen     - 192.168.0.0/1 initiator         2020732710     -     - 3242031277     -     -
[target]# netstat -I mlxen1
Name    Mtu Network       Address                Ipkts Ierrs Idrop      Opkts Oerrs  Coll
mlxen  9020               XX:XX:XX:XX:XX:XX 3798170324     0     0 5098312540     0     0
mlxen     - 192.168.0.0/1 target            2462248779     -     - 5057776404     -     -
[initiator]# sysctl hw.mlxen1
hw.mlxen1.stat.rx_ring15.error: 52
hw.mlxen1.stat.rx_ring15.bytes: 3477976760
hw.mlxen1.stat.rx_ring15.packets: 7360524
hw.mlxen1.stat.rx_ring14.error: 77
hw.mlxen1.stat.rx_ring14.bytes: 791762343420
hw.mlxen1.stat.rx_ring14.packets: 142943349
hw.mlxen1.stat.rx_ring13.error: 33
hw.mlxen1.stat.rx_ring13.bytes: 2284826126
hw.mlxen1.stat.rx_ring13.packets: 7781479
hw.mlxen1.stat.rx_ring12.error: 20
hw.mlxen1.stat.rx_ring12.bytes: 730312221216
hw.mlxen1.stat.rx_ring12.packets: 155950019
hw.mlxen1.stat.rx_ring11.error: 57
hw.mlxen1.stat.rx_ring11.bytes: 114233581104
hw.mlxen1.stat.rx_ring11.packets: 69633934
hw.mlxen1.stat.rx_ring10.error: 49
hw.mlxen1.stat.rx_ring10.bytes: 10775291086886
hw.mlxen1.stat.rx_ring10.packets: 1389173314
hw.mlxen1.stat.rx_ring9.error: 68
hw.mlxen1.stat.rx_ring9.bytes: 35171979154
hw.mlxen1.stat.rx_ring9.packets: 86633073
hw.mlxen1.stat.rx_ring8.error: 81
hw.mlxen1.stat.rx_ring8.bytes: 23210482350
hw.mlxen1.stat.rx_ring8.packets: 68058961
hw.mlxen1.stat.rx_ring7.error: 49
hw.mlxen1.stat.rx_ring7.bytes: 5093871869318
hw.mlxen1.stat.rx_ring7.packets: 744833265
hw.mlxen1.stat.rx_ring6.error: 37
hw.mlxen1.stat.rx_ring6.bytes: 90764137790
hw.mlxen1.stat.rx_ring6.packets: 130626431
hw.mlxen1.stat.rx_ring5.error: 7
hw.mlxen1.stat.rx_ring5.bytes: 641902292152
hw.mlxen1.stat.rx_ring5.packets: 76754874
hw.mlxen1.stat.rx_ring4.error: 59
hw.mlxen1.stat.rx_ring4.bytes: 28894253498
hw.mlxen1.stat.rx_ring4.packets: 12545685
hw.mlxen1.stat.rx_ring3.error: 87
hw.mlxen1.stat.rx_ring3.bytes: 1581250152646
hw.mlxen1.stat.rx_ring3.packets: 255027061
hw.mlxen1.stat.rx_ring2.error: 19
hw.mlxen1.stat.rx_ring2.bytes: 47056101376
hw.mlxen1.stat.rx_ring2.packets: 11670049
hw.mlxen1.stat.rx_ring1.error: 76
hw.mlxen1.stat.rx_r