On Wed, Apr 17, 2019 at 3:05 PM Sudarsana Reddy Kalluru <skall...@marvell.com> wrote:
>
> > -----Original Message-----
> > From: Ian Kumlien <ian.kuml...@gmail.com>
> > Sent: Wednesday, April 17, 2019 4:32 PM
> > To: Sudarsana Reddy Kalluru <skall...@marvell.com>
> > Cc: Linux Kernel Network Developers <netdev@vger.kernel.org>; Ariel Elior
> > <ael...@marvell.com>; Ameen Rahman <arah...@marvell.com>
> > Subject: Re: bnx2x - odd behaviour
> >
> > On Wed, Apr 17, 2019 at 9:58 AM Sudarsana Reddy Kalluru
> > <skall...@marvell.com> wrote:
> > >
> > > +Ameen
> > >
> > > Ian,
> > > We couldn't find the root-cause from the logs/register-dump.
> > > Could you please load the driver with link-debugs enabled, i.e., modprobe
> > > bnx2x debug=0x4 or 'ethtool -s <interface> msglvl 0x4'. And collect the
> > > complete kernel logs and the register-dump(collected before performing
> > > ifconfig-down). Please also provide the output of "ethtool -i <interface>".
> >
> > I'll try, this is a production system...
> >
> > Could it be related to the gro changes for UDP that was done in 5.x?
> >
> Thanks for your help. I'm not sure if this is related to gro, link related
> code is handled by different component [management firmware (mfw)]. May be
> the complete logs/register-dump provide some additional pointers. There were
> some fixes in the newer version of mfw, getting the mfw version on the chip
> would help (ethtool -i <interface> provides mfw/boot-code version).
ethtool -i enp2s0f0
driver: bnx2x
version: 1.712.30-0 storm 7.13.1.0
firmware-version: bc 6.2.28 phy baa0.105
expansion-rom-version:
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

What we can see in the logs (not with the linkdebug enabled) is:
apr 12 06:22:35 localhost kernel: bnx2x 0000:02:00.0 enp2s0f0: NIC Link is Down
apr 12 06:22:35 localhost kernel: bond0: link status down for active interface enp2s0f0, disabling it in 1000 ms
apr 12 06:22:35 localhost kernel: bnx2x 0000:02:00.0 enp2s0f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - transmit
apr 12 06:22:35 localhost kernel: bond0: link status up again after 400 ms for interface enp2s0f0
apr 12 06:22:36 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4357(enp2s0f0)]LATCHED attention 0x04000000 (masked)
apr 12 06:22:36 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4361(enp2s0f0)]GRC time-out 0x08004384
apr 12 06:22:37 localhost kernel: bnx2x: [bnx2x_hw_stats_update:869(enp2s0f0)]NIG timer max (1)
apr 12 06:22:37 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4357(enp2s0f0)]LATCHED attention 0x04000000 (masked)
apr 12 06:22:37 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4361(enp2s0f0)]GRC time-out 0x08004384
apr 12 06:22:38 localhost kernel: bnx2x: [bnx2x_hw_stats_update:869(enp2s0f0)]NIG timer max (2)
apr 12 06:22:38 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4357(enp2s0f0)]LATCHED attention 0x04000000 (masked)
apr 12 06:22:38 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4361(enp2s0f0)]GRC time-out 0x08004384
apr 12 06:22:39 localhost kernel: bnx2x: [bnx2x_hw_stats_update:869(enp2s0f0)]NIG timer max (3)
apr 12 06:22:39 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4357(enp2s0f0)]LATCHED attention 0x04000000 (masked)
apr 12 06:22:39 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4361(enp2s0f0)]GRC time-out 0x08004384
apr 12 06:22:40 localhost kernel: bnx2x: [bnx2x_hw_stats_update:869(enp2s0f0)]NIG timer max (4)
apr 12 06:22:40 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4357(enp2s0f0)]LATCHED attention 0x04000000 (masked)
apr 12 06:22:40 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4361(enp2s0f0)]GRC time-out 0x08004384
apr 12 06:22:41 localhost kernel: bnx2x: [bnx2x_hw_stats_update:869(enp2s0f0)]NIG timer max (5)
apr 12 06:22:41 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4357(enp2s0f0)]LATCHED attention 0x04000000 (masked)
apr 12 06:22:41 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4361(enp2s0f0)]GRC time-out 0x08004384
apr 12 06:22:42 localhost kernel: bnx2x: [bnx2x_hw_stats_update:869(enp2s0f0)]NIG timer max (6)
apr 12 06:22:42 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4357(enp2s0f0)]LATCHED attention 0x04000000 (masked)
apr 12 06:22:42 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4361(enp2s0f0)]GRC time-out 0x08004384
apr 12 06:22:43 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4357(enp2s0f0)]LATCHED attention 0x04000000 (masked)
apr 12 06:22:43 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4361(enp2s0f0)]GRC time-out 0x08004384
apr 12 06:22:44 localhost kernel: bnx2x: [bnx2x_hw_stats_update:869(enp2s0f0)]NIG timer max (7)

... and so it begins =)

> > > Thanks,
> > > Sudarsana
> > > > -----Original Message-----
> > > > From: Ian Kumlien <ian.kuml...@gmail.com>
> > > > Sent: Friday, April 12, 2019 4:39 PM
> > > > To: Sudarsana Reddy Kalluru <skall...@marvell.com>
> > > > Cc: Linux Kernel Network Developers <netdev@vger.kernel.org>; Ariel
> > > > Elior <ael...@marvell.com>
> > > > Subject: Re: bnx2x - odd behaviour
> > > >
> > > > On Fri, Apr 12, 2019 at 12:53 PM Sudarsana Reddy Kalluru
> > > > <skall...@marvell.com> wrote:
> > > > >
> > > > > Hi Ian,
> > > > > Thanks for your info/help. There's not much info in the logs
> > > > > (e.g., FW traces, calltraces). Will contact our firmware team on the
> > > > > register-dump analysis and provide you the update.
> > > >
> > > > Thank you =)
> > > >
> > > > > Thanks,
> > > > > Sudarsana
> > > > > > -----Original Message-----
> > > > > > From: Ian Kumlien <ian.kuml...@gmail.com>
> > > > > > Sent: Friday, April 12, 2019 2:44 PM
> > > > > > To: Sudarsana Reddy Kalluru <skall...@marvell.com>
> > > > > > Cc: Linux Kernel Network Developers <netdev@vger.kernel.org>;
> > > > > > Ariel Elior <ael...@marvell.com>
> > > > > > Subject: Re: bnx2x - odd behaviour
> > > > > >
> > > > > > Finally!
> > > > > >
> > > > > > Just had a machine with the same issue!
> > > > > >
> > > > > > On Thu, Apr 11, 2019 at 10:56 AM Ian Kumlien
> > > > > > <ian.kuml...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > On Thu, Apr 4, 2019 at 4:27 PM Sudarsana Reddy Kalluru
> > > > > > > <skall...@marvell.com> wrote:
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > > We are not aware of this issue. Please collect the
> > > > > > > > register dump i.e., "ethtool -d <interface>" output when
> > > > > > > > this issue happens (before performing link-flap) and share
> > > > > > > > it for the analysis.
> > > > > >
> > > > > > Sent the dump separately :)