On Fri, Apr 19, 2019 at 7:23 AM Sudarsana Reddy Kalluru
<skall...@marvell.com> wrote:
>
> Hi Ian,
> Thanks for your info. Mfw team already analyzed the "nig timer" related
> logs but can't infer anything. From the boot-code version, the device look to
> be from the older generation of Broadcom nics. Besides the
> elink-logs/register-dump, could you also share the lspci output (lspci -vvv).

Yes, this is older machines =)

Sorry for the delay in answering, there has been a holiday here, =)

lspci output:
02:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57711E 10-Gigabit PCIe
        Subsystem: Hewlett-Packard Company NC532i Dual Port 10GbE Multifunction BL-C Adapter
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 41
        Region 0: Memory at fb000000 (64-bit, non-prefetchable) [size=8M]
        Region 2: Memory at fa800000 (64-bit, non-prefetchable) [size=8M]
        Capabilities: [48] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable+ DSel=0 DScale=1 PME-
        Capabilities: [50] Vital Product Data
                Product Name: HP NC532i DP 10GbE Multifunction BL-c Adapter
                Read-only fields:
                        [PN] Part number: N/A
                        [EC] Engineering changes: N/A
                        [SN] Serial number: 0123456789
                        [MN] Manufacture ID: 31 34 65 34
                        [RV] Reserved: checksum good, 39 byte(s) reserved
                End
        Capabilities: [58] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000 Data: 0000
        Capabilities: [a0] MSI-X: Enable+ Count=17 Masked-
                Vector table: BAR=0 offset=00440000
                PBA: BAR=0 offset=00441800
        Capabilities: [ac] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 <2us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr+ NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Exit Latency L0s <2us, L1 <2us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Device Serial Number 44-1e-a1-ff-fe-45-a6-38
        Capabilities: [110 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UESvrt: DLP- SDES+ TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [150 v1] Power Budgeting <?>
        Capabilities: [160 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Kernel driver in use: bnx2x
        Kernel modules: bnx2x

---

> Thanks,
> Sudarsana
> > -----Original Message-----
> > From: Ian Kumlien <ian.kuml...@gmail.com>
> > Sent: Wednesday, April 17, 2019 6:51 PM
> > To: Sudarsana Reddy Kalluru <skall...@marvell.com>
> > Cc: Linux Kernel Network Developers <netdev@vger.kernel.org>; Ariel Elior
> > <ael...@marvell.com>; Ameen Rahman <arah...@marvell.com>
> > Subject: Re: bnx2x - odd behaviour
> >
> > On Wed, Apr 17, 2019 at 3:05 PM Sudarsana Reddy Kalluru
> > <skall...@marvell.com> wrote:
> > >
> > > > -----Original Message-----
> > > > From: Ian Kumlien <ian.kuml...@gmail.com>
> > > > Sent: Wednesday, April 17, 2019 4:32 PM
> > > > To: Sudarsana Reddy Kalluru <skall...@marvell.com>
> > > > Cc: Linux Kernel Network Developers <netdev@vger.kernel.org>; Ariel
> > > > Elior <ael...@marvell.com>; Ameen Rahman <arah...@marvell.com>
> > > > Subject: Re: bnx2x - odd behaviour
> > > >
> > > > On Wed, Apr 17, 2019 at 9:58 AM Sudarsana Reddy Kalluru
> > > > <skall...@marvell.com> wrote:
> > > > >
> > > > > +Ameen
> > > > >
> > > > > Ian,
> > > > > We couldn't find the root-cause from the logs/register-dump.
> > > > > Could you please load the driver with link-debugs enabled, i.e.,
> > > > > modprobe bnx2x debug=0x4 or 'ethtool -s <interface> msglvl 0x4'. And
> > > > > collect the complete kernel logs and the register-dump(collected
> > > > > before performing ifconfig-down). Please also provide the output of
> > > > > "ethtool -i <interface>".
> > > >
> > > > I'll try, this is a production system...
> > > >
> > > > Could it be related to the gro changes for UDP that was done in 5.x?
> > > >
> > > Thanks for your help. I'm not sure if this is related to gro, link
> > > related code is handled by different component [management firmware
> > > (mfw)]. May be the complete logs/register-dump provide some additional
> > > pointers. There were some fixes in the newer version of mfw, getting the
> > > mfw version on the chip would help (ethtool -i <interface> provides
> > > mfw/boot-code version).
> >
> > ethtool -i enp2s0f0
> > driver: bnx2x
> > version: 1.712.30-0 storm 7.13.1.0
> > firmware-version: bc 6.2.28 phy baa0.105
> > expansion-rom-version:
> > bus-info: 0000:02:00.0
> > supports-statistics: yes
> > supports-test: yes
> > supports-eeprom-access: yes
> > supports-register-dump: yes
> > supports-priv-flags: yes
> >
> > What we can see in the logs (not with the linkdebug enabled) is:
> > apr 12 06:22:35 localhost kernel: bnx2x 0000:02:00.0 enp2s0f0: NIC Link is Down
> > apr 12 06:22:35 localhost kernel: bond0: link status down for active interface enp2s0f0, disabling it in 1000 ms
> > apr 12 06:22:35 localhost kernel: bnx2x 0000:02:00.0 enp2s0f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - transmit
> > apr 12 06:22:35 localhost kernel: bond0: link status up again after 400 ms for interface enp2s0f0
> > apr 12 06:22:36 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4357(enp2s0f0)]LATCHED attention 0x04000000 (masked)
> > apr 12 06:22:36 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4361(enp2s0f0)]GRC time-out 0x08004384
> > apr 12 06:22:37 localhost kernel: bnx2x: [bnx2x_hw_stats_update:869(enp2s0f0)]NIG timer max (1)
> > apr 12 06:22:37 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4357(enp2s0f0)]LATCHED attention 0x04000000 (masked)
> > apr 12 06:22:37 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4361(enp2s0f0)]GRC time-out 0x08004384
> > apr 12 06:22:38 localhost kernel: bnx2x: [bnx2x_hw_stats_update:869(enp2s0f0)]NIG timer max (2)
> > apr 12 06:22:38 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4357(enp2s0f0)]LATCHED attention 0x04000000 (masked)
> > apr 12 06:22:38 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4361(enp2s0f0)]GRC time-out 0x08004384
> > apr 12 06:22:39 localhost kernel: bnx2x: [bnx2x_hw_stats_update:869(enp2s0f0)]NIG timer max (3)
> > apr 12 06:22:39 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4357(enp2s0f0)]LATCHED attention 0x04000000 (masked)
> > apr 12 06:22:39 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4361(enp2s0f0)]GRC time-out 0x08004384
> > apr 12 06:22:40 localhost kernel: bnx2x: [bnx2x_hw_stats_update:869(enp2s0f0)]NIG timer max (4)
> > apr 12 06:22:40 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4357(enp2s0f0)]LATCHED attention 0x04000000 (masked)
> > apr 12 06:22:40 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4361(enp2s0f0)]GRC time-out 0x08004384
> > apr 12 06:22:41 localhost kernel: bnx2x: [bnx2x_hw_stats_update:869(enp2s0f0)]NIG timer max (5)
> > apr 12 06:22:41 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4357(enp2s0f0)]LATCHED attention 0x04000000 (masked)
> > apr 12 06:22:41 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4361(enp2s0f0)]GRC time-out 0x08004384
> > apr 12 06:22:42 localhost kernel: bnx2x: [bnx2x_hw_stats_update:869(enp2s0f0)]NIG timer max (6)
> > apr 12 06:22:42 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4357(enp2s0f0)]LATCHED attention 0x04000000 (masked)
> > apr 12 06:22:42 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4361(enp2s0f0)]GRC time-out 0x08004384
> > apr 12 06:22:43 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4357(enp2s0f0)]LATCHED attention 0x04000000 (masked)
> > apr 12 06:22:43 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4361(enp2s0f0)]GRC time-out 0x08004384
> > apr 12 06:22:44 localhost kernel: bnx2x: [bnx2x_hw_stats_update:869(enp2s0f0)]NIG timer max (7)
> > ... and so it begins =)
> >
> > > > > Thanks,
> > > > > Sudarsana
> > > > > > -----Original Message-----
> > > > > > From: Ian Kumlien <ian.kuml...@gmail.com>
> > > > > > Sent: Friday, April 12, 2019 4:39 PM
> > > > > > To: Sudarsana Reddy Kalluru <skall...@marvell.com>
> > > > > > Cc: Linux Kernel Network Developers <netdev@vger.kernel.org>;
> > > > > > Ariel Elior <ael...@marvell.com>
> > > > > > Subject: Re: bnx2x - odd behaviour
> > > > > >
> > > > > > On Fri, Apr 12, 2019 at 12:53 PM Sudarsana Reddy Kalluru
> > > > > > <skall...@marvell.com> wrote:
> > > > > > >
> > > > > > > Hi Ian,
> > > > > > > Thanks for your info/help. There's not much info in the
> > > > > > > logs (e.g., FW traces, calltraces). Will contact our firmware
> > > > > > > team on the register-dump analysis and provide you the update.
> > > > > >
> > > > > > Thank you =)
> > > > > >
> > > > > > > Thanks,
> > > > > > > Sudarsana
> > > > > > > > -----Original Message-----
> > > > > > > > From: Ian Kumlien <ian.kuml...@gmail.com>
> > > > > > > > Sent: Friday, April 12, 2019 2:44 PM
> > > > > > > > To: Sudarsana Reddy Kalluru <skall...@marvell.com>
> > > > > > > > Cc: Linux Kernel Network Developers
> > > > > > > > <netdev@vger.kernel.org>; Ariel Elior <ael...@marvell.com>
> > > > > > > > Subject: Re: bnx2x - odd behaviour
> > > > > > > >
> > > > > > > > Finally!
> > > > > > > >
> > > > > > > > Just had a machine with the same issue!
> > > > > > > >
> > > > > > > > On Thu, Apr 11, 2019 at 10:56 AM Ian Kumlien
> > > > > > > > <ian.kuml...@gmail.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Apr 4, 2019 at 4:27 PM Sudarsana Reddy Kalluru
> > > > > > > > > <skall...@marvell.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > > We are not aware of this issue. Please collect the
> > > > > > > > > > register dump i.e., "ethtool -d <interface>" output when
> > > > > > > > > > this issue happens (before performing link-flap) and share
> > > > > > > > > > it for the analysis.
> > > > > > > >
> > > > > > > > Sent the dump separately :)
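
For anyone triaging a similar capture, the recurring messages in the kernel log quoted in this thread (link flap, then repeating "GRC time-out" and "NIG timer max" lines) can be tallied with a short script. This is only a sketch under stated assumptions: the regular expressions are derived from the message formats visible in the quoted log, while the function name summarize and the SAMPLE excerpt are illustrative names, not part of bnx2x or any existing tool.

```python
import re
from collections import Counter

# Patterns derived from the message formats visible in the quoted log.
# They are assumptions about the text layout, not a bnx2x-defined format.
GRC_RE = re.compile(r"GRC time-out (0x[0-9a-fA-F]+)")
NIG_RE = re.compile(r"NIG timer max \((\d+)\)")
LINK_RE = re.compile(r"NIC Link is (Up|Down)")


def summarize(log_text):
    """Tally the recurring bnx2x messages found in a kernel log excerpt."""
    grc = Counter(GRC_RE.findall(log_text))            # GRC address -> count
    nig = [int(n) for n in NIG_RE.findall(log_text)]   # NIG timer counters
    links = LINK_RE.findall(log_text)                  # ordered Up/Down events
    return {
        "grc_timeouts": dict(grc),
        "nig_timer_max": max(nig) if nig else None,
        "link_events": links,
    }


# Hypothetical excerpt: a few lines copied from the log quoted in the thread.
SAMPLE = """\
apr 12 06:22:35 localhost kernel: bnx2x 0000:02:00.0 enp2s0f0: NIC Link is Down
apr 12 06:22:35 localhost kernel: bnx2x 0000:02:00.0 enp2s0f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - transmit
apr 12 06:22:36 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4361(enp2s0f0)]GRC time-out 0x08004384
apr 12 06:22:37 localhost kernel: bnx2x: [bnx2x_hw_stats_update:869(enp2s0f0)]NIG timer max (1)
apr 12 06:22:37 localhost kernel: bnx2x: [bnx2x_attn_int_deasserted3:4361(enp2s0f0)]GRC time-out 0x08004384
apr 12 06:22:38 localhost kernel: bnx2x: [bnx2x_hw_stats_update:869(enp2s0f0)]NIG timer max (2)
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Feeding it a full journal export instead of SAMPLE would show whether the time-outs always hit the same GRC address and how far the NIG timer counter climbs after a link flap. Mail-client line wrapping can split a message mid-pattern, so the script is best run on the raw log rather than on quoted mail text.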