[E1000-devel] Re-2: e1000e detected hardware unit hang problem

Lars Maschke Mon, 04 Mar 2013 15:33:43 -0800

Hello Tushar,

First of all, thanks for your quick reply.


That's the point: I don't know why this occurs. When I get the chance, I see an
e1000e driver failure on the console. The server is completely down and I
can't log on to get any other information.

The error isn't logged in dmesg.log or syslog on my Debian system. Nothing at
all is logged after the crash.

Only a full reset solves the problem. I get this error every two, three or four
days. No special cron job is running at the time of the crash. It only occurs
at night, between 0:00 and 2:30.

From now on I will try to reset networking every night with the following
script, and I hope it's a good idea:

---
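#!/bin/sh
# Nightly workaround: reload the e1000e driver and bring networking back up.

/etc/init.d/networking stop
/sbin/rmmod e1000e
# RxIntDelay=0,0: no receive interrupt delay; IntMode=1,1: MSI interrupts
# (one comma-separated value per port)
/sbin/modprobe e1000e RxIntDelay=0,0 IntMode=1,1
/etc/init.d/networking start
# Disable TCP segmentation offload on eth0
/sbin/ethtool -K eth0 tso off
# Re-apply the Shorewall firewall rules to the reloaded interface
/sbin/shorewall restart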
---
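The options in the script above are only passed to that one modprobe call. A
minimal sketch of how the same options could be applied whenever the module is
loaded, assuming modprobe reads *.conf files from /etc/modprobe.d/; the file
name e1000e.conf and the function name write_e1000e_options are hypothetical:

```shell
# write_e1000e_options [FILE]: persist the e1000e options from the nightly
# script. Assumption: modprobe reads /etc/modprobe.d/*.conf; the default
# file name e1000e.conf is hypothetical.
write_e1000e_options() {
    conf=${1:-/etc/modprobe.d/e1000e.conf}
    {
        echo '# Intel e1000e driver options, one comma-separated value per port'
        echo 'options e1000e RxIntDelay=0,0 IntMode=1,1'
    } > "$conf"
}
```

Run as root; the options take effect the next time e1000e is loaded.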

As you can see, I have read some other posts and the README of the official
driver. Do you think "RxIntDelay=0,0" could make my problem go away? I also
tried IntMode=0,0 without success. With the kernel's PCI MSI option enabled, I
can see that the eth0 network card has an exclusive interrupt.
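Whether eth0 really has an interrupt to itself can also be checked from
/proc/interrupts. A minimal sketch, assuming the usual layout in which devices
sharing an IRQ are listed comma-separated at the end of the line; the function
names irq_line and irq_is_exclusive are hypothetical:

```shell
# Heuristic (assumption): a /proc/interrupts line that names the interface
# and contains no comma belongs to that interface alone.

# irq_line IFACE [FILE]: print the interrupt line(s) naming IFACE.
irq_line() {
    # IFACE must be a whole name: preceded by a blank or comma and followed
    # by a comma or the end of the line (so "eth0" does not match "eth0-rx-0").
    grep -E "[[:space:],]$1(,|[[:space:]]*\$)" "${2:-/proc/interrupts}"
}

# irq_is_exclusive IFACE [FILE]: exit 0 if IFACE does not share its IRQ.
irq_is_exclusive() {
    line=$(irq_line "$1" "$2") || return 1   # interface not listed
    case $line in
        *,*) return 1 ;;                       # shared with another device
        *)   return 0 ;;
    esac
}
```

For example: irq_is_exclusive eth0 && echo "eth0 has its own IRQ"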

I have had this trouble since I first started using the S1200BLT board. I
normally use this board in VMware ESXi 4.1, 5.0 and 5.1 servers without any
problems. That's why I don't understand this.

Here's the output of lspci:

---
03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
        Subsystem: Intel Corporation Device 3578
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 44
        Region 0: Memory at c2300000 (32-bit, non-prefetchable) [size=128K]
        Region 2: I/O ports at 2000 [size=32]
        Region 3: Memory at c2320000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee0f00c  Data: 4172
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <128ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [a0] MSI-X: Enable- Count=5 Masked-
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00002000
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UESvrt: DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt+ RxOF+ MalfTLP+ ECRC- UnsupReq+ ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [140 v1] Device Serial Number 00-1e-67-ff-ff-XX-XX-XX
        Kernel driver in use: e1000e
---
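Two settings discussed in this thread, MSI and ASPM, can be read straight out
of a dump like the one above. A minimal sketch that filters lspci -vvv output
on stdin; the function names aspm_state and msi_enabled are hypothetical, and
lspci may need root to show the full capability list:

```shell
# Usage (assumed slot from the dump above):  lspci -vvv -s 03:00.0 | aspm_state

# aspm_state: print the ASPM setting from the "LnkCtl:" line on stdin.
aspm_state() {
    sed -n 's/.*LnkCtl:[[:space:]]*ASPM \([^;]*\);.*/\1/p'
}

# msi_enabled: exit 0 if the MSI capability reads "MSI: Enable+".
# The MSI-X line reads "MSI-X: Enable..." and is deliberately not matched.
msi_enabled() {
    grep -q 'MSI: Enable+'
}
```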

None of the kernels I used (3.3.8, 3.4.32 and 3.7.9) worked with their bundled
e1000e drivers. Only one of the two network cards is connected to the switch.
Switching to the other network connector or changing the flow control or link
speed settings didn't solve the problem.

Do you think the problem could be caused by the other Intel "e1000" driver
also being loaded on the machine?
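The lspci dump above already reports "Kernel driver in use: e1000e" for the
82574L. Whether e1000 is loaded at all can be checked from /proc/modules, the
file lsmod reads. A minimal sketch; the function name module_loaded is
hypothetical:

```shell
# module_loaded NAME [FILE]: exit 0 if kernel module NAME is loaded.
# The first column of /proc/modules is the module name.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Example: report which of the two Intel drivers are currently loaded.
# for m in e1000 e1000e; do module_loaded "$m" && echo "$m is loaded"; done
```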

Greets
Lars


--------------------------------------------------------------------------     
Subject: RE: e1000e detected hardware unit hang problem (04-Mar-2013 23:03)
From:    Dave, Tushar N <[email protected]>
To:      lm


Lars,
 
Sorry that you are having issues with the board. Please always send your email
to, or CC it to, [email protected].
What is the device? Please send the lspci -vvv output (after the issue occurs).
Send the full dmesg log after the issue occurs.
How quickly does the issue occur? Any reproduction steps?
Was it ever working before with any known-good driver/kernel version?
-Tushar
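The information requested above could be gathered in one step while the
machine is still reachable (the dmesg ring buffer does not survive a reset).
A minimal sketch; the function name, the output file names and the defaults
eth0 and 03:00.0 (taken from the script and lspci output earlier in this
thread) are assumptions:

```shell
# collect_e1000e_diag DIR [IFACE] [PCI_SLOT]: save diagnostics into DIR.
# Defaults (assumptions): interface eth0, PCI slot 03:00.0.
collect_e1000e_diag() {
    dir=$1
    iface=${2:-eth0}
    slot=${3:-03:00.0}
    mkdir -p "$dir" || return 1
    lspci -vvv -s "$slot" > "$dir/lspci-vvv.txt" 2>&1   # device details
    dmesg > "$dir/dmesg.txt" 2>&1                       # current kernel log
    ethtool -i "$iface" > "$dir/ethtool-i.txt" 2>&1     # driver and firmware
    ethtool -S "$iface" > "$dir/ethtool-S.txt" 2>&1     # NIC statistics
    uname -a > "$dir/uname.txt" 2>&1                    # running kernel
}
```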
 
From: Lars Maschke
Sent: Saturday, March 02, 2013 11:18 PM
To: Dave, Tushar N
Subject: e1000e detected hardware unit hang problem
 
Dear Tushar,
 
I saw in some forums discussing our problem that you develop the e1000e driver.
 
We are having big trouble with the network chips on our S1200BLT mainboard.
Every two or three days we get the "detected hardware unit hang" failure. The
whole server becomes unreachable and there's no possibility to log in on the
console.
 
Our system is Debian with kernel 3.7.9. I've also installed the latest driver
from intel.com, as you can see here:
 
driver: e1000e
version: 2.2.14-NAPI
firmware-version: 2.1-0
bus-info: 0000:03:00.0
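The fields above have the format of "ethtool -i eth0" output (an assumption
about how they were captured). When comparing driver and firmware versions
across the kernels tried, a single field can be pulled out with a one-line
filter. A minimal sketch; the function name ethtool_field is hypothetical:

```shell
# ethtool_field FIELD: print the value of FIELD from "ethtool -i" style
# output on stdin, e.g.  ethtool -i eth0 | ethtool_field version
ethtool_field() {
    # Anchored at line start, so "version" does not match "firmware-version".
    sed -n "s/^$1: //p"
}
```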
 
This is what I tried:
Kernel 3.3.8
Kernel command-line options: "pci=nomsi", "pcie_aspm=off"
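To confirm that boot options like these actually reached the running kernel,
/proc/cmdline can be checked word by word. A minimal sketch; the function name
cmdline_has is hypothetical:

```shell
# cmdline_has PARAM [FILE]: exit 0 if PARAM appears as a whole word on the
# kernel command line (default /proc/cmdline).
cmdline_has() {
    # Put each whitespace-separated word on its own line, then match exactly.
    tr -s ' \t' '\n\n' < "${2:-/proc/cmdline}" | grep -Fqx -- "$1"
}

# Example:
# for p in pci=nomsi pcie_aspm=off; do cmdline_has "$p" && echo "$p is active"; done
```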
 
We have other server boards, such as the S5520 and S3420, which cause no trouble.
 
Can you tell me whether there is any chance of getting my server to work correctly?
 
Best Regards
Lars Maschke

To: [email protected]
Cc: [email protected]



_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit 
http://communities.intel.com/community/wired
