Hello Tushar,
first of all, thanks for your quick reply.
That's the point: I don't know why this occurs. When I get the chance, I see a
failure of the e1000e driver on the console. The server is completely down and I
can't log on to get any other information.
The error isn't logged in dmesg.log or syslog on my Debian system. Nothing is
logged at all after the crash.
Only a full reset solves the problem. I get this error every two, three or four
days. No special cron job is running at the time of the crash. It occurs only at
night, between 0:00 and 2:30.
From now on I will try to reset networking every night with the following shell
script, and I hope that it's a good idea:
---
#!/bin/sh
/etc/init.d/networking stop
/sbin/rmmod e1000e
/sbin/modprobe e1000e RxIntDelay=0,0 IntMode=1,1
/etc/init.d/networking start
/sbin/ethtool -K eth0 tso off
/sbin/shorewall restart
---
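Since the script above turns TSO off at the end, a small check can confirm that the setting actually took effect after the reload. This is only a sketch: the helper names tso_state and tso_is_off are made up for illustration, and it assumes that `ethtool -k` prints a line starting with "tcp-segmentation-offload:" (newer ethtool) or "tcp segmentation offload:" (older ethtool).

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -k` output on stdin and print the
# TSO state ("on" or "off"). Accepts both the older and newer label.
tso_state() {
    sed -n 's/^tcp[- ]segmentation[- ]offload: *\([a-z]*\).*/\1/p'
}

# Hypothetical helper: succeeds only if TSO is reported as off on the
# interface given as the first argument.
tso_is_off() {
    [ "$(/sbin/ethtool -k "$1" 2>/dev/null | tso_state)" = "off" ]
}

# Example call, run by hand after the reset script:
# tso_is_off eth0 && echo "TSO is disabled on eth0"
```

Keeping the parser separate from the ethtool call makes it possible to try it on saved output without touching the interface.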
As you can see, I have read some other posts and the README of the official
driver. Do you think "RxIntDelay=0,0" can make my problem go away? I also tried
IntMode=0,0 with no success. With the MSI-PCI option of the kernel I can see
that the eth0 network card has an exclusive interrupt.
I have had this trouble since the first use of the S1200BLT board. Normally I
use this board with VMware ESXi servers, versions 4.1, 5.0 and 5.1, with no
problems. That's why I don't understand this.
Here's the output of lspci:
---
03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
    Subsystem: Intel Corporation Device 3578
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 44
    Region 0: Memory at c2300000 (32-bit, non-prefetchable) [size=128K]
    Region 2: I/O ports at 2000 [size=32]
    Region 3: Memory at c2320000 (32-bit, non-prefetchable) [size=16K]
    Capabilities: [c8] Power Management version 2
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000fee0f00c Data: 4172
    Capabilities: [e0] Express (v1) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <128ns, L1 <64us
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
    Capabilities: [a0] MSI-X: Enable- Count=5 Masked-
        Vector table: BAR=3 offset=00000000
        PBA: BAR=3 offset=00002000
    Capabilities: [100 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
        UESvrt: DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt+ RxOF+ MalfTLP+ ECRC- UnsupReq+ ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
    Capabilities: [140 v1] Device Serial Number 00-1e-67-ff-ff-XX-XX-XX
    Kernel driver in use: e1000e
---
None of the kernels I have used (3.3.8, 3.4.32 and 3.7.9) worked with the e1000e
driver they ship. Only one of the two network cards is attached to the switch.
Switching to the other network connector, or changing the flow control or link
speed settings, didn't solve the problem.
Do you think the problem could occur because the other Intel "e1000" driver is
also loaded on the machine?
Greets
Lars
--------------------------------------------------------------------------
Subject: RE: e1000e detected hardware unit hang problem (04-Mar-2013 23:03)
From: Dave, Tushar N <[email protected]>
To: lm
Lars,
Sorry that you are having an issue with the board. Please always send your email
to, or CC, [email protected].
What is the device? Please send the output of lspci -vvv (after the issue occurs).
Send the full dmesg log after the issue occurs.
How quickly does the issue occur? Any reproduction steps?
Was it ever working before with a known-good driver/kernel version?
-Tushar
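
The information requested above could be gathered in one go with something like the following sketch. It assumes the machine is still reachable when it runs (which may not be the case after a hang), and the function name collect_diag and the output directory are hypothetical; the interface eth0 and the PCI address 03:00.0 are the ones mentioned in this thread.

```shell
#!/bin/sh
# collect_diag OUTDIR IFACE PCI_ADDR
# Hypothetical helper: save lspci, dmesg and ethtool output into OUTDIR.
# A missing or failing tool does not abort the collection; its error
# message ends up in the corresponding file instead.
collect_diag() {
    outdir=$1
    iface=$2
    pci=$3
    mkdir -p "$outdir" || return 1
    lspci -vvv -s "$pci" > "$outdir/lspci.txt"     2>&1 || true
    dmesg                > "$outdir/dmesg.txt"     2>&1 || true
    ethtool -i "$iface"  > "$outdir/ethtool-i.txt" 2>&1 || true
    ethtool -S "$iface"  > "$outdir/ethtool-S.txt" 2>&1 || true
    return 0
}

# Example call (the directory name is only an illustration):
# collect_diag /var/tmp/e1000e-diag eth0 03:00.0
```

Running it once while the system is healthy and again as soon as possible after a hang would give two sets of files to compare.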
From: Lars Maschke
Sent: Saturday, March 02, 2013 11:18 PM
To: Dave, Tushar N
Subject: e1000e detected hardware unit hang problem
Dear Tushar,
I saw in some forums regarding our problem that you develop the e1000e driver.
We have big trouble with the network chips on our S1200BLT mainboard. Every two
or three days we get the "detected hardware unit hang" failure. The complete
server is unreachable and there's no possibility of logging in on the console.
Our system is Debian with kernel 3.7.9. I've also installed the latest driver
from intel.com, as you can see here:
driver: e1000e
version: 2.2.14-NAPI
firmware-version: 2.1-0
bus-info: 0000:03:00.0
This is what I have tried:
Kernel 3.3.8
Kernel boot parameters: "pci=nomsi", "pcie_aspm=off"
We have other server boards, such as the S5520 and S3420, which cause no trouble.
Can you tell me if there is any chance of getting my server working correctly?
Best Regards
Lars Maschke
To: [email protected]
Cc: [email protected]
_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired