Re: watchdog timeout panic in e1000 driver

Auke Kok Wed, 25 Oct 2006 08:15:44 -0700

Kenzo Iwami wrote:

Hi,

This problem originally occurred in a very large cluster system using snmp
for server management. About two servers panicked each day. The program I sent
is to reproduce this problem in a very short time. It does occur under normal
load when there are a lot of servers.

hmm, not good - does your snmp daemon use ethtool excessively? That would certainly be painful to the driver (any driver!).


I only looked at the panic message after this problem occurred.
I could tell that the snmp daemon caused the panic while trying to process
the ethtool ioctl, but I don't know how often this was called.
However, it shouldn't be excessively called because it occurred on a production
system while it was idle.

Anyway, as I said in the same e-mail, we're working on reducing the lock timeout to a reasonable time. This will unfortunately take some time, as we need to change some major components in the driver to make sure this doesn't happen.


How about the following approach?
If acquiring the semaphore fails inside the interrupt handler, the attempt
is abandoned immediately instead of waiting for the timeout.
However, I don't know whether this method affects other processes.

With the current hardware being accessed simultaneously from several users in the kernel, that would lead to large problems - the watchdog task accesses it every 2 seconds as it reads the PHY link status, so when one of those fails the driver would have no choice but to reset the entire device.
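
To make the trade-off concrete, here is a minimal userspace sketch in C11. It is not e1000 code: the real semaphore is a hardware resource and the watchdog runs in kernel context, so every name below (hw_sem, hw_sem_try_acquire, hw_sem_acquire_bounded, watchdog_step_no_wait) is hypothetical and only models the idea. It contrasts a single non-waiting attempt with a bounded retry, and shows why a watchdog step that gives up immediately can only report that a reset is needed whenever another user holds the semaphore.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared hardware semaphore (assumption,
 * not the driver's actual data structure). */
typedef struct {
    atomic_flag busy;
} hw_sem;

#define HW_SEM_INIT { ATOMIC_FLAG_INIT }

/* Single attempt: returns true if we took the semaphore, never waits. */
static bool hw_sem_try_acquire(hw_sem *s)
{
    return !atomic_flag_test_and_set_explicit(&s->busy, memory_order_acquire);
}

static void hw_sem_release(hw_sem *s)
{
    atomic_flag_clear_explicit(&s->busy, memory_order_release);
}

/* Bounded retry: gives up after max_attempts tries. A real driver would
 * delay between tries; the attempt count here only models a timeout. */
static bool hw_sem_acquire_bounded(hw_sem *s, unsigned max_attempts)
{
    for (unsigned i = 0; i < max_attempts; i++) {
        if (hw_sem_try_acquire(s))
            return true;
    }
    return false;
}

enum wd_result { WD_LINK_READ, WD_NEEDS_RESET };

/* Watchdog step under the "abandon immediately" policy: if the semaphore
 * is held by anyone else, the link status cannot be read and the only
 * remaining option in this model is to ask for a device reset. */
static enum wd_result watchdog_step_no_wait(hw_sem *s)
{
    if (!hw_sem_try_acquire(s))
        return WD_NEEDS_RESET;
    /* The PHY link status read would happen here. */
    hw_sem_release(s);
    return WD_LINK_READ;
}
```

In this model, any concurrent holder (for example an ethtool ioctl in progress) turns a routine 2-second watchdog pass into WD_NEEDS_RESET, which is the failure mode described above. A shorter but non-zero bound, as in hw_sem_acquire_bounded, is closer to the "reasonable lock timeout" direction mentioned earlier in the thread.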


Auke
