Folks,

We have been battling a tough one here for a couple of weeks.  We think we
have isolated it, though, and are seeking comments before we submit a patch.
Here's the deal:

tg3 driver v3.23 as packaged with SuSE SLES9 SP2
multiprocessor IA64 platform

We were experiencing panics during boot at a rate of approximately 1 in
80 boots.  The panic occurred in the tg3 driver every time it happened
and was a PCI infrastructure failure (target device not responding).

After lots of investigation we believe that tg3_open() is being called by
the SuSE hotplug infrastructure while tg3_init_one() is still in flight.

What happens then is that tg3_open() commences to reset the Broadcom
device BEFORE tg3_init_one() has called tg3_pci_save_state().
The Broadcom device is now whacked into oblivion just when
tg3_pci_save_state() does its thing... and it saves off a copy of ALL ZERO
PCI config registers.  A little bit later, we do a pci_restore_state(),
putting all the zeros back to the part and ensuring that the BARs decode
to never-never land.
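
To make the ordering problem concrete, here is a small user-space model of
it.  This is only a sketch, not driver code: struct fake_dev, the fake_*()
helpers, run_sequence() and the BAR value 0xfe000000 are all made up for
illustration, and the model assumes that a device in reset reads back zeros
from config space, which is what we observed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_WORDS  16   /* 64-byte standard PCI config header, as dwords */
#define BAR0_INDEX 4    /* BAR0 sits at config offset 0x10, i.e. dword 4 */

/* Hypothetical stand-in for the device plus the driver's saved copy. */
struct fake_dev {
    uint32_t cfg[CFG_WORDS];    /* live config registers on the "device" */
    uint32_t saved[CFG_WORDS];  /* what the driver saved for later restore */
};

/* Assumption: a chip reset leaves config space reading as all zeros. */
static void fake_reset(struct fake_dev *d)
{
    memset(d->cfg, 0, sizeof d->cfg);
}

/* Models the save step: copy whatever config space holds right now. */
static void fake_save_state(struct fake_dev *d)
{
    memcpy(d->saved, d->cfg, sizeof d->cfg);
}

/* Models the restore step: write the saved copy back to the device. */
static void fake_restore_state(struct fake_dev *d)
{
    memcpy(d->cfg, d->saved, sizeof d->cfg);
}

/*
 * Runs save/reset/restore and returns BAR0 afterwards.
 * reset_first == 0: save, reset, restore (the intended order).
 * reset_first != 0: reset, save, restore (the order we hit in the race).
 */
static uint32_t run_sequence(int reset_first)
{
    struct fake_dev d;

    memset(&d, 0, sizeof d);
    d.cfg[BAR0_INDEX] = 0xfe000000u;    /* made-up BAR assignment */

    if (reset_first)
        fake_reset(&d);
    fake_save_state(&d);
    if (!reset_first)
        fake_reset(&d);
    fake_restore_state(&d);

    return d.cfg[BAR0_INDEX];
}
```

In this model run_sequence(0) hands back the original BAR value, while
run_sequence(1) hands back 0: the restore faithfully writes the zeros that
were saved while the part was in reset.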

The next PIO we do, to the MAC_MODE register, fails in our PCI
infrastructure because the Broadcom device is no longer decoded where
we think it is.

Whew.  Nasty.

It is likely that you would experience this problem only with hotplug
enabled and only on a multiprocessor system.  It appears that one
processor is running tg3_init_one() and another is anxiously waiting
in hotplug for the tg3_open() to be registered and then pounces on it
as soon as it shows up.

So, what we have done is to take both of the driver's spinlocks at the top
of tg3_init_one(), right after they are initialized, and release them
at every exit point.  We think this prevents tg3_open() from doing
anything until tg3_init_one() has completed.
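
The idea can be sketched in user space as follows.  This is not the patch
itself: struct model_dev, the model_*() helpers and race_hits() are
hypothetical names, a single C11 atomic_flag stands in for the two driver
spinlocks, and the "open" path uses a try-lock so the model can run in one
thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the per-device state that matters for the race. */
struct model_dev {
    atomic_flag lock;        /* stands in for the driver's spinlocks */
    bool state_saved;        /* true once the "save state" step has run */
    bool reset_before_save;  /* true if open reset the chip too early */
};

static bool model_trylock(struct model_dev *d)
{
    /* test_and_set returns the old value; false means we got the lock */
    return !atomic_flag_test_and_set(&d->lock);
}

static void model_unlock(struct model_dev *d)
{
    atomic_flag_clear(&d->lock);
}

/* Models open: it may only reset the chip while holding the lock. */
static bool model_open(struct model_dev *d)
{
    if (!model_trylock(d))
        return false;               /* init still owns the device */
    if (!d->state_saved)
        d->reset_before_save = true;
    model_unlock(d);
    return true;
}

/*
 * Models one probe with an open attempt landing in the middle of it.
 * hold_lock selects whether init keeps the lock for its whole duration.
 * Returns true if open managed to reset the chip before state was saved.
 */
static bool race_hits(bool hold_lock)
{
    struct model_dev d = { ATOMIC_FLAG_INIT, false, false };

    if (hold_lock) {
        bool got = model_trylock(&d);   /* taken right after lock init */
        assert(got);
        (void)got;
    }

    (void)model_open(&d);               /* hotplug pounces mid-probe */

    d.state_saved = true;               /* the save step finally runs */
    if (hold_lock)
        model_unlock(&d);               /* released at the exit point */

    return d.reset_before_save;
}
```

Here race_hits(false) reports the bad ordering and race_hits(true) does
not.  In the real driver the open path would spin on the lock rather than
give up, so it would simply run later, after init has finished.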

We have been running for quite some time now without a panic using this
remedy.

I have checked the newer tg3.c v3.31 and it doesn't look like this issue
has been addressed there yet.

What says the collective?

Thanks.

Chris Elmquist
SGI Network Engineering