Folks,

We have been battling a tough one here for a couple of weeks. We think we have isolated it, though, and are seeking comments before we submit a patch. Here's the deal:
Environment: tg3 driver v3.23 as packaged with SuSE SLES9 SP2, on a multiprocessor IA64 platform.

We were experiencing panics during boot at a rate of approximately 1 in 80 boots. The panic occurred in the tg3 driver every time it happened and was a PCI infrastructure failure (target device not responding).

After lots of investigation we believe that tg3_open() is being called by the SuSE hotplug infrastructure while tg3_init_one() is still in flight. What happens then is that tg3_open() commences to reset the Broadcom device BEFORE tg3_init_one() has called tg3_pci_save_state(). The Broadcom device is now whacked into oblivion just when tg3_pci_save_state() does its thing... and it saves off a copy of ALL ZERO PCI config registers. A little bit later, we do a pci_restore_state(), putting all the zeros back to the part and ensuring that the BARs decode to never-never land. The next PIO we do, to the MAC_MODE register, fails in our PCI infrastructure because the Broadcom device is no longer decoded where we think it is.

Whew. Nasty.

It is likely that you would experience this problem only with hotplug enabled and only on a multiprocessor system. It appears that one processor is running tg3_init_one() while another is anxiously waiting in hotplug for tg3_open() to be registered, and then pounces on it as soon as it shows up.

So, what we have done is to take both spinlocks at the top of tg3_init_one(), right after they are initialized, and release them at all the exit points. We think this prevents tg3_open() from doing anything until tg3_init_one() has completed. We have been running for quite some time now without a panic using this remedy.

I have checked the newer tg3.c v3.31 and it doesn't look like this issue has been addressed there yet.

What says the collective?

Thanks.
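
To make the ordering described above concrete, here is a minimal userspace sketch of the locking pattern. It is not the actual patch and not kernel code. It assumes pthread mutexes in place of the driver's two spinlocks, and every name in it (fake_dev, fake_init_one, fake_open, run_model, the lock and tx_lock fields, the register values) is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <string.h>

#define NREGS 4

/* Hypothetical stand-in for the driver's private state (not struct tg3). */
struct fake_dev {
    pthread_mutex_t lock;        /* models the first spinlock */
    pthread_mutex_t tx_lock;     /* models the second spinlock */
    unsigned int config[NREGS];  /* live "PCI config space" */
    unsigned int saved[NREGS];   /* copy taken by the save-state step */
    atomic_int registered;       /* nonzero once open is reachable */
};

static const unsigned int good[NREGS] = { 0x1000, 0x2000, 0x3000, 0x4000 };

static void lock_both(struct fake_dev *d)
{
    pthread_mutex_lock(&d->lock);
    pthread_mutex_lock(&d->tx_lock);
}

static void unlock_both(struct fake_dev *d)
{
    pthread_mutex_unlock(&d->tx_lock);
    pthread_mutex_unlock(&d->lock);
}

/* Models the open path: wait for registration, reset, then restore. */
static void *fake_open(void *arg)
{
    struct fake_dev *d = arg;

    while (!atomic_load(&d->registered))
        sched_yield();                  /* "hotplug" waiting to pounce */

    lock_both(d);                       /* blocks until init is finished */
    memset(d->config, 0, sizeof d->config);         /* chip reset */
    memcpy(d->config, d->saved, sizeof d->config);  /* restore state */
    unlock_both(d);
    return NULL;
}

/* Models the probe path with the proposed remedy applied. */
static void fake_init_one(struct fake_dev *d)
{
    pthread_mutex_init(&d->lock, NULL);
    pthread_mutex_init(&d->tx_lock, NULL);
    lock_both(d);                       /* taken right after init */

    atomic_store(&d->registered, 1);    /* open becomes callable here */
    sched_yield();                      /* let the opener reach the lock */
    memcpy(d->saved, d->config, sizeof d->saved);   /* save state */

    unlock_both(d);                     /* released at the exit point */
}

/* Returns 1 if the config registers survive the concurrent open. */
int run_model(void)
{
    struct fake_dev d = { .config = { 0x1000, 0x2000, 0x3000, 0x4000 } };
    pthread_t opener;
    int rc, ok;

    atomic_init(&d.registered, 0);
    rc = pthread_create(&opener, NULL, fake_open, &d);
    assert(rc == 0);

    fake_init_one(&d);
    pthread_join(opener, NULL);

    ok = memcmp(d.config, good, sizeof good) == 0;
    pthread_mutex_destroy(&d.tx_lock);
    pthread_mutex_destroy(&d.lock);
    return ok;
}
```

Calling run_model() from a small main() and building with something like cc -pthread should return 1, because the opener cannot reset the model device until the save step has finished. If the lock_both()/unlock_both() pair is dropped from fake_init_one(), the opener is free to zero the config before it is saved, which is the failure described above.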
Chris Elmquist
SGI Network Engineering