Mike Mason wrote: > > This patch changes the EEH_MAX_FAILS action from panic to printing > an error message. Panicking under under this condition is too > harsh. Although performance will be affected and the device may not > recover, the system is still running, which at the very least, > should allow for a more graceful shutdown. The panic() is now > wrapped in a DEBUG statement for development purposes. The patch > also removes the msleep() within a spinlock, which is not allowed.
> @@ -509,18 +510,24 @@ For ease of review, please try to use diff -p to generate patches. > rc = 1; > if (pdn->eeh_mode & EEH_MODE_ISOLATED) { > pdn->eeh_check_count ++; > - if (pdn->eeh_check_count >= EEH_MAX_FAILS) { > - printk (KERN_ERR "EEH: Device driver ignored %d bad > reads, panicing\n", > - pdn->eeh_check_count); > + if (pdn->eeh_check_count % EEH_MAX_FAILS == 0) { > + location = (char *) of_get_property(dn, "ibm,loc-code", > NULL); Unneeded cast here, I think. > + printk (KERN_ERR "EEH: %d reads ignored for recovering > device at " > + "location=%s driver=%s pci addr=%s\n", > + pdn->eeh_check_count, location, > + dev->driver->name, pci_name(dev)); > + printk (KERN_ERR "EEH: Might be infinite loop in %s > driver\n", > + dev->driver->name); > +#ifdef DEBUG > dump_stack(); > - msleep(5000); > - > + > /* re-read the slot reset state */ > if (read_slot_reset_state(pdn, rets) != 0) > rets[0] = -1; /* reset state unknown */ > > /* If we are here, then we hit an infinite loop. Stop. > */ > panic("EEH: MMIO halt (%d) on device:%s\n", rets[0], > pci_name(dev)); > +#endif While I tend to agree that panic() is unnecessary, don't we want a stack dump unconditionally (i.e. not bracketed in #ifdef DEBUG)? I'd prefer just removing the code instead of adding #ifdef's in the middle of this function. eeh.c needs less #ifdef DEBUG, not more :) _______________________________________________ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev