On Wed, 2009-07-15 at 14:43 -0700, Mike Mason wrote:
> This patch increments the device_node reference counter when an EEH
> error occurs and decrements the counter when the event has been
> handled. This is to prevent the device_node from being released until
> eeh_event_handler() has had a chance to deal with the event. We've
> seen cases where the device_node is released too soon when an EEH
> event occurs during a dlpar remove, causing the event handler to
> attempt to access bad memory locations.
>
> Please review and let me know of any concerns.

Taking a reference sounds sane, but ...

> Signed-off-by: Mike Mason <mm...@us.ibm.com>
>
> --- a/arch/powerpc/platforms/pseries/eeh_event.c	2008-10-09 15:13:53.000000000 -0700
> +++ b/arch/powerpc/platforms/pseries/eeh_event.c	2009-07-14 14:14:00.000000000 -0700
> @@ -75,6 +75,14 @@ static int eeh_event_handler(void * dumm
>  	if (event == NULL)
>  		return 0;
>
> +	/* EEH holds a reference to the device_node, so if it
> +	 * equals 1 it's no longer valid and the event should
> +	 * be ignored */
> +	if (atomic_read(&event->dn->kref.refcount) == 1) {
> +		of_node_put(event->dn);
> +		return 0;
> +	}

That's really gross :)

And what happens if the refcount goes to 1 just after the check? ie. here.

>  	/* Serialize processing of EEH events */
>  	mutex_lock(&eeh_event_mutex);
>  	eeh_mark_slot(event->dn, EEH_MODE_RECOVERING);

cheers
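
To make the concern concrete, here is a minimal userspace sketch, not kernel code. It models the part that sounds sane (the event holds its own reference from send until handling is done) and replaces the "refcount == 1" test with an explicit flag set by the remover. Everything here is an assumption for illustration: struct node, node_get(), node_put(), event_send(), event_handle() and the detached flag are hypothetical stand-ins, not the of_node_* or EEH API. The point is only that a held reference keeps the memory valid no matter when the other holders drop theirs, so the handler never needs to look at the count itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct device_node (not the kernel type). */
struct node {
	atomic_int refcount;
	atomic_bool detached;	/* set explicitly by the remover */
};

/* Hypothetical stand-in for struct eeh_event. */
struct event {
	struct node *dn;
};

static int nodes_released;	/* counts release calls, for the demo only */

static struct node *node_new(void)
{
	struct node *n = malloc(sizeof(*n));

	if (n == NULL)
		abort();
	atomic_init(&n->refcount, 1);	/* the creator's reference */
	atomic_init(&n->detached, false);
	return n;
}

static struct node *node_get(struct node *n)
{
	atomic_fetch_add(&n->refcount, 1);
	return n;
}

static void node_put(struct node *n)
{
	/* fetch_sub returns the old value: 1 means we dropped the last ref */
	if (atomic_fetch_sub(&n->refcount, 1) == 1) {
		nodes_released++;
		free(n);
	}
}

/* Queueing side: take the reference before the event becomes visible. */
static struct event *event_send(struct node *dn)
{
	struct event *ev = malloc(sizeof(*ev));

	if (ev == NULL)
		abort();
	ev->dn = node_get(dn);
	return ev;
}

/*
 * Handler side: returns true if the event was processed, false if it
 * was ignored. The node memory is valid for the whole function because
 * the event's reference is only dropped at the end.
 */
static bool event_handle(struct event *ev)
{
	bool handled = false;

	if (!atomic_load(&ev->dn->detached)) {
		/* recovery work would go here */
		handled = true;
	}
	node_put(ev->dn);
	free(ev);
	return handled;
}

/* Removal happens between send and handle; the event is ignored safely. */
bool run_demo(void)
{
	struct node *dn = node_new();
	struct event *ev = event_send(dn);

	atomic_store(&dn->detached, true);	/* remover marks the node */
	node_put(dn);				/* remover drops its reference */
	return event_handle(ev);		/* node is still valid here */
}
```

In this model run_demo() returns false and the node is freed only inside event_handle(). Testing the flag can still race with a remover that sets it a moment later, so real code would need to decide what the handler does in that window, for example by serializing the test against removal with a lock. Unlike the refcount comparison, though, a late removal here cannot lead to a freed node being touched.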
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev