On Wed, 2009-07-15 at 14:43 -0700, Mike Mason wrote:
> This patch increments the device_node reference counter when an EEH
> error occurs and decrements the counter when the event has been
> handled.  This is to prevent the device_node from being released until
> eeh_event_handler() has had a chance to deal with the event.  We've
> seen cases where the device_node is released too soon when an EEH
> event occurs during a dlpar remove, causing the event handler to
> attempt to access bad memory locations.
> 
> Please review and let me know of any concerns.

Taking a reference sounds sane, but ...

> Signed-off-by: Mike Mason <mm...@us.ibm.com> 
> 
> --- a/arch/powerpc/platforms/pseries/eeh_event.c      2008-10-09 15:13:53.000000000 -0700
> +++ b/arch/powerpc/platforms/pseries/eeh_event.c      2009-07-14 14:14:00.000000000 -0700
> @@ -75,6 +75,14 @@ static int eeh_event_handler(void * dumm
>       if (event == NULL)
>               return 0;
>  
> +     /* EEH holds a reference to the device_node, so if it
> +      * equals 1 it's no longer valid and the event should
> +      * be ignored */
> +     if (atomic_read(&event->dn->kref.refcount) == 1) {
> +             of_node_put(event->dn);
> +             return 0;
> +     }

That's really gross :)

And what happens if the refcount drops to 1 just after the check, i.e.
right here?

>       /* Serialize processing of EEH events */
>       mutex_lock(&eeh_event_mutex);
>       eeh_mark_slot(event->dn, EEH_MODE_RECOVERING);
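
To illustrate the concern, here is a rough userspace sketch, not kernel
code. The refcount only keeps the memory alive and is never read to infer
state; whether the node has been removed is tracked by an explicit flag.
Everything here (struct node, node_get(), node_put(), node_remove(),
handle_event(), the detached flag and the freed test hook) is hypothetical
and stands in for the real device_node handling. It also assumes the
remove path and the handler are serialized by some lock that is not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct device_node; not the kernel type. */
struct node {
	atomic_int refcount;	/* lifetime only, never inspected for state */
	atomic_bool detached;	/* set by the remove path */
	int *freed;		/* test hook: set to 1 when the node is released */
};

static struct node *node_new(int *freed)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	atomic_init(&n->refcount, 1);	/* the owner's reference */
	atomic_init(&n->detached, false);
	n->freed = freed;
	return n;
}

/* Take a reference, e.g. when an event is queued for this node. */
static struct node *node_get(struct node *n)
{
	atomic_fetch_add(&n->refcount, 1);
	return n;
}

/* Drop a reference; whoever drops the last one frees the node. */
static void node_put(struct node *n)
{
	int old = atomic_fetch_sub(&n->refcount, 1);

	assert(old > 0);
	if (old == 1) {
		*n->freed = 1;
		free(n);
	}
}

/* Remove path: record the removal explicitly, then drop the owner's ref. */
static void node_remove(struct node *n)
{
	atomic_store(&n->detached, true);
	node_put(n);
}

/*
 * Event handler: the event's own reference keeps n valid for the whole
 * call, so there is no window in which the memory can go away.
 * Returns true if the event was processed, false if it was skipped.
 */
static bool handle_event(struct node *n)
{
	bool handled = !atomic_load(&n->detached);

	/* recovery work would go here when handled is true */
	node_put(n);		/* always drop the event's reference */
	return handled;
}
```

With this shape the handler never has to guess from the count whether it
is the last holder, so a reference dropped elsewhere after the check
cannot invalidate the decision.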


cheers
