On Sun, Jun 16, 2013 at 03:12:11PM +1000, Benjamin Herrenschmidt wrote:
>On Sat, 2013-06-15 at 17:03 +0800, Gavin Shan wrote:
>> On the PowerNV platform, EEH events are produced either by error
>> detection while accessing config or I/O registers, or by interrupts
>> dedicated to EEH reporting. The patch adds support for processing
>> the interrupts dedicated to EEH reporting.
>> 
>> Firstly, the kernel thread will be woken up to process the incoming
>> interrupt. The PHBs will be scanned one by one to process all
>> existing EEH errors. There are multiple EEH errors that can be
>> reported from interrupts, and we take different actions against
>> them:
>> 
>> - If the IOC is dead, all PCI buses under all PHBs will be removed
>>   from the system.
>> - If the PHB is dead, all PCI buses under the PHB will be removed
>>   from the system.
>> - If the PHB is fenced, an EEH event will be sent to the EEH core
>>   and the fenced PHB is expected to be completely reset.
>> - If a specific PE has been put into the frozen state, an EEH event
>>   will be sent to the EEH core so that the PE will be reset.
>> - If the error is an informational one, we just output the related
>>   registers for debugging purposes and no further action will be
>>   taken.
>

Thanks for the review, Ben.

>Getting better.... but:
>
> - I still don't like having a kthread for that. Why not use schedule_work() ?
>

Ok. Will update it with schedule_work() in the next revision :-)
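For reference, the schedule_work() variant could look roughly like this (a
minimal kernel-side sketch only; pnv_eeh_work, pnv_eeh_event_handler and
pnv_eeh_irq are hypothetical names, not from the actual patch):

```c
#include <linux/workqueue.h>
#include <linux/interrupt.h>

/* Deferred handler: runs in process context via the system workqueue */
static void pnv_eeh_event_handler(struct work_struct *work)
{
	/* Scan the PHBs one by one and queue EEH events for any errors */
}

static DECLARE_WORK(pnv_eeh_work, pnv_eeh_event_handler);

/* EEH-report interrupt: defer the heavy lifting out of IRQ context */
static irqreturn_t pnv_eeh_irq(int irq, void *data)
{
	schedule_work(&pnv_eeh_work);
	return IRQ_HANDLED;
}
```

This avoids a dedicated kthread entirely; the system workqueue already
provides the process context we need.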

> - We already have an EEH thread, why not just use it ? IE send it a special
>type of message that makes it query the backend for error info instead ?
>

Ok. I'll try to do as you suggested in the next revision. Something like:

        - Interrupt comes in
        - OPAL notifier callback
        - Mark all PHBs and their subordinate PEs "isolated" since we don't
          know which PHB/PE has problems (note: we still need
          eeh_serialize_lock())
        - Create an EEH event without binding a PE to it for the EEH core.
        - EEH core starts a new kthread, calls the next_error() backend,
          and handles the EEH errors accordingly:

          * Informational errors: clear the PHB's "isolated" state and
            output diag-data in the backend (in eeh-ioda.c as you suggested).
          * Fenced PHB: complete PHB reset by the EEH core; the "isolated"
            state is cleared automatically during the reset.
          * Dead PHB: remove the PHB and its subordinate PCI buses/devices
            from the system.
          * Dead IOC: remove the PCI domain from the system.

The problem with this scheme is that a PHB's state can't reflect the real
hardware state any more. For example, PHB#0 may be fenced while PHB#1 is in
a normal state, but we have to mark all PHBs as "isolated" (fenced) in the
OPAL notifier callback since we don't know which PHB is actually
encountering problems.

I think it would work well. Let me have a try at changing the code and
making it work. The side effect would be introducing more logic into the
EEH core, which is shared by multiple platforms (powernv, pseries, and a
powerkvm guest in future). So my initial thought is to keep
opal_pci_next_error() invisible to the EEH core and make the EEH core
totally event-driven :-)

> - I'm not fan of exposing that EEH private lock. I don't entirely understand
>why you need to do that either.
>

It's used to get a consistent PE isolated state, which is protected by the
lock. Since we're going to change the PE's state in platform code
(pci-err.c), we need the lock to protect it. Without the lock, we would
have the following race:

        
                    CPU#0                               CPU#1
        PCI-CFG read returns 0xFF's             PCI-CFG read returns 0xFF's
        PE not fenced                           PE not fenced
        PE marked as fenced                     PE marked as fenced
        EEH event to EEH core                   EEH event to EEH core

>Generally speaking, I'm thinking this file should contain less stuff, most of
>it should move into the ioda backend, the interrupt just turning into some
>request down to the existing EEH thread.
>

Yeah, I'll move most of the stuff into eeh-ioda.c with the above scheme applied :-)

Thanks,
Gavin

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev