On Tue, Apr 26, 2016 at 03:29:59PM +1000, David Gibson wrote:
>On Fri, Apr 22, 2016 at 11:28:02PM +1000, Gavin Shan wrote:
>> The function eeh_pe_reset_and_recover() is used to recover from an
>> EEH error when a passthrough device is transferred to the guest and
>> back, meaning the device's driver is vfio-pci or none.
>> When the driver is vfio-pci, which provides only the error_detected()
>> error handler, the handler simply stops the guest, which is not the
>> expected behaviour. On the other hand, no error handlers are
>> called if there is no bound driver.
>> 
>> This patch ignores all error handlers provided by the device driver
>> in eeh_pe_reset_and_recover() to avoid that unexpected behaviour.
>> 
>> Fixes: 5cfb20b9 ("powerpc/eeh: Emulate EEH recovery for VFIO devices")
>> Cc: sta...@vger.kernel.org #v3.18+
>> Signed-off-by: Gavin Shan <gws...@linux.vnet.ibm.com>
>> Reviewed-by: Russell Currey <rus...@russell.cc>
>> ---
>>  arch/powerpc/kernel/eeh_driver.c | 11 +----------
>>  1 file changed, 1 insertion(+), 10 deletions(-)
>> 
>> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
>> index fb6207d..1c7d703 100644
>> --- a/arch/powerpc/kernel/eeh_driver.c
>> +++ b/arch/powerpc/kernel/eeh_driver.c
>> @@ -552,7 +552,7 @@ static int eeh_clear_pe_frozen_state(struct eeh_pe *pe,
>>  
>>  int eeh_pe_reset_and_recover(struct eeh_pe *pe)
>>  {
>> -    int result, ret;
>> +    int ret;
>>  
>>      /* Bail if the PE is being recovered */
>>      if (pe->state & EEH_PE_RECOVERING)
>> @@ -564,9 +564,6 @@ int eeh_pe_reset_and_recover(struct eeh_pe *pe)
>>      /* Save states */
>>      eeh_pe_dev_traverse(pe, eeh_dev_save_state, NULL);
>>  
>> -    /* Report error */
>> -    eeh_pe_dev_traverse(pe, eeh_report_error, &result);
>
>Ok, so after chatting to Gavin, I've made sense of this.  The basic
>thing here is that eeh_pe_reset_and_recover() should be discarding any
>errors from before the reset, not reporting them - the whole point is
>that we know things have gone bad, and we want to clear back to a good
>state.
>
>>      /* Issue reset */
>>      ret = eeh_reset_pe(pe);
>>      if (ret) {
>> @@ -581,15 +578,9 @@ int eeh_pe_reset_and_recover(struct eeh_pe *pe)
>>              return ret;
>>      }
>>  
>> -    /* Notify completion of reset */
>> -    eeh_pe_dev_traverse(pe, eeh_report_reset, &result);
>
>However, it's not clear if removing the report of a reset makes sense.
>There are no current users of reset notification IIUC, but if we're
>going to remove the reset reporting, we should put that in a separate
>patch with its own justification, and remove the other caller as well.
>

Thanks, David. That makes sense to me. I will split it into two patches:
one removes the eeh_report_error notification and the other removes the
remaining notification handlers.

>>      /* Restore device state */
>>      eeh_pe_dev_traverse(pe, eeh_dev_restore_state, NULL);
>>  
>> -    /* Resume */
>> -    eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
>
>And I'm not sure if it makes sense to remove the resume notification either.
>

Based on our offline discussion, we either keep all notification handlers or
remove all of them. Since we can't keep the eeh_report_error notification, we
have to remove all of them.
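
To illustrate the resulting flow, here is a minimal user-space sketch, not
the kernel code. Every name prefixed with mock_ or MOCK_ is hypothetical,
and the struct fields are counters invented for illustration. The early
return value and the marking/clearing of the recovering flag on the failure
path are assumptions, since the quoted hunks do not show them. The point is
only that state is saved, the PE is reset, state is restored, and no driver
handler is called at any step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: these are NOT the kernel's definitions. */
#define MOCK_PE_RECOVERING (1 << 0)

struct mock_pe {
	int state;            /* bit flags; only MOCK_PE_RECOVERING is modelled */
	int reset_fails;      /* non-zero simulates a failed reset */
	int saved;            /* incremented when device state is saved */
	int restored;         /* incremented when device state is restored */
	int driver_callbacks; /* would be incremented by any driver handler */
};

/* Simulated reset: returns 0 on success, -1 on failure. */
static int mock_reset_pe(struct mock_pe *pe)
{
	return pe->reset_fails ? -1 : 0;
}

int mock_reset_and_recover(struct mock_pe *pe)
{
	int ret;

	assert(pe != NULL);

	/* Bail if the PE is being recovered (return value assumed) */
	if (pe->state & MOCK_PE_RECOVERING)
		return 0;

	/* Enter recovery mode (assumed; not shown in the quoted hunks) */
	pe->state |= MOCK_PE_RECOVERING;

	/* Save states */
	pe->saved++;

	/* Issue reset; no error is reported to the driver beforehand */
	ret = mock_reset_pe(pe);
	if (ret) {
		pe->state &= ~MOCK_PE_RECOVERING;
		return ret;
	}

	/* Restore device state; no reset or resume notification follows */
	pe->restored++;

	/* Clear recovery mode */
	pe->state &= ~MOCK_PE_RECOVERING;
	return 0;
}
```

Calling mock_reset_and_recover() on a zero-initialised struct mock_pe
returns 0 and leaves driver_callbacks at 0, which is the behaviour we want
for vfio-pci and for devices with no bound driver.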

>>      /* Clear recovery mode */
>>      eeh_pe_state_clear(pe, EEH_PE_RECOVERING);
>>  
>
>-- 
>David Gibson                   | I'll have my music baroque, and my code
>david AT gibson.dropbear.id.au | minimalist, thank you.  NOT _the_ _other_
>                               | _way_ _around_!
>http://www.ozlabs.org/~dgibson

