Ganesh Goudar <ganes...@linux.ibm.com> writes:
> For all unrecoverable errors we are missing to log the
> error, Since machine_check_log_err() is not getting called
> for unrecoverable errors.
>
> Raise irq work in save_mce_event() for unrecoverable errors,
> So that we log the error from MCE event handling block in
> timer handler.

But the patch also removes the irq work raise from machine_check_ue_event().

That's currently done unconditionally, regardless of the disposition. So
doesn't this change also drop logging of recoverable UEs?

Maybe that's OK, but the change log should explain it.

> Log without this change
>
>  MCE: CPU27: machine check (Severe)  Real address Load/Store (foreign/control 
> memory) [Not recovered]
>  MCE: CPU27: PID: 10580 Comm: inject-ra-err NIP: [0000000010000df4]
>  MCE: CPU27: Initiator CPU
>  MCE: CPU27: Unknown
>
> Log with this change
>
>  MCE: CPU24: machine check (Severe)  Real address Load/Store (foreign/control 
> memory) [Not recovered]
>  MCE: CPU24: PID: 1589811 Comm: inject-ra-err NIP: [0000000010000e48]
>  MCE: CPU24: Initiator CPU
>  MCE: CPU24: Unknown
>  RTAS: event: 5, Type: Platform Error (224), Severity: 3
>
> Signed-off-by: Ganesh Goudar <ganes...@linux.ibm.com>
> Reviewed-by: Mahesh Salgaonkar <mah...@linux.ibm.com>
> ---
> V2: Rephrasing the commit message.
> ---
>  arch/powerpc/kernel/mce.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> index 6c5d30fba766..a1cb2172eb7b 100644
> --- a/arch/powerpc/kernel/mce.c
> +++ b/arch/powerpc/kernel/mce.c
> @@ -131,6 +131,13 @@ void save_mce_event(struct pt_regs *regs, long handled,
>       if (mce->error_type == MCE_ERROR_TYPE_UE)
>               mce->u.ue_error.ignore_event = mce_err->ignore_event;
>  
> +     /*
> +      * Raise irq work, So that we don't miss to log the error for
> +      * unrecoverable errors.
> +      */
> +     if (mce->disposition == MCE_DISPOSITION_NOT_RECOVERED)
> +             mce_irq_work_queue();
> +
>       if (!addr)
>               return;
>  
> @@ -235,7 +242,6 @@ static void machine_check_ue_event(struct 
> machine_check_event *evt)
>              evt, sizeof(*evt));
>  
>       /* Queue work to process this event later. */

This comment is meaningless without the function call it's commenting
about, ie. the comment should be removed too.

> -     mce_irq_work_queue();
>  }
>  

cheers

Reply via email to