Ganesh Goudar <ganes...@linux.ibm.com> writes: > For all unrecoverable errors we are missing to log the > error, Since machine_check_log_err() is not getting called > for unrecoverable errors. > > Raise irq work in save_mce_event() for unrecoverable errors, > So that we log the error from MCE event handling block in > timer handler.
But the patch also removes the irq work raise from machine_check_ue_event(). That's currently done unconditionally, regardless of the disposition. So doesn't this change also drop logging of recoverable UEs? Maybe that's OK, but the change log should explain it. > Log without this change > > MCE: CPU27: machine check (Severe) Real address Load/Store (foreign/control > memory) [Not recovered] > MCE: CPU27: PID: 10580 Comm: inject-ra-err NIP: [0000000010000df4] > MCE: CPU27: Initiator CPU > MCE: CPU27: Unknown > > Log with this change > > MCE: CPU24: machine check (Severe) Real address Load/Store (foreign/control > memory) [Not recovered] > MCE: CPU24: PID: 1589811 Comm: inject-ra-err NIP: [0000000010000e48] > MCE: CPU24: Initiator CPU > MCE: CPU24: Unknown > RTAS: event: 5, Type: Platform Error (224), Severity: 3 > > Signed-off-by: Ganesh Goudar <ganes...@linux.ibm.com> > Reviewed-by: Mahesh Salgaonkar <mah...@linux.ibm.com> > --- > V2: Rephrasing the commit message. > --- > arch/powerpc/kernel/mce.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c > index 6c5d30fba766..a1cb2172eb7b 100644 > --- a/arch/powerpc/kernel/mce.c > +++ b/arch/powerpc/kernel/mce.c > @@ -131,6 +131,13 @@ void save_mce_event(struct pt_regs *regs, long handled, > if (mce->error_type == MCE_ERROR_TYPE_UE) > mce->u.ue_error.ignore_event = mce_err->ignore_event; > > + /* > + * Raise irq work, So that we don't miss to log the error for > + * unrecoverable errors. > + */ > + if (mce->disposition == MCE_DISPOSITION_NOT_RECOVERED) > + mce_irq_work_queue(); > + > if (!addr) > return; > > @@ -235,7 +242,6 @@ static void machine_check_ue_event(struct > machine_check_event *evt) > evt, sizeof(*evt)); > > /* Queue work to process this event later. */ This comment is meaningless without the function call it's commenting about, ie. the comment should be removed too. > - mce_irq_work_queue(); > } > cheers