On 1/31/23 4:59 PM, Michael Ellerman wrote:

Ganesh Goudar<ganes...@linux.ibm.com>  writes:
For all unrecoverable errors we are missing to log the
error, Since machine_check_log_err() is not getting called
for unrecoverable errors.

Raise irq work in save_mce_event() for unrecoverable errors,
So that we log the error from MCE event handling block in
timer handler.
But the patch also removes the irq work raise from machine_check_ue_event().

That's currently done unconditionally, regardless of the disposition. So
doesn't this change also drop logging of recoverable UEs?

Maybe that's OK, but the change log should explain it.

Yes, its ok, exception vector code will do that for recoverable errors, ill 
explain
this in commit message.


Log without this change

  MCE: CPU27: machine check (Severe)  Real address Load/Store (foreign/control 
memory) [Not recovered]
  MCE: CPU27: PID: 10580 Comm: inject-ra-err NIP: [0000000010000df4]
  MCE: CPU27: Initiator CPU
  MCE: CPU27: Unknown

Log with this change

  MCE: CPU24: machine check (Severe)  Real address Load/Store (foreign/control 
memory) [Not recovered]
  MCE: CPU24: PID: 1589811 Comm: inject-ra-err NIP: [0000000010000e48]
  MCE: CPU24: Initiator CPU
  MCE: CPU24: Unknown
  RTAS: event: 5, Type: Platform Error (224), Severity: 3

Signed-off-by: Ganesh Goudar<ganes...@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar<mah...@linux.ibm.com>
---
V2: Rephrasing the commit message.
---
  arch/powerpc/kernel/mce.c | 8 +++++++-
  1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 6c5d30fba766..a1cb2172eb7b 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -131,6 +131,13 @@ void save_mce_event(struct pt_regs *regs, long handled,
        if (mce->error_type == MCE_ERROR_TYPE_UE)
                mce->u.ue_error.ignore_event = mce_err->ignore_event;
+ /*
+        * Raise irq work, So that we don't miss to log the error for
+        * unrecoverable errors.
+        */
+       if (mce->disposition == MCE_DISPOSITION_NOT_RECOVERED)
+               mce_irq_work_queue();
+
        if (!addr)
                return;
@@ -235,7 +242,6 @@ static void machine_check_ue_event(struct machine_check_event *evt)
               evt, sizeof(*evt));
/* Queue work to process this event later. */
This comment is meaningless without the function call it's commenting
about, ie. the comment should be removed too.

ok.

Thanks.

Reply via email to