When a page has already been hwpoisoned by AO (action optional)
handling, the process mapping it may not be killed. If that process
later makes a syscall that accesses the page, the access triggers a
VM_FAULT_HWPOISON fault. Because the fault happens in kernel mode, it
may be fixed up by fixup_exception(), and the current code just returns
an error code to the user process.

This is not sufficient. We cannot trust the process to handle the
error correctly, so send it a SIGBUS and log the event to the console.

Suggested-by: Feng Yang <yangfe...@kingsoft.com>
Signed-off-by: Aili Yao <yaoa...@kingsoft.com>
---
 arch/x86/mm/fault.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f1f1b5a0956a..36d1e385512b 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -662,7 +662,16 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 	 * In this case we need to make sure we're not recursively
 	 * faulting through the emulate_vsyscall() logic.
 	 */
+#ifdef CONFIG_MEMORY_FAILURE
+	if (si_code == BUS_MCEERR_AR && signal == SIGBUS)
+		pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
+			current->comm, current->pid, address);
+
+	if ((current->thread.sig_on_uaccess_err && signal) ||
+	    (si_code == BUS_MCEERR_AR && signal == SIGBUS)) {
+#else
 	if (current->thread.sig_on_uaccess_err && signal) {
+#endif
 		sanitize_error_code(address, &error_code);
 
 		set_signal_archinfo(address, error_code);
@@ -927,7 +936,14 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
 {
 	/* Kernel mode? Handle exceptions or die: */
 	if (!(error_code & X86_PF_USER)) {
+#ifdef CONFIG_MEMORY_FAILURE
+		if (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE))
+			no_context(regs, error_code, address, SIGBUS, BUS_MCEERR_AR);
+		else
+			no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
+#else
 		no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
+#endif
 		return;
 	}
 
-- 
2.25.1
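
For illustration only, and not part of the patch: a minimal userspace sketch of how a process might recognize the SIGBUS described above, assuming the signal is delivered with si_code set to BUS_MCEERR_AR as the no_context() call in the diff requests. The names bus_kind, classify_sigbus, sigbus_handler and install_sigbus_handler are hypothetical and exist only in this sketch.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallbacks for libc headers that lack these; values match the Linux uapi. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical classification, used only in this sketch. */
enum bus_kind { BUS_KIND_OTHER, BUS_KIND_POISON_AR, BUS_KIND_POISON_AO };

/* Map a SIGBUS si_code to the kind of bus error it reports. */
static enum bus_kind classify_sigbus(int si_code)
{
	switch (si_code) {
	case BUS_MCEERR_AR:
		return BUS_KIND_POISON_AR;	/* action required */
	case BUS_MCEERR_AO:
		return BUS_KIND_POISON_AO;	/* action optional */
	default:
		return BUS_KIND_OTHER;		/* e.g. BUS_ADRERR */
	}
}

/* Report a hardware memory error, then exit; only async-signal-safe calls. */
static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;

	if (classify_sigbus(info->si_code) == BUS_KIND_POISON_AR) {
		static const char msg[] = "SIGBUS: hwpoisoned page accessed\n";
		ssize_t n = write(STDERR_FILENO, msg, sizeof(msg) - 1);
		(void)n;
	}
	_exit(1);
}

/* Install the handler with SA_SIGINFO so si_code is available. */
static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

A program would call install_sigbus_handler() once at startup. A process that installs no handler is terminated by the default SIGBUS action, which fits the commit message's point that the kernel should not rely on the process checking the syscall's return value.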