On Thu, Jan 28, 2021 at 07:43:26PM +0800, Aili Yao wrote:
> when one page is already hwpoisoned by AO action, process may not be
> killed, the process mapping this page may make a syscall include this
> page and result to trigger a VM_FAULT_HWPOISON fault, as it's in kernel
> mode it may be fixed by fixup_exception, current code will just return
> error code to user process.

Shouldn't the AO action that poisoned the page have also unmapped it?

>
> This is not suffient, we should send a SIGBUS to the process and log the
> info to console, as we can't trust the process will handle the error
> correctly.

I agree with this part ... few apps check for -EFAULT and do the right thing.
But I'm not sure how this happens. Can you provide a bit more detail on
the steps?

-Tony

P.S. Typo: s/suffient/sufficient/

>
> Suggested-by: Feng Yang <yangfe...@kingsoft.com>
> Signed-off-by: Aili Yao <yaoa...@kingsoft.com>
> ---
>  arch/x86/mm/fault.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index f1f1b5a0956a..36d1e385512b 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -662,7 +662,16 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>  	 * In this case we need to make sure we're not recursively
>  	 * faulting through the emulate_vsyscall() logic.
>  	 */
> +#ifdef CONFIG_MEMORY_FAILURE
> +	if (si_code == BUS_MCEERR_AR && signal == SIGBUS)
> +		pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
> +			current->comm, current->pid, address);
> +
> +	if ((current->thread.sig_on_uaccess_err && signal) ||
> +		(si_code == BUS_MCEERR_AR && signal == SIGBUS)) {
> +#else
>  	if (current->thread.sig_on_uaccess_err && signal) {
> +#endif
>  		sanitize_error_code(address, &error_code);
>
>  		set_signal_archinfo(address, error_code);
> @@ -927,7 +936,14 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
>  {
>  	/* Kernel mode? Handle exceptions or die: */
>  	if (!(error_code & X86_PF_USER)) {
> +#ifdef CONFIG_MEMORY_FAILURE
> +		if (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE))
> +			no_context(regs, error_code, address, SIGBUS, BUS_MCEERR_AR);
> +		else
> +			no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
> +#else
>  		no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
> +#endif
>  		return;
>  	}
>
> --
> 2.25.1
>
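
For reference, here is a rough userspace sketch of the two outcomes discussed
above: a syscall failing with -EFAULT (what the current code reports), and a
SIGBUS carrying BUS_MCEERR_AR (what the patch would deliver). It is not taken
from the patch. The names classify_io(), is_mce_sigbus(), sigbus_handler() and
install_sigbus_handler() are made up for illustration, and the fallback
si_code values assume the Linux asm-generic siginfo definitions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux-specific si_code values; fallbacks assume asm-generic/siginfo.h. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical classification of a read()/write()-style return value. */
enum io_result { IO_OK, IO_BAD_BUFFER, IO_OTHER_ERROR };

/* ret is the syscall return value, err is errno sampled right after it. */
static enum io_result classify_io(ssize_t ret, int err)
{
	if (ret >= 0)
		return IO_OK;
	if (err == EFAULT)	/* the kernel could not access the user buffer */
		return IO_BAD_BUFFER;
	return IO_OTHER_ERROR;
}

/* True if a signal/si_code pair reports a hardware memory error. */
static int is_mce_sigbus(int signo, int si_code)
{
	return signo == SIGBUS &&
	       (si_code == BUS_MCEERR_AR || si_code == BUS_MCEERR_AO);
}

/* Only async-signal-safe calls (write, _exit) are used in the handler. */
static void sigbus_handler(int signo, siginfo_t *info, void *ctx)
{
	static const char msg[] = "SIGBUS: hardware memory error\n";

	(void)ctx;
	if (is_mce_sigbus(signo, info->si_code)) {
		if (write(STDERR_FILENO, msg, sizeof(msg) - 1) < 0) {
			/* nothing useful can be done here */
		}
	}
	_exit(1);
}

/* Install the handler with SA_SIGINFO so si_code is available. */
static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

An application would call install_sigbus_handler() once at startup and pass
each syscall result through classify_io(ret, errno). The point made above
still stands: few programs do the second step, which is why relying on
-EFAULT alone is fragile.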