On Thu, Jan 28, 2021 at 07:43:26PM +0800, Aili Yao wrote:
> when one page is already hwpoisoned by AO action, process may not be
> killed, the process mapping this page may make a syscall include this
> page and result to trigger a VM_FAULT_HWPOISON fault, as it's in kernel
> mode it may be fixed by fixup_exception, current code will just return
> error code to user process.

Shouldn't the AO action that poisoned the page have also unmapped it?
> 
> This is not suffient, we should send a SIGBUS to the process and log the
> info to console, as we can't trust the process will handle the error
> correctly.

I agree with this part ... few apps check for -EFAULT and do the right
thing.  But I'm not sure how this happens. Can you provide a bit more
detail on the steps?
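
For reference, here is a minimal userspace sketch of what "check for -EFAULT" looks like in practice. It is only an illustration: write_checked() and demo_efault() are hypothetical names, and the PROT_NONE mapping merely stands in for a buffer the kernel cannot read. It does not reproduce a real hwpoisoned page.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: write() that reports failure as a negative errno. */
static long write_checked(int fd, const void *buf, size_t len)
{
	ssize_t n = write(fd, buf, len);

	if (n < 0)
		return -errno;	/* e.g. -EFAULT when buf cannot be read */
	return n;
}

/*
 * Hypothetical demo: hand the kernel a buffer it cannot read.
 * A PROT_NONE mapping is an assumption used to trigger the fault
 * during the kernel's copy from user space.
 */
static long demo_efault(void)
{
	int fds[2];
	void *bad;
	long ret;

	if (pipe(fds) < 0)
		return -errno;

	bad = mmap(NULL, 4096, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (bad == MAP_FAILED) {
		ret = -errno;
	} else {
		ret = write_checked(fds[1], bad, 16);
		munmap(bad, 4096);
	}

	close(fds[0]);
	close(fds[1]);
	return ret;
}
```

A caller that ignores the return value of write() here never notices that nothing was written, which is the failure mode that makes an error code alone unreliable.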

-Tony

P.S. Typo: s/suffient/sufficient/

> 
> Suggested-by: Feng Yang <yangfe...@kingsoft.com>
> Signed-off-by: Aili Yao <yaoa...@kingsoft.com>
> ---
>  arch/x86/mm/fault.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index f1f1b5a0956a..36d1e385512b 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -662,7 +662,16 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>                * In this case we need to make sure we're not recursively
>                * faulting through the emulate_vsyscall() logic.
>                */
> +#ifdef CONFIG_MEMORY_FAILURE
> +             if (si_code == BUS_MCEERR_AR && signal == SIGBUS)
> +                     pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
> +                             current->comm, current->pid, address);
> +
> +             if ((current->thread.sig_on_uaccess_err && signal) ||
> +                     (si_code == BUS_MCEERR_AR && signal == SIGBUS)) {
> +#else
>               if (current->thread.sig_on_uaccess_err && signal) {
> +#endif
>                       sanitize_error_code(address, &error_code);
>  
>                       set_signal_archinfo(address, error_code);
> @@ -927,7 +936,14 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
>  {
>       /* Kernel mode? Handle exceptions or die: */
>       if (!(error_code & X86_PF_USER)) {
> +#ifdef CONFIG_MEMORY_FAILURE
> +             if (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE))
> +                     no_context(regs, error_code, address, SIGBUS, BUS_MCEERR_AR);
> +             else
> +                     no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
> +#else
>               no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
> +#endif
>               return;
>       }
>  
> -- 
> 2.25.1
> 
