Hi Aili,

On Mon, Feb 01, 2021 at 04:17:49PM +0800, Aili Yao wrote:
> When a page has already been hwpoisoned by an AO (action optional)
> event, the process mapping it may not have been killed. If that process
> later makes a syscall that touches the page, a VM_FAULT_HWPOISON fault
> is triggered; in kernel mode it may be fixed up by fixup_exception(),
> and the current code just returns an error code to the user process.
> 
> This is not sufficient: we should send a SIGBUS to the process and log
> the info to the console, as we can't trust the process to handle the
> error correctly.
> 
> Suggested-by: Feng Yang <yangfe...@kingsoft.com>
> Signed-off-by: Aili Yao <yaoa...@kingsoft.com>
> ---
...
> @@ -662,12 +662,32 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>                * In this case we need to make sure we're not recursively
>                * faulting through the emulate_vsyscall() logic.
>                */
> +
> +             if (IS_ENABLED(CONFIG_MEMORY_FAILURE) &&
> +                 fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) {
> +                     unsigned int lsb = 0;
> +
> +                     pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
> +                             current->comm, current->pid, address);
> +
> +                     sanitize_error_code(address, &error_code);
> +                     set_signal_archinfo(address, error_code);
> +
> +                     if (fault & VM_FAULT_HWPOISON_LARGE)
> +                             lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
> +                     if (fault & VM_FAULT_HWPOISON)
> +                             lsb = PAGE_SHIFT;
> +
> +                     force_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb);

This part duplicates some of the code in do_sigbus(), so some refactoring
(such as adding a common helper function) would be helpful.
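For illustration only, the shared piece could be factored roughly as
below. This is a userspace model, not kernel code: the helper names
hwpoison_fault_lsb() and report_hwpoison(), the flag values and the
hstate table are hypothetical stand-ins; only the lsb logic mirrors the
quoted hunk and do_sigbus().

```c
#include <assert.h>
#include <stdio.h>

/*
 * Userspace model only. These constants and hstate_index_to_shift() are
 * illustrative stand-ins for the kernel definitions, not copies of them.
 */
#define PAGE_SHIFT              12
#define VM_FAULT_HWPOISON       0x000010u
#define VM_FAULT_HWPOISON_LARGE 0x000020u
#define VM_FAULT_GET_HINDEX(x)  (((unsigned int)(x) >> 16) & 0xfu)

/* Hypothetical hstate table: index 0 = 2 MiB pages, index 1 = 1 GiB pages. */
static unsigned int hstate_index_to_shift(unsigned int index)
{
	static const unsigned int shifts[] = { 21, 30 };

	return index < 2 ? shifts[index] : 0;
}

/*
 * Hypothetical common helper for the logic duplicated between do_sigbus()
 * and the patched no_context() path. Returns the lsb that would be passed
 * to force_sig_mceerr(), or 0 if the fault is not a hwpoison fault.
 */
static unsigned int hwpoison_fault_lsb(unsigned int fault)
{
	unsigned int lsb = 0;

	if (fault & VM_FAULT_HWPOISON_LARGE)
		lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
	if (fault & VM_FAULT_HWPOISON)
		lsb = PAGE_SHIFT;
	return lsb;
}

/*
 * Hypothetical shared reporting path. In the kernel this is where pr_err(),
 * sanitize_error_code(), set_signal_archinfo() and force_sig_mceerr() would
 * be called; this model only prints the log line and returns the lsb.
 */
static unsigned int report_hwpoison(const char *comm, int pid,
				    unsigned long address, unsigned int fault)
{
	unsigned int lsb = hwpoison_fault_lsb(fault);

	printf("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
	       comm, pid, address);
	return lsb;
}
```

With one helper like this, do_sigbus() and no_context() would each call
it instead of open-coding the lsb computation and the signal delivery.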

Thanks,
Naoya Horiguchi
