Re: [RFC 0/4] Fix machine check recovery for copy_from_user

2021-04-07 Thread Aili Yao
d by your patch, the user process may check the return value and, on error, exit. The error page will then be freed and may be allocated to another process or to the kernel itself; the new owner will initialize it, and this will trigger an SRAO. If it's used by the kernel, we m

Re: [RFC 0/4] Fix machine check recovery for copy_from_user

2021-04-08 Thread Aili Yao
E for this page has the swap/poison signature, so > the > page is not freed for re-use. > > -Tony Oh, yes. Sorry for my rudeness and my misunderstanding; I just couldn't control my emotions and got confused by some other things. Thanks! Aili Yao

Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-02-22 Thread Aili Yao
On Mon, 22 Feb 2021 13:45:50 +0100 Borislav Petkov wrote: > On Mon, Feb 22, 2021 at 08:35:49PM +0800, Aili Yao wrote: > > Guest VM, the qemu has no way to know the RIPV value, so always get it > > cleared. > > What does that mean? > > The guest VM will get the MCE

Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-02-23 Thread Aili Yao
On Tue, 23 Feb 2021 10:43:00 +0100 Borislav Petkov wrote: > On Tue, Feb 23, 2021 at 10:27:55AM +0800, Aili Yao wrote: > > When Guest access one address with UE error, it will exit guest mode, > > the host will do the recovery job, and then one SIGBUS is send to > > the VCP

Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-02-23 Thread Aili Yao
On Tue, 23 Feb 2021 11:05:38 +0100 Borislav Petkov wrote: > On Tue, Feb 23, 2021 at 05:56:40PM +0800, Aili Yao wrote: > > What i inject is AR error, and I don't see MCG_STATUS_RIPV flag. > > Then keep debugging qemu to figure out why that is. > What I think is qemu

Re: [PATCH v3] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-02-23 Thread Aili Yao
On Fri, 5 Feb 2021 17:01:35 +0800 Aili Yao wrote: > When one page is already hwpoisoned by MCE AO action, processes may not > be killed, processes mapping this page may make a syscall include this > page and result to trigger a VM_FAULT_HWPOISON fault, as it's in kernel > mode i

Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-02-23 Thread Aili Yao
ry SRAR is triggered, RIPV will always be set, then it's the job of qemu to set RIPV instead. Or, if SRAR is triggered with RIPV cleared, the same issue will apply to the host. And I think it's better for the VM to know the real RIPV value; that needs more work in qemu and the kernel, if possible. Thanks Aili Yao

[PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-02-23 Thread Aili Yao
ect. Other callers that care about the return value of memory_failure() should check why they want to process a memory error that has already been processed. This behavior seems reasonable. In kill_me_maybe(), log the fact that the memory may not have been recovered, and kill the related process. Signed-o

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-02-24 Thread Aili Yao
> > > > For other cases which care the return value of memory_failure() should > > check why they want to process a memory error which have already been > > processed. This behavior seems reasonable. > > > > In kill_me_maybe, log the fact about the memory may

Re: [PATCH v3] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-02-24 Thread Aili Yao
On Tue, 23 Feb 2021 08:42:59 -0800 "Luck, Tony" wrote: > On Tue, Feb 23, 2021 at 07:33:46AM -0800, Andy Lutomirski wrote: > > > > > On Feb 23, 2021, at 4:44 AM, Aili Yao wrote: > > > > > > On Fri, 5 Feb 2021 17:01:35 +0800 > > > Ai

Re: [PATCH] x86/fault: Send SIGBUS to user process always for hwpoison page access.

2021-01-28 Thread Aili Yao
On Thu, 28 Jan 2021 09:43:52 -0800 "Luck, Tony" wrote: > On Thu, Jan 28, 2021 at 07:43:26PM +0800, Aili Yao wrote: > > when one page is already hwpoisoned by AO action, process may not be > > killed, the process mapping this page may make a syscall include this > >

Re: [PATCH] x86/fault: Send SIGBUS to user process always for hwpoison page access.

2021-01-31 Thread Aili Yao
Do you mean the difference between force_sig_mceerr() and force_sig_fault()? I see a hwpoison-related comment there, but it's better to follow the usual way and use force_sig_mceerr(); I will change this in a v2 patch. Or, if you mean something else, you may post a better one. Thanks -- Best Regards! Aili Yao

[PATCH v2] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-02-01 Thread Aili Yao
de to user process. This is not sufficient, we should send a SIGBUS to the process and log the info to console, as we can't trust the process will handle the error correctly. Suggested-by: Feng Yang Signed-off-by: Aili Yao --- arch/x86/mm/fault.c | 34 +++--- 1 fi

Re: [PATCH v2] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-02-01 Thread Aili Yao
On Mon, 1 Feb 2021 08:58:27 -0800 Andy Lutomirski wrote: > On Mon, Feb 1, 2021 at 12:17 AM Aili Yao wrote: > > > > When one page is already hwpoisoned by AO action, process may not be > > killed, the process mapping this page may make a syscall include this > > p

Re: [PATCH v2] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-02-04 Thread Aili Yao
On Thu, 4 Feb 2021 07:25:55 + HORIGUCHI NAOYA(堀口 直也) wrote: > Hi Aili, > > On Mon, Feb 01, 2021 at 04:17:49PM +0800, Aili Yao wrote: > > When one page is already hwpoisoned by AO action, process may not be > > killed, the process mapping this page may make a syscall

[PATCH v3] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-02-05 Thread Aili Yao
code to user code. This is not sufficient, we should send a SIGBUS to the process and log the info to console, as we can't trust the process will handle the error correctly. Suggested-by: Feng Yang Signed-off-by: Aili Yao --- arch/x86/mm/fault.c | 62 +-

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-02-25 Thread Aili Yao
ISON, but other options are fine if justified well. > > -EHWPOISON seems like a good fit. > I am OK with the -EHWPOISON error code, but I have one doubt: if we return -EHWPOISON, does this mean we have to add a new error code to error-base.h or errno.h? Is that easy to do? Thanks! Aili Yao

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-02-25 Thread Aili Yao
int, if we block for this issue, does this change the result that the process should be killed? Or is there something else that still needs to be considered? Thanks! Aili Yao

[PATCH v3] mm/gup: check page posion status for coredump.

2021-03-18 Thread Aili Yao
oison status in get_dump_page(), and if TRUE, return NULL. There may be other scenarios where it is also better to check the poison status and not panic, so make a wrapper for this check, thanks to David's suggestion. Signed-off-by: Aili Yao --- mm/gup.c | 4 mm/inter

Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-03-23 Thread Aili Yao
On Wed, 24 Feb 2021 10:39:21 +0800 Aili Yao wrote: > On Tue, 23 Feb 2021 16:12:43 + > "Luck, Tony" wrote: > > > > What I think is qemu has not an easy to get the MCE signature from host > > > or currently no methods for this > > > So qemu t

Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-03-24 Thread Aili Yao
On Wed, 24 Mar 2021 10:59:50 +0800 Aili Yao wrote: > On Wed, 24 Feb 2021 10:39:21 +0800 > Aili Yao wrote: > > > On Tue, 23 Feb 2021 16:12:43 + > > "Luck, Tony" wrote: > > > > > > What I think is qemu has not an easy to get the MCE sig

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-04 Thread Aili Yao
On Thu, 4 Mar 2021 15:57:20 -0800 "Luck, Tony" wrote: > On Thu, Mar 04, 2021 at 02:45:24PM +0800, Aili Yao wrote: > > > > if your methods works, should it be like this? > > > > > > > > 1582 pteval = > > > >

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-04 Thread Aili Yao
On Fri, 5 Mar 2021 09:30:16 +0800 Aili Yao wrote: > On Thu, 4 Mar 2021 15:57:20 -0800 > "Luck, Tony" wrote: > > > On Thu, Mar 04, 2021 at 02:45:24PM +0800, Aili Yao wrote: > > > > > if your methods works, should it be like this? > > > >

Re: [PATCH v5] mm/gup: check page hwposion status for coredump.

2021-03-30 Thread Aili Yao
On Wed, 31 Mar 2021 01:52:59 + HORIGUCHI NAOYA(堀口 直也) wrote: > On Fri, Mar 26, 2021 at 03:22:49PM +0100, David Hildenbrand wrote: > > On 26.03.21 15:09, David Hildenbrand wrote: > > > On 22.03.21 12:33, Aili Yao wrote: > > > > When we do coredump for user p

Re: [PATCH v5] mm/gup: check page hwposion status for coredump.

2021-03-31 Thread Aili Yao
On Wed, 31 Mar 2021 08:44:53 +0200 David Hildenbrand wrote: > On 31.03.21 06:32, HORIGUCHI NAOYA(堀口 直也) wrote: > > On Wed, Mar 31, 2021 at 10:43:36AM +0800, Aili Yao wrote: > >> On Wed, 31 Mar 2021 01:52:59 + HORIGUCHI NAOYA(堀口 直也) > >> wrote: > >>

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-31 Thread Aili Yao
on this topic, but I noticed today that I made a stupid mistake: EHWPOISON has already been declared, so we had better return EHWPOISON for this case. Really sorry for this! As the patch is still under review, I will post a new version. If I change this, may I add your review tag, please? -- Thanks! Aili Yao
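The return-value change discussed in this thread can be illustrated with a small userspace sketch. This is not the kernel's memory_failure(): struct demo_page and demo_memory_failure() are hypothetical names, and the plain bool stands in for the kernel's atomic page flag. It only shows the contract under discussion, where the first report of an error poisons the page and returns 0, and a repeated report for the same page returns -EHWPOISON instead of 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* EHWPOISON is Linux-specific; fall back to the generic Linux value. */
#ifndef EHWPOISON
#define EHWPOISON 133
#endif

/* Hypothetical stand-in for struct page: only the poison flag is modelled. */
struct demo_page {
    bool hwpoison;
};

/*
 * Hypothetical model of the proposed behavior: the first call marks the
 * page poisoned and returns 0; any later call for the same page returns
 * -EHWPOISON so the caller can tell the error was already handled.
 */
int demo_memory_failure(struct demo_page *page)
{
    assert(page != NULL);

    if (page->hwpoison)
        return -EHWPOISON;   /* already poisoned by an earlier report */

    page->hwpoison = true;   /* first report: poison the page */
    return 0;
}
```

A caller such as the machine check path could then treat 0 as "recovery was attempted now" and -EHWPOISON as "someone already handled this page", which is the distinction the thread is asking for.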

[PATCH v3] mm,hwpoison: return -EHWPOISON when page already poisoned

2021-03-31 Thread Aili Yao
ey want to process a memory error that has already been processed. This behavior seems reasonable. Signed-off-by: Aili Yao --- mm/memory-failure.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 24210c9bd843..5cd42144b67c 10

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-03 Thread Aili Yao
On Tue, 2 Mar 2021 19:39:53 -0800 "Luck, Tony" wrote: > On Fri, Feb 26, 2021 at 10:59:15AM +0800, Aili Yao wrote: > > Hi naoya, tony: > > > > > > > > Idea for what we should do next ... Now that x86 is calling > > > > memory_failure()

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-03 Thread Aili Yao
Hi tony: > On Tue, 2 Mar 2021 19:39:53 -0800 > "Luck, Tony" wrote: > > > On Fri, Feb 26, 2021 at 10:59:15AM +0800, Aili Yao wrote: > > > Hi naoya, tony: > > > > > > > > > > Idea for what we should do next ... Now that x86 is

Re: [PATCH v3] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-03-03 Thread Aili Yao
On Wed, 3 Mar 2021 20:24:02 +0800 Aili Yao wrote: > On Mon, 1 Mar 2021 11:09:36 -0800 > Andy Lutomirski wrote: > > > > On Mar 1, 2021, at 11:02 AM, Luck, Tony wrote: > > > > > > > > >> > > >> Some programs may use read(

Re: [PATCH v3] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-03-03 Thread Aili Yao
correctly? If this is the proper action, the original poison flow for read and write in the current code needs to change too. -- Thanks! Aili Yao

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-03 Thread Aili Yao
c_mm_counter(mm, mm_counter(page)); set_pte_at(mm, address, pvmw.pte, pteval); } The page fault handler checks whether it's a poison page using is_hwpoison_entry(). -- Thanks! Aili Yao

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-03 Thread Aili Yao
On Thu, 4 Mar 2021 10:16:53 +0800 Aili Yao wrote: > On Wed, 3 Mar 2021 15:41:35 + > "Luck, Tony" wrote: > > > > For error address with sigbus, i think this is not an issue resulted by > > > the patch i post, before my patch, the issue is already there

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-03 Thread Aili Yao
On Thu, 4 Mar 2021 12:19:41 +0800 Aili Yao wrote: > On Thu, 4 Mar 2021 10:16:53 +0800 > Aili Yao wrote: > > > On Wed, 3 Mar 2021 15:41:35 + > > "Luck, Tony" wrote: > > > > > > For error address with sigbus, i think this is not an issue

Re: [PATCH v3] mm/gup: check page posion status for coredump.

2021-03-21 Thread Aili Yao
On Sat, 20 Mar 2021 00:35:16 + Matthew Wilcox wrote: > On Fri, Mar 19, 2021 at 10:44:37AM +0800, Aili Yao wrote: > > +++ b/mm/gup.c > > @@ -1536,6 +1536,10 @@ struct page *get_dump_page(unsigned long addr) > > FOLL_FORCE | FOLL_DUMP |

[PATCH v5] mm/gup: check page hwposion status for coredump.

2021-03-22 Thread Aili Yao
Signed-off-by: Aili Yao Cc: David Hildenbrand Cc: Matthew Wilcox Cc: Naoya Horiguchi Cc: Oscar Salvador Cc: Mike Kravetz Cc: Aili Yao Cc: sta...@vger.kernel.org Signed-off-by: Andrew Morton --- mm/gup.c | 4 mm/internal.h | 20 2 files changed, 24 insertion

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-16 Thread Aili Yao
pfn = PM_PFRAME(val); > + else if (val & PM_HWPOISON) > + pfn = PM_SWAP_OFFSET(val); > else > pfn = 0; > > @@ -742,7 +745,7 @@ static void walk_vma(unsigned long index, unsigned long > count) > pfn = pagemap_pfn(buf[i]); > if (pfn) > walk_pfn(index + i, pfn, 1, buf[i]); > - if (buf[i] & PM_SWAP) > + else if (buf[i] & PM_SWAP) > walk_swap(index + i, buf[i]); > } > -- Thanks! Aili Yao
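The "if" to "else if" change quoted above makes the PFN and swap cases mutually exclusive when walking pagemap entries. A minimal sketch of that dispatch follows, assuming the documented /proc/pid/pagemap layout (bit 63 present, bit 62 swapped, bits 0-54 PFN, or swap type in bits 0-4 and swap offset in bits 5-54). The enum and function names are made up for illustration, and the PM_HWPOISON case from the quoted patch is left out because its bit position is not shown in the snippet.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout assumed from the kernel's pagemap documentation. */
#define PM_PFRAME_BITS  55
#define PM_PFRAME_MASK  ((UINT64_C(1) << PM_PFRAME_BITS) - 1)
#define PM_SWAP         (UINT64_C(1) << 62)
#define PM_PRESENT      (UINT64_C(1) << 63)
#define SWAP_TYPE_BITS  5

/* Hypothetical classification of one 64-bit pagemap entry. */
enum entry_kind { ENTRY_NONE, ENTRY_PFN, ENTRY_SWAP };

/* PFN of a present page, or 0 if the page is not present. */
uint64_t entry_pfn(uint64_t val)
{
    return (val & PM_PRESENT) ? (val & PM_PFRAME_MASK) : 0;
}

/* Swap offset of a swapped-out page; only valid for swap entries. */
uint64_t entry_swap_offset(uint64_t val)
{
    assert(val & PM_SWAP);
    return (val & PM_PFRAME_MASK) >> SWAP_TYPE_BITS;
}

/* A PFN wins; the swap branch is only considered when there is no PFN. */
enum entry_kind classify_entry(uint64_t val)
{
    if (entry_pfn(val))
        return ENTRY_PFN;
    else if (val & PM_SWAP)
        return ENTRY_SWAP;
    return ENTRY_NONE;
}
```

With a plain "if" for the swap test, an entry that yields a PFN and also has the swap bit set would be walked twice; the "else if" form handles each entry at most once.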

Re: [PATCH v7] mm/gup: check page hwpoison status for memory recovery failures.

2021-04-07 Thread Aili Yao
On Wed, 7 Apr 2021 01:54:28 + HORIGUCHI NAOYA(堀口 直也) wrote: > On Tue, Apr 06, 2021 at 10:41:23AM +0800, Aili Yao wrote: > > When we call get_user_pages() to pin user page in memory, there may be > > hwpoison page, currently, we just handle the normal case that memory >

Re: [PATCH v3] mm,hwpoison: return -EHWPOISON when page already poisoned

2021-04-01 Thread Aili Yao
On Thu, 1 Apr 2021 08:33:20 -0700 "Luck, Tony" wrote: > On Wed, Mar 31, 2021 at 07:25:40PM +0800, Aili Yao wrote: > > When the page is already poisoned, another memory_failure() call in the > > same page now return 0, meaning OK. For nested memory mce handling, this >

Re: [PATCH v1 0/3] mm,hwpoison: fix sending SIGBUS for Action Required MCE

2021-04-16 Thread Aili Yao
ould you please test and > let me have some feedback? > > Thanks, > Naoya Horiguchi > > [1]: > https://lore.kernel.org/linux-mm/20210331192540.2141052f@alex-virtual-machine/ > --- > Summary: > > Aili Yao (1): > mm,hwpoison: return -EHWPOISON when page al

Re: [PATCH v2 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address

2021-04-18 Thread Aili Yao
fdef. > > Here's the v2 of 3/3. > > Aili, could you test with it? > > Thanks, > Naoya Horiguchi > I tested this v2 version. In my test, the patches worked as expected and the previous issues didn't happen again. Tested-by: Aili Yao Thanks, Aili Yao > -

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-17 Thread Aili Yao
ng. > I can't find a way to fix this; maybe the virtual address is contained in a related register, but this is really beyond my knowledge. This is a v2 RFC patch that adds support for THP and 1G huge page errors. Thanks Aili Yao From 31b685609610b3b06c8fd98d866913dbfeb7e159 Mon Sep 17 00

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-17 Thread Aili Yao
bug info and other unclean modifications. Post a clean one. Thanks Aili Yao From 2289276ba943cdcddbf3b5b2cdbcaff78690e2e8 Mon Sep 17 00:00:00 2001 From: Aili Yao Date: Wed, 17 Mar 2021 16:12:41 +0800 Subject: [PATCH] fix invalid SIGBUS address for recovery fail Walk the current process pages a

[PATCH] mm/gup: check page posion status for coredump.

2021-03-17 Thread Aili Yao
oison status in get_dump_page(), and if TRUE, return NULL. Signed-off-by: Aili Yao --- mm/gup.c | 8 1 file changed, 8 insertions(+) diff --git a/mm/gup.c b/mm/gup.c index e4c224c..499a496 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1536,6 +1536,14 @@ struct page *get_dump_page(unsigned long

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-17 Thread Aili Yao
refuse to execute the current read and further operations. For the process, it seems it has a chance to proceed. If just an error code is returned, the process may or may not care, and it may not handle the error correctly. It seems the worst case here is that the process will touch the poison page again,

Re: [PATCH] mm/gup: check page posion status for coredump.

2021-03-17 Thread Aili Yao
t; > return (ret == 1) ? page : NULL; > > } > > #endif /* CONFIG_ELF_CORE */ > > > > Yes, May other places meet the requirements as the coredump meets, it's better to make a wrapper for this. But i am not familiar with the specific scenario, so this patch only cover the coredump case. I will post a v2 patch for this. -- Thanks! Aili Yao

[PATCH v2] mm/gup: check page posion status for coredump.

2021-03-17 Thread Aili Yao
oison status in get_dump_page(), and if TRUE, return NULL. There may be other scenarios where it is also better to check the poison status and not panic, so make a wrapper for this check, suggested by David Hildenbrand Signed-off-by: Aili Yao --- mm/gup.c | 4 mm/internal.h

Re: [PATCH] mm/gup: check page posion status for coredump.

2021-03-17 Thread Aili Yao
tern bool take_page_off_buddy(struct page *page); > #else > PAGEFLAG_FALSE(HWPoison) > #define __PG_HWPOISON 0 > #endif > > so there's no need for this > if (IS_ENABLED(CONFIG_MEMORY_FAILURE) > check, as it simply turns into > > if (PageHuge(page) && 0) > else if (0) > > and the compiler can optimise it all away. Yes, you are right; I will modify this later. Thanks for the correction -- Thanks! Aili Yao
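The review point above, that a flag test which is a constant false needs no extra config guard around it, can be shown with a small userspace sketch. DEMO_MEMORY_FAILURE, struct demo_page and the demo_* functions are hypothetical stand-ins for CONFIG_MEMORY_FAILURE, struct page and the kernel's page-flag helpers; only the pattern is meant to carry over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical switch standing in for CONFIG_MEMORY_FAILURE (off by default). */
#ifndef DEMO_MEMORY_FAILURE
#define DEMO_MEMORY_FAILURE 0
#endif

/* Hypothetical stand-in for struct page. */
struct demo_page {
    bool huge;           /* models PageHuge(page) */
    bool poisoned;       /* models the poison flag of this page */
    bool head_poisoned;  /* models the poison flag of the head page */
};

#if DEMO_MEMORY_FAILURE
static bool demo_hwpoison(bool flag) { return flag; }
#else
/* Feature off: the test is a constant false, like a PAGEFLAG_FALSE-style stub. */
static bool demo_hwpoison(bool flag) { (void)flag; return false; }
#endif

/*
 * No "is the feature enabled" guard is needed here: with the stub above,
 * both conditions reduce to false and the compiler can drop the branches.
 */
bool demo_is_page_hwpoison(const struct demo_page *page)
{
    assert(page != NULL);

    if (page->huge && demo_hwpoison(page->head_poisoned))
        return true;
    else if (demo_hwpoison(page->poisoned))
        return true;
    return false;
}
```

Building with -DDEMO_MEMORY_FAILURE=1 switches the helper to the real flag test without touching the caller, which is why the wrapper itself does not need its own guard.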

Re: [PATCH v3] mm,hwpoison: return -EHWPOISON when page already poisoned

2021-04-05 Thread Aili Yao
that the virtual address will be > available in MCE handler. > > Anyway I'll try to write a patch for this. Yeah, the previous patch didn't address the multiple virtual address issue. If there is a way to fix that, that would be great! -- Thanks! Aili Yao

[PATCH v6] mm/gup: check page hwpoison status for memory recovery failures.

2021-04-05 Thread Aili Yao
. Signed-off-by: Aili Yao Cc: David Hildenbrand Cc: Matthew Wilcox Cc: Naoya Horiguchi Cc: Oscar Salvador Cc: Mike Kravetz Cc: Andrew Morton Cc: sta...@vger.kernel.org --- mm/gup.c | 27 +++ mm/huge_memory.c | 9 +++-- mm/hugetlb.c | 8 +++- mm

[PATCH v7] mm/gup: check page hwpoison status for memory recovery failures.

2021-04-05 Thread Aili Yao
. Changes since v6: - Fix wrong page pointer check in follow_trans_huge_pmd(); Signed-off-by: Aili Yao Cc: David Hildenbrand Cc: Matthew Wilcox Cc: Naoya Horiguchi Cc: Oscar Salvador Cc: Mike Kravetz Cc: Andrew Morton Cc: sta...@vger.kernel.org --- mm/gup.c | 27

x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-02-21 Thread Aili Yao
el will touch the error page again, which results in a fatal error. We need to poison the page and then kill current in the memory-failure module. So fix it using the original checking method. Signed-off-by: Aili Yao --- arch/x86/kernel/cpu/mce/core.c | 7 --- 1 file changed, 4 insertions(+)

[PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-02-21 Thread Aili Yao
tep in kernel will touch the error page again, which results in a fatal error. We need to poison the page and then kill current in the memory-failure module. So fix it using the original checking method. Signed-off-by: Aili Yao --- arch/x86/kernel/cpu/mce/core.c | 5 - 1 file changed, 4 inserti

Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-02-22 Thread Aili Yao
On Mon, 22 Feb 2021 10:24:03 +0100 Borislav Petkov wrote: > On Mon, Feb 22, 2021 at 11:50:07AM +0800, Aili Yao wrote: > > From commit b2f9d678e28c ("x86/mce: Check for faults tagged in > > EXTABLE_CLASS_FAULT exception table entries"), When there is a > > memor

Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-02-22 Thread Aili Yao
On Mon, 22 Feb 2021 11:03:56 +0100 Borislav Petkov wrote: > On Mon, Feb 22, 2021 at 05:31:09PM +0800, Aili Yao wrote: > > you can inject a memory UE to a VM, it should always be MCG_STATUS_RIPV 0. > > So the signature you injected is not something the hardware would > g

Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-02-22 Thread Aili Yao
On Mon, 22 Feb 2021 11:22:06 +0100 Borislav Petkov wrote: > On Mon, Feb 22, 2021 at 06:08:19PM +0800, Aili Yao wrote: > > So why would intel provide this MCG_STATUS_RIPV flag, it's better to > > remove it as it will never be set, and all the related logic for this >

Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-02-22 Thread Aili Yao
On Mon, 22 Feb 2021 19:21:46 +0800 Aili Yao wrote: > On Mon, 22 Feb 2021 11:22:06 +0100 > Borislav Petkov wrote: > > > On Mon, Feb 22, 2021 at 06:08:19PM +0800, Aili Yao wrote: > > > So why would intel provide this MCG_STATUS_RIPV flag, it's better to > >

Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-02-22 Thread Aili Yao
On Mon, 22 Feb 2021 13:22:41 +0100 Borislav Petkov wrote: > On Mon, Feb 22, 2021 at 08:17:23PM +0800, Aili Yao wrote: > > AR (Action Required) flag, bit 55 - Indicates (when set) that MCA > > error code specific recovery action must be... > > Give me the *exact* MCE signa

Re: [PATCH v3] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-03-01 Thread Aili Yao
re what to do about this. Do you mean the patch will replace SIGSEGV with SIGBUS for the hwpoison case? I think SIGBUS is more accurate for this error. Normally, for a poison access, the process shouldn't be returned to; an exit is fine, or else we need another code path for this, I think. This is the legacy way to handle a user poison access error, like the other poison code branches in the kernel. Thanks! Aili Yao

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-11 Thread Aili Yao
et correctly, but we may lose the correct page shift? And for the copyin case, we don't need to call set_mce_nospec()? -- Thanks! Aili Yao

Re: [PATCH v3] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-03-10 Thread Aili Yao
we can use send_sig_mceerr() instead of force_sig_mceerr(); if the process wants to ignore the SIGBUS, it will be ignored, or the process can also handle the SIGBUS? -- Thanks! Aili Yao

Re: [PATCH v3] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-03-10 Thread Aili Yao
On Wed, 10 Mar 2021 17:28:12 -0800 Andy Lutomirski wrote: > On Wed, Mar 10, 2021 at 5:19 PM Aili Yao wrote: > > > > On Mon, 8 Mar 2021 11:00:28 -0800 > > Andy Lutomirski wrote: > > > > > > On Mar 8, 2021, at 10:31 AM, Luck, Tony wrote: > > >

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-11 Thread Aili Yao
On Thu, 11 Mar 2021 08:55:30 + HORIGUCHI NAOYA(堀口 直也) wrote: > On Wed, Mar 10, 2021 at 02:10:42PM +0800, Aili Yao wrote: > > On Fri, 5 Mar 2021 15:55:25 + > > "Luck, Tony" wrote: > > > > > > From the walk, it seems we have got the virtual

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-02 Thread Aili Yao
On Fri, 26 Feb 2021 09:58:37 -0800 "Luck, Tony" wrote: > On Fri, Feb 26, 2021 at 10:52:50AM +0800, Aili Yao wrote: > > Hi naoya,Oscar,david: > > > > > > > We could use some negative value (error code) to report the reported > > > > cas

Re: [PATCH v3] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-03-08 Thread Aili Yao
Sorry, another question: > > When programs use read(2), write(2) as ways to check if memory is valid, > > does it really want to check if the user page the program provided is > > valid, not the destination or disk space valid? > > They may well be trying to see if their memory is valid. Thanks for your reply, and I don't know what to do. With the current code, if a user program writes to a block device (maybe as a test) and its user page turns out to be corrupt during the kernel copy, the process is killed with a SIGBUS. But for the page fault case in this thread, the process just gets an error return. -- Thanks! Aili Yao

Re: [PATCH] mm/memory-failure: Use a mutex to avoid memory_failure() races

2021-03-08 Thread Aili Yao
SHIFT, flags) && !(p->mce_kflags & MCE_IN_KERNEL_COPYIN)) { set_mce_nospec(p->mce_addr >> PAGE_SHIFT, p->mce_whole_page); sync_core(); return; } We place set_mce_nospec() here for a reason; please see commit fd0e786d9d09024f67b. 2. When memory_failure() returns 0, we may return to the user process, which may re-execute the instruction that triggered the previous fault. This behavior implicitly depends on the related pte having been set correctly; if it has not, it will lead to an infinite loop again. -- Thanks! Aili Yao

Re: [PATCH v3] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-03-08 Thread Aili Yao
the address in siginfo. > > -Tony Is the kill action for this scenario in memory_failure()? -- Thanks! Aili Yao

Re: [PATCH v3] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-03-08 Thread Aili Yao
On Tue, 9 Mar 2021 10:14:52 +0800 Aili Yao wrote: > On Mon, 8 Mar 2021 18:31:07 + > "Luck, Tony" wrote: > > > > Can you point me at that SIGBUS code in a current kernel? > > > > It is in kill_me_maybe(). mce_vaddr is setup when we disassemble w

[PATCH v2] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-08 Thread Aili Yao
t to process a memory error that has already been processed. This behavior seems reasonable. Signed-off-by: Aili Yao --- mm/memory-failure.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 24210c9bd843..b6bc77460ee1 100644 ---

Re: [PATCH] mm/memory-failure: Use a mutex to avoid memory_failure() races

2021-03-08 Thread Aili Yao
rom memory_failure()'s concurrency issue, > so I'm still expecting that your patch is to be merged. Maybe do you want > to update it based on the discussion (if it's concluded)? > > Thanks, > Naoya Horiguchi I have submitted a v2 patch, and please help review. Thanks! -- Thanks! Aili Yao

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-09 Thread Aili Yao
condition, and if you really think the pfn with SIGBUS is not proper, I think the following patch may be one way. I copied your abandoned code and made a small modification, and just now it passed my simple test. Also, this is an RFC version, only valid if you think the pfn with SIGBUS is not right.

Re: [PATCH v2] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-10 Thread Aili Yao
On Tue, 9 Mar 2021 08:28:24 + HORIGUCHI NAOYA(堀口 直也) wrote: > On Tue, Mar 09, 2021 at 02:35:34PM +0800, Aili Yao wrote: > > When the page is already poisoned, another memory_failure() call in the > > same page now return 0, meaning OK. For nested memory mce handling, this >

[PATCH] x86/fault: Send SIGBUS to user process always for hwpoison page access.

2021-01-28 Thread Aili Yao
de to user process. This is not sufficient, we should send a SIGBUS to the process and log the info to console, as we can't trust the process will handle the error correctly. Suggested-by: Feng Yang Signed-off-by: Aili Yao --- arch/x86/mm/fault.c | 16 1 file changed, 16