Re: [PATCH v13 2/5] arm64: add support for ARCH_HAS_COPY_MC

2025-04-05 Thread Tong Tiangen
On 2025/3/29 1:06, Yeoreum Yun wrote: Hi, On 2025/2/13 0:21, Catalin Marinas wrote: (catching up with old threads) On Mon, Dec 09, 2024 at 10:42:54AM +0800, Tong Tiangen wrote: For the arm64 kernel, when it processes hardware memory errors for synchronous notifications (do_sea()), if the errors

Re: [PATCH v13 2/5] arm64: add support for ARCH_HAS_COPY_MC

2025-04-02 Thread Tong Tiangen
On 2025/3/25 0:54, Luck, Tony wrote: On Fri, Feb 14, 2025 at 09:44:02AM +0800, Tong Tiangen wrote: On 2025/2/13 0:21, Catalin Marinas wrote: (catching up with old threads) On Mon, Dec 09, 2024 at 10:42:54AM +0800, Tong Tiangen wrote: For the arm64 kernel, when it processes hardware memory

Re: [PATCH v13 4/5] arm64: support copy_mc_[user]_highpage()

2025-03-04 Thread Tong Tiangen
Hi, Catalin: Kindly ping ... Thanks. :) On 2025/2/19 3:42, Catalin Marinas wrote: On Tue, Feb 18, 2025 at 07:51:10PM +0800, Tong Tiangen wrote: On 2025/2/13 1:11, Catalin Marinas wrote: On Mon, Dec 09, 2024 at 10:42:56AM +0800, Tong Tiangen wrote: Currently, many scenarios that can tolerate memory

Re: [PATCH v13 4/5] arm64: support copy_mc_[user]_highpage()

2025-02-18 Thread Tong Tiangen
On 2025/2/17 22:55, Catalin Marinas wrote: On Mon, Feb 17, 2025 at 04:07:49PM +0800, Tong Tiangen wrote: On 2025/2/15 1:24, Catalin Marinas wrote: On Fri, Feb 14, 2025 at 10:49:01AM +0800, Tong Tiangen wrote: On 2025/2/13 1:11, Catalin Marinas wrote: On Mon, Dec 09, 2024 at 10:42:56AM +0800, Tong

Re: [PATCH v13 4/5] arm64: support copy_mc_[user]_highpage()

2025-02-17 Thread Tong Tiangen
On 2025/2/15 1:24, Catalin Marinas wrote: On Fri, Feb 14, 2025 at 10:49:01AM +0800, Tong Tiangen wrote: On 2025/2/13 1:11, Catalin Marinas wrote: On Mon, Dec 09, 2024 at 10:42:56AM +0800, Tong Tiangen wrote: Currently, many scenarios that can tolerate memory errors when copying a page have been

Re: [PATCH v13 5/5] arm64: introduce copy_mc_to_kernel() implementation

2025-02-13 Thread Tong Tiangen
On 2025/2/13 1:18, Catalin Marinas wrote: On Mon, Dec 09, 2024 at 10:42:57AM +0800, Tong Tiangen wrote: The copy_mc_to_kernel() helper is a memory copy implementation that handles source exceptions. It can be used in memory copy scenarios that tolerate hardware memory errors (e.g. pmem_read

Re: [PATCH v13 4/5] arm64: support copy_mc_[user]_highpage()

2025-02-13 Thread Tong Tiangen
On 2025/2/13 1:11, Catalin Marinas wrote: On Mon, Dec 09, 2024 at 10:42:56AM +0800, Tong Tiangen wrote: Currently, many scenarios that can tolerate memory errors when copying a page have been supported in the kernel [1~5], all of which are implemented by copy_mc_[user]_highpage(). arm64 should also

Re: [PATCH v13 2/5] arm64: add support for ARCH_HAS_COPY_MC

2025-02-13 Thread Tong Tiangen
On 2025/2/13 0:21, Catalin Marinas wrote: (catching up with old threads) On Mon, Dec 09, 2024 at 10:42:54AM +0800, Tong Tiangen wrote: For the arm64 kernel, when it processes hardware memory errors for synchronous notifications (do_sea()), if the error is consumed within the kernel, the current

Re: [PATCH v13 4/5] arm64: support copy_mc_[user]_highpage()

2025-02-13 Thread Tong Tiangen
On 2025/2/13 1:11, Catalin Marinas wrote: On Mon, Dec 09, 2024 at 10:42:56AM +0800, Tong Tiangen wrote: Currently, many scenarios that can tolerate memory errors when copying a page have been supported in the kernel [1~5], all of which are implemented by copy_mc_[user]_highpage(). arm64 should also

[PATCH v13 0/5]arm64: add ARCH_HAS_COPY_MC support

2024-12-08 Thread Tong Tiangen
Borislav's suggestion, update commit message of patch 1/5. Since V1: 1. Consistent with PPC/x86, using CONFIG_ARCH_HAS_COPY_MC instead of ARM64_UCE_KERNEL_RECOVERY. 2. Add two new scenes, cow and pagecache reading. 3. Fix two small bugs (the first two patches). V1 is here: https://lore.ke

[PATCH v13 4/5] arm64: support copy_mc_[user]_highpage()

2024-12-08 Thread Tong Tiangen
rom poisoned anonymous memory") [5] commit 12904d953364 ("mm/khugepaged: recover from poisoned file-backed memory") Signed-off-by: Tong Tiangen --- arch/arm64/include/asm/mte.h| 9 arch/arm64/include/asm/page.h | 10 arch/arm64/lib/Makefile

[PATCH v13 1/5] uaccess: add generic fallback version of copy_mc_to_user()

2024-12-08 Thread Tong Tiangen
x86/powerpc has its own implementation of copy_mc_to_user(); we add a generic fallback in include/linux/uaccess.h to prepare for other architectures to enable CONFIG_ARCH_HAS_COPY_MC. Signed-off-by: Tong Tiangen Acked-by: Michael Ellerman Reviewed-by: Mauro Carvalho Chehab Reviewed-by: Jon

[PATCH v13 3/5] mm/hwpoison: return -EFAULT when copy fail in copy_mc_[user]_highpage()

2024-12-08 Thread Tong Tiangen
eck if the copy succeeded or not, make the interface more generic by returning an error code when the copy fails (-EFAULT) or zero on success. Signed-off-by: Tong Tiangen Reviewed-by: Jonathan Cameron Reviewed-by: Mauro Carvalho Chehab --- include/linux/highmem.h | 8 mm/khugepaged.c

[PATCH v13 5/5] arm64: introduce copy_mc_to_kernel() implementation

2024-12-08 Thread Tong Tiangen
considered at present. Signed-off-by: Tong Tiangen --- arch/arm64/include/asm/string.h | 5 ++ arch/arm64/include/asm/uaccess.h | 18 ++ arch/arm64/lib/Makefile | 2 +- arch/arm64/lib/memcpy_mc.S | 98 mm/kasan/shadow.c| 12 +++

[PATCH v13 2/5] arm64: add support for ARCH_HAS_COPY_MC

2024-12-08 Thread Tong Tiangen
__arch_copy_to_user(). This makes the regular copy_to_user() handle kernel memory errors. Signed-off-by: Tong Tiangen --- arch/arm64/Kconfig | 1 + arch/arm64/include/asm/asm-extable.h | 31 +++- arch/arm64/include/asm/asm-uaccess.h | 4 arch/arm64

Re: [PATCH v12 4/6] arm64: support copy_mc_[user]_highpage()

2024-08-21 Thread Tong Tiangen
On 2024/8/21 19:28, Jonathan Cameron wrote: On Tue, 20 Aug 2024 11:02:05 +0800 Tong Tiangen wrote: On 2024/8/19 19:56, Jonathan Cameron wrote: On Tue, 28 May 2024 16:59:13 +0800 Tong Tiangen wrote: Currently, many scenarios that can tolerate memory errors when copying a page have been

Re: [PATCH v12 2/6] arm64: add support for ARCH_HAS_COPY_MC

2024-08-20 Thread Tong Tiangen
On 2024/8/20 17:12, Mark Rutland wrote: On Tue, Aug 20, 2024 at 10:11:45AM +0800, Tong Tiangen wrote: On 2024/8/20 1:29, Mark Rutland wrote: Hi Tong, On Tue, May 28, 2024 at 04:59:11PM +0800, Tong Tiangen wrote: For the arm64 kernel, when it processes hardware memory errors for synchronous

Re: [PATCH v12 6/6] arm64: send SIGBUS to user process for SEA exception

2024-08-19 Thread Tong Tiangen
On 2024/8/19 20:08, Jonathan Cameron wrote: On Tue, 28 May 2024 16:59:15 +0800 Tong Tiangen wrote: For an SEA exception, the kernel needs to take some action to recover from the memory error, such as isolating the poisoned page and killing the failing thread, which are done in memory_failure(). During our test, the

Re: [PATCH v12 4/6] arm64: support copy_mc_[user]_highpage()

2024-08-19 Thread Tong Tiangen
On 2024/8/19 19:56, Jonathan Cameron wrote: On Tue, 28 May 2024 16:59:13 +0800 Tong Tiangen wrote: Currently, many scenarios that can tolerate memory errors when copying a page have been supported in the kernel [1~5], all of which are implemented by copy_mc_[user]_highpage(). arm64 should also

Re: [PATCH v12 2/6] arm64: add support for ARCH_HAS_COPY_MC

2024-08-19 Thread Tong Tiangen
On 2024/8/19 18:30, Jonathan Cameron wrote: On Tue, 28 May 2024 16:59:11 +0800 Tong Tiangen wrote: For the arm64 kernel, when it processes hardware memory errors for synchronous notifications (do_sea()), if the error is consumed within the kernel, the current behavior is to panic. However, it

Re: [PATCH v12 2/6] arm64: add support for ARCH_HAS_COPY_MC

2024-08-19 Thread Tong Tiangen
On 2024/8/20 1:29, Mark Rutland wrote: Hi Tong, On Tue, May 28, 2024 at 04:59:11PM +0800, Tong Tiangen wrote: For the arm64 kernel, when it processes hardware memory errors for synchronous notifications (do_sea()), if the error is consumed within the kernel, the current processing is panic

Re: [PATCH v12 1/6] uaccess: add generic fallback version of copy_mc_to_user()

2024-08-19 Thread Tong Tiangen
On 2024/8/19 17:57, Jonathan Cameron wrote: On Tue, 28 May 2024 16:59:10 +0800 Tong Tiangen wrote: x86/powerpc has its own implementation of copy_mc_to_user(); we add a generic fallback in include/linux/uaccess.h to prepare for other architectures to enable CONFIG_ARCH_HAS_COPY_MC. Signed-o

[PATCH v12 0/6]arm64: add ARCH_HAS_COPY_MC support

2024-05-28 Thread Tong Tiangen
Since V1: 1. Consistent with PPC/x86, using CONFIG_ARCH_HAS_COPY_MC instead of ARM64_UCE_KERNEL_RECOVERY. 2. Add two new scenes, cow and pagecache reading. 3. Fix two small bugs (the first two patches). V1 is here: https://lore.kernel.org/lkml/20220323033705.3966643-1-tongtian...@huawei.com/ T

[PATCH v12 4/6] arm64: support copy_mc_[user]_highpage()

2024-05-28 Thread Tong Tiangen
"mm/khugepaged: recover from poisoned file-backed memory") Signed-off-by: Tong Tiangen --- arch/arm64/include/asm/mte.h| 9 + arch/arm64/include/asm/page.h | 10 ++ arch/arm64/lib/Makefile | 2 ++ arch/arm64/lib/copy_mc_page.S | 35 ++

[PATCH v12 6/6] arm64: send SIGBUS to user process for SEA exception

2024-05-28 Thread Tong Tiangen
signals to user processes in do_sea(). After [1] is merged, this patch can be rolled back or the SIGBUS will be sent repeatedly. [1] https://lore.kernel.org/lkml/20240204080144.7977-1-xuesh...@linux.alibaba.com/ Signed-off-by: Tong Tiangen --- arch/arm64/mm/fault.c | 14 +++--- 1 file

[PATCH v12 1/6] uaccess: add generic fallback version of copy_mc_to_user()

2024-05-28 Thread Tong Tiangen
x86/powerpc has its own implementation of copy_mc_to_user(); we add a generic fallback in include/linux/uaccess.h to prepare for other architectures to enable CONFIG_ARCH_HAS_COPY_MC. Signed-off-by: Tong Tiangen Acked-by: Michael Ellerman --- arch/powerpc/include/asm/uaccess.h | 1 + arch/x86/in

[PATCH v12 5/6] arm64: introduce copy_mc_to_kernel() implementation

2024-05-28 Thread Tong Tiangen
introduce copy_mc_to_kernel() implementation. Also add memcpy_mc() for memory copy that handles source exceptions. Because no GPR is available for saving "bytes not copied" in memcpy(), memcpy_mc() is modeled on the implementation of copy_from_user(). Signed-off-by: To

[PATCH v12 3/6] mm/hwpoison: return -EFAULT when copy fail in copy_mc_[user]_highpage()

2024-05-28 Thread Tong Tiangen
If hardware errors are encountered during page copying, returning the bytes not copied is not meaningful, and the caller cannot do any processing on the remaining data. Returning -EFAULT is more reasonable, which represents a hardware error encountered during the copying. Signed-off-by: Tong

[PATCH v12 2/6] arm64: add support for ARCH_HAS_COPY_MC

2024-05-28 Thread Tong Tiangen
, only the associated process is affected. Killing the user process and isolating the corrupt page is a better choice. New fixup type EX_TYPE_KACCESS_ERR_ZERO_ME_SAFE is added to identify insn that can recover from memory errors triggered by access to kernel memory. Signed-off-by: Tong Tiangen

Re: [PATCH v11 0/5]arm64: add ARCH_HAS_COPY_MC support

2024-03-26 Thread Tong Tiangen
Hi Mark: Kindly ping... Thanks, Tong. On 2024/2/7 21:21, Tong Tiangen wrote: With the increase of memory capacity and density, the probability of memory errors also increases. The increasing size and density of server RAM in data centers and clouds have shown increased uncorrectable memory

Re: [PATCH v11 0/5]arm64: add ARCH_HAS_COPY_MC support

2024-02-17 Thread Tong Tiangen
Hi Mark: Kindly ping :) Thanks. Tong. On 2024/2/7 21:21, Tong Tiangen wrote: With the increase of memory capacity and density, the probability of memory errors also increases. The increasing size and density of server RAM in data centers and clouds have shown increased uncorrectable memory errors

[PATCH v11 5/5] arm64: send SIGBUS to user process for SEA exception

2024-02-07 Thread Tong Tiangen
signals to user processes (!(PF_KTHREAD|PF_IO_WORKER|PF_WQ_WORKER|PF_USER_WORKER)) in do_sea(). After [1] is merged, this patch can be rolled back or the SIGBUS will be sent repeatedly. [1] https://lore.kernel.org/lkml/20240204080144.7977-1-xuesh...@linux.alibaba.com/ Signed-off-by: Tong Tiangen

[PATCH v11 4/5] arm64: support copy_mc_[user]_highpage()

2024-02-07 Thread Tong Tiangen
"mm/khugepaged: recover from poisoned file-backed memory") Signed-off-by: Tong Tiangen --- arch/arm64/include/asm/mte.h| 9 + arch/arm64/include/asm/page.h | 10 ++ arch/arm64/lib/Makefile | 2 ++ arch/arm64/lib/copy_mc_page.S | 37 +++

[PATCH v11 0/5]arm64: add ARCH_HAS_COPY_MC support

2024-02-07 Thread Tong Tiangen
tion. 3. According to Mark's suggestion, update commit message of patch 2/5. 4. According to Borislav's suggestion, update commit message of patch 1/5. Since V1: 1. Consistent with PPC/x86, using CONFIG_ARCH_HAS_COPY_MC instead of ARM64_UCE_KERNEL_RECOVERY. 2. Add two new scenes, cow and p

[PATCH v11 3/5] mm/hwpoison: return -EFAULT when copy fail in copy_mc_[user]_highpage()

2024-02-07 Thread Tong Tiangen
If hardware errors are encountered during page copying, returning the bytes not copied is not meaningful, and the caller cannot do any processing on the remaining data. Returning -EFAULT is more reasonable, which represents a hardware error encountered during the copying. Signed-off-by: Tong

[PATCH v11 2/5] arm64: add support for ARCH_HAS_COPY_MC

2024-02-07 Thread Tong Tiangen
, only the associated process is affected. Killing the user process and isolating the corrupt page is a better choice. New fixup type EX_TYPE_KACCESS_ERR_ZERO_ME_SAFE is added to identify insn that can recover from memory errors triggered by access to kernel memory. Signed-off-by: Tong Tiangen

[PATCH v11 1/5] uaccess: add generic fallback version of copy_mc_to_user()

2024-02-07 Thread Tong Tiangen
x86/powerpc has its own implementation of copy_mc_to_user(); we add a generic fallback in include/linux/uaccess.h to prepare for other architectures to enable CONFIG_ARCH_HAS_COPY_MC. Signed-off-by: Tong Tiangen Acked-by: Michael Ellerman --- arch/powerpc/include/asm/uaccess.h | 1 + arch/x86/in

Re: [PATCH v10 6/6] arm64: introduce copy_mc_to_kernel() implementation

2024-01-30 Thread Tong Tiangen
On 2024/1/30 18:20, Mark Rutland wrote: On Mon, Jan 29, 2024 at 09:46:52PM +0800, Tong Tiangen wrote: The copy_mc_to_kernel() helper is a memory copy implementation that handles source exceptions. It can be used in memory copy scenarios that tolerate hardware memory errors (e.g. pmem_read

Re: [PATCH v10 5/6] arm64: support copy_mc_[user]_highpage()

2024-01-30 Thread Tong Tiangen
On 2024/1/30 18:31, Mark Rutland wrote: On Mon, Jan 29, 2024 at 09:46:51PM +0800, Tong Tiangen wrote: Currently, many scenarios that can tolerate memory errors when copying a page have been supported in the kernel [1][2][3], all of which are implemented by copy_mc_[user]_highpage(). arm64 should

Re: [PATCH v10 3/6] arm64: add uaccess to machine check safe

2024-01-30 Thread Tong Tiangen
On 2024/1/30 20:01, Mark Rutland wrote: On Tue, Jan 30, 2024 at 07:14:35PM +0800, Tong Tiangen wrote: On 2024/1/30 1:43, Mark Rutland wrote: On Mon, Jan 29, 2024 at 09:46:49PM +0800, Tong Tiangen wrote: Further, this change will also silently fixup unexpected kernel faults if we pass bad kernel

Re: [PATCH v10 2/6] arm64: add support for machine check error safe

2024-01-30 Thread Tong Tiangen
On 2024/1/30 21:07, Mark Rutland wrote: On Tue, Jan 30, 2024 at 06:57:24PM +0800, Tong Tiangen wrote: On 2024/1/30 1:51, Mark Rutland wrote: On Mon, Jan 29, 2024 at 09:46:48PM +0800, Tong Tiangen wrote: diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index 55f6455a8284..312932dc100b

Re: [PATCH v10 3/6] arm64: add uaccess to machine check safe

2024-01-30 Thread Tong Tiangen
On 2024/1/30 1:43, Mark Rutland wrote: On Mon, Jan 29, 2024 at 09:46:49PM +0800, Tong Tiangen wrote: If a user process's memory access fails due to a hardware memory error, only the relevant processes are affected, so it is more reasonable to kill the user process and isolate the corrupt page than to

Re: [PATCH v10 2/6] arm64: add support for machine check error safe

2024-01-30 Thread Tong Tiangen
On 2024/1/30 1:51, Mark Rutland wrote: On Mon, Jan 29, 2024 at 09:46:48PM +0800, Tong Tiangen wrote: For the arm64 kernel, when it processes hardware memory errors for synchronous notifications (do_sea()), if the error is consumed within the kernel, the current behavior is to panic. However, it

[PATCH v10 6/6] arm64: introduce copy_mc_to_kernel() implementation

2024-01-29 Thread Tong Tiangen
framework, we introduce copy_mc_to_kernel() implementation. Signed-off-by: Tong Tiangen --- arch/arm64/include/asm/string.h | 5 + arch/arm64/include/asm/uaccess.h | 21 +++ arch/arm64/lib/Makefile | 2 +- arch/arm64/lib/memcpy_mc.S | 257 +++ mm/kasan

[PATCH v10 0/6]arm64: add machine check safe support

2024-01-29 Thread Tong Tiangen
o new scenes, cow and pagecache reading. 3. Fix two small bugs (the first two patches). V1 is here: https://lore.kernel.org/lkml/20220323033705.3966643-1-tongtian...@huawei.com/ Tong Tiangen (6): uaccess: add generic fallback version of copy_mc_to_user() arm64: add support for machine check erro

[PATCH v10 2/6] arm64: add support for machine check error safe

2024-01-29 Thread Tong Tiangen
user process will be affected. Killing the user process and isolating the corrupt page is a better choice. This patch only enables the machine error check framework and adds an exception fixup before the kernel panic in do_sea(). Signed-off-by: Tong Tiangen --- arch/arm64/Kconfig | 1

[PATCH v10 4/6] mm/hwpoison: return -EFAULT when copy fail in copy_mc_[user]_highpage()

2024-01-29 Thread Tong Tiangen
If hardware errors are encountered during page copying, returning the bytes not copied is not meaningful, and the caller cannot do any processing on the remaining data. Returning -EFAULT is more reasonable, which represents a hardware error encountered during the copying. Signed-off-by: Tong

[PATCH v10 3/6] arm64: add uaccess to machine check safe

2024-01-29 Thread Tong Tiangen
If a user process's memory access fails due to a hardware memory error, only the relevant processes are affected, so it is more reasonable to kill the user process and isolate the corrupt page than to panic the kernel. Signed-off-by: Tong Tiangen --- arch/arm64/lib/copy_from_user.S | 10

[PATCH v10 5/6] arm64: support copy_mc_[user]_highpage()

2024-01-29 Thread Tong Tiangen
2500b93cc9 ("mm/khugepaged: recover from poisoned anonymous memory") [3]6b970599e807 ("mm: hwpoison: support recovery from ksm_might_need_to_copy()") Signed-off-by: Tong Tiangen --- arch/arm64/include/asm/asm-extable.h | 15 ++ arch/arm64/include/asm/assembler.h | 4 ++

[PATCH v10 1/6] uaccess: add generic fallback version of copy_mc_to_user()

2024-01-29 Thread Tong Tiangen
x86/powerpc has its own implementation of copy_mc_to_user(); we add a generic fallback in include/linux/uaccess.h to prepare for other architectures to enable CONFIG_ARCH_HAS_COPY_MC. Signed-off-by: Tong Tiangen Acked-by: Michael Ellerman --- arch/powerpc/include/asm/uaccess.h | 1 + arch/x86/in