On 2025/3/29 1:06, Yeoreum Yun wrote:
Hi,
On 2025/2/13 0:21, Catalin Marinas wrote:
(catching up with old threads)
On Mon, Dec 09, 2024 at 10:42:54AM +0800, Tong Tiangen wrote:
For the arm64 kernel, when it processes hardware memory errors for
synchronous notifications (do_sea()), if the errors
On 2025/3/25 0:54, Luck, Tony wrote:
On Fri, Feb 14, 2025 at 09:44:02AM +0800, Tong Tiangen wrote:
On 2025/2/13 0:21, Catalin Marinas wrote:
(catching up with old threads)
On Mon, Dec 09, 2024 at 10:42:54AM +0800, Tong Tiangen wrote:
For the arm64 kernel, when it processes hardware memory
Hi, Catalin:
Kindly ping...
Thanks. :)
On 2025/2/19 3:42, Catalin Marinas wrote:
On Tue, Feb 18, 2025 at 07:51:10PM +0800, Tong Tiangen wrote:
On 2025/2/13 1:11, Catalin Marinas wrote:
On Mon, Dec 09, 2024 at 10:42:56AM +0800, Tong Tiangen wrote:
Currently, many scenarios that can tolerate memory
On 2025/2/17 22:55, Catalin Marinas wrote:
On Mon, Feb 17, 2025 at 04:07:49PM +0800, Tong Tiangen wrote:
On 2025/2/15 1:24, Catalin Marinas wrote:
On Fri, Feb 14, 2025 at 10:49:01AM +0800, Tong Tiangen wrote:
On 2025/2/13 1:11, Catalin Marinas wrote:
On Mon, Dec 09, 2024 at 10:42:56AM +0800, Tong
On 2025/2/15 1:24, Catalin Marinas wrote:
On Fri, Feb 14, 2025 at 10:49:01AM +0800, Tong Tiangen wrote:
On 2025/2/13 1:11, Catalin Marinas wrote:
On Mon, Dec 09, 2024 at 10:42:56AM +0800, Tong Tiangen wrote:
Currently, many scenarios that can tolerate memory errors when copying pages
have been
On 2025/2/13 1:18, Catalin Marinas wrote:
On Mon, Dec 09, 2024 at 10:42:57AM +0800, Tong Tiangen wrote:
The copy_mc_to_kernel() helper is a memory copy implementation that handles
source exceptions. It can be used in memory copy scenarios that tolerate
hardware memory errors (e.g. pmem_read
On 2025/2/13 1:11, Catalin Marinas wrote:
On Mon, Dec 09, 2024 at 10:42:56AM +0800, Tong Tiangen wrote:
Currently, many scenarios that can tolerate memory errors when copying pages
have been supported in the kernel [1~5], all of which are implemented by
copy_mc_[user]_highpage(). arm64 should also
On 2025/2/13 0:21, Catalin Marinas wrote:
(catching up with old threads)
On Mon, Dec 09, 2024 at 10:42:54AM +0800, Tong Tiangen wrote:
For the arm64 kernel, when it processes hardware memory errors for
synchronous notifications (do_sea()), if the error is consumed within the
kernel, the current
On 2025/2/13 1:11, Catalin Marinas wrote:
On Mon, Dec 09, 2024 at 10:42:56AM +0800, Tong Tiangen wrote:
Currently, many scenarios that can tolerate memory errors when copying pages
have been supported in the kernel [1~5], all of which are implemented by
copy_mc_[user]_highpage(). arm64 should also
Borislav's suggestion, update commit message of patch 1/5.
Since V1:
1. Consistent with PPC/x86, use CONFIG_ARCH_HAS_COPY_MC instead of
ARM64_UCE_KERNEL_RECOVERY.
2. Add two new scenarios: CoW and pagecache reading.
3. Fix two small bugs (the first two patches).
V1 is here:
https://lore.ke
rom poisoned anonymous
memory")
[5] commit 12904d953364 ("mm/khugepaged: recover from poisoned file-backed
memory")
Signed-off-by: Tong Tiangen
---
arch/arm64/include/asm/mte.h| 9
arch/arm64/include/asm/page.h | 10
arch/arm64/lib/Makefile
x86/powerpc have their own implementations of copy_mc_to_user(); we add a
generic fallback in include/linux/uaccess.h to prepare for other
architectures to enable CONFIG_ARCH_HAS_COPY_MC.
Signed-off-by: Tong Tiangen
Acked-by: Michael Ellerman
Reviewed-by: Mauro Carvalho Chehab
Reviewed-by: Jon
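The "generic fallback" pattern described in this commit message can be illustrated with a small userspace sketch. This is not the code from the patch: demo_copy_mc() is a hypothetical name, and a plain memcpy() stands in for the real copy routine. The only point is the #ifndef guard, which lets an architecture override the helper by defining a macro of the same name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * An "architecture" with its own machine-check-safe copy would define
 * the function and then announce it with:
 *     #define demo_copy_mc demo_copy_mc
 * before this point. demo_copy_mc is a hypothetical name used only for
 * this sketch.
 */
#ifndef demo_copy_mc
/* Generic fallback: plain copy, returns the number of bytes NOT copied. */
static inline unsigned long demo_copy_mc(void *dst, const void *src, size_t cnt)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, cnt);
	return 0;
}
#define demo_copy_mc demo_copy_mc
#endif
```

Callers use demo_copy_mc() unconditionally and get either the architecture version or the fallback, depending on what was defined first.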
eck if
the copy succeeded or not, make the interface more generic by returning an
error code (-EFAULT) when the copy fails, or zero on success.
Signed-off-by: Tong Tiangen
Reviewed-by: Jonathan Cameron
Reviewed-by: Mauro Carvalho Chehab
---
include/linux/highmem.h | 8
mm/khugepaged.c
considered
at present.
Signed-off-by: Tong Tiangen
---
arch/arm64/include/asm/string.h | 5 ++
arch/arm64/include/asm/uaccess.h | 18 ++
arch/arm64/lib/Makefile | 2 +-
arch/arm64/lib/memcpy_mc.S | 98
mm/kasan/shadow.c| 12 +++
__arch_copy_to_user(). This makes the regular
copy_to_user() handle kernel memory errors.
Signed-off-by: Tong Tiangen
---
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/asm-extable.h | 31 +++-
arch/arm64/include/asm/asm-uaccess.h | 4
arch/arm64
On 2024/8/21 19:28, Jonathan Cameron wrote:
On Tue, 20 Aug 2024 11:02:05 +0800
Tong Tiangen wrote:
On 2024/8/19 19:56, Jonathan Cameron wrote:
On Tue, 28 May 2024 16:59:13 +0800
Tong Tiangen wrote:
Currently, many scenarios that can tolerate memory errors when copying pages
have been
On 2024/8/20 17:12, Mark Rutland wrote:
On Tue, Aug 20, 2024 at 10:11:45AM +0800, Tong Tiangen wrote:
On 2024/8/20 1:29, Mark Rutland wrote:
Hi Tong,
On Tue, May 28, 2024 at 04:59:11PM +0800, Tong Tiangen wrote:
For the arm64 kernel, when it processes hardware memory errors for
synchronous
On 2024/8/19 20:08, Jonathan Cameron wrote:
On Tue, 28 May 2024 16:59:15 +0800
Tong Tiangen wrote:
For an SEA exception, the kernel needs to take some action to recover from
the memory error, such as isolating the poisoned page and killing the failing
thread, which is done in memory_failure().
During our test, the
On 2024/8/19 19:56, Jonathan Cameron wrote:
On Tue, 28 May 2024 16:59:13 +0800
Tong Tiangen wrote:
Currently, many scenarios that can tolerate memory errors when copying pages
have been supported in the kernel [1~5], all of which are implemented by
copy_mc_[user]_highpage(). arm64 should also
On 2024/8/19 18:30, Jonathan Cameron wrote:
On Tue, 28 May 2024 16:59:11 +0800
Tong Tiangen wrote:
For the arm64 kernel, when it processes hardware memory errors for
synchronous notifications (do_sea()), if the error is consumed within the
kernel, the current handling is to panic. However, it
On 2024/8/20 1:29, Mark Rutland wrote:
Hi Tong,
On Tue, May 28, 2024 at 04:59:11PM +0800, Tong Tiangen wrote:
For the arm64 kernel, when it processes hardware memory errors for
synchronous notifications (do_sea()), if the error is consumed within the
kernel, the current handling is to panic
On 2024/8/19 17:57, Jonathan Cameron wrote:
On Tue, 28 May 2024 16:59:10 +0800
Tong Tiangen wrote:
x86/powerpc have their own implementations of copy_mc_to_user(); we add a
generic fallback in include/linux/uaccess.h to prepare for other
architectures to enable CONFIG_ARCH_HAS_COPY_MC.
Signed-o
Since V1:
1. Consistent with PPC/x86, use CONFIG_ARCH_HAS_COPY_MC instead of
ARM64_UCE_KERNEL_RECOVERY.
2. Add two new scenarios: CoW and pagecache reading.
3. Fix two small bugs (the first two patches).
V1 is here:
https://lore.kernel.org/lkml/20220323033705.3966643-1-tongtian...@huawei.com/
T
"mm/khugepaged: recover from poisoned file-backed
memory")
Signed-off-by: Tong Tiangen
---
arch/arm64/include/asm/mte.h| 9 +
arch/arm64/include/asm/page.h | 10 ++
arch/arm64/lib/Makefile | 2 ++
arch/arm64/lib/copy_mc_page.S | 35 ++
signals to user
processes in do_sea(). After [1] is merged, this patch can be rolled back;
otherwise the SIGBUS will be sent repeatedly.
[1] https://lore.kernel.org/lkml/20240204080144.7977-1-xuesh...@linux.alibaba.com/
Signed-off-by: Tong Tiangen
---
arch/arm64/mm/fault.c | 14 +++---
1 file
x86/powerpc have their own implementations of copy_mc_to_user(); we add a
generic fallback in include/linux/uaccess.h to prepare for other
architectures to enable CONFIG_ARCH_HAS_COPY_MC.
Signed-off-by: Tong Tiangen
Acked-by: Michael Ellerman
---
arch/powerpc/include/asm/uaccess.h | 1 +
arch/x86/in
introduce copy_mc_to_kernel() implementation.
Also add memcpy_mc() for memory copies that handle source exceptions.
Because no GPR is available for saving "bytes not copied" in
memcpy(), memcpy_mc() is based on the implementation of
copy_from_user().
Signed-off-by: To
If hardware errors are encountered during page copying, returning the bytes
not copied is not meaningful, and the caller cannot do any processing on
the remaining data. Returning -EFAULT is more reasonable, which represents
a hardware error encountered during the copying.
Signed-off-by: Tong
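The return-value change argued for in this commit message can be sketched in userspace as follows. This is only an illustration under stated assumptions: sim_copy_mc(), sim_copy_mc_page(), SIM_PAGE_SIZE and the poison_off parameter are all hypothetical, and the "hardware error" is simulated by stopping the copy at a chosen offset. The sketch shows a page-level wrapper collapsing "bytes not copied" into 0 or -EFAULT.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096	/* hypothetical page size for this sketch */

/*
 * Hypothetical stand-in for a machine-check-safe copy. It copies until
 * it reaches poison_off (the simulated hardware error) and returns the
 * number of bytes NOT copied. poison_off >= len means no error occurs.
 */
static size_t sim_copy_mc(void *dst, const void *src, size_t len,
			  size_t poison_off)
{
	size_t ok = poison_off < len ? poison_off : len;

	assert(dst != NULL && src != NULL);
	memcpy(dst, src, ok);
	return len - ok;
}

/*
 * Page-level wrapper: a partially copied page is useless to the caller,
 * so report 0 on success and -EFAULT on any error.
 */
static int sim_copy_mc_page(void *dst, const void *src, size_t poison_off)
{
	return sim_copy_mc(dst, src, SIM_PAGE_SIZE, poison_off) ? -EFAULT : 0;
}
```

With this shape a caller only needs to test for a non-zero return and never has to interpret a residual byte count.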
, only the associated process is affected. Killing the user
process and isolating the corrupt page is a better choice.
A new fixup type, EX_TYPE_KACCESS_ERR_ZERO_ME_SAFE, is added to identify
instructions that can recover from memory errors triggered by access to
kernel memory.
Signed-off-by: Tong Tiangen
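The idea of tagging exception-table entries with a type that marks them as safe for memory-error recovery can be sketched as below. This is a userspace illustration only, not the arm64 extable code: the demo_* names, the table layout and the linear search are assumptions made for the example. Only entries of the memory-error-safe type yield a fixup address; any other fault falls through to the caller's fatal path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixup types for this sketch. */
enum demo_ex_type {
	DEMO_EX_TYPE_UACCESS,		/* ordinary fixup, not memory-error safe */
	DEMO_EX_TYPE_KACCESS_ME_SAFE,	/* may recover from a memory error */
};

struct demo_ex_entry {
	unsigned long insn;	/* address of the instruction that may fault */
	unsigned long fixup;	/* address to resume at after a fault */
	enum demo_ex_type type;
};

/* Linear search for the entry covering the faulting pc. */
static const struct demo_ex_entry *
demo_search_extable(const struct demo_ex_entry *tbl, size_t n, unsigned long pc)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].insn == pc)
			return &tbl[i];
	return NULL;
}

/*
 * Returns true and stores the resume address in *new_pc when a memory
 * error at pc is recoverable; returns false otherwise, in which case
 * the caller would continue to its fatal path.
 */
static bool demo_fixup_exception_me(const struct demo_ex_entry *tbl, size_t n,
				    unsigned long pc, unsigned long *new_pc)
{
	const struct demo_ex_entry *e = demo_search_extable(tbl, n, pc);

	assert(new_pc != NULL);
	if (!e || e->type != DEMO_EX_TYPE_KACCESS_ME_SAFE)
		return false;
	*new_pc = e->fixup;
	return true;
}
```

Keeping the decision in the entry type means that only code paths explicitly annotated as safe are recovered, and unexpected kernel faults are not fixed up silently.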
Hi Mark:
Kindly ping...
Thanks,
Tong.
On 2024/2/7 21:21, Tong Tiangen wrote:
With the increase of memory capacity and density, the probability of memory
errors also increases. The increasing size and density of server RAM in data
centers and clouds has resulted in more uncorrectable memory
Hi Mark:
Kindly ping :)
Thanks.
Tong.
On 2024/2/7 21:21, Tong Tiangen wrote:
With the increase of memory capacity and density, the probability of memory
errors also increases. The increasing size and density of server RAM in data
centers and clouds has resulted in more uncorrectable memory errors
signals to user
processes (!(PF_KTHREAD|PF_IO_WORKER|PF_WQ_WORKER|PF_USER_WORKER)) in
do_sea(). After [1] is merged, this patch can be rolled back; otherwise the
SIGBUS will be sent repeatedly.
[1] https://lore.kernel.org/lkml/20240204080144.7977-1-xuesh...@linux.alibaba.com/
Signed-off-by: Tong Tiangen
"mm/khugepaged: recover from poisoned file-backed
memory")
Signed-off-by: Tong Tiangen
---
arch/arm64/include/asm/mte.h| 9 +
arch/arm64/include/asm/page.h | 10 ++
arch/arm64/lib/Makefile | 2 ++
arch/arm64/lib/copy_mc_page.S | 37 +++
tion.
3. According to Mark's suggestion, update commit message of patch 2/5.
4. According to Borislav's suggestion, update commit message of patch 1/5.
Since V1:
1. Consistent with PPC/x86, use CONFIG_ARCH_HAS_COPY_MC instead of
ARM64_UCE_KERNEL_RECOVERY.
2. Add two new scenarios: CoW and p
If hardware errors are encountered during page copying, returning the bytes
not copied is not meaningful, and the caller cannot do any processing on
the remaining data. Returning -EFAULT is more reasonable, which represents
a hardware error encountered during the copying.
Signed-off-by: Tong
, only the associated process is affected. Killing the user
process and isolating the corrupt page is a better choice.
A new fixup type, EX_TYPE_KACCESS_ERR_ZERO_ME_SAFE, is added to identify
instructions that can recover from memory errors triggered by access to
kernel memory.
Signed-off-by: Tong Tiangen
x86/powerpc have their own implementations of copy_mc_to_user(); we add a
generic fallback in include/linux/uaccess.h to prepare for other
architectures to enable CONFIG_ARCH_HAS_COPY_MC.
Signed-off-by: Tong Tiangen
Acked-by: Michael Ellerman
---
arch/powerpc/include/asm/uaccess.h | 1 +
arch/x86/in
On 2024/1/30 18:20, Mark Rutland wrote:
On Mon, Jan 29, 2024 at 09:46:52PM +0800, Tong Tiangen wrote:
The copy_mc_to_kernel() helper is a memory copy implementation that handles
source exceptions. It can be used in memory copy scenarios that tolerate
hardware memory errors (e.g. pmem_read
On 2024/1/30 18:31, Mark Rutland wrote:
On Mon, Jan 29, 2024 at 09:46:51PM +0800, Tong Tiangen wrote:
Currently, many scenarios that can tolerate memory errors when copying pages
have been supported in the kernel [1][2][3], all of which are implemented by
copy_mc_[user]_highpage(). arm64 should
On 2024/1/30 20:01, Mark Rutland wrote:
On Tue, Jan 30, 2024 at 07:14:35PM +0800, Tong Tiangen wrote:
On 2024/1/30 1:43, Mark Rutland wrote:
On Mon, Jan 29, 2024 at 09:46:49PM +0800, Tong Tiangen wrote:
Further, this change will also silently fix up unexpected kernel faults if we
pass bad kernel
On 2024/1/30 21:07, Mark Rutland wrote:
On Tue, Jan 30, 2024 at 06:57:24PM +0800, Tong Tiangen wrote:
On 2024/1/30 1:51, Mark Rutland wrote:
On Mon, Jan 29, 2024 at 09:46:48PM +0800, Tong Tiangen wrote:
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 55f6455a8284..312932dc100b
On 2024/1/30 1:43, Mark Rutland wrote:
On Mon, Jan 29, 2024 at 09:46:49PM +0800, Tong Tiangen wrote:
If a user process's memory access fails due to a hardware memory error, only
the relevant process is affected, so it is more reasonable to kill the user
process and isolate the corrupt page than to
On 2024/1/30 1:51, Mark Rutland wrote:
On Mon, Jan 29, 2024 at 09:46:48PM +0800, Tong Tiangen wrote:
For the arm64 kernel, when it processes hardware memory errors for
synchronous notifications (do_sea()), if the error is consumed within the
kernel, the current handling is to panic. However, it
framework, we introduce copy_mc_to_kernel()
implementation.
Signed-off-by: Tong Tiangen
---
arch/arm64/include/asm/string.h | 5 +
arch/arm64/include/asm/uaccess.h | 21 +++
arch/arm64/lib/Makefile | 2 +-
arch/arm64/lib/memcpy_mc.S | 257 +++
mm/kasan
o new scenes, cow and pagecache reading.
3. Fix two small bugs (the first two patches).
V1 is here:
https://lore.kernel.org/lkml/20220323033705.3966643-1-tongtian...@huawei.com/
Tong Tiangen (6):
uaccess: add generic fallback version of copy_mc_to_user()
arm64: add support for machine check erro
user process will be affected. Killing the user process and
isolating the corrupt page is a better choice.
This patch only enables the machine error check framework and adds an
exception fixup before the kernel panic in do_sea().
Signed-off-by: Tong Tiangen
---
arch/arm64/Kconfig | 1
If hardware errors are encountered during page copying, returning the bytes
not copied is not meaningful, and the caller cannot do any processing on
the remaining data. Returning -EFAULT is more reasonable, which represents
a hardware error encountered during the copying.
Signed-off-by: Tong
If a user process's memory access fails due to a hardware memory error, only
the relevant process is affected, so it is more reasonable to kill the user
process and isolate the corrupt page than to panic the kernel.
Signed-off-by: Tong Tiangen
---
arch/arm64/lib/copy_from_user.S | 10
2500b93cc9 ("mm/khugepaged: recover from poisoned anonymous memory")
[3]6b970599e807 ("mm: hwpoison: support recovery from ksm_might_need_to_copy()")
Signed-off-by: Tong Tiangen
---
arch/arm64/include/asm/asm-extable.h | 15 ++
arch/arm64/include/asm/assembler.h | 4 ++
x86/powerpc have their own implementations of copy_mc_to_user(); we add a
generic fallback in include/linux/uaccess.h to prepare for other
architectures to enable CONFIG_ARCH_HAS_COPY_MC.
Signed-off-by: Tong Tiangen
Acked-by: Michael Ellerman
---
arch/powerpc/include/asm/uaccess.h | 1 +
arch/x86/in