Re: next-20241031: kernel/time/clockevents.c:455 clockevents_register_device

2024-10-31 Thread Thomas Gleixner
On Thu, Oct 31 2024 at 14:10, Naresh Kamboju wrote:
> The QEMU-ARM64 boot has failed with the Linux next-20241031 tag.
> The boot log shows warnings at clockevents_register_device() followed
> by rcu_preempt detected stalls.
>
> However, the system did not proceed far enough to reach the login prompt.
> The fvp-aemva, Qemu-arm64, Qemu-armv7 and Qemu-riscv64 boots failed.
>
> Please find the incomplete boot log links below for your reference.
> The Qemu version is 9.0.2.
> <4>[ 0.220657] WARNING: CPU: 1 PID: 0 at kernel/time/clockevents.c:455
> clockevents_register_device (kernel/time/clockevents.c:455
> <4>[ 0.225218] clockevents_register_device+0x170/0x188 P
> <4>[ 0.225367] clockevents_config_and_register+0x34/0x50 L
> <4>[ 0.225487] clockevents_config_and_register (kernel/time/clockevents.c:523)
> <4>[ 0.225553] arch_timer_starting_cpu
> (drivers/clocksource/arm_arch_timer.c:1034)
> <4>[ 0.225602] cpuhp_invoke_callback (kernel/cpu.c:194)
> <4>[ 0.225649] __cpuhp_invoke_callback_range (kernel/cpu.c:965)
> <4>[ 0.225691] notify_cpu_starting (kernel/cpu.c:1604)

What happens here is obvious: notify_cpu_starting() is invoked
before the CPU is marked online, which triggers the new check in
clockevents_register_device().

I removed the warning and force pushed the fixed up branch, so that
should be gone by tomorrow.

Thanks,

tglx



[PATCH RFT v12 0/8] fork: Support shadow stacks in clone3()

2024-10-31 Thread Mark Brown
The kernel has recently added support for shadow stacks, currently on
x86 only using its CET feature, but both arm64 and RISC-V have
equivalent features (GCS and Zicfiss respectively); I am actively
working on GCS[1].  With shadow stacks the hardware maintains an
additional stack containing only the return addresses for branch
instructions; it is not generally writeable by userspace and ensures
that any returns are to the recorded addresses.  This provides some
protection against ROP attacks and makes it easier to collect call
stacks.  These shadow stacks are allocated in the address space of the
userspace process.

Our API for shadow stacks does not currently offer userspace any
flexibility in managing the allocation of shadow stacks for newly
created threads; instead the kernel allocates a new shadow stack with
the same size as the normal stack whenever a thread is created with the
feature enabled.  The stacks allocated in this way are freed by the
kernel when the thread exits or shadow stacks are disabled for the
thread.  This lack of flexibility and control isn't ideal: in the vast
majority of cases the shadow stack will be over-allocated, and the
implicit allocation and deallocation is not consistent with other
interfaces.  As far as I can tell the interface is done in this manner
mainly because the shadow stack patches were in development since before
clone3() was implemented.

Since clone3() is readily extensible, let's add support for specifying a
shadow stack when creating a new thread or process, keeping the current
implicit allocation behaviour if one is not specified either with
clone3() or through the use of clone().  The user must provide a shadow
stack pointer; this must point to memory mapped for use as a shadow
stack by map_shadow_stack() with an architecture-specific shadow stack
token at the top of the stack.

Please note that the x86 portions of this code are build-tested only; I
don't appear to have a system that can run CET available to me.

[1] 
https://lore.kernel.org/linux-arm-kernel/20241001-arm64-gcs-v13-0-222b78d87...@kernel.org/T/#mc58f97f27461749ccf400ebabf6f9f937116a86b

Signed-off-by: Mark Brown 
---
Changes in v12:
- Add the regular prctl() to the userspace API document since arm64
  support is queued in -next.
- Link to v11: 
https://lore.kernel.org/r/20241005-clone3-shadow-stack-v11-0-2a6a2bd6d...@kernel.org

Changes in v11:
- Rebase onto arm64 for-next/gcs, which is based on v6.12-rc1, and
  integrate arm64 support.
- Rework the interface to specify a shadow stack pointer rather than a
  base and size like we do for the regular stack.
- Link to v10: 
https://lore.kernel.org/r/20240821-clone3-shadow-stack-v10-0-06e8797b9...@kernel.org

Changes in v10:
- Integrate fixes & improvements for the x86 implementation from Rick
  Edgecombe.
- Require that the shadow stack be VM_WRITE.
- Require that the shadow stack base and size be sizeof(void *) aligned.
- Clean up trailing newline.
- Link to v9: 
https://lore.kernel.org/r/20240819-clone3-shadow-stack-v9-0-962d74f99...@kernel.org

Changes in v9:
- Pull token validation earlier and report problems with an error return
  to parent rather than signal delivery to the child.
- Verify that the top of the supplied shadow stack is VM_SHADOW_STACK.
- Rework token validation to only do the page mapping once.
- Drop no longer needed support for testing for signals in selftest.
- Fix typo in comments.
- Link to v8: 
https://lore.kernel.org/r/20240808-clone3-shadow-stack-v8-0-0acf37caf...@kernel.org

Changes in v8:
- Fix token verification with user specified shadow stack.
- Don't track user managed shadow stacks for child processes.
- Link to v7: 
https://lore.kernel.org/r/20240731-clone3-shadow-stack-v7-0-a9532eebf...@kernel.org

Changes in v7:
- Rebase onto v6.11-rc1.
- Typo fixes.
- Link to v6: 
https://lore.kernel.org/r/20240623-clone3-shadow-stack-v6-0-9ee7783b1...@kernel.org

Changes in v6:
- Rebase onto v6.10-rc3.
- Ensure we don't try to free the parent shadow stack in error paths of
  x86 arch code.
- Spelling fixes in userspace API document.
- Additional cleanups and improvements to the clone3() tests to support
  the shadow stack tests.
- Link to v5: 
https://lore.kernel.org/r/20240203-clone3-shadow-stack-v5-0-322c69598...@kernel.org

Changes in v5:
- Rebase onto v6.8-rc2.
- Rework ABI to have the user allocate the shadow stack memory with
  map_shadow_stack() and a token.
- Force inlining of the x86 shadow stack enablement.
- Move shadow stack enablement out into a shared header for reuse by
  other tests.
- Link to v4: 
https://lore.kernel.org/r/20231128-clone3-shadow-stack-v4-0-8b28ffe4f...@kernel.org

Changes in v4:
- Formatting changes.
- Use a define for minimum shadow stack size and move some basic
  validation to fork.c.
- Link to v3: 
https://lore.kernel.org/r/20231120-clone3-shadow-stack-v3-0-a7b8ed3e2...@kernel.org

Changes in v3:
- Rebase onto v6.7-rc2.
- Remove stale shadow_stack in internal kargs.
- If a shadow stac

[PATCH RFT v12 1/8] arm64/gcs: Return a success value from gcs_alloc_thread_stack()

2024-10-31 Thread Mark Brown
Currently, as a result of templating from x86 code, gcs_alloc_thread_stack()
returns a pointer as an unsigned long; however, on arm64 we don't actually
use this pointer value as anything other than a pass/fail flag. Simplify the
interface to just return an int: 0 on success and a negative error code
on failure.

Acked-by: Deepak Gupta 
Signed-off-by: Mark Brown 
---
 arch/arm64/include/asm/gcs.h | 8 
 arch/arm64/kernel/process.c  | 8 
 arch/arm64/mm/gcs.c  | 8 
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/gcs.h b/arch/arm64/include/asm/gcs.h
index 
f50660603ecf5dc09a92740062df3a089b02b219..d8923b5f03b776252aca76ce316ef57399d71fa9
 100644
--- a/arch/arm64/include/asm/gcs.h
+++ b/arch/arm64/include/asm/gcs.h
@@ -64,8 +64,8 @@ static inline bool task_gcs_el0_enabled(struct task_struct 
*task)
 void gcs_set_el0_mode(struct task_struct *task);
 void gcs_free(struct task_struct *task);
 void gcs_preserve_current_state(void);
-unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
-const struct kernel_clone_args *args);
+int gcs_alloc_thread_stack(struct task_struct *tsk,
+  const struct kernel_clone_args *args);
 
 static inline int gcs_check_locked(struct task_struct *task,
   unsigned long new_val)
@@ -91,8 +91,8 @@ static inline bool task_gcs_el0_enabled(struct task_struct 
*task)
 static inline void gcs_set_el0_mode(struct task_struct *task) { }
 static inline void gcs_free(struct task_struct *task) { }
 static inline void gcs_preserve_current_state(void) { }
-static inline unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
-  const struct 
kernel_clone_args *args)
+static inline int gcs_alloc_thread_stack(struct task_struct *tsk,
+const struct kernel_clone_args *args)
 {
return -ENOTSUPP;
 }
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 
fdd095480c3ffb8c13fd4e7c9abc79e88143e08b..8ebd11c29792524dfeeade9cc7826b007329aa6a
 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -297,7 +297,7 @@ static void flush_gcs(void)
 static int copy_thread_gcs(struct task_struct *p,
   const struct kernel_clone_args *args)
 {
-   unsigned long gcs;
+   int ret;
 
if (!system_supports_gcs())
return 0;
@@ -305,9 +305,9 @@ static int copy_thread_gcs(struct task_struct *p,
p->thread.gcs_base = 0;
p->thread.gcs_size = 0;
 
-   gcs = gcs_alloc_thread_stack(p, args);
-   if (IS_ERR_VALUE(gcs))
-   return PTR_ERR((void *)gcs);
+   ret = gcs_alloc_thread_stack(p, args);
+   if (ret != 0)
+   return ret;
 
p->thread.gcs_el0_mode = current->thread.gcs_el0_mode;
p->thread.gcs_el0_locked = current->thread.gcs_el0_locked;
diff --git a/arch/arm64/mm/gcs.c b/arch/arm64/mm/gcs.c
index 
5c46ec527b1cdaa8f52cff445d70ba0f8509d086..1f633a482558b59aac5427963d42b37fce08c8a6
 100644
--- a/arch/arm64/mm/gcs.c
+++ b/arch/arm64/mm/gcs.c
@@ -38,8 +38,8 @@ static unsigned long gcs_size(unsigned long size)
return max(PAGE_SIZE, size);
 }
 
-unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
-const struct kernel_clone_args *args)
+int gcs_alloc_thread_stack(struct task_struct *tsk,
+  const struct kernel_clone_args *args)
 {
unsigned long addr, size;
 
@@ -59,13 +59,13 @@ unsigned long gcs_alloc_thread_stack(struct task_struct 
*tsk,
size = gcs_size(size);
addr = alloc_gcs(0, size);
if (IS_ERR_VALUE(addr))
-   return addr;
+   return PTR_ERR((void *)addr);
 
tsk->thread.gcs_base = addr;
tsk->thread.gcs_size = size;
tsk->thread.gcspr_el0 = addr + size - sizeof(u64);
 
-   return addr;
+   return 0;
 }
 
 SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, 
unsigned int, flags)

-- 
2.39.2




[PATCH RFT v12 2/8] Documentation: userspace-api: Add shadow stack API documentation

2024-10-31 Thread Mark Brown
There are a number of architectures with shadow stack features which we are
presenting to userspace with as consistent an API as we can (though there
are some architecture specifics). Especially given that there are some
important considerations for userspace code interacting directly with the
feature, let's provide some documentation covering the common aspects.

Reviewed-by: Catalin Marinas 
Reviewed-by: Kees Cook 
Tested-by: Kees Cook 
Acked-by: Shuah Khan 
Signed-off-by: Mark Brown 
---
 Documentation/userspace-api/index.rst|  1 +
 Documentation/userspace-api/shadow_stack.rst | 42 
 2 files changed, 43 insertions(+)

diff --git a/Documentation/userspace-api/index.rst 
b/Documentation/userspace-api/index.rst
index 
274cc7546efc2a042d2dc00aa67c71c52372179a..c39709bfba2c5682d0d1a22444db17c17bcf01ce
 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -59,6 +59,7 @@ Everything else
 
ELF
netlink/index
+   shadow_stack
sysfs-platform_profile
vduse
futex2
diff --git a/Documentation/userspace-api/shadow_stack.rst 
b/Documentation/userspace-api/shadow_stack.rst
new file mode 100644
index 
..9d0d4e79cfa7c47d3208dd53071a3d0b86c18575
--- /dev/null
+++ b/Documentation/userspace-api/shadow_stack.rst
@@ -0,0 +1,42 @@
+=============
+Shadow Stacks
+=============
+
+Introduction
+============
+
+Several architectures have features which provide backward edge
+control flow protection through a hardware maintained stack, only
+writeable by userspace through very limited operations.  This feature
+is referred to as shadow stacks on Linux: on x86 it is part of Intel
+Control-flow Enforcement Technology (CET), on arm64 it is the Guarded
+Control Stack feature (FEAT_GCS) and for RISC-V it is the Zicfiss
+extension.  It is expected that this feature will normally be managed
+by the system dynamic linker and libc in ways broadly transparent to
+application code; this document covers interfaces and considerations.
+
+
+Enabling
+========
+
+Shadow stacks default to disabled when a userspace process is
+executed; they can be enabled for the current thread with a syscall:
+
+ - For x86 the ARCH_SHSTK_ENABLE arch_prctl()
+ - For other architectures the PR_SET_SHADOW_STACK_ENABLE prctl()
+
+It is expected that this will normally be done by the dynamic linker.
+Any new threads created by a thread with shadow stacks enabled will
+themselves have shadow stacks enabled.
+
+
+Enablement considerations
+=========================
+
+- Returning from the function that enables shadow stacks without first
+  disabling them will cause a shadow stack exception.  This includes
+  any syscall wrapper or other library functions; the syscall will need
+  to be inlined.
+- A lock feature allows userspace to prevent disabling of shadow stacks.
+- Features that change the stack context, such as longjmp() or ucontext
+  changes on signal return, will need support from libc.

-- 
2.39.2




[PATCH RFT v12 3/8] selftests: Provide helper header for shadow stack testing

2024-10-31 Thread Mark Brown
While almost all users of shadow stacks should be relying on the dynamic
linker and libc to enable the feature, there are several low level test
programs where it is useful to enable it without any libc support, allowing
testing without full system enablement. This low level testing is helpful
during bringup of the support itself, and also in enabling coverage by
automated testing without needing all system components in the target root
filesystems to have enablement.

Provide a header with helpers for this purpose, intended for use only by
test programs directly exercising shadow stack interfaces.

Reviewed-by: Rick Edgecombe 
Reviewed-by: Kees Cook 
Tested-by: Kees Cook 
Acked-by: Shuah Khan 
Signed-off-by: Mark Brown 
---
 tools/testing/selftests/ksft_shstk.h | 98 
 1 file changed, 98 insertions(+)

diff --git a/tools/testing/selftests/ksft_shstk.h 
b/tools/testing/selftests/ksft_shstk.h
new file mode 100644
index 
..869ecea2bf3ea3d30cead9819d2b3a75f5397754
--- /dev/null
+++ b/tools/testing/selftests/ksft_shstk.h
@@ -0,0 +1,98 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Helpers for shadow stack enablement, this is intended to only be
+ * used by low level test programs directly exercising interfaces for
+ * working with shadow stacks.
+ *
+ * Copyright (C) 2024 ARM Ltd.
+ */
+
+#ifndef __KSFT_SHSTK_H
+#define __KSFT_SHSTK_H
+
+#include 
+
+/* This is currently only defined for x86 */
+#ifndef SHADOW_STACK_SET_TOKEN
+#define SHADOW_STACK_SET_TOKEN (1ULL << 0)
+#endif
+
+static bool shadow_stack_enabled;
+
+#ifdef __x86_64__
+#define ARCH_SHSTK_ENABLE  0x5001
+#define ARCH_SHSTK_SHSTK   (1ULL <<  0)
+
+#define ARCH_PRCTL(arg1, arg2) \
+({ \
+   long _ret;  \
+   register long _num  asm("eax") = __NR_arch_prctl;   \
+   register long _arg1 asm("rdi") = (long)(arg1);  \
+   register long _arg2 asm("rsi") = (long)(arg2);  \
+   \
+   asm volatile (  \
+   "syscall\n" \
+   : "=a"(_ret)\
+   : "r"(_arg1), "r"(_arg2),   \
+ "0"(_num) \
+   : "rcx", "r11", "memory", "cc"  \
+   );  \
+   _ret;   \
+})
+
+#define ENABLE_SHADOW_STACK
+static inline __attribute__((always_inline)) void enable_shadow_stack(void)
+{
+   int ret = ARCH_PRCTL(ARCH_SHSTK_ENABLE, ARCH_SHSTK_SHSTK);
+   if (ret == 0)
+   shadow_stack_enabled = true;
+}
+
+#endif
+
+#ifdef __aarch64__
+#define PR_SET_SHADOW_STACK_STATUS  75
+# define PR_SHADOW_STACK_ENABLE (1UL << 0)
+
+#define my_syscall2(num, arg1, arg2)  \
+({\
+   register long _num  __asm__ ("x8") = (num);   \
+   register long _arg1 __asm__ ("x0") = (long)(arg1);\
+   register long _arg2 __asm__ ("x1") = (long)(arg2);\
+   register long _arg3 __asm__ ("x2") = 0;   \
+   register long _arg4 __asm__ ("x3") = 0;   \
+   register long _arg5 __asm__ ("x4") = 0;   \
+ \
+   __asm__  volatile (   \
+   "svc #0\n"\
+   : "=r"(_arg1) \
+   : "r"(_arg1), "r"(_arg2), \
+ "r"(_arg3), "r"(_arg4), \
+ "r"(_arg5), "r"(_num)   \
+   : "memory", "cc"  \
+   );\
+   _arg1;\
+})
+
+#define ENABLE_SHADOW_STACK
+static inline __attribute__((always_inline))  void enable_shadow_stack(void)
+{
+   int ret;
+
+   ret = my_syscall2(__NR_prctl, PR_SET_SHADOW_STACK_STATUS,
+ PR_SHADOW_STACK_ENABLE);
+   if (ret == 0)
+   shadow_stack_enabled = true;
+}
+
+#endif
+
+#ifndef __NR_map_shadow_stack
+#define __NR_map_shadow_stack 453
+#endif
+
+#ifndef ENABLE_SHADOW_STACK
+static inline void enab

[PATCH RFT v12 6/8] selftests/clone3: Factor more of main loop into test_clone3()

2024-10-31 Thread Mark Brown
In order to make it easier to add more configuration for the tests, and
to better support runtime detection of when tests can be run, pass the
structure describing each test into test_clone3() rather than picking
the arguments out of it, and have that function do all the per-test work.

No functional change.

Reviewed-by: Kees Cook 
Tested-by: Kees Cook 
Acked-by: Shuah Khan 
Reviewed-by: Catalin Marinas 
Signed-off-by: Mark Brown 
---
 tools/testing/selftests/clone3/clone3.c | 77 -
 1 file changed, 37 insertions(+), 40 deletions(-)

diff --git a/tools/testing/selftests/clone3/clone3.c 
b/tools/testing/selftests/clone3/clone3.c
index 
e61f07973ce5e27aff30047b35e03b1b51875c15..e066b201fa64eb17c55939b7cec18ac5d109613b
 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -30,6 +30,19 @@ enum test_mode {
CLONE3_ARGS_INVAL_EXIT_SIGNAL_NSIG,
 };
 
+typedef bool (*filter_function)(void);
+typedef size_t (*size_function)(void);
+
+struct test {
+   const char *name;
+   uint64_t flags;
+   size_t size;
+   size_function size_function;
+   int expected;
+   enum test_mode test_mode;
+   filter_function filter;
+};
+
 static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
 {
struct __clone_args args = {
@@ -109,30 +122,40 @@ static int call_clone3(uint64_t flags, size_t size, enum 
test_mode test_mode)
return 0;
 }
 
-static bool test_clone3(uint64_t flags, size_t size, int expected,
-   enum test_mode test_mode)
+static void test_clone3(const struct test *test)
 {
+   size_t size;
int ret;
 
+   if (test->filter && test->filter()) {
+   ksft_test_result_skip("%s\n", test->name);
+   return;
+   }
+
+   if (test->size_function)
+   size = test->size_function();
+   else
+   size = test->size;
+
+   ksft_print_msg("Running test '%s'\n", test->name);
+
ksft_print_msg(
"[%d] Trying clone3() with flags %#" PRIx64 " (size %zu)\n",
-   getpid(), flags, size);
-   ret = call_clone3(flags, size, test_mode);
+   getpid(), test->flags, size);
+   ret = call_clone3(test->flags, size, test->test_mode);
ksft_print_msg("[%d] clone3() with flags says: %d expected %d\n",
-   getpid(), ret, expected);
-   if (ret != expected) {
+   getpid(), ret, test->expected);
+   if (ret != test->expected) {
ksft_print_msg(
"[%d] Result (%d) is different than expected (%d)\n",
-   getpid(), ret, expected);
-   return false;
+   getpid(), ret, test->expected);
+   ksft_test_result_fail("%s\n", test->name);
+   return;
}
 
-   return true;
+   ksft_test_result_pass("%s\n", test->name);
 }
 
-typedef bool (*filter_function)(void);
-typedef size_t (*size_function)(void);
-
 static bool not_root(void)
 {
if (getuid() != 0) {
@@ -160,16 +183,6 @@ static size_t page_size_plus_8(void)
return getpagesize() + 8;
 }
 
-struct test {
-   const char *name;
-   uint64_t flags;
-   size_t size;
-   size_function size_function;
-   int expected;
-   enum test_mode test_mode;
-   filter_function filter;
-};
-
 static const struct test tests[] = {
{
.name = "simple clone3()",
@@ -319,24 +332,8 @@ int main(int argc, char *argv[])
ksft_set_plan(ARRAY_SIZE(tests));
test_clone3_supported();
 
-   for (i = 0; i < ARRAY_SIZE(tests); i++) {
-   if (tests[i].filter && tests[i].filter()) {
-   ksft_test_result_skip("%s\n", tests[i].name);
-   continue;
-   }
-
-   if (tests[i].size_function)
-   size = tests[i].size_function();
-   else
-   size = tests[i].size;
-
-   ksft_print_msg("Running test '%s'\n", tests[i].name);
-
-   ksft_test_result(test_clone3(tests[i].flags, size,
-tests[i].expected,
-tests[i].test_mode),
-"%s\n", tests[i].name);
-   }
+   for (i = 0; i < ARRAY_SIZE(tests); i++)
+   test_clone3(&tests[i]);
 
ksft_finished();
 }

-- 
2.39.2




[PATCH RFT v12 7/8] selftests/clone3: Allow tests to flag if -E2BIG is a valid error code

2024-10-31 Thread Mark Brown
The clone_args structure is extensible, with the syscall passing in the
length of the structure. Inside the kernel we use copy_struct_from_user()
to read the struct, but this has the unfortunate side effect of silently
accepting some overrun in the structure size provided the extra data is
all zeros. This means that we can't discover the clone3() features that
the running kernel supports by simply probing with various struct sizes.
We need to check this for the benefit of test systems which run newer
kselftests on old kernels.

Add a flag which can be set on a test to indicate that clone3() may return
-E2BIG due to the use of newer struct versions. Currently no tests need
this, but it will become an issue for testing clone3() support for shadow
stacks; that support is already present on x86.

Reviewed-by: Kees Cook 
Tested-by: Kees Cook 
Acked-by: Shuah Khan 
Reviewed-by: Catalin Marinas 
Signed-off-by: Mark Brown 
---
 tools/testing/selftests/clone3/clone3.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/tools/testing/selftests/clone3/clone3.c 
b/tools/testing/selftests/clone3/clone3.c
index 
e066b201fa64eb17c55939b7cec18ac5d109613b..5b8b7d640e70132242fc6939450669acd0c534f9
 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -39,6 +39,7 @@ struct test {
size_t size;
size_function size_function;
int expected;
+   bool e2big_valid;
enum test_mode test_mode;
filter_function filter;
 };
@@ -146,6 +147,11 @@ static void test_clone3(const struct test *test)
ksft_print_msg("[%d] clone3() with flags says: %d expected %d\n",
getpid(), ret, test->expected);
if (ret != test->expected) {
+   if (test->e2big_valid && ret == -E2BIG) {
+   ksft_print_msg("Test reported -E2BIG\n");
+   ksft_test_result_skip("%s\n", test->name);
+   return;
+   }
ksft_print_msg(
"[%d] Result (%d) is different than expected (%d)\n",
getpid(), ret, test->expected);

-- 
2.39.2




[PATCH RFT v12 5/8] selftests/clone3: Remove redundant flushes of output streams

2024-10-31 Thread Mark Brown
Since there were widespread issues with output not being flushed, the
kselftest framework was modified to explicitly set the output streams
unbuffered in commit 58e2847ad2e6 ("selftests: line buffer test
program's stdout"), so there is no need to explicitly flush in the clone3
tests.

Reviewed-by: Kees Cook 
Tested-by: Kees Cook 
Acked-by: Shuah Khan 
Reviewed-by: Catalin Marinas 
Signed-off-by: Mark Brown 
---
 tools/testing/selftests/clone3/clone3_selftests.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tools/testing/selftests/clone3/clone3_selftests.h 
b/tools/testing/selftests/clone3/clone3_selftests.h
index 
3d2663fe50ba56f011629e4f2eb68a72bcceb087..39b5dcba663c30b9fc2542d9a0d2686105ce5761
 100644
--- a/tools/testing/selftests/clone3/clone3_selftests.h
+++ b/tools/testing/selftests/clone3/clone3_selftests.h
@@ -35,8 +35,6 @@ struct __clone_args {
 
 static pid_t sys_clone3(struct __clone_args *args, size_t size)
 {
-   fflush(stdout);
-   fflush(stderr);
return syscall(__NR_clone3, args, size);
 }
 

-- 
2.39.2




Re: [PATCH net-next v2 4/4] selftests: hsr: Add test for VLAN

2024-10-31 Thread Paolo Abeni
On 10/24/24 12:30, MD Danish Anwar wrote:
> @@ -183,9 +232,21 @@ trap cleanup_all_ns EXIT
>  setup_hsr_interfaces 0
>  do_complete_ping_test
>  
> +# Run VLAN Test
> +if $vlan; then
> + setup_vlan_interfaces
> + hsr_vlan_ping
> +fi
> +
>  setup_ns ns1 ns2 ns3
>  
>  setup_hsr_interfaces 1
>  do_complete_ping_test
>  
> +# Run VLAN Test
> +if $vlan; then
> + setup_vlan_interfaces
> + hsr_vlan_ping
> +fi

The new tests should be enabled by default. Indeed, ideally the test
script should be able to run successfully on kernels not supporting such
a feature; you could cope with that by looking for the feature exposed
by hsr and skipping the VLAN test when the relevant feature is not present.

Cheers,

Paolo




Re: [PATCH net-next v6 00/12] selftests: ncdevmem: Add ncdevmem to ksft

2024-10-31 Thread Stanislav Fomichev
On 10/31, Mina Almasry wrote:
> On Wed, Oct 30, 2024 at 8:37 AM Stanislav Fomichev  
> wrote:
> >
> > On 10/30, Mina Almasry wrote:
> > > On Wed, Oct 30, 2024 at 8:13 AM Stanislav Fomichev  
> > > wrote:
> > > >
> > > > On 10/30, Mina Almasry wrote:
> > > > > On Wed, Oct 30, 2024 at 7:27 AM Stanislav Fomichev  
> > > > > wrote:
> > > > > >
> > > > > > The goal of the series is to simplify and make it possible to use
> > > > > > ncdevmem in an automated way from the ksft python wrapper.
> > > > > >
> > > > > > ncdevmem is slowly mutated into a state where it uses stdout
> > > > > > to print the payload and the python wrapper is added to
> > > > > > make sure the arrived payload matches the expected one.
> > > > > >
> > > > > > v6:
> > > > > > - fix compilation issue in 'Unify error handling' patch (Jakub)
> > > > > >
> > > > >
> > > > > Since I saw a compilation failures on a couple of iterations I
> > > > > cherry-picked this locally and tested compilation. I'm seeing this:
> > > >
> > > > Are you cherry picking the whole series or just this patch? It looks
> > > > too broken.
> > > >
> > > > > sudo CFLAGS="-static" make -C ./tools/testing/selftests/drivers/net/hw
> > > > > TARGETS=ncdevmem 2>&1
> > > > > make: Entering directory
> > > > > '/usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/drivers/net/hw'
> > > > >   CC   ncdevmem
> > > > > In file included from ncdevmem.c:63:
> > > > > /usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/../../../tools/net/ynl/generated/ethtool-user.h:23:43:
> > > > > warning: ‘enum ethtool_header_flags’ declared inside parameter list
> > > > > will not be visible outside of this definition or declaration
> > > > >23 | const char *ethtool_header_flags_str(enum 
> > > > > ethtool_header_flags value);
> > > > >   |   ^~~~
> > > > > /usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/../../../tools/net/ynl/generated/ethtool-user.h:25:41:
> > > > > warning: ‘enum ethtool_module_fw_flash_status’ declared inside
> > > > > parameter list will not be visible outside of this definition or
> > > > > declaration
> > > > >25 | ethtool_module_fw_flash_status_str(enum
> > > > > ethtool_module_fw_flash_status value);
> > > > >   | 
> > > > > ^~
> > > > > /usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/../../../tools/net/ynl/generated/ethtool-user.h:6766:45:
> > > > > error: field ‘status’ has incomplete type
> > > > >  6766 | enum ethtool_module_fw_flash_status status;
> > > > >   | ^~
> > > >
> > > > This has been fixed via '#include '
> > > >
> > > > > ncdevmem.c: In function ‘do_server’:
> > > > > ncdevmem.c:517:37: error: storage size of ‘token’ isn’t known
> > > > >   517 | struct dmabuf_token token;
> > > >
> > > > And this, and the rest, don't make sense at all?
> > > >
> > > > I'll double check on my side.
> > >
> > > Oh, whoops, I forgot to headers_install first. This works for me:
> > >
> > > ➜  cos-kernel git:(tcpdevmem-fixes-1) ✗ sudo make headers_install &&
> > > sudo CFLAGS="-static" make -C ./tools/testing/selftests/drivers/net/hw
> > > TARGETS=ncdevmem 2>&1
> > >   INSTALL ./usr/include
> > > make: Entering directory
> > > '/usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/drivers/net/hw'
> > > make: Nothing to be done for 'all'.
> > > make: Leaving directory
> > > '/usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/drivers/net/hw'
> > > ➜  cos-kernel git:(tcpdevmem-fixes-1) ✗ find . -iname ncdevmem
> > > ./tools/testing/selftests/drivers/net/hw/ncdevmem
> > >
> > > Sorry for the noise :D
> >
> > Whew, thanks and no worries!
> 
> Sorry, 2 issues testing this series:

Thank you for testing!


> 1. ipv4 addresses seem broken, or maybe i'm using them wrong.
> 
> Client command:
> yes $(echo -e \\x01\\x02\\x03\\x04\\x05\\x06) | tr \\n \\0 | head -c
> 1G | nc 192.168.1.4 5224 -p 5224
> 
> Server command and logs:
> mina-1 /home/almasrymina # ./ncdevmem -s 192.168.1.4 -c 192.168.1.5 -l
> -p 5224 -v 7 -f eth1
> here: ynl.c:887:ynl_req_trampoline
> using queues 15..16
> Running: sudo ethtool -K eth1 ntuple off >&2
> Running: sudo ethtool -K eth1 ntuple on >&2
> Running: sudo ethtool -n eth1 | grep 'Filter:' | awk '{print $2}' |
> xargs -n1 ethtool -N eth1 delete >&2
> ethtool: bad command line argument(s)
> For more information run ethtool -h
> here: ynl.c:887:ynl_req_trampoline
> TCP header split: on
> Running: sudo ethtool -X eth1 equal 15 >&2
> Running: sudo ethtool -N eth1 flow-type tcp6 src-ip 192.168.1.5 dst-ip
> 192.168.1.4 src-port 5224 dst-port 5224 queue 15 >&2
> Invalid src-ip value[192.168.1.5]
> ethtool: bad command line argument(s)
> For more information run ethtool -h
> ./ncdevmem: Failed to configure flow steering
> 
> The ethtool co

Re: [PATCH net-next v6 00/12] selftests: ncdevmem: Add ncdevmem to ksft

2024-10-31 Thread Mina Almasry
On Thu, Oct 31, 2024 at 9:45 AM Mina Almasry  wrote:
>
...
>
> Sorry, 2 issues testing this series:
>
...
>
> 2. Validation is now broken:
>

Validation is re-fixed with this diff:

diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c
b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
index 692c189bb5cc..494ae66d8abf 100644
--- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c
+++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
@@ -568,8 +568,7 @@ int do_server(struct memory_buffer *mem)

if (do_validation)
validate_buffer(
-   ((unsigned char *)tmp_mem) +
-   dmabuf_cmsg->frag_offset,
+   ((unsigned char *)tmp_mem),
dmabuf_cmsg->frag_size);
else
print_nonzero_bytes(tmp_mem,
dmabuf_cmsg->frag_size);

Since memcpy_from_device() copies to the beginning of tmp_mem, the
beginning of tmp_mem should be passed to the validation.

-- 
Thanks,
Mina



[PATCH RFT v12 8/8] selftests/clone3: Test shadow stack support

2024-10-31 Thread Mark Brown
Add basic test coverage for specifying the shadow stack for a newly
created thread via clone3(), including coverage of the newly extended
argument structure.  We check that a user specified shadow stack can be
provided, and that invalid combinations of parameters are rejected.

In order to facilitate testing on systems without userspace shadow stack
support we manually enable shadow stacks on startup; this is architecture
specific due to the use of an arch_prctl() on x86. Due to interactions with
potential userspace locking of features we actually detect support for
shadow stacks on the running system by attempting to allocate a shadow
stack page during initialisation using map_shadow_stack(), warning if this
succeeds when the enable failed.

In order to allow testing of user configured shadow stacks on
architectures with that feature we need to ensure that we do not return
from the function where the clone3() syscall is called in the child
process; doing so would trigger a shadow stack underflow.  To do this we
use inline assembly rather than the standard syscall wrapper to call
clone3().  In order to avoid surprises we also use a syscall rather than
the libc exit() function; this should be overly cautious.

Acked-by: Shuah Khan 
Signed-off-by: Mark Brown 
---
 tools/testing/selftests/clone3/clone3.c   | 143 +-
 tools/testing/selftests/clone3/clone3_selftests.h |  63 ++
 2 files changed, 205 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/clone3/clone3.c 
b/tools/testing/selftests/clone3/clone3.c
index 
5b8b7d640e70132242fc6939450669acd0c534f9..b0378d7418cc8b00caebc6f92f58280bc04b0f80
 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -3,6 +3,7 @@
 /* Based on Christian Brauner's clone3() example */
 
 #define _GNU_SOURCE
+#include 
 #include 
 #include 
 #include 
@@ -11,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -19,8 +21,12 @@
 #include 
 
 #include "../kselftest.h"
+#include "../ksft_shstk.h"
 #include "clone3_selftests.h"
 
+static bool shadow_stack_supported;
+static size_t max_supported_args_size;
+
 enum test_mode {
CLONE3_ARGS_NO_TEST,
CLONE3_ARGS_ALL_0,
@@ -28,6 +34,10 @@ enum test_mode {
CLONE3_ARGS_INVAL_EXIT_SIGNAL_NEG,
CLONE3_ARGS_INVAL_EXIT_SIGNAL_CSIG,
CLONE3_ARGS_INVAL_EXIT_SIGNAL_NSIG,
+   CLONE3_ARGS_SHADOW_STACK,
+   CLONE3_ARGS_SHADOW_STACK_MISALIGNED,
+   CLONE3_ARGS_SHADOW_STACK_NO_TOKEN,
+   CLONE3_ARGS_SHADOW_STACK_NORMAL_MEMORY,
 };
 
 typedef bool (*filter_function)(void);
@@ -44,6 +54,44 @@ struct test {
filter_function filter;
 };
 
+
+/*
+ * We check for shadow stack support by attempting to use
+ * map_shadow_stack() since features may have been locked by the
+ * dynamic linker resulting in spurious errors when we attempt to
+ * enable on startup.  We warn if the enable failed.
+ */
+static void test_shadow_stack_supported(void)
+{
+   long ret;
+
+   ret = syscall(__NR_map_shadow_stack, 0, getpagesize(), 0);
+   if (ret == -1) {
+   ksft_print_msg("map_shadow_stack() not supported\n");
+   } else if ((void *)ret == MAP_FAILED) {
+   ksft_print_msg("Failed to map shadow stack\n");
+   } else {
+   ksft_print_msg("Shadow stack supported\n");
+   shadow_stack_supported = true;
+
+   if (!shadow_stack_enabled)
+   ksft_print_msg("Mapped but did not enable shadow stack\n");
+   }
+}
+
+static void *get_shadow_stack_page(unsigned long flags)
+{
+   unsigned long long page;
+
+   page = syscall(__NR_map_shadow_stack, 0, getpagesize(), flags);
+   if ((void *)page == MAP_FAILED) {
+   ksft_print_msg("map_shadow_stack() failed: %d\n", errno);
+   return 0;
+   }
+
+   return (void *)page;
+}
+
 static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
 {
struct __clone_args args = {
@@ -57,6 +105,7 @@ static int call_clone3(uint64_t flags, size_t size, enum 
test_mode test_mode)
} args_ext;
 
pid_t pid = -1;
+   void *p;
int status;
 
memset(&args_ext, 0, sizeof(args_ext));
@@ -89,6 +138,26 @@ static int call_clone3(uint64_t flags, size_t size, enum 
test_mode test_mode)
case CLONE3_ARGS_INVAL_EXIT_SIGNAL_NSIG:
args.exit_signal = 0x00f0ULL;
break;
+   case CLONE3_ARGS_SHADOW_STACK:
+   p = get_shadow_stack_page(SHADOW_STACK_SET_TOKEN);
+   p += getpagesize() - sizeof(void *);
+   args.shadow_stack_pointer = (unsigned long long)p;
+   break;
+   case CLONE3_ARGS_SHADOW_STACK_MISALIGNED:
+   p = get_shadow_stack_page(SHADOW_STACK_SET_TOKEN);
+   p += getpagesize() - sizeof(void *) - 1;
+   args.shadow_stack_pointer = (u

Re: [PATCH] kvm: selftest: fix noop test in guest_memfd_test.c

2024-10-31 Thread Sean Christopherson
On Thu, 24 Oct 2024 10:59:53 +0100, Patrick Roy wrote:
> The loop in test_create_guest_memfd_invalid that is supposed to test
> that nothing is accepted as a valid flag to KVM_CREATE_GUEST_MEMFD was
> initializing `flag` as 0 instead of BIT(0). This caused the loop to
> immediately exit instead of iterating over BIT(0), BIT(1), ... .

Applied to kvm-x86 fixes, thanks!

[1/1] kvm: selftest: fix noop test in guest_memfd_test.c
  https://github.com/kvm-x86/linux/commit/fd5b88cc7fbf

--
https://github.com/kvm-x86/linux/tree/next



Re: [PATCH -next] KVM: selftests: Use ARRAY_SIZE for array length

2024-10-31 Thread Sean Christopherson
On Fri, 13 Sep 2024 13:43:15 +0800, Jiapeng Chong wrote:
> Use of macro ARRAY_SIZE to calculate array size minimizes
> the redundant code and improves code reusability.
> 
> ./tools/testing/selftests/kvm/x86_64/debug_regs.c:169:32-33: WARNING: Use 
> ARRAY_SIZE.

Applied to kvm-x86 selftests, thanks!

[1/1] KVM: selftests: Use ARRAY_SIZE for array length
  https://github.com/kvm-x86/linux/commit/f8912210eb21

--
https://github.com/kvm-x86/linux/tree/next



Re: [PATCH v3 2/9] KVM: selftests: Add a basic SNP smoke test

2024-10-31 Thread Pratik R. Sampat
Hi Sean,

On 10/30/2024 12:57 PM, Sean Christopherson wrote:
> On Wed, Oct 30, 2024, Pratik R. Sampat wrote:
>> On 10/30/2024 8:46 AM, Sean Christopherson wrote:
>>> +/* Minimum firmware version required for the SEV-SNP support */
>>> +#define SNP_FW_REQ_VER_MAJOR   1
>>> +#define SNP_FW_REQ_VER_MINOR   51
>>>
>>> Side topic, why are these hardcoded?  And where did they come from?  If 
>>> they're
>>> arbitrary KVM selftests values, make that super duper clear.
>>
>> Well, it's not entirely arbitrary. This was the version that SNP GA'd
>> with first so that kind of became the minimum required version needed.
>>
>> I think the only place we've documented this is here -
>> https://github.com/AMDESE/AMDSEV/tree/snp-latest?tab=readme-ov-file#upgrade-sev-firmware.
>>
>> Maybe, I can modify the comment above to say something like -
>> Minimum general availability release firmware required for SEV-SNP support.
> 
> Hmm, so if AMD says SNP is only supported for firmware version >= 1.51, why on
> earth is that not checked and enforced by the kernel?  Relying on userspace to
> not crash the host (or worse) because of unsupported firmware is not a winning
> strategy.

We do check against the firmware level 1.51 while setting things up
first (drivers/crypto/ccp/sev-dev.c:__sev_snp_init_locked()) and we bail
out if it's otherwise. From the userspace, calls to KVM_SEV_INIT2 or any
other corresponding SNP calls should fail cleanly without any adverse
effects to the host.

From the positive selftest perspective though, we want to make sure it's
both supported and enabled, and skip the test if not.
I believe we can tell if it's supported by the platform using the MSR -
MSR_AMD64_SEV_SNP_ENABLED or the X86_FEATURE_SEV_SNP from the KVM
capabilities. However, to determine if it's enabled from the kernel, I
made this check here. Having said that, I do agree that there should
probably be a better way to expose this support to the userspace.

Thanks
Pratik



Re: [PATCH net-next v6 00/12] selftests: ncdevmem: Add ncdevmem to ksft

2024-10-31 Thread Mina Almasry
On Wed, Oct 30, 2024 at 8:37 AM Stanislav Fomichev  wrote:
>
> On 10/30, Mina Almasry wrote:
> > On Wed, Oct 30, 2024 at 8:13 AM Stanislav Fomichev  
> > wrote:
> > >
> > > On 10/30, Mina Almasry wrote:
> > > > On Wed, Oct 30, 2024 at 7:27 AM Stanislav Fomichev  
> > > > wrote:
> > > > >
> > > > > The goal of the series is to simplify and make it possible to use
> > > > > ncdevmem in an automated way from the ksft python wrapper.
> > > > >
> > > > > ncdevmem is slowly mutated into a state where it uses stdout
> > > > > to print the payload and the python wrapper is added to
> > > > > make sure the arrived payload matches the expected one.
> > > > >
> > > > > v6:
> > > > > - fix compilation issue in 'Unify error handling' patch (Jakub)
> > > > >
> > > >
> > > > Since I saw a compilation failures on a couple of iterations I
> > > > cherry-picked this locally and tested compilation. I'm seeing this:
> > >
> > > Are you cherry picking the whole series or just this patch? It looks
> > > too broken.
> > >
> > > > sudo CFLAGS="-static" make -C ./tools/testing/selftests/drivers/net/hw
> > > > TARGETS=ncdevmem 2>&1
> > > > make: Entering directory
> > > > '/usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/drivers/net/hw'
> > > >   CC   ncdevmem
> > > > In file included from ncdevmem.c:63:
> > > > /usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/../../../tools/net/ynl/generated/ethtool-user.h:23:43:
> > > > warning: ‘enum ethtool_header_flags’ declared inside parameter list
> > > > will not be visible outside of this definition or declaration
> > > >23 | const char *ethtool_header_flags_str(enum ethtool_header_flags 
> > > > value);
> > > >   |   ^~~~
> > > > /usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/../../../tools/net/ynl/generated/ethtool-user.h:25:41:
> > > > warning: ‘enum ethtool_module_fw_flash_status’ declared inside
> > > > parameter list will not be visible outside of this definition or
> > > > declaration
> > > >25 | ethtool_module_fw_flash_status_str(enum
> > > > ethtool_module_fw_flash_status value);
> > > >   | 
> > > > ^~
> > > > /usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/../../../tools/net/ynl/generated/ethtool-user.h:6766:45:
> > > > error: field ‘status’ has incomplete type
> > > >  6766 | enum ethtool_module_fw_flash_status status;
> > > >   | ^~
> > >
> > > This has been fixed via '#include '
> > >
> > > > ncdevmem.c: In function ‘do_server’:
> > > > ncdevmem.c:517:37: error: storage size of ‘token’ isn’t known
> > > >   517 | struct dmabuf_token token;
> > >
> > > And this, and the rest, don't make sense at all?
> > >
> > > I'll double check on my side.
> >
> > Oh, whoops, I forgot to headers_install first. This works for me:
> >
> > ➜  cos-kernel git:(tcpdevmem-fixes-1) ✗ sudo make headers_install &&
> > sudo CFLAGS="-static" make -C ./tools/testing/selftests/drivers/net/hw
> > TARGETS=ncdevmem 2>&1
> >   INSTALL ./usr/include
> > make: Entering directory
> > '/usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/drivers/net/hw'
> > make: Nothing to be done for 'all'.
> > make: Leaving directory
> > '/usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/drivers/net/hw'
> > ➜  cos-kernel git:(tcpdevmem-fixes-1) ✗ find . -iname ncdevmem
> > ./tools/testing/selftests/drivers/net/hw/ncdevmem
> >
> > Sorry for the noise :D
>
> Whew, thanks and no worries!

Sorry, 2 issues testing this series:

1. IPv4 addresses seem broken, or maybe I'm using them wrong.

Client command:
yes $(echo -e \\x01\\x02\\x03\\x04\\x05\\x06) | tr \\n \\0 | head -c
1G | nc 192.168.1.4 5224 -p 5224

Server command and logs:
mina-1 /home/almasrymina # ./ncdevmem -s 192.168.1.4 -c 192.168.1.5 -l
-p 5224 -v 7 -f eth1
here: ynl.c:887:ynl_req_trampoline
using queues 15..16
Running: sudo ethtool -K eth1 ntuple off >&2
Running: sudo ethtool -K eth1 ntuple on >&2
Running: sudo ethtool -n eth1 | grep 'Filter:' | awk '{print $2}' |
xargs -n1 ethtool -N eth1 delete >&2
ethtool: bad command line argument(s)
For more information run ethtool -h
here: ynl.c:887:ynl_req_trampoline
TCP header split: on
Running: sudo ethtool -X eth1 equal 15 >&2
Running: sudo ethtool -N eth1 flow-type tcp6 src-ip 192.168.1.5 dst-ip
192.168.1.4 src-port 5224 dst-port 5224 queue 15 >&2
Invalid src-ip value[192.168.1.5]
ethtool: bad command line argument(s)
For more information run ethtool -h
./ncdevmem: Failed to configure flow steering

The ethtool command to configure flow steering is not working for me.
It thinks it's in v6 mode, when the ip address is a v4 address.
(notice `-s 192.168.1.4 -c 192.168.1.5`). flow-type should be tcp in
this case.

Reverting patch 9e2da4faeccf ("Revert "selftests: ncdevmem: Switch to
AF_IN

Re: [PATCH net-next v11 12/23] ovpn: implement TCP transport

2024-10-31 Thread Sabrina Dubroca
2024-10-29, 11:47:25 +0100, Antonio Quartulli wrote:
> +static void ovpn_socket_release_work(struct work_struct *work)
> +{
> + struct ovpn_socket *sock = container_of(work, struct ovpn_socket, work);
> +
> + ovpn_socket_detach(sock->sock);
> + kfree_rcu(sock, rcu);
> +}
> +
> +static void ovpn_socket_schedule_release(struct ovpn_socket *sock)
> +{
> + INIT_WORK(&sock->work, ovpn_socket_release_work);
> + schedule_work(&sock->work);

How does module unloading know that it has to wait for this work to
complete? Will ovpn_cleanup get stuck until some refcount gets
released by this work?


[...]
> +static void ovpn_tcp_rcv(struct strparser *strp, struct sk_buff *skb)
> +{
> + struct ovpn_peer *peer = container_of(strp, struct ovpn_peer, tcp.strp);
> + struct strp_msg *msg = strp_msg(skb);
> + size_t pkt_len = msg->full_len - 2;
> + size_t off = msg->offset + 2;
> +
> + /* ensure skb->data points to the beginning of the openvpn packet */
> + if (!pskb_pull(skb, off)) {
> + net_warn_ratelimited("%s: packet too small\n",
> +  peer->ovpn->dev->name);
> + goto err;
> + }
> +
> + /* strparser does not trim the skb for us, therefore we do it now */
> + if (pskb_trim(skb, pkt_len) != 0) {
> + net_warn_ratelimited("%s: trimming skb failed\n",
> +  peer->ovpn->dev->name);
> + goto err;
> + }
> +
> + /* we need the first byte of data to be accessible
> +  * to extract the opcode and the key ID later on
> +  */
> + if (!pskb_may_pull(skb, 1)) {
> + net_warn_ratelimited("%s: packet too small to fetch opcode\n",
> +  peer->ovpn->dev->name);
> + goto err;
> + }
> +
> + /* DATA_V2 packets are handled in kernel, the rest goes to user space */
> + if (likely(ovpn_opcode_from_skb(skb, 0) == OVPN_DATA_V2)) {
> + /* hold reference to peer as required by ovpn_recv().
> +  *
> +  * NOTE: in this context we should already be holding a
> +  * reference to this peer, therefore ovpn_peer_hold() is
> +  * not expected to fail
> +  */
> + if (WARN_ON(!ovpn_peer_hold(peer)))
> + goto err;
> +
> + ovpn_recv(peer, skb);
> + } else {
> + /* The packet size header must be there when sending the packet
> +  * to userspace, therefore we put it back
> +  */
> + skb_push(skb, 2);
> + ovpn_tcp_to_userspace(peer, strp->sk, skb);
> + }
> +
> + return;
> +err:
> + netdev_err(peer->ovpn->dev,
> +"cannot process incoming TCP data for peer %u\n", peer->id);

This should also be ratelimited, and maybe just combined with the
net_warn_ratelimited just before each goto.

> + dev_core_stats_rx_dropped_inc(peer->ovpn->dev);
> + kfree_skb(skb);
> + ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_TRANSPORT_ERROR);
> +}

[...]
> +void ovpn_tcp_socket_detach(struct socket *sock)
> +{
[...]
> + /* restore CBs that were saved in ovpn_sock_set_tcp_cb() */
> + sock->sk->sk_data_ready = peer->tcp.sk_cb.sk_data_ready;
> + sock->sk->sk_write_space = peer->tcp.sk_cb.sk_write_space;
> + sock->sk->sk_prot = peer->tcp.sk_cb.prot;
> + sock->sk->sk_socket->ops = peer->tcp.sk_cb.ops;
> + rcu_assign_sk_user_data(sock->sk, NULL);
> +
> + rcu_read_unlock();
> +
> + /* cancel any ongoing work. Done after removing the CBs so that these
> +  * workers cannot be re-armed
> +  */

I'm not sure whether a barrier is needed to prevent compiler/CPU
reordering here.

> + cancel_work_sync(&peer->tcp.tx_work);
> + strp_done(&peer->tcp.strp);
> +}
> +
> +static void ovpn_tcp_send_sock(struct ovpn_peer *peer)
> +{
> + struct sk_buff *skb = peer->tcp.out_msg.skb;
> +
> + if (!skb)
> + return;
> +
> + if (peer->tcp.tx_in_progress)
> + return;
> +
> + peer->tcp.tx_in_progress = true;

Sorry, I never answered your question about my concerns in a previous
review here.

We can reach ovpn_tcp_send_sock in two different contexts:
 - lock_sock (either from ovpn_tcp_sendmsg or ovpn_tcp_tx_work)
 - bh_lock_sock (from ovpn_tcp_send_skb, ie "data path")

These are not fully mutually exclusive. lock_sock grabs bh_lock_sock
(a spinlock) for a brief period to mark the (sleeping/mutex) lock as
taken, and then releases it.

So when bh_lock_sock is held, it's not possible to grab lock_sock. But
when lock_sock is taken, it's still possible to grab bh_lock_sock.


The buggy scenario would be:

  (data path encrypt)       (sendmsg)
  ovpn_tcp_send_skb         lock_sock
                              bh_lock_sock + owned=1 + bh_unlock_sock
  bh_lock_sock
  ovpn_tcp_send_sock_skb    ovpn_tcp_send_sock_skb

Re: [PATCH v4 2/2] vsock/test: fix parameter types in SO_VM_SOCKETS_* calls

2024-10-31 Thread Konstantin Shkolnyy

On 10/31/2024 09:16, Stefano Garzarella wrote:

On Tue, Oct 29, 2024 at 09:49:54AM -0500, Konstantin Shkolnyy wrote:

Change parameters of SO_VM_SOCKETS_* to uint64_t so that they are always


In include/uapi/linux/vm_sockets.h we talk about "unsigned long long",
but in the kernel code we use u64. IIUC "unsigned long long" should be
u64 on every architecture, at least until we have some 128-bit CPU,
right?


I'm not sure what "unsigned long long" would be on a 128-bit machine.


What about using `unsigned long long` as documented in the vm_sockets.h?


I use uint64_t because the kernel uses u64. I think, this way the code
isn't vulnerable to potential variability of "unsigned long long".
If we change to "unsigned long long" should we also change the kernel
to "unsigned long long"?




Re: [PATCH v3 2/9] KVM: selftests: Add a basic SNP smoke test

2024-10-31 Thread Sean Christopherson
On Thu, Oct 31, 2024, Pratik R. Sampat wrote:
> Hi Sean,
> 
> On 10/30/2024 12:57 PM, Sean Christopherson wrote:
> > On Wed, Oct 30, 2024, Pratik R. Sampat wrote:
> >> On 10/30/2024 8:46 AM, Sean Christopherson wrote:
> >>> +/* Minimum firmware version required for the SEV-SNP support */
> >>> +#define SNP_FW_REQ_VER_MAJOR   1
> >>> +#define SNP_FW_REQ_VER_MINOR   51
> >>>
> >>> Side topic, why are these hardcoded?  And where did they come from?  If 
> >>> they're
> >>> arbitrary KVM selftests values, make that super duper clear.
> >>
> >> Well, it's not entirely arbitrary. This was the version that SNP GA'd
> >> with first so that kind of became the minimum required version needed.
> >>
> >> I think the only place we've documented this is here -
> >> https://github.com/AMDESE/AMDSEV/tree/snp-latest?tab=readme-ov-file#upgrade-sev-firmware.
> >>
> >> Maybe, I can modify the comment above to say something like -
> >> Minimum general availability release firmware required for SEV-SNP support.
> > 
> > Hmm, so if AMD says SNP is only supported for firmware version >= 1.51, why 
> > on
> > earth is that not checked and enforced by the kernel?  Relying on userspace 
> > to
> > not crash the host (or worse) because of unsupported firmware is not a 
> > winning
> > strategy.
> 
> We do check against the firmware level 1.51 while setting things up
> first (drivers/crypto/ccp/sev-dev.c:__sev_snp_init_locked()) and we bail
> out if it's otherwise. From the userspace, calls to KVM_SEV_INIT2 or any
> other corresponding SNP calls should fail cleanly without any adverse
> effects to the host.

And I'm saying, that's not good enough.  If the platform doesn't support SNP,
the KVM *must not* advertise support for SNP.

> From the positive selftest perspective though, we want to make sure it's
> both supported and enabled, and skip the test if not.

No, we want the test to assert that KVM reports SNP support if and only if SNP
is 100% supported.

> I believe we can tell if it's supported by the platform using the MSR -
> MSR_AMD64_SEV_SNP_ENABLED or the X86_FEATURE_SEV_SNP from the KVM
> capabilities. However, to determine if it's enabled from the kernel, I
> made this check here. Having said that, I do agree that there should
> probably be a better way to expose this support to the userspace.
> 
> Thanks
> Pratik



Re: [PATCH v8 2/3] modpost: Produce extended MODVERSIONS information

2024-10-31 Thread Luis Chamberlain
On Wed, Oct 30, 2024 at 11:05:03PM +, Matthew Maurer wrote:
> diff --git a/kernel/module/Kconfig b/kernel/module/Kconfig
> index 
> e6b2427e5c190aacf7b9c5c1bb57fca39d311564..a31c617cd67d3d66b24d2fba34cbd5cc9c53ab78
>  100644
> --- a/kernel/module/Kconfig
> +++ b/kernel/module/Kconfig
> @@ -208,6 +208,16 @@ config ASM_MODVERSIONS
> assembly. This can be enabled only when the target architecture
> supports it.
>  
> +config EXTENDED_MODVERSIONS
> + bool "Extended Module Versioning Support"
> + depends on MODVERSIONS
> + help
> +   This enables extended MODVERSIONs support, allowing long symbol
> +   names to be versioned.
> +
> +   The most likely reason you would enable this is to enable Rust
> +   support. If unsure, say N.
> +

The question is, if only extended modversions are used, what new tooling
requirements are there? Can you test using only extended modversions?

  Luis



Re: [PATCH net-next v11 09/23] ovpn: implement basic RX path (UDP)

2024-10-31 Thread Sabrina Dubroca
2024-10-29, 11:47:22 +0100, Antonio Quartulli wrote:
> +static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
> +{
[...]
> + opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr));
> + if (unlikely(opcode != OVPN_DATA_V2)) {
> + /* DATA_V1 is not supported */
> + if (opcode == OVPN_DATA_V1)

The TCP encap code passes everything that's not V2 to userspace. Why
not do that with UDP as well?

> + goto drop;
> +
> + /* unknown or control packet: let it bubble up to userspace */
> + return 1;
> + }
> +
> + peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr));
> + /* some OpenVPN server implementations send data packets with the
> +  * peer-id set to undef. In this case we skip the peer lookup by peer-id
> +  * and we try with the transport address
> +  */
> + if (peer_id != OVPN_PEER_ID_UNDEF) {
> + peer = ovpn_peer_get_by_id(ovpn, peer_id);
> + if (!peer) {
> + net_err_ratelimited("%s: received data from unknown 
> peer (id: %d)\n",
> + __func__, peer_id);
> + goto drop;
> + }
> + }
> +
> + if (!peer) {

nit: that could be an "else" combined with the previous case?

> + /* data packet with undef peer-id */
> + peer = ovpn_peer_get_by_transp_addr(ovpn, skb);
> + if (unlikely(!peer)) {
> + net_dbg_ratelimited("%s: received data with undef 
> peer-id from unknown source\n",
> + __func__);
> + goto drop;
> + }
> + }

-- 
Sabrina



Re: [PATCH net v6] ipv6: Fix soft lockups in fib6_select_path under high next hop churn

2024-10-31 Thread Paolo Abeni
On 10/25/24 09:30, Omid Ehtemam-Haghighi wrote:
> Soft lockups have been observed on a cluster of Linux-based edge routers
> located in a highly dynamic environment. Using the `bird` service, these
> routers continuously update BGP-advertised routes due to frequently
> changing nexthop destinations, while also managing significant IPv6
> traffic. The lockups occur during the traversal of the multipath
> circular linked-list in the `fib6_select_path` function, particularly
> while iterating through the siblings in the list. The issue typically
> arises when the nodes of the linked list are unexpectedly deleted
> concurrently on a different core—indicated by their 'next' and
> 'previous' elements pointing back to the node itself and their reference
> count dropping to zero. This results in an infinite loop, leading to a
> soft lockup that triggers a system panic via the watchdog timer.
> 
> Apply RCU primitives in the problematic code sections to resolve the
> issue. Where necessary, update the references to fib6_siblings to
> annotate or use the RCU APIs.
> 
> Include a test script that reproduces the issue. The script
> periodically updates the routing table while generating a heavy load
> of outgoing IPv6 traffic through multiple iperf3 clients. It
> consistently induces infinite soft lockups within a couple of minutes.
> 
> Kernel log:
> 
>  0 [bd13003e8d30] machine_kexec at 8ceaf3eb
>  1 [bd13003e8d90] __crash_kexec at 8d0120e3
>  2 [bd13003e8e58] panic at 8cef65d4
>  3 [bd13003e8ed8] watchdog_timer_fn at 8d05cb03
>  4 [bd13003e8f08] __hrtimer_run_queues at 8cfec62f
>  5 [bd13003e8f70] hrtimer_interrupt at 8cfed756
>  6 [bd13003e8fd0] __sysvec_apic_timer_interrupt at 8cea01af
>  7 [bd13003e8ff0] sysvec_apic_timer_interrupt at 8df1b83d
> --  --
>  8 [bd13003d3708] asm_sysvec_apic_timer_interrupt at 8e000ecb
> [exception RIP: fib6_select_path+299]
> RIP: 8ddafe7b  RSP: bd13003d37b8  RFLAGS: 0287
> RAX: 975850b43600  RBX: 975850b40200  RCX: 
> RDX: 3fff  RSI: 51d383e4  RDI: 975850b43618
> RBP: bd13003d3800   R8:    R9: 975850b40200
> R10:   R11:   R12: bd13003d3830
> R13: 975850b436a8  R14: 975850b43600  R15: 0007
> ORIG_RAX:   CS: 0010  SS: 0018
>  9 [bd13003d3808] ip6_pol_route at 8ddb030c
> 10 [bd13003d3888] ip6_pol_route_input at 8ddb068c
> 11 [bd13003d3898] fib6_rule_lookup at 8ddf02b5
> 12 [bd13003d3928] ip6_route_input at 8ddb0f47
> 13 [bd13003d3a18] ip6_rcv_finish_core.constprop.0 at 8dd950d0
> 14 [bd13003d3a30] ip6_list_rcv_finish.constprop.0 at 8dd96274
> 15 [bd13003d3a98] ip6_sublist_rcv at 8dd96474
> 16 [bd13003d3af8] ipv6_list_rcv at 8dd96615
> 17 [bd13003d3b60] __netif_receive_skb_list_core at 8dc16fec
> 18 [bd13003d3be0] netif_receive_skb_list_internal at 8dc176b3
> 19 [bd13003d3c50] napi_gro_receive at 8dc565b9
> 20 [bd13003d3c80] ice_receive_skb at c087e4f5 [ice]
> 21 [bd13003d3c90] ice_clean_rx_irq at c0881b80 [ice]
> 22 [bd13003d3d20] ice_napi_poll at c088232f [ice]
> 23 [bd13003d3d80] __napi_poll at 8dc18000
> 24 [bd13003d3db8] net_rx_action at 8dc18581
> 25 [bd13003d3e40] __do_softirq at 8df352e9
> 26 [bd13003d3eb0] run_ksoftirqd at 8ceffe47
> 27 [bd13003d3ec0] smpboot_thread_fn at 8cf36a30
> 28 [bd13003d3ee8] kthread at 8cf2b39f
> 29 [bd13003d3f28] ret_from_fork at 8ce5fa64
> 30 [bd13003d3f50] ret_from_fork_asm at 8ce03cbb
> 
> Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in 
> fib6_table")
> Reported-by: Adrian Oliver 
> Signed-off-by: Omid Ehtemam-Haghighi 
> Cc: David S. Miller 
> Cc: David Ahern 
> Cc: Eric Dumazet 
> Cc: Jakub Kicinski 
> Cc: Paolo Abeni 
> Cc: Shuah Khan 
> Cc: Ido Schimmel 
> Cc: Kuniyuki Iwashima 
> Cc: Simon Horman 
> Cc: net...@vger.kernel.org
> Cc: linux-kselft...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org

Given the issue is long-standing, and the fix is somewhat invasive, I
suggest steering this patch on net-next.

Would that be ok for you?

Thanks,

Paolo




Re: next-20241031: kernel/time/clockevents.c:455 clockevents_register_device

2024-10-31 Thread Thomas Gleixner
On Thu, Oct 31 2024 at 10:56, Frederic Weisbecker wrote:
> On Thu, Oct 31, 2024 at 02:10:14PM +0530, Naresh Kamboju wrote:
>> <4>[ 0.220657] WARNING: CPU: 1 PID: 0 at kernel/time/clockevents.c:455
>> clockevents_register_device (kernel/time/clockevents.c:455
>
> It's possible that I messed up something with clockevents.

Yes. You added the warning :)

> Can you try to reproduce with:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
>   timers/core

I force pushed the branch with the warning removed.

> I wish I could reproduce on my own but I don't have easy
> access to such hardware.

apt-get install qemu-system

gives you access to all supported architectures :)






Re: [PATCH net-next v11 09/23] ovpn: implement basic RX path (UDP)

2024-10-31 Thread Antonio Quartulli

On 31/10/2024 12:29, Sabrina Dubroca wrote:

2024-10-29, 11:47:22 +0100, Antonio Quartulli wrote:

+static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
+{

[...]

+   opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr));
+   if (unlikely(opcode != OVPN_DATA_V2)) {
+   /* DATA_V1 is not supported */
+   if (opcode == OVPN_DATA_V1)


The TCP encap code passes everything that's not V2 to userspace. Why
not do that with UDP as well?


If that's the case, then this is a bug in the TCP code.

DATA_Vx packets are part of the data channel and userspace can't do 
anything with them (userspace handles the control channel only when the 
ovpn module is in use).


I'll go check the TCP code then, because sending DATA_V1 to userspace is 
not expected. Thanks for noticing this discrepancy.





+   goto drop;
+
+   /* unknown or control packet: let it bubble up to userspace */
+   return 1;
+   }
+
+   peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr));
+   /* some OpenVPN server implementations send data packets with the
+* peer-id set to undef. In this case we skip the peer lookup by peer-id
+* and we try with the transport address
+*/
+   if (peer_id != OVPN_PEER_ID_UNDEF) {
+   peer = ovpn_peer_get_by_id(ovpn, peer_id);
+   if (!peer) {
+   net_err_ratelimited("%s: received data from unknown peer 
(id: %d)\n",
+   __func__, peer_id);
+   goto drop;
+   }
+   }
+
+   if (!peer) {


nit: that could be an "else" combined with the previous case?


mhh that's true. Then I can combine the two "if (!peer)" checks into one block.


Thanks!
Regards,




+   /* data packet with undef peer-id */
+   peer = ovpn_peer_get_by_transp_addr(ovpn, skb);
+   if (unlikely(!peer)) {
+   net_dbg_ratelimited("%s: received data with undef peer-id 
from unknown source\n",
+   __func__);
+   goto drop;
+   }
+   }




--
Antonio Quartulli
OpenVPN Inc.




Re: [PATCH net-next v11 11/23] ovpn: store tunnel and transport statistics

2024-10-31 Thread Antonio Quartulli




On 31/10/2024 12:37, Sabrina Dubroca wrote:

2024-10-29, 11:47:24 +0100, Antonio Quartulli wrote:

@@ -136,6 +139,10 @@ void ovpn_decrypt_post(void *data, int ret)
goto drop;
}
  
+	/* increment RX stats */

+   ovpn_peer_stats_increment_rx(&peer->vpn_stats, skb->len);
+   ovpn_peer_stats_increment_rx(&peer->link_stats, orig_len);


[I don't know much about the userspace implementation, so maybe this
is a silly question]

What's the value of keeping track of 2 separate stats if they are
incremented exactly at the same time? Packet count will be the same,
and the difference in bytes will be just measuring the encap overhead.

Should one of them be "packets/individual messages that get received
over the UDP/TCP link" and the other "packets that get passed up to
the stack"?


You're correct: link_stats is "received over the TCP/UDP socket", while
vpn_stats is what is passing through the ovpn virtual device.


Packet count may not match though, for example when something happens
between "received packet on the link" and "packet passed up to the
device" (e.g. a decryption error).


This makes me wonder why we increment them at the very same place:
link_stats should be increased upon RX from the socket, while vpn_stats
just before delivery. I'll double check.






@@ -197,6 +206,8 @@ void ovpn_encrypt_post(void *data, int ret)
goto err;
  
  	skb_mark_not_on_list(skb);

+   ovpn_peer_stats_increment_tx(&peer->link_stats, skb->len);
+   ovpn_peer_stats_increment_tx(&peer->vpn_stats, orig_len);
  
  	switch (peer->sock->sock->sk->sk_protocol) {

case IPPROTO_UDP:


And on TX maybe something like "packets that the stack wants to send
through the tunnel" and "packets that actually make it onto the
UDP/TCP socket after encap/encrypt"?


Correct.

Same issue here. Increments should not happen back to back.


Thanks a lot for spotting these.

Regards,






--
Antonio Quartulli
OpenVPN Inc.




Re: [PATCH v4 2/2] vsock/test: fix parameter types in SO_VM_SOCKETS_* calls

2024-10-31 Thread Stefano Garzarella

On Tue, Oct 29, 2024 at 09:49:54AM -0500, Konstantin Shkolnyy wrote:

Change parameters of SO_VM_SOCKETS_* to uint64_t so that they are always


In include/uapi/linux/vm_sockets.h we talk about "unsigned long long",
but in the kernel code we use u64. IIUC "unsigned long long" should be
u64 on every architecture, at least until we have some 128-bit CPU,
right?


64-bit, because the corresponding kernel code requires them to be at least
that large, no matter what architecture.

Fixes: 5c338112e48a ("test/vsock: rework message bounds test")
Fixes: 685a21c314a8 ("test/vsock: add big message test")
Fixes: 542e893fbadc ("vsock/test: two tests to check credit update logic")
Fixes: 8abbffd27ced ("test/vsock: vsock_perf utility")
Signed-off-by: Konstantin Shkolnyy 
---
tools/testing/vsock/vsock_perf.c |  2 +-
tools/testing/vsock/vsock_test.c | 19 ++-
2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/tools/testing/vsock/vsock_perf.c b/tools/testing/vsock/vsock_perf.c
index 22633c2848cc..88f6be4162a6 100644
--- a/tools/testing/vsock/vsock_perf.c
+++ b/tools/testing/vsock/vsock_perf.c
@@ -33,7 +33,7 @@

static unsigned int port = DEFAULT_PORT;
static unsigned long buf_size_bytes = DEFAULT_BUF_SIZE_BYTES;
-static unsigned long vsock_buf_bytes = DEFAULT_VSOCK_BUF_BYTES;
+static uint64_t vsock_buf_bytes = DEFAULT_VSOCK_BUF_BYTES;


What about using `unsigned long long` as documented in the vm_sockets.h?

Thanks,
Stefano


static bool zerocopy;

static void error(const char *s)
diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index 7fd25b814b4b..49a32515886f 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -429,7 +429,7 @@ static void test_seqpacket_msg_bounds_client(const struct 
test_opts *opts)

static void test_seqpacket_msg_bounds_server(const struct test_opts *opts)
{
-   unsigned long sock_buf_size;
+   uint64_t sock_buf_size;
unsigned long remote_hash;
unsigned long curr_hash;
int fd;
@@ -634,7 +634,8 @@ static void test_seqpacket_timeout_server(const struct 
test_opts *opts)

static void test_seqpacket_bigmsg_client(const struct test_opts *opts)
{
-   unsigned long sock_buf_size;
+   uint64_t sock_buf_size;
+   size_t buf_size;
socklen_t len;
void *data;
int fd;
@@ -655,13 +656,19 @@ static void test_seqpacket_bigmsg_client(const struct 
test_opts *opts)

sock_buf_size++;

-   data = malloc(sock_buf_size);
+   buf_size = (size_t) sock_buf_size; /* size_t can be < uint64_t */
+   if (buf_size != sock_buf_size) {
+   fprintf(stderr, "Returned BUFFER_SIZE too large\n");
+   exit(EXIT_FAILURE);
+   }
+
+   data = malloc(buf_size);
if (!data) {
perror("malloc");
exit(EXIT_FAILURE);
}

-   send_buf(fd, data, sock_buf_size, 0, -EMSGSIZE);
+   send_buf(fd, data, buf_size, 0, -EMSGSIZE);

control_writeln("CLISENT");

@@ -1360,6 +1367,7 @@ static void test_stream_credit_update_test(const struct 
test_opts *opts,
int recv_buf_size;
struct pollfd fds;
size_t buf_size;
+   uint64_t sock_buf_size;
void *buf;
int fd;

@@ -1370,9 +1378,10 @@ static void test_stream_credit_update_test(const struct 
test_opts *opts,
}

buf_size = RCVLOWAT_CREDIT_UPD_BUF_SIZE;
+   sock_buf_size = buf_size; /* size_t can be < uint64_t */

if (setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_SIZE,
-  &buf_size, sizeof(buf_size))) {
+  &sock_buf_size, sizeof(sock_buf_size))) {
perror("setsockopt(SO_VM_SOCKETS_BUFFER_SIZE)");
exit(EXIT_FAILURE);
}
--
2.34.1






Re: [PATCH net v6] ipv6: Fix soft lockups in fib6_select_path under high next hop churn

2024-10-31 Thread David Ahern
On 10/31/24 4:13 AM, Paolo Abeni wrote:
> Given the issue is long-standing, and the fix is somewhat invasive, I
> suggest steering this patch on net-next.
> 


FWIW, I think net-next is best.




Re: [PATCH net-next v11 11/23] ovpn: store tunnel and transport statistics

2024-10-31 Thread Sabrina Dubroca
2024-10-29, 11:47:24 +0100, Antonio Quartulli wrote:
> @@ -136,6 +139,10 @@ void ovpn_decrypt_post(void *data, int ret)
>   goto drop;
>   }
>  
> + /* increment RX stats */
> + ovpn_peer_stats_increment_rx(&peer->vpn_stats, skb->len);
> + ovpn_peer_stats_increment_rx(&peer->link_stats, orig_len);

[I don't know much about the userspace implementation, so maybe this
is a silly question]

What's the value of keeping track of 2 separate stats if they are
incremented exactly at the same time? Packet count will be the same,
and the difference in bytes will be just measuring the encap overhead.

Should one of them be "packets/individual messages that get received
over the UDP/TCP link" and the other "packets that get passed up to
the stack"?


> @@ -197,6 +206,8 @@ void ovpn_encrypt_post(void *data, int ret)
>   goto err;
>  
>   skb_mark_not_on_list(skb);
> + ovpn_peer_stats_increment_tx(&peer->link_stats, skb->len);
> + ovpn_peer_stats_increment_tx(&peer->vpn_stats, orig_len);
>  
>   switch (peer->sock->sock->sk->sk_protocol) {
>   case IPPROTO_UDP:

And on TX maybe something like "packets that the stack wants to send
through the tunnel" and "packets that actually make it onto the
UDP/TCP socket after encap/encrypt"?

-- 
Sabrina



Re: [PATCH net-next v11 12/23] ovpn: implement TCP transport

2024-10-31 Thread Antonio Quartulli

On 29/10/2024 11:47, Antonio Quartulli wrote:
[...]

+
+   /* DATA_V2 packets are handled in kernel, the rest goes to user space */
+   if (likely(ovpn_opcode_from_skb(skb, 0) == OVPN_DATA_V2)) {
+   /* hold reference to peer as required by ovpn_recv().
+*
+* NOTE: in this context we should already be holding a
+* reference to this peer, therefore ovpn_peer_hold() is
+* not expected to fail
+*/
+   if (WARN_ON(!ovpn_peer_hold(peer)))
+   goto err;
+
+   ovpn_recv(peer, skb);
+   } else {


As pointed out by Sabrina, we are indeed sending DATA_V1 packets to 
userspace.

Not a big deal because userspace will likely ignore or drop them.

However, I will change this and mirror what we do for UDP.

Thanks.

Regards,



+   /* The packet size header must be there when sending the packet
+* to userspace, therefore we put it back
+*/
+   skb_push(skb, 2);
+   ovpn_tcp_to_userspace(peer, strp->sk, skb);
+   }
+
+   return;




--
Antonio Quartulli
OpenVPN Inc.




Re: [PATCH net-next v2 2/4] net: hsr: Add VLAN CTAG filter support

2024-10-31 Thread Paolo Abeni



On 10/24/24 12:30, MD Danish Anwar wrote:
> From: Murali Karicheri 
> 
> This patch adds support for VLAN ctag based filtering at slave devices.
> The slave ethernet device may be capable of filtering ethernet packets
> based on VLAN ID. This requires that when the VLAN interface is created
> over an HSR/PRP interface, it passes the VID information to the
> associated slave ethernet devices so that it updates the hardware
> filters to filter ethernet frames based on VID. This patch adds the
> required functions to propagate the vid information to the slave
> devices.
> 
> Signed-off-by: Murali Karicheri 
> Signed-off-by: MD Danish Anwar 
> ---
>  net/hsr/hsr_device.c | 71 +++-
>  1 file changed, 70 insertions(+), 1 deletion(-)
> 
> diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
> index 0ca47ebb01d3..ff586bdc2bde 100644
> --- a/net/hsr/hsr_device.c
> +++ b/net/hsr/hsr_device.c
> @@ -515,6 +515,68 @@ static void hsr_change_rx_flags(struct net_device *dev, 
> int change)
>   }
>  }
>  
> +static int hsr_ndo_vlan_rx_add_vid(struct net_device *dev,
> +__be16 proto, u16 vid)
> +{
> + struct hsr_port *port;
> + struct hsr_priv *hsr;
> + int ret = 0;
> +
> + hsr = netdev_priv(dev);
> +
> + hsr_for_each_port(hsr, port) {
> + if (port->type == HSR_PT_MASTER)
> + continue;

If the desired behavior is to ignore INTERLINK ports, I think you should
explicitly skip them here, otherwise you will end up in a
nondeterministic state.

> + ret = vlan_vid_add(port->dev, proto, vid);
> + switch (port->type) {
> + case HSR_PT_SLAVE_A:
> + if (ret) {
> + netdev_err(dev, "add vid failed for Slave-A\n");
> + return ret;
> + }
> + break;
> +
> + case HSR_PT_SLAVE_B:
> + if (ret) {
> + /* clean up Slave-A */
> + netdev_err(dev, "add vid failed for Slave-B\n");
> + vlan_vid_del(port->dev, proto, vid);

This code relies on a specific port_list order - which is actually
respected at list creation time. Still such assumption looks fragile and
may lead to long term bugs.

I think it would be better to refactor the above loop to handle
HSR_PT_SLAVE_A and HSR_PT_SLAVE_B in arbitrary order. My guesstimate is
that the complexity will not increase measurably.

> + return ret;
> + }
> + break;
> + default:
> + break;
> + }
> + }
> +
> + return 0;
> +}
> +
> +static int hsr_ndo_vlan_rx_kill_vid(struct net_device *dev,
> + __be16 proto, u16 vid)
> +{
> + struct hsr_port *port;
> + struct hsr_priv *hsr;
> +
> + hsr = netdev_priv(dev);
> +
> + hsr_for_each_port(hsr, port) {
> + if (port->type == HSR_PT_MASTER)
> + continue;

I think it would be more consistent just removing the above statement...

> + switch (port->type) {
> + case HSR_PT_SLAVE_A:
> + case HSR_PT_SLAVE_B:
> + vlan_vid_del(port->dev, proto, vid);
> + break;
> + default:> + break;

... MASTER and INTERLINK port will be ignored anyway.

> + }
> + }
> +
> + return 0;
> +}
> +
>  static const struct net_device_ops hsr_device_ops = {
>   .ndo_change_mtu = hsr_dev_change_mtu,
>   .ndo_open = hsr_dev_open,

Cheers,

Paolo




[PATCH v4 0/3] rust: kunit: Support KUnit tests with a user-space like syntax

2024-10-31 Thread David Gow
This series was originally written by José Expósito, and has been
modified and updated by Matt Gilbride and myself. The original version
can be found here:
https://github.com/Rust-for-Linux/linux/pull/950

Add support for writing KUnit tests in Rust. While Rust doctests are
already converted to KUnit tests and run, they're really better suited
for examples, rather than as first-class unit tests.

This series implements a set of direct Rust bindings for KUnit tests,
as well as a new macro which allows KUnit tests to be written using a
close variant of normal Rust unit test syntax. The only change required
is replacing '#[cfg(test)]' with '#[kunit_tests(kunit_test_suite_name)]'

An example test would look like:
#[kunit_tests(rust_kernel_hid_driver)]
mod tests {
use super::*;
use crate::{c_str, driver, hid, prelude::*};
use core::ptr;

struct SimpleTestDriver;
impl Driver for SimpleTestDriver {
type Data = ();
}

#[test]
fn rust_test_hid_driver_adapter() {
let mut hid = bindings::hid_driver::default();
let name = c_str!("SimpleTestDriver");
static MODULE: ThisModule = unsafe { ThisModule::from_ptr(ptr::null_mut()) };

let res = unsafe {
 as 
driver::DriverOps>::register(&mut hid, name, &MODULE)
};
assert_eq!(res, Err(ENODEV)); // The mock returns -19
}
}


Please give this a go, and make sure I haven't broken it! There's almost
certainly a lot of improvements which can be made -- and there's a fair
case to be made for replacing some of this with generated C code which
can use the C macros -- but this is hopefully an adequate implementation
for now, and the interface can (with luck) remain the same even if the
implementation changes.

A few small notable missing features:
- Attributes (like the speed of a test) are hardcoded to the default
  value.
- Similarly, the module name attribute is hardcoded to NULL. In C, we
  use the KBUILD_MODNAME macro, but I couldn't find a way to use this
  from Rust which wasn't more ugly than just disabling it.
- Assertions are not automatically rewritten to use KUnit assertions.

---

Changes since v3:
https://lore.kernel.org/linux-kselftest/20241030045719.3085147-2-david...@google.com/T/
- The kunit_unsafe_test_suite!() macro now panic!s if the suite name is
  too long, triggering a compile error. (Thanks, Alice!)
- The #[kunit_tests()] macro now preserves span information, so
  errors can be better reported. (Thanks, Boqun!)
- The example tests have been updated to no longer use assert_eq!() with
  a constant bool argument (which triggered a clippy warning now we
  have the span info).

Changes since v2:
https://lore.kernel.org/linux-kselftest/20241029092422.2884505-1-david...@google.com/T/
- Include missing rust/macros/kunit.rs file from v2. (Thanks Boqun!)
- The kunit_unsafe_test_suite!() macro will truncate the name of the
  suite if it is too long. (Thanks Alice!)
- The proc macro now emits an error if the suite name is too long.
- We no longer needlessly use UnsafeCell<> in
  kunit_unsafe_test_suite!(). (Thanks Alice!)

Changes since v1:
https://lore.kernel.org/lkml/20230720-rustbind-v1-0-c80db349e...@google.com/T/
- Rebase on top of the latest rust-next (commit 718c4069896c)
- Make kunit_case a const fn, rather than a macro (Thanks Boqun)
- As a result, the null terminator is now created with
  kernel::kunit::kunit_case_null()
- Use the C kunit_get_current_test() function to implement
  in_kunit_test(), rather than re-implementing it (less efficiently)
  ourselves.

Changes since the GitHub PR:
- Rebased on top of kselftest/kunit
- Add const_mut_refs feature
  This may conflict with 
https://lore.kernel.org/lkml/20230503090708.2524310-6-...@metaspace.dk/
- Add rust/macros/kunit.rs to the KUnit MAINTAINERS entry

---

José Expósito (3):
  rust: kunit: add KUnit case and suite macros
  rust: macros: add macro to easily run KUnit tests
  rust: kunit: allow to know if we are in a test

 MAINTAINERS  |   1 +
 rust/kernel/kunit.rs | 194 +++
 rust/kernel/lib.rs   |   1 +
 rust/macros/kunit.rs | 168 +
 rust/macros/lib.rs   |  29 +++
 5 files changed, 393 insertions(+)
 create mode 100644 rust/macros/kunit.rs

-- 
2.47.0.199.ga7371fff76-goog




Re: [PATCH v3 2/3] rust: macros: add macro to easily run KUnit tests

2024-10-31 Thread David Gow
On Wed, 30 Oct 2024 at 13:11, Boqun Feng  wrote:
>
> On Wed, Oct 30, 2024 at 12:57:14PM +0800, David Gow wrote:
> > From: José Expósito 
> > Add a new procedural macro (`#[kunit_tests(kunit_test_suit_name)]`) to
> > run KUnit tests using a user-space like syntax.
> >
> > The macro, that should be used on modules, transforms every `#[test]`
> > in a `kunit_case!` and adds a `kunit_unsafe_test_suite!` registering
> > all of them.
> >
> > The only difference with user-space tests is that instead of using
> > `#[cfg(test)]`, `#[kunit_tests(kunit_test_suit_name)]` is used.
> >
> > Note that `#[cfg(CONFIG_KUNIT)]` is added so the test module is not
> > compiled when `CONFIG_KUNIT` is set to `n`.
> >
> > Reviewed-by: David Gow 
> > Signed-off-by: José Expósito 
> > [Updated to use new const fn.]
> > Signed-off-by: David Gow 
> > ---
> >
> > Changes since v2:
> > https://lore.kernel.org/linux-kselftest/20241029092422.2884505-3-david...@google.com/
> > - Include missing rust/macros/kunit.rs file from v2. (Thanks Boqun!)
> > - The proc macro now emits an error if the suite name is too long.
> >
> > Changes since v1:
> > https://lore.kernel.org/lkml/20230720-rustbind-v1-2-c80db349e...@google.com/
> > - Rebased on top of rust-next
> > - Make use of the new const functions, rather than the kunit_case!()
> >   macro.
> >
> > ---
> >  MAINTAINERS  |   1 +
> >  rust/kernel/kunit.rs |  11 
> >  rust/macros/kunit.rs | 153 +++
> >  rust/macros/lib.rs   |  29 
> >  4 files changed, 194 insertions(+)
> >  create mode 100644 rust/macros/kunit.rs
> >

(...snip...)

> > +let new_body: TokenStream = vec![body.stream(), 
> > kunit_macros.parse().unwrap()]
> > +.into_iter()
> > +.collect();
> > +
> > +// Remove the `#[test]` macros.
> > +let new_body = new_body.to_string().replace("#[test]", "");
>
> Yeah, I want to see how you do it this time ;-) So if you do a
> `.to_string()` on a `TokenStream`, you lose all the span [1] information
> ("span information" is a term invited by me, hope I get it right ;-))
> e.g. if there is a compile error in the test code, the compiler cannot
> report the exact line of the error, it can only report there is an
> error.
>
> Last time I find how to preserve the Span:
>
> 
> https://lore.kernel.org/rust-for-linux/ZMba0_XXZuTgWyWY@boqun-archlinux/
>
> Hope it helps!
>
> [1]: https://doc.rust-lang.org/proc_macro/struct.Span.html
>
> Regards,
> Boqun

Thanks. I managed to get this working, but just ended up with an
uglier version of your change, so I've copied it in and added you as a
co-developer for this patch in v4.

It made clippy catch a couple of warnings in the example tests, too,
so it's clearly working well.

Cheers,
-- David




Re: [PATCH] hardware_disable_test: remove unused macro

2024-10-31 Thread Sean Christopherson
On Tue, 03 Sep 2024 12:31:35 +0800, Ba Jing wrote:
> The macro GUEST_CODE_PIO_PORT is never referenced in the code,
> just remove it.

Applied to kvm-x86 selftests, thanks!

[1/1] hardware_disable_test: remove unused macro
  https://github.com/kvm-x86/linux/commit/600aa88014e9

--
https://github.com/kvm-x86/linux/tree/next



Re: [PATCH v2 0/6] Extend pmu_counters_test to AMD CPUs

2024-10-31 Thread Colton Lewis

Bumping this for Mingwei

Colton Lewis  writes:


Extend pmu_counters_test to AMD CPUs.



As the AMD PMU is quite different from Intel with different events and
feature sets, this series introduces a new code path to test it,
specifically focusing on the core counters including the
PerfCtrExtCore and PerfMonV2 features. Northbridge counters and cache
counters exist, but are not as important and can be deferred to a
later series.



The first patch is a bug fix that could be submitted separately.



The series has been tested on both Intel and AMD machines, but I have
not found an AMD machine old enough to lack PerfCtrExtCore. I have
taken care that no part of the code depends on its presence.



I am aware of similar work in this direction done by Jinrong Liang
[1]. He told me he is not working on it currently, so I am not
intruding by making my own submission.


[1]  
https://lore.kernel.org/kvm/20231121115457.76269-1-cloudli...@tencent.com/



v2:
* Test all combinations of VM setup rather than only the maximum
   allowed by hardware
* Add fixes tag to bug fix in patch 1
* Refine some names



v1:
https://lore.kernel.org/kvm/20240813164244.751597-1-coltonle...@google.com/



Colton Lewis (6):
   KVM: x86: selftests: Fix typos in macro variable use
   KVM: x86: selftests: Define AMD PMU CPUID leaves
   KVM: x86: selftests: Set up AMD VM in pmu_counters_test
   KVM: x86: selftests: Test read/write core counters
   KVM: x86: selftests: Test core events
   KVM: x86: selftests: Test PerfMonV2



  .../selftests/kvm/include/x86_64/processor.h  |   7 +
  .../selftests/kvm/x86_64/pmu_counters_test.c  | 304 --
  2 files changed, 277 insertions(+), 34 deletions(-)




base-commit: da3ea35007d0af457a0afc87e84fddaebc4e0b63
--
2.46.0.662.g92d0881bb0-goog




Re: [PATCH RFT v12 0/8] fork: Support shadow stacks in clone3()

2024-10-31 Thread Edgecombe, Rick P
On Thu, 2024-10-31 at 19:25 +, Mark Brown wrote:
> ---
> base-commit: d17cd7b7cc92d37ee8b2df8f975fc859a261f4dc

Where can I find this base commit?

> change-id: 20231019-clone3-shadow-stack-15d40d2bf536



[PATCH net-next v3 6/7] selftests: net: Add busy_poll_test

2024-10-31 Thread Joe Damato
Add an epoll busy poll test using netdevsim.

This test is comprised of:
  - busy_poller (via busy_poller.c)
  - busy_poll_test.sh which loads netdevsim, sets up network namespaces,
and runs busy_poller to receive data and netcat to send data.

The selftest tests two different scenarios:
  - busy poll (the pre-existing version in the kernel)
  - busy poll with suspend enabled (what this series adds)

The data transmit is a 1MiB temporary file generated from /dev/urandom
and the test is considered passing if the md5sum of the input file to
netcat matches the md5sum of the output file from busy_poller.

netdevsim was chosen instead of veth due to netdevsim's support for
netdev-genl.

For now, this test uses the functionality that netdevsim provides. In the
future, perhaps netdevsim can be extended to emulate device IRQs to more
thoroughly test all pre-existing kernel options (like defer_hard_irqs)
and suspend.

Signed-off-by: Joe Damato 
Co-developed-by: Martin Karsten 
Signed-off-by: Martin Karsten 
---
 v3:
   - New in v3

 tools/testing/selftests/net/.gitignore|   1 +
 tools/testing/selftests/net/Makefile  |   2 +
 tools/testing/selftests/net/busy_poll_test.sh | 164 
 tools/testing/selftests/net/busy_poller.c | 245 ++
 4 files changed, 412 insertions(+)
 create mode 100755 tools/testing/selftests/net/busy_poll_test.sh
 create mode 100644 tools/testing/selftests/net/busy_poller.c

diff --git a/tools/testing/selftests/net/.gitignore 
b/tools/testing/selftests/net/.gitignore
index 217d8b7a7365..85b0c4a2179f 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -2,6 +2,7 @@
 bind_bhash
 bind_timewait
 bind_wildcard
+busy_poller
 cmsg_sender
 diag_uid
 epoll_busy_poll
diff --git a/tools/testing/selftests/net/Makefile 
b/tools/testing/selftests/net/Makefile
index 26a4883a65c9..4b5f3d7d900b 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -69,6 +69,7 @@ TEST_GEN_FILES += hwtstamp_config rxtimestamp timestamping 
txtimestamp
 TEST_GEN_FILES += ipsec
 TEST_GEN_FILES += ioam6_parser
 TEST_GEN_FILES += gro
+TEST_GEN_FILES += busy_poller
 TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
 TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls tun tap 
epoll_busy_poll
 TEST_GEN_FILES += toeplitz
@@ -96,6 +97,7 @@ TEST_PROGS += fdb_flush.sh
 TEST_PROGS += fq_band_pktlimit.sh
 TEST_PROGS += vlan_hw_filter.sh
 TEST_PROGS += bpf_offload.py
+TEST_PROGS += busy_poll_test.sh
 
 # YNL files, must be before "include ..lib.mk"
 YNL_GEN_FILES := ncdevmem
diff --git a/tools/testing/selftests/net/busy_poll_test.sh 
b/tools/testing/selftests/net/busy_poll_test.sh
new file mode 100755
index ..bf33737691a4
--- /dev/null
+++ b/tools/testing/selftests/net/busy_poll_test.sh
@@ -0,0 +1,164 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-only
+source net_helper.sh
+
+NSIM_DEV_1_ID=$((256 + RANDOM % 256))
+NSIM_DEV_1_SYS=/sys/bus/netdevsim/devices/netdevsim$NSIM_DEV_1_ID
+NSIM_DEV_2_ID=$((512 + RANDOM % 256))
+NSIM_DEV_2_SYS=/sys/bus/netdevsim/devices/netdevsim$NSIM_DEV_2_ID
+
+NSIM_DEV_SYS_NEW=/sys/bus/netdevsim/new_device
+NSIM_DEV_SYS_DEL=/sys/bus/netdevsim/del_device
+NSIM_DEV_SYS_LINK=/sys/bus/netdevsim/link_device
+NSIM_DEV_SYS_UNLINK=/sys/bus/netdevsim/unlink_device
+
+YNL_PATH=$(realpath $(dirname ${0}))/../../../net/ynl/cli.py
+SPEC_PATH=$(realpath $(dirname 
${0}))/../../../../Documentation/netlink/specs/netdev.yaml
+
+check_ynl()
+{
+   if [ ! -f ${YNL_PATH} ]; then
+   echo "YNL path invalid: ${YNL_PATH}"
+   exit 1
+   fi
+
+   if [ ! -f ${SPEC_PATH} ]; then
+   echo "spec path invalid: ${SPEC_PATH}"
+   exit 1
+   fi
+}
+
+setup_ns()
+{
+   set -e
+   ip netns add nssv
+   ip netns add nscl
+
+   NSIM_DEV_1_NAME=$(find $NSIM_DEV_1_SYS/net -maxdepth 1 -type d ! \
+   -path $NSIM_DEV_1_SYS/net -exec basename {} \;)
+   NSIM_DEV_2_NAME=$(find $NSIM_DEV_2_SYS/net -maxdepth 1 -type d ! \
+   -path $NSIM_DEV_2_SYS/net -exec basename {} \;)
+
+   # ensure the server has 1 queue
+   ethtool -L $NSIM_DEV_1_NAME combined 1 2>/dev/null
+
+   ip link set $NSIM_DEV_1_NAME netns nssv
+   ip link set $NSIM_DEV_2_NAME netns nscl
+
+   ip netns exec nssv ip addr add '192.168.1.1/24' dev $NSIM_DEV_1_NAME
+   ip netns exec nscl ip addr add '192.168.1.2/24' dev $NSIM_DEV_2_NAME
+
+   ip netns exec nssv ip link set dev $NSIM_DEV_1_NAME up
+   ip netns exec nscl ip link set dev $NSIM_DEV_2_NAME up
+
+   set +e
+}
+
+cleanup_ns()
+{
+   ip netns del nscl
+   ip netns del nssv
+}
+
+test_busypoll()
+{
+   tmp_file=$(mktemp)
+   out_file=$(mktemp)
+
+   # fill a test file with random data
+   dd if=/dev/urandom of=${tmp_file} bs=1M count=1 2> /dev/null
+
+   timeout -k 60s 60s ip netns exec nssv ./bus

Re: [PATCH v8 2/3] modpost: Produce extended MODVERSIONS information

2024-10-31 Thread Matthew Maurer
> The question is, if only extended modversions are used, what new tooling
> requirements are there? Can you test using only extended modversions?
>
>   Luis

I'm not sure precisely what you're asking for. Do you want:
1. A kconfig that suppresses the emission of today's MODVERSIONS
format? This would be fairly easy to do, but I was leaving it enabled
for compatibility's sake, at least until extended modversions become
more common. This way existing `kmod` tools and kernels would continue
to be able to load new-style modules.
2. libkmod support for parsing the new format? I can do that fairly
easily too, but wanted the format actually decided on and accepted
before I started modifying things that read modversions.
3. Something else? Maybe I'm not understanding your comment?



[PATCH RFT v12 4/8] fork: Add shadow stack support to clone3()

2024-10-31 Thread Mark Brown
Unlike with the normal stack there is no API for configuring the shadow
stack for a new thread; instead the kernel will dynamically allocate a
new shadow stack with the same size as the normal stack. This appears to
be due to the shadow stack series having been in development since
before the more extensible clone3() was added, rather than anything more
deliberate.

Add a parameter to clone3() specifying the shadow stack pointer to use
for the new thread. This is inconsistent with the way we specify the
normal stack, but during review concerns were expressed about having to
identify where the shadow stack pointer should be placed, especially in
cases where the shadow stack has been previously active.  If no shadow
stack is specified then the existing implicit allocation behaviour is
maintained.

If a shadow stack pointer is specified then it is required to have an
architecture-defined token placed on the stack; this will be consumed
by the new task.  If no valid token is present then this will be
reported with -EINVAL.  This token prevents new threads being created
pointing at the shadow stack of an existing running thread.

If the architecture does not support shadow stacks the shadow stack
pointer must be not be specified, architectures that do support the
feature are expected to enforce the same requirement on individual
systems that lack shadow stack support.

Update the existing arm64 and x86 implementations to pay attention to
the newly added arguments; in order to maintain compatibility we use the
existing behaviour if no shadow stack is specified. Since we are now
using more fields from the kernel_clone_args we pass that into the
shadow stack code rather than individual fields.

Portions of the x86 architecture code were written by Rick Edgecombe.

Signed-off-by: Mark Brown 
---
 arch/arm64/mm/gcs.c  | 54 +-
 arch/x86/include/asm/shstk.h | 11 +++--
 arch/x86/kernel/process.c|  2 +-
 arch/x86/kernel/shstk.c  | 57 +---
 include/asm-generic/cacheflush.h | 11 +
 include/linux/sched/task.h   | 17 +++
 include/uapi/linux/sched.h   | 10 +++--
 kernel/fork.c| 96 +++-
 8 files changed, 232 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/mm/gcs.c b/arch/arm64/mm/gcs.c
index 
1f633a482558b59aac5427963d42b37fce08c8a6..c4e93b7ce05c5dfa1128923ad587f9b5a7fb0051
 100644
--- a/arch/arm64/mm/gcs.c
+++ b/arch/arm64/mm/gcs.c
@@ -43,8 +43,24 @@ int gcs_alloc_thread_stack(struct task_struct *tsk,
 {
unsigned long addr, size;
 
-   if (!system_supports_gcs())
+   if (!system_supports_gcs()) {
+   if (args->shadow_stack_pointer)
+   return -EINVAL;
+
+   return 0;
+   }
+
+   /*
+* If the user specified a GCS then use it, otherwise fall
+* back to a default allocation strategy. Validation is done
+* in arch_shstk_validate_clone().
+*/
+   if (args->shadow_stack_pointer) {
+   tsk->thread.gcs_base = 0;
+   tsk->thread.gcs_size = 0;
+   tsk->thread.gcspr_el0 = args->shadow_stack_pointer;
return 0;
+   }
 
if (!task_gcs_el0_enabled(tsk))
return 0;
@@ -68,6 +84,42 @@ int gcs_alloc_thread_stack(struct task_struct *tsk,
return 0;
 }
 
+static bool gcs_consume_token(struct vm_area_struct *vma, struct page *page,
+ unsigned long user_addr)
+{
+   u64 expected = GCS_CAP(user_addr);
+   u64 *token = page_address(page) + offset_in_page(user_addr);
+
+   if (!cmpxchg_to_user_page(vma, page, user_addr, token, expected, 0))
+   return false;
+   set_page_dirty_lock(page);
+
+   return true;
+}
+
+int arch_shstk_validate_clone(struct task_struct *tsk,
+ struct vm_area_struct *vma,
+ struct page *page,
+ struct kernel_clone_args *args)
+{
+	unsigned long gcspr_el0;
+	int ret = 0;
+
+   /* Ensure that a token written as a result of a pivot is visible */
+   gcsb_dsync();
+
+   gcspr_el0 = args->shadow_stack_pointer;
+   if (!gcs_consume_token(vma, page, gcspr_el0))
+   return -EINVAL;
+
+   tsk->thread.gcspr_el0 = gcspr_el0 + sizeof(u64);
+
+   /* Ensure that our token consumption is visible */
+   gcsb_dsync();
+
+   return ret;
+}
+
 SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, 
unsigned int, flags)
 {
unsigned long alloc_size;
diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
index 
4cb77e004615dff003426a2eb594460ca1015f4e..252feeda69991e939942c74556e23e27c835e766
 100644
--- a/arch/x86/include/asm/shstk.h
+++ b/arch/x86/include/asm/shstk.h
@@ -6,6 +6,7 @@
 #include 
 
 struct task_struct;
+struct kernel_clone_args;
 struct ksignal;
 
 #ifdef

Re: [PATCH 1/2] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case

2024-10-31 Thread Kees Cook
On Wed, 30 Oct 2024 14:37:31 -0600, Tycho Andersen wrote:
> Zbigniew mentioned at Linux Plumber's that systemd is interested in
> switching to execveat() for service execution, but can't, because the
> contents of /proc/pid/comm are the file descriptor which was used,
> instead of the path to the binary. This makes the output of tools like
> top and ps useless, especially in a world where most fds are opened
> CLOEXEC so the number is truly meaningless.
> 
> [...]

Applied to for-next/execve, thanks!

[1/2] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case
  https://git.kernel.org/kees/c/7bdc6fc85c9a
[2/2] selftests/exec: add a test for execveat()'s comm
  https://git.kernel.org/kees/c/bd104872311a

Take care,

-- 
Kees Cook




Re: [PATCH net-next v8 03/24] ovpn: add basic netlink support

2024-10-31 Thread Sergey Ryazanov

Hello Jiri,

Sorry for the late reply. Could you elaborate a bit on the reasons for
the RTNL interface implementation? Please find the questions inlined.


On 08.10.2024 15:52, Jiri Pirko wrote:

Tue, Oct 08, 2024 at 11:16:01AM CEST, anto...@openvpn.net wrote:

On 08/10/2024 10:58, Jiri Pirko wrote:

Tue, Oct 08, 2024 at 10:01:40AM CEST, anto...@openvpn.net wrote:

Hi,

On 07/10/24 17:32, Jiri Pirko wrote:

Wed, Oct 02, 2024 at 11:02:17AM CEST, anto...@openvpn.net wrote:

[...]



+operations:
+  list:
+-
+  name: dev-new
+  attribute-set: ovpn
+  flags: [ admin-perm ]
+  doc: Create a new interface of type ovpn
+  do:
+request:
+  attributes:
+- ifname
+- mode
+reply:
+  attributes:
+- ifname
+- ifindex
+-
+  name: dev-del


Why do you expose new and del here in the ovpn-specific generic netlink iface?
Why can't you use the exising RTNL api which is used for creation and
destruction of other types of devices?


That was my original approach in v1, but it was argued that an ovpn interface
needs a userspace program to be configured and used in a meaningful way;
therefore it was decided to concentrate all iface mgmt APIs along with the
others in the netlink family and to not expose any RTNL ops.


Can you please point me to the message id?


 from
Sergey and subsequent replies.
RTNL vs NL topic starts right after the definition of 'ovpn_link_ops'


Yeah, does not make sense to me. All devices should implement common
rtnl ops, the extra-config, if needed, could be on a separate channel.
I don't find Sergey's argumentation valid.


Do we consider the word *should* in terms of RFC 2119:

   SHOULD   This word, or the adjective "RECOMMENDED", mean that there
   may exist valid reasons in particular circumstances to ignore a
   particular item, but the full implications must be understood and
   carefully weighed before choosing a different course.

I am asking because rtnl_link_register() allows ops without a .newlink 
implementation, which makes .newlink at least optional and gives some 
freedom in design.


Let me briefly summarize my argumentation from the referenced thread. We 
have two classes of links: point-to-point and point-to-multipoint. The 
major class is PtP, and RTNL is perfectly suited to manage it. PtMP is a 
minor class and is an obstacle for RTNL due to the need to manage 
multiple peers, which requires a different interface to manage these 
peers; let's call it the GENL interface. A PtMP-class netdev without any 
configured peer is useless, which makes the GENL interface for peer 
management mandatory, both in user- and kernel-space.


Link creation can be implemented using either of these (RTNL or GENL) 
interfaces. The GENL interface is already mandatory to implement in 
user-space software, while RTNL can be considered optional. So 
implementing link creation using GENL requires only support for one new 
message, while implementing it using RTNL requires a whole new 
interface implementation (socket read/write, message demux, etc.).


My point is, GENL-only management gives us a consolidated and clear 
solution, while GENL+RTNL requires code duplication and adds 
complexity. That's it.



Jiri, do you see big flaws in this reasoning?



Recently Kuniyuki commented on this topic as well in:
<20240919055259.17622-1-kun...@amazon.com>
and that is why I added a default dellink implementation.


Having dellink without newlink implemented is just wrong.


Could you clarify this statement please? I cannot recall any 
documentation or code block that enforces a .newlink implementation in 
the presence of .dellink.



Generally speaking, I can understand the feeling of irregularity when 
looking at code implementing a delete operation without a link-creation 
counterpart. This confusion can be resolved by considering the different 
nature of these operations: a new link cannot be created automatically, 
while an existing link can be removed automatically without any extra 
input.


.newlink is designated only for fulfilling user requests, since it 
requires extra information unavailable inside the kernel. .dellink, on 
the other hand, has two semantics: (a) fulfilling user requests, and 
(b) automatic cleanup of unneeded remainders.


From that perspective, having the option to implement .dellink without a 
.newlink implementation looks reasonable.




However, recently we decided to add a dellink implementation for better
integration with network namespaces and to allow the user to wipe a dangling
interface.


Hmm, one more argument to have a symmetric add/del implementation in RTNL




In the future we are planning to also add the possibility to create a
"persistent interface", that is an interface created before launching any
userspace program and that survives when the latter is stopped.
I can guess this functionality may be better suited for

Re: [PATCH net-next v11 00/23] Introducing OpenVPN Data Channel Offload

2024-10-31 Thread patchwork-bot+netdevbpf
Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski :

On Tue, 29 Oct 2024 11:47:13 +0100 you wrote:
> Notable changes from v10:
> * extended commit message of 23/23 with brief description of the output
> * Link to v10: 
> https://lore.kernel.org/r/20241025-b4-ovpn-v10-0-b87530777...@openvpn.net
> 
> Please note that some patches were already reviewed by Andrew Lunn,
> Donald Hunter and Shuah Khan. They have retained the Reviewed-by tag
> since no major code modification has happened since the review.
> 
> [...]

Here is the summary with links:
  - [net-next,v11,01/23] netlink: add NLA_POLICY_MAX_LEN macro
https://git.kernel.org/netdev/net-next/c/4138e9ec0093
  - [net-next,v11,02/23] net: introduce OpenVPN Data Channel Offload (ovpn)
(no matching commit)
  - [net-next,v11,03/23] ovpn: add basic netlink support
(no matching commit)
  - [net-next,v11,04/23] ovpn: add basic interface 
creation/destruction/management routines
(no matching commit)
  - [net-next,v11,05/23] ovpn: keep carrier always on
(no matching commit)
  - [net-next,v11,06/23] ovpn: introduce the ovpn_peer object
(no matching commit)
  - [net-next,v11,07/23] ovpn: introduce the ovpn_socket object
(no matching commit)
  - [net-next,v11,08/23] ovpn: implement basic TX path (UDP)
(no matching commit)
  - [net-next,v11,09/23] ovpn: implement basic RX path (UDP)
(no matching commit)
  - [net-next,v11,10/23] ovpn: implement packet processing
(no matching commit)
  - [net-next,v11,11/23] ovpn: store tunnel and transport statistics
(no matching commit)
  - [net-next,v11,12/23] ovpn: implement TCP transport
(no matching commit)
  - [net-next,v11,13/23] ovpn: implement multi-peer support
(no matching commit)
  - [net-next,v11,14/23] ovpn: implement peer lookup logic
(no matching commit)
  - [net-next,v11,15/23] ovpn: implement keepalive mechanism
(no matching commit)
  - [net-next,v11,16/23] ovpn: add support for updating local UDP endpoint
(no matching commit)
  - [net-next,v11,17/23] ovpn: add support for peer floating
(no matching commit)
  - [net-next,v11,18/23] ovpn: implement peer add/get/dump/delete via netlink
(no matching commit)
  - [net-next,v11,19/23] ovpn: implement key add/get/del/swap via netlink
(no matching commit)
  - [net-next,v11,20/23] ovpn: kill key and notify userspace in case of IV 
exhaustion
(no matching commit)
  - [net-next,v11,21/23] ovpn: notify userspace when a peer is deleted
(no matching commit)
  - [net-next,v11,22/23] ovpn: add basic ethtool support
(no matching commit)
  - [net-next,v11,23/23] testing/selftests: add test tool and scripts for ovpn 
module
(no matching commit)

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html





Re: [PATCH v3] selftests/net: Fix ./ns-XXXXXX not cleanup

2024-10-31 Thread patchwork-bot+netdevbpf
Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski :

On Wed, 30 Oct 2024 08:59:43 +0800 you wrote:
> ```
> readonly STATS="$(mktemp -p /tmp ns-XXXXXX)"
> readonly BASE=`basename $STATS`
> ```
> It could be a mistake to write to $BASE rather than $STATS, where $STATS
> is used to save the NSTAT_HISTORY and it will be cleaned up before exit.
> 
> [...]

Here is the summary with links:
  - [v3] selftests/net: Fix ./ns-XXXXXX not cleanup
https://git.kernel.org/netdev/net-next/c/d3774a4b21e9

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
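The bug described in the patch above is easy to reproduce outside the selftest. Here is a minimal, self-contained sketch (hypothetical values and file contents, not the actual selftest script) of the mismatch between writing to $BASE and cleaning up $STATS, together with the fixed variant:

```shell
#!/bin/sh
# Minimal sketch of the cleanup bug described above (hypothetical values,
# not the actual selftest): the temp file lives in /tmp, but data is written
# to ./$BASE in the current directory, so removing $STATS leaves ./ns-XXXXXX
# behind.
set -eu

workdir="$(mktemp -d)"          # stand-in for the selftest's working directory
cd "$workdir"

STATS="$(mktemp -p /tmp ns-XXXXXX)"
BASE="$(basename "$STATS")"

echo "nstat history" > "$BASE"  # bug: creates ./ns-XXXXXX, not /tmp/ns-XXXXXX
rm -f "$STATS"                  # cleanup removes only the /tmp copy

test -e "$BASE"                 # the dangling ./ns-XXXXXX file is still here

# fixed variant: write to the full $STATS path so one cleanup path suffices
STATS2="$(mktemp -p /tmp ns-XXXXXX)"
echo "nstat history" > "$STATS2"
rm -f "$STATS2"
test ! -e "$STATS2"             # nothing left over

rm -f "$BASE"
cd /
rmdir "$workdir"
```

The fix in the patch amounts to always using the full path ($STATS) for both writing and cleanup, so the two can never diverge.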





Re: [PATCH net-next v1 0/7] devmem TCP fixes

2024-10-31 Thread Jakub Kicinski
On Tue, 29 Oct 2024 20:55:20 + Mina Almasry wrote:
> A few unrelated devmem TCP fixes bundled in a series for some
> convenience (if that's ok).

These two should go to net I presume? It's missing input validation.

Either way you gotta repost either as two properly separate series,
or combine them as one, cause right now they are neither and patchwork
doesn't recognize they are related.



Re: [PATCH net-next v11 00/23] Introducing OpenVPN Data Channel Offload

2024-10-31 Thread Jakub Kicinski
On Thu, 31 Oct 2024 11:00:05 +0100 Antonio Quartulli wrote:
> It seems some little changes to ovpn.yaml were not reflected in the 
> generated files I committed.
> 
> Specifically I changed some U32 to BE32 (IPv4 addresses) and files were 
> not regenerated before committing.
> 
> (I saw the failure in patchwork about this)
> 
> It seems I'll have to send v12 after all.

I'll apply patch 1 already, it's tangentially related,
and no point rebasing.



Re: [PATCH v3 rcu 3/3] rcu: Finer-grained grace-period-end checks in rcu_dump_cpu_stacks()

2024-10-31 Thread Doug Anderson
Hi,

On Wed, Oct 30, 2024 at 10:03 PM Paul E. McKenney  wrote:
>
> > > > Note that:
> > > >
> > > > * Switching to real NMIs is impossible on many existing arm64 CPUs.
> > > > The hardware support simply isn't there. Pseudo-NMIs should work fine
> > > > and are (in nearly all cases) just as good as NMIs but they have a
> > > > small performance impact. There are also compatibility issues with
> > > > some pre-existing bootloaders. ...so code can't assume even Pseudo-NMI
> > > > will work and needs to be able to fall back. Prior to my recent
> > > > changes arm64 CPUs wouldn't even do stacktraces in some scenarios. Now
> > > > at least they fall back to regular IPIs.
> > >
> > > But those IPIs are of no help whatsoever when the target CPU is spinning
> > > with interrupts disabled, which is the scenario under discussion.
> >
> > Right that we can't trace the target CPU spinning with interrupts
> > disabled.
>
> You can by invoking sched_show_task(cpu_curr(cpu)) on the CPU requesting
> the backtrace.  The resulting backtrace will not be as good as you
> would get from using an NMI, but if you don't have NMIs, it is better
> than nothing.
>
> >   ...but in the case where NMI isn't available it _still_
> > makes sense to make arch_trigger_cpumask_backtrace() fall back to IPI
> > because:
> >
> > 1. There are many cases where the trigger.*cpu.*backtrace() class of
> > functions are used to trace CPUs that _aren't_ looping with interrupts
> > disabled. We still want to be able to backtrace in that case. For
> > instance, you can see in "panic.c" that there are cases where
> > trigger_all_cpu_backtrace() is called. It's valuable to make cases
> > like this (where there's no indication that a CPU is locked) actually
> > work.
> >
> > 2. Even if there's a CPU that's looping with interrupts disabled and
> > we can't trace it because of no NMI, it could still be useful to get
> > backtraces of other CPUs. It's possible that what the other CPUs are
> > doing could give a hint about the CPU that's wedged. This isn't a
> > great hint, but some info is better than no info.
>
> And it also makes sense for an IRQ-based backtrace to fall back to
> something like the aforementioned sched_show_task(cpu_curr(cpu)) if
> the destination CPU has interrupts disabled.
>
> > > Hence my suggestion that arm64, when using IRQs to request stack
> > > backtraces, have an additional short timeout (milliseconds) in
> > > order to fall back to remote stack tracing.  In many cases, this
> > > fallback happens automatically, as you can see in dump_cpu_task().
> > > If trigger_single_cpu_backtrace() returns false, the stack is dumped
> > > remotely.
> >
> > I think the answer here is that the API needs to change. The whole
> > boolean return value for trigger.*cpu.*backtrace() is mostly ignored
> > by callers. Yes, you've pointed at one of the two places that it's not
> > ignored and it tries a reasonable fallback, but all the other callers
> > just silently do nothing when the function returns false. That led to
> > many places where arm64 devices were simply not getting _any_
> > stacktrace.
> >
> > Perhaps we need some type of API that says "I actually have a
> > fallback, so if you don't think the stracktrace will succeed then it's
> > OK to return false".
>
> Boolean is fine for trigger_single_cpu_backtrace(), which is directed at
> a single CPU.  And the one call to this function that does not currently
> check its return value could just call dump_cpu_task() instead.
>
> There are only 20 or so uses of all of these functions, so the API can
> change, give or take the pain involved in changing architecture-specific
> code.

Right. Falling back to "sched_show_task(cpu_curr(cpu))" or something
similar if the IPI doesn't go through is a good idea. Indeed, falling
back to that if the NMI doesn't go through is _also_ a good idea,
right? ...and we don't want to change all 20 callers to all add a
fallback. To that end, it feels like someone should change it so that
the common code includes the fallback and we get rid of the boolean
return value.


> > > > * Even if we decided that we absolutely had to disable stacktrades on
> > > > arm64 CPUs without some type of NMI, that won't fix arm32. arm32 has
> > > > been using regular IPIs for backtraces like this for many, many years.
> > > > Maybe folks don't care as much about arm32 anymore but it feels bad if
> > > > we totally break it.
> > >
> > > Because arm32 isn't used much for datacenter workloads, it will not
> > > be suffering from this issue.  Not enough to have motivated anyone to
> > > fix it, anyway.  In contrast, in the datacenter, CPUs really can and
> > > do have interrupts disabled for many seconds.  (No, this is not a good
> > > thing, but it is common enough for us to need to avoid disabling other
> > > debugging facilities.)
> > >
> > > So it might well be that arm32 will continue to do just fine with
> > > IRQ-based stack tracing.  Or maybe someday arm32 will also need to

next-20241031: kernel/time/clockevents.c:455 clockevents_register_device

2024-10-31 Thread Naresh Kamboju
The QEMU-ARM64 boot has failed with the Linux next-20241031 tag.
The boot log shows warnings at clockevents_register_device and followed
by rcu_preempt detected stalls.

However, the system did not proceed far enough to reach the login prompt.
The fvp-aemva, Qemu-arm64, Qemu-armv7 and Qemu-riscv64 boot failed.

Please find the incomplete boot log links below for your reference.
The Qemu version is 9.0.2.

This is always reproducible.
First seen on Linux next-20241031 tag.
  Good: next-20241030
  Bad: next-20241031

qemu-arm64:
  boot:
* clang-19-lkftconfig
* gcc-13-lkftconfig
* clang-nightly-lkftconfig

qemu-armv7:
  boot:
* clang-19-lkftconfig
* gcc-13-lkftconfig
* clang-nightly-lkftconfig

Reported-by: Linux Kernel Functional Testing 

Boot log:
---
[0.00] Booting Linux on physical CPU 0x00 [0x000f0510]
[0.00] Linux version 6.12.0-rc5-next-20241031 (tuxmake@tuxmake)
(aarch64-linux-gnu-gcc (Debian 13.3.0-5) 13.3.0, GNU ld (GNU Binutils
for Debian) 2.43.1) #1 SMP PREEMPT @1730356841
[0.00] KASLR enabled
[0.00] random: crng init done
[0.00] Machine model: linux,dummy-virt

<6>[0.216503] GICv3: CPU1: found redistributor 1 region 0:0x080c
<6>[0.218511] GICv3: CPU1: using allocated LPI pending table
@0x00010025
<4>[0.220528] [ cut here ]
<4>[ 0.220657] WARNING: CPU: 1 PID: 0 at kernel/time/clockevents.c:455
clockevents_register_device (kernel/time/clockevents.c:455
(discriminator 1))
<4>[0.221897] Modules linked in:
<4>[0.222659] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted
6.12.0-rc5-next-20241031 #1
<4>[0.223100] Hardware name: linux,dummy-virt (DT)
<4>[0.223480] pstate: 23c9 (nzCv DAIF -PAN -UAO -TCO -DIT
-SSBS BTYPE=--)
<4>[ 0.223815] pc : clockevents_register_device
(kernel/time/clockevents.c:455 (discriminator 1))
<4>[ 0.223899] lr : clockevents_config_and_register
(kernel/time/clockevents.c:523)
<4>[0.223951] sp : 800080103cd0
<4>[0.223985] x29: 800080103cd0 x28: 0091 x27:

<4>[0.224143] x26: fff06d7e11c1b000 x25: fff0ff4d7108 x24:
00ee
<4>[0.224221] x23: 01ff x22: 9282ed4e66c8 x21:
9282ed8ce940
<4>[0.224296] x20: fff0ff4ef000 x19: fff0ff4ef000 x18:
0010
<4>[0.224376] x17: 3030353230303130 x16: 3030303030307830 x15:
4020656c62617420
<4>[0.224450] x14: 676e69646e657020 x13: 3030303035323030 x12:
3130303030303030
<4>[0.224522] x11: 78304020656c6261 x10: 9282eda4de88 x9 :
9282eb3b161c
<4>[0.224621] x8 : 1dcd6500 x7 : 3b9aca00 x6 :
007f
<4>[0.224693] x5 : 001b x4 : 0080 x3 :
01ff
<4>[0.224771] x2 : 0001 x1 : 9282ed9ced40 x0 :

<4>[0.225007] Call trace:
<4>[ 0.225218] clockevents_register_device+0x170/0x188 P
<4>[ 0.225367] clockevents_config_and_register+0x34/0x50 L
<4>[ 0.225487] clockevents_config_and_register (kernel/time/clockevents.c:523)
<4>[ 0.225553] arch_timer_starting_cpu
(drivers/clocksource/arm_arch_timer.c:1034)
<4>[ 0.225602] cpuhp_invoke_callback (kernel/cpu.c:194)
<4>[ 0.225649] __cpuhp_invoke_callback_range (kernel/cpu.c:965)
<4>[ 0.225691] notify_cpu_starting (kernel/cpu.c:1604)
<4>[ 0.225733] secondary_start_kernel (arch/arm64/kernel/smp.c:251)
<4>[ 0.225809] __secondary_switched (arch/arm64/kernel/head.S:421)
<4>[0.226244] ---[ end trace  ]---
<6>[0.228186] CPU1: Booted secondary processor 0x01 [0x000f0510]
<6>[0.237749] smp: Brought up 1 node, 2 CPUs
<6>[0.253976] SMP: Total of 2 processors activated.
<6>[0.254658] CPU: All CPU(s) started at EL2

<3>[   21.732871] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3>[   21.734120] rcu: 1-...!: (0 ticks this GP) idle=0ca0/0/0x0
softirq=0/0 fqs=0 (false positive?)
<3>[   21.735202] rcu: (detected by 0, t=5252 jiffies, g=-1179, q=28 ncpus=2)
<6>[   21.736404] Sending NMI from CPU 0 to CPUs 1:
<4>[ 21.737195] NMI backtrace for cpu 1 skipped: idling at
default_idle_call (kernel/sched/idle.c:126)
<3>[   21.741361] rcu: rcu_preempt kthread timer wakeup didn't happen
for 5251 jiffies! g-1179 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
<3>[   21.742143] rcu: Possible timer handling issue on cpu=1 timer-softirq=0
<3>[   21.742724] rcu: rcu_preempt kthread starved for 5252 jiffies!
g-1179 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
<3>[   21.743583] rcu: Unless rcu_preempt kthread gets sufficient CPU
time, OOM is now expected behavior.
<3>[   21.744154] rcu: RCU grace-period kthread stack dump:
<6>[   21.744678] task:r

Re: [PATCH v5 00/19] Implement DWARF modversions

2024-10-31 Thread Sedat Dilek
On Thu, Oct 31, 2024 at 2:56 AM Sedat Dilek  wrote:
>
> On Wed, Oct 30, 2024 at 10:14 PM Sami Tolvanen  
> wrote:
> >
> > Hi Sedat,
> >
> > On Wed, Oct 30, 2024 at 2:00 PM Sedat Dilek  wrote:
> > >
> > > Hi Sami,
> > >
> > > perfect timing: Nathan uploaded SLIM LLVM toolchain v19.1.3
> > >
> > > KBUILD_GENDWARFKSYMS_STABLE is to be set manually?
> > > What value is recommended?
> >
> > The usage is similar to KBUILD_SYMTYPES, you can just set
> > KBUILD_GENDWARFKSYMS_STABLE=1 to use --stable when calculating
> > versions. However, it's not normally necessary to set this flag at all
> > when building your own kernel, it's mostly for distributions.
> >
> > Sami
>
> OK, thanks.
>
> # cat /proc/version
> Linux version 6.12.0-rc5-1-amd64-clang19-kcfi
> (sedat.di...@gmail.com@iniza) (ClangBuiltLinux clang version 19.1.3
> (https://github.com/llvm/llvm-project.git
> ab51eccf88f5321e7c60591c5546b254b6afab99), ClangBuiltLinux LLD 19.1.3
> (https://github.com/llvm/llvm-project.git
> ab51eccf88f5321e7c60591c5546b254b6afab99)) #1~trixie+dileks SMP
> PREEMPT_DYNAMIC 2024-10-30
>
> Tested-by: Sedat Dilek  # LLVM/Clang v19.1.3 on x86-64
>

Fix email-address in credit tag:

Tested-by: Sedat Dilek  # LLVM/Clang v19.1.3 on x86-64

-sed@-



Re: next-20241031: kernel/time/clockevents.c:455 clockevents_register_device

2024-10-31 Thread Frederic Weisbecker
Hi,

On Thu, Oct 31, 2024 at 02:10:14PM +0530, Naresh Kamboju wrote:
> The QEMU-ARM64 boot has failed with the Linux next-20241031 tag.
> The boot log shows warnings at clockevents_register_device and followed
> by rcu_preempt detected stalls.
> 
> However, the system did not proceed far enough to reach the login prompt.
> The fvp-aemva, Qemu-arm64, Qemu-armv7 and Qemu-riscv64 boot failed.
> 
> Please find the incomplete boot log links below for your reference.
> The Qemu version is 9.0.2.
> 
> This is always reproducible.
> First seen on Linux next-20241031 tag.
>   Good: next-20241030
>   Bad: next-20241031
> 
> qemu-arm64:
>   boot:
> * clang-19-lkftconfig
> * gcc-13-lkftconfig
> * clang-nightly-lkftconfig
> 
> qemu-armv7:
>   boot:
> * clang-19-lkftconfig
> * gcc-13-lkftconfig
> * clang-nightly-lkftconfig
> 
> Reported-by: Linux Kernel Functional Testing 
> 
> Boot log:
> ---
> [0.00] Booting Linux on physical CPU 0x00 [0x000f0510]
> 0.00] Linux version 6.12.0-rc5-next-20241031 (tuxmake@tuxmake)
> (aarch64-linux-gnu-gcc (Debian 13.3.0-5) 13.3.0, GNU ld (GNU Binutils
> for Debian) 2.43.1) #1 SMP PREEMPT @1730356841
> [0.00] KASLR enabled
> [0.00] random: crng init done
> [0.00] Machine model: linux,dummy-virt
> 
> <6>[0.216503] GICv3: CPU1: found redistributor 1 region 
> 0:0x080c
> <6>[0.218511] GICv3: CPU1: using allocated LPI pending table
> @0x00010025
> <4>[0.220528] [ cut here ]
> <4>[ 0.220657] WARNING: CPU: 1 PID: 0 at kernel/time/clockevents.c:455
> clockevents_register_device (kernel/time/clockevents.c:455

It's possible that I messed up something with clockevents.

Can you try to reproduce with:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
timers/core

And if so it's possible that the bad commit is somewhere between:

17a8945f369c (clockevents: Improve clockevents_notify_released() comment)

and

bf9a001fb8e4 (clocksource/drivers/timer-tegra: Remove clockevents shutdown 
call on offlining)

I wish I could reproduce on my own but I don't have easy
access to such hardware.

Thanks.



Re: next-20241031: kernel/time/clockevents.c:455 clockevents_register_device

2024-10-31 Thread Frederic Weisbecker
On Thu, Oct 31, 2024 at 10:42:45AM +0100, Thomas Gleixner wrote:
> On Thu, Oct 31 2024 at 14:10, Naresh Kamboju wrote:
> > The QEMU-ARM64 boot has failed with the Linux next-20241031 tag.
> > The boot log shows warnings at clockevents_register_device and followed
> > by rcu_preempt detected stalls.
> >
> > However, the system did not proceed far enough to reach the login prompt.
> > The fvp-aemva, Qemu-arm64, Qemu-armv7 and Qemu-riscv64 boot failed.
> >
> > Please find the incomplete boot log links below for your reference.
> > The Qemu version is 9.0.2.
> > <4>[ 0.220657] WARNING: CPU: 1 PID: 0 at kernel/time/clockevents.c:455
> > clockevents_register_device (kernel/time/clockevents.c:455
> > <4>[ 0.225218] clockevents_register_device+0x170/0x188 P
> > <4>[ 0.225367] clockevents_config_and_register+0x34/0x50 L
> > <4>[ 0.225487] clockevents_config_and_register 
> > (kernel/time/clockevents.c:523)
> > <4>[ 0.225553] arch_timer_starting_cpu
> > (drivers/clocksource/arm_arch_timer.c:1034)
> > <4>[ 0.225602] cpuhp_invoke_callback (kernel/cpu.c:194)
> > <4>[ 0.225649] __cpuhp_invoke_callback_range (kernel/cpu.c:965)
> > <4>[ 0.225691] notify_cpu_starting (kernel/cpu.c:1604)
> 
> That's obvious what happens here. notify_cpu_starting() is invoked
> before the CPU is marked online, which triggers the new check in
> clockevents_register_device().
> 
> I removed the warning and force pushed the fixed up branch, so that
> should be gone by tomorrow.

Ah, phew!

Thanks.

> 
> Thanks,
> 
> tglx
> 



Re: [PATCH net-next v11 00/23] Introducing OpenVPN Data Channel Offload

2024-10-31 Thread Antonio Quartulli
It seems some small changes to ovpn.yaml were not reflected in the 
generated files I committed.


Specifically I changed some U32 to BE32 (IPv4 addresses) and files were 
not regenerated before committing.


(I saw the failure in patchwork about this)

It seems I'll have to send v12 after all.

Cheers,

On 29/10/2024 11:47, Antonio Quartulli wrote:

Notable changes from v10:
* extended commit message of 23/23 with brief description of the output
* Link to v10: 
https://lore.kernel.org/r/20241025-b4-ovpn-v10-0-b87530777...@openvpn.net

Please note that some patches were already reviewed by Andrew Lunn,
Donald Hunter and Shuah Khan. They have retained the Reviewed-by tag
since no major code modification has happened since the review.

The latest code can also be found at:

https://github.com/OpenVPN/linux-kernel-ovpn

Thanks a lot!
Best Regards,

Antonio Quartulli
OpenVPN Inc.

---
Antonio Quartulli (23):
   netlink: add NLA_POLICY_MAX_LEN macro
   net: introduce OpenVPN Data Channel Offload (ovpn)
   ovpn: add basic netlink support
   ovpn: add basic interface creation/destruction/management routines
   ovpn: keep carrier always on
   ovpn: introduce the ovpn_peer object
   ovpn: introduce the ovpn_socket object
   ovpn: implement basic TX path (UDP)
   ovpn: implement basic RX path (UDP)
   ovpn: implement packet processing
   ovpn: store tunnel and transport statistics
   ovpn: implement TCP transport
   ovpn: implement multi-peer support
   ovpn: implement peer lookup logic
   ovpn: implement keepalive mechanism
   ovpn: add support for updating local UDP endpoint
   ovpn: add support for peer floating
   ovpn: implement peer add/get/dump/delete via netlink
   ovpn: implement key add/get/del/swap via netlink
   ovpn: kill key and notify userspace in case of IV exhaustion
   ovpn: notify userspace when a peer is deleted
   ovpn: add basic ethtool support
   testing/selftests: add test tool and scripts for ovpn module

  Documentation/netlink/specs/ovpn.yaml  |  362 +++
  MAINTAINERS|   11 +
  drivers/net/Kconfig|   14 +
  drivers/net/Makefile   |1 +
  drivers/net/ovpn/Makefile  |   22 +
  drivers/net/ovpn/bind.c|   54 +
  drivers/net/ovpn/bind.h|  117 +
  drivers/net/ovpn/crypto.c  |  214 ++
  drivers/net/ovpn/crypto.h  |  145 ++
  drivers/net/ovpn/crypto_aead.c |  386 
  drivers/net/ovpn/crypto_aead.h |   33 +
  drivers/net/ovpn/io.c  |  462 
  drivers/net/ovpn/io.h  |   25 +
  drivers/net/ovpn/main.c|  337 +++
  drivers/net/ovpn/main.h|   24 +
  drivers/net/ovpn/netlink-gen.c |  212 ++
  drivers/net/ovpn/netlink-gen.h |   41 +
  drivers/net/ovpn/netlink.c | 1135 ++
  drivers/net/ovpn/netlink.h |   18 +
  drivers/net/ovpn/ovpnstruct.h  |   61 +
  drivers/net/ovpn/packet.h  |   40 +
  drivers/net/ovpn/peer.c| 1201 ++
  drivers/net/ovpn/peer.h|  165 ++
  drivers/net/ovpn/pktid.c   |  130 ++
  drivers/net/ovpn/pktid.h   |   87 +
  drivers/net/ovpn/proto.h   |  104 +
  drivers/net/ovpn/skb.h |   56 +
  drivers/net/ovpn/socket.c  |  178 ++
  drivers/net/ovpn/socket.h  |   55 +
  drivers/net/ovpn/stats.c   |   21 +
  drivers/net/ovpn/stats.h   |   47 +
  drivers/net/ovpn/tcp.c |  506 +
  drivers/net/ovpn/tcp.h |   44 +
  drivers/net/ovpn/udp.c |  406 
  drivers/net/ovpn/udp.h |   26 +
  include/net/netlink.h  |1 +
  include/uapi/linux/if_link.h   |   15 +
  include/uapi/linux/ovpn.h  |  109 +
  include/uapi/linux/udp.h   |1 +
  tools/net/ynl/ynl-gen-c.py |4 +-
  tools/testing/selftests/Makefile   |1 +
  tools/testing/selftests/net/ovpn/.gitignore|2 +
  tools/testing/selftests/net/ovpn/Makefile  |   17 +
  tools/testing/selftests/net/ovpn/config|   10 +
  tools/testing/selftests/net/ovpn/data64.key|5 +
  tools/testing/selftests/net/ovpn/ovpn-cli.c| 2370 
  tools/testing/selftests/net/ovpn/tcp_p

Re: [PATCH v8 1/3] modules: Support extended MODVERSIONS info

2024-10-31 Thread Luis Chamberlain
On Wed, Oct 30, 2024 at 10:06:12PM -0700, Matthew Maurer wrote:
> On Wed, Oct 30, 2024 at 9:37 PM Luis Chamberlain  wrote:
> >
> > On Thu, Oct 31, 2024 at 12:22:36PM +1100, Michael Ellerman wrote:
> > > Matthew Maurer  writes:
> > > > Adds a new format for MODVERSIONS which stores each field in a separate
> > > > ELF section. This initially adds support for variable length names, but
> > > > could later be used to add additional fields to MODVERSIONS in a
> > > > backwards compatible way if needed. Any new fields will be ignored by
> > > > old user tooling, unlike the current format where user tooling cannot
> > > > tolerate adjustments to the format (for example making the name field
> > > > longer).
> > > >
> > > > Since PPC munges its version records to strip leading dots, we reproduce
> > > > the munging for the new format. Other architectures do not appear to
> > > > have architecture-specific usage of this information.
> > > >
> > > > Reviewed-by: Sami Tolvanen 
> > > > Signed-off-by: Matthew Maurer 
> > > > ---
> > > >  arch/powerpc/kernel/module_64.c | 24 ++-
> > >
> > > Acked-by: Michael Ellerman  (powerpc)
> >
> > Michael, Matthew, why make everyone deal with this instead of just
> > making this an arch thing and ppc would be the only one doing it?
> >
> >   Luis
> >
> 
> I'm not sure I understand - the PPC changes are in an arch-specific
> directory, and triggered through the arch-implemented callback
> mod_frob_arch_sections. What would you like done to make it more of an
> arch-thing?

Sorry, yes, I see that now, that's what I get for late-night patch
review. Nevermind, this all looks good to me now.

  Luis



[RFC v2 01/13] rust: Introduce atomic API helpers

2024-10-31 Thread Boqun Feng
In order to support LKMM atomics in Rust, add rust_helper_* for atomic
APIs. These helpers ensure the implementation of LKMM atomics in Rust is
the same as in C. This could save the maintenance burden of having two
similar atomic implementations in asm.

Originally-by: Mark Rutland 
Signed-off-by: Boqun Feng 
---
 rust/helpers/atomic.c | 1038 +
 rust/helpers/helpers.c|1 +
 scripts/atomic/gen-atomics.sh |1 +
 scripts/atomic/gen-rust-atomic-helpers.sh |   65 ++
 4 files changed, 1105 insertions(+)
 create mode 100644 rust/helpers/atomic.c
 create mode 100755 scripts/atomic/gen-rust-atomic-helpers.sh

diff --git a/rust/helpers/atomic.c b/rust/helpers/atomic.c
new file mode 100644
index ..00bf10887928
--- /dev/null
+++ b/rust/helpers/atomic.c
@@ -0,0 +1,1038 @@
+// SPDX-License-Identifier: GPL-2.0
+
+// Generated by scripts/atomic/gen-rust-atomic-helpers.sh
+// DO NOT MODIFY THIS FILE DIRECTLY
+
+/*
+ * This file provides helpers for the various atomic functions for Rust.
+ */
+#ifndef _RUST_ATOMIC_API_H
+#define _RUST_ATOMIC_API_H
+
+#include <linux/atomic.h>
+
+// TODO: Remove this after LTO helper support is added.
+#define __rust_helper
+
+__rust_helper int
+rust_helper_atomic_read(const atomic_t *v)
+{
+   return atomic_read(v);
+}
+
+__rust_helper int
+rust_helper_atomic_read_acquire(const atomic_t *v)
+{
+   return atomic_read_acquire(v);
+}
+
+__rust_helper void
+rust_helper_atomic_set(atomic_t *v, int i)
+{
+   atomic_set(v, i);
+}
+
+__rust_helper void
+rust_helper_atomic_set_release(atomic_t *v, int i)
+{
+   atomic_set_release(v, i);
+}
+
+__rust_helper void
+rust_helper_atomic_add(int i, atomic_t *v)
+{
+   atomic_add(i, v);
+}
+
+__rust_helper int
+rust_helper_atomic_add_return(int i, atomic_t *v)
+{
+   return atomic_add_return(i, v);
+}
+
+__rust_helper int
+rust_helper_atomic_add_return_acquire(int i, atomic_t *v)
+{
+   return atomic_add_return_acquire(i, v);
+}
+
+__rust_helper int
+rust_helper_atomic_add_return_release(int i, atomic_t *v)
+{
+   return atomic_add_return_release(i, v);
+}
+
+__rust_helper int
+rust_helper_atomic_add_return_relaxed(int i, atomic_t *v)
+{
+   return atomic_add_return_relaxed(i, v);
+}
+
+__rust_helper int
+rust_helper_atomic_fetch_add(int i, atomic_t *v)
+{
+   return atomic_fetch_add(i, v);
+}
+
+__rust_helper int
+rust_helper_atomic_fetch_add_acquire(int i, atomic_t *v)
+{
+   return atomic_fetch_add_acquire(i, v);
+}
+
+__rust_helper int
+rust_helper_atomic_fetch_add_release(int i, atomic_t *v)
+{
+   return atomic_fetch_add_release(i, v);
+}
+
+__rust_helper int
+rust_helper_atomic_fetch_add_relaxed(int i, atomic_t *v)
+{
+   return atomic_fetch_add_relaxed(i, v);
+}
+
+__rust_helper void
+rust_helper_atomic_sub(int i, atomic_t *v)
+{
+   atomic_sub(i, v);
+}
+
+__rust_helper int
+rust_helper_atomic_sub_return(int i, atomic_t *v)
+{
+   return atomic_sub_return(i, v);
+}
+
+__rust_helper int
+rust_helper_atomic_sub_return_acquire(int i, atomic_t *v)
+{
+   return atomic_sub_return_acquire(i, v);
+}
+
+__rust_helper int
+rust_helper_atomic_sub_return_release(int i, atomic_t *v)
+{
+   return atomic_sub_return_release(i, v);
+}
+
+__rust_helper int
+rust_helper_atomic_sub_return_relaxed(int i, atomic_t *v)
+{
+   return atomic_sub_return_relaxed(i, v);
+}
+
+__rust_helper int
+rust_helper_atomic_fetch_sub(int i, atomic_t *v)
+{
+   return atomic_fetch_sub(i, v);
+}
+
+__rust_helper int
+rust_helper_atomic_fetch_sub_acquire(int i, atomic_t *v)
+{
+   return atomic_fetch_sub_acquire(i, v);
+}
+
+__rust_helper int
+rust_helper_atomic_fetch_sub_release(int i, atomic_t *v)
+{
+   return atomic_fetch_sub_release(i, v);
+}
+
+__rust_helper int
+rust_helper_atomic_fetch_sub_relaxed(int i, atomic_t *v)
+{
+   return atomic_fetch_sub_relaxed(i, v);
+}
+
+__rust_helper void
+rust_helper_atomic_inc(atomic_t *v)
+{
+   atomic_inc(v);
+}
+
+__rust_helper int
+rust_helper_atomic_inc_return(atomic_t *v)
+{
+   return atomic_inc_return(v);
+}
+
+__rust_helper int
+rust_helper_atomic_inc_return_acquire(atomic_t *v)
+{
+   return atomic_inc_return_acquire(v);
+}
+
+__rust_helper int
+rust_helper_atomic_inc_return_release(atomic_t *v)
+{
+   return atomic_inc_return_release(v);
+}
+
+__rust_helper int
+rust_helper_atomic_inc_return_relaxed(atomic_t *v)
+{
+   return atomic_inc_return_relaxed(v);
+}
+
+__rust_helper int
+rust_helper_atomic_fetch_inc(atomic_t *v)
+{
+   return atomic_fetch_inc(v);
+}
+
+__rust_helper int
+rust_helper_atomic_fetch_inc_acquire(atomic_t *v)
+{
+   return atomic_fetch_inc_acquire(v);
+}
+
+__rust_helper int
+rust_helper_atomic_fetch_inc_release(atomic_t *v)
+{
+   return atomic_fetch_inc_release(v);
+}
+
+__rust_helper int
+rust_helper_atomic_fetch_inc_relaxed(atomic_t *v)
+{
+   return atomic_fetch_inc_relaxed(v);
+}
+
+__rust_helper void
+rust_helper_a

[RFC v2 04/13] rust: sync: atomic: Add generic atomics

2024-10-31 Thread Boqun Feng
To support using LKMM atomics in Rust code, a generic `Atomic<T>` is
added. Currently `T` needs to be Send + Copy because these are the
straightforward usages and all basic types support this. The trait
`AllowAtomic` should only be implemented inside the atomic mod until the
generic atomic framework is mature enough (unless the implementer is a
`#[repr(transparent)]` new type).

`AtomicImpl` types are automatically `AllowAtomic`, and so far only the
basic operations load() and store() are introduced.

Signed-off-by: Boqun Feng 
---
 rust/kernel/sync/atomic.rs |   2 +
 rust/kernel/sync/atomic/generic.rs | 253 +
 2 files changed, 255 insertions(+)
 create mode 100644 rust/kernel/sync/atomic/generic.rs

diff --git a/rust/kernel/sync/atomic.rs b/rust/kernel/sync/atomic.rs
index be2e8583595f..b791abc59b61 100644
--- a/rust/kernel/sync/atomic.rs
+++ b/rust/kernel/sync/atomic.rs
@@ -16,7 +16,9 @@
 //!
 //! [`LKMM`]: srctree/tools/memory-model/
 
+pub mod generic;
 pub mod ops;
 pub mod ordering;
 
+pub use generic::Atomic;
 pub use ordering::{Acquire, Full, Relaxed, Release};
diff --git a/rust/kernel/sync/atomic/generic.rs 
b/rust/kernel/sync/atomic/generic.rs
new file mode 100644
index ..204da38e2691
--- /dev/null
+++ b/rust/kernel/sync/atomic/generic.rs
@@ -0,0 +1,253 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Generic atomic primitives.
+
+use super::ops::*;
+use super::ordering::*;
+use crate::types::Opaque;
+
+/// A generic atomic variable.
+///
+/// `T` must impl [`AllowAtomic`], that is, an [`AtomicImpl`] has to be chosen.
+///
+/// # Invariants
+///
+/// Doing an atomic operation while holding a reference of [`Self`] won't cause a data race;
+/// this is guaranteed by the safety requirement of [`Self::from_ptr`] and the extra safety
+/// requirement on the usage of pointers returned by [`Self::as_ptr`].
+#[repr(transparent)]
+pub struct Atomic<T: AllowAtomic>(Opaque<T>);
+
+// SAFETY: `Atomic<T>` is safe to share among execution contexts because all accesses are atomic.
+unsafe impl<T: AllowAtomic> Sync for Atomic<T> {}
+
+/// Atomics that support basic atomic operations.
+///
+/// TODO: Unless the `impl` is a `#[repr(transparent)]` new type of an existing [`AllowAtomic`],
+/// the impl block should only be done in the atomic mod. And currently only basic integer types
+/// can implement this trait in the atomic mod.
+///
+/// # Safety
+///
+/// [`Self`] must have the same size and alignment as [`Self::Repr`].
+pub unsafe trait AllowAtomic: Sized + Send + Copy {
+/// The backing atomic implementation type.
+type Repr: AtomicImpl;
+
+/// Converts into a [`Self::Repr`].
+fn into_repr(self) -> Self::Repr;
+
+/// Converts from a [`Self::Repr`].
+fn from_repr(repr: Self::Repr) -> Self;
+}
+
+// SAFETY: `T::Repr` is `Self` (i.e. `T`), so they have the same size and alignment.
+unsafe impl<T: AtomicImpl> AllowAtomic for T {
+type Repr = Self;
+
+fn into_repr(self) -> Self::Repr {
+self
+}
+
+fn from_repr(repr: Self::Repr) -> Self {
+repr
+}
+}
+
+impl<T: AllowAtomic> Atomic<T> {
+/// Creates a new atomic.
+pub const fn new(v: T) -> Self {
+Self(Opaque::new(v))
+}
+
+/// Creates a reference to [`Self`] from a pointer.
+///
+/// # Safety
+///
+/// - `ptr` has to be a valid pointer.
+/// - `ptr` has to be valid for both reads and writes for the whole lifetime `'a`.
+/// - For the whole lifetime of `'a`, other accesses to the object cannot cause data races
+///   (defined by [`LKMM`]) against atomic operations on the returned reference.
+///
+/// [`LKMM`]: srctree/tools/memory-model
+///
+/// # Examples
+///
+/// Using [`Atomic::from_ptr()`] combined with [`Atomic::load()`] or 
[`Atomic::store()`] can
+/// achieve the same functionality as `READ_ONCE()`/`smp_load_acquire()` or
+/// `WRITE_ONCE()`/`smp_store_release()` in C side:
+///
+/// ```rust
+/// # use kernel::types::Opaque;
+/// use kernel::sync::atomic::{Atomic, Relaxed, Release};
+///
+/// // Assume there is a C struct `Foo`.
+/// mod cbindings {
+/// #[repr(C)]
+/// pub(crate) struct foo { pub(crate) a: i32, pub(crate) b: i32 }
+/// }
+///
+/// let tmp = Opaque::new(cbindings::foo { a: 1, b: 2});
+///
+/// // struct foo *foo_ptr = ..;
+/// let foo_ptr = tmp.get();
+///
+/// // SAFETY: `foo_ptr` is a valid pointer, and `.a` is inbound.
+/// let foo_a_ptr = unsafe { core::ptr::addr_of_mut!((*foo_ptr).a) };
+///
+/// // a = READ_ONCE(foo_ptr->a);
+/// //
+/// // SAFETY: `foo_a_ptr` is a valid pointer for read, and all accesses on it are atomic, so no
+/// // data race.
+/// let a = unsafe { Atomic::from_ptr(foo_a_ptr) }.load(Relaxed);
+/// # assert_eq!(a, 1);
+///
+/// // smp_store_release(&foo_ptr->a, 2);
+/// //
+/// // SAFETY: `foo_a_ptr` is a valid pointer for write, and all accesses on it are atomic, so no
+/// // data rac

[RFC v2 05/13] rust: sync: atomic: Add atomic {cmp,}xchg operations

2024-10-31 Thread Boqun Feng
xchg() and cmpxchg() are basic operations on atomic. Provide these based
on C APIs.

Note that cmpxchg() uses a function signature similar to
compare_exchange() in Rust std: it returns a `Result`, where `Ok(old)`
means the operation succeeded and `Err(old)` means it failed. With
compiler optimization and inline helpers, this should provide the same
efficient code generation as using atomic_try_cmpxchg() or
atomic_cmpxchg() correctly.

Except it's not! Because of commit 44fe84459faf ("locking/atomic: Fix
atomic_try_cmpxchg() semantics"), the atomic_try_cmpxchg() on x86 has a
branch even if the caller doesn't care about the success of cmpxchg and
only wants to use the old value. For example, for code like:

// Uses the latest value regardlessly, same as atomic_cmpxchg() in C.
let latest = x.cmpxchg(42, 64, Full).unwrap_or_else(|old| old);

It will still generate code:

movl$0x40, %ecx
movl$0x34, %eax
lock
cmpxchgl%ecx, 0x4(%rsp)
jne 1f
2:
...
1:  movl%eax, %ecx
jmp 2b

One option is to write an x86 try_cmpxchg_exclusive() for Rust use
only: the Rust function takes a `&mut` for the old pointer, which must
be exclusive to the function, so it is never a shared pointer. But
maybe I'm missing something?

Signed-off-by: Boqun Feng 
---
 rust/kernel/sync/atomic/generic.rs | 151 +
 1 file changed, 151 insertions(+)

diff --git a/rust/kernel/sync/atomic/generic.rs 
b/rust/kernel/sync/atomic/generic.rs
index 204da38e2691..bfccc4336c75 100644
--- a/rust/kernel/sync/atomic/generic.rs
+++ b/rust/kernel/sync/atomic/generic.rs
@@ -251,3 +251,154 @@ pub fn store<Ordering: ReleaseOrRelaxed>(&self, v: T, _: Ordering) {
 };
 }
 }
+
+impl<T: AllowAtomic> Atomic<T>
+where
+T::Repr: AtomicHasXchgOps,
+{
+/// Atomic exchange.
+///
+/// # Examples
+///
+/// ```rust
+/// use kernel::sync::atomic::{Atomic, Acquire, Relaxed};
+///
+/// let x = Atomic::new(42);
+///
+/// assert_eq!(42, x.xchg(52, Acquire));
+/// assert_eq!(52, x.load(Relaxed));
+/// ```
+#[inline(always)]
+pub fn xchg<Ordering: All>(&self, v: T, _: Ordering) -> T {
+let v = T::into_repr(v);
+let a = self.as_ptr().cast::<T::Repr>();
+
+// SAFETY:
+// - For calling the atomic_xchg*() function:
+//   - `self.as_ptr()` is a valid pointer, and per the safety requirement of `AllowAtomic`,
+//  a `*mut T` is a valid `*mut T::Repr`. Therefore `a` is a valid 
pointer,
+//   - per the type invariants, the following atomic operation won't 
cause data races.
+// - For extra safety requirement of usage on pointers returned by 
`self.as_ptr():
+//   - atomic operations are used here.
+let ret = unsafe {
+match Ordering::ORDER {
+OrderingDesc::Full => T::Repr::atomic_xchg(a, v),
+OrderingDesc::Acquire => T::Repr::atomic_xchg_acquire(a, v),
+OrderingDesc::Release => T::Repr::atomic_xchg_release(a, v),
+OrderingDesc::Relaxed => T::Repr::atomic_xchg_relaxed(a, v),
+}
+};
+
+T::from_repr(ret)
+}
+
+/// Atomic compare and exchange.
+///
+/// Compare: The comparison is done via the byte level comparison between 
the atomic variables
+/// with the `old` value.
+///
+/// Ordering: A failed compare and exchange does not provide any ordering; the read part of a
+/// failed cmpxchg should be treated as a relaxed read.
+///
+/// Returns `Ok(value)` if cmpxchg succeeds, and `value` is guaranteed to 
be equal to `old`,
+/// otherwise returns `Err(value)`, and `value` is the value of the atomic 
variable when cmpxchg
+/// was happening.
+///
+/// # Examples
+///
+/// ```rust
+/// use kernel::sync::atomic::{Atomic, Full, Relaxed};
+///
+/// let x = Atomic::new(42);
+///
+/// // Checks whether cmpxchg succeeded.
+/// let success = x.cmpxchg(52, 64, Relaxed).is_ok();
+/// # assert!(!success);
+///
+/// // Checks whether cmpxchg failed.
+/// let failure = x.cmpxchg(52, 64, Relaxed).is_err();
+/// # assert!(failure);
+///
+/// // Uses the old value if failed, probably re-try cmpxchg.
+/// match x.cmpxchg(52, 64, Relaxed) {
+/// Ok(_) => { },
+/// Err(old) => {
+/// // do something with `old`.
+/// # assert_eq!(old, 42);
+/// }
+/// }
+///
+/// // Uses the latest value regardlessly, same as atomic_cmpxchg() in C.
+/// let latest = x.cmpxchg(42, 64, Full).unwrap_or_else(|old| old);
+/// # assert_eq!(42, latest);
+/// assert_eq!(64, x.load(Relaxed));
+/// ```
+#[inline(always)]
+pub fn cmpxchg<Ordering: All>(&self, mut old: T, new: T, o: Ordering) -> Result<T, T> {
+if self.try_cmpxchg(&mut old, new, o) {
+Ok(old)
+} else {
+Err(old)
+}
+}
+
+/// Atomic compare and exchange and return

[RFC v2 00/13] LKMM *generic* atomics in Rust

2024-10-31 Thread Boqun Feng
Hi,

This is another RFC version of LKMM atomics in Rust, you can find the
previous versions:

v1: 
https://lore.kernel.org/rust-for-linux/20240612223025.1158537-1-boqun.f...@gmail.com/
wip: 
https://lore.kernel.org/rust-for-linux/20240322233838.868874-1-boqun.f...@gmail.com/

I've Cced a few more people this time, so if you're curious about why
we chose to implement LKMM atomics instead of using the Rust ones, you
can find some explanation:

* In my Kangrejos talk: https://lwn.net/Articles/993785/
* In Linus' email: 
https://lore.kernel.org/rust-for-linux/CAHk-=whkQk=zq5ximcau3xj4v69+jyop-y6sywhq-tvxssv...@mail.gmail.com/

This time, I tried implementing a generic atomic type `Atomic<T>`, as
Benno and Gary suggested last time; the Rust standard library is also
going in that direction [1].

Honestly, a generic atomic type is still not quite necessary for myself,
but here are a few reasons that it's useful:

*   It's useful for type aliases. For example, if you have:

type c_long = isize;

then `Atomic<c_long>` and `Atomic<isize>` are the same type, which
may make FFI code (mapping a C type to a Rust type or vice versa)
more readable.

*   In the kernel, we sometimes switch an atomic to percpu for better
scalability. Percpu is naturally a generic type, because it can
hold data that is larger than the machine word size. Making atomic
generic eases the potential switching/refactoring.

*   Generic atomics provide all the functionalities that non-generic
atomics could have.
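
The type-alias point above can be sketched in plain Rust. This is an
illustration only: the stand-in `Atomic<T>` below is backed by std atomics
rather than the kernel's LKMM operations, and the names (`CLong`,
`load_relaxed`) are hypothetical:

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicIsize, Ordering};

// A stand-in generic atomic; the kernel version wraps Opaque<T> and LKMM ops.
struct Atomic<T> {
    inner: AtomicIsize,
    _marker: PhantomData<T>,
}

impl Atomic<isize> {
    fn new(v: isize) -> Self {
        Atomic { inner: AtomicIsize::new(v), _marker: PhantomData }
    }
    fn load_relaxed(&self) -> isize {
        self.inner.load(Ordering::Relaxed)
    }
}

// A C `long` mapped to a Rust type, as FFI code typically does.
type CLong = isize;

fn main() {
    // `Atomic<CLong>` and `Atomic<isize>` are literally the same type,
    // so no conversion is needed at the FFI boundary.
    let x: Atomic<CLong> = Atomic::<isize>::new(42);
    assert_eq!(42, x.load_relaxed());
}
```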

That said, currently "generic" is limited to a few types: the type must
be the same size as atomic_t or atomic64_t, and other than basic types,
only a #[repr()] struct can be used in an `Atomic<T>`.

Also this is not a full feature set patchset, things like different
arithmetic operations and bit operations are still missing, they can be
either future work or for future versions.

I included an RCU pointer implementation as one example of the usage, so
a patch from Danilo is added, but I will drop it once his patch is merged.

This is based on today's rust-next, and I've run all kunit tests from
the doc test on x86, arm64 and riscv.

Feedbacks and comments are welcome! Thanks.

Regards,
Boqun

[1]: https://github.com/rust-lang/rust/issues/130539

Boqun Feng (12):
  rust: Introduce atomic API helpers
  rust: sync: Add basic atomic operation mapping framework
  rust: sync: atomic: Add ordering annotation types
  rust: sync: atomic: Add generic atomics
  rust: sync: atomic: Add atomic {cmp,}xchg operations
  rust: sync: atomic: Add the framework of arithmetic operations
  rust: sync: atomic: Add Atomic
  rust: sync: atomic: Add Atomic<{usize,isize}>
  rust: sync: atomic: Add Atomic<*mut T>
  rust: sync: atomic: Add arithmetic ops for Atomic<*mut T>
  rust: sync: Add memory barriers
  rust: sync: rcu: Add RCU protected pointer

Wedson Almeida Filho (1):
  rust: add rcu abstraction

 MAINTAINERS   |4 +-
 rust/helpers/atomic.c | 1038 +
 rust/helpers/helpers.c|3 +
 rust/helpers/rcu.c|   13 +
 rust/kernel/sync.rs   |3 +
 rust/kernel/sync/atomic.rs|  228 +
 rust/kernel/sync/atomic/generic.rs|  516 ++
 rust/kernel/sync/atomic/ops.rs|  199 
 rust/kernel/sync/atomic/ordering.rs   |   94 ++
 rust/kernel/sync/barrier.rs   |   67 ++
 rust/kernel/sync/rcu.rs   |  319 +++
 scripts/atomic/gen-atomics.sh |1 +
 scripts/atomic/gen-rust-atomic-helpers.sh |   65 ++
 13 files changed, 2549 insertions(+), 1 deletion(-)
 create mode 100644 rust/helpers/atomic.c
 create mode 100644 rust/helpers/rcu.c
 create mode 100644 rust/kernel/sync/atomic.rs
 create mode 100644 rust/kernel/sync/atomic/generic.rs
 create mode 100644 rust/kernel/sync/atomic/ops.rs
 create mode 100644 rust/kernel/sync/atomic/ordering.rs
 create mode 100644 rust/kernel/sync/barrier.rs
 create mode 100644 rust/kernel/sync/rcu.rs
 create mode 100755 scripts/atomic/gen-rust-atomic-helpers.sh

-- 
2.45.2




[RFC v2 03/13] rust: sync: atomic: Add ordering annotation types

2024-10-31 Thread Boqun Feng
Preparation for atomic primitives. Instead of a suffix like _acquire, a
method parameter along with the corresponding generic parameter will be
used to specify the ordering of an atomic operation. For example,
atomic load() can be defined as:

impl<T: AllowAtomic> Atomic<T> {
    pub fn load<O: AcquireOrRelaxed>(&self, _o: O) -> T { ... }
}

and acquire users would do:

let r = x.load(Acquire);

relaxed users:

let r = x.load(Relaxed);

doing the following:

let r = x.load(Release);

will cause a compiler error.

Compared to suffixes, it's easier to tell what ordering variants an
operation has, and it also makes it easier to unify the implementation
of all ordering variants in one method via generics. The `IS_RELAXED`
and `ORDER` associated consts allow a generic function to pick up the
particular implementation specified by an ordering annotation.
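
The ordering-as-type-parameter pattern can be sketched in plain Rust with
marker types and a bounded generic method; this mirrors the shape of the
patch (not its LKMM semantics), and the names below are illustrative:

```rust
use std::sync::atomic::{AtomicI32, Ordering};

// Ordering annotation marker types.
struct Relaxed;
struct Acquire;
struct Release;

// The trait bound for operations that support acquire or relaxed ordering.
trait AcquireOrRelaxed {
    const IS_RELAXED: bool;
}
impl AcquireOrRelaxed for Acquire { const IS_RELAXED: bool = false; }
impl AcquireOrRelaxed for Relaxed { const IS_RELAXED: bool = true; }

struct Atomic(AtomicI32);

impl Atomic {
    // One generic method covers both ordering variants; the const picks
    // the particular implementation at compile time.
    fn load<O: AcquireOrRelaxed>(&self, _o: O) -> i32 {
        self.0.load(if O::IS_RELAXED { Ordering::Relaxed } else { Ordering::Acquire })
    }
}

fn main() {
    let x = Atomic(AtomicI32::new(7));
    assert_eq!(7, x.load(Relaxed));
    assert_eq!(7, x.load(Acquire));
    // x.load(Release); // compile error: `Release: AcquireOrRelaxed` not satisfied
}
```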

Signed-off-by: Boqun Feng 
---
 rust/kernel/sync/atomic.rs  |  3 +
 rust/kernel/sync/atomic/ordering.rs | 94 +
 2 files changed, 97 insertions(+)
 create mode 100644 rust/kernel/sync/atomic/ordering.rs

diff --git a/rust/kernel/sync/atomic.rs b/rust/kernel/sync/atomic.rs
index 21b87563667e..be2e8583595f 100644
--- a/rust/kernel/sync/atomic.rs
+++ b/rust/kernel/sync/atomic.rs
@@ -17,3 +17,6 @@
 //! [`LKMM`]: srctree/tools/memory-model/
 
 pub mod ops;
+pub mod ordering;
+
+pub use ordering::{Acquire, Full, Relaxed, Release};
diff --git a/rust/kernel/sync/atomic/ordering.rs 
b/rust/kernel/sync/atomic/ordering.rs
new file mode 100644
index ..6cf01cd276c6
--- /dev/null
+++ b/rust/kernel/sync/atomic/ordering.rs
@@ -0,0 +1,94 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Memory orderings.
+//!
+//! The semantics of these orderings follows the [`LKMM`] definitions and 
rules.
+//!
+//! - [`Acquire`] and [`Release`] are similar to their counterpart in Rust 
memory model.
+//! - [`Full`] means "fully-ordered", that is:
+//!   - It provides ordering between all the preceding memory accesses and the annotated
+//!     operation.
+//!   - It provides ordering between the annotated operation and all the following memory
+//!     accesses.
+//!   - It provides ordering between all the preceding memory accesses and all the following
+//!     memory accesses.
+//!   - All the orderings are as strong as a full memory barrier (i.e. `smp_mb()`).
+//! - [`Relaxed`] is similar to the counterpart in Rust memory model, except 
that dependency
+//!   orderings are also honored in [`LKMM`]. Dependency orderings are 
described in "DEPENDENCY
+//!   RELATIONS" in [`LKMM`]'s [`explanation`].
+//!
+//! [`LKMM`]: srctree/tools/memory-model/
+//! [`explanation`]: srctree/tools/memory-model/Documentation/explanation.txt
+
+/// The annotation type for relaxed memory ordering.
+pub struct Relaxed;
+
+/// The annotation type for acquire memory ordering.
+pub struct Acquire;
+
+/// The annotation type for release memory ordering.
+pub struct Release;
+
+/// The annotation type for fully-ordered memory ordering.
+pub struct Full;
+
+/// The trait bound for operations that only support relaxed ordering.
+pub trait RelaxedOnly {}
+
+impl RelaxedOnly for Relaxed {}
+
+/// The trait bound for operations that only support acquire or relaxed 
ordering.
+pub trait AcquireOrRelaxed {
+/// Describes whether an ordering is relaxed or not.
+const IS_RELAXED: bool = false;
+}
+
+impl AcquireOrRelaxed for Acquire {}
+
+impl AcquireOrRelaxed for Relaxed {
+const IS_RELAXED: bool = true;
+}
+
+/// The trait bound for operations that only support release or relaxed 
ordering.
+pub trait ReleaseOrRelaxed {
+/// Describes whether an ordering is relaxed or not.
+const IS_RELAXED: bool = false;
+}
+
+impl ReleaseOrRelaxed for Release {}
+
+impl ReleaseOrRelaxed for Relaxed {
+const IS_RELAXED: bool = true;
+}
+
+/// Describes the exact memory ordering of an `impl` [`All`].
+pub enum OrderingDesc {
+/// Relaxed ordering.
+Relaxed,
+/// Acquire ordering.
+Acquire,
+/// Release ordering.
+Release,
+/// Fully-ordered.
+Full,
+}
+
+/// The trait bound for annotating operations that should support all 
orderings.
+pub trait All {
+/// Describes the exact memory ordering.
+const ORDER: OrderingDesc;
+}
+
+impl All for Relaxed {
+const ORDER: OrderingDesc = OrderingDesc::Relaxed;
+}
+
+impl All for Acquire {
+const ORDER: OrderingDesc = OrderingDesc::Acquire;
+}
+
+impl All for Release {
+const ORDER: OrderingDesc = OrderingDesc::Release;
+}
+
+impl All for Full {
+const ORDER: OrderingDesc = OrderingDesc::Full;
+}
-- 
2.45.2




[RFC v2 09/13] rust: sync: atomic: Add Atomic<*mut T>

2024-10-31 Thread Boqun Feng
Add atomic support for raw pointer values. Similar to `isize` and
`usize`, the representation type is selected based on CONFIG_64BIT.

`*mut T` is not `Send`, however `Atomic<*mut T>` definitely needs to be
a `Sync`, and that's the whole point of atomics: being able to have
multiple shared references in different threads so that they can sync
with each other. As a result, a pointer value will be transferred from
one thread to another via `Atomic<*mut T>`:

  
x.store(p1, Relaxed);
let p = x.load(p1, Relaxed);

This means a raw pointer value (`*mut T`) needs to be able to transfer
across thread boundaries, which is essentially `Send`.

To reflect this in the type system, and based on the fact that pointer
values can be transferred safely (only dereferencing them is unsafe),
as suggested by Alice, extend the `AllowAtomic` trait to include a
customized `Send` semantics, that is: an `impl AllowAtomic` has to be
safe to be transferred across thread boundaries.
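
The "pointer value crosses a thread boundary" idea above can be sketched in
plain Rust. Since std has no `Atomic<*mut T>`, the sketch stores the address
as a `usize` in an atomic; this mimics what the kernel type's custom `Send`
semantics permits (transferring the value is fine, only dereferencing is
unsafe):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Stand-in for an Atomic<*mut i32> shared between threads.
static SLOT: AtomicUsize = AtomicUsize::new(0);

fn main() {
    let mut v = 7i32;
    let p: *mut i32 = &mut v;

    // thread 1: publish the pointer value.
    SLOT.store(p as usize, Ordering::Release);

    // thread 2: observe the pointer value; note that only the *value*
    // crosses the boundary, nothing is dereferenced here.
    let handle = thread::spawn(|| SLOT.load(Ordering::Acquire));
    let got = handle.join().unwrap();

    assert_eq!(p as usize, got);
}
```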

Suggested-by: Alice Ryhl 
Signed-off-by: Boqun Feng 
---
 rust/kernel/sync/atomic.rs | 24 
 rust/kernel/sync/atomic/generic.rs | 16 +---
 2 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/rust/kernel/sync/atomic.rs b/rust/kernel/sync/atomic.rs
index 4166ad48604f..e62c3cd1d3ca 100644
--- a/rust/kernel/sync/atomic.rs
+++ b/rust/kernel/sync/atomic.rs
@@ -173,3 +173,27 @@ fn delta_into_repr(d: Self::Delta) -> Self::Repr {
 d as _
 }
 }
+
+/// ```rust
+/// use kernel::sync::atomic::{Atomic, Relaxed};
+///
+/// let x = Atomic::new(core::ptr::null_mut::());
+///
+/// assert!(x.load(Relaxed).is_null());
+/// ```
+// SAFETY: A `*mut T` has the same size and alignment as `i64` for 64bit and the same as `i32`
+// for 32bit. And it's safe to transfer the ownership of a pointer value to another thread.
+unsafe impl<T> generic::AllowAtomic for *mut T {
+#[cfg(CONFIG_64BIT)]
+type Repr = i64;
+#[cfg(not(CONFIG_64BIT))]
+type Repr = i32;
+
+fn into_repr(self) -> Self::Repr {
+self as _
+}
+
+fn from_repr(repr: Self::Repr) -> Self {
+repr as _
+}
+}
diff --git a/rust/kernel/sync/atomic/generic.rs 
b/rust/kernel/sync/atomic/generic.rs
index a75c3e9f4c89..cff98469ed35 100644
--- a/rust/kernel/sync/atomic/generic.rs
+++ b/rust/kernel/sync/atomic/generic.rs
@@ -19,6 +19,10 @@
 #[repr(transparent)]
 pub struct Atomic(Opaque);
 
+// SAFETY: `Atomic<T>` is safe to send between execution contexts, because `T` is `AllowAtomic`
+// and `AllowAtomic`'s safety requirement guarantees that.
+unsafe impl<T: AllowAtomic> Send for Atomic<T> {}
+
 // SAFETY: `Atomic` is safe to share among execution contexts because all 
accesses are atomic.
 unsafe impl Sync for Atomic {}
 
@@ -30,8 +34,13 @@ unsafe impl Sync for Atomic {}
 ///
 /// # Safety
 ///
-/// [`Self`] must have the same size and alignment as [`Self::Repr`].
-pub unsafe trait AllowAtomic: Sized + Send + Copy {
+/// - [`Self`] must have the same size and alignment as [`Self::Repr`].
+/// - The implementer must guarantee it's safe to transfer ownership from one execution context
+///   to another, which normally means it has to be [`Send`]. But because `*mut T` is not
+///   [`Send`] yet is a basic type that needs to support atomic operations, this safety
+///   requirement is added to the [`AllowAtomic`] trait instead. It is automatically satisfied
+///   if the type is [`Send`].
+pub unsafe trait AllowAtomic: Sized + Copy {
 /// The backing atomic implementation type.
 type Repr: AtomicImpl;
 
@@ -42,7 +51,8 @@ pub unsafe trait AllowAtomic: Sized + Send + Copy {
 fn from_repr(repr: Self::Repr) -> Self;
 }
 
-// SAFETY: `T::Repr` is `Self` (i.e. `T`), so they have the same size and alignment.
+// SAFETY: `T::Repr` is `Self` (i.e. `T`), so they have the same size and alignment. And all
+// `AtomicImpl` types are `Send`.
 unsafe impl<T: AtomicImpl> AllowAtomic for T {
 type Repr = Self;
 
-- 
2.45.2




[RFC v2 06/13] rust: sync: atomic: Add the framework of arithmetic operations

2024-10-31 Thread Boqun Feng
One important set of atomic operations is the arithmetic operations,
i.e. add(), sub(), fetch_add(), add_return(), etc. However it may not
make sense for all the types that implement `AllowAtomic` to have
arithmetic operations, for example a `Foo(u32)` may not have a
reasonable add() or sub(), plus subword types (`u8` and `u16`)
currently don't have atomic arithmetic operations even on the C side
and might not have them in the future in Rust (because they are
usually suboptimal on a few architectures). Therefore add a subtrait
of `AllowAtomic` describing which types have and can do atomic
arithmetic operations.

A few things about this `AllowAtomicArithmetic` trait:

* It has an associated type `Delta` instead of using
  `AllowAtomic::Repr` because a `Bar(u32)` (whose `Repr` is `i32`)
  may not want an `add(&self, i32)`, but an `add(&self, u32)`.

* `AtomicImpl` types already implement an `AtomicHasArithmeticOps`
  trait, so add a blanket implementation for them. In the future, `i8` and
  `i16` may impl `AtomicImpl` but not `AtomicHasArithmeticOps` if
  arithmetic operations are not available.

Only add() and fetch_add() are added. The rest will be added in the
future.
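
The `Delta` idea above can be sketched in plain Rust: the increment type can
differ from the representation type, so a `u32` atomic takes `u32` deltas even
though its backing representation is `i32`. This is an illustration of the
trait shape only; the names (`U32Atomic`, the std backend) are stand-ins:

```rust
use std::sync::atomic::{AtomicI32, Ordering};

// Shape of the subtrait: a delta type plus a conversion into the repr.
trait AllowAtomicArithmetic {
    type Delta;
    fn delta_into_repr(d: Self::Delta) -> i32;
}

// Stand-in for Atomic<u32> whose Repr is i32.
struct U32Atomic(AtomicI32);

impl AllowAtomicArithmetic for U32Atomic {
    type Delta = u32;
    fn delta_into_repr(d: u32) -> i32 {
        // Wrapping reinterpretation, matching the wrapping-add semantics.
        d as i32
    }
}

impl U32Atomic {
    fn fetch_add(&self, v: u32) -> u32 {
        self.0.fetch_add(U32Atomic::delta_into_repr(v), Ordering::Relaxed) as u32
    }
}

fn main() {
    let x = U32Atomic(AtomicI32::new(42));
    assert_eq!(42, x.fetch_add(12)); // returns the old value
    assert_eq!(54, x.fetch_add(0));
}
```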

Signed-off-by: Boqun Feng 
---
 rust/kernel/sync/atomic/generic.rs | 102 +
 1 file changed, 102 insertions(+)

diff --git a/rust/kernel/sync/atomic/generic.rs 
b/rust/kernel/sync/atomic/generic.rs
index bfccc4336c75..a75c3e9f4c89 100644
--- a/rust/kernel/sync/atomic/generic.rs
+++ b/rust/kernel/sync/atomic/generic.rs
@@ -3,6 +3,7 @@
 //! Generic atomic primitives.
 
 use super::ops::*;
+use super::ordering;
 use super::ordering::*;
 use crate::types::Opaque;
 
@@ -54,6 +55,23 @@ fn from_repr(repr: Self::Repr) -> Self {
 }
 }
 
+/// Atomics that allow arithmetic operations with an integer type.
+pub trait AllowAtomicArithmetic: AllowAtomic {
+/// The delta types for arithmetic operations.
+type Delta;
+
+/// Converts [`Self::Delta`] into the representation of the atomic type.
+fn delta_into_repr(d: Self::Delta) -> Self::Repr;
+}
+
+impl<T: AtomicImpl> AllowAtomicArithmetic for T {
+type Delta = Self;
+
+fn delta_into_repr(d: Self::Delta) -> Self::Repr {
+d
+}
+}
+
 impl Atomic {
 /// Creates a new atomic.
 pub const fn new(v: T) -> Self {
@@ -402,3 +420,87 @@ pub fn compare_exchange<Ordering: All>(&self, mut old: T, new: T, o: Ordering) -
 }
 }
 }
+
+impl<T: AllowAtomicArithmetic> Atomic<T>
+where
+T::Repr: AtomicHasArithmeticOps,
+{
+/// Atomic add.
+///
+/// The addition is a wrapping addition.
+///
+/// # Examples
+///
+/// ```rust
+/// use kernel::sync::atomic::{Atomic, Relaxed};
+///
+/// let x = Atomic::new(42);
+///
+/// assert_eq!(42, x.load(Relaxed));
+///
+/// x.add(12, Relaxed);
+///
+/// assert_eq!(54, x.load(Relaxed));
+/// ```
+#[inline(always)]
+pub fn add<Ordering: RelaxedOnly>(&self, v: T::Delta, _: Ordering) {
+let v = T::delta_into_repr(v);
+let a = self.as_ptr().cast::<T::Repr>();
+
+// SAFETY:
+// - For calling the atomic_add() function:
+//   - `self.as_ptr()` is a valid pointer, and per the safety requirement of `AllowAtomic`,
+//  a `*mut T` is a valid `*mut T::Repr`. Therefore `a` is a valid 
pointer,
+//   - per the type invariants, the following atomic operation won't 
cause data races.
+// - For extra safety requirement of usage on pointers returned by 
`self.as_ptr():
+//   - atomic operations are used here.
+unsafe {
+T::Repr::atomic_add(a, v);
+}
+}
+
+/// Atomic fetch and add.
+///
+/// The addition is a wrapping addition.
+///
+/// # Examples
+///
+/// ```rust
+/// use kernel::sync::atomic::{Atomic, Acquire, Full, Relaxed};
+///
+/// let x = Atomic::new(42);
+///
+/// assert_eq!(42, x.load(Relaxed));
+///
+/// assert_eq!(54, { x.fetch_add(12, Acquire); x.load(Relaxed) });
+///
+/// let x = Atomic::new(42);
+///
+/// assert_eq!(42, x.load(Relaxed));
+///
+/// assert_eq!(54, { x.fetch_add(12, Full); x.load(Relaxed) } );
+/// ```
+#[inline(always)]
+pub fn fetch_add<Ordering: All>(&self, v: T::Delta, _: Ordering) -> T {
+let v = T::delta_into_repr(v);
+let a = self.as_ptr().cast::<T::Repr>();
+
+// SAFETY:
+// - For calling the atomic_fetch_add*() function:
+//   - `self.as_ptr()` is a valid pointer, and per the safety requirement of `AllowAtomic`,
+//  a `*mut T` is a valid `*mut T::Repr`. Therefore `a` is a valid 
pointer,
+//   - per the type invariants, the following atomic operation won't 
cause data races.
+// - For extra safety requirement of usage on pointers returned by 
`self.as_ptr():
+//   - atomic operations are used here.
+let ret = unsafe {
+match Ordering::ORDER {
+ordering::OrderingDesc::Full => T::Repr::atomic_fetch_add(a, 
v),
+ 

[RFC v2 08/13] rust: sync: atomic: Add Atomic<{usize,isize}>

2024-10-31 Thread Boqun Feng
Add generic atomic support for `usize` and `isize`. Note that instead of
mapping directly to `atomic_long_t`, the representation type
(`AllowAtomic::Repr`) is selected based on CONFIG_64BIT. This reduces
the need to create `atomic_long_*` helpers, which can save kernel binary
size when inline helpers are not available.

Signed-off-by: Boqun Feng 
---
 rust/kernel/sync/atomic.rs | 71 ++
 1 file changed, 71 insertions(+)

diff --git a/rust/kernel/sync/atomic.rs b/rust/kernel/sync/atomic.rs
index b2e81e22c105..4166ad48604f 100644
--- a/rust/kernel/sync/atomic.rs
+++ b/rust/kernel/sync/atomic.rs
@@ -102,3 +102,74 @@ fn delta_into_repr(d: Self::Delta) -> Self::Repr {
 d as _
 }
 }
+
+// SAFETY: `usize` has the same size and alignment as `i64` for 64bit and the same as `i32` for
+// 32bit.
+unsafe impl generic::AllowAtomic for usize {
+#[cfg(CONFIG_64BIT)]
+type Repr = i64;
+#[cfg(not(CONFIG_64BIT))]
+type Repr = i32;
+
+fn into_repr(self) -> Self::Repr {
+self as _
+}
+
+fn from_repr(repr: Self::Repr) -> Self {
+repr as _
+}
+}
+
+/// ```rust
+/// use kernel::sync::atomic::{Atomic, Full, Relaxed};
+///
+/// let x = Atomic::new(42usize);
+///
+/// assert_eq!(42, x.fetch_add(12, Full));
+/// assert_eq!(54, x.load(Relaxed));
+///
+/// x.add(13, Relaxed);
+///
+/// assert_eq!(67, x.load(Relaxed));
+/// ```
+impl generic::AllowAtomicArithmetic for usize {
+type Delta = usize;
+
+fn delta_into_repr(d: Self::Delta) -> Self::Repr {
+d as _
+}
+}
+
+// SAFETY: `isize` has the same size and alignment as `i64` for 64bit and the same as `i32` for
+// 32bit.
+unsafe impl generic::AllowAtomic for isize {
+#[cfg(CONFIG_64BIT)]
+type Repr = i64;
+#[cfg(not(CONFIG_64BIT))]
+type Repr = i32;
+
+fn into_repr(self) -> Self::Repr {
+self as _
+}
+
+fn from_repr(repr: Self::Repr) -> Self {
+repr as _
+}
+}
+
+/// ```rust
+/// use kernel::sync::atomic::{Atomic, Full, Relaxed};
+///
+/// let x = Atomic::new(42isize);
+///
+/// assert_eq!(42, x.fetch_add(12, Full));
+/// assert_eq!(54, x.load(Relaxed));
+///
+/// x.add(13, Relaxed);
+///
+/// assert_eq!(67, x.load(Relaxed));
+/// ```
+impl generic::AllowAtomicArithmetic for isize {
+type Delta = isize;
+
+fn delta_into_repr(d: Self::Delta) -> Self::Repr {
+d as _
+}
+}
-- 
2.45.2




[RFC v2 07/13] rust: sync: atomic: Add Atomic

2024-10-31 Thread Boqun Feng
Add generic atomic support for basic unsigned types that have an
`AtomicImpl` with the same size and alignment.

Signed-off-by: Boqun Feng 
---
 rust/kernel/sync/atomic.rs | 80 ++
 1 file changed, 80 insertions(+)

diff --git a/rust/kernel/sync/atomic.rs b/rust/kernel/sync/atomic.rs
index b791abc59b61..b2e81e22c105 100644
--- a/rust/kernel/sync/atomic.rs
+++ b/rust/kernel/sync/atomic.rs
@@ -22,3 +22,83 @@
 
 pub use generic::Atomic;
 pub use ordering::{Acquire, Full, Relaxed, Release};
+
+/// ```rust
+/// use kernel::sync::atomic::{Atomic, Relaxed};
+///
+/// let x = Atomic::new(42u64);
+///
+/// assert_eq!(42, x.load(Relaxed));
+/// ```
+// SAFETY: `u64` and `i64` have the same size and alignment.
+unsafe impl generic::AllowAtomic for u64 {
+type Repr = i64;
+
+fn into_repr(self) -> Self::Repr {
+self as _
+}
+
+fn from_repr(repr: Self::Repr) -> Self {
+repr as _
+}
+}
+
+/// ```rust
+/// use kernel::sync::atomic::{Atomic, Full, Relaxed};
+///
+/// let x = Atomic::new(42u64);
+///
+/// assert_eq!(42, x.fetch_add(12, Full));
+/// assert_eq!(54, x.load(Relaxed));
+///
+/// x.add(13, Relaxed);
+///
+/// assert_eq!(67, x.load(Relaxed));
+/// ```
+impl generic::AllowAtomicArithmetic for u64 {
+type Delta = u64;
+
+fn delta_into_repr(d: Self::Delta) -> Self::Repr {
+d as _
+}
+}
+
+/// ```rust
+/// use kernel::sync::atomic::{Atomic, Relaxed};
+///
+/// let x = Atomic::new(42u32);
+///
+/// assert_eq!(42, x.load(Relaxed));
+/// ```
+// SAFETY: `u32` and `i32` have the same size and alignment.
+unsafe impl generic::AllowAtomic for u32 {
+type Repr = i32;
+
+fn into_repr(self) -> Self::Repr {
+self as _
+}
+
+fn from_repr(repr: Self::Repr) -> Self {
+repr as _
+}
+}
+
+/// ```rust
+/// use kernel::sync::atomic::{Atomic, Full, Relaxed};
+///
+/// let x = Atomic::new(42u32);
+///
+/// assert_eq!(42, x.fetch_add(12, Full));
+/// assert_eq!(54, x.load(Relaxed));
+///
+/// x.add(13, Relaxed);
+///
+/// assert_eq!(67, x.load(Relaxed));
+/// ```
+impl generic::AllowAtomicArithmetic for u32 {
+type Delta = u32;
+
+fn delta_into_repr(d: Self::Delta) -> Self::Repr {
+d as _
+}
+}
-- 
2.45.2




[RFC v2 12/13] rust: add rcu abstraction

2024-10-31 Thread Boqun Feng
From: Wedson Almeida Filho 

Add a simple abstraction to guard critical code sections with an rcu
read lock.

Signed-off-by: Wedson Almeida Filho 
Signed-off-by: Danilo Krummrich 
---
 rust/helpers/helpers.c  |  1 +
 rust/helpers/rcu.c  | 13 +++
 rust/kernel/sync.rs |  1 +
 rust/kernel/sync/rcu.rs | 52 +
 4 files changed, 67 insertions(+)
 create mode 100644 rust/helpers/rcu.c
 create mode 100644 rust/kernel/sync/rcu.rs

diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
index f4a94833b29d..65951245879f 100644
--- a/rust/helpers/helpers.c
+++ b/rust/helpers/helpers.c
@@ -18,6 +18,7 @@
 #include "mutex.c"
 #include "page.c"
 #include "rbtree.c"
+#include "rcu.c"
 #include "refcount.c"
 #include "signal.c"
 #include "slab.c"
diff --git a/rust/helpers/rcu.c b/rust/helpers/rcu.c
new file mode 100644
index ..f1cec6583513
--- /dev/null
+++ b/rust/helpers/rcu.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/rcupdate.h>
+
+void rust_helper_rcu_read_lock(void)
+{
+   rcu_read_lock();
+}
+
+void rust_helper_rcu_read_unlock(void)
+{
+   rcu_read_unlock();
+}
diff --git a/rust/kernel/sync.rs b/rust/kernel/sync.rs
index 0d0b19441ae8..f5a413e1ce30 100644
--- a/rust/kernel/sync.rs
+++ b/rust/kernel/sync.rs
@@ -13,6 +13,7 @@
 mod condvar;
 pub mod lock;
 mod locked_by;
+pub mod rcu;
 
 pub use arc::{Arc, ArcBorrow, UniqueArc};
 pub use condvar::{new_condvar, CondVar, CondVarTimeoutResult};
diff --git a/rust/kernel/sync/rcu.rs b/rust/kernel/sync/rcu.rs
new file mode 100644
index ..5a35495f69a4
--- /dev/null
+++ b/rust/kernel/sync/rcu.rs
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! RCU support.
+//!
+//! C header: [`include/linux/rcupdate.h`](srctree/include/linux/rcupdate.h)
+
+use crate::bindings;
+use core::marker::PhantomData;
+
+/// Evidence that the RCU read side lock is held on the current thread/CPU.
+///
+/// The type is explicitly not `Send` because this property is per-thread/CPU.
+///
+/// # Invariants
+///
+/// The RCU read side lock is actually held while instances of this guard exist.
+pub struct Guard {
+_not_send: PhantomData<*mut ()>,
+}
+
+impl Guard {
+/// Acquires the RCU read side lock and returns a guard.
+pub fn new() -> Self {
+// SAFETY: An FFI call with no additional requirements.
+unsafe { bindings::rcu_read_lock() };
+// INVARIANT: The RCU read side lock was just acquired above.
+Self {
+_not_send: PhantomData,
+}
+}
+
+/// Explicitly releases the RCU read side lock.
+pub fn unlock(self) {}
+}
+
+impl Default for Guard {
+fn default() -> Self {
+Self::new()
+}
+}
+
+impl Drop for Guard {
+fn drop(&mut self) {
+// SAFETY: By the type invariants, the rcu read side is locked, so it is ok to unlock it.
+unsafe { bindings::rcu_read_unlock() };
+}
+}
+
+/// Acquires the RCU read side lock.
+pub fn read_lock() -> Guard {
+Guard::new()
+}
-- 
2.45.2




[RFC v2 11/13] rust: sync: Add memory barriers

2024-10-31 Thread Boqun Feng
Memory barriers are building blocks for concurrent code, hence provide
a minimal set of them.

The compiler barrier, barrier(), is implemented in inline asm instead of
using core::sync::atomic::compiler_fence() because memory models are
different: kernel's atomics are implemented in inline asm therefore the
compiler barrier should be implemented in inline asm as well.

Signed-off-by: Boqun Feng 
---
 rust/helpers/helpers.c  |  1 +
 rust/kernel/sync.rs |  1 +
 rust/kernel/sync/barrier.rs | 67 +
 3 files changed, 69 insertions(+)
 create mode 100644 rust/kernel/sync/barrier.rs

diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
index ab5a3f1be241..f4a94833b29d 100644
--- a/rust/helpers/helpers.c
+++ b/rust/helpers/helpers.c
@@ -8,6 +8,7 @@
  */
 
 #include "atomic.c"
+#include "barrier.c"
 #include "blk.c"
 #include "bug.c"
 #include "build_assert.c"
diff --git a/rust/kernel/sync.rs b/rust/kernel/sync.rs
index 66ac3752ca71..0d0b19441ae8 100644
--- a/rust/kernel/sync.rs
+++ b/rust/kernel/sync.rs
@@ -9,6 +9,7 @@
 
 mod arc;
 pub mod atomic;
+pub mod barrier;
 mod condvar;
 pub mod lock;
 mod locked_by;
diff --git a/rust/kernel/sync/barrier.rs b/rust/kernel/sync/barrier.rs
new file mode 100644
index ..277aa09747bf
--- /dev/null
+++ b/rust/kernel/sync/barrier.rs
@@ -0,0 +1,67 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Memory barriers.
+//!
+//! These primitives have the same semantics as their C counterparts, and the precise definitions of
+//! semantics can be found at [`LKMM`].
+//!
+//! [`LKMM`]: srctree/tools/memory-model/
+
+/// A compiler barrier.
+///
+/// An explicit compiler barrier function that prevents the compiler from moving the memory
+/// accesses either side of it to the other side.
+pub fn barrier() {
+// By default, Rust inline asms are treated as being able to access any memory or flags, hence
+// it suffices as a compiler barrier.
+//
+// SAFETY: An empty asm block should be safe.
+unsafe {
+core::arch::asm!("");
+}
+}
+
+/// A full memory barrier.
+///
+/// A barrier function that prevents both the compiler and the CPU from moving the memory accesses
+/// either side of it to the other side.
+pub fn smp_mb() {
+if cfg!(CONFIG_SMP) {
+// SAFETY: `smp_mb()` is safe to call.
+unsafe {
+bindings::smp_mb();
+}
+} else {
+barrier();
+}
+}
+
+/// A write-write memory barrier.
+///
+/// A barrier function that prevents both the compiler and the CPU from moving the memory write
+/// accesses either side of it to the other side.
+pub fn smp_wmb() {
+if cfg!(CONFIG_SMP) {
+// SAFETY: `smp_wmb()` is safe to call.
+unsafe {
+bindings::smp_wmb();
+}
+} else {
+barrier();
+}
+}
+
+/// A read-read memory barrier.
+///
+/// A barrier function that prevents both the compiler and the CPU from moving the memory read
+/// accesses either side of it to the other side.
+pub fn smp_rmb() {
+if cfg!(CONFIG_SMP) {
+// SAFETY: `smp_rmb()` is safe to call.
+unsafe {
+bindings::smp_rmb();
+}
+} else {
+barrier();
+}
+}
-- 
2.45.2




[RFC v2 10/13] rust: sync: atomic: Add arithmetic ops for Atomic<*mut T>

2024-10-31 Thread Boqun Feng
(This is more an RFC)

Add arithmetic operations support for `Atomic<*mut T>`. Currently the
semantics of arithmetic atomic operations are the same as pointer
arithmetic, that is, e.g. `Atomic<*mut u64>::add(1)` is adding 8
(`size_of::<u64>()`) to the pointer value.
In Rust std library, there are two sets of pointer arithmetic for
`AtomicPtr`:

* ptr_add() and ptr_sub(), which are the same as Atomic<*mut T>::add(),
  pointer arithmetic.

* byte_add() and byte_sub(), which use the input as byte offset to
  change the pointer value, e.g. byte_add(1) means adding 1 to the
  pointer value.

We can either take the approach in the current patch and add byte_add()
later on if needed, or start with ptr_add() and byte_add() naming.

Signed-off-by: Boqun Feng 
---
 rust/kernel/sync/atomic.rs | 29 +
 1 file changed, 29 insertions(+)

diff --git a/rust/kernel/sync/atomic.rs b/rust/kernel/sync/atomic.rs
index e62c3cd1d3ca..cbe5d40d9e36 100644
--- a/rust/kernel/sync/atomic.rs
+++ b/rust/kernel/sync/atomic.rs
@@ -197,3 +197,32 @@ fn from_repr(repr: Self::Repr) -> Self {
 repr as _
 }
 }
+
+/// ```rust
+/// use kernel::sync::atomic::{Atomic, Relaxed};
+///
+/// let s: &mut [i32] = &mut [1, 3, 2, 4];
+///
+/// let x = Atomic::new(s.as_mut_ptr());
+///
+/// x.add(1, Relaxed);
+///
+/// let ptr = x.fetch_add(1, Relaxed); // points to the 2nd element.
+/// let ptr2 = x.load(Relaxed); // points to the 3rd element.
+///
+/// // SAFETY: `ptr` and `ptr2` are valid pointers to the 2nd and 3rd elements of `s` with write
+/// // provenance, and no other thread is accessing these elements.
+/// unsafe { core::ptr::swap(ptr, ptr2); }
+///
+/// assert_eq!(s, &mut [1, 2, 3, 4]);
+/// ```
+impl<T> generic::AllowAtomicArithmetic for *mut T {
+type Delta = isize;
+
+/// The behavior of arithmetic operations
+fn delta_into_repr(d: Self::Delta) -> Self::Repr {
+// Since atomic arithmetic operations are wrapping, a wrapping_mul() here suffices even
+// if overflow may happen.
+d.wrapping_mul(core::mem::size_of::<T>() as _) as _
+}
+}
-- 
2.45.2




[RFC v2 02/13] rust: sync: Add basic atomic operation mapping framework

2024-10-31 Thread Boqun Feng
Preparation for generic atomic implementation. To unify the
implementation of a generic method over `i32` and `i64`, the C side
atomic methods need to be grouped so that in a generic method, they can
be referred as ::, otherwise their parameters and return
value are different between `i32` and `i64`, which would require using
`transmute()` to unify the type into a `T`.

Introduce `AtomicImpl` to represent a basic type in Rust that has the
direct mapping to an atomic implementation from C. This trait is sealed,
and currently only `i32` and `i64` impl this.

Further, different methods are put into different `*Ops` trait groups,
and this is for the future when smaller types like `i8`/`i16` are
supported but only with a limited set of API (e.g. only set(), load(),
xchg() and cmpxchg(), no add() or sub() etc).

While the atomic mod is introduced, documentation is also added for
memory models and data races.

Also bump my role to the maintainer of ATOMIC INFRASTRUCTURE to reflect
my responsibility on the Rust atomic mod.

Signed-off-by: Boqun Feng 
---
 MAINTAINERS|   4 +-
 rust/kernel/sync.rs|   1 +
 rust/kernel/sync/atomic.rs |  19 
 rust/kernel/sync/atomic/ops.rs | 199 +
 4 files changed, 222 insertions(+), 1 deletion(-)
 create mode 100644 rust/kernel/sync/atomic.rs
 create mode 100644 rust/kernel/sync/atomic/ops.rs

diff --git a/MAINTAINERS b/MAINTAINERS
index b77f4495dcf4..e09471027a63 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3635,7 +3635,7 @@ F:drivers/input/touchscreen/atmel_mxt_ts.c
 ATOMIC INFRASTRUCTURE
 M: Will Deacon 
 M: Peter Zijlstra 
-R: Boqun Feng 
+M: Boqun Feng 
 R: Mark Rutland 
 L: linux-kernel@vger.kernel.org
 S: Maintained
@@ -3644,6 +3644,8 @@ F:arch/*/include/asm/atomic*.h
 F: include/*/atomic*.h
 F: include/linux/refcount.h
 F: scripts/atomic/
+F: rust/kernel/sync/atomic.rs
+F: rust/kernel/sync/atomic/
 
 ATTO EXPRESSSAS SAS/SATA RAID SCSI DRIVER
 M: Bradley Grove 
diff --git a/rust/kernel/sync.rs b/rust/kernel/sync.rs
index 0ab20975a3b5..66ac3752ca71 100644
--- a/rust/kernel/sync.rs
+++ b/rust/kernel/sync.rs
@@ -8,6 +8,7 @@
 use crate::types::Opaque;
 
 mod arc;
+pub mod atomic;
 mod condvar;
 pub mod lock;
 mod locked_by;
diff --git a/rust/kernel/sync/atomic.rs b/rust/kernel/sync/atomic.rs
new file mode 100644
index ..21b87563667e
--- /dev/null
+++ b/rust/kernel/sync/atomic.rs
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Atomic primitives.
+//!
+//! These primitives have the same semantics as their C counterparts, and the precise definitions of
+//! semantics can be found at [`LKMM`]. Note that the Linux Kernel Memory (Consistency) Model is the
+//! only model for Rust code in kernel, and Rust's own atomics should be avoided.
+//!
+//! # Data races
+//!
+//! [`LKMM`] atomics have different rules regarding data races:
+//!
+//! - A normal read doesn't data-race with an atomic read.
+//! - A normal write from C side is treated as an atomic write if
+//!   CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=y.
+//!
+//! [`LKMM`]: srctree/tools/memory-model/
+
+pub mod ops;
diff --git a/rust/kernel/sync/atomic/ops.rs b/rust/kernel/sync/atomic/ops.rs
new file mode 100644
index ..59101a0d0273
--- /dev/null
+++ b/rust/kernel/sync/atomic/ops.rs
@@ -0,0 +1,199 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Atomic implementations.
+//!
+//! Provides 1:1 mapping of atomic implementations.
+
+use crate::bindings::*;
+use crate::macros::paste;
+
+mod private {
+/// Sealed trait marker to disable customized impls on atomic implementation traits.
+pub trait Sealed {}
+}
+
+// `i32` and `i64` are the only supported atomic implementations.
+impl private::Sealed for i32 {}
+impl private::Sealed for i64 {}
+
+/// A marker trait for types that implement atomic operations with C side primitives.
+///
+/// This trait is sealed, and only types that have a direct mapping to the C side atomics should
+/// impl this:
+///
+/// - `i32` maps to `atomic_t`.
+/// - `i64` maps to `atomic64_t`.
+pub trait AtomicImpl: Sized + Send + Copy + private::Sealed {}
+
+// `atomic_t` impl atomic operations on `i32`.
+impl AtomicImpl for i32 {}
+
+// `atomic64_t` impl atomic operations on `i64`.
+impl AtomicImpl for i64 {}
+
+// This macro generates the function signature with given argument list and return type.
+macro_rules! declare_atomic_method {
+(
+$func:ident($($arg:ident : $arg_type:ty),*) $(-> $ret:ty)?
+) => {
+paste!(
+#[doc = concat!("Atomic ", stringify!($func))]
+#[doc = "# Safety"]
+#[doc = "- any pointer passed to the function has to be a valid pointer"]
+#[doc = "- Accesses must not cause data races per LKMM:"]
+#[doc = "  - atomic read racing with normal read, normal write or atomic write is not data race."]
+#[doc = "  - atomic write r

[PATCH v4 2/3] rust: macros: add macro to easily run KUnit tests

2024-10-31 Thread David Gow
From: José Expósito 

Add a new procedural macro (`#[kunit_tests(kunit_test_suit_name)]`) to
run KUnit tests using a user-space like syntax.

The macro, which should be used on modules, transforms every `#[test]`
into a `kunit_case!` and adds a `kunit_unsafe_test_suite!` registering
all of them.

The only difference with user-space tests is that instead of using
`#[cfg(test)]`, `#[kunit_tests(kunit_test_suit_name)]` is used.

Note that `#[cfg(CONFIG_KUNIT)]` is added so the test module is not
compiled when `CONFIG_KUNIT` is set to `n`.

Reviewed-by: David Gow 
Signed-off-by: José Expósito 
Co-developed-by: Boqun Feng 
Signed-off-by: Boqun Feng 
Signed-off-by: David Gow 
---

Changes since v3:
https://lore.kernel.org/linux-kselftest/20241030045719.3085147-6-david...@google.com/
- The #[kunit_tests()] macro now preserves span information, so
  errors can be better reported. (Thanks, Boqun!)
- The example test has been replaced to no longer use assert_eq!() with
  a constant bool argument (which triggered a clippy warning now we
  have the span info). It now checks 1 + 1 == 2, to match the C example.
  - (The in_kunit_test() test in the next patch uses assert!() to check
a bool, so having something different here seemed to make a better
example.)

Changes since v2:
https://lore.kernel.org/linux-kselftest/20241029092422.2884505-3-david...@google.com/
- Include missing rust/macros/kunit.rs file from v2. (Thanks Boqun!)
- The proc macro now emits an error if the suite name is too long.

Changes since v1:
https://lore.kernel.org/lkml/20230720-rustbind-v1-2-c80db349e...@google.com/
- Rebased on top of rust-next
- Make use of the new const functions, rather than the kunit_case!()
  macro.

---
 MAINTAINERS  |   1 +
 rust/kernel/kunit.rs |  10 +++
 rust/macros/kunit.rs | 168 +++
 rust/macros/lib.rs   |  29 
 4 files changed, 208 insertions(+)
 create mode 100644 rust/macros/kunit.rs

diff --git a/MAINTAINERS b/MAINTAINERS
index b77f4495dcf4..b65035ede675 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12433,6 +12433,7 @@ F:  Documentation/dev-tools/kunit/
 F: include/kunit/
 F: lib/kunit/
 F: rust/kernel/kunit.rs
+F: rust/macros/kunit.rs
 F: scripts/rustdoc_test_*
 F: tools/testing/kunit/
 
diff --git a/rust/kernel/kunit.rs b/rust/kernel/kunit.rs
index 85bc1faff0d5..71ce1d145be8 100644
--- a/rust/kernel/kunit.rs
+++ b/rust/kernel/kunit.rs
@@ -40,6 +40,8 @@ pub fn info(args: fmt::Arguments<'_>) {
 }
 }
 
+use macros::kunit_tests;
+
 /// Asserts that a boolean expression is `true` at runtime.
 ///
 /// Public but hidden since it should only be used from generated tests.
@@ -273,3 +275,11 @@ macro_rules! kunit_unsafe_test_suite {
 };
 };
 }
+
+#[kunit_tests(rust_kernel_kunit)]
+mod tests {
+#[test]
+fn rust_test_kunit_example_test() {
+assert_eq!(1 + 1, 2);
+}
+}
diff --git a/rust/macros/kunit.rs b/rust/macros/kunit.rs
new file mode 100644
index ..c421ff65f7f9
--- /dev/null
+++ b/rust/macros/kunit.rs
@@ -0,0 +1,168 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Procedural macro to run KUnit tests using a user-space like syntax.
+//!
+//! Copyright (c) 2023 José Expósito 
+
+use proc_macro::{Delimiter, Group, TokenStream, TokenTree};
+use std::fmt::Write;
+
+pub(crate) fn kunit_tests(attr: TokenStream, ts: TokenStream) -> TokenStream {
+if attr.to_string().is_empty() {
+panic!("Missing test name in #[kunit_tests(test_name)] macro")
+}
+
+if attr.to_string().as_bytes().len() > 255 {
+panic!("The test suite name `{}` exceeds the maximum length of 255 bytes.", attr)
+}
+
+let mut tokens: Vec<_> = ts.into_iter().collect();
+
+// Scan for the "mod" keyword.
+tokens
+.iter()
+.find_map(|token| match token {
+TokenTree::Ident(ident) => match ident.to_string().as_str() {
+"mod" => Some(true),
+_ => None,
+},
+_ => None,
+})
+.expect("#[kunit_tests(test_name)] attribute should only be applied to modules");
+
+// Retrieve the main body. The main body should be the last token tree.
+let body = match tokens.pop() {
+Some(TokenTree::Group(group)) if group.delimiter() == Delimiter::Brace => group,
+_ => panic!("cannot locate main body of module"),
+};
+
+// Get the functions set as tests. Search for `[test]` -> `fn`.
+let mut body_it = body.stream().into_iter();
+let mut tests = Vec::new();
+while let Some(token) = body_it.next() {
+match token {
+TokenTree::Group(ident) if ident.to_string() == "[test]" => match body_it.next() {
+Some(TokenTree::Ident(ident)) if ident.to_string() == "fn" => {
+let test_name = match body_it.next() {
+Some(TokenTree::Ident(ident)) => ident.to_string(),
+_ => continue,
+ 

[PATCH v4 1/3] rust: kunit: add KUnit case and suite macros

2024-10-31 Thread David Gow
From: José Expósito 

Add a couple of Rust const functions and macros to allow to develop
KUnit tests without relying on generated C code:

 - The `kunit_unsafe_test_suite!` Rust macro is similar to the
   `kunit_test_suite` C macro. It requires a NULL-terminated array of
   test cases (see below).
 - The `kunit_case` Rust function is similar to the `KUNIT_CASE` C macro.
   It generates a case from the name and function.
 - The `kunit_case_null` Rust function generates a NULL test case, which
   is to be used as delimiter in `kunit_unsafe_test_suite!`.

While these functions and macros can be used on their own, a future
patch will introduce another macro to create KUnit tests using a
user-space like syntax.

Signed-off-by: José Expósito 
Co-developed-by: Matt Gilbride 
Signed-off-by: Matt Gilbride 
Co-developed-by: David Gow 
Signed-off-by: David Gow 
---

Changes since v3:
https://lore.kernel.org/linux-kselftest/20241030045719.3085147-4-david...@google.com/
- The kunit_unsafe_test_suite!() macro now panic!s if the suite name is
  too long, triggering a compile error. (Thanks, Alice!)

Changes since v2:
https://lore.kernel.org/linux-kselftest/20241029092422.2884505-2-david...@google.com/
- The kunit_unsafe_test_suite!() macro will truncate the name of the
  suite if it is too long. (Thanks Alice!)
- We no longer needlessly use UnsafeCell<> in
  kunit_unsafe_test_suite!(). (Thanks Alice!)

Changes since v1:
https://lore.kernel.org/lkml/20230720-rustbind-v1-1-c80db349e...@google.com/
- Rebase on top of rust-next
- As a result, KUnit attributes are new set. These are hardcoded to the
  defaults of "normal" speed and no module name.
- Split the kunit_case!() macro into two const functions, kunit_case()
  and kunit_case_null() (for the NULL terminator).

---
 rust/kernel/kunit.rs | 112 +++
 rust/kernel/lib.rs   |   1 +
 2 files changed, 113 insertions(+)

diff --git a/rust/kernel/kunit.rs b/rust/kernel/kunit.rs
index 824da0e9738a..85bc1faff0d5 100644
--- a/rust/kernel/kunit.rs
+++ b/rust/kernel/kunit.rs
@@ -161,3 +161,115 @@ macro_rules! kunit_assert_eq {
 $crate::kunit_assert!($name, $file, $diff, $left == $right);
 }};
 }
+
+/// Represents an individual test case.
+///
+/// The test case should have the signature
+/// `unsafe extern "C" fn test_case(test: *mut crate::bindings::kunit)`.
+///
+/// The `kunit_unsafe_test_suite!` macro expects a NULL-terminated list of test cases.
+/// Use `kunit_case_null` to generate such a delimiter.
+const fn kunit_case(name: &'static kernel::str::CStr, run_case: unsafe extern "C" fn(*mut kernel::bindings::kunit)) -> kernel::bindings::kunit_case {
+kernel::bindings::kunit_case {
+run_case: Some(run_case),
+name: name.as_char_ptr(),
+attr: kernel::bindings::kunit_attributes {
+speed: kernel::bindings::kunit_speed_KUNIT_SPEED_NORMAL,
+},
+generate_params: None,
+status: kernel::bindings::kunit_status_KUNIT_SUCCESS,
+module_name: core::ptr::null_mut(),
+log: core::ptr::null_mut(),
+}
+}
+
+/// Represents the NULL test case delimiter.
+///
+/// The `kunit_unsafe_test_suite!` macro expects a NULL-terminated list of test cases. This
+/// function returns such a delimiter.
+const fn kunit_case_null() -> kernel::bindings::kunit_case {
+kernel::bindings::kunit_case {
+run_case: None,
+name: core::ptr::null_mut(),
+generate_params: None,
+attr: kernel::bindings::kunit_attributes {
+speed: kernel::bindings::kunit_speed_KUNIT_SPEED_NORMAL,
+},
+status: kernel::bindings::kunit_status_KUNIT_SUCCESS,
+module_name: core::ptr::null_mut(),
+log: core::ptr::null_mut(),
+}
+}
+
+
+/// Registers a KUnit test suite.
+///
+/// # Safety
+///
+/// `test_cases` must be a NULL terminated array of test cases.
+///
+/// # Examples
+///
+/// ```ignore
+/// unsafe extern "C" fn test_fn(_test: *mut crate::bindings::kunit) {
+/// let actual = 1 + 1;
+/// let expected = 2;
+/// assert_eq!(actual, expected);
+/// }
+///
+/// static mut KUNIT_TEST_CASE: crate::bindings::kunit_case = crate::kunit_case(name, test_fn);
+/// static mut KUNIT_NULL_CASE: crate::bindings::kunit_case = crate::kunit_case_null();
+/// static mut KUNIT_TEST_CASES: &mut[crate::bindings::kunit_case] = unsafe {
+/// &mut[KUNIT_TEST_CASE, KUNIT_NULL_CASE]
+/// };
+/// crate::kunit_unsafe_test_suite!(suite_name, KUNIT_TEST_CASES);
+/// ```
+#[macro_export]
+macro_rules! kunit_unsafe_test_suite {
+($name:ident, $test_cases:ident) => {
+const _: () = {
+static KUNIT_TEST_SUITE_NAME: [i8; 256] = {
+let name_u8 = core::stringify!($name).as_bytes();
+let mut ret = [0; 256];
+
+if name_u8.len() > 255 {
+panic!(concat!("The test suite name `", core::stringify!($name), "` exceeds the maximum length of 255 bytes."));
+ 

[PATCH v4 3/3] rust: kunit: allow to know if we are in a test

2024-10-31 Thread David Gow
From: José Expósito 

In some cases, we need to call test-only code from outside the test
case, for example, to mock a function or a module.

In order to check whether we are in a test or not, we need to test if
`CONFIG_KUNIT` is set.
Unfortunately, we cannot rely only on this condition because:
- a test could be running in another thread,
- some distros compile KUnit in production kernels, so checking at runtime
  that `current->kunit_test != NULL` is required.

Fortunately, KUnit provides an optimised check in
`kunit_get_current_test()`, which checks CONFIG_KUNIT, a global static
key, and then the current thread's running KUnit test.

Add a safe wrapper function around this to know whether or not we are in
a KUnit test and examples showing how to mock a function and a module.

Signed-off-by: José Expósito 
Co-developed-by: David Gow 
Signed-off-by: David Gow 
---

Changes since v3:
https://lore.kernel.org/linux-kselftest/20241030045719.3085147-8-david...@google.com/
- The example test has been updated to no longer use assert_eq!() with
  a constant bool argument (fixes a clippy warning).

No changes since v2:
https://lore.kernel.org/linux-kselftest/20241029092422.2884505-4-david...@google.com/

Changes since v1:
https://lore.kernel.org/lkml/20230720-rustbind-v1-3-c80db349e...@google.com/
- Rebased on top of rust-next.
- Use the `kunit_get_current_test()` C function, which wasn't previously
  available, instead of rolling our own.
- (Thanks also to Boqun for suggesting a nicer way of implementing this,
  which I tried, but the `kunit_get_current_test()` version obsoleted.)

---
 rust/kernel/kunit.rs | 72 
 1 file changed, 72 insertions(+)

diff --git a/rust/kernel/kunit.rs b/rust/kernel/kunit.rs
index 71ce1d145be8..ad38d6d62446 100644
--- a/rust/kernel/kunit.rs
+++ b/rust/kernel/kunit.rs
@@ -276,10 +276,82 @@ macro_rules! kunit_unsafe_test_suite {
 };
 }
 
+/// In some cases, you need to call test-only code from outside the test case, for example, to
+/// create a function mock. This function can be invoked to know whether we are currently running a
+/// KUnit test or not.
+///
+/// # Examples
+///
+/// This example shows how a function can be mocked to return a well-known value while testing:
+///
+/// ```
+/// # use kernel::kunit::in_kunit_test;
+/// #
+/// fn fn_mock_example(n: i32) -> i32 {
+/// if in_kunit_test() {
+/// 100
+/// } else {
+/// n + 1
+/// }
+/// }
+///
+/// let mock_res = fn_mock_example(5);
+/// assert_eq!(mock_res, 100);
+/// ```
+///
+/// Sometimes, you don't control the code that needs to be mocked. This example shows how the
+/// `bindings` module can be mocked:
+///
+/// ```
+/// // Import our mock naming it as the real module.
+/// #[cfg(CONFIG_KUNIT)]
+/// use bindings_mock_example as bindings;
+///
+/// // This module mocks `bindings`.
+/// mod bindings_mock_example {
+/// use kernel::kunit::in_kunit_test;
+/// use kernel::bindings::u64_;
+///
+/// // Make the other binding functions available.
+/// pub(crate) use kernel::bindings::*;
+///
+/// // Mock `ktime_get_boot_fast_ns` to return a well-known value when running a KUnit test.
+/// pub(crate) unsafe fn ktime_get_boot_fast_ns() -> u64_ {
+/// if in_kunit_test() {
+/// 1234
+/// } else {
+/// unsafe { kernel::bindings::ktime_get_boot_fast_ns() }
+/// }
+/// }
+/// }
+///
+/// // This is the function we want to test. Since `bindings` has been mocked, we can use its
+/// // functions seamlessly.
+/// fn get_boot_ns() -> u64 {
+/// unsafe { bindings::ktime_get_boot_fast_ns() }
+/// }
+///
+/// let time = get_boot_ns();
+/// assert_eq!(time, 1234);
+/// ```
+pub fn in_kunit_test() -> bool {
+// SAFETY: kunit_get_current_test() is always safe to call from C (it has fallbacks for
+// when KUnit is not enabled), and we're only comparing the result to NULL.
+unsafe { !bindings::kunit_get_current_test().is_null() }
+}
+
 #[kunit_tests(rust_kernel_kunit)]
 mod tests {
+use super::*;
+
 #[test]
 fn rust_test_kunit_example_test() {
 assert_eq!(1 + 1, 2);
 }
+
+#[test]
+fn rust_test_kunit_in_kunit_test() {
+let in_kunit = in_kunit_test();
+assert!(in_kunit);
+}
 }
-- 
2.47.0.199.ga7371fff76-goog




Re: [RFC v2 11/13] rust: sync: Add memory barriers

2024-10-31 Thread David Gow
On Fri, 1 Nov 2024 at 14:07, Boqun Feng  wrote:
>
> Memory barriers are building blocks for concurrent code, hence provide
> a minimal set of them.
>
> The compiler barrier, barrier(), is implemented in inline asm instead of
> using core::sync::atomic::compiler_fence() because memory models are
> different: kernel's atomics are implemented in inline asm therefore the
> compiler barrier should be implemented in inline asm as well.
>
> Signed-off-by: Boqun Feng 
> ---
>  rust/helpers/helpers.c  |  1 +
>  rust/kernel/sync.rs |  1 +
>  rust/kernel/sync/barrier.rs | 67 +
>  3 files changed, 69 insertions(+)
>  create mode 100644 rust/kernel/sync/barrier.rs
>
> diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
> index ab5a3f1be241..f4a94833b29d 100644
> --- a/rust/helpers/helpers.c
> +++ b/rust/helpers/helpers.c
> @@ -8,6 +8,7 @@
>   */
>
>  #include "atomic.c"
> +#include "barrier.c"

It looks like "barrier.c" is missing, so this isn't compiling for me.
I assume it was meant to be added in this patch...?

Thanks,
-- David




Re: [PATCH v3] selftests: clone3: Use the capget and capset syscall directly

2024-10-31 Thread Shuah Khan

On 10/29/24 20:50, zhouyuhang wrote:

From: zhouyuhang 

The libcap commit aca076443591 ("Make cap_t operations thread safe.")
added a __u8 mutex at the beginning of struct _cap_struct, which changes
the offset of the members in the structure and breaks the assumption
made in the "struct libcap" definition in clone3_cap_checkpoint_restore.c.
This will cause the test case to fail with the following output:

  #  RUN   global.clone3_cap_checkpoint_restore ...
  # clone3() syscall supported
  # clone3_cap_checkpoint_restore.c:151:clone3_cap_checkpoint_restore:Child has PID 130508
  cap_set_proc: Operation not permitted


Sounds like EPERM is returned here. What's the error number from
cap_set_proc()?
 

  # clone3_cap_checkpoint_restore.c:160:clone3_cap_checkpoint_restore:Expected set_capability() (-1) == 0 (0)


What's the error number here? Looks like this test simply
uses perror - it is better to use strerror() which includes
the error number.

Is this related to EPERM?
 

  # clone3_cap_checkpoint_restore.c:161:clone3_cap_checkpoint_restore:Could not set CAP_CHECKPOINT_RESTORE
  # clone3_cap_checkpoint_restore: Test terminated by assertion
  #  FAIL  global.clone3_cap_checkpoint_restore

Changing to use the capget and capset syscalls directly here fixes this error,
just like commit 663af70aabb7 ("bpf: selftests: Add helpers to directly
use the capget and capset syscall") does.


Is this still accurate for v3 - Does this patch match the
bpf commit?

What is the output with this patch? Include it in the change log.



Signed-off-by: zhouyuhang 
---


Please mention the changes from v2 to v3 here so it is
easier for reviewers to associate the changes with the review comments.

I had to go look up v1 and v2.


  .../clone3/clone3_cap_checkpoint_restore.c| 58 +--
  1 file changed, 27 insertions(+), 31 deletions(-)

diff --git a/tools/testing/selftests/clone3/clone3_cap_checkpoint_restore.c b/tools/testing/selftests/clone3/clone3_cap_checkpoint_restore.c
index 3c196fa86c99..8b61702bf721 100644
--- a/tools/testing/selftests/clone3/clone3_cap_checkpoint_restore.c
+++ b/tools/testing/selftests/clone3/clone3_cap_checkpoint_restore.c
@@ -27,6 +27,13 @@
  #include "../kselftest_harness.h"
  #include "clone3_selftests.h"
  
+/*

+ * Prevent not being defined in the header file
+ */
+#ifndef CAP_CHECKPOINT_RESTORE
+#define CAP_CHECKPOINT_RESTORE 40
+#endif
+
  static void child_exit(int ret)
  {
fflush(stdout);
@@ -87,47 +94,36 @@ static int test_clone3_set_tid(struct __test_metadata 
*_metadata,
return ret;
  }
  
-struct libcap {

-   struct __user_cap_header_struct hdr;
-   struct __user_cap_data_struct data[2];
-};
-
  static int set_capability(void)
  {
-   cap_value_t cap_values[] = { CAP_SETUID, CAP_SETGID };
-   struct libcap *cap;
-   int ret = -1;
-   cap_t caps;
-
-   caps = cap_get_proc();
-   if (!caps) {
-   perror("cap_get_proc");
+   struct __user_cap_data_struct data[2];
+   struct __user_cap_header_struct hdr = {
+   .version = _LINUX_CAPABILITY_VERSION_3,


cap_validate_magic() handles _LINUX_CAPABILITY_VERSION_1,
_LINUX_CAPABILITY_VERSION_2, and _LINUX_CAPABILITY_VERSION_3

It would help to add a comment on why it is necessary to
set the version here.


+   };
+   __u32 cap0 = 1 << CAP_SETUID | 1 << CAP_SETGID;
+   __u32 cap1 = 1 << (CAP_CHECKPOINT_RESTORE - 32);


Explain why this is necessary - a comment will help future
maintenance of this code.


+   int ret;
+
+   ret = capget(&hdr, data);
+   if (ret) {
+   perror("capget");




return -1;
}
  
  	/* Drop all capabilities */

-   if (cap_clear(caps)) {
-   perror("cap_clear");
-   goto out;
-   }
-
-   cap_set_flag(caps, CAP_EFFECTIVE, 2, cap_values, CAP_SET);
-   cap_set_flag(caps, CAP_PERMITTED, 2, cap_values, CAP_SET);
+   memset(&data, 0, sizeof(data));
  
-	cap = (struct libcap *) caps;

+   data[0].effective |= cap0;
+   data[0].permitted |= cap0;
  
-	/* 40 -> CAP_CHECKPOINT_RESTORE */

-   cap->data[1].effective |= 1 << (40 - 32);
-   cap->data[1].permitted |= 1 << (40 - 32);
+   data[1].effective |= cap1;
+   data[1].permitted |= cap1;
  
-	if (cap_set_proc(caps)) {

-   perror("cap_set_proc");
-   goto out;
+   ret = capset(&hdr, data);
+   if (ret) {
+   perror("capset");
+   return -1;
}
-   ret = 0;
-out:
-   if (cap_free(caps))
-   perror("cap_free");
return ret;
  }
  





Re: [PATCH v3 rcu 3/3] rcu: Finer-grained grace-period-end checks in rcu_dump_cpu_stacks()

2024-10-31 Thread 王正睿
On Thu, 2024-10-31 at 14:27 -0700, Doug Anderson wrote:
> Hi,
> 
> On Wed, Oct 30, 2024 at 10:03 PM Paul E. McKenney  wrote:
> > 
> > > > > Note that:
> > > > > 
> > > > > * Switching to real NMIs is impossible on many existing arm64 CPUs.
> > > > > The hardware support simply isn't there. Pseudo-NMIs should work fine
> > > > > and are (in nearly all cases) just as good as NMIs but they have a
> > > > > small performance impact. There are also compatibility issues with
> > > > > some pre-existing bootloaders. ...so code can't assume even Pseudo-NMI
> > > > > will work and needs to be able to fall back. Prior to my recent
> > > > > changes arm64 CPUs wouldn't even do stacktraces in some scenarios. Now
> > > > > at least they fall back to regular IPIs.
> > > > 
> > > > But those IPIs are of no help whatsoever when the target CPU is spinning
> > > > with interrupts disabled, which is the scenario under discussion.
> > > 
> > > Right that we can't trace the target CPU spinning with interrupts
> > > disabled.
> > 
> > You can by invoking sched_show_task(cpu_curr(cpu)) on the CPU requesting
> > the backtrace.  The resulting backtrace will not be as good as you
> > would get from using an NMI, but if you don't have NMIs, it is better
> > than nothing.
> > 
> > >   ...but in the case where NMI isn't available it _still_
> > > makes sense to make arch_trigger_cpumask_backtrace() fall back to IPI
> > > because:
> > > 
> > > 1. There are many cases where the trigger.*cpu.*backtrace() class of
> > > functions are used to trace CPUs that _aren't_ looping with interrupts
> > > disabled. We still want to be able to backtrace in that case. For
> > > instance, you can see in "panic.c" that there are cases where
> > > trigger_all_cpu_backtrace() is called. It's valuable to make cases
> > > like this (where there's no indication that a CPU is locked) actually
> > > work.
> > > 
> > > 2. Even if there's a CPU that's looping with interrupts disabled and
> > > we can't trace it because of no NMI, it could still be useful to get
> > > backtraces of other CPUs. It's possible that what the other CPUs are
> > > doing could give a hint about the CPU that's wedged. This isn't a
> > > great hint, but some info is better than no info.
> > 
> > And it also makes sense for an IRQ-based backtrace to fall back to
> > something like the aforementioned sched_show_task(cpu_curr(cpu)) if
> > the destination CPU has interrupts disabled.
> > 
> > > > Hence my suggestion that arm64, when using IRQs to request stack
> > > > backtraces, have an additional short timeout (milliseconds) in
> > > > order to fall back to remote stack tracing.  In many cases, this
> > > > fallback happens automatically, as you can see in dump_cpu_task().
> > > > If trigger_single_cpu_backtrace() returns false, the stack is dumped
> > > > remotely.
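The fallback Paul describes can be sketched as follows. This is a simplified paraphrase of the pattern in dump_cpu_task() (kernel/sched/core.c), not a verbatim copy; it is kernel code and so not standalone-runnable:

```c
/* Simplified sketch of the fallback pattern in dump_cpu_task(). */
void dump_cpu_task(int cpu)
{
	/* Preferred: ask the target CPU to dump its own stack (NMI/IPI). */
	if (trigger_single_cpu_backtrace(cpu))
		return;

	/*
	 * Fallback: dump the target CPU's current task remotely.  Less
	 * accurate than a self-dump, but better than nothing when the
	 * backtrace NMI/IPI cannot be delivered or is unsupported.
	 */
	pr_info("Task dump for CPU %d:\n", cpu);
	sched_show_task(cpu_curr(cpu));
}
```

Moving this two-step pattern into the common trigger.*cpu.*backtrace() code, as suggested below in the thread, would give every caller the fallback without touching each call site.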
> > > 
> > > I think the answer here is that the API needs to change. The whole
> > > boolean return value for trigger.*cpu.*backtrace() is mostly ignored
> > > by callers. Yes, you've pointed at one of the two places that it's not
> > > ignored and it tries a reasonable fallback, but all the other callers
> > > just silently do nothing when the function returns false. That led to
> > > many places where arm64 devices were simply not getting _any_
> > > stacktrace.
> > > 
> > > Perhaps we need some type of API that says "I actually have a
> > > fallback, so if you don't think the stacktrace will succeed then it's
> > > OK to return false".
> > 
> > Boolean is fine for trigger_single_cpu_backtrace(), which is directed at
> > a single CPU.  And the one call to this function that does not currently
> > check its return value could just call dump_cpu_task() instead.
> > 
> > There are only 20 or so uses of all of these functions, so the API can
> > change, give or take the pain involved in changing architecture-specific
> > code.
> 
> Right. Falling back to "sched_show_task(cpu_curr(cpu))" or something
> similar if the IPI doesn't go through is a good idea. Indeed, falling
> back to that if the NMI doesn't go through is _also_ a good idea,
> right? ...and we don't want to change all 20 callers just to add a
> fallback. To that end, it feels like someone should change it so that
> the common code includes the fallback and we get rid of the boolean
> return value.
> 
> > > > > * Even if we decided that we absolutely had to disable stacktraces on
> > > > > arm64 CPUs without some type of NMI, that won't fix arm32. arm32 has
> > > > > been using regular IPIs for backtraces like this for many, many years.
> > > > > Maybe folks don't care as much about arm32 anymore but it feels bad if
> > > > > we totally break it.
> > > > 
> > > > Because arm32 isn't used much for datacenter workloads, it will not
> > > > be suffering from this issue.  Not enough to have motivated anyone to
> > > > fix it, anyway.  In contrast, in the datacenter, CPUs really can and
> > > > do have interrupts disabled for many seconds.  (No, this is not a good