date:20240122

Re: [PATCH] selftests/mm: run_vmtests.sh: add missing tests

2024-01-22 Thread Muhammad Usama Anjum

On 1/19/24 9:09 PM, Ryan Roberts wrote:
> Hi Muhammad,
> 
> Afraid this patch is causing a regression on our CI system when it turned up in
> linux-next today. Additionally, 2 of the tests you have added are failing because
> the scripts are not exported correctly...
Andrew has dropped this patch for now.

> 
> On 16/01/2024 09:06, Muhammad Usama Anjum wrote:
>> Add missing tests to run_vmtests.sh. The mm kselftests are run through
>> run_vmtests.sh. If a test isn't present in this script, it'll not run
>> with run_tests or `make -C tools/testing/selftests/mm run_tests`.
>>
>> Signed-off-by: Muhammad Usama Anjum 
>> ---
>>  tools/testing/selftests/mm/run_vmtests.sh | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/tools/testing/selftests/mm/run_vmtests.sh 
>> b/tools/testing/selftests/mm/run_vmtests.sh
>> index 246d53a5d7f2..a5e6ba8d3579 100755
>> --- a/tools/testing/selftests/mm/run_vmtests.sh
>> +++ b/tools/testing/selftests/mm/run_vmtests.sh
>> @@ -248,6 +248,9 @@ CATEGORY="hugetlb" run_test ./map_hugetlb
>>  CATEGORY="hugetlb" run_test ./hugepage-mremap
>>  CATEGORY="hugetlb" run_test ./hugepage-vmemmap
>>  CATEGORY="hugetlb" run_test ./hugetlb-madvise
>> +CATEGORY="hugetlb" run_test ./charge_reserved_hugetlb.sh
>> +CATEGORY="hugetlb" run_test ./hugetlb_reparenting_test.sh
> 
> These 2 tests are failing because the test scripts are not exported. You will
> need to add them to the TEST_FILES variable in the Makefile.
This must be done. Even after adding them, I'll investigate whether these
scripts are robust enough to pass.

> 
>> +CATEGORY="hugetlb" run_test ./hugetlb-read-hwpoison
> 
> The addition of this test causes 2 later tests to fail with ENOMEM. I suspect
> it's a side-effect of marking the hugetlbs as hwpoisoned? (just a guess based on
> the test name!). Once a page is marked poisoned, is there a way to un-poison it?
> If not, I suspect that's why it wasn't part of the standard test script in the
> first place.
hugetlb-read-hwpoison probably failed because the kernel fix for the test
hasn't been merged yet. The other tests (uffd-stress) aren't failing on my
end or on CI [1][2].

[1] https://lava.collabora.dev/scheduler/job/12577207#L3677
[2] https://lava.collabora.dev/scheduler/job/12577229#L4027

Maybe it's a configuration issue which is exposed now. Not sure. Maybe
hugetlb-read-hwpoison is changing some configuration and not restoring it.
Maybe your system has fewer hugetlb pages.

> 
> These are the tests that start failing:
> 
> # # 
> # # running ./uffd-stress hugetlb 128 32
> # # 
> # # nr_pages: 64, nr_pages_per_cpu: 8
> # # ERROR: context init failed (errno=12, @uffd-stress.c:254)
> # # [FAIL]
> # not ok 18 uffd-stress hugetlb 128 32 # exit=1
> # # 
> # # running ./uffd-stress hugetlb-private 128 32
> # # 
> # # nr_pages: 64, nr_pages_per_cpu: 8
> # # bounces: 31, mode: rnd racing ver poll, ERROR: UFFDIO_COPY error: -12ERROR:
> UFFDIO_COPY error: -12 (errno=12, @uffd-common.c:614)
> # #  (errno=12, @uffd-common.c:614)
> # # [FAIL]
> 
> Quickest way to repro is:
> 
> $ sudo ./run_vmtests.sh -t "userfaultfd hugetlb"
> 
> Thanks,
> Ryan
> 
> 
>>  
>>  nr_hugepages_tmp=$(cat /proc/sys/vm/nr_hugepages)
>>  # For this test, we need one and just one huge page
> 
> 

-- 
BR,
Muhammad Usama Anjum

[PATCH net] selftests/net/lib: update busywait timeout value

2024-01-22 Thread Hangbin Liu

The busywait timeout value is in milliseconds, not seconds. So the
current setting of 2 is meaningless. Let's copy WAIT_TIMEOUT from
forwarding/lib.sh and set a BUSYWAIT_TIMEOUT here.
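
For readers less familiar with the shell helper, the unit mismatch can be illustrated with a small C sketch of a millisecond-based busy-wait. This is not the busywait function from lib.sh; the names busywait_ms, now_ms and countdown_done are hypothetical, and only the 20 s default and the multiplication by 1000 come from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define WAIT_TIMEOUT_S		20
#define BUSYWAIT_TIMEOUT_MS	(WAIT_TIMEOUT_S * 1000)	/* seconds -> ms */

/* Monotonic clock reading in milliseconds. */
static int64_t now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Hypothetical helper: poll cond(arg) until it succeeds or timeout_ms
 * milliseconds have elapsed. Passing "2" here means 2 ms, not 2 s.
 */
static bool busywait_ms(int64_t timeout_ms, bool (*cond)(void *), void *arg)
{
	int64_t start = now_ms();

	for (;;) {
		if (cond(arg))
			return true;
		if (now_ms() - start > timeout_ms)
			return false;
	}
}

/* Example condition: succeeds once the counter pointed to by arg hits zero. */
static bool countdown_done(void *arg)
{
	int64_t *remaining = arg;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}
```

With a 2 ms budget, a condition that needs any real time to become true is reported as a failure almost immediately, which is the behaviour the patch is correcting.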

Signed-off-by: Hangbin Liu 
---
Not sure if the default WAIT_TIMEOUT of 20s is too large. But since we
usually don't need to wait that long, I think it's OK to keep the same
value as forwarding/lib.sh. Please tell me if you think we need to set
a more suitable value.

BTW, this doesn't look like a fix, but it's also not a feature. So I'm
just posting it to the net tree.
---
 tools/testing/selftests/net/lib.sh | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh
index dca549443801..f9fe182dfbd4 100644
--- a/tools/testing/selftests/net/lib.sh
+++ b/tools/testing/selftests/net/lib.sh
@@ -4,6 +4,9 @@
 ##
 # Defines
 
+WAIT_TIMEOUT=${WAIT_TIMEOUT:=20}
+BUSYWAIT_TIMEOUT=$((WAIT_TIMEOUT * 1000)) # ms
+
 # Kselftest framework requirement - SKIP code is 4.
 ksft_skip=4
 # namespace list created by setup_ns
@@ -48,7 +51,7 @@ cleanup_ns()
 
 	for ns in "$@"; do
 		ip netns delete "${ns}" &> /dev/null
-		if ! busywait 2 ip netns list \| grep -vq "^$ns$" &> /dev/null; then
+		if ! busywait $BUSYWAIT_TIMEOUT ip netns list \| grep -vq "^$ns$" &> /dev/null; then
 			echo "Warn: Failed to remove namespace $ns"
 			ret=1
 		fi
-- 
2.43.0

[PATCH v5 00/12] RISCV: Add kvm Sstc timer selftests

2024-01-22 Thread Haibo Xu

The RISC-V arch_timer selftest is used to validate Sstc timer
functionality in a guest. It sets up periodic timer interrupts
and checks the basic interrupt status upon receipt of each one.

This KVM selftest was ported from the aarch64 arch_timer test and
tested with Linux v6.7-rc8 on a QEMU riscv64 virt machine.

---
Changed since v4:
  * Rebased to Linux 6.7-rc8
  * Added new patch(2/12) to clean up the data type in struct test_args
  * Re-ordered patch(11/11) in v4 to patch(3/12)
  * Changed the timer_err_margin_us type from int to uint32_t

Haibo Xu (11):
  KVM: arm64: selftests: Data type cleanup for arch_timer test
  KVM: arm64: selftests: Enable tuning of error margin in arch_timer
test
  KVM: arm64: selftests: Split arch_timer test code
  KVM: selftests: Add CONFIG_64BIT definition for the build
  tools: riscv: Add header file csr.h
  tools: riscv: Add header file vdso/processor.h
  KVM: riscv: selftests: Switch to use macro from csr.h
  KVM: riscv: selftests: Add exception handling support
  KVM: riscv: selftests: Add guest helper to get vcpu id
  KVM: riscv: selftests: Change vcpu_has_ext to a common function
  KVM: riscv: selftests: Add sstc timer test

Paolo Bonzini (1):
  selftests/kvm: Fix issues with $(SPLIT_TESTS)

 tools/arch/riscv/include/asm/csr.h| 541 ++
 tools/arch/riscv/include/asm/vdso/processor.h |  32 ++
 tools/testing/selftests/kvm/Makefile  |  27 +-
 .../selftests/kvm/aarch64/arch_timer.c| 295 +-
 tools/testing/selftests/kvm/arch_timer.c  | 259 +
 .../selftests/kvm/include/aarch64/processor.h |   4 -
 .../selftests/kvm/include/kvm_util_base.h |   9 +
 .../selftests/kvm/include/riscv/arch_timer.h  |  71 +++
 .../selftests/kvm/include/riscv/processor.h   |  65 ++-
 .../testing/selftests/kvm/include/test_util.h |   2 +
 .../selftests/kvm/include/timer_test.h|  45 ++
 .../selftests/kvm/lib/riscv/handlers.S| 101 
 .../selftests/kvm/lib/riscv/processor.c   |  87 +++
 .../testing/selftests/kvm/riscv/arch_timer.c  | 111 
 .../selftests/kvm/riscv/get-reg-list.c|  11 +-
 15 files changed, 1353 insertions(+), 307 deletions(-)
 create mode 100644 tools/arch/riscv/include/asm/csr.h
 create mode 100644 tools/arch/riscv/include/asm/vdso/processor.h
 create mode 100644 tools/testing/selftests/kvm/arch_timer.c
 create mode 100644 tools/testing/selftests/kvm/include/riscv/arch_timer.h
 create mode 100644 tools/testing/selftests/kvm/include/timer_test.h
 create mode 100644 tools/testing/selftests/kvm/lib/riscv/handlers.S
 create mode 100644 tools/testing/selftests/kvm/riscv/arch_timer.c

-- 
2.34.1

[PATCH v5 01/12] selftests/kvm: Fix issues with $(SPLIT_TESTS)

2024-01-22 Thread Haibo Xu

From: Paolo Bonzini 

The introduction of $(SPLIT_TESTS) also introduced a warning when
building selftests on architectures that include get-reg-lists:

make: Entering directory '/root/kvm/tools/testing/selftests/kvm'
Makefile:272: warning: overriding recipe for target '/root/kvm/tools/testing/selftests/kvm/get-reg-list'
Makefile:267: warning: ignoring old recipe for target '/root/kvm/tools/testing/selftests/kvm/get-reg-list'
make: Leaving directory '/root/kvm/tools/testing/selftests/kvm'

In addition, the rule for $(SPLIT_TESTS_TARGETS) includes _all_
the $(SPLIT_TESTS_OBJS), which only works because there is just one.
So fix both by adjusting the rules:

- remove $(SPLIT_TESTS_TARGETS) from the $(TEST_GEN_PROGS) rules,
  and rename it to $(SPLIT_TEST_GEN_PROGS)

- fix $(SPLIT_TESTS_OBJS) so that it plays well with $(OUTPUT),
  rename it to $(SPLIT_TEST_GEN_OBJ), and list the object file
  explicitly in the $(SPLIT_TEST_GEN_PROGS) link rule

Fixes: 17da79e009c3 ("KVM: arm64: selftests: Split get-reg-list test code", 2023-08-09)
Signed-off-by: Paolo Bonzini 
Tested-by: Andrew Jones 
---
 tools/testing/selftests/kvm/Makefile | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 3e0c36b8ddd5..c5e9abb185b6 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -257,32 +257,36 @@ LIBKVM_C_OBJ := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBKVM_C))
 LIBKVM_S_OBJ := $(patsubst %.S, $(OUTPUT)/%.o, $(LIBKVM_S))
 LIBKVM_STRING_OBJ := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBKVM_STRING))
 LIBKVM_OBJS = $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ) $(LIBKVM_STRING_OBJ)
-SPLIT_TESTS_TARGETS := $(patsubst %, $(OUTPUT)/%, $(SPLIT_TESTS))
-SPLIT_TESTS_OBJS := $(patsubst %, $(ARCH_DIR)/%.o, $(SPLIT_TESTS))
+SPLIT_TEST_GEN_PROGS := $(patsubst %, $(OUTPUT)/%, $(SPLIT_TESTS))
+SPLIT_TEST_GEN_OBJ := $(patsubst %, $(OUTPUT)/$(ARCH_DIR)/%.o, $(SPLIT_TESTS))
 
 TEST_GEN_OBJ = $(patsubst %, %.o, $(TEST_GEN_PROGS))
 TEST_GEN_OBJ += $(patsubst %, %.o, $(TEST_GEN_PROGS_EXTENDED))
 TEST_DEP_FILES = $(patsubst %.o, %.d, $(TEST_GEN_OBJ))
 TEST_DEP_FILES += $(patsubst %.o, %.d, $(LIBKVM_OBJS))
-TEST_DEP_FILES += $(patsubst %.o, %.d, $(SPLIT_TESTS_OBJS))
+TEST_DEP_FILES += $(patsubst %.o, %.d, $(SPLIT_TEST_GEN_OBJ))
 -include $(TEST_DEP_FILES)
 
-$(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED): %: %.o
+x := $(shell mkdir -p $(sort $(OUTPUT)/$(ARCH_DIR) $(dir $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ))))
+
+$(filter-out $(SPLIT_TEST_GEN_PROGS), $(TEST_GEN_PROGS)) \
+$(TEST_GEN_PROGS_EXTENDED): %: %.o
 	$(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH) $< $(LIBKVM_OBJS) $(LDLIBS) -o $@
 $(TEST_GEN_OBJ): $(OUTPUT)/%.o: %.c
 	$(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
 
-$(SPLIT_TESTS_TARGETS): %: %.o $(SPLIT_TESTS_OBJS)
+$(SPLIT_TEST_GEN_PROGS): $(OUTPUT)/%: $(OUTPUT)/%.o $(OUTPUT)/$(ARCH_DIR)/%.o
 	$(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH) $^ $(LDLIBS) -o $@
+$(SPLIT_TEST_GEN_OBJ): $(OUTPUT)/$(ARCH_DIR)/%.o: $(ARCH_DIR)/%.c
+	$(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
 
 EXTRA_CLEAN += $(GEN_HDRS) \
   $(LIBKVM_OBJS) \
-  $(SPLIT_TESTS_OBJS) \
+  $(SPLIT_TEST_GEN_OBJ) \
   $(TEST_DEP_FILES) \
   $(TEST_GEN_OBJ) \
   cscope.*
 
-x := $(shell mkdir -p $(sort $(dir $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ))))
 $(LIBKVM_C_OBJ): $(OUTPUT)/%.o: %.c $(GEN_HDRS)
 	$(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
 
@@ -296,7 +300,7 @@ $(LIBKVM_STRING_OBJ): $(OUTPUT)/%.o: %.c
 	$(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c -ffreestanding $< -o $@
 
 x := $(shell mkdir -p $(sort $(dir $(TEST_GEN_PROGS))))
-$(SPLIT_TESTS_OBJS): $(GEN_HDRS)
+$(SPLIT_TEST_GEN_OBJ): $(GEN_HDRS)
 $(TEST_GEN_PROGS): $(LIBKVM_OBJS)
 $(TEST_GEN_PROGS_EXTENDED): $(LIBKVM_OBJS)
 $(TEST_GEN_OBJ): $(GEN_HDRS)
-- 
2.34.1

[PATCH v5 02/12] KVM: arm64: selftests: Data type cleanup for arch_timer test

2024-01-22 Thread Haibo Xu

Change the signed types to unsigned in struct test_args, whose
members only make sense as unsigned values.
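
As an illustration only (not part of the patch), the sketch below shows why the conversion macro moves to an unsigned 64-bit constant along with the fields: a uint32_t period multiplied by 1000ULL is computed in unsigned 64-bit arithmetic, so even the largest period cannot wrap. The helper name period_to_usecs is hypothetical.

```c
#include <stdint.h>

/* Same shape as the macro after this patch. */
#define msecs_to_usecs(msec)	((msec) * 1000ULL)

/* Hypothetical helper: a uint32_t period in ms widened to microseconds. */
static inline uint64_t period_to_usecs(uint32_t timer_period_ms)
{
	/* uint32_t * unsigned long long is evaluated as unsigned 64-bit. */
	return msecs_to_usecs(timer_period_ms);
}
```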

Suggested-by: Andrew Jones 
Signed-off-by: Haibo Xu 
---
 tools/testing/selftests/kvm/aarch64/arch_timer.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/kvm/aarch64/arch_timer.c b/tools/testing/selftests/kvm/aarch64/arch_timer.c
index 274b8465b42a..3260fefcc1b3 100644
--- a/tools/testing/selftests/kvm/aarch64/arch_timer.c
+++ b/tools/testing/selftests/kvm/aarch64/arch_timer.c
@@ -42,10 +42,10 @@
 #define TIMER_TEST_MIGRATION_FREQ_MS   2
 
 struct test_args {
-   int nr_vcpus;
-   int nr_iter;
-   int timer_period_ms;
-   int migration_freq_ms;
+   uint32_t nr_vcpus;
+   uint32_t nr_iter;
+   uint32_t timer_period_ms;
+   uint32_t migration_freq_ms;
struct kvm_arm_counter_offset offset;
 };
 
@@ -57,7 +57,7 @@ static struct test_args test_args = {
.offset = { .reserved = 1 },
 };
 
-#define msecs_to_usecs(msec)   ((msec) * 1000LL)
+#define msecs_to_usecs(msec)   ((msec) * 1000ULL)
 
 #define GICD_BASE_GPA			0x8000000ULL
 #define GICR_BASE_GPA			0x80A0000ULL
@@ -72,7 +72,7 @@ enum guest_stage {
 
 /* Shared variables between host and guest */
 struct test_vcpu_shared_data {
-   int nr_iter;
+   uint32_t nr_iter;
enum guest_stage guest_stage;
uint64_t xcnt;
 };
-- 
2.34.1

[PATCH v5 03/12] KVM: arm64: selftests: Enable tuning of error margin in arch_timer test

2024-01-22 Thread Haibo Xu

Intermittent failures occur when stressing the
arch-timer test in a QEMU VM:

 Guest assert failed,  vcpu 0; stage; 4; iter: 3
  Test Assertion Failure 
   aarch64/arch_timer.c:196: config_iter + 1 == irq_iter
   pid=4048 tid=4049 errno=4 - Interrupted system call
  1  0x0040253b: test_vcpu_run at arch_timer.c:248
  2  0xb60dd5c7: ?? ??:0
  3  0xb6145d1b: ?? ??:0
   0x3 != 0x2 (config_iter + 1 != irq_iter)e

Further testing and debugging show that the time it takes for an
interrupt to arrive fluctuates widely and randomly, especially when
testing in a virtual environment.

To alleviate this issue, expose the timeout value as a user-configurable
option and print a hint to increase the value when the failure is hit.
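
A minimal sketch of the arithmetic involved, assuming the defaults used by the test (10 ms period, 100 us margin). The helper names irq_wait_budget_us and irq_in_time are hypothetical; in the test itself the wait is a udelay() followed by a comparison of iteration counters.

```c
#include <stdbool.h>
#include <stdint.h>

#define msecs_to_usecs(msec)	((msec) * 1000ULL)

/* Hypothetical: how long the guest waits before checking for the interrupt. */
static uint64_t irq_wait_budget_us(uint32_t timer_period_ms,
				   uint32_t timer_err_margin_us)
{
	return msecs_to_usecs(timer_period_ms) + timer_err_margin_us;
}

/* Hypothetical: would an interrupt arriving after latency_us be in time? */
static bool irq_in_time(uint64_t latency_us, uint32_t timer_period_ms,
			uint32_t timer_err_margin_us)
{
	return latency_us <= irq_wait_budget_us(timer_period_ms,
						timer_err_margin_us);
}
```

An interrupt that lands 10250 us after being armed misses the default 10100 us budget but is accepted once the margin is raised to, say, 500 us with -e.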

Signed-off-by: Haibo Xu 
Reviewed-by: Andrew Jones 
---
 .../selftests/kvm/aarch64/arch_timer.c| 32 +--
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/kvm/aarch64/arch_timer.c b/tools/testing/selftests/kvm/aarch64/arch_timer.c
index 3260fefcc1b3..9b9a119bdd61 100644
--- a/tools/testing/selftests/kvm/aarch64/arch_timer.c
+++ b/tools/testing/selftests/kvm/aarch64/arch_timer.c
@@ -6,16 +6,18 @@
  * CVAL and TVAL registers. This consitutes the four stages in the test.
  * The guest's main thread configures the timer interrupt for a stage
  * and waits for it to fire, with a timeout equal to the timer period.
- * It asserts that the timeout doesn't exceed the timer period.
+ * It asserts that the timeout doesn't exceed the timer period plus
+ * a user configurable error margin(default to 100us).
  *
  * On the other hand, upon receipt of an interrupt, the guest's interrupt
  * handler validates the interrupt by checking if the architectural state
  * is in compliance with the specifications.
  *
  * The test provides command-line options to configure the timer's
- * period (-p), number of vCPUs (-n), and iterations per stage (-i).
- * To stress-test the timer stack even more, an option to migrate the
- * vCPUs across pCPUs (-m), at a particular rate, is also provided.
+ * period (-p), number of vCPUs (-n), iterations per stage (-i) and timer
+ * interrupt arrival error margin (-e). To stress-test the timer stack
+ * even more, an option to migrate the vCPUs across pCPUs (-m), at a
+ * particular rate, is also provided.
  *
  * Copyright (c) 2021, Google LLC.
  */
@@ -46,6 +48,7 @@ struct test_args {
uint32_t nr_iter;
uint32_t timer_period_ms;
uint32_t migration_freq_ms;
+   uint32_t timer_err_margin_us;
struct kvm_arm_counter_offset offset;
 };
 
@@ -54,6 +57,7 @@ static struct test_args test_args = {
.nr_iter = NR_TEST_ITERS_DEF,
.timer_period_ms = TIMER_TEST_PERIOD_MS_DEF,
.migration_freq_ms = TIMER_TEST_MIGRATION_FREQ_MS,
+   .timer_err_margin_us = TIMER_TEST_ERR_MARGIN_US,
.offset = { .reserved = 1 },
 };
 
@@ -190,10 +194,14 @@ static void guest_run_stage(struct test_vcpu_shared_data *shared_data,
 
 		/* Setup a timeout for the interrupt to arrive */
 		udelay(msecs_to_usecs(test_args.timer_period_ms) +
-			TIMER_TEST_ERR_MARGIN_US);
+			test_args.timer_err_margin_us);
 
 		irq_iter = READ_ONCE(shared_data->nr_iter);
-		GUEST_ASSERT_EQ(config_iter + 1, irq_iter);
+		__GUEST_ASSERT(config_iter + 1 == irq_iter,
+			"config_iter + 1 = 0x%lx, irq_iter = 0x%lx.\n"
+			"  Guest timer interrupt was not trigged within the specified\n"
+			"  interval, try to increase the error margin by [-e] option.\n",
+			config_iter + 1, irq_iter);
 	}
 }
 
@@ -408,8 +416,9 @@ static void test_vm_cleanup(struct kvm_vm *vm)
 
 static void test_print_help(char *name)
 {
-	pr_info("Usage: %s [-h] [-n nr_vcpus] [-i iterations] [-p timer_period_ms]\n",
-		name);
+	pr_info("Usage: %s [-h] [-n nr_vcpus] [-i iterations] [-p timer_period_ms]\n"
+		"\t\t[-m migration_freq_ms] [-o counter_offset]\n"
+		"\t\t[-e timer_err_margin_us]\n", name);
 	pr_info("\t-n: Number of vCPUs to configure (default: %u; max: %u)\n",
 		NR_VCPUS_DEF, KVM_MAX_VCPUS);
 	pr_info("\t-i: Number of iterations per stage (default: %u)\n",
@@ -419,6 +428,8 @@ static void test_print_help(char *name)
 	pr_info("\t-m: Frequency (in ms) of vCPUs to migrate to different pCPU. 0 to turn off (default: %u)\n",
 		TIMER_TEST_MIGRATION_FREQ_MS);
 	pr_info("\t-o: Counter offset (in counter cycles, default: 0)\n");
+	pr_info("\t-e: Interrupt arrival error margin (in us) of the guest timer (default: %u)\n",
+		TIMER_TEST_ERR_MARGIN_US);
pr_info("\t-h: print this help screen\n");
 }
 
@@ -426,7 +437,7 @@ static bool par

[PATCH v5 04/12] KVM: arm64: selftests: Split arch_timer test code

2024-01-22 Thread Haibo Xu

Split the arch-neutral test code out of aarch64/arch_timer.c
and put it into a common arch_timer.c. This prepares for
sharing the timer test code with riscv.

Suggested-by: Andrew Jones 
Signed-off-by: Haibo Xu 
Reviewed-by: Andrew Jones 
---
 tools/testing/selftests/kvm/Makefile  |   3 +-
 .../selftests/kvm/aarch64/arch_timer.c| 285 +-
 tools/testing/selftests/kvm/arch_timer.c  | 257 
 .../testing/selftests/kvm/include/test_util.h |   2 +
 .../selftests/kvm/include/timer_test.h|  44 +++
 5 files changed, 311 insertions(+), 280 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/arch_timer.c
 create mode 100644 tools/testing/selftests/kvm/include/timer_test.h

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index c5e9abb185b6..87f0f76ea639 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -141,7 +141,6 @@ TEST_GEN_PROGS_x86_64 += system_counter_offset_test
 TEST_GEN_PROGS_EXTENDED_x86_64 += x86_64/nx_huge_pages_test
 
 TEST_GEN_PROGS_aarch64 += aarch64/aarch32_id_regs
-TEST_GEN_PROGS_aarch64 += aarch64/arch_timer
 TEST_GEN_PROGS_aarch64 += aarch64/debug-exceptions
 TEST_GEN_PROGS_aarch64 += aarch64/hypercalls
 TEST_GEN_PROGS_aarch64 += aarch64/page_fault_test
@@ -153,6 +152,7 @@ TEST_GEN_PROGS_aarch64 += aarch64/vgic_init
 TEST_GEN_PROGS_aarch64 += aarch64/vgic_irq
 TEST_GEN_PROGS_aarch64 += aarch64/vpmu_counter_access
 TEST_GEN_PROGS_aarch64 += access_tracking_perf_test
+TEST_GEN_PROGS_aarch64 += arch_timer
 TEST_GEN_PROGS_aarch64 += demand_paging_test
 TEST_GEN_PROGS_aarch64 += dirty_log_test
 TEST_GEN_PROGS_aarch64 += dirty_log_perf_test
@@ -191,6 +191,7 @@ TEST_GEN_PROGS_riscv += kvm_page_table_test
 TEST_GEN_PROGS_riscv += set_memory_region_test
 TEST_GEN_PROGS_riscv += kvm_binary_stats_test
 
+SPLIT_TESTS += arch_timer
 SPLIT_TESTS += get-reg-list
 
 TEST_PROGS += $(TEST_PROGS_$(ARCH_DIR))
diff --git a/tools/testing/selftests/kvm/aarch64/arch_timer.c b/tools/testing/selftests/kvm/aarch64/arch_timer.c
index 9b9a119bdd61..a4732ec9f761 100644
--- a/tools/testing/selftests/kvm/aarch64/arch_timer.c
+++ b/tools/testing/selftests/kvm/aarch64/arch_timer.c
@@ -1,68 +1,19 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
- * arch_timer.c - Tests the aarch64 timer IRQ functionality
- *
  * The test validates both the virtual and physical timer IRQs using
- * CVAL and TVAL registers. This consitutes the four stages in the test.
- * The guest's main thread configures the timer interrupt for a stage
- * and waits for it to fire, with a timeout equal to the timer period.
- * It asserts that the timeout doesn't exceed the timer period plus
- * a user configurable error margin(default to 100us).
- *
- * On the other hand, upon receipt of an interrupt, the guest's interrupt
- * handler validates the interrupt by checking if the architectural state
- * is in compliance with the specifications.
- *
- * The test provides command-line options to configure the timer's
- * period (-p), number of vCPUs (-n), iterations per stage (-i) and timer
- * interrupt arrival error margin (-e). To stress-test the timer stack
- * even more, an option to migrate the vCPUs across pCPUs (-m), at a
- * particular rate, is also provided.
+ * CVAL and TVAL registers.
  *
  * Copyright (c) 2021, Google LLC.
  */
 #define _GNU_SOURCE
 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include "kvm_util.h"
-#include "processor.h"
-#include "delay.h"
 #include "arch_timer.h"
+#include "delay.h"
 #include "gic.h"
+#include "processor.h"
+#include "timer_test.h"
 #include "vgic.h"
 
-#define NR_VCPUS_DEF   4
-#define NR_TEST_ITERS_DEF  5
-#define TIMER_TEST_PERIOD_MS_DEF   10
-#define TIMER_TEST_ERR_MARGIN_US   100
-#define TIMER_TEST_MIGRATION_FREQ_MS   2
-
-struct test_args {
-   uint32_t nr_vcpus;
-   uint32_t nr_iter;
-   uint32_t timer_period_ms;
-   uint32_t migration_freq_ms;
-   uint32_t timer_err_margin_us;
-   struct kvm_arm_counter_offset offset;
-};
-
-static struct test_args test_args = {
-   .nr_vcpus = NR_VCPUS_DEF,
-   .nr_iter = NR_TEST_ITERS_DEF,
-   .timer_period_ms = TIMER_TEST_PERIOD_MS_DEF,
-   .migration_freq_ms = TIMER_TEST_MIGRATION_FREQ_MS,
-   .timer_err_margin_us = TIMER_TEST_ERR_MARGIN_US,
-   .offset = { .reserved = 1 },
-};
-
-#define msecs_to_usecs(msec)   ((msec) * 1000ULL)
-
 #define GICD_BASE_GPA			0x8000000ULL
 #define GICR_BASE_GPA			0x80A0000ULL
 
@@ -74,22 +25,8 @@ enum guest_stage {
GUEST_STAGE_MAX,
 };
 
-/* Shared variables between host and guest */
-struct test_vcpu_shared_data {
-   uint32_t nr_iter;
-   enum guest_stage guest_stage;
-   uint64_t xcnt;
-};
-
-static struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
-static pthread_t pt_vcpu_run[KVM_MAX_VCPUS];
-static struct test_vcpu_shared_data vcpu

[PATCH v5 05/12] KVM: selftests: Add CONFIG_64BIT definition for the build

2024-01-22 Thread Haibo Xu

Since KVM selftests are only supported as 64-bit builds on all
architectures, add the CONFIG_64BIT definition in kvm/Makefile to ensure
that only the 64-bit definitions are available in the included header files.
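
A minimal sketch of what the define changes, using two SATP constants from the csr.h header added later in this series. The explicit #define below stands in for the -DCONFIG_64BIT flag that the Makefile passes, and satp_mode_shift() is a hypothetical accessor used only for illustration.

```c
/* Stand-in for -DCONFIG_64BIT on the compiler command line. */
#define CONFIG_64BIT 1

/* Same selection pattern as the SATP block in csr.h. */
#ifndef CONFIG_64BIT
#define SATP_MODE_SHIFT	31
#define SATP_ASID_BITS	9
#else
#define SATP_MODE_SHIFT	60
#define SATP_ASID_BITS	16
#endif

/* Hypothetical accessors so the selected values can be inspected. */
static inline unsigned int satp_mode_shift(void)
{
	return SATP_MODE_SHIFT;
}

static inline unsigned int satp_asid_bits(void)
{
	return SATP_ASID_BITS;
}
```

Without the define, the same header would silently pick the 32-bit layout, which is what the patch guards against.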

Suggested-by: Andrew Jones 
Signed-off-by: Haibo Xu 
Reviewed-by: Andrew Jones 
---
 tools/testing/selftests/kvm/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 87f0f76ea639..a18d18994fe8 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -215,7 +215,7 @@ else
 LINUX_TOOL_ARCH_INCLUDE = $(top_srcdir)/tools/arch/$(ARCH)/include
 endif
 CFLAGS += -Wall -Wstrict-prototypes -Wuninitialized -O2 -g -std=gnu99 \
-   -Wno-gnu-variable-sized-type-not-at-end -MD -MP \
+   -Wno-gnu-variable-sized-type-not-at-end -MD -MP -DCONFIG_64BIT \
-fno-builtin-memcmp -fno-builtin-memcpy -fno-builtin-memset \
-fno-builtin-strnlen \
-fno-stack-protector -fno-PIE -I$(LINUX_TOOL_INCLUDE) \
-- 
2.34.1

[PATCH v5 06/12] tools: riscv: Add header file csr.h

2024-01-22 Thread Haibo Xu

Borrow the CSR definitions and operations from the kernel's
arch/riscv/include/asm/csr.h to tools/ for riscv.

Signed-off-by: Haibo Xu 
Reviewed-by: Andrew Jones 
---
 tools/arch/riscv/include/asm/csr.h | 541 +
 1 file changed, 541 insertions(+)
 create mode 100644 tools/arch/riscv/include/asm/csr.h

diff --git a/tools/arch/riscv/include/asm/csr.h b/tools/arch/riscv/include/asm/csr.h
new file mode 100644
index 000000000000..0dfc09254f99
--- /dev/null
+++ b/tools/arch/riscv/include/asm/csr.h
@@ -0,0 +1,541 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2015 Regents of the University of California
+ */
+
+#ifndef _ASM_RISCV_CSR_H
+#define _ASM_RISCV_CSR_H
+
+#include 
+
+/* Status register flags */
+#define SR_SIE		_AC(0x00000002, UL) /* Supervisor Interrupt Enable */
+#define SR_MIE		_AC(0x00000008, UL) /* Machine Interrupt Enable */
+#define SR_SPIE		_AC(0x00000020, UL) /* Previous Supervisor IE */
+#define SR_MPIE		_AC(0x00000080, UL) /* Previous Machine IE */
+#define SR_SPP		_AC(0x00000100, UL) /* Previously Supervisor */
+#define SR_MPP		_AC(0x00001800, UL) /* Previously Machine */
+#define SR_SUM		_AC(0x00040000, UL) /* Supervisor User Memory Access */
+
+#define SR_FS		_AC(0x00006000, UL) /* Floating-point Status */
+#define SR_FS_OFF	_AC(0x00000000, UL)
+#define SR_FS_INITIAL	_AC(0x00002000, UL)
+#define SR_FS_CLEAN	_AC(0x00004000, UL)
+#define SR_FS_DIRTY	_AC(0x00006000, UL)
+
+#define SR_VS		_AC(0x00000600, UL) /* Vector Status */
+#define SR_VS_OFF	_AC(0x00000000, UL)
+#define SR_VS_INITIAL	_AC(0x00000200, UL)
+#define SR_VS_CLEAN	_AC(0x00000400, UL)
+#define SR_VS_DIRTY	_AC(0x00000600, UL)
+
+#define SR_XS		_AC(0x00018000, UL) /* Extension Status */
+#define SR_XS_OFF	_AC(0x00000000, UL)
+#define SR_XS_INITIAL	_AC(0x00008000, UL)
+#define SR_XS_CLEAN	_AC(0x00010000, UL)
+#define SR_XS_DIRTY	_AC(0x00018000, UL)
+
+#define SR_FS_VS	(SR_FS | SR_VS) /* Vector and Floating-Point Unit */
+
+#ifndef CONFIG_64BIT
+#define SR_SD		_AC(0x80000000, UL) /* FS/VS/XS dirty */
+#else
+#define SR_SD		_AC(0x8000000000000000, UL) /* FS/VS/XS dirty */
+#endif
+
+#ifdef CONFIG_64BIT
+#define SR_UXL		_AC(0x300000000, UL) /* XLEN mask for U-mode */
+#define SR_UXL_32	_AC(0x100000000, UL) /* XLEN = 32 for U-mode */
+#define SR_UXL_64	_AC(0x200000000, UL) /* XLEN = 64 for U-mode */
+#endif
+
+/* SATP flags */
+#ifndef CONFIG_64BIT
+#define SATP_PPN	_AC(0x003FFFFF, UL)
+#define SATP_MODE_32	_AC(0x80000000, UL)
+#define SATP_MODE_SHIFT	31
+#define SATP_ASID_BITS	9
+#define SATP_ASID_SHIFT	22
+#define SATP_ASID_MASK	_AC(0x1FF, UL)
+#else
+#define SATP_PPN	_AC(0x00000FFFFFFFFFFF, UL)
+#define SATP_MODE_39	_AC(0x8000000000000000, UL)
+#define SATP_MODE_48	_AC(0x9000000000000000, UL)
+#define SATP_MODE_57	_AC(0xa000000000000000, UL)
+#define SATP_MODE_SHIFT	60
+#define SATP_ASID_BITS	16
+#define SATP_ASID_SHIFT	44
+#define SATP_ASID_MASK	_AC(0xFFFF, UL)
+#endif
+
+/* Exception cause high bit - is an interrupt if set */
+#define CAUSE_IRQ_FLAG		(_AC(1, UL) << (__riscv_xlen - 1))
+
+/* Interrupt causes (minus the high bit) */
+#define IRQ_S_SOFT		1
+#define IRQ_VS_SOFT		2
+#define IRQ_M_SOFT		3
+#define IRQ_S_TIMER		5
+#define IRQ_VS_TIMER		6
+#define IRQ_M_TIMER		7
+#define IRQ_S_EXT		9
+#define IRQ_VS_EXT		10
+#define IRQ_M_EXT		11
+#define IRQ_S_GEXT		12
+#define IRQ_PMU_OVF		13
+#define IRQ_LOCAL_MAX		(IRQ_PMU_OVF + 1)
+#define IRQ_LOCAL_MASK		GENMASK((IRQ_LOCAL_MAX - 1), 0)
+
+/* Exception causes */
+#define EXC_INST_MISALIGNED	0
+#define EXC_INST_ACCESS		1
+#define EXC_INST_ILLEGAL	2
+#define EXC_BREAKPOINT		3
+#define EXC_LOAD_MISALIGNED	4
+#define EXC_LOAD_ACCESS		5
+#define EXC_STORE_MISALIGNED	6
+#define EXC_STORE_ACCESS	7
+#define EXC_SYSCALL		8
+#define EXC_HYPERVISOR_SYSCALL	9
+#define EXC_SUPERVISOR_SYSCALL	10
+#define EXC_INST_PAGE_FAULT	12
+#define EXC_LOAD_PAGE_FAULT	13
+#define EXC_STORE_PAGE_FAULT	15
+#define EXC_INST_GUEST_PAGE_FAULT	20
+#define EXC_LOAD_GUEST_PAGE_FAULT	21
+#define EXC_VIRTUAL_INST_FAULT		22
+#define EXC_STORE_GUEST_PAGE_FAULT	23
+
+/* PMP configuration */
+#define PMP_R			0x01
+#define PMP_W			0x02
+#define PMP_X			0x04
+#define PMP_A			0x18
+#define PMP_A_TOR		0x08
+#define PMP_A_NA4		0x10
+#define PMP_A_NAPOT		0x18
+#define PMP_L			0x80
+
+/* HSTATUS flags */
+#ifdef CONFIG_64BIT
+#define HSTATUS_VSXL		_AC(0x300000000, UL)
+#define HSTATUS_VSXL_SHIFT	32
+#endif
+#define HSTATUS_VTSR

[PATCH v5 07/12] tools: riscv: Add header file vdso/processor.h

2024-01-22 Thread Haibo Xu

Borrow the cpu_relax() definition from the kernel's
arch/riscv/include/asm/vdso/processor.h to tools/ for riscv.

Signed-off-by: Haibo Xu 
Reviewed-by: Andrew Jones 
---
 tools/arch/riscv/include/asm/vdso/processor.h | 32 +++
 1 file changed, 32 insertions(+)
 create mode 100644 tools/arch/riscv/include/asm/vdso/processor.h

diff --git a/tools/arch/riscv/include/asm/vdso/processor.h b/tools/arch/riscv/include/asm/vdso/processor.h
new file mode 100644
index 000000000000..662aca039848
--- /dev/null
+++ b/tools/arch/riscv/include/asm/vdso/processor.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_VDSO_PROCESSOR_H
+#define __ASM_VDSO_PROCESSOR_H
+
+#ifndef __ASSEMBLY__
+
+#include 
+
+static inline void cpu_relax(void)
+{
+#ifdef __riscv_muldiv
+   int dummy;
+   /* In lieu of a halt instruction, induce a long-latency stall. */
+   __asm__ __volatile__ ("div %0, %0, zero" : "=r" (dummy));
+#endif
+
+#ifdef CONFIG_TOOLCHAIN_HAS_ZIHINTPAUSE
+   /*
+* Reduce instruction retirement.
+* This assumes the PC changes.
+*/
+   __asm__ __volatile__ ("pause");
+#else
+   /* Encoding of the pause instruction */
+	__asm__ __volatile__ (".4byte 0x100000F");
+#endif
+   barrier();
+}
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __ASM_VDSO_PROCESSOR_H */
-- 
2.34.1

[PATCH v5 08/12] KVM: riscv: selftests: Switch to use macro from csr.h

2024-01-22 Thread Haibo Xu

Signed-off-by: Haibo Xu 
Reviewed-by: Andrew Jones 
---
 tools/testing/selftests/kvm/include/riscv/processor.h | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/riscv/processor.h b/tools/testing/selftests/kvm/include/riscv/processor.h
index 5b62a3d2aa9b..6f9e1e5e466d 100644
--- a/tools/testing/selftests/kvm/include/riscv/processor.h
+++ b/tools/testing/selftests/kvm/include/riscv/processor.h
@@ -7,8 +7,9 @@
 #ifndef SELFTEST_KVM_PROCESSOR_H
 #define SELFTEST_KVM_PROCESSOR_H
 
-#include "kvm_util.h"
 #include 
+#include 
+#include "kvm_util.h"
 
 static inline uint64_t __kvm_reg_id(uint64_t type, uint64_t idx,
uint64_t  size)
@@ -95,13 +96,6 @@ static inline uint64_t __kvm_reg_id(uint64_t type, uint64_t idx,
 #define PGTBL_PAGE_SIZE				PGTBL_L0_BLOCK_SIZE
 #define PGTBL_PAGE_SIZE_SHIFT			PGTBL_L0_BLOCK_SHIFT
 
-#define SATP_PPN				_AC(0x00000FFFFFFFFFFF, UL)
-#define SATP_MODE_39				_AC(0x8000000000000000, UL)
-#define SATP_MODE_48				_AC(0x9000000000000000, UL)
-#define SATP_ASID_BITS				16
-#define SATP_ASID_SHIFT				44
-#define SATP_ASID_MASK				_AC(0xFFFF, UL)
-
 #define SBI_EXT_EXPERIMENTAL_START		0x08000000
 #define SBI_EXT_EXPERIMENTAL_END		0x08FFFFFF
 
-- 
2.34.1

[PATCH v5 09/12] KVM: riscv: selftests: Add exception handling support

2024-01-22 Thread Haibo Xu

Add the infrastructure for guest exception handling in riscv selftests.
Customized handlers can be enabled by vm_install_exception_handler(vector)
or vm_install_interrupt_handler().

The code is inspired by that of x86/arm64.
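
To make the idea concrete, here is a standalone sketch of a two-row handler table and its dispatch, reusing the NR_VECTORS, NR_EXCEPTIONS and EC_MASK values from this patch. Everything prefixed sketch_ or SKETCH_ is hypothetical, the register struct is a reduced stand-in for struct ex_regs, and the choice to route all interrupts through a single slot is an assumption, not a description of the code in lib/riscv/processor.c.

```c
#include <stddef.h>

#define NR_VECTORS	2
#define NR_EXCEPTIONS	32
#define EC_MASK		(NR_EXCEPTIONS - 1)

/* Top bit of the cause value marks an interrupt (cf. CAUSE_IRQ_FLAG). */
#define SKETCH_IRQ_FLAG	(1UL << (sizeof(unsigned long) * 8 - 1))

/* Reduced stand-in for struct ex_regs; only what the sketch needs. */
struct sketch_ex_regs {
	unsigned long epc;
	unsigned long cause;
};

typedef void (*sketch_handler_fn)(struct sketch_ex_regs *);

/* Assumed layout: row 0 holds exception handlers, row 1 the interrupt handler. */
static sketch_handler_fn sketch_handlers[NR_VECTORS][NR_EXCEPTIONS];

static void sketch_install_exception_handler(int vector, sketch_handler_fn fn)
{
	sketch_handlers[0][vector & EC_MASK] = fn;
}

static void sketch_install_interrupt_handler(sketch_handler_fn fn)
{
	/* Assumption: all interrupts share one slot. */
	sketch_handlers[1][0] = fn;
}

/* Returns 1 if a registered handler ran, 0 if the trap was unexpected. */
static int sketch_route_exception(struct sketch_ex_regs *regs)
{
	int is_irq = (regs->cause & SKETCH_IRQ_FLAG) != 0;
	unsigned long ec = is_irq ? 0 : (regs->cause & EC_MASK);
	sketch_handler_fn fn = sketch_handlers[is_irq][ec];

	if (fn == NULL)
		return 0;
	fn(regs);
	return 1;
}

/* Example handler: step over the trapping 4-byte instruction. */
static void sketch_skip_insn(struct sketch_ex_regs *regs)
{
	regs->epc += 4;
}
```

A test would install a handler for, say, cause 3 (EXC_BREAKPOINT) before running the guest; a trap with no registered handler is reported as unexpected.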

Signed-off-by: Haibo Xu 
Reviewed-by: Andrew Jones 
---
 tools/testing/selftests/kvm/Makefile  |   1 +
 .../selftests/kvm/include/kvm_util_base.h |   7 ++
 .../selftests/kvm/include/riscv/processor.h   |  43 
 .../selftests/kvm/lib/riscv/handlers.S| 101 ++
 .../selftests/kvm/lib/riscv/processor.c   |  69 
 5 files changed, 221 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/lib/riscv/handlers.S

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index a18d18994fe8..f514c81877ce 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -53,6 +53,7 @@ LIBKVM_s390x += lib/s390x/diag318_test_handler.c
 LIBKVM_s390x += lib/s390x/processor.c
 LIBKVM_s390x += lib/s390x/ucall.c
 
+LIBKVM_riscv += lib/riscv/handlers.S
 LIBKVM_riscv += lib/riscv/processor.c
 LIBKVM_riscv += lib/riscv/ucall.c
 
diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index a18db6a7b3cf..135ae2eb5249 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -932,4 +932,11 @@ void kvm_selftest_arch_init(void);
 
 void kvm_arch_vm_post_create(struct kvm_vm *vm);
 
+void vm_init_vector_tables(struct kvm_vm *vm);
+void vcpu_init_vector_tables(struct kvm_vcpu *vcpu);
+
+struct ex_regs;
+typedef void(*exception_handler_fn)(struct ex_regs *);
+void vm_install_exception_handler(struct kvm_vm *vm, int vector, exception_handler_fn handler);
+
 #endif /* SELFTEST_KVM_UTIL_BASE_H */
diff --git a/tools/testing/selftests/kvm/include/riscv/processor.h 
b/tools/testing/selftests/kvm/include/riscv/processor.h
index 6f9e1e5e466d..b68b1b731a34 100644
--- a/tools/testing/selftests/kvm/include/riscv/processor.h
+++ b/tools/testing/selftests/kvm/include/riscv/processor.h
@@ -42,6 +42,49 @@ static inline uint64_t __kvm_reg_id(uint64_t type, uint64_t 
idx,
 #define RISCV_ISA_EXT_REG(idx) __kvm_reg_id(KVM_REG_RISCV_ISA_EXT, \
 idx, KVM_REG_SIZE_ULONG)
 
+struct ex_regs {
+   unsigned long ra;
+   unsigned long sp;
+   unsigned long gp;
+   unsigned long tp;
+   unsigned long t0;
+   unsigned long t1;
+   unsigned long t2;
+   unsigned long s0;
+   unsigned long s1;
+   unsigned long a0;
+   unsigned long a1;
+   unsigned long a2;
+   unsigned long a3;
+   unsigned long a4;
+   unsigned long a5;
+   unsigned long a6;
+   unsigned long a7;
+   unsigned long s2;
+   unsigned long s3;
+   unsigned long s4;
+   unsigned long s5;
+   unsigned long s6;
+   unsigned long s7;
+   unsigned long s8;
+   unsigned long s9;
+   unsigned long s10;
+   unsigned long s11;
+   unsigned long t3;
+   unsigned long t4;
+   unsigned long t5;
+   unsigned long t6;
+   unsigned long epc;
+   unsigned long status;
+   unsigned long cause;
+};
+
+#define NR_VECTORS  2
+#define NR_EXCEPTIONS  32
+#define EC_MASK  (NR_EXCEPTIONS - 1)
+
+void vm_install_interrupt_handler(struct kvm_vm *vm, exception_handler_fn handler);
+
 /* L3 index Bit[47:39] */
 #define PGTBL_L3_INDEX_MASK0xFF80ULL
 #define PGTBL_L3_INDEX_SHIFT   39
diff --git a/tools/testing/selftests/kvm/lib/riscv/handlers.S 
b/tools/testing/selftests/kvm/lib/riscv/handlers.S
new file mode 100644
index ..aa0abd3f35bb
--- /dev/null
+++ b/tools/testing/selftests/kvm/lib/riscv/handlers.S
@@ -0,0 +1,101 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2023 Intel Corporation
+ */
+
+#ifndef __ASSEMBLY__
+#define __ASSEMBLY__
+#endif
+
+#include 
+
+.macro save_context
+   addi  sp, sp, (-8*34)
+   sd    x1, 0(sp)
+   sd    x2, 8(sp)
+   sd    x3, 16(sp)
+   sd    x4, 24(sp)
+   sd    x5, 32(sp)
+   sd    x6, 40(sp)
+   sd    x7, 48(sp)
+   sd    x8, 56(sp)
+   sd    x9, 64(sp)
+   sd    x10, 72(sp)
+   sd    x11, 80(sp)
+   sd    x12, 88(sp)
+   sd    x13, 96(sp)
+   sd    x14, 104(sp)
+   sd    x15, 112(sp)
+   sd    x16, 120(sp)
+   sd    x17, 128(sp)
+   sd    x18, 136(sp)
+   sd    x19, 144(sp)
+   sd    x20, 152(sp)
+   sd    x21, 160(sp)
+   sd    x22, 168(sp)
+   sd    x23, 176(sp)
+   sd    x24, 184(sp)
+   sd    x25, 192(sp)
+   sd    x26, 200(sp)
+   sd    x27, 208(sp)
+   sd    x28, 216(sp)
+   sd    x29, 224(sp)
+   sd    x30, 232(sp)
+   sd    x31, 240(sp)
+   csrr  s0, CSR_SEPC
+   csrr  s1, CSR_SSTATUS
+   csrr  s2, CSR_SCAUSE
+

[PATCH v5 10/12] KVM: riscv: selftests: Add guest helper to get vcpu id

2024-01-22 Thread Haibo Xu

Add guest_get_vcpuid() helper to simplify access to per-CPU
private data. The sscratch CSR is used to store the vcpu id.

Signed-off-by: Haibo Xu 
Reviewed-by: Andrew Jones 
---
 tools/testing/selftests/kvm/include/aarch64/processor.h | 4 
 tools/testing/selftests/kvm/include/kvm_util_base.h | 2 ++
 tools/testing/selftests/kvm/lib/riscv/processor.c   | 8 
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/aarch64/processor.h 
b/tools/testing/selftests/kvm/include/aarch64/processor.h
index c42d683102c7..16ae0ac01879 100644
--- a/tools/testing/selftests/kvm/include/aarch64/processor.h
+++ b/tools/testing/selftests/kvm/include/aarch64/processor.h
@@ -226,8 +226,4 @@ void smccc_smc(uint32_t function_id, uint64_t arg0, 
uint64_t arg1,
   uint64_t arg2, uint64_t arg3, uint64_t arg4, uint64_t arg5,
   uint64_t arg6, struct arm_smccc_res *res);
 
-
-
-uint32_t guest_get_vcpuid(void);
-
 #endif /* SELFTEST_KVM_PROCESSOR_H */
diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h 
b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 135ae2eb5249..666438113d22 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -939,4 +939,6 @@ struct ex_regs;
 typedef void(*exception_handler_fn)(struct ex_regs *);
 void vm_install_exception_handler(struct kvm_vm *vm, int vector, exception_handler_fn handler);
 
+uint32_t guest_get_vcpuid(void);
+
 #endif /* SELFTEST_KVM_UTIL_BASE_H */
diff --git a/tools/testing/selftests/kvm/lib/riscv/processor.c 
b/tools/testing/selftests/kvm/lib/riscv/processor.c
index efd9ac4b0198..39a1e9902dec 100644
--- a/tools/testing/selftests/kvm/lib/riscv/processor.c
+++ b/tools/testing/selftests/kvm/lib/riscv/processor.c
@@ -316,6 +316,9 @@ struct kvm_vcpu *vm_arch_vcpu_add(struct kvm_vm *vm, 
uint32_t vcpu_id,
vcpu_set_reg(vcpu, RISCV_CORE_REG(regs.sp), stack_vaddr + stack_size);
vcpu_set_reg(vcpu, RISCV_CORE_REG(regs.pc), (unsigned long)guest_code);
 
+   /* Setup sscratch for guest_get_vcpuid() */
+   vcpu_set_reg(vcpu, RISCV_CSR_REG(sscratch), vcpu_id);
+
/* Setup default exception vector of guest */
vcpu_set_reg(vcpu, RISCV_CSR_REG(stvec), (unsigned long)guest_unexp_trap);
 
@@ -436,3 +439,8 @@ void vm_install_interrupt_handler(struct kvm_vm *vm, 
exception_handler_fn handle
 
handlers->exception_handlers[1][0] = handler;
 }
+
+uint32_t guest_get_vcpuid(void)
+{
+   return csr_read(CSR_SSCRATCH);
+}
-- 
2.34.1

[PATCH v5 11/12] KVM: riscv: selftests: Change vcpu_has_ext to a common function

2024-01-22 Thread Haibo Xu

Move vcpu_has_ext to processor.c and rename it to __vcpu_has_ext
so that other test cases can use it to check for vCPU extensions.

Signed-off-by: Haibo Xu 
Reviewed-by: Andrew Jones 
---
 tools/testing/selftests/kvm/include/riscv/processor.h |  2 ++
 tools/testing/selftests/kvm/lib/riscv/processor.c | 10 ++
 tools/testing/selftests/kvm/riscv/get-reg-list.c  | 11 +--
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/riscv/processor.h 
b/tools/testing/selftests/kvm/include/riscv/processor.h
index b68b1b731a34..bd27e1c67579 100644
--- a/tools/testing/selftests/kvm/include/riscv/processor.h
+++ b/tools/testing/selftests/kvm/include/riscv/processor.h
@@ -42,6 +42,8 @@ static inline uint64_t __kvm_reg_id(uint64_t type, uint64_t 
idx,
 #define RISCV_ISA_EXT_REG(idx) __kvm_reg_id(KVM_REG_RISCV_ISA_EXT, \
 idx, KVM_REG_SIZE_ULONG)
 
+bool __vcpu_has_ext(struct kvm_vcpu *vcpu, int ext);
+
 struct ex_regs {
unsigned long ra;
unsigned long sp;
diff --git a/tools/testing/selftests/kvm/lib/riscv/processor.c 
b/tools/testing/selftests/kvm/lib/riscv/processor.c
index 39a1e9902dec..dad73ce18164 100644
--- a/tools/testing/selftests/kvm/lib/riscv/processor.c
+++ b/tools/testing/selftests/kvm/lib/riscv/processor.c
@@ -15,6 +15,16 @@
 
 static vm_vaddr_t exception_handlers;
 
+bool __vcpu_has_ext(struct kvm_vcpu *vcpu, int ext)
+{
+   unsigned long value = 0;
+   int ret;
+
+   ret = __vcpu_get_reg(vcpu, RISCV_ISA_EXT_REG(ext), &value);
+
+   return !ret && !!value;
+}
+
 static uint64_t page_align(struct kvm_vm *vm, uint64_t v)
 {
return (v + vm->page_size) & ~(vm->page_size - 1);
diff --git a/tools/testing/selftests/kvm/riscv/get-reg-list.c 
b/tools/testing/selftests/kvm/riscv/get-reg-list.c
index 25de4b8bc347..ed29ba45588c 100644
--- a/tools/testing/selftests/kvm/riscv/get-reg-list.c
+++ b/tools/testing/selftests/kvm/riscv/get-reg-list.c
@@ -75,15 +75,6 @@ bool check_reject_set(int err)
return err == EINVAL;
 }
 
-static inline bool vcpu_has_ext(struct kvm_vcpu *vcpu, int ext)
-{
-   int ret;
-   unsigned long value;
-
-   ret = __vcpu_get_reg(vcpu, RISCV_ISA_EXT_REG(ext), &value);
-   return (ret) ? false : !!value;
-}
-
 void finalize_vcpu(struct kvm_vcpu *vcpu, struct vcpu_reg_list *c)
 {
unsigned long isa_ext_state[KVM_RISCV_ISA_EXT_MAX] = { 0 };
@@ -111,7 +102,7 @@ void finalize_vcpu(struct kvm_vcpu *vcpu, struct 
vcpu_reg_list *c)
__vcpu_set_reg(vcpu, RISCV_ISA_EXT_REG(s->feature), 1);
 
/* Double check whether the desired extension was enabled */
-   __TEST_REQUIRE(vcpu_has_ext(vcpu, s->feature),
+   __TEST_REQUIRE(__vcpu_has_ext(vcpu, s->feature),
   "%s not available, skipping tests\n", s->name);
}
 }
-- 
2.34.1

[PATCH v5 12/12] KVM: riscv: selftests: Add sstc timer test

2024-01-22 Thread Haibo Xu

Add a KVM selftest to validate the Sstc timer functionality.
The test was ported from the arm64 arch timer test.

Signed-off-by: Haibo Xu 
Reviewed-by: Andrew Jones 
---
 tools/testing/selftests/kvm/Makefile  |   1 +
 .../selftests/kvm/aarch64/arch_timer.c|  12 +-
 tools/testing/selftests/kvm/arch_timer.c  |  10 +-
 .../selftests/kvm/include/riscv/arch_timer.h  |  71 +++
 .../selftests/kvm/include/riscv/processor.h   |  10 ++
 .../selftests/kvm/include/timer_test.h|   5 +-
 .../testing/selftests/kvm/riscv/arch_timer.c  | 111 ++
 7 files changed, 210 insertions(+), 10 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/include/riscv/arch_timer.h
 create mode 100644 tools/testing/selftests/kvm/riscv/arch_timer.c

diff --git a/tools/testing/selftests/kvm/Makefile 
b/tools/testing/selftests/kvm/Makefile
index f514c81877ce..77004220763e 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -183,6 +183,7 @@ TEST_GEN_PROGS_s390x += rseq_test
 TEST_GEN_PROGS_s390x += set_memory_region_test
 TEST_GEN_PROGS_s390x += kvm_binary_stats_test
 
+TEST_GEN_PROGS_riscv += arch_timer
 TEST_GEN_PROGS_riscv += demand_paging_test
 TEST_GEN_PROGS_riscv += dirty_log_test
 TEST_GEN_PROGS_riscv += guest_print_test
diff --git a/tools/testing/selftests/kvm/aarch64/arch_timer.c 
b/tools/testing/selftests/kvm/aarch64/arch_timer.c
index a4732ec9f761..77393be9236d 100644
--- a/tools/testing/selftests/kvm/aarch64/arch_timer.c
+++ b/tools/testing/selftests/kvm/aarch64/arch_timer.c
@@ -194,10 +194,14 @@ struct kvm_vm *test_vm_create(void)
vm_init_descriptor_tables(vm);
vm_install_exception_handler(vm, VECTOR_IRQ_CURRENT, guest_irq_handler);
 
-   if (!test_args.offset.reserved) {
-   if (kvm_has_cap(KVM_CAP_COUNTER_OFFSET))
-   vm_ioctl(vm, KVM_ARM_SET_COUNTER_OFFSET, &test_args.offset);
-   else
+   if (!test_args.reserved) {
+   if (kvm_has_cap(KVM_CAP_COUNTER_OFFSET)) {
+   struct kvm_arm_counter_offset offset = {
+   .counter_offset = test_args.counter_offset,
+   .reserved = 0,
+   };
+   vm_ioctl(vm, KVM_ARM_SET_COUNTER_OFFSET, &offset);
+   } else
TEST_FAIL("no support for global offset\n");
}
 
diff --git a/tools/testing/selftests/kvm/arch_timer.c 
b/tools/testing/selftests/kvm/arch_timer.c
index 113d40f7bb14..e4eb6cacc356 100644
--- a/tools/testing/selftests/kvm/arch_timer.c
+++ b/tools/testing/selftests/kvm/arch_timer.c
@@ -36,7 +36,7 @@ struct test_args test_args = {
.timer_period_ms = TIMER_TEST_PERIOD_MS_DEF,
.migration_freq_ms = TIMER_TEST_MIGRATION_FREQ_MS,
.timer_err_margin_us = TIMER_TEST_ERR_MARGIN_US,
-   .offset = { .reserved = 1 },
+   .reserved = 1,
 };
 
 struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
@@ -75,6 +75,8 @@ static void *test_vcpu_run(void *arg)
TEST_FAIL("Unexpected guest exit\n");
}
 
+   pr_info("PASS(vCPU-%d).\n", vcpu_idx);
+
return NULL;
 }
 
@@ -190,7 +192,7 @@ static void test_print_help(char *name)
TIMER_TEST_PERIOD_MS_DEF);
pr_info("\t-m: Frequency (in ms) of vCPUs to migrate to different pCPU. 
0 to turn off (default: %u)\n",
TIMER_TEST_MIGRATION_FREQ_MS);
-   pr_info("\t-o: Counter offset (in counter cycles, default: 0)\n");
+   pr_info("\t-o: Counter offset (in counter cycles, default: 0) [aarch64-only]\n");
pr_info("\t-e: Interrupt arrival error margin (in us) of the guest 
timer (default: %u)\n",
TIMER_TEST_ERR_MARGIN_US);
pr_info("\t-h: print this help screen\n");
@@ -223,8 +225,8 @@ static bool parse_args(int argc, char *argv[])
test_args.timer_err_margin_us = 
atoi_non_negative("Error Margin", optarg);
break;
case 'o':
-   test_args.offset.counter_offset = strtol(optarg, NULL, 0);
-   test_args.offset.reserved = 0;
+   test_args.counter_offset = strtol(optarg, NULL, 0);
+   test_args.reserved = 0;
break;
case 'h':
default:
diff --git a/tools/testing/selftests/kvm/include/riscv/arch_timer.h 
b/tools/testing/selftests/kvm/include/riscv/arch_timer.h
new file mode 100644
index ..225d81dad064
--- /dev/null
+++ b/tools/testing/selftests/kvm/include/riscv/arch_timer.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * RISC-V Arch Timer(sstc) specific interface
+ *
+ * Copyright (c) 2024 Intel Corporation
+ */
+
+#ifndef SELFTEST_KVM_ARCH_TIMER_H
+#define SELFTEST_KVM_ARCH_TIMER_H
+
+#include 
+#include 
+
+static unsigned long timer_freq;
+
+#define msec_to_cycles(msec)   \
+   ((timer_freq) * (ui

Re: [PATCH] selftests/mm: run_vmtests.sh: add missing tests

2024-01-22 Thread Ryan Roberts

On 22/01/2024 08:46, Muhammad Usama Anjum wrote:
> On 1/19/24 9:09 PM, Ryan Roberts wrote:
>> Hi Muhammad,
>>
>> Afraid this patch is causing a regression on our CI system when it turned up
>> in linux-next today. Additionally, 2 of the tests you have added are failing
>> because the scripts are not exported correctly...
> Andrew has dropped this patch for now.
> 
>>
>> On 16/01/2024 09:06, Muhammad Usama Anjum wrote:
>>> Add missing tests to run_vmtests.sh. The mm kselftests are run through
>>> run_vmtests.sh. If a test isn't present in this script, it'll not run
>>> with run_tests or `make -C tools/testing/selftests/mm run_tests`.
>>>
>>> Signed-off-by: Muhammad Usama Anjum 
>>> ---
>>>  tools/testing/selftests/mm/run_vmtests.sh | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/tools/testing/selftests/mm/run_vmtests.sh 
>>> b/tools/testing/selftests/mm/run_vmtests.sh
>>> index 246d53a5d7f2..a5e6ba8d3579 100755
>>> --- a/tools/testing/selftests/mm/run_vmtests.sh
>>> +++ b/tools/testing/selftests/mm/run_vmtests.sh
>>> @@ -248,6 +248,9 @@ CATEGORY="hugetlb" run_test ./map_hugetlb
>>>  CATEGORY="hugetlb" run_test ./hugepage-mremap
>>>  CATEGORY="hugetlb" run_test ./hugepage-vmemmap
>>>  CATEGORY="hugetlb" run_test ./hugetlb-madvise
>>> +CATEGORY="hugetlb" run_test ./charge_reserved_hugetlb.sh
>>> +CATEGORY="hugetlb" run_test ./hugetlb_reparenting_test.sh
>>
>> These 2 tests are failing because the test scripts are not exported. You will
>> need to add them to the TEST_FILES variable in the Makefile.
> This must be done. After adding them, I'll also investigate whether these
> scripts are robust enough to pass.

Great thanks!

> 
>>
>>> +CATEGORY="hugetlb" run_test ./hugetlb-read-hwpoison
>>
>> The addition of this test causes 2 later tests to fail with ENOMEM. I suspect
>> it's a side-effect of marking the hugetlbs as hwpoisoned? (just a guess based
>> on the test name!). Once a page is marked poisoned, is there a way to
>> un-poison it? If not, I suspect that's why it wasn't part of the standard test
>> script in the first place.
> hugetlb-read-hwpoison probably failed because the kernel fix for the test
> hasn't been merged yet. The other tests (uffd-stress) aren't
> failing on my end or on CI [1][2]

To be clear, hugetlb-read-hwpoison isn't failing for me; it's just causing the
subsequent uffd-stress tests to fail. Both of those subsequent tests
allocate hugetlbs, so my guess is that since this test is marking some hugetlbs
as poisoned, there are no longer enough for the subsequent tests.
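
One way to check that guess would be to compare the free huge page count before and after the hwpoison test. The sketch below only parses meminfo-style text; it assumes the text comes from /proc/meminfo and that the HugePages_Free field is the relevant counter. The `sketch_*` helpers are hypothetical and are not part of run_vmtests.sh or the selftests.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up "key:" in meminfo-style text and return its numeric value,
 * or -1 if the key is not present at the start of any line.
 */
static long sketch_meminfo_value(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	assert(klen > 0);
	while (line && *line) {
		if (!strncmp(line, key, klen) && line[klen] == ':') {
			long val;

			if (sscanf(line + klen + 1, "%ld", &val) == 1)
				return val;
			return -1;
		}
		/* Advance to the character after the next newline, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Nonzero if at least "needed" huge pages are reported free. */
static int sketch_enough_free_hugepages(const char *text, long needed)
{
	return sketch_meminfo_value(text, "HugePages_Free") >= needed;
}
```

A runner could call `sketch_enough_free_hugepages()` with the page count a test needs and skip, rather than fail, when the pool is too small.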

> 
> [1] https://lava.collabora.dev/scheduler/job/12577207#L3677
> [2] https://lava.collabora.dev/scheduler/job/12577229#L4027
> 
> Maybe it's a configuration issue which is exposed now. Not sure. Maybe
> hugetlb-read-hwpoison is changing some configuration and not restoring it.

Well yes - it's marking some hugetlb pages as HWPOISONED.

> Maybe your system has fewer hugetlb pages.

Yes, probably. What is hugetlb-read-hwpoison's requirement for size and number of
hugetlb pages? The run_vmtests.sh script allocates the required number of
default-sized hugetlb pages before running any tests (I guess this value should
be increased for hugetlb-read-hwpoison's requirements?).

Additionally, our CI preallocates non-default sizes from the kernel command line
at boot. Happy to increase these if you can tell me what the new requirement is:

hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2
default_hugepagesz=2M hugepages=0:64,1:64 hugepagesz=64K hugepages=0:2,1:2

Thanks,
Ryan

> 
>>
>> These are the tests that start failing:
>>
>> # # 
>> # # running ./uffd-stress hugetlb 128 32
>> # # 
>> # # nr_pages: 64, nr_pages_per_cpu: 8
>> # # ERROR: context init failed (errno=12, @uffd-stress.c:254)
>> # # [FAIL]
>> # not ok 18 uffd-stress hugetlb 128 32 # exit=1
>> # # 
>> # # running ./uffd-stress hugetlb-private 128 32
>> # # 
>> # # nr_pages: 64, nr_pages_per_cpu: 8
>> # # bounces: 31, mode: rnd racing ver poll, ERROR: UFFDIO_COPY error: 
>> -12ERROR:
>> UFFDIO_COPY error: -12 (errno=12, @uffd-common.c:614)
>> # #  (errno=12, @uffd-common.c:614)
>> # # [FAIL]
>>
>> Quickest way to repro is:
>>
>> $ sudo ./run_vmtests.sh -t "userfaultfd hugetlb"
>>
>> Thanks,
>> Ryan
>>
>>
>>>  
>>>  nr_hugepages_tmp=$(cat /proc/sys/vm/nr_hugepages)
>>>  # For this test, we need one and just one huge page
>>
>>
>

Re: [PATCH] kunit: Mark filter_glob param as rw

2024-01-22 Thread Krzysztofik, Janusz

On Friday, 19 January 2024 00:29:33 CET Lucas De Marchi wrote:
> On Thu, Jan 18, 2024 at 05:23:33PM -0500, Rae Moar wrote:
> >On Thu, Jan 11, 2024 at 7:13 PM Lucas De Marchi
> > wrote:
> >>
> >> By allowing the filter_glob parameter to be written to, it's possible to
> >> tweak the testsuites that will be executed on new module loads. This
> >> makes it easier to run specific tests without having to reload kunit and
> >> provides a way to filter tests on real HW even if kunit is builtin.
> >> Example for xe driver:
> >>
> >> 1) Run just 1 test
> >> # echo -n xe_bo > /sys/module/kunit/parameters/filter_glob
> >> # modprobe -r xe_live_test
> >> # modprobe xe_live_test
> >> # ls /sys/kernel/debug/kunit/
> >> xe_bo
> >>
> >> 2) Run all tests
> >> # echo \* > /sys/module/kunit/parameters/filter_glob
> >> # modprobe -r xe_live_test
> >> # modprobe xe_live_test
> >> # ls /sys/kernel/debug/kunit/
> >> xe_bo  xe_dma_buf  xe_migrate  xe_mocs
> >>
> >> References: https://lore.kernel.org/intel-xe/dzacvbdditbneiu3e3fmstjmttcbne44yspumpkd6sjn56jqpk@vxu7sksbqrp6/
> >> Signed-off-by: Lucas De Marchi 
> >
> >Hello!
> >
> >I have tested this and this looks good to me. I agree this is very
> >helpful and I wonder if we should do the same with the other module
> >parameters (filter, filter_action).
> 
> yeah, after I sent this I was wondering about the other parameters. I
> don't have a use for them right now, but I can try a few things and spin
> a new version if people find it useful.

Yes, please do.  I find it very useful for improving the current
implementation of IGT kunit, which now depends on the ability to unload and
reload the kunit base module with specific filter parameters in order to get a
KTAP formatted list of test cases without executing them, then to run those
test cases filtered one by one.

Thanks,
Janusz

> 
> >
> >It did worry me to make filter_glob writable due to the recent patch
> >that requires the output of filtering to be a valid virtual address
> >but I think there is a sufficient amount of checking of filter_glob.
> >
> >Thanks!
> >-Rae
> >
> >Reviewed-by: Rae Moar 
> 
> thanks
> Lucas De Marchi
> 
> 


[PATCH] papr_vpd.c: calling devfd before get_system_loc_code

2024-01-22 Thread R Nageswara Sastry

Calling get_system_loc_code() before checking devfd and errno makes the test
FAIL when the device is not available, where a SKIP is expected.
Reorder the SKIP_IF_MSG checks so that the test correctly SKIPs when the
/dev/papr-vpd device is not available.

Without patch: Test FAILED on line 271
With patch: [SKIP] Test skipped on line 266: /dev/papr-vpd not present
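
The effect of the ordering can be modelled with a small standalone sketch. It assumes, without confirmation from this posting, that the location code helper overwrites errno even when it succeeds, which would explain why the later ENOENT check no longer matched. All `sketch_*` names are hypothetical stand-ins, not the selftest harness.

```c
#include <assert.h>
#include <errno.h>

enum sketch_result { SKETCH_PASS, SKETCH_SKIP, SKETCH_FAIL };

/* Stand-in for errno so the sketch does not depend on libc state. */
static int sketch_errno;

/* Hypothetical helper: succeeds, but overwrites the saved error code. */
static int sketch_get_loc_code(void)
{
	sketch_errno = 0;
	return 0;
}

/* Old order: the helper runs first and the ENOENT check sees a stale value. */
static enum sketch_result sketch_old_order(int devfd)
{
	if (sketch_get_loc_code())
		return SKETCH_SKIP;
	if (devfd < 0 && sketch_errno == ENOENT)
		return SKETCH_SKIP;
	return devfd < 0 ? SKETCH_FAIL : SKETCH_PASS;
}

/* New order: the device check runs while the error code is still intact. */
static enum sketch_result sketch_new_order(int devfd)
{
	if (devfd < 0 && sketch_errno == ENOENT)
		return SKETCH_SKIP;
	if (sketch_get_loc_code())
		return SKETCH_SKIP;
	return devfd < 0 ? SKETCH_FAIL : SKETCH_PASS;
}
```

With a missing device (devfd of -1 and the saved error set to ENOENT), the old order reports a failure and the new order reports a skip, matching the before and after results quoted above.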

Signed-off-by: R Nageswara Sastry 
---
 tools/testing/selftests/powerpc/papr_vpd/papr_vpd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/powerpc/papr_vpd/papr_vpd.c 
b/tools/testing/selftests/powerpc/papr_vpd/papr_vpd.c
index 98cbb9109ee6..505294da1b9f 100644
--- a/tools/testing/selftests/powerpc/papr_vpd/papr_vpd.c
+++ b/tools/testing/selftests/powerpc/papr_vpd/papr_vpd.c
@@ -263,10 +263,10 @@ static int papr_vpd_system_loc_code(void)
off_t size;
int fd;
 
-   SKIP_IF_MSG(get_system_loc_code(&lc),
-   "Cannot determine system location code");
SKIP_IF_MSG(devfd < 0 && errno == ENOENT,
DEVPATH " not present");
+   SKIP_IF_MSG(get_system_loc_code(&lc),
+   "Cannot determine system location code");
 
FAIL_IF(devfd < 0);
 
-- 
2.37.1 (Apple Git-137.1)

[PATCH v1] selftests/mm: ksm_tests should only MADV_HUGEPAGE valid memory

2024-01-22 Thread Ryan Roberts

ksm_tests was previously mmapping a region of memory, aligning the
returned pointer to a PMD boundary, then setting MADV_HUGEPAGE, but was
setting it past the end of the mmapped area due to not taking the
pointer alignment into consideration. Fix this behaviour.

Up until commit efa7df3e3bb5 ("mm: align larger anonymous mappings on
THP boundaries"), this buggy behavior was (usually) masked because the
alignment difference was always less than PMD-size. But since the
mentioned commit, `ksm_tests -H -s 100` started failing.
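
The range arithmetic behind the fix can be sketched as follows. This is a generic illustration, not the ksm_tests code: the `sketch_*` helpers and the round-up alignment are assumptions, and the real test may compute its aligned pointer differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (a power of two). */
static uintptr_t sketch_align_up(uintptr_t addr, size_t align)
{
	assert(align && !(align & (align - 1)));
	return (addr + align - 1) & ~((uintptr_t)align - 1);
}

/* Nonzero if [start, start + len) lies inside [map, map + map_len). */
static int sketch_range_inside(uintptr_t start, size_t len,
			       uintptr_t map, size_t map_len)
{
	return start >= map && start + len <= map + map_len;
}
```

For a mapping of len + HPAGE_SIZE bytes whose start is not huge page aligned, the aligned pointer plus len stays inside the mapping, while the aligned pointer plus len + HPAGE_SIZE runs past its end.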

Fixes: 325254899684 ("selftests: vm: add KSM huge pages merging time test")
Cc: sta...@vger.kernel.org
Signed-off-by: Ryan Roberts 
---
Applies on top of mm-unstable.

Thanks,
Ryan


 tools/testing/selftests/mm/ksm_tests.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/mm/ksm_tests.c 
b/tools/testing/selftests/mm/ksm_tests.c
index 380b691d3eb9..b748c48908d9 100644
--- a/tools/testing/selftests/mm/ksm_tests.c
+++ b/tools/testing/selftests/mm/ksm_tests.c
@@ -566,7 +566,7 @@ static int ksm_merge_hugepages_time(int merge_type, int 
mapping, int prot,
if (map_ptr_orig == MAP_FAILED)
err(2, "initial mmap");

-   if (madvise(map_ptr, len + HPAGE_SIZE, MADV_HUGEPAGE))
+   if (madvise(map_ptr, len, MADV_HUGEPAGE))
err(2, "MADV_HUGEPAGE");

pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
--
2.25.1

Re: [PATCH v5 02/12] KVM: arm64: selftests: Data type cleanup for arch_timer test

2024-01-22 Thread Andrew Jones

On Mon, Jan 22, 2024 at 05:58:32PM +0800, Haibo Xu wrote:
> Change signed types to unsigned in the test_args struct for fields which
> only make sense as unsigned values.
> 
> Suggested-by: Andrew Jones 
> Signed-off-by: Haibo Xu 
> ---
>  tools/testing/selftests/kvm/aarch64/arch_timer.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/aarch64/arch_timer.c 
> b/tools/testing/selftests/kvm/aarch64/arch_timer.c
> index 274b8465b42a..3260fefcc1b3 100644
> --- a/tools/testing/selftests/kvm/aarch64/arch_timer.c
> +++ b/tools/testing/selftests/kvm/aarch64/arch_timer.c
> @@ -42,10 +42,10 @@
>  #define TIMER_TEST_MIGRATION_FREQ_MS 2
>  
>  struct test_args {
> - int nr_vcpus;
> - int nr_iter;
> - int timer_period_ms;
> - int migration_freq_ms;
> + uint32_t nr_vcpus;
> + uint32_t nr_iter;
> + uint32_t timer_period_ms;
> + uint32_t migration_freq_ms;
>   struct kvm_arm_counter_offset offset;
>  };
>  
> @@ -57,7 +57,7 @@ static struct test_args test_args = {
>   .offset = { .reserved = 1 },
>  };
>  
> -#define msecs_to_usecs(msec) ((msec) * 1000LL)
> +#define msecs_to_usecs(msec) ((msec) * 1000ULL)
>  
>  #define GICD_BASE_GPA 0x800ULL
>  #define GICR_BASE_GPA 0x80AULL
> @@ -72,7 +72,7 @@ enum guest_stage {
>  
>  /* Shared variables between host and guest */
>  struct test_vcpu_shared_data {
> - int nr_iter;
> + uint32_t nr_iter;
>   enum guest_stage guest_stage;
>   uint64_t xcnt;
>  };
> -- 
> 2.34.1
>

Reviewed-by: Andrew Jones
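
For reference, a small sketch of why the unsigned 64-bit multiplier pairs well with the uint32_t fields in the reviewed patch: the product is computed in 64-bit unsigned arithmetic, so large millisecond values cannot wrap at 32 bits. The `sketch_*` names are hypothetical and only mirror the msecs_to_usecs() macro from the diff.

```c
#include <assert.h>
#include <stdint.h>

#define sketch_msecs_to_usecs(msec)	((msec) * 1000ULL)

/* A uint32_t operand is converted to unsigned long long before multiplying. */
static uint64_t sketch_period_usecs(uint32_t period_ms)
{
	return sketch_msecs_to_usecs(period_ms);
}
```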

Re: [PATCH v6 2/3] livepatch: Move tests from lib/livepatch to selftests/livepatch

2024-01-22 Thread Marcos Paulo de Souza

On Fri, 2024-01-19 at 14:19 +0100, Alexander Gordeev wrote:
> On Fri, Jan 19, 2024 at 02:11:01PM +0100, Alexander Gordeev wrote:
> > FWIW, for s390 part:
> > 
> > Alexander Gordeev 
> 
> Acked-by: Alexander Gordeev 

Thanks Alexander and Joe for testing and supporting the change.

Shuah, now that the issue found by Joe was fixed, do you think the
change is ready to be merged? The patches were reviewed by three
different people already, and I don't know what else can be missing at
this point.

Thanks,
  Marcos

Re: [PATCH v5 02/12] KVM: arm64: selftests: Data type cleanup for arch_timer test

2024-01-22 Thread Haibo Xu

On Mon, Jan 22, 2024 at 8:21 PM Andrew Jones  wrote:
>
> On Mon, Jan 22, 2024 at 05:58:32PM +0800, Haibo Xu wrote:
> > Change signed types to unsigned in the test_args struct for fields which
> > only make sense as unsigned values.
> >
> > Suggested-by: Andrew Jones 
> > Signed-off-by: Haibo Xu 
> > ---
> >  tools/testing/selftests/kvm/aarch64/arch_timer.c | 12 ++--
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/tools/testing/selftests/kvm/aarch64/arch_timer.c 
> > b/tools/testing/selftests/kvm/aarch64/arch_timer.c
> > index 274b8465b42a..3260fefcc1b3 100644
> > --- a/tools/testing/selftests/kvm/aarch64/arch_timer.c
> > +++ b/tools/testing/selftests/kvm/aarch64/arch_timer.c
> > @@ -42,10 +42,10 @@
> >  #define TIMER_TEST_MIGRATION_FREQ_MS 2
> >
> >  struct test_args {
> > - int nr_vcpus;
> > - int nr_iter;
> > - int timer_period_ms;
> > - int migration_freq_ms;
> > + uint32_t nr_vcpus;
> > + uint32_t nr_iter;
> > + uint32_t timer_period_ms;
> > + uint32_t migration_freq_ms;
> >   struct kvm_arm_counter_offset offset;
> >  };
> >
> > @@ -57,7 +57,7 @@ static struct test_args test_args = {
> >   .offset = { .reserved = 1 },
> >  };
> >
> > -#define msecs_to_usecs(msec) ((msec) * 1000LL)
> > +#define msecs_to_usecs(msec) ((msec) * 1000ULL)
> >
> >  #define GICD_BASE_GPA 0x800ULL
> >  #define GICR_BASE_GPA 0x80AULL
> > @@ -72,7 +72,7 @@ enum guest_stage {
> >
> >  /* Shared variables between host and guest */
> >  struct test_vcpu_shared_data {
> > - int nr_iter;
> > + uint32_t nr_iter;
> >   enum guest_stage guest_stage;
> >   uint64_t xcnt;
> >  };
> > --
> > 2.34.1
> >
>
> Reviewed-by: Andrew Jones 

Thanks!

Re: [PATCH v14] exec: Fix dead-lock in de_thread with ptrace_attach

2024-01-22 Thread Bernd Edlinger

On 1/17/24 17:38, Oleg Nesterov wrote:
> On 01/17, Bernd Edlinger wrote:
>> Yes, but the tracer has to do its job, and that is to ptrace_attach the
>> remaining threads; it does not know that it would avoid a dead-lock
>> when it calls wait(), instead of ptrace_attach.  It does not know
>> that the tracee has just called execve in one of the not yet traced
>> threads.
> 
> Hmm. I don't understand you.

Certainly I am willing to rephrase this until it is understandable.
Probably I have just not yet found the proper way to describe the issue
here, and your help in resolving that documentation issue is very important
to me.

> 
> I agree we have a problem which should be fixed. Just the changelog
> looks confusing to me, imo it doesn't explain the race/problem clearly.
> 

I am trying here to summarize what the test case "attach" in
./tools/testing/selftests/ptrace/vmaccess.c does.

I think it models the use case of a tracer that is trying to attach
to a multi-threaded process that is executing execve in a not-yet
traced thread while a different sub-thread is already traced.
It is not relevant that the test case uses PTRACE_TRACEME to make
the sub-thread traced; the same would happen if the tracer used
some out-of-band mechanism like /proc/pid/task to learn the thread_id
of the sub-threads and used ptrace_attach on each of them.

The test case hits the dead-lock because there is a race condition
before the PTRACE_ATTACH: the tracer cannot know that the
exit event from the sub-thread is already pending before the
PTRACE_ATTACH.  Of course a real tracer will not sleep a whole
second before a PTRACE_ATTACH, but even if it does a
waitpid(-1, &s, WNOHANG) immediately before the PTRACE_ATTACH
there is a tiny chance that the execve is entered just
after waitpid has indicated that there is currently no
event pending.

 +  if (unlikely(t->ptrace)
 +  && (t != tsk->group_leader || !t->exit_state))
 +  unsafe_execve_in_progress = true;
>>>
>>> The !t->exit_state is not right... This sub-thread can already be a zombie
>>> with ->exit_state != 0 but see above, it won't be reaped until the debugger
>>> does wait().
>>>
>>
>> I don't think so.
>> de_thread() handles the group_leader different than normal threads.
> 
> I don't follow...
> 
> I didn't say that t is a group leader. I said it can be a zombie sub-thread
> with ->exit_state != 0.
> 

The condition here is

(t != tsk->group_leader || !t->exit_state)

so in other words, if t is a sub-thread, i.e. t != tsk->group_leader,
then t->exit_state does not count, and the deadlock is possible.

But if t is a group leader, then t == tsk->group_leader, and a
deadlock is only possible when t->exit_state == 0 at this time.
The most likely reason for this is PTRACE_O_TRACEEXIT.
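
The cases above can be spelled out as a small truth table. This is a standalone model, not kernel code: struct sketch_task and its fields are hypothetical stand-ins for the task_struct members named in the condition.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_task {
	bool ptraced;		/* models t->ptrace != 0 */
	int exit_state;		/* models t->exit_state, 0 while alive */
	bool is_group_leader;	/* models t == tsk->group_leader */
};

/* Mirrors: t->ptrace && (t != tsk->group_leader || !t->exit_state) */
static bool sketch_unsafe_execve(const struct sketch_task *t)
{
	return t->ptraced && (!t->is_group_leader || !t->exit_state);
}
```

A traced sub-thread counts as unsafe whether or not it has already exited, while a traced group leader counts only while its exit_state is still 0.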

I will add a new test case that demonstrates this in the next iteration
of this patch.  Here is a preview of what I have right now:

/*
 * Same test as previous, except that
 * the group leader is ptraced first,
 * but this time with PTRACE_O_TRACEEXIT,
 * and the thread that does execve is
 * not yet ptraced.  This exercises the
 * code block in de_thread where the
 * if (!thread_group_leader(tsk)) {
 * is executed and enters a wait state.
 */
static long thread2_tid;
static void *thread2(void *arg)
{
	thread2_tid = syscall(__NR_gettid);
	sleep(2);
	execlp("false", "false", NULL);
	return NULL;
}

TEST(attach2)
{
	int s, k, pid = fork();

	if (!pid) {
		pthread_t pt;

		pthread_create(&pt, NULL, thread2, NULL);
		pthread_join(pt, NULL);
		return;
	}

	sleep(1);
	k = ptrace(PTRACE_ATTACH, pid, 0L, 0L);
	ASSERT_EQ(k, 0);
	k = waitpid(-1, &s, 0);
	ASSERT_EQ(k, pid);
	ASSERT_EQ(WIFSTOPPED(s), 1);
	ASSERT_EQ(WSTOPSIG(s), SIGSTOP);
	k = ptrace(PTRACE_SETOPTIONS, pid, 0L, PTRACE_O_TRACEEXIT);
	ASSERT_EQ(k, 0);
	thread2_tid = ptrace(PTRACE_PEEKDATA, pid, &thread2_tid, 0L);
	ASSERT_NE(thread2_tid, -1);
	ASSERT_NE(thread2_tid, 0);
	ASSERT_NE(thread2_tid, pid);
	k = waitpid(-1, &s, WNOHANG);
	ASSERT_EQ(k, 0);
	sleep(2);
	/* deadlock may happen here */
	k = ptrace(PTRACE_ATTACH, thread2_tid, 0L, 0L);
	ASSERT_EQ(k, 0);
	k = waitpid(-1, &s, WNOHANG);
	ASSERT_EQ(k, pid);
	ASSERT_EQ(WIFSTOPPED(s), 1);
	ASSERT_EQ(WSTOPSIG(s), SIGTRAP);
	k = waitpid(-1, &s, WNOHANG);
	ASSERT_EQ(k, 0);
	k = ptrace(PTRACE_CONT, pid, 0L, 0L);
	ASSERT_EQ(k, 0);
	k = waitpid(-1, &s, 0);
	ASSERT_EQ(k, pid);
	ASSERT_EQ(WIFSTOPPED(s), 1);
	ASSERT_EQ(WSTOPSIG(s), SIGTRAP);
	k = waitpid(-1, &s, WNOHANG);
	ASSERT_EQ(k, 0);
	k = ptrace(PTRACE_CONT, pid, 0L, 0L);
	ASSERT_EQ(k, 0);
	k = waitpid(-1, &s, 0);
	ASSERT_EQ(k, pid);

Re: [PATCH v14] exec: Fix dead-lock in de_thread with ptrace_attach

2024-01-22 Thread Oleg Nesterov

I'll try to read your email later, just one note for now...

On 01/22, Bernd Edlinger wrote:
>
> > I didn't say that t is a group leader. I said it can be a zombie sub-thread
> > with ->exit_state != 0.
>
> the condition here is
>
> (t != tsk->group_leader || !t->exit_state)
>
> so in other words, if t is a sub-thread, i.e. t != tsk->group_leader
> then the t->exit_state does not count,

Ah indeed, somehow I misread this check as if you skip the sub-threads
with ->exit_state != 0.

Sorry for noise.

Oleg.
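
The check quoted above can be read as a small truth table. What follows is a minimal userspace sketch, not kernel code: struct task_model and passes_check() are hypothetical names made up for illustration, and only the boolean expression is taken from the thread. It shows that a sub-thread passes regardless of its exit_state, while the group leader passes only when its exit_state is zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two task_struct fields the check reads. */
struct task_model {
	int exit_state;                        /* 0 means the task has not exited */
	const struct task_model *group_leader; /* leader of the thread group */
};

/*
 * Same boolean expression as quoted in the thread:
 *   (t != tsk->group_leader || !t->exit_state)
 */
static bool passes_check(const struct task_model *t,
			 const struct task_model *tsk)
{
	assert(t != NULL && tsk != NULL);
	return t != tsk->group_leader || !t->exit_state;
}
```

With a leader whose exit_state is non-zero, passes_check() returns false for the leader and true for any sub-thread, which matches the "t->exit_state does not count" reading for sub-threads.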

Re: [PATCH v2] selftests: mm: fix map_hugetlb failure on 64K page size systems

2024-01-22 Thread Nico Pache

Hi Andrew,

No, I think it's always been broken-- I don't think the test was
written with 512M huge page sizes in mind.

-- Nico

On Sat, Jan 20, 2024 at 9:39 PM Andrew Morton wrote:
>
> On Fri, 19 Jan 2024 06:14:29 -0700 Nico Pache wrote:
>
> > On systems with 64k page size and 512M huge page sizes, the allocation
> > and test succeed but error out at the munmap. As the comment states,
> > munmap will fail if it's not HUGEPAGE aligned. This is due to the
> > length of the mapping being 1/2 the size of the hugepage, causing the
> > munmap to not be hugepage aligned. Fix this by making the mapping length
> > the full hugepage if the hugepage is larger than the length of the
> > mapping.
>
> Is
>
> Fixes: fa7b9a805c79 ("tools/selftest/vm: allow choosing mem size and page size in map_hugetlb")
>
> a suitable Fixes: target for this?
>
> > --- a/tools/testing/selftests/mm/map_hugetlb.c
> > +++ b/tools/testing/selftests/mm/map_hugetlb.c
> > @@ -15,6 +15,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include "vm_util.h"
> >
> >  #define LENGTH (256UL*1024*1024)
> >  #define PROTECTION (PROT_READ | PROT_WRITE)
> > @@ -58,10 +59,16 @@ int main(int argc, char **argv)
> >  {
> >   void *addr;
> >   int ret;
> > + size_t hugepage_size;
> >   size_t length = LENGTH;
> >   int flags = FLAGS;
> >   int shift = 0;
> >
> > + hugepage_size = default_huge_page_size();
> > + /* munmap with fail if the length is not page aligned */
> > + if (hugepage_size > length)
> > + length = hugepage_size;
> > +
> >   if (argc > 1)
> >   length = atol(argv[1]) << 20;
> >   if (argc > 2) {
> > --
> > 2.43.0
>
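
The length adjustment in the quoted hunk can be sketched in isolation. This is an illustrative helper only: pick_mapping_length() is a hypothetical name that does not appear in the patch, and the huge page size is passed in rather than read through default_huge_page_size().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative helper (not part of the patch): grow the default mapping
 * length to one full huge page when the huge page is larger, so that the
 * later munmap() length is a multiple of the huge page size.
 */
static size_t pick_mapping_length(size_t default_length, size_t hugepage_size)
{
	assert(hugepage_size != 0);
	return hugepage_size > default_length ? hugepage_size : default_length;
}
```

With the test's default LENGTH of 256M, a 512M huge page yields a 512M mapping, while a 2M huge page leaves the 256M length unchanged. In the quoted hunk the adjustment sits before the argv handling, so a length given on the command line still replaces it.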

[PATCH v2] kselftest: dt: Stop relying on dirname to improve performance

2024-01-22 Thread Nícolas F . R . A . Prado

When walking directory trees, instead of looking for specific files and
running dirname to get the parent folder, traverse all folders and
ignore the ones not containing the desired files. This avoids the need
to call dirname inside the loop, which drastically decreases run time:
Running locally on a mt8192-asurada-spherion, which reports 160 test
cases, has gone from 5.5s to 2.9s, while running remotely with an
nfsroot has gone from 13.5s to 5.5s.

This change has a side-effect, which is that the root DT node now
also shows in the output, even though it isn't expected to bind to a
driver. However there shouldn't be a matching driver for the board
compatible, so the end result will be just an extra skipped test:

ok 1 / # SKIP

Reported-by: Mark Brown 
Closes: https://lore.kernel.org/all/310391e8-fdf2-4c2f-a680-7744eb685...@sirena.org.uk
Fixes: 14571ab1ad21 ("kselftest: Add new test for detecting unprobed Devicetree devices")
Tested-by: Mark Brown 
Signed-off-by: Nícolas F. R. A. Prado 
---
Changes in v2:
- Tweaked commit message
- Added trailer tags
- Rebased on 6.8-rc1
---
 tools/testing/selftests/dt/test_unprobed_devices.sh | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/dt/test_unprobed_devices.sh 
b/tools/testing/selftests/dt/test_unprobed_devices.sh
index b07af2a4c4de..7fae90293a9d 100755
--- a/tools/testing/selftests/dt/test_unprobed_devices.sh
+++ b/tools/testing/selftests/dt/test_unprobed_devices.sh
@@ -33,8 +33,8 @@ if [[ ! -d "${PDT}" ]]; then
 fi
 
 nodes_compatible=$(
-   for node_compat in $(find ${PDT} -name compatible); do
-   node=$(dirname "${node_compat}")
+   for node in $(find ${PDT} -type d); do
+   [ ! -f "${node}"/compatible ] && continue
# Check if node is available
if [[ -e "${node}"/status ]]; then
status=$(tr -d '\000' < "${node}"/status)
@@ -46,10 +46,11 @@ nodes_compatible=$(
 
 nodes_dev_bound=$(
IFS=$'\n'
-   for uevent in $(find /sys/devices -name uevent); do
-   if [[ -d "$(dirname "${uevent}")"/driver ]]; then
-   grep '^OF_FULLNAME=' "${uevent}" | sed -e 
's|OF_FULLNAME=||'
-   fi
+   for dev_dir in $(find /sys/devices -type d); do
+   [ ! -f "${dev_dir}"/uevent ] && continue
+   [ ! -d "${dev_dir}"/driver ] && continue
+
+   grep '^OF_FULLNAME=' "${dev_dir}"/uevent | sed -e 
's|OF_FULLNAME=||'
done
)
 

---
base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
change-id: 20240122-dt-kselftest-dirname-perf-fix-7dc421e6dfb0

Best regards,
-- 
Nícolas F. R. A. Prado

Re: [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions

2024-01-22 Thread Michal Koutný

Hello Waiman.

On Wed, Jan 17, 2024 at 11:35:03AM -0500, Waiman Long wrote:
> This patch series is based on the RFC patch from Frederic [1]. Instead
> of offering RCU_NOCB as a separate option, it is now lumped into a
> root-only cpuset.cpus.isolation_full flag that will enable all the
> additional CPU isolation capabilities available for isolated partitions
> if set. RCU_NOCB is just the first one to this party. Additional dynamic
> CPU isolation capabilities will be added in the future.

IIUC this is similar to what I suggested back in the day and you didn't
consider it [1]. Do I read this right that you've changed your mind?

(It's fine if you did, I'm only asking to follow the heading of cpuset
controller.)

Thanks,
Michal

[1] https://lore.kernel.org/r/58c87587-417b-1498-185f-1db6bb612...@redhat.com/


[PATCH v7 1/4] mseal: Wire up mseal syscall

2024-01-22 Thread jeffxu

From: Jeff Xu 

Wire up mseal syscall for all architectures.

Signed-off-by: Jeff Xu 
---
 arch/alpha/kernel/syscalls/syscall.tbl  | 1 +
 arch/arm/tools/syscall.tbl  | 1 +
 arch/arm64/include/asm/unistd.h | 2 +-
 arch/arm64/include/asm/unistd32.h   | 2 ++
 arch/m68k/kernel/syscalls/syscall.tbl   | 1 +
 arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   | 1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   | 1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   | 1 +
 arch/parisc/kernel/syscalls/syscall.tbl | 1 +
 arch/powerpc/kernel/syscalls/syscall.tbl| 1 +
 arch/s390/kernel/syscalls/syscall.tbl   | 1 +
 arch/sh/kernel/syscalls/syscall.tbl | 1 +
 arch/sparc/kernel/syscalls/syscall.tbl  | 1 +
 arch/x86/entry/syscalls/syscall_32.tbl  | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl  | 1 +
 arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
 include/uapi/asm-generic/unistd.h   | 5 -
 kernel/sys_ni.c | 1 +
 19 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl 
b/arch/alpha/kernel/syscalls/syscall.tbl
index 8ff110826ce2..d8f96362e9f8 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -501,3 +501,4 @@
 569common  lsm_get_self_attr   sys_lsm_get_self_attr
 570common  lsm_set_self_attr   sys_lsm_set_self_attr
 571common  lsm_list_modulessys_lsm_list_modules
+572common  mseal   sys_mseal
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index b6c9e01e14f5..2ed7d229c8f9 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -475,3 +475,4 @@
 459common  lsm_get_self_attr   sys_lsm_get_self_attr
 460common  lsm_set_self_attr   sys_lsm_set_self_attr
 461common  lsm_list_modulessys_lsm_list_modules
+462common  mseal   sys_mseal
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 491b2b9bd553..1346579f802f 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -39,7 +39,7 @@
 #define __ARM_NR_compat_set_tls(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls   462
+#define __NR_compat_syscalls   463
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h 
b/arch/arm64/include/asm/unistd32.h
index 7118282d1c79..266b96acc014 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -929,6 +929,8 @@ __SYSCALL(__NR_lsm_get_self_attr, sys_lsm_get_self_attr)
 __SYSCALL(__NR_lsm_set_self_attr, sys_lsm_set_self_attr)
 #define __NR_lsm_list_modules 461
 __SYSCALL(__NR_lsm_list_modules, sys_lsm_list_modules)
+#define __NR_mseal 462
+__SYSCALL(__NR_mseal, sys_mseal)
 
 /*
  * Please add new compat syscalls above this comment and update
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl 
b/arch/m68k/kernel/syscalls/syscall.tbl
index 7fd43fd4c9f2..22a3cbd4c602 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -461,3 +461,4 @@
 459common  lsm_get_self_attr   sys_lsm_get_self_attr
 460common  lsm_set_self_attr   sys_lsm_set_self_attr
 461common  lsm_list_modulessys_lsm_list_modules
+462common  mseal   sys_mseal
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl 
b/arch/microblaze/kernel/syscalls/syscall.tbl
index b00ab2cabab9..2b81a6bd78b2 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -467,3 +467,4 @@
 459common  lsm_get_self_attr   sys_lsm_get_self_attr
 460common  lsm_set_self_attr   sys_lsm_set_self_attr
 461common  lsm_list_modulessys_lsm_list_modules
+462common  mseal   sys_mseal
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl 
b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 83cfc9eb6b88..cc869f5d5693 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -400,3 +400,4 @@
 459n32 lsm_get_self_attr   sys_lsm_get_self_attr
 460n32 lsm_set_self_attr   sys_lsm_set_self_attr
 461n32 lsm_list_modulessys_lsm_list_modules
+462n32 mseal   sys_mseal
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl 
b/arch/mips/kernel/syscalls/syscall_n64.tbl
index 532b855df589..1464c6be6eb3 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -376,3 +376,4 @@
 459n64 lsm_get_self_attr

[PATCH v7 0/4] Introduce mseal()

2024-01-22 Thread jeffxu

From: Jeff Xu 

This patchset proposes a new mseal() syscall for the Linux kernel.

In a nutshell, mseal() protects the VMAs of a given virtual memory
range against modifications, such as changes to their permission bits.

Modern CPUs support memory permissions, such as the read/write (RW)
and no-execute (NX) bits. Linux has supported NX since the release of
kernel version 2.6.8 in August 2004 [1]. The memory permission feature
improves the security stance on memory corruption bugs, as an attacker
cannot simply write to arbitrary memory and point the code to it. The
memory must be marked with the X bit, or else an exception will occur.
Internally, the kernel maintains the memory permissions in a data
structure called VMA (vm_area_struct). mseal() additionally protects
the VMA itself against modifications of the selected seal type.

Memory sealing is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system. For
example, such an attacker primitive can break control-flow integrity
guarantees since read-only memory that is supposed to be trusted can
become writable or .text pages can get remapped. Memory sealing can
automatically be applied by the runtime loader to seal .text and
.rodata pages and applications can additionally seal security critical
data at runtime. A similar feature already exists in the XNU kernel
with the VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the
mimmutable syscall [4]. Also, Chrome wants to adopt this feature for
their CFI work [2] and this patchset has been designed to be
compatible with the Chrome use case.

Two system calls are involved in sealing the map:  mmap() and mseal().

The new mseal() is a syscall on 64-bit CPUs, with the
following signature:

int mseal(void *addr, size_t len, unsigned long flags)
addr/len: memory range.
flags: reserved.
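
The documentation patch in this series lists the argument checks that make mseal() return -EINVAL: non-zero flags, a start address that is not page aligned, and an addr + len overflow, with len rounded up to a page boundary by the kernel. A minimal userspace model of just those checks might look like the following. It is a sketch under stated assumptions, not the code in mm/mseal.c: model_mseal_args() is a hypothetical name, the page size is a parameter, and the -ENOMEM and -EACCES cases are not modeled because they depend on the VMA layout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model of the documented -EINVAL checks.
 * On success returns 0 and stores the page-aligned end address in *end.
 */
static int model_mseal_args(unsigned long addr, unsigned long len,
			    unsigned long flags, unsigned long page_size,
			    unsigned long *end)
{
	unsigned long aligned_len;

	/* The mask arithmetic below needs a power-of-two page size. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	assert(end != NULL);

	if (flags)				/* flags are reserved */
		return -EINVAL;
	if (addr & (page_size - 1))		/* start not page aligned */
		return -EINVAL;

	/* Round len up to a page multiple, rejecting wrap-around. */
	aligned_len = (len + page_size - 1) & ~(page_size - 1);
	if (aligned_len < len)
		return -EINVAL;
	if (addr + aligned_len < addr)		/* addr + len overflow */
		return -EINVAL;

	*end = addr + aligned_len;
	return 0;
}
```

For example, with a 4096-byte page, addr 0x1000 and len 100 are accepted and cover the range up to 0x2000, while addr 0x1001 is rejected.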

mseal() blocks the following operations for the given memory range.

1> Unmapping, moving to another location, and shrinking the size,
   via munmap() and mremap(). These can leave an empty space that can
   then be replaced with a VMA with a new set of attributes.

2> Moving or expanding a different VMA into the current location,
   via mremap().

3> Modifying a VMA via mmap(MAP_FIXED).

4> Size expansion, via mremap(), does not appear to pose any specific
   risks to sealed VMAs. It is included anyway because the use case is
   unclear. In any case, users can rely on merging to expand a sealed VMA.

5> mprotect() and pkey_mprotect().

6> Some destructive madvise() behaviors (e.g. MADV_DONTNEED) for anonymous
   memory, when users don't have write permission to the memory. Those
   behaviors can alter region contents by discarding pages, effectively a
   memset(0) for anonymous memory.

In addition: mmap() has two related changes.

The PROT_SEAL bit in prot field of mmap(). When present, it marks
the map sealed since creation.

The MAP_SEALABLE bit in the flags field of mmap(). When present, it marks
the map as sealable. A map created without MAP_SEALABLE will not support
sealing, i.e. mseal() will fail.

Applications that don't care about sealing will expect their behavior
unchanged. For those that need sealing support, opt-in by adding
MAP_SEALABLE in mmap().
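
A rough userspace sketch of the intended call sequence follows. It is written under several assumptions: try_mseal() and seal_demo() are hypothetical helpers, the syscall is invoked through syscall() only when the installed headers define __NR_mseal, and MAP_SEALABLE is left out because it exists only in this proposal. Under the semantics described here, sealing a mapping created without MAP_SEALABLE would be refused, and the sketch simply reports that error.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: 0 on success, negative errno on failure.
 * __NR_mseal is only defined when the installed headers know the syscall.
 */
static int try_mseal(void *addr, size_t len)
{
	assert(addr != NULL);
#ifdef __NR_mseal
	if (syscall(__NR_mseal, addr, len, 0UL) == 0)
		return 0;
	return -errno;
#else
	(void)len;
	return -ENOSYS;
#endif
}

/*
 * Map one read-only anonymous page, try to seal it, and check that a
 * later mprotect() is refused. Returns 0 if sealing worked and mprotect()
 * failed with EPERM, a negative errno if a step was refused or is not
 * available, and 1 if mprotect() unexpectedly succeeded after sealing.
 */
static int seal_demo(void)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;
	int ret;

	if (page <= 0)
		return -EINVAL;

	p = mmap(NULL, (size_t)page, PROT_READ,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -errno;

	ret = try_mseal(p, (size_t)page);
	if (ret) {
		munmap(p, (size_t)page);
		return ret;
	}

	/* Not unmapped on purpose: munmap() is blocked once the page is sealed. */
	if (mprotect(p, (size_t)page, PROT_READ | PROT_WRITE) == 0)
		return 1;
	return errno == EPERM ? 0 : -errno;
}
```

On a system without the syscall, seal_demo() is expected to return a negative errno such as -ENOSYS rather than fail in some other way.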

The idea that inspired this patch comes from Stephen Röttger’s work in
V8 CFI [5]. Chrome browser in ChromeOS will be the first user of this
API.

Indeed, the Chrome browser has very specific requirements for sealing,
which are distinct from those of most applications. For example, in
the case of libc, sealing is only applied to read-only (RO) or
read-execute (RX) memory segments (such as .text and .RELRO) to
prevent them from becoming writable, and the lifetime of those mappings
is tied to the lifetime of the process.

Chrome wants to seal two large address space reservations that are
managed by different allocators. The memory is mapped RW- and RWX
respectively but write access to it is restricted using pkeys (or in
the future ARM permission overlay extensions). The lifetime of those
mappings is not tied to the lifetime of the process; therefore, while
the memory is sealed, the allocators still need to free or discard the
unused memory, for example with madvise(DONTNEED).

However, always allowing madvise(DONTNEED) on this range poses a
security risk. For example if a jump instruction crosses a page
boundary and the second page gets discarded, it will overwrite the
target bytes with zeros and change the control flow. Checking
write-permission before the discard operation allows us to control
when the operation is valid. In this case, the madvise will only
succeed if the executing thread has PKEY write permissions and PKRU
changes are protected in software by control-flow integrity.

Although the initial version of this patch series is targeting the
Chrome browser as its first user, it became evident during upstream
discussions that we would also want to ensure that the patch set
eventually is a complete solution for memory sealing

[PATCH v7 2/4] mseal: add mseal syscall

2024-01-22 Thread jeffxu

From: Jeff Xu 

The new mseal() is a syscall on 64-bit CPUs, with the
following signature:

int mseal(void *addr, size_t len, unsigned long flags)
addr/len: memory range.
flags: reserved.

mseal() blocks the following operations for the given memory range.

1> Unmapping, moving to another location, and shrinking the size,
   via munmap() and mremap(). These can leave an empty space that can
   then be replaced with a VMA with a new set of attributes.

2> Moving or expanding a different VMA into the current location,
   via mremap().

3> Modifying a VMA via mmap(MAP_FIXED).

4> Size expansion, via mremap(), does not appear to pose any specific
   risks to sealed VMAs. It is included anyway because the use case is
   unclear. In any case, users can rely on merging to expand a sealed VMA.

5> mprotect() and pkey_mprotect().

6> Some destructive madvise() behaviors (e.g. MADV_DONTNEED) for anonymous
   memory, when users don't have write permission to the memory. Those
   behaviors can alter region contents by discarding pages, effectively a
   memset(0) for anonymous memory.

In addition: mmap() has two related changes.

The PROT_SEAL bit in prot field of mmap(). When present, it marks
the map sealed since creation.

The MAP_SEALABLE bit in the flags field of mmap(). When present, it marks
the map as sealable. A map created without MAP_SEALABLE will not support
sealing, i.e. mseal() will fail.

Applications that don't care about sealing will expect their behavior
unchanged. For those that need sealing support, opt-in by adding
MAP_SEALABLE in mmap().

I would like to formally acknowledge the valuable contributions
received during the RFC process, which were instrumental
in shaping this patch:

Jann Horn: raising awareness and providing valuable insights on the
destructive madvise operations.
Linus Torvalds: assisting in defining system call signature and scope.
Pedro Falcato: suggesting sealing in the mmap().
Theo de Raadt: sharing the experiences and insights gained from
implementing mimmutable() in OpenBSD.

Finally, the idea that inspired this patch comes from Stephen Röttger’s
work in Chrome V8 CFI.

Signed-off-by: Jeff Xu 
---
 include/linux/mm.h |  48 
 include/linux/syscalls.h   |   1 +
 include/uapi/asm-generic/mman-common.h |   8 +
 mm/Makefile|   4 +
 mm/madvise.c   |  12 +
 mm/mmap.c  |  27 ++
 mm/mprotect.c  |  10 +
 mm/mremap.c|  31 +++
 mm/mseal.c | 343 +
 9 files changed, 484 insertions(+)
 create mode 100644 mm/mseal.c

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f5a97dec5169..bdd9a53e9291 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -328,6 +328,14 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HIGH_ARCH_5 BIT(VM_HIGH_ARCH_BIT_5)
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
 
+#ifdef CONFIG_64BIT
+/* VM is sealable, in vm_flags */
+#define VM_SEALABLE_BITUL(63)
+
+/* VM is sealed, in vm_flags */
+#define VM_SEALED  _BITUL(62)
+#endif
+
 #ifdef CONFIG_ARCH_HAS_PKEYS
 # define VM_PKEY_SHIFT VM_HIGH_ARCH_BIT_0
 # define VM_PKEY_BIT0  VM_HIGH_ARCH_0  /* A protection key is a 4-bit value */
@@ -4182,4 +4190,44 @@ static inline bool pfn_is_unaccepted_memory(unsigned 
long pfn)
return range_contains_unaccepted_memory(paddr, paddr + PAGE_SIZE);
 }
 
+#ifdef CONFIG_64BIT
+static inline int can_do_mseal(unsigned long flags)
+{
+   if (flags)
+   return -EINVAL;
+
+   return 0;
+}
+
+bool can_modify_mm(struct mm_struct *mm, unsigned long start,
+   unsigned long end);
+bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start,
+   unsigned long end, int behavior);
+unsigned long get_mmap_seals(unsigned long prot,
+   unsigned long flags);
+#else
+static inline int can_do_mseal(unsigned long flags)
+{
+   return -EPERM;
+}
+
+static inline bool can_modify_mm(struct mm_struct *mm, unsigned long start,
+   unsigned long end)
+{
+   return true;
+}
+
+static inline bool can_modify_mm_madv(struct mm_struct *mm, unsigned long 
start,
+   unsigned long end, int behavior)
+{
+   return true;
+}
+
+static inline unsigned long get_mmap_seals(unsigned long prot,
+   unsigned long flags)
+{
+   return 0;
+}
+#endif
+
 #endif /* _LINUX_MM_H */
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index cdba4d0c6d4a..2d44e0d99e37 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -820,6 +820,7 @@ asmlinkage long sys_process_mrelease(int pidfd, unsigned 
int flags);
 asmlinkage long sys_remap_file_pages(unsigned long start, unsigned long size,
unsigned long prot, unsigned long pgoff,
unsigned long flags);
+asmlinkage long sys_mseal(unsigned long start, size_t len, un

[PATCH v7 4/4] mseal:add documentation

2024-01-22 Thread jeffxu

From: Jeff Xu 

Add documentation for mseal().

Signed-off-by: Jeff Xu 
---
 Documentation/userspace-api/index.rst |   1 +
 Documentation/userspace-api/mseal.rst | 183 ++
 2 files changed, 184 insertions(+)
 create mode 100644 Documentation/userspace-api/mseal.rst

diff --git a/Documentation/userspace-api/index.rst 
b/Documentation/userspace-api/index.rst
index 09f61bd2ac2e..178f6a1d79cb 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -26,6 +26,7 @@ place where this information is gathered.
iommu
iommufd
media/index
+   mseal
netlink/index
sysfs-platform_profile
vduse
diff --git a/Documentation/userspace-api/mseal.rst 
b/Documentation/userspace-api/mseal.rst
new file mode 100644
index ..929a706b70eb
--- /dev/null
+++ b/Documentation/userspace-api/mseal.rst
@@ -0,0 +1,183 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=
+Introduction of mseal
+=
+
+:Author: Jeff Xu 
+
+Modern CPUs support memory permissions such as RW and NX bits. The memory
+permission feature improves security stance on memory corruption bugs, i.e.
+the attacker can’t just write to arbitrary memory and point the code to it,
+the memory has to be marked with X bit, or else an exception will happen.
+
+Memory sealing additionally protects the mapping itself against
+modifications. This is useful to mitigate memory corruption issues where a
+corrupted pointer is passed to a memory management system. For example,
+such an attacker primitive can break control-flow integrity guarantees
+since read-only memory that is supposed to be trusted can become writable
+or .text pages can get remapped. Memory sealing can automatically be
+applied by the runtime loader to seal .text and .rodata pages and
+applications can additionally seal security critical data at runtime.
+
+A similar feature already exists in the XNU kernel with the
+VM_FLAGS_PERMANENT flag [1] and on OpenBSD with the mimmutable syscall [2].
+
+User API
+
+Two system calls are involved in virtual memory sealing, mseal() and mmap().
+
+mseal()
+---
+The mseal() syscall has the following signature:
+
+``int mseal(void addr, size_t len, unsigned long flags)``
+
+**addr/len**: virtual memory address range.
+
+The address range set by ``addr``/``len`` must meet:
+   - The start address must be in an allocated VMA.
+   - The start address must be page aligned.
+   - The end address (``addr`` + ``len``) must be in an allocated VMA.
+   - no gap (unallocated memory) between start and end address.
+
+The ``len`` will be paged aligned implicitly by the kernel.
+
+**flags**: reserved for future use.
+
+**return values**:
+
+- ``0``: Success.
+
+- ``-EINVAL``:
+- Invalid input ``flags``.
+- The start address (``addr``) is not page aligned.
+- Address range (``addr`` + ``len``) overflow.
+
+- ``-ENOMEM``:
+- The start address (``addr``) is not allocated.
+- The end address (``addr`` + ``len``) is not allocated.
+- A gap (unallocated memory) between start and end address.
+
+- ``-EACCES``:
+- ``MAP_SEALABLE`` is not set during mmap().
+
+- ``-EPERM``:
+- sealing is supported only on 64-bit CPUs, 32-bit is not supported.
+
+- For above error cases, users can expect the given memory range is
+  unmodified, i.e. no partial update.
+
+- There might be other internal errors/cases not listed here, e.g.
+  error during merging/splitting VMAs, or the process reaching the max
+  number of supported VMAs. In those cases, partial updates to the given
+  memory range could happen. However, those cases should be rare.
+
+**Blocked operations after sealing**:
+Unmapping, moving to another location, and shrinking the size,
+via munmap() and mremap(), can leave an empty space, therefore
+can be replaced with a VMA with a new set of attributes.
+
+Moving or expanding a different VMA into the current location,
+via mremap().
+
+Modifying a VMA via mmap(MAP_FIXED).
+
+Size expansion, via mremap(), does not appear to pose any
+specific risks to sealed VMAs. It is included anyway because
+the use case is unclear. In any case, users can rely on
+merging to expand a sealed VMA.
+
+mprotect() and pkey_mprotect().
+
+Some destructive madvice() behaviors (e.g. MADV_DONTNEED)
+for anonymous memory, when users don't have write permission to the
+memory. Those behaviors can alter region contents by discarding pages,
+effectively a memset(0) for anonymous memory.
+
+Kernel will return -EPERM for blocked operations.
+
+**Note**:
+
+- mseal() only works on 64-bit CPUs, not 32-bit CPU.
+
+- users can call mseal() multiple times, mseal() on an already sealed memory
+  is a no-action (not error).
+
+- munseal() is not supported.
+
+mmap()
+--
+``void *mmap(void* addr, size_t length, int prot, int flags, int fd,
+off_t offset);``
+
+We add two changes in ``prot`` and ``flag

[PATCH v7 3/4] selftest mm/mseal memory sealing

2024-01-22 Thread jeffxu

From: Jeff Xu 

selftest for memory sealing change in mmap() and mseal().

Signed-off-by: Jeff Xu 
---
 tools/testing/selftests/mm/.gitignore   |1 +
 tools/testing/selftests/mm/Makefile |1 +
 tools/testing/selftests/mm/mseal_test.c | 1997 +++
 3 files changed, 1999 insertions(+)
 create mode 100644 tools/testing/selftests/mm/mseal_test.c

diff --git a/tools/testing/selftests/mm/.gitignore 
b/tools/testing/selftests/mm/.gitignore
index 4ff10ea61461..76474c51c786 100644
--- a/tools/testing/selftests/mm/.gitignore
+++ b/tools/testing/selftests/mm/.gitignore
@@ -46,3 +46,4 @@ gup_longterm
 mkdirty
 va_high_addr_switch
 hugetlb_fault_after_madv
+mseal_test
diff --git a/tools/testing/selftests/mm/Makefile 
b/tools/testing/selftests/mm/Makefile
index 2453add65d12..ba36a5c2b1fc 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -59,6 +59,7 @@ TEST_GEN_FILES += mlock2-tests
 TEST_GEN_FILES += mrelease_test
 TEST_GEN_FILES += mremap_dontunmap
 TEST_GEN_FILES += mremap_test
+TEST_GEN_FILES += mseal_test
 TEST_GEN_FILES += on-fault-limit
 TEST_GEN_FILES += pagemap_ioctl
 TEST_GEN_FILES += thuge-gen
diff --git a/tools/testing/selftests/mm/mseal_test.c 
b/tools/testing/selftests/mm/mseal_test.c
new file mode 100644
index ..0d8b7041a7a0
--- /dev/null
+++ b/tools/testing/selftests/mm/mseal_test.c
@@ -0,0 +1,1997 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "../kselftest.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * need those definition for manually build using gcc.
+ * gcc -I ../../../../usr/include   -DDEBUG -O3  -DDEBUG -O3 mseal_test.c -o 
mseal_test
+ */
+#ifndef MAP_SEALABLE
+#define MAP_SEALABLE 0x800
+#endif
+
+#ifndef PROT_SEAL
+#define PROT_SEAL 0x0400
+#endif
+
+#ifndef PKEY_DISABLE_ACCESS
+# define PKEY_DISABLE_ACCESS0x1
+#endif
+
+#ifndef PKEY_DISABLE_WRITE
+# define PKEY_DISABLE_WRITE 0x2
+#endif
+
+#ifndef PKEY_BITS_PER_KEY
+#define PKEY_BITS_PER_PKEY  2
+#endif
+
+#ifndef PKEY_MASK
+#define PKEY_MASK   (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)
+#endif
+
+#define FAIL_TEST_IF_FALSE(c) do {\
+   if (!(c)) {\
+   ksft_test_result_fail("%s, line:%d\n", __func__, 
__LINE__);\
+   goto test_end;\
+   } \
+   } \
+   while (0)
+
+#define SKIP_TEST_IF_FALSE(c) do {\
+   if (!(c)) {\
+   ksft_test_result_skip("%s, line:%d\n", __func__, 
__LINE__);\
+   goto test_end;\
+   } \
+   } \
+   while (0)
+
+
+#define TEST_END_CHECK() {\
+   ksft_test_result_pass("%s\n", __func__);\
+   return;\
+test_end:\
+   return;\
+}
+
+#ifndef u64
+#define u64 unsigned long long
+#endif
+
+static unsigned long get_vma_size(void *addr)
+{
+   FILE *maps;
+   char line[256];
+   int size = 0;
+   uintptr_t  addr_start, addr_end;
+
+   maps = fopen("/proc/self/maps", "r");
+   if (!maps)
+   return 0;
+
+   while (fgets(line, sizeof(line), maps)) {
+   if (sscanf(line, "%lx-%lx", &addr_start, &addr_end) == 2) {
+   if (addr_start == (uintptr_t) addr) {
+   size = addr_end - addr_start;
+   break;
+   }
+   }
+   }
+   fclose(maps);
+   return size;
+}
+
+/*
+ * define sys_xyx to call syscall directly.
+ */
+static int sys_mseal(void *start, size_t len)
+{
+   int sret;
+
+   errno = 0;
+   sret = syscall(__NR_mseal, start, len, 0);
+   return sret;
+}
+
+static int sys_mprotect(void *ptr, size_t size, unsigned long prot)
+{
+   int sret;
+
+   errno = 0;
+   sret = syscall(SYS_mprotect, ptr, size, prot);
+   return sret;
+}
+
+static int sys_mprotect_pkey(void *ptr, size_t size, unsigned long orig_prot,
+   unsigned long pkey)
+{
+   int sret;
+
+   errno = 0;
+   sret = syscall(__NR_pkey_mprotect, ptr, size, orig_prot, pkey);
+   return sret;
+}
+
+static void *sys_mmap(void *addr, unsigned long len, unsigned long prot,
+   unsigned long flags, unsigned long fd, unsigned long offset)
+{
+   void *sret;
+
+   errno = 0;
+   sret = (void *) syscall(__NR_mmap, addr, len, prot,
+   flags, fd, offset);
+   return sret;
+}
+
+static int sys_munmap(void *ptr, size_t size)
+{
+   int sret;
+
+   errno = 0;
+   sret = syscall(SYS_munmap, ptr, size);
+   return sret;
+}
+
+static int sys_madvise(void *start, size_t len, int types)
+{
+   int sret;
+
+   errno = 0;
+   sret = syscall(__NR_madvise, start, len, types);
+   return sret;
+}
+
+static int sys_pkey_alloc(unsigned long flags, unsign

Re: [PATCH v7 0/4] Introduce mseal()

2024-01-22 Thread Theo de Raadt

Regarding these pieces

> The PROT_SEAL bit in prot field of mmap(). When present, it marks
> the map sealed since creation.

OpenBSD won't be doing this.  I had PROT_IMMUTABLE as a draft.  In my
research I found basically zero circumstances when userland does
that.  The most common circumstance is you create a RW mapping, fill it,
and then change to a more restrictive mapping, and lock it.
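
The "create RW, fill, restrict, lock" sequence described above could look roughly like this in userspace C. This is only a sketch under assumptions: publish_readonly() and lock_mapping() are hypothetical names, and the lock step is attempted through the proposed mseal syscall only when the installed headers define __NR_mseal (on OpenBSD the corresponding call would be mimmutable()). The lock is best-effort, and its outcome is reported to the caller rather than treated as fatal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical best-effort lock step: 0 if the range was locked, -1 otherwise. */
static int lock_mapping(void *addr, size_t len)
{
#ifdef __NR_mseal
	return syscall(__NR_mseal, addr, len, 0UL) == 0 ? 0 : -1;
#else
	(void)addr;
	(void)len;
	return -1;
#endif
}

/*
 * Create a RW anonymous mapping, fill it from src, drop it to read-only,
 * then try to lock it. Returns the mapping or NULL on failure; *locked is
 * set to 1 if the lock step succeeded and to 0 otherwise.
 */
static void *publish_readonly(const void *src, size_t len, int *locked)
{
	void *p;

	assert(src != NULL && len != 0 && locked != NULL);

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	memcpy(p, src, len);			/* fill while still writable */

	if (mprotect(p, len, PROT_READ) != 0) {	/* restrict before locking */
		munmap(p, len);
		return NULL;
	}

	*locked = (lock_mapping(p, len) == 0);
	return p;
}
```

The ordering is the point of the example: the data is written first, the permissions are tightened second, and the lock comes last, so nothing needs a mapping that is sealed at creation time.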

There are a few regions in the address space that can be locked while RW.
For instance, the stack.  But the kernel does that, not userland.  I
found regions where the kernel wants to do this to the address space,
but there is no need to export useless functionality to userland.

OpenBSD now uses this for a high percentage of the address space.  It might
be worth re-reading a description of the split of responsibility regarding
who locks different types of memory in a process:
- kernel (the majority, based upon what ELF layout tell us),
- shared library linker (the next majority, dealing with shared
  library mappings and left-overs not determinable at kernel time),
- libc (a small minority, mostly regarding forced mutable objects)
- and the applications themselves (only 1 application today)

https://lwn.net/Articles/915662/

> The MAP_SEALABLE bit in the flags field of mmap(). When present, it marks
> the map as sealable. A map created without MAP_SEALABLE will not support
> sealing, i.e. mseal() will fail.

We definitely won't be doing this.  We allow a process to lock any and all
of its memory that isn't locked already, even if it means it is shooting
itself in the foot.

I think you are going to severely hurt the power of this mechanism,
because you won't be able to lock memory that has been allocated by a
different callsite not under your source-code control which lacks the
MAP_SEALABLE flag.  (Which is extremely common with the system-parts of
a process, meaning not just libc but kernel allocated objects).

It may be fine inside a program like chrome, but I expect that flag to make
it harder to use in libc, and it will hinder adoption.

[PATCH v2 0/2] kselftest/seccomp: Convert to KTAP output

2024-01-22 Thread Mark Brown

Currently the seccomp benchmark selftest produces non-standard output,
meaning that while it makes a number of checks of the performance it
observes this has to be parsed by humans.  This means that automated
systems running this suite of tests are almost certainly ignoring the
results which isn't ideal for spotting problems.  Let's rework things so
that each check that the program does is reported as a test result to
the framework.
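
As a rough illustration of what "each check reported as a test result"
means on the wire, here is a minimal sketch of formatting one KTAP result
line. The helper name ktap_format_result() is hypothetical and is not part
of kselftest.h; the real series uses the kselftest output functions instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a kselftest.h function): format a single KTAP
 * result line of the form "ok <num> <name>" or "not ok <num> <name>".
 * Returns the number of characters snprintf() would have written.
 */
static int ktap_format_result(char *buf, size_t len, unsigned int num,
			      bool passed, const char *name)
{
	assert(buf && name);

	return snprintf(buf, len, "%s %u %s\n",
			passed ? "ok" : "not ok", num, name);
}
```

A test runner only has to look at the leading "ok"/"not ok" and the name,
so no human needs to interpret the benchmark's own number formatting.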

Signed-off-by: Mark Brown 
---
Changes in v2:
- Rebase onto v6.8-rc1.
- Link to v1: 
https://lore.kernel.org/r/20231219-b4-kselftest-seccomp-benchmark-ktap-v1-0-f99e22863...@kernel.org

---
Mark Brown (2):
  kselftest/seccomp: Use kselftest output functions for benchmark
  kselftest/seccomp: Report each expectation we assert as a KTAP test

 .../testing/selftests/seccomp/seccomp_benchmark.c  | 105 +
 1 file changed, 65 insertions(+), 40 deletions(-)
---
base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
change-id: 20231219-b4-kselftest-seccomp-benchmark-ktap-357603823708

Best regards,
-- 
Mark Brown

[PATCH v2 1/2] kselftest/seccomp: Use kselftest output functions for benchmark

2024-01-22 Thread Mark Brown

In preparation for trying to output the test results themselves in TAP
format, rework all the prints in the benchmark to use the kselftest output
functions. The uses of system() all produce single line output so we can
avoid having to deal with fully managing the child process and continue to
use system() by simply printing an empty message before we invoke system().
We also leave one printf() used to complete a line of output in place.
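
The system() trick described above can be sketched in isolation. This is
only an illustration under assumptions: run_with_prefix() is a hypothetical
name, and it writes the "# " diagnostic prefix directly rather than calling
ksft_print_msg(""), so that it builds without the kselftest headers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: emit the "# " diagnostic prefix and then let the
 * child's single line of output complete that line. The explicit flush
 * stands in for the benchmark's setbuf(stdout, NULL), so the prefix is
 * written before the child process prints anything.
 */
static int run_with_prefix(const char *cmd)
{
	assert(cmd);

	fputs("# ", stdout);
	fflush(stdout);

	/* Returns the wait status from system(), 0 for a clean exit. */
	return system(cmd);
}
```

This only works because each command prints exactly one line; multi-line
output would need the child's output to be captured and re-prefixed.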

Tested-by: Anders Roxell 
Signed-off-by: Mark Brown 
---
 .../testing/selftests/seccomp/seccomp_benchmark.c  | 45 --
 1 file changed, 24 insertions(+), 21 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_benchmark.c 
b/tools/testing/selftests/seccomp/seccomp_benchmark.c
index 5b5c9d558dee..93168dd2c1e3 100644
--- a/tools/testing/selftests/seccomp/seccomp_benchmark.c
+++ b/tools/testing/selftests/seccomp/seccomp_benchmark.c
@@ -38,10 +38,10 @@ unsigned long long timing(clockid_t clk_id, unsigned long 
long samples)
i *= 10ULL;
i += finish.tv_nsec - start.tv_nsec;
 
-   printf("%lu.%09lu - %lu.%09lu = %llu (%.1fs)\n",
-   finish.tv_sec, finish.tv_nsec,
-   start.tv_sec, start.tv_nsec,
-   i, (double)i / 10.0);
+   ksft_print_msg("%lu.%09lu - %lu.%09lu = %llu (%.1fs)\n",
+  finish.tv_sec, finish.tv_nsec,
+  start.tv_sec, start.tv_nsec,
+  i, (double)i / 10.0);
 
return i;
 }
@@ -53,7 +53,7 @@ unsigned long long calibrate(void)
pid_t pid, ret;
int seconds = 15;
 
-   printf("Calibrating sample size for %d seconds worth of syscalls 
...\n", seconds);
+   ksft_print_msg("Calibrating sample size for %d seconds worth of 
syscalls ...\n", seconds);
 
samples = 0;
pid = getpid();
@@ -102,14 +102,14 @@ long compare(const char *name_one, const char *name_eval, 
const char *name_two,
 {
bool good;
 
-   printf("\t%s %s %s (%lld %s %lld): ", name_one, name_eval, name_two,
-  (long long)one, name_eval, (long long)two);
+   ksft_print_msg("\t%s %s %s (%lld %s %lld): ", name_one, name_eval, 
name_two,
+  (long long)one, name_eval, (long long)two);
if (one > INT_MAX) {
-   printf("Miscalculation! Measurement went negative: %lld\n", 
(long long)one);
+   ksft_print_msg("Miscalculation! Measurement went negative: 
%lld\n", (long long)one);
return 1;
}
if (two > INT_MAX) {
-   printf("Miscalculation! Measurement went negative: %lld\n", 
(long long)two);
+   ksft_print_msg("Miscalculation! Measurement went negative: 
%lld\n", (long long)two);
return 1;
}
 
@@ -145,12 +145,15 @@ int main(int argc, char *argv[])
 
setbuf(stdout, NULL);
 
-   printf("Running on:\n");
+   ksft_print_msg("Running on:\n");
+   ksft_print_msg("");
system("uname -a");
 
-   printf("Current BPF sysctl settings:\n");
+   ksft_print_msg("Current BPF sysctl settings:\n");
/* Avoid using "sysctl" which may not be installed. */
+   ksft_print_msg("");
system("grep -H . /proc/sys/net/core/bpf_jit_enable");
+   ksft_print_msg("");
system("grep -H . /proc/sys/net/core/bpf_jit_harden");
 
if (argc > 1)
@@ -158,11 +161,11 @@ int main(int argc, char *argv[])
else
samples = calibrate();
 
-   printf("Benchmarking %llu syscalls...\n", samples);
+   ksft_print_msg("Benchmarking %llu syscalls...\n", samples);
 
/* Native call */
native = timing(CLOCK_PROCESS_CPUTIME_ID, samples) / samples;
-   printf("getpid native: %llu ns\n", native);
+   ksft_print_msg("getpid native: %llu ns\n", native);
 
ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
assert(ret == 0);
@@ -172,33 +175,33 @@ int main(int argc, char *argv[])
assert(ret == 0);
 
bitmap1 = timing(CLOCK_PROCESS_CPUTIME_ID, samples) / samples;
-   printf("getpid RET_ALLOW 1 filter (bitmap): %llu ns\n", bitmap1);
+   ksft_print_msg("getpid RET_ALLOW 1 filter (bitmap): %llu ns\n", 
bitmap1);
 
/* Second filter resulting in a bitmap */
ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &bitmap_prog);
assert(ret == 0);
 
bitmap2 = timing(CLOCK_PROCESS_CPUTIME_ID, samples) / samples;
-   printf("getpid RET_ALLOW 2 filters (bitmap): %llu ns\n", bitmap2);
+   ksft_print_msg("getpid RET_ALLOW 2 filters (bitmap): %llu ns\n", 
bitmap2);
 
/* Third filter, can no longer be converted to bitmap */
ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
assert(ret == 0);
 
filter1 = timing(CLOCK_PROCESS_CPUTIME_ID, samples) / samples;
-   printf("getpid RET_ALLOW 3 filters (full): %llu ns\n", filter1);
+   ksft_print_msg("getpid RET_ALLOW 3 filters (full): %llu ns\n", filt

[PATCH v2] kselftest/clone3: Make test names for set_tid test stable

2024-01-22 Thread Mark Brown

The test results reported for the clone3_set_tid tests interact poorly with
automation for running kselftest since the reported test names include TIDs
dynamically allocated at runtime. A lot of automation for running kselftest
will compare runs by looking at the test name to identify if the same test
is being run, so changing names makes it look like the testsuite has been
updated to include new tests. This makes the results display less clearly
and breaks cases like bisection.

Address this by providing a brief description of the tests and logging that
along with the stable parameters for the test currently logged. The TIDs
are already logged separately in existing logging except for the final test
which has a new log message added. We also tweak the formatting of the
logging of expected/actual values for clarity.

There are still issues with the logging of skipped tests (many are simply
not logged at all when skipped and all are logged with different names) but
these are less disruptive since the skips are all based on not being run as
root, a condition likely to be stable for a given test system.
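
To make the naming problem concrete, here is a small sketch comparing a
PID-based name with a name built only from stable parameters. The function
names are hypothetical, and same_test() is just an assumed stand-in for how
automation might match results between runs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Old style: the reporting PID is baked into the test name. */
static void pid_based_name(char *buf, size_t len, pid_t pid,
			   int ret, int expected)
{
	assert(buf);
	snprintf(buf, len, "[%d] Result (%d) matches expectation (%d)",
		 (int)pid, ret, expected);
}

/* New style: a fixed description plus parameters that do not vary. */
static void stable_name(char *buf, size_t len, const char *desc,
			size_t set_tid_size, int flags)
{
	assert(buf && desc);
	snprintf(buf, len, "%s with %zu TIDs and flags 0x%x",
		 desc, set_tid_size, (unsigned int)flags);
}

/* Assumed matching rule: two results belong to the same test if the
 * names are byte-for-byte identical. */
static int same_test(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}
```

With the PID in the name, two otherwise identical runs produce different
names, so they look like different tests; the stable form compares equal.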

Acked-by: Christian Brauner 
Signed-off-by: Mark Brown 
---
Changes in v2:
- Rebase onto v6.8-rc1.
- Link to v1: 
https://lore.kernel.org/r/20231115-kselftest-clone3-set-tid-v1-1-c1932591c...@kernel.org
---
 tools/testing/selftests/clone3/clone3_set_tid.c | 117 ++--
 1 file changed, 69 insertions(+), 48 deletions(-)

diff --git a/tools/testing/selftests/clone3/clone3_set_tid.c 
b/tools/testing/selftests/clone3/clone3_set_tid.c
index ed785afb6077..9ae38733cb6e 100644
--- a/tools/testing/selftests/clone3/clone3_set_tid.c
+++ b/tools/testing/selftests/clone3/clone3_set_tid.c
@@ -114,7 +114,8 @@ static int call_clone3_set_tid(pid_t *set_tid,
return WEXITSTATUS(status);
 }
 
-static void test_clone3_set_tid(pid_t *set_tid,
+static void test_clone3_set_tid(const char *desc,
+   pid_t *set_tid,
size_t set_tid_size,
int flags,
int expected,
@@ -129,17 +130,13 @@ static void test_clone3_set_tid(pid_t *set_tid,
ret = call_clone3_set_tid(set_tid, set_tid_size, flags, expected_pid,
  wait_for_it);
ksft_print_msg(
-   "[%d] clone3() with CLONE_SET_TID %d says :%d - expected %d\n",
+   "[%d] clone3() with CLONE_SET_TID %d says: %d - expected %d\n",
getpid(), set_tid[0], ret, expected);
-   if (ret != expected)
-   ksft_test_result_fail(
-   "[%d] Result (%d) is different than expected (%d)\n",
-   getpid(), ret, expected);
-   else
-   ksft_test_result_pass(
-   "[%d] Result (%d) matches expectation (%d)\n",
-   getpid(), ret, expected);
+
+   ksft_test_result(ret == expected, "%s with %zu TIDs and flags 0x%x\n",
+desc, set_tid_size, flags);
 }
+
 int main(int argc, char *argv[])
 {
FILE *f;
@@ -172,73 +169,91 @@ int main(int argc, char *argv[])
 
/* Try invalid settings */
memset(&set_tid, 0, sizeof(set_tid));
-   test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL + 1, 0, -EINVAL, 0, 0);
+   test_clone3_set_tid("invalid size, 0 TID",
+   set_tid, MAX_PID_NS_LEVEL + 1, 0, -EINVAL, 0, 0);
 
-   test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 2, 0, -EINVAL, 0, 0);
+   test_clone3_set_tid("invalid size, 0 TID",
+   set_tid, MAX_PID_NS_LEVEL * 2, 0, -EINVAL, 0, 0);
 
-   test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 2 + 1, 0,
-   -EINVAL, 0, 0);
+   test_clone3_set_tid("invalid size, 0 TID",
+   set_tid, MAX_PID_NS_LEVEL * 2 + 1, 0,
+   -EINVAL, 0, 0);
 
-   test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 42, 0, -EINVAL, 0, 0);
+   test_clone3_set_tid("invalid size, 0 TID",
+   set_tid, MAX_PID_NS_LEVEL * 42, 0, -EINVAL, 0, 0);
 
/*
 * This can actually work if this test running in a MAX_PID_NS_LEVEL - 1
 * nested PID namespace.
 */
-   test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL - 1, 0, -EINVAL, 0, 0);
+   test_clone3_set_tid("invalid size, 0 TID",
+   set_tid, MAX_PID_NS_LEVEL - 1, 0, -EINVAL, 0, 0);
 
memset(&set_tid, 0xff, sizeof(set_tid));
-   test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL + 1, 0, -EINVAL, 0, 0);
+   test_clone3_set_tid("invalid size, TID all 1s",
+   set_tid, MAX_PID_NS_LEVEL + 1, 0, -EINVAL, 0, 0);
 
-   test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 2, 0, -EINVAL, 0, 0);
+   test_clone3_set_tid("invalid size, TID all 1s",
+   set_tid, MAX_PID_NS_LEVEL * 2, 0, -EINVAL, 0, 0);
 
-   test_clone3_set_tid(set

Re: [PATCH v2] kselftest: dt: Stop relying on dirname to improve performance

2024-01-22 Thread Rob Herring

On Mon, Jan 22, 2024 at 11:29:18AM -0300, Nícolas F. R. A. Prado wrote:
> When walking directory trees, instead of looking for specific files and
> running dirname to get the parent folder, traverse all folders and
> ignore the ones not containing the desired files. This avoids the need
> to call dirname inside the loop, which drastically decreases run time:
> Running locally on a mt8192-asurada-spherion, which reports 160 test
> cases, has gone from 5.5s to 2.9s, while running remotely with an
> nfsroot has gone from 13.5s to 5.5s.
> 
> This change has a side-effect, which is that the root DT node now
> also shows in the output, even though it isn't expected to bind to a
> driver. However there shouldn't be a matching driver for the board
> compatible, so the end result will be just an extra skipped test:
> 
> ok 1 / # SKIP
> 
> Reported-by: Mark Brown 
> Closes: 
> https://lore.kernel.org/all/310391e8-fdf2-4c2f-a680-7744eb685...@sirena.org.uk
> Fixes: 14571ab1ad21 ("kselftest: Add new test for detecting unprobed 
> Devicetree devices")
> Tested-by: Mark Brown 
> Signed-off-by: Nícolas F. R. A. Prado 
> ---
> Changes in v2:
> - Tweaked commit message
> - Added trailer tags
> - Rebased on 6.8-rc1
> ---
>  tools/testing/selftests/dt/test_unprobed_devices.sh | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)

Applied, thanks.

Rob

[PATCH v2 2/2] kselftest/seccomp: Report each expectation we assert as a KTAP test

2024-01-22 Thread Mark Brown

The seccomp benchmark test makes a number of checks on the performance it
measures and logs them to the output but does so in a custom format which
none of the automated test runners understand, meaning that the chances that
anyone is paying attention are slim. Let's additionally log each result in
KTAP format so that automated systems parsing the test output will see each
comparison as a test case. The original logs are left in place since they
provide the actual numbers for analysis.

As part of this, rework the flow of the main program so that when we skip
tests we still log all the tests we skip; this is because the standard KTAP
headers and footers include counts of the number of expected and run tests.
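
The reason skipped checks still have to be logged can be shown with a
small stand-alone model of the plan accounting. Everything here (struct
ktap_state and the ktap_*() helpers) is hypothetical and only mimics the
shape of the kselftest output; it is not the kselftest.h implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical bookkeeping: the declared plan and what was reported. */
struct ktap_state {
	unsigned int plan, pass, fail, skip;
};

static void ktap_begin(struct ktap_state *s, unsigned int plan)
{
	assert(s);
	*s = (struct ktap_state){ .plan = plan };
	printf("TAP version 13\n1..%u\n", plan);
}

static unsigned int ktap_count(const struct ktap_state *s)
{
	return s->pass + s->fail + s->skip;
}

/* Report one check; a skipped check is still counted and logged. */
static int ktap_check(struct ktap_state *s, const char *name,
		      bool good, bool skip)
{
	unsigned int num = ktap_count(s) + 1;

	if (skip) {
		s->skip++;
		printf("ok %u # SKIP %s\n", num, name);
		return 0;
	}

	if (good)
		s->pass++;
	else
		s->fail++;
	printf("%s %u %s\n", good ? "ok" : "not ok", num, name);

	return good ? 0 : 1;
}

/* The plan is only satisfied if every planned check was reported. */
static bool ktap_plan_met(const struct ktap_state *s)
{
	return ktap_count(s) == s->plan;
}
```

If the program jumped past the remaining checks instead of passing a skip
flag through them, ktap_count() would fall short of the plan.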

Tested-by: Anders Roxell 
---
 .../testing/selftests/seccomp/seccomp_benchmark.c  | 62 +++---
 1 file changed, 42 insertions(+), 20 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_benchmark.c 
b/tools/testing/selftests/seccomp/seccomp_benchmark.c
index 93168dd2c1e3..436a527b8235 100644
--- a/tools/testing/selftests/seccomp/seccomp_benchmark.c
+++ b/tools/testing/selftests/seccomp/seccomp_benchmark.c
@@ -98,24 +98,36 @@ bool le(int i_one, int i_two)
 }
 
 long compare(const char *name_one, const char *name_eval, const char *name_two,
-unsigned long long one, bool (*eval)(int, int), unsigned long long 
two)
+unsigned long long one, bool (*eval)(int, int), unsigned long long 
two,
+bool skip)
 {
bool good;
 
+   if (skip) {
+   ksft_test_result_skip("%s %s %s\n", name_one, name_eval,
+ name_two);
+   return 0;
+   }
+
ksft_print_msg("\t%s %s %s (%lld %s %lld): ", name_one, name_eval, 
name_two,
   (long long)one, name_eval, (long long)two);
if (one > INT_MAX) {
ksft_print_msg("Miscalculation! Measurement went negative: 
%lld\n", (long long)one);
-   return 1;
+   good = false;
+   goto out;
}
if (two > INT_MAX) {
ksft_print_msg("Miscalculation! Measurement went negative: 
%lld\n", (long long)two);
-   return 1;
+   good = false;
+   goto out;
}
 
good = eval(one, two);
printf("%s\n", good ? "✔️" : "❌");
 
+out:
+   ksft_test_result(good, "%s %s %s\n", name_one, name_eval, name_two);
+
return good ? 0 : 1;
 }
 
@@ -142,9 +154,13 @@ int main(int argc, char *argv[])
unsigned long long samples, calc;
unsigned long long native, filter1, filter2, bitmap1, bitmap2;
unsigned long long entry, per_filter1, per_filter2;
+   bool skip = false;
 
setbuf(stdout, NULL);
 
+   ksft_print_header();
+   ksft_set_plan(7);
+
ksft_print_msg("Running on:\n");
ksft_print_msg("");
system("uname -a");
@@ -202,8 +218,10 @@ int main(int argc, char *argv[])
 #define ESTIMATE(fmt, var, what)   do {\
var = (what);   \
ksft_print_msg("Estimated " fmt ": %llu ns\n", var);\
-   if (var > INT_MAX)  \
-   goto more_samples;  \
+   if (var > INT_MAX) {\
+   skip = true;\
+   ret |= 1;   \
+   }   \
} while (0)
 
ESTIMATE("total seccomp overhead for 1 bitmapped filter", calc,
@@ -222,30 +240,34 @@ int main(int argc, char *argv[])
 (filter2 - native - entry) / 4);
 
ksft_print_msg("Expectations:\n");
-   ret |= compare("native", "≤", "1 bitmap", native, le, bitmap1);
-   bits = compare("native", "≤", "1 filter", native, le, filter1);
+   ret |= compare("native", "≤", "1 bitmap", native, le, bitmap1,
+  skip);
+   bits = compare("native", "≤", "1 filter", native, le, filter1,
+  skip);
if (bits)
-   goto more_samples;
+   skip = true;
 
ret |= compare("per-filter (last 2 diff)", "≈", "per-filter (filters / 
4)",
-   per_filter1, approx, per_filter2);
+  per_filter1, approx, per_filter2, skip);
 
bits = compare("1 bitmapped", "≈", "2 bitmapped",
-   bitmap1 - native, approx, bitmap2 - native);
+  bitmap1 - native, approx, bitmap2 - native, skip);
if (bits) {
ksft_print_msg("Skipping constant action bitmap expectations: 
they appear unsupported.\n");
-   goto out;
+   skip = true;
}
 
-   ret |= compare("entry", "≈", "1 bitmapped", entry, approx, bitmap1 - 
native);
-   ret |= compare("entry", "≈", "2 bitma

[PATCH v4 01/14] arm64/cpufeature: Hook new identification registers up to cpufeature

2024-01-22 Thread Mark Brown

The 2023 architecture extensions have defined several new ID registers;
hook them up to the cpufeature code so we can add feature checks and hwcaps
based on their contents.
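
For readers unfamiliar with how feature checks are derived from ID
register contents, the underlying operation is a bitfield extract. The
sketch below is generic and uses a hypothetical helper name; it does not
encode the position of any particular field, which would have to come from
the architecture documentation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: extract an unsigned field of 'width' bits starting
 * at bit 'shift' from a 64-bit ID register value.
 */
static unsigned int id_reg_field(uint64_t reg, unsigned int shift,
				 unsigned int width)
{
	assert(width > 0 && width < 64 && shift + width <= 64);

	return (unsigned int)((reg >> shift) & ((UINT64_C(1) << width) - 1));
}
```

A feature check then reduces to comparing the extracted value against the
minimum value that implies the feature.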

Signed-off-by: Mark Brown 
---
 arch/arm64/include/asm/cpu.h   |  3 +++
 arch/arm64/kernel/cpufeature.c | 28 
 arch/arm64/kernel/cpuinfo.c|  3 +++
 3 files changed, 34 insertions(+)

diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h
index b1e43f56ee46..96379be913cd 100644
--- a/arch/arm64/include/asm/cpu.h
+++ b/arch/arm64/include/asm/cpu.h
@@ -52,14 +52,17 @@ struct cpuinfo_arm64 {
u64 reg_id_aa64isar0;
u64 reg_id_aa64isar1;
u64 reg_id_aa64isar2;
+   u64 reg_id_aa64isar3;
u64 reg_id_aa64mmfr0;
u64 reg_id_aa64mmfr1;
u64 reg_id_aa64mmfr2;
u64 reg_id_aa64mmfr3;
u64 reg_id_aa64pfr0;
u64 reg_id_aa64pfr1;
+   u64 reg_id_aa64pfr2;
u64 reg_id_aa64zfr0;
u64 reg_id_aa64smfr0;
+   u64 reg_id_aa64fpfr0;
 
struct cpuinfo_32bitaarch32;
 };
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 8d1a634a403e..eae59ec0f4b0 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -234,6 +234,10 @@ static const struct arm64_ftr_bits ftr_id_aa64isar2[] = {
ARM64_FTR_END,
 };
 
+static const struct arm64_ftr_bits ftr_id_aa64isar3[] = {
+   ARM64_FTR_END,
+};
+
 static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, 
ID_AA64PFR0_EL1_CSV3_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, 
ID_AA64PFR0_EL1_CSV2_SHIFT, 4, 0),
@@ -267,6 +271,10 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr1[] = {
ARM64_FTR_END,
 };
 
+static const struct arm64_ftr_bits ftr_id_aa64pfr2[] = {
+   ARM64_FTR_END,
+};
+
 static const struct arm64_ftr_bits ftr_id_aa64zfr0[] = {
ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SVE),
   FTR_STRICT, FTR_LOWER_SAFE, ID_AA64ZFR0_EL1_F64MM_SHIFT, 
4, 0),
@@ -319,6 +327,10 @@ static const struct arm64_ftr_bits ftr_id_aa64smfr0[] = {
ARM64_FTR_END,
 };
 
+static const struct arm64_ftr_bits ftr_id_aa64fpfr0[] = {
+   ARM64_FTR_END,
+};
+
 static const struct arm64_ftr_bits ftr_id_aa64mmfr0[] = {
ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 
ID_AA64MMFR0_EL1_ECV_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 
ID_AA64MMFR0_EL1_FGT_SHIFT, 4, 0),
@@ -702,10 +714,12 @@ static const struct __ftr_reg_entry {
   &id_aa64pfr0_override),
ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64PFR1_EL1, ftr_id_aa64pfr1,
   &id_aa64pfr1_override),
+   ARM64_FTR_REG(SYS_ID_AA64PFR2_EL1, ftr_id_aa64pfr2),
ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64ZFR0_EL1, ftr_id_aa64zfr0,
   &id_aa64zfr0_override),
ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64SMFR0_EL1, ftr_id_aa64smfr0,
   &id_aa64smfr0_override),
+   ARM64_FTR_REG(SYS_ID_AA64FPFR0_EL1, ftr_id_aa64fpfr0),
 
/* Op1 = 0, CRn = 0, CRm = 5 */
ARM64_FTR_REG(SYS_ID_AA64DFR0_EL1, ftr_id_aa64dfr0),
@@ -717,6 +731,7 @@ static const struct __ftr_reg_entry {
   &id_aa64isar1_override),
ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64ISAR2_EL1, ftr_id_aa64isar2,
   &id_aa64isar2_override),
+   ARM64_FTR_REG(SYS_ID_AA64ISAR3_EL1, ftr_id_aa64isar3),
 
/* Op1 = 0, CRn = 0, CRm = 7 */
ARM64_FTR_REG(SYS_ID_AA64MMFR0_EL1, ftr_id_aa64mmfr0),
@@ -1043,14 +1058,17 @@ void __init init_cpu_features(struct cpuinfo_arm64 
*info)
init_cpu_ftr_reg(SYS_ID_AA64ISAR0_EL1, info->reg_id_aa64isar0);
init_cpu_ftr_reg(SYS_ID_AA64ISAR1_EL1, info->reg_id_aa64isar1);
init_cpu_ftr_reg(SYS_ID_AA64ISAR2_EL1, info->reg_id_aa64isar2);
+   init_cpu_ftr_reg(SYS_ID_AA64ISAR3_EL1, info->reg_id_aa64isar3);
init_cpu_ftr_reg(SYS_ID_AA64MMFR0_EL1, info->reg_id_aa64mmfr0);
init_cpu_ftr_reg(SYS_ID_AA64MMFR1_EL1, info->reg_id_aa64mmfr1);
init_cpu_ftr_reg(SYS_ID_AA64MMFR2_EL1, info->reg_id_aa64mmfr2);
init_cpu_ftr_reg(SYS_ID_AA64MMFR3_EL1, info->reg_id_aa64mmfr3);
init_cpu_ftr_reg(SYS_ID_AA64PFR0_EL1, info->reg_id_aa64pfr0);
init_cpu_ftr_reg(SYS_ID_AA64PFR1_EL1, info->reg_id_aa64pfr1);
+   init_cpu_ftr_reg(SYS_ID_AA64PFR2_EL1, info->reg_id_aa64pfr2);
init_cpu_ftr_reg(SYS_ID_AA64ZFR0_EL1, info->reg_id_aa64zfr0);
init_cpu_ftr_reg(SYS_ID_AA64SMFR0_EL1, info->reg_id_aa64smfr0);
+   init_cpu_ftr_reg(SYS_ID_AA64FPFR0_EL1, info->reg_id_aa64fpfr0);
 
if (id_aa64pfr0_32bit_e

[PATCH v4 02/14] arm64/fpsimd: Enable host kernel access to FPMR

2024-01-22 Thread Mark Brown

FEAT_FPMR provides a new generally accessible architectural register FPMR.
This is only accessible to EL0 and EL1 when HCRX_EL2.EnFPM is set to 1;
do this when the host is running. The guest part will be done along with
context switching the new register and exposing it via guest management.

Signed-off-by: Mark Brown 
---
 arch/arm64/include/asm/kvm_arm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 3c6f8ba1e479..7f45ce9170bb 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -105,7 +105,7 @@
 #define HCRX_GUEST_FLAGS \
(HCRX_EL2_SMPME | HCRX_EL2_TCR2En | \
 (cpus_have_final_cap(ARM64_HAS_MOPS) ? (HCRX_EL2_MSCEn | 
HCRX_EL2_MCE2) : 0))
-#define HCRX_HOST_FLAGS (HCRX_EL2_MSCEn | HCRX_EL2_TCR2En)
+#define HCRX_HOST_FLAGS (HCRX_EL2_MSCEn | HCRX_EL2_TCR2En | HCRX_EL2_EnFPM)
 
 /* TCR_EL2 Registers bits */
 #define TCR_EL2_DS (1UL << 32)

-- 
2.30.2

[PATCH v4 04/14] arm64/signal: Add FPMR signal handling

2024-01-22 Thread Mark Brown

Expose FPMR in the signal context on systems where it is supported. The
kernel validates the exact size of the FPSIMD registers so we can't readily
add it to fpsimd_context without disruption.
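
Since FPMR gets its own record rather than extending fpsimd_context, a
signal handler that wants the value has to walk the records in the
reserved area of the signal frame. The following is a sketch of that walk
over a plain byte buffer. The struct definitions are local mirrors written
for illustration (assumed to match the header/payload layout in the patch),
and find_fpmr() is a hypothetical name; FPMR_MAGIC is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FPMR_MAGIC 0x46504d52u	/* value from the patch */

/* Local mirrors of the record layout, for illustration only. */
struct ctx_head {
	uint32_t magic;
	uint32_t size;
};

struct fpmr_record {
	struct ctx_head head;
	uint64_t fpmr;
};

/*
 * Walk a chain of records. Returns 1 and stores the value if an FPMR
 * record is found, 0 if the terminating (magic 0, size 0) record or the
 * end of the buffer is reached first, and -1 if the chain is malformed.
 */
static int find_fpmr(const unsigned char *buf, size_t len, uint64_t *fpmr)
{
	size_t off = 0;

	assert(buf && fpmr);

	while (off + sizeof(struct ctx_head) <= len) {
		struct ctx_head head;

		memcpy(&head, buf + off, sizeof(head));

		if (head.magic == 0)
			return head.size == 0 ? 0 : -1;

		if (head.size < sizeof(head) || head.size > len - off)
			return -1;

		if (head.magic == FPMR_MAGIC) {
			struct fpmr_record rec;

			if (head.size != sizeof(rec))
				return -1;
			memcpy(&rec, buf + off, sizeof(rec));
			*fpmr = rec.fpmr;
			return 1;
		}

		off += head.size;
	}

	return 0;
}
```

Real code would take the buffer from the reserved area of the ucontext it
is handed and would also have to follow an extra_context record if present,
which this sketch does not attempt.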

Signed-off-by: Mark Brown 
---
 arch/arm64/include/uapi/asm/sigcontext.h |  8 +
 arch/arm64/kernel/signal.c   | 59 
 2 files changed, 67 insertions(+)

diff --git a/arch/arm64/include/uapi/asm/sigcontext.h 
b/arch/arm64/include/uapi/asm/sigcontext.h
index f23c1dc3f002..8a45b7a411e0 100644
--- a/arch/arm64/include/uapi/asm/sigcontext.h
+++ b/arch/arm64/include/uapi/asm/sigcontext.h
@@ -152,6 +152,14 @@ struct tpidr2_context {
__u64 tpidr2;
 };
 
+/* FPMR context */
+#define FPMR_MAGIC 0x46504d52
+
+struct fpmr_context {
+   struct _aarch64_ctx head;
+   __u64 fpmr;
+};
+
 #define ZA_MAGIC   0x54366345
 
 struct za_context {
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 0e8beb3349ea..460823baa603 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -60,6 +60,7 @@ struct rt_sigframe_user_layout {
unsigned long tpidr2_offset;
unsigned long za_offset;
unsigned long zt_offset;
+   unsigned long fpmr_offset;
unsigned long extra_offset;
unsigned long end_offset;
 };
@@ -182,6 +183,8 @@ struct user_ctxs {
u32 za_size;
struct zt_context __user *zt;
u32 zt_size;
+   struct fpmr_context __user *fpmr;
+   u32 fpmr_size;
 };
 
 static int preserve_fpsimd_context(struct fpsimd_context __user *ctx)
@@ -227,6 +230,33 @@ static int restore_fpsimd_context(struct user_ctxs *user)
return err ? -EFAULT : 0;
 }
 
+static int preserve_fpmr_context(struct fpmr_context __user *ctx)
+{
+   int err = 0;
+
+   current->thread.uw.fpmr = read_sysreg_s(SYS_FPMR);
+
+   __put_user_error(FPMR_MAGIC, &ctx->head.magic, err);
+   __put_user_error(sizeof(*ctx), &ctx->head.size, err);
+   __put_user_error(current->thread.uw.fpmr, &ctx->fpmr, err);
+
+   return err;
+}
+
+static int restore_fpmr_context(struct user_ctxs *user)
+{
+   u64 fpmr;
+   int err = 0;
+
+   if (user->fpmr_size != sizeof(*user->fpmr))
+   return -EINVAL;
+
+   __get_user_error(fpmr, &user->fpmr->fpmr, err);
+   if (!err)
+   write_sysreg_s(fpmr, SYS_FPMR);
+
+   return err;
+}
 
 #ifdef CONFIG_ARM64_SVE
 
@@ -590,6 +620,7 @@ static int parse_user_sigframe(struct user_ctxs *user,
user->tpidr2 = NULL;
user->za = NULL;
user->zt = NULL;
+   user->fpmr = NULL;
 
if (!IS_ALIGNED((unsigned long)base, 16))
goto invalid;
@@ -684,6 +715,17 @@ static int parse_user_sigframe(struct user_ctxs *user,
user->zt_size = size;
break;
 
+   case FPMR_MAGIC:
+   if (!system_supports_fpmr())
+   goto invalid;
+
+   if (user->fpmr)
+   goto invalid;
+
+   user->fpmr = (struct fpmr_context __user *)head;
+   user->fpmr_size = size;
+   break;
+
case EXTRA_MAGIC:
if (have_extra_context)
goto invalid;
@@ -806,6 +848,9 @@ static int restore_sigframe(struct pt_regs *regs,
if (err == 0 && system_supports_tpidr2() && user.tpidr2)
err = restore_tpidr2_context(&user);
 
+   if (err == 0 && system_supports_fpmr() && user.fpmr)
+   err = restore_fpmr_context(&user);
+
if (err == 0 && system_supports_sme() && user.za)
err = restore_za_context(&user);
 
@@ -928,6 +973,13 @@ static int setup_sigframe_layout(struct 
rt_sigframe_user_layout *user,
}
}
 
+   if (system_supports_fpmr()) {
+   err = sigframe_alloc(user, &user->fpmr_offset,
+sizeof(struct fpmr_context));
+   if (err)
+   return err;
+   }
+
return sigframe_alloc_end(user);
 }
 
@@ -983,6 +1035,13 @@ static int setup_sigframe(struct rt_sigframe_user_layout 
*user,
err |= preserve_tpidr2_context(tpidr2_ctx);
}
 
+   /* FPMR if supported */
+   if (system_supports_fpmr() && err == 0) {
+   struct fpmr_context __user *fpmr_ctx =
+   apply_user_offset(user, user->fpmr_offset);
+   err |= preserve_fpmr_context(fpmr_ctx);
+   }
+
/* ZA state if present */
if (system_supports_sme() && err == 0 && user->za_offset) {
struct za_context __user *za_ctx =

-- 
2.30.2

[PATCH v4 05/14] arm64/ptrace: Expose FPMR via ptrace

2024-01-22 Thread Mark Brown

Add a new regset to expose FPMR via ptrace. It is not added to the FPSIMD
registers because that structure is exposed elsewhere without any allowance
for extension.
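
From the tracer's side, a separate regset would be read with
PTRACE_GETREGSET and an iovec, as for other regsets. The sketch below
assumes a kernel with this series applied and an already attached, stopped
tracee; read_fpmr() is a hypothetical name, and the NT_ARM_FPMR fallback
value is the one defined in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef NT_ARM_FPMR
#define NT_ARM_FPMR 0x40e	/* value from the patch */
#endif

/*
 * Hypothetical helper: read the single 64-bit FPMR regset of a stopped
 * tracee. Returns 0 on success and -1 if ptrace() fails or the kernel
 * returns a different amount of data than expected.
 */
static int read_fpmr(pid_t pid, uint64_t *fpmr)
{
	struct iovec iov;

	assert(fpmr);

	iov.iov_base = fpmr;
	iov.iov_len = sizeof(*fpmr);

	if (ptrace(PTRACE_GETREGSET, pid,
		   (void *)(uintptr_t)NT_ARM_FPMR, &iov) == -1)
		return -1;

	return iov.iov_len == sizeof(*fpmr) ? 0 : -1;
}
```

On systems without FPMR the regset handlers in the patch return -EINVAL,
so a caller should treat -1 here as "not available" and check errno.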

Signed-off-by: Mark Brown 
---
 arch/arm64/kernel/ptrace.c | 42 ++
 include/uapi/linux/elf.h   |  1 +
 2 files changed, 43 insertions(+)

diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index dc6cf0e37194..aacb45bd36e6 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -698,6 +698,39 @@ static int tls_set(struct task_struct *target, const 
struct user_regset *regset,
return ret;
 }
 
+static int fpmr_get(struct task_struct *target, const struct user_regset 
*regset,
+  struct membuf to)
+{
+   if (!system_supports_fpmr())
+   return -EINVAL;
+
+   if (target == current)
+   fpsimd_preserve_current_state();
+
+   return membuf_store(&to, target->thread.uw.fpmr);
+}
+
+static int fpmr_set(struct task_struct *target, const struct user_regset 
*regset,
+  unsigned int pos, unsigned int count,
+  const void *kbuf, const void __user *ubuf)
+{
+   int ret;
+   unsigned long fpmr;
+
+   if (!system_supports_fpmr())
+   return -EINVAL;
+
+   ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &fpmr, 0, count);
+   if (ret)
+   return ret;
+
+   target->thread.uw.fpmr = fpmr;
+
+   fpsimd_flush_task_state(target);
+
+   return 0;
+}
+
 static int system_call_get(struct task_struct *target,
   const struct user_regset *regset,
   struct membuf to)
@@ -1419,6 +1452,7 @@ enum aarch64_regset {
REGSET_HW_BREAK,
REGSET_HW_WATCH,
 #endif
+   REGSET_FPMR,
REGSET_SYSTEM_CALL,
 #ifdef CONFIG_ARM64_SVE
REGSET_SVE,
@@ -1497,6 +1531,14 @@ static const struct user_regset aarch64_regsets[] = {
.regset_get = system_call_get,
.set = system_call_set,
},
+   [REGSET_FPMR] = {
+   .core_note_type = NT_ARM_FPMR,
+   .n = 1,
+   .size = sizeof(u64),
+   .align = sizeof(u64),
+   .regset_get = fpmr_get,
+   .set = fpmr_set,
+   },
 #ifdef CONFIG_ARM64_SVE
[REGSET_SVE] = { /* Scalable Vector Extension */
.core_note_type = NT_ARM_SVE,
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index 9417309b7230..b54b313bcf07 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -440,6 +440,7 @@ typedef struct elf64_shdr {
 #define NT_ARM_SSVE0x40b   /* ARM Streaming SVE registers */
 #define NT_ARM_ZA  0x40c   /* ARM SME ZA registers */
 #define NT_ARM_ZT  0x40d   /* ARM SME ZT registers */
+#define NT_ARM_FPMR0x40e   /* ARM floating point mode register */
 #define NT_ARC_V2  0x600   /* ARCv2 accumulator/extra registers */
 #define NT_VMCOREDD0x700   /* Vmcore Device Dump Note */
 #define NT_MIPS_DSP0x800   /* MIPS DSP ASE registers */

-- 
2.30.2

[PATCH v4 06/14] arm64/hwcap: Define hwcaps for 2023 DPISA features

2024-01-22 Thread Mark Brown

The 2023 architecture extensions include a large number of floating point
features, most of which simply add new instructions. Add hwcaps so that
userspace can enumerate these features.
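
Enumeration from userspace would then look roughly like the sketch below,
which tests one of the new bits in AT_HWCAP2 using getauxval(). The helper
names are hypothetical; the fallback HWCAP2_FPMR value is the one added by
this patch and assumes a 64-bit unsigned long. The result is only
meaningful on arm64, since other architectures assign different meanings
to the same bit positions.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/auxv.h>

#ifndef HWCAP2_FPMR
#define HWCAP2_FPMR (1UL << 48)	/* value from the patch */
#endif

/* Test a single capability bit in a hwcap word. */
static bool hwcap_set(unsigned long hwcaps, unsigned long bit)
{
	assert(bit != 0);

	return (hwcaps & bit) != 0;
}

/* Hypothetical convenience wrapper for the running process. */
static bool cpu_has_fpmr(void)
{
	return hwcap_set(getauxval(AT_HWCAP2), HWCAP2_FPMR);
}
```

Code that wants to use FPMR-dependent instructions would gate them on
cpu_has_fpmr() rather than probing the instructions directly.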

Signed-off-by: Mark Brown 
---
 Documentation/arch/arm64/elf_hwcaps.rst | 49 +
 arch/arm64/include/asm/hwcap.h  | 15 ++
 arch/arm64/include/uapi/asm/hwcap.h | 15 ++
 arch/arm64/kernel/cpufeature.c  | 35 +++
 arch/arm64/kernel/cpuinfo.c | 15 ++
 5 files changed, 129 insertions(+)

diff --git a/Documentation/arch/arm64/elf_hwcaps.rst 
b/Documentation/arch/arm64/elf_hwcaps.rst
index ced7b335e2e0..448c1664879b 100644
--- a/Documentation/arch/arm64/elf_hwcaps.rst
+++ b/Documentation/arch/arm64/elf_hwcaps.rst
@@ -317,6 +317,55 @@ HWCAP2_LRCPC3
 HWCAP2_LSE128
 Functionality implied by ID_AA64ISAR0_EL1.Atomic == 0b0011.
 
+HWCAP2_FPMR
+Functionality implied by ID_AA64PFR2_EL1.FPMR == 0b0001.
+
+HWCAP2_LUT
+Functionality implied by ID_AA64ISAR2_EL1.LUT == 0b0001.
+
+HWCAP2_FAMINMAX
+Functionality implied by ID_AA64ISAR3_EL1.FAMINMAX == 0b0001.
+
+HWCAP2_F8CVT
+Functionality implied by ID_AA64FPFR0_EL1.F8CVT == 0b1.
+
+HWCAP2_F8FMA
+Functionality implied by ID_AA64FPFR0_EL1.F8FMA == 0b1.
+
+HWCAP2_F8DP4
+Functionality implied by ID_AA64FPFR0_EL1.F8DP4 == 0b1.
+
+HWCAP2_F8DP2
+Functionality implied by ID_AA64FPFR0_EL1.F8DP2 == 0b1.
+
+HWCAP2_F8E4M3
+Functionality implied by ID_AA64FPFR0_EL1.F8E4M3 == 0b1.
+
+HWCAP2_F8E5M2
+Functionality implied by ID_AA64FPFR0_EL1.F8E5M2 == 0b1.
+
+HWCAP2_SME_LUTV2
+Functionality implied by ID_AA64SMFR0_EL1.LUTv2 == 0b1.
+
+HWCAP2_SME_F8F16
+Functionality implied by ID_AA64SMFR0_EL1.F8F16 == 0b1.
+
+HWCAP2_SME_F8F32
+Functionality implied by ID_AA64SMFR0_EL1.F8F32 == 0b1.
+
+HWCAP2_SME_SF8FMA
+Functionality implied by ID_AA64SMFR0_EL1.SF8FMA == 0b1.
+
+HWCAP2_SME_SF8DP4
+Functionality implied by ID_AA64SMFR0_EL1.SF8DP4 == 0b1.
+
+HWCAP2_SME_SF8DP2
+Functionality implied by ID_AA64SMFR0_EL1.SF8DP2 == 0b1.
+
+HWCAP2_SME_SF8DP4
+Functionality implied by ID_AA64SMFR0_EL1.SF8DP4 == 0b1.
+
+
 4. Unused AT_HWCAP bits
 ---
 
diff --git a/arch/arm64/include/asm/hwcap.h b/arch/arm64/include/asm/hwcap.h
index cd71e09ea14d..4edd3b61df11 100644
--- a/arch/arm64/include/asm/hwcap.h
+++ b/arch/arm64/include/asm/hwcap.h
@@ -142,6 +142,21 @@
 #define KERNEL_HWCAP_SVE_B16B16__khwcap2_feature(SVE_B16B16)
 #define KERNEL_HWCAP_LRCPC3__khwcap2_feature(LRCPC3)
 #define KERNEL_HWCAP_LSE128__khwcap2_feature(LSE128)
+#define KERNEL_HWCAP_FPMR  __khwcap2_feature(FPMR)
+#define KERNEL_HWCAP_LUT   __khwcap2_feature(LUT)
+#define KERNEL_HWCAP_FAMINMAX  __khwcap2_feature(FAMINMAX)
+#define KERNEL_HWCAP_F8CVT __khwcap2_feature(F8CVT)
+#define KERNEL_HWCAP_F8FMA __khwcap2_feature(F8FMA)
+#define KERNEL_HWCAP_F8DP4 __khwcap2_feature(F8DP4)
+#define KERNEL_HWCAP_F8DP2 __khwcap2_feature(F8DP2)
+#define KERNEL_HWCAP_F8E4M3__khwcap2_feature(F8E4M3)
+#define KERNEL_HWCAP_F8E5M2__khwcap2_feature(F8E5M2)
+#define KERNEL_HWCAP_SME_LUTV2 __khwcap2_feature(SME_LUTV2)
+#define KERNEL_HWCAP_SME_F8F16 __khwcap2_feature(SME_F8F16)
+#define KERNEL_HWCAP_SME_F8F32 __khwcap2_feature(SME_F8F32)
+#define KERNEL_HWCAP_SME_SF8FMA__khwcap2_feature(SME_SF8FMA)
+#define KERNEL_HWCAP_SME_SF8DP4__khwcap2_feature(SME_SF8DP4)
+#define KERNEL_HWCAP_SME_SF8DP2__khwcap2_feature(SME_SF8DP2)
 
 /*
  * This yields a mask that user programs can use to figure out what
diff --git a/arch/arm64/include/uapi/asm/hwcap.h 
b/arch/arm64/include/uapi/asm/hwcap.h
index 5023599fa278..285610e626f5 100644
--- a/arch/arm64/include/uapi/asm/hwcap.h
+++ b/arch/arm64/include/uapi/asm/hwcap.h
@@ -107,5 +107,20 @@
 #define HWCAP2_SVE_B16B16  (1UL << 45)
 #define HWCAP2_LRCPC3  (1UL << 46)
 #define HWCAP2_LSE128  (1UL << 47)
+#define HWCAP2_FPMR(1UL << 48)
+#define HWCAP2_LUT (1UL << 49)
+#define HWCAP2_FAMINMAX(1UL << 50)
+#define HWCAP2_F8CVT   (1UL << 51)
+#define HWCAP2_F8FMA   (1UL << 52)
+#define HWCAP2_F8DP4   (1UL << 53)
+#define HWCAP2_F8DP2   (1UL << 54)
+#define HWCAP2_F8E4M3  (1UL << 55)
+#define HWCAP2_F8E5M2  (1UL << 56)
+#define HWCAP2_SME_LUTV2   (1UL << 57)
+#define HWCAP2_SME_F8F16   (1UL << 58)
+#define HWCAP2_SME_F8F32   (1UL << 59)
+#define HWCAP2_SME_SF8FMA  (1UL << 60)
+#define HWCAP2_SME_SF8DP4  (1UL << 61)
+#define HWCAP2_SME_SF8DP2  (1UL << 62)
 
 #endif /* _UAPI__ASM_HWCAP_H */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 0263565f617a..aefda789f510 100644
--- a/arch/arm64/kernel/cpufeature.

[PATCH v4 07/14] kselftest/arm64: Handle FPMR context in generic signal frame parser

2024-01-22 Thread Mark Brown

Teach the generic signal frame parsing code about the newly added FPMR
frame, avoiding warnings every time one is generated.

Signed-off-by: Mark Brown 
---
 tools/testing/selftests/arm64/signal/testcases/testcases.c | 8 
 tools/testing/selftests/arm64/signal/testcases/testcases.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/tools/testing/selftests/arm64/signal/testcases/testcases.c 
b/tools/testing/selftests/arm64/signal/testcases/testcases.c
index 9f580b55b388..674b88cc8c39 100644
--- a/tools/testing/selftests/arm64/signal/testcases/testcases.c
+++ b/tools/testing/selftests/arm64/signal/testcases/testcases.c
@@ -209,6 +209,14 @@ bool validate_reserved(ucontext_t *uc, size_t resv_sz, 
char **err)
zt = (struct zt_context *)head;
new_flags |= ZT_CTX;
break;
+   case FPMR_MAGIC:
+   if (flags & FPMR_CTX)
+   *err = "Multiple FPMR_MAGIC";
+   else if (head->size !=
+sizeof(struct fpmr_context))
+   *err = "Bad size for fpmr_context";
+   new_flags |= FPMR_CTX;
+   break;
case EXTRA_MAGIC:
if (flags & EXTRA_CTX)
*err = "Multiple EXTRA_MAGIC";
diff --git a/tools/testing/selftests/arm64/signal/testcases/testcases.h 
b/tools/testing/selftests/arm64/signal/testcases/testcases.h
index a08ab0d6207a..7727126347e0 100644
--- a/tools/testing/selftests/arm64/signal/testcases/testcases.h
+++ b/tools/testing/selftests/arm64/signal/testcases/testcases.h
@@ -19,6 +19,7 @@
 #define ZA_CTX (1 << 2)
 #define EXTRA_CTX  (1 << 3)
 #define ZT_CTX (1 << 4)
+#define FPMR_CTX   (1 << 5)
 
 #define KSFT_BAD_MAGIC 0xdeadbeef
 

-- 
2.30.2
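
For readers following along, the duplicate and size checks added above can be sketched outside the selftest. This is a minimal illustration and not the selftest code: every name prefixed sketch_/SKETCH_ is a hypothetical stand-in, the record layout is simplified, and the magic value is an assumption rather than something taken from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the signal frame record types (assumed layout). */
struct sketch_ctx_head {
	uint32_t magic;
	uint32_t size;
};

struct sketch_fpmr_record {
	struct sketch_ctx_head head;
	uint64_t fpmr;
};

#define SKETCH_FPMR_MAGIC 0x46504d52u	/* assumed value, ASCII "FPMR" */
#define SKETCH_FPMR_CTX   (1u << 5)

/*
 * Walk the records in buf until a zeroed terminator is found.
 * Returns NULL if everything looked sane, else a static error string.
 */
static const char *sketch_check_records(const unsigned char *buf, size_t len)
{
	unsigned int flags = 0;
	size_t off = 0;

	assert(buf != NULL);

	while (off + sizeof(struct sketch_ctx_head) <= len) {
		struct sketch_ctx_head head;

		memcpy(&head, buf + off, sizeof(head));
		if (head.magic == 0 && head.size == 0)
			return NULL;	/* terminator record */
		if (head.size < sizeof(head) || head.size > len - off)
			return "Bad record size";

		if (head.magic == SKETCH_FPMR_MAGIC) {
			if (flags & SKETCH_FPMR_CTX)
				return "Multiple FPMR_MAGIC";
			if (head.size != sizeof(struct sketch_fpmr_record))
				return "Bad size for fpmr_context";
			flags |= SKETCH_FPMR_CTX;
		}
		off += head.size;
	}
	return "Missing terminator";
}
```

With one FPMR record followed by a zeroed terminator, sketch_check_records() returns NULL; a second FPMR record in the same buffer yields "Multiple FPMR_MAGIC".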

[PATCH v4 09/14] kselftest/arm64: Add 2023 DPISA hwcap test coverage

2024-01-22 Thread Mark Brown

Add the hwcaps added for the 2023 DPISA extensions to the hwcaps test
program.

Signed-off-by: Mark Brown 
---
 tools/testing/selftests/arm64/abi/hwcap.c | 217 ++
 1 file changed, 217 insertions(+)

diff --git a/tools/testing/selftests/arm64/abi/hwcap.c 
b/tools/testing/selftests/arm64/abi/hwcap.c
index 1189e77c8152..d8909b2b535a 100644
--- a/tools/testing/selftests/arm64/abi/hwcap.c
+++ b/tools/testing/selftests/arm64/abi/hwcap.c
@@ -58,11 +58,46 @@ static void cssc_sigill(void)
asm volatile(".inst 0xdac01c00" : : : "x0");
 }
 
+static void f8cvt_sigill(void)
+{
+   /* FSCALE V0.4H, V0.4H, V0.4H */
+   asm volatile(".inst 0x2ec03c00");
+}
+
+static void f8dp2_sigill(void)
+{
+   /* FDOT V0.4H, V0.4H, V0.5H */
+   asm volatile(".inst 0xe40fc00");
+}
+
+static void f8dp4_sigill(void)
+{
+   /* FDOT V0.2S, V0.2S, V0.2S */
+   asm volatile(".inst 0xe00fc00");
+}
+
+static void f8fma_sigill(void)
+{
+   /* FMLALB V0.8H, V0.16B, V0.16B */
+   asm volatile(".inst 0xec0fc00");
+}
+
+static void faminmax_sigill(void)
+{
+   /* FAMIN V0.4H, V0.4H, V0.4H */
+   asm volatile(".inst 0x2ec01c00");
+}
+
 static void fp_sigill(void)
 {
asm volatile("fmov s0, #1");
 }
 
+static void fpmr_sigill(void)
+{
+   asm volatile("mrs x0, S3_3_C4_C4_2" : : : "x0");
+}
+
 static void ilrcpc_sigill(void)
 {
/* LDAPUR W0, [SP, #8] */
@@ -95,6 +130,12 @@ static void lse128_sigill(void)
 : "cc", "memory");
 }
 
+static void lut_sigill(void)
+{
+   /* LUTI2 V0.16B, { V0.16B }, V[0] */
+   asm volatile(".inst 0x4e801000");
+}
+
 static void mops_sigill(void)
 {
char dst[1], src[1];
@@ -216,6 +257,78 @@ static void smef16f16_sigill(void)
asm volatile("msr S0_3_C4_C6_3, xzr" : : : );
 }
 
+static void smef8f16_sigill(void)
+{
+   /* SMSTART */
+   asm volatile("msr S0_3_C4_C7_3, xzr" : : : );
+
+   /* FDOT ZA.H[W0, 0], Z0.B-Z1.B, Z0.B-Z1.B */
+   asm volatile(".inst 0xc1a01020" : : : );
+
+   /* SMSTOP */
+   asm volatile("msr S0_3_C4_C6_3, xzr" : : : );
+}
+
+static void smef8f32_sigill(void)
+{
+   /* SMSTART */
+   asm volatile("msr S0_3_C4_C7_3, xzr" : : : );
+
+   /* FDOT ZA.S[W0, 0], { Z0.B-Z1.B }, Z0.B[0] */
+   asm volatile(".inst 0xc1500038" : : : );
+
+   /* SMSTOP */
+   asm volatile("msr S0_3_C4_C6_3, xzr" : : : );
+}
+
+static void smelutv2_sigill(void)
+{
+   /* SMSTART */
+   asm volatile("msr S0_3_C4_C7_3, xzr" : : : );
+
+   /* LUTI4 { Z0.B-Z3.B }, ZT0, { Z0-Z1 } */
+   asm volatile(".inst 0xc08b" : : : );
+
+   /* SMSTOP */
+   asm volatile("msr S0_3_C4_C6_3, xzr" : : : );
+}
+
+static void smesf8dp2_sigill(void)
+{
+   /* SMSTART */
+   asm volatile("msr S0_3_C4_C7_3, xzr" : : : );
+
+   /* FDOT Z0.H, Z0.B, Z0.B[0] */
+   asm volatile(".inst 0x64204400" : : : );
+
+   /* SMSTOP */
+   asm volatile("msr S0_3_C4_C6_3, xzr" : : : );
+}
+
+static void smesf8dp4_sigill(void)
+{
+   /* SMSTART */
+   asm volatile("msr S0_3_C4_C7_3, xzr" : : : );
+
+   /* FDOT Z0.S, Z0.B, Z0.B[0] */
+   asm volatile(".inst 0xc1a41C00" : : : );
+
+   /* SMSTOP */
+   asm volatile("msr S0_3_C4_C6_3, xzr" : : : );
+}
+
+static void smesf8fma_sigill(void)
+{
+   /* SMSTART */
+   asm volatile("msr S0_3_C4_C7_3, xzr" : : : );
+
+   /* FMLALB V0.8H, V0.16B, V0.16B */
+   asm volatile(".inst 0xec0fc00");
+
+   /* SMSTOP */
+   asm volatile("msr S0_3_C4_C6_3, xzr" : : : );
+}
+
 static void sve_sigill(void)
 {
/* RDVL x0, #0 */
@@ -353,6 +466,53 @@ static const struct hwcap_data {
.cpuinfo = "cssc",
.sigill_fn = cssc_sigill,
},
+   {
+   .name = "F8CVT",
+   .at_hwcap = AT_HWCAP2,
+   .hwcap_bit = HWCAP2_F8CVT,
+   .cpuinfo = "f8cvt",
+   .sigill_fn = f8cvt_sigill,
+   },
+   {
+   .name = "F8DP4",
+   .at_hwcap = AT_HWCAP2,
+   .hwcap_bit = HWCAP2_F8DP4,
+   .cpuinfo = "f8dp4",
+   .sigill_fn = f8dp4_sigill,
+   },
+   {
+   .name = "F8DP2",
+   .at_hwcap = AT_HWCAP2,
+   .hwcap_bit = HWCAP2_F8DP2,
+   .cpuinfo = "f8dp2",
+   .sigill_fn = f8dp2_sigill,
+   },
+   {
+   .name = "F8E5M2",
+   .at_hwcap = AT_HWCAP2,
+   .hwcap_bit = HWCAP2_F8E5M2,
+   .cpuinfo = "f8e5m2",
+   },
+   {
+   .name = "F8E4M3",
+   .at_hwcap = AT_HWCAP2,
+   .hwcap_bit = HWCAP2_F8E4M3,
+   .cpuinfo = "f8e4m3",
+   },
+   {
+   .name = "F8FMA",
+   .at_hwcap = AT_HWCAP2,
+   .hwcap_bit = HWCAP2_F8FMA,
+   .cpuinfo = "f8fma",
+   .sigill_fn = f8fma_sigill,
+   },
+   {

[PATCH v4 10/14] KVM: arm64: Share all userspace hardened thread data with the hypervisor

2024-01-22 Thread Mark Brown

As part of the lazy FPSIMD state transitioning done by the hypervisor we
currently share the userspace FPSIMD state in thread->uw.fpsimd_state with
the host. Since this struct is non-extensible userspace ABI we have to keep
the definition as is, but the addition of FPMR in the 2023 dpISA means that
we will want to share more storage with the host. To facilitate this,
refactor the current code to share the entire thread->uw rather than just
the one field.

The large number of references to fpsimd_state make it very inconvenient
to add an additional wrapper struct.
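
To illustrate why passing the whole struct as (uw, uw + 1) covers FPMR while the old (fpsimd, fpsimd + 1) range did not, here is a minimal userspace sketch. The struct layouts are simplified stand-ins with hypothetical sketch_ names, not the kernel definitions, and sketch_range_bytes() is only a helper for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the FPSIMD state; only its size matters for this sketch. */
struct sketch_fpsimd {
	uint64_t regs[66];
};

/* Stand-in for the uw sub-struct, field names borrowed from the series. */
struct sketch_uw {
	uint64_t tp_value;
	uint64_t tp2_value;
	uint64_t fpmr;
	uint64_t pad;
	struct sketch_fpsimd fpsimd_state;
};

/* Size in bytes of the half-open range [start, end). */
static size_t sketch_range_bytes(const void *start, const void *end)
{
	assert(start != NULL && end != NULL);
	assert((const char *)end >= (const char *)start);
	return (size_t)((const char *)end - (const char *)start);
}

/* Does the address p fall inside the half-open range [start, end)? */
static int sketch_in_range(const void *p, const void *start, const void *end)
{
	return (const char *)p >= (const char *)start &&
	       (const char *)p < (const char *)end;
}
```

For a struct sketch_uw object uw, the range (&uw, &uw + 1) spans sizeof(uw) bytes and contains &uw.fpmr, whereas (&uw.fpsimd_state, &uw.fpsimd_state + 1) does not.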

Signed-off-by: Mark Brown 
---
 arch/arm64/include/asm/kvm_host.h   |  3 ++-
 arch/arm64/include/asm/processor.h  |  2 +-
 arch/arm64/kvm/fpsimd.c | 13 ++---
 arch/arm64/kvm/hyp/include/hyp/switch.h |  2 +-
 arch/arm64/kvm/hyp/nvhe/hyp-main.c  |  4 ++--
 5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 7993694a54af..c4fdcc94d733 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
@@ -601,7 +602,7 @@ struct kvm_vcpu_arch {
struct kvm_guest_debug_arch vcpu_debug_state;
struct kvm_guest_debug_arch external_debug_state;
 
-   struct user_fpsimd_state *host_fpsimd_state;/* hyp VA */
+   struct thread_struct_uw *host_uw;   /* hyp VA */
struct task_struct *parent_task;
 
struct {
diff --git a/arch/arm64/include/asm/processor.h 
b/arch/arm64/include/asm/processor.h
index b453c66d3fae..544baa57f9b9 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -152,7 +152,7 @@ struct thread_struct {
 * Maintainers must ensure manually that this contains no
 * implicit padding.
 */
-   struct {
+   struct thread_struct_uw {
unsigned long   tp_value;   /* TLS register */
unsigned long   tp2_value;
unsigned long   fpmr;
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index e3e611e30e91..6cf22cd8f020 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -17,13 +17,13 @@
 void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu)
 {
struct task_struct *p = vcpu->arch.parent_task;
-   struct user_fpsimd_state *fpsimd;
+   struct thread_struct_uw *uw;
 
if (!is_protected_kvm_enabled() || !p)
return;
 
-   fpsimd = &p->thread.uw.fpsimd_state;
-   kvm_unshare_hyp(fpsimd, fpsimd + 1);
+   uw = &p->thread.uw;
+   kvm_unshare_hyp(uw, uw + 1);
put_task_struct(p);
 }
 
@@ -39,17 +39,16 @@ void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu)
 int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
 {
int ret;
-
-   struct user_fpsimd_state *fpsimd = &current->thread.uw.fpsimd_state;
+   struct thread_struct_uw *uw = &current->thread.uw;
 
kvm_vcpu_unshare_task_fp(vcpu);
 
/* Make sure the host task fpsimd state is visible to hyp: */
-   ret = kvm_share_hyp(fpsimd, fpsimd + 1);
+   ret = kvm_share_hyp(uw, uw + 1);
if (ret)
return ret;
 
-   vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd);
+   vcpu->arch.host_uw = kern_hyp_va(uw);
 
/*
 * We need to keep current's task_struct pinned until its data has been
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h 
b/arch/arm64/kvm/hyp/include/hyp/switch.h
index a038320cdb08..27fcdfd432b9 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -371,7 +371,7 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, 
u64 *exit_code)
 
/* Write out the host state if it's in the registers */
if (vcpu->arch.fp_state == FP_STATE_HOST_OWNED)
-   __fpsimd_save_state(vcpu->arch.host_fpsimd_state);
+   __fpsimd_save_state(&(vcpu->arch.host_uw->fpsimd_state));
 
/* Restore the guest state */
if (sve_guest)
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 2385fd03ed87..eb2208009875 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -42,7 +42,7 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
hyp_vcpu->vcpu.arch.fp_state= host_vcpu->arch.fp_state;
 
hyp_vcpu->vcpu.arch.debug_ptr   = 
kern_hyp_va(host_vcpu->arch.debug_ptr);
-   hyp_vcpu->vcpu.arch.host_fpsimd_state = 
host_vcpu->arch.host_fpsimd_state;
+   hyp_vcpu->vcpu.arch.host_uw = host_vcpu->arch.host_uw;
 
hyp_vcpu->vcpu.arch.vsesr_el2   = host_vcpu->arch.vsesr_el2;
 
@@ -64,7 +64,7 @@ static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
host_vcpu->arch.fault   = hyp_vcpu->vcpu.arch.fault;
 
host_vcpu->arch.iflags  = hyp

[PATCH v4 12/14] KVM: arm64: Support FEAT_FPMR for guests

2024-01-22 Thread Mark Brown

FEAT_FPMR introduces a new system register FPMR which allows configuration
of floating point behaviour, currently for FP8 specific features. Allow use
of this in guests, disabling the trap while guests are running and saving
and restoring the value along with the rest of the floating point state.
Since FPMR is stored immediately after the main floating point state we
share it with the hypervisor by adjusting the size of the shared region.

Access to FPMR is covered by both a register specific trap HCRX_EL2.EnFPM
and the overall floating point access trap so we just unconditionally
enable the FPMR specific trap and rely on the floating point access trap to
detect guest floating point usage.

Signed-off-by: Mark Brown 
---
 arch/arm64/include/asm/kvm_arm.h|  2 +-
 arch/arm64/include/asm/kvm_host.h   |  3 ++-
 arch/arm64/kvm/emulate-nested.c |  8 
 arch/arm64/kvm/fpsimd.c |  2 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h |  7 ++-
 arch/arm64/kvm/sys_regs.c   | 11 +++
 6 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 7f45ce9170bb..b2ddb6165953 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -103,7 +103,7 @@
 #define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)
 
 #define HCRX_GUEST_FLAGS \
-   (HCRX_EL2_SMPME | HCRX_EL2_TCR2En | \
+   (HCRX_EL2_SMPME | HCRX_EL2_TCR2En | HCRX_EL2_EnFPM | \
 (cpus_have_final_cap(ARM64_HAS_MOPS) ? (HCRX_EL2_MSCEn | 
HCRX_EL2_MCE2) : 0))
 #define HCRX_HOST_FLAGS (HCRX_EL2_MSCEn | HCRX_EL2_TCR2En | HCRX_EL2_EnFPM)
 
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index c4fdcc94d733..99c0f8944f04 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -384,6 +384,8 @@ enum vcpu_sysreg {
APGAKEYLO_EL1,
APGAKEYHI_EL1,
 
+   FPMR,
+
/* Memory Tagging Extension registers */
RGSR_EL1,   /* Random Allocation Tag Seed Register */
GCR_EL1,/* Tag Control Register */
@@ -544,7 +546,6 @@ struct kvm_vcpu_arch {
enum fp_type fp_type;
unsigned int sve_max_vl;
u64 svcr;
-   unsigned long fpmr;
 
/* Stage 2 paging state used by the hardware on next switch */
struct kvm_s2_mmu *hw_mmu;
diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
index 431fd429932d..3af5fd0e28dc 100644
--- a/arch/arm64/kvm/emulate-nested.c
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -67,6 +67,8 @@ enum cgt_group_id {
CGT_HCR_TTLBIS,
CGT_HCR_TTLBOS,
 
+   CGT_HCRX_EnFPM,
+
CGT_MDCR_TPMCR,
CGT_MDCR_TPM,
CGT_MDCR_TDE,
@@ -279,6 +281,11 @@ static const struct trap_bits coarse_trap_bits[] = {
.mask   = HCR_TTLBOS,
.behaviour  = BEHAVE_FORWARD_ANY,
},
+   [CGT_HCRX_EnFPM] = {
+   .index  = HCRX_EL2,
+   .mask   = HCRX_EL2_EnFPM,
+   .behaviour  = BEHAVE_FORWARD_ANY,
+   },
[CGT_MDCR_TPMCR] = {
.index  = MDCR_EL2,
.value  = MDCR_EL2_TPMCR,
@@ -478,6 +485,7 @@ static const struct encoding_to_trap_config 
encoding_to_cgt[] __initconst = {
SR_TRAP(SYS_AIDR_EL1,   CGT_HCR_TID1),
SR_TRAP(SYS_SMIDR_EL1,  CGT_HCR_TID1),
SR_TRAP(SYS_CTR_EL0,CGT_HCR_TID2),
+   SR_TRAP(SYS_FPMR,   CGT_HCRX_EnFPM),
SR_TRAP(SYS_CCSIDR_EL1, CGT_HCR_TID2_TID4),
SR_TRAP(SYS_CCSIDR2_EL1,CGT_HCR_TID2_TID4),
SR_TRAP(SYS_CLIDR_EL1,  CGT_HCR_TID2_TID4),
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index 6cf22cd8f020..9e002489c843 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -152,7 +152,7 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
fp_state.sve_vl = vcpu->arch.sve_max_vl;
fp_state.sme_state = NULL;
fp_state.svcr = &vcpu->arch.svcr;
-   fp_state.fpmr = &vcpu->arch.fpmr;
+   fp_state.fpmr = (unsigned long *)&__vcpu_sys_reg(vcpu, FPMR);
fp_state.fp_type = &vcpu->arch.fp_type;
 
if (vcpu_has_sve(vcpu))
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h 
b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 27fcdfd432b9..abf785c473d0 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -370,10 +370,15 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, 
u64 *exit_code)
isb();
 
/* Write out the host state if it's in the registers */
-   if (vcpu->arch.fp_state == FP_STATE_HOST_OWNED)
+   if (vcpu->arch.fp_state == FP_STATE_HOST_OWNED) {
__fpsimd_save_state(&(vcpu->arch.host_uw->fpsimd_state));
+

[PATCH v4 13/14] KVM: arm64: selftests: Document feature registers added in 2023 extensions

2024-01-22 Thread Mark Brown

The 2023 architecture extensions allocated some previously unused feature
registers; add comments mapping the names in get-reg-list as we do for the
other allocated registers.

Signed-off-by: Mark Brown 
---
 tools/testing/selftests/kvm/aarch64/get-reg-list.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/aarch64/get-reg-list.c 
b/tools/testing/selftests/kvm/aarch64/get-reg-list.c
index 709d7d721760..71ea6ecec7ce 100644
--- a/tools/testing/selftests/kvm/aarch64/get-reg-list.c
+++ b/tools/testing/selftests/kvm/aarch64/get-reg-list.c
@@ -428,7 +428,7 @@ static __u64 base_regs[] = {
ARM64_SYS_REG(3, 0, 0, 4, 4),   /* ID_AA64ZFR0_EL1 */
ARM64_SYS_REG(3, 0, 0, 4, 5),   /* ID_AA64SMFR0_EL1 */
ARM64_SYS_REG(3, 0, 0, 4, 6),
-   ARM64_SYS_REG(3, 0, 0, 4, 7),
+   ARM64_SYS_REG(3, 0, 0, 4, 7),   /* ID_AA64FPFR0_EL1 */
ARM64_SYS_REG(3, 0, 0, 5, 0),   /* ID_AA64DFR0_EL1 */
ARM64_SYS_REG(3, 0, 0, 5, 1),   /* ID_AA64DFR1_EL1 */
ARM64_SYS_REG(3, 0, 0, 5, 2),
@@ -440,7 +440,7 @@ static __u64 base_regs[] = {
ARM64_SYS_REG(3, 0, 0, 6, 0),   /* ID_AA64ISAR0_EL1 */
ARM64_SYS_REG(3, 0, 0, 6, 1),   /* ID_AA64ISAR1_EL1 */
ARM64_SYS_REG(3, 0, 0, 6, 2),   /* ID_AA64ISAR2_EL1 */
-   ARM64_SYS_REG(3, 0, 0, 6, 3),
+   ARM64_SYS_REG(3, 0, 0, 6, 3),   /* ID_AA64ISAR3_EL1 */
ARM64_SYS_REG(3, 0, 0, 6, 4),
ARM64_SYS_REG(3, 0, 0, 6, 5),
ARM64_SYS_REG(3, 0, 0, 6, 6),

-- 
2.30.2
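
As a rough aid for reading the table above, the (op0, op1, CRn, CRm, op2) tuple can be packed into a single index. The sketch below is an illustration only: the shift values are what I recall from the arm64 KVM uapi header and should be treated as assumptions, the SKETCH_/sketch_ names are hypothetical, and only the low index bits are computed, not a full register ID.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field positions within the system register index. */
#define SKETCH_OP0_SHIFT 14
#define SKETCH_OP1_SHIFT 11
#define SKETCH_CRN_SHIFT 7
#define SKETCH_CRM_SHIFT 3
#define SKETCH_OP2_SHIFT 0

/* Pack an (op0, op1, CRn, CRm, op2) tuple into one index value. */
static uint64_t sketch_sysreg_index(unsigned int op0, unsigned int op1,
				    unsigned int crn, unsigned int crm,
				    unsigned int op2)
{
	/* Reject values that do not fit the assumed field widths. */
	assert(op0 < 4 && op1 < 8 && crn < 16 && crm < 16 && op2 < 8);

	return ((uint64_t)op0 << SKETCH_OP0_SHIFT) |
	       ((uint64_t)op1 << SKETCH_OP1_SHIFT) |
	       ((uint64_t)crn << SKETCH_CRN_SHIFT) |
	       ((uint64_t)crm << SKETCH_CRM_SHIFT) |
	       ((uint64_t)op2 << SKETCH_OP2_SHIFT);
}
```

Under those assumptions the two entries annotated by this patch, (3, 0, 0, 4, 7) and (3, 0, 0, 6, 3), pack to 0xc027 and 0xc033 respectively.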

Re: [PATCH v2 4/4] selftests/resctrl: Add non-contiguous CBMs CAT test

2024-01-22 Thread Reinette Chatre

Hi Maciej,

On 1/21/2024 11:56 PM, Maciej Wieczór-Retman wrote:
> Hi!
> 
> On 2024-01-19 at 08:39:31 -0800, Reinette Chatre wrote:
>> Hi Maciej,
>>
>> On 1/18/2024 11:37 PM, Maciej Wieczór-Retman wrote:
>>> On 2024-01-18 at 09:15:46 -0800, Reinette Chatre wrote:
>>>> On 1/18/2024 4:02 AM, Maciej Wieczór-Retman wrote:
>>>>> On 2024-01-17 at 10:49:06 -0800, Reinette Chatre wrote:
>>>>>> On 1/17/2024 12:26 AM, Maciej Wieczór-Retman wrote:
>>>>>>> On 2024-01-08 at 14:42:11 -0800, Reinette Chatre wrote:
>>>>>>>> On 12/12/2023 6:52 AM, Maciej Wieczor-Retman wrote:

>>>>>>>>> + bit_center = count_bits(full_cache_mask) / 2;
>>>>>>>>> + cont_mask = full_cache_mask >> bit_center;
>>>>>>>>> +
>>>>>>>>> + /* Contiguous mask write check. */
>>>>>>>>> + snprintf(schemata, sizeof(schemata), "%lx", cont_mask);
>>>>>>>>> + ret = write_schemata("", schemata, uparams->cpu, test->resource);
>>>>>>>>> + if (ret)
>>>>>>>>> + return ret;

>>>>>>>> How will user know what failed? I am seeing this single test exercise a few scenarios
>>>>>>>> and it is not obvious to me if the issue will be clear if this test,
>>>>>>>> noncont_cat_run_test(), fails.
>>>>>>>
>>>>>>> write_schemata() either succeeds with '0' or errors out with a negative value. If
>>>>>>> the contiguous mask write fails, write_schemata should print out what was wrong
>>>>>>> and I believe that the test will report an error rather than failure.
>>>>>>
>>>>>> Right. I am trying to understand whether the user will be able to decipher what failed
>>>>>> in case there is an error. Seems like in this case the user is expected to look at the
>>>>>> source code of the test to understand what the test was trying to do at the time it
>>>>>> encountered the failure. In this case user may be "lucky" that this test only has
>>>>>> one write_schemata() call _not_ followed by a ksft_print_msg() so user can use that
>>>>>> reasoning to figure out which write_schemata() failed to further dig what test was
>>>>>> trying to do.
>>>>>
>>>>> When a write_schemata() is executed the string that is being written gets
>>>>> printed. If there are multiple calls in a single tests and one fails I'd imagine
>>>>> it would be easy for the user to figure out which one failed.

>>>> It would be easy for the user to figure out if (a) it is obvious to the user
>>>> what schema a particular write_schema() call attempted to write and (b) all the
>>>> write_schema() calls attempt to write different schema.
>>>
>>> Okay, your comment made me wonder if on error the schemata still is printed. I
>>> double checked in the code and whether write_schemata() fails or not it has a
>>> goto path where before returning it will print out the schema. So I believe that
>>> satisfies your (a) condition.
>>
>> Let me try with an example.
>> Scenario 1:
>> The test has the following code:
>>  ...
>>  write_schemata(..., "0xfff", ...);
>>  ...
>>  write_schemata(..., "0xf0f", ...);
>>  ...
>>
>> Scenario 2:
>> The test has the following code:
>>  ...
>>  write_schemata(..., schemata, ...);
>>  ...
>>  write_schemata(..., schemata, ...);
>>  ...
>>
>> Any failure of write_schemata() in scenario 1 will be easy to trace. As you
>> state, write_schemata() prints the schemata attempted and it will thus be
>> easy to look at the code to see which write_schemata() call failed since it
>> is obvious from the code which schemata was attempted.
>> A failure of one of the write_schemata() in scenario 2 will not be as easy
>> to trace since the user first needs to determine what the value of "schemata"
>> is at each call and that may depend on the platform, bit shifting done in
>> test, and state of the system at time of test.
> 
> Doing things similar to scenario 1 would be great from a debugging perspective
> but since the masks can have different sizes putting literals there seems
> impossible.
> 
> Maybe the code could be improved by putting an example CBM in the comment above
> a write_schemata() call? "For a 12 bit maximum CBM value, the contiguous
> schemata will look like '0x3f'" and "For a 12 bit maximum CBM value, the
> non-contiguous schemata will look like '0xf0f'"
> 
> This seems like the closest I could get to what you're
> showing in scenario 1 (which I assume would be the best).

I am not asking you to use literals. I am trying to demonstrate that the only way
it would be obvious to the user where a failure occurred is when the test uses literals.
I continue to try to motivate for a clear indication to the user/developer of what failed
when this test failed ... this could just be a ksft_print_msg() when the
write_schemata() call we are talking about fails.
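
To make the 12 bit example from this thread concrete, here is a small standalone sketch of the mask arithmetic plus a message that names the failing step. It is not the selftest code: the sketch_ helpers are hypothetical, and the way the non-contiguous mask is built (a four bit hole around the centre) is my assumption, chosen only because it reproduces the '0xf0f' value quoted above.

```c
#include <assert.h>
#include <stdio.h>

/* Count the set bits in n. */
static unsigned int sketch_count_bits(unsigned long n)
{
	unsigned int count = 0;

	while (n) {
		count += n & 1;
		n >>= 1;
	}
	return count;
}

/* Contiguous mask as in the quoted hunk: drop the upper half of the bits. */
static unsigned long sketch_cont_mask(unsigned long full_cache_mask)
{
	return full_cache_mask >> (sketch_count_bits(full_cache_mask) / 2);
}

/* Assumed construction: punch a 4-bit hole around the centre of the mask. */
static unsigned long sketch_noncont_mask(unsigned long full_cache_mask)
{
	unsigned int bit_center = sketch_count_bits(full_cache_mask) / 2;

	assert(bit_center >= 2);
	return full_cache_mask & ~(0xfUL << (bit_center - 2));
}

/* Build a failure message that says which step failed and with what mask. */
static int sketch_describe_failure(char *buf, size_t len, const char *step,
				   unsigned long mask)
{
	return snprintf(buf, len, "%s schemata write failed (mask 0x%lx)",
			step, mask);
}
```

For a full mask of 0xfff this gives 0x3f for the contiguous case and 0xf0f for the non-contiguous case, matching the values mentioned earlier in the thread. A message of this kind is the sort of thing a ksft_print_msg() call could emit next to the failing write.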

> 
>>> As for (b) depends on what you meant. Other tests that run more than one
>>> write_schemata() use different ones every time (CAT, MBM, MBA). Do you

Re: [PATCH] selftests: Move KTAP bash helpers to selftests common folder

2024-01-22 Thread Rob Herring

On Tue, Jan 02, 2024 at 03:15:28PM +0100, Laura Nao wrote:
> Move bash helpers for outputting in KTAP format to the common selftests
> folder. This allows kselftests other than the dt one to source the file
> and make use of the helper functions.
> Define pass, fail and skip codes in the same file too.
> 
> Signed-off-by: Laura Nao 
> ---
>  tools/testing/selftests/Makefile  | 1 +
>  tools/testing/selftests/dt/Makefile   | 2 +-
>  tools/testing/selftests/dt/test_unprobed_devices.sh   | 6 +-
>  tools/testing/selftests/{dt => kselftest}/ktap_helpers.sh | 6 ++
>  4 files changed, 9 insertions(+), 6 deletions(-)
>  rename tools/testing/selftests/{dt => kselftest}/ktap_helpers.sh (94%)

Acked-by: Rob Herring

[PATCH v4 03/14] arm64/fpsimd: Support FEAT_FPMR

2024-01-22 Thread Mark Brown

FEAT_FPMR defines a new EL0 accessible register FPMR used to configure the
FP8 related features added to the architecture at the same time. Detect
support for this register and context switch it for EL0 when present.

Due to the sharing of responsibility for saving floating point state
between the host kernel and KVM, FP8 support is not yet implemented in KVM
and a stub similar to that used for SVCR is provided for FPMR in order to
avoid bisection issues. To make it easier to share host state with the
hypervisor we store FPMR as a hardened usercopy field in uw (along with
some padding).
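
The "(along with some padding)" remark can be illustrated with a standalone sketch. The layouts below are simplified stand-ins with hypothetical sketch_ names, and the 16 byte alignment of the FPSIMD stand-in is an assumption made for the illustration; the point is only that adding one 8 byte field in front of a 16 byte aligned member forces the compiler to insert implicit padding unless an explicit pad field is added.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the FPSIMD state, assumed to need 16-byte alignment. */
struct sketch_fpsimd_state {
	_Alignas(16) uint64_t vregs[64];
	uint32_t fpsr;
	uint32_t fpcr;
	uint32_t reserved[2];
};

/* Layout with fpmr but no explicit pad: the compiler must add padding. */
struct sketch_uw_nopad {
	uint64_t tp_value;
	uint64_t tp2_value;
	uint64_t fpmr;
	struct sketch_fpsimd_state fpsimd_state;
};

/* Layout with an explicit pad field, as in the hunk in this patch. */
struct sketch_uw {
	uint64_t tp_value;
	uint64_t tp2_value;
	uint64_t fpmr;
	uint64_t pad;
	struct sketch_fpsimd_state fpsimd_state;
};

#define SKETCH_FIELD_SIZE(type, member) sizeof(((type *)0)->member)

/* Same idea as the BUILD_BUG_ON in the patch: field sizes must add up. */
static_assert(sizeof(struct sketch_uw) ==
	      SKETCH_FIELD_SIZE(struct sketch_uw, tp_value) +
	      SKETCH_FIELD_SIZE(struct sketch_uw, tp2_value) +
	      SKETCH_FIELD_SIZE(struct sketch_uw, fpmr) +
	      SKETCH_FIELD_SIZE(struct sketch_uw, pad) +
	      SKETCH_FIELD_SIZE(struct sketch_uw, fpsimd_state),
	      "sketch_uw contains implicit padding");

/* Bytes of compiler-inserted padding in the layout without the pad field. */
static size_t sketch_implicit_padding_without_pad(void)
{
	return sizeof(struct sketch_uw_nopad) -
	       (3 * sizeof(uint64_t) + sizeof(struct sketch_fpsimd_state));
}
```

Under these assumptions the layout without the pad field carries 8 bytes of implicit padding, which is exactly what the explicit pad member replaces.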

Signed-off-by: Mark Brown 
---
 arch/arm64/include/asm/cpufeature.h |  5 +
 arch/arm64/include/asm/fpsimd.h |  2 ++
 arch/arm64/include/asm/kvm_host.h   |  1 +
 arch/arm64/include/asm/processor.h  |  4 
 arch/arm64/kernel/cpufeature.c  |  9 +
 arch/arm64/kernel/fpsimd.c  | 13 +
 arch/arm64/kvm/fpsimd.c |  1 +
 arch/arm64/tools/cpucaps|  1 +
 8 files changed, 36 insertions(+)

diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index 21c824edf8ce..34fcdbc65d7d 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -768,6 +768,11 @@ static __always_inline bool system_supports_tpidr2(void)
return system_supports_sme();
 }
 
+static __always_inline bool system_supports_fpmr(void)
+{
+   return alternative_has_cap_unlikely(ARM64_HAS_FPMR);
+}
+
 static __always_inline bool system_supports_cnp(void)
 {
return alternative_has_cap_unlikely(ARM64_HAS_CNP);
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 50e5f25d3024..6cf72b0d2c04 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -89,6 +89,7 @@ struct cpu_fp_state {
void *sve_state;
void *sme_state;
u64 *svcr;
+   unsigned long *fpmr;
unsigned int sve_vl;
unsigned int sme_vl;
enum fp_type *fp_type;
@@ -154,6 +155,7 @@ extern void cpu_enable_sve(const struct 
arm64_cpu_capabilities *__unused);
 extern void cpu_enable_sme(const struct arm64_cpu_capabilities *__unused);
 extern void cpu_enable_sme2(const struct arm64_cpu_capabilities *__unused);
 extern void cpu_enable_fa64(const struct arm64_cpu_capabilities *__unused);
+extern void cpu_enable_fpmr(const struct arm64_cpu_capabilities *__unused);
 
 extern u64 read_smcr_features(void);
 
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 21c57b812569..7993694a54af 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -543,6 +543,7 @@ struct kvm_vcpu_arch {
enum fp_type fp_type;
unsigned int sve_max_vl;
u64 svcr;
+   unsigned long fpmr;
 
/* Stage 2 paging state used by the hardware on next switch */
struct kvm_s2_mmu *hw_mmu;
diff --git a/arch/arm64/include/asm/processor.h 
b/arch/arm64/include/asm/processor.h
index 5b0a04810b23..b453c66d3fae 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -155,6 +155,8 @@ struct thread_struct {
struct {
unsigned long   tp_value;   /* TLS register */
unsigned long   tp2_value;
+   unsigned long   fpmr;
+   unsigned long   pad;
struct user_fpsimd_state fpsimd_state;
} uw;
 
@@ -253,6 +255,8 @@ static inline void arch_thread_struct_whitelist(unsigned 
long *offset,
BUILD_BUG_ON(sizeof_field(struct thread_struct, uw) !=
 sizeof_field(struct thread_struct, uw.tp_value) +
 sizeof_field(struct thread_struct, uw.tp2_value) +
+sizeof_field(struct thread_struct, uw.fpmr) +
+sizeof_field(struct thread_struct, uw.pad) +
 sizeof_field(struct thread_struct, uw.fpsimd_state));
 
*offset = offsetof(struct thread_struct, uw);
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index eae59ec0f4b0..0263565f617a 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -272,6 +272,7 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr1[] = {
 };
 
 static const struct arm64_ftr_bits ftr_id_aa64pfr2[] = {
+   ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 
ID_AA64PFR2_EL1_FPMR_SHIFT, 4, 0),
ARM64_FTR_END,
 };
 
@@ -2767,6 +2768,14 @@ static const struct arm64_cpu_capabilities 
arm64_features[] = {
.type = ARM64_CPUCAP_SYSTEM_FEATURE,
.matches = has_lpa2,
},
+   {
+   .desc = "FPMR",
+   .type = ARM64_CPUCAP_SYSTEM_FEATURE,
+   .capability = ARM64_HAS_FPMR,
+   .matches = has_cpuid_feature,
+   .cpu_enable = cpu_enable_fpmr,
+   ARM64_CPUID_FIELDS(ID_AA64PFR2_EL1, FPMR, IMP)
+   },

[PATCH v2] kunit: Mark filter* params as rw

2024-01-22 Thread Lucas De Marchi

By allowing the filter_glob parameter to be written to, it's possible to
tweak the testsuites that will be executed on new module loads. This
makes it easier to run specific tests without having to reload kunit and
provides a way to filter tests on real HW even if kunit is builtin.
Example for xe driver:

1) Run just 1 test
# echo -n xe_bo > /sys/module/kunit/parameters/filter_glob
# modprobe -r xe_live_test
# modprobe xe_live_test
# ls /sys/kernel/debug/kunit/
xe_bo

2) Run all tests
# echo \* > /sys/module/kunit/parameters/filter_glob
# modprobe -r xe_live_test
# modprobe xe_live_test
# ls /sys/kernel/debug/kunit/
xe_bo  xe_dma_buf  xe_migrate  xe_mocs

For completeness and to cover other use cases, also change filter and
filter_action to rw.

Link: https://lore.kernel.org/intel-xe/dzacvbdditbneiu3e3fmstjmttcbne44yspumpkd6sjn56jqpk@vxu7sksbqrp6/
Reviewed-by: Rae Moar 
Signed-off-by: Lucas De Marchi 
---

Rae, I kept your r-b from v1 since the additions are just what we talked
about.

v2: also change filter_action and filter to rw, testing with the xe
module to see if filter=module=none filter_action=skip produces
the result expected by igt

 lib/kunit/executor.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/kunit/executor.c b/lib/kunit/executor.c
index 1236b3cd2fbb..371ddcee7fb5 100644
--- a/lib/kunit/executor.c
+++ b/lib/kunit/executor.c
@@ -31,13 +31,13 @@ static char *filter_glob_param;
 static char *filter_param;
 static char *filter_action_param;
 
-module_param_named(filter_glob, filter_glob_param, charp, 0400);
+module_param_named(filter_glob, filter_glob_param, charp, 0600);
 MODULE_PARM_DESC(filter_glob,
"Filter which KUnit test suites/tests run at boot-time, e.g. 
list* or list*.*del_test");
-module_param_named(filter, filter_param, charp, 0400);
+module_param_named(filter, filter_param, charp, 0600);
 MODULE_PARM_DESC(filter,
"Filter which KUnit test suites/tests run at boot-time using 
attributes, e.g. speed>slow");
-module_param_named(filter_action, filter_action_param, charp, 0400);
+module_param_named(filter_action, filter_action_param, charp, 0600);
 MODULE_PARM_DESC(filter_action,
"Changes behavior of filtered tests using attributes, valid 
values are:\n"
": do not run filtered tests as normal\n"
-- 
2.40.1
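
For test harnesses that prefer not to shell out, the "echo -n xe_bo > ..." step from the commit message could be done from C roughly as follows. This is only a sketch: sketch_write_param() is a hypothetical helper, and the caller is assumed to have permission to write the parameter file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write value to a parameter file without a trailing newline, similar to
 * "echo -n value > path". Returns 0 on success and -1 on any failure.
 */
static int sketch_write_param(const char *path, const char *value)
{
	size_t len;
	FILE *f;
	int ret = 0;

	assert(path != NULL && value != NULL);

	f = fopen(path, "w");
	if (!f)
		return -1;

	len = strlen(value);
	if (fwrite(value, 1, len, f) != len)
		ret = -1;
	/* fclose() flushes, so a failed write to the file can surface here. */
	if (fclose(f) == EOF)
		ret = -1;

	return ret;
}
```

A caller would pass "/sys/module/kunit/parameters/filter_glob" and "xe_bo" before reloading the test module, mirroring step 1 in the commit message.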

Re: [PATCH v6 1/3] kselftests: lib.mk: Add TEST_GEN_MODS_DIR variable

2024-01-22 Thread Shuah Khan


On 1/12/24 10:43, Marcos Paulo de Souza wrote:

Add TEST_GEN_MODS_DIR variable for kselftests. It can point to
a directory containing kernel modules that will be used by
selftest scripts.

The modules are built as external modules for the running kernel.
As a result they are always binary compatible and the same tests
can be used for older or newer kernels.

The build requires "kernel-devel" package to be installed.
For example, in the upstream sources, the rpm devel package
is produced by "make rpm-pkg"

The modules can be built independently by

   make -C tools/testing/selftests/livepatch/

or they will be automatically built before running the tests via

   make -C tools/testing/selftests/livepatch/ run_tests

Note that they are _not_ built when running the standalone
tests by calling, for example, ./test-state.sh.

Along with TEST_GEN_MODS_DIR, it was necessary to create a new install
rule. INSTALL_MODS_RULE is needed because INSTALL_SINGLE_RULE would
copy the entire TEST_GEN_MODS_DIR directory to the destination, even
the files created by Kbuild to compile the modules. The new install
rule copies only the .ko files, as we would expect the gen_tar to work.

Reviewed-by: Joe Lawrence 
Reviewed-by: Petr Mladek 
Signed-off-by: Marcos Paulo de Souza 
---
  Documentation/dev-tools/kselftest.rst |  4 
  tools/testing/selftests/lib.mk| 26 +-



Hi Marcos,

I would like the doc patch and lib.mk patch separate. If lib.mk needs changes
we don't have to touch the doc patch.

thanks,
-- Shuah

[PATCH v4 11/14] KVM: arm64: Add newly allocated ID registers to register descriptions

2024-01-22 Thread Mark Brown

The 2023 architecture extensions have allocated some new ID registers; add
them to the KVM system register descriptions so that they are visible to
guests.

Signed-off-by: Mark Brown 
---
 arch/arm64/kvm/sys_regs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 30253bd19917..38503b1cd2eb 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2292,12 +2292,12 @@ static const struct sys_reg_desc sys_reg_descs[] = {
   ID_AA64PFR0_EL1_AdvSIMD |
   ID_AA64PFR0_EL1_FP), },
ID_SANITISED(ID_AA64PFR1_EL1),
-   ID_UNALLOCATED(4,2),
+   ID_SANITISED(ID_AA64PFR2_EL1),
ID_UNALLOCATED(4,3),
ID_WRITABLE(ID_AA64ZFR0_EL1, ~ID_AA64ZFR0_EL1_RES0),
ID_HIDDEN(ID_AA64SMFR0_EL1),
ID_UNALLOCATED(4,6),
-   ID_UNALLOCATED(4,7),
+   ID_SANITISED(ID_AA64FPFR0_EL1),
 
/* CRm=5 */
{ SYS_DESC(SYS_ID_AA64DFR0_EL1),
@@ -2324,7 +2324,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
ID_WRITABLE(ID_AA64ISAR2_EL1, ~(ID_AA64ISAR2_EL1_RES0 |
ID_AA64ISAR2_EL1_APA3 |
ID_AA64ISAR2_EL1_GPA3)),
-   ID_UNALLOCATED(6,3),
+   ID_WRITABLE(ID_AA64ISAR3_EL1, ~ID_AA64ISAR3_EL1_RES0),
ID_UNALLOCATED(6,4),
ID_UNALLOCATED(6,5),
ID_UNALLOCATED(6,6),

-- 
2.30.2

[PATCH v4 00/14] arm64: Support for 2023 DPISA extensions

2024-01-22 Thread Mark Brown

This series enables support for the data processing extensions in the
newly released 2023 architecture, this is mainly support for 8 bit
floating point formats.  Most of the extensions only introduce new
instructions and therefore only require hwcaps but there is a new EL0
visible control register FPMR used to control the 8 bit floating point
formats, we need to manage traps for this and context switch it.

Due to uncertainty with the plan for parsing ID registers to identify
which features to expose to the guest the KVM support is placed at the
end of the series, it will need to be revised once that issue is
resolved.  The sharing of floating point save code between the host and
guest kernels slightly complicates the introduction of KVM support, we
first introduce host support with some placeholders for KVM then replace
those with the actual KVM support.

I've not added test coverage for ptrace, I've got a test program which
exercises all the FP ptrace interfaces and their interactions together,
my plan is to cover it there rather than add another tiny test program
that duplicates the boilerplate for tracing a target and doesn't
actually run the traced program.

Signed-off-by: Mark Brown 
---
Changes in v4:
- Rebase onto v6.8-rc1.
- Move KVM support to the end of the series.
- Link to v3: 
https://lore.kernel.org/r/20231205-arm64-2023-dpisa-v3-0-dbcbcd867...@kernel.org

Changes in v3:
- Rebase onto v6.7-rc3.
- Hook up traps for FPMR in emulate-nested.c.
- Link to v2: 
https://lore.kernel.org/r/20231114-arm64-2023-dpisa-v2-0-47251894f...@kernel.org

Changes in v2:
- Rebase onto v6.7-rc1.
- Link to v1: 
https://lore.kernel.org/r/20231026-arm64-2023-dpisa-v1-0-8470dd989...@kernel.org

---
Mark Brown (14):
  arm64/cpufeature: Hook new identification registers up to cpufeature
  arm64/fpsimd: Enable host kernel access to FPMR
  arm64/fpsimd: Support FEAT_FPMR
  arm64/signal: Add FPMR signal handling
  arm64/ptrace: Expose FPMR via ptrace
  arm64/hwcap: Define hwcaps for 2023 DPISA features
  kselftest/arm64: Handle FPMR context in generic signal frame parser
  kselftest/arm64: Add basic FPMR test
  kselftest/arm64: Add 2023 DPISA hwcap test coverage
  KVM: arm64: Share all userspace hardened thread data with the hypervisor
  KVM: arm64: Add newly allocated ID registers to register descriptions
  KVM: arm64: Support FEAT_FPMR for guests
  KVM: arm64: selftests: Document feature registers added in 2023 extensions
  KVM: arm64: selftests: Teach get-reg-list about FPMR

 Documentation/arch/arm64/elf_hwcaps.rst|  49 +
 arch/arm64/include/asm/cpu.h   |   3 +
 arch/arm64/include/asm/cpufeature.h|   5 +
 arch/arm64/include/asm/fpsimd.h|   2 +
 arch/arm64/include/asm/hwcap.h |  15 ++
 arch/arm64/include/asm/kvm_arm.h   |   4 +-
 arch/arm64/include/asm/kvm_host.h  |   5 +-
 arch/arm64/include/asm/processor.h |   6 +-
 arch/arm64/include/uapi/asm/hwcap.h|  15 ++
 arch/arm64/include/uapi/asm/sigcontext.h   |   8 +
 arch/arm64/kernel/cpufeature.c |  72 +++
 arch/arm64/kernel/cpuinfo.c|  18 ++
 arch/arm64/kernel/fpsimd.c |  13 ++
 arch/arm64/kernel/ptrace.c |  42 
 arch/arm64/kernel/signal.c |  59 ++
 arch/arm64/kvm/emulate-nested.c|   8 +
 arch/arm64/kvm/fpsimd.c|  14 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h|   9 +-
 arch/arm64/kvm/hyp/nvhe/hyp-main.c |   4 +-
 arch/arm64/kvm/sys_regs.c  |  17 +-
 arch/arm64/tools/cpucaps   |   1 +
 include/uapi/linux/elf.h   |   1 +
 tools/testing/selftests/arm64/abi/hwcap.c  | 217 +
 tools/testing/selftests/arm64/signal/.gitignore|   1 +
 .../arm64/signal/testcases/fpmr_siginfo.c  |  82 
 .../selftests/arm64/signal/testcases/testcases.c   |   8 +
 .../selftests/arm64/signal/testcases/testcases.h   |   1 +
 tools/testing/selftests/kvm/aarch64/get-reg-list.c |  11 +-
 28 files changed, 670 insertions(+), 20 deletions(-)
---
base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
change-id: 20231003-arm64-2023-dpisa-2f3d25746474

Best regards,
-- 
Mark Brown

Re: [PATCH v6 2/3] livepatch: Move tests from lib/livepatch to selftests/livepatch

2024-01-22 Thread Shuah Khan


On 1/22/24 05:55, Marcos Paulo de Souza wrote:

On Fri, 2024-01-19 at 14:19 +0100, Alexander Gordeev wrote:

On Fri, Jan 19, 2024 at 02:11:01PM +0100, Alexander Gordeev wrote:

FWIW, for s390 part:

Alexander Gordeev 


Acked-by: Alexander Gordeev 


Thanks Alexander and Joe for testing and supporting the change.

Shuah, now that the issue found by Joe is fixed, do you think the
change is ready to be merged? The patches were reviewed by three
different people already, and I don't know what else can be missing at
this point.



I would have liked doc patch and lib.mk separate. However, I am pulling this
now to get testing done. In the future please keep them separate.

thanks,
-- Shuah

[PATCH v4 08/14] kselftest/arm64: Add basic FPMR test

2024-01-22 Thread Mark Brown

Verify that a FPMR frame is generated on systems that support FPMR and not
generated otherwise.
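
The pass condition can be summarised in a few lines. The following Python sketch only restates the logic of the C test in this patch; the function name `fpmr_test_passes` is hypothetical and nothing here touches real signal frames.

```python
def fpmr_test_passes(have_fpmr, frame_fpmr, orig_fpmr=None):
    """Model of the check: frame_fpmr is None when no FPMR record was found.

    The test passes when a record exists exactly on systems that report
    FPMR support and, if it exists, holds the value read before the signal.
    """
    in_sigframe = frame_fpmr is not None
    if in_sigframe != have_fpmr:
        return False
    if have_fpmr and frame_fpmr != orig_fpmr:
        return False
    return True
```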

Signed-off-by: Mark Brown 
---
 tools/testing/selftests/arm64/signal/.gitignore|  1 +
 .../arm64/signal/testcases/fpmr_siginfo.c  | 82 ++
 2 files changed, 83 insertions(+)

diff --git a/tools/testing/selftests/arm64/signal/.gitignore 
b/tools/testing/selftests/arm64/signal/.gitignore
index 839e3a252629..1ce5b5eac386 100644
--- a/tools/testing/selftests/arm64/signal/.gitignore
+++ b/tools/testing/selftests/arm64/signal/.gitignore
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 mangle_*
 fake_sigreturn_*
+fpmr_*
 sme_*
 ssve_*
 sve_*
diff --git a/tools/testing/selftests/arm64/signal/testcases/fpmr_siginfo.c 
b/tools/testing/selftests/arm64/signal/testcases/fpmr_siginfo.c
new file mode 100644
index ..e9d24685e741
--- /dev/null
+++ b/tools/testing/selftests/arm64/signal/testcases/fpmr_siginfo.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023 ARM Limited
+ *
+ * Verify that the FPMR register context in signal frames is set up as
+ * expected.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "test_signals_utils.h"
+#include "testcases.h"
+
+static union {
+   ucontext_t uc;
+   char buf[1024 * 128];
+} context;
+
+#define SYS_FPMR "S3_3_C4_C4_2"
+
+static uint64_t get_fpmr(void)
+{
+   uint64_t val;
+
+   asm volatile (
+   "mrs %0, " SYS_FPMR "\n"
+   : "=r"(val)
+   :
+   : "cc");
+
+   return val;
+}
+
+int fpmr_present(struct tdescr *td, siginfo_t *si, ucontext_t *uc)
+{
+   struct _aarch64_ctx *head = GET_BUF_RESV_HEAD(context);
+   struct fpmr_context *fpmr_ctx;
+   size_t offset;
+   bool in_sigframe;
+   bool have_fpmr;
+   __u64 orig_fpmr;
+
+   have_fpmr = getauxval(AT_HWCAP2) & HWCAP2_FPMR;
+   if (have_fpmr)
+   orig_fpmr = get_fpmr();
+
+   if (!get_current_context(td, &context.uc, sizeof(context)))
+   return 1;
+
+   fpmr_ctx = (struct fpmr_context *)
+   get_header(head, FPMR_MAGIC, td->live_sz, &offset);
+
+   in_sigframe = fpmr_ctx != NULL;
+
+   fprintf(stderr, "FPMR sigframe %s on system %s FPMR\n",
+   in_sigframe ? "present" : "absent",
+   have_fpmr ? "with" : "without");
+
+   td->pass = (in_sigframe == have_fpmr);
+
+   if (have_fpmr && fpmr_ctx) {
+   if (fpmr_ctx->fpmr != orig_fpmr) {
+   fprintf(stderr, "FPMR in frame is %llx, was %llx\n",
+   fpmr_ctx->fpmr, orig_fpmr);
+   td->pass = false;
+   }
+   }
+
+   return 0;
+}
+
+struct tdescr tde = {
+   .name = "FPMR",
+   .descr = "Validate that FPMR is present as expected",
+   .timeout = 3,
+   .run = fpmr_present,
+};

-- 
2.30.2

[PATCH v4 14/14] KVM: arm64: selftests: Teach get-reg-list about FPMR

2024-01-22 Thread Mark Brown

FEAT_FPMR defines a new register FPMR which is available at all ELs and is
discovered via ID_AA64PFR2_EL1.FPMR, add this to the set of registers that
get-reg-list knows to check for with the required identification register
dependency.
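
As an illustration of the dependency, the table entry added below pairs FPMR with ID_AA64PFR2_EL1 and the numbers 32 and 1. Assuming those are the bit offset of the ID field and the minimum value that indicates support, and assuming the usual 4-bit unsigned ID field width, the check amounts to the following Python sketch (the function name is made up):

```python
def feature_present(id_reg_value, feat_shift, feat_min, width=4):
    """Return True if the ID register field at feat_shift is >= feat_min.

    Assumptions: the field is unsigned and `width` bits wide; feat_shift
    and feat_min correspond to the last two values of the table entry.
    """
    field = (id_reg_value >> feat_shift) & ((1 << width) - 1)
    return field >= feat_min
```

With the values from this patch, `feature_present(id_aa64pfr2, 32, 1)` would decide whether FPMR is expected in the register list.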

Signed-off-by: Mark Brown 
---
 tools/testing/selftests/kvm/aarch64/get-reg-list.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/tools/testing/selftests/kvm/aarch64/get-reg-list.c 
b/tools/testing/selftests/kvm/aarch64/get-reg-list.c
index 71ea6ecec7ce..1e43511d1440 100644
--- a/tools/testing/selftests/kvm/aarch64/get-reg-list.c
+++ b/tools/testing/selftests/kvm/aarch64/get-reg-list.c
@@ -40,6 +40,12 @@ static struct feature_id_reg feat_id_regs[] = {
ARM64_SYS_REG(3, 0, 0, 7, 3),   /* ID_AA64MMFR3_EL1 */
4,
1
+   },
+   {
+   ARM64_SYS_REG(3, 3, 4, 4, 2),   /* FPMR */
+   ARM64_SYS_REG(3, 0, 0, 4, 2),   /* ID_AA64PFR2_EL1 */
+   32,
+   1
}
 };
 
@@ -481,6 +487,7 @@ static __u64 base_regs[] = {
ARM64_SYS_REG(3, 3, 14, 2, 1),  /* CNTP_CTL_EL0 */
ARM64_SYS_REG(3, 3, 14, 2, 2),  /* CNTP_CVAL_EL0 */
ARM64_SYS_REG(3, 4, 3, 0, 0),   /* DACR32_EL2 */
+   ARM64_SYS_REG(3, 3, 4, 4, 2),   /* FPMR */
ARM64_SYS_REG(3, 4, 5, 0, 1),   /* IFSR32_EL2 */
ARM64_SYS_REG(3, 4, 5, 3, 0),   /* FPEXC32_EL2 */
 };

-- 
2.30.2

Re: [PATCH v6 1/3] kselftests: lib.mk: Add TEST_GEN_MODS_DIR variable

2024-01-22 Thread Marcos Paulo de Souza

On Mon, 2024-01-22 at 10:15 -0700, Shuah Khan wrote:
> On 1/12/24 10:43, Marcos Paulo de Souza wrote:
> > Add TEST_GEN_MODS_DIR variable for kselftests. It can point to
> > a directory containing kernel modules that will be used by
> > selftest scripts.
> > 
> > The modules are built as external modules for the running kernel.
> > As a result they are always binary compatible and the same tests
> > can be used for older or newer kernels.
> > 
> > The build requires "kernel-devel" package to be installed.
> > For example, in the upstream sources, the rpm devel package
> > is produced by "make rpm-pkg"
> > 
> > The modules can be built independently by
> > 
> >    make -C tools/testing/selftests/livepatch/
> > 
> > or they will be automatically built before running the tests via
> > 
> >    make -C tools/testing/selftests/livepatch/ run_tests
> > 
> > Note that they are _not_ built when running the standalone
> > tests by calling, for example, ./test-state.sh.
> > 
> > Along with TEST_GEN_MODS_DIR, it was necessary to create a new
> > install
> > rule. INSTALL_MODS_RULE is needed because INSTALL_SINGLE_RULE would
> > copy the entire TEST_GEN_MODS_DIR directory to the destination,
> > even
> > the files created by Kbuild to compile the modules. The new install
> > rule copies only the .ko files, as we would expect the gen_tar to
> > work.
> > 
> > Reviewed-by: Joe Lawrence 
> > Reviewed-by: Petr Mladek 
> > Signed-off-by: Marcos Paulo de Souza 
> > ---
> >   Documentation/dev-tools/kselftest.rst |  4 
> >   tools/testing/selftests/lib.mk    | 26 +-
> > 
> 
> 
> Hi Marcos,
> 
> I would like the doc patch and lib.mk patch separate. If lib.mk needs
> changes
> we don't have to touch the doc patch.

Hi Shuah,
on patch 2/3 you also said that you would like to have the
documentation changes split in the future, and that you picked the
changes into a testing branch. Does it also apply to this patch?

Do I need to resend the three patches and separate the documentation
part into a new one, or can I apply this rationale to future changes to
lib.mk? Sorry, I'm confused.

Thanks in advance,
  Marcos

> 
> thanks,
> -- Shuah

Re: [PATCH v6 1/3] kselftests: lib.mk: Add TEST_GEN_MODS_DIR variable

2024-01-22 Thread Shuah Khan


On 1/22/24 10:37, Marcos Paulo de Souza wrote:

On Mon, 2024-01-22 at 10:15 -0700, Shuah Khan wrote:

On 1/12/24 10:43, Marcos Paulo de Souza wrote:

Add TEST_GEN_MODS_DIR variable for kselftests. It can point to
a directory containing kernel modules that will be used by
selftest scripts.

The modules are built as external modules for the running kernel.
As a result they are always binary compatible and the same tests
can be used for older or newer kernels.

The build requires "kernel-devel" package to be installed.
For example, in the upstream sources, the rpm devel package
is produced by "make rpm-pkg"

The modules can be built independently by

    make -C tools/testing/selftests/livepatch/

or they will be automatically built before running the tests via

    make -C tools/testing/selftests/livepatch/ run_tests

Note that they are _not_ built when running the standalone
tests by calling, for example, ./test-state.sh.

Along with TEST_GEN_MODS_DIR, it was necessary to create a new
install
rule. INSTALL_MODS_RULE is needed because INSTALL_SINGLE_RULE would
copy the entire TEST_GEN_MODS_DIR directory to the destination,
even
the files created by Kbuild to compile the modules. The new install
rule copies only the .ko files, as we would expect the gen_tar to
work.

Reviewed-by: Joe Lawrence 
Reviewed-by: Petr Mladek 
Signed-off-by: Marcos Paulo de Souza 
---
   Documentation/dev-tools/kselftest.rst |  4 
   tools/testing/selftests/lib.mk    | 26 +-




Hi Marcos,

I would like the doc patch and lib.mk patch separate. If lib.mk needs
changes
we don't have to touch the doc patch.


Hi Shuah,
on patch 2/3 you also said that you would like to have the
documentation changes split in the future, and that you picked the
changes into a testing branch. Does it also apply to this patch?



No need to do anything now. I just applied the series to linux-kselftest next

thanks,
-- Shuah

resctrl selftests ready for inclusion

2024-01-22 Thread Reinette Chatre

Hi Shuah,

Could you please consider Ilpo's resctrl selftest enhancements [1]
for inclusion into kselftest's "next" branch in preparation for the
next merge window?

Thank you very much.

Reinette

[1] 
https://lore.kernel.org/lkml/20231215150515.36983-1-ilpo.jarvi...@linux.intel.com/

[PATCH v15] exec: Fix dead-lock in de_thread with ptrace_attach

2024-01-22 Thread Bernd Edlinger

This introduces signal->exec_bprm, which is used to
fix the case when at least one of the sibling threads
is traced, and therefore the trace process may dead-lock
in ptrace_attach, but de_thread will need to wait for the
tracer to continue execution.

The problem happens when a tracer tries to ptrace_attach
to a multi-threaded process, that does an execve in one of
the threads at the same time, without doing that in a forked
sub-process.  That means: There is a race condition, when one
or more of the threads are already ptraced, but the thread
that invoked the execve is not yet traced.  Now in this
case the execve locks the cred_guard_mutex and waits for
de_thread to complete.  But that waits for the traced
sibling threads to exit, and those have to wait for the
tracer to receive the exit signal, but the tracer cannot
call wait right now, because it is waiting for the ptrace
call to complete, and this never happens.
The traced process and the tracer are now in a deadlock
situation, and can only be killed by a fatal signal.
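
The circular wait described above can be modelled as a small wait-for graph. This Python sketch is only an illustration of the reasoning, not kernel code; the task labels and the helper `find_wait_cycle` are invented for the example.

```python
def find_wait_cycle(waits_for, start):
    """Follow 'X waits for Y' edges from start; return the cycle or None."""
    seen = []
    node = start
    while node in waits_for:
        if node in seen:
            return seen[seen.index(node):]
        seen.append(node)
        node = waits_for[node]
    return None


# execve holds cred_guard_mutex and waits in de_thread for the sibling;
# the traced sibling waits to be reaped by the tracer; the tracer is
# blocked in ptrace_attach on cred_guard_mutex.
deadlock = {
    "execve thread": "traced sibling",
    "traced sibling": "tracer",
    "tracer": "execve thread",
}
```

Removing the last edge (the tracer no longer blocking on the mutex) leaves a chain with no cycle, which matches the idea of temporarily releasing cred_guard_mutex.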

The solution is to detect this situation and allow
ptrace_attach to continue by temporarily releasing the
cred_guard_mutex, while de_thread() is still waiting for
traced zombies to be eventually released by the tracer.
In the case of the thread group leader we only have to wait
for the thread to become a zombie, which may also need
co-operation from the tracer due to PTRACE_O_TRACEEXIT.

When a tracer wants to ptrace_attach a task that already
is in execve, we simply retry the ptrace_may_access
check while temporarily installing the new credentials
and dumpability which are about to be used after execve
completes.  If the ptrace_attach happens on a thread that
is a sibling-thread of the thread doing execve, it is
sufficient to check against the old credentials, as this
thread will be waited for, before the new credentials are
installed.

Other threads die quickly since the cred_guard_mutex is
released, but a deadly signal is already pending.  In case
the mutex_lock_killable misses the signal, the non-zero
current->signal->exec_bprm makes sure they release the
mutex immediately and return with -ERESTARTNOINTR.

This means there is no API change, unlike the previous
version of this patch which was discussed here:

https://lore.kernel.org/lkml/b6537ae6-31b1-5c50-f32b-8b8332ace...@hotmail.de/

See tools/testing/selftests/ptrace/vmaccess.c
for a test case that gets fixed by this change.

Note that since the test case was originally designed to
test the ptrace_attach returning an error in this situation,
the test expectation needed to be adjusted, to allow the
API to succeed at the first attempt.

Signed-off-by: Bernd Edlinger 
---
 fs/exec.c |  69 ---
 fs/proc/base.c|   6 +
 include/linux/cred.h  |   1 +
 include/linux/sched/signal.h  |  18 +++
 kernel/cred.c |  30 -
 kernel/ptrace.c   |  31 +
 kernel/seccomp.c  |  12 +-
 tools/testing/selftests/ptrace/vmaccess.c | 135 --
 8 files changed, 265 insertions(+), 37 deletions(-)

v10: Changes to previous version, make the PTRACE_ATTACH
return -EAGAIN, instead of execve return -ERESTARTSYS.
Added some lessons learned to the description.

v11: Check old and new credentials in PTRACE_ATTACH again without
changing the API.

Note: I got actually one response from an automatic checker to the v11 patch,

https://lore.kernel.org/lkml/202107121344.wu68hepf-...@intel.com/

which is complaining about:

>> >> kernel/ptrace.c:425:26: sparse: sparse: incorrect type in assignment 
>> >> (different address spaces) @@ expected struct cred const *old_cred @@ 
>> >> got struct cred const [noderef] __rcu *real_cred @@

   417  struct linux_binprm *bprm = task->signal->exec_bprm;
   418  const struct cred *old_cred;
   419  struct mm_struct *old_mm;
   420  
   421  retval = down_write_killable(&task->signal->exec_update_lock);
   422  if (retval)
   423  goto unlock_creds;
   424  task_lock(task);
 > 425  old_cred = task->real_cred;

v12: Essentially identical to v11.

- Fixed a minor merge conflict in linux v5.17, and fixed the
above mentioned nit by adding __rcu to the declaration.

- re-tested the patch with all linux versions from v5.11 to v6.6

v10 was an alternative approach which did imply an API change.
But I would prefer to avoid such an API change.

The difficult part is getting the right dumpability flags assigned
before de_thread starts, hope you like this version.
If not, the v10 is of course also acceptable.

v13: Fixed duplicated Return section in function header of
is_dumpability_changed which was reported by the kernel test robot

v14: rebased to v6.7, refreshed and retested.
And added a more detai

[PATCH v4 0/3] Add test to verify probe of devices from discoverable buses

2024-01-22 Thread Nícolas F . R . A . Prado

This is part of an effort to improve detection of regressions impacting
device probe on all platforms. The recently merged DT kselftest [3]
detects probe issues for all devices described statically in the DT.
That leaves out devices discovered at run-time from discoverable buses.

This is where this test comes in. All of the devices that are connected
through discoverable buses (ie USB and PCI), and which are internal and
therefore always present, can be described based on their position in
the system topology in a per-platform YAML file so they can be checked
for. The test will check that the device has been instantiated and bound
to a driver.
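
To make "instantiated and bound to a driver" concrete: in sysfs a device shows up as a directory, and a bound device has a `driver` symlink inside it. The following Python sketch shows that check in isolation; the helper name `check_device` is hypothetical and this is not the code from the test script.

```python
import os


def check_device(sysfs_dir):
    """Return (present, bound) for a device directory in sysfs.

    present: the device directory exists.
    bound:   the directory contains a 'driver' symlink.
    """
    present = os.path.isdir(sysfs_dir)
    bound = present and os.path.islink(os.path.join(sysfs_dir, "driver"))
    return present, bound
```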

Patch 1 introduces the test. Patch 2 and 3 add the device definitions
for the google,spherion machine (Acer Chromebook 514) and XPS 13 as
examples.

This is the output from the test running on Spherion:

TAP version 13
Using board file: boards/google,spherion.yaml
1..8
ok 1 /usb2-controller@1120/1.4.1/camera.device
ok 2 /usb2-controller@1120/1.4.1/camera.0.driver
ok 3 /usb2-controller@1120/1.4.1/camera.1.driver
ok 4 /usb2-controller@1120/1.4.2/bluetooth.device
ok 5 /usb2-controller@1120/1.4.2/bluetooth.0.driver
ok 6 /usb2-controller@1120/1.4.2/bluetooth.1.driver
ok 7 /pci-controller@1123/0.0/0.0/wifi.device
ok 8 /pci-controller@1123/0.0/0.0/wifi.driver
Totals: pass:8 fail:0 xfail:0 xpass:0 skip:0 error:0

[3] 
https://lore.kernel.org/all/20230828211424.2964562-1-nfrapr...@collabora.com/

Changes in v4:
- Dropped RFC tag
- Fixed 'busses' misspelling
- Link to v3: 
https://lore.kernel.org/all/20231227123643.52348-1-nfrapr...@collabora.com

Changes in v3:
- Reverted approach of encoding stable device reference in test file
from device match fields (from modalias) back to HW topology (from v1)
- Changed board file description to YAML
- Rewrote test script in python to handle YAML and support x86 platforms
- Link to v2: 
https://lore.kernel.org/all/20231127233558.868365-1-nfrapr...@collabora.com

Changes in v2:
- Changed approach of encoding stable device reference in test file from
HW topology to device match fields (the ones from modalias)
- Better documented test format
- Link to v1: 
https://lore.kernel.org/all/20231024211818.365844-1-nfrapr...@collabora.com

---
Nícolas F. R. A. Prado (3):
  kselftest: Add test to verify probe of devices from discoverable buses
  kselftest: devices: Add sample board file for google,spherion
  kselftest: devices: Add sample board file for XPS 13 9300

 tools/testing/selftests/Makefile   |   1 +
 tools/testing/selftests/devices/Makefile   |   4 +
 .../devices/boards/Dell Inc.,XPS 13 9300.yaml  |  40 +++
 .../selftests/devices/boards/google,spherion.yaml  |  50 
 tools/testing/selftests/devices/ksft.py|  90 ++
 .../selftests/devices/test_discoverable_devices.py | 318 +
 6 files changed, 503 insertions(+)
---
base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
change-id: 20240122-discoverable-devs-ksft-9d501e312688

Best regards,
-- 
Nícolas F. R. A. Prado

[PATCH v4 1/3] kselftest: Add test to verify probe of devices from discoverable buses

2024-01-22 Thread Nícolas F . R . A . Prado

Add a new test to verify that a list of expected devices from
discoverable buses (ie USB, PCI) have been successfully instantiated and
probed by a driver.

The per-platform list of expected devices is selected from the ones
under the boards/ directory based on the DT compatible or the DMI IDs.
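
A rough sketch of how such a lookup could derive candidate file names, in Python. It assumes the "Vendor,Product" naming described in the XPS 13 board file and the NUL-separated list of compatible strings that the Devicetree exposes; both function names are invented for illustration and are not taken from the test script.

```python
def dmi_board_filename(sys_vendor, product_name):
    """Build 'Vendor,Product.yaml' from the two DMI ID strings."""
    return f"{sys_vendor.strip()},{product_name.strip()}.yaml"


def dt_board_candidates(raw_compatible):
    """Turn a NUL-separated DT compatible blob into candidate file names."""
    return [c.decode() + ".yaml" for c in raw_compatible.split(b"\0") if c]
```

For example, the strings "Dell Inc." and "XPS 13 9300" give `Dell Inc.,XPS 13 9300.yaml`, and a compatible list containing `google,spherion` gives `google,spherion.yaml` as one of the candidates to look for under boards/.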

Signed-off-by: Nícolas F. R. A. Prado 
---
 tools/testing/selftests/Makefile   |   1 +
 tools/testing/selftests/devices/Makefile   |   4 +
 tools/testing/selftests/devices/ksft.py|  90 ++
 .../selftests/devices/test_discoverable_devices.py | 318 +
 4 files changed, 413 insertions(+)

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 15b6a111c3be..a7858126c7c5 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -13,6 +13,7 @@ TARGETS += core
 TARGETS += cpufreq
 TARGETS += cpu-hotplug
 TARGETS += damon
+TARGETS += devices
 TARGETS += dmabuf-heaps
 TARGETS += drivers/dma-buf
 TARGETS += drivers/s390x/uvdevice
diff --git a/tools/testing/selftests/devices/Makefile 
b/tools/testing/selftests/devices/Makefile
new file mode 100644
index ..ca29249b30c3
--- /dev/null
+++ b/tools/testing/selftests/devices/Makefile
@@ -0,0 +1,4 @@
+TEST_PROGS := test_discoverable_devices.py
+TEST_FILES := boards ksft.py
+
+include ../lib.mk
diff --git a/tools/testing/selftests/devices/ksft.py 
b/tools/testing/selftests/devices/ksft.py
new file mode 100644
index ..cd89fb2bc10e
--- /dev/null
+++ b/tools/testing/selftests/devices/ksft.py
@@ -0,0 +1,90 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2023 Collabora Ltd
+#
+# Kselftest helpers for outputting in KTAP format. Based on kselftest.h.
+#
+
+import sys
+
+ksft_cnt = {"pass": 0, "fail": 0, "skip": 0}
+ksft_num_tests = 0
+ksft_test_number = 1
+
+KSFT_PASS = 0
+KSFT_FAIL = 1
+KSFT_SKIP = 4
+
+
+def print_header():
+    print("TAP version 13")
+
+
+def set_plan(num_tests):
+    global ksft_num_tests
+    ksft_num_tests = num_tests
+    print("1..{}".format(num_tests))
+
+
+def print_cnts():
+    print(
+        f"# Totals: pass:{ksft_cnt['pass']} fail:{ksft_cnt['fail']} xfail:0 xpass:0 skip:{ksft_cnt['skip']} error:0"
+    )
+
+
+def print_msg(msg):
+    print(f"# {msg}")
+
+
+def _test_print(result, description, directive=None):
+    if directive:
+        directive_str = f"# {directive}"
+    else:
+        directive_str = ""
+
+    global ksft_test_number
+    print(f"{result} {ksft_test_number} {description} {directive_str}")
+    ksft_test_number += 1
+
+
+def test_result_pass(description):
+    _test_print("ok", description)
+    ksft_cnt["pass"] += 1
+
+
+def test_result_fail(description):
+    _test_print("not ok", description)
+    ksft_cnt["fail"] += 1
+
+
+def test_result_skip(description):
+    _test_print("ok", description, "SKIP")
+    ksft_cnt["skip"] += 1
+
+
+def test_result(condition, description=""):
+    if condition:
+        test_result_pass(description)
+    else:
+        test_result_fail(description)
+
+
+def finished():
+    if ksft_cnt["pass"] == ksft_num_tests:
+        exit_code = KSFT_PASS
+    else:
+        exit_code = KSFT_FAIL
+
+    print_cnts()
+
+    sys.exit(exit_code)
+
+
+def exit_fail():
+    print_cnts()
+    sys.exit(KSFT_FAIL)
+
+
+def exit_pass():
+    print_cnts()
+    sys.exit(KSFT_PASS)
diff --git a/tools/testing/selftests/devices/test_discoverable_devices.py 
b/tools/testing/selftests/devices/test_discoverable_devices.py
new file mode 100755
index ..fbae8deb593d
--- /dev/null
+++ b/tools/testing/selftests/devices/test_discoverable_devices.py
@@ -0,0 +1,318 @@
+#!/usr/bin/python3
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2023 Collabora Ltd
+#
+# This script tests for presence and driver binding of devices from discoverable
+# buses (ie USB, PCI).
+#
+# The per-platform YAML file defining the devices to be tested is stored inside
+# the boards/ directory and chosen based on DT compatible or DMI IDs (sys_vendor
+# and product_name).
+#
+# See boards/google,spherion.yaml and boards/'Dell Inc.,XPS 13 9300.yaml' for
+# the description and examples of the file structure and vocabulary.
+#
+
+import glob
+import ksft
+import os
+import re
+import sys
+import yaml
+
+pci_controllers = []
+usb_controllers = []
+
+sysfs_usb_devices = "/sys/bus/usb/devices/"
+
+
+def find_pci_controller_dirs():
+    sysfs_devices = "/sys/devices"
+    pci_controller_sysfs_dir = "pci[0-9a-f]{4}:[0-9a-f]{2}"
+
+    dir_regex = re.compile(pci_controller_sysfs_dir)
+    for path, dirs, _ in os.walk(sysfs_devices):
+        for d in dirs:
+            if dir_regex.match(d):
+                pci_controllers.append(os.path.join(path, d))
+
+
+def find_usb_controller_dirs():
+    usb_controller_sysfs_dir = "usb[\d]+"
+
+    dir_regex = re.compile(usb_controller_sysfs_dir)
+    for d in os.scandir(sysfs_usb_devices):
+        if dir_regex.match(d.name):
+            usb_controllers.append(

[PATCH v4 2/3] kselftest: devices: Add sample board file for google,spherion

2024-01-22 Thread Nícolas F . R . A . Prado

Add a sample board file describing the file's format and with the list
of devices expected to be probed on the google,spherion machine as an
example.

Test output:

TAP version 13
Using board file: boards/google,spherion.yaml
1..8
ok 1 /usb2-controller@1120/1.4.1/camera.device
ok 2 /usb2-controller@1120/1.4.1/camera.0.driver
ok 3 /usb2-controller@1120/1.4.1/camera.1.driver
ok 4 /usb2-controller@1120/1.4.2/bluetooth.device
ok 5 /usb2-controller@1120/1.4.2/bluetooth.0.driver
ok 6 /usb2-controller@1120/1.4.2/bluetooth.1.driver
ok 7 /pci-controller@1123/0.0/0.0/wifi.device
ok 8 /pci-controller@1123/0.0/0.0/wifi.driver
Totals: pass:8 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Nícolas F. R. A. Prado 
---
 .../selftests/devices/boards/google,spherion.yaml  | 50 ++
 1 file changed, 50 insertions(+)

diff --git a/tools/testing/selftests/devices/boards/google,spherion.yaml 
b/tools/testing/selftests/devices/boards/google,spherion.yaml
new file mode 100644
index ..17157ecd8c14
--- /dev/null
+++ b/tools/testing/selftests/devices/boards/google,spherion.yaml
@@ -0,0 +1,50 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# This is the device definition for the Google Spherion Chromebook.
+# The filename "google,spherion" comes from the Devicetree compatible, so this
+# file will be automatically used when the test is run on that machine.
+#
+# The top-level is a list of controllers, either for USB or PCI(e).
+# Every controller needs to have a 'type' key set to either 'usb-controller' or
+# 'pci-controller'.
+# Every controller needs to be uniquely identified on the platform. To achieve
+# this, several optional keys can be used:
+# - dt-mmio: identify the MMIO address of the controller as defined in the
+#   Devicetree.
+# - usb-version: for USB controllers to differentiate between USB3 and USB2
+#   buses sharing the same controller.
+# - acpi-uid: _UID property of the controller as supplied by the ACPI. Useful to
+#   distinguish between multiple PCI host controllers.
+#
+# The 'devices' key defines a list of devices that are accessible under that
+# controller. A device might be a leaf device or another controller (see
+# 'Dell Inc.,XPS 13 9300.yaml').
+#
+# The 'path' key is needed for every child device (that is, not top-level) to
+# define how to reach this device from the parent controller. For USB devices it
+# follows the format \d(.\d)* and denotes the port in the hub at each level in
+# the USB topology. For PCI devices it follows the format \d.\d(/\d.\d)*
+# denoting the device (identified by device-function pair) at each level in the
+# PCI topology.
+#
+# The 'name' key is used in the leaf devices to name the device for clarity in
+# the test output.
+#
+# For USB leaf devices, the 'interfaces' key should contain a list of the
+# interfaces in that device that should be bound to a driver.
+#
+- type: usb-controller
+  dt-mmio: 1120
+  usb-version: 2
+  devices:
+    - path: 1.4.1
+      interfaces: [0, 1]
+      name: camera
+    - path: 1.4.2
+      interfaces: [0, 1]
+      name: bluetooth
+- type: pci-controller
+  dt-mmio: 1123
+  devices:
+    - path: 0.0/0.0
+      name: wifi

-- 
2.43.0

[PATCH v4 3/3] kselftest: devices: Add sample board file for XPS 13 9300

2024-01-22 Thread Nícolas F . R . A . Prado

Add a sample board file describing the file's format and with the list
of devices expected to be probed on the XPS 13 9300 machine as an
example x86 platform.

Test output:

TAP version 13
Using board file: boards/Dell Inc.,XPS 13 9300.yaml
1..22
ok 1 /pci-controller/14.0/usb2-controller/9/camera.device
ok 2 /pci-controller/14.0/usb2-controller/9/camera.0.driver
ok 3 /pci-controller/14.0/usb2-controller/9/camera.1.driver
ok 4 /pci-controller/14.0/usb2-controller/9/camera.2.driver
ok 5 /pci-controller/14.0/usb2-controller/9/camera.3.driver
ok 6 /pci-controller/14.0/usb2-controller/10/bluetooth.device
ok 7 /pci-controller/14.0/usb2-controller/10/bluetooth.0.driver
ok 8 /pci-controller/14.0/usb2-controller/10/bluetooth.1.driver
ok 9 /pci-controller/2.0/gpu.device
ok 10 /pci-controller/2.0/gpu.driver
ok 11 /pci-controller/4.0/thermal.device
ok 12 /pci-controller/4.0/thermal.driver
ok 13 /pci-controller/12.0/sensors.device
ok 14 /pci-controller/12.0/sensors.driver
ok 15 /pci-controller/14.3/wifi.device
ok 16 /pci-controller/14.3/wifi.driver
ok 17 /pci-controller/1d.0/0.0/ssd.device
ok 18 /pci-controller/1d.0/0.0/ssd.driver
ok 19 /pci-controller/1d.7/0.0/sdcard-reader.device
ok 20 /pci-controller/1d.7/0.0/sdcard-reader.driver
ok 21 /pci-controller/1f.3/audio.device
ok 22 /pci-controller/1f.3/audio.driver
Totals: pass:22 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Nícolas F. R. A. Prado 
---
 .../devices/boards/Dell Inc.,XPS 13 9300.yaml  | 40 ++
 1 file changed, 40 insertions(+)

diff --git a/tools/testing/selftests/devices/boards/Dell Inc.,XPS 13 9300.yaml 
b/tools/testing/selftests/devices/boards/Dell Inc.,XPS 13 9300.yaml
new file mode 100644
index ..ff932eb19f0b
--- /dev/null
+++ b/tools/testing/selftests/devices/boards/Dell Inc.,XPS 13 9300.yaml 
@@ -0,0 +1,40 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# This is the device definition for the XPS 13 9300.
+# The filename "Dell Inc.,XPS 13 9300" was chosen following the format
+# "Vendor,Product", where Vendor comes from
+# /sys/devices/virtual/dmi/id/sys_vendor, and Product comes from
+# /sys/devices/virtual/dmi/id/product_name.
+#
+# See google,spherion.yaml for more information.
+#
+- type: pci-controller
+  # This machine has a single PCI host controller so it's valid to not have any
+  # key to identify the controller. If it had more than one controller, the UID
+  # of the controller from ACPI could be used to distinguish as follows:
+  #acpi-uid: 0
+  devices:
+- path: 14.0
+  type: usb-controller
+  usb-version: 2
+  devices:
+- path: 9
+  name: camera
+  interfaces: [0, 1, 2, 3]
+- path: 10
+  name: bluetooth
+  interfaces: [0, 1]
+- path: 2.0
+  name: gpu
+- path: 4.0
+  name: thermal
+- path: 12.0
+  name: sensors
+- path: 14.3
+  name: wifi
+- path: 1d.0/0.0
+  name: ssd
+- path: 1d.7/0.0
+  name: sdcard-reader
+- path: 1f.3
+  name: audio

-- 
2.43.0

Re: [PATCH v2 2/2] kselftest/seccomp: Report each expectation we assert as a KTAP test

2024-01-22 Thread Shuah Khan


On 1/22/24 09:04, Mark Brown wrote:

The seccomp benchmark test makes a number of checks on the performance it
measures and logs them to the output but does so in a custom format which
none of the automated test runners understand meaning that the chances that
anyone is paying attention are slim. Let's additionally log each result in
KTAP format so that automated systems parsing the test output will see each
comparison as a test case. The original logs are left in place since they
provide the actual numbers for analysis.

As part of this, rework the flow of the main program so that when we skip
tests we still log every test we skip; this is because the standard KTAP
headers and footers include counts of the number of expected and run tests.

Tested-by: Anders Roxell 


Hi Mark,

This patch is missing Signed-off-by. Please fix and resend. I will pull both
patches. 1/2 is okay.

thanks,
-- Shuah

Re: [PATCH v10 0/4] RISC-V: mm: Make SV48 the default address space

2024-01-22 Thread Charlie Jenkins

On Sat, Jan 20, 2024 at 03:09:51PM +0800, Yangyu Chen wrote:
> 
> 
> On 1/20/24 14:49, Charlie Jenkins wrote:
> > On Sat, Jan 20, 2024 at 02:13:14PM +0800, Yangyu Chen wrote:
> > > Thanks for your reply.
> > > 
> > > On 1/20/24 09:34, Charlie Jenkins wrote:
> > > > On Sun, Jan 14, 2024 at 01:26:57AM +0800, Yangyu Chen wrote:
> > > > > Hi, Charlie
> > > > > 
> > > > > Although this patchset has been merged, I still have some questions about
> > > > > it, because it breaks regular mmap if address >= 38 bits on sv48 / sv57
> > > > > capable systems like qemu. For example, if a userspace program wants to
> > > > > mmap an anonymous page to addr=(1<<45) on an sv48 capable system, it will
> > > > > fail and the kernel will map it to another sv39 address since it does
> > > > 
> > > > Thank you for raising this concern. To make sure I am understanding
> > > > correctly, you are passing a hint address of (1<<45) and expecting mmap
> > > > to return 1<<45 and if it returns a different address you are describing
> > > > mmap as failing? If you want an address that is in the sv48 space you
> > > > can pass in an address that is greater than 1<<47.
> > > > 
> > > > > not meet the requirement to use sv48 as you wrote:
> > > > > 
> > > > > > else if ((((_addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48)) \
> > > > > >         mmap_end = VA_USER_SV48;\
> > > > > > else\
> > > > > >         mmap_end = VA_USER_SV39;\
> > > > > 
> > > > > Then, how can a userspace program create a mmap with a hint if the
> > > > > address >= (1<<38) after your patch without MAP_FIXED? The only way to do
> > > > > this is to pass a hint >= (1<<47) on mmap syscall then kernel will return a
> > > > > random address in sv48 address space but the hint address gets lost. I think
> > > > > this
> > > > 
> > > > In order to force mmap to return the address provided you must use
> > > > MAP_FIXED. Otherwise, the address is a "hint" and has no guarantees. The
> > > > hint address on riscv is used to mean "don't give me an address that
> > > > uses more bits than this". This behavior is not unique to riscv, arm64
> > > > and powerpc use a similar scheme. In arch/arm64/include/asm/processor.h
> > > > there is the following code:
> > > > 
> > > > #define arch_get_mmap_base(addr, base) ((addr > DEFAULT_MAP_WINDOW) ? \
> > > >                                         base + TASK_SIZE - DEFAULT_MAP_WINDOW :\
> > > >                                         base)
> > > > 
> > > > arm64/powerpc are only concerned with a single boundary so the code is simpler.
> > > > 
> > > 
> > > As you say, this code in arm64/powerpc will not meet the issue I address.
> > > For example, If the addr here is (1<<50) on arm64, the arch_get_mmap_base
> > > will return base+TASK_SIZE-DEFAULT_MAP_WINDOW which is (1<
> > > And this behavior on arm64/powerpc/x86 does not break anything since we will
> > > use a larger address space if the hint address is specified on the address >
> > > DEFAULT_MAP_WINDOW. The corresponding behavior on RISC-V should be if the
> > > hint address > BIT(47) then use Sv57 address space and use Sv48 when the
> > > hint address > BIT(38) if we want Sv39 by default.
> > > 
> > > However, your patch needs the address >= BIT(47) rather than BIT(38) to use
> > > Sv48 and address >= BIT(56) to use Sv57, thus breaking existing userspace
> > > software that creates mappings at the hint address without MAP_FIXED set.
> > 
> > Code that needs mmap to provide a specific address must use MAP_FIXED.
> > On riscv, it was decided that the address returned from mmap cannot be
> > greater than the hint address. This is currently implemented by using
> > the largest address space that can fit into the hint address. It may be
> > possible that this range can be extended to use all of the addresses
> > that are less than or equal to the hint address.
> > 
> 
> So this decision might be wrong. It requires some userspace software to
> modify their mmap flags to fit with this. For example, a binary translation
> JIT compiler that has already probed that the platform is Sv48 capable may
> want to create a mapping at an address given as the mmap hint, to align with
> the foreign binary's native address, while also providing a fallback path with
> performance overhead. Your patch here will always let userspace software use

I do not follow. This mechanism allows a program to always know how many
bits will be available in the virtual address provided by mmap,
regardless of the size of the underlying virtual address space.
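
To make the disputed behaviour concrete, here is a minimal sketch of the limit selection as it is described in this thread. It is not the kernel macro: the function name, the SKETCH_* constants and their values are assumptions chosen for illustration, the Sv57 branch follows the ">= BIT(56)" description given earlier in the thread, and the zero-hint case is deliberately left out.

```c
#include <stdint.h>

/* Assumed per-mode upper limits of the user address range, for illustration
 * only; the names echo the macros quoted above but are not the kernel's. */
#define SKETCH_VA_USER_SV39 (UINT64_C(1) << 38)
#define SKETCH_VA_USER_SV48 (UINT64_C(1) << 47)
#define SKETCH_VA_USER_SV57 (UINT64_C(1) << 56)

/*
 * Pick the upper bound below which a mapping would be placed, given a
 * nonzero hint address and the number of virtual address bits the
 * hardware supports (39, 48 or 57).
 */
static uint64_t sketch_mmap_end(uint64_t hint, int va_bits)
{
	/* A hint at or above the Sv57 boundary selects Sv57 if available. */
	if (hint >= SKETCH_VA_USER_SV57 && va_bits >= 57)
		return SKETCH_VA_USER_SV57;
	/* A hint at or above the Sv48 boundary selects Sv48 if available. */
	if (hint >= SKETCH_VA_USER_SV48 && va_bits >= 48)
		return SKETCH_VA_USER_SV48;
	/* Every smaller hint falls back to the Sv39 range. */
	return SKETCH_VA_USER_SV39;
}
```

Under these assumptions a hint of 1<<45 on Sv48-capable hardware yields the Sv39 limit, which is the case the objection above is about, while a hint of 1<<47 yields the Sv48 limit.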

The phrasing "align with foreign binary native address" seems like the
program requires a specific address, which is never guaranteed by mmap
without MAP_FIXED. If the program is relying on mmap to provide the
address without MAP_

[PATCH net v3] selftests: net: fix rps_default_mask with >32 CPUs

2024-01-22 Thread Jakub Kicinski

If there are more than 32 CPUs the bitmask will start to contain
commas, leading to:

./rps_default_mask.sh: line 36: [: ,: integer expression expected

Remove the commas; bash doesn't interpret leading zeroes as octal,
so that should be good enough. Switch to bash, since Simon reports that
not all shells support this type of substitution.
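
For readers who want the normalization spelled out, here is a rough C sketch of the same idea: drop the group-separating commas before comparing two masks. The helper names are hypothetical, and unlike the shell test, which compares integers, this version compares the digit strings after skipping leading zeroes.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst without ',' characters. Returns 0 on success, -1 if
 * dst (of dst_len bytes) is too small to hold the result. */
static int strip_commas(const char *src, char *dst, size_t dst_len)
{
	size_t n = 0;

	if (dst_len == 0)
		return -1;
	for (; *src; src++) {
		if (*src == ',')
			continue;
		if (n + 1 >= dst_len)
			return -1;
		dst[n++] = *src;
	}
	dst[n] = '\0';
	return 0;
}

/* Skip leading '0' characters, always keeping at least one digit. */
static const char *skip_zeros(const char *s)
{
	while (s[0] == '0' && s[1] != '\0')
		s++;
	return s;
}

/* Return 1 if the two masks match once commas and leading zeroes are
 * ignored, 0 otherwise (including when a mask is too long to normalize). */
static int masks_equal(const char *a, const char *b)
{
	char na[128], nb[128];

	if (strip_commas(a, na, sizeof(na)) || strip_commas(b, nb, sizeof(nb)))
		return 0;
	return strcmp(skip_zeros(na), skip_zeros(nb)) == 0;
}
```

With this, a two-group mask such as "00000000,00000003" compares equal to "3", while "00000001,00000000" does not compare equal to "1".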

Fixes: c12e0d5f267d ("self-tests: introduce self-tests for RPS default mask")
Signed-off-by: Jakub Kicinski 
---
v3:
 - switch to bash
v2: https://lore.kernel.org/all/20240120210256.3864747-1-k...@kernel.org/
 - remove all commas
v1: https://lore.kernel.org/all/20240119151248.3476897-1-k...@kernel.org/

CC: sh...@kernel.org
CC: ho...@kernel.org
CC: linux-kselftest@vger.kernel.org
---
 tools/testing/selftests/net/rps_default_mask.sh | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/rps_default_mask.sh 
b/tools/testing/selftests/net/rps_default_mask.sh
index a26c5624429f..4287a8529890 100755
--- a/tools/testing/selftests/net/rps_default_mask.sh
+++ b/tools/testing/selftests/net/rps_default_mask.sh
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 
 readonly ksft_skip=4
@@ -33,6 +33,10 @@ chk_rps() {
 
    rps_mask=$($cmd /sys/class/net/$dev_name/queues/rx-0/rps_cpus)
    printf "%-60s" "$msg"
+
+   # In case there is more than 32 CPUs we need to remove commas from masks
+   rps_mask=${rps_mask//,}
+   expected_rps_mask=${expected_rps_mask//,}
    if [ $rps_mask -eq $expected_rps_mask ]; then
        echo "[ ok ]"
    else
-- 
2.43.0

[PATCH net] selftests: fill in some missing configs for net

2024-01-22 Thread Jakub Kicinski

We are missing a lot of config options from net selftests,
it seems:

tun/tap:              CONFIG_TUN, CONFIG_MACVLAN, CONFIG_MACVTAP
fib_tests:            CONFIG_NET_SCH_FQ_CODEL
l2tp:                 CONFIG_L2TP, CONFIG_L2TP_V3, CONFIG_L2TP_IP, CONFIG_L2TP_ETH
sctp-vrf:             CONFIG_INET_DIAG
txtimestamp:          CONFIG_NET_CLS_U32
vxlan_mdb:            CONFIG_BRIDGE_VLAN_FILTERING
gre_gso:              CONFIG_NET_IPGRE_DEMUX, CONFIG_IP_GRE, CONFIG_IPV6_GRE
srv6_end_dt*_l3vpn:   CONFIG_IPV6_SEG6_LWTUNNEL
ip_local_port_range:  CONFIG_MPTCP
fib_test:             CONFIG_NET_CLS_BASIC
rtnetlink:            CONFIG_MACSEC, CONFIG_NET_SCH_HTB, CONFIG_XFRM_INTERFACE
                      CONFIG_NET_IPGRE, CONFIG_BONDING
fib_nexthops:         CONFIG_MPLS, CONFIG_MPLS_ROUTING
vxlan_mdb:            CONFIG_NET_ACT_GACT
tls:                  CONFIG_TLS, CONFIG_CRYPTO_CHACHA20POLY1305
psample:              CONFIG_PSAMPLE
fcnal:                CONFIG_TCP_MD5SIG

Try to add them in a semi-alphabetical order.

Fixes: 62199e3f1658 ("selftests: net: Add VXLAN MDB test")
Fixes: c12e0d5f267d ("self-tests: introduce self-tests for RPS default mask")
Fixes: ae5439658cce ("selftests/net: Cover the IP_LOCAL_PORT_RANGE socket option")
Signed-off-by: Jakub Kicinski 
--
These are not all the options we're missing. Since the merge window
is over I may not have the time to dig into it myself :(

Adding Fixes tag for 3 semi-random commits which I think missed things.
The full list would be very long.

CC: sh...@kernel.org
CC: ra...@blackwall.org
CC: ido...@nvidia.com
CC: ho...@kernel.org
CC: ja...@cloudflare.com
CC: kun...@amazon.com
CC: linux-kselftest@vger.kernel.org
---
 tools/testing/selftests/net/config | 28 
 1 file changed, 28 insertions(+)

diff --git a/tools/testing/selftests/net/config 
b/tools/testing/selftests/net/config
index 8da562a9ae87..19ff75051660 100644
--- a/tools/testing/selftests/net/config
+++ b/tools/testing/selftests/net/config
@@ -1,5 +1,6 @@
 CONFIG_USER_NS=y
 CONFIG_NET_NS=y
+CONFIG_BONDING=m
 CONFIG_BPF_SYSCALL=y
 CONFIG_TEST_BPF=m
 CONFIG_NUMA=y
@@ -14,9 +15,13 @@ CONFIG_VETH=y
 CONFIG_NET_IPVTI=y
 CONFIG_IPV6_VTI=y
 CONFIG_DUMMY=y
+CONFIG_BRIDGE_VLAN_FILTERING=y
 CONFIG_BRIDGE=y
+CONFIG_CRYPTO_CHACHA20POLY1305=m
 CONFIG_VLAN_8021Q=y
 CONFIG_IFB=y
+CONFIG_INET_DIAG=y
+CONFIG_IP_GRE=m
 CONFIG_NETFILTER=y
 CONFIG_NETFILTER_ADVANCED=y
 CONFIG_NF_CONNTRACK=m
@@ -25,15 +30,36 @@ CONFIG_IP6_NF_IPTABLES=m
 CONFIG_IP_NF_IPTABLES=m
 CONFIG_IP6_NF_NAT=m
 CONFIG_IP_NF_NAT=m
+CONFIG_IPV6_GRE=m
+CONFIG_IPV6_SEG6_LWTUNNEL=y
+CONFIG_L2TP_ETH=m
+CONFIG_L2TP_IP=m
+CONFIG_L2TP=m
+CONFIG_L2TP_V3=y
+CONFIG_MACSEC=m
+CONFIG_MACVLAN=y
+CONFIG_MACVTAP=y
+CONFIG_MPLS=y
+CONFIG_MPTCP=y
 CONFIG_NF_TABLES=m
 CONFIG_NF_TABLES_IPV6=y
 CONFIG_NF_TABLES_IPV4=y
 CONFIG_NFT_NAT=m
+CONFIG_NET_ACT_GACT=m
+CONFIG_NET_CLS_BASIC=m
+CONFIG_NET_CLS_U32=m
+CONFIG_NET_IPGRE_DEMUX=m
+CONFIG_NET_IPGRE=m
+CONFIG_NET_SCH_FQ_CODEL=m
+CONFIG_NET_SCH_HTB=m
 CONFIG_NET_SCH_FQ=m
 CONFIG_NET_SCH_ETF=m
 CONFIG_NET_SCH_NETEM=y
+CONFIG_PSAMPLE=m
+CONFIG_TCP_MD5SIG=y
 CONFIG_TEST_BLACKHOLE_DEV=m
 CONFIG_KALLSYMS=y
+CONFIG_TLS=m
 CONFIG_TRACEPOINTS=y
 CONFIG_NET_DROP_MONITOR=m
 CONFIG_NETDEVSIM=m
@@ -48,7 +74,9 @@ CONFIG_BAREUDP=m
 CONFIG_IPV6_IOAM6_LWTUNNEL=y
 CONFIG_CRYPTO_SM4_GENERIC=y
 CONFIG_AMT=m
+CONFIG_TUN=y
 CONFIG_VXLAN=m
 CONFIG_IP_SCTP=m
 CONFIG_NETFILTER_XT_MATCH_POLICY=m
 CONFIG_CRYPTO_ARIA=y
+CONFIG_XFRM_INTERFACE=m
-- 
2.43.0

[PATCH] kselftest/arm64: Test that ptrace takes effect in the target process

2024-01-22 Thread Mark Brown

While we have test coverage for the ptrace interface in our selftests
the current programs have a number of gaps. The testing is done per
regset so does not cover interactions and at no point do any of the
tests actually run the traced processes meaning that there is no
validation that anything we read or write corresponds to register values
the process actually sees. Let's add a new program which attempts to cover
these gaps.

Each test we do performs a single ptrace write. For each test we generate
some random initial register data in memory and then fork() and trace a
child. The child will load the generated data into the registers then
trigger a breakpoint. The parent waits for the breakpoint then reads the
entire child register state via ptrace, verifying that the values expected
were actually loaded by the child. It then does the write being tested
and resumes the child. Once resumed the child saves the register state
it sees to memory and executes another breakpoint. The parent uses
process_vm_readv() to get these values from the child and verifies that
the values were as expected before cleaning up the child.

We generate configurations with combinations of vector lengths and SVCR
values and then try every ptrace write which will implement the
transition we generated. In order to control execution time (especially
in emulation) we only cover the minimum and maximum VL for each of SVE
and SME, this will ensure we generate both increasing and decreasing
changes in vector length. In order to provide a baseline test we also
check the case where we resume the child without doing a ptrace write.

In order to simplify the generation of the test count for kselftest we
will report but skip a substantial number of tests that can't actually
be expressed via a single ptrace write, several times more than we
actually run. This is noisy and will add some overhead but is very much
simpler so is probably worth the tradeoff.

Signed-off-by: Mark Brown 
---
 tools/testing/selftests/arm64/fp/.gitignore  |1 +
 tools/testing/selftests/arm64/fp/Makefile|5 +-
 tools/testing/selftests/arm64/fp/fp-ptrace-asm.S |  279 
 tools/testing/selftests/arm64/fp/fp-ptrace.c | 1503 ++
 tools/testing/selftests/arm64/fp/fp-ptrace.h |   13 +
 5 files changed, 1800 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/arm64/fp/.gitignore 
b/tools/testing/selftests/arm64/fp/.gitignore
index ebc86757bdd8..00e52c966281 100644
--- a/tools/testing/selftests/arm64/fp/.gitignore
+++ b/tools/testing/selftests/arm64/fp/.gitignore
@@ -1,4 +1,5 @@
 fp-pidbench
+fp-ptrace
 fp-stress
 fpsimd-test
 rdvl-sme
diff --git a/tools/testing/selftests/arm64/fp/Makefile 
b/tools/testing/selftests/arm64/fp/Makefile
index b413b0af07f9..55d4f00d9e8e 100644
--- a/tools/testing/selftests/arm64/fp/Makefile
+++ b/tools/testing/selftests/arm64/fp/Makefile
@@ -5,7 +5,9 @@ top_srcdir = $(realpath ../../../../../)
 
 CFLAGS += $(KHDR_INCLUDES)
 
-TEST_GEN_PROGS := fp-stress \
+TEST_GEN_PROGS := \
+   fp-ptrace \
+   fp-stress \
sve-ptrace sve-probe-vls \
vec-syscfg \
za-fork za-ptrace
@@ -24,6 +26,7 @@ EXTRA_CLEAN += $(OUTPUT)/asm-utils.o $(OUTPUT)/rdvl.o 
$(OUTPUT)/za-fork-asm.o
 # Build with nolibc to avoid effects due to libc's clone() support
 $(OUTPUT)/fp-pidbench: fp-pidbench.S $(OUTPUT)/asm-utils.o
$(CC) -nostdlib $^ -o $@
+$(OUTPUT)/fp-ptrace: fp-ptrace.c fp-ptrace-asm.S
 $(OUTPUT)/fpsimd-test: fpsimd-test.S $(OUTPUT)/asm-utils.o
$(CC) -nostdlib $^ -o $@
 $(OUTPUT)/rdvl-sve: rdvl-sve.c $(OUTPUT)/rdvl.o
diff --git a/tools/testing/selftests/arm64/fp/fp-ptrace-asm.S 
b/tools/testing/selftests/arm64/fp/fp-ptrace-asm.S
new file mode 100644
index 000000000000..7ad59d92d02b
--- /dev/null
+++ b/tools/testing/selftests/arm64/fp/fp-ptrace-asm.S
@@ -0,0 +1,279 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (C) 2021-3 ARM Limited.
+//
+// Assembly portion of the FP ptrace test
+
+//
+// Load values from memory into registers, break on a breakpoint, then
+// break on a further breakpoint
+//
+
+#include "fp-ptrace.h"
+#include "sme-inst.h"
+
+.arch_extension sve
+
+// Load and save register values with pauses for ptrace
+//
+// x0 - SVE in use
+// x1 - SME in use
+// x2 - SME2 in use
+// x3 - FA64 supported
+
+.globl load_and_save
+load_and_save:
+   stp x11, x12, [sp, #-0x10]!
+
+   // This should be redundant in the SVE case
+   ldr x7, =v_in
+   ldp q0, q1, [x7]
+   ldp q2, q3, [x7, #16 * 2]
+   ldp q4, q5, [x7, #16 * 4]
+   ldp q6, q7, [x7, #16 * 6]
+   ldp q8, q9, [x7, #16 * 8]
+   ldp q10, q11, [x7, #16 * 10]
+   ldp q12, q13, [x7, #16 * 12]
+   ldp q14, q15, [x7, #16 * 14]
+   ldp q16, q17, [x7, #16 * 16]
+   ldp q18, q19, [x7, #16 * 18]
+   ldp q20, q21, [x7, #16 * 20]
+   ldp q22, q23, [x7, #16 * 22]
+   ldp q24, q25, [x7,

[PATCH v3 0/2] kselftest/seccomp: Convert to KTAP output

2024-01-22 Thread Mark Brown

Currently the seccomp benchmark selftest produces non-standard output,
meaning that while it makes a number of checks of the performance it
observes this has to be parsed by humans.  This means that automated
systems running this suite of tests are almost certainly ignoring the
results which isn't ideal for spotting problems.  Let's rework things so
that each check that the program does is reported as a test result to
the framework.
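
As a rough illustration of what "reported as a test result" means on the wire, the sketch below formats one KTAP-style result line per check. The helper and enum names are hypothetical and this is not the kselftest implementation; the series itself uses the existing kselftest output functions for this.

```c
#include <stddef.h>
#include <stdio.h>

/* Possible outcomes for a single check. */
enum ktap_status { KTAP_PASS, KTAP_FAIL, KTAP_SKIP };

/*
 * Format one result line ("ok N name", "not ok N name" or
 * "ok N name # SKIP") into buf. Returns the snprintf() result.
 */
static int ktap_format(char *buf, size_t len, unsigned int num,
		       enum ktap_status st, const char *name)
{
	switch (st) {
	case KTAP_PASS:
		return snprintf(buf, len, "ok %u %s\n", num, name);
	case KTAP_FAIL:
		return snprintf(buf, len, "not ok %u %s\n", num, name);
	default:
		/* Skipped checks are still numbered so the plan count holds. */
		return snprintf(buf, len, "ok %u %s # SKIP\n", num, name);
	}
}
```

A runner that parses such lines can count passes, failures and skips without understanding the benchmark's own human-oriented log format.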

Signed-off-by: Mark Brown 
---
Changes in v3:
- Re-add signoff.
- Link to v2: 
https://lore.kernel.org/r/20240122-b4-kselftest-seccomp-benchmark-ktap-v2-0-aed137eae...@kernel.org

Changes in v2:
- Rebase onto v6.8-rc1.
- Link to v1: 
https://lore.kernel.org/r/20231219-b4-kselftest-seccomp-benchmark-ktap-v1-0-f99e22863...@kernel.org

---
Mark Brown (2):
  kselftest/seccomp: Use kselftest output functions for benchmark
  kselftest/seccomp: Report each expectation we assert as a KTAP test

 .../testing/selftests/seccomp/seccomp_benchmark.c  | 105 +
 1 file changed, 65 insertions(+), 40 deletions(-)
---
base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
change-id: 20231219-b4-kselftest-seccomp-benchmark-ktap-357603823708

Best regards,
-- 
Mark Brown

[PATCH v3 1/2] kselftest/seccomp: Use kselftest output functions for benchmark

2024-01-22 Thread Mark Brown

In preparation for trying to output the test results themselves in TAP
format rework all the prints in the benchmark to use the kselftest output
functions. The uses of system() all produce single line output so we can
avoid having to deal with fully managing the child process and continue to
use system() by simply printing an empty message before we invoke system().
We also leave one printf() used to complete a line of output in place.

Tested-by: Anders Roxell 
Signed-off-by: Mark Brown 
---
 .../testing/selftests/seccomp/seccomp_benchmark.c  | 45 --
 1 file changed, 24 insertions(+), 21 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_benchmark.c 
b/tools/testing/selftests/seccomp/seccomp_benchmark.c
index 5b5c9d558dee..93168dd2c1e3 100644
--- a/tools/testing/selftests/seccomp/seccomp_benchmark.c
+++ b/tools/testing/selftests/seccomp/seccomp_benchmark.c
@@ -38,10 +38,10 @@ unsigned long long timing(clockid_t clk_id, unsigned long 
long samples)
    i *= 1000000000ULL;
    i += finish.tv_nsec - start.tv_nsec;
 
-   printf("%lu.%09lu - %lu.%09lu = %llu (%.1fs)\n",
-   finish.tv_sec, finish.tv_nsec,
-   start.tv_sec, start.tv_nsec,
-   i, (double)i / 1000000000.0);
+   ksft_print_msg("%lu.%09lu - %lu.%09lu = %llu (%.1fs)\n",
+  finish.tv_sec, finish.tv_nsec,
+  start.tv_sec, start.tv_nsec,
+  i, (double)i / 1000000000.0);
 
return i;
 }
@@ -53,7 +53,7 @@ unsigned long long calibrate(void)
pid_t pid, ret;
int seconds = 15;
 
-   printf("Calibrating sample size for %d seconds worth of syscalls 
...\n", seconds);
+   ksft_print_msg("Calibrating sample size for %d seconds worth of 
syscalls ...\n", seconds);
 
samples = 0;
pid = getpid();
@@ -102,14 +102,14 @@ long compare(const char *name_one, const char *name_eval, 
const char *name_two,
 {
bool good;
 
-   printf("\t%s %s %s (%lld %s %lld): ", name_one, name_eval, name_two,
-  (long long)one, name_eval, (long long)two);
+   ksft_print_msg("\t%s %s %s (%lld %s %lld): ", name_one, name_eval, 
name_two,
+  (long long)one, name_eval, (long long)two);
if (one > INT_MAX) {
-   printf("Miscalculation! Measurement went negative: %lld\n", 
(long long)one);
+   ksft_print_msg("Miscalculation! Measurement went negative: 
%lld\n", (long long)one);
return 1;
}
if (two > INT_MAX) {
-   printf("Miscalculation! Measurement went negative: %lld\n", 
(long long)two);
+   ksft_print_msg("Miscalculation! Measurement went negative: 
%lld\n", (long long)two);
return 1;
}
 
@@ -145,12 +145,15 @@ int main(int argc, char *argv[])
 
setbuf(stdout, NULL);
 
-   printf("Running on:\n");
+   ksft_print_msg("Running on:\n");
+   ksft_print_msg("");
system("uname -a");
 
-   printf("Current BPF sysctl settings:\n");
+   ksft_print_msg("Current BPF sysctl settings:\n");
/* Avoid using "sysctl" which may not be installed. */
+   ksft_print_msg("");
system("grep -H . /proc/sys/net/core/bpf_jit_enable");
+   ksft_print_msg("");
system("grep -H . /proc/sys/net/core/bpf_jit_harden");
 
if (argc > 1)
@@ -158,11 +161,11 @@ int main(int argc, char *argv[])
else
samples = calibrate();
 
-   printf("Benchmarking %llu syscalls...\n", samples);
+   ksft_print_msg("Benchmarking %llu syscalls...\n", samples);
 
/* Native call */
native = timing(CLOCK_PROCESS_CPUTIME_ID, samples) / samples;
-   printf("getpid native: %llu ns\n", native);
+   ksft_print_msg("getpid native: %llu ns\n", native);
 
ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
assert(ret == 0);
@@ -172,33 +175,33 @@ int main(int argc, char *argv[])
assert(ret == 0);
 
bitmap1 = timing(CLOCK_PROCESS_CPUTIME_ID, samples) / samples;
-   printf("getpid RET_ALLOW 1 filter (bitmap): %llu ns\n", bitmap1);
+   ksft_print_msg("getpid RET_ALLOW 1 filter (bitmap): %llu ns\n", 
bitmap1);
 
/* Second filter resulting in a bitmap */
ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &bitmap_prog);
assert(ret == 0);
 
bitmap2 = timing(CLOCK_PROCESS_CPUTIME_ID, samples) / samples;
-   printf("getpid RET_ALLOW 2 filters (bitmap): %llu ns\n", bitmap2);
+   ksft_print_msg("getpid RET_ALLOW 2 filters (bitmap): %llu ns\n", 
bitmap2);
 
/* Third filter, can no longer be converted to bitmap */
ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
assert(ret == 0);
 
filter1 = timing(CLOCK_PROCESS_CPUTIME_ID, samples) / samples;
-   printf("getpid RET_ALLOW 3 filters (full): %llu ns\n", filter1);
+   ksft_print_msg("getpid RET_ALLOW 3 filters (full): %llu ns\n", filt

[PATCH v3 2/2] kselftest/seccomp: Report each expectation we assert as a KTAP test

2024-01-22 Thread Mark Brown

The seccomp benchmark test makes a number of checks on the performance it
measures and logs them to the output but does so in a custom format which
none of the automated test runners understand meaning that the chances that
anyone is paying attention are slim. Let's additionally log each result in
KTAP format so that automated systems parsing the test output will see each
comparison as a test case. The original logs are left in place since they
provide the actual numbers for analysis.

As part of this, rework the flow of the main program so that when we skip
tests we still log every test we skip; this is because the standard KTAP
headers and footers include counts of the number of expected and run tests.

Tested-by: Anders Roxell 
Signed-off-by: Mark Brown 
---
 .../testing/selftests/seccomp/seccomp_benchmark.c  | 62 +++---
 1 file changed, 42 insertions(+), 20 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_benchmark.c 
b/tools/testing/selftests/seccomp/seccomp_benchmark.c
index 93168dd2c1e3..436a527b8235 100644
--- a/tools/testing/selftests/seccomp/seccomp_benchmark.c
+++ b/tools/testing/selftests/seccomp/seccomp_benchmark.c
@@ -98,24 +98,36 @@ bool le(int i_one, int i_two)
 }
 
 long compare(const char *name_one, const char *name_eval, const char *name_two,
-unsigned long long one, bool (*eval)(int, int), unsigned long long 
two)
+unsigned long long one, bool (*eval)(int, int), unsigned long long 
two,
+bool skip)
 {
bool good;
 
+   if (skip) {
+   ksft_test_result_skip("%s %s %s\n", name_one, name_eval,
+ name_two);
+   return 0;
+   }
+
ksft_print_msg("\t%s %s %s (%lld %s %lld): ", name_one, name_eval, 
name_two,
   (long long)one, name_eval, (long long)two);
if (one > INT_MAX) {
ksft_print_msg("Miscalculation! Measurement went negative: 
%lld\n", (long long)one);
-   return 1;
+   good = false;
+   goto out;
}
if (two > INT_MAX) {
ksft_print_msg("Miscalculation! Measurement went negative: 
%lld\n", (long long)two);
-   return 1;
+   good = false;
+   goto out;
}
 
good = eval(one, two);
printf("%s\n", good ? "✔️" : "❌");
 
+out:
+   ksft_test_result(good, "%s %s %s\n", name_one, name_eval, name_two);
+
return good ? 0 : 1;
 }
 
@@ -142,9 +154,13 @@ int main(int argc, char *argv[])
unsigned long long samples, calc;
unsigned long long native, filter1, filter2, bitmap1, bitmap2;
unsigned long long entry, per_filter1, per_filter2;
+   bool skip = false;
 
setbuf(stdout, NULL);
 
+   ksft_print_header();
+   ksft_set_plan(7);
+
ksft_print_msg("Running on:\n");
ksft_print_msg("");
system("uname -a");
@@ -202,8 +218,10 @@ int main(int argc, char *argv[])
 #define ESTIMATE(fmt, var, what)   do {\
var = (what);   \
ksft_print_msg("Estimated " fmt ": %llu ns\n", var);\
-   if (var > INT_MAX)  \
-   goto more_samples;  \
+   if (var > INT_MAX) {\
+   skip = true;\
+   ret |= 1;   \
+   }   \
} while (0)
 
ESTIMATE("total seccomp overhead for 1 bitmapped filter", calc,
@@ -222,30 +240,34 @@ int main(int argc, char *argv[])
 (filter2 - native - entry) / 4);
 
ksft_print_msg("Expectations:\n");
-   ret |= compare("native", "≤", "1 bitmap", native, le, bitmap1);
-   bits = compare("native", "≤", "1 filter", native, le, filter1);
+   ret |= compare("native", "≤", "1 bitmap", native, le, bitmap1,
+  skip);
+   bits = compare("native", "≤", "1 filter", native, le, filter1,
+  skip);
if (bits)
-   goto more_samples;
+   skip = true;
 
ret |= compare("per-filter (last 2 diff)", "≈", "per-filter (filters / 
4)",
-   per_filter1, approx, per_filter2);
+  per_filter1, approx, per_filter2, skip);
 
bits = compare("1 bitmapped", "≈", "2 bitmapped",
-   bitmap1 - native, approx, bitmap2 - native);
+  bitmap1 - native, approx, bitmap2 - native, skip);
if (bits) {
ksft_print_msg("Skipping constant action bitmap expectations: 
they appear unsupported.\n");
-   goto out;
+   skip = true;
}
 
-   ret |= compare("entry", "≈", "1 bitmapped", entry, approx, bitmap1 - 
native);
-   ret |= com

Re: [PATCH v14] exec: Fix dead-lock in de_thread with ptrace_attach

2024-01-22 Thread Kees Cook

On Mon, Jan 22, 2024 at 02:24:37PM +0100, Bernd Edlinger wrote:
> The main concern was when a set-suid program is executed by execve.
> Then it makes a difference if the current thread is traced before the
> execve or not.  That means if the current thread is already traced,
> the decision, which credentials will be used is different than otherwise.
> 
> So currently there are two possibilities, either the trace happens
> before the execve, and the suid-bit will be ignored, or the trace
> happens after the execve, but it is checked that the now potentially
> more privileged credentials allow the tracer to proceed.
> 
> With this patch we will have a third possibility, that is in order
> to avoid the possible dead-lock we allow the suid-bit to take effect,
> but only if the tracer's privileges allow both to attach the current
> credentials and the new credentials.  But I would only do that as
> a last resort, to avoid the possible dead-lock, and not unless a dead-lock
> is really expected to happen.

Instead of doing this special cred check (which I am worried could
become fragile -- I'd prefer all privilege checks happen in the same
place and in the same way...), could we just fail the ptrace_attach of
the execve?

-- 
Kees Cook

Re: [PATCH v2 1/2] kselftest/seccomp: Use kselftest output functions for benchmark

2024-01-22 Thread Kees Cook

On Mon, Jan 22, 2024 at 04:04:15PM +0000, Mark Brown wrote:
> In preparation for trying to output the test results themselves in TAP
> format rework all the prints in the benchmark to use the kselftest output
> functions. The uses of system() all produce single line output so we can
> avoid having to deal with fully managing the child process and continue to
> use system() by simply printing an empty message before we invoke system().
> We also leave one printf() used to complete a line of output in place.
> 
> Tested-by: Anders Roxell 
> Signed-off-by: Mark Brown 

Acked-by: Kees Cook 

-- 
Kees Cook

Re: [PATCH v2 2/2] kselftest/seccomp: Report each expectation we assert as a KTAP test

2024-01-22 Thread Kees Cook

On Mon, Jan 22, 2024 at 04:04:16PM +0000, Mark Brown wrote:
> The seccomp benchmark test makes a number of checks on the performance it
> measures and logs them to the output but does so in a custom format which
> none of the automated test runners understand meaning that the chances that
> anyone is paying attention are slim. Let's additionally log each result in
> KTAP format so that automated systems parsing the test output will see each
> comparison as a test case. The original logs are left in place since they
> provide the actual numbers for analysis.
> 
> As part of this, rework the flow of the main program so that when we skip
> tests we still log every test we skip; this is because the standard KTAP
> headers and footers include counts of the number of expected and run tests.
> 
> Tested-by: Anders Roxell 

with the S-o-b added,

Acked-by: Kees Cook 

-- 
Kees Cook

Re: [PATCH v3 1/2] kselftest/seccomp: Use kselftest output functions for benchmark

2024-01-22 Thread Kees Cook

On Mon, Jan 22, 2024 at 09:08:17PM +0000, Mark Brown wrote:
> In preparation for trying to output the test results themselves in TAP
> format rework all the prints in the benchmark to use the kselftest output
> functions. The uses of system() all produce single line output so we can
> avoid having to deal with fully managing the child process and continue to
> use system() by simply printing an empty message before we invoke system().
> We also leave one printf() used to complete a line of output in place.
> 
> Tested-by: Anders Roxell 
> Signed-off-by: Mark Brown 

Acked-by: Kees Cook 

-- 
Kees Cook

Re: [PATCH v3 2/2] kselftest/seccomp: Report each expectation we assert as a KTAP test

2024-01-22 Thread Kees Cook

On Mon, Jan 22, 2024 at 09:08:18PM +0000, Mark Brown wrote:
> The seccomp benchmark test makes a number of checks on the performance it
> measures and logs them to the output but does so in a custom format which
> none of the automated test runners understand meaning that the chances that
> anyone is paying attention are slim. Let's additionally log each result in
> KTAP format so that automated systems parsing the test output will see each
> comparison as a test case. The original logs are left in place since they
> provide the actual numbers for analysis.
> 
> As part of this, rework the flow of the main program so that when we skip
> tests we still log every test we skip; this is because the standard KTAP
> headers and footers include counts of the number of expected and run tests.
> 
> Tested-by: Anders Roxell 
> Signed-off-by: Mark Brown 

Acked-by: Kees Cook 

-- 
Kees Cook

Re: [PATCH v7 0/4] Introduce mseal()

2024-01-22 Thread Jeff Xu

On Mon, Jan 22, 2024 at 7:49 AM Theo de Raadt  wrote:
>
> Regarding these pieces
>
> > The PROT_SEAL bit in prot field of mmap(). When present, it marks
> > the map sealed since creation.
>
> OpenBSD won't be doing this.  I had PROT_IMMUTABLE as a draft.  In my
> research I found basically zero circumstances when userland does
> that.  The most common circumstance is you create a RW mapping, fill it,
> and then change to a more restrictive mapping, and lock it.
>
> There are a few regions in the address space that can be locked while RW.
> For instance, the stack.  But the kernel does that, not userland.  I
> found regions where the kernel wants to do this to the address space,
> but there is no need to export useless functionality to userland.
>
I have a feeling that most apps that need to use mmap() in their code
are likely using RW mappings. Adding sealing to mmap() could stop
those mappings from being executable. Of course, those apps would
need to change their code. We can't do it for them.
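
The "create a RW mapping, fill it, then tighten it" sequence mentioned above can be sketched as follows. This is only an illustration with a hypothetical helper name; it stops at mprotect() and does not call any sealing interface, since that interface is what is still under review in this thread.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Map len bytes read-write, copy data in, then drop write permission.
 * Returns the mapping on success or NULL on failure. A sealing step,
 * if one were available, would come after the mprotect() call.
 */
static void *make_readonly_copy(const void *data, size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;

	/* Fill the region while it is still writable. */
	memcpy(p, data, len);

	/* Tighten the protection once the contents are final. */
	if (mprotect(p, len, PROT_READ) != 0) {
		munmap(p, len);
		return NULL;
	}
	return p;
}
```

The caller remains responsible for munmap() on the returned region, which is exactly the operation a sealed mapping would no longer permit.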

Also, I believe adding this to mmap() has no downsides, only
performance gain, as Pedro Falcato pointed out in [1].

[1] 
https://lore.kernel.org/lkml/CAKbZUD2A+=bp_sd+q0yif7njqmu8p__eb4yguq0agecmlh8...@mail.gmail.com/

> OpenBSD now uses this for a high percent of the address space.  It might
> be worth re-reading a description of the split of responsibility regarding
> who locks different types of memory in a process;
> - kernel (the majority, based upon what ELF layout tells us),
> - shared library linker (the next majority, dealing with shared
>   library mappings and left-overs not determinable at kernel time),
> - libc (a small minority, mostly regarding forced mutable objects)
> - and the applications themselves (only 1 application today)
>
> https://lwn.net/Articles/915662/
>
> > The MAP_SEALABLE bit in the flags field of mmap(). When present, it marks
> > the map as sealable. A map created without MAP_SEALABLE will not support
> > sealing, i.e. mseal() will fail.
>
> We definitely won't be doing this.  We allow a process to lock any and all
> of its memory that isn't locked already, even if it means it is shooting
> itself in the foot.
>
> I think you are going to severely hurt the power of this mechanism,
> because you won't be able to lock memory that has been allocated by a
> different callsite not under your source-code control which lacks the
> MAP_SEALABLE flag.  (Which is extremely common with the system-parts of
> a process, meaning not just libc but kernel allocated objects).
>
MAP_SEALABLE was an open discussion item called out on V3 [2] and V4 [3].

I acknowledge that additional coordination would be required if
mapping were to be allocated by one software component and sealed in
another. However, this is feasible.

Considering the side effect of not having this flag (as discussed in
V3/V4) and the significant implications of altering the lifetime of
the mapping (since unmapping would not be possible), I believe it is
reasonable to expect developers to exercise additional care and
caution when utilizing memory sealing.

[2] 
https://lore.kernel.org/linux-mm/20231212231706.2680890-2-jef...@chromium.org/
[3] https://lore.kernel.org/all/20240104185138.169307-1-jef...@chromium.org/

> It may be fine inside a program like chrome, but I expect that flag to make
> it harder to use in libc, and it will hinder adoption.
>
In the case of glibc and Linux, as stated in the cover letter, Stephen
is working on a change to glibc to add sealing support to the dynamic
linker, and I plan to make the necessary code changes in the Linux kernel.

Re: [PATCH v7 0/4] Introduce mseal()

2024-01-22 Thread Theo de Raadt

Jeff Xu  wrote:

> On Mon, Jan 22, 2024 at 7:49 AM Theo de Raadt  wrote:
> >
> > Regarding these pieces
> >
> > > The PROT_SEAL bit in prot field of mmap(). When present, it marks
> > > the map sealed since creation.
> >
> > OpenBSD won't be doing this.  I had PROT_IMMUTABLE as a draft.  In my
> > research I found basically zero circumstances when userland does
> > that.  The most common circumstance is you create a RW mapping, fill it,
> > and then change to a more restrictive mapping, and lock it.
> >
> > There are a few regions in the address space that can be locked while RW.
> > For instance, the stack.  But the kernel does that, not userland.  I
> > found regions where the kernel wants to do this to the address space,
> > but there is no need to export useless functionality to userland.
> >
> I have a feeling that most apps that need to use mmap() in their code
> are likely using RW mappings. Adding sealing to mmap() could stop
> those mappings from being executable. Of course, those apps would
> need to change their code. We can't do it for them.

I don't have a feeling about it.

I spent a year engineering a complete system which exercises the maximum
amount of memory you can lock.

I saw nothing like what you are describing.  I had PROT_IMMUTABLE in my
drafts, and saw it turning into a dangerous anti-pattern.

> Also, I believe adding this to mmap() has no downsides, only
> performance gain, as Pedro Falcato pointed out in [1].
> 
> [1] 
> https://lore.kernel.org/lkml/CAKbZUD2A+=bp_sd+q0yif7njqmu8p__eb4yguq0agecmlh8...@mail.gmail.com/

Are you joking?  You don't have any code doing that today.  More feelings?

OpenBSD userland has zero places it can use mmap() MAP_IMMUTABLE.

It has two places where it has mprotect() + mimmutable() adjacent to each
other, two codepaths for late mprotect() of RELRO, and then make the RELRO
immutable.

I think this idea is a premature optimization, and intentionally incompatible.

Like I say, I had a similar MAP_ flag for mprotect() and mmap() in my
development trees, and I recognized it was pointless, distracting developers
into the wrong patterns, and I threw it out.

> > OpenBSD now uses this for a high percent of the address space.  It might
> > be worth re-reading a description of the split of responsibility regarding
> > who locks different types of memory in a process;
> > - kernel (the majority, based upon what ELF layout tells us),
> > - shared library linker (the next majority, dealing with shared
> >   library mappings and left-overs not determinable at kernel time),
> > - libc (a small minority, mostly regarding forced mutable objects)
> > - and the applications themselves (only 1 application today)
> >
> > https://lwn.net/Articles/915662/
> >
> > > The MAP_SEALABLE bit in the flags field of mmap(). When present, it marks
> > > the map as sealable. A map created without MAP_SEALABLE will not support
> > > sealing, i.e. mseal() will fail.
> >
> > We definitely won't be doing this.  We allow a process to lock any and all
> > of its memory that isn't locked already, even if it means it is shooting
> > itself in the foot.
> >
> > I think you are going to severely hurt the power of this mechanism,
> > because you won't be able to lock memory that has been allocated by a
> > different callsite not under your source-code control which lacks the
> > MAP_SEALABLE flag.  (Which is extremely common with the system-parts of
> > a process, meaning not just libc but kernel allocated objects).
> >
> MAP_SEALABLE was an open discussion item called out on V3 [2] and V4 [3].
> 
> I acknowledge that additional coordination would be required if
> mapping were to be allocated by one software component and sealed in
> another. However, this is feasible.
> 
> Considering the side effect of not having this flag (as discussed in
> V3/V4) and the significant implications of altering the lifetime of
> the mapping (since unmapping would not be possible), I believe it is
> reasonable to expect developers to exercise additional care and
> caution when utilizing memory sealing.
>
> [2] 
> https://lore.kernel.org/linux-mm/20231212231706.2680890-2-jef...@chromium.org/
> [3] https://lore.kernel.org/all/20240104185138.169307-1-jef...@chromium.org/

I disagree *strongly*.  Developers need to exercise additional care on
memory, period.  Memory sealing issues are the least of their worries.

(Except for handling RELRO, but only the ld.so developers will lose
their hair).

OK, so mseal and mimmutable are very different.

mimmutable can be used by any developer on the address space easily.

mseal requires control of the whole stack between allocation and consumption.

I'm sorry, but I don't think you understand how dangerous this MAP_SEALABLE
proposal is because of the difficulties it will create for use.

The immutable memory management we have today in OpenBSD would be completely
impossible with such a flag.  Separation between allocator (that doesn't know
what is going to happen), and consu

Re: [PATCH 6/6] of: Add KUnit test to confirm DTB is loaded

2024-01-22 Thread Stephen Boyd

Quoting David Gow (2024-01-15 21:03:12)
> On Sat, 13 Jan 2024 at 04:07, Stephen Boyd  wrote:
> >
> > Add a KUnit test that confirms a DTB has been loaded, i.e. there is a
> > root node, and that the of_have_populated_dt() API works properly.
> >
> > Cc: Rob Herring 
> > Cc: Frank Rowand 
> > Cc: David Gow 
> > Cc: Brendan Higgins 
> > Signed-off-by: Stephen Boyd 
> > ---
> 
> I won't pretend to be a devicetree expert, but this looks good to me
> from a KUnit point of view, and passes comfortably here.
> 
> checkpatch seems to have one complaint about the kconfig help text.
> Personally, I think the brief description is fine.
> 
> Reviewed-by: David Gow 
> 

Thanks! I noticed that x86 has some devicetree init code. Did you happen
to try on an x86 kvm instance? Or only run on UML?

8<
diff --git a/arch/x86/kernel/devicetree.c b/arch/x86/kernel/devicetree.c
index afd09924094e..650752d112a6 100644
--- a/arch/x86/kernel/devicetree.c
+++ b/arch/x86/kernel/devicetree.c
@@ -283,22 +283,24 @@ void __init x86_flattree_get_config(void)
 	u32 size, map_len;
 	void *dt;
 
-	if (!initial_dtb)
-		return;
+	if (initial_dtb) {
+		map_len = max(PAGE_SIZE - (initial_dtb & ~PAGE_MASK), (u64)128);
 
-	map_len = max(PAGE_SIZE - (initial_dtb & ~PAGE_MASK), (u64)128);
+		dt = early_memremap(initial_dtb, map_len);
+		size = fdt_totalsize(dt);
+		if (map_len < size) {
+			early_memunmap(dt, map_len);
+			dt = early_memremap(initial_dtb, size);
+			map_len = size;
+		}
 
-	dt = early_memremap(initial_dtb, map_len);
-	size = fdt_totalsize(dt);
-	if (map_len < size) {
-		early_memunmap(dt, map_len);
-		dt = early_memremap(initial_dtb, size);
-		map_len = size;
+		early_init_dt_verify(dt);
 	}
 
-	early_init_dt_verify(dt);
 	unflatten_and_copy_device_tree();
-	early_memunmap(dt, map_len);
+
+	if (initial_dtb)
+		early_memunmap(dt, map_len);
 }
 #endif

[PATCH 6.7 370/641] kselftest/alsa - mixer-test: fix the number of parameters to ksft_exit_fail_msg()

2024-01-22 Thread Greg Kroah-Hartman

6.7-stable review patch.  If anyone has any objections, please let me know.

--

From: Mirsad Todorovac 

[ Upstream commit 8c51c13dc63d46e754c44215eabc0890a8bd9bfb ]

Minor fix in the number of arguments to error reporting function in the
test program as reported by GCC 13.2.0 warning.

mixer-test.c: In function ‘find_controls’:
mixer-test.c:169:44: warning: too many arguments for format [-Wformat-extra-args]
  169 | ksft_exit_fail_msg("snd_ctl_poll_descriptors() failed for %d\n",
      |                    ^~~~

The number of arguments in call to ksft_exit_fail_msg() doesn't correspond
to the format specifiers, so this is adjusted resembling the sibling calls
to the error function.

Fixes: b1446bda56456 ("kselftest: alsa: Check for event generation when we 
write to controls")
Cc: Mark Brown 
Cc: Jaroslav Kysela 
Cc: Takashi Iwai 
Cc: Shuah Khan 
Cc: linux-so...@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Mirsad Todorovac 
Acked-by: Mark Brown 
Link: 
https://lore.kernel.org/r/20240107173704.937824-2-mirsad.todoro...@alu.unizg.hr
Signed-off-by: Takashi Iwai 
Signed-off-by: Sasha Levin 
---
 tools/testing/selftests/alsa/mixer-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/alsa/mixer-test.c 
b/tools/testing/selftests/alsa/mixer-test.c
index 23df154fcdd7..208c2170c074 100644
--- a/tools/testing/selftests/alsa/mixer-test.c
+++ b/tools/testing/selftests/alsa/mixer-test.c
@@ -166,7 +166,7 @@ static void find_controls(void)
err = snd_ctl_poll_descriptors(card_data->handle,
   &card_data->pollfd, 1);
if (err != 1) {
-   ksft_exit_fail_msg("snd_ctl_poll_descriptors() failed for %d\n",
+   ksft_exit_fail_msg("snd_ctl_poll_descriptors() failed for card %d: %d\n",
   card, err);
}
 
-- 
2.43.0

[PATCH 6.7 371/641] kselftest/alsa - mixer-test: Fix the print format specifier warning

2024-01-22 Thread Greg Kroah-Hartman

6.7-stable review patch.  If anyone has any objections, please let me know.

--

From: Mirsad Todorovac 

[ Upstream commit 3f47c1ebe5ca9c5883e596c7888dec4bec0176d8 ]

The GCC 13.2.0 compiler issued the following warning:

mixer-test.c: In function ‘ctl_value_index_valid’:
mixer-test.c:322:79: warning: format ‘%lld’ expects argument of type ‘long long int’, \
  but argument 5 has type ‘long int’ [-Wformat=]
  322 | ksft_print_msg("%s.%d value %lld more than maximum %lld\n",
      |                                                    ~~~^
      |                                                       |
      |                                                       long long int
      |                                                    %ld
  323 |                ctl->name, index, int64_val,
  324 |                snd_ctl_elem_info_get_max(ctl->info));
      |                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                |
      |                long int

Fixing the format specifier as advised by the compiler suggestion removes the
warning.

Fixes: 3f48b137d88e7 ("kselftest: alsa: Factor out check that values meet 
constraints")
Cc: Mark Brown 
Cc: Jaroslav Kysela 
Cc: Takashi Iwai 
Cc: Shuah Khan 
Cc: linux-so...@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Mirsad Todorovac 
Acked-by: Mark Brown 
Link: 
https://lore.kernel.org/r/20240107173704.937824-3-mirsad.todoro...@alu.unizg.hr
Signed-off-by: Takashi Iwai 
Signed-off-by: Sasha Levin 
---
 tools/testing/selftests/alsa/mixer-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/alsa/mixer-test.c 
b/tools/testing/selftests/alsa/mixer-test.c
index 208c2170c074..df942149c6f6 100644
--- a/tools/testing/selftests/alsa/mixer-test.c
+++ b/tools/testing/selftests/alsa/mixer-test.c
@@ -319,7 +319,7 @@ static bool ctl_value_index_valid(struct ctl_data *ctl,
}
 
if (int64_val > snd_ctl_elem_info_get_max64(ctl->info)) {
-   ksft_print_msg("%s.%d value %lld more than maximum %lld\n",
+   ksft_print_msg("%s.%d value %lld more than maximum %ld\n",
   ctl->name, index, int64_val,
   snd_ctl_elem_info_get_max(ctl->info));
return false;
-- 
2.43.0

[PATCH 6.7 372/641] kselftest/alsa - conf: Stringify the printed errno in sysfs_get()

2024-01-22 Thread Greg Kroah-Hartman

6.7-stable review patch.  If anyone has any objections, please let me know.

--

From: Mirsad Todorovac 

[ Upstream commit fd38dd6abda589a8771e7872e4dea28c99c6a6ef ]

GCC 13.2.0 reported the warning of the print format specifier:

conf.c: In function ‘sysfs_get’:
conf.c:181:72: warning: format ‘%s’ expects argument of type ‘char *’, \
but argument 3 has type ‘int’ [-Wformat=]
  181 | ksft_exit_fail_msg("sysfs: unable to read value '%s': %s\n",
      |                                                       ~^
      |                                                        |
      |                                                        char *
      |                                                       %d

The fix passes strerror(errno) as it was intended, like in the sibling error
exit message.

Fixes: aba51cd0949ae ("selftests: alsa - add PCM test")
Cc: Mark Brown 
Cc: Jaroslav Kysela 
Cc: Takashi Iwai 
Cc: Shuah Khan 
Cc: linux-so...@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Mirsad Todorovac 
Acked-by: Mark Brown 
Link: 
https://lore.kernel.org/r/20240107173704.937824-5-mirsad.todoro...@alu.unizg.hr
Signed-off-by: Takashi Iwai 
Signed-off-by: Sasha Levin 
---
 tools/testing/selftests/alsa/conf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/alsa/conf.c 
b/tools/testing/selftests/alsa/conf.c
index 00925eb8d9f4..89e3656a042d 100644
--- a/tools/testing/selftests/alsa/conf.c
+++ b/tools/testing/selftests/alsa/conf.c
@@ -179,7 +179,7 @@ static char *sysfs_get(const char *sysfs_root, const char 
*id)
close(fd);
if (len < 0)
ksft_exit_fail_msg("sysfs: unable to read value '%s': %s\n",
-  path, errno);
+  path, strerror(errno));
while (len > 0 && path[len-1] == '\n')
len--;
path[len] = '\0';
-- 
2.43.0

[PATCH 63/82] mm: Refactor intentional wrap-around test

2024-01-22 Thread Kees Cook

In an effort to separate intentional arithmetic wrap-around from
unexpected wrap-around, we need to refactor places that depend on this
kind of math. One of the most common code patterns of this is:

VAR + value < VAR

Notably, this is considered "undefined behavior" for signed and pointer
types, which the kernel works around by using the -fno-strict-overflow
option in the build[1] (which used to just be -fwrapv). Regardless, we
want to get the kernel source to the position where we can meaningfully
instrument arithmetic wrap-around conditions and catch them when they
are unexpected, regardless of whether they are signed[2], unsigned[3],
or pointer[4] types.

Refactor open-coded wrap-around addition test to use add_would_overflow().
This paves the way to enabling the wrap-around sanitizers in the future.

Link: https://git.kernel.org/linus/68df3755e383e6fecf2354a67b08f92f18536594 [1]
Link: https://github.com/KSPP/linux/issues/26 [2]
Link: https://github.com/KSPP/linux/issues/27 [3]
Link: https://github.com/KSPP/linux/issues/344 [4]
Cc: Andrew Morton 
Cc: Shuah Khan 
Cc: linux...@kvack.org
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Kees Cook 
---
 mm/memory.c | 4 ++--
 mm/mmap.c   | 2 +-
 mm/mremap.c | 2 +-
 mm/nommu.c  | 4 ++--
 mm/util.c   | 2 +-
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 7e1f4849463a..d47acdff7af3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2559,7 +2559,7 @@ int vm_iomap_memory(struct vm_area_struct *vma, 
phys_addr_t start, unsigned long
unsigned long vm_len, pfn, pages;
 
/* Check that the physical memory area passed in looks valid */
-   if (start + len < start)
+   if (add_would_overflow(start, len))
return -EINVAL;
/*
 * You *really* shouldn't map things that aren't page-aligned,
@@ -2569,7 +2569,7 @@ int vm_iomap_memory(struct vm_area_struct *vma, 
phys_addr_t start, unsigned long
len += start & ~PAGE_MASK;
pfn = start >> PAGE_SHIFT;
pages = (len + ~PAGE_MASK) >> PAGE_SHIFT;
-   if (pfn + pages < pfn)
+   if (add_would_overflow(pfn, pages))
return -EINVAL;
 
/* We start the mapping 'vm_pgoff' pages into the area */
diff --git a/mm/mmap.c b/mm/mmap.c
index b78e83d351d2..16501fcaf511 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3023,7 +3023,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, 
unsigned long, size,
return ret;
 
/* Does pgoff wrap? */
-   if (pgoff + (size >> PAGE_SHIFT) < pgoff)
+   if (add_would_overflow(pgoff, (size >> PAGE_SHIFT)))
return ret;
 
if (mmap_write_lock_killable(mm))
diff --git a/mm/mremap.c b/mm/mremap.c
index 38d98465f3d8..efa27019a05d 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -848,7 +848,7 @@ static struct vm_area_struct *vma_to_resize(unsigned long 
addr,
/* Need to be careful about a growing mapping */
pgoff = (addr - vma->vm_start) >> PAGE_SHIFT;
pgoff += vma->vm_pgoff;
-   if (pgoff + (new_len >> PAGE_SHIFT) < pgoff)
+   if (add_would_overflow(pgoff, (new_len >> PAGE_SHIFT)))
return ERR_PTR(-EINVAL);
 
if (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP))
diff --git a/mm/nommu.c b/mm/nommu.c
index b6dc558d3144..299bcfe19eed 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -202,7 +202,7 @@ EXPORT_SYMBOL(vmalloc_to_pfn);
 long vread_iter(struct iov_iter *iter, const char *addr, size_t count)
 {
/* Don't allow overflow */
-   if ((unsigned long) addr + count < count)
+   if (add_would_overflow(count, (unsigned long)addr))
count = -(unsigned long) addr;
 
return copy_to_iter(addr, count, iter);
@@ -1705,7 +1705,7 @@ int access_process_vm(struct task_struct *tsk, unsigned 
long addr, void *buf, in
 {
struct mm_struct *mm;
 
-   if (addr + len < addr)
+   if (add_would_overflow(addr, len))
return 0;
 
mm = get_task_mm(tsk);
diff --git a/mm/util.c b/mm/util.c
index 5a6a9802583b..e6beeb23b48b 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -567,7 +567,7 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
unsigned long len, unsigned long prot,
unsigned long flag, unsigned long offset)
 {
-   if (unlikely(offset + PAGE_ALIGN(len) < offset))
+   if (unlikely(add_would_overflow(offset, PAGE_ALIGN(len))))
return -EINVAL;
if (unlikely(offset_in_page(offset)))
return -EINVAL;
-- 
2.34.1

[PATCH 6.1 243/417] kselftest/alsa - mixer-test: Fix the print format specifier warning

2024-01-22 Thread Greg Kroah-Hartman

6.1-stable review patch.  If anyone has any objections, please let me know.

--

From: Mirsad Todorovac 

[ Upstream commit 3f47c1ebe5ca9c5883e596c7888dec4bec0176d8 ]

The GCC 13.2.0 compiler issued the following warning:

mixer-test.c: In function ‘ctl_value_index_valid’:
mixer-test.c:322:79: warning: format ‘%lld’ expects argument of type ‘long long int’, \
  but argument 5 has type ‘long int’ [-Wformat=]
  322 | ksft_print_msg("%s.%d value %lld more than maximum %lld\n",
      |                                                    ~~~^
      |                                                       |
      |                                                       long long int
      |                                                    %ld
  323 |                ctl->name, index, int64_val,
  324 |                snd_ctl_elem_info_get_max(ctl->info));
      |                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                |
      |                long int

Fixing the format specifier as advised by the compiler suggestion removes the
warning.

Fixes: 3f48b137d88e7 ("kselftest: alsa: Factor out check that values meet 
constraints")
Cc: Mark Brown 
Cc: Jaroslav Kysela 
Cc: Takashi Iwai 
Cc: Shuah Khan 
Cc: linux-so...@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Mirsad Todorovac 
Acked-by: Mark Brown 
Link: 
https://lore.kernel.org/r/20240107173704.937824-3-mirsad.todoro...@alu.unizg.hr
Signed-off-by: Takashi Iwai 
Signed-off-by: Sasha Levin 
---
 tools/testing/selftests/alsa/mixer-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/alsa/mixer-test.c 
b/tools/testing/selftests/alsa/mixer-test.c
index d59910658c8c..9ad39db32d14 100644
--- a/tools/testing/selftests/alsa/mixer-test.c
+++ b/tools/testing/selftests/alsa/mixer-test.c
@@ -358,7 +358,7 @@ static bool ctl_value_index_valid(struct ctl_data *ctl,
}
 
if (int64_val > snd_ctl_elem_info_get_max64(ctl->info)) {
-   ksft_print_msg("%s.%d value %lld more than maximum %lld\n",
+   ksft_print_msg("%s.%d value %lld more than maximum %ld\n",
   ctl->name, index, int64_val,
   snd_ctl_elem_info_get_max(ctl->info));
return false;
-- 
2.43.0

[PATCH 6.1 242/417] kselftest/alsa - mixer-test: fix the number of parameters to ksft_exit_fail_msg()

2024-01-22 Thread Greg Kroah-Hartman

6.1-stable review patch.  If anyone has any objections, please let me know.

--

From: Mirsad Todorovac 

[ Upstream commit 8c51c13dc63d46e754c44215eabc0890a8bd9bfb ]

Minor fix in the number of arguments to error reporting function in the
test program as reported by GCC 13.2.0 warning.

mixer-test.c: In function ‘find_controls’:
mixer-test.c:169:44: warning: too many arguments for format [-Wformat-extra-args]
  169 | ksft_exit_fail_msg("snd_ctl_poll_descriptors() failed for %d\n",
      |                    ^~~~

The number of arguments in call to ksft_exit_fail_msg() doesn't correspond
to the format specifiers, so this is adjusted resembling the sibling calls
to the error function.

Fixes: b1446bda56456 ("kselftest: alsa: Check for event generation when we 
write to controls")
Cc: Mark Brown 
Cc: Jaroslav Kysela 
Cc: Takashi Iwai 
Cc: Shuah Khan 
Cc: linux-so...@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Mirsad Todorovac 
Acked-by: Mark Brown 
Link: 
https://lore.kernel.org/r/20240107173704.937824-2-mirsad.todoro...@alu.unizg.hr
Signed-off-by: Takashi Iwai 
Signed-off-by: Sasha Levin 
---
 tools/testing/selftests/alsa/mixer-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/alsa/mixer-test.c 
b/tools/testing/selftests/alsa/mixer-test.c
index 37da902545a4..d59910658c8c 100644
--- a/tools/testing/selftests/alsa/mixer-test.c
+++ b/tools/testing/selftests/alsa/mixer-test.c
@@ -205,7 +205,7 @@ static void find_controls(void)
err = snd_ctl_poll_descriptors(card_data->handle,
   &card_data->pollfd, 1);
if (err != 1) {
-   ksft_exit_fail_msg("snd_ctl_poll_descriptors() failed for %d\n",
+   ksft_exit_fail_msg("snd_ctl_poll_descriptors() failed for card %d: %d\n",
   card, err);
}
 
-- 
2.43.0

[PATCH 6.6 334/583] kselftest/alsa - mixer-test: fix the number of parameters to ksft_exit_fail_msg()

2024-01-22 Thread Greg Kroah-Hartman

6.6-stable review patch.  If anyone has any objections, please let me know.

--

From: Mirsad Todorovac 

[ Upstream commit 8c51c13dc63d46e754c44215eabc0890a8bd9bfb ]

Minor fix in the number of arguments to error reporting function in the
test program as reported by GCC 13.2.0 warning.

mixer-test.c: In function ‘find_controls’:
mixer-test.c:169:44: warning: too many arguments for format [-Wformat-extra-args]
  169 | ksft_exit_fail_msg("snd_ctl_poll_descriptors() failed for %d\n",
      |                    ^~~~

The number of arguments in call to ksft_exit_fail_msg() doesn't correspond
to the format specifiers, so this is adjusted resembling the sibling calls
to the error function.

Fixes: b1446bda56456 ("kselftest: alsa: Check for event generation when we 
write to controls")
Cc: Mark Brown 
Cc: Jaroslav Kysela 
Cc: Takashi Iwai 
Cc: Shuah Khan 
Cc: linux-so...@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Mirsad Todorovac 
Acked-by: Mark Brown 
Link: 
https://lore.kernel.org/r/20240107173704.937824-2-mirsad.todoro...@alu.unizg.hr
Signed-off-by: Takashi Iwai 
Signed-off-by: Sasha Levin 
---
 tools/testing/selftests/alsa/mixer-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/alsa/mixer-test.c 
b/tools/testing/selftests/alsa/mixer-test.c
index 23df154fcdd7..208c2170c074 100644
--- a/tools/testing/selftests/alsa/mixer-test.c
+++ b/tools/testing/selftests/alsa/mixer-test.c
@@ -166,7 +166,7 @@ static void find_controls(void)
err = snd_ctl_poll_descriptors(card_data->handle,
   &card_data->pollfd, 1);
if (err != 1) {
-   ksft_exit_fail_msg("snd_ctl_poll_descriptors() failed for %d\n",
+   ksft_exit_fail_msg("snd_ctl_poll_descriptors() failed for card %d: %d\n",
   card, err);
}
 
-- 
2.43.0

[PATCH 6.6 335/583] kselftest/alsa - mixer-test: Fix the print format specifier warning

2024-01-22 Thread Greg Kroah-Hartman

6.6-stable review patch.  If anyone has any objections, please let me know.

--

From: Mirsad Todorovac 

[ Upstream commit 3f47c1ebe5ca9c5883e596c7888dec4bec0176d8 ]

The GCC 13.2.0 compiler issued the following warning:

mixer-test.c: In function ‘ctl_value_index_valid’:
mixer-test.c:322:79: warning: format ‘%lld’ expects argument of type ‘long long int’, \
  but argument 5 has type ‘long int’ [-Wformat=]
  322 | ksft_print_msg("%s.%d value %lld more than maximum %lld\n",
      |                                                    ~~~^
      |                                                       |
      |                                                       long long int
      |                                                    %ld
  323 |                ctl->name, index, int64_val,
  324 |                snd_ctl_elem_info_get_max(ctl->info));
      |                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                |
      |                long int

Fixing the format specifier as advised by the compiler suggestion removes the
warning.

Fixes: 3f48b137d88e7 ("kselftest: alsa: Factor out check that values meet 
constraints")
Cc: Mark Brown 
Cc: Jaroslav Kysela 
Cc: Takashi Iwai 
Cc: Shuah Khan 
Cc: linux-so...@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Mirsad Todorovac 
Acked-by: Mark Brown 
Link: 
https://lore.kernel.org/r/20240107173704.937824-3-mirsad.todoro...@alu.unizg.hr
Signed-off-by: Takashi Iwai 
Signed-off-by: Sasha Levin 
---
 tools/testing/selftests/alsa/mixer-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/alsa/mixer-test.c 
b/tools/testing/selftests/alsa/mixer-test.c
index 208c2170c074..df942149c6f6 100644
--- a/tools/testing/selftests/alsa/mixer-test.c
+++ b/tools/testing/selftests/alsa/mixer-test.c
@@ -319,7 +319,7 @@ static bool ctl_value_index_valid(struct ctl_data *ctl,
}
 
if (int64_val > snd_ctl_elem_info_get_max64(ctl->info)) {
-   ksft_print_msg("%s.%d value %lld more than maximum %lld\n",
+   ksft_print_msg("%s.%d value %lld more than maximum %ld\n",
   ctl->name, index, int64_val,
   snd_ctl_elem_info_get_max(ctl->info));
return false;
-- 
2.43.0

[PATCH 6.6 336/583] kselftest/alsa - conf: Stringify the printed errno in sysfs_get()

2024-01-22 Thread Greg Kroah-Hartman

6.6-stable review patch.  If anyone has any objections, please let me know.

--

From: Mirsad Todorovac 

[ Upstream commit fd38dd6abda589a8771e7872e4dea28c99c6a6ef ]

GCC 13.2.0 reported the warning of the print format specifier:

conf.c: In function ‘sysfs_get’:
conf.c:181:72: warning: format ‘%s’ expects argument of type ‘char *’, \
but argument 3 has type ‘int’ [-Wformat=]
  181 | ksft_exit_fail_msg("sysfs: unable to read value '%s': %s\n",
      |                                                       ~^
      |                                                        |
      |                                                        char *
      |                                                       %d

The fix passes strerror(errno) as it was intended, like in the sibling error
exit message.

Fixes: aba51cd0949ae ("selftests: alsa - add PCM test")
Cc: Mark Brown 
Cc: Jaroslav Kysela 
Cc: Takashi Iwai 
Cc: Shuah Khan 
Cc: linux-so...@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Mirsad Todorovac 
Acked-by: Mark Brown 
Link: 
https://lore.kernel.org/r/20240107173704.937824-5-mirsad.todoro...@alu.unizg.hr
Signed-off-by: Takashi Iwai 
Signed-off-by: Sasha Levin 
---
 tools/testing/selftests/alsa/conf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/alsa/conf.c 
b/tools/testing/selftests/alsa/conf.c
index 2f1685a3eae1..ff09038fdce6 100644
--- a/tools/testing/selftests/alsa/conf.c
+++ b/tools/testing/selftests/alsa/conf.c
@@ -186,7 +186,7 @@ static char *sysfs_get(const char *sysfs_root, const char 
*id)
close(fd);
if (len < 0)
ksft_exit_fail_msg("sysfs: unable to read value '%s': %s\n",
-  path, errno);
+  path, strerror(errno));
while (len > 0 && path[len-1] == '\n')
len--;
path[len] = '\0';
-- 
2.43.0

Re: [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions

2024-01-22 Thread Waiman Long

On 1/22/24 10:07, Michal Koutný wrote:
> Hello Waiman.
>
> On Wed, Jan 17, 2024 at 11:35:03AM -0500, Waiman Long wrote:
> > This patch series is based on the RFC patch from Frederic [1]. Instead
> > of offering RCU_NOCB as a separate option, it is now lumped into a
> > root-only cpuset.cpus.isolation_full flag that will enable all the
> > additional CPU isolation capabilities available for isolated partitions
> > if set. RCU_NOCB is just the first one to this party. Additional dynamic
> > CPU isolation capabilities will be added in the future.
>
> IIUC this is similar to what I suggested back in the day and you didn't
> consider it [1]. Do I read this right that you've changed your mind?

I didn't say that we were not going to do this at the time. It's just
that more evaluation needed to be done before we could do this. I was
also looking to see if there were use cases where such capabilities
were needed. Now I am aware that such use cases do exist and we should
start looking into it.

> (It's fine if you did, I'm only asking to follow the heading of cpuset
> controller.)

OK, the title of the cover letter may be too specific. I will make it
more general in the next version.


Cheers,
Longman

[PATCH net] selftests: netdevsim: fix the udp_tunnel_nic test

2024-01-22 Thread Jakub Kicinski

This test is missing a whole bunch of checks for interface
renaming and one ifup. Presumably it was only used on a system
with renaming disabled and NetworkManager running.

Fixes: 91f430b2c49d ("selftests: net: add a test for UDP tunnel info infra")
Signed-off-by: Jakub Kicinski 
---
CC: sh...@kernel.org
CC: ho...@kernel.org
CC: linux-kselftest@vger.kernel.org
---
 .../selftests/drivers/net/netdevsim/udp_tunnel_nic.sh| 9 +
 1 file changed, 9 insertions(+)

diff --git a/tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh 
b/tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh
index 4855ef597a15..f98435c502f6 100755
--- a/tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh
+++ b/tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh
@@ -270,6 +270,7 @@ for port in 0 1; do
echo 1 > $NSIM_DEV_SYS/new_port
 fi
 NSIM_NETDEV=`get_netdev_name old_netdevs`
+ifconfig $NSIM_NETDEV up
 
 msg="new NIC device created"
 exp0=( 0 0 0 0 )
@@ -431,6 +432,7 @@ for port in 0 1; do
 fi
 
 echo $port > $NSIM_DEV_SYS/new_port
+NSIM_NETDEV=`get_netdev_name old_netdevs`
 ifconfig $NSIM_NETDEV up
 
 overflow_table0 "overflow NIC table"
@@ -488,6 +490,7 @@ for port in 0 1; do
 fi
 
 echo $port > $NSIM_DEV_SYS/new_port
+NSIM_NETDEV=`get_netdev_name old_netdevs`
 ifconfig $NSIM_NETDEV up
 
 overflow_table0 "overflow NIC table"
@@ -544,6 +547,7 @@ for port in 0 1; do
 fi
 
 echo $port > $NSIM_DEV_SYS/new_port
+NSIM_NETDEV=`get_netdev_name old_netdevs`
 ifconfig $NSIM_NETDEV up
 
 overflow_table0 "destroy NIC"
@@ -573,6 +577,7 @@ for port in 0 1; do
 fi
 
 echo $port > $NSIM_DEV_SYS/new_port
+NSIM_NETDEV=`get_netdev_name old_netdevs`
 ifconfig $NSIM_NETDEV up
 
 msg="create VxLANs v6"
@@ -633,6 +638,7 @@ for port in 0 1; do
 fi
 
 echo $port > $NSIM_DEV_SYS/new_port
+NSIM_NETDEV=`get_netdev_name old_netdevs`
 ifconfig $NSIM_NETDEV up
 
 echo 110 > $NSIM_DEV_DFS/ports/$port/udp_ports_inject_error
@@ -688,6 +694,7 @@ for port in 0 1; do
 fi
 
 echo $port > $NSIM_DEV_SYS/new_port
+NSIM_NETDEV=`get_netdev_name old_netdevs`
 ifconfig $NSIM_NETDEV up
 
 msg="create VxLANs v6"
@@ -747,6 +754,7 @@ for port in 0 1; do
 fi
 
 echo $port > $NSIM_DEV_SYS/new_port
+NSIM_NETDEV=`get_netdev_name old_netdevs`
 ifconfig $NSIM_NETDEV up
 
 msg="create VxLANs v6"
@@ -877,6 +885,7 @@ msg="re-add a port"
 
 echo 2 > $NSIM_DEV_SYS/del_port
 echo 2 > $NSIM_DEV_SYS/new_port
+NSIM_NETDEV=`get_netdev_name old_netdevs`
 check_tables
 
 msg="replace VxLAN in overflow table"
-- 
2.43.0
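
The fix above re-reads the interface name after each new_port write, since the device may come up under a different name when renaming is enabled. A minimal sketch of that idea, not the script's actual get_netdev_name helper: compare a snapshot of interface names taken before the port is created with the list afterwards, and pick the name that is new. The function name find_new_netdev and its space-separated list arguments are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the first name in the "after" list that is
# absent from the "before" list. Both lists are space-separated strings.
find_new_netdev() {
    before=$1
    after=$2
    for dev in $after; do
        case " $before " in
            *" $dev "*) ;;                 # already existed, skip it
            *) echo "$dev"; return 0 ;;    # first newcomer wins
        esac
    done
    return 1                               # nothing new appeared
}

# Usage against sysfs (assumes /sys/class/net is populated):
#   old_netdevs=$(ls /sys/class/net | tr '\n' ' ')
#   echo 1 > "$NSIM_DEV_SYS/new_port"
#   NSIM_NETDEV=$(find_new_netdev "$old_netdevs" "$(ls /sys/class/net | tr '\n' ' ')")
```

Taking the lists as arguments keeps the comparison independent of sysfs, so it can be exercised without creating a netdevsim device.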

[PATCH] selftests/landlock: Fix net_test build issues with old libc

2024-01-22 Thread Hu Yadi

From: "Hu.Yadi" 

Fixes: a549d055a22e ("selftests/landlock: Add network tests")

One issue comes up while building selftests/landlock/net_test on my side
(gcc 7.3/glibc-2.28/kernel-4.19):

net_test.c: In function ‘set_service’:
net_test.c:91:45: warning: implicit declaration of function ‘gettid’; 
[-Wimplicit-function-declaration]
"_selftests-landlock-net-tid%d-index%d", gettid(),
 ^~
 getgid
net_test.c:(.text+0x4e0): undefined reference to `gettid'

Signed-off-by: Hu Yadi 
Suggested-by: Jiao 
Reviewed-by: Berlin 
---
 tools/testing/selftests/landlock/net_test.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/landlock/net_test.c 
b/tools/testing/selftests/landlock/net_test.c
index 929e21c4db05..6cc1bb1a9166 100644
--- a/tools/testing/selftests/landlock/net_test.c
+++ b/tools/testing/selftests/landlock/net_test.c
@@ -18,9 +18,15 @@
 #include 
 #include 
 #include 
-
+#include 
 #include "common.h"

+
+static pid_t sys_gettid(void)
+{
+   return syscall(__NR_gettid);
+}
+
 const short sock_port_start = (1 << 10);

 static const char loopback_ipv4[] = "127.0.0.1";
@@ -88,7 +94,7 @@ static int set_service(struct service_fixture *const srv,
case AF_UNIX:
srv->unix_addr.sun_family = prot.domain;
sprintf(srv->unix_addr.sun_path,
-   "_selftests-landlock-net-tid%d-index%d", gettid(),
+   "_selftests-landlock-net-tid%d-index%d", sys_gettid(),
index);
srv->unix_addr_len = SUN_LEN(&srv->unix_addr);
srv->unix_addr.sun_path[0] = '\0';
--
2.23.0

[PATCH v2 2/2] selftests/mm: run_vmtests: remove sudo and conform to tap

2024-01-22 Thread Muhammad Usama Anjum

Remove sudo, as some test environments may not have it available.
Instead, skip the test if it is not run with root privileges.

Signed-off-by: Muhammad Usama Anjum 
---
Changes since v1:
- Added this patch in v2

We are allocating 2*RLIMIT_MEMLOCK.rlim_max memory and mmap() isn't
failing. This seems like a true bug in the kernel. Even the root user
shouldn't be able to allocate more memory than the allowed MEMLOCKed memory.
Any ideas?
---
 tools/testing/selftests/mm/on-fault-limit.c | 36 ++---
 tools/testing/selftests/mm/run_vmtests.sh   |  2 +-
 2 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/tools/testing/selftests/mm/on-fault-limit.c 
b/tools/testing/selftests/mm/on-fault-limit.c
index b5888d613f34e..0ea98ffab3589 100644
--- a/tools/testing/selftests/mm/on-fault-limit.c
+++ b/tools/testing/selftests/mm/on-fault-limit.c
@@ -5,40 +5,38 @@
 #include 
 #include 
 #include 
+#include "../kselftest.h"
 
-static int test_limit(void)
+static void test_limit(void)
 {
-   int ret = 1;
struct rlimit lims;
void *map;
 
-   if (getrlimit(RLIMIT_MEMLOCK, &lims)) {
-   perror("getrlimit");
-   return ret;
-   }
+   if (getrlimit(RLIMIT_MEMLOCK, &lims))
+   ksft_exit_fail_msg("getrlimit: %s\n", strerror(errno));
 
-   if (mlockall(MCL_ONFAULT | MCL_FUTURE)) {
-   perror("mlockall");
-   return ret;
-   }
+   if (mlockall(MCL_ONFAULT | MCL_FUTURE))
+   ksft_exit_fail_msg("mlockall: %s\n", strerror(errno));
 
map = mmap(NULL, 2 * lims.rlim_max, PROT_READ | PROT_WRITE,
   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
+
+   ksft_test_result(map == MAP_FAILED, "Failed mmap\n");
+
if (map != MAP_FAILED)
-   printf("mmap should have failed, but didn't\n");
-   else {
-   ret = 0;
munmap(map, 2 * lims.rlim_max);
-   }
-
munlockall();
-   return ret;
 }
 
 int main(int argc, char **argv)
 {
-   int ret = 0;
+   ksft_print_header();
+   ksft_set_plan(1);
+
+   if (getuid())
+   ksft_test_result_skip("Require root privileges to run\n");
+   else
+   test_limit();
 
-   ret += test_limit();
-   return ret;
+   ksft_finished();
 }
diff --git a/tools/testing/selftests/mm/run_vmtests.sh 
b/tools/testing/selftests/mm/run_vmtests.sh
index 12754af00b39c..863bbc2015332 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -294,7 +294,7 @@ echo "$nr_hugepgs" > /proc/sys/vm/nr_hugepages
 
 CATEGORY="compaction" run_test ./compaction_test
 
-CATEGORY="mlock" run_test sudo -u nobody ./on-fault-limit
+CATEGORY="mlock" run_test ./on-fault-limit
 
 CATEGORY="mmap" run_test ./map_populate
 
-- 
2.42.0
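
The patch above replaces sudo with a runtime check that reports a TAP "skip" result. A rough shell sketch of the same pattern, not code from the patch: the function name and its uid argument are hypothetical, and 4 is the value kselftest uses for KSFT_SKIP.

```shell
#!/bin/sh
KSFT_SKIP=4   # kselftest's conventional "skipped" exit code

# Hypothetical helper: emit a TAP skip line and return KSFT_SKIP when the
# given uid is not 0; return 0 (and print nothing) when it is root.
skip_unless_root() {
    uid=$1
    if [ "$uid" -ne 0 ]; then
        echo "ok 1 # SKIP Require root privileges to run"
        return "$KSFT_SKIP"
    fi
    return 0
}

# Usage in a test script:
#   echo "TAP version 13"; echo "1..1"
#   skip_unless_root "$(id -u)" || exit $?
```

Passing the uid in as an argument, rather than calling id inside the helper, keeps both branches easy to exercise from an unprivileged shell.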

[PATCH v2 1/2] selftests/mm: run_vmtests.sh: add missing tests

2024-01-22 Thread Muhammad Usama Anjum

Add missing tests to run_vmtests.sh. The mm kselftests are run through
run_vmtests.sh. If a test isn't present in this script, it'll not run
with run_tests or `make -C tools/testing/selftests/mm run_tests`.

Cc: Ryan Roberts 
Signed-off-by: Muhammad Usama Anjum 
---
Changes since v1:
- Also copy the original scripts and their dependency script to the install
  directory
---
 tools/testing/selftests/mm/Makefile   | 3 +++
 tools/testing/selftests/mm/run_vmtests.sh | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/tools/testing/selftests/mm/Makefile 
b/tools/testing/selftests/mm/Makefile
index 2453add65d12f..c9c8112a7262e 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -114,6 +114,9 @@ TEST_PROGS := run_vmtests.sh
 TEST_FILES := test_vmalloc.sh
 TEST_FILES += test_hmm.sh
 TEST_FILES += va_high_addr_switch.sh
+TEST_FILES += charge_reserved_hugetlb.sh
+TEST_FILES += write_hugetlb_memory.sh
+TEST_FILES += hugetlb_reparenting_test.sh
 
 include ../lib.mk
 
diff --git a/tools/testing/selftests/mm/run_vmtests.sh 
b/tools/testing/selftests/mm/run_vmtests.sh
index 246d53a5d7f28..12754af00b39c 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -248,6 +248,9 @@ CATEGORY="hugetlb" run_test ./map_hugetlb
 CATEGORY="hugetlb" run_test ./hugepage-mremap
 CATEGORY="hugetlb" run_test ./hugepage-vmemmap
 CATEGORY="hugetlb" run_test ./hugetlb-madvise
+CATEGORY="hugetlb" run_test ./charge_reserved_hugetlb.sh -cgroup-v2
+CATEGORY="hugetlb" run_test ./hugetlb_reparenting_test.sh -cgroup-v2
+CATEGORY="hugetlb" run_test ./hugetlb-read-hwpoison
 
 nr_hugepages_tmp=$(cat /proc/sys/vm/nr_hugepages)
 # For this test, we need one and just one huge page
-- 
2.42.0

Re: [PATCH] selftests/mm: run_vmtests.sh: add missing tests

2024-01-22 Thread Muhammad Usama Anjum

On 1/22/24 2:59 PM, Ryan Roberts wrote:
>>>> +CATEGORY="hugetlb" run_test ./hugetlb-read-hwpoison
>>>
>>> The addition of this test causes 2 later tests to fail with ENOMEM. I 
>>> suspect
>>> its a side-effect of marking the hugetlbs as hwpoisoned? (just a guess 
>>> based on
>>> the test name!). Once a page is marked poisoned, is there a way to 
>>> un-poison it?
>>> If not, I suspect that's why it wasn't part of the standard test script in 
>>> the
>>> first place.
>> hugetlb-read-hwpoison failed as probably the fix in the kernel for the test
>> hasn't been merged in the kernel. The other tests (uffd-stress) aren't
>> failing on my end and on CI [1][2]
> 
> To be clear, hugetlb-read-hwpoison isn't failing for me, its just causing the
> subsequent tests uffd-stress tests to fail. Both of those subsequent tests are
> allocating hugetlbs so my guess is that since this test is marking some 
> hugetlbs
> as poisoned, there are no longer enough for the subsequent tests.
> 
>>
>> [1] https://lava.collabora.dev/scheduler/job/12577207#L3677
>> [2] https://lava.collabora.dev/scheduler/job/12577229#L4027
>>
>> Maybe its configurations issue which is exposed now. Not sure. Maybe
>> hugetlb-read-hwpoison is changing some configuration and not restoring it.
> 
> Well yes - its marking some hugetlb pages as HWPOISONED.
> 
>> Maybe your system has less number of hugetlb pages.
> 
> YEs probably; What is hugetlb-read-hwpoison's requirement for size and number 
> of
> hugetlb pages? the run_vmtests.sh script allocates the required number of
> default-sized hugetlb pages before running any tests (I guess this value 
> should
> be increased for hugetlb-read-hwpoison's requirements?).
> 
> Additionally, our CI preallocates non-default sizes from the kernel command 
> line
> at boot. Happy to increase these if you can tell me what the new requirement 
> is:
I'm not sure about the exact number of hugetlb pages these tests require.
But I specify hugepages=1000 and the tests work for me.

I've sent v2 [1]. Would it be possible to run your CI on that and share
results before we merge that one?

[1]
https://lore.kernel.org/all/20240123073615.920324-1-usama.an...@collabora.com

> 
> hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2
> default_hugepagesz=2M hugepages=0:64,1:64 hugepagesz=64K hugepages=0:2,1:2
> 
> Thanks,
> Ryan
> 

-- 
BR,
Muhammad Usama Anjum
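
The thread above suspects that poisoned huge pages leave too few free ones for the tests that run afterwards. One way to check that, sketched under assumptions: read HugePages_Free before and after a single test and compare. The helper name and its optional file argument are hypothetical; HugePages_Free is the field /proc/meminfo reports for default-sized huge pages, so non-default sizes are not covered here.

```shell
#!/bin/sh
# Hypothetical helper: print the HugePages_Free count from a meminfo-style
# file (defaults to /proc/meminfo). Prints nothing if the field is absent.
hugepages_free() {
    awk '$1 == "HugePages_Free:" { print $2 }' "${1:-/proc/meminfo}"
}

# Usage around a single test:
#   before=$(hugepages_free)
#   ./hugetlb-read-hwpoison
#   after=$(hugepages_free)
#   echo "free huge pages: $before -> $after"
```

A drop that persists after the test exits would be consistent with the ENOMEM failures described in the thread, though it does not by itself prove that poisoning is the cause.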

Re: [PATCH v3 1/7] selftests/mm: hugepage-shm: conform test to TAP format output

2024-01-22 Thread Muhammad Usama Anjum

Hi Andrew,

There haven't been any comments on these. I guess they can be picked up now?

Thanks,

On 1/15/24 12:32 PM, Muhammad Usama Anjum wrote:
> Conform the layout, informational and status messages to TAP. No
> functional change is intended other than the layout of output messages.
> 
> The "." was being printed inside for loop to indicate the writes
> progress. This was extraneous and hence removed in the patch.
> 
> Signed-off-by: Muhammad Usama Anjum 
> ---
>  tools/testing/selftests/mm/hugepage-shm.c | 47 +++
>  1 file changed, 22 insertions(+), 25 deletions(-)
> 
> diff --git a/tools/testing/selftests/mm/hugepage-shm.c 
> b/tools/testing/selftests/mm/hugepage-shm.c
> index 478bb1e989e9..f949dbbc3454 100644
> --- a/tools/testing/selftests/mm/hugepage-shm.c
> +++ b/tools/testing/selftests/mm/hugepage-shm.c
> @@ -34,11 +34,10 @@
>  #include 
>  #include 
>  #include 
> +#include "../kselftest.h"
>  
>  #define LENGTH (256UL*1024*1024)
>  
> -#define dprintf(x)  printf(x)
> -
>  /* Only ia64 requires this */
>  #ifdef __ia64__
>  #define ADDR (void *)(0x8000UL)
> @@ -54,44 +53,42 @@ int main(void)
>   unsigned long i;
>   char *shmaddr;
>  
> + ksft_print_header();
> + ksft_set_plan(1);
> +
>   shmid = shmget(2, LENGTH, SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W);
> - if (shmid < 0) {
> - perror("shmget");
> - exit(1);
> - }
> - printf("shmid: 0x%x\n", shmid);
> + if (shmid < 0)
> + ksft_exit_fail_msg("shmget: %s\n", strerror(errno));
> +
> + ksft_print_msg("shmid: 0x%x\n", shmid);
>  
>   shmaddr = shmat(shmid, ADDR, SHMAT_FLAGS);
>   if (shmaddr == (char *)-1) {
> - perror("Shared memory attach failure");
>   shmctl(shmid, IPC_RMID, NULL);
> - exit(2);
> + ksft_exit_fail_msg("Shared memory attach failure: %s\n", 
> strerror(errno));
>   }
> - printf("shmaddr: %p\n", shmaddr);
>  
> - dprintf("Starting the writes:\n");
> - for (i = 0; i < LENGTH; i++) {
> + ksft_print_msg("shmaddr: %p\n", shmaddr);
> +
> + ksft_print_msg("Starting the writes:");
> + for (i = 0; i < LENGTH; i++)
>   shmaddr[i] = (char)(i);
> - if (!(i % (1024 * 1024)))
> - dprintf(".");
> - }
> - dprintf("\n");
> + ksft_print_msg("Done.\n");
>  
> - dprintf("Starting the Check...");
> + ksft_print_msg("Starting the Check...");
>   for (i = 0; i < LENGTH; i++)
> - if (shmaddr[i] != (char)i) {
> - printf("\nIndex %lu mismatched\n", i);
> - exit(3);
> - }
> - dprintf("Done.\n");
> + if (shmaddr[i] != (char)i)
> + ksft_exit_fail_msg("\nIndex %lu mismatched\n", i);
> + ksft_print_msg("Done.\n");
>  
>   if (shmdt((const void *)shmaddr) != 0) {
> - perror("Detach failure");
>   shmctl(shmid, IPC_RMID, NULL);
> - exit(4);
> + ksft_exit_fail_msg("Detach failure: %s\n", strerror(errno));
>   }
>  
>   shmctl(shmid, IPC_RMID, NULL);
>  
> - return 0;
> + ksft_test_result_pass("Completed test\n");
> +
> + ksft_finished();
>  }

-- 
BR,
Muhammad Usama Anjum

Re: [PATCH] selftests: core: include linux/close_range.h for CLOSE_RANGE_* macros

2024-01-22 Thread Muhammad Usama Anjum

Hi,

Could somebody please pick up this patch? It fixes a genuine regression on
some build systems.

Thanks,

On 10/24/23 8:51 PM, Muhammad Usama Anjum wrote:
> Correct header file is needed for getting CLOSE_RANGE_* macros.
> Previously it was tested with newer glibc which didn't show the need to
> include the header which was a mistake.
> 
> Fixes: ec54424923cf ("selftests: core: remove duplicate defines")
> Reported-by: Aishwarya TCV 
> Link: https://lore.kernel.org/all/7161219e-0223-d699-d6f3-81abd9abf...@arm.com
> Signed-off-by: Muhammad Usama Anjum 
> ---
>  tools/testing/selftests/core/close_range_test.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/testing/selftests/core/close_range_test.c 
> b/tools/testing/selftests/core/close_range_test.c
> index 534576f06df1c..c59e4adb905df 100644
> --- a/tools/testing/selftests/core/close_range_test.c
> +++ b/tools/testing/selftests/core/close_range_test.c
> @@ -12,6 +12,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "../kselftest_harness.h"
>  #include "../clone3/clone3_selftests.h"

-- 
BR,
Muhammad Usama Anjum

Re: [PATCH v2 4/4] selftests/resctrl: Add non-contiguous CBMs CAT test

2024-01-22 Thread Maciej Wieczór-Retman

On 2024-01-22 at 08:32:36 -0800, Reinette Chatre wrote:
>Hi Maciej,
>
>On 1/21/2024 11:56 PM, Maciej Wieczór-Retman wrote:
>> Hi!
>> 
>> On 2024-01-19 at 08:39:31 -0800, Reinette Chatre wrote:
>>> Hi Maciej,
>>>
>>> On 1/18/2024 11:37 PM, Maciej Wieczór-Retman wrote:
>>>> On 2024-01-18 at 09:15:46 -0800, Reinette Chatre wrote:
>>>>> On 1/18/2024 4:02 AM, Maciej Wieczór-Retman wrote:
>>>>>> On 2024-01-17 at 10:49:06 -0800, Reinette Chatre wrote:
>>>>>>> On 1/17/2024 12:26 AM, Maciej Wieczór-Retman wrote:
>>>>>>>> On 2024-01-08 at 14:42:11 -0800, Reinette Chatre wrote:
>>>>>>>>> On 12/12/2023 6:52 AM, Maciej Wieczor-Retman wrote:
>>>>>>>>>
>>>>>>>>>> +bit_center = count_bits(full_cache_mask) / 2;
>>>>>>>>>> +cont_mask = full_cache_mask >> bit_center;
>>>>>>>>>> +
>>>>>>>>>> +/* Contiguous mask write check. */
>>>>>>>>>> +snprintf(schemata, sizeof(schemata), "%lx", cont_mask);
>>>>>>>>>> +ret = write_schemata("", schemata, uparams->cpu,
>>>>>>>>>> test->resource);
>>>>>>>>>> +if (ret)
>>>>>>>>>> +return ret;
>>>>>>>>>
>>>>>>>>> How will user know what failed? I am seeing this single test exercise
>>>>>>>>> a few scenarios
>>>>>>>>> and it is not obvious to me if the issue will be clear if this test,
>>>>>>>>> noncont_cat_run_test(), fails.
>>>>>>>>
>>>>>>>> write_schemata() either succeeds with '0' or errors out with a
>>>>>>>> negative value. If
>>>>>>>> the contiguous mask write fails, write_schemata should print out what
>>>>>>>> was wrong
>>>>>>>> and I believe that the test will report an error rather than failure.
>>>>>>>
>>>>>>> Right. I am trying to understand whether the user will be able to
>>>>>>> decipher what failed
>>>>>>> in case there is an error. Seems like in this case the user is expected
>>>>>>> to look at the
>>>>>>> source code of the test to understand what the test was trying to do at
>>>>>>> the time it
>>>>>>> encountered the failure. In this case user may be "lucky" that this
>>>>>>> test only has
>>>>>>> one write_schemata() call _not_ followed by a ksft_print_msg() so user
>>>>>>> can use that
>>>>>>> reasoning to figure out which write_schemata() failed to further dig
>>>>>>> what test was
>>>>>>> trying to do.
>>>>>>
>>>>>> When a write_schemata() is executed the string that is being written gets
>>>>>> printed. If there are multiple calls in a single tests and one fails I'd
>>>>>> imagine
>>>>>> it would be easy for the user to figure out which one failed.
>>>>>
>>>>> It would be easy for the user the figure out if (a) it is obvious to the
>>>>> user
>>>>> what schema a particular write_schema() call attempted to write and (b)
>>>>> all the
>>>>> write_schema() calls attempt to write different schema.
>> 
>>>> As for (b) depends on what you meant. Other tests that run more than one
>>>> write_schemata() use different ones every time (CAT, MBM, MBA). Do you
>>>> suggest
>>>> that the non-contiguous test should attempt more schematas? For example
>>>> shift
>>>> the bit hole from one side to the other? I assumed one CBM with a centered
>>>> bit
>>>> hole would be enough to check if non-contiguous CBM feature works properly
>>>> and
>>>> more CBMs would be redundant.
>>>
>>> Let me try with an example.
>>> Scenario 1:
>>> The test has the following code:
>>> ...
>>> write_schemata(..., "0xfff", ...);
>>> ...
>>> write_schemata(..., "0xf0f", ...);
>>> ...
>>>
>>> Scenario 2:
>>> The test has the following code:
>>> ...
>>> write_schemata(..., "0xfff", ...);
>>> ...
>>> write_schemata(..., "0xfff", ...);
>>> ...
>>>
>>> A failure of either write_schemata() in scenario 1 will be easy to trace 
>>> since 
>>> the schemata attempted is different in each case. The schemata printed by 
>>> the
>>> write_schemata() error message can thus easily be connected to the specific
>>> write_schemata() call.
>>> A failure of either write_schemata() in scenario 2 is not so obvious since 
>>> they
>>> both attempted the same schemata so the error message printed by 
>>> write_schemata()
>>> could belong to either. 
>
>> I'm sorry to drag this thread out but I want to be sure if I'm right or are 
>> you
>> suggesting something and I missed it?
>
>Please just add a ksft_print_msg() to noncont_cat_run_test() when this
>write_schemata() fails.

My point all along was that if write_schemata() fails it already prints out all
the necessary information. I'd like to avoid adding redundant messages so please
take a look at how it looks now:

I injected write_schemata() with an error so it will take a path as if write()
failed with 'Permission denied' as a reason. Here is the output for L3
non-contiguous CAT test:

[root@spr1 ~]# ./resctrl_tests -t L3_NONCONT_CAT
TAP version 13
# Pass: Check kernel supports resctrl filesystem
# Pass: Check resctrl mountpoint "/sys/fs/resctrl" exists
# resctrl filesystem not mounted
# dmesg: [   18.579861] resctrl: L3 allocation detected
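
The hunk quoted earlier in this thread derives the contiguous test mask by shifting the full cache mask right by half the number of its set bits. A small sketch of that arithmetic in shell, assuming a full mask of 0xfff purely as an example value; count_bits here is a stand-in written for this sketch, not the selftest's C helper.

```shell
#!/bin/sh
# Stand-in for the C count_bits(): count the set bits of a non-negative value.
count_bits() {
    val=$(($1))
    n=0
    while [ "$val" -ne 0 ]; do
        n=$((n + (val & 1)))
        val=$((val >> 1))
    done
    echo "$n"
}

# Print the contiguous mask (hex, no 0x prefix) for a given full cache mask.
cont_mask() {
    full=$(($1))
    bit_center=$(( $(count_bits "$full") / 2 ))
    printf '%x\n' $((full >> bit_center))
}

cont_mask 0xfff   # 12 bits set, bit_center = 6, prints 3f
```

Printing the mask in the same hex form that is written to the schemata file makes it easier to match a failing write to the call that issued it, which is the traceability concern raised in the review.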
