RE: [Automated-testing] [RFC] Test catalog template
> -Original Message-
> From: automated-test...@lists.yoctoproject.org On Behalf Of Don Zickus
>
> Hi,
>
> At Linux Plumbers, a few dozen of us gathered together to discuss how
> to expose what tests subsystem maintainers would like to run for every
> patch submitted or when CI runs tests. We agreed on a mock up of a
> yaml template to start gathering info. The yaml file could be
> temporarily stored on kernelci.org until a more permanent home could
> be found. Attached is a template to start the conversation.

Don, I'm interested in this initiative. Is discussion going to be on a kernel mailing list, or on this e-mail, or somewhere else? See a few comments below.

> Longer story.
>
> The current problem is CI systems are not unanimous about what tests
> they run on submitted patches or git branches. This makes it
> difficult to figure out why a test failed or how to reproduce.
> Further, it isn't always clear what tests a normal contributor should
> run before posting patches.
>
> It has been long communicated that the tests LTP, xfstest and/or
> kselftests should be the tests to run.

Just saying "LTP" is not granular enough. LTP has hundreds of individual test programs, and it would be useful to specify the individual tests from LTP that should be run per sub-system.

I was particularly intrigued by the presentation at Plumbers about test coverage. It would be nice to have data (or easily replicable methods) for determining the code coverage of a test or set of tests, to indicate what parts of the kernel are being missed and help drive new test development.

> However, not all maintainers
> use those tests for their subsystems. I am hoping to either capture
> those tests or find ways to convince them to add their tests to the
> preferred locations.
>
> The goal is for a given subsystem (defined in MAINTAINERS), define a
> set of tests that should be run for any contributions to that
> subsystem.
> The hope is the collective CI results can be triaged collectively
> (because they are related) and even have the numerous flakes waived
> collectively (same reason) improving the ability to find and debug
> new test failures. Because the tests and process are known, having a
> human help debug any failures becomes easier.
>
> The plan is to put together a minimal yaml template that gets us going
> (even if it is not optimized yet) and aim for about a dozen or so
> subsystems. At that point we should have enough feedback to promote
> this more seriously and talk optimizations.

Sounds like a good place to start. Do we have some candidate sub-systems in mind? Has anyone volunteered to lead the way?

> Feedback encouraged.
>
> Cheers,
> Don
>
> ---
> # List of tests by subsystem
> #
> # Tests should adhere to KTAP definitions for results
> #
> # Description of section entries
> #
> #   maintainer: test maintainer - name
> #   list:       mailing list for discussion
> #   version:    stable version of the test
> #   dependency: necessary distro package for testing
> #   test:
> #     path:  internal git path or url to fetch from
> #     cmd:   command to run; ability to run locally
> #     param: additional param necessary to run test
> #   hardware: hardware necessary for validation

Is this something new in MAINTAINERS, or is it a separate file?

> #
> # Subsystems (alphabetical)
>
> KUNIT TEST:
>   maintainer:
>     - name: name1
>       email: email1
>     - name: name2
>       email: email2
>   list:
>   version:
>   dependency:
>     - dep1
>     - dep2
>   test:
>     - path: tools/testing/kunit
>       cmd:
>       param:
>     - path:
>       cmd:
>       param:
>   hardware: none

Looks OK so far - it'd be nice to have a few concrete examples.

-- Tim
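Picking up the request for concrete examples, a hypothetical filled-in entry for the template might look like the sketch below. Every value (names, addresses, version, dependency, command, and parameter) is an illustrative assumption rather than real maintainer data, and the exact cmd/param split is a guess at the template's intent:

```yaml
# Hypothetical example entry -- every value below is illustrative only
KUNIT TEST:
  maintainer:
    - name: Jane Doe
      email: jane.doe@example.org
  list: kunit-dev@googlegroups.com
  version: mainline
  dependency:
    - python3
  test:
    - path: tools/testing/kunit
      cmd: ./tools/testing/kunit/kunit.py run
      param: --alltests
  hardware: none
```

An entry like this would let a CI system (or a contributor, locally) mechanically discover both what to run and what it needs installed first.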
Re: [PATCH v5 19/30] arm64: add POE signal support
On Tue, Oct 15, 2024 at 01:25:29PM +0100, Joey Gouly wrote: > On Tue, Oct 15, 2024 at 12:41:16PM +0100, Will Deacon wrote: > > if (system_supports_poe() && err == 0 && user->poe_offset) { > > ... > > which gives the wrong impression that the POR is somehow optional, even > > if the CPU supports POE. So we should drop that check of > > 'user->poe_offset' as it cannot be NULL here. > That was cargo culted (by me) from the rest of the function (apart from TPIDR2 > and FPMR). I think Kevin is planning on sending his signal changes still, but > is on holiday, maybe he can remove the last part of the condition as part of > his series. That's there because the decisions about "should we save this thing" are taken in setup_sigframe_layout() and for a bunch of the extensions we suppress the saving if they're in some sort of default state (eg, when we don't have TIF_SVE set we don't output the SVE sigframe). signature.asc Description: PGP signature
Re: [PATCH v6 2/5] selftests: kvm: s390: Add uc_skey VM test case
On Tue Oct 15, 2024 at 3:22 PM CEST, Janosch Frank wrote: > On 10/15/24 10:37 AM, Christoph Schlameuss wrote: > > Add a test case manipulating s390 storage keys from within the ucontrol > > VM. > > > > Storage key instruction (ISKE, SSKE and RRBE) intercepts and > > Keyless-subset facility are disabled on first use, where the skeys are > > setup by KVM in non ucontrol VMs. > > > > [...] > > > -/* verify SIEIC exit > > +/* > > + * Disable skey intercepts and rewind last instruction > > + * (KVM would init the skeys here) > > + */ > > +static void uc_skey_enable(FIXTURE_DATA(uc_kvm) *self) > > +{ > > + struct kvm_s390_sie_block *sie_block = self->sie_block; > > + int ilen = insn_length(sie_block->ipa >> 8); > > + struct kvm_run *run = self->run; > > + > > + /* disable KSS */ > > + sie_block->cpuflags &= ~CPUSTAT_KSS; > > + /* disable skey inst interception */ > > + sie_block->ictl &= ~(ICTL_ISKE | ICTL_SSKE | ICTL_RRBE); > > + > > + /* rewind to reexecute intercepted instruction */ > > + run->psw_addr = run->psw_addr - ilen; > > There's a very important detail between KSS and the SKEY ICTLs: > KSS is (mostly) nullifying i.e. the PSW points to the instruction that > caused the KSS exit. > ICTL intercepts are suppressing which means the PSW points after the > instruction and hence we need to rewind the PSW if we want to re-issue > the instruction. > > Re-winding on a KSS intercept makes the guest cpu execute the > instruction before the intercept producing instruction twice. Oh, yes. You are right, I did mess that up in my cleanup. I will fix that. Here it does now only work since the KSS is not intercepted on the second invocation. But I am with you, it should only be executed once.
Re: [PATCH v5 19/30] arm64: add POE signal support
On Tue, Oct 15, 2024 at 12:41:16PM +0100, Will Deacon wrote: > On Tue, Oct 15, 2024 at 10:59:11AM +0100, Joey Gouly wrote: > > On Mon, Oct 14, 2024 at 06:10:23PM +0100, Will Deacon wrote: > > > Kevin, Joey, > > > > > > On Wed, Oct 09, 2024 at 03:43:01PM +0100, Will Deacon wrote: > > > > On Tue, Sep 24, 2024 at 01:27:58PM +0200, Kevin Brodsky wrote: > > > > > On 22/08/2024 17:11, Joey Gouly wrote: > > > > > > @@ -1178,6 +1237,9 @@ static void setup_return(struct pt_regs > > > > > > *regs, struct k_sigaction *ka, > > > > > > sme_smstop(); > > > > > > } > > > > > > > > > > > > + if (system_supports_poe()) > > > > > > + write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0); > > > > > > > > > > At the point where setup_return() is called, the signal frame has > > > > > already been written to the user stack. In other words, we write to > > > > > the > > > > > user stack first, and then reset POR_EL0. This may be problematic, > > > > > especially if we are using the alternate signal stack, which the > > > > > interrupted POR_EL0 may not grant access to. In that situation uaccess > > > > > will fail and we'll end up with a SIGSEGV. > > > > > > > > > > This issue has already been discussed on the x86 side, and as it > > > > > happens > > > > > patches to reset PKRU early [1] have just landed. I don't think this > > > > > is > > > > > a blocker for getting this series landed, but we should try and align > > > > > with x86. If there's no objection, I'm planning to work on a > > > > > counterpart > > > > > to the x86 series (resetting POR_EL0 early during signal delivery). > > > > > > > > Did you get a chance to work on that? It would be great to land the > > > > fixes for 6.12, if possible, so that the first kernel release with POE > > > > support doesn't land with known issues. > > > > > > Looking a little more at this, I think we have quite a weird behaviour > > > on arm64 as it stands. 
It looks like we rely on the signal frame to hold > > > the original POR_EL0 so, if for some reason we fail to allocate space > > > for the POR context, I think we'll return back from the signal with > > > POR_EL0_INIT. That seems bad? > > > > If we don't allocate space for POR_EL0, I think the program receives > > SIGSEGV? > > > > setup_sigframe_layout() > > if (system_supports_poe()) { > > err = sigframe_alloc(user, &user->poe_offset, > > sizeof(struct poe_context)); > > if (err) > > return err; > > } > > > > Through get_sigframe() and setup_rt_frame(), that eventually gets here: > > > > handle_signal() > > ret = setup_rt_frame(usig, ksig, oldset, regs); > > > > [..] > > > > signal_setup_done(ret, ksig, test_thread_flag(TIF_SINGLESTEP)); > > > > void signal_setup_done(int failed, struct ksignal *ksig, int stepping) > > > > > > { > > > > > > if (failed) > > > > > > force_sigsegv(ksig->sig); > > > > > > else > > > > > > signal_delivered(ksig, stepping); > > > > > > } > > > > So I think it's "fine"? > > Ah, yes, sorry about that. I got confused by the conditional push in > setup_sigframe(): > > if (system_supports_poe() && err == 0 && user->poe_offset) { > ... > > which gives the wrong impression that the POR is somehow optional, even > if the CPU supports POE. So we should drop that check of > 'user->poe_offset' as it cannot be NULL here. From memory and a quick glance at the code: For other "conditionally unconditional" things, we don't have a corresponding check on user->foo. For conditional stuff, non-NULLness of user->foo is used to track whether we decided to dump the corresponding record; for consistency here, if we have system_supports_poe() && err == 0, then that's sufficient (though in prior versions of this code, POR_EL0 dumping was conditional and so the extra check did do something...) In any case, if some allocation fails then we splat
Re: [PATCH v3 3/4] KVM: X86: Add documentation about behavioral difference for KVM_EXIT_BUS_LOCK
x86 (lowercase 'x') On Fri, Oct 04, 2024, Manali Shukla wrote: > Add a note about behavioral difference for KVM_EXIT_X86_BUS_LOCK > between AMD CPUs and Intel CPUs in KVM_CAP_X86_BUS_LOCK_EXIT > capability documentation. This belongs in the previous patch, especially if patch 2 is split into arch collateral and KVM implementation. The KVM changes are small enough that I don't see a reason to separate the documentation from the code. And there will likely be enough subtleties to the code that we'll want a bit more documentation.
Re: [PATCH net-next 04/10] selftests: RED: Use defer for test cleanup
Paolo Abeni writes: > On 10/9/24 14:06, Petr Machata wrote: >> @@ -450,6 +415,7 @@ __do_ecn_test() >> start_tcp_traffic $h1.$vlan $(ipaddr 1 $vlan) $(ipaddr 3 $vlan) \ >>$h3_mac tos=0x01 >> +defer stop_traffic $! >> sleep 1 >> ecn_test_common "$name" "$get_nmarked" $vlan $limit >> @@ -462,7 +428,6 @@ __do_ecn_test() >> check_fail $? "UDP traffic went into backlog instead of being >> early-dropped" >> log_test "TC $((vlan - 10)): $name backlog > limit: UDP early-dropped" >> - stop_traffic >> sleep 1 > > I'm wodering what role/goal has the above 'sleep 1'?!? It looks like it > could/should be removed > after moving the stop_traffic call to the deferred cleanup. Yeah, I'll drop these for v2.
[PATCH v6 0/5] selftests: kvm: s390: Add ucontrol memory selftests
This patch series adds some not-yet-picked selftests to the kvm s390x selftest suite. The additional test cases cover:

* Assert KVM_EXIT_S390_UCONTROL exit on unmapped memory access
* Assert functionality of storage keys in ucontrol VM
* Assert that memory region operations are rejected for ucontrol VMs

Running the test cases requires sys_admin capabilities to start the ucontrol VM. This can be achieved by running as root or with a command like:

sudo setpriv --reuid nobody --inh-caps -all,+sys_admin \
  --ambient-caps -all,+sys_admin --bounding-set -all,+sys_admin \
  ./ucontrol_test

---
The patches in this series have been part of the previous patch series. The test cases added here depend on the fixture added in the earlier patches. From v5 PATCH 7-9 the segment and page table generation has been removed and DAT has been disabled, since DAT is not necessary to validate the KVM code.
https://lore.kernel.org/kvm/20240807154512.316936-1-schlame...@linux.ibm.com/

v6:
- add instruction intercept handling for skey specific instructions (iske, sske, rrbe) in addition to the kss intercept, to work properly on all machines
- reorder local variables
- fix up some method comments
- add a patch correcting the IP.b value length in a debug message

v5:
- rebased to current upstream master
- corrected assertion on 0x00 to 0
- reworded fixup commit so that it can be merged on top of current upstream

v4:
- fix whitespaces in pointer function arguments (thanks Claudio)
- fix whitespaces in comments (thanks Janosch)

v3:
- fix skey assertion (thanks Claudio)
- introduce a wrapper around UCAS map and unmap ioctls to improve readability (Claudio)
- add a displacement to the accessed memory in the uc_map_unmap test to assert that translation intercepts actually point to segments
- add a misaligned failing mapping try to the uc_map_unmap test

v2:
- Reenable KSS intercept and handle it within the skey test.
- Modify the checked register between storing (sske) and reading (iske) it within the test program to make sure the.
- Add an additional state assertion at the end of uc_skey
- Fix some typos and white spaces.

v1:
- Remove segment and page table generation and disable DAT. This is not necessary to validate the KVM code.

Christoph Schlameuss (5):
  selftests: kvm: s390: Add uc_map_unmap VM test case
  selftests: kvm: s390: Add uc_skey VM test case
  selftests: kvm: s390: Verify reject memory region operations for ucontrol VMs
  selftests: kvm: s390: Fix whitespace confusion in ucontrol test
  selftests: kvm: s390: correct IP.b length in uc_handle_sieic debug output

 .../selftests/kvm/include/s390x/processor.h |   6 +
 .../selftests/kvm/s390x/ucontrol_test.c     | 307 +-
 2 files changed, 305 insertions(+), 8 deletions(-)

base-commit: eca631b8fe808748d7585059c4307005ca5c5820
--
2.47.0
[PATCH v6 1/5] selftests: kvm: s390: Add uc_map_unmap VM test case
Add a test case verifying basic running and interaction of ucontrol VMs. Fill the segment and page tables for allocated memory and map memory on first access. * uc_map_unmap Store and load data to mapped and unmapped memory and use pic segment translation handling to map memory on access. Signed-off-by: Christoph Schlameuss Reviewed-by: Janosch Frank --- .../selftests/kvm/s390x/ucontrol_test.c | 145 +- 1 file changed, 144 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/kvm/s390x/ucontrol_test.c b/tools/testing/selftests/kvm/s390x/ucontrol_test.c index f257beec1430..3e649b12a0b9 100644 --- a/tools/testing/selftests/kvm/s390x/ucontrol_test.c +++ b/tools/testing/selftests/kvm/s390x/ucontrol_test.c @@ -16,7 +16,11 @@ #include #include +#define PGM_SEGMENT_TRANSLATION 0x10 + #define VM_MEM_SIZE (4 * SZ_1M) +#define VM_MEM_EXT_SIZE (2 * SZ_1M) +#define VM_MEM_MAX_M ((VM_MEM_SIZE + VM_MEM_EXT_SIZE) / SZ_1M) /* so directly declare capget to check caps without libcap */ int capget(cap_user_header_t header, cap_user_data_t data); @@ -58,6 +62,23 @@ asm("test_gprs_asm:\n" " j 0b\n" ); +/* Test program manipulating memory */ +extern char test_mem_asm[]; +asm("test_mem_asm:\n" + "xgr%r0, %r0\n" + + "0:\n" + " ahi %r0,1\n" + " st %r1,0(%r5,%r6)\n" + + " xgr %r1,%r1\n" + " l %r1,0(%r5,%r6)\n" + " ahi %r0,1\n" + " diag0,0,0x44\n" + + " j 0b\n" +); + FIXTURE(uc_kvm) { struct kvm_s390_sie_block *sie_block; @@ -67,6 +88,7 @@ FIXTURE(uc_kvm) uintptr_t base_hva; uintptr_t code_hva; int kvm_run_size; + vm_paddr_t pgd; void *vm_mem; int vcpu_fd; int kvm_fd; @@ -116,7 +138,7 @@ FIXTURE_SETUP(uc_kvm) self->base_gpa = 0; self->code_gpa = self->base_gpa + (3 * SZ_1M); - self->vm_mem = aligned_alloc(SZ_1M, VM_MEM_SIZE); + self->vm_mem = aligned_alloc(SZ_1M, VM_MEM_MAX_M * SZ_1M); ASSERT_NE(NULL, self->vm_mem) TH_LOG("malloc failed %u", errno); self->base_hva = (uintptr_t)self->vm_mem; self->code_hva = self->base_hva - self->base_gpa + self->code_gpa; @@ -222,6 +244,60 @@ 
TEST(uc_cap_hpage) close(kvm_fd); } +/* calculate host virtual addr from guest physical addr */ +static void *gpa2hva(FIXTURE_DATA(uc_kvm) *self, u64 gpa) +{ + return (void *)(self->base_hva - self->base_gpa + gpa); +} + +/* map / make additional memory available */ +static int uc_map_ext(FIXTURE_DATA(uc_kvm) *self, u64 vcpu_addr, u64 length) +{ + struct kvm_s390_ucas_mapping map = { + .user_addr = (u64)gpa2hva(self, vcpu_addr), + .vcpu_addr = vcpu_addr, + .length = length, + }; + pr_info("ucas map %p %p 0x%llx", + (void *)map.user_addr, (void *)map.vcpu_addr, map.length); + return ioctl(self->vcpu_fd, KVM_S390_UCAS_MAP, &map); +} + +/* unmap previously mapped memory */ +static int uc_unmap_ext(FIXTURE_DATA(uc_kvm) *self, u64 vcpu_addr, u64 length) +{ + struct kvm_s390_ucas_mapping map = { + .user_addr = (u64)gpa2hva(self, vcpu_addr), + .vcpu_addr = vcpu_addr, + .length = length, + }; + pr_info("ucas unmap %p %p 0x%llx", + (void *)map.user_addr, (void *)map.vcpu_addr, map.length); + return ioctl(self->vcpu_fd, KVM_S390_UCAS_UNMAP, &map); +} + +/* handle ucontrol exit by mapping the accessed segment */ +static void uc_handle_exit_ucontrol(FIXTURE_DATA(uc_kvm) *self) +{ + struct kvm_run *run = self->run; + u64 seg_addr; + int rc; + + TEST_ASSERT_EQ(KVM_EXIT_S390_UCONTROL, run->exit_reason); + switch (run->s390_ucontrol.pgm_code) { + case PGM_SEGMENT_TRANSLATION: + seg_addr = run->s390_ucontrol.trans_exc_code & ~(SZ_1M - 1); + pr_info("ucontrol pic segment translation 0x%llx, mapping segment 0x%lx\n", + run->s390_ucontrol.trans_exc_code, seg_addr); + /* map / make additional memory available */ + rc = uc_map_ext(self, seg_addr, SZ_1M); + TEST_ASSERT_EQ(0, rc); + break; + default: + TEST_FAIL("UNEXPECTED PGM CODE %d", run->s390_ucontrol.pgm_code); + } +} + /* verify SIEIC exit * * fail on codes not expected in the test cases */ @@ -255,6 +331,12 @@ static bool uc_handle_exit(FIXTURE_DATA(uc_kvm) * self) struct kvm_run *run = self->run; switch (run->exit_reason) { + 
case KVM_EXIT_S390_UCONTROL: + /** check program interruption code +* handle page fault --> ucas map +*/ + uc_handle_exit_ucontrol(self); + break; case KVM_EXIT_S390_SIEIC: return uc_handle_sieic(self); def
[PATCH v6 5/5] selftests: kvm: s390: correct IP.b length in uc_handle_sieic debug output
The lengths of the interrupt parameters (IP) are: a: 2 bytes b: 4 bytes Signed-off-by: Christoph Schlameuss --- tools/testing/selftests/kvm/s390x/ucontrol_test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/kvm/s390x/ucontrol_test.c b/tools/testing/selftests/kvm/s390x/ucontrol_test.c index db8776269373..e69e22504428 100644 --- a/tools/testing/selftests/kvm/s390x/ucontrol_test.c +++ b/tools/testing/selftests/kvm/s390x/ucontrol_test.c @@ -375,7 +375,7 @@ static bool uc_handle_sieic(FIXTURE_DATA(uc_kvm) *self) struct kvm_run *run = self->run; /* check SIE interception code */ - pr_info("sieic: 0x%.2x 0x%.4x 0x%.4x\n", + pr_info("sieic: 0x%.2x 0x%.4x 0x%.8x\n", run->s390_sieic.icptcode, run->s390_sieic.ipa, run->s390_sieic.ipb); -- 2.47.0
[PATCH v6 3/5] selftests: kvm: s390: Verify reject memory region operations for ucontrol VMs
Add a test case verifying KVM_SET_USER_MEMORY_REGION and KVM_SET_USER_MEMORY_REGION2 cannot be executed on ucontrol VMs. Executing this test case on unpatched kernels will cause a null pointer dereference in the host kernel. This is fixed with commit 7816e58967d0 ("kvm: s390: Reject memory region operations for ucontrol VMs") Signed-off-by: Christoph Schlameuss Reviewed-by: Janosch Frank --- .../selftests/kvm/s390x/ucontrol_test.c | 22 +++ 1 file changed, 22 insertions(+) diff --git a/tools/testing/selftests/kvm/s390x/ucontrol_test.c b/tools/testing/selftests/kvm/s390x/ucontrol_test.c index 9568a4e03e4b..ae077bf838c8 100644 --- a/tools/testing/selftests/kvm/s390x/ucontrol_test.c +++ b/tools/testing/selftests/kvm/s390x/ucontrol_test.c @@ -439,6 +439,28 @@ static void uc_assert_diag44(FIXTURE_DATA(uc_kvm) * self) TEST_ASSERT_EQ(0x44, sie_block->ipb); } +TEST_F(uc_kvm, uc_no_user_region) +{ + struct kvm_userspace_memory_region region = { + .slot = 1, + .guest_phys_addr = self->code_gpa, + .memory_size = VM_MEM_EXT_SIZE, + .userspace_addr = (uintptr_t)self->code_hva, + }; + struct kvm_userspace_memory_region2 region2 = { + .slot = 1, + .guest_phys_addr = self->code_gpa, + .memory_size = VM_MEM_EXT_SIZE, + .userspace_addr = (uintptr_t)self->code_hva, + }; + + ASSERT_EQ(-1, ioctl(self->vm_fd, KVM_SET_USER_MEMORY_REGION, &region)); + ASSERT_EQ(EINVAL, errno); + + ASSERT_EQ(-1, ioctl(self->vm_fd, KVM_SET_USER_MEMORY_REGION2, &region2)); + ASSERT_EQ(EINVAL, errno); +} + TEST_F(uc_kvm, uc_map_unmap) { struct kvm_sync_regs *sync_regs = &self->run->s.regs; -- 2.47.0
[PATCH v6 2/5] selftests: kvm: s390: Add uc_skey VM test case
Add a test case manipulating s390 storage keys from within the ucontrol VM. Storage key instruction (ISKE, SSKE and RRBE) intercepts and Keyless-subset facility are disabled on first use, where the skeys are setup by KVM in non ucontrol VMs. Signed-off-by: Christoph Schlameuss --- .../selftests/kvm/include/s390x/processor.h | 6 + .../selftests/kvm/s390x/ucontrol_test.c | 130 +- 2 files changed, 134 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/kvm/include/s390x/processor.h b/tools/testing/selftests/kvm/include/s390x/processor.h index 481bd2fd6a32..33fef6fd9617 100644 --- a/tools/testing/selftests/kvm/include/s390x/processor.h +++ b/tools/testing/selftests/kvm/include/s390x/processor.h @@ -32,4 +32,10 @@ static inline void cpu_relax(void) barrier(); } +/* Get the instruction length */ +static inline int insn_length(unsigned char code) +{ + return ((((int)code + 64) >> 7) + 1) << 1; +} + #endif diff --git a/tools/testing/selftests/kvm/s390x/ucontrol_test.c b/tools/testing/selftests/kvm/s390x/ucontrol_test.c index 3e649b12a0b9..9568a4e03e4b 100644 --- a/tools/testing/selftests/kvm/s390x/ucontrol_test.c +++ b/tools/testing/selftests/kvm/s390x/ucontrol_test.c @@ -79,6 +79,33 @@ asm("test_mem_asm:\n" " j 0b\n" ); +/* Test program manipulating storage keys */ +extern char test_skey_asm[]; +asm("test_skey_asm:\n" + " xgr %r0,%r0\n" + + "0:\n" + " ahi %r0,1\n" + " st %r1,0(%r5,%r6)\n" + + " iske %r1,%r6\n" + " ahi %r0,1\n" + " diag 0,0,0x44\n" + + " sske %r1,%r6\n" + " xgr %r1,%r1\n" + " iske %r1,%r6\n" + " ahi %r0,1\n" + " diag 0,0,0x44\n" + + " rrbe %r1,%r6\n" + " iske %r1,%r6\n" + " ahi %r0,1\n" + " diag 0,0,0x44\n" + + " j 0b\n" +); + FIXTURE(uc_kvm) { struct kvm_s390_sie_block *sie_block; @@ -298,8 +325,49 @@ static void uc_handle_exit_ucontrol(FIXTURE_DATA(uc_kvm) *self) } } -/* verify SIEIC exit +/* + * Disable skey intercepts and rewind last instruction + * (KVM would init the skeys here) + */ +static void uc_skey_enable(FIXTURE_DATA(uc_kvm) *self) +{ +
struct kvm_s390_sie_block *sie_block = self->sie_block; + int ilen = insn_length(sie_block->ipa >> 8); + struct kvm_run *run = self->run; + + /* disable KSS */ + sie_block->cpuflags &= ~CPUSTAT_KSS; + /* disable skey inst interception */ + sie_block->ictl &= ~(ICTL_ISKE | ICTL_SSKE | ICTL_RRBE); + + /* rewind to reexecute intercepted instruction */ + run->psw_addr = run->psw_addr - ilen; + pr_info("rewind guest addr to 0x%.16llx\n", run->psw_addr); +} + +/* + * Handle the instruction intercept + * Returns if interception is handled / execution can be continued + */ +static bool uc_handle_insn_ic(FIXTURE_DATA(uc_kvm) *self) +{ + struct kvm_run *run = self->run; + + switch (run->s390_sieic.ipa) { + case 0xB229: /* ISKE */ + case 0xB22b: /* SSKE */ + case 0xB22a: /* RRBE */ + uc_skey_enable(self); + return true; + default: + return false; + } +} + +/* + * Handle the SIEIC exit * * fail on codes not expected in the test cases + * Returns if interception is handled / execution can be continued */ static bool uc_handle_sieic(FIXTURE_DATA(uc_kvm) * self) { @@ -315,7 +383,10 @@ static bool uc_handle_sieic(FIXTURE_DATA(uc_kvm) * self) case ICPT_INST: /* end execution in caller on intercepted instruction */ pr_info("sie instruction interception\n"); - return false; + return uc_handle_insn_ic(self); + case ICPT_KSS: + uc_skey_enable(self); + return true; case ICPT_OPEREXC: /* operation exception */ TEST_FAIL("sie exception on %.4x%.8x", sie_block->ipa, sie_block->ipb); @@ -472,4 +543,59 @@ TEST_F(uc_kvm, uc_gprs) ASSERT_EQ(1, sync_regs->gprs[0]); } +TEST_F(uc_kvm, uc_skey) +{ + struct kvm_sync_regs *sync_regs = &self->run->s.regs; + u64 test_vaddr = VM_MEM_SIZE - (SZ_1M / 2); + struct kvm_run *run = self->run; + const u8 skeyvalue = 0x34; + + /* copy test_skey_asm to code_hva / code_gpa */ + TH_LOG("copy code %p to vm mapped memory %p / %p", + &test_skey_asm, (void *)self->code_hva, (void *)self->code_gpa); + memcpy((void *)self->code_hva, &test_skey_asm, PAGE_SIZE); + + /* 
set register content for test_skey_asm to access not mapped memory */ + sync_regs->gprs[1] = skeyvalue; + sync_regs->gprs[5] = self->base_gpa; + sync_regs->gprs[6] = test_vaddr; + run->kvm_dirty_regs |= KVM_SYNC_GPRS; + + /* DAT disabled + 64 bit mod
[PATCH v6 4/5] selftests: kvm: s390: Fix whitespace confusion in ucontrol test
Checkpatch thinks that we're doing a multiplication but we're obviously not. Fix 4 instances where we adhered to wrong checkpatch advice. Signed-off-by: Christoph Schlameuss --- tools/testing/selftests/kvm/s390x/ucontrol_test.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/kvm/s390x/ucontrol_test.c b/tools/testing/selftests/kvm/s390x/ucontrol_test.c index ae077bf838c8..db8776269373 100644 --- a/tools/testing/selftests/kvm/s390x/ucontrol_test.c +++ b/tools/testing/selftests/kvm/s390x/ucontrol_test.c @@ -369,7 +369,7 @@ static bool uc_handle_insn_ic(FIXTURE_DATA(uc_kvm) *self) * * fail on codes not expected in the test cases * Returns if interception is handled / execution can be continued */ -static bool uc_handle_sieic(FIXTURE_DATA(uc_kvm) * self) +static bool uc_handle_sieic(FIXTURE_DATA(uc_kvm) *self) { struct kvm_s390_sie_block *sie_block = self->sie_block; struct kvm_run *run = self->run; @@ -397,7 +397,7 @@ static bool uc_handle_sieic(FIXTURE_DATA(uc_kvm) * self) } /* verify VM state on exit */ -static bool uc_handle_exit(FIXTURE_DATA(uc_kvm) * self) +static bool uc_handle_exit(FIXTURE_DATA(uc_kvm) *self) { struct kvm_run *run = self->run; @@ -417,7 +417,7 @@ static bool uc_handle_exit(FIXTURE_DATA(uc_kvm) * self) } /* run the VM until interrupted */ -static int uc_run_once(FIXTURE_DATA(uc_kvm) * self) +static int uc_run_once(FIXTURE_DATA(uc_kvm) *self) { int rc; @@ -428,7 +428,7 @@ static int uc_run_once(FIXTURE_DATA(uc_kvm) * self) return rc; } -static void uc_assert_diag44(FIXTURE_DATA(uc_kvm) * self) +static void uc_assert_diag44(FIXTURE_DATA(uc_kvm) *self) { struct kvm_s390_sie_block *sie_block = self->sie_block; -- 2.47.0
Re: [PATCH v2 5/6] iommu/arm-smmu-v3: Make smmuv3 set_dev_pasid() op support replace
On Thu, Sep 12, 2024 at 06:04:26AM -0700, Yi Liu wrote: > From: Jason Gunthorpe > > set_dev_pasid() op is going to be enhanced to support domain replacement > of a pasid. This prepares for this op definition. > > Signed-off-by: Jason Gunthorpe > Signed-off-by: Yi Liu > --- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 8 +--- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 +- > 3 files changed, 7 insertions(+), 5 deletions(-) > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c > index 645da7b69bed..1d3e71569775 100644 > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c > @@ -349,7 +349,7 @@ static int arm_smmu_sva_set_dev_pasid(struct iommu_domain > *domain, >* get reassigned >*/ > arm_smmu_make_sva_cd(&target, master, domain->mm, smmu_domain->cd.asid); > - ret = arm_smmu_set_pasid(master, smmu_domain, id, &target); > + ret = arm_smmu_set_pasid(master, smmu_domain, id, &target, old); > > mmput(domain->mm); > return ret; > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > index ed2b106e02dd..f7a73b854151 100644 > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > @@ -2824,7 +2824,8 @@ static int arm_smmu_attach_dev(struct iommu_domain > *domain, struct device *dev) > } > > static int arm_smmu_s1_set_dev_pasid(struct iommu_domain *domain, > - struct device *dev, ioasid_t id) > + struct device *dev, ioasid_t id, > + struct iommu_domain *old) > { > struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); > struct arm_smmu_master *master = dev_iommu_priv_get(dev); > @@ -2850,7 +2851,7 @@ static int arm_smmu_s1_set_dev_pasid(struct > iommu_domain *domain, >*/ > arm_smmu_make_s1_cd(&target_cd, master, smmu_domain); > return arm_smmu_set_pasid(master, to_smmu_domain(domain), id, 
> - &target_cd); > + &target_cd, old); > } > > static void arm_smmu_update_ste(struct arm_smmu_master *master, > @@ -2880,7 +2881,7 @@ static void arm_smmu_update_ste(struct arm_smmu_master > *master, > > int arm_smmu_set_pasid(struct arm_smmu_master *master, > struct arm_smmu_domain *smmu_domain, ioasid_t pasid, > -struct arm_smmu_cd *cd) > +struct arm_smmu_cd *cd, struct iommu_domain *old) > { > struct iommu_domain *sid_domain = iommu_get_domain_for_dev(master->dev); > struct arm_smmu_attach_state state = { > @@ -2890,6 +2891,7 @@ int arm_smmu_set_pasid(struct arm_smmu_master *master, >* already attached, no need to set old_domain. >*/ > .ssid = pasid, > + .old_domain = old, nit: The existing comment says "no need to set old_domain" and now you're doing exactly that! Please can you update the commentary (probably just remove the comment entirely)? Cheers, Will
Re: [PATCH v3 3/4] vfio: Add VFIO_DEVICE_PASID_[AT|DE]TACH_IOMMUFD_PT
On 2024/10/14 23:49, Alex Williamson wrote: On Sat, 12 Oct 2024 21:49:05 +0800 Yi Liu wrote: On 2024/10/1 20:11, Jason Gunthorpe wrote: On Mon, Sep 30, 2024 at 07:55:08AM +, Tian, Kevin wrote: +struct vfio_device_pasid_attach_iommufd_pt { + __u32 argsz; + __u32 flags; + __u32 pasid; + __u32 pt_id; +}; + +#define VFIO_DEVICE_PASID_ATTACH_IOMMUFD_PT_IO(VFIO_TYPE, VFIO_BASE + 21) Not sure whether this was discussed before. Does it make sense to reuse the existing VFIO_DEVICE_ATTACH_IOMMUFD_PT by introducing a new pasid field and a new flag bit? Maybe? I don't have a strong feeling either way. There is somewhat less code if you reuse the ioctl at least I had a rough memory that I was suggested to add a separate ioctl for PASID. Let's see Alex's opinion. I don't recall any previous arguments for separate ioctls, but it seems to make a lot of sense to me to extend the existing ioctls with a flag to indicate pasid cscope and id. Thanks, thanks for the confirmation. How about the below? diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c index bb1817bd4ff3..c78533bce3c6 100644 --- a/drivers/vfio/device_cdev.c +++ b/drivers/vfio/device_cdev.c @@ -162,21 +162,34 @@ void vfio_df_unbind_iommufd(struct vfio_device_file *df) int vfio_df_ioctl_attach_pt(struct vfio_device_file *df, struct vfio_device_attach_iommufd_pt __user *arg) { - struct vfio_device *device = df->device; + unsigned long minsz = offsetofend( + struct vfio_device_attach_iommufd_pt, pt_id); struct vfio_device_attach_iommufd_pt attach; - unsigned long minsz; + struct vfio_device *device = df->device; + u32 user_size; int ret; - minsz = offsetofend(struct vfio_device_attach_iommufd_pt, pt_id); + ret = get_user(user_size, (u32 __user *)arg); + if (ret) + return ret; - if (copy_from_user(&attach, arg, minsz)) - return -EFAULT; + ret = copy_struct_from_user(&attach, sizeof(attach), arg, user_size); + if (ret) + return ret; - if (attach.argsz < minsz || attach.flags) + if (attach.argsz < minsz || 
attach.flags & (~VFIO_DEVICE_ATTACH_PASID)) return -EINVAL; + if ((attach.flags & VFIO_DEVICE_ATTACH_PASID) && + !device->ops->pasid_attach_ioas) + return -EOPNOTSUPP; + mutex_lock(&device->dev_set->lock); - ret = device->ops->attach_ioas(device, &attach.pt_id); + if (attach.flags & VFIO_DEVICE_ATTACH_PASID) + ret = device->ops->pasid_attach_ioas(device, attach.pasid, +&attach.pt_id); + else + ret = device->ops->attach_ioas(device, &attach.pt_id); if (ret) goto out_unlock; @@ -198,20 +211,33 @@ int vfio_df_ioctl_attach_pt(struct vfio_device_file *df, int vfio_df_ioctl_detach_pt(struct vfio_device_file *df, struct vfio_device_detach_iommufd_pt __user *arg) { - struct vfio_device *device = df->device; + unsigned long minsz = + offsetofend(struct vfio_device_detach_iommufd_pt, flags); struct vfio_device_detach_iommufd_pt detach; - unsigned long minsz; + struct vfio_device *device = df->device; + u32 user_size; + int ret; - minsz = offsetofend(struct vfio_device_detach_iommufd_pt, flags); + ret = get_user(user_size, (u32 __user *)arg); + if (ret) + return ret; - if (copy_from_user(&detach, arg, minsz)) - return -EFAULT; + ret = copy_struct_from_user(&detach, sizeof(detach), arg, user_size); + if (ret) + return ret; - if (detach.argsz < minsz || detach.flags) + if (detach.argsz < minsz || detach.flags & (~VFIO_DEVICE_DETACH_PASID)) return -EINVAL; + if ((detach.flags & VFIO_DEVICE_DETACH_PASID) && + !device->ops->pasid_detach_ioas) + return -EOPNOTSUPP; + mutex_lock(&device->dev_set->lock); - device->ops->detach_ioas(device); + if (detach.flags & VFIO_DEVICE_DETACH_PASID) + device->ops->pasid_detach_ioas(device, detach.pasid); + else + device->ops->detach_ioas(device); mutex_unlock(&device->dev_set->lock); return 0; diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index 2b68e6cdf190..40b414e642f5 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -931,29 +931,34 @@ struct vfio_device_bind_iommufd { * VFIO_DEVICE_ATTACH_IOMMUFD_PT 
- _IOW(VFIO_TYPE, VFIO_BASE + 19, * struct vfio_device_attach_iommufd_pt) * @argsz: User filled size of this data. - * @flags: Must be 0. + * @flags: Flags for attach. * @pt_id: Input the target id which can repr
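To illustrate the size-extension semantics being proposed above, here is a hedged user-space sketch. The helper `copy_struct_from_user_model()` and the `attach_v1`/`attach_v2` structs are hypothetical stand-ins; the real `copy_struct_from_user()` lives in the kernel, but its contract is the same: tolerate both older (smaller) and newer (larger) user structs, and reject a larger user struct whose trailing bytes are nonzero.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of the kernel's copy_struct_from_user(). */
static int copy_struct_from_user_model(void *dst, size_t ksize,
				       const void *src, size_t usize)
{
	size_t size = ksize < usize ? ksize : usize;

	/* If the user struct is larger, all trailing bytes must be zero;
	 * otherwise the user is requesting features we don't know about. */
	if (usize > ksize) {
		const uint8_t *p = (const uint8_t *)src + ksize;
		for (size_t i = 0; i < usize - ksize; i++)
			if (p[i])
				return -E2BIG;
	}
	memset(dst, 0, ksize);	/* fields the user didn't send read as zero */
	memcpy(dst, src, size);
	return 0;
}

/* Hypothetical old and extended layouts, mirroring the flag/pasid idea. */
struct attach_v1 { uint32_t argsz, flags, pt_id; };
struct attach_v2 { uint32_t argsz, flags, pt_id, pasid; };
```

An old binary passing `attach_v1` into a kernel that knows `attach_v2` gets `pasid == 0`; a new binary passing a nonzero `pasid` into a kernel that only knows `attach_v1` gets `-E2BIG`, which is why extending the existing ioctl is backward compatible.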
Re: [PATCH v5 19/30] arm64: add POE signal support
On Tue, Oct 15, 2024 at 12:41:16PM +0100, Will Deacon wrote: > On Tue, Oct 15, 2024 at 10:59:11AM +0100, Joey Gouly wrote: > > On Mon, Oct 14, 2024 at 06:10:23PM +0100, Will Deacon wrote: > > > Kevin, Joey, > > > > > > On Wed, Oct 09, 2024 at 03:43:01PM +0100, Will Deacon wrote: > > > > On Tue, Sep 24, 2024 at 01:27:58PM +0200, Kevin Brodsky wrote: > > > > > On 22/08/2024 17:11, Joey Gouly wrote: > > > > > > @@ -1178,6 +1237,9 @@ static void setup_return(struct pt_regs > > > > > > *regs, struct k_sigaction *ka, > > > > > > sme_smstop(); > > > > > > } > > > > > > > > > > > > + if (system_supports_poe()) > > > > > > + write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0); > > > > > > > > > > At the point where setup_return() is called, the signal frame has > > > > > already been written to the user stack. In other words, we write to > > > > > the > > > > > user stack first, and then reset POR_EL0. This may be problematic, > > > > > especially if we are using the alternate signal stack, which the > > > > > interrupted POR_EL0 may not grant access to. In that situation uaccess > > > > > will fail and we'll end up with a SIGSEGV. > > > > > > > > > > This issue has already been discussed on the x86 side, and as it > > > > > happens > > > > > patches to reset PKRU early [1] have just landed. I don't think this > > > > > is > > > > > a blocker for getting this series landed, but we should try and align > > > > > with x86. If there's no objection, I'm planning to work on a > > > > > counterpart > > > > > to the x86 series (resetting POR_EL0 early during signal delivery). > > > > > > > > Did you get a chance to work on that? It would be great to land the > > > > fixes for 6.12, if possible, so that the first kernel release with POE > > > > support doesn't land with known issues. > > > > > > Looking a little more at this, I think we have quite a weird behaviour > > > on arm64 as it stands. 
It looks like we rely on the signal frame to hold > > > the original POR_EL0 so, if for some reason we fail to allocate space > > > for the POR context, I think we'll return back from the signal with > > > POR_EL0_INIT. That seems bad? > > If we don't allocate space for POR_EL0, I think the program receives > SIGSEGV? > > setup_sigframe_layout() > > if (system_supports_poe()) { > > err = sigframe_alloc(user, &user->poe_offset, > > sizeof(struct poe_context)); > > if (err) > > return err; > > } > > > > Through get_sigframe() and setup_rt_frame(), that eventually gets here: > > > > handle_signal() > > ret = setup_rt_frame(usig, ksig, oldset, regs); > > > > [..] > > > > signal_setup_done(ret, ksig, test_thread_flag(TIF_SINGLESTEP)); > > > > void signal_setup_done(int failed, struct ksignal *ksig, int stepping) > > > > > > { > > > > > > if (failed) > > > > > > force_sigsegv(ksig->sig); > > > > > > else > > > > > > signal_delivered(ksig, stepping); > > > > > > } > > > > So I think it's "fine"? > > Ah, yes, sorry about that. I got confused by the conditional push in > setup_sigframe(): > > if (system_supports_poe() && err == 0 && user->poe_offset) { > ... > > which gives the wrong impression that the POR is somehow optional, even > if the CPU supports POE. So we should drop that check of > 'user->poe_offset' as it cannot be NULL here. > > We also still need to resolve Kevin's concern, which probably means > keeping the thread's original POR around someplace. That was cargo-culted (by me) from the rest of the function (apart from TPIDR2 and FPMR). I think Kevin is planning on sending his signal changes still, but is on holiday; maybe he can remove the last part of the condition as part of his series. Thanks, Joey
Re: [PATCH v6 2/5] selftests: kvm: s390: Add uc_skey VM test case
On 10/15/24 10:37 AM, Christoph Schlameuss wrote: Add a test case manipulating s390 storage keys from within the ucontrol VM. Storage key instruction (ISKE, SSKE and RRBE) intercepts and Keyless-subset facility are disabled on first use, where the skeys are set up by KVM in non ucontrol VMs. [...] -/* verify SIEIC exit +/* + * Disable skey intercepts and rewind last instruction + * (KVM would init the skeys here) + */ +static void uc_skey_enable(FIXTURE_DATA(uc_kvm) *self) +{ + struct kvm_s390_sie_block *sie_block = self->sie_block; + int ilen = insn_length(sie_block->ipa >> 8); + struct kvm_run *run = self->run; + + /* disable KSS */ + sie_block->cpuflags &= ~CPUSTAT_KSS; + /* disable skey inst interception */ + sie_block->ictl &= ~(ICTL_ISKE | ICTL_SSKE | ICTL_RRBE); + + /* rewind to reexecute intercepted instruction */ + run->psw_addr = run->psw_addr - ilen; There's a very important detail between KSS and the SKEY ICTLs: KSS is (mostly) nullifying, i.e. the PSW points to the instruction that caused the KSS exit. ICTL intercepts are suppressing, which means the PSW points after the instruction, and hence we need to rewind the PSW if we want to re-issue the instruction. Rewinding on a KSS intercept makes the guest CPU execute the instruction before the intercept-producing instruction twice.
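The nullifying-vs-suppressing distinction above can be sketched in plain C. The `insn_length()` computation below is the same one the selftest adds; `rewind_psw()` is a hypothetical helper for illustration, showing why the rewind must only happen for the suppressing ICTL intercepts and not for KSS.

```c
#include <assert.h>

/* s390 encodes an instruction's length in the top two bits of its first
 * opcode byte: 00 -> 2 bytes, 01/10 -> 4 bytes, 11 -> 6 bytes. */
static int insn_length(unsigned char code)
{
	return ((((int)code + 64) >> 7) + 1) << 1;
}

/* Suppressing intercepts (the ICTL_* ones) leave the PSW *after* the
 * instruction, so re-executing it means rewinding by its length.
 * Nullifying intercepts (KSS) already point at the instruction, so
 * rewinding there would re-run the *previous* instruction. */
static unsigned long long rewind_psw(unsigned long long psw_addr,
				     unsigned short ipa, int suppressing)
{
	return suppressing ? psw_addr - insn_length(ipa >> 8) : psw_addr;
}
```

For example, ISKE (opcode 0xB229) has a 0xB2 first byte, top bits 10, so it is 4 bytes long and a suppressing exit rewinds the PSW by 4.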
Re: [PATCH net-next 01/10] selftests: net: lib: Introduce deferred commands
Paolo Abeni writes: > Hi, > > On 10/9/24 14:06, Petr Machata wrote: >> diff --git a/tools/testing/selftests/net/lib/sh/defer.sh >> b/tools/testing/selftests/net/lib/sh/defer.sh >> new file mode 100644 >> index ..8d205c3f0445 >> --- /dev/null >> +++ b/tools/testing/selftests/net/lib/sh/defer.sh >> @@ -0,0 +1,115 @@ >> +#!/bin/bash >> +# SPDX-License-Identifier: GPL-2.0 >> + >> +# map[(scope_id,track,cleanup_id) -> cleanup_command] >> +# track={d=default | p=priority} >> +declare -A __DEFER__JOBS >> + >> +# map[(scope_id,track) -> # cleanup_commands] >> +declare -A __DEFER__NJOBS >> + >> +# scope_id of the topmost scope. >> +__DEFER__SCOPE_ID=0 >> + >> +__defer__ndefer_key() >> +{ >> +local track=$1; shift > > Minor nit: IMHO the trailing shift is here a bit confusing: it lets me > think about other arguments, which are not really expected. This is IMHO how a function header should look: function() { local foo=$1; shift local bar=$1; shift local baz=$1; shift ... } Because it lets you reorder the arguments freely just by reordering the lines, copy argument subsets to other functions without risking forgetting / screwing up renumbering, etc. It's easy to parse visually as well. If the function uses "$@" as a rest argument, it will contain the rest by default. It's just a very convenient notation overall. The vast majority of net/lib.sh and net/forwarding/lib.sh use this. >> +__defer__schedule() >> +{ >> +local track=$1; shift >> +local ndefers=$(__defer__ndefers $track) >> +local ndefers_key=$(__defer__ndefer_key $track) >> +local defer_key=$(__defer__defer_key $track $ndefers) >> +local defer="$@" >> + >> +__DEFER__JOBS[$defer_key]="$defer" >> +__DEFER__NJOBS[$ndefers_key]=$((${__DEFER__NJOBS[$ndefers_key]} + 1)) > > '${__DEFER__NJOBS[$ndefers_key]}' is actually '$ndefers', right? If so > it would be better to reuse the available variable.
I figured I would leave it all spelled out, because the left hand side needs to be, and having the same expression on both sides makes it clear that this is just an X++ sort of a deal.
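The argument-popping header style described above can be sketched as a small, self-contained example (the function name and arguments are hypothetical, not from the patch):

```shell
#!/bin/bash
# Sketch of the "local arg=$1; shift" function-header idiom.  Each
# parameter is popped with its own line, so lines can be reordered or
# copied to other functions without renumbering $1..$N, and whatever
# remains in "$@" is the natural rest argument.
run_test()
{
	local netns=$1; shift
	local ifname=$1; shift
	# Everything left in "$@" is the command to run.

	echo "ns=$netns if=$ifname cmd=$*"
}

run_test testns eth0 ping -c 1 192.0.2.1
```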
Re: [PATCH net-next 04/10] selftests: RED: Use defer for test cleanup
On 10/9/24 14:06, Petr Machata wrote: @@ -450,6 +415,7 @@ __do_ecn_test() start_tcp_traffic $h1.$vlan $(ipaddr 1 $vlan) $(ipaddr 3 $vlan) \ $h3_mac tos=0x01 + defer stop_traffic $! sleep 1 ecn_test_common "$name" "$get_nmarked" $vlan $limit @@ -462,7 +428,6 @@ __do_ecn_test() check_fail $? "UDP traffic went into backlog instead of being early-dropped" log_test "TC $((vlan - 10)): $name backlog > limit: UDP early-dropped" - stop_traffic sleep 1 I'm wondering what role/goal the above 'sleep 1' has?!? It looks like it could/should be removed after moving the stop_traffic call to the deferred cleanup. Other similar instances below. Cheers, Paolo
[PATCH v6 2/5] selftests: kvm: s390: Add uc_skey VM test case
Add a test case manipulating s390 storage keys from within the ucontrol VM. Storage key instruction (ISKE, SSKE and RRBE) intercepts and Keyless-subset facility are disabled on first use, where the skeys are set up by KVM in non ucontrol VMs. Signed-off-by: Christoph Schlameuss --- .../selftests/kvm/include/s390x/processor.h | 6 + .../selftests/kvm/s390x/ucontrol_test.c | 130 +- 2 files changed, 134 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/kvm/include/s390x/processor.h b/tools/testing/selftests/kvm/include/s390x/processor.h index 481bd2fd6a32..33fef6fd9617 100644 --- a/tools/testing/selftests/kvm/include/s390x/processor.h +++ b/tools/testing/selftests/kvm/include/s390x/processor.h @@ -32,4 +32,10 @@ static inline void cpu_relax(void) barrier(); } +/* Get the instruction length */ +static inline int insn_length(unsigned char code) +{ + return ((((int)code + 64) >> 7) + 1) << 1; +} + #endif diff --git a/tools/testing/selftests/kvm/s390x/ucontrol_test.c b/tools/testing/selftests/kvm/s390x/ucontrol_test.c index 3e649b12a0b9..9568a4e03e4b 100644 --- a/tools/testing/selftests/kvm/s390x/ucontrol_test.c +++ b/tools/testing/selftests/kvm/s390x/ucontrol_test.c @@ -79,6 +79,33 @@ asm("test_mem_asm:\n" " j 0b\n" ); +/* Test program manipulating storage keys */ +extern char test_skey_asm[]; +asm("test_skey_asm:\n" + "xgr %r0, %r0\n" + + "0:\n" + " ahi %r0,1\n" + " st %r1,0(%r5,%r6)\n" + + " iske %r1,%r6\n" + " ahi %r0,1\n" + " diag 0,0,0x44\n" + + " sske %r1,%r6\n" + " xgr %r1,%r1\n" + " iske %r1,%r6\n" + " ahi %r0,1\n" + " diag 0,0,0x44\n" + + " rrbe %r1,%r6\n" + " iske %r1,%r6\n" + " ahi %r0,1\n" + " diag 0,0,0x44\n" + + " j 0b\n" +); + FIXTURE(uc_kvm) { struct kvm_s390_sie_block *sie_block; @@ -298,8 +325,49 @@ static void uc_handle_exit_ucontrol(FIXTURE_DATA(uc_kvm) *self) } } -/* verify SIEIC exit +/* + * Disable skey intercepts and rewind last instruction + * (KVM would init the skeys here) + */ +static void uc_skey_enable(FIXTURE_DATA(uc_kvm) *self) +{ +
struct kvm_s390_sie_block *sie_block = self->sie_block; + int ilen = insn_length(sie_block->ipa >> 8); + struct kvm_run *run = self->run; + + /* disable KSS */ + sie_block->cpuflags &= ~CPUSTAT_KSS; + /* disable skey inst interception */ + sie_block->ictl &= ~(ICTL_ISKE | ICTL_SSKE | ICTL_RRBE); + + /* rewind to reexecute intercepted instruction */ + run->psw_addr = run->psw_addr - ilen; + pr_info("rewind guest addr to 0x%.16llx\n", run->psw_addr); +} + +/* + * Handle the instruction intercept + * Returns if interception is handled / execution can be continued + */ +static bool uc_handle_insn_ic(FIXTURE_DATA(uc_kvm) *self) +{ + struct kvm_run *run = self->run; + + switch (run->s390_sieic.ipa) { + case 0xB229: /* ISKE */ + case 0xB22b: /* SSKE */ + case 0xB22a: /* RRBE */ + uc_skey_enable(self); + return true; + default: + return false; + } +} + +/* + * Handle the SIEIC exit * * fail on codes not expected in the test cases + * Returns if interception is handled / execution can be continued */ static bool uc_handle_sieic(FIXTURE_DATA(uc_kvm) * self) { @@ -315,7 +383,10 @@ static bool uc_handle_sieic(FIXTURE_DATA(uc_kvm) * self) case ICPT_INST: /* end execution in caller on intercepted instruction */ pr_info("sie instruction interception\n"); - return false; + return uc_handle_insn_ic(self); + case ICPT_KSS: + uc_skey_enable(self); + return true; case ICPT_OPEREXC: /* operation exception */ TEST_FAIL("sie exception on %.4x%.8x", sie_block->ipa, sie_block->ipb); @@ -472,4 +543,59 @@ TEST_F(uc_kvm, uc_gprs) ASSERT_EQ(1, sync_regs->gprs[0]); } +TEST_F(uc_kvm, uc_skey) +{ + struct kvm_sync_regs *sync_regs = &self->run->s.regs; + u64 test_vaddr = VM_MEM_SIZE - (SZ_1M / 2); + struct kvm_run *run = self->run; + const u8 skeyvalue = 0x34; + + /* copy test_skey_asm to code_hva / code_gpa */ + TH_LOG("copy code %p to vm mapped memory %p / %p", + &test_skey_asm, (void *)self->code_hva, (void *)self->code_gpa); + memcpy((void *)self->code_hva, &test_skey_asm, PAGE_SIZE); + + /* 
set register content for test_skey_asm to access not mapped memory */ + sync_regs->gprs[1] = skeyvalue; + sync_regs->gprs[5] = self->base_gpa; + sync_regs->gprs[6] = test_vaddr; + run->kvm_dirty_regs |= KVM_SYNC_GPRS; + + /* DAT disabled + 64 bit mod
Re: [PATCH v6 2/5] selftests: kvm: s390: Add uc_skey VM test case
Please ignore this one. I tricked myself in my patch generation and regenerated the original patch.
[PATCH v6 2/5] selftests: kvm: s390: Add uc_skey VM test case
Add a test case manipulating s390 storage keys from within the ucontrol VM. Storage key instruction (ISKE, SSKE and RRBE) intercepts and Keyless-subset facility are disabled on first use, where the skeys are set up by KVM in non ucontrol VMs. Signed-off-by: Christoph Schlameuss --- .../selftests/kvm/include/s390x/processor.h | 6 + .../selftests/kvm/s390x/ucontrol_test.c | 130 +- 2 files changed, 134 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/kvm/include/s390x/processor.h b/tools/testing/selftests/kvm/include/s390x/processor.h index 481bd2fd6a32..33fef6fd9617 100644 --- a/tools/testing/selftests/kvm/include/s390x/processor.h +++ b/tools/testing/selftests/kvm/include/s390x/processor.h @@ -32,4 +32,10 @@ static inline void cpu_relax(void) barrier(); } +/* Get the instruction length */ +static inline int insn_length(unsigned char code) +{ + return ((((int)code + 64) >> 7) + 1) << 1; +} + #endif diff --git a/tools/testing/selftests/kvm/s390x/ucontrol_test.c b/tools/testing/selftests/kvm/s390x/ucontrol_test.c index 3e649b12a0b9..c1bc199d6e85 100644 --- a/tools/testing/selftests/kvm/s390x/ucontrol_test.c +++ b/tools/testing/selftests/kvm/s390x/ucontrol_test.c @@ -79,6 +79,33 @@ asm("test_mem_asm:\n" " j 0b\n" ); +/* Test program manipulating storage keys */ +extern char test_skey_asm[]; +asm("test_skey_asm:\n" + "xgr %r0, %r0\n" + + "0:\n" + " ahi %r0,1\n" + " st %r1,0(%r5,%r6)\n" + + " iske %r1,%r6\n" + " ahi %r0,1\n" + " diag 0,0,0x44\n" + + " sske %r1,%r6\n" + " xgr %r1,%r1\n" + " iske %r1,%r6\n" + " ahi %r0,1\n" + " diag 0,0,0x44\n" + + " rrbe %r1,%r6\n" + " iske %r1,%r6\n" + " ahi %r0,1\n" + " diag 0,0,0x44\n" + + " j 0b\n" +); + FIXTURE(uc_kvm) { struct kvm_s390_sie_block *sie_block; @@ -298,8 +325,49 @@ static void uc_handle_exit_ucontrol(FIXTURE_DATA(uc_kvm) *self) } } -/* verify SIEIC exit +/* + * Disable skey intercepts + * (KVM would init the skeys here) + */ +static void uc_skey_enable(FIXTURE_DATA(uc_kvm) *self) +{ + struct kvm_s390_sie_block
*sie_block = self->sie_block; + + /* disable KSS */ + sie_block->cpuflags &= ~CPUSTAT_KSS; + /* disable skey inst interception */ + sie_block->ictl &= ~(ICTL_ISKE | ICTL_SSKE | ICTL_RRBE); +} + +/* + * Handle the instruction intercept + * Returns if interception is handled / execution can be continued + */ +static bool uc_handle_insn_ic(FIXTURE_DATA(uc_kvm) *self) +{ + struct kvm_s390_sie_block *sie_block = self->sie_block; + int ilen = insn_length(sie_block->ipa >> 8); + struct kvm_run *run = self->run; + + switch (run->s390_sieic.ipa) { + case 0xB229: /* ISKE */ + case 0xB22b: /* SSKE */ + case 0xB22a: /* RRBE */ + uc_skey_enable(self); + + /* rewind to reexecute intercepted instruction */ + run->psw_addr = run->psw_addr - ilen; + pr_info("rewind guest addr to 0x%.16llx\n", run->psw_addr); + return true; + default: + return false; + } +} + +/* + * Handle the SIEIC exit * * fail on codes not expected in the test cases + * Returns if interception is handled / execution can be continued */ static bool uc_handle_sieic(FIXTURE_DATA(uc_kvm) * self) { @@ -315,7 +383,10 @@ static bool uc_handle_sieic(FIXTURE_DATA(uc_kvm) * self) case ICPT_INST: /* end execution in caller on intercepted instruction */ pr_info("sie instruction interception\n"); - return false; + return uc_handle_insn_ic(self); + case ICPT_KSS: + uc_skey_enable(self); + return true; case ICPT_OPEREXC: /* operation exception */ TEST_FAIL("sie exception on %.4x%.8x", sie_block->ipa, sie_block->ipb); @@ -472,4 +543,59 @@ TEST_F(uc_kvm, uc_gprs) ASSERT_EQ(1, sync_regs->gprs[0]); } +TEST_F(uc_kvm, uc_skey) +{ + struct kvm_sync_regs *sync_regs = &self->run->s.regs; + u64 test_vaddr = VM_MEM_SIZE - (SZ_1M / 2); + struct kvm_run *run = self->run; + const u8 skeyvalue = 0x34; + + /* copy test_skey_asm to code_hva / code_gpa */ + TH_LOG("copy code %p to vm mapped memory %p / %p", + &test_skey_asm, (void *)self->code_hva, (void *)self->code_gpa); + memcpy((void *)self->code_hva, &test_skey_asm, PAGE_SIZE); + + /* set 
register content for test_skey_asm to access not mapped memory */ + sync_regs->gprs[1] = skeyvalue; + sync_regs->gprs[5] = self->base_gpa; + sync_regs->gprs[6] = test_vaddr; + run->kvm_dirty_regs |= KVM_SYNC_GPRS; + + /* DAT di
Re: [PATCH v2 5/6] iommu/arm-smmu-v3: Make smmuv3 set_dev_pasid() op support replace
On 2024/10/15 16:43, Will Deacon wrote: On Thu, Sep 12, 2024 at 06:04:26AM -0700, Yi Liu wrote: From: Jason Gunthorpe set_dev_pasid() op is going to be enhanced to support domain replacement of a pasid. This prepares for this op definition. Signed-off-by: Jason Gunthorpe Signed-off-by: Yi Liu --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 8 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 +- 3 files changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 645da7b69bed..1d3e71569775 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -349,7 +349,7 @@ static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain, * get reassigned */ arm_smmu_make_sva_cd(&target, master, domain->mm, smmu_domain->cd.asid); - ret = arm_smmu_set_pasid(master, smmu_domain, id, &target); + ret = arm_smmu_set_pasid(master, smmu_domain, id, &target, old); mmput(domain->mm); return ret; diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index ed2b106e02dd..f7a73b854151 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2824,7 +2824,8 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) } static int arm_smmu_s1_set_dev_pasid(struct iommu_domain *domain, - struct device *dev, ioasid_t id) +struct device *dev, ioasid_t id, +struct iommu_domain *old) { struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_master *master = dev_iommu_priv_get(dev); @@ -2850,7 +2851,7 @@ static int arm_smmu_s1_set_dev_pasid(struct iommu_domain *domain, */ arm_smmu_make_s1_cd(&target_cd, master, smmu_domain); return arm_smmu_set_pasid(master, to_smmu_domain(domain), id, - &target_cd); + &target_cd, old); } static void 
arm_smmu_update_ste(struct arm_smmu_master *master, @@ -2880,7 +2881,7 @@ static void arm_smmu_update_ste(struct arm_smmu_master *master, int arm_smmu_set_pasid(struct arm_smmu_master *master, struct arm_smmu_domain *smmu_domain, ioasid_t pasid, - struct arm_smmu_cd *cd) + struct arm_smmu_cd *cd, struct iommu_domain *old) { struct iommu_domain *sid_domain = iommu_get_domain_for_dev(master->dev); struct arm_smmu_attach_state state = { @@ -2890,6 +2891,7 @@ int arm_smmu_set_pasid(struct arm_smmu_master *master, * already attached, no need to set old_domain. */ .ssid = pasid, + .old_domain = old, nit: The existing comment says "no need to set old_domain" and now you're doing exactly that! Please can you update the commentary (probably just remove the comment entirely)? oops, indeed, should remove this comment entirely. thanks for spotting it. -- Regards, Yi Liu
Re: [PATCH v5 19/30] arm64: add POE signal support
On Mon, Oct 14, 2024 at 06:10:23PM +0100, Will Deacon wrote: > Kevin, Joey, > > On Wed, Oct 09, 2024 at 03:43:01PM +0100, Will Deacon wrote: > > On Tue, Sep 24, 2024 at 01:27:58PM +0200, Kevin Brodsky wrote: > > > On 22/08/2024 17:11, Joey Gouly wrote: > > > > @@ -1178,6 +1237,9 @@ static void setup_return(struct pt_regs *regs, > > > > struct k_sigaction *ka, > > > > sme_smstop(); > > > > } > > > > > > > > + if (system_supports_poe()) > > > > + write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0); > > > > > > At the point where setup_return() is called, the signal frame has > > > already been written to the user stack. In other words, we write to the > > > user stack first, and then reset POR_EL0. This may be problematic, > > > especially if we are using the alternate signal stack, which the > > > interrupted POR_EL0 may not grant access to. In that situation uaccess > > > will fail and we'll end up with a SIGSEGV. > > > > > > This issue has already been discussed on the x86 side, and as it happens > > > patches to reset PKRU early [1] have just landed. I don't think this is > > > a blocker for getting this series landed, but we should try and align > > > with x86. If there's no objection, I'm planning to work on a counterpart > > > to the x86 series (resetting POR_EL0 early during signal delivery). > > > > Did you get a chance to work on that? It would be great to land the > > fixes for 6.12, if possible, so that the first kernel release with POE > > support doesn't land with known issues. > > Looking a little more at this, I think we have quite a weird behaviour > on arm64 as it stands. It looks like we rely on the signal frame to hold > the original POR_EL0 so, if for some reason we fail to allocate space > for the POR context, I think we'll return back from the signal with > POR_EL0_INIT. That seems bad? If we don't allocate space for POR_EL0, I think the program receives SIGSEGV?
setup_sigframe_layout() if (system_supports_poe()) { err = sigframe_alloc(user, &user->poe_offset, sizeof(struct poe_context)); if (err) return err; } Through get_sigframe() and setup_rt_frame(), that eventually gets here: handle_signal() ret = setup_rt_frame(usig, ksig, oldset, regs); [..] signal_setup_done(ret, ksig, test_thread_flag(TIF_SINGLESTEP)); void signal_setup_done(int failed, struct ksignal *ksig, int stepping) { if (failed) force_sigsegv(ksig->sig); else signal_delivered(ksig, stepping); } So I think it's "fine"? Thanks, Joey
Re: [PATCH net-next 01/10] selftests: net: lib: Introduce deferred commands
Hi, On 10/9/24 14:06, Petr Machata wrote: diff --git a/tools/testing/selftests/net/lib/sh/defer.sh b/tools/testing/selftests/net/lib/sh/defer.sh new file mode 100644 index ..8d205c3f0445 --- /dev/null +++ b/tools/testing/selftests/net/lib/sh/defer.sh @@ -0,0 +1,115 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +# map[(scope_id,track,cleanup_id) -> cleanup_command] +# track={d=default | p=priority} +declare -A __DEFER__JOBS + +# map[(scope_id,track) -> # cleanup_commands] +declare -A __DEFER__NJOBS + +# scope_id of the topmost scope. +__DEFER__SCOPE_ID=0 + +__defer__ndefer_key() +{ + local track=$1; shift Minor nit: IMHO the trailing shift is here a bit confusing: it lets me think about other arguments, which are not really expected. [...] +__defer__schedule() +{ + local track=$1; shift + local ndefers=$(__defer__ndefers $track) + local ndefers_key=$(__defer__ndefer_key $track) + local defer_key=$(__defer__defer_key $track $ndefers) + local defer="$@" + + __DEFER__JOBS[$defer_key]="$defer" + __DEFER__NJOBS[$ndefers_key]=$((${__DEFER__NJOBS[$ndefers_key]} + 1)) '${__DEFER__NJOBS[$ndefers_key]}' is actually '$ndefers', right? If so it would be better to reuse the available variable. Thanks, Paolo
Re: [PATCH v5 19/30] arm64: add POE signal support
On Tue, Oct 15, 2024 at 10:59:11AM +0100, Joey Gouly wrote: > On Mon, Oct 14, 2024 at 06:10:23PM +0100, Will Deacon wrote: > > Looking a little more at this, I think we have quite a weird behaviour > > on arm64 as it stands. It looks like we rely on the signal frame to hold > > the original POR_EL0 so, if for some reason we fail to allocate space > > for the POR context, I think we'll return back from the signal with > > POR_EL0_INIT. That seems bad? > If we don't allocate space for POR_EL0, I think the program receives SIGSEGV? ... > So I think it's "fine"? Yeah, there's a bunch of other stuff that would go badly if we tried to carry on after failing to allocate a signal frame.
Re: [PATCH v5 19/30] arm64: add POE signal support
On Tue, Oct 15, 2024 at 10:59:11AM +0100, Joey Gouly wrote: > On Mon, Oct 14, 2024 at 06:10:23PM +0100, Will Deacon wrote: > > Kevin, Joey, > > > > On Wed, Oct 09, 2024 at 03:43:01PM +0100, Will Deacon wrote: > > > On Tue, Sep 24, 2024 at 01:27:58PM +0200, Kevin Brodsky wrote: > > > > On 22/08/2024 17:11, Joey Gouly wrote: > > > > > @@ -1178,6 +1237,9 @@ static void setup_return(struct pt_regs *regs, > > > > > struct k_sigaction *ka, > > > > > sme_smstop(); > > > > > } > > > > > > > > > > + if (system_supports_poe()) > > > > > + write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0); > > > > > > > > At the point where setup_return() is called, the signal frame has > > > > already been written to the user stack. In other words, we write to the > > > > user stack first, and then reset POR_EL0. This may be problematic, > > > > especially if we are using the alternate signal stack, which the > > > > interrupted POR_EL0 may not grant access to. In that situation uaccess > > > > will fail and we'll end up with a SIGSEGV. > > > > > > > > This issue has already been discussed on the x86 side, and as it happens > > > > patches to reset PKRU early [1] have just landed. I don't think this is > > > > a blocker for getting this series landed, but we should try and align > > > > with x86. If there's no objection, I'm planning to work on a counterpart > > > > to the x86 series (resetting POR_EL0 early during signal delivery). > > > > > > Did you get a chance to work on that? It would be great to land the > > > fixes for 6.12, if possible, so that the first kernel release with POE > > > support doesn't land with known issues. > > > > Looking a little more at this, I think we have quite a weird behaviour > > on arm64 as it stands. It looks like we rely on the signal frame to hold > > the original POR_EL0 so, if for some reason we fail to allocate space > > for the POR context, I think we'll return back from the signal with > > POR_EL0_INIT. That seems bad? 
> > If we don't allocate space for POR_EL0, I think the program receives SIGSEGV? > > setup_sigframe_layout() > if (system_supports_poe()) { > err = sigframe_alloc(user, &user->poe_offset, > sizeof(struct poe_context)); > if (err) > return err; > } > > Through get_sigframe() and setup_rt_frame(), that eventually gets here: > > handle_signal() > ret = setup_rt_frame(usig, ksig, oldset, regs); > > [..] > > signal_setup_done(ret, ksig, test_thread_flag(TIF_SINGLESTEP)); > > void signal_setup_done(int failed, struct ksignal *ksig, int stepping) > > > { > > > if (failed) > > > force_sigsegv(ksig->sig); > > > else > > > signal_delivered(ksig, stepping); > > > } > > So I think it's "fine"? Ah, yes, sorry about that. I got confused by the conditional push in setup_sigframe(): if (system_supports_poe() && err == 0 && user->poe_offset) { ... which gives the wrong impression that the POR is somehow optional, even if the CPU supports POE. So we should drop that check of 'user->poe_offset' as it cannot be NULL here. We also still need to resolve Kevin's concern, which probably means keeping the thread's original POR around someplace. Will
Re: [PATCH v3 2/4] KVM: SVM: Enable Bus lock threshold exit
On Fri, Oct 04, 2024, Manali Shukla wrote: > When a VMRUN instruction is executed, the bus lock threshold count is > loaded into an internal count register. Before the processor executes > a bus lock in the guest, it checks the value of this register: > > - If the value is greater than '0', the processor successfully > executes the bus lock and decrements the count. > > - If the value is '0', the bus lock is not executed, and a #VMEXIT to > the VMM is taken. > > The bus lock threshold #VMEXIT is reported to the VMM with the VMEXIT > code A5h, SVM_EXIT_BUS_LOCK. I vote to split this into two patches: one to add the architectural collateral, with the above as the changelog, and a second to actually implement support in KVM. Having the above background is useful, but it makes it quite hard to find information on the KVM design and implementation. > This implementation is set up to intercept every guest bus lock. "This implementation" is a variation on "This patch". Drop it, and simply state what the patch is doing. > initial value of the Bus Lock Threshold counter is '0' in the VMCB, so > the very first bus lock will exit, set the Bus Lock Threshold counter > to '1' (so that the offending instruction can make forward progress) > but then the value is at '0' again so the next bus lock will exit. > > The generated SVM_EXIT_BUS_LOCK in kvm will exit to user space by s/kvm/KVM And again, the tone is wrong. Something like this is what I would aim for: Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM CPUs with Bus Lock Threshold, which is close enough to VMX's Bus Lock Detection VM-Exit to allow reusing KVM_CAP_X86_BUS_LOCK_EXIT. The biggest difference between the two features is that Threshold is fault-like, whereas Detection is trap-like. To allow the guest to make forward progress, Threshold provides a per-VMCB counter which is decremented every time a bus lock occurs, and a VM-Exit is triggered if and only if the counter is '0'.
To provide Detection-like semantics, initialize the counter to '0', i.e. exit on every bus lock, and when re-executing the guilty instruction, set the counter to '1' to effectively step past the instruction. Note, in the unlikely scenario that re-executing the instruction doesn't trigger a bus lock, e.g. because the guest has changed memory types or patched the guilty instruction, the bus lock counter will be left at '1', i.e. the guest will be able to do a bus lock on a different instruction. In a perfect world, KVM would ensure the counter is '0' if the guest has made forward progress, e.g. if RIP has changed. But trying to close that hole would incur non-trivial complexity, for marginal benefit; the intent of KVM_CAP_X86_BUS_LOCK_EXIT is to allow userspace to rate-limit bus locks, not to allow for precise detection of problematic guest code. And, it's simply not feasible to fully close the hole, e.g. if an interrupt arrives before the original instruction can re-execute, the guest could step past a different bus lock. > setting the KVM_RUN_BUS_LOCK flag in vcpu->run->flags, indicating to > the user space that a bus lock has been detected in the guest. > > Use the already available KVM capability KVM_CAP_X86_BUS_LOCK_EXIT to > enable the feature. This feature is disabled by default, it can be > enabled explicitly from user space. > > More details about the Bus Lock Threshold feature can be found in AMD > APM [1]. ... > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c > index d5314cb7dff4..ca1c42201894 100644 > --- a/arch/x86/kvm/svm/nested.c > +++ b/arch/x86/kvm/svm/nested.c > @@ -669,6 +669,11 @@ static void nested_vmcb02_prepare_control(struct > vcpu_svm *svm, > vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa; > vmcb02->control.msrpm_base_pa = vmcb01->control.msrpm_base_pa; > > + /* + * The bus lock threshold count should keep running across nested + * transitions. Copy the buslock threshold count from vmcb01 to vmcb02.
No, it should be reset to '0'. The bus lock can't have happened on VMRUN, because KVM is emulating the VMRUN. That leaves two possibilities: the bus lock happened in L1 on an instruction before VMRUN, or the bus lock happened in _an_ L2, before a nested VM-Exit to L1 occurred. In the first case, the bus lock firmly happened on an instruction in the past. Even if vmcb01's counter is still '1', e.g. because the guilty instruction got patched, the vCPU has clearly made forward progress and so KVM should reset vmcb02's counter to '0'. In the second case, KVM has no idea if L2 has made forward progress. The only way to _try to_ detect if L2 has made forward progress would be to track the counter on a per-vmcb12 basis, but even that is flawed because KVM doesn't have visibility into L1's management of L2. I do think we may need to stash vmcb02's counter though. E.g. if userspace rate-limits the vCPU, then it's entirely possible that
Re: [PATCH v5 19/30] arm64: add POE signal support
On Tue, Oct 15, 2024 at 12:41:16PM +0100, Will Deacon wrote: > On Tue, Oct 15, 2024 at 10:59:11AM +0100, Joey Gouly wrote: > > On Mon, Oct 14, 2024 at 06:10:23PM +0100, Will Deacon wrote: > > > Looking a little more at this, I think we have quite a weird behaviour > > > on arm64 as it stands. It looks like we rely on the signal frame to hold > > > the original POR_EL0 so, if for some reason we fail to allocate space > > > for the POR context, I think we'll return back from the signal with > > > POR_EL0_INIT. That seems bad? > > > > If we don't allocate space for POR_EL0, I think the program receives > > SIGSEGV? > > > > setup_sigframe_layout() > > if (system_supports_poe()) { > > err = sigframe_alloc(user, &user->poe_offset, > > sizeof(struct poe_context)); > > if (err) > > return err; > > } > > > > Through get_sigframe() and setup_rt_frame(), that eventually gets here: > > > > handle_signal() > > ret = setup_rt_frame(usig, ksig, oldset, regs); > > > > [..] > > > > signal_setup_done(ret, ksig, test_thread_flag(TIF_SINGLESTEP)); > > > > void signal_setup_done(int failed, struct ksignal *ksig, int stepping) > > { > > if (failed) > > force_sigsegv(ksig->sig); > > else > > signal_delivered(ksig, stepping); > > } > > > > So I think it's "fine"? > Ah, yes, sorry about that. I got confused by the conditional push in > setup_sigframe(): > > if (system_supports_poe() && err == 0 && user->poe_offset) { > ... > > which gives the wrong impression that the POR is somehow optional, even > if the CPU supports POE. So we should drop that check of > 'user->poe_offset' as it cannot be NULL here. I agree, we should remove this check as it's confusing. > We also still need to resolve Kevin's concern, which probably means > keeping the thread's original POR around someplace. If we fail to allocate context for POR_EL0 (or anything else), we'll deliver a SIGSEGV. I think it's quite likely that the SIGSEGV will also fail to allocate context and we end up with a fatal SIGSEGV.
Not sure the user can affect the allocation/layout, though it can change stack attributes where the frame is written. Assuming that the user tricks the kernel into failing to write the context but allows it to succeed on the resulting SIGSEGV, POR_EL0 wouldn't have been reset and the SIGSEGV context will still have the original value. I don't think we need to do anything here for 6.12. However, in for-next/core, we have gcs_signal_entry() called after resetting POR_EL0. If this fails, we can end up with a new POR_EL0 on sigreturn (subject to the above user toggling permissions). I think this needs to be fixed so that POR_EL0 is only reset when we know we are going to deliver the signal. -- Catalin
Re: [PATCH v3 3/4] vfio: Add VFIO_DEVICE_PASID_[AT|DE]TACH_IOMMUFD_PT
On Tue, 15 Oct 2024 19:07:52 +0800 Yi Liu wrote: > On 2024/10/14 23:49, Alex Williamson wrote: > > On Sat, 12 Oct 2024 21:49:05 +0800 > > Yi Liu wrote: > > > >> On 2024/10/1 20:11, Jason Gunthorpe wrote: > >>> On Mon, Sep 30, 2024 at 07:55:08AM +, Tian, Kevin wrote: > >>> > > +struct vfio_device_pasid_attach_iommufd_pt { > > + __u32 argsz; > > + __u32 flags; > > + __u32 pasid; > > + __u32 pt_id; > > +}; > > + > > +#define VFIO_DEVICE_PASID_ATTACH_IOMMUFD_PT _IO(VFIO_TYPE, > > VFIO_BASE + 21) > > Not sure whether this was discussed before. Does it make sense > to reuse the existing VFIO_DEVICE_ATTACH_IOMMUFD_PT > by introducing a new pasid field and a new flag bit? > >>> > >>> Maybe? I don't have a strong feeling either way. > >>> > >>> There is somewhat less code if you reuse the ioctl at least > >> > >> I had a rough memory that I was suggested to add a separate ioctl for > >> PASID. Let's see Alex's opinion. > > > > I don't recall any previous arguments for separate ioctls, but it seems > > to make a lot of sense to me to extend the existing ioctls with a flag > > to indicate pasid scope and id. Thanks, > > thanks for the confirmation. How about the below?
> > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c > index bb1817bd4ff3..c78533bce3c6 100644 > --- a/drivers/vfio/device_cdev.c > +++ b/drivers/vfio/device_cdev.c > @@ -162,21 +162,34 @@ void vfio_df_unbind_iommufd(struct vfio_device_file *df) > int vfio_df_ioctl_attach_pt(struct vfio_device_file *df, > struct vfio_device_attach_iommufd_pt __user *arg) > { > - struct vfio_device *device = df->device; > + unsigned long minsz = offsetofend( > + struct vfio_device_attach_iommufd_pt, pt_id); > struct vfio_device_attach_iommufd_pt attach; > - unsigned long minsz; > + struct vfio_device *device = df->device; > + u32 user_size; > int ret; > > - minsz = offsetofend(struct vfio_device_attach_iommufd_pt, pt_id); > + ret = get_user(user_size, (u32 __user *)arg); > + if (ret) > + return ret; > > - if (copy_from_user(&attach, arg, minsz)) > - return -EFAULT; > + ret = copy_struct_from_user(&attach, sizeof(attach), arg, user_size); > + if (ret) > + return ret; I think this could break current users. For better or worse, we don't currently have any requirements for the remainder of the user buffer, whereas copy_struct_from_user() returns an error for non-zero trailing bytes. I think we need to monotonically increase the structure size, but maybe something more like below, using flags. The expectation would be that if we add another flag that extends the structure, we'd test that flag after PASID and clobber xend to a new value further into the new structure. We'd also add that flag to the flags mask, but we'd share the copy code. if (attach.argsz < minsz) return -EINVAL; if (attach.flags & (~VFIO_DEVICE_ATTACH_PASID)) return -EINVAL; if (attach.flags & VFIO_DEVICE_ATTACH_PASID) xend = offsetofend(struct vfio_device_attach_iommufd_pt, pasid); if (xend) { if (attach.argsz < xend) return -EINVAL; if (copy_from_user((void *)&attach + minsz, (void __user *)arg + minsz, xend - minsz)) return -EFAULT; } Maybe there are still more elegant options available. 
We also generally try to label flags with FLAGS in the name, but it does get rather unwieldy, so I'm open to suggestions. Thanks, Alex > > - if (attach.argsz < minsz || attach.flags) > + if (attach.argsz < minsz || attach.flags & (~VFIO_DEVICE_ATTACH_PASID)) > return -EINVAL; > > + if ((attach.flags & VFIO_DEVICE_ATTACH_PASID) && > + !device->ops->pasid_attach_ioas) > + return -EOPNOTSUPP; > + > mutex_lock(&device->dev_set->lock); > - ret = device->ops->attach_ioas(device, &attach.pt_id); > + if (attach.flags & VFIO_DEVICE_ATTACH_PASID) > + ret = device->ops->pasid_attach_ioas(device, attach.pasid, > + &attach.pt_id); > + else > + ret = device->ops->attach_ioas(device, &attach.pt_id); > if (ret) > goto out_unlock; > > @@ -198,20 +211,33 @@ int vfio_df_ioctl_attach_pt(struct vfio_device_file *df, > int vfio_df_ioctl_detach_pt(struct vfio_device_file *df, > struct vfio_device_detach_iommufd_pt __user *arg) > { > - struct vfio_device *device = df->device; > + unsigned long minsz = > + offsetofend(struct vfio_device_detach_iommufd_pt, flags); > struct vfio_device_detach_iommufd_pt detach; > - unsigned l
Re: [PATCH v2 5/6] iommu/arm-smmu-v3: Make smmuv3 set_dev_pasid() op support replace
On Tue, Oct 15, 2024 at 09:43:24AM +0100, Will Deacon wrote: > > @@ -2890,6 +2891,7 @@ int arm_smmu_set_pasid(struct arm_smmu_master *master, > > * already attached, no need to set old_domain. > > */ > > .ssid = pasid, > > + .old_domain = old, > > nit: The existing comment says "no need to set old_domain" and now you're > doing exactly that! Please can you update the commentary (probably just > remove the comment entirely)? +1 Jason
Re: [PATCH v3 3/4] vfio: Add VFIO_DEVICE_PASID_[AT|DE]TACH_IOMMUFD_PT
On 2024/10/16 00:22, Alex Williamson wrote: On Tue, 15 Oct 2024 19:07:52 +0800 Yi Liu wrote: On 2024/10/14 23:49, Alex Williamson wrote: On Sat, 12 Oct 2024 21:49:05 +0800 Yi Liu wrote: On 2024/10/1 20:11, Jason Gunthorpe wrote: On Mon, Sep 30, 2024 at 07:55:08AM +, Tian, Kevin wrote: +struct vfio_device_pasid_attach_iommufd_pt { + __u32 argsz; + __u32 flags; + __u32 pasid; + __u32 pt_id; +}; + +#define VFIO_DEVICE_PASID_ATTACH_IOMMUFD_PT_IO(VFIO_TYPE, VFIO_BASE + 21) Not sure whether this was discussed before. Does it make sense to reuse the existing VFIO_DEVICE_ATTACH_IOMMUFD_PT by introducing a new pasid field and a new flag bit? Maybe? I don't have a strong feeling either way. There is somewhat less code if you reuse the ioctl at least I had a rough memory that I was suggested to add a separate ioctl for PASID. Let's see Alex's opinion. I don't recall any previous arguments for separate ioctls, but it seems to make a lot of sense to me to extend the existing ioctls with a flag to indicate pasid cscope and id. Thanks, thanks for the confirmation. How about the below? 
diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c index bb1817bd4ff3..c78533bce3c6 100644 --- a/drivers/vfio/device_cdev.c +++ b/drivers/vfio/device_cdev.c @@ -162,21 +162,34 @@ void vfio_df_unbind_iommufd(struct vfio_device_file *df) int vfio_df_ioctl_attach_pt(struct vfio_device_file *df, struct vfio_device_attach_iommufd_pt __user *arg) { - struct vfio_device *device = df->device; + unsigned long minsz = offsetofend( + struct vfio_device_attach_iommufd_pt, pt_id); struct vfio_device_attach_iommufd_pt attach; - unsigned long minsz; + struct vfio_device *device = df->device; + u32 user_size; int ret; - minsz = offsetofend(struct vfio_device_attach_iommufd_pt, pt_id); + ret = get_user(user_size, (u32 __user *)arg); + if (ret) + return ret; - if (copy_from_user(&attach, arg, minsz)) - return -EFAULT; + ret = copy_struct_from_user(&attach, sizeof(attach), arg, user_size); + if (ret) + return ret; I think this could break current users. I don't quite get it here. My understanding is as below: If the current user (compiled with the existing uapi struct) calls in, it will provide less data than the new kernel knows about. The copy_struct_from_user() would zero the trailing bytes. And such a user won't set the new flag, so it should be fine. So to me, the trailing bytes concern comes when the user is compiled with the new uapi struct but is running on an old kernel (say <= 6.12). But the old kernel uses copy_from_user(), so even if there are non-zero trailing bytes in the user buffer, the old kernel just ignores them. So this does not seem to be an issue to me so far. For better or worse, we don't currently have any requirements for the remainder of the user buffer, whereas copy_struct_from_user() returns an error for non-zero trailing bytes. This seems to be a general requirement when using copy_struct_from_user(). The user needs to zero the fields that it does not intend to use.
If there is no such requirement for the user, then the trailing bytes can be a concern in the future, but not this time, as the existing kernel uses copy_from_user(). I think we need to monotonically increase the structure size, but maybe something more like below, using flags. The expectation would be that if we add another flag that extends the structure, we'd test that flag after PASID and clobber xend to a new value further into the new structure. We'd also add that flag to the flags mask, but we'd share the copy code. Agree, this shared code might be needed for other paths as well. Some macros, I guess. if (attach.argsz < minsz) return -EINVAL; if (attach.flags & (~VFIO_DEVICE_ATTACH_PASID)) return -EINVAL; if (attach.flags & VFIO_DEVICE_ATTACH_PASID) xend = offsetofend(struct vfio_device_attach_iommufd_pt, pasid); if (xend) { if (attach.argsz < xend) return -EINVAL; if (copy_from_user((void *)&attach + minsz, (void __user *)arg + minsz, xend - minsz)) return -EFAULT; } I think it might need to zero the trailing bytes if the user does not set the extended flag, doesn't it? Maybe there are still more elegant options available. We also generally try to label flags with FLAGS in the name, but it does get rather unwieldy, so I'm open to suggestions. Thanks, There are already examples of new fields added to a kernel-to-user uapi struct, like vfio_region_info::cap_offset and vfio_device_info::cap_offset. But it might be a little bit different from the case we face here, as it's a user-to-kernel struct. It's a time for yo
kselftest/next build: 7 builds: 2 failed, 5 passed, 1 warning (v6.12-rc3-5-gecfe6870abac)
kselftest/next build: 7 builds: 2 failed, 5 passed, 1 warning (v6.12-rc3-5-gecfe6870abac)

Full Build Summary: https://kernelci.org/build/kselftest/branch/next/kernel/v6.12-rc3-5-gecfe6870abac/

Tree: kselftest
Branch: next
Git Describe: v6.12-rc3-5-gecfe6870abac
Git Commit: ecfe6870abac400036d802e28dde4822ec153ffd
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git
Built: 4 unique architectures

Build Failures Detected:

arm64:
    defconfig+kselftest+arm64-chromebook: (clang-16) FAIL
    defconfig+kselftest+arm64-chromebook: (gcc-12) FAIL

Warnings Detected:

arm64:

arm:

i386:

x86_64:
    x86_64_defconfig+kselftest (clang-16): 1 warning

Warnings summary:

    1   vmlinux.o: warning: objtool: set_ftrace_ops_ro+0x23: relocation to !ENDBR: .text+0x14fd19

Detailed per-defconfig build reports:

defconfig+kselftest (arm64, gcc-12) — PASS, 0 errors, 0 warnings, 0 section mismatches
defconfig+kselftest+arm64-chromebook (arm64, gcc-12) — FAIL, 0 errors, 0 warnings, 0 section mismatches
defconfig+kselftest+arm64-chromebook (arm64, clang-16) — FAIL, 0 errors, 0 warnings, 0 section mismatches
i386_defconfig+kselftest (i386, gcc-12) — PASS, 0 errors, 0 warnings, 0 section mismatches
multi_v7_defconfig+kselftest (arm, gcc-12) — PASS, 0 errors, 0 warnings, 0 section mismatches
x86_64_defconfig+kselftest (x86_64, gcc-12) — PASS, 0 errors, 0 warnings, 0 section mismatches
x86_64_defconfig+kselftest (x86_64, clang-16) — PASS, 0 errors, 1 warning, 0 section mismatches

Warnings:
vmlinux.o: warning: objtool: set_ftrace_ops_ro+0x23: relocation to !ENDBR: .text+0x14fd19