Re: [PATCH v3 1/4] ida: Add ida_find_first_range()
On 2024/9/12 23:11, Matthew Wilcox wrote: On Thu, Sep 12, 2024 at 06:17:26AM -0700, Yi Liu wrote: There is no helpers for user to check if a given ID is allocated or not, neither a helper to loop all the allocated IDs in an IDA and do something for cleanup. With the two needs, a helper to get the lowest allocated ID of a range and two variants based on it. Caller can check if a given ID is allocated or not by: bool ida_exists(struct ida *ida, unsigned int id) Caller can iterate all allocated IDs by: int id; while ((id = ida_find_first(&pasid_ida)) > 0) { //anything to do with the allocated ID ida_free(pasid_ida, pasid); } Cc: Matthew Wilcox (Oracle) Suggested-by: Jason Gunthorpe Signed-off-by: Yi Liu --- include/linux/idr.h | 11 lib/idr.c | 67 + 2 files changed, 78 insertions(+) No test cases for the test suite? ;-( let me add something like the below. :) diff --git a/lib/test_ida.c b/lib/test_ida.c index c80155a1956d..d89554ff5719 100644 --- a/lib/test_ida.c +++ b/lib/test_ida.c @@ -189,6 +189,75 @@ static void ida_check_bad_free(struct ida *ida) IDA_BUG_ON(ida, !ida_is_empty(ida)); } +/* + * Check ida_find_first_range() and varriants. + */ +static void ida_check_find_first(struct ida *ida) +{ + /* IDA is empty; all of the below should be not exist */ + IDA_BUG_ON(ida, ida_exists(ida, 0)); + IDA_BUG_ON(ida, ida_exists(ida, 3)); + IDA_BUG_ON(ida, ida_exists(ida, 63)); + IDA_BUG_ON(ida, ida_exists(ida, 1023)); + IDA_BUG_ON(ida, ida_exists(ida, (1 << 20) - 1)); + /* IDA contains a single value entry */ + + IDA_BUG_ON(ida, ida_alloc_min(ida, 3, GFP_KERNEL) != 3); + IDA_BUG_ON(ida, ida_exists(ida, 0)); + IDA_BUG_ON(ida, !ida_exists(ida, 3)); + IDA_BUG_ON(ida, ida_exists(ida, 63)); + IDA_BUG_ON(ida, ida_exists(ida, 1023)); + IDA_BUG_ON(ida, ida_exists(ida, (1 << 20) - 1)); + + IDA_BUG_ON(ida, ida_alloc_min(ida, 63, GFP_KERNEL) != 63); + IDA_BUG_ON(ida, ida_exists(ida, 0)); + IDA_BUG_ON(ida, !ida_exists(ida, 3)); + IDA_BUG_ON(ida, !ida_exists(ida, 63)); + IDA_BUG_ON(ida, ida_exists(ida, 1023)); + IDA_BUG_ON(ida, ida_exists(ida, (1 << 20) - 1)); + + /* IDA contains a single bitmap */ + IDA_BUG_ON(ida, ida_alloc_min(ida, 1023, GFP_KERNEL) != 1023); + IDA_BUG_ON(ida, ida_exists(ida, 0)); + IDA_BUG_ON(ida, !ida_exists(ida, 3)); + IDA_BUG_ON(ida, !ida_exists(ida, 63)); + IDA_BUG_ON(ida, !ida_exists(ida, 1023)); + IDA_BUG_ON(ida, ida_exists(ida, (1 << 20) - 1)); + + /* IDA contains a tree */ + IDA_BUG_ON(ida, ida_alloc_min(ida, (1 << 20) - 1, GFP_KERNEL) != (1 << 20) - 1); + IDA_BUG_ON(ida, ida_exists(ida, 0)); + IDA_BUG_ON(ida, !ida_exists(ida, 3)); + IDA_BUG_ON(ida, !ida_exists(ida, 63)); + IDA_BUG_ON(ida, !ida_exists(ida, 1023)); + IDA_BUG_ON(ida, !ida_exists(ida, (1 << 20) - 1)); + + /* Now try to find first */ + IDA_BUG_ON(ida, ida_find_first(ida) != 3); + IDA_BUG_ON(ida, ida_find_first_range(ida, -1, 2) != -EINVAL); + IDA_BUG_ON(ida, ida_find_first_range(ida, 0, 2) != -ENOENT); // no used ID + IDA_BUG_ON(ida, ida_find_first_range(ida, 0, 3) != 3); + IDA_BUG_ON(ida, ida_find_first_range(ida, 1, 3) != 3); + IDA_BUG_ON(ida, ida_find_first_range(ida, 3, 3) != 3); + IDA_BUG_ON(ida, ida_find_first_range(ida, 2, 4) != 3); + IDA_BUG_ON(ida, ida_find_first_range(ida, 4, 3) != -ENOENT); // min > max, fail + IDA_BUG_ON(ida, ida_find_first_range(ida, 4, 60) != -ENOENT); // no used ID + IDA_BUG_ON(ida, ida_find_first_range(ida, 4, 64) != 63); + IDA_BUG_ON(ida, ida_find_first_range(ida, 63, 63) != 63); + IDA_BUG_ON(ida, ida_find_first_range(ida, 64, 1026) != 1023); + IDA_BUG_ON(ida, ida_find_first_range(ida, 
1023, 1023) != 1023); + IDA_BUG_ON(ida, ida_find_first_range(ida, 1023, (1 << 20) - 1) != 1023); + IDA_BUG_ON(ida, ida_find_first_range(ida, 1024, (1 << 20) - 1) != (1 << 20) - 1); + IDA_BUG_ON(ida, ida_find_first_range(ida, (1 << 20), INT_MAX) != -ENOENT); + + ida_free(ida, 3); + ida_free(ida, 63); + ida_free(ida, 1023); + ida_free(ida, (1 << 20) - 1); + + IDA_BUG_ON(ida, !ida_is_empty(ida)); +} + static DEFINE_IDA(ida); static int ida_checks(void) @@ -202,6 +270,7 @@ static int ida_checks(void) ida_check_max(&ida); ida_check_conv(&ida); ida_check_bad_free(&ida); + ida_check_find_first(&ida); printk("IDA: %u of %u tests passed\n", tests_passed, tests_run); return (tests_run != tests_passed) ? 0 : -EINVAL; -- Regards, Yi Liu
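For reference, the cleanup loop in the commit message above has two small slips: it frees 'pasid' instead of the returned 'id', and '> 0' would stop early if ID 0 is allocated (errors come back as negative values such as -ENOENT). A corrected sketch of the intended usage:

	static DEFINE_IDA(pasid_ida);
	int id;

	/* drain every allocated ID; ida_find_first() returns -ENOENT when empty */
	while ((id = ida_find_first(&pasid_ida)) >= 0) {
		/* tear down whatever state is keyed by this ID ... */
		ida_free(&pasid_ida, id);
	}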
Re: [PATCH 3/3] iommu: Add a wrapper for remove_dev_pasid
On 2024/9/13 10:24, Baolu Lu wrote:
> On 9/12/24 9:06 PM, Yi Liu wrote:
>> The iommu drivers are on the way to drop the remove_dev_pasid op by
>> extending the blocked_domain to support PASID. However, this cannot be
>> done in one shot. So far, the Intel iommu and the ARM SMMUv3 drivers
>> support it, while the AMD iommu driver does not yet. During this
>> transition, the IOMMU core needs to support both ways to destroy the
>> attachment of device/PASID and domain.
>>
>> Signed-off-by: Yi Liu
>> ---
>>  drivers/iommu/iommu.c | 30 --
>>  1 file changed, 24 insertions(+), 6 deletions(-)
>
> This patch should be moved before the changes in the drivers. After all
> drivers are converted to set the blocked domain for a PASID,
> remove_dev_pasid can be removed as the last step.

You are right! Let me fix it.

-- 
Regards,
Yi Liu
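The wrapper itself is not visible in this quote. During the transition it could take roughly the shape below; this is only a sketch based on the hunks quoted elsewhere in this thread (converted drivers expose a blocked_domain whose set_dev_pasid accepts the old domain), not the actual patch:

	static void iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid,
					   struct iommu_domain *domain)
	{
		const struct iommu_ops *ops = dev_iommu_ops(dev);

		/* converted drivers: detach by attaching the blocked domain to the PASID */
		if (ops->blocked_domain && ops->blocked_domain->ops->set_dev_pasid)
			WARN_ON(ops->blocked_domain->ops->set_dev_pasid(
					ops->blocked_domain, dev, pasid, domain));
		else
			/* not yet converted (e.g. AMD): keep using the legacy op */
			ops->remove_dev_pasid(dev, pasid, domain);
	}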
[PATCH v3 0/3] selftests: kvm: s390: Add ucontrol memory selftests
This patch series adds some not yet picked up selftests to the kvm s390x selftest suite. The additional test cases are covering:

* Assert KVM_EXIT_S390_UCONTROL exit on unmapped memory access
* Assert functionality of storage keys in ucontrol VM
* Assert that memory region operations are rejected for ucontrol VMs

Running the test cases requires sys_admin capabilities to start the ucontrol VM. This can be achieved by running as root or with a command like:

sudo setpriv --reuid nobody --inh-caps -all,+sys_admin \
	--ambient-caps -all,+sys_admin --bounding-set -all,+sys_admin \
	./ucontrol_test

---
The patches in this series have been part of the previous patch series. The test cases added here do depend on the fixture added in the earlier patches.

From v5 PATCH 7-9 the segment and page table generation has been removed and DAT has been disabled, since DAT is not necessary to validate the KVM code.
https://lore.kernel.org/kvm/20240807154512.316936-1-schlame...@linux.ibm.com/

v3:
- fix skey assertion (thanks Claudio)
- introduce a wrapper around the UCAS map and unmap ioctls to improve readability (Claudio)
- add a displacement to the accessed memory in the uc_map_unmap test to assert that translation intercepts actually point to the right segments
- add a misaligned mapping attempt that must fail to the uc_map_unmap test

v2:
- Reenable the KSS intercept and handle it within the skey test.
- Modify the checked register between storing (sske) and reading (iske) it within the test program to make sure the value read back really comes from the storage key.
- Add an additional state assertion at the end of uc_skey.
- Fix some typos and white spaces.

v1:
- Remove segment and page table generation and disable DAT. This is not necessary to validate the KVM code.

Christoph Schlameuss (3):
  selftests: kvm: s390: Add uc_map_unmap VM test case
  selftests: kvm: s390: Add uc_skey VM test case
  selftests: kvm: s390: Verify reject memory region operations for ucontrol VMs

 .../selftests/kvm/s390x/ucontrol_test.c | 256 +-
 1 file changed, 254 insertions(+), 2 deletions(-)

-- 
2.46.0
[PATCH v3 1/3] selftests: kvm: s390: Add uc_map_unmap VM test case
Add a test case verifying basic running and interaction of ucontrol VMs. Fill the segment and page tables for allocated memory and map memory on first access. * uc_map_unmap Store and load data to mapped and unmapped memory and use pic segment translation handling to map memory on access. Signed-off-by: Christoph Schlameuss --- .../selftests/kvm/s390x/ucontrol_test.c | 145 +- 1 file changed, 144 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/kvm/s390x/ucontrol_test.c b/tools/testing/selftests/kvm/s390x/ucontrol_test.c index 030c59010fe1..084cea02c2fa 100644 --- a/tools/testing/selftests/kvm/s390x/ucontrol_test.c +++ b/tools/testing/selftests/kvm/s390x/ucontrol_test.c @@ -16,7 +16,11 @@ #include #include +#define PGM_SEGMENT_TRANSLATION 0x10 + #define VM_MEM_SIZE (4 * SZ_1M) +#define VM_MEM_EXT_SIZE (2 * SZ_1M) +#define VM_MEM_MAX_M ((VM_MEM_SIZE + VM_MEM_EXT_SIZE) / SZ_1M) /* so directly declare capget to check caps without libcap */ int capget(cap_user_header_t header, cap_user_data_t data); @@ -58,6 +62,23 @@ asm("test_gprs_asm:\n" " j 0b\n" ); +/* Test program manipulating memory */ +extern char test_mem_asm[]; +asm("test_mem_asm:\n" + "xgr%r0, %r0\n" + + "0:\n" + " ahi %r0,1\n" + " st %r1,0(%r5,%r6)\n" + + " xgr %r1,%r1\n" + " l %r1,0(%r5,%r6)\n" + " ahi %r0,1\n" + " diag0,0,0x44\n" + + " j 0b\n" +); + FIXTURE(uc_kvm) { struct kvm_s390_sie_block *sie_block; @@ -67,6 +88,7 @@ FIXTURE(uc_kvm) uintptr_t base_hva; uintptr_t code_hva; int kvm_run_size; + vm_paddr_t pgd; void *vm_mem; int vcpu_fd; int kvm_fd; @@ -116,7 +138,7 @@ FIXTURE_SETUP(uc_kvm) self->base_gpa = 0; self->code_gpa = self->base_gpa + (3 * SZ_1M); - self->vm_mem = aligned_alloc(SZ_1M, VM_MEM_SIZE); + self->vm_mem = aligned_alloc(SZ_1M, VM_MEM_MAX_M * SZ_1M); ASSERT_NE(NULL, self->vm_mem) TH_LOG("malloc failed %u", errno); self->base_hva = (uintptr_t)self->vm_mem; self->code_hva = self->base_hva - self->base_gpa + self->code_gpa; @@ -222,6 +244,60 @@ TEST(uc_cap_hpage) close(kvm_fd); } +/* calculate host virtual addr from guest physical addr */ +static void *gpa2hva(FIXTURE_DATA(uc_kvm) * self, u64 gpa) +{ + return (void *)(self->base_hva - self->base_gpa + gpa); +} + +/* map / make additional memory available */ +static int uc_map_ext(FIXTURE_DATA(uc_kvm) * self, u64 vcpu_addr, u64 length) +{ + struct kvm_s390_ucas_mapping map = { + .user_addr = (u64)gpa2hva(self, vcpu_addr), + .vcpu_addr = vcpu_addr, + .length = length, + }; + pr_info("ucas map %p %p 0x%llx", + (void *)map.user_addr, (void *)map.vcpu_addr, map.length); + return ioctl(self->vcpu_fd, KVM_S390_UCAS_MAP, &map); +} + +/* unmap previously mapped memory */ +static int uc_unmap_ext(FIXTURE_DATA(uc_kvm) * self, u64 vcpu_addr, u64 length) +{ + struct kvm_s390_ucas_mapping map = { + .user_addr = (u64)gpa2hva(self, vcpu_addr), + .vcpu_addr = vcpu_addr, + .length = length, + }; + pr_info("ucas unmap %p %p 0x%llx", + (void *)map.user_addr, (void *)map.vcpu_addr, map.length); + return ioctl(self->vcpu_fd, KVM_S390_UCAS_UNMAP, &map); +} + +/* handle ucontrol exit by mapping the accessed segment */ +static void uc_handle_exit_ucontrol(FIXTURE_DATA(uc_kvm) * self) +{ + struct kvm_run *run = self->run; + u64 seg_addr; + int rc; + + TEST_ASSERT_EQ(KVM_EXIT_S390_UCONTROL, run->exit_reason); + switch (run->s390_ucontrol.pgm_code) { + case PGM_SEGMENT_TRANSLATION: + seg_addr = run->s390_ucontrol.trans_exc_code & ~(SZ_1M - 1); + pr_info("ucontrol pic segment translation 0x%llx, mapping segment 0x%lx\n", + run->s390_ucontrol.trans_exc_code, seg_addr); + /* map 
/ make additional memory available */ + rc = uc_map_ext(self, seg_addr, SZ_1M); + TEST_ASSERT_EQ(0, rc); + break; + default: + TEST_FAIL("UNEXPECTED PGM CODE %d", run->s390_ucontrol.pgm_code); + } +} + /* verify SIEIC exit * * reset stop requests * * fail on codes not expected in the test cases @@ -256,6 +332,12 @@ static bool uc_handle_exit(FIXTURE_DATA(uc_kvm) * self) struct kvm_run *run = self->run; switch (run->exit_reason) { + case KVM_EXIT_S390_UCONTROL: + /** check program interruption code +* handle page fault --> ucas map +*/ + uc_handle_exit_ucontrol(self); + break; case KVM_EXIT_S390_SIEIC: return uc_handle_sieic(self); defaul
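The cover letter mentions adding a misaligned mapping attempt that must fail. Using the uc_map_ext() wrapper above, such an assertion might look like the sketch below; the 1M segment alignment requirement and the EINVAL errno are assumptions about the UCAS ioctl, not taken from the patch:

	/* segment-misaligned vcpu address: expect the UCAS map ioctl to refuse it */
	ASSERT_EQ(-1, uc_map_ext(self, VM_MEM_SIZE + PAGE_SIZE, VM_MEM_EXT_SIZE));
	ASSERT_EQ(EINVAL, errno);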
[PATCH v3 2/3] selftests: kvm: s390: Add uc_skey VM test case
Add a test case manipulating s390 storage keys from within the ucontrol VM. Signed-off-by: Christoph Schlameuss --- .../selftests/kvm/s390x/ucontrol_test.c | 89 ++- 1 file changed, 88 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/kvm/s390x/ucontrol_test.c b/tools/testing/selftests/kvm/s390x/ucontrol_test.c index 084cea02c2fa..f6e3a68f89a9 100644 --- a/tools/testing/selftests/kvm/s390x/ucontrol_test.c +++ b/tools/testing/selftests/kvm/s390x/ucontrol_test.c @@ -79,6 +79,33 @@ asm("test_mem_asm:\n" " j 0b\n" ); +/* Test program manipulating storage keys */ +extern char test_skey_asm[]; +asm("test_skey_asm:\n" + "xgr%r0, %r0\n" + + "0:\n" + " ahi %r0,1\n" + " st %r1,0(%r5,%r6)\n" + + " iske%r1,%r6\n" + " ahi %r0,1\n" + " diag0,0,0x44\n" + + " sske%r1,%r6\n" + " xgr %r1,%r1\n" + " iske%r1,%r6\n" + " ahi %r0,1\n" + " diag0,0,0x44\n" + + " rrbe%r1,%r6\n" + " iske%r1,%r6\n" + " ahi %r0,1\n" + " diag0,0,0x44\n" + + " j 0b\n" +); + FIXTURE(uc_kvm) { struct kvm_s390_sie_block *sie_block; @@ -299,8 +326,9 @@ static void uc_handle_exit_ucontrol(FIXTURE_DATA(uc_kvm) * self) } /* verify SIEIC exit - * * reset stop requests + * * handle expected interception codes * * fail on codes not expected in the test cases + * Returns if interception is handled / execution can be continued */ static bool uc_handle_sieic(FIXTURE_DATA(uc_kvm) * self) { @@ -317,6 +345,10 @@ static bool uc_handle_sieic(FIXTURE_DATA(uc_kvm) * self) /* end execution in caller on intercepted instruction */ pr_info("sie instruction interception\n"); return false; + case ICPT_KSS: + /* disable KSS and continue; KVM would init the skeys here */ + sie_block->cpuflags &= ~CPUSTAT_KSS; + return true; case ICPT_OPEREXC: /* operation exception */ TEST_FAIL("sie exception on %.4x%.8x", sie_block->ipa, sie_block->ipb); @@ -473,4 +505,59 @@ TEST_F(uc_kvm, uc_gprs) ASSERT_EQ(1, sync_regs->gprs[0]); } +TEST_F(uc_kvm, uc_skey) +{ + u64 test_vaddr = VM_MEM_SIZE - (SZ_1M / 2); + struct kvm_sync_regs *sync_regs = &self->run->s.regs; + struct kvm_run *run = self->run; + u8 skeyvalue = 0x34; + + /* copy test_skey_asm to code_hva / code_gpa */ + TH_LOG("copy code %p to vm mapped memory %p / %p", + &test_skey_asm, (void *)self->code_hva, (void *)self->code_gpa); + memcpy((void *)self->code_hva, &test_skey_asm, PAGE_SIZE); + + /* set register content for test_skey_asm to access not mapped memory */ + sync_regs->gprs[1] = skeyvalue; + sync_regs->gprs[5] = self->base_gpa; + sync_regs->gprs[6] = test_vaddr; + run->kvm_dirty_regs |= KVM_SYNC_GPRS; + + /* DAT disabled + 64 bit mode */ + run->psw_mask = 0x00018000ULL; + run->psw_addr = self->code_gpa; + + ASSERT_EQ(0, uc_run_once(self)); + ASSERT_EQ(true, uc_handle_exit(self)); + ASSERT_EQ(1, sync_regs->gprs[0]); + + /* ISKE */ + ASSERT_EQ(0, uc_run_once(self)); + ASSERT_EQ(false, uc_handle_exit(self)); + ASSERT_EQ(2, sync_regs->gprs[0]); + /* assert initial skey (ACC = 0, R & C = 1) */ + ASSERT_EQ(0x06, sync_regs->gprs[1]); + uc_assert_diag44(self); + + /* SSKE */ + sync_regs->gprs[1] = skeyvalue; + run->kvm_dirty_regs |= KVM_SYNC_GPRS; + ASSERT_EQ(0, uc_run_once(self)); + ASSERT_EQ(false, uc_handle_exit(self)); + ASSERT_EQ(3, sync_regs->gprs[0]); + ASSERT_EQ(skeyvalue, sync_regs->gprs[1]); + uc_assert_diag44(self); + + /* RRBE */ + sync_regs->gprs[1] = skeyvalue; + run->kvm_dirty_regs |= KVM_SYNC_GPRS; + ASSERT_EQ(0, uc_run_once(self)); + ASSERT_EQ(false, uc_handle_exit(self)); + ASSERT_EQ(4, sync_regs->gprs[0]); + /* assert R reset but rest of skey unchanged*/ + ASSERT_EQ(skeyvalue & 0xfa, 
sync_regs->gprs[1]); + ASSERT_EQ(0x00, sync_regs->gprs[1] & 0x04); + uc_assert_diag44(self); +} + TEST_HARNESS_MAIN -- 2.46.0
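For readers decoding the constants in uc_skey (the initial 0x06, the 0x34 test value and the 0xfa mask), the s390 storage-key byte read by ISKE breaks down as follows; this is a reference sketch, not part of the patch:

	/* s390 storage key byte, as read by ISKE */
	#define SKEY_ACC	0xf0	/* access-control bits */
	#define SKEY_FP		0x08	/* fetch-protection bit */
	#define SKEY_R		0x04	/* reference bit */
	#define SKEY_C		0x02	/* change bit */

	/* The initial key 0x06 is SKEY_R | SKEY_C; RRBE clears only SKEY_R,
	 * which is what the (skeyvalue & 0xfa) assertion above checks. */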
[PATCH v3 3/3] selftests: kvm: s390: Verify reject memory region operations for ucontrol VMs
Add a test case verifying KVM_SET_USER_MEMORY_REGION and KVM_SET_USER_MEMORY_REGION2 cannot be executed on ucontrol VMs. Executing this test case on not patched kernels will cause a null pointer dereference in the host kernel. This is fixed with commit: commit 7816e58967d0 ("kvm: s390: Reject memory region operations for ucontrol VMs") Signed-off-by: Christoph Schlameuss Reviewed-by: Janosch Frank --- .../selftests/kvm/s390x/ucontrol_test.c | 22 +++ 1 file changed, 22 insertions(+) diff --git a/tools/testing/selftests/kvm/s390x/ucontrol_test.c b/tools/testing/selftests/kvm/s390x/ucontrol_test.c index f6e3a68f89a9..11fbb09eff66 100644 --- a/tools/testing/selftests/kvm/s390x/ucontrol_test.c +++ b/tools/testing/selftests/kvm/s390x/ucontrol_test.c @@ -401,6 +401,28 @@ static void uc_assert_diag44(FIXTURE_DATA(uc_kvm) * self) TEST_ASSERT_EQ(0x44, sie_block->ipb); } +TEST_F(uc_kvm, uc_no_user_region) +{ + struct kvm_userspace_memory_region region = { + .slot = 1, + .guest_phys_addr = self->code_gpa, + .memory_size = VM_MEM_EXT_SIZE, + .userspace_addr = (uintptr_t)self->code_hva, + }; + struct kvm_userspace_memory_region2 region2 = { + .slot = 1, + .guest_phys_addr = self->code_gpa, + .memory_size = VM_MEM_EXT_SIZE, + .userspace_addr = (uintptr_t)self->code_hva, + }; + + ASSERT_EQ(-1, ioctl(self->vm_fd, KVM_SET_USER_MEMORY_REGION, ®ion)); + ASSERT_EQ(EINVAL, errno); + + ASSERT_EQ(-1, ioctl(self->vm_fd, KVM_SET_USER_MEMORY_REGION2, ®ion2)); + ASSERT_EQ(EINVAL, errno); +} + TEST_F(uc_kvm, uc_map_unmap) { struct kvm_sync_regs *sync_regs = &self->run->s.regs; -- 2.46.0
Re: [PATCH v4 01/10] iommu: Introduce a replace API for device pasid
On 2024/9/13 10:44, Baolu Lu wrote: On 9/12/24 9:12 PM, Yi Liu wrote: Provide a high-level API to allow replacements of one domain with another for specific pasid of a device. This is similar to iommu_group_replace_domain() and it is expected to be used only by IOMMUFD. Co-developed-by: Lu Baolu Signed-off-by: Lu Baolu Signed-off-by: Yi Liu --- drivers/iommu/iommu-priv.h | 4 ++ drivers/iommu/iommu.c | 90 -- 2 files changed, 90 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h index de5b54eaa8bf..90b367de267e 100644 --- a/drivers/iommu/iommu-priv.h +++ b/drivers/iommu/iommu-priv.h @@ -27,6 +27,10 @@ static inline const struct iommu_ops *iommu_fwspec_ops(struct iommu_fwspec *fwsp int iommu_group_replace_domain(struct iommu_group *group, struct iommu_domain *new_domain); +int iommu_replace_device_pasid(struct iommu_domain *domain, + struct device *dev, ioasid_t pasid, + struct iommu_attach_handle *handle); + int iommu_device_register_bus(struct iommu_device *iommu, const struct iommu_ops *ops, const struct bus_type *bus, diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index b6b44b184004..066f659018a5 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -3347,14 +3347,15 @@ static void iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid, } static int __iommu_set_group_pasid(struct iommu_domain *domain, - struct iommu_group *group, ioasid_t pasid) + struct iommu_group *group, ioasid_t pasid, + struct iommu_domain *old) { struct group_device *device, *last_gdev; int ret; for_each_group_device(group, device) { ret = domain->ops->set_dev_pasid(domain, device->dev, - pasid, NULL); + pasid, old); if (ret) goto err_revert; } @@ -3366,7 +3367,20 @@ static int __iommu_set_group_pasid(struct iommu_domain *domain, for_each_group_device(group, device) { if (device == last_gdev) break; - iommu_remove_dev_pasid(device->dev, pasid, domain); + /* If no old domain, undo the succeeded devices/pasid */ + if (!old) { + iommu_remove_dev_pasid(device->dev, pasid, domain); + continue; + } + + /* + * Rollback the succeeded devices/pasid to the old domain. + * And it is a driver bug to fail attaching with a previously + * good domain. + */ + if (WARN_ON(old->ops->set_dev_pasid(old, device->dev, + pasid, domain))) + iommu_remove_dev_pasid(device->dev, pasid, domain); You want to rollback to the 'old' domain, right? So, %s/domain/old/ ? this will be invoked if the rollback failed. Since the set_dev_pasid op would keep the 'old' configure, so at this point, the 'old' domain is 'domain'. } return ret; } @@ -3425,7 +3439,7 @@ int iommu_attach_device_pasid(struct iommu_domain *domain, if (ret) goto out_unlock; - ret = __iommu_set_group_pasid(domain, group, pasid); + ret = __iommu_set_group_pasid(domain, group, pasid, NULL); if (ret) xa_erase(&group->pasid_array, pasid); out_unlock: @@ -3434,6 +3448,74 @@ int iommu_attach_device_pasid(struct iommu_domain *domain, } EXPORT_SYMBOL_GPL(iommu_attach_device_pasid); +/** + * iommu_replace_device_pasid - Replace the domain that a pasid is attached to + * @domain: the new iommu domain + * @dev: the attached device. + * @pasid: the pasid of the device. + * @handle: the attach handle. + * + * This API allows the pasid to switch domains. Return 0 on success, or an + * error. The pasid will keep the old configuration if replacement failed. 
+ * This is supposed to be used by iommufd, and iommufd can guarantee that + * both iommu_attach_device_pasid() and iommu_replace_device_pasid() would + * pass in a valid @handle. + */ +int iommu_replace_device_pasid(struct iommu_domain *domain, + struct device *dev, ioasid_t pasid, + struct iommu_attach_handle *handle) How about passing the old domain as a parameter? I suppose it was agreed in the below link. https://lore.kernel.org/linux-iommu/20240816124707.gz2032...@nvidia.com/ +{ + /* Caller must be a probed driver on dev */ + struct iommu_group *group = dev->iommu_group; + struct iommu_attach_handle *curr; + int ret; + + if (!domain->ops->set_dev_pasid) + return -EOPNOTSUPP; + + if (!group) + return -ENODEV; + + if (!dev_has_iommu(dev) || dev_iommu_ops(dev) != domain->owner || + pasid == IOMMU_NO_PASID || !handle) dev_has_iommu() check is duplicate with above if (!group) check. I was just referring to the iommu_attach_device_pasid(). So both the
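A hypothetical caller-side view of the replace semantics described in the kdoc (hwpt_a, hwpt_b and idev are placeholder names; in iommufd the handle lives in a longer-lived object, not on the stack):

	struct iommu_attach_handle handle = {};	/* must outlive the attachment */
	int rc;

	rc = iommu_attach_device_pasid(hwpt_a->domain, idev->dev, pasid, &handle);
	if (rc)
		return rc;

	/* later: switch the PASID to another domain in a single step */
	rc = iommu_replace_device_pasid(hwpt_b->domain, idev->dev, pasid, &handle);
	if (rc)
		return rc;	/* the PASID stays attached to hwpt_a->domain */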
Re: [PATCH v2 2/6] iommu/vt-d: Move intel_drain_pasid_prq() into intel_pasid_tear_down_entry()
On 2024/9/12 21:22, Baolu Lu wrote:
> On 2024/9/12 21:04, Yi Liu wrote:
>> Draining the PRQ is usually done in conjunction with pasid teardown, and
>> with more callers coming, move it into intel_pasid_tear_down_entry().
>> But there are scenarios that only tear down the pasid entry without a
>> PRQ drain, so pass a flag to mark it.
>
> Is it a reasonable case where PRI needs to be drained but the pasid entry
> won't be torn down? For example, could this happen when PRI is disabled?

In concept, yes. But it seems no more than a debugging method in my
opinion. I cannot map it to a usage so far.

>> Signed-off-by: Yi Liu
>> ---
>>  drivers/iommu/intel/iommu.c |  8
>>  drivers/iommu/intel/pasid.c | 13 +++--
>>  drivers/iommu/intel/pasid.h |  8 +---
>>  drivers/iommu/intel/svm.c   |  3 ++-
>>  4 files changed, 22 insertions(+), 10 deletions(-)
>
> Thanks,
> baolu

-- 
Regards,
Yi Liu
Re: [PATCH v2 2/6] iommu/vt-d: Move intel_drain_pasid_prq() into intel_pasid_tear_down_entry()
On 2024/9/13 10:11, Baolu Lu wrote: On 9/12/24 9:04 PM, Yi Liu wrote: diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c index b51fc268dc84..ceb9c5274a39 100644 --- a/drivers/iommu/intel/pasid.c +++ b/drivers/iommu/intel/pasid.c @@ -236,8 +236,13 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu, qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid, qdep, 0, 64 - VTD_PAGE_SHIFT); } +/* + * Not all PASID entry destroy requires PRQ drain as it can be handled in + * the remove_dev_pasid path. Caller should be clear about it and set the + * @flags properly. + */ void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct device *dev, - u32 pasid, bool fault_ignore) + u32 pasid, u32 flags) As a generic opt-in bit flags, why not using 'unsigned int'? aha, I can use 'unsigned int'. -- Regards, Yi Liu
Re: [PATCH v2 3/6] iommu/vt-d: Make intel_iommu_set_dev_pasid() to handle domain replacement
On 2024/9/13 09:35, Baolu Lu wrote: On 9/12/24 9:04 PM, Yi Liu wrote: set_dev_pasid op is going to support domain replacement and keep the old hardware config if it fails. Make the Intel iommu driver be prepared for it. Signed-off-by: Yi Liu --- drivers/iommu/intel/iommu.c | 98 - 1 file changed, 65 insertions(+), 33 deletions(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 80b587de226d..6f5a8e549f3f 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4248,8 +4248,8 @@ static int intel_iommu_iotlb_sync_map(struct iommu_domain *domain, return 0; } -static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid, - struct iommu_domain *domain) +static void domain_remove_dev_pasid(struct iommu_domain *domain, + struct device *dev, ioasid_t pasid) { struct device_domain_info *info = dev_iommu_priv_get(dev); struct dev_pasid_info *curr, *dev_pasid = NULL; @@ -4257,11 +4257,6 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid, struct dmar_domain *dmar_domain; unsigned long flags; - if (domain->type == IOMMU_DOMAIN_IDENTITY) { - intel_pasid_tear_down_entry(iommu, dev, pasid, 0); - return; - } - dmar_domain = to_dmar_domain(domain); spin_lock_irqsave(&dmar_domain->lock, flags); list_for_each_entry(curr, &dmar_domain->dev_pasids, link_domain) { @@ -4278,13 +4273,24 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid, domain_detach_iommu(dmar_domain, iommu); intel_iommu_debugfs_remove_dev_pasid(dev_pasid); kfree(dev_pasid); +} + +static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid, + struct iommu_domain *domain) +{ + struct device_domain_info *info = dev_iommu_priv_get(dev); + struct intel_iommu *iommu = info->iommu; + intel_pasid_tear_down_entry(iommu, dev, pasid, INTEL_PASID_TEARDOWN_DRAIN_PRQ); + if (domain->type == IOMMU_DOMAIN_IDENTITY) + return; The static identity domain is not capable of handling page requests. Therefore there is no need to drain PRQ for an identity domain removal. oh, yes. so maybe for SI domain, there is no PRQ at all. So it probably should be something like this: if (domain->type == IOMMU_DOMAIN_IDENTITY) { intel_pasid_tear_down_entry(iommu, dev, pasid, 0); return; } intel_pasid_tear_down_entry(iommu, dev, pasid, INTEL_PASID_TEARDOWN_DRAIN_PRQ); + domain_remove_dev_pasid(domain, dev, pasid); } -static int intel_iommu_set_dev_pasid(struct iommu_domain *domain, - struct device *dev, ioasid_t pasid, - struct iommu_domain *old) +static struct dev_pasid_info * +domain_prepare_dev_pasid(struct iommu_domain *domain, + struct device *dev, ioasid_t pasid) Why do you want to return a struct pointer instead of an integer? The returned pointer is not used after it is returned. it's for the intel_iommu_debugfs_create_dev_pasid(). it needs the dev_pasid. Also, how about renaming this helper to domain_add_dev_pasid() to pair it with domain_remove_dev_pasid()? sure. 
{ struct device_domain_info *info = dev_iommu_priv_get(dev); struct dmar_domain *dmar_domain = to_dmar_domain(domain); @@ -4293,22 +4299,13 @@ static int intel_iommu_set_dev_pasid(struct iommu_domain *domain, unsigned long flags; int ret; - if (!pasid_supported(iommu) || dev_is_real_dma_subdevice(dev)) - return -EOPNOTSUPP; - - if (domain->dirty_ops) - return -EINVAL; - - if (context_copied(iommu, info->bus, info->devfn)) - return -EBUSY; - ret = prepare_domain_attach_device(domain, dev); if (ret) - return ret; + return ERR_PTR(ret); dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL); if (!dev_pasid) - return -ENOMEM; + return ERR_PTR(-ENOMEM); ret = domain_attach_iommu(dmar_domain, iommu); if (ret) Thanks, baolu -- Regards, Yi Liu
Re: [PATCH v2 3/6] iommu/vt-d: Make intel_iommu_set_dev_pasid() to handle domain replacement
On 2024/9/13 10:17, Baolu Lu wrote: On 9/13/24 9:35 AM, Baolu Lu wrote: On 9/12/24 9:04 PM, Yi Liu wrote: set_dev_pasid op is going to support domain replacement and keep the old hardware config if it fails. Make the Intel iommu driver be prepared for it. Signed-off-by: Yi Liu --- drivers/iommu/intel/iommu.c | 98 - 1 file changed, 65 insertions(+), 33 deletions(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 80b587de226d..6f5a8e549f3f 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4248,8 +4248,8 @@ static int intel_iommu_iotlb_sync_map(struct iommu_domain *domain, return 0; } -static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid, - struct iommu_domain *domain) +static void domain_remove_dev_pasid(struct iommu_domain *domain, + struct device *dev, ioasid_t pasid) { struct device_domain_info *info = dev_iommu_priv_get(dev); struct dev_pasid_info *curr, *dev_pasid = NULL; @@ -4257,11 +4257,6 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid, struct dmar_domain *dmar_domain; unsigned long flags; - if (domain->type == IOMMU_DOMAIN_IDENTITY) { - intel_pasid_tear_down_entry(iommu, dev, pasid, 0); - return; - } - dmar_domain = to_dmar_domain(domain); spin_lock_irqsave(&dmar_domain->lock, flags); list_for_each_entry(curr, &dmar_domain->dev_pasids, link_domain) { @@ -4278,13 +4273,24 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid, domain_detach_iommu(dmar_domain, iommu); intel_iommu_debugfs_remove_dev_pasid(dev_pasid); kfree(dev_pasid); +} + +static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid, + struct iommu_domain *domain) +{ + struct device_domain_info *info = dev_iommu_priv_get(dev); + struct intel_iommu *iommu = info->iommu; + intel_pasid_tear_down_entry(iommu, dev, pasid, INTEL_PASID_TEARDOWN_DRAIN_PRQ); + if (domain->type == IOMMU_DOMAIN_IDENTITY) + return; The static identity domain is not capable of handling page requests. Therefore there is no need to drain PRQ for an identity domain removal. So it probably should be something like this: if (domain->type == IOMMU_DOMAIN_IDENTITY) { intel_pasid_tear_down_entry(iommu, dev, pasid, 0); return; } intel_pasid_tear_down_entry(iommu, dev, pasid, INTEL_PASID_TEARDOWN_DRAIN_PRQ); Just revisited this. It seems that we just need to drain PRQ if the attached domain is iopf-capable. Therefore, how about making it like this? unsigned int flags = 0; if (domain->iopf_handler) flags |= INTEL_PASID_TEARDOWN_DRAIN_PRQ; intel_pasid_tear_down_entry(iommu, dev, pasid, flags); /* Identity domain has no meta data for pasid. */ if (domain->type == IOMMU_DOMAIN_IDENTITY) return; got it. -- Regards, Yi Liu
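Putting the two suggestions together, the removal path could end up roughly like the sketch below (names taken from the hunks above; not the final patch):

	static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid,
						 struct iommu_domain *domain)
	{
		struct device_domain_info *info = dev_iommu_priv_get(dev);
		struct intel_iommu *iommu = info->iommu;
		unsigned int flags = 0;

		/* only iopf-capable domains can have page requests to drain */
		if (domain->iopf_handler)
			flags |= INTEL_PASID_TEARDOWN_DRAIN_PRQ;

		intel_pasid_tear_down_entry(iommu, dev, pasid, flags);

		/* the identity domain keeps no per-pasid meta data */
		if (domain->type == IOMMU_DOMAIN_IDENTITY)
			return;

		domain_remove_dev_pasid(domain, dev, pasid);
	}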
Re: [PATCH v2 3/6] iommu/vt-d: Make intel_iommu_set_dev_pasid() to handle domain replacement
On 2024/9/13 09:42, Baolu Lu wrote: On 9/12/24 9:04 PM, Yi Liu wrote: @@ -4325,24 +4363,18 @@ static int intel_iommu_set_dev_pasid(struct iommu_domain *domain, ret = intel_pasid_setup_second_level(iommu, dmar_domain, dev, pasid); if (ret) - goto out_unassign_tag; + goto out_undo_dev_pasid; - dev_pasid->dev = dev; - dev_pasid->pasid = pasid; - spin_lock_irqsave(&dmar_domain->lock, flags); - list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids); - spin_unlock_irqrestore(&dmar_domain->lock, flags); + if (old) + domain_remove_dev_pasid(old, dev, pasid); if (domain->type & __IOMMU_DOMAIN_PAGING) intel_iommu_debugfs_create_dev_pasid(dev_pasid); return 0; -out_unassign_tag: - cache_tag_unassign_domain(dmar_domain, dev, pasid); -out_detach_iommu: - domain_detach_iommu(dmar_domain, iommu); -out_free: - kfree(dev_pasid); + +out_undo_dev_pasid: + domain_remove_dev_pasid(domain, dev, pasid); return ret; } Do you need to re-install the old domain to the pasid entry in the failure path? yes, but no. The old domain is still installed in the pasid entry when the failure happened. :) -- Regards, Yi Liu
Re: [PATCH v2 4/6] iommu/vt-d: Add set_dev_pasid callback for nested domain
On 2024/9/13 09:52, Baolu Lu wrote: On 9/12/24 9:04 PM, Yi Liu wrote: @@ -4299,7 +4304,12 @@ domain_prepare_dev_pasid(struct iommu_domain *domain, unsigned long flags; int ret; - ret = prepare_domain_attach_device(domain, dev); + /* Nested type domain should prepare its parent domain */ + if (domain_type_is_nested(dmar_domain)) + ret = prepare_domain_attach_device( + &dmar_domain->s2_domain->domain, dev); + else + ret = prepare_domain_attach_device(domain, dev); if (ret) return ERR_PTR(ret); 'prepare' is a bit confusing in this context. It actually means checking whether a domain is compatible with the hardware capabilities of RID or PASID. Or not? I know this confusion comes from the naming of prepare_domain_attach_device(). Hence, do you mind renaming this helper in a small patch? good suggestion. -- Regards, Yi Liu
Re: [PATCH v3 1/3] selftests: kvm: s390: Add uc_map_unmap VM test case
On 9/13/24 1:52 PM, Christoph Schlameuss wrote: Add a test case verifying basic running and interaction of ucontrol VMs. Fill the segment and page tables for allocated memory and map memory on first access. * uc_map_unmap Store and load data to mapped and unmapped memory and use pic segment translation handling to map memory on access. Signed-off-by: Christoph Schlameuss Reviewed-by: Janosch Frank
[PATCH] selftests: livepatch: test livepatching a kprobed function
The test proves that a function that is being kprobed and uses a post_handler cannot be livepatched. Only one ftrace_ops with FTRACE_OPS_FL_IPMODIFY set may be registered to any given function at a time. Signed-off-by: Michael Vetter --- tools/testing/selftests/livepatch/Makefile| 3 +- .../selftests/livepatch/test-kprobe.sh| 62 +++ .../selftests/livepatch/test_modules/Makefile | 3 +- .../livepatch/test_modules/test_klp_kprobe.c | 38 4 files changed, 104 insertions(+), 2 deletions(-) create mode 100755 tools/testing/selftests/livepatch/test-kprobe.sh create mode 100644 tools/testing/selftests/livepatch/test_modules/test_klp_kprobe.c diff --git a/tools/testing/selftests/livepatch/Makefile b/tools/testing/selftests/livepatch/Makefile index 35418a4790be..a080eb54a215 100644 --- a/tools/testing/selftests/livepatch/Makefile +++ b/tools/testing/selftests/livepatch/Makefile @@ -10,7 +10,8 @@ TEST_PROGS := \ test-state.sh \ test-ftrace.sh \ test-sysfs.sh \ - test-syscall.sh + test-syscall.sh \ + test-kprobe.sh TEST_FILES := settings diff --git a/tools/testing/selftests/livepatch/test-kprobe.sh b/tools/testing/selftests/livepatch/test-kprobe.sh new file mode 100755 index ..0c62c6b81e18 --- /dev/null +++ b/tools/testing/selftests/livepatch/test-kprobe.sh @@ -0,0 +1,62 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) 2024 SUSE +# Author: Michael Vetter + +. $(dirname $0)/functions.sh + +MOD_LIVEPATCH=test_klp_livepatch +MOD_KPROBE=test_klp_kprobe + +setup_config + +# Kprobe a function and verify that we can't livepatch that same function +# when it uses a post_handler since only one IPMODIFY maybe be registered +# to any given function at a time. + +start_test "livepatch interaction with kprobed function with post_handler" + +echo 1 > /sys/kernel/debug/kprobes/enabled + +load_mod $MOD_KPROBE has_post_handler=true +load_failing_mod $MOD_LIVEPATCH +unload_mod $MOD_KPROBE + +check_result "% insmod test_modules/test_klp_kprobe.ko has_post_handler=true +% insmod test_modules/$MOD_LIVEPATCH.ko +livepatch: enabling patch '$MOD_LIVEPATCH' +livepatch: '$MOD_LIVEPATCH': initializing patching transition +livepatch: failed to register ftrace handler for function 'cmdline_proc_show' (-16) +livepatch: failed to patch object 'vmlinux' +livepatch: failed to enable patch '$MOD_LIVEPATCH' +livepatch: '$MOD_LIVEPATCH': canceling patching transition, going to unpatch +livepatch: '$MOD_LIVEPATCH': completing unpatching transition +livepatch: '$MOD_LIVEPATCH': unpatching complete +insmod: ERROR: could not insert module test_modules/$MOD_LIVEPATCH.ko: Device or resource busy +% rmmod test_klp_kprobe" + +start_test "livepatch interaction with kprobed function without post_handler" + +load_mod $MOD_KPROBE has_post_handler=false +load_lp $MOD_LIVEPATCH + +unload_mod $MOD_KPROBE +disable_lp $MOD_LIVEPATCH +unload_lp $MOD_LIVEPATCH + +check_result "% insmod test_modules/test_klp_kprobe.ko has_post_handler=false +% insmod test_modules/$MOD_LIVEPATCH.ko +livepatch: enabling patch '$MOD_LIVEPATCH' +livepatch: '$MOD_LIVEPATCH': initializing patching transition +livepatch: '$MOD_LIVEPATCH': starting patching transition +livepatch: '$MOD_LIVEPATCH': completing patching transition +livepatch: '$MOD_LIVEPATCH': patching complete +% rmmod test_klp_kprobe +% echo 0 > /sys/kernel/livepatch/$MOD_LIVEPATCH/enabled +livepatch: '$MOD_LIVEPATCH': initializing unpatching transition +livepatch: '$MOD_LIVEPATCH': starting unpatching transition +livepatch: '$MOD_LIVEPATCH': completing unpatching transition +livepatch: 
'$MOD_LIVEPATCH': unpatching complete +% rmmod $MOD_LIVEPATCH" + +exit 0 diff --git a/tools/testing/selftests/livepatch/test_modules/Makefile b/tools/testing/selftests/livepatch/test_modules/Makefile index e6e638c4bcba..4981d270f128 100644 --- a/tools/testing/selftests/livepatch/test_modules/Makefile +++ b/tools/testing/selftests/livepatch/test_modules/Makefile @@ -11,7 +11,8 @@ obj-m += test_klp_atomic_replace.o \ test_klp_state2.o \ test_klp_state3.o \ test_klp_shadow_vars.o \ - test_klp_syscall.o + test_klp_syscall.o \ + test_klp_kprobe.o # Ensure that KDIR exists, otherwise skip the compilation modules: diff --git a/tools/testing/selftests/livepatch/test_modules/test_klp_kprobe.c b/tools/testing/selftests/livepatch/test_modules/test_klp_kprobe.c new file mode 100644 index ..49b579ea1054 --- /dev/null +++ b/tools/testing/selftests/livepatch/test_modules/test_klp_kprobe.c @@ -0,0 +1,38 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (C) 2024 Marcos Paulo de Souza +// Copyright (C) 2024 Michael Vetter + +#include +#include +#include + +static bool has_post_handler = true; +module_param(has_post_handler, bool, 0444); + +static void __kprobes post_handler(struct kprobe *p, struct pt_regs *regs, + unsigned long flags
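The module source is truncated above. The remainder presumably just registers a kprobe on 'cmdline_proc_show' (the symbol named in the expected dmesg output of test-kprobe.sh) with the optional post_handler, roughly like this sketch:

	static void __kprobes post_handler(struct kprobe *p, struct pt_regs *regs,
					   unsigned long flags)
	{
		/* Intentionally empty: having a post_handler at all is what puts the
		 * kprobe on the IPMODIFY ftrace_ops and collides with livepatch. */
	}

	static struct kprobe kp = {
		.symbol_name = "cmdline_proc_show",
	};

	static int __init test_klp_kprobe_init(void)
	{
		if (has_post_handler)
			kp.post_handler = post_handler;

		return register_kprobe(&kp);
	}

	static void __exit test_klp_kprobe_exit(void)
	{
		unregister_kprobe(&kp);
	}

	module_init(test_klp_kprobe_init);
	module_exit(test_klp_kprobe_exit);
	MODULE_LICENSE("GPL");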
Re: [PATCH v3 2/3] selftests: kvm: s390: Add uc_skey VM test case
On 9/13/24 1:52 PM, Christoph Schlameuss wrote: Add a test case manipulating s390 storage keys from within the ucontrol VM. Signed-off-by: Christoph Schlameuss Except for the two nits: Reviewed-by: Janosch Frank I'll think about what to do with the nits next week. --- .../selftests/kvm/s390x/ucontrol_test.c | 89 ++- 1 file changed, 88 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/kvm/s390x/ucontrol_test.c b/tools/testing/selftests/kvm/s390x/ucontrol_test.c index 084cea02c2fa..f6e3a68f89a9 100644 --- a/tools/testing/selftests/kvm/s390x/ucontrol_test.c +++ b/tools/testing/selftests/kvm/s390x/ucontrol_test.c @@ -79,6 +79,33 @@ asm("test_mem_asm:\n" " j 0b\n" ); +/* Test program manipulating storage keys */ +extern char test_skey_asm[]; +asm("test_skey_asm:\n" + "xgr %r0, %r0\n" + + "0:\n" + " ahi %r0,1\n" + " st %r1,0(%r5,%r6)\n" + + " iske%r1,%r6\n" + " ahi %r0,1\n" + " diag0,0,0x44\n" + + " sske%r1,%r6\n" + " xgr %r1,%r1\n" + " iske%r1,%r6\n" + " ahi %r0,1\n" + " diag0,0,0x44\n" + + " rrbe%r1,%r6\n" + " iske%r1,%r6\n" + " ahi %r0,1\n" + " diag0,0,0x44\n" + + " j 0b\n" +); + FIXTURE(uc_kvm) { struct kvm_s390_sie_block *sie_block; @@ -299,8 +326,9 @@ static void uc_handle_exit_ucontrol(FIXTURE_DATA(uc_kvm) * self) } /* verify SIEIC exit - * * reset stop requests Argh, that made it in the patches that I've picked up. + * * handle expected interception codes * * fail on codes not expected in the test cases + * Returns if interception is handled / execution can be continued */ static bool uc_handle_sieic(FIXTURE_DATA(uc_kvm) * self) { @@ -317,6 +345,10 @@ static bool uc_handle_sieic(FIXTURE_DATA(uc_kvm) * self) /* end execution in caller on intercepted instruction */ pr_info("sie instruction interception\n"); return false; + case ICPT_KSS: + /* disable KSS and continue; KVM would init the skeys here */ + sie_block->cpuflags &= ~CPUSTAT_KSS; + return true; case ICPT_OPEREXC: /* operation exception */ TEST_FAIL("sie exception on %.4x%.8x", sie_block->ipa, sie_block->ipb); @@ -473,4 +505,59 @@ TEST_F(uc_kvm, uc_gprs) ASSERT_EQ(1, sync_regs->gprs[0]); } +TEST_F(uc_kvm, uc_skey) +{ + u64 test_vaddr = VM_MEM_SIZE - (SZ_1M / 2); + struct kvm_sync_regs *sync_regs = &self->run->s.regs; + struct kvm_run *run = self->run; + u8 skeyvalue = 0x34; + + /* copy test_skey_asm to code_hva / code_gpa */ + TH_LOG("copy code %p to vm mapped memory %p / %p", + &test_skey_asm, (void *)self->code_hva, (void *)self->code_gpa); + memcpy((void *)self->code_hva, &test_skey_asm, PAGE_SIZE); + + /* set register content for test_skey_asm to access not mapped memory */ + sync_regs->gprs[1] = skeyvalue; + sync_regs->gprs[5] = self->base_gpa; + sync_regs->gprs[6] = test_vaddr; + run->kvm_dirty_regs |= KVM_SYNC_GPRS; + + /* DAT disabled + 64 bit mode */ + run->psw_mask = 0x00018000ULL; + run->psw_addr = self->code_gpa; + + ASSERT_EQ(0, uc_run_once(self)); + ASSERT_EQ(true, uc_handle_exit(self)); + ASSERT_EQ(1, sync_regs->gprs[0]); + + /* ISKE */ + ASSERT_EQ(0, uc_run_once(self)); + ASSERT_EQ(false, uc_handle_exit(self)); + ASSERT_EQ(2, sync_regs->gprs[0]); + /* assert initial skey (ACC = 0, R & C = 1) */ + ASSERT_EQ(0x06, sync_regs->gprs[1]); + uc_assert_diag44(self); + + /* SSKE */ + sync_regs->gprs[1] = skeyvalue; + run->kvm_dirty_regs |= KVM_SYNC_GPRS; + ASSERT_EQ(0, uc_run_once(self)); + ASSERT_EQ(false, uc_handle_exit(self)); + ASSERT_EQ(3, sync_regs->gprs[0]); + ASSERT_EQ(skeyvalue, sync_regs->gprs[1]); + uc_assert_diag44(self); + + /* RRBE */ + sync_regs->gprs[1] = skeyvalue; + run->kvm_dirty_regs |= 
KVM_SYNC_GPRS; + ASSERT_EQ(0, uc_run_once(self)); + ASSERT_EQ(false, uc_handle_exit(self)); + ASSERT_EQ(4, sync_regs->gprs[0]); + /* assert R reset but rest of skey unchanged*/ Missing space before "*" + ASSERT_EQ(skeyvalue & 0xfa, sync_regs->gprs[1]); + ASSERT_EQ(0x00, sync_regs->gprs[1] & 0x04); + uc_assert_diag44(self); +} + TEST_HARNESS_MAIN
Re: [PATCH v3 1/4] ida: Add ida_find_first_range()
On Fri, Sep 13, 2024 at 07:45:55PM +0800, Yi Liu wrote: > > No test cases for the test suite? ;-( > > let me add something like the below. :) That looks pretty comprehensive, thanks! Acked-by: Matthew Wilcox (Oracle)
Re: [PATCH v5 06/30] arm64: context switch POR_EL0 register
On Thu, Sep 12, 2024 at 01:48:35PM +0100, Joey Gouly wrote: > On Thu, Sep 12, 2024 at 11:50:18AM +0100, Will Deacon wrote: > > On Wed, Sep 11, 2024 at 08:33:54AM -0700, Dave Hansen wrote: > > > On 9/11/24 08:01, Kevin Brodsky wrote: > > > > On 22/08/2024 17:10, Joey Gouly wrote: > > > >> @@ -371,6 +382,9 @@ int copy_thread(struct task_struct *p, const > > > >> struct kernel_clone_args *args) > > > >>if (system_supports_tpidr2()) > > > >>p->thread.tpidr2_el0 = > > > >> read_sysreg_s(SYS_TPIDR2_EL0); > > > >> > > > >> + if (system_supports_poe()) > > > >> + p->thread.por_el0 = read_sysreg_s(SYS_POR_EL0); > > > > Here we are only reloading POR_EL0's value if the target is a user > > > > thread. However, as this series stands, POR_EL0 is also relevant to > > > > kthreads, because any uaccess or GUP done from a kthread will also be > > > > checked against POR_EL0. This is especially important in cases like the > > > > io_uring kthread, which accesses the memory of the user process that > > > > spawned it. To prevent such a kthread from inheriting a stale value of > > > > POR_EL0, it seems that we should reload POR_EL0's value in all cases > > > > (user and kernel thread). > > > > > > The problem with this is trying to figure out which POR_EL0 to use. The > > > kthread could have been spawned ages ago and might not have a POR_EL0 > > > which is very different from the current value of any of the threads in > > > the process right now. > > > > > > There's also no great way for a kthread to reach out and grab an updated > > > value. It's all completely inherently racy. > > > > > > > Other approaches could also be considered (e.g. resetting POR_EL0 to > > > > unrestricted when creating a kthread), see my reply on v4 [1]. > > > > > > I kinda think this is the only way to go. It's the only sensible, > > > predictable way. I _think_ it's what x86 will end up doing with PKRU, > > > but there's been enough churn there that I'd need to go double check > > > what happens in practice. > > > > I agree. > > > > > Either way, it would be nice to get an io_uring test in here that > > > actually spawns kthreads: > > > > > > tools/testing/selftests/mm/protection_keys.c > > > > It would be good to update Documentation/core-api/protection-keys.rst > > as well, since the example with read() raises more questions than it > > answers! > > > > Kevin, Joey -- I've got this series queued in arm64 as-is, so perhaps > > you could send some patches on top so we can iron this out in time for > > 6.12? I'll also be at LPC next week if you're about. > > I found the code in arch/x86 that does this, I must have missed this > previously. > > arch/x86/kernel/process.c: int copy_thread() > > > > /* Kernel thread ? */ > > > if (unlikely(p->flags & PF_KTHREAD)) { > > > p->thread.pkru = pkru_get_init_value(); > > > memset(childregs, 0, sizeof(struct pt_regs)); > > > kthread_frame_init(frame, args->fn, args->fn_arg); > > > return 0; > > > } > > I can send a similar patch for arm64. I have no idea how to write io_uring > code, so looking for examples I can work with to get a test written. Might > just > send the arm64 fix first, if that's fine? I think fix + documentation is what we need before 6.12, but you've still got plenty of time after the merge window. Cheers, Will
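For arm64, the corresponding copy_thread() hunk could look roughly like the sketch below. POR_EL0_INIT is an assumed name for the unrestricted reset value, not something this series defines:

	/* in arch/arm64/kernel/process.c:copy_thread(), sketch only */
	if (unlikely(p->flags & PF_KTHREAD)) {
		/* kernel thread: start from a clean, unrestricted POR_EL0 */
		if (system_supports_poe())
			p->thread.por_el0 = POR_EL0_INIT;	/* assumed define */
	} else if (system_supports_poe()) {
		/* user thread: snapshot the current value, as in the quoted hunk */
		p->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
	}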
Re: [PATCH v3 1/3] selftests: kvm: s390: Add uc_map_unmap VM test case
On Fri, 13 Sep 2024 13:52:46 +0200 Christoph Schlameuss wrote: > Add a test case verifying basic running and interaction of ucontrol VMs. > Fill the segment and page tables for allocated memory and map memory on > first access. > > * uc_map_unmap > Store and load data to mapped and unmapped memory and use pic segment > translation handling to map memory on access. > > Signed-off-by: Christoph Schlameuss > --- > .../selftests/kvm/s390x/ucontrol_test.c | 145 +- > 1 file changed, 144 insertions(+), 1 deletion(-) > > diff --git a/tools/testing/selftests/kvm/s390x/ucontrol_test.c > b/tools/testing/selftests/kvm/s390x/ucontrol_test.c > index 030c59010fe1..084cea02c2fa 100644 > --- a/tools/testing/selftests/kvm/s390x/ucontrol_test.c > +++ b/tools/testing/selftests/kvm/s390x/ucontrol_test.c [...] >base_gpa + self->code_gpa; > @@ -222,6 +244,60 @@ TEST(uc_cap_hpage) > close(kvm_fd); > } > > +/* calculate host virtual addr from guest physical addr */ > +static void *gpa2hva(FIXTURE_DATA(uc_kvm) * self, u64 gpa) why the space? I would have expected *self > +{ > + return (void *)(self->base_hva - self->base_gpa + gpa); > +} > + > +/* map / make additional memory available */ [...]
Re: [PATCH v2 3/6] iommu/vt-d: Make intel_iommu_set_dev_pasid() to handle domain replacement
On 9/13/24 8:21 PM, Yi Liu wrote: On 2024/9/13 09:42, Baolu Lu wrote: On 9/12/24 9:04 PM, Yi Liu wrote: @@ -4325,24 +4363,18 @@ static int intel_iommu_set_dev_pasid(struct iommu_domain *domain, ret = intel_pasid_setup_second_level(iommu, dmar_domain, dev, pasid); if (ret) - goto out_unassign_tag; + goto out_undo_dev_pasid; - dev_pasid->dev = dev; - dev_pasid->pasid = pasid; - spin_lock_irqsave(&dmar_domain->lock, flags); - list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids); - spin_unlock_irqrestore(&dmar_domain->lock, flags); + if (old) + domain_remove_dev_pasid(old, dev, pasid); if (domain->type & __IOMMU_DOMAIN_PAGING) intel_iommu_debugfs_create_dev_pasid(dev_pasid); return 0; -out_unassign_tag: - cache_tag_unassign_domain(dmar_domain, dev, pasid); -out_detach_iommu: - domain_detach_iommu(dmar_domain, iommu); -out_free: - kfree(dev_pasid); + +out_undo_dev_pasid: + domain_remove_dev_pasid(domain, dev, pasid); return ret; } Do you need to re-install the old domain to the pasid entry in the failure path? yes, but no. The old domain is still installed in the pasid entry when the failure happened. :) I am afraid not. The old domain has already been cleaned up by intel_pasid_tear_down_entry(). Or not? Thanks, baolu
Re: [PATCH v2 3/6] iommu/vt-d: Make intel_iommu_set_dev_pasid() to handle domain replacement
> On Sep 14, 2024, at 09:08, Baolu Lu wrote: > > On 9/13/24 8:21 PM, Yi Liu wrote: >>> On 2024/9/13 09:42, Baolu Lu wrote: >>> On 9/12/24 9:04 PM, Yi Liu wrote: @@ -4325,24 +4363,18 @@ static int intel_iommu_set_dev_pasid(struct iommu_domain *domain, ret = intel_pasid_setup_second_level(iommu, dmar_domain, dev, pasid); if (ret) -goto out_unassign_tag; +goto out_undo_dev_pasid; -dev_pasid->dev = dev; -dev_pasid->pasid = pasid; -spin_lock_irqsave(&dmar_domain->lock, flags); -list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids); -spin_unlock_irqrestore(&dmar_domain->lock, flags); +if (old) +domain_remove_dev_pasid(old, dev, pasid); if (domain->type & __IOMMU_DOMAIN_PAGING) intel_iommu_debugfs_create_dev_pasid(dev_pasid); return 0; -out_unassign_tag: -cache_tag_unassign_domain(dmar_domain, dev, pasid); -out_detach_iommu: -domain_detach_iommu(dmar_domain, iommu); -out_free: -kfree(dev_pasid); + +out_undo_dev_pasid: +domain_remove_dev_pasid(domain, dev, pasid); return ret; } >>> >>> Do you need to re-install the old domain to the pasid entry in the >>> failure path? >> yes, but no. The old domain is still installed in the pasid entry >> when the failure happened. :) > > I am afraid not. The old domain has already been cleaned up by > intel_pasid_tear_down_entry(). Or not? oops, yes. The tear down was lifted to this function now instead of in the setup helpers. I may move them back to the setup helpers just like v1 does. Good catch! Regards, Yi Liu
Re: [PATCH v3 1/4] ida: Add ida_find_first_range()
On 2024/9/13 23:09, Matthew Wilcox wrote: On Fri, Sep 13, 2024 at 07:45:55PM +0800, Yi Liu wrote: No test cases for the test suite? ;-( let me add something like the below. :) That looks pretty comprehensive, thanks! Acked-by: Matthew Wilcox (Oracle) thanks, and FYI. I found a bug when running the unit test. will fix it in the next version. it's really helpful suggestion. diff --git a/lib/idr.c b/lib/idr.c index 6644d3d1af02..f16eb3d172bc 100644 --- a/lib/idr.c +++ b/lib/idr.c @@ -494,6 +494,7 @@ int ida_find_first_range(struct ida *ida, unsigned int min, unsigned int max) unsigned int offset = min % IDA_BITMAP_BITS; unsigned long *addr, size, bit; unsigned long flags; + unsigned long tmp = 0; void *entry; int ret; @@ -518,8 +519,7 @@ int ida_find_first_range(struct ida *ida, unsigned int min, unsigned int max) } if (xa_is_value(entry)) { - unsigned long tmp = xa_to_value(entry); - + tmp = xa_to_value(entry); addr = &tmp; size = BITS_PER_XA_VALUE; } else { -- Regards, Yi Liu
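For readers of the hunk above, the reason hoisting 'tmp' is a fix: with 'tmp' scoped inside the if (xa_is_value(entry)) block, 'addr = &tmp' left 'addr' pointing at a variable whose lifetime ended with the block, so the later bitmap search read a dead stack slot. Roughly (the later call is paraphrased, not copied from lib/idr.c):

	if (xa_is_value(entry)) {
		unsigned long tmp = xa_to_value(entry);

		addr = &tmp;			/* dangles once this block ends */
		size = BITS_PER_XA_VALUE;
	}
	...
	bit = find_next_bit(addr, size, offset);	/* use after scope */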