Re: [PATCH v4 1/6] firmware/psci: Add definitions for PSCI v1.3 specification
On Tue, 2024-09-24 at 17:05 +0100, David Woodhouse wrote: > From: David Woodhouse > > The v1.3 PSCI spec (https://developer.arm.com/documentation/den0022) > adds > SYSTEM_OFF2, CLEAN_INV_MEMREGION and CLEAN_INV_MEMREGION_ATTRIBUTES > functions. Add definitions for them and their parameters, along with > the > new TIMEOUT, RATE_LIMITED and BUSY error values. The CLEAN_INV_MEMREGION and CLEAN_INV_MEMREGION_ATTRIBUTES functions were added in the alpha release of the spec but have been dropped in the beta release, and are not included in the final spec. So IMO the uapi header file should not contain these definitions. The same goes for the TIMEOUT, RATE_LIMITED and BUSY error values.
Re: [PATCH v4 2/6] KVM: arm64: Add PSCI v1.3 SYSTEM_OFF2 function for hibernation
On Tue, 2024-09-24 at 17:05 +0100, David Woodhouse wrote: > From: David Woodhouse > > The PSCI v1.3 specification (alpha) adds support for a SYSTEM_OFF2 Can remove (alpha). > function > which is analogous to ACPI S4 state. This will allow hosting > environments > to determine that a guest is hibernated rather than just powered off, > and > ensure that they preserve the virtual environment appropriately to > allow > the guest to resume safely (or bump the hardware_signature in the > FACS to > trigger a clean reboot instead). > > The beta version will be changed to say that PSCI_FEATURES returns a > bit > mask of the supported hibernate types, which is implemented here. Since the final spec has been released, we can revise or remove the above wording. > Although this new feature is inflicted unconditionally on unexpecting > userspace, it ought to be mostly OK because it still results in the > same > KVM_SYSTEM_EVENT_SHUTDOWN event, just with a new flag which hopefully > won't cause userspace to get unhappy. > > Signed-off-by: David Woodhouse > --- > Documentation/virt/kvm/api.rst | 11 + > arch/arm64/include/uapi/asm/kvm.h | 6 + > arch/arm64/kvm/psci.c | 37 > +++ > 3 files changed, 54 insertions(+) > > diff --git a/Documentation/virt/kvm/api.rst > b/Documentation/virt/kvm/api.rst > index b3be87489108..2918898b7047 100644 > --- a/Documentation/virt/kvm/api.rst > +++ b/Documentation/virt/kvm/api.rst > @@ -6840,6 +6840,10 @@ the first `ndata` items (possibly zero) of the > data array are valid. > the guest issued a SYSTEM_RESET2 call according to v1.1 of the > PSCI > specification. > > + - for arm64, data[0] is set to > KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2 > + if the guest issued a SYSTEM_OFF2 call according to v1.3 of the > PSCI > + specification. > + > - for RISC-V, data[0] is set to the value of the second argument of > the > ``sbi_system_reset`` call. > > @@ -6873,6 +6877,13 @@ either: > - Deny the guest request to suspend the VM. See ARM DEN0022D.b > 5.19.2 > "Caller responsibilities" for possible return values. > > +Hibernation using the PSCI SYSTEM_OFF2 call is enabled when PSCI > v1.3 > +is enabled. If a guest invokes the PSCI SYSTEM_OFF2 function, KVM > will > +exit to userspace with the KVM_SYSTEM_EVENT_SHUTDOWN event type and > with > +data[0] set to KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2. The only > +supported hibernate type for the SYSTEM_OFF2 function is > HIBERNATE_OFF > +0x0). The spec says that the HIBERNATE_OFF parameter value is 0x1, not 0x0 (which is kind of unfortunate because it doesn't match the corresponding bit in the feature flags). So, either the BIT(PSCI_1_3_HIBERNATE_TYPE_OFF) value should be used for the SYSTEM_OFF2 functions in the code, or the definition should be changed in the header file (unless the text in the spec is wrong).
Re: [PATCH v4 6/6] arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate
On Tue, 2024-09-24 at 17:05 +0100, David Woodhouse wrote: > From: David Woodhouse > > The PSCI v1.3 specification (alpha) adds support for a SYSTEM_OFF2 Can remove (alpha).
Re: [PATCH v4 1/6] firmware/psci: Add definitions for PSCI v1.3 specification
Hi David, > On 24 Sep 2024, at 16:05, David Woodhouse wrote: > > From: David Woodhouse > > The v1.3 PSCI spec (https://developer.arm.com/documentation/den0022) adds > SYSTEM_OFF2, CLEAN_INV_MEMREGION and CLEAN_INV_MEMREGION_ATTRIBUTES > functions. Add definitions for them and their parameters, along with the > new TIMEOUT, RATE_LIMITED and BUSY error values. > DEN0022F REL superseded DEN0022F ALP1 which doesn’t describe CLEAN_INV_MEMREGION or CLEAN_INV_MEMREGION_ATTRIBUTES. Defining those at another time shouldn’t be a blocker for the rest of this patchset. > Signed-off-by: David Woodhouse > --- > include/uapi/linux/psci.h | 20 > 1 file changed, 20 insertions(+) > > diff --git a/include/uapi/linux/psci.h b/include/uapi/linux/psci.h > index 42a40ad3fb62..082ed689fdaf 100644 > --- a/include/uapi/linux/psci.h > +++ b/include/uapi/linux/psci.h > @@ -59,6 +59,8 @@ > #define PSCI_1_1_FN_SYSTEM_RESET2 PSCI_0_2_FN(18) > #define PSCI_1_1_FN_MEM_PROTECT PSCI_0_2_FN(19) > #define PSCI_1_1_FN_MEM_PROTECT_CHECK_RANGE PSCI_0_2_FN(20) > +#define PSCI_1_3_FN_SYSTEM_OFF2 PSCI_0_2_FN(21) > +#define PSCI_1_3_FN_CLEAN_INV_MEMREGION_ATTRIBUTES PSCI_0_2_FN(23) > > #define PSCI_1_0_FN64_CPU_DEFAULT_SUSPEND PSCI_0_2_FN64(12) > #define PSCI_1_0_FN64_NODE_HW_STATE PSCI_0_2_FN64(13) > @@ -68,6 +70,8 @@ > > #define PSCI_1_1_FN64_SYSTEM_RESET2 PSCI_0_2_FN64(18) > #define PSCI_1_1_FN64_MEM_PROTECT_CHECK_RANGE PSCI_0_2_FN64(20) > +#define PSCI_1_3_FN64_SYSTEM_OFF2 PSCI_0_2_FN64(21) > +#define PSCI_1_3_FN64_CLEAN_INV_MEMREGION PSCI_0_2_FN64(22) > > /* PSCI v0.2 power state encoding for CPU_SUSPEND function */ > #define PSCI_0_2_POWER_STATE_ID_MASK 0x > @@ -100,6 +104,19 @@ > #define PSCI_1_1_RESET_TYPE_SYSTEM_WARM_RESET 0 > #define PSCI_1_1_RESET_TYPE_VENDOR_START 0x8000U > > +/* PSCI v1.3 hibernate type for SYSTEM_OFF2 */ > +#define PSCI_1_3_HIBERNATE_TYPE_OFF 0 Should it be 1 as hibernate type? > + > +/* PSCI v1.3 flags for CLEAN_INV_MEMREGION */ > +#define PSCI_1_3_CLEAN_INV_MEMREGION_FLAG_DRY_RUN BIT(0) > + > +/* PSCI v1.3 attributes for CLEAN_INV_MEMREGION_ATTRIBUTES */ > +#define PSCI_1_3_CLEAN_INV_MEMREGION_ATTR_OP_TYPE 0 > +#define PSCI_1_3_CLEAN_INV_MEMREGION_ATTR_CPU_RDVZ 1 > +#define PSCI_1_3_CLEAN_INV_MEMREGION_ATTR_LATENCY 2 > +#define PSCI_1_3_CLEAN_INV_MEMREGION_ATTR_RATE_LIMIT 3 > +#define PSCI_1_3_CLEAN_INV_MEMREGION_ATTR_TIMEOUT 4 > + > /* PSCI version decoding (independent of PSCI version) */ > #define PSCI_VERSION_MAJOR_SHIFT 16 > #define PSCI_VERSION_MINOR_MASK \ > @@ -133,5 +150,8 @@ > #define PSCI_RET_NOT_PRESENT -7 > #define PSCI_RET_DISABLED -8 > #define PSCI_RET_INVALID_ADDRESS -9 > +#define PSCI_RET_TIMEOUT -10 > +#define PSCI_RET_RATE_LIMITED -11 > +#define PSCI_RET_BUSY -12 > Thanks Miguel > #endif /* _UAPI_LINUX_PSCI_H */ > -- > 2.44.0 > >
Re: [PATCH v4 1/6] firmware/psci: Add definitions for PSCI v1.3 specification
On Thu, 2024-09-26 at 10:55 +0200, Francesco Lavra wrote: > On Tue, 2024-09-24 at 17:05 +0100, David Woodhouse wrote: > > From: David Woodhouse > > > > The v1.3 PSCI spec (https://developer.arm.com/documentation/den0022) > > adds > > SYSTEM_OFF2, CLEAN_INV_MEMREGION and CLEAN_INV_MEMREGION_ATTRIBUTES > > functions. Add definitions for them and their parameters, along with > > the > > new TIMEOUT, RATE_LIMITED and BUSY error values. > > The CLEAN_INV_MEMREGION and CLEAN_INV_MEMREGION_ATTRIBUTES > functions were added in the alpha release of the spec but have been > dropped in the beta release, and are not included in the final spec. So > IMO the uapi header file should not contain these definitions. > The same goes for the TIMEOUT, RATE_LIMITED and BUSY error values. Thanks. I'll drop those. smime.p7s Description: S/MIME cryptographic signature
Re: [PATCH v4 1/6] firmware/psci: Add definitions for PSCI v1.3 specification
On Thu, 2024-09-26 at 09:56 +, Miguel Luis wrote: > > > +/* PSCI v1.3 hibernate type for SYSTEM_OFF2 */ > > +#define PSCI_1_3_HIBERNATE_TYPE_OFF 0 > > Should it be 1 as hibernate type? It is in discovery, as BIT(PSCI_1_3_HIBERNATE_TYPE_OFF) == 1<<0 == 1. But using a bitmask was only supposed to be for the discovery with PSCI_FEATURES, as that has to advertise all the available hibernation types. The actual SYSTEM_OFF2 call was supposed to just take the numeric value as an argument, since obviously *that* one isn't a bitmask. Except... I see that now the spec has finally been updated, it seems to say that 0x1 is the value to pass to the SYSTEM_OFF2 call for HIBERNATE_OFF, not 0x0. Which doesn't seem to make much sense, and I don't recall it being what we discussed. Souvik, what happened there? My understanding was that for each supported hibernation type #n, for which HIBERERNATE_OFF is zero), the PSCI_FEATURES query would include the bit (1< smime.p7s Description: S/MIME cryptographic signature
[PATCH doc] docs: gcov: fix link to LCOV website
The previous website hosted on SourceForge is no longer available since January 2024 according to archive.org [1]. It looks like the website has been officially moved to GitHub in June 2022 [2]. Best to redirect readers to the new location then. Link: https://web.archive.org/web/20240105235756/https://ltp.sourceforge.net/coverage/lcov.php [1] Link: https://github.com/linux-test-project/lcov/commit/6da8399c7a7a [2] Signed-off-by: Matthieu Baerts (NGI0) --- Documentation/dev-tools/gcov.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/dev-tools/gcov.rst b/Documentation/dev-tools/gcov.rst index 5fce2b06f22954c3ac6e7d7118064f015624c4a9..53a85ffebcea9fc99484296c5193df7ab5ec4e3e 100644 --- a/Documentation/dev-tools/gcov.rst +++ b/Documentation/dev-tools/gcov.rst @@ -23,7 +23,7 @@ Possible uses: associated code is never run?) .. _gcov: https://gcc.gnu.org/onlinedocs/gcc/Gcov.html -.. _lcov: http://ltp.sourceforge.net/coverage/lcov.php +.. _lcov: https://github.com/linux-test-project/lcov Preparation --- base-commit: 052f172ef127e3b76f31e11f71e957e552cdb94d change-id: 20240926-doc-fix-lcov-link-0bfc5a83a1be Best regards, -- Matthieu Baerts (NGI0)
[PATCH v5 0/5] Add PSCI v1.3 SYSTEM_OFF2 support for hibernation
The PSCI v1.3 spec (https://developer.arm.com/documentation/den0022) adds support for a SYSTEM_OFF2 function enabling a HIBERNATE_OFF state which is analogous to ACPI S4. This will allow hosting environments to determine that a guest is hibernated rather than just powered off, and ensure that they preserve the virtual environment appropriately to allow the guest to resume safely (or bump the hardware_signature in the FACS to trigger a clean reboot instead). This updates KVM to support advertising PSCI v1.3, and unconditionally enables the SYSTEM_OFF2 support when PSCI v1.3 is enabled. For the guest side, add a new SYS_OFF_MODE_POWER_OFF handler with higher priority than the EFI one, but which *only* triggers when there's a hibernation in progress. There are other ways to do this (see the commit message for more details) but this seemed like the simplest. Version 2 of the patch series splits out the psci.h definitions into a separate commit (a dependency for both the guest and KVM side), and adds definitions for the other new functions added in v1.3. It also moves the pKVM psci-relay support to a separate commit; although in arch/arm64/kvm that's actually about the *guest* side of SYSTEM_OFF2 (i.e. using it from the host kernel, relayed through nVHE). Version 3 dropped the KVM_CAP which allowed userspace to explicitly opt in to the new feature like with SYSTEM_SUSPEND, and makes it depend only on PSCI v1.3 being exposed to the guest. Version 4 is no longer RFC, as the PSCI v1.3 spec is finally published. Minor fixes from the last round of review, and an added KVM self test. Version 5 drops some of the changes which didn't make it to the final v1.3 spec, and cleans up a couple of places which still referred to it as 'alpha' or 'beta'. It also temporarily drops the guest-side patch to invoke SYSTEM_OFF2 for hibernation, pending confirmation that the final PSCI v1.3 spec just has a typo where it changed to saying that 0x1 should be passed to mean HIBERNATE_OFF, even though it's advertised as bit 0. That can be sent under separate cover, and perhaps should have been anyway. The change in question doesn't matter for any of the KVM patches, because we just treat SYSTEM_OFF2 like the existing SYSTEM_RESET2, setting a flag to indicate that it was a SYSTEM_OFF2 call, but not actually caring about the argument; that's for userspace to worry about. David Woodhouse (5): firmware/psci: Add definitions for PSCI v1.3 specification KVM: arm64: Add PSCI v1.3 SYSTEM_OFF2 function for hibernation KVM: arm64: Add support for PSCI v1.2 and v1.3 KVM: selftests: Add test for PSCI SYSTEM_OFF2 KVM: arm64: nvhe: Pass through PSCI v1.3 SYSTEM_OFF2 call Documentation/virt/kvm/api.rst | 11 + arch/arm64/include/uapi/asm/kvm.h | 6 +++ arch/arm64/kvm/hyp/nvhe/psci-relay.c| 2 + arch/arm64/kvm/hypercalls.c | 2 + arch/arm64/kvm/psci.c | 43 - include/kvm/arm_psci.h | 4 +- include/uapi/linux/psci.h | 5 ++ tools/testing/selftests/kvm/aarch64/psci_test.c | 61 + 8 files changed, 132 insertions(+), 2 deletions(-)
[PATCH v5 1/5] firmware/psci: Add definitions for PSCI v1.3 specification
From: David Woodhouse The v1.3 PSCI spec (https://developer.arm.com/documentation/den0022) adds the SYSTEM_OFF2 function. Add definitions for it and its hibernation type parameter. Signed-off-by: David Woodhouse --- include/uapi/linux/psci.h | 5 + 1 file changed, 5 insertions(+) diff --git a/include/uapi/linux/psci.h b/include/uapi/linux/psci.h index 42a40ad3fb62..f9a015ebe221 100644 --- a/include/uapi/linux/psci.h +++ b/include/uapi/linux/psci.h @@ -59,6 +59,7 @@ #define PSCI_1_1_FN_SYSTEM_RESET2 PSCI_0_2_FN(18) #define PSCI_1_1_FN_MEM_PROTECTPSCI_0_2_FN(19) #define PSCI_1_1_FN_MEM_PROTECT_CHECK_RANGEPSCI_0_2_FN(20) +#define PSCI_1_3_FN_SYSTEM_OFF2PSCI_0_2_FN(21) #define PSCI_1_0_FN64_CPU_DEFAULT_SUSPEND PSCI_0_2_FN64(12) #define PSCI_1_0_FN64_NODE_HW_STATEPSCI_0_2_FN64(13) @@ -68,6 +69,7 @@ #define PSCI_1_1_FN64_SYSTEM_RESET2PSCI_0_2_FN64(18) #define PSCI_1_1_FN64_MEM_PROTECT_CHECK_RANGE PSCI_0_2_FN64(20) +#define PSCI_1_3_FN64_SYSTEM_OFF2 PSCI_0_2_FN64(21) /* PSCI v0.2 power state encoding for CPU_SUSPEND function */ #define PSCI_0_2_POWER_STATE_ID_MASK 0x @@ -100,6 +102,9 @@ #define PSCI_1_1_RESET_TYPE_SYSTEM_WARM_RESET 0 #define PSCI_1_1_RESET_TYPE_VENDOR_START 0x8000U +/* PSCI v1.3 hibernate type for SYSTEM_OFF2 */ +#define PSCI_1_3_HIBERNATE_TYPE_OFF0 + /* PSCI version decoding (independent of PSCI version) */ #define PSCI_VERSION_MAJOR_SHIFT 16 #define PSCI_VERSION_MINOR_MASK\ -- 2.44.0
[PATCH v5 2/5] KVM: arm64: Add PSCI v1.3 SYSTEM_OFF2 function for hibernation
From: David Woodhouse The PSCI v1.3 specification adds support for a SYSTEM_OFF2 function which is analogous to ACPI S4 state. This will allow hosting environments to determine that a guest is hibernated rather than just powered off, and ensure that they preserve the virtual environment appropriately to allow the guest to resume safely (or bump the hardware_signature in the FACS to trigger a clean reboot instead). Although this new feature is inflicted unconditionally on unexpecting userspace, it should be OK because it still results in the same KVM_SYSTEM_EVENT_SHUTDOWN event, just with a new flag which hopefully won't cause userspace to get unhappy. Signed-off-by: David Woodhouse --- Documentation/virt/kvm/api.rst| 11 + arch/arm64/include/uapi/asm/kvm.h | 6 + arch/arm64/kvm/psci.c | 37 +++ 3 files changed, 54 insertions(+) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index b3be87489108..2918898b7047 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6840,6 +6840,10 @@ the first `ndata` items (possibly zero) of the data array are valid. the guest issued a SYSTEM_RESET2 call according to v1.1 of the PSCI specification. + - for arm64, data[0] is set to KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2 + if the guest issued a SYSTEM_OFF2 call according to v1.3 of the PSCI + specification. + - for RISC-V, data[0] is set to the value of the second argument of the ``sbi_system_reset`` call. @@ -6873,6 +6877,13 @@ either: - Deny the guest request to suspend the VM. See ARM DEN0022D.b 5.19.2 "Caller responsibilities" for possible return values. +Hibernation using the PSCI SYSTEM_OFF2 call is enabled when PSCI v1.3 +is enabled. If a guest invokes the PSCI SYSTEM_OFF2 function, KVM will +exit to userspace with the KVM_SYSTEM_EVENT_SHUTDOWN event type and with +data[0] set to KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2. The only +supported hibernate type for the SYSTEM_OFF2 function is HIBERNATE_OFF +0x0). + :: /* KVM_EXIT_IOAPIC_EOI */ diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h index 964df31da975..66736ff04011 100644 --- a/arch/arm64/include/uapi/asm/kvm.h +++ b/arch/arm64/include/uapi/asm/kvm.h @@ -484,6 +484,12 @@ enum { */ #define KVM_SYSTEM_EVENT_RESET_FLAG_PSCI_RESET2(1ULL << 0) +/* + * Shutdown caused by a PSCI v1.3 SYSTEM_OFF2 call. + * Valid only when the system event has a type of KVM_SYSTEM_EVENT_SHUTDOWN. + */ +#define KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2 (1ULL << 0) + /* run->fail_entry.hardware_entry_failure_reason codes. */ #define KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED(1ULL << 0) diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c index 1f69b667332b..fd0f82464f7d 100644 --- a/arch/arm64/kvm/psci.c +++ b/arch/arm64/kvm/psci.c @@ -194,6 +194,12 @@ static void kvm_psci_system_off(struct kvm_vcpu *vcpu) kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN, 0); } +static void kvm_psci_system_off2(struct kvm_vcpu *vcpu) +{ + kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN, +KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2); +} + static void kvm_psci_system_reset(struct kvm_vcpu *vcpu) { kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET, 0); @@ -358,6 +364,11 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor) if (minor >= 1) val = 0; break; + case PSCI_1_3_FN_SYSTEM_OFF2: + case PSCI_1_3_FN64_SYSTEM_OFF2: + if (minor >= 3) + val = BIT(PSCI_1_3_HIBERNATE_TYPE_OFF); + break; } break; case PSCI_1_0_FN_SYSTEM_SUSPEND: @@ -392,6 +403,32 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor) break; } break; + case PSCI_1_3_FN_SYSTEM_OFF2: + kvm_psci_narrow_to_32bit(vcpu); + fallthrough; + case PSCI_1_3_FN64_SYSTEM_OFF2: + if (minor < 3) + break; + + arg = smccc_get_arg1(vcpu); + if (arg != PSCI_1_3_HIBERNATE_TYPE_OFF) { + val = PSCI_RET_INVALID_PARAMS; + break; + } + kvm_psci_system_off2(vcpu); + /* +* We shouldn't be going back to guest VCPU after +* receiving SYSTEM_OFF2 request. +* +* If user space accidentally/deliberately resumes +* guest VCPU after SYSTEM_OFF2 request then guest +* VCPU should see internal failure from PSCI return +* value. To achieve this, we preload r0 (or x0) with +*
[PATCH v5 5/5] KVM: arm64: nvhe: Pass through PSCI v1.3 SYSTEM_OFF2 call
From: David Woodhouse Pass through the SYSTEM_OFF2 function for hibernation, just like SYSTEM_OFF. Signed-off-by: David Woodhouse --- arch/arm64/kvm/hyp/nvhe/psci-relay.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm64/kvm/hyp/nvhe/psci-relay.c b/arch/arm64/kvm/hyp/nvhe/psci-relay.c index dfe8fe0f7eaf..9c2ce1e0e99a 100644 --- a/arch/arm64/kvm/hyp/nvhe/psci-relay.c +++ b/arch/arm64/kvm/hyp/nvhe/psci-relay.c @@ -265,6 +265,8 @@ static unsigned long psci_1_0_handler(u64 func_id, struct kvm_cpu_context *host_ case PSCI_1_0_FN_PSCI_FEATURES: case PSCI_1_0_FN_SET_SUSPEND_MODE: case PSCI_1_1_FN64_SYSTEM_RESET2: + case PSCI_1_3_FN_SYSTEM_OFF2: + case PSCI_1_3_FN64_SYSTEM_OFF2: return psci_forward(host_ctxt); case PSCI_1_0_FN64_SYSTEM_SUSPEND: return psci_system_suspend(func_id, host_ctxt); -- 2.44.0
[PATCH v5 4/5] KVM: selftests: Add test for PSCI SYSTEM_OFF2
From: David Woodhouse Signed-off-by: David Woodhouse --- .../testing/selftests/kvm/aarch64/psci_test.c | 61 +++ 1 file changed, 61 insertions(+) diff --git a/tools/testing/selftests/kvm/aarch64/psci_test.c b/tools/testing/selftests/kvm/aarch64/psci_test.c index 61731a950def..b7e37956aecf 100644 --- a/tools/testing/selftests/kvm/aarch64/psci_test.c +++ b/tools/testing/selftests/kvm/aarch64/psci_test.c @@ -54,6 +54,15 @@ static uint64_t psci_system_suspend(uint64_t entry_addr, uint64_t context_id) return res.a0; } +static uint64_t psci_system_off2(uint64_t type) +{ + struct arm_smccc_res res; + + smccc_hvc(PSCI_1_3_FN64_SYSTEM_OFF2, type, 0, 0, 0, 0, 0, 0, &res); + + return res.a0; +} + static uint64_t psci_features(uint32_t func_id) { struct arm_smccc_res res; @@ -188,11 +197,63 @@ static void host_test_system_suspend(void) kvm_vm_free(vm); } +static void guest_test_system_off2(void) +{ + uint64_t ret; + + /* assert that SYSTEM_OFF2 is discoverable */ + GUEST_ASSERT(psci_features(PSCI_1_3_FN_SYSTEM_OFF2) & +BIT(PSCI_1_3_HIBERNATE_TYPE_OFF)); + GUEST_ASSERT(psci_features(PSCI_1_3_FN64_SYSTEM_OFF2) & +BIT(PSCI_1_3_HIBERNATE_TYPE_OFF)); + + ret = psci_system_off2(PSCI_1_3_HIBERNATE_TYPE_OFF); + GUEST_SYNC(ret); +} + +static void host_test_system_off2(void) +{ + struct kvm_vcpu *source, *target; + uint64_t psci_version = 0; + struct kvm_run *run; + struct kvm_vm *vm; + + vm = setup_vm(guest_test_system_off2, &source, &target); + vcpu_get_reg(target, KVM_REG_ARM_PSCI_VERSION, &psci_version); + TEST_ASSERT(psci_version >= PSCI_VERSION(0, 2), + "Unexpected PSCI version %lu.%lu", + PSCI_VERSION_MAJOR(psci_version), + PSCI_VERSION_MINOR(psci_version)); + + if (psci_version < PSCI_VERSION(1,3)) + goto skip; + + vcpu_power_off(target); + run = source->run; + + enter_guest(source); + + TEST_ASSERT_KVM_EXIT_REASON(source, KVM_EXIT_SYSTEM_EVENT); + TEST_ASSERT(run->system_event.type == KVM_SYSTEM_EVENT_SHUTDOWN, + "Unhandled system event: %u (expected: %u)", + run->system_event.type, KVM_SYSTEM_EVENT_SHUTDOWN); + TEST_ASSERT(run->system_event.ndata >= 1, + "Unexpected amount of system event data: %u (expected, >= 1)", + run->system_event.ndata); + TEST_ASSERT(run->system_event.data[0] & KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2, + "PSCI_OFF2 flag not set. Flags %llu (expected %llu)", + run->system_event.data[0], KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2); + + skip: + kvm_vm_free(vm); +} + int main(void) { TEST_REQUIRE(kvm_has_cap(KVM_CAP_ARM_SYSTEM_SUSPEND)); host_test_cpu_on(); host_test_system_suspend(); + host_test_system_off2(); return 0; } -- 2.44.0
[PATCH v5 3/5] KVM: arm64: Add support for PSCI v1.2 and v1.3
From: David Woodhouse Signed-off-by: David Woodhouse --- arch/arm64/kvm/hypercalls.c | 2 ++ arch/arm64/kvm/psci.c | 6 +- include/kvm/arm_psci.h | 4 +++- 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c index 5763d979d8ca..9c6267ca2b82 100644 --- a/arch/arm64/kvm/hypercalls.c +++ b/arch/arm64/kvm/hypercalls.c @@ -575,6 +575,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg) case KVM_ARM_PSCI_0_2: case KVM_ARM_PSCI_1_0: case KVM_ARM_PSCI_1_1: + case KVM_ARM_PSCI_1_2: + case KVM_ARM_PSCI_1_3: if (!wants_02) return -EINVAL; vcpu->kvm->arch.psci_version = val; diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c index fd0f82464f7d..5177dda5a411 100644 --- a/arch/arm64/kvm/psci.c +++ b/arch/arm64/kvm/psci.c @@ -328,7 +328,7 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor) switch(psci_fn) { case PSCI_0_2_FN_PSCI_VERSION: - val = minor == 0 ? KVM_ARM_PSCI_1_0 : KVM_ARM_PSCI_1_1; + val = PSCI_VERSION(1, minor); break; case PSCI_1_0_FN_PSCI_FEATURES: arg = smccc_get_arg1(vcpu); @@ -486,6 +486,10 @@ int kvm_psci_call(struct kvm_vcpu *vcpu) } switch (version) { + case KVM_ARM_PSCI_1_3: + return kvm_psci_1_x_call(vcpu, 3); + case KVM_ARM_PSCI_1_2: + return kvm_psci_1_x_call(vcpu, 2); case KVM_ARM_PSCI_1_1: return kvm_psci_1_x_call(vcpu, 1); case KVM_ARM_PSCI_1_0: diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h index e8fb624013d1..cbaec804eb83 100644 --- a/include/kvm/arm_psci.h +++ b/include/kvm/arm_psci.h @@ -14,8 +14,10 @@ #define KVM_ARM_PSCI_0_2 PSCI_VERSION(0, 2) #define KVM_ARM_PSCI_1_0 PSCI_VERSION(1, 0) #define KVM_ARM_PSCI_1_1 PSCI_VERSION(1, 1) +#define KVM_ARM_PSCI_1_2 PSCI_VERSION(1, 2) +#define KVM_ARM_PSCI_1_3 PSCI_VERSION(1, 3) -#define KVM_ARM_PSCI_LATESTKVM_ARM_PSCI_1_1 +#define KVM_ARM_PSCI_LATESTKVM_ARM_PSCI_1_3 static inline int kvm_psci_version(struct kvm_vcpu *vcpu) { -- 2.44.0
Re: [PATCH RFC v4 0/9] tun: Introduce virtio-net hashing feature
On Fri, Sep 27, 2024 at 10:11 AM Akihiko Odaki wrote: > > On 2024/09/25 12:30, Jason Wang wrote: > > On Tue, Sep 24, 2024 at 5:01 PM Akihiko Odaki > > wrote: > >> > >> virtio-net have two usage of hashes: one is RSS and another is hash > >> reporting. Conventionally the hash calculation was done by the VMM. > >> However, computing the hash after the queue was chosen defeats the > >> purpose of RSS. > >> > >> Another approach is to use eBPF steering program. This approach has > >> another downside: it cannot report the calculated hash due to the > >> restrictive nature of eBPF. > >> > >> Introduce the code to compute hashes to the kernel in order to overcome > >> thse challenges. > >> > >> An alternative solution is to extend the eBPF steering program so that it > >> will be able to report to the userspace, but it is based on context > >> rewrites, which is in feature freeze. We can adopt kfuncs, but they will > >> not be UAPIs. We opt to ioctl to align with other relevant UAPIs (KVM > >> and vhost_net). > >> > > > > I wonder if we could clone the skb and reuse some to store the hash, > > then the steering eBPF program can access these fields without > > introducing full RSS in the kernel? > > I don't get how cloning the skb can solve the issue. > > We can certainly implement Toeplitz function in the kernel or even with > tc-bpf to store a hash value that can be used for eBPF steering program > and virtio hash reporting. However we don't have a means of storing a > hash type, which is specific to virtio hash reporting and lacks a > corresponding skb field. I may miss something but looking at sk_filter_is_valid_access(). It looks to me we can make use of skb->cb[0..4]? Thanks > > Regards, > Akihiko Odaki >
Re: [PATCH RFC v4 7/9] tun: Introduce virtio-net RSS
On 2024/09/24 22:05, Simon Horman wrote: On Tue, Sep 24, 2024 at 11:01:12AM +0200, Akihiko Odaki wrote: RSS is a receive steering algorithm that can be negotiated to use with virtio_net. Conventionally the hash calculation was done by the VMM. However, computing the hash after the queue was chosen defeats the purpose of RSS. Another approach is to use eBPF steering program. This approach has another downside: it cannot report the calculated hash due to the restrictive nature of eBPF steering program. Introduce the code to perform RSS to the kernel in order to overcome thse challenges. An alternative solution is to extend the eBPF steering program so that it will be able to report to the userspace, but I didn't opt for it because extending the current mechanism of eBPF steering program as is because it relies on legacy context rewriting, and introducing kfunc-based eBPF will result in non-UAPI dependency while the other relevant virtualization APIs such as KVM and vhost_net are UAPIs. Signed-off-by: Akihiko Odaki ... diff --git a/drivers/net/tun.c b/drivers/net/tun.c ... @@ -333,8 +336,10 @@ static long tun_set_vnet_be(struct tun_struct *tun, int __user *argp) return -EFAULT; if (be) { + struct tun_vnet_hash_container *vnet_hash = rtnl_dereference(tun->vnet_hash); + if (!(tun->flags & TUN_VNET_LE) && - (tun->vnet_hash.flags & TUN_VNET_HASH_REPORT)) + vnet_hash && (vnet_hash->flags & TUN_VNET_HASH_REPORT)) Hi Odaki-san, I am wondering if the above should this be vnet_hash->common.flags? I am seeing this: ../drivers/net/tun.c:342:44: error: ‘struct tun_vnet_hash_container’ has no member named ‘flags’ 342 | vnet_hash && (vnet_hash->flags & TUN_VNET_HASH_REPORT)) ... You are right. I couldn't notice this error because I was testing without CONFIG_TUN_VNET_CROSS_LE; I'll test with the configuration and submit a new version with fix. Regards, Akihiko Odaki
Re: [PATCH RFC v4 0/9] tun: Introduce virtio-net hashing feature
On 2024/09/25 12:30, Jason Wang wrote: On Tue, Sep 24, 2024 at 5:01 PM Akihiko Odaki wrote: virtio-net have two usage of hashes: one is RSS and another is hash reporting. Conventionally the hash calculation was done by the VMM. However, computing the hash after the queue was chosen defeats the purpose of RSS. Another approach is to use eBPF steering program. This approach has another downside: it cannot report the calculated hash due to the restrictive nature of eBPF. Introduce the code to compute hashes to the kernel in order to overcome thse challenges. An alternative solution is to extend the eBPF steering program so that it will be able to report to the userspace, but it is based on context rewrites, which is in feature freeze. We can adopt kfuncs, but they will not be UAPIs. We opt to ioctl to align with other relevant UAPIs (KVM and vhost_net). I wonder if we could clone the skb and reuse some to store the hash, then the steering eBPF program can access these fields without introducing full RSS in the kernel? I don't get how cloning the skb can solve the issue. We can certainly implement Toeplitz function in the kernel or even with tc-bpf to store a hash value that can be used for eBPF steering program and virtio hash reporting. However we don't have a means of storing a hash type, which is specific to virtio hash reporting and lacks a corresponding skb field. Regards, Akihiko Odaki
Re: [PATCH 07/12] huge_memory: Allow mappings of PMD sized pages
Alistair Popple wrote: > Currently DAX folio/page reference counts are managed differently to > normal pages. To allow these to be managed the same as normal pages > introduce dax_insert_pfn_pmd. This will map the entire PMD-sized folio > and take references as it would for a normally mapped page. > > This is distinct from the current mechanism, vmf_insert_pfn_pmd, which > simply inserts a special devmap PMD entry into the page table without > holding a reference to the page for the mapping. It would be useful to mention the rationale for the locking changes and your understanding of the new "pgtable deposit" handling, because those things make this not a trivial conversion. > > Signed-off-by: Alistair Popple > --- > include/linux/huge_mm.h | 1 +- > mm/huge_memory.c| 57 ++ > 2 files changed, 48 insertions(+), 10 deletions(-) > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > index d3a1872..eaf3f78 100644 > --- a/include/linux/huge_mm.h > +++ b/include/linux/huge_mm.h > @@ -40,6 +40,7 @@ int change_huge_pmd(struct mmu_gather *tlb, struct > vm_area_struct *vma, > > vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write); > vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write); > +vm_fault_t dax_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write); > vm_fault_t dax_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write); > > enum transparent_hugepage_flag { > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index e8985a4..790041e 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -1237,14 +1237,12 @@ static void insert_pfn_pmd(struct vm_area_struct > *vma, unsigned long addr, > { > struct mm_struct *mm = vma->vm_mm; > pmd_t entry; > - spinlock_t *ptl; > > - ptl = pmd_lock(mm, pmd); > if (!pmd_none(*pmd)) { > if (write) { > if (pmd_pfn(*pmd) != pfn_t_to_pfn(pfn)) { > WARN_ON_ONCE(!is_huge_zero_pmd(*pmd)); > - goto out_unlock; > + return; > } > entry = pmd_mkyoung(*pmd); > entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); > @@ -1252,7 +1250,7 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, > unsigned long addr, > update_mmu_cache_pmd(vma, addr, pmd); > } > > - goto out_unlock; > + return; > } > > entry = pmd_mkhuge(pfn_t_pmd(pfn, prot)); > @@ -1271,11 +1269,6 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, > unsigned long addr, > > set_pmd_at(mm, addr, pmd, entry); > update_mmu_cache_pmd(vma, addr, pmd); > - > -out_unlock: > - spin_unlock(ptl); > - if (pgtable) > - pte_free(mm, pgtable); > } > > /** > @@ -1294,6 +1287,7 @@ vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, > pfn_t pfn, bool write) > struct vm_area_struct *vma = vmf->vma; > pgprot_t pgprot = vma->vm_page_prot; > pgtable_t pgtable = NULL; > + spinlock_t *ptl; > > /* >* If we had pmd_special, we could avoid all these restrictions, > @@ -1316,12 +1310,55 @@ vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, > pfn_t pfn, bool write) > } > > track_pfn_insert(vma, &pgprot, pfn); > - > + ptl = pmd_lock(vma->vm_mm, vmf->pmd); > insert_pfn_pmd(vma, addr, vmf->pmd, pfn, pgprot, write, pgtable); > + spin_unlock(ptl); > + if (pgtable) > + pte_free(vma->vm_mm, pgtable); > + > return VM_FAULT_NOPAGE; > } > EXPORT_SYMBOL_GPL(vmf_insert_pfn_pmd); > > +vm_fault_t dax_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write) > +{ > + struct vm_area_struct *vma = vmf->vma; > + unsigned long addr = vmf->address & PMD_MASK; > + struct mm_struct *mm = vma->vm_mm; > + spinlock_t *ptl; > + pgtable_t pgtable = NULL; > + struct folio *folio; > + struct page *page; > + > + if (addr < vma->vm_start || addr >= vma->vm_end) > + return VM_FAULT_SIGBUS; > + > + if (arch_needs_pgtable_deposit()) { > + pgtable = pte_alloc_one(vma->vm_mm); > + if (!pgtable) > + return VM_FAULT_OOM; > + } > + > + track_pfn_insert(vma, &vma->vm_page_prot, pfn); > + > + ptl = pmd_lock(mm, vmf->pmd); > + if (pmd_none(*vmf->pmd)) { > + page = pfn_t_to_page(pfn); > + folio = page_folio(page); > + folio_get(folio); > + folio_add_file_rmap_pmd(folio, page, vma); > + add_mm_counter(mm, mm_counter_file(folio), HPAGE_PMD_NR); > + } > + insert_pfn_pmd(vma, addr, vmf->pmd, pfn, vma->vm_page_prot, > + write, pgtable); > + spin_unlock(ptl); > + if (pgtable) > + pte_free(mm, pgtable); Are not the deposit ru
Re: [PATCH 08/12] gup: Don't allow FOLL_LONGTERM pinning of FS DAX pages
Dan Williams wrote: > Alistair Popple wrote: > > Longterm pinning of FS DAX pages should already be disallowed by > > various pXX_devmap checks. However a future change will cause these > > checks to be invalid for FS DAX pages so make > > folio_is_longterm_pinnable() return false for FS DAX pages. > > > > Signed-off-by: Alistair Popple > > --- > > include/linux/memremap.h | 11 +++ > > include/linux/mm.h | 4 > > 2 files changed, 15 insertions(+) > > > > diff --git a/include/linux/memremap.h b/include/linux/memremap.h > > index 14273e6..6a1406a 100644 > > --- a/include/linux/memremap.h > > +++ b/include/linux/memremap.h > > @@ -187,6 +187,17 @@ static inline bool folio_is_device_coherent(const > > struct folio *folio) > > return is_device_coherent_page(&folio->page); > > } > > > > +static inline bool is_device_dax_page(const struct page *page) > > +{ > > + return is_zone_device_page(page) && > > + page_dev_pagemap(page)->type == MEMORY_DEVICE_FS_DAX; > > +} > > + > > +static inline bool folio_is_device_dax(const struct folio *folio) > > +{ > > + return is_device_dax_page(&folio->page); > > +} > > + > > #ifdef CONFIG_ZONE_DEVICE > > void zone_device_page_init(struct page *page); > > void *memremap_pages(struct dev_pagemap *pgmap, int nid); > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > index ae6d713..935e493 100644 > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -1989,6 +1989,10 @@ static inline bool folio_is_longterm_pinnable(struct > > folio *folio) > > if (folio_is_device_coherent(folio)) > > return false; > > > > + /* DAX must also always allow eviction. */ > > + if (folio_is_device_dax(folio)) > > Why is this called "folio_is_device_dax()" when the check is for fsdax? > > I would expect: > > if (folio_is_fsdax(folio)) > return false; > > ...and s/device_dax/fsdax/ for the rest of the helpers. Specifically devdax is ok to allow longterm pinning since it is statically allocated. fsdax is the only ZONE_DEVICE mode where there is a higher-level allocator that does not support a 3rd party the block its operations indefinitely with a pin. So this needs to be explicit for that case.