Hi Mykola,

Mykola Kvach <xakep.ama...@gmail.com> writes:

> From: Mykola Kvach <mykola_kv...@epam.com>
>
> Add support for the PSCI SYSTEM_SUSPEND function in the vPSCI interface,
> allowing guests to request suspend via the PSCI v1.0 SYSTEM_SUSPEND call
> (both 32-bit and 64-bit variants).
>
> Implementation details:
> - Add SYSTEM_SUSPEND function IDs to PSCI definitions
> - Trap and handle SYSTEM_SUSPEND in vPSCI
> - Allow only non-hardware domains to invoke SYSTEM_SUSPEND; return
>   PSCI_NOT_SUPPORTED for the hardware domain to avoid halting the system
>   in hwdom_shutdown() via domain_shutdown
> - Require all secondary VCPUs of the calling domain to be offline before
>   suspend, as mandated by the PSCI specification
>
> Additionally, GIC context must be saved when a domain suspends.
> LRs are not architecturally preserved across suspend/resume or context
> switches, so Xen must explicitly save and restore them. This requirement
> is specified in the PSCI specification (DEN0022F.b, section 6.8 "Preserving
> the execution context"). The fix is implemented by moving the respective
> code in ctxt_switch_from() before the return path taken when the domain
> suspends.
>
> Introduce domain_resume_nopause/_helper() to allow resuming a domain from
> SYSTEM_SUSPEND without pausing it first. This avoids problematic
> domain_pause() calls when resuming from suspend on Arm, particularly in error
> paths. The helper is Arm-specific; other architectures continue to use the
> original domain_resume().
>
> Usage:
>
> For Linux-based guests, suspend can be initiated with:
>     echo mem > /sys/power/state
> or via:
>     systemctl suspend
>
> Resuming the guest is performed from control domain using:
>       xl resume <domain>
>
> Signed-off-by: Mykola Kvach <mykola_kv...@epam.com>
> ---
> Changes in V10:
> - small changes to the commit message reflect updates introduced in this
>   version of the patch.
> - Comments are improved, clarified, and expanded, especially regarding PSCI
>   requirements and context handling.
> - An ARM-specific helper (domain_resume_nopause_helper)
> - gprintk() and PRIregister are used for logging in vPSCI code.
> - An isb() is added before p2m_save_state
> - The is_64bit_domain check is dropped when masking the upper part of entry
>   point and cid for SMC32 SYSTEM_SUSPEND PSCI calls
>
> Changes in V9:
> - no functional changes
> - cosmetic chnages after review
> - enhance commit message and add extra comment to the code after review
>
> Changes in V8:
> - GIC and virtual timer context must be saved when the domain suspends
> - rework locking
> - minor changes after code review
>
> Changes in V7:
> - add proper locking
> - minor changes after code review
>
> Changes in V6:
> - skip execution of ctxt_switch_from for vcpu that is in paused domain
> - add implementation of domain_resume without domain_pause
> - add helper function to determine if vcpu is suspended or not
> - ignore upper 32 bits of argument values when the domain is 64-bit
>   and calls the SMC32 SYSTEM_SUSPEND function
> - cosmetic changes after review
>
> Changes in V5:
> - don't use standby mode, restore execution in a provided by guest point
> - move checking that all CPUs, except current one, are offline to after
>   pausing the vCPUs
> - provide ret status from arch_domain_shutdown and handle it in
>   domain_shutdown
> - adjust VPSCI_NR_FUNCS to reflect the number of newly added PSCI functions
>
> Changes in V4:
> Dropped all changes related to watchdog, domain is marked as shutting
> down in domain_shutdown and watchdog timeout handler won't trigger
> because of it.
>
> Previous versions included code to manage Xen watchdog timers during suspend,
> but this was removed. When a guest OS starts the Xen watchdog (either via the
> kernel driver or xenwatchdogd), it is responsible for managing that state
> across suspend/resume. On Linux, the Xen kernel driver properly stops the
> watchdog during suspend. However, when xenwatchdogd is used instead, suspend
> handling is incomplete, potentially leading to watchdog-triggered resets on
> resume. Xen leaves watchdog handling to the guest OS and its services.
>
> Dropped all changes related to VCPU context, because instead domain_shutdown
> is used, so we don't need any extra changes for suspending domain.
>
> Changes in V3:
> Dropped all domain flags and related code (which touched common functions like
> vcpu_unblock), keeping only the necessary changes for Xen suspend/resume, i.e.
> suspend/resume is now fully supported only for the hardware domain.
> Proper support for domU suspend/resume will be added in a future patch.
> This patch does not yet include VCPU context reset or domain context
> restoration in VCPU.
> ---
>  xen/arch/arm/domain.c                 |  21 +++++-
>  xen/arch/arm/include/asm/domain.h     |   2 +
>  xen/arch/arm/include/asm/perfc_defn.h |   1 +
>  xen/arch/arm/include/asm/psci.h       |   2 +
>  xen/arch/arm/include/asm/vpsci.h      |   2 +-
>  xen/arch/arm/vpsci.c                  | 100 ++++++++++++++++++++++----
>  xen/common/domain.c                   |  35 +++++++--
>  xen/include/xen/sched.h               |   3 +
>  8 files changed, 141 insertions(+), 25 deletions(-)
>
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 863ae18157..401106fd67 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -90,6 +90,24 @@ static void ctxt_switch_from(struct vcpu *p)
>      if ( is_idle_vcpu(p) )
>          return;
>  
> +    /*
> +     * Even if the guest saves and restores its own context, Xen must save
> +     * the transient state (such as List Registers) that is not preserved
> +     * by hardware across suspend. See PSCI DEN0022F.b, section 6.8
> +     * "Preserving the execution context".
> +     */
> +    gic_save_state(p);
> +
> +    /*
> +     * Skip saving the rest of the context if the VCPU is suspended,
> +     * to avoid modifying SCTLR_EL1. This is required by the PSCI
> +     * specification, which manages SCTLR_EL1 during SYSTEM_SUSPEND flow.
> +     */
> +    if ( test_bit(_VPF_suspended, &p->pause_flags) )
> +        return;
> +
> +    isb();
> +
>      p2m_save_state(p);

I am pretty sure that Julien told you that you can't skip p2m due to
to some workarounds regarding CPU speculation.

>  
>      /* CP 15 */
> @@ -158,9 +176,6 @@ static void ctxt_switch_from(struct vcpu *p)
>  
>      /* XXX MPU */
>  
> -    /* VGIC */
> -    gic_save_state(p);
> -
>      isb();
>  }
>  
> diff --git a/xen/arch/arm/include/asm/domain.h 
> b/xen/arch/arm/include/asm/domain.h
> index a3487ca713..f9577bc9ae 100644
> --- a/xen/arch/arm/include/asm/domain.h
> +++ b/xen/arch/arm/include/asm/domain.h
> @@ -312,6 +312,8 @@ static inline void update_guest_memory_policy(struct vcpu 
> *v,
>                                                struct guest_memory_policy 
> *gmp)
>  {}
>  
> +void domain_resume_nopause_helper(struct domain *d);

If I am not mistaken, function declarations for xen/common/domain.c
should go into include/xen/domain.h, not arch/arm/include/asm/domain.h

Also, I am honestly don't know is it a good way to expose internal
function from xen/common/domain.c. I do understand why do you need this,
but function naming becomes more and more confusing. I had my
considerations against domain_resume_nopause(), but
domain_resume_nopause_helper() is even worse, because it sounds to
generic.

Lastly, I checked docs/misra/rules.rst. There is no mention of Rule
8.7. So why you can't just make domain_resume_nopause() externally
available?

> +
>  #endif /* __ASM_DOMAIN_H__ */
>  
>  /*
> diff --git a/xen/arch/arm/include/asm/perfc_defn.h 
> b/xen/arch/arm/include/asm/perfc_defn.h
> index effd25b69e..8dfcac7e3b 100644
> --- a/xen/arch/arm/include/asm/perfc_defn.h
> +++ b/xen/arch/arm/include/asm/perfc_defn.h
> @@ -33,6 +33,7 @@ PERFCOUNTER(vpsci_system_reset,        "vpsci: 
> system_reset")
>  PERFCOUNTER(vpsci_cpu_suspend,         "vpsci: cpu_suspend")
>  PERFCOUNTER(vpsci_cpu_affinity_info,   "vpsci: cpu_affinity_info")
>  PERFCOUNTER(vpsci_features,            "vpsci: features")
> +PERFCOUNTER(vpsci_system_suspend,      "vpsci: system_suspend")
>  
>  PERFCOUNTER(vcpu_kick,                 "vcpu: notify other vcpu")
>  
> diff --git a/xen/arch/arm/include/asm/psci.h b/xen/arch/arm/include/asm/psci.h
> index 4780972621..48a93e6b79 100644
> --- a/xen/arch/arm/include/asm/psci.h
> +++ b/xen/arch/arm/include/asm/psci.h
> @@ -47,10 +47,12 @@ void call_psci_system_reset(void);
>  #define PSCI_0_2_FN32_SYSTEM_OFF          PSCI_0_2_FN32(8)
>  #define PSCI_0_2_FN32_SYSTEM_RESET        PSCI_0_2_FN32(9)
>  #define PSCI_1_0_FN32_PSCI_FEATURES       PSCI_0_2_FN32(10)
> +#define PSCI_1_0_FN32_SYSTEM_SUSPEND      PSCI_0_2_FN32(14)
>  
>  #define PSCI_0_2_FN64_CPU_SUSPEND         PSCI_0_2_FN64(1)
>  #define PSCI_0_2_FN64_CPU_ON              PSCI_0_2_FN64(3)
>  #define PSCI_0_2_FN64_AFFINITY_INFO       PSCI_0_2_FN64(4)
> +#define PSCI_1_0_FN64_SYSTEM_SUSPEND      PSCI_0_2_FN64(14)
>  
>  /* PSCI v0.2 affinity level state returned by AFFINITY_INFO */
>  #define PSCI_0_2_AFFINITY_LEVEL_ON      0
> diff --git a/xen/arch/arm/include/asm/vpsci.h 
> b/xen/arch/arm/include/asm/vpsci.h
> index 0cca5e6830..69d40f9d7f 100644
> --- a/xen/arch/arm/include/asm/vpsci.h
> +++ b/xen/arch/arm/include/asm/vpsci.h
> @@ -23,7 +23,7 @@
>  #include <asm/psci.h>
>  
>  /* Number of function implemented by virtual PSCI (only 0.2 or later) */
> -#define VPSCI_NR_FUNCS  12
> +#define VPSCI_NR_FUNCS  14
>  
>  /* Functions handle PSCI calls from the guests */
>  bool do_vpsci_0_1_call(struct cpu_user_regs *regs, uint32_t fid);
> diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
> index 7ba9ccd94b..3371f33b81 100644
> --- a/xen/arch/arm/vpsci.c
> +++ b/xen/arch/arm/vpsci.c
> @@ -10,28 +10,18 @@
>  
>  #include <public/sched.h>
>  
> -static int do_common_cpu_on(register_t target_cpu, register_t entry_point,
> -                            register_t context_id)
> +static int do_setup_vcpu_ctx(struct vcpu *v, register_t entry_point,
> +                      register_t context_id)
>  {
> -    struct vcpu *v;
>      struct domain *d = current->domain;
>      struct vcpu_guest_context *ctxt;
>      int rc;
>      bool is_thumb = entry_point & 1;
> -    register_t vcpuid;
> -
> -    vcpuid = vaffinity_to_vcpuid(target_cpu);
> -
> -    if ( (v = domain_vcpu(d, vcpuid)) == NULL )
> -        return PSCI_INVALID_PARAMETERS;
>  
>      /* THUMB set is not allowed with 64-bit domain */
>      if ( is_64bit_domain(d) && is_thumb )
>          return PSCI_INVALID_ADDRESS;
>  
> -    if ( !test_bit(_VPF_down, &v->pause_flags) )
> -        return PSCI_ALREADY_ON;
> -
>      if ( (ctxt = alloc_vcpu_guest_context()) == NULL )
>          return PSCI_DENIED;
>  
> @@ -78,11 +68,32 @@ static int do_common_cpu_on(register_t target_cpu, 
> register_t entry_point,
>      if ( rc < 0 )
>          return PSCI_DENIED;
>  
> -    vcpu_wake(v);
> -
>      return PSCI_SUCCESS;
>  }
>  
> +static int do_common_cpu_on(register_t target_cpu, register_t entry_point,
> +                            register_t context_id)
> +{
> +    int rc;
> +    struct vcpu *v;
> +    struct domain *d = current->domain;
> +    register_t vcpuid;
> +
> +    vcpuid = vaffinity_to_vcpuid(target_cpu);
> +
> +    if ( (v = domain_vcpu(d, vcpuid)) == NULL )
> +        return PSCI_INVALID_PARAMETERS;
> +
> +    if ( !test_bit(_VPF_down, &v->pause_flags) )
> +        return PSCI_ALREADY_ON;
> +
> +    rc = do_setup_vcpu_ctx(v, entry_point, context_id);
> +    if ( rc == PSCI_SUCCESS )
> +        vcpu_wake(v);
> +
> +    return rc;
> +}
> +
>  static int32_t do_psci_cpu_on(uint32_t vcpuid, register_t entry_point)
>  {
>      int32_t ret;
> @@ -197,6 +208,48 @@ static void do_psci_0_2_system_reset(void)
>      domain_shutdown(d,SHUTDOWN_reboot);
>  }
>  
> +static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
> +{
> +    int32_t rc;
> +    struct vcpu *v;
> +    struct domain *d = current->domain;
> +
> +    /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
> +    if ( is_hardware_domain(d) )
> +        return PSCI_NOT_SUPPORTED;
> +
> +    /* Ensure that all CPUs other than the calling one are offline */
> +    domain_lock(d);
> +    for_each_vcpu ( d, v )
> +    {
> +        if ( v != current && is_vcpu_online(v) )
> +        {
> +            domain_unlock(d);
> +            return PSCI_DENIED;
> +        }
> +    }
> +    domain_unlock(d);
> +
> +    rc = domain_shutdown(d, SHUTDOWN_suspend);
> +    if ( rc )
> +        return PSCI_DENIED;
> +
> +    rc = do_setup_vcpu_ctx(current, epoint, cid);
> +    if ( rc != PSCI_SUCCESS )
> +    {
> +        domain_resume_nopause_helper(d);
> +        return rc;
> +    }
> +
> +    set_bit(_VPF_suspended, &current->pause_flags);
> +
> +    gprintk(XENLOG_DEBUG,
> +            "SYSTEM_SUSPEND requested, epoint=0x%"PRIregister", 
> cid=0x%"PRIregister,
> +            epoint, cid);
> +
> +    return rc;
> +}
> +
>  static int32_t do_psci_1_0_features(uint32_t psci_func_id)
>  {
>      /* /!\ Ordered by function ID and not name */
> @@ -214,6 +267,8 @@ static int32_t do_psci_1_0_features(uint32_t psci_func_id)
>      case PSCI_0_2_FN32_SYSTEM_OFF:
>      case PSCI_0_2_FN32_SYSTEM_RESET:
>      case PSCI_1_0_FN32_PSCI_FEATURES:
> +    case PSCI_1_0_FN32_SYSTEM_SUSPEND:
> +    case PSCI_1_0_FN64_SYSTEM_SUSPEND:
>      case ARM_SMCCC_VERSION_FID:
>          return 0;
>      default:
> @@ -344,6 +399,23 @@ bool do_vpsci_0_2_call(struct cpu_user_regs *regs, 
> uint32_t fid)
>          return true;
>      }
>  
> +    case PSCI_1_0_FN32_SYSTEM_SUSPEND:
> +    case PSCI_1_0_FN64_SYSTEM_SUSPEND:
> +    {
> +        register_t epoint = PSCI_ARG(regs, 1);
> +        register_t cid = PSCI_ARG(regs, 2);
> +
> +        if ( fid == PSCI_1_0_FN32_SYSTEM_SUSPEND )
> +        {
> +            epoint &= GENMASK(31, 0);
> +            cid &= GENMASK(31, 0);
> +        }
> +
> +        perfc_incr(vpsci_system_suspend);
> +        PSCI_SET_RESULT(regs, do_psci_1_0_system_suspend(epoint, cid));
> +        return true;
> +    }
> +
>      default:
>          return false;
>      }
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 104e917f07..e2371549a0 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -1349,16 +1349,10 @@ int domain_shutdown(struct domain *d, u8 reason)
>      return 0;
>  }
>  
> -void domain_resume(struct domain *d)
> +static void domain_resume_nopause(struct domain *d)
>  {
>      struct vcpu *v;
>  
> -    /*
> -     * Some code paths assume that shutdown status does not get reset under
> -     * their feet (e.g., some assertions make this assumption).
> -     */
> -    domain_pause(d);
> -
>      spin_lock(&d->shutdown_lock);
>  
>      d->is_shutting_down = d->is_shut_down = 0;
> @@ -1366,13 +1360,40 @@ void domain_resume(struct domain *d)
>  
>      for_each_vcpu ( d, v )
>      {
> +        /*
> +         * No need to conditionally clear _VPF_suspended here:
> +         * - This bit is only set on Arm, and only after a successful 
> suspend.
> +         * - domain_resume_nopause() may also be called from paths other than
> +         *   the suspend/resume flow, such as "soft-reset" actions (e.g.,
> +         *   on_poweroff), as part of the Xenstore control/shutdown protocol.
> +         *   These require guest acknowledgement to complete the operation.
> +         * So clearing the bit unconditionally is safe.
> +         */
> +        clear_bit(_VPF_suspended, &v->pause_flags);
> +
>          if ( v->paused_for_shutdown )
>              vcpu_unpause(v);
>          v->paused_for_shutdown = 0;
>      }
>  
>      spin_unlock(&d->shutdown_lock);
> +}
>  
> +#ifdef CONFIG_ARM
> +void domain_resume_nopause_helper(struct domain *d)
> +{
> +    domain_resume_nopause(d);
> +}
> +#endif
> +
> +void domain_resume(struct domain *d)
> +{
> +    /*
> +     * Some code paths assume that shutdown status does not get reset under
> +     * their feet (e.g., some assertions make this assumption).
> +     */
> +    domain_pause(d);
> +    domain_resume_nopause(d);
>      domain_unpause(d);
>  }
>  
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index 02bdc256ce..12429d16b9 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -1014,6 +1014,9 @@ static inline struct domain *next_domain_in_cpupool(
>  /* VCPU is parked. */
>  #define _VPF_parked          8
>  #define VPF_parked           (1UL<<_VPF_parked)
> +/* VCPU is suspended. */
> +#define _VPF_suspended       9
> +#define VPF_suspended        (1UL << _VPF_suspended)
>  
>  static inline bool vcpu_runnable(const struct vcpu *v)
>  {

-- 
WBR, Volodymyr

Reply via email to