RE: [PATCH 0/3] ui/console: initialize QemuDmaBuf in ui/console

2024-03-21 Thread Kim, Dongwon
Hi Philippe,

> -Original Message-
> From: Philippe Mathieu-Daudé 
> Sent: Wednesday, March 20, 2024 11:57 PM
> To: Kim, Dongwon ; qemu-devel@nongnu.org
> Cc: marcandre.lur...@redhat.com
> Subject: Re: [PATCH 0/3] ui/console: initialize QemuDmaBuf in ui/console
> 
> Hi Dongwon,
> 
> On 20/3/24 21:50, dongwon@intel.com wrote:
> > From: Dongwon Kim 
> >
> > QemuDmaBuf struct is defined and primarily used by ui/console/gl so it
> > is better to handle its creation, initialization and access within
> > ui/console rather than within hw modules such as hw/display/virtio-gpu
> > and hw/vfio/display.
> >
> > To achieve this, new methods for allocating, initializing the struct,
> > and accessing certain fields necessary for hardware modules have been
> > introduced in ui/console.c.
> > (3rd patch)
> >
> > Furthermore, modifications have been made to hw/display/virtio-gpu and
> > hw/vfio/display to utilize these new methods instead of setting up the
> > struct independently.
> > (1st and 2nd patches)
> 
> Thanks for splitting; unfortunately the series isn't buildable / bisectable,
> since the methods used in patches 1&2 are only introduced in patch 3 :/
[Kim, Dongwon]  Maybe changing the order of the patches to 3-1-2 would be acceptable?

> 
> > Dongwon Kim (3):
> >hw/virtio: initialize QemuDmaBuf using the function from ui/console
> >hw/vfio: initialize QemuDmaBuf using the function from ui/console
> >ui/console: add methods for allocating, initializing and accessing
> >  QemuDmaBuf



Re: [PATCH 0/3] ui/console: initialize QemuDmaBuf in ui/console

2024-03-21 Thread Philippe Mathieu-Daudé

On 21/3/24 08:01, Kim, Dongwon wrote:

Hi Philippe,


-Original Message-
From: Philippe Mathieu-Daudé 
Sent: Wednesday, March 20, 2024 11:57 PM
To: Kim, Dongwon ; qemu-devel@nongnu.org
Cc: marcandre.lur...@redhat.com
Subject: Re: [PATCH 0/3] ui/console: initialize QemuDmaBuf in ui/console

Hi Dongwon,

On 20/3/24 21:50, dongwon@intel.com wrote:

From: Dongwon Kim 

QemuDmaBuf struct is defined and primarily used by ui/console/gl so it
is better to handle its creation, initialization and access within
ui/console rather than within hw modules such as hw/display/virtio-gpu
and hw/vfio/display.

To achieve this, new methods for allocating, initializing the struct,
and accessing certain fields necessary for hardware modules have been
introduced in ui/console.c.
(3rd patch)

Furthermore, modifications have been made to hw/display/virtio-gpu and
hw/vfio/display to utilize these new methods instead of setting up the
struct independently.
(1st and 2nd patches)


Thanks for splitting; unfortunately the series isn't buildable / bisectable,
since the methods used in patches 1&2 are only introduced in patch 3 :/

[Kim, Dongwon]  Maybe changing the order of the patches to 3-1-2 would be acceptable?


No, because of the g_free() call in dpy_gl_release_dmabuf().

Maybe Marc-André is OK with the previous version...






Dongwon Kim (3):
hw/virtio: initialize QemuDmaBuf using the function from ui/console
hw/vfio: initialize QemuDmaBuf using the function from ui/console
ui/console: add methods for allocating, initializing and accessing
  QemuDmaBuf







Re: [PATCH 1/5] target/riscv: Add support for Zve32x extension

2024-03-21 Thread Jason Chien
I will re-send shortly. Thanks.

Daniel Henrique Barboza wrote on Wednesday, March 20, 2024 at 5:19 AM:

> Hi Jason,
>
> Care to re-send please? The patches apply to neither riscv-to-apply.next
> nor master.
>
>
> Thanks,
>
> Daniel
>
> On 3/19/24 13:23, Jason Chien wrote:
> > Ping. Can anyone review the patches please?
> >
> > Jason Chien <jason.ch...@sifive.com> wrote on Thursday, March 7, 2024 at 1:09 AM:
> >
> > Add support for Zve32x extension and replace some checks for Zve32f
> with
> > Zve32x, since Zve32f depends on Zve32x.
> >
> > Signed-off-by: Jason Chien <jason.ch...@sifive.com>
> > Reviewed-by: Frank Chang <frank.ch...@sifive.com>
> > Reviewed-by: Max Chou <max.c...@sifive.com>
> > ---
> >   target/riscv/cpu.c  |  1 +
> >   target/riscv/cpu_cfg.h  |  1 +
> >   target/riscv/cpu_helper.c   |  2 +-
> >   target/riscv/csr.c  |  2 +-
> >   target/riscv/insn_trans/trans_rvv.c.inc |  4 ++--
> >   target/riscv/tcg/tcg-cpu.c  | 16 
> >   6 files changed, 14 insertions(+), 12 deletions(-)
> >
> > diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> > index fd0c7efdda..10ccae3323 100644
> > --- a/target/riscv/cpu.c
> > +++ b/target/riscv/cpu.c
> > @@ -152,6 +152,7 @@ const RISCVIsaExtData isa_edata_arr[] = {
> >   ISA_EXT_DATA_ENTRY(zvbb, PRIV_VERSION_1_12_0, ext_zvbb),
> >   ISA_EXT_DATA_ENTRY(zvbc, PRIV_VERSION_1_12_0, ext_zvbc),
> >   ISA_EXT_DATA_ENTRY(zve32f, PRIV_VERSION_1_10_0, ext_zve32f),
> > +ISA_EXT_DATA_ENTRY(zve32x, PRIV_VERSION_1_10_0, ext_zve32x),
> >   ISA_EXT_DATA_ENTRY(zve64f, PRIV_VERSION_1_10_0, ext_zve64f),
> >   ISA_EXT_DATA_ENTRY(zve64d, PRIV_VERSION_1_10_0, ext_zve64d),
> >   ISA_EXT_DATA_ENTRY(zvfbfmin, PRIV_VERSION_1_12_0,
> ext_zvfbfmin),
> > diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
> > index be39870691..beb3d10213 100644
> > --- a/target/riscv/cpu_cfg.h
> > +++ b/target/riscv/cpu_cfg.h
> > @@ -90,6 +90,7 @@ struct RISCVCPUConfig {
> >   bool ext_zhinx;
> >   bool ext_zhinxmin;
> >   bool ext_zve32f;
> > +bool ext_zve32x;
> >   bool ext_zve64f;
> >   bool ext_zve64d;
> >   bool ext_zvbb;
> > diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> > index c994a72634..ebbe56d9a2 100644
> > --- a/target/riscv/cpu_helper.c
> > +++ b/target/riscv/cpu_helper.c
> > @@ -72,7 +72,7 @@ void cpu_get_tb_cpu_state(CPURISCVState *env,
> vaddr *pc,
> >   *pc = env->xl == MXL_RV32 ? env->pc & UINT32_MAX : env->pc;
> >   *cs_base = 0;
> >
> > -if (cpu->cfg.ext_zve32f) {
> > +if (cpu->cfg.ext_zve32x) {
> >   /*
> >* If env->vl equals to VLMAX, we can use generic vector
> operation
> >* expanders (GVEC) to accerlate the vector operations.
> > diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> > index 726096444f..d96feea5d3 100644
> > --- a/target/riscv/csr.c
> > +++ b/target/riscv/csr.c
> > @@ -93,7 +93,7 @@ static RISCVException fs(CPURISCVState *env, int
> csrno)
> >
> >   static RISCVException vs(CPURISCVState *env, int csrno)
> >   {
> > -if (riscv_cpu_cfg(env)->ext_zve32f) {
> > +if (riscv_cpu_cfg(env)->ext_zve32x) {
> >   #if !defined(CONFIG_USER_ONLY)
> >   if (!env->debugger && !riscv_cpu_vector_enabled(env)) {
> >   return RISCV_EXCP_ILLEGAL_INST;
> > diff --git a/target/riscv/insn_trans/trans_rvv.c.inc
> b/target/riscv/insn_trans/trans_rvv.c.inc
> > index 9e101ab434..f00f1ee886 100644
> > --- a/target/riscv/insn_trans/trans_rvv.c.inc
> > +++ b/target/riscv/insn_trans/trans_rvv.c.inc
> > @@ -149,7 +149,7 @@ static bool do_vsetvl(DisasContext *s, int rd,
> int rs1, TCGv s2)
> >   {
> >   TCGv s1, dst;
> >
> > -if (!require_rvv(s) || !s->cfg_ptr->ext_zve32f) {
> > +if (!require_rvv(s) || !s->cfg_ptr->ext_zve32x) {
> >   return false;
> >   }
> >
> > @@ -179,7 +179,7 @@ static bool do_vsetivli(DisasContext *s, int rd,
> TCGv s1, TCGv s2)
> >   {
> >   TCGv dst;
> >
> > -if (!require_rvv(s) || !s->cfg_ptr->ext_zve32f) {
> > +if (!require_rvv(s) || !s->cfg_ptr->ext_zve32x) {
> >   return false;
> >   }
> >
> > diff --git a/target/riscv/tcg/tcg-cpu.c b/target/riscv/tcg/tcg-cpu.c
> > index ab6db817db..ce539528e6 100644
> > --- a/target/riscv/tcg/tcg-cpu.c
> > +++ b/target/riscv/tcg/tcg-cpu.c
> > @@ -501,9 +501,13 @@ void riscv_cpu_validate_set_extensions(RISCVCPU
> *cpu, Error **errp)
> >   return;
> >   }
> >
> > -if (cpu->cfg.ext_zve32f && !riscv_has_ext(env, RVF)) {
> > -error_setg(errp, "Zve32f/Zve64f extensions require F
> extensi

Re: [PATCH v2] target/riscv: Fix the element agnostic function problem

2024-03-21 Thread Richard Henderson

On 3/20/24 17:58, Huang Tao wrote:

In RVV and vcrypto instructions, the masked and tail elements are set to 1s
using the vext_set_elems_1s function if the vma/vta bit is set. This is the
element-agnostic policy.

However, this function can't deal with the big-endian situation. This patch
fixes the problem by adding handling of that case.

Signed-off-by: Huang Tao 
Suggested-by: Richard Henderson 
---
Changes in v2:
- Keep the api of vext_set_elems_1s
- Reduce the number of patches.
---
  target/riscv/vector_internals.c | 22 ++
  1 file changed, 22 insertions(+)

diff --git a/target/riscv/vector_internals.c b/target/riscv/vector_internals.c
index 12f5964fbb..3e45b9b4a7 100644
--- a/target/riscv/vector_internals.c
+++ b/target/riscv/vector_internals.c
@@ -30,6 +30,28 @@ void vext_set_elems_1s(void *base, uint32_t is_agnostic, 
uint32_t cnt,
  if (tot - cnt == 0) {
  return ;
  }
+
+#if HOST_BIG_ENDIAN
+/*
+ * Deal with the situation when the elements are inside
+ * only one uint64 block, including setting the
+ * masked-off element.
+ */
+if ((tot - 1) ^ cnt < 8) {
+memset(base + H1(tot - 1), -1, tot - cnt);
+return;
+}


(1) tot will always be a multiple of 8, afaik, so there's no need for this 
first block.
(2) Using if rather than #if means that the code is always compile-tested,
even if it is eliminated.


r~



Re: [PATCH v2] target/riscv: Fix the element agnostic function problem

2024-03-21 Thread Huang Tao



On 2024/3/21 16:18, Richard Henderson wrote:

On 3/20/24 17:58, Huang Tao wrote:
In RVV and vcrypto instructions, the masked and tail elements are set to 1s
using the vext_set_elems_1s function if the vma/vta bit is set. This is the
element-agnostic policy.

However, this function can't deal with the big-endian situation. This patch
fixes the problem by adding handling of that case.

Signed-off-by: Huang Tao 
Suggested-by: Richard Henderson 
---
Changes in v2:
- Keep the api of vext_set_elems_1s
- Reduce the number of patches.
---
  target/riscv/vector_internals.c | 22 ++
  1 file changed, 22 insertions(+)

diff --git a/target/riscv/vector_internals.c 
b/target/riscv/vector_internals.c

index 12f5964fbb..3e45b9b4a7 100644
--- a/target/riscv/vector_internals.c
+++ b/target/riscv/vector_internals.c
@@ -30,6 +30,28 @@ void vext_set_elems_1s(void *base, uint32_t 
is_agnostic, uint32_t cnt,

  if (tot - cnt == 0) {
  return ;
  }
+
+#if HOST_BIG_ENDIAN
+    /*
+ * Deal with the situation when the elements are inside
+ * only one uint64 block, including setting the
+ * masked-off element.
+ */
+    if ((tot - 1) ^ cnt < 8) {
+    memset(base + H1(tot - 1), -1, tot - cnt);
+    return;
+    }


(1) tot will always be a multiple of 8, afaik, so there's no need for 
this first block.
(2) Using if rather than #if means that the code is always compile-tested,
even if it is eliminated.



r~


tot is not always a multiple of 8. In the vector instructions, the
helper functions use vext_set_elems_1s to set a single masked-off
element. In that case, tot = cnt + esz, and tot is not at the end of a
vector register.


There is an example in GEN_VEXT_SHIFT_VV:

    for (i = env->vstart; i < vl; i++) {
    if (!vm && !vext_elem_mask(v0, i)) {
    /* set masked-off elements to 1s */
    vext_set_elems_1s(vd, vma, i * esz, (i + 1) * esz);
    continue;
    }

As for the second point, I will use if instead of #if in the next version.

Thanks,

Huang Tao





[RFC PATCH] target/ppc: Fix TCG PMC5 instruction counting

2024-03-21 Thread Nicholas Piggin
PMC5 does not count instructions when single stepping (with gdb;
haven't tried single stepping inside the target), or when taking
exceptions. At least the single-stepping is a bit of a landmine for
replay.

I don't quite understand the logic of the approach taken for
counting now. AFAICS instructions must be counted whenever leaving
the current TB, whether it is exiting or going to the next TB
directly.

This patch fixes up at least the ss and syscall/synchronous exception
cases, and doesn't seem to break anything. I don't know if there
is a better or more consistent way to do it though.

Thanks,
Nick

---
 target/ppc/translate.c | 146 +++--
 1 file changed, 67 insertions(+), 79 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 93ffec787c..4e4648e02d 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -296,11 +296,73 @@ static inline void gen_update_nip(DisasContext *ctx, 
target_ulong nip)
 tcg_gen_movi_tl(cpu_nip, nip);
 }
 
+#if defined(TARGET_PPC64)
+static void pmu_count_insns(DisasContext *ctx)
+{
+/*
+ * Do not bother calling the helper if the PMU isn't counting
+ * instructions.
+ */
+if (!ctx->pmu_insn_cnt) {
+return;
+}
+
+ #if !defined(CONFIG_USER_ONLY)
+TCGLabel *l;
+TCGv t0;
+
+/*
+ * The PMU insns_inc() helper stops the internal PMU timer if a
+ * counter overflows happens. In that case, if the guest is
+ * running with icount and we do not handle it beforehand,
+ * the helper can trigger a 'bad icount read'.
+ */
+translator_io_start(&ctx->base);
+
+/* Avoid helper calls when only PMC5-6 are enabled. */
+if (!ctx->pmc_other) {
+l = gen_new_label();
+t0 = tcg_temp_new();
+
+gen_load_spr(t0, SPR_POWER_PMC5);
+tcg_gen_addi_tl(t0, t0, ctx->base.num_insns);
+gen_store_spr(SPR_POWER_PMC5, t0);
+/* Check for overflow, if it's enabled */
+if (ctx->mmcr0_pmcjce) {
+tcg_gen_brcondi_tl(TCG_COND_LT, t0, PMC_COUNTER_NEGATIVE_VAL, l);
+gen_helper_handle_pmc5_overflow(tcg_env);
+}
+
+gen_set_label(l);
+} else {
+gen_helper_insns_inc(tcg_env, tcg_constant_i32(ctx->base.num_insns));
+}
+  #else
+/*
+ * User mode can read (but not write) PMC5 and start/stop
+ * the PMU via MMCR0_FC. In this case just increment
+ * PMC5 with base.num_insns.
+ */
+TCGv t0 = tcg_temp_new();
+
+gen_load_spr(t0, SPR_POWER_PMC5);
+tcg_gen_addi_tl(t0, t0, ctx->base.num_insns);
+gen_store_spr(SPR_POWER_PMC5, t0);
+  #endif /* #if !defined(CONFIG_USER_ONLY) */
+}
+#else
+static void pmu_count_insns(DisasContext *ctx)
+{
+return;
+}
+#endif /* #if defined(TARGET_PPC64) */
+
 static void gen_exception_err_nip(DisasContext *ctx, uint32_t excp,
   uint32_t error, target_ulong nip)
 {
 TCGv_i32 t0, t1;
 
+pmu_count_insns(ctx);
 gen_update_nip(ctx, nip);
 t0 = tcg_constant_i32(excp);
 t1 = tcg_constant_i32(error);
@@ -323,6 +385,7 @@ static void gen_exception_nip(DisasContext *ctx, uint32_t 
excp,
 {
 TCGv_i32 t0;
 
+pmu_count_insns(ctx);
 gen_update_nip(ctx, nip);
 t0 = tcg_constant_i32(excp);
 gen_helper_raise_exception(tcg_env, t0);
@@ -4082,67 +4145,6 @@ static inline void gen_update_cfar(DisasContext *ctx, 
target_ulong nip)
 #endif
 }
 
-#if defined(TARGET_PPC64)
-static void pmu_count_insns(DisasContext *ctx)
-{
-/*
- * Do not bother calling the helper if the PMU isn't counting
- * instructions.
- */
-if (!ctx->pmu_insn_cnt) {
-return;
-}
-
- #if !defined(CONFIG_USER_ONLY)
-TCGLabel *l;
-TCGv t0;
-
-/*
- * The PMU insns_inc() helper stops the internal PMU timer if a
- * counter overflows happens. In that case, if the guest is
- * running with icount and we do not handle it beforehand,
- * the helper can trigger a 'bad icount read'.
- */
-translator_io_start(&ctx->base);
-
-/* Avoid helper calls when only PMC5-6 are enabled. */
-if (!ctx->pmc_other) {
-l = gen_new_label();
-t0 = tcg_temp_new();
-
-gen_load_spr(t0, SPR_POWER_PMC5);
-tcg_gen_addi_tl(t0, t0, ctx->base.num_insns);
-gen_store_spr(SPR_POWER_PMC5, t0);
-/* Check for overflow, if it's enabled */
-if (ctx->mmcr0_pmcjce) {
-tcg_gen_brcondi_tl(TCG_COND_LT, t0, PMC_COUNTER_NEGATIVE_VAL, l);
-gen_helper_handle_pmc5_overflow(tcg_env);
-}
-
-gen_set_label(l);
-} else {
-gen_helper_insns_inc(tcg_env, tcg_constant_i32(ctx->base.num_insns));
-}
-  #else
-/*
- * User mode can read (but not write) PMC5 and start/stop
- * the PMU via MMCR0_FC. In this case just increment
- * PMC5 with base.num_insns.
- */
-TCGv t0 = tcg_temp_new();
-
-gen_load_spr(t0, SPR_POWER_PMC5);
-tcg_gen_addi_tl(t0, t0, ct

Re: [RFC PATCH v8 06/23] target/arm: Add support for Non-maskable Interrupt

2024-03-21 Thread Jinjie Ruan via



On 2024/3/20 1:28, Peter Maydell wrote:
> On Mon, 18 Mar 2024 at 09:37, Jinjie Ruan  wrote:
>>
>> This only implements the external delivery method via the GICv3.
>>
>> Signed-off-by: Jinjie Ruan 
>> Reviewed-by: Richard Henderson 
>> ---
>> v8:
>> - Fix the rcu stall after sending a VNMI in qemu VM.
>> v7:
>> - Add Reviewed-by.
>> v6:
>> - env->cp15.hcr_el2 -> arm_hcr_el2_eff().
>> - env->cp15.hcrx_el2 -> arm_hcrx_el2_eff().
>> - Not include VF && VFNMI in CPU_INTERRUPT_VNMI.
>> v4:
>> - Accept NMI unconditionally for arm_cpu_has_work() but add comment.
>> - Change from & to && for EXCP_IRQ or EXCP_FIQ.
>> - Refator nmi mask in arm_excp_unmasked().
>> - Also handle VNMI in arm_cpu_exec_interrupt() and arm_cpu_set_irq().
>> - Rename virtual to Virtual.
>> v3:
>> - Not include CPU_INTERRUPT_NMI when FEAT_NMI not enabled
>> - Add ARM_CPU_VNMI.
>> - Refator nmi mask in arm_excp_unmasked().
>> - Test SCTLR_ELx.NMI for ALLINT mask for NMI.
>> ---
>>  target/arm/cpu-qom.h   |  4 +-
>>  target/arm/cpu.c   | 85 +++---
>>  target/arm/cpu.h   |  4 ++
>>  target/arm/helper.c|  2 +
>>  target/arm/internals.h |  9 +
>>  5 files changed, 97 insertions(+), 7 deletions(-)
>>
>> diff --git a/target/arm/cpu-qom.h b/target/arm/cpu-qom.h
>> index 8e032691db..e0c9e18036 100644
>> --- a/target/arm/cpu-qom.h
>> +++ b/target/arm/cpu-qom.h
>> @@ -36,11 +36,13 @@ DECLARE_CLASS_CHECKERS(AArch64CPUClass, AARCH64_CPU,
>>  #define ARM_CPU_TYPE_SUFFIX "-" TYPE_ARM_CPU
>>  #define ARM_CPU_TYPE_NAME(name) (name ARM_CPU_TYPE_SUFFIX)
>>
>> -/* Meanings of the ARMCPU object's four inbound GPIO lines */
>> +/* Meanings of the ARMCPU object's six inbound GPIO lines */
>>  #define ARM_CPU_IRQ 0
>>  #define ARM_CPU_FIQ 1
>>  #define ARM_CPU_VIRQ 2
>>  #define ARM_CPU_VFIQ 3
>> +#define ARM_CPU_NMI 4
>> +#define ARM_CPU_VNMI 5
>>
>>  /* For M profile, some registers are banked secure vs non-secure;
>>   * these are represented as a 2-element array where the first element
> 
>> @@ -678,13 +687,31 @@ static inline bool arm_excp_unmasked(CPUState *cs, 
>> unsigned int excp_idx,
>>  return false;
>>  }
>>
>> +if (cpu_isar_feature(aa64_nmi, env_archcpu(env)) &&
>> +env->cp15.sctlr_el[target_el] & SCTLR_NMI && cur_el == target_el) {
>> +allIntMask = env->pstate & PSTATE_ALLINT ||
>> + ((env->cp15.sctlr_el[target_el] & SCTLR_SPINTMASK) &&
>> +  (env->pstate & PSTATE_SP));
>> +}
>> +
>>  switch (excp_idx) {
>> +case EXCP_NMI:
>> +pstate_unmasked = !allIntMask;
>> +break;
>> +
>> +case EXCP_VNMI:
>> +if ((!(hcr_el2 & HCR_IMO) && !(hcr_el2 & HCR_FMO)) ||
>> + (hcr_el2 & HCR_TGE)) {
>> +/* VNMIs(VIRQs or VFIQs) are only taken when hypervized.  */
>> +return false;
>> +}
> 
> VINMI and VFNMI aren't the same thing: do we definitely want to
> merge them into one EXCP_VNMI ? It feels like it would be simpler
> to keep them separate. Similarly CPU_INTERRUPT_VNMI, and
> arm_cpu_update_vnmi() probably want VINMI and VFNMI versions.

It's not like that. The VFNMI cannot be reported from the GIC, there
will be no opportunity to call arm_cpu_update_vfnmi().
> 
>> +return !allIntMask;
>>  case EXCP_FIQ:
>> -pstate_unmasked = !(env->daif & PSTATE_F);
>> +pstate_unmasked = (!(env->daif & PSTATE_F)) && (!allIntMask);
>>  break;
>>
>>  case EXCP_IRQ:
>> -pstate_unmasked = !(env->daif & PSTATE_I);
>> +pstate_unmasked = (!(env->daif & PSTATE_I)) && (!allIntMask);
>>  break;
>>
>>  case EXCP_VFIQ:
>> @@ -692,13 +719,13 @@ static inline bool arm_excp_unmasked(CPUState *cs, 
>> unsigned int excp_idx,
>>  /* VFIQs are only taken when hypervized.  */
>>  return false;
>>  }
>> -return !(env->daif & PSTATE_F);
>> +return !(env->daif & PSTATE_F) && (!allIntMask);
>>  case EXCP_VIRQ:
>>  if (!(hcr_el2 & HCR_IMO) || (hcr_el2 & HCR_TGE)) {
>>  /* VIRQs are only taken when hypervized.  */
>>  return false;
>>  }
>> -return !(env->daif & PSTATE_I);
>> +return !(env->daif & PSTATE_I) && (!allIntMask);
>>  case EXCP_VSERR:
>>  if (!(hcr_el2 & HCR_AMO) || (hcr_el2 & HCR_TGE)) {
>>  /* VIRQs are only taken when hypervized.  */
>> @@ -804,6 +831,24 @@ static bool arm_cpu_exec_interrupt(CPUState *cs, int 
>> interrupt_request)
>>
>>  /* The prioritization of interrupts is IMPLEMENTATION DEFINED. */
>>
>> +if (cpu_isar_feature(aa64_nmi, env_archcpu(env))) {
>> +if (interrupt_request & CPU_INTERRUPT_NMI) {
>> +excp_idx = EXCP_NMI;
>> +target_el = arm_phys_excp_target_el(cs, excp_idx, cur_el, 
>> secure);
>> +if (arm_excp_unmasked(cs, excp_idx, target_el,
>> +  cur_el, secure, hcr_el2)) {
>> +

change QARMA3 default for aarch64?

2024-03-21 Thread Michael Tokarev

Since commit v8.1.0-511-g399e5e7125 "target/arm: Implement FEAT_PACQARMA3",
pauth-qarma3 is the default pauth scheme.  However this one is very slow.

When people run aarch64 code in qemu tcg, an immediate reaction is,
"this seems to be a bug somewhere", since the code runs insanely slower
than it did before.

And this is very difficult to track down as well - the reason for that
slowdown is usually well hidden from an average soul.

When the reason is actually discovered, people start changing settings in
various tools and configs to work around this issue.  Qemu itself has
overrides, pauth-impdef=on, in various tests, to make the test run at
saner speed.

After seeing how many issues people are having in debian with that, I'm
about to switch the default in the debian build of qemu, because impdef,
while it makes a certain arm64-specific protection feature less effective,
is actually significantly more practical.  I dislike changing the
defaults, but this is a situation when it needs to be done, imho.

But before doing that, maybe it's better to change qemu default
instead?  What do you think?

Ditto for QARMA5 I guess.

/mjt
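For concreteness, the override mentioned above (the pauth-impdef=on CPU property that QEMU's own tests use) looks like this on the command line; the machine, accelerator and kernel arguments are illustrative:

```shell
# Use the faster implementation-defined pauth algorithm instead of the
# default QARMA3 under TCG: weaker pointer authentication, much faster.
qemu-system-aarch64 -machine virt -accel tcg \
    -cpu max,pauth-impdef=on \
    -kernel Image -append "console=ttyAMA0" -nographic
```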



Re: [PATCH 0/3] ui/console: initialize QemuDmaBuf in ui/console

2024-03-21 Thread Philippe Mathieu-Daudé

On 20/3/24 21:50, dongwon@intel.com wrote:

From: Dongwon Kim 

QemuDmaBuf struct is defined and primarily used by ui/console/gl so it is
better to handle its creation, initialization and access within ui/console
rather than within hw modules such as hw/display/virtio-gpu and
hw/vfio/display.

To achieve this, new methods for allocating, initializing the struct, and
accessing certain fields necessary for hardware modules have been introduced
in ui/console.c.
(3rd patch)

Furthermore, modifications have been made to hw/display/virtio-gpu and
hw/vfio/display to utilize these new methods instead of setting up the struct
independently.
(1st and 2nd patches)


What about dbus_scanout_texture(), should it use dpy_gl_create_dmabuf()?


Dongwon Kim (3):
   hw/virtio: initialize QemuDmaBuf using the function from ui/console
   hw/vfio: initialize QemuDmaBuf using the function from ui/console
   ui/console: add methods for allocating, initializing and accessing
 QemuDmaBuf





Re: [PATCH 3/3] ui/console: add methods for allocating, initializing and accessing QemuDmaBuf

2024-03-21 Thread Philippe Mathieu-Daudé

On 20/3/24 21:50, dongwon@intel.com wrote:

From: Dongwon Kim 

This commit introduces new methods within ui/console to handle the allocation,
initialization, and field retrieval of QemuDmaBuf. By isolating these
operations within ui/console, it enhances safety and encapsulation of
the struct.

Cc: Philippe Mathieu-Daudé 
Cc: Marc-André Lureau 
Cc: Vivek Kasireddy 
Signed-off-by: Dongwon Kim 
---
  include/ui/console.h | 10 
  ui/console.c | 55 
  2 files changed, 65 insertions(+)




  void dpy_gl_release_dmabuf(QemuConsole *con,
QemuDmaBuf *dmabuf)
  {
@@ -1145,6 +1199,7 @@ void dpy_gl_release_dmabuf(QemuConsole *con,
  if (dcl->ops->dpy_gl_release_dmabuf) {
  dcl->ops->dpy_gl_release_dmabuf(dcl, dmabuf);
  }
+g_free(dmabuf);


This makes vhost_user_gpu_handle_display() crash, see VhostUserGPU.


  }
  }
  





Re: [RFC PATCH v8 06/23] target/arm: Add support for Non-maskable Interrupt

2024-03-21 Thread Peter Maydell
On Thu, 21 Mar 2024 at 09:27, Jinjie Ruan  wrote:
>
>
>
> On 2024/3/20 1:28, Peter Maydell wrote:
> > On Mon, 18 Mar 2024 at 09:37, Jinjie Ruan  wrote:
> >>
> >> This only implements the external delivery method via the GICv3.
> >>
> >> Signed-off-by: Jinjie Ruan 
> >> Reviewed-by: Richard Henderson 
> >> ---
> >> v8:
> >> - Fix the rcu stall after sending a VNMI in qemu VM.
> >> v7:
> >> - Add Reviewed-by.
> >> v6:
> >> - env->cp15.hcr_el2 -> arm_hcr_el2_eff().
> >> - env->cp15.hcrx_el2 -> arm_hcrx_el2_eff().
> >> - Not include VF && VFNMI in CPU_INTERRUPT_VNMI.
> >> v4:
> >> - Accept NMI unconditionally for arm_cpu_has_work() but add comment.
> >> - Change from & to && for EXCP_IRQ or EXCP_FIQ.
> >> - Refator nmi mask in arm_excp_unmasked().
> >> - Also handle VNMI in arm_cpu_exec_interrupt() and arm_cpu_set_irq().
> >> - Rename virtual to Virtual.
> >> v3:
> >> - Not include CPU_INTERRUPT_NMI when FEAT_NMI not enabled
> >> - Add ARM_CPU_VNMI.
> >> - Refator nmi mask in arm_excp_unmasked().
> >> - Test SCTLR_ELx.NMI for ALLINT mask for NMI.
> >> ---
> >>  target/arm/cpu-qom.h   |  4 +-
> >>  target/arm/cpu.c   | 85 +++---
> >>  target/arm/cpu.h   |  4 ++
> >>  target/arm/helper.c|  2 +
> >>  target/arm/internals.h |  9 +
> >>  5 files changed, 97 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/target/arm/cpu-qom.h b/target/arm/cpu-qom.h
> >> index 8e032691db..e0c9e18036 100644
> >> --- a/target/arm/cpu-qom.h
> >> +++ b/target/arm/cpu-qom.h
> >> @@ -36,11 +36,13 @@ DECLARE_CLASS_CHECKERS(AArch64CPUClass, AARCH64_CPU,
> >>  #define ARM_CPU_TYPE_SUFFIX "-" TYPE_ARM_CPU
> >>  #define ARM_CPU_TYPE_NAME(name) (name ARM_CPU_TYPE_SUFFIX)
> >>
> >> -/* Meanings of the ARMCPU object's four inbound GPIO lines */
> >> +/* Meanings of the ARMCPU object's six inbound GPIO lines */
> >>  #define ARM_CPU_IRQ 0
> >>  #define ARM_CPU_FIQ 1
> >>  #define ARM_CPU_VIRQ 2
> >>  #define ARM_CPU_VFIQ 3
> >> +#define ARM_CPU_NMI 4
> >> +#define ARM_CPU_VNMI 5
> >>
> >>  /* For M profile, some registers are banked secure vs non-secure;
> >>   * these are represented as a 2-element array where the first element
> >
> >> @@ -678,13 +687,31 @@ static inline bool arm_excp_unmasked(CPUState *cs, 
> >> unsigned int excp_idx,
> >>  return false;
> >>  }
> >>
> >> +if (cpu_isar_feature(aa64_nmi, env_archcpu(env)) &&
> >> +env->cp15.sctlr_el[target_el] & SCTLR_NMI && cur_el == target_el) 
> >> {
> >> +allIntMask = env->pstate & PSTATE_ALLINT ||
> >> + ((env->cp15.sctlr_el[target_el] & SCTLR_SPINTMASK) &&
> >> +  (env->pstate & PSTATE_SP));
> >> +}
> >> +
> >>  switch (excp_idx) {
> >> +case EXCP_NMI:
> >> +pstate_unmasked = !allIntMask;
> >> +break;
> >> +
> >> +case EXCP_VNMI:
> >> +if ((!(hcr_el2 & HCR_IMO) && !(hcr_el2 & HCR_FMO)) ||
> >> + (hcr_el2 & HCR_TGE)) {
> >> +/* VNMIs(VIRQs or VFIQs) are only taken when hypervized.  */
> >> +return false;
> >> +}
> >
> > VINMI and VFNMI aren't the same thing: do we definitely want to
> > merge them into one EXCP_VNMI ? It feels like it would be simpler
> > to keep them separate. Similarly CPU_INTERRUPT_VNMI, and
> > arm_cpu_update_vnmi() probably want VINMI and VFNMI versions.
>
> It's not like that. The VFNMI cannot be reported from the GIC, there
> will be no opportunity to call arm_cpu_update_vfnmi().

The GIC can't trigger it, but the hypervisor can by setting
the HCRX_EL2.VFNMI bit. So writes to HCRX_EL2 (and HCR_EL2)
would need to call arm_cpu_update_vfnmi().

-- PMM



[PATCH 08/10] pnv/phb4: Implement IODA PCT table

2024-03-21 Thread Saif Abrar
The IODA PCT table (#3) is implemented without any functionality,
as it is a debug table.

Signed-off-by: Saif Abrar 
---
 hw/pci-host/pnv_phb4.c  | 6 ++
 include/hw/pci-host/pnv_phb4.h  | 2 ++
 include/hw/pci-host/pnv_phb4_regs.h | 1 +
 3 files changed, 9 insertions(+)

diff --git a/hw/pci-host/pnv_phb4.c b/hw/pci-host/pnv_phb4.c
index 6823ffab54..f48750ee54 100644
--- a/hw/pci-host/pnv_phb4.c
+++ b/hw/pci-host/pnv_phb4.c
@@ -263,6 +263,10 @@ static uint64_t *pnv_phb4_ioda_access(PnvPHB4 *phb,
 mask = phb->big_phb ? PNV_PHB4_MAX_MIST : (PNV_PHB4_MAX_MIST >> 1);
 mask -= 1;
 break;
+case IODA3_TBL_PCT:
+tptr = phb->ioda_PCT;
+mask = 7;
+break;
 case IODA3_TBL_RCAM:
 mask = phb->big_phb ? 127 : 63;
 break;
@@ -361,6 +365,8 @@ static void pnv_phb4_ioda_write(PnvPHB4 *phb, uint64_t val)
 /* Handle side effects */
 switch (table) {
 case IODA3_TBL_LIST:
+case IODA3_TBL_PCT:
+/* No action for debug tables */
 break;
 case IODA3_TBL_MIST: {
 /* Special mask for MIST partial write */
diff --git a/include/hw/pci-host/pnv_phb4.h b/include/hw/pci-host/pnv_phb4.h
index 91e81eee0e..6d83e5616f 100644
--- a/include/hw/pci-host/pnv_phb4.h
+++ b/include/hw/pci-host/pnv_phb4.h
@@ -64,6 +64,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(PnvPHB4, PNV_PHB4)
 #define PNV_PHB4_MAX_LSIs  8
 #define PNV_PHB4_MAX_INTs  4096
 #define PNV_PHB4_MAX_MIST  (PNV_PHB4_MAX_INTs >> 2)
+#define PNV_PHB4_MAX_PCT   128
 #define PNV_PHB4_MAX_MMIO_WINDOWS  32
 #define PNV_PHB4_MIN_MMIO_WINDOWS  16
 #define PNV_PHB4_NUM_REGS  (0x3000 >> 3)
@@ -144,6 +145,7 @@ struct PnvPHB4 {
 /* On-chip IODA tables */
 uint64_t ioda_LIST[PNV_PHB4_MAX_LSIs];
 uint64_t ioda_MIST[PNV_PHB4_MAX_MIST];
+uint64_t ioda_PCT[PNV_PHB4_MAX_PCT];
 uint64_t ioda_TVT[PNV_PHB4_MAX_TVEs];
 uint64_t ioda_MBT[PNV_PHB4_MAX_MBEs];
 uint64_t ioda_MDT[PNV_PHB4_MAX_PEs];
diff --git a/include/hw/pci-host/pnv_phb4_regs.h 
b/include/hw/pci-host/pnv_phb4_regs.h
index c1d5a83271..e30adff7b2 100644
--- a/include/hw/pci-host/pnv_phb4_regs.h
+++ b/include/hw/pci-host/pnv_phb4_regs.h
@@ -486,6 +486,7 @@
 
 #define IODA3_TBL_LIST  1
 #define IODA3_TBL_MIST  2
+#define IODA3_TBL_PCT   3
 #define IODA3_TBL_RCAM  5
 #define IODA3_TBL_MRT   6
 #define IODA3_TBL_PESTA 7
-- 
2.39.3




[PATCH 07/10] pnv/phb4: Set link speed and width in the DLP training control register

2024-03-21 Thread Saif Abrar
Get the current link status from the PCIE macro.
Extract the link speed and link width from the link status
and set them in the DLP training control (PCIE_DLP_TCR) register.

Signed-off-by: Saif Abrar 
---
 hw/pci-host/pnv_phb4.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/hw/pci-host/pnv_phb4.c b/hw/pci-host/pnv_phb4.c
index 7b3d75bae6..6823ffab54 100644
--- a/hw/pci-host/pnv_phb4.c
+++ b/hw/pci-host/pnv_phb4.c
@@ -980,10 +980,27 @@ static uint64_t pnv_phb4_reg_read(void *opaque, hwaddr 
off, unsigned size)
 val |= PHB_PCIE_SCR_PLW_X16; /* RO bit */
 break;
 
-/* Link training always appears trained */
 case PHB_PCIE_DLP_TRAIN_CTL:
-/* TODO: Do something sensible with speed ? */
+/* Link training always appears trained */
 val |= PHB_PCIE_DLP_INBAND_PRESENCE | PHB_PCIE_DLP_TL_LINKACT;
+
+/* Get the current link-status from PCIE */
+uint32_t exp_offset = get_exp_offset(phb);
+uint32_t lnkstatus = bswap32(pnv_phb4_rc_config_read(phb,
+exp_offset + PCI_EXP_LNKSTA, 4));
+
+/* Extract link-speed from the link-status */
+uint32_t v = lnkstatus & PCI_EXP_LNKSTA_CLS;
+/* Set the current link-speed at the LINK_SPEED position */
+val = SETFIELD(PHB_PCIE_DLP_LINK_SPEED, val, v);
+
+/*
+ * Extract link-width from the link-status,
+ * after shifting the required bitfields.
+ */
+v = (lnkstatus & PCI_EXP_LNKSTA_NLW) >> PCI_EXP_LNKSTA_NLW_SHIFT;
+/* Set the current link-width at the LINK_WIDTH position */
+val = SETFIELD(PHB_PCIE_DLP_LINK_WIDTH, val, v);
 return val;
 
 /*
-- 
2.39.3




[PATCH 02/10] pnv/phb4: Add reset logic to PHB4

2024-03-21 Thread Saif Abrar
Add a method to be invoked on QEMU reset.
Also add CFG and PBL core-block reset logic using the
appropriate bits of the PHB_PCIE_CRESET register.

Tested by reading the reset value of a register.

Signed-off-by: Saif Abrar 
---
 hw/pci-host/pnv_phb4.c  | 104 +++-
 include/hw/pci-host/pnv_phb4_regs.h |  16 -
 tests/qtest/pnv-phb4-test.c |  10 +++
 3 files changed, 127 insertions(+), 3 deletions(-)

diff --git a/hw/pci-host/pnv_phb4.c b/hw/pci-host/pnv_phb4.c
index 075499d36d..d2e7403b37 100644
--- a/hw/pci-host/pnv_phb4.c
+++ b/hw/pci-host/pnv_phb4.c
@@ -1,7 +1,7 @@
 /*
- * QEMU PowerPC PowerNV (POWER9) PHB4 model
+ * QEMU PowerPC PowerNV (POWER10) PHB4 model
  *
- * Copyright (c) 2018-2020, IBM Corporation.
+ * Copyright (c) 2018-2024, IBM Corporation.
  *
  * This code is licensed under the GPL version 2 or later. See the
  * COPYING file in the top-level directory.
@@ -22,6 +22,7 @@
 #include "hw/qdev-properties.h"
 #include "qom/object.h"
 #include "trace.h"
+#include "sysemu/reset.h"
 
 #define phb_error(phb, fmt, ...)\
 qemu_log_mask(LOG_GUEST_ERROR, "phb4[%d:%d]: " fmt "\n",\
@@ -499,6 +500,86 @@ static void pnv_phb4_update_xsrc(PnvPHB4 *phb)
 }
 }
 
+/*
+ * Get the PCI-E capability offset from the root-port
+ */
+static uint32_t get_exp_offset(PnvPHB4 *phb)
+{
+PCIHostState *pci = PCI_HOST_BRIDGE(phb->phb_base);
+PCIDevice *pdev;
+pdev = pci_find_device(pci->bus, 0, 0);
+if (!pdev) {
+phb_error(phb, "PCI device not found");
+return ~0;
+}
+PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(pdev);
+return rpc->exp_offset;
+}
+
+#define RC_CONFIG_WRITE(a, v) pnv_phb4_rc_config_write(phb, a, 4, v);
+
+static void pnv_phb4_cfg_core_reset(PnvPHB4 *phb)
+{
+/* Zero all registers initially */
+int i;
+for (i = PCI_COMMAND ; i < PHB_RC_CONFIG_SIZE ; i += 4) {
+RC_CONFIG_WRITE(i, 0)
+}
+
+RC_CONFIG_WRITE(PCI_COMMAND,  0x100100);
+RC_CONFIG_WRITE(PCI_CLASS_REVISION,   0x604);
+RC_CONFIG_WRITE(PCI_CACHE_LINE_SIZE,  0x1);
+RC_CONFIG_WRITE(PCI_MEMORY_BASE,  0x10);
+RC_CONFIG_WRITE(PCI_PREF_MEMORY_BASE, 0x10011);
+RC_CONFIG_WRITE(PCI_CAPABILITY_LIST,  0x40);
+RC_CONFIG_WRITE(PCI_INTERRUPT_LINE,   0x2);
+/* PM Capabilities Register */
+RC_CONFIG_WRITE(PCI_BRIDGE_CONTROL + PCI_PM_PMC, 0xC8034801);
+
+uint32_t exp_offset = get_exp_offset(phb);
+RC_CONFIG_WRITE(exp_offset, 0x420010);
+RC_CONFIG_WRITE(exp_offset + PCI_EXP_DEVCAP,  0x8022);
+RC_CONFIG_WRITE(exp_offset + PCI_EXP_DEVCTL,  0x140);
+RC_CONFIG_WRITE(exp_offset + PCI_EXP_LNKCAP,  0x300105);
+RC_CONFIG_WRITE(exp_offset + PCI_EXP_LNKCTL,  0x2010008);
+RC_CONFIG_WRITE(exp_offset + PCI_EXP_SLTCTL,  0x2000);
+RC_CONFIG_WRITE(exp_offset + PCI_EXP_DEVCAP2, 0x1003F);
+RC_CONFIG_WRITE(exp_offset + PCI_EXP_DEVCTL2, 0x20);
+RC_CONFIG_WRITE(exp_offset + PCI_EXP_LNKCAP2, 0x80003E);
+RC_CONFIG_WRITE(exp_offset + PCI_EXP_LNKCTL2, 0x5);
+
+RC_CONFIG_WRITE(PHB_AER_ECAP,0x14810001);
+RC_CONFIG_WRITE(PHB_AER_CAPCTRL, 0xA0);
+RC_CONFIG_WRITE(PHB_SEC_ECAP,0x1A010019);
+
+RC_CONFIG_WRITE(PHB_LMR_ECAP, 0x1E810027);
+/* LMR - Margining Lane Control / Status Register # 2 to 16 */
+for (i = PHB_LMR_CTLSTA_2 ; i <= PHB_LMR_CTLSTA_16 ; i += 4) {
+RC_CONFIG_WRITE(i, 0x9C38);
+}
+
+RC_CONFIG_WRITE(PHB_DLF_ECAP, 0x1F410025);
+RC_CONFIG_WRITE(PHB_DLF_CAP,  0x8001);
+RC_CONFIG_WRITE(P16_ECAP, 0x22410026);
+RC_CONFIG_WRITE(P32_ECAP, 0x1002A);
+RC_CONFIG_WRITE(P32_CAP,  0x103);
+}
+
+static void pnv_phb4_pbl_core_reset(PnvPHB4 *phb)
+{
+/* Zero all registers initially */
+int i;
+for (i = PHB_PBL_CONTROL ; i <= PHB_PBL_ERR1_STATUS_MASK ; i += 8) {
+phb->regs[i >> 3] = 0x0;
+}
+
+/* Set specific register values */
+phb->regs[PHB_PBL_CONTROL   >> 3] = 0xC009;
+phb->regs[PHB_PBL_TIMEOUT_CTRL  >> 3] = 0x2020;
+phb->regs[PHB_PBL_NPTAG_ENABLE  >> 3] = 0x;
+phb->regs[PHB_PBL_SYS_LINK_INIT >> 3] = 0x80088B4642473000;
+}
+
 static void pnv_phb4_reg_write(void *opaque, hwaddr off, uint64_t val,
unsigned size)
 {
@@ -612,6 +693,16 @@ static void pnv_phb4_reg_write(void *opaque, hwaddr off, uint64_t val,
 pnv_phb4_update_xsrc(phb);
 break;
 
+/* Reset core blocks */
+case PHB_PCIE_CRESET:
+if (val & PHB_PCIE_CRESET_CFG_CORE) {
+pnv_phb4_cfg_core_reset(phb);
+}
+if (val & PHB_PCIE_CRESET_PBL) {
+pnv_phb4_pbl_core_reset(phb);
+}
+break;
+
 /* Silent simple writes */
 case PHB_ASN_CMPM:
 case PHB_CONFIG_ADDRESS:
@@ -1531,6 +1622,13 @@ static void pnv_phb4_xscom_realize(PnvPHB4 *phb)
 static PCIIOMMUOps pnv_phb4_iommu_ops = {
 .get_addres

[PATCH 00/10] pnv/phb4: Update PHB4 to the latest spec PH5

2024-03-21 Thread Saif Abrar
Hello,

This series updates the existing PHB4 model to the latest spec:
"Power Systems Host Bridge 5 (PHB5) Functional Specification Version 0.5_00".

Updates include the following:
- implemented sticky reset logic
- implemented read-only, write-only, W1C and WxC logic
- return all 1's on read to unimplemented registers
- update PCIE registers for link status, speed and width
- implement IODA PCT debug table without any functionality
- set write-mask bits for the PCIE Link-Control-2 register that is read/written by PHB4
- update LSI Source-ID register based on small/big PHB number of interrupts

Also, a new testbench for the PHB4 model is added that does XSCOM reads/writes
to various registers of interest and verifies the values.

Regards.

Saif Abrar (10):
  qtest/phb4: Add testbench for PHB4
  pnv/phb4: Add reset logic to PHB4
  pnv/phb4: Implement sticky reset logic in PHB4
  pnv/phb4: Implement read-only and write-only bits of registers
  pnv/phb4: Implement write-clear and return 1's on unimplemented reg read
  pnv/phb4: Set link-active status in HPSTAT and LMR registers
  pnv/phb4: Set link speed and width in the DLP training control register
  pnv/phb4: Implement IODA PCT table
  hw/pci: Set write-mask bits for PCIE Link-Control-2 register
  pnv/phb4: Mask off LSI Source-ID based on number of interrupts

 hw/pci-host/pnv_phb4.c| 601 --
 hw/pci/pcie.c |   6 +
 include/hw/pci-host/pnv_phb4.h|   9 +
 include/hw/pci-host/pnv_phb4_regs.h   |  66 ++-
 include/standard-headers/linux/pci_regs.h |   3 +
 tests/qtest/meson.build   |   1 +
 tests/qtest/pnv-phb4-test.c   | 177 +++
 7 files changed, 805 insertions(+), 58 deletions(-)
 create mode 100644 tests/qtest/pnv-phb4-test.c

-- 
2.39.3




[PATCH 04/10] pnv/phb4: Implement read-only and write-only bits of registers

2024-03-21 Thread Saif Abrar
SW cannot write the read-only (RO) bits of a register,
and the write-only (WO) bits of a register return 0 when read.

Added ro_mask[] for each register that defines which
bits in that register are RO.
When writing to a register, the RO-bits are not updated.

When reading a register, clear the WO bits and return the updated value.

Tested the registers PHB_DMA_SYNC, PHB_PCIE_HOTPLUG_STATUS, PHB_PCIE_LMR,
PHB_PCIE_DLP_TRWCTL, PHB_LEM_ERROR_AND_MASK and PHB_LEM_ERROR_OR_MASK
by writing all 1's and reading back the value.
The WO bits in these registers should read back as 0.

Signed-off-by: Saif Abrar 
---
 hw/pci-host/pnv_phb4.c  | 77 ++---
 include/hw/pci-host/pnv_phb4.h  |  7 +++
 include/hw/pci-host/pnv_phb4_regs.h | 19 +--
 tests/qtest/pnv-phb4-test.c | 60 +-
 4 files changed, 150 insertions(+), 13 deletions(-)

diff --git a/hw/pci-host/pnv_phb4.c b/hw/pci-host/pnv_phb4.c
index b3a83837f8..a81763f34c 100644
--- a/hw/pci-host/pnv_phb4.c
+++ b/hw/pci-host/pnv_phb4.c
@@ -735,6 +735,10 @@ static void pnv_phb4_reg_write(void *opaque, hwaddr off, uint64_t val,
 return;
 }
 
+/* Update 'val' according to the register's RO-mask */
+val = (phb->regs[off >> 3] & phb->ro_mask[off >> 3]) |
+  (val & ~(phb->ro_mask[off >> 3]));
+
 /* Record whether it changed */
 changed = phb->regs[off >> 3] != val;
 
@@ -808,7 +812,7 @@ static void pnv_phb4_reg_write(void *opaque, hwaddr off, uint64_t val,
 case PHB_TCE_TAG_ENABLE:
 case PHB_INT_NOTIFY_ADDR:
 case PHB_INT_NOTIFY_INDEX:
-case PHB_DMARD_SYNC:
+case PHB_DMA_SYNC:
break;
 
 /* Noise on anything else */
@@ -846,7 +850,7 @@ static uint64_t pnv_phb4_reg_read(void *opaque, hwaddr off, unsigned size)
 case PHB_VERSION:
 return PNV_PHB4_PEC_GET_CLASS(phb->pec)->version;
 
-/* Read-only */
+/* Read-only */
 case PHB_PHB4_GEN_CAP:
 return 0xe4b8ull;
 case PHB_PHB4_TCE_CAP:
@@ -856,18 +860,49 @@ static uint64_t pnv_phb4_reg_read(void *opaque, hwaddr off, unsigned size)
 case PHB_PHB4_EEH_CAP:
 return phb->big_phb ? 0x2000ull : 0x1000ull;
 
+/* Write-only, read will return zeros */
+case PHB_LEM_ERROR_AND_MASK:
+case PHB_LEM_ERROR_OR_MASK:
+return 0;
+case PHB_PCIE_DLP_TRWCTL:
+val &= ~PHB_PCIE_DLP_TRWCTL_WREN;
+return val;
 /* IODA table accesses */
 case PHB_IODA_DATA0:
 return pnv_phb4_ioda_read(phb);
 
+/*
+ * DMA sync: make it look like it's complete,
+ *   clear write-only read/write start sync bits.
+ */
+case PHB_DMA_SYNC:
+val = PHB_DMA_SYNC_RD_COMPLETE |
+~(PHB_DMA_SYNC_RD_START | PHB_DMA_SYNC_WR_START);
+return val;
+
+/*
+ * PCI-E Stack registers
+ */
+case PHB_PCIE_SCR:
+val |= PHB_PCIE_SCR_PLW_X16; /* RO bit */
+break;
+
 /* Link training always appears trained */
 case PHB_PCIE_DLP_TRAIN_CTL:
 /* TODO: Do something sensible with speed ? */
-return PHB_PCIE_DLP_INBAND_PRESENCE | PHB_PCIE_DLP_TL_LINKACT;
+val |= PHB_PCIE_DLP_INBAND_PRESENCE | PHB_PCIE_DLP_TL_LINKACT;
+return val;
+
+case PHB_PCIE_HOTPLUG_STATUS:
+/* Clear write-only bit */
+val &= ~PHB_PCIE_HPSTAT_RESAMPLE;
+return val;
 
-/* DMA read sync: make it look like it's complete */
-case PHB_DMARD_SYNC:
-return PHB_DMARD_SYNC_COMPLETE;
+/* Link Management Register */
+case PHB_PCIE_LMR:
+/* These write-only bits always read as 0 */
+val &= ~(PHB_PCIE_LMR_CHANGELW | PHB_PCIE_LMR_RETRAINLINK);
+return val;
 
 /* Silent simple reads */
 case PHB_LSI_SOURCE_ID:
@@ -1712,6 +1747,33 @@ static PCIIOMMUOps pnv_phb4_iommu_ops = {
 .get_address_space = pnv_phb4_dma_iommu,
 };
 
+static void pnv_phb4_ro_mask_init(PnvPHB4 *phb)
+{
+/* Clear RO-mask to make all regs as R/W by default */
+memset(phb->ro_mask, 0x0, PNV_PHB4_NUM_REGS * sizeof(uint64_t));
+
+/*
+ * Set register specific RO-masks
+ */
+
+/* PBL - Error Injection Register (0x1910) */
+phb->ro_mask[PHB_PBL_ERR_INJECT >> 3] =
+PPC_BITMASK(0, 23) | PPC_BITMASK(28, 35) | PPC_BIT(38) | PPC_BIT(46) |
+PPC_BITMASK(49, 51) | PPC_BITMASK(55, 63);
+
+/* Reserved bits[60:63] */
+phb->ro_mask[PHB_TXE_ERR_LEM_ENABLE >> 3] =
+phb->ro_mask[PHB_TXE_ERR_AIB_FENCE_ENABLE >> 3] = PPC_BITMASK(60, 63);
+/* Reserved bits[36:63] */
+phb->ro_mask[PHB_RXE_TCE_ERR_LEM_ENABLE >> 3] =
+phb->ro_mask[PHB_RXE_TCE_ERR_AIB_FENCE_ENABLE >> 3] = PPC_BITMASK(36, 63);
+/* Reserved bits[40:63] */
+phb->ro_mask[PHB_ERR_LEM_ENABLE >> 3] =
+phb->ro_mask[PHB_ERR_AIB_FENCE_ENABLE >> 3] = PPC_BITMASK(40, 63);
+
+/* TODO: Add more RO-masks as regs are implemented in the model */
+}
+
 static void pnv_phb4_err_reg_reset(PnvPHB4 *phb)
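The read-only masking applied on writes in this patch reduces to a single merge rule; a minimal standalone sketch follows (the function name is illustrative, not QEMU's):

```c
#include <stdint.h>

/*
 * Merge a guest write with the current register value so that bits set
 * in ro_mask keep their old value and all other bits take the newly
 * written value. This mirrors the expression added to
 * pnv_phb4_reg_write() above.
 */
static uint64_t masked_write(uint64_t old, uint64_t val, uint64_t ro_mask)
{
    return (old & ro_mask) | (val & ~ro_mask);
}
```

For example, with an RO mask covering the top nibble of a 16-bit field, those bits come from the old value and the rest from the write.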

[PATCH 01/10] qtest/phb4: Add testbench for PHB4

2024-03-21 Thread Saif Abrar
A new qtest TB is added for PHB4.
The TB reads the PHB Version register and asserts that
bits [24:31] have the value 0xA5.

Signed-off-by: Saif Abrar 
---
 tests/qtest/meson.build |  1 +
 tests/qtest/pnv-phb4-test.c | 74 +
 2 files changed, 75 insertions(+)
 create mode 100644 tests/qtest/pnv-phb4-test.c

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 36c5c13a7b..4795e51c17 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -168,6 +168,7 @@ qtests_ppc64 = \
  (config_all_devices.has_key('CONFIG_PSERIES') ? ['device-plug-test'] : []) + \
  (config_all_devices.has_key('CONFIG_POWERNV') ? ['pnv-xscom-test'] : []) + \
  (config_all_devices.has_key('CONFIG_POWERNV') ? ['pnv-host-i2c-test'] : []) + \
+  (config_all_devices.has_key('CONFIG_POWERNV') ? ['pnv-phb4-test'] : []) + \
  (config_all_devices.has_key('CONFIG_PSERIES') ? ['rtas-test'] : []) + \
  (slirp.found() ? ['pxe-test'] : []) + \
  (config_all_devices.has_key('CONFIG_USB_UHCI') ? ['usb-hcd-uhci-test'] : []) + \
diff --git a/tests/qtest/pnv-phb4-test.c b/tests/qtest/pnv-phb4-test.c
new file mode 100644
index 00..e3b809e9c4
--- /dev/null
+++ b/tests/qtest/pnv-phb4-test.c
@@ -0,0 +1,74 @@
+/*
+ * QTest testcase for PowerNV PHB4
+ *
+ * Copyright (c) 2024, IBM Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest.h"
+#include "hw/pci-host/pnv_phb4_regs.h"
+
+#define P10_XSCOM_BASE  0x000603fcull
+#define PHB4_MMIO   0x000600c3c000ull
+#define PHB4_XSCOM  0x8010900ull
+
+#define PPC_BIT(bit)(0x8000000000000000ULL >> (bit))
+#define PPC_BITMASK(bs, be) ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
+
+static uint64_t pnv_xscom_addr(uint32_t pcba)
+{
+return P10_XSCOM_BASE | ((uint64_t) pcba << 3);
+}
+
+static uint64_t pnv_phb4_xscom_addr(uint32_t reg)
+{
+return pnv_xscom_addr(PHB4_XSCOM + reg);
+}
+
+/*
+ * XSCOM read/write is indirect in PHB4:
+ * Write 'SCOM - HV Indirect Address Register'
+ * with register-offset to read/write.
+ *   - bit[0]: Valid Bit
+ *   - bit[51:61]: Indirect Address(00:10)
+ * Read/write 'SCOM - HV Indirect Data Register' to get/set the value.
+ */
+
+static uint64_t pnv_phb4_xscom_read(QTestState *qts, uint32_t reg)
+{
+qtest_writeq(qts, pnv_phb4_xscom_addr(PHB_SCOM_HV_IND_ADDR),
+PPC_BIT(0) | reg);
+return qtest_readq(qts, pnv_phb4_xscom_addr(PHB_SCOM_HV_IND_DATA));
+}
+
+/* Assert that 'PHB - Version Register Offset 0x0800' bits-[24:31] are 0xA5 */
+static void phb4_version_test(QTestState *qts)
+{
+uint64_t ver = pnv_phb4_xscom_read(qts, PHB_VERSION);
+
+/* PHB Version register [24:31]: Major Revision ID 0xA5 */
+ver = ver >> (63 - 31);
+g_assert_cmpuint(ver, ==, 0xA5);
+}
+
+static void test_phb4(void)
+{
+QTestState *qts = NULL;
+
+qts = qtest_initf("-machine powernv10 -accel tcg -nographic -d unimp");
+
+/* Make sure test is running on PHB */
+phb4_version_test(qts);
+
+qtest_quit(qts);
+}
+
+int main(int argc, char **argv)
+{
+g_test_init(&argc, &argv, NULL);
+qtest_add_func("phb4", test_phb4);
+return g_test_run();
+}
-- 
2.39.3
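The testbench's PPC_BIT/PPC_BITMASK helpers use IBM big-endian bit numbering, where bit 0 is the most-significant bit of the 64-bit word; a standalone restatement:

```c
#include <stdint.h>

/* IBM (big-endian) bit numbering: bit 0 is the MSB of a 64-bit word. */
#define PPC_BIT(bit)        (0x8000000000000000ULL >> (bit))
/* Contiguous mask from big-endian bit 'bs' down to bit 'be' (bs <= be). */
#define PPC_BITMASK(bs, be) ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
```

So the PHB Version register's Major Revision ID field, bits [24:31] in this numbering, corresponds to PPC_BITMASK(24, 31), i.e. bits 32..39 of the little-endian value, which is why the test shifts by (63 - 31).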




[PATCH 05/10] pnv/phb4: Implement write-clear and return 1's on unimplemented reg read

2024-03-21 Thread Saif Abrar
Implement write-1-to-clear and write-X-to-clear logic.
Update registers with silent simple read and write.
Return all 1's when an unimplemented/reserved register is read.

Test that reading address 0x0 returns all 1's (i.e. -1).

Signed-off-by: Saif Abrar 
---
 hw/pci-host/pnv_phb4.c  | 190 ++--
 include/hw/pci-host/pnv_phb4_regs.h |  12 +-
 tests/qtest/pnv-phb4-test.c |   9 ++
 3 files changed, 170 insertions(+), 41 deletions(-)

diff --git a/hw/pci-host/pnv_phb4.c b/hw/pci-host/pnv_phb4.c
index a81763f34c..4e3a6b37f9 100644
--- a/hw/pci-host/pnv_phb4.c
+++ b/hw/pci-host/pnv_phb4.c
@@ -683,8 +683,41 @@ static void pnv_phb4_reg_write(void *opaque, hwaddr off, uint64_t val,
 return;
 }
 
-/* Handle masking */
+/* Handle RO, W1C, WxC and masking */
 switch (off) {
+/* W1C: Write-1-to-Clear registers */
+case PHB_TXE_ERR_STATUS:
+case PHB_RXE_ARB_ERR_STATUS:
+case PHB_RXE_MRG_ERR_STATUS:
+case PHB_RXE_TCE_ERR_STATUS:
+case PHB_ERR_STATUS:
+case PHB_REGB_ERR_STATUS:
+case PHB_PCIE_DLP_ERRLOG1:
+case PHB_PCIE_DLP_ERRLOG2:
+case PHB_PCIE_DLP_ERR_STATUS:
+case PHB_PBL_ERR_STATUS:
+phb->regs[off >> 3] &= ~val;
+return;
+
+/* WxC: Clear register on any write */
+case PHB_PBL_ERR1_STATUS:
+case PHB_PBL_ERR_LOG_0 ... PHB_PBL_ERR_LOG_1:
+case PHB_REGB_ERR1_STATUS:
+case PHB_REGB_ERR_LOG_0 ... PHB_REGB_ERR_LOG_1:
+case PHB_TXE_ERR1_STATUS:
+case PHB_TXE_ERR_LOG_0 ... PHB_TXE_ERR_LOG_1:
+case PHB_RXE_ARB_ERR1_STATUS:
+case PHB_RXE_ARB_ERR_LOG_0 ... PHB_RXE_ARB_ERR_LOG_1:
+case PHB_RXE_MRG_ERR1_STATUS:
+case PHB_RXE_MRG_ERR_LOG_0 ... PHB_RXE_MRG_ERR_LOG_1:
+case PHB_RXE_TCE_ERR1_STATUS:
+case PHB_RXE_TCE_ERR_LOG_0 ... PHB_RXE_TCE_ERR_LOG_1:
+case PHB_ERR1_STATUS:
+case PHB_ERR_LOG_0 ... PHB_ERR_LOG_1:
+phb->regs[off >> 3] = 0;
+return;
+
+/* Write value updated by masks */
 case PHB_LSI_SOURCE_ID:
 val &= PHB_LSI_SRC_ID;
 break;
@@ -723,7 +756,6 @@ static void pnv_phb4_reg_write(void *opaque, hwaddr off, uint64_t val,
 case PHB_LEM_WOF:
 val = 0;
 break;
-/* TODO: More regs ..., maybe create a table with masks... */
 
 /* Read only registers */
 case PHB_CPU_LOADSTORE_STATUS:
@@ -732,6 +764,12 @@ static void pnv_phb4_reg_write(void *opaque, hwaddr off, uint64_t val,
 case PHB_PHB4_TCE_CAP:
 case PHB_PHB4_IRQ_CAP:
 case PHB_PHB4_EEH_CAP:
+case PHB_VERSION:
+case PHB_DMA_CHAN_STATUS:
+case PHB_TCE_TAG_STATUS:
+case PHB_PBL_BUF_STATUS:
+case PHB_PCIE_BNR:
+case PHB_PCIE_PHY_RXEQ_STAT_G3_00_03 ... PHB_PCIE_PHY_RXEQ_STAT_G5_12_15:
 return;
 }
 
@@ -752,6 +790,7 @@ static void pnv_phb4_reg_write(void *opaque, hwaddr off, uint64_t val,
 pnv_phb4_update_all_msi_regions(phb);
 }
 break;
+
 case PHB_M32_START_ADDR:
 case PHB_M64_UPPER_BITS:
 if (changed) {
@@ -797,27 +836,63 @@ static void pnv_phb4_reg_write(void *opaque, hwaddr off, uint64_t val,
 break;
 
 /* Silent simple writes */
-case PHB_ASN_CMPM:
-case PHB_CONFIG_ADDRESS:
-case PHB_IODA_ADDR:
-case PHB_TCE_KILL:
-case PHB_TCE_SPEC_CTL:
-case PHB_PEST_BAR:
-case PHB_PELTV_BAR:
+/* PHB Fundamental register set A */
+case PHB_CONFIG_DATA ... PHB_LOCK1:
 case PHB_RTT_BAR:
-case PHB_LEM_FIR_ACCUM:
-case PHB_LEM_ERROR_MASK:
-case PHB_LEM_ACTION0:
-case PHB_LEM_ACTION1:
-case PHB_TCE_TAG_ENABLE:
+case PHB_PELTV_BAR:
+case PHB_PEST_BAR:
+case PHB_CAPI_CMPM ... PHB_M64_AOMASK:
+case PHB_NXLATE_PREFIX ... PHB_DMA_SYNC:
+case PHB_TCE_KILL ... PHB_IODA_ADDR:
+case PHB_PAPR_ERR_INJ_CTL ... PHB_PAPR_ERR_INJ_MASK:
 case PHB_INT_NOTIFY_ADDR:
 case PHB_INT_NOTIFY_INDEX:
-case PHB_DMA_SYNC:
-   break;
+/* Fundamental register set B */
+case PHB_AIB_FENCE_CTRL ... PHB_Q_DMA_R:
+/* FIR & Error registers */
+case PHB_LEM_FIR_ACCUM:
+case PHB_LEM_ERROR_MASK:
+case PHB_LEM_ACTION0 ... PHB_LEM_WOF:
+case PHB_ERR_INJECT ... PHB_ERR_AIB_FENCE_ENABLE:
+case PHB_ERR_STATUS_MASK ... PHB_ERR1_STATUS_MASK:
+case PHB_TXE_ERR_INJECT ... PHB_TXE_ERR_AIB_FENCE_ENABLE:
+case PHB_TXE_ERR_STATUS_MASK ... PHB_TXE_ERR1_STATUS_MASK:
+case PHB_RXE_ARB_ERR_INJECT ... PHB_RXE_ARB_ERR_AIB_FENCE_ENABLE:
+case PHB_RXE_ARB_ERR_STATUS_MASK ... PHB_RXE_ARB_ERR1_STATUS_MASK:
+case PHB_RXE_MRG_ERR_INJECT ... PHB_RXE_MRG_ERR_AIB_FENCE_ENABLE:
+case PHB_RXE_MRG_ERR_STATUS_MASK ... PHB_RXE_MRG_ERR1_STATUS_MASK:
+case PHB_RXE_TCE_ERR_INJECT ... PHB_RXE_TCE_ERR_AIB_FENCE_ENABLE:
+case PHB_RXE_TCE_ERR_STATUS_MASK ... PHB_RXE_TCE_ERR1_STATUS_MASK:
+/* Performance monitor & Debug registers */
+case PHB_TRACE_CONTROL ... PHB_PERFMON_CTR1:
+/* REGB Registers */
+/* PBL core */
+case PHB_PBL_
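The two clear-on-write behaviours this patch implements reduce to two small rules; a minimal standalone sketch with illustrative names:

```c
#include <stdint.h>

/* W1C: writing 1 to a bit clears it; writing 0 leaves it untouched. */
static uint64_t w1c_write(uint64_t reg, uint64_t val)
{
    return reg & ~val;
}

/* WxC: any write, whatever the value, clears the whole register. */
static uint64_t wxc_write(uint64_t reg, uint64_t val)
{
    (void)reg;
    (void)val;
    return 0;
}
```

W1C lets software acknowledge individual error-status bits without racing against newly latched ones, while WxC registers (the ERR1/ERR_LOG families above) are cleared wholesale on any access.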

[PATCH 09/10] hw/pci: Set write-mask bits for PCIE Link-Control-2 register

2024-03-21 Thread Saif Abrar
PHB updates the PCIE Link-Control-2 register.
Set the write-mask bits for TLS, ENTER_COMP, TX_MARGIN,
HASD, MOD_COMP, COMP_SOS and COMP_P_DE.

Signed-off-by: Saif Abrar 
---
 hw/pci/pcie.c | 6 ++
 include/standard-headers/linux/pci_regs.h | 3 +++
 2 files changed, 9 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 4b2f0805c6..e3081f6b84 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -212,6 +212,12 @@ int pcie_cap_init(PCIDevice *dev, uint8_t offset,
 
 pci_set_word(dev->wmask + pos + PCI_EXP_DEVCTL2, PCI_EXP_DEVCTL2_EETLPPB);
 
+pci_set_word(dev->wmask + pos + PCI_EXP_LNKCTL2,
+PCI_EXP_LNKCTL2_TLS | PCI_EXP_LNKCTL2_ENTER_COMP |
+PCI_EXP_LNKCTL2_TX_MARGIN | PCI_EXP_LNKCTL2_HASD |
+PCI_EXP_LNKCTL2_MOD_COMP | PCI_EXP_LNKCTL2_COMP_SOS |
+PCI_EXP_LNKCTL2_COMP_P_DE);
+
 if (dev->cap_present & QEMU_PCIE_EXTCAP_INIT) {
 /* read-only to behave like a 'NULL' Extended Capability Header */
 pci_set_long(dev->wmask + PCI_CONFIG_SPACE_SIZE, 0);
diff --git a/include/standard-headers/linux/pci_regs.h b/include/standard-headers/linux/pci_regs.h
index a39193213f..f743defe91 100644
--- a/include/standard-headers/linux/pci_regs.h
+++ b/include/standard-headers/linux/pci_regs.h
@@ -694,6 +694,9 @@
 #define  PCI_EXP_LNKCTL2_ENTER_COMP0x0010 /* Enter Compliance */
 #define  PCI_EXP_LNKCTL2_TX_MARGIN 0x0380 /* Transmit Margin */
 #define  PCI_EXP_LNKCTL2_HASD  0x0020 /* HW Autonomous Speed Disable */
+#define  PCI_EXP_LNKCTL2_MOD_COMP  0x0400 /* Enter Modified Compliance */
+#define  PCI_EXP_LNKCTL2_COMP_SOS  0x0800 /* Compliance SOS */
+#define  PCI_EXP_LNKCTL2_COMP_P_DE 0xF000 /* Compliance Preset/De-emphasis */
 #define PCI_EXP_LNKSTA20x32/* Link Status 2 */
 #define  PCI_EXP_LNKSTA2_FLIT  0x0400 /* Flit Mode Status */
 #define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 0x32/* end of v2 EPs w/ link */
-- 
2.39.3
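pcie_cap_init() above only sets wmask bits; the PCI core then gates guest config writes through them byte by byte. A simplified standalone model of that gating (the real QEMU core also handles w1cmask, omitted here):

```c
#include <stdint.h>

/*
 * Simplified model of a PCI config-space write gated by the per-byte
 * write mask: only bits set in wmask are writable by the guest, the
 * rest keep their current value.
 */
static uint8_t cfg_write_byte(uint8_t old, uint8_t val, uint8_t wmask)
{
    return (old & ~wmask) | (val & wmask);
}
```

This is why the patch matters: without the LNKCTL2 bits in wmask, PHB4's writes to TLS, ENTER_COMP, etc. would be silently dropped.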




[PATCH 03/10] pnv/phb4: Implement sticky reset logic in PHB4

2024-03-21 Thread Saif Abrar
Sticky bits retain their values on reset and are not overwritten with the
reset value.
Add sticky reset logic for all required registers,
i.e. the CFG core, PBL core, PHB error registers, PCIE stack registers and
REGB error registers.

Tested by writing all 1's to the reg PHB_PBL_ERR_INJECT.
This will set the bits in the reg PHB_PBL_ERR_STATUS.
Reset the PBL core by setting PHB_PCIE_CRESET_PBL in reg PHB_PCIE_CRESET.
Verify that the sticky bits in the PHB_PBL_ERR_STATUS reg are still set.

Signed-off-by: Saif Abrar 
---
 hw/pci-host/pnv_phb4.c  | 156 ++--
 include/hw/pci-host/pnv_phb4_regs.h |  20 +++-
 tests/qtest/pnv-phb4-test.c |  30 +-
 3 files changed, 196 insertions(+), 10 deletions(-)

diff --git a/hw/pci-host/pnv_phb4.c b/hw/pci-host/pnv_phb4.c
index d2e7403b37..b3a83837f8 100644
--- a/hw/pci-host/pnv_phb4.c
+++ b/hw/pci-host/pnv_phb4.c
@@ -516,14 +516,52 @@ static uint32_t get_exp_offset(PnvPHB4 *phb)
 return rpc->exp_offset;
 }
 
-#define RC_CONFIG_WRITE(a, v) pnv_phb4_rc_config_write(phb, a, 4, v);
+#define RC_CONFIG_WRITE(a, v) pnv_phb4_rc_config_write(phb, a, 4, v)
+
+/*
+ * Apply sticky-mask 's' to the reset-value 'v' and write to the address 'a'.
+ * RC-config space values and masks are LE.
+ * Method pnv_phb4_rc_config_read() returns BE, hence convert to LE.
+ * Compute new value in LE domain.
+ * New value computation using sticky-mask is in LE.
+ * Convert the computed value from LE to BE before writing back.
+ */
+#define RC_CONFIG_STICKY_RESET(a, v, s) \
+(RC_CONFIG_WRITE(a, bswap32( \
+ (bswap32(pnv_phb4_rc_config_read(phb, a, 4)) & s) \
+  | (v & ~s) \
+ )))
 
 static void pnv_phb4_cfg_core_reset(PnvPHB4 *phb)
 {
-/* Zero all registers initially */
+/*
+ * Zero all registers initially,
+ * except those that have sticky reset.
+ */
 int i;
 for (i = PCI_COMMAND ; i < PHB_RC_CONFIG_SIZE ; i += 4) {
-RC_CONFIG_WRITE(i, 0)
+switch (i) {
+case PCI_EXP_LNKCTL2:
+case PHB_AER_UERR:
+case PHB_AER_UERR_MASK:
+case PHB_AER_CERR:
+case PHB_AER_CAPCTRL:
+case PHB_AER_HLOG_1:
+case PHB_AER_HLOG_2:
+case PHB_AER_HLOG_3:
+case PHB_AER_HLOG_4:
+case PHB_AER_RERR:
+case PHB_AER_ESID:
+case PHB_DLF_STAT:
+case P16_STAT:
+case P16_LDPM:
+case P16_FRDPM:
+case P16_SRDPM:
+case P32_CTL:
+break;
+default:
+RC_CONFIG_WRITE(i, 0);
+}
 }
 
 RC_CONFIG_WRITE(PCI_COMMAND,  0x100100);
@@ -563,15 +601,55 @@ static void pnv_phb4_cfg_core_reset(PnvPHB4 *phb)
 RC_CONFIG_WRITE(P16_ECAP, 0x22410026);
 RC_CONFIG_WRITE(P32_ECAP, 0x1002A);
 RC_CONFIG_WRITE(P32_CAP,  0x103);
+
+/* Sticky reset */
+RC_CONFIG_STICKY_RESET(exp_offset + PCI_EXP_LNKCTL2,   0x5,  0xFEFFBF);
+RC_CONFIG_STICKY_RESET(PHB_AER_UERR,  0,0x1FF030);
+RC_CONFIG_STICKY_RESET(PHB_AER_UERR_MASK, 0,0x1FF030);
+RC_CONFIG_STICKY_RESET(PHB_AER_CERR,  0,0x11C1);
+RC_CONFIG_STICKY_RESET(PHB_AER_CAPCTRL,   0xA0, 0x15F);
+RC_CONFIG_STICKY_RESET(PHB_AER_HLOG_1,0,0x);
+RC_CONFIG_STICKY_RESET(PHB_AER_HLOG_2,0,0x);
+RC_CONFIG_STICKY_RESET(PHB_AER_HLOG_3,0,0x);
+RC_CONFIG_STICKY_RESET(PHB_AER_HLOG_4,0,0x);
+RC_CONFIG_STICKY_RESET(PHB_AER_RERR,  0,0x7F);
+RC_CONFIG_STICKY_RESET(PHB_AER_ESID,  0,0x);
+RC_CONFIG_STICKY_RESET(PHB_DLF_STAT,  0,0x807F);
+RC_CONFIG_STICKY_RESET(P16_STAT,  0,0x1F);
+RC_CONFIG_STICKY_RESET(P16_LDPM,  0,0x);
+RC_CONFIG_STICKY_RESET(P16_FRDPM, 0,0x);
+RC_CONFIG_STICKY_RESET(P16_SRDPM, 0,0x);
+RC_CONFIG_STICKY_RESET(P32_CTL,   0,0x3);
 }
 
+/* Apply sticky-mask to the reset-value and write to the reg-address */
+#define STICKY_RST(addr, rst_val, sticky_mask) (phb->regs[addr >> 3] = \
+((phb->regs[addr >> 3] & sticky_mask) | (rst_val & ~sticky_mask)))
+
 static void pnv_phb4_pbl_core_reset(PnvPHB4 *phb)
 {
-/* Zero all registers initially */
+/*
+ * Zero all registers initially,
+ * with sticky reset of certain registers.
+ */
 int i;
 for (i = PHB_PBL_CONTROL ; i <= PHB_PBL_ERR1_STATUS_MASK ; i += 8) {
-phb->regs[i >> 3] = 0x0;
+switch (i) {
+case PHB_PBL_ERR_STATUS:
+break;
+case PHB_PBL_ERR1_STATUS:
+case PHB_PBL_ERR_LOG_0:
+case PHB_PBL_ERR_LOG_1:
+case PHB_PBL_ERR_STATUS_MASK:
+case PHB_PBL_ERR1_STATUS_MASK:
+STICKY_RST(i, 0, PPC_BITMASK(0, 63));
+break;
+default:
+phb->regs[i >> 3] = 0x0;
+}
 }
+STICKY_RST(PHB_PBL_ERR_STATUS, 0, \
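The STICKY_RST merge used throughout this patch can be restated as a standalone helper (the function name is illustrative):

```c
#include <stdint.h>

/*
 * Sticky reset: bits set in sticky_mask survive the reset, all other
 * bits take the reset value. Same merge as the STICKY_RST macro above
 * (and, byte-swapping aside, as RC_CONFIG_STICKY_RESET).
 */
static uint64_t sticky_reset(uint64_t old, uint64_t rst_val,
                             uint64_t sticky_mask)
{
    return (old & sticky_mask) | (rst_val & ~sticky_mask);
}
```

This is what the test in the commit message relies on: error-status bits latched before the PBL core reset remain set afterwards because they fall inside the sticky mask.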
+   

[PATCH 10/10] pnv/phb4: Mask off LSI Source-ID based on number of interrupts

2024-03-21 Thread Saif Abrar
Add a method to reset the value of the LSI Source-ID register.
Mask off the LSI source-id based on the number of interrupts in the big/small PHB.

Signed-off-by: Saif Abrar 
---
 hw/pci-host/pnv_phb4.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/pci-host/pnv_phb4.c b/hw/pci-host/pnv_phb4.c
index f48750ee54..8fbaf6512e 100644
--- a/hw/pci-host/pnv_phb4.c
+++ b/hw/pci-host/pnv_phb4.c
@@ -489,6 +489,7 @@ static void pnv_phb4_update_xsrc(PnvPHB4 *phb)
 
 lsi_base = GETFIELD(PHB_LSI_SRC_ID, phb->regs[PHB_LSI_SOURCE_ID >> 3]);
 lsi_base <<= 3;
+lsi_base &= (xsrc->nr_irqs - 1);
 
 /* TODO: handle reset values of PHB_LSI_SRC_ID */
 if (!lsi_base) {
@@ -1966,6 +1967,12 @@ static void pnv_phb4_ro_mask_init(PnvPHB4 *phb)
 /* TODO: Add more RO-masks as regs are implemented in the model */
 }
 
+static void pnv_phb4_fund_A_reset(PnvPHB4 *phb)
+{
+phb->regs[PHB_LSI_SOURCE_ID >> 3] = PPC_BITMASK(4, 12);
+pnv_phb4_update_xsrc(phb);
+}
+
 static void pnv_phb4_err_reg_reset(PnvPHB4 *phb)
 {
 STICKY_RST(PHB_ERR_STATUS,   0, PPC_BITMASK(0, 33));
@@ -2023,6 +2030,7 @@ static void pnv_phb4_reset(void *dev)
 pnv_phb4_cfg_core_reset(phb);
 pnv_phb4_pbl_core_reset(phb);
 
+pnv_phb4_fund_A_reset(phb);
 pnv_phb4_err_reg_reset(phb);
 pnv_phb4_pcie_stack_reg_reset(phb);
 pnv_phb4_regb_err_reg_reset(phb);
@@ -2102,8 +2110,6 @@ static void pnv_phb4_realize(DeviceState *dev, Error **errp)
 return;
 }
 
-pnv_phb4_update_xsrc(phb);
-
 phb->qirqs = qemu_allocate_irqs(xive_source_set_irq, xsrc, xsrc->nr_irqs);
 
 pnv_phb4_xscom_realize(phb);
-- 
2.39.3
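The one-line masking this patch adds relies on the interrupt count being a power of two; a standalone restatement of `lsi_base &= (xsrc->nr_irqs - 1)` with an illustrative name:

```c
#include <stdint.h>

/*
 * Keep an interrupt source offset within [0, nr_irqs), assuming
 * nr_irqs is a power of two, so that (nr_irqs - 1) is a contiguous
 * low-bit mask. Sketch of the masking added to pnv_phb4_update_xsrc().
 */
static uint32_t clamp_lsi_base(uint32_t lsi_base, uint32_t nr_irqs)
{
    return lsi_base & (nr_irqs - 1);
}
```

E.g. with a small PHB exposing 0x100 interrupts, a source-id of 0x1F8 wraps to 0xF8 instead of indexing past the XIVE source.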




[PATCH v3 1/3] ui/console: Introduce dpy_gl_dmabuf_get_height/width() helpers

2024-03-21 Thread Philippe Mathieu-Daudé
From: Dongwon Kim 

Signed-off-by: Dongwon Kim 
Message-Id: <20240320034229.3347130-1-dongwon@intel.com>
[PMD: Split patch in 3, part 1/3]
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/ui/console.h|  2 ++
 hw/display/virtio-gpu-udmabuf.c |  8 +---
 hw/vfio/display.c   |  9 ++---
 ui/console.c| 18 ++
 4 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/include/ui/console.h b/include/ui/console.h
index a4a49ffc64..a7f6cef26d 100644
--- a/include/ui/console.h
+++ b/include/ui/console.h
@@ -358,6 +358,8 @@ void dpy_gl_cursor_dmabuf(QemuConsole *con, QemuDmaBuf *dmabuf,
   bool have_hot, uint32_t hot_x, uint32_t hot_y);
 void dpy_gl_cursor_position(QemuConsole *con,
 uint32_t pos_x, uint32_t pos_y);
+uint32_t dpy_gl_dmabuf_get_width(QemuDmaBuf *dmabuf);
+uint32_t dpy_gl_dmabuf_get_height(QemuDmaBuf *dmabuf);
 void dpy_gl_release_dmabuf(QemuConsole *con,
QemuDmaBuf *dmabuf);
 void dpy_gl_update(QemuConsole *con,
diff --git a/hw/display/virtio-gpu-udmabuf.c b/hw/display/virtio-gpu-udmabuf.c
index d51184d658..d680e871c1 100644
--- a/hw/display/virtio-gpu-udmabuf.c
+++ b/hw/display/virtio-gpu-udmabuf.c
@@ -206,20 +206,22 @@ int virtio_gpu_update_dmabuf(VirtIOGPU *g,
 {
 struct virtio_gpu_scanout *scanout = &g->parent_obj.scanout[scanout_id];
 VGPUDMABuf *new_primary, *old_primary = NULL;
+uint32_t width, height;
 
 new_primary = virtio_gpu_create_dmabuf(g, scanout_id, res, fb, r);
 if (!new_primary) {
 return -EINVAL;
 }
 
+width = dpy_gl_dmabuf_get_width(&new_primary->buf);
+height = dpy_gl_dmabuf_get_height(&new_primary->buf);
+
 if (g->dmabuf.primary[scanout_id]) {
 old_primary = g->dmabuf.primary[scanout_id];
 }
 
 g->dmabuf.primary[scanout_id] = new_primary;
-qemu_console_resize(scanout->con,
-new_primary->buf.width,
-new_primary->buf.height);
+qemu_console_resize(scanout->con, width, height);
 dpy_gl_scanout_dmabuf(scanout->con, &new_primary->buf);
 
 if (old_primary) {
diff --git a/hw/vfio/display.c b/hw/vfio/display.c
index 1aa440c663..c962e5f88f 100644
--- a/hw/vfio/display.c
+++ b/hw/vfio/display.c
@@ -286,6 +286,7 @@ static void vfio_display_dmabuf_update(void *opaque)
 VFIOPCIDevice *vdev = opaque;
 VFIODisplay *dpy = vdev->dpy;
 VFIODMABuf *primary, *cursor;
+uint32_t width, height;
 bool free_bufs = false, new_cursor = false;
 
 primary = vfio_display_get_dmabuf(vdev, DRM_PLANE_TYPE_PRIMARY);
@@ -296,10 +297,12 @@ static void vfio_display_dmabuf_update(void *opaque)
 return;
 }
 
+width = dpy_gl_dmabuf_get_width(&primary->buf);
+height = dpy_gl_dmabuf_get_height(&primary->buf);
+
 if (dpy->dmabuf.primary != primary) {
 dpy->dmabuf.primary = primary;
-qemu_console_resize(dpy->con,
-primary->buf.width, primary->buf.height);
+qemu_console_resize(dpy->con, width, height);
 dpy_gl_scanout_dmabuf(dpy->con, &primary->buf);
 free_bufs = true;
 }
@@ -328,7 +331,7 @@ static void vfio_display_dmabuf_update(void *opaque)
 cursor->pos_updates = 0;
 }
 
-dpy_gl_update(dpy->con, 0, 0, primary->buf.width, primary->buf.height);
+dpy_gl_update(dpy->con, 0, 0, width, height);
 
 if (free_bufs) {
 vfio_display_free_dmabufs(vdev);
diff --git a/ui/console.c b/ui/console.c
index 832055675c..edabad64c0 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -1190,6 +1190,24 @@ void dpy_gl_cursor_position(QemuConsole *con,
 }
 }
 
+uint32_t dpy_gl_dmabuf_get_width(QemuDmaBuf *dmabuf)
+{
+if (dmabuf) {
+return dmabuf->width;
+}
+
+return 0;
+}
+
+uint32_t dpy_gl_dmabuf_get_height(QemuDmaBuf *dmabuf)
+{
+if (dmabuf) {
+return dmabuf->height;
+}
+
+return 0;
+}
+
 void dpy_gl_release_dmabuf(QemuConsole *con,
   QemuDmaBuf *dmabuf)
 {
-- 
2.41.0




[PATCH 06/10] pnv/phb4: Set link-active status in HPSTAT and LMR registers

2024-03-21 Thread Saif Abrar
Config-read the link-status register in the PCI-E macro and,
depending on the link-active bit, set the link-active status
in the HOTPLUG_STATUS and LINK_MANAGEMENT registers.
Also, clear the Presence-status active-low bit in the HOTPLUG_STATUS
register after config-reading the slot-status in the PCI-E macro.

Signed-off-by: Saif Abrar 
---
 hw/pci-host/pnv_phb4.c | 57 +-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/hw/pci-host/pnv_phb4.c b/hw/pci-host/pnv_phb4.c
index 4e3a6b37f9..7b3d75bae6 100644
--- a/hw/pci-host/pnv_phb4.c
+++ b/hw/pci-host/pnv_phb4.c
@@ -516,6 +516,19 @@ static uint32_t get_exp_offset(PnvPHB4 *phb)
 return rpc->exp_offset;
 }
 
+/*
+ * Config-read the link-status register in the PCI-E macro,
+ * convert to LE and check the link-active bit.
+ */
+static uint32_t is_link_active(PnvPHB4 *phb)
+{
+uint32_t exp_offset = get_exp_offset(phb);
+
+return (bswap32(pnv_phb4_rc_config_read(phb,
+exp_offset + PCI_EXP_LNKSTA, 4))
+& PCI_EXP_LNKSTA_DLLLA);
+}
+
 #define RC_CONFIG_WRITE(a, v) pnv_phb4_rc_config_write(phb, a, 4, v)
 
 /*
@@ -757,6 +770,11 @@ static void pnv_phb4_reg_write(void *opaque, hwaddr off, uint64_t val,
 val = 0;
 break;
 
+case PHB_PCIE_HOTPLUG_STATUS:
+/* For normal operations, Simspeed diagnostic bit is always zero */
+val &= PHB_PCIE_HPSTAT_SIMDIAG;
+break;
+
 /* Read only registers */
 case PHB_CPU_LOADSTORE_STATUS:
 case PHB_ETU_ERR_SUMMARY:
@@ -968,8 +986,40 @@ static uint64_t pnv_phb4_reg_read(void *opaque, hwaddr off, unsigned size)
 val |= PHB_PCIE_DLP_INBAND_PRESENCE | PHB_PCIE_DLP_TL_LINKACT;
 return val;
 
+/*
+ * Read PCI-E registers and set status for:
+ * - Card present (active low bit 10)
+ * - Link active  (bit 12)
+ */
 case PHB_PCIE_HOTPLUG_STATUS:
-/* Clear write-only bit */
+/*
+ * Presence-status bit hpi_present_n is active-low, with reset value 1.
+ * Start by setting this bit to 1, indicating the card is not present.
+ * Then check the PCI-E register and clear the bit if card is present.
+ */
+val |= PHB_PCIE_HPSTAT_PRESENCE;
+
+/* Get the PCI-E capability offset from the root-port */
+uint32_t exp_base = get_exp_offset(phb);
+
+/*
+ * Config-read the PCI-E macro register for slot-status.
+ * Method for config-read converts to BE value.
+ * To check actual bit in the PCI-E register,
+ * convert the value back to LE using bswap32().
+ * Clear the Presence-status active low bit.
+ */
+if (bswap32(pnv_phb4_rc_config_read(phb, exp_base + PCI_EXP_SLTSTA, 4))
+& PCI_EXP_SLTSTA_PDS) {
+val &= ~PHB_PCIE_HPSTAT_PRESENCE;
+}
+
+/* Check if link is active and set the bit */
+if (is_link_active(phb)) {
+val |= PHB_PCIE_HPSTAT_LINKACTIVE;
+}
+
+/* Clear write-only resample-bit */
 val &= ~PHB_PCIE_HPSTAT_RESAMPLE;
 return val;
 
@@ -977,6 +1027,11 @@ static uint64_t pnv_phb4_reg_read(void *opaque, hwaddr 
off, unsigned size)
 case PHB_PCIE_LMR:
 /* These write-only bits always read as 0 */
 val &= ~(PHB_PCIE_LMR_CHANGELW | PHB_PCIE_LMR_RETRAINLINK);
+
+/* Check if link is active and set the bit */
+if (is_link_active(phb)) {
+val |= PHB_PCIE_LMR_LINKACTIVE;
+}
 return val;
 
 /* Silent simple reads */
-- 
2.39.3




[INCOMPLETE PATCH v3 3/3] ui/console: Introduce dpy_gl_create_dmabuf() helper

2024-03-21 Thread Philippe Mathieu-Daudé
From: Dongwon Kim 

It is safer to create, initialize, and access all the parameters
in QemuDmaBuf from a central location, ui/console, instead of
hw/virtio-gpu or hw/vfio modules.

Signed-off-by: Dongwon Kim 
Message-Id: <20240320034229.3347130-1-dongwon@intel.com>
[PMD: Split patch in 3, part 3/3]
Signed-off-by: Philippe Mathieu-Daudé 
---
Incomplete... VhostUserGPU doesn't allocate,
vhost_user_gpu_handle_display() crashes.
---
 include/hw/vfio/vfio-common.h   |  2 +-
 include/hw/virtio/virtio-gpu.h  |  2 +-
 include/ui/console.h|  7 +++
 hw/display/virtio-gpu-udmabuf.c | 23 ---
 hw/vfio/display.c   | 24 ++--
 ui/console.c| 28 
 6 files changed, 55 insertions(+), 31 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index b9da6c08ef..d66e27db02 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -148,7 +148,7 @@ typedef struct VFIOGroup {
 } VFIOGroup;
 
 typedef struct VFIODMABuf {
-QemuDmaBuf buf;
+QemuDmaBuf *buf;
 uint32_t pos_x, pos_y, pos_updates;
 uint32_t hot_x, hot_y, hot_updates;
 int dmabuf_id;
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index ed44cdad6b..010083e8e3 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -169,7 +169,7 @@ struct VirtIOGPUBaseClass {
 DEFINE_PROP_UINT32("yres", _state, _conf.yres, 800)
 
 typedef struct VGPUDMABuf {
-QemuDmaBuf buf;
+QemuDmaBuf *buf;
 uint32_t scanout_id;
 QTAILQ_ENTRY(VGPUDMABuf) next;
 } VGPUDMABuf;
diff --git a/include/ui/console.h b/include/ui/console.h
index 1f3d025548..0b823efb2e 100644
--- a/include/ui/console.h
+++ b/include/ui/console.h
@@ -279,6 +279,7 @@ typedef struct DisplayChangeListenerOps {
 /* optional */
 void (*dpy_gl_cursor_position)(DisplayChangeListener *dcl,
uint32_t pos_x, uint32_t pos_y);
+
 /* optional */
 void (*dpy_gl_release_dmabuf)(DisplayChangeListener *dcl,
   QemuDmaBuf *dmabuf);
@@ -358,6 +359,12 @@ void dpy_gl_cursor_dmabuf(QemuConsole *con, QemuDmaBuf 
*dmabuf,
   bool have_hot, uint32_t hot_x, uint32_t hot_y);
 void dpy_gl_cursor_position(QemuConsole *con,
 uint32_t pos_x, uint32_t pos_y);
+QemuDmaBuf *dpy_gl_create_dmabuf(uint32_t width, uint32_t height,
+ uint32_t stride, uint32_t x,
+ uint32_t y, uint32_t backing_width,
+ uint32_t backing_height, uint32_t fourcc,
+ uint32_t modifier, uint32_t dmabuf_fd,
+ bool allow_fences);
 uint32_t dpy_gl_dmabuf_get_width(QemuDmaBuf *dmabuf);
 uint32_t dpy_gl_dmabuf_get_height(QemuDmaBuf *dmabuf);
 int32_t dpy_gl_dmabuf_get_fd(QemuDmaBuf *dmabuf);
diff --git a/hw/display/virtio-gpu-udmabuf.c b/hw/display/virtio-gpu-udmabuf.c
index d680e871c1..dde6c8e9d9 100644
--- a/hw/display/virtio-gpu-udmabuf.c
+++ b/hw/display/virtio-gpu-udmabuf.c
@@ -162,7 +162,7 @@ static void virtio_gpu_free_dmabuf(VirtIOGPU *g, VGPUDMABuf 
*dmabuf)
 struct virtio_gpu_scanout *scanout;
 
 scanout = &g->parent_obj.scanout[dmabuf->scanout_id];
-dpy_gl_release_dmabuf(scanout->con, &dmabuf->buf);
+dpy_gl_release_dmabuf(scanout->con, dmabuf->buf);
 QTAILQ_REMOVE(&g->dmabuf.bufs, dmabuf, next);
 g_free(dmabuf);
 }
@@ -181,17 +181,10 @@ static VGPUDMABuf
 }
 
 dmabuf = g_new0(VGPUDMABuf, 1);
-dmabuf->buf.width = r->width;
-dmabuf->buf.height = r->height;
-dmabuf->buf.stride = fb->stride;
-dmabuf->buf.x = r->x;
-dmabuf->buf.y = r->y;
-dmabuf->buf.backing_width = fb->width;
-dmabuf->buf.backing_height = fb->height;
-dmabuf->buf.fourcc = qemu_pixman_to_drm_format(fb->format);
-dmabuf->buf.fd = res->dmabuf_fd;
-dmabuf->buf.allow_fences = true;
-dmabuf->buf.draw_submitted = false;
+dmabuf->buf = dpy_gl_create_dmabuf(r->width, r->height, fb->stride,
+   r->x, r->y, fb->width, fb->height,
+   qemu_pixman_to_drm_format(fb->format),
+   0, res->dmabuf_fd, false);
 dmabuf->scanout_id = scanout_id;
 QTAILQ_INSERT_HEAD(&g->dmabuf.bufs, dmabuf, next);
 
@@ -213,8 +206,8 @@ int virtio_gpu_update_dmabuf(VirtIOGPU *g,
 return -EINVAL;
 }
 
-width = dpy_gl_dmabuf_get_width(&new_primary->buf);
-height = dpy_gl_dmabuf_get_height(&new_primary->buf);
+width = dpy_gl_dmabuf_get_width(new_primary->buf);
+height = dpy_gl_dmabuf_get_height(new_primary->buf);
 
 if (g->dmabuf.primary[scanout_id]) {
 old_primary = g->dmabuf.primary[scanout_id];
@@ -222,7 +215,7 @@ int virtio_gpu_update_dmabuf(VirtIOGPU *g,
 
 g->dmabuf.p

[PATCH v3 0/3] ui/console: initialize QemuDmaBuf in ui/console

2024-03-21 Thread Philippe Mathieu-Daudé
Respin of Dongwon's v2, split into bisectable changes.
Unfortunately last patch breaks vhost_user_gpu_handle_display.

Should dbus_scanout_texture() use dpy_gl_create_dmabuf()?

Dongwon, you can use it as a base for a v4.

Regards,

Phil.

Dongwon Kim (3):
  ui/console: Introduce dpy_gl_dmabuf_get_height/width() helpers
  ui/console: Introduce dpy_gl_dmabuf_get_fd() helper
  ui/console: Introduce dpy_gl_create_dmabuf() helper

 include/hw/vfio/vfio-common.h   |  2 +-
 include/hw/virtio/virtio-gpu.h  |  2 +-
 include/ui/console.h| 10 ++
 hw/display/virtio-gpu-udmabuf.c | 27 +++-
 hw/vfio/display.c   | 35 -
 ui/console.c| 55 +
 6 files changed, 98 insertions(+), 33 deletions(-)

-- 
2.41.0




[PATCH v3 2/3] ui/console: Introduce dpy_gl_dmabuf_get_fd() helper

2024-03-21 Thread Philippe Mathieu-Daudé
From: Dongwon Kim 

Signed-off-by: Dongwon Kim 
Message-Id: <20240320034229.3347130-1-dongwon@intel.com>
[PMD: Split patch in 3, part 2/3]
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/ui/console.h | 1 +
 hw/vfio/display.c| 8 +++-
 ui/console.c | 9 +
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/ui/console.h b/include/ui/console.h
index a7f6cef26d..1f3d025548 100644
--- a/include/ui/console.h
+++ b/include/ui/console.h
@@ -360,6 +360,7 @@ void dpy_gl_cursor_position(QemuConsole *con,
 uint32_t pos_x, uint32_t pos_y);
 uint32_t dpy_gl_dmabuf_get_width(QemuDmaBuf *dmabuf);
 uint32_t dpy_gl_dmabuf_get_height(QemuDmaBuf *dmabuf);
+int32_t dpy_gl_dmabuf_get_fd(QemuDmaBuf *dmabuf);
 void dpy_gl_release_dmabuf(QemuConsole *con,
QemuDmaBuf *dmabuf);
 void dpy_gl_update(QemuConsole *con,
diff --git a/hw/vfio/display.c b/hw/vfio/display.c
index c962e5f88f..676b2fc5f3 100644
--- a/hw/vfio/display.c
+++ b/hw/vfio/display.c
@@ -259,9 +259,15 @@ static VFIODMABuf *vfio_display_get_dmabuf(VFIOPCIDevice 
*vdev,
 
 static void vfio_display_free_one_dmabuf(VFIODisplay *dpy, VFIODMABuf *dmabuf)
 {
+int fd;
+
 QTAILQ_REMOVE(&dpy->dmabuf.bufs, dmabuf, next);
+fd = dpy_gl_dmabuf_get_fd(&dmabuf->buf);
+if (fd > -1) {
+close(fd);
+}
+
 dpy_gl_release_dmabuf(dpy->con, &dmabuf->buf);
-close(dmabuf->buf.fd);
 g_free(dmabuf);
 }
 
diff --git a/ui/console.c b/ui/console.c
index edabad64c0..10abeb9780 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -1208,6 +1208,15 @@ uint32_t dpy_gl_dmabuf_get_height(QemuDmaBuf *dmabuf)
 return 0;
 }
 
+int32_t dpy_gl_dmabuf_get_fd(QemuDmaBuf *dmabuf)
+{
+if (dmabuf) {
+return dmabuf->fd;
+}
+
+return -1;
+}
+
 void dpy_gl_release_dmabuf(QemuConsole *con,
   QemuDmaBuf *dmabuf)
 {
-- 
2.41.0




Re: [PATCH] coroutine: reserve 5,000 mappings

2024-03-21 Thread Daniel P . Berrangé
On Wed, Mar 20, 2024 at 02:12:32PM -0400, Stefan Hajnoczi wrote:
> Daniel P. Berrangé  pointed out that the coroutine
> pool size heuristic is very conservative. Instead of halving
> max_map_count, he suggested reserving 5,000 mappings for non-coroutine
> users based on observations of guests he has access to.
> 
> Fixes: 86a637e48104 ("coroutine: cap per-thread local pool size")

It wasn't really broken, so "Tweaks" or "Enhances" rather than
"Fixes" if you like :-)

> Signed-off-by: Stefan Hajnoczi 
> ---
>  util/qemu-coroutine.c | 15 ++-
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
> index 2790959eaf..eb4eebefdf 100644
> --- a/util/qemu-coroutine.c
> +++ b/util/qemu-coroutine.c
> @@ -377,12 +377,17 @@ static unsigned int get_global_pool_hard_max_size(void)
>  NULL) &&
>  qemu_strtoi(contents, NULL, 10, &max_map_count) == 0) {
>  /*
> - * This is a conservative upper bound that avoids exceeding
> - * max_map_count. Leave half for non-coroutine users like library
> - * dependencies, vhost-user, etc. Each coroutine takes up 2 VMAs so
> - * halve the amount again.
> + * This is an upper bound that avoids exceeding max_map_count. Leave 
> a
> + * fixed amount for non-coroutine users like library dependencies,
> + * vhost-user, etc. Each coroutine takes up 2 VMAs so halve the
> + * remaining amount.
>   */
> -return max_map_count / 4;
> +if (max_map_count > 5000) {
> +return (max_map_count - 5000) / 2;
> +} else {
> +/* Disable the global pool but threads still have local pools */
> +return 0;
> +}
>  }
>  #endif

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|




[PATCH] hw/intc: Update APLIC IDC after claiming iforce register

2024-03-21 Thread frank . chang
From: Frank Chang 

Currently, QEMU only sets the iforce register to 0 and returns early
when claiming the iforce register. However, this may leave mip.meip
stuck at 1 if a spurious external interrupt triggered by the iforce
register is the only pending interrupt to be claimed, so the interrupt
cannot be lowered as expected.

This commit fixes this issue by calling riscv_aplic_idc_update() to
update the IDC status after the iforce register is claimed.

Signed-off-by: Frank Chang 
Reviewed-by: Jim Shu 
---
 hw/intc/riscv_aplic.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/intc/riscv_aplic.c b/hw/intc/riscv_aplic.c
index 6a7fbfa861..fc5df0d598 100644
--- a/hw/intc/riscv_aplic.c
+++ b/hw/intc/riscv_aplic.c
@@ -488,6 +488,7 @@ static uint32_t riscv_aplic_idc_claimi(RISCVAPLICState 
*aplic, uint32_t idc)
 
 if (!topi) {
 aplic->iforce[idc] = 0;
+riscv_aplic_idc_update(aplic, idc);
 return 0;
 }
 
-- 
2.43.2




Re: [PATCH v4 2/3] tools: build qemu-vmsr-helper

2024-03-21 Thread Daniel P . Berrangé
On Mon, Mar 18, 2024 at 04:12:15PM +0100, Anthony Harivel wrote:
> Introduce a privileged helper to access RAPL MSR.
> 
> The privileged helper tool, qemu-vmsr-helper, is designed to provide
> virtual machines with the ability to read specific RAPL (Running Average
> Power Limit) MSRs without requiring CAP_SYS_RAWIO privileges or relying
> on external, out-of-tree patches.
> 
> The helper tool leverages Unix permissions and SO_PEERCRED socket
> options to enforce access control, ensuring that only processes
> explicitly requesting read access via readmsr() from a valid Thread ID
> can access these MSRs.
> 
> The list of RAPL MSRs that are allowed to be read by the helper tool is
> defined in rapl-msr-index.h. This list corresponds to the RAPL MSRs that
> will be supported in the next commit titled "Add support for RAPL MSRs
> in KVM/QEMU."
> 
> The tool is intentionally designed to run on the Linux x86 platform.
> This initial implementation is tailored for Intel CPUs but can be
> extended to support AMD CPUs in the future.
> 
> Signed-off-by: Anthony Harivel 
> ---
>  contrib/systemd/qemu-vmsr-helper.service |  15 +
>  contrib/systemd/qemu-vmsr-helper.socket  |   9 +
>  docs/tools/index.rst |   1 +
>  docs/tools/qemu-vmsr-helper.rst  |  89 
>  meson.build  |   5 +
>  tools/i386/qemu-vmsr-helper.c| 564 +++
>  tools/i386/rapl-msr-index.h  |  28 ++
>  7 files changed, 711 insertions(+)
>  create mode 100644 contrib/systemd/qemu-vmsr-helper.service
>  create mode 100644 contrib/systemd/qemu-vmsr-helper.socket
>  create mode 100644 docs/tools/qemu-vmsr-helper.rst
>  create mode 100644 tools/i386/qemu-vmsr-helper.c
>  create mode 100644 tools/i386/rapl-msr-index.h
> 

> diff --git a/meson.build b/meson.build
> index b375248a7614..376da49b60ab 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -4052,6 +4052,11 @@ if have_tools
> dependencies: [authz, crypto, io, qom, qemuutil,
>libcap_ng, mpathpersist],
> install: true)
> +
> +executable('qemu-vmsr-helper', files('tools/i386/qemu-vmsr-helper.c'),
> +   dependencies: [authz, crypto, io, qom, qemuutil,
> +  libcap_ng, mpathpersist],
> +   install: true)
>endif

Missed feedback from v2 saying this must /only/ be built
on x86 architectures. It fails to build on others due
to the ASM usage eg

https://gitlab.com/berrange/qemu/-/jobs/6445384073

>  
>if have_ivshmem
> diff --git a/tools/i386/qemu-vmsr-helper.c b/tools/i386/qemu-vmsr-helper.c
> new file mode 100644
> index ..d8439dc173af
> --- /dev/null
> +++ b/tools/i386/qemu-vmsr-helper.c
> @@ -0,0 +1,564 @@
> +/*
> + * Privileged RAPL MSR helper commands for QEMU
> + *
> + * Copyright (C) 2024 Red Hat, Inc. 
> + *
> + * Author: Anthony Harivel 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; under version 2 of the License.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see .
> + */
> +
> +#include "qemu/osdep.h"
> +#include 
> +#include 
> +#include 
> +#ifdef CONFIG_LIBCAP_NG
> +#include 
> +#endif
> +#include 
> +#include 
> +
> +#include "qemu/help-texts.h"
> +#include "qapi/error.h"
> +#include "qemu/cutils.h"
> +#include "qemu/main-loop.h"
> +#include "qemu/module.h"
> +#include "qemu/error-report.h"
> +#include "qemu/config-file.h"
> +#include "qemu-version.h"
> +#include "qapi/error.h"
> +#include "qemu/error-report.h"
> +#include "qemu/log.h"
> +#include "qemu/systemd.h"
> +#include "io/channel.h"
> +#include "io/channel-socket.h"
> +#include "trace/control.h"
> +#include "qemu-version.h"
> +#include "rapl-msr-index.h"
> +
> +#define MSR_PATH_TEMPLATE "/dev/cpu/%u/msr"
> +
> +static char *socket_path;
> +static char *pidfile;
> +static enum { RUNNING, TERMINATE, TERMINATING } state;
> +static QIOChannelSocket *server_ioc;
> +static int server_watch;
> +static int num_active_sockets = 1;
> +
> +#ifdef CONFIG_LIBCAP_NG
> +static int uid = -1;
> +static int gid = -1;
> +#endif
> +
> +static void compute_default_paths(void)
> +{
> +g_autofree char *state = qemu_get_local_state_dir();
> +
> +socket_path = g_build_filename(state, "run", "qemu-vmsr-helper.sock", 
> NULL);
> +pidfile = g_build_filename(state, "run", "qemu-vmsr-helper.pid", NULL);
> +}
> +
> +static int is_intel_processor(void)
> +{
> +int result;
> +int ebx, ecx, edx;
> +
> +/* Execute C

Re: [RFC PATCH v8 06/23] target/arm: Add support for Non-maskable Interrupt

2024-03-21 Thread Peter Maydell
On Mon, 18 Mar 2024 at 09:37, Jinjie Ruan  wrote:
>
> This only implements the external delivery method via the GICv3.
>
> Signed-off-by: Jinjie Ruan 
> Reviewed-by: Richard Henderson 

> @@ -692,13 +719,13 @@ static inline bool arm_excp_unmasked(CPUState *cs, 
> unsigned int excp_idx,
>  /* VFIQs are only taken when hypervized.  */
>  return false;
>  }
> -return !(env->daif & PSTATE_F);
> +return !(env->daif & PSTATE_F) && (!allIntMask);
>  case EXCP_VIRQ:
>  if (!(hcr_el2 & HCR_IMO) || (hcr_el2 & HCR_TGE)) {
>  /* VIRQs are only taken when hypervized.  */
>  return false;
>  }
> -return !(env->daif & PSTATE_I);
> +return !(env->daif & PSTATE_I) && (!allIntMask);
>  case EXCP_VSERR:
>  if (!(hcr_el2 & HCR_AMO) || (hcr_el2 & HCR_TGE)) {
>  /* VIRQs are only taken when hypervized.  */
> @@ -804,6 +831,24 @@ static bool arm_cpu_exec_interrupt(CPUState *cs, int 
> interrupt_request)
>
>  /* The prioritization of interrupts is IMPLEMENTATION DEFINED. */
>
> +if (cpu_isar_feature(aa64_nmi, env_archcpu(env))) {
> +if (interrupt_request & CPU_INTERRUPT_NMI) {
> +excp_idx = EXCP_NMI;
> +target_el = arm_phys_excp_target_el(cs, excp_idx, cur_el, 
> secure);
> +if (arm_excp_unmasked(cs, excp_idx, target_el,
> +  cur_el, secure, hcr_el2)) {
> +goto found;
> +}
> +}
> +if (interrupt_request & CPU_INTERRUPT_VNMI) {
> +excp_idx = EXCP_VNMI;
> +target_el = 1;
> +if (arm_excp_unmasked(cs, excp_idx, target_el,
> +  cur_el, secure, hcr_el2)) {
> +goto found;
> +}
> +}
> +}
>  if (interrupt_request & CPU_INTERRUPT_FIQ) {
>  excp_idx = EXCP_FIQ;
>  target_el = arm_phys_excp_target_el(cs, excp_idx, cur_el, secure);

This part adds handling for taking IRQNMI and VIRQNMI, but not
VFIQNMI. Because at the moment we merge VFIQNMI and VFIQ,
they both come out as interrupt_request having CPU_INTERRUPT_VFIQ
set. But we don't handle that in this function or in
arm_excp_unmasked(), so we treat it exactly the same as VFIQ, which
means that if PSTATE.F is 1 then we will incorrectly fail to take
the VFIQNMI.

I think the code is going to be a lot more straightforward
if we keep all of these things separate. Compare how we handle
VSError, which is also (for QEMU) something where only the
virtual version exists and which is only triggered via the
HCR bits.

thanks
-- PMM



Re: [PATCH v3 37/49] i386/sev: Add the SNP launch start context

2024-03-21 Thread Paolo Bonzini
On Wed, 20 Mar 2024 at 23:33, Michael Roth  wrote:

> On Wed, Mar 20, 2024 at 10:58:30AM +0100, Paolo Bonzini wrote:
> > On 3/20/24 09:39, Michael Roth wrote:
> > > From: Brijesh Singh 
> > >
> > > The SNP_LAUNCH_START is called first to create a cryptographic launch
> > > context within the firmware.
> > >
> > > Signed-off-by: Brijesh Singh 
> > > Signed-off-by: Michael Roth 
> > > ---
> > >   target/i386/sev.c| 42
> +++-
> > >   target/i386/trace-events |  1 +
> > >   2 files changed, 42 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/target/i386/sev.c b/target/i386/sev.c
> > > index 3b4dbc63b1..9f63a41f08 100644
> > > --- a/target/i386/sev.c
> > > +++ b/target/i386/sev.c
> > > @@ -39,6 +39,7 @@
> > >   #include "confidential-guest.h"
> > >   #include "hw/i386/pc.h"
> > >   #include "exec/address-spaces.h"
> > > +#include "qemu/queue.h"
> > >   OBJECT_DECLARE_SIMPLE_TYPE(SevCommonState, SEV_COMMON)
> > >   OBJECT_DECLARE_SIMPLE_TYPE(SevGuestState, SEV_GUEST)
> > > @@ -106,6 +107,16 @@ struct SevSnpGuestState {
> > >   #define DEFAULT_SEV_DEVICE  "/dev/sev"
> > >   #define DEFAULT_SEV_SNP_POLICY  0x3
> > > +typedef struct SevLaunchUpdateData {
> > > +QTAILQ_ENTRY(SevLaunchUpdateData) next;
> > > +hwaddr gpa;
> > > +void *hva;
> > > +uint64_t len;
> > > +int type;
> > > +} SevLaunchUpdateData;
> > > +
> > > +static QTAILQ_HEAD(, SevLaunchUpdateData) launch_update;
> > > +
> > >   #define SEV_INFO_BLOCK_GUID
>  "00f771de-1a7e-4fcb-890e-68c77e2fb44e"
> > >   typedef struct __attribute__((__packed__)) SevInfoBlock {
> > >   /* SEV-ES Reset Vector Address */
> > > @@ -668,6 +679,30 @@ sev_read_file_base64(const char *filename, guchar
> **data, gsize *len)
> > >   return 0;
> > >   }
> > > +static int
> > > +sev_snp_launch_start(SevSnpGuestState *sev_snp_guest)
> > > +{
> > > +int fw_error, rc;
> > > +SevCommonState *sev_common = SEV_COMMON(sev_snp_guest);
> > > +struct kvm_sev_snp_launch_start *start =
> &sev_snp_guest->kvm_start_conf;
> > > +
> > > +trace_kvm_sev_snp_launch_start(start->policy,
> sev_snp_guest->guest_visible_workarounds);
> > > +
> > > +rc = sev_ioctl(sev_common->sev_fd, KVM_SEV_SNP_LAUNCH_START,
> > > +   start, &fw_error);
> > > +if (rc < 0) {
> > > +error_report("%s: SNP_LAUNCH_START ret=%d fw_error=%d '%s'",
> > > +__func__, rc, fw_error, fw_error_to_str(fw_error));
> > > +return 1;
> > > +}
> > > +
> > > +QTAILQ_INIT(&launch_update);
> > > +
> > > +sev_set_guest_state(sev_common, SEV_STATE_LAUNCH_UPDATE);
> > > +
> > > +return 0;
> > > +}
> > > +
> > >   static int
> > >   sev_launch_start(SevGuestState *sev_guest)
> > >   {
> > > @@ -1007,7 +1042,12 @@ static int
> sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
> > >   goto err;
> > >   }
> > > -ret = sev_launch_start(SEV_GUEST(sev_common));
> > > +if (sev_snp_enabled()) {
> > > +ret = sev_snp_launch_start(SEV_SNP_GUEST(sev_common));
> > > +} else {
> > > +ret = sev_launch_start(SEV_GUEST(sev_common));
> > > +}
> >
> > Instead of an "if", this should be a method in sev-common.  Likewise for
> > launch_finish in the next patch.
>
> Makes sense.
>
> >
> > Also, patch 47 should introduce an "int (*launch_update_data)(hwaddr gpa,
> > uint8_t *ptr, uint64_t len)" method whose implementation is either the
> > existing sev_launch_update_data() for sev-guest, or a wrapper around
> > snp_launch_update_data() (to add KVM_SEV_SNP_PAGE_TYPE_NORMAL) for
> > sev-snp-guest.
>
> I suppose if we end up introducing an unused 'gpa' parameter in the case
> of sev_launch_update_data() that's still worth the change? Seems
> reasonable to me.
>

Yeah, you most likely have the gpa anyway in the kind of board code that
calls UPDATE_DATA. Using gpas everywhere instead of ram_addrs would pose
some challenges for migration, but migration doesn't use UPDATE_DATA and
there's no SEV/SEV-ES migration anyway.

>
> >
> > In general, the only uses of sev_snp_enabled() should be in
> > sev_add_kernel_loader_hashes() and kvm_handle_vmgexit_ext_req().  I would
> > not be that strict for the QMP and HMP functions, but if you want to make
> > those methods of sev-common I wouldn't complain.
>
> There's a good bit of duplication in those cases which is a little
> awkward to break out into a common helper. Will consider these as well
> though.
>

I would say especially in HMP do not bother since you have to start from a
QAPI union and not QOM objects. For QMP I am not sure but don't waste too
much effort in it. All I wanted to say is that the monitor is a fine place
to draw the line.

Paolo


> Thanks,
>
> Mike
>
> >
> > Paolo
> >
> > >   if (ret) {
> > >   error_setg(errp, "%s: failed to create encryption context",
> __func__);
> > >   goto err;
> > > diff --git a/target/i386/trace-events b/target/i386/trace-events
> > > index 2cd87

Re: [PULL 0/5] more maintainer updates (git, avocado)

2024-03-21 Thread Peter Maydell
On Wed, 20 Mar 2024 at 16:15, Alex Bennée  wrote:
>
> The following changes since commit c62d54d0a8067ffb3d5b909276f7296d7df33fa7:
>
>   Update version for v9.0.0-rc0 release (2024-03-19 19:13:52 +)
>
> are available in the Git repository at:
>
>   https://gitlab.com/stsquad/qemu.git 
> tags/pull-maintainer-final-for-real-this-time-200324-1
>
> for you to fetch changes up to 55900f5dcc3205b87609d9be452c6c76c48b863b:
>
>   tests/avocado: sbsa-ref: add OpenBSD tests for misc 'max' setup (2024-03-20 
> 09:52:27 +)
>
> 
> maintainer updates (gitlab, avocado):
>
>   - avoid extra git data on gitlab checkouts
>   - update sbsa-ref tests
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/9.0
for any user-visible changes.

-- PMM



Re: [PATCH 4/5] target/riscv: Expose Zve64x extension to users

2024-03-21 Thread Daniel Henrique Barboza

On 3/6/24 14:08, Jason Chien wrote:

Signed-off-by: Jason Chien 
Reviewed-by: Frank Chang 
Reviewed-by: Max Chou 
---


Please add the following tag in this commit msg:


Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2107


The link is a feature request named "target/riscv: zve32x/zve64x are not 
supported"
that was opened a couple of months ago. Adding this tag will close the bug 
(since by
this time we'll have both zve32x and zve64x) as soon as the series is merged to
master.


Thanks,


Daniel


  target/riscv/cpu.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 8b5d1eb6a8..58b2a94694 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1473,6 +1473,7 @@ const RISCVCPUMultiExtConfig riscv_cpu_extensions[] = {
  MULTI_EXT_CFG_BOOL("zve32x", ext_zve32x, false),
  MULTI_EXT_CFG_BOOL("zve64f", ext_zve64f, false),
  MULTI_EXT_CFG_BOOL("zve64d", ext_zve64d, false),
+MULTI_EXT_CFG_BOOL("zve64x", ext_zve64x, false),
  MULTI_EXT_CFG_BOOL("sstc", ext_sstc, true),
  
  MULTI_EXT_CFG_BOOL("smepmp", ext_smepmp, false),




Re: [PATCH v2] target/loongarch: Fix qemu-system-loongarch64 assert failed with the option '-d int'

2024-03-21 Thread Philippe Mathieu-Daudé

On 21/3/24 07:31, Song Gao wrote:

qemu-system-loongarch64 asserts with the option '-d int':
helper_idle() raises the exception EXCP_HLT, but that exception's name
is undefined.

Signed-off-by: Song Gao 
---
  target/loongarch/cpu.c | 76 +++---
  1 file changed, 42 insertions(+), 34 deletions(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index f6ffb3aadb..c56e606d28 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -45,33 +45,47 @@ const char * const fregnames[32] = {
  "f24", "f25", "f26", "f27", "f28", "f29", "f30", "f31",
  };
  
-static const char * const excp_names[] = {
-[EXCCODE_INT] = "Interrupt",
-[EXCCODE_PIL] = "Page invalid exception for load",
-[EXCCODE_PIS] = "Page invalid exception for store",
-[EXCCODE_PIF] = "Page invalid exception for fetch",
-[EXCCODE_PME] = "Page modified exception",
-[EXCCODE_PNR] = "Page Not Readable exception",
-[EXCCODE_PNX] = "Page Not Executable exception",
-[EXCCODE_PPI] = "Page Privilege error",
-[EXCCODE_ADEF] = "Address error for instruction fetch",
-[EXCCODE_ADEM] = "Address error for Memory access",
-[EXCCODE_SYS] = "Syscall",
-[EXCCODE_BRK] = "Break",
-[EXCCODE_INE] = "Instruction Non-Existent",
-[EXCCODE_IPE] = "Instruction privilege error",
-[EXCCODE_FPD] = "Floating Point Disabled",
-[EXCCODE_FPE] = "Floating Point Exception",
-[EXCCODE_DBP] = "Debug breakpoint",
-[EXCCODE_BCE] = "Bound Check Exception",
-[EXCCODE_SXD] = "128 bit vector instructions Disable exception",
-[EXCCODE_ASXD] = "256 bit vector instructions Disable exception",
+struct TypeExcp {
+int32_t exccode;
+const char *name;
+};
+
+static const struct TypeExcp excp_names[] = {
+{EXCCODE_INT, "Interrupt"},
+{EXCCODE_PIL, "Page invalid exception for load"},
+{EXCCODE_PIS, "Page invalid exception for store"},
+{EXCCODE_PIF, "Page invalid exception for fetch"},
+{EXCCODE_PME, "Page modified exception"},
+{EXCCODE_PNR, "Page Not Readable exception"},
+{EXCCODE_PNX, "Page Not Executable exception"},
+{EXCCODE_PPI, "Page Privilege error"},
+{EXCCODE_ADEF, "Address error for instruction fetch"},
+{EXCCODE_ADEM, "Address error for Memory access"},
+{EXCCODE_SYS, "Syscall"},
+{EXCCODE_BRK, "Break"},
+{EXCCODE_INE, "Instruction Non-Existent"},
+{EXCCODE_IPE, "Instruction privilege error"},
+{EXCCODE_FPD, "Floating Point Disabled"},
+{EXCCODE_FPE, "Floating Point Exception"},
+{EXCCODE_DBP, "Debug breakpoint"},
+{EXCCODE_BCE, "Bound Check Exception"},
+{EXCCODE_SXD, "128 bit vector instructions Disable exception"},
+{EXCCODE_ASXD, "256 bit vector instructions Disable exception"},
+{EXCP_HLT, "EXCP_HLT"},
  };
  
const char *loongarch_exception_name(int32_t exception)
 {
-assert(excp_names[exception]);
-return excp_names[exception];
+int i;
+const char *name = NULL;
+
+for (i = 0; i < ARRAY_SIZE(excp_names); i++) {
+if (excp_names[i].exccode == exception) {

  return excp_names[i].name;

+name = excp_names[i].name;
+break;
+}
+}

   return "Unknown";

+return name;
  }





Re: [PATCH] coroutine: cap per-thread local pool size

2024-03-21 Thread Kevin Wolf
On 20.03.2024 at 15:09, Daniel P. Berrangé wrote:
> On Wed, Mar 20, 2024 at 09:35:39AM -0400, Stefan Hajnoczi wrote:
> > On Tue, Mar 19, 2024 at 08:10:49PM +, Daniel P. Berrangé wrote:
> > > On Tue, Mar 19, 2024 at 01:55:10PM -0400, Stefan Hajnoczi wrote:
> > > > On Tue, Mar 19, 2024 at 01:43:32PM +, Daniel P. Berrangé wrote:
> > > > > On Mon, Mar 18, 2024 at 02:34:29PM -0400, Stefan Hajnoczi wrote:
> > > > > > diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
> > > > > > index 5fd2dbaf8b..2790959eaf 100644
> > > > > > --- a/util/qemu-coroutine.c
> > > > > > +++ b/util/qemu-coroutine.c
> > > > > 
> > > > > > +static unsigned int get_global_pool_hard_max_size(void)
> > > > > > +{
> > > > > > +#ifdef __linux__
> > > > > > +g_autofree char *contents = NULL;
> > > > > > +int max_map_count;
> > > > > > +
> > > > > > +/*
> > > > > > + * Linux processes can have up to max_map_count virtual memory 
> > > > > > areas
> > > > > > + * (VMAs). mmap(2), mprotect(2), etc fail with ENOMEM beyond 
> > > > > > this limit. We
> > > > > > + * must limit the coroutine pool to a safe size to avoid 
> > > > > > running out of
> > > > > > + * VMAs.
> > > > > > + */
> > > > > > +if (g_file_get_contents("/proc/sys/vm/max_map_count", 
> > > > > > &contents, NULL,
> > > > > > +NULL) &&
> > > > > > +qemu_strtoi(contents, NULL, 10, &max_map_count) == 0) {
> > > > > > +/*
> > > > > > + * This is a conservative upper bound that avoids exceeding
> > > > > > + * max_map_count. Leave half for non-coroutine users like 
> > > > > > library
> > > > > > + * dependencies, vhost-user, etc. Each coroutine takes up 
> > > > > > 2 VMAs so
> > > > > > + * halve the amount again.
> > > 
> > > Leaving half for loaded libraries, etc is quite conservative
> > > if max_map_count is the small-ish 64k default.
> > > 
> > > That reservation could perhaps a fixed number like 5,000 ?
> > 
> > While I don't want QEMU to abort, once this heuristic is in the code it
> > will be scary to make it more optimistic and we may never change it. So
> > now is the best time to try 5,000.
> > 
> > I'll send a follow-up patch that reserves 5,000 mappings. If that turns
> > out to be too optimistic we can increase the reservation.
> 
> BTW, I suggested 5,000, because I looked at a few QEMU processes I have
> running on Fedora and saw just under 1,000 lines in /proc/$PID/maps,
> of which only a subset is library mappings. So multiplying that x5 felt
> like a fairly generous overhead for more complex build configurations.

On my system, the boring desktop VM with no special hardware or other
advanced configuration takes ~1500 mappings, most of which are
libraries. I'm not concerned about the library mappings, it's unlikely
that we'll double the number of libraries soon.

But I'm not sure about dynamic mappings outside of coroutines, maybe
when enabling features my simple desktop VM doesn't even use at all. If
we're sure that nothing else uses any number worth mentioning, fine with
me. But I couldn't tell.

Staying in the area we know reasonably well, how many libblkio bounce
buffers could be in use at the same time? I think each one is an
individual mmap(), right?

Kevin




Re: [PATCH] migration/postcopy: Fix high frequency sync

2024-03-21 Thread Fabiano Rosas
pet...@redhat.com writes:

> From: Peter Xu 
>
> On current code base I can observe extremely high sync count during
> precopy, as long as one enables postcopy-ram=on before switchover to
> postcopy.
>
> To provide some context of when we decide to do a full sync: we check
> must_precopy (which implies "data must be sent during precopy phase"), and
> as long as it is lower than the threshold size we calculated (out of
> bandwidth and expected downtime) we will kick off the slow sync.
>
> However, when postcopy is enabled (even if still during precopy phase), RAM
> only reports all pages as can_postcopy, and report must_precopy==0.  Then
> "must_precopy <= threshold_size" mostly always triggers and enforces a slow
> sync for every call to migration_iteration_run() when postcopy is enabled
> even if not used.  That is insane.
>
> It turns out this was a regression introduced in the previous refactoring in
> QEMU 8.0 in late 2022. Fix it by checking the whole RAM size rather than
> must_precopy, like before.  Not copying stable yet as many things changed,
> and even if this is a major performance regression, no functional change
> has been observed (and that's also probably why nobody found it).  I only
> noticed this when looking for another bug reported by Nina.
>
> When at it, cleanup a little bit on the lines around.
>
> Cc: Nina Schoetterl-Glausch 
> Fixes: c8df4a7aef ("migration: Split save_live_pending() into 
> state_pending_*")
> Signed-off-by: Peter Xu 

Reviewed-by: Fabiano Rosas 

> ---
>
> Nina: I copied you only because this might still be relevant, as this issue
> also mysteriously points back to c8df4a7aef..  However I don't think it
> should be a fix for your problem; at most it can change the likelihood of
> reproducibility.
>
> This is not a regression for this release, but I still want to have it for
> 9.0.  Fabiano, any opinions / objections?

Go for it.

> ---
>  migration/migration.c | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 047b6b49cf..9fe8fd2afd 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3199,17 +3199,16 @@ typedef enum {
>   */
>  static MigIterateState migration_iteration_run(MigrationState *s)
>  {
> -uint64_t must_precopy, can_postcopy;
> +uint64_t must_precopy, can_postcopy, pending_size;
>  Error *local_err = NULL;
>  bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
>  bool can_switchover = migration_can_switchover(s);
>  
>  qemu_savevm_state_pending_estimate(&must_precopy, &can_postcopy);
> -uint64_t pending_size = must_precopy + can_postcopy;
> -
> +pending_size = must_precopy + can_postcopy;
>  trace_migrate_pending_estimate(pending_size, must_precopy, can_postcopy);
>  
> -if (must_precopy <= s->threshold_size) {
> +if (pending_size < s->threshold_size) {
>  qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
>  pending_size = must_precopy + can_postcopy;
>  trace_migrate_pending_exact(pending_size, must_precopy, 
> can_postcopy);



[PATCH v3] target/loongarch: Fix qemu-system-loongarch64 assert failed with the option '-d int'

2024-03-21 Thread Song Gao
qemu-system-loongarch64 asserts with the option '-d int':
helper_idle() raises the exception EXCP_HLT, but the exception name is
undefined in excp_names[].

Signed-off-by: Song Gao 
---
 target/loongarch/cpu.c | 74 +++---
 1 file changed, 40 insertions(+), 34 deletions(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index f6ffb3aadb..4d681a733e 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -45,33 +45,45 @@ const char * const fregnames[32] = {
 "f24", "f25", "f26", "f27", "f28", "f29", "f30", "f31",
 };
 
-static const char * const excp_names[] = {
-[EXCCODE_INT] = "Interrupt",
-[EXCCODE_PIL] = "Page invalid exception for load",
-[EXCCODE_PIS] = "Page invalid exception for store",
-[EXCCODE_PIF] = "Page invalid exception for fetch",
-[EXCCODE_PME] = "Page modified exception",
-[EXCCODE_PNR] = "Page Not Readable exception",
-[EXCCODE_PNX] = "Page Not Executable exception",
-[EXCCODE_PPI] = "Page Privilege error",
-[EXCCODE_ADEF] = "Address error for instruction fetch",
-[EXCCODE_ADEM] = "Address error for Memory access",
-[EXCCODE_SYS] = "Syscall",
-[EXCCODE_BRK] = "Break",
-[EXCCODE_INE] = "Instruction Non-Existent",
-[EXCCODE_IPE] = "Instruction privilege error",
-[EXCCODE_FPD] = "Floating Point Disabled",
-[EXCCODE_FPE] = "Floating Point Exception",
-[EXCCODE_DBP] = "Debug breakpoint",
-[EXCCODE_BCE] = "Bound Check Exception",
-[EXCCODE_SXD] = "128 bit vector instructions Disable exception",
-[EXCCODE_ASXD] = "256 bit vector instructions Disable exception",
+struct TypeExcp {
+int32_t exccode;
+const char *name;
+};
+
+static const struct TypeExcp excp_names[] = {
+{EXCCODE_INT, "Interrupt"},
+{EXCCODE_PIL, "Page invalid exception for load"},
+{EXCCODE_PIS, "Page invalid exception for store"},
+{EXCCODE_PIF, "Page invalid exception for fetch"},
+{EXCCODE_PME, "Page modified exception"},
+{EXCCODE_PNR, "Page Not Readable exception"},
+{EXCCODE_PNX, "Page Not Executable exception"},
+{EXCCODE_PPI, "Page Privilege error"},
+{EXCCODE_ADEF, "Address error for instruction fetch"},
+{EXCCODE_ADEM, "Address error for Memory access"},
+{EXCCODE_SYS, "Syscall"},
+{EXCCODE_BRK, "Break"},
+{EXCCODE_INE, "Instruction Non-Existent"},
+{EXCCODE_IPE, "Instruction privilege error"},
+{EXCCODE_FPD, "Floating Point Disabled"},
+{EXCCODE_FPE, "Floating Point Exception"},
+{EXCCODE_DBP, "Debug breakpoint"},
+{EXCCODE_BCE, "Bound Check Exception"},
+{EXCCODE_SXD, "128 bit vector instructions Disable exception"},
+{EXCCODE_ASXD, "256 bit vector instructions Disable exception"},
+{EXCP_HLT, "EXCP_HLT"},
 };
 
 const char *loongarch_exception_name(int32_t exception)
 {
-assert(excp_names[exception]);
-return excp_names[exception];
+int i;
+
+for (i = 0; i < ARRAY_SIZE(excp_names); i++) {
+if (excp_names[i].exccode == exception) {
+return excp_names[i].name;
+}
+}
+return "Unknown";
 }
 
 void G_NORETURN do_raise_exception(CPULoongArchState *env,
@@ -80,7 +92,7 @@ void G_NORETURN do_raise_exception(CPULoongArchState *env,
 {
 CPUState *cs = env_cpu(env);
 
-qemu_log_mask(CPU_LOG_INT, "%s: %d (%s)\n",
> +qemu_log_mask(CPU_LOG_INT, "%s: exception: %d (%s)\n",
   __func__,
   exception,
   loongarch_exception_name(exception));
@@ -154,22 +166,16 @@ static void loongarch_cpu_do_interrupt(CPUState *cs)
 CPULoongArchState *env = cpu_env(cs);
 bool update_badinstr = 1;
 int cause = -1;
-const char *name;
 bool tlbfill = FIELD_EX64(env->CSR_TLBRERA, CSR_TLBRERA, ISTLBR);
 uint32_t vec_size = FIELD_EX64(env->CSR_ECFG, CSR_ECFG, VS);
 
 if (cs->exception_index != EXCCODE_INT) {
-if (cs->exception_index < 0 ||
-cs->exception_index >= ARRAY_SIZE(excp_names)) {
-name = "unknown";
-} else {
-name = excp_names[cs->exception_index];
-}
-
 qemu_log_mask(CPU_LOG_INT,
  "%s enter: pc " TARGET_FMT_lx " ERA " TARGET_FMT_lx
- " TLBRERA " TARGET_FMT_lx " %s exception\n", __func__,
- env->pc, env->CSR_ERA, env->CSR_TLBRERA, name);
+ " TLBRERA " TARGET_FMT_lx " exception: %d (%s)\n",
+ __func__, env->pc, env->CSR_ERA, env->CSR_TLBRERA,
+ cs->exception_index,
+ loongarch_exception_name(cs->exception_index));
 }
 
 switch (cs->exception_index) {
-- 
2.25.1




[RFC PATCH v9 23/23] hw/arm/virt: Add FEAT_GICv3_NMI feature support in virt GIC

2024-03-21 Thread Jinjie Ruan via
A PE that implements FEAT_NMI and FEAT_GICv3 also implements
FEAT_GICv3_NMI. A PE that does not implement FEAT_NMI does not implement
FEAT_GICv3_NMI.

So include support for the FEAT_GICv3_NMI feature as part of virt platform
GIC initialization when both FEAT_NMI and FEAT_GICv3 are supported.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v4:
- Add Reviewed-by.
v3:
- Adjust to be the last after add FEAT_NMI to max.
- Check whether support FEAT_NMI and FEAT_GICv3 for FEAT_GICv3_NMI.
---
 hw/arm/virt.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ef2e6c2c4d..63d9f5b553 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -729,6 +729,19 @@ static void create_v2m(VirtMachineState *vms)
 vms->msi_controller = VIRT_MSI_CTRL_GICV2M;
 }
 
+/*
+ * A PE that implements FEAT_NMI and FEAT_GICv3 also implements
+ * FEAT_GICv3_NMI. A PE that does not implement FEAT_NMI, does not implement
+ * FEAT_GICv3_NMI.
+ */
+static bool gicv3_nmi_present(VirtMachineState *vms)
+{
+ARMCPU *cpu = ARM_CPU(qemu_get_cpu(0));
+
+return cpu_isar_feature(aa64_nmi, cpu) &&
+   (vms->gic_version != VIRT_GIC_VERSION_2);
+}
+
 static void create_gic(VirtMachineState *vms, MemoryRegion *mem)
 {
 MachineState *ms = MACHINE(vms);
@@ -802,6 +815,11 @@ static void create_gic(VirtMachineState *vms, MemoryRegion 
*mem)
   vms->virt);
 }
 }
+
+if (gicv3_nmi_present(vms)) {
+qdev_prop_set_bit(vms->gic, "has-nmi", true);
+}
+
 gicbusdev = SYS_BUS_DEVICE(vms->gic);
 sysbus_realize_and_unref(gicbusdev, &error_fatal);
 sysbus_mmio_map(gicbusdev, 0, vms->memmap[VIRT_GIC_DIST].base);
-- 
2.34.1




[RFC PATCH v9 10/23] hw/arm/virt: Wire NMI and VINMI irq lines from GIC to CPU

2024-03-21 Thread Jinjie Ruan via
Wire the new NMI and VINMI interrupt line from the GIC to each CPU.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v9:
- Rename ARM_CPU_VNMI to ARM_CPU_VINMI.
- Update the commit message.
v4:
- Add Reviewed-by.
v3:
- Also add VNMI wire.
---
 hw/arm/virt.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index a9a913aead..ef2e6c2c4d 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -821,7 +821,8 @@ static void create_gic(VirtMachineState *vms, MemoryRegion 
*mem)
 
 /* Wire the outputs from each CPU's generic timer and the GICv3
  * maintenance interrupt signal to the appropriate GIC PPI inputs,
- * and the GIC's IRQ/FIQ/VIRQ/VFIQ interrupt outputs to the CPU's inputs.
+ * and the GIC's IRQ/FIQ/VIRQ/VFIQ/NMI/VINMI interrupt outputs to the
+ * CPU's inputs.
  */
 for (i = 0; i < smp_cpus; i++) {
 DeviceState *cpudev = DEVICE(qemu_get_cpu(i));
@@ -865,6 +866,10 @@ static void create_gic(VirtMachineState *vms, MemoryRegion 
*mem)
qdev_get_gpio_in(cpudev, ARM_CPU_VIRQ));
 sysbus_connect_irq(gicbusdev, i + 3 * smp_cpus,
qdev_get_gpio_in(cpudev, ARM_CPU_VFIQ));
+sysbus_connect_irq(gicbusdev, i + 4 * smp_cpus,
+   qdev_get_gpio_in(cpudev, ARM_CPU_NMI));
+sysbus_connect_irq(gicbusdev, i + 5 * smp_cpus,
+   qdev_get_gpio_in(cpudev, ARM_CPU_VINMI));
 }
 
 fdt_add_gic_node(vms);
-- 
2.34.1




[RFC PATCH v9 22/23] target/arm: Add FEAT_NMI to max

2024-03-21 Thread Jinjie Ruan via
Enable FEAT_NMI on the 'max' CPU.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v3:
- Add Reviewed-by.
- Sorted to last.
---
 docs/system/arm/emulation.rst | 1 +
 target/arm/tcg/cpu64.c| 1 +
 2 files changed, 2 insertions(+)

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index 2a7bbb82dc..a9ae7ede9f 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -64,6 +64,7 @@ the following architecture extensions:
 - FEAT_MTE (Memory Tagging Extension)
 - FEAT_MTE2 (Memory Tagging Extension)
 - FEAT_MTE3 (MTE Asymmetric Fault Handling)
+- FEAT_NMI (Non-maskable Interrupt)
 - FEAT_NV (Nested Virtualization)
 - FEAT_NV2 (Enhanced nested virtualization support)
 - FEAT_PACIMP (Pointer authentication - IMPLEMENTATION DEFINED algorithm)
diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index 9f7a9f3d2c..62c4663512 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -1175,6 +1175,7 @@ void aarch64_max_tcg_initfn(Object *obj)
 t = FIELD_DP64(t, ID_AA64PFR1, RAS_FRAC, 0);  /* FEAT_RASv1p1 + 
FEAT_DoubleFault */
 t = FIELD_DP64(t, ID_AA64PFR1, SME, 1);   /* FEAT_SME */
 t = FIELD_DP64(t, ID_AA64PFR1, CSV2_FRAC, 0); /* FEAT_CSV2_2 */
+t = FIELD_DP64(t, ID_AA64PFR1, NMI, 1);   /* FEAT_NMI */
 cpu->isar.id_aa64pfr1 = t;
 
 t = cpu->isar.id_aa64mmfr0;
-- 
2.34.1




[RFC PATCH v9 01/23] target/arm: Handle HCR_EL2 accesses for bits introduced with FEAT_NMI

2024-03-21 Thread Jinjie Ruan via
FEAT_NMI defines three new bits in HCRX_EL2: TALLINT, VINMI and
VFNMI. When the feature is enabled, allow these bits to be written in
HCRX_EL2.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v9:
- Declare cpu variable to reuse latter.
v4:
- Update the comment for FEAT_NMI in hcrx_write().
- Update the commit message, s/thress/three/g.
v3:
- Add Reviewed-by.
- Add HCRX_VINMI and HCRX_VFNMI support in HCRX_EL2.
- Update the commit message.
---
 target/arm/cpu-features.h | 5 +
 target/arm/helper.c   | 9 -
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
index e5758d9fbc..b300d0446d 100644
--- a/target/arm/cpu-features.h
+++ b/target/arm/cpu-features.h
@@ -681,6 +681,11 @@ static inline bool isar_feature_aa64_sme(const 
ARMISARegisters *id)
 return FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, SME) != 0;
 }
 
+static inline bool isar_feature_aa64_nmi(const ARMISARegisters *id)
+{
+return FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, NMI) != 0;
+}
+
 static inline bool isar_feature_aa64_tgran4_lpa2(const ARMISARegisters *id)
 {
 return FIELD_SEX64(id->id_aa64mmfr0, ID_AA64MMFR0, TGRAN4) >= 1;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 3f3a5b55d4..7d6c6e9878 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -6183,13 +6183,20 @@ bool el_is_in_host(CPUARMState *env, int el)
 static void hcrx_write(CPUARMState *env, const ARMCPRegInfo *ri,
uint64_t value)
 {
+ARMCPU *cpu = env_archcpu(env);
+
 uint64_t valid_mask = 0;
 
 /* FEAT_MOPS adds MSCEn and MCE2 */
-if (cpu_isar_feature(aa64_mops, env_archcpu(env))) {
+if (cpu_isar_feature(aa64_mops, cpu)) {
 valid_mask |= HCRX_MSCEN | HCRX_MCE2;
 }
 
+/* FEAT_NMI adds TALLINT, VINMI and VFNMI */
+if (cpu_isar_feature(aa64_nmi, cpu)) {
+valid_mask |= HCRX_TALLINT | HCRX_VINMI | HCRX_VFNMI;
+}
+
 /* Clear RES0 bits.  */
 env->cp15.hcrx_el2 = value & valid_mask;
 }
-- 
2.34.1




[RFC PATCH v9 18/23] hw/intc/arm_gicv3: Handle icv_nmiar1_read() for icc_nmiar1_read()

2024-03-21 Thread Jinjie Ruan via
Implement icv_nmiar1_read() for icc_nmiar1_read(), so add definition for
ICH_LR_EL2.NMI and ICH_AP1R_EL2.NMI bit.

If FEAT_GICv3_NMI is supported, ich_ap_write() should consider the
ICH_AP1R_EL2.NMI bit. In icv_activate_irq() and icv_eoir_write(), the
ICH_AP1R_EL2.NMI bit should be set or cleared according to the
superpriority info.

By the way, add gicv3_icv_nmiar1_read trace event.

If the highest priority pending irq is an NMI, the icv iar read should
return 1022 and trap for NMI again.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v9:
- Correct the INTID_NMI logic.
v8:
- Fix an unexpected interrupt bug when sending VNMI by running qemu VM.
v7:
- Add Reviewed-by.
v6:
- Implement icv_nmiar1_read().
---
 hw/intc/arm_gicv3_cpuif.c | 52 +++
 hw/intc/gicv3_internal.h  |  3 +++
 hw/intc/trace-events  |  1 +
 3 files changed, 51 insertions(+), 5 deletions(-)

diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index df82a413c6..65e84bc013 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -728,7 +728,7 @@ static uint64_t icv_hppir_read(CPUARMState *env, const 
ARMCPRegInfo *ri)
 return value;
 }
 
-static void icv_activate_irq(GICv3CPUState *cs, int idx, int grp)
+static void icv_activate_irq(GICv3CPUState *cs, int idx, int grp, bool nmi)
 {
 /* Activate the interrupt in the specified list register
  * by moving it from Pending to Active state, and update the
@@ -742,7 +742,12 @@ static void icv_activate_irq(GICv3CPUState *cs, int idx, 
int grp)
 
 cs->ich_lr_el2[idx] &= ~ICH_LR_EL2_STATE_PENDING_BIT;
 cs->ich_lr_el2[idx] |= ICH_LR_EL2_STATE_ACTIVE_BIT;
-cs->ich_apr[grp][regno] |= (1 << regbit);
+
+if (cs->gic->nmi_support) {
+cs->ich_apr[grp][regno] |= (1 << regbit) | (nmi ? ICH_AP1R_EL2_NMI : 
0);
+} else {
+cs->ich_apr[grp][regno] |= (1 << regbit);
+}
 }
 
 static void icv_activate_vlpi(GICv3CPUState *cs)
@@ -776,7 +781,11 @@ static uint64_t icv_iar_read(CPUARMState *env, const 
ARMCPRegInfo *ri)
 if (thisgrp == grp && icv_hppi_can_preempt(cs, lr)) {
 intid = ich_lr_vintid(lr);
 if (!gicv3_intid_is_special(intid)) {
-icv_activate_irq(cs, idx, grp);
+if (!(lr & ICH_LR_EL2_NMI)) {
+icv_activate_irq(cs, idx, grp, false);
+} else {
+intid = INTID_NMI;
+}
 } else {
 /* Interrupt goes from Pending to Invalid */
 cs->ich_lr_el2[idx] &= ~ICH_LR_EL2_STATE_PENDING_BIT;
@@ -797,8 +806,32 @@ static uint64_t icv_iar_read(CPUARMState *env, const 
ARMCPRegInfo *ri)
 
 static uint64_t icv_nmiar1_read(CPUARMState *env, const ARMCPRegInfo *ri)
 {
-/* todo */
+GICv3CPUState *cs = icc_cs_from_env(env);
+int idx = hppvi_index(cs);
 uint64_t intid = INTID_SPURIOUS;
+
+if (idx >= 0 && idx != HPPVI_INDEX_VLPI) {
+uint64_t lr = cs->ich_lr_el2[idx];
+int thisgrp = (lr & ICH_LR_EL2_GROUP) ? GICV3_G1NS : GICV3_G0;
+
+if ((thisgrp == GICV3_G1NS) && (lr & ICH_LR_EL2_NMI)) {
+intid = ich_lr_vintid(lr);
+if (!gicv3_intid_is_special(intid)) {
+icv_activate_irq(cs, idx, GICV3_G1NS, true);
+} else {
+/* Interrupt goes from Pending to Invalid */
+cs->ich_lr_el2[idx] &= ~ICH_LR_EL2_STATE_PENDING_BIT;
+/* We will now return the (bogus) ID from the list register,
+ * as per the pseudocode.
+ */
+}
+}
+}
+
+trace_gicv3_icv_nmiar1_read(gicv3_redist_affid(cs), intid);
+
+gicv3_cpuif_virt_update(cs);
+
 return intid;
 }
 
@@ -1403,6 +1436,11 @@ static int icv_drop_prio(GICv3CPUState *cs)
 return (apr0count + i * 32) << (icv_min_vbpr(cs) + 1);
 } else {
 *papr1 &= *papr1 - 1;
+
+if (cs->gic->nmi_support && (*papr1 & ICH_AP1R_EL2_NMI)) {
+*papr1 &= ~ICH_AP1R_EL2_NMI;
+}
+
 return (apr1count + i * 32) << (icv_min_vbpr(cs) + 1);
 }
 }
@@ -2552,7 +2590,11 @@ static void ich_ap_write(CPUARMState *env, const 
ARMCPRegInfo *ri,
 
 trace_gicv3_ich_ap_write(ri->crm & 1, regno, gicv3_redist_affid(cs), 
value);
 
-cs->ich_apr[grp][regno] = value & 0xU;
+if (cs->gic->nmi_support) {
+cs->ich_apr[grp][regno] = value & (0xU | ICH_AP1R_EL2_NMI);
+} else {
+cs->ich_apr[grp][regno] = value & 0xU;
+}
 gicv3_cpuif_virt_irq_fiq_update(cs);
 }
 
diff --git a/hw/intc/gicv3_internal.h b/hw/intc/gicv3_internal.h
index 93e56b3726..5e2b32861d 100644
--- a/hw/intc/gicv3_internal.h
+++ b/hw/intc/gicv3_internal.h
@@ -242,6 +242,7 @@ FIELD(GICR_VPENDBASER, VALID, 63, 1)
 #define ICH_LR_EL2_PRIORITY_SHIFT 48
 #define ICH_LR_EL2_PRIORITY_LENGTH 8
 #define ICH_LR_EL2_PRIORITY_MASK (0xffULL << ICH_LR_EL2_PRIORITY_SHIFT)
+#def

[RFC PATCH v9 21/23] hw/intc/arm_gicv3: Report the VINMI interrupt

2024-03-21 Thread Jinjie Ruan via
In vCPU Interface, if the vIRQ has the superpriority property, report
vINMI to the corresponding vPE.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v9:
- Update the commit subject and message, vNMI -> vINMI.
v6:
- Add Reviewed-by.
---
 hw/intc/arm_gicv3_cpuif.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index f8bc74323f..faafd17703 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -465,6 +465,7 @@ void gicv3_cpuif_virt_irq_fiq_update(GICv3CPUState *cs)
 int idx;
 int irqlevel = 0;
 int fiqlevel = 0;
+int nmilevel = 0;
 
 idx = hppvi_index(cs);
 trace_gicv3_cpuif_virt_update(gicv3_redist_affid(cs), idx,
@@ -482,9 +483,17 @@ void gicv3_cpuif_virt_irq_fiq_update(GICv3CPUState *cs)
 uint64_t lr = cs->ich_lr_el2[idx];
 
 if (icv_hppi_can_preempt(cs, lr)) {
-/* Virtual interrupts are simple: G0 are always FIQ, and G1 IRQ */
+/*
+ * Virtual interrupts are simple: G0 are always FIQ, and G1 are
+ * IRQ or NMI which depends on the ICH_LR_EL2.NMI to have
+ * non-maskable property.
+ */
 if (lr & ICH_LR_EL2_GROUP) {
-irqlevel = 1;
+if (cs->gic->nmi_support && (lr & ICH_LR_EL2_NMI)) {
+nmilevel = 1;
+} else {
+irqlevel = 1;
+}
 } else {
 fiqlevel = 1;
 }
@@ -494,6 +503,7 @@ void gicv3_cpuif_virt_irq_fiq_update(GICv3CPUState *cs)
 trace_gicv3_cpuif_virt_set_irqs(gicv3_redist_affid(cs), fiqlevel, 
irqlevel);
 qemu_set_irq(cs->parent_vfiq, fiqlevel);
 qemu_set_irq(cs->parent_virq, irqlevel);
+qemu_set_irq(cs->parent_vnmi, nmilevel);
 }
 
 static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
-- 
2.34.1




[RFC PATCH v9 20/23] hw/intc/arm_gicv3: Report the NMI interrupt in gicv3_cpuif_update()

2024-03-21 Thread Jinjie Ruan via
In CPU Interface, if the IRQ has the superpriority property, report
NMI to the corresponding PE.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v6:
- Add Reviewed-by.
v4:
- Swap the ordering of the IFs.
v3:
- Remove handling nmi_is_irq flag.
---
 hw/intc/arm_gicv3_cpuif.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index 65e84bc013..f8bc74323f 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -971,6 +971,7 @@ void gicv3_cpuif_update(GICv3CPUState *cs)
 /* Tell the CPU about its highest priority pending interrupt */
 int irqlevel = 0;
 int fiqlevel = 0;
+int nmilevel = 0;
 ARMCPU *cpu = ARM_CPU(cs->cpu);
 CPUARMState *env = &cpu->env;
 
@@ -1009,6 +1010,8 @@ void gicv3_cpuif_update(GICv3CPUState *cs)
 
 if (isfiq) {
 fiqlevel = 1;
+} else if (cs->hppi.superprio) {
+nmilevel = 1;
 } else {
 irqlevel = 1;
 }
@@ -1018,6 +1021,7 @@ void gicv3_cpuif_update(GICv3CPUState *cs)
 
 qemu_set_irq(cs->parent_fiq, fiqlevel);
 qemu_set_irq(cs->parent_irq, irqlevel);
+qemu_set_irq(cs->parent_nmi, nmilevel);
 }
 
 static uint64_t icc_pmr_read(CPUARMState *env, const ARMCPRegInfo *ri)
-- 
2.34.1




[RFC PATCH v9 16/23] hw/intc: Enable FEAT_GICv3_NMI Feature

2024-03-21 Thread Jinjie Ruan via
Add properties to enable the FEAT_GICv3_NMI feature, and set up the
distributor and redistributor registers to indicate NMI support.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v4:
- Add Reviewed-by.
---
 hw/intc/arm_gicv3_common.c | 1 +
 hw/intc/arm_gicv3_dist.c   | 2 ++
 hw/intc/gicv3_internal.h   | 1 +
 include/hw/intc/arm_gicv3_common.h | 1 +
 4 files changed, 5 insertions(+)

diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
index c52f060026..2d2cea6858 100644
--- a/hw/intc/arm_gicv3_common.c
+++ b/hw/intc/arm_gicv3_common.c
@@ -569,6 +569,7 @@ static Property arm_gicv3_common_properties[] = {
 DEFINE_PROP_UINT32("num-irq", GICv3State, num_irq, 32),
 DEFINE_PROP_UINT32("revision", GICv3State, revision, 3),
 DEFINE_PROP_BOOL("has-lpi", GICv3State, lpi_enable, 0),
+DEFINE_PROP_BOOL("has-nmi", GICv3State, nmi_support, 0),
 DEFINE_PROP_BOOL("has-security-extensions", GICv3State, security_extn, 0),
 /*
  * Compatibility property: force 8 bits of physical priority, even
diff --git a/hw/intc/arm_gicv3_dist.c b/hw/intc/arm_gicv3_dist.c
index 9739404e35..c4e28d209a 100644
--- a/hw/intc/arm_gicv3_dist.c
+++ b/hw/intc/arm_gicv3_dist.c
@@ -412,6 +412,7 @@ static bool gicd_readl(GICv3State *s, hwaddr offset,
  *  by GICD_TYPER.IDbits)
  * MBIS == 0 (message-based SPIs not supported)
  * SecurityExtn == 1 if security extns supported
+ * NMI = 1 if Non-maskable interrupt property is supported
  * CPUNumber == 0 since for us ARE is always 1
  * ITLinesNumber == (((max SPI IntID + 1) / 32) - 1)
  */
@@ -425,6 +426,7 @@ static bool gicd_readl(GICv3State *s, hwaddr offset,
 bool dvis = s->revision >= 4;
 
 *data = (1 << 25) | (1 << 24) | (dvis << 18) | (sec_extn << 10) |
+(s->nmi_support << GICD_TYPER_NMI_SHIFT) |
 (s->lpi_enable << GICD_TYPER_LPIS_SHIFT) |
 (0xf << 19) | itlinesnumber;
 return true;
diff --git a/hw/intc/gicv3_internal.h b/hw/intc/gicv3_internal.h
index a1fc34597e..8d793243f4 100644
--- a/hw/intc/gicv3_internal.h
+++ b/hw/intc/gicv3_internal.h
@@ -70,6 +70,7 @@
 #define GICD_CTLR_E1NWF (1U << 7)
 #define GICD_CTLR_RWP   (1U << 31)
 
+#define GICD_TYPER_NMI_SHIFT   9
 #define GICD_TYPER_LPIS_SHIFT  17
 
 /* 16 bits EventId */
diff --git a/include/hw/intc/arm_gicv3_common.h 
b/include/hw/intc/arm_gicv3_common.h
index df4380141d..16c5fa7256 100644
--- a/include/hw/intc/arm_gicv3_common.h
+++ b/include/hw/intc/arm_gicv3_common.h
@@ -251,6 +251,7 @@ struct GICv3State {
 uint32_t num_irq;
 uint32_t revision;
 bool lpi_enable;
+bool nmi_support;
 bool security_extn;
 bool force_8bit_prio;
 bool irq_reset_nonsecure;
-- 
2.34.1




[RFC PATCH v9 14/23] hw/intc/arm_gicv3_redist: Implement GICR_INMIR0

2024-03-21 Thread Jinjie Ruan via
Add the GICR_INMIR0 register and support accesses to it.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v6:
- Add Reviewed-by.
v4:
- Make the GICR_INMIR0 implementation more clearer.
---
 hw/intc/arm_gicv3_redist.c | 19 +++
 hw/intc/gicv3_internal.h   |  1 +
 2 files changed, 20 insertions(+)

diff --git a/hw/intc/arm_gicv3_redist.c b/hw/intc/arm_gicv3_redist.c
index 8153525849..7a16a058b1 100644
--- a/hw/intc/arm_gicv3_redist.c
+++ b/hw/intc/arm_gicv3_redist.c
@@ -35,6 +35,15 @@ static int gicr_ns_access(GICv3CPUState *cs, int irq)
 return extract32(cs->gicr_nsacr, irq * 2, 2);
 }
 
+static void gicr_write_bitmap_reg(GICv3CPUState *cs, MemTxAttrs attrs,
+  uint32_t *reg, uint32_t val)
+{
+/* Helper routine to implement writing to a "set" register */
+val &= mask_group(cs, attrs);
+*reg = val;
+gicv3_redist_update(cs);
+}
+
 static void gicr_write_set_bitmap_reg(GICv3CPUState *cs, MemTxAttrs attrs,
   uint32_t *reg, uint32_t val)
 {
@@ -406,6 +415,10 @@ static MemTxResult gicr_readl(GICv3CPUState *cs, hwaddr 
offset,
 *data = value;
 return MEMTX_OK;
 }
+case GICR_INMIR0:
+*data = cs->gic->nmi_support ?
+gicr_read_bitmap_reg(cs, attrs, cs->gicr_isuperprio) : 0;
+return MEMTX_OK;
 case GICR_ICFGR0:
 case GICR_ICFGR1:
 {
@@ -555,6 +568,12 @@ static MemTxResult gicr_writel(GICv3CPUState *cs, hwaddr 
offset,
 gicv3_redist_update(cs);
 return MEMTX_OK;
 }
+case GICR_INMIR0:
+if (cs->gic->nmi_support) {
+gicr_write_bitmap_reg(cs, attrs, &cs->gicr_isuperprio, value);
+}
+return MEMTX_OK;
+
 case GICR_ICFGR0:
 /* Register is all RAZ/WI or RAO/WI bits */
 return MEMTX_OK;
diff --git a/hw/intc/gicv3_internal.h b/hw/intc/gicv3_internal.h
index 29d5cdc1b6..f35b7d2f03 100644
--- a/hw/intc/gicv3_internal.h
+++ b/hw/intc/gicv3_internal.h
@@ -109,6 +109,7 @@
 #define GICR_ICFGR1   (GICR_SGI_OFFSET + 0x0C04)
 #define GICR_IGRPMODR0(GICR_SGI_OFFSET + 0x0D00)
 #define GICR_NSACR(GICR_SGI_OFFSET + 0x0E00)
+#define GICR_INMIR0   (GICR_SGI_OFFSET + 0x0F80)
 
 /* VLPI redistributor registers, offsets from VLPI_base */
 #define GICR_VPROPBASER   (GICR_VLPI_OFFSET + 0x70)
-- 
2.34.1




[RFC PATCH v9 08/23] target/arm: Handle IS/FS in ISR_EL1 for NMI, VINMI and VFNMI

2024-03-21 Thread Jinjie Ruan via
Add IS and FS bit in ISR_EL1 and handle the read. With CPU_INTERRUPT_NMI or
CPU_INTERRUPT_VINMI, both CPSR_I and ISR_IS must be set. With
CPU_INTERRUPT_VFNMI, both CPSR_F and ISR_FS must be set.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v9:
- CPU_INTERRUPT_VNMI -> CPU_INTERRUPT_VINMI.
- Handle CPSR_F and ISR_FS according to CPU_INTERRUPT_VFNMI instead of
  CPU_INTERRUPT_VFIQ and HCRX_EL2.VFNMI.
- Update the commit message.
v7:
- env->cp15.hcrx_el2 -> arm_hcrx_el2_eff().
- Add Reviewed-by.
v6:
- Verify that HCR_EL2.VF is set before checking VFNMI.
v4;
- Also handle VNMI.
v3:
- CPU_INTERRUPT_NMI do not set FIQ, so remove it.
- With CPU_INTERRUPT_NMI, both CPSR_I and ISR_IS must be set.
---
 target/arm/cpu.h|  2 ++
 target/arm/helper.c | 13 +
 2 files changed, 15 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 08a6bc50de..97997dbd08 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1398,6 +1398,8 @@ void pmu_init(ARMCPU *cpu);
 #define CPSR_N (1U << 31)
 #define CPSR_NZCV (CPSR_N | CPSR_Z | CPSR_C | CPSR_V)
 #define CPSR_AIF (CPSR_A | CPSR_I | CPSR_F)
+#define ISR_FS (1U << 9)
+#define ISR_IS (1U << 10)
 
 #define CPSR_IT (CPSR_IT_0_1 | CPSR_IT_2_7)
 #define CACHED_CPSR_BITS (CPSR_T | CPSR_AIF | CPSR_GE | CPSR_IT | CPSR_Q \
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 077c9a6923..b57114d35d 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -2021,16 +2021,29 @@ static uint64_t isr_read(CPUARMState *env, const 
ARMCPRegInfo *ri)
 if (cs->interrupt_request & CPU_INTERRUPT_VIRQ) {
 ret |= CPSR_I;
 }
+if (cs->interrupt_request & CPU_INTERRUPT_VINMI) {
+ret |= ISR_IS;
+ret |= CPSR_I;
+}
 } else {
 if (cs->interrupt_request & CPU_INTERRUPT_HARD) {
 ret |= CPSR_I;
 }
+
+if (cs->interrupt_request & CPU_INTERRUPT_NMI) {
+ret |= ISR_IS;
+ret |= CPSR_I;
+}
 }
 
 if (hcr_el2 & HCR_FMO) {
 if (cs->interrupt_request & CPU_INTERRUPT_VFIQ) {
 ret |= CPSR_F;
 }
+if (cs->interrupt_request & CPU_INTERRUPT_VFNMI) {
+ret |= ISR_FS;
+ret |= CPSR_F;
+}
 } else {
 if (cs->interrupt_request & CPU_INTERRUPT_FIQ) {
 ret |= CPSR_F;
-- 
2.34.1




[RFC PATCH v9 07/23] target/arm: Add support for NMI in arm_phys_excp_target_el()

2024-03-21 Thread Jinjie Ruan via
According to Arm GIC section 4.6.3 Interrupt superpriority, an interrupt
with superpriority is always an IRQ, never an FIQ, so handle NMI the same
as IRQ in arm_phys_excp_target_el().

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v4:
- Add Reviewed-by.
v3:
- Remove nmi_is_irq flag in CPUARMState.
- Handle NMI same as IRQ in arm_phys_excp_target_el().
---
 target/arm/helper.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 1868235499..077c9a6923 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -10760,6 +10760,7 @@ uint32_t arm_phys_excp_target_el(CPUState *cs, uint32_t 
excp_idx,
 hcr_el2 = arm_hcr_el2_eff(env);
 switch (excp_idx) {
 case EXCP_IRQ:
+case EXCP_NMI:
 scr = ((env->cp15.scr_el3 & SCR_IRQ) == SCR_IRQ);
 hcr = hcr_el2 & HCR_IMO;
 break;
-- 
2.34.1




[RFC PATCH v9 02/23] target/arm: Add PSTATE.ALLINT

2024-03-21 Thread Jinjie Ruan via
When PSTATE.ALLINT is set, an IRQ or FIQ interrupt that is targeted to
ELx, with or without superpriority, is masked.

As Richard suggested, place ALLINT bit in PSTATE in env->pstate.

With the change to pstate_read/write, exception entry
and return are automatically handled.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v5:
- Remove the ALLINT comment, as it is covered by "all other bits".
- Add Reviewed-by.
v4:
- Keep PSTATE.ALLINT in env->pstate but not env->allint.
- Update the commit message.
v3:
- Remove ALLINT dump in aarch64_cpu_dump_state().
- Update the commit message.
---
 target/arm/cpu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index bc0c84873f..de740d223f 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1430,6 +1430,7 @@ void pmu_init(ARMCPU *cpu);
 #define PSTATE_D (1U << 9)
 #define PSTATE_BTYPE (3U << 10)
 #define PSTATE_SSBS (1U << 12)
+#define PSTATE_ALLINT (1U << 13)
 #define PSTATE_IL (1U << 20)
 #define PSTATE_SS (1U << 21)
 #define PSTATE_PAN (1U << 22)
-- 
2.34.1




[RFC PATCH v9 06/23] target/arm: Add support for Non-maskable Interrupt

2024-03-21 Thread Jinjie Ruan via
This only implements the external delivery method via the GICv3.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v9:
- Update the GPIOs passed in the arm_cpu_kvm_set_irq, and update the comment.
- Definitely not merge VINMI and VFNMI into EXCP_VNMI.
- Update VINMI and VFNMI when writing HCR_EL2 or HCRX_EL2.
v8:
- Fix the rcu stall after sending a VNMI in qemu VM.
v7:
- Add Reviewed-by.
v6:
- env->cp15.hcr_el2 -> arm_hcr_el2_eff().
- env->cp15.hcrx_el2 -> arm_hcrx_el2_eff().
- Not include VF && VFNMI in CPU_INTERRUPT_VNMI.
v4:
- Accept NMI unconditionally for arm_cpu_has_work() but add comment.
- Change from & to && for EXCP_IRQ or EXCP_FIQ.
- Refator nmi mask in arm_excp_unmasked().
- Also handle VNMI in arm_cpu_exec_interrupt() and arm_cpu_set_irq().
- Rename virtual to Virtual.
v3:
- Not include CPU_INTERRUPT_NMI when FEAT_NMI not enabled
- Add ARM_CPU_VNMI.
- Refator nmi mask in arm_excp_unmasked().
- Test SCTLR_ELx.NMI for ALLINT mask for NMI.
---
 target/arm/cpu-qom.h   |   5 +-
 target/arm/cpu.c   | 124 ++---
 target/arm/cpu.h   |   6 ++
 target/arm/helper.c|  33 +--
 target/arm/internals.h |  18 ++
 5 files changed, 172 insertions(+), 14 deletions(-)

diff --git a/target/arm/cpu-qom.h b/target/arm/cpu-qom.h
index 8e032691db..b497667d61 100644
--- a/target/arm/cpu-qom.h
+++ b/target/arm/cpu-qom.h
@@ -36,11 +36,14 @@ DECLARE_CLASS_CHECKERS(AArch64CPUClass, AARCH64_CPU,
 #define ARM_CPU_TYPE_SUFFIX "-" TYPE_ARM_CPU
 #define ARM_CPU_TYPE_NAME(name) (name ARM_CPU_TYPE_SUFFIX)
 
-/* Meanings of the ARMCPU object's four inbound GPIO lines */
+/* Meanings of the ARMCPU object's seven inbound GPIO lines */
 #define ARM_CPU_IRQ 0
 #define ARM_CPU_FIQ 1
 #define ARM_CPU_VIRQ 2
 #define ARM_CPU_VFIQ 3
+#define ARM_CPU_NMI 4
+#define ARM_CPU_VINMI 5
+#define ARM_CPU_VFNMI 6
 
 /* For M profile, some registers are banked secure vs non-secure;
  * these are represented as a 2-element array where the first element
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index ab8d007a86..f1e7ae0975 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -122,6 +122,13 @@ void arm_restore_state_to_opc(CPUState *cs,
 }
 #endif /* CONFIG_TCG */
 
+/*
+ * With SCTLR_ELx.NMI == 0, IRQ with Superpriority is masked identically with
+ * IRQ without Superpriority. Moreover, if the GIC is configured so that
+ * FEAT_GICv3_NMI is only set if FEAT_NMI is set, then we won't ever see
+ * CPU_INTERRUPT_*NMI anyway. So we might as well accept NMI here
+ * unconditionally.
+ */
 static bool arm_cpu_has_work(CPUState *cs)
 {
 ARMCPU *cpu = ARM_CPU(cs);
@@ -129,6 +136,7 @@ static bool arm_cpu_has_work(CPUState *cs)
 return (cpu->power_state != PSCI_OFF)
 && cs->interrupt_request &
 (CPU_INTERRUPT_FIQ | CPU_INTERRUPT_HARD
+ | CPU_INTERRUPT_NMI | CPU_INTERRUPT_VINMI | CPU_INTERRUPT_VFNMI
  | CPU_INTERRUPT_VFIQ | CPU_INTERRUPT_VIRQ | CPU_INTERRUPT_VSERR
  | CPU_INTERRUPT_EXITTB);
 }
@@ -668,6 +676,7 @@ static inline bool arm_excp_unmasked(CPUState *cs, unsigned int excp_idx,
 CPUARMState *env = cpu_env(cs);
 bool pstate_unmasked;
 bool unmasked = false;
+bool allIntMask = false;
 
 /*
  * Don't take exceptions if they target a lower EL.
@@ -678,13 +687,36 @@ static inline bool arm_excp_unmasked(CPUState *cs, unsigned int excp_idx,
 return false;
 }
 
+if (cpu_isar_feature(aa64_nmi, env_archcpu(env)) &&
+env->cp15.sctlr_el[target_el] & SCTLR_NMI && cur_el == target_el) {
+allIntMask = env->pstate & PSTATE_ALLINT ||
+ ((env->cp15.sctlr_el[target_el] & SCTLR_SPINTMASK) &&
+  (env->pstate & PSTATE_SP));
+}
+
 switch (excp_idx) {
+case EXCP_NMI:
+pstate_unmasked = !allIntMask;
+break;
+
+case EXCP_VINMI:
+if (!(hcr_el2 & HCR_IMO) || (hcr_el2 & HCR_TGE)) {
+/* VINMIs are only taken when hypervized.  */
+return false;
+}
+return !allIntMask;
+case EXCP_VFNMI:
+if (!(hcr_el2 & HCR_FMO) || (hcr_el2 & HCR_TGE)) {
+/* VFNMIs are only taken when hypervized.  */
+return false;
+}
+return !allIntMask;
 case EXCP_FIQ:
-pstate_unmasked = !(env->daif & PSTATE_F);
+pstate_unmasked = (!(env->daif & PSTATE_F)) && (!allIntMask);
 break;
 
 case EXCP_IRQ:
-pstate_unmasked = !(env->daif & PSTATE_I);
+pstate_unmasked = (!(env->daif & PSTATE_I)) && (!allIntMask);
 break;
 
 case EXCP_VFIQ:
@@ -692,13 +724,13 @@ static inline bool arm_excp_unmasked(CPUState *cs, unsigned int excp_idx,
 /* VFIQs are only taken when hypervized.  */
 return false;
 }
-return !(env->daif & PSTATE_F);
+return !(env->daif & PSTATE_F) && (!allIntMask);
 case EXCP_VIRQ:
 if (!(hcr_el2 & HCR_IMO) || (hcr_el2 & HCR_TGE)) {
  

[RFC PATCH v9 13/23] hw/intc/arm_gicv3: Add irq superpriority information

2024-03-21 Thread Jinjie Ruan via
A SPI, PPI or SGI interrupt can have a superpriority property. So
maintain superpriority information in PendingIrq and GICR/GICD.

Signed-off-by: Jinjie Ruan 
Acked-by: Richard Henderson 
---
v3:
- Place this ahead of implementing GICR_INMIR.
- Add Acked-by.
---
 include/hw/intc/arm_gicv3_common.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/hw/intc/arm_gicv3_common.h b/include/hw/intc/arm_gicv3_common.h
index 7324c7d983..df4380141d 100644
--- a/include/hw/intc/arm_gicv3_common.h
+++ b/include/hw/intc/arm_gicv3_common.h
@@ -146,6 +146,7 @@ typedef struct {
 int irq;
 uint8_t prio;
 int grp;
+bool superprio;
 } PendingIrq;
 
 struct GICv3CPUState {
@@ -172,6 +173,7 @@ struct GICv3CPUState {
 uint32_t gicr_ienabler0;
 uint32_t gicr_ipendr0;
 uint32_t gicr_iactiver0;
+uint32_t gicr_isuperprio;
 uint32_t edge_trigger; /* ICFGR0 and ICFGR1 even bits */
 uint32_t gicr_igrpmodr0;
 uint32_t gicr_nsacr;
@@ -274,6 +276,7 @@ struct GICv3State {
 GIC_DECLARE_BITMAP(active);   /* GICD_ISACTIVER */
 GIC_DECLARE_BITMAP(level);/* Current level */
 GIC_DECLARE_BITMAP(edge_trigger); /* GICD_ICFGR even bits */
+GIC_DECLARE_BITMAP(superprio);/* GICD_INMIR */
 uint8_t gicd_ipriority[GICV3_MAXIRQ];
 uint64_t gicd_irouter[GICV3_MAXIRQ];
 /* Cached information: pointer to the cpu i/f for the CPUs specified
@@ -313,6 +316,7 @@ GICV3_BITMAP_ACCESSORS(pending)
 GICV3_BITMAP_ACCESSORS(active)
 GICV3_BITMAP_ACCESSORS(level)
 GICV3_BITMAP_ACCESSORS(edge_trigger)
+GICV3_BITMAP_ACCESSORS(superprio)
 
 #define TYPE_ARM_GICV3_COMMON "arm-gicv3-common"
 typedef struct ARMGICv3CommonClass ARMGICv3CommonClass;
-- 
2.34.1




[RFC PATCH v9 19/23] hw/intc/arm_gicv3: Implement NMI interrupt prioirty

2024-03-21 Thread Jinjie Ruan via
If the GICD_CTLR_DS bit is zero and the NMI is non-secure, the NMI priority
is higher than 0x80, otherwise it is higher than 0x0. Save the NMI
superpriority information in hppi.superprio to deliver the NMI exception.
Since both the GICR and the GICD can deliver an NMI, it is necessary to check
whether the pending irq is an NMI in both gicv3_redist_update_noirqset and
gicv3_update_noirqset. In irqbetter(), a non-NMI with the same priority and a
smaller interrupt number can be preempted, but an NMI cannot.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v8:
- Add Reviewed-by.
v7:
- Reorder the irqbetter() code for clarity.
- Eliminate the has_superprio local variable for gicv3_get_priority().
- false -> cs->hpplpi.superprio in gicv3_redist_update_noirqset().
- 0x0 -> false in arm_gicv3_common_reset_hold().
- Clear superprio in several places for hppi, hpplpi and hppvlpi.
v6:
- Put the "extract superprio info" logic into gicv3_get_priority().
- Update the comment in irqbetter().
- Reset the cs->hppi.superprio to 0x0.
- Set hppi.superprio to false for LPI.
v4:
- Replace is_nmi with has_superprio so as not to mix NMI and superpriority.
- Update the comment in irqbetter().
- Extract gicv3_get_priority() to avoid code repeat.
---
v3:
- Add missing brace
---
 hw/intc/arm_gicv3.c| 69 +-
 hw/intc/arm_gicv3_common.c |  3 ++
 hw/intc/arm_gicv3_redist.c |  3 ++
 3 files changed, 66 insertions(+), 9 deletions(-)

diff --git a/hw/intc/arm_gicv3.c b/hw/intc/arm_gicv3.c
index 0b8f79a122..9496a28005 100644
--- a/hw/intc/arm_gicv3.c
+++ b/hw/intc/arm_gicv3.c
@@ -21,7 +21,8 @@
 #include "hw/intc/arm_gicv3.h"
 #include "gicv3_internal.h"
 
-static bool irqbetter(GICv3CPUState *cs, int irq, uint8_t prio)
+static bool irqbetter(GICv3CPUState *cs, int irq, uint8_t prio,
+  bool has_superprio)
 {
 /* Return true if this IRQ at this priority should take
  * precedence over the current recorded highest priority
@@ -30,14 +31,23 @@ static bool irqbetter(GICv3CPUState *cs, int irq, uint8_t prio)
  * is the same as this one (a property which the calling code
  * relies on).
  */
-if (prio < cs->hppi.prio) {
-return true;
+if (prio != cs->hppi.prio) {
+return prio < cs->hppi.prio;
+}
+
+/*
+ * The same priority IRQ with superpriority should signal to the CPU
+ * as it have the priority higher than the labelled 0x80 or 0x00.
+ */
+if (has_superprio != cs->hppi.superprio) {
+return has_superprio;
 }
+
 /* If multiple pending interrupts have the same priority then it is an
  * IMPDEF choice which of them to signal to the CPU. We choose to
  * signal the one with the lowest interrupt number.
  */
-if (prio == cs->hppi.prio && irq <= cs->hppi.irq) {
+if (irq <= cs->hppi.irq) {
 return true;
 }
 return false;
@@ -129,6 +139,40 @@ static uint32_t gicr_int_pending(GICv3CPUState *cs)
 return pend;
 }
 
+static bool gicv3_get_priority(GICv3CPUState *cs, bool is_redist,
+   uint8_t *prio, int irq)
+{
+uint32_t superprio = 0x0;
+
+if (is_redist) {
+superprio = extract32(cs->gicr_isuperprio, irq, 1);
+} else {
+superprio = *gic_bmp_ptr32(cs->gic->superprio, irq);
+superprio = superprio & (1 << (irq & 0x1f));
+}
+
+if (superprio) {
+/* DS = 0 & Non-secure NMI */
+if (!(cs->gic->gicd_ctlr & GICD_CTLR_DS) &&
+((is_redist && extract32(cs->gicr_igroupr0, irq, 1)) ||
+ (!is_redist && gicv3_gicd_group_test(cs->gic, irq {
+*prio = 0x80;
+} else {
+*prio = 0x0;
+}
+
+return true;
+}
+
+if (is_redist) {
+*prio = cs->gicr_ipriorityr[irq];
+} else {
+*prio = cs->gic->gicd_ipriority[irq];
+}
+
+return false;
+}
+
 /* Update the interrupt status after state in a redistributor
  * or CPU interface has changed, but don't tell the CPU i/f.
  */
@@ -141,6 +185,7 @@ static void gicv3_redist_update_noirqset(GICv3CPUState *cs)
 uint8_t prio;
 int i;
 uint32_t pend;
+bool has_superprio = false;
 
 /* Find out which redistributor interrupts are eligible to be
  * signaled to the CPU interface.
@@ -152,10 +197,11 @@ static void gicv3_redist_update_noirqset(GICv3CPUState *cs)
 if (!(pend & (1 << i))) {
 continue;
 }
-prio = cs->gicr_ipriorityr[i];
-if (irqbetter(cs, i, prio)) {
+has_superprio = gicv3_get_priority(cs, true, &prio, i);
+if (irqbetter(cs, i, prio, has_superprio)) {
 cs->hppi.irq = i;
 cs->hppi.prio = prio;
+cs->hppi.superprio = has_superprio;
 seenbetter = true;
 }
 }
@@ -168,9 +214,11 @@ static void gicv3_redist_update_noirqset(GICv3CPUState *cs)
 if ((cs->gicr_ctlr & GICR_CTLR_ENABLE_L
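The ordering rule implemented by irqbetter() above can be modelled as a small standalone function. This is a sketch with illustrative names (PendingIrqModel, irqbetter_model are not QEMU's actual types): lower numeric priority wins, an NMI (superpriority) wins at equal priority, and otherwise the lower interrupt number wins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the GIC's recorded highest priority
 * pending interrupt; mirrors the fields used by irqbetter(). */
typedef struct {
    int irq;
    uint8_t prio;
    bool superprio;
} PendingIrqModel;

/* Return true if (irq, prio, superprio) should replace *cur. */
static bool irqbetter_model(const PendingIrqModel *cur, int irq,
                            uint8_t prio, bool superprio)
{
    if (prio != cur->prio) {
        return prio < cur->prio;   /* lower value = higher priority */
    }
    if (superprio != cur->superprio) {
        return superprio;          /* NMI beats non-NMI at equal prio */
    }
    return irq <= cur->irq;        /* IMPDEF tie-break: lower intid */
}
```

Usage follows the commit message: a non-NMI at the same priority cannot preempt a recorded NMI, but an NMI can preempt a recorded non-NMI.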

[RFC PATCH v9 12/23] target/arm: Handle NMI in arm_cpu_do_interrupt_aarch64()

2024-03-21 Thread Jinjie Ruan via
According to Arm GIC section 4.6.3 Interrupt superpriority, the interrupt
with superpriority is always IRQ, never FIQ, so the NMI exception trap entry
behaves like an IRQ. A VINMI (vIRQ with superpriority) can be raised from the
GIC or come from the hcrx_el2.HCRX_VINMI bit, while a VFNMI (vFIQ with
superpriority) comes from the hcrx_el2.HCRX_VFNMI bit.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v9:
- Update the commit message.
- Handle VINMI and VFNMI.
v7:
- Add Reviewed-by.
v6:
- Not combine VFNMI with CPU_INTERRUPT_VNMI.
v4:
- Also handle VNMI in arm_cpu_do_interrupt_aarch64().
v3:
- Remove the FIQ NMI handle.
---
 target/arm/helper.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 967e833ee8..eef37b801d 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11650,10 +11650,13 @@ static void arm_cpu_do_interrupt_aarch64(CPUState *cs)
 break;
 case EXCP_IRQ:
 case EXCP_VIRQ:
+case EXCP_NMI:
+case EXCP_VINMI:
 addr += 0x80;
 break;
 case EXCP_FIQ:
 case EXCP_VFIQ:
+case EXCP_VFNMI:
 addr += 0x100;
 break;
 case EXCP_VSERR:
-- 
2.34.1
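The vector-offset selection in the hunk above (NMI and VINMI taken at the IRQ entry, VFNMI at the FIQ entry) can be sketched as a pure function. The EXCP_* values here are illustrative enumerators, not QEMU's actual definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative exception codes; the real EXCP_* values live in
 * target/arm headers and differ from these. */
enum { EXCP_IRQ, EXCP_VIRQ, EXCP_NMI, EXCP_VINMI,
       EXCP_FIQ, EXCP_VFIQ, EXCP_VFNMI };

/* Offset added to the vector base for the given exception. */
static uint32_t vector_offset(int excp)
{
    switch (excp) {
    case EXCP_IRQ:
    case EXCP_VIRQ:
    case EXCP_NMI:    /* NMI behaves like IRQ */
    case EXCP_VINMI:
        return 0x80;  /* IRQ vector entry */
    case EXCP_FIQ:
    case EXCP_VFIQ:
    case EXCP_VFNMI:  /* VFNMI behaves like FIQ */
        return 0x100; /* FIQ vector entry */
    default:
        return 0;
    }
}
```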




[RFC PATCH v9 11/23] hw/intc/arm_gicv3: Add external IRQ lines for NMI

2024-03-21 Thread Jinjie Ruan via
Augment the GICv3's QOM device interface by adding one
new set of sysbus IRQ line, to signal NMI to each CPU.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v4:
- Add Reviewed-by.
v3:
- Add support for VNMI.
---
 hw/intc/arm_gicv3_common.c | 6 ++
 include/hw/intc/arm_gic_common.h   | 2 ++
 include/hw/intc/arm_gicv3_common.h | 2 ++
 3 files changed, 10 insertions(+)

diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
index cb55c72681..c52f060026 100644
--- a/hw/intc/arm_gicv3_common.c
+++ b/hw/intc/arm_gicv3_common.c
@@ -299,6 +299,12 @@ void gicv3_init_irqs_and_mmio(GICv3State *s, qemu_irq_handler handler,
 for (i = 0; i < s->num_cpu; i++) {
 sysbus_init_irq(sbd, &s->cpu[i].parent_vfiq);
 }
+for (i = 0; i < s->num_cpu; i++) {
+sysbus_init_irq(sbd, &s->cpu[i].parent_nmi);
+}
+for (i = 0; i < s->num_cpu; i++) {
+sysbus_init_irq(sbd, &s->cpu[i].parent_vnmi);
+}
 
 memory_region_init_io(&s->iomem_dist, OBJECT(s), ops, s,
   "gicv3_dist", 0x1);
diff --git a/include/hw/intc/arm_gic_common.h b/include/hw/intc/arm_gic_common.h
index 7080375008..97fea4102d 100644
--- a/include/hw/intc/arm_gic_common.h
+++ b/include/hw/intc/arm_gic_common.h
@@ -71,6 +71,8 @@ struct GICState {
 qemu_irq parent_fiq[GIC_NCPU];
 qemu_irq parent_virq[GIC_NCPU];
 qemu_irq parent_vfiq[GIC_NCPU];
+qemu_irq parent_nmi[GIC_NCPU];
+qemu_irq parent_vnmi[GIC_NCPU];
 qemu_irq maintenance_irq[GIC_NCPU];
 
 /* GICD_CTLR; for a GIC with the security extensions the NS banked version
diff --git a/include/hw/intc/arm_gicv3_common.h b/include/hw/intc/arm_gicv3_common.h
index 4e2fb518e7..7324c7d983 100644
--- a/include/hw/intc/arm_gicv3_common.h
+++ b/include/hw/intc/arm_gicv3_common.h
@@ -155,6 +155,8 @@ struct GICv3CPUState {
 qemu_irq parent_fiq;
 qemu_irq parent_virq;
 qemu_irq parent_vfiq;
+qemu_irq parent_nmi;
+qemu_irq parent_vnmi;
 
 /* Redistributor */
 uint32_t level;  /* Current IRQ level */
-- 
2.34.1




[RFC PATCH v9 15/23] hw/intc/arm_gicv3: Implement GICD_INMIR

2024-03-21 Thread Jinjie Ruan via
Add the GICD_INMIR and GICD_INMIRnE registers and support accessing GICD_INMIR0.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v4:
- Make the GICD_INMIR implementation clearer.
- Update the commit message.
v3:
- Add Reviewed-by.
---
 hw/intc/arm_gicv3_dist.c | 34 ++
 hw/intc/gicv3_internal.h |  2 ++
 2 files changed, 36 insertions(+)

diff --git a/hw/intc/arm_gicv3_dist.c b/hw/intc/arm_gicv3_dist.c
index 35e850685c..9739404e35 100644
--- a/hw/intc/arm_gicv3_dist.c
+++ b/hw/intc/arm_gicv3_dist.c
@@ -89,6 +89,29 @@ static int gicd_ns_access(GICv3State *s, int irq)
 return extract32(s->gicd_nsacr[irq / 16], (irq % 16) * 2, 2);
 }
 
+static void gicd_write_bitmap_reg(GICv3State *s, MemTxAttrs attrs,
+  uint32_t *bmp, maskfn *maskfn,
+  int offset, uint32_t val)
+{
+/*
+ * Helper routine to implement writing to a "set" register
+ * (GICD_INMIR, etc).
+ * Semantics implemented here:
+ * RAZ/WI for SGIs, PPIs, unimplemented IRQs
+ * Bits corresponding to Group 0 or Secure Group 1 interrupts RAZ/WI.
+ * offset should be the offset in bytes of the register from the start
+ * of its group.
+ */
+int irq = offset * 8;
+
+if (irq < GIC_INTERNAL || irq >= s->num_irq) {
+return;
+}
+val &= mask_group_and_nsacr(s, attrs, maskfn, irq);
+*gic_bmp_ptr32(bmp, irq) = val;
+gicv3_update(s, irq, 32);
+}
+
 static void gicd_write_set_bitmap_reg(GICv3State *s, MemTxAttrs attrs,
   uint32_t *bmp,
   maskfn *maskfn,
@@ -543,6 +566,11 @@ static bool gicd_readl(GICv3State *s, hwaddr offset,
 /* RAZ/WI since affinity routing is always enabled */
 *data = 0;
 return true;
+case GICD_INMIR ... GICD_INMIR + 0x7f:
+*data = (!s->nmi_support) ? 0 :
+gicd_read_bitmap_reg(s, attrs, s->superprio, NULL,
+ offset - GICD_INMIR);
+return true;
 case GICD_IROUTER ... GICD_IROUTER + 0x1fdf:
 {
 uint64_t r;
@@ -752,6 +780,12 @@ static bool gicd_writel(GICv3State *s, hwaddr offset,
 case GICD_SPENDSGIR ... GICD_SPENDSGIR + 0xf:
 /* RAZ/WI since affinity routing is always enabled */
 return true;
+case GICD_INMIR ... GICD_INMIR + 0x7f:
+if (s->nmi_support) {
+gicd_write_bitmap_reg(s, attrs, s->superprio, NULL,
+  offset - GICD_INMIR, value);
+}
+return true;
 case GICD_IROUTER ... GICD_IROUTER + 0x1fdf:
 {
 uint64_t r;
diff --git a/hw/intc/gicv3_internal.h b/hw/intc/gicv3_internal.h
index f35b7d2f03..a1fc34597e 100644
--- a/hw/intc/gicv3_internal.h
+++ b/hw/intc/gicv3_internal.h
@@ -52,6 +52,8 @@
 #define GICD_SGIR0x0F00
 #define GICD_CPENDSGIR   0x0F10
 #define GICD_SPENDSGIR   0x0F20
+#define GICD_INMIR   0x0F80
+#define GICD_INMIRnE 0x3B00
 #define GICD_IROUTER 0x6000
 #define GICD_IDREGS  0xFFD0
 
-- 
2.34.1
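The RAZ/WI semantics of gicd_write_bitmap_reg() above can be modelled in isolation. This is a simplified sketch that keeps only the offset-to-irq mapping and the SGI/PPI/out-of-range checks, omitting the group/NSACR masking:

```c
#include <assert.h>
#include <stdint.h>

#define GIC_INTERNAL 32  /* SGIs and PPIs occupy intids 0..31 */

/* Write one 32-bit word of a per-interrupt bitmap register such as
 * GICD_INMIR. offset is the byte offset from the start of the
 * register group; each byte covers 8 interrupts, each word 32.
 * Writes targeting SGIs/PPIs or unimplemented IRQs are ignored. */
static void write_bitmap_word(uint32_t *bmp, int num_irq,
                              int offset, uint32_t val)
{
    int irq = offset * 8;

    if (irq < GIC_INTERNAL || irq >= num_irq) {
        return; /* RAZ/WI */
    }
    bmp[irq / 32] = val;
}
```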




[RFC PATCH v9 04/23] target/arm: Implement ALLINT MSR (immediate)

2024-03-21 Thread Jinjie Ruan via
Add ALLINT MSR (immediate) to decodetree, in which the CRm is 0b000x. The
EL0 check is necessary for ALLINT, and the EL1 check is necessary when
imm == 1. So implement it inline for EL2/3, or EL1 with imm==0. Avoid the
unconditional write to pc and use raise_exception_ra to unwind.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v7:
- Add Reviewed-by.
v6:
- Fix DISAS_TOO_MANY to DISAS_UPDATE_EXIT and add the comment.
v5:
- Drop the & 1 in trans_MSR_i_ALLINT().
- Simplify and merge msr_i_allint() and allint_check().
- Rename msr_i_allint() to msr_set_allint_el1().
v4:
- Fix the ALLINT MSR (immediate) decodetree implementation.
- Remove arm_is_el2_enabled() check in allint_check().
- Update env->allint to env->pstate.
- Only call allint_check() when imm == 1.
- Simplify the allint_check() to not pass "op" and extract.
- Implement it inline for EL2/3, or EL1 with imm==0.
- Pass (a->imm & 1) * PSTATE_ALLINT (i64) to simplify the ALLINT set/clear.
v3:
- Remove EL0 check in allint_check().
- Add TALLINT check for EL1 in allint_check().
- Remove unnecessary arm_rebuild_hflags() in the msr_i_allint helper.
---
 target/arm/tcg/a64.decode  |  1 +
 target/arm/tcg/helper-a64.c| 12 
 target/arm/tcg/helper-a64.h|  1 +
 target/arm/tcg/translate-a64.c | 19 +++
 4 files changed, 33 insertions(+)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 8a20dce3c8..0e7656fd15 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
MSR_i_DIT   1101 0101  0 011 0100  010 1 @msr_i
 MSR_i_TCO   1101 0101  0 011 0100  100 1 @msr_i
 MSR_i_DAIFSET   1101 0101  0 011 0100  110 1 @msr_i
 MSR_i_DAIFCLEAR 1101 0101  0 011 0100  111 1 @msr_i
+MSR_i_ALLINT1101 0101  0 001 0100 000 imm:1 000 1
 MSR_i_SVCR  1101 0101  0 011 0100 0 mask:2 imm:1 011 1
 
 # MRS, MSR (register), SYS, SYSL. These are all essentially the
diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c
index ebaa7f00df..7818537890 100644
--- a/target/arm/tcg/helper-a64.c
+++ b/target/arm/tcg/helper-a64.c
@@ -66,6 +66,18 @@ void HELPER(msr_i_spsel)(CPUARMState *env, uint32_t imm)
 update_spsel(env, imm);
 }
 
+void HELPER(msr_set_allint_el1)(CPUARMState *env)
+{
+/* ALLINT update to PSTATE. */
+if (arm_hcrx_el2_eff(env) & HCRX_TALLINT) {
+raise_exception_ra(env, EXCP_UDEF,
+   syn_aa64_sysregtrap(0, 1, 0, 4, 1, 0x1f, 0),
+   exception_target_el(env), GETPC());
+}
+
+env->pstate |= PSTATE_ALLINT;
+}
+
 static void daif_check(CPUARMState *env, uint32_t op,
uint32_t imm, uintptr_t ra)
 {
diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h
index 575a5dab7d..0518165399 100644
--- a/target/arm/tcg/helper-a64.h
+++ b/target/arm/tcg/helper-a64.h
@@ -22,6 +22,7 @@ DEF_HELPER_FLAGS_1(rbit64, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_2(msr_i_spsel, void, env, i32)
 DEF_HELPER_2(msr_i_daifset, void, env, i32)
 DEF_HELPER_2(msr_i_daifclear, void, env, i32)
+DEF_HELPER_1(msr_set_allint_el1, void, env)
 DEF_HELPER_3(vfp_cmph_a64, i64, f16, f16, ptr)
 DEF_HELPER_3(vfp_cmpeh_a64, i64, f16, f16, ptr)
 DEF_HELPER_3(vfp_cmps_a64, i64, f32, f32, ptr)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 340265beb0..21758b290d 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -2036,6 +2036,25 @@ static bool trans_MSR_i_DAIFCLEAR(DisasContext *s, arg_i *a)
 return true;
 }
 
+static bool trans_MSR_i_ALLINT(DisasContext *s, arg_i *a)
+{
+if (!dc_isar_feature(aa64_nmi, s) || s->current_el == 0) {
+return false;
+}
+
+if (a->imm == 0) {
+clear_pstate_bits(PSTATE_ALLINT);
+} else if (s->current_el > 1) {
+set_pstate_bits(PSTATE_ALLINT);
+} else {
+gen_helper_msr_set_allint_el1(tcg_env);
+}
+
+/* Exit the cpu loop to re-evaluate pending IRQs. */
+s->base.is_jmp = DISAS_UPDATE_EXIT;
+return true;
+}
+
 static bool trans_MSR_i_SVCR(DisasContext *s, arg_MSR_i_SVCR *a)
 {
 if (!dc_isar_feature(aa64_sme, s) || a->mask == 0) {
-- 
2.34.1
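The decode-time case split described in the commit message (undefined at EL0, handled inline for EL2/3 or imm==0, helper call only for EL1 with imm==1) can be sketched as a pure function. The return codes here are illustrative, not anything QEMU defines:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative outcomes of decoding MSR ALLINT, #imm. */
enum { DECODE_UNDEF, INLINE_CLEAR, INLINE_SET, CALL_HELPER };

static int msr_allint_decode(bool feat_nmi, int current_el, int imm)
{
    if (!feat_nmi || current_el == 0) {
        return DECODE_UNDEF;   /* no FEAT_NMI, or EL0: UNDEF */
    }
    if (imm == 0) {
        return INLINE_CLEAR;   /* clearing ALLINT never traps */
    }
    if (current_el > 1) {
        return INLINE_SET;     /* EL2/3: set inline */
    }
    /* EL1 with imm == 1: helper may trap on HCRX_EL2.TALLINT */
    return CALL_HELPER;
}
```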




[RFC PATCH v9 09/23] target/arm: Handle PSTATE.ALLINT on taking an exception

2024-03-21 Thread Jinjie Ruan via
Set or clear PSTATE.ALLINT on taking an exception to ELx according to the
SCTLR_ELx.SPINTMASK bit.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v9:
- Not check SCTLR_NMI in arm_cpu_do_interrupt_aarch64().
v3:
- Add Reviewed-by.
---
 target/arm/helper.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index b57114d35d..967e833ee8 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11730,6 +11730,14 @@ static void arm_cpu_do_interrupt_aarch64(CPUState *cs)
 }
 }
 
+if (cpu_isar_feature(aa64_nmi, cpu)) {
+if (!(env->cp15.sctlr_el[new_el] & SCTLR_SPINTMASK)) {
+new_mode |= PSTATE_ALLINT;
+} else {
+new_mode &= ~PSTATE_ALLINT;
+}
+}
+
 pstate_write(env, PSTATE_DAIF | new_mode);
 env->aarch64 = true;
 aarch64_restore_sp(env, new_el);
-- 
2.34.1
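The SPINTMASK-driven update of PSTATE.ALLINT on exception entry can be modelled as follows. The bit position used here is only illustrative; the sketch captures the rule from the hunk above: with FEAT_NMI, SPINTMASK == 0 sets ALLINT in the new mode and SPINTMASK == 1 clears it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PSTATE_ALLINT_BIT (1u << 13) /* illustrative bit position */

/* Compute the PSTATE mode bits on taking an exception to ELx,
 * given SCTLR_ELx.SPINTMASK. */
static uint32_t entry_pstate(uint32_t new_mode, bool feat_nmi,
                             bool spintmask)
{
    if (feat_nmi) {
        if (!spintmask) {
            new_mode |= PSTATE_ALLINT_BIT;  /* mask all interrupts */
        } else {
            new_mode &= ~PSTATE_ALLINT_BIT; /* leave superpriority open */
        }
    }
    return new_mode;
}
```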




[RFC PATCH v9 03/23] target/arm: Add support for FEAT_NMI, Non-maskable Interrupt

2024-03-21 Thread Jinjie Ruan via
Add support for FEAT_NMI. NMI (FEAT_NMI) is a mandatory feature in
ARMv8.8-A and ARMv9.3-A.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v3:
- Add Reviewed-by.
- Adjust to before the MSR patches.
---
 target/arm/internals.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index dd3da211a3..516e0584bf 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -1229,6 +1229,9 @@ static inline uint32_t aarch64_pstate_valid_mask(const ARMISARegisters *id)
 if (isar_feature_aa64_mte(id)) {
 valid |= PSTATE_TCO;
 }
+if (isar_feature_aa64_nmi(id)) {
+valid |= PSTATE_ALLINT;
+}
 
 return valid;
 }
-- 
2.34.1




[RFC PATCH v9 17/23] hw/intc/arm_gicv3: Add NMI handling CPU interface registers

2024-03-21 Thread Jinjie Ruan via
Add the NMIAR CPU interface registers which deal with acknowledging NMI.

When introducing the NMI interrupt, there are some updates to the semantics
of the ICC_IAR1_EL1 and ICC_NMIAR1_EL1 registers. The ICC_IAR1_EL1 register
should return 1022 if the intid has superpriority, and the ICC_NMIAR1_EL1
register should return 1023 if the intid does not have superpriority.
However, these changes are not necessary for the ICC_HPPIR1_EL1 register.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v7:
- Add Reviewed-by.
v4:
- Define ICC_NMIAR1_EL1 only if FEAT_GICv3_NMI is implemented.
- Check sctlr_elx.SCTLR_NMI to return 1022 for icc_iar1_read().
- Add gicv3_icc_nmiar1_read() trace event.
- Do not check icc_hppi_can_preempt() for icc_nmiar1_read().
- Add icv_nmiar1_read() and call it when EL2Enabled() and HCR_EL2.IMO == '1'
---
 hw/intc/arm_gicv3_cpuif.c | 59 +--
 hw/intc/gicv3_internal.h  |  1 +
 hw/intc/trace-events  |  1 +
 3 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index e1a60d8c15..df82a413c6 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -795,6 +795,13 @@ static uint64_t icv_iar_read(CPUARMState *env, const ARMCPRegInfo *ri)
 return intid;
 }
 
+static uint64_t icv_nmiar1_read(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+/* todo */
+uint64_t intid = INTID_SPURIOUS;
+return intid;
+}
+
 static uint32_t icc_fullprio_mask(GICv3CPUState *cs)
 {
 /*
@@ -1097,7 +1104,8 @@ static uint64_t icc_hppir0_value(GICv3CPUState *cs, CPUARMState *env)
 return cs->hppi.irq;
 }
 
-static uint64_t icc_hppir1_value(GICv3CPUState *cs, CPUARMState *env)
+static uint64_t icc_hppir1_value(GICv3CPUState *cs, CPUARMState *env,
+ bool is_nmi, bool is_hppi)
 {
 /* Return the highest priority pending interrupt register value
  * for group 1.
@@ -1108,6 +1116,19 @@ static uint64_t icc_hppir1_value(GICv3CPUState *cs, CPUARMState *env)
 return INTID_SPURIOUS;
 }
 
+if (!is_hppi) {
+int el = arm_current_el(env);
+
+if (is_nmi && (!cs->hppi.superprio)) {
+return INTID_SPURIOUS;
+}
+
+if ((!is_nmi) && cs->hppi.superprio
+&& env->cp15.sctlr_el[el] & SCTLR_NMI) {
+return INTID_NMI;
+}
+}
+
 /* Check whether we can return the interrupt or if we should return
  * a special identifier, as per the CheckGroup1ForSpecialIdentifiers
  * pseudocode. (We can simplify a little because for us ICC_SRE_EL1.RM
@@ -1168,7 +1189,7 @@ static uint64_t icc_iar1_read(CPUARMState *env, const ARMCPRegInfo *ri)
 if (!icc_hppi_can_preempt(cs)) {
 intid = INTID_SPURIOUS;
 } else {
-intid = icc_hppir1_value(cs, env);
+intid = icc_hppir1_value(cs, env, false, false);
 }
 
 if (!gicv3_intid_is_special(intid)) {
@@ -1179,6 +1200,25 @@ static uint64_t icc_iar1_read(CPUARMState *env, const ARMCPRegInfo *ri)
 return intid;
 }
 
+static uint64_t icc_nmiar1_read(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+GICv3CPUState *cs = icc_cs_from_env(env);
+uint64_t intid;
+
+if (icv_access(env, HCR_IMO)) {
+return icv_nmiar1_read(env, ri);
+}
+
+intid = icc_hppir1_value(cs, env, true, false);
+
+if (!gicv3_intid_is_special(intid)) {
+icc_activate_irq(cs, intid);
+}
+
+trace_gicv3_icc_nmiar1_read(gicv3_redist_affid(cs), intid);
+return intid;
+}
+
 static void icc_drop_prio(GICv3CPUState *cs, int grp)
 {
 /* Drop the priority of the currently active interrupt in
@@ -1555,7 +1595,7 @@ static uint64_t icc_hppir1_read(CPUARMState *env, const ARMCPRegInfo *ri)
 return icv_hppir_read(env, ri);
 }
 
-value = icc_hppir1_value(cs, env);
+value = icc_hppir1_value(cs, env, false, true);
 trace_gicv3_icc_hppir1_read(gicv3_redist_affid(cs), value);
 return value;
 }
@@ -2482,6 +2522,15 @@ static const ARMCPRegInfo gicv3_cpuif_icc_apxr23_reginfo[] = {
 },
 };
 
+static const ARMCPRegInfo gicv3_cpuif_gicv3_nmi_reginfo[] = {
+{ .name = "ICC_NMIAR1_EL1", .state = ARM_CP_STATE_BOTH,
+  .opc0 = 3, .opc1 = 0, .crn = 12, .crm = 9, .opc2 = 5,
+  .type = ARM_CP_IO | ARM_CP_NO_RAW,
+  .access = PL1_R, .accessfn = gicv3_irq_access,
+  .readfn = icc_nmiar1_read,
+},
+};
+
 static uint64_t ich_ap_read(CPUARMState *env, const ARMCPRegInfo *ri)
 {
 GICv3CPUState *cs = icc_cs_from_env(env);
@@ -2838,6 +2887,10 @@ void gicv3_init_cpuif(GICv3State *s)
  */
 define_arm_cp_regs(cpu, gicv3_cpuif_reginfo);
 
+if (s->nmi_support) {
+define_arm_cp_regs(cpu, gicv3_cpuif_gicv3_nmi_reginfo);
+}
+
 /*
  * The CPU implementation specifies the number of supported
  * bits of physical priority. For backwards compatibility
diff --git a/hw/intc/gicv3_internal.h b/hw/intc/gicv3_interna
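The special-INTID selection described in the commit message above can be modelled with a small helper. This is a simplified sketch that assumes SCTLR_ELx.NMI is set and the highest priority pending interrupt is already known; group and priority-mask checks are omitted, and only the 1022/1023 values come from the GIC architecture:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INTID_NMI      1022  /* special INTIDs per the GIC spec */
#define INTID_SPURIOUS 1023

/* Reading ICC_IAR1_EL1 (is_nmi == false) returns 1022 when the
 * pending intid has superpriority; reading ICC_NMIAR1_EL1
 * (is_nmi == true) returns 1023 when it does not. */
static uint64_t hppir1_special(uint64_t intid, bool superprio,
                               bool is_nmi)
{
    if (is_nmi && !superprio) {
        return INTID_SPURIOUS;
    }
    if (!is_nmi && superprio) {
        return INTID_NMI;
    }
    return intid;
}
```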

[RFC PATCH v9 05/23] target/arm: Support MSR access to ALLINT

2024-03-21 Thread Jinjie Ruan via
Support ALLINT MSR access as follows:
mrs , ALLINT// read allint
msr ALLINT, // write allint with imm

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v9:
- Move nmi_reginfo and related functions inside an existing ifdef
  TARGET_AARCH64 to solve the --target-list=aarch64-softmmu,arm-softmmu
  compilation problem.
- Check 'isread' when writing to ALLINT.
v5:
- Add Reviewed-by.
v4:
- Remove arm_is_el2_enabled() check in allint_check().
- Change to env->pstate instead of env->allint.
v3:
- Remove the EL0 check in aa64_allint_access(), which is already checked by
  .access PL1_RW.
- Use arm_hcrx_el2_eff() in aa64_allint_access() instead of env->cp15.hcrx_el2.
- Make ALLINT msr access function controlled by aa64_nmi.
---
 target/arm/helper.c | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 7d6c6e9878..a65729af66 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -7497,6 +7497,37 @@ static const ARMCPRegInfo rme_mte_reginfo[] = {
   .opc0 = 1, .opc1 = 6, .crn = 7, .crm = 14, .opc2 = 5,
   .access = PL3_W, .type = ARM_CP_NOP },
 };
+
+static void aa64_allint_write(CPUARMState *env, const ARMCPRegInfo *ri,
+  uint64_t value)
+{
+env->pstate = (env->pstate & ~PSTATE_ALLINT) | (value & PSTATE_ALLINT);
+}
+
+static uint64_t aa64_allint_read(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+return env->pstate & PSTATE_ALLINT;
+}
+
+static CPAccessResult aa64_allint_access(CPUARMState *env,
+ const ARMCPRegInfo *ri, bool isread)
+{
+if (!isread && arm_current_el(env) == 1 &&
+(arm_hcrx_el2_eff(env) & HCRX_TALLINT)) {
+return CP_ACCESS_TRAP_EL2;
+}
+return CP_ACCESS_OK;
+}
+
+static const ARMCPRegInfo nmi_reginfo[] = {
+{ .name = "ALLINT", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 0, .opc2 = 0, .crn = 4, .crm = 3,
+  .type = ARM_CP_NO_RAW,
+  .access = PL1_RW, .accessfn = aa64_allint_access,
+  .fieldoffset = offsetof(CPUARMState, pstate),
+  .writefn = aa64_allint_write, .readfn = aa64_allint_read,
+  .resetfn = arm_cp_reset_ignore },
+};
 #endif /* TARGET_AARCH64 */
 
 static void define_pmu_regs(ARMCPU *cpu)
@@ -9891,6 +9922,10 @@ void register_cp_regs_for_features(ARMCPU *cpu)
 if (cpu_isar_feature(aa64_nv2, cpu)) {
 define_arm_cp_regs(cpu, nv2_reginfo);
 }
+
+if (cpu_isar_feature(aa64_nmi, cpu)) {
+define_arm_cp_regs(cpu, nmi_reginfo);
+}
 #endif
 
 if (cpu_isar_feature(any_predinv, cpu)) {
-- 
2.34.1




Re: [RFC PATCH v8 13/23] hw/intc/arm_gicv3: Add irq superpriority information

2024-03-21 Thread Peter Maydell
On Mon, 18 Mar 2024 at 09:38, Jinjie Ruan  wrote:
>
> A SPI, PPI or SGI interrupt can have a superpriority property. So
> maintain superpriority information in PendingIrq and GICR/GICD.
>
> Signed-off-by: Jinjie Ruan 
> Acked-by: Richard Henderson 
> ---
> v3:
> - Place this ahead of implement GICR_INMIR.
> - Add Acked-by.
> ---
>  include/hw/intc/arm_gicv3_common.h | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/include/hw/intc/arm_gicv3_common.h b/include/hw/intc/arm_gicv3_common.h
> index 7324c7d983..df4380141d 100644
> --- a/include/hw/intc/arm_gicv3_common.h
> +++ b/include/hw/intc/arm_gicv3_common.h
> @@ -146,6 +146,7 @@ typedef struct {
>  int irq;
>  uint8_t prio;
>  int grp;
> +bool superprio;
>  } PendingIrq;
>
>  struct GICv3CPUState {
> @@ -172,6 +173,7 @@ struct GICv3CPUState {
>  uint32_t gicr_ienabler0;
>  uint32_t gicr_ipendr0;
>  uint32_t gicr_iactiver0;
> +uint32_t gicr_isuperprio;

This field stores the state that is in the GICR_INMIR0
register, so please name it that way: gicr_inmir0.

>  uint32_t edge_trigger; /* ICFGR0 and ICFGR1 even bits */
>  uint32_t gicr_igrpmodr0;
>  uint32_t gicr_nsacr;
> @@ -274,6 +276,7 @@ struct GICv3State {
>  GIC_DECLARE_BITMAP(active);   /* GICD_ISACTIVER */
>  GIC_DECLARE_BITMAP(level);/* Current level */
>  GIC_DECLARE_BITMAP(edge_trigger); /* GICD_ICFGR even bits */
> +GIC_DECLARE_BITMAP(superprio);/* GICD_INMIR */
>  uint8_t gicd_ipriority[GICV3_MAXIRQ];
>  uint64_t gicd_irouter[GICV3_MAXIRQ];
>  /* Cached information: pointer to the cpu i/f for the CPUs specified
> @@ -313,6 +316,7 @@ GICV3_BITMAP_ACCESSORS(pending)
>  GICV3_BITMAP_ACCESSORS(active)
>  GICV3_BITMAP_ACCESSORS(level)
>  GICV3_BITMAP_ACCESSORS(edge_trigger)
> +GICV3_BITMAP_ACCESSORS(superprio)

This is the state behind the GICD_INMIR registers, and
the GIC spec calls the bits in those registers NMI,
so I would call this bitmap nmi, not superprio.

This commit adds new device state, so it also needs to be migrated.
You'll want to add a new subsection to vmstate_gicv3_cpu which
is present if the GIC implements NMIs, and which has an entry
for the gicr_inmir0 field. Similarly, you want a new subsection
in vmstate_gicv3 which is present if NMIs are implemented and which
has a field for the nmi array.

thanks
-- PMM



[PATCH v2 2/3] block-backend: fix edge case in bdrv_next() where BDS associated to BB changes

2024-03-21 Thread Fiona Ebner
The old_bs variable in bdrv_next() is currently determined by looking
at the old block backend. However, if the block graph changes before
the next bdrv_next() call, it might be that the associated BDS is not
the same one that was referenced previously. In that case, the wrong BDS
is unreferenced, leading to an assertion failure later:

> bdrv_unref: Assertion `bs->refcnt > 0' failed.

In particular, this can happen in the context of bdrv_flush_all(),
when polling for bdrv_co_flush() in the generated co-wrapper leads to
a graph change (for example with a stream block job [0]).

A racy reproducer:

> #!/bin/bash
> rm -f /tmp/backing.qcow2
> rm -f /tmp/top.qcow2
> ./qemu-img create /tmp/backing.qcow2 -f qcow2 64M
> ./qemu-io -c "write -P42 0x0 0x1" /tmp/backing.qcow2
> ./qemu-img create /tmp/top.qcow2 -f qcow2 64M -b /tmp/backing.qcow2 -F qcow2
> ./qemu-system-x86_64 --qmp stdio \
> --blockdev qcow2,node-name=node0,file.driver=file,file.filename=/tmp/top.qcow2 \
> <<EOF
> {"execute": "qmp_capabilities"}
> {"execute": "block-stream", "arguments": { "job-id": "stream0", "device": "node0" } }
> {"execute": "quit"}
> EOF

[0]:

> #0  bdrv_replace_child_tran (child=..., new_bs=..., tran=...)
> #1  bdrv_replace_node_noperm (from=..., to=..., auto_skip=..., tran=..., errp=...)
> #2  bdrv_replace_node_common (from=..., to=..., auto_skip=..., detach_subchain=..., errp=...)
> #3  bdrv_drop_filter (bs=..., errp=...)
> #4  bdrv_cor_filter_drop (cor_filter_bs=...)
> #5  stream_prepare (job=...)
> #6  job_prepare_locked (job=...)
> #7  job_txn_apply_locked (fn=..., job=...)
> #8  job_do_finalize_locked (job=...)
> #9  job_exit (opaque=...)
> #10 aio_bh_poll (ctx=...)
> #11 aio_poll (ctx=..., blocking=...)
> #12 bdrv_poll_co (s=...)
> #13 bdrv_flush (bs=...)
> #14 bdrv_flush_all ()
> #15 do_vm_stop (state=..., send_stop=...)
> #16 vm_shutdown ()

Signed-off-by: Fiona Ebner 
---

Not sure if this is the correct fix, or if the call site should rather
be adapted somehow?

New in v2.

 block/block-backend.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 9c4de79e6b..28af1eb17a 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -599,14 +599,14 @@ BlockDriverState *bdrv_next(BdrvNextIterator *it)
 /* Must be called from the main loop */
 assert(qemu_get_current_aio_context() == qemu_get_aio_context());
 
+old_bs = it->bs;
+
 /* First, return all root nodes of BlockBackends. In order to avoid
  * returning a BDS twice when multiple BBs refer to it, we only return it
  * if the BB is the first one in the parent list of the BDS. */
 if (it->phase == BDRV_NEXT_BACKEND_ROOTS) {
 BlockBackend *old_blk = it->blk;
 
-old_bs = old_blk ? blk_bs(old_blk) : NULL;
-
 do {
 it->blk = blk_all_next(it->blk);
 bs = it->blk ? blk_bs(it->blk) : NULL;
@@ -620,11 +620,10 @@ BlockDriverState *bdrv_next(BdrvNextIterator *it)
 if (bs) {
 bdrv_ref(bs);
 bdrv_unref(old_bs);
+it->bs = bs;
 return bs;
 }
 it->phase = BDRV_NEXT_MONITOR_OWNED;
-} else {
-old_bs = it->bs;
 }
 
 /* Then return the monitor-owned BDSes without a BB attached. Ignore all
-- 
2.39.2





[PATCH v2 1/3] block/io: accept NULL qiov in bdrv_pad_request

2024-03-21 Thread Fiona Ebner
From: Stefan Reiter 

Some operations, e.g. block-stream, perform reads while discarding the
results (only copy-on-read matters). In this case, they will pass NULL
as the target QEMUIOVector, which will however trip bdrv_pad_request,
since it wants to extend its passed vector. In particular, this is the
case for the blk_co_preadv() call in stream_populate().

If there is no qiov, no operation can be done with it, but the bytes
and offset still need to be updated, so the subsequent aligned read
will actually be aligned and not run into an assertion failure.

In particular, this can happen when the request alignment of the top
node is larger than the allocated part of the bottom node, in which
case padding becomes necessary. For example:
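The intended arithmetic can be sketched outside the block layer (a hypothetical Python helper, not QEMU's actual types): widen offset/bytes to the request alignment in every case, but build a padded vector only when a data vector was supplied:

```python
def pad_request(offset, nbytes, align, qiov):
    """Align [offset, offset+nbytes) to `align`.

    Returns (new_offset, new_bytes, new_qiov). When qiov is None
    (e.g. a copy-on-read prefetch that discards the data), only
    offset/bytes are widened; no padded buffer is built.
    """
    head = offset % align
    tail = -(offset + nbytes) % align
    if head == 0 and tail == 0:
        return offset, nbytes, qiov
    if qiov is not None:
        # Padded vector: head scratch + caller data + tail scratch.
        qiov = [bytearray(head)] + list(qiov) + [bytearray(tail)]
    return offset - head, nbytes + head + tail, qiov
```

For example, a 10-byte request at offset 5 with 8-byte alignment becomes a 16-byte request at offset 0 whether or not a vector is present; only the vector handling differs.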

> ./qemu-img create /tmp/backing.qcow2 -f qcow2 64M -o cluster_size=32768
> ./qemu-io -c "write -P42 0x0 0x1" /tmp/backing.qcow2
> ./qemu-img create /tmp/top.qcow2 -f qcow2 64M -b /tmp/backing.qcow2 -F qcow2
> ./qemu-system-x86_64 --qmp stdio \
> --blockdev qcow2,node-name=node0,file.driver=file,file.filename=/tmp/top.qcow2 \
> <<EOF
> {"execute": "qmp_capabilities"}
> {"execute": "blockdev-add", "arguments": { "driver": "compress", "file": "node0", "node-name": "node1" } }
> {"execute": "block-stream", "arguments": { "job-id": "stream0", "device": "node1" } }
> EOF

Originally-by: Stefan Reiter 
Signed-off-by: Thomas Lamprecht 
[FE: do update bytes and offset in any case
 add reproducer to commit message]
Signed-off-by: Fiona Ebner 
---

No changes in v2.

 block/io.c | 31 +++
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/block/io.c b/block/io.c
index 33150c0359..395bea3bac 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1726,22 +1726,29 @@ static int bdrv_pad_request(BlockDriverState *bs,
 return 0;
 }
 
-sliced_iov = qemu_iovec_slice(*qiov, *qiov_offset, *bytes,
-  &sliced_head, &sliced_tail,
-  &sliced_niov);
+/*
+ * For prefetching in stream_populate(), no qiov is passed along, because
+ * only copy-on-read matters.
+ */
+if (qiov && *qiov) {
+sliced_iov = qemu_iovec_slice(*qiov, *qiov_offset, *bytes,
+  &sliced_head, &sliced_tail,
+  &sliced_niov);
 
-/* Guaranteed by bdrv_check_request32() */
-assert(*bytes <= SIZE_MAX);
-ret = bdrv_create_padded_qiov(bs, pad, sliced_iov, sliced_niov,
-  sliced_head, *bytes);
-if (ret < 0) {
-bdrv_padding_finalize(pad);
-return ret;
+/* Guaranteed by bdrv_check_request32() */
+assert(*bytes <= SIZE_MAX);
+ret = bdrv_create_padded_qiov(bs, pad, sliced_iov, sliced_niov,
+  sliced_head, *bytes);
+if (ret < 0) {
+bdrv_padding_finalize(pad);
+return ret;
+}
+*qiov = &pad->local_qiov;
+*qiov_offset = 0;
 }
+
 *bytes += pad->head + pad->tail;
 *offset -= pad->head;
-*qiov = &pad->local_qiov;
-*qiov_offset = 0;
 if (padded) {
 *padded = true;
 }
-- 
2.39.2





[PATCH v2 0/3] fix two edge cases related to stream block jobs

2024-03-21 Thread Fiona Ebner
Changes in v2:
* Ran into another issue while writing the IO test Stefan wanted
  to have (good call :)), so include a fix for that and add the
  test. I didn't notice during manual testing, because I hadn't
  used a scripted QMP 'quit', so there was no race.

Fiona Ebner (2):
  block-backend: fix edge case in bdrv_next() where BDS associated to BB
changes
  iotests: add test for stream job with an unaligned prefetch read

Stefan Reiter (1):
  block/io: accept NULL qiov in bdrv_pad_request

 block/block-backend.c |  7 +-
 block/io.c| 31 ---
 .../tests/stream-unaligned-prefetch   | 86 +++
 .../tests/stream-unaligned-prefetch.out   |  5 ++
 4 files changed, 113 insertions(+), 16 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/stream-unaligned-prefetch
 create mode 100644 tests/qemu-iotests/tests/stream-unaligned-prefetch.out

-- 
2.39.2





[PATCH v2 3/3] iotests: add test for stream job with an unaligned prefetch read

2024-03-21 Thread Fiona Ebner
Previously, bdrv_pad_request() could not deal with a NULL qiov when
a read needed to be aligned. During prefetch, a stream job will pass a
NULL qiov. Add a test case to cover this scenario.

By accident, also covers a previous race during shutdown, where block
graph changes during iteration in bdrv_flush_all() could lead to
unreferencing the wrong block driver state and an assertion failure
later.

Signed-off-by: Fiona Ebner 
---

New in v2.

 .../tests/stream-unaligned-prefetch   | 86 +++
 .../tests/stream-unaligned-prefetch.out   |  5 ++
 2 files changed, 91 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/stream-unaligned-prefetch
 create mode 100644 tests/qemu-iotests/tests/stream-unaligned-prefetch.out

diff --git a/tests/qemu-iotests/tests/stream-unaligned-prefetch b/tests/qemu-iotests/tests/stream-unaligned-prefetch
new file mode 100755
index 00..546db1d369
--- /dev/null
+++ b/tests/qemu-iotests/tests/stream-unaligned-prefetch
@@ -0,0 +1,86 @@
+#!/usr/bin/env python3
+# group: rw quick
+#
+# Test what happens when a stream job does an unaligned prefetch read
+# which requires padding while having a NULL qiov.
+#
+# Copyright (C) Proxmox Server Solutions GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os
+import iotests
+from iotests import imgfmt, qemu_img_create, qemu_io, QMPTestCase
+
+image_size = 1 * 1024 * 1024
+cluster_size = 64 * 1024
+base = os.path.join(iotests.test_dir, 'base.img')
+top = os.path.join(iotests.test_dir, 'top.img')
+
+class TestStreamUnalignedPrefetch(QMPTestCase):
+def setUp(self) -> None:
+"""
+Create two images:
+- base image {base} with {cluster_size // 2} bytes allocated
+- top image {top} without any data allocated and coarser
+  cluster size
+
+Attach a compress filter for the top image, because that
+requires that the request alignment is the top image's cluster
+size.
+"""
+qemu_img_create('-f', imgfmt,
+'-o', 'cluster_size={}'.format(cluster_size // 2),
+base, str(image_size))
+qemu_io('-c', f'write 0 {cluster_size // 2}', base)
+qemu_img_create('-f', imgfmt,
+'-o', 'cluster_size={}'.format(cluster_size),
+top, str(image_size))
+
+self.vm = iotests.VM()
+self.vm.add_blockdev(self.vm.qmp_to_opts({
+'driver': imgfmt,
+'node-name': 'base',
+'file': {
+'driver': 'file',
+'filename': base
+}
+}))
+self.vm.add_blockdev(self.vm.qmp_to_opts({
+'driver': 'compress',
+'node-name': 'compress-top',
+'file': {
+'driver': imgfmt,
+'node-name': 'top',
+'file': {
+'driver': 'file',
+'filename': top
+},
+'backing': 'base'
+}
+}))
+self.vm.launch()
+
+def tearDown(self) -> None:
+self.vm.shutdown()
+os.remove(top)
+os.remove(base)
+
+def test_stream_unaligned_prefetch(self) -> None:
+self.vm.cmd('block-stream', job_id='stream', device='compress-top')
+
+
+if __name__ == '__main__':
+iotests.main(supported_fmts=['qcow2'], supported_protocols=['file'])
diff --git a/tests/qemu-iotests/tests/stream-unaligned-prefetch.out b/tests/qemu-iotests/tests/stream-unaligned-prefetch.out
new file mode 100644
index 00..ae1213e6f8
--- /dev/null
+++ b/tests/qemu-iotests/tests/stream-unaligned-prefetch.out
@@ -0,0 +1,5 @@
+.
+--
+Ran 1 tests
+
+OK
-- 
2.39.2





Re: [PATCH v4 3/3] Add support for RAPL MSRs in KVM/Qemu

2024-03-21 Thread Daniel P . Berrangé
On Mon, Mar 18, 2024 at 04:12:16PM +0100, Anthony Harivel wrote:
> Starting with the "Sandy Bridge" generation, Intel CPUs provide a RAPL
> interface (Running Average Power Limit) for advertising the accumulated
> energy consumption of various power domains (e.g. CPU packages, DRAM,
> etc.).
> 
> The consumption is reported via MSRs (model specific registers) like
> MSR_PKG_ENERGY_STATUS for the CPU package power domain. These MSRs are
> 64 bits registers that represent the accumulated energy consumption in
> micro Joules. They are updated by microcode every ~1ms.
> 
> For now, KVM always returns 0 when the guest requests the value of
> these MSRs. Use the KVM MSR filtering mechanism to allow QEMU handle
> these MSRs dynamically in userspace.
> 
> To limit the amount of system calls for every MSR call, create a new
> thread in QEMU that updates the "virtual" MSR values asynchronously.
> 
> Each vCPU has its own vMSR to reflect the independence of vCPUs. The
> thread updates the vMSR values with the ratio of energy consumed of
> the whole physical CPU package the vCPU thread runs on and the
> thread's utime and stime values.
> 
> All other non-vCPU threads are also taken into account. Their energy
> consumption is evenly distributed among all vCPUs threads running on
> the same physical CPU package.
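As a rough illustration of the accounting described above (a hypothetical sketch, not the patch's code): each vCPU is credited with the package's energy delta scaled by its share of ticks, plus an even share of the non-vCPU threads' consumption:

```python
def attribute_energy(pkg_delta_uj, vcpu_ticks, other_ticks, total_ticks):
    """Split a package's energy delta (micro-joules) among vCPU threads.

    vcpu_ticks:  per-vCPU utime+stime consumed on this package.
    other_ticks: ticks of all non-vCPU QEMU threads on this package.
    total_ticks: all ticks observed on the package in the interval.
    """
    # Energy directly attributable to each vCPU's own runtime.
    per_vcpu = [pkg_delta_uj * t / total_ticks for t in vcpu_ticks]
    # Non-vCPU threads' share, spread evenly over the vCPUs.
    overhead = pkg_delta_uj * other_ticks / total_ticks / len(vcpu_ticks)
    return [e + overhead for e in per_vcpu]
```

With a 1000 uJ delta, vCPU ticks of 300 and 100, and 100 ticks of helper threads out of 500 total, the vCPUs are credited 700 and 300 uJ respectively.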
> 
> To overcome the problem that reading the RAPL MSR requires privileged
> access, a socket communication between QEMU and the qemu-vmsr-helper is
> mandatory. You can specify the socket path via a parameter.
> 
> This feature is activated with -accel kvm,rapl=true,path=/path/sock.sock
> 
> Actual limitation:
> - Works only on Intel host CPUs, because AMD CPUs use different MSR
>   addresses.
> 
> - Only the Package Power-Plane (MSR_PKG_ENERGY_STATUS) is reported at
>   the moment.
> 
> Signed-off-by: Anthony Harivel 
> ---
>  accel/kvm/kvm-all.c   |  27 +++
>  docs/specs/index.rst  |   1 +
>  docs/specs/rapl-msr.rst   | 155 +
>  include/sysemu/kvm.h  |   2 +
>  include/sysemu/kvm_int.h  |  30 +++
>  target/i386/cpu.h |   8 +
>  target/i386/kvm/kvm-cpu.c |   7 +
>  target/i386/kvm/kvm.c | 420 ++
>  target/i386/kvm/meson.build   |   1 +
>  target/i386/kvm/vmsr_energy.c | 381 ++
>  target/i386/kvm/vmsr_energy.h |  97 
>  11 files changed, 1129 insertions(+)
>  create mode 100644 docs/specs/rapl-msr.rst
>  create mode 100644 target/i386/kvm/vmsr_energy.c
>  create mode 100644 target/i386/kvm/vmsr_energy.h
> 


> diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
> index 882e37e12c5b..ea4587b53eb1 100644
> --- a/include/sysemu/kvm_int.h
> +++ b/include/sysemu/kvm_int.h
> @@ -14,6 +14,8 @@
>  #include "qemu/accel.h"
>  #include "qemu/queue.h"
>  #include "sysemu/kvm.h"
> +#include "hw/boards.h"
> +#include "hw/i386/topology.h"
>  
>  typedef struct KVMSlot
>  {
> @@ -48,6 +50,33 @@ typedef struct KVMMemoryListener {
>  
>  #define KVM_MSI_HASHTAB_SIZE    256
>  
> +typedef struct KVMHostTopoInfo {
> +/* Number of package on the Host */
> +unsigned int maxpkgs;
> +/* Number of cpus on the Host */
> +unsigned int maxcpus;
> +/* Number of cpus on each different package */
> +unsigned int *pkg_cpu_count;
> +/* Each package can have different maxticks */
> +unsigned int *maxticks;
> +} KVMHostTopoInfo;
> +
> +struct KVMMsrEnergy {
> +pid_t pid;
> +bool enable;
> +char *socket_path;
> +QemuThread msr_thr;
> +unsigned int vcpus;
> +unsigned int vsockets;
> +X86CPUTopoInfo topo_info;
> +KVMHostTopoInfo host_topo;
> +const CPUArchIdList *cpu_list;
> +uint64_t *msr_value;
> +uint64_t msr_unit;
> +uint64_t msr_limit;
> +uint64_t msr_info;
> +};
> +
>  enum KVMDirtyRingReaperState {
>  KVM_DIRTY_RING_REAPER_NONE = 0,
>  /* The reaper is sleeping */
> @@ -114,6 +143,7 @@ struct KVMState
>  bool kvm_dirty_ring_with_bitmap;
>  uint64_t kvm_eager_split_size;  /* Eager Page Splitting chunk size */
>  struct KVMDirtyRingReaper reaper;
> +struct KVMMsrEnergy msr_energy;
>  NotifyVmexitOption notify_vmexit;
>  uint32_t notify_window;
>  uint32_t xen_version;
> diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
> index 9c791b7b0520..246de2bac2f1 100644
> --- a/target/i386/kvm/kvm-cpu.c
> +++ b/target/i386/kvm/kvm-cpu.c
> @@ -50,6 +50,13 @@ static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
> MSR_IA32_UCODE_REV);
>  }
>  }
> +if (kvm_is_rapl_feat_enable(cs)) {
> +if (IS_INTEL_CPU(env)) {

You need to invert this check  if (!IS_INTEL_CPU(...)) {

> +error_setg(errp, "RAPL feature is enable and CPU is not INTEL CPU");

Tweak the message

"The RAPL feature can only be enabled with Intel CPU models"

> +return false;
> +};
> +};


Re: [PATCH v3 47/49] hw/i386/sev: Add support to encrypt BIOS when SEV-SNP is enabled

2024-03-21 Thread Michael Roth via
On Wed, Mar 20, 2024 at 12:22:34PM +, Daniel P. Berrangé wrote:
> On Wed, Mar 20, 2024 at 03:39:43AM -0500, Michael Roth wrote:
> > TODO: Brijesh as author, me as co-author (vice-versa depending)
> >   drop flash handling? we only support BIOS now
> 
> A reminder that this commit message needs fixing.

Sorry, definitely meant to fix this one up before submitting. I've
gone ahead and force-pushed an updated tree to same qemu-v3-rc branch.
The only change is proper attribution/commit message for this patch:

  https://github.com/AMDESE/qemu/commit/c54618a1cc23f2398e6c3af6f3cf140c4901347c

-Mike

> 
> > 
> > Signed-off-by: Michael Roth 
> > ---
> >  hw/i386/pc_sysfw.c| 12 +++-
> >  hw/i386/x86.c |  2 +-
> >  include/hw/i386/x86.h |  2 +-
> >  target/i386/sev-sysemu-stub.c |  2 +-
> >  target/i386/sev.c | 15 +++
> >  target/i386/sev.h |  2 +-
> >  6 files changed, 22 insertions(+), 13 deletions(-)
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
> 



Re: [PATCH v3] target/loongarch: Fix qemu-system-loongarch64 assert failed with the option '-d int'

2024-03-21 Thread Philippe Mathieu-Daudé

On 21/3/24 13:36, Song Gao wrote:

qemu-system-loongarch64 asserts with the option '-d int':
helper_idle() raises the exception EXCP_HLT, but the exception name is
undefined.

Signed-off-by: Song Gao 
---
  target/loongarch/cpu.c | 74 +++---
  1 file changed, 40 insertions(+), 34 deletions(-)




-static const char * const excp_names[] = {
-[EXCCODE_INT] = "Interrupt",
-[EXCCODE_PIL] = "Page invalid exception for load",
-[EXCCODE_PIS] = "Page invalid exception for store",
-[EXCCODE_PIF] = "Page invalid exception for fetch",
-[EXCCODE_PME] = "Page modified exception",
-[EXCCODE_PNR] = "Page Not Readable exception",
-[EXCCODE_PNX] = "Page Not Executable exception",
-[EXCCODE_PPI] = "Page Privilege error",
-[EXCCODE_ADEF] = "Address error for instruction fetch",
-[EXCCODE_ADEM] = "Address error for Memory access",
-[EXCCODE_SYS] = "Syscall",
-[EXCCODE_BRK] = "Break",
-[EXCCODE_INE] = "Instruction Non-Existent",
-[EXCCODE_IPE] = "Instruction privilege error",
-[EXCCODE_FPD] = "Floating Point Disabled",
-[EXCCODE_FPE] = "Floating Point Exception",
-[EXCCODE_DBP] = "Debug breakpoint",
-[EXCCODE_BCE] = "Bound Check Exception",
-[EXCCODE_SXD] = "128 bit vector instructions Disable exception",
-[EXCCODE_ASXD] = "256 bit vector instructions Disable exception",
+struct TypeExcp {
+int32_t exccode;
+const char *name;


  const char * const name;

(Sorry I missed in review, no need to respin)

Reviewed-by: Philippe Mathieu-Daudé 



+};
+
+static const struct TypeExcp excp_names[] = {
+{EXCCODE_INT, "Interrupt"},
+{EXCCODE_PIL, "Page invalid exception for load"},
+{EXCCODE_PIS, "Page invalid exception for store"},
+{EXCCODE_PIF, "Page invalid exception for fetch"},
+{EXCCODE_PME, "Page modified exception"},
+{EXCCODE_PNR, "Page Not Readable exception"},
+{EXCCODE_PNX, "Page Not Executable exception"},
+{EXCCODE_PPI, "Page Privilege error"},
+{EXCCODE_ADEF, "Address error for instruction fetch"},
+{EXCCODE_ADEM, "Address error for Memory access"},
+{EXCCODE_SYS, "Syscall"},
+{EXCCODE_BRK, "Break"},
+{EXCCODE_INE, "Instruction Non-Existent"},
+{EXCCODE_IPE, "Instruction privilege error"},
+{EXCCODE_FPD, "Floating Point Disabled"},
+{EXCCODE_FPE, "Floating Point Exception"},
+{EXCCODE_DBP, "Debug breakpoint"},
+{EXCCODE_BCE, "Bound Check Exception"},
+{EXCCODE_SXD, "128 bit vector instructions Disable exception"},
+{EXCCODE_ASXD, "256 bit vector instructions Disable exception"},
+{EXCP_HLT, "EXCP_HLT"},
  };





[PATCH v10 02/21] hw/core/machine: Support modules in -smp

2024-03-21 Thread Zhao Liu
From: Zhao Liu 

Add "modules" parameter parsing support in -smp.

Suggested-by: Xiaoyao Li 
Tested-by: Yongwei Ma 
Signed-off-by: Zhao Liu 
Tested-by: Babu Moger 
Acked-by: Markus Armbruster 
---
Changes since v9:
 * Rebased on the SMP changes about unsupported "parameter=1"
   configurations. (Philippe)
 * Fixed typo about topology field. (Dapeng)

Changes since v8:
 * Added module description in qemu_smp_opts.

Changes since v7:
 * New commit to introduce module level in -smp.
---
 hw/core/machine-smp.c | 39 +--
 hw/core/machine.c |  1 +
 qapi/machine.json |  3 +++
 system/vl.c   |  3 +++
 4 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/hw/core/machine-smp.c b/hw/core/machine-smp.c
index 2e68fcfdfd79..2b93fa99c943 100644
--- a/hw/core/machine-smp.c
+++ b/hw/core/machine-smp.c
@@ -51,6 +51,10 @@ static char *cpu_hierarchy_to_string(MachineState *ms)
 g_string_append_printf(s, " * clusters (%u)", ms->smp.clusters);
 }
 
+if (mc->smp_props.modules_supported) {
+g_string_append_printf(s, " * modules (%u)", ms->smp.modules);
+}
+
 g_string_append_printf(s, " * cores (%u)", ms->smp.cores);
 g_string_append_printf(s, " * threads (%u)", ms->smp.threads);
 
@@ -88,6 +92,7 @@ void machine_parse_smp_config(MachineState *ms,
 unsigned sockets = config->has_sockets ? config->sockets : 0;
 unsigned dies= config->has_dies ? config->dies : 0;
 unsigned clusters = config->has_clusters ? config->clusters : 0;
+unsigned modules = config->has_modules ? config->modules : 0;
 unsigned cores   = config->has_cores ? config->cores : 0;
 unsigned threads = config->has_threads ? config->threads : 0;
 unsigned maxcpus = config->has_maxcpus ? config->maxcpus : 0;
@@ -103,6 +108,7 @@ void machine_parse_smp_config(MachineState *ms,
 (config->has_sockets && config->sockets == 0) ||
 (config->has_dies && config->dies == 0) ||
 (config->has_clusters && config->clusters == 0) ||
+(config->has_modules && config->modules == 0) ||
 (config->has_cores && config->cores == 0) ||
 (config->has_threads && config->threads == 0) ||
 (config->has_maxcpus && config->maxcpus == 0)) {
@@ -115,6 +121,20 @@ void machine_parse_smp_config(MachineState *ms,
  * If not supported by the machine, a topology parameter must be
  * omitted.
  */
+if (!mc->smp_props.modules_supported && config->has_modules) {
+if (config->modules > 1) {
+error_setg(errp, "modules not supported by this "
+   "machine's CPU topology");
+return;
+} else {
+/* Here modules only equals 1 since we've checked zero case. */
+warn_report("Deprecated CPU topology (considered invalid): "
+"Unsupported modules parameter mustn't be "
+"specified as 1");
+}
+}
+modules = modules > 0 ? modules : 1;
+
 if (!mc->smp_props.clusters_supported && config->has_clusters) {
 if (config->clusters > 1) {
 error_setg(errp, "clusters not supported by this "
@@ -185,11 +205,13 @@ void machine_parse_smp_config(MachineState *ms,
 cores = cores > 0 ? cores : 1;
 threads = threads > 0 ? threads : 1;
 sockets = maxcpus /
-  (drawers * books * dies * clusters * cores * threads);
+  (drawers * books * dies * clusters *
+   modules * cores * threads);
 } else if (cores == 0) {
 threads = threads > 0 ? threads : 1;
 cores = maxcpus /
-(drawers * books * sockets * dies * clusters * threads);
+(drawers * books * sockets * dies *
+ clusters * modules * threads);
 }
 } else {
 /* prefer cores over sockets since 6.2 */
@@ -197,22 +219,26 @@ void machine_parse_smp_config(MachineState *ms,
 sockets = sockets > 0 ? sockets : 1;
 threads = threads > 0 ? threads : 1;
 cores = maxcpus /
-(drawers * books * sockets * dies * clusters * threads);
+(drawers * books * sockets * dies *
+ clusters * modules * threads);
 } else if (sockets == 0) {
 threads = threads > 0 ? threads : 1;
 sockets = maxcpus /
-  (drawers * books * dies * clusters * cores * threads);
+  (drawers * books * dies * clusters *
+   modules * cores * threads);
 }
 }
 
 /* try to calculate omitted threads at last */
 if (threads == 0) {
 threads = maxcpus /
-  (drawers * books * sockets * dies * clusters * cores);
+   

[PATCH v10 00/21] i386: Introduce smp.modules and clean up cache topology

2024-03-21 Thread Zhao Liu
From: Zhao Liu 

Hi,

This is the our v10 patch series, rebased on the master branch at the
commit 54294b23e16d ("Merge tag 'ui-pull-request' of
https://gitlab.com/marcandre.lureau/qemu into staging").

Compared with v9 [1], v10 mainly contains minor cleanups, without
significant code changes.

Intel's hybrid Client platform and E core server platform introduce
module level and share L2 cache on the module level, in order to
configure the CPU/cache topology for the Guest to be consistent with
Host's, this series did the following work:
 * Add a new "module" CPU topology level for x86 CPUs.
 * Refactor cache topology encoding for x86 CPUs (this is the basis for
   supporting L2 per module).

So, this series is also necessary to support subsequent user
configurations of cache topology (via -smp, [2]) and Intel heterogeneous
CPU topology ([3] and [4]).


Background
==

At present, x86 defaults to the L2 cache being shared within one core,
but this is not enough. On some platforms multiple cores share the
same L2 cache, e.g., Alder Lake-P shares one L2 cache per module of
Atom cores, that is, every four Atom cores share one L2 cache. The
E-core server platform has a similar L2-per-module topology.
Therefore, we need the new CPU topology level.

Another reason is that Intel client hybrid architectures organize P
cores and E cores via modules, so a new CPU topology level is necessary
to support hybrid CPU topology.


Why We Introduce Module Instead of Reusing Cluster
--

For the discussion in v7 about whether we should reuse current
smp.clusters for x86 module, the core point is what's the essential
differences between x86 module and general cluster.

Since cluster (for ARM/riscv) lacks a comprehensive and rigorous
hardware definition, and judging from the description of smp.clusters
[5] when it was introduced by QEMU, x86 module is very similar to the
general smp.clusters: both are a layer above the existing core level,
organizing physical cores and sharing L2 cache.

But there are following reasons that drive us to introduce the new
smp.modules:

  * As the CPU topology abstraction in device tree [6], cluster supports
nesting (though currently QEMU hasn't support that). In contrast,
(x86) module does not support nesting.

  * Due to nesting, there is great flexibility in sharing resources
on cluster, rather than narrowing cluster down to sharing L2 (and
L3 tags) as the lowest topology level that contains cores.

  * Flexible nesting of cluster allows it to correspond to any level
between the x86 package and core.

  * In Linux kernel, x86's cluster only represents the L2 cache domain
but QEMU's smp.clusters is the CPU topology level. Linux kernel will
also expose module level topology information in sysfs for x86. To
avoid cluster ambiguity and keep a consistent CPU topology naming
style with the Linux kernel, we introduce module level for x86.

Based on the above considerations, and in order to eliminate the naming
confusion caused by the mapping between general cluster and x86 module,
we now formally introduce smp.modules as the new topology level.


Where to Place Module in Existing Topology Levels
-

The module is, in existing hardware practice, the lowest layer that
contains the core, while the cluster is able to have a higher topological
scope than the module due to its nesting.

Therefore, we place the module between the cluster and the core:

drawer/book/socket/die/cluster/module/core/thread
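The hierarchy above multiplies out to the total vCPU count; a trivial sketch of the invariant that -smp parsing maintains (hypothetical helper, not QEMU code):

```python
def total_cpus(drawers=1, books=1, sockets=1, dies=1, clusters=1,
               modules=1, cores=1, threads=1):
    """Total vCPUs implied by a fully specified topology.

    Any level omitted on the command line defaults to 1, which is
    how unsupported or unspecified levels drop out of the product.
    """
    return (drawers * books * sockets * dies * clusters *
            modules * cores * threads)
```

For instance, `-smp sockets=2,dies=2,cores=2,threads=2` with one module per die yields 16 vCPUs.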


Patch Series Overview
=

Introduction of Module Level in -smp


First, a new module level is introduced in the -smp related code to
support the module topology in subsequent x86 parts.

Users can specify the number of modules (in one die) for a PC machine
with "-smp modules=*".


Why not Share L2 Cache in Module Directly
-

Though one of module's goals is to implement L2 cache per module,
directly using module to define x86's L2 cache topology will cause the
compatibility problem:

Currently, x86 defaults to the L2 cache being shared within one core,
which actually implies a default setting of "cores per L2 cache is 1"
and therefore implicitly defaults to having as many L2 caches as cores.

For example (i386 PC machine):
-smp 16,sockets=2,dies=2,cores=2,threads=2,maxcpus=16 (*)

Considering the topology of the L2 cache, this (*) implicitly means "1
core per L2 cache" and "2 L2 caches per die".

If we use module to configure L2 cache topology with the new default
setting "modules per L2 cache is 1", the above semantics will change
to "2 cores per module" and "1 module per L2 cache", that is, "2
cores per L2 cache".

So the same command (*) will cause changes in the L2 cache topology,
further affecting the performance of the virtual machine.
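The compatibility concern reduces to simple arithmetic; a small sketch (hypothetical helper, not QEMU code) of the two interpretations for the example above:

```python
def l2_per_die(cores_per_die, cores_per_module, modules_per_l2=None):
    """Number of L2 caches per die under two interpretations of -smp.

    Default today: one L2 cache per core.
    Hypothetical module-based default: one L2 cache per module.
    """
    if modules_per_l2 is None:
        return cores_per_die                       # one L2 per core
    modules_per_die = cores_per_die // cores_per_module
    return modules_per_die // modules_per_l2
```

For `-smp 16,sockets=2,dies=2,cores=2,threads=2` (2 cores per die), today's default gives 2 L2 caches per die, while a module-based default grouping both cores into one module would give only 1, i.e. 2 cores per L2 cache.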

Therefore, x86 should only treat module as a cpu top

[PATCH v10 07/21] i386/cpu: Use APIC ID info get NumSharingCache for CPUID[0x8000001D].EAX[bits 25:14]

2024-03-21 Thread Zhao Liu
From: Zhao Liu 

The commit 8f4202fb1080 ("i386: Populate AMD Processor Cache Information
for cpuid 0x8000001D") adds the cache topology for AMD CPU by encoding
the number of sharing threads directly.

From AMD's APM, NumSharingCache (CPUID[0x8000001D].EAX[bits 25:14])
means [1]:

The number of logical processors sharing this cache is the value of
this field incremented by 1. To determine which logical processors are
sharing a cache, determine a Share Id for each processor as follows:

ShareId = LocalApicId >> log2(NumSharingCache+1)

Logical processors with the same ShareId then share a cache. If
NumSharingCache+1 is not a power of two, round it up to the next power
of two.
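The quoted formula is easy to transcribe numerically; a hedged Python rendering (the round-up to a power of two maps to a `bit_length` shift):

```python
def share_id(local_apic_id, num_sharing_cache):
    """ShareId per AMD APM E.4.15: logical processors with equal
    ShareId share the cache. num_sharing_cache is the raw
    CPUID[0x8000001D].EAX[bits 25:14] value (sharers minus one)."""
    n = num_sharing_cache + 1
    # Round n up to the next power of two, then take log2 of it.
    width = (n - 1).bit_length()
    return local_apic_id >> width
```

E.g. with NumSharingCache = 3 (four sharers), APIC IDs 0-3 share one cache and ID 4 starts the next; NumSharingCache = 5 rounds 6 up to 8, so APIC IDs 0-7 share.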

From the description above, the calculation of this field should be the same
as CPUID[4].EAX[bits 25:14] for Intel CPUs. So also use the offsets of
APIC ID to calculate this field.

[1]: APM, vol.3, appendix.E.4.15 Function 8000_001Dh--Cache Topology
 Information

Tested-by: Yongwei Ma 
Signed-off-by: Zhao Liu 
Reviewed-by: Babu Moger 
Tested-by: Babu Moger 
Reviewed-by: Xiaoyao Li 
---
Changes since v7:
 * Moved this patch after CPUID[4]'s similar change ("i386/cpu: Use APIC
   ID offset to encode cache topo in CPUID[4]"). (Xiaoyao)
 * Dropped Michael/Babu's Acked/Reviewed/Tested tags since the code
   change due to the rebase.
 * Re-added Yongwei's Tested tag For his re-testing (compilation on
   Intel platforms).

Changes since v3:
 * Rewrote the subject. (Babu)
 * Deleted the original "comment/help" expression, as this behavior is
   confirmed for AMD CPUs. (Babu)
 * Renamed "num_apic_ids" (v3) to "num_sharing_cache" to match spec
   definition. (Babu)

Changes since v1:
 * Renamed "l3_threads" to "num_apic_ids" in
   encode_cache_cpuid8000001d(). (Yanan)
 * Added the description of the original commit and add Cc.
---
 target/i386/cpu.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 0ebacacf2aad..8f559b42f706 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -331,7 +331,7 @@ static void encode_cache_cpuid8000001d(CPUCacheInfo *cache,
uint32_t *eax, uint32_t *ebx,
uint32_t *ecx, uint32_t *edx)
 {
-uint32_t l3_threads;
+uint32_t num_sharing_cache;
 assert(cache->size == cache->line_size * cache->associativity *
   cache->partitions * cache->sets);
 
@@ -340,11 +340,11 @@ static void encode_cache_cpuid8000001d(CPUCacheInfo *cache,
 
 /* L3 is shared among multiple cores */
 if (cache->level == 3) {
-l3_threads = topo_info->cores_per_die * topo_info->threads_per_core;
-*eax |= (l3_threads - 1) << 14;
+num_sharing_cache = 1 << apicid_die_offset(topo_info);
 } else {
-*eax |= ((topo_info->threads_per_core - 1) << 14);
+num_sharing_cache = 1 << apicid_core_offset(topo_info);
 }
+*eax |= (num_sharing_cache - 1) << 14;
 
 assert(cache->line_size > 0);
 assert(cache->partitions > 0);
-- 
2.34.1




[PATCH v10 05/21] i386/cpu: Fix i/d-cache topology to core level for Intel CPU

2024-03-21 Thread Zhao Liu
From: Zhao Liu 

For i-cache and d-cache, current QEMU hardcodes the maximum IDs for CPUs
sharing cache (CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits
25:14]) to 0, and this means i-cache and d-cache are shared in the SMT
level.

This is correct if there's single thread per core, but is wrong for the
hyper threading case (one core contains multiple threads) since the
i-cache and d-cache are shared in the core level other than SMT level.

For AMD CPU, commit 8f4202fb1080 ("i386: Populate AMD Processor Cache
Information for cpuid 0x8000001D") has already introduced i/d cache
topology as core level by default.

Therefore, in order to be compatible with both multi-threaded and
single-threaded situations, we should set the i-cache and d-cache to be
shared at the core level by default.
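The visible effect is in CPUID.04H:EAX[bits 25:14], which reports the maximum number of addressable IDs for logical processors sharing the cache, minus one. A hedged before/after sketch (simplified; the real encoding rounds to a power of two via the APIC ID offsets):

```python
def l1_eax_bits_25_14(nr_threads, hardcoded_smt=False):
    """Value of CPUID.04H:EAX[bits 25:14] for the L1 i/d-cache.

    hardcoded_smt=True models the old behaviour (field forced to 0,
    i.e. the cache appears private to each SMT thread); otherwise
    the cache is shared by all nr_threads of a core.
    """
    sharers = 1 if hardcoded_smt else nr_threads
    return (sharers - 1) & 0xFFF   # field is 12 bits wide
```

With two SMT threads per core, the field goes from 0 (per-thread) to 1 (per-core), matching the diff below; single-threaded cores are unaffected.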

This fix changes the default i/d cache topology from per-thread to
per-core. Potentially, this change in L1 cache topology may affect the
performance of the VM if the user does not specifically specify the
topology or bind the vCPU. However, the way to achieve optimal
performance should be to create a reasonable topology and set the
appropriate vCPU affinity without relying on QEMU's default topology
structure.

Fixes: 7e3482f82480 ("i386: Helpers to encode cache information consistently")
Suggested-by: Robert Hoo 
Signed-off-by: Zhao Liu 
Reviewed-by: Xiaoyao Li 
Tested-by: Babu Moger 
Tested-by: Yongwei Ma 
Acked-by: Michael S. Tsirkin 
---
Changes since v3:
 * Changed the description of current i/d cache encoding status to avoid
   misleading to "architectural rules". (Xiaoyao)

Changes since v1:
 * Split this fix from the patch named "i386/cpu: Fix number of
   addressable IDs in CPUID.04H".
 * Added the explanation of the impact on performance. (Xiaoyao)
---
 target/i386/cpu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 33760a2ee163..eedc2c5ea6e0 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6113,12 +6113,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 switch (count) {
 case 0: /* L1 dcache info */
 encode_cache_cpuid4(env->cache_info_cpuid4.l1d_cache,
-1, cs->nr_cores,
+cs->nr_threads, cs->nr_cores,
 eax, ebx, ecx, edx);
 break;
 case 1: /* L1 icache info */
 encode_cache_cpuid4(env->cache_info_cpuid4.l1i_cache,
-1, cs->nr_cores,
+cs->nr_threads, cs->nr_cores,
 eax, ebx, ecx, edx);
 break;
 case 2: /* L2 cache info */
-- 
2.34.1




[PATCH v10 10/21] i386: Split topology types of CPUID[0x1F] from the definitions of CPUID[0xB]

2024-03-21 Thread Zhao Liu
From: Zhao Liu 

CPUID[0xB] defines SMT, Core and Invalid types, and this leaf is shared
by Intel and AMD CPUs.

But for extended topology levels, Intel CPUs (in CPUID[0x1F]) and AMD
CPUs (in CPUID[0x80000026]) have different definitions with different
enumeration values.

Though CPUID[0x80000026] hasn't been implemented in QEMU, to avoid
possible misunderstanding, split the topology types of CPUID[0x1F] from
the definitions of CPUID[0xB] and introduce CPUID[0x1F]-specific
topology types.

Signed-off-by: Zhao Liu 
Tested-by: Yongwei Ma 
Acked-by: Michael S. Tsirkin 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Babu Moger 
---
Changes since v8:
 * Add Philippe's reviewed-by tag.

Changes since v3:
 * New commit to prepare to refactor CPUID[0x1F] encoding.
---
 target/i386/cpu.c | 14 +++---
 target/i386/cpu.h | 13 +
 2 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 6b159298fea5..d030b45f9c3e 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6266,17 +6266,17 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 case 0:
 *eax = apicid_core_offset(&topo_info);
 *ebx = topo_info.threads_per_core;
-*ecx |= CPUID_TOPOLOGY_LEVEL_SMT;
+*ecx |= CPUID_B_ECX_TOPO_LEVEL_SMT << 8;
 break;
 case 1:
 *eax = apicid_pkg_offset(&topo_info);
 *ebx = threads_per_pkg;
-*ecx |= CPUID_TOPOLOGY_LEVEL_CORE;
+*ecx |= CPUID_B_ECX_TOPO_LEVEL_CORE << 8;
 break;
 default:
 *eax = 0;
 *ebx = 0;
-*ecx |= CPUID_TOPOLOGY_LEVEL_INVALID;
+*ecx |= CPUID_B_ECX_TOPO_LEVEL_INVALID << 8;
 }
 
 assert(!(*eax & ~0x1f));
@@ -6301,22 +6301,22 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 case 0:
 *eax = apicid_core_offset(&topo_info);
 *ebx = topo_info.threads_per_core;
-*ecx |= CPUID_TOPOLOGY_LEVEL_SMT;
+*ecx |= CPUID_1F_ECX_TOPO_LEVEL_SMT << 8;
 break;
 case 1:
 *eax = apicid_die_offset(&topo_info);
 *ebx = topo_info.cores_per_die * topo_info.threads_per_core;
-*ecx |= CPUID_TOPOLOGY_LEVEL_CORE;
+*ecx |= CPUID_1F_ECX_TOPO_LEVEL_CORE << 8;
 break;
 case 2:
 *eax = apicid_pkg_offset(&topo_info);
 *ebx = threads_per_pkg;
-*ecx |= CPUID_TOPOLOGY_LEVEL_DIE;
+*ecx |= CPUID_1F_ECX_TOPO_LEVEL_DIE << 8;
 break;
 default:
 *eax = 0;
 *ebx = 0;
-*ecx |= CPUID_TOPOLOGY_LEVEL_INVALID;
+*ecx |= CPUID_1F_ECX_TOPO_LEVEL_INVALID << 8;
 }
 assert(!(*eax & ~0x1f));
 *ebx &= 0x; /* The count doesn't need to be reliable. */
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 60ebc6378064..2e24f457468d 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1017,10 +1017,15 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord 
w,
 #define CPUID_MWAIT_EMX (1U << 0) /* enumeration supported */
 
 /* CPUID[0xB].ECX level types */
-#define CPUID_TOPOLOGY_LEVEL_INVALID  (0U << 8)
-#define CPUID_TOPOLOGY_LEVEL_SMT  (1U << 8)
-#define CPUID_TOPOLOGY_LEVEL_CORE (2U << 8)
-#define CPUID_TOPOLOGY_LEVEL_DIE  (5U << 8)
+#define CPUID_B_ECX_TOPO_LEVEL_INVALID  0
+#define CPUID_B_ECX_TOPO_LEVEL_SMT  1
+#define CPUID_B_ECX_TOPO_LEVEL_CORE 2
+
+/* CPUID[0x1F].ECX level types */
+#define CPUID_1F_ECX_TOPO_LEVEL_INVALID  CPUID_B_ECX_TOPO_LEVEL_INVALID
+#define CPUID_1F_ECX_TOPO_LEVEL_SMT  CPUID_B_ECX_TOPO_LEVEL_SMT
+#define CPUID_1F_ECX_TOPO_LEVEL_CORE CPUID_B_ECX_TOPO_LEVEL_CORE
+#define CPUID_1F_ECX_TOPO_LEVEL_DIE  5
 
 /* MSR Feature Bits */
 #define MSR_ARCH_CAP_RDCL_NO(1U << 0)
-- 
2.34.1




[PATCH v10 04/21] hw/core: Support module-id in numa configuration

2024-03-21 Thread Zhao Liu
From: Zhao Liu 

Module is a level above the core, so supporting NUMA configuration at
the module level gives users more NUMA flexibility.

This is the natural next step in supporting the module level.

Add module level support in numa configuration.

Tested-by: Yongwei Ma 
Signed-off-by: Zhao Liu 
Tested-by: Babu Moger 
---
Changes since v7:
 * New commit to support module level.
---
 hw/core/machine.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 9ff5170f8e31..27340392aec8 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -797,6 +797,11 @@ void machine_set_cpu_numa_node(MachineState *machine,
 return;
 }
 
+if (props->has_module_id && !slot->props.has_module_id) {
+error_setg(errp, "module-id is not supported");
+return;
+}
+
 if (props->has_cluster_id && !slot->props.has_cluster_id) {
 error_setg(errp, "cluster-id is not supported");
 return;
@@ -821,6 +826,11 @@ void machine_set_cpu_numa_node(MachineState *machine,
 continue;
 }
 
+if (props->has_module_id &&
+props->module_id != slot->props.module_id) {
+continue;
+}
+
 if (props->has_cluster_id &&
 props->cluster_id != slot->props.cluster_id) {
 continue;
@@ -1218,6 +1228,12 @@ static char *cpu_slot_to_string(const CPUArchId *cpu)
 }
 g_string_append_printf(s, "cluster-id: %"PRId64, 
cpu->props.cluster_id);
 }
+if (cpu->props.has_module_id) {
+if (s->len) {
+g_string_append_printf(s, ", ");
+}
+g_string_append_printf(s, "module-id: %"PRId64, cpu->props.module_id);
+}
 if (cpu->props.has_core_id) {
 if (s->len) {
 g_string_append_printf(s, ", ");
-- 
2.34.1




[PATCH v10 14/21] i386: Expose module level in CPUID[0x1F]

2024-03-21 Thread Zhao Liu
From: Zhao Liu 

Linux kernel (from v6.4, with commit edc0a2b595765 ("x86/topology: Fix
erroneous smp_num_siblings on Intel Hybrid platforms")) is able to
handle platforms with the Module level enumerated via CPUID.1F.

Expose the module level in CPUID[0x1F] if the machine has more than one
module.

Tested-by: Yongwei Ma 
Signed-off-by: Zhao Liu 
Tested-by: Babu Moger 
---
Changes since v7:
 * Mapped x86 module to smp module instead of cluster.
 * Dropped Michael/Babu's ACKed/Tested tags since the code change.
 * Re-added Yongwei's Tested tag for his re-testing.

Changes since v3:
 * New patch to expose module level in 0x1F.
 * Added Tested-by tag from Yongwei.
---
 hw/i386/x86.c  | 2 +-
 include/hw/i386/topology.h | 6 --
 target/i386/cpu.c  | 6 ++
 target/i386/cpu.h  | 1 +
 4 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 6df762369c71..a4da29ec8115 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -322,7 +322,7 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 
 if (ms->smp.modules > 1) {
 env->nr_modules = ms->smp.modules;
-/* TODO: Expose module level in CPUID[0x1F]. */
+set_bit(CPU_TOPO_LEVEL_MODULE, env->avail_cpu_topo);
 }
 
 if (ms->smp.dies > 1) {
diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
index 7622d806932c..ea871045779d 100644
--- a/include/hw/i386/topology.h
+++ b/include/hw/i386/topology.h
@@ -71,6 +71,7 @@ enum CPUTopoLevel {
 CPU_TOPO_LEVEL_INVALID,
 CPU_TOPO_LEVEL_SMT,
 CPU_TOPO_LEVEL_CORE,
+CPU_TOPO_LEVEL_MODULE,
 CPU_TOPO_LEVEL_DIE,
 CPU_TOPO_LEVEL_PACKAGE,
 CPU_TOPO_LEVEL_MAX,
@@ -198,11 +199,12 @@ static inline apic_id_t 
x86_apicid_from_cpu_idx(X86CPUTopoInfo *topo_info,
 }
 
 /*
- * Check whether there's extended topology level (die)?
+ * Check whether there's extended topology level (module or die)?
  */
 static inline bool x86_has_extended_topo(unsigned long *topo_bitmap)
 {
-return test_bit(CPU_TOPO_LEVEL_DIE, topo_bitmap);
+return test_bit(CPU_TOPO_LEVEL_MODULE, topo_bitmap) ||
+   test_bit(CPU_TOPO_LEVEL_DIE, topo_bitmap);
 }
 
 #endif /* HW_I386_TOPOLOGY_H */
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 7c5c6a0e87a6..8dab6d473247 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -277,6 +277,8 @@ static uint32_t num_threads_by_topo_level(X86CPUTopoInfo 
*topo_info,
 return 1;
 case CPU_TOPO_LEVEL_CORE:
 return topo_info->threads_per_core;
+case CPU_TOPO_LEVEL_MODULE:
+return topo_info->threads_per_core * topo_info->cores_per_module;
 case CPU_TOPO_LEVEL_DIE:
 return topo_info->threads_per_core * topo_info->cores_per_module *
topo_info->modules_per_die;
@@ -297,6 +299,8 @@ static uint32_t apicid_offset_by_topo_level(X86CPUTopoInfo 
*topo_info,
 return 0;
 case CPU_TOPO_LEVEL_CORE:
 return apicid_core_offset(topo_info);
+case CPU_TOPO_LEVEL_MODULE:
+return apicid_module_offset(topo_info);
 case CPU_TOPO_LEVEL_DIE:
 return apicid_die_offset(topo_info);
 case CPU_TOPO_LEVEL_PACKAGE:
@@ -316,6 +320,8 @@ static uint32_t cpuid1f_topo_type(enum CPUTopoLevel 
topo_level)
 return CPUID_1F_ECX_TOPO_LEVEL_SMT;
 case CPU_TOPO_LEVEL_CORE:
 return CPUID_1F_ECX_TOPO_LEVEL_CORE;
+case CPU_TOPO_LEVEL_MODULE:
+return CPUID_1F_ECX_TOPO_LEVEL_MODULE;
 case CPU_TOPO_LEVEL_DIE:
 return CPUID_1F_ECX_TOPO_LEVEL_DIE;
 default:
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 095540e58f7a..c3a83c33345a 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1025,6 +1025,7 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define CPUID_1F_ECX_TOPO_LEVEL_INVALID  CPUID_B_ECX_TOPO_LEVEL_INVALID
 #define CPUID_1F_ECX_TOPO_LEVEL_SMT  CPUID_B_ECX_TOPO_LEVEL_SMT
 #define CPUID_1F_ECX_TOPO_LEVEL_CORE CPUID_B_ECX_TOPO_LEVEL_CORE
+#define CPUID_1F_ECX_TOPO_LEVEL_MODULE   3
 #define CPUID_1F_ECX_TOPO_LEVEL_DIE  5
 
 /* MSR Feature Bits */
-- 
2.34.1




[PATCH v10 17/21] tests: Add test case of APIC ID for module level parsing

2024-03-21 Thread Zhao Liu
From: Zhuocheng Ding 

Now that i386 supports the module level, add a test for module level
parsing.

Signed-off-by: Zhuocheng Ding 
Co-developed-by: Zhao Liu 
Signed-off-by: Zhao Liu 
Reviewed-by: Yanan Wang 
Tested-by: Babu Moger 
Tested-by: Yongwei Ma 
Acked-by: Michael S. Tsirkin 
---
 tests/unit/test-x86-topo.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/tests/unit/test-x86-topo.c b/tests/unit/test-x86-topo.c
index f21b8a5d95c2..55b731ccae55 100644
--- a/tests/unit/test-x86-topo.c
+++ b/tests/unit/test-x86-topo.c
@@ -37,6 +37,7 @@ static void test_topo_bits(void)
 topo_info = (X86CPUTopoInfo) {1, 1, 1, 1};
 g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 0);
 g_assert_cmpuint(apicid_core_width(&topo_info), ==, 0);
+g_assert_cmpuint(apicid_module_width(&topo_info), ==, 0);
 g_assert_cmpuint(apicid_die_width(&topo_info), ==, 0);
 
 topo_info = (X86CPUTopoInfo) {1, 1, 1, 1};
@@ -74,13 +75,22 @@ static void test_topo_bits(void)
 topo_info = (X86CPUTopoInfo) {1, 1, 33, 2};
 g_assert_cmpuint(apicid_core_width(&topo_info), ==, 6);
 
-topo_info = (X86CPUTopoInfo) {1, 1, 30, 2};
+topo_info = (X86CPUTopoInfo) {1, 6, 30, 2};
+g_assert_cmpuint(apicid_module_width(&topo_info), ==, 3);
+topo_info = (X86CPUTopoInfo) {1, 7, 30, 2};
+g_assert_cmpuint(apicid_module_width(&topo_info), ==, 3);
+topo_info = (X86CPUTopoInfo) {1, 8, 30, 2};
+g_assert_cmpuint(apicid_module_width(&topo_info), ==, 3);
+topo_info = (X86CPUTopoInfo) {1, 9, 30, 2};
+g_assert_cmpuint(apicid_module_width(&topo_info), ==, 4);
+
+topo_info = (X86CPUTopoInfo) {1, 6, 30, 2};
 g_assert_cmpuint(apicid_die_width(&topo_info), ==, 0);
-topo_info = (X86CPUTopoInfo) {2, 1, 30, 2};
+topo_info = (X86CPUTopoInfo) {2, 6, 30, 2};
 g_assert_cmpuint(apicid_die_width(&topo_info), ==, 1);
-topo_info = (X86CPUTopoInfo) {3, 1, 30, 2};
+topo_info = (X86CPUTopoInfo) {3, 6, 30, 2};
 g_assert_cmpuint(apicid_die_width(&topo_info), ==, 2);
-topo_info = (X86CPUTopoInfo) {4, 1, 30, 2};
+topo_info = (X86CPUTopoInfo) {4, 6, 30, 2};
 g_assert_cmpuint(apicid_die_width(&topo_info), ==, 2);
 
 /* build a weird topology and see if IDs are calculated correctly
@@ -91,6 +101,7 @@ static void test_topo_bits(void)
 topo_info = (X86CPUTopoInfo) {1, 1, 6, 3};
 g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 2);
 g_assert_cmpuint(apicid_core_offset(&topo_info), ==, 2);
+g_assert_cmpuint(apicid_module_offset(&topo_info), ==, 5);
 g_assert_cmpuint(apicid_die_offset(&topo_info), ==, 5);
 g_assert_cmpuint(apicid_pkg_offset(&topo_info), ==, 5);
 
-- 
2.34.1




[PATCH v10 09/21] i386/cpu: Introduce bitmap to cache available CPU topology levels

2024-03-21 Thread Zhao Liu
From: Zhao Liu 

Currently, QEMU checks the specific number of topology domains to detect
whether there are extended topology levels (e.g., checking nr_dies).

With this bitmap, the extended CPU topology (the levels other than SMT,
core and package) can be detected more easily without touching the
topology details.

This is also in preparation for the follow-up to decouple the
CPUID[0x1F] subleaf from the specific topology level.

Tested-by: Yongwei Ma 
Signed-off-by: Zhao Liu 
Tested-by: Babu Moger 
Reviewed-by: Xiaoyao Li 
---
Changes since v7:
 * New commit in response to Xiaoyao's suggestion about the global
   variable to cache topology levels. (Xiaoyao)
---
 hw/i386/x86.c  |  5 -
 include/hw/i386/topology.h | 23 +++
 target/i386/cpu.c  | 18 +++---
 target/i386/cpu.h  |  4 
 target/i386/kvm/kvm.c  |  3 ++-
 5 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index ffbda48917fd..0a6c59c724f1 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -313,7 +313,10 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 
 init_topo_info(&topo_info, x86ms);
 
-env->nr_dies = ms->smp.dies;
+if (ms->smp.dies > 1) {
+env->nr_dies = ms->smp.dies;
+set_bit(CPU_TOPO_LEVEL_DIE, env->avail_cpu_topo);
+}
 
 /*
  * If APIC ID is not set,
diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
index d4eeb7ab8290..befeb92b0b19 100644
--- a/include/hw/i386/topology.h
+++ b/include/hw/i386/topology.h
@@ -60,6 +60,21 @@ typedef struct X86CPUTopoInfo {
 unsigned threads_per_core;
 } X86CPUTopoInfo;
 
+/*
+ * CPUTopoLevel is the general i386 topology hierarchical representation,
+ * ordered by increasing hierarchical relationship.
+ * Its enumeration value is not bound to the type value of Intel (CPUID[0x1F])
+ * or AMD (CPUID[0x80000026]).
+ */
+enum CPUTopoLevel {
+CPU_TOPO_LEVEL_INVALID,
+CPU_TOPO_LEVEL_SMT,
+CPU_TOPO_LEVEL_CORE,
+CPU_TOPO_LEVEL_DIE,
+CPU_TOPO_LEVEL_PACKAGE,
+CPU_TOPO_LEVEL_MAX,
+};
+
 /* Return the bit width needed for 'count' IDs */
 static unsigned apicid_bitwidth_for_count(unsigned count)
 {
@@ -168,4 +183,12 @@ static inline apic_id_t 
x86_apicid_from_cpu_idx(X86CPUTopoInfo *topo_info,
 return x86_apicid_from_topo_ids(topo_info, &topo_ids);
 }
 
+/*
+ * Check whether there's extended topology level (die)?
+ */
+static inline bool x86_has_extended_topo(unsigned long *topo_bitmap)
+{
+return test_bit(CPU_TOPO_LEVEL_DIE, topo_bitmap);
+}
+
 #endif /* HW_I386_TOPOLOGY_H */
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 0ad400cd709a..6b159298fea5 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6290,7 +6290,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 break;
 case 0x1F:
 /* V2 Extended Topology Enumeration Leaf */
-if (topo_info.dies_per_pkg < 2) {
+if (!x86_has_extended_topo(env->avail_cpu_topo)) {
 *eax = *ebx = *ecx = *edx = 0;
 break;
 }
@@ -7122,7 +7122,7 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp)
  * cpu->vendor_cpuid_only has been unset for compatibility with older
  * machine types.
  */
-if ((env->nr_dies > 1) &&
+if (x86_has_extended_topo(env->avail_cpu_topo) &&
 (IS_INTEL_CPU(env) || !cpu->vendor_cpuid_only)) {
 x86_cpu_adjust_level(cpu, &env->cpuid_min_level, 0x1F);
 }
@@ -7628,13 +7628,25 @@ static void x86_cpu_post_initfn(Object *obj)
 accel_cpu_instance_init(CPU(obj));
 }
 
+static void x86_cpu_init_default_topo(X86CPU *cpu)
+{
+CPUX86State *env = &cpu->env;
+
+env->nr_dies = 1;
+
+/* SMT, core and package levels are set by default. */
+set_bit(CPU_TOPO_LEVEL_SMT, env->avail_cpu_topo);
+set_bit(CPU_TOPO_LEVEL_CORE, env->avail_cpu_topo);
+set_bit(CPU_TOPO_LEVEL_PACKAGE, env->avail_cpu_topo);
+}
+
 static void x86_cpu_initfn(Object *obj)
 {
 X86CPU *cpu = X86_CPU(obj);
 X86CPUClass *xcc = X86_CPU_GET_CLASS(obj);
 CPUX86State *env = &cpu->env;
 
-env->nr_dies = 1;
+x86_cpu_init_default_topo(cpu);
 
 object_property_add(obj, "feature-words", "X86CPUFeatureWordInfo",
 x86_cpu_get_feature_words,
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 6b0573807918..60ebc6378064 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -24,6 +24,7 @@
 #include "cpu-qom.h"
 #include "kvm/hyperv-proto.h"
 #include "exec/cpu-defs.h"
+#include "hw/i386/topology.h"
 #include "qapi/qapi-types-common.h"
 #include "qemu/cpu-float.h"
 #include "qemu/timer.h"
@@ -1892,6 +1893,9 @@ typedef struct CPUArchState {
 
 /* Number of dies within this CPU package. */
 unsigned nr_dies;
+
+/* Bitmap of available CPU topology levels for this CPU. */
+DECLARE_BITMAP(avail_cpu_topo, CPU_TOPO_LEVEL_MAX);
 } CPUX86State;
 
 struct kvm_msrs;
diff --git a/ta

[PATCH v10 08/21] i386/cpu: Consolidate the use of topo_info in cpu_x86_cpuid()

2024-03-21 Thread Zhao Liu
From: Zhao Liu 

In cpu_x86_cpuid(), there are many variables representing the CPU
topology, e.g., topo_info, cs->nr_cores and cs->nr_threads.

Since the names of cs->nr_cores and cs->nr_threads do not accurately
represent their meaning, using cs->nr_cores or cs->nr_threads is prone
to confusion and mistakes.

The structure X86CPUTopoInfo names its members clearly, so the variable
"topo_info" should be preferred.

In addition, in cpu_x86_cpuid(), to uniformly use the topology variable,
replace env->dies with topo_info.dies_per_pkg as well.

Suggested-by: Robert Hoo 
Tested-by: Yongwei Ma 
Signed-off-by: Zhao Liu 
Reviewed-by: Xiaoyao Li 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Babu Moger 
---
Changes since v9:
 * Polished the commit message. (Xiaoyao)

Changes since v8:
 * Added Philippe's reviewed-by tag.

Changes since v7:
 * Renamed cpus_per_pkg to threads_per_pkg. (Xiaoyao)
 * Dropped Michael/Babu's Acked/Tested tags since the code change.
 * Re-added Yongwei's Tested tag for his re-testing.
 * Added Xiaoyao's Reviewed tag.

Changes since v3:
 * Fixed typo. (Babu)

Changes since v1:
 * Extracted cores_per_socket from the code block and use it as a local
   variable for cpu_x86_cpuid(). (Yanan)
 * Removed vcpus_per_socket variable and use cpus_per_pkg directly.
   (Yanan)
 * Replaced env->dies with topo_info.dies_per_pkg in cpu_x86_cpuid().
---
 target/i386/cpu.c | 31 ++-
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 8f559b42f706..0ad400cd709a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6017,11 +6017,16 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 uint32_t limit;
 uint32_t signature[3];
 X86CPUTopoInfo topo_info;
+uint32_t cores_per_pkg;
+uint32_t threads_per_pkg;
 
 topo_info.dies_per_pkg = env->nr_dies;
 topo_info.cores_per_die = cs->nr_cores / env->nr_dies;
 topo_info.threads_per_core = cs->nr_threads;
 
+cores_per_pkg = topo_info.cores_per_die * topo_info.dies_per_pkg;
+threads_per_pkg = cores_per_pkg * topo_info.threads_per_core;
+
 /* Calculate & apply limits for different index ranges */
 if (index >= 0xC000) {
 limit = env->cpuid_xlevel2;
@@ -6057,8 +6062,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 *ecx |= CPUID_EXT_OSXSAVE;
 }
 *edx = env->features[FEAT_1_EDX];
-if (cs->nr_cores * cs->nr_threads > 1) {
-*ebx |= (cs->nr_cores * cs->nr_threads) << 16;
+if (threads_per_pkg > 1) {
+*ebx |= threads_per_pkg << 16;
 *edx |= CPUID_HT;
 }
 if (!cpu->enable_pmu) {
@@ -6106,15 +6111,15 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
  */
 if (*eax & 31) {
 int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
-int vcpus_per_socket = cs->nr_cores * cs->nr_threads;
-if (cs->nr_cores > 1) {
+
+if (cores_per_pkg > 1) {
 addressable_cores_width = apicid_pkg_offset(&topo_info) -
   apicid_core_offset(&topo_info);
 
 *eax &= ~0xFC00;
 *eax |= ((1 << addressable_cores_width) - 1) << 26;
 }
-if (host_vcpus_per_cache > vcpus_per_socket) {
+if (host_vcpus_per_cache > threads_per_pkg) {
 /* Share the cache at package level. */
 addressable_threads_width = apicid_pkg_offset(&topo_info);
 
@@ -6260,12 +6265,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 switch (count) {
 case 0:
 *eax = apicid_core_offset(&topo_info);
-*ebx = cs->nr_threads;
+*ebx = topo_info.threads_per_core;
 *ecx |= CPUID_TOPOLOGY_LEVEL_SMT;
 break;
 case 1:
 *eax = apicid_pkg_offset(&topo_info);
-*ebx = cs->nr_cores * cs->nr_threads;
+*ebx = threads_per_pkg;
 *ecx |= CPUID_TOPOLOGY_LEVEL_CORE;
 break;
 default:
@@ -6285,7 +6290,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 break;
 case 0x1F:
 /* V2 Extended Topology Enumeration Leaf */
-if (env->nr_dies < 2) {
+if (topo_info.dies_per_pkg < 2) {
 *eax = *ebx = *ecx = *edx = 0;
 break;
 }
@@ -6295,7 +6300,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 switch (count) {
 case 0:
 *eax = apicid_core_offset(&topo_info);
-*ebx = cs->nr_threads;
+*ebx = topo_info.threads_per_core;
 *ecx |= CPUID_TOPOLOGY_LEVEL_SMT;
 break;
 case 1:
@@ -6305,7 +6310,7 @@ void cpu_x86_cpuid(CPU

[PATCH v10 16/21] i386/cpu: Introduce module-id to X86CPU

2024-03-21 Thread Zhao Liu
From: Zhao Liu 

Introduce module-id to be consistent with the module-id field in
CpuInstanceProperties.

Following the legacy smp check rules, also add the module_id validity
check to x86_cpu_pre_plug().

Tested-by: Yongwei Ma 
Co-developed-by: Zhuocheng Ding 
Signed-off-by: Zhuocheng Ding 
Signed-off-by: Zhao Liu 
Tested-by: Babu Moger 
---
Changes since v7:
 * Introduced module_id instead of cluster_id.
 * Dropped Michael/Babu's ACKed/Tested tags since the code change.
 * Re-added Yongwei's Tested tag for his re-testing.

Changes since v6:
 * Updated the comment when checking cluster-id. Since there's no
   v8.2, the cluster-id support should at least start from v9.0.

Changes since v5:
 * Updated the comment when checking cluster-id. Since current QEMU is
   v8.2, the cluster-id support should at least start from v8.3.

Changes since v3:
 * Used the imperative in the commit message. (Babu)
---
 hw/i386/x86.c | 33 +
 target/i386/cpu.c |  2 ++
 target/i386/cpu.h |  1 +
 3 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 81f50b5fcd3c..e7d25165a947 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -343,6 +343,14 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 cpu->die_id = 0;
 }
 
+/*
+ * module-id was optional in QEMU 9.0 and older, so keep it optional
+ * if there's only one module per die.
+ */
+if (cpu->module_id < 0 && ms->smp.modules == 1) {
+cpu->module_id = 0;
+}
+
 if (cpu->socket_id < 0) {
 error_setg(errp, "CPU socket-id is not set");
 return;
@@ -359,6 +367,14 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
cpu->die_id, ms->smp.dies - 1);
 return;
 }
+if (cpu->module_id < 0) {
+error_setg(errp, "CPU module-id is not set");
+return;
+} else if (cpu->module_id > ms->smp.modules - 1) {
+error_setg(errp, "Invalid CPU module-id: %u must be in range 0:%u",
+   cpu->module_id, ms->smp.modules - 1);
+return;
+}
 if (cpu->core_id < 0) {
 error_setg(errp, "CPU core-id is not set");
 return;
@@ -378,16 +394,9 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 
 topo_ids.pkg_id = cpu->socket_id;
 topo_ids.die_id = cpu->die_id;
+topo_ids.module_id = cpu->module_id;
 topo_ids.core_id = cpu->core_id;
 topo_ids.smt_id = cpu->thread_id;
-
-/*
- * TODO: This is the temporary initialization for topo_ids.module_id to
- * avoid "maybe-uninitialized" compilation errors. Will remove when
- * X86CPU supports module_id.
- */
-topo_ids.module_id = 0;
-
 cpu->apic_id = x86_apicid_from_topo_ids(&topo_info, &topo_ids);
 }
 
@@ -432,6 +441,14 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 }
 cpu->die_id = topo_ids.die_id;
 
+if (cpu->module_id != -1 && cpu->module_id != topo_ids.module_id) {
+error_setg(errp, "property module-id: %u doesn't match set apic-id:"
+" 0x%x (module-id: %u)", cpu->module_id, cpu->apic_id,
+topo_ids.module_id);
+return;
+}
+cpu->module_id = topo_ids.module_id;
+
 if (cpu->core_id != -1 && cpu->core_id != topo_ids.core_id) {
 error_setg(errp, "property core-id: %u doesn't match set apic-id:"
 " 0x%x (core-id: %u)", cpu->core_id, cpu->apic_id,
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 8dab6d473247..3a500640935f 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7948,12 +7948,14 @@ static Property x86_cpu_properties[] = {
 DEFINE_PROP_UINT32("apic-id", X86CPU, apic_id, 0),
 DEFINE_PROP_INT32("thread-id", X86CPU, thread_id, 0),
 DEFINE_PROP_INT32("core-id", X86CPU, core_id, 0),
+DEFINE_PROP_INT32("module-id", X86CPU, module_id, 0),
 DEFINE_PROP_INT32("die-id", X86CPU, die_id, 0),
 DEFINE_PROP_INT32("socket-id", X86CPU, socket_id, 0),
 #else
 DEFINE_PROP_UINT32("apic-id", X86CPU, apic_id, UNASSIGNED_APIC_ID),
 DEFINE_PROP_INT32("thread-id", X86CPU, thread_id, -1),
 DEFINE_PROP_INT32("core-id", X86CPU, core_id, -1),
+DEFINE_PROP_INT32("module-id", X86CPU, module_id, -1),
 DEFINE_PROP_INT32("die-id", X86CPU, die_id, -1),
 DEFINE_PROP_INT32("socket-id", X86CPU, socket_id, -1),
 #endif
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index c3a83c33345a..cc24bf197de0 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2051,6 +2051,7 @@ struct ArchCPU {
 int32_t node_id; /* NUMA node this CPU belongs to */
 int32_t socket_id;
 int32_t die_id;
+int32_t module_id;
 int32_t core_id;
 int32_t thread_id;
 
-- 
2.34.1




[PATCH v10 13/21] i386: Support modules_per_die in X86CPUTopoInfo

2024-03-21 Thread Zhao Liu
From: Zhao Liu 

Support the module level in the i386 CPU topology structure "X86CPUTopoInfo".

Since x86 does not yet support the "modules" parameter in "-smp",
X86CPUTopoInfo.modules_per_die is currently always 1.

Therefore, the module level width in APIC ID, which can be calculated by
"apicid_bitwidth_for_count(topo_info->modules_per_die)", is always 0 for
now, so we can directly add APIC ID related helpers to support module
level parsing.

In addition, update topology structure in test-x86-topo.c.

Tested-by: Yongwei Ma 
Co-developed-by: Zhuocheng Ding 
Signed-off-by: Zhuocheng Ding 
Signed-off-by: Zhao Liu 
Tested-by: Babu Moger 
---
Changes since v7:
 * Mapped x86 module to smp module instead of cluster.
 * Dropped Michael/Babu's ACKed/Tested tags since the code change.
 * Re-added Yongwei's Tested tag for his re-testing.

Changes since v3:
 * Dropped the description about not exposing the module level in the
   commit message.
 * Updated topology related calculation in newly added helpers:
   num_cpus_by_topo_level() and apicid_offset_by_topo_level().

Changes since v1:
 * Included module level related helpers (apicid_module_width() and
   apicid_module_offset()) in this patch. (Yanan)
---
 hw/i386/x86.c  |  9 +++-
 include/hw/i386/topology.h | 22 +++
 target/i386/cpu.c  | 13 ++-
 tests/unit/test-x86-topo.c | 45 --
 4 files changed, 58 insertions(+), 31 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 7c94d366af03..6df762369c71 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -72,7 +72,14 @@ static void init_topo_info(X86CPUTopoInfo *topo_info,
 MachineState *ms = MACHINE(x86ms);
 
 topo_info->dies_per_pkg = ms->smp.dies;
-topo_info->cores_per_die = ms->smp.cores;
+/*
+ * Though smp.modules means the number of modules in one cluster,
+ * i386 doesn't support cluster level so that the smp.clusters
+ * always defaults to 1, therefore using smp.modules directly is
+ * fine here.
+ */
+topo_info->modules_per_die = ms->smp.modules;
+topo_info->cores_per_module = ms->smp.cores;
 topo_info->threads_per_core = ms->smp.threads;
 }
 
diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
index befeb92b0b19..7622d806932c 100644
--- a/include/hw/i386/topology.h
+++ b/include/hw/i386/topology.h
@@ -56,7 +56,8 @@ typedef struct X86CPUTopoIDs {
 
 typedef struct X86CPUTopoInfo {
 unsigned dies_per_pkg;
-unsigned cores_per_die;
+unsigned modules_per_die;
+unsigned cores_per_module;
 unsigned threads_per_core;
 } X86CPUTopoInfo;
 
@@ -92,7 +93,13 @@ static inline unsigned apicid_smt_width(X86CPUTopoInfo 
*topo_info)
 /* Bit width of the Core_ID field */
 static inline unsigned apicid_core_width(X86CPUTopoInfo *topo_info)
 {
-return apicid_bitwidth_for_count(topo_info->cores_per_die);
+return apicid_bitwidth_for_count(topo_info->cores_per_module);
+}
+
+/* Bit width of the Module_ID field */
+static inline unsigned apicid_module_width(X86CPUTopoInfo *topo_info)
+{
+return apicid_bitwidth_for_count(topo_info->modules_per_die);
 }
 
 /* Bit width of the Die_ID field */
@@ -107,10 +114,16 @@ static inline unsigned apicid_core_offset(X86CPUTopoInfo 
*topo_info)
 return apicid_smt_width(topo_info);
 }
 
+/* Bit offset of the Module_ID field */
+static inline unsigned apicid_module_offset(X86CPUTopoInfo *topo_info)
+{
+return apicid_core_offset(topo_info) + apicid_core_width(topo_info);
+}
+
 /* Bit offset of the Die_ID field */
 static inline unsigned apicid_die_offset(X86CPUTopoInfo *topo_info)
 {
-return apicid_core_offset(topo_info) + apicid_core_width(topo_info);
+return apicid_module_offset(topo_info) + apicid_module_width(topo_info);
 }
 
 /* Bit offset of the Pkg_ID (socket ID) field */
@@ -142,7 +155,8 @@ static inline void x86_topo_ids_from_idx(X86CPUTopoInfo 
*topo_info,
  X86CPUTopoIDs *topo_ids)
 {
 unsigned nr_dies = topo_info->dies_per_pkg;
-unsigned nr_cores = topo_info->cores_per_die;
+unsigned nr_cores = topo_info->cores_per_module *
+topo_info->modules_per_die;
 unsigned nr_threads = topo_info->threads_per_core;
 
 topo_ids->pkg_id = cpu_index / (nr_dies * nr_cores * nr_threads);
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index b8917c412175..7c5c6a0e87a6 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -278,10 +278,11 @@ static uint32_t num_threads_by_topo_level(X86CPUTopoInfo 
*topo_info,
 case CPU_TOPO_LEVEL_CORE:
 return topo_info->threads_per_core;
 case CPU_TOPO_LEVEL_DIE:
-return topo_info->threads_per_core * topo_info->cores_per_die;
+return topo_info->threads_per_core * topo_info->cores_per_module *
+   topo_info->modules_per_die;
 case CPU_TOPO_LEVEL_PACKAGE:
-return topo_info->threads_per_core * topo_info->cores_per_die *
-   topo_info->di

[PATCH v10 03/21] hw/core: Introduce module-id as the topology subindex

2024-03-21 Thread Zhao Liu
From: Zhao Liu 

Add module-id in CpuInstanceProperties, to locate the CPU with module
level.

Suggested-by: Xiaoyao Li 
Tested-by: Yongwei Ma 
Signed-off-by: Zhao Liu 
Tested-by: Babu Moger 
Acked-by: Markus Armbruster 
---
Changes since v7:
 * New commit to introduce module_id to locate the CPU with module
   level.
---
 hw/core/machine-hmp-cmds.c | 4 
 qapi/machine.json  | 4 
 2 files changed, 8 insertions(+)

diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c
index a6ff6a487583..8701f00cc7cc 100644
--- a/hw/core/machine-hmp-cmds.c
+++ b/hw/core/machine-hmp-cmds.c
@@ -87,6 +87,10 @@ void hmp_hotpluggable_cpus(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "cluster-id: \"%" PRIu64 "\"\n",
c->cluster_id);
 }
+if (c->has_module_id) {
+monitor_printf(mon, "module-id: \"%" PRIu64 "\"\n",
+   c->module_id);
+}
 if (c->has_core_id) {
 monitor_printf(mon, "core-id: \"%" PRIu64 "\"\n", c->core_id);
 }
diff --git a/qapi/machine.json b/qapi/machine.json
index 3f6a5af10ba8..366da6244ab6 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -925,6 +925,9 @@
 # @cluster-id: cluster number within the parent container the CPU
 # belongs to (since 7.1)
 #
+# @module-id: module number within the parent container the CPU
+# belongs to (since 9.1)
+#
 # @core-id: core number within the parent container the CPU
 # belongs to
 #
@@ -943,6 +946,7 @@
 '*socket-id': 'int',
 '*die-id': 'int',
 '*cluster-id': 'int',
+'*module-id': 'int',
 '*core-id': 'int',
 '*thread-id': 'int'
   }
-- 
2.34.1




[PATCH v10 01/21] hw/core/machine: Introduce the module as a CPU topology level

2024-03-21 Thread Zhao Liu
From: Zhao Liu 

In x86, module is the topology level above core, which contains a set
of cores that share certain resources (in current products, the resource
usually includes L2 cache, as well as module scoped features and MSRs).

Though smp.clusters could also be used to model the shared L2 cache
resource [1], the following reasons drive us to introduce the new
smp.modules:

  * As the CPU topology abstraction in device tree [2], cluster supports
nesting (though QEMU does not yet support that). In contrast,
(x86) module does not support nesting.

  * Due to nesting, a cluster offers great flexibility in resource
    sharing, rather than being narrowed down to sharing L2 (and L3
    tags) as the lowest topology level that contains cores.

  * Flexible nesting of cluster allows it to correspond to any level
between the x86 package and core.

  * In the Linux kernel, x86's cluster only represents the L2 cache
    domain, but QEMU's smp.clusters is a CPU topology level. The Linux
    kernel will also expose module-level topology information in sysfs
    for x86. To avoid cluster ambiguity and keep a consistent CPU
    topology naming style with the Linux kernel, we introduce the
    module level for x86.

The module is, in existing hardware practice, the lowest layer that
contains the core, while the cluster is able to have a higher
topological scope than the module due to its nesting.

Therefore, place the module between the cluster and the core:

drawer/book/socket/die/cluster/module/core/thread

With the above topological hierarchy order, introduce module level
support in MachineState and MachineClass.

[1]: 
https://lore.kernel.org/qemu-devel/c3d68005-54e0-b8fe-8dc1-5989fe3c7...@huawei.com/
[2]: 
https://www.kernel.org/doc/Documentation/devicetree/bindings/cpu/cpu-topology.txt

Suggested-by: Xiaoyao Li 
Tested-by: Yongwei Ma 
Signed-off-by: Zhao Liu 
Tested-by: Babu Moger 
---
Changes since v8:
 * Add the reason of why a new module level is needed in commit message.
   (Markus).
 * Add the description about how Linux kernel supports x86 module level.
   (Daniel)

Changes since v7:
 * New commit to introduce module level in -smp.
---
 hw/core/machine-smp.c | 2 +-
 hw/core/machine.c | 1 +
 include/hw/boards.h   | 4 
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/core/machine-smp.c b/hw/core/machine-smp.c
index 27864c950766..2e68fcfdfd79 100644
--- a/hw/core/machine-smp.c
+++ b/hw/core/machine-smp.c
@@ -266,7 +266,7 @@ void machine_parse_smp_config(MachineState *ms,
 
 unsigned int machine_topo_get_cores_per_socket(const MachineState *ms)
 {
-return ms->smp.cores * ms->smp.clusters * ms->smp.dies;
+return ms->smp.cores * ms->smp.modules * ms->smp.clusters * ms->smp.dies;
 }
 
 unsigned int machine_topo_get_threads_per_socket(const MachineState *ms)
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 37ede0e7d4fd..fe0579b7a7e9 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -1154,6 +1154,7 @@ static void machine_initfn(Object *obj)
 ms->smp.sockets = 1;
 ms->smp.dies = 1;
 ms->smp.clusters = 1;
+ms->smp.modules = 1;
 ms->smp.cores = 1;
 ms->smp.threads = 1;
 
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 8b8f6d5c00d3..392be94f3cd7 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -143,6 +143,7 @@ typedef struct {
  * provided SMP configuration
  * @books_supported - whether books are supported by the machine
  * @drawers_supported - whether drawers are supported by the machine
+ * @modules_supported - whether modules are supported by the machine
  */
 typedef struct {
 bool prefer_sockets;
@@ -151,6 +152,7 @@ typedef struct {
 bool has_clusters;
 bool books_supported;
 bool drawers_supported;
+bool modules_supported;
 } SMPCompatProps;
 
 /**
@@ -338,6 +340,7 @@ typedef struct DeviceMemoryState {
  * @sockets: the number of sockets in one book
  * @dies: the number of dies in one socket
  * @clusters: the number of clusters in one die
+ * @modules: the number of modules in one cluster
  * @cores: the number of cores in one cluster
  * @threads: the number of threads in one core
  * @max_cpus: the maximum number of logical processors on the machine
@@ -349,6 +352,7 @@ typedef struct CpuTopology {
 unsigned int sockets;
 unsigned int dies;
 unsigned int clusters;
+unsigned int modules;
 unsigned int cores;
 unsigned int threads;
 unsigned int max_cpus;
-- 
2.34.1




[PATCH v10 20/21] i386/cpu: Use CPUCacheInfo.share_level to encode CPUID[4]

2024-03-21 Thread Zhao Liu
From: Zhao Liu 

CPUID[4].EAX[bits 25:14] is used to represent the cache topology for
Intel CPUs.

After cache models have topology information, we can use
CPUCacheInfo.share_level to decide which topology level to encode
into CPUID[4].EAX[bits 25:14].

And since the helper max_thread_ids_for_cache() derives the field
CPUID[4].EAX[bits 25:14] (originally the variable "num_apic_ids") from
the CPU topology levels, which are verified when parsing -smp, there is
no need to check this value with "assert(num_apic_ids > 0)" again, so
remove this assert().

Additionally, wrap the encoding of CPUID[4].EAX[bits 31:26] into a
helper to make the code cleaner.

Tested-by: Yongwei Ma 
Signed-off-by: Zhao Liu 
Tested-by: Babu Moger 
---
Changes since v7:
 * Renamed max_processor_ids_for_cache() to max_thread_ids_for_cache().
   (Xiaoyao)
 * Dropped Michael/Babu's ACKed/Tested tags since the code change.
 * Re-added Yongwei's Tested tag for his re-testing.

Changes since v1:
 * Used "enum CPUTopoLevel share_level" as the parameter in
   max_processor_ids_for_cache().
 * Made cache_into_passthrough case also use
   max_processor_ids_for_cache() and max_core_ids_in_package() to
   encode CPUID[4]. (Yanan)
 * Renamed the title of this patch (the original is "i386: Use
   CPUCacheInfo.share_level to encode CPUID[4].EAX[bits 25:14]").
---
 target/i386/cpu.c | 84 +--
 1 file changed, 45 insertions(+), 39 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 5f6f72fc849f..831957e4a06f 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -235,22 +235,53 @@ static uint8_t cpuid2_cache_descriptor(CPUCacheInfo 
*cache)
((t) == UNIFIED_CACHE) ? CACHE_TYPE_UNIFIED : \
0 /* Invalid value */)
 
+static uint32_t max_thread_ids_for_cache(X86CPUTopoInfo *topo_info,
+ enum CPUTopoLevel share_level)
+{
+uint32_t num_ids = 0;
+
+switch (share_level) {
+case CPU_TOPO_LEVEL_CORE:
+num_ids = 1 << apicid_core_offset(topo_info);
+break;
+case CPU_TOPO_LEVEL_DIE:
+num_ids = 1 << apicid_die_offset(topo_info);
+break;
+case CPU_TOPO_LEVEL_PACKAGE:
+num_ids = 1 << apicid_pkg_offset(topo_info);
+break;
+default:
+/*
+ * Currently there is no use case for SMT and MODULE, so use
+ * assert directly to facilitate debugging.
+ */
+g_assert_not_reached();
+}
+
+return num_ids - 1;
+}
+
+static uint32_t max_core_ids_in_package(X86CPUTopoInfo *topo_info)
+{
+uint32_t num_cores = 1 << (apicid_pkg_offset(topo_info) -
+   apicid_core_offset(topo_info));
+return num_cores - 1;
+}
 
 /* Encode cache info for CPUID[4] */
 static void encode_cache_cpuid4(CPUCacheInfo *cache,
-int num_apic_ids, int num_cores,
+X86CPUTopoInfo *topo_info,
 uint32_t *eax, uint32_t *ebx,
 uint32_t *ecx, uint32_t *edx)
 {
 assert(cache->size == cache->line_size * cache->associativity *
   cache->partitions * cache->sets);
 
-assert(num_apic_ids > 0);
 *eax = CACHE_TYPE(cache->type) |
CACHE_LEVEL(cache->level) |
(cache->self_init ? CACHE_SELF_INIT_LEVEL : 0) |
-   ((num_cores - 1) << 26) |
-   ((num_apic_ids - 1) << 14);
+   (max_core_ids_in_package(topo_info) << 26) |
+   (max_thread_ids_for_cache(topo_info, cache->share_level) << 14);
 
 assert(cache->line_size > 0);
 assert(cache->partitions > 0);
@@ -6244,18 +6275,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
(cpuid2_cache_descriptor(env->cache_info_cpuid2.l1i_cache) <<  
8) |
(cpuid2_cache_descriptor(env->cache_info_cpuid2.l2_cache));
 break;
-case 4: {
-/*
- * CPUID.04H:EAX[bits 25:14]: Maximum number of addressable IDs for
- * logical processors sharing this cache.
- */
-int addressable_threads_width;
-/*
- * CPUID.04H:EAX[bits 31:26]: Maximum number of addressable IDs for
- * processor cores in the physical package.
- */
-int addressable_cores_width;
-
+case 4:
 /* cache info: needed for Core compatibility */
 if (cpu->cache_info_passthrough) {
 x86_cpu_get_cache_cpuid(index, count, eax, ebx, ecx, edx);
@@ -6267,55 +6287,42 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
 
 if (cores_per_pkg > 1) {
-addressable_cores_width = apicid_pkg_offset(&topo_info) -
-  apicid_core_offset(&topo_info);
-
 *eax &= ~0xFC00;
-

[PATCH v10 11/21] i386/cpu: Decouple CPUID[0x1F] subleaf with specific topology level

2024-03-21 Thread Zhao Liu
From: Zhao Liu 

At present, the subleaf 0x02 of CPUID[0x1F] is bound to the "die" level.

In fact, the specific topology level exposed in 0x1F depends on the
platform's support for extension levels (module, tile and die).

To help expose the "module" level in 0x1F, decouple the CPUID[0x1F]
subleaves from specific topology levels.

Tested-by: Yongwei Ma 
Signed-off-by: Zhao Liu 
Tested-by: Babu Moger 
Reviewed-by: Xiaoyao Li 
---
Changes since v9:
 * Combined ecx and edx encoding into the single line. (Xiaoyao)
 * Fixed the comment in encode_topo_cpuid1f(). (Xiaoyao)

Changes since v7:
 * Refactored the encode_topo_cpuid1f() to use traversal to search the
   encoded level and avoid using static variables. (Xiaoyao)
   - Since the total number of levels in the bitmap is not too large,
 the overhead of traversing is supposed to be acceptable.
 * Renamed the variable num_cpus_next_level to num_threads_next_level.
   (Xiaoyao)
 * Renamed the helper num_cpus_by_topo_level() to
   num_threads_by_topo_level(). (Xiaoyao)
 * Dropped Michael/Babu's Acked/Tested tags since the code change.
 * Re-added Yongwei's Tested tag for his re-testing.

Changes since v3:
 * New patch to prepare to expose module level in 0x1F.
 * Moved the CPUTopoLevel enumeration definition from "i386: Add cache
   topology info in CPUCacheInfo" to this patch. Note, to align with
   topology types in SDM, revert the name of CPU_TOPO_LEVEL_UNKNOW to
   CPU_TOPO_LEVEL_INVALID.
---
 target/i386/cpu.c | 135 +-
 1 file changed, 110 insertions(+), 25 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index d030b45f9c3e..92d85e920015 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -269,6 +269,115 @@ static void encode_cache_cpuid4(CPUCacheInfo *cache,
(cache->complex_indexing ? CACHE_COMPLEX_IDX : 0);
 }
 
+static uint32_t num_threads_by_topo_level(X86CPUTopoInfo *topo_info,
+  enum CPUTopoLevel topo_level)
+{
+switch (topo_level) {
+case CPU_TOPO_LEVEL_SMT:
+return 1;
+case CPU_TOPO_LEVEL_CORE:
+return topo_info->threads_per_core;
+case CPU_TOPO_LEVEL_DIE:
+return topo_info->threads_per_core * topo_info->cores_per_die;
+case CPU_TOPO_LEVEL_PACKAGE:
+return topo_info->threads_per_core * topo_info->cores_per_die *
+   topo_info->dies_per_pkg;
+default:
+g_assert_not_reached();
+}
+return 0;
+}
+
+static uint32_t apicid_offset_by_topo_level(X86CPUTopoInfo *topo_info,
+enum CPUTopoLevel topo_level)
+{
+switch (topo_level) {
+case CPU_TOPO_LEVEL_SMT:
+return 0;
+case CPU_TOPO_LEVEL_CORE:
+return apicid_core_offset(topo_info);
+case CPU_TOPO_LEVEL_DIE:
+return apicid_die_offset(topo_info);
+case CPU_TOPO_LEVEL_PACKAGE:
+return apicid_pkg_offset(topo_info);
+default:
+g_assert_not_reached();
+}
+return 0;
+}
+
+static uint32_t cpuid1f_topo_type(enum CPUTopoLevel topo_level)
+{
+switch (topo_level) {
+case CPU_TOPO_LEVEL_INVALID:
+return CPUID_1F_ECX_TOPO_LEVEL_INVALID;
+case CPU_TOPO_LEVEL_SMT:
+return CPUID_1F_ECX_TOPO_LEVEL_SMT;
+case CPU_TOPO_LEVEL_CORE:
+return CPUID_1F_ECX_TOPO_LEVEL_CORE;
+case CPU_TOPO_LEVEL_DIE:
+return CPUID_1F_ECX_TOPO_LEVEL_DIE;
+default:
+/* Other types are not supported in QEMU. */
+g_assert_not_reached();
+}
+return 0;
+}
+
+static void encode_topo_cpuid1f(CPUX86State *env, uint32_t count,
+X86CPUTopoInfo *topo_info,
+uint32_t *eax, uint32_t *ebx,
+uint32_t *ecx, uint32_t *edx)
+{
+X86CPU *cpu = env_archcpu(env);
+unsigned long level, next_level;
+uint32_t num_threads_next_level, offset_next_level;
+
+assert(count + 1 < CPU_TOPO_LEVEL_MAX);
+
+/*
+ * Find the No.(count + 1) topology level in avail_cpu_topo bitmap.
+ * The search starts from bit 1 (CPU_TOPO_LEVEL_INVALID + 1).
+ */
+level = CPU_TOPO_LEVEL_INVALID;
+for (int i = 0; i <= count; i++) {
+level = find_next_bit(env->avail_cpu_topo,
+  CPU_TOPO_LEVEL_PACKAGE,
+  level + 1);
+
+/*
+ * CPUID[0x1f] doesn't explicitly encode the package level,
+ * and it just encodes the invalid level (all fields are 0)
+ * into the last subleaf of 0x1f.
+ */
+if (level == CPU_TOPO_LEVEL_PACKAGE) {
+level = CPU_TOPO_LEVEL_INVALID;
+break;
+}
+}
+
+if (level == CPU_TOPO_LEVEL_INVALID) {
+num_threads_next_level = 0;
+offset_next_level = 0;
+} else {
+next_level = find_next_bit(env->avail_cpu_topo,
+   CPU_TOPO_LEVEL_PACKAGE,
+   le

[PATCH v10 12/21] i386: Introduce module level cpu topology to CPUX86State

2024-03-21 Thread Zhao Liu
From: Zhao Liu 

Intel CPUs implement a module level on hybrid client products (e.g.,
ADL-N, MTL) and on E-core server products.

A module contains a set of cores that share certain resources (in
current products, the resource usually includes L2 cache, as well as
module scoped features and MSRs).

Module level support is the prerequisite for module-level L2 cache
topology. With the module level, we can make the guest's CPU topology
(and, in the future, its cache topology) consistent with the host's on
Intel hybrid client/E-core server platforms.

Tested-by: Yongwei Ma 
Co-developed-by: Zhuocheng Ding 
Signed-off-by: Zhuocheng Ding 
Signed-off-by: Zhao Liu 
Tested-by: Babu Moger 
---
Changes since v7:
 * Mapped x86 module to smp module instead of cluster.
 * Re-wrote the commit message to explain the reason why we needs module
   level.
 * Dropped Michael/Babu's ACKed/Tested tags since the code change.
 * Re-added Yongwei's Tested tag for his re-testing.

Changes since v1:
 * The background of the introduction of the "cluster" parameter and its
   exact meaning were revised according to Yanan's explanation. (Yanan)
---
 hw/i386/x86.c | 5 +
 target/i386/cpu.c | 1 +
 target/i386/cpu.h | 3 +++
 3 files changed, 9 insertions(+)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 0a6c59c724f1..7c94d366af03 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -313,6 +313,11 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 
 init_topo_info(&topo_info, x86ms);
 
+if (ms->smp.modules > 1) {
+env->nr_modules = ms->smp.modules;
+/* TODO: Expose module level in CPUID[0x1F]. */
+}
+
 if (ms->smp.dies > 1) {
 env->nr_dies = ms->smp.dies;
 set_bit(CPU_TOPO_LEVEL_DIE, env->avail_cpu_topo);
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 92d85e920015..b8917c412175 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7717,6 +7717,7 @@ static void x86_cpu_init_default_topo(X86CPU *cpu)
 {
 CPUX86State *env = &cpu->env;
 
+env->nr_modules = 1;
 env->nr_dies = 1;
 
 /* SMT, core and package levels are set by default. */
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 2e24f457468d..095540e58f7a 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1899,6 +1899,9 @@ typedef struct CPUArchState {
 /* Number of dies within this CPU package. */
 unsigned nr_dies;
 
+/* Number of modules within one die. */
+unsigned nr_modules;
+
 /* Bitmap of available CPU topology levels for this CPU. */
 DECLARE_BITMAP(avail_cpu_topo, CPU_TOPO_LEVEL_MAX);
 } CPUX86State;
-- 
2.34.1




[PATCH v10 06/21] i386/cpu: Use APIC ID info to encode cache topo in CPUID[4]

2024-03-21 Thread Zhao Liu
From: Zhao Liu 

According to the cache_info_passthrough fixes ([1], [2]) and the SDM,
CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26] should use the
nearest power-of-2 integer.

The nearest power-of-2 integer can be calculated by pow2ceil() or by
using APIC ID offset/width (like L3 topology using 1 << die_offset [3]).

But in fact, CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26]
are associated with the APIC ID. For example, in the Linux kernel, the
field "num_threads_sharing" (bits 25 - 14) is parsed with the APIC ID.
For another example, on Alder Lake P, CPUID.04H:EAX[bits 31:26] does
not match the actual core count and is calculated by:
"(1 << (pkg_offset - core_offset)) - 1".

Therefore, the APIC ID topology information should be preferred when
calculating the nearest power-of-2 integer for CPUID.04H:EAX[bits
25:14] and CPUID.04H:EAX[bits 31:26]:
1. d/i cache is shared in a core, 1 << core_offset should be used
   instead of "cs->nr_threads" in encode_cache_cpuid4() for
   CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 25:14].
2. L2 cache is assumed to be shared in a core for now, thereby
   1 << core_offset should also be used instead of "cs->nr_threads" in
   encode_cache_cpuid4() for CPUID.04H.02H:EAX[bits 25:14].
3. Similarly, the value for CPUID.04H:EAX[bits 31:26] should also be
   calculated with the bit width between the package and SMT levels in
   the APIC ID (1 << (pkg_offset - core_offset) - 1).

In addition, use APIC ID bits calculations to replace "pow2ceil()" for
cache_info_passthrough case.

[1]: efb3934adf9e ("x86: cpu: make sure number of addressable IDs for processor 
cores meets the spec")
[2]: d7caf13b5fcf ("x86: cpu: fixup number of addressable IDs for logical 
processors sharing cache")
[3]: d65af288a84d ("i386: Update new x86_apicid parsing rules with die_offset 
support")

Fixes: 7e3482f82480 ("i386: Helpers to encode cache information consistently")
Suggested-by: Robert Hoo 
Tested-by: Yongwei Ma 
Signed-off-by: Zhao Liu 
Tested-by: Babu Moger 
---
Changes since v9:
 * Added comments on addressable_threads_width and
   addressable_cores_width. (Xiaoyao)

Changes since v7:
 * Fixed calculations in cache_info_passthrough case. (Xiaoyao)
 * Renamed variables as *_width. (Xiaoyao)
 * Unified variable names for encoding cache_info_passthrough case and
   non-cache_info_passthrough case as addressable_cores_width and
   addressable_threads_width.
 * Fixed typos in commit message. (Xiaoyao)
 * Dropped Michael/Babu's ACKed/Tested tags since the code change.
 * Re-added Yongwei's Tested tag for his re-testing.

Changes since v3:
 * Fixed compile warnings. (Babu)
 * Fixed spelling typo.

Changes since v1:
 * Used APIC ID offset to replace "pow2ceil()" for cache_info_passthrough
   case. (Yanan)
 * Split the L1 cache fix into a separate patch.
 * Renamed the title of this patch (the original is "i386/cpu: Fix number
   of addressable IDs in CPUID.04H").
---
 target/i386/cpu.c | 45 -
 1 file changed, 36 insertions(+), 9 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index eedc2c5ea6e0..0ebacacf2aad 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6014,7 +6014,6 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 {
 X86CPU *cpu = env_archcpu(env);
 CPUState *cs = env_cpu(env);
-uint32_t die_offset;
 uint32_t limit;
 uint32_t signature[3];
 X86CPUTopoInfo topo_info;
@@ -6086,7 +6085,18 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
(cpuid2_cache_descriptor(env->cache_info_cpuid2.l1i_cache) <<  
8) |
(cpuid2_cache_descriptor(env->cache_info_cpuid2.l2_cache));
 break;
-case 4:
+case 4: {
+/*
+ * CPUID.04H:EAX[bits 25:14]: Maximum number of addressable IDs for
+ * logical processors sharing this cache.
+ */
+int addressable_threads_width;
+/*
+ * CPUID.04H:EAX[bits 31:26]: Maximum number of addressable IDs for
+ * processor cores in the physical package.
+ */
+int addressable_cores_width;
+
 /* cache info: needed for Core compatibility */
 if (cpu->cache_info_passthrough) {
 x86_cpu_get_cache_cpuid(index, count, eax, ebx, ecx, edx);
@@ -6098,39 +6108,55 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
 int vcpus_per_socket = cs->nr_cores * cs->nr_threads;
 if (cs->nr_cores > 1) {
+addressable_cores_width = apicid_pkg_offset(&topo_info) -
+  apicid_core_offset(&topo_info);
+
 *eax &= ~0xFC00;
-*eax |= (pow2ceil(cs->nr_cores) - 1) << 26;
+*eax |= ((1 << addressable_cores_width) - 1) << 26;
 }
  

[PATCH v10 21/21] i386/cpu: Use CPUCacheInfo.share_level to encode CPUID[0x8000001D].EAX[bits 25:14]

2024-03-21 Thread Zhao Liu
From: Zhao Liu 

CPUID[0x8000001D].EAX[bits 25:14] NumSharingCache: number of logical
processors sharing cache.

The number of logical processors sharing this cache is
NumSharingCache + 1.

After cache models have topology information, we can use
CPUCacheInfo.share_level to decide which topology level to encode
into CPUID[0x8000001D].EAX[bits 25:14].

Tested-by: Yongwei Ma 
Signed-off-by: Zhao Liu 
Tested-by: Babu Moger 
Reviewed-by: Babu Moger 
---
Changes since v7:
 * Renamed max_processor_ids_for_cache() to max_thread_ids_for_cache().
 * Dropped Michael/Babu's ACKed/Tested tags since the code change.
 * Re-added Yongwei's Tested tag for his re-testing.

Changes since v3:
 * Explained what "CPUID[0x8000001D].EAX[bits 25:14]" means in the
   commit message. (Babu)

Changes since v1:
 * Used cache->share_level as the parameter in
   max_processor_ids_for_cache().
---
 target/i386/cpu.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 831957e4a06f..b7a91c80a271 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -478,20 +478,12 @@ static void encode_cache_cpuid8000001d(CPUCacheInfo 
*cache,
uint32_t *eax, uint32_t *ebx,
uint32_t *ecx, uint32_t *edx)
 {
-uint32_t num_sharing_cache;
 assert(cache->size == cache->line_size * cache->associativity *
   cache->partitions * cache->sets);
 
 *eax = CACHE_TYPE(cache->type) | CACHE_LEVEL(cache->level) |
(cache->self_init ? CACHE_SELF_INIT_LEVEL : 0);
-
-/* L3 is shared among multiple cores */
-if (cache->level == 3) {
-num_sharing_cache = 1 << apicid_die_offset(topo_info);
-} else {
-num_sharing_cache = 1 << apicid_core_offset(topo_info);
-}
-*eax |= (num_sharing_cache - 1) << 14;
+*eax |= max_thread_ids_for_cache(topo_info, cache->share_level) << 14;
 
 assert(cache->line_size > 0);
 assert(cache->partitions > 0);
-- 
2.34.1




[PATCH v10 18/21] hw/i386/pc: Support smp.modules for x86 PC machine

2024-03-21 Thread Zhao Liu
From: Zhao Liu 

Now that module-level topology support is added to X86CPU, we can
enable support for the modules parameter on PC machines. With this
support,
we can define a 5-level x86 CPU topology with "-smp":

-smp cpus=*,maxcpus=*,sockets=*,dies=*,modules=*,cores=*,threads=*.

So, add the 5-level topology example in description of "-smp".

Additionally, add the previously missing drawers and books options to
the example.

Tested-by: Yongwei Ma 
Co-developed-by: Zhuocheng Ding 
Signed-off-by: Zhuocheng Ding 
Signed-off-by: Zhao Liu 
Tested-by: Babu Moger 
Reviewed-by: Babu Moger 
---
Changes since v9:
 * Mentioned the change about adding missed drawers and books. (Babu)

Changes since v8:
 * Added missing "modules" parameter in -smp example.

Changes since v7:
 * Supported modules instead of clusters for PC.
 * Dropped Michael/Babu/Yanan's ACKed/Tested/Reviewed tags since the
   code change.
 * Re-added Yongwei's Tested tag for his re-testing.
---
 hw/i386/pc.c|  1 +
 qemu-options.hx | 18 ++
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index e80f02bef41c..f4e75069d47d 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1831,6 +1831,7 @@ static void pc_machine_class_init(ObjectClass *oc, void 
*data)
 mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE;
 mc->nvdimm_supported = true;
 mc->smp_props.dies_supported = true;
+mc->smp_props.modules_supported = true;
 mc->default_ram_id = "pc.ram";
 pcmc->default_smbios_ep_type = SMBIOS_ENTRY_POINT_TYPE_AUTO;
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 7fd1713fa83c..783df2e53523 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -281,7 +281,8 @@ ERST
 
 DEF("smp", HAS_ARG, QEMU_OPTION_smp,
 "-smp 
[[cpus=]n][,maxcpus=maxcpus][,drawers=drawers][,books=books][,sockets=sockets]\n"
-"   
[,dies=dies][,clusters=clusters][,cores=cores][,threads=threads]\n"
+"   
[,dies=dies][,clusters=clusters][,modules=modules][,cores=cores]\n"
+"   [,threads=threads]\n"
 "set the number of initial CPUs to 'n' [default=1]\n"
 "maxcpus= maximum number of total CPUs, including\n"
 "offline CPUs for hotplug, etc\n"
@@ -290,7 +291,8 @@ DEF("smp", HAS_ARG, QEMU_OPTION_smp,
 "sockets= number of sockets in one book\n"
 "dies= number of dies in one socket\n"
 "clusters= number of clusters in one die\n"
-"cores= number of cores in one cluster\n"
+"modules= number of modules in one cluster\n"
+"cores= number of cores in one module\n"
 "threads= number of threads in one core\n"
 "Note: Different machines may have different subsets of the CPU topology\n"
 "  parameters supported, so the actual meaning of the supported 
parameters\n"
@@ -306,7 +308,7 @@ DEF("smp", HAS_ARG, QEMU_OPTION_smp,
 "  must be set as 1 in the purpose of correct parsing.\n",
 QEMU_ARCH_ALL)
 SRST
-``-smp 
[[cpus=]n][,maxcpus=maxcpus][,sockets=sockets][,dies=dies][,clusters=clusters][,cores=cores][,threads=threads]``
+``-smp 
[[cpus=]n][,maxcpus=maxcpus][,drawers=drawers][,books=books][,sockets=sockets][,dies=dies][,clusters=clusters][,modules=modules][,cores=cores][,threads=threads]``
 Simulate a SMP system with '\ ``n``\ ' CPUs initially present on
 the machine type board. On boards supporting CPU hotplug, the optional
 '\ ``maxcpus``\ ' parameter can be set to enable further CPUs to be
@@ -345,14 +347,14 @@ SRST
 -smp 8,sockets=2,cores=2,threads=2,maxcpus=8
 
 The following sub-option defines a CPU topology hierarchy (2 sockets
-totally on the machine, 2 dies per socket, 2 cores per die, 2 threads
-per core) for PC machines which support sockets/dies/cores/threads.
-Some members of the option can be omitted but their values will be
-automatically computed:
+totally on the machine, 2 dies per socket, 2 modules per die, 2 cores per
+module, 2 threads per core) for PC machines which support sockets/dies
+/modules/cores/threads. Some members of the option can be omitted but
+their values will be automatically computed:
 
 ::
 
--smp 16,sockets=2,dies=2,cores=2,threads=2,maxcpus=16
+-smp 32,sockets=2,dies=2,modules=2,cores=2,threads=2,maxcpus=32
 
 The following sub-option defines a CPU topology hierarchy (2 sockets
 totally on the machine, 2 clusters per socket, 2 cores per cluster,
-- 
2.34.1




[PATCH v10 19/21] i386: Add cache topology info in CPUCacheInfo

2024-03-21 Thread Zhao Liu
From: Zhao Liu 

Currently, by default, the cache topology is encoded as:
1. i/d cache is shared in one core.
2. L2 cache is shared in one core.
3. L3 cache is shared in one die.

This default setting has caused the misunderstanding that the cache
topology is fixed to a specific CPU topology, e.g., that the L2 cache
is always tied to the core level and the L3 cache to the die level.

In fact, the settings of these topologies depend on the specific
platform and are not static. For example, on Alder Lake-P, every
four Atom cores share the same L2 cache.

Thus, we should explicitly define the corresponding cache topology for
different cache models to increase scalability.

Except for legacy_l2_cache_cpuid2 (whose default topology level is
CPU_TOPO_LEVEL_INVALID), explicitly set the corresponding topology level
for all other cache models. In order to be compatible with the existing
cache topology, set the CPU_TOPO_LEVEL_CORE level for the i/d cache, set
the CPU_TOPO_LEVEL_CORE level for L2 cache, and set the
CPU_TOPO_LEVEL_DIE level for L3 cache.

The field for CPUID[4].EAX[bits 25:14] or CPUID[0x8000001D].EAX[bits
25:14] will be set based on CPUCacheInfo.share_level.

Signed-off-by: Zhao Liu 
Tested-by: Babu Moger 
Tested-by: Yongwei Ma 
Acked-by: Michael S. Tsirkin 
---
Changes since v3:
 * Fixed cache topology uninitialization bugs for some AMD CPUs. (Babu)
 * Moved the CPUTopoLevel enumeration definition to the previous 0x1f
   rework patch.

Changes since v1:
 * Added the prefix "CPU_TOPO_LEVEL_*" for CPU topology level names.
   (Yanan)
 * (Revert, pls refer "i386: Decouple CPUID[0x1F] subleaf with specific
   topology level") Renamed the "INVALID" level to CPU_TOPO_LEVEL_UNKNOW.
   (Yanan)
---
 target/i386/cpu.c | 36 
 target/i386/cpu.h |  7 +++
 2 files changed, 43 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 3a500640935f..5f6f72fc849f 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -551,6 +551,7 @@ static CPUCacheInfo legacy_l1d_cache = {
 .sets = 64,
 .partitions = 1,
 .no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
 };
 
 /*FIXME: CPUID leaf 0x8005 is inconsistent with leaves 2 & 4 */
@@ -565,6 +566,7 @@ static CPUCacheInfo legacy_l1d_cache_amd = {
 .partitions = 1,
 .lines_per_tag = 1,
 .no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
 };
 
 /* L1 instruction cache: */
@@ -578,6 +580,7 @@ static CPUCacheInfo legacy_l1i_cache = {
 .sets = 64,
 .partitions = 1,
 .no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
 };
 
 /*FIXME: CPUID leaf 0x8005 is inconsistent with leaves 2 & 4 */
@@ -592,6 +595,7 @@ static CPUCacheInfo legacy_l1i_cache_amd = {
 .partitions = 1,
 .lines_per_tag = 1,
 .no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
 };
 
 /* Level 2 unified cache: */
@@ -605,6 +609,7 @@ static CPUCacheInfo legacy_l2_cache = {
 .sets = 4096,
 .partitions = 1,
 .no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
 };
 
 /*FIXME: CPUID leaf 2 descriptor is inconsistent with CPUID leaf 4 */
@@ -614,6 +619,7 @@ static CPUCacheInfo legacy_l2_cache_cpuid2 = {
 .size = 2 * MiB,
 .line_size = 64,
 .associativity = 8,
+.share_level = CPU_TOPO_LEVEL_INVALID,
 };
 
 
@@ -627,6 +633,7 @@ static CPUCacheInfo legacy_l2_cache_amd = {
 .associativity = 16,
 .sets = 512,
 .partitions = 1,
+.share_level = CPU_TOPO_LEVEL_CORE,
 };
 
 /* Level 3 unified cache: */
@@ -642,6 +649,7 @@ static CPUCacheInfo legacy_l3_cache = {
 .self_init = true,
 .inclusive = true,
 .complex_indexing = true,
+.share_level = CPU_TOPO_LEVEL_DIE,
 };
 
 /* TLB definitions: */
@@ -1940,6 +1948,7 @@ static const CPUCaches epyc_cache_info = {
 .lines_per_tag = 1,
 .self_init = 1,
 .no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
 },
 .l1i_cache = &(CPUCacheInfo) {
 .type = INSTRUCTION_CACHE,
@@ -1952,6 +1961,7 @@ static const CPUCaches epyc_cache_info = {
 .lines_per_tag = 1,
 .self_init = 1,
 .no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
 },
 .l2_cache = &(CPUCacheInfo) {
 .type = UNIFIED_CACHE,
@@ -1962,6 +1972,7 @@ static const CPUCaches epyc_cache_info = {
 .partitions = 1,
 .sets = 1024,
 .lines_per_tag = 1,
+.share_level = CPU_TOPO_LEVEL_CORE,
 },
 .l3_cache = &(CPUCacheInfo) {
 .type = UNIFIED_CACHE,
@@ -1975,6 +1986,7 @@ static const CPUCaches epyc_cache_info = {
 .self_init = true,
 .inclusive = true,
 .complex_indexing = true,
+.share_level = CPU_TOPO_LEVEL_DIE,
 },
 };
 
@@ -1990,6 +2002,7 @@ static CPUCaches epyc_v4_cache_info = {
 .lines_per_tag = 1,
 .self_init = 1,
 .no_invd_sharing = true,
+.share

[PATCH v10 15/21] i386: Support module_id in X86CPUTopoIDs

2024-03-21 Thread Zhao Liu
From: Zhao Liu 

Add module_id member in X86CPUTopoIDs.

module_id can be parsed from the APIC ID, so also update the APIC ID
parsing rule to support the module level. With this support, the
module-level conversions between X86CPUTopoIDs, X86CPUTopoInfo and the
APIC ID are complete.

module_id can also be generated from the CPU topology. Before i386
supports "modules" in smp, the default "modules per die" (modules *
clusters) is only 1, so a module_id generated this way is always 0 and
will not conflict with a module_id parsed from an APIC ID.

Tested-by: Yongwei Ma 
Signed-off-by: Zhuocheng Ding 
Co-developed-by: Zhuocheng Ding 
Signed-off-by: Zhao Liu 
Tested-by: Babu Moger 
---
Changes since v7:
 * Mapped x86 module to the smp module instead of cluster.
 * Dropped Michael/Babu's ACKed/Tested tags since the code changed.
 * Re-added Yongwei's Tested tag for his re-testing.

Changes since v1:
 * Merged the patch "i386: Update APIC ID parsing rule to support module
   level" into this one. (Yanan)
 * Moved the apicid_module_width() and apicid_module_offset() support
   into the previous modules_per_die related patch. (Yanan)
---
 hw/i386/x86.c  | 31 +--
 include/hw/i386/topology.h | 17 +
 2 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index a4da29ec8115..81f50b5fcd3c 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -332,12 +332,9 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 
 /*
  * If APIC ID is not set,
- * set it based on socket/die/core/thread properties.
+ * set it based on socket/die/module/core/thread properties.
  */
 if (cpu->apic_id == UNASSIGNED_APIC_ID) {
-int max_socket = (ms->smp.max_cpus - 1) /
-smp_threads / smp_cores / ms->smp.dies;
-
 /*
  * die-id was optional in QEMU 4.0 and older, so keep it optional
  * if there's only one die per socket.
@@ -349,9 +346,9 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 if (cpu->socket_id < 0) {
 error_setg(errp, "CPU socket-id is not set");
 return;
-} else if (cpu->socket_id > max_socket) {
+} else if (cpu->socket_id > ms->smp.sockets - 1) {
 error_setg(errp, "Invalid CPU socket-id: %u must be in range 0:%u",
-   cpu->socket_id, max_socket);
+   cpu->socket_id, ms->smp.sockets - 1);
 return;
 }
 if (cpu->die_id < 0) {
@@ -383,17 +380,27 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 topo_ids.die_id = cpu->die_id;
 topo_ids.core_id = cpu->core_id;
 topo_ids.smt_id = cpu->thread_id;
+
+/*
+ * TODO: This is the temporary initialization for topo_ids.module_id to
+ * avoid "maybe-uninitialized" compilation errors. Will remove when
+ * X86CPU supports module_id.
+ */
+topo_ids.module_id = 0;
+
 cpu->apic_id = x86_apicid_from_topo_ids(&topo_info, &topo_ids);
 }
 
 cpu_slot = x86_find_cpu_slot(MACHINE(x86ms), cpu->apic_id, &idx);
 if (!cpu_slot) {
 x86_topo_ids_from_apicid(cpu->apic_id, &topo_info, &topo_ids);
+
 error_setg(errp,
-"Invalid CPU [socket: %u, die: %u, core: %u, thread: %u] with"
-" APIC ID %" PRIu32 ", valid index range 0:%d",
-topo_ids.pkg_id, topo_ids.die_id, topo_ids.core_id, topo_ids.smt_id,
-cpu->apic_id, ms->possible_cpus->len - 1);
+"Invalid CPU [socket: %u, die: %u, module: %u, core: %u, thread: %u]"
+" with APIC ID %" PRIu32 ", valid index range 0:%d",
+topo_ids.pkg_id, topo_ids.die_id, topo_ids.module_id,
+topo_ids.core_id, topo_ids.smt_id, cpu->apic_id,
+ms->possible_cpus->len - 1);
 return;
 }
 
@@ -519,6 +526,10 @@ const CPUArchIdList *x86_possible_cpu_arch_ids(MachineState *ms)
 ms->possible_cpus->cpus[i].props.has_die_id = true;
 ms->possible_cpus->cpus[i].props.die_id = topo_ids.die_id;
 }
+if (ms->smp.modules > 1) {
+ms->possible_cpus->cpus[i].props.has_module_id = true;
+ms->possible_cpus->cpus[i].props.module_id = topo_ids.module_id;
+}
 ms->possible_cpus->cpus[i].props.has_core_id = true;
 ms->possible_cpus->cpus[i].props.core_id = topo_ids.core_id;
 ms->possible_cpus->cpus[i].props.has_thread_id = true;
diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
index ea871045779d..dff49fce1154 100644
--- a/include/hw/i386/topology.h
+++ b/include/hw/i386/topology.h
@@ -50,6 +50,7 @@ typedef uint32_t apic_id_t;
 typedef struct X86CPUTopoIDs {
 unsigned pkg_id;
 unsigned die_id;
+unsigned module_id;
 unsigned core_id;
 unsigned smt_id;
 } X86CPUTopoIDs;
@@ -143,6 +144,7 @@ static inline apic_id_t x86_apicid_from_topo_ids(X86CPUTopoInfo *t

Re: qemu fuzz crash in virtio_net_queue_reset()

2024-03-21 Thread Alexander Bulekov
On 240320 0024, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> From fuzzing I've got a fuzz-data, which produces the following crash:
> 
> qemu-fuzz-x86_64: ../hw/net/virtio-net.c:134: void 
> flush_or_purge_queued_packets(NetClientState *): Assertion 
> `!virtio_net_get_subqueue(nc)->async_tx.elem' failed.
> ==2172308== ERROR: libFuzzer: deadly signal
> #0 0x5bd8c748b5a1 in __sanitizer_print_stack_trace 
> (/home/vsementsov/work/src/qemu/yc7-fuzz/build/qemu-fuzz-x86_64+0x26f05a1) 
> (BuildId: b41827f440fd9feaa98c667dbdcc961abb2799ae)
> #1 0x5bd8c73fde38 in fuzzer::PrintStackTrace() 
> (/home/vsementsov/work/src/qemu/yc7-fuzz/build/qemu-fuzz-x86_64+0x2662e38) 
> (BuildId: b41827f440fd9feaa98c667dbdcc961abb2799ae)
> #2 0x5bd8c73e38b3 in fuzzer::Fuzzer::CrashCallback() 
> (/home/settlements/work/src/qemu/yc7-fuzz/build/qemu-fuzz-x86_64+0x26488b3) 
> (BuildId: b41827f440fd9feaa98c667dbdcc961abb2799ae)
> #3 0x739eec84251f  (/lib/x86_64-linux-gnu/libc.so.6+0x4251f) (BuildId: 
> c289da5071a3399de893d2af81d6a30c62646e1e)
> #4 0x739eec8969fb in __pthread_kill_implementation 
> nptl/./nptl/pthread_kill.c:43:17
> #5 0x739eec8969fb in __pthread_kill_internal 
> nptl/./nptl/pthread_kill.c:78:10
> #6 0x739eec8969fb in pthread_kill nptl/./nptl/pthread_kill.c:89:10
> #7 0x739eec842475 in gsignal signal/../sysdeps/posix/raise.c:26:13
> #8 0x739eec8287f2 in abort stdlib/./stdlib/abort.c:79:7
> #9 0x739eec82871a in __assert_fail_base assert/./assert/assert.c:92:3
> #10 0x739eec839e95 in __assert_fail assert/./assert/assert.c:101:3
> #11 0x5bd8c995d9e2 in flush_or_purge_queued_packets 
> /home/vsementsov/work/src/qemu/yc7-fuzz/build/../hw/net/virtio-net.c:134:5
> #12 0x5bd8c9918a5f in virtio_net_queue_reset 
> /home/vsementsov/work/src/qemu/yc7-fuzz/build/../hw/net/virtio-net.c:563:5
> #13 0x5bd8c9b724e5 in virtio_queue_reset 
> /home/vsementsov/work/src/qemu/yc7-fuzz/build/../hw/virtio/virtio.c:2492:9
> #14 0x5bd8c8bcfb7c in virtio_pci_common_write 
> /home/vsementsov/work/src/qemu/yc7-fuzz/build/../hw/virtio/virtio-pci.c:1372:13
> #15 0x5bd8c9e19cf3 in memory_region_write_accessor 
> /home/vsementsov/work/src/qemu/yc7-fuzz/build/../softmmu/memory.c:492:5
> #16 0x5bd8c9e19631 in access_with_adjusted_size 
> /home/vsementsov/work/src/qemu/yc7-fuzz/build/../softmmu/memory.c:554:18
> #17 0x5bd8c9e17f3c in memory_region_dispatch_write 
> /home/vsementsov/work/src/qemu/yc7-fuzz/build/../softmmu/memory.c:1514:16
> #18 0x5bd8c9ea3bbe in flatview_write_continue 
> /home/vsementsov/work/src/qemu/yc7-fuzz/build/../softmmu/physmem.c:2825:23
> #19 0x5bd8c9e91aab in flatview_write 
> /home/vsementsov/work/src/qemu/yc7-fuzz/build/../softmmu/physmem.c:2867:12
> #20 0x5bd8c9e91568 in address_space_write 
> /home/vsementsov/work/src/qemu/yc7-fuzz/build/../softmmu/physmem.c:2963:18
> #21 0x5bd8c74c8a90 in __wrap_qtest_writeq 
> /home/vsementsov/work/src/qemu/yc7-fuzz/build/../tests/qtest/fuzz/qtest_wrappers.c:187:9
> #22 0x5bd8c74dc4da in op_write 
> /home/vsementsov/work/src/qemu/yc7-fuzz/build/../tests/qtest/fuzz/generic_fuzz.c:487:13
> #23 0x5bd8c74d942e in generic_fuzz 
> /home/vsementsov/work/src/qemu/yc7-fuzz/build/../tests/qtest/fuzz/generic_fuzz.c:714:17
> #24 0x5bd8c74c016e in LLVMFuzzerTestOneInput 
> /home/vsementsov/work/src/qemu/yc7-fuzz/build/../tests/qtest/fuzz/fuzz.c:152:5
> #25 0x5bd8c73e4e43 in fuzzer::Fuzzer::ExecuteCallback(unsigned char 
> const*, unsigned long) 
> (/home/vsementsov/work/src/qemu/yc7-fuzz/build/qemu-fuzz-x86_64+0x2649e43) 
> (BuildId: b41827f440fd9feaa98c667dbdcc961abb2799ae)
> #26 0x5bd8c73cebbf in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, 
> unsigned long) 
> (/home/vsementsov/work/src/qemu/yc7-fuzz/build/qemu-fuzz-x86_64+0x2633bbf) 
> (BuildId: b41827f440fd9feaa98c667dbdcc961abb2799ae)
> #27 0x5bd8c73d4916 in fuzzer::FuzzerDriver(int*, char***, int 
> (*)(unsigned char const*, unsigned long)) 
> (/home/vsementsov/work/src/qemu/yc7-fuzz/build/qemu-fuzz-x86_64+0x2639916) 
> (BuildId: b41827f440fd9feaa98c667dbdcc961abb2799ae)
> #28 0x5bd8c73fe732 in main 
> (/home/vsementsov/work/src/qemu/yc7-fuzz/build/qemu-fuzz-x86_64+0x2663732) 
> (BuildId: b41827f440fd9feaa98c667dbdcc961abb2799ae)
> #29 0x739eec829d8f in __libc_start_call_main 
> csu/../sysdeps/nptl/libc_start_call_main.h:58:16
> #30 0x739eec829e3f in __libc_start_main csu/../csu/libc-start.c:392:3
> #31 0x5bd8c73c9484 in _start 
> (/home/vsementsov/work/src/qemu/yc7-fuzz/build/qemu-fuzz-x86_64+0x262e484) 
> (BuildId: b41827f440fd9feaa98c667dbdcc961abb2799ae)
> 
> 
> 

Hello Vladimir,
This looks like a similar crash.
https://gitlab.com/qemu-project/qemu/-/issues/1451

That issue has a qtest reproducer that does not require a fuzzer to
reproduce.

The fuzzer should run fine under gdb. e.g.
gdb ./qemu-fuzz-i386
r  --fuzz-target=generic-fuzz-virtio-net-pci-slirp 
~/generic-fuzz-virtio-net-pci-slirp.crash-7707e14adea6

Re: [PATCH v5 5/7] migration/multifd: implement initialization of qpl compression

2024-03-21 Thread Peter Xu
On Thu, Mar 21, 2024 at 01:37:36AM +, Liu, Yuan1 wrote:
> > -Original Message-
> > From: Peter Xu 
> > Sent: Thursday, March 21, 2024 4:32 AM
> > To: Liu, Yuan1 
> > Cc: Daniel P. Berrangé ; faro...@suse.de; qemu-
> > de...@nongnu.org; hao.xi...@bytedance.com; bryan.zh...@bytedance.com; Zou,
> > Nanhai 
> > Subject: Re: [PATCH v5 5/7] migration/multifd: implement initialization of
> > qpl compression
> > 
> > On Wed, Mar 20, 2024 at 04:23:01PM +, Liu, Yuan1 wrote:
> > > let me explain here, during the decompression operation of IAA, the
> > > decompressed data can be directly output to the virtual address of the
> > > guest memory by IAA hardware.  It can avoid copying the decompressed
> > data
> > > to guest memory by CPU.
> > 
> > I see.
> > 
> > > Without -mem-prealloc, all the guest memory is not populated, and IAA
> > > hardware needs to trigger I/O page fault first and then output the
> > > decompressed data to the guest memory region.  Besides that, CPU page
> > > faults will also trigger IOTLB flush operation when IAA devices use SVM.
> > 
> > Oh so the IAA hardware already can use CPU pgtables?  Nice..
> > 
> > Why IOTLB flush is needed?  AFAIU we're only installing new pages, the
> > request can either come from a CPU access or a DMA.  In all cases there
> > should have no tearing down of an old page.  Isn't an iotlb flush only
> > needed if a tear down happens?
> 
> As far as I know, IAA hardware uses SVM technology to use the CPU's page 
> table 
> for address translation (IOMMU scalable mode directly accesses the CPU page 
> table).
> Therefore, when the CPU page table changes, the device's Invalidation 
> operation needs
> to be triggered to update the IOMMU and the device's cache. 
> 
> My current kernel version is mainline 6.2. The issue I see is as follows:
> --Handle_mm_fault
>  |
>   -- wp_page_copy

This is the CoW path.  Not usual at all..

I assume this issue should only be present on the destination.  Then the
guest pages should be the destination of such DMAs, which means these
should be write faults; and as we see here they are, otherwise a CoW
wouldn't have been triggered.

However it's not clear to me why a pre-installed zero page existed.  It
means someone read the guest pages first.

It might be interesting to know _why_ someone reads the guest pages, even
if we know they're all zeros.  If we can avoid such reads, the pages will
be holes rather than prefaulted zero-page mappings, no invalidations will
be needed, and I expect that should fix the IOTLB storm issue.

It would still be good to fix this first so that qpl is not special in
this regard; the hope is that the migration submodule shouldn't rely on
any pre-configuration (-mem-prealloc) of guest memory behavior to work
properly.

> |
> -- mmu_notifier_invalidate_range
>   |
>   -- intel_invalidate_range
> |
> -- qi_flush_piotlb
> -- qi_flush_dev_iotlb_pasid

-- 
Peter Xu




Re: [RFC PATCH v9 06/23] target/arm: Add support for Non-maskable Interrupt

2024-03-21 Thread Peter Maydell
On Thu, 21 Mar 2024 at 13:10, Jinjie Ruan  wrote:
>
> This only implements the external delivery method via the GICv3.
>
> Signed-off-by: Jinjie Ruan 
> Reviewed-by: Richard Henderson 
> ---
> v9:
> - Update the GPIOs passed in the arm_cpu_kvm_set_irq, and update the comment.
- Do not merge VINMI and VFNMI into EXCP_VNMI.
> - Update VINMI and VFNMI when writing HCR_EL2 or HCRX_EL2.
> v8:
> - Fix the rcu stall after sending a VNMI in qemu VM.
> v7:
> - Add Reviewed-by.
> v6:
> - env->cp15.hcr_el2 -> arm_hcr_el2_eff().
> - env->cp15.hcrx_el2 -> arm_hcrx_el2_eff().
> - Not include VF && VFNMI in CPU_INTERRUPT_VNMI.
> v4:
> - Accept NMI unconditionally for arm_cpu_has_work() but add comment.
> - Change from & to && for EXCP_IRQ or EXCP_FIQ.
- Refactor nmi mask in arm_excp_unmasked().
> - Also handle VNMI in arm_cpu_exec_interrupt() and arm_cpu_set_irq().
> - Rename virtual to Virtual.
> v3:
> - Not include CPU_INTERRUPT_NMI when FEAT_NMI not enabled
> - Add ARM_CPU_VNMI.
- Refactor nmi mask in arm_excp_unmasked().
> - Test SCTLR_ELx.NMI for ALLINT mask for NMI.
> ---
>  target/arm/cpu-qom.h   |   5 +-
>  target/arm/cpu.c   | 124 ++---
>  target/arm/cpu.h   |   6 ++
>  target/arm/helper.c|  33 +--
>  target/arm/internals.h |  18 ++
>  5 files changed, 172 insertions(+), 14 deletions(-)
>
> diff --git a/target/arm/cpu-qom.h b/target/arm/cpu-qom.h
> index 8e032691db..b497667d61 100644
> --- a/target/arm/cpu-qom.h
> +++ b/target/arm/cpu-qom.h
> @@ -36,11 +36,14 @@ DECLARE_CLASS_CHECKERS(AArch64CPUClass, AARCH64_CPU,
>  #define ARM_CPU_TYPE_SUFFIX "-" TYPE_ARM_CPU
>  #define ARM_CPU_TYPE_NAME(name) (name ARM_CPU_TYPE_SUFFIX)
>
> -/* Meanings of the ARMCPU object's four inbound GPIO lines */
> +/* Meanings of the ARMCPU object's seven inbound GPIO lines */
>  #define ARM_CPU_IRQ 0
>  #define ARM_CPU_FIQ 1
>  #define ARM_CPU_VIRQ 2
>  #define ARM_CPU_VFIQ 3
> +#define ARM_CPU_NMI 4
> +#define ARM_CPU_VINMI 5
> +#define ARM_CPU_VFNMI 6
>
>  /* For M profile, some registers are banked secure vs non-secure;
>   * these are represented as a 2-element array where the first element
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index ab8d007a86..f1e7ae0975 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -122,6 +122,13 @@ void arm_restore_state_to_opc(CPUState *cs,
>  }
>  #endif /* CONFIG_TCG */
>
> +/*
> + * With SCTLR_ELx.NMI == 0, IRQ with Superpriority is masked identically with
> + * IRQ without Superpriority. Moreover, if the GIC is configured so that
> + * FEAT_GICv3_NMI is only set if FEAT_NMI is set, then we won't ever see
> + * CPU_INTERRUPT_*NMI anyway. So we might as well accept NMI here
> + * unconditionally.
> + */
>  static bool arm_cpu_has_work(CPUState *cs)
>  {
>  ARMCPU *cpu = ARM_CPU(cs);
> @@ -129,6 +136,7 @@ static bool arm_cpu_has_work(CPUState *cs)
>  return (cpu->power_state != PSCI_OFF)
>  && cs->interrupt_request &
>  (CPU_INTERRUPT_FIQ | CPU_INTERRUPT_HARD
> + | CPU_INTERRUPT_NMI | CPU_INTERRUPT_VINMI | CPU_INTERRUPT_VFNMI
>   | CPU_INTERRUPT_VFIQ | CPU_INTERRUPT_VIRQ | CPU_INTERRUPT_VSERR
>   | CPU_INTERRUPT_EXITTB);
>  }
> @@ -668,6 +676,7 @@ static inline bool arm_excp_unmasked(CPUState *cs, unsigned int excp_idx,
>  CPUARMState *env = cpu_env(cs);
>  bool pstate_unmasked;
>  bool unmasked = false;
> +bool allIntMask = false;
>
>  /*
>   * Don't take exceptions if they target a lower EL.
> @@ -678,13 +687,36 @@ static inline bool arm_excp_unmasked(CPUState *cs, unsigned int excp_idx,
>  return false;
>  }
>
> +if (cpu_isar_feature(aa64_nmi, env_archcpu(env)) &&
> +env->cp15.sctlr_el[target_el] & SCTLR_NMI && cur_el == target_el) {
> +allIntMask = env->pstate & PSTATE_ALLINT ||
> + ((env->cp15.sctlr_el[target_el] & SCTLR_SPINTMASK) &&
> +  (env->pstate & PSTATE_SP));
> +}
> +
>  switch (excp_idx) {
> +case EXCP_NMI:
> +pstate_unmasked = !allIntMask;
> +break;
> +
> +case EXCP_VINMI:
> +if (!(hcr_el2 & HCR_IMO) || (hcr_el2 & HCR_TGE)) {
> +/* VINMIs are only taken when hypervized.  */
> +return false;
> +}
> +return !allIntMask;
> +case EXCP_VFNMI:
> +if (!(hcr_el2 & HCR_FMO) || (hcr_el2 & HCR_TGE)) {
> +/* VFNMIs are only taken when hypervized.  */
> +return false;
> +}
> +return !allIntMask;
>  case EXCP_FIQ:
> -pstate_unmasked = !(env->daif & PSTATE_F);
> +pstate_unmasked = (!(env->daif & PSTATE_F)) && (!allIntMask);
>  break;
>
>  case EXCP_IRQ:
> -pstate_unmasked = !(env->daif & PSTATE_I);
> +pstate_unmasked = (!(env->daif & PSTATE_I)) && (!allIntMask);
>  break;
>
>  case EXCP_VFIQ:
> @@ -692,13 +724,13 @@ static inline bool arm_excp_unmasked(CPUState *cs, 
> uns

[PATCH-for-9.0? 01/21] host/atomic128: Include missing 'qemu/atomic.h' header

2024-03-21 Thread Philippe Mathieu-Daudé
qatomic_cmpxchg__nocheck(), qatomic_read__nocheck(),
qatomic_set__nocheck() are defined in "qemu/atomic.h".
Include it in order to avoid:

  In file included from include/exec/helper-proto.h:10:
  In file included from include/exec/helper-proto-common.h:10:
  In file included from include/qemu/atomic128.h:61:
  In file included from host/include/aarch64/host/atomic128-cas.h:16:
  host/include/generic/host/atomic128-cas.h:23:11: error: call to undeclared function 'qatomic_cmpxchg__nocheck'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
r.i = qatomic_cmpxchg__nocheck(ptr_align, c.i, n.i);
  ^

Signed-off-by: Philippe Mathieu-Daudé 
---
 host/include/generic/host/atomic128-cas.h  | 2 ++
 host/include/generic/host/atomic128-ldst.h | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/host/include/generic/host/atomic128-cas.h b/host/include/generic/host/atomic128-cas.h
index 6b40cc2271..4824f14659 100644
--- a/host/include/generic/host/atomic128-cas.h
+++ b/host/include/generic/host/atomic128-cas.h
@@ -11,6 +11,8 @@
 #ifndef HOST_ATOMIC128_CAS_H
 #define HOST_ATOMIC128_CAS_H
 
+#include "qemu/atomic.h"
+
 #if defined(CONFIG_ATOMIC128)
 static inline Int128 ATTRIBUTE_ATOMIC128_OPT
 atomic16_cmpxchg(Int128 *ptr, Int128 cmp, Int128 new)
diff --git a/host/include/generic/host/atomic128-ldst.h b/host/include/generic/host/atomic128-ldst.h
index 691e6a8531..12e4aca2da 100644
--- a/host/include/generic/host/atomic128-ldst.h
+++ b/host/include/generic/host/atomic128-ldst.h
@@ -11,6 +11,8 @@
 #ifndef HOST_ATOMIC128_LDST_H
 #define HOST_ATOMIC128_LDST_H
 
+#include "qemu/atomic.h"
+
 #if defined(CONFIG_ATOMIC128)
 # define HAVE_ATOMIC128_RO 1
 # define HAVE_ATOMIC128_RW 1
-- 
2.41.0




[PATCH-for-9.1 00/21] target/monitor: Cleanup around hmp_info_tlb()

2024-03-21 Thread Philippe Mathieu-Daudé
Hi,

In [*] I posted preliminary steps to unify hmp_info_tlb()
and hmp_info_mem() after making them per-CPU handlers,
rather than target-specific methods (which break the
single-binary build). Since there is no rush and we need
to figure out the usefulness of 'info tlb/mem' and what
we want to do with it, I dropped that series but salvaged
these cleanup patches.

Regards,

Phil.

[*] https://lore.kernel.org/qemu-devel/20240320164055.60319-1-phi...@linaro.org/

Philippe Mathieu-Daudé (21):
  host/atomic128: Include missing 'qemu/atomic.h' header
  hw/core: Remove check on NEED_CPU_H in tcg-cpu-ops.h
  target/i386: Move APIC related code to cpu-apic.c
  target/i386: Extract x86_dump_mmu() from hmp_info_tlb()
  target/m68k: Replace qemu_printf() by monitor_printf() in monitor
  target/m68k: Have dump_ttr() take a @description argument
  target/m68k: Move MMU monitor commands from helper.c to monitor.c
  target/microblaze: Prefix MMU API with 'mb_'
  target/mips: Prefix MMU API with 'mips_'
  target/nios2: Prefix MMU API with 'nios2_'
  target/nios2: Move monitor commands to monitor.c
  target/nios2: Replace qemu_printf() by monitor_printf() in monitor
  target/ppc: Replace qemu_printf() by monitor_printf() in monitor
  target/sh4: Extract sh4_dump_mmu() from hmp_info_tlb()
  target/sparc: Fix string format errors when DEBUG_MMU is defined
  target/sparc: Replace qemu_printf() by monitor_printf() in monitor
  target/xtensa: Prefix MMU API with 'xtensa_'
  target/xtensa: Extract MMU API to new mmu.c/mmu.h files
  target/xtensa: Simplify dump_mpu() and dump_tlb()
  target/xtensa: Move monitor commands to monitor.c
  target/xtensa: Replace qemu_printf() by monitor_printf() in monitor

 host/include/generic/host/atomic128-cas.h  |2 +
 host/include/generic/host/atomic128-ldst.h |2 +
 include/hw/core/tcg-cpu-ops.h  |2 -
 target/i386/cpu.h  |7 +
 target/m68k/cpu.h  |3 +-
 target/microblaze/mmu.h|   10 +-
 target/mips/tcg/tcg-internal.h |2 +-
 target/nios2/cpu.h |2 +-
 target/nios2/mmu.h |   11 +-
 target/ppc/cpu.h   |2 +-
 target/sh4/cpu.h   |2 +
 target/sparc/cpu.h |2 +-
 target/xtensa/cpu.h|   32 +-
 target/xtensa/mmu.h|   95 ++
 target/i386/cpu-apic.c |  112 +++
 target/i386/cpu-sysemu.c   |   77 --
 target/i386/mmu.c  |  231 +
 target/i386/monitor.c  |  240 -
 target/m68k/helper.c   |  223 -
 target/m68k/monitor.c  |  225 -
 target/microblaze/cpu.c|2 +-
 target/microblaze/helper.c |4 +-
 target/microblaze/mmu.c|   14 +-
 target/microblaze/op_helper.c  |4 +-
 target/mips/cpu.c  |2 +-
 target/mips/tcg/sysemu/tlb_helper.c|2 +-
 target/nios2/cpu.c |2 +-
 target/nios2/helper.c  |4 +-
 target/nios2/mmu.c |   34 +-
 target/nios2/monitor.c |   27 +-
 target/ppc/mmu_common.c|  147 +--
 target/ppc/ppc-qmp-cmds.c  |2 +-
 target/sh4/monitor.c   |   22 +-
 target/sparc/ldst_helper.c |   26 +-
 target/sparc/mmu_helper.c  |  102 +-
 target/sparc/monitor.c |2 +-
 target/xtensa/cpu.c|2 +-
 target/xtensa/mmu.c|  889 +
 target/xtensa/mmu_helper.c | 1037 +---
 target/xtensa/monitor.c|  149 ++-
 target/i386/meson.build|2 +
 target/xtensa/meson.build  |1 +
 42 files changed, 1943 insertions(+), 1815 deletions(-)
 create mode 100644 target/xtensa/mmu.h
 create mode 100644 target/i386/cpu-apic.c
 create mode 100644 target/i386/mmu.c
 create mode 100644 target/xtensa/mmu.c

-- 
2.41.0




[PATCH-for-9.1 03/21] target/i386: Move APIC related code to cpu-apic.c

2024-03-21 Thread Philippe Mathieu-Daudé
Move the APIC-related code, currently split between
cpu-sysemu.c and monitor.c, to cpu-apic.c.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/i386/cpu-apic.c   | 112 +++
 target/i386/cpu-sysemu.c |  77 ---
 target/i386/monitor.c|  25 -
 target/i386/meson.build  |   1 +
 4 files changed, 113 insertions(+), 102 deletions(-)
 create mode 100644 target/i386/cpu-apic.c

diff --git a/target/i386/cpu-apic.c b/target/i386/cpu-apic.c
new file mode 100644
index 00..d397ec94dc
--- /dev/null
+++ b/target/i386/cpu-apic.c
@@ -0,0 +1,112 @@
+/*
+ * QEMU x86 CPU <-> APIC
+ *
+ * Copyright (c) 2003-2004 Fabrice Bellard
+ *
+ * SPDX-License-Identifier: MIT
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/error.h"
+#include "monitor/monitor.h"
+#include "monitor/hmp-target.h"
+#include "sysemu/hw_accel.h"
+#include "sysemu/kvm.h"
+#include "sysemu/xen.h"
+#include "exec/address-spaces.h"
+#include "hw/qdev-properties.h"
+#include "hw/i386/apic_internal.h"
+#include "cpu-internal.h"
+
+APICCommonClass *apic_get_class(Error **errp)
+{
+const char *apic_type = "apic";
+
+/* TODO: in-kernel irqchip for hvf */
+if (kvm_enabled()) {
+if (!kvm_irqchip_in_kernel()) {
+error_setg(errp, "KVM does not support userspace APIC");
+return NULL;
+}
+apic_type = "kvm-apic";
+} else if (xen_enabled()) {
+apic_type = "xen-apic";
+} else if (whpx_apic_in_platform()) {
+apic_type = "whpx-apic";
+}
+
+return APIC_COMMON_CLASS(object_class_by_name(apic_type));
+}
+
+void x86_cpu_apic_create(X86CPU *cpu, Error **errp)
+{
+APICCommonState *apic;
+APICCommonClass *apic_class = apic_get_class(errp);
+
+if (!apic_class) {
+return;
+}
+
+cpu->apic_state = DEVICE(object_new_with_class(OBJECT_CLASS(apic_class)));
+object_property_add_child(OBJECT(cpu), "lapic",
+  OBJECT(cpu->apic_state));
+object_unref(OBJECT(cpu->apic_state));
+
+/* TODO: convert to link<> */
+apic = APIC_COMMON(cpu->apic_state);
+apic->cpu = cpu;
+apic->apicbase = APIC_DEFAULT_ADDRESS | MSR_IA32_APICBASE_ENABLE;
+
+/*
+ * apic_common_set_id needs to check if the CPU has x2APIC
+ * feature in case APIC ID >= 255, so we need to set apic->cpu
+ * before setting APIC ID
+ */
+qdev_prop_set_uint32(cpu->apic_state, "id", cpu->apic_id);
+}
+
+void x86_cpu_apic_realize(X86CPU *cpu, Error **errp)
+{
+APICCommonState *apic;
+static bool apic_mmio_map_once;
+
+if (cpu->apic_state == NULL) {
+return;
+}
+qdev_realize(DEVICE(cpu->apic_state), NULL, errp);
+
+/* Map APIC MMIO area */
+apic = APIC_COMMON(cpu->apic_state);
+if (!apic_mmio_map_once) {
+memory_region_add_subregion_overlap(get_system_memory(),
+apic->apicbase &
+MSR_IA32_APICBASE_BASE,
+&apic->io_memory,
+0x1000);
+apic_mmio_map_once = true;
+ }
+}
+
+void hmp_info_local_apic(Monitor *mon, const QDict *qdict)
+{
+CPUState *cs;
+
+if (qdict_haskey(qdict, "apic-id")) {
+int id = qdict_get_try_int(qdict, "apic-id", 0);
+
+cs = cpu_by_arch_id(id);
+if (cs) {
+cpu_synchronize_state(cs);
+}
+} else {
+cs = mon_get_cpu(mon);
+}
+
+
+if (!cs) {
+monitor_printf(mon, "No CPU available\n");
+return;
+}
+x86_cpu_dump_local_apic_state(cs, CPU_DUMP_FPU);
+}
diff --git a/target/i386/cpu-sysemu.c b/target/i386/cpu-sysemu.c
index 3f9093d285..227ac021f6 100644
--- a/target/i386/cpu-sysemu.c
+++ b/target/i386/cpu-sysemu.c
@@ -19,19 +19,12 @@
 
 #include "qemu/osdep.h"
 #include "cpu.h"
-#include "sysemu/kvm.h"
-#include "sysemu/xen.h"
-#include "sysemu/whpx.h"
 #include "qapi/error.h"
 #include "qapi/qapi-visit-run-state.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qobject-input-visitor.h"
 #include "qom/qom-qobject.h"
 #include "qapi/qapi-commands-machine-target.h"
-#include "hw/qdev-properties.h"
-
-#include "exec/address-spaces.h"
-#include "hw/i386/apic_internal.h"
 
 #include "cpu-internal.h"
 
@@ -273,75 +266,6 @@ void x86_cpu_machine_reset_cb(void *opaque)
 cpu_reset(CPU(cpu));
 }
 
-APICCommonClass *apic_get_class(Error **errp)
-{
-const char *apic_type = "apic";
-
-/* TODO: in-kernel irqchip for hvf */
-if (kvm_enabled()) {
-if (!kvm_irqchip_in_kernel()) {
-error_setg(errp, "KVM does not support userspace APIC");
-return NULL;
-}
-apic_type = "kvm-apic";
-} else if (xen_enabled()) {
-apic_type = "xen-apic";
-} else if (whpx_apic_in_platform()) {
-apic_type = "whpx-apic";
-}
-
-return APIC_COMMON_CLASS(object_class_by_name(apic_type));
-}
-
-v

[PATCH-for-9.1 06/21] target/m68k: Have dump_ttr() take a @description argument

2024-03-21 Thread Philippe Mathieu-Daudé
Slightly simplify dump_mmu() by passing the description as
an argument to dump_ttr().

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/m68k/helper.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/target/m68k/helper.c b/target/m68k/helper.c
index 310e26dfa1..cf9d83e47e 100644
--- a/target/m68k/helper.c
+++ b/target/m68k/helper.c
@@ -578,8 +578,9 @@ static void dump_address_map(Monitor *mon, CPUM68KState *env,
 break; \
 }
 
-static void dump_ttr(Monitor *mon, uint32_t ttr)
+static void dump_ttr(Monitor *mon, const char *desc, uint32_t ttr)
 {
+monitor_printf(mon, "%s: ", desc);
 if ((ttr & M68K_TTR_ENABLED) == 0) {
 monitor_puts(mon, "disabled\n");
 return;
@@ -663,14 +664,10 @@ void dump_mmu(Monitor *mon, CPUM68KState *env)
 monitor_puts(mon, "\n");
 }
 
-monitor_puts(mon, "ITTR0: ");
-dump_ttr(mon, env->mmu.ttr[M68K_ITTR0]);
-monitor_puts(mon, "ITTR1: ");
-dump_ttr(mon, env->mmu.ttr[M68K_ITTR1]);
-monitor_puts(mon, "DTTR0: ");
-dump_ttr(mon, env->mmu.ttr[M68K_DTTR0]);
-monitor_puts(mon, "DTTR1: ");
-dump_ttr(mon, env->mmu.ttr[M68K_DTTR1]);
+dump_ttr(mon, "ITTR0", env->mmu.ttr[M68K_ITTR0]);
+dump_ttr(mon, "ITTR1", env->mmu.ttr[M68K_ITTR1]);
+dump_ttr(mon, "DTTR0", env->mmu.ttr[M68K_DTTR0]);
+dump_ttr(mon, "DTTR1", env->mmu.ttr[M68K_DTTR1]);
 
 monitor_printf(mon, "SRP: 0x%08x\n", env->mmu.srp);
 dump_address_map(mon, env, env->mmu.srp);
-- 
2.41.0




[PATCH-for-9.1 08/21] target/microblaze: Prefix MMU API with 'mb_'

2024-03-21 Thread Philippe Mathieu-Daudé
The MicroBlaze MMU API is exposed in "mmu.h". In order to avoid
name clashes with other targets, prefix the API with 'mb_'.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/microblaze/mmu.h   | 10 +-
 target/microblaze/cpu.c   |  2 +-
 target/microblaze/helper.c|  4 ++--
 target/microblaze/mmu.c   | 14 +++---
 target/microblaze/op_helper.c |  4 ++--
 5 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/target/microblaze/mmu.h b/target/microblaze/mmu.h
index 1068bd2d52..5b51e0a9c6 100644
--- a/target/microblaze/mmu.h
+++ b/target/microblaze/mmu.h
@@ -85,10 +85,10 @@ typedef struct {
 } err;
 } MicroBlazeMMULookup;
 
-unsigned int mmu_translate(MicroBlazeCPU *cpu, MicroBlazeMMULookup *lu,
-   target_ulong vaddr, MMUAccessType rw, int mmu_idx);
-uint32_t mmu_read(CPUMBState *env, bool ea, uint32_t rn);
-void mmu_write(CPUMBState *env, bool ea, uint32_t rn, uint32_t v);
-void mmu_init(MicroBlazeMMU *mmu);
+unsigned int mb_mmu_translate(MicroBlazeCPU *cpu, MicroBlazeMMULookup *lu,
+  target_ulong vaddr, MMUAccessType rw, int mmu_idx);
+uint32_t mb_mmu_read(CPUMBState *env, bool ea, uint32_t rn);
+void mb_mmu_write(CPUMBState *env, bool ea, uint32_t rn, uint32_t v);
+void mb_mmu_init(MicroBlazeMMU *mmu);
 
 #endif
diff --git a/target/microblaze/cpu.c b/target/microblaze/cpu.c
index 96c2b71f7f..59bfb5c45d 100644
--- a/target/microblaze/cpu.c
+++ b/target/microblaze/cpu.c
@@ -205,7 +205,7 @@ static void mb_cpu_reset_hold(Object *obj)
 mb_cpu_write_msr(env, MSR_EE | MSR_IE | MSR_VM | MSR_UM);
 #else
 mb_cpu_write_msr(env, 0);
-mmu_init(&env->mmu);
+mb_mmu_init(&env->mmu);
 #endif
 }
 
diff --git a/target/microblaze/helper.c b/target/microblaze/helper.c
index d25c9eb4d3..961687bae7 100644
--- a/target/microblaze/helper.c
+++ b/target/microblaze/helper.c
@@ -57,7 +57,7 @@ bool mb_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
 return true;
 }
 
-hit = mmu_translate(cpu, &lu, address, access_type, mmu_idx);
+hit = mb_mmu_translate(cpu, &lu, address, access_type, mmu_idx);
 if (likely(hit)) {
 uint32_t vaddr = address & TARGET_PAGE_MASK;
 uint32_t paddr = lu.paddr + vaddr - lu.vaddr;
@@ -238,7 +238,7 @@ hwaddr mb_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
 attrs->secure = mb_cpu_access_is_secure(cpu, MMU_DATA_LOAD);
 
 if (mmu_idx != MMU_NOMMU_IDX) {
-hit = mmu_translate(cpu, &lu, addr, 0, 0);
+hit = mb_mmu_translate(cpu, &lu, addr, 0, 0);
 if (hit) {
 vaddr = addr & TARGET_PAGE_MASK;
 paddr = lu.paddr + vaddr - lu.vaddr;
diff --git a/target/microblaze/mmu.c b/target/microblaze/mmu.c
index 234006634e..5fb8ee8418 100644
--- a/target/microblaze/mmu.c
+++ b/target/microblaze/mmu.c
@@ -74,8 +74,8 @@ static void mmu_change_pid(CPUMBState *env, unsigned int newpid)
 }
 
 /* rw - 0 = read, 1 = write, 2 = fetch.  */
-unsigned int mmu_translate(MicroBlazeCPU *cpu, MicroBlazeMMULookup *lu,
-   target_ulong vaddr, MMUAccessType rw, int mmu_idx)
+unsigned int mb_mmu_translate(MicroBlazeCPU *cpu, MicroBlazeMMULookup *lu,
+  target_ulong vaddr, MMUAccessType rw, int mmu_idx)
 {
 MicroBlazeMMU *mmu = &cpu->env.mmu;
 unsigned int i, hit = 0;
@@ -175,7 +175,7 @@ done:
 }
 
 /* Writes/reads to the MMU's special regs end up here.  */
-uint32_t mmu_read(CPUMBState *env, bool ext, uint32_t rn)
+uint32_t mb_mmu_read(CPUMBState *env, bool ext, uint32_t rn)
 {
 MicroBlazeCPU *cpu = env_archcpu(env);
 unsigned int i;
@@ -228,7 +228,7 @@ uint32_t mmu_read(CPUMBState *env, bool ext, uint32_t rn)
 return r;
 }
 
-void mmu_write(CPUMBState *env, bool ext, uint32_t rn, uint32_t v)
+void mb_mmu_write(CPUMBState *env, bool ext, uint32_t rn, uint32_t v)
 {
 MicroBlazeCPU *cpu = env_archcpu(env);
 uint64_t tmp64;
@@ -304,8 +304,8 @@ void mmu_write(CPUMBState *env, bool ext, uint32_t rn, uint32_t v)
 return;
 }
 
-hit = mmu_translate(cpu, &lu, v & TLB_EPN_MASK,
-0, cpu_mmu_index(env_cpu(env), false));
+hit = mb_mmu_translate(cpu, &lu, v & TLB_EPN_MASK,
+   0, cpu_mmu_index(env_cpu(env), false));
 if (hit) {
 env->mmu.regs[MMU_R_TLBX] = lu.idx;
 } else {
@@ -319,7 +319,7 @@ void mmu_write(CPUMBState *env, bool ext, uint32_t rn, uint32_t v)
}
 }
 
-void mmu_init(MicroBlazeMMU *mmu)
+void mb_mmu_init(MicroBlazeMMU *mmu)
 {
 int i;
 for (i = 0; i < ARRAY_SIZE(mmu->regs); i++) {
diff --git a/target/microblaze/op_helper.c b/target/microblaze/op_helper.c
index f6378030b7..58475a3af5 100644
--- a/target/microblaze/op_helper.c
+++ b/target/microblaze/op_helper.c
@@ -386,12 +386,12 @@ void helper_stackprot(CPUMBState *env, target_ulong addr)
 /* Writes/reads to the MMU's special regs end up here.  */

[PATCH-for-9.1 04/21] target/i386: Extract x86_dump_mmu() from hmp_info_tlb()

2024-03-21 Thread Philippe Mathieu-Daudé
hmp_info_tlb() is specific to tcg/system, so move it to
target/i386/tcg/sysemu/hmp-cmds.c, along with the functions
it depends on (except addr_canonical(), which is exposed in
"cpu.h").
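The addr_canonical() helper being moved sign-extends bit 47 (or bit 56 when CR4.LA57 enables 5-level paging) into the upper bits of a linear address. A minimal Python sketch of that rule, for illustration only and not part of the series:

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def addr_canonical(addr: int, la57: bool = False) -> int:
    """Sign-extend a linear address to 64-bit canonical form.

    With 4-level paging bits 63:48 must equal bit 47; with
    LA57 (5-level paging) bits 63:57 must equal bit 56.
    """
    top = 56 if la57 else 47          # highest implemented address bit
    if addr & (1 << top):
        addr |= (-(1 << (top + 1))) & MASK64   # replicate the sign bit upward
    return addr
```

This mirrors the `if (addr & (1ULL << 47)) addr |= -(1LL << 48);` logic in the C code, with the two's-complement OR masked to 64 bits since Python integers are unbounded.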

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/i386/cpu.h   |   7 ++
 target/i386/mmu.c   | 231 
 target/i386/monitor.c   | 215 -
 target/i386/meson.build |   1 +
 4 files changed, 239 insertions(+), 215 deletions(-)
 create mode 100644 target/i386/mmu.c

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 952174bb6f..055c5b99de 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2342,6 +2342,13 @@ static inline int cpu_mmu_index_kernel(CPUX86State *env)
 return mmu_index_base + mmu_index_32;
 }
 
+#if !defined(CONFIG_USER_ONLY)
+void x86_dump_mmu(Monitor *mon, CPUX86State *env);
+
+/* Perform linear address sign extension */
+hwaddr addr_canonical(CPUArchState *env, hwaddr addr);
+#endif
+
 #define CC_DST  (env->cc_dst)
 #define CC_SRC  (env->cc_src)
 #define CC_SRC2 (env->cc_src2)
diff --git a/target/i386/mmu.c b/target/i386/mmu.c
new file mode 100644
index 00..da9b2263b4
--- /dev/null
+++ b/target/i386/mmu.c
@@ -0,0 +1,231 @@
+/*
+ * QEMU x86 MMU monitor commands
+ *
+ * Copyright (c) 2003-2004 Fabrice Bellard
+ *
+ * SPDX-License-Identifier: MIT
+ */
+
+#include "qemu/osdep.h"
+#include "monitor/monitor.h"
+#include "monitor/hmp-target.h"
+#include "cpu.h"
+
+hwaddr addr_canonical(CPUArchState *env, hwaddr addr)
+{
+#ifdef TARGET_X86_64
+if (env->cr[4] & CR4_LA57_MASK) {
+if (addr & (1ULL << 56)) {
+addr |= (hwaddr)-(1LL << 57);
+}
+} else {
+if (addr & (1ULL << 47)) {
+addr |= (hwaddr)-(1LL << 48);
+}
+}
+#endif
+return addr;
+}
+
+static void print_pte(Monitor *mon, CPUArchState *env, hwaddr addr,
+  hwaddr pte, hwaddr mask)
+{
+addr = addr_canonical(env, addr);
+
+monitor_printf(mon, HWADDR_FMT_plx ": " HWADDR_FMT_plx
+   " %c%c%c%c%c%c%c%c%c\n",
+   addr,
+   pte & mask,
+   pte & PG_NX_MASK ? 'X' : '-',
+   pte & PG_GLOBAL_MASK ? 'G' : '-',
+   pte & PG_PSE_MASK ? 'P' : '-',
+   pte & PG_DIRTY_MASK ? 'D' : '-',
+   pte & PG_ACCESSED_MASK ? 'A' : '-',
+   pte & PG_PCD_MASK ? 'C' : '-',
+   pte & PG_PWT_MASK ? 'T' : '-',
+   pte & PG_USER_MASK ? 'U' : '-',
+   pte & PG_RW_MASK ? 'W' : '-');
+}
+
+static void tlb_info_32(Monitor *mon, CPUArchState *env)
+{
+unsigned int l1, l2;
+uint32_t pgd, pde, pte;
+
+pgd = env->cr[3] & ~0xfff;
+for(l1 = 0; l1 < 1024; l1++) {
+cpu_physical_memory_read(pgd + l1 * 4, &pde, 4);
+pde = le32_to_cpu(pde);
+if (pde & PG_PRESENT_MASK) {
+if ((pde & PG_PSE_MASK) && (env->cr[4] & CR4_PSE_MASK)) {
+/* 4M pages */
+print_pte(mon, env, (l1 << 22), pde, ~((1 << 21) - 1));
+} else {
+for(l2 = 0; l2 < 1024; l2++) {
+cpu_physical_memory_read((pde & ~0xfff) + l2 * 4, &pte, 4);
+pte = le32_to_cpu(pte);
+if (pte & PG_PRESENT_MASK) {
+print_pte(mon, env, (l1 << 22) + (l2 << 12),
+  pte & ~PG_PSE_MASK,
+  ~0xfff);
+}
+}
+}
+}
+}
+}
+
+static void tlb_info_pae32(Monitor *mon, CPUArchState *env)
+{
+unsigned int l1, l2, l3;
+uint64_t pdpe, pde, pte;
+uint64_t pdp_addr, pd_addr, pt_addr;
+
+pdp_addr = env->cr[3] & ~0x1f;
+for (l1 = 0; l1 < 4; l1++) {
+cpu_physical_memory_read(pdp_addr + l1 * 8, &pdpe, 8);
+pdpe = le64_to_cpu(pdpe);
+if (pdpe & PG_PRESENT_MASK) {
+pd_addr = pdpe & 0x3f000ULL;
+for (l2 = 0; l2 < 512; l2++) {
+cpu_physical_memory_read(pd_addr + l2 * 8, &pde, 8);
+pde = le64_to_cpu(pde);
+if (pde & PG_PRESENT_MASK) {
+if (pde & PG_PSE_MASK) {
+/* 2M pages with PAE, CR4.PSE is ignored */
+print_pte(mon, env, (l1 << 30) + (l2 << 21), pde,
+  ~((hwaddr)(1 << 20) - 1));
+} else {
+pt_addr = pde & 0x3f000ULL;
+for (l3 = 0; l3 < 512; l3++) {
+cpu_physical_memory_read(pt_addr + l3 * 8, &pte, 8);
+pte = le64_to_cpu(pte);
+if (pte & PG_PRESENT_MASK) {
+print_pte(mon, env, (l1 << 30) + (l2 << 21)
+  + (l3 << 12),
+  
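The 32-bit walker above, tlb_info_32(), rebuilds a linear address from its page-directory index (`l1 << 22`) and page-table index (`l2 << 12`) before printing it. A small sketch of that split for classic non-PAE 4 KiB paging, using hypothetical helper names:

```python
PAGE_SHIFT = 12   # 4 KiB pages
PDE_SHIFT = 22    # 1024 PDEs x 1024 PTEs in 32-bit non-PAE mode

def split_linear_32(vaddr: int):
    """Split a 32-bit linear address into the (l1, l2) indices
    tlb_info_32() iterates over, plus the page offset."""
    l1 = (vaddr >> PDE_SHIFT) & 0x3FF    # page-directory index
    l2 = (vaddr >> PAGE_SHIFT) & 0x3FF   # page-table index
    offset = vaddr & 0xFFF               # offset inside the 4 KiB page
    return l1, l2, offset

def join_linear_32(l1: int, l2: int, offset: int = 0) -> int:
    """Inverse: rebuild the address print_pte() receives."""
    return (l1 << PDE_SHIFT) | (l2 << PAGE_SHIFT) | offset
```

For 4 MiB PSE pages the walker skips the second level entirely, which is why it prints `l1 << 22` with a `~((1 << 21) - 1)` mask instead of descending into a page table.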

[PATCH-for-9.1 09/21] target/mips: Prefix MMU API with 'mips_'

2024-03-21 Thread Philippe Mathieu-Daudé
The MIPS MMU API declared in tcg-internal.h has public linkage.
To avoid name clashes with other targets, prefix
the API with 'mips_'.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/mips/tcg/tcg-internal.h  | 2 +-
 target/mips/cpu.c   | 2 +-
 target/mips/tcg/sysemu/tlb_helper.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/mips/tcg/tcg-internal.h b/target/mips/tcg/tcg-internal.h
index aef032c48d..2dc9d9100f 100644
--- a/target/mips/tcg/tcg-internal.h
+++ b/target/mips/tcg/tcg-internal.h
@@ -43,7 +43,7 @@ void do_raise_exception(CPUMIPSState *env,
 void mips_cpu_do_interrupt(CPUState *cpu);
 bool mips_cpu_exec_interrupt(CPUState *cpu, int int_req);
 
-void mmu_init(CPUMIPSState *env, const mips_def_t *def);
+void mips_mmu_init(CPUMIPSState *env, const mips_def_t *def);
 
 void update_pagemask(CPUMIPSState *env, target_ulong arg1, int32_t *pagemask);
 
diff --git a/target/mips/cpu.c b/target/mips/cpu.c
index 8d8f690a53..8acf691b0b 100644
--- a/target/mips/cpu.c
+++ b/target/mips/cpu.c
@@ -485,7 +485,7 @@ static void mips_cpu_realizefn(DeviceState *dev, Error **errp)
 env->exception_base = (int32_t)0xBFC0;
 
 #if defined(CONFIG_TCG) && !defined(CONFIG_USER_ONLY)
-mmu_init(env, env->cpu_model);
+mips_mmu_init(env, env->cpu_model);
 #endif
 fpu_init(env, env->cpu_model);
 mvp_init(env);
diff --git a/target/mips/tcg/sysemu/tlb_helper.c b/target/mips/tcg/sysemu/tlb_helper.c
index 119eae771e..0167b1162f 100644
--- a/target/mips/tcg/sysemu/tlb_helper.c
+++ b/target/mips/tcg/sysemu/tlb_helper.c
@@ -464,7 +464,7 @@ static void r4k_mmu_init(CPUMIPSState *env, const mips_def_t *def)
 env->tlb->helper_tlbinvf = r4k_helper_tlbinvf;
 }
 
-void mmu_init(CPUMIPSState *env, const mips_def_t *def)
+void mips_mmu_init(CPUMIPSState *env, const mips_def_t *def)
 {
 env->tlb = g_malloc0(sizeof(CPUMIPSTLBContext));
 
-- 
2.41.0



