date:20250302

Re: [PATCH v3 1/3] target/riscv/cpu: remove unneeded !kvm_enabled() check

2025-03-02 Thread Alistair Francis

On Mon, Feb 24, 2025 at 10:33 PM Daniel Henrique Barboza
 wrote:
>
> Remove the !kvm_enabled() check in kvm_riscv_reset_vcpu() since the
> function is already being gated by kvm_enabled() in
> riscv_cpu_reset_hold().
>
> Signed-off-by: Daniel Henrique Barboza 
> Reviewed-by: Andrew Jones 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/kvm/kvm-cpu.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c
> index 471fd554b3..19bb87515b 100644
> --- a/target/riscv/kvm/kvm-cpu.c
> +++ b/target/riscv/kvm/kvm-cpu.c
> @@ -1606,9 +1606,6 @@ void kvm_riscv_reset_vcpu(RISCVCPU *cpu)
>  CPURISCVState *env = &cpu->env;
>  int i;
>
> -if (!kvm_enabled()) {
> -return;
> -}
>  for (i = 0; i < 32; i++) {
>  env->gpr[i] = 0;
>  }
> --
> 2.48.1
>
>

Re: [PATCH v3 0/3] target/riscv/kvm: reset time changes

2025-03-02 Thread Alistair Francis

On Mon, Feb 24, 2025 at 10:32 PM Daniel Henrique Barboza
 wrote:
>
> Hi,
>
> In this version I rolled back on the riscv_cpu_reset_hold() changes made
> in patch 1. Peter made an argument about keeping the design the same
> across architectures and I agreed. Patches 2 and 3 are already taking
> care of everything that KVM needs during reset, thus this doesn't incur
> in any functional changes.
>
> Drew said it was ok to keep his ack on patch 1 so his ack is kept.
>
> Patches based on alistair/riscv-to-apply.next. All patches acked.
>
> Changes from v2:
> - patch 1:
>   - do not do an early exit in riscv_cpu_reset_hold() if kvm_enabled()
> - v2 link: 
> https://lore.kernel.org/qemu-riscv/20250221122623.495188-1-dbarb...@ventanamicro.com/
>
>
> Daniel Henrique Barboza (3):
>   target/riscv/cpu: remove unneeded !kvm_enabled() check
>   target/riscv/kvm: add kvm_riscv_reset_regs_csr()
>   target/riscv/kvm: add missing KVM CSRs

Thanks!

Applied to riscv-to-apply.next

Alistair

>
>  target/riscv/kvm/kvm-cpu.c | 32 +---
>  1 file changed, 21 insertions(+), 11 deletions(-)
>
> --
> 2.48.1
>
>

Re: [PATCH v5 34/36] vfio/migration: Max in-flight VFIO device state buffer count limit

2025-03-02 Thread Avihai Horon




On 02/03/2025 16:59, Maciej S. Szmigiero wrote:

External email: Use caution opening links or attachments


On 2.03.2025 15:54, Maciej S. Szmigiero wrote:

On 2.03.2025 15:53, Avihai Horon wrote:


On 19/02/2025 22:34, Maciej S. Szmigiero wrote:

External email: Use caution opening links or attachments


From: "Maciej S. Szmigiero" 

Allow capping the maximum count of in-flight VFIO device state buffers
queued at the destination, otherwise a malicious QEMU source could
theoretically cause the target QEMU to allocate unlimited amounts 
of memory

for buffers-in-flight.


I still think it's better to limit the number of bytes rather than 
number of buffers:
1. To the average user the number of buffers doesn't really mean 
anything. They have to open the code and see what is the size of a 
single buffer and then choose their value.
2. Currently VFIO migration buffer size is 1MB. If later it's 
changed to 2MB for example, users will have to adjust their 
configuration accordingly. With number of bytes, the configuration 
remains the same no matter what is the VFIO migration buffer size.


Sorry Avihai, but we're a little more than week from code freeze
so it's really not a time for more than cosmetic changes.


And if you really, really want to have queued buffers size limit
that's something could be added later as additional
x-migration-max-queued-buffers-size or something property
since these limits aren't exclusive.


Sure, I agree.
It's not urgent nor mandatory for now, I just wanted to express my 
opinion :)


Thanks.


Thanks,
Maciej

[PATCH v6 2/2] target/loongarch: check tlb_ps

2025-03-02 Thread Song Gao

For LoongArch th min tlb_ps is 12(4KB), for TLB code,
the tlb_ps may be 0,this may case UndefinedBehavior
Add a check-tlb_ps fuction to check tlb_ps,
to make sure the tlb_ps is avalablie. we check tlb_ps
when get the tlb_ps from tlb->misc or CSR bits.
1. cpu reset
   set CSR_PWCL.PTBASE and CSR_STLBPS.PS bits a default value
   from CSR_PRCFG2;
2. tlb instructions.
   some tlb instructions get  the tlb_ps from tlb->misc but the
   value may  has been initialized to 0. we need just check the tlb_ps
   skip the function and write a guest log.
3. csrwr instructions.
   to make sure CSR_PWCL.PTBASE and CSR_STLBPS.PS bits are avalable,
   cheke theses bits and set a default value from CSR_PRCFG2.

Signed-off-by: Song Gao 
---
 target/loongarch/cpu.c| 11 +--
 target/loongarch/helper.h |  1 +
 target/loongarch/internals.h  |  2 ++
 target/loongarch/tcg/csr_helper.c | 30 ++-
 .../tcg/insn_trans/trans_privileged.c.inc |  1 +
 target/loongarch/tcg/tlb_helper.c | 28 +++--
 6 files changed, 67 insertions(+), 6 deletions(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index e91f4a5239..4d81dcc746 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -537,6 +537,7 @@ static void loongarch_max_initfn(Object *obj)
 
 static void loongarch_cpu_reset_hold(Object *obj, ResetType type)
 {
+uint8_t tlb_ps;
 CPUState *cs = CPU(obj);
 LoongArchCPUClass *lacc = LOONGARCH_CPU_GET_CLASS(obj);
 CPULoongArchState *env = cpu_env(cs);
@@ -585,13 +586,17 @@ static void loongarch_cpu_reset_hold(Object *obj, 
ResetType type)
  */
 env->CSR_PGDH = 0;
 env->CSR_PGDL = 0;
-env->CSR_PWCL = 0;
 env->CSR_PWCH = 0;
-env->CSR_STLBPS = 0;
 env->CSR_EENTRY = 0;
 env->CSR_TLBRENTRY = 0;
 env->CSR_MERRENTRY = 0;
-
+/* set CSR_PWCL.PTBASE and CSR_STLBPS.PS bits from CSR_PRCFG2 */
+if (env->CSR_PRCFG2 == 0) {
+env->CSR_PRCFG2 = 0x3f000;
+}
+tlb_ps = clz32(env->CSR_PRCFG2);
+env->CSR_STLBPS = FIELD_DP64(env->CSR_STLBPS, CSR_STLBPS, PS, tlb_ps);
+env->CSR_PWCL = FIELD_DP64(env->CSR_PWCL, CSR_PWCL, PTBASE, tlb_ps);
 for (n = 0; n < 4; n++) {
 env->CSR_DMW[n] = FIELD_DP64(env->CSR_DMW[n], CSR_DMW, PLV0, 0);
 env->CSR_DMW[n] = FIELD_DP64(env->CSR_DMW[n], CSR_DMW, PLV1, 0);
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 943517b5f2..1d5cb0198c 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -100,6 +100,7 @@ DEF_HELPER_1(rdtime_d, i64, env)
 DEF_HELPER_1(csrrd_pgd, i64, env)
 DEF_HELPER_1(csrrd_cpuid, i64, env)
 DEF_HELPER_1(csrrd_tval, i64, env)
+DEF_HELPER_2(csrwr_stlbps, i64, env, tl)
 DEF_HELPER_2(csrwr_estat, i64, env, tl)
 DEF_HELPER_2(csrwr_asid, i64, env, tl)
 DEF_HELPER_2(csrwr_tcfg, i64, env, tl)
diff --git a/target/loongarch/internals.h b/target/loongarch/internals.h
index 7b254c5f49..1cd959a766 100644
--- a/target/loongarch/internals.h
+++ b/target/loongarch/internals.h
@@ -43,6 +43,8 @@ enum {
 TLBRET_PE = 7,
 };
 
+bool check_ps(CPULoongArchState *ent, int ps);
+
 extern const VMStateDescription vmstate_loongarch_cpu;
 
 void loongarch_cpu_set_irq(void *opaque, int irq, int level);
diff --git a/target/loongarch/tcg/csr_helper.c 
b/target/loongarch/tcg/csr_helper.c
index 6c95be9910..2ede9eaf79 100644
--- a/target/loongarch/tcg/csr_helper.c
+++ b/target/loongarch/tcg/csr_helper.c
@@ -17,6 +17,27 @@
 #include "hw/irq.h"
 #include "cpu-csr.h"
 
+
+
+target_ulong helper_csrwr_stlbps(CPULoongArchState *env, target_ulong val)
+{
+int64_t old_v = env->CSR_STLBPS;
+uint8_t default_ps = ctz32(env->CSR_PRCFG2);
+
+/*
+ * The real hardware only supports the min tlb_ps is 12
+ * tlb_ps=0 may cause undefined-behavior.
+ */
+uint8_t tlb_ps = FIELD_EX64(env->CSR_STLBPS, CSR_STLBPS, PS);
+if (!check_ps(env, tlb_ps)) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "Attempted set ps %d\n", tlb_ps);
+val = FIELD_DP64(val, CSR_STLBPS, PS, default_ps);
+}
+env->CSR_STLBPS = val;
+return old_v;
+}
+
 target_ulong helper_csrrd_pgd(CPULoongArchState *env)
 {
 int64_t v;
@@ -99,19 +120,26 @@ target_ulong helper_csrwr_ticlr(CPULoongArchState *env, 
target_ulong val)
 
 target_ulong helper_csrwr_pwcl(CPULoongArchState *env, target_ulong val)
 {
-int shift;
+int shift, ptbase;
 int64_t old_v = env->CSR_PWCL;
+uint8_t default_ps = ctz32(env->CSR_PRCFG2);
 
 /*
  * The real hardware only supports 64bit PTE width now, 128bit or others
  * treated as illegal.
  */
 shift = FIELD_EX64(val, CSR_PWCL, PTEWIDTH);
+ptbase = FIELD_EX64(val, CSR_PWCL, PTBASE);
 if (shift) {
 qemu_log_mask(LOG_GUEST_ERROR,
   "Attempted set pte width with %d bit\n", 64 << shift);
 val = FIELD_DP64(val, CSR_PWCL, PTEWIDTH, 0);
 }
+if (!c

[PATCH v6 1/2] target/loongarch: fix 'make check-functional' failed

2025-03-02 Thread Song Gao

 some tlb instructions get  the tlb_ps from tlb->misc but the
 value may has been initialized to 0,just check the tlb_ps skip
 the function and write a log.

Signed-off-by: Song Gao 
Reviewed-by: Bibo Mao 
---
 target/loongarch/tcg/tlb_helper.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/target/loongarch/tcg/tlb_helper.c 
b/target/loongarch/tcg/tlb_helper.c
index a323606e5a..112556301f 100644
--- a/target/loongarch/tcg/tlb_helper.c
+++ b/target/loongarch/tcg/tlb_helper.c
@@ -427,7 +427,11 @@ void helper_invtlb_page_asid(CPULoongArchState *env, 
target_ulong info,
 uint16_t tlb_asid = FIELD_EX64(tlb->tlb_misc, TLB_MISC, ASID);
 uint64_t vpn, tlb_vppn;
 uint8_t tlb_ps, compare_shift;
+uint8_t tlb_e = FIELD_EX64(tlb->tlb_misc, TLB_MISC, E);
 
+if (!tlb_e) {
+continue;
+}
 if (i >= LOONGARCH_STLB) {
 tlb_ps = FIELD_EX64(tlb->tlb_misc, TLB_MISC, PS);
 } else {
@@ -456,7 +460,11 @@ void helper_invtlb_page_asid_or_g(CPULoongArchState *env,
 uint16_t tlb_asid = FIELD_EX64(tlb->tlb_misc, TLB_MISC, ASID);
 uint64_t vpn, tlb_vppn;
 uint8_t tlb_ps, compare_shift;
+uint8_t tlb_e = FIELD_EX64(tlb->tlb_misc, TLB_MISC, E);
 
+if (!tlb_e) {
+continue;
+}
 if (i >= LOONGARCH_STLB) {
 tlb_ps = FIELD_EX64(tlb->tlb_misc, TLB_MISC, PS);
 } else {
-- 
2.34.1

[PATCH v6 0/2] target/loongarch: fix 'make check-functional' failed

2025-03-02 Thread Song Gao

 Some tlb instructions get the tlb_ps from tlb->misc but the
 value may has been initialized to 0,just check the tlb_e skip
 the function and check_tlb_ps write a log.

For LoongArch th min tlb_ps is 12(4KB), for TLB code,
the tlb_ps may be 0,this may case UndefinedBehavior
Add a check-tlb_ps fuction to check tlb_ps,
to make sure the tlb_ps is avalablie. we check tlb_ps
when get the tlb_ps from tlb->misc or CSR bits.
1. cpu reset.
   set CSR_PWCL.PTBASE and CSR_STLBPS.PS bits a default value
   from CSR_PRCFG2;
2. tlb instructions.
   some tlb instructions get  the tlb_ps from tlb->misc but the
   value may  has been initialized to 0. we need just check the tlb_e
   skip the function and check tlb_ps write a guest log.
3. csrwr instructions.
   to make sure CSR_PWCL.PTBASE and CSR_STLBPS.PS bits are avalable,
   cheke theses bits and set a default value from CSR_PRCFG2.


v6:
  1 clean code. fix code style.
  2 rebase and R-b.

V5:
  1 Add add chek_ps() function to check tlb_ps with CSR_PRCFG2;
  2 Some tlb instuctions, just check tlb_ps, do't rewrite the tlb_ps
  bits  just write a guest log;
  3 remove csrwr crmd helper function and crmd check PWCL and check 
CSR_STLBPS.just
chcek PWCL and STLBPS when csrwr CSR_PWCL and CSR_STLBPS;
  4 set CSR_PWCL.PTBASE and CSR_STLBPS.PS bits a default value
   from CSR_PRCFG2 when cpu reset.
V4:
1.Get the default tlb_ps value from env->CSR_PRCFG2.
2.Some tlb  instrucions check the tlb_ps such as tlbfill/tlbwr/invtlb.
3.check_tlb_ps()just check CSR_PWCL.PTBASE bits and CSR_STLBPS.PS bits.
don't check all tlb->misc.

v3:
remove some tlb instruction chek MMU on PG model, because on DA model
also can use tlb instructions.

v2:
check-tlb_ps when write CSR_PWCL and CSR_STLBPS;
some tlb instructions check CRMD PG model when clear/read/write the tlb.
link to patch: 
https://patchew.org/QEMU/20250220012226.2182174-1-gaos...@loongson.cn/

Thanks.
Song Gao

Song Gao (2):
  target/loongarch: fix 'make check-functional' failed
  target/loongarch: check tlb_ps

 target/loongarch/cpu.c| 10 +++-
 target/loongarch/cpu_helper.c |  8 ++-
 target/loongarch/helper.h |  1 +
 target/loongarch/internals.h  |  2 +
 target/loongarch/tcg/csr_helper.c | 30 +-
 .../tcg/insn_trans/trans_privileged.c.inc |  1 +
 target/loongarch/tcg/tlb_helper.c | 55 ++-
 7 files changed, 99 insertions(+), 8 deletions(-)

-- 
2.34.1

Re: [PATCH v5 23/36] vfio/migration: Multifd device state transfer support - VFIOStateBuffer(s)

2025-03-02 Thread Avihai Horon




On 19/02/2025 22:34, Maciej S. Szmigiero wrote:

External email: Use caution opening links or attachments


From: "Maciej S. Szmigiero" 

Add VFIOStateBuffer(s) types and the associated methods.

These store received device state buffers and config state waiting to get
loaded into the device.

Signed-off-by: Maciej S. Szmigiero 
---
  hw/vfio/migration-multifd.c | 54 +
  1 file changed, 54 insertions(+)

diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
index 0c3185a26242..760b110a39b9 100644
--- a/hw/vfio/migration-multifd.c
+++ b/hw/vfio/migration-multifd.c
@@ -29,3 +29,57 @@ typedef struct VFIODeviceStatePacket {
  uint32_t flags;
  uint8_t data[0];
  } QEMU_PACKED VFIODeviceStatePacket;
+
+/* type safety */
+typedef struct VFIOStateBuffers {
+GArray *array;
+} VFIOStateBuffers;
+
+typedef struct VFIOStateBuffer {
+bool is_present;
+char *data;
+size_t len;
+} VFIOStateBuffer;
+
+static void vfio_state_buffer_clear(gpointer data)
+{
+VFIOStateBuffer *lb = data;
+
+if (!lb->is_present) {
+return;
+}
+
+g_clear_pointer(&lb->data, g_free);
+lb->is_present = false;
+}
+
+static void vfio_state_buffers_init(VFIOStateBuffers *bufs)
+{
+bufs->array = g_array_new(FALSE, TRUE, sizeof(VFIOStateBuffer));
+g_array_set_clear_func(bufs->array, vfio_state_buffer_clear);
+}
+
+static void vfio_state_buffers_destroy(VFIOStateBuffers *bufs)
+{
+g_clear_pointer(&bufs->array, g_array_unref);
+}
+
+static void vfio_state_buffers_assert_init(VFIOStateBuffers *bufs)
+{
+assert(bufs->array);
+}
+
+static guint vfio_state_buffers_size_get(VFIOStateBuffers *bufs)
+{
+return bufs->array->len;
+}
+
+static void vfio_state_buffers_size_set(VFIOStateBuffers *bufs, guint size)
+{
+g_array_set_size(bufs->array, size);
+}
+
+static VFIOStateBuffer *vfio_state_buffers_at(VFIOStateBuffers *bufs, guint 
idx)
+{
+return &g_array_index(bufs->array, VFIOStateBuffer, idx);
+}


This patch breaks compilation as non of the functions are used, e.g.: 
error: ‘vfio_state_buffers_init’ defined but not used I can think of 
three options to solve it: 1. Move these functions to their own file and 
export them, e.g., hw/vfio/state-buffer.{c,h}. But this seems like an 
overkill for such a small API. 2. Add __attribute__((unused)) tags and 
remove them in patch #26 where the functions are actually used. A bit 
ugly. 3. Squash this patch into patch #26. I prefer option 3 as this is 
a small API closely related to patch #26 (and patch #26 will still 
remain rather small).


Thanks.

[PATCH 39/39] target/hexagon: Add pcycle setting functionality

2025-03-02 Thread Brian Cain

Signed-off-by: Brian Cain 
Signed-off-by: Matheus Tavares Bernardino 
---
 target/hexagon/cpu.c| 10 +++---
 target/hexagon/cpu_helper.c | 17 ++---
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/target/hexagon/cpu.c b/target/hexagon/cpu.c
index 80f5e23794..4ca6add834 100644
--- a/target/hexagon/cpu.c
+++ b/target/hexagon/cpu.c
@@ -440,19 +440,23 @@ static void hexagon_cpu_realize(DeviceState *dev, Error 
**errp)
 #endif
 
 qemu_init_vcpu(cs);
-#ifndef CONFIG_USER_ONLY
 CPUHexagonState *env = cpu_env(cs);
+#ifndef CONFIG_USER_ONLY
 hex_mmu_realize(env);
 if (cs->cpu_index == 0) {
 env->g_sreg = g_new0(target_ulong, NUM_SREGS);
-env->g_pcycle_base = g_malloc0(sizeof(*env->g_pcycle_base));
 } else {
 CPUState *cpu0 = qemu_get_cpu(0);
 CPUHexagonState *env0 = cpu_env(cpu0);
 env->g_sreg = env0->g_sreg;
-env->g_pcycle_base = env0->g_pcycle_base;
 }
 #endif
+if (cs->cpu_index == 0) {
+env->g_pcycle_base = g_malloc0(sizeof(*env->g_pcycle_base));
+} else {
+CPUState *cpu0 = qemu_get_cpu(0);
+env->g_pcycle_base = cpu_env(cpu0)->g_pcycle_base;
+}
 
 mcc->parent_realize(dev, errp);
 }
diff --git a/target/hexagon/cpu_helper.c b/target/hexagon/cpu_helper.c
index 9c44cb7950..08c749e9fa 100644
--- a/target/hexagon/cpu_helper.c
+++ b/target/hexagon/cpu_helper.c
@@ -70,18 +70,29 @@ uint32_t hexagon_get_sys_pcycle_count_low(CPUHexagonState 
*env)
 void hexagon_set_sys_pcycle_count_high(CPUHexagonState *env,
 uint32_t cycles_hi)
 {
-g_assert_not_reached();
+uint64_t cur_cycles = hexagon_get_sys_pcycle_count(env);
+uint64_t cycles =
+((uint64_t)cycles_hi << 32) | extract64(cur_cycles, 0, 32);
+hexagon_set_sys_pcycle_count(env, cycles);
 }
 
 void hexagon_set_sys_pcycle_count_low(CPUHexagonState *env,
 uint32_t cycles_lo)
 {
-g_assert_not_reached();
+uint64_t cur_cycles = hexagon_get_sys_pcycle_count(env);
+uint64_t cycles = extract64(cur_cycles, 32, 32) | cycles_lo;
+hexagon_set_sys_pcycle_count(env, cycles);
 }
 
 void hexagon_set_sys_pcycle_count(CPUHexagonState *env, uint64_t cycles)
 {
-g_assert_not_reached();
+*(env->g_pcycle_base) = cycles;
+
+CPUState *cs;
+CPU_FOREACH(cs) {
+CPUHexagonState *env_ = cpu_env(cs);
+env_->t_cycle_count = 0;
+}
 }
 
 static void set_wait_mode(CPUHexagonState *env)
-- 
2.34.1

Re: [PATCH v5 14/36] migration/multifd: Device state transfer support - send side

2025-03-02 Thread Avihai Horon




On 19/02/2025 22:33, Maciej S. Szmigiero wrote:

External email: Use caution opening links or attachments


From: "Maciej S. Szmigiero" 

A new function multifd_queue_device_state() is provided for device to queue
its state for transmission via a multifd channel.

Reviewed-by: Peter Xu 
Signed-off-by: Maciej S. Szmigiero 
---
  include/migration/misc.h |   4 ++
  migration/meson.build|   1 +
  migration/multifd-device-state.c | 115 +++
  migration/multifd-nocomp.c   |  14 +++-
  migration/multifd.c  |  42 +--
  migration/multifd.h  |  27 +---
  6 files changed, 187 insertions(+), 16 deletions(-)
  create mode 100644 migration/multifd-device-state.c

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 4c171f4e897e..bd3b725fa0b7 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -118,4 +118,8 @@ bool migrate_is_uri(const char *uri);
  bool migrate_uri_parse(const char *uri, MigrationChannel **channel,
 Error **errp);

+/* migration/multifd-device-state.c */
+bool multifd_queue_device_state(char *idstr, uint32_t instance_id,
+char *data, size_t len);
+
  #endif
diff --git a/migration/meson.build b/migration/meson.build
index d3bfe84d6204..9aa48b290e2a 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -25,6 +25,7 @@ system_ss.add(files(
'migration-hmp-cmds.c',
'migration.c',
'multifd.c',
+  'multifd-device-state.c',
'multifd-nocomp.c',
'multifd-zlib.c',
'multifd-zero-page.c',
diff --git a/migration/multifd-device-state.c b/migration/multifd-device-state.c
new file mode 100644
index ..ab83773e2d62
--- /dev/null
+++ b/migration/multifd-device-state.c
@@ -0,0 +1,115 @@
+/*
+ * Multifd device state migration
+ *
+ * Copyright (C) 2024,2025 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/lockable.h"
+#include "migration/misc.h"
+#include "multifd.h"
+
+static struct {
+QemuMutex queue_job_mutex;
+
+MultiFDSendData *send_data;
+} *multifd_send_device_state;
+
+size_t multifd_device_state_payload_size(void)
+{
+return sizeof(MultiFDDeviceState_t);
+}
+
+void multifd_device_state_send_setup(void)
+{
+assert(!multifd_send_device_state);
+multifd_send_device_state = g_malloc(sizeof(*multifd_send_device_state));
+
+qemu_mutex_init(&multifd_send_device_state->queue_job_mutex);
+
+multifd_send_device_state->send_data = multifd_send_data_alloc();
+}
+
+void multifd_device_state_send_cleanup(void)
+{
+g_clear_pointer(&multifd_send_device_state->send_data,
+multifd_send_data_free);
+
+qemu_mutex_destroy(&multifd_send_device_state->queue_job_mutex);
+
+g_clear_pointer(&multifd_send_device_state, g_free);
+}
+
+void multifd_send_data_clear_device_state(MultiFDDeviceState_t *device_state)
+{
+g_clear_pointer(&device_state->idstr, g_free);
+g_clear_pointer(&device_state->buf, g_free);
+}
+
+static void multifd_device_state_fill_packet(MultiFDSendParams *p)
+{
+MultiFDDeviceState_t *device_state = &p->data->u.device_state;
+MultiFDPacketDeviceState_t *packet = p->packet_device_state;
+
+packet->hdr.flags = cpu_to_be32(p->flags);
+strncpy(packet->idstr, device_state->idstr, sizeof(packet->idstr));


(I think we talked about this in v2):
Looking at idstr creation code, idstr is always NULL terminated. It's 
also treated everywhere as a NULL terminated string.
For consistency and to avoid confusion, I'd treat it as a NULL 
terminated string here too (use strcpy, remove the QEMU_NONSTRING from 
its definition, etc.).

This will also avoid strncpy() unnecessary zeroing of the extra bytes.

Thanks.


+packet->instance_id = cpu_to_be32(device_state->instance_id);
+packet->next_packet_size = cpu_to_be32(p->next_packet_size);
+}
+
+static void multifd_prepare_header_device_state(MultiFDSendParams *p)
+{
+p->iov[0].iov_len = sizeof(*p->packet_device_state);
+p->iov[0].iov_base = p->packet_device_state;
+p->iovs_num++;
+}
+
+void multifd_device_state_send_prepare(MultiFDSendParams *p)
+{
+MultiFDDeviceState_t *device_state = &p->data->u.device_state;
+
+assert(multifd_payload_device_state(p->data));
+
+multifd_prepare_header_device_state(p);
+
+assert(!(p->flags & MULTIFD_FLAG_SYNC));
+
+p->next_packet_size = device_state->buf_len;
+if (p->next_packet_size > 0) {
+p->iov[p->iovs_num].iov_base = device_state->buf;
+p->iov[p->iovs_num].iov_len = p->next_packet_size;
+p->iovs_num++;
+}
+
+p->flags |= MULTIFD_FLAG_NOCOMP | MULTIFD_FLAG_DEVICE_STATE;
+
+multifd_device_state_fill_packet(p);
+}
+
+bool multifd_queue_device_state(char *idstr, uint32_t instance_id,
+

Re: [PATCH v5 11/36] migration/multifd: Device state transfer support - receive side

2025-03-02 Thread Avihai Horon


Hi Maciej,

Sorry for the long delay, I have been busy with other tasks.
I got some small comments for the series.

On 19/02/2025 22:33, Maciej S. Szmigiero wrote:

External email: Use caution opening links or attachments


From: "Maciej S. Szmigiero" 

Add a basic support for receiving device state via multifd channels -
channels that are shared with RAM transfers.

Depending whether MULTIFD_FLAG_DEVICE_STATE flag is present or not in the
packet header either device state (MultiFDPacketDeviceState_t) or RAM
data (existing MultiFDPacket_t) is read.

The received device state data is provided to
qemu_loadvm_load_state_buffer() function for processing in the
device's load_state_buffer handler.

Reviewed-by: Peter Xu 
Signed-off-by: Maciej S. Szmigiero 
---
  migration/multifd.c | 99 -
  migration/multifd.h | 26 +++-
  2 files changed, 113 insertions(+), 12 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 3b47e63c2c4a..700a385447c7 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -21,6 +21,7 @@
  #include "file.h"
  #include "migration.h"
  #include "migration-stats.h"
+#include "savevm.h"
  #include "socket.h"
  #include "tls.h"
  #include "qemu-file.h"
@@ -252,14 +253,24 @@ static int 
multifd_recv_unfill_packet_header(MultiFDRecvParams *p,
  return 0;
  }

-static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
+static int multifd_recv_unfill_packet_device_state(MultiFDRecvParams *p,
+   Error **errp)
+{
+MultiFDPacketDeviceState_t *packet = p->packet_dev_state;
+
+packet->instance_id = be32_to_cpu(packet->instance_id);
+p->next_packet_size = be32_to_cpu(packet->next_packet_size);
+
+return 0;
+}
+
+static int multifd_recv_unfill_packet_ram(MultiFDRecvParams *p, Error **errp)
  {
  const MultiFDPacket_t *packet = p->packet;
  int ret = 0;

  p->next_packet_size = be32_to_cpu(packet->next_packet_size);
  p->packet_num = be64_to_cpu(packet->packet_num);
-p->packets_recved++;

  /* Always unfill, old QEMUs (<9.0) send data along with SYNC */
  ret = multifd_ram_unfill_packet(p, errp);
@@ -270,6 +281,17 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams 
*p, Error **errp)
  return ret;
  }

+static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
+{
+p->packets_recved++;
+
+if (p->flags & MULTIFD_FLAG_DEVICE_STATE) {
+return multifd_recv_unfill_packet_device_state(p, errp);
+}
+
+return multifd_recv_unfill_packet_ram(p, errp);
+}
+
  static bool multifd_send_should_exit(void)
  {
  return qatomic_read(&multifd_send_state->exiting);
@@ -1057,6 +1079,7 @@ static void 
multifd_recv_cleanup_channel(MultiFDRecvParams *p)
  p->packet_len = 0;
  g_free(p->packet);
  p->packet = NULL;
+g_clear_pointer(&p->packet_dev_state, g_free);
  g_free(p->normal);
  p->normal = NULL;
  g_free(p->zero);
@@ -1158,6 +1181,32 @@ void multifd_recv_sync_main(void)
  trace_multifd_recv_sync_main(multifd_recv_state->packet_num);
  }

+static int multifd_device_state_recv(MultiFDRecvParams *p, Error **errp)
+{
+g_autofree char *idstr = NULL;
+g_autofree char *dev_state_buf = NULL;
+int ret;
+
+dev_state_buf = g_malloc(p->next_packet_size);
+
+ret = qio_channel_read_all(p->c, dev_state_buf, p->next_packet_size, errp);
+if (ret != 0) {
+return ret;
+}
+
+idstr = g_strndup(p->packet_dev_state->idstr,
+  sizeof(p->packet_dev_state->idstr));
+
+if (!qemu_loadvm_load_state_buffer(idstr,
+   p->packet_dev_state->instance_id,
+   dev_state_buf, p->next_packet_size,
+   errp)) {
+ret = -1;
+}
+
+return ret;
+}
+
  static void *multifd_recv_thread(void *opaque)
  {
  MigrationState *s = migrate_get_current();
@@ -1176,6 +1225,7 @@ static void *multifd_recv_thread(void *opaque)
  while (true) {
  MultiFDPacketHdr_t hdr;
  uint32_t flags = 0;
+bool is_device_state = false;
  bool has_data = false;
  uint8_t *pkt_buf;
  size_t pkt_len;
@@ -1209,8 +1259,14 @@ static void *multifd_recv_thread(void *opaque)
  break;
  }

-pkt_buf = (uint8_t *)p->packet + sizeof(hdr);
-pkt_len = p->packet_len - sizeof(hdr);
+is_device_state = p->flags & MULTIFD_FLAG_DEVICE_STATE;
+if (is_device_state) {
+pkt_buf = (uint8_t *)p->packet_dev_state + sizeof(hdr);
+pkt_len = sizeof(*p->packet_dev_state) - sizeof(hdr);
+} else {
+pkt_buf = (uint8_t *)p->packet + sizeof(hdr);
+pkt_len = p->packet_len - sizeof(hdr);
+}

  ret = qio_channel_read_all_eof(p->c, (char *)pkt_buf, p

[PATCH v2] docs/specs/riscv-iommu: Fixed broken link to external risv iommu document

2025-03-02 Thread hemanshu.khilari.foss

The link to riscv iommu specification document is incorrect. This patch
updates the said link to point to correct location.

Cc: qemu-ri...@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2808
Signed-off-by: hemanshu.khilari.foss 
---
 docs/specs/riscv-iommu.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/specs/riscv-iommu.rst b/docs/specs/riscv-iommu.rst
index b1538c9ead..772145e8d7 100644
--- a/docs/specs/riscv-iommu.rst
+++ b/docs/specs/riscv-iommu.rst
@@ -4,7 +4,7 @@ RISC-V IOMMU support for RISC-V machines
 
 
 QEMU implements a RISC-V IOMMU emulation based on the RISC-V IOMMU spec
-version 1.0 `iommu1.0`_.
+version 1.0 `iommu1.0.0`_.
 
 The emulation includes a PCI reference device (riscv-iommu-pci) and a platform
 bus device (riscv-iommu-sys) that QEMU RISC-V boards can use.  The 'virt'
@@ -107,7 +107,7 @@ riscv-iommu options:
 - "s-stage": enabled
 - "g-stage": enabled
 
-.. _iommu1.0: 
https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf
+.. _iommu1.0.0: 
https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0.0/riscv-iommu.pdf
 
 .. _linux-v8: 
https://lore.kernel.org/linux-riscv/cover.1718388908.git.tjezn...@rivosinc.com/
 
-- 
2.42.0

[PATCH v2 08/10] target/i386/kvm: reset AMD PMU registers during VM reset

2025-03-02 Thread Dongli Zhang

QEMU uses the kvm_get_msrs() function to save Intel PMU registers from KVM
and kvm_put_msrs() to restore them to KVM. However, there is no support for
AMD PMU registers. Currently, has_pmu_version and num_pmu_gp_counters are
initialized based on cpuid(0xa), which does not apply to AMD processors.
For AMD CPUs, prior to PerfMonV2, the number of general-purpose registers
is determined based on the CPU version.

To address this issue, we need to add support for AMD PMU registers.
Without this support, the following problems can arise:

1. If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
running "perf top", the PMU registers are not disabled properly.

2. Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
does not handle AMD PMU registers, causing some PMU events to remain
enabled in KVM.

3. The KVM kvm_pmc_speculative_in_use() function consistently returns true,
preventing the reclamation of these events. Consequently, the
kvm_pmc->perf_event remains active.

4. After a reboot, the VM kernel may report the following error:

[0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, 
complain to your hardware vendor.
[0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 
c0010200 is 530076)

5. In the worst case, the active kvm_pmc->perf_event may inject unknown
NMIs randomly into the VM kernel:

[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.

To resolve these issues, we propose resetting AMD PMU registers during the
VM reset process.

Signed-off-by: Dongli Zhang 
---
Changed since v1:
  - Modify "MSR_K7_EVNTSEL0 + 3" and "MSR_K7_PERFCTR0 + 3" by using
AMD64_NUM_COUNTERS (suggested by Sandipan Das).
  - Use "AMD64_NUM_COUNTERS_CORE * 2 - 1", not "MSR_F15H_PERF_CTL0 + 0xb".
(suggested by Sandipan Das).
  - Switch back to "-pmu" instead of using a global "pmu-cap-disabled".
  - Don't initialize PMU info if kvm.enable_pmu=N.

 target/i386/cpu.h |   8 ++
 target/i386/kvm/kvm.c | 173 +-
 2 files changed, 177 insertions(+), 4 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index c67b42d34f..319600672b 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -490,6 +490,14 @@ typedef enum X86Seg {
 #define MSR_CORE_PERF_GLOBAL_CTRL   0x38f
 #define MSR_CORE_PERF_GLOBAL_OVF_CTRL   0x390
 
+#define MSR_K7_EVNTSEL0 0xc001
+#define MSR_K7_PERFCTR0 0xc0010004
+#define MSR_F15H_PERF_CTL0  0xc0010200
+#define MSR_F15H_PERF_CTR0  0xc0010201
+
+#define AMD64_NUM_COUNTERS  4
+#define AMD64_NUM_COUNTERS_CORE 6
+
 #define MSR_MC0_CTL 0x400
 #define MSR_MC0_STATUS  0x401
 #define MSR_MC0_ADDR0x402
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index efba3ae7a4..d4be8a0d2e 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2069,7 +2069,7 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
 return 0;
 }
 
-static void kvm_init_pmu_info(CPUX86State *env)
+static void kvm_init_pmu_info_intel(CPUX86State *env)
 {
 uint32_t eax, edx;
 uint32_t unused;
@@ -2106,6 +2106,94 @@ static void kvm_init_pmu_info(CPUX86State *env)
 }
 }
 
+static void kvm_init_pmu_info_amd(CPUX86State *env)
+{
+uint32_t unused;
+int64_t family;
+uint32_t ecx;
+
+has_pmu_version = 0;
+
+/*
+ * To determine the CPU family, the following code is derived from
+ * x86_cpuid_version_get_family().
+ */
+family = (env->cpuid_version >> 8) & 0xf;
+if (family == 0xf) {
+family += (env->cpuid_version >> 20) & 0xff;
+}
+
+/*
+ * Performance-monitoring supported from K7 and later.
+ */
+if (family < 6) {
+return;
+}
+
+has_pmu_version = 1;
+
+cpu_x86_cpuid(env, 0x8001, 0, &unused, &unused, &ecx, &unused);
+
+if (!(ecx & CPUID_EXT3_PERFCORE)) {
+num_pmu_gp_counters = AMD64_NUM_COUNTERS;
+return;
+}
+
+num_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
+}
+
+static bool is_same_vendor(CPUX86State *env)
+{
+static uint32_t host_cpuid_vendor1;
+static uint32_t host_cpuid_vendor2;
+static uint32_t host_cpuid_vendor3;
+
+host_cpuid(0x0, 0, NULL, &host_cpuid_vendor1, &host_cpuid_vendor3,
+   &host_cpuid_vendor2);
+
+return env->cpuid_vendor1 == host_cpuid_vendor1 &&
+   env->cpuid_vendor2 == host_cpuid_vendor2 &&
+   env->cpuid_vendor3 == host_cpuid_vendor3;
+}
+
+static void kvm_init_pmu_info(CPUState *cs)
+{
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = &cpu->env;
+
+/*
+ * The PMU virtualization is disabled by kvm.enable_pmu=N.
+ */
+if (kvm_pmu_disabled) {
+return;
+}
+
+/*
+ * It is not supported to virtualize AMD PMU registers on Intel
+ * processors, nor to virtualize Intel PMU registers on AMD processors.
+

[PATCH v2 09/10] target/i386/kvm: support perfmon-v2 for reset

2025-03-02 Thread Dongli Zhang

Since perfmon-v2, the AMD PMU supports additional registers. This update
includes get/put functionality for these extra registers.

Similar to the implementation in KVM:

- MSR_CORE_PERF_GLOBAL_STATUS and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS both
use env->msr_global_status.
- MSR_CORE_PERF_GLOBAL_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_CTL both use
env->msr_global_ctrl.
- MSR_CORE_PERF_GLOBAL_OVF_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR
both use env->msr_global_ovf_ctrl.

No changes are needed for vmstate_msr_architectural_pmu or
pmu_enable_needed().

Signed-off-by: Dongli Zhang 
---
Changed since v1:
  - Use "has_pmu_version > 1", not "has_pmu_version == 2".

 target/i386/cpu.h |  4 
 target/i386/kvm/kvm.c | 47 ++-
 2 files changed, 41 insertions(+), 10 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 319600672b..fdceebfc72 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -490,6 +490,10 @@ typedef enum X86Seg {
 #define MSR_CORE_PERF_GLOBAL_CTRL   0x38f
 #define MSR_CORE_PERF_GLOBAL_OVF_CTRL   0x390
 
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS   0xc300
+#define MSR_AMD64_PERF_CNTR_GLOBAL_CTL  0xc301
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR   0xc302
+
 #define MSR_K7_EVNTSEL0 0xc001
 #define MSR_K7_PERFCTR0 0xc0010004
 #define MSR_F15H_PERF_CTL0  0xc0010200
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index d4be8a0d2e..c5911baef0 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2108,9 +2108,9 @@ static void kvm_init_pmu_info_intel(CPUX86State *env)
 
 static void kvm_init_pmu_info_amd(CPUX86State *env)
 {
+uint32_t eax, ebx, ecx;
 uint32_t unused;
 int64_t family;
-uint32_t ecx;
 
 has_pmu_version = 0;
 
@@ -2140,6 +2140,13 @@ static void kvm_init_pmu_info_amd(CPUX86State *env)
 }
 
 num_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
+
+cpu_x86_cpuid(env, 0x8022, 0, &eax, &ebx, &unused, &unused);
+
+if (eax & CPUID_8000_0022_EAX_PERFMON_V2) {
+has_pmu_version = 2;
+num_pmu_gp_counters = ebx & 0xf;
+}
 }
 
 static bool is_same_vendor(CPUX86State *env)
@@ -4195,13 +4202,14 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 uint32_t step = 1;
 
 /*
- * When PERFCORE is enabled, AMD PMU uses a separate set of
- * addresses for the selector and counter registers.
- * Additionally, the address of the next selector or counter
- * register is determined by incrementing the address of the
- * current register by two.
+ * When PERFCORE or PerfMonV2 is enabled, AMD PMU uses a
+ * separate set of addresses for the selector and counter
+ * registers. Additionally, the address of the next selector or
+ * counter register is determined by incrementing the address
+ * of the current register by two.
  */
-if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE ||
+has_pmu_version > 1) {
 sel_base = MSR_F15H_PERF_CTL0;
 ctr_base = MSR_F15H_PERF_CTR0;
 step = 2;
@@ -4213,6 +4221,15 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 kvm_msr_entry_add(cpu, sel_base + i * step,
   env->msr_gp_evtsel[i]);
 }
+
+if (has_pmu_version > 1) {
+kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
+  env->msr_global_status);
+kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
+  env->msr_global_ovf_ctrl);
+kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
+  env->msr_global_ctrl);
+}
 }
 
 /*
@@ -4690,13 +4707,14 @@ static int kvm_get_msrs(X86CPU *cpu)
 uint32_t step = 1;
 
 /*
- * When PERFCORE is enabled, AMD PMU uses a separate set of
- * addresses for the selector and counter registers.
+ * When PERFCORE or PerfMonV2 is enabled, AMD PMU uses a separate
+ * set of addresses for the selector and counter registers.
  * Additionally, the address of the next selector or counter
  * register is determined by incrementing the address of the
  * current register by two.
  */
-if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE ||
+has_pmu_version > 1) {
 sel_base = MSR_F15H_PERF_CTL0;
 ctr_base = MSR_F15H_PERF_CTR0;
 step = 2;
@@ -4706,6 +4724,12 @@ static int kvm_get_msrs(X86CPU *cpu)
 kvm_msr_entry_add(cpu, ctr_base + i * step, 0);

[PATCH v2 00/10] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup

2025-03-02 Thread Dongli Zhang

Would you mind suggesting how we can move forward with this patchset as:

(1) One patch for kvm_arch_pre_create_vcpu() is picked from Xiaoyao's
patchset.
(2) Dapeng is working on mediated passthrough vPMU QEMU patches. This
patchset doesn't support mediated passthrough vPMU.

This patchset addresses four bugs related to AMD PMU virtualization.

1. The PerfMonV2 is still available if PERCORE if disabled via
"-cpu host,-perfctr-core".

2. The VM 'cpuid' command still returns PERFCORE although "-pmu" is
configured.

3. The third issue is that using "-cpu host,-pmu" does not disable AMD PMU
virtualization. When using "-cpu EPYC" or "-cpu host,-pmu", AMD PMU
virtualization remains enabled. On the VM's Linux side, you might still
see:

[0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.

instead of:

[0.596381] Performance Events: PMU not available due to virtualization, 
using software events only.
[0.600972] NMI watchdog: Perf NMI watchdog permanently disabled

To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE
when "-pmu" is configured.

4. The fourth issue is that unreclaimed performance events (after a QEMU
system_reset) in KVM may cause random, unwanted, or unknown NMIs to be
injected into the VM.

The AMD PMU registers are not reset during QEMU system_reset.

(1) If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
running "perf top", the PMU registers are not disabled properly.

(2) Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
does not handle AMD PMU registers, causing some PMU events to remain
enabled in KVM.

(3) The KVM kvm_pmc_speculative_in_use() function consistently returns true,
preventing the reclamation of these events. Consequently, the
kvm_pmc->perf_event remains active.

(4) After a reboot, the VM kernel may report the following error:

[0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, 
complain to your hardware vendor.
[0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 
c0010200 is 530076)

(5) In the worst case, the active kvm_pmc->perf_event may inject unknown
NMIs randomly into the VM kernel:

[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.

To resolve these issues, we propose resetting AMD PMU registers during the
VM reset process


Changed since v1:
  - Use feature_dependencies for CPUID_EXT3_PERFCORE and
CPUID_8000_0022_EAX_PERFMON_V2.
  - Remove CPUID_EXT3_PERFCORE when !cpu->enable_pmu.
  - Pick kvm_arch_pre_create_vcpu() patch from Xiaoyao Li.
  - Use "-pmu" but not a global "pmu-cap-disabled" for KVM_PMU_CAP_DISABLE.
  - Also use sysfs kvm.enable_pmu=N to determine if PMU is supported.
  - Some changes to PMU register limit calculation.


Xiaoyao Li (1):
  kvm: Introduce kvm_arch_pre_create_vcpu()

Dongli Zhang (9):
  target/i386: disable PerfMonV2 when PERFCORE unavailable
  target/i386: disable PERFCORE when "-pmu" is configured
  target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured
  target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid()
  target/i386/kvm: rename architectural PMU variables
  target/i386/kvm: query kvm.enable_pmu parameter
  target/i386/kvm: reset AMD PMU registers during VM reset
  target/i386/kvm: support perfmon-v2 for reset
  target/i386/kvm: don't stop Intel PMU counters

 accel/kvm/kvm-all.c|   5 +
 include/system/kvm.h   |   1 +
 target/arm/kvm.c   |   5 +
 target/i386/cpu.c  |   8 +
 target/i386/cpu.h  |  12 ++
 target/i386/kvm/kvm.c  | 348 ++--
 target/loongarch/kvm/kvm.c |   5 +
 target/mips/kvm.c  |   5 +
 target/ppc/kvm.c   |   5 +
 target/riscv/kvm/kvm-cpu.c |   5 +
 target/s390x/kvm/kvm.c |   5 +
 11 files changed, 357 insertions(+), 47 deletions(-)

base-commit: b69801dd6b1eb4d107f7c2f643adf0a4e3ec9124

Thank you very much!

Dongli Zhang

[PATCH v2 05/10] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid()

2025-03-02 Thread Dongli Zhang

The initialization of 'has_architectural_pmu_version',
'num_architectural_pmu_gp_counters', and
'num_architectural_pmu_fixed_counters' is unrelated to the process of
building the CPUID.

Extract them out of kvm_x86_build_cpuid().

No functional change.

Signed-off-by: Dongli Zhang 
---
Changed since v1:
  - Still extract the code, but call them for all CPUs.

 target/i386/kvm/kvm.c | 66 +--
 1 file changed, 39 insertions(+), 27 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 5c8a852dbd..8f293ffd61 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1959,33 +1959,6 @@ static uint32_t kvm_x86_build_cpuid(CPUX86State *env,
 }
 }
 
-if (limit >= 0x0a) {
-uint32_t eax, edx;
-
-cpu_x86_cpuid(env, 0x0a, 0, &eax, &unused, &unused, &edx);
-
-has_architectural_pmu_version = eax & 0xff;
-if (has_architectural_pmu_version > 0) {
-num_architectural_pmu_gp_counters = (eax & 0xff00) >> 8;
-
-/* Shouldn't be more than 32, since that's the number of bits
- * available in EBX to tell us _which_ counters are available.
- * Play it safe.
- */
-if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
-num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
-}
-
-if (has_architectural_pmu_version > 1) {
-num_architectural_pmu_fixed_counters = edx & 0x1f;
-
-if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) 
{
-num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
-}
-}
-}
-}
-
 cpu_x86_cpuid(env, 0x8000, 0, &limit, &unused, &unused, &unused);
 
 for (i = 0x8000; i <= limit; i++) {
@@ -2085,6 +2058,43 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
 return 0;
 }
 
+static void kvm_init_pmu_info(CPUX86State *env)
+{
+uint32_t eax, edx;
+uint32_t unused;
+uint32_t limit;
+
+cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused);
+
+if (limit < 0x0a) {
+return;
+}
+
+cpu_x86_cpuid(env, 0x0a, 0, &eax, &unused, &unused, &edx);
+
+has_architectural_pmu_version = eax & 0xff;
+if (has_architectural_pmu_version > 0) {
+num_architectural_pmu_gp_counters = (eax & 0xff00) >> 8;
+
+/*
+ * Shouldn't be more than 32, since that's the number of bits
+ * available in EBX to tell us _which_ counters are available.
+ * Play it safe.
+ */
+if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
+num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
+}
+
+if (has_architectural_pmu_version > 1) {
+num_architectural_pmu_fixed_counters = edx & 0x1f;
+
+if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
+num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
+}
+}
+}
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
 struct {
@@ -2267,6 +2277,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
 cpuid_i = kvm_x86_build_cpuid(env, cpuid_data.entries, cpuid_i);
 cpuid_data.cpuid.nent = cpuid_i;
 
+kvm_init_pmu_info(env);
+
 if (((env->cpuid_version >> 8)&0xF) >= 6
 && (env->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
(CPUID_MCE | CPUID_MCA)) {
-- 
2.43.5

[PATCH v2 02/10] target/i386: disable PERFCORE when "-pmu" is configured

2025-03-02 Thread Dongli Zhang

Currently, AMD PMU support isn't determined based on CPUID, that is, the
"-pmu" option does not fully disable KVM AMD PMU virtualization.

To minimize AMD PMU features, remove PERFCORE when "-pmu" is configured.

To completely disable AMD PMU virtualization will be implemented via
KVM_CAP_PMU_CAPABILITY in upcoming patches.

As a reminder, neither CPUID_EXT3_PERFCORE nor
CPUID_8000_0022_EAX_PERFMON_V2 is removed from env->features[] when "-pmu"
is configured. Developers should query whether they are supported via
cpu_x86_cpuid() rather than relying on env->features[] in future patches.

Suggested-by: Zhao Liu 
Signed-off-by: Dongli Zhang 
---
 target/i386/cpu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index b6d6167910..61a671028a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7115,6 +7115,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 !(env->hflags & HF_LMA_MASK)) {
 *edx &= ~CPUID_EXT2_SYSCALL;
 }
+
+if (kvm_enabled() && IS_AMD_CPU(env) && !cpu->enable_pmu) {
+*ecx &= ~CPUID_EXT3_PERFCORE;
+}
 break;
 case 0x8002:
 case 0x8003:
-- 
2.43.5

Re: [PATCH] hw/ppc/e500: Partial implementation of local access window registers

2025-03-02 Thread BALATON Zoltan


On Sun, 2 Mar 2025, Bernhard Beschow wrote:

Am 1. März 2025 16:10:35 UTC schrieb BALATON Zoltan :

On Wed, 15 Jan 2025, BALATON Zoltan wrote:

This allows guests to set the CCSR base address. Also store and return
values of the local access window registers but their functionality
isn't implemented.


Bernhard,


Hi Zoltan,



If you have no alternative patch you plan to submit for next release 
should we merge this for now? This helps running u-boot binaries even 
if it's not enough alone but would be one patch less in my local tree. 
Or do you know about a problem with this patch why this should not be 
merged?


QEMU sets a machine-specific CCSR base address (pmc->ccsrbar_base) which 
differs from the real chip's default. The default is checked by U-Boot 
which enters an infinite loop on mismatch: 
. 
How does this work for you?


The U-Boot version I've tested (or something else, I don't remember) 
wanted to set the CCSRBAR to the value it expects which is different from 
where it looks for it at reset and that works with this patch. I don't 
know which code path that corresponds to in start.S.


In addition, when moving the CCSR region, `env->mpic_iack` should be 
updated as well: 



I did not notice that. Maybe it's only relevant with KVM and I've only 
tested with TCG or could be I did not get to the part yet when this would 
cause problems. I don't see an easy way to update this as these are in 
separate places and also don't know how this mpic_iack is handled with 
multiple cores. So in case this patch breaks this then it's OK to drop it 
for now. I can carry it locally until fixed upstream.


I'm happy to submit an implementation on top of my e500 cleanup series 
 
which needs agreement on some open discussions.


Some of that series was already merged. What are the outstanding patches 
and discussions? I don't remember seeing an updated version of that series 
with only the outstanding patches.


Regards,
BALATON Zoltan

Re: [PATCH v2] docs/specs/riscv-iommu: Fixed broken link to external risv iommu document

2025-03-02 Thread Alistair Francis

On Sun, Mar 2, 2025 at 11:07 PM hemanshu.khilari.foss
 wrote:
>
> The link to riscv iommu specification document is incorrect. This patch
> updates the said link to point to correct location.
>
> Cc: qemu-ri...@nongnu.org
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2808
> Signed-off-by: hemanshu.khilari.foss 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  docs/specs/riscv-iommu.rst | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/docs/specs/riscv-iommu.rst b/docs/specs/riscv-iommu.rst
> index b1538c9ead..772145e8d7 100644
> --- a/docs/specs/riscv-iommu.rst
> +++ b/docs/specs/riscv-iommu.rst
> @@ -4,7 +4,7 @@ RISC-V IOMMU support for RISC-V machines
>  
>
>  QEMU implements a RISC-V IOMMU emulation based on the RISC-V IOMMU spec
> -version 1.0 `iommu1.0`_.
> +version 1.0 `iommu1.0.0`_.
>
>  The emulation includes a PCI reference device (riscv-iommu-pci) and a 
> platform
>  bus device (riscv-iommu-sys) that QEMU RISC-V boards can use.  The 'virt'
> @@ -107,7 +107,7 @@ riscv-iommu options:
>  - "s-stage": enabled
>  - "g-stage": enabled
>
> -.. _iommu1.0: 
> https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf
> +.. _iommu1.0.0: 
> https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0.0/riscv-iommu.pdf
>
>  .. _linux-v8: 
> https://lore.kernel.org/linux-riscv/cover.1718388908.git.tjezn...@rivosinc.com/
>
> --
> 2.42.0
>
>

Re: [PATCH v2] docs/specs/riscv-iommu: Fixed broken link to external risv iommu document

2025-03-02 Thread Alistair Francis

On Sun, Mar 2, 2025 at 11:07 PM hemanshu.khilari.foss
 wrote:
>
> The link to riscv iommu specification document is incorrect. This patch
> updates the said link to point to correct location.
>
> Cc: qemu-ri...@nongnu.org
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2808
> Signed-off-by: hemanshu.khilari.foss 

Thanks!

Applied to riscv-to-apply.next

Alistair

> ---
>  docs/specs/riscv-iommu.rst | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/docs/specs/riscv-iommu.rst b/docs/specs/riscv-iommu.rst
> index b1538c9ead..772145e8d7 100644
> --- a/docs/specs/riscv-iommu.rst
> +++ b/docs/specs/riscv-iommu.rst
> @@ -4,7 +4,7 @@ RISC-V IOMMU support for RISC-V machines
>  
>
>  QEMU implements a RISC-V IOMMU emulation based on the RISC-V IOMMU spec
> -version 1.0 `iommu1.0`_.
> +version 1.0 `iommu1.0.0`_.
>
>  The emulation includes a PCI reference device (riscv-iommu-pci) and a 
> platform
>  bus device (riscv-iommu-sys) that QEMU RISC-V boards can use.  The 'virt'
> @@ -107,7 +107,7 @@ riscv-iommu options:
>  - "s-stage": enabled
>  - "g-stage": enabled
>
> -.. _iommu1.0: 
> https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf
> +.. _iommu1.0.0: 
> https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0.0/riscv-iommu.pdf
>
>  .. _linux-v8: 
> https://lore.kernel.org/linux-riscv/cover.1718388908.git.tjezn...@rivosinc.com/
>
> --
> 2.42.0
>
>

Re: [PATCH v2] docs/specs/riscv-iommu: Fixed broken link to external risv iommu document

2025-03-02 Thread Alistair Francis

On Mon, Mar 3, 2025 at 10:46 AM Alistair Francis  wrote:
>
> On Sun, Mar 2, 2025 at 11:07 PM hemanshu.khilari.foss
>  wrote:
> >
> > The link to riscv iommu specification document is incorrect. This patch
> > updates the said link to point to correct location.
> >
> > Cc: qemu-ri...@nongnu.org
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2808
> > Signed-off-by: hemanshu.khilari.foss 
>
> Thanks!
>
> Applied to riscv-to-apply.next
>
> Alistair
>
> > ---
> >  docs/specs/riscv-iommu.rst | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/docs/specs/riscv-iommu.rst b/docs/specs/riscv-iommu.rst
> > index b1538c9ead..772145e8d7 100644
> > --- a/docs/specs/riscv-iommu.rst
> > +++ b/docs/specs/riscv-iommu.rst
> > @@ -4,7 +4,7 @@ RISC-V IOMMU support for RISC-V machines
> >  
> >
> >  QEMU implements a RISC-V IOMMU emulation based on the RISC-V IOMMU spec
> > -version 1.0 `iommu1.0`_.
> > +version 1.0 `iommu1.0.0`_.
> >
> >  The emulation includes a PCI reference device (riscv-iommu-pci) and a 
> > platform
> >  bus device (riscv-iommu-sys) that QEMU RISC-V boards can use.  The 'virt'
> > @@ -107,7 +107,7 @@ riscv-iommu options:
> >  - "s-stage": enabled
> >  - "g-stage": enabled
> >
> > -.. _iommu1.0: 
> > https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf
> > +.. _iommu1.0.0: 
> > https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0.0/riscv-iommu.pdf

This fails to build with the error:

qemu/docs/specs/riscv-iommu.rst:16:Unknown target name: "iommu1.0".

You missed a iommu1.0 update

Alistair

> >
> >  .. _linux-v8: 
> > https://lore.kernel.org/linux-riscv/cover.1718388908.git.tjezn...@rivosinc.com/
> >
> > --
> > 2.42.0
> >
> >

Re: [PATCH v3 2/3] target/riscv/kvm: add kvm_riscv_reset_regs_csr()

2025-03-02 Thread Alistair Francis

On Mon, Feb 24, 2025 at 10:32 PM Daniel Henrique Barboza
 wrote:
>
> We're setting reset vals for KVM csrs during kvm_riscv_reset_vcpu(), but
> in no particular order and missing some of them (like env->mstatus).
>
> Create a helper to do that, unclogging reset_vcpu(), and initialize
> env->mstatus as well. Keep the regs in the same order they appear in
> struct kvm_riscv_csr from the KVM UAPI, similar to what
> kvm_riscv_(get|put)_regs_csr are doing. This will make a bit easier to
> add new KVM CSRs and to verify which values we're writing back to KVM
> during vcpu reset.
>
> Signed-off-by: Daniel Henrique Barboza 
> Reviewed-by: Andrew Jones 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/kvm/kvm-cpu.c | 23 +++
>  1 file changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c
> index 19bb87515b..cabc34b6a2 100644
> --- a/target/riscv/kvm/kvm-cpu.c
> +++ b/target/riscv/kvm/kvm-cpu.c
> @@ -608,6 +608,19 @@ static int kvm_riscv_put_regs_core(CPUState *cs)
>  return ret;
>  }
>
> +static void kvm_riscv_reset_regs_csr(CPURISCVState *env)
> +{
> +env->mstatus = 0;
> +env->mie = 0;
> +env->stvec = 0;
> +env->sscratch = 0;
> +env->sepc = 0;
> +env->scause = 0;
> +env->stval = 0;
> +env->mip = 0;
> +env->satp = 0;
> +}
> +
>  static int kvm_riscv_get_regs_csr(CPUState *cs)
>  {
>  CPURISCVState *env = &RISCV_CPU(cs)->env;
> @@ -1612,14 +1625,8 @@ void kvm_riscv_reset_vcpu(RISCVCPU *cpu)
>  env->pc = cpu->env.kernel_addr;
>  env->gpr[10] = kvm_arch_vcpu_id(CPU(cpu)); /* a0 */
>  env->gpr[11] = cpu->env.fdt_addr;  /* a1 */
> -env->satp = 0;
> -env->mie = 0;
> -env->stvec = 0;
> -env->sscratch = 0;
> -env->sepc = 0;
> -env->scause = 0;
> -env->stval = 0;
> -env->mip = 0;
> +
> +kvm_riscv_reset_regs_csr(env);
>  }
>
>  void kvm_riscv_set_irq(RISCVCPU *cpu, int irq, int level)
> --
> 2.48.1
>
>

Re: [PATCH v3 3/3] target/riscv/kvm: add missing KVM CSRs

2025-03-02 Thread Alistair Francis

On Mon, Feb 24, 2025 at 10:32 PM Daniel Henrique Barboza
 wrote:
>
> We're missing scounteren and senvcfg CSRs, both already present in the
> KVM UAPI.
>
> Signed-off-by: Daniel Henrique Barboza 
> Reviewed-by: Andrew Jones 

Acked-by: Alistair Francis 

Alistair

> ---
>  target/riscv/kvm/kvm-cpu.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c
> index cabc34b6a2..e8c31a32d4 100644
> --- a/target/riscv/kvm/kvm-cpu.c
> +++ b/target/riscv/kvm/kvm-cpu.c
> @@ -619,6 +619,8 @@ static void kvm_riscv_reset_regs_csr(CPURISCVState *env)
>  env->stval = 0;
>  env->mip = 0;
>  env->satp = 0;
> +env->scounteren = 0;
> +env->senvcfg = 0;
>  }
>
>  static int kvm_riscv_get_regs_csr(CPUState *cs)
> @@ -634,6 +636,8 @@ static int kvm_riscv_get_regs_csr(CPUState *cs)
>  KVM_RISCV_GET_CSR(cs, env, stval, env->stval);
>  KVM_RISCV_GET_CSR(cs, env, sip, env->mip);
>  KVM_RISCV_GET_CSR(cs, env, satp, env->satp);
> +KVM_RISCV_GET_CSR(cs, env, scounteren, env->scounteren);
> +KVM_RISCV_GET_CSR(cs, env, senvcfg, env->senvcfg);
>
>  return 0;
>  }
> @@ -651,6 +655,8 @@ static int kvm_riscv_put_regs_csr(CPUState *cs)
>  KVM_RISCV_SET_CSR(cs, env, stval, env->stval);
>  KVM_RISCV_SET_CSR(cs, env, sip, env->mip);
>  KVM_RISCV_SET_CSR(cs, env, satp, env->satp);
> +KVM_RISCV_SET_CSR(cs, env, scounteren, env->scounteren);
> +KVM_RISCV_SET_CSR(cs, env, senvcfg, env->senvcfg);
>
>  return 0;
>  }
> --
> 2.48.1
>
>

Re: [PATCH v5 31/36] vfio/migration: Add x-migration-multifd-transfer VFIO property

2025-03-02 Thread Avihai Horon




On 19/02/2025 22:34, Maciej S. Szmigiero wrote:

External email: Use caution opening links or attachments


From: "Maciej S. Szmigiero" 

This property allows configuring at runtime whether to transfer the


IIUC, in this patch it's not configurable at runtime, so let's drop "at 
runtime".



particular device state via multifd channels when live migrating that
device.

It defaults to AUTO, which means that VFIO device state transfer via
multifd channels is attempted in configurations that otherwise support it.

Signed-off-by: Maciej S. Szmigiero 
---
  hw/vfio/migration-multifd.c   | 17 -
  hw/vfio/pci.c |  3 +++
  include/hw/vfio/vfio-common.h |  2 ++
  3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
index 0cfa9d31732a..18a5ff964a37 100644
--- a/hw/vfio/migration-multifd.c
+++ b/hw/vfio/migration-multifd.c
@@ -460,11 +460,26 @@ bool vfio_multifd_transfer_supported(void)

  bool vfio_multifd_transfer_enabled(VFIODevice *vbasedev)
  {
-return false;
+VFIOMigration *migration = vbasedev->migration;
+
+return migration->multifd_transfer;
  }

  bool vfio_multifd_transfer_setup(VFIODevice *vbasedev, Error **errp)
  {
+VFIOMigration *migration = vbasedev->migration;
+
+/*
+ * Make a copy of this setting at the start in case it is changed
+ * mid-migration.
+ */
+if (vbasedev->migration_multifd_transfer == ON_OFF_AUTO_AUTO) {
+migration->multifd_transfer = vfio_multifd_transfer_supported();
+} else {
+migration->multifd_transfer =
+vbasedev->migration_multifd_transfer == ON_OFF_AUTO_ON;
+}


Making a copy of this value is only relevant for the next patch where 
it's turned mutable, so let's move this code to patch #32.


Thanks.


+
  if (vfio_multifd_transfer_enabled(vbasedev) &&
  !vfio_multifd_transfer_supported()) {
  error_setg(errp,
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 89d900e9cf0c..184ff882f9d1 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3377,6 +3377,9 @@ static const Property vfio_pci_dev_properties[] = {
  VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false),
  DEFINE_PROP_ON_OFF_AUTO("enable-migration", VFIOPCIDevice,
  vbasedev.enable_migration, ON_OFF_AUTO_AUTO),
+DEFINE_PROP_ON_OFF_AUTO("x-migration-multifd-transfer", VFIOPCIDevice,
+vbasedev.migration_multifd_transfer,
+ON_OFF_AUTO_AUTO),
  DEFINE_PROP_BOOL("migration-events", VFIOPCIDevice,
   vbasedev.migration_events, false),
  DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false),
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index ba851917f9fc..3006931accf6 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -91,6 +91,7 @@ typedef struct VFIOMigration {
  uint64_t mig_flags;
  uint64_t precopy_init_size;
  uint64_t precopy_dirty_size;
+bool multifd_transfer;
  VFIOMultifd *multifd;
  bool initial_data_sent;

@@ -153,6 +154,7 @@ typedef struct VFIODevice {
  bool no_mmap;
  bool ram_block_discard_allowed;
  OnOffAuto enable_migration;
+OnOffAuto migration_multifd_transfer;
  bool migration_events;
  VFIODeviceOps *ops;
  unsigned int num_irqs;

Re: [PATCH v5 34/36] vfio/migration: Max in-flight VFIO device state buffer count limit

2025-03-02 Thread Avihai Horon




On 19/02/2025 22:34, Maciej S. Szmigiero wrote:

External email: Use caution opening links or attachments


From: "Maciej S. Szmigiero" 

Allow capping the maximum count of in-flight VFIO device state buffers
queued at the destination, otherwise a malicious QEMU source could
theoretically cause the target QEMU to allocate unlimited amounts of memory
for buffers-in-flight.


I still think it's better to limit the number of bytes rather than 
number of buffers:
1. To the average user the number of buffers doesn't really mean 
anything. They have to open the code and see what is the size of a 
single buffer and then choose their value.
2. Currently VFIO migration buffer size is 1MB. If later it's changed to 
2MB for example, users will have to adjust their configuration 
accordingly. With number of bytes, the configuration remains the same no 
matter what is the VFIO migration buffer size.




Since this is not expected to be a realistic threat in most of VFIO live
migration use cases and the right value depends on the particular setup
disable the limit by default by setting it to UINT64_MAX.

Signed-off-by: Maciej S. Szmigiero 
---
  hw/vfio/migration-multifd.c   | 14 ++
  hw/vfio/pci.c |  2 ++
  include/hw/vfio/vfio-common.h |  1 +
  3 files changed, 17 insertions(+)

diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
index 18a5ff964a37..04aa3f4a6596 100644
--- a/hw/vfio/migration-multifd.c
+++ b/hw/vfio/migration-multifd.c
@@ -53,6 +53,7 @@ typedef struct VFIOMultifd {
  QemuMutex load_bufs_mutex; /* Lock order: this lock -> BQL */
  uint32_t load_buf_idx;
  uint32_t load_buf_idx_last;
+uint32_t load_buf_queued_pending_buffers;
  } VFIOMultifd;

  static void vfio_state_buffer_clear(gpointer data)
@@ -121,6 +122,15 @@ static bool vfio_load_state_buffer_insert(VFIODevice 
*vbasedev,

  assert(packet->idx >= multifd->load_buf_idx);

+multifd->load_buf_queued_pending_buffers++;
+if (multifd->load_buf_queued_pending_buffers >
+vbasedev->migration_max_queued_buffers) {
+error_setg(errp,
+   "queuing state buffer %" PRIu32 " would exceed the max of 
%" PRIu64,
+   packet->idx, vbasedev->migration_max_queued_buffers);


Let's add vbasedev->name to the error message so we know which device 
caused the error.


Thanks.


+return false;
+}
+
  lb->data = g_memdup2(&packet->data, packet_total_size - sizeof(*packet));
  lb->len = packet_total_size - sizeof(*packet);
  lb->is_present = true;
@@ -374,6 +384,9 @@ static bool vfio_load_bufs_thread(void *opaque, bool 
*should_quit, Error **errp)
  goto ret_signal;
  }

+assert(multifd->load_buf_queued_pending_buffers > 0);
+multifd->load_buf_queued_pending_buffers--;
+
  if (multifd->load_buf_idx == multifd->load_buf_idx_last - 1) {
  trace_vfio_load_state_device_buffer_end(vbasedev->name);
  }
@@ -408,6 +421,7 @@ VFIOMultifd *vfio_multifd_new(void)

  multifd->load_buf_idx = 0;
  multifd->load_buf_idx_last = UINT32_MAX;
+multifd->load_buf_queued_pending_buffers = 0;
  qemu_cond_init(&multifd->load_bufs_buffer_ready_cond);

  multifd->load_bufs_thread_running = false;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 9111805ae06c..247418f0fce2 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3383,6 +3383,8 @@ static const Property vfio_pci_dev_properties[] = {
  vbasedev.migration_multifd_transfer,
  qdev_prop_on_off_auto_mutable, OnOffAuto,
  .set_default = true, .defval.i = ON_OFF_AUTO_AUTO),
+DEFINE_PROP_UINT64("x-migration-max-queued-buffers", VFIOPCIDevice,
+   vbasedev.migration_max_queued_buffers, UINT64_MAX),
  DEFINE_PROP_BOOL("migration-events", VFIOPCIDevice,
   vbasedev.migration_events, false),
  DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false),
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 3006931accf6..30a5bb9af61b 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -155,6 +155,7 @@ typedef struct VFIODevice {
  bool ram_block_discard_allowed;
  OnOffAuto enable_migration;
  OnOffAuto migration_multifd_transfer;
+uint64_t migration_max_queued_buffers;
  bool migration_events;
  VFIODeviceOps *ops;
  unsigned int num_irqs;

Re: [PATCH v5 26/36] vfio/migration: Multifd device state transfer support - received buffers queuing

2025-03-02 Thread Avihai Horon




On 19/02/2025 22:34, Maciej S. Szmigiero wrote:

External email: Use caution opening links or attachments


From: "Maciej S. Szmigiero" 

The multifd received data needs to be reassembled since device state
packets sent via different multifd channels can arrive out-of-order.

Therefore, each VFIO device state packet carries a header indicating its
position in the stream.
The raw device state data is saved into a VFIOStateBuffer for later
in-order loading into the device.

The last such VFIO device state packet should have
VFIO_DEVICE_STATE_CONFIG_STATE flag set and carry the device config state.

Signed-off-by: Maciej S. Szmigiero 
---
  hw/vfio/migration-multifd.c | 103 
  hw/vfio/migration-multifd.h |   3 ++
  hw/vfio/migration.c |   1 +
  hw/vfio/trace-events|   1 +
  4 files changed, 108 insertions(+)

diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
index c2defc0efef0..5d5ee1393674 100644
--- a/hw/vfio/migration-multifd.c
+++ b/hw/vfio/migration-multifd.c
@@ -42,6 +42,11 @@ typedef struct VFIOStateBuffer {
  } VFIOStateBuffer;

  typedef struct VFIOMultifd {
+VFIOStateBuffers load_bufs;
+QemuCond load_bufs_buffer_ready_cond;
+QemuMutex load_bufs_mutex; /* Lock order: this lock -> BQL */
+uint32_t load_buf_idx;
+uint32_t load_buf_idx_last;
  } VFIOMultifd;

  static void vfio_state_buffer_clear(gpointer data)
@@ -87,15 +92,113 @@ static VFIOStateBuffer 
*vfio_state_buffers_at(VFIOStateBuffers *bufs, guint idx)
  return &g_array_index(bufs->array, VFIOStateBuffer, idx);
  }

+static bool vfio_load_state_buffer_insert(VFIODevice *vbasedev,
+  VFIODeviceStatePacket *packet,
+  size_t packet_total_size,
+  Error **errp)
+{
+VFIOMigration *migration = vbasedev->migration;
+VFIOMultifd *multifd = migration->multifd;
+VFIOStateBuffer *lb;
+
+vfio_state_buffers_assert_init(&multifd->load_bufs);
+if (packet->idx >= vfio_state_buffers_size_get(&multifd->load_bufs)) {
+vfio_state_buffers_size_set(&multifd->load_bufs, packet->idx + 1);
+}
+
+lb = vfio_state_buffers_at(&multifd->load_bufs, packet->idx);
+if (lb->is_present) {
+error_setg(errp, "state buffer %" PRIu32 " already filled",
+   packet->idx);


Let's add vbasedev->name to the error message so we know which device 
caused the error.



+return false;
+}
+
+assert(packet->idx >= multifd->load_buf_idx);
+
+lb->data = g_memdup2(&packet->data, packet_total_size - sizeof(*packet));
+lb->len = packet_total_size - sizeof(*packet);
+lb->is_present = true;
+
+return true;
+}
+
+bool vfio_load_state_buffer(void *opaque, char *data, size_t data_size,
+Error **errp)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+VFIOMultifd *multifd = migration->multifd;
+VFIODeviceStatePacket *packet = (VFIODeviceStatePacket *)data;
+
+/*
+ * Holding BQL here would violate the lock order and can cause
+ * a deadlock once we attempt to lock load_bufs_mutex below.
+ */
+assert(!bql_locked());


To be clearer, I'd move the assert down to be just above 
"QEMU_LOCK_GUARD(&multifd->load_bufs_mutex);".



+
+if (!vfio_multifd_transfer_enabled(vbasedev)) {
+error_setg(errp,
+   "got device state packet but not doing multifd transfer");


Let's add vbasedev->name to the error message so we know which device 
caused the error.



+return false;
+}
+
+assert(multifd);
+
+if (data_size < sizeof(*packet)) {
+error_setg(errp, "packet too short at %zu (min is %zu)",
+   data_size, sizeof(*packet));


Ditto.


+return false;
+}
+
+if (packet->version != VFIO_DEVICE_STATE_PACKET_VER_CURRENT) {
+error_setg(errp, "packet has unknown version %" PRIu32,
+   packet->version);


Ditto.


+return false;
+}
+
+if (packet->idx == UINT32_MAX) {
+error_setg(errp, "packet has too high idx");


Ditto.


+return false;
+}
+
+trace_vfio_load_state_device_buffer_incoming(vbasedev->name, packet->idx);
+
+QEMU_LOCK_GUARD(&multifd->load_bufs_mutex);
+
+/* config state packet should be the last one in the stream */
+if (packet->flags & VFIO_DEVICE_STATE_CONFIG_STATE) {
+multifd->load_buf_idx_last = packet->idx;
+}
+
+if (!vfio_load_state_buffer_insert(vbasedev, packet, data_size, errp)) {
+return false;
+}
+
+qemu_cond_signal(&multifd->load_bufs_buffer_ready_cond);
+
+return true;
+}
+
  VFIOMultifd *vfio_multifd_new(void)
  {
  VFIOMultifd *multifd = g_new(VFIOMultifd, 1);

+vfio_state_buffers_init(&multifd->load_bufs);
+
+qemu_mutex_init(&multifd->load_bufs_mutex);


Nit: move qemu_mutex_init() jus

Re: [PATCH v5 27/36] vfio/migration: Multifd device state transfer support - load thread

2025-03-02 Thread Avihai Horon




On 26/02/2025 15:49, Cédric Le Goater wrote:

External email: Use caution opening links or attachments


On 2/19/25 21:34, Maciej S. Szmigiero wrote:

From: "Maciej S. Szmigiero" 

Since it's important to finish loading device state transferred via the
main migration channel (via save_live_iterate SaveVMHandler) before
starting loading the data asynchronously transferred via multifd the 
thread

doing the actual loading of the multifd transferred data is only started
from switchover_start SaveVMHandler.

switchover_start handler is called when MIG_CMD_SWITCHOVER_START
sub-command of QEMU_VM_COMMAND is received via the main migration 
channel.


This sub-command is only sent after all save_live_iterate data have 
already

been posted so it is safe to commence loading of the multifd-transferred
device state upon receiving it - loading of save_live_iterate data 
happens

synchronously in the main migration thread (much like the processing of
MIG_CMD_SWITCHOVER_START) so by the time MIG_CMD_SWITCHOVER_START is
processed all the proceeding data must have already been loaded.

Signed-off-by: Maciej S. Szmigiero 
---
  hw/vfio/migration-multifd.c | 225 
  hw/vfio/migration-multifd.h |   2 +
  hw/vfio/migration.c |  12 ++
  hw/vfio/trace-events    |   5 +
  4 files changed, 244 insertions(+)

diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
index 5d5ee1393674..b3a88c062769 100644
--- a/hw/vfio/migration-multifd.c
+++ b/hw/vfio/migration-multifd.c
@@ -42,8 +42,13 @@ typedef struct VFIOStateBuffer {
  } VFIOStateBuffer;

  typedef struct VFIOMultifd {
+    QemuThread load_bufs_thread;
+    bool load_bufs_thread_running;
+    bool load_bufs_thread_want_exit;
+
  VFIOStateBuffers load_bufs;
  QemuCond load_bufs_buffer_ready_cond;
+    QemuCond load_bufs_thread_finished_cond;
  QemuMutex load_bufs_mutex; /* Lock order: this lock -> BQL */
  uint32_t load_buf_idx;
  uint32_t load_buf_idx_last;
@@ -179,6 +184,175 @@ bool vfio_load_state_buffer(void *opaque, char 
*data, size_t data_size,

  return true;
  }

+static int vfio_load_bufs_thread_load_config(VFIODevice *vbasedev)
+{
+    return -EINVAL;
+}



please move to next patch.

+static VFIOStateBuffer *vfio_load_state_buffer_get(VFIOMultifd 
*multifd)

+{
+    VFIOStateBuffer *lb;
+    guint bufs_len;


guint:  I guess it's ok to use here. It is not common practice in VFIO.


Glib documentation says that in new code unsigned int is preferred over 
guint [1].


Thanks.

[1] https://docs.gtk.org/glib/types.html#guint

Re: [PATCH v5 23/36] vfio/migration: Multifd device state transfer support - VFIOStateBuffer(s)

2025-03-02 Thread Maciej S. Szmigiero


On 2.03.2025 14:00, Avihai Horon wrote:


On 19/02/2025 22:34, Maciej S. Szmigiero wrote:

External email: Use caution opening links or attachments


From: "Maciej S. Szmigiero" 

Add VFIOStateBuffer(s) types and the associated methods.

These store received device state buffers and config state waiting to get
loaded into the device.

Signed-off-by: Maciej S. Szmigiero 
---
  hw/vfio/migration-multifd.c | 54 +
  1 file changed, 54 insertions(+)

diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
index 0c3185a26242..760b110a39b9 100644
--- a/hw/vfio/migration-multifd.c
+++ b/hw/vfio/migration-multifd.c
@@ -29,3 +29,57 @@ typedef struct VFIODeviceStatePacket {
  uint32_t flags;
  uint8_t data[0];
  } QEMU_PACKED VFIODeviceStatePacket;
+
+/* type safety */
+typedef struct VFIOStateBuffers {
+    GArray *array;
+} VFIOStateBuffers;
+
+typedef struct VFIOStateBuffer {
+    bool is_present;
+    char *data;
+    size_t len;
+} VFIOStateBuffer;
+
+static void vfio_state_buffer_clear(gpointer data)
+{
+    VFIOStateBuffer *lb = data;
+
+    if (!lb->is_present) {
+    return;
+    }
+
+    g_clear_pointer(&lb->data, g_free);
+    lb->is_present = false;
+}
+
+static void vfio_state_buffers_init(VFIOStateBuffers *bufs)
+{
+    bufs->array = g_array_new(FALSE, TRUE, sizeof(VFIOStateBuffer));
+    g_array_set_clear_func(bufs->array, vfio_state_buffer_clear);
+}
+
+static void vfio_state_buffers_destroy(VFIOStateBuffers *bufs)
+{
+    g_clear_pointer(&bufs->array, g_array_unref);
+}
+
+static void vfio_state_buffers_assert_init(VFIOStateBuffers *bufs)
+{
+    assert(bufs->array);
+}
+
+static guint vfio_state_buffers_size_get(VFIOStateBuffers *bufs)
+{
+    return bufs->array->len;
+}
+
+static void vfio_state_buffers_size_set(VFIOStateBuffers *bufs, guint size)
+{
+    g_array_set_size(bufs->array, size);
+}
+
+static VFIOStateBuffer *vfio_state_buffers_at(VFIOStateBuffers *bufs, guint 
idx)
+{
+    return &g_array_index(bufs->array, VFIOStateBuffer, idx);
+}


This patch breaks compilation as non of the functions are used, e.g.: error: 
‘vfio_state_buffers_init’ defined but not used I can think of three options to 
solve it: 1. Move these functions to their own file and export them, e.g., 
hw/vfio/state-buffer.{c,h}. But this seems like an overkill for such a small 
API. 2. Add __attribute__((unused)) tags and remove them in patch #26 where the 
functions are actually used. A bit ugly. 3. Squash this patch into patch #26. I 
prefer option 3 as this is a small API closely related to patch #26 (and patch 
#26 will still remain rather small).


Looks like some build configs use -Werror, as unused functions aren't normally
an error.

Will have look at this tomorrow.


Thanks.



Thanks,
Maciej

Re: [PATCH v5 2/2] target/loongarch: check tlb_ps

2025-03-02 Thread gaosong


在 2025/3/1 下午3:40, bibo mao 写道:



On 2025/2/28 下午5:06, Song Gao wrote:

For LoongArch th min tlb_ps is 12(4KB), for TLB code,
the tlb_ps may be 0,this may case UndefinedBehavior
Add a check-tlb_ps fuction to check tlb_ps,
to make sure the tlb_ps is avalablie. we check tlb_ps
when get the tlb_ps from tlb->misc or CSR bits.
1. cpu reset
    set CSR_PWCL.PTBASE and CSR_STLBPS.PS bits a default value
    from CSR_PRCFG2;
2. tlb instructions.
    some tlb instructions get  the tlb_ps from tlb->misc but the
    value may  has been initialized to 0. we need just check the tlb_ps
    skip the function and write a guest log.
3. csrwr instructions.
    to make sure CSR_PWCL.PTBASE and CSR_STLBPS.PS bits are avalable,
    cheke theses bits and set a default value from CSR_PRCFG2.

Signed-off-by: Song Gao 
---
  target/loongarch/cpu.c    | 10 ++--
  target/loongarch/cpu_helper.c |  8 +++-
  target/loongarch/helper.h |  1 +
  target/loongarch/internals.h  |  2 +
  target/loongarch/tcg/csr_helper.c | 30 +++-
  .../tcg/insn_trans/trans_privileged.c.inc |  1 +
  target/loongarch/tcg/tlb_helper.c | 46 ++-
  7 files changed, 90 insertions(+), 8 deletions(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index e91f4a5239..162a227d52 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -585,13 +585,17 @@ static void loongarch_cpu_reset_hold(Object 
*obj, ResetType type)

   */
  env->CSR_PGDH = 0;
  env->CSR_PGDL = 0;
-    env->CSR_PWCL = 0;
  env->CSR_PWCH = 0;
-    env->CSR_STLBPS = 0;
  env->CSR_EENTRY = 0;
  env->CSR_TLBRENTRY = 0;
  env->CSR_MERRENTRY = 0;
-
+    /* set CSR_PWCL.PTBASE and CSR_STLBPS.PS bits from CSR_PRCFG2 */
+    if (env->CSR_PRCFG2 == 0) {
+    env->CSR_PRCFG2 =0x3f000;
+    }
+    int tlb_ps = clz32(env->CSR_PRCFG2);
Could you please put variable declaration "int tlb_ps" to header of 
function?



Yes,

diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 943517b5f2..1d5cb0198c 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -100,6 +100,7 @@ DEF_HELPER_1(rdtime_d, i64, env)
  DEF_HELPER_1(csrrd_pgd, i64, env)
  DEF_HELPER_1(csrrd_cpuid, i64, env)
  DEF_HELPER_1(csrrd_tval, i64, env)
+DEF_HELPER_2(csrwr_stlbps, i64, env, tl)
  DEF_HELPER_2(csrwr_estat, i64, env, tl)
  DEF_HELPER_2(csrwr_asid, i64, env, tl)
  DEF_HELPER_2(csrwr_tcfg, i64, env, tl)
diff --git a/target/loongarch/internals.h b/target/loongarch/internals.h
index 7b254c5f49..1cd959a766 100644
--- a/target/loongarch/internals.h
+++ b/target/loongarch/internals.h
@@ -43,6 +43,8 @@ enum {
  TLBRET_PE = 7,
  };
  +bool check_ps(CPULoongArchState *ent, int ps);
+
  extern const VMStateDescription vmstate_loongarch_cpu;
    void loongarch_cpu_set_irq(void *opaque, int irq, int level);
diff --git a/target/loongarch/tcg/csr_helper.c 
b/target/loongarch/tcg/csr_helper.c

index 6c95be9910..1c8a234b16 100644
--- a/target/loongarch/tcg/csr_helper.c
+++ b/target/loongarch/tcg/csr_helper.c
@@ -17,6 +17,27 @@
  #include "hw/irq.h"
  #include "cpu-csr.h"
  +
+
+target_ulong helper_csrwr_stlbps(CPULoongArchState *env, 
target_ulong val)

+{
+    int64_t old_v = env->CSR_STLBPS;
+    uint8_t default_ps = ctz32(env->CSR_PRCFG2);
+
+    /*
+ * The real hardware only supports the min tlb_ps is 12
+ * tlb_ps=0 may cause undefined-behavior.
+ */
+    uint8_t tlb_ps = FIELD_EX64(env->CSR_STLBPS, CSR_STLBPS, PS);
+    if (!check_ps(env, tlb_ps)) {
+    qemu_log_mask(LOG_GUEST_ERROR,
+  "Attempted set ps %d\n",tlb_ps);
+    val = FIELD_DP64(val, CSR_STLBPS, PS, default_ps);
+    }
+    env->CSR_STLBPS = val;
+    return old_v;
+}
+
  target_ulong helper_csrrd_pgd(CPULoongArchState *env)
  {
  int64_t v;
@@ -99,19 +120,26 @@ target_ulong 
helper_csrwr_ticlr(CPULoongArchState *env, target_ulong val)
    target_ulong helper_csrwr_pwcl(CPULoongArchState *env, 
target_ulong val)

  {
-    int shift;
+    int shift, ptbase;
  int64_t old_v = env->CSR_PWCL;
+    uint8_t default_ps = ctz32(env->CSR_PRCFG2);
    /*
   * The real hardware only supports 64bit PTE width now, 128bit 
or others

   * treated as illegal.
   */
  shift = FIELD_EX64(val, CSR_PWCL, PTEWIDTH);
+    ptbase = FIELD_EX64(val, CSR_PWCL, PTBASE);
  if (shift) {
  qemu_log_mask(LOG_GUEST_ERROR,
    "Attempted set pte width with %d bit\n", 64 
<< shift);

  val = FIELD_DP64(val, CSR_PWCL, PTEWIDTH, 0);
  }
+    if (!check_ps(env, ptbase)) {
+ qemu_log_mask(LOG_GUEST_ERROR,
+  "Attrmpted set ptbase 2^%d\n", ptbase);
+ val = FIELD_DP64(val, CSR_PWCL, PTBASE, default_ps);
+    }
    env->CSR_PWCL = val;
  return old_v;
diff --git a/target/loongarch/tcg/insn_trans/trans_privileged.c.inc 
b/target/loongarch/tcg/insn_trans/

[PATCH v6 1/8] i386: Add Intel RDT device and State to config.

2025-03-02 Thread Hendrik Wuethrich

Change config to show RDT, add minimal code to the rdt.c module to make
sure things still compile.

Signed-off-by: Hendrik Wuethrich 
---
 hw/i386/Kconfig   |  4 ++
 hw/i386/meson.build   |  1 +
 hw/i386/rdt.c | 97 +++
 include/hw/i386/rdt.h | 35 
 target/i386/cpu.h |  4 ++
 5 files changed, 141 insertions(+)
 create mode 100644 hw/i386/rdt.c
 create mode 100644 include/hw/i386/rdt.h

diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index d34ce07b21..a3a6b2259c 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -10,6 +10,9 @@ config SGX
 bool
 depends on KVM
 
+config RDT
+bool
+
 config PC
 bool
 imply APPLESMC
@@ -26,6 +29,7 @@ config PC
 imply QXL
 imply SEV
 imply SGX
+imply RDT
 imply TEST_DEVICES
 imply TPM_CRB
 imply TPM_TIS_ISA
diff --git a/hw/i386/meson.build b/hw/i386/meson.build
index 10bdfde27c..3a697dcc03 100644
--- a/hw/i386/meson.build
+++ b/hw/i386/meson.build
@@ -22,6 +22,7 @@ i386_ss.add(when: 'CONFIG_VMPORT', if_true: files('vmport.c'))
 i386_ss.add(when: 'CONFIG_VTD', if_true: files('intel_iommu.c'))
 i386_ss.add(when: 'CONFIG_SGX', if_true: files('sgx-epc.c','sgx.c'),
 if_false: files('sgx-stub.c'))
+i386_ss.add(when: 'CONFIG_RDT', if_true: files('rdt.c'))
 
 i386_ss.add(when: 'CONFIG_ACPI', if_true: files('acpi-common.c'))
 i386_ss.add(when: 'CONFIG_PC', if_true: files(
diff --git a/hw/i386/rdt.c b/hw/i386/rdt.c
new file mode 100644
index 00..76a253902b
--- /dev/null
+++ b/hw/i386/rdt.c
@@ -0,0 +1,97 @@
+/*
+ * Intel Resource Director Technology (RDT).
+ *
+ * Copyright 2025 Google LLC
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include "hw/i386/rdt.h"
+#include "qemu/osdep.h" /* Needs to be included before isa.h */
+#include "hw/isa/isa.h"
+#include "hw/qdev-properties.h"
+#include "qom/object.h"
+
+#define TYPE_RDT "rdt"
+#define RDT_NUM_RMID_PROP "rmids"
+
+OBJECT_DECLARE_TYPE(RDTState, RDTStateClass, RDT);
+
+struct RDTMonitor {
+uint64_t count_local;
+uint64_t count_remote;
+uint64_t count_l3;
+};
+
+struct RDTAllocation {
+QemuMutex lock;
+uint32_t active_cos;
+};
+
+struct RDTStatePerCore {
+QemuMutex lock;
+uint32_t active_rmid;
+};
+
+struct RDTStatePerL3Cache {
+QemuMutex lock;
+
+RDTMonitor *monitors;
+
+/* RDT Allocation bitmask MSRs */
+uint32_t msr_L3_ia32_mask_n[RDT_MAX_L3_MASK_COUNT];
+uint32_t msr_L2_ia32_mask_n[RDT_MAX_L2_MASK_COUNT];
+uint32_t ia32_L2_qos_ext_bw_thrtl_n[RDT_MAX_MBA_THRTL_COUNT];
+
+/* Parent RDTState */
+RDTState *rdtstate;
+};
+
+/* One instance of RDT-internal state to be shared by all cores */
+struct RDTState {
+ISADevice parent;
+
+/* Max amount of RMIDs */
+uint32_t rmids;
+
+uint16_t l3_caches;
+
+RDTStatePerL3Cache *rdtInstances;
+RDTAllocation *allocations;
+};
+
+struct RDTStateClass {
+};
+
+OBJECT_DEFINE_TYPE(RDTState, rdt, RDT, ISA_DEVICE);
+
+static Property rdt_properties[] = {
+DEFINE_PROP_UINT32(RDT_NUM_RMID_PROP, RDTState, rmids, 256),
+};
+
+static void rdt_init(Object *obj)
+{
+}
+
+static void rdt_finalize(Object *obj)
+{
+}
+
+static void rdt_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+dc->hotpluggable = false;
+dc->desc = "RDT";
+dc->user_creatable = true;
+
+device_class_set_props(dc, rdt_properties);
+}
diff --git a/include/hw/i386/rdt.h b/include/hw/i386/rdt.h
new file mode 100644
index 00..1f99f98f7f
--- /dev/null
+++ b/include/hw/i386/rdt.h
@@ -0,0 +1,35 @@
+/*
+ * Intel Resource Director Technology (RDT).
+ *
+ * Copyright 2025 Google LLC
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#ifndef HW_RDT_H
+#define HW_RDT_H
+
+/* Max counts for allocation masks or CBMs. In other words, the size of
+ * respective MSRs.
+ * L3_MASK and L3_mask are architectural limitations. THRTL_COUNT is just
+ * the space left until the next MSR.
+ * */
+#define RDT_MAX_L3_MASK_COUNT  127

[PATCH v7 00/19] Change ghes to use HEST-based offsets and add support for error inject

2025-03-02 Thread Mauro Carvalho Chehab

Hi Michael,

I guess we're ready to merge this patch series. Patches here have
been thoughfully reviewed mainly by Igor and Jonathan.

The only change from v7 is a minor editorial change at HEST doc
spec, and the addition of Igor and Jonathan's A-B/R-B.

This series change the way HEST table offsets are calculated,
making them identical to what an OSPM would do and allowing
multiple HEST entries without causing migration issues. It open
space to add HEST support for non-arm architectures, as now
the number and type of HEST notification entries are not
hardcoded at ghes.c. Instead, they're passed as a parameter
from the arch-dependent init code.

With such issue addressed, it adds a new notification type and
add support to inject errors via a Python script. The script
itself is at the final patch.

---
v7:
  - minor editorial change at the patch updating HEST doc spec
   with the new workflow

v6:
- some minor nits addressed:
   - use GPA instead of offset;
   - merged two patches;
   - fixed a couple of long line coding style issues;
   - the HEST/DSDT diff inside a patch was changed to avoid troubles
 applying it.

v5:
- make checkpatch happier;
- HEST table is now tested;
- some changes at HEST spec documentation to align with code changes;
- extra care was taken with regards to git bisectability.

v4:
- added an extra comment for AcpiGhesState structure;
- patches reordered;
- no functional changes, just code shift between the patches in this series.

v3:
- addressed more nits;
- hest_add_le now points to the beginning of HEST table;
- removed HEST from tests/data/acpi;
- added an extra patch to not use fw_cfg with virt-10.0 for hw_error_le

v2: 
- address some nits;
- improved ags cleanup patch and removed ags.present field;
- added some missing le*_to_cpu() calls;
- update date at copyright for new files to 2024-2025;
- qmp command changed to: inject-ghes-v2-error ans since updated to 10.0;
- added HEST and DSDT tables after the changes to make check target happy.
  (two patches: first one whitelisting such tables; second one removing from
   whitelist and updating/adding such tables to tests/data/acpi)



Mauro Carvalho Chehab (19):
  tests/acpi: virt: add an empty HEST file
  tests/qtest/bios-tables-test: extend to also check HEST table
  tests/acpi: virt: update HEST file with its current data
  acpi/ghes: Cleanup the code which gets ghes ged state
  acpi/ghes: prepare to change the way HEST offsets are calculated
  acpi/ghes: add a firmware file with HEST address
  acpi/ghes: Use HEST table offsets when preparing GHES records
  acpi/ghes: don't hard-code the number of sources for HEST table
  acpi/ghes: add a notifier to notify when error data is ready
  acpi/generic_event_device: Update GHES migration to cover hest addr
  acpi/generic_event_device: add logic to detect if HEST addr is
available
  acpi/generic_event_device: add an APEI error device
  tests/acpi: virt: allow acpi table changes at DSDT and HEST tables
  arm/virt: Wire up a GED error device for ACPI / GHES
  qapi/acpi-hest: add an interface to do generic CPER error injection
  acpi/generic_event_device.c: enable use_hest_addr for QEMU 10.x
  tests/acpi: virt: update HEST and DSDT tables
  docs: hest: add new "etc/acpi_table_hest_addr" and update workflow
  scripts/ghes_inject: add a script to generate GHES error inject

 MAINTAINERS   |  10 +
 docs/specs/acpi_hest_ghes.rst |  28 +-
 hw/acpi/Kconfig   |   5 +
 hw/acpi/aml-build.c   |  10 +
 hw/acpi/generic_event_device.c|  44 ++
 hw/acpi/ghes-stub.c   |   7 +-
 hw/acpi/ghes.c| 231 --
 hw/acpi/ghes_cper.c   |  38 +
 hw/acpi/ghes_cper_stub.c  |  19 +
 hw/acpi/meson.build   |   2 +
 hw/arm/virt-acpi-build.c  |  35 +-
 hw/arm/virt.c |  19 +-
 hw/core/machine.c |   2 +
 include/hw/acpi/acpi_dev_interface.h  |   1 +
 include/hw/acpi/aml-build.h   |   2 +
 include/hw/acpi/generic_event_device.h|   1 +
 include/hw/acpi/ghes.h|  51 +-
 include/hw/arm/virt.h |   2 +
 qapi/acpi-hest.json   |  35 +
 qapi/meson.build  |   1 +
 qapi/qapi-schema.json |   1 +
 scripts/arm_processor_error.py| 476 
 scripts/ghes_inject.py|  51 ++
 scripts/qmp_helper.py | 703 ++
 target/arm/kvm.c  |   7 +-
 tests/data/acpi/aarch64/virt/DSDT | Bin 5196 -> 5240 bytes
 .../data/acpi/aarch64/virt/DSDT.acpihmatvirt  | Bin 5282 -> 5326 bytes
 tests/data/acpi/aarch64/virt/DSDT.memhp   | Bin 6557 -> 6601 bytes
 tests

[PATCH v7 01/19] tests/acpi: virt: add an empty HEST file

2025-03-02 Thread Mauro Carvalho Chehab

Such file will be used to track HEST table changes.

For now, disallow HEST table check until we update it to the
current data.

Signed-off-by: Mauro Carvalho Chehab 
Acked-by: Igor Mammedov 
Reviewed-by: Jonathan Cameron 
---
 tests/data/acpi/aarch64/virt/HEST   | 0
 tests/qtest/bios-tables-test-allowed-diff.h | 1 +
 2 files changed, 1 insertion(+)
 create mode 100644 tests/data/acpi/aarch64/virt/HEST

diff --git a/tests/data/acpi/aarch64/virt/HEST 
b/tests/data/acpi/aarch64/virt/HEST
new file mode 100644
index ..e69de29bb2d1
diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8bf4..39901c58d647 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,2 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/aarch64/virt/HEST",
-- 
2.48.1

[PATCH] accel/tcg: fix msan findings in translate-all

2025-03-02 Thread Patrick Venture

From: Peter Foley 

e.g.
  Uninitialized value was created by an allocation of 'host_pc' in the stack 
frame
  #0 0xc07df87c in tb_gen_code 
third_party/qemu/accel/tcg/translate-all.c:297:5

Signed-off-by: Peter Foley 
Signed-off-by: Patrick Venture 
---
 accel/tcg/translate-all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index d4189c7386..f584055a15 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -298,7 +298,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 tcg_insn_unit *gen_code_buf;
 int gen_code_size, search_size, max_insns;
 int64_t ti;
-void *host_pc;
+void *host_pc = NULL;
 
 assert_memory_lock();
 qemu_thread_jit_write();
-- 
2.48.1.711.g2feabab25a-goog

[PATCH v7 03/19] tests/acpi: virt: update HEST file with its current data

2025-03-02 Thread Mauro Carvalho Chehab

Now that HEST table is checked for aarch64, add the current
firmware file.

Signed-off-by: Mauro Carvalho Chehab 
Acked-by: Igor Mammedov 
Reviewed-by: Jonathan Cameron 
---
 tests/data/acpi/aarch64/virt/HEST   | Bin 0 -> 132 bytes
 tests/qtest/bios-tables-test-allowed-diff.h |   1 -
 2 files changed, 1 deletion(-)

diff --git a/tests/data/acpi/aarch64/virt/HEST 
b/tests/data/acpi/aarch64/virt/HEST
index 
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..4c5d8c5b5da5b3241f93cd0839e94272bf6b1486
 100644
GIT binary patch
literal 132
zcmeZp4Gw8xU|?W;66afGL

literal 0
HcmV?d1

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index 39901c58d647..dfb8523c8bf4 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,2 +1 @@
 /* List of comma-separated changed AML files to ignore */
-"tests/data/acpi/aarch64/virt/HEST",
-- 
2.48.1

[PATCH v7 08/19] acpi/ghes: don't hard-code the number of sources for HEST table

2025-03-02 Thread Mauro Carvalho Chehab

The current code is actually dependent on having just one error
structure with a single source, as any change there would cause
migration issues.

As the number of sources should be arch-dependent, as it will depend on
what kind of notifications will exist, and how many errors can be
reported at the same time, change the logic to be more flexible,
allowing the number of sources to be defined when building the
HEST table by the caller.

Signed-off-by: Mauro Carvalho Chehab 
Reviewed-by: Jonathan Cameron 
Reviewed-by: Igor Mammedov 
---
 hw/acpi/ghes.c   | 39 +--
 hw/arm/virt-acpi-build.c |  8 +++-
 include/hw/acpi/ghes.h   | 17 -
 3 files changed, 40 insertions(+), 24 deletions(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 668ca72587c7..f49d0d628fc4 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -238,17 +238,17 @@ ghes_gen_err_data_uncorrectable_recoverable(GArray *block,
  * See docs/specs/acpi_hest_ghes.rst for blobs format.
  */
 static void build_ghes_error_table(AcpiGhesState *ags, GArray *hardware_errors,
-   BIOSLinker *linker)
+   BIOSLinker *linker, int num_sources)
 {
 int i, error_status_block_offset;
 
 /* Build error_block_address */
-for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+for (i = 0; i < num_sources; i++) {
 build_append_int_noprefix(hardware_errors, 0, sizeof(uint64_t));
 }
 
 /* Build read_ack_register */
-for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+for (i = 0; i < num_sources; i++) {
 /*
  * Initialize the value of read_ack_register to 1, so GHES can be
  * writable after (re)boot.
@@ -263,13 +263,13 @@ static void build_ghes_error_table(AcpiGhesState *ags, 
GArray *hardware_errors,
 
 /* Reserve space for Error Status Data Block */
 acpi_data_push(hardware_errors,
-ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_GHES_ERROR_SOURCE_COUNT);
+ACPI_GHES_MAX_RAW_DATA_LENGTH * num_sources);
 
 /* Tell guest firmware to place hardware_errors blob into RAM */
 bios_linker_loader_alloc(linker, ACPI_HW_ERROR_FW_CFG_FILE,
  hardware_errors, sizeof(uint64_t), false);
 
-for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+for (i = 0; i < num_sources; i++) {
 /*
  * Tell firmware to patch error_block_address entries to point to
  * corresponding "Generic Error Status Block"
@@ -295,12 +295,14 @@ static void build_ghes_error_table(AcpiGhesState *ags, 
GArray *hardware_errors,
 }
 
 /* Build Generic Hardware Error Source version 2 (GHESv2) */
-static void build_ghes_v2(GArray *table_data,
-  BIOSLinker *linker,
-  enum AcpiGhesNotifyType notify,
-  uint16_t source_id)
+static void build_ghes_v2_entry(GArray *table_data,
+BIOSLinker *linker,
+const AcpiNotificationSourceId *notif_src,
+uint16_t index, int num_sources)
 {
 uint64_t address_offset;
+const uint16_t notify = notif_src->notify;
+const uint16_t source_id = notif_src->source_id;
 
 /*
  * Type:
@@ -331,7 +333,7 @@ static void build_ghes_v2(GArray *table_data,
address_offset + GAS_ADDR_OFFSET,
sizeof(uint64_t),
ACPI_HW_ERROR_FW_CFG_FILE,
-   source_id * sizeof(uint64_t));
+   index * sizeof(uint64_t));
 
 /* Notification Structure */
 build_ghes_hw_error_notification(table_data, notify);
@@ -351,8 +353,7 @@ static void build_ghes_v2(GArray *table_data,
address_offset + GAS_ADDR_OFFSET,
sizeof(uint64_t),
ACPI_HW_ERROR_FW_CFG_FILE,
-   (ACPI_GHES_ERROR_SOURCE_COUNT + source_id)
-   * sizeof(uint64_t));
+   (num_sources + index) * sizeof(uint64_t));
 
 /*
  * Read Ack Preserve field
@@ -368,22 +369,26 @@ static void build_ghes_v2(GArray *table_data,
 void acpi_build_hest(AcpiGhesState *ags, GArray *table_data,
  GArray *hardware_errors,
  BIOSLinker *linker,
+ const AcpiNotificationSourceId *notif_source,
+ int num_sources,
  const char *oem_id, const char *oem_table_id)
 {
 AcpiTable table = { .sig = "HEST", .rev = 1,
 .oem_id = oem_id, .oem_table_id = oem_table_id };
 uint32_t hest_offset;
+int i;
 
 hest_offset = table_data->len;
 
-build_ghes_error_table(ags, hardware_errors, linker);
+build_ghes_error_table(ags, hardware_err

[PATCH v7 07/19] acpi/ghes: Use HEST table offsets when preparing GHES records

2025-03-02 Thread Mauro Carvalho Chehab

There are two pointers that are needed during error injection:

1. The start address of the CPER block to be stored;
2. The address of the read ack.

It is preferable to calculate them from the HEST table.  This allows
checking the source ID, the size of the table and the type of the
HEST error block structures.

Yet, keep the old code, as this is needed for migration purposes
from older QEMU versions.

Signed-off-by: Mauro Carvalho Chehab 
Reviewed-by: Jonathan Cameron 
Reviewed-by: Igor Mammedov 
---
 hw/acpi/ghes.c | 100 +
 include/hw/acpi/ghes.h |   2 +-
 2 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index cbc96c1909f0..668ca72587c7 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -41,6 +41,12 @@
 /* Address offset in Generic Address Structure(GAS) */
 #define GAS_ADDR_OFFSET 4
 
+/*
+ * ACPI spec 1.0b
+ * 5.2.3 System Description Table Header
+ */
+#define ACPI_DESC_HEADER_OFFSET 36
+
 /*
  * The total size of Generic Error Data Entry
  * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
@@ -61,6 +67,30 @@
  */
 #define ACPI_GHES_GESB_SIZE 20
 
+/*
+ * See the memory layout map at docs/specs/acpi_hest_ghes.rst.
+ */
+
+/*
+ * ACPI 6.1: 18.3.2.8 Generic Hardware Error Source version 2
+ * Table 18-344 Generic Hardware Error Source version 2 (GHESv2) Structure
+ */
+#define HEST_GHES_V2_ENTRY_SIZE  92
+
+/*
+ * ACPI 6.1: 18.3.2.8 Generic Hardware Error Source version 2
+ * Table 18-344 Generic Hardware Error Source version 2 (GHESv2) Structure
+ * Read Ack Register
+ */
+#define GHES_READ_ACK_ADDR_OFF  64
+
+/*
+ * ACPI 6.1: 18.3.2.7: Generic Hardware Error Source
+ * Table 18-341 Generic Hardware Error Source Structure
+ * Error Status Address
+ */
+#define GHES_ERR_STATUS_ADDR_OFF  20
+
 /*
  * Values for error_severity field
  */
@@ -408,6 +438,73 @@ static void get_hw_error_offsets(uint64_t ghes_addr,
 *read_ack_register_addr = ghes_addr + sizeof(uint64_t);
 }
 
+static void get_ghes_source_offsets(uint16_t source_id,
+uint64_t hest_addr,
+uint64_t *cper_addr,
+uint64_t *read_ack_start_addr,
+Error **errp)
+{
+uint64_t hest_err_block_addr, hest_read_ack_addr;
+uint64_t err_source_entry, error_block_addr;
+uint32_t num_sources, i;
+
+hest_addr += ACPI_DESC_HEADER_OFFSET;
+
+cpu_physical_memory_read(hest_addr, &num_sources,
+ sizeof(num_sources));
+num_sources = le32_to_cpu(num_sources);
+
+err_source_entry = hest_addr + sizeof(num_sources);
+
+/*
+ * Currently, HEST Error source navigates only for GHESv2 tables
+ */
+for (i = 0; i < num_sources; i++) {
+uint64_t addr = err_source_entry;
+uint16_t type, src_id;
+
+cpu_physical_memory_read(addr, &type, sizeof(type));
+type = le16_to_cpu(type);
+
+/* For now, we only know the size of GHESv2 table */
+if (type != ACPI_GHES_SOURCE_GENERIC_ERROR_V2) {
+error_setg(errp, "HEST: type %d not supported.", type);
+return;
+}
+
+/* Compare CPER source ID at the GHESv2 structure */
+addr += sizeof(type);
+cpu_physical_memory_read(addr, &src_id, sizeof(src_id));
+if (le16_to_cpu(src_id) == source_id) {
+break;
+}
+
+err_source_entry += HEST_GHES_V2_ENTRY_SIZE;
+}
+if (i == num_sources) {
+error_setg(errp, "HEST: Source %d not found.", source_id);
+return;
+}
+
+/* Navigate through table address pointers */
+hest_err_block_addr = err_source_entry + GHES_ERR_STATUS_ADDR_OFF +
+  GAS_ADDR_OFFSET;
+
+cpu_physical_memory_read(hest_err_block_addr, &error_block_addr,
+ sizeof(error_block_addr));
+error_block_addr = le64_to_cpu(error_block_addr);
+
+cpu_physical_memory_read(error_block_addr, cper_addr,
+ sizeof(*cper_addr));
+*cper_addr = le64_to_cpu(*cper_addr);
+
+hest_read_ack_addr = err_source_entry + GHES_READ_ACK_ADDR_OFF +
+ GAS_ADDR_OFFSET;
+cpu_physical_memory_read(hest_read_ack_addr, read_ack_start_addr,
+ sizeof(*read_ack_start_addr));
+*read_ack_start_addr = le64_to_cpu(*read_ack_start_addr);
+}
+
 void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
  uint16_t source_id, Error **errp)
 {
@@ -423,6 +520,9 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void 
*cper, size_t len,
 if (!ags->use_hest_addr) {
 get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
  &cper_addr, &read_ack_register_addr);
+} else {
+get_ghes_source_offsets(source_id, le64_to_cpu(ags->hest_addr_le),
+

[PATCH v7 09/19] acpi/ghes: add a notifier to notify when error data is ready

2025-03-02 Thread Mauro Carvalho Chehab

Some error injection notify methods are async, like GPIO
notify. Add a notifier to be used when the error record is
ready to be sent to the guest OS.

Signed-off-by: Mauro Carvalho Chehab 
Reviewed-by: Jonathan Cameron 
Acked-by: Igor Mammedov 
---
 hw/acpi/ghes.c | 5 -
 include/hw/acpi/ghes.h | 3 +++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index f49d0d628fc4..0135ac844bcf 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -510,6 +510,9 @@ static void get_ghes_source_offsets(uint16_t source_id,
 *read_ack_start_addr = le64_to_cpu(*read_ack_start_addr);
 }
 
+NotifierList acpi_generic_error_notifiers =
+NOTIFIER_LIST_INITIALIZER(error_device_notifiers);
+
 void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
  uint16_t source_id, Error **errp)
 {
@@ -550,7 +553,7 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void 
*cper, size_t len,
 /* Write the generic error data entry into guest memory */
 cpu_physical_memory_write(cper_addr, cper, len);
 
-return;
+notifier_list_notify(&acpi_generic_error_notifiers, NULL);
 }
 
 int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 8c4b08433760..390943e46d99 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -24,6 +24,9 @@
 
 #include "hw/acpi/bios-linker-loader.h"
 #include "qapi/error.h"
+#include "qemu/notify.h"
+
+extern NotifierList acpi_generic_error_notifiers;
 
 /*
  * Values for Hardware Error Notification Type field
-- 
2.48.1

[PATCH v7 16/19] acpi/generic_event_device.c: enable use_hest_addr for QEMU 10.x

2025-03-02 Thread Mauro Carvalho Chehab

Now that we have everything in place, enable using HEST GPA
instead of etc/hardware_errors GPA.

Signed-off-by: Mauro Carvalho Chehab 
Reviewed-by: Jonathan Cameron 
Reviewed-by: Igor Mammedov 
---
 hw/acpi/generic_event_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index f029753ab709..9fe70b74bd42 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -332,7 +332,7 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, 
AcpiEventStatusBits ev)
 static const Property acpi_ged_properties[] = {
 DEFINE_PROP_UINT32("ged-event", AcpiGedState, ged_event_bitmap, 0),
 DEFINE_PROP_BOOL("x-has-hest-addr", AcpiGedState,
- ghes_state.use_hest_addr, false),
+ ghes_state.use_hest_addr, true),
 };
 
 static const VMStateDescription vmstate_memhp_state = {
-- 
2.48.1

[PATCH v7 02/19] tests/qtest/bios-tables-test: extend to also check HEST table

2025-03-02 Thread Mauro Carvalho Chehab

Currently, aarch64 can generate a HEST table when loaded with
-machine ras=on. Add support for it.

Signed-off-by: Mauro Carvalho Chehab 
Reviewed-by: Igor Mammedov 
Reviewed-by: Jonathan Cameron 
---
 tests/qtest/bios-tables-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
index 0a333ec43536..8d41601cc9e9 100644
--- a/tests/qtest/bios-tables-test.c
+++ b/tests/qtest/bios-tables-test.c
@@ -2122,7 +2122,7 @@ static void test_acpi_aarch64_virt_tcg(void)
 
 data.smbios_cpu_max_speed = 2900;
 data.smbios_cpu_curr_speed = 2700;
-test_acpi_one("-cpu cortex-a57 "
+test_acpi_one("-cpu cortex-a57 -machine ras=on "
   "-smbios type=4,max-speed=2900,current-speed=2700", &data);
 free_test_data(&data);
 }
-- 
2.48.1

[PATCH v7 05/19] acpi/ghes: prepare to change the way HEST offsets are calculated

2025-03-02 Thread Mauro Carvalho Chehab

Add a new ags flag to change the way HEST offsets are calculated.
Currently, offsets needed to store ACPI HEST offsets and read ack
are calculated based on a previous knowledge from the logic
which creates the HEST table.

Such logic is not generic, not allowing to easily add more HEST
entries nor replicates what OSPM does.

As the next patches will be adding a more generic logic, add a
new use_hest_addr, set to false, in preparation for such changes.

Signed-off-by: Mauro Carvalho Chehab 
Reviewed-by: Jonathan Cameron 
Reviewed-by: Igor Mammedov 
---
 hw/acpi/ghes.c   | 39 ---
 hw/arm/virt-acpi-build.c | 13 ++---
 include/hw/acpi/ghes.h   | 12 +++-
 3 files changed, 45 insertions(+), 19 deletions(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 84b891fd3dcf..9243b5ad4acb 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -206,7 +206,8 @@ ghes_gen_err_data_uncorrectable_recoverable(GArray *block,
  * Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg 
blobs.
  * See docs/specs/acpi_hest_ghes.rst for blobs format.
  */
-static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
+static void build_ghes_error_table(AcpiGhesState *ags, GArray *hardware_errors,
+   BIOSLinker *linker)
 {
 int i, error_status_block_offset;
 
@@ -251,13 +252,15 @@ static void build_ghes_error_table(GArray 
*hardware_errors, BIOSLinker *linker)
i * ACPI_GHES_MAX_RAW_DATA_LENGTH);
 }
 
-/*
- * tell firmware to write hardware_errors GPA into
- * hardware_errors_addr fw_cfg, once the former has been initialized.
- */
-bios_linker_loader_write_pointer(linker, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, 0,
- sizeof(uint64_t),
- ACPI_HW_ERROR_FW_CFG_FILE, 0);
+if (!ags->use_hest_addr) {
+/*
+ * Tell firmware to write hardware_errors GPA into
+ * hardware_errors_addr fw_cfg, once the former has been initialized.
+ */
+bios_linker_loader_write_pointer(linker, 
ACPI_HW_ERROR_ADDR_FW_CFG_FILE,
+ 0, sizeof(uint64_t),
+ ACPI_HW_ERROR_FW_CFG_FILE, 0);
+}
 }
 
 /* Build Generic Hardware Error Source version 2 (GHESv2) */
@@ -331,14 +334,15 @@ static void build_ghes_v2(GArray *table_data,
 }
 
 /* Build Hardware Error Source Table */
-void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
+void acpi_build_hest(AcpiGhesState *ags, GArray *table_data,
+ GArray *hardware_errors,
  BIOSLinker *linker,
  const char *oem_id, const char *oem_table_id)
 {
 AcpiTable table = { .sig = "HEST", .rev = 1,
 .oem_id = oem_id, .oem_table_id = oem_table_id };
 
-build_ghes_error_table(hardware_errors, linker);
+build_ghes_error_table(ags, hardware_errors, linker);
 
 acpi_table_begin(&table, table_data);
 
@@ -357,9 +361,11 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState 
*s,
 fw_cfg_add_file(s, ACPI_HW_ERROR_FW_CFG_FILE, hardware_error->data,
 hardware_error->len);
 
-/* Create a read-write fw_cfg file for Address */
-fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
-NULL, &(ags->hw_error_le), sizeof(ags->hw_error_le), false);
+if (!ags->use_hest_addr) {
+/* Create a read-write fw_cfg file for Address */
+fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
+NULL, &(ags->hw_error_le), sizeof(ags->hw_error_le), false);
+}
 }
 
 static void get_hw_error_offsets(uint64_t ghes_addr,
@@ -395,8 +401,11 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const 
void *cper, size_t len,
 }
 
 assert(ACPI_GHES_ERROR_SOURCE_COUNT == 1);
-get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
- &cper_addr, &read_ack_register_addr);
+
+if (!ags->use_hest_addr) {
+get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
+ &cper_addr, &read_ack_register_addr);
+}
 
 cpu_physical_memory_read(read_ack_register_addr,
  &read_ack_register, sizeof(read_ack_register));
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 3ac8f8e17861..040d875d4e83 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -946,9 +946,16 @@ void virt_acpi_build(VirtMachineState *vms, 
AcpiBuildTables *tables)
 build_dbg2(tables_blob, tables->linker, vms);
 
 if (vms->ras) {
-acpi_add_table(table_offsets, tables_blob);
-acpi_build_hest(tables_blob, tables->hardware_errors, tables->linker,
-vms->oem_id, vms->oem_table_id);
+AcpiGedState *acpi_ged_state;
+AcpiGhesState *ags;
+
+

[PATCH v7 18/19] docs: hest: add new "etc/acpi_table_hest_addr" and update workflow

2025-03-02 Thread Mauro Carvalho Chehab

While the HEST layout didn't change, there are some internal
changes related to how offsets are calculated and how memory error
events are triggered.

Update specs to reflect such changes.

Signed-off-by: Mauro Carvalho Chehab 
Reviewed-by: Jonathan Cameron 
Reviewed-by: Igor Mammedov 
---
 docs/specs/acpi_hest_ghes.rst | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/docs/specs/acpi_hest_ghes.rst b/docs/specs/acpi_hest_ghes.rst
index c3e9f8d9a702..3d1b85d74b70 100644
--- a/docs/specs/acpi_hest_ghes.rst
+++ b/docs/specs/acpi_hest_ghes.rst
@@ -89,12 +89,21 @@ Design Details
 addresses in the "error_block_address" fields with a pointer to the
 respective "Error Status Data Block" in the "etc/hardware_errors" blob.
 
-(8) QEMU defines a third and write-only fw_cfg blob which is called
-"etc/hardware_errors_addr". Through that blob, the firmware can send back
-the guest-side allocation addresses to QEMU. The "etc/hardware_errors_addr"
-blob contains a 8-byte entry. QEMU generates a single WRITE_POINTER command
-for the firmware. The firmware will write back the start address of
-"etc/hardware_errors" blob to the fw_cfg file "etc/hardware_errors_addr".
+(8) QEMU defines a third and write-only fw_cfg blob to store the location
+where the error block offsets, read ack registers and CPER records are
+stored.
+
+Up to QEMU 9.2, the location was at "etc/hardware_errors_addr", and
+contains a GPA for the beginning of "etc/hardware_errors".
+
+Newer versions place the location at "etc/acpi_table_hest_addr",
+pointing to the GPA of the HEST table.
+
+Using above mentioned 'fw_cfg' files, the firmware can send back the
+guest-side allocation addresses to QEMU. They contain a 8-byte entry.
+QEMU generates a single WRITE_POINTER command for the firmware. The
+firmware will write back the start address of either "etc/hardware_errors"
+or HEST table at the corresponding fw_cfg file.
 
 (9) When QEMU gets a SIGBUS from the kernel, QEMU writes CPER into 
corresponding
 "Error Status Data Block", guest memory, and then injects platform specific
@@ -105,8 +114,5 @@ Design Details
  kernel, on receiving notification, guest APEI driver could read the CPER 
error
  and take appropriate action.
 
-(11) kvm_arch_on_sigbus_vcpu() uses source_id as index in 
"etc/hardware_errors" to
- find out "Error Status Data Block" entry corresponding to error source. 
So supported
- source_id values should be assigned here and not be changed afterwards to 
make sure
- that guest will write error into expected "Error Status Data Block" even 
if guest was
- migrated to a newer QEMU.
+(11) kvm_arch_on_sigbus_vcpu() report RAS errors via a SEA notifications,
+ when a SIGBUS event is triggered.
-- 
2.48.1

[PATCH v7 06/19] acpi/ghes: add a firmware file with HEST address

2025-03-02 Thread Mauro Carvalho Chehab

Store HEST table address at GPA, placing its the start of the table at
hest_addr_le variable.

Signed-off-by: Mauro Carvalho Chehab 
Reviewed-by: Jonathan Cameron 
Reviewed-by: Igor Mammedov 
---
 hw/acpi/ghes.c | 22 --
 include/hw/acpi/ghes.h |  6 +-
 2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 9243b5ad4acb..cbc96c1909f0 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -30,6 +30,7 @@
 
 #define ACPI_HW_ERROR_FW_CFG_FILE   "etc/hardware_errors"
 #define ACPI_HW_ERROR_ADDR_FW_CFG_FILE  "etc/hardware_errors_addr"
+#define ACPI_HEST_ADDR_FW_CFG_FILE  "etc/acpi_table_hest_addr"
 
 /* The max size in bytes for one error block */
 #define ACPI_GHES_MAX_RAW_DATA_LENGTH   (1 * KiB)
@@ -341,6 +342,9 @@ void acpi_build_hest(AcpiGhesState *ags, GArray *table_data,
 {
 AcpiTable table = { .sig = "HEST", .rev = 1,
 .oem_id = oem_id, .oem_table_id = oem_table_id };
+uint32_t hest_offset;
+
+hest_offset = table_data->len;
 
 build_ghes_error_table(ags, hardware_errors, linker);
 
@@ -352,6 +356,17 @@ void acpi_build_hest(AcpiGhesState *ags, GArray 
*table_data,
   ACPI_GHES_NOTIFY_SEA, ACPI_HEST_SRC_ID_SEA);
 
 acpi_table_end(linker, &table);
+
+if (ags->use_hest_addr) {
+/*
+ * Tell firmware to write into GPA the address of HEST via fw_cfg,
+ * once initialized.
+ */
+bios_linker_loader_write_pointer(linker,
+ ACPI_HEST_ADDR_FW_CFG_FILE, 0,
+ sizeof(uint64_t),
+ ACPI_BUILD_TABLE_FILE, hest_offset);
+}
 }
 
 void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
@@ -361,7 +376,10 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState 
*s,
 fw_cfg_add_file(s, ACPI_HW_ERROR_FW_CFG_FILE, hardware_error->data,
 hardware_error->len);
 
-if (!ags->use_hest_addr) {
+if (ags->use_hest_addr) {
+fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
+NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
+} else {
 /* Create a read-write fw_cfg file for Address */
 fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
 NULL, &(ags->hw_error_le), sizeof(ags->hw_error_le), false);
@@ -484,7 +502,7 @@ AcpiGhesState *acpi_ghes_get_state(void)
 }
 ags = &acpi_ged_state->ghes_state;
 
-if (!ags->hw_error_le) {
+if (!ags->hw_error_le && !ags->hest_addr_le) {
 return NULL;
 }
 return ags;
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 411f592662af..ea9baab764e2 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -70,9 +70,13 @@ enum {
  * When use_hest_addr is false, the GPA of the etc/hardware_errors firmware
  * is stored at hw_error_le. This is the default on QEMU 9.x.
  *
- * An GPA value equal to zero means that GHES is not present.
+ * When use_hest_addr is true, the GPA of the HEST table is stored at
+ * hest_addr_le. This is the default for QEMU 10.x and above.
+ *
+ * Whe both GPA values are equal to zero means that GHES is not present.
  */
 typedef struct AcpiGhesState {
+uint64_t hest_addr_le;
 uint64_t hw_error_le;
 bool use_hest_addr; /* Currently, always false */
 } AcpiGhesState;
-- 
2.48.1

[PATCH v7 14/19] arm/virt: Wire up a GED error device for ACPI / GHES

2025-03-02 Thread Mauro Carvalho Chehab

Adds support to ARM virtualization to allow handling
generic error ACPI Event via GED & error source device.

It is aligned with Linux Kernel patch:
https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.hu...@intel.com/

Co-authored-by: Mauro Carvalho Chehab 
Co-authored-by: Jonathan Cameron 
Signed-off-by: Jonathan Cameron 
Signed-off-by: Mauro Carvalho Chehab 
Acked-by: Igor Mammedov 
Reviewed-by: Jonathan Cameron 
---
 hw/arm/virt-acpi-build.c |  1 +
 hw/arm/virt.c| 12 +++-
 include/hw/arm/virt.h|  1 +
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 5443615d976d..2bf9118fda55 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -861,6 +861,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 }
 
 acpi_dsdt_add_power_button(scope);
+aml_append(scope, aml_error_device());
 #ifdef CONFIG_TPM
 acpi_dsdt_add_tpm(scope, vms);
 #endif
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 4a5a9666e916..3faf32f900b5 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -678,7 +678,7 @@ static inline DeviceState *create_acpi_ged(VirtMachineState 
*vms)
 DeviceState *dev;
 MachineState *ms = MACHINE(vms);
 int irq = vms->irqmap[VIRT_ACPI_GED];
-uint32_t event = ACPI_GED_PWR_DOWN_EVT;
+uint32_t event = ACPI_GED_PWR_DOWN_EVT | ACPI_GED_ERROR_EVT;
 
 if (ms->ram_slots) {
 event |= ACPI_GED_MEM_HOTPLUG_EVT;
@@ -1010,6 +1010,13 @@ static void virt_powerdown_req(Notifier *n, void *opaque)
 }
 }
 
+static void virt_generic_error_req(Notifier *n, void *opaque)
+{
+VirtMachineState *s = container_of(n, VirtMachineState, 
generic_error_notifier);
+
+acpi_send_event(s->acpi_dev, ACPI_GENERIC_ERROR);
+}
+
 static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
  uint32_t phandle)
 {
@@ -2404,6 +2411,9 @@ static void machvirt_init(MachineState *machine)
 
 if (has_ged && aarch64 && firmware_loaded && virt_is_acpi_enabled(vms)) {
 vms->acpi_dev = create_acpi_ged(vms);
+vms->generic_error_notifier.notify = virt_generic_error_req;
+notifier_list_add(&acpi_generic_error_notifiers,
+  &vms->generic_error_notifier);
 } else {
 create_gpio_devices(vms, VIRT_GPIO, sysmem);
 }
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index c8e94e6aedc9..f3cf28436770 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -176,6 +176,7 @@ struct VirtMachineState {
 DeviceState *gic;
 DeviceState *acpi_dev;
 Notifier powerdown_notifier;
+Notifier generic_error_notifier;
 PCIBus *bus;
 char *oem_id;
 char *oem_table_id;
-- 
2.48.1

[PATCH v7 15/19] qapi/acpi-hest: add an interface to do generic CPER error injection

2025-03-02 Thread Mauro Carvalho Chehab

Creates a QMP command to be used for generic ACPI APEI hardware error
injection (HEST) via GHESv2, and add support for it for ARM guests.

Error injection uses ACPI_HEST_SRC_ID_QMP source ID to be platform
independent. This is mapped at arch virt bindings, depending on the
types supported by QEMU and by the BIOS. So, on ARM, this is supported
via ACPI_GHES_NOTIFY_GPIO notification type.

This patch is co-authored:
- original ghes logic to inject a simple ARM record by Shiju Jose;
- generic logic to handle block addresses by Jonathan Cameron;
- generic GHESv2 error inject by Mauro Carvalho Chehab;

Co-authored-by: Jonathan Cameron 
Co-authored-by: Shiju Jose 
Co-authored-by: Mauro Carvalho Chehab 
Signed-off-by: Jonathan Cameron 
Signed-off-by: Shiju Jose 
Signed-off-by: Mauro Carvalho Chehab 
Acked-by: Igor Mammedov 
Acked-by: Markus Armbruster 
---
 MAINTAINERS  |  7 +++
 hw/acpi/Kconfig  |  5 +
 hw/acpi/ghes.c   |  2 +-
 hw/acpi/ghes_cper.c  | 38 ++
 hw/acpi/ghes_cper_stub.c | 19 +++
 hw/acpi/meson.build  |  2 ++
 hw/arm/virt-acpi-build.c |  1 +
 hw/arm/virt.c|  7 +++
 include/hw/acpi/ghes.h   |  1 +
 include/hw/arm/virt.h|  1 +
 qapi/acpi-hest.json  | 35 +++
 qapi/meson.build |  1 +
 qapi/qapi-schema.json|  1 +
 13 files changed, 119 insertions(+), 1 deletion(-)
 create mode 100644 hw/acpi/ghes_cper.c
 create mode 100644 hw/acpi/ghes_cper_stub.c
 create mode 100644 qapi/acpi-hest.json

diff --git a/MAINTAINERS b/MAINTAINERS
index 1911949526ce..7358735007c8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2081,6 +2081,13 @@ F: hw/acpi/ghes.c
 F: include/hw/acpi/ghes.h
 F: docs/specs/acpi_hest_ghes.rst
 
+ACPI/HEST/GHES/ARM processor CPER
+R: Mauro Carvalho Chehab 
+S: Maintained
+F: hw/arm/ghes_cper.c
+F: hw/acpi/ghes_cper_stub.c
+F: qapi/acpi-hest.json
+
 ppc4xx
 L: qemu-...@nongnu.org
 S: Orphan
diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
index 1d4e9f0845c0..daabbe6cd11e 100644
--- a/hw/acpi/Kconfig
+++ b/hw/acpi/Kconfig
@@ -51,6 +51,11 @@ config ACPI_APEI
 bool
 depends on ACPI
 
+config GHES_CPER
+bool
+depends on ACPI_APEI
+default y
+
 config ACPI_PCI
 bool
 depends on ACPI && PCI
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 0135ac844bcf..1d02ef6dcb70 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -553,7 +553,7 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void 
*cper, size_t len,
 /* Write the generic error data entry into guest memory */
 cpu_physical_memory_write(cper_addr, cper, len);
 
-notifier_list_notify(&acpi_generic_error_notifiers, NULL);
+notifier_list_notify(&acpi_generic_error_notifiers, &source_id);
 }
 
 int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
diff --git a/hw/acpi/ghes_cper.c b/hw/acpi/ghes_cper.c
new file mode 100644
index ..0a2d95dd8b27
--- /dev/null
+++ b/hw/acpi/ghes_cper.c
@@ -0,0 +1,38 @@
+/*
+ * CPER payload parser for error injection
+ *
+ * Copyright(C) 2024-2025 Huawei LTD.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+
+#include "qemu/base64.h"
+#include "qemu/error-report.h"
+#include "qemu/uuid.h"
+#include "qapi/qapi-commands-acpi-hest.h"
+#include "hw/acpi/ghes.h"
+
+void qmp_inject_ghes_v2_error(const char *qmp_cper, Error **errp)
+{
+AcpiGhesState *ags;
+
+ags = acpi_ghes_get_state();
+if (!ags) {
+return;
+}
+
+uint8_t *cper;
+size_t  len;
+
+cper = qbase64_decode(qmp_cper, -1, &len, errp);
+if (!cper) {
+error_setg(errp, "missing GHES CPER payload");
+return;
+}
+
+ghes_record_cper_errors(ags, cper, len, ACPI_HEST_SRC_ID_QMP, errp);
+}
diff --git a/hw/acpi/ghes_cper_stub.c b/hw/acpi/ghes_cper_stub.c
new file mode 100644
index ..5ebc61970a78
--- /dev/null
+++ b/hw/acpi/ghes_cper_stub.c
@@ -0,0 +1,19 @@
+/*
+ * Stub interface for CPER payload parser for error injection
+ *
+ * Copyright(C) 2024-2025 Huawei LTD.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qapi/qapi-commands-acpi-hest.h"
+#include "hw/acpi/ghes.h"
+
+void qmp_inject_ghes_v2_error(const char *cper, Error **errp)
+{
+error_setg(errp, "GHES QMP error inject is not compiled in");
+}
diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
index 73f02b96912b..56b5d1ec9691 100644
--- a/hw/acpi/meson.build
+++ b/hw/acpi/meson.build
@@ -34,4 +34,6 @@ endif
 system_ss.add(when: 'CONFIG_ACPI', if_false: files('acpi-stub.c', 
'aml-build-stub.c', 'ghes-stub.c', 'acpi_interface.c'))
 system_ss.add(when: 'CONFIG_ACPI_PCI_BRIDGE', if_false: 
files('pci-bridge-stub.c'))
 system_ss.add_all(when: 'CONFIG_ACPI', if_t

[PATCH v7 04/19] acpi/ghes: Cleanup the code which gets ghes ged state

2025-03-02 Thread Mauro Carvalho Chehab

Move the check logic into a common function and simplify the
code which checks if GHES is enabled and was properly setup.

Signed-off-by: Mauro Carvalho Chehab 
Reviewed-by: Jonathan Cameron 
Reviewed-by: Igor Mammedov 
---
 hw/acpi/ghes-stub.c|  7 ---
 hw/acpi/ghes.c | 38 +++---
 include/hw/acpi/ghes.h | 14 +++---
 target/arm/kvm.c   |  7 +--
 4 files changed, 27 insertions(+), 39 deletions(-)

diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
index 7cec1812dad9..40f660c246fe 100644
--- a/hw/acpi/ghes-stub.c
+++ b/hw/acpi/ghes-stub.c
@@ -11,12 +11,13 @@
 #include "qemu/osdep.h"
 #include "hw/acpi/ghes.h"
 
-int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
+int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
+uint64_t physical_address)
 {
 return -1;
 }
 
-bool acpi_ghes_present(void)
+AcpiGhesState *acpi_ghes_get_state(void)
 {
-return false;
+return NULL;
 }
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index b709c177cdea..84b891fd3dcf 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -360,18 +360,12 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState 
*s,
 /* Create a read-write fw_cfg file for Address */
 fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
 NULL, &(ags->hw_error_le), sizeof(ags->hw_error_le), false);
-
-ags->present = true;
 }
 
 static void get_hw_error_offsets(uint64_t ghes_addr,
  uint64_t *cper_addr,
  uint64_t *read_ack_register_addr)
 {
-if (!ghes_addr) {
-return;
-}
-
 /*
  * non-HEST version supports only one source, so no need to change
  * the start offset based on the source ID. Also, we can't validate
@@ -390,35 +384,20 @@ static void get_hw_error_offsets(uint64_t ghes_addr,
 *read_ack_register_addr = ghes_addr + sizeof(uint64_t);
 }
 
-void ghes_record_cper_errors(const void *cper, size_t len,
+void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
  uint16_t source_id, Error **errp)
 {
 uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register;
-AcpiGedState *acpi_ged_state;
-AcpiGhesState *ags;
 
 if (len > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
 error_setg(errp, "GHES CPER record is too big: %zd", len);
 return;
 }
 
-acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
-   NULL));
-if (!acpi_ged_state) {
-error_setg(errp, "Can't find ACPI_GED object");
-return;
-}
-ags = &acpi_ged_state->ghes_state;
-
 assert(ACPI_GHES_ERROR_SOURCE_COUNT == 1);
 get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
  &cper_addr, &read_ack_register_addr);
 
-if (!cper_addr) {
-error_setg(errp, "can not find Generic Error Status Block");
-return;
-}
-
 cpu_physical_memory_read(read_ack_register_addr,
  &read_ack_register, sizeof(read_ack_register));
 
@@ -444,7 +423,8 @@ void ghes_record_cper_errors(const void *cper, size_t len,
 return;
 }
 
-int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
+int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
+uint64_t physical_address)
 {
 /* Memory Error Section Type */
 const uint8_t guid[] =
@@ -470,7 +450,7 @@ int acpi_ghes_memory_errors(uint16_t source_id, uint64_t 
physical_address)
 acpi_ghes_build_append_mem_cper(block, physical_address);
 
 /* Report the error */
-ghes_record_cper_errors(block->data, block->len, source_id, &errp);
+ghes_record_cper_errors(ags, block->data, block->len, source_id, &errp);
 
 g_array_free(block, true);
 
@@ -482,7 +462,7 @@ int acpi_ghes_memory_errors(uint16_t source_id, uint64_t 
physical_address)
 return 0;
 }
 
-bool acpi_ghes_present(void)
+AcpiGhesState *acpi_ghes_get_state(void)
 {
 AcpiGedState *acpi_ged_state;
 AcpiGhesState *ags;
@@ -491,8 +471,12 @@ bool acpi_ghes_present(void)
NULL));
 
 if (!acpi_ged_state) {
-return false;
+return NULL;
 }
 ags = &acpi_ged_state->ghes_state;
-return ags->present;
+
+if (!ags->hw_error_le) {
+return NULL;
+}
+return ags;
 }
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 39619a2457cb..f96ac3e85ca2 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -66,7 +66,6 @@ enum {
 
 typedef struct AcpiGhesState {
 uint64_t hw_error_le;
-bool present; /* True if GHES is present at all on this board */
 } AcpiGhesState;
 
 void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
@@ -74,15 +73,16 @@ void acpi_build_hest(GArray *table_data, GArray

[PATCH v7 12/19] acpi/generic_event_device: add an APEI error device

2025-03-02 Thread Mauro Carvalho Chehab

Adds a generic error device to handle generic hardware error
events as specified at ACPI 6.5 specification at 18.3.2.7.2:
https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
using HID PNP0C33.

The PNP0C33 device is used to report hardware errors to
the guest via ACPI APEI Generic Hardware Error Source (GHES).

Co-authored-by: Mauro Carvalho Chehab 
Co-authored-by: Jonathan Cameron 
Signed-off-by: Jonathan Cameron 
Signed-off-by: Mauro Carvalho Chehab 
Reviewed-by: Igor Mammedov 
---
 hw/acpi/aml-build.c| 10 ++
 hw/acpi/generic_event_device.c | 13 +
 include/hw/acpi/acpi_dev_interface.h   |  1 +
 include/hw/acpi/aml-build.h|  2 ++
 include/hw/acpi/generic_event_device.h |  1 +
 5 files changed, 27 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index f8f93a9f66c8..e4bd7b611372 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -2614,3 +2614,13 @@ Aml *aml_i2c_serial_bus_device(uint16_t address, const 
char *resource_source)
 
 return var;
 }
+
+/* ACPI 5.0b: 18.3.2.6.2 Event Notification For Generic Error Sources */
+Aml *aml_error_device(void)
+{
+Aml *dev = aml_device(ACPI_APEI_ERROR_DEVICE);
+aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33")));
+aml_append(dev, aml_name_decl("_UID", aml_int(0)));
+
+return dev;
+}
diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index 471d1a7afc76..f029753ab709 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -26,6 +26,7 @@ static const uint32_t ged_supported_events[] = {
 ACPI_GED_PWR_DOWN_EVT,
 ACPI_GED_NVDIMM_HOTPLUG_EVT,
 ACPI_GED_CPU_HOTPLUG_EVT,
+ACPI_GED_ERROR_EVT,
 };
 
 /*
@@ -116,6 +117,16 @@ void build_ged_aml(Aml *table, const char *name, 
HotplugHandler *hotplug_dev,
aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE),
   aml_int(0x80)));
 break;
+case ACPI_GED_ERROR_EVT:
+/*
+ * ACPI 5.0b: 5.6.6 Device Object Notifications
+ * Table 5-135 Error Device Notification Values
+ * Defines 0x80 as the value to be used on notifications
+ */
+aml_append(if_ctx,
+   aml_notify(aml_name(ACPI_APEI_ERROR_DEVICE),
+  aml_int(0x80)));
+break;
 case ACPI_GED_NVDIMM_HOTPLUG_EVT:
 aml_append(if_ctx,
aml_notify(aml_name("\\_SB.NVDR"),
@@ -295,6 +306,8 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, 
AcpiEventStatusBits ev)
 sel = ACPI_GED_MEM_HOTPLUG_EVT;
 } else if (ev & ACPI_POWER_DOWN_STATUS) {
 sel = ACPI_GED_PWR_DOWN_EVT;
+} else if (ev & ACPI_GENERIC_ERROR) {
+sel = ACPI_GED_ERROR_EVT;
 } else if (ev & ACPI_NVDIMM_HOTPLUG_STATUS) {
 sel = ACPI_GED_NVDIMM_HOTPLUG_EVT;
 } else if (ev & ACPI_CPU_HOTPLUG_STATUS) {
diff --git a/include/hw/acpi/acpi_dev_interface.h 
b/include/hw/acpi/acpi_dev_interface.h
index 68d9d15f50aa..8294f8f0ccca 100644
--- a/include/hw/acpi/acpi_dev_interface.h
+++ b/include/hw/acpi/acpi_dev_interface.h
@@ -13,6 +13,7 @@ typedef enum {
 ACPI_NVDIMM_HOTPLUG_STATUS = 16,
 ACPI_VMGENID_CHANGE_STATUS = 32,
 ACPI_POWER_DOWN_STATUS = 64,
+ACPI_GENERIC_ERROR = 128,
 } AcpiEventStatusBits;
 
 #define TYPE_ACPI_DEVICE_IF "acpi-device-interface"
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index c18f68134246..f38e12971932 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -252,6 +252,7 @@ struct CrsRangeSet {
 /* Consumer/Producer */
 #define AML_SERIAL_BUS_FLAG_CONSUME_ONLY(1 << 1)
 
+#define ACPI_APEI_ERROR_DEVICE   "GEDD"
 /**
  * init_aml_allocator:
  *
@@ -382,6 +383,7 @@ Aml *aml_dma(AmlDmaType typ, AmlDmaBusMaster bm, 
AmlTransferSize sz,
  uint8_t channel);
 Aml *aml_sleep(uint64_t msec);
 Aml *aml_i2c_serial_bus_device(uint16_t address, const char *resource_source);
+Aml *aml_error_device(void);
 
 /* Block AML object primitives */
 Aml *aml_scope(const char *name_format, ...) G_GNUC_PRINTF(1, 2);
diff --git a/include/hw/acpi/generic_event_device.h 
b/include/hw/acpi/generic_event_device.h
index d2dac87b4a9f..1c18ac296fcb 100644
--- a/include/hw/acpi/generic_event_device.h
+++ b/include/hw/acpi/generic_event_device.h
@@ -101,6 +101,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED)
 #define ACPI_GED_PWR_DOWN_EVT  0x2
 #define ACPI_GED_NVDIMM_HOTPLUG_EVT 0x4
 #define ACPI_GED_CPU_HOTPLUG_EVT0x8
+#define ACPI_GED_ERROR_EVT  0x10
 
 typedef struct GEDState {
 MemoryRegion evt;
-- 
2.48.1

[PATCH 0/3] Enhancing Device Identification in RISC-V IOMMU Using Memory Attributes

2025-03-02 Thread Jason Chien

This patch series enhances how device IDs are handled in RISC-V IOMMU by
leveraging memory attributes.

The BDF (Bus-Device-Function) is now included in memory attributes for
DMA operations, ensuring accurate device identification.

Since PCIe bus numbers can change after re-enumeration, relying on static
device IDs in RISCVIOMMUSpace may lead to incorrect Device Directory Table
walk. The IOMMU now dynamically retrieves latest device IDs from memory
attributes.

The bus property, previously used to set non-root endpoint bus numbers,
is removed. As PCIe bus numbers are assigned at runtime and vary across
endpoints, exposing a single property to pre-set them is unnecessary and
incorrect. With device IDs now retrieved dynamically, this property is no
longer required.

Jason Chien (3):
  include/hw/pci: Attach BDF to Memory Attributes
  hw/riscv/riscv-iommu: Obtain Device IDs from Memory Attributes
  hw/riscv/riscv_iommu: Remove the "bus" property

 hw/riscv/riscv-iommu.c  | 15 +++
 hw/riscv/riscv-iommu.h  |  1 -
 include/hw/pci/pci_device.h | 10 --
 3 files changed, 15 insertions(+), 11 deletions(-)

-- 
2.43.2

[PATCH 3/3] hw/riscv/riscv_iommu: Remove the "bus" property

2025-03-02 Thread Jason Chien

This property was originally intended to set the bus number for non-root
endpoints. However, since the PCIe bus number is assigned and modified
at runtime, setting this property before software execution is incorrect.
Additionally, the property incorrectly assumes that all endpoints share
the same bus, whereas no such restriction exists.

With the IOMMU now retrieving the latest device IDs from memory attributes,
there is no longer a need to set or update device IDs.

Signed-off-by: Jason Chien 
---
 hw/riscv/riscv-iommu.c | 7 ---
 hw/riscv/riscv-iommu.h | 1 -
 2 files changed, 8 deletions(-)

diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c
index b72ce8e6d0..1ca85b95ac 100644
--- a/hw/riscv/riscv-iommu.c
+++ b/hw/riscv/riscv-iommu.c
@@ -1197,9 +1197,6 @@ static AddressSpace *riscv_iommu_space(RISCVIOMMUState 
*s, uint32_t devid)
 {
 RISCVIOMMUSpace *as;
 
-/* FIXME: PCIe bus remapping for attached endpoints. */
-devid |= s->bus << 8;
-
 QLIST_FOREACH(as, &s->spaces, list) {
 if (as->devid == devid) {
 break;
@@ -2261,9 +2258,6 @@ static MemTxResult riscv_iommu_trap_write(void *opaque, 
hwaddr addr,
 return MEMTX_ACCESS_ERROR;
 }
 
-/* FIXME: PCIe bus remapping for attached endpoints. */
-devid |= s->bus << 8;
-
 ctx = riscv_iommu_ctx(s, devid, 0, &ref);
 if (ctx == NULL) {
 res = MEMTX_ACCESS_ERROR;
@@ -2498,7 +2492,6 @@ void riscv_iommu_reset(RISCVIOMMUState *s)
 static const Property riscv_iommu_properties[] = {
 DEFINE_PROP_UINT32("version", RISCVIOMMUState, version,
 RISCV_IOMMU_SPEC_DOT_VER),
-DEFINE_PROP_UINT32("bus", RISCVIOMMUState, bus, 0x0),
 DEFINE_PROP_UINT32("ioatc-limit", RISCVIOMMUState, iot_limit,
 LIMIT_CACHE_IOT),
 DEFINE_PROP_BOOL("intremap", RISCVIOMMUState, enable_msi, TRUE),
diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h
index a31aa62144..655c0e71a8 100644
--- a/hw/riscv/riscv-iommu.h
+++ b/hw/riscv/riscv-iommu.h
@@ -34,7 +34,6 @@ struct RISCVIOMMUState {
 /*< public >*/
 uint32_t version; /* Reported interface version number */
 uint32_t pid_bits;/* process identifier width */
-uint32_t bus; /* PCI bus mapping for non-root endpoints */
 
 uint64_t cap; /* IOMMU supported capabilities */
 uint64_t fctl;/* IOMMU enabled features */
-- 
2.43.2

[PATCH 2/3] hw/riscv/riscv-iommu: Obtain Device IDs from Memory Attributes

2025-03-02 Thread Jason Chien

The bus number of a PCIe endpoint may change after PCIe re-enumeration,
potentially causing the device ID stored in RISCVIOMMUSpace to become
outdated. This can lead to an incorrect Device Directory Table walk.

This commit ensures that the IOMMU dynamically retrieves the latest device
IDs from the memory attributes of the requester devices, ensuring accuracy.

Signed-off-by: Jason Chien 
---
 hw/riscv/riscv-iommu.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c
index d46beb2d64..b72ce8e6d0 100644
--- a/hw/riscv/riscv-iommu.c
+++ b/hw/riscv/riscv-iommu.c
@@ -2644,7 +2644,13 @@ void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, 
PCIBus *bus,
 static int riscv_iommu_memory_region_index(IOMMUMemoryRegion *iommu_mr,
 MemTxAttrs attrs)
 {
-return attrs.unspecified ? RISCV_IOMMU_NOPROCID : (int)attrs.pid;
+RISCVIOMMUSpace *as = container_of(iommu_mr, RISCVIOMMUSpace, iova_mr);
+
+/* Requesters must attach its device ID. */
+g_assert(attrs.unspecified == 0);
+
+as->devid = attrs.requester_id;
+return attrs.pid;
 }
 
 static int riscv_iommu_memory_region_index_len(IOMMUMemoryRegion *iommu_mr)
-- 
2.43.2

[PATCH 1/3] include/hw/pci: Attach BDF to Memory Attributes

2025-03-02 Thread Jason Chien

This commit adds the BDF to the memory attributes for DMA operations.

Signed-off-by: Jason Chien 
---
 include/hw/pci/pci_device.h | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index add208edfa..968f1ba3e9 100644
--- a/include/hw/pci/pci_device.h
+++ b/include/hw/pci/pci_device.h
@@ -244,6 +244,8 @@ static inline MemTxResult pci_dma_rw(PCIDevice *dev, 
dma_addr_t addr,
  void *buf, dma_addr_t len,
  DMADirection dir, MemTxAttrs attrs)
 {
+attrs.unspecified = 0;
+attrs.requester_id = pci_requester_id(dev);
 return dma_memory_rw(pci_get_address_space(dev), addr, buf, len,
  dir, attrs);
 }
@@ -292,6 +294,8 @@ static inline MemTxResult pci_dma_write(PCIDevice *dev, 
dma_addr_t addr,
uint##_bits##_t *val, \
MemTxAttrs attrs) \
 { \
+attrs.unspecified = 0; \
+attrs.requester_id = pci_requester_id(dev); \
 return ld##_l##_dma(pci_get_address_space(dev), addr, val, attrs); \
 } \
 static inline MemTxResult st##_s##_pci_dma(PCIDevice *dev, \
@@ -299,6 +303,8 @@ static inline MemTxResult pci_dma_write(PCIDevice *dev, 
dma_addr_t addr,
uint##_bits##_t val, \
MemTxAttrs attrs) \
 { \
+attrs.unspecified = 0; \
+attrs.requester_id = pci_requester_id(dev); \
 return st##_s##_dma(pci_get_address_space(dev), addr, val, attrs); \
 }
 
@@ -327,8 +333,8 @@ PCI_DMA_DEFINE_LDST(q_be, q_be, 64);
 static inline void *pci_dma_map(PCIDevice *dev, dma_addr_t addr,
 dma_addr_t *plen, DMADirection dir)
 {
-return dma_memory_map(pci_get_address_space(dev), addr, plen, dir,
-  MEMTXATTRS_UNSPECIFIED);
+MemTxAttrs attrs = {.requester_id = pci_requester_id(dev)};
+return dma_memory_map(pci_get_address_space(dev), addr, plen, dir, attrs);
 }
 
 static inline void pci_dma_unmap(PCIDevice *dev, void *buffer, dma_addr_t len,
-- 
2.43.2

[PATCH v7 10/19] acpi/generic_event_device: Update GHES migration to cover hest addr

2025-03-02 Thread Mauro Carvalho Chehab

The GHES migration logic should now support HEST table location too.

Signed-off-by: Mauro Carvalho Chehab 
Reviewed-by: Jonathan Cameron 
Reviewed-by: Igor Mammedov 
---
 hw/acpi/generic_event_device.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index c85d97ca3776..5346cae573b7 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -386,6 +386,34 @@ static const VMStateDescription vmstate_ghes_state = {
 }
 };
 
+static const VMStateDescription vmstate_hest = {
+.name = "acpi-hest",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (const VMStateField[]) {
+VMSTATE_UINT64(hest_addr_le, AcpiGhesState),
+VMSTATE_END_OF_LIST()
+},
+};
+
+static bool hest_needed(void *opaque)
+{
+AcpiGedState *s = opaque;
+return s->ghes_state.hest_addr_le;
+}
+
+static const VMStateDescription vmstate_hest_state = {
+.name = "acpi-ged/hest",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = hest_needed,
+.fields = (const VMStateField[]) {
+VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
+   vmstate_hest, AcpiGhesState),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static const VMStateDescription vmstate_acpi_ged = {
 .name = "acpi-ged",
 .version_id = 1,
@@ -398,6 +426,7 @@ static const VMStateDescription vmstate_acpi_ged = {
 &vmstate_memhp_state,
 &vmstate_cpuhp_state,
 &vmstate_ghes_state,
+&vmstate_hest_state,
 NULL
 }
 };
-- 
2.48.1

[PATCH 23/38] target/hexagon: Add implicit attributes to sysemu macros

2025-03-02 Thread Brian Cain

From: Brian Cain 

Signed-off-by: Brian Cain 
---
 target/hexagon/hex_common.py | 5 +
 1 file changed, 5 insertions(+)

diff --git a/target/hexagon/hex_common.py b/target/hexagon/hex_common.py
index 1e94e1fef5..7b5bb2cd46 100755
--- a/target/hexagon/hex_common.py
+++ b/target/hexagon/hex_common.py
@@ -128,8 +128,13 @@ def calculate_attribs():
 add_qemu_macro_attrib("fTRAP", "A_IMPLICIT_READS_PC")
 add_qemu_macro_attrib("fSET_OVERFLOW", "A_IMPLICIT_WRITES_USR")
 add_qemu_macro_attrib("fSET_LPCFG", "A_IMPLICIT_WRITES_USR")
+add_qemu_macro_attrib("fLOAD_LOCKED", "A_LLSC")
+add_qemu_macro_attrib("fSTORE_LOCKED", "A_LLSC")
+add_qemu_macro_attrib("fCLEAR_RTE_EX", "A_IMPLICIT_WRITES_SSR")
 add_qemu_macro_attrib("fLOAD", "A_SCALAR_LOAD")
 add_qemu_macro_attrib("fSTORE", "A_SCALAR_STORE")
+add_qemu_macro_attrib("fSET_K0_LOCK", "A_IMPLICIT_READS_PC")
+add_qemu_macro_attrib("fSET_TLB_LOCK", "A_IMPLICIT_READS_PC")
 add_qemu_macro_attrib('fLSBNEW0', 'A_IMPLICIT_READS_P0')
 add_qemu_macro_attrib('fLSBNEW0NOT', 'A_IMPLICIT_READS_P0')
 add_qemu_macro_attrib('fREAD_P0', 'A_IMPLICIT_READS_P0')
-- 
2.34.1

[PATCH v7 17/19] tests/acpi: virt: update HEST and DSDT tables

2025-03-02 Thread Mauro Carvalho Chehab

@@ -1,39 +1,39 @@
 /*
  * Intel ACPI Component Architecture
  * AML/ASL+ Disassembler version 20240322 (64-bit version)
  * Copyright (c) 2000 - 2023 Intel Corporation
  *
- * Disassembly of tests/data/acpi/aarch64/virt/HEST
+ * Disassembly of /tmp/aml-DMPE22
  *
  * ACPI Data Table [HEST]
  *
  * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue (in 
hex)
  */

 [000h  004h]   Signature : "HEST"[Hardware Error 
Source Table]
-[004h 0004 004h]Table Length : 0084
+[004h 0004 004h]Table Length : 00E0
 [008h 0008 001h]Revision : 01
-[009h 0009 001h]Checksum : E2
+[009h 0009 001h]Checksum : 6C
 [00Ah 0010 006h]  Oem ID : "BOCHS "
 [010h 0016 008h]Oem Table ID : "BXPC"
 [018h 0024 004h]Oem Revision : 0001
 [01Ch 0028 004h] Asl Compiler ID : "BXPC"
 [020h 0032 004h]   Asl Compiler Revision : 0001

-[024h 0036 004h]  Error Source Count : 0001
+[024h 0036 004h]  Error Source Count : 0002

 [028h 0040 002h]   Subtable Type : 000A [Generic Hardware Error 
Source V2]
 [02Ah 0042 002h]   Source Id : 
 [02Ch 0044 002h]   Related Source Id : 
 [02Eh 0046 001h]Reserved : 00
 [02Fh 0047 001h] Enabled : 01
 [030h 0048 004h]  Records To Preallocate : 0001
 [034h 0052 004h] Max Sections Per Record : 0001
 [038h 0056 004h] Max Raw Data Length : 0400

 [03Ch 0060 00Ch]Error Status Address : [Generic Address Structure]
 [03Ch 0060 001h]Space ID : 00 [SystemMemory]
 [03Dh 0061 001h]   Bit Width : 40
 [03Eh 0062 001h]  Bit Offset : 00
 [03Fh 0063 001h]Encoded Access Width : 04 [QWord Access:64]
 [040h 0064 008h] Address : 43DA
@@ -42,32 +42,75 @@
 [048h 0072 001h] Notify Type : 08 [SEA]
 [049h 0073 001h]   Notify Length : 1C
 [04Ah 0074 002h]  Configuration Write Enable : 
 [04Ch 0076 004h]PollInterval : 
 [050h 0080 004h]  Vector : 
 [054h 0084 004h] Polling Threshold Value : 
 [058h 0088 004h]Polling Threshold Window : 
 [05Ch 0092 004h]   Error Threshold Value : 
 [060h 0096 004h]  Error Threshold Window : 

 [064h 0100 004h]   Error Status Block Length : 0400
 [068h 0104 00Ch]   Read Ack Register : [Generic Address Structure]
 [068h 0104 001h]Space ID : 00 [SystemMemory]
 [069h 0105 001h]   Bit Width : 40
 [06Ah 0106 001h]  Bit Offset : 00
 [06Bh 0107 001h]Encoded Access Width : 04 [QWord Access:64]
-[06Ch 0108 008h] Address : 43DA0008
+[06Ch 0108 008h] Address : 43DA0010

 [074h 0116 008h]   Read Ack Preserve : FFFE
 [07Ch 0124 008h]  Read Ack Write : 0001

-Raw Table Data: Length 132 (0x84)
+[084h 0132 002h]   Subtable Type : 000A [Generic Hardware Error 
Source V2]
+[086h 0134 002h]   Source Id : 0001
+[088h 0136 002h]   Related Source Id : 
+[08Ah 0138 001h]Reserved : 00
+[08Bh 0139 001h] Enabled : 01
+[08Ch 0140 004h]  Records To Preallocate : 0001
+[090h 0144 004h] Max Sections Per Record : 0001
+[094h 0148 004h] Max Raw Data Length : 0400
+
+[098h 0152 00Ch]Error Status Address : [Generic Address Structure]
+[098h 0152 001h]Space ID : 00 [SystemMemory]
+[099h 0153 001h]   Bit Width : 40
+[09Ah 0154 001h]  Bit Offset : 00
+[09Bh 0155 001h]Encoded Access Width : 04 [QWord Access:64]
+[09Ch 0156 008h] Address : 43DA0008
+
+[0A4h 0164 01Ch]  Notify : [Hardware Error Notification 
Structure]
+[0A4h 0164 001h] Notify Type : 07 [GPIO]
+[0A5h 0165 001h]   Notify Length : 1C
+[0A6h 0166 002h]  Configuration Write Enable : 
+[0A8h 0168 004h]PollInterval : 
+[0ACh 0172 004h]  Vector : 
+[0B0h 0176 004h] Polling Threshold Value : 
+[0B4h 0180 004h]Polling Threshold Window : 
+[0B8h 0184 004h]   Error Threshold Value : 
+[0BCh 0188 004h]  Error Threshold Window : 
+
+[0C0h 0192 004h]   Error Status Block Length : 0400
+[0C4h 0196 00Ch]   Read Ack Register : [Generic Address Structure]
+[0C4h 0196 001h]Space ID : 00 [SystemMemory]
+[0C5h 0197 001h]   Bit Width : 40
+[0C6h 0198 001h]  Bit Offset : 00
+[0C7h 0199 001h]Encoded Access Width : 04 [QWord Access:64]
+[0C8

[PATCH v7 11/19] acpi/generic_event_device: add logic to detect if HEST addr is available

2025-03-02 Thread Mauro Carvalho Chehab

Create a new property (x-has-hest-addr) and use it to detect if
the GHES table offsets can be calculated from the HEST address
(qemu 10.0 and upper) or via the legacy way via an offset obtained
from the hardware_errors firmware file.

Signed-off-by: Mauro Carvalho Chehab 
Reviewed-by: Jonathan Cameron 
Reviewed-by: Igor Mammedov 
---
 hw/acpi/generic_event_device.c |  2 ++
 hw/arm/virt-acpi-build.c   | 18 --
 hw/core/machine.c  |  2 ++
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index 5346cae573b7..471d1a7afc76 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -318,6 +318,8 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, 
AcpiEventStatusBits ev)
 
 static const Property acpi_ged_properties[] = {
 DEFINE_PROP_UINT32("ged-event", AcpiGedState, ged_event_bitmap, 0),
+DEFINE_PROP_BOOL("x-has-hest-addr", AcpiGedState,
+ ghes_state.use_hest_addr, false),
 };
 
 static const VMStateDescription vmstate_memhp_state = {
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index ea9682ee2662..5443615d976d 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -897,6 +897,10 @@ static const AcpiNotificationSourceId hest_ghes_notify[] = 
{
 { ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA },
 };
 
+static const AcpiNotificationSourceId hest_ghes_notify_9_2[] = {
+{ ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA },
+};
+
 static
 void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
 {
@@ -951,15 +955,25 @@ void virt_acpi_build(VirtMachineState *vms, 
AcpiBuildTables *tables)
 
 if (vms->ras) {
 AcpiGedState *acpi_ged_state;
+static const AcpiNotificationSourceId *notify;
+unsigned int notify_sz;
 AcpiGhesState *ags;
 
 acpi_ged_state = ACPI_GED(vms->acpi_dev);
 ags = &acpi_ged_state->ghes_state;
 if (ags) {
 acpi_add_table(table_offsets, tables_blob);
+
+if (!ags->use_hest_addr) {
+notify = hest_ghes_notify_9_2;
+notify_sz = ARRAY_SIZE(hest_ghes_notify_9_2);
+} else {
+notify = hest_ghes_notify;
+notify_sz = ARRAY_SIZE(hest_ghes_notify);
+}
+
 acpi_build_hest(ags, tables_blob, tables->hardware_errors,
-tables->linker, hest_ghes_notify,
-ARRAY_SIZE(hest_ghes_notify),
+tables->linker, notify, notify_sz,
 vms->oem_id, vms->oem_table_id);
 }
 }
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 02cff735b3fb..7a11e0f87b11 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -34,6 +34,7 @@
 #include "hw/virtio/virtio-pci.h"
 #include "hw/virtio/virtio-net.h"
 #include "hw/virtio/virtio-iommu.h"
+#include "hw/acpi/generic_event_device.h"
 #include "audio/audio.h"
 
 GlobalProperty hw_compat_9_2[] = {
@@ -43,6 +44,7 @@ GlobalProperty hw_compat_9_2[] = {
 { "virtio-balloon-pci-non-transitional", "vectors", "0" },
 { "virtio-mem-pci", "vectors", "0" },
 { "migration", "multifd-clean-tls-termination", "false" },
+{ TYPE_ACPI_GED, "x-has-hest-addr", "false" },
 };
 const size_t hw_compat_9_2_len = G_N_ELEMENTS(hw_compat_9_2);
 
-- 
2.48.1

[PATCH v7 13/19] tests/acpi: virt: allow acpi table changes at DSDT and HEST tables

2025-03-02 Thread Mauro Carvalho Chehab

We'll be adding a new GED device for HEST GPIO notification and
increasing the number of entries at the HEST table.

Blocklist testing HEST and DSDT tables until such changes
are completed.

Signed-off-by: Mauro Carvalho Chehab 
Acked-by: Igor Mammedov 
Reviewed-by: Jonathan Cameron 
---
 tests/qtest/bios-tables-test-allowed-diff.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8bf4..0a1a26543ba2 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,7 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/aarch64/virt/HEST",
+"tests/data/acpi/aarch64/virt/DSDT",
+"tests/data/acpi/aarch64/virt/DSDT.acpihmatvirt",
+"tests/data/acpi/aarch64/virt/DSDT.memhp",
+"tests/data/acpi/aarch64/virt/DSDT.pxb",
+"tests/data/acpi/aarch64/virt/DSDT.topology",
-- 
2.48.1

Re: [PULL 3/6] python: add qapi static analysis tests

2025-03-02 Thread Stefan Hajnoczi

Hi John,
Please take a look at this CI failure:
https://gitlab.com/qemu-project/qemu/-/jobs/9284725716#L150

If you cannot reproduce it locally there is a chance that other pull
requests on the staging branch caused the errors. If that's the case,
please wait for the next update to qemu.git/master and rebase your
patch.

Thanks,
Stefan

Re: [PATCH v7 28/52] i386/tdx: Wire TDX_REPORT_FATAL_ERROR with GuestPanic facility

2025-03-02 Thread Xiaoyao Li


On 2/28/2025 12:30 AM, Francesco Lavra wrote:

On Fri, 2025-01-24 at 08:20 -0500, Xiaoyao Li wrote:

diff --git a/system/runstate.c b/system/runstate.c
index 272801d30769..c4244c8915c6 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -565,6 +565,60 @@ static void qemu_system_wakeup(void)
  }
  }
  
+static char *tdx_parse_panic_message(char *message)

+{
+    bool printable = false;
+    char *buf = NULL;
+    int len = 0, i;
+
+    /*
+ * Although message is defined as a json string, we shouldn't
+ * unconditionally treat it as is because the guest generated it
and
+ * it's not necessarily trustable.
+ */
+    if (message) {
+    /* The caller guarantees the NULL-terminated string. */
+    len = strlen(message);
+
+    printable = len > 0;
+    for (i = 0; i < len; i++) {
+    if (!(0x20 <= message[i] && message[i] <= 0x7e)) {
+    printable = false;
+    break;
+    }
+    }
+    }
+
+    if (len == 0) {
+    buf = g_malloc(1);
+    buf[0] = '\0';
+    } else {
+    if (!printable) {
+    /* 3 = length of "%02x " */
+    buf = g_malloc(len * 3);
+    for (i = 0; i < len; i++) {
+    if (message[i] == '\0') {
+    break;
+    } else {
+    sprintf(buf + 3 * i, "%02x ", message[i]);
+    }
+    }
+    if (i > 0) {
+    /* replace the last ' '(space) to NULL */
+    buf[i * 3 - 1] = '\0';
+    } else {
+    buf[0] = '\0';
+    }
+
+    } else {
+    buf = g_malloc(len);
+    memcpy(buf, message, len);


This fails to null-terminate the message string in buf.



will just use g_strdup(message);

Re: [PATCH v7 38/52] i386/apic: Skip kvm_apic_put() for TDX

2025-03-02 Thread Xiaoyao Li


On 2/28/2025 12:57 AM, Francesco Lavra wrote:

On Fri, 2025-01-24 at 08:20 -0500, Xiaoyao Li wrote:

KVM neithers allow writing to MSR_IA32_APICBASE for TDs, nor allow
for
KVM_SET_LAPIC[*].

Note, KVM_GET_LAPIC is also disallowed for TDX. It is called in the
path

   do_kvm_cpu_synchronize_state()
   -> kvm_arch_get_registers()
  -> kvm_get_apic()

and it's already disllowed for confidential guest through
guest_state_protected.

[*] https://lore.kernel.org/all/z3w4ku4jq0crt...@google.com/

Signed-off-by: Xiaoyao Li 
---
  hw/i386/kvm/apic.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
index 757510600098..a1850524a67f 100644
--- a/hw/i386/kvm/apic.c
+++ b/hw/i386/kvm/apic.c
@@ -17,6 +17,7 @@
  #include "system/hw_accel.h"
  #include "system/kvm.h"
  #include "kvm/kvm_i386.h"
+#include "kvm/tdx.h"
  
  static inline void kvm_apic_set_reg(struct kvm_lapic_state *kapic,

  int reg_id, uint32_t val)
@@ -141,6 +142,10 @@ static void kvm_apic_put(CPUState *cs,
run_on_cpu_data data)
  struct kvm_lapic_state kapic;
  int ret;
  
+    if(is_tdx_vm()) {


Missing space between if and (.
scripts/checkpatch.pl would have caught this.


Me to be blamed that don't use checkpatch.pl everytime before post.

Re: [PATCH v4 6/6] migration: Add qtest for migration over RDMA

2025-03-02 Thread Zhijian Li (Fujitsu)

Fabiano

Thanks for your testing.

On 28/02/2025 21:49, Fabiano Rosas wrote:
> Li Zhijian via  writes:
> 
>> This qtest requires there is a RDMA(RoCE) link in the host.
>> In order to make the test work smoothly, introduce a
>> scripts/rdma-migration-helper.sh to
>> - setup a new Soft-RoCE(aka RXE) if it's root
>> - detect existing RoCE link
>>
>> Test will be skipped if there is no available RoCE link.
>>   # Start of rdma tests
>>   # Running /x86_64/migration/precopy/rdma/plain
>>   ok 1 /x86_64/migration/precopy/rdma/plain # SKIP
>>   There is no available rdma link to run RDMA migration test.
>>   To enable the test:
>>   (1) Run 'scripts/rdma-migration-helper.sh setup' with root and rerun the 
>> test
> 
> sudo scripts/rdma-migration-helper.sh setup
> QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test
> --full -r /x86_64/migration/precopy/rdma/plain
> 
> # {
> # "error": {
> # "class": "GenericError",
> # "desc": "RDMA ERROR: rdma migration: error registering 0 control!"
> # }
> # }
> 


1333 static int qemu_rdma_reg_control(RDMAContext *rdma, int idx)
1334 {
1335 rdma->wr_data[idx].control_mr = ibv_reg_mr(rdma->pd,
1336 rdma->wr_data[idx].control, RDMA_CONTROL_MAX_BUFFER,
1337 IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);  <<<=== It 
failed here

1338 if (rdma->wr_data[idx].control_mr) {
1339 rdma->total_registrations++;
1340 return 0;
1341 }
1342 return -1;
1343 }


It appears to have failed at ibv_reg_mr()
  
This worked on my Ubuntu2204 and Fedora40. I wonder if your distro's security 
policy
is preventing MR registration without root privileges...?




>>   or
>>   (2) Run the test with root privilege
> 
> This one works fine.
> 
>>
>>   # End of rdma tests
>>
>> Reviewed-by: Peter Xu 
>> Signed-off-by: Li Zhijian 
>> ---
>>   MAINTAINERS   |  1 +
>>   scripts/rdma-migration-helper.sh  | 41 +
>>   tests/qtest/migration/precopy-tests.c | 64 +++
>>   3 files changed, 106 insertions(+)
>>   create mode 100755 scripts/rdma-migration-helper.sh
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 3848d37a38d..15360fcdc4b 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -3480,6 +3480,7 @@ R: Li Zhijian 
>>   R: Peter Xu 
>>   S: Odd Fixes
>>   F: migration/rdma*
>> +F: scripts/rdma-migration-helper.sh
>>   
>>   Migration dirty limit and dirty page rate
>>   M: Hyman Huang 
>> diff --git a/scripts/rdma-migration-helper.sh 
>> b/scripts/rdma-migration-helper.sh
>> new file mode 100755
>> index 000..66557d9e267
>> --- /dev/null
>> +++ b/scripts/rdma-migration-helper.sh
>> @@ -0,0 +1,41 @@
>> +#!/bin/bash
>> +
> 
> I'd prefer a command -v rdma check around here. With the way the script
> pipes commands into one another will cause bash to emit a couple of
> "rdma: command not found" in case rdma command is not present.
> 

It sounds good to me.


>> +# Copied from blktests
>> +get_ipv4_addr()
>> +{
>> +ip -4 -o addr show dev "$1" |
>> +sed -n 's/.*[[:blank:]]inet[[:blank:]]*\([^[:blank:]/]*\).*/\1/p' |
>> +tr -d '\n'
>> +}
>> +
>> +has_soft_rdma()
>> +{
>> +rdma link | grep -q " netdev $1[[:blank:]]*\$"
>> +}
>> +
>> +rdma_rxe_setup_detect()
>> +{
>> +(
>> +cd /sys/class/net &&
>> +for i in *; do
>> +[ -e "$i" ] || continue
>> +[ "$i" = "lo" ] && continue
>> +[ "$(<"$i/addr_len")" = 6 ] || continue
>> +[ "$(<"$i/carrier")" = 1 ] || continue
>> +
>> +has_soft_rdma "$i" && break
>> +[ "$operation" = "setup" ] &&
>> +rdma link add "${i}_rxe" type rxe netdev "$i" && break
>> +done
>> +has_soft_rdma "$i" || return
>> +get_ipv4_addr "$i"
>> +)
>> +}
>> +
>> +operation=${1:-setup}
>> +
>> +if [ "$operation" == "setup" ] || [ "$operation" == "detect" ]; then
>> +rdma_rxe_setup_detect
>> +else
>> +echo "Usage: $0 [setup | detect]"
>> +fi
> 
> What happened to the cleanup option? I think I missed some discussion on
> this... We can't expect people to know how to clean this up without any
> hint.

Nothing special, one reason could be to keep it as simple as possible in the 
beginning.

I'm fine to add it back.


> 
>> diff --git a/tests/qtest/migration/precopy-tests.c 
>> b/tests/qtest/migration/precopy-tests.c
>> index ba273d10b9a..bf97f4e9325 100644
>> --- a/tests/qtest/migration/precopy-tests.c
>> +++ b/tests/qtest/migration/precopy-tests.c
>> @@ -99,6 +99,66 @@ static void test_precopy_unix_dirty_ring(void)
>>   test_precopy_common(&args);
>>   }
>>   
>> +#ifdef CONFIG_RDMA
>> +
>> +#define RDMA_MIGRATION_HELPER "scripts/rdma-migration-helper.sh"
>> +static int new_rdma_link(char *buffer)
>> +{
>> +const char *argument = (geteuid() == 0) ? "setup" : "detect";
>> +char cmd[1024];
>> +
>> +snprintf(cmd, s

Re: [PATCH v2 02/10] target/i386: disable PERFCORE when "-pmu" is configured

2025-03-02 Thread Xiaoyao Li


On 3/3/2025 6:00 AM, Dongli Zhang wrote:

Currently, AMD PMU support isn't determined based on CPUID, that is, the
"-pmu" option does not fully disable KVM AMD PMU virtualization.

To minimize AMD PMU features, remove PERFCORE when "-pmu" is configured.

To completely disable AMD PMU virtualization will be implemented via
KVM_CAP_PMU_CAPABILITY in upcoming patches.

As a reminder, neither CPUID_EXT3_PERFCORE nor
CPUID_8000_0022_EAX_PERFMON_V2 is removed from env->features[] when "-pmu"
is configured. Developers should query whether they are supported via
cpu_x86_cpuid() rather than relying on env->features[] in future patches.


I don't think it is the correct direction to go.

env->features[] should be finalized before cpu_x86_cpuid() and 
env->features[] needs to be able to be exposed to guest directly. This 
ensures guest and QEMU have the same view of CPUIDs and it simplifies 
things.


We can adjust env->features[] by filtering all PMU related CPUIDs based 
on cpu->enable_pmu in x86_cpu_realizefn().



Suggested-by: Zhao Liu 
Signed-off-by: Dongli Zhang 
---
  target/i386/cpu.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index b6d6167910..61a671028a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7115,6 +7115,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
  !(env->hflags & HF_LMA_MASK)) {
  *edx &= ~CPUID_EXT2_SYSCALL;
  }
+
+if (kvm_enabled() && IS_AMD_CPU(env) && !cpu->enable_pmu) {
+*ecx &= ~CPUID_EXT3_PERFCORE;
+}
  break;
  case 0x8002:
  case 0x8003:

[PATCH] hw/char: sifive_uart: Free fifo on unrealize

2025-03-02 Thread Alistair Francis

We previously allocate the fifo on reset and never free it, which means
we are leaking memory.

Instead let's allocate on realize and free on unrealize.

Signed-off-by: Alistair Francis 
---
 hw/char/sifive_uart.c | 44 +++
 1 file changed, 28 insertions(+), 16 deletions(-)

diff --git a/hw/char/sifive_uart.c b/hw/char/sifive_uart.c
index 4bc5767284..b45e6c098c 100644
--- a/hw/char/sifive_uart.c
+++ b/hw/char/sifive_uart.c
@@ -251,6 +251,23 @@ static int sifive_uart_be_change(void *opaque)
 return 0;
 }
 
+static void sifive_uart_reset_enter(Object *obj, ResetType type)
+{
+SiFiveUARTState *s = SIFIVE_UART(obj);
+
+s->txfifo = 0;
+s->ie = 0;
+s->ip = 0;
+s->txctrl = 0;
+s->rxctrl = 0;
+s->div = 0;
+
+s->rx_fifo_len = 0;
+
+memset(s->rx_fifo, 0, SIFIVE_UART_RX_FIFO_SIZE);
+fifo8_reset(&s->tx_fifo);
+}
+
 static const Property sifive_uart_properties[] = {
 DEFINE_PROP_CHR("chardev", SiFiveUARTState, chr),
 };
@@ -270,30 +287,24 @@ static void sifive_uart_realize(DeviceState *dev, Error 
**errp)
 {
 SiFiveUARTState *s = SIFIVE_UART(dev);
 
+fifo8_create(&s->tx_fifo, SIFIVE_UART_TX_FIFO_SIZE);
+
 s->fifo_trigger_handle = timer_new_ns(QEMU_CLOCK_VIRTUAL,
   fifo_trigger_update, s);
 
-qemu_chr_fe_set_handlers(&s->chr, sifive_uart_can_rx, sifive_uart_rx,
- sifive_uart_event, sifive_uart_be_change, s,
- NULL, true);
+if (qemu_chr_fe_backend_connected(&s->chr)) {
+qemu_chr_fe_set_handlers(&s->chr, sifive_uart_can_rx, sifive_uart_rx,
+ sifive_uart_event, sifive_uart_be_change, s,
+ NULL, true);
+}
 
 }
 
-static void sifive_uart_reset_enter(Object *obj, ResetType type)
+static void sifive_uart_unrealize(DeviceState *dev)
 {
-SiFiveUARTState *s = SIFIVE_UART(obj);
-
-s->txfifo = 0;
-s->ie = 0;
-s->ip = 0;
-s->txctrl = 0;
-s->rxctrl = 0;
-s->div = 0;
-
-s->rx_fifo_len = 0;
+SiFiveUARTState *s = SIFIVE_UART(dev);
 
-memset(s->rx_fifo, 0, SIFIVE_UART_RX_FIFO_SIZE);
-fifo8_create(&s->tx_fifo, SIFIVE_UART_TX_FIFO_SIZE);
+fifo8_destroy(&s->tx_fifo);
 }
 
 static void sifive_uart_reset_hold(Object *obj, ResetType type)
@@ -329,6 +340,7 @@ static void sifive_uart_class_init(ObjectClass *oc, void 
*data)
 ResettableClass *rc = RESETTABLE_CLASS(oc);
 
 dc->realize = sifive_uart_realize;
+dc->unrealize = sifive_uart_unrealize;
 dc->vmsd = &vmstate_sifive_uart;
 rc->phases.enter = sifive_uart_reset_enter;
 rc->phases.hold  = sifive_uart_reset_hold;
-- 
2.48.1

RE: [PATCH v3 01/28] hw/intc/aspeed: Support setting different memory and register size

2025-03-02 Thread Jamin Lin

Hi Cedric,

> Cc: Troy Lee 
> Subject: Re: [PATCH v3 01/28] hw/intc/aspeed: Support setting different
> memory and register size
> 
> On 2/26/25 04:40, Jamin Lin wrote:
> > Hi Cedric,
> >
>  and the register array as:
> 
>   uint32_t regs[ASPEED_INTC_NR_REGS];
> 
>  The number of regs looks pretty big for me. Are the registers
>  covering the whole MMIO aperture ?
> 
> >>> According to the datasheet, the entire register address space size
> >>> of INTC (CPU DIE) is 16KB (0x1210-0x12103FFF). Therefore, I set
> >>> the
> >> memory size to 0x4000.
> >>> Currently, we need to use the "GICINT192-201 raw status and clear"
> >>> register
> >> INTC1B04.
> >>> Thus, an array size of 0x2000 is sufficient.
> >>
> >> yes but we are only using these regs :
> >>
> >> REG32(GICINT128_EN, 0x1000)
> >> REG32(GICINT128_STATUS, 0x1004)
> >> REG32(GICINT129_EN, 0x1100)
> >> REG32(GICINT129_STATUS, 0x1104)
> >> REG32(GICINT130_EN, 0x1200)
> >> REG32(GICINT130_STATUS, 0x1204)
> >> REG32(GICINT131_EN, 0x1300)
> >> REG32(GICINT131_STATUS, 0x1304)
> >> REG32(GICINT132_EN, 0x1400)
> >> REG32(GICINT132_STATUS, 0x1404)
> >> REG32(GICINT133_EN, 0x1500)
> >> REG32(GICINT133_STATUS, 0x1504)
> >> REG32(GICINT134_EN, 0x1600)
> >> REG32(GICINT134_STATUS, 0x1604)
> >> REG32(GICINT135_EN, 0x1700)
> >> REG32(GICINT135_STATUS, 0x1704)
> >> REG32(GICINT136_EN, 0x1800)
> >> REG32(GICINT136_STATUS, 0x1804)
> >> REG32(GICINT192_201_EN, 0x1B00)
> >> REG32(GICINT192_201_STATUS, 0x1B04)
> >>
> >> so the first 0x1000 are unused.
> >>
> >>
> >>>
> >>> However, we are going to increase the size to 0x3000 to support the
> >>> co-processors SSP and TSP In the another patch series.
> >>> We also need to include the following register definitions:
> >>>
> >>> /* SSP INTC Registers */
> >>> REG32(SSPINT128_EN, 0x2000)
> >>> REG32(SSPINT128_STATUS, 0x2004)
> >>> REG32(SSPINT129_EN, 0x2100)
> >>> REG32(SSPINT129_STATUS, 0x2104)
> >>> REG32(SSPINT130_EN, 0x2200)
> >>> REG32(SSPINT130_STATUS, 0x2204)
> >>> REG32(SSPINT131_EN, 0x2300)
> >>> REG32(SSPINT131_STATUS, 0x2304)
> >>> REG32(SSPINT132_EN, 0x2400)
> >>> REG32(SSPINT132_STATUS, 0x2404)
> >>> REG32(SSPINT133_EN, 0x2500)
> >>> REG32(SSPINT133_STATUS, 0x2504)
> >>> REG32(SSPINT134_EN, 0x2600)
> >>> REG32(SSPINT134_STATUS, 0x2604)
> >>> REG32(SSPINT135_EN, 0x2700)
> >>> REG32(SSPINT135_STATUS, 0x2704)
> >>> REG32(SSPINT136_EN, 0x2800)
> >>> REG32(SSPINT136_STATUS, 0x2804)
> >>> REG32(SSPINT137_EN, 0x2900)
> >>> REG32(SSPINT137_STATUS, 0x2904)
> >>> REG32(SSPINT138_EN, 0x2A00)
> >>> REG32(SSPINT138_STATUS, 0x2A04)
> >>> REG32(SSPINT160_169_EN, 0x2B00)
> >>> REG32(SSPINT160_169_STATUS, 0x2B04)
> >>>
> 
> > +if (offset >= aic->reg_size) {
> 
>  This is dead code since the MMIO aperture has the same size. You
>  could remove the check.
> >>>
> >>> Will remove.
> 
> > qemu_log_mask(LOG_GUEST_ERROR,
> >   "%s: Out-of-bounds read at offset 0x%"
>  HWADDR_PRIx "\n",
> >   __func__, offset); @@ -143,7 +144,7
> @@
>  static
> > void aspeed_intc_write(void *opaque, hwaddr offset, uint64_t data,
> > uint32_t change;
> > uint32_t irq;
> >
> > -if (addr >= ASPEED_INTC_NR_REGS) {
> > +if (offset >= aic->reg_size) {
> > qemu_log_mask(LOG_GUEST_ERROR,
> >   "%s: Out-of-bounds write at offset 0x%"
>  HWADDR_PRIx "\n",
> >   __func__, offset); @@ -302,10 +303,16
> >> @@
> > static void aspeed_intc_realize(DeviceState *dev, Error **errp)
> > AspeedINTCClass *aic = ASPEED_INTC_GET_CLASS(s);
> > int i;
> >
> > +memory_region_init(&s->iomem_container, OBJECT(s),
> > +TYPE_ASPEED_INTC ".container", aic->mem_size);
> > +
> > +sysbus_init_mmio(sbd, &s->iomem_container);
> 
>  Why introduce a container ? Do you plan to have multiple sub-regions ?
> 
> >>> I created a container to save the entire register address space of
> >>> INTC and its sub-region to place the actual used register address space.
> >>> 1210-12103fff (prio 0, i/o): aspeed.intc.container
> >>> 1210-12101fff (prio 0, i/o):
> >>> aspeed.intc.regs 14c18000-14c183ff (prio 0, i/o):
> >> aspeed.intc.container
> >>> 14c18000-14c183d7 (prio 0, i/o):
> >>> aspeed.intc.regs
> >>>
> >>> If I misunderstood this design, I will change it to have two memory
> >>> regions
> >> for INTC and INTCIO, respectively.
>

RE: [PATCH v3 15/28] hw/misc/aspeed_scu: Fix the revision ID cannot be set in the SOC layer for AST2700

2025-03-02 Thread Jamin Lin

Hi Cedric,

> Subject: Re: [PATCH v3 15/28] hw/misc/aspeed_scu: Fix the revision ID cannot
> be set in the SOC layer for AST2700
> 
> Hello Jamin,
> 
> On 2/26/25 07:38, Jamin Lin wrote:
> > Hi Cedric,
> >
> >>
> >> On 2/13/25 04:35, Jamin Lin wrote:
> >>> According to the design of the AST2600, it has a Silicon Revision ID
> >>> Register, specifically SCU004 and SCU014, to set the Revision ID for
> >>> the
> >> AST2600.
> >>> For the AST2600 A3, SCU004 is set to 0x05030303 and SCU014 is set to
> >> 0x05030303.
> >>> In the "aspeed_ast2600_scu_reset" function, the hardcoded value
> >>> "AST2600_A3_SILICON_REV" is set in SCU004, and "s->silicon_rev" is
> >>> set in SCU014. The value of "s->silicon_rev" is set by the SOC layer
> >>> via the "silicon-rev" property.
> >>>
> >>> However, the design of the AST2700 is different. There are two SCU
> >> controllers:
> >>> SCU0 (CPU Die) and SCU1 (IO Die). In the AST2700, the firmware reads
> >>> the SCU Silicon Revision ID register (SCU0_000) and the SCUIO
> >>> Silicon Revision ID register (SCU1_000) and combines them into a 64-bit
> value.
> >>> The combined value of SCU0_000[23:16] and SCU1_000[23:16] represents
> >>> the silicon revision. For example, the AST2700-A1 revision is
> >>> "0x0601010306010103", where
> >>> SCU0_000 should be 06010103 and SCU1_000 should be 06010103.
> >>
> >> Are both these values supposed to be identical ? if not, we should
> >> plan for changes at machine/SoC level too.
> >>
> >
> > Currently, these values are supposed to be identical. Therefore, we
> > can reuse the current design of the silicon_rev in the machine/SoC layer for
> AST2700.
> >
> >>>static const uint32_t
> >>> ast2700_a0_resets[ASPEED_AST2700_SCU_NR_REGS]
> >> = {
> >>> -[AST2700_SILICON_REV]   = AST2700_A0_SILICON_REV,
> >>>[AST2700_HW_STRAP1] = 0x0800,
> >>>[AST2700_HW_STRAP1_CLR] = 0xFFF0FFF0,
> >>>[AST2700_HW_STRAP1_LOCK]= 0x0FFF,
> >>> @@ -940,6 +939,7 @@ static void aspeed_ast2700_scu_reset(DeviceState
> >> *dev)
> >>>AspeedSCUClass *asc = ASPEED_SCU_GET_CLASS(dev);
> >>>
> >>>memcpy(s->regs, asc->resets, asc->nr_regs * 4);
> >>> +s->regs[AST2700_SILICON_REV] = s->silicon_rev;
> >>>
> >>>}
> >>>
> >>>static void aspeed_2700_scu_class_init(ObjectClass *klass, void
> >>> *data) @@ -1032,7 +1032,6 @@ static const MemoryRegionOps
> >> aspeed_ast2700_scuio_ops = {
> >>>};
> >>>
> >>>static const uint32_t
> >> ast2700_a0_resets_io[ASPEED_AST2700_SCU_NR_REGS] = {
> >>> -[AST2700_SILICON_REV]   = 0x0603,
> >>>[AST2700_HW_STRAP1] = 0x0504,
> >>
> >> why isn't AST2700_HW_STRAP1 assigned with s->hw_strap1 property ?
> >>
> >
> > This is a bug. The design of the HW_STRAP register has changed in the
> AST2700.
> > There is one hw_strap1 register in the SCU (CPU DIE) and another hw_strap1
> register in the SCUIO (IO DIE).
> > The values of these two hw_strap1 registers should not be the same.
> >
> > To fix this issue, I made the following changes. Do you have any 
> > suggestions?
> 
> All Aspeed SoC models currently define "hw-strap1" and "hw-strap2"
> properties as aliases on the same properties of the SCU model :
> 
>  object_property_add_alias(obj, "hw-strap1", OBJECT(&s->scu),
>"hw-strap1");
>  object_property_add_alias(obj, "hw-strap2", OBJECT(&s->scu),
>"hw-strap2");
> 
> For the AST2700 SoC, you could change the "hw-strap2" alias to point to the
> SCUIO model  :
> 
>  object_property_add_alias(obj, "hw-strap1", OBJECT(&s->scu),
>"hw-strap1");
>  object_property_add_alias(obj, "hw-strap2", OBJECT(&s->scuio),
>"hw-strap1");
> 

Will fix it.
> > Additionally, would it be possible to submit a separate patch for the SCU
> silicon_rev and SCU hw_strap fix?
> 
> yes we should please.
> 
> > The patch series for supporting AST2700 A1 is quite large.
> 
> yes. That's why I asked you to split it :)
> 
> > Thanks-Jamin
> >
> > 1. I dumped the real values of both registers on the EVB
> >
> > root@ast2700-a0-default:~# md 14c02010 1  > SCUIO hw_strap1
> > 14c02010: 0500
> > root@ast2700-a0-default:~# md 12c02010 1 --> SCU hw_strap1
> > 12c02010: 0800
> >
> > The value AST2700_EVB_HW_STRAP1 0x0800 is used for the SCU
> hw_strap1.
> > The value AST2700_EVB_HW_STRAP2 0x0500 is used for the SCUIO
> > hw_strap1
> >
> > --- a/hw/arm/aspeed.c
> > +++ b/hw/arm/aspeed.c
> > @@ -181,8 +181,8 @@ struct AspeedMachineState {
> >
> >   #ifdef TARGET_AARCH64
> >   /* AST2700 evb hardware value */
> > -#define AST2700_EVB_HW_STRAP1 0x00C0 -#define
> > AST2700_EVB_HW_STRAP2 0x0003
> > +#define AST2700_EVB_HW_STRAP1 0x0800 #define
> > +AST2700_EVB_HW_STRAP2 0x0500
> >   #endif
> >
> > 2.  Change to set hw_strap2 in the SCUIO model. Note this will modify the
> hw

[PATCH v2 06/10] target/i386/kvm: rename architectural PMU variables

2025-03-02 Thread Dongli Zhang

AMD does not have what is commonly referred to as an architectural PMU.
Therefore, we need to rename the following variables to be applicable for
both Intel and AMD:

- has_architectural_pmu_version
- num_architectural_pmu_gp_counters
- num_architectural_pmu_fixed_counters

For Intel processors, the meaning of has_pmu_version remains unchanged.

For AMD processors:

has_pmu_version == 1 corresponds to versions before AMD PerfMonV2.
has_pmu_version == 2 corresponds to AMD PerfMonV2.

Signed-off-by: Dongli Zhang 
---
 target/i386/kvm/kvm.c | 49 ---
 1 file changed, 28 insertions(+), 21 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 8f293ffd61..e895d22f94 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -164,9 +164,16 @@ static bool has_msr_perf_capabs;
 static bool has_msr_pkrs;
 static bool has_msr_hwcr;
 
-static uint32_t has_architectural_pmu_version;
-static uint32_t num_architectural_pmu_gp_counters;
-static uint32_t num_architectural_pmu_fixed_counters;
+/*
+ * For Intel processors, the meaning is the architectural PMU version
+ * number.
+ *
+ * For AMD processors: 1 corresponds to the prior versions, and 2
+ * corresponds to AMD PerfMonV2.
+ */
+static uint32_t has_pmu_version;
+static uint32_t num_pmu_gp_counters;
+static uint32_t num_pmu_fixed_counters;
 
 static int has_xsave2;
 static int has_xcrs;
@@ -2072,24 +2079,24 @@ static void kvm_init_pmu_info(CPUX86State *env)
 
 cpu_x86_cpuid(env, 0x0a, 0, &eax, &unused, &unused, &edx);
 
-has_architectural_pmu_version = eax & 0xff;
-if (has_architectural_pmu_version > 0) {
-num_architectural_pmu_gp_counters = (eax & 0xff00) >> 8;
+has_pmu_version = eax & 0xff;
+if (has_pmu_version > 0) {
+num_pmu_gp_counters = (eax & 0xff00) >> 8;
 
 /*
  * Shouldn't be more than 32, since that's the number of bits
  * available in EBX to tell us _which_ counters are available.
  * Play it safe.
  */
-if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
-num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
+if (num_pmu_gp_counters > MAX_GP_COUNTERS) {
+num_pmu_gp_counters = MAX_GP_COUNTERS;
 }
 
-if (has_architectural_pmu_version > 1) {
-num_architectural_pmu_fixed_counters = edx & 0x1f;
+if (has_pmu_version > 1) {
+num_pmu_fixed_counters = edx & 0x1f;
 
-if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
-num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
+if (num_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
+num_pmu_fixed_counters = MAX_FIXED_COUNTERS;
 }
 }
 }
@@ -4041,25 +4048,25 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 
env->poll_control_msr);
 }
 
-if (has_architectural_pmu_version > 0) {
-if (has_architectural_pmu_version > 1) {
+if (has_pmu_version > 0) {
+if (has_pmu_version > 1) {
 /* Stop the counter.  */
 kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
 kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
 }
 
 /* Set the counter values.  */
-for (i = 0; i < num_architectural_pmu_fixed_counters; i++) {
+for (i = 0; i < num_pmu_fixed_counters; i++) {
 kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i,
   env->msr_fixed_counters[i]);
 }
-for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
+for (i = 0; i < num_pmu_gp_counters; i++) {
 kvm_msr_entry_add(cpu, MSR_P6_PERFCTR0 + i,
   env->msr_gp_counters[i]);
 kvm_msr_entry_add(cpu, MSR_P6_EVNTSEL0 + i,
   env->msr_gp_evtsel[i]);
 }
-if (has_architectural_pmu_version > 1) {
+if (has_pmu_version > 1) {
 kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_STATUS,
   env->msr_global_status);
 kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
@@ -4519,17 +4526,17 @@ static int kvm_get_msrs(X86CPU *cpu)
 if (env->features[FEAT_KVM] & CPUID_KVM_POLL_CONTROL) {
 kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
 }
-if (has_architectural_pmu_version > 0) {
-if (has_architectural_pmu_version > 1) {
+if (has_pmu_version > 0) {
+if (has_pmu_version > 1) {
 kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
 kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
 kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_STATUS, 0);
 kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0);
 }

[PATCH v2 07/10] target/i386/kvm: query kvm.enable_pmu parameter

2025-03-02 Thread Dongli Zhang

There is no way to distinguish between the following scenarios:

(1) KVM_CAP_PMU_CAPABILITY is not supported.
(2) KVM_CAP_PMU_CAPABILITY is supported but disabled via the module
parameter kvm.enable_pmu=N.

In scenario (1), there is no way to fully disable AMD PMU virtualization.

In scenario (2), PMU virtualization is completely disabled by the KVM
module.

To help determine the scenario, read the kvm.enable_pmu value from the
sysfs module parameter.

There isn't any requirement to initialize 'has_pmu_version',
'num_pmu_gp_counters' or 'num_pmu_fixed_counters', if kvm.enable_pmu=N.

Signed-off-by: Dongli Zhang 
---
 target/i386/kvm/kvm.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index e895d22f94..efba3ae7a4 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -184,6 +184,10 @@ static int has_triple_fault_event;
 static bool has_msr_mcg_ext_ctl;
 
 static int has_pmu_cap;
+/*
+ * Read from /sys/module/kvm/parameters/enable_pmu.
+ */
+static bool kvm_pmu_disabled;
 
 static struct kvm_cpuid2 *cpuid_cache;
 static struct kvm_cpuid2 *hv_cpuid_cache;
@@ -3256,6 +3260,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 int ret;
 struct utsname utsname;
 Error *local_err = NULL;
+g_autofree char *kvm_enable_pmu;
 
 /*
  * Initialize SEV context, if required
@@ -3401,6 +3406,17 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 
 has_pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
 
+/*
+ * The kvm.enable_pmu's permission is 0444. It does not change until a
+ * reload of the KVM module.
+ */
+if (g_file_get_contents("/sys/module/kvm/parameters/enable_pmu",
+&kvm_enable_pmu, NULL, NULL)) {
+if (*kvm_enable_pmu == 'N') {
+kvm_pmu_disabled = true;
+}
+}
+
 return 0;
 }
 
-- 
2.43.5

[PATCH v2 03/10] [DO NOT MERGE] kvm: Introduce kvm_arch_pre_create_vcpu()

2025-03-02 Thread Dongli Zhang

From: Xiaoyao Li 

Introduce kvm_arch_pre_create_vcpu(), to perform arch-dependent
work prior to create any vcpu. This is for i386 TDX because it needs
call TDX_INIT_VM before creating any vcpu.

The specific implemnet of i386 will be added in the future patch.

Signed-off-by: Xiaoyao Li 
Acked-by: Gerd Hoffmann 
---
I used to send a version:
https://lore.kernel.org/all/20221119122901.2469-2-dongli.zh...@oracle.com/
Just pick the one from Xiaoyao's patchset as Dapeng may use this version
as well.
https://lore.kernel.org/all/20250124132048.3229049-8-xiaoyao...@intel.com/

 accel/kvm/kvm-all.c| 5 +
 include/system/kvm.h   | 1 +
 target/arm/kvm.c   | 5 +
 target/i386/kvm/kvm.c  | 5 +
 target/loongarch/kvm/kvm.c | 5 +
 target/mips/kvm.c  | 5 +
 target/ppc/kvm.c   | 5 +
 target/riscv/kvm/kvm-cpu.c | 5 +
 target/s390x/kvm/kvm.c | 5 +
 9 files changed, 41 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index f89568bfa3..df9840e53a 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -540,6 +540,11 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 
 trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
 
+ret = kvm_arch_pre_create_vcpu(cpu, errp);
+if (ret < 0) {
+goto err;
+}
+
 ret = kvm_create_vcpu(cpu);
 if (ret < 0) {
 error_setg_errno(errp, -ret,
diff --git a/include/system/kvm.h b/include/system/kvm.h
index ab17c09a55..d7dfa25493 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -374,6 +374,7 @@ int kvm_arch_get_default_type(MachineState *ms);
 
 int kvm_arch_init(MachineState *ms, KVMState *s);
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp);
 int kvm_arch_init_vcpu(CPUState *cpu);
 int kvm_arch_destroy_vcpu(CPUState *cpu);
 
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index da30bdbb23..93f1a7245b 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -1874,6 +1874,11 @@ static int kvm_arm_sve_set_vls(ARMCPU *cpu)
 
 #define ARM_CPU_ID_MPIDR   3, 0, 0, 0, 5
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+return 0;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
 int ret;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 6c749d4ee8..f41e190fb8 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2051,6 +2051,11 @@ full:
 abort();
 }
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+return 0;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
 struct {
diff --git a/target/loongarch/kvm/kvm.c b/target/loongarch/kvm/kvm.c
index a3f55155b0..91c3c67cdb 100644
--- a/target/loongarch/kvm/kvm.c
+++ b/target/loongarch/kvm/kvm.c
@@ -973,6 +973,11 @@ static int kvm_cpu_check_pmu(CPUState *cs, Error **errp)
 return 0;
 }
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+return 0;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
 uint64_t val;
diff --git a/target/mips/kvm.c b/target/mips/kvm.c
index d67b7c1a8e..ec53acb51a 100644
--- a/target/mips/kvm.c
+++ b/target/mips/kvm.c
@@ -61,6 +61,11 @@ int kvm_arch_irqchip_create(KVMState *s)
 return 0;
 }
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+return 0;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
 CPUMIPSState *env = cpu_env(cs);
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 966c2c6572..758298d565 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -477,6 +477,11 @@ static void kvmppc_hw_debug_points_init(CPUPPCState *cenv)
 }
 }
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+return 0;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
 PowerPCCPU *cpu = POWERPC_CPU(cs);
diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c
index 23ce779359..55be7542e7 100644
--- a/target/riscv/kvm/kvm-cpu.c
+++ b/target/riscv/kvm/kvm-cpu.c
@@ -1362,6 +1362,11 @@ static int kvm_vcpu_enable_sbi_dbcn(RISCVCPU *cpu, 
CPUState *cs)
 return kvm_set_one_reg(cs, kvm_sbi_dbcn.kvm_reg_id, ®);
 }
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+return 0;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
 int ret = 0;
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 4d56e653dd..1f592733f4 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -404,6 +404,11 @@ unsigned long kvm_arch_vcpu_id(CPUState *cpu)
 return cpu->cpu_index;
 }
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+return 0;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
 unsigned int max_cpus = MACHINE(qdev_get_machine())->smp.max_cpus;
-- 
2.43.5

[PATCH v2 01/10] target/i386: disable PerfMonV2 when PERFCORE unavailable

2025-03-02 Thread Dongli Zhang

When the PERFCORE is disabled with "-cpu host,-perfctr-core", it is
reflected in in guest dmesg.

[0.285136] Performance Events: AMD PMU driver.

However, the guest CPUID indicates the PerfMonV2 is still available.

CPU:
   Extended Performance Monitoring and Debugging (0x8022):
  AMD performance monitoring V2 = true
  AMD LBR V2= false
  AMD LBR stack & PMC freezing  = false
  number of core perf ctrs  = 0x6 (6)
  number of LBR stack entries   = 0x0 (0)
  number of avail Northbridge perf ctrs = 0x0 (0)
  number of available UMC PMCs  = 0x0 (0)
  active UMCs bitmask   = 0x0

Disable PerfMonV2 in CPUID when PERFCORE is disabled.

Suggested-by: Zhao Liu 
Fixes: 209b0ac12074 ("target/i386: Add PerfMonV2 feature bit")
Signed-off-by: Dongli Zhang 
---
Changed since v1:
  - Use feature_dependencies (suggested by Zhao Liu).

 target/i386/cpu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 72ab147e85..b6d6167910 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1805,6 +1805,10 @@ static FeatureDep feature_dependencies[] = {
 .from = { FEAT_7_1_EDX, CPUID_7_1_EDX_AVX10 },
 .to = { FEAT_24_0_EBX,  ~0ull },
 },
+{
+.from = { FEAT_8000_0001_ECX,   CPUID_EXT3_PERFCORE },
+.to = { FEAT_8000_0022_EAX, CPUID_8000_0022_EAX_PERFMON_V2 },
+},
 };
 
 typedef struct X86RegisterInfo32 {
-- 
2.43.5

[PATCH v2 10/10] target/i386/kvm: don't stop Intel PMU counters

2025-03-02 Thread Dongli Zhang

The kvm_put_msrs() sets the MSRs using KVM_SET_MSRS. The x86 KVM processes
these MSRs one by one in a loop, only saving the config and triggering the
KVM_REQ_PMU request. This approach does not immediately stop the event
before updating PMC.

In additional, PMU MSRs are set only at levels >= KVM_PUT_RESET_STATE,
excluding runtime. Therefore, updating these MSRs without stopping events
should be acceptable.

Finally, KVM creates kernel perf events with host mode excluded
(exclude_host = 1). While the events remain active, they don't increment
the counter during QEMU vCPU userspace mode.

No Fixed tag is going to be added for the commit 0d89436786b0 ("kvm:
migrate vPMU state"), because this isn't a bugfix.

Signed-off-by: Dongli Zhang 
---
 target/i386/kvm/kvm.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index c5911baef0..4902694129 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4160,13 +4160,6 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 }
 
 if (IS_INTEL_CPU(env) && has_pmu_version > 0) {
-if (has_pmu_version > 1) {
-/* Stop the counter.  */
-kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
-kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
-}
-
-/* Set the counter values.  */
 for (i = 0; i < num_pmu_fixed_counters; i++) {
 kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i,
   env->msr_fixed_counters[i]);
@@ -4182,8 +4175,6 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
   env->msr_global_status);
 kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
   env->msr_global_ovf_ctrl);
-
-/* Now start the PMU.  */
 kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL,
   env->msr_fixed_ctr_ctrl);
 kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL,
-- 
2.43.5

[PATCH v2 04/10] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured

2025-03-02 Thread Dongli Zhang

Although AMD PERFCORE and PerfMonV2 are removed when "-pmu" is configured,
there is no way to fully disable KVM AMD PMU virtualization. Neither
"-cpu host,-pmu" nor "-cpu EPYC" achieves this.

As a result, the following message still appears in the VM dmesg:

[0.263615] Performance Events: AMD PMU driver.

However, the expected output should be:

[0.596381] Performance Events: PMU not available due to virtualization, 
using software events only.
[0.600972] NMI watchdog: Perf NMI watchdog permanently disabled

This occurs because AMD does not use any CPUID bit to indicate PMU
availability.

To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE
when "-pmu" is configured.

Signed-off-by: Dongli Zhang 
---
Changed since v1:
  - Switch back to the initial implementation with "-pmu".
https://lore.kernel.org/all/20221119122901.2469-3-dongli.zh...@oracle.com
  - Mention that "KVM_PMU_CAP_DISABLE doesn't change the PMU behavior on
Intel platform because current "pmu" property works as expected."

 target/i386/kvm/kvm.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f41e190fb8..5c8a852dbd 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -176,6 +176,8 @@ static int has_triple_fault_event;
 
 static bool has_msr_mcg_ext_ctl;
 
+static int has_pmu_cap;
+
 static struct kvm_cpuid2 *cpuid_cache;
 static struct kvm_cpuid2 *hv_cpuid_cache;
 static struct kvm_msr_list *kvm_feature_msrs;
@@ -2053,6 +2055,33 @@ full:
 
 int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
 {
+static bool first = true;
+int ret;
+
+if (first) {
+first = false;
+
+/*
+ * Since Linux v5.18, KVM provides a VM-level capability to easily
+ * disable PMUs; however, QEMU has been providing PMU property per
+ * CPU since v1.6. In order to accommodate both, have to configure
+ * the VM-level capability here.
+ *
+ * KVM_PMU_CAP_DISABLE doesn't change the PMU
+ * behavior on Intel platform because current "pmu" property works
+ * as expected.
+ */
+if (has_pmu_cap && !X86_CPU(cpu)->enable_pmu) {
+ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
+KVM_PMU_CAP_DISABLE);
+if (ret < 0) {
+error_setg_errno(errp, -ret,
+ "Failed to set KVM_PMU_CAP_DISABLE");
+return ret;
+}
+}
+}
+
 return 0;
 }
 
@@ -3351,6 +3380,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 }
 }
 
+has_pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
+
 return 0;
 }
 
-- 
2.43.5

Re: [PATCH] hw/ppc/e500: Partial implementation of local access window registers

2025-03-02 Thread Bernhard Beschow




Am 1. März 2025 16:10:35 UTC schrieb BALATON Zoltan :
>On Wed, 15 Jan 2025, BALATON Zoltan wrote:
>> This allows guests to set the CCSR base address. Also store and return
>> values of the local access window registers but their functionality
>> isn't implemented.
>
>Bernhard,

Hi Zoltan,

>
>If you have no alternative patch you plan to submit for next release should we 
>merge this for now? This helps running u-boot binaries even if it's not enough 
>alone but would be one patch less in my local tree. Or do you know about a 
>problem with this patch why this should not be merged?

QEMU sets a machine-specific CCSR base address (pmc->ccsrbar_base) which 
differs from the real chip's default. The default is checked by U-Boot which 
enters an infinite loop on mismatch: 
.
 How does this work for you?

In addition, when moving the CCSR region, `env->mpic_iack` should be updated as 
well: 


I'm happy to submit an implementation on top of my e500 cleanup series 
 which 
needs agreement on some open discussions.

Best regards,
Bernhard

>
>Regards,
>BALATON Zoltan
>
>> Signed-off-by: BALATON Zoltan 
>> ---
>> hw/ppc/e500-ccsr.h |  4 +++
>> hw/ppc/e500.c  | 79 --
>> 2 files changed, 80 insertions(+), 3 deletions(-)
>> 
>> diff --git a/hw/ppc/e500-ccsr.h b/hw/ppc/e500-ccsr.h
>> index 249c17be3b..cfbf96e181 100644
>> --- a/hw/ppc/e500-ccsr.h
>> +++ b/hw/ppc/e500-ccsr.h
>> @@ -4,12 +4,16 @@
>> #include "hw/sysbus.h"
>> #include "qom/object.h"
>> 
>> +#define NR_LAWS 12
>> +
>> struct PPCE500CCSRState {
>> /*< private >*/
>> SysBusDevice parent;
>> /*< public >*/
>> 
>> MemoryRegion ccsr_space;
>> +
>> +uint32_t law_regs[NR_LAWS * 2];
>> };
>> 
>> #define TYPE_CCSR "e500-ccsr"
>> diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
>> index 64f8c766b4..376cb4cb37 100644
>> --- a/hw/ppc/e500.c
>> +++ b/hw/ppc/e500.c
>> @@ -43,6 +43,7 @@
>> #include "qemu/host-utils.h"
>> #include "qemu/option.h"
>> #include "hw/pci-host/ppce500.h"
>> +#include "qemu/log.h"
>> #include "qemu/error-report.h"
>> #include "hw/platform-bus.h"
>> #include "hw/net/fsl_etsec/etsec.h"
>> @@ -1331,11 +1332,83 @@ void ppce500_init(MachineState *machine)
>> boot_info->dt_size = dt_size;
>> }
>> 
>> +static int law_idx(hwaddr addr)
>> +{
>> +int idx;
>> +
>> +addr -= 0xc08;
>> +idx = 2 * ((addr >> 5) & 0xf);
>> +if (addr & 8) {
>> +idx++;
>> +}
>> +assert(idx < 2 * NR_LAWS);
>> +return idx;
>> +}
>> +
>> +static uint64_t law_read(void *opaque, hwaddr addr, unsigned size)
>> +{
>> +PPCE500CCSRState *s = opaque;
>> +uint64_t val = 0;
>> +
>> +switch (addr) {
>> +case 0:
>> +val = s->ccsr_space.addr >> 12;
>> +break;
>> +case 0xc08 ... 0xd70:
>> +val = s->law_regs[law_idx(addr)];
>> +break;
>> +default:
>> +qemu_log_mask(LOG_GUEST_ERROR, "Invalid local access register read"
>> +  "0x%" HWADDR_PRIx "\n", addr);
>> +}
>> +return val;
>> +}
>> +
>> +static void law_write(void *opaque, hwaddr addr, uint64_t val, unsigned 
>> size)
>> +{
>> +PPCE500CCSRState *s = opaque;
>> +
>> +switch (addr) {
>> +case 0:
>> +val &= 0x00;
>> +memory_region_set_address(&s->ccsr_space, val << 12);
>> +break;
>> +case 0xc08 ... 0xd70:
>> +{
>> +int idx = law_idx(addr);
>> +
>> +qemu_log_mask(LOG_UNIMP, "Unimplemented local access register write"
>> +  "0x%" HWADDR_PRIx " <- 0x%" PRIx64 "\n", addr, val);
>> +val &= (idx & 1) ? 0x80f0003f : 0xff;
>> +s->law_regs[idx] = val;
>> +break;
>> +}
>> +default:
>> +qemu_log_mask(LOG_GUEST_ERROR, "Invalid local access register write"
>> +  "0x%" HWADDR_PRIx "\n", addr);
>> +}
>> +}
>> +
>> +static const MemoryRegionOps law_ops = {
>> +.read = law_read,
>> +.write = law_write,
>> +.endianness = DEVICE_BIG_ENDIAN,
>> +.valid = {
>> +.min_access_size = 4,
>> +.max_access_size = 4,
>> +},
>> +};
>> +
>> static void e500_ccsr_initfn(Object *obj)
>> {
>> -PPCE500CCSRState *ccsr = CCSR(obj);
>> -memory_region_init(&ccsr->ccsr_space, obj, "e500-ccsr",
>> -   MPC8544_CCSRBAR_SIZE);
>> +PPCE500CCSRState *s = CCSR(obj);
>> +MemoryRegion *mr;
>> +
>> +memory_region_init(&s->ccsr_space, obj, "e500-ccsr", 
>> MPC8544_CCSRBAR_SIZE);
>> +
>> +mr = g_new(MemoryRegion, 1);
>> +memory_region_init_io(mr, obj, &law_ops, s, "local-access", 4096);
>> +memory_region_add_subregion(&s->ccsr_space, 0, mr);
>> }
>> 
>> static const TypeInfo e500_ccsr_info = {
>>

Re: [PATCH] bcm2838: Add GIC-400 timer interupt connections

2025-03-02 Thread Sourojeet Adhikari


On 2025-02-27 10:17, Peter Maydell wrote:


On Thu, 27 Feb 2025 at 09:15, Sourojeet Adhikari
 wrote:

The systmr INTERRUPT_TIMER0..3 sysbus IRQ outputs are already
being wired up in the function bcm_soc_peripherals_common_realize()
in hw/arm/bcm2835_peripherals.c (to the TYPE_BCM2835_IC
interrupt controller), and it isn't valid to wire one input
directly to multiple outputs.

In fact it looks like we are currently getting this wrong for
all of the interrupts that need to be wired to both the
"legacy interrupt controller" and the GIC. I think at the moment
what happens is that the wiring to the GIC will happen last
and this overrides the earlier wiring to the legacy interrupt
controller, so code using the latter won't work correctly.

I'll try reading through the relevant sections and send an
updated patch later next week. From what I can tell it falls
under the bcm2835_pheripherals.c file, right?

Yes. To expand a bit, QEMU's qemu_irq abstraction must
always be wired exactly 1-to-1, from a single output to
a single input. Wiring either one input to multiple outputs
or one output to multiple inputs will not behave correctly
(and unfortunately we don't have an easy way to assert()
if code in QEMU gets this wrong).

So for cases where you want the one-to-many behaviour you need
to create an object of TYPE_SPLIT_IRQ. This has one input and
multiple outputs, so you can connect your wire from device A's
output to the splitter's input, and then connect outputs
from the splitter to devices B, C, etc. (In this case A
would be the timer, and B, C the two interrupt controllers.)
Searching the source code for TYPE_SPLIT_IRQ will give some
places where it's used. (Ignore the qdev_new(TYPE_SPLIT_IRQ)
ones, those are a code pattern we use in board models, not
in SoC device models.)

In this specific bcm2838 case, it's a little more awkward,
because one of the two interrupt controllers is created inside
bcm2835_peripherals.c and one of them is created outside it.
Since bcm2838 is already reaching inside the bcm2835_peripherals
object I guess the simplest thing is:
  * create a splitter object in bcm2835_peripherals.c for
every IRQ line that needs to be connected to both
interrupt controllers (probably easiest to have an array
of splitter objects, irq_splitter[])
  * in bcm2835_peripherals.c, connect the device's outbound
IRQ to the input of the appropriate splitter, and
connect output 0 of that splitter to the BCM2835_IC
correct interrupt controller input
  * in bcm2838.c, connect output 0 of ps_base->irq_splitter[n]
to the correct GIC input

(This is kind of breaking the abstraction layer that ideally
exists where the code that creates and uses a device doesn't
try to look "inside" it at any subparts it might have. We
could, for instance, instead make the bcm2835_peripherals
object expose its own qemu_irq outputs which were the second
outputs of the splitters, so that the bcm2838.c code wasn't
looking inside and finding the splitters directly. But I
think that's more awkward than it's worth. It's also possible
that we have the split between the main SoC and the
peripheral object wrong and either both interrupt controllers
or neither should be inside the peripheral object; but
reshuffling things like that would be a lot of work too.)


This weekend I'll try my best to mess around, and get the solution
you proposed working. From what I can tell, I (personally) think , the 
not-reshuffling things approach might be a bit better here. Since otherwise 
it'd turn into a somewhat sizeable patch pretty quick, and is a lot of work, 
for something that's not *too* big of an issue. I do have access to a raspberry 
pi if you think I should do any kind of testing before doing the reshuffling.

On another note, do you think it's reasonable to add what you said here into 
the development documentation (paraphrased, and if not already documented). If 
I do write a patch to the documentation, can/should I cc you on it?


(PS: for the other "not 1:1" case, where you want to connect
many qemu_irqs outputs together into one input, the usual semantics
you want is to logically-OR the interrupt lines together, and
so you use TYPE_OR_IRQ for that.)


(oh oki, I'll make sure to do that on the upcoming patch then,
thank you!)

(P.S the patch probably won't be coming till next week since I have quite a bit 
of work outside of my programming stuff to do. Should hopfully be done by 
Wednesday next week though?)

Re: [PATCH v5 23/36] vfio/migration: Multifd device state transfer support - VFIOStateBuffer(s)

2025-03-02 Thread Cédric Le Goater


On 3/2/25 14:00, Avihai Horon wrote:


On 19/02/2025 22:34, Maciej S. Szmigiero wrote:

External email: Use caution opening links or attachments


From: "Maciej S. Szmigiero" 

Add VFIOStateBuffer(s) types and the associated methods.

These store received device state buffers and config state waiting to get
loaded into the device.

Signed-off-by: Maciej S. Szmigiero 
---
  hw/vfio/migration-multifd.c | 54 +
  1 file changed, 54 insertions(+)

diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
index 0c3185a26242..760b110a39b9 100644
--- a/hw/vfio/migration-multifd.c
+++ b/hw/vfio/migration-multifd.c
@@ -29,3 +29,57 @@ typedef struct VFIODeviceStatePacket {
  uint32_t flags;
  uint8_t data[0];
  } QEMU_PACKED VFIODeviceStatePacket;
+
+/* type safety */
+typedef struct VFIOStateBuffers {
+    GArray *array;
+} VFIOStateBuffers;
+
+typedef struct VFIOStateBuffer {
+    bool is_present;
+    char *data;
+    size_t len;
+} VFIOStateBuffer;
+
+static void vfio_state_buffer_clear(gpointer data)
+{
+    VFIOStateBuffer *lb = data;
+
+    if (!lb->is_present) {
+    return;
+    }
+
+    g_clear_pointer(&lb->data, g_free);
+    lb->is_present = false;
+}
+
+static void vfio_state_buffers_init(VFIOStateBuffers *bufs)
+{
+    bufs->array = g_array_new(FALSE, TRUE, sizeof(VFIOStateBuffer));
+    g_array_set_clear_func(bufs->array, vfio_state_buffer_clear);
+}
+
+static void vfio_state_buffers_destroy(VFIOStateBuffers *bufs)
+{
+    g_clear_pointer(&bufs->array, g_array_unref);
+}
+
+static void vfio_state_buffers_assert_init(VFIOStateBuffers *bufs)
+{
+    assert(bufs->array);
+}
+
+static guint vfio_state_buffers_size_get(VFIOStateBuffers *bufs)
+{
+    return bufs->array->len;
+}
+
+static void vfio_state_buffers_size_set(VFIOStateBuffers *bufs, guint size)
+{
+    g_array_set_size(bufs->array, size);
+}
+
+static VFIOStateBuffer *vfio_state_buffers_at(VFIOStateBuffers *bufs, guint 
idx)
+{
+    return &g_array_index(bufs->array, VFIOStateBuffer, idx);
+}


This patch breaks compilation as non of the functions are used, e.g.: error: ‘vfio_state_buffers_init’ defined but not used I can think of three options to solve it: 1. Move these functions to their own file and export them, e.g., hw/vfio/state-buffer.{c,h}. But this seems like an overkill for such a small API. 2. Add __attribute__((unused)) tags and remove them in patch #26 where the functions are actually used. A bit ugly. 




3. Squash this patch into patch #26. I prefer option 3 as this is a small API 
closely related to patch #26 (and patch #26 will still remain rather small).


I vote for option 3 too.

vfio_state_buffers_init is only called once, it's 2 lines,
it could be merged in vfio_multifd_new() too.


Thanks,

C.

Adding gamma support to QemuMacDrivers

2025-03-02 Thread Hab Gallagher

Hi.

I have interest in contributing patches to extend qemu's powerpc graphics
card emulation to include more comprehensive support for gamma. Some
classic mac applications will balk at launch if the graphics card doesn't
claim enough support for gamma tricks. Native parity should be possible,
allowing applications to ramp the apparent screen brightness using gamma
tricks.

As far as I can tell, both https://github.com/ozbenh/QemuMacDrivers and
qemu itself need to be updated for this endeavor. It is unclear to me how
much of the code "should" live in one repository or the other. I assume
that it would be preferable to put as little code as possible into the
driver that runs emulated, and move the bulk of the work to the driver
side, in the qemu host.

I could not find any existing work on this, nor anyone asking about the
feature. I don't have experience hacking on qemu, but I have extensively
patched SheepShaver before. I hope to contribute more to qemu-ppc instead.

Thank you for any advice on direction or prior art.

[PATCH] target/loongarch: Adjust the cpu reset action to a proper position

2025-03-02 Thread Xianglai Li

The commit 5a99a10da6cf ("target/loongarch: fix vcpu reset command word issue")
fixes the error in the cpu reset ioctl command word delivery process,
so that the command word can be delivered correctly, and adds the judgment
and processing of the error return value, which exposes another problem that
under loongarch, the cpu reset action is earlier than the creation of vcpu.
An error occurs when the cpu reset command is sent.

Now adjust the order of cpu reset and vcpu create actions to fix this problem

Signed-off-by: Xianglai Li 
---
Cc: Bibo Mao 
Cc: Huacai Chen 
Cc: Song Gao 
Cc: Xianglai Li 

 target/loongarch/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 3788f895c1..67aa7875b6 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -640,8 +640,8 @@ static void loongarch_cpu_realizefn(DeviceState *dev, Error 
**errp)
 
 loongarch_cpu_register_gdb_regs_for_features(cs);
 
-cpu_reset(cs);
 qemu_init_vcpu(cs);
+cpu_reset(cs);
 
 lacc->parent_realize(dev, errp);
 }
-- 
2.39.1

[PATCH] target/loongarch: Adjust the cpu reset action to a proper position

2025-03-02 Thread Xianglai Li

The commit 5a99a10da6cf ("target/loongarch: fix vcpu reset command word issue")
fixes the error in the cpu reset ioctl command word delivery process,
so that the command word can be delivered correctly, and adds the judgment
and processing of the error return value, which exposes another problem that
under loongarch, the cpu reset action is earlier than the creation of vcpu.
An error occurs when the cpu reset command is sent.

Now adjust the order of cpu reset and vcpu create actions to fix this problem

Signed-off-by: Xianglai Li 
---
Bibo Mao 
Huacai Chen 
Song Gao 
Xianglai Li 

 target/loongarch/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 3788f895c1..67aa7875b6 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -640,8 +640,8 @@ static void loongarch_cpu_realizefn(DeviceState *dev, Error 
**errp)
 
 loongarch_cpu_register_gdb_regs_for_features(cs);
 
-cpu_reset(cs);
 qemu_init_vcpu(cs);
+cpu_reset(cs);
 
 lacc->parent_realize(dev, errp);
 }
-- 
2.39.1

Re: [PATCH v4 0/4] Support HACE to AST2700 (resend)

2025-03-02 Thread Cédric Le Goater


On 2/25/25 08:56, Jamin Lin wrote:

This patch series is from 
https://patchwork.kernel.org/project/qemu-devel/cover/20250213033531.3367697-1-jamin_...@aspeedtech.com/.
To expedite the review process, I have separated the HACE patches portion from
the 
https://patchwork.kernel.org/project/qemu-devel/cover/20250213033531.3367697-1-jamin_...@aspeedtech.com/
 patch series into this new patch series.

v4: Support HACE to AST2700

Jamin Lin (4):
   hw/misc/aspeed_hace: Fix coding style
   hw/misc/aspeed_hace: Add AST2700 support
   hw/arm/aspeed_ast27x0: Add HACE support for AST2700
   hw/misc/aspeed_hace: Fix boot issue in the Crypto Manager Self Test

  include/hw/misc/aspeed_hace.h |  2 ++
  hw/arm/aspeed_ast27x0.c   | 15 ++
  hw/misc/aspeed_hace.c | 55 ---
  3 files changed, 68 insertions(+), 4 deletions(-)




Applied to aspeed-next.

Thanks,

C.

Re: Kubernetes gitlab-runner jobs cannot be scheduled

2025-03-02 Thread Stefan Hajnoczi

On Sat, Mar 1, 2025 at 2:36 PM Paolo Bonzini  wrote:
>
> On 3/1/25 07:19, Stefan Hajnoczi wrote:
> > Hi,
> > On February 26th GitLab CI started failing many jobs because they
> > could not be scheduled. I've been unable to merge pull requests
> > because the CI is not working.
> >
> > Here is an example failed job:
> > https://gitlab.com/qemu-project/qemu/-/jobs/9281757413
>
> Hi Stefan,
>
> until February 26th the Digital Ocean runners were not enabled; I tried
> enabling them (which is what caused the issue) to start gauging how much
> credit we would need to be able to move from Azure to DO for CI.  I
> posted a note on IRC, I'm sorry if you missed that.

There is a new type of timeout failure:
https://gitlab.com/qemu-project/qemu/-/jobs/9288349332

GitLab says:
"There has been a timeout failure or the job got stuck. Check your
timeout limits or try again"

Duration: 77 minutes 13 seconds
Timeout: 1h (from project)

It ran 17 minutes longer than the job timeout.

Any idea?

Stefan

[PATCH v4 0/6] Fix hw-strap for AST2700

2025-03-02 Thread Jamin Lin via

v1: This patch series is from 
https://patchwork.kernel.org/project/qemu-devel/cover/20250213033531.3367697-1-jamin_...@aspeedtech.com/.
   To expedite the review process, I have separated the SCU fix patches
   
a. Fix the hw-strap and revision ID for SCU and SCUIO
b. ix boot issue for AST2700

Jamin Lin (6):
  hw/misc/aspeed_scu: Skipping dram_init in u-boot
  hw/misc/aspeed_scu: Fix the revision ID cannot be set in the SOC layer
for AST2700
  hw/arm/aspeed Update HW Strap Default Values for AST2700
  hw/misc/aspeed_scu: Fix the hw-strap1 cannot be set in the SOC layer
for AST2700
  hw/arm/aspeed_ast27x0.c Separate HW Strap Registers for SCU and SCUIO
  hw/arm/aspeed_ast27x0.c Fix boot issue for AST2700

 hw/arm/aspeed.c |  6 --
 hw/arm/aspeed_ast27x0.c | 13 ++---
 hw/misc/aspeed_scu.c|  8 
 3 files changed, 18 insertions(+), 9 deletions(-)

-- 
2.34.1

[PATCH v4 6/6] hw/arm/aspeed_ast27x0.c Fix boot issue for AST2700

2025-03-02 Thread Jamin Lin via

Currently, ASPEED_DEV_SPI_BOOT is set to "0x4", which is the DRAM start
address, and the QEMU loader is used to load the U-Boot binary into this 
address.

However, if users want to install FMC flash contents as a boot ROM, the DRAM
address 0x4 would be overwritten with Boot ROM data. This causes the
AST2700 to fail to boot because the U-Boot data becomes incorrect.

To fix this, change the ASPEED_DEV_SPI_BOOT address to "0x1", which is
the FMC0 memory-mapped start address in the AST2700.

Signed-off-by: Jamin Lin 
---
 hw/arm/aspeed_ast27x0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/aspeed_ast27x0.c b/hw/arm/aspeed_ast27x0.c
index 8c225d4f90..527a5f8245 100644
--- a/hw/arm/aspeed_ast27x0.c
+++ b/hw/arm/aspeed_ast27x0.c
@@ -24,7 +24,7 @@
 #include "qemu/log.h"
 
 static const hwaddr aspeed_soc_ast2700_memmap[] = {
-[ASPEED_DEV_SPI_BOOT]  =  0x4,
+[ASPEED_DEV_SPI_BOOT]  =  0x1,
 [ASPEED_DEV_SRAM]  =  0x1000,
 [ASPEED_DEV_SDMC]  =  0x12C0,
 [ASPEED_DEV_SCU]   =  0x12C02000,
-- 
2.34.1

[PATCH v4 2/6] hw/misc/aspeed_scu: Fix the revision ID cannot be set in the SOC layer for AST2700

2025-03-02 Thread Jamin Lin via

According to the design of the AST2600, it has a Silicon Revision ID Register,
specifically SCU004 and SCU014, to set the Revision ID for the AST2600.
For the AST2600 A3, SCU004 is set to 0x05030303 and SCU014 is set to 0x05030303.
In the "aspeed_ast2600_scu_reset" function, the hardcoded value
"AST2600_A3_SILICON_REV" is set in SCU004, and "s->silicon_rev" is set in
SCU014. The value of "s->silicon_rev" is set by the SOC layer via the
"silicon-rev" property.

However, the design of the AST2700 is different. There are two SCU controllers:
SCU0 (CPU Die) and SCU1 (IO Die). In the AST2700, the firmware reads the
SCU Silicon Revision ID register (SCU0_000) and the SCUIO Silicon Revision ID
register (SCU1_000) and combines them into a 64-bit value.
The combined value of SCU0_000[23:16] and SCU1_000[23:16] represents the silicon
revision. For example, the AST2700-A1 revision is "0x0601010306010103", where
SCU0_000 should be 06010103 and SCU1_000 should be 06010103.

Reference:
https://github.com/AspeedTech-BMC/u-boot/blob/aspeed-master-v2023.10/arch/arm/mach-aspeed/ast2700/cpu-info.c

Signed-off-by: Jamin Lin 
---
 hw/misc/aspeed_scu.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/misc/aspeed_scu.c b/hw/misc/aspeed_scu.c
index 50f74fbabd..545d004749 100644
--- a/hw/misc/aspeed_scu.c
+++ b/hw/misc/aspeed_scu.c
@@ -910,7 +910,6 @@ static const MemoryRegionOps aspeed_ast2700_scu_ops = {
 };
 
 static const uint32_t ast2700_a0_resets[ASPEED_AST2700_SCU_NR_REGS] = {
-[AST2700_SILICON_REV]   = AST2700_A0_SILICON_REV,
 [AST2700_HW_STRAP1] = 0x0800,
 [AST2700_HW_STRAP1_CLR] = 0xFFF0FFF0,
 [AST2700_HW_STRAP1_LOCK]= 0x0FFF,
@@ -940,6 +939,7 @@ static void aspeed_ast2700_scu_reset(DeviceState *dev)
 AspeedSCUClass *asc = ASPEED_SCU_GET_CLASS(dev);
 
 memcpy(s->regs, asc->resets, asc->nr_regs * 4);
+s->regs[AST2700_SILICON_REV] = s->silicon_rev;
 }
 
 static void aspeed_2700_scu_class_init(ObjectClass *klass, void *data)
@@ -1032,7 +1032,6 @@ static const MemoryRegionOps aspeed_ast2700_scuio_ops = {
 };
 
 static const uint32_t ast2700_a0_resets_io[ASPEED_AST2700_SCU_NR_REGS] = {
-[AST2700_SILICON_REV]   = 0x0603,
 [AST2700_HW_STRAP1] = 0x0504,
 [AST2700_HW_STRAP1_CLR] = 0xFFF0FFF0,
 [AST2700_HW_STRAP1_LOCK]= 0x0FFF,
-- 
2.34.1

[PATCH v4 5/6] hw/arm/aspeed_ast27x0.c Separate HW Strap Registers for SCU and SCUIO

2025-03-02 Thread Jamin Lin via

There is one hw-strap1 register in the SCU (CPU DIE) and another hw-strap1
register in the SCUIO (IO DIE). The values of these two registers should not be
the same. To reuse the current design of hw-strap, hw-strap1 is assigned to the
SCU and sets the value in the SCU hw-strap1 register, while hw-strap2 is
assigned to the SCUIO and sets the value in the SCUIO hw-strap1 register.

Signed-off-by: Jamin Lin 
---
 hw/arm/aspeed_ast27x0.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/arm/aspeed_ast27x0.c b/hw/arm/aspeed_ast27x0.c
index a48f47b74e..8c225d4f90 100644
--- a/hw/arm/aspeed_ast27x0.c
+++ b/hw/arm/aspeed_ast27x0.c
@@ -331,16 +331,23 @@ static void aspeed_soc_ast2700_init(Object *obj)
 object_initialize_child(obj, "scu", &s->scu, TYPE_ASPEED_2700_SCU);
 qdev_prop_set_uint32(DEVICE(&s->scu), "silicon-rev",
  sc->silicon_rev);
+/*
+ * There is one hw-strap1 register in the SCU (CPU DIE) and another
+ * hw-strap1 register in the SCUIO (IO DIE). To reuse the current design
+ * of hw-strap, hw-strap1 is assigned to the SCU and sets the value in the
+ * SCU hw-strap1 register, while hw-strap2 is assigned to the SCUIO and
+ * sets the value in the SCUIO hw-strap1 register.
+ */
 object_property_add_alias(obj, "hw-strap1", OBJECT(&s->scu),
   "hw-strap1");
-object_property_add_alias(obj, "hw-strap2", OBJECT(&s->scu),
-  "hw-strap2");
 object_property_add_alias(obj, "hw-prot-key", OBJECT(&s->scu),
   "hw-prot-key");
 
 object_initialize_child(obj, "scuio", &s->scuio, TYPE_ASPEED_2700_SCUIO);
 qdev_prop_set_uint32(DEVICE(&s->scuio), "silicon-rev",
  sc->silicon_rev);
+object_property_add_alias(obj, "hw-strap2", OBJECT(&s->scuio),
+  "hw-strap1");
 
 snprintf(typename, sizeof(typename), "aspeed.fmc-%s", socname);
 object_initialize_child(obj, "fmc", &s->fmc, typename);
-- 
2.34.1

[PATCH v4 4/6] hw/misc/aspeed_scu: Fix the hw-strap1 cannot be set in the SOC layer for AST2700

2025-03-02 Thread Jamin Lin via

There is one hw_strap1 register in the SCU (CPU DIE) and another hw_strap1
register in the SCUIO (IO DIE).

In the "ast2700_a0_resets" function, the hardcoded value "0x0800" is set in
SCU hw-strap1 (CPU DIE), and in "ast2700_a0_resets_io" the hardcoded value
"0x0504" is set in SCUIO hw-strap1 (IO DIE). Both values cannot be set via
the SOC layer.

The value of "s->hw_strap1" is set by the SOC layer via the "hw-strap1" 
property.
Update the "aspeed_ast2700_scu_reset" function to set the value of 
"s->hw_strap1"
in both the SCU and SCUIO hw-strap1 registers.

Signed-off-by: Jamin Lin 
---
 hw/misc/aspeed_scu.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/misc/aspeed_scu.c b/hw/misc/aspeed_scu.c
index 545d004749..0581c744f1 100644
--- a/hw/misc/aspeed_scu.c
+++ b/hw/misc/aspeed_scu.c
@@ -910,7 +910,6 @@ static const MemoryRegionOps aspeed_ast2700_scu_ops = {
 };
 
 static const uint32_t ast2700_a0_resets[ASPEED_AST2700_SCU_NR_REGS] = {
-[AST2700_HW_STRAP1] = 0x0800,
 [AST2700_HW_STRAP1_CLR] = 0xFFF0FFF0,
 [AST2700_HW_STRAP1_LOCK]= 0x0FFF,
 [AST2700_HW_STRAP1_SEC1]= 0x00FF,
@@ -940,6 +939,7 @@ static void aspeed_ast2700_scu_reset(DeviceState *dev)
 
 memcpy(s->regs, asc->resets, asc->nr_regs * 4);
 s->regs[AST2700_SILICON_REV] = s->silicon_rev;
+s->regs[AST2700_HW_STRAP1] = s->hw_strap1;
 }
 
 static void aspeed_2700_scu_class_init(ObjectClass *klass, void *data)
@@ -1032,7 +1032,6 @@ static const MemoryRegionOps aspeed_ast2700_scuio_ops = {
 };
 
 static const uint32_t ast2700_a0_resets_io[ASPEED_AST2700_SCU_NR_REGS] = {
-[AST2700_HW_STRAP1] = 0x0504,
 [AST2700_HW_STRAP1_CLR] = 0xFFF0FFF0,
 [AST2700_HW_STRAP1_LOCK]= 0x0FFF,
 [AST2700_HW_STRAP1_SEC1]= 0x00FF,
-- 
2.34.1

[PATCH v4 1/6] hw/misc/aspeed_scu: Skipping dram_init in u-boot

2025-03-02 Thread Jamin Lin via

Setting BIT6 in VGA0 SCRATCH register will indicate that the ddr traning
is done, therefore skipping the u-boot-spl dram_init() process.

Signed-off-by: Jamin Lin 
Signed-off-by: Troy Lee 
---
 hw/misc/aspeed_scu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/misc/aspeed_scu.c b/hw/misc/aspeed_scu.c
index bac1441b06..50f74fbabd 100644
--- a/hw/misc/aspeed_scu.c
+++ b/hw/misc/aspeed_scu.c
@@ -157,6 +157,7 @@
 #define AST2700_SCU_FREQ_CNTR   TO_REG(0x3b0)
 #define AST2700_SCU_CPU_SCRATCH_0   TO_REG(0x780)
 #define AST2700_SCU_CPU_SCRATCH_1   TO_REG(0x784)
+#define AST2700_SCU_VGA_SCRATCH_0   TO_REG(0x900)
 
 #define AST2700_SCUIO_CLK_STOP_CTL_1TO_REG(0x240)
 #define AST2700_SCUIO_CLK_STOP_CLR_1TO_REG(0x244)
@@ -930,6 +931,7 @@ static const uint32_t 
ast2700_a0_resets[ASPEED_AST2700_SCU_NR_REGS] = {
 [AST2700_SCU_FREQ_CNTR] = 0x000375eb,
 [AST2700_SCU_CPU_SCRATCH_0] = 0x,
 [AST2700_SCU_CPU_SCRATCH_1] = 0x0004,
+[AST2700_SCU_VGA_SCRATCH_0] = 0x0040,
 };
 
 static void aspeed_ast2700_scu_reset(DeviceState *dev)
-- 
2.34.1

[PATCH v4 3/6] hw/arm/aspeed Update HW Strap Default Values for AST2700

2025-03-02 Thread Jamin Lin via

Separate HW Strap Registers for SCU and SCUIO.
AST2700_EVB_HW_STRAP1 is used for the SCU (CPU Die) hw-strap1.
AST2700_EVB_HW_STRAP2 is used for the SCUIO (IO Die) hw-strap1.

Additionally, both default values are updated based on the dump from the EVB.

Signed-off-by: Jamin Lin 
---
 hw/arm/aspeed.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index 98bf071139..c6c18596d6 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -181,8 +181,10 @@ struct AspeedMachineState {
 
 #ifdef TARGET_AARCH64
 /* AST2700 evb hardware value */
-#define AST2700_EVB_HW_STRAP1 0x00C0
-#define AST2700_EVB_HW_STRAP2 0x0003
+/* SCU HW Strap1 */
+#define AST2700_EVB_HW_STRAP1 0x0800
+/* SCUIO HW Strap1 */
+#define AST2700_EVB_HW_STRAP2 0x0700
 #endif
 
 /* Rainier hardware value: (QEMU prototype) */
-- 
2.34.1

Re: [PATCH v5 27/36] vfio/migration: Multifd device state transfer support - load thread

2025-03-02 Thread Avihai Horon




On 19/02/2025 22:34, Maciej S. Szmigiero wrote:

External email: Use caution opening links or attachments


From: "Maciej S. Szmigiero" 


Maybe add a sentence talking about the load thread itself first? E.g.:

Add a thread which loads the VFIO device state buffers that were 
received and via multifd.
Each VFIO device that has multifd device state transfer enabled has one 
such thread, which is created using migration core API 
qemu_loadvm_start_load_thread().


Since it's important to finish...


Since it's important to finish loading device state transferred via the
main migration channel (via save_live_iterate SaveVMHandler) before
starting loading the data asynchronously transferred via multifd the thread
doing the actual loading of the multifd transferred data is only started
from switchover_start SaveVMHandler.

switchover_start handler is called when MIG_CMD_SWITCHOVER_START
sub-command of QEMU_VM_COMMAND is received via the main migration channel.

This sub-command is only sent after all save_live_iterate data have already
been posted so it is safe to commence loading of the multifd-transferred
device state upon receiving it - loading of save_live_iterate data happens
synchronously in the main migration thread (much like the processing of
MIG_CMD_SWITCHOVER_START) so by the time MIG_CMD_SWITCHOVER_START is
processed all the proceeding data must have already been loaded.

Signed-off-by: Maciej S. Szmigiero 
---
  hw/vfio/migration-multifd.c | 225 
  hw/vfio/migration-multifd.h |   2 +
  hw/vfio/migration.c |  12 ++
  hw/vfio/trace-events|   5 +
  4 files changed, 244 insertions(+)

diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
index 5d5ee1393674..b3a88c062769 100644
--- a/hw/vfio/migration-multifd.c
+++ b/hw/vfio/migration-multifd.c
@@ -42,8 +42,13 @@ typedef struct VFIOStateBuffer {
  } VFIOStateBuffer;

  typedef struct VFIOMultifd {
+QemuThread load_bufs_thread;


This can be dropped.


+bool load_bufs_thread_running;
+bool load_bufs_thread_want_exit;
+
  VFIOStateBuffers load_bufs;
  QemuCond load_bufs_buffer_ready_cond;
+QemuCond load_bufs_thread_finished_cond;
  QemuMutex load_bufs_mutex; /* Lock order: this lock -> BQL */
  uint32_t load_buf_idx;
  uint32_t load_buf_idx_last;
@@ -179,6 +184,175 @@ bool vfio_load_state_buffer(void *opaque, char *data, 
size_t data_size,
  return true;
  }

+static int vfio_load_bufs_thread_load_config(VFIODevice *vbasedev)
+{
+return -EINVAL;
+}
+
+static VFIOStateBuffer *vfio_load_state_buffer_get(VFIOMultifd *multifd)
+{
+VFIOStateBuffer *lb;
+guint bufs_len;
+
+bufs_len = vfio_state_buffers_size_get(&multifd->load_bufs);
+if (multifd->load_buf_idx >= bufs_len) {
+assert(multifd->load_buf_idx == bufs_len);
+return NULL;
+}
+
+lb = vfio_state_buffers_at(&multifd->load_bufs,
+   multifd->load_buf_idx);
+if (!lb->is_present) {
+return NULL;
+}
+
+return lb;
+}
+
+static bool vfio_load_state_buffer_write(VFIODevice *vbasedev,
+ VFIOStateBuffer *lb,
+ Error **errp)
+{
+VFIOMigration *migration = vbasedev->migration;
+VFIOMultifd *multifd = migration->multifd;
+g_autofree char *buf = NULL;
+char *buf_cur;
+size_t buf_len;
+
+if (!lb->len) {
+return true;
+}
+
+trace_vfio_load_state_device_buffer_load_start(vbasedev->name,
+   multifd->load_buf_idx);
+
+/* lb might become re-allocated when we drop the lock */
+buf = g_steal_pointer(&lb->data);
+buf_cur = buf;
+buf_len = lb->len;
+while (buf_len > 0) {
+ssize_t wr_ret;
+int errno_save;
+
+/*
+ * Loading data to the device takes a while,
+ * drop the lock during this process.
+ */
+qemu_mutex_unlock(&multifd->load_bufs_mutex);
+wr_ret = write(migration->data_fd, buf_cur, buf_len);
+errno_save = errno;
+qemu_mutex_lock(&multifd->load_bufs_mutex);
+
+if (wr_ret < 0) {
+error_setg(errp,
+   "writing state buffer %" PRIu32 " failed: %d",
+   multifd->load_buf_idx, errno_save);


Let's add vbasedev->name to the error message so we know which device 
caused the error.



+return false;
+}
+
+assert(wr_ret <= buf_len);


I think this assert is redundant: we write buf_len bytes and by 
definition of write() wr_ret will be <= buf_len.



+buf_len -= wr_ret;
+buf_cur += wr_ret;
+}
+
+trace_vfio_load_state_device_buffer_load_end(vbasedev->name,
+ multifd->load_buf_idx);
+
+return true;
+}
+
+static bool vfio_load_bufs_thread_want_exit(VFIOMultifd *multifd,
+

Re: [PATCH v5 28/36] vfio/migration: Multifd device state transfer support - config loading support

2025-03-02 Thread Avihai Horon




On 19/02/2025 22:34, Maciej S. Szmigiero wrote:

External email: Use caution opening links or attachments


From: "Maciej S. Szmigiero" 

Load device config received via multifd using the existing machinery
behind vfio_load_device_config_state().

Also, make sure to process the relevant main migration channel flags.

Signed-off-by: Maciej S. Szmigiero 
---
  hw/vfio/migration-multifd.c   | 47 ++-
  hw/vfio/migration.c   |  8 +-
  include/hw/vfio/vfio-common.h |  2 ++
  3 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
index b3a88c062769..7200f6f1c2a2 100644
--- a/hw/vfio/migration-multifd.c
+++ b/hw/vfio/migration-multifd.c
@@ -15,6 +15,7 @@
  #include "qemu/lockable.h"
  #include "qemu/main-loop.h"
  #include "qemu/thread.h"
+#include "io/channel-buffer.h"
  #include "migration/qemu-file.h"
  #include "migration-multifd.h"
  #include "trace.h"
@@ -186,7 +187,51 @@ bool vfio_load_state_buffer(void *opaque, char *data, 
size_t data_size,

  static int vfio_load_bufs_thread_load_config(VFIODevice *vbasedev)
  {
-return -EINVAL;
+VFIOMigration *migration = vbasedev->migration;
+VFIOMultifd *multifd = migration->multifd;
+VFIOStateBuffer *lb;
+g_autoptr(QIOChannelBuffer) bioc = NULL;
+QEMUFile *f_out = NULL, *f_in = NULL;


Can we move patch #29 before this one and use g_autoptr() for f_out an f_in?


+uint64_t mig_header;
+int ret;
+
+assert(multifd->load_buf_idx == multifd->load_buf_idx_last);
+lb = vfio_state_buffers_at(&multifd->load_bufs, multifd->load_buf_idx);
+assert(lb->is_present);
+
+bioc = qio_channel_buffer_new(lb->len);
+qio_channel_set_name(QIO_CHANNEL(bioc), "vfio-device-config-load");
+
+f_out = qemu_file_new_output(QIO_CHANNEL(bioc));
+qemu_put_buffer(f_out, (uint8_t *)lb->data, lb->len);
+
+ret = qemu_fflush(f_out);
+if (ret) {
+g_clear_pointer(&f_out, qemu_fclose);
+return ret;
+}
+
+qio_channel_io_seek(QIO_CHANNEL(bioc), 0, 0, NULL);
+f_in = qemu_file_new_input(QIO_CHANNEL(bioc));
+
+mig_header = qemu_get_be64(f_in);
+if (mig_header != VFIO_MIG_FLAG_DEV_CONFIG_STATE) {
+g_clear_pointer(&f_out, qemu_fclose);
+g_clear_pointer(&f_in, qemu_fclose);
+return -EINVAL;
+}
+
+bql_lock();
+ret = vfio_load_device_config_state(f_in, vbasedev);
+bql_unlock();
+
+g_clear_pointer(&f_out, qemu_fclose);
+g_clear_pointer(&f_in, qemu_fclose);
+if (ret < 0) {
+return ret;
+}
+
+return 0;
  }

  static VFIOStateBuffer *vfio_load_state_buffer_get(VFIOMultifd *multifd)
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 85f54cb22df2..b962309f7c27 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -264,7 +264,7 @@ static int vfio_save_device_config_state(QEMUFile *f, void 
*opaque,
  return ret;
  }

-static int vfio_load_device_config_state(QEMUFile *f, void *opaque)
+int vfio_load_device_config_state(QEMUFile *f, void *opaque)
  {
  VFIODevice *vbasedev = opaque;
  uint64_t data;
@@ -728,6 +728,12 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int 
version_id)
  switch (data) {
  case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
  {
+if (vfio_multifd_transfer_enabled(vbasedev)) {
+error_report("%s: got DEV_CONFIG_STATE but doing multifd 
transfer",
+ vbasedev->name);


To make clearer, maybe change to:
"%s: got DEV_CONFIG_STATE in main migration channel but doing multifd 
transfer"


Thanks.


+return -EINVAL;
+}
+
  return vfio_load_device_config_state(f, opaque);
  }
  case VFIO_MIG_FLAG_DEV_SETUP_STATE:
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index ab110198bd6b..ce2bdea8a2c2 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -298,6 +298,8 @@ void vfio_add_bytes_transferred(unsigned long val);
  bool vfio_device_state_is_running(VFIODevice *vbasedev);
  bool vfio_device_state_is_precopy(VFIODevice *vbasedev);

+int vfio_load_device_config_state(QEMUFile *f, void *opaque);
+
  #ifdef CONFIG_LINUX
  int vfio_get_region_info(VFIODevice *vbasedev, int index,
   struct vfio_region_info **info);

Re: [PATCH v5 30/36] vfio/migration: Multifd device state transfer support - send side

2025-03-02 Thread Avihai Horon




On 19/02/2025 22:34, Maciej S. Szmigiero wrote:

External email: Use caution opening links or attachments


From: "Maciej S. Szmigiero" 

Implement the multifd device state transfer via additional per-device
thread inside save_live_complete_precopy_thread handler.

Switch between doing the data transfer in the new handler and doing it
in the old save_state handler depending on the
x-migration-multifd-transfer device property value.


x-migration-multifd-transfer is not yet introduced. Maybe rephrase to:

... depending if VFIO multifd transfer is enabled or not.



Signed-off-by: Maciej S. Szmigiero 
---
  hw/vfio/migration-multifd.c   | 139 ++
  hw/vfio/migration-multifd.h   |   5 ++
  hw/vfio/migration.c   |  26 +--
  hw/vfio/trace-events  |   2 +
  include/hw/vfio/vfio-common.h |   8 ++
  5 files changed, 174 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
index 7200f6f1c2a2..0cfa9d31732a 100644
--- a/hw/vfio/migration-multifd.c
+++ b/hw/vfio/migration-multifd.c
@@ -476,6 +476,145 @@ bool vfio_multifd_transfer_setup(VFIODevice *vbasedev, 
Error **errp)
  return true;
  }

+void vfio_multifd_emit_dummy_eos(VFIODevice *vbasedev, QEMUFile *f)
+{
+assert(vfio_multifd_transfer_enabled(vbasedev));
+
+/*
+ * Emit dummy NOP data on the main migration channel since the actual
+ * device state transfer is done via multifd channels.
+ */
+qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
+}
+
+static bool
+vfio_save_complete_precopy_thread_config_state(VFIODevice *vbasedev,
+   char *idstr,
+   uint32_t instance_id,
+   uint32_t idx,
+   Error **errp)
+{
+g_autoptr(QIOChannelBuffer) bioc = NULL;
+g_autoptr(QEMUFile) f = NULL;
+int ret;
+g_autofree VFIODeviceStatePacket *packet = NULL;
+size_t packet_len;
+
+bioc = qio_channel_buffer_new(0);
+qio_channel_set_name(QIO_CHANNEL(bioc), "vfio-device-config-save");
+
+f = qemu_file_new_output(QIO_CHANNEL(bioc));
+
+if (vfio_save_device_config_state(f, vbasedev, errp)) {
+return false;
+}
+
+ret = qemu_fflush(f);
+if (ret) {
+error_setg(errp, "save config state flush failed: %d", ret);


Let's add vbasedev->name to the error message so we know which device 
caused the error.



+return false;
+}
+
+packet_len = sizeof(*packet) + bioc->usage;
+packet = g_malloc0(packet_len);
+packet->version = VFIO_DEVICE_STATE_PACKET_VER_CURRENT;
+packet->idx = idx;
+packet->flags = VFIO_DEVICE_STATE_CONFIG_STATE;
+memcpy(&packet->data, bioc->data, bioc->usage);
+
+if (!multifd_queue_device_state(idstr, instance_id,
+(char *)packet, packet_len)) {
+error_setg(errp, "multifd config data queuing failed");


Ditto.


+return false;
+}
+
+vfio_add_bytes_transferred(packet_len);
+
+return true;
+}
+
+/*
+ * This thread is spawned by the migration core directly via
+ * .save_live_complete_precopy_thread SaveVMHandler.
+ *
+ * It exits after either:
+ * * completing saving the remaining device state and device config, OR:
+ * * encountering some error while doing the above, OR:
+ * * being forcefully aborted by the migration core by
+ *   multifd_device_state_save_thread_should_exit() returning true.
+ */
+bool vfio_save_complete_precopy_thread(SaveLiveCompletePrecopyThreadData *d,
+   Error **errp)
+{
+VFIODevice *vbasedev = d->handler_opaque;
+VFIOMigration *migration = vbasedev->migration;
+bool ret;
+g_autofree VFIODeviceStatePacket *packet = NULL;
+uint32_t idx;
+
+if (!vfio_multifd_transfer_enabled(vbasedev)) {
+/* Nothing to do, vfio_save_complete_precopy() does the transfer. */
+return true;
+}
+
+trace_vfio_save_complete_precopy_thread_start(vbasedev->name,
+  d->idstr, d->instance_id);
+
+/* We reach here with device state STOP or STOP_COPY only */
+if (vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY,
+ VFIO_DEVICE_STATE_STOP, errp)) {
+ret = false;
+goto ret_finish;
+}
+
+packet = g_malloc0(sizeof(*packet) + migration->data_buffer_size);
+packet->version = VFIO_DEVICE_STATE_PACKET_VER_CURRENT;
+
+for (idx = 0; ; idx++) {
+ssize_t data_size;
+size_t packet_size;
+
+if (multifd_device_state_save_thread_should_exit()) {
+error_setg(errp, "operation cancelled");


Same comment as in patch #27:

IIUC, if multifd_device_state_save_thread_should_exit() returns true, it 
means that some other code part already failed and set migration error, no?
If so, shouldn't we return tru

Re: [PATCH v5 34/36] vfio/migration: Max in-flight VFIO device state buffer count limit

2025-03-02 Thread Maciej S. Szmigiero


On 2.03.2025 15:53, Avihai Horon wrote:


On 19/02/2025 22:34, Maciej S. Szmigiero wrote:

External email: Use caution opening links or attachments


From: "Maciej S. Szmigiero" 

Allow capping the maximum count of in-flight VFIO device state buffers
queued at the destination, otherwise a malicious QEMU source could
theoretically cause the target QEMU to allocate unlimited amounts of memory
for buffers-in-flight.


I still think it's better to limit the number of bytes rather than number of 
buffers:
1. To the average user the number of buffers doesn't really mean anything. They 
have to open the code and see what is the size of a single buffer and then 
choose their value.
2. Currently VFIO migration buffer size is 1MB. If later it's changed to 2MB 
for example, users will have to adjust their configuration accordingly. With 
number of bytes, the configuration remains the same no matter what is the VFIO 
migration buffer size.


Sorry Avihai, but we're a little more than week from code freeze
so it's really not a time for more than cosmetic changes.

Thanks,
Maciej

Re: [PATCH v5 34/36] vfio/migration: Max in-flight VFIO device state buffer count limit

2025-03-02 Thread Maciej S. Szmigiero


On 2.03.2025 15:54, Maciej S. Szmigiero wrote:

On 2.03.2025 15:53, Avihai Horon wrote:


On 19/02/2025 22:34, Maciej S. Szmigiero wrote:

External email: Use caution opening links or attachments


From: "Maciej S. Szmigiero" 

Allow capping the maximum count of in-flight VFIO device state buffers
queued at the destination, otherwise a malicious QEMU source could
theoretically cause the target QEMU to allocate unlimited amounts of memory
for buffers-in-flight.


I still think it's better to limit the number of bytes rather than number of 
buffers:
1. To the average user the number of buffers doesn't really mean anything. They 
have to open the code and see what is the size of a single buffer and then 
choose their value.
2. Currently VFIO migration buffer size is 1MB. If later it's changed to 2MB 
for example, users will have to adjust their configuration accordingly. With 
number of bytes, the configuration remains the same no matter what is the VFIO 
migration buffer size.


Sorry Avihai, but we're a little more than week from code freeze
so it's really not a time for more than cosmetic changes.


And if you really, really want to have queued buffers size limit
that's something could be added later as additional
x-migration-max-queued-buffers-size or something property
since these limits aren't exclusive.

Thanks,
Maciej

Re: [PATCH] hw/net/fsl_etsec: Set eTSEC device description and category

2025-03-02 Thread Bernhard Beschow




Am 18. Februar 2025 15:54:07 UTC schrieb BALATON Zoltan :
>Add description and set category for eTSEC device so it shows up
>better in -device help.
>
>Signed-off-by: BALATON Zoltan 
>---
> hw/net/fsl_etsec/etsec.c | 2 ++
> 1 file changed, 2 insertions(+)
>
>diff --git a/hw/net/fsl_etsec/etsec.c b/hw/net/fsl_etsec/etsec.c
>index 3ce4fa2662..adde644892 100644
>--- a/hw/net/fsl_etsec/etsec.c
>+++ b/hw/net/fsl_etsec/etsec.c
>@@ -423,8 +423,10 @@ static void etsec_class_init(ObjectClass *klass, void 
>*data)
> DeviceClass *dc = DEVICE_CLASS(klass);
> 
> dc->realize = etsec_realize;
>+dc->desc = "Freescale Enhanced Three-Speed Ethernet Controller";
> device_class_set_legacy_reset(dc, etsec_reset);
> device_class_set_props(dc, etsec_properties);
>+set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);
> }
> 
> static const TypeInfo etsec_types[] = {

Reviewed-by: Bernhard Beschow

89 matches

Mail list logo