Re: [PATCH v1 1/4] physmem: disallow direct access to RAM DEVICE in address_space_write_rom()

2025-01-22 Thread David Hildenbrand

On 22.01.25 11:10, David Hildenbrand wrote:

On 22.01.25 11:07, Philippe Mathieu-Daudé wrote:

Hi David,

On 20/1/25 12:14, David Hildenbrand wrote:

As documented in commit 4a2e242bbb306 ("memory: Don't use memcpy for
ram_device regions"), we disallow direct access to RAM DEVICE regions.

Let's factor out the "supports direct access" check from
memory_access_is_direct() so we can reuse it, and make it a bit easier to
read.

This change implies that address_space_write_rom() and
cpu_memory_rw_debug() won't be able to write to RAM DEVICE regions. It
will also affect cpu_flush_icache_range(), but it's only used by
hw/core/loader.c after writing to ROM, so it is expected to not apply
here with RAM DEVICE.

This fixes direct access to these regions where we don't want direct
access. We'll extend cpu_memory_rw_debug() next to also be able to write to
these (and IO) regions.

This is a preparation for further changes.

Cc: Alex Williamson 
Signed-off-by: David Hildenbrand 
---
include/exec/memory.h | 30 --
system/physmem.c  |  3 +--
2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3ee1901b52..bd0ddb9cdf 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2985,15 +2985,33 @@ MemTxResult 
address_space_write_cached_slow(MemoryRegionCache *cache,
int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr);
bool prepare_mmio_access(MemoryRegion *mr);

+static inline bool memory_region_supports_direct_access(MemoryRegion *mr)

+{
+/* ROM DEVICE regions only allow direct access if in ROMD mode. */
+if (memory_region_is_romd(mr)) {
+return true;
+}
+if (!memory_region_is_ram(mr)) {
+return false;
+}
+/*
+ * RAM DEVICE regions can be accessed directly using memcpy, but it might
+ * be MMIO and access using memcpy can be wrong (e.g., using instructions
+ * not intended for MMIO access). So we treat this as IO.
+ */
+return !memory_region_is_ram_device(mr);
+
+}
+
static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
{
-if (is_write) {
-return memory_region_is_ram(mr) && !mr->readonly &&
-   !mr->rom_device && !memory_region_is_ram_device(mr);
-} else {
-return (memory_region_is_ram(mr) && !memory_region_is_ram_device(mr)) ||


This patch is doing multiple things at once, and I'm having a hard time
reviewing it.


I appreciate the review, but ... really?! :)

25 insertions(+), 8 deletions(-)


FWIW, I'll try to split it up ... I thought the comments added to 
memory_region_supports_direct_access() and friends are pretty clear.


--
Cheers,

David / dhildenb




Re: [PATCH v1 1/4] physmem: disallow direct access to RAM DEVICE in address_space_write_rom()

2025-01-22 Thread David Hildenbrand

On 22.01.25 11:18, David Hildenbrand wrote:

On 22.01.25 11:17, Philippe Mathieu-Daudé wrote:

On 22/1/25 11:13, David Hildenbrand wrote:

On 22.01.25 11:10, David Hildenbrand wrote:

On 22.01.25 11:07, Philippe Mathieu-Daudé wrote:

Hi David,

On 20/1/25 12:14, David Hildenbrand wrote:

As documented in commit 4a2e242bbb306 ("memory: Don't use memcpy for
ram_device regions"), we disallow direct access to RAM DEVICE regions.

Let's factor out the "supports direct access" check from
memory_access_is_direct() so we can reuse it, and make it a bit
easier to
read.

This change implies that address_space_write_rom() and
cpu_memory_rw_debug() won't be able to write to RAM DEVICE regions. It
will also affect cpu_flush_icache_range(), but it's only used by
hw/core/loader.c after writing to ROM, so it is expected to not apply
here with RAM DEVICE.

This fixes direct access to these regions where we don't want direct
access. We'll extend cpu_memory_rw_debug() next to also be able to
write to
these (and IO) regions.

This is a preparation for further changes.

Cc: Alex Williamson 
Signed-off-by: David Hildenbrand 
---
      include/exec/memory.h | 30 --
      system/physmem.c  |  3 +--
      2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3ee1901b52..bd0ddb9cdf 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2985,15 +2985,33 @@ MemTxResult
address_space_write_cached_slow(MemoryRegionCache *cache,
      int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr);
      bool prepare_mmio_access(MemoryRegion *mr);
+static inline bool memory_region_supports_direct_access(MemoryRegion *mr)
+{
+    /* ROM DEVICE regions only allow direct access if in ROMD mode. */
+    if (memory_region_is_romd(mr)) {
+    return true;
+    }
+    if (!memory_region_is_ram(mr)) {
+    return false;
+    }
+    /*
+ * RAM DEVICE regions can be accessed directly using memcpy, but it might
+ * be MMIO and access using memcpy can be wrong (e.g., using instructions
+ * not intended for MMIO access). So we treat this as IO.
+ */
+    return !memory_region_is_ram_device(mr);
+
+}
+
      static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
      {
-    if (is_write) {
-    return memory_region_is_ram(mr) && !mr->readonly &&
-   !mr->rom_device && !memory_region_is_ram_device(mr);
-    } else {
-    return (memory_region_is_ram(mr) && !memory_region_is_ram_device(mr)) ||


This patch is doing multiple things at once, and I'm having a hard time
reviewing it.


I appreciate the review, but ... really?! :)

25 insertions(+), 8 deletions(-)


FWIW, I'll try to split it up ... I thought the comments added to
memory_region_supports_direct_access() and friends are pretty clear.


No worry, I'll give it another try. (split still welcomed, but not
blocking).


I think unmangling the existing unreadable conditions in
memory_access_is_direct() can be done separately; let me see what I can do.


The following should hopefully be easier to follow:


From 89519beec0de96d9c9c243a844ccb698db759893 Mon Sep 17 00:00:00 2001
From: David Hildenbrand 
Date: Wed, 22 Jan 2025 11:23:00 +0100
Subject: [PATCH 1/4] physmem: factor out memory_region_is_ram_device() check
 in memory_access_is_direct()

As documented in commit 4a2e242bbb306 ("memory: Don't use memcpy for
ram_device regions"), we disallow direct access to RAM DEVICE regions.

Let's make this clearer to prepare for further changes. Note that ROMD
regions will never be RAM DEVICE at the same time.

Signed-off-by: David Hildenbrand 
---
 include/exec/memory.h | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3ee1901b52..7931aba2ea 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2987,12 +2987,19 @@ bool prepare_mmio_access(MemoryRegion *mr);
 
 static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)

 {
+/*
+ * RAM DEVICE regions can be accessed directly using memcpy, but it might
+ * be MMIO and access using memcpy can be wrong (e.g., using instructions
+ * not intended for MMIO access). So we treat this as IO.
+ */
+if (memory_region_is_ram_device(mr)) {
+return false;
+}
 if (is_write) {
 return memory_region_is_ram(mr) && !mr->readonly &&
-   !mr->rom_device && !memory_region_is_ram_device(mr);
+   !mr->rom_device;
 } else {
-return (memory_region_is_ram(mr) && !memory_region_is_ram_device(mr)) ||
-   memory_region_is_romd(mr);
+return memory_region_is_ram(mr) || memory_region_is_romd(mr);
 }
 }
 
--

2.47.1


From ba793917cf3e35bc3b39898524ea96a85cef768d Mon Sep 17 00:00:00 2001
From: David Hildenbrand 
Date: Wed, 22 Jan 2025 11:28:19 +0100
Subject: [PATCH 2/4] physmem: factor out RAM/ROMD chec

[PATCH] target/i386: extract common bits of gen_repz/gen_repz_nz

2025-01-22 Thread Paolo Bonzini
Now that everything has been cleaned up, look at DF and prefixes
in a single function, and call that one from gen_repz and gen_repz_nz.

Signed-off-by: Paolo Bonzini 
---
 target/i386/tcg/translate.c | 34 ++
 1 file changed, 14 insertions(+), 20 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 3d8a0a8071f..a8935f487aa 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -695,14 +695,6 @@ static inline void gen_string_movl_A0_EDI(DisasContext *s)
 gen_lea_v_seg(s, cpu_regs[R_EDI], R_ES, -1);
 }
 
-static inline TCGv gen_compute_Dshift(DisasContext *s, MemOp ot)
-{
-TCGv dshift = tcg_temp_new();
-tcg_gen_ld32s_tl(dshift, tcg_env, offsetof(CPUX86State, df));
-tcg_gen_shli_tl(dshift, dshift, ot);
-return dshift;
-};
-
 static TCGv gen_ext_tl(TCGv dst, TCGv src, MemOp size, bool sign)
 {
 if (size == MO_TL) {
@@ -1453,29 +1445,31 @@ static void do_gen_rep(DisasContext *s, MemOp ot, TCGv 
dshift,
 gen_jmp_rel_csize(s, 0, 1);
 }
 
-static void gen_repz(DisasContext *s, MemOp ot,
- void (*fn)(DisasContext *s, MemOp ot, TCGv dshift))
-
+static void do_gen_string(DisasContext *s, MemOp ot,
+  void (*fn)(DisasContext *s, MemOp ot, TCGv dshift),
+  bool is_repz_nz)
 {
-TCGv dshift = gen_compute_Dshift(s, ot);
+TCGv dshift = tcg_temp_new();
+tcg_gen_ld32s_tl(dshift, tcg_env, offsetof(CPUX86State, df));
+tcg_gen_shli_tl(dshift, dshift, ot);
 
 if (s->prefix & (PREFIX_REPZ | PREFIX_REPNZ)) {
-do_gen_rep(s, ot, dshift, fn, false);
+do_gen_rep(s, ot, dshift, fn, is_repz_nz);
 } else {
 fn(s, ot, dshift);
 }
 }
 
+static void gen_repz(DisasContext *s, MemOp ot,
+ void (*fn)(DisasContext *s, MemOp ot, TCGv dshift))
+{
+do_gen_string(s, ot, fn, false);
+}
+
 static void gen_repz_nz(DisasContext *s, MemOp ot,
 void (*fn)(DisasContext *s, MemOp ot, TCGv dshift))
 {
-TCGv dshift = gen_compute_Dshift(s, ot);
-
-if (s->prefix & (PREFIX_REPZ | PREFIX_REPNZ)) {
-do_gen_rep(s, ot, dshift, fn, true);
-} else {
-fn(s, ot, dshift);
-}
+do_gen_string(s, ot, fn, true);
 }
 
 static void gen_helper_fp_arith_ST0_FT0(int op)
-- 
2.47.1




[PATCH] target/i386: extract common bits of gen_repz/gen_repz_nz

2025-01-22 Thread Paolo Bonzini
Now that everything has been cleaned up, look at DF and prefixes
in a single function, and call that one from gen_repz and gen_repz_nz.

Based-on: <20241215090613.89588-1-pbonz...@redhat.com>
Suggested-by: Richard Henderson 
Signed-off-by: Paolo Bonzini 
---
This was requested in the review of "target/i386: optimize string 
operations"

 target/i386/tcg/translate.c | 34 ++
 1 file changed, 14 insertions(+), 20 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 9f4d3ebbd95..9b2fde5eb28 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -688,14 +688,6 @@ static inline void gen_string_movl_A0_EDI(DisasContext *s)
 gen_lea_v_seg(s, cpu_regs[R_EDI], R_ES, -1);
 }
 
-static inline TCGv gen_compute_Dshift(DisasContext *s, MemOp ot)
-{
-TCGv dshift = tcg_temp_new();
-tcg_gen_ld32s_tl(dshift, tcg_env, offsetof(CPUX86State, df));
-tcg_gen_shli_tl(dshift, dshift, ot);
-return dshift;
-};
-
 static TCGv gen_ext_tl(TCGv dst, TCGv src, MemOp size, bool sign)
 {
 if (size == MO_TL) {
@@ -1446,29 +1438,31 @@ static void do_gen_rep(DisasContext *s, MemOp ot, TCGv 
dshift,
 gen_jmp_rel_csize(s, 0, 1);
 }
 
-static void gen_repz(DisasContext *s, MemOp ot,
- void (*fn)(DisasContext *s, MemOp ot, TCGv dshift))
-
+static void do_gen_string(DisasContext *s, MemOp ot,
+  void (*fn)(DisasContext *s, MemOp ot, TCGv dshift),
+  bool is_repz_nz)
 {
-TCGv dshift = gen_compute_Dshift(s, ot);
+TCGv dshift = tcg_temp_new();
+tcg_gen_ld32s_tl(dshift, tcg_env, offsetof(CPUX86State, df));
+tcg_gen_shli_tl(dshift, dshift, ot);
 
 if (s->prefix & (PREFIX_REPZ | PREFIX_REPNZ)) {
-do_gen_rep(s, ot, dshift, fn, false);
+do_gen_rep(s, ot, dshift, fn, is_repz_nz);
 } else {
 fn(s, ot, dshift);
 }
 }
 
+static void gen_repz(DisasContext *s, MemOp ot,
+ void (*fn)(DisasContext *s, MemOp ot, TCGv dshift))
+{
+do_gen_string(s, ot, fn, false);
+}
+
 static void gen_repz_nz(DisasContext *s, MemOp ot,
 void (*fn)(DisasContext *s, MemOp ot, TCGv dshift))
 {
-TCGv dshift = gen_compute_Dshift(s, ot);
-
-if (s->prefix & (PREFIX_REPZ | PREFIX_REPNZ)) {
-do_gen_rep(s, ot, dshift, fn, true);
-} else {
-fn(s, ot, dshift);
-}
+do_gen_string(s, ot, fn, true);
 }
 
 static void gen_helper_fp_arith_ST0_FT0(int op)
-- 
2.47.1




[PATCH v2 02/10] gdbstub: Clarify no more than @gdb_num_core_regs can be accessed

2025-01-22 Thread Philippe Mathieu-Daudé
Both CPUClass::gdb_read_register() and CPUClass::gdb_write_register()
handlers are called from common gdbstub code, and won't be called with
register index over CPUClass::gdb_num_core_regs:

  int gdb_read_register(CPUState *cpu, GByteArray *buf, int reg)
  {
  CPUClass *cc = CPU_GET_CLASS(cpu);

  if (reg < cc->gdb_num_core_regs) {
  return cc->gdb_read_register(cpu, buf, reg);
  }
  ...
  }

  static int gdb_write_register(CPUState *cpu, uint8_t *mem_buf, int reg)
  {
  CPUClass *cc = CPU_GET_CLASS(cpu);

  if (reg < cc->gdb_num_core_regs) {
  return cc->gdb_write_register(cpu, mem_buf, reg);
  }
  ...
  }

Clarify that in CPUClass docstring, and remove unreachable code on
the microblaze and openrisc implementations.

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/core/cpu.h   | 2 ++
 target/microblaze/gdbstub.c | 5 -
 target/openrisc/gdbstub.c   | 5 -
 3 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index fb397cdfc53..7b6b22c431b 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -124,7 +124,9 @@ struct SysemuCPUOps;
  * @get_pc: Callback for getting the Program Counter register.
  *   As above, with the semantics of the target architecture.
  * @gdb_read_register: Callback for letting GDB read a register.
+ * No more than @gdb_num_core_regs registers can be read.
  * @gdb_write_register: Callback for letting GDB write a register.
+ * No more than @gdb_num_core_regs registers can be written.
  * @gdb_adjust_breakpoint: Callback for adjusting the address of a
  *   breakpoint.  Used by AVR to handle a gdb mis-feature with
  *   its Harvard architecture split code and data.
diff --git a/target/microblaze/gdbstub.c b/target/microblaze/gdbstub.c
index 09d74e164d0..d493681d38d 100644
--- a/target/microblaze/gdbstub.c
+++ b/target/microblaze/gdbstub.c
@@ -110,14 +110,9 @@ int mb_cpu_gdb_read_stack_protect(CPUState *cs, GByteArray 
*mem_buf, int n)
 
 int mb_cpu_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
 {
-CPUClass *cc = CPU_GET_CLASS(cs);
 CPUMBState *env = cpu_env(cs);
 uint32_t tmp;
 
-if (n > cc->gdb_num_core_regs) {
-return 0;
-}
-
 tmp = ldl_p(mem_buf);
 
 switch (n) {
diff --git a/target/openrisc/gdbstub.c b/target/openrisc/gdbstub.c
index c2a77d5d4d5..45bba80d878 100644
--- a/target/openrisc/gdbstub.c
+++ b/target/openrisc/gdbstub.c
@@ -47,14 +47,9 @@ int openrisc_cpu_gdb_read_register(CPUState *cs, GByteArray 
*mem_buf, int n)
 
 int openrisc_cpu_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
 {
-CPUClass *cc = CPU_GET_CLASS(cs);
 CPUOpenRISCState *env = cpu_env(cs);
 uint32_t tmp;
 
-if (n > cc->gdb_num_core_regs) {
-return 0;
-}
-
 tmp = ldl_p(mem_buf);
 
 if (n < 32) {
-- 
2.47.1




[PATCH v2 08/10] gdbstub: Prefer cached CpuClass over CPU_GET_CLASS() macro

2025-01-22 Thread Philippe Mathieu-Daudé
CPUState caches its CPUClass since commit 6fbdff87062
("cpu: cache CPUClass in CPUState for hot code paths"),
use it.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
---
 gdbstub/gdbstub.c | 26 +-
 gdbstub/system.c  |  7 ++-
 gdbstub/user-target.c |  6 ++
 gdbstub/user.c|  7 ++-
 4 files changed, 15 insertions(+), 31 deletions(-)

diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index e366df12d4a..282e13e163f 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -354,7 +354,6 @@ static const char *get_feature_xml(const char *p, const 
char **newp,
GDBProcess *process)
 {
 CPUState *cpu = gdb_get_first_cpu_in_process(process);
-CPUClass *cc = CPU_GET_CLASS(cpu);
 GDBRegisterState *r;
 size_t len;
 
@@ -377,11 +376,11 @@ static const char *get_feature_xml(const char *p, const 
char **newp,
  ""
  ""));
 
-if (cc->gdb_arch_name) {
+if (cpu->cc->gdb_arch_name) {
 g_ptr_array_add(
 xml,
 g_markup_printf_escaped("%s",
-cc->gdb_arch_name(cpu)));
+cpu->cc->gdb_arch_name(cpu)));
 }
 for (guint i = 0; i < cpu->gdb_regs->len; i++) {
 r = &g_array_index(cpu->gdb_regs, GDBRegisterState, i);
@@ -520,11 +519,10 @@ GArray *gdb_get_register_list(CPUState *cpu)
 
 int gdb_read_register(CPUState *cpu, GByteArray *buf, int reg)
 {
-CPUClass *cc = CPU_GET_CLASS(cpu);
 GDBRegisterState *r;
 
-if (reg < cc->gdb_num_core_regs) {
-return cc->gdb_read_register(cpu, buf, reg);
+if (reg < cpu->cc->gdb_num_core_regs) {
+return cpu->cc->gdb_read_register(cpu, buf, reg);
 }
 
 for (guint i = 0; i < cpu->gdb_regs->len; i++) {
@@ -538,11 +536,10 @@ int gdb_read_register(CPUState *cpu, GByteArray *buf, int 
reg)
 
 static int gdb_write_register(CPUState *cpu, uint8_t *mem_buf, int reg)
 {
-CPUClass *cc = CPU_GET_CLASS(cpu);
 GDBRegisterState *r;
 
-if (reg < cc->gdb_num_core_regs) {
-return cc->gdb_write_register(cpu, mem_buf, reg);
+if (reg < cpu->cc->gdb_num_core_regs) {
+return cpu->cc->gdb_write_register(cpu, mem_buf, reg);
 }
 
 for (guint i = 0; i < cpu->gdb_regs->len; i++) {
@@ -570,7 +567,7 @@ static void gdb_register_feature(CPUState *cpu, int 
base_reg,
 
 void gdb_init_cpu(CPUState *cpu)
 {
-CPUClass *cc = CPU_GET_CLASS(cpu);
+CPUClass *cc = cpu->cc;
 const GDBFeature *feature;
 
 cpu->gdb_regs = g_array_new(false, false, sizeof(GDBRegisterState));
@@ -1646,11 +1643,8 @@ void gdb_extend_qsupported_features(char *qflags)
 
 static void handle_query_supported(GArray *params, void *user_ctx)
 {
-CPUClass *cc;
-
 g_string_printf(gdbserver_state.str_buf, "PacketSize=%x", 
MAX_PACKET_LENGTH);
-cc = CPU_GET_CLASS(first_cpu);
-if (cc->gdb_core_xml_file) {
+if (first_cpu->cc->gdb_core_xml_file) {
 g_string_append(gdbserver_state.str_buf, ";qXfer:features:read+");
 }
 
@@ -1697,7 +1691,6 @@ static void handle_query_supported(GArray *params, void 
*user_ctx)
 static void handle_query_xfer_features(GArray *params, void *user_ctx)
 {
 GDBProcess *process;
-CPUClass *cc;
 unsigned long len, total_len, addr;
 const char *xml;
 const char *p;
@@ -1708,8 +1701,7 @@ static void handle_query_xfer_features(GArray *params, 
void *user_ctx)
 }
 
 process = gdb_get_cpu_process(gdbserver_state.g_cpu);
-cc = CPU_GET_CLASS(gdbserver_state.g_cpu);
-if (!cc->gdb_core_xml_file) {
+if (!gdbserver_state.g_cpu->cc->gdb_core_xml_file) {
 gdb_put_packet("");
 return;
 }
diff --git a/gdbstub/system.c b/gdbstub/system.c
index 8ce79fa88cf..215a2c5dcad 100644
--- a/gdbstub/system.c
+++ b/gdbstub/system.c
@@ -452,8 +452,6 @@ static int phy_memory_mode;
 int gdb_target_memory_rw_debug(CPUState *cpu, hwaddr addr,
uint8_t *buf, int len, bool is_write)
 {
-CPUClass *cc;
-
 if (phy_memory_mode) {
 if (is_write) {
 cpu_physical_memory_write(addr, buf, len);
@@ -463,9 +461,8 @@ int gdb_target_memory_rw_debug(CPUState *cpu, hwaddr addr,
 return 0;
 }
 
-cc = CPU_GET_CLASS(cpu);
-if (cc->memory_rw_debug) {
-return cc->memory_rw_debug(cpu, addr, buf, len, is_write);
+if (cpu->cc->memory_rw_debug) {
+return cpu->cc->memory_rw_debug(cpu, addr, buf, len, is_write);
 }
 
 return cpu_memory_rw_debug(cpu, addr, buf, len, is_write);
diff --git a/gdbstub/user-target.c b/gdbstub/user-target.c
index 22bf4008c0f..355b1901b4f 100644
--- a/gdbstub/user-target.c
+++ b/gdbstub/user-target.c
@@ -233,10 +233,8 @@ void gdb_handle_query_offsets(GArray *params, void 
*user_ctx)
 static inline int target_memory_rw_debug(CPUS

[PATCH v2 00/10] cpus: Prefer cached CpuClass over CPU_GET_CLASS() macro

2025-01-22 Thread Philippe Mathieu-Daudé
Missing review: 1 & 2

v1 cover:
Use cached CPUState::cc to get CPUClass.
Main rationale is overall code style.

Philippe Mathieu-Daudé (10):
  hw/core/generic-loader: Do not open-code cpu_set_pc()
  gdbstub: Clarify no more than @gdb_num_core_regs can be accessed
  cpus: Cache CPUClass early in instance_init() handler
  cpus: Prefer cached CpuClass over CPU_GET_CLASS() macro
  accel: Prefer cached CpuClass over CPU_GET_CLASS() macro
  user: Prefer cached CpuClass over CPU_GET_CLASS() macro
  disas: Prefer cached CpuClass over CPU_GET_CLASS() macro
  gdbstub: Prefer cached CpuClass over CPU_GET_CLASS() macro
  hw/acpi: Prefer cached CpuClass over CPU_GET_CLASS() macro
  target/arm: Prefer cached CpuClass over CPU_GET_CLASS() macro

 include/hw/core/cpu.h  | 12 
 linux-user/alpha/target_proc.h |  2 +-
 accel/accel-target.c   | 12 
 accel/tcg/tcg-accel-ops.c  |  3 +-
 accel/tcg/translate-all.c  |  2 +-
 accel/tcg/watchpoint.c |  9 +++---
 bsd-user/signal.c  |  4 +--
 cpu-common.c   | 10 +++
 cpu-target.c   |  9 ++
 disas/disas-common.c   |  5 ++--
 gdbstub/gdbstub.c  | 26 ++--
 gdbstub/system.c   |  7 ++---
 gdbstub/user-target.c  |  6 ++--
 gdbstub/user.c |  7 ++---
 hw/acpi/cpu.c  |  4 +--
 hw/acpi/cpu_hotplug.c  |  3 +-
 hw/core/cpu-common.c   | 16 +-
 hw/core/cpu-system.c   | 55 --
 hw/core/generic-loader.c   |  5 +---
 linux-user/signal.c|  4 +--
 target/arm/cpu.c   |  3 +-
 target/arm/tcg/cpu-v7m.c   |  3 +-
 target/microblaze/gdbstub.c|  5 
 target/openrisc/gdbstub.c  |  5 
 24 files changed, 76 insertions(+), 141 deletions(-)

-- 
2.47.1




Re: [PATCH 10/10] rust: vmstate: make order of parameters consistent in vmstate_clock

2025-01-22 Thread Philippe Mathieu-Daudé

On 17/1/25 10:00, Paolo Bonzini wrote:

Place struct_name before field_name, similar to offset_of.

Signed-off-by: Paolo Bonzini 
---
  rust/hw/char/pl011/src/device_class.rs | 2 +-
  rust/qemu-api/src/vmstate.rs   | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH] hw/boards: Convert MachineClass bitfields to boolean

2025-01-22 Thread Daniel P . Berrangé
On Wed, Jan 22, 2025 at 11:32:23AM +0100, Philippe Mathieu-Daudé wrote:
> As Daniel mentioned:
> 
>  "The number of instances of MachineClass is not large enough
>   that we save a useful amount of memory through bitfields."
> 
> Also, see recent commit ecbf3567e21 ("docs/devel/style: add a
> section about bitfield, and disallow them for packed structures").

Also developers incorrectly think these are already booleans:

$ git grep -E '\b(no_parallel|no_serial|no_floppy|no_cdrom|no_sdcard|pci_allow_0_address|legacy_fw_cfg_order)\b' \
    | grep -E '(true|false)'
hw/arm/sbsa-ref.c:mc->pci_allow_0_address = true;
hw/arm/virt.c:mc->pci_allow_0_address = true;
hw/arm/xlnx-versal-virt.c:mc->no_cdrom = true;
hw/m68k/next-cube.c:mc->no_cdrom = true;
hw/ppc/spapr.c:mc->pci_allow_0_address = true;
hw/riscv/virt.c:mc->pci_allow_0_address = true;

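For readers skimming the archive, the conversion itself is mechanical --
roughly the following shape (an illustrative C sketch, not the actual
hunks from the patch):

    #include <stdbool.h>

    /* before: one-bit bitfields that are really used as booleans */
    struct MachineClassBefore {
        unsigned no_cdrom:1;
        unsigned pci_allow_0_address:1;
    };

    /* after: plain bools, matching how boards already assign true/false */
    struct MachineClassAfter {
        bool no_cdrom;
        bool pci_allow_0_address;
    };
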
> 
> Convert the MachineClass bitfields used as boolean as real ones.
> 
> Suggested-by: Daniel P. Berrangé 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  include/hw/boards.h| 14 +++---
>  hw/arm/aspeed.c|  6 +++---
>  hw/arm/fby35.c |  4 ++--
>  hw/arm/npcm7xx_boards.c|  6 +++---
>  hw/arm/raspi.c |  6 +++---
>  hw/arm/sbsa-ref.c  |  2 +-
>  hw/arm/virt.c  |  2 +-
>  hw/arm/xilinx_zynq.c   |  2 +-
>  hw/avr/arduino.c   |  6 +++---
>  hw/core/null-machine.c | 10 +-
>  hw/i386/microvm.c  |  2 +-
>  hw/i386/pc_piix.c  |  2 +-
>  hw/i386/pc_q35.c   |  4 ++--
>  hw/loongarch/virt.c|  2 +-
>  hw/m68k/virt.c |  6 +++---
>  hw/ppc/pnv.c   |  2 +-
>  hw/ppc/spapr.c |  2 +-
>  hw/riscv/virt.c|  2 +-
>  hw/s390x/s390-virtio-ccw.c |  8 
>  hw/xtensa/sim.c|  2 +-
>  20 files changed, 45 insertions(+), 45 deletions(-)

Reviewed-by: Daniel P. Berrangé 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v7 2/4] chardev/char-hub: implement backend chardev aggregator

2025-01-22 Thread Alex Bennée
Roman Penyaev  writes:

> This patch implements a new chardev backend `hub` device, which
> aggregates input from multiple backend devices and forwards it to a
> single frontend device. Additionally, `hub` device takes the output
> from the frontend device and sends it back to all the connected
> backend devices. This allows for seamless interaction between
> different backend devices and a single frontend interface.
>
> The idea of the change is trivial: keep list of backend devices
> (up to 4), init them on demand and forward data buffer back and
> forth.
>
> The following is QEMU command line example:
>
>-chardev pty,path=/tmp/pty,id=pty0 \
>-chardev vc,id=vc0 \
>-chardev hub,id=hub0,chardevs.0=pty0,chardevs.1=vc0 \
>-device virtconsole,chardev=hub0 \
>-vnc 0.0.0.0:0
>
> Which creates 2 backend devices: text virtual console (`vc0`) and a
> pseudo TTY (`pty0`) connected to the single virtio hvc console with
> the backend aggregator (`hub0`) help. `vc0` renders text to an image,
> which can be shared over the VNC protocol.  `pty0` is a pseudo TTY
> backend which provides bidirectional communication to the virtio hvc
> console.
>

> +static void qemu_chr_open_hub(Chardev *chr,
> + ChardevBackend *backend,
> + bool *be_opened,
> + Error **errp)
> +{
> +ChardevHub *hub = backend->u.hub.data;
> +HubChardev *d = HUB_CHARDEV(chr);
> +strList *list = hub->chardevs;
> +
> +d->be_eagain_ind = -1;
> +
> +if (list == NULL) {
> +error_setg(errp, "hub: 'chardevs' list is not defined");
> +return;
> +}
> +
> +while (list) {
> +Chardev *s;
> +
> +s = qemu_chr_find(list->value);
> +if (s == NULL) {
> +error_setg(errp, "hub: chardev can't be found by id '%s'",
> +   list->value);
> +return;
> +}
> +if (CHARDEV_IS_HUB(s) || CHARDEV_IS_MUX(s)) {
> +error_setg(errp, "hub: multiplexers and hub devices can't be "
> +   "stacked, check chardev '%s', chardev should not "
> +   "be a hub device or have 'mux=on' enabled",
> +   list->value);
> +return;

So I was looking at this to see if I could implement what I wanted which
was a tee-like copy of a serial port output while maintaining the C-a
support of the mux.

Normally I just use the shortcut -serial mon:stdio

However that form is a special case so I tried the following and ran
into the above:

  -chardev stdio,mux=on,id=char0 \
  -chardev file,path=console.log,id=clog  \
  -mon chardev=char0,mode=readline \
  -chardev hub,id=hub0,chardevs.0=char0,chardevs.1=clog

Giving:
  qemu-system-aarch64: -chardev hub,id=hub0,chardevs.0=char0,chardevs.1=clog:
  hub: multiplexers and hub devices can't be stacked, check chardev 'char0',
  chardev should not be a hub device or have 'mux=on' enabled
  
So what stops this sort of chain?

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH v3 2/3] scripts: validate SPDX license choices

2025-01-22 Thread Peter Maydell
On Fri, 17 Jan 2025 at 12:42, Daniel P. Berrangé  wrote:
>
> We expect all new code to be contributed with the "GPL-2.0-or-later"
> license tag. Divergence is permitted if the new file is derived from
> pre-existing code under a different license, whether from elsewhere
> in QEMU codebase, or outside.
>
> Issue a warning if the declared license is not "GPL-2.0-or-later",
> and an error if the license is not one of the handful of the
> expected licenses to prevent unintended proliferation. The warning
> asks users to explain their unusual choice of license in the commit
> message.
>
> Signed-off-by: Daniel P. Berrangé 

Reviewed-by: Peter Maydell 

thanks
-- PMM



Re: [PATCH v3 17/26] hw/arm/virt: Reserve one bit of guest-physical address for RME

2025-01-22 Thread Jean-Philippe Brucker
Hi Gavin,

On Fri, Dec 13, 2024 at 10:03:08PM +1000, Gavin Shan wrote:
> Hi Jean,
> 
> On 11/26/24 5:56 AM, Jean-Philippe Brucker wrote:
> > When RME is enabled, the upper GPA bit is used to distinguish protected
> > from unprotected addresses. Reserve it when setting up the guest memory
> > map.
> > 
> > Signed-off-by: Jean-Philippe Brucker 
> > ---
> >   hw/arm/virt.c | 14 --
> >   1 file changed, 12 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index 9836dfbdfb..eb94997914 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -3035,14 +3035,24 @@ static int virt_kvm_type(MachineState *ms, const 
> > char *type_str)
> >   VirtMachineState *vms = VIRT_MACHINE(ms);
> >   int rme_vm_type = kvm_arm_rme_vm_type(ms);
> >   int max_vm_pa_size, requested_pa_size;
> > +int rme_reserve_bit = 0;
> >   bool fixed_ipa;
> > -max_vm_pa_size = kvm_arm_get_max_vm_ipa_size(ms, &fixed_ipa);
> > +if (rme_vm_type) {
> > +/*
> > + * With RME, the upper GPA bit differentiates Realm from NS memory.
> > + * Reserve the upper bit to ensure that highmem devices will fit.
> > + */
> > +rme_reserve_bit = 1;
> > +}
> > +
> > +max_vm_pa_size = kvm_arm_get_max_vm_ipa_size(ms, &fixed_ipa) -
> > + rme_reserve_bit;
> 
> For realm, @max_vm_pa_size is decreased by 1 ...
> 
> >   /* we freeze the memory map to compute the highest gpa */
> >   virt_set_memmap(vms, max_vm_pa_size);
> > -requested_pa_size = 64 - clz64(vms->highest_gpa);
> > +requested_pa_size = 64 - clz64(vms->highest_gpa) + rme_reserve_bit;
> 
> ... For realm, @requested_pa_size is increased by 1, meaning there are two 
> bits in
> the gap.

I think it's a 1-bit gap: max_vm_pa_size is decreased by 1 for the purpose
of memory map calculation, and here we increase by 1 what comes out of
that calculation, for the KVM IPA size setting.

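To make the 1-bit gap concrete with hypothetical numbers: if the host
supports a 40-bit IPA and RME is enabled (rme_reserve_bit = 1), then
max_vm_pa_size = 40 - 1 = 39, so the memory map is laid out within 39
bits. If the resulting highest_gpa then needs all 39 bits,
requested_pa_size = 39 + 1 = 40, i.e. requested_pa_size ==
max_vm_pa_size + 1, which is exactly the case the followup check has to
keep accepting.
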
> 
> >   /*
> >* KVM requires the IPA size to be at least 32 bits.
> 
> One bit instead of two bits seems the correct gap for the followup check?

Yes this check seems wrong for realm, since (requested_pa_size ==
max_vm_pa_size + 1) should be valid in this case. I'll fix this.

Thanks,
Jean


> 
> if (requested_pa_size > max_vm_pa_size) {
> error_report("-m and ,maxmem option values "
>  "require an IPA range (%d bits) larger than "
>  "the one supported by the host (%d bits)",
>  requested_pa_size, max_vm_pa_size);
> return -1;
> }
> 
> Thanks,
> Gavin
> 



Re: [PATCH 03/10] rust: pl011: extract conversion to RegisterOffset

2025-01-22 Thread Paolo Bonzini

On 1/22/25 15:34, Zhao Liu wrote:

On Fri, Jan 17, 2025 at 10:26:50AM +0100, Paolo Bonzini wrote:

Date: Fri, 17 Jan 2025 10:26:50 +0100
From: Paolo Bonzini 
Subject: [PATCH 03/10] rust: pl011: extract conversion to RegisterOffset
X-Mailer: git-send-email 2.47.1

As an added bonus, this also makes the new function return u32 instead
of u64, thus factoring some casts into a single place.

Signed-off-by: Paolo Bonzini 
---
  rust/hw/char/pl011/src/device.rs | 114 +--
  1 file changed, 63 insertions(+), 51 deletions(-)


[snip]


-pub fn read(&mut self, offset: hwaddr, _size: c_uint) -> std::ops::ControlFlow {
+fn regs_read(&mut self, offset: RegisterOffset) -> ControlFlow {
  use RegisterOffset::*;


Can we move this "use" to the start of the file?


I don't think it's a good idea to make the register names visible 
globally...  "use Enum::*" before a match statement is relatively 
common.  For example: https://doc.rust-lang.org/src/std/io/error.rs.html#436

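A tiny self-contained illustration of the "use Enum::* before a match"
pattern, for readers less familiar with it (the enum here is made up and
has nothing to do with the real pl011 register set):

    enum RegisterOffset { DR, RSR, FR }

    fn describe(off: RegisterOffset) -> &'static str {
        // The import is scoped to this function, so the variant names
        // do not leak into the rest of the module.
        use RegisterOffset::*;
        match off {
            DR => "data register",
            RSR => "receive status / error clear register",
            FR => "flag register",
        }
    }

    fn main() {
        for off in [RegisterOffset::DR, RegisterOffset::RSR, RegisterOffset::FR] {
            println!("{}", describe(off));
        }
    }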


+std::ops::ControlFlow::Break(match offset {


std::ops can be omitted now.


Done, and added a patch to get rid of ControlFlow completely.


-Ok(RSR) => {
-self.receive_status_error_clear.reset();
+RSR => {
+self.receive_status_error_clear = 0.into();


Emm, why do we use 0.into() instead of reset() here? It looks like they're
the same.


Fixed.


+pub fn read(&mut self, offset: hwaddr, _size: u32) -> ControlFlow {


Maybe pub(crate)? But both are fine for me :-)


The struct is not public outside the crate, so it doesn't make a 
difference, does it?



Reviewed-by: Zhao Liu 


Thanks, I'll post a quick v2 anyway once you've finished reviewing.

Paolo




Re: [PATCH 04/10] rust: pl011: extract CharBackend receive logic into a separate function

2025-01-22 Thread Paolo Bonzini

On 1/22/25 15:59, Zhao Liu wrote:

  if size > 0 {
  debug_assert!(!buf.is_null());
-state.as_mut().put_fifo(c_uint::from(buf.read_volatile()))


An extra question...here I'm not sure, do we really need read_volatile?


No, the buffer is not guest visible.  It will certainly go away together 
with chardev bindings.


Paolo




Re: [PATCH v2 4/5] hw/arm: enable secure EL2 timers for virt machine

2025-01-22 Thread Alex Bennée
Peter Maydell  writes:

> On Wed, 18 Dec 2024 at 18:15, Alex Bennée  wrote:
>>
>> Signed-off-by: Alex Bennée 
>> Cc: qemu-sta...@nongnu.org
>> ---
>>  hw/arm/virt.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index 333eaf67ea..5e3589dc6a 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -873,6 +873,8 @@ static void create_gic(VirtMachineState *vms, 
>> MemoryRegion *mem)
>>  [GTIMER_HYP]  = ARCH_TIMER_NS_EL2_IRQ,
>>  [GTIMER_SEC]  = ARCH_TIMER_S_EL1_IRQ,
>>  [GTIMER_HYPVIRT] = ARCH_TIMER_NS_EL2_VIRT_IRQ,
>> +[GTIMER_SEC_PEL2] = ARCH_TIMER_S_EL2_IRQ,
>> +[GTIMER_SEC_VEL2] = ARCH_TIMER_S_EL2_VIRT_IRQ,
>>  };
>>
>>  for (unsigned irq = 0; irq < ARRAY_SIZE(timer_irq); irq++) {
>
> Do these timer interrupts have a defined devicetree binding that
> we need to set up in fdt_add_timer_nodes()? How about ACPI?

The DT in the kernel doesn't care (why would it, it never sees the SEL2
timers). The hafnium test case works without it.

I'm not sure where there is a DT specification that describes what
should be there for SEL2.

>
> thanks
> -- PMM

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH 03/10] rust: vmstate: add varray support to vmstate_of!

2025-01-22 Thread Zhao Liu
On Fri, Jan 17, 2025 at 10:00:39AM +0100, Paolo Bonzini wrote:
> Date: Fri, 17 Jan 2025 10:00:39 +0100
> From: Paolo Bonzini 
> Subject: [PATCH 03/10] rust: vmstate: add varray support to vmstate_of!
> X-Mailer: git-send-email 2.47.1
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  rust/qemu-api/src/vmstate.rs | 42 ++--
>  1 file changed, 40 insertions(+), 2 deletions(-)

...

> +/// Internal utility function to retrieve a type's `VMStateFlags` when it
> +/// is used as the element count of a `VMSTATE_VARRAY`; used by
> +/// [`vmstate_of!`](crate::vmstate_of).
> +pub const fn vmstate_varray_flag(_: PhantomData) -> VMStateField {

A typo? It should return VMStateFlags type.

> +T::VARRAY_FLAG
> +}
> +

Reviewed-by: Zhao Liu 




Re: [PATCH 2/2] docs/cpu-features: Update "PAuth" (Pointer Authentication) details

2025-01-22 Thread Kashyap Chamarthy
On Sat, Jan 18, 2025 at 10:04:37AM +, Marc Zyngier wrote:
> On Fri, 17 Jan 2025 19:11:06 +,
> Kashyap Chamarthy  wrote:
> > 
> > PAuth (Pointer Authentication), a security feature in software, is
> > relevant for both KVM and QEMU.  Reflect this fact into the docs:
> > 
> >   - For KVM, `pauth` is a binary, "on" vs "off" option.  The host CPU
> > will choose the cryptographic algorithm.
> > 
> >   - For TCG, however, along with `pauth`, a couple of properties can be
> > controlled -- they are related to cryptographic algorithm choice.
> > 
> > Thanks to Peter Maydell and Marc Zyngier for explaining more about PAuth
> > on IRC (#qemu, OFTC).
> > 
> > Signed-off-by: Kashyap Chamarthy 
> > ---

[...]

> > -TCG vCPU Features
> > -=
> > +"PAuth" (Pointer Authentication)
> > +
> > +
> > +PAuth (Pointer Authentication) is a security feature in software that
> > +was introduced in Armv8.3-A and Armv9.0-A.  It aims to protect against
> 
> nit: given that ARMv9.0 is congruent to ARMv8.5 and therefore has all
> the ARMv8.5 features, mentioning ARMv8.3 should be enough (but I don't
> feel strongly about this). I feel much strongly about the use of
> capital letters, but I live in a distant past... ;-)

Sure, I can keep it to just v8.3.

On capitalization, I don't feel strongly about it, I just followed this
commit[1], which explained that the rebranding changed "ARM" to "Arm":

6fe6d6c9a95 (docs: Be consistent about capitalization of 'Arm',
2020-03-09)

That's why I went with it.  I see you know this by your "distant past"
remark :)  To match the above, I'll keep the capitalization to "Arm".

> > +ROP (return-oriented programming) attacks.
> > +
> > +KVM
> > +---
> > +
> > +``pauth``
> > +
> > +  Enable or disable ``FEAT_Pauth``.  The host silicon will choose the
> > +  cryptographic algorithm.  No other properties can be controlled.
> 
> nit: "choose" is a an odd choice of word. The host implementation
> defines, or even imposes the signature algorithm, as well as the level
> of PAuth support (PAuth, EPAC, PAuth2, FPAC, FPACCOMBINE, ...), some
> of which are mutually exclusive (EPAC and PAuth2 are incompatible).
> 
> Maybe it would be worth capturing some of these details, as this has a
> direct influence on the ability to migrate a VM.

Yeah, I thought about it but I was not sure if it's the right place.  As
you point out, there's a live-migration impact depending on the level of
PAuth support, so mentioning these details will be useful.

I'll come up with something for v2.  Thanks for looking!

-- 
/kashyap




Re: [PATCH 02/10] rust: pl011: hide unnecessarily "pub" items from outside pl011::device

2025-01-22 Thread Zhao Liu
On Fri, Jan 17, 2025 at 10:26:49AM +0100, Paolo Bonzini wrote:
> Date: Fri, 17 Jan 2025 10:26:49 +0100
> From: Paolo Bonzini 
> Subject: [PATCH 02/10] rust: pl011: hide unnecessarily "pub" items from
>  outside pl011::device
> X-Mailer: git-send-email 2.47.1
> 
> The only public interfaces for pl011 are TYPE_PL011 and pl011_create.
> Remove pub from everything else.
> 
> Note: the "allow(dead_code)" is removed later.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  rust/hw/char/pl011/src/device.rs   |  2 +-
>  rust/hw/char/pl011/src/device_class.rs |  2 +-
>  rust/hw/char/pl011/src/lib.rs  | 13 -
>  3 files changed, 10 insertions(+), 7 deletions(-)
> 

Reviewed-by: Zhao Liu 




Re: [PATCH 01/10] rust: pl011: remove unnecessary "extern crate"

2025-01-22 Thread Zhao Liu
On Fri, Jan 17, 2025 at 10:26:48AM +0100, Paolo Bonzini wrote:
> Date: Fri, 17 Jan 2025 10:26:48 +0100
> From: Paolo Bonzini 
> Subject: [PATCH 01/10] rust: pl011: remove unnecessary "extern crate"
> X-Mailer: git-send-email 2.47.1
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  rust/hw/char/pl011/src/lib.rs | 4 
>  1 file changed, 4 deletions(-)

Yes, it's unnecessary from Rust 2018.

Reviewed-by: Zhao Liu 




Re: [PATCH v3 4/4] tests/qtest/migration: add postcopy test with multifd

2025-01-22 Thread Peter Xu
On Wed, Jan 22, 2025 at 01:26:21PM +0530, Prasad Pandit wrote:
> Hi,
> On Tue, 21 Jan 2025 at 21:17, Peter Xu  wrote:
> > https://lore.kernel.org/qemu-devel/ZykJBq7ME5jgSzCA@x1n/
> > Would you please add all the tests mentioned there?
> 
> /x86_64/migration/multifd/file/mapped-ram/
> /x86_64/migration/multifd/tcp/uri/plain/none
> /x86_64/migration/multifd/tcp/plain/cancel
> 
> /x86_64/migration/postcopy/plain
> /x86_64/migration/postcopy/recovery/
> /x86_64/migration/postcopy/preempt/
> 
> * Of the tests you suggested above, I'll try to enable multifd
> channels for 'postcopy/recovery' and 'postcopy/preempt' tests. For the
> 'multifd' tests above, how do we want to modify them? Enable
> 'postcopy' mode for them?

Right, that implies a migration with both features enabled but finished
even before postcopy starts.

We have some tricky paths that may go differently when different feature
enabled, especially when it's relevant to postcopy and multifd.  Adding
these tests could make sure those corner cases got covered.

And btw, some of my above lines are not a single test, but a set of tests.
E.g., "/x86_64/migration/postcopy/recovery/" is not a single test but:

# /x86_64/migration/postcopy/recovery/plain
# /x86_64/migration/postcopy/recovery/tls/psk
# /x86_64/migration/postcopy/recovery/double-failures/handshake
# /x86_64/migration/postcopy/recovery/double-failures/reconnect

Let me list all the tests that is relevant to the two features to be
explicit..

# /x86_64/migration/postcopy/plain
# /x86_64/migration/postcopy/suspend
# /x86_64/migration/postcopy/tls/psk
# /x86_64/migration/postcopy/recovery/plain
# /x86_64/migration/postcopy/recovery/tls/psk
# /x86_64/migration/postcopy/recovery/double-failures/handshake
# /x86_64/migration/postcopy/recovery/double-failures/reconnect
# /x86_64/migration/postcopy/preempt/plain
# /x86_64/migration/postcopy/preempt/tls/psk
# /x86_64/migration/postcopy/preempt/recovery/plain
# /x86_64/migration/postcopy/preempt/recovery/tls/psk
# /x86_64/migration/multifd/tcp/tls/psk/match
# /x86_64/migration/multifd/tcp/tls/psk/mismatch
# /x86_64/migration/multifd/tcp/tls/x509/default-host
# /x86_64/migration/multifd/tcp/tls/x509/override-host
# /x86_64/migration/multifd/tcp/tls/x509/mismatch-host
# /x86_64/migration/multifd/tcp/tls/x509/allow-anon-client
# /x86_64/migration/multifd/tcp/tls/x509/reject-anon-client
# /x86_64/migration/multifd/tcp/plain/zstd
# /x86_64/migration/multifd/tcp/plain/zlib
# /x86_64/migration/multifd/tcp/plain/cancel
# /x86_64/migration/multifd/tcp/plain/zero-page/legacy
# /x86_64/migration/multifd/tcp/plain/zero-page/none
# /x86_64/migration/multifd/tcp/uri/plain/none
# /x86_64/migration/multifd/tcp/channels/plain/none

(I used to reference mapped-ram for multifd, but I just remembered it can't
be enabled with postcopy, so I dropped them)

I believe many of the tests can be avoided, but still below is a list of
minimum tests that I think might still be good to add:

# /x86_64/migration/postcopy/plain
# /x86_64/migration/postcopy/recovery/tls/psk
# /x86_64/migration/postcopy/preempt/plain
# /x86_64/migration/postcopy/preempt/recovery/tls/psk
# /x86_64/migration/multifd/tcp/tls/psk/match
# /x86_64/migration/multifd/tcp/plain/zstd
# /x86_64/migration/multifd/tcp/plain/cancel

I kept almost all the tls-relevant ones because they have the most code path coverage,
and I suppose tls should also be an emphasis in the future for migration in
CoCo environments.  I removed most of the trivial test cases, like postcopy
double failures etc. which can be too hard to trigger in real life.

Feel free to comment on whether you think the list is suitable to you.  If
you want to add some more into the list I'm also ok with.

IMHO we can have a specific path "/x86_64/migration/multifd+postcopy/*" for
all above new tests, that have both features enabled.

Fabiano, you're rethinking the test infra, please comment if you have any
thoughts on above too.

Thanks,

-- 
Peter Xu




Re: [PATCH 9/9] aspeed: Create sd devices only when defaults are enabled

2025-01-22 Thread Philippe Mathieu-Daudé

On 22/1/25 08:09, Cédric Le Goater wrote:

When the -nodefaults option is set, sd devices should not be
automatically created by the machine. Instead they should be defined
on the command line.

Note that it is not currently possible to define which bus an
"sd-card" device is attached to:

   -blockdev node-name=drive0,driver=file,filename=/path/to/file.img \
   -device sd-card,drive=drive0,id=sd0

and the first bus named "sd-bus" will be used.

Signed-off-by: Cédric Le Goater 
---
  hw/arm/aspeed.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)


For SDHCI:
Acked-by: Philippe Mathieu-Daudé 




[PATCH 8/9] iotests: Add filter_qtest()

2025-01-22 Thread Kevin Wolf
The open-coded form of this filter has been copied into enough tests
that it's better to move it into iotests.py.

Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/iotests.py | 4 
 tests/qemu-iotests/041| 4 +---
 tests/qemu-iotests/165| 4 +---
 tests/qemu-iotests/tests/copy-before-write| 3 +--
 tests/qemu-iotests/tests/migrate-bitmaps-test | 7 +++
 5 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 19817c7353..9c9c908983 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -701,6 +701,10 @@ def _filter(_key, value):
 def filter_nbd_exports(output: str) -> str:
 return re.sub(r'((min|opt|max) block): [0-9]+', r'\1: XXX', output)
 
+def filter_qtest(output: str) -> str:
+output = re.sub(r'^\[I \d+\.\d+\] OPENED\n', '', output)
+output = re.sub(r'\n?\[I \+\d+\.\d+\] CLOSED\n?$', '', output)
+return output
 
 Msg = TypeVar('Msg', Dict[str, Any], List[Any], str)
 
diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 98d17b1388..8452845f44 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -1100,10 +1100,8 @@ class TestRepairQuorum(iotests.QMPTestCase):
 
 # Check the full error message now
 self.vm.shutdown()
-log = self.vm.get_log()
-log = re.sub(r'^\[I \d+\.\d+\] OPENED\n', '', log)
+log = iotests.filter_qtest(self.vm.get_log())
 log = re.sub(r'^Formatting.*\n', '', log)
-log = re.sub(r'\n\[I \+\d+\.\d+\] CLOSED\n?$', '', log)
 log = re.sub(r'^%s: ' % os.path.basename(iotests.qemu_prog), '', log)
 
 self.assertEqual(log,
diff --git a/tests/qemu-iotests/165 b/tests/qemu-iotests/165
index b24907a62f..b3b1709d71 100755
--- a/tests/qemu-iotests/165
+++ b/tests/qemu-iotests/165
@@ -82,9 +82,7 @@ class TestPersistentDirtyBitmap(iotests.QMPTestCase):
 self.vm.shutdown()
 
 #catch 'Persistent bitmaps are lost' possible error
-log = self.vm.get_log()
-log = re.sub(r'^\[I \d+\.\d+\] OPENED\n', '', log)
-log = re.sub(r'\[I \+\d+\.\d+\] CLOSED\n?$', '', log)
+log = iotests.filter_qtest(self.vm.get_log())
 if log:
 print(log)
 
diff --git a/tests/qemu-iotests/tests/copy-before-write 
b/tests/qemu-iotests/tests/copy-before-write
index d33bea577d..498c558008 100755
--- a/tests/qemu-iotests/tests/copy-before-write
+++ b/tests/qemu-iotests/tests/copy-before-write
@@ -95,8 +95,7 @@ class TestCbwError(iotests.QMPTestCase):
 
 self.vm.shutdown()
 log = self.vm.get_log()
-log = re.sub(r'^\[I \d+\.\d+\] OPENED\n', '', log)
-log = re.sub(r'\[I \+\d+\.\d+\] CLOSED\n?$', '', log)
+log = iotests.filter_qtest(log)
 log = iotests.filter_qemu_io(log)
 return log
 
diff --git a/tests/qemu-iotests/tests/migrate-bitmaps-test 
b/tests/qemu-iotests/tests/migrate-bitmaps-test
index f98e721e97..8fb4099201 100755
--- a/tests/qemu-iotests/tests/migrate-bitmaps-test
+++ b/tests/qemu-iotests/tests/migrate-bitmaps-test
@@ -122,11 +122,10 @@ class TestDirtyBitmapMigration(iotests.QMPTestCase):
 
 # catch 'Could not reopen qcow2 layer: Bitmap already exists'
 # possible error
-log = self.vm_a.get_log()
-log = re.sub(r'^\[I \d+\.\d+\] OPENED\n', '', log)
-log = re.sub(r'^(wrote .* bytes at offset .*\n.*KiB.*ops.*sec.*\n){3}',
+log = iotests.filter_qtest(self.vm_a.get_log())
+log = re.sub(r'^(wrote .* bytes at offset .*\n'
+ r'.*KiB.*ops.*sec.*\n?){3}',
  '', log)
-log = re.sub(r'\[I \+\d+\.\d+\] CLOSED\n?$', '', log)
 self.assertEqual(log, '')
 
 # test that bitmap is still persistent
-- 
2.48.1




[PATCH v4] hw/i386/cpu: remove default_cpu_version and simplify

2025-01-22 Thread Ani Sinha
commit 0788a56bd1ae3 ("i386: Make unversioned CPU models be aliases")
introduced 'default_cpu_version' for PCMachineClass. This created three
categories of CPU models:
 - Most unversioned CPU models would use version 1 by default.
 - For machines 4.0.1 and older that do not support cpu model aliases, a
   special default_cpu_version value of CPU_VERSION_LEGACY is used.
 - It was thought that future machines would use the latest value of cpu
   versions corresponding to default_cpu_version value of
   CPU_VERSION_LATEST [1].

All pc machines still use the default cpu version of 1 for
unversioned cpu models. CPU_VERSION_LATEST is a moving target and
changes with time. Therefore, if machines use CPU_VERSION_LATEST, it would
mean that over a period of time, for the same machine type, the cpu version
would be different depending on what is latest at that time. This would
break guests even when they use a constant machine type. Therefore, for
pc machines, use of CPU_VERSION_LATEST is not possible. Currently, only
microvms use CPU_VERSION_LATEST.

This change cleans up the complicated logic around default_cpu_version
including getting rid of default_cpu_version property itself. A couple of new
flags are introduced, one for the legacy model for machines 4.0.1 and older
and other for microvms. For older machines, a new pc machine property is
introduced that separates pc machine versions 4.0.1 and older from the newer
machines. 4.0.1 and older machines are scheduled to be deleted towards
end of 2025 since they would be 6 years old by then. At that time, we can
remove all logic around legacy cpus. Microvms are the only machines that
continue to use the latest cpu version. If this changes later, we can
remove all logic around x86_cpu_model_last_version(). Default cpu version
for unversioned cpu models is hardcoded to the value 1 and applies
unconditionally for all pc machine types of version 4.1 and above.

This change also removes all complications around CPU_VERSION_AUTO
including removal of the value itself.

1) See commit dcafd1ef0af227 ("i386: Register versioned CPU models")

CC: imamm...@redhat.com
Signed-off-by: Ani Sinha 
---
 hw/i386/microvm.c |  3 +-
 hw/i386/pc_piix.c |  6 ++--
 hw/i386/pc_q35.c  |  6 ++--
 hw/i386/x86-common.c  |  4 +--
 include/hw/i386/pc.h  | 19 ++--
 include/hw/i386/x86.h |  5 +++-
 target/i386/cpu.c | 69 ++-
 target/i386/cpu.h | 23 ---
 8 files changed, 70 insertions(+), 65 deletions(-)

changelog:
v2: explain in commit log why use of CPU_VERSION_LATEST for machines
is problematic.
v3: fix a bug that broke the pipeline
https://gitlab.com/mstredhat/qemu/-/pipelines/1626171267
when cpu versions are explicitly specified in the command line,
respect that and do not enforce legacy (unversioned) cpu logic.
The pipeline is green now with the fix:
https://gitlab.com/anisinha/qemu/-/pipelines/1626783632
v4: made changes as per Zhao's suggestions.
Pipeline passes https://gitlab.com/anisinha/qemu/-/pipelines/1635829877

diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index a8d354aabe..ffb1b37fe5 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -458,7 +458,8 @@ static void microvm_machine_state_init(MachineState 
*machine)
 
 microvm_memory_init(mms);
 
-x86_cpus_init(x86ms, CPU_VERSION_LATEST);
+x86_cpu_uses_lastest_version();
+x86_cpus_init(x86ms);
 
 microvm_devices_init(mms);
 }
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 04d2957adc..dc684cb011 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -181,7 +181,8 @@ static void pc_init1(MachineState *machine, const char 
*pci_type)
 }
 
 pc_machine_init_sgx_epc(pcms);
-x86_cpus_init(x86ms, pcmc->default_cpu_version);
+
+pc_init_cpus(machine);
 
 if (kvm_enabled()) {
 kvmclock_create(pcmc->kvmclock_create_always);
@@ -457,7 +458,6 @@ static void pc_i440fx_machine_options(MachineClass *m)
 ObjectClass *oc = OBJECT_CLASS(m);
 pcmc->default_south_bridge = TYPE_PIIX3_DEVICE;
 pcmc->pci_root_uid = 0;
-pcmc->default_cpu_version = 1;
 
 m->family = "pc_piix";
 m->desc = "Standard PC (i440FX + PIIX, 1996)";
@@ -669,7 +669,7 @@ static void pc_i440fx_machine_4_0_options(MachineClass *m)
 {
 PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
 pc_i440fx_machine_4_1_options(m);
-pcmc->default_cpu_version = CPU_VERSION_LEGACY;
+pcmc->no_versioned_cpu_model = true;
 compat_props_add(m->compat_props, hw_compat_4_0, hw_compat_4_0_len);
 compat_props_add(m->compat_props, pc_compat_4_0, pc_compat_4_0_len);
 }
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 77536dd697..045b05da64 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -187,7 +187,8 @@ static void pc_q35_init(MachineState *machine)
 }
 
 pc_machine_init_sgx_epc(pcms);
-x86_cpus_init(x86ms, pcmc->default_cpu_version);
+
+pc_init_cpus(machine);
 
 if (kvm_enabled()) {
 kvmc

[PATCH v2 0/2] tests/functional: Fix broken decorators

2025-01-22 Thread Thomas Huth
Many of the new decorators of the functional tests don't work as
expected (and simply always allow to run the tests). Let's fix them!

v2:
- Use importlib.import_module() to check whether we can import a module
- Split the import check into a separate patch

Thomas Huth (2):
  tests/functional/qemu_test/decorators: Fix bad check for imports
  tests/functional: Fix broken decorators with lambda functions

 tests/functional/qemu_test/decorators.py | 45 
 1 file changed, 22 insertions(+), 23 deletions(-)

-- 
2.48.1




[PATCH v2 2/2] tests/functional: Fix broken decorators with lambda functions

2025-01-22 Thread Thomas Huth
The decorators that use a lambda function are currently broken
and do not properly skip the test if the condition is not met.
Using "return skipUnless(lambda: ...)" does not work as expected.
To fix it, rewrite the decorators without lambda, it's simpler
that way anyway.
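
(Why it silently never skips: unittest's skipUnless() evaluates the
truthiness of its first argument immediately, and a lambda object is
always truthy, so the condition wrapped inside it is never called.  A
minimal standalone illustration, not part of this patch:)

    from unittest import TestCase, main, skipUnless

    class Demo(TestCase):
        # Broken: the lambda object is truthy, so this test always runs.
        @skipUnless(lambda: False, "never actually skipped")
        def test_lambda_condition(self):
            self.assertTrue(True)

        # Works: evaluate the condition first and pass a plain bool.
        @skipUnless(False, "always skipped")
        def test_bool_condition(self):
            self.assertTrue(True)

    if __name__ == "__main__":
        main()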

Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Thomas Huth 
---
 tests/functional/qemu_test/decorators.py | 44 +++-
 1 file changed, 21 insertions(+), 23 deletions(-)

diff --git a/tests/functional/qemu_test/decorators.py 
b/tests/functional/qemu_test/decorators.py
index 08f58f6b40..3d9c02fd59 100644
--- a/tests/functional/qemu_test/decorators.py
+++ b/tests/functional/qemu_test/decorators.py
@@ -17,15 +17,14 @@
   @skipIfMissingCommands("mkisofs", "losetup")
 '''
 def skipIfMissingCommands(*args):
-def has_cmds(cmdlist):
-for cmd in cmdlist:
-if not which(cmd):
-return False
-return True
-
-return skipUnless(lambda: has_cmds(args),
-  'required command(s) "%s" not installed' %
-  ", ".join(args))
+has_cmds = True
+for cmd in args:
+ if not which(cmd):
+ has_cmds = False
+ break
+
+return skipUnless(has_cmds, 'required command(s) "%s" not installed' %
+", ".join(args))
 
 '''
 Decorator to skip execution of a test if the current
@@ -36,9 +35,9 @@ def has_cmds(cmdlist):
   @skipIfNotMachine("x86_64", "aarch64")
 '''
 def skipIfNotMachine(*args):
-return skipUnless(lambda: platform.machine() in args,
-'not running on one of the required machine(s) "%s"' %
-", ".join(args))
+return skipUnless(platform.machine() in args,
+  'not running on one of the required machine(s) "%s"' %
+  ", ".join(args))
 
 '''
 Decorator to skip execution of flaky tests, unless
@@ -95,14 +94,13 @@ def skipBigDataTest():
   @skipIfMissingImports("numpy", "cv2")
 '''
 def skipIfMissingImports(*args):
-def has_imports(importlist):
-for impname in importlist:
-try:
-importlib.import_module(impname)
-except ImportError:
-return False
-return True
-
-return skipUnless(lambda: has_imports(args),
-  'required import(s) "%s" not installed' %
-  ", ".join(args))
+has_imports = True
+for impname in args:
+try:
+importlib.import_module(impname)
+except ImportError:
+has_imports = False
+break
+
+return skipUnless(has_imports, 'required import(s) "%s" not installed' %
+   ", ".join(args))
-- 
2.48.1




[PATCH v2 1/2] tests/functional/qemu_test/decorators: Fix bad check for imports

2025-01-22 Thread Thomas Huth
skipIfMissingImports should use importlib.import_module() for checking
whether a module with the name stored in the "impname" variable is
available or not, otherwise the code tries to import a module with
the name "impname" instead.
(This bug hasn't been noticed before since there is another issue
with this decorator that will be fixed by the next patch)
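
(The difference in isolation -- a tiny standalone illustration, not part
of the patch:)

    import importlib

    impname = "json"
    # Imports the module named by the *value* of impname (here: json).
    mod = importlib.import_module(impname)
    print(mod.dumps({"ok": True}))
    # By contrast, "import impname" would look for a module literally
    # called "impname" and raise ModuleNotFoundError.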

Suggested-by: Daniel P. Berrangé 
Signed-off-by: Thomas Huth 
---
 tests/functional/qemu_test/decorators.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/functional/qemu_test/decorators.py 
b/tests/functional/qemu_test/decorators.py
index df088bc090..08f58f6b40 100644
--- a/tests/functional/qemu_test/decorators.py
+++ b/tests/functional/qemu_test/decorators.py
@@ -2,6 +2,7 @@
 #
 # Decorators useful in functional tests
 
+import importlib
 import os
 import platform
 from unittest import skipUnless
@@ -97,7 +98,7 @@ def skipIfMissingImports(*args):
 def has_imports(importlist):
 for impname in importlist:
 try:
-import impname
+importlib.import_module(impname)
 except ImportError:
 return False
 return True
-- 
2.48.1




Re: [PATCH] vhost-user: Silence unsupported VHOST_USER_PROTOCOL_F_RARP error

2025-01-22 Thread Stefano Garzarella

On Tue, Jan 21, 2025 at 11:00:29AM +0100, Laurent Vivier wrote:

In vhost_user_receive() if vhost_net_notify_migration_done() reports
an error we display on the console:

 Vhost user backend fails to broadcast fake RARP

This message can be useful if there is a problem to execute
VHOST_USER_SEND_RARP but it is useless if the backend doesn't
support VHOST_USER_PROTOCOL_F_RARP.

Don't report the error if vhost_net_notify_migration_done()
returns -ENOTSUP (from vhost_user_migration_done())

Update vhost_net-stub.c to return -ENOTSUP too.

Signed-off-by: Laurent Vivier 
---
hw/net/vhost_net-stub.c | 2 +-
net/vhost-user.c| 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/net/vhost_net-stub.c b/hw/net/vhost_net-stub.c
index 72df6d757e4d..875cd6c2b9c8 100644
--- a/hw/net/vhost_net-stub.c
+++ b/hw/net/vhost_net-stub.c
@@ -93,7 +93,7 @@ void vhost_net_config_mask(VHostNetState *net, VirtIODevice 
*dev, bool mask)

int vhost_net_notify_migration_done(struct vhost_net *net, char* mac_addr)
{
-return -1;
+return -ENOTSUP;
}

VHostNetState *get_vhost_net(NetClientState *nc)
diff --git a/net/vhost-user.c b/net/vhost-user.c
index 1218e838..636fff8a84a2 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -146,7 +146,7 @@ static ssize_t vhost_user_receive(NetClientState *nc, const 
uint8_t *buf,

r = vhost_net_notify_migration_done(s->vhost_net, mac_addr);

-if ((r != 0) && (display_rarp_failure)) {
+if ((r != 0) && (r != -ENOTSUP) && (display_rarp_failure)) {
fprintf(stderr,
"Vhost user backend fails to broadcast fake RARP\n");
fflush(stderr);
--
2.47.1



IIUC the message was there since the introduction about 10 years ago
from commit 3e866365e1 ("vhost user: add rarp sending after live
migration for legacy guest"). IIUC -ENOTSUP is returned when both F_RARP
and F_GUEST_ANNOUNCE are not negotiated.

That said, I honestly don't know what F_RARP or F_GUEST_ANNOUNCE is for,
but my understanding is that the message was to notify that the
migration was finished (reading that commit).

If neither feature is supported, could this be a problem for the user
and that's why we were printing the message?

Thanks,
Stefano




Re: [PATCH 2/2] improve precision of throttle_pct

2025-01-22 Thread fuqiang wang



On 2025/1/21 20:25, Yong Huang wrote:

On Tue, Dec 31, 2024 at 9:56 AM fuqiang wang  wrote:


With the current algorithm, precision is lost in the division operations.
(Even though a double cast is used in the function, it is applied only after
the integer division has already truncated the result, so it has no effect.)
Refer to the results of the test program from [1]: when there is a large
discrepancy between current and quota, there is a noticeable error.




The main derivation of the new algorithm is (for current > quota):

    quota = (ring_full_time_us * current) / (ring_full_time_us + throttle_us)

which, solved for throttle_us, gives:

    throttle_us = ((current - quota) / quota) * ring_full_time_us




In the actual code, first calculate (current - quota) / quota and store the
intermediate result as a double. Then, multiply it by ring_full_time_us.
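
A quick numeric illustration of the truncation problem (just a sketch; the
numbers are made up and not taken from the test runs):

    # old: throttle_pct is truncated to an integer before the double cast matters
    current, quota, ring_full_time_us = 1500, 333, 100000

    pct_old = (current - quota) * 100 // current                   # 77 (77.8 truncated)
    throttle_old = ring_full_time_us * pct_old / (100 - pct_old)   # 334782.6...

    # new: keep the ratio as a double
    ratio = (current - quota) / quota                              # 3.5045...
    throttle_new = ring_full_time_us * ratio                       # 350450.4...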

Test scenario:
- generate dirty pages program: tests/migration/stress, dirtyrate is
   about 1500MB/s with WP enable.
- dirtyring size : 65536
- dirtylimit: 333

To facilitate testing, merge both the new and old algorithms into the
same code, calculate the difference in throttle_us between them, and
track the value of the next non-linear adjustment after a linear
adjustment.

The test results are as follows:

- throttle_us difference:
   [19003, 24755, 25231, 14630, 25705]

   average: 21864

- next non-linear adjustment":
   [16764, 16368, 16357, 16591, 16347]

   average: 16485

Based on the test results, after merging this patch, the linear
adjustment value will increase, allowing the quota to be reached one
loop earlier.

[1]:
https://github.com/cai-fuqiang/kernel_test/tree/master/dirty_throttle_pct_test



Thanks for this work. This modified algorithm seems ok to me. Could you share
the guestperf test result or other performance tests? Such that we could
observe the improvement directly.

Good suggestion. I will do it in the next patch, but it seems I will
need to wait until after the holiday. Wishing you a Happy New Year!





Signed-off-by: wangfuqiang49 


---

  system/dirtylimit.c | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/system/dirtylimit.c b/system/dirtylimit.c
index c7f663e5b9..25439e8e99 100644
--- a/system/dirtylimit.c
+++ b/system/dirtylimit.c
@@ -281,7 +281,7 @@ static void dirtylimit_set_throttle(CPUState *cpu,
  {
  int64_t ring_full_time_us = 0;
  uint64_t sleep_pct = 0;
-uint64_t throttle_pct = 0;
+double throttle_pct = 0;
  uint64_t throttle_us = 0;
  int64_t throtlle_us_old = cpu->throttle_us_per_full;

@@ -294,14 +294,14 @@ static void dirtylimit_set_throttle(CPUState *cpu,

  if (dirtylimit_need_linear_adjustment(quota, current)) {
  if (quota < current) {
-throttle_pct  = (current - quota) * 100 / current;
+throttle_pct  = (current - quota) / (double)quota;
  throttle_us =
-ring_full_time_us * throttle_pct / (double)(100 -
throttle_pct);
+ring_full_time_us * throttle_pct;
  cpu->throttle_us_per_full += throttle_us;
  } else {
-throttle_pct = (quota - current) * 100 / quota;
+throttle_pct = (quota - current) / (double)current;
  throttle_us =
-ring_full_time_us * throttle_pct / (double)(100 -
throttle_pct);
+ring_full_time_us * throttle_pct;
  cpu->throttle_us_per_full -= throttle_us;
  }

--
2.47.0



Yong






Re: [PATCH v2 1/2] tests/functional/qemu_test/decorators: Fix bad check for imports

2025-01-22 Thread Daniel P . Berrangé
On Wed, Jan 22, 2025 at 02:43:13PM +0100, Thomas Huth wrote:
> skipIfMissingImports should use importlib.import_module() for checking
> whether a module with the name stored in the "impname" variable is
> available or not, otherwise the code tries to import a module with
> the name "impname" instead.
> (This bug hasn't been noticed before since there is another issue
> with this decorator that will be fixed by the next patch)
> 
> Suggested-by: Daniel P. Berrangé 
> Signed-off-by: Thomas Huth 
> ---
>  tests/functional/qemu_test/decorators.py | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v2 0/3] scripts/qemu-gdb: Make coroutine dumps to work with coredumps

2025-01-22 Thread Kevin Wolf
Am 12.12.2024 um 21:47 hat Peter Xu geschrieben:
> v1: https://lore.kernel.org/r/20241211201739.1380222-1-pet...@redhat.com
> 
> Changelog: in previous v1, I got a wrong cut-off accident in commit
> message, which is now fixed (along with some small touchup elsewhere).
> When at it, I also tried to make it look even better to be as close as gdb
> bt, so it looks like this now:
> 
>   Coroutine at 0x7f9f4c57c748:
>   #0  0x55ae6c0dc9a8 in qemu_coroutine_switch<+120> () at 
> ../util/coroutine-ucontext.c:321
>   #1  0x55ae6c0da2f8 in qemu_aio_coroutine_enter<+356> () at 
> ../util/qemu-coroutine.c:293
>   #2  0x55ae6c0da3f1 in qemu_coroutine_enter<+34> () at 
> ../util/qemu-coroutine.c:316
>   #3  0x55ae6baf775e in migration_incoming_process<+43> () at 
> ../migration/migration.c:876
>   #4  0x55ae6baf7ab4 in migration_ioc_process_incoming<+490> () at 
> ../migration/migration.c:1008
>   #5  0x55ae6bae9ae7 in migration_channel_process_incoming<+145> () at 
> ../migration/channel.c:45
>   #6  0x55ae6bb18e35 in socket_accept_incoming_migration<+118> () at 
> ../migration/socket.c:132
>   #7  0x55ae6be939ef in qio_net_listener_channel_func<+131> () at 
> ../io/net-listener.c:54
>   #8  0x55ae6be8ce1a in qio_channel_fd_source_dispatch<+78> () at 
> ../io/channel-watch.c:84
>   #9  0x7f9f5b26728c in g_main_context_dispatch_unlocked.lto_priv<+315> ()
>   #10  0x7f9f5b267555 in g_main_context_dispatch<+36> ()
>   #11  0x55ae6c0d91a7 in glib_pollfds_poll<+90> () at ../util/main-loop.c:287
>   #12  0x55ae6c0d9235 in os_host_main_loop_wait<+128> () at 
> ../util/main-loop.c:310
>   #13  0x55ae6c0d9364 in main_loop_wait<+203> () at ../util/main-loop.c:589
>   #14  0x55ae6bac212a in qemu_main_loop<+41> () at ../system/runstate.c:835
>   #15  0x55ae6bfdf522 in qemu_default_main<+19> () at ../system/main.c:37
>   #16  0x55ae6bfdf55f in main<+40> () at ../system/main.c:48
>   #17  0x7f9f59d42248 in __libc_start_call_main<+119> ()
>   #18  0x7f9f59d4230b in __libc_start_main_impl<+138> ()
> 
> Coroutines are used in many cases in block layers. It's also used in live
> migration when on destination side, and it'll be handy to diagnose crashes
> within a coroutine when we want to also know what other coroutines are
> doing.
> 
> This series adds initial support for that, not pretty but it should start
> working.  Since we can't use the trick to modify registers on the fly in
> non-live gdb sessions, we do manual unwinds.
> 
> One thing to mention is there's a similar but more generic solution
> mentioned on the list from Niall:
> 
> https://lore.kernel.org/r/f0ebccca-7a17-4da8-ac4a-71cf6d69a...@mtasv.net
> 
> That adds more dependency on both gdb and qemu in the future, however more
> generic.  So this series is an intermediate quick solution as for now,
> which should work for most older qemu/gdb binaries too.
> 
> Thanks,
> 
> Peter Xu (3):
>   scripts/qemu-gdb: Always do full stack dump for python errors
>   scripts/qemu-gdb: Simplify fs_base fetching for coroutines
>   scripts/qemu-gdb: Support coroutine dumps in coredumps

Thanks, applied to the block branch.

Kevin




Re: [RFC v3 1/5] vhost-vdpa: Decouple the IOVA allocator

2025-01-22 Thread Jonah Palmer




On 1/21/25 12:25 PM, Eugenio Perez Martin wrote:

On Tue, Jan 21, 2025 at 3:53 PM Jonah Palmer  wrote:




On 1/16/25 11:44 AM, Eugenio Perez Martin wrote:

On Fri, Jan 10, 2025 at 6:09 PM Jonah Palmer  wrote:


Decouples the IOVA allocator from the full IOVA->HVA tree to support a
SVQ IOVA->HVA tree for host-only memory mappings.

The IOVA allocator still allocates an IOVA range but instead adds this
range to an IOVA-only tree (iova_map) that keeps track of allocated IOVA
ranges for both guest & host-only memory mappings.

A new API function vhost_iova_tree_insert() is also created for adding
IOVA->HVA mappings into the SVQ IOVA->HVA tree, since the allocator is
no longer doing that.



What is the reason for not adding IOVA -> HVA tree on _alloc
automatically? The problematic one is GPA -> HVA, isn't it? Doing this
way we force all the allocations to do the two calls (alloc+insert),
or the trees will be inconsistent.



Ah, I believe you also made a similar comment in RFC v1, saying it
wasn't intuitive for the user to follow up with a
vhost_iova_tree_insert() call afterwards (e.g. in
vhost_vdpa_svq_map_ring() or vhost_vdpa_cvq_map_buf()).

I believe what I ended up doing in RFC v2 was creating separate alloc
functions for host-only memory mapping (e.g. vhost_vdpa_svq_map_ring()
and vhost_vdpa_cvq_map_buf()) and guest-backed memory mapping (e.g.
vhost_vdpa_listener_region_add()).

This way, for host-only memory, the alloc function allocates an
IOVA range (in the IOVA-only tree) and then also inserts the IOVA->HVA
mapping into the SVQ IOVA->HVA tree. Similarly, for guest-backed memory,
we create its own alloc function (e.g. vhost_iova_tree_map_alloc_gpa()),
allocate the IOVA range (in the IOVA-only tree) and then insert the
GPA->IOVA mapping into the GPA->IOVA tree.

This was done so that we didn't have to rely on the user to also call
the insertion function after calling the allocation function.
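
Roughly something like this (only a sketch to illustrate the idea; the
gpa_iova_map member and some details are made up, not the actual RFC v2 code):

    int vhost_iova_tree_map_alloc_gpa(VhostIOVATree *tree, DMAMap *map)
    {
        int r;

        /* Reserve an IOVA range in the IOVA-only allocator tree */
        r = vhost_iova_tree_map_alloc(tree, map);
        if (r != IOVA_OK) {
            return r;
        }

        /* ... and immediately record the GPA->IOVA mapping, so callers
         * don't need a separate insert call afterwards. */
        return iova_tree_insert(tree->gpa_iova_map, map);
    }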

Is this kinda what you're thinking of here?



Right, I think it makes more sense. Do you think differently, maybe I
missed any drawbacks?



No, I totally think this is fine. I can't think of any serious drawbacks.

Will do in the next series!


Signed-off-by: Jonah Palmer 
---
   hw/virtio/vhost-iova-tree.c | 35 +++
   hw/virtio/vhost-iova-tree.h |  1 +
   hw/virtio/vhost-vdpa.c  | 21 -
   net/vhost-vdpa.c| 13 +++--
   4 files changed, 59 insertions(+), 11 deletions(-)

diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
index 3d03395a77..b1cfd17843 100644
--- a/hw/virtio/vhost-iova-tree.c
+++ b/hw/virtio/vhost-iova-tree.c
@@ -28,12 +28,15 @@ struct VhostIOVATree {

   /* IOVA address to qemu memory maps. */
   IOVATree *iova_taddr_map;
+
+/* Allocated IOVA addresses */
+IOVATree *iova_map;
   };

   /**
- * Create a new IOVA tree
+ * Create a new VhostIOVATree
*
- * Returns the new IOVA tree
+ * Returns the new VhostIOVATree
*/
   VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
   {
@@ -44,15 +47,17 @@ VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, 
hwaddr iova_last)
   tree->iova_last = iova_last;

   tree->iova_taddr_map = iova_tree_new();
+tree->iova_map = iova_tree_new();
   return tree;
   }

   /**
- * Delete an iova tree
+ * Delete a VhostIOVATree


Thanks for fixing the doc of new and delete :) Maybe it is better to
put it in an independent patch?



Sure can :)


*/
   void vhost_iova_tree_delete(VhostIOVATree *iova_tree)
   {
   iova_tree_destroy(iova_tree->iova_taddr_map);
+iova_tree_destroy(iova_tree->iova_map);
   g_free(iova_tree);
   }

@@ -94,7 +99,7 @@ int vhost_iova_tree_map_alloc(VhostIOVATree *tree, DMAMap 
*map)
   }

   /* Allocate a node in IOVA address */
-return iova_tree_alloc_map(tree->iova_taddr_map, map, iova_first,
+return iova_tree_alloc_map(tree->iova_map, map, iova_first,
  tree->iova_last);
   }

@@ -107,4 +112,26 @@ int vhost_iova_tree_map_alloc(VhostIOVATree *tree, DMAMap 
*map)
   void vhost_iova_tree_remove(VhostIOVATree *iova_tree, DMAMap map)
   {
   iova_tree_remove(iova_tree->iova_taddr_map, map);
+iova_tree_remove(iova_tree->iova_map, map);
+}
+
+/**
+ * Insert a new mapping to the IOVA->HVA tree
+ *
+ * @tree: The VhostIOVATree
+ * @map: The IOVA->HVA mapping
+ *
+ * Returns:
+ * - IOVA_OK if the map fits in the container
+ * - IOVA_ERR_INVALID if the map does not make sense (e.g. size overflow)
+ * - IOVA_ERR_OVERLAP if the IOVA range overlaps with an existing range
+ */
+int vhost_iova_tree_insert(VhostIOVATree *iova_tree, DMAMap *map)
+{
+if (map->translated_addr + map->size < map->translated_addr ||
+map->perm == IOMMU_NONE) {
+return IOVA_ERR_INVALID;
+}
+
+return iova_tree_insert(iova_tree->iova_taddr_map, map);
   }
diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
inde

Re: [PATCH] vhost-user: Silence unsupported VHOST_USER_PROTOCOL_F_RARP error

2025-01-22 Thread Michael S. Tsirkin
On Wed, Jan 22, 2025 at 05:20:06PM +0100, Stefano Garzarella wrote:
> On Wed, Jan 22, 2025 at 08:59:22AM -0500, Michael S. Tsirkin wrote:
> > On Wed, Jan 22, 2025 at 02:42:14PM +0100, Stefano Garzarella wrote:
> > > On Tue, Jan 21, 2025 at 11:00:29AM +0100, Laurent Vivier wrote:
> > > > In vhost_user_receive() if vhost_net_notify_migration_done() reports
> > > > an error we display on the console:
> > > >
> > > >  Vhost user backend fails to broadcast fake RARP
> > > >
> > > > This message can be useful if there is a problem to execute
> > > > VHOST_USER_SEND_RARP but it is useless if the backend doesn't
> > > > support VHOST_USER_PROTOCOL_F_RARP.
> > > >
> > > > Don't report the error if vhost_net_notify_migration_done()
> > > > returns -ENOTSUP (from vhost_user_migration_done())
> > > >
> > > > Update vhost_net-stub.c to return -ENOTSUP too.
> > > >
> > > > Signed-off-by: Laurent Vivier 
> > > > ---
> > > > hw/net/vhost_net-stub.c | 2 +-
> > > > net/vhost-user.c| 2 +-
> > > > 2 files changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/hw/net/vhost_net-stub.c b/hw/net/vhost_net-stub.c
> > > > index 72df6d757e4d..875cd6c2b9c8 100644
> > > > --- a/hw/net/vhost_net-stub.c
> > > > +++ b/hw/net/vhost_net-stub.c
> > > > @@ -93,7 +93,7 @@ void vhost_net_config_mask(VHostNetState *net, 
> > > > VirtIODevice *dev, bool mask)
> > > >
> > > > int vhost_net_notify_migration_done(struct vhost_net *net, char* 
> > > > mac_addr)
> > > > {
> > > > -return -1;
> > > > +return -ENOTSUP;
> > > > }
> > > >
> > > > VHostNetState *get_vhost_net(NetClientState *nc)
> > > > diff --git a/net/vhost-user.c b/net/vhost-user.c
> > > > index 1218e838..636fff8a84a2 100644
> > > > --- a/net/vhost-user.c
> > > > +++ b/net/vhost-user.c
> > > > @@ -146,7 +146,7 @@ static ssize_t vhost_user_receive(NetClientState 
> > > > *nc, const uint8_t *buf,
> > > >
> > > > r = vhost_net_notify_migration_done(s->vhost_net, mac_addr);
> > > >
> > > > -if ((r != 0) && (display_rarp_failure)) {
> > > > +if ((r != 0) && (r != -ENOTSUP) && (display_rarp_failure)) {
> > > > fprintf(stderr,
> > > > "Vhost user backend fails to broadcast fake 
> > > > RARP\n");
> > > > fflush(stderr);
> > > > --
> > > > 2.47.1
> > > >
> > > 
> > > IIUC the message was there since the introduction about 10 years ago
> > > from commit 3e866365e1 ("vhost user: add rarp sending after live
> > > migration for legacy guest"). IIUC -ENOTSUP is returned when both F_RARP
> > > and F_GUEST_ANNOUNCE are not negotiated.
> > > 
> > > That said, I honestly don't know what F_RARP or F_GUEST_ANNOUNCE is for,
> > 
> > rarp is to have destination host broadcast a message with VM address
> > to update the network. Guest announce is when it will instead
> > ask the guest to do this.
> 
> Okay, thanks for explaining to me.
> So if both features are not negotiated, no one is going to broadcast
> the message, right?
> 
> Could that be a valid reason to print an error message in QEMU?
> 
> To me it might be reasonable because the user might experience some
> network problems, but I'm not a network guy :-)
> 
> Thanks,
> Stefano

reasonable, yes.

> > 
> > 
> > > but my understanding is that the message was to notify that the
> > > migration was finished (reading that commit).
> > > 
> > > If neither feature is supported, could this be a problem for the user
> > > and that's why we were printing the message?
> > > 
> > > Thanks,
> > > Stefano
> > 




Re: [PULL v2 0/9] s390x and test patches 2025-01-21

2025-01-22 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/10.0 for any 
user-visible changes.




Re: [PATCH] vhost-user: Silence unsupported VHOST_USER_PROTOCOL_F_RARP error

2025-01-22 Thread Stefano Garzarella

On Wed, Jan 22, 2025 at 08:59:22AM -0500, Michael S. Tsirkin wrote:

On Wed, Jan 22, 2025 at 02:42:14PM +0100, Stefano Garzarella wrote:

On Tue, Jan 21, 2025 at 11:00:29AM +0100, Laurent Vivier wrote:
> In vhost_user_receive() if vhost_net_notify_migration_done() reports
> an error we display on the console:
>
>  Vhost user backend fails to broadcast fake RARP
>
> This message can be useful if there is a problem to execute
> VHOST_USER_SEND_RARP but it is useless if the backend doesn't
> support VHOST_USER_PROTOCOL_F_RARP.
>
> Don't report the error if vhost_net_notify_migration_done()
> returns -ENOTSUP (from vhost_user_migration_done())
>
> Update vhost_net-stub.c to return -ENOTSUP too.
>
> Signed-off-by: Laurent Vivier 
> ---
> hw/net/vhost_net-stub.c | 2 +-
> net/vhost-user.c| 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/hw/net/vhost_net-stub.c b/hw/net/vhost_net-stub.c
> index 72df6d757e4d..875cd6c2b9c8 100644
> --- a/hw/net/vhost_net-stub.c
> +++ b/hw/net/vhost_net-stub.c
> @@ -93,7 +93,7 @@ void vhost_net_config_mask(VHostNetState *net, VirtIODevice 
*dev, bool mask)
>
> int vhost_net_notify_migration_done(struct vhost_net *net, char* mac_addr)
> {
> -return -1;
> +return -ENOTSUP;
> }
>
> VHostNetState *get_vhost_net(NetClientState *nc)
> diff --git a/net/vhost-user.c b/net/vhost-user.c
> index 1218e838..636fff8a84a2 100644
> --- a/net/vhost-user.c
> +++ b/net/vhost-user.c
> @@ -146,7 +146,7 @@ static ssize_t vhost_user_receive(NetClientState *nc, 
const uint8_t *buf,
>
> r = vhost_net_notify_migration_done(s->vhost_net, mac_addr);
>
> -if ((r != 0) && (display_rarp_failure)) {
> +if ((r != 0) && (r != -ENOTSUP) && (display_rarp_failure)) {
> fprintf(stderr,
> "Vhost user backend fails to broadcast fake RARP\n");
> fflush(stderr);
> --
> 2.47.1
>

IIUC the message was there since the introduction about 10 years ago
from commit 3e866365e1 ("vhost user: add rarp sending after live
migration for legacy guest"). IIUC -ENOTSUP is returned when both F_RARP
and F_GUEST_ANNOUNCE are not negotiated.

That said, I honestly don't know what F_RARP or F_GUEST_ANNOUNCE is for,


rarp is to have destination host broadcast a message with VM address
to update the network. Guest announce is when it will instead
ask the guest to do this.


Okay, thanks for explaining to me.
So if both features are not negotiated, no one is going to broadcast
the message, right?

Could that be a valid reason to print an error message in QEMU?

To me it might be reasonable because the user might experience some
network problems, but I'm not a network guy :-)

Thanks,
Stefano





but my understanding is that the message was to notify that the
migration was finished (reading that commit).

If neither feature is supported, could this be a problem for the user
and that's why we were printing the message?

Thanks,
Stefano







Re: [PATCH 05/10] rust: pl011: pull interrupt updates out of read/write ops

2025-01-22 Thread Zhao Liu
On Fri, Jan 17, 2025 at 10:26:52AM +0100, Paolo Bonzini wrote:
> Date: Fri, 17 Jan 2025 10:26:52 +0100
> From: Paolo Bonzini 
> Subject: [PATCH 05/10] rust: pl011: pull interrupt updates out of
>  read/write ops
> X-Mailer: git-send-email 2.47.1
> 
> qemu_irqs are not part of the vmstate, therefore they will remain in
> PL011State.  Update them if needed after regs_read()/regs_write().
> 
> Apply #[must_use] to functions that return whether the interrupt state
> could have changed, so that it's harder to forget the call to update().
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  rust/hw/char/pl011/src/device.rs | 68 ++--
>  1 file changed, 38 insertions(+), 30 deletions(-)
> 

[snip]

>  
>  pub fn event(&mut self, event: QEMUChrEvent) {
>  if event == bindings::QEMUChrEvent::CHR_EVENT_BREAK && 
> !self.loopback_enabled() {
> -self.put_fifo(registers::Data::BREAK.into());
> +let update = self.put_fifo(registers::Data::BREAK.into());

We can omit this `update` variable.
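I.e., something along these lines (untested sketch):

    if self.put_fifo(registers::Data::BREAK.into()) {
        self.update();
    }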

> +if update {
> +self.update();
> +}
>  }
>  }

Nice refactoring!

Reviewed-by: Zhao Liu 




Re: [PATCH 02/10] rust: vmstate: implement VMState for non-leaf types

2025-01-22 Thread Zhao Liu
On Fri, Jan 17, 2025 at 10:00:38AM +0100, Paolo Bonzini wrote:
> Date: Fri, 17 Jan 2025 10:00:38 +0100
> From: Paolo Bonzini 
> Subject: [PATCH 02/10] rust: vmstate: implement VMState for non-leaf types
> X-Mailer: git-send-email 2.47.1
> 
> Arrays, pointers and cells use a VMStateField that is based on that
> for the inner type.  The implementation therefore delegates to the
> VMState implementation of the inner type.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  rust/qemu-api/src/vmstate.rs | 79 +++-
>  1 file changed, 78 insertions(+), 1 deletion(-)
> 

Reviewed-by: Zhao Liu 




Re: [PATCH v2] hw/misc: i2c-echo: add tracing

2025-01-22 Thread Corey Minyard
On Tue, Jan 21, 2025 at 10:59:34AM +, Titus Rwantare wrote:
> This has been useful when debugging and unsure if the guest is
> generating i2c traffic.

Acked-by: Corey Minyard 

> 
> Signed-off-by: Titus Rwantare 
> ---
>  hw/misc/i2c-echo.c   | 8 
>  hw/misc/trace-events | 5 +
>  2 files changed, 13 insertions(+)
> 
> diff --git a/hw/misc/i2c-echo.c b/hw/misc/i2c-echo.c
> index 5ae3d0817e..65d10029dc 100644
> --- a/hw/misc/i2c-echo.c
> +++ b/hw/misc/i2c-echo.c
> @@ -13,6 +13,7 @@
>  #include "qemu/main-loop.h"
>  #include "block/aio.h"
>  #include "hw/i2c/i2c.h"
> +#include "trace.h"
>  
>  #define TYPE_I2C_ECHO "i2c-echo"
>  OBJECT_DECLARE_SIMPLE_TYPE(I2CEchoState, I2C_ECHO)
> @@ -80,11 +81,13 @@ static int i2c_echo_event(I2CSlave *s, enum i2c_event 
> event)
>  case I2C_START_RECV:
>  state->pos = 0;
>  
> +trace_i2c_echo_event(DEVICE(s)->canonical_path, "I2C_START_RECV");
>  break;
>  
>  case I2C_START_SEND:
>  state->pos = 0;
>  
> +trace_i2c_echo_event(DEVICE(s)->canonical_path, "I2C_START_SEND");
>  break;
>  
>  case I2C_FINISH:
> @@ -92,12 +95,15 @@ static int i2c_echo_event(I2CSlave *s, enum i2c_event 
> event)
>  state->state = I2C_ECHO_STATE_START_SEND;
>  i2c_bus_master(state->bus, state->bh);
>  
> +trace_i2c_echo_event(DEVICE(s)->canonical_path, "I2C_FINISH");
>  break;
>  
>  case I2C_NACK:
> +trace_i2c_echo_event(DEVICE(s)->canonical_path, "I2C_NACK");
>  break;
>  
>  default:
> +trace_i2c_echo_event(DEVICE(s)->canonical_path, "UNHANDLED");
>  return -1;
>  }
>  
> @@ -112,6 +118,7 @@ static uint8_t i2c_echo_recv(I2CSlave *s)
>  return 0xff;
>  }
>  
> +trace_i2c_echo_recv(DEVICE(s)->canonical_path, state->data[state->pos]);
>  return state->data[state->pos++];
>  }
>  
> @@ -119,6 +126,7 @@ static int i2c_echo_send(I2CSlave *s, uint8_t data)
>  {
>  I2CEchoState *state = I2C_ECHO(s);
>  
> +trace_i2c_echo_send(DEVICE(s)->canonical_path, data);
>  if (state->pos > 2) {
>  return -1;
>  }
> diff --git a/hw/misc/trace-events b/hw/misc/trace-events
> index cf1abe6928..d58dca2389 100644
> --- a/hw/misc/trace-events
> +++ b/hw/misc/trace-events
> @@ -390,3 +390,8 @@ ivshmem_flat_read_write_mmr_invalid(uint64_t addr_offset) 
> "No ivshmem register m
>  ivshmem_flat_interrupt_invalid_peer(uint16_t peer_id) "Can't interrupt 
> non-existing peer %u"
>  ivshmem_flat_write_mmr(uint64_t addr_offset) "Write access at offset %"PRIu64
>  ivshmem_flat_interrupt_peer(uint16_t peer_id, uint16_t vector_id) 
> "Interrupting peer ID %u, vector %u..."
> +
> +#i2c-echo.c
> +i2c_echo_event(const char *id, const char *event) "%s: %s"
> +i2c_echo_recv(const char *id, uint8_t data) "%s: recv 0x%" PRIx8
> +i2c_echo_send(const char *id, uint8_t data) "%s: send 0x%" PRIx8
> -- 
> 2.48.0.rc2.279.g1de40edade-goog
> 



Re: [PATCH 10/10] rust: vmstate: make order of parameters consistent in vmstate_clock

2025-01-22 Thread Zhao Liu
On Fri, Jan 17, 2025 at 10:00:46AM +0100, Paolo Bonzini wrote:
> Date: Fri, 17 Jan 2025 10:00:46 +0100
> From: Paolo Bonzini 
> Subject: [PATCH 10/10] rust: vmstate: make order of parameters consistent
>  in vmstate_clock
> X-Mailer: git-send-email 2.47.1
> 
> Place struct_name before field_name, similar to offset_of.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  rust/hw/char/pl011/src/device_class.rs | 2 +-
>  rust/qemu-api/src/vmstate.rs   | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Zhao Liu 




Re: [PATCH 07/10] rust: qemu_api: add vmstate_struct

2025-01-22 Thread Zhao Liu
On Fri, Jan 17, 2025 at 10:00:43AM +0100, Paolo Bonzini wrote:
> Date: Fri, 17 Jan 2025 10:00:43 +0100
> From: Paolo Bonzini 
> Subject: [PATCH 07/10] rust: qemu_api: add vmstate_struct
> X-Mailer: git-send-email 2.47.1
> 
> It is not type safe, but it's the best that can be done without
> const_refs_static.  It can also be used with BqlCell and BqlRefCell.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  rust/qemu-api/src/vmstate.rs | 33 +
>  1 file changed, 33 insertions(+)

...

> +#[doc(alias = "VMSTATE_STRUCT")]
> +#[macro_export]
> +macro_rules! vmstate_struct {
> +($struct_name:ty, $field_name:ident $([0 .. $num:ident $(* 
> $factor:expr)?])?, $vmsd:expr, $type:ty $(,)?) => {
> +$crate::bindings::VMStateField {
> +name: ::core::concat!(::core::stringify!($field_name), "\0")
> +.as_bytes()
> +.as_ptr() as *const ::std::os::raw::c_char,
> +$(.num_offset: $crate::offset_of!($struct_name, $num),)?
> +offset: {
> +$crate::assert_field_type!($struct_name, $field_name, $type);
> +$crate::offset_of!($struct_name, $field_name)
> +},
> +size: ::core::mem::size_of::<$type>(),
> +flags: $crate::bindings::VMStateFlags::VMS_STRUCT,
> +vmsd: unsafe { $vmsd },

Yes, this `unsafe` is fitting.

> +..$crate::zeroable::Zeroable::ZERO $(
> +.with_varray_flag($crate::call_func_with_field!(
> +$crate::vmstate::vmstate_varray_flag,
> +$struct_name,
> +$num))
> +   $(.with_varray_multiply($factor))?)?
> +}
> +};
> +}
> +

Reviewed-by: Zhao Liu 




[PATCH 01/11] acpi/ghes: Prepare to support multiple sources on ghes

2025-01-22 Thread Mauro Carvalho Chehab
The current code is actually dependent on having just one error
structure with a single source.

As the number of sources should be arch-dependent, since it will depend on
what kind of synchronous/asynchronous notifications will exist, change
the logic to dynamically build the table.

Yet, for proper support, we need to get the number of sources by
reading it from the HEST table. However, the BIOS currently doesn't
store a pointer to it.

For now just change the logic at table build time, while enforcing that
it will behave like before with a single source ID.

A future patch will add a HEST table bios pointer and change the logic
at acpi_ghes_record_errors() to dynamically use the new size.

Signed-off-by: Mauro Carvalho Chehab 
Reviewed-by: Jonathan Cameron 
---
 hw/acpi/ghes.c   | 43 ++--
 hw/arm/virt-acpi-build.c |  5 +
 include/hw/acpi/ghes.h   | 21 +---
 3 files changed, 47 insertions(+), 22 deletions(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index b709c177cdea..3f519ccab90d 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -206,17 +206,26 @@ ghes_gen_err_data_uncorrectable_recoverable(GArray *block,
  * Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg 
blobs.
  * See docs/specs/acpi_hest_ghes.rst for blobs format.
  */
-static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
+static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
+   int num_sources)
 {
 int i, error_status_block_offset;
 
+/*
+ * TODO: Current version supports only one source.
+ * A further patch will drop this check, after adding a proper migration
+ * code, as, for the code to work, we need to store a bios pointer to the
+ * HEST table.
+ */
+assert(num_sources == 1);
+
 /* Build error_block_address */
-for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+for (i = 0; i < num_sources; i++) {
 build_append_int_noprefix(hardware_errors, 0, sizeof(uint64_t));
 }
 
 /* Build read_ack_register */
-for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+for (i = 0; i < num_sources; i++) {
 /*
  * Initialize the value of read_ack_register to 1, so GHES can be
  * writable after (re)boot.
@@ -231,13 +240,13 @@ static void build_ghes_error_table(GArray 
*hardware_errors, BIOSLinker *linker)
 
 /* Reserve space for Error Status Data Block */
 acpi_data_push(hardware_errors,
-ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_GHES_ERROR_SOURCE_COUNT);
+ACPI_GHES_MAX_RAW_DATA_LENGTH * num_sources);
 
 /* Tell guest firmware to place hardware_errors blob into RAM */
 bios_linker_loader_alloc(linker, ACPI_HW_ERROR_FW_CFG_FILE,
  hardware_errors, sizeof(uint64_t), false);
 
-for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+for (i = 0; i < num_sources; i++) {
 /*
  * Tell firmware to patch error_block_address entries to point to
  * corresponding "Generic Error Status Block"
@@ -263,10 +272,12 @@ static void build_ghes_error_table(GArray 
*hardware_errors, BIOSLinker *linker)
 /* Build Generic Hardware Error Source version 2 (GHESv2) */
 static void build_ghes_v2(GArray *table_data,
   BIOSLinker *linker,
-  enum AcpiGhesNotifyType notify,
-  uint16_t source_id)
+  const AcpiNotificationSourceId *notif_src,
+  uint16_t index, int num_sources)
 {
 uint64_t address_offset;
+const uint16_t notify = notif_src->notify;
+const uint16_t source_id = notif_src->source_id;
 
 /*
  * Type:
@@ -297,7 +308,7 @@ static void build_ghes_v2(GArray *table_data,
address_offset + GAS_ADDR_OFFSET,
sizeof(uint64_t),
ACPI_HW_ERROR_FW_CFG_FILE,
-   source_id * sizeof(uint64_t));
+   index * sizeof(uint64_t));
 
 /* Notification Structure */
 build_ghes_hw_error_notification(table_data, notify);
@@ -317,8 +328,7 @@ static void build_ghes_v2(GArray *table_data,
address_offset + GAS_ADDR_OFFSET,
sizeof(uint64_t),
ACPI_HW_ERROR_FW_CFG_FILE,
-   (ACPI_GHES_ERROR_SOURCE_COUNT + source_id)
-   * sizeof(uint64_t));
+   (num_sources + index) * sizeof(uint64_t));
 
 /*
  * Read Ack Preserve field
@@ -333,19 +343,23 @@ static void build_ghes_v2(GArray *table_data,
 /* Build Hardware Error Source Table */
 void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
  BIOSLinker *linker,

[PATCH 06/11] acpi/ghes: add a notifier to notify when error data is ready

2025-01-22 Thread Mauro Carvalho Chehab
Some error injection notify methods are async, like GPIO
notify. Add a notifier to be used when the error record is
ready to be sent to the guest OS.
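
A consumer would then subscribe with the standard notifier API, roughly like
this (sketch only; the handler below is made up for illustration, the actual
user is added later in this series):

    static void ghes_error_ready(Notifier *notifier, void *data)
    {
        /* e.g. raise the GED/GPIO event towards the guest */
    }

    static Notifier ghes_error_notifier = {
        .notify = ghes_error_ready,
    };

    /* at realize time */
    notifier_list_add(&acpi_generic_error_notifiers, &ghes_error_notifier);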

Signed-off-by: Mauro Carvalho Chehab 
---
 hw/acpi/ghes.c | 5 -
 include/hw/acpi/ghes.h | 3 +++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 86c97f60d6a0..961fc38ea8f5 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -506,6 +506,9 @@ static void get_ghes_source_offsets(uint16_t source_id, 
uint64_t hest_addr,
  sizeof(*read_ack_start_addr));
 }
 
+NotifierList acpi_generic_error_notifiers =
+NOTIFIER_LIST_INITIALIZER(error_device_notifiers);
+
 void ghes_record_cper_errors(const void *cper, size_t len,
  uint16_t source_id, Error **errp)
 {
@@ -561,7 +564,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
 /* Write the generic error data entry into guest memory */
 cpu_physical_memory_write(cper_addr, cper, len);
 
-return;
+notifier_list_notify(&acpi_generic_error_notifiers, NULL);
 }
 
 int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 164ed8b0f9a3..2e8405edfe27 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -24,6 +24,9 @@
 
 #include "hw/acpi/bios-linker-loader.h"
 #include "qapi/error.h"
+#include "qemu/notify.h"
+
+extern NotifierList acpi_generic_error_notifiers;
 
 /*
  * Values for Hardware Error Notification Type field
-- 
2.48.1




[PATCH 03/11] acpi/ghes: Use HEST table offsets when preparing GHES records

2025-01-22 Thread Mauro Carvalho Chehab
There are two pointers that are needed during error injection:

1. The start address of the CPER block to be stored;
2. The address of the ack, which needs a reset before the next error.

It is preferable to calculate them from the HEST table.  This allows
checking the source ID, the size of the table and the type of the
HEST error block structures.

Yet, keep the old code, as this is needed for migration purposes.
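
Roughly, the walk implemented below looks like this (sketch of the layout, as
encoded by the offsets in this patch; GAS = Generic Address Structure):

    hest_addr_le
      +0             error source count (4 bytes)
      +4 + n * 92    GHESv2 source structure #n:
                       +0   type, +2 source id,
                       +20  Error Status Address GAS (-> CPER block address),
                       +64  Read Ack Register GAS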

Signed-off-by: Mauro Carvalho Chehab 
---
 hw/acpi/ghes.c | 98 --
 1 file changed, 88 insertions(+), 10 deletions(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 34e3364d3fd8..b46b563bcaf8 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -61,6 +61,23 @@
  */
 #define ACPI_GHES_GESB_SIZE 20
 
+/*
+ * Offsets with regards to the start of the HEST table stored at
+ * ags->hest_addr_le, according with the memory layout map at
+ * docs/specs/acpi_hest_ghes.rst.
+ */
+
+/* ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
+ * Table 18-382 Generic Hardware Error Source version 2 (GHESv2) Structure
+ */
+#define HEST_GHES_V2_TABLE_SIZE  92
+#define GHES_ACK_OFFSET  (64 + GAS_ADDR_OFFSET)
+
+/* ACPI 6.2: 18.3.2.7: Generic Hardware Error Source
+ * Table 18-380: 'Error Status Address' field
+ */
+#define GHES_ERR_ST_ADDR_OFFSET  (20 + GAS_ADDR_OFFSET)
+
 /*
  * Values for error_severity field
  */
@@ -212,14 +229,6 @@ static void build_ghes_error_table(GArray 
*hardware_errors, BIOSLinker *linker,
 {
 int i, error_status_block_offset;
 
-/*
- * TODO: Current version supports only one source.
- * A further patch will drop this check, after adding a proper migration
- * code, as, for the code to work, we need to store a bios pointer to the
- * HEST table.
- */
-assert(num_sources == 1);
-
 /* Build error_block_address */
 for (i = 0; i < num_sources; i++) {
 build_append_int_noprefix(hardware_errors, 0, sizeof(uint64_t));
@@ -419,6 +428,70 @@ static void get_hw_error_offsets(uint64_t ghes_addr,
 *read_ack_register_addr = ghes_addr + sizeof(uint64_t);
 }
 
+static void get_ghes_source_offsets(uint16_t source_id, uint64_t hest_addr,
+uint64_t *cper_addr,
+uint64_t *read_ack_start_addr,
+Error **errp)
+{
+uint64_t hest_err_block_addr, hest_read_ack_addr;
+uint64_t err_source_struct, error_block_addr;
+uint32_t num_sources, i;
+
+if (!hest_addr) {
+return;
+}
+
+cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources));
+num_sources = le32_to_cpu(num_sources);
+
+err_source_struct = hest_addr + sizeof(num_sources);
+
+/*
+ * Currently, HEST Error source navigates only for GHESv2 tables
+ */
+
+for (i = 0; i < num_sources; i++) {
+uint64_t addr = err_source_struct;
+uint16_t type, src_id;
+
+cpu_physical_memory_read(addr, &type, sizeof(type));
+type = le16_to_cpu(type);
+
+/* For now, we only know the size of GHESv2 table */
+if (type != ACPI_GHES_SOURCE_GENERIC_ERROR_V2) {
+error_setg(errp, "HEST: type %d not supported.", type);
+return;
+}
+
+/* Compare CPER source address at the GHESv2 structure */
+addr += sizeof(type);
+cpu_physical_memory_read(addr, &src_id, sizeof(src_id));
+
+if (src_id == source_id) {
+break;
+}
+
+err_source_struct += HEST_GHES_V2_TABLE_SIZE;
+}
+if (i == num_sources) {
+error_setg(errp, "HEST: Source %d not found.", source_id);
+return;
+}
+
+/* Navigate though table address pointers */
+hest_err_block_addr = err_source_struct + GHES_ERR_ST_ADDR_OFFSET;
+hest_read_ack_addr = err_source_struct + GHES_ACK_OFFSET;
+
+cpu_physical_memory_read(hest_err_block_addr, &error_block_addr,
+ sizeof(error_block_addr));
+
+cpu_physical_memory_read(error_block_addr, cper_addr,
+ sizeof(*cper_addr));
+
+cpu_physical_memory_read(hest_read_ack_addr, read_ack_start_addr,
+ sizeof(*read_ack_start_addr));
+}
+
 void ghes_record_cper_errors(const void *cper, size_t len,
  uint16_t source_id, Error **errp)
 {
@@ -439,8 +512,13 @@ void ghes_record_cper_errors(const void *cper, size_t len,
 }
 ags = &acpi_ged_state->ghes_state;
 
-get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
- &cper_addr, &read_ack_register_addr);
+if (!ags->hest_addr_le) {
+get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
+ &cper_addr, &read_ack_register_addr);
+} else {
+get_ghes_source_offsets(source_id, le64_to_cpu(ags->hest_addr_le),
+&cper_addr, &read_ack_register_addr, errp);
+}
 
 if (!cper

[PATCH 02/11] acpi/ghes: add a firmware file with HEST address

2025-01-22 Thread Mauro Carvalho Chehab
Store the HEST table address at a GPA, placing its content in the
hest_addr_le variable.

Signed-off-by: Mauro Carvalho Chehab 
Reviewed-by: Jonathan Cameron 

---

Change from v8:
- hest_addr_le now points to the error source size and data.

Signed-off-by: Mauro Carvalho Chehab 
---
 hw/acpi/ghes.c | 17 -
 include/hw/acpi/ghes.h |  1 +
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 3f519ccab90d..34e3364d3fd8 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -30,6 +30,7 @@
 
 #define ACPI_HW_ERROR_FW_CFG_FILE   "etc/hardware_errors"
 #define ACPI_HW_ERROR_ADDR_FW_CFG_FILE  "etc/hardware_errors_addr"
+#define ACPI_HEST_ADDR_FW_CFG_FILE  "etc/acpi_table_hest_addr"
 
 /* The max size in bytes for one error block */
 #define ACPI_GHES_MAX_RAW_DATA_LENGTH   (1 * KiB)
@@ -261,7 +262,7 @@ static void build_ghes_error_table(GArray *hardware_errors, 
BIOSLinker *linker,
 }
 
 /*
- * tell firmware to write hardware_errors GPA into
+ * Tell firmware to write hardware_errors GPA into
  * hardware_errors_addr fw_cfg, once the former has been initialized.
  */
 bios_linker_loader_write_pointer(linker, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, 0,
@@ -355,6 +356,8 @@ void acpi_build_hest(GArray *table_data, GArray 
*hardware_errors,
 
 acpi_table_begin(&table, table_data);
 
+int hest_offset = table_data->len;
+
 /* Error Source Count */
 build_append_int_noprefix(table_data, num_sources, 4);
 for (i = 0; i < num_sources; i++) {
@@ -362,6 +365,15 @@ void acpi_build_hest(GArray *table_data, GArray 
*hardware_errors,
 }
 
 acpi_table_end(linker, &table);
+
+/*
+ * tell firmware to write into GPA the address of HEST via fw_cfg,
+ * once initialized.
+ */
+bios_linker_loader_write_pointer(linker,
+ ACPI_HEST_ADDR_FW_CFG_FILE, 0,
+ sizeof(uint64_t),
+ ACPI_BUILD_TABLE_FILE, hest_offset);
 }
 
 void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
@@ -375,6 +387,9 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
 fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
 NULL, &(ags->hw_error_le), sizeof(ags->hw_error_le), false);
 
+fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
+NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
+
 ags->present = true;
 }
 
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 9f0120d0d596..237721fec0a2 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -58,6 +58,7 @@ enum AcpiGhesNotifyType {
 };
 
 typedef struct AcpiGhesState {
+uint64_t hest_addr_le;
 uint64_t hw_error_le;
 bool present; /* True if GHES is present at all on this board */
 } AcpiGhesState;
-- 
2.48.1




[PATCH 04/11] acpi/generic_event_device: Update GHES migration to cover hest addr

2025-01-22 Thread Mauro Carvalho Chehab
The GHES migration logic at GED should now support HEST table
location too.

Signed-off-by: Mauro Carvalho Chehab 
---
 hw/acpi/generic_event_device.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index c85d97ca3776..5346cae573b7 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -386,6 +386,34 @@ static const VMStateDescription vmstate_ghes_state = {
 }
 };
 
+static const VMStateDescription vmstate_hest = {
+.name = "acpi-hest",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (const VMStateField[]) {
+VMSTATE_UINT64(hest_addr_le, AcpiGhesState),
+VMSTATE_END_OF_LIST()
+},
+};
+
+static bool hest_needed(void *opaque)
+{
+AcpiGedState *s = opaque;
+return s->ghes_state.hest_addr_le;
+}
+
+static const VMStateDescription vmstate_hest_state = {
+.name = "acpi-ged/hest",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = hest_needed,
+.fields = (const VMStateField[]) {
+VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
+   vmstate_hest, AcpiGhesState),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static const VMStateDescription vmstate_acpi_ged = {
 .name = "acpi-ged",
 .version_id = 1,
@@ -398,6 +426,7 @@ static const VMStateDescription vmstate_acpi_ged = {
 &vmstate_memhp_state,
 &vmstate_cpuhp_state,
 &vmstate_ghes_state,
+&vmstate_hest_state,
 NULL
 }
 };
-- 
2.48.1




[PATCH 05/11] acpi/generic_event_device: add logic to detect if HEST addr is available

2025-01-22 Thread Mauro Carvalho Chehab
Create a new property (x-has-hest-addr) and use it to detect whether
the GHES table offsets can be calculated from the HEST address
(QEMU 9.2 and later) or, the legacy way, via an offset obtained
from the hardware_errors firmware file.
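
For testing, the new behavior can be disabled on the command line, e.g.
(assuming the GED device type name "acpi-ged"):

    -global driver=acpi-ged,property=x-has-hest-addr,value=false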

Signed-off-by: Mauro Carvalho Chehab 
---
 hw/acpi/generic_event_device.c |  1 +
 hw/acpi/ghes.c | 28 +---
 hw/arm/virt-acpi-build.c   | 30 ++
 hw/core/machine.c  |  2 ++
 include/hw/acpi/ghes.h |  1 +
 5 files changed, 51 insertions(+), 11 deletions(-)

diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index 5346cae573b7..fe537ed05c66 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -318,6 +318,7 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, 
AcpiEventStatusBits ev)
 
 static const Property acpi_ged_properties[] = {
 DEFINE_PROP_UINT32("ged-event", AcpiGedState, ged_event_bitmap, 0),
+DEFINE_PROP_BOOL("x-has-hest-addr", AcpiGedState, ghes_state.hest_lookup, 
true),
 };
 
 static const VMStateDescription vmstate_memhp_state = {
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index b46b563bcaf8..86c97f60d6a0 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -359,6 +359,8 @@ void acpi_build_hest(GArray *table_data, GArray 
*hardware_errors,
 {
 AcpiTable table = { .sig = "HEST", .rev = 1,
 .oem_id = oem_id, .oem_table_id = oem_table_id };
+AcpiGedState *acpi_ged_state;
+AcpiGhesState *ags = NULL;
 int i;
 
 build_ghes_error_table(hardware_errors, linker, num_sources);
@@ -379,10 +381,20 @@ void acpi_build_hest(GArray *table_data, GArray 
*hardware_errors,
  * tell firmware to write into GPA the address of HEST via fw_cfg,
  * once initialized.
  */
-bios_linker_loader_write_pointer(linker,
- ACPI_HEST_ADDR_FW_CFG_FILE, 0,
- sizeof(uint64_t),
- ACPI_BUILD_TABLE_FILE, hest_offset);
+
+acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
+   NULL));
+if (!acpi_ged_state) {
+return;
+}
+
+ags = &acpi_ged_state->ghes_state;
+if (ags->hest_lookup) {
+bios_linker_loader_write_pointer(linker,
+ ACPI_HEST_ADDR_FW_CFG_FILE, 0,
+ sizeof(uint64_t),
+ ACPI_BUILD_TABLE_FILE, hest_offset);
+}
 }
 
 void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
@@ -396,8 +408,10 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState 
*s,
 fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
 NULL, &(ags->hw_error_le), sizeof(ags->hw_error_le), false);
 
-fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
-NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
+if (ags && ags->hest_lookup) {
+fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
+NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
+}
 
 ags->present = true;
 }
@@ -512,7 +526,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
 }
 ags = &acpi_ged_state->ghes_state;
 
-if (!ags->hest_addr_le) {
+if (!ags->hest_lookup) {
 get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
  &cper_addr, &read_ack_register_addr);
 } else {
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 3d411787fc37..ada5d08cfbe7 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -897,6 +897,10 @@ static const AcpiNotificationSourceId hest_ghes_notify[] = 
{
 { ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA },
 };
 
+static const AcpiNotificationSourceId hest_ghes_notify_9_2[] = {
+{ ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA },
+};
+
 static
 void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
 {
@@ -950,10 +954,28 @@ void virt_acpi_build(VirtMachineState *vms, 
AcpiBuildTables *tables)
 build_dbg2(tables_blob, tables->linker, vms);
 
 if (vms->ras) {
-acpi_add_table(table_offsets, tables_blob);
-acpi_build_hest(tables_blob, tables->hardware_errors, tables->linker,
-hest_ghes_notify, ARRAY_SIZE(hest_ghes_notify),
-vms->oem_id, vms->oem_table_id);
+AcpiGhesState *ags;
+AcpiGedState *acpi_ged_state;
+
+acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
+   NULL));
+if (acpi_ged_state) {
+ags = &acpi_ged_state->ghes_state;
+
+acpi_add_table(table_offsets, tables_blob);
+
+if (!ags->hest_lookup) {
+acpi_build_hest(tables_blob, table

[PATCH 10/11] qapi/acpi-hest: add an interface to do generic CPER error injection

2025-01-22 Thread Mauro Carvalho Chehab
Create a QMP command to be used for generic ACPI APEI hardware error
injection (HEST) via GHESv2, and add support for it for ARM guests.

Error injection uses ACPI_HEST_SRC_ID_QMP source ID to be platform
independent. This is mapped at arch virt bindings, depending on the
types supported by QEMU and by the BIOS. So, on ARM, this is supported
via ACPI_GHES_NOTIFY_GPIO notification type.
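
For example, an error could then be injected with something like the following
QMP command (assuming the command and argument names end up as below in
qapi/acpi-hest.json; the payload is a base64-encoded raw CPER record):

    { "execute": "inject-ghes-error",
      "arguments": { "cper": "<base64-encoded CPER record>" } }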

This patch is co-authored:
- original ghes logic to inject a simple ARM record by Shiju Jose;
- generic logic to handle block addresses by Jonathan Cameron;
- generic GHESv2 error inject by Mauro Carvalho Chehab;

Co-authored-by: Jonathan Cameron 
Co-authored-by: Shiju Jose 
Co-authored-by: Mauro Carvalho Chehab 
Signed-off-by: Jonathan Cameron 
Signed-off-by: Shiju Jose 
Signed-off-by: Mauro Carvalho Chehab 

---

Changes since v9:
- ARM source IDs renamed to reflect SYNC/ASYNC;
- command name changed to better reflect what it does;
- some improvements at JSON documentation;
- add a check for QMP source at the notification logic.

Signed-off-by: Mauro Carvalho Chehab 
---
 MAINTAINERS  |  7 +++
 hw/acpi/Kconfig  |  5 +
 hw/acpi/ghes.c   |  2 +-
 hw/acpi/ghes_cper.c  | 32 
 hw/acpi/ghes_cper_stub.c | 19 +++
 hw/acpi/meson.build  |  2 ++
 hw/arm/virt-acpi-build.c |  1 +
 hw/arm/virt.c|  7 +++
 include/hw/acpi/ghes.h   |  1 +
 include/hw/arm/virt.h|  1 +
 qapi/acpi-hest.json  | 35 +++
 qapi/meson.build |  1 +
 qapi/qapi-schema.json|  1 +
 13 files changed, 113 insertions(+), 1 deletion(-)
 create mode 100644 hw/acpi/ghes_cper.c
 create mode 100644 hw/acpi/ghes_cper_stub.c
 create mode 100644 qapi/acpi-hest.json

diff --git a/MAINTAINERS b/MAINTAINERS
index 846b81e3ec03..8e1f662fa0e0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2075,6 +2075,13 @@ F: hw/acpi/ghes.c
 F: include/hw/acpi/ghes.h
 F: docs/specs/acpi_hest_ghes.rst
 
+ACPI/HEST/GHES/ARM processor CPER
+R: Mauro Carvalho Chehab 
+S: Maintained
+F: hw/arm/ghes_cper.c
+F: hw/acpi/ghes_cper_stub.c
+F: qapi/acpi-hest.json
+
 ppc4xx
 L: qemu-...@nongnu.org
 S: Orphan
diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
index 1d4e9f0845c0..daabbe6cd11e 100644
--- a/hw/acpi/Kconfig
+++ b/hw/acpi/Kconfig
@@ -51,6 +51,11 @@ config ACPI_APEI
 bool
 depends on ACPI
 
+config GHES_CPER
+bool
+depends on ACPI_APEI
+default y
+
 config ACPI_PCI
 bool
 depends on ACPI && PCI
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 5d29db3918dd..cf83c959b5ef 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -547,7 +547,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
 /* Write the generic error data entry into guest memory */
 cpu_physical_memory_write(cper_addr, cper, len);
 
-notifier_list_notify(&acpi_generic_error_notifiers, NULL);
+notifier_list_notify(&acpi_generic_error_notifiers, &source_id);
 }
 
 int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
diff --git a/hw/acpi/ghes_cper.c b/hw/acpi/ghes_cper.c
new file mode 100644
index ..02c47b41b990
--- /dev/null
+++ b/hw/acpi/ghes_cper.c
@@ -0,0 +1,32 @@
+/*
+ * CPER payload parser for error injection
+ *
+ * Copyright(C) 2024 Huawei LTD.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+
+#include "qemu/base64.h"
+#include "qemu/error-report.h"
+#include "qemu/uuid.h"
+#include "qapi/qapi-commands-acpi-hest.h"
+#include "hw/acpi/ghes.h"
+
+void qmp_inject_ghes_error(const char *qmp_cper, Error **errp)
+{
+
+uint8_t *cper;
+size_t  len;
+
+cper = qbase64_decode(qmp_cper, -1, &len, errp);
+if (!cper) {
+error_setg(errp, "missing GHES CPER payload");
+return;
+}
+
+ghes_record_cper_errors(cper, len, ACPI_HEST_SRC_ID_QMP, errp);
+}
diff --git a/hw/acpi/ghes_cper_stub.c b/hw/acpi/ghes_cper_stub.c
new file mode 100644
index ..8782e2c02fa8
--- /dev/null
+++ b/hw/acpi/ghes_cper_stub.c
@@ -0,0 +1,19 @@
+/*
+ * Stub interface for CPER payload parser for error injection
+ *
+ * Copyright(C) 2024 Huawei LTD.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qapi/qapi-commands-acpi-hest.h"
+#include "hw/acpi/ghes.h"
+
+void qmp_inject_ghes_error(const char *cper, Error **errp)
+{
+error_setg(errp, "GHES QMP error inject is not compiled in");
+}
diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
index 73f02b96912b..56b5d1ec9691 100644
--- a/hw/acpi/meson.build
+++ b/hw/acpi/meson.build
@@ -34,4 +34,6 @@ endif
 system_ss.add(when: 'CONFIG_ACPI', if_false: files('acpi-stub.c', 
'aml-build-stub.c', 'ghes-stub.c', 'acpi_interface.c'))
 system_ss.add(when: 'CONFIG_ACPI_PCI_BRIDGE', if_false: 
file

[PATCH 07/11] acpi/ghes: Cleanup the code which gets ghes ged state

2025-01-22 Thread Mauro Carvalho Chehab
Move the check logic into a common function and simplify the
code which checks whether GHES is enabled and was properly set up.

Signed-off-by: Mauro Carvalho Chehab 
---
 hw/acpi/ghes-stub.c|  4 ++--
 hw/acpi/ghes.c | 33 +++--
 include/hw/acpi/ghes.h |  9 +
 target/arm/kvm.c   |  2 +-
 4 files changed, 19 insertions(+), 29 deletions(-)

diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
index 7cec1812dad9..fbabf955155a 100644
--- a/hw/acpi/ghes-stub.c
+++ b/hw/acpi/ghes-stub.c
@@ -16,7 +16,7 @@ int acpi_ghes_memory_errors(uint16_t source_id, uint64_t 
physical_address)
 return -1;
 }
 
-bool acpi_ghes_present(void)
+AcpiGhesState *acpi_ghes_get_state(void)
 {
-return false;
+return NULL;
 }
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 961fc38ea8f5..5d29db3918dd 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -420,10 +420,6 @@ static void get_hw_error_offsets(uint64_t ghes_addr,
  uint64_t *cper_addr,
  uint64_t *read_ack_register_addr)
 {
-if (!ghes_addr) {
-return;
-}
-
 /*
  * non-HEST version supports only one source, so no need to change
  * the start offset based on the source ID. Also, we can't validate
@@ -451,10 +447,6 @@ static void get_ghes_source_offsets(uint16_t source_id, 
uint64_t hest_addr,
 uint64_t err_source_struct, error_block_addr;
 uint32_t num_sources, i;
 
-if (!hest_addr) {
-return;
-}
-
 cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources));
 num_sources = le32_to_cpu(num_sources);
 
@@ -513,7 +505,6 @@ void ghes_record_cper_errors(const void *cper, size_t len,
  uint16_t source_id, Error **errp)
 {
 uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register;
-AcpiGedState *acpi_ged_state;
 AcpiGhesState *ags;
 
 if (len > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
@@ -521,13 +512,10 @@ void ghes_record_cper_errors(const void *cper, size_t len,
 return;
 }
 
-acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
-   NULL));
-if (!acpi_ged_state) {
-error_setg(errp, "Can't find ACPI_GED object");
+ags = acpi_ghes_get_state();
+if (!ags) {
 return;
 }
-ags = &acpi_ged_state->ghes_state;
 
 if (!ags->hest_lookup) {
 get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
@@ -537,11 +525,6 @@ void ghes_record_cper_errors(const void *cper, size_t len,
 &cper_addr, &read_ack_register_addr, errp);
 }
 
-if (!cper_addr) {
-error_setg(errp, "can not find Generic Error Status Block");
-return;
-}
-
 cpu_physical_memory_read(read_ack_register_addr,
  &read_ack_register, sizeof(read_ack_register));
 
@@ -605,7 +588,7 @@ int acpi_ghes_memory_errors(uint16_t source_id, uint64_t 
physical_address)
 return 0;
 }
 
-bool acpi_ghes_present(void)
+AcpiGhesState *acpi_ghes_get_state(void)
 {
 AcpiGedState *acpi_ged_state;
 AcpiGhesState *ags;
@@ -614,8 +597,14 @@ bool acpi_ghes_present(void)
NULL));
 
 if (!acpi_ged_state) {
-return false;
+return NULL;
 }
 ags = &acpi_ged_state->ghes_state;
-return ags->present;
+if (!ags->present) {
+return NULL;
+}
+if (!ags->hw_error_le && !ags->hest_addr_le) {
+return NULL;
+}
+return ags;
 }
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 2e8405edfe27..64fe2b5bea65 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -91,10 +91,11 @@ void ghes_record_cper_errors(const void *cper, size_t len,
  uint16_t source_id, Error **errp);
 
 /**
- * acpi_ghes_present: Report whether ACPI GHES table is present
+ * acpi_ghes_get_state: Get a pointer for ACPI ghes state
  *
- * Returns: true if the system has an ACPI GHES table and it is
- * safe to call acpi_ghes_memory_errors() to record a memory error.
+ * Returns: a pointer to ghes state if the system has an ACPI GHES table,
+ * it is enabled and it is safe to call acpi_ghes_memory_errors() to record
+ * a memory error. Returns NULL otherwise.
  */
-bool acpi_ghes_present(void);
+AcpiGhesState *acpi_ghes_get_state(void);
 #endif
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index da30bdbb2349..0283089713b9 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -2369,7 +2369,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void 
*addr)
 
 assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
 
-if (acpi_ghes_present() && addr) {
+if (acpi_ghes_get_state() && addr) {
 ram_addr = qemu_ram_addr_from_host(addr);
 if (ram_addr != RAM_ADDR_INVALID &&
 kvm_physical_memory_addr_from_host(c->kvm_state, addr, &pad

[PATCH 00/11] Change ghes to use HEST-based offsets and add support for error inject

2025-01-22 Thread Mauro Carvalho Chehab
Now that the ghes preparation patches were merged, let's add support
for error injection.

I'm opting to fold two patch series into one here:

1. 
https://lore.kernel.org/qemu-devel/20250113130854.848688-1-mchehab+hua...@kernel.org/

These are the first 5 patches, containing changes to the math used to calculate
offsets at the HEST table and the hardware_errors firmware file, together with
its migration code. Migration was tested with both the latest released QEMU and
upstream, in both directions.

There were no changes in this series since the last submission, except for a
conflict resolution in the migration table, due to upstream changes.

For more details, see the post of my previous submission.

2. These are followed by 6 patches from:

https://lore.kernel.org/qemu-devel/cover.1726293808.git.mchehab+hua...@kernel.org/
containing the error injection code and script.

   They add a new QAPI command to allow injecting GHESv2 errors, and a script
   using such QAPI to inject ARM Processor Error records.

PS: If I'm counting correctly, this is the 18th rebased version of this series.

Mauro Carvalho Chehab (11):
  acpi/ghes: Prepare to support multiple sources on ghes
  acpi/ghes: add a firmware file with HEST address
  acpi/ghes: Use HEST table offsets when preparing GHES records
  acpi/generic_event_device: Update GHES migration to cover hest addr
  acpi/generic_event_device: add logic to detect if HEST addr is
available
  acpi/ghes: add a notifier to notify when error data is ready
  acpi/ghes: Cleanup the code which gets ghes ged state
  acpi/generic_event_device: add an APEI error device
  arm/virt: Wire up a GED error device for ACPI / GHES
  qapi/acpi-hest: add an interface to do generic CPER error injection
  scripts/ghes_inject: add a script to generate GHES error inject

 MAINTAINERS|  10 +
 hw/acpi/Kconfig|   5 +
 hw/acpi/aml-build.c|  10 +
 hw/acpi/generic_event_device.c |  38 ++
 hw/acpi/ghes-stub.c|   4 +-
 hw/acpi/ghes.c | 184 +--
 hw/acpi/ghes_cper.c|  32 ++
 hw/acpi/ghes_cper_stub.c   |  19 +
 hw/acpi/meson.build|   2 +
 hw/arm/virt-acpi-build.c   |  35 +-
 hw/arm/virt.c  |  19 +-
 hw/core/machine.c  |   2 +
 include/hw/acpi/acpi_dev_interface.h   |   1 +
 include/hw/acpi/aml-build.h|   2 +
 include/hw/acpi/generic_event_device.h |   1 +
 include/hw/acpi/ghes.h |  36 +-
 include/hw/arm/virt.h  |   2 +
 qapi/acpi-hest.json|  35 ++
 qapi/meson.build   |   1 +
 qapi/qapi-schema.json  |   1 +
 scripts/arm_processor_error.py | 377 +
 scripts/ghes_inject.py |  51 ++
 scripts/qmp_helper.py  | 702 +
 target/arm/kvm.c   |   2 +-
 24 files changed, 1517 insertions(+), 54 deletions(-)
 create mode 100644 hw/acpi/ghes_cper.c
 create mode 100644 hw/acpi/ghes_cper_stub.c
 create mode 100644 qapi/acpi-hest.json
 create mode 100644 scripts/arm_processor_error.py
 create mode 100755 scripts/ghes_inject.py
 create mode 100644 scripts/qmp_helper.py

-- 
2.48.1





[PATCH 11/11] scripts/ghes_inject: add a script to generate GHES error inject

2025-01-22 Thread Mauro Carvalho Chehab
Using the QMP GHESv2 API requires preparing a raw data array
containing a CPER record.

Add a helper script with subcommands to prepare such data.

Currently, only ARM Processor error CPER record is supported.

Signed-off-by: Mauro Carvalho Chehab 
---
 MAINTAINERS|   3 +
 scripts/arm_processor_error.py | 377 ++
 scripts/ghes_inject.py |  51 +++
 scripts/qmp_helper.py  | 702 +
 4 files changed, 1133 insertions(+)
 create mode 100644 scripts/arm_processor_error.py
 create mode 100755 scripts/ghes_inject.py
 create mode 100644 scripts/qmp_helper.py

diff --git a/MAINTAINERS b/MAINTAINERS
index 8e1f662fa0e0..99a9ba5c2ace 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2081,6 +2081,9 @@ S: Maintained
 F: hw/arm/ghes_cper.c
 F: hw/acpi/ghes_cper_stub.c
 F: qapi/acpi-hest.json
+F: scripts/ghes_inject.py
+F: scripts/arm_processor_error.py
+F: scripts/qmp_helper.py
 
 ppc4xx
 L: qemu-...@nongnu.org
diff --git a/scripts/arm_processor_error.py b/scripts/arm_processor_error.py
new file mode 100644
index ..62e0c5662232
--- /dev/null
+++ b/scripts/arm_processor_error.py
@@ -0,0 +1,377 @@
+#!/usr/bin/env python3
+#
+# pylint: disable=C0301,C0114,R0903,R0912,R0913,R0914,R0915,W0511
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2024 Mauro Carvalho Chehab 
+
+# TODO: current implementation has dummy defaults.
+#
+# For a better implementation, a QMP addition/call is needed to
+# retrieve some data for ARM Processor Error injection:
+#
+#   - ARM registers: power_state, mpidr.
+
+import argparse
+import re
+
+from qmp_helper import qmp, util, cper_guid
+
+class ArmProcessorEinj:
+"""
+Implements ARM Processor Error injection via GHES
+"""
+
+DESC = """
+Generates an ARM processor error CPER, compatible with
+UEFI 2.9A Errata.
+"""
+
+ACPI_GHES_ARM_CPER_LENGTH = 40
+ACPI_GHES_ARM_CPER_PEI_LENGTH = 32
+
+# Context types
+CONTEXT_AARCH32_EL1 = 1
+CONTEXT_AARCH64_EL1 = 5
+CONTEXT_MISC_REG = 8
+
+def __init__(self, subparsers):
+"""Initialize the error injection class and add subparser"""
+
+# Valid choice values
+self.arm_valid_bits = {
+"mpidr":util.bit(0),
+"affinity": util.bit(1),
+"running":  util.bit(2),
+"vendor":   util.bit(3),
+}
+
+self.pei_flags = {
+"first":util.bit(0),
+"last": util.bit(1),
+"propagated":   util.bit(2),
+"overflow": util.bit(3),
+}
+
+self.pei_error_types = {
+"cache":util.bit(1),
+"tlb":  util.bit(2),
+"bus":  util.bit(3),
+"micro-arch":   util.bit(4),
+}
+
+self.pei_valid_bits = {
+"multiple-error":   util.bit(0),
+"flags":util.bit(1),
+"error-info":   util.bit(2),
+"virt-addr":util.bit(3),
+"phy-addr": util.bit(4),
+}
+
+self.data = bytearray()
+
+parser = subparsers.add_parser("arm", description=self.DESC)
+
+arm_valid_bits = ",".join(self.arm_valid_bits.keys())
+flags = ",".join(self.pei_flags.keys())
+error_types = ",".join(self.pei_error_types.keys())
+pei_valid_bits = ",".join(self.pei_valid_bits.keys())
+
+# UEFI N.16 ARM Validation bits
+g_arm = parser.add_argument_group("ARM processor")
+g_arm.add_argument("--arm", "--arm-valid",
+   help=f"ARM valid bits: {arm_valid_bits}")
+g_arm.add_argument("-a", "--affinity",  "--level", "--affinity-level",
+   type=lambda x: int(x, 0),
+   help="Affinity level (when multiple levels apply)")
+g_arm.add_argument("-l", "--mpidr", type=lambda x: int(x, 0),
+   help="Multiprocessor Affinity Register")
+g_arm.add_argument("-i", "--midr", type=lambda x: int(x, 0),
+   help="Main ID Register")
+g_arm.add_argument("-r", "--running",
+   action=argparse.BooleanOptionalAction,
+   default=None,
+   help="Indicates if the processor is running or not")
+g_arm.add_argument("--psci", "--psci-state",
+   type=lambda x: int(x, 0),
+   help="Power State Coordination Interface - PSCI 
state")
+
+# TODO: Add vendor-specific support
+
+# UEFI N.17 bitmaps (type and flags)
+g_pei = parser.add_argument_group("ARM Processor Error Info (PEI)")
+g_pei.add_argument("-t", "--type", nargs="+",
+help=f"one or more error types: {error_types}")
+g_pei.add_argument("-f", "--flags", nargs="*",
+help=f"zero or more error flags: {flags}")

[PATCH 08/11] acpi/generic_event_device: add an APEI error device

2025-01-22 Thread Mauro Carvalho Chehab
Adds a generic error device to handle generic hardware error
events as specified in the ACPI 6.5 specification, section 18.3.2.7.2:
https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
using HID PNP0C33.

The PNP0C33 device is used to report hardware errors to
the guest via ACPI APEI Generic Hardware Error Source (GHES).

Co-authored-by: Mauro Carvalho Chehab 
Co-authored-by: Jonathan Cameron 
Signed-off-by: Jonathan Cameron 
Signed-off-by: Mauro Carvalho Chehab 
Reviewed-by: Igor Mammedov 
---
 hw/acpi/aml-build.c| 10 ++
 hw/acpi/generic_event_device.c |  8 
 include/hw/acpi/acpi_dev_interface.h   |  1 +
 include/hw/acpi/aml-build.h|  2 ++
 include/hw/acpi/generic_event_device.h |  1 +
 5 files changed, 22 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index f8f93a9f66c8..e4bd7b611372 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -2614,3 +2614,13 @@ Aml *aml_i2c_serial_bus_device(uint16_t address, const 
char *resource_source)
 
 return var;
 }
+
+/* ACPI 5.0b: 18.3.2.6.2 Event Notification For Generic Error Sources */
+Aml *aml_error_device(void)
+{
+Aml *dev = aml_device(ACPI_APEI_ERROR_DEVICE);
+aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33")));
+aml_append(dev, aml_name_decl("_UID", aml_int(0)));
+
+return dev;
+}
diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index fe537ed05c66..ce00c80054f4 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -26,6 +26,7 @@ static const uint32_t ged_supported_events[] = {
 ACPI_GED_PWR_DOWN_EVT,
 ACPI_GED_NVDIMM_HOTPLUG_EVT,
 ACPI_GED_CPU_HOTPLUG_EVT,
+ACPI_GED_ERROR_EVT,
 };
 
 /*
@@ -116,6 +117,11 @@ void build_ged_aml(Aml *table, const char *name, 
HotplugHandler *hotplug_dev,
aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE),
   aml_int(0x80)));
 break;
+case ACPI_GED_ERROR_EVT:
+aml_append(if_ctx,
+   aml_notify(aml_name(ACPI_APEI_ERROR_DEVICE),
+  aml_int(0x80)));
+break;
 case ACPI_GED_NVDIMM_HOTPLUG_EVT:
 aml_append(if_ctx,
aml_notify(aml_name("\\_SB.NVDR"),
@@ -295,6 +301,8 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, 
AcpiEventStatusBits ev)
 sel = ACPI_GED_MEM_HOTPLUG_EVT;
 } else if (ev & ACPI_POWER_DOWN_STATUS) {
 sel = ACPI_GED_PWR_DOWN_EVT;
+} else if (ev & ACPI_GENERIC_ERROR) {
+sel = ACPI_GED_ERROR_EVT;
 } else if (ev & ACPI_NVDIMM_HOTPLUG_STATUS) {
 sel = ACPI_GED_NVDIMM_HOTPLUG_EVT;
 } else if (ev & ACPI_CPU_HOTPLUG_STATUS) {
diff --git a/include/hw/acpi/acpi_dev_interface.h 
b/include/hw/acpi/acpi_dev_interface.h
index 68d9d15f50aa..8294f8f0ccca 100644
--- a/include/hw/acpi/acpi_dev_interface.h
+++ b/include/hw/acpi/acpi_dev_interface.h
@@ -13,6 +13,7 @@ typedef enum {
 ACPI_NVDIMM_HOTPLUG_STATUS = 16,
 ACPI_VMGENID_CHANGE_STATUS = 32,
 ACPI_POWER_DOWN_STATUS = 64,
+ACPI_GENERIC_ERROR = 128,
 } AcpiEventStatusBits;
 
 #define TYPE_ACPI_DEVICE_IF "acpi-device-interface"
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index c18f68134246..f38e12971932 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -252,6 +252,7 @@ struct CrsRangeSet {
 /* Consumer/Producer */
 #define AML_SERIAL_BUS_FLAG_CONSUME_ONLY(1 << 1)
 
+#define ACPI_APEI_ERROR_DEVICE   "GEDD"
 /**
  * init_aml_allocator:
  *
@@ -382,6 +383,7 @@ Aml *aml_dma(AmlDmaType typ, AmlDmaBusMaster bm, 
AmlTransferSize sz,
  uint8_t channel);
 Aml *aml_sleep(uint64_t msec);
 Aml *aml_i2c_serial_bus_device(uint16_t address, const char *resource_source);
+Aml *aml_error_device(void);
 
 /* Block AML object primitives */
 Aml *aml_scope(const char *name_format, ...) G_GNUC_PRINTF(1, 2);
diff --git a/include/hw/acpi/generic_event_device.h 
b/include/hw/acpi/generic_event_device.h
index d2dac87b4a9f..1c18ac296fcb 100644
--- a/include/hw/acpi/generic_event_device.h
+++ b/include/hw/acpi/generic_event_device.h
@@ -101,6 +101,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED)
 #define ACPI_GED_PWR_DOWN_EVT  0x2
 #define ACPI_GED_NVDIMM_HOTPLUG_EVT 0x4
 #define ACPI_GED_CPU_HOTPLUG_EVT0x8
+#define ACPI_GED_ERROR_EVT  0x10
 
 typedef struct GEDState {
 MemoryRegion evt;
-- 
2.48.1




[PATCH 09/11] arm/virt: Wire up a GED error device for ACPI / GHES

2025-01-22 Thread Mauro Carvalho Chehab
Add support to the ARM virt machine for handling generic ACPI error
events via the GED and error source devices.

It is aligned with Linux Kernel patch:
https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.hu...@intel.com/

Co-authored-by: Mauro Carvalho Chehab 
Co-authored-by: Jonathan Cameron 
Signed-off-by: Jonathan Cameron 
Signed-off-by: Mauro Carvalho Chehab 
Acked-by: Igor Mammedov 

---

Changes from v8:

- Added a call to the function that produces GHES generic
  records, as this is now added earlier in this series.

Signed-off-by: Mauro Carvalho Chehab 
---
 hw/arm/virt-acpi-build.c |  1 +
 hw/arm/virt.c| 12 +++-
 include/hw/arm/virt.h|  1 +
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index ada5d08cfbe7..ae60268bdcc2 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -861,6 +861,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 }
 
 acpi_dsdt_add_power_button(scope);
+aml_append(scope, aml_error_device());
 #ifdef CONFIG_TPM
 acpi_dsdt_add_tpm(scope, vms);
 #endif
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 99e0a68b6c55..e272b35ea114 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -678,7 +678,7 @@ static inline DeviceState *create_acpi_ged(VirtMachineState 
*vms)
 DeviceState *dev;
 MachineState *ms = MACHINE(vms);
 int irq = vms->irqmap[VIRT_ACPI_GED];
-uint32_t event = ACPI_GED_PWR_DOWN_EVT;
+uint32_t event = ACPI_GED_PWR_DOWN_EVT | ACPI_GED_ERROR_EVT;
 
 if (ms->ram_slots) {
 event |= ACPI_GED_MEM_HOTPLUG_EVT;
@@ -1010,6 +1010,13 @@ static void virt_powerdown_req(Notifier *n, void *opaque)
 }
 }
 
+static void virt_generic_error_req(Notifier *n, void *opaque)
+{
+VirtMachineState *s = container_of(n, VirtMachineState, 
generic_error_notifier);
+
+acpi_send_event(s->acpi_dev, ACPI_GENERIC_ERROR);
+}
+
 static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
  uint32_t phandle)
 {
@@ -2404,6 +2411,9 @@ static void machvirt_init(MachineState *machine)
 
 if (has_ged && aarch64 && firmware_loaded && virt_is_acpi_enabled(vms)) {
 vms->acpi_dev = create_acpi_ged(vms);
+vms->generic_error_notifier.notify = virt_generic_error_req;
+notifier_list_add(&acpi_generic_error_notifiers,
+  &vms->generic_error_notifier);
 } else {
 create_gpio_devices(vms, VIRT_GPIO, sysmem);
 }
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index c8e94e6aedc9..f3cf28436770 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -176,6 +176,7 @@ struct VirtMachineState {
 DeviceState *gic;
 DeviceState *acpi_dev;
 Notifier powerdown_notifier;
+Notifier generic_error_notifier;
 PCIBus *bus;
 char *oem_id;
 char *oem_table_id;
-- 
2.48.1




[PATCH 6/9] nbd/server: Support inactive nodes

2025-01-22 Thread Kevin Wolf
In order to support running an NBD export on inactive nodes, we must
make sure to return errors for any operations that aren't allowed on
inactive nodes. Reads are the only operation we know we need for
inactive images, so to err on the side of caution, return errors for
everything else, even if some operations could possibly be okay.

Signed-off-by: Kevin Wolf 
---
 nbd/server.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/nbd/server.c b/nbd/server.c
index f64e47270c..2076fb2666 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2026,6 +2026,7 @@ static void nbd_export_delete(BlockExport *blk_exp)
 const BlockExportDriver blk_exp_nbd = {
 .type   = BLOCK_EXPORT_TYPE_NBD,
 .instance_size  = sizeof(NBDExport),
+.supports_inactive  = true,
 .create = nbd_export_create,
 .delete = nbd_export_delete,
 .request_shutdown   = nbd_export_request_shutdown,
@@ -2920,6 +2921,22 @@ static coroutine_fn int nbd_handle_request(NBDClient 
*client,
 NBDExport *exp = client->exp;
 char *msg;
 size_t i;
+bool inactive;
+
+WITH_GRAPH_RDLOCK_GUARD() {
+inactive = bdrv_is_inactive(blk_bs(exp->common.blk));
+if (inactive) {
+switch (request->type) {
+case NBD_CMD_READ:
+/* These commands are allowed on inactive nodes */
+break;
+default:
+/* Return an error for the rest */
+return nbd_send_generic_reply(client, request, -EPERM,
+  "export is inactive", errp);
+}
+}
+}
 
 switch (request->type) {
 case NBD_CMD_CACHE:
-- 
2.48.1




[PATCH 7/9] block: Add blockdev-set-active QMP command

2025-01-22 Thread Kevin Wolf
The system emulator tries to automatically activate and inactivate block
nodes at the right point during migration. However, there are still
cases where it's necessary that the user can do this manually.

Images are only activated on the destination VM of a migration when the
VM is actually resumed. If the VM was paused, this doesn't happen
automatically. The user may want to perform some operation on a block
device (e.g. taking a snapshot or starting a block job) without also
resuming the VM yet. This is an example where a manual command is
necessary.

Another example is VM migration when the image files are opened by an
external qemu-storage-daemon instance on each side. In this case, the
process that needs to hand over the images isn't even part of the
migration and can't know when the migration completes. Management tools
need a way to explicitly inactivate images on the source and activate
them on the destination.

This adds a new blockdev-set-active QMP command that lets the user
change the status of individual nodes (this is necessary in
qemu-storage-daemon because it could be serving multiple VMs and only
one of them migrates at a time). For convenience, operating on all
devices (like QEMU does automatically during migration) is offered as an
option, too, and can be used in the context of a single VM.
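
As a rough illustration of the per-node form (written in the style of the
iotests QMP helpers; "qsd_src"/"qsd_dst" and the node name "disk-fmt" are
assumed examples, only blockdev-set-active itself comes from this patch):

    # Minimal sketch: hand over one node from a source qemu-storage-daemon
    # to the destination one during the pre-switchover phase of migration.
    qsd_src.qmp('blockdev-set-active', {'node-name': 'disk-fmt', 'active': False})
    qsd_dst.qmp('blockdev-set-active', {'node-name': 'disk-fmt', 'active': True})

    # Omitting 'node-name' operates on all nodes, like QEMU does
    # automatically during migration.
    qsd_dst.qmp('blockdev-set-active', {'active': True})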

Signed-off-by: Kevin Wolf 
---
 qapi/block-core.json   | 32 ++
 include/block/block-global-state.h |  3 +++
 block.c| 21 
 blockdev.c | 32 ++
 4 files changed, 88 insertions(+)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 5bc164dbed..39c675a036 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4941,6 +4941,38 @@
 { 'command': 'blockdev-del', 'data': { 'node-name': 'str' },
   'allow-preconfig': true }
 
+##
+# @blockdev-set-active:
+#
+# Activate or inactivate a block device. Use this to manage the handover of block
+# devices on migration with qemu-storage-daemon.
+#
+# Activating a node automatically activates all of its child nodes first.
+# Inactivating a node automatically inactivates any of its child nodes that are
+# not in use by a still active node.
+#
+# @node-name: Name of the graph node to activate or inactivate. By default, all
+# nodes are affected by the operation.
+#
+# @active: true if the nodes should be active when the command returns success,
+# false if they should be inactive.
+#
+# Since: 10.0
+#
+# .. qmp-example::
+#
+# -> { "execute": "blockdev-set-active",
+#  "arguments": {
+#   "node-name": "node0",
+#   "active": false
+#  }
+#}
+# <- { "return": {} }
+##
+{ 'command': 'blockdev-set-active',
+  'data': { '*node-name': 'str', 'active': 'bool' },
+  'allow-preconfig': true }
+
 ##
 # @BlockdevCreateOptionsFile:
 #
diff --git a/include/block/block-global-state.h 
b/include/block/block-global-state.h
index a826bf5f78..9be34b3c99 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -184,6 +184,9 @@ bdrv_activate(BlockDriverState *bs, Error **errp);
 int coroutine_fn no_co_wrapper_bdrv_rdlock
 bdrv_co_activate(BlockDriverState *bs, Error **errp);
 
+int no_coroutine_fn
+bdrv_inactivate(BlockDriverState *bs, Error **errp);
+
 void bdrv_activate_all(Error **errp);
 int bdrv_inactivate_all(void);
 
diff --git a/block.c b/block.c
index 76cddd6757..ebb6a7baeb 100644
--- a/block.c
+++ b/block.c
@@ -7040,6 +7040,27 @@ bdrv_inactivate_recurse(BlockDriverState *bs, bool 
top_level)
 return 0;
 }
 
+int bdrv_inactivate(BlockDriverState *bs, Error **errp)
+{
+int ret;
+
+GLOBAL_STATE_CODE();
+GRAPH_RDLOCK_GUARD_MAINLOOP();
+
+if (bdrv_has_bds_parent(bs, true)) {
+error_setg(errp, "Node has active parent node");
+return -EPERM;
+}
+
+ret = bdrv_inactivate_recurse(bs, true);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to inactivate node");
+return ret;
+}
+
+return 0;
+}
+
 int bdrv_inactivate_all(void)
 {
 BlockDriverState *bs = NULL;
diff --git a/blockdev.c b/blockdev.c
index 218024497b..874cef0299 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3455,6 +3455,38 @@ void qmp_blockdev_del(const char *node_name, Error 
**errp)
 bdrv_unref(bs);
 }
 
+void qmp_blockdev_set_active(const char *node_name, bool active, Error **errp)
+{
+int ret;
+
+GLOBAL_STATE_CODE();
+GRAPH_RDLOCK_GUARD_MAINLOOP();
+
+if (!node_name) {
+if (active) {
+bdrv_activate_all(errp);
+} else {
+ret = bdrv_inactivate_all();
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to inactivate all nodes");
+}
+}
+} else {
+BlockDriverState *bs = bdrv_find_node(node_name);
+if (!bs) {
+error_setg(errp, "Failed to find node with node-name=

[PATCH 9/9] iotests: Add qsd-migrate case

2025-01-22 Thread Kevin Wolf
Test that it's possible to migrate a VM that uses an image on shared
storage through qemu-storage-daemon.

Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/tests/qsd-migrate | 132 +++
 tests/qemu-iotests/tests/qsd-migrate.out |  51 +
 2 files changed, 183 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/qsd-migrate
 create mode 100644 tests/qemu-iotests/tests/qsd-migrate.out

diff --git a/tests/qemu-iotests/tests/qsd-migrate 
b/tests/qemu-iotests/tests/qsd-migrate
new file mode 100755
index 00..687bda6f93
--- /dev/null
+++ b/tests/qemu-iotests/tests/qsd-migrate
@@ -0,0 +1,132 @@
+#!/usr/bin/env python3
+# group: rw quick
+#
+# Copyright (C) Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+# Creator/Owner: Kevin Wolf 
+
+import iotests
+
+from iotests import filter_qemu_io, filter_qtest
+
+iotests.script_initialize(supported_fmts=['generic'],
+  supported_protocols=['file'],
+  supported_platforms=['linux'])
+
+with iotests.FilePath('disk.img') as path, \
+ iotests.FilePath('nbd-src.sock', base_dir=iotests.sock_dir) as nbd_src, \
+ iotests.FilePath('nbd-dst.sock', base_dir=iotests.sock_dir) as nbd_dst, \
+ iotests.FilePath('migrate.sock', base_dir=iotests.sock_dir) as mig_sock, \
+ iotests.VM(path_suffix="-src") as vm_src, \
+ iotests.VM(path_suffix="-dst") as vm_dst:
+
+img_size = '10M'
+
+iotests.log('Preparing disk...')
+iotests.qemu_img_create('-f', iotests.imgfmt, path, img_size)
+
+iotests.log('Launching source QSD...')
+qsd_src = iotests.QemuStorageDaemon(
+'--blockdev', f'file,node-name=disk-file,filename={path}',
+'--blockdev', f'{iotests.imgfmt},file=disk-file,node-name=disk-fmt',
+'--nbd-server', f'addr.type=unix,addr.path={nbd_src}',
+'--export', 'nbd,id=exp0,node-name=disk-fmt,writable=true,'
+'allow-inactive=true',
+qmp=True,
+)
+
+iotests.log('Launching source VM...')
+vm_src.add_args('-blockdev', f'nbd,node-name=disk,server.type=unix,'
+ f'server.path={nbd_src},export=disk-fmt')
+vm_src.add_args('-device', 'virtio-blk,drive=disk,id=virtio0')
+vm_src.launch()
+
+iotests.log('Launching destination QSD...')
+qsd_dst = iotests.QemuStorageDaemon(
+'--blockdev', f'file,node-name=disk-file,filename={path},active=off',
+'--blockdev', f'{iotests.imgfmt},file=disk-file,node-name=disk-fmt,'
+  f'active=off',
+'--nbd-server', f'addr.type=unix,addr.path={nbd_dst}',
+'--export', 'nbd,id=exp0,node-name=disk-fmt,writable=true,'
+'allow-inactive=true',
+qmp=True,
+instance_id='b',
+)
+
+iotests.log('Launching destination VM...')
+vm_dst.add_args('-blockdev', f'nbd,node-name=disk,server.type=unix,'
+ f'server.path={nbd_dst},export=disk-fmt')
+vm_dst.add_args('-device', 'virtio-blk,drive=disk,id=virtio0')
+vm_dst.add_args('-incoming', f'unix:{mig_sock}')
+vm_dst.launch()
+
+iotests.log('\nTest I/O on the source')
+vm_src.hmp_qemu_io('virtio0/virtio-backend', 'write -P 0x11 0 4k',
+   use_log=True, qdev=True)
+vm_src.hmp_qemu_io('virtio0/virtio-backend', 'read -P 0x11 0 4k',
+   use_log=True, qdev=True)
+
+iotests.log('\nStarting migration...')
+
+mig_caps = [
+{'capability': 'events', 'state': True},
+{'capability': 'pause-before-switchover', 'state': True},
+]
+vm_src.qmp_log('migrate-set-capabilities', capabilities=mig_caps)
+vm_dst.qmp_log('migrate-set-capabilities', capabilities=mig_caps)
+vm_src.qmp_log('migrate', uri=f'unix:{mig_sock}',
+   filters=[iotests.filter_qmp_testfiles])
+
+vm_src.event_wait('MIGRATION',
+  match={'data': {'status': 'pre-switchover'}})
+
+iotests.log('\nPre-switchover: Reconfigure QSD instances')
+
+iotests.log(qsd_src.qmp('blockdev-set-active', {'active': False}))
+iotests.log(qsd_dst.qmp('blockdev-set-active', {'active': True}))
+
+iotests.log('\nCompleting migration...')
+
+vm_src.qmp_log('migrate-continue', state='pre-switchover')
+vm_dst.event_wait('MIGRATION', match={'data': {'status': 'completed'}})
+
+iotests.lo

[PATCH 0/9] block: Managing inactive nodes (QSD migration)

2025-01-22 Thread Kevin Wolf
This series adds a mechanism that allows the user or management tool to
manually activate and inactivate block nodes instead of fully relying on
the automatic management in the migration code.

One case where this is needed is for migration with shared storage and
devices backed by qemu-storage-daemon, which as an external process is
not involved in the VM migration. Management tools can manually
orchestrate the handover in this scenario. The new qemu-iotests case
qsd-migrate demonstrates this.

There are other cases without qemu-storage-daemon where manual
management is necessary. For example, after migration, the destination
VM only activates images on 'cont', but after migrating a paused VM, the
user may want to perform operations on a block node while the VM is
still paused.

This series adds support for block exports on an inactive node (needed
for shared storage migration with qemu-storage-daemon) only to NBD.
Adding it to other export types will be done in a future series.

Kevin Wolf (9):
  block: Allow inactivating already inactive nodes
  block: Add option to create inactive nodes
  block: Support inactive nodes in blk_insert_bs()
  block/export: Don't ignore image activation error in blk_exp_add()
  block/export: Add option to allow export of inactive nodes
  nbd/server: Support inactive nodes
  block: Add blockdev-set-active QMP command
  iotests: Add filter_qtest()
  iotests: Add qsd-migrate case

 qapi/block-core.json  |  38 +
 qapi/block-export.json|  10 +-
 include/block/block-common.h  |   1 +
 include/block/block-global-state.h|   6 +
 include/block/export.h|   3 +
 block.c   |  50 ++-
 block/block-backend.c |  14 +-
 block/export/export.c |  29 +++-
 blockdev.c|  32 +
 nbd/server.c  |  17 +++
 tests/qemu-iotests/iotests.py |   4 +
 tests/qemu-iotests/041|   4 +-
 tests/qemu-iotests/165|   4 +-
 tests/qemu-iotests/tests/copy-before-write|   3 +-
 tests/qemu-iotests/tests/migrate-bitmaps-test |   7 +-
 tests/qemu-iotests/tests/qsd-migrate  | 132 ++
 tests/qemu-iotests/tests/qsd-migrate.out  |  51 +++
 17 files changed, 379 insertions(+), 26 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/qsd-migrate
 create mode 100644 tests/qemu-iotests/tests/qsd-migrate.out

-- 
2.48.1




[PATCH 2/9] block: Add option to create inactive nodes

2025-01-22 Thread Kevin Wolf
In QEMU, nodes are automatically created inactive while expecting an
incoming migration (i.e. RUN_STATE_INMIGRATE). In qemu-storage-daemon,
the notion of runstates doesn't exist. It also wouldn't necessarily make
sense to introduce it because a single daemon can serve multiple VMs
that can be in different states.

Therefore, allow the user to explicitly open images as inactive with a
new option. The default is as before: Nodes are usually active, except
when created during RUN_STATE_INMIGRATE.
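
For illustration, a minimal sketch of what opening a node inactive could
look like over QMP (iotests style; "vm", the node names and the filename
are assumed examples, only the new 'active' option comes from this patch):

    # Minimal sketch: add a qcow2 node that starts out inactive, including
    # its protocol (file) child node.
    vm.qmp('blockdev-add', {
        'driver': 'qcow2',
        'node-name': 'disk-fmt',
        'active': False,
        'file': {
            'driver': 'file',
            'filename': '/path/to/disk.qcow2',
            'active': False,
        },
    })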

Signed-off-by: Kevin Wolf 
---
 qapi/block-core.json | 6 ++
 include/block/block-common.h | 1 +
 block.c  | 9 +
 3 files changed, 16 insertions(+)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index fd3bcc1c17..5bc164dbed 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4679,6 +4679,11 @@
 #
 # @cache: cache-related options
 #
+# @active: whether the block node should be activated (default: true).
+# Having inactive block nodes is useful primarily for migration because it
+# allows opening an image on the destination while the source is still
+# holding locks for it. (Since 10.0)
+#
 # @read-only: whether the block device should be read-only (default:
 # false).  Note that some block drivers support only read-only
 # access, either generally or in certain configurations.  In this
@@ -4705,6 +4710,7 @@
 '*node-name': 'str',
 '*discard': 'BlockdevDiscardOptions',
 '*cache': 'BlockdevCacheOptions',
+'*active': 'bool',
 '*read-only': 'bool',
 '*auto-read-only': 'bool',
 '*force-share': 'bool',
diff --git a/include/block/block-common.h b/include/block/block-common.h
index 338fe5ff7a..7030669f04 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -257,6 +257,7 @@ typedef enum {
 #define BDRV_OPT_AUTO_READ_ONLY "auto-read-only"
 #define BDRV_OPT_DISCARD"discard"
 #define BDRV_OPT_FORCE_SHARE"force-share"
+#define BDRV_OPT_ACTIVE "active"
 
 
 #define BDRV_SECTOR_BITS   9
diff --git a/block.c b/block.c
index 43ed632a7a..2740a95a72 100644
--- a/block.c
+++ b/block.c
@@ -1573,6 +1573,10 @@ static void update_flags_from_options(int *flags, 
QemuOpts *opts)
 if (qemu_opt_get_bool_del(opts, BDRV_OPT_AUTO_READ_ONLY, false)) {
 *flags |= BDRV_O_AUTO_RDONLY;
 }
+
+if (!qemu_opt_get_bool_del(opts, BDRV_OPT_ACTIVE, true)) {
+*flags |= BDRV_O_INACTIVE;
+}
 }
 
 static void update_options_from_flags(QDict *options, int flags)
@@ -1799,6 +1803,11 @@ QemuOptsList bdrv_runtime_opts = {
 .type = QEMU_OPT_BOOL,
 .help = "Ignore flush requests",
 },
+{
+.name = BDRV_OPT_ACTIVE,
+.type = QEMU_OPT_BOOL,
+.help = "Node is activated",
+},
 {
 .name = BDRV_OPT_READ_ONLY,
 .type = QEMU_OPT_BOOL,
-- 
2.48.1




[PATCH 4/9] block/export: Don't ignore image activation error in blk_exp_add()

2025-01-22 Thread Kevin Wolf
Currently, block jobs can't handle inactive images correctly. Incoming
write requests would run into assertion failures. Make sure that we
return an error if creating an export can't activate the image.

Signed-off-by: Kevin Wolf 
---
 block/export/export.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/export/export.c b/block/export/export.c
index 79c71ee245..bac42b8608 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -145,7 +145,11 @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error 
**errp)
  * ctx was acquired in the caller.
  */
 bdrv_graph_rdlock_main_loop();
-bdrv_activate(bs, NULL);
+ret = bdrv_activate(bs, errp);
+if (ret < 0) {
+bdrv_graph_rdunlock_main_loop();
+goto fail;
+}
 bdrv_graph_rdunlock_main_loop();
 
 perm = BLK_PERM_CONSISTENT_READ;
-- 
2.48.1




[PATCH 5/9] block/export: Add option to allow export of inactive nodes

2025-01-22 Thread Kevin Wolf
Add an option in BlockExportOptions to allow creating an export on an
inactive node without activating the node. This mode needs to be
explicitly supported by the export type (so that it doesn't perform any
operations that are forbidden for inactive nodes), so this patch alone
doesn't allow this option to be successfully used yet.
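
For illustration, a sketch of the QMP form this enables once an export
driver supports it (iotests style; "qsd" and the node/export names are
assumed examples, and block-export-add is the existing export creation
command -- only 'allow-inactive' comes from this patch):

    # Minimal sketch: create an NBD export that tolerates an inactive node.
    # NBD only gains support for this later in the series.
    qsd.qmp('block-export-add', {
        'type': 'nbd',
        'id': 'exp0',
        'node-name': 'disk-fmt',
        'writable': True,
        'allow-inactive': True,
    })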

Signed-off-by: Kevin Wolf 
---
 qapi/block-export.json | 10 +-
 include/block/block-global-state.h |  3 +++
 include/block/export.h |  3 +++
 block.c|  4 
 block/export/export.c  | 31 --
 5 files changed, 40 insertions(+), 11 deletions(-)

diff --git a/qapi/block-export.json b/qapi/block-export.json
index ce33fe378d..117b05d13c 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -372,6 +372,13 @@
 # cannot be moved to the iothread.  The default is false.
 # (since: 5.2)
 #
+# @allow-inactive: If true, the export allows the exported node to be inactive.
+# If it is created for an inactive block node, the node remains inactive. 
If
+# the export type doesn't support running on an inactive node, an error is
+# returned. If false, inactive block nodes are automatically activated 
before
+# creating the export and trying to inactivate them later fails.
+# (since: 10.0; default: false)
+#
 # Since: 4.2
 ##
 { 'union': 'BlockExportOptions',
@@ -381,7 +388,8 @@
 '*iothread': 'str',
 'node-name': 'str',
 '*writable': 'bool',
-'*writethrough': 'bool' },
+'*writethrough': 'bool',
+'*allow-inactive': 'bool' },
   'discriminator': 'type',
   'data': {
   'nbd': 'BlockExportOptionsNbd',
diff --git a/include/block/block-global-state.h 
b/include/block/block-global-state.h
index bd7cecd1cf..a826bf5f78 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -175,6 +175,9 @@ BlockDriverState * GRAPH_RDLOCK
 check_to_replace_node(BlockDriverState *parent_bs, const char *node_name,
   Error **errp);
 
+
+bool GRAPH_RDLOCK bdrv_is_inactive(BlockDriverState *bs);
+
 int no_coroutine_fn GRAPH_RDLOCK
 bdrv_activate(BlockDriverState *bs, Error **errp);
 
diff --git a/include/block/export.h b/include/block/export.h
index f2fe0f8078..4bd9531d4d 100644
--- a/include/block/export.h
+++ b/include/block/export.h
@@ -29,6 +29,9 @@ typedef struct BlockExportDriver {
  */
 size_t instance_size;
 
+/* True if the export type supports running on an inactive node */
+bool supports_inactive;
+
 /* Creates and starts a new block export */
 int (*create)(BlockExport *, BlockExportOptions *, Error **);
 
diff --git a/block.c b/block.c
index 2740a95a72..76cddd6757 100644
--- a/block.c
+++ b/block.c
@@ -6833,6 +6833,10 @@ void bdrv_init_with_whitelist(void)
 bdrv_init();
 }
 
+bool bdrv_is_inactive(BlockDriverState *bs) {
+return bs->open_flags & BDRV_O_INACTIVE;
+}
+
 int bdrv_activate(BlockDriverState *bs, Error **errp)
 {
 BdrvChild *child, *parent;
diff --git a/block/export/export.c b/block/export/export.c
index bac42b8608..f3bbf11070 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -75,6 +75,7 @@ static const BlockExportDriver 
*blk_exp_find_driver(BlockExportType type)
 BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
 {
 bool fixed_iothread = export->has_fixed_iothread && export->fixed_iothread;
+bool allow_inactive = export->has_allow_inactive && export->allow_inactive;
 const BlockExportDriver *drv;
 BlockExport *exp = NULL;
 BlockDriverState *bs;
@@ -138,17 +139,24 @@ BlockExport *blk_exp_add(BlockExportOptions *export, 
Error **errp)
 }
 }
 
-/*
- * Block exports are used for non-shared storage migration. Make sure
- * that BDRV_O_INACTIVE is cleared and the image is ready for write
- * access since the export could be available before migration handover.
- * ctx was acquired in the caller.
- */
 bdrv_graph_rdlock_main_loop();
-ret = bdrv_activate(bs, errp);
-if (ret < 0) {
-bdrv_graph_rdunlock_main_loop();
-goto fail;
+if (allow_inactive) {
+if (!drv->supports_inactive) {
+error_setg(errp, "Export type does not support inactive exports");
+bdrv_graph_rdunlock_main_loop();
+goto fail;
+}
+} else {
+/*
+ * Block exports are used for non-shared storage migration. Make sure
+ * that BDRV_O_INACTIVE is cleared and the image is ready for write
+ * access since the export could be available before migration 
handover.
+ */
+ret = bdrv_activate(bs, errp);
+if (ret < 0) {
+bdrv_graph_rdunlock_main_loop();
+goto fail;
+}
 }
 bdrv_graph_rdunlock_main_loop();
 
@@ -162,6 +170,9 @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error 
*

[PATCH 3/9] block: Support inactive nodes in blk_insert_bs()

2025-01-22 Thread Kevin Wolf
Device models have a relatively complex way to set up their block
backends, in which blk_attach_dev() sets blk->disable_perm = true.
We want to support inactive images in exports, too, so that
qemu-storage-daemon can be used with migration. Because they don't use
blk_attach_dev(), they need another way to set this flag. The most
convenient way is to do this automatically when an inactive node is attached
to a BlockBackend.

Signed-off-by: Kevin Wolf 
---
 block/block-backend.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index b610582644..76ec5f0576 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -900,14 +900,24 @@ void blk_remove_bs(BlockBackend *blk)
 int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp)
 {
 ThrottleGroupMember *tgm = &blk->public.throttle_group_member;
+uint64_t perm, shared_perm;
 
 GLOBAL_STATE_CODE();
 bdrv_ref(bs);
 bdrv_graph_wrlock();
+
+if (bs->open_flags & BDRV_O_INACTIVE) {
+blk->disable_perm = true;
+perm = 0;
+shared_perm = BLK_PERM_ALL;
+} else {
+perm = blk->perm;
+shared_perm = blk->shared_perm;
+}
+
 blk->root = bdrv_root_attach_child(bs, "root", &child_root,
BDRV_CHILD_FILTERED | 
BDRV_CHILD_PRIMARY,
-   blk->perm, blk->shared_perm,
-   blk, errp);
+   perm, shared_perm, blk, errp);
 bdrv_graph_wrunlock();
 if (blk->root == NULL) {
 return -EPERM;
-- 
2.48.1




Re: [PATCH 05/10] rust: vmstate: implement VMState for scalar types

2025-01-22 Thread Zhao Liu
On Fri, Jan 17, 2025 at 10:00:41AM +0100, Paolo Bonzini wrote:
> Date: Fri, 17 Jan 2025 10:00:41 +0100
> From: Paolo Bonzini 
> Subject: [PATCH 05/10] rust: vmstate: implement VMState for scalar types
> X-Mailer: git-send-email 2.47.1
> 
> Scalar types are those that have their own VMStateInfo.  This poses
> a problem in that references to VMStateInfo can only be included in
> associated consts starting with Rust 1.83.0, when the const_refs_static
> was stabilized.  Removing the requirement is done by placing a limited
> list of VMStateInfos in an enum, and going from enum to &VMStateInfo
> only when building the VMStateField.
> 
> The same thing cannot be done with VMS_STRUCT because the set of
> VMStateDescriptions extends to structs defined by the devices.
> Therefore, structs and cells cannot yet use vmstate_of!.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  rust/qemu-api/src/vmstate.rs | 128 ++-
>  1 file changed, 126 insertions(+), 2 deletions(-)


>  /// Internal utility function to retrieve a type's `VMStateField`;
>  /// used by [`vmstate_of!`](crate::vmstate_of).
>  pub const fn vmstate_base(_: PhantomData) -> VMStateField {
> @@ -99,6 +178,15 @@ pub const fn vmstate_varray_flag(_: 
> PhantomData) -> VMStateField
>  /// Return the `VMStateField` for a field of a struct.  The field must be
>  /// visible in the current scope.
>  ///
> +/// Only a limited set of types is supported out of the box:
> +/// * scalar types (integer and `bool`)
> +/// * the C struct `QEMUTimer`
> +/// * a transparent wrapper for any of the above (`Cell`, `UnsafeCell`,
> +///   [`BqlCell`](crate::cell::BqlCell), 
> [`BqlRefCell`](crate::cell::BqlRefCell)
> +/// * a raw pointer to any of the above
> +/// * a `NonNull` pointer to any of the above, possibly wrapped with `Option`

I just found your rust-next has already updated and removed `Option` :-)

> +/// * an array of any of the above
> +///
>  /// In order to support other types, the trait `VMState` must be implemented
>  /// for them.
>  #[macro_export]
> @@ -109,8 +197,14 @@ macro_rules! vmstate_of {
>  .as_bytes()
>  .as_ptr() as *const ::std::os::raw::c_char,
>  offset: $crate::offset_of!($struct_name, $field_name),
> -// Compute most of the VMStateField from the type of the field.

Rebase mistake? There seems to be no need to delete this comment.

>  $(.num_offset: $crate::offset_of!($struct_name, $num),)?
> +// The calls to `call_func_with_field!` are the magic that
> +// computes most of the VMStateField from the type of the field.
> +info: $crate::info_enum_to_ref!($crate::call_func_with_field!(
> +$crate::vmstate::vmstate_scalar_type,
> +$struct_name,
> +$field_name
> +)),
>

Only a nit above,

Reviewed-by: Zhao Liu 




Re: [PATCH] hw/boards: Convert MachineClass bitfields to boolean

2025-01-22 Thread BALATON Zoltan

On Wed, 22 Jan 2025, Philippe Mathieu-Daudé wrote:

As Daniel mentioned:

"The number of instances of MachineClass is not large enough
 that we save a useful amount of memory through bitfields."

Also, see recent commit ecbf3567e21 ("docs/devel/style: add a
section about bitfield, and disallow them for packed structures").


Packed structs are used when the layout is important, and MachineClass is 
not one of those, so this does not apply here. This looks like just churn 
without any advantage, but it's also not large, so I'm not opposed to it; 
I just don't see why this would be any better.


Regards,
BALATON Zoltan


Convert the MachineClass bitfields used as boolean as real ones.

Suggested-by: Daniel P. Berrangé 
Signed-off-by: Philippe Mathieu-Daudé 
---
include/hw/boards.h| 14 +++---
hw/arm/aspeed.c|  6 +++---
hw/arm/fby35.c |  4 ++--
hw/arm/npcm7xx_boards.c|  6 +++---
hw/arm/raspi.c |  6 +++---
hw/arm/sbsa-ref.c  |  2 +-
hw/arm/virt.c  |  2 +-
hw/arm/xilinx_zynq.c   |  2 +-
hw/avr/arduino.c   |  6 +++---
hw/core/null-machine.c | 10 +-
hw/i386/microvm.c  |  2 +-
hw/i386/pc_piix.c  |  2 +-
hw/i386/pc_q35.c   |  4 ++--
hw/loongarch/virt.c|  2 +-
hw/m68k/virt.c |  6 +++---
hw/ppc/pnv.c   |  2 +-
hw/ppc/spapr.c |  2 +-
hw/riscv/virt.c|  2 +-
hw/s390x/s390-virtio-ccw.c |  8 
hw/xtensa/sim.c|  2 +-
20 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index 2ad711e56db..ff5904d6fd8 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -279,13 +279,13 @@ struct MachineClass {
int max_cpus;
int min_cpus;
int default_cpus;
-unsigned int no_serial:1,
-no_parallel:1,
-no_floppy:1,
-no_cdrom:1,
-no_sdcard:1,
-pci_allow_0_address:1,
-legacy_fw_cfg_order:1;
+bool no_serial;
+bool no_parallel;
+bool no_floppy;
+bool no_cdrom;
+bool no_sdcard;
+bool pci_allow_0_address;
+bool legacy_fw_cfg_order;
bool is_default;
const char *default_machine_opts;
const char *default_boot_order;
diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index a18d4ed1fb1..dc91052e94d 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -1225,9 +1225,9 @@ static void aspeed_machine_class_init(ObjectClass *oc, 
void *data)
AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);

mc->init = aspeed_machine_init;
-mc->no_floppy = 1;
-mc->no_cdrom = 1;
-mc->no_parallel = 1;
+mc->no_floppy = true;
+mc->no_cdrom = true;
+mc->no_parallel = true;
mc->default_ram_id = "ram";
amc->macs_mask = ASPEED_MAC0_ON;
amc->uart_default = ASPEED_DEV_UART5;
diff --git a/hw/arm/fby35.c b/hw/arm/fby35.c
index 83d08e578b7..04d0eb9b0c1 100644
--- a/hw/arm/fby35.c
+++ b/hw/arm/fby35.c
@@ -168,8 +168,8 @@ static void fby35_class_init(ObjectClass *oc, void *data)

mc->desc = "Meta Platforms fby35";
mc->init = fby35_init;
-mc->no_floppy = 1;
-mc->no_cdrom = 1;
+mc->no_floppy = true;
+mc->no_cdrom = true;
mc->min_cpus = mc->max_cpus = mc->default_cpus = 3;

object_class_property_add_bool(oc, "execute-in-place",
diff --git a/hw/arm/npcm7xx_boards.c b/hw/arm/npcm7xx_boards.c
index 7727e0dc4bb..c9735b357cd 100644
--- a/hw/arm/npcm7xx_boards.c
+++ b/hw/arm/npcm7xx_boards.c
@@ -461,9 +461,9 @@ static void npcm7xx_machine_class_init(ObjectClass *oc, 
void *data)
NULL
};

-mc->no_floppy = 1;
-mc->no_cdrom = 1;
-mc->no_parallel = 1;
+mc->no_floppy = true;
+mc->no_cdrom = true;
+mc->no_parallel = true;
mc->default_ram_id = "ram";
mc->valid_cpu_types = valid_cpu_types;
}
diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index a7a662f40db..665ccd9b50b 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -322,9 +322,9 @@ void raspi_machine_class_common_init(MachineClass *mc,
   board_type(board_rev),
   FIELD_EX32(board_rev, REV_CODE, REVISION));
mc->block_default_type = IF_SD;
-mc->no_parallel = 1;
-mc->no_floppy = 1;
-mc->no_cdrom = 1;
+mc->no_parallel = true;
+mc->no_floppy = true;
+mc->no_cdrom = true;
mc->default_cpus = mc->min_cpus = mc->max_cpus = cores_count(board_rev);
mc->default_ram_size = board_ram_size(board_rev);
mc->default_ram_id = "ram";
diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index 6183111f2de..33c6b9ea3ec 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -899,7 +899,7 @@ static void sbsa_ref_class_init(ObjectClass *oc, void *data)
mc->pci_allow_0_address = true;
mc->minimum_page_bits = 12;
mc->block_default_type = IF_IDE;
-mc->no_cdrom = 1;
+mc->no_cdrom = true;
mc->default_nic = "e1000e";
mc->default_ram_size = 1 * GiB;
mc->default_ram_id = "sbsa-ref.ram";
diff -

Re: [PATCH v3] hw/i386/cpu: remove default_cpu_version and simplify

2025-01-22 Thread Ani Sinha
On Tue, Jan 21, 2025 at 4:58 PM Zhao Liu  wrote:
>
> Hi Ani,
>
> Sorry for late reply.
>
> On Thu, Jan 16, 2025 at 09:04:18AM +0530, Ani Sinha wrote:
> > Date: Thu, 16 Jan 2025 09:04:18 +0530
> > From: Ani Sinha 
> > Subject: [PATCH v3] hw/i386/cpu: remove default_cpu_version and simplify
> > X-Mailer: git-send-email 2.45.2
> >
> > commit 0788a56bd1ae3 ("i386: Make unversioned CPU models be aliases")
> > introduced 'default_cpu_version' for PCMachineClass. This created three
> > categories of CPU models:
> >  - Most unversioned CPU models would use version 1 by default.
> >  - For machines 4.0.1 and older that do not support cpu model aliases, a
> >special default_cpu_version value of CPU_VERSION_LEGACY is used.
> >  - It was thought that future machines would use the latest value of cpu
> >versions corresponding to default_cpu_version value of
> >CPU_VERSION_LATEST [1].
> >
> > All pc machines still use the default cpu version of 1 for
> > unversioned cpu models. CPU_VERSION_LATEST is a moving target and
> > changes with time. Therefore, if machines use CPU_VERSION_LATEST, it would
> > mean that over a period of time, for the same machine type, the cpu version
> > would be different depending on what is latest at that time. This would
> > break guests even when they use a constant machine type. Therefore, for
> > pc machines, use of CPU_VERSION_LATEST is not possible. Currently, only
> > microvms use CPU_VERSION_LATEST.
> >
> > This change cleans up the complicated logic around default_cpu_version
> > including getting rid of default_cpu_version property itself. A couple of 
> > new
> > flags are introduced, one for the legacy model for machines 4.0.1 and older
> > and other for microvms. For older machines, a new pc machine property is
> > introduced that separates pc machine versions 4.0.1 and older from the newer
> > machines. 4.0.1 and older machines are scheduled to be deleted towards
> > end of 2025 since they would be 6 years old by then. At that time, we can
> > remove all logic around legacy cpus. Microvms are the only machines that
> > continue to use the latest cpu version. If this changes later, we can
> > remove all logic around x86_cpu_model_last_version(). Default cpu version
> > for unversioned cpu models is hardcoded to the value 1 and applies
> > unconditionally for all pc machine types of version 4.1 and above.
> >
> > This change also removes all complications around CPU_VERSION_AUTO
> > including removal of the value itself.
>
> I like the idea to remove CPU_VERSION_AUTO. Though this patch introduces
> 2 more new static variables ("use_legacy_cpu" and "use_last_cpu_version"),
> as you said, once 4.0.1 and older machines are removed, it's easy to
> clean up "use_legacy_cpu".
>
> > 1) See commit dcafd1ef0af227 ("i386: Register versioned CPU models")
> >
> > CC: imamm...@redhat.com
> > Signed-off-by: Ani Sinha 
> > ---
>
> [snip]
>
> > -void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
> > +void x86_legacy_cpus_init(X86MachineState *x86ms)
> > +{
> > +machine_uses_legacy_cpu();
> > +x86_cpus_init(x86ms);
> > +}
> > +
> > +void x86_cpus_init_with_latest_cpu_version(X86MachineState *x86ms)
> > +{
> > +x86_cpu_uses_lastest_version();
> > +x86_cpus_init(x86ms);
> > +}
>
> Could we simplify it even further, i.e., omit these two new helpers and
> just add x86_cpu_uses_lastest_version() and machine_uses_legacy_cpu() to
> the initialization of the PC & microvm, e.g.,
>
> --- a/hw/i386/microvm.c
> +++ b/hw/i386/microvm.c
> @@ -458,7 +458,8 @@ static void microvm_machine_state_init(MachineState 
> *machine)
>
>  microvm_memory_init(mms);
>
> -x86_cpus_init_with_latest_cpu_version(x86ms);
> +x86_cpu_uses_lastest_version();
> +x86_cpus_init(x86ms);
>
>  microvm_devices_init(mms);
>  }
>
> and
>
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -138,11 +138,10 @@ static inline void pc_init_cpus(MachineState *ms)
>
>  if (pcmc->no_versioned_cpu_model) {
>  /* use legacy cpu as it does not support versions */
> -x86_legacy_cpus_init(x86ms);
> -} else {
> -/* use non-legacy cpus */
> -x86_cpus_init(x86ms);
> +machine_uses_legacy_cpu();
>  }
> +
> +x86_cpus_init(x86ms);
>  }

yeah this simplifies things a bit.

>
>  /* ioapic.c */
>
> [snip]
>
> > diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> > index a558705cb9..ad43a233d8 100644
> > --- a/include/hw/i386/pc.h
> > +++ b/include/hw/i386/pc.h
> > @@ -92,9 +92,6 @@ struct PCMachineClass {
> >
> >  /* Compat options: */
> >
> > -/* Default CPU model version.  See x86_cpu_set_default_version(). */
> > -int default_cpu_version;
> > -
> >  /* ACPI compat: */
> >  bool has_acpi_build;
> >  int pci_root_uid;
> > @@ -125,11 +122,29 @@ struct PCMachineClass {
> >   * check for memory.
> >   */
> >  bool broken_32bit_mem_addr_check;
> > +
> > +/* whether the machine supports versio

[PATCH 1/9] block: Allow inactivating already inactive nodes

2025-01-22 Thread Kevin Wolf
What we wanted to catch with the assertion is cases where the recursion
finds that a child was inactive before its parent. This should never
happen. But if the user tries to inactivate an image that is already
inactive, that's harmless and we don't want to fail the assertion.

Signed-off-by: Kevin Wolf 
---
 block.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index f60606f242..43ed632a7a 100644
--- a/block.c
+++ b/block.c
@@ -6955,7 +6955,8 @@ bdrv_has_bds_parent(BlockDriverState *bs, bool 
only_active)
 return false;
 }
 
-static int GRAPH_RDLOCK bdrv_inactivate_recurse(BlockDriverState *bs)
+static int GRAPH_RDLOCK
+bdrv_inactivate_recurse(BlockDriverState *bs, bool top_level)
 {
 BdrvChild *child, *parent;
 int ret;
@@ -6973,7 +6974,14 @@ static int GRAPH_RDLOCK 
bdrv_inactivate_recurse(BlockDriverState *bs)
 return 0;
 }
 
-assert(!(bs->open_flags & BDRV_O_INACTIVE));
+/*
+ * Inactivating an already inactive node on user request is harmless, but 
if
+ * a child is already inactive before its parent, that's bad.
+ */
+if (bs->open_flags & BDRV_O_INACTIVE) {
+assert(top_level);
+return 0;
+}
 
 /* Inactivate this node */
 if (bs->drv->bdrv_inactivate) {
@@ -7010,7 +7018,7 @@ static int GRAPH_RDLOCK 
bdrv_inactivate_recurse(BlockDriverState *bs)
 
 /* Recursively inactivate children */
 QLIST_FOREACH(child, &bs->children, next) {
-ret = bdrv_inactivate_recurse(child->bs);
+ret = bdrv_inactivate_recurse(child->bs, false);
 if (ret < 0) {
 return ret;
 }
@@ -7035,7 +7043,7 @@ int bdrv_inactivate_all(void)
 if (bdrv_has_bds_parent(bs, false)) {
 continue;
 }
-ret = bdrv_inactivate_recurse(bs);
+ret = bdrv_inactivate_recurse(bs, true);
 if (ret < 0) {
 bdrv_next_cleanup(&it);
 break;
-- 
2.48.1




Re: [PATCH v5 4/8] virtio-gpu: Support asynchronous fencing

2025-01-22 Thread Dmitry Osipenko
On 1/20/25 16:56, Alex Bennée wrote:
...
>> @@ -972,15 +973,29 @@ void virtio_gpu_virgl_process_cmd(VirtIOGPU *g,
>>  
>>  trace_virtio_gpu_fence_ctrl(cmd->cmd_hdr.fence_id, cmd->cmd_hdr.type);
>>  
>> -/*
>> - * Unlike other virglrenderer functions, this one returns a positive
>> - * error code.
>> - */
>> -ret = virgl_renderer_create_fence(cmd->cmd_hdr.fence_id, 0);
>> -if (ret) {
>> -qemu_log_mask(LOG_GUEST_ERROR,
>> -  "%s: virgl_renderer_create_fence error: %s",
>> -  __func__, strerror(ret));
>> +if (gl->context_fence_enabled &&
>> +(cmd->cmd_hdr.flags & VIRTIO_GPU_FLAG_INFO_RING_IDX)) {
>> +uint32_t flags = 0;
> 
> Is this is constant or do we expect to change this later?
There are no immediate plans for using this flags variable in QEMU
today. But in general context-specific flags could be specified here.
Crosvm makes use of the flags.

>> +ret = virgl_renderer_context_create_fence(cmd->cmd_hdr.ctx_id, 
>> flags,
>> +  cmd->cmd_hdr.ring_idx,
>> +  cmd->cmd_hdr.fence_id);
>> +if (ret) {
>> +qemu_log_mask(LOG_GUEST_ERROR,
>> +  "%s: virgl_renderer_context_create_fence error: 
>> %s",
>> +  __func__, strerror(-ret));
> 
> This still fails with older virglrenderers:
> 
> ../../hw/display/virtio-gpu-virgl.c: In function 
> ‘virtio_gpu_virgl_process_cmd’:
> ../../hw/display/virtio-gpu-virgl.c:980:15: error: implicit declaration of 
> function ‘virgl_renderer_context_create_fence’; did you mean 
> ‘virgl_renderer_context_create’? [-Werror=implicit-function-declaration]
>   980 | ret = 
> virgl_renderer_context_create_fence(cmd->cmd_hdr.ctx_id, flags,
>   |   ^~~
>   |   virgl_renderer_context_create
> ../../hw/display/virtio-gpu-virgl.c:980:15: error: nested extern declaration 
> of ‘virgl_renderer_context_create_fence’ [-Werror=nested-externs]
> cc1: all warnings being treated as errors
> [1981/2819] Compiling C object libcommon.a.p/ui_sdl2-gl.c.o

Indeed! Good catch again, thanks! Will fix in v6

-- 
Best regards,
Dmitry



Re: [PATCH] vvfat: fix out of bounds array write

2025-01-22 Thread BALATON Zoltan

On Wed, 22 Jan 2025, Michael Tokarev wrote:

22.01.2025 02:14, Pierrick Bouvier wrote:
..
I agree the existing code (and this patch) is pretty cryptic for anyone not 
familiar with FAT format.
However, I think it could be a good thing to first merge this one (which is 
correct, and works), and refactor it in a second step, so the current 
ubsan issue is fixed upstream as soon as possible.


For an actual *fix*, please take a look at
https://lore.kernel.org/qemu-devel/20250119093233.9e4c450...@localhost.tls.msk.ru/

which is minimal, understandable, verified and works.


Just noticed in that patch you have several &(s->directory) where the () is 
not needed; -> has higher precedence than & (address-of).


Regards,
BALATON Zoltan



Re: [PATCH v5 0/8] Support virtio-gpu DRM native context

2025-01-22 Thread Dmitry Osipenko
On 1/20/25 18:41, Alex Bennée wrote:
> Dmitry Osipenko  writes:
> 
>> This patchset adds DRM native context support to VirtIO-GPU on Qemu.
>>
>> Contrary to Virgl and Venus contexts that mediate high-level GFX APIs,
>> DRM native context [1] mediates the lower-level kernel driver UAPI, which
>> results in less CPU overhead and less/simpler code needed to support it.
>> DRM context consists of a host and guest parts that have to be implemented
>> for each GPU driver. On a guest side, DRM context presents a virtual GPU as
>> a real/native host GPU device for GL/VK applications.
>>
>> [1] https://www.youtube.com/watch?v=9sFP_yddLLQ
>>
>> Today there are four known DRM native context drivers existing in the wild:
>>
>>   - Freedreno (Qualcomm SoC GPUs), completely upstreamed
>>   - AMDGPU, mostly merged upstream
>>   - Intel (i915), merge requests are opened
> 
> With the patch for the build failure:
> 
> Tested-by: Alex Bennée 
> 
> Host:
> 
>   x86
>   Trixie
>   virglrenderer @ 
> digitx/native-context-iris/a0b1872d252430a2b7f007db9fdbb0526385cfc0 
>   -display sdl,gl=on
> 
> KVM Guest
> 
>   x86
>   Trixie
>   mesa @ digitx/native-context-iris/78b1508c3f06
> 
> With gtk,gl=on I'm still seeing a lot of screen corruption which mirrors
> the terminal and leaves a destructive trail under the mouse cursor.
> show-cursor on or off makes no difference.

Thank you for the review and testing! I'm looking into that issue. Only
some people are hitting it and Pierre-Eric said he had that mirroring
issue without using nctx. Still interesting that the bug affects only
certain setups and is triggered by nctx.

-- 
Best regards,
Dmitry



Re: [PATCH] hw/boards: Convert MachineClass bitfields to boolean

2025-01-22 Thread Thomas Huth

On 22/01/2025 15.33, Peter Maydell wrote:

On Wed, 22 Jan 2025 at 12:36, Thomas Huth  wrote:


On 22/01/2025 11.32, Philippe Mathieu-Daudé wrote:

As Daniel mentioned:

   "The number of instances of MachineClass is not large enough
that we save a useful amount of memory through bitfields."

Also, see recent commit ecbf3567e21 ("docs/devel/style: add a
section about bitfield, and disallow them for packed structures").

Convert the MachineClass bitfields used as boolean as real ones.

Suggested-by: Daniel P. Berrangé 
Signed-off-by: Philippe Mathieu-Daudé 
---
   include/hw/boards.h| 14 +++---
   hw/arm/aspeed.c|  6 +++---
   hw/arm/fby35.c |  4 ++--
   hw/arm/npcm7xx_boards.c|  6 +++---
   hw/arm/raspi.c |  6 +++---
   hw/arm/sbsa-ref.c  |  2 +-
   hw/arm/virt.c  |  2 +-
   hw/arm/xilinx_zynq.c   |  2 +-
   hw/avr/arduino.c   |  6 +++---
   hw/core/null-machine.c | 10 +-
   hw/i386/microvm.c  |  2 +-
   hw/i386/pc_piix.c  |  2 +-
   hw/i386/pc_q35.c   |  4 ++--
   hw/loongarch/virt.c|  2 +-
   hw/m68k/virt.c |  6 +++---
   hw/ppc/pnv.c   |  2 +-
   hw/ppc/spapr.c |  2 +-
   hw/riscv/virt.c|  2 +-
   hw/s390x/s390-virtio-ccw.c |  8 
   hw/xtensa/sim.c|  2 +-
   20 files changed, 45 insertions(+), 45 deletions(-)


So if you are touching all these files, why not go with an even more
meaningful rework instead? Flip the meaning of the "no_*" flags to the
opposite, so that we e.g. have "has_default_cdrom" instead of "no_cdrom",
then new boards would not have to remember to set these ugly "no_" flags
anymore. It's quite a bit of work, but it could certainly be helpful in the
long run.


Well, that depends on what you think the default for new
boards should be. I suspect these are all no_foo because
when they were put in, the idea was "all boards should
by default have a foo, and 'this board defaults to not
having a foo' is the rarer special case it has to set"...


That might have been the reasoning for the naming 20 years ago. But times 
have changed... which recent board does still have a floppy drive? parallel 
port? And the others are also not that common anymore...


 Thomas




[PATCH] tests/functional: Fix the aarch64_tcg_plugins test

2025-01-22 Thread Thomas Huth
Unfortunately, this test had not been added to meson.build, so we did
not notice a regression: Looking for 'Kernel panic - not syncing: VFS:'
as the indication for the final boot state of the kernel was a bad
idea since 'Kernel panic - not syncing' is the default failure
message of the LinuxKernelTest class, and since we're now reading
the console input byte by byte instead of linewise (see commit
cdad03b74f75), the failure now triggers before we fully read the
success string. Let's fix this by simply looking for the previous
line in the console output instead.

Also, replace the call to cancel() - this was only available in the
Avocado framework. In the functional framework, we must use skipTest()
instead.

Fixes: 3abc545e66 ("tests/functional: Convert the tcg_plugins test")
Fixes: cdad03b74f ("tests/functional: rewrite console handling to be bytewise")
Signed-off-by: Thomas Huth 
---
 tests/functional/meson.build | 1 +
 tests/functional/test_aarch64_tcg_plugins.py | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/tests/functional/meson.build b/tests/functional/meson.build
index 5457331643..e0a276f349 100644
--- a/tests/functional/meson.build
+++ b/tests/functional/meson.build
@@ -72,6 +72,7 @@ tests_aarch64_system_thorough = [
   'aarch64_sbsaref',
   'aarch64_sbsaref_alpine',
   'aarch64_sbsaref_freebsd',
+  'aarch64_tcg_plugins',
   'aarch64_tuxrun',
   'aarch64_virt',
   'aarch64_xlnx_versal',
diff --git a/tests/functional/test_aarch64_tcg_plugins.py 
b/tests/functional/test_aarch64_tcg_plugins.py
index 01660eb090..357eb48477 100755
--- a/tests/functional/test_aarch64_tcg_plugins.py
+++ b/tests/functional/test_aarch64_tcg_plugins.py
@@ -46,7 +46,7 @@ def run_vm(self, kernel_path, kernel_command_line,
 except:
 # TODO: probably fails because plugins not enabled but we
 # can't currently probe for the feature.
-self.cancel("TCG Plugins not enabled?")
+self.skipTest("TCG Plugins not enabled?")
 
 self.wait_for_console_pattern(console_pattern, vm)
 # ensure logs are flushed
@@ -65,7 +65,7 @@ def test_aarch64_virt_insn(self):
 kernel_path = self.ASSET_KERNEL.fetch()
 kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
'console=ttyAMA0')
-console_pattern = 'Kernel panic - not syncing: VFS:'
+console_pattern = 'Please append a correct "root=" boot option'
 
 plugin_log = tempfile.NamedTemporaryFile(mode="r+t", prefix="plugin",
  suffix=".log")
@@ -91,7 +91,7 @@ def test_aarch64_virt_insn_icount(self):
 kernel_path = self.ASSET_KERNEL.fetch()
 kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
'console=ttyAMA0')
-console_pattern = 'Kernel panic - not syncing: VFS:'
+console_pattern = 'Please append a correct "root=" boot option'
 
 plugin_log = tempfile.NamedTemporaryFile(mode="r+t", prefix="plugin",
  suffix=".log")
-- 
2.48.1




[PATCH v2 03/10] cpus: Cache CPUClass early in instance_init() handler

2025-01-22 Thread Philippe Mathieu-Daudé
Cache CPUClass as early as possible, when the instance
is initialized.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
---
 cpu-target.c | 3 ---
 hw/core/cpu-common.c | 3 +++
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/cpu-target.c b/cpu-target.c
index 667688332c9..89874496a41 100644
--- a/cpu-target.c
+++ b/cpu-target.c
@@ -134,9 +134,6 @@ const VMStateDescription vmstate_cpu_common = {
 
 bool cpu_exec_realizefn(CPUState *cpu, Error **errp)
 {
-/* cache the cpu class for the hotpath */
-cpu->cc = CPU_GET_CLASS(cpu);
-
 if (!accel_cpu_common_realize(cpu, errp)) {
 return false;
 }
diff --git a/hw/core/cpu-common.c b/hw/core/cpu-common.c
index cb79566cc51..ff605059c15 100644
--- a/hw/core/cpu-common.c
+++ b/hw/core/cpu-common.c
@@ -238,6 +238,9 @@ static void cpu_common_initfn(Object *obj)
 {
 CPUState *cpu = CPU(obj);
 
+/* cache the cpu class for the hotpath */
+cpu->cc = CPU_GET_CLASS(cpu);
+
 gdb_init_cpu(cpu);
 cpu->cpu_index = UNASSIGNED_CPU_INDEX;
 cpu->cluster_index = UNASSIGNED_CLUSTER_INDEX;
-- 
2.47.1




Re: [RFC PATCH] Fix race in live migration failure path

2025-01-22 Thread Shivam Kumar


On 13 Jan 2025, at 9:59 PM, Peter Xu  wrote:


On Fri, Jan 10, 2025 at 10:09:38AM -0300, Fabiano Rosas wrote:
Shivam Kumar  writes:

Even if a live migration fails for some reason, the migration status
should not be set to MIGRATION_STATUS_FAILED until migrate fd cleanup
is done, else the client can trigger another instance of migration
before the cleanup is complete (as it would assume no migration is
active) or reset migration capabilities affecting old migration's
cleanup. Hence, set the status to 'failing' when a migration failure
happens and once the cleanup is complete, set the migration status to
MIGRATION_STATUS_FAILED.

Signed-off-by: Shivam Kumar 
---
migration/migration.c | 49 +--
migration/migration.h |  9 
migration/multifd.c   |  6 ++
migration/savevm.c|  7 +++
4 files changed, 38 insertions(+), 33 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index df61ca4e93..f084f54f6b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1143,8 +1143,9 @@ static bool migration_is_active(void)

migration_is_running() is the one that gates qmp_migrate() and
qmp_migrate_set_capabilities().

{
MigrationState *s = current_migration;

-return (s->state == MIGRATION_STATUS_ACTIVE ||
-s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
+return ((s->state == MIGRATION_STATUS_ACTIVE ||
+s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) &&
+!qatomic_read(&s->failing));
}

static bool migrate_show_downtime(MigrationState *s)
@@ -1439,6 +1440,11 @@ static void migrate_fd_cleanup(MigrationState *s)
  MIGRATION_STATUS_CANCELLED);
}

+if (qatomic_xchg(&s->failing, 0)) {
+migrate_set_state(&s->state, s->state,
+  MIGRATION_STATUS_FAILED);
+}

I hope you've verified that every place that used to set FAILED
will also reach migrate_fd_cleanup() eventually.

Also, we probably still need the FAILING state. Otherwise, this will
trip code that expects a state change on failure. Anything that does:

if (state != MIGRATION_STATUS_FOO) {
  ...
}

So I think the change overall should be

-migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
+migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILING);

void migrate_set_state(MigrationStatus *state, MigrationStatus old_state,
   MigrationStatus new_state)
{
assert(new_state < MIGRATION_STATUS__MAX);
if (qatomic_cmpxchg(state, old_state, new_state) == old_state) {
trace_migrate_set_state(MigrationStatus_str(new_state));

+if (new_state == MIGRATION_STATUS_FAILING) {
+qatomic_set(&s->failing, 1);
+}
migrate_generate_event(new_state);
}
}

And we should probably do the same for CANCELLING actually, but there the
(preexisting) issue is actual concurrency, while here it's just
inconsistency in the state.

Yes something like FAILING sounds reasonable.  Though since we have
s->error, I wonder whether that's a better place to represent a migration
as "failing" in one place, because otherwise we need to set two places
(both FAILING state, and the s->error) - whenever something fails, we'd
better always update s->error so as to remember what failed, which is then
reported via query-migrate.

From that POV, s->failing is probably never gonna be needed (due to
s->error being present anyway)?  So far, such Error* looks like the best
single point to say that the migration is failing - it also enforces that an
Error is provided by whoever wants to set the failing state.

--
Peter Xu
There is one place where we set the migration status to FAILED but don't set
s->error, i.e. in the save_snapshot workflow (see qemu_savevm_state). I'm not
sure setting s->error is possible (or required) in that case, as it seems like
a different workflow to me.

In addition, one potentially real problem that I see is this comment in
migration_detect_error:
/*
 * For postcopy, we allow the network to be down for a
 * while. After that, it can be continued by a
 * recovery phase.
 */
Let's say we set s->error somewhere and there was a file error on either the
source or the destination (qemu_file_get_error_obj_any returns a positive value
when called by migration_detect_error). We expect migration to fail in this
case, but will it continue to run since postcopy migration is tolerant of file
errors?
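
For reference, a minimal sketch of the s->error-based gating Peter suggests
above (assuming a helper such as migrate_has_error() that reads s->error under
its mutex; not a tested change):

    static bool migration_is_active(void)
    {
        MigrationState *s = current_migration;

        /* A failing migration no longer counts as active. */
        return (s->state == MIGRATION_STATUS_ACTIVE ||
                s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) &&
               !migrate_has_error(s);
    }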




Re: [PATCH] hw/boards: Convert MachineClass bitfields to boolean

2025-01-22 Thread Thomas Huth

On 22/01/2025 11.32, Philippe Mathieu-Daudé wrote:

As Daniel mentioned:

  "The number of instances of MachineClass is not large enough
   that we save a useful amount of memory through bitfields."

Also, see recent commit ecbf3567e21 ("docs/devel/style: add a
section about bitfield, and disallow them for packed structures").

Convert the MachineClass bitfields used as boolean as real ones.

Suggested-by: Daniel P. Berrangé 
Signed-off-by: Philippe Mathieu-Daudé 
---
  include/hw/boards.h| 14 +++---
  hw/arm/aspeed.c|  6 +++---
  hw/arm/fby35.c |  4 ++--
  hw/arm/npcm7xx_boards.c|  6 +++---
  hw/arm/raspi.c |  6 +++---
  hw/arm/sbsa-ref.c  |  2 +-
  hw/arm/virt.c  |  2 +-
  hw/arm/xilinx_zynq.c   |  2 +-
  hw/avr/arduino.c   |  6 +++---
  hw/core/null-machine.c | 10 +-
  hw/i386/microvm.c  |  2 +-
  hw/i386/pc_piix.c  |  2 +-
  hw/i386/pc_q35.c   |  4 ++--
  hw/loongarch/virt.c|  2 +-
  hw/m68k/virt.c |  6 +++---
  hw/ppc/pnv.c   |  2 +-
  hw/ppc/spapr.c |  2 +-
  hw/riscv/virt.c|  2 +-
  hw/s390x/s390-virtio-ccw.c |  8 
  hw/xtensa/sim.c|  2 +-
  20 files changed, 45 insertions(+), 45 deletions(-)


So if you are touching all these files, why not go with an even more 
meaningful rework instead? Flip the meaning of the "no_*" flags to the 
opposite, so that we e.g. have "has_default_cdrom" instead of "no_cdrom", 
then new boards would not have to remember to set these ugly "no_" flags 
anymore. It's quite a bit of work, but it could certainly be helpful in the 
long run.
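
For illustration, a rough sketch of the flipped naming (hypothetical field
names, not part of the posted patch); new boards could then leave everything
at the default false, and only machines that really want the legacy devices
would opt in:

    /* hypothetical replacement for the no_* flags, sketch only */
    bool has_default_serial;
    bool has_default_parallel;
    bool has_default_floppy;
    bool has_default_cdrom;
    bool has_default_sdcard;

    /* e.g. an old PC-style machine would then set: */
    mc->has_default_floppy = true;
    mc->has_default_cdrom = true;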


 Thomas



diff --git a/include/hw/boards.h b/include/hw/boards.h
index 2ad711e56db..ff5904d6fd8 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -279,13 +279,13 @@ struct MachineClass {
  int max_cpus;
  int min_cpus;
  int default_cpus;
-unsigned int no_serial:1,
-no_parallel:1,
-no_floppy:1,
-no_cdrom:1,
-no_sdcard:1,
-pci_allow_0_address:1,
-legacy_fw_cfg_order:1;
+bool no_serial;
+bool no_parallel;
+bool no_floppy;
+bool no_cdrom;
+bool no_sdcard;
+bool pci_allow_0_address;
+bool legacy_fw_cfg_order;
  bool is_default;
  const char *default_machine_opts;
  const char *default_boot_order;





Re: [PATCH] vvfat: fix out of bounds array write

2025-01-22 Thread Michael Tokarev

22.01.2025 15:19, BALATON Zoltan wrote:

On Wed, 22 Jan 2025, Michael Tokarev wrote:

22.01.2025 02:14, Pierrick Bouvier wrote:
..

I agree the existing code (and this patch) is pretty cryptic for anyone not 
familiar with FAT format.
However, I think it could be a good thing to first merge this one (which is
correct, and works), and refactor it in a second step, so the current ubsan
issue is fixed upstream as soon as possible.


For an actual *fix*, please take a look at
https://lore.kernel.org/qemu-devel/20250119093233.9e4c450...@localhost.tls.msk.ru/

which is minimal, understandable, verified and works.


Just noticed in that patch you have several &(s->directory) where the () is not
needed: -> has higher precedence than unary & (address-of).
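
A tiny self-contained illustration of the precedence point, with a made-up
struct (not code from vvfat):

    struct demo { int directory; };
    struct demo d, *s = &d;
    int *a = &(s->directory);   /* original style */
    int *b = &s->directory;     /* identical: -> binds tighter than unary & */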


Yes.  I especially mentioned that I kept the original style,
to minimize the changes.  It is not needed to fix the issue
at hand; the fix is kept as targeted (and minimal) as possible.

The subsequent patch - which is optional, unrelated to the issue
at hand - changes all that stuff to adhere to qemu coding style
(and yes, this is a style thing; for some, these parens make it
more readable).

Thanks,

/mjt



Re: [PULL 00/68] tcg patch queue

2025-01-22 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/10.0 for any 
user-visible changes.




[PATCH 4/5] cpu: Introduce cpu_get_phys_bits()

2025-01-22 Thread Cédric Le Goater
The Intel CPU has a complex history regarding setting of the physical
address space width on KVM. A 'phys_bits' field and a "phys-bits"
property were added by commit af45907a1328 ("target-i386: Allow
physical address bits to be set") to tune this value.

In certain circumstances, it is interesting to know this value to
check that all the conditions are met for optimal operation. For
instance, when the system has a 39-bit IOMMU address space width and a
larger CPU physical address space, we expect issues when mapping the
MMIO regions of passthrough devices, and it would be good to report this to the
user. These hybrid HW configs can be found on some consumer grade
processors or when using a vIOMMU device with default settings.

For this purpose, add a helper routine and a CPUClass callback to
return the physical address space width of a CPU.

Cc: Richard Henderson 
Cc: Paolo Bonzini 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: "Philippe Mathieu-Daudé" 
Cc: Yanan Wang 
Cc: Zhao Liu 
Signed-off-by: Cédric Le Goater 
---
 include/hw/core/cpu.h|  9 +
 include/hw/core/sysemu-cpu-ops.h |  6 ++
 cpu-target.c |  5 +
 hw/core/cpu-system.c | 11 +++
 target/i386/cpu.c|  6 ++
 5 files changed, 37 insertions(+)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 
fb397cdfc53d12d40d3e4e7f86251fc31c48b9f6..1b3eead102ce62fcee55ab0ed5e0dff327fa2fc5
 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -748,6 +748,14 @@ int cpu_asidx_from_attrs(CPUState *cpu, MemTxAttrs attrs);
  */
 bool cpu_virtio_is_big_endian(CPUState *cpu);
 
+/**
+ * cpu_get_phys_bits:
+ * @cpu: CPU
+ *
+ * Return the physical address space width of the CPU @cpu.
+ */
+uint32_t cpu_get_phys_bits(const CPUState *cpu);
+
 #endif /* CONFIG_USER_ONLY */
 
 /**
@@ -1168,6 +1176,7 @@ void cpu_exec_unrealizefn(CPUState *cpu);
 void cpu_exec_reset_hold(CPUState *cpu);
 
 const char *target_name(void);
+uint32_t target_phys_bits(void);
 
 #ifdef COMPILING_PER_TARGET
 
diff --git a/include/hw/core/sysemu-cpu-ops.h b/include/hw/core/sysemu-cpu-ops.h
index 
0df5b058f50073e47d2a6b8286be5204776520d2..210b3ed57985525795b81559e41e0085969210d5
 100644
--- a/include/hw/core/sysemu-cpu-ops.h
+++ b/include/hw/core/sysemu-cpu-ops.h
@@ -81,6 +81,12 @@ typedef struct SysemuCPUOps {
  */
 bool (*virtio_is_big_endian)(CPUState *cpu);
 
+/**
+ * @get_phys_bits: Callback to return the physical address space
+ * width of a CPU.
+ */
+uint32_t (*get_phys_bits)(const CPUState *cpu);
+
 /**
  * @legacy_vmsd: Legacy state for migration.
  *   Do not use in new targets, use #DeviceClass::vmsd instead.
diff --git a/cpu-target.c b/cpu-target.c
index 
667688332c929aa53782c94343def34571272d5f..88158272c06cc42424d435b9701e33735f080239
 100644
--- a/cpu-target.c
+++ b/cpu-target.c
@@ -472,3 +472,8 @@ const char *target_name(void)
 {
 return TARGET_NAME;
 }
+
+uint32_t target_phys_bits(void)
+{
+return TARGET_PHYS_ADDR_SPACE_BITS;
+}
diff --git a/hw/core/cpu-system.c b/hw/core/cpu-system.c
index 
6aae28a349a7a377d010ff9dcab5ebc29e1126ca..05067d84f4258facf4252216f17729e390d38eae
 100644
--- a/hw/core/cpu-system.c
+++ b/hw/core/cpu-system.c
@@ -60,6 +60,17 @@ hwaddr cpu_get_phys_page_attrs_debug(CPUState *cpu, vaddr 
addr,
 return cc->sysemu_ops->get_phys_page_debug(cpu, addr);
 }
 
+uint32_t cpu_get_phys_bits(const CPUState *cpu)
+{
+CPUClass *cc = CPU_GET_CLASS(cpu);
+
+if (cc->sysemu_ops->get_phys_bits) {
+return cc->sysemu_ops->get_phys_bits(cpu);
+}
+
+return target_phys_bits();
+}
+
 hwaddr cpu_get_phys_page_debug(CPUState *cpu, vaddr addr)
 {
 MemTxAttrs attrs = {};
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 
1b9c11022c48e3103627d370f7fbdb2ae94a9f81..8f9f75de7cafaca72b4eb32e8229a7a7668f5c1c
 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -8239,6 +8239,11 @@ static bool x86_cpu_get_paging_enabled(const CPUState 
*cs)
 
 return cpu->env.cr[0] & CR0_PG_MASK;
 }
+
+static uint32_t x86_cpu_get_phys_bits(const CPUState *cs)
+{
+return X86_CPU(cs)->phys_bits;
+}
 #endif /* !CONFIG_USER_ONLY */
 
 static void x86_cpu_set_pc(CPUState *cs, vaddr value)
@@ -8547,6 +8552,7 @@ static const struct SysemuCPUOps i386_sysemu_ops = {
 .get_memory_mapping = x86_cpu_get_memory_mapping,
 .get_paging_enabled = x86_cpu_get_paging_enabled,
 .get_phys_page_attrs_debug = x86_cpu_get_phys_page_attrs_debug,
+.get_phys_bits = x86_cpu_get_phys_bits,
 .asidx_from_attrs = x86_asidx_from_attrs,
 .get_crash_info = x86_cpu_get_crash_info,
 .write_elf32_note = x86_cpu_write_elf32_note,
-- 
2.48.1




[PATCH 5/5] vfio: Check compatibility of CPU and IOMMU address space width

2025-01-22 Thread Cédric Le Goater
Print a warning if the IOMMU address space width is smaller than the
physical address width. In this case, PCI peer-to-peer transactions on
BARs are not supported and failures of device MMIO regions are to be
expected.

This can occur with the 39-bit IOMMU address space width as found on
consumer grade processors or when using a vIOMMU device with default
settings.

Signed-off-by: Cédric Le Goater 
---
 hw/vfio/common.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 
561be7f57cf903e6e2bcbbb708be9e4d4ee8941c..d0104976e9d99c3f64ec716accd3adb4a70d9afe
 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -44,6 +44,8 @@
 #include "migration/qemu-file.h"
 #include "system/tpm.h"
 
+#include "hw/core/cpu.h"
+
 VFIODeviceList vfio_device_list =
 QLIST_HEAD_INITIALIZER(vfio_device_list);
 static QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
@@ -1561,12 +1563,28 @@ retry:
 return info;
 }
 
+static bool vfio_device_check_address_space(VFIODevice *vbasedev, Error **errp)
+{
+uint32_t cpu_aw_bits = cpu_get_phys_bits(first_cpu);
+uint32_t iommu_aw_bits = vfio_device_get_aw_bits(vbasedev);
+
+if (cpu_aw_bits && cpu_aw_bits > iommu_aw_bits) {
+error_setg(errp, "Host physical address space (%u) is larger than "
+   "the host IOMMU address space (%u).", cpu_aw_bits,
+   iommu_aw_bits);
+vfio_container_p2p_error_append(vbasedev->bcontainer, errp);
+return false;
+}
+return true;
+}
+
 bool vfio_attach_device(char *name, VFIODevice *vbasedev,
 AddressSpace *as, Error **errp)
 {
 const VFIOIOMMUClass *ops =
 VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
 HostIOMMUDevice *hiod = NULL;
+Error *local_err = NULL;
 
 if (vbasedev->iommufd) {
 ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
@@ -1586,6 +1604,9 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
 return false;
 }
 
+if (!vfio_device_check_address_space(vbasedev, &local_err)) {
+warn_report_err(local_err);
+}
 return true;
 }
 
-- 
2.48.1




Re: [PATCH v7 2/4] chardev/char-hub: implement backend chardev aggregator

2025-01-22 Thread Roman Penyaev
On Wed, Jan 22, 2025 at 3:44 PM Alex Bennée  wrote:
>
> Roman Penyaev  writes:
>
> > This patch implements a new chardev backend `hub` device, which
> > aggregates input from multiple backend devices and forwards it to a
> > single frontend device. Additionally, `hub` device takes the output
> > from the frontend device and sends it back to all the connected
> > backend devices. This allows for seamless interaction between
> > different backend devices and a single frontend interface.
> >
> > The idea of the change is trivial: keep list of backend devices
> > (up to 4), init them on demand and forward data buffer back and
> > forth.
> >
> > The following is QEMU command line example:
> >
> >-chardev pty,path=/tmp/pty,id=pty0 \
> >-chardev vc,id=vc0 \
> >-chardev hub,id=hub0,chardevs.0=pty0,chardevs.1=vc0 \
> >-device virtconsole,chardev=hub0 \
> >-vnc 0.0.0.0:0
> >
> > Which creates 2 backend devices: text virtual console (`vc0`) and a
> > pseudo TTY (`pty0`) connected to the single virtio hvc console with
> > the backend aggregator (`hub0`) help. `vc0` renders text to an image,
> > which can be shared over the VNC protocol.  `pty0` is a pseudo TTY
> > backend which provides bidirectional communication to the virtio hvc
> > console.
> >
> 
> > +static void qemu_chr_open_hub(Chardev *chr,
> > + ChardevBackend *backend,
> > + bool *be_opened,
> > + Error **errp)
> > +{
> > +ChardevHub *hub = backend->u.hub.data;
> > +HubChardev *d = HUB_CHARDEV(chr);
> > +strList *list = hub->chardevs;
> > +
> > +d->be_eagain_ind = -1;
> > +
> > +if (list == NULL) {
> > +error_setg(errp, "hub: 'chardevs' list is not defined");
> > +return;
> > +}
> > +
> > +while (list) {
> > +Chardev *s;
> > +
> > +s = qemu_chr_find(list->value);
> > +if (s == NULL) {
> > +error_setg(errp, "hub: chardev can't be found by id '%s'",
> > +   list->value);
> > +return;
> > +}
> > +if (CHARDEV_IS_HUB(s) || CHARDEV_IS_MUX(s)) {
> > +error_setg(errp, "hub: multiplexers and hub devices can't be "
> > +   "stacked, check chardev '%s', chardev should not "
> > +   "be a hub device or have 'mux=on' enabled",
> > +   list->value);
> > +return;
>
> So I was looking at this to see if I could implement what I wanted which
> was a tee-like copy of a serial port output while maintaining the C-a
> support of the mux.
>
> Normally I just use the shortcut -serial mon:stdio
>
> However that form is a special case so I tried the following and ran
> into the above:
>
>   -chardev stdio,mux=on,id=char0 \
>   -chardev file,path=console.log,id=clog  \
>   -mon chardev=char0,mode=readline \
>   -chardev hub,id=hub0,chardevs.0=char0,chardevs.1=clog
>
> Giving:
>   qemu-system-aarch64: -chardev 
> -hub,id=hub0,chardevs.0=char0,chardevs.1=clog: hub: -multiplexers and hub 
> devices can't be stacked, check chardev
> -'char0', chardev should not be a hub device or have 'mux=on' 
> -enabled
>
> So what stops this sort of chain?

Hi Alex,

You define 'char0' as a mux device (the "mux=on" option). That is not supported,
simply to avoid circular dependencies. To be frank, I never considered this
use-case, although chains might be useful. If you need this use-case, we can
think of a more proper (or advanced) check.
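
For reference, a variant of Alex's command line that the check above would
accept. It drops the mux (and with it the C-a escape and the monitor on
char0), so it only covers the plain tee use-case (untested sketch, based on
the syntax from the patch description):

    -chardev stdio,id=char0 \
    -chardev file,path=console.log,id=clog \
    -chardev hub,id=hub0,chardevs.0=char0,chardevs.1=clog \
    -serial chardev:hub0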

--
Roman



[PATCH 1/5] vfio/pci: Replace "iommu_device" by "vIOMMU"

2025-01-22 Thread Cédric Le Goater
This is to be consistent with other reported errors related to vIOMMU
devices.

Signed-off-by: Cédric Le Goater 
---
 hw/vfio/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 
cf14987e42bd9188d5040b51a2f84cfa959f632d..ad326839db49cf3a50524d5443ceedac66e1df3d
 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3131,7 +3131,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 
 if (!vbasedev->mdev &&
 !pci_device_set_iommu_device(pdev, vbasedev->hiod, errp)) {
-error_prepend(errp, "Failed to set iommu_device: ");
+error_prepend(errp, "Failed to set vIOMMU: ");
 goto out_teardown;
 }
 
-- 
2.48.1




[PATCH 2/5] vfio: Modify vfio_viommu_preset() parameter

2025-01-22 Thread Cédric Le Goater
We plan to use vfio_viommu_preset() in MemoryListener handlers which
operate at the container level. Change the parameter to VFIOContainerBase
to ease future changes.

Signed-off-by: Cédric Le Goater 
---
 include/hw/vfio/vfio-common.h | 2 +-
 hw/vfio/common.c  | 4 ++--
 hw/vfio/migration.c   | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 
48018dc751e51066769b23bc6e4675a7167b099e..fcfe0b4b8cbe907877e366117e7bb7f74311d4f6
 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -274,7 +274,7 @@ extern int vfio_kvm_device_fd;
 bool vfio_mig_active(void);
 int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp);
 void vfio_unblock_multiple_devices_migration(void);
-bool vfio_viommu_preset(VFIODevice *vbasedev);
+bool vfio_viommu_preset(VFIOContainerBase *bcontainer);
 int64_t vfio_mig_bytes_transferred(void);
 void vfio_reset_bytes_transferred(void);
 bool vfio_device_state_is_running(VFIODevice *vbasedev);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 
2660a42f9edc9346f2e62652efb0c78a8b48b52b..3ca5dbf883ed2262e36952fcc47e717ff4154f12
 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -142,9 +142,9 @@ void vfio_unblock_multiple_devices_migration(void)
 migrate_del_blocker(&multiple_devices_migration_blocker);
 }
 
-bool vfio_viommu_preset(VFIODevice *vbasedev)
+bool vfio_viommu_preset(VFIOContainerBase *bcontainer)
 {
-return vbasedev->bcontainer->space->as != &address_space_memory;
+return bcontainer->space->as != &address_space_memory;
 }
 
 static void vfio_set_migration_error(int ret)
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 
adfa752db5272e37d73fc0a435a0834e74e3f2fe..347390adb27bc3cd123f3eec1de6dc6986d8d952
 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -1069,7 +1069,7 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error 
**errp)
 goto out_deinit;
 }
 
-if (vfio_viommu_preset(vbasedev)) {
+if (vfio_viommu_preset(vbasedev->bcontainer)) {
 error_setg(&err, "%s: Migration is currently not supported "
"with vIOMMU enabled", vbasedev->name);
 goto add_blocker;
-- 
2.48.1




[PATCH 0/5] vfio: Improve error reporting when MMIO region mapping fails

2025-01-22 Thread Cédric Le Goater
Hello,

Under certain circumstances, an MMIO region of a device fails to map
because the region is outside the supported IOVA ranges of the VM. In
this case, PCI peer-to-peer transactions on BARs are not supported.
This typically occurs when the IOMMU address space width is less than
the physical address width, as can be the case on consumer processors
or when using a vIOMMU device with default settings.

This series tries to clarify the error message reported to the user
and provide advice on how to possibly resolve this issue.

Thanks,

C.

Cédric Le Goater (5):
  vfio/pci: Replace "iommu_device" by "vIOMMU"
  vfio: Modify vfio_viommu_preset() parameter
  vfio: Improve error reporting when MMIO region mapping fails
  cpu: Introduce cpu_get_phys_bits()
  vfio: Check compatibility of CPU and IOMMU address space width

 include/hw/core/cpu.h|  9 ++
 include/hw/core/sysemu-cpu-ops.h |  6 
 include/hw/vfio/vfio-common.h|  2 +-
 cpu-target.c |  5 
 hw/core/cpu-system.c | 11 
 hw/vfio/common.c | 47 +---
 hw/vfio/migration.c  |  2 +-
 hw/vfio/pci.c|  2 +-
 target/i386/cpu.c|  6 
 9 files changed, 83 insertions(+), 7 deletions(-)

-- 
2.48.1




[PATCH 3/5] vfio: Improve error reporting when MMIO region mapping fails

2025-01-22 Thread Cédric Le Goater
When the IOMMU address space width is smaller than the physical
address width, an MMIO region of a device can fail to map because the
region is outside the supported IOVA ranges of the VM. In this case,
PCI peer-to-peer transactions on BARs are not supported.

This can occur with the 39-bit IOMMU address space width, as can be
the case on consumer processors or when using a vIOMMU device with
default settings.

The current error message is unclear. Change the error report to a
warning because it is a non-fatal condition for the VM, clarify the
induced limitations for the user, and provide advice on how to possibly
resolve this issue: setting the CPU address space width or the IOMMU
address space width accordingly.
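
For illustration, the two knobs the new hint points at could be set roughly
like this on the command line (hypothetical values, assuming an Intel vIOMMU;
the right numbers depend on the host):

    # shrink the guest CPU physical address space to the IOMMU width
    -cpu host,phys-bits=39
    # or widen the vIOMMU address space to match the CPU
    -device intel-iommu,aw-bits=48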

Signed-off-by: Cédric Le Goater 
---
 hw/vfio/common.c | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 
3ca5dbf883ed2262e36952fcc47e717ff4154f12..561be7f57cf903e6e2bcbbb708be9e4d4ee8941c
 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -569,6 +569,20 @@ static bool vfio_get_section_iova_range(VFIOContainerBase 
*bcontainer,
 return true;
 }
 
+static void vfio_container_p2p_error_append(VFIOContainerBase *bcontainer,
+Error **errp)
+{
+error_append_hint(errp, "PCI peer-to-peer transactions on BARs "
+  "are not supported. ");
+if (vfio_viommu_preset(bcontainer)) {
+error_append_hint(errp, "Try setting the vIOMMU \"aw-bits\" "
+  "property to match CPU address space width\n");
+} else {
+error_append_hint(errp, "Try setting the CPU \"phys-bits\" "
+  "property to match IOMMU address space width\n");
+}
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
  MemoryRegionSection *section)
 {
@@ -685,9 +699,13 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
"0x%"HWADDR_PRIx", %p) = %d (%s)",
bcontainer, iova, int128_get64(llsize), vaddr, ret,
strerror(-ret));
+/*
+ * MMIO region mapping failures are not fatal but in this case
+ * PCI peer-to-peer transactions are broken.
+ */
 if (memory_region_is_ram_device(section->mr)) {
-/* Allow unexpected mappings not to be fatal for RAM devices */
-error_report_err(err);
+vfio_container_p2p_error_append(bcontainer, &err);
+warn_report_err(err);
 return;
 }
 goto fail;
-- 
2.48.1




Re: [PATCH v1 1/4] physmem: disallow direct access to RAM DEVICE in address_space_write_rom()

2025-01-22 Thread Philippe Mathieu-Daudé

On 22/1/25 11:13, David Hildenbrand wrote:

On 22.01.25 11:10, David Hildenbrand wrote:

On 22.01.25 11:07, Philippe Mathieu-Daudé wrote:

Hi David,

On 20/1/25 12:14, David Hildenbrand wrote:

As documented in commit 4a2e242bbb306 ("memory: Don't use memcpy for
ram_device regions"), we disallow direct access to RAM DEVICE regions.

Let's factor out the "supports direct access" check from
memory_access_is_direct() so we can reuse it, and make it a bit 
easier to

read.

This change implies that address_space_write_rom() and
cpu_memory_rw_debug() won't be able to write to RAM DEVICE regions. It
will also affect cpu_flush_icache_range(), but it's only used by
hw/core/loader.c after writing to ROM, so it is expected to not apply
here with RAM DEVICE.

This fixes direct access to these regions where we don't want direct
access. We'll extend cpu_memory_rw_debug() next to also be able to 
write to

these (and IO) regions.

This is a preparation for further changes.

Cc: Alex Williamson 
Signed-off-by: David Hildenbrand 
---
    include/exec/memory.h | 30 --
    system/physmem.c  |  3 +--
    2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3ee1901b52..bd0ddb9cdf 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2985,15 +2985,33 @@ MemTxResult 
address_space_write_cached_slow(MemoryRegionCache *cache,

    int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr);
    bool prepare_mmio_access(MemoryRegion *mr);
+static inline bool 
memory_region_supports_direct_access(MemoryRegion *mr)

+{
+    /* ROM DEVICE regions only allow direct access if in ROMD mode. */
+    if (memory_region_is_romd(mr)) {
+    return true;
+    }
+    if (!memory_region_is_ram(mr)) {
+    return false;
+    }
+    /*
+ * RAM DEVICE regions can be accessed directly using memcpy, 
but it might
+ * be MMIO and access using mempy can be wrong (e.g., using 
instructions not

+ * intended for MMIO access). So we treat this as IO.
+ */
+    return !memory_region_is_ram_device(mr);
+
+}
+
    static inline bool memory_access_is_direct(MemoryRegion *mr, 
bool is_write)

    {
-    if (is_write) {
-    return memory_region_is_ram(mr) && !mr->readonly &&
-   !mr->rom_device && !memory_region_is_ram_device(mr);
-    } else {
-    return (memory_region_is_ram(mr) && ! 
memory_region_is_ram_device(mr)) ||


This patch is doing multiple things at once, and I'm having a hard time
reviewing it.


I appreciate the review, but ... really?! :)

25 insertions(+), 8 deletions(-)


FWIW, I'll try to split it up ... I thought the comments added to 
memory_region_supports_direct_access() and friends are pretty clear.


No worries, I'll give it another try. (split still welcome, but not
blocking).




Re: [PATCH v1 1/4] physmem: disallow direct access to RAM DEVICE in address_space_write_rom()

2025-01-22 Thread David Hildenbrand

On 22.01.25 11:17, Philippe Mathieu-Daudé wrote:

On 22/1/25 11:13, David Hildenbrand wrote:

On 22.01.25 11:10, David Hildenbrand wrote:

On 22.01.25 11:07, Philippe Mathieu-Daudé wrote:

Hi David,

On 20/1/25 12:14, David Hildenbrand wrote:

As documented in commit 4a2e242bbb306 ("memory: Don't use memcpy for
ram_device regions"), we disallow direct access to RAM DEVICE regions.

Let's factor out the "supports direct access" check from
memory_access_is_direct() so we can reuse it, and make it a bit
easier to
read.

This change implies that address_space_write_rom() and
cpu_memory_rw_debug() won't be able to write to RAM DEVICE regions. It
will also affect cpu_flush_icache_range(), but it's only used by
hw/core/loader.c after writing to ROM, so it is expected to not apply
here with RAM DEVICE.

This fixes direct access to these regions where we don't want direct
access. We'll extend cpu_memory_rw_debug() next to also be able to
write to
these (and IO) regions.

This is a preparation for further changes.

Cc: Alex Williamson 
Signed-off-by: David Hildenbrand 
---
     include/exec/memory.h | 30 --
     system/physmem.c  |  3 +--
     2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3ee1901b52..bd0ddb9cdf 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2985,15 +2985,33 @@ MemTxResult
address_space_write_cached_slow(MemoryRegionCache *cache,
     int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr);
     bool prepare_mmio_access(MemoryRegion *mr);
+static inline bool
memory_region_supports_direct_access(MemoryRegion *mr)
+{
+    /* ROM DEVICE regions only allow direct access if in ROMD mode. */
+    if (memory_region_is_romd(mr)) {
+    return true;
+    }
+    if (!memory_region_is_ram(mr)) {
+    return false;
+    }
+    /*
+ * RAM DEVICE regions can be accessed directly using memcpy,
but it might
+ * be MMIO and access using mempy can be wrong (e.g., using
instructions not
+ * intended for MMIO access). So we treat this as IO.
+ */
+    return !memory_region_is_ram_device(mr);
+
+}
+
     static inline bool memory_access_is_direct(MemoryRegion *mr,
bool is_write)
     {
-    if (is_write) {
-    return memory_region_is_ram(mr) && !mr->readonly &&
-   !mr->rom_device && !memory_region_is_ram_device(mr);
-    } else {
-    return (memory_region_is_ram(mr) && !
memory_region_is_ram_device(mr)) ||


This patch is doing multiple things at once, and I'm having a hard time
reviewing it.


I appreciate the review, but ... really?! :)

25 insertions(+), 8 deletions(-)


FWIW, I'll try to split it up ... I thought the comments added to
memory_region_supports_direct_access() and friends are pretty clear.


No worries, I'll give it another try. (split still welcome, but not
blocking).


I think unmangling the existing unreadable conditions in 
memory_access_is_direct() can be done separately; let me see what I can do.
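
A rough sketch of what such a cleanup could look like on top of the helper
quoted above (just an illustration, not David's actual follow-up):

    static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
    {
        /* Rules out IO, ROM devices not in ROMD mode, and RAM DEVICE regions. */
        if (!memory_region_supports_direct_access(mr)) {
            return false;
        }
        if (is_write) {
            /* Writes additionally require writable, non-ROM-device RAM. */
            return !mr->readonly && !mr->rom_device;
        }
        return true;
    }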


--
Cheers,

David / dhildenb




[RFC v2 4/5] i386/kvm: Support event with masked entry format in KVM PMU filter

2025-01-22 Thread Zhao Liu
KVM_SET_PMU_EVENT_FILTER of x86 KVM supports masked events mode, which
accepts masked entry format event to flexibly represent a group of PMU
events.

Support masked entry format in kvm-pmu-filter object and handle this in
i386 kvm codes.

Signed-off-by: Zhao Liu 
---
Changes since RFC v1:
 * Bump up the supported QAPI version to 10.0.
---
 accel/kvm/kvm-pmu.c   | 91 +++
 qapi/kvm.json | 68 ++--
 target/i386/kvm/kvm.c | 39 +++
 3 files changed, 195 insertions(+), 3 deletions(-)

diff --git a/accel/kvm/kvm-pmu.c b/accel/kvm/kvm-pmu.c
index cbd32e8e21f8..9d68cd15e477 100644
--- a/accel/kvm/kvm-pmu.c
+++ b/accel/kvm/kvm-pmu.c
@@ -62,6 +62,16 @@ static void kvm_pmu_filter_get_event(Object *obj, Visitor 
*v, const char *name,
 str_event->u.x86_default.umask =
 g_strdup_printf("0x%x", event->u.x86_default.umask);
 break;
+case KVM_PMU_EVENT_FMT_X86_MASKED_ENTRY:
+str_event->u.x86_masked_entry.select =
+g_strdup_printf("0x%x", event->u.x86_masked_entry.select);
+str_event->u.x86_masked_entry.match =
+g_strdup_printf("0x%x", event->u.x86_masked_entry.match);
+str_event->u.x86_masked_entry.mask =
+g_strdup_printf("0x%x", event->u.x86_masked_entry.mask);
+str_event->u.x86_masked_entry.exclude =
+event->u.x86_masked_entry.exclude;
+break;
 default:
 g_assert_not_reached();
 }
@@ -160,6 +170,87 @@ static void kvm_pmu_filter_set_event(Object *obj, Visitor 
*v, const char *name,
 event->u.x86_default.umask = umask;
 break;
 }
+case KVM_PMU_EVENT_FMT_X86_MASKED_ENTRY: {
+uint64_t select, match, mask;
+
+ret = qemu_strtou64(str_event->u.x86_masked_entry.select,
+NULL, 0, &select);
+if (ret < 0) {
+error_setg(errp,
+   "Invalid %s PMU event (select: %s): %s. "
+   "The select must be a "
+   "12-bit unsigned number string.",
+   KVMPMUEventEncodeFmt_str(str_event->format),
+   str_event->u.x86_masked_entry.select,
+   strerror(-ret));
+g_free(event);
+goto fail;
+}
+if (select > UINT12_MAX) {
+error_setg(errp,
+   "Invalid %s PMU event (select: %s): "
+   "Numerical result out of range. "
+   "The select must be a "
+   "12-bit unsigned number string.",
+   KVMPMUEventEncodeFmt_str(str_event->format),
+   str_event->u.x86_masked_entry.select);
+g_free(event);
+goto fail;
+}
+event->u.x86_masked_entry.select = select;
+
+ret = qemu_strtou64(str_event->u.x86_masked_entry.match,
+NULL, 0, &match);
+if (ret < 0) {
+error_setg(errp,
+   "Invalid %s PMU event (match: %s): %s. "
+   "The match must be a uint8 string.",
+   KVMPMUEventEncodeFmt_str(str_event->format),
+   str_event->u.x86_masked_entry.match,
+   strerror(-ret));
+g_free(event);
+goto fail;
+}
+if (match > UINT8_MAX) {
+error_setg(errp,
+   "Invalid %s PMU event (match: %s): "
+   "Numerical result out of range. "
+   "The match must be a uint8 string.",
+   KVMPMUEventEncodeFmt_str(str_event->format),
+   str_event->u.x86_masked_entry.match);
+g_free(event);
+goto fail;
+}
+event->u.x86_masked_entry.match = match;
+
+ret = qemu_strtou64(str_event->u.x86_masked_entry.mask,
+NULL, 0, &mask);
+if (ret < 0) {
+error_setg(errp,
+   "Invalid %s PMU event (mask: %s): %s. "
+   "The mask must be a uint8 string.",
+   KVMPMUEventEncodeFmt_str(str_event->format),
+   str_event->u.x86_masked_entry.mask,
+   strerror(-ret));
+g_free(event);
+goto fail;
+}
+if (mask > UINT8_MAX) {
+error_setg(errp,
+   "Invalid %s PMU event (mask: %s): "
+   "Numerical result out of range. "
+   "The mask must be a u

[PATCH v2 04/10] cpus: Prefer cached CpuClass over CPU_GET_CLASS() macro

2025-01-22 Thread Philippe Mathieu-Daudé
CPUState caches its CPUClass since commit 6fbdff87062
("cpu: cache CPUClass in CPUState for hot code paths");
use it.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
---
 include/hw/core/cpu.h | 10 +++-
 cpu-common.c  | 10 
 cpu-target.c  |  6 ++---
 hw/core/cpu-common.c  | 13 --
 hw/core/cpu-system.c  | 55 +++
 5 files changed, 32 insertions(+), 62 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 7b6b22c431b..d9e19d192e4 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -826,10 +826,8 @@ const char *parse_cpu_option(const char *cpu_option);
  */
 static inline bool cpu_has_work(CPUState *cpu)
 {
-CPUClass *cc = CPU_GET_CLASS(cpu);
-
-g_assert(cc->has_work);
-return cc->has_work(cpu);
+g_assert(cpu->cc->has_work);
+return cpu->cc->has_work(cpu);
 }
 
 /**
@@ -968,9 +966,7 @@ void cpu_interrupt(CPUState *cpu, int mask);
  */
 static inline void cpu_set_pc(CPUState *cpu, vaddr addr)
 {
-CPUClass *cc = CPU_GET_CLASS(cpu);
-
-cc->set_pc(cpu, addr);
+cpu->cc->set_pc(cpu, addr);
 }
 
 /**
diff --git a/cpu-common.c b/cpu-common.c
index 4248b2d727e..3a409aacb2e 100644
--- a/cpu-common.c
+++ b/cpu-common.c
@@ -389,11 +389,10 @@ void process_queued_cpu_work(CPUState *cpu)
 int cpu_breakpoint_insert(CPUState *cpu, vaddr pc, int flags,
   CPUBreakpoint **breakpoint)
 {
-CPUClass *cc = CPU_GET_CLASS(cpu);
 CPUBreakpoint *bp;
 
-if (cc->gdb_adjust_breakpoint) {
-pc = cc->gdb_adjust_breakpoint(cpu, pc);
+if (cpu->cc->gdb_adjust_breakpoint) {
+pc = cpu->cc->gdb_adjust_breakpoint(cpu, pc);
 }
 
 bp = g_malloc(sizeof(*bp));
@@ -419,11 +418,10 @@ int cpu_breakpoint_insert(CPUState *cpu, vaddr pc, int 
flags,
 /* Remove a specific breakpoint.  */
 int cpu_breakpoint_remove(CPUState *cpu, vaddr pc, int flags)
 {
-CPUClass *cc = CPU_GET_CLASS(cpu);
 CPUBreakpoint *bp;
 
-if (cc->gdb_adjust_breakpoint) {
-pc = cc->gdb_adjust_breakpoint(cpu, pc);
+if (cpu->cc->gdb_adjust_breakpoint) {
+pc = cpu->cc->gdb_adjust_breakpoint(cpu, pc);
 }
 
 QTAILQ_FOREACH(bp, &cpu->breakpoints, entry) {
diff --git a/cpu-target.c b/cpu-target.c
index 89874496a41..98e9e7cc4a1 100644
--- a/cpu-target.c
+++ b/cpu-target.c
@@ -159,10 +159,8 @@ bool cpu_exec_realizefn(CPUState *cpu, Error **errp)
 void cpu_exec_unrealizefn(CPUState *cpu)
 {
 #ifndef CONFIG_USER_ONLY
-CPUClass *cc = CPU_GET_CLASS(cpu);
-
-if (cc->sysemu_ops->legacy_vmsd != NULL) {
-vmstate_unregister(NULL, cc->sysemu_ops->legacy_vmsd, cpu);
+if (cpu->cc->sysemu_ops->legacy_vmsd != NULL) {
+vmstate_unregister(NULL, cpu->cc->sysemu_ops->legacy_vmsd, cpu);
 }
 if (qdev_get_vmsd(DEVICE(cpu)) == NULL) {
 vmstate_unregister(NULL, &vmstate_cpu_common, cpu);
diff --git a/hw/core/cpu-common.c b/hw/core/cpu-common.c
index ff605059c15..886aa793c04 100644
--- a/hw/core/cpu-common.c
+++ b/hw/core/cpu-common.c
@@ -40,9 +40,7 @@ CPUState *cpu_by_arch_id(int64_t id)
 CPUState *cpu;
 
 CPU_FOREACH(cpu) {
-CPUClass *cc = CPU_GET_CLASS(cpu);
-
-if (cc->get_arch_id(cpu) == id) {
+if (cpu->cc->get_arch_id(cpu) == id) {
 return cpu;
 }
 }
@@ -101,11 +99,9 @@ static int cpu_common_gdb_write_register(CPUState *cpu, 
uint8_t *buf, int reg)
 
 void cpu_dump_state(CPUState *cpu, FILE *f, int flags)
 {
-CPUClass *cc = CPU_GET_CLASS(cpu);
-
-if (cc->dump_state) {
+if (cpu->cc->dump_state) {
 cpu_synchronize_state(cpu);
-cc->dump_state(cpu, f, flags);
+cpu->cc->dump_state(cpu, f, flags);
 }
 }
 
@@ -119,11 +115,10 @@ void cpu_reset(CPUState *cpu)
 static void cpu_common_reset_hold(Object *obj, ResetType type)
 {
 CPUState *cpu = CPU(obj);
-CPUClass *cc = CPU_GET_CLASS(cpu);
 
 if (qemu_loglevel_mask(CPU_LOG_RESET)) {
 qemu_log("CPU Reset (CPU %d)\n", cpu->cpu_index);
-log_cpu_state(cpu, cc->reset_dump_flags);
+log_cpu_state(cpu, cpu->cc->reset_dump_flags);
 }
 
 cpu->interrupt_request = 0;
diff --git a/hw/core/cpu-system.c b/hw/core/cpu-system.c
index 6aae28a349a..37d54d04bf8 100644
--- a/hw/core/cpu-system.c
+++ b/hw/core/cpu-system.c
@@ -25,10 +25,8 @@
 
 bool cpu_paging_enabled(const CPUState *cpu)
 {
-CPUClass *cc = CPU_GET_CLASS(cpu);
-
-if (cc->sysemu_ops->get_paging_enabled) {
-return cc->sysemu_ops->get_paging_enabled(cpu);
+if (cpu->cc->sysemu_ops->get_paging_enabled) {
+return cpu->cc->sysemu_ops->get_paging_enabled(cpu);
 }
 
 return false;
@@ -37,10 +35,8 @@ bool cpu_paging_enabled(const CPUState *cpu)
 bool cpu_get_memory_mapping(CPUState *cpu, MemoryMappingList *list,
 Error **errp)
 {
-CPUClass *cc = CPU_GET_CLASS(cpu);
-
-if (cc->sysemu_ops->get_memory_mapping) {
-retu

[PATCH v2 10/10] target/arm: Prefer cached CpuClass over CPU_GET_CLASS() macro

2025-01-22 Thread Philippe Mathieu-Daudé
CPUState caches its CPUClass since commit 6fbdff87062
("cpu: cache CPUClass in CPUState for hot code paths");
use it.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
---
 target/arm/cpu.c | 3 +--
 target/arm/tcg/cpu-v7m.c | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index dc0231233a6..048b825a006 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -846,7 +846,6 @@ static inline bool arm_excp_unmasked(CPUState *cs, unsigned 
int excp_idx,
 
 static bool arm_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
 {
-CPUClass *cc = CPU_GET_CLASS(cs);
 CPUARMState *env = cpu_env(cs);
 uint32_t cur_el = arm_current_el(env);
 bool secure = arm_is_secure(env);
@@ -946,7 +945,7 @@ static bool arm_cpu_exec_interrupt(CPUState *cs, int 
interrupt_request)
  found:
 cs->exception_index = excp_idx;
 env->exception.target_el = target_el;
-cc->tcg_ops->do_interrupt(cs);
+cs->cc->tcg_ops->do_interrupt(cs);
 return true;
 }
 
diff --git a/target/arm/tcg/cpu-v7m.c b/target/arm/tcg/cpu-v7m.c
index 03acdf83e00..d2d0b94b630 100644
--- a/target/arm/tcg/cpu-v7m.c
+++ b/target/arm/tcg/cpu-v7m.c
@@ -19,7 +19,6 @@
 
 static bool arm_v7m_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
 {
-CPUClass *cc = CPU_GET_CLASS(cs);
 ARMCPU *cpu = ARM_CPU(cs);
 CPUARMState *env = &cpu->env;
 bool ret = false;
@@ -35,7 +34,7 @@ static bool arm_v7m_cpu_exec_interrupt(CPUState *cs, int 
interrupt_request)
 if (interrupt_request & CPU_INTERRUPT_HARD
 && (armv7m_nvic_can_take_pending_exception(env->nvic))) {
 cs->exception_index = EXCP_IRQ;
-cc->tcg_ops->do_interrupt(cs);
+cs->cc->tcg_ops->do_interrupt(cs);
 ret = true;
 }
 return ret;
-- 
2.47.1




[PATCH] hw/boards: Convert MachineClass bitfields to boolean

2025-01-22 Thread Philippe Mathieu-Daudé
As Daniel mentioned:

 "The number of instances of MachineClass is not large enough
  that we save a useful amount of memory through bitfields."

Also, see recent commit ecbf3567e21 ("docs/devel/style: add a
section about bitfield, and disallow them for packed structures").

Convert the MachineClass bitfields used as booleans into real ones.

Suggested-by: Daniel P. Berrangé 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/boards.h| 14 +++---
 hw/arm/aspeed.c|  6 +++---
 hw/arm/fby35.c |  4 ++--
 hw/arm/npcm7xx_boards.c|  6 +++---
 hw/arm/raspi.c |  6 +++---
 hw/arm/sbsa-ref.c  |  2 +-
 hw/arm/virt.c  |  2 +-
 hw/arm/xilinx_zynq.c   |  2 +-
 hw/avr/arduino.c   |  6 +++---
 hw/core/null-machine.c | 10 +-
 hw/i386/microvm.c  |  2 +-
 hw/i386/pc_piix.c  |  2 +-
 hw/i386/pc_q35.c   |  4 ++--
 hw/loongarch/virt.c|  2 +-
 hw/m68k/virt.c |  6 +++---
 hw/ppc/pnv.c   |  2 +-
 hw/ppc/spapr.c |  2 +-
 hw/riscv/virt.c|  2 +-
 hw/s390x/s390-virtio-ccw.c |  8 
 hw/xtensa/sim.c|  2 +-
 20 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index 2ad711e56db..ff5904d6fd8 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -279,13 +279,13 @@ struct MachineClass {
 int max_cpus;
 int min_cpus;
 int default_cpus;
-unsigned int no_serial:1,
-no_parallel:1,
-no_floppy:1,
-no_cdrom:1,
-no_sdcard:1,
-pci_allow_0_address:1,
-legacy_fw_cfg_order:1;
+bool no_serial;
+bool no_parallel;
+bool no_floppy;
+bool no_cdrom;
+bool no_sdcard;
+bool pci_allow_0_address;
+bool legacy_fw_cfg_order;
 bool is_default;
 const char *default_machine_opts;
 const char *default_boot_order;
diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index a18d4ed1fb1..dc91052e94d 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -1225,9 +1225,9 @@ static void aspeed_machine_class_init(ObjectClass *oc, 
void *data)
 AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
 
 mc->init = aspeed_machine_init;
-mc->no_floppy = 1;
-mc->no_cdrom = 1;
-mc->no_parallel = 1;
+mc->no_floppy = true;
+mc->no_cdrom = true;
+mc->no_parallel = true;
 mc->default_ram_id = "ram";
 amc->macs_mask = ASPEED_MAC0_ON;
 amc->uart_default = ASPEED_DEV_UART5;
diff --git a/hw/arm/fby35.c b/hw/arm/fby35.c
index 83d08e578b7..04d0eb9b0c1 100644
--- a/hw/arm/fby35.c
+++ b/hw/arm/fby35.c
@@ -168,8 +168,8 @@ static void fby35_class_init(ObjectClass *oc, void *data)
 
 mc->desc = "Meta Platforms fby35";
 mc->init = fby35_init;
-mc->no_floppy = 1;
-mc->no_cdrom = 1;
+mc->no_floppy = true;
+mc->no_cdrom = true;
 mc->min_cpus = mc->max_cpus = mc->default_cpus = 3;
 
 object_class_property_add_bool(oc, "execute-in-place",
diff --git a/hw/arm/npcm7xx_boards.c b/hw/arm/npcm7xx_boards.c
index 7727e0dc4bb..c9735b357cd 100644
--- a/hw/arm/npcm7xx_boards.c
+++ b/hw/arm/npcm7xx_boards.c
@@ -461,9 +461,9 @@ static void npcm7xx_machine_class_init(ObjectClass *oc, 
void *data)
 NULL
 };
 
-mc->no_floppy = 1;
-mc->no_cdrom = 1;
-mc->no_parallel = 1;
+mc->no_floppy = true;
+mc->no_cdrom = true;
+mc->no_parallel = true;
 mc->default_ram_id = "ram";
 mc->valid_cpu_types = valid_cpu_types;
 }
diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index a7a662f40db..665ccd9b50b 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -322,9 +322,9 @@ void raspi_machine_class_common_init(MachineClass *mc,
board_type(board_rev),
FIELD_EX32(board_rev, REV_CODE, REVISION));
 mc->block_default_type = IF_SD;
-mc->no_parallel = 1;
-mc->no_floppy = 1;
-mc->no_cdrom = 1;
+mc->no_parallel = true;
+mc->no_floppy = true;
+mc->no_cdrom = true;
 mc->default_cpus = mc->min_cpus = mc->max_cpus = cores_count(board_rev);
 mc->default_ram_size = board_ram_size(board_rev);
 mc->default_ram_id = "ram";
diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index 6183111f2de..33c6b9ea3ec 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -899,7 +899,7 @@ static void sbsa_ref_class_init(ObjectClass *oc, void *data)
 mc->pci_allow_0_address = true;
 mc->minimum_page_bits = 12;
 mc->block_default_type = IF_IDE;
-mc->no_cdrom = 1;
+mc->no_cdrom = true;
 mc->default_nic = "e1000e";
 mc->default_ram_size = 1 * GiB;
 mc->default_ram_id = "sbsa-ref.ram";
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 99e0a68b6c5..8de57be1d1c 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -3124,7 +3124,7 @@ static void virt_machine_class_init(ObjectClass *oc, void 
*data)
 machine_class_allow_dynamic_sysbus_dev(mc, TYPE_TPM_TIS_SYSBUS);
 #endif
 mc-

Re: [PATCH 06/22] exec/cpu: Call cpu_remove_sync() once in cpu_common_unrealize()

2025-01-22 Thread Igor Mammedov
On Thu, 16 Jan 2025 19:05:46 +0100
Philippe Mathieu-Daudé  wrote:

> On 28/11/23 17:42, Igor Mammedov wrote:
> > On Mon, 18 Sep 2023 18:02:39 +0200
> > Philippe Mathieu-Daudé  wrote:
> >   
> >> While create_vcpu_thread() creates a vCPU thread, its counterpart
> >> is cpu_remove_sync(), which joins and destroys the thread.
> >>
> >> create_vcpu_thread() is called in qemu_init_vcpu(), itself called
> >> in cpu_common_realizefn(). Since we don't have qemu_deinit_vcpu()
> >> helper (we probably don't need any), simply destroy the thread in
> >> cpu_common_unrealizefn().
> >>
> >> Note: only the PPC and X86 targets were calling cpu_remove_sync(),
> >> meaning all other targets were leaking the thread when the vCPU
> >> was unrealized (mostly when vCPU are hot-unplugged).
> >>
> >> Signed-off-by: Philippe Mathieu-Daudé 
> >> ---
> >>   hw/core/cpu-common.c  | 3 +++
> >>   target/i386/cpu.c | 1 -
> >>   target/ppc/cpu_init.c | 2 --
> >>   3 files changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/hw/core/cpu-common.c b/hw/core/cpu-common.c
> >> index a3b8de7054..e5841c59df 100644
> >> --- a/hw/core/cpu-common.c
> >> +++ b/hw/core/cpu-common.c
> >> @@ -221,6 +221,9 @@ static void cpu_common_unrealizefn(DeviceState *dev)
> >>   
> >>   /* NOTE: latest generic point before the cpu is fully unrealized */
> >>   cpu_exec_unrealizefn(cpu);
> >> +
> >> +/* Destroy vCPU thread */
> >> +cpu_remove_sync(cpu);
> >>   }
> >>   
> >>   static void cpu_common_initfn(Object *obj)
> >> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> >> index cb41d30aab..d79797d963 100644
> >> --- a/target/i386/cpu.c
> >> +++ b/target/i386/cpu.c
> >> @@ -7470,7 +7470,6 @@ static void x86_cpu_unrealizefn(DeviceState *dev)
> >>   X86CPUClass *xcc = X86_CPU_GET_CLASS(dev);
> >>   
> >>   #ifndef CONFIG_USER_ONLY
> >> -cpu_remove_sync(CPU(dev));
> >>   qemu_unregister_reset(x86_cpu_machine_reset_cb, dev);
> >>   #endif  
> > 
> > missing  followup context:
> >  ...
> >  xcc->parent_unrealize(dev);
> > 
> > Before the patch, the vcpu thread is stopped and only then
> > cleanup happens.
> > 
> > After the patch we have cleanup while vcpu thread is still running.
> > 
> > Even if it doesn't explode, such ordering still seems to be wrong.  
> 
> OK.

Looking at all users, some stop the vcpu thread before tearing down the vcpu
object and interrupt controller, while some do it the other way around, or a
mix of both.

It's probably safe to stop the vcpu thread wrt intc cleanup.
Can you check what would happen if there were a pending interrupt,
but then the following happened:
 1. the intc is destroyed
 2. could the vcpu thread, just kicked out of KVM_RUN, trip over
    missing/invalid intc state while it runs towards its exit point?

If the above can't crash, then I'd prefer to stop the vcpu at least before the
vcpu cleanup is run, i.e. put cpu_remove_sync() as the very first call inside
of cpu_common_unrealizefn().
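
Something along these lines (a sketch of the reordering Igor suggests, not a
tested change):

    static void cpu_common_unrealizefn(DeviceState *dev)
    {
        CPUState *cpu = CPU(dev);

        /* Stop and join the vCPU thread before anything else is torn down. */
        cpu_remove_sync(cpu);

        /* ... rest of the teardown, ending with ... */
        cpu_exec_unrealizefn(cpu);
    }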

> >> diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
> >> index e2c06c1f32..24d4e8fa7e 100644
> >> --- a/target/ppc/cpu_init.c
> >> +++ b/target/ppc/cpu_init.c
> >> @@ -6853,8 +6853,6 @@ static void ppc_cpu_unrealize(DeviceState *dev)
> >>   
> >>   pcc->parent_unrealize(dev);
> >>   
> >> -cpu_remove_sync(CPU(cpu));  
> > 
> > bug in current code?  
> 
> Plausibly. See:
> 
> commit f1023d21e81b7bf523ddf2ac91a48117f20ef9d7
> Author: Greg Kurz 
> Date:   Thu Oct 15 23:18:32 2020 +0200
> 
>  spapr: Unrealize vCPUs with qdev_unrealize()
> 
>  Since we introduced CPU hot-unplug in sPAPR, we don't unrealize the
>  vCPU objects explicitly. Instead, we let QOM handle that for us
>  under object_property_del_all() when the CPU core object is
>  finalized. The only thing we do is calling cpu_remove_sync() to
>  tear the vCPU thread down.
> 
>  This happens to work but it is ugly because:
>  - we call qdev_realize() but the corresponding qdev_unrealize() is
>buried deep in the QOM code
>  - we call cpu_remove_sync() to undo qemu_init_vcpu() called by
>ppc_cpu_realize() in target/ppc/translate_init.c.inc
>  - the CPU init and teardown paths aren't really symmetrical
> 
>  The latter didn't bite us so far but a future patch that greatly
>  simplifies the CPU core realize path needs it to avoid a crash
>  in QOM.
> 
>  For all these reasons, have ppc_cpu_unrealize() to undo the changes
>  of ppc_cpu_realize() by calling cpu_remove_sync() at the right
>  place, and have the sPAPR CPU core code to call qdev_unrealize().
> 
>  This requires to add a missing stub because translate_init.c.inc is
>  also compiled for user mode.
> 
> >   
> >> -
> >>   destroy_ppc_opcodes(cpu);
> >>   }
> >> 
> 




Re: [PATCH] hw/boards: Convert MachineClass bitfields to boolean

2025-01-22 Thread Peter Maydell
On Wed, 22 Jan 2025 at 12:36, Thomas Huth  wrote:
>
> On 22/01/2025 11.32, Philippe Mathieu-Daudé wrote:
> > As Daniel mentioned:
> >
> >   "The number of instances of MachineClass is not large enough
> >that we save a useful amount of memory through bitfields."
> >
> > Also, see recent commit ecbf3567e21 ("docs/devel/style: add a
> > section about bitfield, and disallow them for packed structures").
> >
> > Convert the MachineClass bitfields used as boolean as real ones.
> >
> > Suggested-by: Daniel P. Berrangé 
> > Signed-off-by: Philippe Mathieu-Daudé 
> > ---
> >   include/hw/boards.h| 14 +++---
> >   hw/arm/aspeed.c|  6 +++---
> >   hw/arm/fby35.c |  4 ++--
> >   hw/arm/npcm7xx_boards.c|  6 +++---
> >   hw/arm/raspi.c |  6 +++---
> >   hw/arm/sbsa-ref.c  |  2 +-
> >   hw/arm/virt.c  |  2 +-
> >   hw/arm/xilinx_zynq.c   |  2 +-
> >   hw/avr/arduino.c   |  6 +++---
> >   hw/core/null-machine.c | 10 +-
> >   hw/i386/microvm.c  |  2 +-
> >   hw/i386/pc_piix.c  |  2 +-
> >   hw/i386/pc_q35.c   |  4 ++--
> >   hw/loongarch/virt.c|  2 +-
> >   hw/m68k/virt.c |  6 +++---
> >   hw/ppc/pnv.c   |  2 +-
> >   hw/ppc/spapr.c |  2 +-
> >   hw/riscv/virt.c|  2 +-
> >   hw/s390x/s390-virtio-ccw.c |  8 
> >   hw/xtensa/sim.c|  2 +-
> >   20 files changed, 45 insertions(+), 45 deletions(-)
>
> So if you are touching all these files, why not go with an even more
> meaningful rework instead? Flip the meaning of the "no_*" flags to the
> opposite, so that we e.g. have "has_default_cdrom" instead of "no_cdrom",
> then new boards would not have to remember to set these ugly "no_" flags
> anymore. It's quite a bit of work, but it could certainly be helpful in the
> long run.

Well, that depends on what you think the default for new
boards should be. I suspect these are all no_foo because
when they were put in, the idea was "all boards should
by default have a foo, and 'this board defaults to not
having a foo' is the rarer special case it has to set"...

-- PMM



Re: [PATCH 04/10] rust: pl011: extract CharBackend receive logic into a separate function

2025-01-22 Thread Zhao Liu
On Fri, Jan 17, 2025 at 10:26:51AM +0100, Paolo Bonzini wrote:
> Date: Fri, 17 Jan 2025 10:26:51 +0100
> From: Paolo Bonzini 
> Subject: [PATCH 04/10] rust: pl011: extract CharBackend receive logic into
>  a separate function
> X-Mailer: git-send-email 2.47.1
> 
> Prepare for moving all references to the registers and the FIFO into a
> separate struct.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  rust/hw/char/pl011/src/device.rs | 15 +--
>  1 file changed, 9 insertions(+), 6 deletions(-)

[snip]

> -pub fn put_fifo(&mut self, value: c_uint) {
> +pub fn put_fifo(&mut self, value: u32) {
>  let depth = self.fifo_depth();
>  assert!(depth > 0);
>  let slot = (self.read_pos + self.read_count) & (depth - 1);
> @@ -615,12 +621,9 @@ pub fn write(&mut self, offset: hwaddr, value: u64) {
>  unsafe {
>  debug_assert!(!opaque.is_null());
>  let mut state = NonNull::new_unchecked(opaque.cast::());
> -if state.as_ref().loopback_enabled() {
> -return;
> -}
>  if size > 0 {
>  debug_assert!(!buf.is_null());
> -state.as_mut().put_fifo(c_uint::from(buf.read_volatile()))

An extra question...here I'm not sure, do we really need read_volatile?

> +state.as_mut().receive(u32::from(buf.read_volatile()));
>  }
>  }
>  }

Patch is fine for me,

Reviewed-by: Zhao Liu 





[PATCH v2 05/10] accel: Prefer cached CpuClass over CPU_GET_CLASS() macro

2025-01-22 Thread Philippe Mathieu-Daudé
CpuState caches its CPUClass since commit 6fbdff87062
("cpu: cache CPUClass in CPUState for hot code paths"),
use it.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
---
 accel/accel-target.c  | 12 +---
 accel/tcg/tcg-accel-ops.c |  3 +--
 accel/tcg/translate-all.c |  2 +-
 accel/tcg/watchpoint.c|  9 -
 4 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/accel/accel-target.c b/accel/accel-target.c
index 08626c00c2d..8a16c0c3ae0 100644
--- a/accel/accel-target.c
+++ b/accel/accel-target.c
@@ -112,22 +112,20 @@ void accel_init_interfaces(AccelClass *ac)
 
 void accel_cpu_instance_init(CPUState *cpu)
 {
-CPUClass *cc = CPU_GET_CLASS(cpu);
-
-if (cc->accel_cpu && cc->accel_cpu->cpu_instance_init) {
-cc->accel_cpu->cpu_instance_init(cpu);
+if (cpu->cc->accel_cpu && cpu->cc->accel_cpu->cpu_instance_init) {
+cpu->cc->accel_cpu->cpu_instance_init(cpu);
 }
 }
 
 bool accel_cpu_common_realize(CPUState *cpu, Error **errp)
 {
-CPUClass *cc = CPU_GET_CLASS(cpu);
 AccelState *accel = current_accel();
 AccelClass *acc = ACCEL_GET_CLASS(accel);
 
 /* target specific realization */
-if (cc->accel_cpu && cc->accel_cpu->cpu_target_realize
-&& !cc->accel_cpu->cpu_target_realize(cpu, errp)) {
+if (cpu->cc->accel_cpu
+&& cpu->cc->accel_cpu->cpu_target_realize
+&& !cpu->cc->accel_cpu->cpu_target_realize(cpu, errp)) {
 return false;
 }
 
diff --git a/accel/tcg/tcg-accel-ops.c b/accel/tcg/tcg-accel-ops.c
index 6e3f1fa92b2..299d6176cfb 100644
--- a/accel/tcg/tcg-accel-ops.c
+++ b/accel/tcg/tcg-accel-ops.c
@@ -120,10 +120,9 @@ static inline int xlat_gdb_type(CPUState *cpu, int gdbtype)
 [GDB_WATCHPOINT_ACCESS] = BP_GDB | BP_MEM_ACCESS,
 };
 
-CPUClass *cc = CPU_GET_CLASS(cpu);
 int cputype = xlat[gdbtype];
 
-if (cc->gdb_stop_before_watchpoint) {
+if (cpu->cc->gdb_stop_before_watchpoint) {
 cputype |= BP_STOP_BEFORE_ACCESS;
 }
 return cputype;
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index d56ca13cddf..5a378cb0281 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -622,7 +622,7 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
  * to account for the re-execution of the branch.
  */
 n = 1;
-cc = CPU_GET_CLASS(cpu);
+cc = cpu->cc;
 if (cc->tcg_ops->io_recompile_replay_branch &&
 cc->tcg_ops->io_recompile_replay_branch(cpu, tb)) {
 cpu->neg.icount_decr.u16.low++;
diff --git a/accel/tcg/watchpoint.c b/accel/tcg/watchpoint.c
index af57d182d5b..52e550dec6b 100644
--- a/accel/tcg/watchpoint.c
+++ b/accel/tcg/watchpoint.c
@@ -69,7 +69,6 @@ int cpu_watchpoint_address_matches(CPUState *cpu, vaddr addr, vaddr len)
 void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr len,
   MemTxAttrs attrs, int flags, uintptr_t ra)
 {
-CPUClass *cc = CPU_GET_CLASS(cpu);
 CPUWatchpoint *wp;
 
 assert(tcg_enabled());
@@ -85,9 +84,9 @@ void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr len,
 return;
 }
 
-if (cc->tcg_ops->adjust_watchpoint_address) {
+if (cpu->cc->tcg_ops->adjust_watchpoint_address) {
 /* this is currently used only by ARM BE32 */
-addr = cc->tcg_ops->adjust_watchpoint_address(cpu, addr, len);
+addr = cpu->cc->tcg_ops->adjust_watchpoint_address(cpu, addr, len);
 }
 
 assert((flags & ~BP_MEM_ACCESS) == 0);
@@ -119,8 +118,8 @@ void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr len,
 wp->hitattrs = attrs;
 
 if (wp->flags & BP_CPU
-&& cc->tcg_ops->debug_check_watchpoint
-&& !cc->tcg_ops->debug_check_watchpoint(cpu, wp)) {
+&& cpu->cc->tcg_ops->debug_check_watchpoint
+&& !cpu->cc->tcg_ops->debug_check_watchpoint(cpu, wp)) {
 wp->flags &= ~BP_WATCHPOINT_HIT;
 continue;
 }
-- 
2.47.1




[PATCH v2 06/10] user: Prefer cached CpuClass over CPU_GET_CLASS() macro

2025-01-22 Thread Philippe Mathieu-Daudé
CpuState caches its CPUClass since commit 6fbdff87062
("cpu: cache CPUClass in CPUState for hot code paths"),
use it.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
---
 linux-user/alpha/target_proc.h | 2 +-
 bsd-user/signal.c  | 4 ++--
 linux-user/signal.c| 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/linux-user/alpha/target_proc.h b/linux-user/alpha/target_proc.h
index dac37dffc9d..da437ee0e56 100644
--- a/linux-user/alpha/target_proc.h
+++ b/linux-user/alpha/target_proc.h
@@ -15,7 +15,7 @@ static int open_cpuinfo(CPUArchState *cpu_env, int fd)
 const char *p, *q;
 int t;
 
-p = object_class_get_name(OBJECT_CLASS(CPU_GET_CLASS(env_cpu(cpu_env))));
+p = object_class_get_name(OBJECT_CLASS(env_cpu(cpu_env)->cc));
 q = strchr(p, '-');
 t = q - p;
 assert(t < sizeof(model));
diff --git a/bsd-user/signal.c b/bsd-user/signal.c
index b4e1458237a..4e32cd64f18 100644
--- a/bsd-user/signal.c
+++ b/bsd-user/signal.c
@@ -1021,7 +1021,7 @@ void process_pending_signals(CPUArchState *env)
 void cpu_loop_exit_sigsegv(CPUState *cpu, target_ulong addr,
MMUAccessType access_type, bool maperr, uintptr_t ra)
 {
-const TCGCPUOps *tcg_ops = CPU_GET_CLASS(cpu)->tcg_ops;
+const TCGCPUOps *tcg_ops = cpu->cc->tcg_ops;
 
 if (tcg_ops->record_sigsegv) {
 tcg_ops->record_sigsegv(cpu, addr, access_type, maperr, ra);
@@ -1037,7 +1037,7 @@ void cpu_loop_exit_sigsegv(CPUState *cpu, target_ulong addr,
 void cpu_loop_exit_sigbus(CPUState *cpu, target_ulong addr,
   MMUAccessType access_type, uintptr_t ra)
 {
-const TCGCPUOps *tcg_ops = CPU_GET_CLASS(cpu)->tcg_ops;
+const TCGCPUOps *tcg_ops = cpu->cc->tcg_ops;
 
 if (tcg_ops->record_sigbus) {
 tcg_ops->record_sigbus(cpu, addr, access_type, ra);
diff --git a/linux-user/signal.c b/linux-user/signal.c
index 087c4d270e4..53b40e82261 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -743,7 +743,7 @@ void force_sigsegv(int oldsig)
 void cpu_loop_exit_sigsegv(CPUState *cpu, target_ulong addr,
MMUAccessType access_type, bool maperr, uintptr_t ra)
 {
-const TCGCPUOps *tcg_ops = CPU_GET_CLASS(cpu)->tcg_ops;
+const TCGCPUOps *tcg_ops = cpu->cc->tcg_ops;
 
 if (tcg_ops->record_sigsegv) {
 tcg_ops->record_sigsegv(cpu, addr, access_type, maperr, ra);
@@ -759,7 +759,7 @@ void cpu_loop_exit_sigsegv(CPUState *cpu, target_ulong addr,
 void cpu_loop_exit_sigbus(CPUState *cpu, target_ulong addr,
   MMUAccessType access_type, uintptr_t ra)
 {
-const TCGCPUOps *tcg_ops = CPU_GET_CLASS(cpu)->tcg_ops;
+const TCGCPUOps *tcg_ops = cpu->cc->tcg_ops;
 
 if (tcg_ops->record_sigbus) {
 tcg_ops->record_sigbus(cpu, addr, access_type, ra);
-- 
2.47.1




[PATCH v2 07/10] disas: Prefer cached CpuClass over CPU_GET_CLASS() macro

2025-01-22 Thread Philippe Mathieu-Daudé
CpuState caches its CPUClass since commit 6fbdff87062
("cpu: cache CPUClass in CPUState for hot code paths"),
use it.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
---
 disas/disas-common.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/disas/disas-common.c b/disas/disas-common.c
index de61f6d8a12..57505823cb7 100644
--- a/disas/disas-common.c
+++ b/disas/disas-common.c
@@ -67,9 +67,8 @@ void disas_initialize_debug_target(CPUDebug *s, CPUState *cpu)
 s->info.endian =  BFD_ENDIAN_LITTLE;
 }
 
-CPUClass *cc = CPU_GET_CLASS(cpu);
-if (cc->disas_set_info) {
-cc->disas_set_info(cpu, &s->info);
+if (cpu->cc->disas_set_info) {
+cpu->cc->disas_set_info(cpu, &s->info);
 }
 }
 
-- 
2.47.1




[PATCH v2 09/10] hw/acpi: Prefer cached CpuClass over CPU_GET_CLASS() macro

2025-01-22 Thread Philippe Mathieu-Daudé
CpuState caches its CPUClass since commit 6fbdff87062
("cpu: cache CPUClass in CPUState for hot code paths"),
use it.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
---
 hw/acpi/cpu.c | 4 ++--
 hw/acpi/cpu_hotplug.c | 3 +--
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
index f70a2c045e1..6f1ae79edbf 100644
--- a/hw/acpi/cpu.c
+++ b/hw/acpi/cpu.c
@@ -235,8 +235,8 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
 
 static AcpiCpuStatus *get_cpu_status(CPUHotplugState *cpu_st, DeviceState *dev)
 {
-CPUClass *k = CPU_GET_CLASS(dev);
-uint64_t cpu_arch_id = k->get_arch_id(CPU(dev));
+CPUState *cpu = CPU(dev);
+uint64_t cpu_arch_id = cpu->cc->get_arch_id(cpu);
 int i;
 
 for (i = 0; i < cpu_st->dev_count; i++) {
diff --git a/hw/acpi/cpu_hotplug.c b/hw/acpi/cpu_hotplug.c
index 83b8bc5deb8..aa0e1e3efa5 100644
--- a/hw/acpi/cpu_hotplug.c
+++ b/hw/acpi/cpu_hotplug.c
@@ -62,10 +62,9 @@ static const MemoryRegionOps AcpiCpuHotplug_ops = {
 static void acpi_set_cpu_present_bit(AcpiCpuHotplug *g, CPUState *cpu,
  bool *swtchd_to_modern)
 {
-CPUClass *k = CPU_GET_CLASS(cpu);
 int64_t cpu_id;
 
-cpu_id = k->get_arch_id(cpu);
+cpu_id = cpu->cc->get_arch_id(cpu);
 if ((cpu_id / 8) >= ACPI_GPE_PROC_LEN) {
 object_property_set_bool(g->device, "cpu-hotplug-legacy", false,
  &error_abort);
-- 
2.47.1




[PATCH v2 01/10] hw/core/generic-loader: Do not open-code cpu_set_pc()

2025-01-22 Thread Philippe Mathieu-Daudé
Directly call cpu_set_pc() instead of open-coding it.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/core/generic-loader.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/hw/core/generic-loader.c b/hw/core/generic-loader.c
index fb354693aff..1b9ab600c9c 100644
--- a/hw/core/generic-loader.c
+++ b/hw/core/generic-loader.c
@@ -48,11 +48,8 @@ static void generic_loader_reset(void *opaque)
 GenericLoaderState *s = GENERIC_LOADER(opaque);
 
 if (s->set_pc) {
-CPUClass *cc = CPU_GET_CLASS(s->cpu);
 cpu_reset(s->cpu);
-if (cc) {
-cc->set_pc(s->cpu, s->addr);
-}
+cpu_set_pc(s->cpu, s->addr);
 }
 
 if (s->data_len) {
-- 
2.47.1




Re: [PATCH v2 02/10] gdbstub: Clarify no more than @gdb_num_core_regs can be accessed

2025-01-22 Thread Philippe Mathieu-Daudé

On 22/1/25 10:30, Philippe Mathieu-Daudé wrote:

Both CPUClass::gdb_read_register() and CPUClass::gdb_write_register()
handlers are called from common gdbstub code, and won't be called with
register index over CPUClass::gdb_num_core_regs:

   int gdb_read_register(CPUState *cpu, GByteArray *buf, int reg)
   {
   CPUClass *cc = CPU_GET_CLASS(cpu);

   if (reg < cc->gdb_num_core_regs) {
   return cc->gdb_read_register(cpu, buf, reg);
   }
   ...
   }

   static int gdb_write_register(CPUState *cpu, uint8_t *mem_buf, int reg)
   {
   CPUClass *cc = CPU_GET_CLASS(cpu);

   if (reg < cc->gdb_num_core_regs) {
   return cc->gdb_write_register(cpu, mem_buf, reg);
   }
   ...
   }

Clarify that in CPUClass docstring, and remove unreachable code on
the microblaze and tricore implementations.


s/tricore/openrisc/ 🤦



Signed-off-by: Philippe Mathieu-Daudé 
---
  include/hw/core/cpu.h   | 2 ++
  target/microblaze/gdbstub.c | 5 -
  target/openrisc/gdbstub.c   | 5 -
  3 files changed, 2 insertions(+), 10 deletions(-)





Re: [PATCH 2/7] guest_memfd: Introduce an object to manage the guest-memfd with RamDiscardManager

2025-01-22 Thread Xu Yilun
On Wed, Jan 22, 2025 at 03:30:05PM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 22/1/25 02:18, Peter Xu wrote:
> > On Tue, Jun 25, 2024 at 12:31:13AM +0800, Xu Yilun wrote:
> > > On Mon, Jan 20, 2025 at 03:46:15PM -0500, Peter Xu wrote:
> > > > On Mon, Jan 20, 2025 at 09:22:50PM +1100, Alexey Kardashevskiy wrote:
> > > > > > It is still uncertain how to implement the private MMIO. Our 
> > > > > > assumption
> > > > > > is the private MMIO would also create a memory region with
> > > > > > guest_memfd-like backend. Its mr->ram is true and should be managed 
> > > > > > by
> > > > > > RamdDiscardManager which can skip doing DMA_MAP in VFIO's region_add
> > > > > > listener.
> > > > > 
> > > > > My current working approach is to leave it as is in QEMU and VFIO.
> > > > 
> > > > Agreed.  Setting ram=true to even private MMIO sounds hackish, at least
> > > 
> > > The private MMIO refers to assigned MMIO, not emulated MMIO. IIUC,
> > > normal assigned MMIO is always set ram=true,
> > > 
> > > void memory_region_init_ram_device_ptr(MemoryRegion *mr,
> > > Object *owner,
> > > const char *name,
> > > uint64_t size,
> > > void *ptr)
> > > {
> > >  memory_region_init(mr, owner, name, size);
> > >  mr->ram = true;
> > > 
> > > 
> > > So I don't think ram=true is a problem here.
> > 
> > I see.  If there's always a host pointer then it looks valid.  So it means
> > the device private MMIOs are always mappable since the start?
> 
> Yes. VFIO owns the mapping and does not treat shared/private MMIO any
> different at the moment. Thanks,

mm.. I'm actually expecting private MMIO to not have a host pointer, just
as private memory does.

But I'm not sure why having a host pointer correlates with mr->ram == true.

Thanks,
Yilun

> 
> > 
> > Thanks,
> > 
> 
> -- 
> Alexey
> 



Re: [PATCH] tests/functional: Fix broken decorators with lamda functions

2025-01-22 Thread Daniel P . Berrangé
On Tue, Jan 21, 2025 at 07:58:14AM +0100, Thomas Huth wrote:
> The decorators that use a lambda function are currently broken
> and do not properly skip the test if the condition is not met.
> Using "return skipUnless(lambda: ...)" does not work as expected.
> To fix it, rewrite the decorators without lambdas; it's simpler
> that way anyway.
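
A minimal sketch of why the lambda form silently passes, assuming standard
unittest semantics: skipUnless() evaluates its first argument as a plain
boolean, and a function object is always truthy, so the skip never triggers:

  from unittest import TestCase, skipUnless

  class Demo(TestCase):
      @skipUnless(lambda: False, "never skipped: a lambda object is truthy")
      def test_broken(self):
          pass   # still runs; the lambda is never called

      @skipUnless(False, "skipped: a plain boolean is evaluated")
      def test_fixed(self):
          pass   # skipped as intended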

Urgh, I clearly failed to re-test this properly. Originally
I wasn't using skipUnless as a helper, but had implemented
something that looked pretty much like skipUnless and then
refactored it :-(

> 
> skipIfMissingImports also needs to exec() the import statement,
> otherwise we always try to import a module called "impname" which
> does not exist.

Worth doing this as a separate commit.

> 
> Signed-off-by: Thomas Huth 
> ---
>  I've noticed the problem while trying to get the migration test
>  through the CI:
>  https://gitlab.com/thuth/qemu/-/jobs/8901960783#L100
>  ... the OpenSUSE containers apparently lack the "nc" binary ...
> 
>  tests/functional/qemu_test/decorators.py | 44 +++-
>  1 file changed, 21 insertions(+), 23 deletions(-)
> 
> diff --git a/tests/functional/qemu_test/decorators.py b/tests/functional/qemu_test/decorators.py
> index df088bc090..7750af7b7d 100644
> --- a/tests/functional/qemu_test/decorators.py
> +++ b/tests/functional/qemu_test/decorators.py
> @@ -16,15 +16,14 @@
>@skipIfMissingCommands("mkisofs", "losetup")
>  '''
>  def skipIfMissingCommands(*args):
> -def has_cmds(cmdlist):
> -for cmd in cmdlist:
> -if not which(cmd):
> -return False
> -return True
> -
> -return skipUnless(lambda: has_cmds(args),
> -  'required command(s) "%s" not installed' %
> -  ", ".join(args))
> +has_cmds = True
> +for cmd in args:
> + if not which(cmd):
> + has_cmds = False
> + break
> +
> +return skipUnless(has_cmds, 'required command(s) "%s" not installed' %
> +", ".join(args))
>  
>  '''
>  Decorator to skip execution of a test if the current
> @@ -35,9 +34,9 @@ def has_cmds(cmdlist):
>@skipIfNotMachine("x86_64", "aarch64")
>  '''
>  def skipIfNotMachine(*args):
> -return skipUnless(lambda: platform.machine() in args,
> -'not running on one of the required machine(s) "%s"' %
> -", ".join(args))
> +return skipUnless(platform.machine() in args,
> +  'not running on one of the required machine(s) "%s"' %
> +  ", ".join(args))
>  
>  '''
>  Decorator to skip execution of flaky tests, unless
> @@ -94,14 +93,13 @@ def skipBigDataTest():
>@skipIfMissingImports("numpy", "cv2")
>  '''
>  def skipIfMissingImports(*args):
> -def has_imports(importlist):
> -for impname in importlist:
> -try:
> -import impname
> -except ImportError:
> -return False
> -return True
> -
> -return skipUnless(lambda: has_imports(args),
> -  'required import(s) "%s" not installed' %
> -  ", ".join(args))
> +has_imports = True
> +for impname in args:
> +try:
> +exec('import %s' % impname)

I feel like the recommended approach would probably be to use

  importlib.import_module(impname)

> +except ImportError:
> +has_imports = False
> +break
> +
> +return skipUnless(has_imports, 'required import(s) "%s" not installed' %
> +   ", ".join(args))

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v1 1/4] physmem: disallow direct access to RAM DEVICE in address_space_write_rom()

2025-01-22 Thread Philippe Mathieu-Daudé

Hi David,

On 20/1/25 12:14, David Hildenbrand wrote:

As documented in commit 4a2e242bbb306 ("memory: Don't use memcpy for
ram_device regions"), we disallow direct access to RAM DEVICE regions.

Let's factor out the "supports direct access" check from
memory_access_is_direct() so we can reuse it, and make it a bit easier to
read.

This change implies that address_space_write_rom() and
cpu_memory_rw_debug() won't be able to write to RAM DEVICE regions. It
will also affect cpu_flush_icache_range(), but it's only used by
hw/core/loader.c after writing to ROM, so it is expected to not apply
here with RAM DEVICE.

This fixes direct access to these regions where we don't want direct
access. We'll extend cpu_memory_rw_debug() next to also be able to write to
these (and IO) regions.

This is a preparation for further changes.

Cc: Alex Williamson 
Signed-off-by: David Hildenbrand 
---
  include/exec/memory.h | 30 --
  system/physmem.c  |  3 +--
  2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3ee1901b52..bd0ddb9cdf 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2985,15 +2985,33 @@ MemTxResult address_space_write_cached_slow(MemoryRegionCache *cache,
  int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr);
  bool prepare_mmio_access(MemoryRegion *mr);
  
+static inline bool memory_region_supports_direct_access(MemoryRegion *mr)

+{
+/* ROM DEVICE regions only allow direct access if in ROMD mode. */
+if (memory_region_is_romd(mr)) {
+return true;
+}
+if (!memory_region_is_ram(mr)) {
+return false;
+}
+/*
+ * RAM DEVICE regions can be accessed directly using memcpy, but it might
+ * be MMIO and access using memcpy can be wrong (e.g., using instructions not
+ * intended for MMIO access). So we treat this as IO.
+ */
+return !memory_region_is_ram_device(mr);
+
+}
+
  static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
  {
-if (is_write) {
-return memory_region_is_ram(mr) && !mr->readonly &&
-   !mr->rom_device && !memory_region_is_ram_device(mr);
-} else {
-return (memory_region_is_ram(mr) && !memory_region_is_ram_device(mr)) ||


This patch is doing multiple things at once, and I'm having a hard time
reviewing it.


-   memory_region_is_romd(mr);
+if (!memory_region_supports_direct_access(mr)) {
+return false;
+}
+if (!is_write) {
+return true;
  }
+return !mr->readonly && !mr->rom_device;
  }


Trying to split.

1/ Extract starting with ram[_device]:
-- >8 --
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3ee1901b52c..5834a208618 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2987,2 +2987,15 @@ bool prepare_mmio_access(MemoryRegion *mr);

+static inline bool memory_region_supports_direct_access(MemoryRegion *mr)
+{
+if (!memory_region_is_ram(mr)) {
+return false;
+}
+/*
+ * RAM DEVICE regions can be accessed directly using memcpy, but it might
+ * be MMIO and access using memcpy can be wrong (e.g., using instructions not
+ * intended for MMIO access). So we treat this as IO.
+ */
+return !memory_region_is_ram_device(mr);
+}
+
 static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
@@ -2990,6 +3003,6 @@ static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
 if (is_write) {
-return memory_region_is_ram(mr) && !mr->readonly &&
-   !mr->rom_device && !memory_region_is_ram_device(mr);
+return !mr->readonly && !mr->rom_device &&
+   !memory_region_supports_direct_access(mr);
 } else {
-return (memory_region_is_ram(mr) && !memory_region_is_ram_device(mr)) ||
+return memory_region_supports_direct_access(mr) ||
memory_region_is_romd(mr);
---

2/ Call memory_region_supports_direct_access() once [dubious]
-- >8 --
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 5834a208618..4c5c84059b7 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -3002,8 +3002,10 @@ static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
 {
+if (!memory_region_supports_direct_access(mr)) {
+return false;
+}
+
 if (is_write) {
-return !mr->readonly && !mr->rom_device &&
-   !memory_region_supports_direct_access(mr);
+return !mr->readonly && !mr->rom_device;
 } else {
-return memory_region_supports_direct_access(mr) ||
-   memory_region_is_romd(mr);
+return memory_region_is_romd(mr);
 }
---

3/ Invert if ladders:
-- >8 --
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 4c5c84059b7..e89cd2f10f0 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -3006,7 +3006,7 @@ static inline bool 
memory_

Re: [PATCH v1 3/4] hmp: use cpu_get_phys_page_debug() in hmp_gva2gpa()

2025-01-22 Thread Philippe Mathieu-Daudé

On 20/1/25 12:15, David Hildenbrand wrote:

We don't need the MemTxAttrs, so let's simply use the simpler function
variant.

Signed-off-by: David Hildenbrand 
---
  monitor/hmp-cmds-target.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v1 1/4] physmem: disallow direct access to RAM DEVICE in address_space_write_rom()

2025-01-22 Thread David Hildenbrand

On 22.01.25 11:07, Philippe Mathieu-Daudé wrote:

Hi David,

On 20/1/25 12:14, David Hildenbrand wrote:

As documented in commit 4a2e242bbb306 ("memory: Don't use memcpy for
ram_device regions"), we disallow direct access to RAM DEVICE regions.

Let's factor out the "supports direct access" check from
memory_access_is_direct() so we can reuse it, and make it a bit easier to
read.

This change implies that address_space_write_rom() and
cpu_memory_rw_debug() won't be able to write to RAM DEVICE regions. It
will also affect cpu_flush_icache_range(), but it's only used by
hw/core/loader.c after writing to ROM, so it is expected to not apply
here with RAM DEVICE.

This fixes direct access to these regions where we don't want direct
access. We'll extend cpu_memory_rw_debug() next to also be able to write to
these (and IO) regions.

This is a preparation for further changes.

Cc: Alex Williamson 
Signed-off-by: David Hildenbrand 
---
   include/exec/memory.h | 30 --
   system/physmem.c  |  3 +--
   2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3ee1901b52..bd0ddb9cdf 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2985,15 +2985,33 @@ MemTxResult address_space_write_cached_slow(MemoryRegionCache *cache,
   int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr);
   bool prepare_mmio_access(MemoryRegion *mr);
   
+static inline bool memory_region_supports_direct_access(MemoryRegion *mr)

+{
+/* ROM DEVICE regions only allow direct access if in ROMD mode. */
+if (memory_region_is_romd(mr)) {
+return true;
+}
+if (!memory_region_is_ram(mr)) {
+return false;
+}
+/*
+ * RAM DEVICE regions can be accessed directly using memcpy, but it might
+ * be MMIO and access using memcpy can be wrong (e.g., using instructions not
+ * intended for MMIO access). So we treat this as IO.
+ */
+return !memory_region_is_ram_device(mr);
+
+}
+
   static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
   {
-if (is_write) {
-return memory_region_is_ram(mr) && !mr->readonly &&
-   !mr->rom_device && !memory_region_is_ram_device(mr);
-} else {
-return (memory_region_is_ram(mr) && !memory_region_is_ram_device(mr)) ||


This patch is doing multiple things at once, and I'm having a hard time
reviewing it.


I appreciate the review, but ... really?! :)

25 insertions(+), 8 deletions(-)

--
Cheers,

David / dhildenb



