Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support

2025-03-14 Thread Ani Sinha
On Fri, Mar 14, 2025 at 8:47 PM Jörg Rödel  wrote:
>
> On Fri, Mar 14, 2025 at 03:08:43PM +0100, Gerd Hoffman wrote:
> > If your input firmware image already is an IGVM (say coconut), what is
> > supposed to happen?
>
> The COCONUT igvmbuilder has the ability to take another IGVM file as
> input and add its directive and contents to the output file. This is
> needed for the Hyper-V firmware, which is also provided as an IGVM file.
> I think it can also make changes when processing an input IGVM, like
> changing the VMPL of a VMSA.
>
> This can be extended to cover more use-cases, e.g. like direct-boot of a
> linux kernel with initrd and command-line.

Have we considered how IGVM will be deployed in the FUKI case? I have
always assumed that UKI would be the right means for this. For edk2
for example, we are implementing support in UKI. Hopefully the same
can apply for IGVM.




[Stable-9.2.3 05/51] target/arm: Make CP_ACCESS_TRAPs to AArch32 EL3 be Monitor traps

2025-03-14 Thread Michael Tokarev
From: Peter Maydell 

In system register access pseudocode the common pattern for
AArch32 registers with access traps to EL3 is:

at EL1 and EL2:
  if HaveEL(EL3) && !ELUsingAArch32(EL3) && (SCR_EL3.TERR == 1) then
 AArch64.AArch32SystemAccessTrap(EL3, 0x03);
  elsif HaveEL(EL3) && ELUsingAArch32(EL3) && (SCR.TERR == 1) then
 AArch32.TakeMonitorTrapException();
at EL3:
  if (PSTATE.M != M32_Monitor) && (SCR.TERR == 1) then
 AArch32.TakeMonitorTrapException();

(taking as an example the ERRIDR access pseudocode).

This implements the behaviour of (in this case) SCR.TERR that
"Accesses to the specified registers from modes other than Monitor
mode generate a Monitor Trap exception" and of SCR_EL3.TERR that
"Accesses of the specified Error Record registers at EL2 and EL1
are trapped to EL3, unless the instruction generates a higher
priority exception".

In QEMU we don't implement this pattern correctly in two ways:
 * in access_check_cp_reg() we turn the CP_ACCESS_TRAP_EL3 into
   an UNDEF, not a trap to Monitor mode
 * in the access functions, we check trap bits like SCR.TERR
   only when arm_current_el(env) < 3 -- this is correct for
   AArch64 EL3, but misses the "trap non-Monitor-mode execution
   at EL3 into Monitor mode" case for AArch32 EL3

In this commit we fix the first of these two issues, by
making access_check_cp_reg() handle CP_ACCESS_TRAP_EL3
as a Monitor trap. This is a kind of exception that we haven't
yet implemented(!), so we need a new EXCP_MON_TRAP for it.

This diverges from the pseudocode approach, where every access check
function explicitly checks for "if EL3 is AArch32" and takes a
monitor trap; if we wanted to be closer to the pseudocode we could
add a new CP_ACCESS_TRAP_MONITOR and make all the accessfns use it
when appropriate.  But because there are no non-standard cases in the
pseudocode (i.e.  where either it raises a Monitor trap that doesn't
correspond to an AArch64 SystemAccessTrap or where it raises a
SystemAccessTrap that doesn't correspond to a Monitor trap), handling
this all in one place seems less likely to result in future bugs
where we forgot again about this special case when writing an
accessor.

(The cc of stable here is because "hw/intc/arm_gicv3_cpuif: Don't
downgrade monitor traps for AArch32 EL3" which is also cc:stable
will implicitly use the new EXCP_MON_TRAP code path.)

Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Maydell 
Reviewed-by: Alex Bennée 
Reviewed-by: Richard Henderson 
Message-id: 20250130182309.717346-6-peter.mayd...@linaro.org
(cherry picked from commit 4cf4948651615181c5bc3d0e4a9f5c46be576bb2)
Signed-off-by: Michael Tokarev 

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index d86e641280..e3416cd435 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -62,6 +62,7 @@
 #define EXCP_NMI26
 #define EXCP_VINMI  27
 #define EXCP_VFNMI  28
+#define EXCP_MON_TRAP   29   /* AArch32 trap to Monitor mode */
 /* NB: add new EXCP_ defines to the array in arm_log_exception() too */
 
 #define ARMV7M_EXCP_RESET   1
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 0e386c9ae5..fe255ccb43 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -10858,6 +10858,7 @@ void arm_log_exception(CPUState *cs)
 [EXCP_NMI] = "NMI",
 [EXCP_VINMI] = "Virtual IRQ NMI",
 [EXCP_VFNMI] = "Virtual FIQ NMI",
+[EXCP_MON_TRAP] = "Monitor Trap",
 };
 
 if (idx >= 0 && idx < ARRAY_SIZE(excnames)) {
@@ -11424,6 +11425,16 @@ static void arm_cpu_do_interrupt_aarch32(CPUState *cs)
 mask = CPSR_A | CPSR_I | CPSR_F;
 offset = 0;
 break;
+case EXCP_MON_TRAP:
+new_mode = ARM_CPU_MODE_MON;
+addr = 0x04;
+mask = CPSR_A | CPSR_I | CPSR_F;
+if (env->thumb) {
+offset = 2;
+} else {
+offset = 4;
+}
+break;
 default:
 cpu_abort(cs, "Unhandled exception 0x%x\n", cs->exception_index);
 return; /* Never happens.  Keep compiler happy.  */
diff --git a/target/arm/tcg/op_helper.c b/target/arm/tcg/op_helper.c
index 1ecb465988..7cde2337ac 100644
--- a/target/arm/tcg/op_helper.c
+++ b/target/arm/tcg/op_helper.c
@@ -758,6 +758,7 @@ const void *HELPER(access_check_cp_reg)(CPUARMState *env, 
uint32_t key,
 const ARMCPRegInfo *ri = get_arm_cp_reginfo(cpu->cp_regs, key);
 CPAccessResult res = CP_ACCESS_OK;
 int target_el;
+uint32_t excp;
 
 assert(ri != NULL);
 
@@ -842,8 +843,18 @@ const void *HELPER(access_check_cp_reg)(CPUARMState *env, 
uint32_t key,
 }
 
  fail:
+excp = EXCP_UDEF;
 switch (res & ~CP_ACCESS_EL_MASK) {
 case CP_ACCESS_TRAP:
+/*
+ * If EL3 is AArch32 then there's no syndrome register; the cases
+ * where we would raise a SystemAccessTrap to AArch64 EL3 all become
+ * raising a Monitor trap exception. (Because there's no visible
+ * syndrome it doesn't matter wh

[Stable-9.2.3 23/51] target/riscv: rvv: Fix unexpected behavior of vector reduction instructions when vl is 0

2025-03-14 Thread Michael Tokarev
From: Max Chou 

According to the Vector Reduction Operations section in the RISC-V "V"
Vector Extension spec,
"If vl=0, no operation is performed and the destination register is not
updated."

The vd should be updated when vl is larger than 0.

Fixes: fe5c9ab1fc ("target/riscv: vector single-width integer reduction 
instructions")
Fixes: f714361ed7 ("target/riscv: rvv-1.0: implement vstart CSR")
Signed-off-by: Max Chou 
Reviewed-by: Daniel Henrique Barboza 
Message-ID: <20250124101452.2519171-1-max.c...@sifive.com>
Signed-off-by: Alistair Francis 
(cherry picked from commit ffd455963f230c7dc04965609d6675da687a5a78)
Signed-off-by: Michael Tokarev 

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index a85dd1d200..3731500717 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -4648,7 +4648,9 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1,  
\
 } \
 s1 = OP(s1, (TD)s2);  \
 } \
-*((TD *)vd + HD(0)) = s1; \
+if (vl > 0) { \
+*((TD *)vd + HD(0)) = s1; \
+} \
 env->vstart = 0;  \
 /* set tail elements to 1s */ \
 vext_set_elems_1s(vd, vta, esz, vlenb);   \
@@ -4734,7 +4736,9 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1,  
 \
 }  \
 s1 = OP(s1, (TD)s2, &env->fp_status);  \
 }  \
-*((TD *)vd + HD(0)) = s1;  \
+if (vl > 0) {  \
+*((TD *)vd + HD(0)) = s1;  \
+}  \
 env->vstart = 0;   \
 /* set tail elements to 1s */  \
 vext_set_elems_1s(vd, vta, esz, vlenb);\
-- 
2.39.5




[Stable-9.2.3 14/51] amd_iommu: Use correct bitmask to set capability BAR

2025-03-14 Thread Michael Tokarev
From: Sairaj Kodilkar 

AMD IOMMU provides the base address of control registers through
IVRS table and PCI capability. Since this base address is of 64 bit,
use 32 bits mask (instead of 16 bits) to set BAR low and high.

Fixes: d29a09ca68 ("hw/i386: Introduce AMD IOMMU")
Signed-off-by: Sairaj Kodilkar 
Reviewed-by: Vasant Hegde 
Message-Id: <20250207045354.27329-3-sarun...@amd.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
(cherry picked from commit 3684717b7407cc395dc9bf522e193dbc85293dee)
Signed-off-by: Michael Tokarev 

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 9cf5b40200..ffb234fb5c 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -1593,9 +1593,9 @@ static void amdvi_pci_realize(PCIDevice *pdev, Error 
**errp)
 /* reset AMDVI specific capabilities, all r/o */
 pci_set_long(pdev->config + s->capab_offset, AMDVI_CAPAB_FEATURES);
 pci_set_long(pdev->config + s->capab_offset + AMDVI_CAPAB_BAR_LOW,
- AMDVI_BASE_ADDR & ~(0x));
+ AMDVI_BASE_ADDR & MAKE_64BIT_MASK(14, 18));
 pci_set_long(pdev->config + s->capab_offset + AMDVI_CAPAB_BAR_HIGH,
-(AMDVI_BASE_ADDR & ~(0x)) >> 16);
+AMDVI_BASE_ADDR >> 32);
 pci_set_long(pdev->config + s->capab_offset + AMDVI_CAPAB_RANGE,
  0xff00);
 pci_set_long(pdev->config + s->capab_offset + AMDVI_CAPAB_MISC, 0);
diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
index e0dac4d9a9..28125130c6 100644
--- a/hw/i386/amd_iommu.h
+++ b/hw/i386/amd_iommu.h
@@ -187,7 +187,7 @@
 AMDVI_CAPAB_FLAG_HTTUNNEL |  AMDVI_CAPAB_EFR_SUP)
 
 /* AMDVI default address */
-#define AMDVI_BASE_ADDR 0xfed8
+#define AMDVI_BASE_ADDR 0xfed8ULL
 
 /* page management constants */
 #define AMDVI_PAGE_SHIFT 12
-- 
2.39.5




[Stable-9.2.3 11/51] hw/i386/pc: Fix crash that occurs when introspecting TYPE_PC_MACHINE machines

2025-03-14 Thread Michael Tokarev
From: Thomas Huth 

QEMU currently crashes when you try to inspect the machines based on
TYPE_PC_MACHINE for their properties:

 $ echo '{ "execute": "qmp_capabilities" }
 { "execute": "qom-list-properties","arguments":
  { "typename": "pc-q35-10.0-machine"}}' \
   | ./qemu-system-x86_64 -M pc -qmp stdio
 {"QMP": {"version": {"qemu": {"micro": 50, "minor": 2, "major": 9},
  "package": "v9.2.0-1070-g87e115c122-dirty"}, "capabilities": ["oob"]}}
 {"return": {}}
 Segmentation fault (core dumped)

This happens because TYPE_PC_MACHINE machines add a machine_init-
done_notifier in their instance_init function - but instance_init
of machines are not only called for machines that are realized,
but also for machines that are introspected, so in this case the
listener is added for a q35 machine that is never realized. But
since there is already a running pc machine, the listener function
is triggered immediately, causing a crash since it was not for the
right machine it was meant for.

Such listener functions must never be installed from an instance_init
function. Let's do it from pc_basic_device_init() instead - this
function is called from the MachineClass->init() function instead,
i.e. guaranteed to be only called once in the lifetime of a QEMU
process.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2779
Signed-off-by: Thomas Huth 
Message-Id: <20250117192106.471029-1-th...@redhat.com>
Reviewed-by: Akihiko Odaki 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
(cherry picked from commit de538288e4dac21332cc94ba9727ed8ec8fe5ea1)
Signed-off-by: Michael Tokarev 

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 317aaca25a..9b89b51b8f 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1236,6 +1236,9 @@ void pc_basic_device_init(struct PCMachineState *pcms,
 /* Super I/O */
 pc_superio_init(isa_bus, create_fdctrl, pcms->i8042_enabled,
 pcms->vmport != ON_OFF_AUTO_ON, &error_fatal);
+
+pcms->machine_done.notify = pc_machine_done;
+qemu_add_machine_init_done_notifier(&pcms->machine_done);
 }
 
 void pc_nic_init(PCMachineClass *pcmc, ISABus *isa_bus, PCIBus *pci_bus)
@@ -1709,9 +1712,6 @@ static void pc_machine_initfn(Object *obj)
 if (pcmc->pci_enabled) {
 cxl_machine_init(obj, &pcms->cxl_devices_state);
 }
-
-pcms->machine_done.notify = pc_machine_done;
-qemu_add_machine_init_done_notifier(&pcms->machine_done);
 }
 
 static void pc_machine_reset(MachineState *machine, ResetType type)
-- 
2.39.5




[Stable-9.2.3 03/51] target/arm: Report correct syndrome for UNDEFINED S1E2 AT ops at EL3

2025-03-14 Thread Michael Tokarev
From: Peter Maydell 

The pseudocode for AT S1E2R and AT S1E2W says that they should be
UNDEFINED if executed at EL3 when EL2 is not enabled. We were
incorrectly using CP_ACCESS_TRAP and reporting the wrong exception
syndrome as a result. Use CP_ACCESS_TRAP_UNCATEGORIZED.

Cc: qemu-sta...@nongnu.org
Fixes: 2a47df953202e1 ("target-arm: Wire up AArch64 EL2 and EL3 address 
translation ops")
Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-id: 20250130182309.717346-4-peter.mayd...@linaro.org
(cherry picked from commit ccda792945d650bce4609c8dbce8814a220df1bb)
Signed-off-by: Michael Tokarev 

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 32cf6039e3..63cdb29510 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -3878,7 +3878,7 @@ static CPAccessResult at_s1e2_access(CPUARMState *env, 
const ARMCPRegInfo *ri,
 {
 if (arm_current_el(env) == 3 &&
 !(env->cp15.scr_el3 & (SCR_NS | SCR_EEL2))) {
-return CP_ACCESS_TRAP;
+return CP_ACCESS_TRAP_UNCATEGORIZED;
 }
 return at_e012_access(env, ri, isread);
 }
-- 
2.39.5




[Stable-9.2.3 04/51] target/arm: Report correct syndrome for UNDEFINED LOR sysregs when NS=0

2025-03-14 Thread Michael Tokarev
From: Peter Maydell 

The pseudocode for the accessors for the LOR sysregs says they
are UNDEFINED if SCR_EL3.NS is 0. We were reporting the wrong
syndrome value here; use CP_ACCESS_TRAP_UNCATEGORIZED.

Cc: qemu-sta...@nongnu.org
Fixes: 2d7137c10faf ("target/arm: Implement the ARMv8.1-LOR extension")
Signed-off-by: Peter Maydell 
Reviewed-by: Alex Bennée 
Reviewed-by: Richard Henderson 
Message-id: 20250130182309.717346-5-peter.mayd...@linaro.org
(cherry picked from commit 707d478ed8f2da6f2327e5af780890c1fd9c371a)
Signed-off-by: Michael Tokarev 

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 63cdb29510..0e386c9ae5 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -7737,8 +7737,8 @@ static CPAccessResult access_lor_other(CPUARMState *env,
const ARMCPRegInfo *ri, bool isread)
 {
 if (arm_is_secure_below_el3(env)) {
-/* Access denied in secure mode.  */
-return CP_ACCESS_TRAP;
+/* UNDEF if SCR_EL3.NS == 0 */
+return CP_ACCESS_TRAP_UNCATEGORIZED;
 }
 return access_lor_ns(env, ri, isread);
 }
-- 
2.39.5




[Stable-9.2.3 16/51] vdpa: Fix endian bugs in shadow virtqueue

2025-03-14 Thread Michael Tokarev
From: Konstantin Shkolnyy 

VDPA didn't work on a big-endian machine due to missing/incorrect
CPU<->LE data format conversions.

Signed-off-by: Konstantin Shkolnyy 
Message-Id: <20250212164923.1971538-1-k...@linux.ibm.com>
Fixes: 10857ec0ad ("vhost: Add VhostShadowVirtqueue")
Acked-by: Eugenio Pérez 
Tested-by: Lei Yang 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
(cherry picked from commit 50e9754149066dc91f58405d3378b589098cb408)
Signed-off-by: Michael Tokarev 

diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index 37aca8b431..4af0d7c669 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -165,10 +165,10 @@ static bool 
vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
 descs[i].len = cpu_to_le32(iovec[n].iov_len);
 
 last = i;
-i = cpu_to_le16(svq->desc_next[i]);
+i = svq->desc_next[i];
 }
 
-svq->free_head = le16_to_cpu(svq->desc_next[last]);
+svq->free_head = svq->desc_next[last];
 return true;
 }
 
@@ -228,10 +228,12 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq)
 smp_mb();
 
 if (virtio_vdev_has_feature(svq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
-uint16_t avail_event = *(uint16_t 
*)(&svq->vring.used->ring[svq->vring.num]);
+uint16_t avail_event = le16_to_cpu(
+*(uint16_t *)(&svq->vring.used->ring[svq->vring.num]));
 needs_kick = vring_need_event(avail_event, svq->shadow_avail_idx, 
svq->shadow_avail_idx - 1);
 } else {
-needs_kick = !(svq->vring.used->flags & VRING_USED_F_NO_NOTIFY);
+needs_kick =
+!(svq->vring.used->flags & 
cpu_to_le16(VRING_USED_F_NO_NOTIFY));
 }
 
 if (!needs_kick) {
@@ -365,7 +367,7 @@ static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
 return true;
 }
 
-svq->shadow_used_idx = cpu_to_le16(*(volatile uint16_t *)used_idx);
+svq->shadow_used_idx = le16_to_cpu(*(volatile uint16_t *)used_idx);
 
 return svq->last_used_idx != svq->shadow_used_idx;
 }
@@ -383,7 +385,7 @@ static bool 
vhost_svq_enable_notification(VhostShadowVirtqueue *svq)
 {
 if (virtio_vdev_has_feature(svq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
 uint16_t *used_event = (uint16_t 
*)&svq->vring.avail->ring[svq->vring.num];
-*used_event = svq->shadow_used_idx;
+*used_event = cpu_to_le16(svq->shadow_used_idx);
 } else {
 svq->vring.avail->flags &= ~cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
 }
@@ -408,7 +410,7 @@ static uint16_t vhost_svq_last_desc_of_chain(const 
VhostShadowVirtqueue *svq,
  uint16_t num, uint16_t i)
 {
 for (uint16_t j = 0; j < (num - 1); ++j) {
-i = le16_to_cpu(svq->desc_next[i]);
+i = svq->desc_next[i];
 }
 
 return i;
@@ -683,7 +685,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, 
VirtIODevice *vdev,
 svq->desc_state = g_new0(SVQDescState, svq->vring.num);
 svq->desc_next = g_new0(uint16_t, svq->vring.num);
 for (unsigned i = 0; i < svq->vring.num - 1; i++) {
-svq->desc_next[i] = cpu_to_le16(i + 1);
+svq->desc_next[i] = i + 1;
 }
 }
 
-- 
2.39.5




[Stable-9.2.3 07/51] target/arm: Honour SDCR.TDCC and SCR.TERR in AArch32 EL3 non-Monitor modes

2025-03-14 Thread Michael Tokarev
From: Peter Maydell 

There are not many traps in AArch32 which should trap to Monitor
mode, but these trap bits should trap not just lower ELs to Monitor
mode but also the non-Monitor modes running at EL3 (i.e.  Secure
System, Secure Undef, etc).

We get this wrong because the relevant access functions implement the
AArch64-style logic of
   if (el < 3 && trap_bit_set) {
   return CP_ACCESS_TRAP_EL3;
   }
which won't trap the non-Monitor modes at EL3.

Correct this error by using arm_is_el3_or_mon() instead, which
returns true when the CPU is at AArch64 EL3 or AArch32 Monitor mode.
(Since the new callsites are compiled also for the linux-user mode,
we need to provide a dummy implementation for CONFIG_USER_ONLY.)

This affects only:
 * trapping of ERRIDR via SCR.TERR
 * trapping of the debug channel registers via SDCR.TDCC
 * trapping of GICv3 registers via SCR.IRQ and SCR.FIQ
   (which we already used arm_is_el3_or_mon() for)

This patch changes the handling of SCR.TERR and SDCR.TDCC. This
patch only changes guest-visible behaviour for "-cpu max" on
the qemu-system-arm binary, because SCR.TERR
and SDCR.TDCC (and indeed the entire SDCR register) only arrived
in Armv8, and the only guest CPU we support which has any v8
features and also starts in AArch32 EL3 is the 32-bit 'max'.

Other uses of CP_ACCESS_TRAP_EL3 don't need changing:

 * uses in code paths that can't happen when EL3 is AArch32:
   access_trap_aa32s_el1, cpacr_access, cptr_access, nsacr_access
 * uses which are in accessfns for AArch64-only registers:
   gt_stimer_access, gt_cntpoff_access, access_hxen, access_tpidr2,
   access_smpri, access_smprimap, access_lor_ns, access_pauth,
   access_mte, access_tfsr_el2, access_scxtnum, access_fgt
 * trap bits which exist only in the AArch64 version of the
   trap register, not the AArch32 one:
   access_tpm, pmreg_access, access_dbgvcr32, access_tdra,
   access_tda, access_tdosa (TPM, TDA and TDOSA exist only in
   MDCR_EL3, not in SDCR, and we enforce this in sdcr_write())

Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Maydell 
Reviewed-by: Alex Bennée 
Reviewed-by: Richard Henderson 
Message-id: 20250130182309.717346-8-peter.mayd...@linaro.org
(cherry picked from commit 4d436fb05c2a1fff7befc815ebcbb04a14977448)
Signed-off-by: Michael Tokarev 

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index e3416cd435..b005f93735 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -2562,6 +2562,11 @@ static inline bool arm_is_secure_below_el3(CPUARMState 
*env)
 return false;
 }
 
+static inline bool arm_is_el3_or_mon(CPUARMState *env)
+{
+return false;
+}
+
 static inline ARMSecuritySpace arm_security_space(CPUARMState *env)
 {
 return ARMSS_NonSecure;
diff --git a/target/arm/debug_helper.c b/target/arm/debug_helper.c
index 7d856acddf..019b2b6f97 100644
--- a/target/arm/debug_helper.c
+++ b/target/arm/debug_helper.c
@@ -880,7 +880,8 @@ static CPAccessResult access_tdcc(CPUARMState *env, const 
ARMCPRegInfo *ri,
 if (el < 2 && (mdcr_el2_tda || mdcr_el2_tdcc)) {
 return CP_ACCESS_TRAP_EL2;
 }
-if (el < 3 && ((env->cp15.mdcr_el3 & MDCR_TDA) || mdcr_el3_tdcc)) {
+if (!arm_is_el3_or_mon(env) &&
+((env->cp15.mdcr_el3 & MDCR_TDA) || mdcr_el3_tdcc)) {
 return CP_ACCESS_TRAP_EL3;
 }
 return CP_ACCESS_OK;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index fe255ccb43..8df38a30a1 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -7052,7 +7052,7 @@ static CPAccessResult access_terr(CPUARMState *env, const 
ARMCPRegInfo *ri,
 if (el < 2 && (arm_hcr_el2_eff(env) & HCR_TERR)) {
 return CP_ACCESS_TRAP_EL2;
 }
-if (el < 3 && (env->cp15.scr_el3 & SCR_TERR)) {
+if (!arm_is_el3_or_mon(env) && (env->cp15.scr_el3 & SCR_TERR)) {
 return CP_ACCESS_TRAP_EL3;
 }
 return CP_ACCESS_OK;
-- 
2.39.5




[Stable-9.2.3 31/51] target/arm: Apply correct timer offset when calculating deadlines

2025-03-14 Thread Michael Tokarev
From: Peter Maydell 

When we are calculating timer deadlines, the correct definition of
whether or not to apply an offset to the physical count is described
in the Arm ARM DDI4087 rev L.a section D12.2.4.1.  This is different
from when the offset should be applied for a direct read of the
counter sysreg.

We got this right for the EL1 physical timer and for the EL1 virtual
timer, but got all the rest wrong: they should be using a zero offset
always.

Factor the offset calculation out into a function that has a comment
documenting exactly which offset it is calculating and which gets the
HYP, SEC, and HYPVIRT cases right.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Maydell 
Reviewed-by: Alex Bennée 
Message-id: 20250204125009.2281315-2-peter.mayd...@linaro.org
(cherry picked from commit db6c2192839ee0282d38f6fa87e0629fcd13)
Signed-off-by: Michael Tokarev 

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 8df38a30a1..417801d9c3 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -2727,6 +2727,32 @@ static uint64_t gt_phys_cnt_offset(CPUARMState *env)
 return gt_phys_raw_cnt_offset(env);
 }
 
+static uint64_t gt_indirect_access_timer_offset(CPUARMState *env, int timeridx)
+{
+/*
+ * Return the timer offset to use for indirect accesses to the timer.
+ * This is the Offset value as defined in D12.2.4.1 "Operation of the
+ * CompareValue views of the timers".
+ *
+ * The condition here is not always the same as the condition for
+ * whether to apply an offset register when doing a direct read of
+ * the counter sysreg; those conditions are described in the
+ * access pseudocode for each counter register.
+ */
+switch (timeridx) {
+case GTIMER_PHYS:
+return gt_phys_raw_cnt_offset(env);
+case GTIMER_VIRT:
+return env->cp15.cntvoff_el2;
+case GTIMER_HYP:
+case GTIMER_SEC:
+case GTIMER_HYPVIRT:
+return 0;
+default:
+g_assert_not_reached();
+}
+}
+
 static void gt_recalc_timer(ARMCPU *cpu, int timeridx)
 {
 ARMGenericTimer *gt = &cpu->env.cp15.c14_timer[timeridx];
@@ -2736,8 +2762,7 @@ static void gt_recalc_timer(ARMCPU *cpu, int timeridx)
  * Timer enabled: calculate and set current ISTATUS, irq, and
  * reset timer to when ISTATUS next has to change
  */
-uint64_t offset = timeridx == GTIMER_VIRT ?
-cpu->env.cp15.cntvoff_el2 : gt_phys_raw_cnt_offset(&cpu->env);
+uint64_t offset = gt_indirect_access_timer_offset(&cpu->env, timeridx);
 uint64_t count = gt_get_countervalue(&cpu->env);
 /* Note that this must be unsigned 64 bit arithmetic: */
 int istatus = count - offset >= gt->cval;
-- 
2.39.5




[Stable-9.2.3 20/51] physmem: replace assertion with error

2025-03-14 Thread Michael Tokarev
From: Paolo Bonzini 

It is possible to start QEMU with a confidential-guest-support object
even in TCG mode.  While there is already a check in qemu_machine_creation_done:

if (machine->cgs && !machine->cgs->ready) {
error_setg(errp, "accelerator does not support confidential guest %s",
   object_get_typename(OBJECT(machine->cgs)));
exit(1);
}

the creation of RAMBlocks happens earlier, in qemu_init_board(), if
the command line does not override the default memory backend with
-M memdev.  Then the RAMBlock will try to use guest_memfd (because
machine_require_guest_memfd correctly returns true; at least correctly
according to the current implementation) and trigger the assertion
failure for kvm_enabled().  This happend with a command line as
simple as the following:

qemu-system-x86_64 -m 512 -nographic -object 
sev-snp-guest,reduced-phys-bits=48,id=sev0 \
   -M q35,kernel-irqchip=split,confidential-guest-support=sev0
qemu-system-x86_64: ../system/physmem.c:1871: ram_block_add: Assertion 
`kvm_enabled()' failed.

Cc: Xiaoyao Li 
Cc: qemu-sta...@nongnu.org
Signed-off-by: Paolo Bonzini 
Reviewed-by: Daniel P. Berrangé 
Reviewed-by: David Hildenbrand 
Reviewed-by: Pankaj Gupta 
Reviewed-by: Xiaoyao Li 
Link: https://lore.kernel.org/r/20250217120812.396522-1-pbonz...@redhat.com
Signed-off-by: Paolo Bonzini 
(cherry picked from commit 6debfb2cb1795427d2dc6a741c7430a233c76695)
Signed-off-by: Michael Tokarev 

diff --git a/system/physmem.c b/system/physmem.c
index 75389064a8..83013b59f7 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1868,7 +1868,11 @@ static void ram_block_add(RAMBlock *new_block, Error 
**errp)
 if (new_block->flags & RAM_GUEST_MEMFD) {
 int ret;
 
-assert(kvm_enabled());
+if (!kvm_enabled()) {
+error_setg(errp, "cannot set up private guest memory for %s: KVM 
required",
+   object_get_typename(OBJECT(current_machine->cgs)));
+goto out_free;
+}
 assert(new_block->guest_memfd < 0);
 
 ret = ram_block_discard_require(true);
-- 
2.39.5




[Stable-9.2.3 15/51] cryptodev/vhost: allocate CryptoDevBackendVhost using g_mem0()

2025-03-14 Thread Michael Tokarev
From: Stefano Garzarella 

The function `vhost_dev_init()` expects the `struct vhost_dev`
(passed as a parameter) to be fully initialized. This is important
because some parts of the code check whether `vhost_dev->config_ops`
is NULL to determine if it has been set (e.g. later via
`vhost_dev_set_config_notifier`).

To ensure this initialization, it’s better to allocate the entire
`CryptoDevBackendVhost` structure (which includes `vhost_dev`) using
`g_mem0()`, following the same approach used for other vhost devices,
such as in `vhost_net_init()`.

Fixes: 042cea274c ("cryptodev: add vhost-user as a new cryptodev backend")
Cc: qemu-sta...@nongnu.org
Reported-by: mylu...@m.fudan.edu.cn
Signed-off-by: Stefano Garzarella 
Message-Id: <20250211135523.101203-1-sgarz...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
(cherry picked from commit 83cb18ac4500f3a14067b19408705068647cb0c5)
Signed-off-by: Michael Tokarev 

diff --git a/backends/cryptodev-vhost.c b/backends/cryptodev-vhost.c
index 93523732f3..5901b3ec4c 100644
--- a/backends/cryptodev-vhost.c
+++ b/backends/cryptodev-vhost.c
@@ -53,7 +53,7 @@ cryptodev_vhost_init(
 CryptoDevBackendVhost *crypto;
 Error *local_err = NULL;
 
-crypto = g_new(CryptoDevBackendVhost, 1);
+crypto = g_new0(CryptoDevBackendVhost, 1);
 crypto->dev.max_queues = 1;
 crypto->dev.nvqs = 1;
 crypto->dev.vqs = crypto->vqs;
-- 
2.39.5




[Stable-9.2.3 24/51] target/riscv: rvv: Fix incorrect vlen comparison in prop_vlen_set

2025-03-14 Thread Michael Tokarev
From: Max Chou 

In prop_vlen_set function, there is an incorrect comparison between
vlen(bit) and vlenb(byte).
This will cause unexpected error when user applies the `vlen=1024` cpu
option with a vendor predefined cpu type that the default vlen is
1024(vlenb=128).

Fixes: 4f6d036ccc ("target/riscv/cpu.c: remove cpu->cfg.vlen")
Signed-off-by: Max Chou 
Reviewed-by: Daniel Henrique Barboza 
Message-ID: <20250124090539.2506448-1-max.c...@sifive.com>
Signed-off-by: Alistair Francis 
(cherry picked from commit bf3adf93f16730ca5aaa6c26cf969e64eeff6e7b)
Signed-off-by: Michael Tokarev 

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index f219f0c3b5..261db879a2 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1897,6 +1897,7 @@ static void prop_vlen_set(Object *obj, Visitor *v, const 
char *name,
  void *opaque, Error **errp)
 {
 RISCVCPU *cpu = RISCV_CPU(obj);
+uint16_t cpu_vlen = cpu->cfg.vlenb << 3;
 uint16_t value;
 
 if (!visit_type_uint16(v, name, &value, errp)) {
@@ -1908,10 +1909,10 @@ static void prop_vlen_set(Object *obj, Visitor *v, 
const char *name,
 return;
 }
 
-if (value != cpu->cfg.vlenb && riscv_cpu_is_vendor(obj)) {
+if (value != cpu_vlen && riscv_cpu_is_vendor(obj)) {
 cpu_set_prop_err(cpu, name, errp);
 error_append_hint(errp, "Current '%s' val: %u\n",
-  name, cpu->cfg.vlenb << 3);
+  name, cpu_vlen);
 return;
 }
 
-- 
2.39.5




[Stable-9.2.3 17/51] hw/virtio/virtio-nsm: Respond with correct length

2025-03-14 Thread Michael Tokarev
From: Alexander Graf 

When we return a response packet from NSM, we need to indicate its
length according to the content of the response. Prior to this patch, we
returned the length of the source buffer, which may confuse guest code
that relies on the response size.

Fix it by returning the response payload size instead.

Fixes: bb154e3e0cc715 ("device/virtio-nsm: Support for Nitro Secure Module 
device")
Reported-by: Vikrant Garg 
Signed-off-by: Alexander Graf 
Message-Id: <20250213114541.67515-1-g...@amazon.com>
Reviewed-by: Dorjoy Chowdhury 
Fixes: bb154e3e0cc715 ("device/virtio-nsm: Support for Nitro Secure Module 
device")
Reported-by: Vikrant Garg 
Signed-off-by: Alexander Graf 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Vikrant Garg 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
(cherry picked from commit 131fe64e63c88ec52c45a5946a478c0edeb31b78)
Signed-off-by: Michael Tokarev 

diff --git a/hw/virtio/virtio-nsm.c b/hw/virtio/virtio-nsm.c
index a3db8eef3e..5dd56cf274 100644
--- a/hw/virtio/virtio-nsm.c
+++ b/hw/virtio/virtio-nsm.c
@@ -1589,7 +1589,7 @@ static void handle_input(VirtIODevice *vdev, VirtQueue 
*vq)
 g_free(req.iov_base);
 g_free(res.iov_base);
 virtqueue_push(vq, out_elem, 0);
-virtqueue_push(vq, in_elem, in_elem->in_sg->iov_len);
+virtqueue_push(vq, in_elem, sz);
 virtio_notify(vdev, vq);
 return;
 
-- 
2.39.5




[Stable-9.2.3 08/51] target/arm: Correct errors in WFI/WFE trapping

2025-03-14 Thread Michael Tokarev
From: Peter Maydell 

The code for WFI/WFE trapping has several errors:
 * it wasn't using arm_sctlr(), so it would look at SCTLR_EL1
   even if the CPU was in the EL2&0 translation regime
 * it was raising UNDEF, not Monitor Trap, for traps to
   AArch32 EL3 because of SCR.{TWE,TWI}
 * it was not honouring SCR.{TWE,TWI} when running in
   AArch32 at EL3 not in Monitor mode
 * it checked SCR.{TWE,TWI} even on v7 CPUs which don't have
   those bits

Fix these bugs.

Cc: qemu-sta...@nongnu.org
Fixes: b1eced713d99 ("target-arm: Add WFx instruction trap support")
Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-id: 20250130182309.717346-15-peter.mayd...@linaro.org
(cherry picked from commit 2b95a2d01b04afadf510a49ac14b38a59be8c5f5)
Signed-off-by: Michael Tokarev 

diff --git a/target/arm/tcg/op_helper.c b/target/arm/tcg/op_helper.c
index 7cde2337ac..5aef45d9c4 100644
--- a/target/arm/tcg/op_helper.c
+++ b/target/arm/tcg/op_helper.c
@@ -313,15 +313,19 @@ void HELPER(check_bxj_trap)(CPUARMState *env, uint32_t rm)
 }
 
 #ifndef CONFIG_USER_ONLY
-/* Function checks whether WFx (WFI/WFE) instructions are set up to be trapped.
+/*
+ * Function checks whether WFx (WFI/WFE) instructions are set up to be trapped.
  * The function returns the target EL (1-3) if the instruction is to be 
trapped;
  * otherwise it returns 0 indicating it is not trapped.
+ * For a trap, *excp is updated with the EXCP_* trap type to use.
  */
-static inline int check_wfx_trap(CPUARMState *env, bool is_wfe)
+static inline int check_wfx_trap(CPUARMState *env, bool is_wfe, uint32_t *excp)
 {
 int cur_el = arm_current_el(env);
 uint64_t mask;
 
+*excp = EXCP_UDEF;
+
 if (arm_feature(env, ARM_FEATURE_M)) {
 /* M profile cores can never trap WFI/WFE. */
 return 0;
@@ -331,18 +335,9 @@ static inline int check_wfx_trap(CPUARMState *env, bool 
is_wfe)
  * WFx instructions being trapped to EL1. These trap bits don't exist in 
v7.
  */
 if (cur_el < 1 && arm_feature(env, ARM_FEATURE_V8)) {
-int target_el;
-
 mask = is_wfe ? SCTLR_nTWE : SCTLR_nTWI;
-if (arm_is_secure_below_el3(env) && !arm_el_is_aa64(env, 3)) {
-/* Secure EL0 and Secure PL1 is at EL3 */
-target_el = 3;
-} else {
-target_el = 1;
-}
-
-if (!(env->cp15.sctlr_el[target_el] & mask)) {
-return target_el;
+if (!(arm_sctlr(env, cur_el) & mask)) {
+return exception_target_el(env);
 }
 }
 
@@ -358,9 +353,12 @@ static inline int check_wfx_trap(CPUARMState *env, bool 
is_wfe)
 }
 
 /* We are not trapping to EL1 or EL2; trap to EL3 if SCR_EL3 requires it */
-if (cur_el < 3) {
+if (arm_feature(env, ARM_FEATURE_V8) && !arm_is_el3_or_mon(env)) {
 mask = (is_wfe) ? SCR_TWE : SCR_TWI;
 if (env->cp15.scr_el3 & mask) {
+if (!arm_el_is_aa64(env, 3)) {
+*excp = EXCP_MON_TRAP;
+}
 return 3;
 }
 }
@@ -383,7 +381,8 @@ void HELPER(wfi)(CPUARMState *env, uint32_t insn_len)
 return;
 #else
 CPUState *cs = env_cpu(env);
-int target_el = check_wfx_trap(env, false);
+uint32_t excp;
+int target_el = check_wfx_trap(env, false, &excp);
 
 if (cpu_has_work(cs)) {
 /* Don't bother to go into our "low power state" if
@@ -399,7 +398,7 @@ void HELPER(wfi)(CPUARMState *env, uint32_t insn_len)
 env->regs[15] -= insn_len;
 }
 
-raise_exception(env, EXCP_UDEF, syn_wfx(1, 0xe, 0, insn_len == 2),
+raise_exception(env, excp, syn_wfx(1, 0xe, 0, insn_len == 2),
 target_el);
 }
 
@@ -424,7 +423,8 @@ void HELPER(wfit)(CPUARMState *env, uint64_t timeout)
 #else
 ARMCPU *cpu = env_archcpu(env);
 CPUState *cs = env_cpu(env);
-int target_el = check_wfx_trap(env, false);
+uint32_t excp;
+int target_el = check_wfx_trap(env, false, &excp);
 /* The WFIT should time out when CNTVCT_EL0 >= the specified value. */
 uint64_t cntval = gt_get_countervalue(env);
 uint64_t offset = gt_virt_cnt_offset(env);
@@ -441,8 +441,7 @@ void HELPER(wfit)(CPUARMState *env, uint64_t timeout)
 
 if (target_el) {
 env->pc -= 4;
-raise_exception(env, EXCP_UDEF, syn_wfx(1, 0xe, 0, false),
-target_el);
+raise_exception(env, excp, syn_wfx(1, 0xe, 0, false), target_el);
 }
 
 if (uadd64_overflow(timeout, offset, &nexttick)) {
-- 
2.39.5




[Stable-9.2.3 22/51] target/arm/hvf: sign extend the data for a load operation when SSE=1

2025-03-14 Thread Michael Tokarev
From: Joelle van Dyne 

In the syndrome value for a data abort, bit 21 is SSE, which is
set to indicate that the abort was on a sign-extending load. When
we handle the data abort from the guest via address_space_read(),
we forgot to handle this and so would return the wrong value if
the guest did a sign-extending load to an MMIO region. Add the
sign-extension of the returned data.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Joelle van Dyne 
Message-id: 20250224184123.50780-...@getutm.app
[PMM: Drop an unnecessary check on 'len'; expand commit message]
Reviewed-by: Peter Maydell 
Signed-off-by: Peter Maydell 
(cherry picked from commit 12c365315ab25d364cff24dfeea8d7ff1e752b9f)
Signed-off-by: Michael Tokarev 

diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 0089174b36..d1cf47ca6a 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -1974,6 +1974,7 @@ int hvf_vcpu_exec(CPUState *cpu)
 bool isv = syndrome & ARM_EL_ISV;
 bool iswrite = (syndrome >> 6) & 1;
 bool s1ptw = (syndrome >> 7) & 1;
+bool sse = (syndrome >> 21) & 1;
 uint32_t sas = (syndrome >> 22) & 3;
 uint32_t len = 1 << sas;
 uint32_t srt = (syndrome >> 16) & 0x1f;
@@ -2001,6 +2002,9 @@ int hvf_vcpu_exec(CPUState *cpu)
 address_space_read(&address_space_memory,
hvf_exit->exception.physical_address,
MEMTXATTRS_UNSPECIFIED, &val, len);
+if (sse) {
+val = sextract64(val, 0, len * 8);
+}
 hvf_set_reg(cpu, srt, val);
 }
 
-- 
2.39.5




[Stable-9.2.3 30/51] hw/gpio: npcm7xx: fixup out-of-bounds access

2025-03-14 Thread Michael Tokarev
From: Patrick Venture 

The reg isn't validated to be a possible register before
it's dereferenced for one case.  The mmio space registered
for the gpio device is 4KiB but there aren't that many
registers in the struct.

Cc: qemu-sta...@nongnu.org
Fixes: 526dbbe0874 ("hw/gpio: Add GPIO model for Nuvoton NPCM7xx")
Signed-off-by: Patrick Venture 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20250226024603.493148-1-vent...@google.com
Signed-off-by: Peter Maydell 
(cherry picked from commit 3b2e22c0bbe2ce07123d93961d52f17644562cd7)
Signed-off-by: Michael Tokarev 

diff --git a/hw/gpio/npcm7xx_gpio.c b/hw/gpio/npcm7xx_gpio.c
index ba19b9ebad..00ffd413ba 100644
--- a/hw/gpio/npcm7xx_gpio.c
+++ b/hw/gpio/npcm7xx_gpio.c
@@ -220,8 +220,6 @@ static void npcm7xx_gpio_regs_write(void *opaque, hwaddr 
addr, uint64_t v,
 return;
 }
 
-diff = s->regs[reg] ^ value;
-
 switch (reg) {
 case NPCM7XX_GPIO_TLOCK1:
 case NPCM7XX_GPIO_TLOCK2:
@@ -242,6 +240,7 @@ static void npcm7xx_gpio_regs_write(void *opaque, hwaddr 
addr, uint64_t v,
 case NPCM7XX_GPIO_PU:
 case NPCM7XX_GPIO_PD:
 case NPCM7XX_GPIO_IEM:
+diff = s->regs[reg] ^ value;
 s->regs[reg] = value;
 npcm7xx_gpio_update_pins(s, diff);
 break;
-- 
2.39.5




[Stable-9.2.3 12/51] hw/i386/microvm: Fix crash that occurs when introspecting the microvm machine

2025-03-14 Thread Michael Tokarev
From: Thomas Huth 

QEMU currently crashes when you try to inspect the properties of the
microvm machine:

 $ echo '{ "execute": "qmp_capabilities" }
 { "execute": "qom-list-properties","arguments":
   { "typename": "microvm-machine"}}' | \
   ./qemu-system-x86_64 -qmp stdio
 {"QMP": {"version": {"qemu": {"micro": 50, "minor": 2, "major": 9},
  "package": "v9.2.0-1072-g60af367187-dirty"}, "capabilities": ["oob"]}}
 {"return": {}}
 qemu-system-x86_64: ../qemu/hw/i386/acpi-microvm.c:250:
  void acpi_setup_microvm(MicrovmMachineState *):
   Assertion `x86ms->fw_cfg' failed.
 Aborted (core dumped)

This happens because the microvm machine adds a machine_done (and a
powerdown_req) notifier in their instance_init function - however, the
instance_init of machines are not only called for machines that are
realized, but also for machines that are introspected, so in this case
the listener is added for a microvm machine that is never realized. And
since there is already a running machine, the listener function is
triggered immediately, causing a crash since it was not for the right
machine it was meant for.

Such listener functions must never be installed from an instance_init
function. Let's do it from microvm_machine_state_init() instead - this
function is the MachineClass->init() function instead, i.e. guaranteed
to be only called once in the lifetime of a QEMU process.

Since the microvm_machine_done() and microvm_powerdown_req() were
defined quite late in the microvm.c file, we have to move them now
also earlier, so that we can get their function pointers from
microvm_machine_state_init() without having to introduce a separate
prototype for those functions earlier.

Reviewed-by: Sergio Lopez 
Signed-off-by: Thomas Huth 
Message-Id: <20250123204708.1560305-1-th...@redhat.com>
Reviewed-by: Akihiko Odaki 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
(cherry picked from commit 38ef383073b8ee59d598643160f206a19a46237f)
Signed-off-by: Michael Tokarev 

diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index 86637afa0f..a024466ad8 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -451,11 +451,44 @@ static HotplugHandler 
*microvm_get_hotplug_handler(MachineState *machine,
 return NULL;
 }
 
+static void microvm_machine_done(Notifier *notifier, void *data)
+{
+MicrovmMachineState *mms = container_of(notifier, MicrovmMachineState,
+machine_done);
+X86MachineState *x86ms = X86_MACHINE(mms);
+
+acpi_setup_microvm(mms);
+dt_setup_microvm(mms);
+fw_cfg_add_e820(x86ms->fw_cfg);
+}
+
+static void microvm_powerdown_req(Notifier *notifier, void *data)
+{
+MicrovmMachineState *mms = container_of(notifier, MicrovmMachineState,
+powerdown_req);
+X86MachineState *x86ms = X86_MACHINE(mms);
+
+if (x86ms->acpi_dev) {
+Object *obj = OBJECT(x86ms->acpi_dev);
+AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(obj);
+adevc->send_event(ACPI_DEVICE_IF(x86ms->acpi_dev),
+  ACPI_POWER_DOWN_STATUS);
+}
+}
+
 static void microvm_machine_state_init(MachineState *machine)
 {
 MicrovmMachineState *mms = MICROVM_MACHINE(machine);
 X86MachineState *x86ms = X86_MACHINE(machine);
 
+/* State */
+mms->kernel_cmdline_fixed = false;
+
+mms->machine_done.notify = microvm_machine_done;
+qemu_add_machine_init_done_notifier(&mms->machine_done);
+mms->powerdown_req.notify = microvm_powerdown_req;
+qemu_register_powerdown_notifier(&mms->powerdown_req);
+
 microvm_memory_init(mms);
 
 x86_cpus_init(x86ms, CPU_VERSION_LATEST);
@@ -581,31 +614,6 @@ static void microvm_machine_set_auto_kernel_cmdline(Object 
*obj, bool value,
 mms->auto_kernel_cmdline = value;
 }
 
-static void microvm_machine_done(Notifier *notifier, void *data)
-{
-MicrovmMachineState *mms = container_of(notifier, MicrovmMachineState,
-machine_done);
-X86MachineState *x86ms = X86_MACHINE(mms);
-
-acpi_setup_microvm(mms);
-dt_setup_microvm(mms);
-fw_cfg_add_e820(x86ms->fw_cfg);
-}
-
-static void microvm_powerdown_req(Notifier *notifier, void *data)
-{
-MicrovmMachineState *mms = container_of(notifier, MicrovmMachineState,
-powerdown_req);
-X86MachineState *x86ms = X86_MACHINE(mms);
-
-if (x86ms->acpi_dev) {
-Object *obj = OBJECT(x86ms->acpi_dev);
-AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(obj);
-adevc->send_event(ACPI_DEVICE_IF(x86ms->acpi_dev),
-  ACPI_POWER_DOWN_STATUS);
-}
-}
-
 static void microvm_machine_initfn(Object *obj)
 {
 MicrovmMachineState *mms = MICROVM_MACHINE(obj);
@@ -617,14 +625,6 @@ static void microvm_machine_initfn(Object *obj)
 mms->isa_serial = true;
 mms->option_roms = true;
 mms->auto_kernel_cmdline = true;
-
-  

[Stable-9.2.3 26/51] target/riscv: throw debug exception before page fault

2025-03-14 Thread Michael Tokarev
From: Daniel Henrique Barboza 

In the RISC-V privileged ISA section 3.1.15 table 15, it is determined
that a debug exception that is triggered from a load/store has a higher
priority than a possible fault that this access might trigger.

This is not the case ATM as shown in [1]. Adding a breakpoint in an
address that deliberately will fault is causing a load page fault
instead of a debug exception. The reason is that we're throwing in the
page fault as soon as the fault occurs (end of riscv_cpu_tlb_fill(),
raise_mmu_exception()), not allowing the installed watchpoints to
trigger.

Call cpu_check_watchpoint() in the page fault path to search and execute
any watchpoints that might exist for the address, never returning back
to the fault path. If no watchpoints are found cpu_check_watchpoint()
will return and we'll fall-through the regular path to
raise_mmu_exception().

[1] https://gitlab.com/qemu-project/qemu/-/issues/2627

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2627
Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Alistair Francis 
Reviewed-by: Richard Henderson 
Message-ID: <20250121170626.1992570-3-dbarb...@ventanamicro.com>
Signed-off-by: Alistair Francis 
(cherry picked from commit c86edc547692d812d1dcc04220c38310be2c00c3)
Signed-off-by: Michael Tokarev 

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 45806f5ab0..dfd6a9d149 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -27,6 +27,7 @@
 #include "exec/page-protection.h"
 #include "instmap.h"
 #include "tcg/tcg-op.h"
+#include "hw/core/tcg-cpu-ops.h"
 #include "trace.h"
 #include "semihosting/common-semi.h"
 #include "sysemu/cpu-timers.h"
@@ -1550,6 +1551,23 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
 } else if (probe) {
 return false;
 } else {
+int wp_access = 0;
+
+if (access_type == MMU_DATA_LOAD) {
+wp_access |= BP_MEM_READ;
+} else if (access_type == MMU_DATA_STORE) {
+wp_access |= BP_MEM_WRITE;
+}
+
+/*
+ * If a watchpoint isn't found for 'addr' this will
+ * be a no-op and we'll resume the mmu_exception path.
+ * Otherwise we'll throw a debug exception and execution
+ * will continue elsewhere.
+ */
+cpu_check_watchpoint(cs, address, size, MEMTXATTRS_UNSPECIFIED,
+ wp_access, retaddr);
+
 raise_mmu_exception(env, address, access_type, pmp_violation,
 first_stage_error, two_stage_lookup,
 two_stage_indirect_error);
-- 
2.39.5




[Stable-9.2.3 40/51] target/arm: Correct STRD atomicity

2025-03-14 Thread Michael Tokarev
From: Peter Maydell 

Our STRD implementation doesn't correctly implement the requirement:
 * if the address is 8-aligned the access must be a 64-bit
   single-copy atomic access, not two 32-bit accesses

Rewrite the handling of STRD to use a single tcg_gen_qemu_st_i64()
of a value produced by concatenating the two 32 bit source registers.
This allows us to get the atomicity right.

As with the LDRD change, now that we don't update 'addr' in the
course of performing the store we need to adjust the offset
we pass to op_addr_ri_post() and op_addr_rr_post().

Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-id: 20250227142746.1698904-3-peter.mayd...@linaro.org
(cherry picked from commit ee786ca115045a2b7e86ac3073b0761cb99e0d49)
Signed-off-by: Michael Tokarev 

diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
index a2933f1c36..4eba3d1c8d 100644
--- a/target/arm/tcg/translate.c
+++ b/target/arm/tcg/translate.c
@@ -5063,10 +5063,42 @@ static bool trans_LDRD_rr(DisasContext *s, arg_ldst_rr 
*a)
 return true;
 }
 
-static bool trans_STRD_rr(DisasContext *s, arg_ldst_rr *a)
+static void do_strd_store(DisasContext *s, TCGv_i32 addr, int rt, int rt2)
 {
+/*
+ * STRD is required to be an atomic 64-bit access if the
+ * address is 8-aligned, two atomic 32-bit accesses if
+ * it's only 4-aligned, and to give an alignment fault
+ * if it's not 4-aligned.
+ * Rt is always the word from the lower address, and Rt2 the
+ * data from the higher address, regardless of endianness.
+ * So (like gen_store_exclusive) we avoid gen_aa32_ld_i64()
+ * so we don't get its SCTLR_B check, and instead do a 64-bit access
+ * using MO_BE if appropriate, using a value constructed
+ * by putting the two halves together in the right order.
+ *
+ * As with LDRD, the 64-bit atomicity is not required for
+ * M-profile, or for A-profile before LPAE, and we provide
+ * the higher guarantee always for simplicity.
+ */
 int mem_idx = get_mem_index(s);
-TCGv_i32 addr, tmp;
+MemOp opc = MO_64 | MO_ALIGN_4 | MO_ATOM_SUBALIGN | s->be_data;
+TCGv taddr = gen_aa32_addr(s, addr, opc);
+TCGv_i32 t1 = load_reg(s, rt);
+TCGv_i32 t2 = load_reg(s, rt2);
+TCGv_i64 t64 = tcg_temp_new_i64();
+
+if (s->be_data == MO_BE) {
+tcg_gen_concat_i32_i64(t64, t2, t1);
+} else {
+tcg_gen_concat_i32_i64(t64, t1, t2);
+}
+tcg_gen_qemu_st_i64(t64, taddr, mem_idx, opc);
+}
+
+static bool trans_STRD_rr(DisasContext *s, arg_ldst_rr *a)
+{
+TCGv_i32 addr;
 
 if (!ENABLE_ARCH_5TE) {
 return false;
@@ -5077,15 +5109,9 @@ static bool trans_STRD_rr(DisasContext *s, arg_ldst_rr 
*a)
 }
 addr = op_addr_rr_pre(s, a);
 
-tmp = load_reg(s, a->rt);
-gen_aa32_st_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
+do_strd_store(s, addr, a->rt, a->rt + 1);
 
-tcg_gen_addi_i32(addr, addr, 4);
-
-tmp = load_reg(s, a->rt + 1);
-gen_aa32_st_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
-
-op_addr_rr_post(s, a, addr, -4);
+op_addr_rr_post(s, a, addr, 0);
 return true;
 }
 
@@ -5213,20 +5239,13 @@ static bool trans_LDRD_ri_t32(DisasContext *s, 
arg_ldst_ri2 *a)
 
 static bool op_strd_ri(DisasContext *s, arg_ldst_ri *a, int rt2)
 {
-int mem_idx = get_mem_index(s);
-TCGv_i32 addr, tmp;
+TCGv_i32 addr;
 
 addr = op_addr_ri_pre(s, a);
 
-tmp = load_reg(s, a->rt);
-gen_aa32_st_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
-
-tcg_gen_addi_i32(addr, addr, 4);
-
-tmp = load_reg(s, rt2);
-gen_aa32_st_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
+do_strd_store(s, addr, a->rt, rt2);
 
-op_addr_ri_post(s, a, addr, -4);
+op_addr_ri_post(s, a, addr, 0);
 return true;
 }
 
-- 
2.39.5




[Stable-9.2.3 38/51] hw/arm: enable secure EL2 timers for sbsa machine

2025-03-14 Thread Michael Tokarev
From: Alex Bennée 

Signed-off-by: Alex Bennée 
Reviewed-by: Peter Maydell 
Signed-off-by: Peter Maydell 
Message-id: 20250204125009.2281315-10-peter.mayd...@linaro.org
Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Maydell 
(cherry picked from commit 9a9d9e82093efa22e3e2bdaac0f24c823f8786f7)
Signed-off-by: Michael Tokarev 

diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index e3195d5449..e9985a5e3b 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -484,6 +484,8 @@ static void create_gic(SBSAMachineState *sms, MemoryRegion 
*mem)
 [GTIMER_HYP]  = ARCH_TIMER_NS_EL2_IRQ,
 [GTIMER_SEC]  = ARCH_TIMER_S_EL1_IRQ,
 [GTIMER_HYPVIRT] = ARCH_TIMER_NS_EL2_VIRT_IRQ,
+[GTIMER_S_EL2_PHYS] = ARCH_TIMER_S_EL2_IRQ,
+[GTIMER_S_EL2_VIRT] = ARCH_TIMER_S_EL2_VIRT_IRQ,
 };
 
 for (irq = 0; irq < ARRAY_SIZE(timer_irq); irq++) {
-- 
2.39.5




[Stable-9.2.3 01/51] target/arm: Report correct syndrome for UNDEFINED CNTPS_*_EL1 from EL2 and NS EL1

2025-03-14 Thread Michael Tokarev
From: Peter Maydell 

The access pseudocode for the CNTPS_TVAL_EL1, CNTPS_CTL_EL1 and
CNTPS_CVAL_EL1 secure timer registers says that they are UNDEFINED
from EL2 or NS EL1.  We incorrectly return CP_ACCESS_TRAP from the
access function in these cases, which means that we report the wrong
syndrome value to the target EL.

Use CP_ACCESS_TRAP_UNCATEGORIZED, which reports the correct syndrome
value for an UNDEFINED instruction.

Cc: qemu-sta...@nongnu.org
Fixes: b4d3978c2fd ("target-arm: Add the AArch64 view of the Secure physical 
timer")
Signed-off-by: Peter Maydell 
Reviewed-by: Alex Bennée 
Reviewed-by: Richard Henderson 
Message-id: 20250130182309.717346-2-peter.mayd...@linaro.org
(cherry picked from commit b819fd6994243aee6f9613edbbacedce4f511c32)
Signed-off-by: Michael Tokarev 

diff --git a/target/arm/helper.c b/target/arm/helper.c
index fcb13fe87e..8a0065ef60 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -2652,7 +2652,7 @@ static CPAccessResult gt_stimer_access(CPUARMState *env,
 switch (arm_current_el(env)) {
 case 1:
 if (!arm_is_secure(env)) {
-return CP_ACCESS_TRAP;
+return CP_ACCESS_TRAP_UNCATEGORIZED;
 }
 if (!(env->cp15.scr_el3 & SCR_ST)) {
 return CP_ACCESS_TRAP_EL3;
@@ -2660,7 +2660,7 @@ static CPAccessResult gt_stimer_access(CPUARMState *env,
 return CP_ACCESS_OK;
 case 0:
 case 2:
-return CP_ACCESS_TRAP;
+return CP_ACCESS_TRAP_UNCATEGORIZED;
 case 3:
 return CP_ACCESS_OK;
 default:
-- 
2.39.5




[Stable-9.2.3 00/51] Patch Round-up for stable 9.2.3, freeze on 2025-03-24

2025-03-14 Thread Michael Tokarev
The following patches are queued for QEMU stable v9.2.3:

  https://gitlab.com/qemu-project/qemu/-/commits/staging-9.2

Patch freeze is 2025-03-24, and the release is planned for 2025-03-26:

  https://wiki.qemu.org/Planning/9.2

Please respond here or CC qemu-sta...@nongnu.org on any additional patches
you think should (or shouldn't) be included in the release.

The changes which are staging for inclusion, with the original commit hash
from master branch, are given below the bottom line.

Thanks!

/mjt

--
01 b819fd699424 Peter Maydell:
   target/arm: Report correct syndrome for UNDEFINED CNTPS_*_EL1 from EL2 
   and NS EL1
02 1960d9701ef7 Peter Maydell:
   target/arm: Report correct syndrome for UNDEFINED AT ops with wrong NSE, 
   NS
03 ccda792945d6 Peter Maydell:
   target/arm: Report correct syndrome for UNDEFINED S1E2 AT ops at EL3
04 707d478ed8f2 Peter Maydell:
   target/arm: Report correct syndrome for UNDEFINED LOR sysregs when NS=0
05 4cf494865161 Peter Maydell:
   target/arm: Make CP_ACCESS_TRAPs to AArch32 EL3 be Monitor traps
06 d04c6c3c000a Peter Maydell:
   hw/intc/arm_gicv3_cpuif: Don't downgrade monitor traps for AArch32 EL3
07 4d436fb05c2a Peter Maydell:
   target/arm: Honour SDCR.TDCC and SCR.TERR in AArch32 EL3 non-Monitor modes
08 2b95a2d01b04 Peter Maydell:
   target/arm: Correct errors in WFI/WFE trapping
09 464ce71a963b Bernhard Beschow:
   Kconfig: Extract CONFIG_USB_CHIPIDEA from CONFIG_IMX
10 bc82af6b0dcb Akihiko Odaki:
   hw/net: Fix NULL dereference with software RSS
11 de538288e4da Thomas Huth:
   hw/i386/pc: Fix crash that occurs when introspecting TYPE_PC_MACHINE 
   machines
12 38ef383073b8 Thomas Huth:
   hw/i386/microvm: Fix crash that occurs when introspecting the microvm 
   machine
13 63dc0b864739 Sairaj Kodilkar:
   amd_iommu: Use correct DTE field for interrupt passthrough
14 3684717b7407 Sairaj Kodilkar:
   amd_iommu: Use correct bitmask to set capability BAR
15 83cb18ac4500 Stefano Garzarella:
   cryptodev/vhost: allocate CryptoDevBackendVhost using g_mem0()
16 50e975414906 Konstantin Shkolnyy:
   vdpa: Fix endian bugs in shadow virtqueue
17 131fe64e63c8 Alexander Graf:
   hw/virtio/virtio-nsm: Respond with correct length
18 e87b6efb11be Matias Ezequiel Vara Larsen:
   vhost-user-snd: correct the calculation of config_size
19 7bd4eaa847fc Bibo Mao:
   target/loongarch/gdbstub: Fix gdbstub incorrectly handling some registers
20 6debfb2cb179 Paolo Bonzini:
   physmem: replace assertion with error
21 fd207677a830 Joelle van Dyne:
   target/arm/hvf: Disable SME feature
22 12c365315ab2 Joelle van Dyne:
   target/arm/hvf: sign extend the data for a load operation when SSE=1
23 ffd455963f23 Max Chou:
   target/riscv: rvv: Fix unexpected behavior of vector reduction 
   instructions when vl is 0
24 bf3adf93f167 Max Chou:
   target/riscv: rvv: Fix incorrect vlen comparison in prop_vlen_set
25 3fba76e61caa Daniel Henrique Barboza:
   target/riscv/debug.c: use wp size = 4 for 32-bit CPUs
26 c86edc547692 Daniel Henrique Barboza:
   target/riscv: throw debug exception before page fault
27 3521f9cadc29 Rodrigo Dias Correa:
   goldfish_rtc: Fix tick_offset migration
28 2ad638a3d160 Denis Rastyogin:
   block/qed: fix use-after-free by nullifying timer pointer after free
29 87c8b4fc3c1c Markus Armbruster:
   docs/about/build-platforms: Correct minimum supported Python version
30 3b2e22c0bbe2 Patrick Venture:
   hw/gpio: npcm7xx: fixup out-of-bounds access
31 db6c2192839e Peter Maydell:
   target/arm: Apply correct timer offset when calculating deadlines
32 5709038aa8b4 Peter Maydell:
   target/arm: Don't apply CNTVOFF_EL2 for EL2_VIRT timer
33 bdd641541fbe Peter Maydell:
   target/arm: Make CNTPS_* UNDEF from Secure EL1 when Secure EL2 is enabled
34 4aecd4b442d7 Peter Maydell:
   target/arm: Always apply CNTVOFF_EL2 for CNTV_TVAL_EL02 accesses
35 02c648a0a103 Peter Maydell:
   target/arm: Refactor handling of timer offset for direct register accesses
36 f9f99d7ca522 Alex Bennée:
   target/arm: Implement SEL2 physical and virtual timers
37 5dcaea8bcd82 Alex Bennée:
   hw/arm: enable secure EL2 timers for virt machine
38 9a9d9e82093e Alex Bennée:
   hw/arm: enable secure EL2 timers for sbsa machine
39 cde3247651dc Peter Maydell:
   target/arm: Correct LDRD atomicity and fault behaviour
40 ee786ca11504 Peter Maydell:
   target/arm: Correct STRD atomicity
41 02ae315467ce Peter Maydell:
   util/qemu-timer.c: Don't warp timer from timerlist_rearm()
42 db0d4017f9b9 Eugenio Pérez:
   net: parameterize the removing client from nc list
43 e7891c575fb2 Eugenio Pérez:
   net: move backend cleanup to NIC cleanup
44 68adcc784bad Stefano Stabellini:
   xen: No need to flush the mapcache for grants
45 29c041ca7f8d Nicholas Piggin:
   ppc/pnv/occ: Fix common area sensor offsets
46 2fa3a5b94696 Peter Maydell:
   hw/net/smc91c111: Sanitize packet numbers
47 aad6f264add3 Peter Maydell:
   hw/net/smc91c111: Sanitize packet length on tx
48 700d3d6dd41d Peter Maydell:

[PULL 09/17] target/arm: Add cpu local variable to exception_return helper

2025-03-14 Thread Peter Maydell
We already call env_archcpu() multiple times within the
exception_return helper function, and we're about to want to
add another use of the ARMCPU pointer. Add a local variable
cpu so we can call env_archcpu() just once.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/tcg/helper-a64.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c
index 32f0647ca4f..e2bdf07833d 100644
--- a/target/arm/tcg/helper-a64.c
+++ b/target/arm/tcg/helper-a64.c
@@ -631,6 +631,7 @@ static void cpsr_write_from_spsr_elx(CPUARMState *env,
 
 void HELPER(exception_return)(CPUARMState *env, uint64_t new_pc)
 {
+ARMCPU *cpu = env_archcpu(env);
 int cur_el = arm_current_el(env);
 unsigned int spsr_idx = aarch64_banked_spsr_index(cur_el);
 uint32_t spsr = env->banked_spsr[spsr_idx];
@@ -682,7 +683,7 @@ void HELPER(exception_return)(CPUARMState *env, uint64_t 
new_pc)
 }
 
 bql_lock();
-arm_call_pre_el_change_hook(env_archcpu(env));
+arm_call_pre_el_change_hook(cpu);
 bql_unlock();
 
 if (!return_to_aa64) {
@@ -710,7 +711,7 @@ void HELPER(exception_return)(CPUARMState *env, uint64_t 
new_pc)
 int tbii;
 
 env->aarch64 = true;
-spsr &= aarch64_pstate_valid_mask(&env_archcpu(env)->isar);
+spsr &= aarch64_pstate_valid_mask(&cpu->isar);
 pstate_write(env, spsr);
 if (!arm_singlestep_active(env)) {
 env->pstate &= ~PSTATE_SS;
@@ -749,7 +750,7 @@ void HELPER(exception_return)(CPUARMState *env, uint64_t 
new_pc)
 aarch64_sve_change_el(env, cur_el, new_el, return_to_aa64);
 
 bql_lock();
-arm_call_el_change_hook(env_archcpu(env));
+arm_call_el_change_hook(cpu);
 bql_unlock();
 
 return;
-- 
2.43.0




[Stable-9.2.3 39/51] target/arm: Correct LDRD atomicity and fault behaviour

2025-03-14 Thread Michael Tokarev
From: Peter Maydell 

Our LDRD implementation is wrong in two respects:

 * if the address is 4-aligned and the load crosses a page boundary
   and the second load faults and the first load was to the
   base register (as in cases like "ldrd r2, r3, [r2]", then we
   must not update the base register before taking the fault
 * if the address is 8-aligned the access must be a 64-bit
   single-copy atomic access, not two 32-bit accesses

Rewrite the handling of the loads in LDRD to use a single
tcg_gen_qemu_ld_i64() and split the result into the destination
registers. This allows us to get the atomicity requirements
right, and also implicitly means that we won't update the
base register too early for the page-crossing case.

Note that because we no longer increment 'addr' by 4 in the course of
performing the LDRD we must change the adjustment value we pass to
op_addr_ri_post() and op_addr_rr_post(): it no longer needs to
subtract 4 to get the correct value to use if doing base register
writeback.

STRD has the same problem with not getting the atomicity right;
we will deal with that in the following commit.

Cc: qemu-sta...@nongnu.org
Reported-by: Stu Grossman 
Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-id: 20250227142746.1698904-2-peter.mayd...@linaro.org
(cherry picked from commit cde3247651dc998da5dc1005148302a90d72f21f)
Signed-off-by: Michael Tokarev 

diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
index 9ee761fc64..a2933f1c36 100644
--- a/target/arm/tcg/translate.c
+++ b/target/arm/tcg/translate.c
@@ -5003,10 +5003,49 @@ static bool op_store_rr(DisasContext *s, arg_ldst_rr *a,
 return true;
 }
 
-static bool trans_LDRD_rr(DisasContext *s, arg_ldst_rr *a)
+static void do_ldrd_load(DisasContext *s, TCGv_i32 addr, int rt, int rt2)
 {
+/*
+ * LDRD is required to be an atomic 64-bit access if the
+ * address is 8-aligned, two atomic 32-bit accesses if
+ * it's only 4-aligned, and to give an alignment fault
+ * if it's not 4-aligned. This is MO_ALIGN_4 | MO_ATOM_SUBALIGN.
+ * Rt is always the word from the lower address, and Rt2 the
+ * data from the higher address, regardless of endianness.
+ * So (like gen_load_exclusive) we avoid gen_aa32_ld_i64()
+ * so we don't get its SCTLR_B check, and instead do a 64-bit access
+ * using MO_BE if appropriate and then split the two halves.
+ *
+ * For M-profile, and for A-profile before LPAE, the 64-bit
+ * atomicity is not required. We could model that using
+ * the looser MO_ATOM_IFALIGN_PAIR, but providing a higher
+ * level of atomicity than required is harmless (we would not
+ * currently generate better code for IFALIGN_PAIR here).
+ *
+ * This also gives us the correct behaviour of not updating
+ * rt if the load of rt2 faults; this is required for cases
+ * like "ldrd r2, r3, [r2]" where rt is also the base register.
+ */
 int mem_idx = get_mem_index(s);
-TCGv_i32 addr, tmp;
+MemOp opc = MO_64 | MO_ALIGN_4 | MO_ATOM_SUBALIGN | s->be_data;
+TCGv taddr = gen_aa32_addr(s, addr, opc);
+TCGv_i64 t64 = tcg_temp_new_i64();
+TCGv_i32 tmp = tcg_temp_new_i32();
+TCGv_i32 tmp2 = tcg_temp_new_i32();
+
+tcg_gen_qemu_ld_i64(t64, taddr, mem_idx, opc);
+if (s->be_data == MO_BE) {
+tcg_gen_extr_i64_i32(tmp2, tmp, t64);
+} else {
+tcg_gen_extr_i64_i32(tmp, tmp2, t64);
+}
+store_reg(s, rt, tmp);
+store_reg(s, rt2, tmp2);
+}
+
+static bool trans_LDRD_rr(DisasContext *s, arg_ldst_rr *a)
+{
+TCGv_i32 addr;
 
 if (!ENABLE_ARCH_5TE) {
 return false;
@@ -5017,18 +5056,10 @@ static bool trans_LDRD_rr(DisasContext *s, arg_ldst_rr 
*a)
 }
 addr = op_addr_rr_pre(s, a);
 
-tmp = tcg_temp_new_i32();
-gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
-store_reg(s, a->rt, tmp);
-
-tcg_gen_addi_i32(addr, addr, 4);
-
-tmp = tcg_temp_new_i32();
-gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
-store_reg(s, a->rt + 1, tmp);
+do_ldrd_load(s, addr, a->rt, a->rt + 1);
 
 /* LDRD w/ base writeback is undefined if the registers overlap.  */
-op_addr_rr_post(s, a, addr, -4);
+op_addr_rr_post(s, a, addr, 0);
 return true;
 }
 
@@ -5152,23 +5183,14 @@ static bool op_store_ri(DisasContext *s, arg_ldst_ri *a,
 
 static bool op_ldrd_ri(DisasContext *s, arg_ldst_ri *a, int rt2)
 {
-int mem_idx = get_mem_index(s);
-TCGv_i32 addr, tmp;
+TCGv_i32 addr;
 
 addr = op_addr_ri_pre(s, a);
 
-tmp = tcg_temp_new_i32();
-gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
-store_reg(s, a->rt, tmp);
-
-tcg_gen_addi_i32(addr, addr, 4);
-
-tmp = tcg_temp_new_i32();
-gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
-store_reg(s, rt2, tmp);
+do_ldrd_load(s, addr, a->rt, rt2);
 
 /* LDRD w/ base writeback is undefined if the registers overlap.  */
-op_addr_ri_post(s,

[PULL 15/17] target/arm: Make DisasContext.{fp, sve}_access_checked tristate

2025-03-14 Thread Peter Maydell
From: Richard Henderson 

The check for fp_excp_el in assert_fp_access_checked is
incorrect.  For SME, with StreamingMode enabled, the access
is really against the streaming mode vectors, and access
to the normal fp registers is allowed to be disabled.
C.f. sme_enabled_check.

Convert sve_access_checked to match, even though we don't
currently check the exception state.

Cc: qemu-sta...@nongnu.org
Fixes: 3d74825f4d6 ("target/arm: Add SME enablement checks")
Signed-off-by: Richard Henderson 
Message-id: 20250307190415.982049-2-richard.hender...@linaro.org
Reviewed-by: Peter Maydell 
Signed-off-by: Peter Maydell 
---
 target/arm/tcg/translate-a64.h |  2 +-
 target/arm/tcg/translate.h | 10 +++---
 target/arm/tcg/translate-a64.c | 17 +
 3 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h
index 7d3b59ccd96..b2420f59ebe 100644
--- a/target/arm/tcg/translate-a64.h
+++ b/target/arm/tcg/translate-a64.h
@@ -65,7 +65,7 @@ TCGv_i64 gen_mte_checkN(DisasContext *s, TCGv_i64 addr, bool 
is_write,
 static inline void assert_fp_access_checked(DisasContext *s)
 {
 #ifdef CONFIG_DEBUG_TCG
-if (unlikely(!s->fp_access_checked || s->fp_excp_el)) {
+if (unlikely(s->fp_access_checked <= 0)) {
 fprintf(stderr, "target-arm: FP access check missing for "
 "instruction 0x%08x\n", s->insn);
 abort();
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index f8dc2f0d4bb..53e485d28ac 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -92,15 +92,19 @@ typedef struct DisasContext {
 bool aarch64;
 bool thumb;
 bool lse2;
-/* Because unallocated encodings generate different exception syndrome
+/*
+ * Because unallocated encodings generate different exception syndrome
  * information from traps due to FP being disabled, we can't do a single
  * "is fp access disabled" check at a high level in the decode tree.
  * To help in catching bugs where the access check was forgotten in some
  * code path, we set this flag when the access check is done, and assert
  * that it is set at the point where we actually touch the FP regs.
+ *   0: not checked,
+ *   1: checked, access ok
+ *  -1: checked, access denied
  */
-bool fp_access_checked;
-bool sve_access_checked;
+int8_t fp_access_checked;
+int8_t sve_access_checked;
 /* ARMv8 single-step state (this is distinct from the QEMU gdbstub
  * single-step support).
  */
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 8bef391bb03..48e0ac75b11 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -1381,14 +1381,14 @@ static bool fp_access_check_only(DisasContext *s)
 {
 if (s->fp_excp_el) {
 assert(!s->fp_access_checked);
-s->fp_access_checked = true;
+s->fp_access_checked = -1;
 
 gen_exception_insn_el(s, 0, EXCP_UDEF,
   syn_fp_access_trap(1, 0xe, false, 0),
   s->fp_excp_el);
 return false;
 }
-s->fp_access_checked = true;
+s->fp_access_checked = 1;
 return true;
 }
 
@@ -1465,13 +1465,13 @@ bool sve_access_check(DisasContext *s)
   syn_sve_access_trap(), s->sve_excp_el);
 goto fail_exit;
 }
-s->sve_access_checked = true;
+s->sve_access_checked = 1;
 return fp_access_check(s);
 
  fail_exit:
 /* Assert that we only raise one exception per instruction. */
 assert(!s->sve_access_checked);
-s->sve_access_checked = true;
+s->sve_access_checked = -1;
 return false;
 }
 
@@ -1500,8 +1500,9 @@ bool sme_enabled_check(DisasContext *s)
  * sme_excp_el by itself for cpregs access checks.
  */
 if (!s->fp_excp_el || s->sme_excp_el < s->fp_excp_el) {
-s->fp_access_checked = true;
-return sme_access_check(s);
+bool ret = sme_access_check(s);
+s->fp_access_checked = (ret ? 1 : -1);
+return ret;
 }
 return fp_access_check_only(s);
 }
@@ -10257,8 +10258,8 @@ static void aarch64_tr_translate_insn(DisasContextBase 
*dcbase, CPUState *cpu)
 s->insn = insn;
 s->base.pc_next = pc + 4;
 
-s->fp_access_checked = false;
-s->sve_access_checked = false;
+s->fp_access_checked = 0;
+s->sve_access_checked = 0;
 
 if (s->pstate_il) {
 /*
-- 
2.43.0




[PULL 14/17] util/cacheflush: Make first DSB unconditional on aarch64

2025-03-14 Thread Peter Maydell
From: Joe Komlodi 

On ARM hosts with CTR_EL0.DIC and CTR_EL0.IDC set, this would only cause
an ISB to be executed during cache maintenance, which could lead to QEMU
executing TBs containing garbage instructions.

This seems to be because the ISB finishes executing instructions and
flushes the pipeline, but the ISB doesn't guarantee that writes from the
executed instructions are committed. If a small enough TB is created, it's
possible that the writes setting up the TB aren't committed by the time the
TB is executed.

This function is intended to be a port of the gcc implementation
(https://github.com/gcc-mirror/gcc/blob/85b46d0795ac76bc192cb8f88b646a647acf98c1/libgcc/config/aarch64/sync-cache.c#L67)
which makes the first DSB unconditional, so we can fix the synchronization
issue by doing that as well.

Cc: qemu-sta...@nongnu.org
Fixes: 664a79735e4deb1 ("util: Specialize flush_idcache_range for aarch64")
Signed-off-by: Joe Komlodi 
Message-id: 20250310203622.1827940-2-koml...@google.com
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Signed-off-by: Peter Maydell 
---
 util/cacheflush.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/util/cacheflush.c b/util/cacheflush.c
index a08906155a9..1d12899a392 100644
--- a/util/cacheflush.c
+++ b/util/cacheflush.c
@@ -279,9 +279,11 @@ void flush_idcache_range(uintptr_t rx, uintptr_t rw, 
size_t len)
 for (p = rw & -dcache_lsize; p < rw + len; p += dcache_lsize) {
 asm volatile("dc\tcvau, %0" : : "r" (p) : "memory");
 }
-asm volatile("dsb\tish" : : : "memory");
 }
 
+/* DSB unconditionally to ensure any outstanding writes are committed. */
+asm volatile("dsb\tish" : : : "memory");
+
 /*
  * If CTR_EL0.DIC is enabled, Instruction cache cleaning to the Point
  * of Unification is not required for instruction to data coherence.
-- 
2.43.0




[PATCH 1/1] linux-user: Hold the fd-trans lock across fork

2025-03-14 Thread Geoffrey Thomas
If another thread is holding target_fd_trans_lock during a fork, then the lock
becomes permanently locked in the child and the emulator deadlocks at the next
interaction with the fd-trans table. As with other locks, acquire the lock in
fork_start() and release it in fork_end().

Signed-off-by: Geoffrey Thomas 
Fixes: c093364f4d91 "fd-trans: Fix race condition on reallocation of the 
translation table."
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2846
Buglink: https://github.com/astral-sh/uv/issues/6105
---
 linux-user/fd-trans.h | 10 ++
 linux-user/main.c |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/linux-user/fd-trans.h b/linux-user/fd-trans.h
index 910faaf237..e14f96059c 100644
--- a/linux-user/fd-trans.h
+++ b/linux-user/fd-trans.h
@@ -36,6 +36,16 @@ static inline void fd_trans_init(void)
 qemu_mutex_init(&target_fd_trans_lock);
 }
 
+static inline void fd_trans_prefork(void)
+{
+qemu_mutex_lock(&target_fd_trans_lock);
+}
+
+static inline void fd_trans_postfork(void)
+{
+qemu_mutex_unlock(&target_fd_trans_lock);
+}
+
 static inline TargetFdDataFunc fd_trans_target_to_host_data(int fd)
 {
 if (fd < 0) {
diff --git a/linux-user/main.c b/linux-user/main.c
index e2ec5970be..2cd867491b 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -149,12 +149,14 @@ void fork_start(void)
 cpu_list_lock();
 qemu_plugin_user_prefork_lock();
 gdbserver_fork_start();
+fd_trans_prefork();
 }
 
 void fork_end(pid_t pid)
 {
 bool child = pid == 0;
 
+fd_trans_postfork();
 qemu_plugin_user_postfork(child);
 mmap_fork_end(child);
 if (child) {
-- 
2.39.5 (Apple Git-154)




[PULL 07/17] target/arm: SCR_EL3.RW should be treated as 1 if EL2 doesn't support AArch32

2025-03-14 Thread Peter Maydell
The definition of SCR_EL3.RW says that its effective value is 1 if:
 - EL2 is implemented and does not support AArch32, and SCR_EL3.NS is 1
 - the effective value of SCR_EL3.{EEL2,NS} is {1,0} (i.e. we are
   Secure and Secure EL2 is disabled)

We implement the second of these in arm_el_is_aa64(), but forgot the
first.

Provide a new function arm_scr_rw_eff() to return the effective
value of SCR_EL3.RW, and use it in arm_el_is_aa64() and the other
places that currently look directly at the bit value.

(scr_write() enforces that the RW bit is RAO/WI if neither EL1 nor
EL2 have AArch32 support, but if EL1 does but EL2 does not then the
bit must still be writeable.)

This will mean that if code at EL3 attempts to perform an exception
return to AArch32 EL2 when EL2 is AArch64-only we will correctly
handle this as an illegal exception return: it will be caught by the
"return to an EL which is configured for a different register width"
check in HELPER(exception_return).

We do already have some CPU types which don't implement AArch32
above EL0, so this is technically a bug; it doesn't seem worth
backporting to stable because no sensible guest code will be
deliberately attempting to set the RW bit to a value corresponding
to an unimplemented execution state and then checking that we
did the right thing.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/internals.h | 26 +++---
 target/arm/helper.c|  4 ++--
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index d161a3e396b..28585c07555 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -392,6 +392,27 @@ static inline FloatRoundMode arm_rmode_to_sf(ARMFPRounding 
rmode)
 return arm_rmode_to_sf_map[rmode];
 }
 
+/* Return the effective value of SCR_EL3.RW */
+static inline bool arm_scr_rw_eff(CPUARMState *env)
+{
+/*
+ * SCR_EL3.RW has an effective value of 1 if:
+ *  - we are NS and EL2 is implemented but doesn't support AArch32
+ *  - we are S and EL2 is enabled (in which case it must be AArch64)
+ */
+ARMCPU *cpu = env_archcpu(env);
+
+if (env->cp15.scr_el3 & SCR_RW) {
+return true;
+}
+if (env->cp15.scr_el3 & SCR_NS) {
+return arm_feature(env, ARM_FEATURE_EL2) &&
+!cpu_isar_feature(aa64_aa32_el2, cpu);
+} else {
+return env->cp15.scr_el3 & SCR_EEL2;
+}
+}
+
 /* Return true if the specified exception level is running in AArch64 state. */
 static inline bool arm_el_is_aa64(CPUARMState *env, int el)
 {
@@ -411,9 +432,8 @@ static inline bool arm_el_is_aa64(CPUARMState *env, int el)
 return aa64;
 }
 
-if (arm_feature(env, ARM_FEATURE_EL3) &&
-((env->cp15.scr_el3 & SCR_NS) || !(env->cp15.scr_el3 & SCR_EEL2))) {
-aa64 = aa64 && (env->cp15.scr_el3 & SCR_RW);
+if (arm_feature(env, ARM_FEATURE_EL3)) {
+aa64 = aa64 && arm_scr_rw_eff(env);
 }
 
 if (el == 2) {
diff --git a/target/arm/helper.c b/target/arm/helper.c
index f0ead22937b..3df7d5347cb 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -9818,7 +9818,7 @@ uint32_t arm_phys_excp_target_el(CPUState *cs, uint32_t 
excp_idx,
 uint64_t hcr_el2;
 
 if (arm_feature(env, ARM_FEATURE_EL3)) {
-rw = ((env->cp15.scr_el3 & SCR_RW) == SCR_RW);
+rw = arm_scr_rw_eff(env);
 } else {
 /*
  * Either EL2 is the highest EL (and so the EL2 register width
@@ -10627,7 +10627,7 @@ static void arm_cpu_do_interrupt_aarch64(CPUState *cs)
 
 switch (new_el) {
 case 3:
-is_aa64 = (env->cp15.scr_el3 & SCR_RW) != 0;
+is_aa64 = arm_scr_rw_eff(env);
 break;
 case 2:
 hcr = arm_hcr_el2_eff(env);
-- 
2.43.0




Re: [PATCH RFC] target: riscv: Add Svrsw60b59b extension support

2025-03-14 Thread Daniel Henrique Barboza




On 3/14/25 9:11 AM, Alexandre Ghiti wrote:

On Fri, Mar 14, 2025 at 11:48 AM Alexandre Ghiti  wrote:


The Svrsw60b59b extension allows to free the PTE reserved bits 60 and 59
for software to use.


I missed that the extension had been renamed to Svrsw60*t*59b, I'll
fix that in v2 later after I collect some feedback.


Just to be clear: the extension is going to be named Svrsw60t59b, not
"Svrsw60*t*59b". Correct?


Aside from that code LGTM. Thanks,

Daniel




Thanks,

Alex



Signed-off-by: Alexandre Ghiti 
---

I tested it by always setting the bits 60 and 59 in Linux which booted
fine.

  target/riscv/cpu.c| 2 ++
  target/riscv/cpu_bits.h   | 3 ++-
  target/riscv/cpu_cfg.h| 1 +
  target/riscv/cpu_helper.c | 3 ++-
  4 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 3d4bd157d2..ee89cdef46 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -219,6 +219,7 @@ const RISCVIsaExtData isa_edata_arr[] = {
  ISA_EXT_DATA_ENTRY(svinval, PRIV_VERSION_1_12_0, ext_svinval),
  ISA_EXT_DATA_ENTRY(svnapot, PRIV_VERSION_1_12_0, ext_svnapot),
  ISA_EXT_DATA_ENTRY(svpbmt, PRIV_VERSION_1_12_0, ext_svpbmt),
+ISA_EXT_DATA_ENTRY(svrsw60b59b, PRIV_VERSION_1_13_0, ext_svrsw60b59b),
  ISA_EXT_DATA_ENTRY(svukte, PRIV_VERSION_1_13_0, ext_svukte),
  ISA_EXT_DATA_ENTRY(svvptc, PRIV_VERSION_1_13_0, ext_svvptc),
  ISA_EXT_DATA_ENTRY(xtheadba, PRIV_VERSION_1_11_0, ext_xtheadba),
@@ -1644,6 +1645,7 @@ const RISCVCPUMultiExtConfig riscv_cpu_extensions[] = {
  MULTI_EXT_CFG_BOOL("svinval", ext_svinval, false),
  MULTI_EXT_CFG_BOOL("svnapot", ext_svnapot, false),
  MULTI_EXT_CFG_BOOL("svpbmt", ext_svpbmt, false),
+MULTI_EXT_CFG_BOOL("svrsw60b59b", ext_svrsw60b59b, false),
  MULTI_EXT_CFG_BOOL("svvptc", ext_svvptc, true),

  MULTI_EXT_CFG_BOOL("zicntr", ext_zicntr, true),
diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index f97c48a394..71f9e603c5 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -663,7 +663,8 @@ typedef enum {
  #define PTE_SOFT0x300 /* Reserved for Software */
  #define PTE_PBMT0x6000ULL /* Page-based memory types 
*/
  #define PTE_N   0x8000ULL /* NAPOT translation */
-#define PTE_RESERVED0x1FC0ULL /* Reserved bits */
+#define PTE_RESERVED(svrsw60b59b)  \
+   (svrsw60b59b ? 0x07C0ULL : 0x1FC0ULL) 
/* Reserved bits */
  #define PTE_ATTR(PTE_N | PTE_PBMT) /* All attributes bits */

  /* Page table PPN shift amount */
diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
index b410b1e603..f6e4b0068a 100644
--- a/target/riscv/cpu_cfg.h
+++ b/target/riscv/cpu_cfg.h
@@ -89,6 +89,7 @@ struct RISCVCPUConfig {
  bool ext_svinval;
  bool ext_svnapot;
  bool ext_svpbmt;
+bool ext_svrsw60b59b;
  bool ext_svvptc;
  bool ext_svukte;
  bool ext_zdinx;
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index e1dfc4ecbf..6546cea403 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -1156,6 +1156,7 @@ static int get_physical_address(CPURISCVState *env, 
hwaddr *physical,
  bool svade = riscv_cpu_cfg(env)->ext_svade;
  bool svadu = riscv_cpu_cfg(env)->ext_svadu;
  bool adue = svadu ? env->menvcfg & MENVCFG_ADUE : !svade;
+bool svrsw60b59b = riscv_cpu_cfg(env)->ext_svrsw60b59b;

  if (first_stage && two_stage && env->virt_enabled) {
  pbmte = pbmte && (env->henvcfg & HENVCFG_PBMTE);
@@ -1225,7 +1226,7 @@ restart:
  if (riscv_cpu_sxl(env) == MXL_RV32) {
  ppn = pte >> PTE_PPN_SHIFT;
  } else {
-if (pte & PTE_RESERVED) {
+if (pte & PTE_RESERVED(svrsw60b59b)) {
  return TRANSLATE_FAIL;
  }

--
2.39.2






[PATCH v3 7/7] vdpa: move memory listener register to vhost_vdpa_init

2025-03-14 Thread Jonah Palmer
From: Eugenio Pérez 

Current memory operations like pinning may take a lot of time at the
destination.  Currently they are done after the source of the migration is
stopped, and before the workload is resumed at the destination.  This is a
period where neigher traffic can flow, nor the VM workload can continue
(downtime).

We can do better as we know the memory layout of the guest RAM at the
destination from the moment that all devices are initializaed.  So
moving that operation allows QEMU to communicate the kernel the maps
while the workload is still running in the source, so Linux can start
mapping them.

As a small drawback, there is a time in the initialization where QEMU
cannot respond to QMP etc.  By some testing, this time is about
0.2seconds.  This may be further reduced (or increased) depending on the
vdpa driver and the platform hardware, and it is dominated by the cost
of memory pinning.

This matches the time that we move out of the called downtime window.
The downtime is measured as checking the trace timestamp from the moment
the source suspend the device to the moment the destination starts the
eight and last virtqueue pair.  For a 39G guest, it goes from ~2.2526
secs to 2.0949.

Signed-off-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 

--
v2:
Move the memory listener registration to vhost_vdpa_set_owner function.
In case of hotplug the vdpa device, the memory is already set up, and
leaving memory listener register call in the init function made maps
occur before set owner call.

To be 100% safe, let's put it right after set_owner call.

Reported-by: Lei Yang 
---
 hw/virtio/vhost-vdpa.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index eb5b5208b7..afed991253 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1381,6 +1381,11 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, 
bool started)
  "IOMMU and try again");
 return -1;
 }
+if (v->shared->listener_registered &&
+dev->vdev->dma_as != v->shared->listener.address_space) {
+memory_listener_unregister(&v->shared->listener);
+v->shared->listener_registered = false;
+}
 if (!v->shared->listener_registered) {
 memory_listener_register(&v->shared->listener, dev->vdev->dma_as);
 v->shared->listener_registered = true;
@@ -1539,12 +1544,27 @@ static int vhost_vdpa_get_features(struct vhost_dev 
*dev,
 
 static int vhost_vdpa_set_owner(struct vhost_dev *dev)
 {
+int r;
+struct vhost_vdpa *v;
+
 if (!vhost_vdpa_first_dev(dev)) {
 return 0;
 }
 
 trace_vhost_vdpa_set_owner(dev);
-return vhost_vdpa_call(dev, VHOST_SET_OWNER, NULL);
+r = vhost_vdpa_call(dev, VHOST_SET_OWNER, NULL);
+if (unlikely(r < 0)) {
+return r;
+}
+
+/*
+ * Being optimistic and listening address space memory. If the device
+ * uses vIOMMU, it is changed at vhost_vdpa_dev_start.
+ */
+v = dev->opaque;
+memory_listener_register(&v->shared->listener, &address_space_memory);
+v->shared->listener_registered = true;
+return 0;
 }
 
 static int vhost_vdpa_vq_get_addr(struct vhost_dev *dev,
-- 
2.43.5




[PULL 08/17] target/arm: HCR_EL2.RW should be RAO/WI if EL1 doesn't support AArch32

2025-03-14 Thread Peter Maydell
When EL1 doesn't support AArch32, the HCR_EL2.RW bit is supposed to
be RAO/WI. Enforce the RAO/WI behaviour.

Note that we handle "reset value should honour RES1 bits" in the same
way that SCR_EL3 does, via a reset function.

We do already have some CPU types which don't implement AArch32
above EL0, so this is technically a bug; it doesn't seem worth
backporting to stable because no sensible guest code will be
deliberately attempting to set the RW bit to a value corresponding
to an unimplemented execution state and then checking that we
did the right thing.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/helper.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 3df7d5347cb..bb445e30cd1 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -5326,6 +5326,11 @@ static void do_hcr_write(CPUARMState *env, uint64_t 
value, uint64_t valid_mask)
 /* Clear RES0 bits.  */
 value &= valid_mask;
 
+/* RW is RAO/WI if EL1 is AArch64 only */
+if (!cpu_isar_feature(aa64_aa32_el1, cpu)) {
+value |= HCR_RW;
+}
+
 /*
  * These bits change the MMU setup:
  * HCR_VM enables stage 2 translation
@@ -5383,6 +5388,12 @@ static void hcr_writelow(CPUARMState *env, const 
ARMCPRegInfo *ri,
 do_hcr_write(env, value, MAKE_64BIT_MASK(32, 32));
 }
 
+static void hcr_reset(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+/* hcr_write will set the RES1 bits on an AArch64-only CPU */
+hcr_write(env, ri, 0);
+}
+
 /*
  * Return the effective value of HCR_EL2, at the given security state.
  * Bits that are not included here:
@@ -5618,6 +5629,7 @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
   .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 0,
   .access = PL2_RW, .fieldoffset = offsetof(CPUARMState, cp15.hcr_el2),
   .nv2_redirect_offset = 0x78,
+  .resetfn = hcr_reset,
   .writefn = hcr_write, .raw_writefn = raw_write },
 { .name = "HCR", .state = ARM_CP_STATE_AA32,
   .type = ARM_CP_ALIAS | ARM_CP_IO,
-- 
2.43.0




[PATCH v3 0/7] Move memory listener register to vhost_vdpa_init

2025-03-14 Thread Jonah Palmer
Current memory operations like pinning may take a lot of time at the
destination.  Currently they are done after the source of the migration is
stopped, and before the workload is resumed at the destination.  This is a
period where neigher traffic can flow, nor the VM workload can continue
(downtime).

We can do better as we know the memory layout of the guest RAM at the
destination from the moment that all devices are initializaed.  So
moving that operation allows QEMU to communicate the kernel the maps
while the workload is still running in the source, so Linux can start
mapping them.

As a small drawback, there is a time in the initialization where QEMU
cannot respond to QMP etc.  By some testing, this time is about
0.2seconds.  This may be further reduced (or increased) depending on the
vdpa driver and the platform hardware, and it is dominated by the cost
of memory pinning.

This matches the time that we move out of the called downtime window.
The downtime is measured as checking the trace timestamp from the moment
the source suspend the device to the moment the destination starts the
eight and last virtqueue pair.  For a 39G guest, it goes from ~2.2526
secs to 2.0949.

Future directions on top of this series may include to move more things ahead
of the migration time, like set DRIVER_OK or perform actual iterative migration
of virtio-net devices.

Comments are welcome.

This series is a different approach of series [1]. As the title does not
reflect the changes anymore, please refer to the previous one to know the
series history.

This series is based on [2], it must be applied after it.

[Jonah Palmer]
This series was rebased after [3] was pulled in, as [3] was a prerequisite
fix for this series.

v3:
---
* Rebase

v2:
---
* Move the memory listener registration to vhost_vdpa_set_owner function.
* Move the iova_tree allocation to net_vhost_vdpa_init.

v1 at https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg02136.html.

[1] 
https://patchwork.kernel.org/project/qemu-devel/cover/20231215172830.2540987-1-epere...@redhat.com/
[2] https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg05910.html
[3] 
https://lore.kernel.org/qemu-devel/20250217144936.3589907-1-jonah.pal...@oracle.com/

Eugenio Pérez (7):
  vdpa: check for iova tree initialized at net_client_start
  vdpa: reorder vhost_vdpa_set_backend_cap
  vdpa: set backend capabilities at vhost_vdpa_init
  vdpa: add listener_registered
  vdpa: reorder listener assignment
  vdpa: move iova_tree allocation to net_vhost_vdpa_init
  vdpa: move memory listener register to vhost_vdpa_init

 hw/virtio/vhost-vdpa.c | 98 ++
 include/hw/virtio/vhost-vdpa.h | 22 +++-
 net/vhost-vdpa.c   | 34 ++--
 3 files changed, 88 insertions(+), 66 deletions(-)

-- 
2.43.5




[PATCH v3 2/7] vdpa: reorder vhost_vdpa_set_backend_cap

2025-03-14 Thread Jonah Palmer
From: Eugenio Pérez 

It will be used directly by vhost_vdpa_init.

Signed-off-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 hw/virtio/vhost-vdpa.c | 60 +-
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 7efbde3d4c..79224d18d8 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -596,6 +596,36 @@ static void vhost_vdpa_init_svq(struct vhost_dev *hdev, 
struct vhost_vdpa *v)
 v->shadow_vqs = g_steal_pointer(&shadow_vqs);
 }
 
+static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
+{
+struct vhost_vdpa *v = dev->opaque;
+
+uint64_t features;
+uint64_t f = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
+0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
+0x1ULL << VHOST_BACKEND_F_IOTLB_ASID |
+0x1ULL << VHOST_BACKEND_F_SUSPEND;
+int r;
+
+if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &features)) {
+return -EFAULT;
+}
+
+features &= f;
+
+if (vhost_vdpa_first_dev(dev)) {
+r = vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features);
+if (r) {
+return -EFAULT;
+}
+}
+
+dev->backend_cap = features;
+v->shared->backend_cap = features;
+
+return 0;
+}
+
 static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
 {
 struct vhost_vdpa *v = opaque;
@@ -843,36 +873,6 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
 return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
 }
 
-static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
-{
-struct vhost_vdpa *v = dev->opaque;
-
-uint64_t features;
-uint64_t f = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
-0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
-0x1ULL << VHOST_BACKEND_F_IOTLB_ASID |
-0x1ULL << VHOST_BACKEND_F_SUSPEND;
-int r;
-
-if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &features)) {
-return -EFAULT;
-}
-
-features &= f;
-
-if (vhost_vdpa_first_dev(dev)) {
-r = vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features);
-if (r) {
-return -EFAULT;
-}
-}
-
-dev->backend_cap = features;
-v->shared->backend_cap = features;
-
-return 0;
-}
-
 static int vhost_vdpa_get_device_id(struct vhost_dev *dev,
 uint32_t *device_id)
 {
-- 
2.43.5




[PATCH v3 1/7] vdpa: check for iova tree initialized at net_client_start

2025-03-14 Thread Jonah Palmer
From: Eugenio Pérez 

To map the guest memory while it is migrating we need to create the
iova_tree, as long as the destination uses x-svq=on. Checking to not
override it.

The function vhost_vdpa_net_client_stop clear it if the device is
stopped. If the guest starts the device again, the iova tree is
recreated by vhost_vdpa_net_data_start_first or vhost_vdpa_net_cvq_start
if needed, so old behavior is kept.

Signed-off-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 net/vhost-vdpa.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index f7a54f46aa..5bc945d3e0 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -354,7 +354,9 @@ static void vhost_vdpa_net_data_start_first(VhostVDPAState 
*s)
 
 migration_add_notifier(&s->migration_state,
vdpa_net_migration_state_notifier);
-if (v->shadow_vqs_enabled) {
+
+/* iova_tree may be initialized by vhost_vdpa_net_load_setup */
+if (v->shadow_vqs_enabled && !v->shared->iova_tree) {
 v->shared->iova_tree = vhost_iova_tree_new(v->shared->iova_range.first,
v->shared->iova_range.last);
 }
-- 
2.43.5




[PATCH v3 4/7] vdpa: add listener_registered

2025-03-14 Thread Jonah Palmer
From: Eugenio Pérez 

Check if the listener has been registered or not, so it needs to be
registered again at start.

Signed-off-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 hw/virtio/vhost-vdpa.c | 7 ++-
 include/hw/virtio/vhost-vdpa.h | 6 ++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 939a5a28a1..61a0e8fdbd 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1381,7 +1381,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, 
bool started)
  "IOMMU and try again");
 return -1;
 }
-memory_listener_register(&v->shared->listener, dev->vdev->dma_as);
+if (!v->shared->listener_registered) {
+memory_listener_register(&v->shared->listener, dev->vdev->dma_as);
+v->shared->listener_registered = true;
+}
 
 return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
 }
@@ -1401,6 +1404,8 @@ static void vhost_vdpa_reset_status(struct vhost_dev *dev)
 vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
VIRTIO_CONFIG_S_DRIVER);
 memory_listener_unregister(&v->shared->listener);
+v->shared->listener_registered = false;
+
 }
 
 static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index 0a9575b469..221840987e 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -51,6 +51,12 @@ typedef struct vhost_vdpa_shared {
 
 bool iotlb_batch_begin_sent;
 
+/*
+ * The memory listener has been registered, so DMA maps have been sent to
+ * the device.
+ */
+bool listener_registered;
+
 /* Vdpa must send shadow addresses as IOTLB key for data queues, not GPA */
 bool shadow_data;
 
-- 
2.43.5




[PATCH v3 3/7] vdpa: set backend capabilities at vhost_vdpa_init

2025-03-14 Thread Jonah Palmer
From: Eugenio Pérez 

The backend does not reset them until the vdpa file descriptor is closed
so there is no harm in doing it only once.

This allows the destination of a live migration to premap memory in
batches, using VHOST_BACKEND_F_IOTLB_BATCH.

Signed-off-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 hw/virtio/vhost-vdpa.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 79224d18d8..939a5a28a1 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -636,6 +636,12 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void 
*opaque, Error **errp)
 v->dev = dev;
 dev->opaque =  opaque ;
 v->shared->listener = vhost_vdpa_memory_listener;
+
+ret = vhost_vdpa_set_backend_cap(dev);
+if (unlikely(ret != 0)) {
+return ret;
+}
+
 vhost_vdpa_init_svq(dev, v);
 
 error_propagate(&dev->migration_blocker, v->migration_blocker);
@@ -1565,7 +1571,6 @@ const VhostOps vdpa_ops = {
 .vhost_set_vring_kick = vhost_vdpa_set_vring_kick,
 .vhost_set_vring_call = vhost_vdpa_set_vring_call,
 .vhost_get_features = vhost_vdpa_get_features,
-.vhost_set_backend_cap = vhost_vdpa_set_backend_cap,
 .vhost_set_owner = vhost_vdpa_set_owner,
 .vhost_set_vring_endian = NULL,
 .vhost_backend_memslots_limit = vhost_vdpa_memslots_limit,
-- 
2.43.5




Re: [PATCH RFC] target: riscv: Add Svrsw60b59b extension support

2025-03-14 Thread Alexandre Ghiti
On Fri, Mar 14, 2025 at 1:38 PM Daniel Henrique Barboza
 wrote:
>
>
>
> On 3/14/25 9:11 AM, Alexandre Ghiti wrote:
> > On Fri, Mar 14, 2025 at 11:48 AM Alexandre Ghiti  
> > wrote:
> >>
> >> The Svrsw60b59b extension allows to free the PTE reserved bits 60 and 59
> >> for software to use.
> >
> > I missed that the extension had been renamed to Svrsw60*t*59b, I'll
> > fix that in v2 later after I collect some feedback.
>
> Just to be clear: the extension is going to be named Svrsw60t59b, not
> "Svrsw60*t*59b". Correct?

Yes, I added the '*' to emphasize the subtle change :)

>
>
> Aside from that code LGTM. Thanks,

Thanks!

Alex

>
> Daniel
>
>
> >
> > Thanks,
> >
> > Alex
> >
> >>
> >> Signed-off-by: Alexandre Ghiti 
> >> ---
> >>
> >> I tested it by always setting the bits 60 and 59 in Linux which booted
> >> fine.
> >>
> >>   target/riscv/cpu.c| 2 ++
> >>   target/riscv/cpu_bits.h   | 3 ++-
> >>   target/riscv/cpu_cfg.h| 1 +
> >>   target/riscv/cpu_helper.c | 3 ++-
> >>   4 files changed, 7 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> >> index 3d4bd157d2..ee89cdef46 100644
> >> --- a/target/riscv/cpu.c
> >> +++ b/target/riscv/cpu.c
> >> @@ -219,6 +219,7 @@ const RISCVIsaExtData isa_edata_arr[] = {
> >>   ISA_EXT_DATA_ENTRY(svinval, PRIV_VERSION_1_12_0, ext_svinval),
> >>   ISA_EXT_DATA_ENTRY(svnapot, PRIV_VERSION_1_12_0, ext_svnapot),
> >>   ISA_EXT_DATA_ENTRY(svpbmt, PRIV_VERSION_1_12_0, ext_svpbmt),
> >> +ISA_EXT_DATA_ENTRY(svrsw60b59b, PRIV_VERSION_1_13_0, ext_svrsw60b59b),
> >>   ISA_EXT_DATA_ENTRY(svukte, PRIV_VERSION_1_13_0, ext_svukte),
> >>   ISA_EXT_DATA_ENTRY(svvptc, PRIV_VERSION_1_13_0, ext_svvptc),
> >>   ISA_EXT_DATA_ENTRY(xtheadba, PRIV_VERSION_1_11_0, ext_xtheadba),
> >> @@ -1644,6 +1645,7 @@ const RISCVCPUMultiExtConfig riscv_cpu_extensions[] 
> >> = {
> >>   MULTI_EXT_CFG_BOOL("svinval", ext_svinval, false),
> >>   MULTI_EXT_CFG_BOOL("svnapot", ext_svnapot, false),
> >>   MULTI_EXT_CFG_BOOL("svpbmt", ext_svpbmt, false),
> >> +MULTI_EXT_CFG_BOOL("svrsw60b59b", ext_svrsw60b59b, false),
> >>   MULTI_EXT_CFG_BOOL("svvptc", ext_svvptc, true),
> >>
> >>   MULTI_EXT_CFG_BOOL("zicntr", ext_zicntr, true),
> >> diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
> >> index f97c48a394..71f9e603c5 100644
> >> --- a/target/riscv/cpu_bits.h
> >> +++ b/target/riscv/cpu_bits.h
> >> @@ -663,7 +663,8 @@ typedef enum {
> >>   #define PTE_SOFT0x300 /* Reserved for Software */
> >>   #define PTE_PBMT0x6000ULL /* Page-based memory 
> >> types */
> >>   #define PTE_N   0x8000ULL /* NAPOT translation */
> >> -#define PTE_RESERVED0x1FC0ULL /* Reserved bits */
> >> +#define PTE_RESERVED(svrsw60b59b)  \
> >> +   (svrsw60b59b ? 0x07C0ULL : 
> >> 0x1FC0ULL) /* Reserved bits */
> >>   #define PTE_ATTR(PTE_N | PTE_PBMT) /* All attributes bits */
> >>
> >>   /* Page table PPN shift amount */
> >> diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
> >> index b410b1e603..f6e4b0068a 100644
> >> --- a/target/riscv/cpu_cfg.h
> >> +++ b/target/riscv/cpu_cfg.h
> >> @@ -89,6 +89,7 @@ struct RISCVCPUConfig {
> >>   bool ext_svinval;
> >>   bool ext_svnapot;
> >>   bool ext_svpbmt;
> >> +bool ext_svrsw60b59b;
> >>   bool ext_svvptc;
> >>   bool ext_svukte;
> >>   bool ext_zdinx;
> >> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> >> index e1dfc4ecbf..6546cea403 100644
> >> --- a/target/riscv/cpu_helper.c
> >> +++ b/target/riscv/cpu_helper.c
> >> @@ -1156,6 +1156,7 @@ static int get_physical_address(CPURISCVState *env, 
> >> hwaddr *physical,
> >>   bool svade = riscv_cpu_cfg(env)->ext_svade;
> >>   bool svadu = riscv_cpu_cfg(env)->ext_svadu;
> >>   bool adue = svadu ? env->menvcfg & MENVCFG_ADUE : !svade;
> >> +bool svrsw60b59b = riscv_cpu_cfg(env)->ext_svrsw60b59b;
> >>
> >>   if (first_stage && two_stage && env->virt_enabled) {
> >>   pbmte = pbmte && (env->henvcfg & HENVCFG_PBMTE);
> >> @@ -1225,7 +1226,7 @@ restart:
> >>   if (riscv_cpu_sxl(env) == MXL_RV32) {
> >>   ppn = pte >> PTE_PPN_SHIFT;
> >>   } else {
> >> -if (pte & PTE_RESERVED) {
> >> +if (pte & PTE_RESERVED(svrsw60b59b)) {
> >>   return TRANSLATE_FAIL;
> >>   }
> >>
> >> --
> >> 2.39.2
> >>
>



Re: [PATCH v8 00/28] vfio-user client

2025-03-14 Thread John Levon
On Fri, Mar 14, 2025 at 03:25:53PM +0100, Cédric Le Goater wrote:

> > This is the 8th revision of the vfio-user client implementation. The 
> > vfio-user
> > protocol allows for implementing (PCI) devices in another userspace process;
> > SPDK is one example, which includes a virtual NVMe implementation.
> 
> This series is going in the right direction. Thanks for all the
> changes. However, I will postpone my review for the reasons below.
> 
> VFIO initial commit in 2012 was a single file adding support for a
> "vfio-pci" device. Since, the subsystem has changed a lot and it grew
> to +16K line of code. Over the years, we've had numerous additions,
> VFIO platform devices, s390x VFIO AP and CCW devices, migration
> support with dirty tracking, support for different host IOMMU backend
> devices, multifd, etc. All these additions never took the opportunity
> to introduce a VFIO interface for external usage and PPC, s390x,
> migration, backends subsystems happily and randomly peek and poke into
> it. The subsystem is now a happy jumble of disparate functions, with
> weak organization and names. The vfio-common.h header file is
> good example of the result.

Right, I certainly agree!

> It's time for a global code reshuffling. I'll take care of it (and
> I've already started) and this means nothing new (fixes are OK) will
> be merged before this is done. I hope I have made enough progress
> before the start of the QEMU 10.1 cycle. There won't be any functional
> changes, but there will be new files and new function names, so it's
> probably a real earthquake for your series.

It's had a couple of earthquakes already, very happy to bear the pain for a
better end result! I will try to look out for your series and review where I can
as well.

thanks
john



[PATCH] configure: disable split_debug on Windows and on non-git builds

2025-03-14 Thread Paolo Bonzini
-gsplit-dwarf is reported to produce broken binaries on Windows.
The linker produces warnings but exits successfully:

/usr/lib/gcc/x86_64-w64-mingw32/14.2.0/../../../../x86_64-w64-mingw32/bin/ld:
qga/qemu-ga.exe:/4: section below image base
/usr/lib/gcc/x86_64-w64-mingw32/14.2.0/../../../../x86_64-w64-mingw32/bin/ld:
qga/qemu-ga.exe:/24: section below image base

and as a result qemu-ga.exe fails to start.

On top of this, also disable -gsplit-dwarf unless building from git.
Similar to -Werror, split debug info is probably not the best choice
for people that want to build for installing.

(Random thoughts: there is a tension here between adding an option
that is useful for QEMU developers, and messing things up for everyone
else by doing something decidedly non-standard.  For example, distros
are starting to create a fake git repository just so that they can
use "git am" to apply patches; while some of them, for example Fedora,
are wise, or paranoid, enough to pass --disable-XXX for everything and
then turn back on what they want, it cannot be expected that everyone
does this.  It may be safer to make --enable-split-debug default off
for everybody and add it somewhere in docs/.  For now I am keeping it
enabled but we could consider doing something different during the hard
freeze period).

Reported-by: Konstantin Kostiuk 
Signed-off-by: Paolo Bonzini 
---
 configure | 4 
 meson_options.txt | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index 02f1dd2311f..9aece67ed08 100755
--- a/configure
+++ b/configure
@@ -1864,6 +1864,10 @@ if test "$skip_meson" = no; then
   { test "$host_os" = linux || test "$host_os" = "windows"; }; then
   echo 'werror = true' >> $cross
   fi
+  if test -e "$source_path/.git" && test "$host_os" != "windows"; then
+  echo 'split_debug = true' >> $cross
+  fi
+
   echo "[project options]" >> $cross
   if test "$SMBD" != ''; then
 echo "smbd = $(meson_quote "$SMBD")" >> $cross
diff --git a/meson_options.txt b/meson_options.txt
index 3432123fee2..f3546b9abc1 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -362,7 +362,7 @@ option('debug_mutex', type: 'boolean', value: false,
description: 'mutex debugging support')
 option('debug_stack_usage', type: 'boolean', value: false,
description: 'measure coroutine stack usage')
-option('split_debug', type: 'boolean', value: true,
+option('split_debug', type: 'boolean', value: false,
description: 'split debug info from object files')
 option('qom_cast_debug', type: 'boolean', value: true,
description: 'cast debugging support')
-- 
2.48.1




Re: [PATCH] docs/cxl: Add serial number for persistent-memdev

2025-03-14 Thread Dan Williams
Jonathan Cameron wrote:
> On Wed, 5 Mar 2025 18:35:40 +0800
> Yuquan Wang  wrote:
> 
> > > 
> > > On Tue, 4 Mar 2025 14:22:48 +0800
> > > Yuquan Wang  wrote:
> > >   
> > > > > 
> > > > > On Thu, Feb 20, 2025 at 04:12:13PM +, Jonathan Cameron wrote:
> > > > > > On Mon, 17 Feb 2025 19:20:39 +0800
> > > > > > Yuquan Wang  wrote:
> > > > > > 
> > > > > > > Add serial number parameter in the cxl persistent examples.
> > > > > > > 
> > > > > > > Signed-off-by: Yuquan Wang 
> > > > > > Looks good.  I've queued it up on my gitlab staging tree, but
> > > > > > Michael if you want to pick this one directly that's fine as well.  
> > > > > >   
> > > > > 
> > > > > See no reason to, I was not even CC'd.
> > > > 
> > > > Hi, Michael
> > > > 
> > > > I'm sorry, this is my fault. I used "get_maintainer.pl" to check this
> > > > patch's maintainers but it shows "No maintainers found, printing recent
> > > > contributors". 
> > > >   
> > > I usually stage up multiple series together and send on to Michael.
> > > So it was be being lazy for a minor change rather than anything much
> > > that you did wrong.
> > > 
> > > If I get time I'll post a series with this a few other patches
> > > later today.  
> > > 
> > > Jonathan
> > >   
> > Thank you!
> > 
> > BTW, I found a corner case in CXL numa node creation.
> > 
> > Condition: 
> > 1) A UMA/NUMA system without SRAT, but with CEDT.CFMWS
> > 2)Enable CONFIG_ACPI_NUMA
> > 
> > Results:
> > 1) acpi_numa_init: the fake_pxm will be 0 and send to acpi_parse_cfmws()
> > 2)If dynamically create cxl ram region, the cxl memory would be assigned
> > to node0 rather than a new node
> > 
> > Confusions:
> > 1) Is a numa system a requirement for CXL memory usage?
> 
> Obviously discussion has gone on elsewhere, but I'd say in general it
> would be a bad idea to not have an SRAT because the moment we add CXL
> it is definitely a NUMA system and we want the Generic Port entry to
> allow us to get perf information.
> 
> So I wouldn't mind if we fail CXL init in this case, but maybe
> it is worth papering over things.

I think that is too severe. If a driver has a path to advertise
resources, even in a less than ideal way, it should make every effort to
do that. There are plenty of ways for the NUMA information to fail, that
does not mean the memory needs to be prevented from coming online. Let
the end user decide if lack of performance information is fatal.



Re: [PATCH v3 4/7] memory-attribute-manager: Introduce MemoryAttributeManager to manage RAMBLock with guest_memfd

2025-03-14 Thread Gupta, Pankaj

On 3/10/2025 9:18 AM, Chenyi Qiang wrote:

As the commit 852f0048f3 ("RAMBlock: make guest_memfd require
uncoordinated discard") highlighted, some subsystems like VFIO may
disable ram block discard. However, guest_memfd relies on the discard
operation to perform page conversion between private and shared memory.
This can lead to stale IOMMU mapping issue when assigning a hardware
device to a confidential VM via shared memory. To address this, it is
crucial to ensure systems like VFIO refresh its IOMMU mappings.

RamDiscardManager is an existing concept (used by virtio-mem) to adjust
VFIO mappings in relation to VM page assignment. Effectively page
conversion is similar to hot-removing a page in one mode and adding it
back in the other. Therefore, similar actions are required for page
conversion events. Introduce the RamDiscardManager to guest_memfd to
facilitate this process.

Since guest_memfd is not an object, it cannot directly implement the
RamDiscardManager interface. One potential attempt is to implement it in
HostMemoryBackend. This is not appropriate because guest_memfd is per
RAMBlock. Some RAMBlocks have a memory backend but others do not. In
particular, the ones like virtual BIOS calling
memory_region_init_ram_guest_memfd() do not.

To manage the RAMBlocks with guest_memfd, define a new object named
MemoryAttributeManager to implement the RamDiscardManager interface. The


Isn't this should be the other way around. 'MemoryAttributeManager' 
should be an interface and RamDiscardManager a type of it, an 
implementation?


MemoryAttributeManager have the data like 'shared_bitmap' etc that
information can also be used by the other implementation types?

Or maybe I am getting it entirely wrong.

Thanks,
Pankaj


object stores guest_memfd information such as shared_bitmap, and handles
page conversion notification. Using the name of MemoryAttributeManager is
aimed to make it more generic. The term "Memory" emcompasses not only RAM
but also private MMIO in TEE I/O, which might rely on this
object/interface to handle page conversion events in the future. The
term "Attribute" allows for the management of various attributes beyond
shared and private. For instance, it could support scenarios where
discard vs. populated and shared vs. private states co-exists, such as
supporting virtio-mem or something similar in the future.

In the current context, MemoryAttributeManager signifies discarded state
as private and populated state as shared. Memory state is tracked at the
host page size granularity, as the minimum memory conversion size can be one
page per request. Additionally, VFIO expects the DMA mapping for a
specific iova to be mapped and unmapped with the same granularity.
Confidential VMs may perform  partial conversions, e.g. conversion
happens on a small region within a large region. To prevent such invalid
cases and until cut_mapping operation support is introduced, all
operations are performed with 4K granularity.

Signed-off-by: Chenyi Qiang 
---
Changes in v3:
 - Some rename (bitmap_size->shared_bitmap_size,
   first_one/zero_bit->first_bit, etc.)
 - Change shared_bitmap_size from uint32_t to unsigned
 - Return mgr->mr->ram_block->page_size in get_block_size()
 - Move set_ram_discard_manager() up to avoid a g_free() in failure
   case.
 - Add const for the memory_attribute_manager_get_block_size()
 - Unify the ReplayRamPopulate and ReplayRamDiscard and related
   callback.

Changes in v2:
 - Rename the object name to MemoryAttributeManager
 - Rename the bitmap to shared_bitmap to make it more clear.
 - Remove block_size field and get it from a helper. In future, we
   can get the page_size from RAMBlock if necessary.
 - Remove the unncessary "struct" before GuestMemfdReplayData
 - Remove the unncessary g_free() for the bitmap
 - Add some error report when the callback failure for
   populated/discarded section.
 - Move the realize()/unrealize() definition to this patch.
---
  include/system/memory-attribute-manager.h |  42 
  system/memory-attribute-manager.c | 283 ++
  system/meson.build|   1 +
  3 files changed, 326 insertions(+)
  create mode 100644 include/system/memory-attribute-manager.h
  create mode 100644 system/memory-attribute-manager.c

diff --git a/include/system/memory-attribute-manager.h 
b/include/system/memory-attribute-manager.h
new file mode 100644
index 00..23375a14b8
--- /dev/null
+++ b/include/system/memory-attribute-manager.h
@@ -0,0 +1,42 @@
+/*
+ * QEMU memory attribute manager
+ *
+ * Copyright Intel
+ *
+ * Author:
+ *  Chenyi Qiang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory
+ *
+ */
+
+#ifndef SYSTEM_MEMORY_ATTRIBUTE_MANAGER_H
+#define SYSTEM_MEMORY_ATTRIBUTE_MANAGER_H
+
+#include "system/hostmem.h"
+
+#define TYPE_MEMORY_ATTRIBUTE_MANAGER "m

[PULL 02/17] target/arm: Un-inline access_secure_reg()

2025-03-14 Thread Peter Maydell
We would like to move arm_el_is_aa64() to internals.h; however, it is
used by access_secure_reg().  Make that function not be inline, so
that it can stay in cpu.h.

access_secure_reg() is used only in two places:
 * in hflags.c
 * in the user-mode arm emulators, to decide whether to store
   the TLS value in the secure or non-secure banked field

The second of these is not on a super-hot path that would care about
the inlining (and incidentally will always use the NS banked field
because our user-mode CPUs never set ARM_FEATURE_EL3); put the
definition of access_secure_reg() in hflags.c, near its only use
inside target/arm.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/cpu.h| 12 +++-
 target/arm/tcg/hflags.c |  9 +
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 15d3a79b0af..12d2706f2b5 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -2668,21 +2668,15 @@ static inline bool arm_el_is_aa64(CPUARMState *env, int 
el)
 return aa64;
 }
 
-/* Function for determining whether guest cp register reads and writes should
+/*
+ * Function for determining whether guest cp register reads and writes should
  * access the secure or non-secure bank of a cp register.  When EL3 is
  * operating in AArch32 state, the NS-bit determines whether the secure
  * instance of a cp register should be used. When EL3 is AArch64 (or if
  * it doesn't exist at all) then there is no register banking, and all
  * accesses are to the non-secure version.
  */
-static inline bool access_secure_reg(CPUARMState *env)
-{
-bool ret = (arm_feature(env, ARM_FEATURE_EL3) &&
-!arm_el_is_aa64(env, 3) &&
-!(env->cp15.scr_el3 & SCR_NS));
-
-return ret;
-}
+bool access_secure_reg(CPUARMState *env);
 
 uint32_t arm_phys_excp_target_el(CPUState *cs, uint32_t excp_idx,
  uint32_t cur_el, bool secure);
diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
index 9e6a1869f94..8d79b8b7ae1 100644
--- a/target/arm/tcg/hflags.c
+++ b/target/arm/tcg/hflags.c
@@ -63,6 +63,15 @@ static bool aprofile_require_alignment(CPUARMState *env, int 
el, uint64_t sctlr)
 #endif
 }
 
+bool access_secure_reg(CPUARMState *env)
+{
+bool ret = (arm_feature(env, ARM_FEATURE_EL3) &&
+!arm_el_is_aa64(env, 3) &&
+!(env->cp15.scr_el3 & SCR_NS));
+
+return ret;
+}
+
 static CPUARMTBFlags rebuild_hflags_common(CPUARMState *env, int fp_el,
ARMMMUIdx mmu_idx,
CPUARMTBFlags flags)
-- 
2.43.0




Re: [PATCH v6] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support

2025-03-14 Thread Jörg Rödel
On Fri, Mar 14, 2025 at 03:08:43PM +0100, Gerd Hoffman wrote:
> If your input firmware image already is an IGVM (say coconut), what is
> supposed to happen?

The COCONUT igvmbuilder has the ability to take another IGVM file as
input and add its directive and contents to the output file. This is
needed for the Hyper-V firmware, which is also provided as an IGVM file.
I think it can also make changes when processing an input IGVM, like
changing the VMPL of a VMSA.

This can be extended to cover more use-cases, e.g. like direct-boot of a
linux kernel with initrd and command-line.

Regards,

Joerg



Re: [PATCH 2/2] memory: suppress INVALID_MEM logs caused by debug access

2025-03-14 Thread Philippe Mathieu-Daudé

On 14/3/25 08:41, Nicholas Piggin wrote:

Debugger-driven invalid memory accesses are not guest errors, so should
not cause these error logs.

Debuggers can access memory wildly, including access to addresses not
specified by the user (e.g., gdb it might try to walk the stack or load
target addresses to display disassembly). Failure is reported
synchronously by the GDB protcol so the user can be notified via the
debugger client.

Signed-off-by: Nicholas Piggin 
---
  system/memory.c | 37 ++---
  1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/system/memory.c b/system/memory.c
index 4c829793a0a..960f66e8d7e 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -1412,18 +1412,23 @@ bool memory_region_access_valid(MemoryRegion *mr,
  {


Should we instead consider debug accesses as always valid? i.e.:

if (attrs.debug) {
return true;
}


  if (mr->ops->valid.accepts
  && !mr->ops->valid.accepts(mr->opaque, addr, size, is_write, attrs)) {
-qemu_log_mask(LOG_INVALID_MEM, "Invalid %s at addr 0x%" HWADDR_PRIX
-  ", size %u, region '%s', reason: rejected\n",
-  is_write ? "write" : "read",
-  addr, size, memory_region_name(mr));
+if (attrs.debug) {
+/* Don't log memory errors due to debugger accesses */
+qemu_log_mask(LOG_INVALID_MEM, "Invalid %s at addr 0x%" HWADDR_PRIX
+  ", size %u, region '%s', reason: rejected\n",
+  is_write ? "write" : "read",
+  addr, size, memory_region_name(mr));
+}
  return false;
  }
  
  if (!mr->ops->valid.unaligned && (addr & (size - 1))) {

-qemu_log_mask(LOG_INVALID_MEM, "Invalid %s at addr 0x%" HWADDR_PRIX
-  ", size %u, region '%s', reason: unaligned\n",
-  is_write ? "write" : "read",
-  addr, size, memory_region_name(mr));
+if (attrs.debug) {
+qemu_log_mask(LOG_INVALID_MEM, "Invalid %s at addr 0x%" HWADDR_PRIX
+  ", size %u, region '%s', reason: unaligned\n",
+  is_write ? "write" : "read",
+  addr, size, memory_region_name(mr));
+}
  return false;
  }
  
@@ -1434,13 +1439,15 @@ bool memory_region_access_valid(MemoryRegion *mr,
  
  if (size > mr->ops->valid.max_access_size

  || size < mr->ops->valid.min_access_size) {
-qemu_log_mask(LOG_INVALID_MEM, "Invalid %s at addr 0x%" HWADDR_PRIX
-  ", size %u, region '%s', reason: invalid size "
-  "(min:%u max:%u)\n",
-  is_write ? "write" : "read",
-  addr, size, memory_region_name(mr),
-  mr->ops->valid.min_access_size,
-  mr->ops->valid.max_access_size);
+if (attrs.debug) {
+qemu_log_mask(LOG_INVALID_MEM, "Invalid %s at addr 0x%" HWADDR_PRIX
+  ", size %u, region '%s', reason: invalid size "
+  "(min:%u max:%u)\n",
+  is_write ? "write" : "read",
+  addr, size, memory_region_name(mr),
+  mr->ops->valid.min_access_size,
+  mr->ops->valid.max_access_size);
+}
  return false;
  }
  return true;





Re: [PATCH] configure: disable split_debug on Windows and on non-git builds

2025-03-14 Thread Peter Maydell
On Fri, 14 Mar 2025 at 15:39, Paolo Bonzini  wrote:
>
> -gsplit-dwarf is reported to produce broken binaries on Windows.
> The linker produces warnings but exits successfully:
>
> /usr/lib/gcc/x86_64-w64-mingw32/14.2.0/../../../../x86_64-w64-mingw32/bin/ld:
> qga/qemu-ga.exe:/4: section below image base
> /usr/lib/gcc/x86_64-w64-mingw32/14.2.0/../../../../x86_64-w64-mingw32/bin/ld:
> qga/qemu-ga.exe:/24: section below image base
>
> and as a result qemu-ga.exe fails to start.
>
> On top of this, also disable -gsplit-dwarf unless building from git.
> Similar to -Werror, split debug info is probably not the best choice
> for people that want to build for installing.
>
> (Random thoughts: there is a tension here between adding an option
> that is useful for QEMU developers, and messing things up for everyone
> else by doing something decidedly non-standard.  For example, distros
> are starting to create a fake git repository just so that they can
> use "git am" to apply patches; while some of them, for example Fedora,
> are wise, or paranoid, enough to pass --disable-XXX for everything and
> then turn back on what they want, it cannot be expected that everyone
> does this.  It may be safer to make --enable-split-debug default off
> for everybody and add it somewhere in docs/.  For now I am keeping it
> enabled but we could consider doing something different during the hard
> freeze period).
>
> Reported-by: Konstantin Kostiuk 
> Signed-off-by: Paolo Bonzini 
> ---
>  configure | 4 
>  meson_options.txt | 2 +-
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/configure b/configure
> index 02f1dd2311f..9aece67ed08 100755
> --- a/configure
> +++ b/configure
> @@ -1864,6 +1864,10 @@ if test "$skip_meson" = no; then
>{ test "$host_os" = linux || test "$host_os" = "windows"; }; then
>echo 'werror = true' >> $cross
>fi
> +  if test -e "$source_path/.git" && test "$host_os" != "windows"; then
> +  echo 'split_debug = true' >> $cross
> +  fi

Same remark as on the other patch: can we have a comment
explaining why we disable this on Windows, please, ideally
with a URL of a bug report against the toolchain ?

thanks
-- PMM



Re: [PATCH] configure: disable split_debug on Windows and on non-git builds

2025-03-14 Thread Daniel P . Berrangé
On Fri, Mar 14, 2025 at 04:44:01PM +, Peter Maydell wrote:
> On Fri, 14 Mar 2025 at 15:39, Paolo Bonzini  wrote:
> >
> > -gsplit-dwarf is reported to produce broken binaries on Windows.
> > The linker produces warnings but exits successfully:
> >
> > /usr/lib/gcc/x86_64-w64-mingw32/14.2.0/../../../../x86_64-w64-mingw32/bin/ld:
> > qga/qemu-ga.exe:/4: section below image base
> > /usr/lib/gcc/x86_64-w64-mingw32/14.2.0/../../../../x86_64-w64-mingw32/bin/ld:
> > qga/qemu-ga.exe:/24: section below image base
> >
> > and as a result qemu-ga.exe fails to start.
> >
> > On top of this, also disable -gsplit-dwarf unless building from git.
> > Similar to -Werror, split debug info is probably not the best choice
> > for people that want to build for installing.
> >
> > (Random thoughts: there is a tension here between adding an option
> > that is useful for QEMU developers, and messing things up for everyone
> > else by doing something decidedly non-standard.  For example, distros
> > are starting to create a fake git repository just so that they can
> > use "git am" to apply patches; while some of them, for example Fedora,
> > are wise, or paranoid, enough to pass --disable-XXX for everything and
> > then turn back on what they want, it cannot be expected that everyone
> > does this.  It may be safer to make --enable-split-debug default off
> > for everybody and add it somewhere in docs/.  For now I am keeping it
> > enabled but we could consider doing something different during the hard
> > freeze period).
> >
> > Reported-by: Konstantin Kostiuk 
> > Signed-off-by: Paolo Bonzini 
> > ---
> >  configure | 4 
> >  meson_options.txt | 2 +-
> >  2 files changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/configure b/configure
> > index 02f1dd2311f..9aece67ed08 100755
> > --- a/configure
> > +++ b/configure
> > @@ -1864,6 +1864,10 @@ if test "$skip_meson" = no; then
> >{ test "$host_os" = linux || test "$host_os" = "windows"; }; then
> >echo 'werror = true' >> $cross
> >fi
> > +  if test -e "$source_path/.git" && test "$host_os" != "windows"; then
> > +  echo 'split_debug = true' >> $cross
> > +  fi
> 
> Same remark as on the other patch: can we have a comment
> explaining why we disable this on Windows, please, ideally
> with a URL of a bug report against the toolchain ?

Two likely candidates open a long time

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59474
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99973


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] configure: disable split_debug on Windows and on non-git builds

2025-03-14 Thread Paolo Bonzini

On 3/14/25 17:54, Daniel P. Berrangé wrote:

Same remark as on the other patch: can we have a comment
explaining why we disable this on Windows, please, ideally
with a URL of a bug report against the toolchain ?


Two likely candidates open a long time

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59474
   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99973


OMG...  Given this I think it should rather be reverted, and people can 
use --extra-cflags=-gsplit-dwarf.


(And in fact, since it's just an objcopy trick, it is even possible to 
put a gcc wrapper in ~/bin that will do it for you).


Paolo




Re: [PATCH 32/37] include/hw/intc: Remove ifndef CONFIG_USER_ONLY from armv7m_nvic.h

2025-03-14 Thread Richard Henderson

On 3/14/25 13:34, Pierrick Bouvier wrote:

On 3/14/25 13:03, Richard Henderson wrote:

I'm not quite sure what you're arguing for here.
A build-time error is vastly preferable to a run-time error.



Even though this specific patch is safe (code calling those functions should be under 
system anyway, so it should not be linked in a user binary), I just wonder if we should 
not add explicit checks for this, for other kind of replacement we'll have to do.


Any such runtime check should not be for "system" vs "user", but for "feature 
enabled".


If it's a lesser used configuration or feature, a run-time error could lay 
dormant for
years before a user encounters it.



Sure, but wouldn't it better to have an explicit assert, instead of observing a random 
behaviour?


What random behaviour are you suggesting?

I'm just worried we end up calling something we should not (user vs system, or any other 
ifdef CONFIG/TARGET that might be hidden somewhere), and silently return a wrong value, 
which would not be covered by our test suite.


Where is the wrong value going to be returned from, the stub?
Yes, many stubs do abort, typically after the "enabled" stub returns false.

It's still best if "feature enabled" is compile-time false when possible, such that 
everything after the feature check gets compiled out.  At which point we don't need the 
stubs: they won't be reachable and errors are caught at build-time.



r~



Re: [PATCH v2 00/14] Factor out HVF's instruction emulator

2025-03-14 Thread Wei Liu
On Fri, Mar 07, 2025 at 11:55:11AM -0800, Wei Liu wrote:
> Hi,
> 
> Microsoft's Linux Systems Group developed a Linux driver for the Microsoft
> Hypervisor (MSHV for short). The driver is being upstreamed. The first
> supported VMM is Cloud Hypervisor. We want to add QEMU as the second supported
> VMM.
> 
> The plan is to write an mshv accelerator in QEMU. The accelerator is still in
> the works.
> 
> MSHV doesn't emulate instructions. VMMs are supposed to bring their own
> instruction emulator. The path we've chosen is to reuse what's already in 
> QEMU.
> The instruction emulator in HVF looks good for what we need.
> 
> This patch series makes the instruction emulator in HVF a common
> component for the i386 target. It removes HVF specific code by using a
> set of hooks. The new incoming MSHV accelerator will implement the
> hooks, and where necessary, enhance the emulator and / or add new hooks.
> 
> The patches have been lightly tested by running a Linux VM on an Intel-based
> Mac. 
> 
> Thanks,
> Wei.
> 
> Changes in v2:
> 1. Address comments from Paolo on variable and directory names.
> 2. Rebase and drop the already applied patches.
> 3. Add a new entry in MAINTAINERS.
> 
> Wei Liu (14):
>   target/i386/hvf: introduce x86_emul_ops
>   target/i386/hvf: remove HVF specific calls from x86_decode.c
>   target/i386/hvf: provide and use handle_io in emul_ops
>   target/i386: rename hvf_mmio_buf to emu_mmio_buf
>   target/i386/hvf: use emul_ops->read_mem in x86_emu.c
>   taret/i386/hvf: provide and use write_mem in emul_ops
>   target/i386/hvf: provide and use simulate_{wrmsr,rdmsr} in emul_ops
>   target/i386: rename lazy flags field and its type
>   target/i386/hvf: drop unused headers
>   target/i386/hvf: rename some include guards
>   target/i386: add a directory for x86 instruction emulator
>   target/i386/emulate: add a panic.h
>   target/i386: move x86 instruction emulator out of hvf
>   MAINTAINERS: add an entry for the x86 instruction emulator

HVF maintainers, Ping?

Thanks,
Wei.



[PULL 04/17] linux-user/arm: Remove unused get_put_user macros

2025-03-14 Thread Peter Maydell
In linux-user/arm/cpu_loop.c we define a full set of get/put
macros for both code and data (since the endianness handling
is different between the two). However the only one we actually
use is get_user_code_u32(). Remove the rest.

We leave a comment noting how data-side accesses should be handled
for big-endian, because that's a subtle point and we just removed the
macros that were effectively documenting it.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 linux-user/arm/cpu_loop.c | 43 ---
 1 file changed, 4 insertions(+), 39 deletions(-)

diff --git a/linux-user/arm/cpu_loop.c b/linux-user/arm/cpu_loop.c
index 10d8561f9b9..7416e3216ea 100644
--- a/linux-user/arm/cpu_loop.c
+++ b/linux-user/arm/cpu_loop.c
@@ -36,45 +36,10 @@
 __r;\
 })
 
-#define get_user_code_u16(x, gaddr, env)\
-({ abi_long __r = get_user_u16((x), (gaddr));   \
-if (!__r && bswap_code(arm_sctlr_b(env))) { \
-(x) = bswap16(x);   \
-}   \
-__r;\
-})
-
-#define get_user_data_u32(x, gaddr, env)\
-({ abi_long __r = get_user_u32((x), (gaddr));   \
-if (!__r && arm_cpu_bswap_data(env)) {  \
-(x) = bswap32(x);   \
-}   \
-__r;\
-})
-
-#define get_user_data_u16(x, gaddr, env)\
-({ abi_long __r = get_user_u16((x), (gaddr));   \
-if (!__r && arm_cpu_bswap_data(env)) {  \
-(x) = bswap16(x);   \
-}   \
-__r;\
-})
-
-#define put_user_data_u32(x, gaddr, env)\
-({ typeof(x) __x = (x); \
-if (arm_cpu_bswap_data(env)) {  \
-__x = bswap32(__x); \
-}   \
-put_user_u32(__x, (gaddr)); \
-})
-
-#define put_user_data_u16(x, gaddr, env)\
-({ typeof(x) __x = (x); \
-if (arm_cpu_bswap_data(env)) {  \
-__x = bswap16(__x); \
-}   \
-put_user_u16(__x, (gaddr)); \
-})
+/*
+ * Note that if we need to do data accesses here, they should do a
+ * bswap if arm_cpu_bswap_data() returns true.
+ */
 
 /*
  * Similar to code in accel/tcg/user-exec.c, but outside the execution loop.
-- 
2.43.0




Re: [PATCH] configure: disable split_debug on Windows and on non-git builds

2025-03-14 Thread Alex Bennée
Pierrick Bouvier  writes:

> On 3/14/25 08:38, Paolo Bonzini wrote:
>> -gsplit-dwarf is reported to produce broken binaries on Windows.
>> The linker produces warnings but exits successfully:
>> /usr/lib/gcc/x86_64-w64-mingw32/14.2.0/../../../../x86_64-w64-mingw32/bin/ld:
>> qga/qemu-ga.exe:/4: section below image base
>> /usr/lib/gcc/x86_64-w64-mingw32/14.2.0/../../../../x86_64-w64-mingw32/bin/ld:
>> qga/qemu-ga.exe:/24: section below image base
>> and as a result qemu-ga.exe fails to start.
>> On top of this, also disable -gsplit-dwarf unless building from git.
>> Similar to -Werror, split debug info is probably not the best choice
>> for people that want to build for installing.
>> (Random thoughts: there is a tension here between adding an option
>> that is useful for QEMU developers, and messing things up for everyone
>> else by doing something decidedly non-standard.  For example, distros
>> are starting to create a fake git repository just so that they can
>> use "git am" to apply patches; while some of them, for example Fedora,
>> are wise, or paranoid, enough to pass --disable-XXX for everything and
>> then turn back on what they want, it cannot be expected that everyone
>> does this.  It may be safer to make --enable-split-debug default off
>> for everybody and add it somewhere in docs/.  For now I am keeping it
>> enabled but we could consider doing something different during the hard
>> freeze period).
>> Reported-by: Konstantin Kostiuk 
>> Signed-off-by: Paolo Bonzini 
>> ---
>>   configure | 4 
>>   meson_options.txt | 2 +-
>>   2 files changed, 5 insertions(+), 1 deletion(-)
>> diff --git a/configure b/configure
>> index 02f1dd2311f..9aece67ed08 100755
>> --- a/configure
>> +++ b/configure
>> @@ -1864,6 +1864,10 @@ if test "$skip_meson" = no; then
>> { test "$host_os" = linux || test "$host_os" = "windows"; }; then
>> echo 'werror = true' >> $cross
>> fi
>> +  if test -e "$source_path/.git" && test "$host_os" != "windows"; then
>> +  echo 'split_debug = true' >> $cross
>> +  fi
>> +
>> echo "[project options]" >> $cross
>> if test "$SMBD" != ''; then
>>   echo "smbd = $(meson_quote "$SMBD")" >> $cross
>> diff --git a/meson_options.txt b/meson_options.txt
>> index 3432123fee2..f3546b9abc1 100644
>> --- a/meson_options.txt
>> +++ b/meson_options.txt
>> @@ -362,7 +362,7 @@ option('debug_mutex', type: 'boolean', value: false,
>>  description: 'mutex debugging support')
>>   option('debug_stack_usage', type: 'boolean', value: false,
>>  description: 'measure coroutine stack usage')
>> -option('split_debug', type: 'boolean', value: true,
>> +option('split_debug', type: 'boolean', value: false,
>>  description: 'split debug info from object files')
>>   option('qom_cast_debug', type: 'boolean', value: true,
>>  description: 'cast debugging support')
>
> Unfortunate coincidence, this appears at the same time MSYS2 fixed
> some issue triggering a segfault [1]. So I didn't investigate further
> the current issue, thinking something else have been changed I don't
> know where.
>
> Would be better to revert it completely indeed, creating another build
> configuration is not worth the (cheap) disk storage saved.

Well we should disable debug info on the CI builds then. Nothing is free
and our CI is pretty damned heavy and the builds all add up when debug
info is on by default.

>
> [1] https://github.com/msys2/MINGW-packages/issues/23577

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



[PATCH] ppc/amigaone: Check blk_pwrite return value

2025-03-14 Thread BALATON Zoltan
Coverity reported that return value of blk_pwrite() maybe should not
be ignored. We can't do much if this happens other than report an
error but let's do that to silence this report.

Resolves: Coverity CID 1593725
Signed-off-by: BALATON Zoltan 
---
 hw/ppc/amigaone.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/amigaone.c b/hw/ppc/amigaone.c
index 483512125f..5d787c3059 100644
--- a/hw/ppc/amigaone.c
+++ b/hw/ppc/amigaone.c
@@ -108,8 +108,8 @@ static void nvram_write(void *opaque, hwaddr addr, uint64_t 
val,
 uint8_t *p = memory_region_get_ram_ptr(&s->mr);
 
 p[addr] = val;
-if (s->blk) {
-blk_pwrite(s->blk, addr, 1, &val, 0);
+if (s->blk && blk_pwrite(s->blk, addr, 1, &val, 0) < 0) {
+error_report("%s: could not write %s", __func__, blk_name(s->blk));
 }
 }
 
@@ -151,15 +151,17 @@ static void nvram_realize(DeviceState *dev, Error **errp)
 *c = cpu_to_be32(CRC32_DEFAULT_ENV);
 /* Also copies terminating \0 as env is terminated by \0\0 */
 memcpy(p + 4, default_env, sizeof(default_env));
-if (s->blk) {
-blk_pwrite(s->blk, 0, sizeof(crc) + sizeof(default_env), p, 0);
+if (s->blk &&
+blk_pwrite(s->blk, 0, sizeof(crc) + sizeof(default_env), p, 0) < 0
+   ) {
+error_report("%s: could not write %s", __func__, blk_name(s->blk));
 }
 return;
 }
 if (*c == 0) {
 *c = cpu_to_be32(crc32(0, p + 4, NVRAM_SIZE - 4));
-if (s->blk) {
-blk_pwrite(s->blk, 0, 4, p, 0);
+if (s->blk && blk_pwrite(s->blk, 0, 4, p, 0) < 0) {
+error_report("%s: could not write %s", __func__, blk_name(s->blk));
 }
 }
 if (be32_to_cpu(*c) != crc) {
-- 
2.41.3




[PATCH] ppc/amigaone: Constify default_env

2025-03-14 Thread BALATON Zoltan
The variable holding default env is not supposed to be written.

Signed-off-by: BALATON Zoltan 
---
 hw/ppc/amigaone.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/amigaone.c b/hw/ppc/amigaone.c
index 5d787c3059..e9407a51b5 100644
--- a/hw/ppc/amigaone.c
+++ b/hw/ppc/amigaone.c
@@ -63,7 +63,7 @@ static const char dummy_fw[] = {
 #define NVRAM_ADDR 0xfd0e
 #define NVRAM_SIZE (4 * KiB)
 
-static char default_env[] =
+static const char default_env[] =
 "baudrate=115200\0"
 "stdout=vga\0"
 "stdin=ps2kbd\0"
-- 
2.41.3




Re: [PATCH] configure: disable split_debug on Windows and on non-git builds

2025-03-14 Thread Pierrick Bouvier

On 3/14/25 08:38, Paolo Bonzini wrote:

-gsplit-dwarf is reported to produce broken binaries on Windows.
The linker produces warnings but exits successfully:

/usr/lib/gcc/x86_64-w64-mingw32/14.2.0/../../../../x86_64-w64-mingw32/bin/ld:
qga/qemu-ga.exe:/4: section below image base
/usr/lib/gcc/x86_64-w64-mingw32/14.2.0/../../../../x86_64-w64-mingw32/bin/ld:
qga/qemu-ga.exe:/24: section below image base

and as a result qemu-ga.exe fails to start.

On top of this, also disable -gsplit-dwarf unless building from git.
Similar to -Werror, split debug info is probably not the best choice
for people that want to build for installing.

(Random thoughts: there is a tension here between adding an option
that is useful for QEMU developers, and messing things up for everyone
else by doing something decidedly non-standard.  For example, distros
are starting to create a fake git repository just so that they can
use "git am" to apply patches; while some of them, for example Fedora,
are wise, or paranoid, enough to pass --disable-XXX for everything and
then turn back on what they want, it cannot be expected that everyone
does this.  It may be safer to make --enable-split-debug default off
for everybody and add it somewhere in docs/.  For now I am keeping it
enabled but we could consider doing something different during the hard
freeze period).

Reported-by: Konstantin Kostiuk 
Signed-off-by: Paolo Bonzini 
---
  configure | 4 
  meson_options.txt | 2 +-
  2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index 02f1dd2311f..9aece67ed08 100755
--- a/configure
+++ b/configure
@@ -1864,6 +1864,10 @@ if test "$skip_meson" = no; then
{ test "$host_os" = linux || test "$host_os" = "windows"; }; then
echo 'werror = true' >> $cross
fi
+  if test -e "$source_path/.git" && test "$host_os" != "windows"; then
+  echo 'split_debug = true' >> $cross
+  fi
+
echo "[project options]" >> $cross
if test "$SMBD" != ''; then
  echo "smbd = $(meson_quote "$SMBD")" >> $cross
diff --git a/meson_options.txt b/meson_options.txt
index 3432123fee2..f3546b9abc1 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -362,7 +362,7 @@ option('debug_mutex', type: 'boolean', value: false,
 description: 'mutex debugging support')
  option('debug_stack_usage', type: 'boolean', value: false,
 description: 'measure coroutine stack usage')
-option('split_debug', type: 'boolean', value: true,
+option('split_debug', type: 'boolean', value: false,
 description: 'split debug info from object files')
  option('qom_cast_debug', type: 'boolean', value: true,
 description: 'cast debugging support')


Unfortunate coincidence, this appears at the same time MSYS2 fixed some 
issue triggering a segfault [1]. So I didn't investigate further the 
current issue, thinking something else have been changed I don't know where.


Would be better to revert it completely indeed, creating another build 
configuration is not worth the (cheap) disk storage saved.


[1] https://github.com/msys2/MINGW-packages/issues/23577



Re: [BUG][RFC] CPR transfer Issues: Socket permissions and PID files

2025-03-14 Thread Steven Sistare

Thank you Ben.  I appreciate you testing CPR and shaking out the bugs.
I will study these and propose patches.

My initial reaction to the pidfile issue is that the orchestration layer must
pass a different filename when starting the destination qemu instance.  When
using live update without containers, these types of resource conflicts in the
global namespaces are a known issue.

- Steve

On 3/14/2025 2:33 PM, Chaney, Ben wrote:

Hello,

While testing CPR transfer I encountered two issues. The first is that the 
transfer fails when running with pidfiles due to the destination qemu process 
attempting to create the pidfile while it is still locked by the source 
process. The second is that the transfer fails when running with the -run-with 
user=$USERID parameter. This is because the destination qemu process creates 
the UNIX sockets used for the CPR transfer before dropping to the lower 
permissioned user, which causes them to be owned by the original user. The 
source qemu process then does not have permission to connect to it because it 
is already running as the lesser permissioned user.

Reproducing the first issue:

Create a source and destination qemu instance associated with the same VM where 
both processes have the -pidfile parameter passed on the command line. You 
should see the following error on the command line of the second process:

qemu-system-x86_64: cannot create PID file: Cannot lock pid file: Resource 
temporarily unavailable

Reproducing the second issue:

Create a source and destination qemu instance associated with the same VM where 
both processes have -run-with user=$USERID passed on the command line, where 
$USERID is a different user from the one launching the processes. Then attempt 
a CPR transfer using UNIX sockets for the main and cpr sockets. You should 
receive the following error via QMP:
{"error": {"class": "GenericError", "desc": "Failed to connect to 'cpr.sock': 
Permission denied"}}

I provided a minimal patch that works around the second issue.

Thank you,
Ben Chaney

---
include/system/os-posix.h | 4 
os-posix.c | 8 
util/qemu-sockets.c | 21 +
3 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/include/system/os-posix.h b/include/system/os-posix.h
index ce5b3bccf8..2a414a914a 100644
--- a/include/system/os-posix.h
+++ b/include/system/os-posix.h
@@ -55,6 +55,10 @@ void os_setup_limits(void);
void os_setup_post(void);
int os_mlock(bool on_fault);

+extern struct passwd *user_pwd;
+extern uid_t user_uid;
+extern gid_t user_gid;
+
/**
* qemu_alloc_stack:
* @sz: pointer to a size_t holding the requested usable stack size
diff --git a/os-posix.c b/os-posix.c
index 52925c23d3..9369b312a0 100644
--- a/os-posix.c
+++ b/os-posix.c
@@ -86,14 +86,6 @@ void os_set_proc_name(const char *s)
}


-/*
- * Must set all three of these at once.
- * Legal combinations are unset by name by uid
- */
-static struct passwd *user_pwd; /* NULL non-NULL NULL */
-static uid_t user_uid = (uid_t)-1; /* -1 -1 >=0 */
-static gid_t user_gid = (gid_t)-1; /* -1 -1 >=0 */
-
/*
* Prepare to change user ID. user_id can be one of 3 forms:
* - a username, in which case user ID will be changed to its uid,
diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 77477c1cd5..987977ead9 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -871,6 +871,14 @@ static bool saddr_is_tight(UnixSocketAddress *saddr)
#endif
}

+/*
+ * Must set all three of these at once.
+ * Legal combinations are unset by name by uid
+ */
+struct passwd *user_pwd; /* NULL non-NULL NULL */
+uid_t user_uid = (uid_t)-1; /* -1 -1 >=0 */
+gid_t user_gid = (gid_t)-1; /* -1 -1 >=0 */
+
static int unix_listen_saddr(UnixSocketAddress *saddr,
int num,
Error **errp)
@@ -947,6 +955,19 @@ static int unix_listen_saddr(UnixSocketAddress *saddr,
error_setg_errno(errp, errno, "Failed to bind socket to %s", path);
goto err;
}
+ if (user_pwd) {
+ if (chown(un.sun_path, user_pwd->pw_uid, user_pwd->pw_gid) < 0) {
+ error_setg_errno(errp, errno, "Failed to change permissions on socket %s", 
path);
+ goto err;
+ }
+ }
+ else if (user_uid != -1 && user_gid != -1) {
+ if (chown(un.sun_path, user_uid, user_gid) < 0) {
+ error_setg_errno(errp, errno, "Failed to change permissions on socket %s", 
path);
+ goto err;
+ }
+ }
+
if (listen(sock, num) < 0) {
error_setg_errno(errp, errno, "Failed to listen on socket");
goto err;
--
2.40.1


















Re: [PATCH v2 2/3] hw/loongarch/virt: Remove unnecessary NULL pointer checking

2025-03-14 Thread bibo mao




On 2025/3/14 下午5:11, Markus Armbruster wrote:

Bibo Mao  writes:


There is NULL pointer checking function error_propagate() already,
it is not necessary to add checking for function parameter. Here remove
NULL pointer checking with function parameter.

Signed-off-by: Bibo Mao 
---
  hw/loongarch/virt.c | 25 +++--
  1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index a5840ff968..d82676d316 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -868,21 +868,24 @@ static void virt_cpu_pre_plug(HotplugHandler *hotplug_dev,
  error_setg(&err,
 "Invalid thread-id %u specified, must be in range 
1:%u",
 cpu->thread_id, ms->smp.threads - 1);
-goto out;
+error_propagate(errp, err);
+return;


Make this




error_setg(&err, ...);
return;
My bad. I remember replacing error_propagate with error_setg(errp, ...) 
already. Will do



  }
  
  if ((cpu->core_id < 0) || (cpu->core_id >= ms->smp.cores)) {

  error_setg(&err,
 "Invalid core-id %u specified, must be in range 1:%u",
 cpu->core_id, ms->smp.cores - 1);
-goto out;
+error_propagate(errp, err);
+return;


Likewise.


  }
  
  if ((cpu->socket_id < 0) || (cpu->socket_id >= ms->smp.sockets)) {

  error_setg(&err,
 "Invalid socket-id %u specified, must be in range 
1:%u",
 cpu->socket_id, ms->smp.sockets - 1);
-goto out;
+error_propagate(errp, err);
+return;


Likewise.


  }
  
  topo.socket_id = cpu->socket_id;

@@ -895,7 +898,8 @@ static void virt_cpu_pre_plug(HotplugHandler *hotplug_dev,
 "cpu(id%d=%d:%d:%d) with arch-id %" PRIu64 " exists",
 cs->cpu_index, cpu->socket_id, cpu->core_id,
 cpu->thread_id, cpu_slot->arch_id);
-goto out;
+error_propagate(errp, err);
+return;


Likewise.


  }
  } else {
  /* For cold-add cpu, find empty cpu slot */
@@ -912,10 +916,6 @@ static void virt_cpu_pre_plug(HotplugHandler *hotplug_dev,
  cpu->phy_id = cpu_slot->arch_id;
  cs->cpu_index = cpu_slot - ms->possible_cpus->cpus;
  numa_cpu_pre_plug(cpu_slot, dev, &err);


You need to pass errp instead of &err now.


-out:
-if (err) {
-error_propagate(errp, err);
-}
  }
  
  static void virt_cpu_unplug_request(HotplugHandler *hotplug_dev,

@@ -935,9 +935,7 @@ static void virt_cpu_unplug_request(HotplugHandler 
*hotplug_dev,
  }
  
  hotplug_handler_unplug_request(HOTPLUG_HANDLER(lvms->acpi_ged), dev, &err);

-if (err) {
-error_propagate(errp, err);
-}
+error_propagate(errp, err);
  }


Correct, but I'd recomment to go one step further:

  static void virt_cpu_unplug_request(HotplugHandler *hotplug_dev,
  DeviceState *dev, Error **errp)
  {
  LoongArchVirtMachineState *lvms = LOONGARCH_VIRT_MACHINE(hotplug_dev);
-Error *err = NULL;
  LoongArchCPU *cpu = LOONGARCH_CPU(dev);
  CPUState *cs = CPU(dev);
  
  if (cs->cpu_index == 0) {

-error_setg(&err, "hot-unplug of boot cpu(id%d=%d:%d:%d) not supported",
+error_setg(errp, "hot-unplug of boot cpu(id%d=%d:%d:%d) not supported",
 cs->cpu_index, cpu->socket_id,
 cpu->core_id, cpu->thread_id);
-error_propagate(errp, err);
  return;
  }
  
-hotplug_handler_unplug_request(HOTPLUG_HANDLER(lvms->acpi_ged), dev, &err);

-if (err) {
-error_propagate(errp, err);
-}
+hotplug_handler_unplug_request(HOTPLUG_HANDLER(lvms->acpi_ged), dev, errp);
  }

  
  static void virt_cpu_unplug(HotplugHandler *hotplug_dev,

@@ -1001,9 +999,8 @@ static void virt_cpu_plug(HotplugHandler *hotplug_dev,

cpu_slot = virt_find_cpu_slot(MACHINE(lvms), cpu->phy_id);
cpu_slot->cpu = CPU(dev);
if (lvms->ipi) {
hotplug_handler_plug(HOTPLUG_HANDLER(lvms->ipi), dev, &err);
if (err) {
error_propagate(errp, err);
return;
}
}

if (lvms->extioi) {
hotplug_handler_plug(HOTPLUG_HANDLER(lvms->extioi), dev, &err);
if (err) {
error_propagate(errp, err);
return;
}
}
  
  if (lvms->acpi_ged) {

  hotplug_handler_plug(HOTPLUG_HANDLER(lvms->acpi_ged), dev, &err);
-if (err) {
-error_propagate(errp, err);
-}
+error_propagate(errp, err);
+return;
  }
  
  return;


Better make this work exactly like the other checks above, and drop the
final return, which serves no p

Re: [PATCH 1/2] hw/xen: Fix xen_bus_realize() error handling

2025-03-14 Thread Stefano Stabellini
On Fri, 14 Mar 2025, Markus Armbruster wrote:
> The Error ** argument must be NULL, &error_abort, &error_fatal, or a
> pointer to a variable containing NULL.  Passing an argument of the
> latter kind twice without clearing it in between is wrong: if the
> first call sets an error, it no longer points to NULL for the second
> call.
> 
> xen_bus_realize() is wrong that way: it passes &local_err to
> xs_node_watch() in a loop.  If this fails in more than one iteration,
> it can trip error_setv()'s assertion.
> 
> Fix by clearing @local_err.
> 
> Fixes: c4583c8c394e (xen-bus: reduce scope of backend watch)
> Signed-off-by: Markus Armbruster 

Reviewed-by: Stefano Stabellini 


> ---
>  hw/xen/xen-bus.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
> index 8260f1e1bb..2aacc1436f 100644
> --- a/hw/xen/xen-bus.c
> +++ b/hw/xen/xen-bus.c
> @@ -357,6 +357,7 @@ static void xen_bus_realize(BusState *bus, Error **errp)
>  error_reportf_err(local_err,
>"failed to set up '%s' enumeration watch: ",
>type[i]);
> +local_err = NULL;
>  }
>  
>  g_free(node);
> -- 
> 2.48.1
> 



Re: [PATCH 1/2] gdbstub: Add phys_memory_rw_debug for physical memory access

2025-03-14 Thread Richard Henderson

On 3/14/25 00:41, Nicholas Piggin wrote:

Add an accessor for gdb physical memory access mode which sets the
the .debug attribute for the MemTxAttribute, and also returns success
to the caller.

GDB with PhyMemMode will now report failure from memory accesses outside
valid system memory addresses, and it is also able to write to ROMs as
GDB virtual memory access can.

Signed-off-by: Nicholas Piggin 
---
  docs/devel/loads-stores.rst | 11 +++
  include/exec/cpu-common.h   |  3 +++
  gdbstub/system.c|  7 +--
  system/physmem.c| 16 
  4 files changed, 31 insertions(+), 6 deletions(-)



I think you might as well put this function in gdbstub/system.c
and not export (or document) it.


r~



Re: [PATCH] configure: disable split_debug on Windows and on non-git builds

2025-03-14 Thread Richard Henderson

On 3/14/25 14:14, Alex Bennée wrote:

Pierrick Bouvier  writes:


On 3/14/25 08:38, Paolo Bonzini wrote:

-gsplit-dwarf is reported to produce broken binaries on Windows.
The linker produces warnings but exits successfully:
/usr/lib/gcc/x86_64-w64-mingw32/14.2.0/../../../../x86_64-w64-mingw32/bin/ld:
qga/qemu-ga.exe:/4: section below image base
/usr/lib/gcc/x86_64-w64-mingw32/14.2.0/../../../../x86_64-w64-mingw32/bin/ld:
qga/qemu-ga.exe:/24: section below image base
and as a result qemu-ga.exe fails to start.
On top of this, also disable -gsplit-dwarf unless building from git.
Similar to -Werror, split debug info is probably not the best choice
for people that want to build for installing.
(Random thoughts: there is a tension here between adding an option
that is useful for QEMU developers, and messing things up for everyone
else by doing something decidedly non-standard.  For example, distros
are starting to create a fake git repository just so that they can
use "git am" to apply patches; while some of them, for example Fedora,
are wise, or paranoid, enough to pass --disable-XXX for everything and
then turn back on what they want, it cannot be expected that everyone
does this.  It may be safer to make --enable-split-debug default off
for everybody and add it somewhere in docs/.  For now I am keeping it
enabled but we could consider doing something different during the hard
freeze period).
Reported-by: Konstantin Kostiuk 
Signed-off-by: Paolo Bonzini 
---
   configure | 4 
   meson_options.txt | 2 +-
   2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/configure b/configure
index 02f1dd2311f..9aece67ed08 100755
--- a/configure
+++ b/configure
@@ -1864,6 +1864,10 @@ if test "$skip_meson" = no; then
 { test "$host_os" = linux || test "$host_os" = "windows"; }; then
 echo 'werror = true' >> $cross
 fi
+  if test -e "$source_path/.git" && test "$host_os" != "windows"; then
+  echo 'split_debug = true' >> $cross
+  fi
+
 echo "[project options]" >> $cross
 if test "$SMBD" != ''; then
   echo "smbd = $(meson_quote "$SMBD")" >> $cross
diff --git a/meson_options.txt b/meson_options.txt
index 3432123fee2..f3546b9abc1 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -362,7 +362,7 @@ option('debug_mutex', type: 'boolean', value: false,
  description: 'mutex debugging support')
   option('debug_stack_usage', type: 'boolean', value: false,
  description: 'measure coroutine stack usage')
-option('split_debug', type: 'boolean', value: true,
+option('split_debug', type: 'boolean', value: false,
  description: 'split debug info from object files')
   option('qom_cast_debug', type: 'boolean', value: true,
  description: 'cast debugging support')


Unfortunate coincidence, this appears at the same time MSYS2 fixed
some issue triggering a segfault [1]. So I didn't investigate further
the current issue, thinking something else have been changed I don't
know where.

Would be better to revert it completely indeed, creating another build
configuration is not worth the (cheap) disk storage saved.


Well we should disable debug info on the CI builds then. Nothing is free
and our CI is pretty damned heavy and the builds all add up when debug
info is on by default.


I think it's best reverted, since it isn't implemented properly in gcc for 
cross-compilers.  Add it to --extra-cflags for particular builds if you must.



r~



[PATCH v5 09/17] exec/ram_addr: remove dependency on cpu.h

2025-03-14 Thread Pierrick Bouvier
Needed so compilation units including it can be common.

Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 
---
 include/exec/ram_addr.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index e4c28fbec9b..f5d574261a3 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -20,13 +20,14 @@
 #define RAM_ADDR_H
 
 #ifndef CONFIG_USER_ONLY
-#include "cpu.h"
 #include "system/xen.h"
 #include "system/tcg.h"
 #include "exec/cputlb.h"
 #include "exec/ramlist.h"
 #include "exec/ramblock.h"
 #include "exec/exec-all.h"
+#include "exec/memory.h"
+#include "exec/target_page.h"
 #include "qemu/rcu.h"
 
 #include "exec/hwaddr.h"
-- 
2.39.5




[PATCH v5 11/17] exec/ram_addr: call xen_hvm_modified_memory only if xen is enabled

2025-03-14 Thread Pierrick Bouvier
Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 
---
 include/exec/ram_addr.h | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index f5d574261a3..92e8708af76 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -339,7 +339,9 @@ static inline void 
cpu_physical_memory_set_dirty_range(ram_addr_t start,
 }
 }
 
-xen_hvm_modified_memory(start, length);
+if (xen_enabled()) {
+xen_hvm_modified_memory(start, length);
+}
 }
 
 #if !defined(_WIN32)
@@ -415,7 +417,9 @@ uint64_t cpu_physical_memory_set_dirty_lebitmap(unsigned 
long *bitmap,
 }
 }
 
-xen_hvm_modified_memory(start, pages << TARGET_PAGE_BITS);
+if (xen_enabled()) {
+xen_hvm_modified_memory(start, pages << TARGET_PAGE_BITS);
+}
 } else {
 uint8_t clients = tcg_enabled() ? DIRTY_CLIENTS_ALL : 
DIRTY_CLIENTS_NOCODE;
 
-- 
2.39.5




Re: [PATCH 09/11] docs: add QAPI namespace "QMP" to qemu-qmp-ref

2025-03-14 Thread Markus Armbruster
John Snow  writes:

> This also creates the qapi-qmp-index.html index and cross-reference
> target.
>
> Signed-off-by: John Snow 
> ---
>  docs/conf.py  | 4 +++-
>  docs/interop/qemu-qmp-ref.rst | 1 +
>  qapi/qapi-schema.json | 2 +-
>  3 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/docs/conf.py b/docs/conf.py
> index 175491148c3..9a86e84a804 100644
> --- a/docs/conf.py
> +++ b/docs/conf.py
> @@ -163,7 +163,9 @@
>  
>  # Due to a limitation in Sphinx, we need to know which indices to
>  # generate in advance. Adding a namespace here allows that generation.
> -qapi_namespaces = set()
> +qapi_namespaces = {
> +"QMP",
> +}
>  
>  # -- Options for HTML output --
>  
> diff --git a/docs/interop/qemu-qmp-ref.rst b/docs/interop/qemu-qmp-ref.rst
> index e95eeac45e2..ef8792b53f5 100644
> --- a/docs/interop/qemu-qmp-ref.rst
> +++ b/docs/interop/qemu-qmp-ref.rst
> @@ -8,3 +8,4 @@ QEMU QMP Reference Manual
>  
>  .. qapi-doc:: qapi/qapi-schema.json
> :transmogrify:
> +   :namespace: QMP
> diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
> index 4475e81cc3e..c41c01eb2ab 100644
> --- a/qapi/qapi-schema.json
> +++ b/qapi/qapi-schema.json
> @@ -5,7 +5,7 @@
>  #
>  # This document describes all commands currently supported by QMP.
>  #
> -# For locating a particular item, please see the `qapi-index`.
> +# For locating a particular item, please see the `qapi-qmp-index`.

Output changes from "see the QAPI Index" to "see the QMP Index", which
is an improvement.

I think I'd rather spell it "QMP index", though.  "Index" comes from
PATCH 08.  Not worth delaying the series. for that.

>  #
>  # Most of the time their usage is exactly the same as in the user
>  # Monitor, this means that any other document which also describe




Re: [PULL 00/72] ppc-for-10.0-1 queue

2025-03-14 Thread Nicholas Piggin
On Thu Mar 13, 2025 at 8:49 PM AEST, Philippe Mathieu-Daudé wrote:
> On 13/3/25 07:13, Thomas Huth wrote:
>> On 13/03/2025 03.34, Stefan Hajnoczi wrote:
>>> On Tue, Mar 11, 2025 at 8:59 PM Nicholas Piggin  
>>> wrote:

 The following changes since commit 
 825b96dbcee23d134b691fc75618b59c5f53da32:

    Merge tag 'migration-20250310-pull-request' of https://gitlab.com/ 
 farosas/qemu into staging (2025-03-11 09:32:07 +0800)

 are available in the Git repository at:

    https://gitlab.com/npiggin/qemu.git tags/pull-ppc-for-10.0-1-20250311

 for you to fetch changes up to 0f17ae24b53eaab4bbe9cfab267c536e2f7fdbd7:

    docs/system/ppc/amigang.rst: Update for NVRAM emulation 
 (2025-03-11 22:43:32 +1000)

 
 * amigaone enhancements, NVRAM and kernel/initrd support
 * Next round of XIVE group/crowd changes
 * SPI updates for powernv
 * Power10 2nd DAWR support for powernv and spapr
 * powernv HOMER/OCC fixes and improvements for power management
 * powernv PNOR support
 * Big cleanup to move TCG code under ifdef or into its own file
 * Update SLOF and skiboot ROMs
 * Remove 405 boards and deprecate 405 CPU
 * Add support for nested KVM "hostwide state" data.
>>>
>>> I fixed a CI failure on FreeBSD 14 hosts because of the __packed macro
>>> redefinition in hw/ppc/pnv_occ.c:
>>> https://gitlab.com/qemu-project/qemu/-/jobs/9388495246#L5857
>>>
>>> Here is my fix in the merge commit, if you prefer a different fix,
>>> please send a follow-up commit:
>>> diff --git i/hw/ppc/pnv_occ.c w/hw/ppc/pnv_occ.c
>>> index d9ce35a4d6..bda6b23ad3 100644
>>> --- i/hw/ppc/pnv_occ.c
>>> +++ w/hw/ppc/pnv_occ.c
>>> @@ -394,7 +394,9 @@ type_init(pnv_occ_register_types);
>>>   #define s64 int64_t
>>>   #define __be16 uint16_t
>>>   #define __be32 uint32_t
>>> +#ifndef __packed
>>>   #define __packed QEMU_PACKED
>>> +#endif /* !__packed */
>> 
>> We should never define such macros in userspace - everything with two 
>> underscores at the beginning is reserved for the system and the compiler 
>> and must not be created by the userspace code.
>> Why doesn't this code use QEMU_PACKED directly instead?
>
> Similar question with __be16 / __be32.

Okay these were just because the type definition is taken from
skiboot firmware, so I added those defs just in the .c file to
make it a bit less change. It's not too much to change if that
is preferred.

Thanks,
Nick



Re: [PATCH] target/arm: Define raw write for PMU CLR registers

2025-03-14 Thread Akihiko Odaki

On 2025/03/14 3:34, Peter Maydell wrote:

On Thu, 13 Mar 2025 at 07:16, Akihiko Odaki  wrote:


PMCNTENCLR_EL0 and PMINTENCLR_EL1 clears written bits so we need an
alternative raw write functions, which will be used to copy KVM kernel
coprocessor state into userspace.

Signed-off-by: Akihiko Odaki 
---
  target/arm/helper.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index f0ead22937bf..30883cd3a989 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -1907,7 +1907,8 @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
.fgt = FGT_PMCNTEN,
.type = ARM_CP_ALIAS | ARM_CP_IO,
.fieldoffset = offsetof(CPUARMState, cp15.c9_pmcnten),
-  .writefn = pmcntenclr_write },
+  .writefn = pmcntenclr_write,
+  .raw_writefn = raw_write },
  { .name = "PMOVSR", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 3,
.access = PL0_RW, .type = ARM_CP_IO,
.fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pmovsr),
@@ -2033,7 +2034,8 @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
.fgt = FGT_PMINTEN,
.type = ARM_CP_ALIAS | ARM_CP_IO | ARM_CP_NO_RAW,
.fieldoffset = offsetof(CPUARMState, cp15.c9_pminten),
-  .writefn = pmintenclr_write },
+  .writefn = pmintenclr_write,
+  .raw_writefn = raw_write },
  { .name = "CCSIDR", .state = ARM_CP_STATE_BOTH,
.opc0 = 3, .crn = 0, .crm = 0, .opc1 = 1, .opc2 = 0,
.access = PL1_R,


Hmm, looking more closely at this, I think this second one should
not need a raw_writefn, because it's marked as ARM_CP_NO_RAW
(meaning nothing should try to do a raw write to it).


Good catch; I didn't notice ARM_CP_NO_RAW.



And the first one is marked ARM_CP_ALIAS, so I'm not
sure why we would be using it in KVM register sync:
add_cpreg_to_list() skips ARM_CP_ALIAS (and ARM_CP_NO_RAW)
registers when we construct the cpreg_tuples[] array that
defines which sysregs we sync to and from KVM.


The register list is initialized with kvm_arm_init_cpreg_list() for KVM, 
which ignores those flags.


target/arm/cpregs.h explicitly says: "registers marked ARM_CP_ALIAS will 
not be migrated but may have their state set by syncing of register 
state from KVM."


ARM_CP_NO_RAW is still respected for KVM by write_cpustate_to_list() and 
write_list_to_cpustate().




(We should arguably be consistent about our usage of the
NO_RAW flag between the pmintenclr and pmcntenclr registers.)


I sent v2 to drop the flag. target/arm/cpregs.h suggests ARM_CP_NO_RAW 
is not a flag for these registers:

> Flag: Register has no underlying state and does not support raw access
> for state saving/loading; it will not be used for either migration or
> KVM state synchronization. Typically this is for "registers" which are
> actually used as instructions for cache maintenance and so on.

These registers have underlying states and can support raw access.

Regards,
Akihiko Odaki



thanks
-- PMM





Re: Cross-compilation artifact is broken

2025-03-14 Thread Konstantin Kostiuk
On Wed, Mar 12, 2025 at 6:24 PM Alex Bennée  wrote:

> Daniel P. Berrangé  writes:
>
> > On Wed, Mar 12, 2025 at 02:05:09PM +, Daniel P. Berrangé wrote:
> >> On Wed, Mar 12, 2025 at 03:52:45PM +0200, Konstantin Kostiuk wrote:
> >> > Hi All,
> >> >
> >> > I cross-compiled qemu-ga from current master branch
> >> > (825b96dbcee23d134b691fc75618b59c5f53da32) and found strange behavior.
> >> >
> >> > Configure CLI:
> >> > ./configure --disable-docs --disable-system --disable-user
> >> > --cross-prefix=x86_64-w64-mingw32- --enable-guest-agent
> >> > --disable-guest-agent-msi --disable-qga-vss
> >> > Build CLI:
> >> > make -j8 qemu-ga
> >> >
> >> > Linker wrote the following information but exited with 0 code:
> >> >
> >> >
> /usr/lib/gcc/x86_64-w64-mingw32/14.2.0/../../../../x86_64-w64-mingw32/bin/ld:
> >> > qga/qemu-ga.exe:/4: section below image base
> >> >
> /usr/lib/gcc/x86_64-w64-mingw32/14.2.0/../../../../x86_64-w64-mingw32/bin/ld:
> >> > qga/qemu-ga.exe:/24: section below image base
> >> >
> >> > As a result, this binary failed to start on Windows without any
> details,
> >> > just a message that the application is not compatible. I also tried
> to run
> >> > it with wine and got the error:
> >> >
> >> > wine: failed to start
> >> > L"Z:\\home\\user\\Documents\\repos\\qemu\\build\\qga\\qemu-ga.exe"
> >> > Application could not be started, or no application associated with
> the
> >> > specified file.
> >> > ShellExecuteEx failed: Bad EXE format for
> >> > Z:\home\user\Documents\repos\qemu\build\qga\qemu-ga.exe.
> >> >
> >> > I bisected the tree and found the commit that caused the problem:
> >> >
> https://gitlab.com/qemu-project/qemu/-/commit/563b1a35ed1f1151505d4fe5f723827d1b3fd4bc
> >> >
> >> > Adding --disable-split-debug to the configure CLI fixes the issue.
> >> >
> >> > $ x86_64-w64-mingw32-gcc --version
> >> > x86_64-w64-mingw32-gcc (GCC) 14.2.0
> >> >
> >> > My question is, is this expected behavior or is this a bug?
> >>
> >> Your configure args don't include "--enable-debug", so I would
> >> not have expected -gsplit-dwarf to have been enabled, so I'm
> >> surprised that commit casued a problem.
> >
> > Hmm it appears that the meson  "get_option('debug')" is entirely
> > unconnected to QEMU's --enable-debug configure flag, which I did
> > not realize.
> >
> > IOW, we've got -gsplit-dwarf enabled by default for everyone
> > building QEMU, which feels dubious. IMHO only an explicit
> > --enable-debug configure arg should have triggered it.
>
> --enable-debug is more than debug info, --enable-debug-info is enabled
> by default. If you build with --disable-debug-info then -gsplit-dwarf
> won't be applied.
>
>
But as this broke the Windows build, maybe we should disable this for
Windows
until resolve this problem


> >
> > In addition since its breaking Windows builds, it appears we
> > need to block its usage on Windows.
> >
> >
> > With regards,
> > Daniel
>
> --
> Alex Bennée
> Virtualisation Tech Lead @ Linaro
>
>


Re: Re: [PATCH] cryptodev: Fix error handling in cryptodev_lkcf_execute_task()

2025-03-14 Thread zhenwei pi

Hi Markus,

Current code style seems buggy, I think the main reason is that the 
Error *errp is not generated at right place. keyctl_pkey_XXX fails 
without new error, qcrypto_akcipher_XXX fails with new error, but they 
are in the same switch-case code block. If we can separate crypto 
operations into two functions - cryptodev_lkcf_keyctl_op and 
cryptodev_lkcf_qakcipher_op, and the error is generate inside the 
functions, it may be handled easily. Then applying your changes, it seem 
more clear. What do you think?


+static inline int cryptodev_lkcf_keyctl_op(key_serial_t key_id, int 
op_code, char *op_desc,
+  CryptoDevBackendAsymOpInfo *asym_op_info, Error 
**errp)

+{
+int ret = -1;
+
+switch (op_code) {
+case VIRTIO_CRYPTO_AKCIPHER_ENCRYPT:
+ret = keyctl_pkey_encrypt(key_id, op_desc,
+asym_op_info->src, asym_op_info->src_len,
+asym_op_info->dst, asym_op_info->dst_len);
+break;
+
+case VIRTIO_CRYPTO_AKCIPHER_DECRYPT:
+ret = keyctl_pkey_decrypt(key_id, op_desc,
+asym_op_info->src, asym_op_info->src_len,
+asym_op_info->dst, asym_op_info->dst_len);
+break;
+
+case VIRTIO_CRYPTO_AKCIPHER_SIGN:
+ret = keyctl_pkey_sign(key_id, op_desc,
+asym_op_info->src, asym_op_info->src_len,
+asym_op_info->dst, asym_op_info->dst_len);
+break;
+
+case VIRTIO_CRYPTO_AKCIPHER_VERIFY:
+ret = keyctl_pkey_verify(key_id, op_desc,
+asym_op_info->src, asym_op_info->src_len,
+asym_op_info->dst, asym_op_info->dst_len);
+break;
+
+default:
+error_setg(errp, "Unknown opcode: %u", op_code);
+}
+
+if (ret && !*errp) {
+error_setg_errno(errp, errno, "Failed do operation with keyctl");
+}
+
+return ret;
+}
+
+static inline int cryptodev_lkcf_qakcipher_op(QCryptoAkCipher 
*akcipher, int op_code, char *op_desc,
+  CryptoDevBackendAsymOpInfo *asym_op_info, Error 
**errp)

+{
+int ret = -1;
+
+switch (op_code) {
+case VIRTIO_CRYPTO_AKCIPHER_ENCRYPT:
+ ret = qcrypto_akcipher_encrypt(akcipher,
+asym_op_info->src, asym_op_info->src_len,
+asym_op_info->dst, asym_op_info->dst_len, errp);
+break;
+
+case VIRTIO_CRYPTO_AKCIPHER_DECRYPT:
+ret = qcrypto_akcipher_decrypt(akcipher,
+asym_op_info->src, asym_op_info->src_len,
+asym_op_info->dst, asym_op_info->dst_len, errp);
+break;
+
+case VIRTIO_CRYPTO_AKCIPHER_SIGN:
+ret = qcrypto_akcipher_sign(akcipher,
+asym_op_info->src, asym_op_info->src_len,
+asym_op_info->dst, asym_op_info->dst_len, errp);
+break;
+
+case VIRTIO_CRYPTO_AKCIPHER_VERIFY:
+ret = qcrypto_akcipher_verify(akcipher,
+asym_op_info->src, asym_op_info->src_len,
+asym_op_info->dst, asym_op_info->dst_len, errp);
+break;
+
+default:
+error_setg(errp, "Unknown opcode: %u", op_code);
+}
+
+return ret;
+}
+
 static void cryptodev_lkcf_execute_task(CryptoDevLKCFTask *task)
 {
 CryptoDevBackendLKCFSession *session = task->sess;
@@ -336,6 +414,7 @@ static void 
cryptodev_lkcf_execute_task(CryptoDevLKCFTask *task)

 }
 }

+asym_op_info = task->op_info->u.asym_op_info;
 if (key_id < 0) {
 if (!qcrypto_akcipher_supports(&session->akcipher_opts)) {
 status = -VIRTIO_CRYPTO_NOTSUPP;
@@ -349,72 +428,13 @@ static void 
cryptodev_lkcf_execute_task(CryptoDevLKCFTask *task)

 status = -VIRTIO_CRYPTO_ERR;
 goto out;
 }
-}
-
-asym_op_info = task->op_info->u.asym_op_info;
-switch (op_code) {
-case VIRTIO_CRYPTO_AKCIPHER_ENCRYPT:
-if (key_id >= 0) {
-ret = keyctl_pkey_encrypt(key_id, op_desc,
-asym_op_info->src, asym_op_info->src_len,
-asym_op_info->dst, asym_op_info->dst_len);
-} else {
-ret = qcrypto_akcipher_encrypt(akcipher,
-asym_op_info->src, asym_op_info->src_len,
-asym_op_info->dst, asym_op_info->dst_len, &local_error);
-}
-break;

-case VIRTIO_CRYPTO_AKCIPHER_DECRYPT:
-if (key_id >= 0) {
-ret = keyctl_pkey_decrypt(key_id, op_desc,
-asym_op_info->src, asym_op_info->src_len,
-asym_op_info->dst, asym_op_info->dst_len);
-} else {
-ret = qcrypto_akcipher_decrypt(akcipher,
-asym_op_info->src, asym_op_info->src_len,
-asym_op_info->dst, asym_op_info->dst_len, &local_error);
-}
-break;
-
-case VIRTIO_CRYPTO_AKCIPHER_SIGN:
-if (key_id >= 0) {
-ret = keyctl_pkey_sign(key_id, op_desc,
-asym_op_info->src, asym_op_info->src_len,
-asym_op_info->dst, asym_op_info->dst_len);
-} else {
-ret = qcrypto_akcipher_

Re: [PATCH v2] target/arm: Define raw write for PMU CLR registers

2025-03-14 Thread Peter Maydell
On Fri, 14 Mar 2025 at 08:13, Akihiko Odaki  wrote:
>
> Raw writes to PMCNTENCLR and PMCNTENCLR_EL0 incorrectly used their
> default write function, which clears written bits instead of writes the
> raw value.
>
> PMINTENCLR and PMINTENCLR_EL1 are similar registers, but they instead
> had ARM_CP_NO_RAW. target/arm/cpregs.h suggests this flag usage is
> inappropriate:
> > Flag: Register has no underlying state and does not support raw access
> > for state saving/loading; it will not be used for either migration or
> > KVM state synchronization. Typically this is for "registers" which are
> > actually used as instructions for cache maintenance and so on.
>
> PMINTENCLR and PMINTENCLR_EL1 have underlying states and can support
> raw access for state saving/loading. Flagging a register with
> ARM_CP_NO_RAW has a side effect that hides it from GDB.

No, the CLR registers don't have their own underlying state:
all the state is handled by the SET registers, and it just
happens that you can read it via the CLR registers.

> Properly set raw write functions and drop the ARM_CP_NO_RAW flag from
> PMINTENCLR and PMINTENCLR_EL1.

I think the correct fix is to mark all the CLR registers as NO_RAW.

thanks
-- PMM



[PULL 00/12] QAPI patches patches for 2025-03-14

2025-03-14 Thread Markus Armbruster
The following changes since commit 0462a32b4f63b2448b4a196381138afd50719dc4:

  Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging 
(2025-03-14 09:31:13 +0800)

are available in the Git repository at:

  https://repo.or.cz/qemu/armbru.git tags/pull-qapi-2025-03-14

for you to fetch changes up to a6af54434400099b8afd59ba036cf9a662006d1e:

  docs: enable transmogrifier for QSD and QGA (2025-03-14 07:32:41 +0100)


QAPI patches patches for 2025-03-14


John Snow (11):
  docs/qapi_domain: isolate TYPE_CHECKING imports
  docs/qapi-domain: always store fully qualified name in signode
  docs/qapi_domain: add namespace support to FQN
  docs/qapi-domain: add :namespace: override option
  docs/qapi-domain: add qapi:namespace directive
  docs/qapidoc: add :namespace: option to qapi-doc directive
  docs/qapi_domain: add namespace support to cross-references
  docs/qapi-domain: add namespaced index support
  docs: add QAPI namespace "QMP" to qemu-qmp-ref
  docs: disambiguate references in qapi-domain.rst
  docs: enable transmogrifier for QSD and QGA

Markus Armbruster (1):
  qapi/block-core: Improve x-blockdev-change documentation

 docs/conf.py |   7 +
 docs/devel/qapi-domain.rst   |  70 +--
 docs/interop/qemu-ga-ref.rst |   2 +
 docs/interop/qemu-qmp-ref.rst|   1 +
 docs/interop/qemu-storage-daemon-qmp-ref.rst |   2 +
 docs/sphinx/qapi_domain.py   | 297 +++
 docs/sphinx/qapidoc.py   |  12 ++
 qapi/block-core.json |  28 ++-
 qapi/qapi-schema.json|   2 +-
 qga/qapi-schema.json |   3 +
 storage-daemon/qapi/qapi-schema.json |   8 +
 11 files changed, 315 insertions(+), 117 deletions(-)

-- 
2.48.1




Re: [PATCH v3 6/7] memory: Attach MemoryAttributeManager to guest_memfd-backed RAMBlocks

2025-03-14 Thread Chenyi Qiang



On 3/14/2025 5:50 PM, David Hildenbrand wrote:
> On 14.03.25 10:30, Chenyi Qiang wrote:
>>
>>
>> On 3/14/2025 5:00 PM, David Hildenbrand wrote:
>>> On 14.03.25 09:21, Chenyi Qiang wrote:
 Hi David & Alexey,
>>>
>>> Hi,
>>>

 To keep the bitmap aligned, I add the undo operation for
 set_memory_attributes() and use the bitmap + replay callback to do
 set_memory_attributes(). Does this change make sense?
>>>
>>> I assume you mean this hunk:
>>>
>>> +    ret =
>>> memory_attribute_manager_state_change(MEMORY_ATTRIBUTE_MANAGER(mr->rdm),
>>> +    offset, size,
>>> to_private);
>>> +    if (ret) {
>>> +    warn_report("Failed to notify the listener the state change
>>> of "
>>> +    "(0x%"HWADDR_PRIx" + 0x%"HWADDR_PRIx") to %s",
>>> +    start, size, to_private ? "private" : "shared");
>>> +    args.to_private = !to_private;
>>> +    if (to_private) {
>>> +    ret = ram_discard_manager_replay_populated(mr->rdm,
>>> §ion,
>>> +
>>> kvm_set_memory_attributes_cb,
>>> +   &args);
>>> +    } else {
>>> +    ret = ram_discard_manager_replay_discarded(mr->rdm,
>>> §ion,
>>> +
>>> kvm_set_memory_attributes_cb,
>>> +   &args);
>>> +    }
>>> +    if (ret) {
>>> +    goto out_unref;
>>> +    }
>>>
> 
> We should probably document that memory_attribute_state_change() cannot
> fail with "to_private", so you can simplify it to only handle the "to
> shared" case.

Yes, "to_private" branch is unnecessary.

> 
>>>
>>> Why is that undo necessary? The bitmap + listeners should be held in
>>> sync inside of
>>> memory_attribute_manager_state_change(). Handling this in the caller
>>> looks wrong.
>>
>> state_change() handles the listener, i.e. VFIO/IOMMU. And the caller
>> handles the core mm side (guest_memfd set_attribute()) undo if
>> state_change() failed. Just want to keep the attribute consistent with
>> the bitmap on both side. Do we need this? If not, the bitmap can only
>> represent the status of listeners.
> 
> Ah, so you meant that you effectively want to undo the attribute change,
> because the state effectively cannot change, and we want to revert the
> attribute change.
> 
> That makes sense when we are converting private->shared.
> 
> 
> BTW, I'm thinking if the orders should be the following (with in-place
> conversion in mind where we would mmap guest_memfd for the shared memory
> parts).
> 
> On private -> shared conversion:
> 
> (1) change_attribute()
> (2) state_change(): IOMMU pins shared memory
> (3) restore_attribute() if it failed
> 
> On shared -> private conversion
> (1) state_change(): IOMMU unpins shared memory
> (2) change_attribute(): can convert in-place because there are not pins
> 
> I'm wondering if the whole attribute change could actually be a
> listener, invoked by the memory_attribute_manager_state_change() call
> itself in the right order.
> 
> We'd probably need listener priorities, and invoke them in reverse order
> on shared -> private conversion. Just an idea to get rid of the manual
> ram_discard_manager_replay_discarded() call in your code.

Good idea. I think listener priorities can make it more elegant with
in-place conversion. And the change_attribute() listener can be given a
highest or lowest priority. Maybe we can add this change in advance
before in-place conversion available.

> 




Re: [PATCH v2] target/arm: Define raw write for PMU CLR registers

2025-03-14 Thread Akihiko Odaki




On 2025/03/14 19:22, Peter Maydell wrote:

On Fri, 14 Mar 2025 at 08:13, Akihiko Odaki  wrote:


Raw writes to PMCNTENCLR and PMCNTENCLR_EL0 incorrectly used their
default write function, which clears written bits instead of writes the
raw value.

PMINTENCLR and PMINTENCLR_EL1 are similar registers, but they instead
had ARM_CP_NO_RAW. target/arm/cpregs.h suggests this flag usage is
inappropriate:

Flag: Register has no underlying state and does not support raw access
for state saving/loading; it will not be used for either migration or
KVM state synchronization. Typically this is for "registers" which are
actually used as instructions for cache maintenance and so on.


PMINTENCLR and PMINTENCLR_EL1 have underlying states and can support
raw access for state saving/loading. Flagging a register with
ARM_CP_NO_RAW has a side effect that hides it from GDB.


No, the CLR registers don't have their own underlying state:
all the state is handled by the SET registers, and it just
happens that you can read it via the CLR registers.


Properly set raw write functions and drop the ARM_CP_NO_RAW flag from
PMINTENCLR and PMINTENCLR_EL1.


I think the correct fix is to mark all the CLR registers as NO_RAW.


My understanding is that ARM_CP_ALIAS is ignored for KVM to avoid making 
an assumption what aliases KVM implement at cost of migrating one state 
multiple times. The CLR registers should also remain as possible 
sources/targets of raw values.


Regards,
Akihiko Odaki



thanks
-- PMM





Re: [PATCH 2/3] hw/loongarch/virt: Remove unnecessary NULL pointer checking

2025-03-14 Thread Markus Armbruster
bibo mao  writes:

On 2025/3/13 下午6:32, Markus Armbruster wrote:

[...]

>> diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
>> index a5840ff968..4674bd9163 100644
>> --- a/hw/loongarch/virt.c
>> +++ b/hw/loongarch/virt.c
>> @@ -859,30 +859,29 @@ static void virt_cpu_pre_plug(HotplugHandler 
>> *hotplug_dev,
>>  LoongArchCPU *cpu = LOONGARCH_CPU(dev);
>>  CPUState *cs = CPU(dev);
>>  CPUArchId *cpu_slot;
>> -Error *err = NULL;
>>  LoongArchCPUTopo topo;
>>  int arch_id;
>>   
>>  if (lvms->acpi_ged) {
>>  if ((cpu->thread_id < 0) || (cpu->thread_id >= ms->smp.threads)) {
>> -error_setg(&err,
>> +error_setg(errp,
>> "Invalid thread-id %u specified, must be in range 
>> 1:%u",
>> cpu->thread_id, ms->smp.threads - 1);
>> -goto out;
>> +return;
>
> Hi Markus,
>
>  From APIs, it seems that function error_propagate() do much more than 
> error appending, such as comparing dest_err with error_abort etc. Though 
> caller function is local variable rather than error_abort/fatal/warn, 
> error_propagate() seems useful. How about do propagate error and return 
> directly as following:
>
> @@ -868,7 +868,8 @@ static void virt_cpu_pre_plug(HotplugHandler 
> *hotplug_dev,
>  error_setg(&err,
> "Invalid thread-id %u specified, must be in 
> range 1:%u",
> cpu->thread_id, ms->smp.threads - 1);
> -goto out;
> +error_propagate(errp, err);
> +return;
>  }

This is strictly worse.  One, it's more verbose.  Two, the stack
backtrace on failure is less useful, which matters when @errp is
&error_abort, and when you set a breakpoint on error_handle(), abort(),
or exit().  Three, it doesn't actually add useful functionality.

To help you see the latter, let's compare the two versions, i.e.

   direct: error_setg(errp, ...)

and

   propagate: two steps, first error_setg(&err, ...), and then
   error_propagate(errp, err);

Cases: @errp can be NULL, &error_abort, &error_fatal, &error_warn, or a
non-null pointer to variable containing NULL.

1. @errp is NULL

   Direct does nothing.

   Propagate step 1 creates an error object, and stores it in @err.
   Step 2 frees the error object.  Roundabout way to do nothing.

2. @errp is &error_abort

   Direct creates an error object, reports it to stderr, and abort()s.
   Note that the stack backtrace shows where the error is created.

   Propagate step 1 creates an error object, and stores it in @err.
   Step 2 reports it to stderr, and abort()s.  No difference, except the
   stack backtrace shows where the error is propagated, which is less
   useful.

3. @errp is &error_fatal

   Direct creates an error object, reports it to stderr, frees it, and
   exit(1)s.

   Propagate step 1 creates an error object, and stores it in @err.
   Step 2 reports it to stderr, frees it, and exit(1)s.  No difference.

4. @errp is &error_warn

   Direct creates an error object, reports it to stderr, and frees it.

   Propagate step 1 creates an error object, and stores it in @err.
   Step 2 reports it to stderr, and frees it.  No difference.

5. @errp points to variable containing NULL

   Direct creates an error object, and stores it in the variable.

   Propagate step 1 creates an error object, and stores it in @err.
   Step 2 copies it to the variable.  No difference.

Questions?

[...]




Re: [PATCH v2] target/arm: Define raw write for PMU CLR registers

2025-03-14 Thread Peter Maydell
On Fri, 14 Mar 2025 at 10:24, Akihiko Odaki  wrote:
>
>
>
> On 2025/03/14 19:22, Peter Maydell wrote:
> > On Fri, 14 Mar 2025 at 08:13, Akihiko Odaki  
> > wrote:
> >>
> >> Raw writes to PMCNTENCLR and PMCNTENCLR_EL0 incorrectly used their
> >> default write function, which clears written bits instead of writes the
> >> raw value.
> >>
> >> PMINTENCLR and PMINTENCLR_EL1 are similar registers, but they instead
> >> had ARM_CP_NO_RAW. target/arm/cpregs.h suggests this flag usage is
> >> inappropriate:
> >>> Flag: Register has no underlying state and does not support raw access
> >>> for state saving/loading; it will not be used for either migration or
> >>> KVM state synchronization. Typically this is for "registers" which are
> >>> actually used as instructions for cache maintenance and so on.
> >>
> >> PMINTENCLR and PMINTENCLR_EL1 have underlying states and can support
> >> raw access for state saving/loading. Flagging a register with
> >> ARM_CP_NO_RAW has a side effect that hides it from GDB.
> >
> > No, the CLR registers don't have their own underlying state:
> > all the state is handled by the SET registers, and it just
> > happens that you can read it via the CLR registers.
> >
> >> Properly set raw write functions and drop the ARM_CP_NO_RAW flag from
> >> PMINTENCLR and PMINTENCLR_EL1.
> >
> > I think the correct fix is to mark all the CLR registers as NO_RAW.
>
> My understanding is that ARM_CP_ALIAS is ignored for KVM to avoid making
> an assumption what aliases KVM implement at cost of migrating one state
> multiple times. The CLR registers should also remain as possible
> sources/targets of raw values.

Why? There's nothing you can do by doing a raw write to the CLR
register that you shouldn't have done by doing a raw write to
the SET register instead.

thanks
-- PMM



[RFC PATCH] meson.build: don't bother with split-debug for windows

2025-03-14 Thread Alex Bennée
It was reported this breaks the final artefacts on windows when run
under Wine.

Signed-off-by: Alex Bennée 
Cc: Konstantin Kostiuk 
---
 meson.build | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/meson.build b/meson.build
index c2c71b6f8a..9b1af6d030 100644
--- a/meson.build
+++ b/meson.build
@@ -601,8 +601,10 @@ if get_option('tsan')
   qemu_ldflags = ['-fsanitize=thread'] + qemu_ldflags
 endif
 
-if get_option('debug') and get_option('split_debug')
-  qemu_cflags += '-gsplit-dwarf'
+if host_os != 'windows'
+  if get_option('debug') and get_option('split_debug')
+qemu_cflags += '-gsplit-dwarf'
+  endif
 endif
 
 # Detect support for PT_GNU_RELRO + DT_BIND_NOW.
-- 
2.39.5




Re: [PATCH v2] target/arm: Define raw write for PMU CLR registers

2025-03-14 Thread Akihiko Odaki

On 2025/03/14 19:29, Peter Maydell wrote:

On Fri, 14 Mar 2025 at 10:24, Akihiko Odaki  wrote:




On 2025/03/14 19:22, Peter Maydell wrote:

On Fri, 14 Mar 2025 at 08:13, Akihiko Odaki  wrote:


Raw writes to PMCNTENCLR and PMCNTENCLR_EL0 incorrectly used their
default write function, which clears written bits instead of writes the
raw value.

PMINTENCLR and PMINTENCLR_EL1 are similar registers, but they instead
had ARM_CP_NO_RAW. target/arm/cpregs.h suggests this flag usage is
inappropriate:

Flag: Register has no underlying state and does not support raw access
for state saving/loading; it will not be used for either migration or
KVM state synchronization. Typically this is for "registers" which are
actually used as instructions for cache maintenance and so on.


PMINTENCLR and PMINTENCLR_EL1 have underlying states and can support
raw access for state saving/loading. Flagging a register with
ARM_CP_NO_RAW has a side effect that hides it from GDB.


No, the CLR registers don't have their own underlying state:
all the state is handled by the SET registers, and it just
happens that you can read it via the CLR registers.


Properly set raw write functions and drop the ARM_CP_NO_RAW flag from
PMINTENCLR and PMINTENCLR_EL1.


I think the correct fix is to mark all the CLR registers as NO_RAW.


My understanding is that ARM_CP_ALIAS is ignored for KVM to avoid making
an assumption what aliases KVM implement at cost of migrating one state
multiple times. The CLR registers should also remain as possible
sources/targets of raw values.


Why? There's nothing you can do by doing a raw write to the CLR
register that you shouldn't have done by doing a raw write to
the SET register instead.


In theory, KVM may choose to expose only the CLR register and hide the 
SET register. There is no guarantee that both the CLR and SET are exposed.


Regards,
Akihiko Odaki



thanks
-- PMM





[PATCH v5 08/17] exec/memory-internal: remove dependency on cpu.h

2025-03-14 Thread Pierrick Bouvier
Needed so compilation units including it can be common.

Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 
---
 include/exec/memory-internal.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
index 100c1237ac2..b729f3b25ad 100644
--- a/include/exec/memory-internal.h
+++ b/include/exec/memory-internal.h
@@ -20,8 +20,6 @@
 #ifndef MEMORY_INTERNAL_H
 #define MEMORY_INTERNAL_H
 
-#include "cpu.h"
-
 #ifndef CONFIG_USER_ONLY
 static inline AddressSpaceDispatch *flatview_to_dispatch(FlatView *fv)
 {
-- 
2.39.5




Re: [PATCH 32/37] include/hw/intc: Remove ifndef CONFIG_USER_ONLY from armv7m_nvic.h

2025-03-14 Thread Richard Henderson

On 3/14/25 11:36, Pierrick Bouvier wrote:

On 3/14/25 11:13, Richard Henderson wrote:

On 3/13/25 14:00, Pierrick Bouvier wrote:

diff --git a/include/hw/intc/armv7m_nvic.h b/include/hw/intc/armv7m_nvic.h
index 89fe8aedaa..7b9964fe7e 100644
--- a/include/hw/intc/armv7m_nvic.h
+++ b/include/hw/intc/armv7m_nvic.h
@@ -189,21 +189,7 @@ int armv7m_nvic_raw_execution_priority(NVICState *s);
    * @secure: the security state to test
    * This corresponds to the pseudocode IsReqExecPriNeg().
    */
-#ifndef CONFIG_USER_ONLY
   bool armv7m_nvic_neg_prio_requested(NVICState *s, bool secure);
-#else
-static inline bool armv7m_nvic_neg_prio_requested(NVICState *s, bool secure)
-{
-    return false;
-}
-#endif
-#ifndef CONFIG_USER_ONLY
   bool armv7m_nvic_can_take_pending_exception(NVICState *s);
-#else
-static inline bool armv7m_nvic_can_take_pending_exception(NVICState *s)
-{
-    return true;
-}
-#endif
   #endif


I'm a bit worried we might have regression when doing things this way.


I expect link errors to diagnose errors.



In this case, yes.

More generally, for system vs user, it might be sufficient (even though I would prefer to 
be blindly prudent on this), but it might not protect all cases, with subtle configs or 
features enabled/disabled.


With a runtime check, we could identify the missing calls "feature_enabled()". In this 
case, we would link, but could end up call_feature in some cases, when it should be hidden 
behind a "_enabled()" check.


if (feature_enabled()) {
   call_feature();
}


I'm not quite sure what you're arguing for here.
A build-time error is vastly preferable to a run-time error.

If it's a lesser used configuration or feature, a run-time error could lay dormant for 
years before a user encounters it.



r~



Re: [PATCH] configure: disable split_debug on Windows and on non-git builds

2025-03-14 Thread Pierrick Bouvier

On 3/14/25 14:14, Alex Bennée wrote:

Pierrick Bouvier  writes:


On 3/14/25 08:38, Paolo Bonzini wrote:

-gsplit-dwarf is reported to produce broken binaries on Windows.
The linker produces warnings but exits successfully:
/usr/lib/gcc/x86_64-w64-mingw32/14.2.0/../../../../x86_64-w64-mingw32/bin/ld:
qga/qemu-ga.exe:/4: section below image base
/usr/lib/gcc/x86_64-w64-mingw32/14.2.0/../../../../x86_64-w64-mingw32/bin/ld:
qga/qemu-ga.exe:/24: section below image base
and as a result qemu-ga.exe fails to start.
On top of this, also disable -gsplit-dwarf unless building from git.
Similar to -Werror, split debug info is probably not the best choice
for people that want to build for installing.
(Random thoughts: there is a tension here between adding an option
that is useful for QEMU developers, and messing things up for everyone
else by doing something decidedly non-standard.  For example, distros
are starting to create a fake git repository just so that they can
use "git am" to apply patches; while some of them, for example Fedora,
are wise, or paranoid, enough to pass --disable-XXX for everything and
then turn back on what they want, it cannot be expected that everyone
does this.  It may be safer to make --enable-split-debug default off
for everybody and add it somewhere in docs/.  For now I am keeping it
enabled but we could consider doing something different during the hard
freeze period).
Reported-by: Konstantin Kostiuk 
Signed-off-by: Paolo Bonzini 
---
   configure | 4 
   meson_options.txt | 2 +-
   2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/configure b/configure
index 02f1dd2311f..9aece67ed08 100755
--- a/configure
+++ b/configure
@@ -1864,6 +1864,10 @@ if test "$skip_meson" = no; then
 { test "$host_os" = linux || test "$host_os" = "windows"; }; then
 echo 'werror = true' >> $cross
 fi
+  if test -e "$source_path/.git" && test "$host_os" != "windows"; then
+  echo 'split_debug = true' >> $cross
+  fi
+
 echo "[project options]" >> $cross
 if test "$SMBD" != ''; then
   echo "smbd = $(meson_quote "$SMBD")" >> $cross
diff --git a/meson_options.txt b/meson_options.txt
index 3432123fee2..f3546b9abc1 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -362,7 +362,7 @@ option('debug_mutex', type: 'boolean', value: false,
  description: 'mutex debugging support')
   option('debug_stack_usage', type: 'boolean', value: false,
  description: 'measure coroutine stack usage')
-option('split_debug', type: 'boolean', value: true,
+option('split_debug', type: 'boolean', value: false,
  description: 'split debug info from object files')
   option('qom_cast_debug', type: 'boolean', value: true,
  description: 'cast debugging support')


Unfortunate coincidence, this appears at the same time MSYS2 fixed
some issue triggering a segfault [1]. So I didn't investigate further
the current issue, thinking something else have been changed I don't
know where.

Would be better to revert it completely indeed, creating another build
configuration is not worth the (cheap) disk storage saved.


Well we should disable debug info on the CI builds then. Nothing is free
and our CI is pretty damned heavy and the builds all add up when debug
info is on by default.



If the problem is the storage between jobs or artifacts, maybe it's 
possibly to simply strip the binaries?




[1] https://github.com/msys2/MINGW-packages/issues/23577






Re: [PATCH v4 12/17] hw/xen: add stubs for various functions

2025-03-14 Thread Pierrick Bouvier

On 3/14/25 06:35, Anthony PERARD wrote:

On Thu, Mar 13, 2025 at 09:38:58AM -0700, Pierrick Bouvier wrote:

Those functions are used by system/physmem.c, and are called only if
xen is enabled (which happens only if CONFIG_XEN is not set).


You mean, 's/is not set/is set/'?


Right, I'll update the comment.



So we can crash in case those are called.

Acked-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 
---
diff --git a/hw/xen/xen_stubs.c b/hw/xen/xen_stubs.c
new file mode 100644
index 000..19cee84bbb4
--- /dev/null
+++ b/hw/xen/xen_stubs.c
+
+void xen_invalidate_map_cache(void)
+{


Is this stub actually necessary? xen_invalidate_map_cache() doesn't
seems to be used outside of xen's code.



You're right again, I added it by mistake.


In anycase:
Acked-by: Anthony PERARD 

Thanks,






[PATCH v2 5/8] hw/ppc: Implement saving CPU state in Fadump

2025-03-14 Thread Aditya Gupta
Kernel expects CPU states/register states in the format mentioned in
"Register Save Area" in PAPR.

The platform (in our case, QEMU) saves each CPU register in the form of
an array of "register entries", the start and end of this array is
signified by "CPUSTRT" and "CPUEND" register entries respectively.

The CPUSTRT and CPUEND register entry also has 4-byte logical CPU ID,
thus storing the CPU ID corresponding to the array of register entries.

Each register, and CPUSTRT, CPUEND has a predefined identifier.
Implement calculating identifier for a given register in
'fadump_str_to_u64', which has been taken from the linux kernel

Similarly GPRs also have predefined identifiers, and a corresponding
64-bit resiter value (split into two 32-bit cells). Implement
calculation of GPR identifiers with 'fadump_gpr_id_to_u64'

PAPR has restrictions on particular order of few registers, and is
free to be in any order for other registers.
Some registers mentioned in PAPR have not been exported as they are not
implemented in QEMU / don't make sense in QEMU.

Implement saving of CPU state according to the PAPR document

Note: As of this patch, the "kernel-dump" device tree entry is still not
added for the second boot, so after crash, the second kernel will boot
assuming fadump dump is "NOT" active, and try to register for fadump,
but since we already have fadump registered in QEMU internal state, the
register rtas call will fail with: "DUMP ACTIVE"

Signed-off-by: Aditya Gupta 
---
 hw/ppc/spapr_fadump.c | 333 +-
 include/hw/ppc/spapr_fadump.h |  28 +++
 2 files changed, 360 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_fadump.c b/hw/ppc/spapr_fadump.c
index db874fe13487..e84f7338d82c 100644
--- a/hw/ppc/spapr_fadump.c
+++ b/hw/ppc/spapr_fadump.c
@@ -8,6 +8,71 @@
 #include "qemu/log.h"
 #include "hw/ppc/spapr.h"
 #include "system/cpus.h"
+#include "system/hw_accel.h"
+
+/*
+ * Copy the ascii values for first 8 characters from a string into u64
+ * variable at their respective indexes.
+ * e.g.
+ *  The string "FADMPINF" will be converted into 0x4641444d50494e46
+ */
+static uint64_t fadump_str_to_u64(const char *str)
+{
+uint64_t val = 0;
+int i;
+
+for (i = 0; i < sizeof(val); i++) {
+val = (*str) ? (val << 8) | *str++ : val << 8;
+}
+return val;
+}
+
+/**
+ * Get the identifier id for register entries of GPRs
+ *
+ * It gives the same id as 'fadump_str_to_u64' when the complete string id
+ * of the GPR is given, ie.
+ *
+ *   fadump_str_to_u64("GPR05") == fadump_gpr_id_to_u64(5);
+ *   fadump_str_to_u64("GPR12") == fadump_gpr_id_to_u64(12);
+ *
+ * And so on. Hence this can be implemented by creating a dynamic
+ * string for each GPR, such as "GPR00", "GPR01", ... "GPR31"
+ * Instead of allocating a string, an observation from the math of
+ * 'fadump_str_to_u64' or from PAPR tells us that there's a pattern
+ * in the identifier IDs, such that the first 4 bytes are affected only by
+ * whether it is GPR0*, GPR1*, GPR2*, GPR3*.
+ * Upper half of 5th byte is always 0x3. Lower half (nibble) of 5th byte
+ * is the tens digit of the GPR id, ie. GPR ID / 10.
+ * Upper half of 6th byte is always 0x3. Lower half (nibble) of 5th byte
+ * is the ones digit of the GPR id, ie. GPR ID % 10
+ *
+ * For example, for GPR 29, the 5th and 6th byte will be 0x32 and 0x39
+ */
+static uint64_t fadump_gpr_id_to_u64(uint32_t gpr_id)
+{
+uint64_t val = 0;
+
+/* Valid range of GPR id is only GPR0 to GPR31 */
+assert(gpr_id < 32);
+
+/* Below calculations set the 0th to 5th byte */
+if (gpr_id <= 9) {
+val = fadump_str_to_u64("GPR0");
+} else if (gpr_id <= 19) {
+val = fadump_str_to_u64("GPR1");
+} else if (gpr_id <= 29) {
+val = fadump_str_to_u64("GPR2");
+} else {
+val = fadump_str_to_u64("GPR3");
+}
+
+/* Set the 6th byte */
+val |= 0x3000;
+val |= ((gpr_id % 10) << 24);
+
+return val;
+}
 
 /*
  * Handle the "FADUMP_CMD_REGISTER" command in 'ibm,configure-kernel-dump'
@@ -218,14 +283,222 @@ static bool do_preserve_region(FadumpSection *region)
 return true;
 }
 
+/*
+ * Populate the passed CPUs register entries, in the buffer starting at
+ * the argument 'curr_reg_entry'
+ *
+ * The register entries is an array of pair of register id and register
+ * value, as described in Table 591/592 in section "H.1 Register Save Area"
+ * in PAPR v2.13
+ *
+ * Returns pointer just past this CPU's register entries, which can be used
+ * as the start address for next CPU's register entries
+ */
+static FadumpRegEntry *populate_cpu_reg_entries(CPUState *cpu,
+FadumpRegEntry *curr_reg_entry)
+{
+CPUPPCState *env;
+PowerPCCPU *ppc_cpu;
+uint32_t num_regs_per_cpu = 0;
+
+ppc_cpu = POWERPC_CPU(cpu);
+env = cpu_env(cpu);
+num_regs_per_cpu = 0;
+
+curr_reg_entry->reg_id =
+cpu_to_be64(fadump_str_to_u64("CPUSTRT"));
+curr_reg_entry->reg_value = ppc_cp

Re: [PATCH] util/loongarch64: Add clang compiler support

2025-03-14 Thread Yao Zi
On Fri, Mar 07, 2025 at 09:13:39AM +0800, Bibo Mao wrote:
> Float register name f0 - f31 is not recognized with clang compiler
> with LoongArch64 target, its name should be $f0 - $f31. It is ok
> for both gcc and clang compiler.

Sorry I didn't search the list carefully and sent a similar patch[1].

Apart from preventing the disk tools to be built, this issue affects
several headers used by linux-user emulators as well. IMHO this should
be fixed, too, or my patch could be taken.

Sorry for the inconvenience,
Yao Zi

[1]: https://lore.kernel.org/all/20250314033150.53268-3-zi...@disroot.org/

> Signed-off-by: Bibo Mao 
> ---
>  host/include/loongarch64/host/bufferiszero.c.inc | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/host/include/loongarch64/host/bufferiszero.c.inc 
> b/host/include/loongarch64/host/bufferiszero.c.inc
> index 69891eac80..bb2598fdc3 100644
> --- a/host/include/loongarch64/host/bufferiszero.c.inc
> +++ b/host/include/loongarch64/host/bufferiszero.c.inc
> @@ -61,7 +61,8 @@ static bool buffer_is_zero_lsx(const void *buf, size_t len)
>  "2:"
>  : "=&r"(ret), "+r"(p)
>  : "r"(buf), "r"(e), "r"(l)
> -: "f0", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "fcc0");
> +: "$f0", "$f1", "$f2", "$f3", "$f4", "$f5", "$f6", "$f7", "$f8",
> +  "$fcc0");
>  
>  return ret;
>  }
> @@ -119,7 +120,8 @@ static bool buffer_is_zero_lasx(const void *buf, size_t 
> len)
>  "3:"
>  : "=&r"(ret), "+r"(p)
>  : "r"(buf), "r"(e), "r"(l)
> -: "f0", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "fcc0");
> +: "$f0", "$f1", "$f2", "$f3", "$f4", "$f5", "$f6", "$f7", "$f8",
> +  "$fcc0");
>  
>  return ret;
>  }
> 
> base-commit: 661c2e1ab29cd9c4d268ae3f44712e8d421c0e56
> -- 
> 2.39.3



[PATCH v5 01/17] exec/tswap: target code can use TARGET_BIG_ENDIAN instead of target_words_bigendian()

2025-03-14 Thread Pierrick Bouvier
Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 
---
 include/exec/tswap.h | 11 ++-
 cpu-target.c |  1 +
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/exec/tswap.h b/include/exec/tswap.h
index ecd4faef015..2683da0adb7 100644
--- a/include/exec/tswap.h
+++ b/include/exec/tswap.h
@@ -13,13 +13,14 @@
 /**
  * target_words_bigendian:
  * Returns true if the (default) endianness of the target is big endian,
- * false otherwise. Note that in target-specific code, you can use
- * TARGET_BIG_ENDIAN directly instead. On the other hand, common
- * code should normally never need to know about the endianness of the
- * target, so please do *not* use this function unless you know very well
- * what you are doing!
+ * false otherwise. Common code should normally never need to know about the
+ * endianness of the target, so please do *not* use this function unless you
+ * know very well what you are doing!
  */
 bool target_words_bigendian(void);
+#ifdef COMPILING_PER_TARGET
+#define target_words_bigendian()  TARGET_BIG_ENDIAN
+#endif
 
 /*
  * If we're in target-specific code, we can hard-code the swapping
diff --git a/cpu-target.c b/cpu-target.c
index cae77374b38..519b0f89005 100644
--- a/cpu-target.c
+++ b/cpu-target.c
@@ -155,6 +155,7 @@ void cpu_abort(CPUState *cpu, const char *fmt, ...)
 abort();
 }
 
+#undef target_words_bigendian
 bool target_words_bigendian(void)
 {
 return TARGET_BIG_ENDIAN;
-- 
2.39.5




[PATCH v5 15/17] include/exec/memory: move devend functions to memory-internal.h

2025-03-14 Thread Pierrick Bouvier
Only system/physmem.c and system/memory.c use those functions, so we can
move then to internal header.

Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 
---
 include/exec/memory-internal.h | 19 +++
 include/exec/memory.h  | 18 --
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
index b729f3b25ad..c75178a3d6b 100644
--- a/include/exec/memory-internal.h
+++ b/include/exec/memory-internal.h
@@ -43,5 +43,24 @@ void address_space_dispatch_free(AddressSpaceDispatch *d);
 
 void mtree_print_dispatch(struct AddressSpaceDispatch *d,
   MemoryRegion *root);
+
+/* returns true if end is big endian. */
+static inline bool devend_big_endian(enum device_endian end)
+{
+QEMU_BUILD_BUG_ON(DEVICE_HOST_ENDIAN != DEVICE_LITTLE_ENDIAN &&
+  DEVICE_HOST_ENDIAN != DEVICE_BIG_ENDIAN);
+
+if (end == DEVICE_NATIVE_ENDIAN) {
+return target_words_bigendian();
+}
+return end == DEVICE_BIG_ENDIAN;
+}
+
+/* enum device_endian to MemOp.  */
+static inline MemOp devend_memop(enum device_endian end)
+{
+return devend_big_endian(end) ? MO_BE : MO_LE;
+}
+
 #endif
 #endif
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 70177304a92..a3bb0542bf6 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -3138,24 +3138,6 @@ address_space_write_cached(MemoryRegionCache *cache, 
hwaddr addr,
 MemTxResult address_space_set(AddressSpace *as, hwaddr addr,
   uint8_t c, hwaddr len, MemTxAttrs attrs);
 
-/* returns true if end is big endian. */
-static inline bool devend_big_endian(enum device_endian end)
-{
-QEMU_BUILD_BUG_ON(DEVICE_HOST_ENDIAN != DEVICE_LITTLE_ENDIAN &&
-  DEVICE_HOST_ENDIAN != DEVICE_BIG_ENDIAN);
-
-if (end == DEVICE_NATIVE_ENDIAN) {
-return target_words_bigendian();
-}
-return end == DEVICE_BIG_ENDIAN;
-}
-
-/* enum device_endian to MemOp.  */
-static inline MemOp devend_memop(enum device_endian end)
-{
-return devend_big_endian(end) ? MO_BE : MO_LE;
-}
-
 /*
  * Inhibit technologies that require discarding of pages in RAM blocks, e.g.,
  * to manage the actual amount of memory consumed by the VM (then, the memory
-- 
2.39.5




[PATCH v5 04/17] exec/memory_ldst_phys: extract memory_ldst_phys declarations from cpu-all.h

2025-03-14 Thread Pierrick Bouvier
They are now accessible through exec/memory.h instead, and we make sure
all variants are available for common or target dependent code.

Move stl_phys_notdirty function as well.
Cached endianness agnostic version rely on st/ld*_p, which is available
through tswap.h.

Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 
---
 include/exec/cpu-all.h  | 29 -
 include/exec/memory.h   | 10 ++
 include/exec/memory_ldst_phys.h.inc |  5 +
 3 files changed, 11 insertions(+), 33 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 0e8205818a4..902ca1f3c7b 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -38,35 +38,6 @@
 #define BSWAP_NEEDED
 #endif
 
-/* MMU memory access macros */
-
-#if !defined(CONFIG_USER_ONLY)
-
-#include "exec/hwaddr.h"
-
-static inline void stl_phys_notdirty(AddressSpace *as, hwaddr addr, uint32_t 
val)
-{
-address_space_stl_notdirty(as, addr, val,
-   MEMTXATTRS_UNSPECIFIED, NULL);
-}
-
-#define SUFFIX
-#define ARG1 as
-#define ARG1_DECLAddressSpace *as
-#define TARGET_ENDIANNESS
-#include "exec/memory_ldst_phys.h.inc"
-
-/* Inline fast path for direct RAM access.  */
-#define ENDIANNESS
-#include "exec/memory_ldst_cached.h.inc"
-
-#define SUFFIX   _cached
-#define ARG1 cache
-#define ARG1_DECLMemoryRegionCache *cache
-#define TARGET_ENDIANNESS
-#include "exec/memory_ldst_phys.h.inc"
-#endif
-
 /* page related stuff */
 #include "exec/cpu-defs.h"
 #include "exec/target_page.h"
diff --git a/include/exec/memory.h b/include/exec/memory.h
index d09af58c971..da21e9150b5 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -21,6 +21,7 @@
 #include "exec/memattrs.h"
 #include "exec/memop.h"
 #include "exec/ramlist.h"
+#include "exec/tswap.h"
 #include "qemu/bswap.h"
 #include "qemu/queue.h"
 #include "qemu/int128.h"
@@ -2732,6 +2733,12 @@ MemTxResult address_space_write_rom(AddressSpace *as, 
hwaddr addr,
 #define ARG1_DECLAddressSpace *as
 #include "exec/memory_ldst.h.inc"
 
+static inline void stl_phys_notdirty(AddressSpace *as, hwaddr addr, uint32_t 
val)
+{
+address_space_stl_notdirty(as, addr, val,
+   MEMTXATTRS_UNSPECIFIED, NULL);
+}
+
 #define SUFFIX
 #define ARG1 as
 #define ARG1_DECLAddressSpace *as
@@ -2798,6 +2805,9 @@ static inline void 
address_space_stb_cached(MemoryRegionCache *cache,
 }
 }
 
+#define ENDIANNESS
+#include "exec/memory_ldst_cached.h.inc"
+
 #define ENDIANNESS   _le
 #include "exec/memory_ldst_cached.h.inc"
 
diff --git a/include/exec/memory_ldst_phys.h.inc 
b/include/exec/memory_ldst_phys.h.inc
index ecd678610d1..db67de75251 100644
--- a/include/exec/memory_ldst_phys.h.inc
+++ b/include/exec/memory_ldst_phys.h.inc
@@ -19,7 +19,6 @@
  * License along with this library; if not, see .
  */
 
-#ifdef TARGET_ENDIANNESS
 static inline uint16_t glue(lduw_phys, SUFFIX)(ARG1_DECL, hwaddr addr)
 {
 return glue(address_space_lduw, SUFFIX)(ARG1, addr,
@@ -55,7 +54,7 @@ static inline void glue(stq_phys, SUFFIX)(ARG1_DECL, hwaddr 
addr, uint64_t val)
 glue(address_space_stq, SUFFIX)(ARG1, addr, val,
 MEMTXATTRS_UNSPECIFIED, NULL);
 }
-#else
+
 static inline uint8_t glue(ldub_phys, SUFFIX)(ARG1_DECL, hwaddr addr)
 {
 return glue(address_space_ldub, SUFFIX)(ARG1, addr,
@@ -139,9 +138,7 @@ static inline void glue(stq_be_phys, SUFFIX)(ARG1_DECL, 
hwaddr addr, uint64_t va
 glue(address_space_stq_be, SUFFIX)(ARG1, addr, val,
MEMTXATTRS_UNSPECIFIED, NULL);
 }
-#endif
 
 #undef ARG1_DECL
 #undef ARG1
 #undef SUFFIX
-#undef TARGET_ENDIANNESS
-- 
2.39.5




[PATCH v5 05/17] exec/memory.h: make devend_memop "target defines" agnostic

2025-03-14 Thread Pierrick Bouvier
Will allow to make system/memory.c common later.

Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 
---
 include/exec/memory.h | 16 
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index da21e9150b5..069021ac3ff 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -3138,25 +3138,17 @@ address_space_write_cached(MemoryRegionCache *cache, 
hwaddr addr,
 MemTxResult address_space_set(AddressSpace *as, hwaddr addr,
   uint8_t c, hwaddr len, MemTxAttrs attrs);
 
-#ifdef COMPILING_PER_TARGET
 /* enum device_endian to MemOp.  */
 static inline MemOp devend_memop(enum device_endian end)
 {
 QEMU_BUILD_BUG_ON(DEVICE_HOST_ENDIAN != DEVICE_LITTLE_ENDIAN &&
   DEVICE_HOST_ENDIAN != DEVICE_BIG_ENDIAN);
 
-#if HOST_BIG_ENDIAN != TARGET_BIG_ENDIAN
-/* Swap if non-host endianness or native (target) endianness */
-return (end == DEVICE_HOST_ENDIAN) ? 0 : MO_BSWAP;
-#else
-const int non_host_endianness =
-DEVICE_LITTLE_ENDIAN ^ DEVICE_BIG_ENDIAN ^ DEVICE_HOST_ENDIAN;
-
-/* In this case, native (target) endianness needs no swap.  */
-return (end == non_host_endianness) ? MO_BSWAP : 0;
-#endif
+bool big_endian = (end == DEVICE_NATIVE_ENDIAN
+   ? target_words_bigendian()
+   : end == DEVICE_BIG_ENDIAN);
+return big_endian ? MO_BE : MO_LE;
 }
-#endif /* COMPILING_PER_TARGET */
 
 /*
  * Inhibit technologies that require discarding of pages in RAM blocks, e.g.,
-- 
2.39.5




[PATCH v5 16/17] system/memory: make compilation unit common

2025-03-14 Thread Pierrick Bouvier
Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 
---
 system/memory.c| 17 +
 system/meson.build |  2 +-
 2 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/system/memory.c b/system/memory.c
index 4c829793a0a..eddd21a6cdb 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -353,15 +353,6 @@ static void flatview_simplify(FlatView *view)
 }
 }
 
-static bool memory_region_big_endian(MemoryRegion *mr)
-{
-#if TARGET_BIG_ENDIAN
-return mr->ops->endianness != DEVICE_LITTLE_ENDIAN;
-#else
-return mr->ops->endianness == DEVICE_BIG_ENDIAN;
-#endif
-}
-
 static void adjust_endianness(MemoryRegion *mr, uint64_t *data, MemOp op)
 {
 if ((op & MO_BSWAP) != devend_memop(mr->ops->endianness)) {
@@ -563,7 +554,7 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
 /* FIXME: support unaligned access? */
 access_size = MAX(MIN(size, access_size_max), access_size_min);
 access_mask = MAKE_64BIT_MASK(0, access_size * 8);
-if (memory_region_big_endian(mr)) {
+if (devend_big_endian(mr->ops->endianness)) {
 for (i = 0; i < size; i += access_size) {
 r |= access_fn(mr, addr + i, value, access_size,
 (size - access_size - i) * 8, access_mask, attrs);
@@ -2584,7 +2575,8 @@ void memory_region_add_eventfd(MemoryRegion *mr,
 unsigned i;
 
 if (size) {
-adjust_endianness(mr, &mrfd.data, size_memop(size) | MO_TE);
+MemOp mop = (target_words_bigendian() ? MO_BE : MO_LE) | 
size_memop(size);
+adjust_endianness(mr, &mrfd.data, mop);
 }
 memory_region_transaction_begin();
 for (i = 0; i < mr->ioeventfd_nb; ++i) {
@@ -2619,7 +2611,8 @@ void memory_region_del_eventfd(MemoryRegion *mr,
 unsigned i;
 
 if (size) {
-adjust_endianness(mr, &mrfd.data, size_memop(size) | MO_TE);
+MemOp mop = (target_words_bigendian() ? MO_BE : MO_LE) | 
size_memop(size);
+adjust_endianness(mr, &mrfd.data, mop);
 }
 memory_region_transaction_begin();
 for (i = 0; i < mr->ioeventfd_nb; ++i) {
diff --git a/system/meson.build b/system/meson.build
index bd82ef132e7..4f44b78df31 100644
--- a/system/meson.build
+++ b/system/meson.build
@@ -2,7 +2,6 @@ specific_ss.add(when: 'CONFIG_SYSTEM_ONLY', if_true: [files(
   'arch_init.c',
   'ioport.c',
   'globals-target.c',
-  'memory.c',
 )])
 
 system_ss.add(files(
@@ -15,6 +14,7 @@ system_ss.add(files(
   'dma-helpers.c',
   'globals.c',
   'memory_mapping.c',
+  'memory.c',
   'physmem.c',
   'qdev-monitor.c',
   'qtest.c',
-- 
2.39.5




Re: [PATCH 2/2] utils/qemu-sockets: Introduce keep-alive-idle-period inet socket option

2025-03-14 Thread Vladimir Sementsov-Ogievskiy

On 03.03.25 17:33, Juraj Marcin wrote:

The default idle period for TCP connection could be even 2 hours.
However, in some cases, the application needs to be aware of a
connection issue much sooner.

This is the case, for example, for postcopy live migration. If there is
no traffic from the migration destination guest (server-side) to the
migration source guest (client-side), the destination keeps waiting for
pages indefinitely and does not switch to the postcopy-paused state.
This can happen, for example, if the destination QEMU instance is
started with '-S' command line option and the machine is not started yet
or if the machine is idle and produces no new page faults for
not-yet-migrated pages.

This patch introduces a new inet socket parameter on platforms where
TCP_KEEPIDLE is defined and passes the configured value to the
TCP_KEEPIDLE socket option when a client-side or server-side socket is
created.

The default value is 0, which means system configuration is used, and no
custom value is set.

Signed-off-by: Juraj Marcin 
---
  io/dns-resolver.c   |  4 
  meson.build |  2 ++
  qapi/sockets.json   |  5 +
  util/qemu-sockets.c | 39 +++
  4 files changed, 50 insertions(+)

diff --git a/io/dns-resolver.c b/io/dns-resolver.c
index ee7403b65b..03c59673f0 100644
--- a/io/dns-resolver.c
+++ b/io/dns-resolver.c
@@ -128,6 +128,10 @@ static int 
qio_dns_resolver_lookup_sync_inet(QIODNSResolver *resolver,
  #endif
  .has_keep_alive = iaddr->has_keep_alive,
  .keep_alive = iaddr->keep_alive,
+#ifdef HAVE_TCP_KEEPIDLE
+.has_keep_alive_idle_period = iaddr->has_keep_alive_idle_period,
+.keep_alive_idle_period = iaddr->keep_alive_idle_period,
+#endif
  };
  
  (*addrs)[i] = newaddr;

diff --git a/meson.build b/meson.build
index 0ee79c664d..ca726f317f 100644
--- a/meson.build
+++ b/meson.build
@@ -2724,6 +2724,8 @@ config_host_data.set('HAVE_OPTRESET',
   cc.has_header_symbol('getopt.h', 'optreset'))
  config_host_data.set('HAVE_IPPROTO_MPTCP',
   cc.has_header_symbol('netinet/in.h', 'IPPROTO_MPTCP'))
+config_host_data.set('HAVE_TCP_KEEPIDLE',
+ cc.has_header_symbol('netinet/tcp.h', 'TCP_KEEPIDLE'))
  
  # has_member

  config_host_data.set('HAVE_SIGEV_NOTIFY_THREAD_ID',
diff --git a/qapi/sockets.json b/qapi/sockets.json
index eb50f64e3a..ddd82b1e66 100644
--- a/qapi/sockets.json
+++ b/qapi/sockets.json
@@ -59,6 +59,10 @@
  # @keep-alive: enable keep-alive when connecting to/listening on this socket.
  # (Since 4.2, not supported for listening sockets until 10.0)
  #
+# @keep-alive-idle-period: time in seconds the connection needs to be idle
+# before sending a keepalive packet.  Only supported for TCP sockets on
+# systems where TCP_KEEPIDLE socket option is defined.  (Since 10.0)
+#
  # @mptcp: enable multi-path TCP.  (Since 6.1)
  #
  # Since: 1.3
@@ -71,6 +75,7 @@
  '*ipv4': 'bool',
  '*ipv6': 'bool',
  '*keep-alive': 'bool',
+'*keep-alive-idle-period': { 'type': 'uint32', 'if': 'HAVE_TCP_KEEPIDLE' },
  '*mptcp': { 'type': 'bool', 'if': 'HAVE_IPPROTO_MPTCP' } } }
  
  ##

diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index c30e4ac2ce..edcda846ef 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -349,6 +349,18 @@ listen_ok:
  slisten = -1;
  goto exit;
  }
+#ifdef HAVE_TCP_KEEPIDLE
+if (saddr->has_keep_alive_idle_period && 
saddr->keep_alive_idle_period) {
+int keepidle = saddr->has_keep_alive_idle_period;
+ret = setsockopt(slisten, IPPROTO_TCP, TCP_KEEPIDLE, &keepidle, 
sizeof(keepidle));
+if (ret < 0) {
+error_setg_errno(errp, errno, "Unable to set TCP_KEEPIDLE");
+close(slisten);
+slisten = -1;
+goto exit;
+}
+}
+#endif
  }
  exit:
  freeaddrinfo(res);
@@ -492,6 +504,17 @@ int inet_connect_saddr(InetSocketAddress *saddr, Error 
**errp)
  close(sock);
  return -1;
  }
+#ifdef HAVE_TCP_KEEPIDLE
+if (saddr->has_keep_alive_idle_period && 
saddr->keep_alive_idle_period) {
+int keepidle = saddr->keep_alive_idle_period;
+ret = setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE, &keepidle, 
sizeof(keepidle));
+if (ret < 0) {
+error_setg_errno(errp, errno, "Unsable to set TCP_KEEPIDLE");
+close(sock);
+return -1;
+}
+}
+#endif
  }



Looks like we need some inet_setsockopt(*saddr) {}, which set both options and 
called from both client and server sockets, to avoid duplication.

  
  return sock;

@@ -698,6 +721,22 @@ int inet_parse(InetSocketAddress *addr, const char *str, 
Error **errp)
  }
  addr->has_keep_alive = true;
  }
+#ifdef HAVE_TCP_KEEPIDLE
+begin = strs

[PATCH v5 14/17] include/exec/memory: extract devend_big_endian from devend_memop

2025-03-14 Thread Pierrick Bouvier
we'll use it in system/memory.c.

Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 
---
 include/exec/memory.h | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 069021ac3ff..70177304a92 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -3138,16 +3138,22 @@ address_space_write_cached(MemoryRegionCache *cache, 
hwaddr addr,
 MemTxResult address_space_set(AddressSpace *as, hwaddr addr,
   uint8_t c, hwaddr len, MemTxAttrs attrs);
 
-/* enum device_endian to MemOp.  */
-static inline MemOp devend_memop(enum device_endian end)
+/* returns true if end is big endian. */
+static inline bool devend_big_endian(enum device_endian end)
 {
 QEMU_BUILD_BUG_ON(DEVICE_HOST_ENDIAN != DEVICE_LITTLE_ENDIAN &&
   DEVICE_HOST_ENDIAN != DEVICE_BIG_ENDIAN);
 
-bool big_endian = (end == DEVICE_NATIVE_ENDIAN
-   ? target_words_bigendian()
-   : end == DEVICE_BIG_ENDIAN);
-return big_endian ? MO_BE : MO_LE;
+if (end == DEVICE_NATIVE_ENDIAN) {
+return target_words_bigendian();
+}
+return end == DEVICE_BIG_ENDIAN;
+}
+
+/* enum device_endian to MemOp.  */
+static inline MemOp devend_memop(enum device_endian end)
+{
+return devend_big_endian(end) ? MO_BE : MO_LE;
 }
 
 /*
-- 
2.39.5




[PATCH v5 12/17] hw/xen: add stubs for various functions

2025-03-14 Thread Pierrick Bouvier
Those symbols are used by system/physmem.c, and are called only if
xen_enabled() (which happens only if CONFIG_XEN is set and xen is
available).

So we can crash the stubs in case those are called, as they are linked
only when CONFIG_XEN is not set.

Acked-by: Richard Henderson 
Acked-by: Anthony PERARD 
Signed-off-by: Pierrick Bouvier 
---
 hw/xen/xen_stubs.c | 51 ++
 hw/xen/meson.build |  3 +++
 2 files changed, 54 insertions(+)
 create mode 100644 hw/xen/xen_stubs.c

diff --git a/hw/xen/xen_stubs.c b/hw/xen/xen_stubs.c
new file mode 100644
index 000..5e565df3929
--- /dev/null
+++ b/hw/xen/xen_stubs.c
@@ -0,0 +1,51 @@
+/*
+ * Various stubs for xen functions
+ *
+ * Those functions are used only if xen_enabled(). This file is linked only if
+ * CONFIG_XEN is not set, so they should never be called.
+ *
+ * Copyright (c) 2025 Linaro, Ltd.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "system/xen.h"
+#include "system/xen-mapcache.h"
+
+void xen_hvm_modified_memory(ram_addr_t start, ram_addr_t length)
+{
+g_assert_not_reached();
+}
+
+void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size,
+   struct MemoryRegion *mr, Error **errp)
+{
+g_assert_not_reached();
+}
+
+bool xen_mr_is_memory(MemoryRegion *mr)
+{
+g_assert_not_reached();
+}
+
+void xen_invalidate_map_cache_entry(uint8_t *buffer)
+{
+g_assert_not_reached();
+}
+
+ram_addr_t xen_ram_addr_from_mapcache(void *ptr)
+{
+g_assert_not_reached();
+}
+
+uint8_t *xen_map_cache(MemoryRegion *mr,
+   hwaddr phys_addr,
+   hwaddr size,
+   ram_addr_t ram_addr_offset,
+   uint8_t lock,
+   bool dma,
+   bool is_write)
+{
+g_assert_not_reached();
+}
diff --git a/hw/xen/meson.build b/hw/xen/meson.build
index 4a486e36738..a1850e76988 100644
--- a/hw/xen/meson.build
+++ b/hw/xen/meson.build
@@ -9,6 +9,9 @@ system_ss.add(when: ['CONFIG_XEN_BUS'], if_true: files(
 
 system_ss.add(when: ['CONFIG_XEN', xen], if_true: files(
   'xen-operations.c',
+),
+if_false: files(
+  'xen_stubs.c',
 ))
 
 xen_specific_ss = ss.source_set()
-- 
2.39.5




Re: [PATCH v2] blockdev-backup: Add error handling option for copy-before-write jobs

2025-03-14 Thread Vladimir Sementsov-Ogievskiy

On 04.03.25 12:17, Raman Dzehtsiar wrote:

This patch extends the blockdev-backup QMP command to allow users to specify
how to behave when IO errors occur during copy-before-write operations.
Previously, the behavior was fixed and could not be controlled by the user.

The new 'on-cbw-error' option can be set to one of two values:
- 'break-guest-write': Forwards the IO error to the guest and triggers
   the on-source-error policy. This preserves snapshot integrity at the
   expense of guest IO operations.
- 'break-snapshot': Allows the guest OS to continue running normally,
   but invalidates the snapshot and aborts related jobs. This prioritizes
   guest operation over backup consistency.

This enhancement provides more flexibility for backup operations in different
environments where requirements for guest availability versus backup
consistency may vary.

The default behavior remains unchanged to maintain backward compatibility.

Signed-off-by: Raman Dzehtsiar



Hi Raman, sorry for a delay!

The patch looks good to me. Still, could you also provide a test for a new 
option?

Probably the simplest would be add a test-case to 
`tests/qemu-iotests/tests/copy-before-write`, where existing on-cbw-error 
option is tested. Or you can make a separate test.

--
Best regards,
Vladimir




[PATCH v5 02/17] exec/tswap: implement {ld, st}.*_p as functions instead of macros

2025-03-14 Thread Pierrick Bouvier
Defining functions allows to use them from common code, by not depending
on TARGET_BIG_ENDIAN.
Remove previous macros from exec/cpu-all.h.
By moving them out of cpu-all.h, we'll be able to break dependency on
cpu.h for memory related functions coming in next commits.

Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 
---
 include/exec/cpu-all.h | 25 ---
 include/exec/tswap.h   | 70 ++
 2 files changed, 70 insertions(+), 25 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 8cd6c00cf89..e56c064d46f 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -38,31 +38,6 @@
 #define BSWAP_NEEDED
 #endif
 
-/* Target-endianness CPU memory access functions. These fit into the
- * {ld,st}{type}{sign}{size}{endian}_p naming scheme described in bswap.h.
- */
-#if TARGET_BIG_ENDIAN
-#define lduw_p(p) lduw_be_p(p)
-#define ldsw_p(p) ldsw_be_p(p)
-#define ldl_p(p) ldl_be_p(p)
-#define ldq_p(p) ldq_be_p(p)
-#define stw_p(p, v) stw_be_p(p, v)
-#define stl_p(p, v) stl_be_p(p, v)
-#define stq_p(p, v) stq_be_p(p, v)
-#define ldn_p(p, sz) ldn_be_p(p, sz)
-#define stn_p(p, sz, v) stn_be_p(p, sz, v)
-#else
-#define lduw_p(p) lduw_le_p(p)
-#define ldsw_p(p) ldsw_le_p(p)
-#define ldl_p(p) ldl_le_p(p)
-#define ldq_p(p) ldq_le_p(p)
-#define stw_p(p, v) stw_le_p(p, v)
-#define stl_p(p, v) stl_le_p(p, v)
-#define stq_p(p, v) stq_le_p(p, v)
-#define ldn_p(p, sz) ldn_le_p(p, sz)
-#define stn_p(p, sz, v) stn_le_p(p, sz, v)
-#endif
-
 /* MMU memory access macros */
 
 #if !defined(CONFIG_USER_ONLY)
diff --git a/include/exec/tswap.h b/include/exec/tswap.h
index 2683da0adb7..84060a49994 100644
--- a/include/exec/tswap.h
+++ b/include/exec/tswap.h
@@ -80,4 +80,74 @@ static inline void tswap64s(uint64_t *s)
 }
 }
 
+/* Return ld{word}_{le,be}_p following target endianness. */
+#define LOAD_IMPL(word, args...)\
+do {\
+if (target_words_bigendian()) { \
+return glue(glue(ld, word), _be_p)(args);   \
+} else {\
+return glue(glue(ld, word), _le_p)(args);   \
+}   \
+} while (0)
+
+static inline int lduw_p(const void *ptr)
+{
+LOAD_IMPL(uw, ptr);
+}
+
+static inline int ldsw_p(const void *ptr)
+{
+LOAD_IMPL(sw, ptr);
+}
+
+static inline int ldl_p(const void *ptr)
+{
+LOAD_IMPL(l, ptr);
+}
+
+static inline uint64_t ldq_p(const void *ptr)
+{
+LOAD_IMPL(q, ptr);
+}
+
+static inline uint64_t ldn_p(const void *ptr, int sz)
+{
+LOAD_IMPL(n, ptr, sz);
+}
+
+#undef LOAD_IMPL
+
+/* Call st{word}_{le,be}_p following target endianness. */
+#define STORE_IMPL(word, args...)   \
+do {\
+if (target_words_bigendian()) { \
+glue(glue(st, word), _be_p)(args);  \
+} else {\
+glue(glue(st, word), _le_p)(args);  \
+}   \
+} while (0)
+
+
+static inline void stw_p(void *ptr, uint16_t v)
+{
+STORE_IMPL(w, ptr, v);
+}
+
+static inline void stl_p(void *ptr, uint32_t v)
+{
+STORE_IMPL(l, ptr, v);
+}
+
+static inline void stq_p(void *ptr, uint64_t v)
+{
+STORE_IMPL(q, ptr, v);
+}
+
+static inline void stn_p(void *ptr, int sz, uint64_t v)
+{
+STORE_IMPL(n, ptr, sz, v);
+}
+
+#undef STORE_IMPL
+
 #endif  /* TSWAP_H */
-- 
2.39.5




Re: [PATCH v5 00/17] make system memory API available for common code

2025-03-14 Thread Pierrick Bouvier

Hi,

one patch is missing review:
[PATCH v5 12/17] hw/xen: add stubs for various functions.

Regards,
Pierrick

On 3/14/25 10:31, Pierrick Bouvier wrote:

The main goal of this series is to be able to call any memory ld/st function
from code that is *not* target dependent. As a positive side effect, we can
turn related system compilation units into common code.

The first 5 patches remove dependency of memory API to cpu headers and remove
dependency to target specific code. This could be a series on its own, but it's
great to be able to turn system memory compilation units into common code to
make sure it can't regress, and prove it achieves the desired result.

The next patches remove more dependencies on cpu headers (exec-all,
memory-internal, ram_addr).
Then, we add access to a needed function from kvm, some xen stubs, and we
finally can turn our compilation units into common code.

Every commit was tested to build correctly for all targets (on windows, linux,
macos), and the series was fully tested by running all tests we have (linux,
x86_64 host).

v2:
- reorder first commits (tswap change first, so memory cached functions can use 
it)
- move st/ld*_p functions to tswap instead of bswap
- add define for target_words_bigendian when COMPILING_PER_TARGET, equals to
   TARGET_BIG_ENDIAN (avoid overhead in target code)
- rewrite devend_memop
- remove useless exec-all.h in concerned patch
- extract devend_big_endian function to reuse in system/memory.c
- rewrite changes to system/memory.c

v3:
- move devend functions to memory_internal.h
- completed description for commits removing cpu.h dependency

v4:
- rebase on top of master
   * missing include in 'codebase: prepare to remove cpu.h from exec/exec-all.h'
   * meson build conflict

v5:
- remove extra xen stub xen_invalidate_map_cache()
- edit xen stubs commit message

Pierrick Bouvier (17):
   exec/tswap: target code can use TARGET_BIG_ENDIAN instead of
 target_words_bigendian()
   exec/tswap: implement {ld,st}.*_p as functions instead of macros
   exec/memory_ldst: extract memory_ldst declarations from cpu-all.h
   exec/memory_ldst_phys: extract memory_ldst_phys declarations from
 cpu-all.h
   exec/memory.h: make devend_memop "target defines" agnostic
   codebase: prepare to remove cpu.h from exec/exec-all.h
   exec/exec-all: remove dependency on cpu.h
   exec/memory-internal: remove dependency on cpu.h
   exec/ram_addr: remove dependency on cpu.h
   system/kvm: make kvm_flush_coalesced_mmio_buffer() accessible for
 common code
   exec/ram_addr: call xen_hvm_modified_memory only if xen is enabled
   hw/xen: add stubs for various functions
   system/physmem: compilation unit is now common to all targets
   include/exec/memory: extract devend_big_endian from devend_memop
   include/exec/memory: move devend functions to memory-internal.h
   system/memory: make compilation unit common
   system/ioport: make compilation unit common

  include/exec/cpu-all.h  | 66 ---
  include/exec/exec-all.h |  1 -
  include/exec/memory-internal.h  | 21 +++-
  include/exec/memory.h   | 30 ---
  include/exec/ram_addr.h | 11 ++--
  include/exec/tswap.h| 81 +++--
  include/system/kvm.h|  6 +--
  include/tcg/tcg-op.h|  1 +
  target/ppc/helper_regs.h|  2 +
  include/exec/memory_ldst.h.inc  |  4 --
  include/exec/memory_ldst_phys.h.inc |  5 +-
  cpu-target.c|  1 +
  hw/ppc/spapr_nested.c   |  1 +
  hw/sh4/sh7750.c |  1 +
  hw/xen/xen_stubs.c  | 51 ++
  page-vary-target.c  |  2 +-
  system/ioport.c |  1 -
  system/memory.c | 17 ++
  target/ppc/tcg-excp_helper.c|  1 +
  target/riscv/bitmanip_helper.c  |  2 +-
  hw/xen/meson.build  |  3 ++
  system/meson.build  |  6 +--
  22 files changed, 188 insertions(+), 126 deletions(-)
  create mode 100644 hw/xen/xen_stubs.c






[PATCH v5 17/17] system/ioport: make compilation unit common

2025-03-14 Thread Pierrick Bouvier
Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 
---
 system/ioport.c| 1 -
 system/meson.build | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/system/ioport.c b/system/ioport.c
index 55c2a752396..89daae9d602 100644
--- a/system/ioport.c
+++ b/system/ioport.c
@@ -26,7 +26,6 @@
  */
 
 #include "qemu/osdep.h"
-#include "cpu.h"
 #include "exec/ioport.h"
 #include "exec/memory.h"
 #include "exec/address-spaces.h"
diff --git a/system/meson.build b/system/meson.build
index 4f44b78df31..063301c3ad0 100644
--- a/system/meson.build
+++ b/system/meson.build
@@ -1,6 +1,5 @@
 specific_ss.add(when: 'CONFIG_SYSTEM_ONLY', if_true: [files(
   'arch_init.c',
-  'ioport.c',
   'globals-target.c',
 )])
 
@@ -13,6 +12,7 @@ system_ss.add(files(
   'dirtylimit.c',
   'dma-helpers.c',
   'globals.c',
+  'ioport.c',
   'memory_mapping.c',
   'memory.c',
   'physmem.c',
-- 
2.39.5




[PATCH v5 13/17] system/physmem: compilation unit is now common to all targets

2025-03-14 Thread Pierrick Bouvier
Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 
---
 system/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/system/meson.build b/system/meson.build
index eec07a94513..bd82ef132e7 100644
--- a/system/meson.build
+++ b/system/meson.build
@@ -3,7 +3,6 @@ specific_ss.add(when: 'CONFIG_SYSTEM_ONLY', if_true: [files(
   'ioport.c',
   'globals-target.c',
   'memory.c',
-  'physmem.c',
 )])
 
 system_ss.add(files(
@@ -16,6 +15,7 @@ system_ss.add(files(
   'dma-helpers.c',
   'globals.c',
   'memory_mapping.c',
+  'physmem.c',
   'qdev-monitor.c',
   'qtest.c',
   'rtc.c',
-- 
2.39.5




[PATCH v5 10/17] system/kvm: make kvm_flush_coalesced_mmio_buffer() accessible for common code

2025-03-14 Thread Pierrick Bouvier
This function is used by system/physmem.c will be turn into common code
in next commit.

Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 
---
 include/system/kvm.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/system/kvm.h b/include/system/kvm.h
index ab17c09a551..21da3b8b052 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -210,11 +210,11 @@ bool kvm_arm_supports_user_irq(void);
 int kvm_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
 int kvm_on_sigbus(int code, void *addr);
 
-#ifdef COMPILING_PER_TARGET
-#include "cpu.h"
-
 void kvm_flush_coalesced_mmio_buffer(void);
 
+#ifdef COMPILING_PER_TARGET
+#include "cpu.h"
+
 /**
  * kvm_update_guest_debug(): ensure KVM debug structures updated
  * @cs: the CPUState for this cpu
-- 
2.39.5




Re: [PATCH 32/37] include/hw/intc: Remove ifndef CONFIG_USER_ONLY from armv7m_nvic.h

2025-03-14 Thread Richard Henderson

On 3/13/25 14:00, Pierrick Bouvier wrote:

diff --git a/include/hw/intc/armv7m_nvic.h b/include/hw/intc/armv7m_nvic.h
index 89fe8aedaa..7b9964fe7e 100644
--- a/include/hw/intc/armv7m_nvic.h
+++ b/include/hw/intc/armv7m_nvic.h
@@ -189,21 +189,7 @@ int armv7m_nvic_raw_execution_priority(NVICState *s);
   * @secure: the security state to test
   * This corresponds to the pseudocode IsReqExecPriNeg().
   */
-#ifndef CONFIG_USER_ONLY
  bool armv7m_nvic_neg_prio_requested(NVICState *s, bool secure);
-#else
-static inline bool armv7m_nvic_neg_prio_requested(NVICState *s, bool secure)
-{
-    return false;
-}
-#endif
-#ifndef CONFIG_USER_ONLY
  bool armv7m_nvic_can_take_pending_exception(NVICState *s);
-#else
-static inline bool armv7m_nvic_can_take_pending_exception(NVICState *s)
-{
-    return true;
-}
-#endif
  #endif


I'm a bit worried we might have regression when doing things this way.


I expect link errors to diagnose errors.


r~



Re: [PATCH] host/include/loongarch64: Fix inline assembly compatibility with Clang

2025-03-14 Thread Richard Henderson

On 3/13/25 20:31, Yao Zi wrote:

Clang on LoongArch only accepts fp register names in the dollar-prefixed
form, while GCC allows omitting the dollar. Change registers in ASM
clobbers to the dollar-prefixed form to make user emulators buildable
with Clang on loongarch64. No functional change invovled.

Signed-off-by: Yao Zi
---
  host/include/loongarch64/host/atomic128-ldst.h.inc| 4 ++--
  host/include/loongarch64/host/bufferiszero.c.inc  | 6 --
  host/include/loongarch64/host/load-extract-al16-al8.h.inc | 2 +-
  3 files changed, 7 insertions(+), 5 deletions(-)


Reviewed-by: Richard Henderson 

r~



[PULL 4/4] hw/qxl: fix cpr

2025-03-14 Thread Fabiano Rosas
From: Steve Sistare 

During normal migration, new QEMU creates and initializes memory regions,
then loads the preserved contents of the region from vmstate.

During CPR, memory regions are preserved in place, then the realize
method initializes the regions contents, losing the old contents.  To
fix, skip writes to the qxl memory regions during CPR load.

Reported-by: andrey.drobys...@virtuozzo.com
Tested-by: andrey.drobys...@virtuozzo.com
Signed-off-by: Steve Sistare 
Reviewed-by: Fabiano Rosas 
Message-ID: <1741380954-341079-5-git-send-email-steven.sist...@oracle.com>
Signed-off-by: Fabiano Rosas 
---
 hw/display/qxl.c | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/hw/display/qxl.c b/hw/display/qxl.c
index 2efdc77e61..da14da5209 100644
--- a/hw/display/qxl.c
+++ b/hw/display/qxl.c
@@ -30,6 +30,7 @@
 #include "qemu/module.h"
 #include "hw/qdev-properties.h"
 #include "system/runstate.h"
+#include "migration/cpr.h"
 #include "migration/vmstate.h"
 #include "trace.h"
 
@@ -333,6 +334,10 @@ static void init_qxl_rom(PCIQXLDevice *d)
 uint32_t fb;
 int i, n;
 
+if (cpr_is_incoming()) {
+goto skip_init;
+}
+
 memset(rom, 0, d->rom_size);
 
 rom->magic = cpu_to_le32(QXL_ROM_MAGIC);
@@ -390,6 +395,7 @@ static void init_qxl_rom(PCIQXLDevice *d)
 sizeof(rom->client_monitors_config));
 }
 
+skip_init:
 d->shadow_rom = *rom;
 d->rom= rom;
 d->modes  = modes;
@@ -403,6 +409,9 @@ static void init_qxl_ram(PCIQXLDevice *d)
 
 buf = d->vga.vram_ptr;
 d->ram = (QXLRam *)(buf + le32_to_cpu(d->shadow_rom.ram_header_offset));
+if (cpr_is_incoming()) {
+return;
+}
 d->ram->magic   = cpu_to_le32(QXL_RAM_MAGIC);
 d->ram->int_pending = cpu_to_le32(0);
 d->ram->int_mask= cpu_to_le32(0);
@@ -539,6 +548,10 @@ static void interface_set_compression_level(QXLInstance 
*sin, int level)
 
 trace_qxl_interface_set_compression_level(qxl->id, level);
 qxl->shadow_rom.compression_level = cpu_to_le32(level);
+if (cpr_is_incoming()) {
+assert(qxl->rom->compression_level == cpu_to_le32(level));
+return;
+}
 qxl->rom->compression_level = cpu_to_le32(level);
 qxl_rom_set_dirty(qxl);
 }
@@ -997,7 +1010,8 @@ static void interface_set_client_capabilities(QXLInstance 
*sin,
 }
 
 if (runstate_check(RUN_STATE_INMIGRATE) ||
-runstate_check(RUN_STATE_POSTMIGRATE)) {
+runstate_check(RUN_STATE_POSTMIGRATE) ||
+cpr_is_incoming()) {
 return;
 }
 
@@ -1200,6 +1214,10 @@ static void qxl_reset_state(PCIQXLDevice *d)
 {
 QXLRom *rom = d->rom;
 
+if (cpr_is_incoming()) {
+return;
+}
+
 qxl_check_state(d);
 d->shadow_rom.update_id = cpu_to_le32(0);
 *rom = d->shadow_rom;
@@ -1370,8 +1388,11 @@ static int qxl_add_memslot(PCIQXLDevice *d, uint32_t 
slot_id, uint64_t delta,
 memslot.virt_start = virt_start + (guest_start - pci_start);
 memslot.virt_end   = virt_start + (guest_end   - pci_start);
 memslot.addr_delta = memslot.virt_start - delta;
-memslot.generation = d->rom->slot_generation = 0;
-qxl_rom_set_dirty(d);
+if (!cpr_is_incoming()) {
+d->rom->slot_generation = 0;
+qxl_rom_set_dirty(d);
+}
+memslot.generation = d->rom->slot_generation;
 
 qemu_spice_add_memslot(&d->ssd, &memslot, async);
 d->guest_slots[slot_id].mr = mr;
-- 
2.35.3




Re: [PATCH 0/2] gdb invalid memory access handling improvements

2025-03-14 Thread David Hildenbrand

On 14.03.25 08:41, Nicholas Piggin wrote:

This adds .debug=1 attribute for GDB's phys mem access mode, adds
memory transaction error handling for it so it reports cannot access
memory instead of silent success, and silences warning logs for
invalid memory access coming from the debugger.


Nothing jumped at me, with Richard's suggestion of keeping the new 
function limited in scope


Reviewed-by: David Hildenbrand 

--
Cheers,

David / dhildenb




Re: [PATCH 32/37] include/hw/intc: Remove ifndef CONFIG_USER_ONLY from armv7m_nvic.h

2025-03-14 Thread Pierrick Bouvier

On 3/14/25 13:59, Richard Henderson wrote:

On 3/14/25 13:34, Pierrick Bouvier wrote:

On 3/14/25 13:03, Richard Henderson wrote:

I'm not quite sure what you're arguing for here.
A build-time error is vastly preferable to a run-time error.



Even though this specific patch is safe (code calling those functions should be 
under
system anyway, so it should not be linked in a user binary), I just wonder if 
we should
not add explicit checks for this, for other kind of replacement we'll have to 
do.


Any such runtime check should not be for "system" vs "user", but for "feature 
enabled".



From our build system point of view, I don't really see what is the 
difference. That's part of why I insisted previously on cleaning user vs 
system ifdefs at the same time we make files common, but the answer I 
had was "We don't need to do that now", so I stopped arguing.


When building user vs system, we use ifdef to detect this case, and we 
select different implementations and include a specific set of source 
files, the same way we do for some features, or specific targets. So, 
why should it be treated differently, or later?



If it's a lesser used configuration or feature, a run-time error could lay 
dormant for
years before a user encounters it.



Sure, but wouldn't it better to have an explicit assert, instead of observing a 
random
behaviour?


What random behaviour are you suggesting?



In case we return a stub value by accident, when you expect to have a 
side effect done somewhere. And that it would not result in an immediate 
crash, but later at a random place.


Maybe the case just does not exist, but my point was that it does not 
cost anything to add an assert just in case.



I'm just worried we end up calling something we should not (user vs system, or 
any other
ifdef CONFIG/TARGET that might be hidden somewhere), and silently return a 
wrong value,
which would not be covered by our test suite.


Where is the wrong value going to be returned from, the stub?
Yes, many stubs do abort, typically after the "enabled" stub returns false.



Many, but not all of them I guess.


It's still best if "feature enabled" is compile-time false when possible, such 
that
everything after the feature check gets compiled out.  At which point we don't 
need the
stubs: they won't be reachable and errors are caught at build-time.



Sure, but as we saw, it's not always possible, because some calls are 
under: if (X_enabled()) {...do_X()...}, which requires to be able to 
link do_X even though it will be dead code at runtime.


Maybe my concern is unfounded, but seeing some of the inline stubs and 
the ifdefs associated, I hope we won't miss a corner case.




r~





  1   2   >