Re: [PATCH v8 0/7] Allow to enable multifd and postcopy migration together

2025-04-11 Thread Prasad Pandit
Hi,

On Fri, 11 Apr 2025 at 01:48, Fabiano Rosas  wrote:
> That's what it looks like. It could be some error condition that is not
> being propagated properly. The thread hits an error and exits without
> informing the rest of migration.

* gdb(1) hanging in postcopy_ram_fault_thread() is not conclusive. I
tried to set the following breakpoints:

(gdb) break postcopy-ram.c:998  - poll_result = poll(pfd, pfd_len, -1 /* Wait forever */);
(gdb) break postcopy-ram.c:1057 - rb = qemu_ram_block_from_host(...);

  gdb(1) hangs on both of them, so there might be another reason for it.
Live migration also stalls with it.

> Some combination of the postcopy traces should give you that. Sorry,
> Peter Xu really is the expert on postcopy, I just tag along.

* I see. Maybe it could be logged with the --migration-debug= option.

> The snippet I posted shows that it's the same page:
>
> (gdb) x/i $pc
> => 0x75399d14 <__memcpy_evex_unaligned_erms+86>: rep movsb %ds:(%rsi),%es:(%rdi)
> (gdb) p/x $rsi
> $1 = 0x7fffd68cc000
>
===
>> Thread 1 (Thread 0x7fbc4849df80 (LWP 7487) "qemu-system-x86"):
...
>> Thread 10 (Thread 0x7fffce7fc700 (LWP 11778) "mig/dst/listen"):
...
>> Thread 9 (Thread 0x7fffceffd700 (LWP 11777) "mig/dst/fault"):
#0  0x75314a89 in __GI___poll (fds=0x7fffcb60, nfds=2,
timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
...
postcopy_ram_fault_thread_request Request for HVA=0x7fffd68cc000
rb=pc.ram offset=0xcc000 pid=11754
===

* Looking at the above data, it seems the missing page fault occurred
in thread=11754, so it may not be the memcpy(3) in thread 1
(pid/tid=7487) that triggered the fault.

* Secondly, the 'mig/dst/fault' thread is waiting at the poll(2) call,
i.e. the fault notification has not yet arrived on the
mis->userfault_fd or mis->userfault_event_fd descriptors. So the
"Request for HVA=0x7fffd..." via postcopy_ram_fault_thread_request()
could be an already served request.
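
For reference, the blocking behaviour discussed here can be reproduced outside QEMU. The sketch below is only an illustration and is not the code in postcopy-ram.c: two pipes stand in for mis->userfault_fd and mis->userfault_event_fd, and the helper names wait_for_either() and demo_wait() are made up for this example. It shows that poll(2) only returns once one of the two descriptors has data, so a thread parked in poll() has not yet been handed a new notification.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait until one of two descriptors becomes readable.
 * Returns 0 or 1 for the first readable descriptor, -1 on timeout or error.
 */
int wait_for_either(int fd_a, int fd_b, int timeout_ms)
{
    struct pollfd pfd[2] = {
        { .fd = fd_a, .events = POLLIN },
        { .fd = fd_b, .events = POLLIN },
    };

    if (poll(pfd, 2, timeout_ms) <= 0) {
        return -1; /* timeout (0) or error (-1): nothing was delivered */
    }
    if (pfd[0].revents & POLLIN) {
        return 0;
    }
    if (pfd[1].revents & POLLIN) {
        return 1;
    }
    return -1;
}

/*
 * Hypothetical demo: two pipes play the role of the two userfault
 * descriptors. If notify_second is non-zero, one byte is queued on the
 * second pipe before polling.
 */
int demo_wait(int notify_second)
{
    int a[2], b[2];
    int ready;
    char byte = 'x';

    if (pipe(a) < 0) {
        return -2;
    }
    if (pipe(b) < 0) {
        close(a[0]);
        close(a[1]);
        return -2;
    }
    if (notify_second) {
        ssize_t n = write(b[1], &byte, 1);
        assert(n == 1);
    }

    /* A finite timeout is used here so the demo cannot hang forever. */
    ready = wait_for_either(a[0], b[0], 100);

    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return ready;
}
```

With a timeout of -1, as in the fault thread, the idle case would simply never return instead of reporting -1.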


> Send your next version and I'll set some time aside to debug this.
>
> heads-up: I'll be off from 2025/04/18 until 2025/05/05. Peter should be
> already back in the meantime.

* Okay, I'll send the next version.

Thank you.
---
  - Prasad




Re: [PATCH 1/2] hw: usb: xhci: Add property to support writing ERSTBA in high-low order

2025-04-11 Thread Nicholas Piggin
On Sun Apr 6, 2025 at 12:00 AM AEST, Guenter Roeck wrote:
> According to the XHCI specification, ERSTBA should be written in Low-High
> order. The Linux kernel writes the high word first. This results in an
> initialization failure.

This should probably be reworded. As written it kind of implies a Linux
bug, but it's not so much that Linux does it: the hardware requires it
and Linux works around that quirk.

  According to the XHCI specification, ERSTBA should be written in Low-High
  order, however some controllers have a quirk that requires the low
  word to be written last.

>
> The following information is found in the Linux kernel commit log.
>
> [Synopsys]- The host controller was design to support ERST setting
> during the RUN state. But since there is a limitation in controller
> in supporting separate ERSTBA_HI and ERSTBA_LO programming,
> It is supported when the ERSTBA is programmed in 64bit,
> or in 32 bit mode ERSTBA_HI before ERSTBA_LO
>
> [Synopsys]- The internal initialization of event ring fetches
> the "Event Ring Segment Table Entry" based on the indication of
> ERSTBA_LO written.

Could you include a reference to the commit in the normal form?

The following information is found in the changelog for Linux kernel
commit sha ("blah").

>
> Add property to support writing the high word first.
>
> Signed-off-by: Guenter Roeck 
> ---
>  hw/usb/hcd-xhci.c | 8 +++-
>  hw/usb/hcd-xhci.h | 1 +
>  2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
> index 64c3a23b9b..8c0ba569c8 100644
> --- a/hw/usb/hcd-xhci.c
> +++ b/hw/usb/hcd-xhci.c
> @@ -3107,10 +3107,15 @@ static void xhci_runtime_write(void *ptr, hwaddr reg,
>  } else {
>  intr->erstba_low = val & 0xffc0;
>  }
> +if (xhci->erstba_hi_lo) {
> +xhci_er_reset(xhci, v);
> +}
>  break;
>  case 0x14: /* ERSTBA high */
>  intr->erstba_high = val;
> -xhci_er_reset(xhci, v);
> +if (!xhci->erstba_hi_lo) {
> +xhci_er_reset(xhci, v);
> +}
>  break;
>  case 0x18: /* ERDP low */
>  if (val & ERDP_EHB) {
> @@ -3636,6 +3641,7 @@ static const Property xhci_properties[] = {
>  DEFINE_PROP_UINT32("p3",XHCIState, numports_3, 4),
>  DEFINE_PROP_LINK("host",XHCIState, hostOpaque, TYPE_DEVICE,
>   DeviceState *),
> +DEFINE_PROP_BOOL("erstba-hi-lo", XHCIState, erstba_hi_lo, false),
>  };
>  
>  static void xhci_class_init(ObjectClass *klass, void *data)
> diff --git a/hw/usb/hcd-xhci.h b/hw/usb/hcd-xhci.h
> index 9c3974f148..cf3f074261 100644
> --- a/hw/usb/hcd-xhci.h
> +++ b/hw/usb/hcd-xhci.h
> @@ -189,6 +189,7 @@ typedef struct XHCIState {
>  uint32_t numports_3;
>  uint32_t numintrs;
>  uint32_t numslots;
> +bool erstba_hi_lo;

Could you use the "quirk" prefix for the device and property name?

With those changes,

Reviewed-by: Nicholas Piggin 

With your patch, if the target does do a 64-bit write to the address,
what happens? I wonder if that's something the device is supposed to
cope with but doesn't work or just works by luck today... I would say
that's a separate problem though, if you can get Linux working okay
with this approach.
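
To make the ordering issue concrete, here is a small standalone model of the two 32-bit register writes. It is a sketch under stated assumptions, not the hcd-xhci.c code: the struct ErstbaModel and the erstba_*() helpers are hypothetical, and the low-word mask simply clears the low 6 bits. The point is that the event ring reset must be triggered by whichever half is written last, otherwise the base address is latched with a stale half.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of one interrupter's ERSTBA registers. */
typedef struct ErstbaModel {
    uint32_t low;
    uint32_t high;
    bool hi_lo;       /* true: guest is expected to write high word first */
    unsigned resets;  /* how many times the event ring was reset */
    uint64_t latched; /* base address seen by the last reset */
} ErstbaModel;

void erstba_reset(ErstbaModel *m)
{
    m->latched = ((uint64_t)m->high << 32) | m->low;
    m->resets++;
}

void erstba_write_low(ErstbaModel *m, uint32_t val)
{
    m->low = val & ~0x3fu; /* assumption: low 6 bits are not address bits */
    if (m->hi_lo) {
        erstba_reset(m);   /* quirk: the low word is the trigger */
    }
}

void erstba_write_high(ErstbaModel *m, uint32_t val)
{
    m->high = val;
    if (!m->hi_lo) {
        erstba_reset(m);   /* default: the high word is the trigger */
    }
}

/* Program a base address in the given order and return what was latched. */
uint64_t erstba_program(bool quirk, bool high_first, uint64_t base)
{
    ErstbaModel m = { .hi_lo = quirk };

    if (high_first) {
        erstba_write_high(&m, (uint32_t)(base >> 32));
        erstba_write_low(&m, (uint32_t)base);
    } else {
        erstba_write_low(&m, (uint32_t)base);
        erstba_write_high(&m, (uint32_t)(base >> 32));
    }
    assert(m.resets == 1);
    return m.latched;
}
```

In this model, erstba_program(false, true, base) latches an address whose low word is still zero, which corresponds to the initialization failure described in the commit message. A single 64-bit write is not modelled here.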

Thanks,
Nick

>  uint32_t flags;
>  uint32_t max_pstreams_mask;
>  void (*intr_update)(XHCIState *s, int n, bool enable);




Re: [PATCH 2/2] hw/usb/hcd-dwc3: Set erstba-hi-lo property

2025-04-11 Thread Nicholas Piggin
On Sun Apr 6, 2025 at 12:00 AM AEST, Guenter Roeck wrote:
> The dwc3 hardware requires the ERSTBA address to be written in
> high-low order.
>
> From information found in the Linux kernel:

In fact this info could be contained within this patch rather
than duplicated in both. This is the one for the particular
hardware.

>
> [Synopsys]- The host controller was design to support ERST setting
> during the RUN state. But since there is a limitation in controller
> in supporting separate ERSTBA_HI and ERSTBA_LO programming,
> It is supported when the ERSTBA is programmed in 64bit,
> or in 32 bit mode ERSTBA_HI before ERSTBA_LO
>
> [Synopsys]- The internal initialization of event ring fetches
> the "Event Ring Segment Table Entry" based on the indication of
> ERSTBA_LO written.
>
> Inform the XHCI core to expect ERSTBA to be written in high-low order.
>
> Signed-off-by: Guenter Roeck 

Should this go to qemu-stable?

Thanks,
Nick

> ---
>  hw/usb/hcd-dwc3.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/usb/hcd-dwc3.c b/hw/usb/hcd-dwc3.c
> index 0bceee2712..6783d55526 100644
> --- a/hw/usb/hcd-dwc3.c
> +++ b/hw/usb/hcd-dwc3.c
> @@ -603,6 +603,7 @@ static void usb_dwc3_realize(DeviceState *dev, Error **errp)
>  SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
>  Error *err = NULL;
>  
> +qdev_prop_set_bit(DEVICE(&s->sysbus_xhci), "erstba-hi-lo", true);
>  sysbus_realize(SYS_BUS_DEVICE(&s->sysbus_xhci), &err);
>  if (err) {
>  error_propagate(errp, err);




Re: [PATCH v2] target/i386: Fix model number of Zhaoxin YongFeng vCPU template

2025-04-11 Thread Ewan Hai




On 4/11/25 11:22 AM, Zhao Liu wrote:


On Thu, Apr 10, 2025 at 10:07:15PM +0800, Ewan Hai wrote:

Date: Thu, 10 Apr 2025 22:07:15 +0800
From: Ewan Hai 
Subject: Re: [PATCH v2] target/i386: Fix model number of Zhaoxin YongFeng
  vCPU template

On 4/10/25 8:22 PM, Paolo Bonzini wrote:


On 4/7/25 04:07, Ewan Hai wrote:

The model number was mistakenly set to 0x0b (11) in commit ff04bc1ac4.
The correct value is 0x5b. This mistake occurred because the extended
model bits in cpuid[eax=0x1].eax were overlooked, and only the base
model was used.

This patch corrects the model field.


Hi, please follow commit e0013791b9326945ccd09b5b602437beb322cab8 to
define a new version of the CPU.


I’ve noticed that in the QEMU repository at commit
e0013791b9326945ccd09b5b602437beb322cab8 (as HEAD), the following patches I
previously submitted (which the Zhaoxin YongFeng vCPU model depends on) are
not included:


:-) e0013791b9326945ccd09b5b602437beb322cab8 is an example case to show
how to fix model id.


- 5d20aa540b6991c0dbeef933d2055e5372f52e0e: "target/i386: Add support for
Zhaoxin CPU vendor identification"
- c0799e8b003713e07b546faba600363eccd179ee: "target/i386: Add CPUID leaf
0xC000_0001 EDX definitions"
- ff04bc1ac478656e5d6a255bf4069edb3f55bc58: "target/i386: Introduce Zhaoxin
Yongfeng CPU model" (this is the main patch that needs to be fixed)
- a4e749780bd20593c0c386612a51bf4d64a80132: "target/i386: Mask CMPLegacy bit
in CPUID[0x80000001].ECX for Zhaoxin CPUs"

Should I resend the entire patchset, or would it be sufficient to just send
a revised version of the “target/i386: Introduce Zhaoxin Yongfeng CPU model”
patch?


IIUC, because this fix is planned to land in v10.1 (the next release
cycle), the current CPU model (which will be released in v10.0) can't be
modified directly. It is only possible to directly modify an unreleased
CPU model during the same release cycle.

Thus it's enough to just introduce a v2 and correct your model id like
this:

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 1b64ceaaba46..1ca1c3a729e8 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5621,6 +5621,17 @@ static const X86CPUDefinition builtin_x86_defs[] = {
  .features[FEAT_VMX_VMFUNC] = MSR_VMX_VMFUNC_EPT_SWITCHING,
  .xlevel = 0x8008,
  .model_id = "Zhaoxin YongFeng Processor",
+.versions = (X86CPUVersionDefinition[]) {
+{ .version = 1 },
+{
+.version = 2,
+.props = (PropValue[]) {
+{ "model", "0x5b" },
+{ /* end of list */ }
+}
+},
+{ /* end of list */ }
+}
  },
  };



Thanks again for your patience and explanation.

I'm not entirely sure if this is the best approach. I have one thought, and I'd 
like your help to confirm whether I'm on the right track or not. From what I can 
tell, most other vCPU definitions that use the .versions mechanism do so 
incrementally: for instance, they add new features in v2, v3, etc., but each of 
those versions (v1, v2, v3) remains valid for practical use.


However, in our specific case, the v1 version of the Zhaoxin vCPU definition has 
an incorrect .model value, which breaks the Linux guest's vPMU functionality. 
That makes me uncertain whether using new version definitions to fix this issue 
is really the best solution. After all, v1 itself would remain problematic.


Do you have any thoughts on whether it might be better to correct the existing 
definition, or do you think the versioned approach is still the recommended 
path? I appreciate any input or guidance you can provide.
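
For readers following the model-number discussion, the difference between 0x0b and 0x5b comes from how CPUID[eax=0x1].eax splits the model into a base field (bits 7:4) and an extended field (bits 19:16). The sketch below is illustrative only. The helper names are made up, the family value 7 used in the usage note is an assumption for the example, and the "family >= 6" rule for combining the fields is my understanding of what a Linux guest does, so please verify it against the kernel you care about.

```c
#include <assert.h>
#include <stdint.h>

/* Base model only: bits [7:4] of CPUID[eax=0x1].eax. */
uint32_t cpuid_base_model(uint32_t eax)
{
    return (eax >> 4) & 0xf;
}

/* Full model: extended model (bits [19:16]) combined with the base model. */
uint32_t cpuid_full_model(uint32_t eax)
{
    uint32_t family = (eax >> 8) & 0xf;
    uint32_t model = cpuid_base_model(eax);
    uint32_t ext_model = (eax >> 16) & 0xf;

    /* Assumption: the extended bits are honoured for family >= 6. */
    if (family >= 6) {
        model |= ext_model << 4;
    }
    return model;
}

/* Build an EAX value from family/model/stepping (extended family left 0). */
uint32_t cpuid_make_eax(uint32_t family, uint32_t model, uint32_t stepping)
{
    assert(family <= 0xf && model <= 0xff && stepping <= 0xf);
    return ((model >> 4) << 16) | (family << 8) |
           ((model & 0xf) << 4) | stepping;
}
```

With cpuid_make_eax(7, 0x5b, 0), the base field alone reads back as 0x0b while the combined value is 0x5b, which is the mismatch described in the original patch.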





[PATCH v3 5/8] tests/qtest/xhci: add a test for TR NOOP commands

2025-04-11 Thread Nicholas Piggin
Run some TR NOOP commands through the transfer ring.

Signed-off-by: Nicholas Piggin 
---
 tests/qtest/usb-hcd-xhci-test.c | 36 -
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/usb-hcd-xhci-test.c b/tests/qtest/usb-hcd-xhci-test.c
index b9fb2356d26..63359fb70b9 100644
--- a/tests/qtest/usb-hcd-xhci-test.c
+++ b/tests/qtest/usb-hcd-xhci-test.c
@@ -361,9 +361,33 @@ static void submit_cr_trb(XHCIQState *s, XHCITRB *trb)
 xhci_db_writel(s, 0, 0); /* doorbell 0 */
 }
 
+static void submit_tr_trb(XHCIQState *s, int slot, XHCITRB *trb)
+{
+XHCIQSlotState *sl = &s->slots[slot];
+uint64_t tr_addr = sl->transfer_ring + sl->tr_trb_idx * TRB_SIZE;
+XHCITRB t;
+
+trb->control |= sl->tr_trb_c; /* C */
+
+t.parameter = cpu_to_le64(trb->parameter);
+t.status = cpu_to_le32(trb->status);
+t.control = cpu_to_le32(trb->control);
+
+qtest_memwrite(s->parent->qts, tr_addr, &t, TRB_SIZE);
+sl->tr_trb_idx++;
+/* Last entry contains the link, so wrap back */
+if (sl->tr_trb_idx == sl->tr_trb_entries - 1) {
+set_link_trb(s, sl->transfer_ring, sl->tr_trb_c, sl->tr_trb_entries);
+sl->tr_trb_idx = 0;
+sl->tr_trb_c ^= 1;
+}
+xhci_db_writel(s, slot, 1); /* doorbell slot, EP0 target */
+}
+
 /*
  * This test brings up an endpoint and runs some noops through its command
- * ring and gets responses back on the event ring.
+ * ring and gets responses back on the event ring, then brings up a device
+ * context and runs some noops through its transfer ring.
  *
  * This could be librified in future (like AHCI0 to have a way to bring up
  * an endpoint to test device protocols.
@@ -519,6 +543,16 @@ static void pci_xhci_stress_rings(void)
 
 /* XXX: Could check EP state is running */
 
+/* Wrap the transfer ring a few times */
+for (i = 0; i < 100; i++) {
+/* Issue a transfer ring slot 0 noop */
+memset(&trb, 0, TRB_SIZE);
+trb.control |= TR_NOOP << TRB_TYPE_SHIFT;
+trb.control |= TRB_TR_IOC;
+submit_tr_trb(s, slotid, &trb);
+wait_event_trb(s, &trb);
+}
+
 /* Shut it down */
 qpci_msix_disable(s->dev);
 
-- 
2.47.1
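
As a companion to submit_tr_trb() above, the producer-side wrap logic can be modelled without any qtest plumbing. This is a sketch with hypothetical names (RingModel, ring_submit(), ring_wraps_after()), and the ring size of 16 used in the usage note is an assumption, not the value the test uses. The last entry is reserved for the link TRB, so each lap holds entries - 1 TRBs, and the cycle bit flips on every wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical producer-side view of a TRB ring. */
typedef struct RingModel {
    uint32_t entries; /* total entries, the last one holds the link TRB */
    uint32_t idx;     /* next slot the producer will fill */
    uint32_t cycle;   /* producer cycle state, starts at 1 */
    uint32_t wraps;   /* number of times the ring wrapped */
} RingModel;

void ring_init(RingModel *r, uint32_t entries)
{
    assert(entries >= 2); /* at least one usable slot plus the link */
    r->entries = entries;
    r->idx = 0;
    r->cycle = 1;
    r->wraps = 0;
}

/* Consume one slot; returns the slot index that was used. */
uint32_t ring_submit(RingModel *r)
{
    uint32_t slot = r->idx;

    r->idx++;
    /* Last entry contains the link, so wrap back and toggle the cycle bit */
    if (r->idx == r->entries - 1) {
        r->idx = 0;
        r->cycle ^= 1;
        r->wraps++;
    }
    return slot;
}

/* Number of wraps after a given number of submissions. */
uint32_t ring_wraps_after(uint32_t entries, uint32_t submissions)
{
    RingModel r;
    uint32_t i;

    ring_init(&r, entries);
    for (i = 0; i < submissions; i++) {
        ring_submit(&r);
    }
    return r.wraps;
}
```

For a 16-entry ring, 100 submissions wrap 6 times (15 usable slots per lap), which is the kind of coverage the "wrap the transfer ring a few times" loop is after.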




[PATCH v3 2/8] hw/usb/xhci: Rename and move HCD register region constants to header

2025-04-11 Thread Nicholas Piggin
This also adds some missing constants rather than open-coding
offsets and sizes.

Signed-off-by: Nicholas Piggin 
---
 hw/usb/hcd-xhci.h | 16 
 hw/usb/hcd-xhci.c | 48 ++-
 2 files changed, 38 insertions(+), 26 deletions(-)
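
Note for reviewers: the layout arithmetic this patch moves into the header can be checked in isolation. The sketch below uses C11 _Static_assert rather than the #if/#error checks in hcd-xhci.c, and every EX_* name is hypothetical. The port register size (0x10) is taken from this series; the port, interrupter and slot counts, the interrupter register size and the total region length are assumed values for illustration only.

```c
#include <assert.h>

/* Assumed limits, for illustration only */
#define EX_MAXPORTS      30
#define EX_MAXINTRS      16
#define EX_MAXSLOTS      64
#define EX_INTR_IR_SZ    0x20
#define EX_PORT_PR_SZ    0x10
#define EX_LEN_REGS      0x4000

/* Region lengths */
#define EX_LEN_CAP       0x40
#define EX_LEN_OPER      0x400
#define EX_LEN_PORT      (EX_PORT_PR_SZ * EX_MAXPORTS)
#define EX_LEN_RUNTIME   ((EX_MAXINTRS + 1) * EX_INTR_IR_SZ)
#define EX_LEN_DOORBELL  ((EX_MAXSLOTS + 1) * 0x20)

/* Region offsets: cap, oper and port are packed, the rest are fixed */
#define EX_OFF_CAP       0
#define EX_OFF_OPER      (EX_OFF_CAP + EX_LEN_CAP)
#define EX_OFF_PORT      (EX_OFF_OPER + EX_LEN_OPER)
#define EX_OFF_RUNTIME   0x1000
#define EX_OFF_DOORBELL  0x2000

/* Each region must end before the next one starts */
_Static_assert(EX_OFF_PORT + EX_LEN_PORT <= EX_OFF_RUNTIME,
               "Increase EX_OFF_RUNTIME");
_Static_assert(EX_OFF_RUNTIME + EX_LEN_RUNTIME <= EX_OFF_DOORBELL,
               "Increase EX_OFF_DOORBELL");
_Static_assert(EX_OFF_DOORBELL + EX_LEN_DOORBELL <= EX_LEN_REGS,
               "Increase EX_LEN_REGS");

/* Offset of the first port register set */
unsigned ex_off_port(void)
{
    return EX_OFF_PORT;
}

/* First byte past the doorbell region */
unsigned ex_regs_end(void)
{
    return EX_OFF_DOORBELL + EX_LEN_DOORBELL;
}
```

Under these assumed values the port registers start at 0x440 and the doorbell region ends at 0x2820, well inside the 0x4000 window.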

diff --git a/hw/usb/hcd-xhci.h b/hw/usb/hcd-xhci.h
index ee364efd0ab..20059fcf66c 100644
--- a/hw/usb/hcd-xhci.h
+++ b/hw/usb/hcd-xhci.h
@@ -115,6 +115,22 @@ typedef enum TRBCCode {
 CC_SPLIT_TRANSACTION_ERROR
 } TRBCCode;
 
+/* Register regions */
+#define XHCI_REGS_LENGTH_CAP         0x40
+#define XHCI_REGS_LENGTH_OPER        0x400
+#define XHCI_REGS_LENGTH_PORT        (XHCI_PORT_PR_SZ * XHCI_MAXPORTS)
+#define XHCI_REGS_LENGTH_RUNTIME     ((XHCI_MAXINTRS + 1) * XHCI_INTR_IR_SZ)
+/* XXX: Should doorbell length be *4 rather than *32? */
+#define XHCI_REGS_LENGTH_DOORBELL    ((XHCI_MAXSLOTS + 1) * 0x20)
+
+#define XHCI_REGS_OFFSET_CAP         0
+#define XHCI_REGS_OFFSET_OPER        (XHCI_REGS_OFFSET_CAP +   \
+                                      XHCI_REGS_LENGTH_CAP)
+#define XHCI_REGS_OFFSET_PORT        (XHCI_REGS_OFFSET_OPER +  \
+                                      XHCI_REGS_LENGTH_OPER)
+#define XHCI_REGS_OFFSET_RUNTIME     0x1000
+#define XHCI_REGS_OFFSET_DOORBELL    0x2000
+
 /* Register definitions */
 #define XHCI_HCCAP_REG_CAPLENGTH0x00
 #define XHCI_HCCAP_REG_HCIVERSION   0x02
diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index b57db309b8d..7470db38561 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -46,22 +46,14 @@
 #define COMMAND_LIMIT   256
 #define TRANSFER_LIMIT  256
 
-#define LEN_CAP         0x40
-#define LEN_OPER        (0x400 + XHCI_PORT_PR_SZ * XHCI_MAXPORTS)
-#define LEN_RUNTIME     ((XHCI_MAXINTRS + 1) * XHCI_INTR_IR_SZ)
-#define LEN_DOORBELL    ((XHCI_MAXSLOTS + 1) * 0x20)
-
-#define OFF_OPER        LEN_CAP
-#define OFF_RUNTIME     0x1000
-#define OFF_DOORBELL    0x2000
-
-#if (OFF_OPER + LEN_OPER) > OFF_RUNTIME
-#error Increase OFF_RUNTIME
+#if (XHCI_REGS_OFFSET_PORT + XHCI_REGS_LENGTH_PORT) > XHCI_REGS_OFFSET_RUNTIME
+#error Increase XHCI_REGS_OFFSET_RUNTIME
 #endif
-#if (OFF_RUNTIME + LEN_RUNTIME) > OFF_DOORBELL
-#error Increase OFF_DOORBELL
+#if (XHCI_REGS_OFFSET_RUNTIME + XHCI_REGS_LENGTH_RUNTIME) >\
+XHCI_REGS_OFFSET_DOORBELL
+#error Increase XHCI_REGS_OFFSET_DOORBELL
 #endif
-#if (OFF_DOORBELL + LEN_DOORBELL) > XHCI_LEN_REGS
+#if (XHCI_REGS_OFFSET_DOORBELL + XHCI_REGS_LENGTH_DOORBELL) > XHCI_LEN_REGS
 # error Increase XHCI_LEN_REGS
 #endif
 
@@ -2583,7 +2575,7 @@ static uint64_t xhci_cap_read(void *ptr, hwaddr reg, unsigned size)
 
 switch (reg) {
 case XHCI_HCCAP_REG_CAPLENGTH: /* Covers HCIVERSION and CAPLENGTH */
-ret = 0x0100 | LEN_CAP;
+ret = 0x0100 | XHCI_REGS_LENGTH_CAP;
 break;
 case XHCI_HCCAP_REG_HCSPARAMS1:
 ret = ((xhci->numports_2+xhci->numports_3)<<24)
@@ -2603,10 +2595,10 @@ static uint64_t xhci_cap_read(void *ptr, hwaddr reg, unsigned size)
 }
 break;
 case XHCI_HCCAP_REG_DBOFF:
-ret = OFF_DOORBELL;
+ret = XHCI_REGS_OFFSET_DOORBELL;
 break;
 case XHCI_HCCAP_REG_RTSOFF:
-ret = OFF_RUNTIME;
+ret = XHCI_REGS_OFFSET_RUNTIME;
 break;
 
 /* extended capabilities */
@@ -3256,22 +3248,26 @@ static void usb_xhci_realize(DeviceState *dev, Error **errp)
 
 memory_region_init(&xhci->mem, OBJECT(dev), "xhci", XHCI_LEN_REGS);
 memory_region_init_io(&xhci->mem_cap, OBJECT(dev), &xhci_cap_ops, xhci,
-  "capabilities", LEN_CAP);
+  "capabilities", XHCI_REGS_LENGTH_CAP);
 memory_region_init_io(&xhci->mem_oper, OBJECT(dev), &xhci_oper_ops, xhci,
-  "operational", 0x400);
+  "operational", XHCI_REGS_LENGTH_OPER);
 memory_region_init_io(&xhci->mem_runtime, OBJECT(dev), &xhci_runtime_ops,
-   xhci, "runtime", LEN_RUNTIME);
+   xhci, "runtime", XHCI_REGS_LENGTH_RUNTIME);
 memory_region_init_io(&xhci->mem_doorbell, OBJECT(dev), &xhci_doorbell_ops,
-   xhci, "doorbell", LEN_DOORBELL);
+   xhci, "doorbell", XHCI_REGS_LENGTH_DOORBELL);
 
-memory_region_add_subregion(&xhci->mem, 0,&xhci->mem_cap);
-memory_region_add_subregion(&xhci->mem, OFF_OPER, &xhci->mem_oper);
-memory_region_add_subregion(&xhci->mem, OFF_RUNTIME,  &xhci->mem_runtime);
-memory_region_add_subregion(&xhci->mem, OFF_DOORBELL, &xhci->mem_doorbell);
+memory_region_add_subregion(&xhci->mem, XHCI_REGS_OFFSET_CAP,
+&xhci->mem_cap);
+memory_region_add_subregion(&xhci->mem, XHCI_REGS_OFFSET_OPER,
+&xhci->mem_oper);
+memory_region_add_subregion(&xhci->mem, XHCI_REGS_OFFSET_RUNTIME,
+&xhci->mem_runtime);
+memory_region_add

[PATCH v3 1/8] hw/usb/xhci: Move HCD constants to a header and add register constants

2025-04-11 Thread Nicholas Piggin
Prepare to use some of these constants in xhci qtest code.

Signed-off-by: Nicholas Piggin 
---
 hw/usb/hcd-xhci.h | 214 ++
 hw/usb/hcd-xhci.c | 450 +++---
 2 files changed, 360 insertions(+), 304 deletions(-)

diff --git a/hw/usb/hcd-xhci.h b/hw/usb/hcd-xhci.h
index 9c3974f1489..ee364efd0ab 100644
--- a/hw/usb/hcd-xhci.h
+++ b/hw/usb/hcd-xhci.h
@@ -115,6 +115,220 @@ typedef enum TRBCCode {
 CC_SPLIT_TRANSACTION_ERROR
 } TRBCCode;
 
+/* Register definitions */
+#define XHCI_HCCAP_REG_CAPLENGTH            0x00
+#define XHCI_HCCAP_REG_HCIVERSION           0x02
+#define XHCI_HCCAP_REG_HCSPARAMS1           0x04
+#define XHCI_HCCAP_REG_HCSPARAMS2           0x08
+#define XHCI_HCCAP_REG_HCSPARAMS3           0x0C
+#define XHCI_HCCAP_REG_HCCPARAMS1           0x10
+#define   XHCI_HCCPARAMS1_AC64              0x0001
+#define   XHCI_HCCPARAMS1_XECP_SHIFT        16
+#define   XHCI_HCCPARAMS1_MAXPSASIZE_SHIFT  12
+#define XHCI_HCCAP_REG_DBOFF                0x14
+#define XHCI_HCCAP_REG_RTSOFF               0x18
+#define XHCI_HCCAP_REG_HCCPARAMS2           0x1C
+#define XHCI_HCCAP_EXTCAP_START             0x20 /* SW-defined */
+
+#define XHCI_PORT_PR_SZ                     0x10
+#define XHCI_PORT_REG_PORTSC                0x00
+#define   XHCI_PORTSC_CCS                   (1 << 0)
+#define   XHCI_PORTSC_PED                   (1 << 1)
+#define   XHCI_PORTSC_OCA                   (1 << 3)
+#define   XHCI_PORTSC_PR                    (1 << 4)
+#define   XHCI_PORTSC_PLS_SHIFT             5
+#define   XHCI_PORTSC_PLS_MASK              0xf
+#define   XHCI_PORTSC_PP                    (1 << 9)
+#define   XHCI_PORTSC_SPEED_SHIFT           10
+#define   XHCI_PORTSC_SPEED_MASK            0xf
+#define   XHCI_PORTSC_SPEED_FULL            (1 << 10)
+#define   XHCI_PORTSC_SPEED_LOW             (2 << 10)
+#define   XHCI_PORTSC_SPEED_HIGH            (3 << 10)
+#define   XHCI_PORTSC_SPEED_SUPER           (4 << 10)
+#define   XHCI_PORTSC_PIC_SHIFT             14
+#define   XHCI_PORTSC_PIC_MASK              0x3
+#define   XHCI_PORTSC_LWS                   (1 << 16)
+#define   XHCI_PORTSC_CSC                   (1 << 17)
+#define   XHCI_PORTSC_PEC                   (1 << 18)
+#define   XHCI_PORTSC_WRC                   (1 << 19)
+#define   XHCI_PORTSC_OCC                   (1 << 20)
+#define   XHCI_PORTSC_PRC                   (1 << 21)
+#define   XHCI_PORTSC_PLC                   (1 << 22)
+#define   XHCI_PORTSC_CEC                   (1 << 23)
+#define   XHCI_PORTSC_CAS                   (1 << 24)
+#define   XHCI_PORTSC_WCE                   (1 << 25)
+#define   XHCI_PORTSC_WDE                   (1 << 26)
+#define   XHCI_PORTSC_WOE                   (1 << 27)
+#define   XHCI_PORTSC_DR                    (1 << 30)
+#define   XHCI_PORTSC_WPR                   (1 << 31)
+/* read/write bits */
+#define   XHCI_PORTSC_RW_MASK               (XHCI_PORTSC_PP |    \
+                                             XHCI_PORTSC_WCE |   \
+                                             XHCI_PORTSC_WDE |   \
+                                             XHCI_PORTSC_WOE)
+/* write-1-to-clear bits*/
+#define   XHCI_PORTSC_W1C_MASK              (XHCI_PORTSC_CSC |   \
+                                             XHCI_PORTSC_PEC |   \
+                                             XHCI_PORTSC_WRC |   \
+                                             XHCI_PORTSC_OCC |   \
+                                             XHCI_PORTSC_PRC |   \
+                                             XHCI_PORTSC_PLC |   \
+                                             XHCI_PORTSC_CEC)
+#define XHCI_PORT_REG_PORTPMSC              0x04
+#define XHCI_PORT_REG_PORTLI                0x08
+#define XHCI_PORT_REG_PORTHLPMC             0x0C
+
+#define XHCI_OPER_REG_USBCMD                0x00
+#define   XHCI_USBCMD_RS                    (1 << 0)
+#define   XHCI_USBCMD_HCRST                 (1 << 1)
+#define   XHCI_USBCMD_INTE                  (1 << 2)
+#define   XHCI_USBCMD_HSEE                  (1 << 3)
+#define   XHCI_USBCMD_LHCRST                (1 << 7)
+#define   XHCI_USBCMD_CSS                   (1 << 8)
+#define   XHCI_USBCMD_CRS                   (1 << 9)
+#define   XHCI_USBCMD_EWE                   (1 << 10)
+#define   XHCI_USBCMD_EU3S                  (1 << 11)
+#define XHCI_OPER_REG_USBSTS                0x04
+#define   XHCI_USBSTS_HCH                   (1 << 0)
+#define   XHCI_USBSTS_HSE                   (1 << 2)
+#define   XHCI_USBSTS_EINT                  (1 << 3)
+#define   XHCI_USBSTS_PCD                   (1 << 4)
+#define   XHCI_USBSTS_SSS                   (1 << 8)
+#define   XHCI_USBSTS_RSS                   (1 << 9)
+#define   XHCI_USBSTS_SRE                   (1 << 10)
+#define   XHCI_USBSTS_CNR                   (1 << 11)
+#define   XHCI_USBSTS_HCE                   (1 << 12)
+/* these bits are write-1-to-clear */
+#define   XHCI_USBSTS_W1C_MASK  (XHCI_USBSTS_HSE |\
+  

[PATCH v3 4/8] hw/usb/xhci: Support TR NOOP commands

2025-04-11 Thread Nicholas Piggin
Implement XHCI TR NOOP commands by setting up then immediately
completing the packet.

The IBM AIX XHCI HCD driver uses NOOP commands to check driver and
hardware health, which works after this change.

Signed-off-by: Nicholas Piggin 
---
 hw/usb/hcd-xhci.c | 28 +++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index 88973c485d1..b6f65628db7 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -1663,6 +1663,20 @@ static int xhci_fire_transfer(XHCIState *xhci, XHCITransfer *xfer, XHCIEPContext
 return xhci_submit(xhci, xfer, epctx);
 }
 
+static int xhci_noop_transfer(XHCIState *xhci, XHCITransfer *xfer)
+{
+/*
+ * TR NOOP conceptually probably better not call into USB subsystem
+ * (usb_packet_setup() via xhci_setup_packet()). In practice it
+ * works and avoids code duplication.
+ */
+if (xhci_setup_packet(xfer) < 0) {
+return -1;
+}
+xhci_try_complete_packet(xfer);
+return 0;
+}
+
 static void xhci_kick_ep(XHCIState *xhci, unsigned int slotid,
  unsigned int epid, unsigned int streamid)
 {
@@ -1785,6 +1799,8 @@ static void xhci_kick_epctx(XHCIEPContext *epctx, unsigned int streamid)
 
 epctx->kick_active++;
 while (1) {
+bool noop = false;
+
 length = xhci_ring_chain_length(xhci, ring);
 if (length <= 0) {
 if (epctx->type == ET_ISO_OUT || epctx->type == ET_ISO_IN) {
@@ -1813,10 +1829,20 @@ static void xhci_kick_epctx(XHCIEPContext *epctx, unsigned int streamid)
 epctx->kick_active--;
 return;
 }
+if (type == TR_NOOP) {
+noop = true;
+}
 }
 xfer->streamid = streamid;
 
-if (epctx->epid == 1) {
+if (noop) {
+if (length != 1) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: NOOP TR TRB within TRB chain!\n", __func__);
+/* Undefined behavior, we no-op the entire chain */
+}
+xhci_noop_transfer(xhci, xfer);
+} else if (epctx->epid == 1) {
 xhci_fire_ctl_transfer(xhci, xfer);
 } else {
 xhci_fire_transfer(xhci, xfer, epctx);
-- 
2.47.1




[PATCH v3 7/8] hw/usb/hcd-xhci-pci: Make PCI device more configurable

2025-04-11 Thread Nicholas Piggin
To prepare to support another USB PCI Host Controller, make some PCI
configuration dynamic.

Signed-off-by: Nicholas Piggin 
---
 hw/usb/hcd-xhci-pci.h |   9 
 hw/usb/hcd-xhci-pci.c | 118 +-
 2 files changed, 103 insertions(+), 24 deletions(-)
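
Note for reviewers: the MSI setup policy after this patch can be summarised as a small decision function. This is a sketch with hypothetical names (ExOnOffAuto, ExMsiOutcome, ex_msi_outcome()), not QEMU code. It assumes init_ret is the return value of the MSI initialization call, where 0 means success and -ENOTSUP means the board cannot do MSI. Compared with the old code, an unexpected error is propagated instead of tripping an assert.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the msi=on/off/auto property values */
typedef enum ExOnOffAuto {
    EX_AUTO,
    EX_ON,
    EX_OFF,
} ExOnOffAuto;

typedef enum ExMsiOutcome {
    EX_MSI_ENABLED,       /* capability added, realize continues */
    EX_MSI_DISABLED,      /* no MSI, realize continues */
    EX_MSI_REALIZE_FAILS, /* error is propagated to the caller */
} ExMsiOutcome;

ExMsiOutcome ex_msi_outcome(ExOnOffAuto policy, int init_ret)
{
    assert(init_ret <= 0); /* 0 on success, negative errno on failure */

    if (policy == EX_OFF) {
        return EX_MSI_DISABLED;      /* initialization is never attempted */
    }
    if (init_ret == 0) {
        return EX_MSI_ENABLED;
    }
    if (init_ret != -ENOTSUP) {
        return EX_MSI_REALIZE_FAILS; /* unexpected error */
    }
    if (policy == EX_ON) {
        return EX_MSI_REALIZE_FAILS; /* explicit msi=on cannot be satisfied */
    }
    return EX_MSI_DISABLED;          /* msi=auto falls back silently */
}
```

The only silent fallback is the msi=auto plus -ENOTSUP combination; every other failure makes realize fail.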

diff --git a/hw/usb/hcd-xhci-pci.h b/hw/usb/hcd-xhci-pci.h
index 5b61ae84555..09aabae6e01 100644
--- a/hw/usb/hcd-xhci-pci.h
+++ b/hw/usb/hcd-xhci-pci.h
@@ -41,6 +41,15 @@ typedef struct XHCIPciState {
 OnOffAuto msi;
 OnOffAuto msix;
 bool conditional_intr_mapping;
+uint8_t cache_line_size;
+uint8_t pm_cap_off;
+uint8_t pcie_cap_off;
+uint8_t msi_cap_off;
+uint8_t msix_cap_off;
+int msix_bar_nr;
+uint64_t msix_bar_size;
+uint32_t msix_table_off;
+uint32_t msix_pba_off;
 } XHCIPciState;
 
 #endif
diff --git a/hw/usb/hcd-xhci-pci.c b/hw/usb/hcd-xhci-pci.c
index d908eb787d3..eb918ce3d6e 100644
--- a/hw/usb/hcd-xhci-pci.c
+++ b/hw/usb/hcd-xhci-pci.c
@@ -32,9 +32,6 @@
 #include "trace.h"
 #include "qapi/error.h"
 
-#define OFF_MSIX_TABLE  0x3000
-#define OFF_MSIX_PBA    0x3800
-
 static void xhci_pci_intr_update(XHCIState *xhci, int n, bool enable)
 {
 XHCIPciState *s = container_of(xhci, XHCIPciState, xhci);
@@ -120,6 +117,31 @@ static int xhci_pci_vmstate_post_load(void *opaque, int version_id)
return 0;
 }
 
+static int xhci_pci_add_pm_capability(PCIDevice *pci_dev, uint8_t offset,
+  Error **errp)
+{
+int err;
+
+err = pci_add_capability(pci_dev, PCI_CAP_ID_PM, offset,
+ PCI_PM_SIZEOF, errp);
+if (err < 0) {
+return err;
+}
+
+pci_set_word(pci_dev->config + offset + PCI_PM_PMC,
+ PCI_PM_CAP_VER_1_2 |
+ PCI_PM_CAP_D1 | PCI_PM_CAP_D2 |
+ PCI_PM_CAP_PME_D0 | PCI_PM_CAP_PME_D1 |
+ PCI_PM_CAP_PME_D2 | PCI_PM_CAP_PME_D3hot);
+pci_set_word(pci_dev->wmask + offset + PCI_PM_PMC, 0);
+pci_set_word(pci_dev->config + offset + PCI_PM_CTRL,
+ PCI_PM_CTRL_NO_SOFT_RESET);
+pci_set_word(pci_dev->wmask + offset + PCI_PM_CTRL,
+ PCI_PM_CTRL_STATE_MASK);
+
+return 0;
+}
+
 static void usb_xhci_pci_realize(struct PCIDevice *dev, Error **errp)
 {
 int ret;
@@ -128,7 +150,7 @@ static void usb_xhci_pci_realize(struct PCIDevice *dev, Error **errp)
 
 dev->config[PCI_CLASS_PROG] = 0x30;/* xHCI */
 dev->config[PCI_INTERRUPT_PIN] = 0x01; /* interrupt pin 1 */
-dev->config[PCI_CACHE_LINE_SIZE] = 0x10;
+dev->config[PCI_CACHE_LINE_SIZE] = s->cache_line_size;
 dev->config[0x60] = 0x30; /* release number */
 
 object_property_set_link(OBJECT(&s->xhci), "host", OBJECT(s), NULL);
@@ -144,40 +166,78 @@ static void usb_xhci_pci_realize(struct PCIDevice *dev, Error **errp)
 s->xhci.nec_quirks = true;
 }
 
-if (s->msi != ON_OFF_AUTO_OFF) {
-ret = msi_init(dev, 0x70, s->xhci.numintrs, true, false, &err);
-/*
- * Any error other than -ENOTSUP(board's MSI support is broken)
- * is a programming error
- */
-assert(!ret || ret == -ENOTSUP);
-if (ret && s->msi == ON_OFF_AUTO_ON) {
-/* Can't satisfy user's explicit msi=on request, fail */
-error_append_hint(&err, "You have to use msi=auto (default) or "
-"msi=off with this machine type.\n");
+if (s->pm_cap_off) {
+if (xhci_pci_add_pm_capability(dev, s->pm_cap_off, &err)) {
 error_propagate(errp, err);
 return;
 }
-assert(!err || s->msi == ON_OFF_AUTO_AUTO);
-/* With msi=auto, we fall back to MSI off silently */
-error_free(err);
 }
+
+if (s->msi != ON_OFF_AUTO_OFF) {
+ret = msi_init(dev, s->msi_cap_off, s->xhci.numintrs,
+   true, false, &err);
+if (ret) {
+if (ret != -ENOTSUP) {
+/* Programming error */
+error_propagate(errp, err);
+return;
+}
+if (s->msi == ON_OFF_AUTO_ON) {
+/* Can't satisfy user's explicit msi=on request, fail */
+error_append_hint(&err, "You have to use msi=auto (default) "
+  "or msi=off with this machine type.\n");
+error_propagate(errp, err);
+return;
+}
+error_free(err);
+err = NULL; /* With msi=auto, we fall back to MSI off silently */
+}
+}
+
 pci_register_bar(dev, 0,
  PCI_BASE_ADDRESS_SPACE_MEMORY |
  PCI_BASE_ADDRESS_MEM_TYPE_64,
  &s->xhci.mem);
 
 if (pci_bus_is_express(pci_get_bus(dev))) {
-ret = pcie_endpoint_cap_init(dev, 0xa0);
+ret = pcie_endpoint_cap_init(dev, s->pcie_cap_off);
 assert(ret > 0);
 }
 
 if (s->msix != ON_OFF_AUTO_OFF) {
-/

[PATCH v3 3/8] tests/qtest/xhci: Add controller and device setup and ring tests

2025-04-11 Thread Nicholas Piggin
Add tests which init the host controller registers to the point where
command and event rings, irqs are operational. Enumerate ports and set
up an attached device context that enables device transfer ring to be
set up and tested.

This test does a bunch of things at once and is not yet well librified,
but it allows testing basic mechanisms and gives a starting point for
further work.

Signed-off-by: Nicholas Piggin 
---
 hw/usb/hcd-xhci.h   |   7 +
 hw/usb/hcd-xhci.c   |   7 -
 tests/qtest/usb-hcd-xhci-test.c | 516 +++-
 3 files changed, 517 insertions(+), 13 deletions(-)

diff --git a/hw/usb/hcd-xhci.h b/hw/usb/hcd-xhci.h
index 20059fcf66c..02a005ce78d 100644
--- a/hw/usb/hcd-xhci.h
+++ b/hw/usb/hcd-xhci.h
@@ -350,6 +350,13 @@ typedef struct XHCIRing {
 bool ccs;
 } XHCIRing;
 
+typedef struct XHCIEvRingSeg {
+uint32_t addr_low;
+uint32_t addr_high;
+uint32_t size;
+uint32_t rsvd;
+} XHCIEvRingSeg;
+
 typedef struct XHCIPort {
 XHCIState *xhci;
 uint32_t portsc;
diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index 7470db38561..88973c485d1 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -128,13 +128,6 @@ struct XHCIEPContext {
 QEMUTimer *kick_timer;
 };
 
-typedef struct XHCIEvRingSeg {
-uint32_t addr_low;
-uint32_t addr_high;
-uint32_t size;
-uint32_t rsvd;
-} XHCIEvRingSeg;
-
 static void xhci_kick_ep(XHCIState *xhci, unsigned int slotid,
  unsigned int epid, unsigned int streamid);
 static void xhci_kick_epctx(XHCIEPContext *epctx, unsigned int streamid);
diff --git a/tests/qtest/usb-hcd-xhci-test.c b/tests/qtest/usb-hcd-xhci-test.c
index 0cccfd85a64..b9fb2356d26 100644
--- a/tests/qtest/usb-hcd-xhci-test.c
+++ b/tests/qtest/usb-hcd-xhci-test.c
@@ -8,17 +8,174 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/bswap.h"
+#include "libqtest.h"
+#include "libqos/libqos-pc.h"
 #include "libqtest-single.h"
 #include "libqos/usb.h"
+#include "hw/pci/pci_ids.h"
+#include "hw/pci/pci_regs.h"
+#include "hw/usb/hcd-xhci.h"
+
+/*** Test Setup & Teardown ***/
+typedef struct XHCIQSlotState {
+/* In-memory arrays */
+uint64_t device_context;
+uint64_t transfer_ring;
+
+uint32_t tr_trb_entries;
+uint32_t tr_trb_idx;
+uint32_t tr_trb_c;
+} XHCIQSlotState;
+
+typedef struct XHCIQState {
+/* QEMU PCI variables */
+QOSState *parent;
+QPCIDevice *dev;
+QPCIBar bar;
+uint64_t barsize;
+uint32_t fingerprint;
+
+/* In-memory arrays */
+uint64_t dc_base_array;
+uint64_t command_ring;
+uint64_t event_ring_seg;
+uint64_t event_ring;
+
+uint32_t cr_trb_entries;
+uint32_t cr_trb_idx;
+uint32_t cr_trb_c;
+uint32_t er_trb_entries;
+uint32_t er_trb_idx;
+uint32_t er_trb_c;
+
+/* Host controller properties */
+uint32_t rtoff, dboff;
+uint32_t maxports, maxslots, maxintrs;
+
+XHCIQSlotState slots[32];
+} XHCIQState;
+
+#define XHCI_NEC_ID (PCI_DEVICE_ID_NEC_UPD720200 << 16 | \
+ PCI_VENDOR_ID_NEC)
+
+/**
+ * Locate, verify, and return a handle to the XHCI device.
+ */
+static QPCIDevice *get_xhci_device(QTestState *qts, uint32_t *fingerprint)
+{
+QPCIDevice *xhci;
+uint32_t xhci_fingerprint;
+QPCIBus *pcibus;
+
+pcibus = qpci_new_pc(qts, NULL);
+
+/* Find the XHCI PCI device and verify it's the right one. */
+xhci = qpci_device_find(pcibus, QPCI_DEVFN(0x1D, 0x0));
+g_assert(xhci != NULL);
+
+xhci_fingerprint = qpci_config_readl(xhci, PCI_VENDOR_ID);
+switch (xhci_fingerprint) {
+case XHCI_NEC_ID:
+break;
+default:
+/* Unknown device. */
+g_assert_not_reached();
+}
+
+if (fingerprint) {
+*fingerprint = xhci_fingerprint;
+}
+return xhci;
+}
+
+static void free_xhci_device(QPCIDevice *dev)
+{
+QPCIBus *pcibus = dev ? dev->bus : NULL;
+
+/* libqos doesn't have a function for this, so free it manually */
+g_free(dev);
+qpci_free_pc(pcibus);
+}
+
+/**
+ * Start a Q35 machine and bookmark a handle to the XHCI device.
+ */
+G_GNUC_PRINTF(1, 0)
+static XHCIQState *xhci_vboot(const char *cli, va_list ap)
+{
+XHCIQState *s;
+
+s = g_new0(XHCIQState, 1);
+s->parent = qtest_pc_vboot(cli, ap);
+alloc_set_flags(&s->parent->alloc, ALLOC_LEAK_ASSERT);
+
+/* Verify that we have an XHCI device present. */
+s->dev = get_xhci_device(s->parent->qts, &s->fingerprint);
+s->bar = qpci_iomap(s->dev, 0, &s->barsize);
+/* turns on pci.cmd.iose, pci.cmd.mse and pci.cmd.bme */
+qpci_device_enable(s->dev);
+
+return s;
+}
+
+/**
+ * Start a Q35 machine and bookmark a handle to the XHCI device.
+ */
+G_GNUC_PRINTF(1, 2)
+static XHCIQState *xhci_boot(const char *cli, ...)
+{
+XHCIQState *s;
+va_list ap;
+
+if (cli) {
+va_start(ap, cli);
+s = xhci_vboot(cli, ap);
+va_end(ap);
+} else {
+s = xhci_boot("-M q35 "
+  

[PATCH v3 0/8] usb/xhci: TR NOOP, TI HCD device, more qtests

2025-04-11 Thread Nicholas Piggin
This series adds better qtest support for the xhci controller,
adds support for the "TR NOOP" command used by AIX, and adds a new USB
controller model from TI that PowerVM and AIX use.

This series depends on some of the qtests changes from this one:

https://lore.kernel.org/qemu-devel/20250411044130.201724-1-npig...@gmail.com/T/#t

Since v2:
https://lore.kernel.org/qemu-devel/20250118070853.653778-1-npig...@gmail.com/

- Rebased to upstream. Hopefully this is ready to go for 10.1.

Thanks,
Nick

Nicholas Piggin (8):
  hw/usb/xhci: Move HCD constants to a header and add register constants
  hw/usb/xhci: Rename and move HCD register region constants to header
  tests/qtest/xhci: Add controller and device setup and ring tests
  hw/usb/xhci: Support TR NOOP commands
  tests/qtest/xhci: add a test for TR NOOP commands
  tests/qtest/xhci: test the qemu-xhci device
  hw/usb/hcd-xhci-pci: Make PCI device more configurable
  hw/usb/hcd-xhci-pci: Add TI TUSB73X0 XHCI controller model

 hw/usb/hcd-xhci-pci.h   |   9 +
 hw/usb/hcd-xhci.h   | 237 +
 include/hw/pci/pci_ids.h|   1 +
 include/hw/usb/xhci.h   |   1 +
 hw/usb/hcd-xhci-pci.c   | 118 +--
 hw/usb/hcd-xhci-ti.c|  88 +
 hw/usb/hcd-xhci.c   | 527 ++--
 tests/qtest/usb-hcd-xhci-test.c | 600 +++-
 hw/usb/Kconfig  |   5 +
 hw/usb/meson.build  |   1 +
 10 files changed, 1214 insertions(+), 373 deletions(-)
 create mode 100644 hw/usb/hcd-xhci-ti.c

-- 
2.47.1




[PATCH v3 8/8] hw/usb/hcd-xhci-pci: Add TI TUSB73X0 XHCI controller model

2025-04-11 Thread Nicholas Piggin
The TI TUSB73X0 controller has some interesting differences from NEC,
notably a separate BAR for MSIX, and PM capabilities. The spec is freely
available without sign-up.

This controller is accepted by IBM Power proprietary firmware and
software (when the subsystem IDs are set to Power servers, which is not
done here). IBM code is picky about device support, so the NEC device
can not be used.

xhci qtests are added for this device.
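
For reference, the qtest identifies the controller by a 32-bit read of the
PCI ID register, so the vendor ID lands in the low 16 bits and the device ID
in the high 16 bits. A minimal sketch of that composition, using the TI IDs
from this patch; the helper name pci_fingerprint() is hypothetical and is not
part of the series:

```c
#include <stdint.h>

/* ID values as added/used by this patch (include/hw/pci/pci_ids.h). */
#define PCI_VENDOR_ID_TI          0x104c
#define PCI_DEVICE_ID_TI_TUSB73X0 0x8241

/*
 * Hypothetical helper (not in the series): build the value a 32-bit
 * config-space read at offset 0 returns, i.e. device ID in the upper
 * half and vendor ID in the lower half.
 */
static inline uint32_t pci_fingerprint(uint16_t vendor, uint16_t device)
{
    return ((uint32_t)device << 16) | vendor;
}
```

With the values above this yields 0x8241104c, which is what the XHCI_TI_ID
macro in the test expands to.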

Signed-off-by: Nicholas Piggin 
---
 include/hw/pci/pci_ids.h|  1 +
 include/hw/usb/xhci.h   |  1 +
 hw/usb/hcd-xhci-ti.c| 88 +
 tests/qtest/usb-hcd-xhci-test.c |  4 ++
 hw/usb/Kconfig  |  5 ++
 hw/usb/meson.build  |  1 +
 6 files changed, 100 insertions(+)
 create mode 100644 hw/usb/hcd-xhci-ti.c

diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h
index 33e2898be95..99fe751703f 100644
--- a/include/hw/pci/pci_ids.h
+++ b/include/hw/pci/pci_ids.h
@@ -182,6 +182,7 @@
 #define PCI_VENDOR_ID_HP 0x103c
 
 #define PCI_VENDOR_ID_TI 0x104c
+#define PCI_DEVICE_ID_TI_TUSB73X0 0x8241
 
 #define PCI_VENDOR_ID_MOTOROLA   0x1057
 #define PCI_DEVICE_ID_MOTOROLA_MPC106 0x0002
diff --git a/include/hw/usb/xhci.h b/include/hw/usb/xhci.h
index 5c90e1373e5..203ec1fca32 100644
--- a/include/hw/usb/xhci.h
+++ b/include/hw/usb/xhci.h
@@ -3,6 +3,7 @@
 
 #define TYPE_XHCI "base-xhci"
 #define TYPE_NEC_XHCI "nec-usb-xhci"
+#define TYPE_TI_XHCI "ti-usb-xhci"
 #define TYPE_QEMU_XHCI "qemu-xhci"
 #define TYPE_XHCI_SYSBUS "sysbus-xhci"
 
diff --git a/hw/usb/hcd-xhci-ti.c b/hw/usb/hcd-xhci-ti.c
new file mode 100644
index 000..9ad9d6edf7a
--- /dev/null
+++ b/hw/usb/hcd-xhci-ti.c
@@ -0,0 +1,88 @@
+/*
+ * USB xHCI controller emulation
+ * Datasheet https://www.ti.com/product/TUSB7340
+ *
+ * Copyright (c) 2011 Securiforest
+ * Date: 2011-05-11 ;  Author: Hector Martin 
+ * Based on usb-xhci-nec.c, emulates TI TUSB73X0
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "hw/usb.h"
+#include "qemu/module.h"
+#include "hw/pci/pci.h"
+#include "hw/qdev-properties.h"
+
+#include "hcd-xhci-pci.h"
+
+OBJECT_DECLARE_SIMPLE_TYPE(XHCITiState, TI_XHCI)
+
+struct XHCITiState {
+XHCIPciState parent_obj;
+
+uint32_t intrs;
+uint32_t slots;
+};
+
+static const Property ti_xhci_properties[] = {
+DEFINE_PROP_UINT32("intrs", XHCITiState, intrs, 8),
+DEFINE_PROP_UINT32("slots", XHCITiState, slots, XHCI_MAXSLOTS),
+};
+
+static void ti_xhci_instance_init(Object *obj)
+{
+XHCIPciState *pci = XHCI_PCI(obj);
+XHCITiState *ti = TI_XHCI(obj);
+
+pci->xhci.numintrs = ti->intrs;
+pci->xhci.numslots = ti->slots;
+
+pci->cache_line_size = 0x0;
+pci->pm_cap_off = 0x40;
+pci->pcie_cap_off = 0x70;
+pci->msi_cap_off = 0x48;
+pci->msix_cap_off = 0xc0;
+pci->msix_bar_nr = 0x2;
+pci->msix_bar_size = 0x80;
+pci->msix_table_off = 0x0;
+pci->msix_pba_off = 0x1000;
+}
+
+static void ti_xhci_class_init(ObjectClass *klass, void *data)
+{
+PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+device_class_set_props(dc, ti_xhci_properties);
+k->vendor_id= PCI_VENDOR_ID_TI;
+k->device_id= PCI_DEVICE_ID_TI_TUSB73X0;
+k->revision = 0x02;
+}
+
+static const TypeInfo ti_xhci_info = {
+.name  = TYPE_TI_XHCI,
+.parent= TYPE_XHCI_PCI,
+.instance_size = sizeof(XHCITiState),
+.instance_init = ti_xhci_instance_init,
+.class_init= ti_xhci_class_init,
+};
+
+static void ti_xhci_register_types(void)
+{
+type_register_static(&ti_xhci_info);
+}
+
+type_init(ti_xhci_register_types)
diff --git a/tests/qtest/usb-hcd-xhci-test.c b/tests/qtest/usb-hcd-xhci-test.c
index 4efe7b69d4f..dc438cf35c7 100644
--- a/tests/qtest/usb-hcd-xhci-test.c
+++ b/tests/qtest/usb-hcd-xhci-test.c
@@ -65,6 +65,8 @@ typedef struct XHCIQState {
   PCI_VENDOR_ID_REDHAT)
 #define XHCI_NEC_ID (PCI_DEVICE_ID_NEC_UPD720200 << 16 | \
  PCI_VENDOR_ID_NEC)
+#define XHCI_TI_ID  (PCI_DEVICE_ID_TI_TUSB73X0 << 16 | \
+ PCI_VENDOR_ID_TI)
 
 /**
  * Locate, verify, and return a handle to the XHCI device.
@@ -85,6 +87,7 @@ static QPCIDevice *get_xhci_

[PATCH v3 6/8] tests/qtest/xhci: test the qemu-xhci device

2025-04-11 Thread Nicholas Piggin
Add support in the test code for running multiple drivers, and add
tests for the qemu-xhci device.
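
As a rough illustration of how each test ends up registered once per device,
the path is built from the device name and the test name with the
"/xhci/pci/%s/%s" format used in add_test(). The sketch below uses plain
snprintf() instead of g_strdup_printf() so it stands alone; the function name
xhci_test_path() is an assumption for illustration only:

```c
#include <stdio.h>

/*
 * Hypothetical stand-alone equivalent of the path construction in
 * add_test(): writes "/xhci/pci/<device>/<name>" into buf.
 * Returns what snprintf() returns (the length that would be written).
 */
static int xhci_test_path(char *buf, size_t size,
                          const char *device, const char *name)
{
    return snprintf(buf, size, "/xhci/pci/%s/%s", device, name);
}
```

For device "qemu-xhci" and test "hotplug" this gives
"/xhci/pci/qemu-xhci/hotplug", so the same test body can be told apart per
controller model in the test output.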

Signed-off-by: Nicholas Piggin 
---
 tests/qtest/usb-hcd-xhci-test.c | 96 +
 1 file changed, 63 insertions(+), 33 deletions(-)

diff --git a/tests/qtest/usb-hcd-xhci-test.c b/tests/qtest/usb-hcd-xhci-test.c
index 63359fb70b9..4efe7b69d4f 100644
--- a/tests/qtest/usb-hcd-xhci-test.c
+++ b/tests/qtest/usb-hcd-xhci-test.c
@@ -13,10 +13,15 @@
 #include "libqos/libqos-pc.h"
 #include "libqtest-single.h"
 #include "libqos/usb.h"
+#include "hw/pci/pci.h"
 #include "hw/pci/pci_ids.h"
 #include "hw/pci/pci_regs.h"
 #include "hw/usb/hcd-xhci.h"
 
+typedef struct TestData {
+const char *device;
+} TestData;
+
 /*** Test Setup & Teardown ***/
 typedef struct XHCIQSlotState {
 /* In-memory arrays */
@@ -56,6 +61,8 @@ typedef struct XHCIQState {
 XHCIQSlotState slots[32];
 } XHCIQState;
 
+#define XHCI_QEMU_ID (PCI_DEVICE_ID_REDHAT_XHCI << 16 | \
+  PCI_VENDOR_ID_REDHAT)
 #define XHCI_NEC_ID (PCI_DEVICE_ID_NEC_UPD720200 << 16 | \
  PCI_VENDOR_ID_NEC)
 
@@ -76,6 +83,7 @@ static QPCIDevice *get_xhci_device(QTestState *qts, uint32_t 
*fingerprint)
 
 xhci_fingerprint = qpci_config_readl(xhci, PCI_VENDOR_ID);
 switch (xhci_fingerprint) {
+case XHCI_QEMU_ID:
 case XHCI_NEC_ID:
 break;
 default:
@@ -128,20 +136,21 @@ static XHCIQState *xhci_boot(const char *cli, ...)
 XHCIQState *s;
 va_list ap;
 
-if (cli) {
-va_start(ap, cli);
-s = xhci_vboot(cli, ap);
-va_end(ap);
-} else {
-s = xhci_boot("-M q35 "
-  "-device nec-usb-xhci,id=xhci,bus=pcie.0,addr=1d.0 "
-  "-drive id=drive0,if=none,file=null-co://,"
-  "file.read-zeroes=on,format=raw");
-}
+va_start(ap, cli);
+s = xhci_vboot(cli, ap);
+va_end(ap);
 
 return s;
 }
 
+static XHCIQState *xhci_boot_dev(const char *device)
+{
+return xhci_boot("-M q35 "
+"-device %s,id=xhci,bus=pcie.0,addr=1d.0 "
+"-drive id=drive0,if=none,file=null-co://,"
+"file.read-zeroes=on,format=raw", device);
+}
+
 /**
  * Clean up the PCI device, then terminate the QEMU instance.
  */
@@ -156,12 +165,13 @@ static void xhci_shutdown(XHCIQState *xhci)
 
 /*** tests ***/
 
-static void test_xhci_hotplug(void)
+static void test_xhci_hotplug(const void *arg)
 {
+const TestData *td = arg;
 XHCIQState *s;
 QTestState *qts;
 
-s = xhci_boot(NULL);
+s = xhci_boot_dev(td->device);
 qts = s->parent->qts;
 
 usb_test_hotplug(qts, "xhci", "1", NULL);
@@ -169,12 +179,13 @@ static void test_xhci_hotplug(void)
 xhci_shutdown(s);
 }
 
-static void test_usb_uas_hotplug(void)
+static void test_usb_uas_hotplug(const void *arg)
 {
+const TestData *td = arg;
 XHCIQState *s;
 QTestState *qts;
 
-s = xhci_boot(NULL);
+s = xhci_boot_dev(td->device);
 qts = s->parent->qts;
 
 qtest_qmp_device_add(qts, "usb-uas", "uas", "{}");
@@ -191,12 +202,13 @@ static void test_usb_uas_hotplug(void)
 xhci_shutdown(s);
 }
 
-static void test_usb_ccid_hotplug(void)
+static void test_usb_ccid_hotplug(const void *arg)
 {
+const TestData *td = arg;
 XHCIQState *s;
 QTestState *qts;
 
-s = xhci_boot(NULL);
+s = xhci_boot_dev(td->device);
 qts = s->parent->qts;
 
 qtest_qmp_device_add(qts, "usb-ccid", "ccid", "{}");
@@ -392,8 +404,9 @@ static void submit_tr_trb(XHCIQState *s, int slot, XHCITRB 
*trb)
  * This could be librified in future (like AHCI0 to have a way to bring up
  * an endpoint to test device protocols.
  */
-static void pci_xhci_stress_rings(void)
+static void test_xhci_stress_rings(const void *arg)
 {
+const TestData *td = arg;
 XHCIQState *s;
 uint32_t value;
 uint64_t input_context;
@@ -405,11 +418,11 @@ static void pci_xhci_stress_rings(void)
 int i;
 
 s = xhci_boot("-M q35 "
-"-device nec-usb-xhci,id=xhci,bus=pcie.0,addr=1d.0 "
+"-device %s,id=xhci,bus=pcie.0,addr=1d.0 "
 "-device usb-storage,bus=xhci.0,drive=drive0 "
 "-drive id=drive0,if=none,file=null-co://,"
-"file.read-zeroes=on,format=raw "
-);
+"file.read-zeroes=on,format=raw ",
+td->device);
 
 hcsparams1 = xhci_cap_readl(s, XHCI_HCCAP_REG_HCSPARAMS1);
 s->maxports = (hcsparams1 >> 24) & 0xff;
@@ -567,11 +580,37 @@ static void pci_xhci_stress_rings(void)
 xhci_shutdown(s);
 }
 
+static void add_test(const char *name, TestData *td, void (*fn)(const void *))
+{
+g_autofree char *full_name = g_strdup_printf(
+"/xhci/pci/%s/%s", td->device, name);
+qtest_add_data_func(full_name, td, fn);
+}
+
+static void add_tests(TestData *td)
+{
+add_test("hotplug", td, test_xhci_hotplug);
+if (qtest_has_device("usb-uas")) {
+   

Re: [PATCH] virtio-net: Copy all for dhclient workaround

2025-04-11 Thread Akihiko Odaki

On 2025/04/07 17:29, Antoine Damhet wrote:

On Sat, Apr 05, 2025 at 05:04:28PM +0900, Akihiko Odaki wrote:

The goal of commit 7987d2be5a8b ("virtio-net: Copy received header to
buffer") was to remove the need to patch the (const) input buffer with a
recomputed UDP checksum by copying headers to a RW region and inject the
checksum there. The patch computed the checksum only from the header
fields (missing the rest of the payload) producing an invalid one
and making guests fail to acquire a DHCP lease.

Fix the issue by copying the entire packet instead of only copying the
headers.
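
To illustrate why a checksum taken over only the copied headers cannot be
valid, here is a generic 16-bit ones'-complement (RFC 1071 style) checksum.
This is a stand-alone sketch, not QEMU's net_checksum_calculate(); the
function name inet_checksum() is an assumption:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Generic Internet checksum over len bytes (big-endian 16-bit words,
 * odd trailing byte padded with zero). Illustrative only.
 */
static uint16_t inet_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    }
    if (i < len) {
        sum += (uint32_t)buf[i] << 8;   /* pad the odd byte */
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);   /* fold carries */
    }
    return (uint16_t)~sum;
}
```

Running it over a prefix of a buffer and over the whole buffer generally
gives different results, which is the failure mode described above: the sum
has to cover every byte the receiver will verify.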

Fixes: 7987d2be5a8b ("virtio-net: Copy received header to buffer")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2727
Cc: qemu-sta...@nongnu.org
Signed-off-by: Akihiko Odaki 


Tested-By: Antoine Damhet 


---
This patch aims to resolve the same issue as the following one:
https://lore.kernel.org/qemu-devel/20250404151835.328368-1-adam...@scaleway.com

The difference from the mentioned patch is that this patch also
preserves the original intent of the regressing change, which is to
remove the need to patch the (const) input buffer with a recomputed UDP
checksum.

To Antoine Damhet:
I confirmed that DHCP is currently not working and that this patch fixes
the issue, but I would appreciate it if you could also confirm the fix,
as I already tested the regressing patch poorly.


Thanks for the swift response. Ideally I'd like a non-regression test in
the testsuite, but a quick test showed that I couldn't easily reproduce
the issue with user networking, so unless someone has a great idea it
would be a pain.


---
  hw/net/virtio-net.c | 35 ---
  1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index de87cfadffe1..a920358a89c5 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1687,6 +1687,11 @@ static void virtio_net_hdr_swap(VirtIODevice *vdev, 
struct virtio_net_hdr *hdr)
  virtio_tswap16s(vdev, &hdr->csum_offset);
  }
  
+typedef struct Header {

+struct virtio_net_hdr_v1_hash virtio_net;
+uint8_t payload[1500];
+} Header;
+
  /* dhclient uses AF_PACKET but doesn't pass auxdata to the kernel so
   * it never finds out that the packets don't have valid checksums.  This
   * causes dhclient to get upset.  Fedora's carried a patch for ages to
@@ -1701,7 +1706,7 @@ static void virtio_net_hdr_swap(VirtIODevice *vdev, 
struct virtio_net_hdr *hdr)
   * we should provide a mechanism to disable it to avoid polluting the host
   * cache.
   */
-static void work_around_broken_dhclient(struct virtio_net_hdr *hdr,
+static void work_around_broken_dhclient(struct Header *hdr,
  size_t *hdr_len, const uint8_t *buf,
  size_t buf_size, size_t *buf_offset)
  {
@@ -1711,20 +1716,20 @@ static void work_around_broken_dhclient(struct 
virtio_net_hdr *hdr,
  buf += *buf_offset;
  buf_size -= *buf_offset;
  
-if ((hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) && /* missing csum */
-(buf_size >= csum_size && buf_size < 1500) && /* normal sized MTU */
+if ((hdr->virtio_net.hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) && /* missing csum */
+(buf_size >= csum_size && buf_size < sizeof(hdr->payload)) && /* normal sized MTU */
  (buf[12] == 0x08 && buf[13] == 0x00) && /* ethertype == IPv4 */
  (buf[23] == 17) && /* ip.protocol == UDP */
  (buf[34] == 0 && buf[35] == 67)) { /* udp.srcport == bootps */
-memcpy((uint8_t *)hdr + *hdr_len, buf, csum_size);
-net_checksum_calculate((uint8_t *)hdr + *hdr_len, csum_size, CSUM_UDP);
-hdr->flags &= ~VIRTIO_NET_HDR_F_NEEDS_CSUM;
-*hdr_len += csum_size;
-*buf_offset += csum_size;
+memcpy((uint8_t *)hdr + *hdr_len, buf, buf_size);
+net_checksum_calculate((uint8_t *)hdr + *hdr_len, buf_size, CSUM_UDP);
+hdr->virtio_net.hdr.flags &= ~VIRTIO_NET_HDR_F_NEEDS_CSUM;
+*hdr_len += buf_size;
+*buf_offset += buf_size;
  }
  }
  
-static size_t receive_header(VirtIONet *n, struct virtio_net_hdr *hdr,

+static size_t receive_header(VirtIONet *n, Header *hdr,
   const void *buf, size_t buf_size,
   size_t *buf_offset)


`receive_header` can now "receive" the whole packet, which is kind of
misleading. I thought another approach would be to do only the
detection/flag patching in receive_header and recompute the checksum
directly in the final `iov`; this would also eliminate the extra payload
copy.


It is possible to avoid copying, but I chose not to do that because this
is not a hot path and the code complexity required does not look
worthwhile to me.


But I agree that the names of receive_header() and Header structure are 
misleading. The reasoning I used to convince myself is that the "Header" 
is at the head of the packet at least. I'd like to hea

[PATCH v2 05/10] usb/msd: Allow CBW packet size greater than 31

2025-04-11 Thread Nicholas Piggin
The CBW structure is 31 bytes, so CBW DATAOUT packets must be at least
31 bytes. QEMU enforces exactly 31 bytes, but this is inconsistent with
how it handles CSW packets (where it allows greater than or equal to 13
bytes) despite wording in the spec[*] being similar for both packet
types: "shall end as a short packet with exactly 31 bytes transferred".

  [*] USB MSD Bulk-Only Transport 1.0

For consistency, and on the principle of being tolerant in accepting
input, relax the CBW size check.

Alternatively, both checks could be tightened to exact. Or a message
could be printed warning of possible guest error if size is not exact,
but still accept the packets.
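
The two policies discussed above can be sketched as follows. The constant
and function names are hypothetical and only mirror the size comparison in
try_get_valid_cbw(); they are not taken from the QEMU source:

```c
#include <stdbool.h>
#include <stddef.h>

#define CBW_SIZE 31   /* size of the CBW structure, per the text above */

/* Previous behaviour: only a packet of exactly 31 bytes is accepted. */
static bool cbw_size_ok_strict(size_t size)
{
    return size == CBW_SIZE;
}

/* Behaviour after this patch: anything at least 31 bytes is accepted. */
static bool cbw_size_ok_relaxed(size_t size)
{
    return size >= CBW_SIZE;
}
```

A 64-byte packet is rejected by the strict check and accepted by the relaxed
one; a 30-byte packet is rejected by both, since it cannot hold a full CBW.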

Signed-off-by: Nicholas Piggin 
---
 hw/usb/dev-storage.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
index 6668114ea74..27093de5c84 100644
--- a/hw/usb/dev-storage.c
+++ b/hw/usb/dev-storage.c
@@ -400,7 +400,7 @@ static bool try_get_valid_cbw(USBPacket *p, struct 
usb_msd_cbw *cbw)
 {
 uint32_t sig;
 
-if (p->iov.size != 31) {
+if (p->iov.size < 31) {
 qemu_log_mask(LOG_GUEST_ERROR, "usb-msd: Bad CBW size %zu\n",
p->iov.size);
 return false;
-- 
2.47.1




[PATCH v2 03/10] usb/msd: Improved handling of mass storage reset

2025-04-11 Thread Nicholas Piggin
The mass storage reset request handling does not reset in-flight
SCSI requests or USB MSD packets. Implement this by calling the
device reset handler which should take care of everything.
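
A toy model of the difference, under loud assumptions: the struct and
function names below are invented for illustration, and the "full" reset
here merely drops the pointers, whereas the real device reset handler is
expected to cancel the request and complete the packet properly:

```c
#include <stddef.h>

/* Toy stand-ins, not QEMU types. */
enum toy_mode { TOY_MODE_CBW, TOY_MODE_DATAOUT, TOY_MODE_DATAIN, TOY_MODE_CSW };

struct toy_msd {
    enum toy_mode mode;
    void *req;      /* in-flight SCSI request, NULL if none */
    void *packet;   /* in-flight USB packet, NULL if none */
};

/* Old handling: only the mode is rewound; in-flight state survives. */
static void toy_reset_mode_only(struct toy_msd *s)
{
    s->mode = TOY_MODE_CBW;
}

/* New handling (simplified): in-flight state is dropped as well. */
static void toy_reset_full(struct toy_msd *s)
{
    s->req = NULL;
    s->packet = NULL;
    s->mode = TOY_MODE_CBW;
}
```

After toy_reset_mode_only() the device claims to be ready for a new CBW while
still holding a stale request and packet, which is the inconsistency the
patch removes.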

Signed-off-by: Nicholas Piggin 
---
 hw/usb/dev-storage.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
index 87c22476f6b..c7c36ac80fa 100644
--- a/hw/usb/dev-storage.c
+++ b/hw/usb/dev-storage.c
@@ -359,7 +359,7 @@ static void usb_msd_handle_control(USBDevice *dev, 
USBPacket *p,
 /* Class specific requests.  */
 case ClassInterfaceOutRequest | MassStorageReset:
 /* Reset state ready for the next CBW.  */
-s->mode = USB_MSDM_CBW;
+usb_msd_handle_reset(dev);
 break;
 case ClassInterfaceRequest | GetMaxLun:
 maxlun = 0;
-- 
2.47.1




[PATCH v2 10/10] usb/msd: Add more tracing

2025-04-11 Thread Nicholas Piggin
Add tracing for more received packet types, cbw_state changes, and
some more SCSI callbacks. These were useful in debugging relaxed
packet ordering support.

Signed-off-by: Nicholas Piggin 
---
 hw/usb/dev-storage.c | 23 +--
 hw/usb/trace-events  |  9 -
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
index 654b9071d33..0ed39de189d 100644
--- a/hw/usb/dev-storage.c
+++ b/hw/usb/dev-storage.c
@@ -187,7 +187,7 @@ static void usb_msd_data_packet_complete(MSDState *s, int 
status)
  * because another request may be issued before usb_packet_complete
  * returns.
  */
-trace_usb_msd_packet_complete();
+trace_usb_msd_data_packet_complete();
 s->data_packet = NULL;
 p->status = status;
 usb_packet_complete(&s->dev, p);
@@ -202,7 +202,7 @@ static void usb_msd_csw_packet_complete(MSDState *s, int 
status)
  * because another request may be issued before usb_packet_complete
  * returns.
  */
-trace_usb_msd_packet_complete();
+trace_usb_msd_csw_packet_complete();
 s->csw_in_packet = NULL;
 p->status = status;
 usb_packet_complete(&s->dev, p);
@@ -231,7 +231,11 @@ static void usb_msd_fatal_error(MSDState *s)
 static void usb_msd_copy_data(MSDState *s, USBPacket *p)
 {
 uint32_t len;
+
 len = p->iov.size - p->actual_length;
+
+trace_usb_msd_copy_data(s->req->tag, len);
+
 if (len > s->scsi_len)
 len = s->scsi_len;
 usb_packet_copy(p, scsi_req_get_buf(s->req) + s->scsi_off, len);
@@ -264,6 +268,8 @@ void usb_msd_transfer_data(SCSIRequest *req, uint32_t len)
 MSDState *s = DO_UPCAST(MSDState, dev.qdev, req->bus->qbus.parent);
 USBPacket *p = s->data_packet;
 
+trace_usb_msd_transfer_data(req->tag, len);
+
 if (s->cbw_state == USB_MSD_CBW_DATAIN) {
 if (req->cmd.mode == SCSI_XFER_TO_DEV) {
 usb_msd_fatal_error(s);
@@ -324,11 +330,13 @@ void usb_msd_command_complete(SCSIRequest *req, size_t 
resid)
 }
 if (s->data_len == 0) {
 s->cbw_state = USB_MSD_CBW_CSW;
+trace_usb_msd_cbw_state(s->cbw_state);
 }
 /* USB_RET_SUCCESS status clears previous ASYNC status */
 usb_msd_data_packet_complete(s, USB_RET_SUCCESS);
 } else if (s->data_len == 0) {
 s->cbw_state = USB_MSD_CBW_CSW;
+trace_usb_msd_cbw_state(s->cbw_state);
 }
 
 if (s->cbw_state == USB_MSD_CBW_CSW) {
@@ -336,6 +344,7 @@ void usb_msd_command_complete(SCSIRequest *req, size_t 
resid)
 if (p) {
 usb_msd_send_status(s, p);
 s->cbw_state = USB_MSD_CBW_NONE;
+trace_usb_msd_cbw_state(s->cbw_state);
 /* USB_RET_SUCCESS status clears previous ASYNC status */
 usb_msd_csw_packet_complete(s, USB_RET_SUCCESS);
 }
@@ -379,6 +388,7 @@ void usb_msd_handle_reset(USBDevice *dev)
 
 memset(&s->csw, 0, sizeof(s->csw));
 s->cbw_state = USB_MSD_CBW_NONE;
+trace_usb_msd_cbw_state(s->cbw_state);
 
 s->needs_reset = false;
 }
@@ -429,6 +439,8 @@ static void usb_msd_cancel_io(USBDevice *dev, USBPacket *p)
 {
 MSDState *s = USB_STORAGE_DEV(dev);
 
+trace_usb_msd_cancel_io();
+
 if (p == s->data_packet) {
 s->data_packet = NULL;
 if (s->req) {
@@ -516,6 +528,7 @@ static void usb_msd_handle_data_out(USBDevice *dev, 
USBPacket *p)
 }
 trace_usb_msd_cmd_submit(cbw.lun, tag, cbw.flags,
  cbw.cmd_len, s->data_len);
+trace_usb_msd_cbw_state(s->cbw_state);
 assert(le32_to_cpu(s->csw.residue) == 0);
 assert(s->scsi_len == 0);
 s->req = scsi_req_new(scsi_dev, tag, cbw.lun,
@@ -553,6 +566,7 @@ static void usb_msd_handle_data_out(USBDevice *dev, 
USBPacket *p)
 s->data_len -= len;
 if (s->data_len == 0) {
 s->cbw_state = USB_MSD_CBW_CSW;
+trace_usb_msd_cbw_state(s->cbw_state);
 }
 }
 }
@@ -579,6 +593,7 @@ static void usb_msd_handle_data_in(USBDevice *dev, 
USBPacket *p)
 
 switch (s->cbw_state) {
 case USB_MSD_CBW_NONE:
+trace_usb_msd_unknown_in(p->iov.size);
 if (s->unknown_in_packet) {
 qemu_log_mask(LOG_GUEST_ERROR, "usb-msd: second IN packet was"
"received before CBW\n");
@@ -590,6 +605,7 @@ static void usb_msd_handle_data_in(USBDevice *dev, 
USBPacket *p)
 break;
 
 case USB_MSD_CBW_DATAOUT:
+trace_usb_msd_csw_in();
 if (s->unknown_in_packet) {
 error_report("usb-msd: unknown_in_packet in DATAOUT state");
 goto fail;
@@ -610,6 +626,7 @@ static void usb_msd_handle_data_in(USBDevice *dev, 
USBPacket *p)
 break;
 
 case USB_MSD_CBW_CSW:
+trace_usb_msd_csw_in();
 if (s->unknown_in_packet) {
 error_report("usb-msd: unknown_in_packet 

[PATCH v2 06/10] usb/msd: Split async packet tracking into data and csw

2025-04-11 Thread Nicholas Piggin
The async packet handling logic has places that infer from context
whether the async packet is data or CSW. This is not wrong, but the
logic is easier to follow if packets are categorised when they are
accepted.

Signed-off-by: Nicholas Piggin 
---
 include/hw/usb/msd.h |   5 +-
 hw/usb/dev-storage.c | 121 +++
 2 files changed, 79 insertions(+), 47 deletions(-)

diff --git a/include/hw/usb/msd.h b/include/hw/usb/msd.h
index f9fd862b529..a40d15f5def 100644
--- a/include/hw/usb/msd.h
+++ b/include/hw/usb/msd.h
@@ -33,8 +33,11 @@ struct MSDState {
 struct usb_msd_csw csw;
 SCSIRequest *req;
 SCSIBus bus;
+
 /* For async completion.  */
-USBPacket *packet;
+USBPacket *data_packet;
+USBPacket *csw_in_packet;
+
 /* usb-storage only */
 BlockConf conf;
 bool removable;
diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
index 27093de5c84..a9d8d4e8618 100644
--- a/hw/usb/dev-storage.c
+++ b/hw/usb/dev-storage.c
@@ -178,18 +178,33 @@ static const USBDesc desc = {
 .str   = desc_strings,
 };
 
-static void usb_msd_packet_complete(MSDState *s, int status)
+static void usb_msd_data_packet_complete(MSDState *s, int status)
 {
-USBPacket *p = s->packet;
+USBPacket *p = s->data_packet;
 
 /*
- * Set s->packet to NULL before calling usb_packet_complete
- * because another request may be issued before
- * usb_packet_complete returns.
+ * Set s->data_packet to NULL before calling usb_packet_complete
+ * because another request may be issued before usb_packet_complete
+ * returns.
  */
 trace_usb_msd_packet_complete();
+s->data_packet = NULL;
+p->status = status;
+usb_packet_complete(&s->dev, p);
+}
+
+static void usb_msd_csw_packet_complete(MSDState *s, int status)
+{
+USBPacket *p = s->csw_in_packet;
+
+/*
+ * Set s->csw_in_packet to NULL before calling usb_packet_complete
+ * because another request may be issued before usb_packet_complete
+ * returns.
+ */
+trace_usb_msd_packet_complete();
+s->csw_in_packet = NULL;
 p->status = status;
-s->packet = NULL;
 usb_packet_complete(&s->dev, p);
 }
 
@@ -197,8 +212,12 @@ static void usb_msd_fatal_error(MSDState *s)
 {
 trace_usb_msd_fatal_error();
 
-if (s->packet) {
-usb_msd_packet_complete(s, USB_RET_STALL);
+if (s->data_packet) {
+usb_msd_data_packet_complete(s, USB_RET_STALL);
+}
+
+if (s->csw_in_packet) {
+usb_msd_csw_packet_complete(s, USB_RET_STALL);
 }
 
 /*
@@ -243,7 +262,7 @@ static void usb_msd_send_status(MSDState *s, USBPacket *p)
 void usb_msd_transfer_data(SCSIRequest *req, uint32_t len)
 {
 MSDState *s = DO_UPCAST(MSDState, dev.qdev, req->bus->qbus.parent);
-USBPacket *p = s->packet;
+USBPacket *p = s->data_packet;
 
 if ((s->mode == USB_MSDM_DATAOUT) != (req->cmd.mode == SCSI_XFER_TO_DEV)) {
 usb_msd_fatal_error(s);
@@ -254,10 +273,10 @@ void usb_msd_transfer_data(SCSIRequest *req, uint32_t len)
 s->scsi_off = 0;
 if (p) {
 usb_msd_copy_data(s, p);
-p = s->packet;
+p = s->data_packet;
 if (p && p->actual_length == p->iov.size) {
 /* USB_RET_SUCCESS status clears previous ASYNC status */
-usb_msd_packet_complete(s, USB_RET_SUCCESS);
+usb_msd_data_packet_complete(s, USB_RET_SUCCESS);
 }
 }
 }
@@ -265,7 +284,7 @@ void usb_msd_transfer_data(SCSIRequest *req, uint32_t len)
 void usb_msd_command_complete(SCSIRequest *req, size_t resid)
 {
 MSDState *s = DO_UPCAST(MSDState, dev.qdev, req->bus->qbus.parent);
-USBPacket *p = s->packet;
+USBPacket *p = s->data_packet;
 
 trace_usb_msd_cmd_complete(req->status, req->tag);
 
@@ -274,35 +293,37 @@ void usb_msd_command_complete(SCSIRequest *req, size_t 
resid)
 s->csw.residue = cpu_to_le32(s->data_len);
 s->csw.status = req->status != 0;
 
-if (s->packet) {
-if (s->data_len == 0 && s->mode == USB_MSDM_DATAOUT) {
-/* A deferred packet with no write data remaining must be
-   the status read packet.  */
-usb_msd_send_status(s, p);
-s->mode = USB_MSDM_CBW;
-} else if (s->mode == USB_MSDM_CSW) {
-usb_msd_send_status(s, p);
-s->mode = USB_MSDM_CBW;
-} else {
-if (s->data_len) {
-int len = (p->iov.size - p->actual_length);
-usb_packet_skip(p, len);
-if (len > s->data_len) {
-len = s->data_len;
-}
-s->data_len -= len;
-}
-if (s->data_len == 0) {
-s->mode = USB_MSDM_CSW;
+scsi_req_unref(req);
+s->req = NULL;
+
+if (p) {
+g_assert(s->mode == USB_MSDM_DATAIN || s->mode == USB_MSDM_DATAOUT);
+if (s->data_len) {
+int len = (p->iov.size - p->actual_length);
+usb

[PATCH v2 01/10] usb/msd: Split in and out packet handling

2025-04-11 Thread Nicholas Piggin
Split in and out packet handling into their own functions, to make
them a bit more manageable.

Signed-off-by: Nicholas Piggin 
---
 hw/usb/dev-storage.c | 266 +++
 1 file changed, 145 insertions(+), 121 deletions(-)

diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
index 4f1e8b7f6cb..2d7306b0572 100644
--- a/hw/usb/dev-storage.c
+++ b/hw/usb/dev-storage.c
@@ -395,158 +395,182 @@ static void usb_msd_cancel_io(USBDevice *dev, USBPacket 
*p)
 }
 }
 
-static void usb_msd_handle_data(USBDevice *dev, USBPacket *p)
+static void usb_msd_handle_data_out(USBDevice *dev, USBPacket *p)
 {
 MSDState *s = (MSDState *)dev;
 uint32_t tag;
 struct usb_msd_cbw cbw;
-uint8_t devep = p->ep->nr;
 SCSIDevice *scsi_dev;
 int len;
 
-if (s->needs_reset) {
-p->status = USB_RET_STALL;
-return;
-}
+switch (s->mode) {
+case USB_MSDM_CBW:
+if (p->iov.size != 31) {
+error_report("usb-msd: Bad CBW size");
+goto fail;
+}
+usb_packet_copy(p, &cbw, 31);
+if (le32_to_cpu(cbw.sig) != 0x43425355) {
+error_report("usb-msd: Bad signature %08x",
+ le32_to_cpu(cbw.sig));
+goto fail;
+}
+scsi_dev = scsi_device_find(&s->bus, 0, 0, cbw.lun);
+if (scsi_dev == NULL) {
+error_report("usb-msd: Bad LUN %d", cbw.lun);
+goto fail;
+}
+tag = le32_to_cpu(cbw.tag);
+s->data_len = le32_to_cpu(cbw.data_len);
+if (s->data_len == 0) {
+s->mode = USB_MSDM_CSW;
+} else if (cbw.flags & 0x80) {
+s->mode = USB_MSDM_DATAIN;
+} else {
+s->mode = USB_MSDM_DATAOUT;
+}
+trace_usb_msd_cmd_submit(cbw.lun, tag, cbw.flags,
+ cbw.cmd_len, s->data_len);
+assert(le32_to_cpu(s->csw.residue) == 0);
+s->scsi_len = 0;
+s->req = scsi_req_new(scsi_dev, tag, cbw.lun,
+  cbw.cmd, cbw.cmd_len, NULL);
+if (s->commandlog) {
+scsi_req_print(s->req);
+}
+len = scsi_req_enqueue(s->req);
+if (len) {
+scsi_req_continue(s->req);
+}
+break;
 
-switch (p->pid) {
-case USB_TOKEN_OUT:
-if (devep != 2)
+case USB_MSDM_DATAOUT:
+trace_usb_msd_data_out(p->iov.size, s->data_len);
+if (p->iov.size > s->data_len) {
 goto fail;
+}
 
-switch (s->mode) {
-case USB_MSDM_CBW:
-if (p->iov.size != 31) {
-error_report("usb-msd: Bad CBW size");
-goto fail;
-}
-usb_packet_copy(p, &cbw, 31);
-if (le32_to_cpu(cbw.sig) != 0x43425355) {
-error_report("usb-msd: Bad signature %08x",
- le32_to_cpu(cbw.sig));
-goto fail;
-}
-scsi_dev = scsi_device_find(&s->bus, 0, 0, cbw.lun);
-if (scsi_dev == NULL) {
-error_report("usb-msd: Bad LUN %d", cbw.lun);
-goto fail;
-}
-tag = le32_to_cpu(cbw.tag);
-s->data_len = le32_to_cpu(cbw.data_len);
-if (s->data_len == 0) {
-s->mode = USB_MSDM_CSW;
-} else if (cbw.flags & 0x80) {
-s->mode = USB_MSDM_DATAIN;
-} else {
-s->mode = USB_MSDM_DATAOUT;
-}
-trace_usb_msd_cmd_submit(cbw.lun, tag, cbw.flags,
- cbw.cmd_len, s->data_len);
-assert(le32_to_cpu(s->csw.residue) == 0);
-s->scsi_len = 0;
-s->req = scsi_req_new(scsi_dev, tag, cbw.lun, cbw.cmd, 
cbw.cmd_len, NULL);
-if (s->commandlog) {
-scsi_req_print(s->req);
-}
-len = scsi_req_enqueue(s->req);
+if (s->scsi_len) {
+usb_msd_copy_data(s, p);
+}
+if (le32_to_cpu(s->csw.residue)) {
+len = p->iov.size - p->actual_length;
 if (len) {
-scsi_req_continue(s->req);
+usb_packet_skip(p, len);
+if (len > s->data_len) {
+len = s->data_len;
+}
+s->data_len -= len;
+if (s->data_len == 0) {
+s->mode = USB_MSDM_CSW;
+}
 }
-break;
+}
+if (p->actual_length < p->iov.size) {
+trace_usb_msd_packet_async();
+s->packet = p;
+p->status = USB_RET_ASYNC;
+}
+break;
 
-case USB_MSDM_DATAOUT:
-trace_usb_msd_data_out(p->iov.size, s->data_len);
-if (p->iov.size > s->data_len) {
-goto fail;
-}
+default:
+goto fail;
+}
+return;
 
-if (s->scsi_le

[PATCH v2 08/10] usb/msd: Rename mode to cbw_state, and tweak names

2025-04-11 Thread Nicholas Piggin
This reflects a little better what it does, particularly with a
subsequent change to relax the order packets are seen in. This
field is not the general state of the MSD state machine, rather
it follows packets that are completed as part of a CBW command.

The difference is a bit subtle, so for a concrete example, the
next change will permit the host to send a CSW packet before it
sends the associated CBW packet. In that case the CSW packet
will be tracked and the MSD state machine will move, but this
mode / cbw_state field would remain unchanged (in the "expecting
CBW" state), until the CBW packet arrives.
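
As a small sketch of what the field tracks, the state chosen when a CBW is
accepted depends only on the CBW's data length and direction flag. The enum
is the one introduced by this patch; the helper cbw_state_after_cbw() and the
CBW_FLAG_DATA_IN name are hypothetical, modelled on the existing
data_len / (flags & 0x80) checks in the OUT handler:

```c
#include <stdint.h>

enum USBMSDCBWState {
    USB_MSD_CBW_NONE,    /* Ready, waiting for CBW packet. */
    USB_MSD_CBW_DATAOUT, /* Expecting DATA-OUT (to device) packet */
    USB_MSD_CBW_DATAIN,  /* Expecting DATA-IN (from device) packet */
    USB_MSD_CBW_CSW      /* No more data, expecting CSW packet. */
};

#define CBW_FLAG_DATA_IN 0x80   /* hypothetical name for the 0x80 flag bit */

/* Hypothetical helper: state to enter once a valid CBW has been accepted. */
static enum USBMSDCBWState cbw_state_after_cbw(uint32_t data_len,
                                               uint8_t flags)
{
    if (data_len == 0) {
        return USB_MSD_CBW_CSW;       /* no data phase at all */
    }
    if (flags & CBW_FLAG_DATA_IN) {
        return USB_MSD_CBW_DATAIN;    /* device-to-host data phase */
    }
    return USB_MSD_CBW_DATAOUT;       /* host-to-device data phase */
}
```

Note that nothing in this helper looks at IN packets: a CSW read arriving
early would be tracked elsewhere, and the state would stay at
USB_MSD_CBW_NONE until the CBW itself is seen.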

Signed-off-by: Nicholas Piggin 
---
 include/hw/usb/msd.h | 12 +--
 hw/usb/dev-storage.c | 50 +++-
 2 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/include/hw/usb/msd.h b/include/hw/usb/msd.h
index a40d15f5def..c109544f632 100644
--- a/include/hw/usb/msd.h
+++ b/include/hw/usb/msd.h
@@ -10,11 +10,11 @@
 #include "hw/usb.h"
 #include "hw/scsi/scsi.h"
 
-enum USBMSDMode {
-USB_MSDM_CBW, /* Command Block.  */
-USB_MSDM_DATAOUT, /* Transfer data to device.  */
-USB_MSDM_DATAIN, /* Transfer data from device.  */
-USB_MSDM_CSW /* Command Status.  */
+enum USBMSDCBWState {
+USB_MSD_CBW_NONE,/* Ready, waiting for CBW packet. */
+USB_MSD_CBW_DATAOUT, /* Expecting DATA-OUT (to device) packet */
+USB_MSD_CBW_DATAIN,  /* Expecting DATA-IN (from device) packet */
+USB_MSD_CBW_CSW  /* No more data, expecting CSW packet.  */
 };
 
 struct QEMU_PACKED usb_msd_csw {
@@ -26,7 +26,7 @@ struct QEMU_PACKED usb_msd_csw {
 
 struct MSDState {
 USBDevice dev;
-enum USBMSDMode mode;
+enum USBMSDCBWState cbw_state;
 uint32_t scsi_off;
 uint32_t scsi_len;
 uint32_t data_len;
diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
index 3b806872587..ed6d9b70b96 100644
--- a/hw/usb/dev-storage.c
+++ b/hw/usb/dev-storage.c
@@ -264,12 +264,12 @@ void usb_msd_transfer_data(SCSIRequest *req, uint32_t len)
 MSDState *s = DO_UPCAST(MSDState, dev.qdev, req->bus->qbus.parent);
 USBPacket *p = s->data_packet;
 
-if (s->mode == USB_MSDM_DATAIN) {
+if (s->cbw_state == USB_MSD_CBW_DATAIN) {
 if (req->cmd.mode == SCSI_XFER_TO_DEV) {
 usb_msd_fatal_error(s);
 return;
 }
-} else if (s->mode == USB_MSDM_DATAOUT) {
+} else if (s->cbw_state == USB_MSD_CBW_DATAOUT) {
 if (req->cmd.mode != SCSI_XFER_TO_DEV) {
 usb_msd_fatal_error(s);
 return;
@@ -301,7 +301,7 @@ void usb_msd_command_complete(SCSIRequest *req, size_t resid)
 
 g_assert(s->req);
 /* The CBW is what starts the SCSI request */
-g_assert(s->mode != USB_MSDM_CBW);
+g_assert(s->cbw_state != USB_MSD_CBW_NONE);
 
 s->csw.sig = cpu_to_le32(0x53425355);
 s->csw.tag = cpu_to_le32(req->tag);
@@ -312,7 +312,8 @@ void usb_msd_command_complete(SCSIRequest *req, size_t resid)
 s->req = NULL;
 
 if (p) {
-g_assert(s->mode == USB_MSDM_DATAIN || s->mode == USB_MSDM_DATAOUT);
+g_assert(s->cbw_state == USB_MSD_CBW_DATAIN ||
+ s->cbw_state == USB_MSD_CBW_DATAOUT);
 if (s->data_len) {
 int len = (p->iov.size - p->actual_length);
 usb_packet_skip(p, len);
@@ -322,19 +323,19 @@ void usb_msd_command_complete(SCSIRequest *req, size_t resid)
 s->data_len -= len;
 }
 if (s->data_len == 0) {
-s->mode = USB_MSDM_CSW;
+s->cbw_state = USB_MSD_CBW_CSW;
 }
 /* USB_RET_SUCCESS status clears previous ASYNC status */
 usb_msd_data_packet_complete(s, USB_RET_SUCCESS);
 } else if (s->data_len == 0) {
-s->mode = USB_MSDM_CSW;
+s->cbw_state = USB_MSD_CBW_CSW;
 }
 
-if (s->mode == USB_MSDM_CSW) {
+if (s->cbw_state == USB_MSD_CBW_CSW) {
 p = s->csw_in_packet;
 if (p) {
 usb_msd_send_status(s, p);
-s->mode = USB_MSDM_CBW;
+s->cbw_state = USB_MSD_CBW_NONE;
 /* USB_RET_SUCCESS status clears previous ASYNC status */
 usb_msd_csw_packet_complete(s, USB_RET_SUCCESS);
 }
@@ -377,7 +378,7 @@ void usb_msd_handle_reset(USBDevice *dev)
 }
 
 memset(&s->csw, 0, sizeof(s->csw));
-s->mode = USB_MSDM_CBW;
+s->cbw_state = USB_MSD_CBW_NONE;
 
 s->needs_reset = false;
 }
@@ -478,8 +479,8 @@ static void usb_msd_handle_data_out(USBDevice *dev, USBPacket *p)
 SCSIDevice *scsi_dev;
 int len;
 
-switch (s->mode) {
-case USB_MSDM_CBW:
+switch (s->cbw_state) {
+case USB_MSD_CBW_NONE:
 if (!try_get_valid_cbw(p, &cbw)) {
 goto fail;
 }
@@ -492,11 +493,11 @@ static void usb_msd_handle_data_out(USBDevice *dev, USBPacket *p)
 tag = le32_to_cpu(cbw.tag);
 s->data_len = le32_to_cpu(cbw.data_len);
 if (s->data_len == 0) {
-s->mode = USB_MSDM_CSW;

[PATCH v2 09/10] usb/msd: Permit a DATA-IN or CSW packet before CBW packet

2025-04-11 Thread Nicholas Piggin
The USB MSD protocol has 3 packets that make up a command, and only one
command may be active at any time.

- CBW to start a command (that contains a SCSI request).
- DATA (IN or OUT) to request data transfer between host and SCSI layer.
- CSW to return status and complete the command.

DATA is omitted if the request has no data.

The QEMU MSD model requires these packets to arrive in this order, CBW,
DATA, CSW. This is the way the state machine is generally described in
the MSD spec, and this must be how most USB stacks operate. Except AIX.

Universal Serial Bus Mass Storage Class Bulk-Only Transport 1.0 contains
one word in one sentence that permits the relaxed ordering:

  3.3 Host/Device Packet Transfer Order
  The host shall send the CBW before the associated Data-Out, and the
  device shall send Data-In after the associated CBW and before the
  associated CSW. The host may request Data-In or CSW before sending the
  associated CBW.

Complicating matters, DATA-IN and CSW are both input packets that arrive
in the same manner, so before a CBW it is impossible to determine if an
IN packet is for data or CSW.

So permit "unusually-ordered" packets by tracking them as an "unknown"
packet until the CBW arrives, then they are categorized into a DATA or
CSW packet.

It is not clear whether the spec permits multiple such packets before
the CBW. This implementation permits only one, which seems to be enough
for AIX.
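
The categorization step can be sketched in isolation as follows. This is a
simplified model under stated assumptions: classify_deferred_in() and the
in_role enum are hypothetical names used only here, and the 13-byte minimum
is the CSW size check used elsewhere in this series.

```c
#include <assert.h>
#include <stddef.h>

enum cbw_state { CBW_NONE, CBW_DATAOUT, CBW_DATAIN, CBW_CSW };
enum in_role { IN_ROLE_DATA, IN_ROLE_CSW, IN_ROLE_INVALID };

#define CSW_SIZE 13 /* a CSW needs at least 13 bytes of buffer */

/*
 * Hypothetical helper: decide what an IN packet that arrived before the
 * CBW turns out to be, once the CBW has been parsed and the new state is
 * known.
 */
static enum in_role classify_deferred_in(enum cbw_state state_after_cbw,
                                         size_t in_buf_size)
{
    if (state_after_cbw == CBW_DATAIN) {
        /* The command has a data-in phase, so the IN packet is data. */
        return IN_ROLE_DATA;
    }
    /* Otherwise it can only be the CSW, which needs a large enough buffer. */
    if (in_buf_size < CSW_SIZE) {
        return IN_ROLE_INVALID;
    }
    return IN_ROLE_CSW;
}
```

In the patch itself the equivalent decision moves unknown_in_packet into
either data_packet or csw_in_packet.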

Signed-off-by: Nicholas Piggin 
---
 include/hw/usb/msd.h |  1 +
 hw/usb/dev-storage.c | 43 +++
 2 files changed, 44 insertions(+)

diff --git a/include/hw/usb/msd.h b/include/hw/usb/msd.h
index c109544f632..2ed3664b31d 100644
--- a/include/hw/usb/msd.h
+++ b/include/hw/usb/msd.h
@@ -37,6 +37,7 @@ struct MSDState {
 /* For async completion.  */
 USBPacket *data_packet;
 USBPacket *csw_in_packet;
+USBPacket *unknown_in_packet;
 
 /* usb-storage only */
 BlockConf conf;
diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
index ed6d9b70b96..654b9071d33 100644
--- a/hw/usb/dev-storage.c
+++ b/hw/usb/dev-storage.c
@@ -436,6 +436,8 @@ static void usb_msd_cancel_io(USBDevice *dev, USBPacket *p)
 }
 } else if (p == s->csw_in_packet) {
 s->csw_in_packet = NULL;
+} else if (p == s->unknown_in_packet) {
+s->unknown_in_packet = NULL;
 } else {
 g_assert_not_reached();
 }
@@ -499,6 +501,19 @@ static void usb_msd_handle_data_out(USBDevice *dev, USBPacket *p)
 } else {
 s->cbw_state = USB_MSD_CBW_DATAOUT;
 }
+if (s->unknown_in_packet) {
+if (s->cbw_state == USB_MSD_CBW_DATAIN) {
+/* Must be a DATAIN packet */
+s->data_packet = s->unknown_in_packet;
+} else {
+/* Must be the CSW packet */
+if (!check_valid_csw(s->unknown_in_packet)) {
+goto fail;
+}
+s->csw_in_packet = s->unknown_in_packet;
+}
+s->unknown_in_packet = NULL;
+}
 trace_usb_msd_cmd_submit(cbw.lun, tag, cbw.flags,
  cbw.cmd_len, s->data_len);
 assert(le32_to_cpu(s->csw.residue) == 0);
@@ -516,6 +531,11 @@ static void usb_msd_handle_data_out(USBDevice *dev, USBPacket *p)
 
 case USB_MSD_CBW_DATAOUT:
 trace_usb_msd_data_out(p->iov.size, s->data_len);
+if (s->unknown_in_packet) {
+error_report("usb-msd: unknown_in_packet in DATAOUT state");
+goto fail;
+}
+
 if (p->iov.size > s->data_len) {
 goto fail;
 }
@@ -558,7 +578,22 @@ static void usb_msd_handle_data_in(USBDevice *dev, USBPacket *p)
 int len;
 
 switch (s->cbw_state) {
+case USB_MSD_CBW_NONE:
+if (s->unknown_in_packet) {
+qemu_log_mask(LOG_GUEST_ERROR, "usb-msd: second IN packet was "
+   "received before CBW\n");
+goto fail;
+}
+trace_usb_msd_packet_async();
+s->unknown_in_packet = p;
+p->status = USB_RET_ASYNC;
+break;
+
 case USB_MSD_CBW_DATAOUT:
+if (s->unknown_in_packet) {
+error_report("usb-msd: unknown_in_packet in DATAOUT state");
+goto fail;
+}
 if (!check_valid_csw(p)) {
 goto fail;
 }
@@ -575,6 +610,10 @@ static void usb_msd_handle_data_in(USBDevice *dev, USBPacket *p)
 break;
 
 case USB_MSD_CBW_CSW:
+if (s->unknown_in_packet) {
+error_report("usb-msd: unknown_in_packet in CSW state");
+goto fail;
+}
 if (!check_valid_csw(p)) {
 goto fail;
 }
@@ -592,6 +631,10 @@ static void usb_msd_handle_data_in(USBDevice *dev, USBPacket *p)
 
 case USB_MSD_CBW_DATAIN:
 trace_usb_msd_data_in(p->iov.size, s->data_len, s->scsi_len);
+if (s->unknown_in_packet) {

[PATCH v2 02/10] usb/msd: Ensure packet structure layout is correct

2025-04-11 Thread Nicholas Piggin
These structures are hardware interfaces, ensure the layout is
correct.
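
For illustration, a standalone sketch of the same idea outside QEMU. The
struct names are hypothetical, __attribute__((packed)) is the GCC/Clang
spelling assumed here in place of QEMU_PACKED, and the field lists follow
the Bulk-Only Transport wire format (31-byte CBW, 13-byte CSW).

```c
#include <assert.h>
#include <stdint.h>

/* Example-only layout of the Command Block Wrapper. */
struct __attribute__((packed)) cbw_example {
    uint32_t sig;
    uint32_t tag;
    uint32_t data_len;
    uint8_t flags;
    uint8_t lun;
    uint8_t cmd_len;
    uint8_t cmd[16];
};

/* Example-only layout of the Command Status Wrapper. */
struct __attribute__((packed)) csw_example {
    uint32_t sig;
    uint32_t tag;
    uint32_t residue;
    uint8_t status;
};

/* Fail the build if the compiler inserts padding. */
_Static_assert(sizeof(struct cbw_example) == 31, "CBW must be 31 bytes");
_Static_assert(sizeof(struct csw_example) == 13, "CSW must be 13 bytes");
```

Without the packed attribute a typical ABI would round both structures up
to a multiple of 4 bytes, which is the kind of silent mismatch the build
assertions are meant to catch.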

Signed-off-by: Nicholas Piggin 
---
 hw/usb/dev-storage.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
index 2d7306b0572..87c22476f6b 100644
--- a/hw/usb/dev-storage.c
+++ b/hw/usb/dev-storage.c
@@ -27,7 +27,7 @@
 #define MassStorageReset  0xff
 #define GetMaxLun 0xfe
 
-struct usb_msd_cbw {
+struct QEMU_PACKED usb_msd_cbw {
 uint32_t sig;
 uint32_t tag;
 uint32_t data_len;
@@ -636,6 +636,9 @@ static const TypeInfo usb_storage_dev_type_info = {
 
 static void usb_msd_register_types(void)
 {
+qemu_build_assert(sizeof(struct usb_msd_cbw) == 31);
+qemu_build_assert(sizeof(struct usb_msd_csw) == 13);
+
 type_register_static(&usb_storage_dev_type_info);
 }
 
-- 
2.47.1




[PATCH v2 04/10] usb/msd: Improve packet validation error logging

2025-04-11 Thread Nicholas Piggin
Errors in incoming USB MSD packet format or context would typically
be guest software errors. Log these under guest errors.
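
The CBW checks being factored out can be sketched on a raw byte buffer as
below. This is a minimal standalone version, assuming the exact-size rule
of this patch; cbw_header_ok() is a hypothetical name and the logging is
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CBW_SIZE      31
#define CBW_SIGNATURE 0x43425355u /* "USBC" read as a little-endian u32 */

/* Hypothetical helper: size and signature check on a raw CBW buffer. */
static bool cbw_header_ok(const uint8_t *buf, size_t len)
{
    uint32_t sig;

    if (len != CBW_SIZE) {
        return false;
    }
    /* Assemble the little-endian signature independent of host order. */
    sig = (uint32_t)buf[0] |
          ((uint32_t)buf[1] << 8) |
          ((uint32_t)buf[2] << 16) |
          ((uint32_t)buf[3] << 24);
    return sig == CBW_SIGNATURE;
}
```

A failure of either check is something the guest driver caused, which is
the argument for reporting it under LOG_GUEST_ERROR.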

Signed-off-by: Nicholas Piggin 
---
 hw/usb/dev-storage.c | 53 +++-
 1 file changed, 42 insertions(+), 11 deletions(-)

diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
index c7c36ac80fa..6668114ea74 100644
--- a/hw/usb/dev-storage.c
+++ b/hw/usb/dev-storage.c
@@ -10,6 +10,7 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
+#include "qemu/log.h"
 #include "qemu/module.h"
 #include "qemu/option.h"
 #include "qemu/config-file.h"
@@ -395,6 +396,36 @@ static void usb_msd_cancel_io(USBDevice *dev, USBPacket *p)
 }
 }
 
+static bool try_get_valid_cbw(USBPacket *p, struct usb_msd_cbw *cbw)
+{
+uint32_t sig;
+
+if (p->iov.size != 31) {
+qemu_log_mask(LOG_GUEST_ERROR, "usb-msd: Bad CBW size %zu\n",
+   p->iov.size);
+return false;
+}
+usb_packet_copy(p, cbw, 31);
+sig = le32_to_cpu(cbw->sig);
+if (sig != 0x43425355) {
+qemu_log_mask(LOG_GUEST_ERROR, "usb-msd: Bad CBW signature 0x%08x\n",
+   sig);
+return false;
+}
+
+return true;
+}
+
+static bool check_valid_csw(USBPacket *p)
+{
+if (p->iov.size < 13) {
+qemu_log_mask(LOG_GUEST_ERROR, "usb-msd: Bad CSW size %zu\n",
+  p->iov.size);
+return false;
+}
+return true;
+}
+
 static void usb_msd_handle_data_out(USBDevice *dev, USBPacket *p)
 {
 MSDState *s = (MSDState *)dev;
@@ -405,19 +436,13 @@ static void usb_msd_handle_data_out(USBDevice *dev, USBPacket *p)
 
 switch (s->mode) {
 case USB_MSDM_CBW:
-if (p->iov.size != 31) {
-error_report("usb-msd: Bad CBW size");
-goto fail;
-}
-usb_packet_copy(p, &cbw, 31);
-if (le32_to_cpu(cbw.sig) != 0x43425355) {
-error_report("usb-msd: Bad signature %08x",
- le32_to_cpu(cbw.sig));
+if (!try_get_valid_cbw(p, &cbw)) {
 goto fail;
 }
 scsi_dev = scsi_device_find(&s->bus, 0, 0, cbw.lun);
 if (scsi_dev == NULL) {
-error_report("usb-msd: Bad LUN %d", cbw.lun);
+qemu_log_mask(LOG_GUEST_ERROR, "usb-msd: Bad CBW LUN %d\n",
+   cbw.lun);
 goto fail;
 }
 tag = le32_to_cpu(cbw.tag);
@@ -489,9 +514,15 @@ static void usb_msd_handle_data_in(USBDevice *dev, USBPacket *p)
 
 switch (s->mode) {
 case USB_MSDM_DATAOUT:
-if (s->data_len != 0 || p->iov.size < 13) {
+if (!check_valid_csw(p)) {
+goto fail;
+}
+if (s->data_len != 0) {
+qemu_log_mask(LOG_GUEST_ERROR, "usb-msd: CSW received before "
+   "all data was sent\n");
 goto fail;
 }
+
 /* Waiting for SCSI write to complete.  */
 trace_usb_msd_packet_async();
 s->packet = p;
@@ -499,7 +530,7 @@ static void usb_msd_handle_data_in(USBDevice *dev, USBPacket *p)
 break;
 
 case USB_MSDM_CSW:
-if (p->iov.size < 13) {
+if (!check_valid_csw(p)) {
 goto fail;
 }
 
-- 
2.47.1




[PATCH v2 00/10] usb/msd: Permit relaxed ordering of IN packets

2025-04-11 Thread Nicholas Piggin
This series ultimately permits relaxed ordering of USB mass-storage
packets from the host, as allowed by the usbmassbulk 1.0 spec, but
not usually seen in drivers. AIX drivers do require this ordering.

Since v1:

https://lore.kernel.org/qemu-devel/20241212091323.1442995-1-npig...@gmail.com/

- Rebased on upstream with one patch from the series merged.
- Fixed a few build warnings on 32-bit hosts.

Thanks,
Nick


Nicholas Piggin (10):
  usb/msd: Split in and out packet handling
  usb/msd: Ensure packet structure layout is correct
  usb/msd: Improved handling of mass storage reset
  usb/msd: Improve packet validation error logging
  usb/msd: Allow CBW packet size greater than 31
  usb/msd: Split async packet tracking into data and csw
  usb/msd: Add some additional assertions
  usb/msd: Rename mode to cbw_state, and tweak names
  usb/msd: Permit a DATA-IN or CSW packet before CBW packet
  usb/msd: Add more tracing

 include/hw/usb/msd.h |  18 +-
 hw/usb/dev-storage.c | 510 ---
 hw/usb/trace-events  |   9 +-
 3 files changed, 357 insertions(+), 180 deletions(-)

-- 
2.47.1




[PATCH v2 07/10] usb/msd: Add some additional assertions

2025-04-11 Thread Nicholas Piggin
Add more assertions to help verify internal logic.

Signed-off-by: Nicholas Piggin 
---
 hw/usb/dev-storage.c | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
index a9d8d4e8618..3b806872587 100644
--- a/hw/usb/dev-storage.c
+++ b/hw/usb/dev-storage.c
@@ -264,13 +264,24 @@ void usb_msd_transfer_data(SCSIRequest *req, uint32_t len)
 MSDState *s = DO_UPCAST(MSDState, dev.qdev, req->bus->qbus.parent);
 USBPacket *p = s->data_packet;
 
-if ((s->mode == USB_MSDM_DATAOUT) != (req->cmd.mode == SCSI_XFER_TO_DEV)) {
-usb_msd_fatal_error(s);
-return;
+if (s->mode == USB_MSDM_DATAIN) {
+if (req->cmd.mode == SCSI_XFER_TO_DEV) {
+usb_msd_fatal_error(s);
+return;
+}
+} else if (s->mode == USB_MSDM_DATAOUT) {
+if (req->cmd.mode != SCSI_XFER_TO_DEV) {
+usb_msd_fatal_error(s);
+return;
+}
+} else {
+g_assert_not_reached();
 }
 
+assert(s->scsi_len == 0);
 s->scsi_len = len;
 s->scsi_off = 0;
+
 if (p) {
 usb_msd_copy_data(s, p);
 p = s->data_packet;
@@ -288,6 +299,10 @@ void usb_msd_command_complete(SCSIRequest *req, size_t resid)
 
 trace_usb_msd_cmd_complete(req->status, req->tag);
 
+g_assert(s->req);
+/* The CBW is what starts the SCSI request */
+g_assert(s->mode != USB_MSDM_CBW);
+
 s->csw.sig = cpu_to_le32(0x53425355);
 s->csw.tag = cpu_to_le32(req->tag);
 s->csw.residue = cpu_to_le32(s->data_len);
@@ -486,7 +501,7 @@ static void usb_msd_handle_data_out(USBDevice *dev, USBPacket *p)
 trace_usb_msd_cmd_submit(cbw.lun, tag, cbw.flags,
  cbw.cmd_len, s->data_len);
 assert(le32_to_cpu(s->csw.residue) == 0);
-s->scsi_len = 0;
+assert(s->scsi_len == 0);
 s->req = scsi_req_new(scsi_dev, tag, cbw.lun,
   cbw.cmd, cbw.cmd_len, NULL);
 if (s->commandlog) {
-- 
2.47.1




Re: [PATCH v2 1/2] file-posix: probe discard alignment on Linux block devices

2025-04-11 Thread Hanna Czenczek

On 10.04.25 20:41, Stefan Hajnoczi wrote:

Populate the pdiscard_alignment block limit so the block layer is able
to align discard requests correctly.

Signed-off-by: Stefan Hajnoczi 
---
  block/file-posix.c | 56 +-
  1 file changed, 55 insertions(+), 1 deletion(-)


Ah, I didn’t know sysfs is actually fair game.  Should we not also get 
the maximum discard length then, too?



diff --git a/block/file-posix.c b/block/file-posix.c
index 56d1972d15..2a1e1f48c0 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1276,10 +1276,10 @@ static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
  }
  #endif /* defined(CONFIG_BLKZONED) */
  
+#ifdef CONFIG_LINUX

  /*
   * Get a sysfs attribute value as a long integer.
   */
-#ifdef CONFIG_LINUX
  static long get_sysfs_long_val(struct stat *st, const char *attribute)
  {
  g_autofree char *str = NULL;
@@ -1299,6 +1299,30 @@ static long get_sysfs_long_val(struct stat *st, const char *attribute)
  }
  return ret;
  }
+
+/*
+ * Get a sysfs attribute value as a uint32_t.
+ */
+static int get_sysfs_u32_val(struct stat *st, const char *attribute,
+ uint32_t *u32)
+{
+g_autofree char *str = NULL;
+const char *end;
+unsigned int val;
+int ret;
+
+ret = get_sysfs_str_val(st, attribute, &str);
+if (ret < 0) {
+return ret;
+}
+
+/* The file is ended with '\n', pass 'end' to accept that. */
+ret = qemu_strtoui(str, &end, 10, &val);
+if (ret == 0 && end && *end == '\0') {
+*u32 = val;
+}
+return ret;
+}
  #endif
  
  static int hdev_get_max_segments(int fd, struct stat *st)

@@ -1318,6 +1342,23 @@ static int hdev_get_max_segments(int fd, struct stat *st)
  #endif
  }
  
+/*

+ * Fills in *dalign with the discard alignment and returns 0 on success,
+ * -errno otherwise.
+ */
+static int hdev_get_pdiscard_alignment(struct stat *st, uint32_t *dalign)
+{
+#ifdef CONFIG_LINUX
+/*
+ * Note that Linux "discard_granularity" is QEMU "discard_alignment". Linux
+ * "discard_alignment" is something else.
+ */
+return get_sysfs_u32_val(st, "discard_granularity", dalign);
+#else
+return -ENOTSUP;
+#endif
+}
+
  #if defined(CONFIG_BLKZONED)
  /*
   * If the reset_all flag is true, then the wps of zone whose state is
@@ -1527,6 +1568,19 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
  }
  }
  
+if (S_ISBLK(st.st_mode)) {

+uint32_t dalign = 0;
+int ret;
+
+ret = hdev_get_pdiscard_alignment(&st, &dalign);
+if (ret == 0) {
+/* Must be a multiple of request_alignment */
+assert(dalign % bs->bl.request_alignment == 0);


Is it fair to crash qemu if the kernel reports a value that is not a 
multiple of request_alignment?  Wouldn’t it make more sense to take the 
maximum, and if that still isn’t a multiple, return an error here?
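
Roughly what I have in mind, as a standalone sketch rather than a tested
change to file-posix.c. The helper name pick_discard_alignment() and the
choice of -EINVAL as the error value are assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: combine the value reported by the kernel with the
 * request alignment instead of asserting on it.
 */
static int pick_discard_alignment(uint32_t reported,
                                  uint32_t request_alignment,
                                  uint32_t *out)
{
    /* Take the larger of the two values. */
    uint32_t align = reported > request_alignment ? reported
                                                  : request_alignment;

    /* Still not a multiple: report an error rather than abort. */
    if (request_alignment == 0 || align % request_alignment != 0) {
        return -EINVAL;
    }
    *out = align;
    return 0;
}
```

With this shape a kernel that reports 0 simply falls back to
request_alignment, and an odd value such as 768 with a 512-byte request
alignment becomes an error the caller can handle.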


Hanna


+
+bs->bl.pdiscard_alignment = dalign;
+}
+}
+
  raw_refresh_zoned_limits(bs, &st, errp);
  }
  





Re: [PATCH v2 2/2] block/io: skip head/tail requests on EINVAL

2025-04-11 Thread Hanna Czenczek

On 10.04.25 20:41, Stefan Hajnoczi wrote:

When guests send misaligned discard requests, the block layer breaks
them up into a misaligned head, an aligned main body, and a misaligned
tail.

The file-posix block driver on Linux returns -EINVAL on misaligned
discard requests. This causes bdrv_co_pdiscard() to fail and guests
configured with werror=stop will pause.

Add a special case for misaligned head/tail requests. Simply continue
when EINVAL is encountered so that the aligned main body of the request
can be completed and the guest is not paused. This is the best we can do
when guest discard limits do not match the host discard limits.

Fixes: https://issues.redhat.com/browse/RHEL-86032
Signed-off-by: Stefan Hajnoczi 
---
  block/io.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index 1ba8d1aeea..a0d0b31a3e 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3180,7 +3180,11 @@ int coroutine_fn bdrv_co_pdiscard(BdrvChild *child, int64_t offset,
  }
  }
  if (ret && ret != -ENOTSUP) {
-goto out;
+if (ret == -EINVAL && (offset % align != 0 || num % align != 0)) {


Could use `(offset | num) % align != 0`, but either way:

Reviewed-by: Hanna Czenczek 
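
One caveat on the shorter form, shown as a standalone sketch: the two
checks agree when align is a power of two, but can differ otherwise. The
function names are hypothetical, and uint64_t is used here for simplicity
where the real code has int64_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The check as written in the patch. */
static bool misaligned_separate(uint64_t offset, uint64_t num, uint64_t align)
{
    return offset % align != 0 || num % align != 0;
}

/* The shorter form; equivalent only if align is a power of two. */
static bool misaligned_combined(uint64_t offset, uint64_t num, uint64_t align)
{
    return (offset | num) % align != 0;
}
```

For example, with offset=2, num=4 and align=6 the separate check reports a
misaligned request while the combined one does not, because 2 | 4 == 6.
So the shorter form is only safe if the alignment is known to be a power
of two at this point.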


+/* Silently skip rejected unaligned head/tail requests */
+} else {
+goto out; /* bail out */
+}
  }
  
  offset += num;





Re: [PATCH] migration: add FEATURE_SEEKABLE to QIOChannelBlock

2025-04-11 Thread Marco Cavenati
On Thursday, April 10, 2025 21:52 CEST, Fabiano Rosas  wrote:

> We'll need to add the infrastructure to reject multifd and direct-io
> before this. The rest of the capabilities should not affect mapped-ram,
> so it's fine (for now) if we don't honor them.

Ok, thanks for the update.
 
> What about zero page handling? Mapped-ram doesn't send zero pages
> because the file will always have zeroes in it and the migration
> destination is guaranteed to not have been running previously. I believe
> loading a snapshot in a VM that's already been running would leave stale
> data in the guest's memory.

Yes, you are correct.

About the `RAMBlock->file_bmap`, according to the code it is a:
`/* bitmap of pages present in the migration file */`
And, if a page is a zero page, it won't be in the migration file:
`/* zero pages are not transferred with mapped-ram */`
So, zero page implies bitmap 0.
Does the opposite hold?

If bitmap 0 implies zero page, we could call `ram_handle_zero`
in `read_ramblock_mapped_ram` for the clear bits.
Or do you fear this might be unnecessarily expensive for migration?

If bitmap 0 does not imply zero page, I feel like the
"is present in the migration file" and "is zero page" info should
be better separated.
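
To make the first option concrete, here is a standalone sketch of the loop
I am thinking of. It is not QEMU code: zero_absent_pages() and
test_bit_example() are hypothetical names, memset() stands in for
`ram_handle_zero`, and it only makes sense under the assumption that a
clear bit really does mean "zero page".

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Example-only bitmap test, one bit per page. */
static bool test_bit_example(const unsigned long *bmap, size_t nr)
{
    return (bmap[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}

/*
 * Clear every page whose bit is not set in the "present in file" bitmap.
 * Returns the number of pages that were zeroed.
 */
static size_t zero_absent_pages(unsigned char *host,
                                const unsigned long *file_bmap,
                                size_t num_pages, size_t page_size)
{
    size_t zeroed = 0;

    for (size_t i = 0; i < num_pages; i++) {
        if (!test_bit_example(file_bmap, i)) {
            memset(host + i * page_size, 0, page_size);
            zeroed++;
        }
    }
    return zeroed;
}
```

The cost is one pass over the bitmap plus a write per absent page, which
is the overhead I am asking about above.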

Best,
Marco




Re: [PATCH 00/10] Enable QEMU to run on browsers

2025-04-11 Thread Paolo Bonzini

On 4/10/25 15:13, Kohei Tokunaga wrote:

> The biggest problem I'm seeing is we no longer support 64-bit guests on
> 32-bit hosts, and don't plan to revert that.

Yes, so the sixth patch ("[PATCH 06/10] include/exec: Allow using 64bit
guest addresses on emscripten") should be considered as a temporary
workaround, enabled only for Emscripten builds. It will be removed once
wasm64 gains broader support and is adopted in the Wasm backend.


Maybe there's a way though. Currently we don't support 64-bit guests on 
32-bit hosts, but more precisely we don't support 64-bit guests with 
32-bit host word size.



The wasm TCG backend is able to compile with 64-bit words:

+#define TCG_TARGET_HAS_div_i64  1
+#define TCG_TARGET_HAS_rem_i64  1
etc.

and if x32 was a thing it would as well.  In fact the changes in patch 
6/10 are not a full revert, and the "#ifdef EMSCRIPTEN" could be changed to


#if HOST_LONG_BITS >= TARGET_LONG_BITS
... use uintptr_t
#else
... use uint64_t
#endif

Paolo




RE: Configuring onboard devices, in particular memory contents (was: [PATCH v1 0/1] hw/misc/aspeed_sbc: Implement OTP memory and controller)

2025-04-11 Thread Kane Chen
Hi Markus,

Thank you for the background information.

Since the OTP device is part of the Secure Boot Controller (SBC), I plan to 
register it in the global table. I believe this will simplify usage.

Meanwhile, based on Philippe's comment, I’m working on `aspeed_otp.c` to handle 
low-level OTP operations. This approach should help decouple SBC and OTP 
functionalities.

Once testing is complete, I will submit a separate patch for further review.

Best regards,
Kane
> -Original Message-
> From: Markus Armbruster 
> Sent: Tuesday, April 8, 2025 7:39 PM
> To: Cédric Le Goater 
> Cc: Kane Chen ; Philippe Mathieu-Daudé
> ; Peter Maydell ; Steven Lee
> ; Troy Lee ; Jamin Lin
> ; Andrew Jeffery
> ; Joel Stanley ; open
> list:ASPEED BMCs ; open list:All patches CC here
> ; qemu-block ; Troy Lee
> 
> Subject: Configuring onboard devices, in particular memory contents (was:
> [PATCH v1 0/1] hw/misc/aspeed_sbc: Implement OTP memory and controller)
> 
> Cédric Le Goater  writes:
> 
> > Hello Kane,
> >
> > + Markus (for ebc29e1beab0 implementation)
> >
> > On 4/7/25 09:33, Kane Chen wrote:
> >> Hi Cédric/Philippe,
> >> OTP (One-Time Programmable) memory is a type of non-volatile memory
> >> in which each bit can be programmed only once. It is typically used
> >> to store critical and permanent information, such as the chip ID and
> >> secure boot keys. The structure and behavior of OTP memory are
> >> consistent across both the AST1030 and AST2600 platforms.
> >> As Philippe pointed out, this proposal models the OTP memory as a
> >> flash device and utilizes a block backend for persistent storage. In
> >> contrast, existing implementations such as NPCM7xxOTPState,
> >> BCM2835OTPState, and SiFiveUOTPState expose OTP memory via MMIO and
> >> always initialize it in a blank state.
> >
> > AFAIU, Aspeed SBC is also MMIO based or is there another device, an
> > eeprom, accessible through an external bus ? How is it implemented in
> > HW ?
> >
> >> The goal of this design is to
> >> allow the guest system to boot with a pre-configured OTP memory
> >> state.
> >
> > Yes. This is a valid request. It's not the first time we've had this
> > kind of requests. The initial content of EEPROM devices are an example
> > and some machines, like the rainier, have a lot.
> >
> > If the device can be defined on the command line, like would be an
> > EEPROM device attached to an I2C bus or a flash device attached to a
> > SPI bus, we can use a 'drive' property. Something like :
> >
> >   qemu-system-arm -M ast2600-evb \
> >   -blockdev node-name=fmc0,driver=file,filename=/path/to/fmc0.img \
> >   -device mx66u51235f,bus=ssi.0,cs=0x0,drive=fmc0 \
> >   -blockdev node-name=fmc1,driver=file,filename=/path/to/fmc1.img \
> >   -device mx66u51235f,bus=ssi.0,cs=0x1,drive=fmc1 \
> >   -blockdev node-name=spi1,driver=file,filename=/path/to/spi1.img \
> >   -device mx66u51235f,cs=0x0,bus=ssi.1,drive=spi1 \
> >   ...
> >
> > However, the Aspeed SBC device is a platform device and it makes
> > things more complex : it can not be created on the command line, it is
> > directly created by the machine and the soc and passing device
> > properties to specify a blockdev is not possible :
> >
> >   qemu-system-arm -M ast2600-evb \
> >   -blockdev node-name=otpmem,driver=file,filename=/path/to/otpmem.img \
> >   -device aspeed-sbc,drive=otpmem \
> >   ...
> 
> Configuring onboard devices is an old problem, and so far we have failed at
> solving it adequately.
> 
> -device / device_add let you configure the new device in a general way, but
> these work only for device the user creates, not for devices the board creates
> automatically.
> 
> We have a bunch of ad hoc and mostly ancient ways to configure them, but
> they're all limited.  For example:
> 
> * A number of old command line options, such as -drive, -serial, -net
>   nic, create device backends and additionally deposit configuration in
>   some global table the board may elect to use however it sees fit.  The
>   intended use is to create frontends connected to these backends.
> 
>   Some boards error out when they can't honor something in the table.
>   Others silently ignore parts of the table, or all of it.  Bad UI.
> 
>   Device configuration the table doesn't support is not accessible this
>   way.  If you extend the table (and the associated option) to provide
>   access to some device-specific configuration, all the other devices
>   will silently ignore the new configuration bits.  Again, bad UI.
> 
>   There's another serious issue with block devices: -drive is obsolete
>   for configuring complex block backends.  But its replacement
>   -blockdev is for backend configuration only.  If you use -blockdev,
>   you can't add to the table.
> 
> * Command line option -global lets you change property defaults.  This
>   can be used to configure an onboard device as long as it is the only
>   such device in the system.  Limited use, an

[PATCH v1 3/3] target/s390x: Return UVC cmd code, RC and RRC value when DIAG 308 Subcode 10 fails to enter secure mode

2025-04-11 Thread Gautam Gala
Extend DIAG308 subcode 10 to return the UVC RC, RRC and command code
in bit positions 32-47, 16-31, and 0-15 of register R1 + 1 if the
function does not complete successfully (in addition to the
previously returned diag response code in bit positions 48-63).
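
As an illustration of the resulting register layout, the sketch below packs
the four 16-bit fields into one 64-bit value. It assumes the usual
big-endian bit numbering (bit 0 is the most significant bit) and that the
diag response code occupies the low 16 bits; diag308_pack_response() is a
hypothetical name used only in this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: build the value of register R1 + 1.
 *   bits  0-15: UVC command code
 *   bits 16-31: UVC RRC
 *   bits 32-47: UVC RC
 *   bits 48-63: diag response code
 */
static uint64_t diag308_pack_response(uint16_t pv_cmd, uint16_t pv_rrc,
                                      uint16_t pv_rc, uint16_t diag_rc)
{
    return ((uint64_t)pv_cmd << 48) |
           ((uint64_t)pv_rrc << 32) |
           ((uint64_t)pv_rc << 16) |
           (uint64_t)diag_rc;
}
```

The field order matches struct diag308response in this patch when that
struct is viewed as a big-endian 64-bit value.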

Signed-off-by: Gautam Gala 
---
 hw/s390x/ipl.c | 11 ++
 hw/s390x/ipl.h |  5 +++--
 hw/s390x/s390-virtio-ccw.c | 24 +++--
 target/s390x/kvm/pv.c  | 44 +-
 target/s390x/kvm/pv.h  | 27 ---
 5 files changed, 76 insertions(+), 35 deletions(-)

diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
index ce6f6078d7..4f3e3945f1 100644
--- a/hw/s390x/ipl.c
+++ b/hw/s390x/ipl.c
@@ -676,7 +676,8 @@ static void s390_ipl_prepare_qipl(S390CPU *cpu)
 cpu_physical_memory_unmap(addr, len, 1, len);
 }
 
-int s390_ipl_prepare_pv_header(Error **errp)
+int s390_ipl_prepare_pv_header(Error **errp, uint16_t *pv_cmd,
+   uint16_t *pv_rc, uint16_t *pv_rrc)
 {
 IplParameterBlock *ipib = s390_ipl_get_iplb_pv();
 IPLBlockPV *ipib_pv = &ipib->pv;
@@ -685,12 +686,13 @@ int s390_ipl_prepare_pv_header(Error **errp)
 
 cpu_physical_memory_read(ipib_pv->pv_header_addr, hdr,
  ipib_pv->pv_header_len);
-rc = s390_pv_set_sec_parms((uintptr_t)hdr, ipib_pv->pv_header_len, errp);
+rc = s390_pv_set_sec_parms((uintptr_t)hdr, ipib_pv->pv_header_len,
+   errp, pv_cmd, pv_rc, pv_rrc);
 g_free(hdr);
 return rc;
 }
 
-int s390_ipl_pv_unpack(void)
+int s390_ipl_pv_unpack(uint16_t *pv_cmd, uint16_t *pv_rc, uint16_t *pv_rrc)
 {
 IplParameterBlock *ipib = s390_ipl_get_iplb_pv();
 IPLBlockPV *ipib_pv = &ipib->pv;
@@ -699,7 +701,8 @@ int s390_ipl_pv_unpack(void)
 for (i = 0; i < ipib_pv->num_comp; i++) {
 rc = s390_pv_unpack(ipib_pv->components[i].addr,
 TARGET_PAGE_ALIGN(ipib_pv->components[i].size),
-ipib_pv->components[i].tweak_pref);
+ipib_pv->components[i].tweak_pref,
+pv_cmd, pv_rc, pv_rrc);
 if (rc) {
 break;
 }
diff --git a/hw/s390x/ipl.h b/hw/s390x/ipl.h
index 8e3882d506..e021b2431f 100644
--- a/hw/s390x/ipl.h
+++ b/hw/s390x/ipl.h
@@ -26,8 +26,9 @@ void s390_ipl_convert_loadparm(char *ascii_lp, uint8_t *ebcdic_lp);
 void s390_ipl_fmt_loadparm(uint8_t *loadparm, char *str, Error **errp);
 void s390_rebuild_iplb(uint16_t index, IplParameterBlock *iplb);
 void s390_ipl_update_diag308(IplParameterBlock *iplb);
-int s390_ipl_prepare_pv_header(Error **errp);
-int s390_ipl_pv_unpack(void);
+int s390_ipl_prepare_pv_header(Error **errp, uint16_t *pv_cmd,
+   uint16_t *pv_rc, uint16_t *pv_rrc);
+int s390_ipl_pv_unpack(uint16_t *pv_cmd, uint16_t *pv_rc, uint16_t *pv_rrc);
 void s390_ipl_prepare_cpu(S390CPU *cpu);
 IplParameterBlock *s390_ipl_get_iplb(void);
 IplParameterBlock *s390_ipl_get_iplb_pv(void);
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index d9e683c5b4..0faf2841d6 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -53,6 +53,13 @@
 
 static Error *pv_mig_blocker;
 
+struct diag308response {
+uint16_t pv_cmd;
+uint16_t pv_rrc;
+uint16_t pv_rc;
+uint16_t diag_rc;
+};
+
 static S390CPU *s390x_new_cpu(const char *typename, uint32_t core_id,
   Error **errp)
 {
@@ -364,7 +371,10 @@ static void s390_machine_unprotect(S390CcwMachineState *ms)
 ram_block_discard_disable(false);
 }
 
-static int s390_machine_protect(S390CcwMachineState *ms)
+static int s390_machine_protect(S390CcwMachineState *ms,
+uint16_t *pv_cmd,
+uint16_t *pv_rc,
+uint16_t *pv_rrc)
 {
 Error *local_err = NULL;
 int rc;
@@ -407,19 +417,19 @@ static int s390_machine_protect(S390CcwMachineState *ms)
 }
 
 /* Set SE header and unpack */
-rc = s390_ipl_prepare_pv_header(&local_err);
+rc = s390_ipl_prepare_pv_header(&local_err, pv_cmd, pv_rc, pv_rrc);
 if (rc) {
 goto out_err;
 }
 
 /* Decrypt image */
-rc = s390_ipl_pv_unpack();
+rc = s390_ipl_pv_unpack(pv_cmd, pv_rc, pv_rrc);
 if (rc) {
 goto out_err;
 }
 
 /* Verify integrity */
-rc = s390_pv_verify();
+rc = s390_pv_verify(pv_cmd, pv_rc, pv_rrc);
 if (rc) {
 goto out_err;
 }
@@ -452,6 +462,7 @@ static void s390_machine_reset(MachineState *machine, ResetType type)
 {
 S390CcwMachineState *ms = S390_CCW_MACHINE(machine);
 enum s390_reset reset_type;
+struct diag308response resp;
 CPUState *cs, *t;
 S390CPU *cpu;
 
@@ -539,8 +550,9 @@ static void s390_machine_reset(MachineState *machine, ResetType type)
 }
 run_on_cpu(cs, s390_do_cpu_reset, RUN_ON_CPU_NULL);
 
- 

Re: Issue with stoptrigger.c Plugin in QEMU Emulation

2025-04-11 Thread Saanjh Sengupta
Hi,

Thank you for responding.

The error is consistent while executing a command on the latest master branch 
(commit ID: 56c6e249b6988c1b6edc2dd34ebb0f1e570a1365) for the v10.0.0-rc3 
release.

Could you please confirm if you are using the same command (like I do in my 
case), and if possible, share it for reference?

Also, what OS are you emulating in QEMU and what is your host machine 
configuration over which QEMU is running ?

Regards
Saanjh Sengupta



From: Pierrick Bouvier 
Sent: Thursday, April 10, 2025 8:55:32 PM
To: Saanjh Sengupta 
Cc: phi...@linaro.org ; pbonz...@redhat.com 
; marcandre.lur...@redhat.com 
; amir.gon...@neuroblade.ai 
; qemu-devel@nongnu.org ; 
aabhashswai...@gmail.com ; anian...@gmail.com 
; guptapriyanshi...@gmail.com 
; harshitgupta5...@gmail.com 

Subject: Re: Issue with stoptrigger.c Plugin in QEMU Emulation

Hi Saanjh,

I have not been able to reproduce the issue with current master branch.
Is it an error you see for every run?

Regards,
Pierrick

On 4/10/25 04:10, Saanjh Sengupta wrote:
> Hi,
>
> I am writing to seek assistance with an issue I am experiencing while
> using the stoptrigger.c plugin in QEMU emulation. I am currently
> utilising the latest QEMU version, 9.2.92, and attempting to emulate the
> Debian 11 as the operating system.
>
> The command I am using to emulate QEMU is as follows:
> *./build/qemu-system-x86_64 -m 2048M -smp 2 -boot c -nographic -serial
> mon:stdio -nic tap,ifname=tap0,script=no,downscript=no  -hda
> debian11.qcow2 -icount shift=0 -plugin ./build/contrib/plugins/
> libstoptrigger.so,icount=90 -d plugin -qmp
> tcp:localhost:,server,wait=off*
>
> However, when I attempt to use the -icount shift=0 option, the plugin
> fails with the error "*Basic icount read*". I have attached a screenshot
> of the error for your reference.
>
> error.png
>
> When I remove the -plugin argument from the command the OS boots up
> perfectly, as expected. Command utilised in that context was somewhat
> like *./build/qemu-system-x86_64 -m 2048M -smp 2 -boot c -nographic
> -serial mon:stdio -nic tap,ifname=tap0,script=no,downscript=no  -hda
> debian11.qcow2 -icount shift=0 -qmp tcp:localhost:,server,wait=off*
>
>
> I would greatly appreciate it if you could provide guidance on resolving
> this issue. Specifically, I would like to know the cause of the error
> and any potential solutions or workarounds that could be implemented to
> successfully use the stoptrigger.c plugin with the -icount shift=0 option.
>
>
> Regards
>
> Saanjh Sengupta
>



Re: Configuring onboard devices, in particular memory contents

2025-04-11 Thread Cédric Le Goater

On 4/11/25 11:08, Kane Chen wrote:

Hi Markus,

Thank you for the background information.

Since the OTP device is part of the Secure Boot Controller (SBC), I plan to 
register it in the global table. I believe this will simplify usage.

Meanwhile, based on Philippe's comment, I’m working on `aspeed_otp.c` to handle low-level OTP operations. 


This approach should help decouple SBC and OTP functionalities.


AFAIU, I think Philippe meant something more like a generic OTP block
model, hw/block/otp.c, which would implement the low level storage
primitives of an OTP device. This model would then be used by the
Aspeed SBC model, providing the SW/HW interface, and possibly also
used by other models, such as the NPCM7xxOTPState, BCM2835OTPState,
and SiFiveUOTPState models.

Anyhow, separating the layers is a good idea :
  - OTP like device
  - Aspeed SBC interface
  - machine/SoC interface to define the backend.
  - and a doc update !
  - possibly a (functional) test case
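
To make the first layer concrete, here is a minimal sketch of what the low
level storage primitives of an OTP-like device could look like. It is not
taken from any existing QEMU model: the names (OtpStore, otp_init, otp_read,
otp_program), the size and the "blank is all zeroes, bits only go 0 -> 1"
polarity are all assumptions made for illustration. Real parts may use a
different polarity or word layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OTP_WORDS 16    /* hypothetical size, for illustration only */

/* Hypothetical backing store; the blank state is assumed to be all zeroes. */
typedef struct OtpStore {
    uint32_t data[OTP_WORDS];
} OtpStore;

/* Start blank, or from a pre-configured image (e.g. read from a backend). */
static void otp_init(OtpStore *s, const uint32_t *image, size_t nwords)
{
    memset(s->data, 0, sizeof(s->data));
    if (image) {
        if (nwords > OTP_WORDS) {
            nwords = OTP_WORDS;
        }
        memcpy(s->data, image, nwords * sizeof(uint32_t));
    }
}

static bool otp_read(const OtpStore *s, size_t idx, uint32_t *val)
{
    if (idx >= OTP_WORDS) {
        return false;
    }
    *val = s->data[idx];
    return true;
}

/*
 * Program one word. Under the assumed polarity a bit can only go from
 * 0 to 1, so a request that would clear an already programmed bit is
 * rejected and the stored word is left unchanged.
 */
static bool otp_program(OtpStore *s, size_t idx, uint32_t val)
{
    if (idx >= OTP_WORDS) {
        return false;
    }
    if (s->data[idx] & ~val) {
        return false;
    }
    s->data[idx] |= val;
    return true;
}
```

The SBC model would then sit on top of primitives like these and only
provide the SW/HW interface, while the machine/SoC decides which backend
supplies the initial image.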


Thanks,

C.





Once testing is complete, I will submit a separate patch for further review.

Best regards,
Kane

-Original Message-
From: Markus Armbruster 
Sent: Tuesday, April 8, 2025 7:39 PM
To: Cédric Le Goater 
Cc: Kane Chen ; Philippe Mathieu-Daudé
; Peter Maydell ; Steven Lee
; Troy Lee ; Jamin Lin
; Andrew Jeffery
; Joel Stanley ; open
list:ASPEED BMCs ; open list:All patches CC here
; qemu-block ; Troy Lee

Subject: Configuring onboard devices, in particular memory contents (was:
[PATCH v1 0/1] hw/misc/aspeed_sbc: Implement OTP memory and controller)

Cédric Le Goater  writes:


Hello Kane,

+ Markus (for ebc29e1beab0 implementation)

On 4/7/25 09:33, Kane Chen wrote:

Hi Cédric/Philippe,
OTP (One-Time Programmable) memory is a type of non-volatile memory
in which each bit can be programmed only once. It is typically used
to store critical and permanent information, such as the chip ID and
secure boot keys. The structure and behavior of OTP memory are
consistent across both the AST1030 and AST2600 platforms.
As Philippe pointed out, this proposal models the OTP memory as a
flash device and utilizes a block backend for persistent storage. In
contrast, existing implementations such as NPCM7xxOTPState,
BCM2835OTPState, and SiFiveUOTPState expose OTP memory via MMIO and
always initialize it in a blank state.


AFAIU, Aspeed SBC is also MMIO based or is there another device, an
eeprom, accessible through an external bus ? How is it implemented in
HW ?


The goal of this design is to
allow the guest system to boot with a pre-configured OTP memory
state.


Yes. This is a valid request. It's not the first time we've had this
kind of request. The initial contents of EEPROM devices are an example,
and some machines, like the rainier, have a lot of them.

If the device can be defined on the command line, as is the case for an
EEPROM device attached to an I2C bus or a flash device attached to a
SPI bus, we can use a 'drive' property. Something like :

   qemu-system-arm -M ast2600-evb \
   -blockdev node-name=fmc0,driver=file,filename=/path/to/fmc0.img \
   -device mx66u51235f,bus=ssi.0,cs=0x0,drive=fmc0 \
   -blockdev node-name=fmc1,driver=file,filename=/path/to/fmc1.img \
   -device mx66u51235f,bus=ssi.0,cs=0x1,drive=fmc1 \
   -blockdev node-name=spi1,driver=file,filename=/path/to/spi1.img \
   -device mx66u51235f,cs=0x0,bus=ssi.1,drive=spi1 \
   ...

However, the Aspeed SBC device is a platform device, which makes
things more complex : it cannot be created on the command line, it is
created directly by the machine and the SoC, and passing device
properties to specify a blockdev is not possible :

   qemu-system-arm -M ast2600-evb \
   -blockdev node-name=otpmem,driver=file,filename=/path/to/otpmem.img \
   -device aspeed-sbc,drive=otpmem \
   ...


Configuring onboard devices is an old problem, and so far we have failed at
solving it adequately.

-device / device_add let you configure the new device in a general way, but
these work only for devices the user creates, not for devices the board creates
automatically.

We have a bunch of ad hoc and mostly ancient ways to configure them, but
they're all limited.  For example:

* A number of old command line options, such as -drive, -serial, -net
   nic, create device backends and additionally deposit configuration in
   some global table the board may elect to use however it sees fit.  The
   intended use is to create frontends connected to these backends.

   Some boards error out when they can't honor something in the table.
   Others silently ignore parts of the table, or all of it.  Bad UI.

   Device configuration the table doesn't support is not accessible this
   way.  If you extend the table (and the associated option) to provide
   access to some device-specific configuration, all the other devices
   will silently ignore the new configuration bits.  Again, bad UI.

   There's another serious issue with block devices: -drive is obsolete
   f

[PATCH v1 0/3] target/s390x - DIAG 308 extend subcode 10 to return UVC cmd id, RC and RRC values upon failure to enter secure mode

2025-04-11 Thread Gautam Gala
The DIAG 308 (subcode 10 - performing secure execution unpack) response
code returned when the configuration is unable to enter secure mode has
limited usability, as it is a fixed value (0xa02) for a variety of different
reasons. The aim is to extend this DIAG to return the UVC command ID, RC
and RRC values in addition to the diag response code. This feature can
be used by the stage3a bootloader (s390-tools/rust/pvimg/boot) to read
these new values from the corresponding register and print an
appropriate error message to help pinpoint the cause.

The response code, UVC RC, RRC, and command ID are returned in bit
positions 48-63, 32-47, 16-31, and 0-15 of register R1 + 1 if the
function does not complete successfully (Previously, only the
response code was returned in bits 48-63).
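
As a rough illustration of the layout described above, the sketch below
packs and unpacks the four 16-bit fields of the 64-bit register. It assumes
the usual s390x bit numbering, where bit 0 is the most significant bit, so
bits 48-63 are the low 16 bits. The helper names are hypothetical and are
not part of this series.

```c
#include <stdint.h>

/*
 * Assumed layout of register R1 + 1 (bit 0 = most significant bit):
 *   bits  0-15: UVC command ID   -> shift 48
 *   bits 16-31: UVC RRC          -> shift 32
 *   bits 32-47: UVC RC           -> shift 16
 *   bits 48-63: response code    -> shift 0
 */
static uint64_t diag308_pack(uint16_t cmd_id, uint16_t rrc,
                             uint16_t rc, uint16_t resp)
{
    return ((uint64_t)cmd_id << 48) | ((uint64_t)rrc << 32) |
           ((uint64_t)rc << 16) | resp;
}

static uint16_t diag308_cmd_id(uint64_t reg)
{
    return reg >> 48;
}

static uint16_t diag308_rrc(uint64_t reg)
{
    return (reg >> 32) & 0xffff;
}

static uint16_t diag308_rc(uint64_t reg)
{
    return (reg >> 16) & 0xffff;
}

static uint16_t diag308_resp(uint64_t reg)
{
    return reg & 0xffff;
}
```

A caller that only looks at the low 16 bits still sees the response code
where it was before, which matches the note above that only bits 48-63
were used previously.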

Gautam Gala (3):
  target/s390x: Introduce constant when checking if PV header couldn't
be decrypted
  target/s390x: introduce function when exiting PV
  target/s390x: Return UVC cmd code, RC and RRC value when DIAG 308
Subcode 10 fails to enter secure mode

 hw/s390x/ipl.c | 11 ---
 hw/s390x/ipl.h |  5 +--
 hw/s390x/s390-virtio-ccw.c | 24 +++
 target/s390x/kvm/pv.c  | 62 --
 target/s390x/kvm/pv.h  | 27 -
 5 files changed, 86 insertions(+), 43 deletions(-)

-- 
2.49.0




[PATCH v1 2/3] target/s390x: introduce function when exiting PV

2025-04-11 Thread Gautam Gala
Introduce a static function for exiting PV. The function replaces an
existing macro (s390_pv_cmd_exit).

Signed-off-by: Gautam Gala 
---
 target/s390x/kvm/pv.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/target/s390x/kvm/pv.c b/target/s390x/kvm/pv.c
index 3a0a971f0b..b4abda2cef 100644
--- a/target/s390x/kvm/pv.c
+++ b/target/s390x/kvm/pv.c
@@ -59,14 +59,15 @@ static int __s390_pv_cmd(uint32_t cmd, const char *cmdname, void *data,
  */
 #define s390_pv_cmd(cmd, data) __s390_pv_cmd(cmd, #cmd, data, NULL)
 #define s390_pv_cmd_pvrc(cmd, data, pvrc) __s390_pv_cmd(cmd, #cmd, data, pvrc)
-#define s390_pv_cmd_exit(cmd, data)    \
-{                                      \
-    int rc;                            \
-                                       \
-    rc = __s390_pv_cmd(cmd, #cmd, data, NULL); \
-    if (rc) {                          \
-        exit(1);                       \
-    }                                  \
+
+static void s390_pv_cmd_exit(uint32_t cmd, void *data)
+{
+    int rc;
+
+    rc = s390_pv_cmd(cmd, data);
+    if (rc) {
+        exit(1);
+    }
 }
 
 int s390_pv_query_info(void)
-- 
2.49.0




[PATCH v1 1/3] target/s390x: Introduce constant when checking if PV header couldn't be decrypted

2025-04-11 Thread Gautam Gala
Introduce a named constant when checking the Set Secure Configuration parameters
UV call return code for the case where no valid host key was found and therefore
the PV header couldn't be decrypted (0x108).

Signed-off-by: Gautam Gala 
---
 target/s390x/kvm/pv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/s390x/kvm/pv.c b/target/s390x/kvm/pv.c
index b191a4a68a..3a0a971f0b 100644
--- a/target/s390x/kvm/pv.c
+++ b/target/s390x/kvm/pv.c
@@ -147,6 +147,7 @@ bool s390_pv_vm_try_disable_async(S390CcwMachineState *ms)
     return true;
 }
 
+#define DIAG_308_UV_RC_INVAL_HOSTKEY    0x0108
 int s390_pv_set_sec_parms(uint64_t origin, uint64_t length, Error **errp)
 {
     int ret, pvrc;
@@ -158,7 +159,7 @@ int s390_pv_set_sec_parms(uint64_t origin, uint64_t length, Error **errp)
     ret = s390_pv_cmd_pvrc(KVM_PV_SET_SEC_PARMS, &args, &pvrc);
     if (ret) {
         error_setg(errp, "Failed to set secure execution parameters");
-        if (pvrc == 0x108) {
+        if (pvrc == DIAG_308_UV_RC_INVAL_HOSTKEY) {
             error_append_hint(errp, "Please check whether the image is "
                               "correctly encrypted for this host\n");
         }
-- 
2.49.0




Re: [PATCH V1 0/6] fast qom tree get

2025-04-11 Thread Daniel P . Berrangé
On Wed, Apr 09, 2025 at 09:58:13AM +0200, Peter Krempa via Devel wrote:
> On Wed, Apr 09, 2025 at 09:39:02 +0200, Markus Armbruster via Devel wrote:
> > Hi Steve, I apologize for the slow response.
> > 
> > Steve Sistare  writes:
> > 
> > > Using qom-list and qom-get to get all the nodes and property values in a
> > > QOM tree can take multiple seconds because it requires 1000's of individual
> > > QOM requests.  Some managers fetch the entire tree or a large subset
> > > of it when starting a new VM, and this cost is a substantial fraction of
> > > start up time.
> > 
> > "Some managers"... could you name one?
> 
> libvirt is at ~500 qom-get calls during an average startup ...
> 
> > > To reduce this cost, consider QAPI calls that fetch more information in
> > > each call:
> > >   * qom-list-get: given a path, return a list of properties and values.
> > >   * qom-list-getv: given a list of paths, return a list of properties and
> > > values for each path.
> > >   * qom-tree-get: given a path, return all descendant nodes rooted at that
> > > path, with properties and values for each.
> > 
> > Libvirt developers, would you be interested in any of these?
> 
> YES!!!

Not necessarily, see below...

> 
> The getter with value could SO MUCH optimize the startup sequence of a
> VM where libvirt needs to probe CPU flags:
> 
> (note the 'id' field in libvirt's monitor is sequential)
> 
> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"realized"},"id":"libvirt-8"}
> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"hotplugged"},"id":"libvirt-9"}
> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"hotpluggable"},"id":"libvirt-10"}
> 
> [...]
> 
> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"hv-apicv"},"id":"libvirt-470"}
> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"xd"},"id":"libvirt-471"}
> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"sse4_1"},"id":"libvirt-472"}
> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"unavailable-features"},"id":"libvirt-473"}
> 
> First and last line's timestamps:
> 
> 2025-04-08 14:44:28.882+: 1481190: info : qemuMonitorIOWrite:340 : QEMU_MONITOR_IO_WRITE: mon=0x7f4678048360 buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"realized"},"id":"libvirt-8"}
> 
> 2025-04-08 14:44:29.149+: 1481190: info : qemuMonitorIOWrite:340 : QEMU_MONITOR_IO_WRITE: mon=0x7f4678048360 buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"unavailable-features"},"id":"libvirt-473"}
> 
> Libvirt spent ~170 ms probing cpu flags.

One thing I would point out is that qom-get can be considered an
"escape hatch" to get information when no better QMP command exists.
In this case, libvirt has made the assumption that every CPU feature
is a QOM property.

Adding qom-list-get doesn't appreciably change that, just makes the
usage more efficient.

Considering the bigger picture QMP design, when libvirt is trying to
understand QEMU's CPU feature flag expansion, I would ask why we don't
have something like a "query-cpu" command to tell us the current CPU
expansion, avoiding the need for poking at QOM properties directly.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [RFC PATCH-for-8.0 09/10] hw/virtio: Extract vhost_user_ram_slots_max() to vhost-user-target.c

2025-04-11 Thread Philippe Mathieu-Daudé

On 10/4/25 19:29, Pierrick Bouvier wrote:

On 4/10/25 10:21, Philippe Mathieu-Daudé wrote:

On 10/4/25 16:36, Pierrick Bouvier wrote:

On 4/10/25 05:14, Philippe Mathieu-Daudé wrote:

Hi Pierrick,

On 13/12/22 00:05, Philippe Mathieu-Daudé wrote:

The current definition of VHOST_USER_MAX_RAM_SLOTS is
target specific. By converting this definition to a runtime
vhost_user_ram_slots_max() helper declared in a target
specific unit, we can have the rest of vhost-user.c target
independent.

To avoid variable length array or using the heap to store
arrays of vhost_user_ram_slots_max() elements, we simply
declare an array of the biggest VHOST_USER_MAX_RAM_SLOTS,
and each target uses up to vhost_user_ram_slots_max()
elements of it. Ensure arrays are big enough by adding an
assertion in vhost_user_init().

Signed-off-by: Philippe Mathieu-Daudé 
---
RFC: Should I add VHOST_USER_MAX_RAM_SLOTS to vhost-user.h
    or create an internal header for it?
---
    hw/virtio/meson.build  |  1 +
    hw/virtio/vhost-user-target.c  | 29 +
    hw/virtio/vhost-user.c | 26 +-
    include/hw/virtio/vhost-user.h |  7 +++
    4 files changed, 42 insertions(+), 21 deletions(-)
    create mode 100644 hw/virtio/vhost-user-target.c

diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
index eb7ee8ea92..bf7e35fa8a 100644
--- a/hw/virtio/meson.build
+++ b/hw/virtio/meson.build
@@ -11,6 +11,7 @@ if have_vhost
  specific_virtio_ss.add(files('vhost.c', 'vhost-backend.c', 'vhost-iova-tree.c'))
  if have_vhost_user
    specific_virtio_ss.add(files('vhost-user.c'))
+    specific_virtio_ss.add(files('vhost-user-target.c'))
  endif
  if have_vhost_vdpa
    specific_virtio_ss.add(files('vhost-vdpa.c', 'vhost-shadow-virtqueue.c'))
diff --git a/hw/virtio/vhost-user-target.c b/hw/virtio/vhost-user-target.c
new file mode 100644
index 00..6a0d0f53d0
--- /dev/null
+++ b/hw/virtio/vhost-user-target.c
@@ -0,0 +1,29 @@
+/*
+ * vhost-user target-specific helpers
+ *
+ * Copyright (c) 2013 Virtual Open Systems Sarl.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "hw/virtio/vhost-user.h"
+
+#if defined(TARGET_X86) || defined(TARGET_X86_64) || \
+    defined(TARGET_ARM) || defined(TARGET_ARM_64)
+#include "hw/acpi/acpi.h"
+#elif defined(TARGET_PPC) || defined(TARGET_PPC64)
+#include "hw/ppc/spapr.h"
+#endif
+
+unsigned int vhost_user_ram_slots_max(void)
+{
+#if defined(TARGET_X86) || defined(TARGET_X86_64) || \
+    defined(TARGET_ARM) || defined(TARGET_ARM_64)
+    return ACPI_MAX_RAM_SLOTS;
+#elif defined(TARGET_PPC) || defined(TARGET_PPC64)
+    return SPAPR_MAX_RAM_SLOTS;
+#else
+    return 512;


Should vhost_user_ram_slots_max be another TargetInfo field?



I don't think so, it would be better to transform the existing function
in something like:

switch (target_current()) {
case TARGET_X86:
case TARGET_ARM:
case TARGET_X86_64:
case TARGET_ARM_64:
  return ACPI_MAX_RAM_SLOTS;
case TARGET_PPC:
case TARGET_PPC64:
  return SPAPR_MAX_RAM_SLOTS;
default:
  return 512;
}


Clever, I like it, thanks!


It's a pattern we can reuse in all places where it'll be needed.
It's better if we keep in TargetInfo only global information, that is 
used through all the codebase, and not specifics about a given 
subsystem/device/file.
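
For reference, a standalone sketch of that pattern, combined with the
fixed-size array plus assertion idea from the original commit message.
Everything here is hypothetical: the SkTarget enum and the sk_* names stand
in for whatever the runtime target query ends up being, and the slot
counts are placeholder values chosen for the example rather than values
read from the QEMU headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical runtime target identifiers (not an existing QEMU enum). */
typedef enum SkTarget {
    SK_TARGET_X86,
    SK_TARGET_X86_64,
    SK_TARGET_ARM,
    SK_TARGET_AARCH64,
    SK_TARGET_PPC,
    SK_TARGET_PPC64,
    SK_TARGET_OTHER,
} SkTarget;

/* Placeholder limits, assumed for illustration only. */
#define SK_ACPI_MAX_RAM_SLOTS   256
#define SK_SPAPR_MAX_RAM_SLOTS  32
#define SK_DEFAULT_RAM_SLOTS    512
#define SK_BIGGEST_RAM_SLOTS    512   /* must cover every case below */

/* Runtime replacement for the per-target #if ladder. */
static unsigned int sk_ram_slots_max(SkTarget target)
{
    switch (target) {
    case SK_TARGET_X86:
    case SK_TARGET_X86_64:
    case SK_TARGET_ARM:
    case SK_TARGET_AARCH64:
        return SK_ACPI_MAX_RAM_SLOTS;
    case SK_TARGET_PPC:
    case SK_TARGET_PPC64:
        return SK_SPAPR_MAX_RAM_SLOTS;
    default:
        return SK_DEFAULT_RAM_SLOTS;
    }
}

/* Arrays are sized for the biggest limit; each target uses a prefix. */
typedef struct SkState {
    uint64_t region_size[SK_BIGGEST_RAM_SLOTS];
    unsigned int nslots;
} SkState;

static void sk_init(SkState *s, SkTarget target)
{
    unsigned int n = sk_ram_slots_max(target);

    /* Catch a new target whose limit outgrows the static arrays. */
    assert(n <= SK_BIGGEST_RAM_SLOTS);
    s->nslots = n;
}
```

The subsystem keeps its own knowledge of the limits, and only the "which
target am I" query has to be global.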


By the way, TARGET_ARM_64 is probably TARGET_AARCH64.


Correct, it has been fixed by Akihiko:

commit 744734ccc9eff28394a453de462b2a155f364118
Author: Akihiko Odaki 
Date:   Mon Jan 9 15:31:30 2023 +0900

vhost-user: Correct a reference of TARGET_AARCH64

Presumably TARGET_ARM_64 should be a mistake of TARGET_AARCH64.

Signed-off-by: Akihiko Odaki 
Message-Id: <20230109063130.81296-1-akihiko.od...@daynix.com>
Fixes: 27598393a2 ("Lift max memory slots limit imposed by vhost-user")
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alex Bennée 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index d9ce0501b2c..6c79da953b3 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -48,7 +48,7 @@
  * hardware plaform.
  */
 #if defined(TARGET_X86) || defined(TARGET_X86_64) || \
-defined(TARGET_ARM) || defined(TARGET_ARM_64)
+defined(TARGET_ARM) || defined(TARGET_AARCH64)
 #include "hw/acpi/acpi.h"
 #define VHOST_USER_MAX_RAM_SLOTS ACPI_MAX_RAM_SLOTS





Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again

2025-04-11 Thread Kevin Wolf
On 10.04.2025 at 17:28, Paolo Bonzini wrote:
> On Thu, Apr 10, 2025 at 4:25 PM Paolo Bonzini  wrote:
> > You should set ret = 0 here to avoid going down the
> > scsi_sense_from_errno() path.
> >
> > Otherwise,
> >
> > Reviewed-by: Paolo Bonzini 
> 
> Okay, going down the scsi_sense_from_errno() path is more or less
> harmless because status and sense end up unused; even though ENODEV is
> not something that the function handles, that can be added as a
> cleanup in 10.1.

Yes, it could be handled more explicitly. I considered adding a special
if branch in scsi_handle_rw_error() for host_status != -1 before
checking ret < 0, but didn't do it in the end because the existing code
already handles it fine. If you prefer it to be there for readability, I
can send a cleanup patch.

Kevin




Re: [PATCH v2 02/10] usb/msd: Ensure packet structure layout is correct

2025-04-11 Thread Philippe Mathieu-Daudé

On 11/4/25 10:04, Nicholas Piggin wrote:

These structures are hardware interfaces, ensure the layout is
correct.

Signed-off-by: Nicholas Piggin 
---
  hw/usb/dev-storage.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 03/10] usb/msd: Improved handling of mass storage reset

2025-04-11 Thread Philippe Mathieu-Daudé

On 11/4/25 10:04, Nicholas Piggin wrote:

The mass storage reset request handling does not reset in-flight
SCSI requests or USB MSD packets. Implement this by calling the
device reset handler which should take care of everything.

Signed-off-by: Nicholas Piggin 
---
  hw/usb/dev-storage.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé 




[PATCH 5/5] vfio/iommufd: Drop HostIOMMUDeviceCaps from HostIOMMUDevice

2025-04-11 Thread Zhenzhong Duan
Because hiod_iommufd_get_cap() was dropped, HostIOMMUDeviceCaps is no
longer useful, so drop it.

This also hides HostIOMMUDeviceCaps from the vIOMMU, so the only way to check
a cap is through the .get_cap() interface. This keeps the data that
HostIOMMUDevice exposes to the vIOMMU as small as possible.

Signed-off-by: Zhenzhong Duan 
---
 include/system/host_iommu_device.h | 14 --
 hw/vfio/iommufd.c  | 15 ---
 2 files changed, 29 deletions(-)

diff --git a/include/system/host_iommu_device.h b/include/system/host_iommu_device.h
index 809cced4ba..6f10bea25f 100644
--- a/include/system/host_iommu_device.h
+++ b/include/system/host_iommu_device.h
@@ -15,19 +15,6 @@
 #include "qom/object.h"
 #include "qapi/error.h"
 
-/**
- * struct HostIOMMUDeviceCaps - Define host IOMMU device capabilities.
- *
- * @type: host platform IOMMU type.
- *
- * @hw_caps: host platform IOMMU capabilities (e.g. on IOMMUFD this represents
- *   the @out_capabilities value returned from IOMMU_GET_HW_INFO ioctl)
- */
-typedef struct HostIOMMUDeviceCaps {
-uint32_t type;
-uint64_t hw_caps;
-} HostIOMMUDeviceCaps;
-
 #define TYPE_HOST_IOMMU_DEVICE "host-iommu-device"
 OBJECT_DECLARE_TYPE(HostIOMMUDevice, HostIOMMUDeviceClass, HOST_IOMMU_DEVICE)
 
@@ -38,7 +25,6 @@ struct HostIOMMUDevice {
 void *agent; /* pointer to agent device, ie. VFIO or VDPA device */
 PCIBus *aliased_bus;
 int aliased_devfn;
-HostIOMMUDeviceCaps caps;
 };
 
 /**
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index e7ca92f81f..947c5456d8 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -811,24 +811,9 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
                                       Error **errp)
 {
 VFIODevice *vdev = opaque;
-HostIOMMUDeviceCaps *caps = &hiod->caps;
-enum iommu_hw_info_type type;
-union {
-struct iommu_hw_info_vtd vtd;
-} data;
-uint64_t hw_caps;
 
 hiod->agent = opaque;
-
-if (!iommufd_backend_get_device_info(vdev->iommufd, vdev->devid,
- &type, &data, sizeof(data),
- &hw_caps, errp)) {
-return false;
-}
-
 hiod->name = g_strdup(vdev->name);
-caps->type = type;
-caps->hw_caps = hw_caps;
 
 return true;
 }
-- 
2.34.1




[PATCH 3/5] vfio/iommufd: Implement .get_cap() in TYPE_HOST_IOMMU_DEVICE_IOMMUFD_VFIO sub-class

2025-04-11 Thread Zhenzhong Duan
Now that a copy of the host IOMMU capabilities is saved in VFIODevice, implement
hiod_iommufd_vfio_get_cap() by querying that caps copy in the sub-class. This
overrides the .get_cap() implementation hiod_iommufd_get_cap() in the
TYPE_HOST_IOMMU_DEVICE_IOMMUFD parent class.

Vendor caps are checked for a specific capability, e.g., for vtd, the checking
code will be in hiod_iommufd_get_vtd_cap().

This also fixes the issue of vfio_device_get_aw_bits() being called from
.get_cap() of the TYPE_HOST_IOMMU_DEVICE_IOMMUFD parent class.

Signed-off-by: Zhenzhong Duan 
---
 include/system/iommufd.h |  4 
 backends/iommufd.c   | 40 
 hw/vfio/iommufd.c| 16 
 3 files changed, 60 insertions(+)

diff --git a/include/system/iommufd.h b/include/system/iommufd.h
index 0f337585c9..baba5ec1d8 100644
--- a/include/system/iommufd.h
+++ b/include/system/iommufd.h
@@ -85,4 +85,8 @@ typedef struct HostIOMMUDeviceIOMMUFDCaps {
 uint64_t hw_caps;
 VendorCaps vendor_caps;
 } HostIOMMUDeviceIOMMUFDCaps;
+
+int hiod_iommufd_get_common_cap(HostIOMMUDevice *hiod,
+HostIOMMUDeviceIOMMUFDCaps *caps,
+int cap, Error **errp);
 #endif
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 9587e4d99b..54fa3174d0 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -355,3 +355,43 @@ static const TypeInfo types[] = {
 };
 
 DEFINE_TYPES(types)
+
+static int hiod_iommufd_get_vtd_cap(HostIOMMUDevice *hiod,
+struct iommu_hw_info_vtd *vtd,
+int cap, Error **errp)
+{
+/* TODO: Check vtd->cap_reg/ecap_reg for capability */
+error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
+return -EINVAL;
+}
+
+static int hiod_iommufd_get_vendor_cap(HostIOMMUDevice *hiod,
+   HostIOMMUDeviceIOMMUFDCaps *caps,
+   int cap, Error **errp)
+{
+enum iommu_hw_info_type type = caps->type;
+
+switch (type) {
+case IOMMU_HW_INFO_TYPE_INTEL_VTD:
+return hiod_iommufd_get_vtd_cap(hiod, &caps->vendor_caps.vtd,
+cap, errp);
+case IOMMU_HW_INFO_TYPE_ARM_SMMUV3:
+case IOMMU_HW_INFO_TYPE_NONE:
+break;
+}
+
+error_setg(errp, "%s: unsupported capability type %x", hiod->name, type);
+return -EINVAL;
+}
+
+int hiod_iommufd_get_common_cap(HostIOMMUDevice *hiod,
+HostIOMMUDeviceIOMMUFDCaps *caps,
+int cap, Error **errp)
+{
+switch (cap) {
+case HOST_IOMMU_DEVICE_CAP_IOMMU_TYPE:
+return caps->type;
+default:
+return hiod_iommufd_get_vendor_cap(hiod, caps, cap, errp);
+}
+}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index e05b472e35..e7ca92f81f 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -833,6 +833,21 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
 return true;
 }
 
+static int hiod_iommufd_vfio_get_cap(HostIOMMUDevice *hiod, int cap,
+ Error **errp)
+{
+VFIODevice *vdev = hiod->agent;
+HostIOMMUDeviceIOMMUFDCaps *caps = &vdev->caps;
+
+/* VFIO has its own way to get aw_bits which may be different from VDPA */
+switch (cap) {
+case HOST_IOMMU_DEVICE_CAP_AW_BITS:
+return vfio_device_get_aw_bits(hiod->agent);
+default:
+return hiod_iommufd_get_common_cap(hiod, caps, cap, errp);
+}
+}
+
 static GList *
 hiod_iommufd_vfio_get_iova_ranges(HostIOMMUDevice *hiod)
 {
@@ -857,6 +872,7 @@ static void hiod_iommufd_vfio_class_init(ObjectClass *oc, void *data)
 HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_CLASS(oc);
 
 hiodc->realize = hiod_iommufd_vfio_realize;
+hiodc->get_cap = hiod_iommufd_vfio_get_cap;
 hiodc->get_iova_ranges = hiod_iommufd_vfio_get_iova_ranges;
 hiodc->get_page_size_mask = hiod_iommufd_vfio_get_page_size_mask;
 };
-- 
2.34.1
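
The lookup order in the patch above (sub-class handler first, then the
common handler, then the vendor-specific one selected by the IOMMU type)
can be sketched as standalone C. This is a simplified model under stated
assumptions, not the QEMU code: all Sk* types, the SK_* constants and the
field layout of the union are hypothetical stand-ins, and errors are
reduced to a negative errno return instead of Error **errp.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical capability and hardware type identifiers. */
enum { SK_CAP_IOMMU_TYPE, SK_CAP_AW_BITS };
enum { SK_HW_NONE, SK_HW_VTD, SK_HW_SMMUV3 };

/* One member per vendor; which one is valid is selected by 'type'. */
typedef union SkVendorCaps {
    struct { uint64_t cap_reg, ecap_reg; } vtd;
    struct { uint32_t idr[6]; } smmuv3;
} SkVendorCaps;

typedef struct SkCaps {
    uint32_t type;
    uint64_t hw_caps;
    SkVendorCaps vendor;
} SkCaps;

/* Last stage: vendor-specific caps, nothing implemented in this sketch. */
static int sk_get_vendor_cap(const SkCaps *caps, int cap)
{
    (void)cap;
    switch (caps->type) {
    case SK_HW_VTD:
        /* Would inspect caps->vendor.vtd here. */
        return -EINVAL;
    default:
        return -EINVAL;
    }
}

/* Middle stage: caps that are the same for every agent (VFIO, VDPA). */
static int sk_get_common_cap(const SkCaps *caps, int cap)
{
    switch (cap) {
    case SK_CAP_IOMMU_TYPE:
        return caps->type;
    default:
        return sk_get_vendor_cap(caps, cap);
    }
}

/* First stage: the agent answers what only it knows, else falls back. */
typedef struct SkVfioDev {
    SkCaps caps;
    int aw_bits;
} SkVfioDev;

static int sk_vfio_get_cap(const SkVfioDev *dev, int cap)
{
    switch (cap) {
    case SK_CAP_AW_BITS:
        return dev->aw_bits;
    default:
        return sk_get_common_cap(&dev->caps, cap);
    }
}
```

Keeping the agent-specific case in the sub-class is what removes the need
for the parent class to call into VFIO.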




[PATCH 1/5] vfio/iommufd: Save host iommu capabilities in VFIODevice.caps

2025-04-11 Thread Zhenzhong Duan
The saved caps copy can be used to check dirty tracking capability.

The capabilities are obtained through the IOMMUFD interface, so define a
new structure HostIOMMUDeviceIOMMUFDCaps, which contains the vendor
caps raw data, in "include/system/iommufd.h".

This is preparatory work for moving .realize() after .attach_device().

Suggested-by: Cédric Le Goater 
Suggested-by: Eric Auger 
Suggested-by: Nicolin Chen 
Signed-off-by: Zhenzhong Duan 
---
 include/hw/vfio/vfio-device.h |  1 +
 include/system/iommufd.h  | 22 ++
 hw/vfio/iommufd.c | 10 +-
 3 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 66797b4c92..09a7af891a 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -77,6 +77,7 @@ typedef struct VFIODevice {
 bool dirty_tracking; /* Protected by BQL */
 bool iommu_dirty_tracking;
 HostIOMMUDevice *hiod;
+HostIOMMUDeviceIOMMUFDCaps caps;
 int devid;
 IOMMUFDBackend *iommufd;
 VFIOIOASHwpt *hwpt;
diff --git a/include/system/iommufd.h b/include/system/iommufd.h
index cbab75bfbf..0f337585c9 100644
--- a/include/system/iommufd.h
+++ b/include/system/iommufd.h
@@ -18,6 +18,9 @@
 #include "exec/hwaddr.h"
 #include "exec/cpu-common.h"
 #include "system/host_iommu_device.h"
+#ifdef CONFIG_LINUX
+#include 
+#endif
 
 #define TYPE_IOMMUFD_BACKEND "iommufd"
 OBJECT_DECLARE_TYPE(IOMMUFDBackend, IOMMUFDBackendClass, IOMMUFD_BACKEND)
@@ -63,4 +66,23 @@ bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
   Error **errp);
 
 #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
+
+typedef union VendorCaps {
+struct iommu_hw_info_vtd vtd;
+struct iommu_hw_info_arm_smmuv3 smmuv3;
+} VendorCaps;
+
+/**
+ * struct HostIOMMUDeviceIOMMUFDCaps - Define host IOMMU device capabilities.
+ *
+ * @type: host platform IOMMU type.
+ *
+ * @hw_caps: host platform IOMMU capabilities (e.g. on IOMMUFD this represents
+ *   the @out_capabilities value returned from IOMMU_GET_HW_INFO ioctl)
+ */
+typedef struct HostIOMMUDeviceIOMMUFDCaps {
+uint32_t type;
+uint64_t hw_caps;
+VendorCaps vendor_caps;
+} HostIOMMUDeviceIOMMUFDCaps;
 #endif
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 48db105422..530cde6740 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -324,7 +324,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
  * vfio_migration_realize() may decide to use VF dirty tracking
  * instead.
  */
-if (vbasedev->hiod->caps.hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING) {
+if (vbasedev->caps.hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING) {
 flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
 }
 
@@ -475,6 +475,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
 int ret, devfd;
 uint32_t ioas_id;
 Error *err = NULL;
+HostIOMMUDeviceIOMMUFDCaps *caps = &vbasedev->caps;
 const VFIOIOMMUClass *iommufd_vioc =
 VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
 
@@ -505,6 +506,13 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
 goto err_alloc_ioas;
 }
 
+if (!iommufd_backend_get_device_info(vbasedev->iommufd, vbasedev->devid,
+ &caps->type, &caps->vendor_caps,
+ sizeof(VendorCaps), &caps->hw_caps,
+ errp)) {
+goto err_alloc_ioas;
+}
+
 /* try to attach to an existing container in this space */
 QLIST_FOREACH(bcontainer, &space->containers, next) {
 container = container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
-- 
2.34.1




Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again

2025-04-11 Thread Paolo Bonzini
On Fri, Apr 11, 2025 at 12:18 PM Kevin Wolf  wrote:
> > Okay, going down the scsi_sense_from_errno() path is more or less
> > harmless because status and sense end up unused; even though ENODEV is
> > not something that the function handles, that can be added as a
> > cleanup in 10.1.
>
> Yes, it could be handled more explicitly. I considered adding a special
> if branch in scsi_handle_rw_error() for host_status != -1 before
> checking ret < 0, but didn't do it in the end because the existing code
> already handles it fine. If you prefer it to be there for readability, I
> can send a cleanup patch.

Don't worry, I tried when I thought it was a bug but came to the same
conclusion.  I have sent a patch to handle ENODEV, which makes the
code a bit less mysterious, but that's it.

Paolo




[PATCH 0/5] cleanup interfaces

2025-04-11 Thread Zhenzhong Duan
Hi,

This series addresses Cédric's suggestion[1] and Donald's suggestion[2] to
move the realize() call after attach_device().

It also addresses Eric's and Nicolin's suggestion[3] to use a union to hold the
different vendor capabilities.

[1] https://lists.gnu.org/archive/html/qemu-devel/2025-04/msg01211.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2025-04/msg00898.html
[3] https://lists.gnu.org/archive/html/qemu-devel/2025-03/msg01552.html

Test:
net card passthrough and ping test
hotplug/unplug

Based on vfio-next(b9d42a878b).

Thanks
Zhenzhong


Zhenzhong Duan (5):
  vfio/iommufd: Save host iommu capabilities in VFIODevice.caps
  vfio: Move realize() after attach_device()
  vfio/iommufd: Implement .get_cap() in
TYPE_HOST_IOMMU_DEVICE_IOMMUFD_VFIO sub-class
  backends/iommufd: Drop hiod_iommufd_get_cap()
  vfio/iommufd: Drop HostIOMMUDeviceCaps from HostIOMMUDevice

 include/hw/vfio/vfio-device.h  |  2 +-
 include/system/host_iommu_device.h | 14 ---
 include/system/iommufd.h   | 26 
 backends/iommufd.c | 63 +++---
 hw/vfio/container.c|  4 --
 hw/vfio/device.c   | 28 ++---
 hw/vfio/iommufd.c  | 39 ++
 7 files changed, 100 insertions(+), 76 deletions(-)

-- 
2.34.1




[PATCH 2/5] vfio: Move realize() after attach_device()

2025-04-11 Thread Zhenzhong Duan
Previously, device attaching depended on realize() getting the host IOMMU
capabilities in order to check dirty tracking support.

Now that a caps copy is saved in VFIODevice and that copy is checked for dirty
tracking support, the dependency is gone, so move the realize() call after
the attach_device() call in vfio_device_attach().

Drop vfio_device_hiod_realize(), which is now redundant.

Suggested-by: Cédric Le Goater 
Suggested-by: Donald Dutile 
Signed-off-by: Zhenzhong Duan 
---
 include/hw/vfio/vfio-device.h |  1 -
 hw/vfio/container.c   |  4 
 hw/vfio/device.c  | 28 +++-
 hw/vfio/iommufd.c |  4 
 4 files changed, 11 insertions(+), 26 deletions(-)

diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 09a7af891a..14559733c6 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -124,7 +124,6 @@ bool vfio_device_irq_set_signaling(VFIODevice *vbasedev, int index, int subindex
 
 void vfio_device_reset_handler(void *opaque);
 bool vfio_device_is_mdev(VFIODevice *vbasedev);
-bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp);
 bool vfio_device_attach(char *name, VFIODevice *vbasedev,
 AddressSpace *as, Error **errp);
 void vfio_device_detach(VFIODevice *vbasedev);
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 23a3373470..676e88cef4 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -883,10 +883,6 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
 
     trace_vfio_device_attach(vbasedev->name, groupid);
 
-    if (!vfio_device_hiod_realize(vbasedev, errp)) {
-        return false;
-    }
-
     group = vfio_group_get(groupid, as, errp);
     if (!group) {
         return false;
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 4de6948cf4..6154d3f443 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -347,17 +347,6 @@ bool vfio_device_is_mdev(VFIODevice *vbasedev)
     return subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
 }
 
-bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp)
-{
-    HostIOMMUDevice *hiod = vbasedev->hiod;
-
-    if (!hiod) {
-        return true;
-    }
-
-    return HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp);
-}
-
 VFIODevice *vfio_get_vfio_device(Object *obj)
 {
     if (object_dynamic_cast(obj, TYPE_VFIO_PCI)) {
@@ -372,6 +361,7 @@ bool vfio_device_attach(char *name, VFIODevice *vbasedev,
 {
     const VFIOIOMMUClass *ops =
         VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
+    HostIOMMUDeviceClass *hiodc;
     HostIOMMUDevice *hiod = NULL;
 
     if (vbasedev->iommufd) {
@@ -380,16 +370,20 @@ bool vfio_device_attach(char *name, VFIODevice *vbasedev,
 
     assert(ops);
 
+    if (!ops->attach_device(name, vbasedev, as, errp)) {
+        return false;
+    }
 
     if (!vbasedev->mdev) {
         hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
-        vbasedev->hiod = hiod;
-    }
+        hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
 
-    if (!ops->attach_device(name, vbasedev, as, errp)) {
-        object_unref(hiod);
-        vbasedev->hiod = NULL;
-        return false;
+        if (!hiodc->realize(hiod, vbasedev, errp)) {
+            object_unref(hiod);
+            ops->detach_device(vbasedev);
+            return false;
+        }
+        vbasedev->hiod = hiod;
     }
 
     return true;
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 530cde6740..e05b472e35 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -502,10 +502,6 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
      * FD to be connected and having a devid to be able to successfully call
      * iommufd_backend_get_device_info().
      */
-    if (!vfio_device_hiod_realize(vbasedev, errp)) {
-        goto err_alloc_ioas;
-    }
-
     if (!iommufd_backend_get_device_info(vbasedev->iommufd, vbasedev->devid,
                                          &caps->type, &caps->vendor_caps,
                                          sizeof(VendorCaps), &caps->hw_caps,
-- 
2.34.1
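
The reordering in this patch (attach first, realize second, and undo the attach if realize fails) can be illustrated outside QEMU. The following is a minimal sketch under stated assumptions: `Dev`, `attach_device()`, `detach_device()`, `realize()` and `device_attach()` are hypothetical stand-ins, not the VFIO API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for VFIODevice; tracks only what the sketch needs. */
typedef struct Dev {
    bool attached;
    bool realized;
    bool realize_fails;   /* test knob: force realize() to fail */
} Dev;

static bool attach_device(Dev *d)
{
    d->attached = true;
    return true;
}

static void detach_device(Dev *d)
{
    d->attached = false;
}

static bool realize(Dev *d)
{
    if (d->realize_fails) {
        return false;
    }
    d->realized = true;
    return true;
}

/* Attach first; realize afterwards; roll the attach back if realize fails. */
static bool device_attach(Dev *d)
{
    if (!attach_device(d)) {
        return false;
    }
    if (!realize(d)) {
        detach_device(d);
        return false;
    }
    return true;
}
```

The point of the sketch is the rollback: once attach has succeeded, every later failure path has to detach again, which is what the new ops->detach_device() call in the hunk above does.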




Re: [PATCH v2 02/10] usb/msd: Ensure packet structure layout is correct

2025-04-11 Thread Philippe Mathieu-Daudé

On 11/4/25 10:04, Nicholas Piggin wrote:

These structures are hardware interfaces, ensure the layout is
correct.

Signed-off-by: Nicholas Piggin 
---
  hw/usb/dev-storage.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
index 2d7306b0572..87c22476f6b 100644
--- a/hw/usb/dev-storage.c
+++ b/hw/usb/dev-storage.c
@@ -27,7 +27,7 @@
  #define MassStorageReset  0xff
  #define GetMaxLun 0xfe
  
-struct usb_msd_cbw {

+struct QEMU_PACKED usb_msd_cbw {
  uint32_t sig;
  uint32_t tag;
  uint32_t data_len;
@@ -636,6 +636,9 @@ static const TypeInfo usb_storage_dev_type_info = {
  
  static void usb_msd_register_types(void)

  {
+qemu_build_assert(sizeof(struct usb_msd_cbw) == 31);
+qemu_build_assert(sizeof(struct usb_msd_csw) == 13);


Can we add definitions for these 13/31 magic values? Then
we can use them in try_get_valid_cbw().


+
  type_register_static(&usb_storage_dev_type_info);
  }
  





[PATCH 4/5] backends/iommufd: Drop hiod_iommufd_get_cap()

2025-04-11 Thread Zhenzhong Duan
Because the sub-class TYPE_HOST_IOMMU_DEVICE_IOMMUFD_VFIO has its own
implementation of .get_cap(), hiod_iommufd_get_cap() isn't used
any more; drop it.

Signed-off-by: Zhenzhong Duan 
---
 backends/iommufd.c | 23 ---
 1 file changed, 23 deletions(-)

diff --git a/backends/iommufd.c b/backends/iommufd.c
index 54fa3174d0..d2ecdc9c82 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -311,28 +311,6 @@ bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
 return true;
 }
 
-static int hiod_iommufd_get_cap(HostIOMMUDevice *hiod, int cap, Error **errp)
-{
-HostIOMMUDeviceCaps *caps = &hiod->caps;
-
-switch (cap) {
-case HOST_IOMMU_DEVICE_CAP_IOMMU_TYPE:
-return caps->type;
-case HOST_IOMMU_DEVICE_CAP_AW_BITS:
-return vfio_device_get_aw_bits(hiod->agent);
-default:
-error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
-return -EINVAL;
-}
-}
-
-static void hiod_iommufd_class_init(ObjectClass *oc, void *data)
-{
-HostIOMMUDeviceClass *hioc = HOST_IOMMU_DEVICE_CLASS(oc);
-
-hioc->get_cap = hiod_iommufd_get_cap;
-};
-
 static const TypeInfo types[] = {
 {
 .name = TYPE_IOMMUFD_BACKEND,
@@ -349,7 +327,6 @@ static const TypeInfo types[] = {
 }, {
 .name = TYPE_HOST_IOMMU_DEVICE_IOMMUFD,
 .parent = TYPE_HOST_IOMMU_DEVICE,
-.class_init = hiod_iommufd_class_init,
 .abstract = true,
 }
 };
-- 
2.34.1




Re: [PATCH v2 02/10] usb/msd: Ensure packet structure layout is correct

2025-04-11 Thread Philippe Mathieu-Daudé

On 11/4/25 12:21, Philippe Mathieu-Daudé wrote:

On 11/4/25 10:04, Nicholas Piggin wrote:

These structures are hardware interfaces, ensure the layout is
correct.

Signed-off-by: Nicholas Piggin 
---
  hw/usb/dev-storage.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
index 2d7306b0572..87c22476f6b 100644
--- a/hw/usb/dev-storage.c
+++ b/hw/usb/dev-storage.c
@@ -27,7 +27,7 @@
  #define MassStorageReset  0xff
  #define GetMaxLun 0xfe
-struct usb_msd_cbw {
+struct QEMU_PACKED usb_msd_cbw {
  uint32_t sig;
  uint32_t tag;
  uint32_t data_len;
@@ -636,6 +636,9 @@ static const TypeInfo usb_storage_dev_type_info = {
  static void usb_msd_register_types(void)
  {
+    qemu_build_assert(sizeof(struct usb_msd_cbw) == 31);
+    qemu_build_assert(sizeof(struct usb_msd_csw) == 13);


Can we add definitions for these 13/31 magic values? Then
we can use them in try_get_valid_cbw().


Maybe USB_MSD_CBW/CSW_MIN_SIZE?
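
A minimal sketch of what named size constants could look like, assuming GCC/Clang. The names `USB_MSD_CBW_SIZE` / `USB_MSD_CSW_SIZE` and the helper `packet_can_hold_cbw()` are hypothetical; the struct members beyond sig/tag/data_len are filled in from the fields the driver code in this thread references (flags, lun, cmd_len, cmd, residue) plus a status byte, and `__attribute__((packed))` stands in for QEMU_PACKED.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the 31/13 magic values. */
#define USB_MSD_CBW_SIZE 31
#define USB_MSD_CSW_SIZE 13

struct __attribute__((packed)) usb_msd_cbw {
    uint32_t sig;
    uint32_t tag;
    uint32_t data_len;
    uint8_t flags;
    uint8_t lun;
    uint8_t cmd_len;
    uint8_t cmd[16];
};

struct __attribute__((packed)) usb_msd_csw {
    uint32_t sig;
    uint32_t tag;
    uint32_t residue;
    uint8_t status;
};

/* Compile-time layout checks, in the spirit of qemu_build_assert(). */
_Static_assert(sizeof(struct usb_msd_cbw) == USB_MSD_CBW_SIZE, "CBW layout");
_Static_assert(sizeof(struct usb_msd_csw) == USB_MSD_CSW_SIZE, "CSW layout");

/* The same constant can then back a run-time length check. */
static int packet_can_hold_cbw(size_t packet_len)
{
    return packet_len >= USB_MSD_CBW_SIZE;
}
```

With one definition, the build-time assertion and any run-time length check stay in sync.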




+
  type_register_static(&usb_storage_dev_type_info);
  }







Re: [PATCH v2 08/10] usb/msd: Rename mode to cbw_state, and tweak names

2025-04-11 Thread Philippe Mathieu-Daudé

On 11/4/25 10:04, Nicholas Piggin wrote:

This reflects a little better what it does, particularly with a
subsequent change to relax the order packets are seen in. This
field is not the general state of the MSD state machine, rather
it follows packets that are completed as part of a CBW command.

The difference is a bit subtle, so for a concrete example, the
next change will permit the host to send a CSW packet before it
sends the associated CBW packet. In that case the CSW packet
will be tracked and the MSD state machine will move, but this
mode / cbw_state field would remain unchanged (in the "expecting
CBW" state), until the CBW packet arrives.

Signed-off-by: Nicholas Piggin 
---
  include/hw/usb/msd.h | 12 +--
  hw/usb/dev-storage.c | 50 +++-
  2 files changed, 32 insertions(+), 30 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH 00/10] Enable QEMU to run on browsers

2025-04-11 Thread Daniel P . Berrangé
On Wed, Apr 09, 2025 at 03:21:15PM -0400, Stefan Hajnoczi wrote:
> On Mon, Apr 07, 2025 at 11:45:51PM +0900, Kohei Tokunaga wrote:
> > This patch series enables QEMU's system emulator to run in a browser using
> > Emscripten.
> > It includes implementations and workarounds to address browser environment
> > limitations, as shown in the following.
> 
> I think it would be great to merge this even if there are limitations
> once code review comments have been addressed. Developing WebAssembly
> support in-tree is likely to allow this effort to develop further than
> if done in personal repos (and with significant efforts required to
> rebase the code periodically).

It is certainly impressive & clever, but first, two critical questions:

Is there a commitment to long-term (many years) development & maintenance
of this, or is it just a short-term experiment whose attention will
dwindle in a year's time?

Is there a compelling real-world use case for this that will justify
carrying it in QEMU, or is it a case of "it exists because it can"?

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v2 10/10] usb/msd: Add more tracing

2025-04-11 Thread Philippe Mathieu-Daudé

On 11/4/25 10:04, Nicholas Piggin wrote:

Add tracing for more received packet types, cbw_state changes, and
some more SCSI callbacks. These were useful in debugging relaxed
packet ordering support.

Signed-off-by: Nicholas Piggin 
---
  hw/usb/dev-storage.c | 23 +--
  hw/usb/trace-events  |  9 -
  2 files changed, 29 insertions(+), 3 deletions(-)




  static void usb_msd_copy_data(MSDState *s, USBPacket *p)
  {
  uint32_t len;
+
  len = p->iov.size - p->actual_length;
+
+trace_usb_msd_copy_data(s->req->tag, len);
+
  if (len > s->scsi_len)
  len = s->scsi_len;
  usb_packet_copy(p, scsi_req_get_buf(s->req) + s->scsi_off, len);
@@ -264,6 +268,8 @@ void usb_msd_transfer_data(SCSIRequest *req, uint32_t len)
  MSDState *s = DO_UPCAST(MSDState, dev.qdev, req->bus->qbus.parent);
  USBPacket *p = s->data_packet;
  
+trace_usb_msd_transfer_data(req->tag, len);

+
  if (s->cbw_state == USB_MSD_CBW_DATAIN) {
  if (req->cmd.mode == SCSI_XFER_TO_DEV) {
  usb_msd_fatal_error(s);
@@ -324,11 +330,13 @@ void usb_msd_command_complete(SCSIRequest *req, size_t resid)
  }
  if (s->data_len == 0) {
  s->cbw_state = USB_MSD_CBW_CSW;
+trace_usb_msd_cbw_state(s->cbw_state);
  }
  /* USB_RET_SUCCESS status clears previous ASYNC status */
  usb_msd_data_packet_complete(s, USB_RET_SUCCESS);
  } else if (s->data_len == 0) {
  s->cbw_state = USB_MSD_CBW_CSW;
+trace_usb_msd_cbw_state(s->cbw_state);
  }


Maybe helpful to log state transition?

  void usb_msd_cbw_change_state(MSDState *s,
enum USBMSDCBWState cbw_state)
  {
  if (s->cbw_state != cbw_state) {
  trace_usb_msd_cbw_state(s->cbw_state, cbw_state);
  s->cbw_state = cbw_state;
  }
  }

Otherwise,
Reviewed-by: Philippe Mathieu-Daudé 
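
The transition helper suggested above can be tried outside QEMU. This is a rough sketch under stated assumptions: fprintf() stands in for the trace event, the `MSDState` here is a hypothetical cut-down struct, and the `transitions` counter and `cbw_state_name()` exist only to make the behaviour observable.

```c
#include <assert.h>
#include <stdio.h>

typedef enum USBMSDCBWState {
    USB_MSD_CBW_NONE,
    USB_MSD_CBW_DATAOUT,
    USB_MSD_CBW_DATAIN,
    USB_MSD_CBW_CSW
} USBMSDCBWState;

/* Hypothetical, reduced device state for illustration only. */
typedef struct MSDState {
    USBMSDCBWState cbw_state;
    unsigned transitions;
} MSDState;

static const char *cbw_state_name(USBMSDCBWState st)
{
    switch (st) {
    case USB_MSD_CBW_NONE:    return "NONE";
    case USB_MSD_CBW_DATAOUT: return "DATAOUT";
    case USB_MSD_CBW_DATAIN:  return "DATAIN";
    case USB_MSD_CBW_CSW:     return "CSW";
    }
    return "?";
}

/* Log old -> new only when the state actually changes. */
static void usb_msd_cbw_change_state(MSDState *s, USBMSDCBWState cbw_state)
{
    if (s->cbw_state != cbw_state) {
        fprintf(stderr, "usb_msd_cbw_state %s -> %s\n",
                cbw_state_name(s->cbw_state), cbw_state_name(cbw_state));
        s->cbw_state = cbw_state;
        s->transitions++;
    }
}
```

Routing every assignment through one helper means a redundant "change" to the same state produces no trace output.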




Re: [PATCH v2 08/10] usb/msd: Rename mode to cbw_state, and tweak names

2025-04-11 Thread Philippe Mathieu-Daudé

On 11/4/25 10:04, Nicholas Piggin wrote:

This reflects a little better what it does, particularly with a
subsequent change to relax the order packets are seen in. This
field is not the general state of the MSD state machine, rather
it follows packets that are completed as part of a CBW command.

The difference is a bit subtle, so for a concrete example, the
next change will permit the host to send a CSW packet before it
sends the associated CBW packet. In that case the CSW packet
will be tracked and the MSD state machine will move, but this
mode / cbw_state field would remain unchanged (in the "expecting
CBW" state), until the CBW packet arrives.

Signed-off-by: Nicholas Piggin 
---
  include/hw/usb/msd.h | 12 +--
  hw/usb/dev-storage.c | 50 +++-
  2 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/include/hw/usb/msd.h b/include/hw/usb/msd.h
index a40d15f5def..c109544f632 100644
--- a/include/hw/usb/msd.h
+++ b/include/hw/usb/msd.h
@@ -10,11 +10,11 @@
  #include "hw/usb.h"
  #include "hw/scsi/scsi.h"
  
-enum USBMSDMode {

-USB_MSDM_CBW, /* Command Block.  */
-USB_MSDM_DATAOUT, /* Transfer data to device.  */
-USB_MSDM_DATAIN, /* Transfer data from device.  */
-USB_MSDM_CSW /* Command Status.  */


Since modifying this, please add

  typedef


+enum USBMSDCBWState {
+USB_MSD_CBW_NONE,/* Ready, waiting for CBW packet. */
+USB_MSD_CBW_DATAOUT, /* Expecting DATA-OUT (to device) packet */
+USB_MSD_CBW_DATAIN,  /* Expecting DATA-IN (from device) packet */
+USB_MSD_CBW_CSW  /* No more data, expecting CSW packet.  */
  }


  USBMSDCBWState;

  
  struct QEMU_PACKED usb_msd_csw {

@@ -26,7 +26,7 @@ struct QEMU_PACKED usb_msd_csw {
  
  struct MSDState {

  USBDevice dev;
-enum USBMSDMode mode;
+enum USBMSDCBWState cbw_state;


   USBMSDCBWState cbw_state;


  uint32_t scsi_off;
  uint32_t scsi_len;
  uint32_t data_len;




Re: [PATCH v2 07/10] usb/msd: Add some additional assertions

2025-04-11 Thread Philippe Mathieu-Daudé

On 11/4/25 10:04, Nicholas Piggin wrote:

Add more assertions to help verify internal logic.

Signed-off-by: Nicholas Piggin 
---
  hw/usb/dev-storage.c | 23 +++
  1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
index a9d8d4e8618..3b806872587 100644
--- a/hw/usb/dev-storage.c
+++ b/hw/usb/dev-storage.c
@@ -264,13 +264,24 @@ void usb_msd_transfer_data(SCSIRequest *req, uint32_t len)
  MSDState *s = DO_UPCAST(MSDState, dev.qdev, req->bus->qbus.parent);
  USBPacket *p = s->data_packet;
  
-if ((s->mode == USB_MSDM_DATAOUT) != (req->cmd.mode == SCSI_XFER_TO_DEV)) {

-usb_msd_fatal_error(s);
-return;
+if (s->mode == USB_MSDM_DATAIN) {


Or switch().


+if (req->cmd.mode == SCSI_XFER_TO_DEV) {
+usb_msd_fatal_error(s);
+return;
+}
+} else if (s->mode == USB_MSDM_DATAOUT) {
+if (req->cmd.mode != SCSI_XFER_TO_DEV) {
+usb_msd_fatal_error(s);
+return;
+}
+} else {
+g_assert_not_reached();
  }
  
+assert(s->scsi_len == 0);

  s->scsi_len = len;
  s->scsi_off = 0;
+
  if (p) {
  usb_msd_copy_data(s, p);
  p = s->data_packet;
@@ -288,6 +299,10 @@ void usb_msd_command_complete(SCSIRequest *req, size_t resid)
  
  trace_usb_msd_cmd_complete(req->status, req->tag);
  
+g_assert(s->req);

+/* The CBW is what starts the SCSI request */
+g_assert(s->mode != USB_MSDM_CBW);
+
  s->csw.sig = cpu_to_le32(0x53425355);
  s->csw.tag = cpu_to_le32(req->tag);
  s->csw.residue = cpu_to_le32(s->data_len);
@@ -486,7 +501,7 @@ static void usb_msd_handle_data_out(USBDevice *dev, USBPacket *p)
  trace_usb_msd_cmd_submit(cbw.lun, tag, cbw.flags,
   cbw.cmd_len, s->data_len);
  assert(le32_to_cpu(s->csw.residue) == 0);
-s->scsi_len = 0;
+assert(s->scsi_len == 0);


Preferably having scsi_len changes in a distinct patch,
Reviewed-by: Philippe Mathieu-Daudé 
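
For reference, the switch() form mentioned earlier in this review might look roughly like the following. It is a standalone sketch: the enums are local copies of the names used in the patch, and `msd_direction_ok()` is a hypothetical helper that returns false where the patch calls usb_msd_fatal_error().

```c
#include <assert.h>
#include <stdbool.h>

enum USBMSDMode {
    USB_MSDM_CBW,
    USB_MSDM_DATAOUT,
    USB_MSDM_DATAIN,
    USB_MSDM_CSW
};

enum SCSIXferMode {
    SCSI_XFER_NONE,
    SCSI_XFER_FROM_DEV,
    SCSI_XFER_TO_DEV
};

/* True when the SCSI transfer direction agrees with the MSD data phase. */
static bool msd_direction_ok(enum USBMSDMode mode, enum SCSIXferMode xfer)
{
    switch (mode) {
    case USB_MSDM_DATAIN:
        return xfer != SCSI_XFER_TO_DEV;
    case USB_MSDM_DATAOUT:
        return xfer == SCSI_XFER_TO_DEV;
    default:
        /* The patch treats any other mode as unreachable here. */
        assert(!"unexpected mode");
        return false;
    }
}
```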


  s->req = scsi_req_new(scsi_dev, tag, cbw.lun,
cbw.cmd, cbw.cmd_len, NULL);
  if (s->commandlog) {





Re: [PATCH v3 1/5] io: Fix partial struct copy in qio_dns_resolver_lookup_sync_inet()

2025-04-11 Thread Daniel P . Berrangé
On Tue, Apr 08, 2025 at 01:25:00PM +0200, Juraj Marcin wrote:
> From: Juraj Marcin 
> 
> Commit aec21d3175 (qapi: Add InetSocketAddress member keep-alive)
> introduces the keep-alive flag, but this flag is not copied together
> with other options in qio_dns_resolver_lookup_sync_inet().
> 
> This patch fixes this issue and also prevents future ones by copying the
> entire structure first and only then overriding a few attributes that
> need to be different.
> 
> Fixes: aec21d31756c (qapi: Add InetSocketAddress member keep-alive)
> Signed-off-by: Juraj Marcin 
> ---
>  io/dns-resolver.c | 21 +
>  1 file changed, 5 insertions(+), 16 deletions(-)

Reviewed-by: Daniel P. Berrangé 
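
The "copy the whole structure, then override" approach described in the commit message can be sketched as below. `InetAddr` is a hypothetical stand-in for the QAPI-generated InetSocketAddress, and the choice of overridden fields is an assumption for illustration; the real type owns heap strings, so a plain struct assignment there needs the string members re-duplicated afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in; not the QAPI-generated type. */
typedef struct InetAddr {
    const char *host;
    const char *port;
    bool numeric;
    bool keep_alive;
    bool ipv4;
    bool ipv6;
} InetAddr;

static InetAddr resolved_copy(const InetAddr *src,
                              const char *numeric_host,
                              const char *numeric_port)
{
    /* Struct assignment copies every member, including ones added later. */
    InetAddr dst = *src;

    /* Only the fields that must differ are overridden. */
    dst.host = numeric_host;
    dst.port = numeric_port;
    dst.numeric = true;
    return dst;
}
```

A field-by-field copy silently drops any member added after it was written, which is how keep-alive was lost; the whole-struct copy does not have that failure mode.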


With regards,
Daniel




Management applications and CPU feature flags (was: [PATCH V1 0/6] fast qom tree get)

2025-04-11 Thread Markus Armbruster
Daniel P. Berrangé  writes:

> On Wed, Apr 09, 2025 at 09:58:13AM +0200, Peter Krempa via Devel wrote:
>> On Wed, Apr 09, 2025 at 09:39:02 +0200, Markus Armbruster via Devel wrote:
>> > Hi Steve, I apologize for the slow response.
>> > 
>> > Steve Sistare  writes:
>> > 
>> > > Using qom-list and qom-get to get all the nodes and property values in a
>> > > QOM tree can take multiple seconds because it requires 1000's of individual
>> > > QOM requests.  Some managers fetch the entire tree or a large subset
>> > > of it when starting a new VM, and this cost is a substantial fraction of
>> > > start up time.
>> > 
>> > "Some managers"... could you name one?
>> 
>> libvirt is at ~500 qom-get calls during an average startup ...
>> 
>> > > To reduce this cost, consider QAPI calls that fetch more information in
>> > > each call:
>> > >   * qom-list-get: given a path, return a list of properties and values.
>> > >   * qom-list-getv: given a list of paths, return a list of properties and
>> > > values for each path.
>> > >   * qom-tree-get: given a path, return all descendant nodes rooted at that
>> > > path, with properties and values for each.
>> > 
>> > Libvirt developers, would you be interested in any of these?
>> 
>> YES!!!
>
> Not necessarily, see below...
>
>> 
>> The getter with value could SO MUCH optimize the startup sequence of a
>> VM where libvirt needs to probe CPU flags:
>> 
>> (note the 'id' field in libvirt's monitor is sequential)
>> 
>> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"realized"},"id":"libvirt-8"}
>> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"hotplugged"},"id":"libvirt-9"}
>> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"hotpluggable"},"id":"libvirt-10"}
>> 
>> [...]
>> 
>> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"hv-apicv"},"id":"libvirt-470"}
>> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"xd"},"id":"libvirt-471"}
>> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"sse4_1"},"id":"libvirt-472"}
>> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"unavailable-features"},"id":"libvirt-473"}
>> 
>> First and last line's timestamps:
>> 
>> 2025-04-08 14:44:28.882+: 1481190: info : qemuMonitorIOWrite:340 : 
>> QEMU_MONITOR_IO_WRITE: mon=0x7f4678048360 
>> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"realized"},"id":"libvirt-8"}
>> 
>> 2025-04-08 14:44:29.149+: 1481190: info : qemuMonitorIOWrite:340 : 
>> QEMU_MONITOR_IO_WRITE: mon=0x7f4678048360 
>> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"unavailable-features"},"id":"libvirt-473"}
>> 
>> Libvirt spent ~170 ms probing cpu flags.
>
> One thing I would point out is that qom-get can be considered an
> "escape hatch" to get information when no better QMP command exists.
> In this case, libvirt has made the assumption that every CPU feature
> is a QOM property.
>
> Adding qom-list-get doesn't appreciably change that, just makes the
> usage more efficient.
>
> Considering the bigger picture QMP design, when libvirt is trying to
> understand QEMU's CPU feature flag expansion, I would ask why we don't
> have something like a "query-cpu" command to tell us the current CPU
> expansion, avoiding the need for poking at QOM properties directly.

How do the existing query-cpu-FOO commands fall short of what management
applications such as libvirt need?




Re: [PATCH v3 2/5] util/qemu-sockets: Refactor setting client sockopts into a separate function

2025-04-11 Thread Daniel P . Berrangé
On Tue, Apr 08, 2025 at 01:25:01PM +0200, Juraj Marcin wrote:
> From: Juraj Marcin 
> 
> This is done in preparation for enabling the SO_KEEPALIVE support for
> server sockets and adding settings for more TCP keep-alive socket
> options.
> 
> Signed-off-by: Juraj Marcin 
> ---
>  util/qemu-sockets.c | 29 +++--
>  1 file changed, 19 insertions(+), 10 deletions(-)
> 
> diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> index 77477c1cd5..d15f6aa4b0 100644
> --- a/util/qemu-sockets.c
> +++ b/util/qemu-sockets.c
> @@ -205,6 +205,22 @@ static int try_bind(int socket, InetSocketAddress *saddr, struct addrinfo *e)
>  #endif
>  }
>  
> +static int inet_set_sockopts(int sock, InetSocketAddress *saddr, Error **errp)
> +{
> +    if (saddr->keep_alive) {
> +        int keep_alive = 1;
> +        int ret = setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE,
> +                             &keep_alive, sizeof(keep_alive));
> +
> +        if (ret < 0) {
> +            error_setg_errno(errp, errno,
> +                             "Unable to set keep-alive option on socket");
> +            return -1;
> +        }
> +    }
> +    return 0;
> +}
> +
>  static int inet_listen_saddr(InetSocketAddress *saddr,
>   int port_offset,
>   int num,
> @@ -475,16 +491,9 @@ int inet_connect_saddr(InetSocketAddress *saddr, Error **errp)
>  return sock;
>  }
>  
> -if (saddr->keep_alive) {
> -int val = 1;
> -int ret = setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE,
> - &val, sizeof(val));
> -
> -if (ret < 0) {
> -error_setg_errno(errp, errno, "Unable to set KEEPALIVE");
> -close(sock);
> -return -1;
> -}
> +if (inet_set_sockopts(sock, saddr, errp)) {

Since this returns -1 on error, by convention we check "< 0", reserving
positive return values for future non-error scenarios.

> +close(sock);
> +return -1;
>  }
>  
>  return sock;
> -- 
> 2.48.1
> 
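
The "< 0" convention mentioned above can be shown with a simplified POSIX sketch. `set_keep_alive()` and `open_keep_alive_socket()` are hypothetical helpers; unlike inet_set_sockopts() they take a plain bool and report no Error object.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 on success and -1 on failure; positive values stay unused. */
static int set_keep_alive(int sock, bool keep_alive)
{
    if (keep_alive) {
        int val = 1;

        if (setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE,
                       &val, sizeof(val)) < 0) {
            return -1;
        }
    }
    return 0;
}

static int open_keep_alive_socket(void)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    if (sock < 0) {
        return -1;
    }
    /* Check "< 0", not "!= 0", so positive returns remain available. */
    if (set_keep_alive(sock, true) < 0) {
        close(sock);
        return -1;
    }
    return sock;
}
```

Testing for a negative value rather than any non-zero value leaves room for the helper to return a positive, non-error result later without touching its callers.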

With regards,
Daniel




Re: Management applications and CPU feature flags (was: [PATCH V1 0/6] fast qom tree get)

2025-04-11 Thread Daniel P . Berrangé
On Fri, Apr 11, 2025 at 12:40:46PM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé  writes:
> 
> > On Wed, Apr 09, 2025 at 09:58:13AM +0200, Peter Krempa via Devel wrote:
> >> On Wed, Apr 09, 2025 at 09:39:02 +0200, Markus Armbruster via Devel wrote:
> >> > Hi Steve, I apologize for the slow response.
> >> > 
> >> > Steve Sistare  writes:
> >> > 
> >> > > Using qom-list and qom-get to get all the nodes and property values in a
> >> > > QOM tree can take multiple seconds because it requires 1000's of individual
> >> > > QOM requests.  Some managers fetch the entire tree or a large subset
> >> > > of it when starting a new VM, and this cost is a substantial fraction of
> >> > > start up time.
> >> > 
> >> > "Some managers"... could you name one?
> >> 
> >> libvirt is at ~500 qom-get calls during an average startup ...
> >> 
> >> > > To reduce this cost, consider QAPI calls that fetch more information in
> >> > > each call:
> >> > >   * qom-list-get: given a path, return a list of properties and values.
> >> > >   * qom-list-getv: given a list of paths, return a list of properties and
> >> > > values for each path.
> >> > >   * qom-tree-get: given a path, return all descendant nodes rooted at that
> >> > > path, with properties and values for each.
> >> > 
> >> > Libvirt developers, would you be interested in any of these?
> >> 
> >> YES!!!
> >
> > Not necessarily, see below...
> >
> >> 
> >> The getter with value could SO MUCH optimize the startup sequence of a
> >> VM where libvirt needs to probe CPU flags:
> >> 
> >> (note the 'id' field in libvirt's monitor is sequential)
> >> 
> >> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"realized"},"id":"libvirt-8"}
> >> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"hotplugged"},"id":"libvirt-9"}
> >> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"hotpluggable"},"id":"libvirt-10"}
> >> 
> >> [...]
> >> 
> >> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"hv-apicv"},"id":"libvirt-470"}
> >> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"xd"},"id":"libvirt-471"}
> >> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"sse4_1"},"id":"libvirt-472"}
> >> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"unavailable-features"},"id":"libvirt-473"}
> >> 
> >> First and last line's timestamps:
> >> 
> >> 2025-04-08 14:44:28.882+: 1481190: info : qemuMonitorIOWrite:340 : 
> >> QEMU_MONITOR_IO_WRITE: mon=0x7f4678048360 
> >> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"realized"},"id":"libvirt-8"}
> >> 
> >> 2025-04-08 14:44:29.149+: 1481190: info : qemuMonitorIOWrite:340 : 
> >> QEMU_MONITOR_IO_WRITE: mon=0x7f4678048360 
> >> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"unavailable-features"},"id":"libvirt-473"}
> >> 
> >> Libvirt spent ~170 ms probing cpu flags.
> >
> > One thing I would point out is that qom-get can be considered an
> > "escape hatch" to get information when no better QMP command exists.
> > In this case, libvirt has made the assumption that every CPU feature
> > is a QOM property.
> >
> > Adding qom-list-get doesn't appreciably change that, just makes the
> > usage more efficient.
> >
> > Considering the bigger picture QMP design, when libvirt is trying to
> > understand QEMU's CPU feature flag expansion, I would ask why we don't
> > have something like a "query-cpu" command to tell us the current CPU
> > expansion, avoiding the need for poking at QOM properties directly.
> 
> How do the existing query-cpu-FOO commands fall short of what management
> applications such as libvirt need?

It has been a long while since I looked at them, but IIRC they were
returning static info about CPU models, whereas libvirt wanted info
on the currently requested '-cpu ARGS'.

With regards,
Daniel




Re: [PATCH v1 01/24] Add -boot-certificates /path/dir:/path/file option in QEMU command line

2025-04-11 Thread Thomas Huth

On 08/04/2025 17.55, Zhuoying Cai wrote:

The `-boot-certificates /path/dir:/path/file` option is implemented
to provide path to either a directory or a single certificate.

Multiple paths can be delineated using a colon.

Signed-off-by: Zhuoying Cai 
---
  qemu-options.hx | 11 +++
  system/vl.c | 22 ++
  2 files changed, 33 insertions(+)

diff --git a/qemu-options.hx b/qemu-options.hx
index dc694a99a3..b460c63490 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1251,6 +1251,17 @@ SRST
  Set system UUID.
  ERST
  
+DEF("boot-certificates", HAS_ARG, QEMU_OPTION_boot_certificates,

+"-boot-certificates /path/directory:/path/file\n"
+"  Provide a path to a directory or a boot certificate.\n"
+"  A colon may be used to delineate multiple paths.\n",
+QEMU_ARCH_S390X)
+SRST
+``-boot-certificates /path/directory:/path/file``
+Provide a path to a directory or a boot certificate.
+A colon may be used to delineate multiple paths.
+ERST


Unless there is a really, really good reason for introducing new top-level 
options to QEMU, this should rather be added to one of the existing options 
instead.


I assume this is very specific to s390x, isn't it? So the best way is likely 
to add this as a parameter of the machine type option, so that the user 
would specify:


 qemu-system-s390x -machine s390-ccw-virtio,boot-certificates=/path/to/certs

See the other object_class_property_add() statements in 
ccw_machine_class_init() for some examples of how to do this.
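
Whichever option ends up carrying the value, the colon-separated list described in the patch has to be split into individual paths. A minimal sketch of that step, assuming POSIX strtok_r(); `split_cert_paths()` is a hypothetical helper and not part of the series:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split a colon-separated list in place. Stores up to max pointers into
 * out[] and returns how many were stored. Empty components are skipped,
 * because strtok_r() treats runs of delimiters as one.
 */
static size_t split_cert_paths(char *list, char **out, size_t max)
{
    size_t n = 0;
    char *save = NULL;

    for (char *tok = strtok_r(list, ":", &save);
         tok != NULL && n < max;
         tok = strtok_r(NULL, ":", &save)) {
        out[n++] = tok;
    }
    return n;
}
```

Note that a colon delimiter rules out paths that themselves contain a colon; that is a property of the format, not of this sketch.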


 Thomas




Re: [PATCH 08/10] hw/9pfs: Allow using hw/9pfs with emscripten

2025-04-11 Thread Kohei Tokunaga
Hi Christian,

> > Emscripten's fiber does not support submitting coroutines to other
> > threads. So this commit modifies hw/9pfs/coth.h to disable this behavior
> > when compiled with Emscripten.
>
> The lack of being able to dispatch a coroutine to a worker thread is one
> thing, however it would probably still make sense to use fibers in 9pfs as
> replacement of its coroutines mechanism.
>
> In 9pfs coroutines are used to dispatch blocking fs I/O syscalls from main
> thread to worker thread(s):
>
> https://wiki.qemu.org/Documentation/9p#Control_Flow
>
> If you just remove the coroutine code entirely, 9p server might hang for good,
> and with it QEMU's main thread.
>
> By using fibers instead, it would not hang, as it seems as if I/O syscalls are
> emulated in Emscripten, right?

Thank you for the feedback. Yes, it would be great if Emscripten's fiber
could be used to address this limitation. Since Emscripten's fiber is
cooperative, I believe a blocking code_block can still block the 9pfs server
unless an explicit yield occurs within it. I'll continue exploring better
solutions for this. Please let me know if I'm missing anything.

> Missing
>
> errno = ENOTSUP;

Sure, I'll fix this in the next version of the series.

> Looks like you just copied the macOS errno translation code. That probably
> doesn't make sense.

Errno values differ between Emscripten and Linux, so conversion is required
here. I've used the same mappings as macOS for now, but I'm happy to add
more conversions if needed.


Re: [PATCH 1/5] vfio/iommufd: Save host iommu capabilities in VFIODevice.caps

2025-04-11 Thread Joao Martins
On 11/04/2025 11:17, Zhenzhong Duan wrote:
> The saved caps copy can be used to check dirty tracking capability.
> 
> The capabilities are obtained through the IOMMUFD interface, so define a
> new structure HostIOMMUDeviceIOMMUFDCaps, which contains vendor
> caps raw data, in "include/system/iommufd.h".
> 
> This is preparatory work for moving .realize() after .attach_device().
> 
> Suggested-by: Cédric Le Goater 
> Suggested-by: Eric Auger 
> Suggested-by: Nicolin Chen 
> Signed-off-by: Zhenzhong Duan 
> ---
>  include/hw/vfio/vfio-device.h |  1 +
>  include/system/iommufd.h  | 22 ++
>  hw/vfio/iommufd.c | 10 +-
>  3 files changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
> index 66797b4c92..09a7af891a 100644
> --- a/include/hw/vfio/vfio-device.h
> +++ b/include/hw/vfio/vfio-device.h
> @@ -77,6 +77,7 @@ typedef struct VFIODevice {
>  bool dirty_tracking; /* Protected by BQL */
>  bool iommu_dirty_tracking;
>  HostIOMMUDevice *hiod;
> +HostIOMMUDeviceIOMMUFDCaps caps;
>  int devid;
>  IOMMUFDBackend *iommufd;
>  VFIOIOASHwpt *hwpt;
> diff --git a/include/system/iommufd.h b/include/system/iommufd.h
> index cbab75bfbf..0f337585c9 100644
> --- a/include/system/iommufd.h
> +++ b/include/system/iommufd.h
> @@ -18,6 +18,9 @@
>  #include "exec/hwaddr.h"
>  #include "exec/cpu-common.h"
>  #include "system/host_iommu_device.h"
> +#ifdef CONFIG_LINUX
> +#include 
> +#endif
>  
>  #define TYPE_IOMMUFD_BACKEND "iommufd"
>  OBJECT_DECLARE_TYPE(IOMMUFDBackend, IOMMUFDBackendClass, IOMMUFD_BACKEND)
> @@ -63,4 +66,23 @@ bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
>Error **errp);
>  
>  #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
> +
> +typedef union VendorCaps {
> +struct iommu_hw_info_vtd vtd;
> +struct iommu_hw_info_arm_smmuv3 smmuv3;
> +} VendorCaps;
> +
> +/**
> + * struct HostIOMMUDeviceIOMMUFDCaps - Define host IOMMU device capabilities.
> + *
> + * @type: host platform IOMMU type.
> + *
> + * @hw_caps: host platform IOMMU capabilities (e.g. on IOMMUFD this represents
> + *   the @out_capabilities value returned from IOMMU_GET_HW_INFO ioctl)
> + */
> +typedef struct HostIOMMUDeviceIOMMUFDCaps {
> +uint32_t type;
> +uint64_t hw_caps;
> +VendorCaps vendor_caps;
> +} HostIOMMUDeviceIOMMUFDCaps;
>  #endif
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 48db105422..530cde6740 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -324,7 +324,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>   * vfio_migration_realize() may decide to use VF dirty tracking
>   * instead.
>   */
> -if (vbasedev->hiod->caps.hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING) {
> +if (vbasedev->caps.hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING) {
>  flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>  }
>  
> @@ -475,6 +475,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>  int ret, devfd;
>  uint32_t ioas_id;
>  Error *err = NULL;
> +HostIOMMUDeviceIOMMUFDCaps *caps = &vbasedev->caps;
>  const VFIOIOMMUClass *iommufd_vioc =
>  VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
>  
> @@ -505,6 +506,13 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>  goto err_alloc_ioas;
>  }
>  
> +if (!iommufd_backend_get_device_info(vbasedev->iommufd, vbasedev->devid,
> + &caps->type, &caps->vendor_caps,
> + sizeof(VendorCaps), &caps->hw_caps,
> + errp)) {
> +goto err_alloc_ioas;
> +}
> +

I think this will fail on mdev (and thus mistakenly fail the attachment, as
there's no IOMMUFDDevice with mdev)? In case it fails, you can just do:

if (!vbasedev->mdev && !iommufd_backend_get_device_info(...)) {




Re: [PATCH 07/10] tcg: Add a TCG backend for WebAssembly

2025-04-11 Thread Philippe Mathieu-Daudé

Hi Kohei,

On 7/4/25 16:45, Kohei Tokunaga wrote:

A TB consists of a wasmTBHeader followed by the data listed below. The
wasmTBHeader contains pointers for each element:

- TCI code
- Wasm code
- Array of function indices imported into the Wasm instance
- Counter tracking the number of TB executions
- Pointer to the Wasm instance information

The Wasm backend (tcg/wasm32.c) and Wasm instances running on the same
thread share information, such as CPUArchState, through a wasmContext
structure. The Wasm backend defines tcg_qemu_tb_exec as a common entry point
for TBs, similar to the TCI backend. tcg_qemu_tb_exec runs TBs on a forked
TCI interpreter by default, while compiling and executing frequently executed
TBs as Wasm.

The code generator (tcg/wasm32) receives TCG IR and generates both Wasm and
TCI instructions. Since Wasm cannot directly jump to specific addresses,
labels are implemented using Wasm control flow instructions. As shown in the
pseudo-code below, a TB wraps instructions in a large loop, where code is
placed within if blocks separated by labels. Branching is handled by
breaking from the current block and entering the target block.

loop
   if
 ... code after label1
   end
   if
 ... code after label2
   end
   ...
end
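
To make the loop/if scheme above a bit more concrete, here is a minimal
sketch in plain C. It is not the code the patch generates; the function
name and the "block" selector variable are hypothetical and only
illustrate how a branch to a label can be expressed as "pick the target
block, then restart the loop" when arbitrary jumps are not available.

```c
#include <stdint.h>

/*
 * Hypothetical illustration: sum 1..n without goto, using one outer
 * loop and a block selector in place of labels.
 */
static uint32_t sum_with_block_dispatch(uint32_t n)
{
    uint32_t block = 0;     /* which block to enter on this iteration */
    uint32_t i = 0;
    uint32_t acc = 0;

    for (;;) {              /* the large outer loop */
        if (block == 0) {   /* code before label1 */
            i = 1;
            acc = 0;
            block = 1;      /* fall through into the next block */
        }
        if (block == 1) {   /* code after label1 */
            if (i > n) {
                block = 2;  /* forward branch to label2 */
                continue;
            }
            acc += i;
            i++;
            continue;       /* backward branch to label1 */
        }
        if (block == 2) {   /* code after label2 */
            return acc;
        }
    }
}
```

Both forward and backward branches reduce to setting the selector and
re-entering the loop, which is presumably why one loop around all the
blocks is sufficient.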

Additionally, the Wasm backend differs from other backends in several ways:

- goto_tb and goto_ptr return control to tcg_qemu_tb_exec which runs the
   target TB
- Helper function pointers are stored in an array in TB and imported into
   the Wasm instance on execution
- Wasm TBs lack prologue and epilogue. TBs are executed via tcg_qemu_tb_exec

Browsers raise an out-of-memory error if too many Wasm instances are
created. To prevent this, the Wasm backend tracks active instances using an
array. When instantiating a new instance risks exceeding the limit, the
backend removes older instances to avoid browser errors. These removed
instances are re-instantiated when needed.
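
A rough sketch of the bookkeeping described above, assuming a fixed-size
array used as a FIFO ring. The limit, the InstanceTable type and the
function names are hypothetical and are not taken from the patch; the
real backend may well choose a different eviction order.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ACTIVE_INSTANCES 4  /* hypothetical limit, not from the patch */

typedef struct {
    uint32_t tb_ids[MAX_ACTIVE_INSTANCES];  /* ring buffer, oldest at head */
    size_t head;
    size_t count;
} InstanceTable;

/* Return 1 if tb_id currently has a live instance, 0 otherwise. */
static int instance_is_active(const InstanceTable *t, uint32_t tb_id)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->tb_ids[(t->head + i) % MAX_ACTIVE_INSTANCES] == tb_id) {
            return 1;
        }
    }
    return 0;
}

/*
 * Make sure tb_id has an instance. If the table is full, the oldest
 * entry is dropped first and reported through *evicted.
 * Returns 1 if an eviction happened, 0 otherwise.
 */
static int instance_activate(InstanceTable *t, uint32_t tb_id,
                             uint32_t *evicted)
{
    int did_evict = 0;

    if (instance_is_active(t, tb_id)) {
        return 0;
    }
    if (t->count == MAX_ACTIVE_INSTANCES) {
        *evicted = t->tb_ids[t->head];
        t->head = (t->head + 1) % MAX_ACTIVE_INSTANCES;
        t->count--;
        did_evict = 1;
    }
    t->tb_ids[(t->head + t->count) % MAX_ACTIVE_INSTANCES] = tb_id;
    t->count++;
    return did_evict;
}
```

An evicted TB is simply activated again the next time it runs, which
corresponds to the re-instantiation mentioned above.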

Signed-off-by: Kohei Tokunaga 
---
  include/accel/tcg/getpc.h|2 +-
  include/tcg/helper-info.h|4 +-
  include/tcg/tcg.h|2 +-
  meson.build  |2 +
  tcg/meson.build  |5 +
  tcg/tcg.c|   26 +-
  tcg/wasm32.c | 1260 +
  tcg/wasm32.h |   39 +
  tcg/wasm32/tcg-target-con-set.h  |   18 +
  tcg/wasm32/tcg-target-con-str.h  |8 +
  tcg/wasm32/tcg-target-has.h  |  102 +
  tcg/wasm32/tcg-target-mo.h   |   12 +
  tcg/wasm32/tcg-target-opc.h.inc  |4 +
  tcg/wasm32/tcg-target-reg-bits.h |   12 +
  tcg/wasm32/tcg-target.c.inc  | 4484 ++
  tcg/wasm32/tcg-target.h  |   65 +
  16 files changed, 6035 insertions(+), 10 deletions(-)
  create mode 100644 tcg/wasm32.c
  create mode 100644 tcg/wasm32.h
  create mode 100644 tcg/wasm32/tcg-target-con-set.h
  create mode 100644 tcg/wasm32/tcg-target-con-str.h
  create mode 100644 tcg/wasm32/tcg-target-has.h
  create mode 100644 tcg/wasm32/tcg-target-mo.h
  create mode 100644 tcg/wasm32/tcg-target-opc.h.inc
  create mode 100644 tcg/wasm32/tcg-target-reg-bits.h
  create mode 100644 tcg/wasm32/tcg-target.c.inc
  create mode 100644 tcg/wasm32/tcg-target.h




diff --git a/tcg/tcg.c b/tcg/tcg.c
index dfd48b8264..154a4dafa7 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -136,6 +136,10 @@ static void tcg_out_goto_tb(TCGContext *s, int which);
  static void tcg_out_op(TCGContext *s, TCGOpcode opc, TCGType type,
 const TCGArg args[TCG_MAX_OP_ARGS],
 const int const_args[TCG_MAX_OP_ARGS]);
+#if defined(EMSCRIPTEN)


Maybe we can declare these independently of EMSCRIPTEN, to reduce #ifdef'ry.


+static void tcg_out_label_cb(TCGContext *s, TCGLabel *l);
+static int tcg_out_tb_end(TCGContext *s);
+#endif
  #if TCG_TARGET_MAYBE_vec
  static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
  TCGReg dst, TCGReg src);
@@ -251,7 +255,7 @@ TCGv_env tcg_env;
  const void *tcg_code_gen_epilogue;
  uintptr_t tcg_splitwx_diff;
  
-#ifndef CONFIG_TCG_INTERPRETER

+#if !defined(CONFIG_TCG_INTERPRETER) && !defined(EMSCRIPTEN)


s/&&/||/ otherwise breaks TCI? (various cases)


  tcg_prologue_fn *tcg_qemu_tb_exec;
  #endif
  
@@ -358,6 +362,9 @@ static void tcg_out_label(TCGContext *s, TCGLabel *l)

  tcg_debug_assert(!l->has_value);
  l->has_value = 1;
  l->u.value_ptr = tcg_splitwx_to_rx(s->code_ptr);
+#if defined(EMSCRIPTEN)
+tcg_out_label_cb(s, l);
+#endif
  }
  
  TCGLabel *gen_new_label(void)

@@ -1139,7 +1146,7 @@ static TCGHelperInfo info_helper_st128_mmu = {
| dh_typemask(ptr, 5)  /* uintptr_t ra */
  };
  
-#ifdef CONFIG_TCG_INTERPRETER

+#if defined(CONFIG_TCG_INTERPRETER) || defined(EMSCRIPTEN)
  static ffi_type *typecode_to_ffi(int argmask)
  {
  /*
@@ -1593,7 +1600,7 @@ void tcg_prologue_init(void)
  s->code_buf = s->code_gen_ptr;
  s->data_gen_ptr = NULL;
  

Re: [PATCH 2/5] vfio: Move realize() after attach_device()

2025-04-11 Thread Philippe Mathieu-Daudé

Hi,

On 11/4/25 12:17, Zhenzhong Duan wrote:

Previously, device attachment depended on realize() getting the host IOMMU
capabilities to check dirty tracking support.

Now that we save a caps copy in VFIODevice and check that copy for dirty
tracking support, there is no such dependency any more, so move the realize()
call after the attach_device() call in vfio_device_attach().

Drop vfio_device_hiod_realize() which looks redundant now.

Suggested-by: Cédric Le Goater 
Suggested-by: Donald Dutile 
Signed-off-by: Zhenzhong Duan 
---
  include/hw/vfio/vfio-device.h |  1 -
  hw/vfio/container.c   |  4 
  hw/vfio/device.c  | 28 +++-
  hw/vfio/iommufd.c |  4 
  4 files changed, 11 insertions(+), 26 deletions(-)

diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 09a7af891a..14559733c6 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -124,7 +124,6 @@ bool vfio_device_irq_set_signaling(VFIODevice *vbasedev, int index, int subindex
  
  void vfio_device_reset_handler(void *opaque);

  bool vfio_device_is_mdev(VFIODevice *vbasedev);
-bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp);


Pre-existing, but can we add documentation about what vfio_device_attach
does, in particular which state the device is in once attached (or if
attachment failed)?


  bool vfio_device_attach(char *name, VFIODevice *vbasedev,
  AddressSpace *as, Error **errp);
  void vfio_device_detach(VFIODevice *vbasedev);





Re: [PATCH 01/10] various: Fix type conflict of GLib function pointers

2025-04-11 Thread Kohei Tokunaga
Hi Paolo,

> > On emscripten, function pointer casts can cause function call failure.
> > This commit fixes the function definition to match the type of the
> > function call.
> >
> > - qtest_set_command_cb passed to g_once should match GThreadFunc
>
> Sending an alternative patch that doesn't use GOnce, this code runs in
> the main thread.

Thank you for addressing this issue. I've sent a review to that patch.

> > - object_class_cmp and cpreg_key_compare are passed to g_list_sort as
> >GCompareFunc but GLib casts them to GCompareDataFunc.
>
> Please use g_list_sort_with_data instead, and poison
> g_slist_sort/g_list_sort in include/glib-compat.h, with a comment
> explaining that it's done this way because of Emscripten.

Sure, I’ll fix this in the next version of the series.
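
For illustration, the pattern being asked for is to define the comparator
with the three-argument signature it is actually called with, rather than
a two-argument one that gets cast. The sketch below avoids a GLib
dependency: compare_data_fn only mirrors the shape of GCompareDataFunc,
and sort_names() is a hypothetical stand-in for g_list_sort_with_data,
not a GLib API.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in with the same shape as GLib's GCompareDataFunc. */
typedef int (*compare_data_fn)(const void *a, const void *b,
                               void *user_data);

/*
 * Comparator declared with the exact signature it is called through,
 * so no function pointer cast is needed. user_data is unused here.
 */
static int name_compare(const void *a, const void *b, void *user_data)
{
    (void)user_data;
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical insertion sort that calls cmp through its real type. */
static void sort_names(const char **names, size_t n,
                       compare_data_fn cmp, void *user_data)
{
    for (size_t i = 1; i < n; i++) {
        const char *key = names[i];
        size_t j = i;

        while (j > 0 && cmp(&names[j - 1], &key, user_data) > 0) {
            names[j] = names[j - 1];
            j--;
        }
        names[j] = key;
    }
}
```

Because the caller's function pointer type and the callee's definition
agree, the indirect call should also pass the signature check that Wasm
applies to indirect calls.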


Re: [PATCH 02/10] various: Define macros for dependencies on emscripten

2025-04-11 Thread Kohei Tokunaga
Hi Paolo,

> > +#ifdef EMSCRIPTEN
> > +/*
> > + * emscripten exposes copy_file_range declaration but doesn't provide the
> > + * implementation in the final link. Define the stub here but avoid type
> > + * conflict with the emscripten's header.
> > + */
> > +ssize_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
> > + off_t *out_off, size_t len, unsigned int flags)
> > +{
> > +errno = ENOSYS;
> > +return -1;
> > +}
>
> Please add a file stubs/emscripten.c with this function, and add it to
> the build in stubs/meson.build.
>
> > +#ifdef EMSCRIPTEN
> > +error_report("initgroups unsupported");
> > +exit(1);
>
> I think it's best to add a new function os-wasm.c in addition to
> os-posix.c and os-win32.c, and disable all the functionality of
> -run-with and -daemonize in vl.c via
>
> -#if defined(CONFIG_POSIX)
> +#if defined(CONFIG_POSIX) && !defined(EMSCRIPTEN)
>
> (there are a couple occurrences).

Sure, I'll apply this reorganization in the next version of the series.


Re: [PATCH 05/10] meson: Add wasm build in build scripts

2025-04-11 Thread Kohei Tokunaga
Hi Paolo,

> > > >> has_int128_type is set to false on emscripten as of now to avoid errors by
> > > >> libffi.
> > >
> > > What is the error here?  How hard would it be to test for it?
> >
> > When has_int128_type=true, I encountered a runtime error from libffi. To
> > reproduce this, we need to actually execute a libffi call with 128-bit
> > arguments.
> >
> > > Uncaught TypeError: Cannot convert 1079505232 to a BigInt
> > > at ffi_call_js (out.js:702:37)
> > > at qemu-system-x86_64.wasm.ffi_call (qemu-system-x86_64.wasm:0xa37712)
> > > at qemu-system-x86_64.wasm.tcg_qemu_tb_exec_tci (qemu-system-x86_64.wasm:0x65f440)
> > > at qemu-system-x86_64.wasm.tcg_qemu_tb_exec (qemu-system-x86_64.wasm:0x65edff)
> > > at qemu-system-x86_64.wasm.cpu_tb_exec (qemu-system-x86_64.wasm:0x6762c0)
> > > at qemu-system-x86_64.wasm.cpu_exec_loop (qemu-system-x86_64.wasm:0x677c84)
> > > at qemu-system-x86_64.wasm.dynCall_iii (qemu-system-x86_64.wasm:0xab9014)
> > > at ret. (out.js:6016:24)
> > > at invoke_iii (out.js:7574:10)
> > > at qemu-system-x86_64.wasm.cpu_exec_setjmp (qemu-system-x86_64.wasm:0x676db8)
>
> Ok, I guess a comment mentioning that it's a libffi limitation is enough.
>
> > > At least -g -O3 -pthread should not be necessary.
> >
> > Thank you for the suggestion. -sPROXY_TO_PTHREAD flag used in c_link_args
> > always requires -pthread, even during configuration. Otherwise, emcc returns
> > an error like:
> >
> > > emcc: error: -sPROXY_TO_PTHREAD requires -pthread to work!
> >
> > So I think -pthread needs to be included in c_link_args at minimum. I'll try
> > to remove other flags in the next version of the series.
>
> Reading more about -sPROXY_TO_PTHREAD it seems that you need it for
> all calls to emcc, even when compiling, so it's better to leave it in
> everywhere.
>
> > > For -Wno-unused-command-line-argument what are the warnings/errors that
> > > you are getting?
> >
> > I encountered the following error when compiling QEMU:
> >
> > > clang: error: argument unused during compilation: '-no-pie' [-Werror,-Wunused-command-line-argument]
> >
> > It seems Emscripten doesn't support the -no-pie flag, and this wasn't caught
> > during the configure phase. It seems that removing
> > -Wno-unused-command-line-argument would require the following change in
> > meson.build, but I'm open to better approaches.
> >
> > > -if not get_option('b_pie')
> > > +if not get_option('b_pie') and host_os != 'emscripten'
> > >qemu_common_flags += cc.get_supported_arguments('-fno-pie', '-no-pie')
> > >  endif
>
> Meson should have passed the -Werror=unused-command-line-argument flag
> when doing the above test (CLikeCompiler._has_multi_arguments ->
> has_arguments -> Compiler.compiles -> _build_wrapper ->
> build_wrapper_args -> ClangCompiler.get_compiler_check_args). It would
> be great if you can check what's wrong in this theory so perhaps meson
> can be fixed, or at least send here a meson-log.txt.

According to meson-log.txt as shown below,
-Werror=unused-command-line-argument was passed to the compiler, but it
didn't catch the warning.

> Working directory:  /build/meson-private/tmp4q_5wl_9
> Code:
> extern int i;
> int i;
>
> ---
> Command line: `/emsdk/upstream/emscripten/emcc -m32 /build/meson-private/tmp4q_5wl_9/testfile.c -o /build/meson-private/tmp4q_5wl_9/output.o -c -D_FILE_OFFSET_BITS=64 -O0 -Werror=implicit-function-declaration -Werror=unknown-warning-option -Werror=unused-command-line-argument -fno-pie` -> 0
> Compiler for C supports arguments -fno-pie: YES
> Running compile:
> Working directory:  /build/meson-private/tmpl9yy_8gs
> Code:
> extern int i;
> int i;
>
> ---
> Command line: `/emsdk/upstream/emscripten/emcc -m32 /build/meson-private/tmpl9yy_8gs/testfile.c -o /build/meson-private/tmpl9yy_8gs/output.o -c -D_FILE_OFFSET_BITS=64 -O0 -Werror=implicit-function-declaration -Werror=unknown-warning-option -Werror=unused-command-line-argument -no-pie` -> 0
> stderr:
> clang: warning: argument unused during compilation: '-no-pie' [-Wunused-command-line-argument]
> ---
> Compiler for C supports arguments -no-pie: YES

It seems there's a related issue thread on the Meson repository [1].

[1] https://github.com/mesonbuild/meson/issues/5355

> My suggestion is (if possible) to split out the parts of this series
> that are enough to run QEMU under TCI, and get those in as quickly as
> possible. The TCG backend can come second.

Sure, I'll try to split this patch series.


Re: [PATCH 1/5] vfio/iommufd: Save host iommu capabilities in VFIODevice.caps

2025-04-11 Thread Cédric Le Goater

On 4/11/25 12:17, Zhenzhong Duan wrote:

The saved caps copy can be used to check dirty tracking capability.

The capabilities are obtained through the IOMMUFD interface, so define a
new structure HostIOMMUDeviceIOMMUFDCaps, which contains the raw vendor
caps data, in "include/system/iommufd.h".

This is preparatory work for moving .realize() after .attach_device().

Suggested-by: Cédric Le Goater 
Suggested-by: Eric Auger 
Suggested-by: Nicolin Chen 
Signed-off-by: Zhenzhong Duan 
---
  include/hw/vfio/vfio-device.h |  1 +
  include/system/iommufd.h  | 22 ++
  hw/vfio/iommufd.c | 10 +-
  3 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 66797b4c92..09a7af891a 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -77,6 +77,7 @@ typedef struct VFIODevice {
  bool dirty_tracking; /* Protected by BQL */
  bool iommu_dirty_tracking;
  HostIOMMUDevice *hiod;
+HostIOMMUDeviceIOMMUFDCaps caps;


IMO, these capabilities belong to HostIOMMUDevice and not VFIODevice.

I would simply call iommufd_backend_get_device_info() twice where needed :
iommufd_cdev_autodomains_get() and  hiod_iommufd_vfio_realize()


Thanks,

C.




  int devid;
  IOMMUFDBackend *iommufd;
  VFIOIOASHwpt *hwpt;
diff --git a/include/system/iommufd.h b/include/system/iommufd.h
index cbab75bfbf..0f337585c9 100644
--- a/include/system/iommufd.h
+++ b/include/system/iommufd.h
@@ -18,6 +18,9 @@
  #include "exec/hwaddr.h"
  #include "exec/cpu-common.h"
  #include "system/host_iommu_device.h"
+#ifdef CONFIG_LINUX
+#include 
+#endif
  
  #define TYPE_IOMMUFD_BACKEND "iommufd"

  OBJECT_DECLARE_TYPE(IOMMUFDBackend, IOMMUFDBackendClass, IOMMUFD_BACKEND)
@@ -63,4 +66,23 @@ bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
Error **errp);
  
  #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"

+
+typedef union VendorCaps {
+struct iommu_hw_info_vtd vtd;
+struct iommu_hw_info_arm_smmuv3 smmuv3;
+} VendorCaps;
+
+/**
+ * struct HostIOMMUDeviceIOMMUFDCaps - Define host IOMMU device capabilities.
+ *
+ * @type: host platform IOMMU type.
+ *
+ * @hw_caps: host platform IOMMU capabilities (e.g. on IOMMUFD this represents
+ *   the @out_capabilities value returned from IOMMU_GET_HW_INFO ioctl)
+ */
+typedef struct HostIOMMUDeviceIOMMUFDCaps {
+uint32_t type;
+uint64_t hw_caps;
+VendorCaps vendor_caps;
+} HostIOMMUDeviceIOMMUFDCaps;
  #endif
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 48db105422..530cde6740 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -324,7 +324,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
   * vfio_migration_realize() may decide to use VF dirty tracking
   * instead.
   */
-if (vbasedev->hiod->caps.hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING) {
+if (vbasedev->caps.hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING) {
  flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
  }
  
@@ -475,6 +475,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,

  int ret, devfd;
  uint32_t ioas_id;
  Error *err = NULL;
+HostIOMMUDeviceIOMMUFDCaps *caps = &vbasedev->caps;
  const VFIOIOMMUClass *iommufd_vioc =
  VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
  
@@ -505,6 +506,13 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,

  goto err_alloc_ioas;
  }
  
+if (!iommufd_backend_get_device_info(vbasedev->iommufd, vbasedev->devid,

+ &caps->type, &caps->vendor_caps,
+ sizeof(VendorCaps), &caps->hw_caps,
+ errp)) {
+goto err_alloc_ioas;
+}
+
  /* try to attach to an existing container in this space */
  QLIST_FOREACH(bcontainer, &space->containers, next) {
  container = container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);





[PATCH v2 0/2] scsi-disk: Add FUA write support

2025-04-11 Thread Alberto Faria
Add scsi-disk support for Force Unit Access (FUA) writes. The first patch lets
us avoid FUA emulation when the underlying driver supports it natively. The
second patch makes scsi-disk devices advertise FUA support by default.

v2:
- Drop FUA write emulation logic since the block layer already does that.
- Add machine type compat for "dpofua".

Alberto Faria (2):
  scsi-disk: Add native FUA write support
  scsi-disk: Advertise FUA support by default

 hw/core/machine.c   |  1 +
 hw/scsi/scsi-disk.c | 45 +++--
 2 files changed, 12 insertions(+), 34 deletions(-)

-- 
2.49.0




[PATCH v2 1/2] scsi-disk: Add native FUA write support

2025-04-11 Thread Alberto Faria
Simply propagate the FUA flag on write requests to the driver. The block
layer will emulate it if necessary.
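
As a rough model of what "emulate it if necessary" means here: a FUA
write is either passed through to a driver that supports it natively, or
turned into a plain write followed by a flush. The sketch below is a toy
model under that assumption; REQ_FUA, FakeDriver, layer_write() and
device_write() are hypothetical names and not QEMU APIs.

```c
#include <stdbool.h>

#define REQ_FUA 0x1u  /* hypothetical flag standing in for BDRV_REQ_FUA */

typedef struct {
    bool native_fua;      /* driver can honour FUA on its own */
    unsigned writes;      /* counters to make the behaviour observable */
    unsigned fua_writes;
    unsigned flushes;
} FakeDriver;

/* Toy stand-in for the block layer: strip FUA and flush if needed. */
static void layer_write(FakeDriver *d, unsigned flags)
{
    bool emulate = (flags & REQ_FUA) && !d->native_fua;
    unsigned passthrough = emulate ? (flags & ~REQ_FUA) : flags;

    d->writes++;
    if (passthrough & REQ_FUA) {
        d->fua_writes++;  /* driver sees the FUA flag directly */
    }
    if (emulate) {
        d->flushes++;     /* plain write followed by a flush */
    }
}

/* Toy stand-in for the device side: just propagate the flag. */
static void device_write(FakeDriver *d, bool need_fua)
{
    layer_write(d, need_fua ? REQ_FUA : 0);
}
```

With this split the device model only has to propagate the flag, which
is what the scsi_dma_writev() change in this patch does.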

Signed-off-by: Alberto Faria 
---
 hw/scsi/scsi-disk.c | 43 ++-
 1 file changed, 10 insertions(+), 33 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index e59632e9b1..f62dcded64 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -74,7 +74,7 @@ struct SCSIDiskClass {
  */
 DMAIOFunc   *dma_readv;
 DMAIOFunc   *dma_writev;
-bool(*need_fua_emulation)(SCSICommand *cmd);
+bool(*need_fua)(SCSICommand *cmd);
 void(*update_sense)(SCSIRequest *r);
 };
 
@@ -85,7 +85,7 @@ typedef struct SCSIDiskReq {
 uint32_t sector_count;
 uint32_t buflen;
 bool started;
-bool need_fua_emulation;
+bool need_fua;
 struct iovec iov;
 QEMUIOVector qiov;
 BlockAcctCookie acct;
@@ -389,24 +389,6 @@ static bool scsi_is_cmd_fua(SCSICommand *cmd)
 }
 }
 
-static void scsi_write_do_fua(SCSIDiskReq *r)
-{
-SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
-
-assert(r->req.aiocb == NULL);
-assert(!r->req.io_canceled);
-
-if (r->need_fua_emulation) {
-block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct, 0,
- BLOCK_ACCT_FLUSH);
-r->req.aiocb = blk_aio_flush(s->qdev.conf.blk, scsi_aio_complete, r);
-return;
-}
-
-scsi_req_complete(&r->req, GOOD);
-scsi_req_unref(&r->req);
-}
-
 static void scsi_dma_complete_noio(SCSIDiskReq *r, int ret)
 {
 assert(r->req.aiocb == NULL);
@@ -416,12 +398,7 @@ static void scsi_dma_complete_noio(SCSIDiskReq *r, int ret)
 
 r->sector += r->sector_count;
 r->sector_count = 0;
-if (r->req.cmd.mode == SCSI_XFER_TO_DEV) {
-scsi_write_do_fua(r);
-return;
-} else {
-scsi_req_complete(&r->req, GOOD);
-}
+scsi_req_complete(&r->req, GOOD);
 
 done:
 scsi_req_unref(&r->req);
@@ -564,7 +541,7 @@ static void scsi_read_data(SCSIRequest *req)
 
 first = !r->started;
 r->started = true;
-if (first && r->need_fua_emulation) {
+if (first && r->need_fua) {
 block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct, 0,
  BLOCK_ACCT_FLUSH);
 r->req.aiocb = blk_aio_flush(s->qdev.conf.blk, scsi_do_read_cb, r);
@@ -589,8 +566,7 @@ static void scsi_write_complete_noio(SCSIDiskReq *r, int ret)
 r->sector += n;
 r->sector_count -= n;
 if (r->sector_count == 0) {
-scsi_write_do_fua(r);
-return;
+scsi_req_complete(&r->req, GOOD);
 } else {
 scsi_init_iovec(r, SCSI_DMA_BUF_SIZE);
 trace_scsi_disk_write_complete_noio(r->req.tag, r->qiov.size);
@@ -2391,7 +2367,7 @@ static int32_t scsi_disk_dma_command(SCSIRequest *req, uint8_t *buf)
 scsi_check_condition(r, SENSE_CODE(LBA_OUT_OF_RANGE));
 return 0;
 }
-r->need_fua_emulation = sdc->need_fua_emulation(&r->req.cmd);
+r->need_fua = sdc->need_fua(&r->req.cmd);
 if (r->sector_count == 0) {
 scsi_req_complete(&r->req, GOOD);
 }
@@ -3137,7 +3113,8 @@ BlockAIOCB *scsi_dma_writev(int64_t offset, QEMUIOVector *iov,
 {
 SCSIDiskReq *r = opaque;
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
-return blk_aio_pwritev(s->qdev.conf.blk, offset, iov, 0, cb, cb_opaque);
+int flags = r->need_fua ? BDRV_REQ_FUA : 0;
+return blk_aio_pwritev(s->qdev.conf.blk, offset, iov, flags, cb, cb_opaque);
 }
 
 static char *scsi_property_get_loadparm(Object *obj, Error **errp)
@@ -3186,7 +3163,7 @@ static void scsi_disk_base_class_initfn(ObjectClass *klass, void *data)
 device_class_set_legacy_reset(dc, scsi_disk_reset);
 sdc->dma_readv = scsi_dma_readv;
 sdc->dma_writev = scsi_dma_writev;
-sdc->need_fua_emulation = scsi_is_cmd_fua;
+sdc->need_fua  = scsi_is_cmd_fua;
 }
 
 static const TypeInfo scsi_disk_base_info = {
@@ -3338,7 +3315,7 @@ static void scsi_block_class_initfn(ObjectClass *klass, void *data)
 sdc->dma_readv   = scsi_block_dma_readv;
 sdc->dma_writev  = scsi_block_dma_writev;
 sdc->update_sense = scsi_block_update_sense;
-sdc->need_fua_emulation = scsi_block_no_fua;
+sdc->need_fua= scsi_block_no_fua;
 dc->desc = "SCSI block device passthrough";
 device_class_set_props(dc, scsi_block_properties);
 dc->vmsd  = &vmstate_scsi_disk_state;
-- 
2.49.0




[PATCH v2 2/2] scsi-disk: Advertise FUA support by default

2025-04-11 Thread Alberto Faria
Allow the guest to submit FUA requests directly, instead of forcing it
to emulate them using a regular flush.

Signed-off-by: Alberto Faria 
---
 hw/core/machine.c   | 1 +
 hw/scsi/scsi-disk.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 63c6ef93d2..e4e6474a4e 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -46,6 +46,7 @@ GlobalProperty hw_compat_9_2[] = {
 { "migration", "multifd-clean-tls-termination", "false" },
 { "migration", "send-switchover-start", "off"},
 { "vfio-pci", "x-migration-multifd-transfer", "off" },
+{ "scsi-disk", "dpofua", "off" },
 };
 const size_t hw_compat_9_2_len = G_N_ELEMENTS(hw_compat_9_2);
 
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index f62dcded64..2f62f6069d 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -3192,7 +3192,7 @@ static const Property scsi_hd_properties[] = {
 DEFINE_PROP_BIT("removable", SCSIDiskState, features,
 SCSI_DISK_F_REMOVABLE, false),
 DEFINE_PROP_BIT("dpofua", SCSIDiskState, features,
-SCSI_DISK_F_DPOFUA, false),
+SCSI_DISK_F_DPOFUA, true),
 DEFINE_PROP_UINT64("wwn", SCSIDiskState, qdev.wwn, 0),
 DEFINE_PROP_UINT64("port_wwn", SCSIDiskState, qdev.port_wwn, 0),
 DEFINE_PROP_UINT16("port_index", SCSIDiskState, port_index, 0),
-- 
2.49.0




Re: [PATCH 2/5] vfio: Move realize() after attach_device()

2025-04-11 Thread Cédric Le Goater

On 4/11/25 12:17, Zhenzhong Duan wrote:

Previously, device attachment depended on realize() getting the host IOMMU
capabilities to check dirty tracking support.

Now that we save a caps copy in VFIODevice and check that copy for dirty
tracking support, there is no such dependency any more, so move the realize()
call after the attach_device() call in vfio_device_attach().

Drop vfio_device_hiod_realize() which looks redundant now.

Suggested-by: Cédric Le Goater 
Suggested-by: Donald Dutile 
Signed-off-by: Zhenzhong Duan 
---
  include/hw/vfio/vfio-device.h |  1 -
  hw/vfio/container.c   |  4 
  hw/vfio/device.c  | 28 +++-
  hw/vfio/iommufd.c |  4 
  4 files changed, 11 insertions(+), 26 deletions(-)

diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 09a7af891a..14559733c6 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -124,7 +124,6 @@ bool vfio_device_irq_set_signaling(VFIODevice *vbasedev, int index, int subindex
  
  void vfio_device_reset_handler(void *opaque);

  bool vfio_device_is_mdev(VFIODevice *vbasedev);
-bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp);
  bool vfio_device_attach(char *name, VFIODevice *vbasedev,
  AddressSpace *as, Error **errp);
  void vfio_device_detach(VFIODevice *vbasedev);
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 23a3373470..676e88cef4 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -883,10 +883,6 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
  
  trace_vfio_device_attach(vbasedev->name, groupid);
  
-if (!vfio_device_hiod_realize(vbasedev, errp)) {

-return false;
-}
-
  group = vfio_group_get(groupid, as, errp);
  if (!group) {
  return false;
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 4de6948cf4..6154d3f443 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -347,17 +347,6 @@ bool vfio_device_is_mdev(VFIODevice *vbasedev)
  return subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
  }
  
-bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp)

-{
-HostIOMMUDevice *hiod = vbasedev->hiod;
-
-if (!hiod) {
-return true;
-}
-
-return HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp);
-}
-
  VFIODevice *vfio_get_vfio_device(Object *obj)
  {
  if (object_dynamic_cast(obj, TYPE_VFIO_PCI)) {
@@ -372,6 +361,7 @@ bool vfio_device_attach(char *name, VFIODevice *vbasedev,
  {
  const VFIOIOMMUClass *ops =
  VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
+HostIOMMUDeviceClass *hiodc;
  HostIOMMUDevice *hiod = NULL;
  
  if (vbasedev->iommufd) {

@@ -380,16 +370,20 @@ bool vfio_device_attach(char *name, VFIODevice *vbasedev,
  
  assert(ops);
  
+if (!ops->attach_device(name, vbasedev, as, errp)) {

+return false;
+}
  
  if (!vbasedev->mdev) {

  hiod = HOST_IOMMU_DEVICE(object_new(ops->hiod_typename));
-vbasedev->hiod = hiod;
-}
+hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
  
-if (!ops->attach_device(name, vbasedev, as, errp)) {

-object_unref(hiod);
-vbasedev->hiod = NULL;
-return false;
+if (!hiodc->realize(hiod, vbasedev, errp)) {
+object_unref(hiod);
+ops->detach_device(vbasedev);
+return false;
+}
+vbasedev->hiod = hiod;


This is not what I meant. I was not clear enough. Sorry about that.

hiodc->realize can be called under each container backend: legacy
and iommufd. I don't see much value in making it common, and
it would remove the unref/detach sequence to handle errors.

Thanks,

C.




  }
  
  return true;

diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 530cde6740..e05b472e35 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -502,10 +502,6 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
   * FD to be connected and having a devid to be able to successfully call
   * iommufd_backend_get_device_info().
   */
-if (!vfio_device_hiod_realize(vbasedev, errp)) {
-goto err_alloc_ioas;
-}
-
  if (!iommufd_backend_get_device_info(vbasedev->iommufd, vbasedev->devid,
   &caps->type, &caps->vendor_caps,
   sizeof(VendorCaps), &caps->hw_caps,





Re: Management applications and CPU feature flags

2025-04-11 Thread Markus Armbruster
Daniel P. Berrangé  writes:

> On Fri, Apr 11, 2025 at 12:40:46PM +0200, Markus Armbruster wrote:
>> Daniel P. Berrangé  writes:
>> 
>> > On Wed, Apr 09, 2025 at 09:58:13AM +0200, Peter Krempa via Devel wrote:
>> >> On Wed, Apr 09, 2025 at 09:39:02 +0200, Markus Armbruster via Devel wrote:
>> >> > Hi Steve, I apologize for the slow response.
>> >> > 
>> >> > Steve Sistare  writes:
>> >> > 
>> >> > > Using qom-list and qom-get to get all the nodes and property values in a
>> >> > > QOM tree can take multiple seconds because it requires 1000's of individual
>> >> > > QOM requests.  Some managers fetch the entire tree or a large subset
>> >> > > of it when starting a new VM, and this cost is a substantial fraction of
>> >> > > start up time.
>> >> > 
>> >> > "Some managers"... could you name one?
>> >> 
>> >> libvirt is at ~500 qom-get calls during an average startup ...
>> >> 
>> >> > > To reduce this cost, consider QAPI calls that fetch more information in
>> >> > > each call:
>> >> > >   * qom-list-get: given a path, return a list of properties and values.
>> >> > >   * qom-list-getv: given a list of paths, return a list of properties and
>> >> > > values for each path.
>> >> > >   * qom-tree-get: given a path, return all descendant nodes rooted at that
>> >> > > path, with properties and values for each.
>> >> > 
>> >> > Libvirt developers, would you be interested in any of these?
>> >> 
>> >> YES!!!
>> >
>> > Not necessarily, see below...
>> >
>> >> 
>> >> The getter with value could SO MUCH optimize the startup sequence of a
>> >> VM where libvirt needs to probe CPU flags:
>> >> 
>> >> (note the 'id' field in libvirt's monitor is sequential)
>> >> 
>> >> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"realized"},"id":"libvirt-8"}
>> >> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"hotplugged"},"id":"libvirt-9"}
>> >> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"hotpluggable"},"id":"libvirt-10"}
>> >> 
>> >> [...]
>> >> 
>> >> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"hv-apicv"},"id":"libvirt-470"}
>> >> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"xd"},"id":"libvirt-471"}
>> >> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"sse4_1"},"id":"libvirt-472"}
>> >> buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"unavailable-features"},"id":"libvirt-473"}
>> >> 
>> >> First and last line's timestamps:
>> >> 
>> >> 2025-04-08 14:44:28.882+: 1481190: info : qemuMonitorIOWrite:340 : QEMU_MONITOR_IO_WRITE: mon=0x7f4678048360 buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"realized"},"id":"libvirt-8"}
>> >> 
>> >> 2025-04-08 14:44:29.149+: 1481190: info : qemuMonitorIOWrite:340 : QEMU_MONITOR_IO_WRITE: mon=0x7f4678048360 buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"unavailable-features"},"id":"libvirt-473"}
>> >> 
>> >> Libvirt spent ~170 ms probing cpu flags.
>> >
>> > One thing I would point out is that qom-get can be considered an
>> > "escape hatch" to get information when no better QMP command exists.
>> > In this case, libvirt has made the assumption that every CPU feature
>> > is a QOM property.
>> >
>> > Adding qom-list-get doesn't appreciably change that, just makes the
>> > usage more efficient.
>> >
>> > Considering the bigger picture QMP design, when libvirt is trying to
>> > understand QEMU's CPU feature flag expansion, I would ask why we don't
>> > have something like a "query-cpu" command to tell us the current CPU
>> > expansion, avoiding the need for poking at QOM properties directly.
>> 
>> How do the existing query-cpu-FOO commands fall short of what management
>> applications such as libvirt need?
>
> It has been a long while since I looked at them, but IIRC they were
> returning static info about CPU models, whereas libvirt wanted info
> on the currently requested '-cpu ARGS'

Libvirt developers, please work with us on design of new commands or
improvements to existing ones to better meet libvirt's needs in this
area.




[PATCH v9 1/7] migration/multifd: move macros to multifd header

2025-04-11 Thread Prasad Pandit
From: Prasad Pandit 

Move MULTIFD_ macros to the header file so that
they are accessible from other source files.

Reviewed-by: Fabiano Rosas 
Signed-off-by: Prasad Pandit 
---
 migration/multifd.c | 5 -
 migration/multifd.h | 5 +
 2 files changed, 5 insertions(+), 5 deletions(-)

v8: no change
- 
https://lore.kernel.org/qemu-devel/20250318123846.1370312-1-ppan...@redhat.com/T/#t

diff --git a/migration/multifd.c b/migration/multifd.c
index dfb5189f0e..6139cabe44 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -36,11 +36,6 @@
 #include "io/channel-socket.h"
 #include "yank_functions.h"
 
-/* Multiple fd's */
-
-#define MULTIFD_MAGIC 0x11223344U
-#define MULTIFD_VERSION 1
-
 typedef struct {
 uint32_t magic;
 uint32_t version;
diff --git a/migration/multifd.h b/migration/multifd.h
index 2d337e7b3b..9b6d81e7ed 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -49,6 +49,11 @@ bool multifd_queue_page(RAMBlock *block, ram_addr_t offset);
 bool multifd_recv(void);
 MultiFDRecvData *multifd_get_recv_data(void);
 
+/* Multiple fd's */
+
+#define MULTIFD_MAGIC 0x11223344U
+#define MULTIFD_VERSION 1
+
 /* Multifd Compression flags */
 #define MULTIFD_FLAG_SYNC (1 << 0)
 
-- 
2.49.0




[PATCH v9 0/7] Allow to enable multifd and postcopy migration together

2025-04-11 Thread Prasad Pandit
From: Prasad Pandit 

 Hello,


* This series (v9) does minor refactoring and reordering changes as
  suggested in the review of earlier series (v8). Also tried to
  reproduce/debug a qtest hang issue, but it could not be reproduced.
  From the shared stack traces it looked like Postcopy thread was
  preparing to finish before migrating all the pages.
===
67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test  OK  170.50s  81 subtests passed
===


v8: 
https://lore.kernel.org/qemu-devel/20250318123846.1370312-1-ppan...@redhat.com/T/#t
* This series (v8) splits earlier patch-2 which enabled multifd and
  postcopy options together into two separate patches. One modifies
  the channel discovery in migration_ioc_process_incoming() function,
  and second one enables the multifd and postcopy migration together.

  It also adds the 'save_postcopy_prepare' savevm_state handler to
  enable different sections to take an action just before the Postcopy
  phase starts. Thank you Peter for these patches.
===
67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test  OK  152.66s  81 subtests passed
===


v7: 
https://lore.kernel.org/qemu-devel/20250228121749.553184-1-ppan...@redhat.com/T/#t
* This series (v7) adds 'MULTIFD_RECV_SYNC' migration command. It is used
  to notify the destination migration thread to synchronise with the Multifd
  threads. This allows Multifd ('mig/dst/recv_x') threads on the destination
  to receive all their data before they are shut down.

  This series also updates the channel discovery function and qtests as
  suggested in the previous review comments.
===
67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test  OK  147.84s  81 subtests passed
===


v6: 
https://lore.kernel.org/qemu-devel/20250215123119.814345-1-ppan...@redhat.com/T/#t
* This series (v6) shuts down Multifd threads before starting Postcopy
  migration. It helps to avoid an issue of multifd pages arriving late
  at the destination during Postcopy phase and corrupting the vCPU
  state. It also reorders the qtest patches and does some refactoring
  changes as suggested in previous review.
===
67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test  OK  161.35s  73 subtests passed
===


v5: 
https://lore.kernel.org/qemu-devel/20250205122712.229151-1-ppan...@redhat.com/T/#t
* This series (v5) consolidates migration capabilities setting in one
  'set_migration_capabilities()' function, thus simplifying test sources.
  It passes all migration tests.
===
66/66 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test  OK  143.66s  71 subtests passed
===


v4: 
https://lore.kernel.org/qemu-devel/20250127120823.144949-1-ppan...@redhat.com/T/#t
* This series (v4) adds more 'multifd+postcopy' qtests which test
  Precopy migration with the 'postcopy-ram' attribute set, and run
  Postcopy migrations with 'multifd' channels enabled.
===
$ ../qtest/migration-test --tap -k -r '/x86_64/migration/multifd+postcopy' | grep -i 'slow test'
# slow test /x86_64/migration/multifd+postcopy/plain executed in 1.29 secs
# slow test /x86_64/migration/multifd+postcopy/recovery/tls/psk executed in 2.48 secs
# slow test /x86_64/migration/multifd+postcopy/preempt/plain executed in 1.49 secs
# slow test /x86_64/migration/multifd+postcopy/preempt/recovery/tls/psk executed in 2.52 secs
# slow test /x86_64/migration/multifd+postcopy/tcp/tls/psk/match executed in 3.62 secs
# slow test /x86_64/migration/multifd+postcopy/tcp/plain/zstd executed in 1.34 secs
# slow test /x86_64/migration/multifd+postcopy/tcp/plain/cancel executed in 2.24 secs
...
66/66 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test  OK  148.41s  71 subtests passed
===


v3: 
https://lore.kernel.org/qemu-devel/20250121131032.1611245-1-ppan...@redhat.com/T/#t
* This series (v3) passes all existing 'tests/qtest/migration/*' tests
  and adds a new one to enable multifd channels with postcopy migration.


v2: 
https://lore.kernel.org/qemu-devel/20241129122256.96778-1-ppan...@redhat.com/T/#u
* This series (v2) further refactors the 'ram_save_target_page'
  function to make it independent of the multifd & postcopy change.


v1: 
https://lore.kernel.org/qemu-devel/20241126115748.118683-1-ppan...@redhat.com/T/#u
* This series removes magic value (4-bytes) introduced in the
  previous series for the Postcopy channel.


v0: 
https://lore.kernel.org/qemu-devel/20241029150908.1136894-1-ppan...@redhat.com/T/#u
* Currently Multifd and Postcopy migration can not be used together.
  QEMU shows "Postcopy is not yet compatible with multifd" message.

  When migrating guests with large (100's GB) RAM, Multifd threads
  help to accelerate migration, but inability to use it with the
  Postcopy mode delays guest start up on the destination side.

* This patch series allows to enable both Multifd and Postcopy
  migration together. Preco

[PATCH v9 6/7] tests/qtest/migration: consolidate set capabilities

2025-04-11 Thread Prasad Pandit
From: Prasad Pandit 

Migration capabilities are set in multiple '.start_hook'
functions for various tests. Instead, consolidate setting
capabilities in 'migrate_start_set_capabilities()' function
which is called from the 'migrate_start()' function.
While simplifying the capabilities setting, it helps
to declutter the qtest sources.
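
For illustration only (not part of the patch): a minimal, self-contained
sketch of the pattern described above, where each test declares the
capabilities it wants in a table and one helper applies them. All names
here (StartArgs, SetCapFn, start_set_capabilities, count_cap and the CAP_*
enum) are hypothetical stand-ins, not the identifiers used in
tests/qtest/migration/framework.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMU's MIGRATION_CAPABILITY_* values. */
enum {
    CAP_XBZRLE,
    CAP_MULTIFD,
    CAP_POSTCOPY_RAM,
    CAP_COUNT,
};

static const char *const cap_names[CAP_COUNT] = {
    [CAP_XBZRLE]       = "xbzrle",
    [CAP_MULTIFD]      = "multifd",
    [CAP_POSTCOPY_RAM] = "postcopy-ram",
};

/* Per-test start arguments; entries not named by a test stay false. */
typedef struct {
    bool caps[CAP_COUNT];
} StartArgs;

/* Hypothetical callback standing in for a "set capability" helper. */
typedef void (*SetCapFn)(const char *name, bool value, void *opaque);

/* Apply every requested capability in one place; returns how many were set. */
static size_t start_set_capabilities(const StartArgs *args,
                                     SetCapFn set_cap, void *opaque)
{
    size_t applied = 0;

    assert(args && set_cap);
    for (int i = 0; i < CAP_COUNT; i++) {
        if (args->caps[i]) {
            set_cap(cap_names[i], true, opaque);
            applied++;
        }
    }
    return applied;
}

/* Example callback: counts invocations through an int passed as opaque. */
static void count_cap(const char *name, bool value, void *opaque)
{
    (void)name;
    (void)value;
    (*(int *)opaque)++;
}
```

A test would then only fill in the table, e.g.
StartArgs args = { .caps[CAP_MULTIFD] = true }; and leave the actual
setting to the common helper, which is the decluttering the commit
message refers to.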

Suggested-by: Fabiano Rosas 
Signed-off-by: Prasad Pandit 
---
 tests/qtest/migration/compression-tests.c | 22 +--
 tests/qtest/migration/cpr-tests.c |  6 +-
 tests/qtest/migration/file-tests.c| 58 -
 tests/qtest/migration/framework.c | 76 ---
 tests/qtest/migration/framework.h |  9 ++-
 tests/qtest/migration/misc-tests.c|  4 +-
 tests/qtest/migration/postcopy-tests.c|  8 ++-
 tests/qtest/migration/precopy-tests.c | 29 +
 tests/qtest/migration/tls-tests.c | 23 ++-
 9 files changed, 151 insertions(+), 84 deletions(-)

v8: no change
- 
https://lore.kernel.org/qemu-devel/20250318123846.1370312-1-ppan...@redhat.com/T/#t

diff --git a/tests/qtest/migration/compression-tests.c 
b/tests/qtest/migration/compression-tests.c
index 8b58401b84..41e79f031b 100644
--- a/tests/qtest/migration/compression-tests.c
+++ b/tests/qtest/migration/compression-tests.c
@@ -35,6 +35,9 @@ static void test_multifd_tcp_zstd(void)
 {
 MigrateCommon args = {
 .listen_uri = "defer",
+.start = {
+.caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+},
 .start_hook = migrate_hook_start_precopy_tcp_multifd_zstd,
 };
 test_precopy_common(&args);
@@ -56,6 +59,9 @@ static void test_multifd_tcp_qatzip(void)
 {
 MigrateCommon args = {
 .listen_uri = "defer",
+.start = {
+.caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+},
 .start_hook = migrate_hook_start_precopy_tcp_multifd_qatzip,
 };
 test_precopy_common(&args);
@@ -74,6 +80,9 @@ static void test_multifd_tcp_qpl(void)
 {
 MigrateCommon args = {
 .listen_uri = "defer",
+.start = {
+.caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+},
 .start_hook = migrate_hook_start_precopy_tcp_multifd_qpl,
 };
 test_precopy_common(&args);
@@ -92,6 +101,9 @@ static void test_multifd_tcp_uadk(void)
 {
 MigrateCommon args = {
 .listen_uri = "defer",
+.start = {
+.caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+},
 .start_hook = migrate_hook_start_precopy_tcp_multifd_uadk,
 };
 test_precopy_common(&args);
@@ -103,10 +115,6 @@ migrate_hook_start_xbzrle(QTestState *from,
   QTestState *to)
 {
 migrate_set_parameter_int(from, "xbzrle-cache-size", 33554432);
-
-migrate_set_capability(from, "xbzrle", true);
-migrate_set_capability(to, "xbzrle", true);
-
 return NULL;
 }
 
@@ -118,6 +126,9 @@ static void test_precopy_unix_xbzrle(void)
 .listen_uri = uri,
 .start_hook = migrate_hook_start_xbzrle,
 .iterations = 2,
+.start = {
+.caps[MIGRATION_CAPABILITY_XBZRLE] = true,
+},
 /*
  * XBZRLE needs pages to be modified when doing the 2nd+ round
  * iteration to have real data pushed to the stream.
@@ -146,6 +157,9 @@ static void test_multifd_tcp_zlib(void)
 {
 MigrateCommon args = {
 .listen_uri = "defer",
+.start = {
+.caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+},
 .start_hook = migrate_hook_start_precopy_tcp_multifd_zlib,
 };
 test_precopy_common(&args);
diff --git a/tests/qtest/migration/cpr-tests.c 
b/tests/qtest/migration/cpr-tests.c
index 4758841824..5536e14610 100644
--- a/tests/qtest/migration/cpr-tests.c
+++ b/tests/qtest/migration/cpr-tests.c
@@ -24,9 +24,6 @@ static void *migrate_hook_start_mode_reboot(QTestState *from, 
QTestState *to)
 migrate_set_parameter_str(from, "mode", "cpr-reboot");
 migrate_set_parameter_str(to, "mode", "cpr-reboot");
 
-migrate_set_capability(from, "x-ignore-shared", true);
-migrate_set_capability(to, "x-ignore-shared", true);
-
 return NULL;
 }
 
@@ -39,6 +36,9 @@ static void test_mode_reboot(void)
 .connect_uri = uri,
 .listen_uri = "defer",
 .start_hook = migrate_hook_start_mode_reboot,
+.start = {
+.caps[MIGRATION_CAPABILITY_X_IGNORE_SHARED] = true,
+},
 };
 
 test_file_common(&args, true);
diff --git a/tests/qtest/migration/file-tests.c 
b/tests/qtest/migration/file-tests.c
index f260e2871d..4d78ce0855 100644
--- a/tests/qtest/migration/file-tests.c
+++ b/tests/qtest/migration/file-tests.c
@@ -107,15 +107,6 @@ static void test_precopy_file_offset_bad(void)
 test_file_common(&args, false);
 }
 
-static void *migrate_hook_start_mapped_ram(QTestState *from,
-   QTestState *to)
-{
-migrate_set_capability(from, "mapped-ram", true);
-migrate_set_capabil

[PATCH v9 2/7] migration: refactor channel discovery mechanism

2025-04-11 Thread Prasad Pandit
From: Prasad Pandit 

The various logical migration channels don't have a
standardized way of advertising themselves and their
connections may be seen out of order by the migration
destination. When a new connection arrives, the incoming
migration currently makes use of heuristics to determine
which channel it belongs to.

The next few patches will need to change how the multifd
and postcopy capabilities interact and that affects the
channel discovery heuristic.

Refactor the channel discovery heuristic to make it less
opaque and simplify the subsequent patches.
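
For illustration only (not part of the patch): a rough, self-contained
sketch of what a magic-based channel classification could look like. The
CH_MAIN/CH_MULTIFD/CH_POSTCOPY names and MULTIFD_MAGIC come from this
series; CH_UNKNOWN, classify_channel(), be32_decode(), the
QEMU_VM_FILE_MAGIC value used here and the "cannot peek means main
channel" fallback are assumptions made for the sketch, and the real
function handles more cases than shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* First three values mirror the enum in the patch; CH_UNKNOWN is added here. */
enum { CH_MAIN, CH_MULTIFD, CH_POSTCOPY, CH_UNKNOWN };

#define MULTIFD_MAGIC      0x11223344U  /* from migration/multifd.h */
#define QEMU_VM_FILE_MAGIC 0x5145564dU  /* assumed value of the main stream magic */

/* Decode the four peeked bytes, which are sent in big-endian order. */
static uint32_t be32_decode(const uint8_t b[4])
{
    assert(b);
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/*
 * established: main and all multifd channels are already set up.
 * can_peek:    the channel supports peeking at the first bytes.
 * magic:       peeked magic in host byte order (ignored if !can_peek).
 */
static int classify_channel(bool established, bool can_peek, uint32_t magic)
{
    if (established) {
        /* Anything arriving after main + multifd is treated as postcopy. */
        return CH_POSTCOPY;
    }
    if (!can_peek) {
        /* Simplifying assumption: without peeking, assume the main channel. */
        return CH_MAIN;
    }
    if (magic == QEMU_VM_FILE_MAGIC) {
        return CH_MAIN;
    }
    if (magic == MULTIFD_MAGIC) {
        return CH_MULTIFD;
    }
    return CH_UNKNOWN;
}
```

The point of the sketch is that the decision becomes an explicit channel
type first, and the per-type setup follows from it, rather than being
spread over several boolean checks.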

Signed-off-by: Prasad Pandit 
---
 migration/migration.c | 132 +++---
 1 file changed, 71 insertions(+), 61 deletions(-)

v8: remove else if (!mis->from_src_file) checks and add assert(3) check
- 
https://lore.kernel.org/qemu-devel/20250318123846.1370312-1-ppan...@redhat.com/T/#t

diff --git a/migration/migration.c b/migration/migration.c
index d46e776e24..64f4f40ae3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -95,6 +95,9 @@ enum mig_rp_message_type {
 MIG_RP_MSG_MAX
 };
 
+/* Migration channel types */
+enum { CH_MAIN, CH_MULTIFD, CH_POSTCOPY };
+
 /* When we add fault tolerance, we could have several
migrations at once.  For now we don't need to add
dynamic creation of migration */
@@ -931,9 +934,8 @@ static void migration_incoming_setup(QEMUFile *f)
 {
 MigrationIncomingState *mis = migration_incoming_get_current();
 
-if (!mis->from_src_file) {
-mis->from_src_file = f;
-}
+assert(!mis->from_src_file);
+mis->from_src_file = f;
 qemu_file_set_blocking(f, false);
 }
 
@@ -985,28 +987,19 @@ void migration_fd_process_incoming(QEMUFile *f)
 migration_incoming_process();
 }
 
-/*
- * Returns true when we want to start a new incoming migration process,
- * false otherwise.
- */
-static bool migration_should_start_incoming(bool main_channel)
+static bool migration_has_main_and_multifd_channels(void)
 {
-/* Multifd doesn't start unless all channels are established */
-if (migrate_multifd()) {
-return migration_has_all_channels();
+MigrationIncomingState *mis = migration_incoming_get_current();
+if (!mis->from_src_file) {
+/* main channel not established */
+return false;
 }
 
-/* Preempt channel only starts when the main channel is created */
-if (migrate_postcopy_preempt()) {
-return main_channel;
+if (migrate_multifd() && !multifd_recv_all_channels_created()) {
+return false;
 }
 
-/*
- * For all the rest types of migration, we should only reach here when
- * it's the main channel that's being created, and we should always
- * proceed with this channel.
- */
-assert(main_channel);
+/* main and all multifd channels are established */
 return true;
 }
 
@@ -1015,59 +1008,81 @@ void migration_ioc_process_incoming(QIOChannel *ioc, 
Error **errp)
 MigrationIncomingState *mis = migration_incoming_get_current();
 Error *local_err = NULL;
 QEMUFile *f;
-bool default_channel = true;
+uint8_t channel;
 uint32_t channel_magic = 0;
 int ret = 0;
 
-if (migrate_multifd() && !migrate_mapped_ram() &&
-!migrate_postcopy_ram() &&
-qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) {
-/*
- * With multiple channels, it is possible that we receive channels
- * out of order on destination side, causing incorrect mapping of
- * source channels on destination side. Check channel MAGIC to
- * decide type of channel. Please note this is best effort, postcopy
- * preempt channel does not send any magic number so avoid it for
- * postcopy live migration. Also tls live migration already does
- * tls handshake while initializing main channel so with tls this
- * issue is not possible.
- */
-ret = migration_channel_read_peek(ioc, (void *)&channel_magic,
-  sizeof(channel_magic), errp);
+if (!migration_has_main_and_multifd_channels()) {
+if (qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) {
+/*
+ * With multiple channels, it is possible that we receive channels
+ * out of order on destination side, causing incorrect mapping of
+ * source channels on destination side. Check channel MAGIC to
+ * decide type of channel. Please note this is best effort,
+ * postcopy preempt channel does not send any magic number so
+ * avoid it for postcopy live migration. Also tls live migration
+ * already does tls handshake while initializing main channel so
+ * with tls this issue is not possible.
+ */
+ret = migration_channel_read_peek(ioc, (void *)&channel_magic,
+  sizeof(channel_magic), errp);
+   

[PATCH v9 5/7] migration: enable multifd and postcopy together

2025-04-11 Thread Prasad Pandit
From: Prasad Pandit 

Enable Multifd and Postcopy migration together.
The migration_ioc_process_incoming() routine checks the
magic value sent on each channel and helps to properly
set up the multifd and postcopy channels.

The Precopy and Multifd threads work during the initial
guest RAM transfer. When migration moves to the Postcopy
phase, the multifd threads cease to send data on multifd
channels and Postcopy threads on the destination
request/pull data from the source side.
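
For illustration only (not part of the patch): a tiny, self-contained
sketch of the phase gating described above, i.e. multifd is used only
while migration is not yet in the Postcopy phase. Phase, SendPath,
choose_send_path() and multifd_sync_needed() are hypothetical names for
this sketch; in the patch the equivalent checks live in
ram_save_target_page() and multifd_ram_flush_and_sync().

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { PHASE_PRECOPY, PHASE_POSTCOPY } Phase;
typedef enum { PATH_MULTIFD, PATH_MAIN } SendPath;

/* Pages go over multifd channels only while still in the precopy phase. */
static SendPath choose_send_path(bool multifd_enabled, Phase phase)
{
    assert(phase == PHASE_PRECOPY || phase == PHASE_POSTCOPY);
    if (multifd_enabled && phase != PHASE_POSTCOPY) {
        return PATH_MULTIFD;
    }
    return PATH_MAIN;
}

/* A multifd flush/sync is a no-op once postcopy has started. */
static bool multifd_sync_needed(bool multifd_enabled, Phase phase)
{
    assert(phase == PHASE_PRECOPY || phase == PHASE_POSTCOPY);
    return multifd_enabled && phase != PHASE_POSTCOPY;
}
```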

Signed-off-by: Prasad Pandit 
---
 migration/multifd-nocomp.c | 3 ++-
 migration/multifd.c| 7 +++
 migration/options.c| 5 -
 migration/ram.c| 5 ++---
 4 files changed, 11 insertions(+), 9 deletions(-)

v8: remove a !migration_in_postcopy() check
- 
https://lore.kernel.org/qemu-devel/20250318123846.1370312-1-ppan...@redhat.com/T/#t

diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
index ffe75256c9..02f8bf8ce8 100644
--- a/migration/multifd-nocomp.c
+++ b/migration/multifd-nocomp.c
@@ -17,6 +17,7 @@
 #include "migration-stats.h"
 #include "multifd.h"
 #include "options.h"
+#include "migration.h"
 #include "qapi/error.h"
 #include "qemu/cutils.h"
 #include "qemu/error-report.h"
@@ -399,7 +400,7 @@ int multifd_ram_flush_and_sync(QEMUFile *f)
 MultiFDSyncReq req;
 int ret;
 
-if (!migrate_multifd()) {
+if (!migrate_multifd() || migration_in_postcopy()) {
 return 0;
 }
 
diff --git a/migration/multifd.c b/migration/multifd.c
index 6139cabe44..074d16d07d 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1379,6 +1379,13 @@ static void *multifd_recv_thread(void *opaque)
 }
 
 if (has_data) {
+/*
+ * multifd thread should not be active and receive data
+ * when migration is in the Postcopy phase. Two threads
+ * writing the same memory area could easily corrupt
+ * the guest state.
+ */
+assert(!migration_in_postcopy());
 if (is_device_state) {
 assert(use_packets);
 ret = multifd_device_state_recv(p, &local_err);
diff --git a/migration/options.c b/migration/options.c
index b0ac2ea408..48aa6076de 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -491,11 +491,6 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, 
Error **errp)
 error_setg(errp, "Postcopy is not compatible with ignore-shared");
 return false;
 }
-
-if (new_caps[MIGRATION_CAPABILITY_MULTIFD]) {
-error_setg(errp, "Postcopy is not yet compatible with multifd");
-return false;
-}
 }
 
 if (new_caps[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT]) {
diff --git a/migration/ram.c b/migration/ram.c
index 753042456e..533c64b941 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1976,9 +1976,8 @@ static int ram_save_target_page(RAMState *rs, 
PageSearchStatus *pss)
 }
 }
 
-if (migrate_multifd()) {
-RAMBlock *block = pss->block;
-return ram_save_multifd_page(block, offset);
+if (migrate_multifd() && !migration_in_postcopy()) {
+return ram_save_multifd_page(pss->block, offset);
 }
 
 return ram_save_page(rs, pss);
-- 
2.49.0




[PATCH v9 7/7] tests/qtest/migration: add postcopy tests with multifd

2025-04-11 Thread Prasad Pandit
From: Prasad Pandit 

Add new qtests to run postcopy migration with multifd
channels enabled.

Signed-off-by: Prasad Pandit 
---
 tests/qtest/migration/compression-tests.c | 16 
 tests/qtest/migration/postcopy-tests.c| 27 +
 tests/qtest/migration/precopy-tests.c | 19 +
 tests/qtest/migration/tls-tests.c | 47 +++
 4 files changed, 109 insertions(+)

v8: no change
- 
https://lore.kernel.org/qemu-devel/20250318123846.1370312-1-ppan...@redhat.com/T/#t

diff --git a/tests/qtest/migration/compression-tests.c 
b/tests/qtest/migration/compression-tests.c
index 41e79f031b..a788a8d4a7 100644
--- a/tests/qtest/migration/compression-tests.c
+++ b/tests/qtest/migration/compression-tests.c
@@ -42,6 +42,20 @@ static void test_multifd_tcp_zstd(void)
 };
 test_precopy_common(&args);
 }
+
+static void test_multifd_postcopy_tcp_zstd(void)
+{
+MigrateCommon args = {
+.listen_uri = "defer",
+.start = {
+.caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+.caps[MIGRATION_CAPABILITY_POSTCOPY_RAM] = true,
+},
+.start_hook = migrate_hook_start_precopy_tcp_multifd_zstd,
+};
+
+test_precopy_common(&args);
+}
 #endif /* CONFIG_ZSTD */
 
 #ifdef CONFIG_QATZIP
@@ -184,6 +198,8 @@ void migration_test_add_compression(MigrationTestEnv *env)
 #ifdef CONFIG_ZSTD
 migration_test_add("/migration/multifd/tcp/plain/zstd",
test_multifd_tcp_zstd);
+migration_test_add("/migration/multifd+postcopy/tcp/plain/zstd",
+   test_multifd_postcopy_tcp_zstd);
 #endif
 
 #ifdef CONFIG_QATZIP
diff --git a/tests/qtest/migration/postcopy-tests.c 
b/tests/qtest/migration/postcopy-tests.c
index 483e3ff99f..eb637f94f7 100644
--- a/tests/qtest/migration/postcopy-tests.c
+++ b/tests/qtest/migration/postcopy-tests.c
@@ -94,6 +94,29 @@ static void 
migration_test_add_postcopy_smoke(MigrationTestEnv *env)
 }
 }
 
+static void test_multifd_postcopy(void)
+{
+MigrateCommon args = {
+.start = {
+.caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+},
+};
+
+test_postcopy_common(&args);
+}
+
+static void test_multifd_postcopy_preempt(void)
+{
+MigrateCommon args = {
+.start = {
+.caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+.caps[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT] = true,
+},
+};
+
+test_postcopy_common(&args);
+}
+
 void migration_test_add_postcopy(MigrationTestEnv *env)
 {
 migration_test_add_postcopy_smoke(env);
@@ -114,6 +137,10 @@ void migration_test_add_postcopy(MigrationTestEnv *env)
 "/migration/postcopy/recovery/double-failures/reconnect",
 test_postcopy_recovery_fail_reconnect);
 
+migration_test_add("/migration/postcopy/multifd/plain",
+   test_multifd_postcopy);
+migration_test_add("/migration/postcopy/multifd/preempt/plain",
+   test_multifd_postcopy_preempt);
 if (env->is_x86) {
 migration_test_add("/migration/postcopy/suspend",
test_postcopy_suspend);
diff --git a/tests/qtest/migration/precopy-tests.c 
b/tests/qtest/migration/precopy-tests.c
index f8404793b8..b2b0db8076 100644
--- a/tests/qtest/migration/precopy-tests.c
+++ b/tests/qtest/migration/precopy-tests.c
@@ -34,6 +34,7 @@
 #define DIRTYLIMIT_TOLERANCE_RANGE  25  /* MB/s */
 
 static char *tmpfs;
+static bool postcopy_ram = false;
 
 static void test_precopy_unix_plain(void)
 {
@@ -476,6 +477,11 @@ static void test_multifd_tcp_cancel(void)
 migrate_ensure_non_converge(from);
 migrate_prepare_for_dirty_mem(from);
 
+if (postcopy_ram) {
+migrate_set_capability(from, "postcopy-ram", true);
+migrate_set_capability(to, "postcopy-ram", true);
+}
+
 migrate_set_parameter_int(from, "multifd-channels", 16);
 migrate_set_parameter_int(to, "multifd-channels", 16);
 
@@ -517,6 +523,10 @@ static void test_multifd_tcp_cancel(void)
 return;
 }
 
+if (postcopy_ram) {
+migrate_set_capability(to2, "postcopy-ram", true);
+}
+
 migrate_set_parameter_int(to2, "multifd-channels", 16);
 
 migrate_set_capability(to2, "multifd", true);
@@ -540,6 +550,13 @@ static void test_multifd_tcp_cancel(void)
 migrate_end(from, to2, true);
 }
 
+static void test_multifd_postcopy_tcp_cancel(void)
+{
+postcopy_ram = true;
+test_multifd_tcp_cancel();
+postcopy_ram = false;
+}
+
 static void test_cancel_src_after_failed(QTestState *from, QTestState *to,
  const char *uri, const char *phase)
 {
@@ -1127,6 +1144,8 @@ static void 
migration_test_add_precopy_smoke(MigrationTestEnv *env)
test_multifd_tcp_uri_none);
 migration_test_add("/migration/multifd/tcp/plain/cancel",
test_multifd_tcp_cancel);
+migration_test_add("/migration/multifd+postcopy/tc

[PATCH v9 3/7] migration: Add save_postcopy_prepare() savevm handler

2025-04-11 Thread Prasad Pandit
From: Peter Xu 

Add a savevm handler for a module to opt in to sending extra sections right
before postcopy starts, and before the VM is stopped.

RAM will start to use this new savevm handler in the next patch to do flush
and sync for multifd pages.

Note that we choose to do it before the VM is stopped because the only
current potential user is not sensitive to VM status, so doing it before
the VM is stopped is preferred, to avoid enlarging the postcopy downtime.

It is still a bit unfortunate that we need to introduce such a new savevm
handler just for this single use case; however, it's so far the cleanest
approach.
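
For illustration only (not part of the patch): a self-contained sketch of
the "optional hook" pattern this handler follows. Entries without the
hook, or whose is_active() says no, are skipped, and the first failure
stops the walk with an error set. Ops, Entry, run_postcopy_prepare(),
hook_ok() and hook_fail() are hypothetical names; the sketch uses a plain
array and a string error instead of QEMU's QTAILQ and Error types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool (*is_active)(void *opaque);                           /* optional */
    bool (*postcopy_prepare)(void *opaque, const char **err);  /* optional */
} Ops;

typedef struct {
    const char *name;
    const Ops *ops;
    void *opaque;
} Entry;

/* Call the hook on every entry that opted in; stop at the first failure. */
static bool run_postcopy_prepare(const Entry *entries, size_t n,
                                 const char **err)
{
    for (size_t i = 0; i < n; i++) {
        const Entry *e = &entries[i];

        if (!e->ops || !e->ops->postcopy_prepare) {
            continue;               /* module did not opt in */
        }
        if (e->ops->is_active && !e->ops->is_active(e->opaque)) {
            continue;               /* module is currently inactive */
        }
        if (!e->ops->postcopy_prepare(e->opaque, err)) {
            assert(*err);           /* a failing hook must set the error */
            return false;
        }
    }
    return true;
}

/* Example hook that succeeds and counts calls via an int in opaque. */
static bool hook_ok(void *opaque, const char **err)
{
    (void)err;
    (*(int *)opaque)++;
    return true;
}

/* Example hook that fails and reports why. */
static bool hook_fail(void *opaque, const char **err)
{
    (void)opaque;
    *err = "prepare failed";
    return false;
}
```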

Signed-off-by: Peter Xu 
Signed-off-by: Prasad Pandit 
---
 include/migration/register.h | 15 +++
 migration/migration.c|  4 
 migration/savevm.c   | 33 +
 migration/savevm.h   |  1 +
 4 files changed, 53 insertions(+)

v8: reorder this patch before enabling multifd and postcopy together
- 
https://lore.kernel.org/qemu-devel/20250318123846.1370312-1-ppan...@redhat.com/T/#t

diff --git a/include/migration/register.h b/include/migration/register.h
index c041ce32f2..b79dc81b8d 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -189,6 +189,21 @@ typedef struct SaveVMHandlers {
 
 /* This runs outside the BQL!  */
 
+    /**
+     * @save_postcopy_prepare
+     *
+     * This hook will be invoked on the source side right before switching
+     * to postcopy (before VM stopped).
+     *
+     * @f:      QEMUFile where to send the data
+     * @opaque: Data pointer passed to register_savevm_live()
+     * @errp:   Error** used to report error message
+     *
+     * Returns: true if succeeded, false if error occurred.  When false is
+     * returned, @errp must be set.
+     */
+    bool (*save_postcopy_prepare)(QEMUFile *f, void *opaque, Error **errp);
+
 /**
  * @state_pending_estimate
  *
diff --git a/migration/migration.c b/migration/migration.c
index 64f4f40ae3..4bb29b7193 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2717,6 +2717,10 @@ static int postcopy_start(MigrationState *ms, Error 
**errp)
 }
 }
 
+if (!qemu_savevm_state_postcopy_prepare(ms->to_dst_file, errp)) {
+return -1;
+}
+
 trace_postcopy_start();
 bql_lock();
 trace_postcopy_start_set_run();
diff --git a/migration/savevm.c b/migration/savevm.c
index ce158c3512..23ef4c7dc9 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1523,6 +1523,39 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f)
 qemu_fflush(f);
 }
 
+bool qemu_savevm_state_postcopy_prepare(QEMUFile *f, Error **errp)
+{
+    SaveStateEntry *se;
+    bool ret;
+
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (!se->ops || !se->ops->save_postcopy_prepare) {
+            continue;
+        }
+
+        if (se->ops->is_active) {
+            if (!se->ops->is_active(se->opaque)) {
+                continue;
+            }
+        }
+
+        trace_savevm_section_start(se->idstr, se->section_id);
+
+        save_section_header(f, se, QEMU_VM_SECTION_PART);
+        ret = se->ops->save_postcopy_prepare(f, se->opaque, errp);
+        save_section_footer(f, se);
+
+        trace_savevm_section_end(se->idstr, se->section_id, ret);
+
+        if (!ret) {
+            assert(*errp);
+            return false;
+        }
+    }
+
+    return true;
+}
+
 int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
 {
 int64_t start_ts_each, end_ts_each;
diff --git a/migration/savevm.h b/migration/savevm.h
index 138c39a7f9..2d5e9c7166 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -45,6 +45,7 @@ void qemu_savevm_state_pending_exact(uint64_t *must_precopy,
 void qemu_savevm_state_pending_estimate(uint64_t *must_precopy,
 uint64_t *can_postcopy);
 int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy);
+bool qemu_savevm_state_postcopy_prepare(QEMUFile *f, Error **errp);
 void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
 void qemu_savevm_send_open_return_path(QEMUFile *f);
 int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len);
-- 
2.49.0




[PATCH v9 4/7] migration/ram: Implement save_postcopy_prepare()

2025-04-11 Thread Prasad Pandit
From: Peter Xu 

Implement save_postcopy_prepare(), preparing for the enablement
of both multifd and postcopy.

Signed-off-by: Peter Xu 
Signed-off-by: Prasad Pandit 
---
 migration/ram.c | 37 +
 1 file changed, 37 insertions(+)

v8: reorder patch and some typographical corrections.
- 
https://lore.kernel.org/qemu-devel/20250318123846.1370312-1-ppan...@redhat.com/T/#t

diff --git a/migration/ram.c b/migration/ram.c
index 424df6d9f1..753042456e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4420,6 +4420,42 @@ static int ram_resume_prepare(MigrationState *s, void 
*opaque)
 return 0;
 }
 
+static bool ram_save_postcopy_prepare(QEMUFile *f, void *opaque, Error **errp)
+{
+    int ret;
+
+    if (migrate_multifd()) {
+        /*
+         * When multifd is enabled, source QEMU needs to make sure all the
+         * pages queued before postcopy starts have been flushed.
+         *
+         * The load of these pages must happen before switching to postcopy.
+         * It's because loading of guest pages (so far) in multifd recv
+         * threads is still non-atomic, so the load cannot happen with vCPUs
+         * running on the destination side.
+         *
+         * This flush and sync will guarantee that those pages are loaded
+         * _before_ postcopy starts on the destination. The rationale is,
+         * this happens before VM stops (and before source QEMU sends all
+         * the rest of the postcopy messages).  So when the destination QEMU
+         * receives the postcopy messages, it must have received the sync
+         * message on the main channel (either RAM_SAVE_FLAG_MULTIFD_FLUSH,
+         * or RAM_SAVE_FLAG_EOS), and such message would guarantee that
+         * all previous guest pages queued in the multifd channels are
+         * completely loaded.
+         */
+        ret = multifd_ram_flush_and_sync(f);
+        if (ret < 0) {
+            error_setg(errp, "%s: multifd flush and sync failed", __func__);
+            return false;
+        }
+    }
+
+    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+
+    return true;
+}
+
 void postcopy_preempt_shutdown_file(MigrationState *s)
 {
 qemu_put_be64(s->postcopy_qemufile_src, RAM_SAVE_FLAG_EOS);
@@ -4439,6 +4475,7 @@ static SaveVMHandlers savevm_ram_handlers = {
 .load_setup = ram_load_setup,
 .load_cleanup = ram_load_cleanup,
 .resume_prepare = ram_resume_prepare,
+.save_postcopy_prepare = ram_save_postcopy_prepare,
 };
 
 static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
-- 
2.49.0




Re: Management applications and CPU feature flags

2025-04-11 Thread David Hildenbrand

On 11.04.25 13:43, Markus Armbruster wrote:

Daniel P. Berrangé  writes:


On Fri, Apr 11, 2025 at 12:40:46PM +0200, Markus Armbruster wrote:

Daniel P. Berrangé  writes:


On Wed, Apr 09, 2025 at 09:58:13AM +0200, Peter Krempa via Devel wrote:

On Wed, Apr 09, 2025 at 09:39:02 +0200, Markus Armbruster via Devel wrote:

Hi Steve, I apologize for the slow response.

Steve Sistare  writes:


Using qom-list and qom-get to get all the nodes and property values in a
QOM tree can take multiple seconds because it requires 1000's of individual
QOM requests.  Some managers fetch the entire tree or a large subset
of it when starting a new VM, and this cost is a substantial fraction of
start up time.


"Some managers"... could you name one?


libvirt is at ~500 qom-get calls during an average startup ...


To reduce this cost, consider QAPI calls that fetch more information in
each call:
   * qom-list-get: given a path, return a list of properties and values.
   * qom-list-getv: given a list of paths, return a list of properties and
 values for each path.
   * qom-tree-get: given a path, return all descendant nodes rooted at that
 path, with properties and values for each.


Libvirt developers, would you be interested in any of these?


YES!!!


Not necessarily, see below...



The getter with value could SO MUCH optimize the startup sequence of a
VM where libvirt needs to probe CPU flags:

(note the 'id' field in libvirt's monitor is sequential)

buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"realized"},"id":"libvirt-8"}
buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"hotplugged"},"id":"libvirt-9"}
buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"hotpluggable"},"id":"libvirt-10"}

[...]

buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"hv-apicv"},"id":"libvirt-470"}
buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"xd"},"id":"libvirt-471"}
buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"sse4_1"},"id":"libvirt-472"}
buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"unavailable-features"},"id":"libvirt-473"}

First and last line's timestamps:

2025-04-08 14:44:28.882+: 1481190: info : qemuMonitorIOWrite:340 : QEMU_MONITOR_IO_WRITE: mon=0x7f4678048360 
buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"realized"},"id":"libvirt-8"}

2025-04-08 14:44:29.149+: 1481190: info : qemuMonitorIOWrite:340 : QEMU_MONITOR_IO_WRITE: mon=0x7f4678048360 
buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"unavailable-features"},"id":"libvirt-473"}

Libvirt spent ~270 ms probing cpu flags.
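
To make the request count concrete, here is a minimal C sketch that formats the per-property qom-get request seen in the log and, for comparison, one batched request. The qom-get shape is copied from the "buf=" lines above. The argument layout used for qom-list-get is only an assumption based on the one-line description in the proposal, and all helper names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Format one per-property request, matching the "buf=" lines in the log. */
static int format_qom_get(char *out, size_t len, const char *path,
                          const char *property, unsigned id)
{
    return snprintf(out, len,
                    "{\"execute\":\"qom-get\",\"arguments\":{\"path\":\"%s\","
                    "\"property\":\"%s\"},\"id\":\"libvirt-%u\"}",
                    path, property, id);
}

/*
 * ASSUMPTION: the proposed qom-list-get takes a single "path" argument.
 * The real argument names are not given in this thread.
 */
static int format_qom_list_get(char *out, size_t len, const char *path,
                               unsigned id)
{
    return snprintf(out, len,
                    "{\"execute\":\"qom-list-get\",\"arguments\":"
                    "{\"path\":\"%s\"},\"id\":\"libvirt-%u\"}",
                    path, id);
}

/* Number of requests sent between two sequential monitor ids, inclusive. */
static unsigned requests_between(unsigned first_id, unsigned last_id)
{
    return last_id - first_id + 1;
}
```

With the ids from the log, requests_between(8, 473) gives 466 individual qom-get round trips for one device, which a single batched call would replace if the command worked as sketched.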


One thing I would point out is that qom-get can be considered an
"escape hatch" to get information when no better QMP command exists.
In this case, libvirt has made the assumption that every CPU feature
is a QOM property.

Adding qom-list-get doesn't appreciably change that, just makes the
usage more efficient.

Considering the bigger picture QMP design, when libvirt is trying to
understand QEMU's CPU feature flag expansion, I would ask why we don't
have something like a "query-cpu" command to tell us the current CPU
expansion, avoiding the need for poking at QOM properties directly.


How do the existing query-cpu-FOO fall short of what management
applications such as libvirt need?


It has been a long while since I looked at them, but IIRC they were
returning static info about CPU models, whereas libvirt wanted info
on the currently requested '-cpu ARGS'.


Not sure what the exact requirements are on other archs, but at least on 
s390x I think that's exactly what we do.


If you expand a non-static model (e.g., z14) you'd get the expansion as 
if you would specify "-cpu z14" on the cmdline for a specific QEMU machine.


Looking at CPU properties is really a nasty hack.



Libvirt developers, please work with us on design of new commands or
improvements to existing ones to better meet libvirt's needs in this
area.


Yes, knowing about requirements and why the existing APIs don't work 
would be great.


--
Cheers,

David / dhildenb




Re: Issue with stoptrigger.c Plugin in QEMU Emulation

2025-04-11 Thread Alex Bennée
Saanjh Sengupta  writes:

> Hi,
>
> I am writing to seek assistance with an issue I am experiencing while using 
> the stoptrigger.c plugin in QEMU emulation. I am
> currently utilising the latest QEMU version, 9.2.92, and attempting to 
> emulate the Debian 11 as the operating system. 
>
> The command I am using to emulate QEMU is as follows:
> ./build/qemu-system-x86_64 -m 2048M -smp 2 -boot c -nographic -serial 
> mon:stdio -nic
> tap,ifname=tap0,script=no,downscript=no  -hda debian11.qcow2 -icount shift=0 
> -plugin .
> /build/contrib/plugins/libstoptrigger.so,icount=90 -d plugin -qmp 
> tcp:localhost:,server,wait=off
>
> However, when I attempt to use the -icount shift=0 option, the plugin fails 
> with the error "Bad icount read". I have
> attached a screenshot of the error for your reference.

icount and libstoptrigger are independent of each other. You do not need
to enable icount to use libstoptrigger.

>
> error.png
>  
>
> When I remove the -plugin argument from the command the OS boots up 
> perfectly, as expected. Command utilised in that
> context was somewhat like ./build/qemu-system-x86_64 -m 2048M -smp 2 -boot c 
> -nographic -serial mon:stdio -nic
> tap,ifname=tap0,script=no,downscript=no  -hda debian11.qcow2 -icount shift=0 
> -qmp
> tcp:localhost:,server,wait=off
>
> I would greatly appreciate it if you could provide guidance on resolving this 
> issue. Specifically, I would like to know the cause
> of the error and any potential solutions or workarounds that could be 
> implemented to successfully use the stoptrigger.c
> plugin with the -icount shift=0 option.

It's likely that the instrumentation libstoptrigger adds has changed the size
of some of the translation blocks, leading to the error being triggered.
To know exactly what is going wrong we would need to see a backtrace of
the failure. The case:

    if (!cpu->neg.can_do_io) {
        error_report("Bad icount read");
        exit(1);
    }

is basically saying you are trying to read icount at a point where it's not a
known precise value. Any attempt to do a device access should trigger a
TB recompile so the device access is on the last translated instruction
of the block. However, if a TCG helper queries time and it's not the last
instruction in a block, that would trigger it.
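
The rule can be modelled with a toy sketch. This is not QEMU code: ToyCPU, toy_exec_insn and toy_icount_read are hypothetical names, and the per-instruction counting is a simplification. The model only captures the idea that an icount read is considered precise while I/O is allowed, which in this model is true only for the last instruction of a block.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for the CPU state. */
typedef struct ToyCPU {
    uint64_t icount;   /* instructions executed so far */
    bool can_do_io;    /* true only while the last insn of a block runs */
} ToyCPU;

/* Execute instruction 'idx' of a block that holds 'len' instructions. */
static void toy_exec_insn(ToyCPU *cpu, unsigned idx, unsigned len)
{
    cpu->can_do_io = (idx == len - 1);
    cpu->icount++;
}

/*
 * Return 0 and store the count if it is precise, or -1 in the situation
 * that the real code reports as "Bad icount read".
 */
static int toy_icount_read(const ToyCPU *cpu, uint64_t *out)
{
    if (!cpu->can_do_io) {
        return -1;
    }
    *out = cpu->icount;
    return 0;
}
```

In this model a read issued while instruction 0 of a three-instruction block runs is rejected, while the same read issued from the last instruction succeeds.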


>
> Regards
>
> Saanjh Sengupta

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH] migration: add FEATURE_SEEKABLE to QIOChannelBlock

2025-04-11 Thread Fabiano Rosas
"Marco Cavenati"  writes:

> On Thursday, April 10, 2025 21:52 CEST, Fabiano Rosas  wrote:
>
>> We'll need to add the infrastructure to reject multifd and direct-io
>> before this. The rest of the capabilities should not affect mapped-ram,
>> so it's fine (for now) if we don't honor them.
>
> Ok, thanks for the update.
>  
>> What about zero page handling? Mapped-ram doesn't send zero pages
>> because the file will always have zeroes in it and the migration
>> destination is guaranteed to not have been running previously. I believe
>> loading a snapshot in a VM that's already been running would leave stale
>> data in the guest's memory.
>
> Yes, you are correct.
>
> About the `RAMBlock->file_bmap`, according to the code it is a:
> `/* bitmap of pages present in the migration file */`
> And, if a page is a zero page, it won't be in the migration file:
> `/* zero pages are not transferred with mapped-ram */`
> So, zero page implies bitmap 0.
> Does the opposite hold?
>

It does. Mapped-ram takes up (sparse) disk space equal to the guest's
ram size.

> If bitmap 0 implies zero page, we could call `ram_handle_zero`
> in `read_ramblock_mapped_ram` for the clear bits.
> Or do you fear this might be unnecessarily expensive for migration?
>

Yes, unfortunately the performance difference is noticeable. But we could
have a slightly different algorithm for savevm. At this point it might
be easier to just duplicate read_ramblock_mapped_ram(), check for savevm
in there and see what the resulting code looks like.
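
A minimal sketch of the idea, assuming that a clear bit in the bitmap always means "zero page". This is a standalone toy, not the QEMU implementation: toy_test_bit and toy_zero_absent_pages are hypothetical names, the page size is fixed at 4 KiB for illustration, and a plain memset() stands in for whatever ram_handle_zero() really does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u

/* Return 1 if bit 'nr' is set in 'bitmap', 0 otherwise. */
static int toy_test_bit(const unsigned long *bitmap, size_t nr)
{
    const size_t bits = sizeof(unsigned long) * 8;

    return (int)((bitmap[nr / bits] >> (nr % bits)) & 1ul);
}

/*
 * Zero every page whose bit is clear in 'file_bmap', i.e. every page that
 * is not present in the migration file. Returns the number of pages zeroed.
 */
static size_t toy_zero_absent_pages(uint8_t *ram, size_t num_pages,
                                    const unsigned long *file_bmap)
{
    size_t zeroed = 0;

    for (size_t i = 0; i < num_pages; i++) {
        if (!toy_test_bit(file_bmap, i)) {
            memset(ram + i * TOY_PAGE_SIZE, 0, TOY_PAGE_SIZE);
            zeroed++;
        }
    }
    return zeroed;
}
```

The extra pass touches every absent page, which is where the cost mentioned above comes from; it is only needed when the destination memory may hold stale data, as in the snapshot-load case.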

By the way, what's your overall goal with enabling the feature? Do you
intend to enable further capabilities for snapshot? Specifically
multifd. I believe the zero page skip is responsible for most of the
performance gains for mapped-ram without direct-io and multifd. The
benefit of bounded stream size doesn't apply to snapshots because
they're not live.

It would be interesting to gather some numbers for the perf difference
between mapped-ram=on vs off.

> If bitmap 0 does not imply zero page, I feel like the
> "is present in the migration file" and "is zero page" info should
> be better separated.
>
> Best,
> Marco



Re: [PATCH v1 01/24] Add -boot-certificates /path/dir:/path/file option in QEMU command line

2025-04-11 Thread Daniel P . Berrangé
On Fri, Apr 11, 2025 at 12:44:17PM +0200, Thomas Huth wrote:
> On 08/04/2025 17.55, Zhuoying Cai wrote:
> > The `-boot-certificates /path/dir:/path/file` option is implemented
> > to provide path to either a directory or a single certificate.
> > 
> > Multiple paths can be delineated using a colon.
> > 
> > Signed-off-by: Zhuoying Cai 
> > ---
> >   qemu-options.hx | 11 +++
> >   system/vl.c | 22 ++
> >   2 files changed, 33 insertions(+)
> > 
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index dc694a99a3..b460c63490 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -1251,6 +1251,17 @@ SRST
> >   Set system UUID.
> >   ERST
> > +DEF("boot-certificates", HAS_ARG, QEMU_OPTION_boot_certificates,
> > +"-boot-certificates /path/directory:/path/file\n"
> > +"  Provide a path to a directory or a boot 
> > certificate.\n"
> > +"  A colon may be used to delineate multiple paths.\n",
> > +QEMU_ARCH_S390X)
> > +SRST
> > +``-boot-certificates /path/directory:/path/file``
> > +Provide a path to a directory or a boot certificate.
> > +A colon may be used to delineate multiple paths.
> > +ERST
> 
> Unless there is a really, really good reason for introducing new top-level
> options to QEMU, this should rather be added to one of the existing options
> instead.
> 
> I assume this is very specific to s390x, isn't it? So the best way is likely
> to add this as a parameter of the machine type option, so that the user
> would specify:
> 
>  qemu-system-s390x -machine s390-ccw-virtio,boot-certificates=/path/to/certs
> 
> See the other object_class_property_add() statements in
> ccw_machine_class_init() for some examples how to do this.

With other arches that use EDK2 (x86, arm64, riscv64, loongarch64) we
pass this info via fw_cfg

   -fw_cfg name=etc/edk2/https/cacerts,file=

Assuming this series is trying to implement a pre-existing s390x machine
standard for passing certs, then it seems inevitable that it will need
a different config approach than we use for EDK2.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v1 02/24] hw/s390x/ipl: Create certificate store

2025-04-11 Thread Thomas Huth

On 08/04/2025 17.55, Zhuoying Cai wrote:

Create a certificate store for boot certificates used for secure IPL.

Load certificates from the -boot-certificate option into the cert store.

Currently, only X.509 certificates in DER format that use the SHA-256 hashing
algorithm are supported, as these are the types required for secure boot
on s390.

Signed-off-by: Zhuoying Cai 
---

...

+static size_t cert2buf(char *path, size_t max_size, char **cert_buf)
+{
+size_t size;
+g_autofree char *buf;
+buf = g_malloc(max_size);
+
+if (!g_file_get_contents(path, &buf, &size, NULL) ||
+size == 0 || size > max_size) {
+return 0;
+}
+
+*cert_buf = g_steal_pointer(&buf);
+
+return size;
+}


This function looks quite wrong to me. Why is there a g_malloc() in here if 
g_file_get_contents() already allocates the memory?


And why do we need a max_size here? If there is a reason, please add a 
proper comment in the source code.



+#ifdef CONFIG_GNUTLS
+int g_init_cert(uint8_t *raw_cert, size_t cert_size, gnutls_x509_crt_t *g_cert)


Please don't use a "g_" prefix here; otherwise the function could 
be confused with functions from glib.



+{
+int rc;
+
+if (gnutls_x509_crt_init(g_cert) < 0) {
+return -1;
+}
+
+gnutls_datum_t datum_cert = {raw_cert, cert_size};
+rc = gnutls_x509_crt_import(*g_cert, &datum_cert, GNUTLS_X509_FMT_DER);
+if (rc) {
+gnutls_x509_crt_deinit(*g_cert);
+return rc;
+}
+
+return 0;
+}
+#endif /* CONFIG_GNUTLS */
+
+static int init_cert_x509_der(size_t size, char *raw, S390IPLCertificate 
**qcert)


I'd maybe rather use "S390IPLCertificate *" as return type instead of "int" 
and return a NULL in case of errors.
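
A minimal sketch of that calling convention, using plain libc so it stands alone. ToyCert and toy_cert_new are hypothetical names and the struct carries only two of the fields, so this shows the shape of the interface rather than the actual certificate code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for the certificate structure. */
typedef struct ToyCert {
    size_t size;
    const char *raw;
} ToyCert;

/*
 * Return a newly allocated certificate, or NULL on error.
 * The caller owns the result and releases it with free().
 */
static ToyCert *toy_cert_new(size_t size, const char *raw)
{
    ToyCert *cert;

    if (raw == NULL || size == 0) {
        return NULL;
    }

    cert = calloc(1, sizeof(*cert));
    if (cert == NULL) {
        return NULL;
    }

    cert->size = size;
    cert->raw = raw;
    return cert;
}
```

The caller then tests the result against NULL instead of checking an int and reading an out-parameter.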



+{
+#ifdef CONFIG_GNUTLS
+gnutls_x509_crt_t g_cert = NULL;
+g_autofree S390IPLCertificate *q_cert;
+size_t key_id_size;
+size_t hash_size;
+int rc;
+
+rc = g_init_cert((uint8_t *)raw, size, &g_cert);
+if (rc) {
+if (rc == GNUTLS_E_ASN1_TAG_ERROR) {
+error_report("The certificate is not in DER format");
+}
+return -1;
+}
+
+rc = gnutls_x509_crt_get_key_id(g_cert, GNUTLS_KEYID_USE_SHA256, NULL, 
&key_id_size);


Is it documented somewhere that you can call gnutls_x509_crt_get_key_id() 
like this? The docs that I found about this function do not say anything 
about passing NULL here; they rather recommend using a buffer of size 20 by 
default.



+if (rc != GNUTLS_E_SHORT_MEMORY_BUFFER) {
+error_report("Failed to get certificate key ID size");
+goto out;
+}
+
+rc = gnutls_x509_crt_get_fingerprint(g_cert, GNUTLS_DIG_SHA256, NULL, 
&hash_size);


For this function, the NULL pointer handling is documented, so here it seems 
to be OK.



+if (rc != GNUTLS_E_SHORT_MEMORY_BUFFER) {
+error_report("Failed to get certificate hash size");
+goto out;
+}
+
+q_cert = g_malloc(sizeof(*q_cert));


Please use g_new() for allocating memory for structures instead.


+q_cert->size = size;
+q_cert->key_id_size = key_id_size;
+q_cert->hash_size = hash_size;
+q_cert->raw = raw;
+q_cert->format = GNUTLS_X509_FMT_DER;
+*qcert = g_steal_pointer(&q_cert);


If there is no "return" between the allocation and the final "return 0", you 
can also drop the g_autofree and g_steal_pointer from this function.



+gnutls_x509_crt_deinit(g_cert);
+
+return 0;
+out:
+gnutls_x509_crt_deinit(g_cert);
+return -1;
+#else
+error_report("Cryptographic library is not enabled")
+return -1;
+#endif /* #define CONFIG_GNUTLS */
+}
+
+static int check_path_type(const char *path)
+{
+struct stat path_stat;
+
+stat(path, &path_stat);
+
+if (S_ISDIR(path_stat.st_mode)) {
+return S_IFDIR;
+} else if (S_ISREG(path_stat.st_mode)) {
+return S_IFREG;
+} else {
+return -1;
+}
+}
+
+static int init_cert(char *paths, S390IPLCertificate **qcert)


As with the previous function, use "S390IPLCertificate *" as return type instead 
of "int"?



+{
+char *buf;
+char vc_name[VC_NAME_LEN_BYTES];
+const gchar *filename;
+size_t size;
+
+filename = g_path_get_basename(paths);


g_path_get_basename() returns an allocated string. You have to 
free it eventually to avoid leaking memory. I'd suggest declaring filename with 
g_autofree.



+size = cert2buf(paths, CERT_MAX_SIZE, &buf);
+if (size == 0) {
+error_report("Failed to load certificate: %s", paths);
+return -1;
+}
+
+if (init_cert_x509_der(size, buf, qcert) < 0) {
+error_report("Failed to initialize certificate: %s", paths);
+return -1;
+}
+
+/*
+ * Left justified certificate name with padding on the right with blanks.
+ * Convert certificate name to EBCDIC.
+ */
+strpadcpy(vc_name, VC_NAME_LEN_BYTES, filename, ' ');
+ebcdic_put((*qcert)->vc_name, vc_name, VC_NAME_LEN_BYTES);
+
+ 

Re: [PATCH v1 02/24] hw/s390x/ipl: Create certificate store

2025-04-11 Thread Daniel P . Berrangé
On Tue, Apr 08, 2025 at 11:55:04AM -0400, Zhuoying Cai wrote:
> Create a certificate store for boot certificates used for secure IPL.
> 
> Load certificates from the -boot-certificate option into the cert store.
> 
> Currently, only X.509 certificates in DER format that use the SHA-256 hashing
> algorithm are supported, as these are the types required for secure boot
> on s390.
> 
> Signed-off-by: Zhuoying Cai 
> ---
>  hw/s390x/cert-store.c   | 249 
>  hw/s390x/cert-store.h   |  50 
>  hw/s390x/ipl.c  |   9 ++
>  hw/s390x/ipl.h  |   3 +
>  hw/s390x/meson.build|   1 +
>  include/hw/s390x/ipl/qipl.h |   3 +
>  6 files changed, 315 insertions(+)
>  create mode 100644 hw/s390x/cert-store.c
>  create mode 100644 hw/s390x/cert-store.h
> 
> diff --git a/hw/s390x/cert-store.c b/hw/s390x/cert-store.c
> new file mode 100644
> index 00..1aa8aea040
> --- /dev/null
> +++ b/hw/s390x/cert-store.c
> @@ -0,0 +1,249 @@
> +/*
> + * S390 certificate store implementation
> + *
> + * Copyright 2025 IBM Corp.
> + * Author(s): Zhuoying Cai 
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "qemu/osdep.h"
> +#include "cert-store.h"
> +#include "qemu/error-report.h"
> +#include "qemu/option.h"
> +#include "qemu/config-file.h"
> +#include "hw/s390x/ebcdic.h"
> +#include "qemu/cutils.h"
> +#include "cert-store.h"
> +
> +#ifdef CONFIG_GNUTLS
> +#include 
> +#include 
> +#endif /* #define CONFIG_GNUTLS */

It is bad practice to directly use GNUTLS in any QEMU code except
for under the 'crypto/' directory (and its test suites). We must
define internal APIs for accessing the info we need and then call
those instead of gnutls.


> +
> +static const char *s390_get_boot_certificates(void)
> +{
> +QemuOpts *opts;
> +const char *path;
> +
> +opts = qemu_find_opts_singleton("boot-certificates");
> +path = qemu_opt_get(opts, "boot-certificates");
> +
> +return path;
> +}
> +
> +static size_t cert2buf(char *path, size_t max_size, char **cert_buf)
> +{
> +size_t size;
> +g_autofree char *buf;
> +buf = g_malloc(max_size);
> +
> +if (!g_file_get_contents(path, &buf, &size, NULL) ||
> +size == 0 || size > max_size) {
> +return 0;
> +}
> +
> +*cert_buf = g_steal_pointer(&buf);
> +
> +return size;
> +}
> +
> +#ifdef CONFIG_GNUTLS
> +int g_init_cert(uint8_t *raw_cert, size_t cert_size, gnutls_x509_crt_t 
> *g_cert)
> +{
> +int rc;
> +
> +if (gnutls_x509_crt_init(g_cert) < 0) {
> +return -1;
> +}
> +
> +gnutls_datum_t datum_cert = {raw_cert, cert_size};
> +rc = gnutls_x509_crt_import(*g_cert, &datum_cert, GNUTLS_X509_FMT_DER);
> +if (rc) {
> +gnutls_x509_crt_deinit(*g_cert);
> +return rc;
> +}
> +
> +return 0;
> +}
> +#endif /* CONFIG_GNUTLS */
> +
> +static int init_cert_x509_der(size_t size, char *raw, S390IPLCertificate 
> **qcert)
> +{
> +#ifdef CONFIG_GNUTLS
> +gnutls_x509_crt_t g_cert = NULL;
> +g_autofree S390IPLCertificate *q_cert;
> +size_t key_id_size;
> +size_t hash_size;
> +int rc;
> +
> +rc = g_init_cert((uint8_t *)raw, size, &g_cert);
> +if (rc) {
> +if (rc == GNUTLS_E_ASN1_TAG_ERROR) {
> +error_report("The certificate is not in DER format");
> +}
> +return -1;
> +}
> +
> +rc = gnutls_x509_crt_get_key_id(g_cert, GNUTLS_KEYID_USE_SHA256, NULL, 
> &key_id_size);
> +if (rc != GNUTLS_E_SHORT_MEMORY_BUFFER) {
> +error_report("Failed to get certificate key ID size");
> +goto out;
> +}
> +
> +rc = gnutls_x509_crt_get_fingerprint(g_cert, GNUTLS_DIG_SHA256, NULL, 
> &hash_size);
> +if (rc != GNUTLS_E_SHORT_MEMORY_BUFFER) {
> +error_report("Failed to get certificate hash size");
> +goto out;
> +}


We already have qcrypto_get_x509_cert_fingerprint() to avoid direct use
of gnutls. That API could be extended to optionally also report the
key id.

> +
> +q_cert = g_malloc(sizeof(*q_cert));
> +q_cert->size = size;
> +q_cert->key_id_size = key_id_size;
> +q_cert->hash_size = hash_size;
> +q_cert->raw = raw;
> +q_cert->format = GNUTLS_X509_FMT_DER;
> +*qcert = g_steal_pointer(&q_cert);
> +
> +gnutls_x509_crt_deinit(g_cert);
> +

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] virtio-net: Copy all for dhclient workaround

2025-04-11 Thread Antoine Damhet
On Fri, Apr 11, 2025 at 05:01:01PM +0900, Akihiko Odaki wrote:
> On 2025/04/07 17:29, Antoine Damhet wrote:
> > On Sat, Apr 05, 2025 at 05:04:28PM +0900, Akihiko Odaki wrote:
> > > The goal of commit 7987d2be5a8b ("virtio-net: Copy received header to
> > > buffer") was to remove the need to patch the (const) input buffer with a
> > > recomputed UDP checksum by copying headers to a RW region and inject the
> > > checksum there. The patch computed the checksum only from the header
> > > fields (missing the rest of the payload) producing an invalid one
> > > and making guests fail to acquire a DHCP lease.
> > > 
> > > Fix the issue by copying the entire packet instead of only copying the
> > > headers.
> > > 
> > > Fixes: 7987d2be5a8b ("virtio-net: Copy received header to buffer")
> > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2727
> > > Cc: qemu-sta...@nongnu.org
> > > Signed-off-by: Akihiko Odaki 
> > 
> > Tested-By: Antoine Damhet 
> > 
> > > ---
> > > This patch aims to resolve the same issue as the following one:
> > > https://lore.kernel.org/qemu-devel/20250404151835.328368-1-adam...@scaleway.com
> > > 
> > > The difference from the mentioned patch is that this patch also
> > > preserves the original intent of the regressing change, which is to
> > > remove the need to patch the (const) input buffer with a recomputed UDP
> > > checksum.
> > > 
> > > To Antoine Damhet:
> > > I confirmed that DHCP is currently not working and this patch fixes the
> > > issue, but I would appreciate if you also confirm the fix as I already
> > > have done testing badly for the regressing patch.
> > 
> > Thanks for the swift response, ideally I'd like a non-regression test in
> > the testsuite but a quick test showed me that I couldn't easily
> > reproduce with user networking so unless someone has a great idea it
> > would be a pain.
> > 
> > > ---
> > >   hw/net/virtio-net.c | 35 ---
> > >   1 file changed, 16 insertions(+), 19 deletions(-)
> > > 
> > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > index de87cfadffe1..a920358a89c5 100644
> > > --- a/hw/net/virtio-net.c
> > > +++ b/hw/net/virtio-net.c
> > > @@ -1687,6 +1687,11 @@ static void virtio_net_hdr_swap(VirtIODevice 
> > > *vdev, struct virtio_net_hdr *hdr)
> > >   virtio_tswap16s(vdev, &hdr->csum_offset);
> > >   }
> > > +typedef struct Header {
> > > +struct virtio_net_hdr_v1_hash virtio_net;
> > > +uint8_t payload[1500];
> > > +} Header;
> > > +
> > >   /* dhclient uses AF_PACKET but doesn't pass auxdata to the kernel so
> > >* it never finds out that the packets don't have valid checksums.  This
> > >* causes dhclient to get upset.  Fedora's carried a patch for ages to
> > > @@ -1701,7 +1706,7 @@ static void virtio_net_hdr_swap(VirtIODevice *vdev, 
> > > struct virtio_net_hdr *hdr)
> > >* we should provide a mechanism to disable it to avoid polluting the 
> > > host
> > >* cache.
> > >*/
> > > -static void work_around_broken_dhclient(struct virtio_net_hdr *hdr,
> > > +static void work_around_broken_dhclient(struct Header *hdr,
> > >   size_t *hdr_len, const uint8_t 
> > > *buf,
> > >   size_t buf_size, size_t 
> > > *buf_offset)
> > >   {
> > > @@ -1711,20 +1716,20 @@ static void work_around_broken_dhclient(struct 
> > > virtio_net_hdr *hdr,
> > >   buf += *buf_offset;
> > >   buf_size -= *buf_offset;
> > > -if ((hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) && /* missing csum */
> > > -(buf_size >= csum_size && buf_size < 1500) && /* normal sized 
> > > MTU */
> > > +if ((hdr->virtio_net.hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) && /* 
> > > missing csum */
> > > +(buf_size >= csum_size && buf_size < sizeof(hdr->payload)) && /* 
> > > normal sized MTU */
> > >   (buf[12] == 0x08 && buf[13] == 0x00) && /* ethertype == IPv4 */
> > >   (buf[23] == 17) && /* ip.protocol == UDP */
> > >   (buf[34] == 0 && buf[35] == 67)) { /* udp.srcport == bootps */
> > > -memcpy((uint8_t *)hdr + *hdr_len, buf, csum_size);
> > > -net_checksum_calculate((uint8_t *)hdr + *hdr_len, csum_size, 
> > > CSUM_UDP);
> > > -hdr->flags &= ~VIRTIO_NET_HDR_F_NEEDS_CSUM;
> > > -*hdr_len += csum_size;
> > > -*buf_offset += csum_size;
> > > +memcpy((uint8_t *)hdr + *hdr_len, buf, buf_size);
> > > +net_checksum_calculate((uint8_t *)hdr + *hdr_len, buf_size, 
> > > CSUM_UDP);
> > > +hdr->virtio_net.hdr.flags &= ~VIRTIO_NET_HDR_F_NEEDS_CSUM;
> > > +*hdr_len += buf_size;
> > > +*buf_offset += buf_size;
> > >   }
> > >   }
> > > -static size_t receive_header(VirtIONet *n, struct virtio_net_hdr *hdr,
> > > +static size_t receive_header(VirtIONet *n, Header *hdr,
> > >const void *buf, size_t buf_size,
> > >size_t *buf_offset)
> > 
> 
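
The failure mode described in the quoted commit message can be illustrated with a small C sketch of the Internet checksum (RFC 1071). It is not the QEMU net_checksum_calculate() routine and it omits the UDP pseudo-header; toy_inet_checksum is a hypothetical name. The only point is that a checksum computed over a prefix of the data differs from one computed over all of it.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 Internet checksum: the one's complement of the one's-complement
 * sum of the data taken as 16-bit big-endian words.
 */
static uint16_t toy_inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    }
    if (len & 1) {
        /* Pad a trailing odd byte with a zero low byte. */
        sum += (uint32_t)data[len - 1] << 8;
    }
    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t)~sum;
}
```

Calling it on only the first bytes of a buffer, as the regressing change effectively did with the headers, yields a value that a receiver summing the whole datagram will reject.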

Re: [PATCH v1 03/24] s390x: Guest support for Certificate Store Facility (CS)

2025-04-11 Thread Thomas Huth

On 08/04/2025 17.55, Zhuoying Cai wrote:

DIAG 320 is supported when the certificate-store (CS) facility
is installed.

Availability of CS facility is determined by byte 134 bit 5 of the
SCLP Read Info block.
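
As a rough illustration of that bit test, the sketch below checks byte 134, bit 5 with MSB-first bit numbering (bit 0 is the most significant bit of the byte). The names are hypothetical, and treating 134 as a zero-based offset from the start of the block is an assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_SCLP_FAC134_BYTE 134   /* ASSUMPTION: zero-based byte offset */
#define TOY_CS_FACILITY_BIT  5     /* bit number relative to byte 134 */

/* Test bit 'bit_nr' of a byte array, counting from the most significant bit. */
static bool toy_test_be_bit(unsigned bit_nr, const uint8_t *array)
{
    return (array[bit_nr / 8] & (0x80 >> (bit_nr % 8))) != 0;
}

/* Report whether the certificate-store facility bit is set in the block. */
static bool toy_has_cert_store(const uint8_t *read_info)
{
    return toy_test_be_bit(TOY_CS_FACILITY_BIT,
                           read_info + TOY_SCLP_FAC134_BYTE);
}
```

With this numbering, bit 5 corresponds to the mask 0x04 within the byte.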

Signed-off-by: Zhuoying Cai 
---

...

diff --git a/target/s390x/cpu_features.c b/target/s390x/cpu_features.c
index 4b5be6798e..99089ab3f5 100644
--- a/target/s390x/cpu_features.c
+++ b/target/s390x/cpu_features.c
@@ -147,6 +147,7 @@ void s390_fill_feat_block(const S390FeatBitmap features, 
S390FeatType type,
  break;
  case S390_FEAT_TYPE_SCLP_FAC134:
  clear_be_bit(s390_feat_def(S390_FEAT_DIAG_318)->bit, data);
+clear_be_bit(s390_feat_def(S390_FEAT_DIAG_320)->bit, data);
  break;
  default:
  return;
diff --git a/target/s390x/cpu_features_def.h.inc 
b/target/s390x/cpu_features_def.h.inc
index e23e603a79..65d38f546d 100644
--- a/target/s390x/cpu_features_def.h.inc
+++ b/target/s390x/cpu_features_def.h.inc
@@ -138,6 +138,7 @@ DEF_FEAT(SIE_IBS, "ibs", SCLP_CONF_CHAR_EXT, 10, "SIE: 
Interlock-and-broadcast-s
  
  /* Features exposed via SCLP SCCB Facilities byte 134 (bit numbers relative to byte-134) */

  DEF_FEAT(DIAG_318, "diag318", SCLP_FAC134, 0, "Control program name and version 
codes")
+DEF_FEAT(DIAG_320, "diag320", SCLP_FAC134, 5, "Provide Certificate Store 
functions")
  
  /* Features exposed via SCLP CPU info. */

  DEF_FEAT(SIE_F2, "sief2", SCLP_CPU, 4, "SIE: interception format 2 (Virtual 
SIE)")
diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
index 93a05e43d7..7d65c40bd1 100644
--- a/target/s390x/cpu_models.c
+++ b/target/s390x/cpu_models.c
@@ -248,6 +248,7 @@ bool s390_has_feat(S390Feat feat)
  if (s390_is_pv()) {
  switch (feat) {
  case S390_FEAT_DIAG_318:
+case S390_FEAT_DIAG_320:


So secure IPL is not available with secure execution? That's surprising. 
Could you add a comment to the patch description why this is the case?



  case S390_FEAT_HPMA2:
  case S390_FEAT_SIE_F2:
  case S390_FEAT_SIE_SKEY:
@@ -505,6 +506,7 @@ static void check_consistency(const S390CPUModel *model)
  { S390_FEAT_PTFF_STOUE, S390_FEAT_MULTIPLE_EPOCH },
  { S390_FEAT_AP_QUEUE_INTERRUPT_CONTROL, S390_FEAT_AP },
  { S390_FEAT_DIAG_318, S390_FEAT_EXTENDED_LENGTH_SCCB },
+{ S390_FEAT_DIAG_320, S390_FEAT_EXTENDED_LENGTH_SCCB },


Please also add a comment to the patch description why this feature needs 
S390_FEAT_EXTENDED_LENGTH_SCCB.



  { S390_FEAT_NNPA, S390_FEAT_VECTOR },
  { S390_FEAT_RDP, S390_FEAT_LOCAL_TLB_CLEARING },
  { S390_FEAT_UV_FEAT_AP, S390_FEAT_AP },
diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index 41840677ce..52c649adcd 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -696,6 +696,7 @@ static uint16_t full_GEN14_GA1[] = {
  S390_FEAT_HPMA2,
  S390_FEAT_SIE_KSS,
  S390_FEAT_GROUP_MULTIPLE_EPOCH_PTFF,
+S390_FEAT_DIAG_320,


Is it available with the z14 already? 
https://www.ibm.com/docs/en/linux-on-systems?topic=linux-secure-boot seems 
to indicate a z15 instead??



  };
  
  #define full_GEN14_GA2 EmptyFeat

diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 4d56e653dd..d07ca879a3 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -2487,6 +2487,8 @@ bool kvm_s390_get_host_cpu_model(S390CPUModel *model, 
Error **errp)
  set_bit(S390_FEAT_DIAG_318, model->features);
  }
  
+set_bit(S390_FEAT_DIAG_320, model->features);

+
  /* Test for Ultravisor features that influence secure guest behavior */
  query_uv_feat_guest(model->features);


 Thomas




Re: [PATCH v1 01/24] Add -boot-certificates /path/dir:/path/file option in QEMU command line

2025-04-11 Thread Daniel P . Berrangé
On Fri, Apr 11, 2025 at 01:57:26PM +0100, Daniel P. Berrangé wrote:
> On Fri, Apr 11, 2025 at 12:44:17PM +0200, Thomas Huth wrote:
> > On 08/04/2025 17.55, Zhuoying Cai wrote:
> > > The `-boot-certificates /path/dir:/path/file` option is implemented
> > > to provide path to either a directory or a single certificate.
> > > 
> > > Multiple paths can be delineated using a colon.
> > > 
> > > Signed-off-by: Zhuoying Cai 
> > > ---
> > >   qemu-options.hx | 11 +++
> > >   system/vl.c | 22 ++
> > >   2 files changed, 33 insertions(+)
> > > 
> > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > index dc694a99a3..b460c63490 100644
> > > --- a/qemu-options.hx
> > > +++ b/qemu-options.hx
> > > @@ -1251,6 +1251,17 @@ SRST
> > >   Set system UUID.
> > >   ERST
> > > +DEF("boot-certificates", HAS_ARG, QEMU_OPTION_boot_certificates,
> > > +"-boot-certificates /path/directory:/path/file\n"
> > > +"  Provide a path to a directory or a boot 
> > > certificate.\n"
> > > +"  A colon may be used to delineate multiple 
> > > paths.\n",
> > > +QEMU_ARCH_S390X)
> > > +SRST
> > > +``-boot-certificates /path/directory:/path/file``
> > > +Provide a path to a directory or a boot certificate.
> > > +A colon may be used to delineate multiple paths.
> > > +ERST
> > 
> > Unless there is a really, really good reason for introducing new top-level
> > options to QEMU, this should rather be added to one of the existing options
> > instead.
> > 
> > I assume this is very specific to s390x, isn't it? So the best way is likely
> > to add this as a parameter of the machine type option, so that the user
> > would specify:
> > 
> >  qemu-system-s390x -machine s390-ccw-virtio,boot-certificates=/path/to/certs
> > 
> > See the other object_class_property_add() statements in
> > ccw_machine_class_init() for some examples how to do this.
> 
> With other arches that use EDK2 (x86, arm64, riscv64, loongarch64) we
> pass this info via fw_cfg

s/this info/this kind of info/

because technically the stuff below is certs for PXE boot downloads,
not certs for secureboot. The latter are hardcoded in the EDK varstore
at boot time, so any setup of certs for secureboot is out of band
from QEMU startup

> 
>-fw_cfg name=etc/edk2/https/cacerts,file=
> 
> Assuming this series is trying to implement a pre-existing s390x machine
> standard for passing certs, then it seems inevitable that it will need
> a different config approach than we use for EDK2.
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PULL 8/9] meson: Disallow 64-bit on 32-bit emulation

2025-04-11 Thread Daniel P . Berrangé
On Sat, Feb 08, 2025 at 12:57:23PM -0800, Richard Henderson wrote:
> For system mode, we can rarely support the amount of RAM that
> the guest requires. TCG emulation is restricted to round-robin
> mode, which solves many of the atomicity issues, but not those
> associated with virtio.  In any case, round-robin does nothing
> to help the speed of emulation.
> 
> For user mode, most emulation does not succeed at all.  Most
> of the time we cannot even load 64-bit non-PIE binaries due
> to lack of a 64-bit address space.  Threads are run in
> parallel, not round-robin, which means that atomicity
> is not handled.
> 
> Reviewed-by: Thomas Huth 
> Reviewed-by: Philippe Mathieu-Daudé 
> Signed-off-by: Richard Henderson 
> ---
>  meson.build | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)

Shouldn't this patch and the earlier ones in this series have
added something to removed-features.rst, as this is a significant
feature removal which is impacting downstream users, and distros
in particular?

> 
> diff --git a/meson.build b/meson.build
> index 85317cd63f..ec51827f40 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -3185,6 +3185,9 @@ if host_os == 'windows'
>endif
>  endif
>  
> +# Detect host pointer size for the target configuration loop.
> +host_long_bits = cc.sizeof('void *') * 8
> +
>  
>  # Target configuration #
>  
> @@ -3277,8 +3280,14 @@ foreach target : target_dirs
>  }
>endif
>  
> +  config_target += keyval.load('configs/targets' / target + '.mak')
> +
>target_kconfig = []
>foreach sym: accelerators
> +# Disallow 64-bit on 32-bit emulation and virtualization
> +if host_long_bits < config_target['TARGET_LONG_BITS'].to_int()
> +  continue
> +endif
>  if sym == 'CONFIG_TCG' or target in accelerator_targets.get(sym, [])
>config_target += { sym: 'y' }
>config_all_accel += { sym: 'y' }
> @@ -3292,9 +3301,6 @@ foreach target : target_dirs
>  error('No accelerator available for target @0@'.format(target))
>endif
>  
> -  config_target += keyval.load('configs/targets' / target + '.mak')
> -  config_target += { 'TARGET_' + config_target['TARGET_ARCH'].to_upper(): 
> 'y' }
> -
>if 'TARGET_NEED_FDT' in config_target and not fdt.found()
>  if default_targets
>warning('Disabling ' + target + ' due to missing libfdt')
> @@ -3307,6 +3313,7 @@ foreach target : target_dirs
>actual_target_dirs += target
>  
># Add default keys
> +  config_target += { 'TARGET_' + config_target['TARGET_ARCH'].to_upper(): 
> 'y' }
>if 'TARGET_BASE_ARCH' not in config_target
>  config_target += {'TARGET_BASE_ARCH': config_target['TARGET_ARCH']}
>endif
> -- 
> 2.43.0
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
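
For readers following the meson change quoted above, the rule it adds can be
restated in a few lines of C. This is only a sketch of the comparison, not
the build logic itself: host_pointer_bits() and target_allowed() are
hypothetical names, and the real check lives in meson.build.

```c
#include <limits.h>
#include <stdbool.h>

/* Width of a host pointer in bits; the patch calls this host_long_bits. */
static int host_pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/*
 * Hypothetical helper: a target stays enabled only if the host is at
 * least as wide as the target's TARGET_LONG_BITS.
 */
static bool target_allowed(int host_bits, int target_long_bits)
{
    return host_bits >= target_long_bits;
}
```

With this rule a 32-bit host keeps its 32-bit targets and loses every 64-bit
one, which is the user-visible removal discussed in this reply.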




Re: [PATCH v1 04/24] s390x/diag: Introduce DIAG 320 for certificate store facility

2025-04-11 Thread Thomas Huth

On 08/04/2025 17.55, Zhuoying Cai wrote:

From: Collin Walling 

...

Signed-off-by: Zhuoying Cai 


So the patch is from Collin, but S-o-b only by you? Looks weird, this should 
either have an additional S-o-b by Collin, too, or not have that "From:" 
line at all?


...

diff --git a/include/hw/s390x/ipl/diag320.h b/include/hw/s390x/ipl/diag320.h
new file mode 100644
index 00..d6f70c65df
--- /dev/null
+++ b/include/hw/s390x/ipl/diag320.h
@@ -0,0 +1,19 @@
+/*
+ * S/390 DIAGNOSE 320 definitions and structures
+ *
+ * Copyright 2025 IBM Corp.
+ * Author(s): Zhuoying Cai 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.


By the way, new files need a SPDX-License-Identifier nowadays to make 
scripts/checkpatch.pl happy.



+ */
+
+#ifndef S390X_DIAG320_H
+#define S390X_DIAG320_H
+
+#define DIAG_320_SUBC_QUERY_ISM 0
+
+#define DIAG_320_RC_OK  0x0001
+
+#endif
diff --git a/target/s390x/diag.c b/target/s390x/diag.c
index da44b0133e..cb840e4b97 100644
--- a/target/s390x/diag.c
+++ b/target/s390x/diag.c
@@ -192,3 +192,39 @@ out:
  break;
  }
  }
+
+void handle_diag_320(CPUS390XState *env, uint64_t r1, uint64_t r3, uintptr_t 
ra)
+{
+S390CPU *cpu = env_archcpu(env);
+uint64_t subcode = env->regs[r3];
+uint64_t addr = env->regs[r1];
+int rc;


Do we also need a s390_has_feat(S390_FEAT_DIAG_320) check here?


+if (env->psw.mask & PSW_MASK_PSTATE) {
+s390_program_interrupt(env, PGM_PRIVILEGED, ra);
+return;
+}
+
+if (r1 & 1) {
+s390_program_interrupt(env, PGM_SPECIFICATION, ra);
+return;
+}
+
+switch (subcode) {
+case DIAG_320_SUBC_QUERY_ISM:
+uint64_t ism =  0;
+
+if (s390_cpu_virt_mem_write(cpu, addr, (uint8_t)r1, &ism,


I think you could drop the (uint8_t) here?


+be64_to_cpu(sizeof(ism)))) {


be64_to_cpu() looks very wrong here!

 Thomas
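
To illustrate why byte-swapping a length is a problem: on a little-endian
host be64_to_cpu() amounts to a 64-bit byte swap, while on a big-endian host
(such as s390x itself) it does nothing, so the mistake would go unnoticed
there. The sketch below uses a hypothetical swap64() helper rather than the
QEMU macro.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for what be64_to_cpu() does on a little-endian
 * host: reverse the order of the eight bytes.
 */
static uint64_t swap64(uint64_t v)
{
    uint64_t out = 0;

    for (int i = 0; i < 8; i++) {
        out = (out << 8) | (v & 0xff);  /* move the lowest byte to the top */
        v >>= 8;
    }
    return out;
}
```

Applied to sizeof(uint64_t), which is 8, the swap yields 0x0800000000000000,
which is not a usable length for a memory write.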




Re: [PATCH 1/4] target/arm/ptw: extract arm_mmu_idx_to_security_space

2025-04-11 Thread Philippe Mathieu-Daudé

On 10/4/25 23:00, Pierrick Bouvier wrote:

We'll reuse this function later.

Signed-off-by: Pierrick Bouvier 
---
  target/arm/ptw.c | 21 ++---
  1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 8d4e9e07a94..5e196cfa955 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -3550,13 +3550,9 @@ bool get_phys_addr_with_space_nogpc(CPUARMState *env, 
vaddr address,
 memop, result, fi);
  }
  
-bool get_phys_addr(CPUARMState *env, vaddr address,

-   MMUAccessType access_type, MemOp memop, ARMMMUIdx mmu_idx,
-   GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
+static ARMSecuritySpace arm_mmu_idx_to_security_space
+(CPUARMState *env, ARMMMUIdx mmu_idx)


Style is:

static ARMSecuritySpace arm_mmu_idx_to_security_space(CPUARMState *env,
                                                      ARMMMUIdx mmu_idx)

or:

static ARMSecuritySpace
arm_mmu_idx_to_security_space(CPUARMState *env, ARMMMUIdx mmu_idx)

Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v3 3/5] util/qemu-sockets: Refactor success and failure paths in inet_listen_saddr()

2025-04-11 Thread Daniel P . Berrangé
On Tue, Apr 08, 2025 at 01:25:02PM +0200, Juraj Marcin wrote:
> From: Juraj Marcin 
> 
> To get a listening socket, we need to first create a socket, try binding
> it to a certain port, and lastly start listening on it. Each of these
> operations can fail due to various reasons, one of them being that the
> requested address/port is already in use. In such case, the function
> tries the same process with a new port number.
> 
> This patch refactors the port number loop, so the success path is no
> longer buried inside the 'if' statements in the middle of the loop. Now,
> the success path is not nested and ends at the end of the iteration
> after successful socket creation, binding, and listening. In case any of
> the operations fails, it either continues to the next iteration (and the
> next port) or jumps out of the loop to handle the error and exits the
> function.
> 
> Signed-off-by: Juraj Marcin 
> ---
>  util/qemu-sockets.c | 51 -
>  1 file changed, 27 insertions(+), 24 deletions(-)

Reviewed-by: Daniel P. Berrangé 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
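
The loop shape described in the commit message can be sketched roughly as
follows. This is not the code from util/qemu-sockets.c: listen_in_range() is
a hypothetical helper that only handles IPv4 loopback, and it assumes that
EADDRINUSE is the one error worth retrying on.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: try ports first..last on 127.0.0.1 and return a
 * listening socket, or -1 with errno set.
 */
static int listen_in_range(int first, int last)
{
    int saved_errno = EADDRINUSE;

    for (int port = first; port <= last; port++) {
        struct sockaddr_in addr;
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        if (fd < 0) {
            saved_errno = errno;
            break;                  /* hard error: leave the loop */
        }

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = htons((uint16_t)port);

        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
            listen(fd, 1) < 0) {
            saved_errno = errno;
            close(fd);
            if (saved_errno == EADDRINUSE) {
                continue;           /* port is busy: try the next one */
            }
            break;                  /* any other failure ends the search */
        }

        return fd;                  /* success path, not nested */
    }

    errno = saved_errno;
    return -1;
}
```

The success path sits at the end of the iteration, and each failure either
continues to the next port or breaks out to the common error exit.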




Re: [PATCH v3 5/5] utils/qemu-sockets: Introduce inet socket options controlling TCP keep-alive

2025-04-11 Thread Daniel P . Berrangé
On Tue, Apr 08, 2025 at 01:25:04PM +0200, Juraj Marcin wrote:
> From: Juraj Marcin 
> 
> With the default TCP stack configuration, it could be even 2 hours
> before the connection times out due to the other side not being
> reachable. However, in some cases, the application needs to be aware of
> a connection issue much sooner.
> 
> This is the case, for example, for postcopy live migration. If there is
> no traffic from the migration destination guest (server-side) to the
> migration source guest (client-side), the destination keeps waiting for
> pages indefinitely and does not switch to the postcopy-paused state.
> This can happen, for example, if the destination QEMU instance is
> started with the '-S' command line option and the machine is not started
> yet, or if the machine is idle and produces no new page faults for
> not-yet-migrated pages.
> 
> This patch introduces new inet socket parameters that control count,
> idle period, and interval of TCP keep-alive packets before the
> connection is considered broken. These parameters are available on
> systems where the respective TCP socket options are defined
> (TCP_KEEPCNT, TCP_KEEPIDLE, TCP_KEEPINTVL).
> 
> The default value for all is 0, which means the system configuration is
> used.
> 
> Signed-off-by: Juraj Marcin 
> ---
>  meson.build |  6 
>  qapi/sockets.json   | 15 
>  util/qemu-sockets.c | 88 +
>  3 files changed, 109 insertions(+)
> 
> diff --git a/meson.build b/meson.build
> index 41f68d3806..680f47cf42 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -2734,6 +2734,12 @@ if linux_io_uring.found()
>config_host_data.set('HAVE_IO_URING_PREP_WRITEV2',
> cc.has_header_symbol('liburing.h', 
> 'io_uring_prep_writev2'))
>  endif
> +config_host_data.set('HAVE_TCP_KEEPCNT',
> + cc.has_header_symbol('netinet/tcp.h', 'TCP_KEEPCNT'))
> +config_host_data.set('HAVE_TCP_KEEPIDLE',
> + cc.has_header_symbol('netinet/tcp.h', 'TCP_KEEPIDLE'))
> +config_host_data.set('HAVE_TCP_KEEPINTVL',
> + cc.has_header_symbol('netinet/tcp.h', 'TCP_KEEPINTVL'))

What platforms are you aware of that do NOT have these
settings available ? I'm wondering if we can just assume
they always exist.

>  
>  # has_member
>  config_host_data.set('HAVE_SIGEV_NOTIFY_THREAD_ID',
> diff --git a/qapi/sockets.json b/qapi/sockets.json
> index 62797cd027..bb9d298635 100644
> --- a/qapi/sockets.json
> +++ b/qapi/sockets.json
> @@ -59,6 +59,18 @@
>  # @keep-alive: enable keep-alive when connecting to/listening on this socket.
>  # (Since 4.2, not supported for listening sockets until 10.1)
>  #
> +# @keep-alive-count: number of keep-alive packets sent before the connection 
> is
> +# closed.  Only supported for TCP sockets on systems where TCP_KEEPCNT
> +# socket option is defined.  (Since 10.1)
> +#
> +# @keep-alive-idle: time in seconds the connection needs to be idle before
> +# sending a keepalive packet.  Only supported for TCP sockets on systems
> +# where TCP_KEEPIDLE socket option is defined.  (Since 10.1)
> +#
> +# @keep-alive-interval: time in secods between keep-alive packets.  Only

Trivial typo s/secods/seconds/

> +# supported for TCP sockets on systems where TCP_KEEPINTVL is defined.
> +# (Since 10.1)
> +#
>  # @mptcp: enable multi-path TCP.  (Since 6.1)
>  #
>  # Since: 1.3
> @@ -71,6 +83,9 @@
>  '*ipv4': 'bool',
>  '*ipv6': 'bool',
>  '*keep-alive': 'bool',
> +'*keep-alive-count': { 'type': 'uint32', 'if': 'HAVE_TCP_KEEPCNT' },
> +'*keep-alive-idle': { 'type': 'uint32', 'if': 'HAVE_TCP_KEEPIDLE' },
> +'*keep-alive-interval': { 'type': 'uint32', 'if': 'HAVE_TCP_KEEPINTVL' },
>  '*mptcp': { 'type': 'bool', 'if': 'HAVE_IPPROTO_MPTCP' } } }
>  
>  ##

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
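
For reference, the three socket options under discussion are set with plain
setsockopt() calls. The sketch below assumes a Linux-like system;
set_keepalive() is a hypothetical helper, not the function added by the
patch, and it follows the commit message in treating 0 as "keep the system
default".

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: enable keep-alive on fd and apply any non-zero
 * tunables. Options the platform does not define are silently skipped.
 */
static int set_keepalive(int fd, int count, int idle, int interval)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0) {
        return -1;
    }
#ifdef TCP_KEEPCNT
    /* Number of unanswered probes before the connection is dropped. */
    if (count && setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                            &count, sizeof(count)) < 0) {
        return -1;
    }
#endif
#ifdef TCP_KEEPIDLE
    /* Seconds of idle time before the first probe is sent. */
    if (idle && setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                           &idle, sizeof(idle)) < 0) {
        return -1;
    }
#endif
#ifdef TCP_KEEPINTVL
    /* Seconds between consecutive probes. */
    if (interval && setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                               &interval, sizeof(interval)) < 0) {
        return -1;
    }
#endif
    return 0;
}
```

If every supported platform turns out to define all three options, the
#ifdef guards and the matching meson checks could be dropped.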




Re: [PATCH v1 06/24] s390x/diag: Implement DIAG 320 subcode 1

2025-04-11 Thread Thomas Huth

On 08/04/2025 17.55, Zhuoying Cai wrote:

DIAG 320 subcode 1 provides information needed to determine
the amount of storage to store one or more certificates.

The subcode value is denoted by setting the left-most bit
of an 8-byte field.

The verification-certificate-storage-size block (VCSSB) contains
the output data when the operation completes successfully.

Signed-off-by: Zhuoying Cai 
---
  include/hw/s390x/ipl/diag320.h | 25 ++
  target/s390x/diag.c| 39 +-
  2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/include/hw/s390x/ipl/diag320.h b/include/hw/s390x/ipl/diag320.h
index d6f70c65df..ded336df25 100644
--- a/include/hw/s390x/ipl/diag320.h
+++ b/include/hw/s390x/ipl/diag320.h
@@ -13,7 +13,32 @@
  #define S390X_DIAG320_H
  
  #define DIAG_320_SUBC_QUERY_ISM 0

+#define DIAG_320_SUBC_QUERY_VCSI 1
  
  #define DIAG_320_RC_OK  0x0001

+#define DIAG_320_RC_NOMEM   0x0202
+
+#define VCSSB_MAX_LEN   128
+#define VCE_HEADER_LEN  128
+#define VCB_HEADER_LEN  64
+
+#define DIAG_320_ISM_QUERY_VCSI 0x4000
+
+struct VerificationCertificateStorageSizeBlock {
+uint32_t length;
+uint8_t reserved0[3];
+uint8_t version;
+uint32_t reserved1[6];
+uint16_t totalvc;
+uint16_t maxvc;
+uint32_t reserved3[7];
+uint32_t maxvcelen;
+uint32_t reserved4[3];
+uint32_t largestvcblen;
+uint32_t totalvcblen;
+uint32_t reserved5[10];
+} QEMU_PACKED;
+typedef struct VerificationCertificateStorageSizeBlock \
+VerificationCertificateStorageSizeBlock;


That's quite a long name, maybe shorten to VerificationCertStorageBlock or 
something similar?


  
  #endif

diff --git a/target/s390x/diag.c b/target/s390x/diag.c
index c64b935c87..cc639819ec 100644
--- a/target/s390x/diag.c
+++ b/target/s390x/diag.c
@@ -194,6 +194,7 @@ out:
  void handle_diag_320(CPUS390XState *env, uint64_t r1, uint64_t r3, uintptr_t 
ra)
  {
  S390CPU *cpu = env_archcpu(env);
+S390IPLCertificateStore *qcs = s390_ipl_get_certificate_store();
  uint64_t subcode = env->regs[r3];
  uint64_t addr = env->regs[r1];
  int rc;
@@ -210,7 +211,7 @@ void handle_diag_320(CPUS390XState *env, uint64_t r1, 
uint64_t r3, uintptr_t ra)
  
  switch (subcode) {

  case DIAG_320_SUBC_QUERY_ISM:
-uint64_t ism =  0;
+uint64_t ism = DIAG_320_ISM_QUERY_VCSI;


That likely should be a cpu_to_be64(DIAG_320_ISM_QUERY_VCSI) instead.

  
  if (s390_cpu_virt_mem_write(cpu, addr, (uint8_t)r1, &ism,

  be64_to_cpu(sizeof(ism)))) {
@@ -218,6 +219,42 @@ void handle_diag_320(CPUS390XState *env, uint64_t r1, 
uint64_t r3, uintptr_t ra)
  return;
  }
  
+rc = DIAG_320_RC_OK;

+break;
+case DIAG_320_SUBC_QUERY_VCSI:
+VerificationCertificateStorageSizeBlock vcssb;
+
+if (!diag_parm_addr_valid(addr, 
sizeof(VerificationCertificateStorageSizeBlock),
+  true)) {
+s390_program_interrupt(env, PGM_ADDRESSING, ra);
+return;
+}
+
+if (!qcs || !qcs->count) {
+vcssb.length = 4;
+} else {
+vcssb.length = VCSSB_MAX_LEN;
+vcssb.version = 0;
+vcssb.totalvc = qcs->count;
+vcssb.maxvc = MAX_CERTIFICATES;
+vcssb.maxvcelen = VCE_HEADER_LEN + qcs->max_cert_size;
+vcssb.largestvcblen = VCB_HEADER_LEN + vcssb.maxvcelen;
+vcssb.totalvcblen = VCB_HEADER_LEN + qcs->count * VCE_HEADER_LEN +
+qcs->total_bytes;


You also need cpu_to_beXX() for these values here, too.


+}
+
+if (vcssb.length < 128) {
+rc = DIAG_320_RC_NOMEM;
+break;
+}
+
+if (s390_cpu_virt_mem_write(cpu, addr, (uint8_t)r1, &vcssb,
+be64_to_cpu(


And that be64_to_cpu() is wrong here.

 Thomas


+
sizeof(VerificationCertificateStorageSizeBlock)
+))) {
+s390_cpu_virt_mem_handle_exc(cpu, ra);
+return;
+}
  rc = DIAG_320_RC_OK;
  break;
  default:
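
As a rough illustration of the cpu_to_beXX() remark on the VCSSB fields:
each multi-byte field has to be converted with the helper matching its width
before the block is copied to the guest. The sketch below uses hypothetical
to_be16()/to_be32() stand-ins and a reduced struct, not the QEMU helpers or
the full layout from the patch.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-ins for cpu_to_be16()/cpu_to_be32(): return a value
 * whose in-memory bytes are big-endian on any host.
 */
static uint16_t to_be16(uint16_t v)
{
    uint8_t b[2] = { (uint8_t)(v >> 8), (uint8_t)v };
    uint16_t out;

    memcpy(&out, b, sizeof(out));
    return out;
}

static uint32_t to_be32(uint32_t v)
{
    uint8_t b[4] = { (uint8_t)(v >> 24), (uint8_t)(v >> 16),
                     (uint8_t)(v >> 8), (uint8_t)v };
    uint32_t out;

    memcpy(&out, b, sizeof(out));
    return out;
}

/* Reduced, hypothetical subset of the VCSSB; the real layout is in the patch. */
struct vcssb_head {
    uint32_t length;
    uint16_t totalvc;
    uint16_t maxvc;
};

/* Store every field in guest (big-endian) byte order. */
static void fill_vcssb_head(struct vcssb_head *h, uint32_t length,
                            uint16_t totalvc, uint16_t maxvc)
{
    h->length = to_be32(length);
    h->totalvc = to_be16(totalvc);
    h->maxvc = to_be16(maxvc);
}
```

The length passed to the memory-write call stays a plain host integer; only
the contents of the block are converted.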





Re: Management applications and CPU feature flags

2025-04-11 Thread Cornelia Huck
On Fri, Apr 11 2025, Jiri Denemark  wrote:

> On Fri, Apr 11, 2025 at 13:43:39 +0200, Markus Armbruster wrote:
>> Daniel P. Berrangé  writes:
>> > On Fri, Apr 11, 2025 at 12:40:46PM +0200, Markus Armbruster wrote:
>> >> Daniel P. Berrangé  writes:
>> >> > Considering the bigger picture QMP design, when libvirt is trying to
>> >> > understand QEMU's CPU feature flag expansion, I would ask why we don't
>> >> > have something like a "query-cpu" command to tell us the current CPU
>> >> > expansion, avoiding the need for poking at QOM properties directly.
>> >> 
>> >> How do the existing query-cpu-FOO fall short of what management
>> >> applications such as libvirt needs?
>> >
>> > It has been a long while since I looked at them, but IIRC they were
>> > returning static info about CPU models, whereas libvirt wanted info
>> > on the currently requested '-cpu ARGS'
>> 
>> Libvirt developers, please work with us on design of new commands or
>> improvements to existing ones to better meet libvirt's needs in this
>> area.
>
> The existing commands (query-cpu-definitions, query-cpu-model-expansion)
> are useful for probing before starting a domain. But what we use qom-get
> for is to get a view of the currently instantiated virtual CPU created
> by QEMU according to -cpu when we're starting a domain. In other words,
> we start QEMU with -S and before starting vCPUs we need to know exactly
> what features were enabled and if any feature we requested was disabled
> by QEMU. Currently we query QOM for CPU properties as that's what we
> were advised to use ages ago.
>
> The reason behind querying such info is ensuring stable guest ABI during
> migration. Asking QEMU for a specific CPU model and features does not
> mean we'll get exactly what we asked for (this is not a bug) so we need
> to record the differences so that we can start QEMU for incoming
> migration with a CPU matching exactly the one provided on the source.
>
> As Peter said, the current way is terribly inefficient as it requires
> several hundreds of QMP commands so the goal is to have a single QMP
> command that would tell us all we need to know about the virtual CPU.
> That is all enabled features and all features that could not be enabled
> even though we asked for them.

Wandering in here from the still-very-much-in-progress Arm perspective
(current but not yet posted QEMU code at
https://gitlab.com/cohuck/qemu/-/tree/arm-cpu-model-rfcv3?ref_type=heads):

We're currently operating at the "writable ID register fields" level
with the idea of providing features (FEAT_xxx) as an extra layer on top
(as they model a subset of what we actually need) and have yet to come
up with a good way to do named models for KVM. The
query-cpu-model-expansion command will yield a list of all writable ID
register fields and their values (for now, for the 'host' model). IIUC
you want to query (a) what is actually available for configuration
(before starting a domain) and (b) what you actually got (when starting
a domain). Would a dump of the current state of the ID register fields
before starting the vcpus work for (b)? Or is that too different from
what other archs need/want? How much wriggle room do we have for special
handling (different commands, different output, ...?)
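
To make (b) concrete, the comparison Jiri describes boils down to a set
difference between the features that were requested and the ones the vCPU
actually ended up with. A minimal sketch, assuming features are plain
strings; missing_features() and has_feature() are hypothetical helpers and
not part of libvirt or QEMU:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if name appears in the list of n feature names. */
static bool has_feature(const char *const *list, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/*
 * Hypothetical helper: collect the requested features that are absent
 * from the enabled set. Writes at most cap entries to missing and
 * returns how many were written.
 */
static size_t missing_features(const char *const *requested, size_t n_req,
                               const char *const *enabled, size_t n_en,
                               const char **missing, size_t cap)
{
    size_t n = 0;

    for (size_t i = 0; i < n_req && n < cap; i++) {
        if (!has_feature(enabled, n_en, requested[i])) {
            missing[n++] = requested[i];
        }
    }
    return n;
}
```

Whether the inputs are x86 feature flags or Arm ID register fields, the
management application needs both lists from a single query to record this
difference for the migration destination.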




Re: [PATCH 08/10] hw/9pfs: Allow using hw/9pfs with emscripten

2025-04-11 Thread Kohei Tokunaga
Hi Paolo,

> > Emscripten's fiber does not support submitting coroutines to other
> > threads.
>
> Does it work as long as the thread does not rewind?

The structure used by Fiber includes a thread-specific field related to
rewind [1], which prevents it from being shared across threads. The behavior
of the remaining fields in multi-threaded contexts is not documented, so
further experimentation is needed to determine whether they can be safely
shared.

[1]
https://emscripten.org/docs/api_reference/fiber.h.html#c.asyncify_data_t.rewind_id

> You can add all these to the stubs/emscripten.c file that I suggested
> elsewhere.

Sure, I'll apply this reorganization in the next version of the series.

> You could extract v9fs_co_run_in_worker()'s bodies into separate
> functions.  It is tedious but not hard; all you have to do is define
> structs for the parameters and return values of v9fs_co_*(), unpack
> them in the callback functions, and retrieve the return value in
> v9fs_co_*().  Many functions
>
> The advantage is that, instead of all the bottom half and yielding dance
> that is done by v9fs_co_run_in_worker() and co_run_in_worker_bh(), you
> can just use thread_pool_submit_co().

Thank you for the suggestion. I'll explore this approach, though it's still
unclear whether thread_pool_submit_co() can be used with Emscripten's Fiber
due to the limitations mentioned above.
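
As a rough picture of the suggested refactoring, the pattern is: bundle the
arguments and results in a struct, let a worker callback unpack it, and read
the results back in the caller. Everything below is hypothetical, including
the job struct and co_name_len(); submit_and_wait() merely stands in for a
thread-pool submission and runs the callback inline.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical argument/result bundle for one blocking operation. */
struct name_len_job {
    const char *path;   /* input */
    size_t len;         /* output */
    int ret;            /* output: 0 on success, -1 on error */
};

/* Worker callback: unpack the struct, do the work, store the results. */
static int name_len_worker(void *opaque)
{
    struct name_len_job *job = opaque;

    if (!job->path) {
        job->ret = -1;
        return -1;
    }
    job->len = strlen(job->path);
    job->ret = 0;
    return 0;
}

/* Stand-in for handing the job to a thread pool; runs inline here. */
static int submit_and_wait(int (*fn)(void *), void *opaque)
{
    return fn(opaque);
}

/* Caller side: pack the arguments, submit, then retrieve the results. */
static int co_name_len(const char *path, size_t *len)
{
    struct name_len_job job = { .path = path };

    submit_and_wait(name_len_worker, &job);
    if (job.ret == 0) {
        *len = job.len;
    }
    return job.ret;
}
```

Whether the real submission can run on another thread under Emscripten is
exactly the open question raised above; the sketch only shows the data flow.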


Re: [PATCH v1 08/24] s390x/diag: Introduce DIAG 508 for secure IPL operations

2025-04-11 Thread Thomas Huth

On 08/04/2025 17.55, Zhuoying Cai wrote:

From: Collin Walling 

In order to support secure IPL (aka secure boot) for the s390-ccw BIOS,
a new s390 DIAGNOSE instruction is introduced to leverage QEMU for
handling operations such as signature verification and certificate
retrieval.

Currently, only subcode 0 is supported with this patch, which is used to
query a bitmap of which subcodes are supported.

Signed-off-by: Collin Walling 
---
  hw/s390x/ipl.h |  1 +
  include/hw/s390x/ipl/diag508.h | 17 +
  target/s390x/diag.c| 26 ++
  target/s390x/kvm/kvm.c | 14 ++
  target/s390x/s390x-internal.h  |  2 ++
  5 files changed, 60 insertions(+)
  create mode 100644 include/hw/s390x/ipl/diag508.h

diff --git a/hw/s390x/ipl.h b/hw/s390x/ipl.h
index 822535ad76..e9ef8ddccd 100644
--- a/hw/s390x/ipl.h
+++ b/hw/s390x/ipl.h
@@ -18,6 +18,7 @@
  #include "exec/address-spaces.h"
  #include "hw/qdev-core.h"
  #include "hw/s390x/ipl/diag320.h"
+#include "hw/s390x/ipl/diag508.h"
  #include "hw/s390x/ipl/qipl.h"
  #include "qom/object.h"
  
diff --git a/include/hw/s390x/ipl/diag508.h b/include/hw/s390x/ipl/diag508.h

new file mode 100644
index 00..83c4439cb2
--- /dev/null
+++ b/include/hw/s390x/ipl/diag508.h
@@ -0,0 +1,17 @@
+/*
+ * S/390 DIAGNOSE 508 definitions and structures
+ *
+ * Copyright 2025 IBM Corp.
+ * Author(s): Collin Walling 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#ifndef S390X_DIAG508_H
+#define S390X_DIAG508_H
+
+#define DIAG_508_SUBC_QUERY_SUBC0x
+
+#endif
diff --git a/target/s390x/diag.c b/target/s390x/diag.c
index 82e4dc9e1e..ad7f4b5025 100644
--- a/target/s390x/diag.c
+++ b/target/s390x/diag.c
@@ -488,3 +488,29 @@ void handle_diag_320(CPUS390XState *env, uint64_t r1, 
uint64_t r3, uintptr_t ra)
  }
  env->regs[r1 + 1] = rc;
  }
+
+void handle_diag_508(CPUS390XState *env, uint64_t r1, uint64_t r3, uintptr_t 
ra)
+{
+uint64_t subcode = env->regs[r3];
+int rc;


Do we need to check some feature bit here? e.g. check 
s390_has_feat(S390_FEAT_DIAG_320) here, too?



+if (env->psw.mask & PSW_MASK_PSTATE) {
+s390_program_interrupt(env, PGM_PRIVILEGED, ra);
+return;
+}
+
+if ((subcode & ~0x0ULL) || (r1 & 1)) {
+s390_program_interrupt(env, PGM_SPECIFICATION, ra);
+return;
+}
+
+switch (subcode) {
+case DIAG_508_SUBC_QUERY_SUBC:
+rc = 0;
+break;
+default:
+s390_program_interrupt(env, PGM_SPECIFICATION, ra);
+return;
+}
+env->regs[r1 + 1] = rc;
+}


 Thomas




Re: [PATCH v1 09/24] s390x/diag: Implement DIAG 508 subcode 2 for signature verification

2025-04-11 Thread Thomas Huth

On 08/04/2025 17.55, Zhuoying Cai wrote:

From: Collin Walling 

DIAG 508 subcode 2 performs signature-verification on signed components.
A signed component may be a Linux kernel image, or any other signed
binary. **Verification of initrd is not supported.**

The instruction call expects two item-pairs: an address of a device
component, an address of the analogous signature file (in PKCS#7 format),
and their respective lengths. All of this data should be encapsulated
within a Diag508SignatureVerificationBlock, with the CertificateStoreInfo
fields ignored. The DIAG handler will read from the provided addresses
to retrieve the necessary data, parse the signature file, then
perform the signature-verification. Because there is no way to
correlate a specific certificate to a component, each certificate
in the store is tried until either verification succeeds, or all
certs have been exhausted.

The subcode value is denoted by setting the second-to-left-most bit of
a 2-byte field.

A return code of 1 indicates success, and the index and length of the
corresponding certificate will be set in the CertificateStoreInfo
portion of the SigVerifBlock. The following values indicate failure:

0x0402: component data is invalid
0x0502: certificate is not in x509 format
0x0602: signature is not in PKCS#7 format
0x0702: signature-verification failed
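
The "try each certificate until one verifies" behaviour described above can
be sketched as a simple loop. This is an illustration only: verify_any(),
the verify_fn callback and accept_second() are hypothetical, the two
constants mirror DIAG_508_RC_OK and DIAG_508_RC_FAIL_VERIF from the patch,
and the other return codes are not modelled.

```c
#include <stddef.h>

#define RC_OK          0x0001   /* DIAG_508_RC_OK in the patch */
#define RC_FAIL_VERIF  0x0702   /* DIAG_508_RC_FAIL_VERIF in the patch */

/* Hypothetical callback: returns 0 if certificate cert_idx verifies sig. */
typedef int (*verify_fn)(size_t cert_idx, const void *comp, const void *sig);

/*
 * Try every certificate in the store in order. On the first success,
 * report its index; if none matches, report a verification failure.
 */
static int verify_any(size_t n_certs, verify_fn verify,
                      const void *comp, const void *sig, size_t *idx_out)
{
    for (size_t i = 0; i < n_certs; i++) {
        if (verify(i, comp, sig) == 0) {
            *idx_out = i;
            return RC_OK;
        }
    }
    return RC_FAIL_VERIF;
}

/* Demo verifier for the sketch: only the certificate at index 1 matches. */
static int accept_second(size_t cert_idx, const void *comp, const void *sig)
{
    (void)comp;
    (void)sig;
    return cert_idx == 1 ? 0 : -1;
}
```

The index reported on success is what would end up in the
CertificateStoreInfo part of the block.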

Signed-off-by: Collin Walling 
---
  include/hw/s390x/ipl/diag508.h |  25 +++
  target/s390x/diag.c| 131 -
  2 files changed, 155 insertions(+), 1 deletion(-)

diff --git a/include/hw/s390x/ipl/diag508.h b/include/hw/s390x/ipl/diag508.h
index 83c4439cb2..f8f4b6398e 100644
--- a/include/hw/s390x/ipl/diag508.h
+++ b/include/hw/s390x/ipl/diag508.h
@@ -13,5 +13,30 @@
  #define S390X_DIAG508_H
  
  #define DIAG_508_SUBC_QUERY_SUBC0x

+#define DIAG_508_SUBC_SIG_VERIF 0x4000
+
+#define DIAG_508_RC_OK  0x0001
+#define DIAG_508_RC_NO_CERTS0x0102
+#define DIAG_508_RC_CERT_NOT_FOUND  0x0202
+#define DIAG_508_RC_NO_MEM_FOR_CERT 0x0302
+#define DIAG_508_RC_INVAL_COMP_DATA 0x0402
+#define DIAG_508_RC_INVAL_X509_CERT 0x0502
+#define DIAG_508_RC_INVAL_PKCS7_SIG 0x0602
+#define DIAG_508_RC_FAIL_VERIF  0x0702
+
+struct Diag508CertificateStoreInfo {
+uint8_t  idx;
+uint64_t len;
+} QEMU_PACKED;
+typedef struct Diag508CertificateStoreInfo Diag508CertificateStoreInfo;
+
+struct Diag508SignatureVerificationBlock {
+Diag508CertificateStoreInfo csi;
+uint64_t comp_len;
+uint64_t comp_addr;
+uint64_t sig_len;
+uint64_t sig_addr;
+} QEMU_PACKED;
+typedef struct Diag508SignatureVerificationBlock 
Diag508SignatureVerificationBlock;
  
  #endif

diff --git a/target/s390x/diag.c b/target/s390x/diag.c
index ad7f4b5025..cecb8bf130 100644
--- a/target/s390x/diag.c
+++ b/target/s390x/diag.c
@@ -25,6 +25,11 @@
  #include "target/s390x/kvm/pv.h"
  #include "qemu/error-report.h"
  
+#ifdef CONFIG_GNUTLS

+#include 
+#include 
+#include 
+#endif /* CONFIG_GNUTLS */
  
  int handle_diag_288(CPUS390XState *env, uint64_t r1, uint64_t r3)

  {
@@ -489,9 +494,67 @@ void handle_diag_320(CPUS390XState *env, uint64_t r1, 
uint64_t r3, uintptr_t ra)
  env->regs[r1 + 1] = rc;
  }
  
+#ifdef CONFIG_GNUTLS

+#define datum_init(datum, data, size) \
+datum = (gnutls_datum_t){data, size}
+
+static int diag_508_init_comp(gnutls_datum_t *comp,
+  Diag508SignatureVerificationBlock *svb)
+{
+uint8_t *svb_comp = NULL;
+
+if (!svb->comp_len || !svb->comp_addr) {
+error_report("No component data.");
+return -1;
+}
+
+/*
+ * corrupted size vs. prev_size in fastbins, occurs during 2nd iteration,
+ * allocating 1mil bytes.


I don't understand that comment - could you elaborate?


+ */
+svb_comp = g_malloc0(svb->comp_len);
+cpu_physical_memory_read(svb->comp_addr, svb_comp, svb->comp_len);
+
+/*
+ * Component data is not written back to the caller,
+ * so no need to do a deep copy. Comp is freed when
+ * svb is freed.
+ */
+datum_init(*comp, svb_comp, svb->comp_len);
+return 0;
+}
+
+static int diag_508_init_signature(gnutls_pkcs7_t *sig,
+   Diag508SignatureVerificationBlock *svb)
+{
+gnutls_datum_t datum_sig;
+uint8_t *svb_sig = NULL;
+
+if (!svb->sig_len || !svb->sig_addr) {
+error_report("No signature data");
+return -1;
+}
+
+svb_sig = g_malloc0(svb->sig_len);
+cpu_physical_memory_read(svb->sig_addr, svb_sig, svb->sig_len);
+
+if (gnutls_pkcs7_init(sig) < 0) {
+error_report("Failed to initialize pkcs7 data.");
+return -1;
+}
+
+datum_init(datum_sig, svb_sig, svb->sig_len);
+return gnutls_pkcs7_import(*sig, &datum_sig, GNUTLS_X509_FMT_DER);
+
+}
+#endif /* CONFIG_GNUTLS */
+
  void handle_diag_508(CPUS390XState *env, uint64_t r1, uint64_t r3, uintptr_t 
ra)
  {
+S390IPLCertific
