date:20250220

Re: [PATCH v2] vhost: Don't set vring call if guest notifier is disabled

2025-02-20 Thread Michael S. Tsirkin

On Wed, Feb 19, 2025 at 09:52:35AM +0800, oen...@gmail.com wrote:
> From: Huaitong Han 
> 
> The vring call fd is set even when the guest does not use MSIX (e.g., virtio
> PMD). This results in unnecessary CPU overhead for handling virtio interrupts.
> The previous patch only optimized the condition when query_queue_notifier was
> enabled and the vector was unset. However, if query_queue_notifier is 
> disabled,
> the vring call FD should also be unset to avoid this inefficiency.
> 
> Fixes: 96a3d98d2c ("vhost: don't set vring call if no vector")
> 
> Reported-by: Zhiyuan Yuan 
> Signed-off-by: Huaitong Han 


Fails make check:


https://gitlab.com/mstredhat/qemu/-/jobs/9201935243

how was this tested? Pls include data on testing in the commit log.

> ---
> v2: - add Fixes tag
> - cc qemu-sta...@nongnu.org
> 
>  hw/virtio/vhost.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 6aa72fd434..d17e7cc6fe 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -1342,8 +1342,8 @@ int vhost_virtqueue_start(struct vhost_dev *dev,
>  }
>  
>  if (k->query_guest_notifiers &&
> -k->query_guest_notifiers(qbus->parent) &&
> -virtio_queue_vector(vdev, idx) == VIRTIO_NO_VECTOR) {
> +(!k->query_guest_notifiers(qbus->parent) ||
> +virtio_queue_vector(vdev, idx) == VIRTIO_NO_VECTOR)) {
>  file.fd = -1;
>  r = dev->vhost_ops->vhost_set_vring_call(dev, &file);
>  if (r) {
> -- 
> 2.43.5

Re: [PULL 33/42] tcg/riscv: Require TCG_TARGET_REG_BITS == 64

2025-02-20 Thread Philippe Mathieu-Daudé


On 5/5/23 23:24, Richard Henderson wrote:

The port currently does not support "oversize" guests, which
means riscv32 can only target 32-bit guests.  We will soon be
building TCG once for all guests.  This implies that we can
only support riscv64.

Since all Linux distributions target riscv64 not riscv32,
this is not much of a restriction and simplifies the code.

The brcond2 and setcond2 opcodes are exclusive to 32-bit hosts,
so we can and should remove the stubs.

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Richard Henderson 
---
  tcg/riscv/tcg-target-con-set.h |   8 --
  tcg/riscv/tcg-target.h |  22 ++--
  tcg/riscv/tcg-target.c.inc | 232 +
  3 files changed, 72 insertions(+), 190 deletions(-)




diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 0deb33701f..dddf2486c1 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -25,11 +25,14 @@
  #ifndef RISCV_TCG_TARGET_H
  #define RISCV_TCG_TARGET_H
  
-#if __riscv_xlen == 32

-# define TCG_TARGET_REG_BITS 32
-#elif __riscv_xlen == 64
-# define TCG_TARGET_REG_BITS 64
+/*
+ * We don't support oversize guests.
+ * Since we will only build tcg once, this in turn requires a 64-bit host.
+ */
+#if __riscv_xlen != 64
+#error "unsupported code generation mode"
  #endif
+#define TCG_TARGET_REG_BITS 64
  
  #define TCG_TARGET_INSN_UNIT_SIZE 4

  #define TCG_TARGET_TLB_DISPLACEMENT_BITS 20
@@ -83,13 +86,8 @@ typedef enum {
  #define TCG_TARGET_STACK_ALIGN  16
  #define TCG_TARGET_CALL_STACK_OFFSET0
  #define TCG_TARGET_CALL_ARG_I32 TCG_CALL_ARG_NORMAL
-#if TCG_TARGET_REG_BITS == 32
-#define TCG_TARGET_CALL_ARG_I64 TCG_CALL_ARG_EVEN
-#define TCG_TARGET_CALL_ARG_I128TCG_CALL_ARG_EVEN
-#else
  #define TCG_TARGET_CALL_ARG_I64 TCG_CALL_ARG_NORMAL
  #define TCG_TARGET_CALL_ARG_I128TCG_CALL_ARG_NORMAL
-#endif
  #define TCG_TARGET_CALL_RET_I128TCG_CALL_RET_NORMAL
  
  /* optional instructions */

@@ -106,8 +104,8 @@ typedef enum {
  #define TCG_TARGET_HAS_sub2_i32 1
  #define TCG_TARGET_HAS_mulu2_i320
  #define TCG_TARGET_HAS_muls2_i320
-#define TCG_TARGET_HAS_muluh_i32(TCG_TARGET_REG_BITS == 32)
-#define TCG_TARGET_HAS_mulsh_i32(TCG_TARGET_REG_BITS == 32)
+#define TCG_TARGET_HAS_muluh_i320
+#define TCG_TARGET_HAS_mulsh_i320


Should have we squashed the following with these changes?

-- >8 --
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index f7e1ca5a56f..e395ffcdaf5 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -2323,10 +2323,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode 
opc, TCGType type,

 break;

-case INDEX_op_mulsh_i32:
 case INDEX_op_mulsh_i64:
 tcg_out_opc_reg(s, OPC_MULH, a0, a1, a2);
 break;

-case INDEX_op_muluh_i32:
 case INDEX_op_muluh_i64:
 tcg_out_opc_reg(s, OPC_MULHU, a0, a1, a2);
@@ -2399,4 +2397,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode 
opc, TCGType type,

 case INDEX_op_extu_i32_i64:
 case INDEX_op_extrl_i64_i32:
+case INDEX_op_mulsh_i32:
+case INDEX_op_muluh_i32:
 default:
 g_assert_not_reached();
@@ -2707,6 +2707,4 @@ tcg_target_op_def(TCGOpcode op, TCGType type, 
unsigned flags)


 case INDEX_op_mul_i32:
-case INDEX_op_mulsh_i32:
-case INDEX_op_muluh_i32:
 case INDEX_op_div_i32:
 case INDEX_op_divu_i32:
---


  #define TCG_TARGET_HAS_ext8s_i321
  #define TCG_TARGET_HAS_ext16s_i32   1
  #define TCG_TARGET_HAS_ext8u_i321
@@ -128,7 +126,6 @@ typedef enum {
  #define TCG_TARGET_HAS_setcond2 1
  #define TCG_TARGET_HAS_qemu_st8_i32 0
  
-#if TCG_TARGET_REG_BITS == 64

  #define TCG_TARGET_HAS_movcond_i64  0
  #define TCG_TARGET_HAS_div_i64  1
  #define TCG_TARGET_HAS_rem_i64  1
@@ -165,7 +162,6 @@ typedef enum {
  #define TCG_TARGET_HAS_muls2_i640
  #define TCG_TARGET_HAS_muluh_i641
  #define TCG_TARGET_HAS_mulsh_i641
-#endif
  
  #define TCG_TARGET_DEFAULT_MO (0)

RE: [PATCH v3 23/28] test/functional/aspeed: Introduce new function to fetch assets

2025-02-20 Thread Jamin Lin

Hi Cedric,

> 
> On 2/13/25 04:35, Jamin Lin wrote:
> > This method simplifies the process of fetching and extracting assets
> > from the Aspeed GitHub repository.
> >
> > Signed-off-by: Jamin Lin 
> > ---
> >   tests/functional/test_aarch64_aspeed.py | 9 +
> >   1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/tests/functional/test_aarch64_aspeed.py
> > b/tests/functional/test_aarch64_aspeed.py
> > index 9595498ace..f3d7c8331a 100755
> > --- a/tests/functional/test_aarch64_aspeed.py
> > +++ b/tests/functional/test_aarch64_aspeed.py
> > @@ -27,14 +27,15 @@ def do_test_aarch64_aspeed_sdk_start(self, image):
> >   wait_for_console_pattern(self, '## Loading kernel from FIT
> Image')
> >   wait_for_console_pattern(self, 'Starting kernel ...')
> >
> > -ASSET_SDK_V903_AST2700 = Asset(
> > -
> 'https://github.com/AspeedTech-BMC/openbmc/releases/download/v09.03/ast
> 2700-default-obmc.tar.gz',
> > -
> '91225f50d255e2905ba8d8e0c80b71b9d157c3609770c7a740cd786370d85a77'
> )
> 
> 
> We need to keep the ASSET_SDK_V903_AST2700 definition for pre-caching the
> assets. See :
> 
> https://qemu.readthedocs.io/en/v9.2.0/devel/testing/functional.html#asset-ha
> ndling
> 
Will rework functional test.
Thanks for your review and suggestion.
Jamin
> 
> Thanks,
> 
> C.
> 
> 
> > +def extra_aspeed_archive(self, version, file, checksum):
> > +url =
> 'https://github.com/AspeedTech-BMC/openbmc/releases/download'
> > +self.archive_extract(Asset(f'{url}/{version}/{file}',
> > + f'{checksum}'))
> >
> >   def test_aarch64_ast2700_evb_sdk_v09_03(self):
> >   self.set_machine('ast2700-evb')
> >
> > -self.archive_extract(self.ASSET_SDK_V903_AST2700)
> > +self.extra_aspeed_archive('v09.03', 'ast2700-default-obmc.tar.gz',
> > +
> > +
> '91225f50d255e2905ba8d8e0c80b71b9d157c3609770c7a740cd786370d85a77'
> )
> >
> >   num_cpu = 4
> >   uboot_size =
> > os.path.getsize(self.scratch_file('ast2700-default',

RE: [PATCH v3 18/28] hw/arm/aspeed: Add SoC and Machine Support for AST2700 A1

2025-02-20 Thread Jamin Lin

Hi Cedric, 

> Subject: Re: [PATCH v3 18/28] hw/arm/aspeed: Add SoC and Machine Support
> for AST2700 A1
> 
> On 2/13/25 04:35, Jamin Lin wrote:
> > The memory map for AST2700 A1 remains compatible with AST2700 A0.
> > However, the IRQ mapping has been updated for AST2700 A1, with GIC
> > interrupts now ranging from 192 to 201. Add a new IRQ map table for
> AST2700 A1.
> > Add "aspeed_soc_ast2700a1_class_init" to initialize the AST2700 A1 SoC.
> > Introduce "aspeed_machine_ast2700a1_evb_class_init" to initialize the
> > AST2700 A1 EVB.
> >
> > Signed-off-by: Jamin Lin 
> > ---
> >   hw/arm/aspeed.c | 13 +++
> >   hw/arm/aspeed_ast27x0.c | 80
> +
> >   2 files changed, 93 insertions(+)
> >
> > diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c index
> > 6ddfdbdeba..c0539e5950 100644
> > --- a/hw/arm/aspeed.c
> > +++ b/hw/arm/aspeed.c
> > @@ -1672,6 +1672,15 @@ static void
> aspeed_machine_ast2700a0_evb_class_init(ObjectClass *oc, void *data)
> >   mc->default_ram_size = 1 * GiB;
> >   aspeed_machine_class_init_cpus_defaults(mc);
> >   }
> > +
> > +static void aspeed_machine_ast2700a1_evb_class_init(ObjectClass *oc,
> > +void *data) {
> > +MachineClass *mc = MACHINE_CLASS(oc);
> > +AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
> > +
> > +mc->desc = "Aspeed AST2700 A1 EVB (Cortex-A35)";
> > +amc->soc_name  = "ast2700-a1";
> > +}
> >   #endif
> >
> >   static void aspeed_machine_qcom_dc_scm_v1_class_init(ObjectClass
> > *oc, @@ -1798,6 +1807,10 @@ static const TypeInfo
> aspeed_machine_types[] = {
> >   .name  = MACHINE_TYPE_NAME("ast2700a0-evb"),
> >   .parent= TYPE_ASPEED_MACHINE,
> >   .class_init= aspeed_machine_ast2700a0_evb_class_init,
> > +}, {
> > +.name  = MACHINE_TYPE_NAME("ast2700a1-evb"),
> > +.parent= MACHINE_TYPE_NAME("ast2700a0-evb"),
> 
> Shouldn't the parent be TYPE_ASPEED_MACHINE instead ?
> 

Will update it.
Thanks for your review.
Jamin

> > +.class_init= aspeed_machine_ast2700a1_evb_class_init,
> >   #endif
> >   }, {
> >   .name  = TYPE_ASPEED_MACHINE,
> > diff --git a/hw/arm/aspeed_ast27x0.c b/hw/arm/aspeed_ast27x0.c index
> > 0ccec774de..926b4c3e76 100644
> > --- a/hw/arm/aspeed_ast27x0.c
> > +++ b/hw/arm/aspeed_ast27x0.c
> > @@ -119,6 +119,52 @@ static const int aspeed_soc_ast2700a0_irqmap[] = {
> >   [ASPEED_DEV_SDHCI] = 133,
> >   };
> >
> > +static const int aspeed_soc_ast2700a1_irqmap[] = {
> > +[ASPEED_DEV_SDMC]  = 0,
> > +[ASPEED_DEV_HACE]  = 4,
> > +[ASPEED_DEV_XDMA]  = 5,
> > +[ASPEED_DEV_UART4] = 8,
> > +[ASPEED_DEV_SCU]   = 12,
> > +[ASPEED_DEV_RTC]   = 13,
> > +[ASPEED_DEV_EMMC]  = 15,
> > +[ASPEED_DEV_TIMER1]= 16,
> > +[ASPEED_DEV_TIMER2]= 17,
> > +[ASPEED_DEV_TIMER3]= 18,
> > +[ASPEED_DEV_TIMER4]= 19,
> > +[ASPEED_DEV_TIMER5]= 20,
> > +[ASPEED_DEV_TIMER6]= 21,
> > +[ASPEED_DEV_TIMER7]= 22,
> > +[ASPEED_DEV_TIMER8]= 23,
> > +[ASPEED_DEV_DP]= 28,
> > +[ASPEED_DEV_LPC]   = 192,
> > +[ASPEED_DEV_IBT]   = 192,
> > +[ASPEED_DEV_KCS]   = 192,
> > +[ASPEED_DEV_I2C]   = 194,
> > +[ASPEED_DEV_ADC]   = 194,
> > +[ASPEED_DEV_GPIO]  = 194,
> > +[ASPEED_DEV_FMC]   = 195,
> > +[ASPEED_DEV_WDT]   = 195,
> > +[ASPEED_DEV_PWM]   = 195,
> > +[ASPEED_DEV_I3C]   = 195,
> > +[ASPEED_DEV_UART0] = 196,
> > +[ASPEED_DEV_UART1] = 196,
> > +[ASPEED_DEV_UART2] = 196,
> > +[ASPEED_DEV_UART3] = 196,
> > +[ASPEED_DEV_UART5] = 196,
> > +[ASPEED_DEV_UART6] = 196,
> > +[ASPEED_DEV_UART7] = 196,
> > +[ASPEED_DEV_UART8] = 196,
> > +[ASPEED_DEV_UART9] = 196,
> > +[ASPEED_DEV_UART10]= 196,
> > +[ASPEED_DEV_UART11]= 196,
> > +[ASPEED_DEV_UART12]= 196,
> > +[ASPEED_DEV_ETH1]  = 196,
> > +[ASPEED_DEV_ETH2]  = 196,
> > +[ASPEED_DEV_ETH3]  = 196,
> > +[ASPEED_DEV_PECI]  = 197,
> > +[ASPEED_DEV_SDHCI] = 197,
> > +};
> > +
> >   /* GICINT 128 */
> >   /* GICINT 192 */
> >   static const int ast2700_gic128_gic192_intcmap[] = { @@ -838,6
> > +884,34 @@ static void aspeed_soc_ast2700a0_class_init(ObjectClass *oc,
> void *data)
> >   sc->get_irq  = aspeed_soc_ast2700_get_irq;
> >   }
> >
> > +static void aspeed_soc_ast2700a1_class_init(ObjectClass *oc, void
> > +*data) {
> > +static const char * const valid_cpu_types[] = {
> > +ARM_CPU_TYPE_NAME("cortex-a35"),
> > +NULL
> > +};
> > +DeviceClass *dc = DEVICE_CLASS(oc);
> > +AspeedSoCClass *sc = ASPEED_SOC_CLASS(oc);
> > +
> > +/* Reason: The Aspeed SoC can only be instantiated from a board */
> > +dc->user_creatable = false;
> > +dc->realize  = aspeed_soc_ast2700_realize;
> > +

RE: [PATCH v3 21/28] hw/misc/aspeed_hace: Fix boot issue in the Crypto Manager Self Test

2025-02-20 Thread Jamin Lin

Hi Cedric,

> Subject: Re: [PATCH v3 21/28] hw/misc/aspeed_hace: Fix boot issue in the
> Crypto Manager Self Test
> 
> On 2/13/25 04:35, Jamin Lin wrote:
> > Currently, it does not support the CRYPT command. Instead, it only
> > sends an interrupt to notify the firmware that the crypt command has
> completed.
> > It is a temporary workaround to resolve the boot issue in the Crypto
> > Manager Self Test.
> >
> > Signed-off-by: Jamin Lin 
> 
> Please add an AspeedHACEClass class attribute (bool) to handle this
> workaround and a comment in the code mentioning the issue.
> 
Thanks for review and suggestion.
I will add the use_crypt_workaround attribute to the AspeedHACEClass and 
introduce the use-crypt-workaround property.
Do you have any concerns, or do you have a preferred naming convention?
Thanks-Jamin

> 
> Thanks,
> 
> C.
> 
> 
> 
> > ---
> >   hw/misc/aspeed_hace.c | 12 
> >   1 file changed, 12 insertions(+)
> >
> > diff --git a/hw/misc/aspeed_hace.c b/hw/misc/aspeed_hace.c index
> > 86422cb3be..4d0999e7e9 100644
> > --- a/hw/misc/aspeed_hace.c
> > +++ b/hw/misc/aspeed_hace.c
> > @@ -59,6 +59,7 @@
> >   /* Other cmd bits */
> >   #define  HASH_IRQ_ENBIT(9)
> >   #define  HASH_SG_EN BIT(18)
> > +#define  CRYPT_IRQ_EN   BIT(12)
> >   /* Scatter-gather data list */
> >   #define SG_LIST_LEN_SIZE4
> >   #define SG_LIST_LEN_MASK0x0FFF
> > @@ -343,6 +344,13 @@ static void aspeed_hace_write(void *opaque,
> hwaddr addr, uint64_t data,
> >   qemu_irq_lower(s->irq);
> >   }
> >   }
> > +if (data & CRYPT_IRQ) {
> > +data &= ~CRYPT_IRQ;
> > +
> > +if (s->regs[addr] & CRYPT_IRQ) {
> > +qemu_irq_lower(s->irq);
> > +}
> > +}
> >   break;
> >   case R_HASH_SRC:
> >   data &= ahc->src_mask;
> > @@ -388,6 +396,10 @@ static void aspeed_hace_write(void *opaque,
> hwaddr addr, uint64_t data,
> >   case R_CRYPT_CMD:
> >   qemu_log_mask(LOG_UNIMP, "%s: Crypt commands not
> implemented\n",
> >  __func__);
> > +s->regs[R_STATUS] |= CRYPT_IRQ;
> > +if (data & CRYPT_IRQ_EN) {
> > +qemu_irq_raise(s->irq);
> > +}
> >   break;
> >   default:
> >   break;

Re: [PATCH v3 14/14] scripts/ghes_inject: add a script to generate GHES error inject

2025-02-20 Thread Markus Armbruster

Mauro Carvalho Chehab  writes:

> Em Wed, 05 Feb 2025 09:16:53 +0100
> Markus Armbruster  escreveu:

[...]

>> Sorry if this has been answered already...  why not GPL-2.0-or-later?
>> 
>> More of the same below.
>
> No particular reason. It is just that GPL-2.0 is my preferred license.
>
> I'll change the license of the three scripts to be GPL-2.0-or-later.

Thank you!

[PATCH 4/5] pcie, virtio: Remove redundant pm_cap

2025-02-20 Thread Alex Williamson

The pm_cap on the PCIExpressDevice object can be distilled down
to the new instance on the PCIDevice object.

Cc: Michael S. Tsirkin 
Cc: Marcel Apfelbaum 
Signed-off-by: Alex Williamson 
---
 hw/pci-bridge/pcie_pci_bridge.c | 1 -
 hw/virtio/virtio-pci.c  | 8 +++-
 include/hw/pci/pcie.h   | 2 --
 3 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/hw/pci-bridge/pcie_pci_bridge.c b/hw/pci-bridge/pcie_pci_bridge.c
index 9fa656b43b42..2429503cfbbf 100644
--- a/hw/pci-bridge/pcie_pci_bridge.c
+++ b/hw/pci-bridge/pcie_pci_bridge.c
@@ -56,7 +56,6 @@ static void pcie_pci_bridge_realize(PCIDevice *d, Error 
**errp)
 if (pos < 0) {
 goto pm_error;
 }
-d->exp.pm_cap = pos;
 pci_set_word(d->config + pos + PCI_PM_PMC, 0x3);
 
 pcie_cap_arifwd_init(d);
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index afe8b5551c5c..3ca3f849d391 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -2209,8 +2209,6 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error 
**errp)
 return;
 }
 
-pci_dev->exp.pm_cap = pos;
-
 /*
  * Indicates that this function complies with revision 1.2 of the
  * PCI Power Management Interface Specification.
@@ -2309,11 +2307,11 @@ static bool virtio_pci_no_soft_reset(PCIDevice *dev)
 {
 uint16_t pmcsr;
 
-if (!pci_is_express(dev) || !dev->exp.pm_cap) {
+if (!pci_is_express(dev) || !(dev->cap_present & QEMU_PCI_CAP_PM)) {
 return false;
 }
 
-pmcsr = pci_get_word(dev->config + dev->exp.pm_cap + PCI_PM_CTRL);
+pmcsr = pci_get_word(dev->config + dev->pm_cap + PCI_PM_CTRL);
 
 /*
  * When No_Soft_Reset bit is set and the device
@@ -2342,7 +2340,7 @@ static void virtio_pci_bus_reset_hold(Object *obj, 
ResetType type)
 
 if (proxy->flags & VIRTIO_PCI_FLAG_INIT_PM) {
 pci_word_test_and_clear_mask(
-dev->config + dev->exp.pm_cap + PCI_PM_CTRL,
+dev->config + dev->pm_cap + PCI_PM_CTRL,
 PCI_PM_CTRL_STATE_MASK);
 }
 }
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index b8d59732bc63..70a5de09de39 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -58,8 +58,6 @@ typedef enum {
 struct PCIExpressDevice {
 /* Offset of express capability in config space */
 uint8_t exp_cap;
-/* Offset of Power Management capability in config space */
-uint8_t pm_cap;
 
 /* SLOT */
 bool hpev_notified; /* Logical AND of conditions for hot plug event.
-- 
2.48.1

Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled

2025-02-20 Thread Michael S. Tsirkin

On Thu, Feb 20, 2025 at 04:57:51PM +0100, Eric Auger wrote:
> Hi Michael,
> 
> 
> On 2/20/25 4:25 PM, Michael S. Tsirkin wrote:
> > On Fri, Jan 31, 2025 at 10:55:26AM +0100, Eric Auger wrote:
> >> I tested [PATCH] virtio: Remove virtio devices on device_shutdown()
> >> https://lore.kernel.org/all/20240808075141.3433253-1-kirill.shute...@linux.intel.com/
> >>
> >> and it fixes my issue
> >>
> >> Eric
> >
> > To make sure, we are dropping this in favor of the guest fix (which
> > I intent to merge shortly), correct?
> 
> Yes this should be dropped in favour of the implementation of the
> shutdown callback on guest side. Did you send your patch, I did not see it.

Will send tomorrow.

> however on qemu side, there is "[PATCH v3 0/5] Fix vIOMMU reset order"
> that needs to be merged to fix qmp system_reset.


Yes that one I'm testing.

> Thanks
> 
> Eric
> >

Re: [PATCH v3 028/162] tcg: Convert sub to TCGOutOpSubtract

2025-02-20 Thread Philippe Mathieu-Daudé


Hi Richard,

On 17/2/25 00:07, Richard Henderson wrote:

Create a special subclass for sub, because two backends can
support "subtract from immediate".  Drop all backend support
for an immediate as the second operand, as we transform sub
to add during optimize.

Signed-off-by: Richard Henderson 
---
  tcg/mips/tcg-target-con-set.h|  1 -
  tcg/ppc/tcg-target-con-set.h |  3 +-
  tcg/riscv/tcg-target-con-set.h   |  1 -
  tcg/riscv/tcg-target-con-str.h   |  1 -
  tcg/tcg.c| 26 ++--
  tcg/aarch64/tcg-target.c.inc | 24 +++
  tcg/arm/tcg-target.c.inc | 29 +++---
  tcg/i386/tcg-target.c.inc| 23 +++---
  tcg/loongarch64/tcg-target.c.inc | 32 +---
  tcg/mips/tcg-target.c.inc| 31 ---
  tcg/ppc/tcg-target.c.inc | 52 +++-
  tcg/riscv/tcg-target.c.inc   | 45 +--
  tcg/s390x/tcg-target.c.inc   | 41 +++--
  tcg/sparc64/tcg-target.c.inc | 16 +++---
  tcg/tci/tcg-target.c.inc | 14 +++--
  15 files changed, 165 insertions(+), 174 deletions(-)




diff --git a/tcg/tcg.c b/tcg/tcg.c
index b740609c03..ca91a80efc 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -986,6 +986,14 @@ typedef struct TCGOutOpBinary {
  TCGReg a0, TCGReg a1, tcg_target_long a2);
  } TCGOutOpBinary;
  
+typedef struct TCGOutOpSubtract {

+TCGOutOp base;
+void (*out_rrr)(TCGContext *s, TCGType type,
+TCGReg a0, TCGReg a1, TCGReg a2);
+void (*out_rir)(TCGContext *s, TCGType type,
+TCGReg a0, tcg_target_long a1, TCGReg a2);
+} TCGOutOpSubtract;
+
  #include "tcg-target.c.inc"
  
  #ifndef CONFIG_TCG_INTERPRETER

@@ -1012,6 +1020,8 @@ static const TCGOutOp * const all_outop[NB_OPS] = {
  OUTOP(INDEX_op_nor, TCGOutOpBinary, outop_nor),
  OUTOP(INDEX_op_or, TCGOutOpBinary, outop_or),
  OUTOP(INDEX_op_orc, TCGOutOpBinary, outop_orc),
+OUTOP(INDEX_op_sub_i32, TCGOutOpSubtract, outop_sub),
+OUTOP(INDEX_op_sub_i64, TCGOutOpSubtract, outop_sub),
  OUTOP(INDEX_op_xor, TCGOutOpBinary, outop_xor),
  };
  
@@ -2231,7 +2241,6 @@ bool tcg_op_supported(TCGOpcode op, TCGType type, unsigned flags)

  case INDEX_op_st8_i32:
  case INDEX_op_st16_i32:
  case INDEX_op_st_i32:
-case INDEX_op_sub_i32:
  case INDEX_op_neg_i32:
  case INDEX_op_mul_i32:
  case INDEX_op_shl_i32:
@@ -2301,7 +2310,6 @@ bool tcg_op_supported(TCGOpcode op, TCGType type, 
unsigned flags)
  case INDEX_op_st16_i64:
  case INDEX_op_st32_i64:
  case INDEX_op_st_i64:
-case INDEX_op_sub_i64:
  case INDEX_op_neg_i64:
  case INDEX_op_mul_i64:
  case INDEX_op_shl_i64:
@@ -5442,6 +5450,20 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp 
*op)
  }
  break;
  
+case INDEX_op_sub_i32:

+case INDEX_op_sub_i64:
+{
+const TCGOutOpSubtract *out = &outop_sub;
+
+tcg_debug_assert(!const_args[2]);
+if (const_args[1]) {
+out->out_rir(s, type, new_args[0], new_args[1], new_args[2]);
+} else {
+out->out_rrr(s, type, new_args[0], new_args[1], new_args[2]);
+}
+}
+break;
+
  default:
  if (def->flags & TCG_OPF_VECTOR) {
  tcg_out_vec_op(s, op->opc, type - TCG_TYPE_V64,
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 30cad937b7..dfe67c1261 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -2205,6 +2205,17 @@ static const TCGOutOpBinary outop_orc = {
  .out_rrr = tgen_orc,
  };
  
+static void tgen_sub(TCGContext *s, TCGType type,

+ TCGReg a0, TCGReg a1, TCGReg a2)
+{
+tcg_out_insn(s, 3502, SUB, type, a0, a1, a2);
+}
+
+static const TCGOutOpSubtract outop_sub = {
+.base.static_constraint = C_O1_I2(r, r, r),
+.out_rrr = tgen_sub,
+};
+
  static void tgen_xor(TCGContext *s, TCGType type,
   TCGReg a0, TCGReg a1, TCGReg a2)
  {
@@ -2290,15 +2301,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, 
TCGType ext,
  tcg_out_ldst(s, I3312_STRX, a0, a1, a2, 3);
  break;
  
-case INDEX_op_sub_i32:

-case INDEX_op_sub_i64:
-if (c2) {
-tgen_addi(s, ext, a0, a1, -a2);


I'm a bit lost with this change here, and following patch #4
(ADD conversion).

Since tgen_addi() effectively handles both add/sub, why not
name it tgen_addsubi() like the old tcg_out_addsubi() name?

(Maybe reorder to have that SUB conv patch after ADD conv?)


-} else {
-tcg_out_insn(s, 3502, SUB, ext, a0, a1, a2);
-}
-break;
-
  case INDEX_op_neg_i64:
  case INDEX_op_neg_i32:
  tcg_out_insn(s, 3502, SUB, ext, a0, TCG_REG_XZR, a1);
@@ -3014,10 +3016,6 @@ tcg_target_op_def(TCGOpcode op, TCGType type, unsigned

Re: [PATCH v3 1/2] hw/i386/amd_iommu: Isolate AMDVI-PCI from amd-iommu device to allow full control over the PCI device creation

2025-02-20 Thread Michael S. Tsirkin

On Wed, Feb 12, 2025 at 05:44:49AM +, Suravee Suthikulpanit wrote:
> Current amd-iommu model internally creates an AMDVI-PCI device. Here is
> a snippet from info qtree:
> 
>   bus: main-system-bus
> type System
> dev: amd-iommu, id ""
>   xtsup = false
>   pci-id = ""
>   intremap = "on"
>   device-iotlb = false
>   pt = true
> ...
> dev: q35-pcihost, id ""
>   MCFG = -1 (0x)
>   pci-hole64-size = 34359738368 (32 GiB)
>   below-4g-mem-size = 134217728 (128 MiB)
>   above-4g-mem-size = 0 (0 B)
>   smm-ranges = true
>   x-pci-hole64-fix = true
>   x-config-reg-migration-enabled = true
>   bypass-iommu = false
>   bus: pcie.0
> type PCIE
> dev: AMDVI-PCI, id ""
>   addr = 01.0
>   romfile = ""
>   romsize = 4294967295 (0x)
>   rombar = -1 (0x)
>   multifunction = false
>   x-pcie-lnksta-dllla = true
>   x-pcie-extcap-init = true
>   failover_pair_id = ""
>   acpi-index = 0 (0x0)
>   x-pcie-err-unc-mask = true
>   x-pcie-ari-nextfn-1 = false
>   x-max-bounce-buffer-size = 4096 (4 KiB)
>   x-pcie-ext-tag = true
>   busnr = 0 (0x0)
>   class Class 0806, addr 00:01.0, pci id 1022: (sub 1af4:1100)
> 
> This prohibits users from specifying the PCI topology for the amd-iommu 
> device,
> which becomes a problem when trying to support VM migration since it does not
> guarantee the same enumeration of AMD IOMMU device.
> 
> Therfore, decouple the AMDVI-PCI from amd-iommu device and introduce pci-id
> parameter to link between the two devices.
> 
> For example:
>   -device AMDVI-PCI,id=iommupci0,bus=pcie.0,addr=0x05 \
>   -device amd-iommu,intremap=on,pt=on,xtsup=on,pci-id=iommupci0 \
> 
> For backward-compatibility, internally create the AMDVI-PCI device if not
> specified on the CLI.
> 
> Co-developed-by: Daniel P. Berrangé 
> Signed-off-by: Suravee Suthikulpanit 


breaks build:

https://gitlab.com/mstredhat/qemu/-/jobs/9202802751

./hw/i386/amd_iommu.c: In function ‘amdvi_sysbus_realize’:
../hw/i386/amd_iommu.c:1616:18: error: unused variable ‘dc’ 
[-Werror=unused-variable]
 1616 | DeviceClass *dc = (DeviceClass *) object_get_class(OBJECT(dev));
  |  ^~
cc1: all warnings being treated as errors



> ---
>  hw/i386/acpi-build.c |  8 +++
>  hw/i386/amd_iommu.c  | 53 +++-
>  hw/i386/amd_iommu.h  |  3 ++-
>  3 files changed, 39 insertions(+), 25 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 53b7306b43..e70eeaf577 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2333,10 +2333,10 @@ build_amd_iommu(GArray *table_data, BIOSLinker 
> *linker, const char *oem_id,
>  build_append_int_noprefix(table_data, ivhd_blob->len + 24, 2);
>  /* DeviceID */
>  build_append_int_noprefix(table_data,
> -  object_property_get_int(OBJECT(&s->pci), 
> "addr",
> +  object_property_get_int(OBJECT(s->pci), "addr",
>&error_abort), 2);
>  /* Capability offset */
> -build_append_int_noprefix(table_data, s->pci.capab_offset, 2);
> +build_append_int_noprefix(table_data, s->pci->capab_offset, 2);
>  /* IOMMU base address */
>  build_append_int_noprefix(table_data, s->mr_mmio.addr, 8);
>  /* PCI Segment Group */
> @@ -2368,10 +2368,10 @@ build_amd_iommu(GArray *table_data, BIOSLinker 
> *linker, const char *oem_id,
>  build_append_int_noprefix(table_data, ivhd_blob->len + 40, 2);
>  /* DeviceID */
>  build_append_int_noprefix(table_data,
> -  object_property_get_int(OBJECT(&s->pci), 
> "addr",
> +  object_property_get_int(OBJECT(s->pci), "addr",
>&error_abort), 2);
>  /* Capability offset */
> -build_append_int_noprefix(table_data, s->pci.capab_offset, 2);
> +build_append_int_noprefix(table_data, s->pci->capab_offset, 2);
>  /* IOMMU base address */
>  build_append_int_noprefix(table_data, s->mr_mmio.addr, 8);
>  /* PCI Segment Group */
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index 6b13ce894b..0f552bafa0 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -167,11 +167,11 @@ static void amdvi_generate_msi_interrupt(AMDVIState *s)
>  {
>  MSIMessage msg = {};
>  MemTxAttrs attrs = {
> -.requester_id = pci_requester_id(&s->pci.dev)
> +.requester_id = pci_requester_id(&s->pci->dev)
>  };
>  
> -if (msi_enabled(&s->pci.dev)) {
> -msg = msi_get_message(&s->pci.dev, 0);
> +if (msi_enabled(&s->pci->dev)) {
> +msg = msi_get_message(&s->pci->dev, 0);
>  address_space_stl_le(&address_space_memo

Re: [PATCH v3 03/14] acpi/ghes: Use HEST table offsets when preparing GHES records

2025-02-20 Thread Mauro Carvalho Chehab

Em Mon, 3 Feb 2025 15:34:23 +0100
Igor Mammedov  escreveu:

> On Fri, 31 Jan 2025 18:42:44 +0100
> Mauro Carvalho Chehab  wrote:
> 
> > There are two pointers that are needed during error injection:
> > 
> > 1. The start address of the CPER block to be stored;
> > 2. The address of the ack.
> > 
> > It is preferable to calculate them from the HEST table.  This allows
> > checking the source ID, the size of the table and the type of the
> > HEST error block structures.
> > 
> > Yet, keep the old code, as this is needed for migration purposes.
> > 
> > Signed-off-by: Mauro Carvalho Chehab 
> > ---
> >  hw/acpi/ghes.c | 132 -
> >  include/hw/acpi/ghes.h |   1 +
> >  2 files changed, 119 insertions(+), 14 deletions(-)
> > 
> > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > index 27478f2d5674..8f284fd191a6 100644
> > --- a/hw/acpi/ghes.c
> > +++ b/hw/acpi/ghes.c
> > @@ -41,6 +41,12 @@
> >  /* Address offset in Generic Address Structure(GAS) */
> >  #define GAS_ADDR_OFFSET 4
> >  
> > +/*
> > + * ACPI spec 1.0b
> > + * 5.2.3 System Description Table Header
> > + */
> > +#define ACPI_DESC_HEADER_OFFSET 36
> > +
> >  /*
> >   * The total size of Generic Error Data Entry
> >   * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> > @@ -61,6 +67,25 @@
> >   */
> >  #define ACPI_GHES_GESB_SIZE 20
> >  
> > +/*
> > + * Offsets with regards to the start of the HEST table stored at
> > + * ags->hest_addr_le,  
> 
> If I read this literary, then offsets above are not what
> declared later in this patch.
> I'd really drop this comment altogether as it's confusing,
> and rather get variables/macro naming right
> 
> > according with the memory layout map at
> > + * docs/specs/acpi_hest_ghes.rst.
> > + */  
> 
> what we need is update to above doc, describing new and old ways.
> a separate patch.

I can't see anything that should be changed at
docs/specs/acpi_hest_ghes.rst, as this series doesn't change the
firmware layout: we're still using two firmware tables:

- etc/acpi/tables, with HEST on it;
- etc/hardware_errors, with:
- error block addresses;
- read_ack registers;
- CPER records.

The only changes that this series introduce are related to how
the error generation logic navigates between HEST and hw_errors
firmware. This is not described at acpi_hest_ghes.rst, and both
ways follow ACPI specs to the letter.

The only difference is that the code which populates the CPER
record and the error/read offsets doesn't require to know how
the HEST table generation placed offsets, as it will basically
reproduce what OSPM firmware does when handling HEST events.

Thanks,
Mauro

RE: [PATCH rfcv2 00/20] intel_iommu: Enable stage-1 translation for passthrough device

2025-02-20 Thread Duan, Zhenzhong

Hi Eric,

>-Original Message-
>From: Eric Auger 
>Subject: Re: [PATCH rfcv2 00/20] intel_iommu: Enable stage-1 translation for
>passthrough device
>
>
>Hi Zhenzhong
>
>On 2/19/25 9:22 AM, Zhenzhong Duan wrote:
>> Hi,
>>
>> Per Jason Wang's suggestion, iommufd nesting series[1] is split into
>> "Enable stage-1 translation for emulated device" series and
>> "Enable stage-1 translation for passthrough device" series.
>>
>> This series is 2nd part focusing on passthrough device. We don't do
>> shadowing of guest page table for passthrough device but pass stage-1
>> page table to host side to construct a nested domain. There was some
>> effort to enable this feature in old days, see [2] for details.
>>
>> The key design is to utilize the dual-stage IOMMU translation
>> (also known as IOMMU nested translation) capability in host IOMMU.
>> As the below diagram shows, guest I/O page table pointer in GPA
>> (guest physical address) is passed to host and be used to perform
>s/be/is
>> the stage-1 address translation. Along with it, modifications to
>> present mappings in the guest I/O page table should be followed
>> with an IOTLB invalidation.
>>
>> .-.  .---.
>> |   vIOMMU|  | Guest I/O page table  |
>> | |  '---'
>> ./
>> | PASID Entry |--- PASID cache flush --+
>> '-'|
>> | |V
>> | |   I/O page table pointer in GPA
>> '-'
>> Guest
>> --| Shadow |---|
>>   vv   v
>> Host
>> .-.  ..
>> |   pIOMMU|  |  FS for GIOVA->GPA |
>> | |  ''
>> ./  |
>> | PASID Entry | V (Nested xlate)
>> '\.--.
>> | |   | SS for GPA->HPA, unmanaged domain|
>> | |   '--'
>> '-'
>> Where:
>>  - FS = First stage page tables
>>  - SS = Second stage page tables
>> 
>>
>> There are some interactions between VFIO and vIOMMU
>> * vIOMMU registers PCIIOMMUOps [set|unset]_iommu_device to PCI
>>   subsystem. VFIO calls them to register/unregister HostIOMMUDevice
>>   instance to vIOMMU at vfio device realize stage.
>> * vIOMMU calls HostIOMMUDeviceIOMMUFD interface [at|de]tach_hwpt
>>   to bind/unbind device to IOMMUFD backed domains, either nested
>>   domain or not.
>>
>> See below diagram:
>>
>> VFIO Device Intel IOMMU
>> .-. .---.
>> | | |   |
>> |   .-|PCIIOMMUOps  |.-.|
>> |   | IOMMUFD |(set_iommu_device)   || Host IOMMU  ||
>> |   | Device  |>|| Device list ||
>> |   .-|(unset_iommu_device) |.-.|
>> | | |   |   |
>> | | |   V   |
>> |   .-|  HostIOMMUDeviceIOMMUFD |  .-.  |
>> |   | IOMMUFD |(attach_hwpt)|  | Host IOMMU  |  |
>> |   | link|<|  |   Device|  |
>> |   .-|(detach_hwpt)|  .-.  |
>> | | |   |   |
>> | | |   ... |
>> .-. .---.
>>
>> Based on Yi's suggestion, this design is optimal in sharing ioas/hwpt
>> whenever possible and create new one on demand, also supports multiple
>> iommufd objects and ERRATA_772415.
>>
>> E.g., Stage-2 page table could be shared by different devices if there
>> is no conflict and devices link to same iommufd object, i.e. devices
>> under same host IOMMU can share same stage-2 page table. If there is
>> conflict, i.e. there is one device under non cache coherency mode
>> which is different from others, it requires a separate stage-2 page
>> table in non-CC mode.
>>
>> SPR platform has ERRATA_772415 which requires no readonly mappings
>> in stage-2 page table. This series supports creating VTDIOASContainer
>> with no readonly mappings. If there is a rare case that some IOMMUs
>> on a multiple IOMMU host have ERRATA_772415 and others not, this
>> design can still survive.
>>
>> See below example diagram for a full view:
>>
>>   IntelIOMMUState
>>  |
>>  V
>> .--..--..---.
>> | VTDIOA

Re: [PATCH v2 00/19] intel_iommu: Add ATS support

2025-02-20 Thread CLEMENT MATHIEU--DRIF




On 20/02/2025 22:13, Michael S. Tsirkin wrote:
> Caution: External email. Do not open attachments or click links, unless this 
> email comes from a known sender and you know the content is safe.
>
>
> On Mon, Jan 20, 2025 at 05:41:32PM +, CLEMENT MATHIEU--DRIF wrote:
>> From: Clement Mathieu--Drif 
>>
>> This patch set belongs to a list of series that add SVM support for VT-d.
>>
>> Here we focus on implementing ATS support in the IOMMU and adding a
>> PCI-level API to be used by virtual devices.
>>
>> This work is based on the VT-d specification version 4.1 (March 2023).
>>
>> Here is a link to our GitHub repository where you can find the following 
>> elements:
>>  - Qemu with all the patches for SVM
>>  - ATS
>>  - PRI
>>  - Device IOTLB invalidations
>>  - Requests with already pre-translated addresses
>>  - A demo device
>>  - A simple driver for the demo device
>>  - A userspace program (for testing and demonstration purposes)
>>
>> https://github.com/BullSequana/Qemu-in-guest-SVM-demo
>
> Fails build:
>
> https://gitlab.com/mstredhat/qemu/-/jobs/9200372388
>
> In function ‘vtd_iommu_ats_do_translate’,
>  inlined from ‘vtd_iommu_ats_request_translation’ at 
> ../hw/i386/intel_iommu.c:4778:17:
> ../hw/i386/intel_iommu.c:4758:12: error: ‘entry.target_as’ may be used 
> uninitialized [-Werror=maybe-uninitialized]
>   4758 | return entry;
>|^
> ../hw/i386/intel_iommu.c: In function ‘vtd_iommu_ats_request_translation’:
> ../hw/i386/intel_iommu.c:4745:19: note: ‘entry’ declared here
>   4745 | IOMMUTLBEntry entry;
>|   ^
> cc1: all warnings being treated as errors
>

Uh, looks like the error is only present in non-debug mode.
I'll send a v3

Thanks Michael

>
>
>> ===
>>
>> Context and design notes
>> 
>>
>> The main purpose of this work is to enable vVT-d users to make
>> translation requests to the vIOMMU as described in the PCIe Gen 5.0
>> specification (section 10). Moreover, we aim to implement a
>> PCI/Memory-level framework that could be used by other vIOMMUs
>> to implement the same features.
>>
>> What is ATS?
>> 
>>
>> ATS (Address Translation Service) is a PCIe-level protocol that
>> enables PCIe devices to query an IOMMU for virtual to physical
>> address translations in a specific address space (such as a userland
>> process address space). When a device receives translation responses
>> from an IOMMU, it may decide to store them in an internal cache,
>> often known as "ATC" (Address Translation Cache) or "Device IOTLB".
>> To keep page tables and caches consistent, the IOMMU is allowed to
>> send asynchronous invalidation requests to its client devices.
>>
>> To avoid introducing an unnecessarily complex API, this series simply
>> exposes 3 functions. The first 2 are a pair of setup functions that
>> are called to install and remove the ATS invalidation callback during
>> the initialization phase of a process. The third one will be
>> used to request translations. The callback setup API introduced in
>> this series calls the IOMMUNotifier API under the hood.
>>
>> API design
>> ''
>>
>> - int pci_register_iommu_tlb_event_notifier(PCIDevice *dev,
>>  uint32_t pasid,
>>  IOMMUNotifier *n);
>>
>> - int pci_unregister_iommu_tlb_event_notifier(PCIDevice *dev, uint32_t pasid,
>>IOMMUNotifier *n);
>>
>> - ssize_t pci_ats_request_translation_pasid(PCIDevice *dev, uint32_t pasid,
>>  bool priv_req, bool exec_req,
>>  hwaddr addr, size_t length,
>>  bool no_write,
>>  IOMMUTLBEntry *result,
>>  size_t result_length,
>>  uint32_t *err_count);
>>
>> Although device developers may want to implement custom ATC for
>> testing or performance measurement purposes, we provide a generic
>> implementation as a utility module.
>>
>> Overview
>> 
>>
>> Here are the interactions between an ATS-capable PCIe device and the vVT-d:
>>
>>
>>
>>┌───┐ ┌┐
>>│Device │ │PCI / Memory│
>>│   │ pci_ats_request_│abstraction │ iommu_ats_
>>│   │ translation_││ request_
>>│┌─┐│ pasid   │ AS lookup  │ translation
>>││Logic││>│╶╶╶>│──┐
>>│└─┘│<│<╶╶╶│<──┐  │
>>│┌─┐│ ││   │  │
>>││inv func ││<───┐││   │  │
>>│└─┘││││   │  │
>>││  │││

[PATCH v2 0/3] tests/functional: Allow running TCG plugins tests on macOS

2025-02-20 Thread Philippe Mathieu-Daudé

Pierrick kindly helped me to resolve this issue which ended
being trivial (to him!). Not tested on Windows so far.

Since v1:
- Include Pierrick's meson fix patch (adding Fixes: tag)
- Addressed Thomas review comments (config.py, os.path.join)

Philippe Mathieu-Daudé (2):
  tests/functional: Introduce the dso_suffix() helper
  tests/functional: Allow running TCG plugins tests on non-Linux/BSD
hosts

Pierrick Bouvier (1):
  plugins: add explicit dependency in functional tests

 meson.build  |  1 +
 contrib/plugins/meson.build  |  2 ++
 tests/functional/meson.build |  2 +-
 tests/functional/qemu_test/__init__.py   |  2 +-
 tests/functional/qemu_test/cmd.py|  1 -
 tests/functional/qemu_test/config.py |  6 ++
 tests/functional/test_aarch64_tcg_plugins.py | 11 ---
 tests/tcg/plugins/meson.build|  2 ++
 8 files changed, 21 insertions(+), 6 deletions(-)

-- 
2.47.1

[PATCH v2 2/3] tests/functional: Introduce the dso_suffix() helper

2025-02-20 Thread Philippe Mathieu-Daudé

Introduce a helper to get the default shared library
suffix used on the host.

Suggested-by: Pierrick Bouvier 
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Pierrick Bouvier 
---
 tests/functional/qemu_test/__init__.py | 2 +-
 tests/functional/qemu_test/cmd.py  | 1 -
 tests/functional/qemu_test/config.py   | 6 ++
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tests/functional/qemu_test/__init__.py 
b/tests/functional/qemu_test/__init__.py
index 5c972843a6d..45f7befa374 100644
--- a/tests/functional/qemu_test/__init__.py
+++ b/tests/functional/qemu_test/__init__.py
@@ -7,7 +7,7 @@
 
 
 from .asset import Asset
-from .config import BUILD_DIR
+from .config import BUILD_DIR, dso_suffix
 from .cmd import is_readable_executable_file, \
 interrupt_interactive_console_until_pattern, wait_for_console_pattern, \
 exec_command, exec_command_and_wait_for_pattern, get_qemu_img, which
diff --git a/tests/functional/qemu_test/cmd.py 
b/tests/functional/qemu_test/cmd.py
index dc5f422b77d..254e23ef748 100644
--- a/tests/functional/qemu_test/cmd.py
+++ b/tests/functional/qemu_test/cmd.py
@@ -15,7 +15,6 @@
 import os
 import os.path
 
-
 def which(tool):
 """ looks up the full path for @tool, returns None if not found
 or if @tool does not have executable permissions.
diff --git a/tests/functional/qemu_test/config.py 
b/tests/functional/qemu_test/config.py
index edd75b7fd06..0eab1baa541 100644
--- a/tests/functional/qemu_test/config.py
+++ b/tests/functional/qemu_test/config.py
@@ -13,6 +13,7 @@
 
 import os
 from pathlib import Path
+import platform
 
 
 def _source_dir():
@@ -34,3 +35,8 @@ def _build_dir():
 raise Exception("Cannot identify build dir, set QEMU_BUILD_ROOT")
 
 BUILD_DIR = _build_dir()
+
+def dso_suffix():
+'''Return the dynamic libraries suffix for the current platform'''
+DSO_SUFFIXES = { 'Linux': 'so', 'Darwin': 'dylib', 'Windows': 'dll' }
+return DSO_SUFFIXES[platform.system()]
-- 
2.47.1

[PATCH v2 3/3] tests/functional: Allow running TCG plugins tests on non-Linux/BSD hosts

2025-02-20 Thread Philippe Mathieu-Daudé

Not all platforms use the '.so' suffix for shared libraries,
which is how plugins are built. Use the recently introduced
dso_suffix() helper to get the proper host suffix.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2804
Suggested-by: Pierrick Bouvier 
Suggested-by: Daniel P. Berrangé 
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Pierrick Bouvier 
---
 tests/functional/test_aarch64_tcg_plugins.py | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/tests/functional/test_aarch64_tcg_plugins.py 
b/tests/functional/test_aarch64_tcg_plugins.py
index 7e8beacc833..5736b60545a 100755
--- a/tests/functional/test_aarch64_tcg_plugins.py
+++ b/tests/functional/test_aarch64_tcg_plugins.py
@@ -13,10 +13,11 @@
 
 import tempfile
 import mmap
+import os
 import re
 
 from qemu.machine.machine import VMLaunchFailure
-from qemu_test import LinuxKernelTest, Asset
+from qemu_test import LinuxKernelTest, Asset, dso_suffix
 
 
 class PluginKernelBase(LinuxKernelTest):
@@ -62,6 +63,10 @@ class PluginKernelNormal(PluginKernelBase):
 ('https://storage.tuxboot.com/20230331/arm64/Image'),
 'ce95a7101a5fecebe0fe630deee6bd97b32ba41bc8754090e9ad8961ea8674c7')
 
+def plugin_file(self, plugin_name):
+sfx = dso_suffix()
+return os.path.join('tests', 'tcg', 'plugins', f'{plugin_name}.{sfx}')
+
 def test_aarch64_virt_insn(self):
 self.set_machine('virt')
 self.cpu='cortex-a53'
@@ -74,7 +79,7 @@ def test_aarch64_virt_insn(self):
  suffix=".log")
 
 self.run_vm(kernel_path, kernel_command_line,
-"tests/tcg/plugins/libinsn.so", plugin_log.name,
+self.plugin_file('libinsn'), plugin_log.name,
 console_pattern)
 
 with plugin_log as lf, \
@@ -100,7 +105,7 @@ def test_aarch64_virt_insn_icount(self):
  suffix=".log")
 
 self.run_vm(kernel_path, kernel_command_line,
-"tests/tcg/plugins/libinsn.so", plugin_log.name,
+self.plugin_file('libinsn'), plugin_log.name,
 console_pattern,
 args=('-icount', 'shift=1'))
 
-- 
2.47.1

Re: [PATCH v2 2/3] tests/functional: Introduce the dso_suffix() helper

2025-02-20 Thread Philippe Mathieu-Daudé


On 20/2/25 09:02, Philippe Mathieu-Daudé wrote:

Introduce a helper to get the default shared library
suffix used on the host.

Suggested-by: Pierrick Bouvier 
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Pierrick Bouvier 
---
  tests/functional/qemu_test/__init__.py | 2 +-
  tests/functional/qemu_test/cmd.py  | 1 -
  tests/functional/qemu_test/config.py   | 6 ++
  3 files changed, 7 insertions(+), 2 deletions(-)




diff --git a/tests/functional/qemu_test/cmd.py 
b/tests/functional/qemu_test/cmd.py
index dc5f422b77d..254e23ef748 100644
--- a/tests/functional/qemu_test/cmd.py
+++ b/tests/functional/qemu_test/cmd.py
@@ -15,7 +15,6 @@
  import os
  import os.path
  
-


Oops, Alex, should I respin a v3 removing this change?


  def which(tool):
  """ looks up the full path for @tool, returns None if not found
  or if @tool does not have executable permissions.

Re: [PATCH] qapi: pluggable backend code generators

2025-02-20 Thread Markus Armbruster

Cc: John for advice on my somewhat nebulous mypy worries at the end.

Daniel P. Berrangé  writes:

> The 'qapi.backend.QAPIBackend' class defines an API contract for
> code generators. The current generator is put into a new class
> 'qapi.backend.QAPICBackend' and made to be the default impl.
>
> A custom generator can be requested using the '-k' arg which takes
> a fully qualified python class name
>
>qapi-gen.py -k the.python.module.QAPIMyBackend

I understand why that will be useful, but explaining it briefly in the
commit message wouldn't hurt.

>
> Signed-off-by: Daniel P. Berrangé 
> ---
>
> This is an impl of the idea I mentioned at:
>
>https://lists.nongnu.org/archive/html/qemu-devel/2025-02/msg03475.html
>
> With this change, it is possible for the Go generator code to live
> outside of qemu.git, invoked using:
>
>   $ PYTHONPATH=/path/to/qemu.git/scripts \
> python /path/to/qemu.git/scripts/qapi-gen.py \
>   -o somedir \
>   -k qapi.golang.golang.QAPIGoBackend \
>   /path/to/qemu.git/qga/qapi-schema.json
>
> The external app could just expect qemu.git to be checkedout somewhere
> convenient, or could use a git submodule to reference it.
>
>  scripts/qapi/backend.py | 96 +
>  scripts/qapi/main.py| 65 
>  2 files changed, 113 insertions(+), 48 deletions(-)
>  create mode 100644 scripts/qapi/backend.py
>
> diff --git a/scripts/qapi/backend.py b/scripts/qapi/backend.py
> new file mode 100644
> index 00..b6873fd2e3
> --- /dev/null
> +++ b/scripts/qapi/backend.py
> @@ -0,0 +1,96 @@
> +# This work is licensed under the terms of the GNU GPL, version 2 or later.
> +# See the COPYING file in the top-level directory.
> +
> +from abc import ABC, abstractmethod
> +from typing import Optional
> +
> +from .commands import gen_commands
> +from .common import must_match
> +from .events import gen_events
> +from .features import gen_features
> +from .introspect import gen_introspect
> +from .schema import QAPISchema
> +from .types import gen_types
> +from .visit import gen_visit
> +
> +
> +def invalid_prefix_char(prefix: str) -> Optional[str]:
> +match = must_match(r'([A-Za-z_.-][A-Za-z0-9_.-]*)?', prefix)
> +if match.end() != len(prefix):
> +return prefix[match.end()]
> +return None
> +
> +
> +class QAPIBackend(ABC):
> +
> +def run(self,
> +schema_file: str,
> +output_dir: str,
> +prefix: str,
> +unmask: bool = False,
> +builtins: bool = False,
> +gen_tracing: bool = False) -> None:
> +"""
> +Run the code generator for the given schema into the target 
> directory.
> +
> +:param schema_file: The primary QAPI schema file.
> +:param output_dir: The output directory to store generated code.
> +:param prefix: Optional C-code prefix for symbol names.
> +:param unmask: Expose non-ABI names through introspection?
> +:param builtins: Generate code for built-in types?
> +
> +:raise QAPIError: On failures.
> +"""
> +assert invalid_prefix_char(prefix) is None
> +
> +schema = QAPISchema(schema_file)

Hmm.  This makes the backend run the frontend.  Could we keep this in
main.py instead?

> +self.generate(schema, output_dir, prefix, unmask, builtins, 
> gen_tracing)
> +
> +@abstractmethod
> +def generate(self,
> + schema: QAPISchema,
> + output_dir: str,
> + prefix: str,
> + unmask: bool,
> + builtins: bool,
> + gen_tracing: bool) -> None:
> +"""
> +Generate code for the given schema into the target directory.
> +
> +:param schema: The primary QAPI schema object.
> +:param output_dir: The output directory to store generated code.
> +:param prefix: Optional C-code prefix for symbol names.
> +:param unmask: Expose non-ABI names through introspection?
> +:param builtins: Generate code for built-in types?
> +
> +:raise QAPIError: On failures.
> +"""
> +pass

pylint prefers not to pass:

scripts/qapi/backend.py:68:8: W0107: Unnecessary pass statement 
(unnecessary-pass)

> +
> +
> +class QAPICBackend(QAPIBackend):
> +
> +def generate(self,
> + schema: QAPISchema,
> + output_dir: str,
> + prefix: str,
> + unmask: bool,
> + builtins: bool,
> + gen_tracing: bool) -> None:
> +"""
> +Generate C code for the given schema into the target directory.
> +
> +:param schema_file: The primary QAPI schema file.
> +:param output_dir: The output directory to store generated code.
> +:param prefix: Optional C-code prefix for symbol names.
> +:param unmask: Expose non-ABI names through introspection?
> +:param builtins: G

Re: [PATCH v4 00/11]

2025-02-20 Thread Markus Armbruster

Daniel P. Berrangé  writes:

[...]

> When I think about the code generator and how this will all
> evolve over time, I have a strong feeling that none of this
> should be in qemu.git.

Yes, keeping it in qemu.git has its drawbacks.  Testing is awkward
there.  The coupling could cause friction.

I'll gladly support exploring alternatives.  I saw your "[PATCH] qapi:
pluggable backend code generators", and just sent my review.

Thanks!

[...]

Re: [PATCH v2 1/9] hw/char/pl011: Warn when using disabled receiver

2025-02-20 Thread Luc Michel

On 10:28 Thu 20 Feb , Philippe Mathieu-Daudé wrote:
> We shouldn't receive characters when the full UART or its
> receiver is disabled. However we don't want to break the
> possibly incomplete "my first bare metal assembly program"s,
> so we choose to simply display a warning when this occurs.
> 
> Reviewed-by: Richard Henderson 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Luc Michel 

> ---
>  hw/char/pl011.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/hw/char/pl011.c b/hw/char/pl011.c
> index 06ce851044d..12a2d4bc7bd 100644
> --- a/hw/char/pl011.c
> +++ b/hw/char/pl011.c
> @@ -85,6 +85,7 @@ DeviceState *pl011_create(hwaddr addr, qemu_irq irq, 
> Chardev *chr)
>  #define CR_OUT1 (1 << 12)
>  #define CR_RTS  (1 << 11)
>  #define CR_DTR  (1 << 10)
> +#define CR_RXE  (1 << 9)
>  #define CR_TXE  (1 << 8)
>  #define CR_LBE  (1 << 7)
>  #define CR_UARTEN   (1 << 0)
> @@ -487,6 +488,14 @@ static int pl011_can_receive(void *opaque)
>  PL011State *s = (PL011State *)opaque;
>  int r;
> 
> +if (!(s->cr & CR_UARTEN)) {
> +qemu_log_mask(LOG_GUEST_ERROR,
> +  "PL011 receiving data on disabled UART\n");
> +}
> +if (!(s->cr & CR_RXE)) {
> +qemu_log_mask(LOG_GUEST_ERROR,
> +  "PL011 receiving data on disabled RX UART\n");
> +}
>  r = s->read_count < pl011_get_fifo_depth(s);
>  trace_pl011_can_receive(s->lcr, s->read_count, r);
>  return r;
> --
> 2.47.1
> 

--

Re: [PATCH v5 0/7] hw/loongarch/virt: Add cpu hotplug support

2025-02-20 Thread bibo mao


Slightly ping.
Hope to merge to qemu 10.0 if possible :)

Regards
Bibo Mao

On 2025/2/11 上午11:08, Bibo Mao wrote:

LoongArch cpu hotplug is based on ACPI GED device, it depends on
patchset where TYPE_HOTPLUG_HANDLER interface is added in ipi and extioi
interrupt controller class for cpu hotplug event notification.
   
https://lore.kernel.org/qemu-devel/0d920309-c7ba-48d8-b46d-04ac1e38e...@linaro.org/T/#t

It can be verified with qemu command:
   qemu-system-loongarch64 -smp 2,maxcpus=16,sockets=4,cores=4,threads=1
and vcpu can be added or remove with hmp command:
   device_add la464-loongarch-cpu,socket-id=0,core-id=2,thread-id=0,id=cpu-2
   device_del cpu-2

---
v4 ... v5:
   1. Use new calculation logic about physical cpu id, the max core-id and
  thread-id is aligned up with power of 2.
   2. Remove num-cpu property with ipi and extioi interrupt controller,
  TYPE_HOTPLUG_HANDLER interface is added with the interrupt
  controllers for cpu hotplug event notification.
  
v3 ... v4:

   1. For cold-plug CPUs, move socket-id/core-id/thread-id property
  setting from preplug function to CPU object creating loop, since
  there is topo information calculation already in CPU object creating
  loop.
   2. Init interrupt pin of CPU object in cpu plug interface for both
  cold-plug CPUs and hot-plug CPUs.
   3. Apply the patch based on latest qemu version.

v2 ... v3:
   1. Use qdev_realize_and_unref() with qdev_realize() and object_unref().
   2. Set vcpus_count with 1 since vcpu object is created for every thread.
   3. Remove property hw-id, use internal variable hw_id to differentiate
  cold-plug cpus and hot-plug cpus.
   4. Add generic function virt_init_cpu_irq() to init interrupt pin
  of CPU object, used by both cold-plug and hot-plug CPUs

v1 ... v2:
   1. Add new property hw-id, property hw-id is set for cold-added CPUs,
  and property socket-id/core-id/thread-id is set for hot-added CPUs.
  The two properties can be generated from each other.
   2. Use general hotplug api such as hotplug_handler_pre_plug etc
   3. Reorganize the patch order, split the patch set into 4 small
  patches.
---
Bibo Mao (7):
   hw/loongarch/virt: Add CPU topology support
   hw/loongarch/virt: Add topo properties on CPU object
   hw/loongarch/virt: Add basic cpu plug interface framework
   hw/loongarch/virt: Implement cpu unplug interface
   hw/loongarch/virt: Implement cpu plug interface
   hw/loongarch/virt: Update the ACPI table for hotplug cpu
   hw/loongarch/virt: Enable cpu hotplug feature on virt machine

  hw/loongarch/Kconfig   |   1 +
  hw/loongarch/virt-acpi-build.c |  35 +++-
  hw/loongarch/virt.c| 284 ++---
  include/hw/loongarch/virt.h|   1 +
  target/loongarch/cpu.c |  23 +++
  target/loongarch/cpu.h |  11 ++
  6 files changed, 333 insertions(+), 22 deletions(-)

Re: [Qemu-devel] [PATCH v2 00/11] chardev: Convert IO handlers to use unsigned type

2025-02-20 Thread Philippe Mathieu-Daudé


On 12/10/18 10:20, Daniel P. Berrangé wrote:

On Fri, Oct 12, 2018 at 02:22:06AM +0200, Philippe Mathieu-Daudé wrote:

Hi Paolo,

Here are the changes you suggested in
https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg02294.html




   chardev: Simplify IOWatchPoll::fd_can_read as a GSourceFunc
   chardev: Assert backend's chr_can_read() is positive
   chardev: Let chr_sync_read() use unsigned type
   chardev: Let chr_write use unsigned type
   chardev: Let IOReadHandler use unsigned type
   chardev: Let IOCanReadHandler use unsigned type
   chardev: Let qemu_chr_fe_* use unsigned type
   chardev: Let qemu_chr_be_* use unsigned type
   chardev: FDChardev::max_size be unsigned


s/unsigned/size_t/ in all these commit messages, as it looked
like you meant 'unsigned int' with the current commit msg.


Sure.

Paolo, any blocking comment before I fix the bitrotten places?

Re: [Qemu-devel] [PATCH v2 00/11] chardev: Convert IO handlers to use unsigned type

2025-02-20 Thread Philippe Mathieu-Daudé


On 20/2/25 11:07, Philippe Mathieu-Daudé wrote:

On 12/10/18 10:20, Daniel P. Berrangé wrote:

On Fri, Oct 12, 2018 at 02:22:06AM +0200, Philippe Mathieu-Daudé wrote:

Hi Paolo,

Here are the changes you suggested in
https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg02294.html




   chardev: Simplify IOWatchPoll::fd_can_read as a GSourceFunc
   chardev: Assert backend's chr_can_read() is positive
   chardev: Let chr_sync_read() use unsigned type
   chardev: Let chr_write use unsigned type
   chardev: Let IOReadHandler use unsigned type
   chardev: Let IOCanReadHandler use unsigned type
   chardev: Let qemu_chr_fe_* use unsigned type
   chardev: Let qemu_chr_be_* use unsigned type
   chardev: FDChardev::max_size be unsigned


s/unsigned/size_t/ in all these commit messages, as it looked
like you meant 'unsigned int' with the current commit msg.


Sure.

Paolo, any blocking comment before I fix the bitrotten places?


Doh, I just noticed your comments in the last patch:
https://lore.kernel.org/qemu-devel/786ca1d4-f874-e643-b85a-20652f1c8...@redhat.com/

Re: [PATCH v2 2/3] tests/functional: Introduce the dso_suffix() helper

2025-02-20 Thread Thomas Huth


On 20/02/2025 09.02, Philippe Mathieu-Daudé wrote:

Introduce a helper to get the default shared library
suffix used on the host.

Suggested-by: Pierrick Bouvier 
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Pierrick Bouvier 
---
  tests/functional/qemu_test/__init__.py | 2 +-
  tests/functional/qemu_test/cmd.py  | 1 -
  tests/functional/qemu_test/config.py   | 6 ++
  3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tests/functional/qemu_test/__init__.py 
b/tests/functional/qemu_test/__init__.py
index 5c972843a6d..45f7befa374 100644
--- a/tests/functional/qemu_test/__init__.py
+++ b/tests/functional/qemu_test/__init__.py
@@ -7,7 +7,7 @@
  
  
  from .asset import Asset

-from .config import BUILD_DIR
+from .config import BUILD_DIR, dso_suffix
  from .cmd import is_readable_executable_file, \
  interrupt_interactive_console_until_pattern, wait_for_console_pattern, \
  exec_command, exec_command_and_wait_for_pattern, get_qemu_img, which
diff --git a/tests/functional/qemu_test/cmd.py 
b/tests/functional/qemu_test/cmd.py
index dc5f422b77d..254e23ef748 100644
--- a/tests/functional/qemu_test/cmd.py
+++ b/tests/functional/qemu_test/cmd.py
@@ -15,7 +15,6 @@
  import os
  import os.path
  
-

  def which(tool):
  """ looks up the full path for @tool, returns None if not found
  or if @tool does not have executable permissions.


Please drop the change to cmd.py now.
With that fixed:
Reviewed-by: Thomas Huth

Re: Any plan for command line '-g' option (graphical resolution and depth)?

2025-02-20 Thread Philippe Mathieu-Daudé


On 20/2/25 10:50, Paolo Bonzini wrote:

On 2/20/25 10:20, Philippe Mathieu-Daudé wrote:

Hi,

QEMU provides the global '-g' CLI option:

$ qemu-system-foo --help

   -g WxH[xDEPTH]  Set the initial graphical resolution and depth

This option is used to pass resolution/depth information to guest
firmwares in the machines defined in the following files:

   hw/ppc/mac_newworld.c
   hw/ppc/mac_oldworld.c
   hw/ppc/prep.c
   hw/ppc/spapr.c
   hw/sparc/sun4m.c
   hw/sparc64/sun4u.c

Examples:

- hw/ppc/spapr.c:1102:

   _FDT(fdt_setprop_cell(fdt, c, "qemu,graphic-width", graphic_width));

- hw/sparc64/sun4u.c:716:

   fw_cfg_add_i16(fw_cfg, FW_CFG_SPARC64_WIDTH, graphic_width);


Obviously we have default values, which are different per architecture!
Clearly these ought to be machine properties.


Yes, we should have an easy way for machine to allow, in addition to -M 
graphics=BOOLEAN, the structured property -M graphics. 
{width,height,depth,enabled}. Then '-g' can desugar to -M 
graphics.width=WW,graphics.height=HH,graphics.depth=BB.



It is also used to set TYPE_NUBUS_MACFB properties in hw/m68k/q800.c.
Here i suppose we could directly use '-global nubus-macfb.width=value'
etc. although I'm not sure it is the recommended way.

Should we start deprecating '-g' as a whole?


Only if it's unused/useless.  If the option is useful it could also be 
used for EDID, for example (adding Gerd).


Global variables are hard to track, and the one-off nature of the option 
makes it hard to figure out (without looking at the code) which machine 
support it.


However, if the properties were available and introspectable via QOM, 
then it wouldn't matter if the -g sugar remains as an isolated "case 
QEMU_OPTION_g:" in system/vl.c.  For an example see -boot/-smp or -no- 
reboot/-no-shutdown.


OK, I suppose you are pointing me to:

commit e6dba0481363ad343c5f984dd4de3dd06d79ee68
Author: Alejandro Jimenez 
Date:   Fri Dec 11 11:52:43 2020 -0500

qmp: generalize watchdog-set-action to -no-reboot/-no-shutdown

Add a QMP command to allow for the behaviors specified by the
-no-reboot and -no-shutdown command line option to be set at
runtime.
The new command is named set-action and takes optional arguments,
named  after an event, that provide a corresponding action to take.

Commit 30e863e5a7a ("qapi/qom: Add ObjectOptions for input-*") already
introduced width/height notions in InputBarrierProperties:

# @width: the width of secondary screen in pixels (default: "1920")
#
# @height: the height of secondary screen in pixels (default: "1080")

Maybe we can extract a QAPI structure including depth.

Anyway I'll have a look, thanks for your guidance!

Phil.

Re: [RFC 0/2] hw/vfio/pci: Prevent BARs from being dma mapped in d3hot state

2025-02-20 Thread Eric Auger



Hi Alex,

On 2/19/25 10:19 PM, Alex Williamson wrote:
> On Wed, 19 Feb 2025 11:58:44 -0700
> Alex Williamson  wrote:
>
>> On Wed, 19 Feb 2025 18:58:58 +0100
>> Eric Auger  wrote:
>>
>>> Since kernel commit:
>>> 2b2c651baf1c ("vfio/pci: Invalidate mmaps and block the access
>>> in D3hot power state")
>>> any attempt to do an mmap access to a BAR when the device is in d3hot
>>> state will generate a fault.
>>>
>>> On system_powerdown, if the VFIO device is translated by an IOMMU,
>>> the device is moved to D3hot state and then the vIOMMU gets disabled
>>> by the guest. As a result of this later operation, the address space is
>>> swapped from translated to untranslated. When re-enabling the aliased
>>> regions, the RAM regions are dma-mapped again and this causes DMA_MAP
>>> faults when attempting the operation on BARs.
>>>
>>> To avoid doing the remap on those BARs, we compute whether the
>>> device is in D3hot state and if so, skip the DMA MAP.  
>> Thinking on this some more, QEMU PCI code already manages the device
>> BARs appearing in the address space based on the memory enable bit in
>> the command register.  Should we do the same for PM state?
>>
>> IOW, the device going into low power state should remove the BARs from
>> the AddressSpace and waking the device should re-add them.  The BAR DMA
>> mapping should then always be consistent, whereas here nothing would
>> remap the BARs when the device is woken.
>>
>> I imagine we'd need an interface to register the PM capability with the
>> core QEMU PCI code, where address space updates are performed relative
>> to both memory enable and power status.  There might be a way to
>> implement this just for vfio-pci devices by toggling the enable state
>> of the BAR mmaps relative to PM state, but doing it at the PCI core
>> level seems like it'd provide behavior more true to physical hardware.
> I took a stab at this approach here, it doesn't obviously break
> anything in my configs, but I haven't yet tried to reproduce this exact
> scenario.
>
> https://gitlab.com/alex.williamson/qemu/-/tree/pci-pm-power-state

So if I understand correctly the BAR regions will disappear upon the
config cmd write in vfio_sub_page_bar_update_mapping(). Is that correct?
I will give it a try on my setup...
>
> There's another pm_cap on the PCIExpressDevice that needs to be
> consolidated as well, once I do some research to figure out why a
> non-express capability is tracked only by express devices and what
> they're doing with it.  Thanks,
I am not sure I get this last point though.

Thanks

Eric
>
> Alex
>

[PATCH v3 2/2] rust: add module to convert between success/-errno and io::Result

2025-02-20 Thread Paolo Bonzini

It is a common convention in QEMU to return a positive value in case of
success, and a negated errno value in case of error.  Unfortunately,
using errno portably in Rust is a bit complicated; on Unix the errno
values are supported natively by io::Error, but on Windows they are not;
so, use the libc crate.

This is a set of utility functions that are used by both chardev and
block layer bindings.

Signed-off-by: Paolo Bonzini 
---
 docs/devel/rust.rst |   1 +
 rust/qemu-api/meson.build   |   4 +
 rust/qemu-api/src/assertions.rs |  28 +++
 rust/qemu-api/src/errno.rs  | 343 
 rust/qemu-api/src/lib.rs|   1 +
 rust/qemu-api/src/prelude.rs|   2 +
 6 files changed, 379 insertions(+)
 create mode 100644 rust/qemu-api/src/errno.rs

diff --git a/docs/devel/rust.rst b/docs/devel/rust.rst
index 90958e5a306..c75dccdbb7c 100644
--- a/docs/devel/rust.rst
+++ b/docs/devel/rust.rst
@@ -179,6 +179,7 @@ module   status
 ``callbacks``complete
 ``cell`` stable
 ``c_str``complete
+``errno``complete
 ``irq``  complete
 ``memory``   stable
 ``module``   complete
diff --git a/rust/qemu-api/meson.build b/rust/qemu-api/meson.build
index 2e9c1078b9b..bcf1cf780f3 100644
--- a/rust/qemu-api/meson.build
+++ b/rust/qemu-api/meson.build
@@ -2,6 +2,8 @@ _qemu_api_cfg = run_command(rustc_args,
   '--config-headers', config_host_h, '--features', files('Cargo.toml'),
   capture: true, check: true).stdout().strip().splitlines()
 
+libc_dep = dependency('libc-0.2-rs')
+
 # _qemu_api_cfg += ['--cfg', 'feature="allocator"']
 if rustc.version().version_compare('>=1.77.0')
   _qemu_api_cfg += ['--cfg', 'has_offset_of']
@@ -22,6 +24,7 @@ _qemu_api_rs = static_library(
   'src/cell.rs',
   'src/chardev.rs',
   'src/c_str.rs',
+  'src/errno.rs',
   'src/irq.rs',
   'src/memory.rs',
   'src/module.rs',
@@ -39,6 +42,7 @@ _qemu_api_rs = static_library(
   override_options: ['rust_std=2021', 'build.rust_std=2021'],
   rust_abi: 'rust',
   rust_args: _qemu_api_cfg,
+  dependencies: libc_dep,
 )
 
 rust.test('rust-qemu-api-tests', _qemu_api_rs,
diff --git a/rust/qemu-api/src/assertions.rs b/rust/qemu-api/src/assertions.rs
index fa1a18de6fe..104dec39774 100644
--- a/rust/qemu-api/src/assertions.rs
+++ b/rust/qemu-api/src/assertions.rs
@@ -92,3 +92,31 @@ fn types_must_be_equal(_: T)
 };
 };
 }
+
+/// Assert that an expression matches a pattern.  This can also be
+/// useful to compare enums that do not implement `Eq`.
+///
+/// # Examples
+///
+/// ```
+/// # use qemu_api::assert_match;
+/// // JoinHandle does not implement `Eq`, therefore the result
+/// // does not either.
+/// let result: Result, u32> = Err(42);
+/// assert_match!(result, Err(42));
+/// ```
+#[macro_export]
+macro_rules! assert_match {
+($a:expr, $b:pat) => {
+assert!(
+match $a {
+$b => true,
+_ => false,
+},
+"{} = {:?} does not match {}",
+stringify!($a),
+$a,
+stringify!($b)
+);
+};
+}
diff --git a/rust/qemu-api/src/errno.rs b/rust/qemu-api/src/errno.rs
new file mode 100644
index 000..bfd972df951
--- /dev/null
+++ b/rust/qemu-api/src/errno.rs
@@ -0,0 +1,343 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+//! Utility functions to convert `errno` to and from
+//! [`io::Error`]/[`io::Result`]
+//!
+//! QEMU C functions often have a "positive success/negative `errno`" calling
+//! convention.  This module provides functions to portably convert an integer
+//! into an [`io::Result`] and back.
+
+use std::{convert::TryFrom, io, io::ErrorKind};
+
+/// An `errno` value that can be converted into an [`io::Error`]
+pub struct Errno(pub u16);
+
+// On Unix, from_raw_os_error takes an errno value and OS errors
+// are printed using strerror.  On Windows however it takes a
+// GetLastError() value; therefore we need to convert errno values
+// into io::Error by hand.  This is the same mapping that the
+// standard library uses to retrieve the kind of OS errors
+// (`std::sys::pal::unix::decode_error_kind`).
+impl From for ErrorKind {
+fn from(value: Errno) -> ErrorKind {
+let Errno(errno) = value;
+match i32::from(errno) {
+libc::EPERM | libc::EACCES => ErrorKind::PermissionDenied,
+libc::ENOENT => ErrorKind::NotFound,
+libc::EINTR => ErrorKind::Interrupted,
+x if x == libc::EAGAIN || x == libc::EWOULDBLOCK => 
ErrorKind::WouldBlock,
+libc::ENOMEM => ErrorKind::OutOfMemory,
+libc::EEXIST => ErrorKind::AlreadyExists,
+libc::EINVAL => ErrorKind::InvalidInput,
+libc::EPIPE => ErrorKind::BrokenPipe,
+libc::EADDRINUSE => ErrorKind::AddrInUse,
+libc::EADDRNOTAVAIL => ErrorKind::AddrNotAvailable,
+libc::ECONNABORTED => ErrorKind::ConnectionAborted,
+libc

Re: [PATCH v5 4/6] target/i386: Add feature that indicates WRMSR to BASE reg is non-serializing

2025-02-20 Thread Zhao Liu

On Thu, Feb 06, 2025 at 01:28:37PM -0600, Babu Moger wrote:
> Date: Thu, 6 Feb 2025 13:28:37 -0600
> From: Babu Moger 
> Subject: [PATCH v5 4/6] target/i386: Add feature that indicates WRMSR to
>  BASE reg is non-serializing
> X-Mailer: git-send-email 2.34.1
> 
> Add the CPUID bit indicates that a WRMSR to MSR_FS_BASE, MSR_GS_BASE, or
> MSR_KERNEL_GS_BASE is non-serializing.
> 
> CPUID_Fn8021_EAX
> BitFeature description
> 1  FsGsKernelGsBaseNonSerializing.
>WRMSR to FS_BASE, GS_BASE and KernelGSbase are non-serializing.
> 
> Link: 
> https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/programmer-references/57238.zip
> Signed-off-by: Babu Moger 
> Reviewed-by: Maksim Davydov 
> ---
>  target/i386/cpu.c | 2 +-
>  target/i386/cpu.h | 2 ++
>  2 files changed, 3 insertions(+), 1 deletion(-)

Reviewed-by: Zhao Liu

Re: [PATCH v5 5/6] target/i386: Update EPYC-Genoa for Cache property, perfmon-v2, RAS and SVM feature bits

2025-02-20 Thread Zhao Liu

> +static const CPUCaches epyc_genoa_v2_cache_info = {
> +.l1d_cache = &(CPUCacheInfo) {
> +.type = DATA_CACHE,
> +.level = 1,
> +.size = 32 * KiB,
> +.line_size = 64,
> +.associativity = 8,
> +.partitions = 1,
> +.sets = 64,
> +.lines_per_tag = 1,
> +.self_init = 1,

true.

> +.share_level = CPU_TOPOLOGY_LEVEL_CORE,
> +},
> +.l1i_cache = &(CPUCacheInfo) {
> +.type = INSTRUCTION_CACHE,
> +.level = 1,
> +.size = 32 * KiB,
> +.line_size = 64,
> +.associativity = 8,
> +.partitions = 1,
> +.sets = 64,
> +.lines_per_tag = 1,
> +.self_init = 1,

true.

Reviewed-by: Zhao Liu

Re: [PATCH v5 6/6] target/i386: Add support for EPYC-Turin model

2025-02-20 Thread Zhao Liu

> +static const CPUCaches epyc_turin_cache_info = {
> +.l1d_cache = &(CPUCacheInfo) {
> +.type = DATA_CACHE,
> +.level = 1,
> +.size = 48 * KiB,
> +.line_size = 64,
> +.associativity = 12,
> +.partitions = 1,
> +.sets = 64,
> +.lines_per_tag = 1,
> +.self_init = 1,

true.

> +.share_level = CPU_TOPOLOGY_LEVEL_CORE,
> +},
> +.l1i_cache = &(CPUCacheInfo) {
> +.type = INSTRUCTION_CACHE,
> +.level = 1,
> +.size = 32 * KiB,
> +.line_size = 64,
> +.associativity = 8,
> +.partitions = 1,
> +.sets = 64,
> +.lines_per_tag = 1,
> +.self_init = 1,

true.

Reviewed-by: Zhao Liu 

(And it would be better to add a Turin entry in 
docs/system/cpu-models-x86.rst.inc
later :-).)

Thanks,
Zhao

Re: [PATCH v4 1/9] vhost-user: Add VirtIO Shared Memory map request

2025-02-20 Thread Alyssa Ross

Albert Esteve  writes:

> @@ -192,6 +194,24 @@ typedef struct VhostUserShared {
>  unsigned char uuid[16];
>  } VhostUserShared;
>  
> +/* For the flags field of VhostUserMMap */
> +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> +
> +typedef struct {
> +/* VIRTIO Shared Memory Region ID */
> +uint8_t shmid;
> +uint8_t padding[7];
> +/* File offset */
> +uint64_t fd_offset;
> +/* Offset within the VIRTIO Shared Memory Region */
> +uint64_t shm_offset;
> +/* Size of the mapping */
> +uint64_t len;
> +/* Flags for the mmap operation, from VHOST_USER_FLAG_* */

Perhaps this should say VHOST_USER_FLAG_MAP_*?


signature.asc
Description: PGP signature

Re: [PATCH v6 0/7] hw/char/pl011: Implement TX (async) FIFO to avoid blocking the main loop

2025-02-20 Thread Peter Maydell

On Thu, 20 Feb 2025 at 10:52, Peter Maydell  wrote:
>
> On Thu, 20 Feb 2025 at 10:43, Peter Maydell  wrote:
> >
> > On Tue, 18 Feb 2025 at 13:54, Peter Maydell  
> > wrote:
> > >
> > > On Mon, 17 Feb 2025 at 14:55, Peter Maydell  
> > > wrote:
> > > >
> > > > On Sat, 8 Feb 2025 at 16:39, Philippe Mathieu-Daudé  
> > > > wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > This series add support for (async) FIFO on the transmit path
> > > > > of the PL011 UART.
> > > > >
> > > >
> > > > Applied to target-arm.next, thanks (with a couple of minor
> > > > tweaks to two of the patches).
> > >
> > > Unfortunately I seem to get failures in 'make check-functional'
> > > with the last patch of this series applied.
> >
> > I had a look at this this morning because I wondered if it
> > was a mistake in the style fixups I'd applied to the patches
> > on my end, and I found the bug fairly quickly. The problem is
> > that pl011_xmit() doesn't update the TXFE and TXFF FIFO empty/full
> > status flag bits when it removes characters from the FIFO.
> > So the guest kernel spins forever because TXFF is never unset.
> >
> > The following patch fixes this for me (and also makes us not
> > set INT_TX for the case where we couldn't send any bytes to
> > the chardev, which I noticed reading the code rather than
> > because it had any visible bad effects):
>
> Hmm, but that's clearly not the only problem -- it fixed the
> "no output at all issue", but now I see a test failure because
> of garbled console output:

> I also noticed that pl011_write_txdata() doesn't clear TXFE
> when it puts a byte into the fifo -- I'm testing to see if
> fixing that helps.

Yes, with this patch also:

--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -330,6 +330,7 @@ static void pl011_write_txdata(PL011State *s, uint8_t data)
 if (pl011_is_tx_fifo_full(s)) {
 s->flags |= PL011_FLAG_TXFF;
 }
+s->flags &= ~PL011_FLAG_TXFE;

 pl011_xmit(NULL, G_IO_OUT, s);
 }

this remaining failure is fixed and I get a clean pass (other than
the gpu test failure, but that's not related to the pl011).

-- PMM

Re: [PATCH v2 2/3] binfmt: Normalize host CPU architecture

2025-02-20 Thread Laurent Vivier


Le 27/01/2025 à 19:29, Andrea Bolognani a écrit :

Right now information regarding the family each CPU type belongs
to is recorded in two places: the large data table at the top of
the script, and the qemu_host_family() function.

We can make things better by mapping host CPU architecture to
QEMU target in the few cases where the two don't already match
and then using the data table to look up the family, same as
we're already doing for the guest CPU architecture.

Being able to reason in terms of QEMU target regardless of
whether we're looking at the host or guest CPU architecture will
come in handy to implement upcoming changes.

A couple of entries are dropped in the process: BePC and Power
Macintosh. I'm quite certain neither of those have ever been
reported as CPU architectures by Linux. I believe many more of
the entries that are carried forward could be dropped as well,
but I don't have the same level of confidence there so I
decided to play it safe just in case.

Signed-off-by: Andrea Bolognani 
---
  scripts/qemu-binfmt-conf.sh | 44 +
  1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/scripts/qemu-binfmt-conf.sh b/scripts/qemu-binfmt-conf.sh
index 426f075e31..8d9136a29f 100755
--- a/scripts/qemu-binfmt-conf.sh
+++ b/scripts/qemu-binfmt-conf.sh
@@ -144,35 +144,35 @@ 
loongarch64_magic='\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x
  
loongarch64_mask='\xff\xff\xff\xff\xff\xff\xff\xfc\x00\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff'
  loongarch64_family=loongarch
  
-qemu_get_family() {

-cpu=${HOST_ARCH:-$(uname -m)}
+# Converts the name of a host CPU architecture to the corresponding QEMU
+# target.
+#
+# FIXME: This can probably be simplified a lot by dropping most entries.
+#Remember that the script is only used on Linux, so we only need to
+#handle the strings Linux uses to report the host CPU architecture.
+qemu_normalize() {
+cpu="$1"
  case "$cpu" in
-amd64|i386|i486|i586|i686|i86pc|BePC|x86_64)
+i[3-6]86)
  echo "i386"
  ;;
-mips*)
-echo "mips"
+amd64)
+echo "x86_64"
  ;;
-"Power Macintosh"|ppc64|powerpc|ppc)


I don't know why we have "Power Macintosh" as a valid value for "uname -m".
I think it's a good idea to remove it.
It's here from the beginning
08785f48b73c ("updated so that PPC/ARM/SPARC executables are automatically launched 
when invoked")
Perhaps it's a value coming from the OpenFirmware or MkLinux.


+powerpc)
  echo "ppc"
  ;;
-ppc64el|ppc64le)
-echo "ppcle"
+ppc64el)
+echo "ppc64le"
  ;;
-arm|armel|armhf|arm64|armv[4-9]*l|aarch64)
+armel|armhf|armv[4-9]*l)
  echo "arm"
  ;;
-armeb|armv[4-9]*b|aarch64_be)
+armv[4-9]*b)
  echo "armeb"
  ;;
-sparc*)
-echo "sparc"
-;;
-riscv*)
-echo "riscv"
-;;
-loongarch*)
-echo "loongarch"
+arm64)
+echo "aarch64"
  ;;
  *)
  echo "$cpu"
@@ -309,7 +309,13 @@ EOF
  
  qemu_set_binfmts() {

  # probe cpu type
-host_family=$(qemu_get_family)
+host_cpu=$(qemu_normalize ${HOST_ARCH:-$(uname -m)})
+host_family=$(eval echo \$${host_cpu}_family)
+
+if [ "$host_family" = "" ] ; then
+echo "INTERNAL ERROR: unknown host cpu $host_cpu" 1>&2
+exit 1
+fi
  
  # register the interpreter for each cpu except for the native one
  


Reviewed-by: Laurent Vivier

Re: [PATCH v2 3/3] binfmt: Add --ignore-family option

2025-02-20 Thread Laurent Vivier


Le 27/01/2025 à 19:29, Andrea Bolognani a écrit :

Until now, the script has worked under the assumption that a
host CPU can run binaries targeting any CPU in the same family.
That's a fair enough assumption when it comes to running i386
binaries on x86_64, but it doesn't quite apply in the general
case.

For example, while riscv64 CPUs could theoretically run riscv32
applications natively, in practice there exist few (if any?)
CPUs that implement the necessary silicon; moreover, even if you
had one such CPU, your host OS would most likely not have
enabled the necessary kernel bits.

This new option gives distro packagers the ability to opt out of
the assumption, likely on a per-architecture basis, and make
things work out of the box for a larger fraction of their user
base.

As an interesting side effect, this makes it possible to enable
execution of 64-bit binaries on 32-bit CPUs of the same family,
which is a perfectly valid use case that apparently hadn't been
considered until now.

Link: https://src.fedoraproject.org/rpms/qemu/pull-request/72
Thanks: David Abdurachmanov 
Thanks: Daniel P. Berrangé 
Signed-off-by: Andrea Bolognani 
---
  scripts/qemu-binfmt-conf.sh | 19 ---
  1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/scripts/qemu-binfmt-conf.sh b/scripts/qemu-binfmt-conf.sh
index 8d9136a29f..5fd462b1d1 100755
--- a/scripts/qemu-binfmt-conf.sh
+++ b/scripts/qemu-binfmt-conf.sh
@@ -205,6 +205,9 @@ Usage: qemu-binfmt-conf.sh [--qemu-path 
PATH][--debian][--systemd CPU]
 --persistent:if yes, the interpreter is loaded when binfmt is
  configured and remains in memory. All future uses
  are cloned from the open file.
+   --ignore-family: if yes, it is assumed that the host CPU (e.g. riscv64)
+can't natively run programs targeting a CPU that is
+part of the same family (e.g. riscv32).
 --preserve-argv0 preserve argv[0]
  
  To import templates with update-binfmts, use :

@@ -337,7 +340,12 @@ qemu_set_binfmts() {
  fi
  
  if [ "$host_family" = "$family" ] ; then

-continue
+# When --ignore-family is used, we have to generate rules even
+# for targets that are in the same family as the host CPU. The
+# only exception is of course when the CPU types exactly match
+if [ "$target" = "$host_cpu" ] || [ "$IGNORE_FAMILY" = "no" ] ; 
then
+continue
+fi
  fi
  
  $BINFMT_SET

@@ -355,10 +363,11 @@ CREDENTIAL=no
  PERSISTENT=no
  PRESERVE_ARG0=no
  QEMU_SUFFIX=""
+IGNORE_FAMILY=no
  
  _longopts="debian,systemd:,qemu-path:,qemu-suffix:,exportdir:,help,credential:,\

-persistent:,preserve-argv0:"
-options=$(getopt -o ds:Q:S:e:hc:p:g:F: -l ${_longopts} -- "$@")
+persistent:,preserve-argv0:,ignore-family:"
+options=$(getopt -o ds:Q:S:e:hc:p:g:F:i: -l ${_longopts} -- "$@")
  eval set -- "$options"
  
  while true ; do

@@ -418,6 +427,10 @@ while true ; do
  shift
  PRESERVE_ARG0="$1"
  ;;
+-i|--ignore-family)
+shift
+IGNORE_FAMILY="$1"
+;;
  *)
  break
  ;;


Reviewed-by: Laurent Vivier

Re: [PATCH v5 3/6] target/i386: Update EPYC-Milan CPU model for Cache property, RAS, SVM feature bits

2025-02-20 Thread Zhao Liu

> +static const CPUCaches epyc_milan_v3_cache_info = {
> +.l1d_cache = &(CPUCacheInfo) {
> +.type = DATA_CACHE,
> +.level = 1,
> +.size = 32 * KiB,
> +.line_size = 64,
> +.associativity = 8,
> +.partitions = 1,
> +.sets = 64,
> +.lines_per_tag = 1,
> +.self_init = 1,

true.

> +.share_level = CPU_TOPOLOGY_LEVEL_CORE,
> +},
> +.l1i_cache = &(CPUCacheInfo) {
> +.type = INSTRUCTION_CACHE,
> +.level = 1,
> +.size = 32 * KiB,
> +.line_size = 64,
> +.associativity = 8,
> +.partitions = 1,
> +.sets = 64,
> +.lines_per_tag = 1,
> +.self_init = 1,

true.

Reviewed-by: Zhao Liu

Re: Any plan for command line '-g' option (graphical resolution and depth)?

2025-02-20 Thread Mark Cave-Ayland


On 20/02/2025 09:20, Philippe Mathieu-Daudé wrote:


Hi,

QEMU provides the global '-g' CLI option:

$ qemu-system-foo --help

   -g WxH[xDEPTH]  Set the initial graphical resolution and depth

This option is used to pass resolution/depth information to guest
firmwares in the machines defined in the following files:

   hw/ppc/mac_newworld.c
   hw/ppc/mac_oldworld.c
   hw/ppc/prep.c
   hw/ppc/spapr.c
   hw/sparc/sun4m.c
   hw/sparc64/sun4u.c

Examples:

- hw/ppc/spapr.c:1102:

   _FDT(fdt_setprop_cell(fdt, c, "qemu,graphic-width", graphic_width));

- hw/sparc64/sun4u.c:716:

   fw_cfg_add_i16(fw_cfg, FW_CFG_SPARC64_WIDTH, graphic_width);


Obviously we have default values, which are different per architecture!
Clearly these ought to be machine properties.


At least for VGA it is possible to use VGA via the width and height properties to set 
the initial values i.e. on the command line with -device VGA,width=1024,height=768 
and then use an EDID parser to get the information in the firmware. But then we hit 
the same problem of how to set these properties from the command line on in-built 
devices again.



It is also used to set TYPE_NUBUS_MACFB properties in hw/m68k/q800.c.
Here i suppose we could directly use '-global nubus-macfb.width=value'
etc. although I'm not sure it is the recommended way.


This approach should work, however -global feels more like a big hammer as opposed to 
a proper solution. In particular it doesn't work if you have e.g. multiple screens 
that you want set to different resolutions.



Should we start deprecating '-g' as a whole?


I think there should be a better way to consolidate the default resolution selection 
on display devices (particularly in the case of multiple displays), so yes.



ATB,

Mark.

[PATCH v2 9/9] hw/char/sh_serial: Return correct number of empty RX FIFO elements

2025-02-20 Thread Philippe Mathieu-Daudé

In the IOCanReadHandler sh_serial_can_receive(), if the Serial
Control Register 'Receive Enable' bit is set (bit 4), then we
return a size of (1 << 4) which happens to be equal to 16, so
effectively SH_RX_FIFO_LENGTH.

The IOReadHandler, sh_serial_receive1() takes care to receive
multiple chars, but if the FIFO is partly filled, we only process
the number of free slots in the FIFO, discarding the other chars!

Fix by returning how many elements the FIFO can queue in the
IOCanReadHandler, so we don't have to process more than that in
the IOReadHandler, thus not discarding anything.

Remove the now unnecessary check on 's->rx_cnt < SH_RX_FIFO_LENGTH'
in IOReadHandler, reducing the block indentation.

Fixes: 63242a007a1 ("SH4: Serial controller improvement")
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Luc Michel 
---
 hw/char/sh_serial.c | 30 ++
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/hw/char/sh_serial.c b/hw/char/sh_serial.c
index 247aeb071ac..41c8175a638 100644
--- a/hw/char/sh_serial.c
+++ b/hw/char/sh_serial.c
@@ -320,7 +320,7 @@ static uint64_t sh_serial_read(void *opaque, hwaddr offs,
 
 static int sh_serial_can_receive(SHSerialState *s)
 {
-return s->scr & (1 << 4);
+return s->scr & (1 << 4) ? SH_RX_FIFO_LENGTH - s->rx_head : 0;
 }
 
 static void sh_serial_receive_break(SHSerialState *s)
@@ -353,22 +353,20 @@ static void sh_serial_receive1(void *opaque, const 
uint8_t *buf, int size)
 if (s->feat & SH_SERIAL_FEAT_SCIF) {
 int i;
 for (i = 0; i < size; i++) {
-if (s->rx_cnt < SH_RX_FIFO_LENGTH) {
-s->rx_fifo[s->rx_head++] = buf[i];
-if (s->rx_head == SH_RX_FIFO_LENGTH) {
-s->rx_head = 0;
-}
-s->rx_cnt++;
-if (s->rx_cnt >= s->rtrg) {
-s->flags |= SH_SERIAL_FLAG_RDF;
-if (s->scr & (1 << 6) && s->rxi) {
-timer_del(&s->fifo_timeout_timer);
-qemu_set_irq(s->rxi, 1);
-}
-} else {
-timer_mod(&s->fifo_timeout_timer,
-qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 15 * s->etu);
+s->rx_fifo[s->rx_head++] = buf[i];
+if (s->rx_head == SH_RX_FIFO_LENGTH) {
+s->rx_head = 0;
+}
+s->rx_cnt++;
+if (s->rx_cnt >= s->rtrg) {
+s->flags |= SH_SERIAL_FLAG_RDF;
+if (s->scr & (1 << 6) && s->rxi) {
+timer_del(&s->fifo_timeout_timer);
+qemu_set_irq(s->rxi, 1);
 }
+} else {
+timer_mod(&s->fifo_timeout_timer,
+qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 15 * s->etu);
 }
 }
 } else {
-- 
2.47.1

[PATCH v2 4/9] hw/char/pl011: Really use RX FIFO depth

2025-02-20 Thread Philippe Mathieu-Daudé

While we model a 16-elements RX FIFO since the PL011 model was
introduced in commit cdbdb648b7c ("ARM Versatile Platform Baseboard
emulation"), we only read 1 char at a time!

Have the IOCanReadHandler handler return how many elements are
available, and use that in the IOReadHandler handler.

Example of FIFO better used by enabling the pl011 tracing events
and running the tests/functional/test_aarch64_virt.py tests:

  pl011_can_receive LCR 0x70, RX FIFO used 0/16, can_receive 16 chars
  pl011_receive recv 5 chars
  pl011_fifo_rx_put RX FIFO push char [0x72] 1/16 depth used
  pl011_irq_state irq state 1
  pl011_fifo_rx_put RX FIFO push char [0x6f] 2/16 depth used
  pl011_fifo_rx_put RX FIFO push char [0x6f] 3/16 depth used
  pl011_fifo_rx_put RX FIFO push char [0x74] 4/16 depth used
  pl011_fifo_rx_put RX FIFO push char [0x0d] 5/16 depth used
  pl011_can_receive LCR 0x70, RX FIFO used 5/16, can_receive 11 chars
  pl011_can_receive LCR 0x70, RX FIFO used 5/16, can_receive 11 chars
  pl011_write addr 0x038 value 0x0050 reg IMSC
  pl011_irq_state irq state 1
  pl011_can_receive LCR 0x70, RX FIFO used 5/16, can_receive 11 chars
  pl011_read addr 0x03c value 0x0030 reg RIS
  pl011_write addr 0x044 value 0x reg ICR
  pl011_irq_state irq state 1
  pl011_read addr 0x018 value 0x0080 reg FR
  pl011_read_fifo RX FIFO read, used 4/16
  pl011_irq_state irq state 1
  pl011_read addr 0x000 value 0x0072 reg DR
  pl011_can_receive LCR 0x70, RX FIFO used 4/16, can_receive 12 chars
  pl011_read addr 0x018 value 0x0080 reg FR
  pl011_read_fifo RX FIFO read, used 3/16
  pl011_irq_state irq state 1
  pl011_read addr 0x000 value 0x006f reg DR
  pl011_can_receive LCR 0x70, RX FIFO used 3/16, can_receive 13 chars
  pl011_read addr 0x018 value 0x0080 reg FR
  pl011_read_fifo RX FIFO read, used 2/16
  pl011_irq_state irq state 1
  pl011_read addr 0x000 value 0x006f reg DR
  pl011_can_receive LCR 0x70, RX FIFO used 2/16, can_receive 14 chars
  pl011_read addr 0x018 value 0x0080 reg FR
  pl011_read_fifo RX FIFO read, used 1/16
  pl011_irq_state irq state 1
  pl011_read addr 0x000 value 0x0074 reg DR
  pl011_can_receive LCR 0x70, RX FIFO used 1/16, can_receive 15 chars
  pl011_read addr 0x018 value 0x0080 reg FR
  pl011_read_fifo RX FIFO read, used 0/16
  pl011_irq_state irq state 0
  pl011_read addr 0x000 value 0x000d reg DR
  pl011_can_receive LCR 0x70, RX FIFO used 0/16, can_receive 16 chars
  pl011_read addr 0x018 value 0x0090 reg FR
  pl011_read addr 0x03c value 0x0020 reg RIS
  pl011_write addr 0x038 value 0x0050 reg IMSC
  pl011_irq_state irq state 0
  pl011_can_receive LCR 0x70, RX FIFO used 0/16, can_receive 16 chars
  pl011_can_receive LCR 0x70, RX FIFO used 0/16, can_receive 16 chars
  pl011_read addr 0x018 value 0x0090 reg FR
  pl011_write addr 0x000 value 0x0072 reg DR

Inspired-by: Peter Maydell 
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Luc Michel 
---
 hw/char/pl011.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/hw/char/pl011.c b/hw/char/pl011.c
index f7485e7c541..23a9db8c57c 100644
--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -489,7 +489,6 @@ static int pl011_can_receive(void *opaque)
 PL011State *s = (PL011State *)opaque;
 unsigned fifo_depth = pl011_get_fifo_depth(s);
 unsigned fifo_available = fifo_depth - s->read_count;
-int r = fifo_available ? 1 : 0;
 
 if (!(s->cr & CR_UARTEN)) {
 qemu_log_mask(LOG_GUEST_ERROR,
@@ -500,7 +499,8 @@ static int pl011_can_receive(void *opaque)
   "PL011 receiving data on disabled RX UART\n");
 }
 trace_pl011_can_receive(s->lcr, s->read_count, fifo_depth, fifo_available);
-return r;
+
+return fifo_available;
 }
 
 static void pl011_receive(void *opaque, const uint8_t *buf, int size)
@@ -515,7 +515,9 @@ static void pl011_receive(void *opaque, const uint8_t *buf, 
int size)
 return;
 }
 
-pl011_fifo_rx_put(opaque, *buf);
+for (int i = 0; i < size; i++) {
+pl011_fifo_rx_put(opaque, buf[i]);
+}
 }
 
 static void pl011_event(void *opaque, QEMUChrEvent event)
-- 
2.47.1

[PATCH v2 8/9] hw/char/mcf_uart: Really use RX FIFO depth

2025-02-20 Thread Philippe Mathieu-Daudé

While we model a 4-elements RX FIFO since the MCF UART model
was introduced in commit 20dcee94833 ("MCF5208 emulation"),
we only read 1 char at a time!

Have the IOCanReadHandler handler return how many elements are
available, and use that in the IOReadHandler handler.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Luc Michel 
---
 hw/char/mcf_uart.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/char/mcf_uart.c b/hw/char/mcf_uart.c
index 95f269ee9b7..529c26be93a 100644
--- a/hw/char/mcf_uart.c
+++ b/hw/char/mcf_uart.c
@@ -281,14 +281,16 @@ static int mcf_uart_can_receive(void *opaque)
 {
 mcf_uart_state *s = (mcf_uart_state *)opaque;
 
-return s->rx_enabled && (s->sr & MCF_UART_FFULL) == 0;
+return s->rx_enabled ? FIFO_DEPTH - s->fifo_len : 0;
 }
 
 static void mcf_uart_receive(void *opaque, const uint8_t *buf, int size)
 {
 mcf_uart_state *s = (mcf_uart_state *)opaque;
 
-mcf_uart_push_byte(s, buf[0]);
+for (int i = 0; i < size; i++) {
+mcf_uart_push_byte(s, buf[i]);
+}
 }
 
 static const MemoryRegionOps mcf_uart_ops = {
-- 
2.47.1

Re: [PATCH 2/2] [NOT-FOR-MERGE] Add qtest for migration over RDMA

2025-02-20 Thread Li Zhijian via

On 19/02/2025 22:11, Peter Xu wrote:
> then
> in the test it tries to detect rdma link and fetch the ip only
 It should work without root permission if we just*detect*  and*fetch ip*.

 Do you also mean we can split new-rdma-link.sh to 2 separate scripts
 - add-rdma-link.sh # optionally, execute by user before the test (require 
 root permission)
 - detect-fetch-rdma.sh # execute from the migration-test
>>> Hmm indeed we still need a script to scan over all the ports..
>>>
>>> If having --rdma is a good idea, maybe we can further make it a parameter
>>> to --rdma?
>>>
>>>$ migration-test --rdma $RDMA_IP
>>>
>>> Or:
>>>
>>>$ migration-test --rdma-ip $RDMA_IP
>> I think --rdma only makes sense if it's going to do something
>> special. The optmimal scenario is that it always runs the test when it
>> can and sets up/tears down anything it needs.
>>
>> If it needs root, I'd prefer the test informs about this and does the
>> work itself.
>>
>> It would also be good to have the add + detect separate so we have more
>> flexibility, maybe we manage to enable this in CI even.
>>
>> So:
>>
>> ./add.sh
>> migration-test
>> (runs detect.sh + runs rdma test)
>> (leaves stuff behind)
>>
>> migration-test
>> (skips rdma test with message that it needs root)
>>
>> sudo migration-test
>> (runs add.sh + detect.sh + runs rdma test)
>> (cleans itself up)
>>
>> Does that make sense to you? I hope it's not too much work.
> Looks good here.  We can also keep all the rdma stuff into one file, taking
> parameters.
> 
> ./rdma-helper.sh setup
> ./rdma-helper.sh detect-ip

Hi Peter and Fabiano

Many thanks for your kindly idea and suggestion.
Please take another look at the changes below.
- I don't copy script to the build dir, just execute the script like 
misc-tests.c
- It will automatically create a new RXE if it doesn't exit when running in root

[PATCH] migration: Add qtest for migration over RDMA

This qtest requires there is RDMA(RoCE) link in the host.
Introduce a scripts/rdma-migration-helper.sh to
- setup a new RXE if it's root
- detect existing RoCE link
to make the qtest work smoothly.

Test will be skip if there is no available RoCE link.
 # Start of rdma tests
 # Running /x86_64/migration/precopy/rdma/plain
 ok 1 /x86_64/migration/precopy/rdma/plain # SKIP There is no available rdma 
link in the host.
 Maybe you are not running with the root permission
 # End of rdma tests

Admin is able to remove the RXE by passing 'cleanup' to this script.

Signed-off-by: Li Zhijian 
---
 scripts/rdma-migration-helper.sh  | 40 +++
 tests/qtest/migration/precopy-tests.c | 57 +++
 2 files changed, 97 insertions(+)
 create mode 100755 scripts/rdma-migration-helper.sh

diff --git a/scripts/rdma-migration-helper.sh b/scripts/rdma-migration-helper.sh
new file mode 100755
index 00..4ef62baf0f
--- /dev/null
+++ b/scripts/rdma-migration-helper.sh
@@ -0,0 +1,40 @@
+#!/bin/bash
+
+# Copied from blktests
+get_ipv4_addr() {
+ip -4 -o addr show dev "$1" |
+sed -n 's/.*[[:blank:]]inet[[:blank:]]*\([^[:blank:]/]*\).*/\1/p' |
+tr -d '\n'
+}
+
+has_soft_rdma() {
+rdma link | grep -q " netdev $1[[:blank:]]*\$"
+}
+
+rdma_rxe_setup_detect()
+{
+(
+cd /sys/class/net &&
+for i in *; do
+[ -e "$i" ] || continue
+[ "$i" = "lo" ] && continue
+[ "$(<"$i/addr_len")" = 6 ] || continue
+[ "$(<"$i/carrier")" = 1 ] || continue
+
+has_soft_rdma "$i" && break
+[ "$operation" = "setup" ] && rdma link add "${i}_rxe" type 
rxe netdev "$i" && break
+done
+has_soft_rdma "$i" || return
+get_ipv4_addr $i
+)
+}
+
+operation=${1:-setup}
+
+if [ "$operation" == "setup" ] || [ "$operation" == "detect" ]; then
+rdma_rxe_setup_detect
+elif [ "$operation" == "cleanup" ]; then
+modprobe -r rdma_rxe
+else
+echo "Usage: $0 [setup | cleanup | detect]"
+fi
diff --git a/tests/qtest/migration/precopy-tests.c 
b/tests/qtest/migration/precopy-tests.c
index ba273d10b9..8c72eb699b 100644
--- a/tests/qtest/migration/precopy-tests.c
+++ b/tests/qtest/migration/precopy-tests.c
@@ -99,6 +99,59 @@ static void test_precopy_unix_dirty_ring(void)
 test_precopy_common(&args);
 }
 
+#ifdef CONFIG_RDMA
+
+#define RDMA_MIGRATION_HELPER "scripts/rdma-migration-helper.sh"
+static int new_rdma_link(char *buffer) {
+const char *argument = (geteuid() == 0) ? "setup" : "detect";
+char command[1024];
+
+snprintf(command, sizeof(command), "%s %s", RDMA_MIGRATION_HELPER, 
argument);
+
+FILE *pipe = popen(command, "r");
+if (pipe == NULL) {
+perror("Failed to run script");
+return -1;
+}
+
+int idx = 0;
+while (fgets(buffer + idx, 128 - idx, pipe) != NULL) {
+idx += strlen(buffer);
+}
+
+int status = pclose(pipe);
+if (status == -1) {
+perror("Error reported b

Re: [PATCH v5 1/6] target/i386: Update EPYC CPU model for Cache property, RAS, SVM feature bits

2025-02-20 Thread Zhao Liu

> +static CPUCaches epyc_v5_cache_info = {
> +.l1d_cache = &(CPUCacheInfo) {
> +.type = DATA_CACHE,
> +.level = 1,
> +.size = 32 * KiB,
> +.line_size = 64,
> +.associativity = 8,
> +.partitions = 1,
> +.sets = 64,
> +.lines_per_tag = 1,
> +.self_init = 1,

For consistency as the below parts, it's better to code `true` for all
boolean types.

> +.share_level = CPU_TOPOLOGY_LEVEL_CORE,
> +},
> +.l1i_cache = &(CPUCacheInfo) {
> +.type = INSTRUCTION_CACHE,
> +.level = 1,
> +.size = 64 * KiB,
> +.line_size = 64,
> +.associativity = 4,
> +.partitions = 1,
> +.sets = 256,
> +.lines_per_tag = 1,
> +.self_init = 1,

ditto.

Others are fine for me, so,

Reviewed-by: Zhao Liu 


And one more thing :-) ...

>  static const CPUCaches epyc_rome_cache_info = {
>  .l1d_cache = &(CPUCacheInfo) {
>  .type = DATA_CACHE,
> @@ -5207,6 +5261,25 @@ static const X86CPUDefinition builtin_x86_defs[] = {
>  },
>  .cache_info = &epyc_v4_cache_info
>  },
> +{
> +.version = 5,
> +.props = (PropValue[]) {
> +{ "overflow-recov", "on" },
> +{ "succor", "on" },

When I checks the "overflow-recov" and "succor" enabling, I find these 2
bits are set unconditionally.

I'm not sure if all AMD platforms support both bits, do you think it's
necessary to check the host support?

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 6c749d4ee812..03e463076632 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -555,7 +555,10 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
 cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
 ret |= cpuid_1_edx & CPUID_EXT2_AMD_ALIASES;
 } else if (function == 0x8007 && reg == R_EBX) {
-ret |= CPUID_8000_0007_EBX_OVERFLOW_RECOV | CPUID_8000_0007_EBX_SUCCOR;
+uint32_t ebx;
+host_cpuid(0x8007, 0, &unused, &ebx, &unused, &unused);
+
+ret |= ebx & (CPUID_8000_0007_EBX_OVERFLOW_RECOV | 
CPUID_8000_0007_EBX_SUCCOR);
 } else if (function == KVM_CPUID_FEATURES && reg == R_EAX) {
 /* kvm_pv_unhalt is reported by GET_SUPPORTED_CPUID, but it can't
  * be enabled without the in-kernel irqchip

Thanks,
Zhao

Re: [PATCH v6 0/7] hw/char/pl011: Implement TX (async) FIFO to avoid blocking the main loop

2025-02-20 Thread Peter Maydell

On Tue, 18 Feb 2025 at 13:54, Peter Maydell  wrote:
>
> On Mon, 17 Feb 2025 at 14:55, Peter Maydell  wrote:
> >
> > On Sat, 8 Feb 2025 at 16:39, Philippe Mathieu-Daudé  
> > wrote:
> > >
> > > Hi,
> > >
> > > This series add support for (async) FIFO on the transmit path
> > > of the PL011 UART.
> > >
> >
> > Applied to target-arm.next, thanks (with a couple of minor
> > tweaks to two of the patches).
>
> Unfortunately I seem to get failures in 'make check-functional'
> with the last patch of this series applied.

I had a look at this this morning because I wondered if it
was a mistake in the style fixups I'd applied to the patches
on my end, and I found the bug fairly quickly. The problem is
that pl011_xmit() doesn't update the TXFE and TXFF FIFO empty/full
status flag bits when it removes characters from the FIFO.
So the guest kernel spins forever because TXFF is never unset.

The following patch fixes this for me (and also makes us not
set INT_TX for the case where we couldn't send any bytes to
the chardev, which I noticed reading the code rather than
because it had any visible bad effects):

--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -256,6 +256,7 @@ static gboolean pl011_xmit(void *do_not_use,
GIOCondition cond, void *opaque)
 int bytes_consumed;
 uint8_t buf[PL011_FIFO_DEPTH];
 uint32_t count;
+bool emptied_fifo;

 count = fifo8_num_used(&s->xmit_fifo);
 trace_pl011_fifo_tx_xmit_used(count);
@@ -280,15 +281,24 @@ static gboolean pl011_xmit(void *do_not_use,
GIOCondition cond, void *opaque)
 /* Error in back-end: drain the fifo. */
 pl011_drain_tx(s);
 return G_SOURCE_REMOVE;
+} else if (bytes_consumed == 0) {
+/* Couldn't send anything, try again later */
+return G_SOURCE_CONTINUE;
 }

 /* Pop the data we could transmit. */
 fifo8_drop(&s->xmit_fifo, bytes_consumed);
 s->int_level |= INT_TX;
+s->flags &= ~PL011_FLAG_TXFF;
+
+emptied_fifo = fifo8_is_empty(&s->xmit_fifo);
+if (emptied_fifo) {
+s->flags |= PL011_FLAG_TXFE;
+}

 pl011_update(s);

-if (!fifo8_is_empty(&s->xmit_fifo)) {
+if (!emptied_fifo) {
 /* Reschedule another transmission if we couldn't transmit all. */
 return G_SOURCE_CONTINUE;
 }

If you're OK with that as a fix then I'll squash it in
and keep the series in target-arm.next.

thanks
-- PMM

Re: [PATCH v2 1/3] binfmt: Shuffle things around

2025-02-20 Thread Laurent Vivier


Le 27/01/2025 à 19:29, Andrea Bolognani a écrit :

This should make no difference from the functional point of
view and it's just preparation for upcoming changes.

Signed-off-by: Andrea Bolognani 
---
  scripts/qemu-binfmt-conf.sh | 17 ++---
  1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/scripts/qemu-binfmt-conf.sh b/scripts/qemu-binfmt-conf.sh
index 6ef9f118d9..426f075e31 100755
--- a/scripts/qemu-binfmt-conf.sh
+++ b/scripts/qemu-binfmt-conf.sh
@@ -318,20 +318,23 @@ qemu_set_binfmts() {
  mask=$(eval echo \$${cpu}_mask)
  family=$(eval echo \$${cpu}_family)
  
+target="$cpu"

+if [ "$cpu" = "i486" ] ; then
+target="i386"
+fi
+
+qemu="$QEMU_PATH/qemu-$target$QEMU_SUFFIX"
+
  if [ "$magic" = "" ] || [ "$mask" = "" ] || [ "$family" = "" ] ; then
  echo "INTERNAL ERROR: unknown cpu $cpu" 1>&2
  continue
  fi
  
-qemu="$QEMU_PATH/qemu-$cpu"

-if [ "$cpu" = "i486" ] ; then
-qemu="$QEMU_PATH/qemu-i386"
+if [ "$host_family" = "$family" ] ; then
+continue
  fi
  
-qemu="$qemu$QEMU_SUFFIX"

-if [ "$host_family" != "$family" ] ; then
-$BINFMT_SET
-fi
+$BINFMT_SET
  done
  }
  


Reviewed-by: Laurent Vivier

Re: [RFC 0/2] hw/vfio/pci: Prevent BARs from being dma mapped in d3hot state

2025-02-20 Thread Eric Auger

Hi Alex,

On 2/20/25 11:31 AM, Eric Auger wrote:
> 
> Hi Alex,
> 
> On 2/19/25 10:19 PM, Alex Williamson wrote:
>> On Wed, 19 Feb 2025 11:58:44 -0700
>> Alex Williamson  wrote:
>>
>>> On Wed, 19 Feb 2025 18:58:58 +0100
>>> Eric Auger  wrote:
>>>
 Since kernel commit:
 2b2c651baf1c ("vfio/pci: Invalidate mmaps and block the access
 in D3hot power state")
 any attempt to do an mmap access to a BAR when the device is in d3hot
 state will generate a fault.

 On system_powerdown, if the VFIO device is translated by an IOMMU,
 the device is moved to D3hot state and then the vIOMMU gets disabled
 by the guest. As a result of this later operation, the address space is
 swapped from translated to untranslated. When re-enabling the aliased
 regions, the RAM regions are dma-mapped again and this causes DMA_MAP
 faults when attempting the operation on BARs.

 To avoid doing the remap on those BARs, we compute whether the
 device is in D3hot state and if so, skip the DMA MAP.  
>>> Thinking on this some more, QEMU PCI code already manages the device
>>> BARs appearing in the address space based on the memory enable bit in
>>> the command register.  Should we do the same for PM state?
>>>
>>> IOW, the device going into low power state should remove the BARs from
>>> the AddressSpace and waking the device should re-add them.  The BAR DMA
>>> mapping should then always be consistent, whereas here nothing would
>>> remap the BARs when the device is woken.
>>>
>>> I imagine we'd need an interface to register the PM capability with the
>>> core QEMU PCI code, where address space updates are performed relative
>>> to both memory enable and power status.  There might be a way to
>>> implement this just for vfio-pci devices by toggling the enable state
>>> of the BAR mmaps relative to PM state, but doing it at the PCI core
>>> level seems like it'd provide behavior more true to physical hardware.
>> I took a stab at this approach here, it doesn't obviously break
>> anything in my configs, but I haven't yet tried to reproduce this exact
>> scenario.
>>
>> https://gitlab.com/alex.williamson/qemu/-/tree/pci-pm-power-state

it does not totally fix the issue: I now get:

qemu-system-x86_64: warning: vfio_container_dma_map(0x55cc25705680,
0x3800, 0x100, 0x7f876200) = -14 (Bad address)
:41:00.0: PCI peer-to-peer transactions on BARs are not supported.


Eric

> 
> So if I understand correctly the BAR regions will disappear upon the
> config cmd write in vfio_sub_page_bar_update_mapping(). Is that correct?
> I will give it a try on my setup...
>>
>> There's another pm_cap on the PCIExpressDevice that needs to be
>> consolidated as well, once I do some research to figure out why a
>> non-express capability is tracked only by express devices and what
>> they're doing with it.  Thanks,
> I am not sure I get this last point though.
> 
> Thanks
> 
> Eric
>>
>> Alex
>>
>

Re: [PATCH v6 0/7] hw/char/pl011: Implement TX (async) FIFO to avoid blocking the main loop

2025-02-20 Thread Peter Maydell

On Thu, 20 Feb 2025 at 10:43, Peter Maydell  wrote:
>
> On Tue, 18 Feb 2025 at 13:54, Peter Maydell  wrote:
> >
> > On Mon, 17 Feb 2025 at 14:55, Peter Maydell  
> > wrote:
> > >
> > > On Sat, 8 Feb 2025 at 16:39, Philippe Mathieu-Daudé  
> > > wrote:
> > > >
> > > > Hi,
> > > >
> > > > This series add support for (async) FIFO on the transmit path
> > > > of the PL011 UART.
> > > >
> > >
> > > Applied to target-arm.next, thanks (with a couple of minor
> > > tweaks to two of the patches).
> >
> > Unfortunately I seem to get failures in 'make check-functional'
> > with the last patch of this series applied.
>
> I had a look at this this morning because I wondered if it
> was a mistake in the style fixups I'd applied to the patches
> on my end, and I found the bug fairly quickly. The problem is
> that pl011_xmit() doesn't update the TXFE and TXFF FIFO empty/full
> status flag bits when it removes characters from the FIFO.
> So the guest kernel spins forever because TXFF is never unset.
>
> The following patch fixes this for me (and also makes us not
> set INT_TX for the case where we couldn't send any bytes to
> the chardev, which I noticed reading the code rather than
> because it had any visible bad effects):

Hmm, but that's clearly not the only problem -- it fixed the
"no output at all issue", but now I see a test failure because
of garbled console output:

2025-02-20 10:34:51,562: Booting Linux on physical CPU 0x0
2025-02-20 10:34:51,563: Linux version 4.18.16-300.fc29.armv7hl
(mockbu...@buildvm-armv7-06.arm.fedoraproject.org) (gcc version 8.2.1
20180801 (Red Hat 8.2.1-2) (GCC)) #1 SMP Sun Oct 21 00:56:28 UTC 2018
2025-02-20 10:34:51,563: CPU: ARMv7 Processor [414fc0f0] revision 0
(ARMv7), cr=10c5387d
2025-02-20 10:34:51,564: CPU: div instructions available: patching division code
2025-02-20 10:34:51,564: CPU: PIPT / VIPT nonaliasing data cache, PIPT
instruction cache
2025-02-20 10:34:51,565: OF: fdt: Machine model: linux,dummy-virt
2025-02-20 10:34:51,565: Memory policy: Data cache writealloc
2025-02-20 10:34:51,565: efi: Getting EFI parameters from FDT:
2025-02-20 10:34:51,566: efi: UEFI not found.
2025-02-20 10:34:51,566: cma: aetrsv6i
2025-02-20 10:34:51,566: psci: oi ocdteofmT
2025-02-20 10:34:51,566: ps:Sv1ecdniwe^Mps:Usdvni spsc:rt  gtnot psci:
SMC Calling Conventiov0
2025-02-20 10:34:51,567: pec:Eee1pec pvls9 1 2865^MBul1olt bi4^MKerloa
n:itte nltAA
2025-02-20 10:34:51,568: Det c stlere 3 rr4 5i 9(dr336bt)Memr 06112aib
6Keece11K ra0 ,21 sv,Kmrsv,Kim)Virtual kernel memory layout:
2025-02-20 10:34:51,568: vt :xf0  fa  f00 0f00 30 (0M^M lm  c00
0800xe0   2M^M mue:x00 0f00  4B
2025-02-20 10:34:51,568: et x(tvl) (rl 97B
2025-02-20 10:34:51,569: n 0(trvl-xta) 0  0pv) 3 B
2025-02-20 10:34:51,570: bs prl-xta) 5 )random: get_random_u32 ca
___+:ag6 d=3MOes,P=,os^Mfta:aotg78ni  1asHierhlC pmti.RCrtcnC
oN_S2orpi=.   TasCebd
2025-02-20 10:34:51,571: RC jtgeeyor_o_a1 __d^MNRIS1 rr GIC physical
location is 0x80^MGIC2 n[mx00-02f,P81]arcte 1tm(
nna65H(r.cemk0facl:0d20,aish_o:6i  M,routo1n per49419sn^MCocrm
dexCalibrating delay ose eceiife..oIl2^Mpxdl6i3
2025-02-20 10:34:51,572: SertFmo ia^MYemgmdl
2025-02-20 10:34:51,572: SEn: ili.Moutcea b te 2(d:,4t^MMoupnch
stlers14oe  9oeyo
2025-02-20 10:34:51,572: CP0Screv:iwein tuia norieI tst lrl
2025-02-20 10:34:51,573: /cpus/cpu@0 missing clock-frequen or^MCPU
rd1c  k  i 00^MSetgptiintm rx30 000Hircc Cmento^MEF rc ln  aae^Msm ii
cdrCs.smp ohu1o, USMPTao1reo cvae(15. gP.CPUA U)tt  Code.devtmpfs:
initialize^Mmo4aheu pr3vit e0cloejfe s fffm_cs0fffm_d_:924250sfut stbs
2025-02-20 10:34:51,574: piclo:niidit uye
2025-02-20 10:34:51,574: DM tre  vi^MNETeDM:rlae2  o tmi oerent aloaios
2025-02-20 10:34:51,574: audit: initializing netlink subsys (disabled)
2025-02-20 10:34:51,574: audit: type=2000 audit(0.480:1):
sat=ntalzdadt_nbe= e=^Mcpuidl:usn oenrmnu
2025-02-20 10:34:51,574: No ATG?
2025-02-20 10:34:51,575: hw-brapit fud5(1rsrvd rapon n  achon
eitrs^Mhw-brapit:mxmmwthontsz s8bts^MSerial MAP01 Ap1:ttAA tMIO09000ud
= 0) saP01 e1
2025-02-20 10:34:51,575: console [ttyAMA0] enabled
2025-02-20 10:34:51,668: cryptd: max_cpu_qlen set to 1000
2025-02-20 10:34:51,690: vgaarb: loaded
2025-02-20 10:34:51,697: SCSI subsystem initialized
2025-02-20 10:34:51,702: usbcore: registered new interface driver usbfs
2025-02-20 10:34:51,703: usbcore: registered new interface driver hub

(the log has no garbling after that so it's presumably OK with
the interrupt-driven real UART driver but the polling one you get
for earlycon has trouble.)

I also noticed that pl011_write_txdata() doesn't clear TXFE
when it puts a byte into the fifo -- I'm testing to see if
fixing that helps.

-- PMM

Re: [PATCH v5 2/6] target/i386: Update EPYC-Rome CPU model for Cache property, RAS, SVM feature bits

2025-02-20 Thread Zhao Liu

On Thu, Feb 06, 2025 at 01:28:35PM -0600, Babu Moger wrote:
> Date: Thu, 6 Feb 2025 13:28:35 -0600
> From: Babu Moger 
> Subject: [PATCH v5 2/6] target/i386: Update EPYC-Rome CPU model for Cache
>  property, RAS, SVM feature bits
> X-Mailer: git-send-email 2.34.1
> 
> Found that some of the cache properties are not set correctly for EPYC models.
> 
> l1d_cache.no_invd_sharing should not be true.
> l1i_cache.no_invd_sharing should not be true.
> 
> L2.self_init should be true.
> L2.inclusive should be true.
> 
> L3.inclusive should not be true.
> L3.no_invd_sharing should be true.
> 
> Fix these cache properties.
> 
> Also add the missing RAS and SVM features bits on AMD EPYC-Rome. The SVM
> feature bits are used in nested guests.
> 
> succor: Software uncorrectable error containment and recovery 
> capability.
> overflow-recov: MCA overflow recovery support.
> lbrv  : LBR virtualization
> tsc-scale : MSR based TSC rate control
> vmcb-clean: VMCB clean bits
> flushbyasid   : Flush by ASID
> pause-filter  : Pause intercept filter
> pfthreshold   : PAUSE filter threshold
> v-vmsave-vmload   : Virtualized VMLOAD and VMSAVE
> vgif  : Virtualized GIF
> 
> Signed-off-by: Babu Moger 
> Reviewed-by: Maksim Davydov 
> ---
>  target/i386/cpu.c | 73 +++
>  1 file changed, 73 insertions(+)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 94292bfaa2..e2c3c797ed 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -2342,6 +2342,60 @@ static const CPUCaches epyc_rome_v3_cache_info = {
>  },
>  };
>  
> +static const CPUCaches epyc_rome_v5_cache_info = {
> +.l1d_cache = &(CPUCacheInfo) {
> +.type = DATA_CACHE,
> +.level = 1,
> +.size = 32 * KiB,
> +.line_size = 64,
> +.associativity = 8,
> +.partitions = 1,
> +.sets = 64,
> +.lines_per_tag = 1,
> +.self_init = 1,

This field could be true,

> +.share_level = CPU_TOPOLOGY_LEVEL_CORE,
> +},
> +.l1i_cache = &(CPUCacheInfo) {
> +.type = INSTRUCTION_CACHE,
> +.level = 1,
> +.size = 32 * KiB,
> +.line_size = 64,
> +.associativity = 8,
> +.partitions = 1,
> +.sets = 64,
> +.lines_per_tag = 1,
> +.self_init = 1,

ditto,

Compared to the previous cache model version, the differences can be
checked. I feel that in the future, when we introduce a new cache model,
it's better to avoid omitting items that default to false. This way, the
cache model can correspond to the output of the cpuid tool, making it
easier to compare and check.

Overall, LGTM,

Reviewed-by: Zhao Liu

Re: [PATCH 2/9] hw/char/pl011: Simplify a bit pl011_can_receive()

2025-02-20 Thread Luc Michel

On 22:08 Wed 19 Feb , Philippe Mathieu-Daudé wrote:
> Introduce 'fifo_depth' and 'fifo_available' local variables
> to better express the 'r' variable use.
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Luc Michel 

> ---
>  hw/char/pl011.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/char/pl011.c b/hw/char/pl011.c
> index 60cea1d9a16..bcd516d682d 100644
> --- a/hw/char/pl011.c
> +++ b/hw/char/pl011.c
> @@ -486,7 +486,9 @@ static void pl011_write(void *opaque, hwaddr offset,
>  static int pl011_can_receive(void *opaque)
>  {
>  PL011State *s = (PL011State *)opaque;
> -int r;
> +unsigned fifo_depth = pl011_get_fifo_depth(s);
> +unsigned fifo_available = fifo_depth - s->read_count;
> +int r = fifo_available ? 1 : 0;
> 
>  if (!(s->cr & CR_UARTEN)) {
>  qemu_log_mask(LOG_GUEST_ERROR, "PL011 reading data on disabled 
> UART\n");
> @@ -494,7 +496,6 @@ static int pl011_can_receive(void *opaque)
>  if (!(s->cr & CR_RXE)) {
>  qemu_log_mask(LOG_GUEST_ERROR, "PL011 reading data on disabled TX 
> UART\n");
>  }
> -r = s->read_count < pl011_get_fifo_depth(s);
>  trace_pl011_can_receive(s->lcr, s->read_count, r);
>  return r;
>  }
> --
> 2.47.1
> 

--

Re: [PATCH 3/9] hw/char/pl011: Improve RX flow tracing events

2025-02-20 Thread Luc Michel

On 22:08 Wed 19 Feb , Philippe Mathieu-Daudé wrote:
> Log FIFO use (availability and depth).
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Luc Michel 

> ---
>  hw/char/pl011.c  | 10 ++
>  hw/char/trace-events |  7 ---
>  2 files changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/char/pl011.c b/hw/char/pl011.c
> index bcd516d682d..148a7d0dc60 100644
> --- a/hw/char/pl011.c
> +++ b/hw/char/pl011.c
> @@ -185,7 +185,7 @@ static void pl011_fifo_rx_put(void *opaque, uint32_t 
> value)
>  s->read_fifo[slot] = value;
>  s->read_count++;
>  s->flags &= ~PL011_FLAG_RXFE;
> -trace_pl011_fifo_rx_put(value, s->read_count);
> +trace_pl011_fifo_rx_put(value, s->read_count, pipe_depth);
>  if (s->read_count == pipe_depth) {
>  trace_pl011_fifo_rx_full();
>  s->flags |= PL011_FLAG_RXFF;
> @@ -248,12 +248,13 @@ static void pl011_write_txdata(PL011State *s, uint8_t 
> data)
>  static uint32_t pl011_read_rxdata(PL011State *s)
>  {
>  uint32_t c;
> +unsigned fifo_depth = pl011_get_fifo_depth(s);
> 
>  s->flags &= ~PL011_FLAG_RXFF;
>  c = s->read_fifo[s->read_pos];
>  if (s->read_count > 0) {
>  s->read_count--;
> -s->read_pos = (s->read_pos + 1) & (pl011_get_fifo_depth(s) - 1);
> +s->read_pos = (s->read_pos + 1) & (fifo_depth - 1);
>  }
>  if (s->read_count == 0) {
>  s->flags |= PL011_FLAG_RXFE;
> @@ -261,7 +262,7 @@ static uint32_t pl011_read_rxdata(PL011State *s)
>  if (s->read_count == s->read_trigger - 1) {
>  s->int_level &= ~INT_RX;
>  }
> -trace_pl011_read_fifo(s->read_count);
> +trace_pl011_read_fifo(s->read_count, fifo_depth);
>  s->rsr = c >> 8;
>  pl011_update(s);
>  qemu_chr_fe_accept_input(&s->chr);
> @@ -496,12 +497,13 @@ static int pl011_can_receive(void *opaque)
>  if (!(s->cr & CR_RXE)) {
>  qemu_log_mask(LOG_GUEST_ERROR, "PL011 reading data on disabled TX 
> UART\n");
>  }
> -trace_pl011_can_receive(s->lcr, s->read_count, r);
> +trace_pl011_can_receive(s->lcr, s->read_count, fifo_depth, 
> fifo_available);
>  return r;
>  }
> 
>  static void pl011_receive(void *opaque, const uint8_t *buf, int size)
>  {
> +trace_pl011_receive(size);
>  /*
>   * In loopback mode, the RX input signal is internally disconnected
>   * from the entire receiving logics; thus, all inputs are ignored,
> diff --git a/hw/char/trace-events b/hw/char/trace-events
> index b2e3d25ae34..05a33036c12 100644
> --- a/hw/char/trace-events
> +++ b/hw/char/trace-events
> @@ -60,12 +60,13 @@ imx_serial_put_data(const char *chrname, uint32_t value) 
> "%s: 0x%" PRIx32
>  # pl011.c
>  pl011_irq_state(int level) "irq state %d"
>  pl011_read(uint32_t addr, uint32_t value, const char *regname) "addr 0x%03x 
> value 0x%08x reg %s"
> -pl011_read_fifo(int read_count) "FIFO read, read_count now %d"
> +pl011_read_fifo(unsigned rx_fifo_used, size_t rx_fifo_depth) "RX FIFO read, 
> used %u/%zu"
>  pl011_write(uint32_t addr, uint32_t value, const char *regname) "addr 0x%03x 
> value 0x%08x reg %s"
> -pl011_can_receive(uint32_t lcr, int read_count, int r) "LCR 0x%08x 
> read_count %d returning %d"
> -pl011_fifo_rx_put(uint32_t c, int read_count) "new char 0x%02x read_count 
> now %d"
> +pl011_can_receive(uint32_t lcr, unsigned rx_fifo_used, size_t rx_fifo_depth, 
> unsigned rx_fifo_available) "LCR 0x%02x, RX FIFO used %u/%zu, can_receive %u 
> chars"
> +pl011_fifo_rx_put(uint32_t c, unsigned read_count, size_t rx_fifo_depth) "RX 
> FIFO push char [0x%02x] %d/%zu depth used"
>  pl011_fifo_rx_full(void) "RX FIFO now full, RXFF set"
>  pl011_baudrate_change(unsigned int baudrate, uint64_t clock, uint32_t ibrd, 
> uint32_t fbrd) "new baudrate %u (clk: %" PRIu64 "hz, ibrd: %" PRIu32 ", fbrd: 
> %" PRIu32 ")"
> +pl011_receive(int size) "recv %d chars"
> 
>  # cmsdk-apb-uart.c
>  cmsdk_apb_uart_read(uint64_t offset, uint64_t data, unsigned size) "CMSDK 
> APB UART read: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u"
> --
> 2.47.1
> 

--

Re: [PATCH 00/10] fpu: Remove remaining target ifdefs and build only once

2025-02-20 Thread Philippe Mathieu-Daudé


On 17/2/25 13:50, Peter Maydell wrote:

The work I needed to do to make various softfloat emulation behaviours
runtime-selectable for Arm FEAT_AFP has left the fpu code with very
few remaning target ifdefs. So this series turns the last remaning
ones into runtime behaviour choices and switches the fpu code into
"build once" rather than "build per target". The main driver of this
is that we're going to want to do this for the "multiple targets in
one binary" work.


Indeed the following 332 symbols are now removed for each target:

bfloat16_add
bfloat16_compare
bfloat16_compare_quiet
bfloat16_default_nan
bfloat16_div
bfloat16_is_quiet_nan
bfloat16_is_signaling_nan
bfloat16_max
bfloat16_maximum_number
bfloat16_maxnum
bfloat16_maxnummag
bfloat16_min
bfloat16_minimum_number
bfloat16_minnum
bfloat16_minnummag
bfloat16_mul
bfloat16_muladd
bfloat16_round_to_int
bfloat16_scalbn
bfloat16_silence_nan
bfloat16_sqrt
bfloat16_squash_input_denormal
bfloat16_sub
bfloat16_to_float32
bfloat16_to_float64
bfloat16_to_int16
bfloat16_to_int16_round_to_zero
bfloat16_to_int16_scalbn
bfloat16_to_int32
bfloat16_to_int32_round_to_zero
bfloat16_to_int32_scalbn
bfloat16_to_int64
bfloat16_to_int64_round_to_zero
bfloat16_to_int64_scalbn
bfloat16_to_int8
bfloat16_to_int8_round_to_zero
bfloat16_to_int8_scalbn
bfloat16_to_uint16
bfloat16_to_uint16_round_to_zero
bfloat16_to_uint16_scalbn
bfloat16_to_uint32
bfloat16_to_uint32_round_to_zero
bfloat16_to_uint32_scalbn
bfloat16_to_uint64
bfloat16_to_uint64_round_to_zero
bfloat16_to_uint64_scalbn
bfloat16_to_uint8
bfloat16_to_uint8_round_to_zero
bfloat16_to_uint8_scalbn
float128_add
float128_compare
float128_compare_quiet
float128_default_nan
float128_div
float128_is_quiet_nan
float128_is_signaling_nan
float128_max
float128_maximum_number
float128_maxnum
float128_maxnummag
float128_min
float128_minimum_number
float128_minnum
float128_minnummag
float128_mul
float128_muladd
float128_rem
float128_round_to_int
float128_scalbn
float128_silence_nan
float128_sqrt
float128_sub
float128_to_float32
float128_to_float64
float128_to_floatx80
float128_to_int128
float128_to_int128_round_to_zero
float128_to_int32
float128_to_int32_round_to_zero
float128_to_int64
float128_to_int64_round_to_zero
float128_to_uint128
float128_to_uint128_round_to_zero
float128_to_uint32
float128_to_uint32_round_to_zero
float128_to_uint64
float128_to_uint64_round_to_zero
float16_add
float16_compare
float16_compare_quiet
float16_default_nan
float16_div
float16_is_quiet_nan
float16_is_signaling_nan
float16_max
float16_maximum_number
float16_maxnum
float16_maxnummag
float16_min
float16_minimum_number
float16_minnum
float16_minnummag
float16_mul
float16_muladd
float16_muladd_scalbn
float16_round_to_int
float16_scalbn
float16_silence_nan
float16_sqrt
float16_squash_input_denormal
float16_sub
float16_to_float32
float16_to_float64
float16_to_int16
float16_to_int16_round_to_zero
float16_to_int16_scalbn
float16_to_int32
float16_to_int32_round_to_zero
float16_to_int32_scalbn
float16_to_int64
float16_to_int64_round_to_zero
float16_to_int64_scalbn
float16_to_int8
float16_to_int8_scalbn
float16_to_uint16
float16_to_uint16_round_to_zero
float16_to_uint16_scalbn
float16_to_uint32
float16_to_uint32_round_to_zero
float16_to_uint32_scalbn
float16_to_uint64
float16_to_uint64_round_to_zero
float16_to_uint64_scalbn
float16_to_uint8
float16_to_uint8_scalbn
float32_add
float32_compare
float32_compare_quiet
float32_default_nan
float32_div
float32_exp2
float32_is_quiet_nan
float32_is_signaling_nan
float32_log2
float32_max
float32_maximum_number
float32_maxnum
float32_maxnummag
float32_min
float32_minimum_number
float32_minnum
float32_minnummag
float32_mul
float32_muladd
float32_muladd_scalbn
float32_rem
float32_round_to_int
float32_scalbn
float32_silence_nan
float32_sqrt
float32_squash_input_denormal
float32_sub
float32_to_bfloat16
float32_to_float128
float32_to_float16
float32_to_float64
float32_to_floatx80
float32_to_int16
float32_to_int16_round_to_zero
float32_to_int16_scalbn
float32_to_int32
float32_to_int32_round_to_zero
float32_to_int32_scalbn
float32_to_int64
float32_to_int64_round_to_zero
float32_to_int64_scalbn
float32_to_uint16
float32_to_uint16_round_to_zero
float32_to_uint16_scalbn
float32_to_uint32
float32_to_uint32_round_to_zero

Re: [PATCH 6/9] hw/char/imx_serial: Really use RX FIFO depth

2025-02-20 Thread Luc Michel

On 22:08 Wed 19 Feb , Philippe Mathieu-Daudé wrote:
> While we model a 32-elements RX FIFO since the PL011 model was
> introduced in commit 988f2442971 ("hw/char/imx_serial: Implement
> receive FIFO and ageing timer") we only read 1 char at a time!

"the IMX serial model"?

Reviewed-by: Luc Michel 

> 
> Have the IOCanReadHandler handler return how many elements are
> available, and use that in the IOReadHandler handler.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/char/imx_serial.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/char/imx_serial.c b/hw/char/imx_serial.c
> index 38b4865157e..6f14f8403a9 100644
> --- a/hw/char/imx_serial.c
> +++ b/hw/char/imx_serial.c
> @@ -386,7 +386,8 @@ static void imx_serial_write(void *opaque, hwaddr offset,
>  static int imx_can_receive(void *opaque)
>  {
>  IMXSerialState *s = (IMXSerialState *)opaque;
> -return s->ucr2 & UCR2_RXEN && fifo32_num_used(&s->rx_fifo) < FIFO_SIZE;
> +
> +return s->ucr2 & UCR2_RXEN ? fifo32_num_free(&s->rx_fifo) : 0;
>  }
> 
>  static void imx_put_data(void *opaque, uint32_t value)
> @@ -417,7 +418,10 @@ static void imx_receive(void *opaque, const uint8_t 
> *buf, int size)
>  IMXSerialState *s = (IMXSerialState *)opaque;
> 
>  s->usr2 |= USR2_WAKE;
> -imx_put_data(opaque, *buf);
> +
> +for (int i = 0; i < size; i++) {
> +imx_put_data(opaque, buf[i]);
> +}
>  }
> 
>  static void imx_event(void *opaque, QEMUChrEvent event)
> --
> 2.47.1
> 

--

Re: [PATCH 7/9] hw/char/mcf_uart: Use FIFO_DEPTH definition instead of magic values

2025-02-20 Thread Luc Michel

On 22:08 Wed 19 Feb , Philippe Mathieu-Daudé wrote:
> Defines FIFO_DEPTH and use it, fixing coding style.
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Luc Michel 

> ---
>  hw/char/mcf_uart.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/char/mcf_uart.c b/hw/char/mcf_uart.c
> index 980a12fcb7d..95f269ee9b7 100644
> --- a/hw/char/mcf_uart.c
> +++ b/hw/char/mcf_uart.c
> @@ -17,6 +17,8 @@
>  #include "chardev/char-fe.h"
>  #include "qom/object.h"
> 
> +#define FIFO_DEPTH 4
> +
>  struct mcf_uart_state {
>  SysBusDevice parent_obj;
> 
> @@ -27,7 +29,7 @@ struct mcf_uart_state {
>  uint8_t imr;
>  uint8_t bg1;
>  uint8_t bg2;
> -uint8_t fifo[4];
> +uint8_t fifo[FIFO_DEPTH];
>  uint8_t tb;
>  int current_mr;
>  int fifo_len;
> @@ -247,14 +249,16 @@ static void mcf_uart_reset(DeviceState *dev)
>  static void mcf_uart_push_byte(mcf_uart_state *s, uint8_t data)
>  {
>  /* Break events overwrite the last byte if the fifo is full.  */
> -if (s->fifo_len == 4)
> +if (s->fifo_len == FIFO_DEPTH) {
>  s->fifo_len--;
> +}
> 
>  s->fifo[s->fifo_len] = data;
>  s->fifo_len++;
>  s->sr |= MCF_UART_RxRDY;
> -if (s->fifo_len == 4)
> +if (s->fifo_len == FIFO_DEPTH) {
>  s->sr |= MCF_UART_FFULL;
> +}
> 
>  mcf_uart_update(s);
>  }
> --
> 2.47.1
> 

--

RE: [RFC 0/2] hw/vfio/pci: Prevent BARs from being dma mapped in d3hot state

2025-02-20 Thread Duan, Zhenzhong




>-Original Message-
>From: Alex Williamson 
>Subject: Re: [RFC 0/2] hw/vfio/pci: Prevent BARs from being dma mapped in
>d3hot state
>
>On Thu, 20 Feb 2025 04:24:13 +
>"Duan, Zhenzhong"  wrote:
>
>> >-Original Message-
>> >From: Alex Williamson 
>> >Subject: Re: [RFC 0/2] hw/vfio/pci: Prevent BARs from being dma mapped in
>> >d3hot state
>> >
>> >On Wed, 19 Feb 2025 18:58:58 +0100
>> >Eric Auger  wrote:
>> >
>> >> Since kernel commit:
>> >> 2b2c651baf1c ("vfio/pci: Invalidate mmaps and block the access
>> >> in D3hot power state")
>> >> any attempt to do an mmap access to a BAR when the device is in d3hot
>> >> state will generate a fault.
>> >>
>> >> On system_powerdown, if the VFIO device is translated by an IOMMU,
>> >> the device is moved to D3hot state and then the vIOMMU gets disabled
>> >> by the guest. As a result of this later operation, the address space is
>> >> swapped from translated to untranslated. When re-enabling the aliased
>> >> regions, the RAM regions are dma-mapped again and this causes DMA_MAP
>> >> faults when attempting the operation on BARs.
>> >>
>> >> To avoid doing the remap on those BARs, we compute whether the
>> >> device is in D3hot state and if so, skip the DMA MAP.
>> >
>> >Thinking on this some more, QEMU PCI code already manages the device
>> >BARs appearing in the address space based on the memory enable bit in
>> >the command register.  Should we do the same for PM state?
>> >
>> >IOW, the device going into low power state should remove the BARs from
>> >the AddressSpace and waking the device should re-add them.  The BAR DMA
>> >mapping should then always be consistent, whereas here nothing would
>> >remap the BARs when the device is woken.
>>
>> If BARs should be disabled before D3hot transition, isn't it guest's 
>> responsibility
>to do that itself?
>> Just like what have been done for FLR which calls pci_dev_save_and_disable().
>
>Nothing requires the guest to clear memory and IO from the command
>register before entering a low power state, nor are we going to get
>very far arguing that it's the guest's fault for triggering an error in
>the hypervisor.  The PCI spec indicates that memory and IO BARs are only
>accessible when the device is in the D0 power state.  On bare metal
>accessing the BAR for a device in a low power state would generate an
>unsupported request.

Understood, yes it makes sense to remove BARs from AddressSpace when D3hot.

> Therefore why should QEMU map BARs of devices in
>low power states into the address space?
Should not.

Thanks
Zhenzhong

Re: [PATCH 8/9] hw/char/mcf_uart: Really use RX FIFO depth

2025-02-20 Thread Luc Michel

On 22:08 Wed 19 Feb , Philippe Mathieu-Daudé wrote:
> Caution: This message originated from an External Source. Use proper caution 
> when opening attachments, clicking links, or responding.
> 
> 
> While we model a 4-elements RX FIFO since the PL011 model
> was introduced in commit 20dcee94833 ("MCF5208 emulation"),
> we only read 1 char at a time!

"the MCF UART model"

Reviewed-by: Luc Michel 

> 
> Have the IOCanReadHandler handler return how many elements are
> available, and use that in the IOReadHandler handler.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/char/mcf_uart.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/char/mcf_uart.c b/hw/char/mcf_uart.c
> index 95f269ee9b7..529c26be93a 100644
> --- a/hw/char/mcf_uart.c
> +++ b/hw/char/mcf_uart.c
> @@ -281,14 +281,16 @@ static int mcf_uart_can_receive(void *opaque)
>  {
>  mcf_uart_state *s = (mcf_uart_state *)opaque;
> 
> -return s->rx_enabled && (s->sr & MCF_UART_FFULL) == 0;
> +return s->rx_enabled ? FIFO_DEPTH - s->fifo_len : 0;
>  }
> 
>  static void mcf_uart_receive(void *opaque, const uint8_t *buf, int size)
>  {
>  mcf_uart_state *s = (mcf_uart_state *)opaque;
> 
> -mcf_uart_push_byte(s, buf[0]);
> +for (int i = 0; i < size; i++) {
> +mcf_uart_push_byte(s, buf[i]);
> +}
>  }
> 
>  static const MemoryRegionOps mcf_uart_ops = {
> --
> 2.47.1
> 

--

[PATCH v2 1/3] plugins: add explicit dependency in functional tests

2025-02-20 Thread Philippe Mathieu-Daudé

From: Pierrick Bouvier 

./tests/functional/test_aarch64_tcg_plugins.py needs to have plugin
libinsn built. However, it's not listed as a dependency, so meson can't
know it needs to be built.

Thus, we keep track of all plugins, and add them as an explicit
dependency.

Fixes: 4c134d07b9e ("tests: add a new set of tests to exercise plugins")
Signed-off-by: Pierrick Bouvier 
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Philippe Mathieu-Daudé 
---
 meson.build   | 1 +
 contrib/plugins/meson.build   | 2 ++
 tests/functional/meson.build  | 2 +-
 tests/tcg/plugins/meson.build | 2 ++
 4 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 0ee79c664d3..4588bfd8642 100644
--- a/meson.build
+++ b/meson.build
@@ -3657,6 +3657,7 @@ qtest_module_ss = ss.source_set()
 
 modules = {}
 target_modules = {}
+plugin_modules = []
 hw_arch = {}
 target_arch = {}
 target_system_arch = {}
diff --git a/contrib/plugins/meson.build b/contrib/plugins/meson.build
index 484b9a808c8..fa8a426c8b5 100644
--- a/contrib/plugins/meson.build
+++ b/contrib/plugins/meson.build
@@ -26,3 +26,5 @@ if t.length() > 0
 else
   run_target('contrib-plugins', command: find_program('true'))
 endif
+
+plugin_modules += t
diff --git a/tests/functional/meson.build b/tests/functional/meson.build
index b516d21cba1..2d41908c0a8 100644
--- a/tests/functional/meson.build
+++ b/tests/functional/meson.build
@@ -362,7 +362,7 @@ foreach speed : ['quick', 'thorough']
   # 'run_target' logic below & in Makefile.include
   test('func-' + testname,
python,
-   depends: [test_deps, test_emulator, emulator_modules],
+   depends: [test_deps, test_emulator, emulator_modules, 
plugin_modules],
env: test_env,
args: [testpath],
protocol: 'tap',
diff --git a/tests/tcg/plugins/meson.build b/tests/tcg/plugins/meson.build
index 87a17d67bd4..c8cb0626a6d 100644
--- a/tests/tcg/plugins/meson.build
+++ b/tests/tcg/plugins/meson.build
@@ -19,3 +19,5 @@ if t.length() > 0
 else
   run_target('test-plugins', command: find_program('true'))
 endif
+
+plugin_modules += t
-- 
2.47.1

Re: [PATCH v2 3/3] tests/functional: Allow running TCG plugins tests on non-Linux/BSD hosts

2025-02-20 Thread Thomas Huth


On 20/02/2025 09.02, Philippe Mathieu-Daudé wrote:

Not all platforms use the '.so' suffix for shared libraries,
which is how plugins are built. Use the recently introduced
dso_suffix() helper to get the proper host suffix.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2804
Suggested-by: Pierrick Bouvier 
Suggested-by: Daniel P. Berrangé 
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Pierrick Bouvier 
---


Reviewed-by: Thomas Huth

Re: [PATCH v6 2/4] migration: enable multifd and postcopy together

2025-02-20 Thread Prasad Pandit

Hello,

On Wed, 19 Feb 2025 at 22:53, Fabiano Rosas  wrote:
> I don't see anything stopping postcopy_start() from being called in the
> source in relation to multifd recv threads being setup in the
> destination. So far it seems possible that the source is opening the
> preempt channel while multifd still hasn't seen all threads. There's
> also pre-7.2 machines which create the postcopy channel early.

* If we can not predict the sequence/timings of when different types
of connections are initiated and processed, maybe source and
destination QEMUs could work in tandem. ie. before initiating a
connection, source QEMU could send an 'initiate' message saying I'm
initiating 'X' connection. Only when destination QEMU says 'okay',
source QEMU could proceed with actual connection.

 QEMU-A  -> Initiate connection type X -> QEMU-B
 QEMU-A  <-okay   <-<- QEMU-B
 QEMU-A ->   connect type X-> QEMU-B

   (thinking out loud)

>>> > * migration_needs_multiple_sockets()
>> Then it should return 'True' when both migrate_multifd() and 
>> postcopy_preempt() are enabled.
> Why?

* I was thinking multiple_sockets is multiple types of sockets:
multifd & postcopy. But it seems here multiple sockets is any type of
multiple sockets.

> I thought you meant the CH_MAIN stuff. So now I don't know what you
> mean. You want to do away with multifd?

* Yes, CH_DEFAULT -> CH_MAIN was introduced in this series to identify
channels and accordingly call relevant functions.

* Not to do away with multifd, but more of making it same as the main
channel, ex: virsh migrate --threads  N = 1...255. All precopy
threads/connections behave the same. Differentiation of precopy and
postcopy shall still exist, because they operate/work in opposite
directions.

> Continue with this patch and fix the stuff I mentioned. You can ignore
> the first two paragraphs of that reply.
>
> https://lore.kernel.org/r/87y0y4tf5q@suse.de
>
> I still think we need to test that preempt + multifd scenario, but it
> should be easy to write a test for that once the series is in more of a
> final shape.

* Okay.

> We can't add magic values, as we've discussed.

Okay.

Thank you.
---
  - Prasad

Re: Any plan for command line '-g' option (graphical resolution and depth)?

2025-02-20 Thread Paolo Bonzini


On 2/20/25 10:20, Philippe Mathieu-Daudé wrote:

Hi,

QEMU provides the global '-g' CLI option:

$ qemu-system-foo --help

   -g WxH[xDEPTH]  Set the initial graphical resolution and depth

This option is used to pass resolution/depth information to guest
firmwares in the machines defined in the following files:

   hw/ppc/mac_newworld.c
   hw/ppc/mac_oldworld.c
   hw/ppc/prep.c
   hw/ppc/spapr.c
   hw/sparc/sun4m.c
   hw/sparc64/sun4u.c

Examples:

- hw/ppc/spapr.c:1102:

   _FDT(fdt_setprop_cell(fdt, c, "qemu,graphic-width", graphic_width));

- hw/sparc64/sun4u.c:716:

   fw_cfg_add_i16(fw_cfg, FW_CFG_SPARC64_WIDTH, graphic_width);


Obviously we have default values, which are different per architecture!
Clearly these ought to be machine properties.


Yes, we should have an easy way for machine to allow, in addition to -M 
graphics=BOOLEAN, the structured property -M 
graphics.{width,height,depth,enabled}. Then '-g' can desugar to -M 
graphics.width=WW,graphics.height=HH,graphics.depth=BB.



It is also used to set TYPE_NUBUS_MACFB properties in hw/m68k/q800.c.
Here i suppose we could directly use '-global nubus-macfb.width=value'
etc. although I'm not sure it is the recommended way.

Should we start deprecating '-g' as a whole?


Only if it's unused/useless.  If the option is useful it could also be 
used for EDID, for example (adding Gerd).


Global variables are hard to track, and the one-off nature of the option 
makes it hard to figure out (without looking at the code) which machine 
support it.


However, if the properties were available and introspectable via QOM, 
then it wouldn't matter if the -g sugar remains as an isolated "case 
QEMU_OPTION_g:" in system/vl.c.  For an example see -boot/-smp or 
-no-reboot/-no-shutdown.


Paolo

Re: [PATCH] Fix data race with the state Field of ThreadPoolElement

2025-02-20 Thread Paolo Bonzini


On 2/19/25 17:12, Vitalii Mordan wrote:

diff --git a/util/thread-pool.c b/util/thread-pool.c
index 27eb777e85..6c5f4d085b 100644
--- a/util/thread-pool.c
+++ b/util/thread-pool.c
@@ -111,9 +111,8 @@ static void *worker_thread(void *opaque)
  ret = req->func(req->arg);
  
  req->ret = ret;

-/* Write ret before state.  */
-smp_wmb();
-req->state = THREAD_DONE;
+/* Atomically update state after setting ret.  */
+qatomic_store_release(&req->state, THREAD_DONE);


This is good.


@@ -180,7 +179,7 @@ static void thread_pool_completion_bh(void *opaque)
  
  restart:

  QLIST_FOREACH_SAFE(elem, &pool->head, all, next) {
-if (elem->state != THREAD_DONE) {
+if (qatomic_load_acquire(&elem->state) != THREAD_DONE) {


This is good, but it needs a comment and it can replace the smp_rmb() below.


  continue;
  }
  
@@ -223,12 +222,12 @@ static void thread_pool_cancel(BlockAIOCB *acb)

  trace_thread_pool_cancel(elem, elem->common.opaque);
  
  QEMU_LOCK_GUARD(&pool->lock);

-if (elem->state == THREAD_QUEUED) {
+if (qatomic_load_acquire(&elem->state) == THREAD_QUEUED) {


This is not ordering anything so it can be qatomic_read/qatomic_set 
(technically the one below doesn't even need that, but it's fine).



  QTAILQ_REMOVE(&pool->request_list, elem, reqs);
  qemu_bh_schedule(pool->completion_bh);
  
-elem->state = THREAD_DONE;

  elem->ret = -ECANCELED;
+qatomic_store_release(&elem->state, THREAD_DONE);
  }
  
  }

@@ -251,8 +250,8 @@ BlockAIOCB *thread_pool_submit_aio(ThreadPoolFunc *func, 
void *arg,
  req = qemu_aio_get(&thread_pool_aiocb_info, NULL, cb, opaque);
  req->func = func;
  req->arg = arg;
-req->state = THREAD_QUEUED;
  req->pool = pool;
+qatomic_store_release(&req->state, THREAD_QUEUED);


This does not need any atomic access, because there is ordering via 
pool->lock (which protects the request_list).  There's no need to do 
store-release and load-acquire except to order with another store or 
load, and in fact adding unnecessary acquire/release is harder to 
understand.


Paolo


  QLIST_INSERT_HEAD(&pool->head, req, all);

Re: [PATCH v2 0/3] tests/functional: Allow running TCG plugins tests on macOS

2025-02-20 Thread Alex Bennée

Philippe Mathieu-Daudé  writes:

> Pierrick kindly helped me to resolve this issue which ended
> being trivial (to him!). Not tested on Windows so far.
>
> Since v1:
> - Include Pierrick's meson fix patch (adding Fixes: tag)
> - Addressed Thomas review comments (config.py, os.path.join)
>
> Philippe Mathieu-Daudé (2):
>   tests/functional: Introduce the dso_suffix() helper
>   tests/functional: Allow running TCG plugins tests on non-Linux/BSD
> hosts
>
> Pierrick Bouvier (1):
>   plugins: add explicit dependency in functional tests
>
>  meson.build  |  1 +
>  contrib/plugins/meson.build  |  2 ++
>  tests/functional/meson.build |  2 +-
>  tests/functional/qemu_test/__init__.py   |  2 +-
>  tests/functional/qemu_test/cmd.py|  1 -
>  tests/functional/qemu_test/config.py |  6 ++
>  tests/functional/test_aarch64_tcg_plugins.py | 11 ---
>  tests/tcg/plugins/meson.build|  2 ++
>  8 files changed, 21 insertions(+), 6 deletions(-)

Queued to testing/next (I fixed the cmd.py whitespace change), thanks. 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Any plan for command line '-g' option (graphical resolution and depth)?

2025-02-20 Thread Philippe Mathieu-Daudé


Hi,

QEMU provides the global '-g' CLI option:

$ qemu-system-foo --help

  -g WxH[xDEPTH]  Set the initial graphical resolution and depth

This option is used to pass resolution/depth information to guest
firmwares in the machines defined in the following files:

  hw/ppc/mac_newworld.c
  hw/ppc/mac_oldworld.c
  hw/ppc/prep.c
  hw/ppc/spapr.c
  hw/sparc/sun4m.c
  hw/sparc64/sun4u.c

Examples:

- hw/ppc/spapr.c:1102:

  _FDT(fdt_setprop_cell(fdt, c, "qemu,graphic-width", graphic_width));

- hw/sparc64/sun4u.c:716:

  fw_cfg_add_i16(fw_cfg, FW_CFG_SPARC64_WIDTH, graphic_width);


Obviously we have default values, which are different per architecture!
Clearly these ought to be machine properties.


It is also used to set TYPE_NUBUS_MACFB properties in hw/m68k/q800.c.
Here i suppose we could directly use '-global nubus-macfb.width=value'
etc. although I'm not sure it is the recommended way.

Should we start deprecating '-g' as a whole?

Regards,

Phil.

Re: [PATCH 00/10] fpu: Remove remaining target ifdefs and build only once

2025-02-20 Thread Philippe Mathieu-Daudé


On 20/2/25 09:48, Philippe Mathieu-Daudé wrote:

On 17/2/25 13:50, Peter Maydell wrote:

The work I needed to do to make various softfloat emulation behaviours
runtime-selectable for Arm FEAT_AFP has left the fpu code with very
few remaning target ifdefs. So this series turns the last remaning
ones into runtime behaviour choices and switches the fpu code into
"build once" rather than "build per target". The main driver of this
is that we're going to want to do this for the "multiple targets in
one binary" work.




and my single binary duplicate symbols list drastically reduced:

-ld: 1759 duplicate symbols
+ld: 1427 duplicate symbols


(being so far for a microblaze+xtensa config:)

CpuModelCompareResult_lookup
CpuModelExpansionType_lookup
QAPIEvent_lookup
accel_cpu_common_realize
accel_cpu_common_unrealize
accel_cpu_instance_init
accel_find
accel_init_interfaces
accel_supported_gdbstub_sstep_flags
address_space_access_valid
address_space_cache_destroy
address_space_cache_init
address_space_cache_invalidate
address_space_destroy
address_space_dispatch_compact
address_space_dispatch_free
address_space_dispatch_new
address_space_get_flatview
address_space_get_iotlb_entry
address_space_init
address_space_io
address_space_ldl
address_space_ldl_be
address_space_ldl_be_cached_slow
address_space_ldl_cached_slow
address_space_ldl_le
address_space_ldl_le_cached_slow
address_space_ldq
address_space_ldq_be
address_space_ldq_be_cached_slow
address_space_ldq_cached_slow
address_space_ldq_le
address_space_ldq_le_cached_slow
address_space_ldub
address_space_ldub_cached_slow
address_space_lduw
address_space_lduw_be
address_space_lduw_be_cached_slow
address_space_lduw_cached_slow
address_space_lduw_le
address_space_lduw_le_cached_slow
address_space_map
address_space_memory
address_space_read_cached_slow
address_space_read_full
address_space_register_map_client
address_space_remove_listeners
address_space_rw
address_space_set
address_space_stb
address_space_stb_cached_slow
address_space_stl
address_space_stl_be
address_space_stl_be_cached_slow
address_space_stl_cached_slow
address_space_stl_le
address_space_stl_le_cached_slow
address_space_stl_notdirty
address_space_stl_notdirty_cached_slow
address_space_stq
address_space_stq_be
address_space_stq_be_cached_slow
address_space_stq_cached_slow
address_space_stq_le
address_space_stq_le_cached_slow
address_space_stw
address_space_stw_be
address_space_stw_be_cached_slow
address_space_stw_cached_slow
address_space_stw_le
address_space_stw_le_cached_slow
address_space_translate_for_iotlb
address_space_unmap
address_space_unregister_map_client
address_space_write
address_space_write_cached_slow
address_space_write_rom
arch_type
colo_flush_ram_cache
colo_incoming_start_dirty_log
colo_init_ram_cache
colo_record_bitmap
colo_release_ram_cache
cpu_abort
cpu_address_space_destroy
cpu_address_space_init
cpu_atomic_add_fetchb_mmu
cpu_atomic_add_fetchl_be_mmu
cpu_atomic_add_fetchl_le_mmu
cpu_atomic_add_fetchq_be_mmu
cpu_atomic_add_fetchq_le_mmu
cpu_atomic_add_fetchw_be_mmu
cpu_atomic_add_fetchw_le_mmu
cpu_atomic_and_fetchb_mmu
cpu_atomic_and_fetchl_be_mmu
cpu_atomic_and_fetchl_le_mmu
cpu_atomic_and_fetchq_be_mmu
cpu_atomic_and_fetchq_le_mmu
cpu_atomic_and_fetchw_be_mmu
cpu_atomic_and_fetchw_le_mmu
cpu_atomic_cmpxchgb_mmu
cpu_atomic_cmpxchgl_be_mmu
cpu_atomic_cmpxchgl_le_mmu
cpu_atomic_cmpxchgo_be_mmu
cpu_atomic_cmpxchgo_le_mmu
cpu_atomic_cmpxchgq_be_mmu
cpu_atomic_cmpxchgq_le_mmu
cpu_atomic_cmpxchgw_be_mmu
cpu_atomic_cmpxchgw_le_mmu
cpu_atomic_fetch_addb_mmu
cpu_atomic_fetch_addl_be_mmu
cpu_atomic_fetch_addl_le_mmu
cpu_atomic_fetch_addq_be_mmu
cpu_atomic_fetch_addq_le_mmu
cpu_atomic_fetch_addw_be_mmu
cpu_atomic_fetch_addw_le_mmu
cpu_atomic_fetch_andb_mmu
cpu_atomic_fetch_andl_be_mmu
cpu_atomic_fetch_andl_le_mmu
cpu_atomic_fetch_andq_be_mmu
cpu_atomic_fetch_andq_le_mmu
cpu_atomic_fetch_andw_be_mmu
cpu_atomic_fetch_andw_le_mmu
cpu_atomic_fetch_orb_mmu
cpu_atomic_fetch_orl_be_mmu
cpu_atomic_fetch_orl_le_mmu
cpu_atomic_fetch_orq_be_mmu
cpu_atomic_fetch_orq_le_mmu
cpu_atomic_fetch_orw_be_mmu
cpu_atomic_fetch_orw_le_mmu
cpu_atomic_fetch_smaxb_mmu
cpu_atomic_fetch_smaxl_be_mmu
cpu_atomic_fetch_smaxl_le_mmu
cpu_atomic_fetch_smaxq_be_mmu
cpu_atomic_fetch_smaxq_le_mmu
cpu_atomic_fetch_smaxw_be_mmu
cpu_atomic_fetch_smaxw_le_mmu
cpu_atomic_fetch_sminb_mmu
cpu_atomic_fetch_sminl_be_mmu
cpu_atomic_fetch_sminl_le_mmu
cpu_atomic_fetch_sminq_be_mmu
cpu_atomic_fetch_sminq_le_mmu
cpu_atomic_fetch_sminw_be_mmu
cpu_atomic_fetch_sminw_le_mmu
cpu_atomic_fetch_umaxb_mmu
cpu_atomic_fetch_umaxl_be_mmu
cpu_atomic_fetch_umaxl_le_mmu
cpu_atomic_fetch_umaxq_be_mmu
cpu_atomic_fetch_umaxq_le_mmu
cpu_atomic_fetch_umaxw_be_mmu
cpu_atomic_fetch_umaxw_le_mmu
cpu_atomic_fetch_uminb_mmu
cpu_atomic_fetch_uminl_be_mmu
cpu_atomic_fetch_uminl_le_mmu
cpu_atomic_fetch_uminq_be_mmu
cpu_atomic_fetch_uminq_le_mmu
cpu_atomic_fetch_uminw_be_mmu
cpu_atomic_fetch_uminw_le_mmu
cpu_atomic_fetch_xorb_mmu
cpu_atomic_fetch_xorl_be_mmu
cpu_atomi

[PATCH v2 3/9] hw/char/pl011: Improve RX flow tracing events

2025-02-20 Thread Philippe Mathieu-Daudé

Log FIFO use (availability and depth).

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Alex Bennée 
Reviewed-by: Luc Michel 
---
 hw/char/pl011.c  | 10 ++
 hw/char/trace-events |  7 ---
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/hw/char/pl011.c b/hw/char/pl011.c
index 5bb83c54216..f7485e7c541 100644
--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -185,7 +185,7 @@ static void pl011_fifo_rx_put(void *opaque, uint32_t value)
 s->read_fifo[slot] = value;
 s->read_count++;
 s->flags &= ~PL011_FLAG_RXFE;
-trace_pl011_fifo_rx_put(value, s->read_count);
+trace_pl011_fifo_rx_put(value, s->read_count, pipe_depth);
 if (s->read_count == pipe_depth) {
 trace_pl011_fifo_rx_full();
 s->flags |= PL011_FLAG_RXFF;
@@ -248,12 +248,13 @@ static void pl011_write_txdata(PL011State *s, uint8_t 
data)
 static uint32_t pl011_read_rxdata(PL011State *s)
 {
 uint32_t c;
+unsigned fifo_depth = pl011_get_fifo_depth(s);
 
 s->flags &= ~PL011_FLAG_RXFF;
 c = s->read_fifo[s->read_pos];
 if (s->read_count > 0) {
 s->read_count--;
-s->read_pos = (s->read_pos + 1) & (pl011_get_fifo_depth(s) - 1);
+s->read_pos = (s->read_pos + 1) & (fifo_depth - 1);
 }
 if (s->read_count == 0) {
 s->flags |= PL011_FLAG_RXFE;
@@ -261,7 +262,7 @@ static uint32_t pl011_read_rxdata(PL011State *s)
 if (s->read_count == s->read_trigger - 1) {
 s->int_level &= ~INT_RX;
 }
-trace_pl011_read_fifo(s->read_count);
+trace_pl011_read_fifo(s->read_count, fifo_depth);
 s->rsr = c >> 8;
 pl011_update(s);
 qemu_chr_fe_accept_input(&s->chr);
@@ -498,12 +499,13 @@ static int pl011_can_receive(void *opaque)
 qemu_log_mask(LOG_GUEST_ERROR,
   "PL011 receiving data on disabled RX UART\n");
 }
-trace_pl011_can_receive(s->lcr, s->read_count, r);
+trace_pl011_can_receive(s->lcr, s->read_count, fifo_depth, fifo_available);
 return r;
 }
 
 static void pl011_receive(void *opaque, const uint8_t *buf, int size)
 {
+trace_pl011_receive(size);
 /*
  * In loopback mode, the RX input signal is internally disconnected
  * from the entire receiving logics; thus, all inputs are ignored,
diff --git a/hw/char/trace-events b/hw/char/trace-events
index b2e3d25ae34..05a33036c12 100644
--- a/hw/char/trace-events
+++ b/hw/char/trace-events
@@ -60,12 +60,13 @@ imx_serial_put_data(const char *chrname, uint32_t value) 
"%s: 0x%" PRIx32
 # pl011.c
 pl011_irq_state(int level) "irq state %d"
 pl011_read(uint32_t addr, uint32_t value, const char *regname) "addr 0x%03x 
value 0x%08x reg %s"
-pl011_read_fifo(int read_count) "FIFO read, read_count now %d"
+pl011_read_fifo(unsigned rx_fifo_used, size_t rx_fifo_depth) "RX FIFO read, 
used %u/%zu"
 pl011_write(uint32_t addr, uint32_t value, const char *regname) "addr 0x%03x 
value 0x%08x reg %s"
-pl011_can_receive(uint32_t lcr, int read_count, int r) "LCR 0x%08x read_count 
%d returning %d"
-pl011_fifo_rx_put(uint32_t c, int read_count) "new char 0x%02x read_count now 
%d"
+pl011_can_receive(uint32_t lcr, unsigned rx_fifo_used, size_t rx_fifo_depth, 
unsigned rx_fifo_available) "LCR 0x%02x, RX FIFO used %u/%zu, can_receive %u 
chars"
+pl011_fifo_rx_put(uint32_t c, unsigned read_count, size_t rx_fifo_depth) "RX 
FIFO push char [0x%02x] %d/%zu depth used"
 pl011_fifo_rx_full(void) "RX FIFO now full, RXFF set"
 pl011_baudrate_change(unsigned int baudrate, uint64_t clock, uint32_t ibrd, 
uint32_t fbrd) "new baudrate %u (clk: %" PRIu64 "hz, ibrd: %" PRIu32 ", fbrd: 
%" PRIu32 ")"
+pl011_receive(int size) "recv %d chars"
 
 # cmsdk-apb-uart.c
 cmsdk_apb_uart_read(uint64_t offset, uint64_t data, unsigned size) "CMSDK APB 
UART read: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u"
-- 
2.47.1

[PATCH v2 1/9] hw/char/pl011: Warn when using disabled receiver

2025-02-20 Thread Philippe Mathieu-Daudé

We shouldn't receive characters when the full UART or its
receiver is disabled. However we don't want to break the
possibly incomplete "my first bare metal assembly program"s,
so we choose to simply display a warning when this occurs.

Reviewed-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/char/pl011.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/char/pl011.c b/hw/char/pl011.c
index 06ce851044d..12a2d4bc7bd 100644
--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -85,6 +85,7 @@ DeviceState *pl011_create(hwaddr addr, qemu_irq irq, Chardev 
*chr)
 #define CR_OUT1 (1 << 12)
 #define CR_RTS  (1 << 11)
 #define CR_DTR  (1 << 10)
+#define CR_RXE  (1 << 9)
 #define CR_TXE  (1 << 8)
 #define CR_LBE  (1 << 7)
 #define CR_UARTEN   (1 << 0)
@@ -487,6 +488,14 @@ static int pl011_can_receive(void *opaque)
 PL011State *s = (PL011State *)opaque;
 int r;
 
+if (!(s->cr & CR_UARTEN)) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "PL011 receiving data on disabled UART\n");
+}
+if (!(s->cr & CR_RXE)) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "PL011 receiving data on disabled RX UART\n");
+}
 r = s->read_count < pl011_get_fifo_depth(s);
 trace_pl011_can_receive(s->lcr, s->read_count, r);
 return r;
-- 
2.47.1

[PATCH v2 0/9] hw/char: Improve RX FIFO depth uses

2025-02-20 Thread Philippe Mathieu-Daudé

Since v1:
- Fixed typos (Luc)

Some UART devices implement a RX FIFO but their code
(via IOCanReadHandler) only return a size of 1 element,
while we can receive more chars.

This series takes advantage of the full depth.

Inspired by pm215 chat comment on yesterday's community
meeting on the PL011 Rust implementation description by
Paolo :)

Philippe Mathieu-Daudé (9):
  hw/char/pl011: Warn when using disabled receiver
  hw/char/pl011: Simplify a bit pl011_can_receive()
  hw/char/pl011: Improve RX flow tracing events
  hw/char/pl011: Really use RX FIFO depth
  hw/char/bcm2835_aux: Really use RX FIFO depth
  hw/char/imx_serial: Really use RX FIFO depth
  hw/char/mcf_uart: Use FIFO_DEPTH definition instead of magic values
  hw/char/mcf_uart: Really use RX FIFO depth
  hw/char/sh_serial: Return correct number of empty RX FIFO elements

 hw/char/bcm2835_aux.c |  6 --
 hw/char/imx_serial.c  |  8 ++--
 hw/char/mcf_uart.c| 16 +++-
 hw/char/pl011.c   | 30 ++
 hw/char/sh_serial.c   | 30 ++
 hw/char/trace-events  |  7 ---
 6 files changed, 61 insertions(+), 36 deletions(-)

-- 
2.47.1

[PATCH v2 2/9] hw/char/pl011: Simplify a bit pl011_can_receive()

2025-02-20 Thread Philippe Mathieu-Daudé

Introduce 'fifo_depth' and 'fifo_available' local variables
to better express the 'r' variable use.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Luc Michel 
---
 hw/char/pl011.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/char/pl011.c b/hw/char/pl011.c
index 12a2d4bc7bd..5bb83c54216 100644
--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -486,7 +486,9 @@ static void pl011_write(void *opaque, hwaddr offset,
 static int pl011_can_receive(void *opaque)
 {
 PL011State *s = (PL011State *)opaque;
-int r;
+unsigned fifo_depth = pl011_get_fifo_depth(s);
+unsigned fifo_available = fifo_depth - s->read_count;
+int r = fifo_available ? 1 : 0;
 
 if (!(s->cr & CR_UARTEN)) {
 qemu_log_mask(LOG_GUEST_ERROR,
@@ -496,7 +498,6 @@ static int pl011_can_receive(void *opaque)
 qemu_log_mask(LOG_GUEST_ERROR,
   "PL011 receiving data on disabled RX UART\n");
 }
-r = s->read_count < pl011_get_fifo_depth(s);
 trace_pl011_can_receive(s->lcr, s->read_count, r);
 return r;
 }
-- 
2.47.1

[PATCH v2 7/9] hw/char/mcf_uart: Use FIFO_DEPTH definition instead of magic values

2025-02-20 Thread Philippe Mathieu-Daudé

Defines FIFO_DEPTH and use it, fixing coding style.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Luc Michel 
---
 hw/char/mcf_uart.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/hw/char/mcf_uart.c b/hw/char/mcf_uart.c
index 980a12fcb7d..95f269ee9b7 100644
--- a/hw/char/mcf_uart.c
+++ b/hw/char/mcf_uart.c
@@ -17,6 +17,8 @@
 #include "chardev/char-fe.h"
 #include "qom/object.h"
 
+#define FIFO_DEPTH 4
+
 struct mcf_uart_state {
 SysBusDevice parent_obj;
 
@@ -27,7 +29,7 @@ struct mcf_uart_state {
 uint8_t imr;
 uint8_t bg1;
 uint8_t bg2;
-uint8_t fifo[4];
+uint8_t fifo[FIFO_DEPTH];
 uint8_t tb;
 int current_mr;
 int fifo_len;
@@ -247,14 +249,16 @@ static void mcf_uart_reset(DeviceState *dev)
 static void mcf_uart_push_byte(mcf_uart_state *s, uint8_t data)
 {
 /* Break events overwrite the last byte if the fifo is full.  */
-if (s->fifo_len == 4)
+if (s->fifo_len == FIFO_DEPTH) {
 s->fifo_len--;
+}
 
 s->fifo[s->fifo_len] = data;
 s->fifo_len++;
 s->sr |= MCF_UART_RxRDY;
-if (s->fifo_len == 4)
+if (s->fifo_len == FIFO_DEPTH) {
 s->sr |= MCF_UART_FFULL;
+}
 
 mcf_uart_update(s);
 }
-- 
2.47.1

[PATCH v2 6/9] hw/char/imx_serial: Really use RX FIFO depth

2025-02-20 Thread Philippe Mathieu-Daudé

While we model a 32-elements RX FIFO since the IMX serial
model was introduced in commit 988f2442971 ("hw/char/imx_serial:
Implement receive FIFO and ageing timer") we only read 1 char
at a time!

Have the IOCanReadHandler handler return how many elements are
available, and use that in the IOReadHandler handler.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Luc Michel 
---
 hw/char/imx_serial.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/char/imx_serial.c b/hw/char/imx_serial.c
index 38b4865157e..6f14f8403a9 100644
--- a/hw/char/imx_serial.c
+++ b/hw/char/imx_serial.c
@@ -386,7 +386,8 @@ static void imx_serial_write(void *opaque, hwaddr offset,
 static int imx_can_receive(void *opaque)
 {
 IMXSerialState *s = (IMXSerialState *)opaque;
-return s->ucr2 & UCR2_RXEN && fifo32_num_used(&s->rx_fifo) < FIFO_SIZE;
+
+return s->ucr2 & UCR2_RXEN ? fifo32_num_free(&s->rx_fifo) : 0;
 }
 
 static void imx_put_data(void *opaque, uint32_t value)
@@ -417,7 +418,10 @@ static void imx_receive(void *opaque, const uint8_t *buf, 
int size)
 IMXSerialState *s = (IMXSerialState *)opaque;
 
 s->usr2 |= USR2_WAKE;
-imx_put_data(opaque, *buf);
+
+for (int i = 0; i < size; i++) {
+imx_put_data(opaque, buf[i]);
+}
 }
 
 static void imx_event(void *opaque, QEMUChrEvent event)
-- 
2.47.1

[PATCH v2 5/9] hw/char/bcm2835_aux: Really use RX FIFO depth

2025-02-20 Thread Philippe Mathieu-Daudé

While we model a 8-elements RX FIFO since the BCM2835 AUX model
was introduced in commit 97398d900ca ("bcm2835_aux: add emulation
of BCM2835 AUX block") we only read 1 char at a time!

Have the IOCanReadHandler handler return how many elements are
available, and use that in the IOReadHandler handler.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Luc Michel 
---
 hw/char/bcm2835_aux.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/char/bcm2835_aux.c b/hw/char/bcm2835_aux.c
index 73ad5934067..c6e7eccf7dd 100644
--- a/hw/char/bcm2835_aux.c
+++ b/hw/char/bcm2835_aux.c
@@ -221,7 +221,7 @@ static int bcm2835_aux_can_receive(void *opaque)
 {
 BCM2835AuxState *s = opaque;
 
-return s->read_count < BCM2835_AUX_RX_FIFO_LEN;
+return BCM2835_AUX_RX_FIFO_LEN - s->read_count;
 }
 
 static void bcm2835_aux_put_fifo(void *opaque, uint8_t value)
@@ -243,7 +243,9 @@ static void bcm2835_aux_put_fifo(void *opaque, uint8_t 
value)
 
 static void bcm2835_aux_receive(void *opaque, const uint8_t *buf, int size)
 {
-bcm2835_aux_put_fifo(opaque, *buf);
+for (int i = 0; i < size; i++) {
+bcm2835_aux_put_fifo(opaque, buf[i]);
+}
 }
 
 static const MemoryRegionOps bcm2835_aux_ops = {
-- 
2.47.1

[PATCH 3/5] vfio/pci: Delete local pm_cap

2025-02-20 Thread Alex Williamson

This is now redundant to PCIDevice.pm_cap.

Cc: Cédric Le Goater 
Signed-off-by: Alex Williamson 
---
 hw/vfio/pci.c | 9 -
 hw/vfio/pci.h | 1 -
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 6903f831e45f..ba4ef65b16fa 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2215,7 +2215,6 @@ static bool vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t 
pos, Error **errp)
 break;
 case PCI_CAP_ID_PM:
 vfio_check_pm_reset(vdev, pos);
-vdev->pm_cap = pos;
 ret = pci_pm_init(pdev, pos, errp) >= 0;
 break;
 case PCI_CAP_ID_AF:
@@ -2407,17 +2406,17 @@ void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
 vfio_disable_interrupts(vdev);
 
 /* Make sure the device is in D0 */
-if (vdev->pm_cap) {
+if (pdev->pm_cap) {
 uint16_t pmcsr;
 uint8_t state;
 
-pmcsr = vfio_pci_read_config(pdev, vdev->pm_cap + PCI_PM_CTRL, 2);
+pmcsr = vfio_pci_read_config(pdev, pdev->pm_cap + PCI_PM_CTRL, 2);
 state = pmcsr & PCI_PM_CTRL_STATE_MASK;
 if (state) {
 pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
-vfio_pci_write_config(pdev, vdev->pm_cap + PCI_PM_CTRL, pmcsr, 2);
+vfio_pci_write_config(pdev, pdev->pm_cap + PCI_PM_CTRL, pmcsr, 2);
 /* vfio handles the necessary delay here */
-pmcsr = vfio_pci_read_config(pdev, vdev->pm_cap + PCI_PM_CTRL, 2);
+pmcsr = vfio_pci_read_config(pdev, pdev->pm_cap + PCI_PM_CTRL, 2);
 state = pmcsr & PCI_PM_CTRL_STATE_MASK;
 if (state) {
 error_report("vfio: Unable to power on device, stuck in D%d",
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 43c166680abb..d638c781f6f1 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -160,7 +160,6 @@ struct VFIOPCIDevice {
 int32_t bootindex;
 uint32_t igd_gms;
 OffAutoPCIBAR msix_relo;
-uint8_t pm_cap;
 uint8_t nv_gpudirect_clique;
 bool pci_aer;
 bool req_enabled;
-- 
2.48.1

[PATCH 2/5] pci: Use PCI PM capability initializer

2025-02-20 Thread Alex Williamson

Switch callers directly initializing the PCI PM capability with
pci_add_capability() to use pci_pm_init().

Cc: Dmitry Fleytman 
Cc: Akihiko Odaki 
Cc: Jason Wang 
Cc: Stefan Weil 
Cc: Sriram Yagnaraman 
Cc: Keith Busch 
Cc: Klaus Jensen 
Cc: Jesper Devantier 
Cc: Michael S. Tsirkin 
Cc: Marcel Apfelbaum 
Cc: Cédric Le Goater 
Signed-off-by: Alex Williamson 
---
 hw/net/e1000e.c | 3 +--
 hw/net/eepro100.c   | 4 +---
 hw/net/igb.c| 3 +--
 hw/nvme/ctrl.c  | 3 +--
 hw/pci-bridge/pcie_pci_bridge.c | 2 +-
 hw/vfio/pci.c   | 2 +-
 hw/virtio/virtio-pci.c  | 3 +--
 7 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index f637853073e2..b72cbab7e889 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -372,8 +372,7 @@ static int
 e1000e_add_pm_capability(PCIDevice *pdev, uint8_t offset, uint16_t pmc)
 {
 Error *local_err = NULL;
-int ret = pci_add_capability(pdev, PCI_CAP_ID_PM, offset,
- PCI_PM_SIZEOF, &local_err);
+int ret = pci_pm_init(pdev, offset, &local_err);
 
 if (local_err) {
 error_report_err(local_err);
diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
index 6d853229aec2..29a39865a608 100644
--- a/hw/net/eepro100.c
+++ b/hw/net/eepro100.c
@@ -551,9 +551,7 @@ static void e100_pci_reset(EEPRO100State *s, Error **errp)
 if (info->power_management) {
 /* Power Management Capabilities */
 int cfg_offset = 0xdc;
-int r = pci_add_capability(&s->dev, PCI_CAP_ID_PM,
-   cfg_offset, PCI_PM_SIZEOF,
-   errp);
+int r = pci_pm_init(&s->dev, cfg_offset, errp);
 if (r < 0) {
 return;
 }
diff --git a/hw/net/igb.c b/hw/net/igb.c
index 4d93ce629f95..700dbc746d3d 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -356,8 +356,7 @@ static int
 igb_add_pm_capability(PCIDevice *pdev, uint8_t offset, uint16_t pmc)
 {
 Error *local_err = NULL;
-int ret = pci_add_capability(pdev, PCI_CAP_ID_PM, offset,
- PCI_PM_SIZEOF, &local_err);
+int ret = pci_pm_init(pdev, offset, &local_err);
 
 if (local_err) {
 error_report_err(local_err);
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 68903d1d7067..1faea3d2b85b 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -8503,8 +8503,7 @@ static int nvme_add_pm_capability(PCIDevice *pci_dev, 
uint8_t offset)
 Error *err = NULL;
 int ret;
 
-ret = pci_add_capability(pci_dev, PCI_CAP_ID_PM, offset,
- PCI_PM_SIZEOF, &err);
+ret = pci_pm_init(pci_dev, offset, &err);
 if (err) {
 error_report_err(err);
 return ret;
diff --git a/hw/pci-bridge/pcie_pci_bridge.c b/hw/pci-bridge/pcie_pci_bridge.c
index fd4514a595ce..9fa656b43b42 100644
--- a/hw/pci-bridge/pcie_pci_bridge.c
+++ b/hw/pci-bridge/pcie_pci_bridge.c
@@ -52,7 +52,7 @@ static void pcie_pci_bridge_realize(PCIDevice *d, Error 
**errp)
 goto cap_error;
 }
 
-pos = pci_add_capability(d, PCI_CAP_ID_PM, 0, PCI_PM_SIZEOF, errp);
+pos = pci_pm_init(d, 0, errp);
 if (pos < 0) {
 goto pm_error;
 }
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 89d900e9cf0c..6903f831e45f 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2216,7 +2216,7 @@ static bool vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t 
pos, Error **errp)
 case PCI_CAP_ID_PM:
 vfio_check_pm_reset(vdev, pos);
 vdev->pm_cap = pos;
-ret = pci_add_capability(pdev, cap_id, pos, size, errp) >= 0;
+ret = pci_pm_init(pdev, pos, errp) >= 0;
 break;
 case PCI_CAP_ID_AF:
 vfio_check_af_flr(vdev, pos);
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index c773a9130c7e..afe8b5551c5c 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -2204,8 +2204,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error 
**errp)
 pos = pcie_endpoint_cap_init(pci_dev, 0);
 assert(pos > 0);
 
-pos = pci_add_capability(pci_dev, PCI_CAP_ID_PM, 0,
- PCI_PM_SIZEOF, errp);
+pos = pci_pm_init(pci_dev, 0, errp);
 if (pos < 0) {
 return;
 }
-- 
2.48.1

[PATCH 5/5] hw/vfio/pci: Re-order pre-reset

2025-02-20 Thread Alex Williamson

We want the device in the D0 power state going into reset, but the
config write can enable the BARs in the address space, which are
then removed from the address space once we clear the memory enable
bit in the command register.  Re-order to clear the command bit
first, so the power state change doesn't enable the BARs.

Cc: Cédric Le Goater 
Signed-off-by: Alex Williamson 
---
 hw/vfio/pci.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index ba4ef65b16fa..fcc5f118bf90 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2405,6 +2405,15 @@ void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
 
 vfio_disable_interrupts(vdev);
 
+/*
+ * Stop any ongoing DMA by disconnecting I/O, MMIO, and bus master.
+ * Also put INTx Disable in known state.
+ */
+cmd = vfio_pci_read_config(pdev, PCI_COMMAND, 2);
+cmd &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER |
+ PCI_COMMAND_INTX_DISABLE);
+vfio_pci_write_config(pdev, PCI_COMMAND, cmd, 2);
+
 /* Make sure the device is in D0 */
 if (pdev->pm_cap) {
 uint16_t pmcsr;
@@ -2424,15 +2433,6 @@ void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
 }
 }
 }
-
-/*
- * Stop any ongoing DMA by disconnecting I/O, MMIO, and bus master.
- * Also put INTx Disable in known state.
- */
-cmd = vfio_pci_read_config(pdev, PCI_COMMAND, 2);
-cmd &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER |
- PCI_COMMAND_INTX_DISABLE);
-vfio_pci_write_config(pdev, PCI_COMMAND, cmd, 2);
 }
 
 void vfio_pci_post_reset(VFIOPCIDevice *vdev)
-- 
2.48.1

[PATCH 0/5] PCI: Implement basic PCI PM capability backing

2025-02-20 Thread Alex Williamson

Eric recently identified an issue[1] where during graceful shutdown
of a VM in a vIOMMU configuration, the guest driver places the device
into the D3 power state, the vIOMMU is then disabled, triggering an
AddressSpace update.  The device BARs are still mapped into the AS,
but the vfio host driver refuses to DMA map the MMIO space due to the
device power state.

The proposed solution in [1] was to skip mappings based on the
device power state.  Here we take a different approach.  The PCI spec
defines that devices in D1/2/3 power state should respond only to
configuration and message requests and all other requests should be
handled as an Unsupported Request.  In other words, the memory and
IO BARs are not accessible except when the device is in the D0 power
state.

To emulate this behavior, we can factor the device power state into
the mapping state of the device BARs.  Therefore the BAR is marked
as unmapped if either the respective command register enable bit is
clear or the device is not in the D0 power state.

In order to implement this, the PowerState field of the PMCSR
register becomes writable, which allows the device to appear in
lower power states.  This also therefore implements D3 support
(insofar as the BAR behavior) for all devices implementing the PM
capability.  The PCI spec requires D3 support.

An aspect that needs attention here is whether this change in the
wmask and PMCSR bits becomes a problem for migration, and how we
might solve it.  For a guest migrating old->new, the device would
always be in the D0 power state, but the register becomes writable.
In the opposite direction, is it possible that a device could
migrate in a low power state and be stuck there since the bits are
read-only in old QEMU?  Do we need an option for this behavior and a
machine state bump, or are there alternatives?

Thanks,
Alex

[1]https://lore.kernel.org/all/20250219175941.135390-1-eric.au...@redhat.com/

Alex Williamson (5):
  hw/pci: Basic support for PCI power management
  pci: Use PCI PM capability initializer
  vfio/pci: Delete local pm_cap
  pcie, virtio: Remove redundant pm_cap
  hw/vfio/pci: Re-order pre-reset

 hw/net/e1000e.c |  3 +-
 hw/net/eepro100.c   |  4 +-
 hw/net/igb.c|  3 +-
 hw/nvme/ctrl.c  |  3 +-
 hw/pci-bridge/pcie_pci_bridge.c |  3 +-
 hw/pci/pci.c| 83 -
 hw/pci/trace-events |  2 +
 hw/vfio/pci.c   | 29 ++--
 hw/vfio/pci.h   |  1 -
 hw/virtio/virtio-pci.c  | 11 ++---
 include/hw/pci/pci.h|  3 ++
 include/hw/pci/pci_device.h |  3 ++
 include/hw/pci/pcie.h   |  2 -
 13 files changed, 112 insertions(+), 38 deletions(-)

-- 
2.48.1

Re: [PATCH v3 0/2] hw/i386/amd_iommu: Add migration support

2025-02-20 Thread Michael S. Tsirkin

On Wed, Feb 12, 2025 at 05:44:48AM +, Suravee Suthikulpanit wrote:
> Currently, amd-iommu device does not support migration. This series addresses
> an issue due hidden AMDVI-PCI device enumeration. Then introduces migratable
> VMStateDescription, which saves necessary parameters for the device.
> 
> Changes from v2:
> (https://lore.kernel.org/all/20250206051856.323651-1-suravee.suthikulpa...@amd.com)
>   * Add patch 1/2


Fails build on 32 bit:

https://gitlab.com/mstredhat/qemu/-/jobs/9202574769


In file included from ../include/qemu/osdep.h:53,
 from ../hw/i386/amd_iommu.c:23:
../include/qemu/compiler.h:70:35: error: invalid operands to binary - (have 
‘uint64_t *’ {aka ‘long long unsigned int *’} and ‘size_t *’ {aka ‘unsigned int 
*’})
   70 | #define type_check(t1,t2) ((t1*)0 - (t2*)0)
  |   ^
../include/migration/vmstate.h:269:6: note: in expansion of macro ‘type_check’
  269 |  type_check(_type, typeof_field(_state, _field)))
  |  ^~
../include/migration/vmstate.h:320:21: note: in expansion of macro 
‘vmstate_offset_value’
  320 | .offset   = vmstate_offset_value(_state, _field, _type), \
  | ^~~~
../include/migration/vmstate.h:853:5: note: in expansion of macro 
‘VMSTATE_SINGLE_TEST’
  853 | VMSTATE_SINGLE_TEST(_field, _state, NULL, _version, _info, _type)
  | ^~~
../include/migration/vmstate.h:903:5: note: in expansion of macro 
‘VMSTATE_SINGLE’
  903 | VMSTATE_SINGLE(_f, _s, _v, vmstate_info_uint64, uint64_t)
  | ^~
../include/migration/vmstate.h:937:5: note: in expansion of macro 
‘VMSTATE_UINT64_V’
  937 | VMSTATE_UINT64_V(_f, _s, 0)
  | ^~~~
../hw/i386/amd_iommu.c:1635:7: note: in expansion of macro ‘VMSTATE_UINT64’
 1635 |   VMSTATE_UINT64(devtab_len, AMDVIState),
  |   ^~


> Suravee Suthikulpanit (2):
>   hw/i386/amd_iommu: Isolate AMDVI-PCI from amd-iommu device to allow
> full control over the PCI device creation
>   hw/i386/amd_iommu: Allow migration when explicitly create the
> AMDVI-PCI device
> 
>  hw/i386/acpi-build.c |   8 ++--
>  hw/i386/amd_iommu.c  | 111 +--
>  hw/i386/amd_iommu.h  |   3 +-
>  3 files changed, 91 insertions(+), 31 deletions(-)
> 
> -- 
> 2.34.1

[PATCH v2] hw/riscv: Allow direct start of kernel for MPFS

2025-02-20 Thread Sebastian Huber

Further customize the -bios and -kernel options behaviour for the
microchip-icicle-kit machine.  If "-bios none -kernel filename" is
specified, then do not load a firmware and instead only load and start
the kernel image.

Signed-off-by: Sebastian Huber 
---

v2: Use riscv_find_firmware() to locate the firmware shipped with Qemu.

 hw/riscv/microchip_pfsoc.c | 57 ++
 1 file changed, 40 insertions(+), 17 deletions(-)

diff --git a/hw/riscv/microchip_pfsoc.c b/hw/riscv/microchip_pfsoc.c
index 4d3c5c8e20..b5ac057a1d 100644
--- a/hw/riscv/microchip_pfsoc.c
+++ b/hw/riscv/microchip_pfsoc.c
@@ -578,29 +578,45 @@ static void 
microchip_icicle_kit_machine_init(MachineState *machine)
 }
 
 /*
- * We follow the following table to select which payload we execute.
+ * We follow the following table to select which firmware we use.
  *
- *  -bios |-kernel | payload
- * ---++
- *  N |  N | HSS
- *  Y | don't care | HSS
- *  N |  Y | kernel
- *
- * This ensures backwards compatibility with how we used to expose -bios
- * to users but allows them to run through direct kernel booting as well.
+ * -bios | -kernel| firmware
+ * --++
+ * none  |  N | error
+ * none  |  Y | kernel
+ * NULL, default |  N | BIOS_FILENAME
+ * NULL, default |  Y | RISCV64_BIOS_BIN
+ * other | don't care | other
  */
+if (machine->firmware && !strcmp(machine->firmware, "none")) {
+if (!machine->kernel_filename) {
+error_report("for -bios none, a kernel is required");
+exit(1);
+}
 
-if (machine->kernel_filename) {
-firmware_name = RISCV64_BIOS_BIN;
-firmware_load_addr = memmap[MICROCHIP_PFSOC_DRAM_LO].base;
+firmware_name = NULL;
+firmware_load_addr = RESET_VECTOR;
+} else if (!machine->firmware || !strcmp(machine->firmware, "default")) {
+if (machine->kernel_filename) {
+firmware_name = RISCV64_BIOS_BIN;
+firmware_load_addr = memmap[MICROCHIP_PFSOC_DRAM_LO].base;
+} else {
+firmware_name = BIOS_FILENAME;
+firmware_load_addr = RESET_VECTOR;
+}
 } else {
-firmware_name = BIOS_FILENAME;
+firmware_name = machine->firmware;
 firmware_load_addr = RESET_VECTOR;
 }
 
-/* Load the firmware */
-firmware_end_addr = riscv_find_and_load_firmware(machine, firmware_name,
- &firmware_load_addr, 
NULL);
+/* Load the firmware if necessary */
+if (firmware_name) {
+const char *filename = riscv_find_firmware(firmware_name, NULL);
+firmware_end_addr = riscv_load_firmware(filename, &firmware_load_addr,
+NULL);
+} else {
+firmware_end_addr = firmware_load_addr;
+}
 
 riscv_boot_info_init(&boot_info, &s->soc.u_cpus);
 if (machine->kernel_filename) {
@@ -635,8 +651,15 @@ static void microchip_icicle_kit_machine_init(MachineState 
*machine)
 fdt_load_addr = 0;
 }
 
+hwaddr start_addr;
+if (firmware_name) {
+start_addr = firmware_load_addr;
+} else {
+start_addr = kernel_entry;
+}
+
 /* Load the reset vector */
-riscv_setup_rom_reset_vec(machine, &s->soc.u_cpus, firmware_load_addr,
+riscv_setup_rom_reset_vec(machine, &s->soc.u_cpus, start_addr,
   memmap[MICROCHIP_PFSOC_ENVM_DATA].base,
   memmap[MICROCHIP_PFSOC_ENVM_DATA].size,
   kernel_entry, fdt_load_addr);
-- 
2.43.0

Re: [PATCH 0/5] Improve Microchip Polarfire SoC customization

2025-02-20 Thread Sebastian Huber

- Am 20. Feb 2025 um 23:29 schrieb Philippe Mathieu-Daudé phi...@linaro.org:

> Hi Conor,
> 
> On 20/2/25 19:30, Conor Dooley wrote:
>> +cc qemu-riscv, Alistar.
>> 
>> On Fri, Feb 14, 2025 at 07:24:37AM +0100, Sebastian Huber wrote:
>>> Booting the microchip-icicle-kit machine using the latest PolarFire SoC
>>> Hart Software Services (HSS) no longer works since Qemu lacks support
>>> for several registers (clocks, DRAM controller). Also reading from the
>>> SDCard does not work currently.
>> 
>> On that note, I think the inaccurate docs about polarfire should be
>> removed. There's a wiki page here with dead links, or links to things
>> that do not work anymore:
>> https://wiki.qemu.org/Documentation/Platforms/RISCV#Microchip_PolarFire_SoC_Icicle_Kit
>> I think the whole section should be removed, find it kinda odd that
>> there's a polarfire section but not for any other board. Either way,
>> it's talking about something that just does not work, the current HSS
>> and Yocto don't boot.
>> 
>> There's also a docs page here:
>> https://www.qemu.org/docs/master/system/riscv/microchip-icicle-kit.html
>> that has a copy of the table your patch 4 modifies, that probably should
>> be updated to match your changes.
>> 
>> In a similar vein to the wiki, it talks about the HSS and booting a
>> yocto wic image. I think those should be deleted since they don't work.
>> 
>> Alistar/Other RISC-V folks, what do you think? Bin wrote the port but
>> seems to be AFK and I don't have the capacity to fix any of that stuff
>> on top of what I already do in my spare time - do you agree that
>> deleting the now inaccurate docs makes sense?
>> 
>>> In order to allow tests runs for real-time kernels such as RTEMS and
>>> Zephyr, improve the boot customization. This patch set enables a direct
>>> run of kernel executables, for example:
>>>
>>> qemu-system-riscv64 -no-reboot -nographic \
>>>-serial null -serial mon:stdio \
>>>-smp 2 \
>>>-bios none \
>>>-machine microchip-icicle-kit,clint-timebase-frequency=1000 \
>>>-kernel rtos.elf
>> 
>> The series breaks my usage:
>> qemu//build/qemu-system-riscv64 -M microchip-icicle-kit \
>>  -m 3G -smp 5 \
>>  -kernel vmlinux.bin \
>>  -dtb riscvpc.dtb \
>>  -initrd initramfs.cpio.gz \
>>  -display none -serial null \
>>  -serial mon:stdio \
>>  -D qemu.log -d unimp
>> opensbi-riscv64-generic-fw_dynamic.bin: No such file or directory
>> qemu-system-riscv64: could not load firmware
>> 'opensbi-riscv64-generic-fw_dynamic.bin'
>> make: *** [Makefile:305: qemu-icicle] Error 1
>> 
>> Figure it is likely to be your patch 4? The file does exist, so probably
>> some sort of path-to-it issues?
> 
> Maybe missing the -L option?
> 
>   -L path set the directory for the BIOS, VGA BIOS and keymaps

It was an error in patch 4/5. I sent a v2 version of it.

You have to find the firmware, before you can load it.

-- 
embedded brains GmbH & Co. KG
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/

Re: [PATCH 2/2] [NOT-FOR-MERGE] Add qtest for migration over RDMA

2025-02-20 Thread Zhijian Li (Fujitsu)



On 20/02/2025 23:55, Peter Xu wrote:
> On Thu, Feb 20, 2025 at 05:40:38PM +0800, Li Zhijian wrote:
>> On 19/02/2025 22:11, Peter Xu wrote:
>>> then
>>> in the test it tries to detect rdma link and fetch the ip only
>> It should work without root permission if we just*detect*  and*fetch ip*.
>>
>> Do you also mean we can split new-rdma-link.sh to 2 separate scripts
>> - add-rdma-link.sh # optionally, execute by user before the test 
>> (require root permission)
>> - detect-fetch-rdma.sh # execute from the migration-test
> Hmm indeed we still need a script to scan over all the ports..
>
> If having --rdma is a good idea, maybe we can further make it a parameter
> to --rdma?
>
> $ migration-test --rdma $RDMA_IP
>
> Or:
>
> $ migration-test --rdma-ip $RDMA_IP
 I think --rdma only makes sense if it's going to do something
 special. The optmimal scenario is that it always runs the test when it
 can and sets up/tears down anything it needs.

 If it needs root, I'd prefer the test informs about this and does the
 work itself.

 It would also be good to have the add + detect separate so we have more
 flexibility, maybe we manage to enable this in CI even.

 So:

 ./add.sh
 migration-test
 (runs detect.sh + runs rdma test)
 (leaves stuff behind)

 migration-test
 (skips rdma test with message that it needs root)

 sudo migration-test
 (runs add.sh + detect.sh + runs rdma test)
 (cleans itself up)

 Does that make sense to you? I hope it's not too much work.
>>> Looks good here.  We can also keep all the rdma stuff into one file, taking
>>> parameters.
>>>
>>> ./rdma-helper.sh setup
>>> ./rdma-helper.sh detect-ip
>>
>> Hi Peter and Fabiano
>>
>> Many thanks for your kindly idea and suggestion.
>> Please take another look at the changes below.
>> - I don't copy script to the build dir, just execute the script like 
>> misc-tests.c
>> - It will automatically create a new RXE if it doesn't exit when running in 
>> root
> 
> Thanks!  This is much better.  Comments below.
> 
>>
>> [PATCH] migration: Add qtest for migration over RDMA
>>
>> This qtest requires there is RDMA(RoCE) link in the host.
>> Introduce a scripts/rdma-migration-helper.sh to
>> - setup a new RXE if it's root
>> - detect existing RoCE link
>> to make the qtest work smoothly.
>>
>> Test will be skip if there is no available RoCE link.
>>   # Start of rdma tests
>>   # Running /x86_64/migration/precopy/rdma/plain
>>   ok 1 /x86_64/migration/precopy/rdma/plain # SKIP There is no available 
>> rdma link in the host.
>>   Maybe you are not running with the root permission
>>   # End of rdma tests
>>
>> Admin is able to remove the RXE by passing 'cleanup' to this script.
>>
>> Signed-off-by: Li Zhijian 
>> ---
>>   scripts/rdma-migration-helper.sh  | 40 +++
>>   tests/qtest/migration/precopy-tests.c | 57 +++
>>   2 files changed, 97 insertions(+)
>>   create mode 100755 scripts/rdma-migration-helper.sh
>>
>> diff --git a/scripts/rdma-migration-helper.sh 
>> b/scripts/rdma-migration-helper.sh
>> new file mode 100755
>> index 00..4ef62baf0f
>> --- /dev/null
>> +++ b/scripts/rdma-migration-helper.sh
>> @@ -0,0 +1,40 @@
>> +#!/bin/bash
>> +
>> +# Copied from blktests
>> +get_ipv4_addr() {
>> +ip -4 -o addr show dev "$1" |
>> +sed -n 's/.*[[:blank:]]inet[[:blank:]]*\([^[:blank:]/]*\).*/\1/p' |
>> +tr -d '\n'
>> +}
>> +
>> +has_soft_rdma() {
>> +rdma link | grep -q " netdev $1[[:blank:]]*\$"
>> +}
>> +
>> +rdma_rxe_setup_detect()
>> +{
>> +(
>> +cd /sys/class/net &&
>> +for i in *; do
>> +[ -e "$i" ] || continue
>> +[ "$i" = "lo" ] && continue
>> +[ "$(<"$i/addr_len")" = 6 ] || continue
>> +[ "$(<"$i/carrier")" = 1 ] || continue
>> +
>> +has_soft_rdma "$i" && break
>> +[ "$operation" = "setup" ] && rdma link add "${i}_rxe" type 
>> rxe netdev "$i" && break
>> +done
>> +has_soft_rdma "$i" || return
>> +get_ipv4_addr $i
>> +)
>> +}
>> +
>> +operation=${1:-setup}
>> +
>> +if [ "$operation" == "setup" ] || [ "$operation" == "detect" ]; then
>> +rdma_rxe_setup_detect
>> +elif [ "$operation" == "cleanup" ]; then
>> +modprobe -r rdma_rxe
> 
> Would this affects host when there's existing user of rdma_rxe (or fail
> with -EBUSY)?  

'modprobe -r ' will fail with EBUSY in this case.


> If there's no major side effect of leftover rdma link, we
> could also drop the "cleanup" for now and start from simple.

In my understanding, there is no any side effect, I'm fine to drop it in this 
time.


> 
>> +else
>> +echo "Usage: $0 [setup | cleanup | detect]"
>> +fi
>> diff --git a/tests/qtest/migration/precopy-tests.c 
>> b/tests/qtest/migration/precopy-tests.c
>> in

RE: [PATCH v3 08/28] hw/intc/aspeed: Add support for multiple output pins in INTC

2025-02-20 Thread Jamin Lin

Hi Cedric, 

> Subject: Re: [PATCH v3 08/28] hw/intc/aspeed: Add support for multiple output
> pins in INTC
> 
> On 2/13/25 04:35, Jamin Lin wrote:
> > Added support for multiple output pins in the INTC controller to
> > accommodate the AST2700 A1.
> >
> > Introduced "num_outpins" to represent the number of output pins.
> > Updated the IRQ handling logic to initialize and connect output pins
> > separately from input pins. Modified the "aspeed_soc_ast2700_realize"
> > function to connect source orgates to INTC and INTC to GIC128 - GIC136.
> Updated the "aspeed_intc_realize"
> > function to initialize output pins.
> >
> > Signed-off-by: Jamin Lin 
> 
> could you please add to the .git/config file :
> 
Thank you for letting me know this information and for your suggestion.
I will add it to my local Git configuration.
Jamin

>  [diff]
>  orderFile = /path/to/qemu/scripts/git.orderfile
> 
> This will put .h files before the .c files in the patch.
> 
> 
> Reviewed-by: Cédric Le Goater 
> 
> Thanks,
> 
> C.
> 
> 
> > ---
> >   hw/arm/aspeed_ast27x0.c   | 6 +-
> >   hw/intc/aspeed_intc.c | 4 
> >   include/hw/intc/aspeed_intc.h | 5 +++--
> >   3 files changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/arm/aspeed_ast27x0.c b/hw/arm/aspeed_ast27x0.c index
> > 18e14a7914..775e953afd 100644
> > --- a/hw/arm/aspeed_ast27x0.c
> > +++ b/hw/arm/aspeed_ast27x0.c
> > @@ -519,10 +519,14 @@ static void
> aspeed_soc_ast2700_realize(DeviceState *dev, Error **errp)
> >   aspeed_mmio_map(s, SYS_BUS_DEVICE(&a->intc), 0,
> >   sc->memmap[ASPEED_DEV_INTC]);
> >
> > -/* GICINT orgates -> INTC -> GIC */
> > +/* source orgates -> INTC */
> >   for (i = 0; i < ic->num_inpins; i++) {
> >   qdev_connect_gpio_out(DEVICE(&a->intc.orgates[i]), 0,
> >
> qdev_get_gpio_in(DEVICE(&a->intc),
> > i));
> > +}
> > +
> > +/* INTC -> GIC128 - GIC136 */
> > +for (i = 0; i < ic->num_outpins; i++) {
> >   sysbus_connect_irq(SYS_BUS_DEVICE(&a->intc), i,
> >  qdev_get_gpio_in(DEVICE(&a->gic),
> >
> > aspeed_soc_ast2700_gic_intcmap[i].irq));
> > diff --git a/hw/intc/aspeed_intc.c b/hw/intc/aspeed_intc.c index
> > 95b40e1935..32c4a3bb44 100644
> > --- a/hw/intc/aspeed_intc.c
> > +++ b/hw/intc/aspeed_intc.c
> > @@ -354,6 +354,9 @@ static void aspeed_intc_realize(DeviceState *dev,
> Error **errp)
> >   if (!qdev_realize(DEVICE(&s->orgates[i]), NULL, errp)) {
> >   return;
> >   }
> > +}
> > +
> > +for (i = 0; i < aic->num_outpins; i++) {
> >   sysbus_init_irq(sbd, &s->output_pins[i]);
> >   }
> >   }
> > @@ -389,6 +392,7 @@ static void aspeed_2700_intc_class_init(ObjectClass
> *klass, void *data)
> >   dc->desc = "ASPEED 2700 INTC Controller";
> >   aic->num_lines = 32;
> >   aic->num_inpins = 9;
> > +aic->num_outpins = 9;
> >   aic->mem_size = 0x4000;
> >   aic->reg_size = 0x2000;
> >   }
> > diff --git a/include/hw/intc/aspeed_intc.h
> > b/include/hw/intc/aspeed_intc.h index 5f0429c7f9..0bf96a81bb 100644
> > --- a/include/hw/intc/aspeed_intc.h
> > +++ b/include/hw/intc/aspeed_intc.h
> > @@ -17,8 +17,8 @@
> >   OBJECT_DECLARE_TYPE(AspeedINTCState, AspeedINTCClass,
> ASPEED_INTC)
> >
> >   #define ASPEED_INTC_NR_REGS (0x2000 >> 2) -#define
> > ASPEED_INTC_NR_INTS 9
> >   #define ASPEED_INTC_MAX_INPINS 9
> > +#define ASPEED_INTC_MAX_OUTPINS 9
> >
> >   struct AspeedINTCState {
> >   /*< private >*/
> > @@ -30,7 +30,7 @@ struct AspeedINTCState {
> >
> >   uint32_t regs[ASPEED_INTC_NR_REGS];
> >   OrIRQState orgates[ASPEED_INTC_MAX_INPINS];
> > -qemu_irq output_pins[ASPEED_INTC_NR_INTS];
> > +qemu_irq output_pins[ASPEED_INTC_MAX_OUTPINS];
> >
> >   uint32_t enable[ASPEED_INTC_MAX_INPINS];
> >   uint32_t mask[ASPEED_INTC_MAX_INPINS]; @@ -42,6 +42,7 @@
> struct
> > AspeedINTCClass {
> >
> >   uint32_t num_lines;
> >   uint32_t num_inpins;
> > +uint32_t num_outpins;
> >   uint64_t mem_size;
> >   uint64_t reg_size;
> >   const MemoryRegionOps *reg_ops;

Re: [PATCH v4 00/33] Multifd 🔀 device state transfer support with VFIO consumer

2025-02-20 Thread Yanghang Liu

Let me share my performance report after applying the patches for the
information:

1. live mlx VF migration

outgoing migration:
+--+---+---++
| VF(s) number | 1 | 2 | 4  |
+--+---+---++
| Memory bandwidth | 733.693 MiB/s | 556.565 MiB/s | 475.310 MiB/s  |
| Total downtime   | 227ms | 358ms | 460ms  |
+--+---+---++

incoming migration:
+--+---+---++
| VF(s) number | 1 | 2 | 4  |
+--+---+---++
| Memory bandwidth | 738.758 MiB/s | 566.175 MiB/s | 458.936 MiB/s  |
| Total downtime   | 220ms | 342ms | 459ms  |
+--+---+---++


2. live mlx VF multifd migration

outgoing migration:
+--+---++
| VF(s) number | 1 | 1  |
+--+---++
| Channel  | 4 | 6  |
| Memory bandwidth | 786.942 MiB/s | 848.362 MiB/s  |
| Total downtime   | 142ms | 188ms  |
+--+---++

+--++---++
| VF(s) number | 2  | 2 | 2  |
+--++---++
| Channel  | 4  | 6 | 8  |
| Memory bandwidth |  774.315 MiB/s | 831.938 MiB/s | 769.799 MiB/s  |
| Total downtime   | 160ms  | 178ms | 156ms  |
+--++---++

+--++---++
| VF(s) number | 4  | 4 | 4  |
+--++---++
| Channel  | 6  | 8 | 16 |
| Memory bandwidth |  715.210 MiB/s | 742.962 MiB/s | 747.188 MiB/s  |
| Total downtime   | 180ms  | 219ms | 190ms  |
+--++---++

incoming migration:
+--+---++
| VF(s) number | 1 | 1  |
+--+---++
| Channel  | 4 | 6  |
| Memory bandwidth | 807.958 MiB/s | 859.525 MiB/s  |
| Total downtime   | 150ms | 177ms  |
+--+---++

+--+---+---++
| VF(s) number | 2 | 2 | 2  |
+--+---+---++
| Channel  | 4 | 6 | 8  |
| Memory bandwidth | 768.104 MiB/s | 825.462 MiB/s | 791.582 MiB/s  |
| Total downtime   | 170ms | 185ms | 175ms  |
+--+---+---++

+--+---+---++
| VF(s) number | 4 | 4 | 4  |
+--+---+---++
| Channel  | 6 | 8 | 16 |
| Memory bandwidth | 706.921 MiB/s | 750.706 MiB/s | 746.295 MiB/s  |
| Total downtime   | 174ms | 193ms | 191ms  |
+--+---+---++

Best Regards,
Yanghang Liu


On Mon, Feb 3, 2025 at 10:20 PM Cédric Le Goater  wrote:
>
> Hello Maciej,
>
> > This patch set is targeting QEMU 10.0.
> >
> > What's not yet present is documentation update under docs/devel/migration
> > but I didn't want to delay posting the code any longer.
> > Such doc can still be merged later when the design is 100% finalized.
> The changes are quite complex, the design is not trivial, the benefits are
> not huge as far as we know. I'd rather have the doc update first please.
>
> Thanks,
>
> C.
>
>
>

Re: [PATCH] docs/cxl: Add serial number for persistent-memdev

2025-02-20 Thread Yuquan Wang

> 
> Looks good.  I've queued it up on my gitlab staging tree, but
> Michael if you want to pick this one directly that's fine as well.
> 
> I should be pushing out my gitlab tree shortly (bit of networking
> fun to deal with).
> 
Hi, Jonathan

About qemu side, I have another question: Could the qemu provide simulated
RCH-RCD topology now?

Yuquan


信息安全声明：本邮件包含信息归发件人所在组织所有,发件人所在组织对该邮件拥有所有权利。请接收者注意保密,未经发件人书面许可,不得向任何第三方组织和个人透露本邮件所含信息。
Information Security Notice: The information contained in this mail is solely 
property of the sender's organization.This mail communication is 
confidential.Recipients named above are obligated to maintain secrecy and are 
not permitted to disclose the contents of this communication to others.

[PATCH v2 4/8] migration: Integrate control_save_page() logic into ram_save_target_page()

2025-02-20 Thread Li Zhijian via

Refactor the page saving logic by integrating the control_save_page()
function directly into ram_save_target_page(). This change consolidates the
RDMA migration decision-making process into a single function, enhancing
clarity and maintainability.

Signed-off-by: Li Zhijian 
---
 migration/ram.c | 35 +++
 1 file changed, 7 insertions(+), 28 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index b7157b9b175..e07651aee8d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1143,33 +1143,6 @@ static int save_zero_page(RAMState *rs, PageSearchStatus 
*pss,
 return len;
 }
 
-/*
- * @pages: the number of pages written by the control path,
- *< 0 - error
- *> 0 - number of pages written
- *
- * Return true if the pages has been saved, otherwise false is returned.
- */
-static bool control_save_page(PageSearchStatus *pss,
-  ram_addr_t offset, int *pages)
-{
-int ret;
-
-if (migrate_rdma() && !migration_in_postcopy()) {
-ret = rdma_control_save_page(pss->pss_channel, pss->block->offset,
- offset, TARGET_PAGE_SIZE);
-
-if (ret == RAM_SAVE_CONTROL_DELAYED) {
-*pages = 1;
-} else {
-*pages = ret;
-}
-return true;
-}
-
-return false;
-}
-
 /*
  * directly send the page to the stream
  *
@@ -1966,7 +1939,13 @@ static int ram_save_target_page(RAMState *rs, 
PageSearchStatus *pss)
 int res;
 
 /* Hand over to RDMA first */
-if (control_save_page(pss, offset, &res)) {
+if (migrate_rdma() && !migration_in_postcopy()) {
+res = rdma_control_save_page(pss->pss_channel, pss->block->offset,
+ offset, TARGET_PAGE_SIZE);
+
+if (res == RAM_SAVE_CONTROL_DELAYED) {
+res = 1;
+}
 return res;
 }
 
-- 
2.44.0

[PATCH v2 8/8] migration: Add qtest for migration over RDMA

2025-02-20 Thread Li Zhijian via

This qtest requires there is a RDMA(RoCE) link in the host.
In order to make the test work smoothly, introduce a
scripts/rdma-migration-helper.sh to
- setup a new Soft-RoCE(aka RXE) if it's root
- detect existing RoCE link

Test will be skipped if there is no available RoCE link.
 # Start of rdma tests
 # Running /x86_64/migration/precopy/rdma/plain
 ok 1 /x86_64/migration/precopy/rdma/plain # SKIP
 There is no available rdma link to run RDMA migration test.
 To enable the test:
 (1) Run 'scripts/rdma-migration-helper.sh setup' with root and rerun the test
 or
 (2) Run the test with root privilege

 # End of rdma tests

Signed-off-by: Li Zhijian 
---
 MAINTAINERS   |  1 +
 scripts/rdma-migration-helper.sh  | 41 +
 tests/qtest/migration/precopy-tests.c | 64 +++
 3 files changed, 106 insertions(+)
 create mode 100755 scripts/rdma-migration-helper.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index 3848d37a38d..15360fcdc4b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3480,6 +3480,7 @@ R: Li Zhijian 
 R: Peter Xu 
 S: Odd Fixes
 F: migration/rdma*
+F: scripts/rdma-migration-helper.sh
 
 Migration dirty limit and dirty page rate
 M: Hyman Huang 
diff --git a/scripts/rdma-migration-helper.sh b/scripts/rdma-migration-helper.sh
new file mode 100755
index 000..66557d9e267
--- /dev/null
+++ b/scripts/rdma-migration-helper.sh
@@ -0,0 +1,41 @@
+#!/bin/bash
+
+# Copied from blktests
+get_ipv4_addr()
+{
+ip -4 -o addr show dev "$1" |
+sed -n 's/.*[[:blank:]]inet[[:blank:]]*\([^[:blank:]/]*\).*/\1/p' |
+tr -d '\n'
+}
+
+has_soft_rdma()
+{
+rdma link | grep -q " netdev $1[[:blank:]]*\$"
+}
+
+rdma_rxe_setup_detect()
+{
+(
+cd /sys/class/net &&
+for i in *; do
+[ -e "$i" ] || continue
+[ "$i" = "lo" ] && continue
+[ "$(<"$i/addr_len")" = 6 ] || continue
+[ "$(<"$i/carrier")" = 1 ] || continue
+
+has_soft_rdma "$i" && break
+[ "$operation" = "setup" ] &&
+rdma link add "${i}_rxe" type rxe netdev "$i" && break
+done
+has_soft_rdma "$i" || return
+get_ipv4_addr "$i"
+)
+}
+
+operation=${1:-setup}
+
+if [ "$operation" == "setup" ] || [ "$operation" == "detect" ]; then
+rdma_rxe_setup_detect
+else
+echo "Usage: $0 [setup | detect]"
+fi
diff --git a/tests/qtest/migration/precopy-tests.c 
b/tests/qtest/migration/precopy-tests.c
index ba273d10b9a..bf97f4e9325 100644
--- a/tests/qtest/migration/precopy-tests.c
+++ b/tests/qtest/migration/precopy-tests.c
@@ -99,6 +99,66 @@ static void test_precopy_unix_dirty_ring(void)
 test_precopy_common(&args);
 }
 
+#ifdef CONFIG_RDMA
+
+#define RDMA_MIGRATION_HELPER "scripts/rdma-migration-helper.sh"
+static int new_rdma_link(char *buffer)
+{
+const char *argument = (geteuid() == 0) ? "setup" : "detect";
+char cmd[1024];
+
+snprintf(cmd, sizeof(cmd), "%s %s", RDMA_MIGRATION_HELPER, argument);
+
+FILE *pipe = popen(cmd, "r");
+if (pipe == NULL) {
+perror("Failed to run script");
+return -1;
+}
+
+int idx = 0;
+while (fgets(buffer + idx, 128 - idx, pipe) != NULL) {
+idx += strlen(buffer);
+}
+
+int status = pclose(pipe);
+if (status == -1) {
+perror("Error reported by pclose()");
+return -1;
+} else if (WIFEXITED(status)) {
+return WEXITSTATUS(status);
+}
+
+return -1;
+}
+
+static void test_precopy_rdma_plain(void)
+{
+char buffer[128] = {};
+
+if (new_rdma_link(buffer)) {
+g_test_skip("\nThere is no available rdma link to run RDMA migration 
test.\n"
+"To enable the test:\n"
+"(1) Run \'" RDMA_MIGRATION_HELPER " setup\' with root and 
rerun the test\n"
+"or\n"
+"(2) Run the test with root privilege\n");
+return;
+}
+
+/*
+ * TODO: query a free port instead of hard code.
+ * 29200=('R'+'D'+'M'+'A')*100
+ **/
+g_autofree char *uri = g_strdup_printf("rdma:%s:29200", buffer);
+
+MigrateCommon args = {
+.listen_uri = uri,
+.connect_uri = uri,
+};
+
+test_precopy_common(&args);
+}
+#endif
+
 static void test_precopy_tcp_plain(void)
 {
 MigrateCommon args = {
@@ -1124,6 +1184,10 @@ static void 
migration_test_add_precopy_smoke(MigrationTestEnv *env)
test_multifd_tcp_uri_none);
 migration_test_add("/migration/multifd/tcp/plain/cancel",
test_multifd_tcp_cancel);
+#ifdef CONFIG_RDMA
+migration_test_add("/migration/precopy/rdma/plain",
+   test_precopy_rdma_plain);
+#endif
 }
 
 void migration_test_add_precopy(MigrationTestEnv *env)
-- 
2.44.0

[PATCH v2 2/8] migration/rdma: Remove redundant RAM_SAVE_CONTROL_NOT_SUPP check

2025-02-20 Thread Li Zhijian via

qemu_rdma_save_page() no longer returns RAM_SAVE_CONTROL_NOT_SUPP
since commit a4832d299dd ("migration/rdma: Check sooner if we are in postcopy 
for save_page()")

Signed-off-by: Li Zhijian 
---
 migration/rdma.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 76fb0349238..af8e6234a9f 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3290,8 +3290,7 @@ int rdma_control_save_page(QEMUFile *f, ram_addr_t 
block_offset,
 
 int ret = qemu_rdma_save_page(f, block_offset, offset, size);
 
-if (ret != RAM_SAVE_CONTROL_DELAYED &&
-ret != RAM_SAVE_CONTROL_NOT_SUPP) {
+if (ret != RAM_SAVE_CONTROL_DELAYED) {
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
-- 
2.44.0

[PATCH v2 5/8] migration: Add migration_capabilities_and_transport_compatible() helper

2025-02-20 Thread Li Zhijian via

Similar to migration_channels_and_transport_compatible(), introduce a
new helper migration_capabilities_and_transport_compatible() to check if
the capabilites is compatible with the transport.

Currently, only move the capabilities vs RDMA transport to this
function.

Signed-off-by: Li Zhijian 
---
 migration/migration.c | 36 ++--
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index c597aa707e5..2eacae25e0e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -238,6 +238,30 @@ 
migration_channels_and_transport_compatible(MigrationAddress *addr,
 return true;
 }
 
+static bool
+migration_capabilities_and_transport_compatible(MigrationAddress *addr,
+Error **errp)
+{
+if (addr->transport == MIGRATION_ADDRESS_TYPE_RDMA) {
+if (migrate_xbzrle()) {
+error_setg(errp, "RDMA and XBZRLE can't be used together");
+return false;
+}
+if (migrate_multifd()) {
+error_setg(errp, "RDMA and multifd can't be used together");
+return false;
+}
+}
+
+return true;
+}
+
+static bool migration_transport_compatible(MigrationAddress *addr, Error 
**errp)
+{
+return migration_channels_and_transport_compatible(addr, errp) &&
+   migration_capabilities_and_transport_compatible(addr, errp);
+}
+
 static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
 {
 uintptr_t a = (uintptr_t) ap, b = (uintptr_t) bp;
@@ -716,7 +740,7 @@ static void qemu_start_incoming_migration(const char *uri, 
bool has_channels,
 }
 
 /* transport mechanism not suitable for migration? */
-if (!migration_channels_and_transport_compatible(addr, errp)) {
+if (!migration_transport_compatible(addr, errp)) {
 return;
 }
 
@@ -735,14 +759,6 @@ static void qemu_start_incoming_migration(const char *uri, 
bool has_channels,
 }
 #ifdef CONFIG_RDMA
 } else if (addr->transport == MIGRATION_ADDRESS_TYPE_RDMA) {
-if (migrate_xbzrle()) {
-error_setg(errp, "RDMA and XBZRLE can't be used together");
-return;
-}
-if (migrate_multifd()) {
-error_setg(errp, "RDMA and multifd can't be used together");
-return;
-}
 rdma_start_incoming_migration(&addr->u.rdma, errp);
 #endif
 } else if (addr->transport == MIGRATION_ADDRESS_TYPE_EXEC) {
@@ -2159,7 +2175,7 @@ void qmp_migrate(const char *uri, bool has_channels,
 }
 
 /* transport mechanism not suitable for migration? */
-if (!migration_channels_and_transport_compatible(addr, errp)) {
+if (!migration_transport_compatible(addr, errp)) {
 return;
 }
 
-- 
2.44.0

[PATCH v2 1/8] migration: Prioritize RDMA in ram_save_target_page()

2025-02-20 Thread Li Zhijian via

Address an error in RDMA-based migration by ensuring RDMA is prioritized
when saving pages in `ram_save_target_page()`.

Previously, the RDMA protocol's page-saving step was placed after other
protocols due to a refactoring in commit bc38dc2f5f3. This led to migration
failures characterized by unknown control messages and state loading errors
destination:
(qemu) qemu-system-x86_64: Unknown control message QEMU FILE
qemu-system-x86_64: error while loading state section id 1(ram)
qemu-system-x86_64: load of migration failed: Operation not permitted
source:
(qemu) qemu-system-x86_64: RDMA is in an error state waiting migration to abort!
qemu-system-x86_64: failed to save SaveStateEntry with id(name): 1(ram): -1
qemu-system-x86_64: rdma migration: recv polling control error!
qemu-system-x86_64: warning: Early error. Sending error.
qemu-system-x86_64: warning: rdma migration: send polling control error

RDMA migration implemented its own protocol/method to send pages to
destination side, hand over to RDMA first to prevent pages being saved by
other protocol.

Fixes: bc38dc2f5f3 ("migration: refactor ram_save_target_page functions")
Signed-off-by: Li Zhijian 
---
 migration/ram.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 589b6505eb2..424df6d9f13 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1964,6 +1964,11 @@ static int ram_save_target_page(RAMState *rs, 
PageSearchStatus *pss)
 ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
 int res;
 
+/* Hand over to RDMA first */
+if (control_save_page(pss, offset, &res)) {
+return res;
+}
+
 if (!migrate_multifd()
 || migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
 if (save_zero_page(rs, pss, offset)) {
@@ -1976,10 +1981,6 @@ static int ram_save_target_page(RAMState *rs, 
PageSearchStatus *pss)
 return ram_save_multifd_page(block, offset);
 }
 
-if (control_save_page(pss, offset, &res)) {
-return res;
-}
-
 return ram_save_page(rs, pss);
 }
 
-- 
2.44.0

[PATCH v2 6/8] migraion: disable RDMA + postcopy-ram

2025-02-20 Thread Li Zhijian via

It's believed that RDMA + postcopy-ram has been broken for a while.
Rather than spending time re-enabling it, let's simply disable it as a
trade-off.

Signed-off-by: Li Zhijian 
---
 migration/migration.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 2eacae25e0e..d414a4b1379 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -251,6 +251,10 @@ 
migration_capabilities_and_transport_compatible(MigrationAddress *addr,
 error_setg(errp, "RDMA and multifd can't be used together");
 return false;
 }
+if (migrate_postcopy_ram()) {
+error_setg(errp, "RDMA and postcopy-ram can't be used together");
+return false;
+}
 }
 
 return true;
-- 
2.44.0

[PATCH v2 3/8] migration: Kill RAM_SAVE_CONTROL_NOT_SUPP

2025-02-20 Thread Li Zhijian via

Refactor the migration control logic by eliminating the
`RAM_SAVE_CONTROL_NOT_SUPP` return value within the migration codebase.

This involves moving the checks for RDMA migration status and postcopy
state from rdma_control_save_page() to control_save_page()

With this change, control_save_page() now takes responsibility for
determining whether RDMA operations can proceed, based on the state of
migration.

Signed-off-by: Li Zhijian 
---
 migration/ram.c  | 19 ++-
 migration/rdma.c |  4 +---
 migration/rdma.h |  3 +--
 3 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 424df6d9f13..b7157b9b175 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1155,18 +1155,19 @@ static bool control_save_page(PageSearchStatus *pss,
 {
 int ret;
 
-ret = rdma_control_save_page(pss->pss_channel, pss->block->offset, offset,
- TARGET_PAGE_SIZE);
-if (ret == RAM_SAVE_CONTROL_NOT_SUPP) {
-return false;
-}
+if (migrate_rdma() && !migration_in_postcopy()) {
+ret = rdma_control_save_page(pss->pss_channel, pss->block->offset,
+ offset, TARGET_PAGE_SIZE);
 
-if (ret == RAM_SAVE_CONTROL_DELAYED) {
-*pages = 1;
+if (ret == RAM_SAVE_CONTROL_DELAYED) {
+*pages = 1;
+} else {
+*pages = ret;
+}
 return true;
 }
-*pages = ret;
-return true;
+
+return false;
 }
 
 /*
diff --git a/migration/rdma.c b/migration/rdma.c
index af8e6234a9f..c6876347e1e 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3284,9 +3284,7 @@ err:
 int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
ram_addr_t offset, size_t size)
 {
-if (!migrate_rdma() || migration_in_postcopy()) {
-return RAM_SAVE_CONTROL_NOT_SUPP;
-}
+assert(migrate_rdma());
 
 int ret = qemu_rdma_save_page(f, block_offset, offset, size);
 
diff --git a/migration/rdma.h b/migration/rdma.h
index f55f28bbed1..8eeb0117b91 100644
--- a/migration/rdma.h
+++ b/migration/rdma.h
@@ -33,7 +33,6 @@ void rdma_start_incoming_migration(InetSocketAddress 
*host_port, Error **errp);
 #define RAM_CONTROL_ROUND 1
 #define RAM_CONTROL_FINISH3
 
-#define RAM_SAVE_CONTROL_NOT_SUPP -1000
 #define RAM_SAVE_CONTROL_DELAYED  -2000
 
 #ifdef CONFIG_RDMA
@@ -56,7 +55,7 @@ static inline
 int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
ram_addr_t offset, size_t size)
 {
-return RAM_SAVE_CONTROL_NOT_SUPP;
+g_assert_not_reached();
 }
 #endif
 #endif
-- 
2.44.0

[PATCH v2 7/8] migration/rdma: Remove redundant migration_in_postcopy checks

2025-02-20 Thread Li Zhijian via

Since we have disabled RDMA + postcopy, it's safe to remove
the migration_in_postcopy()  that follows the migration_rdma().

Signed-off-by: Li Zhijian 
---
 migration/ram.c  | 2 +-
 migration/rdma.c | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index e07651aee8d..c363034c882 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1939,7 +1939,7 @@ static int ram_save_target_page(RAMState *rs, 
PageSearchStatus *pss)
 int res;
 
 /* Hand over to RDMA first */
-if (migrate_rdma() && !migration_in_postcopy()) {
+if (migrate_rdma()) {
 res = rdma_control_save_page(pss->pss_channel, pss->block->offset,
  offset, TARGET_PAGE_SIZE);
 
diff --git a/migration/rdma.c b/migration/rdma.c
index c6876347e1e..0349dd4a8b8 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3826,7 +3826,7 @@ int rdma_block_notification_handle(QEMUFile *f, const 
char *name)
 
 int rdma_registration_start(QEMUFile *f, uint64_t flags)
 {
-if (!migrate_rdma() || migration_in_postcopy()) {
+if (!migrate_rdma()) {
 return 0;
 }
 
@@ -3858,7 +3858,8 @@ int rdma_registration_stop(QEMUFile *f, uint64_t flags)
 RDMAControlHeader head = { .len = 0, .repeat = 1 };
 int ret;
 
-if (!migrate_rdma() || migration_in_postcopy()) {
+/* Hand over to RDMA first */
+if (!migrate_rdma()) {
 return 0;
 }
 
-- 
2.44.0

Re: [PATCH rfcv2 09/20] intel_iommu: Rename vtd_ce_get_rid2pasid_entry to vtd_ce_get_pasid_entry

2025-02-20 Thread CLEMENT MATHIEU--DRIF

Reviewed-by: Clément Mathieu--Drif

On 19/02/2025 09:22, Zhenzhong Duan wrote:
> Caution: External email. Do not open attachments or click links, unless this 
> email comes from a known sender and you know the content is safe.
> 
> 
> In early days vtd_ce_get_rid2pasid_entry() is used to get pasid entry of
> rid2pasid, then extend to any pasid. So a new name vtd_ce_get_pasid_entry
> is better to match its functions.
> 
> No functional change intended.
> 
> Signed-off-by: Zhenzhong Duan 
> ---
>   hw/i386/intel_iommu.c | 14 +++---
>   1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 7fde0603bf..df5fb30bc8 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -944,7 +944,7 @@ static int vtd_get_pe_from_pasid_table(IntelIOMMUState *s,
>   return 0;
>   }
> 
> -static int vtd_ce_get_rid2pasid_entry(IntelIOMMUState *s,
> +static int vtd_ce_get_pasid_entry(IntelIOMMUState *s,
> VTDContextEntry *ce,
> VTDPASIDEntry *pe,
> uint32_t pasid)
> @@ -1025,7 +1025,7 @@ static uint32_t vtd_get_iova_level(IntelIOMMUState *s,
>   VTDPASIDEntry pe;
> 
>   if (s->root_scalable) {
> -vtd_ce_get_rid2pasid_entry(s, ce, &pe, pasid);
> +vtd_ce_get_pasid_entry(s, ce, &pe, pasid);
>   if (s->flts) {
>   return VTD_PE_GET_FL_LEVEL(&pe);
>   } else {
> @@ -1048,7 +1048,7 @@ static uint32_t vtd_get_iova_agaw(IntelIOMMUState *s,
>   VTDPASIDEntry pe;
> 
>   if (s->root_scalable) {
> -vtd_ce_get_rid2pasid_entry(s, ce, &pe, pasid);
> +vtd_ce_get_pasid_entry(s, ce, &pe, pasid);
>   return 30 + ((pe.val[0] >> 2) & VTD_SM_PASID_ENTRY_AW) * 9;
>   }
> 
> @@ -1116,7 +1116,7 @@ static dma_addr_t 
> vtd_get_iova_pgtbl_base(IntelIOMMUState *s,
>   VTDPASIDEntry pe;
> 
>   if (s->root_scalable) {
> -vtd_ce_get_rid2pasid_entry(s, ce, &pe, pasid);
> +vtd_ce_get_pasid_entry(s, ce, &pe, pasid);
>   if (s->flts) {
>   return pe.val[2] & VTD_SM_PASID_ENTRY_FLPTPTR;
>   } else {
> @@ -1522,7 +1522,7 @@ static int vtd_ce_rid2pasid_check(IntelIOMMUState *s,
>* has valid rid2pasid setting, which includes valid
>* rid2pasid field and corresponding pasid entry setting
>*/
> -return vtd_ce_get_rid2pasid_entry(s, ce, &pe, PCI_NO_PASID);
> +return vtd_ce_get_pasid_entry(s, ce, &pe, PCI_NO_PASID);
>   }
> 
>   /* Map a device to its corresponding domain (context-entry) */
> @@ -1611,7 +1611,7 @@ static uint16_t vtd_get_domain_id(IntelIOMMUState *s,
>   VTDPASIDEntry pe;
> 
>   if (s->root_scalable) {
> -vtd_ce_get_rid2pasid_entry(s, ce, &pe, pasid);
> +vtd_ce_get_pasid_entry(s, ce, &pe, pasid);
>   return VTD_SM_PASID_ENTRY_DID(pe.val[1]);
>   }
> 
> @@ -1687,7 +1687,7 @@ static bool vtd_dev_pt_enabled(IntelIOMMUState *s, 
> VTDContextEntry *ce,
>   int ret;
> 
>   if (s->root_scalable) {
> -ret = vtd_ce_get_rid2pasid_entry(s, ce, &pe, pasid);
> +ret = vtd_ce_get_pasid_entry(s, ce, &pe, pasid);
>   if (ret) {
>   /*
>* This error is guest triggerable. We should assumt PT
> --
> 2.34.1
>

Re: [PATCH v3 00/14] Change ghes to use HEST-based offsets and add support for error inject

2025-02-20 Thread Mauro Carvalho Chehab

Em Mon, 3 Feb 2025 16:22:36 +0100
Igor Mammedov  escreveu:

> On Mon, 3 Feb 2025 11:09:34 +
> Jonathan Cameron  wrote:
> 
> > On Fri, 31 Jan 2025 18:42:41 +0100
> > Mauro Carvalho Chehab  wrote:
> >   
> > > Now that the ghes preparation patches were merged, let's add support
> > > for error injection.
> > > 
> > > On this series, the first 6 patches chang to the math used to calculate 
> > > offsets at HEST
> > > table and hardware_error firmware file, together with its migration code. 
> > > Migration tested
> > > with both latest QEMU released kernel and upstream, on both directions.
> > > 
> > > The next patches add a new QAPI to allow injecting GHESv2 errors, and a 
> > > script using such QAPI
> > >to inject ARM Processor Error records.
> > > 
> > > If I'm counting well, this is the 19th submission of my error inject 
> > > patches.
> > 
> > Looks good to me. All remaining trivial things are in the category
> > of things to consider only if you are doing another spin.  The code
> > ends up how I'd like it at the end of the series anyway, just
> > a question of the precise path to that state!  
> 
> if you look at series as a whole it's more or less fine (I guess you
> and me got used to it)
> 
> however if you take it patch by patch (as if you've never seen it)
> ordering is messed up (the same would apply to everyone after a while
> when it's forgotten)
> 
> So I'd strongly suggest to restructure the series (especially 2-6/14).
> re sum up my comments wrt ordering:
> 
> 0  add testcase for HEST table with current HEST as expected blob
>(currently missing), so that we can be sure that we haven't messed
>existing tables during refactoring.

Not sure if I got this one. The HEST table is part of etc/acpi/tables,
which is already tested, as you pointed at the previous reviews. Doing
changes there is already detected. That's basically why we added patches
10 and 12:

[PATCH v3 10/14] tests/acpi: virt: allow acpi table changes for a new 
table: HEST
[PATCH v3 12/14] tests/acpi: virt: add a HEST table to aarch64 virt and 
update DSDT

What tests don't have is a check for etc/hardware_errors firmware inside 
tests/data/acpi/aarch64/virt/, but, IMO, we shouldn't add it there.

See, hardware_errors table contains only some skeleton space to
store:

- 1 or more error block address offsets;
- 1 or more read ack register;
- 1 or more HEST source entries containing CPER blocks.

There's nothing there to be actually checked: it is just some
empty spaces with a variable number of fields.

With the new code, the actual number of CPER blocks and their
corresponding offsets and read ack registers can be different on 
different architectures. So, for instance, when we add x86 support,
we'll likely start with just one error source entry, while arm will
have two after this changeset.

Also, one possibility to address the issues reported by Gavin Shan at
https://lore.kernel.org/qemu-devel/20250214041635.608012-1-gs...@redhat.com/
would be to have one entry per each CPU. So, the size of such firmware
could be dependent on the number of CPUs.

So, adding any validation to it would just cause pain and probably
won't detect any problems.

What could be done instead is to have a different type of tests that
would use the error injection script to check if regressions are 
introduced after QEMU 10.0. Such new kind of test would require
this series to be merged first. It would also require the usage of
an OSPM image with some testing tools on it. This is easier said 
than done, as besides the complexity of having an OSPM test image,
such kind of tests would require extra logic, specially if it would
check regressions for SEA and other notification sources.

Thanks,
Mauro

Re: [PATCH 00/42] docs: add sphinx-domain rST generator to qapidoc

2025-02-20 Thread Markus Armbruster

John Snow  writes:

> On Wed, Feb 19, 2025 at 8:22 AM Markus Armbruster  wrote:
>
>> John Snow  writes:
>>
>> > "The text handler you add looks just like the existing latex handler. Does
>> > LaTeX output lack "little headings", too?"
>> >
>> > Yes, almost certainly. Can you let me know which output formats we actually
>> > "care about"? I'll have to test them all.
>>
>> As far as I can tell, our build system runs sphinx-build -b html and -b
>> man.
>>
>> I run it with -b text manually all the time to hunt for and review
>> changes in output.  I'd prefer to keep it working if practical.
>>
>> For what it's worth, there is a bit of LaTeX configuration in
>> docs/conf.py.
>>
>> >   In the meantime, I upgraded my
>> > patch so that the text translator properly handles branches with headings
>> > that delineate the different branches so that the text output is fully
>> > reasonable. I will need to do the same for any format we care about.
>> >
>> > I've re-pushed as of "about 30 minutes before I wrote this email" --
>> > https://gitlab.com/jsnow/qemu/-/commits/sphinx-domain-blergh2
>> >
>> > This branch includes the text generator fixes (which technically belong
>> > with the predecessor series we skipped, but I'll refactor that later.)
>> > it also includes fixes to the branch inliner, generated return statements,
>> > and generated out-of-band feature sections.
>>
>> I'll fetch it, thanks!
>>
>> > (Long story short: inserting new sections in certain spots was broken
>> > because of cache. Oops. We can discuss more why I wrote that part of the
>> > code like I did in review for the patch that introduced that problem. It's
>> > the "basic inliner" patch.)
>> >
>> > Below, I'm going to try a new communication approach where I explicitly say
>> > if I have added something to my tasklist or not so that it's clear to you
>> > what I believe is actionable (and what I am agreeing to change) and what I
>> > believe needs stronger input from you before I do anything. Apologies if it
>> > seems a little robotic, just trying new things O:-)
>> >
>> > On that note: not added to tasklist: do we need the LaTeX handler? Do we
>> > need any others? Please confirm O:-)
>>
>> See above.
>>
>
> I've got html and text working, text wasn't hard. I will give it a good
> college try on the LaTeX and man formats. Might be easy. The issue here is
> the custom node I introduced for the collapsible details sections which has
> no default handler in the generators. I'll have to learn more about that
> part of the API, I haven't interfaced with it much yet.

Understand.

Have you considered cutting the series in half before the inliner?
First part emits "The members of ..." like the old doc generator.
Second part replaces that with inlined material.

We could totally release with just the first half!  Inlining is great,
but even without it, your work looks so much better and is so much more
usable.

>> > On Fri, Feb 14, 2025 at 7:05 AM Markus Armbruster  
>> > wrote:
>> >
>> >> I started to eyeball old and new generated output side by side.
>> >>
>> >> New table of contents shows one level, old two.  No objection; the
>> >> navigation thingie on the left is more useful anyway.
>> >>
>> >
>> > Unintentional, but if you like it, it's fine by me. Nothing added to my
>> > tasklist.
>>
>> Mention in a commit message.
>>
>
> Sure. I... just need to figure out which commit to mention it in. Added to
> my list, anyway.
>
>
>>
>> >> The new generator elides unreferenced types.  Generally good, but two
>> >> observations:
>> >>
>> >> * QapiErrorClass is unreferenced, but its members are mentioned in
>> >>   Errors sections.  QapiErrorClass serves as better than nothing error
>> >>   code documentation, but it's gone in the new doc.  So this is a minor
>> >>   regression.  We can figure out what to do about it later.
>> >>
>> >
>> > Right. I debated making the members references to that class, but recalled
>> > that you disliked this class and figured you'd not like such a change, so I
>> > just left it alone. I do not have cross-references for individual members
>> > of objects at all yet anyway, so this is definitely more work regardless.
>> >
>> > We could always create a pragma of some sort (or just hardcode a list) of
>> > items that must be documented regardless of if they're referenced or not.
>> > Please let me know your preference and I will add a "ticket" on my personal
>> > tasklist for this project to handle that at /some point/. Nothing added to
>> > my tasklist just yet.
>>
>> Suggest to add something like "compensate for the loss of QapiErrorClass
>> documentation in the QEMU QMP Reference Manual".
>>
>
> Got it. Possibly a "for later" task but not much later. It can always come
> after this first series, but before we "turn on" the new generator, if that
> makes sense. Just so we reach a quiescent point and flush the staggeringly
> large queue.

I think we could even do it after "turn on".  Ye

RE: [PATCH 4/5] pcie, virtio: Remove redundant pm_cap

2025-02-20 Thread Duan, Zhenzhong




>-Original Message-
>From: Alex Williamson 
>Subject: [PATCH 4/5] pcie, virtio: Remove redundant pm_cap
>
>The pm_cap on the PCIExpressDevice object can be distilled down
>to the new instance on the PCIDevice object.
>
>Cc: Michael S. Tsirkin 
>Cc: Marcel Apfelbaum 
>Signed-off-by: Alex Williamson 
>---
> hw/pci-bridge/pcie_pci_bridge.c | 1 -
> hw/virtio/virtio-pci.c  | 8 +++-
> include/hw/pci/pcie.h   | 2 --
> 3 files changed, 3 insertions(+), 8 deletions(-)
>
>diff --git a/hw/pci-bridge/pcie_pci_bridge.c b/hw/pci-bridge/pcie_pci_bridge.c
>index 9fa656b43b42..2429503cfbbf 100644
>--- a/hw/pci-bridge/pcie_pci_bridge.c
>+++ b/hw/pci-bridge/pcie_pci_bridge.c
>@@ -56,7 +56,6 @@ static void pcie_pci_bridge_realize(PCIDevice *d, Error
>**errp)
> if (pos < 0) {
> goto pm_error;
> }
>-d->exp.pm_cap = pos;
> pci_set_word(d->config + pos + PCI_PM_PMC, 0x3);
>
> pcie_cap_arifwd_init(d);
>diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
>index afe8b5551c5c..3ca3f849d391 100644
>--- a/hw/virtio/virtio-pci.c
>+++ b/hw/virtio/virtio-pci.c
>@@ -2209,8 +2209,6 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error
>**errp)
> return;
> }
>
>-pci_dev->exp.pm_cap = pos;
>-
> /*
>  * Indicates that this function complies with revision 1.2 of the
>  * PCI Power Management Interface Specification.
>@@ -2309,11 +2307,11 @@ static bool virtio_pci_no_soft_reset(PCIDevice *dev)
> {
> uint16_t pmcsr;
>
>-if (!pci_is_express(dev) || !dev->exp.pm_cap) {
>+if (!pci_is_express(dev) || !(dev->cap_present & QEMU_PCI_CAP_PM)) {

Maybe a bit more optimized by checking dev->pm_cap,
but it's also ok checking present bit. For the whole series,

Reviewed-by: Zhenzhong Duan 

Thanks
Zhenzhong

> return false;
> }
>
>-pmcsr = pci_get_word(dev->config + dev->exp.pm_cap + PCI_PM_CTRL);
>+pmcsr = pci_get_word(dev->config + dev->pm_cap + PCI_PM_CTRL);
>
> /*
>  * When No_Soft_Reset bit is set and the device
>@@ -2342,7 +2340,7 @@ static void virtio_pci_bus_reset_hold(Object *obj,
>ResetType type)
>
> if (proxy->flags & VIRTIO_PCI_FLAG_INIT_PM) {
> pci_word_test_and_clear_mask(
>-dev->config + dev->exp.pm_cap + PCI_PM_CTRL,
>+dev->config + dev->pm_cap + PCI_PM_CTRL,
> PCI_PM_CTRL_STATE_MASK);
> }
> }
>diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
>index b8d59732bc63..70a5de09de39 100644
>--- a/include/hw/pci/pcie.h
>+++ b/include/hw/pci/pcie.h
>@@ -58,8 +58,6 @@ typedef enum {
> struct PCIExpressDevice {
> /* Offset of express capability in config space */
> uint8_t exp_cap;
>-/* Offset of Power Management capability in config space */
>-uint8_t pm_cap;
>
> /* SLOT */
> bool hpev_notified; /* Logical AND of conditions for hot plug event.
>--
>2.48.1

[PATCH v2 0/8] migration/rdma: fixes, refactor and cleanup

2025-02-20 Thread Li Zhijian via

- It fix the RDMA migration broken issue
- disable RDMA + postcopy
- some cleanups
- Add a qtest for RDMA at last

Changs since V1[0]:
Add some saparate patches to refactor and cleanup based on V1

[0] 
https://lore.kernel.org/qemu-devel/20250218074345.638203-1-lizhij...@fujitsu.com/

Li Zhijian (8):
  migration: Prioritize RDMA in ram_save_target_page()
  migration/rdma: Remove redundant RAM_SAVE_CONTROL_NOT_SUPP check
  migration: Kill RAM_SAVE_CONTROL_NOT_SUPP
  migration: Integrate control_save_page() logic into
ram_save_target_page()
  migration: Add migration_capabilities_and_transport_compatible()
helper
  migraion: disable RDMA + postcopy-ram
  migration/rdma: Remove redundant migration_in_postcopy checks
  migration: Add qtest for migration over RDMA

 MAINTAINERS   |  1 +
 migration/migration.c | 40 -
 migration/ram.c   | 41 +
 migration/rdma.c  | 12 +++--
 migration/rdma.h  |  3 +-
 scripts/rdma-migration-helper.sh  | 41 +
 tests/qtest/migration/precopy-tests.c | 64 +++
 7 files changed, 153 insertions(+), 49 deletions(-)
 create mode 100755 scripts/rdma-migration-helper.sh

-- 
2.44.0

RE: [PATCH v3 21/28] hw/misc/aspeed_hace: Fix boot issue in the Crypto Manager Self Test

2025-02-20 Thread Jamin Lin

Hi Cedric, 

> 
> Hi Cedric,
> 
> > Subject: Re: [PATCH v3 21/28] hw/misc/aspeed_hace: Fix boot issue in
> > the Crypto Manager Self Test
> >
> > On 2/13/25 04:35, Jamin Lin wrote:
> > > Currently, it does not support the CRYPT command. Instead, it only
> > > sends an interrupt to notify the firmware that the crypt command has
> > completed.
> > > It is a temporary workaround to resolve the boot issue in the Crypto
> > > Manager Self Test.
> > >
> > > Signed-off-by: Jamin Lin 
> >
> > Please add an AspeedHACEClass class attribute (bool) to handle this
> > workaround and a comment in the code mentioning the issue.
> >
> Thanks for review and suggestion.
> I will add the use_crypt_workaround attribute to the AspeedHACEClass and
> introduce the use-crypt-workaround property.
> Do you have any concerns, or do you have a preferred naming convention?


Update my comments.
I do not need to create a property. I can set use_crypt_workaround to true in 
aspeed_ast2700_hace_class_init
Thanks-Jamin

> Thanks-Jamin
> 
> >
> > Thanks,
> >
> > C.
> >
> >
> >
> > > ---
> > >   hw/misc/aspeed_hace.c | 12 
> > >   1 file changed, 12 insertions(+)
> > >
> > > diff --git a/hw/misc/aspeed_hace.c b/hw/misc/aspeed_hace.c index
> > > 86422cb3be..4d0999e7e9 100644
> > > --- a/hw/misc/aspeed_hace.c
> > > +++ b/hw/misc/aspeed_hace.c
> > > @@ -59,6 +59,7 @@
> > >   /* Other cmd bits */
> > >   #define  HASH_IRQ_ENBIT(9)
> > >   #define  HASH_SG_EN BIT(18)
> > > +#define  CRYPT_IRQ_EN   BIT(12)
> > >   /* Scatter-gather data list */
> > >   #define SG_LIST_LEN_SIZE4
> > >   #define SG_LIST_LEN_MASK0x0FFF
> > > @@ -343,6 +344,13 @@ static void aspeed_hace_write(void *opaque,
> > hwaddr addr, uint64_t data,
> > >   qemu_irq_lower(s->irq);
> > >   }
> > >   }
> > > +if (data & CRYPT_IRQ) {
> > > +data &= ~CRYPT_IRQ;
> > > +
> > > +if (s->regs[addr] & CRYPT_IRQ) {
> > > +qemu_irq_lower(s->irq);
> > > +}
> > > +}
> > >   break;
> > >   case R_HASH_SRC:
> > >   data &= ahc->src_mask;
> > > @@ -388,6 +396,10 @@ static void aspeed_hace_write(void *opaque,
> > hwaddr addr, uint64_t data,
> > >   case R_CRYPT_CMD:
> > >   qemu_log_mask(LOG_UNIMP, "%s: Crypt commands not
> > implemented\n",
> > >  __func__);
> > > +s->regs[R_STATUS] |= CRYPT_IRQ;
> > > +if (data & CRYPT_IRQ_EN) {
> > > +qemu_irq_raise(s->irq);
> > > +}
> > >   break;
> > >   default:
> > >   break;

Re: [PATCH 0/5] PCI: Implement basic PCI PM capability backing

2025-02-20 Thread Michael S. Tsirkin

On Thu, Feb 20, 2025 at 03:48:53PM -0700, Alex Williamson wrote:
> Eric recently identified an issue[1] where during graceful shutdown
> of a VM in a vIOMMU configuration, the guest driver places the device
> into the D3 power state, the vIOMMU is then disabled, triggering an
> AddressSpace update.  The device BARs are still mapped into the AS,
> but the vfio host driver refuses to DMA map the MMIO space due to the
> device power state.
> 
> The proposed solution in [1] was to skip mappings based on the
> device power state.  Here we take a different approach.  The PCI spec
> defines that devices in D1/2/3 power state should respond only to
> configuration and message requests and all other requests should be
> handled as an Unsupported Request.  In other words, the memory and
> IO BARs are not accessible except when the device is in the D0 power
> state.
> 
> To emulate this behavior, we can factor the device power state into
> the mapping state of the device BARs.  Therefore the BAR is marked
> as unmapped if either the respective command register enable bit is
> clear or the device is not in the D0 power state.
> 
> In order to implement this, the PowerState field of the PMCSR
> register becomes writable, which allows the device to appear in
> lower power states.  This also therefore implements D3 support
> (insofar as the BAR behavior) for all devices implementing the PM
> capability.  The PCI spec requires D3 support.
> 
> An aspect that needs attention here is whether this change in the
> wmask and PMCSR bits becomes a problem for migration, and how we
> might solve it.  For a guest migrating old->new, the device would
> always be in the D0 power state, but the register becomes writable.
> In the opposite direction, is it possible that a device could
> migrate in a low power state and be stuck there since the bits are
> read-only in old QEMU?  Do we need an option for this behavior and a
> machine state bump, or are there alternatives?
> 
> Thanks,
> Alex
> 
> [1]https://lore.kernel.org/all/20250219175941.135390-1-eric.au...@redhat.com/


PCI bits:

Reviewed-by: Michael S. Tsirkin 

feel free to merge.

> Alex Williamson (5):
>   hw/pci: Basic support for PCI power management
>   pci: Use PCI PM capability initializer
>   vfio/pci: Delete local pm_cap
>   pcie, virtio: Remove redundant pm_cap
>   hw/vfio/pci: Re-order pre-reset
> 
>  hw/net/e1000e.c |  3 +-
>  hw/net/eepro100.c   |  4 +-
>  hw/net/igb.c|  3 +-
>  hw/nvme/ctrl.c  |  3 +-
>  hw/pci-bridge/pcie_pci_bridge.c |  3 +-
>  hw/pci/pci.c| 83 -
>  hw/pci/trace-events |  2 +
>  hw/vfio/pci.c   | 29 ++--
>  hw/vfio/pci.h   |  1 -
>  hw/virtio/virtio-pci.c  | 11 ++---
>  include/hw/pci/pci.h|  3 ++
>  include/hw/pci/pci_device.h |  3 ++
>  include/hw/pci/pcie.h   |  2 -
>  13 files changed, 112 insertions(+), 38 deletions(-)
> 
> -- 
> 2.48.1

Re: [PATCH v2 0/2] s390x: support virtio-mem-pci

2025-02-20 Thread Michael S. Tsirkin

On Tue, Jan 28, 2025 at 07:57:03PM +0100, David Hildenbrand wrote:
> This is based-on [1], which adds MSI-X support to virtio-balloon-pci,
> but can be applied independently.
> 
> Turns out it is fairly easy to get virtio-mem-pci running on s390x. We
> only have to add MSI-X support to virtio-mem-pci, and wire-up the
> (un)plugging in the machine.
> 
> Tried some simple stuff (hotplug/hotunplug/resize/reboot), and all seems
> to be working as expected.
> 
> The kernel in the VM needs both, CONFIG_VIRTIO_PCI and CONFIG_VIRTIO_MEM
> for it to work.
> 
> [1] https://lkml.kernel.org/r/20250115161425.246348-1-ar...@linux.ibm.com


Fails CI:

https://gitlab.com/mstredhat/qemu/-/jobs/9202574981


> v1 -> v2:
> * There are no transitional/non_transitional devices for virtio-mem
> * Spell out removal of "return;" in second patch
> 
> Cc: Eduardo Habkost 
> Cc: Marcel Apfelbaum 
> Cc: Richard Henderson 
> Cc: David Hildenbrand 
> Cc: Ilya Leoshkevich 
> Cc: Halil Pasic 
> Cc: Christian Borntraeger 
> Cc: Eric Farman 
> Cc: Thomas Huth 
> Cc: "Michael S. Tsirkin" 
> Cc: Cornelia Huck 
> Cc: Boris Fiuczynski 
> Cc: Michal Privoznik 
> Cc: Mario Casquero 
> 
> David Hildenbrand (2):
>   virtio-mem-pci: Allow setting nvectors, so we can use MSI-X
>   s390x/s390-virtio-ccw: Support plugging PCI-based virtio memory
> devices
> 
>  hw/core/machine.c  |  1 +
>  hw/s390x/s390-virtio-ccw.c | 20 ++--
>  hw/virtio/virtio-mem-pci.c | 12 
>  3 files changed, 27 insertions(+), 6 deletions(-)
> 
> -- 
> 2.48.1

Re: [PATCH v5 3/6] target/i386: Update EPYC-Milan CPU model for Cache property, RAS, SVM feature bits

2025-02-20 Thread Moger, Babu


Hi Zhao,

On 2/20/2025 5:26 AM, Zhao Liu wrote:

+static const CPUCaches epyc_milan_v3_cache_info = {
+.l1d_cache = &(CPUCacheInfo) {
+.type = DATA_CACHE,
+.level = 1,
+.size = 32 * KiB,
+.line_size = 64,
+.associativity = 8,
+.partitions = 1,
+.sets = 64,
+.lines_per_tag = 1,
+.self_init = 1,


true.


Sure.




+.share_level = CPU_TOPOLOGY_LEVEL_CORE,
+},
+.l1i_cache = &(CPUCacheInfo) {
+.type = INSTRUCTION_CACHE,
+.level = 1,
+.size = 32 * KiB,
+.line_size = 64,
+.associativity = 8,
+.partitions = 1,
+.sets = 64,
+.lines_per_tag = 1,
+.self_init = 1,


true.


Sure.



Reviewed-by: Zhao Liu 




Thanks
Babu

Re: [PATCH v5 4/6] target/i386: Add feature that indicates WRMSR to BASE reg is non-serializing

2025-02-20 Thread Moger, Babu

Hi Zhao,

On 2/20/2025 6:00 AM, Zhao Liu wrote:

On Thu, Feb 06, 2025 at 01:28:37PM -0600, Babu Moger wrote:

Date: Thu, 6 Feb 2025 13:28:37 -0600
From: Babu Moger 
Subject: [PATCH v5 4/6] target/i386: Add feature that indicates WRMSR to
  BASE reg is non-serializing
X-Mailer: git-send-email 2.34.1

Add the CPUID bit indicates that a WRMSR to MSR_FS_BASE, MSR_GS_BASE, or
MSR_KERNEL_GS_BASE is non-serializing.

CPUID_Fn8021_EAX
BitFeature description
1  FsGsKernelGsBaseNonSerializing.
WRMSR to FS_BASE, GS_BASE and KernelGSbase are non-serializing.

Link: 
https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/programmer-references/57238.zip
Signed-off-by: Babu Moger 
Reviewed-by: Maksim Davydov 
---
  target/i386/cpu.c | 2 +-
  target/i386/cpu.h | 2 ++
  2 files changed, 3 insertions(+), 1 deletion(-)

Reviewed-by: Zhao Liu 

Thank you,
Babu

Re: [PATCH v5 5/6] target/i386: Update EPYC-Genoa for Cache property, perfmon-v2, RAS and SVM feature bits

2025-02-20 Thread Moger, Babu


Hi Zhao,

On 2/20/2025 6:05 AM, Zhao Liu wrote:

+static const CPUCaches epyc_genoa_v2_cache_info = {
+.l1d_cache = &(CPUCacheInfo) {
+.type = DATA_CACHE,
+.level = 1,
+.size = 32 * KiB,
+.line_size = 64,
+.associativity = 8,
+.partitions = 1,
+.sets = 64,
+.lines_per_tag = 1,
+.self_init = 1,


true.


Sure.




+.share_level = CPU_TOPOLOGY_LEVEL_CORE,
+},
+.l1i_cache = &(CPUCacheInfo) {
+.type = INSTRUCTION_CACHE,
+.level = 1,
+.size = 32 * KiB,
+.line_size = 64,
+.associativity = 8,
+.partitions = 1,
+.sets = 64,
+.lines_per_tag = 1,
+.self_init = 1,


true.


Sure.


Reviewed-by: Zhao Liu 




Thanks
Babu

Re: [PATCH v5 2/6] target/i386: Update EPYC-Rome CPU model for Cache property, RAS, SVM feature bits

2025-02-20 Thread Moger, Babu

Hi Zhao,

On 2/20/2025 5:18 AM, Zhao Liu wrote:

On Thu, Feb 06, 2025 at 01:28:35PM -0600, Babu Moger wrote:

Date: Thu, 6 Feb 2025 13:28:35 -0600
From: Babu Moger 
Subject: [PATCH v5 2/6] target/i386: Update EPYC-Rome CPU model for Cache
  property, RAS, SVM feature bits
X-Mailer: git-send-email 2.34.1

Found that some of the cache properties are not set correctly for EPYC models.

l1d_cache.no_invd_sharing should not be true.
l1i_cache.no_invd_sharing should not be true.

L2.self_init should be true.
L2.inclusive should be true.

L3.inclusive should not be true.
L3.no_invd_sharing should be true.

Fix these cache properties.

Also add the missing RAS and SVM features bits on AMD EPYC-Rome. The SVM
feature bits are used in nested guests.

succor  : Software uncorrectable error containment and recovery 
capability.
overflow-recov  : MCA overflow recovery support.
lbrv: LBR virtualization
tsc-scale   : MSR based TSC rate control
vmcb-clean  : VMCB clean bits
flushbyasid : Flush by ASID
pause-filter: Pause intercept filter
pfthreshold : PAUSE filter threshold
v-vmsave-vmload : Virtualized VMLOAD and VMSAVE
vgif: Virtualized GIF

Signed-off-by: Babu Moger 
Reviewed-by: Maksim Davydov 
---
  target/i386/cpu.c | 73 +++
  1 file changed, 73 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 94292bfaa2..e2c3c797ed 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -2342,6 +2342,60 @@ static const CPUCaches epyc_rome_v3_cache_info = {
  },
  };

+static const CPUCaches epyc_rome_v5_cache_info = {

+.l1d_cache = &(CPUCacheInfo) {
+.type = DATA_CACHE,
+.level = 1,
+.size = 32 * KiB,
+.line_size = 64,
+.associativity = 8,
+.partitions = 1,
+.sets = 64,
+.lines_per_tag = 1,
+.self_init = 1,

This field could be true,

Sure.

+.share_level = CPU_TOPOLOGY_LEVEL_CORE,
+},
+.l1i_cache = &(CPUCacheInfo) {
+.type = INSTRUCTION_CACHE,
+.level = 1,
+.size = 32 * KiB,
+.line_size = 64,
+.associativity = 8,
+.partitions = 1,
+.sets = 64,
+.lines_per_tag = 1,
+.self_init = 1,

ditto,

Sure.

Compared to the previous cache model version, the differences can be
checked. I feel that in the future, when we introduce a new cache model,
it's better to avoid omitting items that default to false. This way, the
cache model can correspond to the output of the cpuid tool, making it
easier to compare and check.

Sounds good.

Overall, LGTM,

Reviewed-by: Zhao Liu 

Thanks
Babu

Re: [PATCH v5 6/6] target/i386: Add support for EPYC-Turin model

2025-02-20 Thread Moger, Babu


Hi Zhao,

On 2/20/2025 6:11 AM, Zhao Liu wrote:

+static const CPUCaches epyc_turin_cache_info = {
+.l1d_cache = &(CPUCacheInfo) {
+.type = DATA_CACHE,
+.level = 1,
+.size = 48 * KiB,
+.line_size = 64,
+.associativity = 12,
+.partitions = 1,
+.sets = 64,
+.lines_per_tag = 1,
+.self_init = 1,


true.


Sure.




+.share_level = CPU_TOPOLOGY_LEVEL_CORE,
+},
+.l1i_cache = &(CPUCacheInfo) {
+.type = INSTRUCTION_CACHE,
+.level = 1,
+.size = 32 * KiB,
+.line_size = 64,
+.associativity = 8,
+.partitions = 1,
+.sets = 64,
+.lines_per_tag = 1,
+.self_init = 1,


true.


Sure.



Reviewed-by: Zhao Liu 


thanks



(And it would be better to add a Turin entry in 
docs/system/cpu-models-x86.rst.inc
later :-).)


Yes. Will add a new patch to update docs/system/cpu-models-x86.rst.inc.



Thanks,
Zhao

1 2 3 >

1 - 100 of 244 matches

Mail list logo