Re: [PATCH v2] vhost: Don't set vring call if guest notifier is disabled
On Wed, Feb 19, 2025 at 09:52:35AM +0800, oen...@gmail.com wrote: > From: Huaitong Han > > The vring call fd is set even when the guest does not use MSIX (e.g., virtio > PMD). This results in unnecessary CPU overhead for handling virtio interrupts. > The previous patch only optimized the condition when query_queue_notifier was > enabled and the vector was unset. However, if query_queue_notifier is > disabled, > the vring call FD should also be unset to avoid this inefficiency. > > Fixes: 96a3d98d2c ("vhost: don't set vring call if no vector") > > Reported-by: Zhiyuan Yuan > Signed-off-by: Huaitong Han Fails make check: https://gitlab.com/mstredhat/qemu/-/jobs/9201935243 how was this tested? Pls include data on testing in the commit log. > --- > v2: - add Fixes tag > - cc qemu-sta...@nongnu.org > > hw/virtio/vhost.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > index 6aa72fd434..d17e7cc6fe 100644 > --- a/hw/virtio/vhost.c > +++ b/hw/virtio/vhost.c > @@ -1342,8 +1342,8 @@ int vhost_virtqueue_start(struct vhost_dev *dev, > } > > if (k->query_guest_notifiers && > -k->query_guest_notifiers(qbus->parent) && > -virtio_queue_vector(vdev, idx) == VIRTIO_NO_VECTOR) { > +(!k->query_guest_notifiers(qbus->parent) || > +virtio_queue_vector(vdev, idx) == VIRTIO_NO_VECTOR)) { > file.fd = -1; > r = dev->vhost_ops->vhost_set_vring_call(dev, &file); > if (r) { > -- > 2.43.5
Re: [PULL 33/42] tcg/riscv: Require TCG_TARGET_REG_BITS == 64
On 5/5/23 23:24, Richard Henderson wrote: The port currently does not support "oversize" guests, which means riscv32 can only target 32-bit guests. We will soon be building TCG once for all guests. This implies that we can only support riscv64. Since all Linux distributions target riscv64 not riscv32, this is not much of a restriction and simplifies the code. The brcond2 and setcond2 opcodes are exclusive to 32-bit hosts, so we can and should remove the stubs. Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Daniel Henrique Barboza Signed-off-by: Richard Henderson --- tcg/riscv/tcg-target-con-set.h | 8 -- tcg/riscv/tcg-target.h | 22 ++-- tcg/riscv/tcg-target.c.inc | 232 + 3 files changed, 72 insertions(+), 190 deletions(-) diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h index 0deb33701f..dddf2486c1 100644 --- a/tcg/riscv/tcg-target.h +++ b/tcg/riscv/tcg-target.h @@ -25,11 +25,14 @@ #ifndef RISCV_TCG_TARGET_H #define RISCV_TCG_TARGET_H -#if __riscv_xlen == 32 -# define TCG_TARGET_REG_BITS 32 -#elif __riscv_xlen == 64 -# define TCG_TARGET_REG_BITS 64 +/* + * We don't support oversize guests. + * Since we will only build tcg once, this in turn requires a 64-bit host. + */ +#if __riscv_xlen != 64 +#error "unsupported code generation mode" #endif +#define TCG_TARGET_REG_BITS 64 #define TCG_TARGET_INSN_UNIT_SIZE 4 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 20 @@ -83,13 +86,8 @@ typedef enum { #define TCG_TARGET_STACK_ALIGN 16 #define TCG_TARGET_CALL_STACK_OFFSET0 #define TCG_TARGET_CALL_ARG_I32 TCG_CALL_ARG_NORMAL -#if TCG_TARGET_REG_BITS == 32 -#define TCG_TARGET_CALL_ARG_I64 TCG_CALL_ARG_EVEN -#define TCG_TARGET_CALL_ARG_I128TCG_CALL_ARG_EVEN -#else #define TCG_TARGET_CALL_ARG_I64 TCG_CALL_ARG_NORMAL #define TCG_TARGET_CALL_ARG_I128TCG_CALL_ARG_NORMAL -#endif #define TCG_TARGET_CALL_RET_I128TCG_CALL_RET_NORMAL /* optional instructions */ @@ -106,8 +104,8 @@ typedef enum { #define TCG_TARGET_HAS_sub2_i32 1 #define TCG_TARGET_HAS_mulu2_i320 #define TCG_TARGET_HAS_muls2_i320 -#define TCG_TARGET_HAS_muluh_i32(TCG_TARGET_REG_BITS == 32) -#define TCG_TARGET_HAS_mulsh_i32(TCG_TARGET_REG_BITS == 32) +#define TCG_TARGET_HAS_muluh_i320 +#define TCG_TARGET_HAS_mulsh_i320 Should have we squashed the following with these changes? -- >8 -- diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc index f7e1ca5a56f..e395ffcdaf5 100644 --- a/tcg/riscv/tcg-target.c.inc +++ b/tcg/riscv/tcg-target.c.inc @@ -2323,10 +2323,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, TCGType type, break; -case INDEX_op_mulsh_i32: case INDEX_op_mulsh_i64: tcg_out_opc_reg(s, OPC_MULH, a0, a1, a2); break; -case INDEX_op_muluh_i32: case INDEX_op_muluh_i64: tcg_out_opc_reg(s, OPC_MULHU, a0, a1, a2); @@ -2399,4 +2397,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, TCGType type, case INDEX_op_extu_i32_i64: case INDEX_op_extrl_i64_i32: +case INDEX_op_mulsh_i32: +case INDEX_op_muluh_i32: default: g_assert_not_reached(); @@ -2707,6 +2707,4 @@ tcg_target_op_def(TCGOpcode op, TCGType type, unsigned flags) case INDEX_op_mul_i32: -case INDEX_op_mulsh_i32: -case INDEX_op_muluh_i32: case INDEX_op_div_i32: case INDEX_op_divu_i32: --- #define TCG_TARGET_HAS_ext8s_i321 #define TCG_TARGET_HAS_ext16s_i32 1 #define TCG_TARGET_HAS_ext8u_i321 @@ -128,7 +126,6 @@ typedef enum { #define TCG_TARGET_HAS_setcond2 1 #define TCG_TARGET_HAS_qemu_st8_i32 0 -#if TCG_TARGET_REG_BITS == 64 #define TCG_TARGET_HAS_movcond_i64 0 #define TCG_TARGET_HAS_div_i64 1 #define TCG_TARGET_HAS_rem_i64 1 @@ -165,7 +162,6 @@ typedef enum { #define TCG_TARGET_HAS_muls2_i640 #define TCG_TARGET_HAS_muluh_i641 #define TCG_TARGET_HAS_mulsh_i641 -#endif #define TCG_TARGET_DEFAULT_MO (0)
RE: [PATCH v3 23/28] test/functional/aspeed: Introduce new function to fetch assets
Hi Cedric, > > On 2/13/25 04:35, Jamin Lin wrote: > > This method simplifies the process of fetching and extracting assets > > from the Aspeed GitHub repository. > > > > Signed-off-by: Jamin Lin > > --- > > tests/functional/test_aarch64_aspeed.py | 9 + > > 1 file changed, 5 insertions(+), 4 deletions(-) > > > > diff --git a/tests/functional/test_aarch64_aspeed.py > > b/tests/functional/test_aarch64_aspeed.py > > index 9595498ace..f3d7c8331a 100755 > > --- a/tests/functional/test_aarch64_aspeed.py > > +++ b/tests/functional/test_aarch64_aspeed.py > > @@ -27,14 +27,15 @@ def do_test_aarch64_aspeed_sdk_start(self, image): > > wait_for_console_pattern(self, '## Loading kernel from FIT > Image') > > wait_for_console_pattern(self, 'Starting kernel ...') > > > > -ASSET_SDK_V903_AST2700 = Asset( > > - > 'https://github.com/AspeedTech-BMC/openbmc/releases/download/v09.03/ast > 2700-default-obmc.tar.gz', > > - > '91225f50d255e2905ba8d8e0c80b71b9d157c3609770c7a740cd786370d85a77' > ) > > > We need to keep the ASSET_SDK_V903_AST2700 definition for pre-caching the > assets. See : > > https://qemu.readthedocs.io/en/v9.2.0/devel/testing/functional.html#asset-ha > ndling > Will rework functional test. Thanks for your review and suggestion. Jamin > > Thanks, > > C. > > > > +def extra_aspeed_archive(self, version, file, checksum): > > +url = > 'https://github.com/AspeedTech-BMC/openbmc/releases/download' > > +self.archive_extract(Asset(f'{url}/{version}/{file}', > > + f'{checksum}')) > > > > def test_aarch64_ast2700_evb_sdk_v09_03(self): > > self.set_machine('ast2700-evb') > > > > -self.archive_extract(self.ASSET_SDK_V903_AST2700) > > +self.extra_aspeed_archive('v09.03', 'ast2700-default-obmc.tar.gz', > > + > > + > '91225f50d255e2905ba8d8e0c80b71b9d157c3609770c7a740cd786370d85a77' > ) > > > > num_cpu = 4 > > uboot_size = > > os.path.getsize(self.scratch_file('ast2700-default',
RE: [PATCH v3 18/28] hw/arm/aspeed: Add SoC and Machine Support for AST2700 A1
Hi Cedric, > Subject: Re: [PATCH v3 18/28] hw/arm/aspeed: Add SoC and Machine Support > for AST2700 A1 > > On 2/13/25 04:35, Jamin Lin wrote: > > The memory map for AST2700 A1 remains compatible with AST2700 A0. > > However, the IRQ mapping has been updated for AST2700 A1, with GIC > > interrupts now ranging from 192 to 201. Add a new IRQ map table for > AST2700 A1. > > Add "aspeed_soc_ast2700a1_class_init" to initialize the AST2700 A1 SoC. > > Introduce "aspeed_machine_ast2700a1_evb_class_init" to initialize the > > AST2700 A1 EVB. > > > > Signed-off-by: Jamin Lin > > --- > > hw/arm/aspeed.c | 13 +++ > > hw/arm/aspeed_ast27x0.c | 80 > + > > 2 files changed, 93 insertions(+) > > > > diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c index > > 6ddfdbdeba..c0539e5950 100644 > > --- a/hw/arm/aspeed.c > > +++ b/hw/arm/aspeed.c > > @@ -1672,6 +1672,15 @@ static void > aspeed_machine_ast2700a0_evb_class_init(ObjectClass *oc, void *data) > > mc->default_ram_size = 1 * GiB; > > aspeed_machine_class_init_cpus_defaults(mc); > > } > > + > > +static void aspeed_machine_ast2700a1_evb_class_init(ObjectClass *oc, > > +void *data) { > > +MachineClass *mc = MACHINE_CLASS(oc); > > +AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc); > > + > > +mc->desc = "Aspeed AST2700 A1 EVB (Cortex-A35)"; > > +amc->soc_name = "ast2700-a1"; > > +} > > #endif > > > > static void aspeed_machine_qcom_dc_scm_v1_class_init(ObjectClass > > *oc, @@ -1798,6 +1807,10 @@ static const TypeInfo > aspeed_machine_types[] = { > > .name = MACHINE_TYPE_NAME("ast2700a0-evb"), > > .parent= TYPE_ASPEED_MACHINE, > > .class_init= aspeed_machine_ast2700a0_evb_class_init, > > +}, { > > +.name = MACHINE_TYPE_NAME("ast2700a1-evb"), > > +.parent= MACHINE_TYPE_NAME("ast2700a0-evb"), > > Shouldn't the parent be TYPE_ASPEED_MACHINE instead ? > Will update it. Thanks for your review. Jamin > > +.class_init= aspeed_machine_ast2700a1_evb_class_init, > > #endif > > }, { > > .name = TYPE_ASPEED_MACHINE, > > diff --git a/hw/arm/aspeed_ast27x0.c b/hw/arm/aspeed_ast27x0.c index > > 0ccec774de..926b4c3e76 100644 > > --- a/hw/arm/aspeed_ast27x0.c > > +++ b/hw/arm/aspeed_ast27x0.c > > @@ -119,6 +119,52 @@ static const int aspeed_soc_ast2700a0_irqmap[] = { > > [ASPEED_DEV_SDHCI] = 133, > > }; > > > > +static const int aspeed_soc_ast2700a1_irqmap[] = { > > +[ASPEED_DEV_SDMC] = 0, > > +[ASPEED_DEV_HACE] = 4, > > +[ASPEED_DEV_XDMA] = 5, > > +[ASPEED_DEV_UART4] = 8, > > +[ASPEED_DEV_SCU] = 12, > > +[ASPEED_DEV_RTC] = 13, > > +[ASPEED_DEV_EMMC] = 15, > > +[ASPEED_DEV_TIMER1]= 16, > > +[ASPEED_DEV_TIMER2]= 17, > > +[ASPEED_DEV_TIMER3]= 18, > > +[ASPEED_DEV_TIMER4]= 19, > > +[ASPEED_DEV_TIMER5]= 20, > > +[ASPEED_DEV_TIMER6]= 21, > > +[ASPEED_DEV_TIMER7]= 22, > > +[ASPEED_DEV_TIMER8]= 23, > > +[ASPEED_DEV_DP]= 28, > > +[ASPEED_DEV_LPC] = 192, > > +[ASPEED_DEV_IBT] = 192, > > +[ASPEED_DEV_KCS] = 192, > > +[ASPEED_DEV_I2C] = 194, > > +[ASPEED_DEV_ADC] = 194, > > +[ASPEED_DEV_GPIO] = 194, > > +[ASPEED_DEV_FMC] = 195, > > +[ASPEED_DEV_WDT] = 195, > > +[ASPEED_DEV_PWM] = 195, > > +[ASPEED_DEV_I3C] = 195, > > +[ASPEED_DEV_UART0] = 196, > > +[ASPEED_DEV_UART1] = 196, > > +[ASPEED_DEV_UART2] = 196, > > +[ASPEED_DEV_UART3] = 196, > > +[ASPEED_DEV_UART5] = 196, > > +[ASPEED_DEV_UART6] = 196, > > +[ASPEED_DEV_UART7] = 196, > > +[ASPEED_DEV_UART8] = 196, > > +[ASPEED_DEV_UART9] = 196, > > +[ASPEED_DEV_UART10]= 196, > > +[ASPEED_DEV_UART11]= 196, > > +[ASPEED_DEV_UART12]= 196, > > +[ASPEED_DEV_ETH1] = 196, > > +[ASPEED_DEV_ETH2] = 196, > > +[ASPEED_DEV_ETH3] = 196, > > +[ASPEED_DEV_PECI] = 197, > > +[ASPEED_DEV_SDHCI] = 197, > > +}; > > + > > /* GICINT 128 */ > > /* GICINT 192 */ > > static const int ast2700_gic128_gic192_intcmap[] = { @@ -838,6 > > +884,34 @@ static void aspeed_soc_ast2700a0_class_init(ObjectClass *oc, > void *data) > > sc->get_irq = aspeed_soc_ast2700_get_irq; > > } > > > > +static void aspeed_soc_ast2700a1_class_init(ObjectClass *oc, void > > +*data) { > > +static const char * const valid_cpu_types[] = { > > +ARM_CPU_TYPE_NAME("cortex-a35"), > > +NULL > > +}; > > +DeviceClass *dc = DEVICE_CLASS(oc); > > +AspeedSoCClass *sc = ASPEED_SOC_CLASS(oc); > > + > > +/* Reason: The Aspeed SoC can only be instantiated from a board */ > > +dc->user_creatable = false; > > +dc->realize = aspeed_soc_ast2700_realize; > > +
RE: [PATCH v3 21/28] hw/misc/aspeed_hace: Fix boot issue in the Crypto Manager Self Test
Hi Cedric, > Subject: Re: [PATCH v3 21/28] hw/misc/aspeed_hace: Fix boot issue in the > Crypto Manager Self Test > > On 2/13/25 04:35, Jamin Lin wrote: > > Currently, it does not support the CRYPT command. Instead, it only > > sends an interrupt to notify the firmware that the crypt command has > completed. > > It is a temporary workaround to resolve the boot issue in the Crypto > > Manager Self Test. > > > > Signed-off-by: Jamin Lin > > Please add an AspeedHACEClass class attribute (bool) to handle this > workaround and a comment in the code mentioning the issue. > Thanks for review and suggestion. I will add the use_crypt_workaround attribute to the AspeedHACEClass and introduce the use-crypt-workaround property. Do you have any concerns, or do you have a preferred naming convention? Thanks-Jamin > > Thanks, > > C. > > > > > --- > > hw/misc/aspeed_hace.c | 12 > > 1 file changed, 12 insertions(+) > > > > diff --git a/hw/misc/aspeed_hace.c b/hw/misc/aspeed_hace.c index > > 86422cb3be..4d0999e7e9 100644 > > --- a/hw/misc/aspeed_hace.c > > +++ b/hw/misc/aspeed_hace.c > > @@ -59,6 +59,7 @@ > > /* Other cmd bits */ > > #define HASH_IRQ_ENBIT(9) > > #define HASH_SG_EN BIT(18) > > +#define CRYPT_IRQ_EN BIT(12) > > /* Scatter-gather data list */ > > #define SG_LIST_LEN_SIZE4 > > #define SG_LIST_LEN_MASK0x0FFF > > @@ -343,6 +344,13 @@ static void aspeed_hace_write(void *opaque, > hwaddr addr, uint64_t data, > > qemu_irq_lower(s->irq); > > } > > } > > +if (data & CRYPT_IRQ) { > > +data &= ~CRYPT_IRQ; > > + > > +if (s->regs[addr] & CRYPT_IRQ) { > > +qemu_irq_lower(s->irq); > > +} > > +} > > break; > > case R_HASH_SRC: > > data &= ahc->src_mask; > > @@ -388,6 +396,10 @@ static void aspeed_hace_write(void *opaque, > hwaddr addr, uint64_t data, > > case R_CRYPT_CMD: > > qemu_log_mask(LOG_UNIMP, "%s: Crypt commands not > implemented\n", > > __func__); > > +s->regs[R_STATUS] |= CRYPT_IRQ; > > +if (data & CRYPT_IRQ_EN) { > > +qemu_irq_raise(s->irq); > > +} > > break; > > default: > > break;
Re: [PATCH v3 14/14] scripts/ghes_inject: add a script to generate GHES error inject
Mauro Carvalho Chehab writes: > Em Wed, 05 Feb 2025 09:16:53 +0100 > Markus Armbruster escreveu: [...] >> Sorry if this has been answered already... why not GPL-2.0-or-later? >> >> More of the same below. > > No particular reason. It is just that GPL-2.0 is my preferred license. > > I'll change the license of the three scripts to be GPL-2.0-or-later. Thank you!
[PATCH 4/5] pcie, virtio: Remove redundant pm_cap
The pm_cap on the PCIExpressDevice object can be distilled down to the new instance on the PCIDevice object. Cc: Michael S. Tsirkin Cc: Marcel Apfelbaum Signed-off-by: Alex Williamson --- hw/pci-bridge/pcie_pci_bridge.c | 1 - hw/virtio/virtio-pci.c | 8 +++- include/hw/pci/pcie.h | 2 -- 3 files changed, 3 insertions(+), 8 deletions(-) diff --git a/hw/pci-bridge/pcie_pci_bridge.c b/hw/pci-bridge/pcie_pci_bridge.c index 9fa656b43b42..2429503cfbbf 100644 --- a/hw/pci-bridge/pcie_pci_bridge.c +++ b/hw/pci-bridge/pcie_pci_bridge.c @@ -56,7 +56,6 @@ static void pcie_pci_bridge_realize(PCIDevice *d, Error **errp) if (pos < 0) { goto pm_error; } -d->exp.pm_cap = pos; pci_set_word(d->config + pos + PCI_PM_PMC, 0x3); pcie_cap_arifwd_init(d); diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index afe8b5551c5c..3ca3f849d391 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -2209,8 +2209,6 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error **errp) return; } -pci_dev->exp.pm_cap = pos; - /* * Indicates that this function complies with revision 1.2 of the * PCI Power Management Interface Specification. @@ -2309,11 +2307,11 @@ static bool virtio_pci_no_soft_reset(PCIDevice *dev) { uint16_t pmcsr; -if (!pci_is_express(dev) || !dev->exp.pm_cap) { +if (!pci_is_express(dev) || !(dev->cap_present & QEMU_PCI_CAP_PM)) { return false; } -pmcsr = pci_get_word(dev->config + dev->exp.pm_cap + PCI_PM_CTRL); +pmcsr = pci_get_word(dev->config + dev->pm_cap + PCI_PM_CTRL); /* * When No_Soft_Reset bit is set and the device @@ -2342,7 +2340,7 @@ static void virtio_pci_bus_reset_hold(Object *obj, ResetType type) if (proxy->flags & VIRTIO_PCI_FLAG_INIT_PM) { pci_word_test_and_clear_mask( -dev->config + dev->exp.pm_cap + PCI_PM_CTRL, +dev->config + dev->pm_cap + PCI_PM_CTRL, PCI_PM_CTRL_STATE_MASK); } } diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h index b8d59732bc63..70a5de09de39 100644 --- a/include/hw/pci/pcie.h +++ b/include/hw/pci/pcie.h @@ -58,8 +58,6 @@ typedef enum { struct PCIExpressDevice { /* Offset of express capability in config space */ uint8_t exp_cap; -/* Offset of Power Management capability in config space */ -uint8_t pm_cap; /* SLOT */ bool hpev_notified; /* Logical AND of conditions for hot plug event. -- 2.48.1
Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
On Thu, Feb 20, 2025 at 04:57:51PM +0100, Eric Auger wrote: > Hi Michael, > > > On 2/20/25 4:25 PM, Michael S. Tsirkin wrote: > > On Fri, Jan 31, 2025 at 10:55:26AM +0100, Eric Auger wrote: > >> I tested [PATCH] virtio: Remove virtio devices on device_shutdown() > >> https://lore.kernel.org/all/20240808075141.3433253-1-kirill.shute...@linux.intel.com/ > >> > >> and it fixes my issue > >> > >> Eric > > > > To make sure, we are dropping this in favor of the guest fix (which > > I intent to merge shortly), correct? > > Yes this should be dropped in favour of the implementation of the > shutdown callback on guest side. Did you send your patch, I did not see it. Will send tomorrow. > however on qemu side, there is "[PATCH v3 0/5] Fix vIOMMU reset order" > that needs to be merged to fix qmp system_reset. Yes that one I'm testing. > Thanks > > Eric > >
Re: [PATCH v3 028/162] tcg: Convert sub to TCGOutOpSubtract
Hi Richard, On 17/2/25 00:07, Richard Henderson wrote: Create a special subclass for sub, because two backends can support "subtract from immediate". Drop all backend support for an immediate as the second operand, as we transform sub to add during optimize. Signed-off-by: Richard Henderson --- tcg/mips/tcg-target-con-set.h| 1 - tcg/ppc/tcg-target-con-set.h | 3 +- tcg/riscv/tcg-target-con-set.h | 1 - tcg/riscv/tcg-target-con-str.h | 1 - tcg/tcg.c| 26 ++-- tcg/aarch64/tcg-target.c.inc | 24 +++ tcg/arm/tcg-target.c.inc | 29 +++--- tcg/i386/tcg-target.c.inc| 23 +++--- tcg/loongarch64/tcg-target.c.inc | 32 +--- tcg/mips/tcg-target.c.inc| 31 --- tcg/ppc/tcg-target.c.inc | 52 +++- tcg/riscv/tcg-target.c.inc | 45 +-- tcg/s390x/tcg-target.c.inc | 41 +++-- tcg/sparc64/tcg-target.c.inc | 16 +++--- tcg/tci/tcg-target.c.inc | 14 +++-- 15 files changed, 165 insertions(+), 174 deletions(-) diff --git a/tcg/tcg.c b/tcg/tcg.c index b740609c03..ca91a80efc 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -986,6 +986,14 @@ typedef struct TCGOutOpBinary { TCGReg a0, TCGReg a1, tcg_target_long a2); } TCGOutOpBinary; +typedef struct TCGOutOpSubtract { +TCGOutOp base; +void (*out_rrr)(TCGContext *s, TCGType type, +TCGReg a0, TCGReg a1, TCGReg a2); +void (*out_rir)(TCGContext *s, TCGType type, +TCGReg a0, tcg_target_long a1, TCGReg a2); +} TCGOutOpSubtract; + #include "tcg-target.c.inc" #ifndef CONFIG_TCG_INTERPRETER @@ -1012,6 +1020,8 @@ static const TCGOutOp * const all_outop[NB_OPS] = { OUTOP(INDEX_op_nor, TCGOutOpBinary, outop_nor), OUTOP(INDEX_op_or, TCGOutOpBinary, outop_or), OUTOP(INDEX_op_orc, TCGOutOpBinary, outop_orc), +OUTOP(INDEX_op_sub_i32, TCGOutOpSubtract, outop_sub), +OUTOP(INDEX_op_sub_i64, TCGOutOpSubtract, outop_sub), OUTOP(INDEX_op_xor, TCGOutOpBinary, outop_xor), }; @@ -2231,7 +2241,6 @@ bool tcg_op_supported(TCGOpcode op, TCGType type, unsigned flags) case INDEX_op_st8_i32: case INDEX_op_st16_i32: case INDEX_op_st_i32: -case INDEX_op_sub_i32: case INDEX_op_neg_i32: case INDEX_op_mul_i32: case INDEX_op_shl_i32: @@ -2301,7 +2310,6 @@ bool tcg_op_supported(TCGOpcode op, TCGType type, unsigned flags) case INDEX_op_st16_i64: case INDEX_op_st32_i64: case INDEX_op_st_i64: -case INDEX_op_sub_i64: case INDEX_op_neg_i64: case INDEX_op_mul_i64: case INDEX_op_shl_i64: @@ -5442,6 +5450,20 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op) } break; +case INDEX_op_sub_i32: +case INDEX_op_sub_i64: +{ +const TCGOutOpSubtract *out = &outop_sub; + +tcg_debug_assert(!const_args[2]); +if (const_args[1]) { +out->out_rir(s, type, new_args[0], new_args[1], new_args[2]); +} else { +out->out_rrr(s, type, new_args[0], new_args[1], new_args[2]); +} +} +break; + default: if (def->flags & TCG_OPF_VECTOR) { tcg_out_vec_op(s, op->opc, type - TCG_TYPE_V64, diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc index 30cad937b7..dfe67c1261 100644 --- a/tcg/aarch64/tcg-target.c.inc +++ b/tcg/aarch64/tcg-target.c.inc @@ -2205,6 +2205,17 @@ static const TCGOutOpBinary outop_orc = { .out_rrr = tgen_orc, }; +static void tgen_sub(TCGContext *s, TCGType type, + TCGReg a0, TCGReg a1, TCGReg a2) +{ +tcg_out_insn(s, 3502, SUB, type, a0, a1, a2); +} + +static const TCGOutOpSubtract outop_sub = { +.base.static_constraint = C_O1_I2(r, r, r), +.out_rrr = tgen_sub, +}; + static void tgen_xor(TCGContext *s, TCGType type, TCGReg a0, TCGReg a1, TCGReg a2) { @@ -2290,15 +2301,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, TCGType ext, tcg_out_ldst(s, I3312_STRX, a0, a1, a2, 3); break; -case INDEX_op_sub_i32: -case INDEX_op_sub_i64: -if (c2) { -tgen_addi(s, ext, a0, a1, -a2); I'm a bit lost with this change here, and following patch #4 (ADD conversion). Since tgen_addi() effectively handles both add/sub, why not name it tgen_addsubi() like the old tcg_out_addsubi() name? (Maybe reorder to have that SUB conv patch after ADD conv?) -} else { -tcg_out_insn(s, 3502, SUB, ext, a0, a1, a2); -} -break; - case INDEX_op_neg_i64: case INDEX_op_neg_i32: tcg_out_insn(s, 3502, SUB, ext, a0, TCG_REG_XZR, a1); @@ -3014,10 +3016,6 @@ tcg_target_op_def(TCGOpcode op, TCGType type, unsigned
Re: [PATCH v3 1/2] hw/i386/amd_iommu: Isolate AMDVI-PCI from amd-iommu device to allow full control over the PCI device creation
On Wed, Feb 12, 2025 at 05:44:49AM +, Suravee Suthikulpanit wrote: > Current amd-iommu model internally creates an AMDVI-PCI device. Here is > a snippet from info qtree: > > bus: main-system-bus > type System > dev: amd-iommu, id "" > xtsup = false > pci-id = "" > intremap = "on" > device-iotlb = false > pt = true > ... > dev: q35-pcihost, id "" > MCFG = -1 (0x) > pci-hole64-size = 34359738368 (32 GiB) > below-4g-mem-size = 134217728 (128 MiB) > above-4g-mem-size = 0 (0 B) > smm-ranges = true > x-pci-hole64-fix = true > x-config-reg-migration-enabled = true > bypass-iommu = false > bus: pcie.0 > type PCIE > dev: AMDVI-PCI, id "" > addr = 01.0 > romfile = "" > romsize = 4294967295 (0x) > rombar = -1 (0x) > multifunction = false > x-pcie-lnksta-dllla = true > x-pcie-extcap-init = true > failover_pair_id = "" > acpi-index = 0 (0x0) > x-pcie-err-unc-mask = true > x-pcie-ari-nextfn-1 = false > x-max-bounce-buffer-size = 4096 (4 KiB) > x-pcie-ext-tag = true > busnr = 0 (0x0) > class Class 0806, addr 00:01.0, pci id 1022: (sub 1af4:1100) > > This prohibits users from specifying the PCI topology for the amd-iommu > device, > which becomes a problem when trying to support VM migration since it does not > guarantee the same enumeration of AMD IOMMU device. > > Therfore, decouple the AMDVI-PCI from amd-iommu device and introduce pci-id > parameter to link between the two devices. > > For example: > -device AMDVI-PCI,id=iommupci0,bus=pcie.0,addr=0x05 \ > -device amd-iommu,intremap=on,pt=on,xtsup=on,pci-id=iommupci0 \ > > For backward-compatibility, internally create the AMDVI-PCI device if not > specified on the CLI. > > Co-developed-by: Daniel P. Berrangé > Signed-off-by: Suravee Suthikulpanit breaks build: https://gitlab.com/mstredhat/qemu/-/jobs/9202802751 ./hw/i386/amd_iommu.c: In function ‘amdvi_sysbus_realize’: ../hw/i386/amd_iommu.c:1616:18: error: unused variable ‘dc’ [-Werror=unused-variable] 1616 | DeviceClass *dc = (DeviceClass *) object_get_class(OBJECT(dev)); | ^~ cc1: all warnings being treated as errors > --- > hw/i386/acpi-build.c | 8 +++ > hw/i386/amd_iommu.c | 53 +++- > hw/i386/amd_iommu.h | 3 ++- > 3 files changed, 39 insertions(+), 25 deletions(-) > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c > index 53b7306b43..e70eeaf577 100644 > --- a/hw/i386/acpi-build.c > +++ b/hw/i386/acpi-build.c > @@ -2333,10 +2333,10 @@ build_amd_iommu(GArray *table_data, BIOSLinker > *linker, const char *oem_id, > build_append_int_noprefix(table_data, ivhd_blob->len + 24, 2); > /* DeviceID */ > build_append_int_noprefix(table_data, > - object_property_get_int(OBJECT(&s->pci), > "addr", > + object_property_get_int(OBJECT(s->pci), "addr", >&error_abort), 2); > /* Capability offset */ > -build_append_int_noprefix(table_data, s->pci.capab_offset, 2); > +build_append_int_noprefix(table_data, s->pci->capab_offset, 2); > /* IOMMU base address */ > build_append_int_noprefix(table_data, s->mr_mmio.addr, 8); > /* PCI Segment Group */ > @@ -2368,10 +2368,10 @@ build_amd_iommu(GArray *table_data, BIOSLinker > *linker, const char *oem_id, > build_append_int_noprefix(table_data, ivhd_blob->len + 40, 2); > /* DeviceID */ > build_append_int_noprefix(table_data, > - object_property_get_int(OBJECT(&s->pci), > "addr", > + object_property_get_int(OBJECT(s->pci), "addr", >&error_abort), 2); > /* Capability offset */ > -build_append_int_noprefix(table_data, s->pci.capab_offset, 2); > +build_append_int_noprefix(table_data, s->pci->capab_offset, 2); > /* IOMMU base address */ > build_append_int_noprefix(table_data, s->mr_mmio.addr, 8); > /* PCI Segment Group */ > diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c > index 6b13ce894b..0f552bafa0 100644 > --- a/hw/i386/amd_iommu.c > +++ b/hw/i386/amd_iommu.c > @@ -167,11 +167,11 @@ static void amdvi_generate_msi_interrupt(AMDVIState *s) > { > MSIMessage msg = {}; > MemTxAttrs attrs = { > -.requester_id = pci_requester_id(&s->pci.dev) > +.requester_id = pci_requester_id(&s->pci->dev) > }; > > -if (msi_enabled(&s->pci.dev)) { > -msg = msi_get_message(&s->pci.dev, 0); > +if (msi_enabled(&s->pci->dev)) { > +msg = msi_get_message(&s->pci->dev, 0); > address_space_stl_le(&address_space_memo
Re: [PATCH v3 03/14] acpi/ghes: Use HEST table offsets when preparing GHES records
Em Mon, 3 Feb 2025 15:34:23 +0100 Igor Mammedov escreveu: > On Fri, 31 Jan 2025 18:42:44 +0100 > Mauro Carvalho Chehab wrote: > > > There are two pointers that are needed during error injection: > > > > 1. The start address of the CPER block to be stored; > > 2. The address of the ack. > > > > It is preferable to calculate them from the HEST table. This allows > > checking the source ID, the size of the table and the type of the > > HEST error block structures. > > > > Yet, keep the old code, as this is needed for migration purposes. > > > > Signed-off-by: Mauro Carvalho Chehab > > --- > > hw/acpi/ghes.c | 132 - > > include/hw/acpi/ghes.h | 1 + > > 2 files changed, 119 insertions(+), 14 deletions(-) > > > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c > > index 27478f2d5674..8f284fd191a6 100644 > > --- a/hw/acpi/ghes.c > > +++ b/hw/acpi/ghes.c > > @@ -41,6 +41,12 @@ > > /* Address offset in Generic Address Structure(GAS) */ > > #define GAS_ADDR_OFFSET 4 > > > > +/* > > + * ACPI spec 1.0b > > + * 5.2.3 System Description Table Header > > + */ > > +#define ACPI_DESC_HEADER_OFFSET 36 > > + > > /* > > * The total size of Generic Error Data Entry > > * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data, > > @@ -61,6 +67,25 @@ > > */ > > #define ACPI_GHES_GESB_SIZE 20 > > > > +/* > > + * Offsets with regards to the start of the HEST table stored at > > + * ags->hest_addr_le, > > If I read this literary, then offsets above are not what > declared later in this patch. > I'd really drop this comment altogether as it's confusing, > and rather get variables/macro naming right > > > according with the memory layout map at > > + * docs/specs/acpi_hest_ghes.rst. > > + */ > > what we need is update to above doc, describing new and old ways. > a separate patch. I can't see anything that should be changed at docs/specs/acpi_hest_ghes.rst, as this series doesn't change the firmware layout: we're still using two firmware tables: - etc/acpi/tables, with HEST on it; - etc/hardware_errors, with: - error block addresses; - read_ack registers; - CPER records. The only changes that this series introduce are related to how the error generation logic navigates between HEST and hw_errors firmware. This is not described at acpi_hest_ghes.rst, and both ways follow ACPI specs to the letter. The only difference is that the code which populates the CPER record and the error/read offsets doesn't require to know how the HEST table generation placed offsets, as it will basically reproduce what OSPM firmware does when handling HEST events. Thanks, Mauro
RE: [PATCH rfcv2 00/20] intel_iommu: Enable stage-1 translation for passthrough device
Hi Eric, >-Original Message- >From: Eric Auger >Subject: Re: [PATCH rfcv2 00/20] intel_iommu: Enable stage-1 translation for >passthrough device > > >Hi Zhenzhong > >On 2/19/25 9:22 AM, Zhenzhong Duan wrote: >> Hi, >> >> Per Jason Wang's suggestion, iommufd nesting series[1] is split into >> "Enable stage-1 translation for emulated device" series and >> "Enable stage-1 translation for passthrough device" series. >> >> This series is 2nd part focusing on passthrough device. We don't do >> shadowing of guest page table for passthrough device but pass stage-1 >> page table to host side to construct a nested domain. There was some >> effort to enable this feature in old days, see [2] for details. >> >> The key design is to utilize the dual-stage IOMMU translation >> (also known as IOMMU nested translation) capability in host IOMMU. >> As the below diagram shows, guest I/O page table pointer in GPA >> (guest physical address) is passed to host and be used to perform >s/be/is >> the stage-1 address translation. Along with it, modifications to >> present mappings in the guest I/O page table should be followed >> with an IOTLB invalidation. >> >> .-. .---. >> | vIOMMU| | Guest I/O page table | >> | | '---' >> ./ >> | PASID Entry |--- PASID cache flush --+ >> '-'| >> | |V >> | | I/O page table pointer in GPA >> '-' >> Guest >> --| Shadow |---| >> vv v >> Host >> .-. .. >> | pIOMMU| | FS for GIOVA->GPA | >> | | '' >> ./ | >> | PASID Entry | V (Nested xlate) >> '\.--. >> | | | SS for GPA->HPA, unmanaged domain| >> | | '--' >> '-' >> Where: >> - FS = First stage page tables >> - SS = Second stage page tables >> >> >> There are some interactions between VFIO and vIOMMU >> * vIOMMU registers PCIIOMMUOps [set|unset]_iommu_device to PCI >> subsystem. VFIO calls them to register/unregister HostIOMMUDevice >> instance to vIOMMU at vfio device realize stage. >> * vIOMMU calls HostIOMMUDeviceIOMMUFD interface [at|de]tach_hwpt >> to bind/unbind device to IOMMUFD backed domains, either nested >> domain or not. >> >> See below diagram: >> >> VFIO Device Intel IOMMU >> .-. .---. >> | | | | >> | .-|PCIIOMMUOps |.-.| >> | | IOMMUFD |(set_iommu_device) || Host IOMMU || >> | | Device |>|| Device list || >> | .-|(unset_iommu_device) |.-.| >> | | | | | >> | | | V | >> | .-| HostIOMMUDeviceIOMMUFD | .-. | >> | | IOMMUFD |(attach_hwpt)| | Host IOMMU | | >> | | link|<| | Device| | >> | .-|(detach_hwpt)| .-. | >> | | | | | >> | | | ... | >> .-. .---. >> >> Based on Yi's suggestion, this design is optimal in sharing ioas/hwpt >> whenever possible and create new one on demand, also supports multiple >> iommufd objects and ERRATA_772415. >> >> E.g., Stage-2 page table could be shared by different devices if there >> is no conflict and devices link to same iommufd object, i.e. devices >> under same host IOMMU can share same stage-2 page table. If there is >> conflict, i.e. there is one device under non cache coherency mode >> which is different from others, it requires a separate stage-2 page >> table in non-CC mode. >> >> SPR platform has ERRATA_772415 which requires no readonly mappings >> in stage-2 page table. This series supports creating VTDIOASContainer >> with no readonly mappings. If there is a rare case that some IOMMUs >> on a multiple IOMMU host have ERRATA_772415 and others not, this >> design can still survive. >> >> See below example diagram for a full view: >> >> IntelIOMMUState >> | >> V >> .--..--..---. >> | VTDIOA
Re: [PATCH v2 00/19] intel_iommu: Add ATS support
On 20/02/2025 22:13, Michael S. Tsirkin wrote: > Caution: External email. Do not open attachments or click links, unless this > email comes from a known sender and you know the content is safe. > > > On Mon, Jan 20, 2025 at 05:41:32PM +, CLEMENT MATHIEU--DRIF wrote: >> From: Clement Mathieu--Drif >> >> This patch set belongs to a list of series that add SVM support for VT-d. >> >> Here we focus on implementing ATS support in the IOMMU and adding a >> PCI-level API to be used by virtual devices. >> >> This work is based on the VT-d specification version 4.1 (March 2023). >> >> Here is a link to our GitHub repository where you can find the following >> elements: >> - Qemu with all the patches for SVM >> - ATS >> - PRI >> - Device IOTLB invalidations >> - Requests with already pre-translated addresses >> - A demo device >> - A simple driver for the demo device >> - A userspace program (for testing and demonstration purposes) >> >> https://github.com/BullSequana/Qemu-in-guest-SVM-demo > > Fails build: > > https://gitlab.com/mstredhat/qemu/-/jobs/9200372388 > > In function ‘vtd_iommu_ats_do_translate’, > inlined from ‘vtd_iommu_ats_request_translation’ at > ../hw/i386/intel_iommu.c:4778:17: > ../hw/i386/intel_iommu.c:4758:12: error: ‘entry.target_as’ may be used > uninitialized [-Werror=maybe-uninitialized] > 4758 | return entry; >|^ > ../hw/i386/intel_iommu.c: In function ‘vtd_iommu_ats_request_translation’: > ../hw/i386/intel_iommu.c:4745:19: note: ‘entry’ declared here > 4745 | IOMMUTLBEntry entry; >| ^ > cc1: all warnings being treated as errors > Uh, looks like the error is only present in non-debug mode. I'll send a v3 Thanks Michael > > >> === >> >> Context and design notes >> >> >> The main purpose of this work is to enable vVT-d users to make >> translation requests to the vIOMMU as described in the PCIe Gen 5.0 >> specification (section 10). Moreover, we aim to implement a >> PCI/Memory-level framework that could be used by other vIOMMUs >> to implement the same features. >> >> What is ATS? >> >> >> ATS (Address Translation Service) is a PCIe-level protocol that >> enables PCIe devices to query an IOMMU for virtual to physical >> address translations in a specific address space (such as a userland >> process address space). When a device receives translation responses >> from an IOMMU, it may decide to store them in an internal cache, >> often known as "ATC" (Address Translation Cache) or "Device IOTLB". >> To keep page tables and caches consistent, the IOMMU is allowed to >> send asynchronous invalidation requests to its client devices. >> >> To avoid introducing an unnecessarily complex API, this series simply >> exposes 3 functions. The first 2 are a pair of setup functions that >> are called to install and remove the ATS invalidation callback during >> the initialization phase of a process. The third one will be >> used to request translations. The callback setup API introduced in >> this series calls the IOMMUNotifier API under the hood. >> >> API design >> '' >> >> - int pci_register_iommu_tlb_event_notifier(PCIDevice *dev, >> uint32_t pasid, >> IOMMUNotifier *n); >> >> - int pci_unregister_iommu_tlb_event_notifier(PCIDevice *dev, uint32_t pasid, >>IOMMUNotifier *n); >> >> - ssize_t pci_ats_request_translation_pasid(PCIDevice *dev, uint32_t pasid, >> bool priv_req, bool exec_req, >> hwaddr addr, size_t length, >> bool no_write, >> IOMMUTLBEntry *result, >> size_t result_length, >> uint32_t *err_count); >> >> Although device developers may want to implement custom ATC for >> testing or performance measurement purposes, we provide a generic >> implementation as a utility module. >> >> Overview >> >> >> Here are the interactions between an ATS-capable PCIe device and the vVT-d: >> >> >> >>┌───┐ ┌┐ >>│Device │ │PCI / Memory│ >>│ │ pci_ats_request_│abstraction │ iommu_ats_ >>│ │ translation_││ request_ >>│┌─┐│ pasid │ AS lookup │ translation >>││Logic││>│╶╶╶>│──┐ >>│└─┘│<│<╶╶╶│<──┐ │ >>│┌─┐│ ││ │ │ >>││inv func ││<───┐││ │ │ >>│└─┘││││ │ │ >>││ │││
[PATCH v2 0/3] tests/functional: Allow running TCG plugins tests on macOS
Pierrick kindly helped me to resolve this issue which ended being trivial (to him!). Not tested on Windows so far. Since v1: - Include Pierrick's meson fix patch (adding Fixes: tag) - Addressed Thomas review comments (config.py, os.path.join) Philippe Mathieu-Daudé (2): tests/functional: Introduce the dso_suffix() helper tests/functional: Allow running TCG plugins tests on non-Linux/BSD hosts Pierrick Bouvier (1): plugins: add explicit dependency in functional tests meson.build | 1 + contrib/plugins/meson.build | 2 ++ tests/functional/meson.build | 2 +- tests/functional/qemu_test/__init__.py | 2 +- tests/functional/qemu_test/cmd.py| 1 - tests/functional/qemu_test/config.py | 6 ++ tests/functional/test_aarch64_tcg_plugins.py | 11 --- tests/tcg/plugins/meson.build| 2 ++ 8 files changed, 21 insertions(+), 6 deletions(-) -- 2.47.1
[PATCH v2 2/3] tests/functional: Introduce the dso_suffix() helper
Introduce a helper to get the default shared library suffix used on the host. Suggested-by: Pierrick Bouvier Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Pierrick Bouvier --- tests/functional/qemu_test/__init__.py | 2 +- tests/functional/qemu_test/cmd.py | 1 - tests/functional/qemu_test/config.py | 6 ++ 3 files changed, 7 insertions(+), 2 deletions(-) diff --git a/tests/functional/qemu_test/__init__.py b/tests/functional/qemu_test/__init__.py index 5c972843a6d..45f7befa374 100644 --- a/tests/functional/qemu_test/__init__.py +++ b/tests/functional/qemu_test/__init__.py @@ -7,7 +7,7 @@ from .asset import Asset -from .config import BUILD_DIR +from .config import BUILD_DIR, dso_suffix from .cmd import is_readable_executable_file, \ interrupt_interactive_console_until_pattern, wait_for_console_pattern, \ exec_command, exec_command_and_wait_for_pattern, get_qemu_img, which diff --git a/tests/functional/qemu_test/cmd.py b/tests/functional/qemu_test/cmd.py index dc5f422b77d..254e23ef748 100644 --- a/tests/functional/qemu_test/cmd.py +++ b/tests/functional/qemu_test/cmd.py @@ -15,7 +15,6 @@ import os import os.path - def which(tool): """ looks up the full path for @tool, returns None if not found or if @tool does not have executable permissions. diff --git a/tests/functional/qemu_test/config.py b/tests/functional/qemu_test/config.py index edd75b7fd06..0eab1baa541 100644 --- a/tests/functional/qemu_test/config.py +++ b/tests/functional/qemu_test/config.py @@ -13,6 +13,7 @@ import os from pathlib import Path +import platform def _source_dir(): @@ -34,3 +35,8 @@ def _build_dir(): raise Exception("Cannot identify build dir, set QEMU_BUILD_ROOT") BUILD_DIR = _build_dir() + +def dso_suffix(): +'''Return the dynamic libraries suffix for the current platform''' +DSO_SUFFIXES = { 'Linux': 'so', 'Darwin': 'dylib', 'Windows': 'dll' } +return DSO_SUFFIXES[platform.system()] -- 2.47.1
[PATCH v2 3/3] tests/functional: Allow running TCG plugins tests on non-Linux/BSD hosts
Not all platforms use the '.so' suffix for shared libraries, which is how plugins are built. Use the recently introduced dso_suffix() helper to get the proper host suffix. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2804 Suggested-by: Pierrick Bouvier Suggested-by: Daniel P. Berrangé Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Pierrick Bouvier --- tests/functional/test_aarch64_tcg_plugins.py | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/tests/functional/test_aarch64_tcg_plugins.py b/tests/functional/test_aarch64_tcg_plugins.py index 7e8beacc833..5736b60545a 100755 --- a/tests/functional/test_aarch64_tcg_plugins.py +++ b/tests/functional/test_aarch64_tcg_plugins.py @@ -13,10 +13,11 @@ import tempfile import mmap +import os import re from qemu.machine.machine import VMLaunchFailure -from qemu_test import LinuxKernelTest, Asset +from qemu_test import LinuxKernelTest, Asset, dso_suffix class PluginKernelBase(LinuxKernelTest): @@ -62,6 +63,10 @@ class PluginKernelNormal(PluginKernelBase): ('https://storage.tuxboot.com/20230331/arm64/Image'), 'ce95a7101a5fecebe0fe630deee6bd97b32ba41bc8754090e9ad8961ea8674c7') +def plugin_file(self, plugin_name): +sfx = dso_suffix() +return os.path.join('tests', 'tcg', 'plugins', f'{plugin_name}.{sfx}') + def test_aarch64_virt_insn(self): self.set_machine('virt') self.cpu='cortex-a53' @@ -74,7 +79,7 @@ def test_aarch64_virt_insn(self): suffix=".log") self.run_vm(kernel_path, kernel_command_line, -"tests/tcg/plugins/libinsn.so", plugin_log.name, +self.plugin_file('libinsn'), plugin_log.name, console_pattern) with plugin_log as lf, \ @@ -100,7 +105,7 @@ def test_aarch64_virt_insn_icount(self): suffix=".log") self.run_vm(kernel_path, kernel_command_line, -"tests/tcg/plugins/libinsn.so", plugin_log.name, +self.plugin_file('libinsn'), plugin_log.name, console_pattern, args=('-icount', 'shift=1')) -- 2.47.1
Re: [PATCH v2 2/3] tests/functional: Introduce the dso_suffix() helper
On 20/2/25 09:02, Philippe Mathieu-Daudé wrote: Introduce a helper to get the default shared library suffix used on the host. Suggested-by: Pierrick Bouvier Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Pierrick Bouvier --- tests/functional/qemu_test/__init__.py | 2 +- tests/functional/qemu_test/cmd.py | 1 - tests/functional/qemu_test/config.py | 6 ++ 3 files changed, 7 insertions(+), 2 deletions(-) diff --git a/tests/functional/qemu_test/cmd.py b/tests/functional/qemu_test/cmd.py index dc5f422b77d..254e23ef748 100644 --- a/tests/functional/qemu_test/cmd.py +++ b/tests/functional/qemu_test/cmd.py @@ -15,7 +15,6 @@ import os import os.path - Oops, Alex, should I respin a v3 removing this change? def which(tool): """ looks up the full path for @tool, returns None if not found or if @tool does not have executable permissions.
Re: [PATCH] qapi: pluggable backend code generators
Cc: John for advice on my somewhat nebulous mypy worries at the end. Daniel P. Berrangé writes: > The 'qapi.backend.QAPIBackend' class defines an API contract for > code generators. The current generator is put into a new class > 'qapi.backend.QAPICBackend' and made to be the default impl. > > A custom generator can be requested using the '-k' arg which takes > a fully qualified python class name > >qapi-gen.py -k the.python.module.QAPIMyBackend I understand why that will be useful, but explaining it briefly in the commit message wouldn't hurt. > > Signed-off-by: Daniel P. Berrangé > --- > > This is an impl of the idea I mentioned at: > >https://lists.nongnu.org/archive/html/qemu-devel/2025-02/msg03475.html > > With this change, it is possible for the Go generator code to live > outside of qemu.git, invoked using: > > $ PYTHONPATH=/path/to/qemu.git/scripts \ > python /path/to/qemu.git/scripts/qapi-gen.py \ > -o somedir \ > -k qapi.golang.golang.QAPIGoBackend \ > /path/to/qemu.git/qga/qapi-schema.json > > The external app could just expect qemu.git to be checkedout somewhere > convenient, or could use a git submodule to reference it. > > scripts/qapi/backend.py | 96 + > scripts/qapi/main.py| 65 > 2 files changed, 113 insertions(+), 48 deletions(-) > create mode 100644 scripts/qapi/backend.py > > diff --git a/scripts/qapi/backend.py b/scripts/qapi/backend.py > new file mode 100644 > index 00..b6873fd2e3 > --- /dev/null > +++ b/scripts/qapi/backend.py > @@ -0,0 +1,96 @@ > +# This work is licensed under the terms of the GNU GPL, version 2 or later. > +# See the COPYING file in the top-level directory. > + > +from abc import ABC, abstractmethod > +from typing import Optional > + > +from .commands import gen_commands > +from .common import must_match > +from .events import gen_events > +from .features import gen_features > +from .introspect import gen_introspect > +from .schema import QAPISchema > +from .types import gen_types > +from .visit import gen_visit > + > + > +def invalid_prefix_char(prefix: str) -> Optional[str]: > +match = must_match(r'([A-Za-z_.-][A-Za-z0-9_.-]*)?', prefix) > +if match.end() != len(prefix): > +return prefix[match.end()] > +return None > + > + > +class QAPIBackend(ABC): > + > +def run(self, > +schema_file: str, > +output_dir: str, > +prefix: str, > +unmask: bool = False, > +builtins: bool = False, > +gen_tracing: bool = False) -> None: > +""" > +Run the code generator for the given schema into the target > directory. > + > +:param schema_file: The primary QAPI schema file. > +:param output_dir: The output directory to store generated code. > +:param prefix: Optional C-code prefix for symbol names. > +:param unmask: Expose non-ABI names through introspection? > +:param builtins: Generate code for built-in types? > + > +:raise QAPIError: On failures. > +""" > +assert invalid_prefix_char(prefix) is None > + > +schema = QAPISchema(schema_file) Hmm. This makes the backend run the frontend. Could we keep this in main.py instead? > +self.generate(schema, output_dir, prefix, unmask, builtins, > gen_tracing) > + > +@abstractmethod > +def generate(self, > + schema: QAPISchema, > + output_dir: str, > + prefix: str, > + unmask: bool, > + builtins: bool, > + gen_tracing: bool) -> None: > +""" > +Generate code for the given schema into the target directory. > + > +:param schema: The primary QAPI schema object. > +:param output_dir: The output directory to store generated code. > +:param prefix: Optional C-code prefix for symbol names. > +:param unmask: Expose non-ABI names through introspection? > +:param builtins: Generate code for built-in types? > + > +:raise QAPIError: On failures. > +""" > +pass pylint prefers not to pass: scripts/qapi/backend.py:68:8: W0107: Unnecessary pass statement (unnecessary-pass) > + > + > +class QAPICBackend(QAPIBackend): > + > +def generate(self, > + schema: QAPISchema, > + output_dir: str, > + prefix: str, > + unmask: bool, > + builtins: bool, > + gen_tracing: bool) -> None: > +""" > +Generate C code for the given schema into the target directory. > + > +:param schema_file: The primary QAPI schema file. > +:param output_dir: The output directory to store generated code. > +:param prefix: Optional C-code prefix for symbol names. > +:param unmask: Expose non-ABI names through introspection? > +:param builtins: G
Re: [PATCH v4 00/11]
Daniel P. Berrangé writes: [...] > When I think about the code generator and how this will all > evolve over time, I have a strong feeling that none of this > should be in qemu.git. Yes, keeping it in qemu.git has its drawbacks. Testing is awkward there. The coupling could cause friction. I'll gladly support exploring alternatives. I saw your "[PATCH] qapi: pluggable backend code generators", and just sent my review. Thanks! [...]
Re: [PATCH v2 1/9] hw/char/pl011: Warn when using disabled receiver
On 10:28 Thu 20 Feb , Philippe Mathieu-Daudé wrote: > We shouldn't receive characters when the full UART or its > receiver is disabled. However we don't want to break the > possibly incomplete "my first bare metal assembly program"s, > so we choose to simply display a warning when this occurs. > > Reviewed-by: Richard Henderson > Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Luc Michel > --- > hw/char/pl011.c | 9 + > 1 file changed, 9 insertions(+) > > diff --git a/hw/char/pl011.c b/hw/char/pl011.c > index 06ce851044d..12a2d4bc7bd 100644 > --- a/hw/char/pl011.c > +++ b/hw/char/pl011.c > @@ -85,6 +85,7 @@ DeviceState *pl011_create(hwaddr addr, qemu_irq irq, > Chardev *chr) > #define CR_OUT1 (1 << 12) > #define CR_RTS (1 << 11) > #define CR_DTR (1 << 10) > +#define CR_RXE (1 << 9) > #define CR_TXE (1 << 8) > #define CR_LBE (1 << 7) > #define CR_UARTEN (1 << 0) > @@ -487,6 +488,14 @@ static int pl011_can_receive(void *opaque) > PL011State *s = (PL011State *)opaque; > int r; > > +if (!(s->cr & CR_UARTEN)) { > +qemu_log_mask(LOG_GUEST_ERROR, > + "PL011 receiving data on disabled UART\n"); > +} > +if (!(s->cr & CR_RXE)) { > +qemu_log_mask(LOG_GUEST_ERROR, > + "PL011 receiving data on disabled RX UART\n"); > +} > r = s->read_count < pl011_get_fifo_depth(s); > trace_pl011_can_receive(s->lcr, s->read_count, r); > return r; > -- > 2.47.1 > --
Re: [PATCH v5 0/7] hw/loongarch/virt: Add cpu hotplug support
Slightly ping. Hope to merge to qemu 10.0 if possible :) Regards Bibo Mao On 2025/2/11 上午11:08, Bibo Mao wrote: LoongArch cpu hotplug is based on ACPI GED device, it depends on patchset where TYPE_HOTPLUG_HANDLER interface is added in ipi and extioi interrupt controller class for cpu hotplug event notification. https://lore.kernel.org/qemu-devel/0d920309-c7ba-48d8-b46d-04ac1e38e...@linaro.org/T/#t It can be verified with qemu command: qemu-system-loongarch64 -smp 2,maxcpus=16,sockets=4,cores=4,threads=1 and vcpu can be added or remove with hmp command: device_add la464-loongarch-cpu,socket-id=0,core-id=2,thread-id=0,id=cpu-2 device_del cpu-2 --- v4 ... v5: 1. Use new calculation logic about physical cpu id, the max core-id and thread-id is aligned up with power of 2. 2. Remove num-cpu property with ipi and extioi interrupt controller, TYPE_HOTPLUG_HANDLER interface is added with the interrupt controllers for cpu hotplug event notification. v3 ... v4: 1. For cold-plug CPUs, move socket-id/core-id/thread-id property setting from preplug function to CPU object creating loop, since there is topo information calculation already in CPU object creating loop. 2. Init interrupt pin of CPU object in cpu plug interface for both cold-plug CPUs and hot-plug CPUs. 3. Apply the patch based on latest qemu version. v2 ... v3: 1. Use qdev_realize_and_unref() with qdev_realize() and object_unref(). 2. Set vcpus_count with 1 since vcpu object is created for every thread. 3. Remove property hw-id, use internal variable hw_id to differentiate cold-plug cpus and hot-plug cpus. 4. Add generic function virt_init_cpu_irq() to init interrupt pin of CPU object, used by both cold-plug and hot-plug CPUs v1 ... v2: 1. Add new property hw-id, property hw-id is set for cold-added CPUs, and property socket-id/core-id/thread-id is set for hot-added CPUs. The two properties can be generated from each other. 2. Use general hotplug api such as hotplug_handler_pre_plug etc 3. Reorganize the patch order, split the patch set into 4 small patches. --- Bibo Mao (7): hw/loongarch/virt: Add CPU topology support hw/loongarch/virt: Add topo properties on CPU object hw/loongarch/virt: Add basic cpu plug interface framework hw/loongarch/virt: Implement cpu unplug interface hw/loongarch/virt: Implement cpu plug interface hw/loongarch/virt: Update the ACPI table for hotplug cpu hw/loongarch/virt: Enable cpu hotplug feature on virt machine hw/loongarch/Kconfig | 1 + hw/loongarch/virt-acpi-build.c | 35 +++- hw/loongarch/virt.c| 284 ++--- include/hw/loongarch/virt.h| 1 + target/loongarch/cpu.c | 23 +++ target/loongarch/cpu.h | 11 ++ 6 files changed, 333 insertions(+), 22 deletions(-)
Re: [Qemu-devel] [PATCH v2 00/11] chardev: Convert IO handlers to use unsigned type
On 12/10/18 10:20, Daniel P. Berrangé wrote: On Fri, Oct 12, 2018 at 02:22:06AM +0200, Philippe Mathieu-Daudé wrote: Hi Paolo, Here are the changes you suggested in https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg02294.html chardev: Simplify IOWatchPoll::fd_can_read as a GSourceFunc chardev: Assert backend's chr_can_read() is positive chardev: Let chr_sync_read() use unsigned type chardev: Let chr_write use unsigned type chardev: Let IOReadHandler use unsigned type chardev: Let IOCanReadHandler use unsigned type chardev: Let qemu_chr_fe_* use unsigned type chardev: Let qemu_chr_be_* use unsigned type chardev: FDChardev::max_size be unsigned s/unsigned/size_t/ in all these commit messages, as it looked like you meant 'unsigned int' with the current commit msg. Sure. Paolo, any blocking comment before I fix the bitrotten places?
Re: [Qemu-devel] [PATCH v2 00/11] chardev: Convert IO handlers to use unsigned type
On 20/2/25 11:07, Philippe Mathieu-Daudé wrote: On 12/10/18 10:20, Daniel P. Berrangé wrote: On Fri, Oct 12, 2018 at 02:22:06AM +0200, Philippe Mathieu-Daudé wrote: Hi Paolo, Here are the changes you suggested in https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg02294.html chardev: Simplify IOWatchPoll::fd_can_read as a GSourceFunc chardev: Assert backend's chr_can_read() is positive chardev: Let chr_sync_read() use unsigned type chardev: Let chr_write use unsigned type chardev: Let IOReadHandler use unsigned type chardev: Let IOCanReadHandler use unsigned type chardev: Let qemu_chr_fe_* use unsigned type chardev: Let qemu_chr_be_* use unsigned type chardev: FDChardev::max_size be unsigned s/unsigned/size_t/ in all these commit messages, as it looked like you meant 'unsigned int' with the current commit msg. Sure. Paolo, any blocking comment before I fix the bitrotten places? Doh, I just noticed your comments in the last patch: https://lore.kernel.org/qemu-devel/786ca1d4-f874-e643-b85a-20652f1c8...@redhat.com/
Re: [PATCH v2 2/3] tests/functional: Introduce the dso_suffix() helper
On 20/02/2025 09.02, Philippe Mathieu-Daudé wrote: Introduce a helper to get the default shared library suffix used on the host. Suggested-by: Pierrick Bouvier Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Pierrick Bouvier --- tests/functional/qemu_test/__init__.py | 2 +- tests/functional/qemu_test/cmd.py | 1 - tests/functional/qemu_test/config.py | 6 ++ 3 files changed, 7 insertions(+), 2 deletions(-) diff --git a/tests/functional/qemu_test/__init__.py b/tests/functional/qemu_test/__init__.py index 5c972843a6d..45f7befa374 100644 --- a/tests/functional/qemu_test/__init__.py +++ b/tests/functional/qemu_test/__init__.py @@ -7,7 +7,7 @@ from .asset import Asset -from .config import BUILD_DIR +from .config import BUILD_DIR, dso_suffix from .cmd import is_readable_executable_file, \ interrupt_interactive_console_until_pattern, wait_for_console_pattern, \ exec_command, exec_command_and_wait_for_pattern, get_qemu_img, which diff --git a/tests/functional/qemu_test/cmd.py b/tests/functional/qemu_test/cmd.py index dc5f422b77d..254e23ef748 100644 --- a/tests/functional/qemu_test/cmd.py +++ b/tests/functional/qemu_test/cmd.py @@ -15,7 +15,6 @@ import os import os.path - def which(tool): """ looks up the full path for @tool, returns None if not found or if @tool does not have executable permissions. Please drop the change to cmd.py now. With that fixed: Reviewed-by: Thomas Huth
Re: Any plan for command line '-g' option (graphical resolution and depth)?
On 20/2/25 10:50, Paolo Bonzini wrote: On 2/20/25 10:20, Philippe Mathieu-Daudé wrote: Hi, QEMU provides the global '-g' CLI option: $ qemu-system-foo --help -g WxH[xDEPTH] Set the initial graphical resolution and depth This option is used to pass resolution/depth information to guest firmwares in the machines defined in the following files: hw/ppc/mac_newworld.c hw/ppc/mac_oldworld.c hw/ppc/prep.c hw/ppc/spapr.c hw/sparc/sun4m.c hw/sparc64/sun4u.c Examples: - hw/ppc/spapr.c:1102: _FDT(fdt_setprop_cell(fdt, c, "qemu,graphic-width", graphic_width)); - hw/sparc64/sun4u.c:716: fw_cfg_add_i16(fw_cfg, FW_CFG_SPARC64_WIDTH, graphic_width); Obviously we have default values, which are different per architecture! Clearly these ought to be machine properties. Yes, we should have an easy way for machine to allow, in addition to -M graphics=BOOLEAN, the structured property -M graphics. {width,height,depth,enabled}. Then '-g' can desugar to -M graphics.width=WW,graphics.height=HH,graphics.depth=BB. It is also used to set TYPE_NUBUS_MACFB properties in hw/m68k/q800.c. Here i suppose we could directly use '-global nubus-macfb.width=value' etc. although I'm not sure it is the recommended way. Should we start deprecating '-g' as a whole? Only if it's unused/useless. If the option is useful it could also be used for EDID, for example (adding Gerd). Global variables are hard to track, and the one-off nature of the option makes it hard to figure out (without looking at the code) which machine support it. However, if the properties were available and introspectable via QOM, then it wouldn't matter if the -g sugar remains as an isolated "case QEMU_OPTION_g:" in system/vl.c. For an example see -boot/-smp or -no- reboot/-no-shutdown. OK, I suppose you are pointing me to: commit e6dba0481363ad343c5f984dd4de3dd06d79ee68 Author: Alejandro Jimenez Date: Fri Dec 11 11:52:43 2020 -0500 qmp: generalize watchdog-set-action to -no-reboot/-no-shutdown Add a QMP command to allow for the behaviors specified by the -no-reboot and -no-shutdown command line option to be set at runtime. The new command is named set-action and takes optional arguments, named after an event, that provide a corresponding action to take. Commit 30e863e5a7a ("qapi/qom: Add ObjectOptions for input-*") already introduced width/height notions in InputBarrierProperties: # @width: the width of secondary screen in pixels (default: "1920") # # @height: the height of secondary screen in pixels (default: "1080") Maybe we can extract a QAPI structure including depth. Anyway I'll have a look, thanks for your guidance! Phil.
Re: [RFC 0/2] hw/vfio/pci: Prevent BARs from being dma mapped in d3hot state
Hi Alex, On 2/19/25 10:19 PM, Alex Williamson wrote: > On Wed, 19 Feb 2025 11:58:44 -0700 > Alex Williamson wrote: > >> On Wed, 19 Feb 2025 18:58:58 +0100 >> Eric Auger wrote: >> >>> Since kernel commit: >>> 2b2c651baf1c ("vfio/pci: Invalidate mmaps and block the access >>> in D3hot power state") >>> any attempt to do an mmap access to a BAR when the device is in d3hot >>> state will generate a fault. >>> >>> On system_powerdown, if the VFIO device is translated by an IOMMU, >>> the device is moved to D3hot state and then the vIOMMU gets disabled >>> by the guest. As a result of this later operation, the address space is >>> swapped from translated to untranslated. When re-enabling the aliased >>> regions, the RAM regions are dma-mapped again and this causes DMA_MAP >>> faults when attempting the operation on BARs. >>> >>> To avoid doing the remap on those BARs, we compute whether the >>> device is in D3hot state and if so, skip the DMA MAP. >> Thinking on this some more, QEMU PCI code already manages the device >> BARs appearing in the address space based on the memory enable bit in >> the command register. Should we do the same for PM state? >> >> IOW, the device going into low power state should remove the BARs from >> the AddressSpace and waking the device should re-add them. The BAR DMA >> mapping should then always be consistent, whereas here nothing would >> remap the BARs when the device is woken. >> >> I imagine we'd need an interface to register the PM capability with the >> core QEMU PCI code, where address space updates are performed relative >> to both memory enable and power status. There might be a way to >> implement this just for vfio-pci devices by toggling the enable state >> of the BAR mmaps relative to PM state, but doing it at the PCI core >> level seems like it'd provide behavior more true to physical hardware. > I took a stab at this approach here, it doesn't obviously break > anything in my configs, but I haven't yet tried to reproduce this exact > scenario. > > https://gitlab.com/alex.williamson/qemu/-/tree/pci-pm-power-state So if I understand correctly the BAR regions will disappear upon the config cmd write in vfio_sub_page_bar_update_mapping(). Is that correct? I will give it a try on my setup... > > There's another pm_cap on the PCIExpressDevice that needs to be > consolidated as well, once I do some research to figure out why a > non-express capability is tracked only by express devices and what > they're doing with it. Thanks, I am not sure I get this last point though. Thanks Eric > > Alex >
[PATCH v3 2/2] rust: add module to convert between success/-errno and io::Result
It is a common convention in QEMU to return a positive value in case of success, and a negated errno value in case of error. Unfortunately, using errno portably in Rust is a bit complicated; on Unix the errno values are supported natively by io::Error, but on Windows they are not; so, use the libc crate. This is a set of utility functions that are used by both chardev and block layer bindings. Signed-off-by: Paolo Bonzini --- docs/devel/rust.rst | 1 + rust/qemu-api/meson.build | 4 + rust/qemu-api/src/assertions.rs | 28 +++ rust/qemu-api/src/errno.rs | 343 rust/qemu-api/src/lib.rs| 1 + rust/qemu-api/src/prelude.rs| 2 + 6 files changed, 379 insertions(+) create mode 100644 rust/qemu-api/src/errno.rs diff --git a/docs/devel/rust.rst b/docs/devel/rust.rst index 90958e5a306..c75dccdbb7c 100644 --- a/docs/devel/rust.rst +++ b/docs/devel/rust.rst @@ -179,6 +179,7 @@ module status ``callbacks``complete ``cell`` stable ``c_str``complete +``errno``complete ``irq`` complete ``memory`` stable ``module`` complete diff --git a/rust/qemu-api/meson.build b/rust/qemu-api/meson.build index 2e9c1078b9b..bcf1cf780f3 100644 --- a/rust/qemu-api/meson.build +++ b/rust/qemu-api/meson.build @@ -2,6 +2,8 @@ _qemu_api_cfg = run_command(rustc_args, '--config-headers', config_host_h, '--features', files('Cargo.toml'), capture: true, check: true).stdout().strip().splitlines() +libc_dep = dependency('libc-0.2-rs') + # _qemu_api_cfg += ['--cfg', 'feature="allocator"'] if rustc.version().version_compare('>=1.77.0') _qemu_api_cfg += ['--cfg', 'has_offset_of'] @@ -22,6 +24,7 @@ _qemu_api_rs = static_library( 'src/cell.rs', 'src/chardev.rs', 'src/c_str.rs', + 'src/errno.rs', 'src/irq.rs', 'src/memory.rs', 'src/module.rs', @@ -39,6 +42,7 @@ _qemu_api_rs = static_library( override_options: ['rust_std=2021', 'build.rust_std=2021'], rust_abi: 'rust', rust_args: _qemu_api_cfg, + dependencies: libc_dep, ) rust.test('rust-qemu-api-tests', _qemu_api_rs, diff --git a/rust/qemu-api/src/assertions.rs b/rust/qemu-api/src/assertions.rs index fa1a18de6fe..104dec39774 100644 --- a/rust/qemu-api/src/assertions.rs +++ b/rust/qemu-api/src/assertions.rs @@ -92,3 +92,31 @@ fn types_must_be_equal(_: T) }; }; } + +/// Assert that an expression matches a pattern. This can also be +/// useful to compare enums that do not implement `Eq`. +/// +/// # Examples +/// +/// ``` +/// # use qemu_api::assert_match; +/// // JoinHandle does not implement `Eq`, therefore the result +/// // does not either. +/// let result: Result, u32> = Err(42); +/// assert_match!(result, Err(42)); +/// ``` +#[macro_export] +macro_rules! assert_match { +($a:expr, $b:pat) => { +assert!( +match $a { +$b => true, +_ => false, +}, +"{} = {:?} does not match {}", +stringify!($a), +$a, +stringify!($b) +); +}; +} diff --git a/rust/qemu-api/src/errno.rs b/rust/qemu-api/src/errno.rs new file mode 100644 index 000..bfd972df951 --- /dev/null +++ b/rust/qemu-api/src/errno.rs @@ -0,0 +1,343 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +//! Utility functions to convert `errno` to and from +//! [`io::Error`]/[`io::Result`] +//! +//! QEMU C functions often have a "positive success/negative `errno`" calling +//! convention. This module provides functions to portably convert an integer +//! into an [`io::Result`] and back. + +use std::{convert::TryFrom, io, io::ErrorKind}; + +/// An `errno` value that can be converted into an [`io::Error`] +pub struct Errno(pub u16); + +// On Unix, from_raw_os_error takes an errno value and OS errors +// are printed using strerror. On Windows however it takes a +// GetLastError() value; therefore we need to convert errno values +// into io::Error by hand. This is the same mapping that the +// standard library uses to retrieve the kind of OS errors +// (`std::sys::pal::unix::decode_error_kind`). +impl From for ErrorKind { +fn from(value: Errno) -> ErrorKind { +let Errno(errno) = value; +match i32::from(errno) { +libc::EPERM | libc::EACCES => ErrorKind::PermissionDenied, +libc::ENOENT => ErrorKind::NotFound, +libc::EINTR => ErrorKind::Interrupted, +x if x == libc::EAGAIN || x == libc::EWOULDBLOCK => ErrorKind::WouldBlock, +libc::ENOMEM => ErrorKind::OutOfMemory, +libc::EEXIST => ErrorKind::AlreadyExists, +libc::EINVAL => ErrorKind::InvalidInput, +libc::EPIPE => ErrorKind::BrokenPipe, +libc::EADDRINUSE => ErrorKind::AddrInUse, +libc::EADDRNOTAVAIL => ErrorKind::AddrNotAvailable, +libc::ECONNABORTED => ErrorKind::ConnectionAborted, +libc
Re: [PATCH v5 4/6] target/i386: Add feature that indicates WRMSR to BASE reg is non-serializing
On Thu, Feb 06, 2025 at 01:28:37PM -0600, Babu Moger wrote: > Date: Thu, 6 Feb 2025 13:28:37 -0600 > From: Babu Moger > Subject: [PATCH v5 4/6] target/i386: Add feature that indicates WRMSR to > BASE reg is non-serializing > X-Mailer: git-send-email 2.34.1 > > Add the CPUID bit indicates that a WRMSR to MSR_FS_BASE, MSR_GS_BASE, or > MSR_KERNEL_GS_BASE is non-serializing. > > CPUID_Fn8021_EAX > BitFeature description > 1 FsGsKernelGsBaseNonSerializing. >WRMSR to FS_BASE, GS_BASE and KernelGSbase are non-serializing. > > Link: > https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/programmer-references/57238.zip > Signed-off-by: Babu Moger > Reviewed-by: Maksim Davydov > --- > target/i386/cpu.c | 2 +- > target/i386/cpu.h | 2 ++ > 2 files changed, 3 insertions(+), 1 deletion(-) Reviewed-by: Zhao Liu
Re: [PATCH v5 5/6] target/i386: Update EPYC-Genoa for Cache property, perfmon-v2, RAS and SVM feature bits
> +static const CPUCaches epyc_genoa_v2_cache_info = { > +.l1d_cache = &(CPUCacheInfo) { > +.type = DATA_CACHE, > +.level = 1, > +.size = 32 * KiB, > +.line_size = 64, > +.associativity = 8, > +.partitions = 1, > +.sets = 64, > +.lines_per_tag = 1, > +.self_init = 1, true. > +.share_level = CPU_TOPOLOGY_LEVEL_CORE, > +}, > +.l1i_cache = &(CPUCacheInfo) { > +.type = INSTRUCTION_CACHE, > +.level = 1, > +.size = 32 * KiB, > +.line_size = 64, > +.associativity = 8, > +.partitions = 1, > +.sets = 64, > +.lines_per_tag = 1, > +.self_init = 1, true. Reviewed-by: Zhao Liu
Re: [PATCH v5 6/6] target/i386: Add support for EPYC-Turin model
> +static const CPUCaches epyc_turin_cache_info = { > +.l1d_cache = &(CPUCacheInfo) { > +.type = DATA_CACHE, > +.level = 1, > +.size = 48 * KiB, > +.line_size = 64, > +.associativity = 12, > +.partitions = 1, > +.sets = 64, > +.lines_per_tag = 1, > +.self_init = 1, true. > +.share_level = CPU_TOPOLOGY_LEVEL_CORE, > +}, > +.l1i_cache = &(CPUCacheInfo) { > +.type = INSTRUCTION_CACHE, > +.level = 1, > +.size = 32 * KiB, > +.line_size = 64, > +.associativity = 8, > +.partitions = 1, > +.sets = 64, > +.lines_per_tag = 1, > +.self_init = 1, true. Reviewed-by: Zhao Liu (And it would be better to add a Turin entry in docs/system/cpu-models-x86.rst.inc later :-).) Thanks, Zhao
Re: [PATCH v4 1/9] vhost-user: Add VirtIO Shared Memory map request
Albert Esteve writes: > @@ -192,6 +194,24 @@ typedef struct VhostUserShared { > unsigned char uuid[16]; > } VhostUserShared; > > +/* For the flags field of VhostUserMMap */ > +#define VHOST_USER_FLAG_MAP_R (1u << 0) > +#define VHOST_USER_FLAG_MAP_W (1u << 1) > + > +typedef struct { > +/* VIRTIO Shared Memory Region ID */ > +uint8_t shmid; > +uint8_t padding[7]; > +/* File offset */ > +uint64_t fd_offset; > +/* Offset within the VIRTIO Shared Memory Region */ > +uint64_t shm_offset; > +/* Size of the mapping */ > +uint64_t len; > +/* Flags for the mmap operation, from VHOST_USER_FLAG_* */ Perhaps this should say VHOST_USER_FLAG_MAP_*? signature.asc Description: PGP signature
Re: [PATCH v6 0/7] hw/char/pl011: Implement TX (async) FIFO to avoid blocking the main loop
On Thu, 20 Feb 2025 at 10:52, Peter Maydell wrote: > > On Thu, 20 Feb 2025 at 10:43, Peter Maydell wrote: > > > > On Tue, 18 Feb 2025 at 13:54, Peter Maydell > > wrote: > > > > > > On Mon, 17 Feb 2025 at 14:55, Peter Maydell > > > wrote: > > > > > > > > On Sat, 8 Feb 2025 at 16:39, Philippe Mathieu-Daudé > > > > wrote: > > > > > > > > > > Hi, > > > > > > > > > > This series add support for (async) FIFO on the transmit path > > > > > of the PL011 UART. > > > > > > > > > > > > > Applied to target-arm.next, thanks (with a couple of minor > > > > tweaks to two of the patches). > > > > > > Unfortunately I seem to get failures in 'make check-functional' > > > with the last patch of this series applied. > > > > I had a look at this this morning because I wondered if it > > was a mistake in the style fixups I'd applied to the patches > > on my end, and I found the bug fairly quickly. The problem is > > that pl011_xmit() doesn't update the TXFE and TXFF FIFO empty/full > > status flag bits when it removes characters from the FIFO. > > So the guest kernel spins forever because TXFF is never unset. > > > > The following patch fixes this for me (and also makes us not > > set INT_TX for the case where we couldn't send any bytes to > > the chardev, which I noticed reading the code rather than > > because it had any visible bad effects): > > Hmm, but that's clearly not the only problem -- it fixed the > "no output at all issue", but now I see a test failure because > of garbled console output: > I also noticed that pl011_write_txdata() doesn't clear TXFE > when it puts a byte into the fifo -- I'm testing to see if > fixing that helps. Yes, with this patch also: --- a/hw/char/pl011.c +++ b/hw/char/pl011.c @@ -330,6 +330,7 @@ static void pl011_write_txdata(PL011State *s, uint8_t data) if (pl011_is_tx_fifo_full(s)) { s->flags |= PL011_FLAG_TXFF; } +s->flags &= ~PL011_FLAG_TXFE; pl011_xmit(NULL, G_IO_OUT, s); } this remaining failure is fixed and I get a clean pass (other than the gpu test failure, but that's not related to the pl011). -- PMM
Re: [PATCH v2 2/3] binfmt: Normalize host CPU architecture
Le 27/01/2025 à 19:29, Andrea Bolognani a écrit : Right now information regarding the family each CPU type belongs to is recorded in two places: the large data table at the top of the script, and the qemu_host_family() function. We can make things better by mapping host CPU architecture to QEMU target in the few cases where the two don't already match and then using the data table to look up the family, same as we're already doing for the guest CPU architecture. Being able to reason in terms of QEMU target regardless of whether we're looking at the host or guest CPU architecture will come in handy to implement upcoming changes. A couple of entries are dropped in the process: BePC and Power Macintosh. I'm quite certain neither of those have ever been reported as CPU architectures by Linux. I believe many more of the entries that are carried forward could be dropped as well, but I don't have the same level of confidence there so I decided to play it safe just in case. Signed-off-by: Andrea Bolognani --- scripts/qemu-binfmt-conf.sh | 44 + 1 file changed, 25 insertions(+), 19 deletions(-) diff --git a/scripts/qemu-binfmt-conf.sh b/scripts/qemu-binfmt-conf.sh index 426f075e31..8d9136a29f 100755 --- a/scripts/qemu-binfmt-conf.sh +++ b/scripts/qemu-binfmt-conf.sh @@ -144,35 +144,35 @@ loongarch64_magic='\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x loongarch64_mask='\xff\xff\xff\xff\xff\xff\xff\xfc\x00\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff' loongarch64_family=loongarch -qemu_get_family() { -cpu=${HOST_ARCH:-$(uname -m)} +# Converts the name of a host CPU architecture to the corresponding QEMU +# target. +# +# FIXME: This can probably be simplified a lot by dropping most entries. +#Remember that the script is only used on Linux, so we only need to +#handle the strings Linux uses to report the host CPU architecture. +qemu_normalize() { +cpu="$1" case "$cpu" in -amd64|i386|i486|i586|i686|i86pc|BePC|x86_64) +i[3-6]86) echo "i386" ;; -mips*) -echo "mips" +amd64) +echo "x86_64" ;; -"Power Macintosh"|ppc64|powerpc|ppc) I don't know why we have "Power Macintosh" as a valid value for "uname -m". I think it's a good idea to remove it. It's here from the beginning 08785f48b73c ("updated so that PPC/ARM/SPARC executables are automatically launched when invoked") Perhaps it's a value coming from the OpenFirmware or MkLinux. +powerpc) echo "ppc" ;; -ppc64el|ppc64le) -echo "ppcle" +ppc64el) +echo "ppc64le" ;; -arm|armel|armhf|arm64|armv[4-9]*l|aarch64) +armel|armhf|armv[4-9]*l) echo "arm" ;; -armeb|armv[4-9]*b|aarch64_be) +armv[4-9]*b) echo "armeb" ;; -sparc*) -echo "sparc" -;; -riscv*) -echo "riscv" -;; -loongarch*) -echo "loongarch" +arm64) +echo "aarch64" ;; *) echo "$cpu" @@ -309,7 +309,13 @@ EOF qemu_set_binfmts() { # probe cpu type -host_family=$(qemu_get_family) +host_cpu=$(qemu_normalize ${HOST_ARCH:-$(uname -m)}) +host_family=$(eval echo \$${host_cpu}_family) + +if [ "$host_family" = "" ] ; then +echo "INTERNAL ERROR: unknown host cpu $host_cpu" 1>&2 +exit 1 +fi # register the interpreter for each cpu except for the native one Reviewed-by: Laurent Vivier
Re: [PATCH v2 3/3] binfmt: Add --ignore-family option
Le 27/01/2025 à 19:29, Andrea Bolognani a écrit : Until now, the script has worked under the assumption that a host CPU can run binaries targeting any CPU in the same family. That's a fair enough assumption when it comes to running i386 binaries on x86_64, but it doesn't quite apply in the general case. For example, while riscv64 CPUs could theoretically run riscv32 applications natively, in practice there exist few (if any?) CPUs that implement the necessary silicon; moreover, even if you had one such CPU, your host OS would most likely not have enabled the necessary kernel bits. This new option gives distro packagers the ability to opt out of the assumption, likely on a per-architecture basis, and make things work out of the box for a larger fraction of their user base. As an interesting side effect, this makes it possible to enable execution of 64-bit binaries on 32-bit CPUs of the same family, which is a perfectly valid use case that apparently hadn't been considered until now. Link: https://src.fedoraproject.org/rpms/qemu/pull-request/72 Thanks: David Abdurachmanov Thanks: Daniel P. Berrangé Signed-off-by: Andrea Bolognani --- scripts/qemu-binfmt-conf.sh | 19 --- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/scripts/qemu-binfmt-conf.sh b/scripts/qemu-binfmt-conf.sh index 8d9136a29f..5fd462b1d1 100755 --- a/scripts/qemu-binfmt-conf.sh +++ b/scripts/qemu-binfmt-conf.sh @@ -205,6 +205,9 @@ Usage: qemu-binfmt-conf.sh [--qemu-path PATH][--debian][--systemd CPU] --persistent:if yes, the interpreter is loaded when binfmt is configured and remains in memory. All future uses are cloned from the open file. + --ignore-family: if yes, it is assumed that the host CPU (e.g. riscv64) +can't natively run programs targeting a CPU that is +part of the same family (e.g. riscv32). --preserve-argv0 preserve argv[0] To import templates with update-binfmts, use : @@ -337,7 +340,12 @@ qemu_set_binfmts() { fi if [ "$host_family" = "$family" ] ; then -continue +# When --ignore-family is used, we have to generate rules even +# for targets that are in the same family as the host CPU. The +# only exception is of course when the CPU types exactly match +if [ "$target" = "$host_cpu" ] || [ "$IGNORE_FAMILY" = "no" ] ; then +continue +fi fi $BINFMT_SET @@ -355,10 +363,11 @@ CREDENTIAL=no PERSISTENT=no PRESERVE_ARG0=no QEMU_SUFFIX="" +IGNORE_FAMILY=no _longopts="debian,systemd:,qemu-path:,qemu-suffix:,exportdir:,help,credential:,\ -persistent:,preserve-argv0:" -options=$(getopt -o ds:Q:S:e:hc:p:g:F: -l ${_longopts} -- "$@") +persistent:,preserve-argv0:,ignore-family:" +options=$(getopt -o ds:Q:S:e:hc:p:g:F:i: -l ${_longopts} -- "$@") eval set -- "$options" while true ; do @@ -418,6 +427,10 @@ while true ; do shift PRESERVE_ARG0="$1" ;; +-i|--ignore-family) +shift +IGNORE_FAMILY="$1" +;; *) break ;; Reviewed-by: Laurent Vivier
Re: [PATCH v5 3/6] target/i386: Update EPYC-Milan CPU model for Cache property, RAS, SVM feature bits
> +static const CPUCaches epyc_milan_v3_cache_info = { > +.l1d_cache = &(CPUCacheInfo) { > +.type = DATA_CACHE, > +.level = 1, > +.size = 32 * KiB, > +.line_size = 64, > +.associativity = 8, > +.partitions = 1, > +.sets = 64, > +.lines_per_tag = 1, > +.self_init = 1, true. > +.share_level = CPU_TOPOLOGY_LEVEL_CORE, > +}, > +.l1i_cache = &(CPUCacheInfo) { > +.type = INSTRUCTION_CACHE, > +.level = 1, > +.size = 32 * KiB, > +.line_size = 64, > +.associativity = 8, > +.partitions = 1, > +.sets = 64, > +.lines_per_tag = 1, > +.self_init = 1, true. Reviewed-by: Zhao Liu
Re: Any plan for command line '-g' option (graphical resolution and depth)?
On 20/02/2025 09:20, Philippe Mathieu-Daudé wrote: Hi, QEMU provides the global '-g' CLI option: $ qemu-system-foo --help -g WxH[xDEPTH] Set the initial graphical resolution and depth This option is used to pass resolution/depth information to guest firmwares in the machines defined in the following files: hw/ppc/mac_newworld.c hw/ppc/mac_oldworld.c hw/ppc/prep.c hw/ppc/spapr.c hw/sparc/sun4m.c hw/sparc64/sun4u.c Examples: - hw/ppc/spapr.c:1102: _FDT(fdt_setprop_cell(fdt, c, "qemu,graphic-width", graphic_width)); - hw/sparc64/sun4u.c:716: fw_cfg_add_i16(fw_cfg, FW_CFG_SPARC64_WIDTH, graphic_width); Obviously we have default values, which are different per architecture! Clearly these ought to be machine properties. At least for VGA it is possible to use VGA via the width and height properties to set the initial values i.e. on the command line with -device VGA,width=1024,height=768 and then use an EDID parser to get the information in the firmware. But then we hit the same problem of how to set these properties from the command line on in-built devices again. It is also used to set TYPE_NUBUS_MACFB properties in hw/m68k/q800.c. Here i suppose we could directly use '-global nubus-macfb.width=value' etc. although I'm not sure it is the recommended way. This approach should work, however -global feels more like a big hammer as opposed to a proper solution. In particular it doesn't work if you have e.g. multiple screens that you want set to different resolutions. Should we start deprecating '-g' as a whole? I think there should be a better way to consolidate the default resolution selection on display devices (particularly in the case of multiple displays), so yes. ATB, Mark.
[PATCH v2 9/9] hw/char/sh_serial: Return correct number of empty RX FIFO elements
In the IOCanReadHandler sh_serial_can_receive(), if the Serial Control Register 'Receive Enable' bit is set (bit 4), then we return a size of (1 << 4) which happens to be equal to 16, so effectively SH_RX_FIFO_LENGTH. The IOReadHandler, sh_serial_receive1() takes care to receive multiple chars, but if the FIFO is partly filled, we only process the number of free slots in the FIFO, discarding the other chars! Fix by returning how many elements the FIFO can queue in the IOCanReadHandler, so we don't have to process more than that in the IOReadHandler, thus not discarding anything. Remove the now unnecessary check on 's->rx_cnt < SH_RX_FIFO_LENGTH' in IOReadHandler, reducing the block indentation. Fixes: 63242a007a1 ("SH4: Serial controller improvement") Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Luc Michel --- hw/char/sh_serial.c | 30 ++ 1 file changed, 14 insertions(+), 16 deletions(-) diff --git a/hw/char/sh_serial.c b/hw/char/sh_serial.c index 247aeb071ac..41c8175a638 100644 --- a/hw/char/sh_serial.c +++ b/hw/char/sh_serial.c @@ -320,7 +320,7 @@ static uint64_t sh_serial_read(void *opaque, hwaddr offs, static int sh_serial_can_receive(SHSerialState *s) { -return s->scr & (1 << 4); +return s->scr & (1 << 4) ? SH_RX_FIFO_LENGTH - s->rx_head : 0; } static void sh_serial_receive_break(SHSerialState *s) @@ -353,22 +353,20 @@ static void sh_serial_receive1(void *opaque, const uint8_t *buf, int size) if (s->feat & SH_SERIAL_FEAT_SCIF) { int i; for (i = 0; i < size; i++) { -if (s->rx_cnt < SH_RX_FIFO_LENGTH) { -s->rx_fifo[s->rx_head++] = buf[i]; -if (s->rx_head == SH_RX_FIFO_LENGTH) { -s->rx_head = 0; -} -s->rx_cnt++; -if (s->rx_cnt >= s->rtrg) { -s->flags |= SH_SERIAL_FLAG_RDF; -if (s->scr & (1 << 6) && s->rxi) { -timer_del(&s->fifo_timeout_timer); -qemu_set_irq(s->rxi, 1); -} -} else { -timer_mod(&s->fifo_timeout_timer, -qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 15 * s->etu); +s->rx_fifo[s->rx_head++] = buf[i]; +if (s->rx_head == SH_RX_FIFO_LENGTH) { +s->rx_head = 0; +} +s->rx_cnt++; +if (s->rx_cnt >= s->rtrg) { +s->flags |= SH_SERIAL_FLAG_RDF; +if (s->scr & (1 << 6) && s->rxi) { +timer_del(&s->fifo_timeout_timer); +qemu_set_irq(s->rxi, 1); } +} else { +timer_mod(&s->fifo_timeout_timer, +qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 15 * s->etu); } } } else { -- 2.47.1
[PATCH v2 4/9] hw/char/pl011: Really use RX FIFO depth
While we model a 16-elements RX FIFO since the PL011 model was introduced in commit cdbdb648b7c ("ARM Versatile Platform Baseboard emulation"), we only read 1 char at a time! Have the IOCanReadHandler handler return how many elements are available, and use that in the IOReadHandler handler. Example of FIFO better used by enabling the pl011 tracing events and running the tests/functional/test_aarch64_virt.py tests: pl011_can_receive LCR 0x70, RX FIFO used 0/16, can_receive 16 chars pl011_receive recv 5 chars pl011_fifo_rx_put RX FIFO push char [0x72] 1/16 depth used pl011_irq_state irq state 1 pl011_fifo_rx_put RX FIFO push char [0x6f] 2/16 depth used pl011_fifo_rx_put RX FIFO push char [0x6f] 3/16 depth used pl011_fifo_rx_put RX FIFO push char [0x74] 4/16 depth used pl011_fifo_rx_put RX FIFO push char [0x0d] 5/16 depth used pl011_can_receive LCR 0x70, RX FIFO used 5/16, can_receive 11 chars pl011_can_receive LCR 0x70, RX FIFO used 5/16, can_receive 11 chars pl011_write addr 0x038 value 0x0050 reg IMSC pl011_irq_state irq state 1 pl011_can_receive LCR 0x70, RX FIFO used 5/16, can_receive 11 chars pl011_read addr 0x03c value 0x0030 reg RIS pl011_write addr 0x044 value 0x reg ICR pl011_irq_state irq state 1 pl011_read addr 0x018 value 0x0080 reg FR pl011_read_fifo RX FIFO read, used 4/16 pl011_irq_state irq state 1 pl011_read addr 0x000 value 0x0072 reg DR pl011_can_receive LCR 0x70, RX FIFO used 4/16, can_receive 12 chars pl011_read addr 0x018 value 0x0080 reg FR pl011_read_fifo RX FIFO read, used 3/16 pl011_irq_state irq state 1 pl011_read addr 0x000 value 0x006f reg DR pl011_can_receive LCR 0x70, RX FIFO used 3/16, can_receive 13 chars pl011_read addr 0x018 value 0x0080 reg FR pl011_read_fifo RX FIFO read, used 2/16 pl011_irq_state irq state 1 pl011_read addr 0x000 value 0x006f reg DR pl011_can_receive LCR 0x70, RX FIFO used 2/16, can_receive 14 chars pl011_read addr 0x018 value 0x0080 reg FR pl011_read_fifo RX FIFO read, used 1/16 pl011_irq_state irq state 1 pl011_read addr 0x000 value 0x0074 reg DR pl011_can_receive LCR 0x70, RX FIFO used 1/16, can_receive 15 chars pl011_read addr 0x018 value 0x0080 reg FR pl011_read_fifo RX FIFO read, used 0/16 pl011_irq_state irq state 0 pl011_read addr 0x000 value 0x000d reg DR pl011_can_receive LCR 0x70, RX FIFO used 0/16, can_receive 16 chars pl011_read addr 0x018 value 0x0090 reg FR pl011_read addr 0x03c value 0x0020 reg RIS pl011_write addr 0x038 value 0x0050 reg IMSC pl011_irq_state irq state 0 pl011_can_receive LCR 0x70, RX FIFO used 0/16, can_receive 16 chars pl011_can_receive LCR 0x70, RX FIFO used 0/16, can_receive 16 chars pl011_read addr 0x018 value 0x0090 reg FR pl011_write addr 0x000 value 0x0072 reg DR Inspired-by: Peter Maydell Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Luc Michel --- hw/char/pl011.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/hw/char/pl011.c b/hw/char/pl011.c index f7485e7c541..23a9db8c57c 100644 --- a/hw/char/pl011.c +++ b/hw/char/pl011.c @@ -489,7 +489,6 @@ static int pl011_can_receive(void *opaque) PL011State *s = (PL011State *)opaque; unsigned fifo_depth = pl011_get_fifo_depth(s); unsigned fifo_available = fifo_depth - s->read_count; -int r = fifo_available ? 1 : 0; if (!(s->cr & CR_UARTEN)) { qemu_log_mask(LOG_GUEST_ERROR, @@ -500,7 +499,8 @@ static int pl011_can_receive(void *opaque) "PL011 receiving data on disabled RX UART\n"); } trace_pl011_can_receive(s->lcr, s->read_count, fifo_depth, fifo_available); -return r; + +return fifo_available; } static void pl011_receive(void *opaque, const uint8_t *buf, int size) @@ -515,7 +515,9 @@ static void pl011_receive(void *opaque, const uint8_t *buf, int size) return; } -pl011_fifo_rx_put(opaque, *buf); +for (int i = 0; i < size; i++) { +pl011_fifo_rx_put(opaque, buf[i]); +} } static void pl011_event(void *opaque, QEMUChrEvent event) -- 2.47.1
[PATCH v2 8/9] hw/char/mcf_uart: Really use RX FIFO depth
While we model a 4-elements RX FIFO since the MCF UART model was introduced in commit 20dcee94833 ("MCF5208 emulation"), we only read 1 char at a time! Have the IOCanReadHandler handler return how many elements are available, and use that in the IOReadHandler handler. Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Luc Michel --- hw/char/mcf_uart.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/hw/char/mcf_uart.c b/hw/char/mcf_uart.c index 95f269ee9b7..529c26be93a 100644 --- a/hw/char/mcf_uart.c +++ b/hw/char/mcf_uart.c @@ -281,14 +281,16 @@ static int mcf_uart_can_receive(void *opaque) { mcf_uart_state *s = (mcf_uart_state *)opaque; -return s->rx_enabled && (s->sr & MCF_UART_FFULL) == 0; +return s->rx_enabled ? FIFO_DEPTH - s->fifo_len : 0; } static void mcf_uart_receive(void *opaque, const uint8_t *buf, int size) { mcf_uart_state *s = (mcf_uart_state *)opaque; -mcf_uart_push_byte(s, buf[0]); +for (int i = 0; i < size; i++) { +mcf_uart_push_byte(s, buf[i]); +} } static const MemoryRegionOps mcf_uart_ops = { -- 2.47.1
Re: [PATCH 2/2] [NOT-FOR-MERGE] Add qtest for migration over RDMA
On 19/02/2025 22:11, Peter Xu wrote: > then > in the test it tries to detect rdma link and fetch the ip only It should work without root permission if we just*detect* and*fetch ip*. Do you also mean we can split new-rdma-link.sh to 2 separate scripts - add-rdma-link.sh # optionally, execute by user before the test (require root permission) - detect-fetch-rdma.sh # execute from the migration-test >>> Hmm indeed we still need a script to scan over all the ports.. >>> >>> If having --rdma is a good idea, maybe we can further make it a parameter >>> to --rdma? >>> >>>$ migration-test --rdma $RDMA_IP >>> >>> Or: >>> >>>$ migration-test --rdma-ip $RDMA_IP >> I think --rdma only makes sense if it's going to do something >> special. The optmimal scenario is that it always runs the test when it >> can and sets up/tears down anything it needs. >> >> If it needs root, I'd prefer the test informs about this and does the >> work itself. >> >> It would also be good to have the add + detect separate so we have more >> flexibility, maybe we manage to enable this in CI even. >> >> So: >> >> ./add.sh >> migration-test >> (runs detect.sh + runs rdma test) >> (leaves stuff behind) >> >> migration-test >> (skips rdma test with message that it needs root) >> >> sudo migration-test >> (runs add.sh + detect.sh + runs rdma test) >> (cleans itself up) >> >> Does that make sense to you? I hope it's not too much work. > Looks good here. We can also keep all the rdma stuff into one file, taking > parameters. > > ./rdma-helper.sh setup > ./rdma-helper.sh detect-ip Hi Peter and Fabiano Many thanks for your kindly idea and suggestion. Please take another look at the changes below. - I don't copy script to the build dir, just execute the script like misc-tests.c - It will automatically create a new RXE if it doesn't exit when running in root [PATCH] migration: Add qtest for migration over RDMA This qtest requires there is RDMA(RoCE) link in the host. Introduce a scripts/rdma-migration-helper.sh to - setup a new RXE if it's root - detect existing RoCE link to make the qtest work smoothly. Test will be skip if there is no available RoCE link. # Start of rdma tests # Running /x86_64/migration/precopy/rdma/plain ok 1 /x86_64/migration/precopy/rdma/plain # SKIP There is no available rdma link in the host. Maybe you are not running with the root permission # End of rdma tests Admin is able to remove the RXE by passing 'cleanup' to this script. Signed-off-by: Li Zhijian --- scripts/rdma-migration-helper.sh | 40 +++ tests/qtest/migration/precopy-tests.c | 57 +++ 2 files changed, 97 insertions(+) create mode 100755 scripts/rdma-migration-helper.sh diff --git a/scripts/rdma-migration-helper.sh b/scripts/rdma-migration-helper.sh new file mode 100755 index 00..4ef62baf0f --- /dev/null +++ b/scripts/rdma-migration-helper.sh @@ -0,0 +1,40 @@ +#!/bin/bash + +# Copied from blktests +get_ipv4_addr() { +ip -4 -o addr show dev "$1" | +sed -n 's/.*[[:blank:]]inet[[:blank:]]*\([^[:blank:]/]*\).*/\1/p' | +tr -d '\n' +} + +has_soft_rdma() { +rdma link | grep -q " netdev $1[[:blank:]]*\$" +} + +rdma_rxe_setup_detect() +{ +( +cd /sys/class/net && +for i in *; do +[ -e "$i" ] || continue +[ "$i" = "lo" ] && continue +[ "$(<"$i/addr_len")" = 6 ] || continue +[ "$(<"$i/carrier")" = 1 ] || continue + +has_soft_rdma "$i" && break +[ "$operation" = "setup" ] && rdma link add "${i}_rxe" type rxe netdev "$i" && break +done +has_soft_rdma "$i" || return +get_ipv4_addr $i +) +} + +operation=${1:-setup} + +if [ "$operation" == "setup" ] || [ "$operation" == "detect" ]; then +rdma_rxe_setup_detect +elif [ "$operation" == "cleanup" ]; then +modprobe -r rdma_rxe +else +echo "Usage: $0 [setup | cleanup | detect]" +fi diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c index ba273d10b9..8c72eb699b 100644 --- a/tests/qtest/migration/precopy-tests.c +++ b/tests/qtest/migration/precopy-tests.c @@ -99,6 +99,59 @@ static void test_precopy_unix_dirty_ring(void) test_precopy_common(&args); } +#ifdef CONFIG_RDMA + +#define RDMA_MIGRATION_HELPER "scripts/rdma-migration-helper.sh" +static int new_rdma_link(char *buffer) { +const char *argument = (geteuid() == 0) ? "setup" : "detect"; +char command[1024]; + +snprintf(command, sizeof(command), "%s %s", RDMA_MIGRATION_HELPER, argument); + +FILE *pipe = popen(command, "r"); +if (pipe == NULL) { +perror("Failed to run script"); +return -1; +} + +int idx = 0; +while (fgets(buffer + idx, 128 - idx, pipe) != NULL) { +idx += strlen(buffer); +} + +int status = pclose(pipe); +if (status == -1) { +perror("Error reported b
Re: [PATCH v5 1/6] target/i386: Update EPYC CPU model for Cache property, RAS, SVM feature bits
> +static CPUCaches epyc_v5_cache_info = { > +.l1d_cache = &(CPUCacheInfo) { > +.type = DATA_CACHE, > +.level = 1, > +.size = 32 * KiB, > +.line_size = 64, > +.associativity = 8, > +.partitions = 1, > +.sets = 64, > +.lines_per_tag = 1, > +.self_init = 1, For consistency as the below parts, it's better to code `true` for all boolean types. > +.share_level = CPU_TOPOLOGY_LEVEL_CORE, > +}, > +.l1i_cache = &(CPUCacheInfo) { > +.type = INSTRUCTION_CACHE, > +.level = 1, > +.size = 64 * KiB, > +.line_size = 64, > +.associativity = 4, > +.partitions = 1, > +.sets = 256, > +.lines_per_tag = 1, > +.self_init = 1, ditto. Others are fine for me, so, Reviewed-by: Zhao Liu And one more thing :-) ... > static const CPUCaches epyc_rome_cache_info = { > .l1d_cache = &(CPUCacheInfo) { > .type = DATA_CACHE, > @@ -5207,6 +5261,25 @@ static const X86CPUDefinition builtin_x86_defs[] = { > }, > .cache_info = &epyc_v4_cache_info > }, > +{ > +.version = 5, > +.props = (PropValue[]) { > +{ "overflow-recov", "on" }, > +{ "succor", "on" }, When I checks the "overflow-recov" and "succor" enabling, I find these 2 bits are set unconditionally. I'm not sure if all AMD platforms support both bits, do you think it's necessary to check the host support? diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 6c749d4ee812..03e463076632 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -555,7 +555,10 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function, cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX); ret |= cpuid_1_edx & CPUID_EXT2_AMD_ALIASES; } else if (function == 0x8007 && reg == R_EBX) { -ret |= CPUID_8000_0007_EBX_OVERFLOW_RECOV | CPUID_8000_0007_EBX_SUCCOR; +uint32_t ebx; +host_cpuid(0x8007, 0, &unused, &ebx, &unused, &unused); + +ret |= ebx & (CPUID_8000_0007_EBX_OVERFLOW_RECOV | CPUID_8000_0007_EBX_SUCCOR); } else if (function == KVM_CPUID_FEATURES && reg == R_EAX) { /* kvm_pv_unhalt is reported by GET_SUPPORTED_CPUID, but it can't * be enabled without the in-kernel irqchip Thanks, Zhao
Re: [PATCH v6 0/7] hw/char/pl011: Implement TX (async) FIFO to avoid blocking the main loop
On Tue, 18 Feb 2025 at 13:54, Peter Maydell wrote: > > On Mon, 17 Feb 2025 at 14:55, Peter Maydell wrote: > > > > On Sat, 8 Feb 2025 at 16:39, Philippe Mathieu-Daudé > > wrote: > > > > > > Hi, > > > > > > This series add support for (async) FIFO on the transmit path > > > of the PL011 UART. > > > > > > > Applied to target-arm.next, thanks (with a couple of minor > > tweaks to two of the patches). > > Unfortunately I seem to get failures in 'make check-functional' > with the last patch of this series applied. I had a look at this this morning because I wondered if it was a mistake in the style fixups I'd applied to the patches on my end, and I found the bug fairly quickly. The problem is that pl011_xmit() doesn't update the TXFE and TXFF FIFO empty/full status flag bits when it removes characters from the FIFO. So the guest kernel spins forever because TXFF is never unset. The following patch fixes this for me (and also makes us not set INT_TX for the case where we couldn't send any bytes to the chardev, which I noticed reading the code rather than because it had any visible bad effects): --- a/hw/char/pl011.c +++ b/hw/char/pl011.c @@ -256,6 +256,7 @@ static gboolean pl011_xmit(void *do_not_use, GIOCondition cond, void *opaque) int bytes_consumed; uint8_t buf[PL011_FIFO_DEPTH]; uint32_t count; +bool emptied_fifo; count = fifo8_num_used(&s->xmit_fifo); trace_pl011_fifo_tx_xmit_used(count); @@ -280,15 +281,24 @@ static gboolean pl011_xmit(void *do_not_use, GIOCondition cond, void *opaque) /* Error in back-end: drain the fifo. */ pl011_drain_tx(s); return G_SOURCE_REMOVE; +} else if (bytes_consumed == 0) { +/* Couldn't send anything, try again later */ +return G_SOURCE_CONTINUE; } /* Pop the data we could transmit. */ fifo8_drop(&s->xmit_fifo, bytes_consumed); s->int_level |= INT_TX; +s->flags &= ~PL011_FLAG_TXFF; + +emptied_fifo = fifo8_is_empty(&s->xmit_fifo); +if (emptied_fifo) { +s->flags |= PL011_FLAG_TXFE; +} pl011_update(s); -if (!fifo8_is_empty(&s->xmit_fifo)) { +if (!emptied_fifo) { /* Reschedule another transmission if we couldn't transmit all. */ return G_SOURCE_CONTINUE; } If you're OK with that as a fix then I'll squash it in and keep the series in target-arm.next. thanks -- PMM
Re: [PATCH v2 1/3] binfmt: Shuffle things around
Le 27/01/2025 à 19:29, Andrea Bolognani a écrit : This should make no difference from the functional point of view and it's just preparation for upcoming changes. Signed-off-by: Andrea Bolognani --- scripts/qemu-binfmt-conf.sh | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/scripts/qemu-binfmt-conf.sh b/scripts/qemu-binfmt-conf.sh index 6ef9f118d9..426f075e31 100755 --- a/scripts/qemu-binfmt-conf.sh +++ b/scripts/qemu-binfmt-conf.sh @@ -318,20 +318,23 @@ qemu_set_binfmts() { mask=$(eval echo \$${cpu}_mask) family=$(eval echo \$${cpu}_family) +target="$cpu" +if [ "$cpu" = "i486" ] ; then +target="i386" +fi + +qemu="$QEMU_PATH/qemu-$target$QEMU_SUFFIX" + if [ "$magic" = "" ] || [ "$mask" = "" ] || [ "$family" = "" ] ; then echo "INTERNAL ERROR: unknown cpu $cpu" 1>&2 continue fi -qemu="$QEMU_PATH/qemu-$cpu" -if [ "$cpu" = "i486" ] ; then -qemu="$QEMU_PATH/qemu-i386" +if [ "$host_family" = "$family" ] ; then +continue fi -qemu="$qemu$QEMU_SUFFIX" -if [ "$host_family" != "$family" ] ; then -$BINFMT_SET -fi +$BINFMT_SET done } Reviewed-by: Laurent Vivier
Re: [RFC 0/2] hw/vfio/pci: Prevent BARs from being dma mapped in d3hot state
Hi Alex, On 2/20/25 11:31 AM, Eric Auger wrote: > > Hi Alex, > > On 2/19/25 10:19 PM, Alex Williamson wrote: >> On Wed, 19 Feb 2025 11:58:44 -0700 >> Alex Williamson wrote: >> >>> On Wed, 19 Feb 2025 18:58:58 +0100 >>> Eric Auger wrote: >>> Since kernel commit: 2b2c651baf1c ("vfio/pci: Invalidate mmaps and block the access in D3hot power state") any attempt to do an mmap access to a BAR when the device is in d3hot state will generate a fault. On system_powerdown, if the VFIO device is translated by an IOMMU, the device is moved to D3hot state and then the vIOMMU gets disabled by the guest. As a result of this later operation, the address space is swapped from translated to untranslated. When re-enabling the aliased regions, the RAM regions are dma-mapped again and this causes DMA_MAP faults when attempting the operation on BARs. To avoid doing the remap on those BARs, we compute whether the device is in D3hot state and if so, skip the DMA MAP. >>> Thinking on this some more, QEMU PCI code already manages the device >>> BARs appearing in the address space based on the memory enable bit in >>> the command register. Should we do the same for PM state? >>> >>> IOW, the device going into low power state should remove the BARs from >>> the AddressSpace and waking the device should re-add them. The BAR DMA >>> mapping should then always be consistent, whereas here nothing would >>> remap the BARs when the device is woken. >>> >>> I imagine we'd need an interface to register the PM capability with the >>> core QEMU PCI code, where address space updates are performed relative >>> to both memory enable and power status. There might be a way to >>> implement this just for vfio-pci devices by toggling the enable state >>> of the BAR mmaps relative to PM state, but doing it at the PCI core >>> level seems like it'd provide behavior more true to physical hardware. >> I took a stab at this approach here, it doesn't obviously break >> anything in my configs, but I haven't yet tried to reproduce this exact >> scenario. >> >> https://gitlab.com/alex.williamson/qemu/-/tree/pci-pm-power-state it does not totally fix the issue: I now get: qemu-system-x86_64: warning: vfio_container_dma_map(0x55cc25705680, 0x3800, 0x100, 0x7f876200) = -14 (Bad address) :41:00.0: PCI peer-to-peer transactions on BARs are not supported. Eric > > So if I understand correctly the BAR regions will disappear upon the > config cmd write in vfio_sub_page_bar_update_mapping(). Is that correct? > I will give it a try on my setup... >> >> There's another pm_cap on the PCIExpressDevice that needs to be >> consolidated as well, once I do some research to figure out why a >> non-express capability is tracked only by express devices and what >> they're doing with it. Thanks, > I am not sure I get this last point though. > > Thanks > > Eric >> >> Alex >> >
Re: [PATCH v6 0/7] hw/char/pl011: Implement TX (async) FIFO to avoid blocking the main loop
On Thu, 20 Feb 2025 at 10:43, Peter Maydell wrote: > > On Tue, 18 Feb 2025 at 13:54, Peter Maydell wrote: > > > > On Mon, 17 Feb 2025 at 14:55, Peter Maydell > > wrote: > > > > > > On Sat, 8 Feb 2025 at 16:39, Philippe Mathieu-Daudé > > > wrote: > > > > > > > > Hi, > > > > > > > > This series add support for (async) FIFO on the transmit path > > > > of the PL011 UART. > > > > > > > > > > Applied to target-arm.next, thanks (with a couple of minor > > > tweaks to two of the patches). > > > > Unfortunately I seem to get failures in 'make check-functional' > > with the last patch of this series applied. > > I had a look at this this morning because I wondered if it > was a mistake in the style fixups I'd applied to the patches > on my end, and I found the bug fairly quickly. The problem is > that pl011_xmit() doesn't update the TXFE and TXFF FIFO empty/full > status flag bits when it removes characters from the FIFO. > So the guest kernel spins forever because TXFF is never unset. > > The following patch fixes this for me (and also makes us not > set INT_TX for the case where we couldn't send any bytes to > the chardev, which I noticed reading the code rather than > because it had any visible bad effects): Hmm, but that's clearly not the only problem -- it fixed the "no output at all issue", but now I see a test failure because of garbled console output: 2025-02-20 10:34:51,562: Booting Linux on physical CPU 0x0 2025-02-20 10:34:51,563: Linux version 4.18.16-300.fc29.armv7hl (mockbu...@buildvm-armv7-06.arm.fedoraproject.org) (gcc version 8.2.1 20180801 (Red Hat 8.2.1-2) (GCC)) #1 SMP Sun Oct 21 00:56:28 UTC 2018 2025-02-20 10:34:51,563: CPU: ARMv7 Processor [414fc0f0] revision 0 (ARMv7), cr=10c5387d 2025-02-20 10:34:51,564: CPU: div instructions available: patching division code 2025-02-20 10:34:51,564: CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache 2025-02-20 10:34:51,565: OF: fdt: Machine model: linux,dummy-virt 2025-02-20 10:34:51,565: Memory policy: Data cache writealloc 2025-02-20 10:34:51,565: efi: Getting EFI parameters from FDT: 2025-02-20 10:34:51,566: efi: UEFI not found. 2025-02-20 10:34:51,566: cma: aetrsv6i 2025-02-20 10:34:51,566: psci: oi ocdteofmT 2025-02-20 10:34:51,566: ps:Sv1ecdniwe^Mps:Usdvni spsc:rt gtnot psci: SMC Calling Conventiov0 2025-02-20 10:34:51,567: pec:Eee1pec pvls9 1 2865^MBul1olt bi4^MKerloa n:itte nltAA 2025-02-20 10:34:51,568: Det c stlere 3 rr4 5i 9(dr336bt)Memr 06112aib 6Keece11K ra0 ,21 sv,Kmrsv,Kim)Virtual kernel memory layout: 2025-02-20 10:34:51,568: vt :xf0 fa f00 0f00 30 (0M^M lm c00 0800xe0 2M^M mue:x00 0f00 4B 2025-02-20 10:34:51,568: et x(tvl) (rl 97B 2025-02-20 10:34:51,569: n 0(trvl-xta) 0 0pv) 3 B 2025-02-20 10:34:51,570: bs prl-xta) 5 )random: get_random_u32 ca ___+:ag6 d=3MOes,P=,os^Mfta:aotg78ni 1asHierhlC pmti.RCrtcnC oN_S2orpi=. TasCebd 2025-02-20 10:34:51,571: RC jtgeeyor_o_a1 __d^MNRIS1 rr GIC physical location is 0x80^MGIC2 n[mx00-02f,P81]arcte 1tm( nna65H(r.cemk0facl:0d20,aish_o:6i M,routo1n per49419sn^MCocrm dexCalibrating delay ose eceiife..oIl2^Mpxdl6i3 2025-02-20 10:34:51,572: SertFmo ia^MYemgmdl 2025-02-20 10:34:51,572: SEn: ili.Moutcea b te 2(d:,4t^MMoupnch stlers14oe 9oeyo 2025-02-20 10:34:51,572: CP0Screv:iwein tuia norieI tst lrl 2025-02-20 10:34:51,573: /cpus/cpu@0 missing clock-frequen or^MCPU rd1c k i 00^MSetgptiintm rx30 000Hircc Cmento^MEF rc ln aae^Msm ii cdrCs.smp ohu1o, USMPTao1reo cvae(15. gP.CPUA U)tt Code.devtmpfs: initialize^Mmo4aheu pr3vit e0cloejfe s fffm_cs0fffm_d_:924250sfut stbs 2025-02-20 10:34:51,574: piclo:niidit uye 2025-02-20 10:34:51,574: DM tre vi^MNETeDM:rlae2 o tmi oerent aloaios 2025-02-20 10:34:51,574: audit: initializing netlink subsys (disabled) 2025-02-20 10:34:51,574: audit: type=2000 audit(0.480:1): sat=ntalzdadt_nbe= e=^Mcpuidl:usn oenrmnu 2025-02-20 10:34:51,574: No ATG? 2025-02-20 10:34:51,575: hw-brapit fud5(1rsrvd rapon n achon eitrs^Mhw-brapit:mxmmwthontsz s8bts^MSerial MAP01 Ap1:ttAA tMIO09000ud = 0) saP01 e1 2025-02-20 10:34:51,575: console [ttyAMA0] enabled 2025-02-20 10:34:51,668: cryptd: max_cpu_qlen set to 1000 2025-02-20 10:34:51,690: vgaarb: loaded 2025-02-20 10:34:51,697: SCSI subsystem initialized 2025-02-20 10:34:51,702: usbcore: registered new interface driver usbfs 2025-02-20 10:34:51,703: usbcore: registered new interface driver hub (the log has no garbling after that so it's presumably OK with the interrupt-driven real UART driver but the polling one you get for earlycon has trouble.) I also noticed that pl011_write_txdata() doesn't clear TXFE when it puts a byte into the fifo -- I'm testing to see if fixing that helps. -- PMM
Re: [PATCH v5 2/6] target/i386: Update EPYC-Rome CPU model for Cache property, RAS, SVM feature bits
On Thu, Feb 06, 2025 at 01:28:35PM -0600, Babu Moger wrote: > Date: Thu, 6 Feb 2025 13:28:35 -0600 > From: Babu Moger > Subject: [PATCH v5 2/6] target/i386: Update EPYC-Rome CPU model for Cache > property, RAS, SVM feature bits > X-Mailer: git-send-email 2.34.1 > > Found that some of the cache properties are not set correctly for EPYC models. > > l1d_cache.no_invd_sharing should not be true. > l1i_cache.no_invd_sharing should not be true. > > L2.self_init should be true. > L2.inclusive should be true. > > L3.inclusive should not be true. > L3.no_invd_sharing should be true. > > Fix these cache properties. > > Also add the missing RAS and SVM features bits on AMD EPYC-Rome. The SVM > feature bits are used in nested guests. > > succor: Software uncorrectable error containment and recovery > capability. > overflow-recov: MCA overflow recovery support. > lbrv : LBR virtualization > tsc-scale : MSR based TSC rate control > vmcb-clean: VMCB clean bits > flushbyasid : Flush by ASID > pause-filter : Pause intercept filter > pfthreshold : PAUSE filter threshold > v-vmsave-vmload : Virtualized VMLOAD and VMSAVE > vgif : Virtualized GIF > > Signed-off-by: Babu Moger > Reviewed-by: Maksim Davydov > --- > target/i386/cpu.c | 73 +++ > 1 file changed, 73 insertions(+) > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c > index 94292bfaa2..e2c3c797ed 100644 > --- a/target/i386/cpu.c > +++ b/target/i386/cpu.c > @@ -2342,6 +2342,60 @@ static const CPUCaches epyc_rome_v3_cache_info = { > }, > }; > > +static const CPUCaches epyc_rome_v5_cache_info = { > +.l1d_cache = &(CPUCacheInfo) { > +.type = DATA_CACHE, > +.level = 1, > +.size = 32 * KiB, > +.line_size = 64, > +.associativity = 8, > +.partitions = 1, > +.sets = 64, > +.lines_per_tag = 1, > +.self_init = 1, This field could be true, > +.share_level = CPU_TOPOLOGY_LEVEL_CORE, > +}, > +.l1i_cache = &(CPUCacheInfo) { > +.type = INSTRUCTION_CACHE, > +.level = 1, > +.size = 32 * KiB, > +.line_size = 64, > +.associativity = 8, > +.partitions = 1, > +.sets = 64, > +.lines_per_tag = 1, > +.self_init = 1, ditto, Compared to the previous cache model version, the differences can be checked. I feel that in the future, when we introduce a new cache model, it's better to avoid omitting items that default to false. This way, the cache model can correspond to the output of the cpuid tool, making it easier to compare and check. Overall, LGTM, Reviewed-by: Zhao Liu
Re: [PATCH 2/9] hw/char/pl011: Simplify a bit pl011_can_receive()
On 22:08 Wed 19 Feb , Philippe Mathieu-Daudé wrote: > Introduce 'fifo_depth' and 'fifo_available' local variables > to better express the 'r' variable use. > > Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Luc Michel > --- > hw/char/pl011.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/hw/char/pl011.c b/hw/char/pl011.c > index 60cea1d9a16..bcd516d682d 100644 > --- a/hw/char/pl011.c > +++ b/hw/char/pl011.c > @@ -486,7 +486,9 @@ static void pl011_write(void *opaque, hwaddr offset, > static int pl011_can_receive(void *opaque) > { > PL011State *s = (PL011State *)opaque; > -int r; > +unsigned fifo_depth = pl011_get_fifo_depth(s); > +unsigned fifo_available = fifo_depth - s->read_count; > +int r = fifo_available ? 1 : 0; > > if (!(s->cr & CR_UARTEN)) { > qemu_log_mask(LOG_GUEST_ERROR, "PL011 reading data on disabled > UART\n"); > @@ -494,7 +496,6 @@ static int pl011_can_receive(void *opaque) > if (!(s->cr & CR_RXE)) { > qemu_log_mask(LOG_GUEST_ERROR, "PL011 reading data on disabled TX > UART\n"); > } > -r = s->read_count < pl011_get_fifo_depth(s); > trace_pl011_can_receive(s->lcr, s->read_count, r); > return r; > } > -- > 2.47.1 > --
Re: [PATCH 3/9] hw/char/pl011: Improve RX flow tracing events
On 22:08 Wed 19 Feb , Philippe Mathieu-Daudé wrote: > Log FIFO use (availability and depth). > > Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Luc Michel > --- > hw/char/pl011.c | 10 ++ > hw/char/trace-events | 7 --- > 2 files changed, 10 insertions(+), 7 deletions(-) > > diff --git a/hw/char/pl011.c b/hw/char/pl011.c > index bcd516d682d..148a7d0dc60 100644 > --- a/hw/char/pl011.c > +++ b/hw/char/pl011.c > @@ -185,7 +185,7 @@ static void pl011_fifo_rx_put(void *opaque, uint32_t > value) > s->read_fifo[slot] = value; > s->read_count++; > s->flags &= ~PL011_FLAG_RXFE; > -trace_pl011_fifo_rx_put(value, s->read_count); > +trace_pl011_fifo_rx_put(value, s->read_count, pipe_depth); > if (s->read_count == pipe_depth) { > trace_pl011_fifo_rx_full(); > s->flags |= PL011_FLAG_RXFF; > @@ -248,12 +248,13 @@ static void pl011_write_txdata(PL011State *s, uint8_t > data) > static uint32_t pl011_read_rxdata(PL011State *s) > { > uint32_t c; > +unsigned fifo_depth = pl011_get_fifo_depth(s); > > s->flags &= ~PL011_FLAG_RXFF; > c = s->read_fifo[s->read_pos]; > if (s->read_count > 0) { > s->read_count--; > -s->read_pos = (s->read_pos + 1) & (pl011_get_fifo_depth(s) - 1); > +s->read_pos = (s->read_pos + 1) & (fifo_depth - 1); > } > if (s->read_count == 0) { > s->flags |= PL011_FLAG_RXFE; > @@ -261,7 +262,7 @@ static uint32_t pl011_read_rxdata(PL011State *s) > if (s->read_count == s->read_trigger - 1) { > s->int_level &= ~INT_RX; > } > -trace_pl011_read_fifo(s->read_count); > +trace_pl011_read_fifo(s->read_count, fifo_depth); > s->rsr = c >> 8; > pl011_update(s); > qemu_chr_fe_accept_input(&s->chr); > @@ -496,12 +497,13 @@ static int pl011_can_receive(void *opaque) > if (!(s->cr & CR_RXE)) { > qemu_log_mask(LOG_GUEST_ERROR, "PL011 reading data on disabled TX > UART\n"); > } > -trace_pl011_can_receive(s->lcr, s->read_count, r); > +trace_pl011_can_receive(s->lcr, s->read_count, fifo_depth, > fifo_available); > return r; > } > > static void pl011_receive(void *opaque, const uint8_t *buf, int size) > { > +trace_pl011_receive(size); > /* > * In loopback mode, the RX input signal is internally disconnected > * from the entire receiving logics; thus, all inputs are ignored, > diff --git a/hw/char/trace-events b/hw/char/trace-events > index b2e3d25ae34..05a33036c12 100644 > --- a/hw/char/trace-events > +++ b/hw/char/trace-events > @@ -60,12 +60,13 @@ imx_serial_put_data(const char *chrname, uint32_t value) > "%s: 0x%" PRIx32 > # pl011.c > pl011_irq_state(int level) "irq state %d" > pl011_read(uint32_t addr, uint32_t value, const char *regname) "addr 0x%03x > value 0x%08x reg %s" > -pl011_read_fifo(int read_count) "FIFO read, read_count now %d" > +pl011_read_fifo(unsigned rx_fifo_used, size_t rx_fifo_depth) "RX FIFO read, > used %u/%zu" > pl011_write(uint32_t addr, uint32_t value, const char *regname) "addr 0x%03x > value 0x%08x reg %s" > -pl011_can_receive(uint32_t lcr, int read_count, int r) "LCR 0x%08x > read_count %d returning %d" > -pl011_fifo_rx_put(uint32_t c, int read_count) "new char 0x%02x read_count > now %d" > +pl011_can_receive(uint32_t lcr, unsigned rx_fifo_used, size_t rx_fifo_depth, > unsigned rx_fifo_available) "LCR 0x%02x, RX FIFO used %u/%zu, can_receive %u > chars" > +pl011_fifo_rx_put(uint32_t c, unsigned read_count, size_t rx_fifo_depth) "RX > FIFO push char [0x%02x] %d/%zu depth used" > pl011_fifo_rx_full(void) "RX FIFO now full, RXFF set" > pl011_baudrate_change(unsigned int baudrate, uint64_t clock, uint32_t ibrd, > uint32_t fbrd) "new baudrate %u (clk: %" PRIu64 "hz, ibrd: %" PRIu32 ", fbrd: > %" PRIu32 ")" > +pl011_receive(int size) "recv %d chars" > > # cmsdk-apb-uart.c > cmsdk_apb_uart_read(uint64_t offset, uint64_t data, unsigned size) "CMSDK > APB UART read: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u" > -- > 2.47.1 > --
Re: [PATCH 00/10] fpu: Remove remaining target ifdefs and build only once
On 17/2/25 13:50, Peter Maydell wrote: The work I needed to do to make various softfloat emulation behaviours runtime-selectable for Arm FEAT_AFP has left the fpu code with very few remaning target ifdefs. So this series turns the last remaning ones into runtime behaviour choices and switches the fpu code into "build once" rather than "build per target". The main driver of this is that we're going to want to do this for the "multiple targets in one binary" work. Indeed the following 332 symbols are now removed for each target: bfloat16_add bfloat16_compare bfloat16_compare_quiet bfloat16_default_nan bfloat16_div bfloat16_is_quiet_nan bfloat16_is_signaling_nan bfloat16_max bfloat16_maximum_number bfloat16_maxnum bfloat16_maxnummag bfloat16_min bfloat16_minimum_number bfloat16_minnum bfloat16_minnummag bfloat16_mul bfloat16_muladd bfloat16_round_to_int bfloat16_scalbn bfloat16_silence_nan bfloat16_sqrt bfloat16_squash_input_denormal bfloat16_sub bfloat16_to_float32 bfloat16_to_float64 bfloat16_to_int16 bfloat16_to_int16_round_to_zero bfloat16_to_int16_scalbn bfloat16_to_int32 bfloat16_to_int32_round_to_zero bfloat16_to_int32_scalbn bfloat16_to_int64 bfloat16_to_int64_round_to_zero bfloat16_to_int64_scalbn bfloat16_to_int8 bfloat16_to_int8_round_to_zero bfloat16_to_int8_scalbn bfloat16_to_uint16 bfloat16_to_uint16_round_to_zero bfloat16_to_uint16_scalbn bfloat16_to_uint32 bfloat16_to_uint32_round_to_zero bfloat16_to_uint32_scalbn bfloat16_to_uint64 bfloat16_to_uint64_round_to_zero bfloat16_to_uint64_scalbn bfloat16_to_uint8 bfloat16_to_uint8_round_to_zero bfloat16_to_uint8_scalbn float128_add float128_compare float128_compare_quiet float128_default_nan float128_div float128_is_quiet_nan float128_is_signaling_nan float128_max float128_maximum_number float128_maxnum float128_maxnummag float128_min float128_minimum_number float128_minnum float128_minnummag float128_mul float128_muladd float128_rem float128_round_to_int float128_scalbn float128_silence_nan float128_sqrt float128_sub float128_to_float32 float128_to_float64 float128_to_floatx80 float128_to_int128 float128_to_int128_round_to_zero float128_to_int32 float128_to_int32_round_to_zero float128_to_int64 float128_to_int64_round_to_zero float128_to_uint128 float128_to_uint128_round_to_zero float128_to_uint32 float128_to_uint32_round_to_zero float128_to_uint64 float128_to_uint64_round_to_zero float16_add float16_compare float16_compare_quiet float16_default_nan float16_div float16_is_quiet_nan float16_is_signaling_nan float16_max float16_maximum_number float16_maxnum float16_maxnummag float16_min float16_minimum_number float16_minnum float16_minnummag float16_mul float16_muladd float16_muladd_scalbn float16_round_to_int float16_scalbn float16_silence_nan float16_sqrt float16_squash_input_denormal float16_sub float16_to_float32 float16_to_float64 float16_to_int16 float16_to_int16_round_to_zero float16_to_int16_scalbn float16_to_int32 float16_to_int32_round_to_zero float16_to_int32_scalbn float16_to_int64 float16_to_int64_round_to_zero float16_to_int64_scalbn float16_to_int8 float16_to_int8_scalbn float16_to_uint16 float16_to_uint16_round_to_zero float16_to_uint16_scalbn float16_to_uint32 float16_to_uint32_round_to_zero float16_to_uint32_scalbn float16_to_uint64 float16_to_uint64_round_to_zero float16_to_uint64_scalbn float16_to_uint8 float16_to_uint8_scalbn float32_add float32_compare float32_compare_quiet float32_default_nan float32_div float32_exp2 float32_is_quiet_nan float32_is_signaling_nan float32_log2 float32_max float32_maximum_number float32_maxnum float32_maxnummag float32_min float32_minimum_number float32_minnum float32_minnummag float32_mul float32_muladd float32_muladd_scalbn float32_rem float32_round_to_int float32_scalbn float32_silence_nan float32_sqrt float32_squash_input_denormal float32_sub float32_to_bfloat16 float32_to_float128 float32_to_float16 float32_to_float64 float32_to_floatx80 float32_to_int16 float32_to_int16_round_to_zero float32_to_int16_scalbn float32_to_int32 float32_to_int32_round_to_zero float32_to_int32_scalbn float32_to_int64 float32_to_int64_round_to_zero float32_to_int64_scalbn float32_to_uint16 float32_to_uint16_round_to_zero float32_to_uint16_scalbn float32_to_uint32 float32_to_uint32_round_to_zero
Re: [PATCH 6/9] hw/char/imx_serial: Really use RX FIFO depth
On 22:08 Wed 19 Feb , Philippe Mathieu-Daudé wrote: > While we model a 32-elements RX FIFO since the PL011 model was > introduced in commit 988f2442971 ("hw/char/imx_serial: Implement > receive FIFO and ageing timer") we only read 1 char at a time! "the IMX serial model"? Reviewed-by: Luc Michel > > Have the IOCanReadHandler handler return how many elements are > available, and use that in the IOReadHandler handler. > > Signed-off-by: Philippe Mathieu-Daudé > --- > hw/char/imx_serial.c | 8 ++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/hw/char/imx_serial.c b/hw/char/imx_serial.c > index 38b4865157e..6f14f8403a9 100644 > --- a/hw/char/imx_serial.c > +++ b/hw/char/imx_serial.c > @@ -386,7 +386,8 @@ static void imx_serial_write(void *opaque, hwaddr offset, > static int imx_can_receive(void *opaque) > { > IMXSerialState *s = (IMXSerialState *)opaque; > -return s->ucr2 & UCR2_RXEN && fifo32_num_used(&s->rx_fifo) < FIFO_SIZE; > + > +return s->ucr2 & UCR2_RXEN ? fifo32_num_free(&s->rx_fifo) : 0; > } > > static void imx_put_data(void *opaque, uint32_t value) > @@ -417,7 +418,10 @@ static void imx_receive(void *opaque, const uint8_t > *buf, int size) > IMXSerialState *s = (IMXSerialState *)opaque; > > s->usr2 |= USR2_WAKE; > -imx_put_data(opaque, *buf); > + > +for (int i = 0; i < size; i++) { > +imx_put_data(opaque, buf[i]); > +} > } > > static void imx_event(void *opaque, QEMUChrEvent event) > -- > 2.47.1 > --
Re: [PATCH 7/9] hw/char/mcf_uart: Use FIFO_DEPTH definition instead of magic values
On 22:08 Wed 19 Feb , Philippe Mathieu-Daudé wrote: > Defines FIFO_DEPTH and use it, fixing coding style. > > Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Luc Michel > --- > hw/char/mcf_uart.c | 10 +++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/hw/char/mcf_uart.c b/hw/char/mcf_uart.c > index 980a12fcb7d..95f269ee9b7 100644 > --- a/hw/char/mcf_uart.c > +++ b/hw/char/mcf_uart.c > @@ -17,6 +17,8 @@ > #include "chardev/char-fe.h" > #include "qom/object.h" > > +#define FIFO_DEPTH 4 > + > struct mcf_uart_state { > SysBusDevice parent_obj; > > @@ -27,7 +29,7 @@ struct mcf_uart_state { > uint8_t imr; > uint8_t bg1; > uint8_t bg2; > -uint8_t fifo[4]; > +uint8_t fifo[FIFO_DEPTH]; > uint8_t tb; > int current_mr; > int fifo_len; > @@ -247,14 +249,16 @@ static void mcf_uart_reset(DeviceState *dev) > static void mcf_uart_push_byte(mcf_uart_state *s, uint8_t data) > { > /* Break events overwrite the last byte if the fifo is full. */ > -if (s->fifo_len == 4) > +if (s->fifo_len == FIFO_DEPTH) { > s->fifo_len--; > +} > > s->fifo[s->fifo_len] = data; > s->fifo_len++; > s->sr |= MCF_UART_RxRDY; > -if (s->fifo_len == 4) > +if (s->fifo_len == FIFO_DEPTH) { > s->sr |= MCF_UART_FFULL; > +} > > mcf_uart_update(s); > } > -- > 2.47.1 > --
RE: [RFC 0/2] hw/vfio/pci: Prevent BARs from being dma mapped in d3hot state
>-Original Message- >From: Alex Williamson >Subject: Re: [RFC 0/2] hw/vfio/pci: Prevent BARs from being dma mapped in >d3hot state > >On Thu, 20 Feb 2025 04:24:13 + >"Duan, Zhenzhong" wrote: > >> >-Original Message- >> >From: Alex Williamson >> >Subject: Re: [RFC 0/2] hw/vfio/pci: Prevent BARs from being dma mapped in >> >d3hot state >> > >> >On Wed, 19 Feb 2025 18:58:58 +0100 >> >Eric Auger wrote: >> > >> >> Since kernel commit: >> >> 2b2c651baf1c ("vfio/pci: Invalidate mmaps and block the access >> >> in D3hot power state") >> >> any attempt to do an mmap access to a BAR when the device is in d3hot >> >> state will generate a fault. >> >> >> >> On system_powerdown, if the VFIO device is translated by an IOMMU, >> >> the device is moved to D3hot state and then the vIOMMU gets disabled >> >> by the guest. As a result of this later operation, the address space is >> >> swapped from translated to untranslated. When re-enabling the aliased >> >> regions, the RAM regions are dma-mapped again and this causes DMA_MAP >> >> faults when attempting the operation on BARs. >> >> >> >> To avoid doing the remap on those BARs, we compute whether the >> >> device is in D3hot state and if so, skip the DMA MAP. >> > >> >Thinking on this some more, QEMU PCI code already manages the device >> >BARs appearing in the address space based on the memory enable bit in >> >the command register. Should we do the same for PM state? >> > >> >IOW, the device going into low power state should remove the BARs from >> >the AddressSpace and waking the device should re-add them. The BAR DMA >> >mapping should then always be consistent, whereas here nothing would >> >remap the BARs when the device is woken. >> >> If BARs should be disabled before D3hot transition, isn't it guest's >> responsibility >to do that itself? >> Just like what have been done for FLR which calls pci_dev_save_and_disable(). > >Nothing requires the guest to clear memory and IO from the command >register before entering a low power state, nor are we going to get >very far arguing that it's the guest's fault for triggering an error in >the hypervisor. The PCI spec indicates that memory and IO BARs are only >accessible when the device is in the D0 power state. On bare metal >accessing the BAR for a device in a low power state would generate an >unsupported request. Understood, yes it makes sense to remove BARs from AddressSpace when D3hot. > Therefore why should QEMU map BARs of devices in >low power states into the address space? Should not. Thanks Zhenzhong
Re: [PATCH 8/9] hw/char/mcf_uart: Really use RX FIFO depth
On 22:08 Wed 19 Feb , Philippe Mathieu-Daudé wrote: > Caution: This message originated from an External Source. Use proper caution > when opening attachments, clicking links, or responding. > > > While we model a 4-elements RX FIFO since the PL011 model > was introduced in commit 20dcee94833 ("MCF5208 emulation"), > we only read 1 char at a time! "the MCF UART model" Reviewed-by: Luc Michel > > Have the IOCanReadHandler handler return how many elements are > available, and use that in the IOReadHandler handler. > > Signed-off-by: Philippe Mathieu-Daudé > --- > hw/char/mcf_uart.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/hw/char/mcf_uart.c b/hw/char/mcf_uart.c > index 95f269ee9b7..529c26be93a 100644 > --- a/hw/char/mcf_uart.c > +++ b/hw/char/mcf_uart.c > @@ -281,14 +281,16 @@ static int mcf_uart_can_receive(void *opaque) > { > mcf_uart_state *s = (mcf_uart_state *)opaque; > > -return s->rx_enabled && (s->sr & MCF_UART_FFULL) == 0; > +return s->rx_enabled ? FIFO_DEPTH - s->fifo_len : 0; > } > > static void mcf_uart_receive(void *opaque, const uint8_t *buf, int size) > { > mcf_uart_state *s = (mcf_uart_state *)opaque; > > -mcf_uart_push_byte(s, buf[0]); > +for (int i = 0; i < size; i++) { > +mcf_uart_push_byte(s, buf[i]); > +} > } > > static const MemoryRegionOps mcf_uart_ops = { > -- > 2.47.1 > --
[PATCH v2 1/3] plugins: add explicit dependency in functional tests
From: Pierrick Bouvier ./tests/functional/test_aarch64_tcg_plugins.py needs to have plugin libinsn built. However, it's not listed as a dependency, so meson can't know it needs to be built. Thus, we keep track of all plugins, and add them as an explicit dependency. Fixes: 4c134d07b9e ("tests: add a new set of tests to exercise plugins") Signed-off-by: Pierrick Bouvier Tested-by: Philippe Mathieu-Daudé Signed-off-by: Philippe Mathieu-Daudé --- meson.build | 1 + contrib/plugins/meson.build | 2 ++ tests/functional/meson.build | 2 +- tests/tcg/plugins/meson.build | 2 ++ 4 files changed, 6 insertions(+), 1 deletion(-) diff --git a/meson.build b/meson.build index 0ee79c664d3..4588bfd8642 100644 --- a/meson.build +++ b/meson.build @@ -3657,6 +3657,7 @@ qtest_module_ss = ss.source_set() modules = {} target_modules = {} +plugin_modules = [] hw_arch = {} target_arch = {} target_system_arch = {} diff --git a/contrib/plugins/meson.build b/contrib/plugins/meson.build index 484b9a808c8..fa8a426c8b5 100644 --- a/contrib/plugins/meson.build +++ b/contrib/plugins/meson.build @@ -26,3 +26,5 @@ if t.length() > 0 else run_target('contrib-plugins', command: find_program('true')) endif + +plugin_modules += t diff --git a/tests/functional/meson.build b/tests/functional/meson.build index b516d21cba1..2d41908c0a8 100644 --- a/tests/functional/meson.build +++ b/tests/functional/meson.build @@ -362,7 +362,7 @@ foreach speed : ['quick', 'thorough'] # 'run_target' logic below & in Makefile.include test('func-' + testname, python, - depends: [test_deps, test_emulator, emulator_modules], + depends: [test_deps, test_emulator, emulator_modules, plugin_modules], env: test_env, args: [testpath], protocol: 'tap', diff --git a/tests/tcg/plugins/meson.build b/tests/tcg/plugins/meson.build index 87a17d67bd4..c8cb0626a6d 100644 --- a/tests/tcg/plugins/meson.build +++ b/tests/tcg/plugins/meson.build @@ -19,3 +19,5 @@ if t.length() > 0 else run_target('test-plugins', command: find_program('true')) endif + +plugin_modules += t -- 2.47.1
Re: [PATCH v2 3/3] tests/functional: Allow running TCG plugins tests on non-Linux/BSD hosts
On 20/02/2025 09.02, Philippe Mathieu-Daudé wrote: Not all platforms use the '.so' suffix for shared libraries, which is how plugins are built. Use the recently introduced dso_suffix() helper to get the proper host suffix. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2804 Suggested-by: Pierrick Bouvier Suggested-by: Daniel P. Berrangé Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Pierrick Bouvier --- Reviewed-by: Thomas Huth
Re: [PATCH v6 2/4] migration: enable multifd and postcopy together
Hello, On Wed, 19 Feb 2025 at 22:53, Fabiano Rosas wrote: > I don't see anything stopping postcopy_start() from being called in the > source in relation to multifd recv threads being setup in the > destination. So far it seems possible that the source is opening the > preempt channel while multifd still hasn't seen all threads. There's > also pre-7.2 machines which create the postcopy channel early. * If we can not predict the sequence/timings of when different types of connections are initiated and processed, maybe source and destination QEMUs could work in tandem. ie. before initiating a connection, source QEMU could send an 'initiate' message saying I'm initiating 'X' connection. Only when destination QEMU says 'okay', source QEMU could proceed with actual connection. QEMU-A -> Initiate connection type X -> QEMU-B QEMU-A <-okay <-<- QEMU-B QEMU-A -> connect type X-> QEMU-B (thinking out loud) >>> > * migration_needs_multiple_sockets() >> Then it should return 'True' when both migrate_multifd() and >> postcopy_preempt() are enabled. > Why? * I was thinking multiple_sockets is multiple types of sockets: multifd & postcopy. But it seems here multiple sockets is any type of multiple sockets. > I thought you meant the CH_MAIN stuff. So now I don't know what you > mean. You want to do away with multifd? * Yes, CH_DEFAULT -> CH_MAIN was introduced in this series to identify channels and accordingly call relevant functions. * Not to do away with multifd, but more of making it same as the main channel, ex: virsh migrate --threads N = 1...255. All precopy threads/connections behave the same. Differentiation of precopy and postcopy shall still exist, because they operate/work in opposite directions. > Continue with this patch and fix the stuff I mentioned. You can ignore > the first two paragraphs of that reply. > > https://lore.kernel.org/r/87y0y4tf5q@suse.de > > I still think we need to test that preempt + multifd scenario, but it > should be easy to write a test for that once the series is in more of a > final shape. * Okay. > We can't add magic values, as we've discussed. Okay. Thank you. --- - Prasad
Re: Any plan for command line '-g' option (graphical resolution and depth)?
On 2/20/25 10:20, Philippe Mathieu-Daudé wrote: Hi, QEMU provides the global '-g' CLI option: $ qemu-system-foo --help -g WxH[xDEPTH] Set the initial graphical resolution and depth This option is used to pass resolution/depth information to guest firmwares in the machines defined in the following files: hw/ppc/mac_newworld.c hw/ppc/mac_oldworld.c hw/ppc/prep.c hw/ppc/spapr.c hw/sparc/sun4m.c hw/sparc64/sun4u.c Examples: - hw/ppc/spapr.c:1102: _FDT(fdt_setprop_cell(fdt, c, "qemu,graphic-width", graphic_width)); - hw/sparc64/sun4u.c:716: fw_cfg_add_i16(fw_cfg, FW_CFG_SPARC64_WIDTH, graphic_width); Obviously we have default values, which are different per architecture! Clearly these ought to be machine properties. Yes, we should have an easy way for machine to allow, in addition to -M graphics=BOOLEAN, the structured property -M graphics.{width,height,depth,enabled}. Then '-g' can desugar to -M graphics.width=WW,graphics.height=HH,graphics.depth=BB. It is also used to set TYPE_NUBUS_MACFB properties in hw/m68k/q800.c. Here i suppose we could directly use '-global nubus-macfb.width=value' etc. although I'm not sure it is the recommended way. Should we start deprecating '-g' as a whole? Only if it's unused/useless. If the option is useful it could also be used for EDID, for example (adding Gerd). Global variables are hard to track, and the one-off nature of the option makes it hard to figure out (without looking at the code) which machine support it. However, if the properties were available and introspectable via QOM, then it wouldn't matter if the -g sugar remains as an isolated "case QEMU_OPTION_g:" in system/vl.c. For an example see -boot/-smp or -no-reboot/-no-shutdown. Paolo
Re: [PATCH] Fix data race with the state Field of ThreadPoolElement
On 2/19/25 17:12, Vitalii Mordan wrote: diff --git a/util/thread-pool.c b/util/thread-pool.c index 27eb777e85..6c5f4d085b 100644 --- a/util/thread-pool.c +++ b/util/thread-pool.c @@ -111,9 +111,8 @@ static void *worker_thread(void *opaque) ret = req->func(req->arg); req->ret = ret; -/* Write ret before state. */ -smp_wmb(); -req->state = THREAD_DONE; +/* Atomically update state after setting ret. */ +qatomic_store_release(&req->state, THREAD_DONE); This is good. @@ -180,7 +179,7 @@ static void thread_pool_completion_bh(void *opaque) restart: QLIST_FOREACH_SAFE(elem, &pool->head, all, next) { -if (elem->state != THREAD_DONE) { +if (qatomic_load_acquire(&elem->state) != THREAD_DONE) { This is good, but it needs a comment and it can replace the smp_rmb() below. continue; } @@ -223,12 +222,12 @@ static void thread_pool_cancel(BlockAIOCB *acb) trace_thread_pool_cancel(elem, elem->common.opaque); QEMU_LOCK_GUARD(&pool->lock); -if (elem->state == THREAD_QUEUED) { +if (qatomic_load_acquire(&elem->state) == THREAD_QUEUED) { This is not ordering anything so it can be qatomic_read/qatomic_set (technically the one below doesn't even need that, but it's fine). QTAILQ_REMOVE(&pool->request_list, elem, reqs); qemu_bh_schedule(pool->completion_bh); -elem->state = THREAD_DONE; elem->ret = -ECANCELED; +qatomic_store_release(&elem->state, THREAD_DONE); } } @@ -251,8 +250,8 @@ BlockAIOCB *thread_pool_submit_aio(ThreadPoolFunc *func, void *arg, req = qemu_aio_get(&thread_pool_aiocb_info, NULL, cb, opaque); req->func = func; req->arg = arg; -req->state = THREAD_QUEUED; req->pool = pool; +qatomic_store_release(&req->state, THREAD_QUEUED); This does not need any atomic access, because there is ordering via pool->lock (which protects the request_list). There's no need to do store-release and load-acquire except to order with another store or load, and in fact adding unnecessary acquire/release is harder to understand. Paolo QLIST_INSERT_HEAD(&pool->head, req, all);
Re: [PATCH v2 0/3] tests/functional: Allow running TCG plugins tests on macOS
Philippe Mathieu-Daudé writes: > Pierrick kindly helped me to resolve this issue which ended > being trivial (to him!). Not tested on Windows so far. > > Since v1: > - Include Pierrick's meson fix patch (adding Fixes: tag) > - Addressed Thomas review comments (config.py, os.path.join) > > Philippe Mathieu-Daudé (2): > tests/functional: Introduce the dso_suffix() helper > tests/functional: Allow running TCG plugins tests on non-Linux/BSD > hosts > > Pierrick Bouvier (1): > plugins: add explicit dependency in functional tests > > meson.build | 1 + > contrib/plugins/meson.build | 2 ++ > tests/functional/meson.build | 2 +- > tests/functional/qemu_test/__init__.py | 2 +- > tests/functional/qemu_test/cmd.py| 1 - > tests/functional/qemu_test/config.py | 6 ++ > tests/functional/test_aarch64_tcg_plugins.py | 11 --- > tests/tcg/plugins/meson.build| 2 ++ > 8 files changed, 21 insertions(+), 6 deletions(-) Queued to testing/next (I fixed the cmd.py whitespace change), thanks. -- Alex Bennée Virtualisation Tech Lead @ Linaro
Any plan for command line '-g' option (graphical resolution and depth)?
Hi, QEMU provides the global '-g' CLI option: $ qemu-system-foo --help -g WxH[xDEPTH] Set the initial graphical resolution and depth This option is used to pass resolution/depth information to guest firmwares in the machines defined in the following files: hw/ppc/mac_newworld.c hw/ppc/mac_oldworld.c hw/ppc/prep.c hw/ppc/spapr.c hw/sparc/sun4m.c hw/sparc64/sun4u.c Examples: - hw/ppc/spapr.c:1102: _FDT(fdt_setprop_cell(fdt, c, "qemu,graphic-width", graphic_width)); - hw/sparc64/sun4u.c:716: fw_cfg_add_i16(fw_cfg, FW_CFG_SPARC64_WIDTH, graphic_width); Obviously we have default values, which are different per architecture! Clearly these ought to be machine properties. It is also used to set TYPE_NUBUS_MACFB properties in hw/m68k/q800.c. Here i suppose we could directly use '-global nubus-macfb.width=value' etc. although I'm not sure it is the recommended way. Should we start deprecating '-g' as a whole? Regards, Phil.
Re: [PATCH 00/10] fpu: Remove remaining target ifdefs and build only once
On 20/2/25 09:48, Philippe Mathieu-Daudé wrote: On 17/2/25 13:50, Peter Maydell wrote: The work I needed to do to make various softfloat emulation behaviours runtime-selectable for Arm FEAT_AFP has left the fpu code with very few remaning target ifdefs. So this series turns the last remaning ones into runtime behaviour choices and switches the fpu code into "build once" rather than "build per target". The main driver of this is that we're going to want to do this for the "multiple targets in one binary" work. and my single binary duplicate symbols list drastically reduced: -ld: 1759 duplicate symbols +ld: 1427 duplicate symbols (being so far for a microblaze+xtensa config:) CpuModelCompareResult_lookup CpuModelExpansionType_lookup QAPIEvent_lookup accel_cpu_common_realize accel_cpu_common_unrealize accel_cpu_instance_init accel_find accel_init_interfaces accel_supported_gdbstub_sstep_flags address_space_access_valid address_space_cache_destroy address_space_cache_init address_space_cache_invalidate address_space_destroy address_space_dispatch_compact address_space_dispatch_free address_space_dispatch_new address_space_get_flatview address_space_get_iotlb_entry address_space_init address_space_io address_space_ldl address_space_ldl_be address_space_ldl_be_cached_slow address_space_ldl_cached_slow address_space_ldl_le address_space_ldl_le_cached_slow address_space_ldq address_space_ldq_be address_space_ldq_be_cached_slow address_space_ldq_cached_slow address_space_ldq_le address_space_ldq_le_cached_slow address_space_ldub address_space_ldub_cached_slow address_space_lduw address_space_lduw_be address_space_lduw_be_cached_slow address_space_lduw_cached_slow address_space_lduw_le address_space_lduw_le_cached_slow address_space_map address_space_memory address_space_read_cached_slow address_space_read_full address_space_register_map_client address_space_remove_listeners address_space_rw address_space_set address_space_stb address_space_stb_cached_slow address_space_stl address_space_stl_be address_space_stl_be_cached_slow address_space_stl_cached_slow address_space_stl_le address_space_stl_le_cached_slow address_space_stl_notdirty address_space_stl_notdirty_cached_slow address_space_stq address_space_stq_be address_space_stq_be_cached_slow address_space_stq_cached_slow address_space_stq_le address_space_stq_le_cached_slow address_space_stw address_space_stw_be address_space_stw_be_cached_slow address_space_stw_cached_slow address_space_stw_le address_space_stw_le_cached_slow address_space_translate_for_iotlb address_space_unmap address_space_unregister_map_client address_space_write address_space_write_cached_slow address_space_write_rom arch_type colo_flush_ram_cache colo_incoming_start_dirty_log colo_init_ram_cache colo_record_bitmap colo_release_ram_cache cpu_abort cpu_address_space_destroy cpu_address_space_init cpu_atomic_add_fetchb_mmu cpu_atomic_add_fetchl_be_mmu cpu_atomic_add_fetchl_le_mmu cpu_atomic_add_fetchq_be_mmu cpu_atomic_add_fetchq_le_mmu cpu_atomic_add_fetchw_be_mmu cpu_atomic_add_fetchw_le_mmu cpu_atomic_and_fetchb_mmu cpu_atomic_and_fetchl_be_mmu cpu_atomic_and_fetchl_le_mmu cpu_atomic_and_fetchq_be_mmu cpu_atomic_and_fetchq_le_mmu cpu_atomic_and_fetchw_be_mmu cpu_atomic_and_fetchw_le_mmu cpu_atomic_cmpxchgb_mmu cpu_atomic_cmpxchgl_be_mmu cpu_atomic_cmpxchgl_le_mmu cpu_atomic_cmpxchgo_be_mmu cpu_atomic_cmpxchgo_le_mmu cpu_atomic_cmpxchgq_be_mmu cpu_atomic_cmpxchgq_le_mmu cpu_atomic_cmpxchgw_be_mmu cpu_atomic_cmpxchgw_le_mmu cpu_atomic_fetch_addb_mmu cpu_atomic_fetch_addl_be_mmu cpu_atomic_fetch_addl_le_mmu cpu_atomic_fetch_addq_be_mmu cpu_atomic_fetch_addq_le_mmu cpu_atomic_fetch_addw_be_mmu cpu_atomic_fetch_addw_le_mmu cpu_atomic_fetch_andb_mmu cpu_atomic_fetch_andl_be_mmu cpu_atomic_fetch_andl_le_mmu cpu_atomic_fetch_andq_be_mmu cpu_atomic_fetch_andq_le_mmu cpu_atomic_fetch_andw_be_mmu cpu_atomic_fetch_andw_le_mmu cpu_atomic_fetch_orb_mmu cpu_atomic_fetch_orl_be_mmu cpu_atomic_fetch_orl_le_mmu cpu_atomic_fetch_orq_be_mmu cpu_atomic_fetch_orq_le_mmu cpu_atomic_fetch_orw_be_mmu cpu_atomic_fetch_orw_le_mmu cpu_atomic_fetch_smaxb_mmu cpu_atomic_fetch_smaxl_be_mmu cpu_atomic_fetch_smaxl_le_mmu cpu_atomic_fetch_smaxq_be_mmu cpu_atomic_fetch_smaxq_le_mmu cpu_atomic_fetch_smaxw_be_mmu cpu_atomic_fetch_smaxw_le_mmu cpu_atomic_fetch_sminb_mmu cpu_atomic_fetch_sminl_be_mmu cpu_atomic_fetch_sminl_le_mmu cpu_atomic_fetch_sminq_be_mmu cpu_atomic_fetch_sminq_le_mmu cpu_atomic_fetch_sminw_be_mmu cpu_atomic_fetch_sminw_le_mmu cpu_atomic_fetch_umaxb_mmu cpu_atomic_fetch_umaxl_be_mmu cpu_atomic_fetch_umaxl_le_mmu cpu_atomic_fetch_umaxq_be_mmu cpu_atomic_fetch_umaxq_le_mmu cpu_atomic_fetch_umaxw_be_mmu cpu_atomic_fetch_umaxw_le_mmu cpu_atomic_fetch_uminb_mmu cpu_atomic_fetch_uminl_be_mmu cpu_atomic_fetch_uminl_le_mmu cpu_atomic_fetch_uminq_be_mmu cpu_atomic_fetch_uminq_le_mmu cpu_atomic_fetch_uminw_be_mmu cpu_atomic_fetch_uminw_le_mmu cpu_atomic_fetch_xorb_mmu cpu_atomic_fetch_xorl_be_mmu cpu_atomi
[PATCH v2 3/9] hw/char/pl011: Improve RX flow tracing events
Log FIFO use (availability and depth). Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Alex Bennée Reviewed-by: Luc Michel --- hw/char/pl011.c | 10 ++ hw/char/trace-events | 7 --- 2 files changed, 10 insertions(+), 7 deletions(-) diff --git a/hw/char/pl011.c b/hw/char/pl011.c index 5bb83c54216..f7485e7c541 100644 --- a/hw/char/pl011.c +++ b/hw/char/pl011.c @@ -185,7 +185,7 @@ static void pl011_fifo_rx_put(void *opaque, uint32_t value) s->read_fifo[slot] = value; s->read_count++; s->flags &= ~PL011_FLAG_RXFE; -trace_pl011_fifo_rx_put(value, s->read_count); +trace_pl011_fifo_rx_put(value, s->read_count, pipe_depth); if (s->read_count == pipe_depth) { trace_pl011_fifo_rx_full(); s->flags |= PL011_FLAG_RXFF; @@ -248,12 +248,13 @@ static void pl011_write_txdata(PL011State *s, uint8_t data) static uint32_t pl011_read_rxdata(PL011State *s) { uint32_t c; +unsigned fifo_depth = pl011_get_fifo_depth(s); s->flags &= ~PL011_FLAG_RXFF; c = s->read_fifo[s->read_pos]; if (s->read_count > 0) { s->read_count--; -s->read_pos = (s->read_pos + 1) & (pl011_get_fifo_depth(s) - 1); +s->read_pos = (s->read_pos + 1) & (fifo_depth - 1); } if (s->read_count == 0) { s->flags |= PL011_FLAG_RXFE; @@ -261,7 +262,7 @@ static uint32_t pl011_read_rxdata(PL011State *s) if (s->read_count == s->read_trigger - 1) { s->int_level &= ~INT_RX; } -trace_pl011_read_fifo(s->read_count); +trace_pl011_read_fifo(s->read_count, fifo_depth); s->rsr = c >> 8; pl011_update(s); qemu_chr_fe_accept_input(&s->chr); @@ -498,12 +499,13 @@ static int pl011_can_receive(void *opaque) qemu_log_mask(LOG_GUEST_ERROR, "PL011 receiving data on disabled RX UART\n"); } -trace_pl011_can_receive(s->lcr, s->read_count, r); +trace_pl011_can_receive(s->lcr, s->read_count, fifo_depth, fifo_available); return r; } static void pl011_receive(void *opaque, const uint8_t *buf, int size) { +trace_pl011_receive(size); /* * In loopback mode, the RX input signal is internally disconnected * from the entire receiving logics; thus, all inputs are ignored, diff --git a/hw/char/trace-events b/hw/char/trace-events index b2e3d25ae34..05a33036c12 100644 --- a/hw/char/trace-events +++ b/hw/char/trace-events @@ -60,12 +60,13 @@ imx_serial_put_data(const char *chrname, uint32_t value) "%s: 0x%" PRIx32 # pl011.c pl011_irq_state(int level) "irq state %d" pl011_read(uint32_t addr, uint32_t value, const char *regname) "addr 0x%03x value 0x%08x reg %s" -pl011_read_fifo(int read_count) "FIFO read, read_count now %d" +pl011_read_fifo(unsigned rx_fifo_used, size_t rx_fifo_depth) "RX FIFO read, used %u/%zu" pl011_write(uint32_t addr, uint32_t value, const char *regname) "addr 0x%03x value 0x%08x reg %s" -pl011_can_receive(uint32_t lcr, int read_count, int r) "LCR 0x%08x read_count %d returning %d" -pl011_fifo_rx_put(uint32_t c, int read_count) "new char 0x%02x read_count now %d" +pl011_can_receive(uint32_t lcr, unsigned rx_fifo_used, size_t rx_fifo_depth, unsigned rx_fifo_available) "LCR 0x%02x, RX FIFO used %u/%zu, can_receive %u chars" +pl011_fifo_rx_put(uint32_t c, unsigned read_count, size_t rx_fifo_depth) "RX FIFO push char [0x%02x] %d/%zu depth used" pl011_fifo_rx_full(void) "RX FIFO now full, RXFF set" pl011_baudrate_change(unsigned int baudrate, uint64_t clock, uint32_t ibrd, uint32_t fbrd) "new baudrate %u (clk: %" PRIu64 "hz, ibrd: %" PRIu32 ", fbrd: %" PRIu32 ")" +pl011_receive(int size) "recv %d chars" # cmsdk-apb-uart.c cmsdk_apb_uart_read(uint64_t offset, uint64_t data, unsigned size) "CMSDK APB UART read: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u" -- 2.47.1
[PATCH v2 1/9] hw/char/pl011: Warn when using disabled receiver
We shouldn't receive characters when the full UART or its receiver is disabled. However we don't want to break the possibly incomplete "my first bare metal assembly program"s, so we choose to simply display a warning when this occurs. Reviewed-by: Richard Henderson Signed-off-by: Philippe Mathieu-Daudé --- hw/char/pl011.c | 9 + 1 file changed, 9 insertions(+) diff --git a/hw/char/pl011.c b/hw/char/pl011.c index 06ce851044d..12a2d4bc7bd 100644 --- a/hw/char/pl011.c +++ b/hw/char/pl011.c @@ -85,6 +85,7 @@ DeviceState *pl011_create(hwaddr addr, qemu_irq irq, Chardev *chr) #define CR_OUT1 (1 << 12) #define CR_RTS (1 << 11) #define CR_DTR (1 << 10) +#define CR_RXE (1 << 9) #define CR_TXE (1 << 8) #define CR_LBE (1 << 7) #define CR_UARTEN (1 << 0) @@ -487,6 +488,14 @@ static int pl011_can_receive(void *opaque) PL011State *s = (PL011State *)opaque; int r; +if (!(s->cr & CR_UARTEN)) { +qemu_log_mask(LOG_GUEST_ERROR, + "PL011 receiving data on disabled UART\n"); +} +if (!(s->cr & CR_RXE)) { +qemu_log_mask(LOG_GUEST_ERROR, + "PL011 receiving data on disabled RX UART\n"); +} r = s->read_count < pl011_get_fifo_depth(s); trace_pl011_can_receive(s->lcr, s->read_count, r); return r; -- 2.47.1
[PATCH v2 0/9] hw/char: Improve RX FIFO depth uses
Since v1: - Fixed typos (Luc) Some UART devices implement a RX FIFO but their code (via IOCanReadHandler) only return a size of 1 element, while we can receive more chars. This series takes advantage of the full depth. Inspired by pm215 chat comment on yesterday's community meeting on the PL011 Rust implementation description by Paolo :) Philippe Mathieu-Daudé (9): hw/char/pl011: Warn when using disabled receiver hw/char/pl011: Simplify a bit pl011_can_receive() hw/char/pl011: Improve RX flow tracing events hw/char/pl011: Really use RX FIFO depth hw/char/bcm2835_aux: Really use RX FIFO depth hw/char/imx_serial: Really use RX FIFO depth hw/char/mcf_uart: Use FIFO_DEPTH definition instead of magic values hw/char/mcf_uart: Really use RX FIFO depth hw/char/sh_serial: Return correct number of empty RX FIFO elements hw/char/bcm2835_aux.c | 6 -- hw/char/imx_serial.c | 8 ++-- hw/char/mcf_uart.c| 16 +++- hw/char/pl011.c | 30 ++ hw/char/sh_serial.c | 30 ++ hw/char/trace-events | 7 --- 6 files changed, 61 insertions(+), 36 deletions(-) -- 2.47.1
[PATCH v2 2/9] hw/char/pl011: Simplify a bit pl011_can_receive()
Introduce 'fifo_depth' and 'fifo_available' local variables to better express the 'r' variable use. Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Luc Michel --- hw/char/pl011.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/hw/char/pl011.c b/hw/char/pl011.c index 12a2d4bc7bd..5bb83c54216 100644 --- a/hw/char/pl011.c +++ b/hw/char/pl011.c @@ -486,7 +486,9 @@ static void pl011_write(void *opaque, hwaddr offset, static int pl011_can_receive(void *opaque) { PL011State *s = (PL011State *)opaque; -int r; +unsigned fifo_depth = pl011_get_fifo_depth(s); +unsigned fifo_available = fifo_depth - s->read_count; +int r = fifo_available ? 1 : 0; if (!(s->cr & CR_UARTEN)) { qemu_log_mask(LOG_GUEST_ERROR, @@ -496,7 +498,6 @@ static int pl011_can_receive(void *opaque) qemu_log_mask(LOG_GUEST_ERROR, "PL011 receiving data on disabled RX UART\n"); } -r = s->read_count < pl011_get_fifo_depth(s); trace_pl011_can_receive(s->lcr, s->read_count, r); return r; } -- 2.47.1
[PATCH v2 7/9] hw/char/mcf_uart: Use FIFO_DEPTH definition instead of magic values
Defines FIFO_DEPTH and use it, fixing coding style. Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Luc Michel --- hw/char/mcf_uart.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/hw/char/mcf_uart.c b/hw/char/mcf_uart.c index 980a12fcb7d..95f269ee9b7 100644 --- a/hw/char/mcf_uart.c +++ b/hw/char/mcf_uart.c @@ -17,6 +17,8 @@ #include "chardev/char-fe.h" #include "qom/object.h" +#define FIFO_DEPTH 4 + struct mcf_uart_state { SysBusDevice parent_obj; @@ -27,7 +29,7 @@ struct mcf_uart_state { uint8_t imr; uint8_t bg1; uint8_t bg2; -uint8_t fifo[4]; +uint8_t fifo[FIFO_DEPTH]; uint8_t tb; int current_mr; int fifo_len; @@ -247,14 +249,16 @@ static void mcf_uart_reset(DeviceState *dev) static void mcf_uart_push_byte(mcf_uart_state *s, uint8_t data) { /* Break events overwrite the last byte if the fifo is full. */ -if (s->fifo_len == 4) +if (s->fifo_len == FIFO_DEPTH) { s->fifo_len--; +} s->fifo[s->fifo_len] = data; s->fifo_len++; s->sr |= MCF_UART_RxRDY; -if (s->fifo_len == 4) +if (s->fifo_len == FIFO_DEPTH) { s->sr |= MCF_UART_FFULL; +} mcf_uart_update(s); } -- 2.47.1
[PATCH v2 6/9] hw/char/imx_serial: Really use RX FIFO depth
While we model a 32-elements RX FIFO since the IMX serial model was introduced in commit 988f2442971 ("hw/char/imx_serial: Implement receive FIFO and ageing timer") we only read 1 char at a time! Have the IOCanReadHandler handler return how many elements are available, and use that in the IOReadHandler handler. Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Luc Michel --- hw/char/imx_serial.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/hw/char/imx_serial.c b/hw/char/imx_serial.c index 38b4865157e..6f14f8403a9 100644 --- a/hw/char/imx_serial.c +++ b/hw/char/imx_serial.c @@ -386,7 +386,8 @@ static void imx_serial_write(void *opaque, hwaddr offset, static int imx_can_receive(void *opaque) { IMXSerialState *s = (IMXSerialState *)opaque; -return s->ucr2 & UCR2_RXEN && fifo32_num_used(&s->rx_fifo) < FIFO_SIZE; + +return s->ucr2 & UCR2_RXEN ? fifo32_num_free(&s->rx_fifo) : 0; } static void imx_put_data(void *opaque, uint32_t value) @@ -417,7 +418,10 @@ static void imx_receive(void *opaque, const uint8_t *buf, int size) IMXSerialState *s = (IMXSerialState *)opaque; s->usr2 |= USR2_WAKE; -imx_put_data(opaque, *buf); + +for (int i = 0; i < size; i++) { +imx_put_data(opaque, buf[i]); +} } static void imx_event(void *opaque, QEMUChrEvent event) -- 2.47.1
[PATCH v2 5/9] hw/char/bcm2835_aux: Really use RX FIFO depth
While we model a 8-elements RX FIFO since the BCM2835 AUX model was introduced in commit 97398d900ca ("bcm2835_aux: add emulation of BCM2835 AUX block") we only read 1 char at a time! Have the IOCanReadHandler handler return how many elements are available, and use that in the IOReadHandler handler. Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Luc Michel --- hw/char/bcm2835_aux.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/hw/char/bcm2835_aux.c b/hw/char/bcm2835_aux.c index 73ad5934067..c6e7eccf7dd 100644 --- a/hw/char/bcm2835_aux.c +++ b/hw/char/bcm2835_aux.c @@ -221,7 +221,7 @@ static int bcm2835_aux_can_receive(void *opaque) { BCM2835AuxState *s = opaque; -return s->read_count < BCM2835_AUX_RX_FIFO_LEN; +return BCM2835_AUX_RX_FIFO_LEN - s->read_count; } static void bcm2835_aux_put_fifo(void *opaque, uint8_t value) @@ -243,7 +243,9 @@ static void bcm2835_aux_put_fifo(void *opaque, uint8_t value) static void bcm2835_aux_receive(void *opaque, const uint8_t *buf, int size) { -bcm2835_aux_put_fifo(opaque, *buf); +for (int i = 0; i < size; i++) { +bcm2835_aux_put_fifo(opaque, buf[i]); +} } static const MemoryRegionOps bcm2835_aux_ops = { -- 2.47.1
[PATCH 3/5] vfio/pci: Delete local pm_cap
This is now redundant to PCIDevice.pm_cap. Cc: Cédric Le Goater Signed-off-by: Alex Williamson --- hw/vfio/pci.c | 9 - hw/vfio/pci.h | 1 - 2 files changed, 4 insertions(+), 6 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 6903f831e45f..ba4ef65b16fa 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -2215,7 +2215,6 @@ static bool vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos, Error **errp) break; case PCI_CAP_ID_PM: vfio_check_pm_reset(vdev, pos); -vdev->pm_cap = pos; ret = pci_pm_init(pdev, pos, errp) >= 0; break; case PCI_CAP_ID_AF: @@ -2407,17 +2406,17 @@ void vfio_pci_pre_reset(VFIOPCIDevice *vdev) vfio_disable_interrupts(vdev); /* Make sure the device is in D0 */ -if (vdev->pm_cap) { +if (pdev->pm_cap) { uint16_t pmcsr; uint8_t state; -pmcsr = vfio_pci_read_config(pdev, vdev->pm_cap + PCI_PM_CTRL, 2); +pmcsr = vfio_pci_read_config(pdev, pdev->pm_cap + PCI_PM_CTRL, 2); state = pmcsr & PCI_PM_CTRL_STATE_MASK; if (state) { pmcsr &= ~PCI_PM_CTRL_STATE_MASK; -vfio_pci_write_config(pdev, vdev->pm_cap + PCI_PM_CTRL, pmcsr, 2); +vfio_pci_write_config(pdev, pdev->pm_cap + PCI_PM_CTRL, pmcsr, 2); /* vfio handles the necessary delay here */ -pmcsr = vfio_pci_read_config(pdev, vdev->pm_cap + PCI_PM_CTRL, 2); +pmcsr = vfio_pci_read_config(pdev, pdev->pm_cap + PCI_PM_CTRL, 2); state = pmcsr & PCI_PM_CTRL_STATE_MASK; if (state) { error_report("vfio: Unable to power on device, stuck in D%d", diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h index 43c166680abb..d638c781f6f1 100644 --- a/hw/vfio/pci.h +++ b/hw/vfio/pci.h @@ -160,7 +160,6 @@ struct VFIOPCIDevice { int32_t bootindex; uint32_t igd_gms; OffAutoPCIBAR msix_relo; -uint8_t pm_cap; uint8_t nv_gpudirect_clique; bool pci_aer; bool req_enabled; -- 2.48.1
[PATCH 2/5] pci: Use PCI PM capability initializer
Switch callers directly initializing the PCI PM capability with pci_add_capability() to use pci_pm_init(). Cc: Dmitry Fleytman Cc: Akihiko Odaki Cc: Jason Wang Cc: Stefan Weil Cc: Sriram Yagnaraman Cc: Keith Busch Cc: Klaus Jensen Cc: Jesper Devantier Cc: Michael S. Tsirkin Cc: Marcel Apfelbaum Cc: Cédric Le Goater Signed-off-by: Alex Williamson --- hw/net/e1000e.c | 3 +-- hw/net/eepro100.c | 4 +--- hw/net/igb.c| 3 +-- hw/nvme/ctrl.c | 3 +-- hw/pci-bridge/pcie_pci_bridge.c | 2 +- hw/vfio/pci.c | 2 +- hw/virtio/virtio-pci.c | 3 +-- 7 files changed, 7 insertions(+), 13 deletions(-) diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c index f637853073e2..b72cbab7e889 100644 --- a/hw/net/e1000e.c +++ b/hw/net/e1000e.c @@ -372,8 +372,7 @@ static int e1000e_add_pm_capability(PCIDevice *pdev, uint8_t offset, uint16_t pmc) { Error *local_err = NULL; -int ret = pci_add_capability(pdev, PCI_CAP_ID_PM, offset, - PCI_PM_SIZEOF, &local_err); +int ret = pci_pm_init(pdev, offset, &local_err); if (local_err) { error_report_err(local_err); diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c index 6d853229aec2..29a39865a608 100644 --- a/hw/net/eepro100.c +++ b/hw/net/eepro100.c @@ -551,9 +551,7 @@ static void e100_pci_reset(EEPRO100State *s, Error **errp) if (info->power_management) { /* Power Management Capabilities */ int cfg_offset = 0xdc; -int r = pci_add_capability(&s->dev, PCI_CAP_ID_PM, - cfg_offset, PCI_PM_SIZEOF, - errp); +int r = pci_pm_init(&s->dev, cfg_offset, errp); if (r < 0) { return; } diff --git a/hw/net/igb.c b/hw/net/igb.c index 4d93ce629f95..700dbc746d3d 100644 --- a/hw/net/igb.c +++ b/hw/net/igb.c @@ -356,8 +356,7 @@ static int igb_add_pm_capability(PCIDevice *pdev, uint8_t offset, uint16_t pmc) { Error *local_err = NULL; -int ret = pci_add_capability(pdev, PCI_CAP_ID_PM, offset, - PCI_PM_SIZEOF, &local_err); +int ret = pci_pm_init(pdev, offset, &local_err); if (local_err) { error_report_err(local_err); diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index 68903d1d7067..1faea3d2b85b 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -8503,8 +8503,7 @@ static int nvme_add_pm_capability(PCIDevice *pci_dev, uint8_t offset) Error *err = NULL; int ret; -ret = pci_add_capability(pci_dev, PCI_CAP_ID_PM, offset, - PCI_PM_SIZEOF, &err); +ret = pci_pm_init(pci_dev, offset, &err); if (err) { error_report_err(err); return ret; diff --git a/hw/pci-bridge/pcie_pci_bridge.c b/hw/pci-bridge/pcie_pci_bridge.c index fd4514a595ce..9fa656b43b42 100644 --- a/hw/pci-bridge/pcie_pci_bridge.c +++ b/hw/pci-bridge/pcie_pci_bridge.c @@ -52,7 +52,7 @@ static void pcie_pci_bridge_realize(PCIDevice *d, Error **errp) goto cap_error; } -pos = pci_add_capability(d, PCI_CAP_ID_PM, 0, PCI_PM_SIZEOF, errp); +pos = pci_pm_init(d, 0, errp); if (pos < 0) { goto pm_error; } diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 89d900e9cf0c..6903f831e45f 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -2216,7 +2216,7 @@ static bool vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos, Error **errp) case PCI_CAP_ID_PM: vfio_check_pm_reset(vdev, pos); vdev->pm_cap = pos; -ret = pci_add_capability(pdev, cap_id, pos, size, errp) >= 0; +ret = pci_pm_init(pdev, pos, errp) >= 0; break; case PCI_CAP_ID_AF: vfio_check_af_flr(vdev, pos); diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index c773a9130c7e..afe8b5551c5c 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -2204,8 +2204,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error **errp) pos = pcie_endpoint_cap_init(pci_dev, 0); assert(pos > 0); -pos = pci_add_capability(pci_dev, PCI_CAP_ID_PM, 0, - PCI_PM_SIZEOF, errp); +pos = pci_pm_init(pci_dev, 0, errp); if (pos < 0) { return; } -- 2.48.1
[PATCH 5/5] hw/vfio/pci: Re-order pre-reset
We want the device in the D0 power state going into reset, but the config write can enable the BARs in the address space, which are then removed from the address space once we clear the memory enable bit in the command register. Re-order to clear the command bit first, so the power state change doesn't enable the BARs. Cc: Cédric Le Goater Signed-off-by: Alex Williamson --- hw/vfio/pci.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index ba4ef65b16fa..fcc5f118bf90 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -2405,6 +2405,15 @@ void vfio_pci_pre_reset(VFIOPCIDevice *vdev) vfio_disable_interrupts(vdev); +/* + * Stop any ongoing DMA by disconnecting I/O, MMIO, and bus master. + * Also put INTx Disable in known state. + */ +cmd = vfio_pci_read_config(pdev, PCI_COMMAND, 2); +cmd &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER | + PCI_COMMAND_INTX_DISABLE); +vfio_pci_write_config(pdev, PCI_COMMAND, cmd, 2); + /* Make sure the device is in D0 */ if (pdev->pm_cap) { uint16_t pmcsr; @@ -2424,15 +2433,6 @@ void vfio_pci_pre_reset(VFIOPCIDevice *vdev) } } } - -/* - * Stop any ongoing DMA by disconnecting I/O, MMIO, and bus master. - * Also put INTx Disable in known state. - */ -cmd = vfio_pci_read_config(pdev, PCI_COMMAND, 2); -cmd &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER | - PCI_COMMAND_INTX_DISABLE); -vfio_pci_write_config(pdev, PCI_COMMAND, cmd, 2); } void vfio_pci_post_reset(VFIOPCIDevice *vdev) -- 2.48.1
[PATCH 0/5] PCI: Implement basic PCI PM capability backing
Eric recently identified an issue[1] where during graceful shutdown of a VM in a vIOMMU configuration, the guest driver places the device into the D3 power state, the vIOMMU is then disabled, triggering an AddressSpace update. The device BARs are still mapped into the AS, but the vfio host driver refuses to DMA map the MMIO space due to the device power state. The proposed solution in [1] was to skip mappings based on the device power state. Here we take a different approach. The PCI spec defines that devices in D1/2/3 power state should respond only to configuration and message requests and all other requests should be handled as an Unsupported Request. In other words, the memory and IO BARs are not accessible except when the device is in the D0 power state. To emulate this behavior, we can factor the device power state into the mapping state of the device BARs. Therefore the BAR is marked as unmapped if either the respective command register enable bit is clear or the device is not in the D0 power state. In order to implement this, the PowerState field of the PMCSR register becomes writable, which allows the device to appear in lower power states. This also therefore implements D3 support (insofar as the BAR behavior) for all devices implementing the PM capability. The PCI spec requires D3 support. An aspect that needs attention here is whether this change in the wmask and PMCSR bits becomes a problem for migration, and how we might solve it. For a guest migrating old->new, the device would always be in the D0 power state, but the register becomes writable. In the opposite direction, is it possible that a device could migrate in a low power state and be stuck there since the bits are read-only in old QEMU? Do we need an option for this behavior and a machine state bump, or are there alternatives? Thanks, Alex [1]https://lore.kernel.org/all/20250219175941.135390-1-eric.au...@redhat.com/ Alex Williamson (5): hw/pci: Basic support for PCI power management pci: Use PCI PM capability initializer vfio/pci: Delete local pm_cap pcie, virtio: Remove redundant pm_cap hw/vfio/pci: Re-order pre-reset hw/net/e1000e.c | 3 +- hw/net/eepro100.c | 4 +- hw/net/igb.c| 3 +- hw/nvme/ctrl.c | 3 +- hw/pci-bridge/pcie_pci_bridge.c | 3 +- hw/pci/pci.c| 83 - hw/pci/trace-events | 2 + hw/vfio/pci.c | 29 ++-- hw/vfio/pci.h | 1 - hw/virtio/virtio-pci.c | 11 ++--- include/hw/pci/pci.h| 3 ++ include/hw/pci/pci_device.h | 3 ++ include/hw/pci/pcie.h | 2 - 13 files changed, 112 insertions(+), 38 deletions(-) -- 2.48.1
Re: [PATCH v3 0/2] hw/i386/amd_iommu: Add migration support
On Wed, Feb 12, 2025 at 05:44:48AM +, Suravee Suthikulpanit wrote: > Currently, amd-iommu device does not support migration. This series addresses > an issue due hidden AMDVI-PCI device enumeration. Then introduces migratable > VMStateDescription, which saves necessary parameters for the device. > > Changes from v2: > (https://lore.kernel.org/all/20250206051856.323651-1-suravee.suthikulpa...@amd.com) > * Add patch 1/2 Fails build on 32 bit: https://gitlab.com/mstredhat/qemu/-/jobs/9202574769 In file included from ../include/qemu/osdep.h:53, from ../hw/i386/amd_iommu.c:23: ../include/qemu/compiler.h:70:35: error: invalid operands to binary - (have ‘uint64_t *’ {aka ‘long long unsigned int *’} and ‘size_t *’ {aka ‘unsigned int *’}) 70 | #define type_check(t1,t2) ((t1*)0 - (t2*)0) | ^ ../include/migration/vmstate.h:269:6: note: in expansion of macro ‘type_check’ 269 | type_check(_type, typeof_field(_state, _field))) | ^~ ../include/migration/vmstate.h:320:21: note: in expansion of macro ‘vmstate_offset_value’ 320 | .offset = vmstate_offset_value(_state, _field, _type), \ | ^~~~ ../include/migration/vmstate.h:853:5: note: in expansion of macro ‘VMSTATE_SINGLE_TEST’ 853 | VMSTATE_SINGLE_TEST(_field, _state, NULL, _version, _info, _type) | ^~~ ../include/migration/vmstate.h:903:5: note: in expansion of macro ‘VMSTATE_SINGLE’ 903 | VMSTATE_SINGLE(_f, _s, _v, vmstate_info_uint64, uint64_t) | ^~ ../include/migration/vmstate.h:937:5: note: in expansion of macro ‘VMSTATE_UINT64_V’ 937 | VMSTATE_UINT64_V(_f, _s, 0) | ^~~~ ../hw/i386/amd_iommu.c:1635:7: note: in expansion of macro ‘VMSTATE_UINT64’ 1635 | VMSTATE_UINT64(devtab_len, AMDVIState), | ^~ > Suravee Suthikulpanit (2): > hw/i386/amd_iommu: Isolate AMDVI-PCI from amd-iommu device to allow > full control over the PCI device creation > hw/i386/amd_iommu: Allow migration when explicitly create the > AMDVI-PCI device > > hw/i386/acpi-build.c | 8 ++-- > hw/i386/amd_iommu.c | 111 +-- > hw/i386/amd_iommu.h | 3 +- > 3 files changed, 91 insertions(+), 31 deletions(-) > > -- > 2.34.1
[PATCH v2] hw/riscv: Allow direct start of kernel for MPFS
Further customize the -bios and -kernel options behaviour for the microchip-icicle-kit machine. If "-bios none -kernel filename" is specified, then do not load a firmware and instead only load and start the kernel image. Signed-off-by: Sebastian Huber --- v2: Use riscv_find_firmware() to locate the firmware shipped with Qemu. hw/riscv/microchip_pfsoc.c | 57 ++ 1 file changed, 40 insertions(+), 17 deletions(-) diff --git a/hw/riscv/microchip_pfsoc.c b/hw/riscv/microchip_pfsoc.c index 4d3c5c8e20..b5ac057a1d 100644 --- a/hw/riscv/microchip_pfsoc.c +++ b/hw/riscv/microchip_pfsoc.c @@ -578,29 +578,45 @@ static void microchip_icicle_kit_machine_init(MachineState *machine) } /* - * We follow the following table to select which payload we execute. + * We follow the following table to select which firmware we use. * - * -bios |-kernel | payload - * ---++ - * N | N | HSS - * Y | don't care | HSS - * N | Y | kernel - * - * This ensures backwards compatibility with how we used to expose -bios - * to users but allows them to run through direct kernel booting as well. + * -bios | -kernel| firmware + * --++ + * none | N | error + * none | Y | kernel + * NULL, default | N | BIOS_FILENAME + * NULL, default | Y | RISCV64_BIOS_BIN + * other | don't care | other */ +if (machine->firmware && !strcmp(machine->firmware, "none")) { +if (!machine->kernel_filename) { +error_report("for -bios none, a kernel is required"); +exit(1); +} -if (machine->kernel_filename) { -firmware_name = RISCV64_BIOS_BIN; -firmware_load_addr = memmap[MICROCHIP_PFSOC_DRAM_LO].base; +firmware_name = NULL; +firmware_load_addr = RESET_VECTOR; +} else if (!machine->firmware || !strcmp(machine->firmware, "default")) { +if (machine->kernel_filename) { +firmware_name = RISCV64_BIOS_BIN; +firmware_load_addr = memmap[MICROCHIP_PFSOC_DRAM_LO].base; +} else { +firmware_name = BIOS_FILENAME; +firmware_load_addr = RESET_VECTOR; +} } else { -firmware_name = BIOS_FILENAME; +firmware_name = machine->firmware; firmware_load_addr = RESET_VECTOR; } -/* Load the firmware */ -firmware_end_addr = riscv_find_and_load_firmware(machine, firmware_name, - &firmware_load_addr, NULL); +/* Load the firmware if necessary */ +if (firmware_name) { +const char *filename = riscv_find_firmware(firmware_name, NULL); +firmware_end_addr = riscv_load_firmware(filename, &firmware_load_addr, +NULL); +} else { +firmware_end_addr = firmware_load_addr; +} riscv_boot_info_init(&boot_info, &s->soc.u_cpus); if (machine->kernel_filename) { @@ -635,8 +651,15 @@ static void microchip_icicle_kit_machine_init(MachineState *machine) fdt_load_addr = 0; } +hwaddr start_addr; +if (firmware_name) { +start_addr = firmware_load_addr; +} else { +start_addr = kernel_entry; +} + /* Load the reset vector */ -riscv_setup_rom_reset_vec(machine, &s->soc.u_cpus, firmware_load_addr, +riscv_setup_rom_reset_vec(machine, &s->soc.u_cpus, start_addr, memmap[MICROCHIP_PFSOC_ENVM_DATA].base, memmap[MICROCHIP_PFSOC_ENVM_DATA].size, kernel_entry, fdt_load_addr); -- 2.43.0
Re: [PATCH 0/5] Improve Microchip Polarfire SoC customization
- Am 20. Feb 2025 um 23:29 schrieb Philippe Mathieu-Daudé phi...@linaro.org: > Hi Conor, > > On 20/2/25 19:30, Conor Dooley wrote: >> +cc qemu-riscv, Alistar. >> >> On Fri, Feb 14, 2025 at 07:24:37AM +0100, Sebastian Huber wrote: >>> Booting the microchip-icicle-kit machine using the latest PolarFire SoC >>> Hart Software Services (HSS) no longer works since Qemu lacks support >>> for several registers (clocks, DRAM controller). Also reading from the >>> SDCard does not work currently. >> >> On that note, I think the inaccurate docs about polarfire should be >> removed. There's a wiki page here with dead links, or links to things >> that do not work anymore: >> https://wiki.qemu.org/Documentation/Platforms/RISCV#Microchip_PolarFire_SoC_Icicle_Kit >> I think the whole section should be removed, find it kinda odd that >> there's a polarfire section but not for any other board. Either way, >> it's talking about something that just does not work, the current HSS >> and Yocto don't boot. >> >> There's also a docs page here: >> https://www.qemu.org/docs/master/system/riscv/microchip-icicle-kit.html >> that has a copy of the table your patch 4 modifies, that probably should >> be updated to match your changes. >> >> In a similar vein to the wiki, it talks about the HSS and booting a >> yocto wic image. I think those should be deleted since they don't work. >> >> Alistar/Other RISC-V folks, what do you think? Bin wrote the port but >> seems to be AFK and I don't have the capacity to fix any of that stuff >> on top of what I already do in my spare time - do you agree that >> deleting the now inaccurate docs makes sense? >> >>> In order to allow tests runs for real-time kernels such as RTEMS and >>> Zephyr, improve the boot customization. This patch set enables a direct >>> run of kernel executables, for example: >>> >>> qemu-system-riscv64 -no-reboot -nographic \ >>>-serial null -serial mon:stdio \ >>>-smp 2 \ >>>-bios none \ >>>-machine microchip-icicle-kit,clint-timebase-frequency=1000 \ >>>-kernel rtos.elf >> >> The series breaks my usage: >> qemu//build/qemu-system-riscv64 -M microchip-icicle-kit \ >> -m 3G -smp 5 \ >> -kernel vmlinux.bin \ >> -dtb riscvpc.dtb \ >> -initrd initramfs.cpio.gz \ >> -display none -serial null \ >> -serial mon:stdio \ >> -D qemu.log -d unimp >> opensbi-riscv64-generic-fw_dynamic.bin: No such file or directory >> qemu-system-riscv64: could not load firmware >> 'opensbi-riscv64-generic-fw_dynamic.bin' >> make: *** [Makefile:305: qemu-icicle] Error 1 >> >> Figure it is likely to be your patch 4? The file does exist, so probably >> some sort of path-to-it issues? > > Maybe missing the -L option? > > -L path set the directory for the BIOS, VGA BIOS and keymaps It was an error in patch 4/5. I sent a v2 version of it. You have to find the firmware, before you can load it. -- embedded brains GmbH & Co. KG Herr Sebastian HUBER Dornierstr. 4 82178 Puchheim Germany email: sebastian.hu...@embedded-brains.de phone: +49-89-18 94 741 - 16 fax: +49-89-18 94 741 - 08 Registergericht: Amtsgericht München Registernummer: HRB 157899 Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler Unsere Datenschutzerklärung finden Sie hier: https://embedded-brains.de/datenschutzerklaerung/
Re: [PATCH 2/2] [NOT-FOR-MERGE] Add qtest for migration over RDMA
On 20/02/2025 23:55, Peter Xu wrote: > On Thu, Feb 20, 2025 at 05:40:38PM +0800, Li Zhijian wrote: >> On 19/02/2025 22:11, Peter Xu wrote: >>> then >>> in the test it tries to detect rdma link and fetch the ip only >> It should work without root permission if we just*detect* and*fetch ip*. >> >> Do you also mean we can split new-rdma-link.sh to 2 separate scripts >> - add-rdma-link.sh # optionally, execute by user before the test >> (require root permission) >> - detect-fetch-rdma.sh # execute from the migration-test > Hmm indeed we still need a script to scan over all the ports.. > > If having --rdma is a good idea, maybe we can further make it a parameter > to --rdma? > > $ migration-test --rdma $RDMA_IP > > Or: > > $ migration-test --rdma-ip $RDMA_IP I think --rdma only makes sense if it's going to do something special. The optmimal scenario is that it always runs the test when it can and sets up/tears down anything it needs. If it needs root, I'd prefer the test informs about this and does the work itself. It would also be good to have the add + detect separate so we have more flexibility, maybe we manage to enable this in CI even. So: ./add.sh migration-test (runs detect.sh + runs rdma test) (leaves stuff behind) migration-test (skips rdma test with message that it needs root) sudo migration-test (runs add.sh + detect.sh + runs rdma test) (cleans itself up) Does that make sense to you? I hope it's not too much work. >>> Looks good here. We can also keep all the rdma stuff into one file, taking >>> parameters. >>> >>> ./rdma-helper.sh setup >>> ./rdma-helper.sh detect-ip >> >> Hi Peter and Fabiano >> >> Many thanks for your kindly idea and suggestion. >> Please take another look at the changes below. >> - I don't copy script to the build dir, just execute the script like >> misc-tests.c >> - It will automatically create a new RXE if it doesn't exit when running in >> root > > Thanks! This is much better. Comments below. > >> >> [PATCH] migration: Add qtest for migration over RDMA >> >> This qtest requires there is RDMA(RoCE) link in the host. >> Introduce a scripts/rdma-migration-helper.sh to >> - setup a new RXE if it's root >> - detect existing RoCE link >> to make the qtest work smoothly. >> >> Test will be skip if there is no available RoCE link. >> # Start of rdma tests >> # Running /x86_64/migration/precopy/rdma/plain >> ok 1 /x86_64/migration/precopy/rdma/plain # SKIP There is no available >> rdma link in the host. >> Maybe you are not running with the root permission >> # End of rdma tests >> >> Admin is able to remove the RXE by passing 'cleanup' to this script. >> >> Signed-off-by: Li Zhijian >> --- >> scripts/rdma-migration-helper.sh | 40 +++ >> tests/qtest/migration/precopy-tests.c | 57 +++ >> 2 files changed, 97 insertions(+) >> create mode 100755 scripts/rdma-migration-helper.sh >> >> diff --git a/scripts/rdma-migration-helper.sh >> b/scripts/rdma-migration-helper.sh >> new file mode 100755 >> index 00..4ef62baf0f >> --- /dev/null >> +++ b/scripts/rdma-migration-helper.sh >> @@ -0,0 +1,40 @@ >> +#!/bin/bash >> + >> +# Copied from blktests >> +get_ipv4_addr() { >> +ip -4 -o addr show dev "$1" | >> +sed -n 's/.*[[:blank:]]inet[[:blank:]]*\([^[:blank:]/]*\).*/\1/p' | >> +tr -d '\n' >> +} >> + >> +has_soft_rdma() { >> +rdma link | grep -q " netdev $1[[:blank:]]*\$" >> +} >> + >> +rdma_rxe_setup_detect() >> +{ >> +( >> +cd /sys/class/net && >> +for i in *; do >> +[ -e "$i" ] || continue >> +[ "$i" = "lo" ] && continue >> +[ "$(<"$i/addr_len")" = 6 ] || continue >> +[ "$(<"$i/carrier")" = 1 ] || continue >> + >> +has_soft_rdma "$i" && break >> +[ "$operation" = "setup" ] && rdma link add "${i}_rxe" type >> rxe netdev "$i" && break >> +done >> +has_soft_rdma "$i" || return >> +get_ipv4_addr $i >> +) >> +} >> + >> +operation=${1:-setup} >> + >> +if [ "$operation" == "setup" ] || [ "$operation" == "detect" ]; then >> +rdma_rxe_setup_detect >> +elif [ "$operation" == "cleanup" ]; then >> +modprobe -r rdma_rxe > > Would this affects host when there's existing user of rdma_rxe (or fail > with -EBUSY)? 'modprobe -r ' will fail with EBUSY in this case. > If there's no major side effect of leftover rdma link, we > could also drop the "cleanup" for now and start from simple. In my understanding, there is no any side effect, I'm fine to drop it in this time. > >> +else >> +echo "Usage: $0 [setup | cleanup | detect]" >> +fi >> diff --git a/tests/qtest/migration/precopy-tests.c >> b/tests/qtest/migration/precopy-tests.c >> in
RE: [PATCH v3 08/28] hw/intc/aspeed: Add support for multiple output pins in INTC
Hi Cedric, > Subject: Re: [PATCH v3 08/28] hw/intc/aspeed: Add support for multiple output > pins in INTC > > On 2/13/25 04:35, Jamin Lin wrote: > > Added support for multiple output pins in the INTC controller to > > accommodate the AST2700 A1. > > > > Introduced "num_outpins" to represent the number of output pins. > > Updated the IRQ handling logic to initialize and connect output pins > > separately from input pins. Modified the "aspeed_soc_ast2700_realize" > > function to connect source orgates to INTC and INTC to GIC128 - GIC136. > Updated the "aspeed_intc_realize" > > function to initialize output pins. > > > > Signed-off-by: Jamin Lin > > could you please add to the .git/config file : > Thank you for letting me know this information and for your suggestion. I will add it to my local Git configuration. Jamin > [diff] > orderFile = /path/to/qemu/scripts/git.orderfile > > This will put .h files before the .c files in the patch. > > > Reviewed-by: Cédric Le Goater > > Thanks, > > C. > > > > --- > > hw/arm/aspeed_ast27x0.c | 6 +- > > hw/intc/aspeed_intc.c | 4 > > include/hw/intc/aspeed_intc.h | 5 +++-- > > 3 files changed, 12 insertions(+), 3 deletions(-) > > > > diff --git a/hw/arm/aspeed_ast27x0.c b/hw/arm/aspeed_ast27x0.c index > > 18e14a7914..775e953afd 100644 > > --- a/hw/arm/aspeed_ast27x0.c > > +++ b/hw/arm/aspeed_ast27x0.c > > @@ -519,10 +519,14 @@ static void > aspeed_soc_ast2700_realize(DeviceState *dev, Error **errp) > > aspeed_mmio_map(s, SYS_BUS_DEVICE(&a->intc), 0, > > sc->memmap[ASPEED_DEV_INTC]); > > > > -/* GICINT orgates -> INTC -> GIC */ > > +/* source orgates -> INTC */ > > for (i = 0; i < ic->num_inpins; i++) { > > qdev_connect_gpio_out(DEVICE(&a->intc.orgates[i]), 0, > > > qdev_get_gpio_in(DEVICE(&a->intc), > > i)); > > +} > > + > > +/* INTC -> GIC128 - GIC136 */ > > +for (i = 0; i < ic->num_outpins; i++) { > > sysbus_connect_irq(SYS_BUS_DEVICE(&a->intc), i, > > qdev_get_gpio_in(DEVICE(&a->gic), > > > > aspeed_soc_ast2700_gic_intcmap[i].irq)); > > diff --git a/hw/intc/aspeed_intc.c b/hw/intc/aspeed_intc.c index > > 95b40e1935..32c4a3bb44 100644 > > --- a/hw/intc/aspeed_intc.c > > +++ b/hw/intc/aspeed_intc.c > > @@ -354,6 +354,9 @@ static void aspeed_intc_realize(DeviceState *dev, > Error **errp) > > if (!qdev_realize(DEVICE(&s->orgates[i]), NULL, errp)) { > > return; > > } > > +} > > + > > +for (i = 0; i < aic->num_outpins; i++) { > > sysbus_init_irq(sbd, &s->output_pins[i]); > > } > > } > > @@ -389,6 +392,7 @@ static void aspeed_2700_intc_class_init(ObjectClass > *klass, void *data) > > dc->desc = "ASPEED 2700 INTC Controller"; > > aic->num_lines = 32; > > aic->num_inpins = 9; > > +aic->num_outpins = 9; > > aic->mem_size = 0x4000; > > aic->reg_size = 0x2000; > > } > > diff --git a/include/hw/intc/aspeed_intc.h > > b/include/hw/intc/aspeed_intc.h index 5f0429c7f9..0bf96a81bb 100644 > > --- a/include/hw/intc/aspeed_intc.h > > +++ b/include/hw/intc/aspeed_intc.h > > @@ -17,8 +17,8 @@ > > OBJECT_DECLARE_TYPE(AspeedINTCState, AspeedINTCClass, > ASPEED_INTC) > > > > #define ASPEED_INTC_NR_REGS (0x2000 >> 2) -#define > > ASPEED_INTC_NR_INTS 9 > > #define ASPEED_INTC_MAX_INPINS 9 > > +#define ASPEED_INTC_MAX_OUTPINS 9 > > > > struct AspeedINTCState { > > /*< private >*/ > > @@ -30,7 +30,7 @@ struct AspeedINTCState { > > > > uint32_t regs[ASPEED_INTC_NR_REGS]; > > OrIRQState orgates[ASPEED_INTC_MAX_INPINS]; > > -qemu_irq output_pins[ASPEED_INTC_NR_INTS]; > > +qemu_irq output_pins[ASPEED_INTC_MAX_OUTPINS]; > > > > uint32_t enable[ASPEED_INTC_MAX_INPINS]; > > uint32_t mask[ASPEED_INTC_MAX_INPINS]; @@ -42,6 +42,7 @@ > struct > > AspeedINTCClass { > > > > uint32_t num_lines; > > uint32_t num_inpins; > > +uint32_t num_outpins; > > uint64_t mem_size; > > uint64_t reg_size; > > const MemoryRegionOps *reg_ops;
Re: [PATCH v4 00/33] Multifd 🔀 device state transfer support with VFIO consumer
Let me share my performance report after applying the patches for the information: 1. live mlx VF migration outgoing migration: +--+---+---++ | VF(s) number | 1 | 2 | 4 | +--+---+---++ | Memory bandwidth | 733.693 MiB/s | 556.565 MiB/s | 475.310 MiB/s | | Total downtime | 227ms | 358ms | 460ms | +--+---+---++ incoming migration: +--+---+---++ | VF(s) number | 1 | 2 | 4 | +--+---+---++ | Memory bandwidth | 738.758 MiB/s | 566.175 MiB/s | 458.936 MiB/s | | Total downtime | 220ms | 342ms | 459ms | +--+---+---++ 2. live mlx VF multifd migration outgoing migration: +--+---++ | VF(s) number | 1 | 1 | +--+---++ | Channel | 4 | 6 | | Memory bandwidth | 786.942 MiB/s | 848.362 MiB/s | | Total downtime | 142ms | 188ms | +--+---++ +--++---++ | VF(s) number | 2 | 2 | 2 | +--++---++ | Channel | 4 | 6 | 8 | | Memory bandwidth | 774.315 MiB/s | 831.938 MiB/s | 769.799 MiB/s | | Total downtime | 160ms | 178ms | 156ms | +--++---++ +--++---++ | VF(s) number | 4 | 4 | 4 | +--++---++ | Channel | 6 | 8 | 16 | | Memory bandwidth | 715.210 MiB/s | 742.962 MiB/s | 747.188 MiB/s | | Total downtime | 180ms | 219ms | 190ms | +--++---++ incoming migration: +--+---++ | VF(s) number | 1 | 1 | +--+---++ | Channel | 4 | 6 | | Memory bandwidth | 807.958 MiB/s | 859.525 MiB/s | | Total downtime | 150ms | 177ms | +--+---++ +--+---+---++ | VF(s) number | 2 | 2 | 2 | +--+---+---++ | Channel | 4 | 6 | 8 | | Memory bandwidth | 768.104 MiB/s | 825.462 MiB/s | 791.582 MiB/s | | Total downtime | 170ms | 185ms | 175ms | +--+---+---++ +--+---+---++ | VF(s) number | 4 | 4 | 4 | +--+---+---++ | Channel | 6 | 8 | 16 | | Memory bandwidth | 706.921 MiB/s | 750.706 MiB/s | 746.295 MiB/s | | Total downtime | 174ms | 193ms | 191ms | +--+---+---++ Best Regards, Yanghang Liu On Mon, Feb 3, 2025 at 10:20 PM Cédric Le Goater wrote: > > Hello Maciej, > > > This patch set is targeting QEMU 10.0. > > > > What's not yet present is documentation update under docs/devel/migration > > but I didn't want to delay posting the code any longer. > > Such doc can still be merged later when the design is 100% finalized. > The changes are quite complex, the design is not trivial, the benefits are > not huge as far as we know. I'd rather have the doc update first please. > > Thanks, > > C. > > >
Re: [PATCH] docs/cxl: Add serial number for persistent-memdev
> > Looks good. I've queued it up on my gitlab staging tree, but > Michael if you want to pick this one directly that's fine as well. > > I should be pushing out my gitlab tree shortly (bit of networking > fun to deal with). > Hi, Jonathan About qemu side, I have another question: Could the qemu provide simulated RCH-RCD topology now? Yuquan 信息安全声明:本邮件包含信息归发件人所在组织所有,发件人所在组织对该邮件拥有所有权利。请接收者注意保密,未经发件人书面许可,不得向任何第三方组织和个人透露本邮件所含信息。 Information Security Notice: The information contained in this mail is solely property of the sender's organization.This mail communication is confidential.Recipients named above are obligated to maintain secrecy and are not permitted to disclose the contents of this communication to others.
[PATCH v2 4/8] migration: Integrate control_save_page() logic into ram_save_target_page()
Refactor the page saving logic by integrating the control_save_page() function directly into ram_save_target_page(). This change consolidates the RDMA migration decision-making process into a single function, enhancing clarity and maintainability. Signed-off-by: Li Zhijian --- migration/ram.c | 35 +++ 1 file changed, 7 insertions(+), 28 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index b7157b9b175..e07651aee8d 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1143,33 +1143,6 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss, return len; } -/* - * @pages: the number of pages written by the control path, - *< 0 - error - *> 0 - number of pages written - * - * Return true if the pages has been saved, otherwise false is returned. - */ -static bool control_save_page(PageSearchStatus *pss, - ram_addr_t offset, int *pages) -{ -int ret; - -if (migrate_rdma() && !migration_in_postcopy()) { -ret = rdma_control_save_page(pss->pss_channel, pss->block->offset, - offset, TARGET_PAGE_SIZE); - -if (ret == RAM_SAVE_CONTROL_DELAYED) { -*pages = 1; -} else { -*pages = ret; -} -return true; -} - -return false; -} - /* * directly send the page to the stream * @@ -1966,7 +1939,13 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss) int res; /* Hand over to RDMA first */ -if (control_save_page(pss, offset, &res)) { +if (migrate_rdma() && !migration_in_postcopy()) { +res = rdma_control_save_page(pss->pss_channel, pss->block->offset, + offset, TARGET_PAGE_SIZE); + +if (res == RAM_SAVE_CONTROL_DELAYED) { +res = 1; +} return res; } -- 2.44.0
[PATCH v2 8/8] migration: Add qtest for migration over RDMA
This qtest requires there is a RDMA(RoCE) link in the host. In order to make the test work smoothly, introduce a scripts/rdma-migration-helper.sh to - setup a new Soft-RoCE(aka RXE) if it's root - detect existing RoCE link Test will be skipped if there is no available RoCE link. # Start of rdma tests # Running /x86_64/migration/precopy/rdma/plain ok 1 /x86_64/migration/precopy/rdma/plain # SKIP There is no available rdma link to run RDMA migration test. To enable the test: (1) Run 'scripts/rdma-migration-helper.sh setup' with root and rerun the test or (2) Run the test with root privilege # End of rdma tests Signed-off-by: Li Zhijian --- MAINTAINERS | 1 + scripts/rdma-migration-helper.sh | 41 + tests/qtest/migration/precopy-tests.c | 64 +++ 3 files changed, 106 insertions(+) create mode 100755 scripts/rdma-migration-helper.sh diff --git a/MAINTAINERS b/MAINTAINERS index 3848d37a38d..15360fcdc4b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3480,6 +3480,7 @@ R: Li Zhijian R: Peter Xu S: Odd Fixes F: migration/rdma* +F: scripts/rdma-migration-helper.sh Migration dirty limit and dirty page rate M: Hyman Huang diff --git a/scripts/rdma-migration-helper.sh b/scripts/rdma-migration-helper.sh new file mode 100755 index 000..66557d9e267 --- /dev/null +++ b/scripts/rdma-migration-helper.sh @@ -0,0 +1,41 @@ +#!/bin/bash + +# Copied from blktests +get_ipv4_addr() +{ +ip -4 -o addr show dev "$1" | +sed -n 's/.*[[:blank:]]inet[[:blank:]]*\([^[:blank:]/]*\).*/\1/p' | +tr -d '\n' +} + +has_soft_rdma() +{ +rdma link | grep -q " netdev $1[[:blank:]]*\$" +} + +rdma_rxe_setup_detect() +{ +( +cd /sys/class/net && +for i in *; do +[ -e "$i" ] || continue +[ "$i" = "lo" ] && continue +[ "$(<"$i/addr_len")" = 6 ] || continue +[ "$(<"$i/carrier")" = 1 ] || continue + +has_soft_rdma "$i" && break +[ "$operation" = "setup" ] && +rdma link add "${i}_rxe" type rxe netdev "$i" && break +done +has_soft_rdma "$i" || return +get_ipv4_addr "$i" +) +} + +operation=${1:-setup} + +if [ "$operation" == "setup" ] || [ "$operation" == "detect" ]; then +rdma_rxe_setup_detect +else +echo "Usage: $0 [setup | detect]" +fi diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c index ba273d10b9a..bf97f4e9325 100644 --- a/tests/qtest/migration/precopy-tests.c +++ b/tests/qtest/migration/precopy-tests.c @@ -99,6 +99,66 @@ static void test_precopy_unix_dirty_ring(void) test_precopy_common(&args); } +#ifdef CONFIG_RDMA + +#define RDMA_MIGRATION_HELPER "scripts/rdma-migration-helper.sh" +static int new_rdma_link(char *buffer) +{ +const char *argument = (geteuid() == 0) ? "setup" : "detect"; +char cmd[1024]; + +snprintf(cmd, sizeof(cmd), "%s %s", RDMA_MIGRATION_HELPER, argument); + +FILE *pipe = popen(cmd, "r"); +if (pipe == NULL) { +perror("Failed to run script"); +return -1; +} + +int idx = 0; +while (fgets(buffer + idx, 128 - idx, pipe) != NULL) { +idx += strlen(buffer); +} + +int status = pclose(pipe); +if (status == -1) { +perror("Error reported by pclose()"); +return -1; +} else if (WIFEXITED(status)) { +return WEXITSTATUS(status); +} + +return -1; +} + +static void test_precopy_rdma_plain(void) +{ +char buffer[128] = {}; + +if (new_rdma_link(buffer)) { +g_test_skip("\nThere is no available rdma link to run RDMA migration test.\n" +"To enable the test:\n" +"(1) Run \'" RDMA_MIGRATION_HELPER " setup\' with root and rerun the test\n" +"or\n" +"(2) Run the test with root privilege\n"); +return; +} + +/* + * TODO: query a free port instead of hard code. + * 29200=('R'+'D'+'M'+'A')*100 + **/ +g_autofree char *uri = g_strdup_printf("rdma:%s:29200", buffer); + +MigrateCommon args = { +.listen_uri = uri, +.connect_uri = uri, +}; + +test_precopy_common(&args); +} +#endif + static void test_precopy_tcp_plain(void) { MigrateCommon args = { @@ -1124,6 +1184,10 @@ static void migration_test_add_precopy_smoke(MigrationTestEnv *env) test_multifd_tcp_uri_none); migration_test_add("/migration/multifd/tcp/plain/cancel", test_multifd_tcp_cancel); +#ifdef CONFIG_RDMA +migration_test_add("/migration/precopy/rdma/plain", + test_precopy_rdma_plain); +#endif } void migration_test_add_precopy(MigrationTestEnv *env) -- 2.44.0
[PATCH v2 2/8] migration/rdma: Remove redundant RAM_SAVE_CONTROL_NOT_SUPP check
qemu_rdma_save_page() no longer returns RAM_SAVE_CONTROL_NOT_SUPP since commit a4832d299dd ("migration/rdma: Check sooner if we are in postcopy for save_page()") Signed-off-by: Li Zhijian --- migration/rdma.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/migration/rdma.c b/migration/rdma.c index 76fb0349238..af8e6234a9f 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -3290,8 +3290,7 @@ int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset, int ret = qemu_rdma_save_page(f, block_offset, offset, size); -if (ret != RAM_SAVE_CONTROL_DELAYED && -ret != RAM_SAVE_CONTROL_NOT_SUPP) { +if (ret != RAM_SAVE_CONTROL_DELAYED) { if (ret < 0) { qemu_file_set_error(f, ret); } -- 2.44.0
[PATCH v2 5/8] migration: Add migration_capabilities_and_transport_compatible() helper
Similar to migration_channels_and_transport_compatible(), introduce a new helper migration_capabilities_and_transport_compatible() to check if the capabilites is compatible with the transport. Currently, only move the capabilities vs RDMA transport to this function. Signed-off-by: Li Zhijian --- migration/migration.c | 36 ++-- 1 file changed, 26 insertions(+), 10 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index c597aa707e5..2eacae25e0e 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -238,6 +238,30 @@ migration_channels_and_transport_compatible(MigrationAddress *addr, return true; } +static bool +migration_capabilities_and_transport_compatible(MigrationAddress *addr, +Error **errp) +{ +if (addr->transport == MIGRATION_ADDRESS_TYPE_RDMA) { +if (migrate_xbzrle()) { +error_setg(errp, "RDMA and XBZRLE can't be used together"); +return false; +} +if (migrate_multifd()) { +error_setg(errp, "RDMA and multifd can't be used together"); +return false; +} +} + +return true; +} + +static bool migration_transport_compatible(MigrationAddress *addr, Error **errp) +{ +return migration_channels_and_transport_compatible(addr, errp) && + migration_capabilities_and_transport_compatible(addr, errp); +} + static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp) { uintptr_t a = (uintptr_t) ap, b = (uintptr_t) bp; @@ -716,7 +740,7 @@ static void qemu_start_incoming_migration(const char *uri, bool has_channels, } /* transport mechanism not suitable for migration? */ -if (!migration_channels_and_transport_compatible(addr, errp)) { +if (!migration_transport_compatible(addr, errp)) { return; } @@ -735,14 +759,6 @@ static void qemu_start_incoming_migration(const char *uri, bool has_channels, } #ifdef CONFIG_RDMA } else if (addr->transport == MIGRATION_ADDRESS_TYPE_RDMA) { -if (migrate_xbzrle()) { -error_setg(errp, "RDMA and XBZRLE can't be used together"); -return; -} -if (migrate_multifd()) { -error_setg(errp, "RDMA and multifd can't be used together"); -return; -} rdma_start_incoming_migration(&addr->u.rdma, errp); #endif } else if (addr->transport == MIGRATION_ADDRESS_TYPE_EXEC) { @@ -2159,7 +2175,7 @@ void qmp_migrate(const char *uri, bool has_channels, } /* transport mechanism not suitable for migration? */ -if (!migration_channels_and_transport_compatible(addr, errp)) { +if (!migration_transport_compatible(addr, errp)) { return; } -- 2.44.0
[PATCH v2 1/8] migration: Prioritize RDMA in ram_save_target_page()
Address an error in RDMA-based migration by ensuring RDMA is prioritized when saving pages in `ram_save_target_page()`. Previously, the RDMA protocol's page-saving step was placed after other protocols due to a refactoring in commit bc38dc2f5f3. This led to migration failures characterized by unknown control messages and state loading errors destination: (qemu) qemu-system-x86_64: Unknown control message QEMU FILE qemu-system-x86_64: error while loading state section id 1(ram) qemu-system-x86_64: load of migration failed: Operation not permitted source: (qemu) qemu-system-x86_64: RDMA is in an error state waiting migration to abort! qemu-system-x86_64: failed to save SaveStateEntry with id(name): 1(ram): -1 qemu-system-x86_64: rdma migration: recv polling control error! qemu-system-x86_64: warning: Early error. Sending error. qemu-system-x86_64: warning: rdma migration: send polling control error RDMA migration implemented its own protocol/method to send pages to destination side, hand over to RDMA first to prevent pages being saved by other protocol. Fixes: bc38dc2f5f3 ("migration: refactor ram_save_target_page functions") Signed-off-by: Li Zhijian --- migration/ram.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 589b6505eb2..424df6d9f13 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1964,6 +1964,11 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss) ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS; int res; +/* Hand over to RDMA first */ +if (control_save_page(pss, offset, &res)) { +return res; +} + if (!migrate_multifd() || migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) { if (save_zero_page(rs, pss, offset)) { @@ -1976,10 +1981,6 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss) return ram_save_multifd_page(block, offset); } -if (control_save_page(pss, offset, &res)) { -return res; -} - return ram_save_page(rs, pss); } -- 2.44.0
[PATCH v2 6/8] migraion: disable RDMA + postcopy-ram
It's believed that RDMA + postcopy-ram has been broken for a while. Rather than spending time re-enabling it, let's simply disable it as a trade-off. Signed-off-by: Li Zhijian --- migration/migration.c | 4 1 file changed, 4 insertions(+) diff --git a/migration/migration.c b/migration/migration.c index 2eacae25e0e..d414a4b1379 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -251,6 +251,10 @@ migration_capabilities_and_transport_compatible(MigrationAddress *addr, error_setg(errp, "RDMA and multifd can't be used together"); return false; } +if (migrate_postcopy_ram()) { +error_setg(errp, "RDMA and postcopy-ram can't be used together"); +return false; +} } return true; -- 2.44.0
[PATCH v2 3/8] migration: Kill RAM_SAVE_CONTROL_NOT_SUPP
Refactor the migration control logic by eliminating the `RAM_SAVE_CONTROL_NOT_SUPP` return value within the migration codebase. This involves moving the checks for RDMA migration status and postcopy state from rdma_control_save_page() to control_save_page() With this change, control_save_page() now takes responsibility for determining whether RDMA operations can proceed, based on the state of migration. Signed-off-by: Li Zhijian --- migration/ram.c | 19 ++- migration/rdma.c | 4 +--- migration/rdma.h | 3 +-- 3 files changed, 12 insertions(+), 14 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 424df6d9f13..b7157b9b175 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1155,18 +1155,19 @@ static bool control_save_page(PageSearchStatus *pss, { int ret; -ret = rdma_control_save_page(pss->pss_channel, pss->block->offset, offset, - TARGET_PAGE_SIZE); -if (ret == RAM_SAVE_CONTROL_NOT_SUPP) { -return false; -} +if (migrate_rdma() && !migration_in_postcopy()) { +ret = rdma_control_save_page(pss->pss_channel, pss->block->offset, + offset, TARGET_PAGE_SIZE); -if (ret == RAM_SAVE_CONTROL_DELAYED) { -*pages = 1; +if (ret == RAM_SAVE_CONTROL_DELAYED) { +*pages = 1; +} else { +*pages = ret; +} return true; } -*pages = ret; -return true; + +return false; } /* diff --git a/migration/rdma.c b/migration/rdma.c index af8e6234a9f..c6876347e1e 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -3284,9 +3284,7 @@ err: int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset, ram_addr_t offset, size_t size) { -if (!migrate_rdma() || migration_in_postcopy()) { -return RAM_SAVE_CONTROL_NOT_SUPP; -} +assert(migrate_rdma()); int ret = qemu_rdma_save_page(f, block_offset, offset, size); diff --git a/migration/rdma.h b/migration/rdma.h index f55f28bbed1..8eeb0117b91 100644 --- a/migration/rdma.h +++ b/migration/rdma.h @@ -33,7 +33,6 @@ void rdma_start_incoming_migration(InetSocketAddress *host_port, Error **errp); #define RAM_CONTROL_ROUND 1 #define RAM_CONTROL_FINISH3 -#define RAM_SAVE_CONTROL_NOT_SUPP -1000 #define RAM_SAVE_CONTROL_DELAYED -2000 #ifdef CONFIG_RDMA @@ -56,7 +55,7 @@ static inline int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset, ram_addr_t offset, size_t size) { -return RAM_SAVE_CONTROL_NOT_SUPP; +g_assert_not_reached(); } #endif #endif -- 2.44.0
[PATCH v2 7/8] migration/rdma: Remove redundant migration_in_postcopy checks
Since we have disabled RDMA + postcopy, it's safe to remove the migration_in_postcopy() that follows the migration_rdma(). Signed-off-by: Li Zhijian --- migration/ram.c | 2 +- migration/rdma.c | 5 +++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index e07651aee8d..c363034c882 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1939,7 +1939,7 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss) int res; /* Hand over to RDMA first */ -if (migrate_rdma() && !migration_in_postcopy()) { +if (migrate_rdma()) { res = rdma_control_save_page(pss->pss_channel, pss->block->offset, offset, TARGET_PAGE_SIZE); diff --git a/migration/rdma.c b/migration/rdma.c index c6876347e1e..0349dd4a8b8 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -3826,7 +3826,7 @@ int rdma_block_notification_handle(QEMUFile *f, const char *name) int rdma_registration_start(QEMUFile *f, uint64_t flags) { -if (!migrate_rdma() || migration_in_postcopy()) { +if (!migrate_rdma()) { return 0; } @@ -3858,7 +3858,8 @@ int rdma_registration_stop(QEMUFile *f, uint64_t flags) RDMAControlHeader head = { .len = 0, .repeat = 1 }; int ret; -if (!migrate_rdma() || migration_in_postcopy()) { +/* Hand over to RDMA first */ +if (!migrate_rdma()) { return 0; } -- 2.44.0
Re: [PATCH rfcv2 09/20] intel_iommu: Rename vtd_ce_get_rid2pasid_entry to vtd_ce_get_pasid_entry
Reviewed-by: Clément Mathieu--Drif On 19/02/2025 09:22, Zhenzhong Duan wrote: > Caution: External email. Do not open attachments or click links, unless this > email comes from a known sender and you know the content is safe. > > > In early days vtd_ce_get_rid2pasid_entry() is used to get pasid entry of > rid2pasid, then extend to any pasid. So a new name vtd_ce_get_pasid_entry > is better to match its functions. > > No functional change intended. > > Signed-off-by: Zhenzhong Duan > --- > hw/i386/intel_iommu.c | 14 +++--- > 1 file changed, 7 insertions(+), 7 deletions(-) > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > index 7fde0603bf..df5fb30bc8 100644 > --- a/hw/i386/intel_iommu.c > +++ b/hw/i386/intel_iommu.c > @@ -944,7 +944,7 @@ static int vtd_get_pe_from_pasid_table(IntelIOMMUState *s, > return 0; > } > > -static int vtd_ce_get_rid2pasid_entry(IntelIOMMUState *s, > +static int vtd_ce_get_pasid_entry(IntelIOMMUState *s, > VTDContextEntry *ce, > VTDPASIDEntry *pe, > uint32_t pasid) > @@ -1025,7 +1025,7 @@ static uint32_t vtd_get_iova_level(IntelIOMMUState *s, > VTDPASIDEntry pe; > > if (s->root_scalable) { > -vtd_ce_get_rid2pasid_entry(s, ce, &pe, pasid); > +vtd_ce_get_pasid_entry(s, ce, &pe, pasid); > if (s->flts) { > return VTD_PE_GET_FL_LEVEL(&pe); > } else { > @@ -1048,7 +1048,7 @@ static uint32_t vtd_get_iova_agaw(IntelIOMMUState *s, > VTDPASIDEntry pe; > > if (s->root_scalable) { > -vtd_ce_get_rid2pasid_entry(s, ce, &pe, pasid); > +vtd_ce_get_pasid_entry(s, ce, &pe, pasid); > return 30 + ((pe.val[0] >> 2) & VTD_SM_PASID_ENTRY_AW) * 9; > } > > @@ -1116,7 +1116,7 @@ static dma_addr_t > vtd_get_iova_pgtbl_base(IntelIOMMUState *s, > VTDPASIDEntry pe; > > if (s->root_scalable) { > -vtd_ce_get_rid2pasid_entry(s, ce, &pe, pasid); > +vtd_ce_get_pasid_entry(s, ce, &pe, pasid); > if (s->flts) { > return pe.val[2] & VTD_SM_PASID_ENTRY_FLPTPTR; > } else { > @@ -1522,7 +1522,7 @@ static int vtd_ce_rid2pasid_check(IntelIOMMUState *s, >* has valid rid2pasid setting, which includes valid >* rid2pasid field and corresponding pasid entry setting >*/ > -return vtd_ce_get_rid2pasid_entry(s, ce, &pe, PCI_NO_PASID); > +return vtd_ce_get_pasid_entry(s, ce, &pe, PCI_NO_PASID); > } > > /* Map a device to its corresponding domain (context-entry) */ > @@ -1611,7 +1611,7 @@ static uint16_t vtd_get_domain_id(IntelIOMMUState *s, > VTDPASIDEntry pe; > > if (s->root_scalable) { > -vtd_ce_get_rid2pasid_entry(s, ce, &pe, pasid); > +vtd_ce_get_pasid_entry(s, ce, &pe, pasid); > return VTD_SM_PASID_ENTRY_DID(pe.val[1]); > } > > @@ -1687,7 +1687,7 @@ static bool vtd_dev_pt_enabled(IntelIOMMUState *s, > VTDContextEntry *ce, > int ret; > > if (s->root_scalable) { > -ret = vtd_ce_get_rid2pasid_entry(s, ce, &pe, pasid); > +ret = vtd_ce_get_pasid_entry(s, ce, &pe, pasid); > if (ret) { > /* >* This error is guest triggerable. We should assumt PT > -- > 2.34.1 >
Re: [PATCH v3 00/14] Change ghes to use HEST-based offsets and add support for error inject
Em Mon, 3 Feb 2025 16:22:36 +0100 Igor Mammedov escreveu: > On Mon, 3 Feb 2025 11:09:34 + > Jonathan Cameron wrote: > > > On Fri, 31 Jan 2025 18:42:41 +0100 > > Mauro Carvalho Chehab wrote: > > > > > Now that the ghes preparation patches were merged, let's add support > > > for error injection. > > > > > > On this series, the first 6 patches chang to the math used to calculate > > > offsets at HEST > > > table and hardware_error firmware file, together with its migration code. > > > Migration tested > > > with both latest QEMU released kernel and upstream, on both directions. > > > > > > The next patches add a new QAPI to allow injecting GHESv2 errors, and a > > > script using such QAPI > > >to inject ARM Processor Error records. > > > > > > If I'm counting well, this is the 19th submission of my error inject > > > patches. > > > > Looks good to me. All remaining trivial things are in the category > > of things to consider only if you are doing another spin. The code > > ends up how I'd like it at the end of the series anyway, just > > a question of the precise path to that state! > > if you look at series as a whole it's more or less fine (I guess you > and me got used to it) > > however if you take it patch by patch (as if you've never seen it) > ordering is messed up (the same would apply to everyone after a while > when it's forgotten) > > So I'd strongly suggest to restructure the series (especially 2-6/14). > re sum up my comments wrt ordering: > > 0 add testcase for HEST table with current HEST as expected blob >(currently missing), so that we can be sure that we haven't messed >existing tables during refactoring. Not sure if I got this one. The HEST table is part of etc/acpi/tables, which is already tested, as you pointed at the previous reviews. Doing changes there is already detected. That's basically why we added patches 10 and 12: [PATCH v3 10/14] tests/acpi: virt: allow acpi table changes for a new table: HEST [PATCH v3 12/14] tests/acpi: virt: add a HEST table to aarch64 virt and update DSDT What tests don't have is a check for etc/hardware_errors firmware inside tests/data/acpi/aarch64/virt/, but, IMO, we shouldn't add it there. See, hardware_errors table contains only some skeleton space to store: - 1 or more error block address offsets; - 1 or more read ack register; - 1 or more HEST source entries containing CPER blocks. There's nothing there to be actually checked: it is just some empty spaces with a variable number of fields. With the new code, the actual number of CPER blocks and their corresponding offsets and read ack registers can be different on different architectures. So, for instance, when we add x86 support, we'll likely start with just one error source entry, while arm will have two after this changeset. Also, one possibility to address the issues reported by Gavin Shan at https://lore.kernel.org/qemu-devel/20250214041635.608012-1-gs...@redhat.com/ would be to have one entry per each CPU. So, the size of such firmware could be dependent on the number of CPUs. So, adding any validation to it would just cause pain and probably won't detect any problems. What could be done instead is to have a different type of tests that would use the error injection script to check if regressions are introduced after QEMU 10.0. Such new kind of test would require this series to be merged first. It would also require the usage of an OSPM image with some testing tools on it. This is easier said than done, as besides the complexity of having an OSPM test image, such kind of tests would require extra logic, specially if it would check regressions for SEA and other notification sources. Thanks, Mauro
Re: [PATCH 00/42] docs: add sphinx-domain rST generator to qapidoc
John Snow writes: > On Wed, Feb 19, 2025 at 8:22 AM Markus Armbruster wrote: > >> John Snow writes: >> >> > "The text handler you add looks just like the existing latex handler. Does >> > LaTeX output lack "little headings", too?" >> > >> > Yes, almost certainly. Can you let me know which output formats we actually >> > "care about"? I'll have to test them all. >> >> As far as I can tell, our build system runs sphinx-build -b html and -b >> man. >> >> I run it with -b text manually all the time to hunt for and review >> changes in output. I'd prefer to keep it working if practical. >> >> For what it's worth, there is a bit of LaTeX configuration in >> docs/conf.py. >> >> > In the meantime, I upgraded my >> > patch so that the text translator properly handles branches with headings >> > that delineate the different branches so that the text output is fully >> > reasonable. I will need to do the same for any format we care about. >> > >> > I've re-pushed as of "about 30 minutes before I wrote this email" -- >> > https://gitlab.com/jsnow/qemu/-/commits/sphinx-domain-blergh2 >> > >> > This branch includes the text generator fixes (which technically belong >> > with the predecessor series we skipped, but I'll refactor that later.) >> > it also includes fixes to the branch inliner, generated return statements, >> > and generated out-of-band feature sections. >> >> I'll fetch it, thanks! >> >> > (Long story short: inserting new sections in certain spots was broken >> > because of cache. Oops. We can discuss more why I wrote that part of the >> > code like I did in review for the patch that introduced that problem. It's >> > the "basic inliner" patch.) >> > >> > Below, I'm going to try a new communication approach where I explicitly say >> > if I have added something to my tasklist or not so that it's clear to you >> > what I believe is actionable (and what I am agreeing to change) and what I >> > believe needs stronger input from you before I do anything. Apologies if it >> > seems a little robotic, just trying new things O:-) >> > >> > On that note: not added to tasklist: do we need the LaTeX handler? Do we >> > need any others? Please confirm O:-) >> >> See above. >> > > I've got html and text working, text wasn't hard. I will give it a good > college try on the LaTeX and man formats. Might be easy. The issue here is > the custom node I introduced for the collapsible details sections which has > no default handler in the generators. I'll have to learn more about that > part of the API, I haven't interfaced with it much yet. Understand. Have you considered cutting the series in half before the inliner? First part emits "The members of ..." like the old doc generator. Second part replaces that with inlined material. We could totally release with just the first half! Inlining is great, but even without it, your work looks so much better and is so much more usable. >> > On Fri, Feb 14, 2025 at 7:05 AM Markus Armbruster >> > wrote: >> > >> >> I started to eyeball old and new generated output side by side. >> >> >> >> New table of contents shows one level, old two. No objection; the >> >> navigation thingie on the left is more useful anyway. >> >> >> > >> > Unintentional, but if you like it, it's fine by me. Nothing added to my >> > tasklist. >> >> Mention in a commit message. >> > > Sure. I... just need to figure out which commit to mention it in. Added to > my list, anyway. > > >> >> >> The new generator elides unreferenced types. Generally good, but two >> >> observations: >> >> >> >> * QapiErrorClass is unreferenced, but its members are mentioned in >> >> Errors sections. QapiErrorClass serves as better than nothing error >> >> code documentation, but it's gone in the new doc. So this is a minor >> >> regression. We can figure out what to do about it later. >> >> >> > >> > Right. I debated making the members references to that class, but recalled >> > that you disliked this class and figured you'd not like such a change, so I >> > just left it alone. I do not have cross-references for individual members >> > of objects at all yet anyway, so this is definitely more work regardless. >> > >> > We could always create a pragma of some sort (or just hardcode a list) of >> > items that must be documented regardless of if they're referenced or not. >> > Please let me know your preference and I will add a "ticket" on my personal >> > tasklist for this project to handle that at /some point/. Nothing added to >> > my tasklist just yet. >> >> Suggest to add something like "compensate for the loss of QapiErrorClass >> documentation in the QEMU QMP Reference Manual". >> > > Got it. Possibly a "for later" task but not much later. It can always come > after this first series, but before we "turn on" the new generator, if that > makes sense. Just so we reach a quiescent point and flush the staggeringly > large queue. I think we could even do it after "turn on". Ye
RE: [PATCH 4/5] pcie, virtio: Remove redundant pm_cap
>-Original Message- >From: Alex Williamson >Subject: [PATCH 4/5] pcie, virtio: Remove redundant pm_cap > >The pm_cap on the PCIExpressDevice object can be distilled down >to the new instance on the PCIDevice object. > >Cc: Michael S. Tsirkin >Cc: Marcel Apfelbaum >Signed-off-by: Alex Williamson >--- > hw/pci-bridge/pcie_pci_bridge.c | 1 - > hw/virtio/virtio-pci.c | 8 +++- > include/hw/pci/pcie.h | 2 -- > 3 files changed, 3 insertions(+), 8 deletions(-) > >diff --git a/hw/pci-bridge/pcie_pci_bridge.c b/hw/pci-bridge/pcie_pci_bridge.c >index 9fa656b43b42..2429503cfbbf 100644 >--- a/hw/pci-bridge/pcie_pci_bridge.c >+++ b/hw/pci-bridge/pcie_pci_bridge.c >@@ -56,7 +56,6 @@ static void pcie_pci_bridge_realize(PCIDevice *d, Error >**errp) > if (pos < 0) { > goto pm_error; > } >-d->exp.pm_cap = pos; > pci_set_word(d->config + pos + PCI_PM_PMC, 0x3); > > pcie_cap_arifwd_init(d); >diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c >index afe8b5551c5c..3ca3f849d391 100644 >--- a/hw/virtio/virtio-pci.c >+++ b/hw/virtio/virtio-pci.c >@@ -2209,8 +2209,6 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error >**errp) > return; > } > >-pci_dev->exp.pm_cap = pos; >- > /* > * Indicates that this function complies with revision 1.2 of the > * PCI Power Management Interface Specification. >@@ -2309,11 +2307,11 @@ static bool virtio_pci_no_soft_reset(PCIDevice *dev) > { > uint16_t pmcsr; > >-if (!pci_is_express(dev) || !dev->exp.pm_cap) { >+if (!pci_is_express(dev) || !(dev->cap_present & QEMU_PCI_CAP_PM)) { Maybe a bit more optimized by checking dev->pm_cap, but it's also ok checking present bit. For the whole series, Reviewed-by: Zhenzhong Duan Thanks Zhenzhong > return false; > } > >-pmcsr = pci_get_word(dev->config + dev->exp.pm_cap + PCI_PM_CTRL); >+pmcsr = pci_get_word(dev->config + dev->pm_cap + PCI_PM_CTRL); > > /* > * When No_Soft_Reset bit is set and the device >@@ -2342,7 +2340,7 @@ static void virtio_pci_bus_reset_hold(Object *obj, >ResetType type) > > if (proxy->flags & VIRTIO_PCI_FLAG_INIT_PM) { > pci_word_test_and_clear_mask( >-dev->config + dev->exp.pm_cap + PCI_PM_CTRL, >+dev->config + dev->pm_cap + PCI_PM_CTRL, > PCI_PM_CTRL_STATE_MASK); > } > } >diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h >index b8d59732bc63..70a5de09de39 100644 >--- a/include/hw/pci/pcie.h >+++ b/include/hw/pci/pcie.h >@@ -58,8 +58,6 @@ typedef enum { > struct PCIExpressDevice { > /* Offset of express capability in config space */ > uint8_t exp_cap; >-/* Offset of Power Management capability in config space */ >-uint8_t pm_cap; > > /* SLOT */ > bool hpev_notified; /* Logical AND of conditions for hot plug event. >-- >2.48.1
[PATCH v2 0/8] migration/rdma: fixes, refactor and cleanup
- It fix the RDMA migration broken issue - disable RDMA + postcopy - some cleanups - Add a qtest for RDMA at last Changs since V1[0]: Add some saparate patches to refactor and cleanup based on V1 [0] https://lore.kernel.org/qemu-devel/20250218074345.638203-1-lizhij...@fujitsu.com/ Li Zhijian (8): migration: Prioritize RDMA in ram_save_target_page() migration/rdma: Remove redundant RAM_SAVE_CONTROL_NOT_SUPP check migration: Kill RAM_SAVE_CONTROL_NOT_SUPP migration: Integrate control_save_page() logic into ram_save_target_page() migration: Add migration_capabilities_and_transport_compatible() helper migraion: disable RDMA + postcopy-ram migration/rdma: Remove redundant migration_in_postcopy checks migration: Add qtest for migration over RDMA MAINTAINERS | 1 + migration/migration.c | 40 - migration/ram.c | 41 + migration/rdma.c | 12 +++-- migration/rdma.h | 3 +- scripts/rdma-migration-helper.sh | 41 + tests/qtest/migration/precopy-tests.c | 64 +++ 7 files changed, 153 insertions(+), 49 deletions(-) create mode 100755 scripts/rdma-migration-helper.sh -- 2.44.0
RE: [PATCH v3 21/28] hw/misc/aspeed_hace: Fix boot issue in the Crypto Manager Self Test
Hi Cedric, > > Hi Cedric, > > > Subject: Re: [PATCH v3 21/28] hw/misc/aspeed_hace: Fix boot issue in > > the Crypto Manager Self Test > > > > On 2/13/25 04:35, Jamin Lin wrote: > > > Currently, it does not support the CRYPT command. Instead, it only > > > sends an interrupt to notify the firmware that the crypt command has > > completed. > > > It is a temporary workaround to resolve the boot issue in the Crypto > > > Manager Self Test. > > > > > > Signed-off-by: Jamin Lin > > > > Please add an AspeedHACEClass class attribute (bool) to handle this > > workaround and a comment in the code mentioning the issue. > > > Thanks for review and suggestion. > I will add the use_crypt_workaround attribute to the AspeedHACEClass and > introduce the use-crypt-workaround property. > Do you have any concerns, or do you have a preferred naming convention? Update my comments. I do not need to create a property. I can set use_crypt_workaround to true in aspeed_ast2700_hace_class_init Thanks-Jamin > Thanks-Jamin > > > > > Thanks, > > > > C. > > > > > > > > > --- > > > hw/misc/aspeed_hace.c | 12 > > > 1 file changed, 12 insertions(+) > > > > > > diff --git a/hw/misc/aspeed_hace.c b/hw/misc/aspeed_hace.c index > > > 86422cb3be..4d0999e7e9 100644 > > > --- a/hw/misc/aspeed_hace.c > > > +++ b/hw/misc/aspeed_hace.c > > > @@ -59,6 +59,7 @@ > > > /* Other cmd bits */ > > > #define HASH_IRQ_ENBIT(9) > > > #define HASH_SG_EN BIT(18) > > > +#define CRYPT_IRQ_EN BIT(12) > > > /* Scatter-gather data list */ > > > #define SG_LIST_LEN_SIZE4 > > > #define SG_LIST_LEN_MASK0x0FFF > > > @@ -343,6 +344,13 @@ static void aspeed_hace_write(void *opaque, > > hwaddr addr, uint64_t data, > > > qemu_irq_lower(s->irq); > > > } > > > } > > > +if (data & CRYPT_IRQ) { > > > +data &= ~CRYPT_IRQ; > > > + > > > +if (s->regs[addr] & CRYPT_IRQ) { > > > +qemu_irq_lower(s->irq); > > > +} > > > +} > > > break; > > > case R_HASH_SRC: > > > data &= ahc->src_mask; > > > @@ -388,6 +396,10 @@ static void aspeed_hace_write(void *opaque, > > hwaddr addr, uint64_t data, > > > case R_CRYPT_CMD: > > > qemu_log_mask(LOG_UNIMP, "%s: Crypt commands not > > implemented\n", > > > __func__); > > > +s->regs[R_STATUS] |= CRYPT_IRQ; > > > +if (data & CRYPT_IRQ_EN) { > > > +qemu_irq_raise(s->irq); > > > +} > > > break; > > > default: > > > break;
Re: [PATCH 0/5] PCI: Implement basic PCI PM capability backing
On Thu, Feb 20, 2025 at 03:48:53PM -0700, Alex Williamson wrote: > Eric recently identified an issue[1] where during graceful shutdown > of a VM in a vIOMMU configuration, the guest driver places the device > into the D3 power state, the vIOMMU is then disabled, triggering an > AddressSpace update. The device BARs are still mapped into the AS, > but the vfio host driver refuses to DMA map the MMIO space due to the > device power state. > > The proposed solution in [1] was to skip mappings based on the > device power state. Here we take a different approach. The PCI spec > defines that devices in D1/2/3 power state should respond only to > configuration and message requests and all other requests should be > handled as an Unsupported Request. In other words, the memory and > IO BARs are not accessible except when the device is in the D0 power > state. > > To emulate this behavior, we can factor the device power state into > the mapping state of the device BARs. Therefore the BAR is marked > as unmapped if either the respective command register enable bit is > clear or the device is not in the D0 power state. > > In order to implement this, the PowerState field of the PMCSR > register becomes writable, which allows the device to appear in > lower power states. This also therefore implements D3 support > (insofar as the BAR behavior) for all devices implementing the PM > capability. The PCI spec requires D3 support. > > An aspect that needs attention here is whether this change in the > wmask and PMCSR bits becomes a problem for migration, and how we > might solve it. For a guest migrating old->new, the device would > always be in the D0 power state, but the register becomes writable. > In the opposite direction, is it possible that a device could > migrate in a low power state and be stuck there since the bits are > read-only in old QEMU? Do we need an option for this behavior and a > machine state bump, or are there alternatives? > > Thanks, > Alex > > [1]https://lore.kernel.org/all/20250219175941.135390-1-eric.au...@redhat.com/ PCI bits: Reviewed-by: Michael S. Tsirkin feel free to merge. > Alex Williamson (5): > hw/pci: Basic support for PCI power management > pci: Use PCI PM capability initializer > vfio/pci: Delete local pm_cap > pcie, virtio: Remove redundant pm_cap > hw/vfio/pci: Re-order pre-reset > > hw/net/e1000e.c | 3 +- > hw/net/eepro100.c | 4 +- > hw/net/igb.c| 3 +- > hw/nvme/ctrl.c | 3 +- > hw/pci-bridge/pcie_pci_bridge.c | 3 +- > hw/pci/pci.c| 83 - > hw/pci/trace-events | 2 + > hw/vfio/pci.c | 29 ++-- > hw/vfio/pci.h | 1 - > hw/virtio/virtio-pci.c | 11 ++--- > include/hw/pci/pci.h| 3 ++ > include/hw/pci/pci_device.h | 3 ++ > include/hw/pci/pcie.h | 2 - > 13 files changed, 112 insertions(+), 38 deletions(-) > > -- > 2.48.1
Re: [PATCH v2 0/2] s390x: support virtio-mem-pci
On Tue, Jan 28, 2025 at 07:57:03PM +0100, David Hildenbrand wrote: > This is based-on [1], which adds MSI-X support to virtio-balloon-pci, > but can be applied independently. > > Turns out it is fairly easy to get virtio-mem-pci running on s390x. We > only have to add MSI-X support to virtio-mem-pci, and wire-up the > (un)plugging in the machine. > > Tried some simple stuff (hotplug/hotunplug/resize/reboot), and all seems > to be working as expected. > > The kernel in the VM needs both, CONFIG_VIRTIO_PCI and CONFIG_VIRTIO_MEM > for it to work. > > [1] https://lkml.kernel.org/r/20250115161425.246348-1-ar...@linux.ibm.com Fails CI: https://gitlab.com/mstredhat/qemu/-/jobs/9202574981 > v1 -> v2: > * There are no transitional/non_transitional devices for virtio-mem > * Spell out removal of "return;" in second patch > > Cc: Eduardo Habkost > Cc: Marcel Apfelbaum > Cc: Richard Henderson > Cc: David Hildenbrand > Cc: Ilya Leoshkevich > Cc: Halil Pasic > Cc: Christian Borntraeger > Cc: Eric Farman > Cc: Thomas Huth > Cc: "Michael S. Tsirkin" > Cc: Cornelia Huck > Cc: Boris Fiuczynski > Cc: Michal Privoznik > Cc: Mario Casquero > > David Hildenbrand (2): > virtio-mem-pci: Allow setting nvectors, so we can use MSI-X > s390x/s390-virtio-ccw: Support plugging PCI-based virtio memory > devices > > hw/core/machine.c | 1 + > hw/s390x/s390-virtio-ccw.c | 20 ++-- > hw/virtio/virtio-mem-pci.c | 12 > 3 files changed, 27 insertions(+), 6 deletions(-) > > -- > 2.48.1
Re: [PATCH v5 3/6] target/i386: Update EPYC-Milan CPU model for Cache property, RAS, SVM feature bits
Hi Zhao, On 2/20/2025 5:26 AM, Zhao Liu wrote: +static const CPUCaches epyc_milan_v3_cache_info = { +.l1d_cache = &(CPUCacheInfo) { +.type = DATA_CACHE, +.level = 1, +.size = 32 * KiB, +.line_size = 64, +.associativity = 8, +.partitions = 1, +.sets = 64, +.lines_per_tag = 1, +.self_init = 1, true. Sure. +.share_level = CPU_TOPOLOGY_LEVEL_CORE, +}, +.l1i_cache = &(CPUCacheInfo) { +.type = INSTRUCTION_CACHE, +.level = 1, +.size = 32 * KiB, +.line_size = 64, +.associativity = 8, +.partitions = 1, +.sets = 64, +.lines_per_tag = 1, +.self_init = 1, true. Sure. Reviewed-by: Zhao Liu Thanks Babu
Re: [PATCH v5 4/6] target/i386: Add feature that indicates WRMSR to BASE reg is non-serializing
Hi Zhao, On 2/20/2025 6:00 AM, Zhao Liu wrote: On Thu, Feb 06, 2025 at 01:28:37PM -0600, Babu Moger wrote: Date: Thu, 6 Feb 2025 13:28:37 -0600 From: Babu Moger Subject: [PATCH v5 4/6] target/i386: Add feature that indicates WRMSR to BASE reg is non-serializing X-Mailer: git-send-email 2.34.1 Add the CPUID bit indicates that a WRMSR to MSR_FS_BASE, MSR_GS_BASE, or MSR_KERNEL_GS_BASE is non-serializing. CPUID_Fn8021_EAX BitFeature description 1 FsGsKernelGsBaseNonSerializing. WRMSR to FS_BASE, GS_BASE and KernelGSbase are non-serializing. Link: https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/programmer-references/57238.zip Signed-off-by: Babu Moger Reviewed-by: Maksim Davydov --- target/i386/cpu.c | 2 +- target/i386/cpu.h | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) Reviewed-by: Zhao Liu Thank you, Babu
Re: [PATCH v5 5/6] target/i386: Update EPYC-Genoa for Cache property, perfmon-v2, RAS and SVM feature bits
Hi Zhao, On 2/20/2025 6:05 AM, Zhao Liu wrote: +static const CPUCaches epyc_genoa_v2_cache_info = { +.l1d_cache = &(CPUCacheInfo) { +.type = DATA_CACHE, +.level = 1, +.size = 32 * KiB, +.line_size = 64, +.associativity = 8, +.partitions = 1, +.sets = 64, +.lines_per_tag = 1, +.self_init = 1, true. Sure. +.share_level = CPU_TOPOLOGY_LEVEL_CORE, +}, +.l1i_cache = &(CPUCacheInfo) { +.type = INSTRUCTION_CACHE, +.level = 1, +.size = 32 * KiB, +.line_size = 64, +.associativity = 8, +.partitions = 1, +.sets = 64, +.lines_per_tag = 1, +.self_init = 1, true. Sure. Reviewed-by: Zhao Liu Thanks Babu
Re: [PATCH v5 2/6] target/i386: Update EPYC-Rome CPU model for Cache property, RAS, SVM feature bits
Hi Zhao, On 2/20/2025 5:18 AM, Zhao Liu wrote: On Thu, Feb 06, 2025 at 01:28:35PM -0600, Babu Moger wrote: Date: Thu, 6 Feb 2025 13:28:35 -0600 From: Babu Moger Subject: [PATCH v5 2/6] target/i386: Update EPYC-Rome CPU model for Cache property, RAS, SVM feature bits X-Mailer: git-send-email 2.34.1 Found that some of the cache properties are not set correctly for EPYC models. l1d_cache.no_invd_sharing should not be true. l1i_cache.no_invd_sharing should not be true. L2.self_init should be true. L2.inclusive should be true. L3.inclusive should not be true. L3.no_invd_sharing should be true. Fix these cache properties. Also add the missing RAS and SVM features bits on AMD EPYC-Rome. The SVM feature bits are used in nested guests. succor : Software uncorrectable error containment and recovery capability. overflow-recov : MCA overflow recovery support. lbrv: LBR virtualization tsc-scale : MSR based TSC rate control vmcb-clean : VMCB clean bits flushbyasid : Flush by ASID pause-filter: Pause intercept filter pfthreshold : PAUSE filter threshold v-vmsave-vmload : Virtualized VMLOAD and VMSAVE vgif: Virtualized GIF Signed-off-by: Babu Moger Reviewed-by: Maksim Davydov --- target/i386/cpu.c | 73 +++ 1 file changed, 73 insertions(+) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 94292bfaa2..e2c3c797ed 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -2342,6 +2342,60 @@ static const CPUCaches epyc_rome_v3_cache_info = { }, }; +static const CPUCaches epyc_rome_v5_cache_info = { +.l1d_cache = &(CPUCacheInfo) { +.type = DATA_CACHE, +.level = 1, +.size = 32 * KiB, +.line_size = 64, +.associativity = 8, +.partitions = 1, +.sets = 64, +.lines_per_tag = 1, +.self_init = 1, This field could be true, Sure. +.share_level = CPU_TOPOLOGY_LEVEL_CORE, +}, +.l1i_cache = &(CPUCacheInfo) { +.type = INSTRUCTION_CACHE, +.level = 1, +.size = 32 * KiB, +.line_size = 64, +.associativity = 8, +.partitions = 1, +.sets = 64, +.lines_per_tag = 1, +.self_init = 1, ditto, Sure. Compared to the previous cache model version, the differences can be checked. I feel that in the future, when we introduce a new cache model, it's better to avoid omitting items that default to false. This way, the cache model can correspond to the output of the cpuid tool, making it easier to compare and check. Sounds good. Overall, LGTM, Reviewed-by: Zhao Liu Thanks Babu
Re: [PATCH v5 6/6] target/i386: Add support for EPYC-Turin model
Hi Zhao, On 2/20/2025 6:11 AM, Zhao Liu wrote: +static const CPUCaches epyc_turin_cache_info = { +.l1d_cache = &(CPUCacheInfo) { +.type = DATA_CACHE, +.level = 1, +.size = 48 * KiB, +.line_size = 64, +.associativity = 12, +.partitions = 1, +.sets = 64, +.lines_per_tag = 1, +.self_init = 1, true. Sure. +.share_level = CPU_TOPOLOGY_LEVEL_CORE, +}, +.l1i_cache = &(CPUCacheInfo) { +.type = INSTRUCTION_CACHE, +.level = 1, +.size = 32 * KiB, +.line_size = 64, +.associativity = 8, +.partitions = 1, +.sets = 64, +.lines_per_tag = 1, +.self_init = 1, true. Sure. Reviewed-by: Zhao Liu thanks (And it would be better to add a Turin entry in docs/system/cpu-models-x86.rst.inc later :-).) Yes. Will add a new patch to update docs/system/cpu-models-x86.rst.inc. Thanks, Zhao