date:20250227

Re: [PATCH] hw/net: ftgmac100: copy eth_hdr for alignment

2025-02-27 Thread Andrew Jeffery

Hi Patrick,

On Thu, 2025-02-27 at 15:42 +, Patrick Venture wrote:
> eth_hdr requires 2 byte alignment
> 
> Signed-off-by: Patrick Venture 
> ---
>  hw/net/ftgmac100.c | 15 ---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/net/ftgmac100.c b/hw/net/ftgmac100.c
> index 1f524d7a01..a33aaa01ee 100644
> --- a/hw/net/ftgmac100.c
> +++ b/hw/net/ftgmac100.c
> @@ -989,12 +989,16 @@ static void ftgmac100_high_write(void *opaque, hwaddr 
> addr,
>  static int ftgmac100_filter(FTGMAC100State *s, const uint8_t *buf, size_t 
> len)
>  {
>  unsigned mcast_idx;
> +    struct eth_header eth_hdr = {};
>  
>  if (s->maccr & FTGMAC100_MACCR_RX_ALL) {
>  return 1;
>  }
>  
> -    switch (get_eth_packet_type(PKT_GET_ETH_HDR(buf))) {
> +    memcpy(&eth_hdr, PKT_GET_ETH_HDR(buf),
> +   (sizeof(eth_hdr) > len) ? len : sizeof(eth_hdr));

I don't think truncating the memcpy() in this way is what we want? The
switched value may not be meaningful for small values of len.

Perhaps return an error?
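
A minimal sketch of the "return an error" option, assuming the filter may reject frames shorter than a full Ethernet header. The names `demo_eth_header` and `demo_copy_eth_hdr` are hypothetical stand-ins for illustration, not QEMU's `struct eth_header` or an existing helper:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for an Ethernet header (14 bytes on the wire). */
struct demo_eth_header {
    uint8_t  h_dest[6];
    uint8_t  h_source[6];
    uint16_t h_proto;   /* big-endian on the wire */
};

/*
 * Copy the Ethernet header out of a possibly unaligned buffer.
 * Returns false, leaving *hdr untouched, when the frame is too short
 * to contain a complete header.
 */
static bool demo_copy_eth_hdr(struct demo_eth_header *hdr,
                              const uint8_t *buf, size_t len)
{
    if (len < sizeof(*hdr)) {
        return false;
    }
    memcpy(hdr, buf, sizeof(*hdr));
    return true;
}
```

With a check like this, the caller could drop the frame when the helper returns false instead of switching on a partially zeroed header. How ftgmac100 should account for such a frame is left open here.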

> +
> +    switch (get_eth_packet_type(&eth_hdr)) {
>  case ETH_PKT_BCAST:
>  if (!(s->maccr & FTGMAC100_MACCR_RX_BROADPKT)) {
>  return 0;
> @@ -1028,6 +1032,7 @@ static ssize_t ftgmac100_receive(NetClientState *nc, 
> const uint8_t *buf,
>  {
>  FTGMAC100State *s = FTGMAC100(qemu_get_nic_opaque(nc));
>  FTGMAC100Desc bd;
> +    struct eth_header eth_hdr = {};
>  uint32_t flags = 0;
>  uint64_t addr;
>  uint32_t crc;
> @@ -1036,7 +1041,11 @@ static ssize_t ftgmac100_receive(NetClientState *nc, 
> const uint8_t *buf,
>  uint32_t buf_len;
>  size_t size = len;
>  uint32_t first = FTGMAC100_RXDES0_FRS;
> -    uint16_t proto = be16_to_cpu(PKT_GET_ETH_HDR(buf)->h_proto);
> +    uint16_t proto;
> +
> +    memcpy(&eth_hdr, PKT_GET_ETH_HDR(buf),
> +   (sizeof(eth_hdr) > len) ? len : sizeof(eth_hdr));

Again here.

> +    proto = be16_to_cpu(eth_hdr.h_proto);
>  int max_frame_size = ftgmac100_max_frame_size(s, proto);
>  
>  if ((s->maccr & (FTGMAC100_MACCR_RXDMA_EN | FTGMAC100_MACCR_RXMAC_EN))
> @@ -1061,7 +1070,7 @@ static ssize_t ftgmac100_receive(NetClientState *nc, 
> const uint8_t *buf,
>  flags |= FTGMAC100_RXDES0_FTL;
>  }
>  
> -    switch (get_eth_packet_type(PKT_GET_ETH_HDR(buf))) {
> +    switch (get_eth_packet_type(&eth_hdr)) {
>  case ETH_PKT_BCAST:
>  flags |= FTGMAC100_RXDES0_BROADCAST;
>  break;

RE: [PATCH rfcv2 02/20] vfio/iommufd: Add properties and handlers to TYPE_HOST_IOMMU_DEVICE_IOMMUFD

2025-02-27 Thread Duan, Zhenzhong

Hi Eric,

>-----Original Message-----
>From: Eric Auger 
>Subject: Re: [PATCH rfcv2 02/20] vfio/iommufd: Add properties and handlers to
>TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>
>Hi Zhenzhong,
>
>
>On 2/19/25 9:22 AM, Zhenzhong Duan wrote:
>> New added properties include IOMMUFD handle, devid and hwpt_id.
>A property generally has another meaning in QEMU (PROP*).
>
>I would rather say you enhance HostIOMMUDeviceIOMMUFD object with 3 new
>members, specific to the iommufd BE + 2 new class functions.

Will do.

>
>
>> IOMMUFD handle and devid are used to allocate/free ioas and hwpt.
>> hwpt_id is used to re-attach IOMMUFD backed device to its default
>> VFIO sub-system created hwpt, i.e., when vIOMMU is disabled by
>> guest. These properties are initialized in .realize_late() handler.
>realize_late does not exist yet
>>
>> New added handlers include [at|de]tach_hwpt. They are used to
>> attach/detach hwpt. VFIO and VDPA have different way to attach
>> and detach, so implementation will be in sub-class instead of
>> HostIOMMUDeviceIOMMUFD.
>this is tricky to follow ...

I mean that [at|de]tach_hwpt will be implemented in a sub-class, e.g.,
HostIOMMUDeviceIOMMUFDVFIO.

>>
>> Add two wrappers host_iommu_device_iommufd_[at|de]tach_hwpt to
>> wrap the two handlers.
>>
>> This is a prerequisite patch for following ones.
>would get rid of that sentence as it does not help much

Sure.

Thanks
Zhenzhong

Re: [PATCH v10 5/8] hw/misc/riscv_iopmp_txn_info: Add struct for transaction infomation

2025-02-27 Thread Alistair Francis

On Wed, Jan 22, 2025 at 6:49 PM Ethan Chen via  wrote:
>
> The entire valid transaction must fit within a single IOPMP entry.
> However, during IOMMU translation, the transaction size is not
> available. This structure defines the transaction information required
> by the IOPMP.
>
> Signed-off-by: Ethan Chen 
> ---
>  include/hw/misc/riscv_iopmp_txn_info.h | 38 ++
>  1 file changed, 38 insertions(+)
>  create mode 100644 include/hw/misc/riscv_iopmp_txn_info.h
>
> diff --git a/include/hw/misc/riscv_iopmp_txn_info.h 
> b/include/hw/misc/riscv_iopmp_txn_info.h
> new file mode 100644
> index 00..98bd26b68b
> --- /dev/null
> +++ b/include/hw/misc/riscv_iopmp_txn_info.h
> @@ -0,0 +1,38 @@
> +/*
> + * QEMU RISC-V IOPMP transaction information
> + *
> + * The transaction information structure provides the complete transaction
> + * length to the IOPMP device
> + *
> + * Copyright (c) 2023-2025 Andes Tech. Corp.
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along 
> with
> + * this program.  If not, see .
> + */
> +
> +#ifndef RISCV_IOPMP_TXN_INFO_H
> +#define RISCV_IOPMP_TXN_INFO_H
> +
> +typedef struct {
> +/* The id of requestor */
> +uint32_t rrid:16;
> +/* The start address of transaction */
> +uint64_t start_addr;
> +/* The end address of transaction */
> +uint64_t end_addr;
> +/* The stage of cascading IOPMP */
> +uint32_t stage;
> +} riscv_iopmp_txn_info;

Whoops, this should be CamelCase.

Checkpatch should catch these types of errors; make sure you run it if you haven't.

Alistair
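
For illustration, a sketch of the typedef with a CamelCase name. The spelling `RISCVIOPMPTxnInfo` is only an assumption about how the rename might look; the fields are copied from the quoted patch:

```c
#include <stdint.h>

/* Hypothetical CamelCase spelling; the name the author picks may differ. */
typedef struct RISCVIOPMPTxnInfo {
    /* The id of requestor */
    uint32_t rrid:16;
    /* The start address of transaction */
    uint64_t start_addr;
    /* The end address of transaction */
    uint64_t end_addr;
    /* The stage of cascading IOPMP */
    uint32_t stage;
} RISCVIOPMPTxnInfo;
```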

> +
> +#endif
> --
> 2.34.1
>
>

Re: [PATCH v10 7/8] hw/misc/riscv_iopmp_dispatcher: Device for redirect IOPMP transaction infomation

2025-02-27 Thread Alistair Francis

On Wed, Jan 22, 2025 at 6:48 PM Ethan Chen via  wrote:
>
> This device determines the target IOPMP device for forwarding information
> based on:
> * Address: For parallel IOPMP devices
> * Stage: For cascading IOPMP devices
>
> Signed-off-by: Ethan Chen 
> ---
>  hw/misc/meson.build  |   1 +
>  hw/misc/riscv_iopmp_dispatcher.c | 136 +++
>  include/hw/misc/riscv_iopmp_dispatcher.h |  61 ++
>  3 files changed, 198 insertions(+)
>  create mode 100644 hw/misc/riscv_iopmp_dispatcher.c
>  create mode 100644 include/hw/misc/riscv_iopmp_dispatcher.h
>
> diff --git a/hw/misc/meson.build b/hw/misc/meson.build
> index 88f2bb6b88..497f83637f 100644
> --- a/hw/misc/meson.build
> +++ b/hw/misc/meson.build
> @@ -35,6 +35,7 @@ system_ss.add(when: 'CONFIG_SIFIVE_E_AON', if_true: 
> files('sifive_e_aon.c'))
>  system_ss.add(when: 'CONFIG_SIFIVE_U_OTP', if_true: files('sifive_u_otp.c'))
>  system_ss.add(when: 'CONFIG_SIFIVE_U_PRCI', if_true: 
> files('sifive_u_prci.c'))
>  specific_ss.add(when: 'CONFIG_RISCV_IOPMP', if_true: files('riscv_iopmp.c'))
> +specific_ss.add(when: 'CONFIG_RISCV_IOPMP', if_true: 
> files('riscv_iopmp_dispatcher.c'))
>
>  subdir('macio')
>
> diff --git a/hw/misc/riscv_iopmp_dispatcher.c 
> b/hw/misc/riscv_iopmp_dispatcher.c
> new file mode 100644
> index 00..4086e9c82b
> --- /dev/null
> +++ b/hw/misc/riscv_iopmp_dispatcher.c
> @@ -0,0 +1,136 @@
> +/*
> + * QEMU RISC-V IOPMP dispatcher
> + *
> + * Receives transaction information from the requestor and forwards it to the
> + * corresponding IOPMP device.
> + *
> + * Copyright (c) 2023-2025 Andes Tech. Corp.
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along 
> with
> + * this program.  If not, see .
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/log.h"
> +#include "qapi/error.h"
> +#include "trace.h"
> +#include "exec/exec-all.h"
> +#include "exec/address-spaces.h"
> +#include "hw/qdev-properties.h"
> +#include "hw/sysbus.h"
> +#include "hw/misc/riscv_iopmp_dispatcher.h"
> +#include "memory.h"
> +#include "hw/irq.h"
> +#include "hw/misc/riscv_iopmp_txn_info.h"
> +
> +static void riscv_iopmp_dispatcher_realize(DeviceState *dev, Error **errp)
> +{
> +int i;
> +RISCVIOPMPDispState *s = RISCV_IOPMP_DISP(dev);
> +
> +s->SinkMemMap = g_new(SinkMemMapEntry *, s->stage_num);
> +for (i = 0; i < s->stage_num; i++) {
> +s->SinkMemMap[i] = g_new(SinkMemMapEntry, s->target_num);
> +}
> +
> +object_initialize_child(OBJECT(s), "iopmp_dispatcher_txn_info",
> +&s->txn_info_sink,
> +TYPE_RISCV_IOPMP_DISP_SS);
> +}
> +
> +static const Property iopmp_dispatcher_properties[] = {
> +DEFINE_PROP_UINT32("stage-num", RISCVIOPMPDispState, stage_num, 2),
> +DEFINE_PROP_UINT32("target-num", RISCVIOPMPDispState, target_num, 10),
> +};
> +
> +static void riscv_iopmp_dispatcher_class_init(ObjectClass *klass, void *data)
> +{
> +DeviceClass *dc = DEVICE_CLASS(klass);
> +device_class_set_props(dc, iopmp_dispatcher_properties);
> +dc->realize = riscv_iopmp_dispatcher_realize;
> +}
> +
> +static const TypeInfo riscv_iopmp_dispatcher_info = {
> +.name = TYPE_RISCV_IOPMP_DISP,
> +.parent = TYPE_DEVICE,
> +.instance_size = sizeof(RISCVIOPMPDispState),
> +.class_init = riscv_iopmp_dispatcher_class_init,
> +};
> +
> +static size_t dispatcher_txn_info_push(StreamSink *txn_info_sink,
> +   unsigned char *buf,
> +   size_t len, bool eop)
> +{
> +uint64_t addr;
> +uint32_t stage;
> +int i, j;
> +riscv_iopmp_disp_ss *ss =
> +RISCV_IOPMP_DISP_SS(txn_info_sink);
> +RISCVIOPMPDispState *s = RISCV_IOPMP_DISP(container_of(ss,
> +RISCVIOPMPDispState, txn_info_sink));
> +riscv_iopmp_txn_info signal;
> +memcpy(&signal, buf, len);

Shouldn't `len` be verified before this?

Also, instead of copying, could you cast the `buf` pointer to a
`riscv_iopmp_txn_info` pointer?

Alistair
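
A minimal sketch of the length check, assuming the payload is expected to be exactly one transaction-info structure. `DemoTxnInfo` and `demo_txn_info_decode` are hypothetical names for illustration. The sketch keeps the copy rather than the cast, on the assumption that `buf` is not guaranteed to be suitably aligned for the structure:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the transaction-info layout from patch 5/8. */
typedef struct DemoTxnInfo {
    uint32_t rrid:16;
    uint64_t start_addr;
    uint64_t end_addr;
    uint32_t stage;
} DemoTxnInfo;

/*
 * Decode a stream payload into *info. Rejects any payload whose length
 * differs from the structure size, so memcpy() can neither overflow *info
 * nor leave part of it uninitialised.
 */
static bool demo_txn_info_decode(DemoTxnInfo *info,
                                 const unsigned char *buf, size_t len)
{
    if (len != sizeof(*info)) {
        return false;
    }
    memcpy(info, buf, sizeof(*info));
    return true;
}
```

In the push handler, a false return could translate to reporting zero bytes consumed. Whether that is the right response for the stream interface is an open question in this sketch.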

> +addr = signal.start_addr;
> +stage = signal.stage;
> +for (i = stage; i < s->stage_num; i++) {
> +for (j = 0; j < s->target_num; j++) {
> +if (s->SinkMemMap[i][j].map.base <= addr &&
> +addr < s->SinkMemMap[i][j].map.base
> ++ s->SinkMemMap[i][j].map.size) {
> +

Re: [PATCH v10 8/8] hw/riscv/virt: Add IOPMP support

2025-02-27 Thread Alistair Francis

On Wed, Jan 22, 2025 at 6:49 PM Ethan Chen via  wrote:
>
> - Add 'iopmp=on' option to enable IOPMP. It adds iopmp devices virt machine
>   to protect all regions of system memory.
>
> Signed-off-by: Ethan Chen 
> ---
>  docs/system/riscv/virt.rst |  7 
>  hw/riscv/Kconfig   |  1 +
>  hw/riscv/virt.c| 75 ++
>  include/hw/riscv/virt.h|  4 ++
>  4 files changed, 87 insertions(+)
>
> diff --git a/docs/system/riscv/virt.rst b/docs/system/riscv/virt.rst
> index 60850970ce..6b5fc1d37d 100644
> --- a/docs/system/riscv/virt.rst
> +++ b/docs/system/riscv/virt.rst
> @@ -146,6 +146,13 @@ The following machine-specific options are supported:
>
>Enables the riscv-iommu-sys platform device. Defaults to 'off'.
>
> +- iopmp=[on|off]
> +
> +  When this option is "on", IOPMP devices are added to machine. IOPMP checks
> +  memory transcations in system memory. This option is assumed to be "off". 
> To
> +  enable the CPU to perform transactions with a specified RRID, use the CPU
> +  option "-cpu ,iopmp=true,iopmp_rrid="

We should include some details on the default implementation settings
here as well

Alistair

> +
>  Running Linux kernel
>  
>
> diff --git a/hw/riscv/Kconfig b/hw/riscv/Kconfig
> index e6a0ac1fa1..637438af2c 100644
> --- a/hw/riscv/Kconfig
> +++ b/hw/riscv/Kconfig
> @@ -68,6 +68,7 @@ config RISCV_VIRT
>  select PLATFORM_BUS
>  select ACPI
>  select ACPI_PCI
> +select RISCV_IOPMP
>
>  config SHAKTI_C
>  bool
> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
> index 241389d72f..c5a8f7173e 100644
> --- a/hw/riscv/virt.c
> +++ b/hw/riscv/virt.c
> @@ -57,6 +57,8 @@
>  #include "hw/acpi/aml-build.h"
>  #include "qapi/qapi-visit-common.h"
>  #include "hw/virtio/virtio-iommu.h"
> +#include "hw/misc/riscv_iopmp.h"
> +#include "hw/misc/riscv_iopmp_dispatcher.h"
>
>  /* KVM AIA only supports APLIC MSI. APLIC Wired is always emulated by QEMU. 
> */
>  static bool virt_use_kvm_aia_aplic_imsic(RISCVVirtAIAType aia_type)
> @@ -94,6 +96,7 @@ static const MemMapEntry virt_memmap[] = {
>  [VIRT_UART0] ={ 0x1000, 0x100 },
>  [VIRT_VIRTIO] =   { 0x10001000,0x1000 },
>  [VIRT_FW_CFG] =   { 0x1010,  0x18 },
> +[VIRT_IOPMP] ={ 0x1020,  0x10 },
>  [VIRT_FLASH] ={ 0x2000, 0x400 },
>  [VIRT_IMSIC_M] =  { 0x2400, VIRT_IMSIC_MAX_SIZE },
>  [VIRT_IMSIC_S] =  { 0x2800, VIRT_IMSIC_MAX_SIZE },
> @@ -102,6 +105,11 @@ static const MemMapEntry virt_memmap[] = {
>  [VIRT_DRAM] = { 0x8000,   0x0 },
>  };
>
> +static const MemMapEntry iopmp_protect_memmap[] = {
> +/* IOPMP protect all regions by default */
> +{0x0, 0x},
> +};
> +
>  /* PCIe high mmio is fixed for RV32 */
>  #define VIRT32_HIGH_PCIE_MMIO_BASE  0x3ULL
>  #define VIRT32_HIGH_PCIE_MMIO_SIZE  (4 * GiB)
> @@ -1117,6 +1125,24 @@ static void create_fdt_iommu(RISCVVirtState *s, 
> uint16_t bdf)
> bdf + 1, iommu_phandle, bdf + 1, 0x - bdf);
>  }
>
> +static void create_fdt_iopmp(RISCVVirtState *s, const MemMapEntry *memmap,
> + uint32_t irq_mmio_phandle) {
> +g_autofree char *name = NULL;
> +MachineState *ms = MACHINE(s);
> +
> +name = g_strdup_printf("/soc/iopmp@%lx", (long)memmap[VIRT_IOPMP].base);
> +qemu_fdt_add_subnode(ms->fdt, name);
> +qemu_fdt_setprop_string(ms->fdt, name, "compatible", "riscv_iopmp");
> +qemu_fdt_setprop_cells(ms->fdt, name, "reg", 0x0, 
> memmap[VIRT_IOPMP].base,
> +0x0, memmap[VIRT_IOPMP].size);
> +qemu_fdt_setprop_cell(ms->fdt, name, "interrupt-parent", 
> irq_mmio_phandle);
> +if (s->aia_type == VIRT_AIA_TYPE_NONE) {
> +qemu_fdt_setprop_cell(ms->fdt, name, "interrupts", IOPMP_IRQ);
> +} else {
> +qemu_fdt_setprop_cells(ms->fdt, name, "interrupts", IOPMP_IRQ, 0x4);
> +}
> +}
> +
>  static void finalize_fdt(RISCVVirtState *s)
>  {
>  uint32_t phandle = 1, irq_mmio_phandle = 1, msi_pcie_phandle = 1;
> @@ -1141,6 +1167,10 @@ static void finalize_fdt(RISCVVirtState *s)
>  create_fdt_uart(s, virt_memmap, irq_mmio_phandle);
>
>  create_fdt_rtc(s, virt_memmap, irq_mmio_phandle);
> +
> +if (s->have_iopmp) {
> +create_fdt_iopmp(s, virt_memmap, irq_mmio_phandle);
> +}
>  }
>
>  static void create_fdt(RISCVVirtState *s, const MemMapEntry *memmap)
> @@ -1529,6 +1559,8 @@ static void virt_machine_init(MachineState *machine)
>  DeviceState *mmio_irqchip, *virtio_irqchip, *pcie_irqchip;
>  int i, base_hartid, hart_count;
>  int socket_count = riscv_socket_count(machine);
> +DeviceState *iopmp_dev, *iopmp_disp_dev;
> +StreamSink *iopmp_ss, *iopmp_disp_ss;
>
>  /* Check socket count limit */
>  if (VIRT_SOCKETS_MAX < socket_count) {
> @@ -1710,6 +1742,29 @@ static void virt_machine_init(MachineState *machine)
>

Re: [PATCH v2 0/3] riscv: AIA: refinement for KVM acceleration

2025-02-27 Thread Alistair Francis

On Mon, Feb 24, 2025 at 12:58 PM Yong-Xuan Wang
 wrote:
>
> Reorder the code to reduce the conditional checking and remove
> unnecessary resource setting when using in-kernl AIA irqchip.
>
> ---
> v2:
> - remove the code reordering of the riscv-virt machine since it can't
>   work with NUMA setting. (Daniel)
>
> Yong-Xuan Wang (3):
>   hw/intc/imsic: refine the IMSIC realize
>   hw/intc/aplic: refine the APLIC realize
>   hw/intc/aplic: refine kvm_msicfgaddr

Thanks!

Applied to riscv-to-apply.next

Alistair

>
>  hw/intc/riscv_aplic.c | 73 +++
>  hw/intc/riscv_imsic.c | 47 +++-
>  2 files changed, 65 insertions(+), 55 deletions(-)
>
> --
> 2.17.1
>
>

Re: [PATCH v10 6/8] hw/misc/riscv_iopmp: Add RISC-V IOPMP device

2025-02-27 Thread Alistair Francis

On Wed, Jan 22, 2025 at 6:48 PM Ethan Chen via  wrote:
>
> Support IOPMP specification v0.9.2RC3.
> The specification url:
> https://github.com/riscv-non-isa/iopmp-spec/releases/tag/v0.9.2-RC3
>
> The IOPMP checks whether memory access from a device or CPU is valid.
> This implementation uses an IOMMU to modify the address space accessed
> by the device.
>
> For device access with IOMMUAccessFlags specifying read or write
> (IOMMU_RO or IOMMU_WO), the IOPMP checks the permission in
> iopmp_translate. If the access is valid, the target address space is
> downstream_as. If the access is blocked, it will be redirected to
> blocked_rwx_as.
>
> For CPU access with IOMMUAccessFlags not specifying read or write
> (IOMMU_NONE), the IOPMP translates the access to the corresponding
> address space based on the permission. If the access has full permission
> (rwx), the target address space is downstream_as. If the access has
> limited permission, the target address space is blocked_ followed by
> the lacked permissions.
>
> The operation of a blocked region can trigger an IOPMP interrupt, a bus
> error, or it can respond with success and fabricated data, depending on
> the value of the IOPMP ERR_CFG register.
>
> Support Properties and Default Values of the IOPMP Device
> The following are the supported properties and their default values for the
> IOPMP device. If a property has no description here, please refer to the
> IOPMP specification for details:
>
> * mdcfg_fmt: 1 (Options: 0/1/2)
> * srcmd_fmt: 0 (Options: 0/1/2)
> * tor_en: true (Options: true/false)
> * sps_en: false (Options: true/false)
> * prient_prog: true (Options: true/false)
> * rrid_transl_en: false (Options: true/false)
> * rrid_transl_prog: false (Options: true/false)
> * chk_x: true (Options: true/false)
> * no_x: false (Options: true/false)
> * no_w: false (Options: true/false)
> * stall_en: false (Options: true/false)
> * peis: true (Options: true/false)
> * pees: true (Options: true/false)
> * mfr_en: true (Options: true/false)
> * md_entry_num: 5 (IMP: Valid only for mdcfg_fmt 1/2)
> * md_num: 8 (Range: 0-63)
> * rrid_num: 16 (Range: srcmd_fmt ≠ 2: 0-65535, srcmd_fmt = 2: 0-32)
> * entry_num: 48
>   (Range: 0-IMP. For mdcfg_fmt = 1, it is fixed as md_num * (md_entry_num + 
> 1).
>Entry registers must not overlap with other registers.)
> * prio_entry: 65535
>   (Range: 0-IMP. If prio_entry > entry_num, it will be set to entry_num.)
> * rrid_transl: 0x0 (Range: 0-65535)
> * entry_offset: 0x4000
>   (IMP: Entry registers must not overlap with other registers.)
> * err_rdata: 0x0
>   (uint32. Specifies the value used in responses to read transactions when
>errors are suppressed)
> * msi_en: false (Options: true/false)
> * msidata: 12 (Range: 1-1023)
> * stall_violation_en: true (Options: true/false)
> * err_msiaddr: 0x2400 (low-part 32-bit address)
> * err_msiaddrh: 0x0 (high-part 32-bit address)
> * msi_rrid: 0
>   (Range: 0-65535. Specifies the rrid used by the IOPMP to send the MSI.)
>
> Signed-off-by: Ethan Chen 
> ---
>  hw/misc/Kconfig   |4 +
>  hw/misc/meson.build   |1 +
>  hw/misc/riscv_iopmp.c | 2182 +
>  hw/misc/trace-events  |4 +
>  include/hw/misc/riscv_iopmp.h |  191 +++
>  5 files changed, 2382 insertions(+)
>  create mode 100644 hw/misc/riscv_iopmp.c
>  create mode 100644 include/hw/misc/riscv_iopmp.h
>
> diff --git a/hw/misc/Kconfig b/hw/misc/Kconfig
> index 8f9ce2f68c..e4ad9cf9fe 100644
> --- a/hw/misc/Kconfig
> +++ b/hw/misc/Kconfig
> @@ -220,4 +220,8 @@ config IOSB
>  config XLNX_VERSAL_TRNG
>  bool
>
> +config RISCV_IOPMP
> +bool
> +select STREAM
> +
>  source macio/Kconfig
> diff --git a/hw/misc/meson.build b/hw/misc/meson.build
> index 55f493521b..88f2bb6b88 100644
> --- a/hw/misc/meson.build
> +++ b/hw/misc/meson.build
> @@ -34,6 +34,7 @@ system_ss.add(when: 'CONFIG_SIFIVE_E_PRCI', if_true: 
> files('sifive_e_prci.c'))
>  system_ss.add(when: 'CONFIG_SIFIVE_E_AON', if_true: files('sifive_e_aon.c'))
>  system_ss.add(when: 'CONFIG_SIFIVE_U_OTP', if_true: files('sifive_u_otp.c'))
>  system_ss.add(when: 'CONFIG_SIFIVE_U_PRCI', if_true: 
> files('sifive_u_prci.c'))
> +specific_ss.add(when: 'CONFIG_RISCV_IOPMP', if_true: files('riscv_iopmp.c'))
>
>  subdir('macio')
>
> diff --git a/hw/misc/riscv_iopmp.c b/hw/misc/riscv_iopmp.c
> new file mode 100644
> index 00..ab8f616c03
> --- /dev/null
> +++ b/hw/misc/riscv_iopmp.c
> @@ -0,0 +1,2182 @@
> +/*
> + * QEMU RISC-V IOPMP (Input Output Physical Memory Protection)
> + *
> + * Copyright (c) 2023-2025 Andes Tech. Corp.
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT

Re: [PATCH v2 1/6] hw/misc: Add MPFS system reset support

2025-02-27 Thread Alistair Francis

On Tue, Feb 25, 2025 at 10:55 AM Sebastian Huber
 wrote:
>
> Signed-off-by: Sebastian Huber 

Acked-by: Alistair Francis 

Alistair

> ---
>  hw/misc/mchp_pfsoc_sysreg.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/hw/misc/mchp_pfsoc_sysreg.c b/hw/misc/mchp_pfsoc_sysreg.c
> index 7876fe0c5b..08196525aa 100644
> --- a/hw/misc/mchp_pfsoc_sysreg.c
> +++ b/hw/misc/mchp_pfsoc_sysreg.c
> @@ -27,7 +27,9 @@
>  #include "hw/irq.h"
>  #include "hw/sysbus.h"
>  #include "hw/misc/mchp_pfsoc_sysreg.h"
> +#include "system/runstate.h"
>
> +#define MSS_RESET_CR0x18
>  #define ENVM_CR 0xb8
>  #define MESSAGE_INT 0x118c
>
> @@ -56,6 +58,11 @@ static void mchp_pfsoc_sysreg_write(void *opaque, hwaddr 
> offset,
>  {
>  MchpPfSoCSysregState *s = opaque;
>  switch (offset) {
> +case MSS_RESET_CR:
> +if (value == 0xdead) {
> +qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
> +}
> +break;
>  case MESSAGE_INT:
>  qemu_irq_lower(s->irq);
>  break;
> --
> 2.43.0
>

Re: [PATCH v2 3/3] binfmt: Add --ignore-family option

2025-02-27 Thread Alistair Francis

On Tue, Jan 28, 2025 at 4:29 AM Andrea Bolognani  wrote:
>
> Until now, the script has worked under the assumption that a
> host CPU can run binaries targeting any CPU in the same family.
> That's a fair enough assumption when it comes to running i386
> binaries on x86_64, but it doesn't quite apply in the general
> case.
>
> For example, while riscv64 CPUs could theoretically run riscv32
> applications natively, in practice there exist few (if any?)
> CPUs that implement the necessary silicon; moreover, even if you
> had one such CPU, your host OS would most likely not have
> enabled the necessary kernel bits.
>
> This new option gives distro packagers the ability to opt out of
> the assumption, likely on a per-architecture basis, and make
> things work out of the box for a larger fraction of their user
> base.
>
> As an interesting side effect, this makes it possible to enable
> execution of 64-bit binaries on 32-bit CPUs of the same family,
> which is a perfectly valid use case that apparently hadn't been
> considered until now.
>
> Link: https://src.fedoraproject.org/rpms/qemu/pull-request/72
> Thanks: David Abdurachmanov 
> Thanks: Daniel P. Berrangé 
> Signed-off-by: Andrea Bolognani 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  scripts/qemu-binfmt-conf.sh | 19 ---
>  1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/scripts/qemu-binfmt-conf.sh b/scripts/qemu-binfmt-conf.sh
> index 8d9136a29f..5fd462b1d1 100755
> --- a/scripts/qemu-binfmt-conf.sh
> +++ b/scripts/qemu-binfmt-conf.sh
> @@ -205,6 +205,9 @@ Usage: qemu-binfmt-conf.sh [--qemu-path 
> PATH][--debian][--systemd CPU]
> --persistent:if yes, the interpreter is loaded when binfmt is
>  configured and remains in memory. All future uses
>  are cloned from the open file.
> +   --ignore-family: if yes, it is assumed that the host CPU (e.g. 
> riscv64)
> +can't natively run programs targeting a CPU that is
> +part of the same family (e.g. riscv32).
> --preserve-argv0 preserve argv[0]
>
>  To import templates with update-binfmts, use :
> @@ -337,7 +340,12 @@ qemu_set_binfmts() {
>  fi
>
>  if [ "$host_family" = "$family" ] ; then
> -continue
> +# When --ignore-family is used, we have to generate rules even
> +# for targets that are in the same family as the host CPU. The
> +# only exception is of course when the CPU types exactly match
> +if [ "$target" = "$host_cpu" ] || [ "$IGNORE_FAMILY" = "no" ] ; 
> then
> +continue
> +fi
>  fi
>
>  $BINFMT_SET
> @@ -355,10 +363,11 @@ CREDENTIAL=no
>  PERSISTENT=no
>  PRESERVE_ARG0=no
>  QEMU_SUFFIX=""
> +IGNORE_FAMILY=no
>
>  
> _longopts="debian,systemd:,qemu-path:,qemu-suffix:,exportdir:,help,credential:,\
> -persistent:,preserve-argv0:"
> -options=$(getopt -o ds:Q:S:e:hc:p:g:F: -l ${_longopts} -- "$@")
> +persistent:,preserve-argv0:,ignore-family:"
> +options=$(getopt -o ds:Q:S:e:hc:p:g:F:i: -l ${_longopts} -- "$@")
>  eval set -- "$options"
>
>  while true ; do
> @@ -418,6 +427,10 @@ while true ; do
>  shift
>  PRESERVE_ARG0="$1"
>  ;;
> +-i|--ignore-family)
> +shift
> +IGNORE_FAMILY="$1"
> +;;
>  *)
>  break
>  ;;
> --
> 2.48.1
>

[PATCH v7 16/16] sev: Provide sev_features flags from IGVM VMSA to KVM_SEV_INIT2

2025-02-27 Thread Roy Hopkins

IGVM files can contain an initial VMSA that should be applied to each
vcpu as part of the initial guest state. The sev_features flags are
provided as part of the VMSA structure. However, KVM only allows
sev_features to be set during initialization and not as the guest is
being prepared for launch.

This patch queries KVM for the supported set of sev_features flags and
processes the IGVM file during kvm_init to determine any sev_features
flags set in the IGVM file. These are then provided in the call to
KVM_SEV_INIT2 to ensure the guest state matches that specified in the
IGVM file.

This does cause the IGVM file to be processed twice: first to extract
the sev_features, and then to actually configure the guest. However,
most of the file is ignored during the first pass, so the overhead is minimal.

Signed-off-by: Roy Hopkins 
Acked-by: Michael S. Tsirkin 
Acked-by: Stefano Garzarella 
---
 target/i386/sev.c | 160 --
 1 file changed, 141 insertions(+), 19 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index fa9b4bcad6..ef25e64b14 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -117,6 +117,8 @@ struct SevCommonState {
 uint32_t cbitpos;
 uint32_t reduced_phys_bits;
 bool kernel_hashes;
+uint64_t sev_features;
+uint64_t supported_sev_features;
 
 /* runtime state */
 uint8_t api_major;
@@ -492,7 +494,40 @@ static void sev_apply_cpu_context(CPUState *cpu)
 }
 }
 
-static int check_vmsa_supported(hwaddr gpa, const struct sev_es_save_area 
*vmsa,
+static int check_sev_features(SevCommonState *sev_common, uint64_t 
sev_features,
+  Error **errp)
+{
+/*
+ * Ensure SEV_FEATURES is configured for correct SEV hardware and that
+ * the requested features are supported. If SEV-SNP is enabled then
+ * that feature must be enabled, otherwise it must be cleared.
+ */
+if (sev_snp_enabled() && !(sev_features & SVM_SEV_FEAT_SNP_ACTIVE)) {
+error_setg(
+errp,
+"%s: SEV_SNP is enabled but is not enabled in VMSA sev_features",
+__func__);
+return -1;
+} else if (!sev_snp_enabled() &&
+   (sev_features & SVM_SEV_FEAT_SNP_ACTIVE)) {
+error_setg(
+errp,
+"%s: SEV_SNP is not enabled but is enabled in VMSA sev_features",
+__func__);
+return -1;
+}
+if (sev_features & ~sev_common->supported_sev_features) {
+error_setg(errp,
+   "%s: VMSA contains unsupported sev_features: %lX, "
+   "supported features: %lX",
+   __func__, sev_features, sev_common->supported_sev_features);
+return -1;
+}
+return 0;
+}
+
+static int check_vmsa_supported(SevCommonState *sev_common, hwaddr gpa,
+const struct sev_es_save_area *vmsa,
 Error **errp)
 {
 struct sev_es_save_area vmsa_check;
@@ -558,24 +593,10 @@ static int check_vmsa_supported(hwaddr gpa, const struct 
sev_es_save_area *vmsa,
 vmsa_check.x87_fcw = 0;
 vmsa_check.mxcsr = 0;
 
-if (sev_snp_enabled()) {
-if (vmsa_check.sev_features != SVM_SEV_FEAT_SNP_ACTIVE) {
-error_setg(errp,
-   "%s: sev_features in the VMSA contains an unsupported "
-   "value. For SEV-SNP, sev_features must be set to %x.",
-   __func__, SVM_SEV_FEAT_SNP_ACTIVE);
-return -1;
-}
-vmsa_check.sev_features = 0;
-} else {
-if (vmsa_check.sev_features != 0) {
-error_setg(errp,
-   "%s: sev_features in the VMSA contains an unsupported "
-   "value. For SEV-ES and SEV, sev_features must be "
-   "set to 0.", __func__);
-return -1;
-}
+if (check_sev_features(sev_common, vmsa_check.sev_features, errp) < 0) {
+return -1;
 }
+vmsa_check.sev_features = 0;
 
 if (!buffer_is_zero(&vmsa_check, sizeof(vmsa_check))) {
 error_setg(errp,
@@ -1729,6 +1750,39 @@ static int sev_snp_kvm_type(X86ConfidentialGuest *cg)
 return KVM_X86_SNP_VM;
 }
 
+static int sev_init_supported_features(ConfidentialGuestSupport *cgs,
+   SevCommonState *sev_common, Error 
**errp)
+{
+X86ConfidentialGuestClass *x86_klass =
+   X86_CONFIDENTIAL_GUEST_GET_CLASS(cgs);
+/*
+ * Older kernels do not support query or setting of sev_features. In this
+ * case the set of supported features must be zero to match the settings
+ * in the kernel.
+ */
+if (x86_klass->kvm_type(X86_CONFIDENTIAL_GUEST(sev_common)) ==
+KVM_X86_DEFAULT_VM) {
+sev_common->supported_sev_features = 0;
+return 0;
+}
+
+/* Query KVM for the supported set of sev_features */
+struct kvm_device_attr attr = {
+

[PATCH] hw/loongarch/virt: Replace RSDT with XSDT table

2025-02-27 Thread Bibo Mao

The XSDT table was introduced in ACPI Specification 5.0 and supports 64-bit
addresses in the table. LoongArch system support starts with ACPI
Specification 6.4, so XSDT is supported on LoongArch systems.

This patch replaces the RSDT with an XSDT table.

Signed-off-by: Bibo Mao 
---
 hw/loongarch/virt-acpi-build.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/loongarch/virt-acpi-build.c b/hw/loongarch/virt-acpi-build.c
index 9ca88d63ae..43ed8e2825 100644
--- a/hw/loongarch/virt-acpi-build.c
+++ b/hw/loongarch/virt-acpi-build.c
@@ -485,7 +485,7 @@ static void acpi_build(AcpiBuildTables *tables, 
MachineState *machine)
 LoongArchVirtMachineState *lvms = LOONGARCH_VIRT_MACHINE(machine);
 GArray *table_offsets;
 AcpiFadtData fadt_data;
-unsigned facs, rsdt, dsdt;
+unsigned facs, xsdt, dsdt;
 uint8_t *u;
 GArray *tables_blob = tables->table_data;
 
@@ -571,17 +571,17 @@ static void acpi_build(AcpiBuildTables *tables, 
MachineState *machine)
 }
 
 /* RSDT is pointed to by RSDP */
-rsdt = tables_blob->len;
-build_rsdt(tables_blob, tables->linker, table_offsets,
+xsdt = tables_blob->len;
+build_xsdt(tables_blob, tables->linker, table_offsets,
lvms->oem_id, lvms->oem_table_id);
 
 /* RSDP is in FSEG memory, so allocate it separately */
 {
 AcpiRsdpData rsdp_data = {
-.revision = 0,
+.revision = 2,
 .oem_id = lvms->oem_id,
-.xsdt_tbl_offset = NULL,
-.rsdt_tbl_offset = &rsdt,
+.xsdt_tbl_offset = &xsdt,
+.rsdt_tbl_offset = NULL,
 };
 build_rsdp(tables->rsdp, tables->linker, &rsdp_data);
 }

base-commit: b69801dd6b1eb4d107f7c2f643adf0a4e3ec9124
-- 
2.39.3
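
To illustrate the difference the RSDT-to-XSDT switch above relies on: both tables start with the common 36-byte ACPI table header followed by an array of pointers to other tables, but RSDT entries are 32-bit while XSDT entries are 64-bit. The sketch below is not QEMU code, and `demo_sdt_entry_count` is a hypothetical helper:

```c
#include <stddef.h>
#include <stdint.h>

/* Common 36-byte header that starts every ACPI system description table. */
#define DEMO_ACPI_HEADER_LEN 36u

/*
 * Number of table pointers held by an RSDT (4-byte entries) or an
 * XSDT (8-byte entries), given the Length field from the table header.
 */
static size_t demo_sdt_entry_count(uint32_t table_len, int is_xsdt)
{
    size_t entry_size = is_xsdt ? sizeof(uint64_t) : sizeof(uint32_t);

    if (table_len < DEMO_ACPI_HEADER_LEN) {
        return 0;
    }
    return (table_len - DEMO_ACPI_HEADER_LEN) / entry_size;
}
```

The same number of tables therefore takes twice the entry space in an XSDT, in exchange for being able to point above 4 GiB.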

[PATCH v7 16/16] sev: Provide sev_features flags from IGVM VMSA to KVM_SEV_INIT2

2025-02-27 Thread Roy Hopkins

IGVM files can contain an initial VMSA that should be applied to each
vcpu as part of the initial guest state. The sev_features flags are
provided as part of the VMSA structure. However, KVM only allows
sev_features to be set during initialization and not as the guest is
being prepared for launch.

This patch queries KVM for the supported set of sev_features flags and
processes the IGVM file during kvm_init to determine any sev_features
flags set in the IGVM file. These are then provided in the call to
KVM_SEV_INIT2 to ensure the guest state matches that specified in the
IGVM file.

This does cause the IGVM file to be processed twice: first to extract
the sev_features, then to actually configure the guest. However,
the first pass is largely ignored, meaning the overhead is minimal.

Signed-off-by: Roy Hopkins 
Acked-by: Michael S. Tsirkin 
Acked-by: Stefano Garzarella 
---
 target/i386/sev.c | 160 --
 1 file changed, 141 insertions(+), 19 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index fa9b4bcad6..ef25e64b14 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -117,6 +117,8 @@ struct SevCommonState {
 uint32_t cbitpos;
 uint32_t reduced_phys_bits;
 bool kernel_hashes;
+uint64_t sev_features;
+uint64_t supported_sev_features;
 
 /* runtime state */
 uint8_t api_major;
@@ -492,7 +494,40 @@ static void sev_apply_cpu_context(CPUState *cpu)
 }
 }
 
-static int check_vmsa_supported(hwaddr gpa, const struct sev_es_save_area 
*vmsa,
+static int check_sev_features(SevCommonState *sev_common, uint64_t 
sev_features,
+  Error **errp)
+{
+/*
+ * Ensure SEV_FEATURES is configured for correct SEV hardware and that
+ * the requested features are supported. If SEV-SNP is enabled then
+ * that feature must be enabled, otherwise it must be cleared.
+ */
+if (sev_snp_enabled() && !(sev_features & SVM_SEV_FEAT_SNP_ACTIVE)) {
+error_setg(
+errp,
+"%s: SEV_SNP is enabled but is not enabled in VMSA sev_features",
+__func__);
+return -1;
+} else if (!sev_snp_enabled() &&
+   (sev_features & SVM_SEV_FEAT_SNP_ACTIVE)) {
+error_setg(
+errp,
+"%s: SEV_SNP is not enabled but is enabled in VMSA sev_features",
+__func__);
+return -1;
+}
+if (sev_features & ~sev_common->supported_sev_features) {
+error_setg(errp,
+   "%s: VMSA contains unsupported sev_features: %lX, "
+   "supported features: %lX",
+   __func__, sev_features, sev_common->supported_sev_features);
+return -1;
+}
+return 0;
+}
+
+static int check_vmsa_supported(SevCommonState *sev_common, hwaddr gpa,
+const struct sev_es_save_area *vmsa,
 Error **errp)
 {
 struct sev_es_save_area vmsa_check;
@@ -558,24 +593,10 @@ static int check_vmsa_supported(hwaddr gpa, const struct 
sev_es_save_area *vmsa,
 vmsa_check.x87_fcw = 0;
 vmsa_check.mxcsr = 0;
 
-if (sev_snp_enabled()) {
-if (vmsa_check.sev_features != SVM_SEV_FEAT_SNP_ACTIVE) {
-error_setg(errp,
-   "%s: sev_features in the VMSA contains an unsupported "
-   "value. For SEV-SNP, sev_features must be set to %x.",
-   __func__, SVM_SEV_FEAT_SNP_ACTIVE);
-return -1;
-}
-vmsa_check.sev_features = 0;
-} else {
-if (vmsa_check.sev_features != 0) {
-error_setg(errp,
-   "%s: sev_features in the VMSA contains an unsupported "
-   "value. For SEV-ES and SEV, sev_features must be "
-   "set to 0.", __func__);
-return -1;
-}
+if (check_sev_features(sev_common, vmsa_check.sev_features, errp) < 0) {
+return -1;
 }
+vmsa_check.sev_features = 0;
 
 if (!buffer_is_zero(&vmsa_check, sizeof(vmsa_check))) {
 error_setg(errp,
@@ -1729,6 +1750,39 @@ static int sev_snp_kvm_type(X86ConfidentialGuest *cg)
 return KVM_X86_SNP_VM;
 }
 
+static int sev_init_supported_features(ConfidentialGuestSupport *cgs,
+   SevCommonState *sev_common, Error 
**errp)
+{
+X86ConfidentialGuestClass *x86_klass =
+   X86_CONFIDENTIAL_GUEST_GET_CLASS(cgs);
+/*
+ * Older kernels do not support query or setting of sev_features. In this
+ * case the set of supported features must be zero to match the settings
+ * in the kernel.
+ */
+if (x86_klass->kvm_type(X86_CONFIDENTIAL_GUEST(sev_common)) ==
+KVM_X86_DEFAULT_VM) {
+sev_common->supported_sev_features = 0;
+return 0;
+}
+
+/* Query KVM for the supported set of sev_features */
+struct kvm_device_attr attr = {
+
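The validation described in the commit message (the SNP bit must match
the guest type, and no requested bit may fall outside the supported
mask) can be sketched on its own. This is an illustration only: the
function and macro names are hypothetical, and placing the SNP-active
flag at bit 0 is an assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: the SNP-active flag is bit 0. */
#define EXAMPLE_FEAT_SNP_ACTIVE (UINT64_C(1) << 0)

/*
 * Hypothetical stand-alone check, not the QEMU function.
 * Returns 0 when `requested` is acceptable, -1 otherwise.
 */
static int example_check_features(bool snp_enabled, uint64_t requested,
                                  uint64_t supported)
{
    bool snp_requested = (requested & EXAMPLE_FEAT_SNP_ACTIVE) != 0;

    /* The SNP bit must be set exactly when SNP is enabled. */
    if (snp_enabled != snp_requested) {
        return -1;
    }
    /* Any bit outside the supported mask is rejected. */
    if (requested & ~supported) {
        return -1;
    }
    return 0;
}
```

When printing such 64-bit masks, the PRIX64 macro from <inttypes.h> is
portable across hosts, whereas %lX only matches uint64_t where long is
64 bits wide.
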

Re: [RFC 1/2] system/memory: Allow creating IOMMU mappings from RAM discard populate notifiers

2025-02-27 Thread Chenyi Qiang




On 2/27/2025 7:27 PM, David Hildenbrand wrote:
> On 27.02.25 04:26, Chenyi Qiang wrote:
>>
>>
>> On 2/26/2025 8:43 PM, Chenyi Qiang wrote:
>>>
>>>
>>> On 2/25/2025 5:41 PM, David Hildenbrand wrote:
 On 25.02.25 03:00, Chenyi Qiang wrote:
>
>
> On 2/21/2025 6:04 PM, Chenyi Qiang wrote:
>>
>>
>> On 2/21/2025 4:09 PM, David Hildenbrand wrote:
>>> On 21.02.25 03:25, Chenyi Qiang wrote:


 On 2/21/2025 3:39 AM, David Hildenbrand wrote:
> On 20.02.25 17:13, Jean-Philippe Brucker wrote:
>> For Arm CCA we'd like the guest_memfd discard notifier to call
>> the
>> IOMMU
>> notifiers and create e.g. VFIO mappings. The default VFIO discard
>> notifier isn't sufficient for CCA because the DMA addresses
>> need a
>> translation (even without vIOMMU).
>>
>> At the moment:
>> * guest_memfd_state_change() calls the populate() notifier
>> * the populate notifier() calls IOMMU notifiers
>> * the IOMMU notifier handler calls memory_get_xlat_addr() to get
>> a VA
>> * it calls ram_discard_manager_is_populated() which fails.
>>
>> guest_memfd_state_change() only changes the section's state after
>> calling the populate() notifier. We can't easily invert the
>> order of
>> operation because it uses the old state bitmap to know which
>> pages need
>> the populate() notifier.
>
> I assume we talk about this code: [1]
>
> [1] https://lkml.kernel.org/r/20250217081833.21568-1-
> chenyi.qi...@intel.com
>
>
> +static int memory_attribute_state_change(MemoryAttributeManager
> *mgr,
> uint64_t offset,
> + uint64_t size, bool
> shared_to_private)
> +{
> +    int block_size =
> memory_attribute_manager_get_block_size(mgr);
> +    int ret = 0;
> +
> +    if (!memory_attribute_is_valid_range(mgr, offset, size)) {
> +    error_report("%s, invalid range: offset 0x%lx, size
> 0x%lx",
> + __func__, offset, size);
> +    return -1;
> +    }
> +
> +    if ((shared_to_private &&
> memory_attribute_is_range_discarded(mgr,
> offset, size)) ||
> +    (!shared_to_private &&
> memory_attribute_is_range_populated(mgr,
> offset, size))) {
> +    return 0;
> +    }
> +
> +    if (shared_to_private) {
> +    memory_attribute_notify_discard(mgr, offset, size);
> +    } else {
> +    ret = memory_attribute_notify_populate(mgr, offset,
> size);
> +    }
> +
> +    if (!ret) {
> +    unsigned long first_bit = offset / block_size;
> +    unsigned long nbits = size / block_size;
> +
> +    g_assert((first_bit + nbits) <= mgr->bitmap_size);
> +
> +    if (shared_to_private) {
> +    bitmap_clear(mgr->shared_bitmap, first_bit, nbits);
> +    } else {
> +    bitmap_set(mgr->shared_bitmap, first_bit, nbits);
> +    }
> +
> +    return 0;
> +    }
> +
> +    return ret;
> +}
>
> Then, in memory_attribute_notify_populate(), we walk the bitmap
> again.
>
> Why?
>
> We just checked that it's all in the expected state, no?
>
>
> virtio-mem doesn't handle it that way, so I'm curious why we would
> have
> to do it here?

 I was concerned about the case where the guest issues a request
 for a range
 that is only partially in the desired state.
 I think the main problem is the policy for the guest conversion
 request.
 My current handling is:

 1. When a conversion request is made for a range already in the
 desired
  state, the helper simply returns success.
>>>
>>> Yes.
>>>
 2. For requests involving a range partially in the desired
 state, only
  the necessary segments are converted, ensuring the entire
 range
  complies with the request efficiently.
>>>
>>>
>>> Ah, now I get:
>>>
>>> +    if ((shared_to_private &&
>>> memory_attribute_is_range_discarded(mgr,
>>> offset, size)) ||
>>> +    (!shared_to_private &&
>>> memory_attribute_is_range_populated(mgr,
>>> offset, size))) {
>>> +    return 0;
>>> +    }
>>> +
>>>
>>> We're not failing if it might already partially be in the other
>>> state.
>>>
 3. In scenarios where a conversion request is

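The policy quoted above (return success if the whole range is already in
the requested state, otherwise convert only the blocks that still
differ) can be sketched as follows. This is a simplified stand-alone
illustration that uses a plain bool array in place of the real bitmap;
all names are hypothetical and no notifier calls are modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch: shared[i] is true when block i is shared
 * (populated). Converts blocks [first, first + n) to the requested
 * state and returns how many blocks actually changed; 0 means the
 * range was already entirely in the requested state.
 */
static size_t example_convert_range(bool *shared, size_t first, size_t n,
                                    bool to_private)
{
    bool want_shared = !to_private;
    size_t changed = 0;

    for (size_t i = first; i < first + n; i++) {
        if (shared[i] != want_shared) {
            /* Only blocks in the "other" state are touched. */
            shared[i] = want_shared;
            changed++;
        }
    }
    return changed;
}
```
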
Re: [PATCH 2/5] rust: pl011: move register definitions out of lib.rs

2025-02-27 Thread Peter Maydell

On Thu, 27 Feb 2025 at 16:48, Paolo Bonzini  wrote:
>
> Signed-off-by: Paolo Bonzini 
> ---
>  rust/hw/char/pl011/src/device.rs|   7 +-
>  rust/hw/char/pl011/src/lib.rs   | 509 +---
>  rust/hw/char/pl011/src/registers.rs | 507 +++
>  3 files changed, 513 insertions(+), 510 deletions(-)
>  create mode 100644 rust/hw/char/pl011/src/registers.rs

Looking at this patch I'm sorely tempted to suggest significantly
trimming down the commentary in these comments: it contains
rather more text cut-n-pasted from the PL011 TRM than I'm
entirely comfortable with, and much of it is detail that
is irrelevant to QEMU. I don't think we should be trying to
make it unnecessary for somebody working on the QEMU device
models to ever look at the hardware reference manuals.

thanks
-- PMM

Re: Problem with iotest 233

2025-02-27 Thread Eric Blake

On Wed, Feb 26, 2025 at 09:55:18AM +0100, Thomas Huth wrote:
> > > Though, that does not look like the thread from the simpletrace, but
> > > the QEMU RCU thread instead ... so no clue where that writer
> > > thread might have gone...
> > 
> > OK, I think I now understood the problem: qemu-nbd is calling
> > trace_init_backends() first, which creates the simpletrace threads and
> > installs the atexit() handler. Then it is calling fork() since the test
> > uses the --fork command line option. But fork() does not clone the
> > simpletrace thread into the new process, only the main thread (see
> > man-page of fork, the new process starts single-threaded). So when the
> > new child process exits, the exit handler calls the simple trace flush
> > function which tries to wait for a thread that has never been created in
> > that process.

That definitely explains the symptoms.

> > 
> > The test works when I move the trace_init_backends() behind the fork()
> > in the main function... but I am not sure if we would miss some logs
> > this way, so I don't know whether that's the right solution. Could a
> > qemu-nbd expert please have a look at this?

I'm also thinking about ways to avoid it.

> 
> After pondering about it for a while, maybe the best solution is to handle
> it within the simpletrace backend itself, by using pthread_atfork() :
> 
>  https://lore.kernel.org/qemu-devel/20250226085015.1143991-1-th...@redhat.com/

pthread_atfork() is an odd function - POSIX itself says it is
unreliable, because there is NO sane way you can possibly know every
action that any library you call into that might possibly need
protection before fork.  That doesn't mean we can't try it, just that
we can't expect it to solve every fork-related problem we might
encounter.
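
To make the failure mode concrete: only the calling thread exists in the
child after fork(), so any "a helper thread is running" state inherited
from the parent is stale there. Below is a minimal sketch, assuming a
hypothetical flag writer_running that an exit handler would consult, of
resetting that state with a pthread_atfork() child handler. It is not
the simpletrace code, and as said above it cannot cover state owned by
other libraries.

```c
#include <pthread.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical flag: "a writer thread exists in this process". */
static bool writer_running;

static void *writer_thread(void *arg)
{
    (void)arg;
    for (;;) {
        sleep(1);   /* stand-in for "wait for records"; cancellation point */
    }
}

/* Runs in the child right after fork(): the writer was not cloned. */
static void atfork_child(void)
{
    writer_running = false;
}

/* Returns the child's view of writer_running (0 or 1), or -1 on error. */
static int example_child_sees_writer(void)
{
    pthread_t tid;
    pid_t pid;
    int status = 0;

    if (pthread_atfork(NULL, NULL, atfork_child) != 0) {
        return -1;
    }
    if (pthread_create(&tid, NULL, writer_thread, NULL) != 0) {
        return -1;
    }
    writer_running = true;

    pid = fork();
    if (pid == 0) {
        /* Child: single-threaded, report what the flag says. */
        _exit(writer_running ? 1 : 0);
    }

    /* Parent: stop the helper thread before returning. */
    pthread_cancel(tid);
    pthread_join(tid, NULL);

    if (pid < 0 || waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

Without the child handler the child would still see the flag set and an
exit handler relying on it would wait for a thread that does not exist.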

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

[PATCH 5/5] hw/arm/versatilepb: Convert printfs to LOG_GUEST_ERROR

2025-02-27 Thread Peter Maydell

Convert some printf() calls for attempts to access nonexistent
registers into LOG_GUEST_ERROR logging.

Signed-off-by: Peter Maydell 
---
 hw/arm/versatilepb.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/arm/versatilepb.c b/hw/arm/versatilepb.c
index 941616cd25b..35766445fa4 100644
--- a/hw/arm/versatilepb.c
+++ b/hw/arm/versatilepb.c
@@ -27,6 +27,7 @@
 #include "qom/object.h"
 #include "audio/audio.h"
 #include "target/arm/cpu-qom.h"
+#include "qemu/log.h"
 
 #define VERSATILE_FLASH_ADDR 0x3400
 #define VERSATILE_FLASH_SIZE (64 * 1024 * 1024)
@@ -110,7 +111,8 @@ static uint64_t vpb_sic_read(void *opaque, hwaddr offset,
 case 8: /* PICENABLE */
 return s->pic_enable;
 default:
-printf ("vpb_sic_read: Bad register offset 0x%x\n", (int)offset);
+qemu_log_mask(LOG_GUEST_ERROR,
+  "vpb_sic_read: Bad register offset 0x%x\n", (int)offset);
 return 0;
 }
 }
@@ -144,7 +146,8 @@ static void vpb_sic_write(void *opaque, hwaddr offset,
 vpb_sic_update_pic(s);
 break;
 default:
-printf ("vpb_sic_write: Bad register offset 0x%x\n", (int)offset);
+qemu_log_mask(LOG_GUEST_ERROR,
+  "vpb_sic_write: Bad register offset 0x%x\n", 
(int)offset);
 return;
 }
 vpb_sic_update(s);
-- 
2.43.0

Re: [PATCH 00/10] vfio/igd: Remove legacy mode

2025-02-27 Thread Alex Williamson

On Tue, 25 Feb 2025 02:29:17 +0800
Tomita Moeko  wrote:

> This patchset removes some legacy checks and converts the legacy mode
> implicitly enabled by BDF 00:02.0 into x-igd-* options, including:
> * Removing PCI ROM BAR and VGA IO/MMIO range check before applying quirk
> * Using unified x-igd-opregion option for OpRegion access.
> * Introducing new x-igd-lpc option for the LPC bridge / Host bridge ID
>   quirk. Currently this is only supported on i440fx.
> * Extending quirk support when IGD is not assigned to BDF 00:02.0
> 
> The first 2 patches of this patchset were taken from a previous one;
> details can be found at:
> https://lore.kernel.org/all/20250124191245.12464-1-tomitamo...@gmail.com/
> 
> This patchset was mainly tested on Alder Lake UHD770, with Debian 12
> (kernel 6.1), Windows 11 (driver 32.0.101.6458) and Intel GOP driver
> 17.0.1081.
> 
> If the design is good to go, I will update the documentation also.
> 
> An open question is whether the old legacy mode behavior should be kept
> or not. Checking if all the conditions of legacy mode were met and
> toggling the corresponding options is more complicated than I expected :(
> Any ideas would be appreciated.

I dusted off a working IGD assignment configuration on an i7-7700 Kaby
Lake system with HD630 graphics.  This series breaks it a couple times.

First, while the host system itself may support UEFI (some systems may
not), I had dumped the VGA ROM in CSM mode and run the VM with SeaBIOS.
Therefore it needs VGA support.  This is the recommended configuration
for legacy mode in our current documentation.  Patch 4/ requires that I
add x-vga to continue using this VM.

Then of course since this VM is specifically using legacy mode, patch
10/ requires that I also add both x-igd-opregion and x-igd-lpc.

Therefore I go from a VM that works with IGD assignment in legacy mode,
with no unsupported options:

...
hvm
...

To one that requires multiple unsupported, experimental options:

http://libvirt.org/schemas/domain/qemu/1.0"; type="kvm">
...
hvm
...

  ...

I don't know how we justify this within our de facto support contract
with users.  Legacy mode needs to continue to exist and be
automatically enabled unless we want to figure out a deprecation
process for it and queue this sort of change for sometime in the
future.  Thanks,

Alex

Re: [PATCH v5 25/36] vfio/migration: Multifd device state transfer support - receive init/cleanup

2025-02-27 Thread Maciej S. Szmigiero


On 26.02.2025 18:46, Cédric Le Goater wrote:

On 2/19/25 21:34, Maciej S. Szmigiero wrote:

From: "Maciej S. Szmigiero" 

Add support for VFIOMultifd data structure that will contain most of the
receive-side data together with its init/cleanup methods.

Signed-off-by: Maciej S. Szmigiero 
---
  hw/vfio/migration-multifd.c   | 33 +
  hw/vfio/migration-multifd.h   |  8 
  hw/vfio/migration.c   | 29 +++--
  include/hw/vfio/vfio-common.h |  3 +++
  4 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
index 7328ad8e925c..c2defc0efef0 100644
--- a/hw/vfio/migration-multifd.c
+++ b/hw/vfio/migration-multifd.c
@@ -41,6 +41,9 @@ typedef struct VFIOStateBuffer {
  size_t len;
  } VFIOStateBuffer;
+typedef struct VFIOMultifd {
+} VFIOMultifd;
+
  static void vfio_state_buffer_clear(gpointer data)
  {
  VFIOStateBuffer *lb = data;
@@ -84,8 +87,38 @@ static VFIOStateBuffer 
*vfio_state_buffers_at(VFIOStateBuffers *bufs, guint idx)
  return &g_array_index(bufs->array, VFIOStateBuffer, idx);
  }
+VFIOMultifd *vfio_multifd_new(void)
+{
+    VFIOMultifd *multifd = g_new(VFIOMultifd, 1);
+
+    return multifd;
+}
+
+void vfio_multifd_free(VFIOMultifd *multifd)
+{
+    g_free(multifd);
+}
+
  bool vfio_multifd_transfer_supported(void)
  {
  return multifd_device_state_supported() &&
  migrate_send_switchover_start();
  }
+
+bool vfio_multifd_transfer_enabled(VFIODevice *vbasedev)
+{
+    return false;
+}
+
+bool vfio_multifd_transfer_setup(VFIODevice *vbasedev, Error **errp)
+{
+    if (vfio_multifd_transfer_enabled(vbasedev) &&
+    !vfio_multifd_transfer_supported()) {
+    error_setg(errp,
+   "%s: Multifd device transfer requested but unsupported in the 
current config",
+   vbasedev->name);
+    return false;
+    }
+
+    return true;
+}
diff --git a/hw/vfio/migration-multifd.h b/hw/vfio/migration-multifd.h
index 8fe004c1da81..1eefba3b2eed 100644
--- a/hw/vfio/migration-multifd.h
+++ b/hw/vfio/migration-multifd.h
@@ -12,6 +12,14 @@
  #include "hw/vfio/vfio-common.h"
+typedef struct VFIOMultifd VFIOMultifd;
+
+VFIOMultifd *vfio_multifd_new(void);
+void vfio_multifd_free(VFIOMultifd *multifd);
+
  bool vfio_multifd_transfer_supported(void);
+bool vfio_multifd_transfer_enabled(VFIODevice *vbasedev);
+
+bool vfio_multifd_transfer_setup(VFIODevice *vbasedev, Error **errp);
  #endif
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 7b79be6ad293..4311de763885 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -674,15 +674,40 @@ static void vfio_save_state(QEMUFile *f, void *opaque)
  static int vfio_load_setup(QEMUFile *f, void *opaque, Error **errp)
  {
  VFIODevice *vbasedev = opaque;
+    VFIOMigration *migration = vbasedev->migration;
+    int ret;
+
+    if (!vfio_multifd_transfer_setup(vbasedev, errp)) {
+    return -EINVAL;
+    }
+
+    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
+   migration->device_state, errp);
+    if (ret) {
+    return ret;
+    }
-    return vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
-    vbasedev->migration->device_state, errp);
+    if (vfio_multifd_transfer_enabled(vbasedev)) {
+    assert(!migration->multifd);
+    migration->multifd = vfio_multifd_new();


When called from vfio_load_setup(), I think vfio_multifd_transfer_setup()
should allocate migration->multifd at the same time. It would simplify
the setup to one step. Maybe we could add a bool parameter ? because,
IIRC, you didn't like the idea of allocating it always, that is in
vfio_save_setup() too.


I have added a "bool alloc_multifd" parameter to
vfio_multifd_transfer_setup() and renamed it to vfio_multifd_setup() for
consistency with vfio_multifd_cleanup().

Unexported vfio_multifd_new() now that it is called only from
vfio_multifd_setup() in the same translation unit.



For symmetry, could vfio_save_cleanup() call vfio_multifd_cleanup() too ?
a setup implies a cleanup.


Added vfio_multifd_cleanup() call to vfio_save_cleanup() with a comment
describing that it is currently a NOP.


Thanks,

C.


Thanks,
Maciej

Re: [PATCH v5 34/36] vfio/migration: Max in-flight VFIO device state buffer count limit

2025-02-27 Thread Maciej S. Szmigiero


On 27.02.2025 07:48, Cédric Le Goater wrote:

On 2/19/25 21:34, Maciej S. Szmigiero wrote:

From: "Maciej S. Szmigiero" 

Allow capping the maximum count of in-flight VFIO device state buffers
queued at the destination, otherwise a malicious QEMU source could
theoretically cause the target QEMU to allocate unlimited amounts of memory
for buffers-in-flight.

Since this is not expected to be a realistic threat in most of VFIO live
migration use cases and the right value depends on the particular setup
disable the limit by default by setting it to UINT64_MAX.

Signed-off-by: Maciej S. Szmigiero 
---
  hw/vfio/migration-multifd.c   | 14 ++
  hw/vfio/pci.c |  2 ++
  include/hw/vfio/vfio-common.h |  1 +
  3 files changed, 17 insertions(+)

diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
index 18a5ff964a37..04aa3f4a6596 100644
--- a/hw/vfio/migration-multifd.c
+++ b/hw/vfio/migration-multifd.c
@@ -53,6 +53,7 @@ typedef struct VFIOMultifd {
  QemuMutex load_bufs_mutex; /* Lock order: this lock -> BQL */
  uint32_t load_buf_idx;
  uint32_t load_buf_idx_last;
+    uint32_t load_buf_queued_pending_buffers;
  } VFIOMultifd;
  static void vfio_state_buffer_clear(gpointer data)
@@ -121,6 +122,15 @@ static bool vfio_load_state_buffer_insert(VFIODevice 
*vbasedev,
  assert(packet->idx >= multifd->load_buf_idx);
+    multifd->load_buf_queued_pending_buffers++;
+    if (multifd->load_buf_queued_pending_buffers >
+    vbasedev->migration_max_queued_buffers) {
+    error_setg(errp,
+   "queuing state buffer %" PRIu32 " would exceed the max of 
%" PRIu64,
+   packet->idx, vbasedev->migration_max_queued_buffers);
+    return false;
+    }
+
  lb->data = g_memdup2(&packet->data, packet_total_size - sizeof(*packet));
  lb->len = packet_total_size - sizeof(*packet);
  lb->is_present = true;
@@ -374,6 +384,9 @@ static bool vfio_load_bufs_thread(void *opaque, bool 
*should_quit, Error **errp)
  goto ret_signal;
  }
+    assert(multifd->load_buf_queued_pending_buffers > 0);
+    multifd->load_buf_queued_pending_buffers--;
+
  if (multifd->load_buf_idx == multifd->load_buf_idx_last - 1) {
  trace_vfio_load_state_device_buffer_end(vbasedev->name);
  }
@@ -408,6 +421,7 @@ VFIOMultifd *vfio_multifd_new(void)
  multifd->load_buf_idx = 0;
  multifd->load_buf_idx_last = UINT32_MAX;
+    multifd->load_buf_queued_pending_buffers = 0;
  qemu_cond_init(&multifd->load_bufs_buffer_ready_cond);
  multifd->load_bufs_thread_running = false;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 9111805ae06c..247418f0fce2 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3383,6 +3383,8 @@ static const Property vfio_pci_dev_properties[] = {
  vbasedev.migration_multifd_transfer,
  qdev_prop_on_off_auto_mutable, OnOffAuto,
  .set_default = true, .defval.i = ON_OFF_AUTO_AUTO),
+    DEFINE_PROP_UINT64("x-migration-max-queued-buffers", VFIOPCIDevice,
+   vbasedev.migration_max_queued_buffers, UINT64_MAX),


UINT64_MAX doesn't make sense to me. What would be a reasonable value ?


It's the value that effectively disables this limit.


Have you monitored the max ? Should we collect some statistics on this
value and raise a warning if a high water mark is reached ? I think
this would be more useful.


It's an additional mechanism, which is not expected to be necessary
in most real-world setups, hence it's disabled by default:

Since this is not expected to be a realistic threat in most of VFIO live
migration use cases and the right value depends on the particular setup
disable the limit by default by setting it to UINT64_MAX.


The minimum value that works with a particular setup depends on the number
of multifd channels, probably also the number of NIC queues, etc., so it's
not something we should propose a hard default for - unless it's a very
high default like 100 buffers, but then why have it set by default?

IMHO setting it to UINT64_MAX clearly shows that it is disabled by
default since it obviously couldn't be set higher.
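
As an illustration of why UINT64_MAX acts as "no limit" here, a minimal
sketch of the queue accounting with hypothetical names. It mirrors the
intent of the check in the patch but is not the QEMU code; unlike the
patch hunk, it only counts a buffer once it has been accepted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-alone model of the in-flight buffer accounting. */
typedef struct ExampleQueue {
    uint64_t pending;      /* buffers queued but not yet consumed */
    uint64_t max_pending;  /* UINT64_MAX effectively disables the cap */
} ExampleQueue;

/* Returns false when accepting one more buffer would exceed the cap. */
static bool example_queue_insert(ExampleQueue *q)
{
    if (q->pending >= q->max_pending) {
        return false;
    }
    q->pending++;
    return true;
}

/* Called when the load thread has consumed one queued buffer. */
static void example_queue_consume(ExampleQueue *q)
{
    if (q->pending > 0) {
        q->pending--;
    }
}
```

With max_pending set to UINT64_MAX the comparison can never reject an
insert in practice, which is the "disabled by default" behaviour
described above.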
 

  DEFINE_PROP_BOOL("migration-events", VFIOPCIDevice,
   vbasedev.migration_events, false),
  DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false),



Please add property documentation in vfio_pci_dev_class_init()



I'm not sure what you mean by that, vfio_pci_dev_class_init() doesn't
contain any documentation or even references to either
x-migration-max-queued-buffers or x-migration-multifd-transfer:

static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
{
DeviceClass *dc = DEVICE_CLASS(klass);
PCIDeviceClass *pdc = PCI_DEVICE_CLASS(klass);

device_class_set_legacy_reset(dc, vfio_pci_reset);
device_class_set_props(dc, vfio_pci_dev_properties);
#ifdef CONFIG_IOMMUFD
object_class_property_add_str(klass, "fd"

Re: [PATCH v5 25/36] vfio/migration: Multifd device state transfer support - receive init/cleanup

2025-02-27 Thread Maciej S. Szmigiero


On 26.02.2025 18:28, Cédric Le Goater wrote:

On 2/19/25 21:34, Maciej S. Szmigiero wrote:

From: "Maciej S. Szmigiero" 

Add support for VFIOMultifd data structure that will contain most of the
receive-side data together with its init/cleanup methods.

Signed-off-by: Maciej S. Szmigiero 
---
  hw/vfio/migration-multifd.c   | 33 +
  hw/vfio/migration-multifd.h   |  8 
  hw/vfio/migration.c   | 29 +++--
  include/hw/vfio/vfio-common.h |  3 +++
  4 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
index 7328ad8e925c..c2defc0efef0 100644
--- a/hw/vfio/migration-multifd.c
+++ b/hw/vfio/migration-multifd.c
@@ -41,6 +41,9 @@ typedef struct VFIOStateBuffer {
  size_t len;
  } VFIOStateBuffer;
+typedef struct VFIOMultifd {
+} VFIOMultifd;
+
  static void vfio_state_buffer_clear(gpointer data)
  {
  VFIOStateBuffer *lb = data;
@@ -84,8 +87,38 @@ static VFIOStateBuffer 
*vfio_state_buffers_at(VFIOStateBuffers *bufs, guint idx)
  return &g_array_index(bufs->array, VFIOStateBuffer, idx);
  }
+VFIOMultifd *vfio_multifd_new(void)
+{
+    VFIOMultifd *multifd = g_new(VFIOMultifd, 1);
+
+    return multifd;
+}
+
+void vfio_multifd_free(VFIOMultifd *multifd)
+{
+    g_free(multifd);
+}
+
  bool vfio_multifd_transfer_supported(void)
  {
  return multifd_device_state_supported() &&
  migrate_send_switchover_start();
  }
+
+bool vfio_multifd_transfer_enabled(VFIODevice *vbasedev)
+{
+    return false;
+}
+
+bool vfio_multifd_transfer_setup(VFIODevice *vbasedev, Error **errp)
+{
+    if (vfio_multifd_transfer_enabled(vbasedev) &&
+    !vfio_multifd_transfer_supported()) {
+    error_setg(errp,
+   "%s: Multifd device transfer requested but unsupported in the 
current config",
+   vbasedev->name);
+    return false;
+    }
+
+    return true;
+}
diff --git a/hw/vfio/migration-multifd.h b/hw/vfio/migration-multifd.h
index 8fe004c1da81..1eefba3b2eed 100644
--- a/hw/vfio/migration-multifd.h
+++ b/hw/vfio/migration-multifd.h
@@ -12,6 +12,14 @@
  #include "hw/vfio/vfio-common.h"
+typedef struct VFIOMultifd VFIOMultifd;
+
+VFIOMultifd *vfio_multifd_new(void);
+void vfio_multifd_free(VFIOMultifd *multifd);
+
  bool vfio_multifd_transfer_supported(void);
+bool vfio_multifd_transfer_enabled(VFIODevice *vbasedev);
+
+bool vfio_multifd_transfer_setup(VFIODevice *vbasedev, Error **errp);
  #endif
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 7b79be6ad293..4311de763885 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -674,15 +674,40 @@ static void vfio_save_state(QEMUFile *f, void *opaque)
  static int vfio_load_setup(QEMUFile *f, void *opaque, Error **errp)
  {
  VFIODevice *vbasedev = opaque;
+    VFIOMigration *migration = vbasedev->migration;
+    int ret;
+
+    if (!vfio_multifd_transfer_setup(vbasedev, errp)) {
+    return -EINVAL;
+    }
+
+    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
+   migration->device_state, errp);
+    if (ret) {
+    return ret;
+    }
-    return vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
-    vbasedev->migration->device_state, errp);
+    if (vfio_multifd_transfer_enabled(vbasedev)) {
+    assert(!migration->multifd);
+    migration->multifd = vfio_multifd_new();
+    }
+
+    return 0;
+}
+
+static void vfio_multifd_cleanup(VFIODevice *vbasedev)
+{
+    VFIOMigration *migration = vbasedev->migration;
+
+    g_clear_pointer(&migration->multifd, vfio_multifd_free);
  }


Please move vfio_multifd_cleanup() to migration-multifd.c.


Done now.


Thanks,

C.



Thanks,
Maciej

[PATCH v2 0/5] ppc/amigaone patches

2025-02-27 Thread BALATON Zoltan

Hello,

v2:
- change unused read function to g_assert_not_reached()
- new patch to add defines to constants
- added R-b tags

This series adds NVRAM and support for -kernel, -initrd and -append
options to the amigaone machine. This makes it easier to boot AmigaOS
and avoids a crash in the guest when it tries to access NVRAM.

While the -kernel option emulates what U-Boot passes to the kernel,
old Linux kernels for amigaone may not work with it for two reasons:
they come in the legacy U-Boot Multi-File image format, which QEMU
cannot read, and even after unpacking that and creating a kernel uImage
the kernel won't find PCI devices because it does not initialise them
correctly. This works when booted from U-Boot because U-Boot inits PCI
devices. So does my BBoot loader, which can be used to load AmigaOS, so
I don't intend to emulate that part of U-Boot.

I'd like this to be merged for the next release please. When merging
please update https://wiki.qemu.org/ChangeLog/10.0 with the following:

amigaone

Added support for NVRAM and -kernel, -initrd, -append command line
options. By default the NVRAM contents are not preserved between
sessions. To make them persistent, create a backing file with 'qemu-img
create -f raw nvram.bin 4k' and add -drive
if=mtd,format=raw,file=nvram.bin so that settings stored in NVRAM are
kept in the backing file and preserved between sessions.

To run AmigaOS with BBoot using the -kernel option at least BBoot
version 0.8 is needed. Older BBoot versions only work with -device
loader and cannot be used with -kernel on amigaone.

Regards,

BALATON Zoltan (5):
  ppc/amigaone: Simplify replacement dummy_fw
  ppc/amigaone: Implement NVRAM emulation
  ppc/amigaone: Add default environment
  ppc/amigaone: Add kernel and initrd support
  ppc/amigaone: Add #defines for memory map constants

 hw/ppc/amigaone.c | 284 +++---
 1 file changed, 271 insertions(+), 13 deletions(-)

-- 
2.30.9

Re: [PATCH] QIOChannelSocket: Flush zerocopy socket error queue on ENOBUF failure for sendmsg

2025-02-27 Thread Manish

Again really sorry, missed this due to some issue with my mail filters 
and came to know about it via qemu-devel weblink. :)


On 25/02/25 2:37 pm, Daniel P. Berrangé wrote:

On Fri, Feb 21, 2025 at 04:44:48AM -0500, Manish Mishra wrote:

We allocate extra metadata SKBs in case of zerocopy send. This metadata memory
is accounted for in the OPTMEM limit. If there is any error with sending
zerocopy data or if zerocopy was skipped, these metadata SKBs are queued in the
socket error queue. This error queue is freed when userspace reads it.

Usually, if there are continuous failures, we merge the metadata into a single
SKB and free another one. However, if there is any out-of-order processing or
an intermittent zerocopy failure, this error chain can grow significantly,
exhausting the OPTMEM limit. As a result, all new sendmsg requests fail to
allocate any new SKB, leading to an ENOBUF error.

IIUC, you are effectively saying that the migration code is calling
qio_channel_write() too many times before it calls qio_channel_flush().

Can you clarify what you mean by the "OPTMEM limit" here? I'm wondering
if this is potentially triggered by suboptimal tuning of the deployment
environment, or whether we need to document tuning better.



I replied to this on the other thread. Posting it again.

We allocate some memory for zerocopy metadata. This is not accounted in
tcp_send_queue, but it is accounted in optmem_limit.

https://github.com/torvalds/linux/blob/dd83757f6e686a2188997cb58b5975f744bb7786/net/core/skbuff.c#L1607

Also, when the zerocopy data is sent and acked, we try to free this
allocated skb, as we can see in the code below.

https://github.com/torvalds/linux/blob/dd83757f6e686a2188997cb58b5975f744bb7786/net/core/skbuff.c#L1751

If we get __msg_zerocopy_callback() on continuous ranges and
skb_zerocopy_notify_extend() passes, we merge the ranges and free up the
current skb. If that is not the case, we insert that skb in the error
queue and it won't be freed until we flush the error queue from userspace.
This is possible when either zerocopy packets are skipped in between, or
zerocopy is always skipped but we get out-of-order acks on packets. As a
result this error chain keeps growing, exhausting the optmem_limit, and
when a new zerocopy sendmsg request comes, it won't be able to allocate
the metadata and returns ENOBUFS.

I understand there is another bug behind why zerocopy packets are getting
skipped, which could be specific to our deployment. But in any case live
migrations should not fail; it is fine to mark zerocopy as skipped, but
not to fail?
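
The workaround under discussion (on ENOBUFS, flush the error queue once
and retry the send) can be sketched independently of the socket code.
Everything below is a simulation with hypothetical names: ExampleSock
stands in for the socket, errq_limit for the optmem limit, and
example_flush() for draining the error queue. It illustrates the shape
of the retry, not the QEMU implementation.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a zerocopy socket's error-queue accounting. */
typedef struct ExampleSock {
    int errq_len;    /* notifications parked in the simulated error queue */
    int errq_limit;  /* stand-in for the optmem limit */
} ExampleSock;

/* Simulated send: fails with ENOBUFS once the error queue is full. */
static int example_send(ExampleSock *s)
{
    if (s->errq_len >= s->errq_limit) {
        errno = ENOBUFS;
        return -1;
    }
    s->errq_len++;   /* pessimistic: every send leaves one notification */
    return 0;
}

/* Simulated flush: reading the error queue frees the parked entries. */
static void example_flush(ExampleSock *s)
{
    s->errq_len = 0;
}

/* Send, and on ENOBUFS flush the error queue and retry exactly once. */
static int example_send_retry_once(ExampleSock *s)
{
    bool flushed = false;

    for (;;) {
        if (example_send(s) == 0) {
            return 0;
        }
        if (errno != ENOBUFS || flushed) {
            return -1;   /* other error, or the flush did not help */
        }
        example_flush(s);
        flushed = true;
    }
}
```

Bounding the retry to one attempt keeps a genuinely exhausted limit from
turning into an endless loop.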



To workaround this, if we encounter an ENOBUF error with a zerocopy sendmsg,
we flush the error queue and retry once more.

Signed-off-by: Manish Mishra
---
  include/io/channel-socket.h |  1 +
  io/channel-socket.c | 52 -
  2 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
index ab15577d38..6cfc66eb5b 100644
--- a/include/io/channel-socket.h
+++ b/include/io/channel-socket.h
@@ -49,6 +49,7 @@ struct QIOChannelSocket {
  socklen_t remoteAddrLen;
  ssize_t zero_copy_queued;
  ssize_t zero_copy_sent;
+bool new_zero_copy_sent_success;
  };
  
  
diff --git a/io/channel-socket.c b/io/channel-socket.c

index 608bcf066e..c7f576290f 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -37,6 +37,11 @@
  
  #define SOCKET_MAX_FDS 16
  
+#ifdef QEMU_MSG_ZEROCOPY

+static int qio_channel_socket_flush_internal(QIOChannel *ioc,
+ Error **errp);
+#endif
+
  SocketAddress *
  qio_channel_socket_get_local_address(QIOChannelSocket *ioc,
   Error **errp)
@@ -65,6 +70,7 @@ qio_channel_socket_new(void)
  sioc->fd = -1;
  sioc->zero_copy_queued = 0;
  sioc->zero_copy_sent = 0;
+sioc->new_zero_copy_sent_success = FALSE;
  
  ioc = QIO_CHANNEL(sioc);

  qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
@@ -566,6 +572,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
  size_t fdsize = sizeof(int) * nfds;
  struct cmsghdr *cmsg;
  int sflags = 0;
+bool zero_copy_flush_pending = TRUE;
  
  memset(control, 0, CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS));
  
@@ -612,9 +619,21 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,

  goto retry;
  case ENOBUFS:
  if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
-error_setg_errno(errp, errno,
- "Process can't lock enough memory for using MSG_ZEROCOPY");
-return -1;
+if (zero_copy_flush_pending) {
+ret = qio_channel_socket_flush_internal(ioc, errp);

Calling this is problematic, because qio_chan

[PATCH 0/5] hw/arm: Remove printf() calls

2025-02-27 Thread Peter Maydell

I noticed while looking at the sx1 functional tests that
the omap1 device emulation code prints to stdout
"omap_clkm_write: clocking scheme set to synchronous scalable"
which the test dutifully captures to its default.log.

Printing this kind of debug or information message to stdout
is definitely not something we do any more; so seeing it
prompted me to clean this up for hw/arm:
 * printf()s for guest errors or unimplemented functionality
   should be qemu_log_mask()
 * printf()s for minor informational things should be
   tracepoints
 * printf()s for debug that have been ifdeffed out forever
   could in theory be tracepoints, but I didn't feel it
   worth the effort of conversion in this case: I doubt that
   anybody will be trying to debug this code or that the
   specific handful of debug ifdefs would be what they'll want
   anyway. If anybody ever does need to do debug here, they can
   add the tracepoints that are actually useful to them.
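
For readers who have not used mask-gated logging, the idea behind the
first bullet can be sketched in stand-alone C. This is not QEMU's
implementation: sk_log_mask(), the SK_LOG_* bits and the two globals are
hypothetical names chosen for the sketch. The point is only that a
message is emitted when its category has been enabled, and is otherwise
dropped, instead of always going to stdout.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical log categories, one bit each. */
enum {
    SK_LOG_GUEST_ERROR = 1 << 0,    /* guest did something invalid */
    SK_LOG_UNIMP       = 1 << 1,    /* guest used an unmodelled feature */
};

static unsigned sk_log_enabled;     /* enabled categories; 0 means silent */
static FILE *sk_log_file;           /* NULL means stderr */

/*
 * Print the message only if its category is enabled.
 * Returns the number of characters written, or 0 if it was filtered.
 */
static int sk_log_mask(unsigned mask, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (!(sk_log_enabled & mask)) {
        return 0;
    }
    va_start(ap, fmt);
    n = vfprintf(sk_log_file ? sk_log_file : stderr, fmt, ap);
    va_end(ap);
    return n;
}
```

A device model would then call something like
sk_log_mask(SK_LOG_UNIMP, "%s: loopback not modelled\n", __func__) and
stay quiet by default. In QEMU itself the corresponding categories are
enabled with the -d option (for example -d guest_errors,unimp).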

thanks
-- PMM

Peter Maydell (5):
  hw/arm/omap1: Convert raw printfs to qemu_log_mask()
  hw/arm/omap1: Drop ALMDEBUG ifdeffed out code
  hw/arm/omap1: Convert information printfs to tracepoints
  hw/arm/omap_sx1.c: Remove ifdeffed out debug printf
  hw/arm/versatilepb: Convert printfs to LOG_GUEST_ERROR

 hw/arm/omap1.c   | 126 ---
 hw/arm/omap_sx1.c|   4 --
 hw/arm/versatilepb.c |   7 ++-
 hw/arm/trace-events  |   7 +++
 4 files changed, 57 insertions(+), 87 deletions(-)

-- 
2.43.0

[PATCH 4/5] hw/arm/omap_sx1.c: Remove ifdeffed out debug printf

2025-02-27 Thread Peter Maydell

Remove an ifdeffed out debug printf from the static_write() function in
omap_sx1.c. In theory we could turn this into a tracepoint, but for
code this old it doesn't seem worthwhile. We can add tracepoints if
and when we have a reason to debug something.

Signed-off-by: Peter Maydell 
---
 hw/arm/omap_sx1.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/hw/arm/omap_sx1.c b/hw/arm/omap_sx1.c
index c6b0bed0796..24b40431832 100644
--- a/hw/arm/omap_sx1.c
+++ b/hw/arm/omap_sx1.c
@@ -76,10 +76,6 @@ static uint64_t static_read(void *opaque, hwaddr offset,
 static void static_write(void *opaque, hwaddr offset,
  uint64_t value, unsigned size)
 {
-#ifdef SPY
-printf("%s: value %" PRIx64 " %u bytes written at 0x%x\n",
-__func__, value, size, (int)offset);
-#endif
 }
 
 static const MemoryRegionOps static_ops = {
-- 
2.43.0

Re: [PATCH 3/4] ppc/amigaone: Add default environment

2025-02-27 Thread BALATON Zoltan


On Thu, 27 Feb 2025, Nicholas Piggin wrote:

On Thu Feb 27, 2025 at 12:18 PM AEST, BALATON Zoltan wrote:

On Thu, 27 Feb 2025, Nicholas Piggin wrote:

On Sun Feb 23, 2025 at 3:52 AM AEST, BALATON Zoltan wrote:

Initialise empty NVRAM with default values. This also enables IDE UDMA
mode in AmigaOS, which is faster but normally has to be enabled in the
environment because of problems on real hardware. Those problems do not
affect emulation, so we can use the faster defaults here.


So this overwrites a blank NVRAM file. Okay I suppose if that works.


We're emulating what U-Boot does. If it does not find a valid environment
it will overwrite with defaults.


AFAIKS u-boot provides a default environment if the CRC does not match.

If all-zeros env was created with correct CRC, IMO it should be
accepted.

Does u-boot write back a default environment or corrected CRC to NVRAM
if it was missing/bad?


No, U-Boot replaces the in memory copy with the default, the NVRAM is only 
changed if saveenv command is issued. I don't want to 100% emulate U-Boot 
but make it easier for users, see below.
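
For reference, the "CRC in front of the variables" layout being discussed
can be sketched like this. It is a stand-alone illustration under stated
assumptions, not the code from the patch: the sk_* names and SK_ENV_SIZE
are hypothetical, the CRC is stored big-endian in the first four bytes
(as the cpu_to_be32() in the patch suggests), and the CRC routine is a
plain bitwise CRC-32 that is meant to match zlib's crc32().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_ENV_SIZE 4096    /* hypothetical: whole NVRAM image, CRC included */

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), zlib-compatible. */
static uint32_t sk_crc32(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xffffffffu;

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial if the low bit was set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

static uint32_t sk_load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void sk_store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* The environment is valid if the stored CRC matches the data after it. */
static int sk_env_valid(const uint8_t *nvram)
{
    return sk_load_be32(nvram) == sk_crc32(nvram + 4, SK_ENV_SIZE - 4);
}

/*
 * Install a default environment: "key=value\0" strings ending in an
 * extra "\0", padded with zeros, with a fresh CRC in front.
 */
static void sk_env_set_default(uint8_t *nvram, const char *env, size_t env_len)
{
    memset(nvram + 4, 0, SK_ENV_SIZE - 4);
    memcpy(nvram + 4, env, env_len);
    sk_store_be32(nvram, sk_crc32(nvram + 4, SK_ENV_SIZE - 4));
}
```

A loader in this style would call sk_env_valid() first and fall back to
sk_env_set_default() only when the check fails. Whether the fallback is
also written back to the backing file is exactly the policy question
raised in this thread.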



These defaults are to get the same
behaviour and additionally to enable UDMA for IDE driver that until now
had to be done manually to get the same speed as with pegasos2 where this
is enabled by default. (That's because these VIA VT82C686B chips had some
issues with DMA even in PCs but the emulated devices work so can be
enabled and that's faster on QEMU too.)


You could have a property to supply the default environment
alternatively.


U-Boot has the defaults hard coded. I set and use it from the same file so
I don't see the need to send this through a property. (Properties are also
listed in QEMU Monitor with info qtree and I don't know how a long string
with embedded zeros would look there so I don't think that's a good idea.)


Anywhere to document this behaviour for users?


I've added docs in the cover letter that I hope would end up in the
changelog and I have a separate page for this which I'll update at
qmiga.codeberg.page (Haven't done that yet but would do it by the
release once this is merged.)


Okay. It would still be better to add QEMU options to docs/ files and
point the amiga pages to that rather than the other way around, since
this is QEMU specific stuff. Doc in cover letter unfortunately is
not very good since it just gets lost.


I hope that part of the cover letter would end up in the QEMU changelog so 
it will be in QEMU docs. The amigang.rst doc specifically documents how to 
run Linux on these machines and these patches don't affect Linux nor work 
with it so I left them out to not confuse readers with options they don't 
need.





Signed-off-by: BALATON Zoltan 
---
 hw/ppc/amigaone.c | 37 -
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/amigaone.c b/hw/ppc/amigaone.c
index 5273543460..35e4075cc3 100644
--- a/hw/ppc/amigaone.c
+++ b/hw/ppc/amigaone.c
@@ -52,6 +52,28 @@ static const char dummy_fw[] = {
 #define NVRAM_ADDR 0xfd0e0000
 #define NVRAM_SIZE (4 * KiB)

+static char default_env[] =
+"baudrate=115200\0"
+"stdout=vga\0"
+"stdin=ps2kbd\0"
+"bootcmd=boota; menu; run menuboot_cmd\0"
+"boot1=ide\0"
+"boot2=cdrom\0"
+"boota_timeout=3\0"
+"ide_doreset=on\0"
+"pci_irqa=9\0"
+"pci_irqa_select=level\0"
+"pci_irqb=10\0"
+"pci_irqb_select=level\0"
+"pci_irqc=11\0"
+"pci_irqc_select=level\0"
+"pci_irqd=7\0"
+"pci_irqd_select=level\0"


Hmm, the u-boot default env (before it was removed) selected
edge for these. Was that wrong?


What was in upstream U-Boot wasn't what was in the binary that came with 
the machine. The binary has these defaults and AmigaOS depends on it, it 
does not work with edge as found for the pegasos2 where the default was 
edge and that resulted in missed IRQs.



+"a1ide_irq=\0"
+"a1ide_xfer=\0";
+#define CRC32_DEFAULT_ENV 0xb5548481
+#define CRC32_ALL_ZEROS   0x603b0489
+
 #define TYPE_A1_NVRAM "a1-nvram"
 OBJECT_DECLARE_SIMPLE_TYPE(A1NVRAMState, A1_NVRAM)

@@ -97,7 +119,7 @@ static void nvram_realize(DeviceState *dev, Error **errp)
 {
 A1NVRAMState *s = A1_NVRAM(dev);
 void *p;
-uint32_t *c;
+uint32_t crc, *c;

 memory_region_init_rom_device(&s->mr, NULL, &nvram_ops, s, "nvram",
   NVRAM_SIZE, &error_fatal);
@@ -116,12 +138,25 @@ static void nvram_realize(DeviceState *dev, Error **errp)
 return;
 }
 }
+crc = crc32(0, p + 4, NVRAM_SIZE - 4);
+if (crc == CRC32_ALL_ZEROS) { /* If env is uninitialized set default */
+*c = cpu_to_be32(CRC32_DEFAULT_ENV);
+/* Also copies terminating \0 as env is terminated by \0\0 */
+memcpy(p + 4, default_env, sizeof(default_env));
+if (s->blk) {
+blk_pwrite(s->blk, 0, sizeof(crc) + sizeof(default_env), p, 0);
+}
+return;
+}
 if (*c == 0) {
 *c = cpu_to_be32(crc32(0, p + 4, NVRAM

Re: [PATCH v7 10/16] docs/system: Add documentation on support for IGVM

2025-02-27 Thread Gupta, Pankaj


On 2/27/2025 3:29 PM, Roy Hopkins wrote:

IGVM support has been implemented for Confidential Guests that support
AMD SEV and AMD SEV-ES. Add some documentation that gives some
background on the IGVM format and how to use it to configure a
confidential guest.

Signed-off-by: Roy Hopkins 
Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Stefano Garzarella 
Acked-by: Michael S. Tsirkin 
---
  docs/system/i386/amd-memory-encryption.rst |   2 +
  docs/system/igvm.rst   | 173 +
  docs/system/index.rst  |   1 +
  3 files changed, 176 insertions(+)
  create mode 100644 docs/system/igvm.rst

diff --git a/docs/system/i386/amd-memory-encryption.rst b/docs/system/i386/amd-memory-encryption.rst
index 748f5094ba..6c23f3535f 100644
--- a/docs/system/i386/amd-memory-encryption.rst
+++ b/docs/system/i386/amd-memory-encryption.rst
@@ -1,3 +1,5 @@
+.. _amd-sev:
+
  AMD Secure Encrypted Virtualization (SEV)
  =========================================
  
diff --git a/docs/system/igvm.rst b/docs/system/igvm.rst

new file mode 100644
index 00..36146a81df
--- /dev/null
+++ b/docs/system/igvm.rst
@@ -0,0 +1,173 @@
+Independent Guest Virtual Machine (IGVM) support
+
+
+IGVM files are designed to encapsulate all the information required to launch a
+virtual machine on any given virtualization stack in a deterministic way. This
+allows the cryptographic measurement of initial guest state for Confidential
+Guests to be calculated when the IGVM file is built, allowing a relying party to
+verify the initial state of a guest via a remote attestation.
+
+Although IGVM files are designed with Confidential Computing in mind, they can
+also be used to configure non-confidential guests. Multiple platforms can be
+defined by a single IGVM file, allowing a single IGVM file to configure a
+virtual machine that can run on, for example, TDX, SEV and non-confidential
+hosts.
+
+QEMU supports IGVM files through the user-creatable ``igvm-cfg`` object. This
+object is used to define the filename of the IGVM file to process. A reference
+to the object is added to the ``-machine`` to configure the virtual machine
+to use the IGVM file for configuration.
+
+Confidential platform support is provided through the use of
+the ``ConfidentialGuestSupport`` object. If the virtual machine provides an
+instance of this object then this is used by the IGVM loader to configure the
+isolation properties of the directives within the file.
+
+Further Information on IGVM
+---------------------------
+
+Information about the IGVM format, including links to the format specification
+and documentation for the Rust and C libraries can be found at the project
+repository:
+
+https://github.com/microsoft/igvm
+
+
+Supported Platforms
+-------------------
+
+Currently, IGVM files can be provided for Confidential Guests on host systems
+that support AMD SEV, SEV-ES and SEV-SNP with KVM. IGVM files can also be
+provided for non-confidential guests.
+
+
+Limitations when using IGVM with AMD SEV, SEV-ES and SEV-SNP
+
+
+IGVM files configure the initial state of the guest using a set of directives.
+Not every directive is supported by every Confidential Guest type. For example,
+AMD SEV does not support encrypted save state regions, therefore setting the
+initial CPU state using IGVM for SEV is not possible. When an IGVM file contains
+directives that are not supported for the active platform, an error is generated
+and the guest launch is aborted.
+
+The table below describes the list of directives that are supported for SEV,
+SEV-ES, SEV-SNP and non-confidential platforms.
+
+.. list-table:: SEV, SEV-ES, SEV-SNP & non-confidential Supported Directives
+   :widths: 35 65
+   :header-rows: 1
+
+   * - IGVM directive
+ - Notes
+   * - IGVM_VHT_PAGE_DATA
+ - ``NORMAL`` zero, measured and unmeasured page types are supported. Other
+   page types result in an error.
+   * - IGVM_VHT_PARAMETER_AREA
+ -
+   * - IGVM_VHT_PARAMETER_INSERT
+ -
+   * - IGVM_VHT_VP_COUNT_PARAMETER
+ - The guest parameter page is populated with the CPU count.
+   * - IGVM_VHT_ENVIRONMENT_INFO_PARAMETER
+ - The ``memory_is_shared`` parameter is set to 1 in the guest parameter
+   page.
+
+.. list-table:: Additional SEV, SEV-ES & SEV_SNP Supported Directives
+   :widths: 25 75
+   :header-rows: 1
+
+   * - IGVM directive
+ - Notes
+   * - IGVM_VHT_MEMORY_MAP
+ - The memory map page is populated using entries from the E820 table.
+   * - IGVM_VHT_REQUIRED_MEMORY
+ -


Is this '-' superfluous? Or maybe you want to describe it here?

Other than that:

Reviewed-by: Pankaj Gupta 



+
+.. list-table:: Additional SEV-ES & SEV-SNP Supported Directives
+   :widths: 25 75
+   :header-rows: 1
+
+   * - IGVM directive
+ - Notes
+   * - IGVM_VHT_VP_CONTEXT
+ - Setting of the initial CPU state for t

[PATCH 2/5] hw/arm/omap1: Drop ALMDEBUG ifdeffed out code

2025-02-27 Thread Peter Maydell

In omap1.c, there are some debug printfs in the omap_rtc_write()
function that are guarded by ifdef ALMDEBUG. ALMDEBUG is never
set, so this is all dead code.

It's not worth the effort of converting all of these to tracepoints;
a modern tracepoint approach would probably have a single tracepoint
covering all the register writes anyway. Just delete the printf()s.

Signed-off-by: Peter Maydell 
---
 hw/arm/omap1.c | 51 --
 1 file changed, 51 deletions(-)

diff --git a/hw/arm/omap1.c b/hw/arm/omap1.c
index 3c0ce5e0979..8f5bb81c96a 100644
--- a/hw/arm/omap1.c
+++ b/hw/arm/omap1.c
@@ -2660,25 +2660,16 @@ static void omap_rtc_write(void *opaque, hwaddr addr,
 
 switch (offset) {
 case 0x00: /* SECONDS_REG */
-#ifdef ALMDEBUG
-printf("RTC SEC_REG <-- %02x\n", value);
-#endif
 s->ti -= s->current_tm.tm_sec;
 s->ti += from_bcd(value);
 return;
 
 case 0x04: /* MINUTES_REG */
-#ifdef ALMDEBUG
-printf("RTC MIN_REG <-- %02x\n", value);
-#endif
 s->ti -= s->current_tm.tm_min * 60;
 s->ti += from_bcd(value) * 60;
 return;
 
 case 0x08: /* HOURS_REG */
-#ifdef ALMDEBUG
-printf("RTC HRS_REG <-- %02x\n", value);
-#endif
 s->ti -= s->current_tm.tm_hour * 3600;
 if (s->pm_am) {
 s->ti += (from_bcd(value & 0x3f) & 12) * 3600;
@@ -2688,17 +2679,11 @@ static void omap_rtc_write(void *opaque, hwaddr addr,
 return;
 
 case 0x0c: /* DAYS_REG */
-#ifdef ALMDEBUG
-printf("RTC DAY_REG <-- %02x\n", value);
-#endif
 s->ti -= s->current_tm.tm_mday * 86400;
 s->ti += from_bcd(value) * 86400;
 return;
 
 case 0x10: /* MONTHS_REG */
-#ifdef ALMDEBUG
-printf("RTC MTH_REG <-- %02x\n", value);
-#endif
 memcpy(&new_tm, &s->current_tm, sizeof(new_tm));
 new_tm.tm_mon = from_bcd(value);
 ti[0] = mktimegm(&s->current_tm);
@@ -2715,9 +2700,6 @@ static void omap_rtc_write(void *opaque, hwaddr addr,
 return;
 
 case 0x14: /* YEARS_REG */
-#ifdef ALMDEBUG
-printf("RTC YRS_REG <-- %02x\n", value);
-#endif
 memcpy(&new_tm, &s->current_tm, sizeof(new_tm));
 new_tm.tm_year += from_bcd(value) - (new_tm.tm_year % 100);
 ti[0] = mktimegm(&s->current_tm);
@@ -2737,25 +2719,16 @@ static void omap_rtc_write(void *opaque, hwaddr addr,
 return;/* Ignored */
 
 case 0x20: /* ALARM_SECONDS_REG */
-#ifdef ALMDEBUG
-printf("ALM SEC_REG <-- %02x\n", value);
-#endif
 s->alarm_tm.tm_sec = from_bcd(value);
 omap_rtc_alarm_update(s);
 return;
 
 case 0x24: /* ALARM_MINUTES_REG */
-#ifdef ALMDEBUG
-printf("ALM MIN_REG <-- %02x\n", value);
-#endif
 s->alarm_tm.tm_min = from_bcd(value);
 omap_rtc_alarm_update(s);
 return;
 
 case 0x28: /* ALARM_HOURS_REG */
-#ifdef ALMDEBUG
-printf("ALM HRS_REG <-- %02x\n", value);
-#endif
 if (s->pm_am)
 s->alarm_tm.tm_hour =
 ((from_bcd(value & 0x3f)) % 12) +
@@ -2766,33 +2739,21 @@ static void omap_rtc_write(void *opaque, hwaddr addr,
 return;
 
 case 0x2c: /* ALARM_DAYS_REG */
-#ifdef ALMDEBUG
-printf("ALM DAY_REG <-- %02x\n", value);
-#endif
 s->alarm_tm.tm_mday = from_bcd(value);
 omap_rtc_alarm_update(s);
 return;
 
 case 0x30: /* ALARM_MONTHS_REG */
-#ifdef ALMDEBUG
-printf("ALM MON_REG <-- %02x\n", value);
-#endif
 s->alarm_tm.tm_mon = from_bcd(value);
 omap_rtc_alarm_update(s);
 return;
 
 case 0x34: /* ALARM_YEARS_REG */
-#ifdef ALMDEBUG
-printf("ALM YRS_REG <-- %02x\n", value);
-#endif
 s->alarm_tm.tm_year = from_bcd(value);
 omap_rtc_alarm_update(s);
 return;
 
 case 0x40: /* RTC_CTRL_REG */
-#ifdef ALMDEBUG
-printf("RTC CONTROL <-- %02x\n", value);
-#endif
 s->pm_am = (value >> 3) & 1;
 s->auto_comp = (value >> 2) & 1;
 s->round = (value >> 1) & 1;
@@ -2802,32 +2763,20 @@ static void omap_rtc_write(void *opaque, hwaddr addr,
 return;
 
 case 0x44: /* RTC_STATUS_REG */
-#ifdef ALMDEBUG
-printf("RTC STATUSL <-- %02x\n", value);
-#endif
 s->status &= ~((value & 0xc0) ^ 0x80);
 omap_rtc_interrupts_update(s);
 return;
 
 case 0x48: /* RTC_INTERRUPTS_REG */
-#ifdef ALMDEBUG
-printf("RTC INTRS <-- %02x\n", value);
-#endif
 s->interrupts = value;
 return;
 
 case 0x4c: /* RTC_COMP_LSB_REG */
-#ifdef ALMDEBUG
-printf("RTC COMPLSB <-- %02x\n", value);
-#endif
 s->comp_reg &= 0xff00;
 s->comp_reg |= 0x00ff & value;
 return;
 
 case 0x50: /* RTC_COMP_MSB_REG */
-#ifdef ALMDEBUG
-printf("RTC COMPMSB <-- %02x\n", value);
-#endif
 s->comp_reg &= 0x00ff;
 s->comp_reg |= 0xff00 & (value << 8);
 return;
-- 
2.43.0

Re: [PATCH v7 38/52] i386/apic: Skip kvm_apic_put() for TDX

2025-02-27 Thread Francesco Lavra

On Fri, 2025-01-24 at 08:20 -0500, Xiaoyao Li wrote:
> KVM neither allows writing to MSR_IA32_APICBASE for TDs, nor allows
> KVM_SET_LAPIC[*].
> 
> Note, KVM_GET_LAPIC is also disallowed for TDX. It is called in the path
> 
>   do_kvm_cpu_synchronize_state()
>   -> kvm_arch_get_registers()
>  -> kvm_get_apic()
> 
> and it's already disallowed for confidential guests through
> guest_state_protected.
> 
> [*] https://lore.kernel.org/all/z3w4ku4jq0crt...@google.com/
> 
> Signed-off-by: Xiaoyao Li 
> ---
>  hw/i386/kvm/apic.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
> index 757510600098..a1850524a67f 100644
> --- a/hw/i386/kvm/apic.c
> +++ b/hw/i386/kvm/apic.c
> @@ -17,6 +17,7 @@
>  #include "system/hw_accel.h"
>  #include "system/kvm.h"
>  #include "kvm/kvm_i386.h"
> +#include "kvm/tdx.h"
>  
>  static inline void kvm_apic_set_reg(struct kvm_lapic_state *kapic,
>  int reg_id, uint32_t val)
> @@ -141,6 +142,10 @@ static void kvm_apic_put(CPUState *cs, run_on_cpu_data data)
>  struct kvm_lapic_state kapic;
>  int ret;
>  
> +    if(is_tdx_vm()) {

Missing space between if and (.
scripts/checkpatch.pl would have caught this.

Re: [PATCH] tests/tcg: Suppress compiler false-positive warning on sha1.c

2025-02-27 Thread Richard Henderson


On 2/27/25 06:13, Peter Maydell wrote:

GCC versions at least 12 through 15 incorrectly report a warning
about code in sha1.c:

tests/tcg/multiarch/sha1.c:161:13: warning: ‘SHA1Transform’ reading 64 bytes from a region of size 0 [-Wstringop-overread]
   161 | SHA1Transform(context->state, &data[i]);
   | ^~~

This is a piece of stock library code for doing SHA1 which we've
simply copied, rather than writing ourselves. The bug has been
reported to upstream GCC (about a different library's use of this
code):
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106709

For our test case, since this isn't our original code and there isn't
actually a bug in it, suppress the incorrect warning rather than
trying to modify the code to work around the compiler issue.

Resolves:https://gitlab.com/qemu-project/qemu/-/issues/2328
Signed-off-by: Peter Maydell
---
  tests/tcg/aarch64/Makefile.target   | 3 ++-
  tests/tcg/arm/Makefile.target   | 3 ++-
  tests/tcg/multiarch/Makefile.target | 8 
  3 files changed, 12 insertions(+), 2 deletions(-)


Reviewed-by: Richard Henderson 

r~

[PATCH 1/5] hw/arm/omap1: Convert raw printfs to qemu_log_mask()

2025-02-27 Thread Peter Maydell

omap1.c is very old code, and it contains numerous calls direct to
printf() for various error and information cases.

In this commit, convert the printf() calls that are for either guest
error or unimplemented functionality to qemu_log_mask() calls.

This leaves the printf() calls that are informative or which are
ifdeffed-out debug statements untouched.

Signed-off-by: Peter Maydell 
---
 hw/arm/omap1.c | 48 +++-
 1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/hw/arm/omap1.c b/hw/arm/omap1.c
index ca2eb0d1576..3c0ce5e0979 100644
--- a/hw/arm/omap1.c
+++ b/hw/arm/omap1.c
@@ -2559,8 +2559,9 @@ static void omap_rtc_interrupts_update(struct omap_rtc_s *s)
 static void omap_rtc_alarm_update(struct omap_rtc_s *s)
 {
 s->alarm_ti = mktimegm(&s->alarm_tm);
-if (s->alarm_ti == -1)
-printf("%s: conversion failed\n", __func__);
+if (s->alarm_ti == -1) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: conversion failed\n", __func__);
+}
 }
 
 static uint64_t omap_rtc_read(void *opaque, hwaddr addr, unsigned size)
@@ -3024,8 +3025,9 @@ static void omap_mcbsp_source_tick(void *opaque)
 
 if (!s->rx_rate)
 return;
-if (s->rx_req)
-printf("%s: Rx FIFO overrun\n", __func__);
+if (s->rx_req) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Rx FIFO overrun\n", __func__);
+}
 
 s->rx_req = s->rx_rate << bps[(s->rcr[0] >> 5) & 7];
 
@@ -3070,8 +3072,9 @@ static void omap_mcbsp_sink_tick(void *opaque)
 
 if (!s->tx_rate)
 return;
-if (s->tx_req)
-printf("%s: Tx FIFO underrun\n", __func__);
+if (s->tx_req) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Tx FIFO underrun\n", __func__);
+}
 
 s->tx_req = s->tx_rate << bps[(s->xcr[0] >> 5) & 7];
 
@@ -3173,7 +3176,7 @@ static uint64_t omap_mcbsp_read(void *opaque, hwaddr addr,
 /* Fall through.  */
 case 0x02: /* DRR1 */
 if (s->rx_req < 2) {
-printf("%s: Rx FIFO underrun\n", __func__);
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Rx FIFO underrun\n", __func__);
 omap_mcbsp_rx_done(s);
 } else {
 s->tx_req -= 2;
@@ -3278,8 +3281,9 @@ static void omap_mcbsp_writeh(void *opaque, hwaddr addr,
 }
 if (s->tx_req < 2)
 omap_mcbsp_tx_done(s);
-} else
-printf("%s: Tx FIFO overrun\n", __func__);
+} else {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Tx FIFO overrun\n", __func__);
+}
 return;
 
 case 0x08: /* SPCR2 */
@@ -3293,8 +3297,11 @@ static void omap_mcbsp_writeh(void *opaque, hwaddr addr,
 case 0x0a: /* SPCR1 */
 s->spcr[0] &= 0x0006;
 s->spcr[0] |= 0xf8f9 & value;
-if (value & (1 << 15)) /* DLB */
-printf("%s: Digital Loopback mode enable attempt\n", __func__);
+if (value & (1 << 15)) {/* DLB */
+qemu_log_mask(LOG_UNIMP,
+  "%s: Digital Loopback mode enable attempt\n",
+  __func__);
+}
 if (~value & 1) {  /* RRST */
 s->spcr[0] &= ~6;
 s->rx_req = 0;
@@ -3325,13 +3332,19 @@ static void omap_mcbsp_writeh(void *opaque, hwaddr addr,
 return;
 case 0x18: /* MCR2 */
 s->mcr[1] = value & 0x03e3;
-if (value & 3) /* XMCM */
-printf("%s: Tx channel selection mode enable attempt\n", __func__);
+if (value & 3) {/* XMCM */
+qemu_log_mask(LOG_UNIMP,
+  "%s: Tx channel selection mode enable attempt\n",
+  __func__);
+}
 return;
 case 0x1a: /* MCR1 */
 s->mcr[0] = value & 0x03e1;
-if (value & 1) /* RMCM */
-printf("%s: Rx channel selection mode enable attempt\n", __func__);
+if (value & 1) {/* RMCM */
+qemu_log_mask(LOG_UNIMP,
+  "%s: Rx channel selection mode enable attempt\n",
+  __func__);
+}
 return;
 case 0x1c: /* RCERA */
 s->rcer[0] = value & 0xffff;
@@ -3412,8 +3425,9 @@ static void omap_mcbsp_writew(void *opaque, hwaddr addr,
 }
 if (s->tx_req < 4)
 omap_mcbsp_tx_done(s);
-} else
-printf("%s: Tx FIFO overrun\n", __func__);
+} else {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Tx FIFO overrun\n", __func__);
+}
 return;
 }
 
-- 
2.43.0

[PATCH 3/5] hw/arm/omap1: Convert information printfs to tracepoints

2025-02-27 Thread Peter Maydell

The omap1 code uses raw printf() statements to print information
about some events; convert these to tracepoints.

In particular, this will stop the functional test for the sx1
from printing the not-very-helpful note
 "omap_clkm_write: clocking scheme set to synchronous scalable"
to the test's default.log.

Signed-off-by: Peter Maydell 
---
 hw/arm/omap1.c  | 27 ++-
 hw/arm/trace-events |  7 +++
 2 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/hw/arm/omap1.c b/hw/arm/omap1.c
index 8f5bb81c96a..605e733459c 100644
--- a/hw/arm/omap1.c
+++ b/hw/arm/omap1.c
@@ -42,6 +42,7 @@
 #include "qemu/cutils.h"
 #include "qemu/bcd.h"
 #include "target/arm/cpu-qom.h"
+#include "trace.h"
 
 static inline void omap_log_badwidth(const char *funcname, hwaddr addr, int sz)
 {
@@ -1731,8 +1732,7 @@ static void omap_clkm_write(void *opaque, hwaddr addr,
 case 0x18: /* ARM_SYSST */
 if ((s->clkm.clocking_scheme ^ (value >> 11)) & 7) {
 s->clkm.clocking_scheme = (value >> 11) & 7;
-printf("%s: clocking scheme set to %s\n", __func__,
-   clkschemename[s->clkm.clocking_scheme]);
+trace_omap1_clocking_scheme(clkschemename[s->clkm.clocking_scheme]);
 }
 s->clkm.cold_start &= value & 0x3f;
 return;
@@ -2335,7 +2335,7 @@ static void omap_pwl_update(struct omap_pwl_s *s)
 
 if (output != s->output) {
 s->output = output;
-printf("%s: Backlight now at %i/256\n", __func__, output);
+trace_omap1_backlight(output);
 }
 }
 
@@ -2470,8 +2470,8 @@ static void omap_pwt_write(void *opaque, hwaddr addr,
 break;
 case 0x04: /* VRC */
 if ((value ^ s->vrc) & 1) {
-if (value & 1)
-printf("%s: %iHz buzz on\n", __func__, (int)
+if (value & 1) {
+trace_omap1_buzz(
 /* 1.5 MHz from a 12-MHz or 13-MHz PWT_CLK */
 ((omap_clk_getrate(s->clk) >> 3) /
  /* Pre-multiplexer divider */
@@ -2487,8 +2487,9 @@ static void omap_pwt_write(void *opaque, hwaddr addr,
  /*  80/127 divider */
  ((value & (1 << 5)) ?  80 : 127) /
  (107 * 55 * 63 * 127)));
-else
-printf("%s: silence!\n", __func__);
+} else {
+trace_omap1_silence();
+}
 }
 s->vrc = value & 0x7f;
 break;
@@ -3494,7 +3495,7 @@ static void omap_lpg_tick(void *opaque)
 timer_mod(s->tm, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + s->on);
 
 s->cycle = !s->cycle;
-printf("%s: LED is %s\n", __func__, s->cycle ? "on" : "off");
+trace_omap1_led(s->cycle ? "on" : "off");
 }
 
 static void omap_lpg_update(struct omap_lpg_s *s)
@@ -3514,11 +3515,11 @@ static void omap_lpg_update(struct omap_lpg_s *s)
 }
 
 timer_del(s->tm);
-if (on == period && s->on < s->period)
-printf("%s: LED is on\n", __func__);
-else if (on == 0 && s->on)
-printf("%s: LED is off\n", __func__);
-else if (on && (on != s->on || period != s->period)) {
+if (on == period && s->on < s->period) {
+trace_omap1_led("on");
+} else if (on == 0 && s->on) {
+trace_omap1_led("off");
+} else if (on && (on != s->on || period != s->period)) {
 s->cycle = 0;
 s->on = on;
 s->period = period;
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index 7790db780e0..70b137a6cfd 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -1,5 +1,12 @@
 # See docs/devel/tracing.rst for syntax documentation.
 
+# omap1.c
+omap1_clocking_scheme(const char *scheme) "omap1 CLKM: clocking scheme set to %s"
+omap1_backlight(int output) "omap1 PWL: backlight now at %d/256"
+omap1_buzz(int freq) "omap1 PWT: %dHz buzz on"
+omap1_silence(void) "omap1 PWT: buzzer silenced"
+omap1_led(const char *onoff) "omap1 LPG: LED is %s"
+
 # virt-acpi-build.c
 virt_acpi_setup(void) "No fw cfg or ACPI disabled. Bailing out."
 
-- 
2.43.0

Re: [PATCH v7 08/16] i386/sev: Refactor setting of reset vector and initial CPU state

2025-02-27 Thread Gupta, Pankaj


On 2/27/2025 3:29 PM, Roy Hopkins wrote:

When an SEV guest is started, the reset vector and state are
extracted from metadata that is contained in the firmware volume.

In preparation for using IGVM to set up the initial CPU state,
the code has been refactored to populate vmcb_save_area for each
CPU which is then applied during guest startup and CPU reset.
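
As a rough picture of what "populate a save area per CPU, apply it later"
means, here is a stand-alone sketch. It is not the code from this patch:
the sk_* types and functions are hypothetical, only the CS attributes and
RIP are modelled, and the bit layout is an assumption (a 12-bit VMSA
attribute field whose low 8 bits map to descriptor flag bits 8-15 and
whose high 4 bits map to bits 20-23).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU launch state, kept on a singly linked list. */
struct sk_launch_vmsa {
    struct sk_launch_vmsa *next;
    uint16_t cpu_index;
    uint16_t cs_attrib;     /* assumed 12-bit VMSA attribute encoding */
    uint64_t rip;
};

/* Hypothetical, heavily reduced CPU state. */
struct sk_cpu {
    uint16_t cpu_index;
    uint32_t cs_flags;      /* descriptor-style flags */
    uint64_t rip;
};

/* Assumed mapping: attrib bits 0-7 -> flags 8-15, bits 8-11 -> flags 20-23. */
static uint32_t sk_vmsa_to_segcache(uint16_t attrib)
{
    return (((uint32_t)attrib & 0x0f00u) << 12) |
           (((uint32_t)attrib & 0x00ffu) << 8);
}

/* Inverse of sk_vmsa_to_segcache(). */
static uint16_t sk_segcache_to_vmsa(uint32_t flags)
{
    return (uint16_t)(((flags & 0x00ff00u) >> 8) |
                      ((flags & 0xf00000u) >> 12));
}

/*
 * Look for launch state recorded for this CPU and copy it in.
 * Returns 1 if an entry was applied, 0 if the CPU has none.
 */
static int sk_apply_cpu_context(const struct sk_launch_vmsa *head,
                                struct sk_cpu *cpu)
{
    for (const struct sk_launch_vmsa *v = head; v != NULL; v = v->next) {
        if (v->cpu_index == cpu->cpu_index) {
            cpu->cs_flags = sk_vmsa_to_segcache(v->cs_attrib);
            cpu->rip = v->rip;
            return 1;
        }
    }
    return 0;
}
```

Keeping the recorded state on a list and applying it from one function
means the same path can serve both first start and CPU reset, which is
the property the commit message describes.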

Signed-off-by: Roy Hopkins 
Acked-by: Michael S. Tsirkin 
Acked-by: Stefano Garzarella 


Reviewed-by: Pankaj Gupta 


---
  target/i386/sev.c | 323 +-
  target/i386/sev.h | 110 
  2 files changed, 400 insertions(+), 33 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 7d91985f41..1d1e36e3de 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -49,6 +49,12 @@ OBJECT_DECLARE_TYPE(SevSnpGuestState, SevCommonStateClass, SEV_SNP_GUEST)
  /* hard code sha256 digest size */
  #define HASH_SIZE 32
  
+/* Convert between SEV-ES VMSA and SegmentCache flags/attributes */

+#define FLAGS_VMSA_TO_SEGCACHE(flags) \
+((((flags) & 0xff00) << 12) | (((flags) & 0xff) << 8))
+#define FLAGS_SEGCACHE_TO_VMSA(flags) \
+((((flags) & 0xff00) >> 8) | (((flags) & 0xf00000) >> 12))
+
  typedef struct QEMU_PACKED SevHashTableEntry {
  QemuUUID guid;
  uint16_t len;
@@ -88,6 +94,14 @@ typedef struct QEMU_PACKED SevHashTableDescriptor {
  uint32_t size;
  } SevHashTableDescriptor;
  
+typedef struct SevLaunchVmsa {

+QTAILQ_ENTRY(SevLaunchVmsa) next;
+
+uint16_t cpu_index;
+uint64_t gpa;
+struct sev_es_save_area vmsa;
+} SevLaunchVmsa;
+
  struct SevCommonState {
  X86ConfidentialGuest parent_obj;
  
@@ -106,9 +120,7 @@ struct SevCommonState {

  int sev_fd;
  SevState state;
  
-uint32_t reset_cs;

-uint32_t reset_ip;
-bool reset_data_valid;
+QTAILQ_HEAD(, SevLaunchVmsa) launch_vmsa;
  };
  
  struct SevCommonStateClass {

@@ -371,6 +383,172 @@ static struct RAMBlockNotifier sev_ram_notifier = {
  .ram_block_removed = sev_ram_block_removed,
  };
  
+static void sev_apply_cpu_context(CPUState *cpu)

+{
+SevCommonState *sev_common = SEV_COMMON(MACHINE(qdev_get_machine())->cgs);
+X86CPU *x86;
+CPUX86State *env;
+struct SevLaunchVmsa *launch_vmsa;
+
+/* See if an initial VMSA has been provided for this CPU */
+QTAILQ_FOREACH(launch_vmsa, &sev_common->launch_vmsa, next)
+{
+if (cpu->cpu_index == launch_vmsa->cpu_index) {
+x86 = X86_CPU(cpu);
+env = &x86->env;
+
+/*
+ * Ideally we would provide the VMSA directly to kvm which would
+ * ensure that the resulting initial VMSA measurement which is
+ * calculated during KVM_SEV_LAUNCH_UPDATE_VMSA is calculated from
+ * exactly what we provide here. Currently this is not possible so
+ * we need to copy the parts of the VMSA structure that we currently
+ * support into the CPU state.
+ */
+cpu_load_efer(env, launch_vmsa->vmsa.efer);
+cpu_x86_update_cr4(env, launch_vmsa->vmsa.cr4);
+cpu_x86_update_cr0(env, launch_vmsa->vmsa.cr0);
+cpu_x86_update_cr3(env, launch_vmsa->vmsa.cr3);
+env->xcr0 = launch_vmsa->vmsa.xcr0;
+env->pat = launch_vmsa->vmsa.g_pat;
+
+cpu_x86_load_seg_cache(
+env, R_CS, launch_vmsa->vmsa.cs.selector,
+launch_vmsa->vmsa.cs.base, launch_vmsa->vmsa.cs.limit,
+FLAGS_VMSA_TO_SEGCACHE(launch_vmsa->vmsa.cs.attrib));
+cpu_x86_load_seg_cache(
+env, R_DS, launch_vmsa->vmsa.ds.selector,
+launch_vmsa->vmsa.ds.base, launch_vmsa->vmsa.ds.limit,
+FLAGS_VMSA_TO_SEGCACHE(launch_vmsa->vmsa.ds.attrib));
+cpu_x86_load_seg_cache(
+env, R_ES, launch_vmsa->vmsa.es.selector,
+launch_vmsa->vmsa.es.base, launch_vmsa->vmsa.es.limit,
+FLAGS_VMSA_TO_SEGCACHE(launch_vmsa->vmsa.es.attrib));
+cpu_x86_load_seg_cache(
+env, R_FS, launch_vmsa->vmsa.fs.selector,
+launch_vmsa->vmsa.fs.base, launch_vmsa->vmsa.fs.limit,
+FLAGS_VMSA_TO_SEGCACHE(launch_vmsa->vmsa.fs.attrib));
+cpu_x86_load_seg_cache(
+env, R_GS, launch_vmsa->vmsa.gs.selector,
+launch_vmsa->vmsa.gs.base, launch_vmsa->vmsa.gs.limit,
+FLAGS_VMSA_TO_SEGCACHE(launch_vmsa->vmsa.gs.attrib));
+cpu_x86_load_seg_cache(
+env, R_SS, launch_vmsa->vmsa.ss.selector,
+launch_vmsa->vmsa.ss.base, launch_vmsa->vmsa.ss.limit,
+FLAGS_VMSA_TO_SEGCACHE(launch_vmsa->vmsa.ss.attrib));
+
+env->gdt.base = launch_vmsa->vmsa.gdtr.base;
+env->gdt.limit = launch_vmsa->vmsa.gdtr.limit;
+env->gdt.flags =
+FLAGS_VMSA_TO_SEGCACHE(launch_vmsa->vmsa.gdtr

Re: [PATCH 4/5] rust: pl011: switch to safe chardev operation

2025-02-27 Thread Peter Maydell

On Thu, 27 Feb 2025 at 16:48, Paolo Bonzini  wrote:
>
> Switch bindings::CharBackend with chardev::CharBackend.  This removes
> occurrences of "unsafe" due to FFI and switches the wrappers for receive,
> can_receive and event callbacks to the common ones implemented by
> chardev::CharBackend.
>
> Signed-off-by: Paolo Bonzini 

> @@ -567,21 +552,16 @@ fn write(&self, offset: hwaddr, value: u64, _size: u32) {

> -update_irq = self.regs.borrow_mut().write(
> -field,
> -value as u32,
> -addr_of!(self.char_backend) as *mut _,
> -);
> +update_irq = self
> +.regs
> +.borrow_mut()
> +.write(field, value as u32, &self.char_backend);
>  } else {
>  eprintln!("write bad offset {offset} value {value}");
>  }

Entirely unrelated to this patch, but seeing this go past
reminded me that I had a question I didn't get round to
asking in the community call the other day. In this
PL011State::write function, we delegate the job of
updating the register state to PL011Registers::write,
which returns a bool to tell us whether to call update().

I guess the underlying design idea here is "the register
object updates itself and tells the device object what
kinds of updates to the outside world it needs to do" ?
But then, why is the irq output something that PL011State
needs to handle itself whereas the chardev backend is
something we can pass into PL011Registers ?

In the C version, we just call pl011_update() where we
need to; we could validly call it unconditionally for any
write, we're just being (possibly prematurely) efficient
by avoiding a call when we happen to know that the register
write didn't touch any of the state that pl011_update()
cares about. So it feels a bit odd to me that in the Rust
version this "we happen to know that sometimes it would be
unnecessary to call the update function" has been kind of
promoted to being part of an interface between the two
different types PL011Registers and PL011State.

Thinking about other devices, presumably for more complex
devices we might need to pass more than just a single 'bool'
back from PL011Registers::write. What other kinds of thing
might we need to do in the FooState function, and (since
the pl011 code is presumably going to be used as a template
for those other devices) is it worth having something that
expresses that better than just a raw 'bool' return ?

thanks
-- PMM

[PULL 04/34] rust: subprojects: add libc crate

2025-02-27 Thread Paolo Bonzini

This allows access to errno values.

Reviewed-by: Zhao Liu 
Signed-off-by: Paolo Bonzini 
---
 rust/Cargo.lock   |  7 
 rust/qemu-api/Cargo.toml  |  1 +
 scripts/archive-source.sh |  2 +-
 scripts/make-release  |  2 +-
 subprojects/.gitignore|  1 +
 subprojects/libc-0.2-rs.wrap  |  7 
 .../packagefiles/libc-0.2-rs/meson.build  | 37 +++
 7 files changed, 55 insertions(+), 2 deletions(-)
 create mode 100644 subprojects/libc-0.2-rs.wrap
 create mode 100644 subprojects/packagefiles/libc-0.2-rs/meson.build

diff --git a/rust/Cargo.lock b/rust/Cargo.lock
index 79e142723b8..2ebf0a11ea4 100644
--- a/rust/Cargo.lock
+++ b/rust/Cargo.lock
@@ -54,6 +54,12 @@ dependencies = [
  "either",
 ]
 
+[[package]]
+name = "libc"
+version = "0.2.162"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "18d287de67fe55fd7e1581fe933d965a5a9477b38e949cfa9f8574ef01506398"
+
 [[package]]
 name = "pl011"
 version = "0.1.0"
@@ -100,6 +106,7 @@ dependencies = [
 name = "qemu_api"
 version = "0.1.0"
 dependencies = [
+ "libc",
  "qemu_api_macros",
  "version_check",
 ]
diff --git a/rust/qemu-api/Cargo.toml b/rust/qemu-api/Cargo.toml
index a51dd142852..57747bc9341 100644
--- a/rust/qemu-api/Cargo.toml
+++ b/rust/qemu-api/Cargo.toml
@@ -16,6 +16,7 @@ rust-version = "1.63.0"
 
 [dependencies]
 qemu_api_macros = { path = "../qemu-api-macros" }
+libc = "0.2.162"
 
 [build-dependencies]
 version_check = "~0.9"
diff --git a/scripts/archive-source.sh b/scripts/archive-source.sh
index 30677c3ec90..e461c1531ed 100755
--- a/scripts/archive-source.sh
+++ b/scripts/archive-source.sh
@@ -28,7 +28,7 @@ sub_file="${sub_tdir}/submodule.tar"
 # different to the host OS.
 subprojects="keycodemapdb libvfio-user berkeley-softfloat-3
   berkeley-testfloat-3 arbitrary-int-1-rs bilge-0.2-rs
-  bilge-impl-0.2-rs either-1-rs itertools-0.11-rs proc-macro2-1-rs
+  bilge-impl-0.2-rs either-1-rs itertools-0.11-rs libc-0.2-rs proc-macro2-1-rs
   proc-macro-error-1-rs proc-macro-error-attr-1-rs quote-1-rs
   syn-2-rs unicode-ident-1-rs"
 sub_deinit=""
diff --git a/scripts/make-release b/scripts/make-release
index 1b89b3423a8..8c3594a1a47 100755
--- a/scripts/make-release
+++ b/scripts/make-release
@@ -41,7 +41,7 @@ fi
 # Only include wraps that are invoked with subproject()
 SUBPROJECTS="libvfio-user keycodemapdb berkeley-softfloat-3
   berkeley-testfloat-3 arbitrary-int-1-rs bilge-0.2-rs
-  bilge-impl-0.2-rs either-1-rs itertools-0.11-rs proc-macro2-1-rs
+  bilge-impl-0.2-rs either-1-rs itertools-0.11-rs libc-0.2-rs proc-macro2-1-rs
   proc-macro-error-1-rs proc-macro-error-attr-1-rs quote-1-rs
   syn-2-rs unicode-ident-1-rs"
 
diff --git a/subprojects/.gitignore b/subprojects/.gitignore
index 50f173f90db..d12d34618cc 100644
--- a/subprojects/.gitignore
+++ b/subprojects/.gitignore
@@ -11,6 +11,7 @@
 /bilge-impl-0.2.0
 /either-1.12.0
 /itertools-0.11.0
+/libc-0.2.162
 /proc-macro-error-1.0.4
 /proc-macro-error-attr-1.0.4
 /proc-macro2-1.0.84
diff --git a/subprojects/libc-0.2-rs.wrap b/subprojects/libc-0.2-rs.wrap
new file mode 100644
index 000..bbe08f87883
--- /dev/null
+++ b/subprojects/libc-0.2-rs.wrap
@@ -0,0 +1,7 @@
+[wrap-file]
+directory = libc-0.2.162
+source_url = https://crates.io/api/v1/crates/libc/0.2.162/download
+source_filename = libc-0.2.162.tar.gz
+source_hash = 18d287de67fe55fd7e1581fe933d965a5a9477b38e949cfa9f8574ef01506398
+#method = cargo
+patch_directory = libc-0.2-rs
diff --git a/subprojects/packagefiles/libc-0.2-rs/meson.build b/subprojects/packagefiles/libc-0.2-rs/meson.build
new file mode 100644
index 000..ac4f80dba98
--- /dev/null
+++ b/subprojects/packagefiles/libc-0.2-rs/meson.build
@@ -0,0 +1,37 @@
+project('libc-0.2-rs', 'rust',
+  meson_version: '>=1.5.0',
+  version: '0.2.162',
+  license: 'MIT OR Apache-2.0',
+  default_options: [])
+
+_libc_rs = static_library(
+  'libc',
+  files('src/lib.rs'),
+  gnu_symbol_visibility: 'hidden',
+  override_options: ['rust_std=2015', 'build.rust_std=2015'],
+  rust_abi: 'rust',
+  rust_args: [
+'--cap-lints', 'allow',
+'--cfg', 'freebsd11',
+'--cfg', 'libc_priv_mod_use',
+'--cfg', 'libc_union',
+'--cfg', 'libc_const_size_of',
+'--cfg', 'libc_align',
+'--cfg', 'libc_int128',
+'--cfg', 'libc_core_cvoid',
+'--cfg', 'libc_packedN',
+'--cfg', 'libc_cfg_target_vendor',
+'--cfg', 'libc_non_exhaustive',
+'--cfg', 'libc_long_array',
+'--cfg', 'libc_ptr_addr_of',
+'--cfg', 'libc_underscore_const_names',
+'--cfg', 'libc_const_extern_fn',
+  ],
+  dependencies: [],
+)
+
+libc_dep = declare_dependency(
+  link_with: _libc_rs,
+)
+
+meson.override_dependency('libc-0.2-rs', libc_dep)
-- 
2.48.1

Re: [PATCH 2/2] vfio: Make vfio-platform available on Aarch64 platforms only

2025-02-27 Thread Alex Williamson

On Thu, 27 Feb 2025 09:32:46 +0100
Eric Auger  wrote:

> Hi Cédric,
> 
> On 2/26/25 9:47 AM, Cédric Le Goater wrote:
> > VFIO Platforms was designed for Aarch64. Restrict availability to
> > 64-bit host platforms.
> >
> > Cc: Eric Auger 
> > Signed-off-by: Cédric Le Goater   
> Reviewed-by: Eric Auger 
> 
> As an outcome from last KVM forum, next step may be to simply remove
> VFIO_PLATFORM from the qemu tree.
> 
> We also need to make a decision wrt linux vfio platform driver. As I
> can't test it anymore without hacks (my last tegra234 mgbe works are
> unlikely to land on qemu side and lack traction on kernel side too),
> either someone who can test it volunteers to take over the kernel
> maintainership or we remove it from kernel too.

I think it's more than just a kernel maintainer stepping up to test,
there really needs to be some in-kernel justification for the
vfio-platform driver itself.  If it's only enabling out of tree use
cases and there's nothing in-tree that's actually independently
worthwhile, I don't really see why we shouldn't remove it and just let
those out of tree use cases provide their own out of tree versions of
vfio-platform.  Thanks,

Alex

[PATCH] rust: hpet: decode HPET registers into enums

2025-02-27 Thread Paolo Bonzini

Generalize timer_and_addr() to decode all registers into a single enum
HPETRegister, and use the TryInto derive to separate valid and
invalid values.

The main advantage lies in checking that all registers are enumerated
in the "match" statements.

Signed-off-by: Paolo Bonzini 
---
 rust/Cargo.toml|   2 +
 rust/hw/char/pl011/src/lib.rs  |   2 -
 rust/hw/timer/hpet/src/hpet.rs | 204 +
 3 files changed, 110 insertions(+), 98 deletions(-)

diff --git a/rust/Cargo.toml b/rust/Cargo.toml
index 5041d6291fd..ab1185a8143 100644
--- a/rust/Cargo.toml
+++ b/rust/Cargo.toml
@@ -37,6 +37,8 @@ result_unit_err = "allow"
 should_implement_trait = "deny"
 # can be for a reason, e.g. in callbacks
 unused_self = "allow"
+# common in device crates
+upper_case_acronyms = "allow"
 
 # default-allow lints
 as_ptr_cast_mut = "deny"
diff --git a/rust/hw/char/pl011/src/lib.rs b/rust/hw/char/pl011/src/lib.rs
index 45c13ba899e..dbae76991c9 100644
--- a/rust/hw/char/pl011/src/lib.rs
+++ b/rust/hw/char/pl011/src/lib.rs
@@ -12,8 +12,6 @@
 //! See [`PL011State`](crate::device::PL011State) for the device model type and
 //! the [`registers`] module for register types.
 
-#![allow(clippy::upper_case_acronyms)]
-
 use qemu_api::c_str;
 
 mod device;
diff --git a/rust/hw/timer/hpet/src/hpet.rs b/rust/hw/timer/hpet/src/hpet.rs
index a440c9f4cb9..af273f02c54 100644
--- a/rust/hw/timer/hpet/src/hpet.rs
+++ b/rust/hw/timer/hpet/src/hpet.rs
@@ -48,8 +48,6 @@
 const HPET_CLK_PERIOD: u64 = 10; // 10 ns
 const FS_PER_NS: u64 = 100; // 100 femtoseconds == 1 ns
 
-/// General Capabilities and ID Register
-const HPET_CAP_REG: u64 = 0x000;
 /// Revision ID (bits 0:7). Revision 1 is implemented (refer to v1.0a spec).
 const HPET_CAP_REV_ID_VALUE: u64 = 0x1;
 const HPET_CAP_REV_ID_SHIFT: usize = 0;
@@ -65,8 +63,6 @@
 /// Main Counter Tick Period (bits 32:63)
 const HPET_CAP_CNT_CLK_PERIOD_SHIFT: usize = 32;
 
-/// General Configuration Register
-const HPET_CFG_REG: u64 = 0x010;
 /// Overall Enable (bit 0)
 const HPET_CFG_ENABLE_SHIFT: usize = 0;
 /// Legacy Replacement Route (bit 1)
@@ -74,14 +70,6 @@
 /// Other bits are reserved.
 const HPET_CFG_WRITE_MASK: u64 = 0x003;
 
-/// General Interrupt Status Register
-const HPET_INT_STATUS_REG: u64 = 0x020;
-
-/// Main Counter Value Register
-const HPET_COUNTER_REG: u64 = 0x0f0;
-
-/// Timer N Configuration and Capability Register (masked by 0x18)
-const HPET_TN_CFG_REG: u64 = 0x000;
 /// bit 0, 7, and bits 16:31 are reserved.
 /// bit 4, 5, 15, and bits 32:64 are read-only.
 const HPET_TN_CFG_WRITE_MASK: u64 = 0x7f4e;
@@ -109,11 +97,51 @@
 /// Timer N Interrupt Routing Capability (bits 32:63)
 const HPET_TN_CFG_INT_ROUTE_CAP_SHIFT: usize = 32;
 
-/// Timer N Comparator Value Register (masked by 0x18)
-const HPET_TN_CMP_REG: u64 = 0x008;
+#[derive(qemu_api_macros::TryInto)]
+#[repr(u64)]
+#[allow(non_camel_case_types)]
+/// Timer registers, masked by 0x18
+enum TimerRegister {
+/// Timer N Configuration and Capability Register
+CFG = 0,
+/// Timer N Comparator Value Register
+CMP = 8,
+/// Timer N FSB Interrupt Route Register
+ROUTE = 16,
+}
 
-/// Timer N FSB Interrupt Route Register (masked by 0x18)
-const HPET_TN_FSB_ROUTE_REG: u64 = 0x010;
+#[derive(qemu_api_macros::TryInto)]
+#[repr(u64)]
+#[allow(non_camel_case_types)]
+/// Global registers
+enum GlobalRegister {
+/// General Capabilities and ID Register
+CAP = 0,
+/// General Configuration Register
+CFG = 0x10,
+/// General Interrupt Status Register
+INT_STATUS = 0x20,
+/// Main Counter Value Register
+COUNTER = 0xF0,
+}
+
+enum HPETRegister<'a> {
+/// Global register in the range from `0` to `0xff`
+Global(GlobalRegister),
+
+/// Register in the timer block `0x100`...`0x3ff`
+Timer(&'a BqlRefCell, TimerRegister),
+
+/// Invalid address
+#[allow(dead_code)]
+Unknown(hwaddr),
+}
+
+struct HPETAddrDecode<'a> {
+shift: u32,
+len: u32,
+reg: HPETRegister<'a>,
+}
 
 const fn hpet_next_wrap(cur_tick: u64) -> u64 {
 (cur_tick | 0x) + 1
@@ -460,33 +488,21 @@ fn callback(&mut self) {
 self.update_irq(true);
 }
 
-const fn read(&self, addr: hwaddr, _size: u32) -> u64 {
-let shift: u64 = (addr & 4) * 8;
-
-match addr & !4 {
-HPET_TN_CFG_REG => self.config >> shift, // including interrupt capabilities
-HPET_TN_CMP_REG => self.cmp >> shift,// comparator register
-HPET_TN_FSB_ROUTE_REG => self.fsb >> shift,
-_ => {
-// TODO: Add trace point - trace_hpet_ram_read_invalid()
-// Reserved.
-0
-}
+const fn read(&self, reg: TimerRegister) -> u64 {
+use TimerRegister::*;
+match reg {
+CFG => self.config, // including interrupt capabilities
+CMP => self.cmp,// comparator register
+ROUTE => self.fsb,
 }

Re: [PATCH 2/3] target/arm: Correct STRD atomicity

2025-02-27 Thread Richard Henderson


On 2/27/25 06:27, Peter Maydell wrote:

Our STRD implementation doesn't correctly implement the requirement:
  * if the address is 8-aligned the access must be a 64-bit
single-copy atomic access, not two 32-bit accesses

Rewrite the handling of STRD to use a single tcg_gen_qemu_st_i64()
of a value produced by concatenating the two 32 bit source registers.
This allows us to get the atomicity right.

As with the LDRD change, now that we don't update 'addr' in the
course of performing the store we need to adjust the offset
we pass to op_addr_ri_post() and op_addr_rr_post().

Cc:qemu-sta...@nongnu.org
Signed-off-by: Peter Maydell
---
  target/arm/tcg/translate.c | 55 --
  1 file changed, 35 insertions(+), 20 deletions(-)


Modulo the LPAE comment vs patch 1,
Reviewed-by: Richard Henderson 


r~
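
For readers following the thread, the "concatenate the two 32 bit source
registers" step in the quoted commit message can be sketched in plain C. This
is only an illustration under stated assumptions, not the translate.c code:
strd_concat() is a hypothetical helper, and the choice of which register forms
which half for big-endian data is an assumption made here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not QEMU code: build the 64-bit value for a single
 * STRD-style store from the two 32-bit source registers.
 *
 * rt is the register stored at the lower address, rt2 at address + 4.
 * Assumption: for little-endian data rt forms the low half of the 64-bit
 * value; for big-endian data it forms the high half.
 */
static uint64_t strd_concat(uint32_t rt, uint32_t rt2, bool big_endian)
{
    if (big_endian) {
        return ((uint64_t)rt << 32) | rt2;
    }
    return ((uint64_t)rt2 << 32) | rt;
}
```

With the value built this way, one 64-bit store can replace two 32-bit stores,
which is what makes single-copy atomicity possible for 8-aligned addresses.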

Re: [PATCH 3/3] target/arm: Drop unused address_offset from op_addr_{rr, ri}_post()

2025-02-27 Thread Richard Henderson


On 2/27/25 06:27, Peter Maydell wrote:

All the callers of op_addr_rr_post() and op_addr_ri_post() now pass in
zero for the address_offset, so we can remove that argument.

Signed-off-by: Peter Maydell
---
  target/arm/tcg/translate.c | 26 +-
  1 file changed, 13 insertions(+), 13 deletions(-)


Reviewed-by: Richard Henderson 

r~

Re: [PATCH 4/4] target/arm: Retry pushing CPER error if necessary

2025-02-27 Thread Jonathan Cameron via

On Wed, 26 Feb 2025 14:58:46 +1000
Gavin Shan  wrote:

> On 2/25/25 9:19 PM, Igor Mammedov wrote:
> > On Fri, 21 Feb 2025 11:04:35 +
> > Jonathan Cameron  wrote:  
> >>
> >> Ideally I'd like whatever we choose to look like what a bare metal machine
> >> does - mostly because we are less likely to hit untested OS paths.  
> > 
> > Ack for that but,
> > that would need someone from hw/firmware side since error status block
> > handling is done by firmware.
> > 
> > right now we are just making things up based on spec interpretation.
> >   
> 
> It's a good point. I think it's worthwhile to understand how the RAS event
> is processed and turned into a CPER by firmware.
> 
> I couldn't figure out how the CPER is generated by edk2 after looking into
> tf-a (Trusted Firmware-A) and edk2 for a while. I will consult the EDK2
> developers for help. However, there is a note in tf-a that briefly explains
> how a RAS event is handled.
> 
>From tf-a/plat/arm/board/fvp/aarch64/fvp_lsp_ras_sp.c:
>(g...@github.com:ARM-software/arm-trusted-firmware.git)
> 
>/*
> * Note: Typical RAS error handling flow with Firmware First Handling
> *
> * Step 1: Exception resulting from a RAS error in the normal world is
> *         routed to EL3.
> * Step 2: This exception is typically signaled as either a synchronous
> *         external abort or SError or interrupt. TF-A (EL3 firmware)
> *         delegates the control to platform specific handler built on top
> *         of the RAS helper utilities.
> * Step 3: With the help of a Logical Secure Partition, TF-A sends a direct
> *         message to dedicated S-EL0 (or S-EL1) RAS Partition managed by
> *         SPMC. TF-A also populates a shared buffer with a data structure
> *         containing enough information (such as system registers) to
> *         identify and triage the RAS error.
> * Step 4: RAS SP generates the Common Platform Error Record (CPER) and
> *         shares it with normal world firmware and/or OS kernel through a
> *         reserved buffer memory.
> * Step 5: RAS SP responds to the direct message with information necessary
> *         for TF-A to notify the OS kernel.
> * Step 6: Consequently, TF-A dispatches an SDEI event to notify the OS
> *         kernel about the CPER records for further logging.
> */
> 
> According to the note, the RAS SP (Secure Partition) is the black box where
> the RAS event raised by tf-a is turned into a CPER. Unfortunately, I haven't
> found the source code to understand the details yet.

This is very much 'a flow' rather than 'the flow'.  TFA may not even be
involved in many systems, nor SDEI, nor EDK2 beyond passing through some
config.   One option, as I understand it, is to offload the firmware handling
and building of the record to a management processor and stick to SEA
for the signalling.

I'd be rather surprised if you can find anything beyond binary blobs
for those firmware (if that!).  Maybe all we can get from publicish sources
is what the HEST tables look like.  I've asked our firmware folk if they
can share more on how we do it but might take a while.

I have confirmed we only have one GHESv2 SEA entry on at least the one random
board I looked at (and various interrupt ones).  That board may not be
representative but seems pushing everything through one structure is an option.

Jonathan

> 
> Thanks,
> Gavin
> 
>

Re: [PATCH v10 5/8] hw/misc/riscv_iopmp_txn_info: Add struct for transaction infomation

2025-02-27 Thread Alistair Francis

On Wed, Jan 22, 2025 at 6:49 PM Ethan Chen via  wrote:
>
> The entire valid transaction must fit within a single IOPMP entry.
> However, during IOMMU translation, the transaction size is not
> available. This structure defines the transaction information required
> by the IOPMP.
>
> Signed-off-by: Ethan Chen 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  include/hw/misc/riscv_iopmp_txn_info.h | 38 ++
>  1 file changed, 38 insertions(+)
>  create mode 100644 include/hw/misc/riscv_iopmp_txn_info.h
>
> diff --git a/include/hw/misc/riscv_iopmp_txn_info.h b/include/hw/misc/riscv_iopmp_txn_info.h
> new file mode 100644
> index 00..98bd26b68b
> --- /dev/null
> +++ b/include/hw/misc/riscv_iopmp_txn_info.h
> @@ -0,0 +1,38 @@
> +/*
> + * QEMU RISC-V IOPMP transaction information
> + *
> + * The transaction information structure provides the complete transaction
> + * length to the IOPMP device
> + *
> + * Copyright (c) 2023-2025 Andes Tech. Corp.
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see .
> + */
> +
> +#ifndef RISCV_IOPMP_TXN_INFO_H
> +#define RISCV_IOPMP_TXN_INFO_H
> +
> +typedef struct {
> +/* The id of requestor */
> +uint32_t rrid:16;
> +/* The start address of transaction */
> +uint64_t start_addr;
> +/* The end address of transaction */
> +uint64_t end_addr;
> +/* The stage of cascading IOPMP */
> +uint32_t stage;
> +} riscv_iopmp_txn_info;
> +
> +#endif
> --
> 2.34.1
>
>

[PATCH v3] hw/arm/smmu: Introduce smmu_configs_inv_sid_range() helper

2025-02-27 Thread JianChunfu

Rename smmuv3_invalidate_ste() to smmu_hash_remove_by_sid_range(), following
the naming used by the other hash table matching functions, since the old
name is not self-explanatory. Also introduce a helper,
smmu_configs_inv_sid_range(), that invokes g_hash_table_foreach_remove().

No functional change intended.

Signed-off-by: JianChunfu 
---
v3: - Modify the commit msg
- Rename the trace function
v2: - move smmuv3_invalidate_ste() to smmu_hash_remove_by_sid_range()
- add function smmu_configs_inv_sid_range()
v1: - Rename smmuv3_invalidate_ste to smmuv3_hash_remove_by_sid_range
---
 hw/arm/smmu-common.c | 21 +
 hw/arm/smmu-internal.h   |  5 -
 hw/arm/smmuv3.c  | 19 ++-
 hw/arm/trace-events  |  3 ++-
 include/hw/arm/smmu-common.h |  6 ++
 5 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 3f8272875..bad3b3b0b 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -225,6 +225,27 @@ static gboolean smmu_hash_remove_by_vmid_ipa(gpointer key, gpointer value,
((entry->iova & ~info->mask) == info->iova);
 }
 
+static gboolean
+smmu_hash_remove_by_sid_range(gpointer key, gpointer value, gpointer user_data)
+{
+SMMUDevice *sdev = (SMMUDevice *)key;
+uint32_t sid = smmu_get_sid(sdev);
+SMMUSIDRange *sid_range = (SMMUSIDRange *)user_data;
+
+if (sid < sid_range->start || sid > sid_range->end) {
+return false;
+}
+trace_smmu_config_cache_inv(sid);
+return true;
+}
+
+void smmu_configs_inv_sid_range(SMMUState *s, SMMUSIDRange sid_range)
+{
+trace_smmu_configs_inv_sid_range(sid_range.start, sid_range.end);
+g_hash_table_foreach_remove(s->configs, smmu_hash_remove_by_sid_range,
+&sid_range);
+}
+
 void smmu_iotlb_inv_iova(SMMUState *s, int asid, int vmid, dma_addr_t iova,
  uint8_t tg, uint64_t num_pages, uint8_t ttl)
 {
diff --git a/hw/arm/smmu-internal.h b/hw/arm/smmu-internal.h
index 843bebb18..d143d296f 100644
--- a/hw/arm/smmu-internal.h
+++ b/hw/arm/smmu-internal.h
@@ -141,9 +141,4 @@ typedef struct SMMUIOTLBPageInvInfo {
 uint64_t mask;
 } SMMUIOTLBPageInvInfo;
 
-typedef struct SMMUSIDRange {
-uint32_t start;
-uint32_t end;
-} SMMUSIDRange;
-
 #endif
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 4c49b5a88..1c55bc56d 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -903,7 +903,7 @@ static void smmuv3_flush_config(SMMUDevice *sdev)
 SMMUv3State *s = sdev->smmu;
 SMMUState *bc = &s->smmu_state;
 
-trace_smmuv3_config_cache_inv(smmu_get_sid(sdev));
+trace_smmu_config_cache_inv(smmu_get_sid(sdev));
 g_hash_table_remove(bc->configs, sdev);
 }
 
@@ -1277,20 +1277,6 @@ static void smmuv3_range_inval(SMMUState *s, Cmd *cmd, SMMUStage stage)
 }
 }
 
-static gboolean
-smmuv3_invalidate_ste(gpointer key, gpointer value, gpointer user_data)
-{
-SMMUDevice *sdev = (SMMUDevice *)key;
-uint32_t sid = smmu_get_sid(sdev);
-SMMUSIDRange *sid_range = (SMMUSIDRange *)user_data;
-
-if (sid < sid_range->start || sid > sid_range->end) {
-return false;
-}
-trace_smmuv3_config_cache_inv(sid);
-return true;
-}
-
 static int smmuv3_cmdq_consume(SMMUv3State *s)
 {
 SMMUState *bs = ARM_SMMU(s);
@@ -1373,8 +1359,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 sid_range.end = sid_range.start + mask;
 
 trace_smmuv3_cmdq_cfgi_ste_range(sid_range.start, sid_range.end);
-g_hash_table_foreach_remove(bs->configs, smmuv3_invalidate_ste,
-&sid_range);
+smmu_configs_inv_sid_range(bs, sid_range);
 break;
 }
 case SMMU_CMD_CFGI_CD:
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index c64ad344b..e96f9ae47 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -15,6 +15,8 @@ smmu_iotlb_inv_asid_vmid(int asid, int vmid) "IOTLB invalidate asid=%d vmid=%d"
 smmu_iotlb_inv_vmid(int vmid) "IOTLB invalidate vmid=%d"
 smmu_iotlb_inv_vmid_s1(int vmid) "IOTLB invalidate vmid=%d"
 smmu_iotlb_inv_iova(int asid, uint64_t addr) "IOTLB invalidate asid=%d addr=0x%"PRIx64
+smmu_configs_inv_sid_range(uint32_t start, uint32_t end) "Config cache INV SID range from 0x%x to 0x%x"
+smmu_config_cache_inv(uint32_t sid) "Config cache INV for sid=0x%x"
 smmu_inv_notifiers_mr(const char *name) "iommu mr=%s"
 smmu_iotlb_lookup_hit(int asid, int vmid, uint64_t addr, uint32_t hit, uint32_t miss, uint32_t p) "IOTLB cache HIT asid=%d vmid=%d addr=0x%"PRIx64" hit=%d miss=%d hit rate=%d"
 smmu_iotlb_lookup_miss(int asid, int vmid, uint64_t addr, uint32_t hit, uint32_t miss, uint32_t p) "IOTLB cache MISS asid=%d vmid=%d addr=0x%"PRIx64" hit=%d miss=%d hit rate=%d"
@@ -52,7 +54,6 @@ smmuv3_cmdq_tlbi_nh(int vmid) "vmid=%d"
 smmuv3_cmdq_tlbi_nsnh(void) ""
 smmuv3_cmdq_tlbi_nh_asid(int asid) "asid=%d"
 smmuv3_cmdq_tlbi_s12_vmid(int vm

Re: [PATCH v2 0/3] binfmt: Add --ignore-family option

2025-02-27 Thread Alistair Francis

On Tue, Jan 28, 2025 at 4:29 AM Andrea Bolognani  wrote:
>
> Changes from [v1]:
>
>   * adopt a completely different, more general approach.
>
> [v1] https://mail.gnu.org/archive/html/qemu-devel/2024-12/msg00459.html
>
> Andrea Bolognani (3):
>   binfmt: Shuffle things around
>   binfmt: Normalize host CPU architecture
>   binfmt: Add --ignore-family option

Thanks!

Applied to riscv-to-apply.next

Alistair

>
>  scripts/qemu-binfmt-conf.sh | 78 -
>  1 file changed, 50 insertions(+), 28 deletions(-)
>
> --
> 2.48.1
>

Re: [PATCH v5 4/4] hw/ssi/pnv_spi: Put a limit to RDR match failures

2025-02-27 Thread Chalapathi V




On 27-02-2025 07:26, Nicholas Piggin wrote:

On Sat Jan 4, 2025 at 2:18 AM AEST, Chalapathi V wrote:

There is a possibility that the SPI controller can get into a loop due to
indefinite RDR match failures. Hence put a limit on failures and stop the
sequencer.

Signed-off-by: Chalapathi V 
---
  hw/ssi/pnv_spi.c | 11 +++
  1 file changed, 11 insertions(+)

diff --git a/hw/ssi/pnv_spi.c b/hw/ssi/pnv_spi.c
index 41beb559c6..d605fa8b46 100644
--- a/hw/ssi/pnv_spi.c
+++ b/hw/ssi/pnv_spi.c
@@ -20,6 +20,7 @@
  #define PNV_SPI_OPCODE_LO_NIBBLE(x) (x & 0x0F)
  #define PNV_SPI_MASKED_OPCODE(x) (x & 0xF0)
  #define PNV_SPI_FIFO_SIZE 16
+#define RDR_MATCH_FAILURE_LIMIT 16
  
  /*

   * Macro from include/hw/ppc/fdt.h
@@ -838,21 +839,31 @@ static void operation_sequencer(PnvSpi *s)
   */
  if (GETFIELD(SPI_STS_RDR_FULL, s->status) == 1) {
  bool rdr_matched = false;
+static int fail_count;

This will be shared by SPI instances, is that okay or should it be
in PnvSpi?

Other than that, looks good.

This should be in PnvSpi. Will update in V6. Thank You.

Reviewed-by: Nicholas Piggin
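
The review point about the function-local static can be illustrated with a
small standalone sketch. Everything here is hypothetical: FakeSpi stands in
for PnvSpi, the helper names are made up, and the ">= limit" comparison is an
assumption since the rest of the hunk is not shown in the quoted patch.

```c
#include <stdbool.h>

#define RDR_MATCH_FAILURE_LIMIT 16

/* Hypothetical stand-in for PnvSpi; only the field relevant here. */
typedef struct FakeSpi {
    int fail_count;     /* per-instance RDR match failure counter */
} FakeSpi;

/*
 * Shared-counter variant: a function-local static is one object for the
 * whole program, so failures from every SPI instance add up together.
 */
static bool rdr_failure_shared(void)
{
    static int fail_count;

    return ++fail_count >= RDR_MATCH_FAILURE_LIMIT;
}

/*
 * Per-instance variant: each controller tracks its own failures, so one
 * misbehaving instance cannot push another over the limit.
 */
static bool rdr_failure_per_instance(FakeSpi *s)
{
    return ++s->fail_count >= RDR_MATCH_FAILURE_LIMIT;
}
```

With the shared variant, eight failures on each of two controllers would
already reach the limit; with the per-instance field, neither would. A counter
kept in the device state would presumably also need resetting on device reset,
which is left out of this sketch.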

[PATCH] virtio-pci: fix memory leak from device realization failure

2025-02-27 Thread Zheng Huang

This commit adds an error-handling path to `virtio_pci_realize` to
fix the memory leak of an address space and the virtio-net device object.
If realization of the device fails, the address space should be
destroyed too.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2845

Signed-off-by: Zheng Huang 

---
 hw/virtio/virtio-pci.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index c773a9130c..4b0d8cd90a 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -2266,6 +2266,9 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error **errp)
 virtio_pci_bus_new(&proxy->bus, sizeof(proxy->bus), proxy);
 if (k->realize) {
 k->realize(proxy, errp);
+if (*errp) {
+address_space_destroy(&proxy->modern_cfg_mem_as);
+}
 }
 }
 
-- 
2.34.1

RE: [PATCH rfcv2 01/20] backends/iommufd: Add helpers for invalidating user-managed HWPT

2025-02-27 Thread Duan, Zhenzhong

Hi Eric,

>-Original Message-
>From: Eric Auger 
>Subject: Re: [PATCH rfcv2 01/20] backends/iommufd: Add helpers for invalidating
>user-managed HWPT
>
>Hi Zhenzhong,
>
>
>On 2/19/25 9:22 AM, Zhenzhong Duan wrote:
>> Signed-off-by: Nicolin Chen 
>> Signed-off-by: Zhenzhong Duan 
>in the title, there is only a single helper here. a small commit msg may
>help the reader

Sure, will do.

>> ---
>>  include/system/iommufd.h |  3 +++
>>  backends/iommufd.c   | 30 ++
>>  backends/trace-events|  1 +
>>  3 files changed, 34 insertions(+)
>>
>> diff --git a/include/system/iommufd.h b/include/system/iommufd.h
>> index cbab75bfbf..5d02e9d148 100644
>> --- a/include/system/iommufd.h
>> +++ b/include/system/iommufd.h
>> @@ -61,6 +61,9 @@ bool
>iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
>>uint64_t iova, ram_addr_t size,
>>uint64_t page_size, uint64_t *data,
>>Error **errp);
>> +int iommufd_backend_invalidate_cache(IOMMUFDBackend *be, uint32_t
>hwpt_id,
>> + uint32_t data_type, uint32_t entry_len,
>> + uint32_t *entry_num, void *data_ptr);
>>
>>  #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>TYPE_HOST_IOMMU_DEVICE "-iommufd"
>>  #endif
>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>> index d57da44755..fc32aad5cb 100644
>> --- a/backends/iommufd.c
>> +++ b/backends/iommufd.c
>> @@ -311,6 +311,36 @@ bool
>iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
>>  return true;
>>  }
>>
>> +int iommufd_backend_invalidate_cache(IOMMUFDBackend *be, uint32_t
>hwpt_id,
>> + uint32_t data_type, uint32_t entry_len,
>> + uint32_t *entry_num, void *data_ptr)
>> +{
>> +int ret, fd = be->fd;
>> +struct iommu_hwpt_invalidate cache = {
>> +.size = sizeof(cache),
>> +.hwpt_id = hwpt_id,
>> +.data_type = data_type,
>> +.entry_len = entry_len,
>> +.entry_num = *entry_num,
>> +.data_uptr = (uintptr_t)data_ptr,
>> +};
>> +
>> +ret = ioctl(fd, IOMMU_HWPT_INVALIDATE, &cache);
>> +
>> +trace_iommufd_backend_invalidate_cache(fd, hwpt_id, data_type,
>entry_len,
>> +   *entry_num, cache.entry_num,
>> +   (uintptr_t)data_ptr, ret);
>> +if (ret) {
>> +*entry_num = cache.entry_num;
>> +error_report("IOMMU_HWPT_INVALIDATE failed: %s", strerror(errno));
>nit: you may report *entry_num also.
>Wouldn't it be useful to have an Error *errp passed to the function

Will do.

Thanks
Zhenzhong

>> +ret = -errno;
>> +} else {
>> +g_assert(*entry_num == cache.entry_num);
>> +}
>> +
>> +return ret;
>> +}
>> +
>>  static int hiod_iommufd_get_cap(HostIOMMUDevice *hiod, int cap, Error
>**errp)
>>  {
>>  HostIOMMUDeviceCaps *caps = &hiod->caps;
>> diff --git a/backends/trace-events b/backends/trace-events
>> index 40811a3162..5a23db6c8a 100644
>> --- a/backends/trace-events
>> +++ b/backends/trace-events
>> @@ -18,3 +18,4 @@ iommufd_backend_alloc_hwpt(int iommufd, uint32_t
>dev_id, uint32_t pt_id, uint32_
>>  iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d
>id=%d (%d)"
>>  iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int 
>> ret)
>" iommufd=%d hwpt=%u enable=%d (%d)"
>>  iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t
>iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%u
>iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
>> +iommufd_backend_invalidate_cache(int iommufd, uint32_t hwpt_id, uint32_t
>data_type, uint32_t entry_len, uint32_t entry_num, uint32_t done_num, uint64_t
>data_ptr, int ret) " iommufd=%d hwpt_id=%u data_type=%u entry_len=%u
>entry_num=%u done_num=%u data_ptr=0x%"PRIx64" (%d)"
>Eric
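
One more detail that may be worth checking when adding the Error **errp
parameter: in the quoted hunk, a trace call and error_report() run between the
ioctl() and "ret = -errno", and any library call in between may change errno.
A minimal standalone sketch of the save-first pattern follows; noisy_log() and
both close_*() helpers are hypothetical, and close() on an invalid descriptor
is used only as a convenient failing call.

```c
#include <errno.h>
#include <unistd.h>

/* Stands in for logging or tracing that may overwrite errno. */
static void noisy_log(void)
{
    errno = 0;
}

/* Reads errno after logging: the original error code can be lost. */
static int close_unsafe(int fd)
{
    if (close(fd)) {
        noisy_log();
        return -errno;
    }
    return 0;
}

/* Saves errno right after the failing call, then logs. */
static int close_safe(int fd)
{
    if (close(fd)) {
        int saved = errno;

        noisy_log();
        return -saved;
    }
    return 0;
}
```

In this sketch close_safe(-1) still returns -EBADF, while close_unsafe(-1)
returns whatever errno holds after the logging call.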

[PATCH v7 16/16] sev: Provide sev_features flags from IGVM VMSA to KVM_SEV_INIT2

2025-02-27 Thread Roy Hopkins

IGVM files can contain an initial VMSA that should be applied to each
vcpu as part of the initial guest state. The sev_features flags are
provided as part of the VMSA structure. However, KVM only allows
sev_features to be set during initialization and not as the guest is
being prepared for launch.

This patch queries KVM for the supported set of sev_features flags and
processes the IGVM file during kvm_init to determine any sev_features
flags set in the IGVM file. These are then provided in the call to
KVM_SEV_INIT2 to ensure the guest state matches that specified in the
IGVM file.

This does cause the IGVM file to be processed twice: first to extract
the sev_features, then to actually configure the guest. However, most of
the first pass is ignored, so the overhead is minimal.

Signed-off-by: Roy Hopkins 
Acked-by: Michael S. Tsirkin 
Acked-by: Stefano Garzarella 
---
 target/i386/sev.c | 160 --
 1 file changed, 141 insertions(+), 19 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index fa9b4bcad6..ef25e64b14 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -117,6 +117,8 @@ struct SevCommonState {
 uint32_t cbitpos;
 uint32_t reduced_phys_bits;
 bool kernel_hashes;
+uint64_t sev_features;
+uint64_t supported_sev_features;
 
 /* runtime state */
 uint8_t api_major;
@@ -492,7 +494,40 @@ static void sev_apply_cpu_context(CPUState *cpu)
 }
 }
 
-static int check_vmsa_supported(hwaddr gpa, const struct sev_es_save_area *vmsa,
+static int check_sev_features(SevCommonState *sev_common, uint64_t sev_features,
+  Error **errp)
+{
+/*
+ * Ensure SEV_FEATURES is configured for correct SEV hardware and that
+ * the requested features are supported. If SEV-SNP is enabled then
+ * that feature must be enabled, otherwise it must be cleared.
+ */
+if (sev_snp_enabled() && !(sev_features & SVM_SEV_FEAT_SNP_ACTIVE)) {
+error_setg(
+errp,
+"%s: SEV_SNP is enabled but is not enabled in VMSA sev_features",
+__func__);
+return -1;
+} else if (!sev_snp_enabled() &&
+   (sev_features & SVM_SEV_FEAT_SNP_ACTIVE)) {
+error_setg(
+errp,
+"%s: SEV_SNP is not enabled but is enabled in VMSA sev_features",
+__func__);
+return -1;
+}
+if (sev_features & ~sev_common->supported_sev_features) {
+error_setg(errp,
+   "%s: VMSA contains unsupported sev_features: %" PRIX64 ", "
+   "supported features: %" PRIX64,
+   __func__, sev_features, sev_common->supported_sev_features);
+return -1;
+}
+return 0;
+}
+
+static int check_vmsa_supported(SevCommonState *sev_common, hwaddr gpa,
+const struct sev_es_save_area *vmsa,
 Error **errp)
 {
 struct sev_es_save_area vmsa_check;
@@ -558,24 +593,10 @@ static int check_vmsa_supported(hwaddr gpa, const struct sev_es_save_area *vmsa,
 vmsa_check.x87_fcw = 0;
 vmsa_check.mxcsr = 0;
 
-if (sev_snp_enabled()) {
-if (vmsa_check.sev_features != SVM_SEV_FEAT_SNP_ACTIVE) {
-error_setg(errp,
-   "%s: sev_features in the VMSA contains an unsupported "
-   "value. For SEV-SNP, sev_features must be set to %x.",
-   __func__, SVM_SEV_FEAT_SNP_ACTIVE);
-return -1;
-}
-vmsa_check.sev_features = 0;
-} else {
-if (vmsa_check.sev_features != 0) {
-error_setg(errp,
-   "%s: sev_features in the VMSA contains an unsupported "
-   "value. For SEV-ES and SEV, sev_features must be "
-   "set to 0.", __func__);
-return -1;
-}
+if (check_sev_features(sev_common, vmsa_check.sev_features, errp) < 0) {
+return -1;
 }
+vmsa_check.sev_features = 0;
 
 if (!buffer_is_zero(&vmsa_check, sizeof(vmsa_check))) {
 error_setg(errp,
@@ -1729,6 +1750,39 @@ static int sev_snp_kvm_type(X86ConfidentialGuest *cg)
 return KVM_X86_SNP_VM;
 }
 
+static int sev_init_supported_features(ConfidentialGuestSupport *cgs,
+   SevCommonState *sev_common, Error **errp)
+{
+X86ConfidentialGuestClass *x86_klass =
+   X86_CONFIDENTIAL_GUEST_GET_CLASS(cgs);
+/*
+ * Older kernels do not support query or setting of sev_features. In this
+ * case the set of supported features must be zero to match the settings
+ * in the kernel.
+ */
+if (x86_klass->kvm_type(X86_CONFIDENTIAL_GUEST(sev_common)) ==
+KVM_X86_DEFAULT_VM) {
+sev_common->supported_sev_features = 0;
+return 0;
+}
+
+/* Query KVM for the supported set of sev_features */
+struct kvm_device_attr attr = {
+
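
The feature check described in the commit message above can be sketched
in isolation. This is only an illustration on plain integers, not the
code from the patch: the names demo_check_sev_features and
DEMO_SEV_FEAT_SNP_ACTIVE are hypothetical, and the SNP-active flag is
assumed to be bit 0 of sev_features.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the SNP-active flag is bit 0 of sev_features. */
#define DEMO_SEV_FEAT_SNP_ACTIVE (1ULL << 0)

/*
 * Hypothetical stand-in for the check in the patch.
 * Returns 0 if the requested features are acceptable, -1 otherwise.
 */
static int demo_check_sev_features(bool snp_enabled, uint64_t requested,
                                   uint64_t supported)
{
    bool snp_requested = (requested & DEMO_SEV_FEAT_SNP_ACTIVE) != 0;

    /* The SNP flag in the VMSA must match the configured guest type. */
    if (snp_enabled != snp_requested) {
        return -1;
    }
    /* Every requested bit must be in the set reported as supported. */
    if (requested & ~supported) {
        return -1;
    }
    return 0;
}
```

With supported set to 0, as the patch does for kernels that cannot
report sev_features, only a request of 0 on a non-SNP guest passes.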

Re: [PATCH v5 1/4] hw/ssi/pnv_spi: Replace PnvXferBuffer with Fifo8 structure

2025-02-27 Thread Chalapathi V


Hello Nick,

Thank You for reviewing this series.

On 27-02-2025 07:09, Nicholas Piggin wrote:

On Sat Jan 4, 2025 at 2:18 AM AEST, Chalapathi V wrote:

Dynamically allocating and freeing PnvXferBuffer adds processing
overhead. Use the existing Fifo8 buffer with a capacity of 16 bytes
instead.
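
To illustrate why a fixed-capacity FIFO avoids the allocation overhead,
here is a minimal sketch of a 16-byte ring buffer. It is not QEMU's
Fifo8 implementation; DemoFifo and the demo_fifo_* helpers are
hypothetical names used only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FIFO_SIZE 16   /* same capacity as PNV_SPI_FIFO_SIZE in the patch */

/* Hypothetical fixed-capacity byte FIFO; storage lives inside the struct. */
typedef struct DemoFifo {
    uint8_t data[DEMO_FIFO_SIZE];
    uint32_t head;   /* index of the oldest byte */
    uint32_t num;    /* number of bytes currently stored */
} DemoFifo;

static bool demo_fifo_is_empty(const DemoFifo *f)
{
    return f->num == 0;
}

static bool demo_fifo_is_full(const DemoFifo *f)
{
    return f->num == DEMO_FIFO_SIZE;
}

/* Append one byte; returns false (and stores nothing) when full. */
static bool demo_fifo_push(DemoFifo *f, uint8_t byte)
{
    if (demo_fifo_is_full(f)) {
        return false;
    }
    f->data[(f->head + f->num) % DEMO_FIFO_SIZE] = byte;
    f->num++;
    return true;
}

/* Remove the oldest byte into *byte; returns false when empty. */
static bool demo_fifo_pop(DemoFifo *f, uint8_t *byte)
{
    if (demo_fifo_is_empty(f)) {
        return false;
    }
    *byte = f->data[f->head];
    f->head = (f->head + 1) % DEMO_FIFO_SIZE;
    f->num--;
    return true;
}
```

No malloc or realloc is needed on the transfer path, and the empty
check before popping corresponds to the guard the patch adds in
read_from_frame().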

Signed-off-by: Chalapathi V
---
  include/hw/ssi/pnv_spi.h |   3 +
  hw/ssi/pnv_spi.c | 237 +--
  2 files changed, 81 insertions(+), 159 deletions(-)

diff --git a/include/hw/ssi/pnv_spi.h b/include/hw/ssi/pnv_spi.h
index 8815f67d45..9878d9a25f 100644
--- a/include/hw/ssi/pnv_spi.h
+++ b/include/hw/ssi/pnv_spi.h
@@ -23,6 +23,7 @@
  
  #include "hw/ssi/ssi.h"

  #include "hw/sysbus.h"
+#include "qemu/fifo8.h"
  
  #define TYPE_PNV_SPI "pnv-spi"

  OBJECT_DECLARE_SIMPLE_TYPE(PnvSpi, PNV_SPI)
@@ -37,6 +38,8 @@ typedef struct PnvSpi {
  SSIBus *ssi_bus;
  qemu_irq *cs_line;
  MemoryRegionxscom_spic_regs;
+Fifo8 tx_fifo;
+Fifo8 rx_fifo;
  /* SPI object number */
  uint32_tspic_num;
  uint8_t transfer_len;
diff --git a/hw/ssi/pnv_spi.c b/hw/ssi/pnv_spi.c
index 15e25bd1be..63d298980d 100644
--- a/hw/ssi/pnv_spi.c
+++ b/hw/ssi/pnv_spi.c
@@ -19,6 +19,7 @@
  
  #define PNV_SPI_OPCODE_LO_NIBBLE(x) (x & 0x0F)

  #define PNV_SPI_MASKED_OPCODE(x) (x & 0xF0)
+#define PNV_SPI_FIFO_SIZE 16
  
  /*

   * Macro from include/hw/ppc/fdt.h
@@ -35,48 +36,14 @@
  }  \
  } while (0)
  
-/* PnvXferBuffer */

-typedef struct PnvXferBuffer {
-
-uint32_tlen;
-uint8_t*data;
-
-} PnvXferBuffer;
-
-/* pnv_spi_xfer_buffer_methods */
-static PnvXferBuffer *pnv_spi_xfer_buffer_new(void)
-{
-PnvXferBuffer *payload = g_malloc0(sizeof(*payload));
-
-return payload;
-}
-
-static void pnv_spi_xfer_buffer_free(PnvXferBuffer *payload)
-{
-g_free(payload->data);
-g_free(payload);
-}
-
-static uint8_t *pnv_spi_xfer_buffer_write_ptr(PnvXferBuffer *payload,
-uint32_t offset, uint32_t length)
-{
-if (payload->len < (offset + length)) {
-payload->len = offset + length;
-payload->data = g_realloc(payload->data, payload->len);
-}
-return &payload->data[offset];
-}
-
  static bool does_rdr_match(PnvSpi *s)
  {
  /*
   * According to spec, the mask bits that are 0 are compared and the
   * bits that are 1 are ignored.
   */
-uint16_t rdr_match_mask = GETFIELD(SPI_MM_RDR_MATCH_MASK,
-s->regs[SPI_MM_REG]);
-uint16_t rdr_match_val = GETFIELD(SPI_MM_RDR_MATCH_VAL,
-s->regs[SPI_MM_REG]);
+uint16_t rdr_match_mask = GETFIELD(SPI_MM_RDR_MATCH_MASK, s->regs[SPI_MM_REG]);
+uint16_t rdr_match_val = GETFIELD(SPI_MM_RDR_MATCH_VAL, s->regs[SPI_MM_REG]);
  
  if ((~rdr_match_mask & rdr_match_val) == ((~rdr_match_mask) &

  GETFIELD(PPC_BITMASK(48, 63), s->regs[SPI_RCV_DATA_REG]))) {

Usually try to avoid unrelated cleanup in the same patch that actually
changes things. In this case it's quite minor, but avoiding it helps
with review and rebasing.

If it's on the same line or very close line to your change, or
occasional ones I don't mind so much, but you have quite a few
more further down the patch.
Sorry about that! Should I create a new patch in v6 or keep it as is
for now?

Going forward, I will create separate patches for distinct changes.

Thank You.


@@ -107,8 +74,8 @@ static uint8_t get_from_offset(PnvSpi *s, uint8_t offset)
  return byte;
  }
  
-static uint8_t read_from_frame(PnvSpi *s, uint8_t *read_buf, uint8_t nr_bytes,

-uint8_t ecc_count, uint8_t shift_in_count)
+static uint8_t read_from_frame(PnvSpi *s, uint8_t nr_bytes, uint8_t ecc_count,
+uint8_t shift_in_count)
  {
  uint8_t byte;
  int count = 0;
@@ -118,20 +85,23 @@ static uint8_t read_from_frame(PnvSpi *s, uint8_t *read_buf, uint8_t nr_bytes,
  if ((ecc_count != 0) &&
  (shift_in_count == (PNV_SPI_REG_SIZE + ecc_count))) {
  shift_in_count = 0;
-} else {
-byte = read_buf[count];
+} else if (!fifo8_is_empty(&s->rx_fifo)) {
+byte = fifo8_pop(&s->rx_fifo);
  trace_pnv_spi_shift_rx(byte, count);
  s->regs[SPI_RCV_DATA_REG] = (s->regs[SPI_RCV_DATA_REG] << 8) | byte;
+} else {
+qemu_log_mask(LOG_GUEST_ERROR, "pnv_spi: Reading empty RX_FIFO\n");
  }
  count++;
  } /* end of while */
  return shift_in_count;
  }
  
-static void spi_response(PnvSpi *s, int bits, PnvXferBuffer *rsp_payload)

+static void spi_response(PnvSpi *s)
  {
  uint8_t ecc_count;
  uint8_t shift_in_count;
+uint32_t rx_len;
  
  /*

   * Processing here must handle:
@@ -144,13 +114,14 @@ static void spi_response(PnvSpi *s, int bits, PnvXferBuffer *rsp_payload)
   * First check that the

Re: [PATCH v5 3/4] hw/ssi/pnv_spi: Make bus names distinct for each controllers of a socket

2025-02-27 Thread Chalapathi V



On 27-02-2025 07:24, Nicholas Piggin wrote:

On Sat Jan 4, 2025 at 2:18 AM AEST, Chalapathi V wrote:

Create SPI buses with distinct names on each socket so that responders
are attached to the correct SPI controllers.

QOM tree on a 2 socket machine:
(qemu) info qom-tree
/machine (powernv10-machine)
   /chip[0] (power10_v2.0-pnv-chip)
 /pib_spic[0] (pnv-spi)
   /chip0.pnv.spi.bus.0 (SSI)
   /xscom-spi[0] (memory-region)
   /chip[1] (power10_v2.0-pnv-chip)
 /pib_spic[0] (pnv-spi)
   /chip1.pnv.spi.bus.0 (SSI)
   /xscom-spi[0] (memory-region)

The mechanics of the patch look fine. I don't know about the name
though.

I think "pnv-spi-bus" is the right name for the bus. Using dots as
with chip0. makes it seem like each element is part of a topology.

Would chip0.pnv-spi-bus be better?

I will rename the bus to chip0.pnv-spi-bus. Thank you.
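
As a rough sketch of the naming scheme being discussed, the helper
below formats a per-chip bus name and checks whether a bus= argument
refers to it. The exact final format is an assumption here (chip prefix,
then "pnv-spi-bus", then the controller number), and the function names
are hypothetical; the actual patch builds the name with
g_strdup_printf() in pnv_spi_realize().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "chip<chip_id>.pnv-spi-bus.<spic_num>" into buf.
 * Returns the value of snprintf(), i.e. the untruncated length.
 */
static int demo_spi_bus_name(char *buf, size_t size,
                             uint32_t chip_id, uint32_t spic_num)
{
    return snprintf(buf, size, "chip%u.pnv-spi-bus.%u",
                    (unsigned)chip_id, (unsigned)spic_num);
}

/* Return true if a "bus=<requested>" string names this controller's bus. */
static bool demo_bus_matches(const char *requested,
                             uint32_t chip_id, uint32_t spic_num)
{
    char name[64];
    int n = demo_spi_bus_name(name, sizeof(name), chip_id, spic_num);

    return n > 0 && (size_t)n < sizeof(name) && strcmp(requested, name) == 0;
}
```

Under this scheme an existing command line that uses
bus=pnv-spi-bus.2 no longer matches any bus, which is the
compatibility concern raised in the review.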


I don't suppose there is a good way to create an alias so existing
cmdline works and refers to the bus on chip0? Maybe the chip0 bus
could just not have the chip0. prefix?

Thanks,
Nick

Would it be best to keep the chip0 prefix to have uniformity?

Signed-off-by: Chalapathi V
---
  include/hw/ssi/pnv_spi.h   | 3 ++-
  hw/ppc/pnv.c   | 2 ++
  hw/ssi/pnv_spi.c   | 5 +++--
  tests/qtest/pnv-spi-seeprom-test.c | 2 +-
  4 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/include/hw/ssi/pnv_spi.h b/include/hw/ssi/pnv_spi.h
index 9878d9a25f..7fc5da1f84 100644
--- a/include/hw/ssi/pnv_spi.h
+++ b/include/hw/ssi/pnv_spi.h
@@ -31,7 +31,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(PnvSpi, PNV_SPI)
  #define PNV_SPI_REG_SIZE 8
  #define PNV_SPI_REGS 7
  
-#define TYPE_PNV_SPI_BUS "pnv-spi-bus"

+#define TYPE_PNV_SPI_BUS "pnv.spi.bus"
  typedef struct PnvSpi {
  SysBusDevice parent_obj;
  
@@ -42,6 +42,7 @@ typedef struct PnvSpi {

  Fifo8 rx_fifo;
  /* SPI object number */
  uint32_tspic_num;
+uint32_tchip_id;
  uint8_t transfer_len;
  uint8_t responder_select;
  /* To verify if shift_n1 happens prior to shift_n2 */
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 11fd477b71..ce23892fdf 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -2226,6 +2226,8 @@ static void pnv_chip_power10_realize(DeviceState *dev, Error **errp)
  /* pib_spic[2] connected to 25csm04 which implements 1 byte transfer */
  object_property_set_int(OBJECT(&chip10->pib_spic[i]), "transfer_len",
  (i == 2) ? 1 : 4, &error_fatal);
+object_property_set_int(OBJECT(&chip10->pib_spic[i]), "chip-id",
+chip->chip_id, &error_fatal);
  if (!sysbus_realize(SYS_BUS_DEVICE(OBJECT
  (&chip10->pib_spic[i])), errp)) {
  return;
diff --git a/hw/ssi/pnv_spi.c b/hw/ssi/pnv_spi.c
index 87eac666bb..41beb559c6 100644
--- a/hw/ssi/pnv_spi.c
+++ b/hw/ssi/pnv_spi.c
@@ -1116,14 +1116,15 @@ static const MemoryRegionOps pnv_spi_xscom_ops = {
  
  static const Property pnv_spi_properties[] = {

  DEFINE_PROP_UINT32("spic_num", PnvSpi, spic_num, 0),
+DEFINE_PROP_UINT32("chip-id", PnvSpi, chip_id, 0),
  DEFINE_PROP_UINT8("transfer_len", PnvSpi, transfer_len, 4),
  };
  
  static void pnv_spi_realize(DeviceState *dev, Error **errp)

  {
  PnvSpi *s = PNV_SPI(dev);
-g_autofree char *name = g_strdup_printf(TYPE_PNV_SPI_BUS ".%d",
-s->spic_num);
+g_autofree char *name = g_strdup_printf("chip%d." TYPE_PNV_SPI_BUS ".%d",
+s->chip_id, s->spic_num);
  s->ssi_bus = ssi_create_bus(dev, name);
  s->cs_line = g_new0(qemu_irq, 1);
  qdev_init_gpio_out_named(DEVICE(s), s->cs_line, "cs", 1);
diff --git a/tests/qtest/pnv-spi-seeprom-test.c b/tests/qtest/pnv-spi-seeprom-test.c
index 57f20af76e..ef1005a926 100644
--- a/tests/qtest/pnv-spi-seeprom-test.c
+++ b/tests/qtest/pnv-spi-seeprom-test.c
@@ -92,7 +92,7 @@ static void test_spi_seeprom(const void *data)
  qts = qtest_initf("-machine powernv10 -smp 2,cores=2,"
"threads=1 -accel tcg,thread=single -nographic "
"-blockdev node-name=pib_spic2,driver=file,"
-  "filename=%s -device 25csm04,bus=pnv-spi-bus.2,cs=0,"
+  "filename=%s -device 25csm04,bus=chip0.pnv.spi.bus.2,cs=0,"
"drive=pib_spic2", tmp_path);
  spi_seeprom_transaction(qts, chip);
  qtest_quit(qts);

Re: [PATCH v5 3/4] hw/ssi/pnv_spi: Make bus names distinct for each controllers of a socket

2025-02-27 Thread Cédric Le Goater


On 2/28/25 04:03, Chalapathi V wrote:


On 27-02-2025 07:24, Nicholas Piggin wrote:

On Sat Jan 4, 2025 at 2:18 AM AEST, Chalapathi V wrote:

Create SPI buses with distinct names on each socket so that responders
are attached to the correct SPI controllers.

QOM tree on a 2 socket machine:
(qemu) info qom-tree
/machine (powernv10-machine)
   /chip[0] (power10_v2.0-pnv-chip)
 /pib_spic[0] (pnv-spi)
   /chip0.pnv.spi.bus.0 (SSI)
   /xscom-spi[0] (memory-region)
   /chip[1] (power10_v2.0-pnv-chip)
 /pib_spic[0] (pnv-spi)
   /chip1.pnv.spi.bus.0 (SSI)
   /xscom-spi[0] (memory-region)

The mechanics of the patch look fine. I don't know about the name
though.

I think "pnv-spi-bus" is the right name for the bus. Using dots as
with chip0. makes it seem like each element is part of a topology.

Would chip0.pnv-spi-bus be better?

I will rename the bus to chip0.pnv-spi-bus. Thank you.


Yep. I don't think the bus suffix is useful (minor).

Will you be attaching flash devices from the command line ? Can you provide
an example if so ?

Thanks,

C.




Re: [PATCH] hw/net: npcm7xx_emc: fix alignment to eth_hdr

2025-02-27 Thread Patrick Venture

On Thu, Feb 27, 2025 at 8:08 AM Patrick Venture  wrote:

>
>
> On Thu, Feb 27, 2025 at 8:01 AM Peter Maydell 
> wrote:
>
>> On Thu, 27 Feb 2025 at 15:55, Patrick Venture  wrote:
>> >
>> >
>> >
>> > On Thu, Feb 27, 2025 at 7:52 AM Peter Maydell 
>> wrote:
>> >>
>> >> On Thu, 27 Feb 2025 at 15:40, Patrick Venture 
>> wrote:
>> >> >
>> >> > 'const struct eth_header', which requires 2 byte alignment
>> >> >
>> >> > Signed-off-by: Patrick Venture 
>> >> > ---
>> >> >  hw/net/npcm7xx_emc.c | 7 ++-
>> >> >  1 file changed, 6 insertions(+), 1 deletion(-)
>> >> >
>> >> > diff --git a/hw/net/npcm7xx_emc.c b/hw/net/npcm7xx_emc.c
>> >> > index e06f652629..11ed4a9e6a 100644
>> >> > --- a/hw/net/npcm7xx_emc.c
>> >> > +++ b/hw/net/npcm7xx_emc.c
>> >> > @@ -424,7 +424,12 @@ static bool emc_can_receive(NetClientState *nc)
>> >> >  static bool emc_receive_filter1(NPCM7xxEMCState *emc, const uint8_t
>> *buf,
>> >> >  size_t len, const char
>> **fail_reason)
>> >> >  {
>> >> > -eth_pkt_types_e pkt_type =
>> get_eth_packet_type(PKT_GET_ETH_HDR(buf));
>> >> > +struct eth_header eth_hdr = {};
>> >> > +eth_pkt_types_e pkt_type;
>> >> > +
>> >> > +memcpy(&eth_hdr, PKT_GET_ETH_HDR(buf),
>> >> > +   (sizeof(eth_hdr) > len) ? len : sizeof(eth_hdr));
>> >> > +pkt_type = get_eth_packet_type(&eth_hdr);
>> >>
>> >> Maybe better to mark struct eth_header as QEMU_PACKED?
>> >> Compare commit f8b94b4c5201 ("net: mark struct ip_header as
>> >> QEMU_PACKED"). The handling of these header structs in eth.h
>> >> is in general pretty suspect IMHO. We do the same
>> >> "get_eth_packet_type(PKT_GET_ETH_HDR(buf))" in other devices,
>> >> so this isn't just this device's bug.
>>
>> > Roger that. We saw this in the two NICs we happened to be testing that
>> day, and yeah, I grepped and just figured that those other NICs were doing
>> something with their buffer allocations that we didn't. I'll give
>> QEMU_PACKED a whirl.
>>
>> You might find you need to make some fixes to other
>> devices to get the QEMU_PACKED change to compile (do an
>> all-targets build to test that). For instance for the
>> ip_header change I had to first fix virtio-net.c in commit
>> 5814c0846793715. The kind of thing that will need fixing is
>> if there are places where code takes the address of the
>> h_proto field and puts it into a uint16_t* : the compiler
>> will complain about that. A quick grep suggests that the
>> rocker_of_dpa.c code might be doing something like this, but
>> hopefully that's it.
>>
>
OK, after some digging I see that vlanhdr is used similarly in the
rocker_of_dpa.c code. I'd rather not bite off the yak shave of fixing
all the Ethernet headers, but since Ethernet packets really are packed
structures, should we just make them all packed and bite that bullet?


>
> Thanks for the heads-up.
>
>>
>> thanks
>> -- PMM
>>
>

Re: [PATCH v2] vdpa: Fix endian bugs in shadow virtqueue

2025-02-27 Thread Konstantin Shkolnyy


On 2/27/2025 00:33, Michael Tokarev wrote:

25.02.2025 15:39, Konstantin Shkolnyy wrote:

On 2/25/2025 03:30, Michael Tokarev wrote:



This looks like a qemu-stable material.
Please let me know if it is not.


It won't help without my other "[PATCH v2] vdpa: Allow vDPA to work on 
big-endian machine". With both patches, VDPA works on a big-endian 
machine.


Aha. And it is not in master yet.  Thank you for letting me know!

What do you think: is it worth the effort to pick these up for
older stable releases (7.2, 8.2) too?


Yes. These are legitimate bug fixes. I suspect more and more people will
use vDPA as time goes by, so someone might try it on s390 with a stable QEMU.

[Bug 2072564] Re: qemu-aarch64-static segfaults running ldconfig.real (amd64 host)

2025-02-27 Thread Launchpad Bug Tracker

This bug was fixed in the package qemu - 1:9.2.1+ds-1ubuntu3

---
qemu (1:9.2.1+ds-1ubuntu3) plucky; urgency=medium

  * Fix qemu-aarch64-static segfaults running ldconfig.real (LP: #2072564)
- lp-2072564-elfload-Fix-alignment-when-unmapping-excess-reservat.patch
Thanks to Dimitry Andric for identifying the fix.

 -- Lukas Märdian   Wed, 26 Feb 2025 09:56:38 +0100

** Changed in: qemu (Ubuntu)
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2072564

Title:
  qemu-aarch64-static segfaults running ldconfig.real (amd64 host)

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Released
Status in qemu source package in Noble:
  Triaged
Status in qemu source package in Oracular:
  Triaged

Bug description:
  [ Impact ]

   * QEMU crashes when running (emulating) ldconfig in an Ubuntu 22.04
  arm64 guest

   * This affects the qemu-user-static 1:8.2.2+ds-0ubuntu1 package on
  Ubuntu 24.04+, running on an amd64 host.

   * When running docker containers with Ubuntu 22.04 in them, emulating
  arm64 with qemu-aarch64-static, invocations of ldconfig (actually
  ldconfig.real) segfault, leading to problems when loading shared
  libraries.

  [ Test Plan ]

   * Reproducer is very easy:

  $ sudo snap install docker
  docker 27.5.1 from Canonical** installed
  $ docker run -ti --platform linux/arm64/v8 ubuntu:22.04
  Unable to find image 'ubuntu:22.04' locally
  22.04: Pulling from library/ubuntu
  0d1c17d4e593: Pull complete 
  Digest: sha256:ed1544e454989078f5dec1bfdabd8c5cc9c48e0705d07b678ab6ae3fb61952d2
  Status: Downloaded newer image for ubuntu:22.04

  # Execute ldconfig.real inside the arm64 guest.
  # This should not crash after the fix!
  root@ad80af5378dc:/# /sbin/ldconfig.real
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
  Segmentation fault (core dumped)

  [ Where problems could occur ]

   * This changes the alignment of sections in the ELF binary via QEMU's
  ELF loader. If something goes wrong with this change, it could lead to
  all kinds of crashes (segfaults) in emulated binaries.

  [ Other Info ]

   * Upstream bug: https://gitlab.com/qemu-project/qemu/-/issues/1913
   * Upstream fix: https://gitlab.com/qemu-project/qemu/-/commit/4b7b20a3
 - Fix dependency (needed for QEMU < 9.20): 
https://gitlab.com/qemu-project/qemu/-/commit/c81d1faf

  --- original bug report ---

  
  This affects the qemu-user-static 1:8.2.2+ds-0ubuntu1 package on
  Ubuntu 24.04, running on an amd64 host.

  When running docker containers with Ubuntu 22.04 in them, emulating
  arm64 with qemu-aarch64-static, invocations of ldconfig (actually
  ldconfig.real) segfault. For example:

  $ docker run -ti --platform linux/arm64/v8 ubuntu:22.04
  root@8861ff640a1c:/# /sbin/ldconfig.real
  Segmentation fault

  If you copy the ldconfig.real binary to the host, and run it directly
  via qemu-aarch64-static:

  $ gdb --args qemu-aarch64-static ./ldconfig.real
  GNU gdb (Ubuntu 15.0.50.20240403-0ubuntu1) 15.0.50.20240403-git
  Copyright (C) 2024 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later 
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.
  Type "show copying" and "show warranty" for details.
  This GDB was configured as "x86_64-linux-gnu".
  Type "show configuration" for configuration details.
  For bug reporting instructions, please see:
  .
  Find the GDB manual and other documentation resources online at:
  .

  For help, type "help".
  Type "apropos word" to search for commands related to "word"...
  Reading symbols from qemu-aarch64-static...
  Reading symbols from 
/home/dim/.cache/debuginfod_client/86579812b213be0964189499f62f176bea817bf2/debuginfo...
  (gdb) r
  Starting program: /usr/bin/qemu-aarch64-static ./ldconfig.real
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
  [New Thread 0x776006c0 (LWP 28378)]

  Thread 1 "qemu-aarch64-st" received signal SIGSEGV, Segmentation fault.
  0x7fffe801645b in ?? ()
  (gdb) disassemble
  No function contains program counter for selected frame.

  It looks like this is a known qemu regression after v8.1.1:
  https://gitlab.com/qemu-project/qemu/-/issues/1913

  Downgrading the package to qemu-user-
  static_8.0.4+dfsg-1ubuntu3_amd64.deb fixes the segfault.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2072564/+subscriptions

Re: [PATCH] hw/net: npcm7xx_emc: fix alignment to eth_hdr

2025-02-27 Thread Peter Maydell

On Thu, 27 Feb 2025 at 18:12, Patrick Venture  wrote:
>
>
>
> On Thu, Feb 27, 2025 at 8:08 AM Patrick Venture  wrote:
>>
>>
>>
>> On Thu, Feb 27, 2025 at 8:01 AM Peter Maydell  
>> wrote:
>>>
>>> On Thu, 27 Feb 2025 at 15:55, Patrick Venture  wrote:
>>> >
>>> >
>>> >
>>> > On Thu, Feb 27, 2025 at 7:52 AM Peter Maydell  
>>> > wrote:
>>> >>
>>> >> On Thu, 27 Feb 2025 at 15:40, Patrick Venture  wrote:
>>> >> >
>>> >> > 'const struct eth_header', which requires 2 byte alignment
>>> >> >
>>> >> > Signed-off-by: Patrick Venture 
>>> >> > ---
>>> >> >  hw/net/npcm7xx_emc.c | 7 ++-
>>> >> >  1 file changed, 6 insertions(+), 1 deletion(-)
>>> >> >
>>> >> > diff --git a/hw/net/npcm7xx_emc.c b/hw/net/npcm7xx_emc.c
>>> >> > index e06f652629..11ed4a9e6a 100644
>>> >> > --- a/hw/net/npcm7xx_emc.c
>>> >> > +++ b/hw/net/npcm7xx_emc.c
>>> >> > @@ -424,7 +424,12 @@ static bool emc_can_receive(NetClientState *nc)
>>> >> >  static bool emc_receive_filter1(NPCM7xxEMCState *emc, const uint8_t 
>>> >> > *buf,
>>> >> >  size_t len, const char **fail_reason)
>>> >> >  {
>>> >> > -eth_pkt_types_e pkt_type = 
>>> >> > get_eth_packet_type(PKT_GET_ETH_HDR(buf));
>>> >> > +struct eth_header eth_hdr = {};
>>> >> > +eth_pkt_types_e pkt_type;
>>> >> > +
>>> >> > +memcpy(&eth_hdr, PKT_GET_ETH_HDR(buf),
>>> >> > +   (sizeof(eth_hdr) > len) ? len : sizeof(eth_hdr));
>>> >> > +pkt_type = get_eth_packet_type(&eth_hdr);
>>> >>
>>> >> Maybe better to mark struct eth_header as QEMU_PACKED?
>>> >> Compare commit f8b94b4c5201 ("net: mark struct ip_header as
>>> >> QEMU_PACKED"). The handling of these header structs in eth.h
>>> >> is in general pretty suspect IMHO. We do the same
>>> >> "get_eth_packet_type(PKT_GET_ETH_HDR(buf))" in other devices,
>>> >> so this isn't just this device's bug.
>>>
>>> > Roger that. We saw this in the two NICs we happened to be testing that 
>>> > day, and yeah, I grepped and just figured that those other NICs were 
>>> > doing something with their buffer allocations that we didn't. I'll give 
>>> > QEMU_PACKED a whirl.
>>>
>>> You might find you need to make some fixes to other
>>> devices to get the QEMU_PACKED change to compile (do an
>>> all-targets build to test that). For instance for the
>>> ip_header change I had to first fix virtio-net.c in commit
>>> 5814c0846793715. The kind of thing that will need fixing is
>>> if there are places where code takes the address of the
>>> h_proto field and puts it into a uint16_t* : the compiler
>>> will complain about that. A quick grep suggests that the
>>> rocker_of_dpa.c code might be doing something like this, but
>>> hopefully that's it.
>
>
> OK, after some digging I see that vlanhdr is used similarly in the
> rocker_of_dpa.c code. I'd rather not bite off the yak shave of fixing
> all the Ethernet headers, but since Ethernet packets really are packed
> structures, should we just make them all packed and bite that bullet?

If you want to do all of them that's probably the long term
right thing. But the patchset structure would be a series
of "fix X that assumes struct A is not packed", "fix Y
that assumes struct A is not packed", "mark struct A packed",
"fix Z that assumes struct B is not packed", "mark struct B
packed", etc -- so I don't mind if you stop partway through
without doing all of them. (After all, that's exactly what
I did with only doing ip_header :-))

thanks
-- PMM
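
For readers following the alignment discussion, here is a minimal
sketch of reading the EtherType from a buffer that may be unaligned.
It combines the two ideas raised in the thread: a packed header struct
and a memcpy() into a local copy. It also rejects frames shorter than
a full header instead of parsing a truncated copy. The struct and
function names are hypothetical stand-ins, not QEMU's definitions, and
the packed attribute is written in GCC/Clang syntax.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN 6

/* Hypothetical stand-in for an Ethernet header, marked packed. */
struct demo_eth_header {
    uint8_t  h_dest[DEMO_ETH_ALEN];
    uint8_t  h_source[DEMO_ETH_ALEN];
    uint16_t h_proto;               /* big-endian on the wire */
} __attribute__((packed));

/*
 * Read the EtherType from a possibly unaligned frame buffer.
 * Returns 0 on success, -1 if the frame is shorter than a header.
 */
static int demo_get_proto(const uint8_t *buf, size_t len, uint16_t *proto)
{
    struct demo_eth_header hdr;
    uint16_t be;
    uint8_t raw[2];

    if (len < sizeof(hdr)) {
        return -1;   /* do not interpret a partial header */
    }
    /* memcpy() has no alignment requirement on its source. */
    memcpy(&hdr, buf, sizeof(hdr));

    /* Convert from network byte order without assuming host endianness. */
    be = hdr.h_proto;
    memcpy(raw, &be, sizeof(raw));
    *proto = (uint16_t)((raw[0] << 8) | raw[1]);
    return 0;
}
```

Marking the struct packed tells the compiler the type has alignment 1,
so direct field access through a cast pointer is also safe, at the cost
of having to fix callers that take the address of h_proto.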

[PATCH] nbd: Defer trace init until after daemonization

2025-02-27 Thread Eric Blake

At least the simple trace backend works by spawning a helper thread,
and setting up an atexit() handler that coordinates completion with
the helper thread.  But since atexit registrations survive fork() but
helper threads do not, this means that qemu-nbd configured to use the
simple trace will deadlock waiting for a thread that no longer exists
when it has daemonized.

Better is to follow the example of vl.c: don't call any setup
functions that might spawn helper threads until we are in the final
process that will be doing the work worth tracing.

Tested by configuring with --enable-trace-backends=simple, then running
  qemu-nbd --fork --trace=nbd_\*,file=qemu-nbd.trace -f raw -r README.rst
followed by `nbdinfo nbd://localhost`, and observing that the trace
file is now created without hanging.

Reported-by: Thomas Huth 
Signed-off-by: Eric Blake 
---
 qemu-nbd.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index 05b61da51ea..ed5895861bb 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -852,10 +852,6 @@ int main(int argc, char **argv)
 export_name = "";
 }

-if (!trace_init_backends()) {
-exit(1);
-}
-trace_init_file();
 qemu_set_log(LOG_TRACE, &error_fatal);

 socket_activation = check_socket_activation();
@@ -1045,6 +1041,18 @@ int main(int argc, char **argv)
 #endif /* WIN32 */
 }

+/*
+ * trace_init must be done after daemonization.  Why? Because at
+ * least the simple backend spins up a helper thread as well as an
+ * atexit() handler that waits on that thread, but the helper
+ * thread won't survive a fork, leading to deadlock in the child
+ * if we initialized pre-fork.
+ */
+if (!trace_init_backends()) {
+exit(1);
+}
+trace_init_file();
+
 if (opts.device != NULL && sockpath == NULL) {
 sockpath = g_malloc(128);
 snprintf(sockpath, 128, SOCKET_PATH, basename(opts.device));
-- 
2.48.1
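
The half of the problem that is easy to demonstrate in isolation is
that atexit() registrations are inherited across fork(). The sketch
below is not taken from qemu-nbd; the names are hypothetical. It
registers an exit handler, forks, and reports whether the handler ran
in the child. A helper thread created before the fork would not exist
in that child, which is why a handler that waits on such a thread can
hang.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t demo_registering_pid;   /* process that called atexit() */
static int demo_report_fd = -1;      /* write end of the pipe to the parent */

/* Exit handler: when run in a forked child, tell the parent that we ran. */
static void demo_exit_handler(void)
{
    if (getpid() != demo_registering_pid && demo_report_fd >= 0) {
        char c = 'x';
        if (write(demo_report_fd, &c, 1) < 0) {
            /* Nothing useful can be done about a failure at exit. */
        }
    }
}

/*
 * Register an exit handler, fork, and let the child exit() at once.
 * Returns 1 if the inherited handler ran in the child, 0 if it did
 * not, and -1 on setup failure.
 */
static int demo_atexit_survives_fork(void)
{
    int fds[2];
    char c;
    ssize_t n;
    pid_t pid;

    if (pipe(fds) < 0) {
        return -1;
    }
    demo_registering_pid = getpid();
    demo_report_fd = fds[1];
    if (atexit(demo_exit_handler) != 0) {
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        exit(0);   /* runs the handlers registered before fork() */
    }

    close(fds[1]);
    demo_report_fd = -1;
    n = read(fds[0], &c, 1);   /* 0 at EOF if the handler stayed silent */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1 ? 1 : 0;
}
```

Deferring initialization until after the final fork, as the patch does,
means both the thread and the handler that depends on it are created in
the same process.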

Re: [PATCH] QIOChannelSocket: Flush zerocopy socket error queue on ENOBUF failure for sendmsg

2025-02-27 Thread Peter Xu

On Thu, Feb 27, 2025 at 10:30:31PM +0530, Manish wrote:
> Again really sorry, missed this due to some issue with my mail filters and
> came to know about it via qemu-devel weblink. :)
> 
> On 25/02/25 2:37 pm, Daniel P. Berrangé wrote:
> > 
> > On Fri, Feb 21, 2025 at 04:44:48AM -0500, Manish Mishra wrote:
> > > We allocate extra metadata SKBs in case of zerocopy send. This metadata
> > > memory is accounted for in the OPTMEM limit. If there is any error with
> > > sending zerocopy data or if zerocopy was skipped, these metadata SKBs are
> > > queued in the socket error queue. This error queue is freed when userspace
> > > reads it.
> > > 
> > > Usually, if there are continuous failures, we merge the metadata into a 
> > > single
> > > SKB and free another one. However, if there is any out-of-order 
> > > processing or
> > > an intermittent zerocopy failures, this error chain can grow 
> > > significantly,
> > > exhausting the OPTMEM limit. As a result, all new sendmsg requests fail to
> > > allocate any new SKB, leading to an ENOBUF error.
> > IIUC, you are effectively saying that the migration code is calling
> > qio_channel_write() too many times, before it calls qio_channel_flush().
> > 
> > Can you clarify what you mean by the "OPTMEM limit" here? I'm wondering
> > if this is potentially triggered by suboptimal tuning of the deployment
> > environment or whether we need to document tuning better.
> 
> 
> I replied to this on the other thread. Posting it again.
> 
> We allocate some memory for zerocopy metadata. This is not accounted in
> tcp_send_queue, but it is accounted in optmem_limit.
> 
> https://github.com/torvalds/linux/blob/dd83757f6e686a2188997cb58b5975f744bb7786/net/core/skbuff.c#L1607
> 
> Also when the zerocopy data is sent and acked, we try to free this
> allocated skb as we can see in below code.
> 
> https://github.com/torvalds/linux/blob/dd83757f6e686a2188997cb58b5975f744bb7786/net/core/skbuff.c#L1751
> 
> If __msg_zerocopy_callback() is called on continuous ranges and
> skb_zerocopy_notify_extend() succeeds, we merge the ranges and free the
> current skb. Otherwise, we insert that skb into the error queue, and it
> won't be freed until userspace flushes the error queue. This can happen
> when zerocopy packets are skipped intermittently, or when zerocopy is
> always skipped but acks arrive out of order. As a result the error chain
> keeps growing, exhausting the optmem_limit, and when a new zerocopy
> sendmsg request comes in, it cannot allocate the metadata and returns
> ENOBUF.
> 
> I understand there is another bug behind why zerocopy packets are getting
> skipped, which could be specific to our deployment. But live migrations
> should not fail in any case: isn't it fine to mark zerocopy as skipped
> rather than fail?
> 
> 
> > > To work around this, if we encounter an ENOBUF error with a zerocopy sendmsg,
> > > we flush the error queue and retry once more.
> > > 
> > > Signed-off-by: Manish Mishra
> > > ---
> > >   include/io/channel-socket.h |  1 +
> > >   io/channel-socket.c | 52 -
> > >   2 files changed, 46 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
> > > index ab15577d38..6cfc66eb5b 100644
> > > --- a/include/io/channel-socket.h
> > > +++ b/include/io/channel-socket.h
> > > @@ -49,6 +49,7 @@ struct QIOChannelSocket {
> > >   socklen_t remoteAddrLen;
> > >   ssize_t zero_copy_queued;
> > >   ssize_t zero_copy_sent;
> > > +bool new_zero_copy_sent_success;
> > >   };
> > > diff --git a/io/channel-socket.c b/io/channel-socket.c
> > > index 608bcf066e..c7f576290f 100644
> > > --- a/io/channel-socket.c
> > > +++ b/io/channel-socket.c
> > > @@ -37,6 +37,11 @@
> > >   #define SOCKET_MAX_FDS 16
> > > +#ifdef QEMU_MSG_ZEROCOPY
> > > +static int qio_channel_socket_flush_internal(QIOChannel *ioc,
> > > + Error **errp);
> > > +#endif
> > > +
> > >   SocketAddress *
> > >   qio_channel_socket_get_local_address(QIOChannelSocket *ioc,
> > >Error **errp)
> > > @@ -65,6 +70,7 @@ qio_channel_socket_new(void)
> > >   sioc->fd = -1;
> > >   sioc->zero_copy_queued = 0;
> > >   sioc->zero_copy_sent = 0;
> > > +sioc->new_zero_copy_sent_success = FALSE;
> > >   ioc = QIO_CHANNEL(sioc);
> > >   qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
> > > @@ -566,6 +572,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel 
> > > *ioc,
> > >   size_t fdsize = sizeof(int) * nfds;
> > >   struct cmsghdr *cmsg;
> > >   int sflags = 0;
> > > +bool zero_copy_flush_pending = TRUE;
> > >   memset(control, 0, CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS));
> >

Re: [PATCH 1/3] target/arm: Correct LDRD atomicity and fault behaviour

2025-02-27 Thread Peter Maydell

On Thu, 27 Feb 2025 at 17:41, Richard Henderson
 wrote:
>
> On 2/27/25 06:27, Peter Maydell wrote:
> > +static void do_ldrd_load(DisasContext *s, TCGv_i32 addr, int rt, int rt2)
> > +{
> > +/*
> > + * LDRD is required to be an atomic 64-bit access if the
> > + * address is 8-aligned, two atomic 32-bit accesses if
> > > + * it's only 4-aligned, and to give an alignment fault
> > + * if it's not 4-aligned.
> > + * Rt is always the word from the lower address, and Rt2 the
> > + * data from the higher address, regardless of endianness.
> > + * So (like gen_load_exclusive) we avoid gen_aa32_ld_i64()
> > + * so we don't get its SCTLR_B check, and instead do a 64-bit access
> > + * using MO_BE if appropriate and then split the two halves.
> > + *
> > + * This also gives us the correct behaviour of not updating
> > + * rt if the load of rt2 faults; this is required for cases
> > + * like "ldrd r2, r3, [r2]" where rt is also the base register.
> > + */
> > +int mem_idx = get_mem_index(s);
> > +MemOp opc = MO_64 | MO_ALIGN_4 | MO_ATOM_SUBALIGN | s->be_data;
>
> The 64-bit atomicity begins with armv7 + LPAE, and is not present for any
> m-profile. Worth checking ARM_FEATURE_LPAE, or at least adding to the comment?
>
> Getting 2 x 4-byte atomicity, without requiring 8-byte atomicity, would use
> MO_ATOM_IFALIGN_PAIR.

Definitely worth a comment at minimum. Do we generate better
code for MO_ATOM_IFALIGN_PAIR ? (If not, then providing higher
atomicity than the architecture mandates seems harmless.)

For the comment in memop.h that currently reads
 * MO_ATOM_SUBALIGN: the operation is single-copy atomic by parts
 *by the alignment.  E.g. if the address is 0 mod 4, then each
 *4-byte subobject is single-copy atomic.
 *This is the atomicity e.g. of IBM Power.

maybe we could expand the e.g:

  E.g if an 8-byte value is accessed at an address which is 0 mod 8,
  then the whole 8-byte access is single-copy atomic; otherwise,
  if it is accessed at 0 mod 4 then each 4-byte subobject is
  single-copy atomic; otherwise if it is accessed at 0 mod 2
  then the four 2-byte subobjects are single-copy atomic.

? I wasn't sure when reading what we currently have whether
it provided the 8-byte-aligned guarantee, rather than merely
the 4-byte-aligned one.
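
To make the proposed wording concrete, here is a minimal sketch of the rule in plain C. It is not QEMU code: subalign_atomic_chunk() is a hypothetical helper, and it assumes the access size is a power of two. It returns the size of the largest subobject that would be single-copy atomic under the reading of MO_ATOM_SUBALIGN suggested above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a QEMU API): given an access of 'size' bytes
 * at 'addr', return the size of each single-copy-atomic subobject under
 * the "atomic by parts, by the alignment" rule.  'size' is assumed to be
 * a power of two.
 */
unsigned subalign_atomic_chunk(uint64_t addr, unsigned size)
{
    unsigned chunk = size;

    /* Halve the chunk until the address is aligned to it. */
    while (chunk > 1 && (addr & (chunk - 1)) != 0) {
        chunk >>= 1;
    }
    return chunk;
}
```

With this reading, an 8-byte access at an address that is 0 mod 8 yields one 8-byte atomic access, at 4 mod 8 two 4-byte parts, and at 2 mod 4 four 2-byte parts.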

thanks
-- PMM

Re: [PATCH 4/5] rust: pl011: switch to safe chardev operation

2025-02-27 Thread Paolo Bonzini

On Thu, Feb 27, 2025 at 6:25 PM Peter Maydell  wrote:
> On Thu, 27 Feb 2025 at 16:48, Paolo Bonzini  wrote:
> > Switch bindings::CharBackend with chardev::CharBackend.  This removes
> > occurrences of "unsafe" due to FFI and switches the wrappers for receive,
> > can_receive and event callbacks to the common ones implemented by
> > chardev::CharBackend.
> >
> > Signed-off-by: Paolo Bonzini 
>
> > @@ -567,21 +552,16 @@ fn write(&self, offset: hwaddr, value: u64, _size: 
> > u32) {
>
> > -update_irq = self.regs.borrow_mut().write(
> > -field,
> > -value as u32,
> > -addr_of!(self.char_backend) as *mut _,
> > -);
> > +update_irq = self
> > +.regs
> > +.borrow_mut()
> > +.write(field, value as u32, &self.char_backend);
> >  } else {
> >  eprintln!("write bad offset {offset} value {value}");
> >  }
>
> Entirely unrelated to this patch, but seeing this go past
> reminded me that I had a question I didn't get round to
> asking in the community call the other day. In this
> PL011State::write function, we delegate the job of
> updating the register state to PL011Registers::write,
> which returns a bool to tell us whether to call update().
>
> I guess the underlying design idea here is "the register
> object updates itself and tells the device object what
> kinds of updates to the outside world it needs to do" ?
> But then, why is the irq output something that PL011State
> needs to handle itself whereas the chardev backend is
> something we can pass into PL011Registers ?

Just because the IRQ update is needed in many places and the chardev
backend only in one place.

> In the C version, we just call pl011_update() where we
> need to; we could validly call it unconditionally for any
> write, we're just being (possibly prematurely) efficient
> by avoiding a call when we happen to know that the register
> write didn't touch any of the state that pl011_update()
> cares about. So it feels a bit odd to me that in the Rust
> version this "we happen to know that sometimes it would be
> unnecessary to call the update function" has been kind of
> promoted to being part of an interface between the two
> different types PL011Registers and PL011State.

Yeah, if I was writing from scratch I would probably call update()
unconditionally. If it turns out to be inefficient you could cache the
current value of

let flags = regs.int_level & regs.int_enabled;

in PL011State as a BqlCell.
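
As a rough illustration of that caching idea, here is a sketch in plain C rather than Rust. All names are hypothetical (struct uart_model is not the real PL011State, and irq_updates merely stands in for a call that would reach qemu_set_irq()); the point is only that update() can be called unconditionally if it remembers the last masked value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device state; not the real PL011State. */
struct uart_model {
    uint32_t int_level;
    uint32_t int_enabled;
    uint32_t cached_flags;   /* last masked value pushed to the IRQ line */
    unsigned irq_updates;    /* counts updates that would reach the IRQ line */
};

/*
 * Safe to call after every register write: the IRQ line is only touched
 * when the masked interrupt state actually changed.
 */
bool uart_model_update(struct uart_model *u)
{
    uint32_t flags = u->int_level & u->int_enabled;

    if (flags == u->cached_flags) {
        return false;        /* nothing changed, skip the external call */
    }
    u->cached_flags = flags;
    u->irq_updates++;        /* stand-in for raising/lowering the IRQ */
    return true;
}
```

With this shape the register-write path would not need to report back whether an update is required; the cost of the unconditional call is one AND and one compare.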

> Thinking about other devices, presumably for more complex
> devices we might need to pass more than just a single 'bool'
> back from PL011Registers::write. What other kinds of thing
> might we need to do in the FooState function, and (since
> the pl011 code is presumably going to be used as a template
> for those other devices) is it worth having something that
> expresses that better than just a raw 'bool' return ?

Ideally nothing, especially considering that more modern devices have
edge-triggered interrupts like MSIs, instead of level-triggered
interrupts that need qemu_set_irq() calls. But if you have something a
lot more complex than a bool I would pass down the PL011State and do
something like pl011.schedule_update_irq() which updates a BqlCell<>.
The device could then use a bottom half or process them after
"drop(regs)".

HPET has another approach, which is to store a backpointer from
HPETTimer to the HPETState, so that it can do

self.get_state().irqs[route].pulse();

without passing down anything. The reason for this is that it has
multiple timers on the same routine, and it assigns the timers to
separate HPETTimers. I would not use it for PL011 because all accesses
to the PL011Registers go through the PL011State.

A while ago I checked how OpenVMM does it, and basically it does not
have the PL011State/PL011Registers separation at all: the devices are
automatically wrapped with a Mutex and memory accesses take a &mut.
That removes some of the complexity, but also a lot of flexibility.

Unfortunately, before being able to reason on how to make peace with
the limitations of safe Rust, it was necessary to spend a lot of time
writing API abstractions, otherwise you don't actually experience the
limitations. But that means that the number of lines of code in
qemu_api is not representative of my experience using it. :( I am
sorry this isn't a great answer yet; certainly some aspects of the
PL011 or HPET devices could be treated as a blueprint for future
devices, but which and how to document that is something where I would
like to consult with an actual Rust maven.

Paolo

[PATCH] scripts: dump stdin on meson-buildoptions error

2025-02-27 Thread Patrick Venture

From: Nabih Estefan 

Dump the contents of sys.stdin when meson-buildoptions.py fails to parse it,
letting us debug the build errors instead of just saying "Couldn't parse"

Signed-off-by: Nabih Estefan 
Signed-off-by: Patrick Venture 
---
 scripts/meson-buildoptions.py | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/scripts/meson-buildoptions.py b/scripts/meson-buildoptions.py
index 4814a8ff61..a3e22471b2 100644
--- a/scripts/meson-buildoptions.py
+++ b/scripts/meson-buildoptions.py
@@ -241,8 +241,14 @@ def print_parse(options):
 print("  esac")
 print("}")
 
-
-options = load_options(json.load(sys.stdin))
+json_data = sys.stdin.read()
+try:
+options = load_options(json.loads(json_data))
+except:
+print("Failure in scripts/meson-buildoptions.py parsing stdin as json",
+  file=sys.stderr)
+print(json_data, file=sys.stderr)
+sys.exit(1)
 print("# This file is generated by meson-buildoptions.py, do not edit!")
 print_help(options)
 print_parse(options)
-- 
2.48.1.658.g4767266eb4-goog

Re: [PATCH 1/3] target/arm: Correct LDRD atomicity and fault behaviour

2025-02-27 Thread Richard Henderson


On 2/27/25 06:27, Peter Maydell wrote:

Our LDRD implementation is wrong in two respects:

  * if the address is 4-aligned and the load crosses a page boundary
and the second load faults and the first load was to the
base register (as in cases like "ldrd r2, r3, [r2]", then we
must not update the base register before taking the fault
  * if the address is 8-aligned the access must be a 64-bit
single-copy atomic access, not two 32-bit accesses

Rewrite the handling of the loads in LDRD to use a single
tcg_gen_qemu_ld_i64() and split the result into the destination
registers. This allows us to get the atomicity requirements
right, and also implicitly means that we won't update the
base register too early for the page-crossing case.

Note that because we no longer increment 'addr' by 4 in the course of
performing the LDRD we must change the adjustment value we pass to
op_addr_ri_post() and op_addr_rr_post(): it no longer needs to
subtract 4 to get the correct value to use if doing base register
writeback.

STRD has the same problem with not getting the atomicity right;
we will deal with that in the following commit.

Cc: qemu-sta...@nongnu.org
Reported-by: Stu Grossman 
Signed-off-by: Peter Maydell 
---
  target/arm/tcg/translate.c | 64 --
  1 file changed, 40 insertions(+), 24 deletions(-)

diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
index d8225b77c8c..e10a1240c17 100644
--- a/target/arm/tcg/translate.c
+++ b/target/arm/tcg/translate.c
@@ -5003,10 +5003,43 @@ static bool op_store_rr(DisasContext *s, arg_ldst_rr *a,
  return true;
  }
  
+static void do_ldrd_load(DisasContext *s, TCGv_i32 addr, int rt, int rt2)

+{
+/*
+ * LDRD is required to be an atomic 64-bit access if the
+ * address is 8-aligned, two atomic 32-bit accesses if
+ * it's only 4-aligned, and to give an alignment fault
+ * if it's not 4-aligned.
+ * Rt is always the word from the lower address, and Rt2 the
+ * data from the higher address, regardless of endianness.
+ * So (like gen_load_exclusive) we avoid gen_aa32_ld_i64()
+ * so we don't get its SCTLR_B check, and instead do a 64-bit access
+ * using MO_BE if appropriate and then split the two halves.
+ *
+ * This also gives us the correct behaviour of not updating
+ * rt if the load of rt2 faults; this is required for cases
+ * like "ldrd r2, r3, [r2]" where rt is also the base register.
+ */
+int mem_idx = get_mem_index(s);
+MemOp opc = MO_64 | MO_ALIGN_4 | MO_ATOM_SUBALIGN | s->be_data;


The 64-bit atomicity begins with armv7 + LPAE, and is not present for any
m-profile. Worth checking ARM_FEATURE_LPAE, or at least adding to the comment?

Getting 2 x 4-byte atomicity, without requiring 8-byte atomicity, would use
MO_ATOM_IFALIGN_PAIR.


Otherwise,
Reviewed-by: Richard Henderson 


r~

Re: [PATCH 0/5] hw/arm: Remove printf() calls

2025-02-27 Thread Richard Henderson


On 2/27/25 09:01, Peter Maydell wrote:

Peter Maydell (5):
   hw/arm/omap1: Convert raw printfs to qemu_log_mask()
   hw/arm/omap1: Drop ALMDEBUG ifdeffed out code
   hw/arm/omap1: Convert information printfs to tracepoints
   hw/arm/omap_sx1.c: Remove ifdeffed out debug printf
   hw/arm/versatilepb: Convert printfs to LOG_GUEST_ERROR


Reviewed-by: Richard Henderson 

r~

Re: [PATCH v5 36/36] vfio/migration: Update VFIO migration documentation

2025-02-27 Thread Maciej S. Szmigiero


On 27.02.2025 07:59, Cédric Le Goater wrote:

On 2/19/25 21:34, Maciej S. Szmigiero wrote:

From: "Maciej S. Szmigiero" 

Update the VFIO documentation at docs/devel/migration describing the
changes brought by the multifd device state transfer.

Signed-off-by: Maciej S. Szmigiero 
---
  docs/devel/migration/vfio.rst | 80 +++
  1 file changed, 71 insertions(+), 9 deletions(-)

diff --git a/docs/devel/migration/vfio.rst b/docs/devel/migration/vfio.rst
index c49482eab66d..d9b169d29921 100644
--- a/docs/devel/migration/vfio.rst
+++ b/docs/devel/migration/vfio.rst
@@ -16,6 +16,37 @@ helps to reduce the total downtime of the VM. VFIO devices 
opt-in to pre-copy
  support by reporting the VFIO_MIGRATION_PRE_COPY flag in the
  VFIO_DEVICE_FEATURE_MIGRATION ioctl.


Please add a new "multifd" documentation subsection at the end of the file
with this part :


+Starting from QEMU version 10.0 there's a possibility to transfer VFIO device
+_STOP_COPY state via multifd channels. This helps reduce downtime - especially
+with multiple VFIO devices or with devices having a large migration state.
+As an additional benefit, setting the VFIO device to _STOP_COPY state and
+saving its config space is also parallelized (run in a separate thread) in
+such migration mode.
+
+The multifd VFIO device state transfer is controlled by
+"x-migration-multifd-transfer" VFIO device property. This property defaults to
+AUTO, which means that VFIO device state transfer via multifd channels is
+attempted in configurations that otherwise support it.
+


Done - I also moved the parts about x-migration-max-queued-buffers
and x-migration-load-config-after-iter description there since
obviously they wouldn't make sense being left alone in the top section.


I was expecting a much more detailed explanation on the design too  :

  * in the cover letter
  * in the hw/vfio/migration-multifd.c
  * in some new file under docs/devel/migration/



I'm not sure what descriptions you exactly want in these places, but since
that's just documentation (not code) it could be added after the code freeze...



This section :


+Since the target QEMU needs to load device state buffers in-order it needs to
+queue incoming buffers until they can be loaded into the device.
+This means that a malicious QEMU source could theoretically cause the target
+QEMU to allocate unlimited amounts of memory for such buffers-in-flight.
+
+The "x-migration-max-queued-buffers" property allows capping the maximum count
+of these VFIO device state buffers queued at the destination.
+
+Because a malicious QEMU source causing OOM on the target is not expected to be
+a realistic threat in most of VFIO live migration use cases and the right value
+depends on the particular setup by default this queued buffers limit is
+disabled by setting it to UINT64_MAX.
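
The queueing behaviour described in this section can be sketched as follows. This is only an illustration under stated assumptions, not the code from the patch set: struct inorder_queue, MAX_SLOTS and inorder_queue_receive() are hypothetical names, buffers are identified by a plain index, and "loading into the device" is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SLOTS 64             /* hypothetical fixed window for the sketch */

struct inorder_queue {
    bool present[MAX_SLOTS];     /* present[i]: buffer i arrived, not yet loaded */
    uint32_t next_to_load;       /* lowest index not yet loaded */
    uint32_t queued;             /* buffers waiting for a predecessor */
    uint32_t max_queued;         /* cap on waiting buffers */
    uint32_t loaded;             /* buffers "loaded into the device" */
};

/*
 * Accept buffer 'idx'.  Buffers may arrive in any order but are loaded
 * strictly in order.  Returns false for an out-of-range or duplicate
 * index, or when an out-of-order buffer would exceed the cap.
 */
bool inorder_queue_receive(struct inorder_queue *q, uint32_t idx)
{
    if (idx >= MAX_SLOTS || idx < q->next_to_load || q->present[idx]) {
        return false;
    }
    /* The buffer we are waiting for is always accepted. */
    if (idx != q->next_to_load && q->queued >= q->max_queued) {
        return false;
    }
    q->present[idx] = true;
    q->queued++;

    /* Drain every buffer that is now loadable in order. */
    while (q->next_to_load < MAX_SLOTS && q->present[q->next_to_load]) {
        q->present[q->next_to_load] = false;
        q->next_to_load++;
        q->queued--;
        q->loaded++;
    }
    return true;
}
```

In this sketch, a source that keeps sending out-of-order buffers is refused once max_queued buffers are waiting, which is the kind of bound the cap is meant to provide.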


should be in patch 34. It is not obvious it will be merged.



...which brings us to this point.

I think by this point in time (less than 2 weeks to code freeze) we should
finally decide what is going to be included in the patch set.

This way this patch set could be well tested in its final form rather than
having significant parts taken out of it at the eleventh hour.

If the final form is known also the documentation can be adjusted accordingly
and user/admin documentation eventually written once the code is considered
okay.

I thought we discussed a few times the rationale behind both
x-migration-max-queued-buffers and x-migration-load-config-after-iter properties
but if you still have some concerns there please let me know before I prepare
the next version of this patch set so I know whether to include these.


This section :


+Some host platforms (like ARM64) require that VFIO device config is loaded only
+after all iterables were loaded.
+Such interlocking is controlled by "x-migration-load-config-after-iter" VFIO
+device property, which in its default setting (AUTO) does so only on platforms
+that actually require it.


Should be in 35. Same reason.



  When pre-copy is supported, it's possible to further reduce downtime by
  enabling "switchover-ack" migration capability.
  VFIO migration uAPI defines "initial bytes" as part of its pre-copy data 
stream
@@ -67,14 +98,39 @@ VFIO implements the device hooks for the iterative approach 
as follows:
  * A ``switchover_ack_needed`` function that checks if the VFIO device uses
    "switchover-ack" migration capability when this capability is enabled.
-* A ``save_state`` function to save the device config space if it is present.
-
-* A ``save_live_complete_precopy`` function that sets the VFIO device in
-  _STOP_COPY state and iteratively copies the data for the VFIO device until
-  the vendor driver indicates that no data remains.
-
-* A ``load_state`` function that loads the config section and the data
-  sections that are generated by the save functions above.
+* A ``switchover_start`` function that in the multifd mode starts a thread that
+  r

Re: [PATCH v3 044/162] tcg: Convert div2 to TCGOutOpDivRem

2025-02-27 Thread Philippe Mathieu-Daudé


On 17/2/25 00:08, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  tcg/tcg.c| 24 +++--
  tcg/aarch64/tcg-target.c.inc |  4 +++
  tcg/arm/tcg-target.c.inc |  4 +++
  tcg/i386/tcg-target.c.inc| 17 
  tcg/loongarch64/tcg-target.c.inc |  4 +++
  tcg/mips/tcg-target.c.inc|  4 +++
  tcg/ppc/tcg-target.c.inc |  4 +++
  tcg/riscv/tcg-target.c.inc   |  4 +++
  tcg/s390x/tcg-target.c.inc   | 44 
  tcg/sparc64/tcg-target.c.inc |  4 +++
  tcg/tci/tcg-target.c.inc |  4 +++
  11 files changed, 88 insertions(+), 29 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 2/5] rust: pl011: move register definitions out of lib.rs

2025-02-27 Thread Paolo Bonzini


On 2/27/25 18:28, Peter Maydell wrote:

On Thu, 27 Feb 2025 at 16:48, Paolo Bonzini  wrote:


Signed-off-by: Paolo Bonzini 
---
  rust/hw/char/pl011/src/device.rs|   7 +-
  rust/hw/char/pl011/src/lib.rs   | 509 +---
  rust/hw/char/pl011/src/registers.rs | 507 +++
  3 files changed, 513 insertions(+), 510 deletions(-)
  create mode 100644 rust/hw/char/pl011/src/registers.rs


Looking at this patch I'm sorely tempted to suggest significantly
trimming down the commentary in these comments: it contains
rather more text cut-n-pasted from the PL011 TRM than I'm
entirely comfortable with, and much of it is detail that
is irrelevant to QEMU.


Yeah, that was a point that was made on the call last week, too.  I 
think I agree, but it wasn't a decision I really wanted to take or 
suggest myself...


That said, some of the stuff does not belong in the structs but could be 
added to lib.rs, too, with more fair-use justification than in 
registers.rs.  So perhaps we could delay removing it until more aspects 
of the FIFO are modeled correctly, so that one does not have to reinvent 
the wording from scratch.


Paolo

Re: [PATCH 2/5] rust: pl011: move register definitions out of lib.rs

2025-02-27 Thread Paolo Bonzini


On 2/27/25 18:28, Peter Maydell wrote:

On Thu, 27 Feb 2025 at 16:48, Paolo Bonzini  wrote:


Signed-off-by: Paolo Bonzini 
---
  rust/hw/char/pl011/src/device.rs|   7 +-
  rust/hw/char/pl011/src/lib.rs   | 509 +---
  rust/hw/char/pl011/src/registers.rs | 507 +++
  3 files changed, 513 insertions(+), 510 deletions(-)
  create mode 100644 rust/hw/char/pl011/src/registers.rs


Looking at this patch I'm sorely tempted to suggest significantly
trimming down the commentary in these comments: it contains
rather more text cut-n-pasted from the PL011 TRM than I'm
entirely comfortable with, and much of it is detail that
is irrelevant to QEMU.


Yeah, that was a point that was made on the call last week, too.  I kind 
of agree, but it wasn't a decision I really wanted to take or suggest.


Also, some of the stuff does not belong in the structs but could be 
added to lib.rs, too, with more fair-use justification than in 
registers.rs.  So perhaps delay removing it until more aspects of the 
FIFO are modeled correctly, so that one does not have to reinvent the 
wording from scratch.


Paolo

Re: [PATCH] QIOChannelSocket: Flush zerocopy socket error queue on ENOBUF failure for sendmsg

2025-02-27 Thread Manish




On 27/02/25 11:26 pm, Peter Xu wrote:

On Thu, Feb 27, 2025 at 10:30:31PM +0530, Manish wrote:

Again really sorry, missed this due to some issue with my mail filters and
came to know about it via qemu-devel weblink. :)

On 25/02/25 2:37 pm, Daniel P. Berrangé wrote:

On Fri, Feb 21, 2025 at 04:44:48AM -0500, Manish Mishra wrote:

We allocate extra metadata SKBs in case of zerocopy send. This metadata memory
is accounted for in the OPTMEM limit. If there is any error with sending
zerocopy data or if zerocopy was skipped, these metadata SKBs are queued in the
socket error queue. This error queue is freed when userspace reads it.

Usually, if there are continuous failures, we merge the metadata into a single
SKB and free another one. However, if there is any out-of-order processing or
an intermittent zerocopy failure, this error chain can grow significantly,
exhausting the OPTMEM limit. As a result, all new sendmsg requests fail to
allocate any new SKB, leading to an ENOBUF error.

IIUC, you are effectively saying that the migration code is calling
qio_channel_write() too many times, before it calls qio_channel_flush().

Can you clarify what you mean by the "OPTMEM limit" here? I'm wondering
if this is potentially triggered by suboptimal tuning of the deployment
environment or whether we need to document tuning better.


I replied to this on the other thread. Posting it again.

We allocate some memory for zerocopy metadata. This is not accounted in
tcp_send_queue, but it is accounted in optmem_limit.

https://github.com/torvalds/linux/blob/dd83757f6e686a2188997cb58b5975f744bb7786/net/core/skbuff.c#L1607

Also when the zerocopy data is sent and acked, we try to free this
allocated skb as we can see in below code.

https://github.com/torvalds/linux/blob/dd83757f6e686a2188997cb58b5975f744bb7786/net/core/skbuff.c#L1751

If __msg_zerocopy_callback() is called on continuous ranges and
skb_zerocopy_notify_extend() succeeds, we merge the ranges and free the
current skb. Otherwise, we insert that skb into the error queue, and it
won't be freed until userspace flushes the error queue. This can happen
when zerocopy packets are skipped intermittently, or when zerocopy is
always skipped but acks arrive out of order. As a result the error chain
keeps growing, exhausting the optmem_limit, and when a new zerocopy
sendmsg request comes in, it cannot allocate the metadata and returns
ENOBUF.

I understand there is another bug behind why zerocopy packets are getting
skipped, which could be specific to our deployment. But live migrations
should not fail in any case: isn't it fine to mark zerocopy as skipped
rather than fail?



To work around this, if we encounter an ENOBUF error with a zerocopy sendmsg,
we flush the error queue and retry once more.
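
The flush-and-retry-once idea can be sketched independently of the socket code. The following is a minimal illustration, not the patch itself: the callback types, send_retry_once_on_enobufs() and the fake_sock test double are all hypothetical. On Linux the errno value in question is spelled ENOBUFS, and in real code the send step would be sendmsg() with MSG_ZEROCOPY and the flush step a read of the socket error queue.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical callbacks standing in for the real send and flush steps. */
typedef ssize_t (*send_fn)(void *opaque);
typedef int (*flush_fn)(void *opaque);

/*
 * Try to send; if the send fails with ENOBUFS, flush the error queue and
 * retry exactly once.  Any other failure is returned unchanged.
 */
ssize_t send_retry_once_on_enobufs(send_fn do_send, flush_fn do_flush,
                                   void *opaque)
{
    ssize_t ret = do_send(opaque);

    if (ret < 0 && errno == ENOBUFS) {
        if (do_flush(opaque) < 0) {
            return -1;
        }
        ret = do_send(opaque);   /* single retry, no loop */
    }
    return ret;
}

/* Test double: models an error queue that fills up to 'limit' entries. */
struct fake_sock {
    int queued;
    int limit;
    int flushes;
};

ssize_t fake_send(void *opaque)
{
    struct fake_sock *s = opaque;

    if (s->queued >= s->limit) {
        errno = ENOBUFS;         /* no room left for zerocopy metadata */
        return -1;
    }
    s->queued++;
    return 1;
}

int fake_flush(void *opaque)
{
    struct fake_sock *s = opaque;

    s->queued = 0;               /* reading the error queue frees entries */
    s->flushes++;
    return 0;
}
```

Retrying only once keeps the failure visible if the flush does not free enough metadata, instead of spinning on a socket that stays out of buffer space.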

Signed-off-by: Manish Mishra
---
   include/io/channel-socket.h |  1 +
   io/channel-socket.c | 52 -
   2 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
index ab15577d38..6cfc66eb5b 100644
--- a/include/io/channel-socket.h
+++ b/include/io/channel-socket.h
@@ -49,6 +49,7 @@ struct QIOChannelSocket {
   socklen_t remoteAddrLen;
   ssize_t zero_copy_queued;
   ssize_t zero_copy_sent;
+bool new_zero_copy_sent_success;
   };
diff --git a/io/channel-socket.c b/io/channel-socket.c
index 608bcf066e..c7f576290f 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -37,6 +37,11 @@
   #define SOCKET_MAX_FDS 16
+#ifdef QEMU_MSG_ZEROCOPY
+static int qio_channel_socket_flush_internal(QIOChannel *ioc,
+ Error **errp);
+#endif
+
   SocketAddress *
   qio_channel_socket_get_local_address(QIOChannelSocket *ioc,
Error **errp)
@@ -65,6 +70,7 @@ qio_channel_socket_new(void)
   sioc->fd = -1;
   sioc->zero_copy_queued = 0;
   sioc->zero_copy_sent = 0;
+sioc->new_zero_copy_sent_success = FALSE;
   ioc = QIO_CHANNEL(sioc);
   qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
@@ -566,6 +572,7 @@ static ssize_t qio_channel_socket_writev(QIOC

Re: [PATCH 4/5] rust: pl011: switch to safe chardev operation

2025-02-27 Thread Peter Maydell

On Thu, 27 Feb 2025 at 18:02, Paolo Bonzini  wrote:
>
> On Thu, Feb 27, 2025 at 6:25 PM Peter Maydell  
> wrote:
> > Thinking about other devices, presumably for more complex
> > devices we might need to pass more than just a single 'bool'
> > back from PL011Registers::write. What other kinds of thing
> > might we need to do in the FooState function, and (since
> > the pl011 code is presumably going to be used as a template
> > for those other devices) is it worth having something that
> > expresses that better than just a raw 'bool' return ?
>
> Ideally nothing, especially considering that more modern devices have
> edge-triggered interrupts like MSIs, instead of level-triggered
> interrupts that need qemu_set_irq() calls. But if you have something a
> lot more complex than a bool I would pass down the PL011State and do
> something like pl011.schedule_update_irq() which updates a BqlCell<>.
> The device could then use a bottom half or process them after
> "drop(regs)".
>
> HPET has another approach, which is to store a backpointer from
> HPETTimer to the HPETState, so that it can do
>
> self.get_state().irqs[route].pulse();
>
> without passing down anything. The reason for this is that it has
> multiple timers on the same routine, and it assigns the timers to
> separate HPETTimers. I would not use it for PL011 because all accesses
> to the PL011Registers go through the PL011State.

I think the idea I vaguely have in the back of my mind is that
maybe it's a nice idea to have a coding structure that enforces
"you update your own internal state, and only then do things that
might involve calling out to the outside world, and if there's
something that you need to do with the result of that callout,
there's an easy mechanism for 'this is what I will want to do
next after that' continuations".

The fact that our C device implementations don't do that is
kind of the underlying cause of all the "recursive entry back
into the device via DMA" problems that we squashed with the
big hammer of "just forbid it entirely". It's also in a way
the problem underlying the race condition segfault here:
https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=wn+k8dqneb_...@mail.gmail.com/T/#u
(memory_region_snapshot_and_clear_dirty() drops the BKL, no
callers expect that, segfaults in the calling code in the
framebuffer device models if something else gets in and
e.g. resizes the framebuffer in the middle of a display update).

So I was sort of wondering if the pl011 structure was aiming
at providing that kind of separation of "internal state" and
"external interactions", such that the compiler would complain
if you tried to do an externally-interacting thing while your
internal state was not consistent.

thanks
-- PMM

Re: [PATCH 1/3] target/arm: Correct LDRD atomicity and fault behaviour

2025-02-27 Thread Richard Henderson


On 2/27/25 09:58, Peter Maydell wrote:

On Thu, 27 Feb 2025 at 17:41, Richard Henderson
 wrote:


On 2/27/25 06:27, Peter Maydell wrote:

+static void do_ldrd_load(DisasContext *s, TCGv_i32 addr, int rt, int rt2)
+{
+/*
+ * LDRD is required to be an atomic 64-bit access if the
+ * address is 8-aligned, two atomic 32-bit accesses if
+ * it's only 4-aligned, and to give an alignment fault
+ * if it's not 4-aligned.
+ * Rt is always the word from the lower address, and Rt2 the
+ * data from the higher address, regardless of endianness.
+ * So (like gen_load_exclusive) we avoid gen_aa32_ld_i64()
+ * so we don't get its SCTLR_B check, and instead do a 64-bit access
+ * using MO_BE if appropriate and then split the two halves.
+ *
+ * This also gives us the correct behaviour of not updating
+ * rt if the load of rt2 faults; this is required for cases
+ * like "ldrd r2, r3, [r2]" where rt is also the base register.
+ */
+int mem_idx = get_mem_index(s);
+MemOp opc = MO_64 | MO_ALIGN_4 | MO_ATOM_SUBALIGN | s->be_data;


The 64-bit atomicity begins with armv7 + LPAE, and is not present for any
m-profile.
Worth checking ARM_FEATURE_LPAE, or at least adding to the comment?

Getting 2 x 4-byte atomicity, but not requiring 8-byte atomicity, would use
MO_ATOM_IFALIGN_PAIR.


Definitely worth a comment at minimum. Do we generate better
code for MO_ATOM_IFALIGN_PAIR ? (If not, then providing higher
atomicity than the architecture mandates seems harmless.)


We could, but currently do not, generate better code for IFALIGN_PAIR for MO_64. 
Currently the only place we take special care is for MO_128.



For the comment in memop.h that currently reads
  * MO_ATOM_SUBALIGN: the operation is single-copy atomic by parts
  *by the alignment.  E.g. if the address is 0 mod 4, then each
  *4-byte subobject is single-copy atomic.
  *This is the atomicity e.g. of IBM Power.

maybe we could expand the e.g:

   E.g if an 8-byte value is accessed at an address which is 0 mod 8,
   then the whole 8-byte access is single-copy atomic; otherwise,
   if it is accessed at 0 mod 4 then each 4-byte subobject is
   single-copy atomic; otherwise if it is accessed at 0 mod 2
   then the four 2-byte subobjects are single-copy atomic.

?


Yes, that's correct.
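
The expanded wording proposed above can be sketched as a small function. This is only an illustration of the rule as stated in this thread, not code from QEMU: the name subalign_atomic_bytes is hypothetical, and the access size is assumed to be a power of two.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of QEMU: return the size in bytes of each
 * single-copy-atomic part of an access of `size` bytes at `addr`, following
 * the MO_ATOM_SUBALIGN wording discussed above.  `size` is assumed to be a
 * power of two.
 */
unsigned subalign_atomic_bytes(uint64_t addr, unsigned size)
{
    unsigned part = size;

    /* Halve the part until the address is a multiple of it. */
    while (part > 1 && (addr & (part - 1)) != 0) {
        part >>= 1;
    }
    return part;
}
```

For an 8-byte access this gives 8 at an address that is 0 mod 8, 4 at 0 mod 4, 2 at 0 mod 2, and 1 otherwise, which matches the three cases spelled out in the proposed comment.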


I wasn't sure when reading what we currently have whether
it provided the 8-byte-aligned guarantee, rather than merely
the 4-byte-aligned one.


I was trying to highlight the difference between SUBALIGN and IFALIGN, and perhaps didn't
do an adequate job of it.


r~

Re: [PATCH 2/5] hw/arm/omap1: Drop ALMDEBUG ifdeffed out code

2025-02-27 Thread Philippe Mathieu-Daudé


On 27/2/25 18:01, Peter Maydell wrote:

In omap1.c, there are some debug printfs in the omap_rtc_write()
function that are guarded by ifdef ALMDEBUG. ALMDEBUG is never
set, so this is all dead code.

It's not worth the effort of converting all of these to tracepoints;
a modern tracepoint approach would probably have a single tracepoint
covering all the register writes anyway. Just delete the printf()s.

Signed-off-by: Peter Maydell 
---
  hw/arm/omap1.c | 51 --
  1 file changed, 51 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v3 034/162] tcg: Convert mul to TCGOutOpBinary

2025-02-27 Thread Philippe Mathieu-Daudé


On 17/2/25 00:08, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  tcg/tcg.c|  6 ++-
  tcg/aarch64/tcg-target.c.inc | 18 ---
  tcg/arm/tcg-target.c.inc | 23 
  tcg/i386/tcg-target.c.inc| 47 +---
  tcg/loongarch64/tcg-target.c.inc | 24 +
  tcg/mips/tcg-target.c.inc| 43 +--
  tcg/ppc/tcg-target.c.inc | 42 +++
  tcg/riscv/tcg-target.c.inc   | 21 
  tcg/s390x/tcg-target.c.inc   | 92 ++--
  tcg/sparc64/tcg-target.c.inc | 28 +++---
  tcg/tci/tcg-target.c.inc | 14 +++--
  11 files changed, 210 insertions(+), 148 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 3/5] hw/arm/omap1: Convert information printfs to tracepoints

2025-02-27 Thread Philippe Mathieu-Daudé


On 27/2/25 18:01, Peter Maydell wrote:

The omap1 code uses raw printf() statements to print information
about some events; convert these to tracepoints.

In particular, this will stop the functional test for the sx1
from printing the not-very-helpful note
  "omap_clkm_write: clocking scheme set to synchronous scalable"
to the test's default.log.

Signed-off-by: Peter Maydell 
---
  hw/arm/omap1.c  | 27 ++-
  hw/arm/trace-events |  7 +++
  2 files changed, 21 insertions(+), 13 deletions(-)




diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index 7790db780e0..70b137a6cfd 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -1,5 +1,12 @@
  # See docs/devel/tracing.rst for syntax documentation.
  
+# omap1.c

+omap1_clocking_scheme(const char *scheme) "omap1 CLKM: clocking scheme set to %s"
+omap1_backlight(int output) "omap1 PWL: backlight now at %d/256"


omap1_pwl_...


+omap1_buzz(int freq) "omap1 PWT: %dHz buzz on"
+omap1_silence(void) "omap1 PWT: buzzer silenced"


omap1_pwt_...


+omap1_led(const char *onoff) "omap1 LPG: LED is %s"


omap1_lpg_led

Regardless:
Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 5/5] hw/arm/versatilepb: Convert printfs to LOG_GUEST_ERROR

2025-02-27 Thread Philippe Mathieu-Daudé


On 27/2/25 18:01, Peter Maydell wrote:

Convert some printf() calls for attempts to access nonexistent
registers into LOG_GUEST_ERROR logging.

Signed-off-by: Peter Maydell 
---
  hw/arm/versatilepb.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 3/3] target/arm: Drop unused address_offset from op_addr_{rr, ri}_post()

2025-02-27 Thread Philippe Mathieu-Daudé


On 27/2/25 15:27, Peter Maydell wrote:

All the callers of op_addr_rr_post() and op_addr_ri_post() now pass in
zero for the address_offset, so we can remove that argument.

Signed-off-by: Peter Maydell 
---
  target/arm/tcg/translate.c | 26 +-
  1 file changed, 13 insertions(+), 13 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH] tests/tcg: Suppress compiler false-positive warning on sha1.c

2025-02-27 Thread Alex Bennée

Peter Maydell  writes:

> GCC versions at least 12 through 15 incorrectly report a warning
> about code in sha1.c:
>
> tests/tcg/multiarch/sha1.c:161:13: warning: ‘SHA1Transform’ reading 64 bytes 
> from a region of size 0 [-Wstringop-overread]
>   161 | SHA1Transform(context->state, &data[i]);
>   | ^~~
>
> This is a piece of stock library code for doing SHA1 which we've
> simply copied, rather than writing ourselves. The bug has been
> reported to upstream GCC (about a different library's use of this
> code):
>  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106709
>
> For our test case, since this isn't our original code and there isn't
> actually a bug in it, suppress the incorrect warning rather than
> trying to modify the code to work around the compiler issue.

Queued to maintainer/for-10.0-softfreeze, thanks.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH 1/5] hw/arm/omap1: Convert raw printfs to qemu_log_mask()

2025-02-27 Thread Philippe Mathieu-Daudé


On 27/2/25 18:01, Peter Maydell wrote:

omap1.c is very old code, and it contains numerous calls direct to
printf() for various error and information cases.

In this commit, convert the printf() calls that are for either guest
error or unimplemented functionality to qemu_log_mask() calls.

This leaves the printf() calls that are informative or which are
ifdeffed-out debug statements untouched.

Signed-off-by: Peter Maydell 
---
  hw/arm/omap1.c | 48 +++-
  1 file changed, 31 insertions(+), 17 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: Query on the dirty bitmap

2025-02-27 Thread Eric Blake

On Wed, Feb 19, 2025 at 04:23:26PM +0530, prashant patil wrote:
> Hello All,
> Hope this email finds you well.
> 
> I have been trying with qemu for a while now, and have come across a
> problem specific to dirty bitmaps. I have enabled bitmap on the qcow2 disk
> image using 'qemu-img bitmap' command, exposed the bitmap over a unix
> socket using 'qemu-nbd' command. Now when I try to read the bitmap using
> 'qemu-img map' command with 'x-dirty-bitmap=qemu:dirty-bitmap:{bitmap}'
> option, I get one single extent which shows that the entire disk is dirty.
> Note that the disk size is 5 GB, and has only a few MB of data in it, and
> had added very small data after the bitmap was enabled. Bitmap output has
> been pasted below.

Can you show the exact sequence of command lines you used to create
the image, dirty a portion of it, then start up the qemu-nbd process
to inspect it?  As written, I can't reproduce your issue, but I know
it sounds similar to tests/qemu-iotests/tests/qemu-img-bitmaps which
does what you're talking about, so I know the code works and have to
suspect you may have missed a step or reordered things in such a way
that the entire bitmap is reading as dirty.

> 
> [{ "start": 0, "length": 5368709120, "depth": 0, "present": true, "zero":
> false, "data": true, "compressed": false, "offset": 0}]
> 
> Can someone please help me understand why the bitmap content shows the
> entire disk as dirty?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

Re: [PATCH] trace/simple: Fix hang when using simpletrace with fork()

2025-02-27 Thread Eric Blake

On Wed, Feb 26, 2025 at 09:50:15AM +0100, Thomas Huth wrote:
> When compiling QEMU with --enable-trace-backends=simple , the
> iotest 233 is currently hanging. This happens because qemu-nbd
> calls trace_init_backends() first - which causes simpletrace to
> install its writer thread and the atexit() handler - before
> calling fork(). But the simpletrace writer thread is then only
> available in the parent process, not in the child process anymore.
> Thus when the child process exits, its atexit handler waits forever
> on the trace_empty_cond condition to be set by the non-existing
> writer thread, so the process never finishes.
> 
> Fix it by installing a pthread_atfork() handler, too, which
> makes sure that the trace_writeout_enabled variable gets set
> to false again in the child process, so we can use it in the
> atexit() handler to check whether we still need to wait on the
> writer thread or not.
> 
> Signed-off-by: Thomas Huth 
> ---
>  trace/simple.c | 17 -
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/trace/simple.c b/trace/simple.c
> index c0aba00cb7f..269bbda69f1 100644
> --- a/trace/simple.c
> +++ b/trace/simple.c
> @@ -380,8 +380,22 @@ void st_print_trace_file_status(void)
>  
>  void st_flush_trace_buffer(void)
>  {
> -flush_trace_file(true);
> +flush_trace_file(trace_writeout_enabled);
> +}
> +
> +#ifndef _WIN32
> +static void trace_thread_atfork(void)
> +{
> +/*
> + * If we fork, the writer thread does not exist in the child, so
> + * make sure to allow st_flush_trace_buffer() to clean up correctly.
> + */
> +g_mutex_lock(&trace_lock);

And are we sure trace_lock was previously initialized in memory
visible to the thread that is calling the after-fork handler here?

POSIX admits that pthread_atfork() is generally too hard to use
successfully (there are just too many corner cases), and instead
recommends that the only portable thing a multi-threaded app can do
after forking is limit itself to async-signal-safe functions until it
execs a new program.  The other common reason for forking is for
daemonization; there, the advice is to be completely single-threaded
until after the fork() has set up the right environment for the
daemon, at which point the child can then finally go multi-threaded.

And either of those solutions (Using fork() to spawn a child
process? do nothing unsafe after fork except for what it takes to
reach exec. Using fork() to daemonize? do nothing unsafe before fork)
eliminates the need for using pthread_atfork() in the first place.

At any rate, per your plea elsewhere in the thread, I'll take a look
at deferring the logging init until after qemu-nbd has daemonized.
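
As a rough illustration of the "fork only to reach exec" pattern described above, here is a minimal sketch. The helper name spawn_and_wait is made up for this example and has nothing to do with qemu-nbd's code; it assumes a POSIX system and does not retry waitpid() on EINTR.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run `path` with `argv` in a child process and
 * return its exit status, or -1 on failure.  Between fork() and execv()
 * the child calls only async-signal-safe functions.
 */
int spawn_and_wait(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        execv(path, argv);
        /* Only reached if execv() failed; _exit() skips atexit() handlers. */
        _exit(127);
    }
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the child either execs or leaves through _exit(), it never runs an atexit() handler that might wait on a thread which only exists in the parent.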

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

Re: [PATCH] iotests: Stop NBD server in test 162 before starting the next one

2025-02-27 Thread Eric Blake

On Tue, Feb 25, 2025 at 08:06:50AM +0100, Thomas Huth wrote:
> Test 162 recently started failing for me for no obvious reasons (I
> did not spot any suspicious commits in this area), but looking in
> the 162.out.bad log file, there was a suspicious message at the end:
> 
>  qemu-nbd: Cannot lock pid file: Resource temporarily unavailable
> 
> And indeed, the test starts the NBD server two times, without stopping
> the first server before running the second one, so the second one can
> indeed fail to lock the PID file. Thus let's make sure to stop the
> first server before the test continues with the second one. With this
> change, the test works fine for me again.
> 
> Signed-off-by: Thomas Huth 
> ---
>  tests/qemu-iotests/162 | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

Re: [PATCH v10 2/8] memory: Introduce memory region fetch operation

2025-02-27 Thread Alistair Francis

On Wed, Jan 22, 2025 at 6:39 PM Ethan Chen via  wrote:
>
> Allow memory regions to have different behaviors for read and fetch
> operations.
>
> For example, the RISC-V IOPMP could raise an interrupt when the CPU
> tries to fetch from a non-executable region.
>
> If the fetch operation for a memory region is not implemented, the read
> operation will still be used for fetch operations.
>
> Signed-off-by: Ethan Chen 

This looks ok to me, but I would like someone who knows this better to
review it as well

Acked-by: Alistair Francis 

Alistair
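
For readers skimming the thread, the fallback rule from the commit message (use the read operation when no fetch operation is implemented) can be sketched with a toy ops structure. Everything below is a stand-in written for this note; toy_region_ops and toy_dispatch_fetch are hypothetical names, not the MemoryRegionOps API in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for MemoryRegionOps; only the two callbacks that matter here. */
struct toy_region_ops {
    uint64_t (*read)(void *opaque, uint64_t addr, unsigned size);
    uint64_t (*fetch)(void *opaque, uint64_t addr, unsigned size); /* optional */
};

/* Use the fetch callback when one is provided, otherwise fall back to read. */
uint64_t toy_dispatch_fetch(const struct toy_region_ops *ops, void *opaque,
                            uint64_t addr, unsigned size)
{
    if (ops->fetch != NULL) {
        return ops->fetch(opaque, addr, size);
    }
    return ops->read(opaque, addr, size);
}

/* Example callbacks that return distinguishable values. */
uint64_t toy_read(void *opaque, uint64_t addr, unsigned size)
{
    (void)opaque;
    (void)size;
    return addr + 1;
}

uint64_t toy_fetch(void *opaque, uint64_t addr, unsigned size)
{
    (void)opaque;
    (void)size;
    return addr + 2;
}
```

A region that sets only .read behaves exactly as before, while a region that also sets .fetch can treat instruction fetches differently, which is the IOPMP use case mentioned in the commit message.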

> ---
>  accel/tcg/cputlb.c|   9 +++-
>  include/exec/memory.h |  27 +++
>  system/memory.c   | 104 ++
>  system/trace-events   |   2 +
>  4 files changed, 140 insertions(+), 2 deletions(-)
>
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index b4ccf0cdcb..71c16a1ac1 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -1947,8 +1947,13 @@ static uint64_t int_ld_mmio_beN(CPUState *cpu, 
> CPUTLBEntryFull *full,
>  this_size = 1 << this_mop;
>  this_mop |= MO_BE;
>
> -r = memory_region_dispatch_read(mr, mr_offset, &val,
> -this_mop, full->attrs);
> +if (type == MMU_INST_FETCH) {
> +r = memory_region_dispatch_fetch(mr, mr_offset, &val,
> + this_mop, full->attrs);
> +} else {
> +r = memory_region_dispatch_read(mr, mr_offset, &val,
> +this_mop, full->attrs);
> +}
>  if (unlikely(r != MEMTX_OK)) {
>  io_failed(cpu, full, addr, this_size, type, mmu_idx, r, ra);
>  }
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 3ee1901b52..6166d697d9 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -273,6 +273,11 @@ struct MemoryRegionOps {
>hwaddr addr,
>uint64_t data,
>unsigned size);
> +/* Fetch from the memory region. @addr is relative to @mr; @size is
> + * in bytes. */
> +uint64_t (*fetch)(void *opaque,
> +  hwaddr addr,
> +  unsigned size);
>
>  MemTxResult (*read_with_attrs)(void *opaque,
> hwaddr addr,
> @@ -284,6 +289,11 @@ struct MemoryRegionOps {
>  uint64_t data,
>  unsigned size,
>  MemTxAttrs attrs);
> +MemTxResult (*fetch_with_attrs)(void *opaque,
> +hwaddr addr,
> +uint64_t *data,
> +unsigned size,
> +MemTxAttrs attrs);
>
>  enum device_endian endianness;
>  /* Guest-visible constraints: */
> @@ -2604,6 +2614,23 @@ MemTxResult memory_region_dispatch_write(MemoryRegion 
> *mr,
>   MemOp op,
>   MemTxAttrs attrs);
>
> +
> +/**
> + * memory_region_dispatch_fetch: perform a fetch directly to the specified
> + * MemoryRegion.
> + *
> + * @mr: #MemoryRegion to access
> + * @addr: address within that region
> + * @pval: pointer to uint64_t which the data is written to
> + * @op: size, sign, and endianness of the memory operation
> + * @attrs: memory transaction attributes to use for the access
> + */
> +MemTxResult memory_region_dispatch_fetch(MemoryRegion *mr,
> + hwaddr addr,
> + uint64_t *pval,
> + MemOp op,
> + MemTxAttrs attrs);
> +
>  /**
>   * address_space_init: initializes an address space
>   *
> diff --git a/system/memory.c b/system/memory.c
> index b17b5538ff..7f26f681f9 100644
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -477,6 +477,51 @@ static MemTxResult 
> memory_region_read_with_attrs_accessor(MemoryRegion *mr,
>  return r;
>  }
>
> +static MemTxResult memory_region_fetch_accessor(MemoryRegion *mr,
> +hwaddr addr,
> +uint64_t *value,
> +unsigned size,
> +signed shift,
> +uint64_t mask,
> +MemTxAttrs attrs)
> +{
> +uint64_t tmp;
> +
> +tmp = mr->ops->fetch(mr->opaque, addr, size);
> +if (mr->subpage) {
> +trace_memory_region_subpage_fetch(get_cpu_index(), mr, addr, tmp, 
> size);
> +} else if 
> (trace_event_get_state_backends(TRACE_MEMORY_REGION_OPS_FETCH)) {
> +hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
> +trace_memory_reg

[PULL 06/10] hw/nvme: be compliant wrt. dsm processing limits

2025-02-27 Thread Klaus Jensen

From: Klaus Jensen 

The specification states that,

> The controller shall set all three processing limit fields (i.e., the
> DMRL, DMRSL and DMSL fields) to non-zero values or shall clear all
> three processing limit fields to 0h.

So, set the DMRL and DMSL fields in addition to DMRSL.

Reviewed-by: Jesper Wendel Devantier 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 24 +++-
 include/block/nvme.h |  2 ++
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 2b73f601608f..86e1c48fab82 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -5639,7 +5639,9 @@ static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, 
NvmeRequest *req)
 switch (c->csi) {
 case NVME_CSI_NVM:
 id_nvm->vsl = n->params.vsl;
+id_nvm->dmrl = NVME_ID_CTRL_NVM_DMRL_MAX;
 id_nvm->dmrsl = cpu_to_le32(n->dmrsl);
+id_nvm->dmsl = NVME_ID_CTRL_NVM_DMRL_MAX * n->dmrsl;
 break;
 
 case NVME_CSI_ZONED:
@@ -6696,18 +6698,23 @@ static uint16_t nvme_aer(NvmeCtrl *n, NvmeRequest *req)
 return NVME_NO_COMPLETE;
 }
 
-static void nvme_update_dmrsl(NvmeCtrl *n)
+static void nvme_update_dsm_limits(NvmeCtrl *n, NvmeNamespace *ns)
 {
-int nsid;
+if (ns) {
+n->dmrsl =
+MIN_NON_ZERO(n->dmrsl, BDRV_REQUEST_MAX_BYTES / nvme_l2b(ns, 1));
 
-for (nsid = 1; nsid <= NVME_MAX_NAMESPACES; nsid++) {
-NvmeNamespace *ns = nvme_ns(n, nsid);
+return;
+}
+
+for (uint32_t nsid = 1; nsid <= NVME_MAX_NAMESPACES; nsid++) {
+ns = nvme_ns(n, nsid);
 if (!ns) {
 continue;
 }
 
-n->dmrsl = MIN_NON_ZERO(n->dmrsl,
-BDRV_REQUEST_MAX_BYTES / nvme_l2b(ns, 1));
+n->dmrsl =
+MIN_NON_ZERO(n->dmrsl, BDRV_REQUEST_MAX_BYTES / nvme_l2b(ns, 1));
 }
 }
 
@@ -6795,7 +6802,7 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, 
NvmeRequest *req)
 ctrl->namespaces[nsid] = NULL;
 ns->attached--;
 
-nvme_update_dmrsl(ctrl);
+nvme_update_dsm_limits(ctrl, NULL);
 
 break;
 
@@ -8902,8 +8909,7 @@ void nvme_attach_ns(NvmeCtrl *n, NvmeNamespace *ns)
 n->namespaces[nsid] = ns;
 ns->attached++;
 
-n->dmrsl = MIN_NON_ZERO(n->dmrsl,
-BDRV_REQUEST_MAX_BYTES / nvme_l2b(ns, 1));
+nvme_update_dsm_limits(n, ns);
 }
 
 static void nvme_realize(PCIDevice *pci_dev, Error **errp)
diff --git a/include/block/nvme.h b/include/block/nvme.h
index aecfc9ce66b4..763b2b2f0ec7 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1207,6 +1207,8 @@ typedef struct NvmeIdCtrlZoned {
 uint8_t rsvd1[4095];
 } NvmeIdCtrlZoned;
 
+#define NVME_ID_CTRL_NVM_DMRL_MAX 255
+
 typedef struct NvmeIdCtrlNvm {
 uint8_t vsl;
 uint8_t wzsl;
-- 
2.47.2
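
A minimal sketch of the "all three non-zero or all three zero" rule quoted in the commit message above. The struct, the field widths and the helper names are assumptions made for illustration; the patch itself sets DMRL unconditionally, whereas this sketch clears all three fields when no per-range limit is known.

```c
#include <stdbool.h>
#include <stdint.h>

#define DMRL_MAX 255u  /* mirrors NVME_ID_CTRL_NVM_DMRL_MAX in the patch */

/* Hypothetical container for the three DSM processing limits. */
struct dsm_limits {
    uint8_t  dmrl;   /* maximum number of ranges */
    uint32_t dmrsl;  /* maximum logical blocks per range */
    uint64_t dmsl;   /* maximum logical blocks in total */
};

/* Stand-in for QEMU's MIN_NON_ZERO(): the smaller value, ignoring zeros. */
uint32_t min_non_zero(uint32_t a, uint32_t b)
{
    if (a == 0) {
        return b;
    }
    if (b == 0) {
        return a;
    }
    return a < b ? a : b;
}

/* Derive DMRL and DMSL from DMRSL the way the patch does for non-zero DMRSL. */
struct dsm_limits make_dsm_limits(uint32_t dmrsl)
{
    struct dsm_limits l = { 0, 0, 0 };

    if (dmrsl != 0) {
        l.dmrl = DMRL_MAX;
        l.dmrsl = dmrsl;
        l.dmsl = (uint64_t)DMRL_MAX * dmrsl;
    }
    return l;
}

/* The rule from the specification text: all set, or all cleared. */
bool dsm_limits_consistent(const struct dsm_limits *l)
{
    bool all_zero = l->dmrl == 0 && l->dmrsl == 0 && l->dmsl == 0;
    bool all_set = l->dmrl != 0 && l->dmrsl != 0 && l->dmsl != 0;

    return all_zero || all_set;
}
```

With a per-range limit of 1024 blocks this yields DMRL 255 and DMSL 261120, and the consistency check passes for both the zero and the non-zero case.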

[PULL 00/10] nvme queue

2025-02-27 Thread Klaus Jensen

From: Klaus Jensen 

Hi,

The following changes since commit b69801dd6b1eb4d107f7c2f643adf0a4e3ec9124:

  Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu 
into staging (2025-02-22 05:06:39 +0800)

are available in the Git repository at:

  https://gitlab.com/birkelund/qemu.git tags/pull-nvme-20250227

for you to fetch changes up to cad58ada8f104bf342097a7a683ef594ac949c8d:

  hw/nvme: remove nvme_aio_err() (2025-02-26 12:40:35 +0100)


nvme queue


Klaus Jensen (9):
  hw/nvme: always initialize a subsystem
  hw/nvme: make oacs dynamic
  hw/nvme: add knob for doorbell buffer config support
  nvme: fix iocs status code values
  hw/nvme: be compliant wrt. dsm processing limits
  hw/nvme: rework csi handling
  hw/nvme: only set command abort requested when cancelled due to Abort
  hw/nvme: set error status code explicitly for misc commands
  hw/nvme: remove nvme_aio_err()

Stephen Bates (1):
  hw/nvme: Add OCP SMART / Health Information Extended Log Page

 docs/system/devices/nvme.rst |   7 +
 hw/nvme/ctrl.c   | 460 +++
 hw/nvme/ns.c |  62 ++
 hw/nvme/nvme.h   |  11 +-
 include/block/nvme.h |  63 +-
 5 files changed, 373 insertions(+), 230 deletions(-)

[PULL 03/10] hw/nvme: make oacs dynamic

2025-02-27 Thread Klaus Jensen

From: Klaus Jensen 

Virtualization Management needs sriov-related parameters. Only report
support for the command when those conditions are true.

Reviewed-by: Jesper Wendel Devantier 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 25 ++---
 hw/nvme/nvme.h   |  4 
 include/block/nvme.h |  3 ++-
 3 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 5a7ccbcc1b80..4ee8588ca9ae 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -266,7 +266,7 @@ static const uint32_t nvme_feature_cap[NVME_FID_MAX] = {
 [NVME_FDP_EVENTS]   = NVME_FEAT_CAP_CHANGE | NVME_FEAT_CAP_NS,
 };
 
-static const uint32_t nvme_cse_acs[256] = {
+static const uint32_t nvme_cse_acs_default[256] = {
 [NVME_ADM_CMD_DELETE_SQ]= NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_CREATE_SQ]= NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_GET_LOG_PAGE] = NVME_CMD_EFF_CSUPP,
@@ -278,7 +278,6 @@ static const uint32_t nvme_cse_acs[256] = {
 [NVME_ADM_CMD_GET_FEATURES] = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_NS_ATTACHMENT]= NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_NIC,
-[NVME_ADM_CMD_VIRT_MNGMT]   = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_DBBUF_CONFIG] = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_FORMAT_NVM]   = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 [NVME_ADM_CMD_DIRECTIVE_RECV]   = NVME_CMD_EFF_CSUPP,
@@ -5174,7 +5173,7 @@ static uint16_t nvme_cmd_effects(NvmeCtrl *n, uint8_t 
csi, uint32_t buf_len,
 }
 }
 
-memcpy(log.acs, nvme_cse_acs, sizeof(nvme_cse_acs));
+memcpy(log.acs, n->cse.acs, sizeof(log.acs));
 
 if (src_iocs) {
 memcpy(log.iocs, src_iocs, sizeof(log.iocs));
@@ -7300,7 +7299,7 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest 
*req)
 trace_pci_nvme_admin_cmd(nvme_cid(req), nvme_sqid(req), req->cmd.opcode,
  nvme_adm_opc_str(req->cmd.opcode));
 
-if (!(nvme_cse_acs[req->cmd.opcode] & NVME_CMD_EFF_CSUPP)) {
+if (!(n->cse.acs[req->cmd.opcode] & NVME_CMD_EFF_CSUPP)) {
 trace_pci_nvme_err_invalid_admin_opc(req->cmd.opcode);
 return NVME_INVALID_OPCODE | NVME_DNR;
 }
@@ -8740,6 +8739,9 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 uint64_t cap = ldq_le_p(&n->bar.cap);
 NvmeSecCtrlEntry *sctrl = nvme_sctrl(n);
 uint32_t ctratt;
+uint16_t oacs;
+
+memcpy(n->cse.acs, nvme_cse_acs_default, sizeof(n->cse.acs));
 
 id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
 id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
@@ -8770,9 +8772,18 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 
 id->mdts = n->params.mdts;
 id->ver = cpu_to_le32(NVME_SPEC_VER);
-id->oacs =
-cpu_to_le16(NVME_OACS_NS_MGMT | NVME_OACS_FORMAT | NVME_OACS_DBBUF |
-NVME_OACS_DIRECTIVES);
+
+oacs = NVME_OACS_NMS | NVME_OACS_FORMAT | NVME_OACS_DBBUF |
+NVME_OACS_DIRECTIVES;
+
+if (n->params.sriov_max_vfs) {
+oacs |= NVME_OACS_VMS;
+
+n->cse.acs[NVME_ADM_CMD_VIRT_MNGMT] = NVME_CMD_EFF_CSUPP;
+}
+
+id->oacs = cpu_to_le16(oacs);
+
 id->cntrltype = 0x1;
 
 /*
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index e307e733e46a..b86cad388f5a 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -584,6 +584,10 @@ typedef struct NvmeCtrl {
 uint64_tdbbuf_eis;
 booldbbuf_enabled;
 
+struct {
+uint32_t acs[256];
+} cse;
+
 struct {
 MemoryRegion mem;
 uint8_t  *buf;
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 975d321c5c08..80fbcb420d48 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1232,8 +1232,9 @@ enum NvmeIdCtrlOacs {
 NVME_OACS_SECURITY  = 1 << 0,
 NVME_OACS_FORMAT= 1 << 1,
 NVME_OACS_FW= 1 << 2,
-NVME_OACS_NS_MGMT   = 1 << 3,
+NVME_OACS_NMS   = 1 << 3,
 NVME_OACS_DIRECTIVES= 1 << 5,
+NVME_OACS_VMS   = 1 << 7,
 NVME_OACS_DBBUF = 1 << 8,
 };
 
-- 
2.47.2
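
The conditional OACS construction in the patch above boils down to the following sketch. The bit positions are copied from the NvmeIdCtrlOacs enum in the diff; the OACS_* names and the build_oacs helper are shortened, hypothetical stand-ins, and the byte-order conversion done by cpu_to_le16() is left out.

```c
#include <stdint.h>

/* Bit positions copied from the NvmeIdCtrlOacs enum in the patch. */
enum {
    OACS_FORMAT     = 1 << 1,
    OACS_NMS        = 1 << 3,
    OACS_DIRECTIVES = 1 << 5,
    OACS_VMS        = 1 << 7,
    OACS_DBBUF      = 1 << 8,
};

/* Advertise Virtualization Management only when SR-IOV VFs are configured. */
uint16_t build_oacs(unsigned sriov_max_vfs)
{
    uint16_t oacs = OACS_NMS | OACS_FORMAT | OACS_DBBUF | OACS_DIRECTIVES;

    if (sriov_max_vfs != 0) {
        oacs |= OACS_VMS;
    }
    return oacs;
}
```

Without VFs the result is 0x12a; with at least one VF bit 7 is added, giving 0x1aa.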

Re: [PATCH v2 6/6] qdev: Improve a few more PropertyInfo @description members

2025-02-27 Thread Daniel P . Berrangé

On Thu, Feb 27, 2025 at 09:56:01AM +0100, Markus Armbruster wrote:
> Signed-off-by: Markus Armbruster 
> ---
>  hw/block/xen-block.c | 2 +-
>  hw/core/qdev-properties-system.c | 2 +-
>  hw/core/qdev-properties.c| 1 +
>  hw/s390x/ccw-device.c| 4 ++--
>  target/sparc/cpu.c   | 1 +
>  5 files changed, 6 insertions(+), 4 deletions(-)

Reviewed-by: Daniel P. Berrangé 


> diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
> index 5a801057db..c04df3b337 100644
> --- a/hw/core/qdev-properties.c
> +++ b/hw/core/qdev-properties.c
> @@ -247,6 +247,7 @@ static void set_bool(Object *obj, Visitor *v, const char 
> *name, void *opaque,
>  
>  const PropertyInfo qdev_prop_bool = {
>  .type  = "bool",
> +.description = "on/off",

Awkward, as it is on/off for QemuOpts but JSON true/false for QMP, but I
guess clarifying this is beyond the scope of the .description field.

>  .get   = get_bool,
>  .set   = set_bool,
>  .set_default_value = set_default_value_bool,


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

[PULL 05/10] nvme: fix iocs status code values

2025-02-27 Thread Klaus Jensen

From: Klaus Jensen 

The status codes related to I/O Command Sets are in the wrong group.

Reviewed-by: Jesper Wendel Devantier 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 4 ++--
 include/block/nvme.h | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 1ad76da943a6..2b73f601608f 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -5681,7 +5681,7 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest 
*req, bool active)
 return nvme_c2h(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs), req);
 }
 
-return NVME_INVALID_CMD_SET | NVME_DNR;
+return NVME_INVALID_IOCS | NVME_DNR;
 }
 
 static uint16_t nvme_identify_ctrl_list(NvmeCtrl *n, NvmeRequest *req,
@@ -6647,7 +6647,7 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest 
*req)
 case NVME_COMMAND_SET_PROFILE:
 if (dw11 & 0x1ff) {
 trace_pci_nvme_err_invalid_iocsci(dw11 & 0x1ff);
-return NVME_CMD_SET_CMB_REJECTED | NVME_DNR;
+return NVME_IOCS_COMBINATION_REJECTED | NVME_DNR;
 }
 break;
 case NVME_FDP_MODE:
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 63eb74460eac..aecfc9ce66b4 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -906,8 +906,6 @@ enum NvmeStatusCodes {
 NVME_SGL_DESCR_TYPE_INVALID = 0x0011,
 NVME_INVALID_USE_OF_CMB = 0x0012,
 NVME_INVALID_PRP_OFFSET = 0x0013,
-NVME_CMD_SET_CMB_REJECTED   = 0x002b,
-NVME_INVALID_CMD_SET= 0x002c,
 NVME_FDP_DISABLED   = 0x0029,
 NVME_INVALID_PHID_LIST  = 0x002a,
 NVME_LBA_RANGE  = 0x0080,
@@ -940,6 +938,8 @@ enum NvmeStatusCodes {
 NVME_INVALID_SEC_CTRL_STATE = 0x0120,
 NVME_INVALID_NUM_RESOURCES  = 0x0121,
 NVME_INVALID_RESOURCE_ID= 0x0122,
+NVME_IOCS_COMBINATION_REJECTED = 0x012b,
+NVME_INVALID_IOCS   = 0x012c,
 NVME_CONFLICTING_ATTRS  = 0x0180,
 NVME_INVALID_PROT_INFO  = 0x0181,
 NVME_WRITE_TO_RO= 0x0182,
-- 
2.47.2
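
To see why moving the values from 0x002b/0x002c to 0x012b/0x012c changes the group, it helps to split a status value into its parts. The sketch below assumes the layout that the enum values in the diff suggest: the low 8 bits are the status code and the next 3 bits are the status code type (0 for generic, 1 for command specific). The toy_* helper names are hypothetical.

```c
#include <stdint.h>

/* Status Code Type: assumed to live in bits 10:8 of the status value. */
unsigned toy_status_sct(uint16_t status)
{
    return (status >> 8) & 0x7;
}

/* Status Code: assumed to live in bits 7:0 of the status value. */
unsigned toy_status_sc(uint16_t status)
{
    return status & 0xff;
}
```

Under that assumption 0x002b decodes as type 0 with code 0x2b, while 0x012b decodes as type 1 with the same code, so the code itself is unchanged and only the group differs.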

Re: [PATCH v3 15/28] hw/misc/aspeed_scu: Fix the revision ID cannot be set in the SOC layer for AST2700

2025-02-27 Thread Cédric Le Goater


Hello Jamin,

On 2/26/25 07:38, Jamin Lin wrote:

Hi Cedric,



On 2/13/25 04:35, Jamin Lin wrote:

According to the design of the AST2600, it has a Silicon Revision ID
Register, specifically SCU004 and SCU014, to set the Revision ID for the
AST2600.
For the AST2600 A3, SCU004 is set to 0x05030303 and SCU014 is set to
0x05030303.
In the "aspeed_ast2600_scu_reset" function, the hardcoded value
"AST2600_A3_SILICON_REV" is set in SCU004, and "s->silicon_rev" is set
in SCU014. The value of "s->silicon_rev" is set by the SOC layer via
the "silicon-rev" property.

However, the design of the AST2700 is different. There are two SCU
controllers:
SCU0 (CPU Die) and SCU1 (IO Die). In the AST2700, the firmware reads
the SCU Silicon Revision ID register (SCU0_000) and the SCUIO Silicon
Revision ID register (SCU1_000) and combines them into a 64-bit value.
The combined value of SCU0_000[23:16] and SCU1_000[23:16] represents
the silicon revision. For example, the AST2700-A1 revision is
"0x0601010306010103", where
SCU0_000 should be 06010103 and SCU1_000 should be 06010103.


Are both these values supposed to be identical? If not, we should plan for
changes at machine/SoC level too.



Currently, these values are supposed to be identical. Therefore, we can reuse 
the current design of the
silicon_rev in the machine/SoC layer for AST2700.
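
For illustration only, the 64-bit combination described in the quoted commit message might look like the sketch below. Which register ends up in the upper half is an assumption here (the two example values are identical, so the thread does not settle it), and both helper names are hypothetical.

```c
#include <stdint.h>

/* Combine the two 32-bit ID registers; SCU0 in the upper half is assumed. */
uint64_t toy_ast2700_silicon_rev(uint32_t scu0_000, uint32_t scu1_000)
{
    return ((uint64_t)scu0_000 << 32) | scu1_000;
}

/* Extract bits [23:16], the field the commit message calls the revision. */
unsigned toy_rev_field(uint32_t reg)
{
    return (reg >> 16) & 0xff;
}
```

With both registers at 0x06010103 this reproduces the 0x0601010306010103 value quoted for the AST2700-A1, and the [23:16] field of each register is 0x01.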


   static const uint32_t ast2700_a0_resets[ASPEED_AST2700_SCU_NR_REGS]

= {

-[AST2700_SILICON_REV]   = AST2700_A0_SILICON_REV,
   [AST2700_HW_STRAP1] = 0x0800,
   [AST2700_HW_STRAP1_CLR] = 0xFFF0FFF0,
   [AST2700_HW_STRAP1_LOCK]= 0x0FFF,
@@ -940,6 +939,7 @@ static void aspeed_ast2700_scu_reset(DeviceState

*dev)

   AspeedSCUClass *asc = ASPEED_SCU_GET_CLASS(dev);

   memcpy(s->regs, asc->resets, asc->nr_regs * 4);
+s->regs[AST2700_SILICON_REV] = s->silicon_rev;

   }

   static void aspeed_2700_scu_class_init(ObjectClass *klass, void
*data) @@ -1032,7 +1032,6 @@ static const MemoryRegionOps

aspeed_ast2700_scuio_ops = {

   };

   static const uint32_t

ast2700_a0_resets_io[ASPEED_AST2700_SCU_NR_REGS] = {

-[AST2700_SILICON_REV]   = 0x0603,
   [AST2700_HW_STRAP1] = 0x0504,


Why isn't AST2700_HW_STRAP1 assigned from the s->hw_strap1 property?



This is a bug. The design of the HW_STRAP register has changed in the AST2700.
There is one hw_strap1 register in the SCU (CPU DIE) and another hw_strap1 
register in the SCUIO (IO DIE).
The values of these two hw_strap1 registers should not be the same.

To fix this issue, I made the following changes. Do you have any suggestions?


All Aspeed SoC models currently define "hw-strap1" and "hw-strap2"
properties as aliases on the same properties of the SCU model :

object_property_add_alias(obj, "hw-strap1", OBJECT(&s->scu),
  "hw-strap1");
object_property_add_alias(obj, "hw-strap2", OBJECT(&s->scu),
  "hw-strap2");

For the AST2700 SoC, you could change the "hw-strap2" alias to point
to the SCUIO model  :

object_property_add_alias(obj, "hw-strap1", OBJECT(&s->scu),
  "hw-strap1");
object_property_add_alias(obj, "hw-strap2", OBJECT(&s->scuio),
  "hw-strap1");


Additionally, would it be possible to submit a separate patch for the SCU 
silicon_rev and SCU hw_strap fix?


yes we should please.


The patch series for supporting AST2700 A1 is quite large.


yes. That's why I asked you to split it :)


Thanks-Jamin

1. I dumped the real values of both registers on the EVB

root@ast2700-a0-default:~# md 14c02010 1 --> SCUIO hw_strap1
14c02010: 0500
root@ast2700-a0-default:~# md 12c02010 1 --> SCU hw_strap1
12c02010: 0800

The value AST2700_EVB_HW_STRAP1 0x0800 is used for the SCU hw_strap1.
The value AST2700_EVB_HW_STRAP2 0x0500 is used for the SCUIO hw_strap1

--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -181,8 +181,8 @@ struct AspeedMachineState {

  #ifdef TARGET_AARCH64
  /* AST2700 evb hardware value */
-#define AST2700_EVB_HW_STRAP1 0x00C0
-#define AST2700_EVB_HW_STRAP2 0x0003
+#define AST2700_EVB_HW_STRAP1 0x0800
+#define AST2700_EVB_HW_STRAP2 0x0500
  #endif

2.  Change to set hw_strap2 in the SCUIO model. Note this will modify the 
hw_strap1 register of the SCUIO.

+++ b/hw/arm/aspeed_ast27x0.c
@@ -410,14 +410,14 @@ static void aspeed_soc_ast2700_init(Object *obj)
   sc->silicon_rev);
  object_property_add_alias(obj, "hw-strap1", OBJECT(&s->scu),
"hw-strap1");
-object_property_add_alias(obj, "hw-strap2", OBJECT(&s->scu),
-  "hw-strap2");
  object_property_add_alias(obj, "hw-prot-key", OBJECT(&s->scu),
"hw-prot-key");

  object_initialize_child(obj, "scuio", &s->scuio, TYPE_ASPEED_2700_SCUIO);
  qdev_prop_set_uint32(DEVICE(&s->s

[PULL 08/10] hw/nvme: only set command abort requested when cancelled due to Abort

2025-02-27 Thread Klaus Jensen

From: Klaus Jensen 

The Command Abort Requested status code should only be set if the
command was explicitly cancelled due to an Abort command. Or, in the
case the cancel was due to Submission Queue deletion, set the status
code to Command Aborted due to SQ Deletion.

Reviewed-by: Jesper Wendel Devantier 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 21496c6b6b81..07cd63298526 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1783,10 +1783,6 @@ static void nvme_aio_err(NvmeRequest *req, int ret)
 break;
 }
 
-if (ret == -ECANCELED) {
-status = NVME_CMD_ABORT_REQ;
-}
-
 trace_pci_nvme_err_aio(nvme_cid(req), strerror(-ret), status);
 
 error_setg_errno(&local_err, -ret, "aio failed");
@@ -4827,6 +4823,7 @@ static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeRequest *req)
 while (!QTAILQ_EMPTY(&sq->out_req_list)) {
 r = QTAILQ_FIRST(&sq->out_req_list);
 assert(r->aiocb);
+r->status = NVME_CMD_ABORT_SQ_DEL;
 blk_aio_cancel(r->aiocb);
 }
 
@@ -6137,6 +6134,7 @@ static uint16_t nvme_abort(NvmeCtrl *n, NvmeRequest *req)
 QTAILQ_FOREACH_SAFE(r, &sq->out_req_list, entry, next) {
 if (r->cqe.cid == cid) {
 if (r->aiocb) {
+r->status = NVME_CMD_ABORT_REQ;
 blk_aio_cancel_async(r->aiocb);
 }
 break;
-- 
2.47.2

[PULL 10/10] hw/nvme: remove nvme_aio_err()

2025-02-27 Thread Klaus Jensen

From: Klaus Jensen 

nvme_rw_complete_cb() is the only remaining user of nvme_aio_err(), so
open code the status code setting instead.

Reviewed-by: Jesper Wendel Devantier 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 60 +++---
 1 file changed, 23 insertions(+), 37 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index b7222fd9ac02..e62c6a358828 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1762,42 +1762,6 @@ static uint16_t nvme_check_dulbe(NvmeNamespace *ns, 
uint64_t slba,
 return NVME_SUCCESS;
 }
 
-static void nvme_aio_err(NvmeRequest *req, int ret)
-{
-uint16_t status = NVME_SUCCESS;
-Error *local_err = NULL;
-
-switch (req->cmd.opcode) {
-case NVME_CMD_READ:
-status = NVME_UNRECOVERED_READ;
-break;
-case NVME_CMD_WRITE:
-case NVME_CMD_WRITE_ZEROES:
-case NVME_CMD_ZONE_APPEND:
-case NVME_CMD_COPY:
-status = NVME_WRITE_FAULT;
-break;
-default:
-status = NVME_INTERNAL_DEV_ERROR;
-break;
-}
-
-trace_pci_nvme_err_aio(nvme_cid(req), strerror(-ret), status);
-
-error_setg_errno(&local_err, -ret, "aio failed");
-error_report_err(local_err);
-
-/*
- * Set the command status code to the first encountered error but allow a
- * subsequent Internal Device Error to trump it.
- */
-if (req->status && status != NVME_INTERNAL_DEV_ERROR) {
-return;
-}
-
-req->status = status;
-}
-
 static inline uint32_t nvme_zone_idx(NvmeNamespace *ns, uint64_t slba)
 {
 return ns->zone_size_log2 > 0 ? slba >> ns->zone_size_log2 :
@@ -2182,8 +2146,30 @@ void nvme_rw_complete_cb(void *opaque, int ret)
 trace_pci_nvme_rw_complete_cb(nvme_cid(req), blk_name(blk));
 
 if (ret) {
+Error *err = NULL;
+
 block_acct_failed(stats, acct);
-nvme_aio_err(req, ret);
+
+switch (req->cmd.opcode) {
+case NVME_CMD_READ:
+req->status = NVME_UNRECOVERED_READ;
+break;
+
+case NVME_CMD_WRITE:
+case NVME_CMD_WRITE_ZEROES:
+case NVME_CMD_ZONE_APPEND:
+req->status = NVME_WRITE_FAULT;
+break;
+
+default:
+req->status = NVME_INTERNAL_DEV_ERROR;
+break;
+}
+
+trace_pci_nvme_err_aio(nvme_cid(req), strerror(-ret), req->status);
+
+error_setg_errno(&err, -ret, "aio failed");
+error_report_err(err);
 } else {
 block_acct_done(stats, acct);
 }
-- 
2.47.2

Re: [PATCH v3 032/162] tcg: Convert not to TCGOutOpUnary

2025-02-27 Thread Philippe Mathieu-Daudé


On 17/2/25 00:08, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  tcg/aarch64/tcg-target-has.h |  2 --
  tcg/arm/tcg-target-has.h |  1 -
  tcg/i386/tcg-target-has.h|  2 --
  tcg/loongarch64/tcg-target-has.h |  2 --
  tcg/mips/tcg-target-has.h|  2 --
  tcg/ppc/tcg-target-has.h |  2 --
  tcg/riscv/tcg-target-has.h   |  2 --
  tcg/s390x/tcg-target-has.h   |  2 --
  tcg/sparc64/tcg-target-has.h |  2 --
  tcg/tcg-has.h|  1 -
  tcg/tci/tcg-target-has.h |  2 --
  tcg/optimize.c   |  4 ++--
  tcg/tcg-op.c | 10 ++
  tcg/tcg.c|  8 
  tcg/tci.c|  2 --
  tcg/aarch64/tcg-target.c.inc | 17 ++---
  tcg/arm/tcg-target.c.inc | 15 ++-
  tcg/i386/tcg-target.c.inc| 17 +++--
  tcg/loongarch64/tcg-target.c.inc | 17 ++---
  tcg/mips/tcg-target.c.inc| 20 ++--
  tcg/ppc/tcg-target.c.inc | 17 ++---
  tcg/riscv/tcg-target.c.inc   | 17 ++---
  tcg/s390x/tcg-target.c.inc   | 25 -
  tcg/sparc64/tcg-target.c.inc | 20 ++--
  tcg/tci/tcg-target.c.inc | 13 ++---
  25 files changed, 119 insertions(+), 103 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2 1/6] qdev: Delete unused qdev_prop_enum

2025-02-27 Thread Daniel P . Berrangé

On Thu, Feb 27, 2025 at 09:55:56AM +0100, Markus Armbruster wrote:
> Signed-off-by: Markus Armbruster 
> ---
>  include/hw/qdev-properties.h | 1 -
>  hw/core/qdev-properties.c| 7 ---
>  2 files changed, 8 deletions(-)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v2 4/6] qdev: Change values of PropertyInfo member @type to be QAPI types

2025-02-27 Thread Daniel P . Berrangé

On Thu, Feb 27, 2025 at 09:55:59AM +0100, Markus Armbruster wrote:
> PropertyInfo member @type is externally visible via QMP
> device-list-properties and qom-list-properties.
> 
> Its meaning is not documented at its definition.
> 
> It gets passed to as @type argument to object_property_add() and

 ^  "passed as the @type argument" ?

> object_class_property_add().  This argument's documentation isn't of
> much help, either:
> 
>  * @type: the type name of the property.  This namespace is pretty loosely
>  *   defined.  Sub namespaces are constructed by using a prefix and then
>  *   to angle brackets.  For instance, the type 'virtio-net-pci' in the
>  *   'link' namespace would be 'link<virtio-net-pci>'.
> 
> The two QMP commands document it as
> 
>  # @type: the type of the property.  This will typically come in one of
>  # four forms:
>  #
>  # 1) A primitive type such as 'u8', 'u16', 'bool', 'str', or
>  #'double'.  These types are mapped to the appropriate JSON
>  #type.
>  #
>  # 2) A child type in the form 'child<subtype>' where subtype is a
>  #qdev device type name.  Child properties create the
>  #composition tree.
>  #
>  # 3) A link type in the form 'link<subtype>' where subtype is a
>  #qdev device type name.  Link properties form the device model
>  #graph.
> 
> "Typically come in one of four forms" followed by three items inspires
> the level of trust that is appropriate here.

> 
> Clean up a bunch of funnies:
> 
> * qdev_prop_fdc_drive_type.type is "FdcDriveType".  Its .enum_table
>   refers to QAPI type "FloppyDriveType".  So use that.
> 
> * qdev_prop_reserved_region is "reserved_region".  Its only user is an
>   array property called "reserved-regions".  Its .set() visits str.
>   So change @type to "str".
> 
> * trng_prop_fault_event_set.type is "uint32:bits".  Its .set() visits
>   uint32, so change @type to "uint32".  If we believe mentioning it's
>   actually bits is useful, the proper place would be .description.
> 
> * ccw_loadparm.type is "ccw_loadparm".  Its users are properties
>   called "loadparm".  Its .set() visits str.  So change @type to
>   "str".
> 
> * qdev_prop_nv_gpudirect_clique.type is "uint4".  Its set() visits
>   uint8, so change @type to "uint8".  If we believe mentioning the
>   range is useful, the proper place would be .description.
> 
> * s390_pci_fid_propinfo.type is "zpci_fid".  Its .set() visits uint32.
>   So change type to that, and move the "zpci_fid" to .description.
>   This is admittedly a lousy description, but it's still an
>   improvement; for instance, output of -device zpci,help changes from
> 
>   fid=<zpci_fid>
> 
>   to
> 
>   fid=<uint32>   - zpci_fid
> 
> * Similarly for a raft of PropertyInfo in target/riscv/cpu.c.
> 
> Signed-off-by: Markus Armbruster 
> ---
>  hw/core/qdev-properties-system.c |  4 +--
>  hw/misc/xlnx-versal-trng.c   |  2 +-
>  hw/s390x/ccw-device.c|  2 +-
>  hw/s390x/s390-pci-bus.c  |  3 ++-
>  hw/vfio/pci-quirks.c |  2 +-
>  target/riscv/cpu.c   | 44 ++--
>  6 files changed, 37 insertions(+), 20 deletions(-)

Reviewed-by: Daniel P. Berrangé 



With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v2 5/6] qdev: Improve PropertyInfo member @description for enum properties

2025-02-27 Thread Daniel P . Berrangé

On Thu, Feb 27, 2025 at 09:56:00AM +0100, Markus Armbruster wrote:
> Consistently use format "DESCRIPTION (VALUE/VALUE...)".
> 
> Signed-off-by: Markus Armbruster 
> ---
>  hw/core/qdev-properties-system.c | 26 +++---
>  1 file changed, 11 insertions(+), 15 deletions(-)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

[PULL 01/10] hw/nvme: Add OCP SMART / Health Information Extended Log Page

2025-02-27 Thread Klaus Jensen

From: Stephen Bates 

The Open Compute Project [1] includes a Datacenter NVMe
SSD Specification [2]. The most recent version of this specification
(as of November 2024) is 2.6.1. This specification layers on top of
the NVM Express specifications [3] to provide additional
functionality. A key part of this is the 512 Byte OCP SMART / Health
Information Extended log page that is defined in Section 4.8.6 of the
specification.

We add a controller argument (ocp) that toggles on/off the SMART log
extended structure.  To accommodate different vendor specific specifications
like OCP, we add a multiplexing function (nvme_vendor_specific_log) which
will route to the different log functions based on arguments and log ids.
We only return the OCP extended SMART log when the command is 0xC0 and ocp
has been turned on in the nvme arguments.

Though we add the whole nvme SMART log extended structure, we only populate
the physical_media_units_{read,written}, log_page_version and
log_page_uuid.

This patch is based on work done by Joel but has been modified enough
that he requested a co-developed-by tag rather than a signed-off-by.

[1]: https://www.opencompute.org/
[2]: 
https://www.opencompute.org/documents/datacenter-nvme-ssd-specification-v2-6-1-pdf
[3]: https://nvmexpress.org/specifications/

Signed-off-by: Stephen Bates 
Co-developed-by: Joel Granados 
Reviewed-by: Klaus Jensen 
Signed-off-by: Klaus Jensen 
---
 docs/system/devices/nvme.rst |  7 +
 hw/nvme/ctrl.c   | 59 
 hw/nvme/nvme.h   |  1 +
 include/block/nvme.h | 41 +
 4 files changed, 108 insertions(+)

diff --git a/docs/system/devices/nvme.rst b/docs/system/devices/nvme.rst
index d2b1ca96455f..6509b35fcb4e 100644
--- a/docs/system/devices/nvme.rst
+++ b/docs/system/devices/nvme.rst
@@ -53,6 +53,13 @@ parameters.
   Vendor ID. Set this to ``on`` to revert to the unallocated Intel ID
   previously used.
 
+``ocp`` (default: ``off``)
+  The Open Compute Project defines the Datacenter NVMe SSD Specification that
+  sits on top of NVMe. It describes additional commands and NVMe behaviors
+  specific for the Datacenter. When this option is ``on`` OCP features such as
+  the SMART / Health information extended log become available in the
+  controller. We emulate version 5 of this log page.
+
 Additional Namespaces
 -
 
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 8175751518f8..11687e597a11 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4917,6 +4917,45 @@ static void nvme_set_blk_stats(NvmeNamespace *ns, struct 
nvme_stats *stats)
 stats->write_commands += s->nr_ops[BLOCK_ACCT_WRITE];
 }
 
+static uint16_t nvme_ocp_extended_smart_info(NvmeCtrl *n, uint8_t rae,
+ uint32_t buf_len, uint64_t off,
+ NvmeRequest *req)
+{
+NvmeNamespace *ns = NULL;
+NvmeSmartLogExtended smart_l = { 0 };
+struct nvme_stats stats = { 0 };
+uint32_t trans_len;
+
+if (off >= sizeof(smart_l)) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+/* accumulate all stats from all namespaces */
+for (int i = 1; i <= NVME_MAX_NAMESPACES; i++) {
+ns = nvme_ns(n, i);
+if (ns) {
+nvme_set_blk_stats(ns, &stats);
+}
+}
+
+smart_l.physical_media_units_written[0] = cpu_to_le64(stats.units_written);
+smart_l.physical_media_units_read[0] = cpu_to_le64(stats.units_read);
+smart_l.log_page_version = 0x0005;
+
+static const uint8_t guid[16] = {
+0xC5, 0xAF, 0x10, 0x28, 0xEA, 0xBF, 0xF2, 0xA4,
+0x9C, 0x4F, 0x6F, 0x7C, 0xC9, 0x14, 0xD5, 0xAF
+};
+memcpy(smart_l.log_page_guid, guid, sizeof(smart_l.log_page_guid));
+
+if (!rae) {
+nvme_clear_events(n, NVME_AER_TYPE_SMART);
+}
+
+trans_len = MIN(sizeof(smart_l) - off, buf_len);
+return nvme_c2h(n, (uint8_t *) &smart_l + off, trans_len, req);
+}
+
 static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len,
 uint64_t off, NvmeRequest *req)
 {
@@ -5146,6 +5185,23 @@ static uint16_t nvme_cmd_effects(NvmeCtrl *n, uint8_t 
csi, uint32_t buf_len,
 return nvme_c2h(n, ((uint8_t *)&log) + off, trans_len, req);
 }
 
+static uint16_t nvme_vendor_specific_log(NvmeCtrl *n, uint8_t rae,
+ uint32_t buf_len, uint64_t off,
+ NvmeRequest *req, uint8_t lid)
+{
+switch (lid) {
+case NVME_OCP_EXTENDED_SMART_INFO:
+if (n->params.ocp) {
+return nvme_ocp_extended_smart_info(n, rae, buf_len, off, req);
+}
+break;
+/* add a case for each additional vendor specific log id */
+}
+
+trace_pci_nvme_err_invalid_log_page(nvme_cid(req), lid);
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
 static size_t sizeof_fdp_conf_descr(size_t nruh, size_t vss)
 {

Re: [PATCH] hw/arm/smmu-common: Remove the repeated ttb field

2025-02-27 Thread Eric Auger

Hi,
On 2/21/25 4:10 AM, JianChunfu wrote:
> SMMUTransCfg->ttb is never used in QEMU, TT base address
> can be accessed by SMMUTransCfg->tt[i]->ttb.
>
> Signed-off-by: JianChunfu 
Reviewed-by: Eric Auger 

Thanks!

Eric
> ---
>  include/hw/arm/smmu-common.h | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
> index d1a4a6455..e5ad55bba 100644
> --- a/include/hw/arm/smmu-common.h
> +++ b/include/hw/arm/smmu-common.h
> @@ -110,7 +110,6 @@ typedef struct SMMUTransCfg {
>  /* Used by stage-1 only. */
>  bool aa64; /* arch64 or aarch32 translation table */
>  bool record_faults;/* record fault events */
> -uint64_t ttb;  /* TT base address */
>  uint8_t oas;   /* output address width */
>  uint8_t tbi;   /* Top Byte Ignore */
>  int asid;

[PULL 04/10] hw/nvme: add knob for doorbell buffer config support

2025-02-27 Thread Klaus Jensen

From: Klaus Jensen 

Add a 'dbcs' knob to allow Doorbell Buffer Config command to be
disabled.
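
As a rough sketch of what the knob does to the advertised capabilities (not the QEMU code): the OACS word is built from fixed bits, and the doorbell buffer config bit is added only when the knob is on. The bit positions for NMS, DIRECTIVES and DBCS follow the enum touched by this patch; the position of FORMAT and the names `build_oacs` and `host_may_use_dbbuf` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
    OACS_FORMAT     = 1 << 1,  /* assumed position, not shown in the patch */
    OACS_NMS        = 1 << 3,
    OACS_DIRECTIVES = 1 << 5,
    OACS_DBCS       = 1 << 8,
};

static_assert(OACS_DBCS == 0x100, "DBCS is bit 8 of OACS");

/* Build the OACS capability word; 'dbcs' mirrors the new knob. */
static uint16_t build_oacs(bool dbcs)
{
    uint16_t oacs = OACS_NMS | OACS_FORMAT | OACS_DIRECTIVES;

    if (dbcs) {
        oacs |= OACS_DBCS;
    }
    return oacs;
}

/* Host-side check: only issue Doorbell Buffer Config if advertised. */
static bool host_may_use_dbbuf(uint16_t oacs)
{
    return (oacs & OACS_DBCS) != 0;
}
```

With the knob off, a well-behaved host sees the bit cleared and never sends the command.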

Reviewed-by: Jesper Wendel Devantier 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 11 ---
 hw/nvme/nvme.h   |  1 +
 include/block/nvme.h |  2 +-
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 4ee8588ca9ae..1ad76da943a6 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -278,7 +278,6 @@ static const uint32_t nvme_cse_acs_default[256] = {
 [NVME_ADM_CMD_GET_FEATURES] = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_NS_ATTACHMENT]= NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_NIC,
-[NVME_ADM_CMD_DBBUF_CONFIG] = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_FORMAT_NVM]   = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 [NVME_ADM_CMD_DIRECTIVE_RECV]   = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_DIRECTIVE_SEND]   = NVME_CMD_EFF_CSUPP,
@@ -8773,8 +8772,13 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 id->mdts = n->params.mdts;
 id->ver = cpu_to_le32(NVME_SPEC_VER);
 
-oacs = NVME_OACS_NMS | NVME_OACS_FORMAT | NVME_OACS_DBBUF |
-NVME_OACS_DIRECTIVES;
+oacs = NVME_OACS_NMS | NVME_OACS_FORMAT | NVME_OACS_DIRECTIVES;
+
+if (n->params.dbcs) {
+oacs |= NVME_OACS_DBCS;
+
+n->cse.acs[NVME_ADM_CMD_DBBUF_CONFIG] = NVME_CMD_EFF_CSUPP;
+}
 
 if (n->params.sriov_max_vfs) {
 oacs |= NVME_OACS_VMS;
@@ -9024,6 +9028,7 @@ static const Property nvme_props[] = {
 DEFINE_PROP_BOOL("use-intel-id", NvmeCtrl, params.use_intel_id, false),
 DEFINE_PROP_BOOL("legacy-cmb", NvmeCtrl, params.legacy_cmb, false),
 DEFINE_PROP_BOOL("ioeventfd", NvmeCtrl, params.ioeventfd, false),
+DEFINE_PROP_BOOL("dbcs", NvmeCtrl, params.dbcs, true),
 DEFINE_PROP_UINT8("zoned.zasl", NvmeCtrl, params.zasl, 0),
 DEFINE_PROP_BOOL("zoned.auto_transition", NvmeCtrl,
  params.auto_transition_zones, true),
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index b86cad388f5a..b8d063a027a9 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -539,6 +539,7 @@ typedef struct NvmeParams {
 bool auto_transition_zones;
 bool legacy_cmb;
 bool ioeventfd;
+bool dbcs;
 uint16_t  sriov_max_vfs;
 uint16_t sriov_vq_flexible;
 uint16_t sriov_vi_flexible;
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 80fbcb420d48..63eb74460eac 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1235,7 +1235,7 @@ enum NvmeIdCtrlOacs {
 NVME_OACS_NMS   = 1 << 3,
 NVME_OACS_DIRECTIVES= 1 << 5,
 NVME_OACS_VMS   = 1 << 7,
-NVME_OACS_DBBUF = 1 << 8,
+NVME_OACS_DBCS  = 1 << 8,
 };
 
 enum NvmeIdCtrlOncs {
-- 
2.47.2

Re: [PATCH v2 2/6] qdev: Change qdev_prop_pci_devfn member @name from "int32" to "str"

2025-02-27 Thread Daniel P . Berrangé

On Thu, Feb 27, 2025 at 09:55:57AM +0100, Markus Armbruster wrote:
> Properties using qdev_prop_pci_devfn initially accepted a string of
> the form "DEV.FN" or "DEV" where DEV and FN are in hexadecimal.
> Member @name was "pci-devfn" initially.
> 
> Commit b403298adb5 (qdev: make the non-legacy pci address property
> accept an integer) changed them to additionally accept integers: bits
> 3..7 are DEV, and bits 0..2 are FN.  This is inaccessible externally
> in device_add so far.
> 
> The commit also changed @name to "int32", and set member @legacy-name
> to "pci-devfn".  Together, this kept QMP command
> device-list-properties unaffected: it used @name only when
> @legacy_name was null.
> 
> Commit 07d09c58dbb (qmp: Print descriptions of object properties)
> quietly dumbed that down to use @name always, and the next commit
> 18b91a3e082 (qdev: Drop legacy_name from qdev properties) dropped
> member @legacy_name.  This changed the value of @type reported by QMP
> command device-list-properties from "pci-devfn" to "int32".
> 
> But "int32" is misleading: device_add actually wants QAPI type "str".
> So change @name to that.

That history is "fun" :-)
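
For reference, the two input forms described in the quoted commit message (a "DEV.FN" or "DEV" hexadecimal string, and an integer with DEV in bits 3..7 and FN in bits 0..2) can be sketched as below. This is an illustrative sketch only; `devfn_pack`, `devfn_dev`, `devfn_fn` and `devfn_parse` are hypothetical names, not QEMU functions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Pack slot (DEV, 5 bits) and function (FN, 3 bits) into one devfn byte. */
static uint8_t devfn_pack(unsigned dev, unsigned fn)
{
    assert(dev < 32 && fn < 8);
    return (uint8_t)((dev << 3) | fn);
}

static unsigned devfn_dev(uint8_t devfn) { return devfn >> 3; }
static unsigned devfn_fn(uint8_t devfn)  { return devfn & 0x7; }

/*
 * Parse "DEV.FN" or "DEV", both hexadecimal.
 * Returns the packed devfn, or -1 on malformed or out-of-range input.
 */
static int devfn_parse(const char *s)
{
    char *end;
    unsigned long dev, fn = 0;

    if (!isxdigit((unsigned char)*s)) {
        return -1;
    }
    dev = strtoul(s, &end, 16);

    if (*end == '.') {
        const char *p = end + 1;

        if (!isxdigit((unsigned char)*p)) {
            return -1;
        }
        fn = strtoul(p, &end, 16);
    }

    if (*end != '\0' || dev > 31 || fn > 7) {
        return -1;
    }
    return devfn_pack((unsigned)dev, (unsigned)fn);
}
```

Since a property that takes "DEV.FN" needs a string on the wire, reporting its type as "str" rather than "int32" matches what a client actually has to send.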

> 
> Signed-off-by: Markus Armbruster 
> ---
>  hw/core/qdev-properties-system.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v3 03/14] acpi/ghes: Use HEST table offsets when preparing GHES records

2025-02-27 Thread Igor Mammedov

On Wed, 26 Feb 2025 17:14:06 +0100
Mauro Carvalho Chehab  wrote:

> Em Tue, 25 Feb 2025 10:43:27 +0100
> Igor Mammedov  escreveu:
> 
> > On Fri, 21 Feb 2025 07:02:21 +0100
> > Mauro Carvalho Chehab  wrote:
> >   
> > > Em Mon, 3 Feb 2025 15:34:23 +0100
> > > Igor Mammedov  escreveu:
> > > 
> > > > On Fri, 31 Jan 2025 18:42:44 +0100
> > > > Mauro Carvalho Chehab  wrote:
> > > >   
> > > > > There are two pointers that are needed during error injection:
> > > > > 
> > > > > 1. The start address of the CPER block to be stored;
> > > > > 2. The address of the ack.
> > > > > 
> > > > > It is preferable to calculate them from the HEST table.  This allows
> > > > > checking the source ID, the size of the table and the type of the
> > > > > HEST error block structures.
> > > > > 
> > > > > Yet, keep the old code, as this is needed for migration purposes.
> > > > > 
> > > > > Signed-off-by: Mauro Carvalho Chehab 
> > > > > ---
> > > > >  hw/acpi/ghes.c | 132 
> > > > > -
> > > > >  include/hw/acpi/ghes.h |   1 +
> > > > >  2 files changed, 119 insertions(+), 14 deletions(-)
> > > > > 
> > > > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > > > > index 27478f2d5674..8f284fd191a6 100644
> > > > > --- a/hw/acpi/ghes.c
> > > > > +++ b/hw/acpi/ghes.c
> > > > > @@ -41,6 +41,12 @@
> > > > >  /* Address offset in Generic Address Structure(GAS) */
> > > > >  #define GAS_ADDR_OFFSET 4
> > > > >  
> > > > > +/*
> > > > > + * ACPI spec 1.0b
> > > > > + * 5.2.3 System Description Table Header
> > > > > + */
> > > > > +#define ACPI_DESC_HEADER_OFFSET 36
> > > > > +
> > > > >  /*
> > > > >   * The total size of Generic Error Data Entry
> > > > >   * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> > > > > @@ -61,6 +67,25 @@
> > > > >   */
> > > > >  #define ACPI_GHES_GESB_SIZE 20
> > > > >  
> > > > > +/*
> > > > > + * Offsets with regards to the start of the HEST table stored at
> > > > > + * ags->hest_addr_le,
> > > > 
> > > > > If I read this literally, then the offsets above are not what is
> > > > > declared later in this patch.
> > > > I'd really drop this comment altogether as it's confusing,
> > > > and rather get variables/macro naming right
> > > >   
> > > > > according with the memory layout map at
> > > > > + * docs/specs/acpi_hest_ghes.rst.
> > > > > + */
> > > > 
> > > > what we need is update to above doc, describing new and old ways.
> > > > a separate patch.  
> > > 
> > > I can't see anything that should be changed at
> > > docs/specs/acpi_hest_ghes.rst, as this series doesn't change the
> > > firmware layout: we're still using two firmware tables:
> > > 
> > > - etc/acpi/tables, with HEST on it;
> > > - etc/hardware_errors, with:
> > >   - error block addresses;
> > >   - read_ack registers;
> > >   - CPER records.
> > > 
> > > The only changes that this series introduce are related to how
> > > the error generation logic navigates between HEST and hw_errors
> > > firmware. This is not described at acpi_hest_ghes.rst, and both
> > > ways follow ACPI specs to the letter.
> > > 
> > > > The only difference is that the code which populates the CPER
> > > > record and the error/read offsets no longer needs to know how
> > > > the HEST table generation placed offsets, as it will basically
> > > > reproduce what OSPM firmware does when handling HEST events.
> > 
> > > section 8 describes the old way to get to the address used to record
> > > the CPER, so it needs to be amended to also describe the new approach
> > > and say which way is used for which version.
> > 
> > possibly section 11 might need some messaging as well.  
> 
> Ok, I'll modify it and place at the end of the series. Please
> see below if the new text is ok for you.
> 
> ---
> 
> [PATCH] docs/specs/acpi_hest_ghes.rst: update it to reflect some changes

s/^^^/docs: hest: add new "etc/acpi_table_hest_addr" and update workflow/

> 
> While the HEST layout didn't change, there are some internal
> changes related to how offsets are calculated and how memory error
> events are triggered.
> 
> Update specs to reflect such changes.
> 
> Signed-off-by: Mauro Carvalho Chehab 
> 
> diff --git a/docs/specs/acpi_hest_ghes.rst b/docs/specs/acpi_hest_ghes.rst
> index c3e9f8d9a702..f22d2eefdec7 100644
> --- a/docs/specs/acpi_hest_ghes.rst
> +++ b/docs/specs/acpi_hest_ghes.rst
> @@ -89,12 +89,21 @@ Design Details
>  addresses in the "error_block_address" fields with a pointer to the
>  respective "Error Status Data Block" in the "etc/hardware_errors" blob.
>  
> -(8) QEMU defines a third and write-only fw_cfg blob which is called
> -"etc/hardware_errors_addr". Through that blob, the firmware can send back
> -the guest-side allocation addresses to QEMU. The 
> "etc/hardware_errors_addr"
> -blob contains a 8-byte entry. QEMU generates a single WRITE_POINTER 
> command
> -for the firmware. The firmware will write back the start address of
> -"etc/hardware_errors" blob to the fw_c

Re: [PATCH v2 3/6] qdev: Rename PropertyInfo member @name to @type

2025-02-27 Thread Daniel P . Berrangé

On Thu, Feb 27, 2025 at 09:55:58AM +0100, Markus Armbruster wrote:
> PropertyInfo member @name becomes ObjectProperty member @type, while
> Property member @name becomes ObjectProperty member @name.  Rename the
> former.
> 
> Signed-off-by: Markus Armbruster 
> ---
>  include/hw/qdev-properties.h |  2 +-
>  backends/tpm/tpm_util.c  |  2 +-
>  hw/block/xen-block.c |  2 +-
>  hw/core/qdev-properties-system.c | 50 
>  hw/core/qdev-properties.c| 36 +++
>  hw/misc/xlnx-versal-trng.c   |  2 +-
>  hw/nvme/nguid.c  |  2 +-
>  hw/nvram/xlnx-bbram.c|  2 +-
>  hw/nvram/xlnx-efuse.c|  2 +-
>  hw/pci/pci.c |  2 +-
>  hw/s390x/ccw-device.c|  2 +-
>  hw/s390x/css.c   |  4 +--
>  hw/s390x/s390-pci-bus.c  |  2 +-
>  hw/vfio/pci-quirks.c |  2 +-
>  target/riscv/cpu.c   | 28 +-
>  target/sparc/cpu.c   |  2 +-
>  hw/display/apple-gfx.m   |  2 +-
>  17 files changed, 72 insertions(+), 72 deletions(-)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

[PULL 02/10] hw/nvme: always initialize a subsystem

2025-02-27 Thread Klaus Jensen

From: Klaus Jensen 

If no nvme-subsys is explicitly configured, instantiate one.

Reviewed-by: Jesper Wendel Devantier 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 36 +++-
 hw/nvme/ns.c   | 64 +-
 2 files changed, 42 insertions(+), 58 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 11687e597a11..5a7ccbcc1b80 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -8823,15 +8823,13 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 id->psd[0].enlat = cpu_to_le32(0x10);
 id->psd[0].exlat = cpu_to_le32(0x4);
 
-if (n->subsys) {
-id->cmic |= NVME_CMIC_MULTI_CTRL;
-ctratt |= NVME_CTRATT_ENDGRPS;
+id->cmic |= NVME_CMIC_MULTI_CTRL;
+ctratt |= NVME_CTRATT_ENDGRPS;
 
-id->endgidmax = cpu_to_le16(0x1);
+id->endgidmax = cpu_to_le16(0x1);
 
-if (n->subsys->endgrp.fdp.enabled) {
-ctratt |= NVME_CTRATT_FDPS;
-}
+if (n->subsys->endgrp.fdp.enabled) {
+ctratt |= NVME_CTRATT_FDPS;
 }
 
 id->ctratt = cpu_to_le32(ctratt);
@@ -8860,7 +8858,15 @@ static int nvme_init_subsys(NvmeCtrl *n, Error **errp)
 int cntlid;
 
 if (!n->subsys) {
-return 0;
+DeviceState *dev = qdev_new(TYPE_NVME_SUBSYS);
+
+qdev_prop_set_string(dev, "nqn", n->params.serial);
+
+if (!qdev_realize(dev, NULL, errp)) {
+return -1;
+}
+
+n->subsys = NVME_SUBSYS(dev);
 }
 
 cntlid = nvme_subsys_register_ctrl(n, errp);
@@ -8950,17 +8956,15 @@ static void nvme_exit(PCIDevice *pci_dev)
 
 nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
 
-if (n->subsys) {
-for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
-ns = nvme_ns(n, i);
-if (ns) {
-ns->attached--;
-}
+for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
+ns = nvme_ns(n, i);
+if (ns) {
+ns->attached--;
 }
-
-nvme_subsys_unregister_ctrl(n->subsys, n);
 }
 
+nvme_subsys_unregister_ctrl(n->subsys, n);
+
 g_free(n->cq);
 g_free(n->sq);
 g_free(n->aer_reqs);
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 410df2959192..94cabc6a5b8d 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -727,25 +727,14 @@ static void nvme_ns_realize(DeviceState *dev, Error 
**errp)
 uint32_t nsid = ns->params.nsid;
 int i;
 
-if (!n->subsys) {
-/* If no subsys, the ns cannot be attached to more than one ctrl. */
-ns->params.shared = false;
-if (ns->params.detached) {
-error_setg(errp, "detached requires that the nvme device is "
-   "linked to an nvme-subsys device");
-return;
-}
-} else {
-/*
- * If this namespace belongs to a subsystem (through a link on the
- * controller device), reparent the device.
- */
-if (!qdev_set_parent_bus(dev, &subsys->bus.parent_bus, errp)) {
-return;
-}
-ns->subsys = subsys;
-ns->endgrp = &subsys->endgrp;
+assert(subsys);
+
+/* reparent to subsystem bus */
+if (!qdev_set_parent_bus(dev, &subsys->bus.parent_bus, errp)) {
+return;
 }
+ns->subsys = subsys;
+ns->endgrp = &subsys->endgrp;
 
 if (nvme_ns_setup(ns, errp)) {
 return;
@@ -753,7 +742,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
 
 if (!nsid) {
 for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
-if (nvme_ns(n, i) || nvme_subsys_ns(subsys, i)) {
+if (nvme_subsys_ns(subsys, i)) {
 continue;
 }
 
@@ -765,38 +754,29 @@ static void nvme_ns_realize(DeviceState *dev, Error 
**errp)
 error_setg(errp, "no free namespace id");
 return;
 }
-} else {
-if (nvme_ns(n, nsid) || nvme_subsys_ns(subsys, nsid)) {
-error_setg(errp, "namespace id '%d' already allocated", nsid);
-return;
-}
+} else if (nvme_subsys_ns(subsys, nsid)) {
+error_setg(errp, "namespace id '%d' already allocated", nsid);
+return;
 }
 
-if (subsys) {
-subsys->namespaces[nsid] = ns;
+subsys->namespaces[nsid] = ns;
 
-ns->id_ns.endgid = cpu_to_le16(0x1);
-ns->id_ns_ind.endgrpid = cpu_to_le16(0x1);
+ns->id_ns.endgid = cpu_to_le16(0x1);
+ns->id_ns_ind.endgrpid = cpu_to_le16(0x1);
 
-if (ns->params.detached) {
-return;
-}
+if (ns->params.detached) {
+return;
+}
 
-if (ns->params.shared) {
-for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
-NvmeCtrl *ctrl = subsys->ctrls[i];
+if (ns->params.shared) {
+for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
+NvmeCtrl *ctrl = subsys->ctrls[i];
 
-if (ctrl && ctrl != SUBSYS_SLOT_RSVD) {
-nvme_attach_ns(ctrl, ns);
-

[PULL 07/10] hw/nvme: rework csi handling

2025-02-27 Thread Klaus Jensen

From: Klaus Jensen 

The controller incorrectly allows a zoned namespace to be attached even
if CS.CSS is configured to only support the NVM command set for I/O
queues.

Rework handling of namespace command sets in general by attaching
supported namespaces when the controller is started instead of, like
now, statically when realized.
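
A simplified sketch of the approach, under stated assumptions and not the code in this patch: each controller holds one command-support table per command set, fills them in when it is started, and looks an opcode up in the table matching the namespace before dispatching. `Ctrl`, `ctrl_start`, `io_cmd` and the flag value of `EFF_CSUPP` are hypothetical names and values used only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EFF_CSUPP 0x1u  /* "command supported" flag; value is illustrative */

enum { OPC_WRITE = 0x01, OPC_READ = 0x02, OPC_ZONE_APPEND = 0x7d };
enum { ST_OK = 0, ST_INVALID_OPCODE = 1 };

/* Hypothetical controller: one support table per I/O command set. */
typedef struct Ctrl {
    uint32_t nvm[256];
    uint32_t zoned[256];
} Ctrl;

/* Populate the tables at controller start, based on what is enabled. */
static void ctrl_start(Ctrl *c, int zoned_enabled)
{
    assert(c);
    memset(c, 0, sizeof(*c));

    c->nvm[OPC_WRITE] = EFF_CSUPP;
    c->nvm[OPC_READ] = EFF_CSUPP;

    if (zoned_enabled) {
        /* The zoned set is the NVM set plus the zone commands. */
        memcpy(c->zoned, c->nvm, sizeof(c->zoned));
        c->zoned[OPC_ZONE_APPEND] = EFF_CSUPP;
    }
}

/* Reject any opcode the namespace's command set does not support. */
static int io_cmd(const Ctrl *c, int ns_is_zoned, uint8_t opcode)
{
    const uint32_t *tbl = ns_is_zoned ? c->zoned : c->nvm;

    if (!(tbl[opcode] & EFF_CSUPP)) {
        return ST_INVALID_OPCODE;
    }
    return ST_OK;  /* a real device would dispatch to a handler here */
}
```

In this sketch, a controller started without the zoned command set has an empty zoned table, so every command aimed at a zoned namespace is rejected; the patch itself goes further and does not attach such a namespace at all.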

Reviewed-by: Jesper Wendel Devantier 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 213 ---
 hw/nvme/ns.c |  14 ---
 hw/nvme/nvme.h   |   5 +-
 include/block/nvme.h |  10 +-
 4 files changed, 131 insertions(+), 111 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 86e1c48fab82..21496c6b6b81 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -277,15 +277,14 @@ static const uint32_t nvme_cse_acs_default[256] = {
 [NVME_ADM_CMD_SET_FEATURES] = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_GET_FEATURES] = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_CMD_EFF_CSUPP,
-[NVME_ADM_CMD_NS_ATTACHMENT]= NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_NIC,
+[NVME_ADM_CMD_NS_ATTACHMENT]= NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_NIC |
+  NVME_CMD_EFF_CCC,
 [NVME_ADM_CMD_FORMAT_NVM]   = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 [NVME_ADM_CMD_DIRECTIVE_RECV]   = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_DIRECTIVE_SEND]   = NVME_CMD_EFF_CSUPP,
 };
 
-static const uint32_t nvme_cse_iocs_none[256];
-
-static const uint32_t nvme_cse_iocs_nvm[256] = {
+static const uint32_t nvme_cse_iocs_nvm_default[256] = {
 [NVME_CMD_FLUSH]= NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 [NVME_CMD_WRITE_ZEROES] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 [NVME_CMD_WRITE]= NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
@@ -298,7 +297,7 @@ static const uint32_t nvme_cse_iocs_nvm[256] = {
 [NVME_CMD_IO_MGMT_SEND] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 };
 
-static const uint32_t nvme_cse_iocs_zoned[256] = {
+static const uint32_t nvme_cse_iocs_zoned_default[256] = {
 [NVME_CMD_FLUSH]= NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 [NVME_CMD_WRITE_ZEROES] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 [NVME_CMD_WRITE]= NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
@@ -307,6 +306,9 @@ static const uint32_t nvme_cse_iocs_zoned[256] = {
 [NVME_CMD_VERIFY]   = NVME_CMD_EFF_CSUPP,
 [NVME_CMD_COPY] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 [NVME_CMD_COMPARE]  = NVME_CMD_EFF_CSUPP,
+[NVME_CMD_IO_MGMT_RECV] = NVME_CMD_EFF_CSUPP,
+[NVME_CMD_IO_MGMT_SEND] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
+
 [NVME_CMD_ZONE_APPEND]  = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 [NVME_CMD_ZONE_MGMT_SEND]   = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 [NVME_CMD_ZONE_MGMT_RECV]   = NVME_CMD_EFF_CSUPP,
@@ -4603,6 +4605,61 @@ static uint16_t nvme_io_mgmt_send(NvmeCtrl *n, 
NvmeRequest *req)
 };
 }
 
+static uint16_t __nvme_io_cmd_nvm(NvmeCtrl *n, NvmeRequest *req)
+{
+switch (req->cmd.opcode) {
+case NVME_CMD_WRITE:
+return nvme_write(n, req);
+case NVME_CMD_READ:
+return nvme_read(n, req);
+case NVME_CMD_COMPARE:
+return nvme_compare(n, req);
+case NVME_CMD_WRITE_ZEROES:
+return nvme_write_zeroes(n, req);
+case NVME_CMD_DSM:
+return nvme_dsm(n, req);
+case NVME_CMD_VERIFY:
+return nvme_verify(n, req);
+case NVME_CMD_COPY:
+return nvme_copy(n, req);
+case NVME_CMD_IO_MGMT_RECV:
+return nvme_io_mgmt_recv(n, req);
+case NVME_CMD_IO_MGMT_SEND:
+return nvme_io_mgmt_send(n, req);
+}
+
+g_assert_not_reached();
+}
+
+static uint16_t nvme_io_cmd_nvm(NvmeCtrl *n, NvmeRequest *req)
+{
+if (!(n->cse.iocs.nvm[req->cmd.opcode] & NVME_CMD_EFF_CSUPP)) {
+trace_pci_nvme_err_invalid_opc(req->cmd.opcode);
+return NVME_INVALID_OPCODE | NVME_DNR;
+}
+
+return __nvme_io_cmd_nvm(n, req);
+}
+
+static uint16_t nvme_io_cmd_zoned(NvmeCtrl *n, NvmeRequest *req)
+{
+if (!(n->cse.iocs.zoned[req->cmd.opcode] & NVME_CMD_EFF_CSUPP)) {
+trace_pci_nvme_err_invalid_opc(req->cmd.opcode);
+return NVME_INVALID_OPCODE | NVME_DNR;
+}
+
+switch (req->cmd.opcode) {
+case NVME_CMD_ZONE_APPEND:
+return nvme_zone_append(n, req);
+case NVME_CMD_ZONE_MGMT_SEND:
+return nvme_zone_mgmt_send(n, req);
+case NVME_CMD_ZONE_MGMT_RECV:
+return nvme_zone_mgmt_recv(n, req);
+}
+
+return __nvme_io_cmd_nvm(n, req);
+}
+
 static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
 {
 NvmeNamespace *ns;
@@ -4644,11 +4701,6 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
-if (!(ns->iocs[req->cmd.opcode] & NVME_CMD_EFF_CSUPP)) {
-trace_pci_nvme_err_invalid_opc(req->cmd.op

[PULL 09/10] hw/nvme: set error status code explicitly for misc commands

2025-02-27 Thread Klaus Jensen

From: Klaus Jensen 

The nvme_aio_err() does not handle Verify, Compare, Copy and other misc
commands and defaults to setting the error status code to Internal
Device Error. For some of these commands, we know better, so set it
explicitly.

For the commands using the nvme_misc_cb() callback (Copy, Flush, ...),
if no status code has explicitly been set by the lower handlers, default
to Internal Device Error as previously.

Reviewed-by: Jesper Wendel Devantier 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 28 ++--
 include/block/nvme.h |  1 +
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 07cd63298526..b7222fd9ac02 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1771,7 +1771,6 @@ static void nvme_aio_err(NvmeRequest *req, int ret)
 case NVME_CMD_READ:
 status = NVME_UNRECOVERED_READ;
 break;
-case NVME_CMD_FLUSH:
 case NVME_CMD_WRITE:
 case NVME_CMD_WRITE_ZEROES:
 case NVME_CMD_ZONE_APPEND:
@@ -2157,11 +2156,16 @@ static inline bool nvme_is_write(NvmeRequest *req)
 static void nvme_misc_cb(void *opaque, int ret)
 {
 NvmeRequest *req = opaque;
+uint16_t cid = nvme_cid(req);
 
-trace_pci_nvme_misc_cb(nvme_cid(req));
+trace_pci_nvme_misc_cb(cid);
 
 if (ret) {
-nvme_aio_err(req, ret);
+if (!req->status) {
+req->status = NVME_INTERNAL_DEV_ERROR;
+}
+
+trace_pci_nvme_err_aio(cid, strerror(-ret), req->status);
 }
 
 nvme_enqueue_req_completion(nvme_cq(req), req);
@@ -2264,7 +2268,10 @@ static void nvme_verify_cb(void *opaque, int ret)
 
 if (ret) {
 block_acct_failed(stats, acct);
-nvme_aio_err(req, ret);
+req->status = NVME_UNRECOVERED_READ;
+
+trace_pci_nvme_err_aio(nvme_cid(req), strerror(-ret), req->status);
+
 goto out;
 }
 
@@ -2363,7 +2370,10 @@ static void nvme_compare_mdata_cb(void *opaque, int ret)
 
 if (ret) {
 block_acct_failed(stats, acct);
-nvme_aio_err(req, ret);
+req->status = NVME_UNRECOVERED_READ;
+
+trace_pci_nvme_err_aio(nvme_cid(req), strerror(-ret), req->status);
+
 goto out;
 }
 
@@ -2445,7 +2455,10 @@ static void nvme_compare_data_cb(void *opaque, int ret)
 
 if (ret) {
 block_acct_failed(stats, acct);
-nvme_aio_err(req, ret);
+req->status = NVME_UNRECOVERED_READ;
+
+trace_pci_nvme_err_aio(nvme_cid(req), strerror(-ret), req->status);
+
 goto out;
 }
 
@@ -2924,6 +2937,7 @@ static void nvme_copy_out_completed_cb(void *opaque, int ret)
 
 if (ret < 0) {
 iocb->ret = ret;
+req->status = NVME_WRITE_FAULT;
 goto out;
 } else if (iocb->ret < 0) {
 goto out;
@@ -2988,6 +3002,7 @@ static void nvme_copy_in_completed_cb(void *opaque, int ret)
 
 if (ret < 0) {
 iocb->ret = ret;
+req->status = NVME_UNRECOVERED_READ;
 goto out;
 } else if (iocb->ret < 0) {
 goto out;
@@ -3510,6 +3525,7 @@ static void nvme_flush_ns_cb(void *opaque, int ret)
 
 if (ret < 0) {
 iocb->ret = ret;
+iocb->req->status = NVME_WRITE_FAULT;
 goto out;
 } else if (iocb->ret < 0) {
 goto out;
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 366739f79edf..358e516e38b0 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -906,6 +906,7 @@ enum NvmeStatusCodes {
 NVME_SGL_DESCR_TYPE_INVALID = 0x0011,
 NVME_INVALID_USE_OF_CMB = 0x0012,
 NVME_INVALID_PRP_OFFSET = 0x0013,
+NVME_COMMAND_INTERRUPTED= 0x0021,
 NVME_FDP_DISABLED   = 0x0029,
 NVME_INVALID_PHID_LIST  = 0x002a,
 NVME_LBA_RANGE  = 0x0080,
-- 
2.47.2
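
The defaulting rule described in the commit message above (a lower handler records a specific status, and the final callback falls back to Internal Device Error only if nothing was set) can be sketched roughly as follows. This is a standalone illustration, not the QEMU code: the `Sketch*` and `sketch_*` names are hypothetical stand-ins, and the enum values are placeholders chosen for the example.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the real status codes; values are illustrative. */
enum {
    SKETCH_SUCCESS            = 0x0000,
    SKETCH_INTERNAL_DEV_ERROR = 0x0006,
    SKETCH_WRITE_FAULT        = 0x0280,
    SKETCH_UNRECOVERED_READ   = 0x0281,
};

/* Hypothetical, reduced request: only the status field matters here. */
typedef struct {
    uint16_t status;
} SketchRequest;

/* Lower-level handler: knows the failure came from a flush, so it
 * records a specific status before the final callback runs. */
static void sketch_flush_ns_done(SketchRequest *req, int ret)
{
    if (ret < 0) {
        req->status = SKETCH_WRITE_FAULT;
    }
}

/* Final callback: on error, keep whatever a lower handler set and
 * fall back to the generic code only when the status is still unset. */
static uint16_t sketch_misc_done(SketchRequest *req, int ret)
{
    if (ret && !req->status) {
        req->status = SKETCH_INTERNAL_DEV_ERROR;
    }
    return req->status;
}
```

Calling `sketch_flush_ns_done()` and then `sketch_misc_done()` with a negative `ret` leaves the specific status in place, while calling `sketch_misc_done()` alone with a negative `ret` yields the generic fallback.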

Re: [PATCH v4 08/14] acpi/generic_event_device: add logic to detect if HEST addr is available

2025-02-27 Thread Igor Mammedov

On Thu, 27 Feb 2025 08:26:38 +0100
Mauro Carvalho Chehab  wrote:

> Em Thu, 27 Feb 2025 08:19:27 +0100
> Mauro Carvalho Chehab  escreveu:
> 
> > Em Wed, 26 Feb 2025 16:52:26 +0100
> > Igor Mammedov  escreveu:
> >   
> > > On Fri, 21 Feb 2025 15:35:17 +0100
> > > Mauro Carvalho Chehab  wrote:
> > > 
> >   
> > > > diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> > > > index 5346cae573b7..14d8513a5440 100644
> > > > --- a/hw/acpi/generic_event_device.c
> > > > +++ b/hw/acpi/generic_event_device.c
> > > > @@ -318,6 +318,7 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
> > > >  
> > > >  static const Property acpi_ged_properties[] = {
> > > >  DEFINE_PROP_UINT32("ged-event", AcpiGedState, ged_event_bitmap, 0),
> > > > +DEFINE_PROP_BOOL("x-has-hest-addr", AcpiGedState, ghes_state.use_hest_addr, false),
> > > 
> > > > You set it to false for 9.2 below, so
> > > shouldn't it be set to true by default here?
> > 
> > Yes, but it is too early to do that here, as the DSDT table was not
> > updated to contain the GED device.
> > 
> > We're switching it to true later on, at patch 11::
> > 
> > d8c44ee13fbe ("arm/virt: Wire up a GED error device for ACPI / GHES")  

After sleeping on it,
what you did here is totally correct.

You are right, we can't really flip the switch to true here,
since without 11/14 APEI will stop working properly.

Perhaps add a note to the commit message explaining why it's false
in this patch and where it will be set to true.
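
The interaction being discussed (a property default that is later flipped to true, while an older versioned machine type keeps it pinned to false) can be modelled with a small sketch. This is a hypothetical model only and does not use QEMU's property or compat machinery; the `Sketch*` types, `sketch_init()`, and the machine-name strings are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device state: only the flag under discussion. */
typedef struct {
    bool use_hest_addr;
} SketchGedState;

/* Hypothetical compat entry: a value pinned for one machine version. */
typedef struct {
    const char *machine;
    bool use_hest_addr;
} SketchCompat;

/* The 9.2 machine keeps the old behaviour regardless of the default. */
static const SketchCompat sketch_compat[] = {
    { "virt-9.2", false },
};

/* Apply the property default first, then let a matching compat
 * entry override it, mirroring "default, then per-version pin". */
static void sketch_init(SketchGedState *s, const char *machine,
                        bool default_value)
{
    s->use_hest_addr = default_value;
    for (size_t i = 0; i < sizeof(sketch_compat) / sizeof(sketch_compat[0]); i++) {
        if (strcmp(machine, sketch_compat[i].machine) == 0) {
            s->use_hest_addr = sketch_compat[i].use_hest_addr;
        }
    }
}
```

With the default still false (as in this patch), every machine sees false; once the default is flipped to true in the later patch, only the pinned 9.2 entry stays false.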

> 
> Hmm... after so many rebases, things are getting hazy in my head ;-)
> 
> Originally, this was setting it to true, but you requested to move it
> to another patch during one of the patch reorder requests.
> 
> Anyway, after all those rebases, I guess it is now safe to set it
> to true here without breaking bisectability. I'll move the hunk back
> to this patch.
> 
> Thanks,
> Mauro
>
