Re: [PATCH v6 00/10] Support virtio-gpu DRM native context

2025-02-03 Thread Alex Bennée
"Kim, Dongwon"  writes:

> Hi,
>
> The commit below could change the timing of drawing by making the drawing
> done at refresh cycle instead of via drawing event. So it looks like either 
> dmabuf
> or client's framebuffer is being written and read at the same time. Hey, can 
> you
> describe how the corruption looks like? Is it just garbage image with random 
> noise
> or the actual frame with some defects like tearing...?

The terminal gets mirrored upside down and the mouse creates damage as
it moves about.

>
>> Subject: Re: [PATCH v6 00/10] Support virtio-gpu DRM native context
>> 
>> Dmitry Osipenko  writes:
>> 
>> > On 1/27/25 19:17, Alex Bennée wrote:
>> > ...
>> >> I'm still seeing corruption with -display gtk,gl=on on my x86 system
>> >> BTW. I would like to understand if that is a problem with QEMU, GTK
>> >> or something else in the stack before we merge.
>> >
>> > I reproduced the display mirroring/corruption issue and bisected it to
>> > the following commit. The problem only happens when QEMU/GTK uses
>> > Wayland display directly, while previously I was running QEMU with
>> > XWayland that doesn't have the problem. Why this change breaks dmabuf
>> > displaying with Wayland/GTK is unclear.
>> 
>> Ahh that makes sense - I obviously forgot to mention I'm running sway/wayland
>> across both machines.
>> 
>> > Reverting commit fixes the bug.
>> >
>> > +Dongwon Kim +Vivek Kasireddy
>> >
>> > commit 77bf310084dad38b3a2badf01766c659056f1cf2
>> > Author: Dongwon Kim 
>> > Date:   Fri Apr 26 15:50:59 2024 -0700
>> >
>> > ui/gtk: Draw guest frame at refresh cycle
>> >
>> > Draw routine needs to be manually invoked in the next refresh
>> > if there is a scanout blob from the guest. This is to prevent
>> > a situation where there is a scheduled draw event but it won't
>> > happen bacause the window is currently in inactive state
>> > (minimized or tabified). If draw is not done for a long time,
>> > gl_block timeout and/or fence timeout (on the guest) will happen
>> > eventually.
>> >
>> > v2: Use gd_gl_area_draw(vc) in gtk-gl-area.c
>> >
>> > Suggested-by: Vivek Kasireddy 
>> > Cc: Gerd Hoffmann 
>> > Cc: Marc-André Lureau 
>> > Cc: Daniel P. Berrangé 
>> > Signed-off-by: Dongwon Kim 
>> > Acked-by: Marc-André Lureau 
>> > Message-Id: <20240426225059.3871283-1-dongwon@intel.com>
>> 
>> 
>> Maybe a race on:
>> 
>> QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf; ?
>> 
>> --
>> Alex Bennée
>> Virtualisation Tech Lead @ Linaro

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH] block: remove unused BLOCK_OP_TYPE_DATAPLANE

2025-02-03 Thread Eric Blake
On Mon, Feb 03, 2025 at 01:25:29PM -0500, Stefan Hajnoczi wrote:
> BLOCK_OP_TYPE_DATAPLANE prevents BlockDriverState from being used by
> virtio-blk/virtio-scsi with IOThread. Commit b112a65c52aa ("block:
> declare blockjobs and dataplane friends!") eliminated the main reason
> for this blocker in 2014.

Wow, that's a long time.

> 
> Nowadays the block layer supports I/O from multiple AioContexts, so
> there is even less reason to block IOThread users. Any legitimate
> reasons related to interference would probably also apply to
> non-IOThread users.
> 
> The only remaining users are bdrv_op_unblock(BLOCK_OP_TYPE_DATAPLANE)
> calls after bdrv_op_block_all(). If we remove BLOCK_OP_TYPE_DATAPLANE
> their behavior doesn't change.
> 
> Existing bdrv_op_block_all() callers that don't explicitly unblock
> BLOCK_OP_TYPE_DATAPLANE seem to do so simply because no one bothered to
> rather than because it is necessary to keep BLOCK_OP_TYPE_DATAPLANE
> blocked.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  include/block/block-common.h | 1 -
>  block/replication.c  | 1 -
>  blockjob.c   | 2 --
>  hw/block/virtio-blk.c| 9 -
>  hw/scsi/virtio-scsi.c| 3 ---
>  5 files changed, 16 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH v3 09/26] target/arm/kvm-rme: Initialize Realm memory

2025-02-03 Thread Gavin Shan

On 11/26/24 5:56 AM, Jean-Philippe Brucker wrote:

Initialize the IPA state of RAM. Collect the images copied into guest
RAM into a sorted list, and issue POPULATE_REALM KVM ioctls once we've
created the Realm Descriptor. The images are part of the Realm Initial
Measurement.

Signed-off-by: Jean-Philippe Brucker 
---
v2->v3: RIPAS is now initialized separately
---
  target/arm/kvm_arm.h |  14 +
  target/arm/kvm-rme.c | 128 +++
  2 files changed, 142 insertions(+)

diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index 8b52a881b0..67db09a424 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -255,6 +255,16 @@ int kvm_arm_rme_vm_type(MachineState *ms);
   */
  int kvm_arm_rme_vcpu_init(CPUState *cs);
  
+/*

+ * kvm_arm_rme_init_guest_ram
+ * @base: base address of RAM
+ * @size: size of RAM
+ *
+ * If the user requested a Realm, set the base and size of guest RAM, in order
+ * to initialize the Realm IPA space.
+ */
+void kvm_arm_rme_init_guest_ram(hwaddr base, size_t size);
+
  #else
  
  /*

@@ -281,6 +291,10 @@ static inline bool kvm_arm_mte_supported(void)
  return false;
  }
  
+static inline void kvm_arm_rme_init_guest_ram(hwaddr base, size_t size)

+{
+}
+
  /*
   * These functions should never actually be called without KVM support.
   */
diff --git a/target/arm/kvm-rme.c b/target/arm/kvm-rme.c
index e3cc37538a..83a29421df 100644
--- a/target/arm/kvm-rme.c
+++ b/target/arm/kvm-rme.c
@@ -9,6 +9,7 @@
  #include "exec/confidential-guest-support.h"
  #include "hw/boards.h"
  #include "hw/core/cpu.h"
+#include "hw/loader.h"
  #include "kvm_arm.h"
  #include "migration/blocker.h"
  #include "qapi/error.h"
@@ -20,16 +21,85 @@
  #define TYPE_RME_GUEST "rme-guest"
  OBJECT_DECLARE_SIMPLE_TYPE(RmeGuest, RME_GUEST)
  
+#define RME_PAGE_SIZE qemu_real_host_page_size()

+
  struct RmeGuest {
  ConfidentialGuestSupport parent_obj;
+Notifier rom_load_notifier;
+GSList *ram_regions;
+
+hwaddr ram_base;
+size_t ram_size;
  };
  


s/size_t/hwaddr. To be consistent with RmeRamRegion, we may reuse
it like below.

struct RmeGuest {
:
GSlist *populate_ram_regions;
RmeRamRegion init_ram_region;
};


  OBJECT_DEFINE_SIMPLE_TYPE_WITH_INTERFACES(RmeGuest, rme_guest, RME_GUEST,
CONFIDENTIAL_GUEST_SUPPORT,
{ TYPE_USER_CREATABLE }, { })
  
+typedef struct {

+hwaddr base;
+hwaddr size;
+} RmeRamRegion;
+
  static RmeGuest *rme_guest;
  
+static int rme_init_ram(hwaddr base, size_t size, Error **errp)

+{
+int ret;
+uint64_t start = QEMU_ALIGN_DOWN(base, RME_PAGE_SIZE);
+uint64_t end = QEMU_ALIGN_UP(base + size, RME_PAGE_SIZE);
+struct kvm_cap_arm_rme_init_ipa_args init_args = {
+.init_ipa_base = start,
+.init_ipa_size = end - start,
+};
+
+ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_ARM_RME, 0,
+KVM_CAP_ARM_RME_INIT_IPA_REALM,
+(intptr_t)&init_args);
+if (ret) {
+error_setg_errno(errp, -ret,
+ "failed to init RAM [0x%"HWADDR_PRIx", 
0x%"HWADDR_PRIx")",

 ^^
^^^
The type for 'start' and 'end' would be 'hwaddr'.


+ start, end);
+}
+
+return ret;
+}
+
+static int rme_populate_range(hwaddr base, size_t size, bool measure,
+  Error **errp)
+{
+int ret;
+uint64_t start = QEMU_ALIGN_DOWN(base, RME_PAGE_SIZE);
+uint64_t end = QEMU_ALIGN_UP(base + size, RME_PAGE_SIZE);
+struct kvm_cap_arm_rme_populate_realm_args populate_args = {
+.populate_ipa_base = start,
+.populate_ipa_size = end - start,
+.flags = measure ? KVM_ARM_RME_POPULATE_FLAGS_MEASURE : 0,
+};
+
+ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_ARM_RME, 0,
+KVM_CAP_ARM_RME_POPULATE_REALM,
+(intptr_t)&populate_args);
+if (ret) {
+error_setg_errno(errp, -ret,
+   "failed to populate realm [0x%"HWADDR_PRIx", 
0x%"HWADDR_PRIx")",
+   start, end);
+}
+return ret;
+}
+
+static void rme_populate_ram_region(gpointer data, gpointer err)
+{
+Error **errp = err;
+const RmeRamRegion *region = data;
+
+if (*errp) {
+return;
+}
+
+rme_populate_range(region->base, region->size, /* measure */ true, errp);
+}
+
  static int rme_init_cpus(Error **errp)
  {
  int ret;
@@ -60,6 +130,16 @@ static int rme_create_realm(Error **errp)
  return -1;
  }
  
+if (rme_init_ram(rme_guest->ram_base, rme_guest->ram_size, errp)) {

+return -1;
+}
+
+g_slist_foreach(rme_guest->ram_regions, rme_populate_ram_region, errp);
+g_slist_free_full(g_steal_pointer(&rme_guest->ram_regions), g_free);
+if (*errp) {
+return -1;
+}
+
 

Re: [PATCH v3 08/26] hw/core/loader: Add ROM loader notifier

2025-02-03 Thread Gavin Shan

On 11/26/24 5:56 AM, Jean-Philippe Brucker wrote:

Add a function to register a notifier, that is invoked after a ROM gets
loaded into guest memory.

It will be used by Arm confidential guest support, in order to register
all blobs loaded into memory with KVM, so that their content is moved
into Realm state and measured into the initial VM state.

Signed-off-by: Jean-Philippe Brucker 
---
  include/hw/loader.h | 15 +++
  hw/core/loader.c| 15 +++
  2 files changed, 30 insertions(+)

diff --git a/include/hw/loader.h b/include/hw/loader.h
index 7f6d06b956..0cd9905f97 100644
--- a/include/hw/loader.h
+++ b/include/hw/loader.h
@@ -353,6 +353,21 @@ void *rom_ptr_for_as(AddressSpace *as, hwaddr addr, size_t 
size);
  ssize_t rom_add_vga(const char *file);
  ssize_t rom_add_option(const char *file, int32_t bootindex);
  
+typedef struct RomLoaderNotify {

+/* Parameters passed to rom_add_blob() */
+hwaddr addr;
+size_t len;
+size_t max_len;
+} RomLoaderNotify;
+


I would suggest to rename it to RomLoaderNotifyData since it's the
data passed to the notifier.


+/**
+ * rom_add_load_notifier - Add a notifier for loaded images
+ *
+ * Add a notifier that will be invoked with a RomLoaderNotify structure for 
each
+ * blob loaded into guest memory, after the blob is loaded.
+ */
+void rom_add_load_notifier(Notifier *notifier);
+
  /* This is the usual maximum in uboot, so if a uImage overflows this, it would
   * overflow on real hardware too. */
  #define UBOOT_MAX_GUNZIP_BYTES (64 << 20)
diff --git a/hw/core/loader.c b/hw/core/loader.c
index 31593a1171..759a62cf58 100644
--- a/hw/core/loader.c
+++ b/hw/core/loader.c
@@ -67,6 +67,8 @@
  #include 
  
  static int roms_loaded;

+static NotifierList rom_loader_notifier =
+NOTIFIER_LIST_INITIALIZER(rom_loader_notifier);
  
  /* return the size or -1 if error */

  int64_t get_image_size(const char *filename)
@@ -1179,6 +1181,11 @@ MemoryRegion *rom_add_blob(const char *name, const void 
*blob, size_t len,
  return mr;
  }
  
+void rom_add_load_notifier(Notifier *notifier)

+{
+notifier_list_add(&rom_loader_notifier, notifier);
+}
+
  /* This function is specific for elf program because we don't need to allocate
   * all the rom. We just allocate the first part and the rest is just zeros. 
This
   * is why romsize and datasize are different. Also, this function takes its 
own
@@ -1220,6 +1227,7 @@ ssize_t rom_add_option(const char *file, int32_t 
bootindex)
  static void rom_reset(void *unused)
  {
  Rom *rom;
+RomLoaderNotify notify;
  
  QTAILQ_FOREACH(rom, &roms, next) {

  if (rom->fw_file) {
@@ -1268,6 +1276,13 @@ static void rom_reset(void *unused)
  cpu_flush_icache_range(rom->addr, rom->datasize);
  
  trace_loader_write_rom(rom->name, rom->addr, rom->datasize, rom->isrom);

+
+notify = (RomLoaderNotify) {
+.addr = rom->addr,
+.len = rom->datasize,
+.max_len = rom->romsize,
+};
+notifier_list_notify(&rom_loader_notifier, ¬ify);
  }
  }
  


Thanks,
Gavin




Re: [PATCH 0/1] meson: Deprecate 32-bit host systems

2025-02-03 Thread Philippe Mathieu-Daudé

On 3/2/25 10:10, Alex Bennée wrote:

Peter Maydell  writes:


On Wed, 29 Jan 2025 at 06:23, Thomas Huth  wrote:

So unless someone complains immediately with a good reason, I'm also in
favor of marking it as deprecated now. If then someone complains during the
deprecation period, we still can reconsider and remove the deprecation note
again.


Well, I mean the reason would be that I suspect we do still have
users who are using QEMU for some purposes on 32-bit arm hosts.
That doesn't mean they're trying to run massively complex or
high memory guests or that they care that our whole test suite
doesn't run.

I'm not really strongly opposed to dropping 32-bit host support,
but I don't think a thread on qemu-devel is exactly likely to
get the attention of the people who might be using this
functionality. (You could argue that functionality without
representation among the developer community is fair game
for being dumped even if it has users, of course.)


FWIW random internet poll:

   https://mastodon.org.uk/deck/@stsquad/113905257703721811

26% 32 bit
74% 64 bit

with 41 respondents.


Note that some respondents who voted to maintain 32-bit support
mixed 32-bit host with 32-bit guests.



Re: [PATCH 0/7] hw/arm/raspi4b: Add models with 4GB and 8GB of DRAM

2025-02-03 Thread Philippe Mathieu-Daudé

On 3/2/25 15:50, Daniel P. Berrangé wrote:

On Mon, Feb 03, 2025 at 02:45:06PM +, Peter Maydell wrote:

On Mon, 3 Feb 2025 at 14:33, Daniel P. Berrangé  wrote:


On Mon, Feb 03, 2025 at 02:29:49PM +, Alex Bennée wrote:

Peter Maydell  writes:


On Sat, 1 Feb 2025 at 12:57, BALATON Zoltan  wrote:


On Sat, 1 Feb 2025, Philippe Mathieu-Daudé wrote:

- Deprecate the 'raspi4b' machine name, renaming it as
  'raspi4b-1g' on 32-bit hosts, 'raspi4b-2g' otherwise.
- Add the 'raspi4b-4g' and 'raspi4b-8g' machines, with
  respectively 4GB and 8GB of DRAM.


IMHO (meaning you can ignore it, just my opinion) if the only difference
is the memory size -machine raspi4b -memory 4g would be better user
experience than having a lot of different machines.


Yes, I think I agree. We have a way for users to specify
how much memory they want, and I think it makes more sense
to use that than to have lots of different machine types.


I guess for the Pi we should validate the -memory supplied is on of the
supported grid of devices rather than an arbitrary value?


If the user wants to create a rpi4 with 6 GB RAM why should we stop
them ? It is their choice if they want to precisely replicate RAM
size from a physical model, or use something different when virtualized.


The board revision code (reported to the guest via the emulated
firmware interface) only supports reporting 256MB, 512MB,
1GB, 2GB, 4GB or 8GB:

https://www.raspberrypi.com/documentation/computers/raspberry-pi.html#new-style-revision-codes


I think it would be valid to report the revision code for the memory
size that doesn't exceed what QEMU has configured. eg if configured
with 6 GB, then report code for 4 GB.


We need to distinct between physical machines VS virtual ones.

Guests on virtual machines have some way to figure the virtual
hardware (ACPI tables, DeviceTree blob, fw-cfg, ...).

Guests for physical machines usually expect fixed hardware (not
considering devices on busses).

For the particular case of the Raspberry Pi machines, their
bootloader gets the board layout by reading the
RPI_FWREQ_GET_BOARD_REVISION constant value.


What would be the point of emulating a raspi machine with 6GB
if the FW is not going to consider besides 4GB?
Besides, someone modify a guest to work with 6GB, it won't work
on real HW.


For Arm embedded boards we mostly tend to "restrict the user
to what you can actually do", except for older boards where
we tended not to write any kind of sanity checking on CPU
type, memory size, etc.


If we're going to strictly limit memory size that's accepted I wonder
how we could information users/mgmt apps about what's permitted ?

Expressing valid combinations of configs across different args gets
pretty complicated quickly :-(


I'll try to address Zoltan and Peter request to have a dynamic raspi
machine. It is a bit unfortunate we didn't insisted on that when we
decided to expose a fixed set of existing boards in order to not be
bothered by inconsistent bug reports, back in 2019.

Regards,

Phil.



[PATCH qemu 4/5] hw/mem/cxl_type3: Ensure errp is set on realization failure

2025-02-03 Thread Jonathan Cameron via
From: Li Zhijian 

Simply pass the errp to its callee which will set errp if needed, to
enhance error reporting for CXL Type 3 device initialization by setting
the errp when realization functions fail.

Previously, failing to set `errp` could result in errors being overlooked,
causing the system to mistakenly treat failure scenarios as successful and
potentially leading to redundant cleanup operations in ct3_exit().

Signed-off-by: Li Zhijian 
Signed-off-by: Jonathan Cameron 
---
 hw/mem/cxl_type3.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index ff6861889b..d8b45f9bd1 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -891,7 +891,7 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
  &ct3d->cxl_dstate.device_registers);
 
 /* MSI(-X) Initialization */
-rc = msix_init_exclusive_bar(pci_dev, CXL_T3_MSIX_VECTOR_NR, 4, NULL);
+rc = msix_init_exclusive_bar(pci_dev, CXL_T3_MSIX_VECTOR_NR, 4, errp);
 if (rc) {
 goto err_free_special_ops;
 }
@@ -912,7 +912,7 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
 
 pcie_cap_deverr_init(pci_dev);
 /* Leave a bit of room for expansion */
-rc = pcie_aer_init(pci_dev, PCI_ERR_VER, 0x200, PCI_ERR_SIZEOF, NULL);
+rc = pcie_aer_init(pci_dev, PCI_ERR_VER, 0x200, PCI_ERR_SIZEOF, errp);
 if (rc) {
 goto err_release_cdat;
 }
-- 
2.43.0




[PATCH qemu 5/5] mem/cxl_type3: support 3, 6, 12 and 16 interleave ways

2025-02-03 Thread Jonathan Cameron via
From: Yao Xingtao 

Since the kernel does not check the interleave capability, a
3-way, 6-way, 12-way or 16-way region can be create normally.

Applications can access the memory of 16-way region normally because
qemu can convert hpa to dpa correctly for the power of 2 interleave
ways, after kernel implementing the check, this kind of region will
not be created any more.

For non power of 2 interleave ways, applications could not access the
memory normally and may occur some unexpected behaviors, such as
segmentation fault.

So implements this feature is needed.

Link: 
https://lore.kernel.org/linux-cxl/3e84b919-7631-d1db-3e1d-33000f3f3...@fujitsu.com/
Signed-off-by: Yao Xingtao 
Signed-off-by: Jonathan Cameron 
---
 hw/cxl/cxl-component-utils.c |  9 +++--
 hw/mem/cxl_type3.c   | 15 +++
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
index cd116c0401..473895948b 100644
--- a/hw/cxl/cxl-component-utils.c
+++ b/hw/cxl/cxl-component-utils.c
@@ -243,8 +243,13 @@ static void hdm_init_common(uint32_t *reg_state, uint32_t 
*write_msk,
 ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, INTERLEAVE_4K, 1);
 ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY,
  POISON_ON_ERR_CAP, 0);
-ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, 3_6_12_WAY, 0);
-ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, 16_WAY, 0);
+if (type == CXL2_TYPE3_DEVICE) {
+ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, 3_6_12_WAY, 1);
+ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, 16_WAY, 1);
+} else {
+ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, 3_6_12_WAY, 0);
+ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, 16_WAY, 0);
+}
 ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, UIO, 0);
 ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY,
  UIO_DECODER_COUNT, 0);
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index d8b45f9bd1..6fffa21ead 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -1100,10 +1100,17 @@ static bool cxl_type3_dpa(CXLType3Dev *ct3d, hwaddr 
host_addr, uint64_t *dpa)
 continue;
 }
 
-*dpa = dpa_base +
-((MAKE_64BIT_MASK(0, 8 + ig) & hpa_offset) |
- ((MAKE_64BIT_MASK(8 + ig + iw, 64 - 8 - ig - iw) & hpa_offset)
-  >> iw));
+if (iw < 8) {
+*dpa = dpa_base +
+((MAKE_64BIT_MASK(0, 8 + ig) & hpa_offset) |
+ ((MAKE_64BIT_MASK(8 + ig + iw, 64 - 8 - ig - iw) & hpa_offset)
+  >> iw));
+} else {
+*dpa = dpa_base +
+((MAKE_64BIT_MASK(0, 8 + ig) & hpa_offset) |
+ MAKE_64BIT_MASK(ig + iw, 64 - ig - iw) & hpa_offset)
+   >> (ig + iw)) / 3) << (ig + 8)));
+}
 
 return true;
 }
-- 
2.43.0




Re: [PATCH V1 02/26] migration: lower handler priority

2025-02-03 Thread Fabiano Rosas
Steve Sistare  writes:

> Define a vmstate priority that is lower than the default, so its handlers
> run after all default priority handlers.  Since 0 is no longer the default
> priority, translate an uninitialized priority of 0 to MIG_PRI_DEFAULT.
>
> CPR for vfio will use this to install handlers for containers that run
> after handlers for the devices that they contain.
>
> Signed-off-by: Steve Sistare 

Reviewed-by: Fabiano Rosas 



[PATCH qemu 3/5] hw/mem/cxl_type3: Fix special_ops memory leak on msix_init_exclusive_bar() failure

2025-02-03 Thread Jonathan Cameron via
From: Li Zhijian 

Address a memory leak issue by ensuring `regs->special_ops` is freed when
`msix_init_exclusive_bar()` encounters an error during CXL Type3 device
initialization.

Additionally, this patch renames err_address_space_free to err_msix_uninit
for better clarity and logical flow

Signed-off-by: Li Zhijian 
Signed-off-by: Jonathan Cameron 
---
 hw/mem/cxl_type3.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 4775aab0d6..ff6861889b 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -893,7 +893,7 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
 /* MSI(-X) Initialization */
 rc = msix_init_exclusive_bar(pci_dev, CXL_T3_MSIX_VECTOR_NR, 4, NULL);
 if (rc) {
-goto err_address_space_free;
+goto err_free_special_ops;
 }
 for (i = 0; i < CXL_T3_MSIX_VECTOR_NR; i++) {
 msix_vector_use(pci_dev, i);
@@ -907,7 +907,7 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
 cxl_cstate->cdat.free_cdat_table = ct3_free_cdat_table;
 cxl_cstate->cdat.private = ct3d;
 if (!cxl_doe_cdat_init(cxl_cstate, errp)) {
-goto err_free_special_ops;
+goto err_msix_uninit;
 }
 
 pcie_cap_deverr_init(pci_dev);
@@ -943,10 +943,10 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
 
 err_release_cdat:
 cxl_doe_cdat_release(cxl_cstate);
-err_free_special_ops:
+err_msix_uninit:
 msix_uninit_exclusive_bar(pci_dev);
+err_free_special_ops:
 g_free(regs->special_ops);
-err_address_space_free:
 if (ct3d->dc.host_dc) {
 cxl_destroy_dc_regions(ct3d);
 address_space_destroy(&ct3d->dc.host_dc_as);
-- 
2.43.0




[PATCH qemu 2/5] hw/mem/cxl_type3: Add paired msix_uninit_exclusive_bar() call

2025-02-03 Thread Jonathan Cameron via
From: Li Zhijian 

msix_uninit_exclusive_bar() should be paired with msix_init_exclusive_bar()

Ensure proper resource cleanup by adding the missing
`msix_uninit_exclusive_bar()` call for the Type3 CXL device.

Signed-off-by: Li Zhijian 
Signed-off-by: Jonathan Cameron 
---
 hw/mem/cxl_type3.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index ebc0ec536e..4775aab0d6 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -944,6 +944,7 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
 err_release_cdat:
 cxl_doe_cdat_release(cxl_cstate);
 err_free_special_ops:
+msix_uninit_exclusive_bar(pci_dev);
 g_free(regs->special_ops);
 err_address_space_free:
 if (ct3d->dc.host_dc) {
@@ -967,6 +968,7 @@ static void ct3_exit(PCIDevice *pci_dev)
 
 pcie_aer_exit(pci_dev);
 cxl_doe_cdat_release(cxl_cstate);
+msix_uninit_exclusive_bar(pci_dev);
 g_free(regs->special_ops);
 if (ct3d->dc.host_dc) {
 cxl_destroy_dc_regions(ct3d);
-- 
2.43.0




[PATCH qemu 0/5] hw/cxl: Cleanups and interleave support.

2025-02-03 Thread Jonathan Cameron via
First set of CXL updates for the 10.0 cycle.
- Mixture of cleanup and hardening against a repeat of recent MSI-X
  numbering bug.
- Expanded interleave support (been on my tree a long time)

Whilst I think these are in a good state, review always welcome.

Li Zhijian (4):
  hw/cxl: Introduce CXL_T3_MSIX_VECTOR enumeration
  hw/mem/cxl_type3: Add paired msix_uninit_exclusive_bar() call
  hw/mem/cxl_type3: Fix special_ops memory leak on
msix_init_exclusive_bar() failure
  hw/mem/cxl_type3: Ensure errp is set on realization failure

Yao Xingtao (1):
  mem/cxl_type3: support 3, 6, 12 and 16 interleave ways

 include/hw/cxl/cxl_device.h  |  4 ++--
 hw/cxl/cxl-component-utils.c |  9 ++--
 hw/cxl/cxl-device-utils.c| 12 --
 hw/cxl/switch-mailbox-cci.c  |  4 +++-
 hw/mem/cxl_type3.c   | 45 +---
 5 files changed, 48 insertions(+), 26 deletions(-)

-- 
2.43.0




[PATCH qemu 1/5] hw/cxl: Introduce CXL_T3_MSIX_VECTOR enumeration

2025-02-03 Thread Jonathan Cameron via
From: Li Zhijian 

Introduce the `CXL_T3_MSIX_VECTOR` enumeration to specify MSIX vector
assignments specific to the Type 3 (T3) CXL device.

The primary goal of this change is to encapsulate the MSIX vector uses
that are unique to the T3 device within an enumeration, improving code
readability and maintenance by avoiding magic numbers. This organizational
change allows for more explicit references to each vector’s role, thereby
reducing the potential for misconfiguration.

It also modified `mailbox_reg_init_common` to accept the `msi_n` parameter,
reflecting the new MSIX vector setup.

This pertains to the T3 device privately; other endpoints should refrain from
using it, despite its public accessibility to all of them.

Signed-off-by: Li Zhijian 
Signed-off-by: Jonathan Cameron 
---
 include/hw/cxl/cxl_device.h |  4 ++--
 hw/cxl/cxl-device-utils.c   | 12 +---
 hw/cxl/switch-mailbox-cci.c |  4 +++-
 hw/mem/cxl_type3.c  | 20 ++--
 4 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 561b375dc8..3a0ee7e8e7 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -264,8 +264,8 @@ void cxl_device_register_block_init(Object *obj, 
CXLDeviceState *dev,
 typedef struct CXLType3Dev CXLType3Dev;
 typedef struct CSWMBCCIDev CSWMBCCIDev;
 /* Set up default values for the register block */
-void cxl_device_register_init_t3(CXLType3Dev *ct3d);
-void cxl_device_register_init_swcci(CSWMBCCIDev *sw);
+void cxl_device_register_init_t3(CXLType3Dev *ct3d, int msi_n);
+void cxl_device_register_init_swcci(CSWMBCCIDev *sw, int msi_n);
 
 /*
  * CXL r3.1 Section 8.2.8.1: CXL Device Capabilities Array Register
diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
index 035d034f6d..52ad1e4c3f 100644
--- a/hw/cxl/cxl-device-utils.c
+++ b/hw/cxl/cxl-device-utils.c
@@ -352,10 +352,8 @@ static void device_reg_init_common(CXLDeviceState 
*cxl_dstate)
 }
 }
 
-static void mailbox_reg_init_common(CXLDeviceState *cxl_dstate)
+static void mailbox_reg_init_common(CXLDeviceState *cxl_dstate, int msi_n)
 {
-const uint8_t msi_n = 9;
-
 /* 2048 payload size */
 ARRAY_FIELD_DP32(cxl_dstate->mbox_reg_state32, CXL_DEV_MAILBOX_CAP,
  PAYLOAD_SIZE, CXL_MAILBOX_PAYLOAD_SHIFT);
@@ -382,7 +380,7 @@ static void memdev_reg_init_common(CXLDeviceState 
*cxl_dstate)
 cxl_dstate->memdev_status = memdev_status_reg;
 }
 
-void cxl_device_register_init_t3(CXLType3Dev *ct3d)
+void cxl_device_register_init_t3(CXLType3Dev *ct3d, int msi_n)
 {
 CXLDeviceState *cxl_dstate = &ct3d->cxl_dstate;
 uint64_t *cap_h = cxl_dstate->caps_reg_state64;
@@ -398,7 +396,7 @@ void cxl_device_register_init_t3(CXLType3Dev *ct3d)
 device_reg_init_common(cxl_dstate);
 
 cxl_device_cap_init(cxl_dstate, MAILBOX, 2, CXL_DEV_MAILBOX_VERSION);
-mailbox_reg_init_common(cxl_dstate);
+mailbox_reg_init_common(cxl_dstate, msi_n);
 
 cxl_device_cap_init(cxl_dstate, MEMORY_DEVICE, 0x4000,
 CXL_MEM_DEV_STATUS_VERSION);
@@ -408,7 +406,7 @@ void cxl_device_register_init_t3(CXLType3Dev *ct3d)
   CXL_MAILBOX_MAX_PAYLOAD_SIZE);
 }
 
-void cxl_device_register_init_swcci(CSWMBCCIDev *sw)
+void cxl_device_register_init_swcci(CSWMBCCIDev *sw, int msi_n)
 {
 CXLDeviceState *cxl_dstate = &sw->cxl_dstate;
 uint64_t *cap_h = cxl_dstate->caps_reg_state64;
@@ -423,7 +421,7 @@ void cxl_device_register_init_swcci(CSWMBCCIDev *sw)
 device_reg_init_common(cxl_dstate);
 
 cxl_device_cap_init(cxl_dstate, MAILBOX, 2, 1);
-mailbox_reg_init_common(cxl_dstate);
+mailbox_reg_init_common(cxl_dstate, msi_n);
 
 cxl_device_cap_init(cxl_dstate, MEMORY_DEVICE, 0x4000, 1);
 memdev_reg_init_common(cxl_dstate);
diff --git a/hw/cxl/switch-mailbox-cci.c b/hw/cxl/switch-mailbox-cci.c
index 65cdac6cc1..833b824619 100644
--- a/hw/cxl/switch-mailbox-cci.c
+++ b/hw/cxl/switch-mailbox-cci.c
@@ -17,10 +17,12 @@
 #include "hw/qdev-properties.h"
 #include "hw/cxl/cxl.h"
 
+#define CXL_SWCCI_MSIX_MBOX 3
+
 static void cswmbcci_reset(DeviceState *dev)
 {
 CSWMBCCIDev *cswmb = CXL_SWITCH_MAILBOX_CCI(dev);
-cxl_device_register_init_swcci(cswmb);
+cxl_device_register_init_swcci(cswmb, CXL_SWCCI_MSIX_MBOX);
 }
 
 static void cswbcci_realize(PCIDevice *pci_dev, Error **errp)
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 0ae1704a34..ebc0ec536e 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -30,6 +30,14 @@
 #include "hw/cxl/cxl.h"
 #include "hw/pci/msix.h"
 
+/* type3 device private */
+enum CXL_T3_MSIX_VECTOR {
+CXL_T3_MSIX_PCIE_DOE_TABLE_ACCESS = 0,
+CXL_T3_MSIX_EVENT_START = 2,
+CXL_T3_MSIX_MBOX = CXL_T3_MSIX_EVENT_START + CXL_EVENT_TYPE_MAX,
+CXL_T3_MSIX_VECTOR_NR
+};
+
 #define DWORD_BYTE 4
 #define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
 
@@ -843,7 +851,6 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)

Re: [PATCH v2 10/15] block/export: Don't ignore image activation error in blk_exp_add()

2025-02-03 Thread Eric Blake
On Fri, Jan 31, 2025 at 10:50:46AM +0100, Kevin Wolf wrote:
> Currently, block jobs can't handle inactive images correctly. Incoming
> write requests would run into assertion failures. Make sure that we
> return an error when creating an export can't activate the image.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/export/export.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




[PATCH v1 1/1] aspeed/soc: Support Non-maskable Interrupt for AST2700

2025-02-03 Thread Jamin Lin via
QEMU supports GICv3 Non-maskable Interrupt, adds to support Non-maskable
Interrupt for AST2700.

Reference:
https://github.com/qemu/qemu/commit/b36a32ead

Signed-off-by: Jamin Lin 
---
 hw/arm/aspeed_ast27x0.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/arm/aspeed_ast27x0.c b/hw/arm/aspeed_ast27x0.c
index 4114e15ddd..361a054d46 100644
--- a/hw/arm/aspeed_ast27x0.c
+++ b/hw/arm/aspeed_ast27x0.c
@@ -470,6 +470,10 @@ static bool aspeed_soc_ast2700_gic_realize(DeviceState 
*dev, Error **errp)
qdev_get_gpio_in(cpudev, ARM_CPU_VIRQ));
 sysbus_connect_irq(gicbusdev, i + 3 * sc->num_cpus,
qdev_get_gpio_in(cpudev, ARM_CPU_VFIQ));
+sysbus_connect_irq(gicbusdev, i + 4 * sc->num_cpus,
+   qdev_get_gpio_in(cpudev, ARM_CPU_NMI));
+sysbus_connect_irq(gicbusdev, i + 5 * sc->num_cpus,
+   qdev_get_gpio_in(cpudev, ARM_CPU_VINMI));
 }
 
 return true;
-- 
2.25.1




Re: [PATCH v2 10/14] configure: Define TARGET_LONG_BITS in configs/targets/*.mak

2025-02-03 Thread Richard Henderson

On 2/3/25 10:30, Philippe Mathieu-Daudé wrote:

On 3/2/25 04:18, Richard Henderson wrote:

Define TARGET_LONG_BITS in each target's configure fragment.
Do this without removing the define in target/*/cpu-param.h
so that errors are caught like so:

In file included from .../src/include/exec/cpu-defs.h:26,
  from ../src/target/hppa/cpu.h:24,
  from ../src/linux-user/qemu.h:4,
  from ../src/linux-user/hppa/cpu_loop.c:21:
../src/target/hppa/cpu-param.h:11: error: "TARGET_LONG_BITS" redefined [-Werror]
    11 | #define TARGET_LONG_BITS  64
   |
In file included from .../src/include/qemu/osdep.h:36,
  from ../src/linux-user/hppa/cpu_loop.c:20:
./hppa-linux-user-config-target.h:32: note: this is the location of the 
previous definition
    32 | #define TARGET_LONG_BITS 32
   |
cc1: all warnings being treated as errors

Signed-off-by: Richard Henderson 
---


Orthogonal to this series, what about the other definitions,
like TARGET_PHYS_ADDR_SPACE_BITS / TARGET_VIRT_ADDR_SPACE_BITS
and possibly TARGET_PAGE_BITS?


We don't need those at configure time, so there's no need to move them.


r~



Re: [PATCH v2 13/15] iotests: Add filter_qtest()

2025-02-03 Thread Eric Blake
On Fri, Jan 31, 2025 at 10:50:49AM +0100, Kevin Wolf wrote:
> The open-coded form of this filter has been copied into enough tests
> that it's better to move it into iotests.py.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  tests/qemu-iotests/iotests.py | 4 
>  tests/qemu-iotests/041| 4 +---
>  tests/qemu-iotests/165| 4 +---
>  tests/qemu-iotests/tests/copy-before-write| 3 +--
>  tests/qemu-iotests/tests/migrate-bitmaps-test | 7 +++
>  5 files changed, 10 insertions(+), 12 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH v2 14/15] iotests: Add qsd-migrate case

2025-02-03 Thread Eric Blake
On Fri, Jan 31, 2025 at 10:50:50AM +0100, Kevin Wolf wrote:
> Test that it's possible to migrate a VM that uses an image on shared
> storage through qemu-storage-daemon.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  tests/qemu-iotests/tests/qsd-migrate | 132 +++
>  tests/qemu-iotests/tests/qsd-migrate.out |  51 +
>  2 files changed, 183 insertions(+)
>  create mode 100755 tests/qemu-iotests/tests/qsd-migrate
>  create mode 100644 tests/qemu-iotests/tests/qsd-migrate.out
> 
> diff --git a/tests/qemu-iotests/tests/qsd-migrate 
> b/tests/qemu-iotests/tests/qsd-migrate
> new file mode 100755
> index 00..687bda6f93
> --- /dev/null
> +++ b/tests/qemu-iotests/tests/qsd-migrate
> @@ -0,0 +1,132 @@
> +#!/usr/bin/env python3
> +# group: rw quick

> +
> +with iotests.FilePath('disk.img') as path, \
> + iotests.FilePath('nbd-src.sock', base_dir=iotests.sock_dir) as nbd_src, 
> \
> + iotests.FilePath('nbd-dst.sock', base_dir=iotests.sock_dir) as nbd_dst, 
> \
> + iotests.FilePath('migrate.sock', base_dir=iotests.sock_dir) as 
> mig_sock, \
> + iotests.VM(path_suffix="-src") as vm_src, \
> + iotests.VM(path_suffix="-dst") as vm_dst:
> +

> +
> +iotests.log('\nTest I/O on the source')
> +vm_src.hmp_qemu_io('virtio0/virtio-backend', 'write -P 0x11 0 4k',
> +   use_log=True, qdev=True)
> +vm_src.hmp_qemu_io('virtio0/virtio-backend', 'read -P 0x11 0 4k',
> +   use_log=True, qdev=True)
> +
> +iotests.log('\nStarting migration...')


Is it worth adding a test that qemu_io fails to write on the
destination while it is inactive (to ensure we are properly rejecting
modification of an inactive image)?

> +
> +mig_caps = [
> +{'capability': 'events', 'state': True},
> +{'capability': 'pause-before-switchover', 'state': True},
> +]
> +vm_src.qmp_log('migrate-set-capabilities', capabilities=mig_caps)
> +vm_dst.qmp_log('migrate-set-capabilities', capabilities=mig_caps)
> +vm_src.qmp_log('migrate', uri=f'unix:{mig_sock}',
> +   filters=[iotests.filter_qmp_testfiles])
> +
> +vm_src.event_wait('MIGRATION',
> +  match={'data': {'status': 'pre-switchover'}})
> +
> +iotests.log('\nPre-switchover: Reconfigure QSD instances')
> +
> +iotests.log(qsd_src.qmp('blockdev-set-active', {'active': False}))
> +iotests.log(qsd_dst.qmp('blockdev-set-active', {'active': True}))

Also, should you attempt a read on both src and dst while both sides
are inactive, to prove that reads can take a snapshot in the middle of
the handover?

Oveall a nice test.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH v2 12/15] nbd/server: Support inactive nodes

2025-02-03 Thread Eric Blake
On Fri, Jan 31, 2025 at 10:50:48AM +0100, Kevin Wolf wrote:
> In order to support running an NBD export on inactive nodes, we must
> make sure to return errors for any operations that aren't allowed on
> inactive nodes. Reads are the only operation we know we need for
> inactive images, so to err on the side of caution, return errors for
> everything else, even if some operations could possibly be okay.

We may still find a use case for block status on an inactive node
(especially if that helps us take more accurate snapshots, which is
the whole point of wanting to read pre-activation).  But I'm okay if
we defer that to a separate patch only if it actually proves to be
needed.

> 
> Signed-off-by: Kevin Wolf 
> ---
>  nbd/server.c | 17 +
>  1 file changed, 17 insertions(+)
>

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls

2025-02-03 Thread Maciej S. Szmigiero

On 3.02.2025 20:58, Peter Xu wrote:

On Mon, Feb 03, 2025 at 02:57:36PM +0100, Maciej S. Szmigiero wrote:

On 2.02.2025 13:45, Dr. David Alan Gilbert wrote:

* Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote:

On 2.02.2025 03:06, Dr. David Alan Gilbert wrote:

* Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote:

From: "Maciej S. Szmigiero" 

postcopy_ram_listen_thread() is a free running thread, so it needs to
take BQL around function calls to migration methods requiring BQL.

qemu_loadvm_state_main() needs BQL held since it ultimately calls
"load_state" SaveVMHandlers.

migration_incoming_state_destroy() needs BQL held since it ultimately calls
"load_cleanup" SaveVMHandlers.

Signed-off-by: Maciej S. Szmigiero 
---
migration/savevm.c | 4 
1 file changed, 4 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index b0b74140daea..0ceea9638cc1 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2013,7 +2013,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
 * in qemu_file, and thus we must be blocking now.
 */
qemu_file_set_blocking(f, true);
+bql_lock();
load_res = qemu_loadvm_state_main(f, mis);
+bql_unlock();


Doesn't that leave that held for a heck of a long time?


Yes, and it effectively broke "postcopy recover" test but I
think the reason for that is qemu_loadvm_state_main() and
its children don't drop BQL while waiting for I/O.

I've described this case in more detail in my reply to Fabiano here:
https://lore.kernel.org/qemu-devel/0a09e627-955e-4f26-8d08-0192ecd25...@maciej.szmigiero.name/


While it might be the cause in this case, my feeling is it's more fundamental
here - it's the whole reason that postcopy has a separate ram listen
thread.  As the destination is running, after it loads it's devices
and as it starts up the destination will be still loading RAM
(and other postcopiable devices) potentially for quite a while.
Holding the bql around the ram listen thread means that the
execution of the destination won't be able to take that lock
until the postcopy load has finished; so while that might apparently
complete, it'll lead to the destination stalling until that's finished
which defeats the whole point of postcopy.
That last one probably won't fail a test but it will lead to a long stall
if you give it a nice big guest with lots of RAM that it's rapidly
changing.


Okay, I understand the postcopy case/flow now.
Thanks for explaining it clearly.


I still think that "load_state" SaveVMHandlers need to be called
with BQL held since implementations apparently expect it that way:
for example, I think PCI device configuration restore calls
address space manipulation methods which abort() if called
without BQL held.


However, the only devices that *should* be arriving on the channel
that the postcopy_ram_listen_thread is reading from are those
that are postcopiable (i.e. RAM and hmm block's dirty_bitmap).
Those load handlers are safe to be run while the other devices
are being changed.   Note the *should* - you could add a check
to fail if any other device arrives on that channel.


I think ultimately there should be either an explicit check, or,
as you suggest in the paragraph below, a separate SaveVMHandler
that runs without BQL held.


To me those are bugs happening during postcopy, so those abort()s in
memory.c are indeed for catching these issues too.


Since the current state of just running these SaveVMHandlers
without BQL in this case and hoping that nothing breaks is
clearly sub-optimal.


I have previously even submitted a patch to explicitly document
"load_state" SaveVMHandler as requiring BQL (which was also
included in the previous version of this patch set) and it
received a "Reviewed-by:" tag:
https://lore.kernel.org/qemu-devel/6976f129df610c8207da4e531c8c0475ec204fa4.1730203967.git.maciej.szmigi...@oracle.com/
https://lore.kernel.org/qemu-devel/e1949839932efaa531e2fe63ac13324e5787439c.1731773021.git.maciej.szmigi...@oracle.com/
https://lore.kernel.org/qemu-devel/87o732bti7@suse.de/


It happens!
You could make this safer by having a load_state and a load_state_postcopy
member, and only mark the load_state as requiring the lock.


To not digress too much from the subject of this patch set
(multifd VFIO device state transfer) for now I've just updated the
TODO comment around that qemu_loadvm_state_main(), so hopefully this
discussion won't get forgotten:
https://gitlab.com/maciejsszmigiero/qemu/-/commit/046e3deac5b1dbc406b3e9571f62468bd6743e79


The commit message may still need some touch ups, e.g.:

   postcopy_ram_listen_thread() is a free running thread, so it needs to
   take BQL around function calls to migration methods requiring BQL.


This sentence is still not correct, IMHO. As Dave explained, the ram load
thread is designed to run without BQL at least for the major workloads it
runs.


So what's your proposed wording of this commit then?


I don't worry on src sending s

[PATCH v2 02/12] hw/arm/raspi: Merge model 4B with other models

2025-02-03 Thread Philippe Mathieu-Daudé
Except we alter the device tree blob, the 4B
is just another raspi model.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/raspi.c | 114 -
 hw/arm/raspi4b.c   | 136 -
 hw/arm/meson.build |   2 +-
 3 files changed, 114 insertions(+), 138 deletions(-)
 delete mode 100644 hw/arm/raspi4b.c

diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index 508f90479e2..3fa382d62ce 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -8,6 +8,10 @@
  * Raspberry Pi 3 emulation Copyright (c) 2018 Zoltán Baldaszti
  * Upstream code cleanup (c) 2018 Pekka Enberg
  *
+ * Raspberry Pi 4 emulation Copyright (C) 2022 Ovchinnikov Vitalii
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
  * This work is licensed under the terms of the GNU GPL, version 2 or later.
  * See the COPYING file in the top-level directory.
  */
@@ -16,20 +20,27 @@
 #include "qemu/units.h"
 #include "qemu/cutils.h"
 #include "qapi/error.h"
+#include "qapi/visitor.h"
 #include "hw/arm/boot.h"
 #include "hw/arm/bcm2836.h"
 #include "hw/arm/bcm2838.h"
 #include "hw/arm/raspi_platform.h"
+#include "hw/display/bcm2835_fb.h"
 #include "hw/registerfields.h"
 #include "qemu/error-report.h"
 #include "hw/boards.h"
 #include "hw/loader.h"
 #include "hw/arm/boot.h"
 #include "qom/object.h"
+#include "system/device_tree.h"
+#include 
 
 #define TYPE_RASPI_MACHINE  MACHINE_TYPE_NAME("raspi-common")
 OBJECT_DECLARE_SIMPLE_TYPE(RaspiMachineState, RASPI_MACHINE)
 
+#define TYPE_RASPI4B_MACHINE MACHINE_TYPE_NAME("raspi4b")
+OBJECT_DECLARE_SIMPLE_TYPE(Raspi4bMachineState, RASPI4B_MACHINE)
+
 #define SMPBOOT_ADDR0x300 /* this should leave enough space for ATAGS */
 #define MVBAR_ADDR  0x400 /* secure vectors */
 #define BOARDSETUP_ADDR (MVBAR_ADDR + 0x20) /* board setup code */
@@ -44,6 +55,11 @@ struct RaspiMachineState {
 BCM283XState soc;
 };
 
+struct Raspi4bMachineState {
+RaspiBaseMachineState parent_obj;
+BCM2838State soc;
+};
+
 /*
  * Board revision codes:
  * www.raspberrypi.org/documentation/hardware/raspberrypi/revision-codes/
@@ -301,6 +317,83 @@ void raspi_base_machine_init(MachineState *machine,
boot_ram_size);
 }
 
+#ifdef TARGET_AARCH64
+/*
+ * Add second memory region if board RAM amount exceeds VC base address
+ * (see https://datasheets.raspberrypi.com/bcm2711/bcm2711-peripherals.pdf
+ * 1.2 Address Map)
+ */
+static int raspi4_add_memory_node(void *fdt, hwaddr mem_base, hwaddr mem_len)
+{
+int ret;
+uint32_t acells, scells;
+char *nodename = g_strdup_printf("/memory@%" PRIx64, mem_base);
+
+acells = qemu_fdt_getprop_cell(fdt, "/", "#address-cells",
+   NULL, &error_fatal);
+scells = qemu_fdt_getprop_cell(fdt, "/", "#size-cells",
+   NULL, &error_fatal);
+if (acells == 0 || scells == 0) {
+fprintf(stderr, "dtb file invalid (#address-cells or #size-cells 
0)\n");
+ret = -1;
+} else {
+qemu_fdt_add_subnode(fdt, nodename);
+qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
+ret = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg",
+   acells, mem_base,
+   scells, mem_len);
+}
+
+g_free(nodename);
+return ret;
+}
+
+static void raspi4_modify_dtb(const struct arm_boot_info *info, void *fdt)
+{
+uint64_t ram_size;
+
+/* Temporarily disable following devices until they are implemented */
+const char *nodes_to_remove[] = {
+"brcm,bcm2711-pcie",
+"brcm,bcm2711-rng200",
+"brcm,bcm2711-thermal",
+"brcm,bcm2711-genet-v5",
+};
+
+for (int i = 0; i < ARRAY_SIZE(nodes_to_remove); i++) {
+const char *dev_str = nodes_to_remove[i];
+
+int offset = fdt_node_offset_by_compatible(fdt, -1, dev_str);
+if (offset >= 0) {
+if (!fdt_nop_node(fdt, offset)) {
+warn_report("bcm2711 dtc: %s has been disabled!", dev_str);
+}
+}
+}
+
+ram_size = board_ram_size(info->board_id);
+
+if (info->ram_size > UPPER_RAM_BASE) {
+raspi4_add_memory_node(fdt, UPPER_RAM_BASE, ram_size - UPPER_RAM_BASE);
+}
+}
+
+static void raspi4b_machine_init(MachineState *machine)
+{
+Raspi4bMachineState *s = RASPI4B_MACHINE(machine);
+RaspiBaseMachineState *s_base = RASPI_BASE_MACHINE(machine);
+RaspiBaseMachineClass *mc = RASPI_BASE_MACHINE_GET_CLASS(machine);
+BCM2838State *soc = &s->soc;
+
+s_base->binfo.modify_dtb = raspi4_modify_dtb;
+s_base->binfo.board_id = mc->board_rev;
+
+object_initialize_child(OBJECT(machine), "soc", soc,
+board_soc_type(mc->board_rev));
+raspi_base_machine_init(machine, BCM283X_BASE(soc));
+}
+#endif /* TARGET_AARCH64 */
+
 void raspi_machine_init(MachineState *machine)
 {
 RaspiMachineState *s = RASPI_MACHINE(machine);
@@

[PATCH v2 00/12] hw/arm/raspi: Allow creating any Raspberry Pi machine

2025-02-03 Thread Philippe Mathieu-Daudé
Full rewrite of v1 [1], addressing Zoltan & Peter suggestion.

Introduce a generic 'raspi' machine, which takes a 'model'
and 'revision' properties, and any memory size. The 'board_rev'
register is filled appropriately.

Before, merge raspi4b.c within raspi.c (more is planned here
with the MPCore refactor [2]).

Regards,

Phil.

[1] https://lore.kernel.org/qemu-devel/20250201091528.1177-1-phi...@linaro.org/
[2] https://lore.kernel.org/qemu-devel/20231212162935.42910-1-phi...@linaro.org/

Philippe Mathieu-Daudé (12):
  hw/arm/raspi: Access SoC parent object using  BCM283X_BASE() macro
  hw/arm/raspi: Merge model 4B with other models
  hw/arm/raspi: Unify RASPI_MACHINE types
  hw/arm/raspi: Pass board_rev as argument to raspi_base_machine_init()
  hw/arm/raspi: Consider processor id in types[] array
  hw/arm/raspi: Consider network interface for B models
  hw/arm/raspi: Check ramsize is within chipset aperture
  hw/arm/raspi: Introduce generic Raspberry Pi machine
  hw/arm/raspi: Have the generic machine take a 'revision' property
  hw/arm/raspi: List models creatable by the generic 'raspi' machine
  hw/arm/raspi: Deprecate old raspiX machine names
  hw/arm/raspi: Support more models

 docs/about/deprecated.rst   |  13 +
 include/hw/arm/raspi_platform.h |   5 +-
 hw/arm/raspi.c  | 383 ++--
 hw/arm/raspi4b.c| 136 -
 tests/qtest/bcm2835-dma-test.c  |   2 +-
 tests/qtest/bcm2835-i2c-test.c  |   2 +-
 tests/qtest/boot-serial-test.c  |   3 +-
 hw/arm/meson.build  |   2 +-
 tests/functional/test_aarch64_raspi3.py |   5 +-
 tests/functional/test_aarch64_raspi4.py |   4 +-
 tests/functional/test_arm_raspi2.py |   4 +-
 11 files changed, 385 insertions(+), 174 deletions(-)
 delete mode 100644 hw/arm/raspi4b.c

-- 
2.47.1




[PATCH v2 08/12] hw/arm/raspi: Introduce generic Raspberry Pi machine

2025-02-03 Thread Philippe Mathieu-Daudé
The generic 'raspi' machine takes a 'model' argument and
create the machine associated with the model, with the
RAM size requested (or default to the minimum of 256MB
if not precised).

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2797
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/arm/raspi_platform.h |   3 +-
 hw/arm/raspi.c  | 127 
 2 files changed, 115 insertions(+), 15 deletions(-)

diff --git a/include/hw/arm/raspi_platform.h b/include/hw/arm/raspi_platform.h
index defb786153b..14cb91e153c 100644
--- a/include/hw/arm/raspi_platform.h
+++ b/include/hw/arm/raspi_platform.h
@@ -41,7 +41,8 @@ OBJECT_DECLARE_TYPE(RaspiBaseMachineState, 
RaspiBaseMachineClass,
 struct RaspiBaseMachineState {
 /*< private >*/
 MachineState parent_obj;
-/*< public >*/
+
+uint32_t board_rev;
 struct arm_boot_info binfo;
 };
 
diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index d44277001ee..1dc41701efe 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -300,20 +300,6 @@ void raspi_base_machine_init(MachineState *machine,
 BlockBackend *blk;
 BusState *bus;
 DeviceState *carddev;
-uint64_t max_ramsize;
-
-if (machine->ram_size != ram_size) {
-char *size_str = size_to_str(ram_size);
-error_report("Invalid RAM size, should be %s", size_str);
-g_free(size_str);
-exit(1);
-}
-max_ramsize = ramsize_max(board_rev);
-if (ram_size > max_ramsize) {
-g_autofree char *max_ramsize_str = size_to_str(max_ramsize);
-error_report("At most %s of RAM can be used", max_ramsize_str);
- exit(1);
-}
 
 /* FIXME: Remove when we have custom CPU address space support */
 memory_region_add_subregion_overlap(get_system_memory(), 0,
@@ -448,6 +434,115 @@ void raspi_machine_init(MachineState *machine)
 raspi_base_machine_init(machine, BCM283X_BASE(soc), mc->board_rev);
 }
 
+static void raspi_generic_machine_init(MachineState *ms)
+{
+RaspiMachineState *s = RASPI_MACHINE(ms);
+RaspiBaseMachineState *s_base = RASPI_BASE_MACHINE(ms);
+uint32_t board_rev = s_base->board_rev;
+const char *soc_type = board_soc_type(board_rev);
+BCM283XBaseState *bsoc;
+uint64_t ram_size;
+uint64_t max_ramsize;
+
+if (!board_rev) {
+error_report("Missing model");
+exit(1);
+}
+
+ram_size = ROUND_UP(ms->ram_size, 256 * MiB);
+if (ram_size != ms->ram_size) {
+g_autofree char *ram_size_str = size_to_str(ms->ram_size);
+g_autofree char *rounded_size_str = size_to_str(ram_size);
+warn_report("Invalid RAM size %s, rounding to %s",
+ram_size_str, rounded_size_str);
+}
+max_ramsize = ramsize_max(board_rev);
+if (ram_size > max_ramsize) {
+g_autofree char *max_ramsize_str = size_to_str(max_ramsize);
+error_report("At most %s of RAM can be used with BCM%s",
+ max_ramsize_str, soc_type + 3);
+exit(1);
+}
+board_rev = FIELD_DP32(board_rev, REV_CODE, MEMORY_SIZE,
+   ctz64(ms->ram_size) - 28);
+
+ms->ram = g_new(MemoryRegion, 1);
+memory_region_init(ms->ram, OBJECT(ms), "DRAM", ram_size);
+
+if (board_processor_id(board_rev) == PROCESSOR_ID_BCM2838) {
+BCM2838State *soc = &s->soc4;
+bsoc = BCM283X_BASE(soc);
+object_initialize_child(OBJECT(ms), "soc", soc, soc_type);
+} else {
+BCM283XState *soc = &s->soc;
+bsoc = BCM283X_BASE(soc);
+object_initialize_child(OBJECT(ms), "soc", soc, soc_type);
+}
+raspi_base_machine_init(ms, bsoc, board_rev);
+}
+
+static void raspi_update_board_rev(RaspiBaseMachineState *s)
+{
+MachineState *ms = MACHINE(s);
+RaspiProcessorId proc;
+unsigned model_index;
+
+s->board_rev = FIELD_DP32(s->board_rev, REV_CODE, STYLE, 1);
+
+model_index = FIELD_EX32(s->board_rev, REV_CODE, TYPE);
+proc = types[model_index].proc_id;
+s->board_rev = FIELD_DP32(s->board_rev, REV_CODE, PROCESSOR, proc);
+
+ms->smp.max_cpus = soc_property[proc].cores_count;
+}
+
+static void raspi_set_machine_model(Object *obj, const char *value, Error 
**errp)
+{
+for (unsigned i = 0; i < ARRAY_SIZE(types); i++) {
+if (types[i].model && !strcmp(value, types[i].model)) {
+RaspiBaseMachineState *s = RASPI_BASE_MACHINE(obj);
+
+s->board_rev = FIELD_DP32(s->board_rev, REV_CODE, TYPE, i);
+
+return raspi_update_board_rev(s);
+}
+}
+error_setg(errp, "Invalid model");
+}
+
+static char *raspi_get_machine_model(Object *obj, Error **errp)
+{
+RaspiBaseMachineState *s = RASPI_BASE_MACHINE(obj);
+
+return g_strdup(types[FIELD_EX32(s->board_rev, REV_CODE, TYPE)].model);
+}
+
+static void raspi_generic_machine_class_init(ObjectClass *oc, void *data)
+{
+MachineClass *mc = MACHINE_CLASS(oc);
+RaspiBaseMachineClass *rmc = RASPI_BASE_MACHINE_CLASS(oc);
+
+r

[PATCH v2 06/12] hw/arm/raspi: Consider network interface for B models

2025-02-03 Thread Philippe Mathieu-Daudé
Raspberry Pi 'B' models have an ethernet chipset (the LAN9512).
Since we don't yet model it, add a /* TODO */ comment.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/raspi.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index 1a6a1f8ff22..68332fba027 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -143,6 +143,16 @@ static const char *board_type(uint32_t board_rev)
 return types[bt].model;
 }
 
+static bool is_model_b(uint32_t board_rev)
+{
+return !!strchr(board_type(board_rev), 'B');
+}
+
+static bool has_enet(uint32_t board_rev)
+{
+return is_model_b(board_rev);
+}
+
 static void write_smpboot(ARMCPU *cpu, const struct arm_boot_info *info)
 {
 static const ARMInsnFixup smpboot[] = {
@@ -304,6 +314,10 @@ void raspi_base_machine_init(MachineState *machine,
 machine->kernel_cmdline, &error_abort);
 qdev_realize(DEVICE(soc), NULL, &error_fatal);
 
+if (has_enet(board_rev)) {
+/* TODO: model LAN9512 and wire over USB2 */
+}
+
 /* Create and plug in the SD cards */
 di = drive_get(IF_SD, 0, 0);
 blk = di ? blk_by_legacy_dinfo(di) : NULL;
-- 
2.47.1




[PATCH v2 03/12] hw/arm/raspi: Unify RASPI_MACHINE types

2025-02-03 Thread Philippe Mathieu-Daudé
Merge Raspi4bMachineState within RaspiMachineState by
using an unnamed union.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/raspi.c | 21 +++--
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index 3fa382d62ce..ef94d57dab5 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -38,9 +38,6 @@
 #define TYPE_RASPI_MACHINE  MACHINE_TYPE_NAME("raspi-common")
 OBJECT_DECLARE_SIMPLE_TYPE(RaspiMachineState, RASPI_MACHINE)
 
-#define TYPE_RASPI4B_MACHINE MACHINE_TYPE_NAME("raspi4b")
-OBJECT_DECLARE_SIMPLE_TYPE(Raspi4bMachineState, RASPI4B_MACHINE)
-
 #define SMPBOOT_ADDR0x300 /* this should leave enough space for ATAGS */
 #define MVBAR_ADDR  0x400 /* secure vectors */
 #define BOARDSETUP_ADDR (MVBAR_ADDR + 0x20) /* board setup code */
@@ -49,15 +46,12 @@ OBJECT_DECLARE_SIMPLE_TYPE(Raspi4bMachineState, 
RASPI4B_MACHINE)
 #define SPINTABLE_ADDR  0xd8 /* Pi 3 bootloader spintable */
 
 struct RaspiMachineState {
-/*< private >*/
 RaspiBaseMachineState parent_obj;
-/*< public >*/
-BCM283XState soc;
-};
 
-struct Raspi4bMachineState {
-RaspiBaseMachineState parent_obj;
-BCM2838State soc;
+union {
+BCM283XState soc;
+BCM2838State soc4;
+};
 };
 
 /*
@@ -380,10 +374,10 @@ static void raspi4_modify_dtb(const struct arm_boot_info 
*info, void *fdt)
 
 static void raspi4b_machine_init(MachineState *machine)
 {
-Raspi4bMachineState *s = RASPI4B_MACHINE(machine);
+RaspiMachineState *s = RASPI_MACHINE(machine);
 RaspiBaseMachineState *s_base = RASPI_BASE_MACHINE(machine);
 RaspiBaseMachineClass *mc = RASPI_BASE_MACHINE_GET_CLASS(machine);
-BCM2838State *soc = &s->soc;
+BCM2838State *soc = &s->soc4;
 
 s_base->binfo.modify_dtb = raspi4_modify_dtb;
 s_base->binfo.board_id = mc->board_rev;
@@ -515,8 +509,7 @@ static const TypeInfo raspi_machine_types[] = {
 .class_init = raspi3b_machine_class_init,
 }, {
 .name   = MACHINE_TYPE_NAME("raspi4"),
-.parent = TYPE_RASPI_BASE_MACHINE,
-.instance_size  = sizeof(Raspi4bMachineState),
+.parent = TYPE_RASPI_MACHINE,
 .class_init = raspi4b_machine_class_init,
 #endif /* TARGET_AARCH64 */
 }, {
-- 
2.47.1




[PATCH v2 05/12] hw/arm/raspi: Consider processor id in types[] array

2025-02-03 Thread Philippe Mathieu-Daudé
Expand the current type2model array to include the processor id.

Since the BCM2838 is indistinctly used as BCM2711 (within the
Linux community), add it as alias in RaspiProcessorId.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/raspi.c | 33 +++--
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index 571b50bef7e..1a6a1f8ff22 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -70,6 +70,7 @@ typedef enum RaspiProcessorId {
 PROCESSOR_ID_BCM2836 = 1,
 PROCESSOR_ID_BCM2837 = 2,
 PROCESSOR_ID_BCM2838 = 3,
+PROCESSOR_ID_BCM2711 = 3,
 } RaspiProcessorId;
 
 static const struct {
@@ -82,6 +83,30 @@ static const struct {
 [PROCESSOR_ID_BCM2838] = {TYPE_BCM2838, BCM283X_NCPUS},
 };
 
+static const struct {
+RaspiProcessorId proc_id;
+const char *model;
+} types[] = {
+{PROCESSOR_ID_BCM2835, "A"},
+{PROCESSOR_ID_BCM2835, "B"},
+{PROCESSOR_ID_BCM2835, "A+"},
+{PROCESSOR_ID_BCM2835, "B+"},
+{PROCESSOR_ID_BCM2836, "2B"},
+{ },
+{PROCESSOR_ID_BCM2835, "CM1"},
+{ },
+{PROCESSOR_ID_BCM2837, "3B"},
+{PROCESSOR_ID_BCM2835, "Zero"},
+{PROCESSOR_ID_BCM2837, "CM3"},
+{ },
+{PROCESSOR_ID_BCM2835, "ZeroW"},
+{PROCESSOR_ID_BCM2837, "3B+"},
+{PROCESSOR_ID_BCM2837, "3A+"},
+{ },
+{PROCESSOR_ID_BCM2837, "CM3+"},
+{PROCESSOR_ID_BCM2711, "4B"},
+};
+
 uint64_t board_ram_size(uint32_t board_rev)
 {
 assert(FIELD_EX32(board_rev, REV_CODE, STYLE)); /* Only new style */
@@ -110,16 +135,12 @@ static int cores_count(uint32_t board_rev)
 
 static const char *board_type(uint32_t board_rev)
 {
-static const char *types[] = {
-"A", "B", "A+", "B+", "2B", "Alpha", "CM1", NULL, "3B", "Zero",
-"CM3", NULL, "Zero W", "3B+", "3A+", NULL, "CM3+", "4B",
-};
 assert(FIELD_EX32(board_rev, REV_CODE, STYLE)); /* Only new style */
 int bt = FIELD_EX32(board_rev, REV_CODE, TYPE);
-if (bt >= ARRAY_SIZE(types) || !types[bt]) {
+if (bt >= ARRAY_SIZE(types) || !types[bt].model) {
 return "Unknown";
 }
-return types[bt];
+return types[bt].model;
 }
 
 static void write_smpboot(ARMCPU *cpu, const struct arm_boot_info *info)
-- 
2.47.1




[PATCH v2 10/12] hw/arm/raspi: List models creatable by the generic 'raspi' machine

2025-02-03 Thread Philippe Mathieu-Daudé
All the following models can be created (with different RAM size):

  $ qemu-system-aarch64 -M raspi
  qemu-system-aarch64: Missing model, try -M raspi,model=help
  $ qemu-system-aarch64 -M raspi,model=help
  Available models (processor):
  - A  (BCM2835)
  - B  (BCM2835)
  - A+ (BCM2835)
  - B+ (BCM2835)
  - 2B (BCM2836)
  - CM1(BCM2835)
  - 3B (BCM2837)
  - Zero   (BCM2835)
  - CM3(BCM2837)
  - ZeroW  (BCM2835)
  - 3B+(BCM2837)
  - 3A+(BCM2837)
  - CM3+   (BCM2837)
  - 4B (BCM2838)

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/raspi.c | 28 +++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index b184ac3c446..8cae1ff6f93 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -445,7 +445,7 @@ static void raspi_generic_machine_init(MachineState *ms)
 uint64_t max_ramsize;
 
 if (!board_rev) {
-error_report("Missing model");
+error_report("Missing model, try -M raspi,model=help");
 exit(1);
 }
 
@@ -500,8 +500,33 @@ static void raspi_update_board_rev(RaspiBaseMachineState 
*s)
 ms->smp.max_cpus = soc_property[proc].cores_count;
 }
 
+static void raspi_list_machine_models(void)
+{
+printf("Available models (processor):\n");
+
+for (unsigned i = 0; i < ARRAY_SIZE(types); i++) {
+const char *soc_type;
+
+if (!types[i].model) {
+continue;
+}
+
+soc_type = soc_property[types[i].proc_id].type;
+if (!soc_type) {
+continue;
+}
+printf("- %-10s (BCM%s)\n",
+   types[i].model,
+   soc_property[types[i].proc_id].type + 3);
+}
+}
+
 static void raspi_set_machine_model(Object *obj, const char *value, Error 
**errp)
 {
+if (!strcmp(value, "help")) {
+raspi_list_machine_models();
+exit(0);
+}
 for (unsigned i = 0; i < ARRAY_SIZE(types); i++) {
 if (types[i].model && !strcmp(value, types[i].model)) {
 RaspiBaseMachineState *s = RASPI_BASE_MACHINE(obj);
@@ -512,6 +537,7 @@ static void raspi_set_machine_model(Object *obj, const char 
*value, Error **errp
 }
 }
 error_setg(errp, "Invalid model");
+error_append_hint(errp, "Use model=help to list models.\n");
 }
 
 static char *raspi_get_machine_model(Object *obj, Error **errp)
-- 
2.47.1




[PATCH v2 09/12] hw/arm/raspi: Have the generic machine take a 'revision' property

2025-02-03 Thread Philippe Mathieu-Daudé
Add a property to specify the board revision. This allows to
create a Raspberry Pi 2B with BCM2836 SoC (rev 1.0 and 1.1)
or BCM2837 (rev 1.2 up to 1.5).

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/raspi.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index 1dc41701efe..b184ac3c446 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -491,6 +491,10 @@ static void raspi_update_board_rev(RaspiBaseMachineState 
*s)
 
 model_index = FIELD_EX32(s->board_rev, REV_CODE, TYPE);
 proc = types[model_index].proc_id;
+if (model_index == 4 && FIELD_EX32(s->board_rev, REV_CODE, REVISION) > 1) {
+/* 2B rev 1.0 and 1.1 have BCM2836, 1.2+ have BCM2837 */
+proc = PROCESSOR_ID_BCM2837;
+}
 s->board_rev = FIELD_DP32(s->board_rev, REV_CODE, PROCESSOR, proc);
 
 ms->smp.max_cpus = soc_property[proc].cores_count;
@@ -517,6 +521,35 @@ static char *raspi_get_machine_model(Object *obj, Error 
**errp)
 return g_strdup(types[FIELD_EX32(s->board_rev, REV_CODE, TYPE)].model);
 }
 
+static void raspi_set_machine_rev(Object *obj, const char *value, Error **errp)
+{
+RaspiBaseMachineState *s;
+int rev;
+
+if (strlen(value) != 3 || value[0] != '1' || value[1] != '.') {
+error_setg(errp, "Invalid revision");
+return;
+}
+rev = value[2] - '0';
+if (rev < 0 || rev > 5) {
+error_setg(errp, "Invalid revision");
+return;
+}
+
+s = RASPI_BASE_MACHINE(obj);
+s->board_rev = FIELD_DP32(s->board_rev, REV_CODE, REVISION, rev);
+
+return raspi_update_board_rev(s);
+}
+
+static char *raspi_get_machine_rev(Object *obj, Error **errp)
+{
+RaspiBaseMachineState *s = RASPI_BASE_MACHINE(obj);
+
+return g_strdup_printf("1.%u",
+   FIELD_EX32(s->board_rev, REV_CODE, REVISION));
+}
+
 static void raspi_generic_machine_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
@@ -540,6 +573,12 @@ static void raspi_generic_machine_class_init(ObjectClass 
*oc, void *data)
   raspi_get_machine_model,
   raspi_set_machine_model);
 object_class_property_set_description(oc, "model", "Set machine model.");
+object_class_property_add_str(oc, "revision",
+  raspi_get_machine_rev,
+  raspi_set_machine_rev);
+object_class_property_set_description(oc, "revision",
+  "Set machine revision. "
+  "Valid values are 1.0 to 1.5");
 };
 
 
-- 
2.47.1




[PATCH v2 01/12] hw/arm/raspi: Access SoC parent object using BCM283X_BASE() macro

2025-02-03 Thread Philippe Mathieu-Daudé
We shouldn't access a QOM parent object directly.
Use the appropriate type-cast macro.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/raspi.c   | 2 +-
 hw/arm/raspi4b.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index a7a662f40db..508f90479e2 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -312,7 +312,7 @@ void raspi_machine_init(MachineState *machine)
 
 object_initialize_child(OBJECT(machine), "soc", soc,
 board_soc_type(mc->board_rev));
-raspi_base_machine_init(machine, &soc->parent_obj);
+raspi_base_machine_init(machine, BCM283X_BASE(soc));
 }
 
 void raspi_machine_class_common_init(MachineClass *mc,
diff --git a/hw/arm/raspi4b.c b/hw/arm/raspi4b.c
index 1264e0d6eed..9b08a598f39 100644
--- a/hw/arm/raspi4b.c
+++ b/hw/arm/raspi4b.c
@@ -104,7 +104,7 @@ static void raspi4b_machine_init(MachineState *machine)
 object_initialize_child(OBJECT(machine), "soc", soc,
 board_soc_type(mc->board_rev));
 
-raspi_base_machine_init(machine, &soc->parent_obj);
+raspi_base_machine_init(machine, BCM283X_BASE(soc));
 }
 
 static void raspi4b_machine_class_init(ObjectClass *oc, void *data)
-- 
2.47.1




[PATCH v2 07/12] hw/arm/raspi: Check ramsize is within chipset aperture

2025-02-03 Thread Philippe Mathieu-Daudé
Add the 'max_ramsize' field to the soc_property[] array,
corresponding to the maximum DRAM size a SoC can map.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/raspi.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index 68332fba027..d44277001ee 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -76,11 +76,12 @@ typedef enum RaspiProcessorId {
 static const struct {
 const char *type;
 int cores_count;
+uint64_t max_ramsize;
 } soc_property[] = {
-[PROCESSOR_ID_BCM2835] = {TYPE_BCM2835, 1},
-[PROCESSOR_ID_BCM2836] = {TYPE_BCM2836, BCM283X_NCPUS},
-[PROCESSOR_ID_BCM2837] = {TYPE_BCM2837, BCM283X_NCPUS},
-[PROCESSOR_ID_BCM2838] = {TYPE_BCM2838, BCM283X_NCPUS},
+[PROCESSOR_ID_BCM2835] = {TYPE_BCM2835, 1,  512 * MiB},
+[PROCESSOR_ID_BCM2836] = {TYPE_BCM2836, BCM283X_NCPUS,  1 * GiB},
+[PROCESSOR_ID_BCM2837] = {TYPE_BCM2837, BCM283X_NCPUS,  1 * GiB},
+[PROCESSOR_ID_BCM2838] = {TYPE_BCM2838, BCM283X_NCPUS,  8 * GiB},
 };
 
 static const struct {
@@ -133,6 +134,11 @@ static int cores_count(uint32_t board_rev)
 return soc_property[board_processor_id(board_rev)].cores_count;
 }
 
+static uint64_t ramsize_max(uint32_t board_rev)
+{
+return soc_property[board_processor_id(board_rev)].max_ramsize;
+}
+
 static const char *board_type(uint32_t board_rev)
 {
 assert(FIELD_EX32(board_rev, REV_CODE, STYLE)); /* Only new style */
@@ -294,6 +300,7 @@ void raspi_base_machine_init(MachineState *machine,
 BlockBackend *blk;
 BusState *bus;
 DeviceState *carddev;
+uint64_t max_ramsize;
 
 if (machine->ram_size != ram_size) {
 char *size_str = size_to_str(ram_size);
@@ -301,6 +308,12 @@ void raspi_base_machine_init(MachineState *machine,
 g_free(size_str);
 exit(1);
 }
+max_ramsize = ramsize_max(board_rev);
+if (ram_size > max_ramsize) {
+g_autofree char *max_ramsize_str = size_to_str(max_ramsize);
+error_report("At most %s of RAM can be used", max_ramsize_str);
+ exit(1);
+}
 
 /* FIXME: Remove when we have custom CPU address space support */
 memory_region_add_subregion_overlap(get_system_memory(), 0,
-- 
2.47.1




[PATCH v2 04/12] hw/arm/raspi: Pass board_rev as argument to raspi_base_machine_init()

2025-02-03 Thread Philippe Mathieu-Daudé
Since callers already have reference to the RaspiBaseMachineClass,
directly pass 'board_rev' as argument to raspi_base_machine_init().

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/arm/raspi_platform.h | 2 +-
 hw/arm/raspi.c  | 8 +++-
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/include/hw/arm/raspi_platform.h b/include/hw/arm/raspi_platform.h
index 7bc4807fa51..defb786153b 100644
--- a/include/hw/arm/raspi_platform.h
+++ b/include/hw/arm/raspi_platform.h
@@ -58,7 +58,7 @@ void raspi_machine_init(MachineState *machine);
 
 typedef struct BCM283XBaseState BCM283XBaseState;
 void raspi_base_machine_init(MachineState *machine,
- BCM283XBaseState *soc);
+ BCM283XBaseState *soc, const uint32_t board_rev);
 
 void raspi_machine_class_common_init(MachineClass *mc,
  uint32_t board_rev);
diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index ef94d57dab5..571b50bef7e 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -254,10 +254,8 @@ static void setup_boot(MachineState *machine, ARMCPU *cpu,
 }
 
 void raspi_base_machine_init(MachineState *machine,
- BCM283XBaseState *soc)
+ BCM283XBaseState *soc, const uint32_t board_rev)
 {
-RaspiBaseMachineClass *mc = RASPI_BASE_MACHINE_GET_CLASS(machine);
-uint32_t board_rev = mc->board_rev;
 uint64_t ram_size = board_ram_size(board_rev);
 uint32_t vcram_base, vcram_size;
 size_t boot_ram_size;
@@ -384,7 +382,7 @@ static void raspi4b_machine_init(MachineState *machine)
 
 object_initialize_child(OBJECT(machine), "soc", soc,
 board_soc_type(mc->board_rev));
-raspi_base_machine_init(machine, BCM283X_BASE(soc));
+raspi_base_machine_init(machine, BCM283X_BASE(soc), mc->board_rev);
 }
 #endif /* TARGET_AARCH64 */
 
@@ -399,7 +397,7 @@ void raspi_machine_init(MachineState *machine)
 
 object_initialize_child(OBJECT(machine), "soc", soc,
 board_soc_type(mc->board_rev));
-raspi_base_machine_init(machine, BCM283X_BASE(soc));
+raspi_base_machine_init(machine, BCM283X_BASE(soc), mc->board_rev);
 }
 
 void raspi_machine_class_common_init(MachineClass *mc,
-- 
2.47.1




[PATCH v2 12/12] hw/arm/raspi: Support more models

2025-02-03 Thread Philippe Mathieu-Daudé
Allow to create the following machines:

  - Zero2W
  - 400
  - CM4 and CM4S

Fill the arrays with the BCM2712-based machines (raspi5),
but since we don't model the SoC, these machines can't
be created (and aren't listed in the 'help' output).

List taken from:
https://github.com/raspberrypi/documentation/blob/9b126446a5/documentation/asciidoc/computers/raspberry-pi/revision-codes.adoc

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/raspi.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index 86ecc988e06..2346550eec5 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -71,6 +71,7 @@ typedef enum RaspiProcessorId {
 PROCESSOR_ID_BCM2837 = 2,
 PROCESSOR_ID_BCM2838 = 3,
 PROCESSOR_ID_BCM2711 = 3,
+PROCESSOR_ID_BCM2712 = 4,
 } RaspiProcessorId;
 
 static const struct {
@@ -82,6 +83,7 @@ static const struct {
 [PROCESSOR_ID_BCM2836] = {TYPE_BCM2836, BCM283X_NCPUS,  1 * GiB},
 [PROCESSOR_ID_BCM2837] = {TYPE_BCM2837, BCM283X_NCPUS,  1 * GiB},
 [PROCESSOR_ID_BCM2838] = {TYPE_BCM2838, BCM283X_NCPUS,  8 * GiB},
+[PROCESSOR_ID_BCM2712] = {NULL, BCM283X_NCPUS,  16 * GiB},
 };
 
 static const struct {
@@ -106,6 +108,17 @@ static const struct {
 { },
 {PROCESSOR_ID_BCM2837, "CM3+"},
 {PROCESSOR_ID_BCM2711, "4B"},
+{PROCESSOR_ID_BCM2837, "Zero2W"},
+{PROCESSOR_ID_BCM2711, "400"},
+
+{PROCESSOR_ID_BCM2711, "CM4"},
+{PROCESSOR_ID_BCM2711, "CM4S"},
+{ },
+{PROCESSOR_ID_BCM2712, "5"},
+{PROCESSOR_ID_BCM2712, "CM5"},
+{PROCESSOR_ID_BCM2712, "500"},
+{PROCESSOR_ID_BCM2712, "CM5lite"},
+{ },
 };
 
 uint64_t board_ram_size(uint32_t board_rev)
-- 
2.47.1




[PATCH v2 11/12] hw/arm/raspi: Deprecate old raspiX machine names

2025-02-03 Thread Philippe Mathieu-Daudé
All previous raspi machines can be created using the
generic machine. Deprecate the old names to maintain
a single one. Update the tests.

Signed-off-by: Philippe Mathieu-Daudé 
---
QOM HMP introspection test fails because without the 'model'
argument set, no machine is created...

  $ qemu-system-aarch64 -M raspi
  qemu-system-aarch64: Missing model, try -M raspi,model=help
---
 docs/about/deprecated.rst   | 13 +
 hw/arm/raspi.c  |  5 +
 tests/qtest/bcm2835-dma-test.c  |  2 +-
 tests/qtest/bcm2835-i2c-test.c  |  2 +-
 tests/qtest/boot-serial-test.c  |  3 ++-
 tests/functional/test_aarch64_raspi3.py |  5 ++---
 tests/functional/test_aarch64_raspi4.py |  4 ++--
 tests/functional/test_arm_raspi2.py |  4 ++--
 8 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 4a3c302962a..c9a11a52f78 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -257,6 +257,19 @@ Big-Endian variants of MicroBlaze ``petalogix-ml605`` and 
``xlnx-zynqmp-pmu`` ma
 Both ``petalogix-ml605`` and ``xlnx-zynqmp-pmu`` were added for little endian
 CPUs. Big endian support is not tested.
 
+ARM ``raspi0``, ``raspi1ap``, ``raspi2b``, ``raspi3ap``, ``raspi3b`` and 
``raspi4b`` machines (since 10.0)
+''
+
+The Raspberry Pi machines have been unified under the generic ``raspi`` 
machine,
+which takes the model as argument.
+
+- `raspi0`` is now an alias for ``raspi,model=Zero``
+- `raspi1ap`` is now an alias for ``raspi,model=1A+``
+- `raspi2b`` is now an alias for ``raspi,model=2B``
+- `raspi3ap`` is now an alias for ``raspi,model=3A+``
+- `raspi3b`` is now an alias for ``raspi,model=3B``
+- `raspi4b`` is now an alias for ``raspi,model=4B``
+
 Backend options
 ---
 
diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index 8cae1ff6f93..86ecc988e06 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -637,6 +637,7 @@ static void raspi0_machine_class_init(ObjectClass *oc, void 
*data)
 
 rmc->board_rev = 0x920092; /* Revision 1.2 */
 raspi_machine_class_init(mc, rmc->board_rev);
+mc->deprecation_reason = "-M raspi,model=Zero";
 };
 
 static void raspi1ap_machine_class_init(ObjectClass *oc, void *data)
@@ -646,6 +647,7 @@ static void raspi1ap_machine_class_init(ObjectClass *oc, 
void *data)
 
 rmc->board_rev = 0x900021; /* Revision 1.1 */
 raspi_machine_class_init(mc, rmc->board_rev);
+mc->deprecation_reason = "-M raspi,model=A+ (-m 512m)";
 };
 
 static void raspi2b_machine_class_init(ObjectClass *oc, void *data)
@@ -655,6 +657,7 @@ static void raspi2b_machine_class_init(ObjectClass *oc, 
void *data)
 
 rmc->board_rev = 0xa21041;
 raspi_machine_class_init(mc, rmc->board_rev);
+mc->deprecation_reason = "-M raspi,model=2B -m 1g";
 };
 
 #ifdef TARGET_AARCH64
@@ -665,6 +668,7 @@ static void raspi3ap_machine_class_init(ObjectClass *oc, 
void *data)
 
 rmc->board_rev = 0x9020e0; /* Revision 1.0 */
 raspi_machine_class_init(mc, rmc->board_rev);
+mc->deprecation_reason = "-M raspi,model=3A+ -m 512m";
 };
 
 static void raspi3b_machine_class_init(ObjectClass *oc, void *data)
@@ -674,6 +678,7 @@ static void raspi3b_machine_class_init(ObjectClass *oc, 
void *data)
 
 rmc->board_rev = 0xa02082;
 raspi_machine_class_init(mc, rmc->board_rev);
+mc->deprecation_reason = "-M raspi,model=3B -m 1g";
 };
 
 static void raspi4b_machine_class_init(ObjectClass *oc, void *data)
diff --git a/tests/qtest/bcm2835-dma-test.c b/tests/qtest/bcm2835-dma-test.c
index 18901b76d21..705e6b2362b 100644
--- a/tests/qtest/bcm2835-dma-test.c
+++ b/tests/qtest/bcm2835-dma-test.c
@@ -111,7 +111,7 @@ int main(int argc, char **argv)
 g_test_init(&argc, &argv, NULL);
 qtest_add_func("/bcm2835/dma/test_interrupts",
bcm2835_dma_test_interrupts);
-qtest_start("-machine raspi3b");
+qtest_start("-machine raspi,model=3B -m 1g");
 ret = g_test_run();
 qtest_end();
 return ret;
diff --git a/tests/qtest/bcm2835-i2c-test.c b/tests/qtest/bcm2835-i2c-test.c
index 15991949260..15904abf393 100644
--- a/tests/qtest/bcm2835-i2c-test.c
+++ b/tests/qtest/bcm2835-i2c-test.c
@@ -104,7 +104,7 @@ int main(int argc, char **argv)
 }
 
 /* Run I2C tests with TMP105 slaves on all three buses */
-qtest_start("-M raspi3b "
+qtest_start("-M raspi,model=3B -m 1g "
 "-device tmp105,address=0x50,bus=i2c-bus.0 "
 "-device tmp105,address=0x50,bus=i2c-bus.1 "
 "-device tmp105,address=0x50,bus=i2c-bus.2");
diff --git a/tests/qtest/boot-serial-test.c b/tests/qtest/boot-serial-test.c
index a05d26ee996..fbafd73facb 100644
--- a/tests/qtest/boot-serial-test.c
+++ b/tests/qtest/boot-serial-test.c
@@ -188,7 +188,8 @@ static const testdef_t tests[] = {
   size

Re: [RFC PATCH v12 qemu 2/2] qtest/cxl: Add aarch64 virt test for CXL

2025-02-03 Thread Itaru Kitayama
Jonathan,

> On Feb 4, 2025, at 2:30, Jonathan Cameron  wrote:
> 
> Add a single complex case for aarch64 virt machine.
> Given existing much more comprehensive tests for x86 cover the
> common functionality, a single test should be enough to verify
> that the aarch64 part continue to work.
> 
> Signed-off-by: Jonathan Cameron 
> ---
> tests/qtest/cxl-test.c  | 59 -
> tests/qtest/meson.build |  1 +
> 2 files changed, 47 insertions(+), 13 deletions(-)
> 
> diff --git a/tests/qtest/cxl-test.c b/tests/qtest/cxl-test.c
> index a600331843..c7189d6222 100644
> --- a/tests/qtest/cxl-test.c
> +++ b/tests/qtest/cxl-test.c
> @@ -19,6 +19,12 @@
> "-device pxb-cxl,id=cxl.1,bus=pcie.0,bus_nr=53 " \
> "-M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.targets.1=cxl.1,cxl-fmw.0.size=4G 
> "
> 
> +#define QEMU_VIRT_2PXB_CMD \
> +"-machine virt,cxl=on -cpu max " \
> +"-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 " \
> +"-device pxb-cxl,id=cxl.1,bus=pcie.0,bus_nr=53 " \
> +"-M 
> cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.targets.1=cxl.1,cxl-fmw.0.size=4G "
> +
> #define QEMU_RP \
> "-device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 "
> 
> @@ -197,25 +203,52 @@ static void cxl_2pxb_4rp_4t3d(void)
> qtest_end();
> rmdir(tmpfs);
> }
> +
> +static void cxl_virt_2pxb_4rp_4t3d(void)
> +{
> +g_autoptr(GString) cmdline = g_string_new(NULL);
> +char template[] = "/tmp/cxl-test-XX";
> +const char *tmpfs;
> +
> +tmpfs = mkdtemp(template);
> +
> +g_string_printf(cmdline, QEMU_VIRT_2PXB_CMD QEMU_4RP QEMU_4T3D,
> +tmpfs, tmpfs, tmpfs, tmpfs, tmpfs, tmpfs,
> +tmpfs, tmpfs);
> +
> +qtest_start(cmdline->str);
> +qtest_end();
> +rmdir(tmpfs);
> +}
> #endif /* CONFIG_POSIX */
> 
> int main(int argc, char **argv)
> {
> -g_test_init(&argc, &argv, NULL);
> +const char *arch = qtest_get_arch();
> 
> -qtest_add_func("/pci/cxl/basic_hostbridge", cxl_basic_hb);
> -qtest_add_func("/pci/cxl/basic_pxb", cxl_basic_pxb);
> -qtest_add_func("/pci/cxl/pxb_with_window", cxl_pxb_with_window);
> -qtest_add_func("/pci/cxl/pxb_x2_with_window", cxl_2pxb_with_window);
> -qtest_add_func("/pci/cxl/rp", cxl_root_port);
> -qtest_add_func("/pci/cxl/rp_x2", cxl_2root_port);
> +g_test_init(&argc, &argv, NULL);
> +if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
> +qtest_add_func("/pci/cxl/basic_hostbridge", cxl_basic_hb);
> +qtest_add_func("/pci/cxl/basic_pxb", cxl_basic_pxb);
> +qtest_add_func("/pci/cxl/pxb_with_window", cxl_pxb_with_window);
> +qtest_add_func("/pci/cxl/pxb_x2_with_window", cxl_2pxb_with_window);
> +qtest_add_func("/pci/cxl/rp", cxl_root_port);
> +qtest_add_func("/pci/cxl/rp_x2", cxl_2root_port);
> #ifdef CONFIG_POSIX
> -qtest_add_func("/pci/cxl/type3_device", cxl_t3d_deprecated);
> -qtest_add_func("/pci/cxl/type3_device_pmem", cxl_t3d_persistent);
> -qtest_add_func("/pci/cxl/type3_device_vmem", cxl_t3d_volatile);
> -qtest_add_func("/pci/cxl/type3_device_vmem_lsa", cxl_t3d_volatile_lsa);
> -qtest_add_func("/pci/cxl/rp_x2_type3_x2", cxl_1pxb_2rp_2t3d);
> -qtest_add_func("/pci/cxl/pxb_x2_root_port_x4_type3_x4", 
> cxl_2pxb_4rp_4t3d);
> +qtest_add_func("/pci/cxl/type3_device", cxl_t3d_deprecated);
> +qtest_add_func("/pci/cxl/type3_device_pmem", cxl_t3d_persistent);
> +qtest_add_func("/pci/cxl/type3_device_vmem", cxl_t3d_volatile);
> +qtest_add_func("/pci/cxl/type3_device_vmem_lsa", 
> cxl_t3d_volatile_lsa);
> +qtest_add_func("/pci/cxl/rp_x2_type3_x2", cxl_1pxb_2rp_2t3d);
> +qtest_add_func("/pci/cxl/pxb_x2_root_port_x4_type3_x4",
> +   cxl_2pxb_4rp_4t3d);
> #endif
> +} else if (strcmp(arch, "aarch64") == 0) {
> +#ifdef CONFIG_POSIX
> +qtest_add_func("/pci/cxl/virt/pxb_x2_root_port_x4_type3_x4",
> +   cxl_virt_2pxb_4rp_4t3d);
> +#endif
> +}
> +
> return g_test_run();
> }
> diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
> index e60e92fe9d..f5e7fb060e 100644
> --- a/tests/qtest/meson.build
> +++ b/tests/qtest/meson.build
> @@ -257,6 +257,7 @@ qtests_aarch64 = \
>   (config_all_accel.has_key('CONFIG_TCG') and 
>\
>config_all_devices.has_key('CONFIG_TPM_TIS_I2C') ? ['tpm-tis-i2c-test'] : 
> []) + \
>   (config_all_devices.has_key('CONFIG_ASPEED_SOC') ? qtests_aspeed64 : []) + \
> +  qtests_cxl +   
>\
>   ['arm-cpu-features',
>'numa-test',
>'boot-serial-test',
> -- 
> 2.43.0
> 

In Ubuntu 22.04 LTS, cxl-test applied on top of today’s QEMU upstream master 
branch cxl-test fails:

$ ./tests/qtest/cxl-test
# random seed: R02S2a8b02df7b32b79d086ce22f7f8ebeab
1..1
# Start of aarch64 tests
# Start of pci tests
# Start of cxl tests
# Start of virt tests
# s

Re: [PATCH v3 06/26] target/arm/kvm-rme: Initialize vCPU

2025-02-03 Thread Gavin Shan

On 11/26/24 5:56 AM, Jean-Philippe Brucker wrote:

The target code calls kvm_arm_vcpu_init() to mark the vCPU as part of a
Realm. For a Realm vCPU, only x0-x7 can be set at runtime. Before boot,
the PC can also be set, and is ignored at runtime. KVM also accepts a
few system register changes during initial configuration, as returned by
KVM_GET_REG_LIST.

Signed-off-by: Jean-Philippe Brucker 
---
  target/arm/cpu.h |  3 +++
  target/arm/kvm_arm.h | 15 +++
  target/arm/kvm-rme.c | 10 
  target/arm/kvm.c | 61 
  4 files changed, 89 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index d86e641280..f617591921 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -961,6 +961,9 @@ struct ArchCPU {
  OnOffAuto kvm_steal_time;
  #endif /* CONFIG_KVM */
  
+/* Realm Management Extension */

+bool kvm_rme;
+
  /* Uniprocessor system with MP extensions */
  bool mp_is_up;
  
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h

index 9d6a89f9b1..8b52a881b0 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -245,6 +245,16 @@ int kvm_arm_rme_init(MachineState *ms);
   */
  int kvm_arm_rme_vm_type(MachineState *ms);
  
+/**

+ * kvm_arm_rme_vcpu_init
+ * @cs: the CPU
+ *
+ * If the user requested a Realm, setup the given vCPU accordingly. Realm vCPUs
+ * behave a little differently, for example most of their register state is
+ * hidden from the host.
+ */
+int kvm_arm_rme_vcpu_init(CPUState *cs);
+
  #else
  
  /*

@@ -339,6 +349,11 @@ static inline int kvm_arm_rme_vm_type(MachineState *ms)
  g_assert_not_reached();
  }
  
+static inline int kvm_arm_rme_vcpu_init(CPUState *cs)

+{
+g_assert_not_reached();
+}
+
  #endif
  
  #endif

diff --git a/target/arm/kvm-rme.c b/target/arm/kvm-rme.c
index 60d967a842..e3cc37538a 100644
--- a/target/arm/kvm-rme.c
+++ b/target/arm/kvm-rme.c
@@ -137,6 +137,16 @@ int kvm_arm_rme_init(MachineState *ms)
  return 0;
  }
  
+int kvm_arm_rme_vcpu_init(CPUState *cs)

+{
+ARMCPU *cpu = ARM_CPU(cs);
+
+if (rme_guest) {
+cpu->kvm_rme = true;
+}
+return 0;
+}
+
  int kvm_arm_rme_vm_type(MachineState *ms)
  {
  if (rme_guest) {
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 0c80992f7c..a0de2efc41 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -1926,6 +1926,11 @@ int kvm_arch_init_vcpu(CPUState *cs)
  return ret;
  }
  
+ret = kvm_arm_rme_vcpu_init(cs);

+if (ret) {
+return ret;
+}
+
  if (cpu_isar_feature(aa64_sve, cpu)) {
  ret = kvm_arm_sve_set_vls(cpu);
  if (ret) {
@@ -2062,6 +2067,35 @@ static int kvm_arch_put_sve(CPUState *cs)
  return 0;
  }
  
+static int kvm_arm_rme_put_core_regs(CPUState *cs, Error **errp)

+{
+int i, ret;
+struct kvm_one_reg reg;
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+
+/*
+ * The RME ABI only allows us to set 8 GPRs and the PC
+ */


Needn't to span for multiple lines.


+for (i = 0; i < 8; i++) {
+reg.id = AARCH64_CORE_REG(regs.regs[i]);
+reg.addr = (uintptr_t) &env->xregs[i];
+ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®);
+if (ret) {
+return ret;
+}
+}
+
+reg.id = AARCH64_CORE_REG(regs.pc);
+reg.addr = (uintptr_t) &env->pc;
+ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®);
+if (ret) {
+return ret;
+}
+
+return 0;
+}
+


Nice place to use kvm_set_one_reg(). With it, @reg can be dropped.


  static int kvm_arm_put_core_regs(CPUState *cs, int level, Error **errp)
  {
  uint64_t val;
@@ -2072,6 +2106,10 @@ static int kvm_arm_put_core_regs(CPUState *cs, int 
level, Error **errp)
  ARMCPU *cpu = ARM_CPU(cs);
  CPUARMState *env = &cpu->env;
  
+if (cpu->kvm_rme) {

+return kvm_arm_rme_put_core_regs(cs, errp);
+}
+
  /* If we are in AArch32 mode then we need to copy the AArch32 regs to the
   * AArch64 registers before pushing them out to 64-bit KVM.
   */
@@ -2259,6 +2297,25 @@ static int kvm_arch_get_sve(CPUState *cs)
  return 0;
  }
  
+static int kvm_arm_rme_get_core_regs(CPUState *cs, Error **errp)

+{
+int i, ret;
+struct kvm_one_reg reg;
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+
+for (i = 0; i < 8; i++) {
+reg.id = AARCH64_CORE_REG(regs.regs[i]);
+reg.addr = (uintptr_t) &env->xregs[i];
+ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, ®);
+if (ret) {
+return ret;
+}
+}
+
+return 0;
+}
+


Similiarly, kvm_get_one_reg() can be used.


  static int kvm_arm_get_core_regs(CPUState *cs, Error **errp)
  {
  uint64_t val;
@@ -2269,6 +2326,10 @@ static int kvm_arm_get_core_regs(CPUState *cs, Error 
**errp)
  ARMCPU *cpu = ARM_CPU(cs);
  CPUARMState *env = &cpu->env;
  
+if (cpu->kvm_rme) {

+return kvm_arm_rme_get_core_regs(cs, errp);
+}
+
  fo

Re: [PATCH v2 11/15] block/export: Add option to allow export of inactive nodes

2025-02-03 Thread Stefan Hajnoczi
On Fri, Jan 31, 2025 at 10:50:47AM +0100, Kevin Wolf wrote:
> Add an option in BlockExportOptions to allow creating an export on an
> inactive node without activating the node. This mode needs to be
> explicitly supported by the export type (so that it doesn't perform any
> operations that are forbidden for inactive nodes), so this patch alone
> doesn't allow this option to be successfully used yet.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  qapi/block-export.json | 10 +-
>  include/block/block-global-state.h |  3 +++
>  include/block/export.h |  3 +++
>  block.c|  4 
>  block/export/export.c  | 31 --
>  5 files changed, 40 insertions(+), 11 deletions(-)
> 
> diff --git a/qapi/block-export.json b/qapi/block-export.json
> index ce33fe378d..117b05d13c 100644
> --- a/qapi/block-export.json
> +++ b/qapi/block-export.json
> @@ -372,6 +372,13 @@
>  # cannot be moved to the iothread.  The default is false.
>  # (since: 5.2)
>  #
> +# @allow-inactive: If true, the export allows the exported node to be 
> inactive.
> +# If it is created for an inactive block node, the node remains 
> inactive. If
> +# the export type doesn't support running on an inactive node, an error 
> is
> +# returned. If false, inactive block nodes are automatically activated 
> before
> +# creating the export and trying to inactivate them later fails.
> +# (since: 10.0; default: false)

Exposing activation in the API is ugly but I don't see a cleaner option
given that we cannot change block-export-add's existing behavior of
activating the node by default. :(

Ideally block-export-add would not modify active/inactive and leave it
up to user to provide a node in the desired state.

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls

2025-02-03 Thread Peter Xu
On Mon, Feb 03, 2025 at 02:57:36PM +0100, Maciej S. Szmigiero wrote:
> On 2.02.2025 13:45, Dr. David Alan Gilbert wrote:
> > * Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote:
> > > On 2.02.2025 03:06, Dr. David Alan Gilbert wrote:
> > > > * Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote:
> > > > > From: "Maciej S. Szmigiero" 
> > > > > 
> > > > > postcopy_ram_listen_thread() is a free running thread, so it needs to
> > > > > take BQL around function calls to migration methods requiring BQL.
> > > > > 
> > > > > qemu_loadvm_state_main() needs BQL held since it ultimately calls
> > > > > "load_state" SaveVMHandlers.
> > > > > 
> > > > > migration_incoming_state_destroy() needs BQL held since it ultimately 
> > > > > calls
> > > > > "load_cleanup" SaveVMHandlers.
> > > > > 
> > > > > Signed-off-by: Maciej S. Szmigiero 
> > > > > ---
> > > > >migration/savevm.c | 4 
> > > > >1 file changed, 4 insertions(+)
> > > > > 
> > > > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > > > index b0b74140daea..0ceea9638cc1 100644
> > > > > --- a/migration/savevm.c
> > > > > +++ b/migration/savevm.c
> > > > > @@ -2013,7 +2013,9 @@ static void *postcopy_ram_listen_thread(void 
> > > > > *opaque)
> > > > > * in qemu_file, and thus we must be blocking now.
> > > > > */
> > > > >qemu_file_set_blocking(f, true);
> > > > > +bql_lock();
> > > > >load_res = qemu_loadvm_state_main(f, mis);
> > > > > +bql_unlock();
> > > > 
> > > > Doesn't that leave that held for a heck of a long time?
> > > 
> > > Yes, and it effectively broke "postcopy recover" test but I
> > > think the reason for that is qemu_loadvm_state_main() and
> > > its children don't drop BQL while waiting for I/O.
> > > 
> > > I've described this case in more detail in my reply to Fabiano here:
> > > https://lore.kernel.org/qemu-devel/0a09e627-955e-4f26-8d08-0192ecd25...@maciej.szmigiero.name/
> > 
> > While it might be the cause in this case, my feeling is it's more 
> > fundamental
> > here - it's the whole reason that postcopy has a separate ram listen
> > thread.  As the destination is running, after it loads it's devices
> > and as it starts up the destination will be still loading RAM
> > (and other postcopiable devices) potentially for quite a while.
> > Holding the bql around the ram listen thread means that the
> > execution of the destination won't be able to take that lock
> > until the postcopy load has finished; so while that might apparently
> > complete, it'll lead to the destination stalling until that's finished
> > which defeats the whole point of postcopy.
> > That last one probably won't fail a test but it will lead to a long stall
> > if you give it a nice big guest with lots of RAM that it's rapidly
> > changing.
> 
> Okay, I understand the postcopy case/flow now.
> Thanks for explaining it clearly.
> 
> > > I still think that "load_state" SaveVMHandlers need to be called
> > > with BQL held since implementations apparently expect it that way:
> > > for example, I think PCI device configuration restore calls
> > > address space manipulation methods which abort() if called
> > > without BQL held.
> > 
> > However, the only devices that *should* be arriving on the channel
> > that the postcopy_ram_listen_thread is reading from are those
> > that are postcopiable (i.e. RAM and hmm block's dirty_bitmap).
> > Those load handlers are safe to be run while the other devices
> > are being changed.   Note the *should* - you could add a check
> > to fail if any other device arrives on that channel.
> 
> I think ultimately there should be either an explicit check, or,
> as you suggest in the paragraph below, a separate SaveVMHandler
> that runs without BQL held.

To me those are bugs happening during postcopy, so those abort()s in
memory.c are indeed for catching these issues too.

> Since the current state of just running these SaveVMHandlers
> without BQL in this case and hoping that nothing breaks is
> clearly sub-optimal.
> 
> > > I have previously even submitted a patch to explicitly document
> > > "load_state" SaveVMHandler as requiring BQL (which was also
> > > included in the previous version of this patch set) and it
> > > received a "Reviewed-by:" tag:
> > > https://lore.kernel.org/qemu-devel/6976f129df610c8207da4e531c8c0475ec204fa4.1730203967.git.maciej.szmigi...@oracle.com/
> > > https://lore.kernel.org/qemu-devel/e1949839932efaa531e2fe63ac13324e5787439c.1731773021.git.maciej.szmigi...@oracle.com/
> > > https://lore.kernel.org/qemu-devel/87o732bti7@suse.de/
> > 
> > It happens!
> > You could make this safer by having a load_state and a load_state_postcopy
> > member, and only mark the load_state as requiring the lock.
> 
> To not digress too much from the subject of this patch set
> (multifd VFIO device state transfer) for now I've just updated the
> TODO comment around that qemu_loadvm_state_main(), so hopefully this
> discussion won't get forgotten:
> https://gitlab

Re: [PATCH v4 13/33] migration/multifd: Device state transfer support - receive side

2025-02-03 Thread Peter Xu
On Thu, Jan 30, 2025 at 11:08:34AM +0100, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" 
> 
> Add a basic support for receiving device state via multifd channels -
> channels that are shared with RAM transfers.
> 
> Depending whether MULTIFD_FLAG_DEVICE_STATE flag is present or not in the
> packet header either device state (MultiFDPacketDeviceState_t) or RAM
> data (existing MultiFDPacket_t) is read.
> 
> The received device state data is provided to
> qemu_loadvm_load_state_buffer() function for processing in the
> device's load_state_buffer handler.
> 
> Signed-off-by: Maciej S. Szmigiero 

I think I acked this one.  You could keep my R-b if...

[...]

> diff --git a/migration/multifd.h b/migration/multifd.h
> index 9e4baa066312..abf3acdcee40 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -62,6 +62,12 @@ MultiFDRecvData *multifd_get_recv_data(void);
>  #define MULTIFD_FLAG_UADK (8 << 1)
>  #define MULTIFD_FLAG_QATZIP (16 << 1)
>  
> +/*
> + * If set it means that this packet contains device state
> + * (MultiFDPacketDeviceState_t), not RAM data (MultiFDPacket_t).
> + */
> +#define MULTIFD_FLAG_DEVICE_STATE (1 << 6)

... if this won't conflict with MULTIFD_FLAG_QATZIP.

I think we should stick with one way to write it, then when rebase you can
see such conflicts - either your patch uses 32 << 1, or perhaps we should
start to switch to BIT() for all above instead..

-- 
Peter Xu




RE: [PATCH v5 13/17] aspeed/soc: Add AST2700 support

2025-02-03 Thread Jamin Lin
Hi Philippe,

> From: Philippe Mathieu-Daudé 
> Sent: Tuesday, February 4, 2025 12:41 AM
> To: Jamin Lin ; Cédric Le Goater ;
> Peter Maydell ; Andrew Jeffery
> ; Joel Stanley ; Alistair
> Francis ; Cleber Rosa ; Wainer
> dos Santos Moschetta ; Beraldo Leal
> ; open list:ASPEED BMCs ; open
> list:All patches CC here ; Jinjie Ruan
> 
> Cc: Troy Lee ; Yunlin Tang
> 
> Subject: Re: [PATCH v5 13/17] aspeed/soc: Add AST2700 support
> 
> On 3/2/25 08:43, Jamin Lin wrote:
> > Hi Philippe,
> >
> >> From: Jamin Lin
> >> Sent: Monday, February 3, 2025 3:29 PM
> >> To: Philippe Mathieu-Daudé ; Cédric Le Goater
> >> ; Peter Maydell ; Andrew
> >> Jeffery ; Joel Stanley ;
> >> Alistair Francis ; Cleber Rosa
> >> ; Wainer dos Santos Moschetta
> >> ; Beraldo Leal ; open
> >> list:ASPEED BMCs ; open list:All patches CC here
> >> ; Jinjie Ruan 
> >> Cc: Troy Lee ; Yunlin Tang
> >> 
> >> Subject: RE: [PATCH v5 13/17] aspeed/soc: Add AST2700 support
> >>
> >> Hi Philippe,
> >>
> >>> From: Philippe Mathieu-Daudé 
> >>> Sent: Thursday, January 30, 2025 11:14 PM
> >>> To: Jamin Lin ; Cédric Le Goater
> >>> ; Peter Maydell ; Andrew
> >>> Jeffery ; Joel Stanley
> >>> ; Alistair Francis ; Cleber
> >>> Rosa ; Wainer dos Santos Moschetta
> >> ;
> >>> Beraldo Leal ; open list:ASPEED BMCs
> >>> ; open list:All patches CC here
> >>> ; Jinjie Ruan 
> >>> Cc: Troy Lee ; Yunlin Tang
> >>> 
> >>> Subject: Re: [PATCH v5 13/17] aspeed/soc: Add AST2700 support
> >>>
> >>> Hi Jamin,
> >>>
> >>> On 4/6/24 07:44, Jamin Lin wrote:
>  Initial definitions for a simple machine using an AST2700 SOC
>  (Cortex-a35
> >>> CPU).
> 
>  AST2700 SOC and its interrupt controller are too complex to handle
>  in the common Aspeed SoC framework. We introduce a new ast2700
>  class with instance_init and realize handlers.
> 
>  AST2700 is a 64 bits quad core cpus and support 8 watchdog.
>  Update maximum ASPEED_CPUS_NUM to 4 and ASPEED_WDTS_NUM to
> 8.
>  In addition, update AspeedSocState to support scuio, sli, sliio and intc.
> 
>  Add TYPE_ASPEED27X0_SOC machine type.
> 
>  The SDMC controller is unlocked at SPL stage.
>  At present, only supports to emulate booting start from u-boot stage.
>  Set SDMC controller unlocked by default.
> 
>  In INTC, each interrupt of INT 128 to INT 136 combines 32 interrupts.
>  It connect GICINT IRQ GPIO-OUTPUT pins to GIC device with irq 128 to
> 136.
>  And, if a device irq is 128 to 136, its irq GPIO-OUTPUT pin is
>  connected to GICINT or-gates instead of GIC device.
> 
>  Signed-off-by: Troy Lee 
>  Signed-off-by: Jamin Lin 
>  ---
> hw/arm/aspeed_ast27x0.c | 563
> >>> 
> hw/arm/meson.build  |   1 +
> include/hw/arm/aspeed_soc.h |  28 +-
> 3 files changed, 590 insertions(+), 2 deletions(-)
> create mode 100644 hw/arm/aspeed_ast27x0.c
> >>>
> >>>
>  +static bool aspeed_soc_ast2700_gic_realize(DeviceState *dev, Error
>  +**errp) {
>  +Aspeed27x0SoCState *a = ASPEED27X0_SOC(dev);
>  +AspeedSoCState *s = ASPEED_SOC(dev);
>  +AspeedSoCClass *sc = ASPEED_SOC_GET_CLASS(s);
>  +SysBusDevice *gicbusdev;
>  +DeviceState *gicdev;
>  +QList *redist_region_count;
>  +int i;
>  +
>  +gicbusdev = SYS_BUS_DEVICE(&a->gic);
>  +gicdev = DEVICE(&a->gic);
>  +qdev_prop_set_uint32(gicdev, "revision", 3);
>  +qdev_prop_set_uint32(gicdev, "num-cpu", sc->num_cpus);
>  +qdev_prop_set_uint32(gicdev, "num-irq", AST2700_MAX_IRQ);
>  +
>  +redist_region_count = qlist_new();
>  +qlist_append_int(redist_region_count, sc->num_cpus);
>  +qdev_prop_set_array(gicdev, "redist-region-count",
>  + redist_region_count);
>  +
>  +if (!sysbus_realize(gicbusdev, errp)) {
>  +return false;
>  +}
>  +sysbus_mmio_map(gicbusdev, 0,
> sc->memmap[ASPEED_GIC_DIST]);
>  +sysbus_mmio_map(gicbusdev, 1,
> >> sc->memmap[ASPEED_GIC_REDIST]);
>  +
>  +for (i = 0; i < sc->num_cpus; i++) {
>  +DeviceState *cpudev = DEVICE(&a->cpu[i]);
>  +int NUM_IRQS = 256, ARCH_GIC_MAINT_IRQ = 9,
> >>> VIRTUAL_PMU_IRQ = 7;
>  +int ppibase = NUM_IRQS + i * GIC_INTERNAL + GIC_NR_SGIS;
>  +
>  +const int timer_irq[] = {
>  +[GTIMER_PHYS] = 14,
>  +[GTIMER_VIRT] = 11,
>  +[GTIMER_HYP]  = 10,
>  +[GTIMER_SEC]  = 13,
>  +};
>  +int j;
>  +
>  +for (j = 0; j < ARRAY_SIZE(timer_irq); j++) {
>  +qdev_connect_gpio_out(cpudev, j,
>  +qdev_get_gpio_in(gicdev, ppibase +
> >> timer_irq[j]));
>  +}
>  +
>  +qemu_irq irq = qdev_get_gpio_in(gicdev,
>  +ppibase +
> >>

Re: [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls

2025-02-03 Thread Maciej S. Szmigiero

On 3.02.2025 21:36, Peter Xu wrote:

On Mon, Feb 03, 2025 at 09:15:52PM +0100, Maciej S. Szmigiero wrote:

On 3.02.2025 20:58, Peter Xu wrote:

On Mon, Feb 03, 2025 at 02:57:36PM +0100, Maciej S. Szmigiero wrote:

On 2.02.2025 13:45, Dr. David Alan Gilbert wrote:

* Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote:

On 2.02.2025 03:06, Dr. David Alan Gilbert wrote:

* Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote:

From: "Maciej S. Szmigiero" 

postcopy_ram_listen_thread() is a free running thread, so it needs to
take BQL around function calls to migration methods requiring BQL.

qemu_loadvm_state_main() needs BQL held since it ultimately calls
"load_state" SaveVMHandlers.

migration_incoming_state_destroy() needs BQL held since it ultimately calls
"load_cleanup" SaveVMHandlers.

Signed-off-by: Maciej S. Szmigiero 
---
 migration/savevm.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index b0b74140daea..0ceea9638cc1 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2013,7 +2013,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
  * in qemu_file, and thus we must be blocking now.
  */
 qemu_file_set_blocking(f, true);
+bql_lock();
 load_res = qemu_loadvm_state_main(f, mis);
+bql_unlock();


Doesn't that leave that held for a heck of a long time?


Yes, and it effectively broke "postcopy recover" test but I
think the reason for that is qemu_loadvm_state_main() and
its children don't drop BQL while waiting for I/O.

I've described this case in more detail in my reply to Fabiano here:
https://lore.kernel.org/qemu-devel/0a09e627-955e-4f26-8d08-0192ecd25...@maciej.szmigiero.name/


While it might be the cause in this case, my feeling is it's more fundamental
here - it's the whole reason that postcopy has a separate ram listen
thread.  As the destination is running, after it loads it's devices
and as it starts up the destination will be still loading RAM
(and other postcopiable devices) potentially for quite a while.
Holding the bql around the ram listen thread means that the
execution of the destination won't be able to take that lock
until the postcopy load has finished; so while that might apparently
complete, it'll lead to the destination stalling until that's finished
which defeats the whole point of postcopy.
That last one probably won't fail a test but it will lead to a long stall
if you give it a nice big guest with lots of RAM that it's rapidly
changing.


Okay, I understand the postcopy case/flow now.
Thanks for explaining it clearly.


I still think that "load_state" SaveVMHandlers need to be called
with BQL held since implementations apparently expect it that way:
for example, I think PCI device configuration restore calls
address space manipulation methods which abort() if called
without BQL held.


However, the only devices that *should* be arriving on the channel
that the postcopy_ram_listen_thread is reading from are those
that are postcopiable (i.e. RAM and hmm block's dirty_bitmap).
Those load handlers are safe to be run while the other devices
are being changed.   Note the *should* - you could add a check
to fail if any other device arrives on that channel.


I think ultimately there should be either an explicit check, or,
as you suggest in the paragraph below, a separate SaveVMHandler
that runs without BQL held.


To me those are bugs happening during postcopy, so those abort()s in
memory.c are indeed for catching these issues too.


Since the current state of just running these SaveVMHandlers
without BQL in this case and hoping that nothing breaks is
clearly sub-optimal.


I have previously even submitted a patch to explicitly document
"load_state" SaveVMHandler as requiring BQL (which was also
included in the previous version of this patch set) and it
received a "Reviewed-by:" tag:
https://lore.kernel.org/qemu-devel/6976f129df610c8207da4e531c8c0475ec204fa4.1730203967.git.maciej.szmigi...@oracle.com/
https://lore.kernel.org/qemu-devel/e1949839932efaa531e2fe63ac13324e5787439c.1731773021.git.maciej.szmigi...@oracle.com/
https://lore.kernel.org/qemu-devel/87o732bti7@suse.de/


It happens!
You could make this safer by having a load_state and a load_state_postcopy
member, and only mark the load_state as requiring the lock.


To not digress too much from the subject of this patch set
(multifd VFIO device state transfer) for now I've just updated the
TODO comment around that qemu_loadvm_state_main(), so hopefully this
discussion won't get forgotten:
https://gitlab.com/maciejsszmigiero/qemu/-/commit/046e3deac5b1dbc406b3e9571f62468bd6743e79


The commit message may still need some touch ups, e.g.:

postcopy_ram_listen_thread() is a free running thread, so it needs to
take BQL around function calls to migration methods requiring BQL.


This sentence is still not correct, IMHO. As Dave explained, the ram load
thread is designed to run without BQL at least for t

Re: [PATCH v4 08/33] migration/multifd: Allow premature EOF on TLS incoming channels

2025-02-03 Thread Maciej S. Szmigiero

On 3.02.2025 21:20, Peter Xu wrote:

On Mon, Feb 03, 2025 at 07:53:00PM +0100, Maciej S. Szmigiero wrote:

On 3.02.2025 19:20, Peter Xu wrote:

On Thu, Jan 30, 2025 at 11:08:29AM +0100, Maciej S. Szmigiero wrote:

From: "Maciej S. Szmigiero" 

Multifd send channels are terminated by calling
qio_channel_shutdown(QIO_CHANNEL_SHUTDOWN_BOTH) in
multifd_send_terminate_threads(), which in the TLS case essentially
calls shutdown(SHUT_RDWR) on the underlying raw socket.

Unfortunately, this does not terminate the TLS session properly and
the receive side sees this as a GNUTLS_E_PREMATURE_TERMINATION error.

The only reason why this wasn't causing migration failures is because
the current migration code apparently does not check for migration
error being set after the end of the multifd receive process.

However, this will change soon so the multifd receive code has to be
prepared to not return an error on such premature TLS session EOF.
Use the newly introduced QIOChannelTLS method for that.

It's worth noting that even if the sender were to be changed to terminate
the TLS connection properly the receive side still needs to remain
compatible with older QEMU bit stream which does not do this.


If this is an existing bug, we could add a Fixes.


It is an existing issue but only uncovered by this patch set.

As far as I can see it was always there, so it would need some
thought where to point that Fixes tag.


If there's no way to trigger a real functional bug anyway, it's also ok we
omit the Fixes.


Two pure questions..

- What is the correct way to terminate the TLS session without this flag?


I guess one would need to call gnutls_bye() like in this GnuTLS example:
https://gitlab.com/gnutls/gnutls/-/blob/2b8c3e4c71ad380bbbffb32e6003b34ecad596e3/doc/examples/ex-client-anon.c#L102


- Why this is only needed by multifd sessions?


What uncovered the issue was switching the load threads to using
migrate_set_error() instead of their own result variable
(load_threads_ret) which you had requested during the previous
patch set version review:
https://lore.kernel.org/qemu-devel/Z1DbH5fwBaxtgrvH@x1n/

Turns out that the multifd receive code always returned
error in the TLS case, just nothing was previously checking for
that error presence.


What I was curious is whether this issue also exists for the main migration
channel when with tls, especially when e.g. multifd not enabled at all.  As
I don't see anywhere that qemu uses gnutls_bye() for any tls session.

I think it's a good to find that we overlooked this before.. and IMHO it's
always good we could fix this.

Does it mean we need proper gnutls_bye() somewhere?

If we need an explicit gnutls_bye(), then I wonder if that should be done
on the main channel as well.


That's a good question and looking at the code qemu_loadvm_state_main() exits
on receiving "QEMU_VM_EOF" section (that's different from receiving socket EOF)
and then optionally "QEMU_VM_VMDESCRIPTION" section is read with explicit size
in qemu_loadvm_state() - so still not until channel EOF.

Then I can't see anything else reading the channel until it is closed in
migration_incoming_state_destroy().

So most likely the main migration channel will never read far enough to
reach that GNUTLS_E_PREMATURE_TERMINATION error.


If we don't need gnutls_bye(), then should we always ignore pre-mature
termination of tls no matter if it's multifd or non-multifd channel (or
even a tls session that is not migration-related)?


So basically have this patch extended to calling
qio_channel_tls_set_premature_eof_okay() also on the main migration channel?


Thanks,


Thanks,
Maciej




Re: [PATCH v4 16/33] migration/multifd: Device state transfer support - send side

2025-02-03 Thread Peter Xu
On Thu, Jan 30, 2025 at 11:08:37AM +0100, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" 
> 
> A new function multifd_queue_device_state() is provided for device to queue
> its state for transmission via a multifd channel.
> 
> Signed-off-by: Maciej S. Szmigiero 

Reviewed-by: Peter Xu 

-- 
Peter Xu




Re: [PATCH v4 10/33] error: define g_autoptr() cleanup function for the Error type

2025-02-03 Thread Maciej S. Szmigiero

On 3.02.2025 22:13, Daniel P. Berrangé wrote:

On Thu, Jan 30, 2025 at 11:08:31AM +0100, Maciej S. Szmigiero wrote:

From: "Maciej S. Szmigiero" 

Automatic memory management helps avoid memory safety issues.

Signed-off-by: Maciej S. Szmigiero 
---
  include/qapi/error.h | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/include/qapi/error.h b/include/qapi/error.h
index 71f8fb2c50ee..649ec8f1b6a2 100644
--- a/include/qapi/error.h
+++ b/include/qapi/error.h
@@ -437,6 +437,8 @@ Error *error_copy(const Error *err);>

q   */

  void error_free(Error *err);
  
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(Error, error_free)

+


This has been rejected by Markus in the past when I proposed. See the
rationale at the time here:

   https://lists.nongnu.org/archive/html/qemu-devel/2024-07/msg05503.html


Thanks for the pointer, I wasn't expecting this change to be controversial.
 

If you want this, the commit message will need to explain the use
case and justify why the existing error usage patterns are insufficient.


In this case it's about giving received Error to migrate_set_error()
which does *not* take ownership of it.

And the reason why migrate_set_error() does not take ownership of
incoming Error is that it might have an Error already set in
MigrationState, in this case it simply ignores the passed Error
(almost like being a NOP in this case).

I don't know whether this is enough of a justification for introducing
g_autoptr(Error).
I'm happy to drop this commit and change it to manual memory management
instead if it is not.

@Markus, what's your opinion here?


With regards,
Daniel


Thanks,
Maciej




Re: [PATCH v4 08/33] migration/multifd: Allow premature EOF on TLS incoming channels

2025-02-03 Thread Peter Xu
On Mon, Feb 03, 2025 at 07:53:00PM +0100, Maciej S. Szmigiero wrote:
> On 3.02.2025 19:20, Peter Xu wrote:
> > On Thu, Jan 30, 2025 at 11:08:29AM +0100, Maciej S. Szmigiero wrote:
> > > From: "Maciej S. Szmigiero" 
> > > 
> > > Multifd send channels are terminated by calling
> > > qio_channel_shutdown(QIO_CHANNEL_SHUTDOWN_BOTH) in
> > > multifd_send_terminate_threads(), which in the TLS case essentially
> > > calls shutdown(SHUT_RDWR) on the underlying raw socket.
> > > 
> > > Unfortunately, this does not terminate the TLS session properly and
> > > the receive side sees this as a GNUTLS_E_PREMATURE_TERMINATION error.
> > > 
> > > The only reason why this wasn't causing migration failures is because
> > > the current migration code apparently does not check for migration
> > > error being set after the end of the multifd receive process.
> > > 
> > > However, this will change soon so the multifd receive code has to be
> > > prepared to not return an error on such premature TLS session EOF.
> > > Use the newly introduced QIOChannelTLS method for that.
> > > 
> > > It's worth noting that even if the sender were to be changed to terminate
> > > the TLS connection properly the receive side still needs to remain
> > > compatible with older QEMU bit stream which does not do this.
> > 
> > If this is an existing bug, we could add a Fixes.
> 
> It is an existing issue but only uncovered by this patch set.
> 
> As far as I can see it was always there, so it would need some
> thought where to point that Fixes tag.

If there's no way to trigger a real functional bug anyway, it's also ok we
omit the Fixes.

> > Two pure questions..
> > 
> >- What is the correct way to terminate the TLS session without this flag?
> 
> I guess one would need to call gnutls_bye() like in this GnuTLS example:
> https://gitlab.com/gnutls/gnutls/-/blob/2b8c3e4c71ad380bbbffb32e6003b34ecad596e3/doc/examples/ex-client-anon.c#L102
> 
> >- Why this is only needed by multifd sessions?
> 
> What uncovered the issue was switching the load threads to using
> migrate_set_error() instead of their own result variable
> (load_threads_ret) which you had requested during the previous
> patch set version review:
> https://lore.kernel.org/qemu-devel/Z1DbH5fwBaxtgrvH@x1n/
> 
> Turns out that the multifd receive code always returned
> error in the TLS case, just nothing was previously checking for
> that error presence.

What I was curious is whether this issue also exists for the main migration
channel when with tls, especially when e.g. multifd not enabled at all.  As
I don't see anywhere that qemu uses gnutls_bye() for any tls session.

I think it's a good to find that we overlooked this before.. and IMHO it's
always good we could fix this.

Does it mean we need proper gnutls_bye() somewhere?

If we need an explicit gnutls_bye(), then I wonder if that should be done
on the main channel as well.

If we don't need gnutls_bye(), then should we always ignore pre-mature
termination of tls no matter if it's multifd or non-multifd channel (or
even a tls session that is not migration-related)?

Thanks,

> 
> Another option would be to simply return to using
> load_threads_ret like the previous versions did and not
> experiment with touching global migration state because
> as we can see other places can unintentionally break.
> 
> If we go this route then these TLS EOF patches could be
> dropped.
> 
> > Thanks,
> > 
> 
> Thanks,
> Maciej
> 

-- 
Peter Xu




Re: [PATCH v2 14/15] iotests: Add qsd-migrate case

2025-02-03 Thread Kevin Wolf
Am 03.02.2025 um 20:35 hat Eric Blake geschrieben:
> On Fri, Jan 31, 2025 at 10:50:50AM +0100, Kevin Wolf wrote:
> > Test that it's possible to migrate a VM that uses an image on shared
> > storage through qemu-storage-daemon.
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> >  tests/qemu-iotests/tests/qsd-migrate | 132 +++
> >  tests/qemu-iotests/tests/qsd-migrate.out |  51 +
> >  2 files changed, 183 insertions(+)
> >  create mode 100755 tests/qemu-iotests/tests/qsd-migrate
> >  create mode 100644 tests/qemu-iotests/tests/qsd-migrate.out
> > 
> > diff --git a/tests/qemu-iotests/tests/qsd-migrate 
> > b/tests/qemu-iotests/tests/qsd-migrate
> > new file mode 100755
> > index 00..687bda6f93
> > --- /dev/null
> > +++ b/tests/qemu-iotests/tests/qsd-migrate
> > @@ -0,0 +1,132 @@
> > +#!/usr/bin/env python3
> > +# group: rw quick
> 
> > +
> > +with iotests.FilePath('disk.img') as path, \
> > + iotests.FilePath('nbd-src.sock', base_dir=iotests.sock_dir) as 
> > nbd_src, \
> > + iotests.FilePath('nbd-dst.sock', base_dir=iotests.sock_dir) as 
> > nbd_dst, \
> > + iotests.FilePath('migrate.sock', base_dir=iotests.sock_dir) as 
> > mig_sock, \
> > + iotests.VM(path_suffix="-src") as vm_src, \
> > + iotests.VM(path_suffix="-dst") as vm_dst:
> > +
> 
> > +
> > +iotests.log('\nTest I/O on the source')
> > +vm_src.hmp_qemu_io('virtio0/virtio-backend', 'write -P 0x11 0 4k',
> > +   use_log=True, qdev=True)
> > +vm_src.hmp_qemu_io('virtio0/virtio-backend', 'read -P 0x11 0 4k',
> > +   use_log=True, qdev=True)
> > +
> > +iotests.log('\nStarting migration...')
> 
> 
> Is it worth adding a test that qemu_io fails to write on the
> destination while it is inactive (to ensure we are properly rejecting
> modification of an inactive image)?

The problem with that is that the failure mode for qemu_io (which acts
as if it were a device, not an external interface) is an assertion
failure.

The other test (in patch 15) tests writes on the NBD export, which fails
gracefully.

> > +
> > +mig_caps = [
> > +{'capability': 'events', 'state': True},
> > +{'capability': 'pause-before-switchover', 'state': True},
> > +]
> > +vm_src.qmp_log('migrate-set-capabilities', capabilities=mig_caps)
> > +vm_dst.qmp_log('migrate-set-capabilities', capabilities=mig_caps)
> > +vm_src.qmp_log('migrate', uri=f'unix:{mig_sock}',
> > +   filters=[iotests.filter_qmp_testfiles])
> > +
> > +vm_src.event_wait('MIGRATION',
> > +  match={'data': {'status': 'pre-switchover'}})
> > +
> > +iotests.log('\nPre-switchover: Reconfigure QSD instances')
> > +
> > +iotests.log(qsd_src.qmp('blockdev-set-active', {'active': False}))
> > +iotests.log(qsd_dst.qmp('blockdev-set-active', {'active': True}))
> 
> Also, should you attempt a read on both src and dst while both sides
> are inactive, to prove that reads can take a snapshot in the middle of
> the handover?

I think this could be done without any problems.

Kevin




Re: [PATCH v4 10/33] error: define g_autoptr() cleanup function for the Error type

2025-02-03 Thread Daniel P . Berrangé
On Thu, Jan 30, 2025 at 11:08:31AM +0100, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" 
> 
> Automatic memory management helps avoid memory safety issues.
> 
> Signed-off-by: Maciej S. Szmigiero 
> ---
>  include/qapi/error.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/qapi/error.h b/include/qapi/error.h
> index 71f8fb2c50ee..649ec8f1b6a2 100644
> --- a/include/qapi/error.h
> +++ b/include/qapi/error.h
> @@ -437,6 +437,8 @@ Error *error_copy(const Error *err);>
q   */
>  void error_free(Error *err);
>  
> +G_DEFINE_AUTOPTR_CLEANUP_FUNC(Error, error_free)
> +

This has been rejected by Markus in the past when I proposed. See the
rationale at the time here:

  https://lists.nongnu.org/archive/html/qemu-devel/2024-07/msg05503.html

If you want this, the commit message will need to explain the use
case and justify why the existing error usage patterns are insufficient.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH V1 05/26] vfio/container: preserve descriptors

2025-02-03 Thread Steven Sistare

On 2/3/2025 12:48 PM, Cédric Le Goater wrote:

On 1/29/25 15:43, Steve Sistare wrote:

At vfio creation time, save the value of vfio container, group, and device
descriptors in CPR state.  On qemu restart, vfio_realize() finds and uses
the saved descriptors, and remembers the reused status for subsequent
patches.  The reused status is cleared when vmstate load finishes.

During reuse, device and iommu state is already configured, so operations
in vfio_realize that would modify the configuration, such as vfio ioctl's,
are skipped.  The result is that vfio_realize constructs qemu data
structures that reflect the current state of the device.

Signed-off-by: Steve Sistare 
---
  hw/vfio/container.c   | 105 ++
  hw/vfio/cpr-legacy.c  |  17 +++
  include/hw/vfio/vfio-common.h |   2 +
  3 files changed, 105 insertions(+), 19 deletions(-)

diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index a90ce6c..81d0ccc 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -31,6 +31,7 @@
  #include "system/reset.h"
  #include "trace.h"
  #include "qapi/error.h"
+#include "migration/cpr.h"
  #include "pci.h"
  VFIOGroupList vfio_group_list =
@@ -415,12 +416,28 @@ static bool vfio_set_iommu(int container_fd, int group_fd,
  }
  static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group,
-    Error **errp)
+    bool reused, Error **errp)


Please rename 'reused' to 'cpr_reused'. We should know what this parameter
is for and I don't see any other use than CPR.


Hi Cedric, glad to virtually meet you, and thanks for reviewing this.

There is no other notion of "reused" in qemu -- CPR is the first to introduce
it.  Thus "reused" is unambiguous, it always refers to CPR.  IMO shorter names
without underscores make the code more readable, as long as they are 
unambiguous.

Also, the "reused" identifier already appears in the initial series for
cpr-transfer, and to switch now to a different identifier leaves us with two
names for the same functionality.  Right now I can cscope "reused" and find
everything.

For those reasons, I prefer reused, but if you feel strongly, I will rename it.


  {
  int iommu_type;
  const char *vioc_name;
  VFIOContainer *container;
+    /*
+ * If container is reused, just set its type and skip the ioctls, as the
+ * container and group are already configured in the kernel.
+ * VFIO_TYPE1v2_IOMMU is the only type that supports reuse/cpr.
+ */
+    if (reused) {
+    if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) {
+    iommu_type = VFIO_TYPE1v2_IOMMU;
+    goto skip_iommu;
+    } else {
+    error_setg(errp, "container was reused but VFIO_TYPE1v2_IOMMU "
+ "is not supported");
+    return NULL;
+    }
+    }
+


Can we use 'iommu_type' below instead and avoid VFIO_CHECK_EXTENSION
ioctl ? and then set the iommu unless CPR reused is set.


Sure, I'll mke that change.


  iommu_type = vfio_get_iommu_type(fd, errp);
  if (iommu_type < 0) {
  return NULL;
@@ -430,10 +447,12 @@ static VFIOContainer *vfio_create_container(int fd, 
VFIOGroup *group,
  return NULL;
  }
+skip_iommu:


I think we can avoid this 'skip_iommu' label with some minor refactoring.


  vioc_name = vfio_get_iommu_class_name(iommu_type);
  container = VFIO_IOMMU_LEGACY(object_new(vioc_name));
  container->fd = fd;
+    container->reused = reused;
  container->iommu_type = iommu_type;
  return container;
  }
@@ -543,10 +562,13 @@ static bool vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
  VFIOContainer *container;
  VFIOContainerBase *bcontainer;
  int ret, fd;
+    bool reused;


cpr_reused.


  VFIOAddressSpace *space;
  VFIOIOMMUClass *vioc;
  space = vfio_get_address_space(as);
+    fd = cpr_find_fd("vfio_container_for_group", group->groupid);
+    reused = (fd > 0);



hmm, so we are deducing from the existence of a CprFd state element
that we are doing a live update of the VM.  This seems to me to be a
somewhat quick heuristic.

Isn't there a global helper ? Isn't the VM aware that it's being
restarted after a live update ? I am not familiar with the CPR
sequence.


There is a global mode that can be checked, but we would still need to
fetch the fd.  Checking the fd alone yields tighter code.  It also seems
perfectly logical to me when reading the code.  Can't find the cpr fd?
Then we are not doing cpr.  BTW, it is not heuristic.  The cpr fd exists
at creation time iff we are doing cpr.


  /*
   * VFIO is currently incompatible with discarding of RAM insofar as the
@@ -579,28 +601,52 @@ static bool vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
   * details once we know which type of IOMMU we are using.
   */
+    /*
+ * If the container is reused, 

Re: [PATCH V1 04/26] vfio/container: register container for cpr

2025-02-03 Thread Steven Sistare

On 2/3/2025 12:01 PM, Cédric Le Goater wrote:

On 1/29/25 15:43, Steve Sistare wrote:

Register a legacy container for cpr-transfer.  Add a blocker if the kernel
does not support VFIO_UPDATE_VADDR or VFIO_UNMAP_ALL.

This is mostly boiler plate.  The fields to to saved and restored are added
in subsequent patches.

Signed-off-by: Steve Sistare 
---
  hw/vfio/container.c   |  6 ++--
  hw/vfio/cpr-legacy.c  | 68 +++
  hw/vfio/meson.build   |  3 +-
  include/hw/vfio/vfio-common.h |  3 ++
  4 files changed, 76 insertions(+), 4 deletions(-)
  create mode 100644 hw/vfio/cpr-legacy.c

diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 4ebb526..a90ce6c 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -618,7 +618,7 @@ static bool vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
  }
  bcontainer = &container->bcontainer;
-    if (!vfio_cpr_register_container(bcontainer, errp)) {
+    if (!vfio_legacy_cpr_register_container(container, errp)) {
  goto free_container_exit;
  }
@@ -666,7 +666,7 @@ enable_discards_exit:
  vfio_ram_block_discard_disable(container, false);
  unregister_container_exit:
-    vfio_cpr_unregister_container(bcontainer);
+    vfio_legacy_cpr_unregister_container(container);
  free_container_exit:
  object_unref(container);
@@ -710,7 +710,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
  VFIOAddressSpace *space = bcontainer->space;
  trace_vfio_disconnect_container(container->fd);
-    vfio_cpr_unregister_container(bcontainer);
+    vfio_legacy_cpr_unregister_container(container);
  close(container->fd);
  object_unref(container);
diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
new file mode 100644
index 000..d3bbc05
--- /dev/null
+++ b/hw/vfio/cpr-legacy.c
@@ -0,0 +1,68 @@
+/*
+ * Copyright (c) 2021-2025 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include 
+#include "qemu/osdep.h"
+#include "hw/vfio/vfio-common.h"
+#include "migration/blocker.h"
+#include "migration/cpr.h"
+#include "migration/migration.h"
+#include "migration/vmstate.h"
+#include "qapi/error.h"
+
+static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
+{
+    if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UPDATE_VADDR)) {
+    error_setg(errp, "VFIO container does not support VFIO_UPDATE_VADDR");
+    return false;
+
+    } else if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UNMAP_ALL)) {
+    error_setg(errp, "VFIO container does not support VFIO_UNMAP_ALL");
+    return false;
+
+    } else {
+    return true;
+    }
+}
+
+static const VMStateDescription vfio_container_vmstate = {
+    .name = "vfio-container",
+    .version_id = 0,
+    .minimum_version_id = 0,
+    .needed = cpr_needed_for_reuse,
+    .fields = (VMStateField[]) {
+    VMSTATE_END_OF_LIST()
+    }
+};
+
+bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
+{
+    VFIOContainerBase *bcontainer = &container->bcontainer;
+    Error **cpr_blocker = &container->cpr_blocker;
+
+    if (!vfio_cpr_register_container(bcontainer, errp)) {
+    return false;
+    }
+
+    if (!vfio_cpr_supported(container, cpr_blocker)) {
+    return migrate_add_blocker_modes(cpr_blocker, errp,
+ MIG_MODE_CPR_TRANSFER, -1) == 0;
+    }
+
+    vmstate_register(NULL, -1, &vfio_container_vmstate, container);
+
+    return true;
+}
+
+void vfio_legacy_cpr_unregister_container(VFIOContainer *container)
+{
+    VFIOContainerBase *bcontainer = &container->bcontainer;
+
+    vfio_cpr_unregister_container(bcontainer);
+    migrate_del_blocker(&container->cpr_blocker);
+    vmstate_unregister(NULL, &vfio_container_vmstate, container);
+}
diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
index bba776f..5487815 100644
--- a/hw/vfio/meson.build
+++ b/hw/vfio/meson.build
@@ -5,13 +5,14 @@ vfio_ss.add(files(
    'container-base.c',
    'container.c',
    'migration.c',
-  'cpr.c',
  ))
  vfio_ss.add(when: 'CONFIG_PSERIES', if_true: files('spapr.c'))
  vfio_ss.add(when: 'CONFIG_IOMMUFD', if_true: files(
    'iommufd.c',
  ))
  vfio_ss.add(when: 'CONFIG_VFIO_PCI', if_true: files(
+  'cpr.c',
+  'cpr-legacy.c',
    'display.c',
    'pci-quirks.c',
    'pci.c',
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 0c60be5..53e554f 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -84,6 +84,7 @@ typedef struct VFIOContainer {
  VFIOContainerBase bcontainer;
  int fd; /* /dev/vfio/vfio, empowered by the attached groups */
  unsigned iommu_type;
+    Error *cpr_blocker;
  QLIST_HEAD(, VFIOGroup) group_list;
  } VFIOContainer;
@@ -258,6 +259,8 @@ int vfio_kvm_device_del_fd(int fd, Error 

[PATCH 0/2] nbd: Allow debugging tuning of handshake limit

2025-02-03 Thread Eric Blake
Reviving a patch that has been sitting in my tree for a while.  It's
mostly useful for low-level integration testing (such as debugging
libnbd as an NBD client).

Eric Blake (2):
  qemu-nbd: Allow users to adjust handshake limit
  nbd/server: Allow users to adjust handshake limit in QMP

 docs/tools/qemu-nbd.rst|  5 +
 qapi/block-export.json | 10 +
 include/block/nbd.h|  6 ++---
 block/monitor/block-hmp-cmds.c |  4 ++--
 blockdev-nbd.c | 26 ++---
 qemu-nbd.c | 41 +-
 6 files changed, 64 insertions(+), 28 deletions(-)

-- 
2.48.1




[PATCH 2/2] nbd/server: Allow users to adjust handshake limit in QMP

2025-02-03 Thread Eric Blake
Although defaulting the handshake limit to 10 seconds was a nice QoI
change to weed out intentionally slow clients, it can interfere with
integration testing done with manual NBD_OPT commands over 'nbdsh
--opt-mode'.  Expose a QMP knob 'handshake-max-secs' to allow the user
to alter the timeout away from the default.

The parameter name here intentionally matches the spelling of the
constant added in commit fb1c2aaa98, and not the command-line spelling
added in the previous patch for qemu-nbd; that's because in QMP,
longer names serve as good self-documentation, and unlike the command
line, machines don't have problems generating longer spellings.

Signed-off-by: Eric Blake 
---
 qapi/block-export.json | 10 ++
 include/block/nbd.h|  6 +++---
 block/monitor/block-hmp-cmds.c |  4 ++--
 blockdev-nbd.c | 26 ++
 4 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/qapi/block-export.json b/qapi/block-export.json
index ce33fe378df..58ae6a5e1d7 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -17,6 +17,10 @@
 #
 # @addr: Address on which to listen.
 #
+# @handshake-max-secs: Time limit, in seconds, at which a client that
+# has not completed the negotiation handshake will be disconnected,
+# or 0 for no limit (since 10.0; default: 10).
+#
 # @tls-creds: ID of the TLS credentials object (since 2.6).
 #
 # @tls-authz: ID of the QAuthZ authorization object used to validate
@@ -34,6 +38,7 @@
 ##
 { 'struct': 'NbdServerOptions',
   'data': { 'addr': 'SocketAddress',
+'*handshake-max-secs': 'uint32',
 '*tls-creds': 'str',
 '*tls-authz': 'str',
 '*max-connections': 'uint32' } }
@@ -52,6 +57,10 @@
 #
 # @addr: Address on which to listen.
 #
+# @handshake-max-secs: Time limit, in seconds, at which a client that
+# has not completed the negotiation handshake will be disconnected,
+# or 0 for no limit (since 10.0; default: 10).
+#
 # @tls-creds: ID of the TLS credentials object (since 2.6).
 #
 # @tls-authz: ID of the QAuthZ authorization object used to validate
@@ -72,6 +81,7 @@
 ##
 { 'command': 'nbd-server-start',
   'data': { 'addr': 'SocketAddressLegacy',
+'*handshake-max-secs': 'uint32',
 '*tls-creds': 'str',
 '*tls-authz': 'str',
 '*max-connections': 'uint32' },
diff --git a/include/block/nbd.h b/include/block/nbd.h
index d4f8b21aecc..92987c76fd6 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -428,9 +428,9 @@ void nbd_client_put(NBDClient *client);
 void nbd_server_is_qemu_nbd(int max_connections);
 bool nbd_server_is_running(void);
 int nbd_server_max_connections(void);
-void nbd_server_start(SocketAddress *addr, const char *tls_creds,
-  const char *tls_authz, uint32_t max_connections,
-  Error **errp);
+void nbd_server_start(SocketAddress *addr, uint32_t handshake_max_secs,
+  const char *tls_creds, const char *tls_authz,
+  uint32_t max_connections, Error **errp);
 void nbd_server_start_options(NbdServerOptions *arg, Error **errp);

 /* nbd_read
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 1d312513fc4..0cfcbfe7c21 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -402,8 +402,8 @@ void hmp_nbd_server_start(Monitor *mon, const QDict *qdict)
 goto exit;
 }

-nbd_server_start(addr, NULL, NULL, NBD_DEFAULT_MAX_CONNECTIONS,
- &local_err);
+nbd_server_start(addr, NBD_DEFAULT_HANDSHAKE_MAX_SECS, NULL, NULL,
+ NBD_DEFAULT_MAX_CONNECTIONS, &local_err);
 qapi_free_SocketAddress(addr);
 if (local_err != NULL) {
 goto exit;
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index 9e61fbaf2b2..e9f53e83d48 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -28,6 +28,7 @@ typedef struct NBDConn {

 typedef struct NBDServerData {
 QIONetListener *listener;
+uint32_t handshake_max_secs;
 QCryptoTLSCreds *tlscreds;
 char *tlsauthz;
 uint32_t max_connections;
@@ -84,8 +85,7 @@ static void nbd_accept(QIONetListener *listener, 
QIOChannelSocket *cioc,
 nbd_update_server_watch(nbd_server);

 qio_channel_set_name(QIO_CHANNEL(cioc), "nbd-server");
-/* TODO - expose handshake timeout as QMP option */
-nbd_client_new(cioc, NBD_DEFAULT_HANDSHAKE_MAX_SECS,
+nbd_client_new(cioc, nbd_server->handshake_max_secs,
nbd_server->tlscreds, nbd_server->tlsauthz,
nbd_blockdev_client_closed, conn);
 }
@@ -162,9 +162,9 @@ static QCryptoTLSCreds *nbd_get_tls_creds(const char *id, 
Error **errp)
 }


-void nbd_server_start(SocketAddress *addr, const char *tls_creds,
-  const char *tls_authz, uint32_t max_connections,
-  Error **errp)
+void nbd_server_start(SocketAddress *addr, uint32_t handshake_max

Re: [PATCH V1 06/26] vfio/container: preserve DMA mappings

2025-02-03 Thread Steven Sistare

On 2/3/2025 1:25 PM, Cédric Le Goater wrote:

On 1/29/25 15:43, Steve Sistare wrote:

Preserve DMA mappings during cpr-transfer.

In the container pre_save handler, suspend the use of virtual addresses
in DMA mappings with VFIO_DMA_UNMAP_FLAG_VADDR, because guest RAM will
be remapped at a different VA after exec.  DMA to already-mapped pages
continues.

Because the vaddr is temporarily invalid, mediated devices cannot be
supported, so add a blocker for them.  This restriction will not apply
to iommufd containers when CPR is added for them in a future patch.

In new QEMU, do not register the memory listener at device creation time.
Register it later, in the container post_load handler, after all vmstate
that may affect regions and mapping boundaries has been loaded.  The
post_load registration will cause the listener to invoke its callback on
each flat section, and the calls will match the mappings remembered by the
kernel.  Modify vfio_dma_map (which is called by the listener) to pass the
new VA to the kernel using VFIO_DMA_MAP_FLAG_VADDR.

Signed-off-by: Steve Sistare 
---
  hw/vfio/container.c   | 44 +++
  hw/vfio/cpr-legacy.c  | 32 +++
  include/hw/vfio/vfio-common.h |  3 +++
  3 files changed, 75 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 81d0ccc..2b5125e 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -32,6 +32,7 @@
  #include "trace.h"
  #include "qapi/error.h"
  #include "migration/cpr.h"
+#include "migration/blocker.h"
  #include "pci.h"
  VFIOGroupList vfio_group_list =
@@ -132,6 +133,8 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase 
*bcontainer,
  int ret;
  Error *local_err = NULL;
+    assert(!container->reused);
+
  if (iotlb && vfio_devices_all_dirty_tracking_started(bcontainer)) {
  if (!vfio_devices_all_device_dirty_tracking(bcontainer) &&
  bcontainer->dirty_pages_supported) {
@@ -183,12 +186,24 @@ static int vfio_legacy_dma_map(const VFIOContainerBase 
*bcontainer, hwaddr iova,
    bcontainer);
  struct vfio_iommu_type1_dma_map map = {
  .argsz = sizeof(map),
-    .flags = VFIO_DMA_MAP_FLAG_READ,
  .vaddr = (__u64)(uintptr_t)vaddr,
  .iova = iova,
  .size = size,
  };
+    /*
+ * Set the new vaddr for any mappings registered during cpr load.
+ * Reused is cleared thereafter.
+ */
+    if (container->reused) {
+    map.flags = VFIO_DMA_MAP_FLAG_VADDR;
+    if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map)) {
+    goto fail;
+    }
+    return 0;
+    }


This is a bit ugly.

When reaching routine vfio_attach_device(), could we detect that CPR is
in progress and replace the 'VFIOIOMMUClass *' temporarily with a set of
CPR specific handlers ?


Good idea, I'll try it.  I wrote this code years ago before the dma
map and unmap functions were defined in an ops vector.


+
+    map.flags = VFIO_DMA_MAP_FLAG_READ;
  if (!readonly) {
  map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
  }
@@ -205,7 +220,11 @@ static int vfio_legacy_dma_map(const VFIOContainerBase 
*bcontainer, hwaddr iova,
  return 0;
  }
-    error_report("VFIO_MAP_DMA failed: %s", strerror(errno));
+fail:
+    error_report("vfio_dma_map %s (iova %lu, size %ld, va %p): %s",
+    (container->reused ? "VADDR" : ""), iova, size, vaddr,
+    strerror(errno));
+



FYI, I am currently trying to remove this error report.



  return -errno;
  }
@@ -689,8 +708,17 @@ static bool vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
  group->container = container;
  QLIST_INSERT_HEAD(&container->group_list, group, container_next);
-    bcontainer->listener = vfio_memory_listener;
-    memory_listener_register(&bcontainer->listener, bcontainer->space->as);
+    /*
+ * If reused, register the listener later, after all state that may
+ * affect regions and mapping boundaries has been cpr load'ed.  Later,
+ * the listener will invoke its callback on each flat section and call
+ * vfio_dma_map to supply the new vaddr, and the calls will match the
+ * mappings remembered by the kernel.
+ */
+    if (!reused) {
+    bcontainer->listener = vfio_memory_listener;
+    memory_listener_register(&bcontainer->listener, bcontainer->space->as);
+    }


oh ! This is an important change. Please move in its own patch.


OK.


  if (bcontainer->error) {
  error_propagate_prepend(errp, bcontainer->error,
@@ -1002,6 +1030,13 @@ static bool vfio_legacy_attach_device(const char *name, 
VFIODevice *vbasedev,
  return false;
  }
+    if (vbasedev->mdev) {
+    error_setg(&vbasedev->cpr_mdev_blocker,
+   "CPR does not support vfio mdev %s", vbasedev->name);
+    migrate_add_blocker_modes(&vbasedev->cpr_mdev_blocker, &error_fatal

[PATCH 1/2] qemu-nbd: Allow users to adjust handshake limit

2025-02-03 Thread Eric Blake
Although defaulting the handshake limit to 10 seconds was a nice QoI
change to weed out intentionally slow clients, it can interfere with
integration testing done with manual NBD_OPT commands over 'nbdsh
--opt-mode'.  Expose a command line option to allow the user to alter
the timeout away from the default.  This option is unlikely to be used
in enough scenarios to warrant a short option letter.

The option --handshake-limit intentionally differs from the name of
the constant added in commit fb1c2aaa98 (limit instead of max_secs)
and the QMP name to be added in the next commit; this is because
typing a longer command-line name is undesirable and there is
sufficient --help text to document the units.

Signed-off-by: Eric Blake 
---
 docs/tools/qemu-nbd.rst |  5 +
 qemu-nbd.c  | 41 ++---
 2 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/docs/tools/qemu-nbd.rst b/docs/tools/qemu-nbd.rst
index 4f21b7904ac..f82ea5fd77b 100644
--- a/docs/tools/qemu-nbd.rst
+++ b/docs/tools/qemu-nbd.rst
@@ -156,6 +156,11 @@ driver options if :option:`--image-opts` is specified.
   Set the NBD volume export description, as a human-readable
   string.

+.. option:: --handshake-limit=N
+
+  Set the timeout for a client to successfully complete its handshake
+  to N seconds (default 10), or 0 for no limit.
+
 .. option:: -L, --list

   Connect as a client and list all details about the exports exposed by
diff --git a/qemu-nbd.c b/qemu-nbd.c
index e7a961a5562..2eb32bc700c 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -57,19 +57,20 @@
 #define HAVE_NBD_DEVICE 0
 #endif

-#define SOCKET_PATH"/var/lock/qemu-nbd-%s"
-#define QEMU_NBD_OPT_CACHE 256
-#define QEMU_NBD_OPT_AIO   257
-#define QEMU_NBD_OPT_DISCARD   258
-#define QEMU_NBD_OPT_DETECT_ZEROES 259
-#define QEMU_NBD_OPT_OBJECT260
-#define QEMU_NBD_OPT_TLSCREDS  261
-#define QEMU_NBD_OPT_IMAGE_OPTS262
-#define QEMU_NBD_OPT_FORK  263
-#define QEMU_NBD_OPT_TLSAUTHZ  264
-#define QEMU_NBD_OPT_PID_FILE  265
-#define QEMU_NBD_OPT_SELINUX_LABEL 266
-#define QEMU_NBD_OPT_TLSHOSTNAME   267
+#define SOCKET_PATH  "/var/lock/qemu-nbd-%s"
+#define QEMU_NBD_OPT_CACHE   256
+#define QEMU_NBD_OPT_AIO 257
+#define QEMU_NBD_OPT_DISCARD 258
+#define QEMU_NBD_OPT_DETECT_ZEROES   259
+#define QEMU_NBD_OPT_OBJECT  260
+#define QEMU_NBD_OPT_TLSCREDS261
+#define QEMU_NBD_OPT_IMAGE_OPTS  262
+#define QEMU_NBD_OPT_FORK263
+#define QEMU_NBD_OPT_TLSAUTHZ264
+#define QEMU_NBD_OPT_PID_FILE265
+#define QEMU_NBD_OPT_SELINUX_LABEL   266
+#define QEMU_NBD_OPT_TLSHOSTNAME 267
+#define QEMU_NBD_OPT_HANDSHAKE_LIMIT 268

 #define MBR_SIZE 512

@@ -80,6 +81,7 @@ static int nb_fds;
 static QIONetListener *server;
 static QCryptoTLSCreds *tlscreds;
 static const char *tlsauthz;
+static int handshake_limit = NBD_DEFAULT_HANDSHAKE_MAX_SECS;

 static void usage(const char *name)
 {
@@ -101,6 +103,7 @@ static void usage(const char *name)
 "  -v, --verbose display extra debugging information\n"
 "  -x, --export-name=NAMEexpose export by name (default is empty string)\n"
 "  -D, --description=TEXTexport a human-readable description\n"
+"  --handshake-limit=N   limit client's handshake to N seconds (default 
10)\n"
 "\n"
 "Exposing part of the image:\n"
 "  -o, --offset=OFFSET   offset into the image\n"
@@ -390,8 +393,7 @@ static void nbd_accept(QIONetListener *listener, 
QIOChannelSocket *cioc,

 nb_fds++;
 nbd_update_server_watch();
-/* TODO - expose handshake timeout as command line option */
-nbd_client_new(cioc, NBD_DEFAULT_HANDSHAKE_MAX_SECS,
+nbd_client_new(cioc, handshake_limit,
tlscreds, tlsauthz, nbd_client_closed, NULL);
 }

@@ -569,6 +571,8 @@ int main(int argc, char **argv)
 { "object", required_argument, NULL, QEMU_NBD_OPT_OBJECT },
 { "export-name", required_argument, NULL, 'x' },
 { "description", required_argument, NULL, 'D' },
+{ "handshake-limit", required_argument, NULL,
+  QEMU_NBD_OPT_HANDSHAKE_LIMIT },
 { "tls-creds", required_argument, NULL, QEMU_NBD_OPT_TLSCREDS },
 { "tls-hostname", required_argument, NULL, QEMU_NBD_OPT_TLSHOSTNAME },
 { "tls-authz", required_argument, NULL, QEMU_NBD_OPT_TLSAUTHZ },
@@ -815,6 +819,13 @@ int main(int argc, char **argv)
 case QEMU_NBD_OPT_SELINUX_LABEL:
 selinux_label = optarg;
 break;
+case QEMU_NBD_OPT_HANDSHAKE_LIMIT:
+if (qemu_strtoi(optarg, NULL, 0, &handshake_limit) < 0 ||
+handshake_limit < 0) {
+error_report("Invalid handshake limit '%s'", optarg);
+exit(EXIT_FAILURE);
+}
+break;
 }
 }

-- 
2.48.1




Re: [PATCH v2 00/14] meson: Deprecate 32-bit host support

2025-02-03 Thread Stefano Stabellini
+Xen maintainers


On Mon, 3 Feb 2025, Richard Henderson wrote:
> On 2/3/25 04:54, Paolo Bonzini wrote:
> > On 2/3/25 04:18, Richard Henderson wrote:
> > > v1: 20250128004254.33442-1-richard.hender...@linaro.org
> > > 
> > > For v2, immediately disable 64-on-32 TCG.
> > > 
> > > I *suspect* that we should disable 64-on-32 for *all* accelerators.
> > > The idea that an i686 binary on an x86_64 host may be used to spawn
> > > an x86_64 guest via kvm is silly and a bit more than niche.
> > 
> > At least Xen used to be commonly used with 32-bit dom0, because it saved
> > memory and dom0 would map in guest buffers as needed.  I'm not sure how
> > common that is these days, perhaps Stefano knows.
> 
> As a data-point, debian does not ship libxen-dev for i686.
> We cannot build-test this configuration at all.
> 
> I can build-test Xen for armhf, and I guess it would use i386-softmmu; it's
> unclear whether x86_64-softmmu and aarch64-softmmu are relevant or useful for
> an armhf host, or as an armhf binary running on an aarch64 host.


On the Xen side, there are two different use cases: x86 32-bit and ARM
32-bit.  

For x86 32-bit, while it was a very important use case in the past, I
believe it is far less so now. I will let the x86 maintainers comment on
how important it is today. 

For ARM 32-bit, I do not think we ever had many deployments, as most are
64-bit. Even when there are deployments, they do not typically use QEMU,
as QEMU is less important for Xen on ARM compared to x86. Therefore, I
would not block your cleanup and deprecation because of that. I will let
the other ARM maintainers chime in.

Re: [PATCH 20/21] hw/i2c: Import TCA6416 emulation from Xilinx

2025-02-03 Thread Bernhard Beschow



Am 2. Februar 2025 17:09:06 UTC schrieb "Philippe Mathieu-Daudé" 
:
>On 1/2/25 16:28, Bernhard Beschow wrote:
>> 
>> 
>> Am 30. Januar 2025 23:05:53 UTC schrieb "Philippe Mathieu-Daudé" 
>> :
>>> Cc'ing AMD folks
>>> 
>>> Hi Bernhard,
>>> 
>>> TL;DR; can't you use the PCF8574 which is a more complete model of I/O
>>> expander? (See hw/gpio/pcf8574.c)
>> 
>> If it is software-compatible then I could use it. I'm modeling a real board 
>> whose device tree I'd like to use unchanged, so I don't have much choice. 
>> The only reason I need it is because its absence clogs the i2c bus.
>
>No clue about compatibility. If you unfortunately need to add it,
>then please address my comments in the next version.

Sure, I'll link your and Zoltan's comments as todos in the next version.

Best regards,
Bernhard

>
>> 
>> Best regards,
>> Bernhard
>> 
>>> 
>>> On 20/1/25 21:37, Bernhard Beschow wrote:
 Xilinx QEMU implements a TCA6416 device model which may be useful for the
 broader QEMU community, so upstream it. In the Xilinx fork, the device 
 model
 gets compiled whenever CONFIG_CADENCE is true, so have it maintained by the
 "hw/*/cadence_*" maintainers.
 
 The code is based on Xilinx QEMU version xilinx_v2024.2 plus the following
 modifications:
 * Use OBJECT_DECLARE_SIMPLE_TYPE()
 * Port from DPRINTF() to trace events
 * Follow QEMU conventions for naming of state struct
 * Rename type to not contain a ','
 * Use DEFINE_TYPES() macro
 * Implement under hw/gpio since device is i2c client and gpio provider
 * Have dedicated Kconfig switch
 
 Signed-off-by: Bernhard Beschow 
 
 --
 I have a use for TCA6416 emulation. Question: Are Xilinx Zynq maintainers
 willing to maintain this device model?
 ---
MAINTAINERS  |   1 +
hw/gpio/tca6416.c| 122 +++
hw/gpio/Kconfig  |   5 ++
hw/gpio/meson.build  |   1 +
hw/gpio/trace-events |   4 ++
5 files changed, 133 insertions(+)
create mode 100644 hw/gpio/tca6416.c
 
 diff --git a/MAINTAINERS b/MAINTAINERS
 index 7531d65429..0cac9c90bc 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
 @@ -1030,6 +1030,7 @@ S: Maintained
F: hw/*/xilinx_*
F: hw/*/cadence_*
F: hw/misc/zynq_slcr.c
 +F: hw/gpio/tca6416.c
F: hw/adc/zynq-xadc.c
F: include/hw/misc/zynq_slcr.h
F: include/hw/adc/zynq-xadc.h
 diff --git a/hw/gpio/tca6416.c b/hw/gpio/tca6416.c
 new file mode 100644
 index 00..81ed7a654d
 --- /dev/null
 +++ b/hw/gpio/tca6416.c
 @@ -0,0 +1,122 @@
 +/*
 + * QEMU model of TCA6416 16-Bit I/O Expander
>>> 
>>> Please add datasheet reference.
>>> 
 + *
 + * Copyright (c) 2018 Xilinx Inc.
 + *
 + * Written by Sai Pavan Boddu 
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining 
 a copy
 + * of this software and associated documentation files (the "Software"), 
 to deal
 + * in the Software without restriction, including without limitation the 
 rights
 + * to use, copy, modify, merge, publish, distribute, sublicense, and/or 
 sell
 + * copies of the Software, and to permit persons to whom the Software is
 + * furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice shall be 
 included in
 + * all copies or substantial portions of the Software.
 + *
 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
 EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
 MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
 OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 
 ARISING FROM,
 + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS 
 IN
 + * THE SOFTWARE.
>>> 
>>> Sai/Edgar/Francisco, should we update to AMD here? Remove or update the
>>> email address? Also, can you Ack the replacement of this license by a
>>> SPDX tag?
>>> 
 + */
 +#include "qemu/osdep.h"
 +#include "hw/i2c/i2c.h"
 +#include "trace.h"
 +
 +#define TYPE_TCA6416 "tca6416"
 +OBJECT_DECLARE_SIMPLE_TYPE(Tca6416State, TCA6416)
 +
 +#define IN_PORT00
 +#define IN_PORT11
 +#define OUT_PORT0   2
 +#define OUT_PORT1   3
 +#define POL_INV04
 +#define POL_INV15
 +#define CONF_PORT0  6
 +#define CONF_PORT1  7
>>> 
>>> enum up to here?
>>> 
 +#define RMAX (CONF_PORT1 + 1)
 +
 +enum tca6416_events {
 + ADDR_DONE,
 + ADDRESSING,
 +};
 +
 +struct Tca6416State {
 + I2CSlave i2c;
 +
 + uint8_t addr;
>>>

Re: [PATCH 0/7] hw/arm/raspi4b: Add models with 4GB and 8GB of DRAM

2025-02-03 Thread BALATON Zoltan

On Mon, 3 Feb 2025, Philippe Mathieu-Daudé wrote:

On 3/2/25 15:50, Daniel P. Berrangé wrote:

On Mon, Feb 03, 2025 at 02:45:06PM +, Peter Maydell wrote:
On Mon, 3 Feb 2025 at 14:33, Daniel P. Berrangé  
wrote:


On Mon, Feb 03, 2025 at 02:29:49PM +, Alex Bennée wrote:

Peter Maydell  writes:


On Sat, 1 Feb 2025 at 12:57, BALATON Zoltan  wrote:


On Sat, 1 Feb 2025, Philippe Mathieu-Daudé wrote:

- Deprecate the 'raspi4b' machine name, renaming it as
  'raspi4b-1g' on 32-bit hosts, 'raspi4b-2g' otherwise.
- Add the 'raspi4b-4g' and 'raspi4b-8g' machines, with
  respectively 4GB and 8GB of DRAM.


IMHO (meaning you can ignore it, just my opinion) if the only 
difference

is the memory size -machine raspi4b -memory 4g would be better user
experience than having a lot of different machines.


Yes, I think I agree. We have a way for users to specify
how much memory they want, and I think it makes more sense
to use that than to have lots of different machine types.


I guess for the Pi we should validate the -memory supplied is on of the
supported grid of devices rather than an arbitrary value?


If the user wants to create a rpi4 with 6 GB RAM why should we stop
them ? It is their choice if they want to precisely replicate RAM
size from a physical model, or use something different when virtualized.


The board revision code (reported to the guest via the emulated
firmware interface) only supports reporting 256MB, 512MB,
1GB, 2GB, 4GB or 8GB:

https://www.raspberrypi.com/documentation/computers/raspberry-pi.html#new-style-revision-codes


I think it would be valid to report the revision code for the memory
size that doesn't exceed what QEMU has configured. eg if configured
with 6 GB, then report code for 4 GB.


We need to distinct between physical machines VS virtual ones.

Guests on virtual machines have some way to figure the virtual
hardware (ACPI tables, DeviceTree blob, fw-cfg, ...).

Guests for physical machines usually expect fixed hardware (not
considering devices on busses).

For the particular case of the Raspberry Pi machines, their
bootloader gets the board layout by reading the
RPI_FWREQ_GET_BOARD_REVISION constant value.


What would be the point of emulating a raspi machine with 6GB
if the FW is not going to consider besides 4GB?
Besides, someone modify a guest to work with 6GB, it won't work
on real HW.


Usually the point of such non-standard configs would be running Linux via 
-kernel which could use whatever is configured if it has a way to detect 
it or maybe for memory it could even be specified on the kernel command 
line. But maybe this is not a common enough config to support so reporting 
error for memory size that's not on the list of valid sizes might be 
enough. This is similar to qemu-system-ppc -machine sam460ex which has a 
memory controller register that can only describe existing DIMM sizes. 
Originally I allowed users to specify whatever memory size and only warn 
for sizes not matching a DIMM size because Linux only looks at the device 
tree which QEMU can generate when booting with -kernel so it works but the 
firmware detects RAM from the SPD data and can only support certain sizes 
so only less RAM that could be convered with SPD data would be visible for 
guests. But then others thought it's better to return error for such cases 
and changed it and removed the support for arbitrary memory size so now it 
returns error when non power of 2 memory size is specified.


Regards,
BALATON Zoltan


For Arm embedded boards we mostly tend to "restrict the user
to what you can actually do", except for older boards where
we tended not to write any kind of sanity checking on CPU
type, memory size, etc.


If we're going to strictly limit memory size that's accepted I wonder
how we could information users/mgmt apps about what's permitted ?

Expressing valid combinations of configs across different args gets
pretty complicated quickly :-(


I'll try to address Zoltan and Peter request to have a dynamic raspi
machine. It is a bit unfortunate we didn't insisted on that when we
decided to expose a fixed set of existing boards in order to not be
bothered by inconsistent bug reports, back in 2019.

Regards,

Phil.



Re: [PATCH v2 07/14] accel/stubs: Expand stubs for TCG

2025-02-03 Thread Richard Henderson

On 2/3/25 09:38, Thomas Huth wrote:

On 03/02/2025 17.43, Richard Henderson wrote:

On 2/3/25 02:22, Thomas Huth wrote:

On 03/02/2025 04.18, Richard Henderson wrote:

Add tcg_allowed, qmp_x_query_jit, qmp_x_query_opcount.
These are referenced when CONFIG_TCG is enabled globally,
but not for a specific target.

Signed-off-by: Richard Henderson 
---
  accel/stubs/tcg-stub.c | 24 
  1 file changed, 24 insertions(+)

diff --git a/accel/stubs/tcg-stub.c b/accel/stubs/tcg-stub.c
index 7f4208fddf..9c2e2dc6e1 100644
--- a/accel/stubs/tcg-stub.c
+++ b/accel/stubs/tcg-stub.c
@@ -13,6 +13,18 @@
  #include "qemu/osdep.h"
  #include "exec/tb-flush.h"
  #include "exec/exec-all.h"
+#include "qapi/error.h"
+
+/*
+ * This file *ought* to be built once and linked only when required.
+ * However, it is built per-target, which means qemu/osdep.h has already
+ * undef'ed CONFIG_TCG, which hides the auto-generated declaration.


So why don't we only build this file once?


I think we'd have to create a static library for it.
It didn't seem worth the effort at the time.
I can re-investigate if you like.


I think something like this might work:


I think we need some of Philippe's include/exec/ cleanup work first. We currently use 
exec/exec-all.h, which requires cpu.h, which requires building per-target.



r~



Re: [PATCH v4 13/33] migration/multifd: Device state transfer support - receive side

2025-02-03 Thread Peter Xu
On Mon, Feb 03, 2025 at 11:18:11PM +0100, Maciej S. Szmigiero wrote:
> On 3.02.2025 22:27, Peter Xu wrote:
> > On Thu, Jan 30, 2025 at 11:08:34AM +0100, Maciej S. Szmigiero wrote:
> > > From: "Maciej S. Szmigiero" 
> > > 
> > > Add a basic support for receiving device state via multifd channels -
> > > channels that are shared with RAM transfers.
> > > 
> > > Depending whether MULTIFD_FLAG_DEVICE_STATE flag is present or not in the
> > > packet header either device state (MultiFDPacketDeviceState_t) or RAM
> > > data (existing MultiFDPacket_t) is read.
> > > 
> > > The received device state data is provided to
> > > qemu_loadvm_load_state_buffer() function for processing in the
> > > device's load_state_buffer handler.
> > > 
> > > Signed-off-by: Maciej S. Szmigiero 
> > 
> > I think I acked this one.  You could keep my R-b if...
> > 
> > [...]
> > 
> > > diff --git a/migration/multifd.h b/migration/multifd.h
> > > index 9e4baa066312..abf3acdcee40 100644
> > > --- a/migration/multifd.h
> > > +++ b/migration/multifd.h
> > > @@ -62,6 +62,12 @@ MultiFDRecvData *multifd_get_recv_data(void);
> > >   #define MULTIFD_FLAG_UADK (8 << 1)
> > >   #define MULTIFD_FLAG_QATZIP (16 << 1)
> > > +/*
> > > + * If set it means that this packet contains device state
> > > + * (MultiFDPacketDeviceState_t), not RAM data (MultiFDPacket_t).
> > > + */
> > > +#define MULTIFD_FLAG_DEVICE_STATE (1 << 6)
> > 
> > ... if this won't conflict with MULTIFD_FLAG_QATZIP.
> 
> Hmm, isn't (16 << 1) = 32 while (1 << 6) = 64?

Oops. :)

> > I think we should stick with one way to write it, then when rebase you can
> > see such conflicts - either your patch uses 32 << 1, or perhaps we should
> > start to switch to BIT() for all above instead..

Still, do you mind switch to "32 << 1" (or use BIT())?

With either, feel free to take:

Reviewed-by: Peter Xu 

-- 
Peter Xu




Re: [PATCH v4 08/33] migration/multifd: Allow premature EOF on TLS incoming channels

2025-02-03 Thread Peter Xu
On Mon, Feb 03, 2025 at 10:41:32PM +0100, Maciej S. Szmigiero wrote:
> On 3.02.2025 21:20, Peter Xu wrote:
> > On Mon, Feb 03, 2025 at 07:53:00PM +0100, Maciej S. Szmigiero wrote:
> > > On 3.02.2025 19:20, Peter Xu wrote:
> > > > On Thu, Jan 30, 2025 at 11:08:29AM +0100, Maciej S. Szmigiero wrote:
> > > > > From: "Maciej S. Szmigiero" 
> > > > > 
> > > > > Multifd send channels are terminated by calling
> > > > > qio_channel_shutdown(QIO_CHANNEL_SHUTDOWN_BOTH) in
> > > > > multifd_send_terminate_threads(), which in the TLS case essentially
> > > > > calls shutdown(SHUT_RDWR) on the underlying raw socket.
> > > > > 
> > > > > Unfortunately, this does not terminate the TLS session properly and
> > > > > the receive side sees this as a GNUTLS_E_PREMATURE_TERMINATION error.
> > > > > 
> > > > > The only reason why this wasn't causing migration failures is because
> > > > > the current migration code apparently does not check for migration
> > > > > error being set after the end of the multifd receive process.
> > > > > 
> > > > > However, this will change soon so the multifd receive code has to be
> > > > > prepared to not return an error on such premature TLS session EOF.
> > > > > Use the newly introduced QIOChannelTLS method for that.
> > > > > 
> > > > > It's worth noting that even if the sender were to be changed to 
> > > > > terminate
> > > > > the TLS connection properly the receive side still needs to remain
> > > > > compatible with older QEMU bit stream which does not do this.
> > > > 
> > > > If this is an existing bug, we could add a Fixes.
> > > 
> > > It is an existing issue but only uncovered by this patch set.
> > > 
> > > As far as I can see it was always there, so it would need some
> > > thought where to point that Fixes tag.
> > 
> > If there's no way to trigger a real functional bug anyway, it's also ok we
> > omit the Fixes.
> > 
> > > > Two pure questions..
> > > > 
> > > > - What is the correct way to terminate the TLS session without this 
> > > > flag?
> > > 
> > > I guess one would need to call gnutls_bye() like in this GnuTLS example:
> > > https://gitlab.com/gnutls/gnutls/-/blob/2b8c3e4c71ad380bbbffb32e6003b34ecad596e3/doc/examples/ex-client-anon.c#L102
> > > 
> > > > - Why this is only needed by multifd sessions?
> > > 
> > > What uncovered the issue was switching the load threads to using
> > > migrate_set_error() instead of their own result variable
> > > (load_threads_ret) which you had requested during the previous
> > > patch set version review:
> > > https://lore.kernel.org/qemu-devel/Z1DbH5fwBaxtgrvH@x1n/
> > > 
> > > Turns out that the multifd receive code always returned
> > > error in the TLS case, just nothing was previously checking for
> > > that error presence.
> > 
> > What I was curious is whether this issue also exists for the main migration
> > channel when with tls, especially when e.g. multifd not enabled at all.  As
> > I don't see anywhere that qemu uses gnutls_bye() for any tls session.
> > 
> > I think it's a good to find that we overlooked this before.. and IMHO it's
> > always good we could fix this.
> > 
> > Does it mean we need proper gnutls_bye() somewhere?
> > 
> > If we need an explicit gnutls_bye(), then I wonder if that should be done
> > on the main channel as well.
> 
> That's a good question and looking at the code qemu_loadvm_state_main() exits
> on receiving "QEMU_VM_EOF" section (that's different from receiving socket 
> EOF)
> and then optionally "QEMU_VM_VMDESCRIPTION" section is read with explicit size
> in qemu_loadvm_state() - so still not until channel EOF.

I had a closer look, I do feel like such pre-mature termination is caused
by explicit shutdown()s of the iochannels, looks like that can cause issue
even after everything is sent.  Then I noticed indeed multifd sender
iochannels will get explicit shutdown()s since commit 077fbb5942, while we
don't do that for the main channel.  Maybe that is a major difference.

Now I wonder whether we should shutdown() the channel at all if migration
succeeded, because looks like it can cause tls session to interrupt even if
the shutdown() is done after sent everything, and if so it'll explain why
you hit the issue with tls.

> 
> Then I can't see anything else reading the channel until it is closed in
> migration_incoming_state_destroy().
> 
> So most likely the main migration channel will never read far enough to
> reach that GNUTLS_E_PREMATURE_TERMINATION error.
> 
> > If we don't need gnutls_bye(), then should we always ignore pre-mature
> > termination of tls no matter if it's multifd or non-multifd channel (or
> > even a tls session that is not migration-related)?
> 
> So basically have this patch extended to calling
> qio_channel_tls_set_premature_eof_okay() also on the main migration channel?

If above theory can stand, then eof-okay could be a workaround papering
over the real problem that we shouldn't always shutdown()..

Could you have a look at below patch and see whether it ca

Re: [PATCH 08/21] hw/arm/fsl-imx8mp: Add USDHC storage controllers

2025-02-03 Thread Bernhard Beschow



Am 21. Januar 2025 02:52:58 UTC schrieb BALATON Zoltan :
>On Mon, 20 Jan 2025, Bernhard Beschow wrote:
>> The USDHC emulation allows for running real-world images such as those 
>> generated
>> by Buildroot. Convert the board documentation accordingly instead of running 
>> a
>> Linux kernel with ephemeral storage.
>> 
>> Signed-off-by: Bernhard Beschow 
>> ---
>> docs/system/arm/imx8mp-evk.rst | 39 +++---
>> include/hw/arm/fsl-imx8mp.h|  7 ++
>> hw/arm/fsl-imx8mp.c| 28 
>> hw/arm/imx8mp-evk.c| 18 
>> hw/arm/Kconfig |  1 +
>> 5 files changed, 81 insertions(+), 12 deletions(-)
>> 
>> diff --git a/docs/system/arm/imx8mp-evk.rst b/docs/system/arm/imx8mp-evk.rst
>> index d7d08cc198..1514bc5864 100644
>> --- a/docs/system/arm/imx8mp-evk.rst
>> +++ b/docs/system/arm/imx8mp-evk.rst
>> @@ -13,6 +13,7 @@ The ``imx8mp-evk`` machine implements the following 
>> devices:
>>  * Up to 4 Cortex-A53 Cores
>>  * Generic Interrupt Controller (GICv3)
>>  * 4 UARTs
>> + * 3 USDHC Storage Controllers
>>  * Secure Non-Volatile Storage (SNVS) including an RTC
>>  * Clock Tree
>> 
>> @@ -25,25 +26,39 @@ for loading a Linux kernel.
>> Direct Linux Kernel Boot
>> 
>> 
>> -Linux mainline v6.12 release is tested at the time of writing. To build a 
>> Linux
>> -mainline kernel that can be booted by the ``imx8mp-evk`` machine, simply
>> -configure the kernel using the defconfig configuration:
>> +Probably the easiest way to get started with a whole Linux system on the 
>> machine
>> +is to generate an image with Buildroot. Version 2024.11.1 is tested at the 
>> time
>> +of writing and involves three steps. First run the following command in the
>> +toplevel directory of the Buildroot source tree:
>> 
>> .. code-block:: bash
>> 
>> -  $ export ARCH=arm64
>> -  $ export CROSS_COMPILE=aarch64-linux-gnu-
>> -  $ make defconfig
>> +  $ make freescale_imx8mpevk_defconfig
>>   $ make
>
>Adding documentation that is removed a few patches later seems unnecessary. 
>Maybe you could keep the bare kernel docs as that could also be useful or not 
>add it in the first place? But if this is already written keeping it may make 
>more sense than dropping it.

I'd like the documentation to be complete and easy to follow at the same time. 
Buildroot achieves both, and allows for generating a cpio initrd instead. So 
I'd start with that which just requires minor adjustments here.

Best regards,
Bernhard

>
>Regards,
>BALATON Zoltan
>
>> 
>> -To boot the newly built Linux kernel in QEMU with the ``imx8mp-evk`` 
>> machine,
>> -run:
>> +Once finished successfully there is an ``output/image`` subfolder. Navigate 
>> into
>> +it and resize the SD card image to a power of two:
>> +
>> +.. code-block:: bash
>> +
>> +  $ qemu-img resize sdcard.img 256M
>> +
>> +Finally, the device tree needs to be patched with the following commands 
>> which
>> +will remove the ``cpu-idle-states`` properties from CPU nodes:
>> +
>> +.. code-block:: bash
>> +
>> +  $ dtc imx8mp-evk.dtb | sed '/cpu-idle-states/d' > imx8mp-evk-patched.dts
>> +  $ dtc imx8mp-evk-patched.dts -o imx8mp-evk-patched.dtb
>> +
>> +Now that everything is prepared the newly built image can be run in the QEMU
>> +``imx8mp-evk`` machine:
>> 
>> .. code-block:: bash
>> 
>>   $ qemu-system-aarch64 -M imx8mp-evk -smp 4 -m 3G \
>>   -display none -serial null -serial stdio \
>> -  -kernel arch/arm64/boot/Image \
>> -  -dtb arch/arm64/boot/dts/freescale/imx8mp-evk.dtb \
>> -  -initrd /path/to/rootfs.ext4 \
>> -  -append "root=/dev/ram"
>> +  -kernel Image \
>> +  -dtb imx8mp-evk-patched.dtb \
>> +  -append "root=/dev/mmcblk2p2" \
>> +  -drive file=sdcard.img,if=sd,bus=2,format=raw,id=mmcblk2
>> diff --git a/include/hw/arm/fsl-imx8mp.h b/include/hw/arm/fsl-imx8mp.h
>> index 9d2a757c2a..9832c05e8c 100644
>> --- a/include/hw/arm/fsl-imx8mp.h
>> +++ b/include/hw/arm/fsl-imx8mp.h
>> @@ -14,6 +14,7 @@
>> #include "hw/intc/arm_gicv3_common.h"
>> #include "hw/misc/imx7_snvs.h"
>> #include "hw/misc/imx8mp_ccm.h"
>> +#include "hw/sd/sdhci.h"
>> #include "qom/object.h"
>> #include "qemu/units.h"
>> 
>> @@ -27,6 +28,7 @@ enum FslImx8mpConfiguration {
>> FSL_IMX8MP_NUM_CPUS = 4,
>> FSL_IMX8MP_NUM_IRQS = 160,
>> FSL_IMX8MP_NUM_UARTS= 4,
>> +FSL_IMX8MP_NUM_USDHCS   = 3,
>> };
>> 
>> struct FslImx8mpState {
>> @@ -38,6 +40,7 @@ struct FslImx8mpState {
>> IMX8MPAnalogState  analog;
>> IMX7SNVSState  snvs;
>> IMXSerialState uart[FSL_IMX8MP_NUM_UARTS];
>> +SDHCIState usdhc[FSL_IMX8MP_NUM_USDHCS];
>> };
>> 
>> enum FslImx8mpMemoryRegions {
>> @@ -183,6 +186,10 @@ enum FslImx8mpMemoryRegions {
>> };
>> 
>> enum FslImx8mpIrqs {
>> +FSL_IMX8MP_USDHC1_IRQ   = 22,
>> +FSL_IMX8MP_USDHC2_IRQ   = 23,
>> +FSL_IMX8MP_USDHC3_IRQ   = 24,
>> +
>> FSL_IMX8MP_UART1_IRQ= 26,
>> 

Re: [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls

2025-02-03 Thread Peter Xu
On Mon, Feb 03, 2025 at 10:41:43PM +0100, Maciej S. Szmigiero wrote:
> On 3.02.2025 21:36, Peter Xu wrote:
> > On Mon, Feb 03, 2025 at 09:15:52PM +0100, Maciej S. Szmigiero wrote:
> > > On 3.02.2025 20:58, Peter Xu wrote:
> > > > On Mon, Feb 03, 2025 at 02:57:36PM +0100, Maciej S. Szmigiero wrote:
> > > > > On 2.02.2025 13:45, Dr. David Alan Gilbert wrote:
> > > > > > * Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote:
> > > > > > > On 2.02.2025 03:06, Dr. David Alan Gilbert wrote:
> > > > > > > > * Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote:
> > > > > > > > > From: "Maciej S. Szmigiero" 
> > > > > > > > > 
> > > > > > > > > postcopy_ram_listen_thread() is a free running thread, so it 
> > > > > > > > > needs to
> > > > > > > > > take BQL around function calls to migration methods requiring 
> > > > > > > > > BQL.
> > > > > > > > > 
> > > > > > > > > qemu_loadvm_state_main() needs BQL held since it ultimately 
> > > > > > > > > calls
> > > > > > > > > "load_state" SaveVMHandlers.
> > > > > > > > > 
> > > > > > > > > migration_incoming_state_destroy() needs BQL held since it 
> > > > > > > > > ultimately calls
> > > > > > > > > "load_cleanup" SaveVMHandlers.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Maciej S. Szmigiero 
> > > > > > > > > 
> > > > > > > > > ---
> > > > > > > > >  migration/savevm.c | 4 
> > > > > > > > >  1 file changed, 4 insertions(+)
> > > > > > > > > 
> > > > > > > > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > > > > > > > index b0b74140daea..0ceea9638cc1 100644
> > > > > > > > > --- a/migration/savevm.c
> > > > > > > > > +++ b/migration/savevm.c
> > > > > > > > > @@ -2013,7 +2013,9 @@ static void 
> > > > > > > > > *postcopy_ram_listen_thread(void *opaque)
> > > > > > > > >   * in qemu_file, and thus we must be blocking now.
> > > > > > > > >   */
> > > > > > > > >  qemu_file_set_blocking(f, true);
> > > > > > > > > +bql_lock();
> > > > > > > > >  load_res = qemu_loadvm_state_main(f, mis);
> > > > > > > > > +bql_unlock();
> > > > > > > > 
> > > > > > > > Doesn't that leave that held for a heck of a long time?
> > > > > > > 
> > > > > > > Yes, and it effectively broke "postcopy recover" test but I
> > > > > > > think the reason for that is qemu_loadvm_state_main() and
> > > > > > > its children don't drop BQL while waiting for I/O.
> > > > > > > 
> > > > > > > I've described this case in more detail in my reply to Fabiano 
> > > > > > > here:
> > > > > > > https://lore.kernel.org/qemu-devel/0a09e627-955e-4f26-8d08-0192ecd25...@maciej.szmigiero.name/
> > > > > > 
> > > > > > While it might be the cause in this case, my feeling is it's more 
> > > > > > fundamental
> > > > > > here - it's the whole reason that postcopy has a separate ram listen
> > > > > > thread.  As the destination is running, after it loads it's devices
> > > > > > and as it starts up the destination will be still loading RAM
> > > > > > (and other postcopiable devices) potentially for quite a while.
> > > > > > Holding the bql around the ram listen thread means that the
> > > > > > execution of the destination won't be able to take that lock
> > > > > > until the postcopy load has finished; so while that might apparently
> > > > > > complete, it'll lead to the destination stalling until that's 
> > > > > > finished
> > > > > > which defeats the whole point of postcopy.
> > > > > > That last one probably won't fail a test but it will lead to a long 
> > > > > > stall
> > > > > > if you give it a nice big guest with lots of RAM that it's rapidly
> > > > > > changing.
> > > > > 
> > > > > Okay, I understand the postcopy case/flow now.
> > > > > Thanks for explaining it clearly.
> > > > > 
> > > > > > > I still think that "load_state" SaveVMHandlers need to be called
> > > > > > > with BQL held since implementations apparently expect it that way:
> > > > > > > for example, I think PCI device configuration restore calls
> > > > > > > address space manipulation methods which abort() if called
> > > > > > > without BQL held.
> > > > > > 
> > > > > > However, the only devices that *should* be arriving on the channel
> > > > > > that the postcopy_ram_listen_thread is reading from are those
> > > > > > that are postcopiable (i.e. RAM and hmm block's dirty_bitmap).
> > > > > > Those load handlers are safe to be run while the other devices
> > > > > > are being changed.   Note the *should* - you could add a check
> > > > > > to fail if any other device arrives on that channel.
> > > > > 
> > > > > I think ultimately there should be either an explicit check, or,
> > > > > as you suggest in the paragraph below, a separate SaveVMHandler
> > > > > that runs without BQL held.
> > > > 
> > > > To me those are bugs happening during postcopy, so those abort()s in
> > > > memory.c are indeed for catching these issues too.
> > > > 
> > > > > Since the current state of just running these SaveVMHandlers
> > > > > without BQL in this case and hoping

Re: [PATCH 21/21] hw/gpio/imx_gpio: Don't clear input GPIO values upon reset

2025-02-03 Thread Bernhard Beschow



Am 28. Januar 2025 14:33:27 UTC schrieb Gustavo Romero 
:
>Hi,
>
>On 1/20/25 17:37, Bernhard Beschow wrote:
>> Input GPIO values such as a present SD card may get notified before the GPIO
>> controller itself gets reset. Claring the input values thus loses data. 
>> Assuming
>
>^- nit: Clearing
>
>

Peter asked for a three-way reset in inbound devices while keeping the logic 
here as is. I'd drop this patch then.

Best regards,
Bernhard

>Cheers,
>Gustavo
>
>> that input GPIO events are only fired when the state changes, the input 
>> values
>> shouldn't be reset.
>> 
>> Signed-off-by: Bernhard Beschow 
>> ---
>>   hw/gpio/imx_gpio.c | 1 -
>>   1 file changed, 1 deletion(-)
>> 
>> diff --git a/hw/gpio/imx_gpio.c b/hw/gpio/imx_gpio.c
>> index 549a281ed7..25546221e0 100644
>> --- a/hw/gpio/imx_gpio.c
>> +++ b/hw/gpio/imx_gpio.c
>> @@ -298,7 +298,6 @@ static void imx_gpio_reset(DeviceState *dev)
>> s->dr   = 0;
>>   s->gdir = 0;
>> -s->psr  = 0;
>>   s->icr  = 0;
>>   s->imr  = 0;
>>   s->isr  = 0;
>



Re: [PATCH v4 13/33] migration/multifd: Device state transfer support - receive side

2025-02-03 Thread Maciej S. Szmigiero

On 3.02.2025 22:27, Peter Xu wrote:

On Thu, Jan 30, 2025 at 11:08:34AM +0100, Maciej S. Szmigiero wrote:

From: "Maciej S. Szmigiero" 

Add a basic support for receiving device state via multifd channels -
channels that are shared with RAM transfers.

Depending whether MULTIFD_FLAG_DEVICE_STATE flag is present or not in the
packet header either device state (MultiFDPacketDeviceState_t) or RAM
data (existing MultiFDPacket_t) is read.

The received device state data is provided to
qemu_loadvm_load_state_buffer() function for processing in the
device's load_state_buffer handler.

Signed-off-by: Maciej S. Szmigiero 


I think I acked this one.  You could keep my R-b if...

[...]


diff --git a/migration/multifd.h b/migration/multifd.h
index 9e4baa066312..abf3acdcee40 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -62,6 +62,12 @@ MultiFDRecvData *multifd_get_recv_data(void);
  #define MULTIFD_FLAG_UADK (8 << 1)
  #define MULTIFD_FLAG_QATZIP (16 << 1)
  
+/*

+ * If set it means that this packet contains device state
+ * (MultiFDPacketDeviceState_t), not RAM data (MultiFDPacket_t).
+ */
+#define MULTIFD_FLAG_DEVICE_STATE (1 << 6)


... if this won't conflict with MULTIFD_FLAG_QATZIP.


Hmm, isn't (16 << 1) = 32 while (1 << 6) = 64?
 

I think we should stick with one way to write it, then when rebase you can
see such conflicts - either your patch uses 32 << 1, or perhaps we should
start to switch to BIT() for all above instead..



Thanks,
Maciej




RE: [PATCH v6 00/10] Support virtio-gpu DRM native context

2025-02-03 Thread Kim, Dongwon
> Subject: Re: [PATCH v6 00/10] Support virtio-gpu DRM native context
> 
> "Kim, Dongwon"  writes:
> 
> > Hi,
> >
> > The commit below could change the timing of drawing by making the
> > drawing done at refresh cycle instead of via drawing event. So it
> > looks like either dmabuf or client's framebuffer is being written and
> > read at the same time. Hey, can you describe how the corruption looks
> > like? Is it just garbage image with random noise or the actual frame with 
> > some
> defects like tearing...?
> 
> The terminal gets mirrored upside down and the mouse creates damage as it
> moves about.

I am wondering if this is reproducible without virgl and drm native context 
(like w/
sw rasterizer on the guest) as well. 

> 
> >
> >> Subject: Re: [PATCH v6 00/10] Support virtio-gpu DRM native context
> >>
> >> Dmitry Osipenko  writes:
> >>
> >> > On 1/27/25 19:17, Alex Bennée wrote:
> >> > ...
> >> >> I'm still seeing corruption with -display gtk,gl=on on my x86
> >> >> system BTW. I would like to understand if that is a problem with
> >> >> QEMU, GTK or something else in the stack before we merge.
> >> >
> >> > I reproduced the display mirroring/corruption issue and bisected it
> >> > to the following commit. The problem only happens when QEMU/GTK
> >> > uses Wayland display directly, while previously I was running QEMU
> >> > with XWayland that doesn't have the problem. Why this change breaks
> >> > dmabuf displaying with Wayland/GTK is unclear.
> >>
> >> Ahh that makes sense - I obviously forgot to mention I'm running
> >> sway/wayland across both machines.
> >>
> >> > Reverting commit fixes the bug.
> >> >
> >> > +Dongwon Kim +Vivek Kasireddy
> >> >
> >> > commit 77bf310084dad38b3a2badf01766c659056f1cf2
> >> > Author: Dongwon Kim 
> >> > Date:   Fri Apr 26 15:50:59 2024 -0700
> >> >
> >> > ui/gtk: Draw guest frame at refresh cycle
> >> >
> >> > Draw routine needs to be manually invoked in the next refresh
> >> > if there is a scanout blob from the guest. This is to prevent
> >> > a situation where there is a scheduled draw event but it won't
> >> > happen bacause the window is currently in inactive state
> >> > (minimized or tabified). If draw is not done for a long time,
> >> > gl_block timeout and/or fence timeout (on the guest) will happen
> >> > eventually.
> >> >
> >> > v2: Use gd_gl_area_draw(vc) in gtk-gl-area.c
> >> >
> >> > Suggested-by: Vivek Kasireddy 
> >> > Cc: Gerd Hoffmann 
> >> > Cc: Marc-André Lureau 
> >> > Cc: Daniel P. Berrangé 
> >> > Signed-off-by: Dongwon Kim 
> >> > Acked-by: Marc-André Lureau 
> >> > Message-Id: <20240426225059.3871283-1-dongwon@intel.com>
> >>
> >>
> >> Maybe a race on:
> >>
> >> QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf; ?
> >>
> >> --
> >> Alex Bennée
> >> Virtualisation Tech Lead @ Linaro
> 
> --
> Alex Bennée
> Virtualisation Tech Lead @ Linaro


Re: [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls

2025-02-03 Thread Peter Xu
On Mon, Feb 03, 2025 at 09:15:52PM +0100, Maciej S. Szmigiero wrote:
> On 3.02.2025 20:58, Peter Xu wrote:
> > On Mon, Feb 03, 2025 at 02:57:36PM +0100, Maciej S. Szmigiero wrote:
> > > On 2.02.2025 13:45, Dr. David Alan Gilbert wrote:
> > > > * Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote:
> > > > > On 2.02.2025 03:06, Dr. David Alan Gilbert wrote:
> > > > > > * Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote:
> > > > > > > From: "Maciej S. Szmigiero" 
> > > > > > > 
> > > > > > > postcopy_ram_listen_thread() is a free running thread, so it 
> > > > > > > needs to
> > > > > > > take BQL around function calls to migration methods requiring BQL.
> > > > > > > 
> > > > > > > qemu_loadvm_state_main() needs BQL held since it ultimately calls
> > > > > > > "load_state" SaveVMHandlers.
> > > > > > > 
> > > > > > > migration_incoming_state_destroy() needs BQL held since it 
> > > > > > > ultimately calls
> > > > > > > "load_cleanup" SaveVMHandlers.
> > > > > > > 
> > > > > > > Signed-off-by: Maciej S. Szmigiero 
> > > > > > > ---
> > > > > > > migration/savevm.c | 4 
> > > > > > > 1 file changed, 4 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > > > > > index b0b74140daea..0ceea9638cc1 100644
> > > > > > > --- a/migration/savevm.c
> > > > > > > +++ b/migration/savevm.c
> > > > > > > @@ -2013,7 +2013,9 @@ static void 
> > > > > > > *postcopy_ram_listen_thread(void *opaque)
> > > > > > >  * in qemu_file, and thus we must be blocking now.
> > > > > > >  */
> > > > > > > qemu_file_set_blocking(f, true);
> > > > > > > +bql_lock();
> > > > > > > load_res = qemu_loadvm_state_main(f, mis);
> > > > > > > +bql_unlock();
> > > > > > 
> > > > > > Doesn't that leave that held for a heck of a long time?
> > > > > 
> > > > > Yes, and it effectively broke "postcopy recover" test but I
> > > > > think the reason for that is qemu_loadvm_state_main() and
> > > > > its children don't drop BQL while waiting for I/O.
> > > > > 
> > > > > I've described this case in more detail in my reply to Fabiano here:
> > > > > https://lore.kernel.org/qemu-devel/0a09e627-955e-4f26-8d08-0192ecd25...@maciej.szmigiero.name/
> > > > 
> > > > While it might be the cause in this case, my feeling is it's more 
> > > > fundamental
> > > > here - it's the whole reason that postcopy has a separate ram listen
> > > > thread.  As the destination is running, after it loads it's devices
> > > > and as it starts up the destination will be still loading RAM
> > > > (and other postcopiable devices) potentially for quite a while.
> > > > Holding the bql around the ram listen thread means that the
> > > > execution of the destination won't be able to take that lock
> > > > until the postcopy load has finished; so while that might apparently
> > > > complete, it'll lead to the destination stalling until that's finished
> > > > which defeats the whole point of postcopy.
> > > > That last one probably won't fail a test but it will lead to a long 
> > > > stall
> > > > if you give it a nice big guest with lots of RAM that it's rapidly
> > > > changing.
> > > 
> > > Okay, I understand the postcopy case/flow now.
> > > Thanks for explaining it clearly.
> > > 
> > > > > I still think that "load_state" SaveVMHandlers need to be called
> > > > > with BQL held since implementations apparently expect it that way:
> > > > > for example, I think PCI device configuration restore calls
> > > > > address space manipulation methods which abort() if called
> > > > > without BQL held.
> > > > 
> > > > However, the only devices that *should* be arriving on the channel
> > > > that the postcopy_ram_listen_thread is reading from are those
> > > > that are postcopiable (i.e. RAM and hmm block's dirty_bitmap).
> > > > Those load handlers are safe to be run while the other devices
> > > > are being changed.   Note the *should* - you could add a check
> > > > to fail if any other device arrives on that channel.
> > > 
> > > I think ultimately there should be either an explicit check, or,
> > > as you suggest in the paragraph below, a separate SaveVMHandler
> > > that runs without BQL held.
> > 
> > To me those are bugs happening during postcopy, so those abort()s in
> > memory.c are indeed for catching these issues too.
> > 
> > > Since the current state of just running these SaveVMHandlers
> > > without BQL in this case and hoping that nothing breaks is
> > > clearly sub-optimal.
> > > 
> > > > > I have previously even submitted a patch to explicitly document
> > > > > "load_state" SaveVMHandler as requiring BQL (which was also
> > > > > included in the previous version of this patch set) and it
> > > > > received a "Reviewed-by:" tag:
> > > > > https://lore.kernel.org/qemu-devel/6976f129df610c8207da4e531c8c0475ec204fa4.1730203967.git.maciej.szmigi...@oracle.com/
> > > > > https://lore.kernel.org/qemu-devel/e1949839932efaa531e2fe63ac13324e5787439c.1731773021.git.maciej.

Re: [PATCH v2 12/15] nbd/server: Support inactive nodes

2025-02-03 Thread Stefan Hajnoczi
On Fri, Jan 31, 2025 at 10:50:48AM +0100, Kevin Wolf wrote:
> In order to support running an NBD export on inactive nodes, we must
> make sure to return errors for any operations that aren't allowed on
> inactive nodes. Reads are the only operation we know we need for
> inactive images, so to err on the side of caution, return errors for
> everything else, even if some operations could possibly be okay.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  nbd/server.c | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index f64e47270c..2076fb2666 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -2026,6 +2026,7 @@ static void nbd_export_delete(BlockExport *blk_exp)
>  const BlockExportDriver blk_exp_nbd = {
>  .type   = BLOCK_EXPORT_TYPE_NBD,
>  .instance_size  = sizeof(NBDExport),
> +.supports_inactive  = true,
>  .create = nbd_export_create,
>  .delete = nbd_export_delete,
>  .request_shutdown   = nbd_export_request_shutdown,
> @@ -2920,6 +2921,22 @@ static coroutine_fn int nbd_handle_request(NBDClient 
> *client,
>  NBDExport *exp = client->exp;
>  char *msg;
>  size_t i;
> +bool inactive;
> +
> +WITH_GRAPH_RDLOCK_GUARD() {
> +inactive = bdrv_is_inactive(blk_bs(exp->common.blk));
> +if (inactive) {
> +switch (request->type) {
> +case NBD_CMD_READ:
> +/* These commands are allowed on inactive nodes */
> +break;
> +default:
> +/* Return an error for the rest */
> +return nbd_send_generic_reply(client, request, -EPERM,
> +  "export is inactive", errp);
> +}
> +}
> +}

Hmm...end of lock guard. What prevents the race where inactive changes
before the request is performed?

>  
>  switch (request->type) {
>  case NBD_CMD_CACHE:
> -- 
> 2.48.1
> 


signature.asc
Description: PGP signature


Re: [PATCH v2 11/15] block/export: Add option to allow export of inactive nodes

2025-02-03 Thread Eric Blake
On Fri, Jan 31, 2025 at 10:50:47AM +0100, Kevin Wolf wrote:
> Add an option in BlockExportOptions to allow creating an export on an
> inactive node without activating the node. This mode needs to be
> explicitly supported by the export type (so that it doesn't perform any
> operations that are forbidden for inactive nodes), so this patch alone
> doesn't allow this option to be successfully used yet.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  qapi/block-export.json | 10 +-
>  include/block/block-global-state.h |  3 +++
>  include/block/export.h |  3 +++
>  block.c|  4 
>  block/export/export.c  | 31 --
>  5 files changed, 40 insertions(+), 11 deletions(-)
>

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH v2 15/15] iotests: Add (NBD-based) tests for inactive nodes

2025-02-03 Thread Eric Blake
On Fri, Jan 31, 2025 at 10:50:51AM +0100, Kevin Wolf wrote:
> This tests different types of operations on inactive block nodes
> (including graph changes, block jobs and NBD exports) to make sure that
> users manually activating and inactivating nodes doesn't break things.
> 
> Support for inactive nodes in other export types will have to come with
> separate test cases because they have different dependencies like blkio
> or root permissions and we don't want to disable this basic test when
> they are not fulfilled.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  tests/qemu-iotests/iotests.py |   4 +
>  tests/qemu-iotests/tests/inactive-node-nbd| 303 ++
>  .../qemu-iotests/tests/inactive-node-nbd.out  | 239 ++
>  3 files changed, 546 insertions(+)
>  create mode 100755 tests/qemu-iotests/tests/inactive-node-nbd
>  create mode 100644 tests/qemu-iotests/tests/inactive-node-nbd.out
> 

> +iotests.log('\nMirror from active source to inactive target')
> +
> +iotests.log('disk-fmt active: %s' % node_is_active(vm, 'disk-fmt'))
> +iotests.log('snap-fmt active: %s' % node_is_active(vm, 'snap-fmt'))
> +iotests.log('snap2-fmt active: %s' % node_is_active(vm, 'snap2-fmt'))
> +iotests.log('target-fmt active: %s' % node_is_active(vm, 'target-fmt'))
> +
> +# Activating snap2-fmt recursively activates the whole backing chain
> +vm.qmp_log('blockdev-set-active', node_name='snap2-fmt', active=True)
> +vm.qmp_log('blockdev-set-active', node_name='target-fmt', active=False)

Here, you have "Activating ... recursively activates"...

> +
> +iotests.log('disk-fmt active: %s' % node_is_active(vm, 'disk-fmt'))
> +iotests.log('snap-fmt active: %s' % node_is_active(vm, 'snap-fmt'))
> +iotests.log('snap2-fmt active: %s' % node_is_active(vm, 'snap2-fmt'))
> +iotests.log('target-fmt active: %s' % node_is_active(vm, 'target-fmt'))
> +
> +vm.qmp_log('blockdev-mirror', job_id='job0', device='snap2-fmt',
> +   target='target-fmt', sync='full',
> +   filters=[iotests.filter_qmp_generated_node_ids])
> +
> +iotests.log('\nBackup from active source to inactive target')
> +
> +vm.qmp_log('blockdev-backup', job_id='job0', device='snap2-fmt',
> +   target='target-fmt', sync='full',
> +   filters=[iotests.filter_qmp_generated_node_ids])
> +
> +iotests.log('\nBackup from inactive source to active target')
> +
> +# Activating snap2-fmt recursively inactivates the whole backing chain
> +vm.qmp_log('blockdev-set-active', node_name='snap2-fmt', active=False)
> +vm.qmp_log('blockdev-set-active', node_name='target-fmt', active=True)

...but here, "Activating ... recursively inactivates".  Is one of
these statements wrong?

Overall a nice barrage of tests, and I can see how adding this many
tests caused your v2 to fix some bugs that it discovered in v1.


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH v4 10/33] error: define g_autoptr() cleanup function for the Error type

2025-02-03 Thread Peter Xu
On Thu, Jan 30, 2025 at 11:08:31AM +0100, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" 
> 
> Automatic memory management helps avoid memory safety issues.
> 
> Signed-off-by: Maciej S. Szmigiero 

Reviewed-by: Peter Xu 

-- 
Peter Xu




Re: [PATCH v2 00/14] meson: Deprecate 32-bit host support

2025-02-03 Thread Richard Henderson

On 2/3/25 04:54, Paolo Bonzini wrote:

On 2/3/25 04:18, Richard Henderson wrote:

v1: 20250128004254.33442-1-richard.hender...@linaro.org

For v2, immediately disable 64-on-32 TCG.

I *suspect* that we should disable 64-on-32 for *all* accelerators.
The idea that an i686 binary on an x86_64 host may be used to spawn
an x86_64 guest via kvm is silly and a bit more than niche.


At least Xen used to be commonly used with 32-bit dom0, because it saved memory and dom0 
would map in guest buffers as needed.  I'm not sure how common that is these days, perhaps 
Stefano knows.


As a data-point, debian does not ship libxen-dev for i686.
We cannot build-test this configuration at all.

I can build-test Xen for armhf, and I guess it would use i386-softmmu; it's unclear 
whether x86_64-softmmu and aarch64-softmmu are relevant or useful for an armhf host, or as 
an armhf binary running on an aarch64 host.



r~



[PATCH] tests/functional: Extend the ppc64 e500 test

2025-02-03 Thread Cédric Le Goater
The test sequence boots a ppce500 machine from kernel and disk.

The buildroot is built with the qemu_ppc64_e5500_defconfig config.

Signed-off-by: Cédric Le Goater 
---
 tests/functional/test_ppc64_e500.py | 30 +
 1 file changed, 30 insertions(+)

diff --git a/tests/functional/test_ppc64_e500.py 
b/tests/functional/test_ppc64_e500.py
index b92fe0b0e75e..f21d7d84177e 100755
--- a/tests/functional/test_ppc64_e500.py
+++ b/tests/functional/test_ppc64_e500.py
@@ -5,6 +5,7 @@
 # SPDX-License-Identifier: GPL-2.0-or-later
 
 from qemu_test import LinuxKernelTest, Asset
+from qemu_test import exec_command_and_wait_for_pattern
 
 
 class E500Test(LinuxKernelTest):
@@ -20,5 +21,34 @@ def test_ppc64_e500(self):
 self.launch_kernel(self.scratch_file('day19', 'uImage'),
wait_for='QEMU advent calendar')
 
+ASSET_BR2_E5500_UIMAGE = Asset(
+
'https://github.com/legoater/qemu-ppc-boot/raw/refs/heads/main/buildroot/qemu_ppc64_e5500-2023.11-8-gdcd9f0f6eb-20240104/uImage',
+'2478187c455d6cca3984e9dfde9c635d824ea16236b85fd6b4809f744706deda')
+
+ASSET_BR2_E5500_ROOTFS = Asset(
+
'https://github.com/legoater/qemu-ppc-boot/raw/refs/heads/main//buildroot/qemu_ppc64_e5500-2023.11-8-gdcd9f0f6eb-20240104/rootfs.ext2',
+'9035ef97237c84c7522baaff17d25cdfca4bb7a053d5e296e902919473423d76')
+
+def test_ppc64_e500_buildroot(self):
+self.set_machine('ppce500')
+self.cpu = 'e5500'
+
+uimage_path = self.ASSET_BR2_E5500_UIMAGE.fetch()
+rootfs_path = self.ASSET_BR2_E5500_ROOTFS.fetch()
+
+self.vm.set_console()
+self.vm.add_args('-kernel', uimage_path,
+ '-append', 'root=/dev/vda',
+ '-drive', f'file={rootfs_path},if=virtio,format=raw',
+ '-snapshot', '-no-shutdown')
+self.vm.launch()
+
+self.wait_for_console_pattern('Linux version')
+self.wait_for_console_pattern('/init as init process')
+self.wait_for_console_pattern('lease of 10.0.2.15')
+self.wait_for_console_pattern('buildroot login:')
+exec_command_and_wait_for_pattern(self, 'root', '#')
+exec_command_and_wait_for_pattern(self, 'poweroff', 'Power down')
+
 if __name__ == '__main__':
 LinuxKernelTest.main()
-- 
2.48.1




Re: [PATCH v2 01/14] meson: Drop tcg as a module

2025-02-03 Thread Thomas Huth

On 03/02/2025 04.18, Richard Henderson wrote:

The fact that this is only enabled for x86 probably means it
was done incorrectly.  Certainly the set of files selected to
go into the module is woefully incomplete.  Drop it for now.

Signed-off-by: Richard Henderson 
---
  accel/tcg/meson.build | 11 ---
  meson.build   | 18 +-
  2 files changed, 5 insertions(+), 24 deletions(-)


Looking at the cover letter 
https://lore.kernel.org/qemu-devel/20210624103836.2382472-1-kra...@redhat.com/ 
it indeed only mentions "a small fraction of tcg (x86 only)", and since 
there were no follow up patches, it sounds like an incomplete conversion to 
me. So reverting it three and a half years later sounds reasonable.


Reviewed-by: Thomas Huth 




[PATCH v3] spapr: nested: Add support for reporting Hostwide state counter

2025-02-03 Thread Vaibhav Jain
Add support for reporting Hostwide state counters for nested KVM pseries
guests running with 'cap-nested-papr' on Qemu-TCG acting as
L0-hypervisor. sPAPR supports reporting various stats counters for
Guest-Management-Area(GMA) thats owned by L0-Hypervisor and are documented
at [1]. These stats counters are exposed via a new bit-flag named
'getHostWideState' for the H_GUEST_GET_STATE hcall. Once this flag is set
the hcall should populate the Guest-State-Elements in the requested GSB
with the stat counter values. Currently following five counters are
supported:

* host_heap : The currently used bytes in the
  Hypervisor's Guest Management Space
  associated with the Host Partition.
* host_heap_max : The maximum bytes available in the
  Hypervisor's Guest Management Space
  associated with the Host Partition.
* host_pagetable: The currently used bytes in the
  Hypervisor's Guest Page Table Management
  Space associated with the Host Partition.
* host_pagetable_max: The maximum bytes available in the
  Hypervisor's Guest Page Table Management
  Space associated with the Host Partition.
* host_pagetable_reclaim: The amount of space in bytes that has
  been reclaimed due to overcommit in the
  Hypervisor's Guest Page Table Management
  Space associated with the Host Partition.

At the moment '0' is being reported for all these counters as these
counters doesnt align with how L0-Qemu manages Guest memory.

The patch implements support for these counters by adding new members to
the 'struct SpaprMachineStateNested'. These new members are then plugged
into the existing 'guest_state_element_types[]' with the help of a new
macro 'GSBE_NESTED_MACHINE_DW' together with a new helper
'get_machine_ptr()'. guest_state_request_check() is updated to ensure
correctness of the requested GSB and finally h_guest_getset_state() is
updated to handle the newly introduced flag
'GUEST_STATE_REQUEST_HOST_WIDE'.

This patch is tested with the proposed linux-kernel implementation to
expose these stat-counter as perf-events at [2].

[1]
https://lore.kernel.org/all/20241222140247.174998-2-vaib...@linux.ibm.com

[2]
https://lore.kernel.org/all/20241222140247.174998-1-vaib...@linux.ibm.com

Signed-off-by: Vaibhav Jain 
Reviewed-by: Harsh Prateek Bora 
---
Changelog:

v2->v1:
* Fixed minor nits suggested [Harsh]
* s/GUEST_STATE_ELEMENT_TYPE_FLAG_HOST_WIDE/GUEST_STATE_REQUEST_HOST_WIDE/
in guest_state_request_check() [Harsh]
* MInor change in the order of comparision in h_guest_getset_state()
[Harsh]
* Added reviewed-by of Harsh

v1->v2:
* Introduced new flags to correctly compare hcall flags
  for H_GUEST_{GET,SET}_STATE [Harsh]
* Fixed ordering of new GSB elements in spapr_nested.h [Harsh]
* s/GSBE_MACHINE_NESTED_DW/GSBE_NESTED_MACHINE_DW/
* Minor tweaks to patch description
* Updated recipients list
---
 hw/ppc/spapr_nested.c | 82 ++-
 include/hw/ppc/spapr_nested.h | 39 ++---
 2 files changed, 96 insertions(+), 25 deletions(-)

diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 7def8eb73b..d1aa6fc866 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -64,10 +64,9 @@ static
 SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState *spapr,
  target_ulong guestid)
 {
-SpaprMachineStateNestedGuest *guest;
-
-guest = g_hash_table_lookup(spapr->nested.guests, 
GINT_TO_POINTER(guestid));
-return guest;
+return spapr->nested.guests ?
+g_hash_table_lookup(spapr->nested.guests,
+GINT_TO_POINTER(guestid)) : NULL;
 }
 
 bool spapr_get_pate_nested_papr(SpaprMachineState *spapr, PowerPCCPU *cpu,
@@ -613,6 +612,13 @@ static void *get_guest_ptr(SpaprMachineStateNestedGuest 
*guest,
 return guest; /* for GSBE_NESTED */
 }
 
+static void *get_machine_ptr(SpaprMachineStateNestedGuest *guest,
+ target_ulong vcpuid)
+{
+SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+return &spapr->nested;
+}
+
 /*
  * set=1 means the L1 is trying to set some state
  * set=0 means the L1 is trying to get some state
@@ -1012,7 +1018,12 @@ struct guest_state_element_type 
guest_state_element_types[] = {
 GSBE_NESTED_VCPU(GSB_VCPU_OUT_BUFFER, 0x10, runbufout,   
copy_state_runbuf),
 GSBE_NESTED_VCPU(GSB_VCPU_OUT_BUF_MIN_SZ, 0x8, runbufout, 
out_buf_min_size),
 GSBE_NESTED_VCPU(GSB_VCPU_HDEC_EXPIRY_TB, 0x8, hdecr_expiry_tb,
- copy_state_hdecr)
+ copy_state_hdecr),
+GSBE_NESTED_MACHINE_DW(GSB_GUEST_HEAP, current_guest_heap),
+GSBE_NESTED_MACHINE_DW(GSB_GUEST_HEAP_MAX, max_guest_heap)

Re: [PATCH v2 04/14] meson: Introduce CONFIG_TCG_TARGET

2025-02-03 Thread Thomas Huth

On 03/02/2025 04.18, Richard Henderson wrote:

Use CONFIG_TCG as a project-wide flag to indicate that TCG is enabled
for *some* target.  Use CONFIG_TCG_TARGET to indicate that TCG is
enabled for a specific target.

Within a specific compilation unit, we can remap CONFIG_TCG based on
CONFIG_TCG_TARGET.  This allows us to avoid changes to the bulk of
the code base.

Within meson.build, while CONFIG_TCG may be set in config_host_data,
it may not be set within config_target.  Thus all references to
CONFIG_TCG in source_set 'when:' need not be updated.

For the moment, CONFIG_TCG and CONFIG_TCG_TARGET are identical.

Signed-off-by: Richard Henderson 
---
  include/qemu/osdep.h |  7 +++
  meson.build  | 11 +++
  2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 112ebdff21..1f6f73a148 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -34,9 +34,16 @@
  #include "config-host.h"
  #ifdef COMPILING_PER_TARGET
  #include CONFIG_TARGET
+# ifdef CONFIG_TCG_TARGET
+#  undef CONFIG_TCG_TARGET
+# else
+#  undef CONFIG_TCG
+# endif
  #else
  #include "exec/poison.h"
  #endif
+#pragma GCC poison CONFIG_TCG_TARGET


Shouldn't that rather go before the "#endif" instead?

Also, would it be possible to rather adjust scripts/make-config-poison.sh 
instead of poisoning this switch manually?


 Thomas

  
  /*

   * HOST_WORDS_BIGENDIAN was replaced with HOST_BIG_ENDIAN. Prevent it from
diff --git a/meson.build b/meson.build
index b72114819b..5ca3cc3f34 100644
--- a/meson.build
+++ b/meson.build
@@ -3270,11 +3270,14 @@ foreach target : target_dirs
  
target_kconfig = []

foreach sym: accelerators
-if sym == 'CONFIG_TCG' or target in accelerator_targets.get(sym, [])
-  config_target += { sym: 'y' }
-  config_all_accel += { sym: 'y' }
-  target_kconfig += [ sym + '=y' ]
+if sym == 'CONFIG_TCG'
+  config_target += { 'CONFIG_TCG_TARGET': 'y' }
+elif target not in accelerator_targets.get(sym, [])
+  continue
  endif
+config_target += { sym: 'y' }
+config_all_accel += { sym: 'y' }
+target_kconfig += [ sym + '=y' ]
endforeach
if target_kconfig.length() == 0
  if default_targets





Re: [PATCH v2] spapr: nested: Add support for reporting Hostwide state counter

2025-02-03 Thread Vaibhav Jain
Harsh Prateek Bora  writes:

> On 1/23/25 17:25, Vaibhav Jain wrote:
>> Add support for reporting Hostwide state counters for nested KVM pseries
>> guests running with 'cap-nested-papr' on Qemu-TCG acting as
>> L0-hypervisor. sPAPR supports reporting various stats counters for
>> Guest-Management-Area(GMA) thats owned by L0-Hypervisor and are documented
>> at [1]. These stats counters are exposed via a new bit-flag named
>> 'getHostWideState' for the H_GUEST_GET_STATE hcall. Once this flag is set
>> the hcall should populate the Guest-State-Elements in the requested GSB
>> with the stat counter values. Currently following five counters are
>> supported:
>> 
>> * host_heap  : The currently used bytes in the
>>Hypervisor's Guest Management Space
>>associated with the Host Partition.
>> * host_heap_max  : The maximum bytes available in the
>>Hypervisor's Guest Management Space
>>associated with the Host Partition.
>> * host_pagetable : The currently used bytes in the
>>Hypervisor's Guest Page Table Management
>>Space associated with the Host Partition.
>> * host_pagetable_max : The maximum bytes available in the
>>Hypervisor's Guest Page Table Management
>>Space associated with the Host Partition.
>> * host_pagetable_reclaim: The amount of space in bytes that has
>>been reclaimed due to overcommit in the
>>Hypervisor's Guest Page Table Management
>>Space associated with the Host Partition.
>> 
>> At the moment '0' is being reported for all these counters as these
>> counters doesnt align with how L0-Qemu manages Guest memory.
>> 
>> The patch implements support for these counters by adding new members to
>> the 'struct SpaprMachineStateNested'. These new members are then plugged
>> into the existing 'guest_state_element_types[]' with the help of a new
>> macro 'GSBE_NESTED_MACHINE_DW' together with a new helper
>> 'get_machine_ptr()'. guest_state_request_check() is updated to ensure
>> correctness of the requested GSB and finally h_guest_getset_state() is
>> updated to handle the newly introduced flag
>> 'GUEST_STATE_REQUEST_HOST_WIDE'.
>> 
>> This patch is tested with the proposed linux-kernel implementation to
>> expose these stat-counter as perf-events at [2].
>> 
>> [1]
>> https://lore.kernel.org/all/20241222140247.174998-2-vaib...@linux.ibm.com
>> 
>> [2]
>> https://lore.kernel.org/all/20241222140247.174998-1-vaib...@linux.ibm.com
>> 
>> Signed-off-by: Vaibhav Jain 
>> ---
>> Changelog:
>> 
>> v1->v2:
>> * Introduced new flags to correctly compare hcall flags
>>for H_GUEST_{GET,SET}_STATE [Harsh]
>> * Fixed ordering of new GSB elements in spapr_nested.h [Harsh]
>> * s/GSBE_MACHINE_NESTED_DW/GSBE_NESTED_MACHINE_DW/
>> * Minor tweaks to patch description
>> * Updated recipients list
>> ---
>>   hw/ppc/spapr_nested.c | 82 ++-
>>   include/hw/ppc/spapr_nested.h | 39 ++---
>>   2 files changed, 96 insertions(+), 25 deletions(-)
>> 
>> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
>> index 7def8eb73b..7f484bb3e7 100644
>> --- a/hw/ppc/spapr_nested.c
>> +++ b/hw/ppc/spapr_nested.c
>> @@ -64,10 +64,9 @@ static
>>   SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState 
>> *spapr,
>>target_ulong guestid)
>>   {
>> -SpaprMachineStateNestedGuest *guest;
>> -
>> -guest = g_hash_table_lookup(spapr->nested.guests, 
>> GINT_TO_POINTER(guestid));
>> -return guest;
>> +return spapr->nested.guests ?
>> +g_hash_table_lookup(spapr->nested.guests,
>> +GINT_TO_POINTER(guestid)) : NULL;
>>   }
>>   
>>   bool spapr_get_pate_nested_papr(SpaprMachineState *spapr, PowerPCCPU *cpu,
>> @@ -613,6 +612,13 @@ static void *get_guest_ptr(SpaprMachineStateNestedGuest 
>> *guest,
>>   return guest; /* for GSBE_NESTED */
>>   }
>>   
>> +static void *get_machine_ptr(SpaprMachineStateNestedGuest *guest,
>> + target_ulong vcpuid)
>> +{
>> +SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +return &spapr->nested;
>> +}
>> +
>>   /*
>>* set=1 means the L1 is trying to set some state
>>* set=0 means the L1 is trying to get some state
>> @@ -1012,7 +1018,12 @@ struct guest_state_element_type 
>> guest_state_element_types[] = {
>>   GSBE_NESTED_VCPU(GSB_VCPU_OUT_BUFFER, 0x10, runbufout,   
>> copy_state_runbuf),
>>   GSBE_NESTED_VCPU(GSB_VCPU_OUT_BUF_MIN_SZ, 0x8, runbufout, 
>> out_buf_min_size),
>>   GSBE_NESTED_VCPU(GSB_VCPU_HDEC_EXPIRY_TB, 0x8, hdecr_expiry_tb,
>> - copy_state_hdecr)
>> + copy_state_hdecr),
>> +GSBE_NESTED_MACHINE_DW(GSB_GUEST_HEAP, curr

Re: [PATCH] tests/functional: Extend the ppc64 e500 test

2025-02-03 Thread Thomas Huth

On 03/02/2025 10.57, Cédric Le Goater wrote:

The test sequence boots a ppce500 machine from kernel and disk.

The buildroot is built with the qemu_ppc64_e5500_defconfig config.

Signed-off-by: Cédric Le Goater 
---
  tests/functional/test_ppc64_e500.py | 30 +
  1 file changed, 30 insertions(+)

diff --git a/tests/functional/test_ppc64_e500.py 
b/tests/functional/test_ppc64_e500.py
index b92fe0b0e75e..f21d7d84177e 100755
--- a/tests/functional/test_ppc64_e500.py
+++ b/tests/functional/test_ppc64_e500.py
@@ -5,6 +5,7 @@
  # SPDX-License-Identifier: GPL-2.0-or-later
  
  from qemu_test import LinuxKernelTest, Asset

+from qemu_test import exec_command_and_wait_for_pattern
  
  
  class E500Test(LinuxKernelTest):

@@ -20,5 +21,34 @@ def test_ppc64_e500(self):
  self.launch_kernel(self.scratch_file('day19', 'uImage'),
 wait_for='QEMU advent calendar')
  
+ASSET_BR2_E5500_UIMAGE = Asset(

+
'https://github.com/legoater/qemu-ppc-boot/raw/refs/heads/main/buildroot/qemu_ppc64_e5500-2023.11-8-gdcd9f0f6eb-20240104/uImage',
+'2478187c455d6cca3984e9dfde9c635d824ea16236b85fd6b4809f744706deda')
+
+ASSET_BR2_E5500_ROOTFS = Asset(
+
'https://github.com/legoater/qemu-ppc-boot/raw/refs/heads/main//buildroot/qemu_ppc64_e5500-2023.11-8-gdcd9f0f6eb-20240104/rootfs.ext2',
+'9035ef97237c84c7522baaff17d25cdfca4bb7a053d5e296e902919473423d76')


Hmm, the advent calendar test that is already available in this file is also 
based on build root ... so I think we don't need both tests here. IIRC I 
built most of the advent calendar images without networking stack (to keep 
them smaller), so your image is likely better suited here, thus I'd suggest 
to remove the advent calendar image now when you add your new test. WDYT?


 Thomas




Re: [PATCH v2 03/14] plugins: Uninline qemu_plugin_add_opts

2025-02-03 Thread Thomas Huth

On 03/02/2025 04.18, Richard Henderson wrote:

No need to expand this function inline.
Unexport qemu_plugin_opts to match.

Signed-off-by: Richard Henderson 
---
  include/qemu/plugin.h | 9 +
  plugins/loader.c  | 7 ++-
  2 files changed, 7 insertions(+), 9 deletions(-)


Reviewed-by: Thomas Huth 




Re: [PATCH v2 02/14] tcg: Move stubs in tcg/perf.h to tcg/perf-stubs.c

2025-02-03 Thread Thomas Huth

On 03/02/2025 04.18, Richard Henderson wrote:

These are not called so frequently as to be
performance sensitive.

Signed-off-by: Richard Henderson 
---
  include/tcg/perf.h | 23 ---
  tcg/perf-stubs.c   | 26 ++
  tcg/meson.build|  2 ++
  3 files changed, 28 insertions(+), 23 deletions(-)
  create mode 100644 tcg/perf-stubs.c


Reviewed-by: Thomas Huth 




Re: [PATCH v2 05/14] tcg: Link only when required in system mode

2025-02-03 Thread Thomas Huth

On 03/02/2025 04.18, Richard Henderson wrote:

Rather than unconditional linkage via system_ss, conditinally
include the static library via specific_ss.  This will elide
the code when CONFIG_TCG is disabled for a specific target.

Signed-off-by: Richard Henderson 
---
  tcg/meson.build | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tcg/meson.build b/tcg/meson.build
index 2977df5862..8266bcb324 100644
--- a/tcg/meson.build
+++ b/tcg/meson.build
@@ -49,4 +49,8 @@ libtcg_system = static_library('tcg_system',
  
  tcg_system = declare_dependency(objects: libtcg_system.extract_all_objects(recursive: false),

  dependencies: tcg_ss.dependencies())
-system_ss.add(tcg_system)
+
+specific_ss.add(when: ['CONFIG_SYSTEM_ONLY', 'CONFIG_TCG'], if_true: 
tcg_system)
+if host_os == 'linux'
+  specific_ss.add(when: 'CONFIG_TCG', if_false: files('perf-stubs.c'))
+endif


Reviewed-by: Thomas Huth 




Re: [PATCH v2 06/14] plugins: Link only when required in system mode

2025-02-03 Thread Thomas Huth

On 03/02/2025 04.18, Richard Henderson wrote:

Provide out-of-line versions of some of the qemu/plugin.h API.
These will be referenced with --enable-plugin, but CONFIG_TCG
is disabled for a specific target.

Signed-off-by: Richard Henderson 
---
  plugins/stubs.c | 49 +
  plugins/meson.build |  5 -
  2 files changed, 53 insertions(+), 1 deletion(-)
  create mode 100644 plugins/stubs.c


Reviewed-by: Thomas Huth 




Re: [PATCH v2 07/14] accel/stubs: Expand stubs for TCG

2025-02-03 Thread Thomas Huth

On 03/02/2025 04.18, Richard Henderson wrote:

Add tcg_allowed, qmp_x_query_jit, qmp_x_query_opcount.
These are referenced when CONFIG_TCG is enabled globally,
but not for a specific target.

Signed-off-by: Richard Henderson 
---
  accel/stubs/tcg-stub.c | 24 
  1 file changed, 24 insertions(+)

diff --git a/accel/stubs/tcg-stub.c b/accel/stubs/tcg-stub.c
index 7f4208fddf..9c2e2dc6e1 100644
--- a/accel/stubs/tcg-stub.c
+++ b/accel/stubs/tcg-stub.c
@@ -13,6 +13,18 @@
  #include "qemu/osdep.h"
  #include "exec/tb-flush.h"
  #include "exec/exec-all.h"
+#include "qapi/error.h"
+
+/*
+ * This file *ought* to be built once and linked only when required.
+ * However, it is built per-target, which means qemu/osdep.h has already
+ * undef'ed CONFIG_TCG, which hides the auto-generated declaration.


So why don't we only build this file once?

 Thomas



+ */
+#define CONFIG_TCG
+#include "qapi/qapi-commands-machine.h"
+
+
+const bool tcg_allowed = false;
  
  void tb_flush(CPUState *cpu)

  {
@@ -27,3 +39,15 @@ G_NORETURN void cpu_loop_exit_restore(CPUState *cpu, 
uintptr_t pc)
  {
  g_assert_not_reached();
  }
+
+HumanReadableText *qmp_x_query_jit(Error **errp)
+{
+error_setg(errp, "JIT information is only available with accel=tcg");
+return NULL;
+}
+
+HumanReadableText *qmp_x_query_opcount(Error **errp)
+{
+error_setg(errp, "Opcode count information is only available with 
accel=tcg");
+return NULL;
+}





Re: [PATCH v2 08/14] target/mips: Protect objects with CONFIG_TCG

2025-02-03 Thread Thomas Huth

On 03/02/2025 04.18, Richard Henderson wrote:

Hack around mips32 host allowing kvm acceleration
of mips64 guest, but tcg is disabled.

Signed-off-by: Richard Henderson 
---
  target/mips/tcg/meson.build| 4 ++--
  target/mips/tcg/system/meson.build | 6 +++---
  2 files changed, 5 insertions(+), 5 deletions(-)


Reviewed-by: Thomas Huth 




Re: [PATCH v2 09/14] gitlab: Replace aarch64 with arm in cross-i686-tci build

2025-02-03 Thread Thomas Huth

On 03/02/2025 04.18, Richard Henderson wrote:

Configuration of 64-bit host on 32-bit guest will shortly
be denied.  Use a 32-bit guest instead.

Signed-off-by: Richard Henderson 
---
  .gitlab-ci.d/crossbuilds.yml | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.gitlab-ci.d/crossbuilds.yml b/.gitlab-ci.d/crossbuilds.yml
index 95dfc39224..7ae0f966f1 100644
--- a/.gitlab-ci.d/crossbuilds.yml
+++ b/.gitlab-ci.d/crossbuilds.yml
@@ -61,7 +61,7 @@ cross-i686-tci:
variables:
  IMAGE: debian-i686-cross
  ACCEL: tcg-interpreter
-EXTRA_CONFIGURE_OPTS: 
--target-list=i386-softmmu,i386-linux-user,aarch64-softmmu,aarch64-linux-user,ppc-softmmu,ppc-linux-user
 --disable-plugins --disable-kvm
+EXTRA_CONFIGURE_OPTS: 
--target-list=i386-softmmu,i386-linux-user,arm-softmmu,arm-linux-user,ppc-softmmu,ppc-linux-user
 --disable-plugins --disable-kvm
  # Force tests to run with reduced parallelism, to see whether this
  # reduces the flakiness of this CI job. The CI
  # environment by default shows us 8 CPUs and so we


Reviewed-by: Thomas Huth 




Re: [PATCH v2 10/14] configure: Define TARGET_LONG_BITS in configs/targets/*.mak

2025-02-03 Thread Thomas Huth

On 03/02/2025 04.18, Richard Henderson wrote:

Define TARGET_LONG_BITS in each target's configure fragment.
Do this without removing the define in target/*/cpu-param.h
so that errors are caught like so:

In file included from .../src/include/exec/cpu-defs.h:26,
  from ../src/target/hppa/cpu.h:24,
  from ../src/linux-user/qemu.h:4,
  from ../src/linux-user/hppa/cpu_loop.c:21:
../src/target/hppa/cpu-param.h:11: error: "TARGET_LONG_BITS" redefined [-Werror]
11 | #define TARGET_LONG_BITS  64
   |
In file included from .../src/include/qemu/osdep.h:36,
  from ../src/linux-user/hppa/cpu_loop.c:20:
./hppa-linux-user-config-target.h:32: note: this is the location of the 
previous definition
32 | #define TARGET_LONG_BITS 32
   |
cc1: all warnings being treated as errors

Signed-off-by: Richard Henderson 
---
[...]> diff --git a/configs/targets/hppa-linux-user.mak 
b/configs/targets/hppa-linux-user.mak

index 8e0a80492f..4295cf384e 100644
--- a/configs/targets/hppa-linux-user.mak
+++ b/configs/targets/hppa-linux-user.mak
@@ -3,3 +3,5 @@ TARGET_ABI32=y
  TARGET_SYSTBL_ABI=common,32
  TARGET_SYSTBL=syscall.tbl
  TARGET_BIG_ENDIAN=y
+# Compromise to ease maintainence vs system mode


s/maintainence/maintenance/


diff --git a/configs/targets/mipsn32-linux-user.mak 
b/configs/targets/mipsn32-linux-user.mak
index 206095da64..39ae214633 100644
--- a/configs/targets/mipsn32-linux-user.mak
+++ b/configs/targets/mipsn32-linux-user.mak
@@ -5,3 +5,4 @@ TARGET_BASE_ARCH=mips
  TARGET_SYSTBL_ABI=n32
  TARGET_SYSTBL=syscall_n32.tbl
  TARGET_BIG_ENDIAN=y
+TARGET_LONG_BITS=64


Why is this 64 ?


diff --git a/configs/targets/mipsn32el-linux-user.mak 
b/configs/targets/mipsn32el-linux-user.mak
index ca2a3ed753..d9b61d6990 100644
--- a/configs/targets/mipsn32el-linux-user.mak
+++ b/configs/targets/mipsn32el-linux-user.mak
@@ -4,3 +4,4 @@ TARGET_ABI32=y
  TARGET_BASE_ARCH=mips
  TARGET_SYSTBL_ABI=n32
  TARGET_SYSTBL=syscall_n32.tbl
+TARGET_LONG_BITS=64


dito?


diff --git a/configs/targets/sparc32plus-linux-user.mak 
b/configs/targets/sparc32plus-linux-user.mak
index 6cc8fa516b..7a16934fd1 100644
--- a/configs/targets/sparc32plus-linux-user.mak
+++ b/configs/targets/sparc32plus-linux-user.mak
@@ -5,3 +5,4 @@ TARGET_ABI_DIR=sparc
  TARGET_SYSTBL_ABI=common,32
  TARGET_SYSTBL=syscall.tbl
  TARGET_BIG_ENDIAN=y
+TARGET_LONG_BITS=64


Same question here: Why 64? If this isn't a mistake, could you maybe add a 
comment?


 Thanks,
  Thomas




Re: [PATCH v2 11/14] target/*: Remove TARGET_LONG_BITS from cpu-param.h

2025-02-03 Thread Thomas Huth

On 03/02/2025 04.18, Richard Henderson wrote:

This is now handled by the configs/targets/*.mak fragment.

Signed-off-by: Richard Henderson 
---
  target/alpha/cpu-param.h  | 2 --
  target/arm/cpu-param.h| 2 --
  target/avr/cpu-param.h| 1 -
  target/hexagon/cpu-param.h| 1 -
  target/hppa/cpu-param.h   | 2 --
  target/i386/cpu-param.h   | 2 --
  target/loongarch/cpu-param.h  | 1 -
  target/m68k/cpu-param.h   | 1 -
  target/microblaze/cpu-param.h | 2 --
  target/mips/cpu-param.h   | 5 -
  target/openrisc/cpu-param.h   | 1 -
  target/ppc/cpu-param.h| 2 --
  target/riscv/cpu-param.h  | 2 --
  target/rx/cpu-param.h | 1 -
  target/s390x/cpu-param.h  | 1 -
  target/sh4/cpu-param.h| 1 -
  target/sparc/cpu-param.h  | 2 --
  target/tricore/cpu-param.h| 1 -
  target/xtensa/cpu-param.h | 1 -
  19 files changed, 31 deletions(-)


Reviewed-by: Thomas Huth 




Re: [PATCH v2 13/14] meson: Deprecate 32-bit host support

2025-02-03 Thread Thomas Huth

On 03/02/2025 04.18, Richard Henderson wrote:

We deprecated i686 system mode support for qemu 8.0.  However, to
make real cleanups to TCG we need to deprecate all 32-bit hosts.

Signed-off-by: Richard Henderson 
---
  docs/about/deprecated.rst | 7 +++
  meson.build   | 6 ++
  2 files changed, 9 insertions(+), 4 deletions(-)


Reviewed-by: Thomas Huth 




Re: [PATCH v3 06/14] acpi/ghes: only set hw_error_le or hest_addr_le

2025-02-03 Thread Jonathan Cameron via
On Fri, 31 Jan 2025 18:42:47 +0100
Mauro Carvalho Chehab  wrote:

> The hw_error_le pointer is used for legacy support (virt-9.2).
> Starting from virt-10.0, HEST table is accessed via hest_addr_le.
> 
> Remove fw_cfg logic for legacy support if virt is 10.0 or upper.
> 
> Signed-off-by: Mauro Carvalho Chehab 
Reviewed-by: Jonathan Cameron 



Re: [PATCH v3 08/14] acpi/ghes: Cleanup the code which gets ghes ged state

2025-02-03 Thread Jonathan Cameron via
On Fri, 31 Jan 2025 18:42:49 +0100
Mauro Carvalho Chehab  wrote:

> Move the check logic into a common function and simplify the
> code which checks if GHES is enabled and was properly setup.
> 
> Signed-off-by: Mauro Carvalho Chehab 
> Reviewed-by: Jonathan Cameron 
> Reviewed-by:  Igor Mammedov 

One minor comment inline on a change I think should be in an earlier
patch.

> -void ghes_record_cper_errors(const void *cper, size_t len,
> +void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t 
> len,
>   uint16_t source_id, Error **errp)
>  {
>  uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register;
> -AcpiGedState *acpi_ged_state;
> -AcpiGhesState *ags;
>  
>  if (len > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
>  error_setg(errp, "GHES CPER record is too big: %zd", len);
>  return;
>  }
>  
> -acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
> -   NULL));
> -if (!acpi_ged_state) {
> -error_setg(errp, "Can't find ACPI_GED object");
> -return;
> -}
> -ags = &acpi_ged_state->ghes_state;
> -
> -if (!ags->hest_addr_le) {
> +if (!ags->use_hest_addr) {

Should this change be moved back to patch 3?  use_hest_addr was available
at that point and it would reduce churn a tiny bit.

>  get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
>   &cper_addr, &read_ack_register_addr);
>  } else {
> @@ -547,11 +531,6 @@ void ghes_record_cper_errors(const void *cper, size_t 
> len,
>  &cper_addr, &read_ack_register_addr, errp);
>  }
>  
> -if (!cper_addr) {
> -error_setg(errp, "can not find Generic Error Status Block");
> -return;
> -}
> -
>  cpu_physical_memory_read(read_ack_register_addr,
>   &read_ack_register, sizeof(read_ack_register));
>  



Re: [PATCH v3 03/14] acpi/ghes: Use HEST table offsets when preparing GHES records

2025-02-03 Thread Jonathan Cameron via
On Fri, 31 Jan 2025 18:42:44 +0100
Mauro Carvalho Chehab  wrote:

> There are two pointers that are needed during error injection:
> 
> 1. The start address of the CPER block to be stored;
> 2. The address of the ack.
> 
> It is preferable to calculate them from the HEST table.  This allows
> checking the source ID, the size of the table and the type of the
> HEST error block structures.
> 
> Yet, keep the old code, as this is needed for migration purposes.
> 
> Signed-off-by: Mauro Carvalho Chehab 
Tiny niggle on patch split up inline.  Either way

Reviewed-by: Jonathan Cameron 

> @@ -212,14 +237,6 @@ static void build_ghes_error_table(GArray 
> *hardware_errors, BIOSLinker *linker,
>  {
>  int i, error_status_block_offset;
>  
> -/*
> - * TODO: Current version supports only one source.
> - * A further patch will drop this check, after adding a proper migration
> - * code, as, for the code to work, we need to store a bios pointer to the
> - * HEST table.
> - */
> -assert(num_sources == 1);
> -
>  /* Build error_block_address */
>  for (i = 0; i < num_sources; i++) {
>  build_append_int_noprefix(hardware_errors, 0, sizeof(uint64_t));
> @@ -352,6 +369,14 @@ void acpi_build_hest(GArray *table_data, GArray 
> *hardware_errors,
>  .oem_id = oem_id, .oem_table_id = oem_table_id };
>  uint32_t hest_offset;
>  int i;
> +AcpiGedState *acpi_ged_state;
> +AcpiGhesState *ags = NULL;
> +
> +acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
> +   NULL));
> +if (acpi_ged_state) {
> +ags = &acpi_ged_state->ghes_state;
> +}
>  
>  hest_offset = table_data->len;
>  
> @@ -371,10 +396,12 @@ void acpi_build_hest(GArray *table_data, GArray 
> *hardware_errors,
>   * Tell firmware to write into GPA the address of HEST via fw_cfg,
>   * once initialized.
>   */
> -bios_linker_loader_write_pointer(linker,
> - ACPI_HEST_ADDR_FW_CFG_FILE, 0,
> - sizeof(uint64_t),
> - ACPI_BUILD_TABLE_FILE, hest_offset);
> +if (ags->use_hest_addr) {

Maybe move ags->use_hest_addr introduction to previous patch to avoid
churn here?  It's not set yet anyway.

> +bios_linker_loader_write_pointer(linker,
> + ACPI_HEST_ADDR_FW_CFG_FILE, 0,
> + sizeof(uint64_t),
> + ACPI_BUILD_TABLE_FILE, hest_offset);
> +}
>  }
>  
>  void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> @@ -420,6 +447,78 @@ static void get_hw_error_offsets(uint64_t ghes_addr,
>  *read_ack_register_addr = ghes_addr + sizeof(uint64_t);
>  }






Re: [PATCH v2 12/14] meson: Disallow 64-bit on 32-bit TCG emulation

2025-02-03 Thread Thomas Huth

On 03/02/2025 04.18, Richard Henderson wrote:

For system mode, we can rarely support the amount of RAM that
the guest requires. Emulation is restricted to round-robin
mode, which solves many of the atomicity issues, but not those
associated with virtio.  In any case, round-robin does nothing
to help the speed of emulation.

For user mode, most emulation does not succeed at all.  Most
of the time we cannot even load 64-bit non-PIE binaries due
to lack of a 64-bit address space.  Threads are run in
parallel, not round-robin, which means that atomicity
is not handled.

Signed-off-by: Richard Henderson 
---
  meson.build | 14 +++---
  1 file changed, 11 insertions(+), 3 deletions(-)


Reviewed-by: Thomas Huth 




Re: [PATCH v3 12/14] tests/acpi: virt: add a HEST table to aarch64 virt and update DSDT

2025-02-03 Thread Jonathan Cameron via
On Fri, 31 Jan 2025 18:42:53 +0100
Mauro Carvalho Chehab  wrote:

> --- a/DSDT.dsl2025-01-28 09:38:15.155347858 +0100
> +++ b/DSDT.dsl2025-01-28 09:39:01.684836954 +0100
> @@ -9,9 +9,9 @@
>   *
>   * Original Table Header:
>   * Signature"DSDT"
> - * Length   0x1516 (5398)
> + * Length   0x1542 (5442)
>   * Revision 0x02
> - * Checksum 0x0F
> + * Checksum 0xE9
>   * OEM ID   "BOCHS "
>   * OEM Table ID "BXPC"
>   * OEM Revision 0x0001 (1)
> @@ -1931,6 +1931,11 @@
>  {
>  Notify (PWRB, 0x80) // Status Change
>  }
> +
> +If (((Local0 & 0x10) == 0x10))
> +{
> +Notify (GEDD, 0x80) // Status Change
> +}
>  }
>  }
> 
> @@ -1939,6 +1944,12 @@
>  Name (_HID, "PNP0C0C" /* Power Button Device */)  // _HID: 
> Hardware ID
>  Name (_UID, Zero)  // _UID: Unique ID
>  }
> +
> +Device (GEDD)
> +{
> +Name (_HID, "PNP0C33" /* Error Device */)  // _HID: Hardware ID
> +Name (_UID, Zero)  // _UID: Unique ID
> +}
>  }
>  }
> 
> Signed-off-by: Mauro Carvalho Chehab 
Diff looks good.

Reviewed-by: Jonathan Cameron 



Re: [PATCH v3 14/14] scripts/ghes_inject: add a script to generate GHES error inject

2025-02-03 Thread Jonathan Cameron via
On Fri, 31 Jan 2025 18:42:55 +0100
Mauro Carvalho Chehab  wrote:

> Using the QMP GHESv2 API requires preparing a raw data array
> containing a CPER record.
> 
> Add a helper script with subcommands to prepare such data.
> 
> Currently, only ARM Processor error CPER record is supported, by
> using:
>   $ ghes_inject.py arm
> 
> which produces those warnings on Linux:
> 
> [  705.032426] [Firmware Warn]: GHES: Unhandled processor error type 0x02: 
> cache error
> [  774.866308] {4}[Hardware Error]: Hardware error from APEI Generic Hardware 
> Error Source: 1
> [  774.866583] {4}[Hardware Error]: event severity: recoverable
> [  774.866738] {4}[Hardware Error]:  Error 0, type: recoverable
> [  774.866889] {4}[Hardware Error]:   section_type: ARM processor error
> [  774.867048] {4}[Hardware Error]:   MIDR: 0x000f0510
> [  774.867189] {4}[Hardware Error]:   running state: 0x0
> [  774.867321] {4}[Hardware Error]:   Power State Coordination Interface 
> state: 0
> [  774.867511] {4}[Hardware Error]:   Error info structure 0:
> [  774.867679] {4}[Hardware Error]:   num errors: 2
> [  774.867801] {4}[Hardware Error]:error_type: 0x02: cache error
> [  774.867962] {4}[Hardware Error]:error_info: 0x0091000f
> [  774.868124] {4}[Hardware Error]: transaction type: Data Access
> [  774.868280] {4}[Hardware Error]: cache error, operation type: Data 
> write
> [  774.868465] {4}[Hardware Error]: cache level: 2
> [  774.868592] {4}[Hardware Error]: processor context not corrupted
> [  774.868774] [Firmware Warn]: GHES: Unhandled processor error type 0x02: 
> cache error
> 
> Such script allows customizing the error data, allowing to change
> all fields at the record. Please use:
> 
>   $ ghes_inject.py arm -h
> 
> For more details about its usage.
> 
> Signed-off-by: Mauro Carvalho Chehab 

Thanks for examples.  Given my poor python skills take this one with
a pinch of salt.

Reviewed-by: Jonathan Cameron 



Re: [PATCH 0/2] target/i386: Fix 0 * Inf + QNaN regression

2025-02-03 Thread Peter Maydell
On Fri, 24 Jan 2025 at 17:22, Paolo Bonzini  wrote:
>
> Queued, thanks.

Thanks; do you plan to send a pullreq with these in soon?
I ask because the Arm FEAT_AFP set is now ready to land
and it has a dependency on these.

thanks
-- PMM



RE: [PATCH v1 14/18] hw/arm/aspeed: Add SoC and Machine Support for AST2700 A1

2025-02-03 Thread Jamin Lin
Hi Andrew,

> From: Andrew Jeffery 
> Sent: Thursday, January 30, 2025 12:22 PM
> To: Jamin Lin ; Cédric Le Goater ;
> Peter Maydell ; Steven Lee
> ; Troy Lee ; Joel Stanley
> ; open list:ASPEED BMCs ; open
> list:All patches CC here 
> Cc: Troy Lee ; Yunlin Tang
> 
> Subject: Re: [PATCH v1 14/18] hw/arm/aspeed: Add SoC and Machine Support
> for AST2700 A1
> 
> On Tue, 2025-01-21 at 15:04 +0800, Jamin Lin wrote:
> > The memory map for AST2700 A1 remains compatible with AST2700 A0.
> > However, the IRQ mapping has been updated for AST2700 A1, with GIC
> > interrupts now ranging from 192 to 201. Add a new IRQ map table for
> > AST2700 A1.
> >
> > Introduce "aspeed_machine_ast2700_evb_class_init" to initialize the
> > AST2700 EVB
> > machine. Add "aspeed_soc_ast2700_class_init" to initialize the
> > AST2700 A1 SoC.
> >
> > Signed-off-by: Jamin Lin 
> > ---
> >  hw/arm/aspeed.c | 24 +
> >  hw/arm/aspeed_ast27x0.c | 80
> > +
> >  2 files changed, 104 insertions(+)
> >
> > diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c index
> > 402d55c556..254fa5316d 100644
> > --- a/hw/arm/aspeed.c
> > +++ b/hw/arm/aspeed.c
> > @@ -1672,6 +1672,26 @@ static void
> > aspeed_machine_ast2700a0_evb_class_init(ObjectClass *oc, void *data)
> >  mc->default_ram_size = 1 * GiB;
> >  aspeed_machine_class_init_cpus_defaults(mc);
> >  }
> > +
> > +static void aspeed_machine_ast2700_evb_class_init(ObjectClass *oc,
> > void *data)
> > +{
> > +    MachineClass *mc = MACHINE_CLASS(oc);
> > +    AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
> > +
> > +    mc->desc = "Aspeed AST2700 EVB (Cortex-A35)";
> > +    amc->soc_name  = "ast2700-a1";
> > +    amc->hw_strap1 = AST2700_EVB_HW_STRAP1;
> > +    amc->hw_strap2 = AST2700_EVB_HW_STRAP2;
> > +    amc->fmc_model = "w25q01jvq";
> > +    amc->spi_model = "w25q512jv";
> > +    amc->num_cs    = 2;
> > +    amc->macs_mask = ASPEED_MAC0_ON | ASPEED_MAC1_ON |
> > ASPEED_MAC2_ON;
> > +    amc->uart_default = ASPEED_DEV_UART12;
> > +    amc->i2c_init  = ast2700_evb_i2c_init;
> > +    mc->default_ram_size = 1 * GiB;
> > +    aspeed_machine_class_init_cpus_defaults(mc);
> > +}
> > +
> >  #endif
> >
> >  static void aspeed_machine_qcom_dc_scm_v1_class_init(ObjectClass
> > *oc,
> > @@ -1798,6 +1818,10 @@ static const TypeInfo aspeed_machine_types[] =
> > {
> >  .name  =
> MACHINE_TYPE_NAME("ast2700a0-evb"),
> >  .parent    = TYPE_ASPEED_MACHINE,
> >  .class_init    = aspeed_machine_ast2700a0_evb_class_init,
> > + }, {
> > +    .name  = MACHINE_TYPE_NAME("ast2700-evb"),
> > +    .parent    = TYPE_ASPEED_MACHINE,
> > +    .class_init    = aspeed_machine_ast2700_evb_class_init,
> >  #endif
> >  }, {
> >  .name  = TYPE_ASPEED_MACHINE, diff --git
> > a/hw/arm/aspeed_ast27x0.c b/hw/arm/aspeed_ast27x0.c index
> > b32c4fcc35..e0a29c9053 100644
> > --- a/hw/arm/aspeed_ast27x0.c
> > +++ b/hw/arm/aspeed_ast27x0.c
> > @@ -119,6 +119,52 @@ static const int aspeed_soc_ast2700a0_irqmap[] =
> > {
> >  [ASPEED_DEV_SDHCI] = 133,
> >  };
> >
> > +static const int aspeed_soc_ast2700_irqmap[] = {
> > +    [ASPEED_DEV_UART0] = 196,
> > +    [ASPEED_DEV_UART1] = 196,
> > +    [ASPEED_DEV_UART2] = 196,
> > +    [ASPEED_DEV_UART3] = 196,
> > +    [ASPEED_DEV_UART4] = 8,
> > +    [ASPEED_DEV_UART5] = 196,
> > +    [ASPEED_DEV_UART6] = 196,
> > +    [ASPEED_DEV_UART7] = 196,
> > +    [ASPEED_DEV_UART8] = 196,
> > +    [ASPEED_DEV_UART9] = 196,
> > +    [ASPEED_DEV_UART10]    = 196,
> > +    [ASPEED_DEV_UART11]    = 196,
> > +    [ASPEED_DEV_UART12]    = 196,
> > +    [ASPEED_DEV_FMC]   = 195,
> > +    [ASPEED_DEV_SDMC]  = 0,
> > +    [ASPEED_DEV_SCU]   = 12,
> > +    [ASPEED_DEV_ADC]   = 194,
> > +    [ASPEED_DEV_XDMA]  = 5,
> > +    [ASPEED_DEV_EMMC]  = 15,
> > +    [ASPEED_DEV_GPIO]  = 194,
> > +    [ASPEED_DEV_RTC]   = 13,
> > +    [ASPEED_DEV_TIMER1]    = 16,
> > +    [ASPEED_DEV_TIMER2]    = 17,
> > +    [ASPEED_DEV_TIMER3]    = 18,
> > +    [ASPEED_DEV_TIMER4]    = 19,
> > +    [ASPEED_DEV_TIMER5]    = 20,
> > +    [ASPEED_DEV_TIMER6]    = 21,
> > +    [ASPEED_DEV_TIMER7]    = 22,
> > +    [ASPEED_DEV_TIMER8]    = 23,
> > +    [ASPEED_DEV_WDT]   = 195,
> > +    [ASPEED_DEV_PWM]   = 195,
> > +    [ASPEED_DEV_LPC]   = 192,
> > +    [ASPEED_DEV_IBT]   = 192,
> > +    [ASPEED_DEV_I2C]   = 194,
> > +    [ASPEED_DEV_PECI]  = 197,
> > +    [ASPEED_DEV_ETH1]  = 196,
> > +    [ASPEED_DEV_ETH2]  = 196,
> > +    [ASPEED_DEV_ETH3]  = 196,
> > +    [ASPEED_DEV_HACE]  = 4,
> > +    [ASPEED_DEV_KCS]   = 192,
> > +    [ASPEED_DEV_DP]    = 28,
> > +    [ASPEED_DEV_I3C]   = 195,
> > +    [ASPEED_DEV_SDHCI] = 197,
> > +};
> 
> Bit of a nit, but can we sort this table? Perhaps by interrupt value?

Thanks for review and suggestion.
Will update this table

Jamin

Re: [PATCH v9 0/4] chardev: implement backend chardev multiplexing

2025-02-03 Thread Roman Penyaev
Hi Marc-Andre,

Do you plan to pull the latest version of this series,
or is there something which I have to address?
Please let me know.

Thanks.

--
Roman

On Thu, Jan 23, 2025 at 9:53 AM Roman Penyaev  wrote:
>
> Mux is a character backend (host side) device, which multiplexes
> multiple frontends with one backend device. The following is a
> few lines from the QEMU manpage [1]:
>
>   A multiplexer is a "1:N" device, and here the "1" end is your
>   specified chardev backend, and the "N" end is the various parts
>   of QEMU that can talk to a chardev.
>
> But sadly multiple backends are not supported.
>
> This work implements a new chardev backend `hub` device, which
> aggregates input from multiple backend devices and forwards it to a
> single frontend device. Additionally, `hub` device takes the output
> from the frontend device and sends it back to all the connected
> backend devices. This allows for seamless interaction between
> different backend devices and a single frontend interface.
>
> The motivation is the EVE project [2], where it would be very
> convenient to have a virtio console frontend device on the guest that
> can be controlled from multiple backend devices, namely VNC and local
> TTY emulator. The following is an example of the QEMU command line:
>
> -chardev pty,path=/tmp/pty,id=pty0 \
> -chardev vc,id=vc0 \
> -chardev hub,id=hub0,chardevs.0=pty0,chardevs.1=vc0 \
> -device virtconsole,chardev=hub0 \
> -vnc 0.0.0.0:0
>
> Which creates two backend devices:
>
> * Text virtual console (`vc0`)
> * A pseudo TTY (`pty0`) connected to the single virtio hvc console with the
>   help of a new backend aggregator (`hub0`)
>
> `vc0` renders text to an image, which can be shared over the VNC
> protocol.  `pty0` is a pseudo TTY backend which provides bidirectional
> communication to the virtio hvc console.
>
> Once QEMU starts, the VNC client and any TTY emulator can be used to
> control a single hvc console. For example, these two different
> consoles should have similar input and output due to the buffer
> aggregation:
>
> # Start TTY emulator
> tio /tmp/pty
>
> # Start VNC client and switch to virtual console Ctrl-Alt-2
> vncviewer :0
>
> 'chardevs.N' list syntax is used for the sake of compatibility with
> the representation of JSON lists in 'key=val' pairs format of the
> util/keyval.c, despite the fact that modern QAPI way of parsing,
> namely qobject_input_visitor_new_str(), is not used. Choice of keeping
> QAPI list syntax may help to smoothly switch to modern parsing in the
> future.
>
> v8 .. v9:
>
> * Incorporate Reviewed-by tags
>
> v7 .. v8:
>
> * No need for a separate `->frontend` pointer in the hub device
>   structure, use `hub->parent.fe` directly.
> * Remove special handling of !EAGAIN error while serving write
>   to all backends. This should be safe, because detached backends
>   are handled by the `->be_open` flag check.
> * Combine `hub_chr_write_to_all()` and `hub_chr_write()` calls.
> * Fix docs generation: no single backtick, but double, so not
>   a `hub` but ``hub`` in qemu-options.hx
>
> v6 .. v7:
>
> After discussing v6 it was decided to:
>
> * Rename "multiplexer" to "aggregator"
> * Rename "mux-be" device type to "hub"
> * Drop all changes related to the original multiplexer implementation
>
> Code changes:
>
> * Added counting of CHR_EVENT_OPENED and CHR_EVENT_CLOSED events
> coming from backend devices. This prevents frontend devices from
> closing if one of the backend devices has been disconnected. The
> logic is simple: "the last one turns off the light".
>
> v5 .. v6:
>
> * Rebased on latest master
> * Changed how chardev is attached to a multiplexer: with version 6
> mux should specify list elements with ID of chardevs:
>
> chardevs.0=ID[,chardevs.N=ID]
>
> 'chardevs.N' list syntax is used for the sake of compatibility with
> the representation of JSON lists in 'key=val' pairs format of the
> util/keyval.c, despite the fact that modern QAPI way of parsing,
> namely qobject_input_visitor_new_str(), is not used. Choice of keeping
> QAPI list syntax may help to smoothly switch to modern parsing in the
> future.
>
> v4 .. v5:
>
> * Spelling fixes in qemu-options description
> * Memory leaks fixes in mux-be tests
> * Add sanity checks to chardev to avoid stacking of mux devices
> * Add corresponding unit test case to cover the creation of stacked
>   muxers: `-chardev mux-be,mux-id-be=ID`, which is forbidden
> * Reflect the fact that stacking is not supported in the documentation
>
> v3 .. v4:
>
> * Rebase on latest chardev changes
> * Add unit tests which test corner cases:
>* Inability to remove mux with active frontend
>* Inability to add more chardevs to a mux than `MUX_MAX`
>* Inability to mix mux-fe and mux-be for the same chardev
>
> v2 .. v3:
>
> * Split frontend and backend multiplexer implementations and
>   move them to separate files: char-mux-fe.c and char-mux-be.c
>
> v1 .. v2:
>
> * Separate type f

Re: [PATCH v6 00/10] Support virtio-gpu DRM native context

2025-02-03 Thread Alex Bennée
Dmitry Osipenko  writes:

> On 1/27/25 19:17, Alex Bennée wrote:
> ...
>> I'm still seeing corruption with -display gtk,gl=on on my x86 system
>> BTW. I would like to understand if that is a problem with QEMU, GTK or
>> something else in the stack before we merge.
>
> I reproduced the display mirroring/corruption issue and bisected it to
> the following commit. The problem only happens when QEMU/GTK uses
> Wayland display directly, while previously I was running QEMU with
> XWayland that doesn't have the problem. Why this change breaks dmabuf
> displaying with Wayland/GTK is unclear.

Ahh that makes sense - I obviously forgot to mention I'm running
sway/wayland across both machines.

> Reverting commit fixes the bug.
>
> +Dongwon Kim +Vivek Kasireddy
>
> commit 77bf310084dad38b3a2badf01766c659056f1cf2
> Author: Dongwon Kim 
> Date:   Fri Apr 26 15:50:59 2024 -0700
>
> ui/gtk: Draw guest frame at refresh cycle
>
> Draw routine needs to be manually invoked in the next refresh
> if there is a scanout blob from the guest. This is to prevent
> a situation where there is a scheduled draw event but it won't
> happen bacause the window is currently in inactive state
> (minimized or tabified). If draw is not done for a long time,
> gl_block timeout and/or fence timeout (on the guest) will happen
> eventually.
>
> v2: Use gd_gl_area_draw(vc) in gtk-gl-area.c
>
> Suggested-by: Vivek Kasireddy 
> Cc: Gerd Hoffmann 
> Cc: Marc-André Lureau 
> Cc: Daniel P. Berrangé 
> Signed-off-by: Dongwon Kim 
> Acked-by: Marc-André Lureau 
> Message-Id: <20240426225059.3871283-1-dongwon@intel.com>


Maybe a race on:

QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf;
?

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH 0/1] meson: Deprecate 32-bit host systems

2025-02-03 Thread Alex Bennée
Peter Maydell  writes:

> On Wed, 29 Jan 2025 at 06:23, Thomas Huth  wrote:
>> So unless someone complains immediately with a good reason, I'm also in
>> favor of marking it as deprecated now. If then someone complains during the
>> deprecation period, we still can reconsider and remove the deprecation note
>> again.
>
> Well, I mean the reason would be that I suspect we do still have
> users who are using QEMU for some purposes on 32-bit arm hosts.
> That doesn't mean they're trying to run massively complex or
> high memory guests or that they care that our whole test suite
> doesn't run.
>
> I'm not really strongly opposed to dropping 32-bit host support,
> but I don't think a thread on qemu-devel is exactly likely to
> get the attention of the people who might be using this
> functionality. (You could argue that functionality without
> representation among the developer community is fair game
> for being dumped even if it has users, of course.)

FWIW random internet poll:

  https://mastodon.org.uk/deck/@stsquad/113905257703721811

26% 32 bit
74% 64 bit

with 41 respondents.

>
> -- PMM

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PULL 0/8] Ui patches

2025-02-03 Thread Stefan Hajnoczi
On Mon, Feb 3, 2025 at 7:58 AM  wrote:
>
> From: Marc-André Lureau 
>
> The following changes since commit 6fccaa2fba391815308a746d68f7fa197bc93586:
>
>   Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into 
> staging (2025-02-02 11:09:10 -0500)
>
> are available in the Git repository at:
>
>   https://gitlab.com/marcandre.lureau/qemu.git tags/ui-pull-request
>
> for you to fetch changes up to f327a2cea1502a8cad2beb13bc4e2c12b95b10ba:
>
>   dbus: add -audio dbus nsamples option (2025-02-03 13:58:08 +0400)
>
> 
> UI/chardev-related patch queue
>
> 
>
> Marc-André Lureau (4):
>   ui/dbus: on win32, allow ANONYMOUS with p2p

Hi Marc-André,
There is an unexpected submodule update in this commit. Although it's
not included in the patch email sent to the mailing list, GitLab shows
it:
https://gitlab.com/marcandre.lureau/qemu/-/commit/31d9023965ba1963afd1e0e0f48c75399a7bc23e

Please rebase onto qemu.git/master and remove the spurious libvirt-ci
submodule update before resending this pull request. Thank you!

Stefan

>   ui/dbus: clarify the kind of win32 handle that is shared
>   plugins: fix -Werror=maybe-uninitialized false-positive
>   dbus: add -audio dbus nsamples option
>
> Roman Penyaev (4):
>   chardev/char-pty: send CHR_EVENT_CLOSED on disconnect
>   chardev/char-hub: implement backend chardev aggregator
>   tests/unit/test-char: add unit tests for hub chardev backend
>   qemu-options.hx: describe hub chardev and aggregation of several
> backends
>
>  qapi/audio.json|  22 +-
>  qapi/char.json |  27 +++
>  chardev/chardev-internal.h |  51 -
>  include/chardev/char.h |   1 +
>  audio/dbusaudio.c  |  29 ++-
>  chardev/char-hub.c | 301 
>  chardev/char-pty.c |   4 +-
>  chardev/char.c |  23 ++-
>  contrib/plugins/cache.c|   2 +-
>  tests/unit/test-char.c | 398 +
>  ui/dbus-console.c  |   8 +-
>  ui/dbus.c  |  10 +-
>  chardev/meson.build|   1 +
>  qemu-options.hx|  49 -
>  ui/dbus-display1.xml   |  16 +-
>  15 files changed, 923 insertions(+), 19 deletions(-)
>  create mode 100644 chardev/char-hub.c
>
> --
> 2.47.0
>
>



Re: [PATCH v4 00/33] Multifd 🔀 device state transfer support with VFIO consumer

2025-02-03 Thread Cédric Le Goater

Hello Maciej,


This patch set is targeting QEMU 10.0.

What's not yet present is documentation update under docs/devel/migration
but I didn't want to delay posting the code any longer.
Such doc can still be merged later when the design is 100% finalized.

The changes are quite complex, the design is not trivial, the benefits are
not huge as far as we know. I'd rather have the doc update first please.

Thanks,

C.





Re: [PATCH 0/7] hw/arm/raspi4b: Add models with 4GB and 8GB of DRAM

2025-02-03 Thread Alex Bennée
Peter Maydell  writes:

> On Sat, 1 Feb 2025 at 12:57, BALATON Zoltan  wrote:
>>
>> On Sat, 1 Feb 2025, Philippe Mathieu-Daudé wrote:
>> > - Deprecate the 'raspi4b' machine name, renaming it as
>> >  'raspi4b-1g' on 32-bit hosts, 'raspi4b-2g' otherwise.
>> > - Add the 'raspi4b-4g' and 'raspi4b-8g' machines, with
>> >  respectively 4GB and 8GB of DRAM.
>>
>> IMHO (meaning you can ignore it, just my opinion) if the only difference
>> is the memory size -machine raspi4b -memory 4g would be better user
>> experience than having a lot of different machines.
>
> Yes, I think I agree. We have a way for users to specify
> how much memory they want, and I think it makes more sense
> to use that than to have lots of different machine types.

I guess for the Pi we should validate the -memory supplied is on of the
supported grid of devices rather than an arbitrary value?

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH 0/7] hw/arm/raspi4b: Add models with 4GB and 8GB of DRAM

2025-02-03 Thread Daniel P . Berrangé
On Mon, Feb 03, 2025 at 02:29:49PM +, Alex Bennée wrote:
> Peter Maydell  writes:
> 
> > On Sat, 1 Feb 2025 at 12:57, BALATON Zoltan  wrote:
> >>
> >> On Sat, 1 Feb 2025, Philippe Mathieu-Daudé wrote:
> >> > - Deprecate the 'raspi4b' machine name, renaming it as
> >> >  'raspi4b-1g' on 32-bit hosts, 'raspi4b-2g' otherwise.
> >> > - Add the 'raspi4b-4g' and 'raspi4b-8g' machines, with
> >> >  respectively 4GB and 8GB of DRAM.
> >>
> >> IMHO (meaning you can ignore it, just my opinion) if the only difference
> >> is the memory size -machine raspi4b -memory 4g would be better user
> >> experience than having a lot of different machines.
> >
> > Yes, I think I agree. We have a way for users to specify
> > how much memory they want, and I think it makes more sense
> > to use that than to have lots of different machine types.
> 
> I guess for the Pi we should validate the -memory supplied is on of the
> supported grid of devices rather than an arbitrary value?

If the user wants to create a rpi4 with 6 GB RAM why should we stop
them ? It is their choice if they want to precisely replicate RAM
size from a physical model, or use something different when virtualized.


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




  1   2   3   >