Re: [PATCH v6 00/10] Support virtio-gpu DRM native context
"Kim, Dongwon" writes: > Hi, > > The commit below could change the timing of drawing by making the drawing > done at refresh cycle instead of via drawing event. So it looks like either > dmabuf > or client's framebuffer is being written and read at the same time. Hey, can > you > describe how the corruption looks like? Is it just garbage image with random > noise > or the actual frame with some defects like tearing...? The terminal gets mirrored upside down and the mouse creates damage as it moves about. > >> Subject: Re: [PATCH v6 00/10] Support virtio-gpu DRM native context >> >> Dmitry Osipenko writes: >> >> > On 1/27/25 19:17, Alex Bennée wrote: >> > ... >> >> I'm still seeing corruption with -display gtk,gl=on on my x86 system >> >> BTW. I would like to understand if that is a problem with QEMU, GTK >> >> or something else in the stack before we merge. >> > >> > I reproduced the display mirroring/corruption issue and bisected it to >> > the following commit. The problem only happens when QEMU/GTK uses >> > Wayland display directly, while previously I was running QEMU with >> > XWayland that doesn't have the problem. Why this change breaks dmabuf >> > displaying with Wayland/GTK is unclear. >> >> Ahh that makes sense - I obviously forgot to mention I'm running sway/wayland >> across both machines. >> >> > Reverting commit fixes the bug. >> > >> > +Dongwon Kim +Vivek Kasireddy >> > >> > commit 77bf310084dad38b3a2badf01766c659056f1cf2 >> > Author: Dongwon Kim >> > Date: Fri Apr 26 15:50:59 2024 -0700 >> > >> > ui/gtk: Draw guest frame at refresh cycle >> > >> > Draw routine needs to be manually invoked in the next refresh >> > if there is a scanout blob from the guest. This is to prevent >> > a situation where there is a scheduled draw event but it won't >> > happen bacause the window is currently in inactive state >> > (minimized or tabified). If draw is not done for a long time, >> > gl_block timeout and/or fence timeout (on the guest) will happen >> > eventually. >> > >> > v2: Use gd_gl_area_draw(vc) in gtk-gl-area.c >> > >> > Suggested-by: Vivek Kasireddy >> > Cc: Gerd Hoffmann >> > Cc: Marc-André Lureau >> > Cc: Daniel P. Berrangé >> > Signed-off-by: Dongwon Kim >> > Acked-by: Marc-André Lureau >> > Message-Id: <20240426225059.3871283-1-dongwon@intel.com> >> >> >> Maybe a race on: >> >> QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf; ? >> >> -- >> Alex Bennée >> Virtualisation Tech Lead @ Linaro -- Alex Bennée Virtualisation Tech Lead @ Linaro
Re: [PATCH] block: remove unused BLOCK_OP_TYPE_DATAPLANE
On Mon, Feb 03, 2025 at 01:25:29PM -0500, Stefan Hajnoczi wrote: > BLOCK_OP_TYPE_DATAPLANE prevents BlockDriverState from being used by > virtio-blk/virtio-scsi with IOThread. Commit b112a65c52aa ("block: > declare blockjobs and dataplane friends!") eliminated the main reason > for this blocker in 2014. Wow, that's a long time. > > Nowadays the block layer supports I/O from multiple AioContexts, so > there is even less reason to block IOThread users. Any legitimate > reasons related to interference would probably also apply to > non-IOThread users. > > The only remaining users are bdrv_op_unblock(BLOCK_OP_TYPE_DATAPLANE) > calls after bdrv_op_block_all(). If we remove BLOCK_OP_TYPE_DATAPLANE > their behavior doesn't change. > > Existing bdrv_op_block_all() callers that don't explicitly unblock > BLOCK_OP_TYPE_DATAPLANE seem to do so simply because no one bothered to > rather than because it is necessary to keep BLOCK_OP_TYPE_DATAPLANE > blocked. > > Signed-off-by: Stefan Hajnoczi > --- > include/block/block-common.h | 1 - > block/replication.c | 1 - > blockjob.c | 2 -- > hw/block/virtio-blk.c| 9 - > hw/scsi/virtio-scsi.c| 3 --- > 5 files changed, 16 deletions(-) Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
Re: [PATCH v3 09/26] target/arm/kvm-rme: Initialize Realm memory
On 11/26/24 5:56 AM, Jean-Philippe Brucker wrote: Initialize the IPA state of RAM. Collect the images copied into guest RAM into a sorted list, and issue POPULATE_REALM KVM ioctls once we've created the Realm Descriptor. The images are part of the Realm Initial Measurement. Signed-off-by: Jean-Philippe Brucker --- v2->v3: RIPAS is now initialized separately --- target/arm/kvm_arm.h | 14 + target/arm/kvm-rme.c | 128 +++ 2 files changed, 142 insertions(+) diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h index 8b52a881b0..67db09a424 100644 --- a/target/arm/kvm_arm.h +++ b/target/arm/kvm_arm.h @@ -255,6 +255,16 @@ int kvm_arm_rme_vm_type(MachineState *ms); */ int kvm_arm_rme_vcpu_init(CPUState *cs); +/* + * kvm_arm_rme_init_guest_ram + * @base: base address of RAM + * @size: size of RAM + * + * If the user requested a Realm, set the base and size of guest RAM, in order + * to initialize the Realm IPA space. + */ +void kvm_arm_rme_init_guest_ram(hwaddr base, size_t size); + #else /* @@ -281,6 +291,10 @@ static inline bool kvm_arm_mte_supported(void) return false; } +static inline void kvm_arm_rme_init_guest_ram(hwaddr base, size_t size) +{ +} + /* * These functions should never actually be called without KVM support. */ diff --git a/target/arm/kvm-rme.c b/target/arm/kvm-rme.c index e3cc37538a..83a29421df 100644 --- a/target/arm/kvm-rme.c +++ b/target/arm/kvm-rme.c @@ -9,6 +9,7 @@ #include "exec/confidential-guest-support.h" #include "hw/boards.h" #include "hw/core/cpu.h" +#include "hw/loader.h" #include "kvm_arm.h" #include "migration/blocker.h" #include "qapi/error.h" @@ -20,16 +21,85 @@ #define TYPE_RME_GUEST "rme-guest" OBJECT_DECLARE_SIMPLE_TYPE(RmeGuest, RME_GUEST) +#define RME_PAGE_SIZE qemu_real_host_page_size() + struct RmeGuest { ConfidentialGuestSupport parent_obj; +Notifier rom_load_notifier; +GSList *ram_regions; + +hwaddr ram_base; +size_t ram_size; }; s/size_t/hwaddr. To be consistent with RmeRamRegion, we may reuse it like below. struct RmeGuest { : GSlist *populate_ram_regions; RmeRamRegion init_ram_region; }; OBJECT_DEFINE_SIMPLE_TYPE_WITH_INTERFACES(RmeGuest, rme_guest, RME_GUEST, CONFIDENTIAL_GUEST_SUPPORT, { TYPE_USER_CREATABLE }, { }) +typedef struct { +hwaddr base; +hwaddr size; +} RmeRamRegion; + static RmeGuest *rme_guest; +static int rme_init_ram(hwaddr base, size_t size, Error **errp) +{ +int ret; +uint64_t start = QEMU_ALIGN_DOWN(base, RME_PAGE_SIZE); +uint64_t end = QEMU_ALIGN_UP(base + size, RME_PAGE_SIZE); +struct kvm_cap_arm_rme_init_ipa_args init_args = { +.init_ipa_base = start, +.init_ipa_size = end - start, +}; + +ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_ARM_RME, 0, +KVM_CAP_ARM_RME_INIT_IPA_REALM, +(intptr_t)&init_args); +if (ret) { +error_setg_errno(errp, -ret, + "failed to init RAM [0x%"HWADDR_PRIx", 0x%"HWADDR_PRIx")", ^^ ^^^ The type for 'start' and 'end' would be 'hwaddr'. + start, end); +} + +return ret; +} + +static int rme_populate_range(hwaddr base, size_t size, bool measure, + Error **errp) +{ +int ret; +uint64_t start = QEMU_ALIGN_DOWN(base, RME_PAGE_SIZE); +uint64_t end = QEMU_ALIGN_UP(base + size, RME_PAGE_SIZE); +struct kvm_cap_arm_rme_populate_realm_args populate_args = { +.populate_ipa_base = start, +.populate_ipa_size = end - start, +.flags = measure ? KVM_ARM_RME_POPULATE_FLAGS_MEASURE : 0, +}; + +ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_ARM_RME, 0, +KVM_CAP_ARM_RME_POPULATE_REALM, +(intptr_t)&populate_args); +if (ret) { +error_setg_errno(errp, -ret, + "failed to populate realm [0x%"HWADDR_PRIx", 0x%"HWADDR_PRIx")", + start, end); +} +return ret; +} + +static void rme_populate_ram_region(gpointer data, gpointer err) +{ +Error **errp = err; +const RmeRamRegion *region = data; + +if (*errp) { +return; +} + +rme_populate_range(region->base, region->size, /* measure */ true, errp); +} + static int rme_init_cpus(Error **errp) { int ret; @@ -60,6 +130,16 @@ static int rme_create_realm(Error **errp) return -1; } +if (rme_init_ram(rme_guest->ram_base, rme_guest->ram_size, errp)) { +return -1; +} + +g_slist_foreach(rme_guest->ram_regions, rme_populate_ram_region, errp); +g_slist_free_full(g_steal_pointer(&rme_guest->ram_regions), g_free); +if (*errp) { +return -1; +} +
Re: [PATCH v3 08/26] hw/core/loader: Add ROM loader notifier
On 11/26/24 5:56 AM, Jean-Philippe Brucker wrote: Add a function to register a notifier, that is invoked after a ROM gets loaded into guest memory. It will be used by Arm confidential guest support, in order to register all blobs loaded into memory with KVM, so that their content is moved into Realm state and measured into the initial VM state. Signed-off-by: Jean-Philippe Brucker --- include/hw/loader.h | 15 +++ hw/core/loader.c| 15 +++ 2 files changed, 30 insertions(+) diff --git a/include/hw/loader.h b/include/hw/loader.h index 7f6d06b956..0cd9905f97 100644 --- a/include/hw/loader.h +++ b/include/hw/loader.h @@ -353,6 +353,21 @@ void *rom_ptr_for_as(AddressSpace *as, hwaddr addr, size_t size); ssize_t rom_add_vga(const char *file); ssize_t rom_add_option(const char *file, int32_t bootindex); +typedef struct RomLoaderNotify { +/* Parameters passed to rom_add_blob() */ +hwaddr addr; +size_t len; +size_t max_len; +} RomLoaderNotify; + I would suggest to rename it to RomLoaderNotifyData since it's the data passed to the notifier. +/** + * rom_add_load_notifier - Add a notifier for loaded images + * + * Add a notifier that will be invoked with a RomLoaderNotify structure for each + * blob loaded into guest memory, after the blob is loaded. + */ +void rom_add_load_notifier(Notifier *notifier); + /* This is the usual maximum in uboot, so if a uImage overflows this, it would * overflow on real hardware too. */ #define UBOOT_MAX_GUNZIP_BYTES (64 << 20) diff --git a/hw/core/loader.c b/hw/core/loader.c index 31593a1171..759a62cf58 100644 --- a/hw/core/loader.c +++ b/hw/core/loader.c @@ -67,6 +67,8 @@ #include static int roms_loaded; +static NotifierList rom_loader_notifier = +NOTIFIER_LIST_INITIALIZER(rom_loader_notifier); /* return the size or -1 if error */ int64_t get_image_size(const char *filename) @@ -1179,6 +1181,11 @@ MemoryRegion *rom_add_blob(const char *name, const void *blob, size_t len, return mr; } +void rom_add_load_notifier(Notifier *notifier) +{ +notifier_list_add(&rom_loader_notifier, notifier); +} + /* This function is specific for elf program because we don't need to allocate * all the rom. We just allocate the first part and the rest is just zeros. This * is why romsize and datasize are different. Also, this function takes its own @@ -1220,6 +1227,7 @@ ssize_t rom_add_option(const char *file, int32_t bootindex) static void rom_reset(void *unused) { Rom *rom; +RomLoaderNotify notify; QTAILQ_FOREACH(rom, &roms, next) { if (rom->fw_file) { @@ -1268,6 +1276,13 @@ static void rom_reset(void *unused) cpu_flush_icache_range(rom->addr, rom->datasize); trace_loader_write_rom(rom->name, rom->addr, rom->datasize, rom->isrom); + +notify = (RomLoaderNotify) { +.addr = rom->addr, +.len = rom->datasize, +.max_len = rom->romsize, +}; +notifier_list_notify(&rom_loader_notifier, ¬ify); } } Thanks, Gavin
Re: [PATCH 0/1] meson: Deprecate 32-bit host systems
On 3/2/25 10:10, Alex Bennée wrote: Peter Maydell writes: On Wed, 29 Jan 2025 at 06:23, Thomas Huth wrote: So unless someone complains immediately with a good reason, I'm also in favor of marking it as deprecated now. If then someone complains during the deprecation period, we still can reconsider and remove the deprecation note again. Well, I mean the reason would be that I suspect we do still have users who are using QEMU for some purposes on 32-bit arm hosts. That doesn't mean they're trying to run massively complex or high memory guests or that they care that our whole test suite doesn't run. I'm not really strongly opposed to dropping 32-bit host support, but I don't think a thread on qemu-devel is exactly likely to get the attention of the people who might be using this functionality. (You could argue that functionality without representation among the developer community is fair game for being dumped even if it has users, of course.) FWIW random internet poll: https://mastodon.org.uk/deck/@stsquad/113905257703721811 26% 32 bit 74% 64 bit with 41 respondents. Note that some respondents who voted to maintain 32-bit support mixed 32-bit host with 32-bit guests.
Re: [PATCH 0/7] hw/arm/raspi4b: Add models with 4GB and 8GB of DRAM
On 3/2/25 15:50, Daniel P. Berrangé wrote: On Mon, Feb 03, 2025 at 02:45:06PM +, Peter Maydell wrote: On Mon, 3 Feb 2025 at 14:33, Daniel P. Berrangé wrote: On Mon, Feb 03, 2025 at 02:29:49PM +, Alex Bennée wrote: Peter Maydell writes: On Sat, 1 Feb 2025 at 12:57, BALATON Zoltan wrote: On Sat, 1 Feb 2025, Philippe Mathieu-Daudé wrote: - Deprecate the 'raspi4b' machine name, renaming it as 'raspi4b-1g' on 32-bit hosts, 'raspi4b-2g' otherwise. - Add the 'raspi4b-4g' and 'raspi4b-8g' machines, with respectively 4GB and 8GB of DRAM. IMHO (meaning you can ignore it, just my opinion) if the only difference is the memory size -machine raspi4b -memory 4g would be better user experience than having a lot of different machines. Yes, I think I agree. We have a way for users to specify how much memory they want, and I think it makes more sense to use that than to have lots of different machine types. I guess for the Pi we should validate the -memory supplied is on of the supported grid of devices rather than an arbitrary value? If the user wants to create a rpi4 with 6 GB RAM why should we stop them ? It is their choice if they want to precisely replicate RAM size from a physical model, or use something different when virtualized. The board revision code (reported to the guest via the emulated firmware interface) only supports reporting 256MB, 512MB, 1GB, 2GB, 4GB or 8GB: https://www.raspberrypi.com/documentation/computers/raspberry-pi.html#new-style-revision-codes I think it would be valid to report the revision code for the memory size that doesn't exceed what QEMU has configured. eg if configured with 6 GB, then report code for 4 GB. We need to distinct between physical machines VS virtual ones. Guests on virtual machines have some way to figure the virtual hardware (ACPI tables, DeviceTree blob, fw-cfg, ...). Guests for physical machines usually expect fixed hardware (not considering devices on busses). For the particular case of the Raspberry Pi machines, their bootloader gets the board layout by reading the RPI_FWREQ_GET_BOARD_REVISION constant value. What would be the point of emulating a raspi machine with 6GB if the FW is not going to consider besides 4GB? Besides, someone modify a guest to work with 6GB, it won't work on real HW. For Arm embedded boards we mostly tend to "restrict the user to what you can actually do", except for older boards where we tended not to write any kind of sanity checking on CPU type, memory size, etc. If we're going to strictly limit memory size that's accepted I wonder how we could information users/mgmt apps about what's permitted ? Expressing valid combinations of configs across different args gets pretty complicated quickly :-( I'll try to address Zoltan and Peter request to have a dynamic raspi machine. It is a bit unfortunate we didn't insisted on that when we decided to expose a fixed set of existing boards in order to not be bothered by inconsistent bug reports, back in 2019. Regards, Phil.
[PATCH qemu 4/5] hw/mem/cxl_type3: Ensure errp is set on realization failure
From: Li Zhijian Simply pass the errp to its callee which will set errp if needed, to enhance error reporting for CXL Type 3 device initialization by setting the errp when realization functions fail. Previously, failing to set `errp` could result in errors being overlooked, causing the system to mistakenly treat failure scenarios as successful and potentially leading to redundant cleanup operations in ct3_exit(). Signed-off-by: Li Zhijian Signed-off-by: Jonathan Cameron --- hw/mem/cxl_type3.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c index ff6861889b..d8b45f9bd1 100644 --- a/hw/mem/cxl_type3.c +++ b/hw/mem/cxl_type3.c @@ -891,7 +891,7 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp) &ct3d->cxl_dstate.device_registers); /* MSI(-X) Initialization */ -rc = msix_init_exclusive_bar(pci_dev, CXL_T3_MSIX_VECTOR_NR, 4, NULL); +rc = msix_init_exclusive_bar(pci_dev, CXL_T3_MSIX_VECTOR_NR, 4, errp); if (rc) { goto err_free_special_ops; } @@ -912,7 +912,7 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp) pcie_cap_deverr_init(pci_dev); /* Leave a bit of room for expansion */ -rc = pcie_aer_init(pci_dev, PCI_ERR_VER, 0x200, PCI_ERR_SIZEOF, NULL); +rc = pcie_aer_init(pci_dev, PCI_ERR_VER, 0x200, PCI_ERR_SIZEOF, errp); if (rc) { goto err_release_cdat; } -- 2.43.0
[PATCH qemu 5/5] mem/cxl_type3: support 3, 6, 12 and 16 interleave ways
From: Yao Xingtao Since the kernel does not check the interleave capability, a 3-way, 6-way, 12-way or 16-way region can be create normally. Applications can access the memory of 16-way region normally because qemu can convert hpa to dpa correctly for the power of 2 interleave ways, after kernel implementing the check, this kind of region will not be created any more. For non power of 2 interleave ways, applications could not access the memory normally and may occur some unexpected behaviors, such as segmentation fault. So implements this feature is needed. Link: https://lore.kernel.org/linux-cxl/3e84b919-7631-d1db-3e1d-33000f3f3...@fujitsu.com/ Signed-off-by: Yao Xingtao Signed-off-by: Jonathan Cameron --- hw/cxl/cxl-component-utils.c | 9 +++-- hw/mem/cxl_type3.c | 15 +++ 2 files changed, 18 insertions(+), 6 deletions(-) diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c index cd116c0401..473895948b 100644 --- a/hw/cxl/cxl-component-utils.c +++ b/hw/cxl/cxl-component-utils.c @@ -243,8 +243,13 @@ static void hdm_init_common(uint32_t *reg_state, uint32_t *write_msk, ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, INTERLEAVE_4K, 1); ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, POISON_ON_ERR_CAP, 0); -ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, 3_6_12_WAY, 0); -ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, 16_WAY, 0); +if (type == CXL2_TYPE3_DEVICE) { +ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, 3_6_12_WAY, 1); +ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, 16_WAY, 1); +} else { +ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, 3_6_12_WAY, 0); +ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, 16_WAY, 0); +} ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, UIO, 0); ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, UIO_DECODER_COUNT, 0); diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c index d8b45f9bd1..6fffa21ead 100644 --- a/hw/mem/cxl_type3.c +++ b/hw/mem/cxl_type3.c @@ -1100,10 +1100,17 @@ static bool cxl_type3_dpa(CXLType3Dev *ct3d, hwaddr host_addr, uint64_t *dpa) continue; } -*dpa = dpa_base + -((MAKE_64BIT_MASK(0, 8 + ig) & hpa_offset) | - ((MAKE_64BIT_MASK(8 + ig + iw, 64 - 8 - ig - iw) & hpa_offset) - >> iw)); +if (iw < 8) { +*dpa = dpa_base + +((MAKE_64BIT_MASK(0, 8 + ig) & hpa_offset) | + ((MAKE_64BIT_MASK(8 + ig + iw, 64 - 8 - ig - iw) & hpa_offset) + >> iw)); +} else { +*dpa = dpa_base + +((MAKE_64BIT_MASK(0, 8 + ig) & hpa_offset) | + MAKE_64BIT_MASK(ig + iw, 64 - ig - iw) & hpa_offset) + >> (ig + iw)) / 3) << (ig + 8))); +} return true; } -- 2.43.0
Re: [PATCH V1 02/26] migration: lower handler priority
Steve Sistare writes: > Define a vmstate priority that is lower than the default, so its handlers > run after all default priority handlers. Since 0 is no longer the default > priority, translate an uninitialized priority of 0 to MIG_PRI_DEFAULT. > > CPR for vfio will use this to install handlers for containers that run > after handlers for the devices that they contain. > > Signed-off-by: Steve Sistare Reviewed-by: Fabiano Rosas
[PATCH qemu 3/5] hw/mem/cxl_type3: Fix special_ops memory leak on msix_init_exclusive_bar() failure
From: Li Zhijian Address a memory leak issue by ensuring `regs->special_ops` is freed when `msix_init_exclusive_bar()` encounters an error during CXL Type3 device initialization. Additionally, this patch renames err_address_space_free to err_msix_uninit for better clarity and logical flow Signed-off-by: Li Zhijian Signed-off-by: Jonathan Cameron --- hw/mem/cxl_type3.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c index 4775aab0d6..ff6861889b 100644 --- a/hw/mem/cxl_type3.c +++ b/hw/mem/cxl_type3.c @@ -893,7 +893,7 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp) /* MSI(-X) Initialization */ rc = msix_init_exclusive_bar(pci_dev, CXL_T3_MSIX_VECTOR_NR, 4, NULL); if (rc) { -goto err_address_space_free; +goto err_free_special_ops; } for (i = 0; i < CXL_T3_MSIX_VECTOR_NR; i++) { msix_vector_use(pci_dev, i); @@ -907,7 +907,7 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp) cxl_cstate->cdat.free_cdat_table = ct3_free_cdat_table; cxl_cstate->cdat.private = ct3d; if (!cxl_doe_cdat_init(cxl_cstate, errp)) { -goto err_free_special_ops; +goto err_msix_uninit; } pcie_cap_deverr_init(pci_dev); @@ -943,10 +943,10 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp) err_release_cdat: cxl_doe_cdat_release(cxl_cstate); -err_free_special_ops: +err_msix_uninit: msix_uninit_exclusive_bar(pci_dev); +err_free_special_ops: g_free(regs->special_ops); -err_address_space_free: if (ct3d->dc.host_dc) { cxl_destroy_dc_regions(ct3d); address_space_destroy(&ct3d->dc.host_dc_as); -- 2.43.0
[PATCH qemu 2/5] hw/mem/cxl_type3: Add paired msix_uninit_exclusive_bar() call
From: Li Zhijian msix_uninit_exclusive_bar() should be paired with msix_init_exclusive_bar() Ensure proper resource cleanup by adding the missing `msix_uninit_exclusive_bar()` call for the Type3 CXL device. Signed-off-by: Li Zhijian Signed-off-by: Jonathan Cameron --- hw/mem/cxl_type3.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c index ebc0ec536e..4775aab0d6 100644 --- a/hw/mem/cxl_type3.c +++ b/hw/mem/cxl_type3.c @@ -944,6 +944,7 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp) err_release_cdat: cxl_doe_cdat_release(cxl_cstate); err_free_special_ops: +msix_uninit_exclusive_bar(pci_dev); g_free(regs->special_ops); err_address_space_free: if (ct3d->dc.host_dc) { @@ -967,6 +968,7 @@ static void ct3_exit(PCIDevice *pci_dev) pcie_aer_exit(pci_dev); cxl_doe_cdat_release(cxl_cstate); +msix_uninit_exclusive_bar(pci_dev); g_free(regs->special_ops); if (ct3d->dc.host_dc) { cxl_destroy_dc_regions(ct3d); -- 2.43.0
[PATCH qemu 0/5] hw/cxl: Cleanups and interleave support.
First set of CXL updates for the 10.0 cycle. - Mixture of cleanup and hardening against a repeat of recent MSI-X numbering bug. - Expanded interleave support (been on my tree a long time) Whilst I think these are in a good state, review always welcome. Li Zhijian (4): hw/cxl: Introduce CXL_T3_MSIX_VECTOR enumeration hw/mem/cxl_type3: Add paired msix_uninit_exclusive_bar() call hw/mem/cxl_type3: Fix special_ops memory leak on msix_init_exclusive_bar() failure hw/mem/cxl_type3: Ensure errp is set on realization failure Yao Xingtao (1): mem/cxl_type3: support 3, 6, 12 and 16 interleave ways include/hw/cxl/cxl_device.h | 4 ++-- hw/cxl/cxl-component-utils.c | 9 ++-- hw/cxl/cxl-device-utils.c| 12 -- hw/cxl/switch-mailbox-cci.c | 4 +++- hw/mem/cxl_type3.c | 45 +--- 5 files changed, 48 insertions(+), 26 deletions(-) -- 2.43.0
[PATCH qemu 1/5] hw/cxl: Introduce CXL_T3_MSIX_VECTOR enumeration
From: Li Zhijian Introduce the `CXL_T3_MSIX_VECTOR` enumeration to specify MSIX vector assignments specific to the Type 3 (T3) CXL device. The primary goal of this change is to encapsulate the MSIX vector uses that are unique to the T3 device within an enumeration, improving code readability and maintenance by avoiding magic numbers. This organizational change allows for more explicit references to each vector’s role, thereby reducing the potential for misconfiguration. It also modified `mailbox_reg_init_common` to accept the `msi_n` parameter, reflecting the new MSIX vector setup. This pertains to the T3 device privately; other endpoints should refrain from using it, despite its public accessibility to all of them. Signed-off-by: Li Zhijian Signed-off-by: Jonathan Cameron --- include/hw/cxl/cxl_device.h | 4 ++-- hw/cxl/cxl-device-utils.c | 12 +--- hw/cxl/switch-mailbox-cci.c | 4 +++- hw/mem/cxl_type3.c | 20 ++-- 4 files changed, 24 insertions(+), 16 deletions(-) diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h index 561b375dc8..3a0ee7e8e7 100644 --- a/include/hw/cxl/cxl_device.h +++ b/include/hw/cxl/cxl_device.h @@ -264,8 +264,8 @@ void cxl_device_register_block_init(Object *obj, CXLDeviceState *dev, typedef struct CXLType3Dev CXLType3Dev; typedef struct CSWMBCCIDev CSWMBCCIDev; /* Set up default values for the register block */ -void cxl_device_register_init_t3(CXLType3Dev *ct3d); -void cxl_device_register_init_swcci(CSWMBCCIDev *sw); +void cxl_device_register_init_t3(CXLType3Dev *ct3d, int msi_n); +void cxl_device_register_init_swcci(CSWMBCCIDev *sw, int msi_n); /* * CXL r3.1 Section 8.2.8.1: CXL Device Capabilities Array Register diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c index 035d034f6d..52ad1e4c3f 100644 --- a/hw/cxl/cxl-device-utils.c +++ b/hw/cxl/cxl-device-utils.c @@ -352,10 +352,8 @@ static void device_reg_init_common(CXLDeviceState *cxl_dstate) } } -static void mailbox_reg_init_common(CXLDeviceState *cxl_dstate) +static void mailbox_reg_init_common(CXLDeviceState *cxl_dstate, int msi_n) { -const uint8_t msi_n = 9; - /* 2048 payload size */ ARRAY_FIELD_DP32(cxl_dstate->mbox_reg_state32, CXL_DEV_MAILBOX_CAP, PAYLOAD_SIZE, CXL_MAILBOX_PAYLOAD_SHIFT); @@ -382,7 +380,7 @@ static void memdev_reg_init_common(CXLDeviceState *cxl_dstate) cxl_dstate->memdev_status = memdev_status_reg; } -void cxl_device_register_init_t3(CXLType3Dev *ct3d) +void cxl_device_register_init_t3(CXLType3Dev *ct3d, int msi_n) { CXLDeviceState *cxl_dstate = &ct3d->cxl_dstate; uint64_t *cap_h = cxl_dstate->caps_reg_state64; @@ -398,7 +396,7 @@ void cxl_device_register_init_t3(CXLType3Dev *ct3d) device_reg_init_common(cxl_dstate); cxl_device_cap_init(cxl_dstate, MAILBOX, 2, CXL_DEV_MAILBOX_VERSION); -mailbox_reg_init_common(cxl_dstate); +mailbox_reg_init_common(cxl_dstate, msi_n); cxl_device_cap_init(cxl_dstate, MEMORY_DEVICE, 0x4000, CXL_MEM_DEV_STATUS_VERSION); @@ -408,7 +406,7 @@ void cxl_device_register_init_t3(CXLType3Dev *ct3d) CXL_MAILBOX_MAX_PAYLOAD_SIZE); } -void cxl_device_register_init_swcci(CSWMBCCIDev *sw) +void cxl_device_register_init_swcci(CSWMBCCIDev *sw, int msi_n) { CXLDeviceState *cxl_dstate = &sw->cxl_dstate; uint64_t *cap_h = cxl_dstate->caps_reg_state64; @@ -423,7 +421,7 @@ void cxl_device_register_init_swcci(CSWMBCCIDev *sw) device_reg_init_common(cxl_dstate); cxl_device_cap_init(cxl_dstate, MAILBOX, 2, 1); -mailbox_reg_init_common(cxl_dstate); +mailbox_reg_init_common(cxl_dstate, msi_n); cxl_device_cap_init(cxl_dstate, MEMORY_DEVICE, 0x4000, 1); memdev_reg_init_common(cxl_dstate); diff --git a/hw/cxl/switch-mailbox-cci.c b/hw/cxl/switch-mailbox-cci.c index 65cdac6cc1..833b824619 100644 --- a/hw/cxl/switch-mailbox-cci.c +++ b/hw/cxl/switch-mailbox-cci.c @@ -17,10 +17,12 @@ #include "hw/qdev-properties.h" #include "hw/cxl/cxl.h" +#define CXL_SWCCI_MSIX_MBOX 3 + static void cswmbcci_reset(DeviceState *dev) { CSWMBCCIDev *cswmb = CXL_SWITCH_MAILBOX_CCI(dev); -cxl_device_register_init_swcci(cswmb); +cxl_device_register_init_swcci(cswmb, CXL_SWCCI_MSIX_MBOX); } static void cswbcci_realize(PCIDevice *pci_dev, Error **errp) diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c index 0ae1704a34..ebc0ec536e 100644 --- a/hw/mem/cxl_type3.c +++ b/hw/mem/cxl_type3.c @@ -30,6 +30,14 @@ #include "hw/cxl/cxl.h" #include "hw/pci/msix.h" +/* type3 device private */ +enum CXL_T3_MSIX_VECTOR { +CXL_T3_MSIX_PCIE_DOE_TABLE_ACCESS = 0, +CXL_T3_MSIX_EVENT_START = 2, +CXL_T3_MSIX_MBOX = CXL_T3_MSIX_EVENT_START + CXL_EVENT_TYPE_MAX, +CXL_T3_MSIX_VECTOR_NR +}; + #define DWORD_BYTE 4 #define CXL_CAPACITY_MULTIPLIER (256 * MiB) @@ -843,7 +851,6 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
Re: [PATCH v2 10/15] block/export: Don't ignore image activation error in blk_exp_add()
On Fri, Jan 31, 2025 at 10:50:46AM +0100, Kevin Wolf wrote: > Currently, block jobs can't handle inactive images correctly. Incoming > write requests would run into assertion failures. Make sure that we > return an error when creating an export can't activate the image. > > Signed-off-by: Kevin Wolf > --- > block/export/export.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
[PATCH v1 1/1] aspeed/soc: Support Non-maskable Interrupt for AST2700
QEMU supports GICv3 Non-maskable Interrupt, adds to support Non-maskable Interrupt for AST2700. Reference: https://github.com/qemu/qemu/commit/b36a32ead Signed-off-by: Jamin Lin --- hw/arm/aspeed_ast27x0.c | 4 1 file changed, 4 insertions(+) diff --git a/hw/arm/aspeed_ast27x0.c b/hw/arm/aspeed_ast27x0.c index 4114e15ddd..361a054d46 100644 --- a/hw/arm/aspeed_ast27x0.c +++ b/hw/arm/aspeed_ast27x0.c @@ -470,6 +470,10 @@ static bool aspeed_soc_ast2700_gic_realize(DeviceState *dev, Error **errp) qdev_get_gpio_in(cpudev, ARM_CPU_VIRQ)); sysbus_connect_irq(gicbusdev, i + 3 * sc->num_cpus, qdev_get_gpio_in(cpudev, ARM_CPU_VFIQ)); +sysbus_connect_irq(gicbusdev, i + 4 * sc->num_cpus, + qdev_get_gpio_in(cpudev, ARM_CPU_NMI)); +sysbus_connect_irq(gicbusdev, i + 5 * sc->num_cpus, + qdev_get_gpio_in(cpudev, ARM_CPU_VINMI)); } return true; -- 2.25.1
Re: [PATCH v2 10/14] configure: Define TARGET_LONG_BITS in configs/targets/*.mak
On 2/3/25 10:30, Philippe Mathieu-Daudé wrote: On 3/2/25 04:18, Richard Henderson wrote: Define TARGET_LONG_BITS in each target's configure fragment. Do this without removing the define in target/*/cpu-param.h so that errors are caught like so: In file included from .../src/include/exec/cpu-defs.h:26, from ../src/target/hppa/cpu.h:24, from ../src/linux-user/qemu.h:4, from ../src/linux-user/hppa/cpu_loop.c:21: ../src/target/hppa/cpu-param.h:11: error: "TARGET_LONG_BITS" redefined [-Werror] 11 | #define TARGET_LONG_BITS 64 | In file included from .../src/include/qemu/osdep.h:36, from ../src/linux-user/hppa/cpu_loop.c:20: ./hppa-linux-user-config-target.h:32: note: this is the location of the previous definition 32 | #define TARGET_LONG_BITS 32 | cc1: all warnings being treated as errors Signed-off-by: Richard Henderson --- Orthogonal to this series, what about the other definitions, like TARGET_PHYS_ADDR_SPACE_BITS / TARGET_VIRT_ADDR_SPACE_BITS and possibly TARGET_PAGE_BITS? We don't need those at configure time, so there's no need to move them. r~
Re: [PATCH v2 13/15] iotests: Add filter_qtest()
On Fri, Jan 31, 2025 at 10:50:49AM +0100, Kevin Wolf wrote: > The open-coded form of this filter has been copied into enough tests > that it's better to move it into iotests.py. > > Signed-off-by: Kevin Wolf > --- > tests/qemu-iotests/iotests.py | 4 > tests/qemu-iotests/041| 4 +--- > tests/qemu-iotests/165| 4 +--- > tests/qemu-iotests/tests/copy-before-write| 3 +-- > tests/qemu-iotests/tests/migrate-bitmaps-test | 7 +++ > 5 files changed, 10 insertions(+), 12 deletions(-) Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
Re: [PATCH v2 14/15] iotests: Add qsd-migrate case
On Fri, Jan 31, 2025 at 10:50:50AM +0100, Kevin Wolf wrote: > Test that it's possible to migrate a VM that uses an image on shared > storage through qemu-storage-daemon. > > Signed-off-by: Kevin Wolf > --- > tests/qemu-iotests/tests/qsd-migrate | 132 +++ > tests/qemu-iotests/tests/qsd-migrate.out | 51 + > 2 files changed, 183 insertions(+) > create mode 100755 tests/qemu-iotests/tests/qsd-migrate > create mode 100644 tests/qemu-iotests/tests/qsd-migrate.out > > diff --git a/tests/qemu-iotests/tests/qsd-migrate > b/tests/qemu-iotests/tests/qsd-migrate > new file mode 100755 > index 00..687bda6f93 > --- /dev/null > +++ b/tests/qemu-iotests/tests/qsd-migrate > @@ -0,0 +1,132 @@ > +#!/usr/bin/env python3 > +# group: rw quick > + > +with iotests.FilePath('disk.img') as path, \ > + iotests.FilePath('nbd-src.sock', base_dir=iotests.sock_dir) as nbd_src, > \ > + iotests.FilePath('nbd-dst.sock', base_dir=iotests.sock_dir) as nbd_dst, > \ > + iotests.FilePath('migrate.sock', base_dir=iotests.sock_dir) as > mig_sock, \ > + iotests.VM(path_suffix="-src") as vm_src, \ > + iotests.VM(path_suffix="-dst") as vm_dst: > + > + > +iotests.log('\nTest I/O on the source') > +vm_src.hmp_qemu_io('virtio0/virtio-backend', 'write -P 0x11 0 4k', > + use_log=True, qdev=True) > +vm_src.hmp_qemu_io('virtio0/virtio-backend', 'read -P 0x11 0 4k', > + use_log=True, qdev=True) > + > +iotests.log('\nStarting migration...') Is it worth adding a test that qemu_io fails to write on the destination while it is inactive (to ensure we are properly rejecting modification of an inactive image)? > + > +mig_caps = [ > +{'capability': 'events', 'state': True}, > +{'capability': 'pause-before-switchover', 'state': True}, > +] > +vm_src.qmp_log('migrate-set-capabilities', capabilities=mig_caps) > +vm_dst.qmp_log('migrate-set-capabilities', capabilities=mig_caps) > +vm_src.qmp_log('migrate', uri=f'unix:{mig_sock}', > + filters=[iotests.filter_qmp_testfiles]) > + > +vm_src.event_wait('MIGRATION', > + match={'data': {'status': 'pre-switchover'}}) > + > +iotests.log('\nPre-switchover: Reconfigure QSD instances') > + > +iotests.log(qsd_src.qmp('blockdev-set-active', {'active': False})) > +iotests.log(qsd_dst.qmp('blockdev-set-active', {'active': True})) Also, should you attempt a read on both src and dst while both sides are inactive, to prove that reads can take a snapshot in the middle of the handover? Oveall a nice test. Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
Re: [PATCH v2 12/15] nbd/server: Support inactive nodes
On Fri, Jan 31, 2025 at 10:50:48AM +0100, Kevin Wolf wrote: > In order to support running an NBD export on inactive nodes, we must > make sure to return errors for any operations that aren't allowed on > inactive nodes. Reads are the only operation we know we need for > inactive images, so to err on the side of caution, return errors for > everything else, even if some operations could possibly be okay. We may still find a use case for block status on an inactive node (especially if that helps us take more accurate snapshots, which is the whole point of wanting to read pre-activation). But I'm okay if we defer that to a separate patch only if it actually proves to be needed. > > Signed-off-by: Kevin Wolf > --- > nbd/server.c | 17 + > 1 file changed, 17 insertions(+) > Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
Re: [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls
On 3.02.2025 20:58, Peter Xu wrote: On Mon, Feb 03, 2025 at 02:57:36PM +0100, Maciej S. Szmigiero wrote: On 2.02.2025 13:45, Dr. David Alan Gilbert wrote: * Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote: On 2.02.2025 03:06, Dr. David Alan Gilbert wrote: * Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote: From: "Maciej S. Szmigiero" postcopy_ram_listen_thread() is a free running thread, so it needs to take BQL around function calls to migration methods requiring BQL. qemu_loadvm_state_main() needs BQL held since it ultimately calls "load_state" SaveVMHandlers. migration_incoming_state_destroy() needs BQL held since it ultimately calls "load_cleanup" SaveVMHandlers. Signed-off-by: Maciej S. Szmigiero --- migration/savevm.c | 4 1 file changed, 4 insertions(+) diff --git a/migration/savevm.c b/migration/savevm.c index b0b74140daea..0ceea9638cc1 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -2013,7 +2013,9 @@ static void *postcopy_ram_listen_thread(void *opaque) * in qemu_file, and thus we must be blocking now. */ qemu_file_set_blocking(f, true); +bql_lock(); load_res = qemu_loadvm_state_main(f, mis); +bql_unlock(); Doesn't that leave that held for a heck of a long time? Yes, and it effectively broke "postcopy recover" test but I think the reason for that is qemu_loadvm_state_main() and its children don't drop BQL while waiting for I/O. I've described this case in more detail in my reply to Fabiano here: https://lore.kernel.org/qemu-devel/0a09e627-955e-4f26-8d08-0192ecd25...@maciej.szmigiero.name/ While it might be the cause in this case, my feeling is it's more fundamental here - it's the whole reason that postcopy has a separate ram listen thread. As the destination is running, after it loads it's devices and as it starts up the destination will be still loading RAM (and other postcopiable devices) potentially for quite a while. Holding the bql around the ram listen thread means that the execution of the destination won't be able to take that lock until the postcopy load has finished; so while that might apparently complete, it'll lead to the destination stalling until that's finished which defeats the whole point of postcopy. That last one probably won't fail a test but it will lead to a long stall if you give it a nice big guest with lots of RAM that it's rapidly changing. Okay, I understand the postcopy case/flow now. Thanks for explaining it clearly. I still think that "load_state" SaveVMHandlers need to be called with BQL held since implementations apparently expect it that way: for example, I think PCI device configuration restore calls address space manipulation methods which abort() if called without BQL held. However, the only devices that *should* be arriving on the channel that the postcopy_ram_listen_thread is reading from are those that are postcopiable (i.e. RAM and hmm block's dirty_bitmap). Those load handlers are safe to be run while the other devices are being changed. Note the *should* - you could add a check to fail if any other device arrives on that channel. I think ultimately there should be either an explicit check, or, as you suggest in the paragraph below, a separate SaveVMHandler that runs without BQL held. To me those are bugs happening during postcopy, so those abort()s in memory.c are indeed for catching these issues too. Since the current state of just running these SaveVMHandlers without BQL in this case and hoping that nothing breaks is clearly sub-optimal. I have previously even submitted a patch to explicitly document "load_state" SaveVMHandler as requiring BQL (which was also included in the previous version of this patch set) and it received a "Reviewed-by:" tag: https://lore.kernel.org/qemu-devel/6976f129df610c8207da4e531c8c0475ec204fa4.1730203967.git.maciej.szmigi...@oracle.com/ https://lore.kernel.org/qemu-devel/e1949839932efaa531e2fe63ac13324e5787439c.1731773021.git.maciej.szmigi...@oracle.com/ https://lore.kernel.org/qemu-devel/87o732bti7@suse.de/ It happens! You could make this safer by having a load_state and a load_state_postcopy member, and only mark the load_state as requiring the lock. To not digress too much from the subject of this patch set (multifd VFIO device state transfer) for now I've just updated the TODO comment around that qemu_loadvm_state_main(), so hopefully this discussion won't get forgotten: https://gitlab.com/maciejsszmigiero/qemu/-/commit/046e3deac5b1dbc406b3e9571f62468bd6743e79 The commit message may still need some touch ups, e.g.: postcopy_ram_listen_thread() is a free running thread, so it needs to take BQL around function calls to migration methods requiring BQL. This sentence is still not correct, IMHO. As Dave explained, the ram load thread is designed to run without BQL at least for the major workloads it runs. So what's your proposed wording of this commit then? I don't worry on src sending s
[PATCH v2 02/12] hw/arm/raspi: Merge model 4B with other models
Except we alter the device tree blob, the 4B is just another raspi model. Signed-off-by: Philippe Mathieu-Daudé --- hw/arm/raspi.c | 114 - hw/arm/raspi4b.c | 136 - hw/arm/meson.build | 2 +- 3 files changed, 114 insertions(+), 138 deletions(-) delete mode 100644 hw/arm/raspi4b.c diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c index 508f90479e2..3fa382d62ce 100644 --- a/hw/arm/raspi.c +++ b/hw/arm/raspi.c @@ -8,6 +8,10 @@ * Raspberry Pi 3 emulation Copyright (c) 2018 Zoltán Baldaszti * Upstream code cleanup (c) 2018 Pekka Enberg * + * Raspberry Pi 4 emulation Copyright (C) 2022 Ovchinnikov Vitalii + * + * SPDX-License-Identifier: GPL-2.0-or-later + * * This work is licensed under the terms of the GNU GPL, version 2 or later. * See the COPYING file in the top-level directory. */ @@ -16,20 +20,27 @@ #include "qemu/units.h" #include "qemu/cutils.h" #include "qapi/error.h" +#include "qapi/visitor.h" #include "hw/arm/boot.h" #include "hw/arm/bcm2836.h" #include "hw/arm/bcm2838.h" #include "hw/arm/raspi_platform.h" +#include "hw/display/bcm2835_fb.h" #include "hw/registerfields.h" #include "qemu/error-report.h" #include "hw/boards.h" #include "hw/loader.h" #include "hw/arm/boot.h" #include "qom/object.h" +#include "system/device_tree.h" +#include #define TYPE_RASPI_MACHINE MACHINE_TYPE_NAME("raspi-common") OBJECT_DECLARE_SIMPLE_TYPE(RaspiMachineState, RASPI_MACHINE) +#define TYPE_RASPI4B_MACHINE MACHINE_TYPE_NAME("raspi4b") +OBJECT_DECLARE_SIMPLE_TYPE(Raspi4bMachineState, RASPI4B_MACHINE) + #define SMPBOOT_ADDR0x300 /* this should leave enough space for ATAGS */ #define MVBAR_ADDR 0x400 /* secure vectors */ #define BOARDSETUP_ADDR (MVBAR_ADDR + 0x20) /* board setup code */ @@ -44,6 +55,11 @@ struct RaspiMachineState { BCM283XState soc; }; +struct Raspi4bMachineState { +RaspiBaseMachineState parent_obj; +BCM2838State soc; +}; + /* * Board revision codes: * www.raspberrypi.org/documentation/hardware/raspberrypi/revision-codes/ @@ -301,6 +317,83 @@ void raspi_base_machine_init(MachineState *machine, boot_ram_size); } +#ifdef TARGET_AARCH64 +/* + * Add second memory region if board RAM amount exceeds VC base address + * (see https://datasheets.raspberrypi.com/bcm2711/bcm2711-peripherals.pdf + * 1.2 Address Map) + */ +static int raspi4_add_memory_node(void *fdt, hwaddr mem_base, hwaddr mem_len) +{ +int ret; +uint32_t acells, scells; +char *nodename = g_strdup_printf("/memory@%" PRIx64, mem_base); + +acells = qemu_fdt_getprop_cell(fdt, "/", "#address-cells", + NULL, &error_fatal); +scells = qemu_fdt_getprop_cell(fdt, "/", "#size-cells", + NULL, &error_fatal); +if (acells == 0 || scells == 0) { +fprintf(stderr, "dtb file invalid (#address-cells or #size-cells 0)\n"); +ret = -1; +} else { +qemu_fdt_add_subnode(fdt, nodename); +qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory"); +ret = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg", + acells, mem_base, + scells, mem_len); +} + +g_free(nodename); +return ret; +} + +static void raspi4_modify_dtb(const struct arm_boot_info *info, void *fdt) +{ +uint64_t ram_size; + +/* Temporarily disable following devices until they are implemented */ +const char *nodes_to_remove[] = { +"brcm,bcm2711-pcie", +"brcm,bcm2711-rng200", +"brcm,bcm2711-thermal", +"brcm,bcm2711-genet-v5", +}; + +for (int i = 0; i < ARRAY_SIZE(nodes_to_remove); i++) { +const char *dev_str = nodes_to_remove[i]; + +int offset = fdt_node_offset_by_compatible(fdt, -1, dev_str); +if (offset >= 0) { +if (!fdt_nop_node(fdt, offset)) { +warn_report("bcm2711 dtc: %s has been disabled!", dev_str); +} +} +} + +ram_size = board_ram_size(info->board_id); + +if (info->ram_size > UPPER_RAM_BASE) { +raspi4_add_memory_node(fdt, UPPER_RAM_BASE, ram_size - UPPER_RAM_BASE); +} +} + +static void raspi4b_machine_init(MachineState *machine) +{ +Raspi4bMachineState *s = RASPI4B_MACHINE(machine); +RaspiBaseMachineState *s_base = RASPI_BASE_MACHINE(machine); +RaspiBaseMachineClass *mc = RASPI_BASE_MACHINE_GET_CLASS(machine); +BCM2838State *soc = &s->soc; + +s_base->binfo.modify_dtb = raspi4_modify_dtb; +s_base->binfo.board_id = mc->board_rev; + +object_initialize_child(OBJECT(machine), "soc", soc, +board_soc_type(mc->board_rev)); +raspi_base_machine_init(machine, BCM283X_BASE(soc)); +} +#endif /* TARGET_AARCH64 */ + void raspi_machine_init(MachineState *machine) { RaspiMachineState *s = RASPI_MACHINE(machine); @@
[PATCH v2 00/12] hw/arm/raspi: Allow creating any Raspberry Pi machine
Full rewrite of v1 [1], addressing Zoltan & Peter suggestion. Introduce a generic 'raspi' machine, which takes a 'model' and 'revision' properties, and any memory size. The 'board_rev' register is filled appropriately. Before, merge raspi4b.c within raspi.c (more is planned here with the MPCore refactor [2]). Regards, Phil. [1] https://lore.kernel.org/qemu-devel/20250201091528.1177-1-phi...@linaro.org/ [2] https://lore.kernel.org/qemu-devel/20231212162935.42910-1-phi...@linaro.org/ Philippe Mathieu-Daudé (12): hw/arm/raspi: Access SoC parent object using BCM283X_BASE() macro hw/arm/raspi: Merge model 4B with other models hw/arm/raspi: Unify RASPI_MACHINE types hw/arm/raspi: Pass board_rev as argument to raspi_base_machine_init() hw/arm/raspi: Consider processor id in types[] array hw/arm/raspi: Consider network interface for B models hw/arm/raspi: Check ramsize is within chipset aperture hw/arm/raspi: Introduce generic Raspberry Pi machine hw/arm/raspi: Have the generic machine take a 'revision' property hw/arm/raspi: List models creatable by the generic 'raspi' machine hw/arm/raspi: Deprecate old raspiX machine names hw/arm/raspi: Support more models docs/about/deprecated.rst | 13 + include/hw/arm/raspi_platform.h | 5 +- hw/arm/raspi.c | 383 ++-- hw/arm/raspi4b.c| 136 - tests/qtest/bcm2835-dma-test.c | 2 +- tests/qtest/bcm2835-i2c-test.c | 2 +- tests/qtest/boot-serial-test.c | 3 +- hw/arm/meson.build | 2 +- tests/functional/test_aarch64_raspi3.py | 5 +- tests/functional/test_aarch64_raspi4.py | 4 +- tests/functional/test_arm_raspi2.py | 4 +- 11 files changed, 385 insertions(+), 174 deletions(-) delete mode 100644 hw/arm/raspi4b.c -- 2.47.1
[PATCH v2 08/12] hw/arm/raspi: Introduce generic Raspberry Pi machine
The generic 'raspi' machine takes a 'model' argument and create the machine associated with the model, with the RAM size requested (or default to the minimum of 256MB if not precised). Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2797 Signed-off-by: Philippe Mathieu-Daudé --- include/hw/arm/raspi_platform.h | 3 +- hw/arm/raspi.c | 127 2 files changed, 115 insertions(+), 15 deletions(-) diff --git a/include/hw/arm/raspi_platform.h b/include/hw/arm/raspi_platform.h index defb786153b..14cb91e153c 100644 --- a/include/hw/arm/raspi_platform.h +++ b/include/hw/arm/raspi_platform.h @@ -41,7 +41,8 @@ OBJECT_DECLARE_TYPE(RaspiBaseMachineState, RaspiBaseMachineClass, struct RaspiBaseMachineState { /*< private >*/ MachineState parent_obj; -/*< public >*/ + +uint32_t board_rev; struct arm_boot_info binfo; }; diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c index d44277001ee..1dc41701efe 100644 --- a/hw/arm/raspi.c +++ b/hw/arm/raspi.c @@ -300,20 +300,6 @@ void raspi_base_machine_init(MachineState *machine, BlockBackend *blk; BusState *bus; DeviceState *carddev; -uint64_t max_ramsize; - -if (machine->ram_size != ram_size) { -char *size_str = size_to_str(ram_size); -error_report("Invalid RAM size, should be %s", size_str); -g_free(size_str); -exit(1); -} -max_ramsize = ramsize_max(board_rev); -if (ram_size > max_ramsize) { -g_autofree char *max_ramsize_str = size_to_str(max_ramsize); -error_report("At most %s of RAM can be used", max_ramsize_str); - exit(1); -} /* FIXME: Remove when we have custom CPU address space support */ memory_region_add_subregion_overlap(get_system_memory(), 0, @@ -448,6 +434,115 @@ void raspi_machine_init(MachineState *machine) raspi_base_machine_init(machine, BCM283X_BASE(soc), mc->board_rev); } +static void raspi_generic_machine_init(MachineState *ms) +{ +RaspiMachineState *s = RASPI_MACHINE(ms); +RaspiBaseMachineState *s_base = RASPI_BASE_MACHINE(ms); +uint32_t board_rev = s_base->board_rev; +const char *soc_type = board_soc_type(board_rev); +BCM283XBaseState *bsoc; +uint64_t ram_size; +uint64_t max_ramsize; + +if (!board_rev) { +error_report("Missing model"); +exit(1); +} + +ram_size = ROUND_UP(ms->ram_size, 256 * MiB); +if (ram_size != ms->ram_size) { +g_autofree char *ram_size_str = size_to_str(ms->ram_size); +g_autofree char *rounded_size_str = size_to_str(ram_size); +warn_report("Invalid RAM size %s, rounding to %s", +ram_size_str, rounded_size_str); +} +max_ramsize = ramsize_max(board_rev); +if (ram_size > max_ramsize) { +g_autofree char *max_ramsize_str = size_to_str(max_ramsize); +error_report("At most %s of RAM can be used with BCM%s", + max_ramsize_str, soc_type + 3); +exit(1); +} +board_rev = FIELD_DP32(board_rev, REV_CODE, MEMORY_SIZE, + ctz64(ms->ram_size) - 28); + +ms->ram = g_new(MemoryRegion, 1); +memory_region_init(ms->ram, OBJECT(ms), "DRAM", ram_size); + +if (board_processor_id(board_rev) == PROCESSOR_ID_BCM2838) { +BCM2838State *soc = &s->soc4; +bsoc = BCM283X_BASE(soc); +object_initialize_child(OBJECT(ms), "soc", soc, soc_type); +} else { +BCM283XState *soc = &s->soc; +bsoc = BCM283X_BASE(soc); +object_initialize_child(OBJECT(ms), "soc", soc, soc_type); +} +raspi_base_machine_init(ms, bsoc, board_rev); +} + +static void raspi_update_board_rev(RaspiBaseMachineState *s) +{ +MachineState *ms = MACHINE(s); +RaspiProcessorId proc; +unsigned model_index; + +s->board_rev = FIELD_DP32(s->board_rev, REV_CODE, STYLE, 1); + +model_index = FIELD_EX32(s->board_rev, REV_CODE, TYPE); +proc = types[model_index].proc_id; +s->board_rev = FIELD_DP32(s->board_rev, REV_CODE, PROCESSOR, proc); + +ms->smp.max_cpus = soc_property[proc].cores_count; +} + +static void raspi_set_machine_model(Object *obj, const char *value, Error **errp) +{ +for (unsigned i = 0; i < ARRAY_SIZE(types); i++) { +if (types[i].model && !strcmp(value, types[i].model)) { +RaspiBaseMachineState *s = RASPI_BASE_MACHINE(obj); + +s->board_rev = FIELD_DP32(s->board_rev, REV_CODE, TYPE, i); + +return raspi_update_board_rev(s); +} +} +error_setg(errp, "Invalid model"); +} + +static char *raspi_get_machine_model(Object *obj, Error **errp) +{ +RaspiBaseMachineState *s = RASPI_BASE_MACHINE(obj); + +return g_strdup(types[FIELD_EX32(s->board_rev, REV_CODE, TYPE)].model); +} + +static void raspi_generic_machine_class_init(ObjectClass *oc, void *data) +{ +MachineClass *mc = MACHINE_CLASS(oc); +RaspiBaseMachineClass *rmc = RASPI_BASE_MACHINE_CLASS(oc); + +r
[PATCH v2 06/12] hw/arm/raspi: Consider network interface for B models
Raspberry Pi 'B' models have an ethernet chipset (the LAN9512). Since we don't yet model it, add a /* TODO */ comment. Signed-off-by: Philippe Mathieu-Daudé --- hw/arm/raspi.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c index 1a6a1f8ff22..68332fba027 100644 --- a/hw/arm/raspi.c +++ b/hw/arm/raspi.c @@ -143,6 +143,16 @@ static const char *board_type(uint32_t board_rev) return types[bt].model; } +static bool is_model_b(uint32_t board_rev) +{ +return !!strchr(board_type(board_rev), 'B'); +} + +static bool has_enet(uint32_t board_rev) +{ +return is_model_b(board_rev); +} + static void write_smpboot(ARMCPU *cpu, const struct arm_boot_info *info) { static const ARMInsnFixup smpboot[] = { @@ -304,6 +314,10 @@ void raspi_base_machine_init(MachineState *machine, machine->kernel_cmdline, &error_abort); qdev_realize(DEVICE(soc), NULL, &error_fatal); +if (has_enet(board_rev)) { +/* TODO: model LAN9512 and wire over USB2 */ +} + /* Create and plug in the SD cards */ di = drive_get(IF_SD, 0, 0); blk = di ? blk_by_legacy_dinfo(di) : NULL; -- 2.47.1
[PATCH v2 03/12] hw/arm/raspi: Unify RASPI_MACHINE types
Merge Raspi4bMachineState within RaspiMachineState by using an unnamed union. Signed-off-by: Philippe Mathieu-Daudé --- hw/arm/raspi.c | 21 +++-- 1 file changed, 7 insertions(+), 14 deletions(-) diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c index 3fa382d62ce..ef94d57dab5 100644 --- a/hw/arm/raspi.c +++ b/hw/arm/raspi.c @@ -38,9 +38,6 @@ #define TYPE_RASPI_MACHINE MACHINE_TYPE_NAME("raspi-common") OBJECT_DECLARE_SIMPLE_TYPE(RaspiMachineState, RASPI_MACHINE) -#define TYPE_RASPI4B_MACHINE MACHINE_TYPE_NAME("raspi4b") -OBJECT_DECLARE_SIMPLE_TYPE(Raspi4bMachineState, RASPI4B_MACHINE) - #define SMPBOOT_ADDR0x300 /* this should leave enough space for ATAGS */ #define MVBAR_ADDR 0x400 /* secure vectors */ #define BOARDSETUP_ADDR (MVBAR_ADDR + 0x20) /* board setup code */ @@ -49,15 +46,12 @@ OBJECT_DECLARE_SIMPLE_TYPE(Raspi4bMachineState, RASPI4B_MACHINE) #define SPINTABLE_ADDR 0xd8 /* Pi 3 bootloader spintable */ struct RaspiMachineState { -/*< private >*/ RaspiBaseMachineState parent_obj; -/*< public >*/ -BCM283XState soc; -}; -struct Raspi4bMachineState { -RaspiBaseMachineState parent_obj; -BCM2838State soc; +union { +BCM283XState soc; +BCM2838State soc4; +}; }; /* @@ -380,10 +374,10 @@ static void raspi4_modify_dtb(const struct arm_boot_info *info, void *fdt) static void raspi4b_machine_init(MachineState *machine) { -Raspi4bMachineState *s = RASPI4B_MACHINE(machine); +RaspiMachineState *s = RASPI_MACHINE(machine); RaspiBaseMachineState *s_base = RASPI_BASE_MACHINE(machine); RaspiBaseMachineClass *mc = RASPI_BASE_MACHINE_GET_CLASS(machine); -BCM2838State *soc = &s->soc; +BCM2838State *soc = &s->soc4; s_base->binfo.modify_dtb = raspi4_modify_dtb; s_base->binfo.board_id = mc->board_rev; @@ -515,8 +509,7 @@ static const TypeInfo raspi_machine_types[] = { .class_init = raspi3b_machine_class_init, }, { .name = MACHINE_TYPE_NAME("raspi4"), -.parent = TYPE_RASPI_BASE_MACHINE, -.instance_size = sizeof(Raspi4bMachineState), +.parent = TYPE_RASPI_MACHINE, .class_init = raspi4b_machine_class_init, #endif /* TARGET_AARCH64 */ }, { -- 2.47.1
[PATCH v2 05/12] hw/arm/raspi: Consider processor id in types[] array
Expand the current type2model array to include the processor id. Since the BCM2838 is indistinctly used as BCM2711 (within the Linux community), add it as alias in RaspiProcessorId. Signed-off-by: Philippe Mathieu-Daudé --- hw/arm/raspi.c | 33 +++-- 1 file changed, 27 insertions(+), 6 deletions(-) diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c index 571b50bef7e..1a6a1f8ff22 100644 --- a/hw/arm/raspi.c +++ b/hw/arm/raspi.c @@ -70,6 +70,7 @@ typedef enum RaspiProcessorId { PROCESSOR_ID_BCM2836 = 1, PROCESSOR_ID_BCM2837 = 2, PROCESSOR_ID_BCM2838 = 3, +PROCESSOR_ID_BCM2711 = 3, } RaspiProcessorId; static const struct { @@ -82,6 +83,30 @@ static const struct { [PROCESSOR_ID_BCM2838] = {TYPE_BCM2838, BCM283X_NCPUS}, }; +static const struct { +RaspiProcessorId proc_id; +const char *model; +} types[] = { +{PROCESSOR_ID_BCM2835, "A"}, +{PROCESSOR_ID_BCM2835, "B"}, +{PROCESSOR_ID_BCM2835, "A+"}, +{PROCESSOR_ID_BCM2835, "B+"}, +{PROCESSOR_ID_BCM2836, "2B"}, +{ }, +{PROCESSOR_ID_BCM2835, "CM1"}, +{ }, +{PROCESSOR_ID_BCM2837, "3B"}, +{PROCESSOR_ID_BCM2835, "Zero"}, +{PROCESSOR_ID_BCM2837, "CM3"}, +{ }, +{PROCESSOR_ID_BCM2835, "ZeroW"}, +{PROCESSOR_ID_BCM2837, "3B+"}, +{PROCESSOR_ID_BCM2837, "3A+"}, +{ }, +{PROCESSOR_ID_BCM2837, "CM3+"}, +{PROCESSOR_ID_BCM2711, "4B"}, +}; + uint64_t board_ram_size(uint32_t board_rev) { assert(FIELD_EX32(board_rev, REV_CODE, STYLE)); /* Only new style */ @@ -110,16 +135,12 @@ static int cores_count(uint32_t board_rev) static const char *board_type(uint32_t board_rev) { -static const char *types[] = { -"A", "B", "A+", "B+", "2B", "Alpha", "CM1", NULL, "3B", "Zero", -"CM3", NULL, "Zero W", "3B+", "3A+", NULL, "CM3+", "4B", -}; assert(FIELD_EX32(board_rev, REV_CODE, STYLE)); /* Only new style */ int bt = FIELD_EX32(board_rev, REV_CODE, TYPE); -if (bt >= ARRAY_SIZE(types) || !types[bt]) { +if (bt >= ARRAY_SIZE(types) || !types[bt].model) { return "Unknown"; } -return types[bt]; +return types[bt].model; } static void write_smpboot(ARMCPU *cpu, const struct arm_boot_info *info) -- 2.47.1
[PATCH v2 10/12] hw/arm/raspi: List models creatable by the generic 'raspi' machine
All the following models can be created (with different RAM size): $ qemu-system-aarch64 -M raspi qemu-system-aarch64: Missing model, try -M raspi,model=help $ qemu-system-aarch64 -M raspi,model=help Available models (processor): - A (BCM2835) - B (BCM2835) - A+ (BCM2835) - B+ (BCM2835) - 2B (BCM2836) - CM1(BCM2835) - 3B (BCM2837) - Zero (BCM2835) - CM3(BCM2837) - ZeroW (BCM2835) - 3B+(BCM2837) - 3A+(BCM2837) - CM3+ (BCM2837) - 4B (BCM2838) Signed-off-by: Philippe Mathieu-Daudé --- hw/arm/raspi.c | 28 +++- 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c index b184ac3c446..8cae1ff6f93 100644 --- a/hw/arm/raspi.c +++ b/hw/arm/raspi.c @@ -445,7 +445,7 @@ static void raspi_generic_machine_init(MachineState *ms) uint64_t max_ramsize; if (!board_rev) { -error_report("Missing model"); +error_report("Missing model, try -M raspi,model=help"); exit(1); } @@ -500,8 +500,33 @@ static void raspi_update_board_rev(RaspiBaseMachineState *s) ms->smp.max_cpus = soc_property[proc].cores_count; } +static void raspi_list_machine_models(void) +{ +printf("Available models (processor):\n"); + +for (unsigned i = 0; i < ARRAY_SIZE(types); i++) { +const char *soc_type; + +if (!types[i].model) { +continue; +} + +soc_type = soc_property[types[i].proc_id].type; +if (!soc_type) { +continue; +} +printf("- %-10s (BCM%s)\n", + types[i].model, + soc_property[types[i].proc_id].type + 3); +} +} + static void raspi_set_machine_model(Object *obj, const char *value, Error **errp) { +if (!strcmp(value, "help")) { +raspi_list_machine_models(); +exit(0); +} for (unsigned i = 0; i < ARRAY_SIZE(types); i++) { if (types[i].model && !strcmp(value, types[i].model)) { RaspiBaseMachineState *s = RASPI_BASE_MACHINE(obj); @@ -512,6 +537,7 @@ static void raspi_set_machine_model(Object *obj, const char *value, Error **errp } } error_setg(errp, "Invalid model"); +error_append_hint(errp, "Use model=help to list models.\n"); } static char *raspi_get_machine_model(Object *obj, Error **errp) -- 2.47.1
[PATCH v2 09/12] hw/arm/raspi: Have the generic machine take a 'revision' property
Add a property to specify the board revision. This allows to create a Raspberry Pi 2B with BCM2836 SoC (rev 1.0 and 1.1) or BCM2837 (rev 1.2 up to 1.5). Signed-off-by: Philippe Mathieu-Daudé --- hw/arm/raspi.c | 39 +++ 1 file changed, 39 insertions(+) diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c index 1dc41701efe..b184ac3c446 100644 --- a/hw/arm/raspi.c +++ b/hw/arm/raspi.c @@ -491,6 +491,10 @@ static void raspi_update_board_rev(RaspiBaseMachineState *s) model_index = FIELD_EX32(s->board_rev, REV_CODE, TYPE); proc = types[model_index].proc_id; +if (model_index == 4 && FIELD_EX32(s->board_rev, REV_CODE, REVISION) > 1) { +/* 2B rev 1.0 and 1.1 have BCM2836, 1.2+ have BCM2837 */ +proc = PROCESSOR_ID_BCM2837; +} s->board_rev = FIELD_DP32(s->board_rev, REV_CODE, PROCESSOR, proc); ms->smp.max_cpus = soc_property[proc].cores_count; @@ -517,6 +521,35 @@ static char *raspi_get_machine_model(Object *obj, Error **errp) return g_strdup(types[FIELD_EX32(s->board_rev, REV_CODE, TYPE)].model); } +static void raspi_set_machine_rev(Object *obj, const char *value, Error **errp) +{ +RaspiBaseMachineState *s; +int rev; + +if (strlen(value) != 3 || value[0] != '1' || value[1] != '.') { +error_setg(errp, "Invalid revision"); +return; +} +rev = value[2] - '0'; +if (rev < 0 || rev > 5) { +error_setg(errp, "Invalid revision"); +return; +} + +s = RASPI_BASE_MACHINE(obj); +s->board_rev = FIELD_DP32(s->board_rev, REV_CODE, REVISION, rev); + +return raspi_update_board_rev(s); +} + +static char *raspi_get_machine_rev(Object *obj, Error **errp) +{ +RaspiBaseMachineState *s = RASPI_BASE_MACHINE(obj); + +return g_strdup_printf("1.%u", + FIELD_EX32(s->board_rev, REV_CODE, REVISION)); +} + static void raspi_generic_machine_class_init(ObjectClass *oc, void *data) { MachineClass *mc = MACHINE_CLASS(oc); @@ -540,6 +573,12 @@ static void raspi_generic_machine_class_init(ObjectClass *oc, void *data) raspi_get_machine_model, raspi_set_machine_model); object_class_property_set_description(oc, "model", "Set machine model."); +object_class_property_add_str(oc, "revision", + raspi_get_machine_rev, + raspi_set_machine_rev); +object_class_property_set_description(oc, "revision", + "Set machine revision. " + "Valid values are 1.0 to 1.5"); }; -- 2.47.1
[PATCH v2 01/12] hw/arm/raspi: Access SoC parent object using BCM283X_BASE() macro
We shouldn't access a QOM parent object directly. Use the appropriate type-cast macro. Signed-off-by: Philippe Mathieu-Daudé --- hw/arm/raspi.c | 2 +- hw/arm/raspi4b.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c index a7a662f40db..508f90479e2 100644 --- a/hw/arm/raspi.c +++ b/hw/arm/raspi.c @@ -312,7 +312,7 @@ void raspi_machine_init(MachineState *machine) object_initialize_child(OBJECT(machine), "soc", soc, board_soc_type(mc->board_rev)); -raspi_base_machine_init(machine, &soc->parent_obj); +raspi_base_machine_init(machine, BCM283X_BASE(soc)); } void raspi_machine_class_common_init(MachineClass *mc, diff --git a/hw/arm/raspi4b.c b/hw/arm/raspi4b.c index 1264e0d6eed..9b08a598f39 100644 --- a/hw/arm/raspi4b.c +++ b/hw/arm/raspi4b.c @@ -104,7 +104,7 @@ static void raspi4b_machine_init(MachineState *machine) object_initialize_child(OBJECT(machine), "soc", soc, board_soc_type(mc->board_rev)); -raspi_base_machine_init(machine, &soc->parent_obj); +raspi_base_machine_init(machine, BCM283X_BASE(soc)); } static void raspi4b_machine_class_init(ObjectClass *oc, void *data) -- 2.47.1
[PATCH v2 07/12] hw/arm/raspi: Check ramsize is within chipset aperture
Add the 'max_ramsize' field to the soc_property[] array, corresponding to the maximum DRAM size a SoC can map. Signed-off-by: Philippe Mathieu-Daudé --- hw/arm/raspi.c | 21 + 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c index 68332fba027..d44277001ee 100644 --- a/hw/arm/raspi.c +++ b/hw/arm/raspi.c @@ -76,11 +76,12 @@ typedef enum RaspiProcessorId { static const struct { const char *type; int cores_count; +uint64_t max_ramsize; } soc_property[] = { -[PROCESSOR_ID_BCM2835] = {TYPE_BCM2835, 1}, -[PROCESSOR_ID_BCM2836] = {TYPE_BCM2836, BCM283X_NCPUS}, -[PROCESSOR_ID_BCM2837] = {TYPE_BCM2837, BCM283X_NCPUS}, -[PROCESSOR_ID_BCM2838] = {TYPE_BCM2838, BCM283X_NCPUS}, +[PROCESSOR_ID_BCM2835] = {TYPE_BCM2835, 1, 512 * MiB}, +[PROCESSOR_ID_BCM2836] = {TYPE_BCM2836, BCM283X_NCPUS, 1 * GiB}, +[PROCESSOR_ID_BCM2837] = {TYPE_BCM2837, BCM283X_NCPUS, 1 * GiB}, +[PROCESSOR_ID_BCM2838] = {TYPE_BCM2838, BCM283X_NCPUS, 8 * GiB}, }; static const struct { @@ -133,6 +134,11 @@ static int cores_count(uint32_t board_rev) return soc_property[board_processor_id(board_rev)].cores_count; } +static uint64_t ramsize_max(uint32_t board_rev) +{ +return soc_property[board_processor_id(board_rev)].max_ramsize; +} + static const char *board_type(uint32_t board_rev) { assert(FIELD_EX32(board_rev, REV_CODE, STYLE)); /* Only new style */ @@ -294,6 +300,7 @@ void raspi_base_machine_init(MachineState *machine, BlockBackend *blk; BusState *bus; DeviceState *carddev; +uint64_t max_ramsize; if (machine->ram_size != ram_size) { char *size_str = size_to_str(ram_size); @@ -301,6 +308,12 @@ void raspi_base_machine_init(MachineState *machine, g_free(size_str); exit(1); } +max_ramsize = ramsize_max(board_rev); +if (ram_size > max_ramsize) { +g_autofree char *max_ramsize_str = size_to_str(max_ramsize); +error_report("At most %s of RAM can be used", max_ramsize_str); + exit(1); +} /* FIXME: Remove when we have custom CPU address space support */ memory_region_add_subregion_overlap(get_system_memory(), 0, -- 2.47.1
[PATCH v2 04/12] hw/arm/raspi: Pass board_rev as argument to raspi_base_machine_init()
Since callers already have reference to the RaspiBaseMachineClass, directly pass 'board_rev' as argument to raspi_base_machine_init(). Signed-off-by: Philippe Mathieu-Daudé --- include/hw/arm/raspi_platform.h | 2 +- hw/arm/raspi.c | 8 +++- 2 files changed, 4 insertions(+), 6 deletions(-) diff --git a/include/hw/arm/raspi_platform.h b/include/hw/arm/raspi_platform.h index 7bc4807fa51..defb786153b 100644 --- a/include/hw/arm/raspi_platform.h +++ b/include/hw/arm/raspi_platform.h @@ -58,7 +58,7 @@ void raspi_machine_init(MachineState *machine); typedef struct BCM283XBaseState BCM283XBaseState; void raspi_base_machine_init(MachineState *machine, - BCM283XBaseState *soc); + BCM283XBaseState *soc, const uint32_t board_rev); void raspi_machine_class_common_init(MachineClass *mc, uint32_t board_rev); diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c index ef94d57dab5..571b50bef7e 100644 --- a/hw/arm/raspi.c +++ b/hw/arm/raspi.c @@ -254,10 +254,8 @@ static void setup_boot(MachineState *machine, ARMCPU *cpu, } void raspi_base_machine_init(MachineState *machine, - BCM283XBaseState *soc) + BCM283XBaseState *soc, const uint32_t board_rev) { -RaspiBaseMachineClass *mc = RASPI_BASE_MACHINE_GET_CLASS(machine); -uint32_t board_rev = mc->board_rev; uint64_t ram_size = board_ram_size(board_rev); uint32_t vcram_base, vcram_size; size_t boot_ram_size; @@ -384,7 +382,7 @@ static void raspi4b_machine_init(MachineState *machine) object_initialize_child(OBJECT(machine), "soc", soc, board_soc_type(mc->board_rev)); -raspi_base_machine_init(machine, BCM283X_BASE(soc)); +raspi_base_machine_init(machine, BCM283X_BASE(soc), mc->board_rev); } #endif /* TARGET_AARCH64 */ @@ -399,7 +397,7 @@ void raspi_machine_init(MachineState *machine) object_initialize_child(OBJECT(machine), "soc", soc, board_soc_type(mc->board_rev)); -raspi_base_machine_init(machine, BCM283X_BASE(soc)); +raspi_base_machine_init(machine, BCM283X_BASE(soc), mc->board_rev); } void raspi_machine_class_common_init(MachineClass *mc, -- 2.47.1
[PATCH v2 12/12] hw/arm/raspi: Support more models
Allow to create the following machines: - Zero2W - 400 - CM4 and CM4S Fill the arrays with the BCM2712-based machines (raspi5), but since we don't model the SoC, these machines can't be created (and aren't listed in the 'help' output). List taken from: https://github.com/raspberrypi/documentation/blob/9b126446a5/documentation/asciidoc/computers/raspberry-pi/revision-codes.adoc Signed-off-by: Philippe Mathieu-Daudé --- hw/arm/raspi.c | 13 + 1 file changed, 13 insertions(+) diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c index 86ecc988e06..2346550eec5 100644 --- a/hw/arm/raspi.c +++ b/hw/arm/raspi.c @@ -71,6 +71,7 @@ typedef enum RaspiProcessorId { PROCESSOR_ID_BCM2837 = 2, PROCESSOR_ID_BCM2838 = 3, PROCESSOR_ID_BCM2711 = 3, +PROCESSOR_ID_BCM2712 = 4, } RaspiProcessorId; static const struct { @@ -82,6 +83,7 @@ static const struct { [PROCESSOR_ID_BCM2836] = {TYPE_BCM2836, BCM283X_NCPUS, 1 * GiB}, [PROCESSOR_ID_BCM2837] = {TYPE_BCM2837, BCM283X_NCPUS, 1 * GiB}, [PROCESSOR_ID_BCM2838] = {TYPE_BCM2838, BCM283X_NCPUS, 8 * GiB}, +[PROCESSOR_ID_BCM2712] = {NULL, BCM283X_NCPUS, 16 * GiB}, }; static const struct { @@ -106,6 +108,17 @@ static const struct { { }, {PROCESSOR_ID_BCM2837, "CM3+"}, {PROCESSOR_ID_BCM2711, "4B"}, +{PROCESSOR_ID_BCM2837, "Zero2W"}, +{PROCESSOR_ID_BCM2711, "400"}, + +{PROCESSOR_ID_BCM2711, "CM4"}, +{PROCESSOR_ID_BCM2711, "CM4S"}, +{ }, +{PROCESSOR_ID_BCM2712, "5"}, +{PROCESSOR_ID_BCM2712, "CM5"}, +{PROCESSOR_ID_BCM2712, "500"}, +{PROCESSOR_ID_BCM2712, "CM5lite"}, +{ }, }; uint64_t board_ram_size(uint32_t board_rev) -- 2.47.1
[PATCH v2 11/12] hw/arm/raspi: Deprecate old raspiX machine names
All previous raspi machines can be created using the generic machine. Deprecate the old names to maintain a single one. Update the tests. Signed-off-by: Philippe Mathieu-Daudé --- QOM HMP introspection test fails because without the 'model' argument set, no machine is created... $ qemu-system-aarch64 -M raspi qemu-system-aarch64: Missing model, try -M raspi,model=help --- docs/about/deprecated.rst | 13 + hw/arm/raspi.c | 5 + tests/qtest/bcm2835-dma-test.c | 2 +- tests/qtest/bcm2835-i2c-test.c | 2 +- tests/qtest/boot-serial-test.c | 3 ++- tests/functional/test_aarch64_raspi3.py | 5 ++--- tests/functional/test_aarch64_raspi4.py | 4 ++-- tests/functional/test_arm_raspi2.py | 4 ++-- 8 files changed, 28 insertions(+), 10 deletions(-) diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst index 4a3c302962a..c9a11a52f78 100644 --- a/docs/about/deprecated.rst +++ b/docs/about/deprecated.rst @@ -257,6 +257,19 @@ Big-Endian variants of MicroBlaze ``petalogix-ml605`` and ``xlnx-zynqmp-pmu`` ma Both ``petalogix-ml605`` and ``xlnx-zynqmp-pmu`` were added for little endian CPUs. Big endian support is not tested. +ARM ``raspi0``, ``raspi1ap``, ``raspi2b``, ``raspi3ap``, ``raspi3b`` and ``raspi4b`` machines (since 10.0) +'' + +The Raspberry Pi machines have been unified under the generic ``raspi`` machine, +which takes the model as argument. + +- `raspi0`` is now an alias for ``raspi,model=Zero`` +- `raspi1ap`` is now an alias for ``raspi,model=1A+`` +- `raspi2b`` is now an alias for ``raspi,model=2B`` +- `raspi3ap`` is now an alias for ``raspi,model=3A+`` +- `raspi3b`` is now an alias for ``raspi,model=3B`` +- `raspi4b`` is now an alias for ``raspi,model=4B`` + Backend options --- diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c index 8cae1ff6f93..86ecc988e06 100644 --- a/hw/arm/raspi.c +++ b/hw/arm/raspi.c @@ -637,6 +637,7 @@ static void raspi0_machine_class_init(ObjectClass *oc, void *data) rmc->board_rev = 0x920092; /* Revision 1.2 */ raspi_machine_class_init(mc, rmc->board_rev); +mc->deprecation_reason = "-M raspi,model=Zero"; }; static void raspi1ap_machine_class_init(ObjectClass *oc, void *data) @@ -646,6 +647,7 @@ static void raspi1ap_machine_class_init(ObjectClass *oc, void *data) rmc->board_rev = 0x900021; /* Revision 1.1 */ raspi_machine_class_init(mc, rmc->board_rev); +mc->deprecation_reason = "-M raspi,model=A+ (-m 512m)"; }; static void raspi2b_machine_class_init(ObjectClass *oc, void *data) @@ -655,6 +657,7 @@ static void raspi2b_machine_class_init(ObjectClass *oc, void *data) rmc->board_rev = 0xa21041; raspi_machine_class_init(mc, rmc->board_rev); +mc->deprecation_reason = "-M raspi,model=2B -m 1g"; }; #ifdef TARGET_AARCH64 @@ -665,6 +668,7 @@ static void raspi3ap_machine_class_init(ObjectClass *oc, void *data) rmc->board_rev = 0x9020e0; /* Revision 1.0 */ raspi_machine_class_init(mc, rmc->board_rev); +mc->deprecation_reason = "-M raspi,model=3A+ -m 512m"; }; static void raspi3b_machine_class_init(ObjectClass *oc, void *data) @@ -674,6 +678,7 @@ static void raspi3b_machine_class_init(ObjectClass *oc, void *data) rmc->board_rev = 0xa02082; raspi_machine_class_init(mc, rmc->board_rev); +mc->deprecation_reason = "-M raspi,model=3B -m 1g"; }; static void raspi4b_machine_class_init(ObjectClass *oc, void *data) diff --git a/tests/qtest/bcm2835-dma-test.c b/tests/qtest/bcm2835-dma-test.c index 18901b76d21..705e6b2362b 100644 --- a/tests/qtest/bcm2835-dma-test.c +++ b/tests/qtest/bcm2835-dma-test.c @@ -111,7 +111,7 @@ int main(int argc, char **argv) g_test_init(&argc, &argv, NULL); qtest_add_func("/bcm2835/dma/test_interrupts", bcm2835_dma_test_interrupts); -qtest_start("-machine raspi3b"); +qtest_start("-machine raspi,model=3B -m 1g"); ret = g_test_run(); qtest_end(); return ret; diff --git a/tests/qtest/bcm2835-i2c-test.c b/tests/qtest/bcm2835-i2c-test.c index 15991949260..15904abf393 100644 --- a/tests/qtest/bcm2835-i2c-test.c +++ b/tests/qtest/bcm2835-i2c-test.c @@ -104,7 +104,7 @@ int main(int argc, char **argv) } /* Run I2C tests with TMP105 slaves on all three buses */ -qtest_start("-M raspi3b " +qtest_start("-M raspi,model=3B -m 1g " "-device tmp105,address=0x50,bus=i2c-bus.0 " "-device tmp105,address=0x50,bus=i2c-bus.1 " "-device tmp105,address=0x50,bus=i2c-bus.2"); diff --git a/tests/qtest/boot-serial-test.c b/tests/qtest/boot-serial-test.c index a05d26ee996..fbafd73facb 100644 --- a/tests/qtest/boot-serial-test.c +++ b/tests/qtest/boot-serial-test.c @@ -188,7 +188,8 @@ static const testdef_t tests[] = { size
Re: [RFC PATCH v12 qemu 2/2] qtest/cxl: Add aarch64 virt test for CXL
Jonathan, > On Feb 4, 2025, at 2:30, Jonathan Cameron wrote: > > Add a single complex case for aarch64 virt machine. > Given existing much more comprehensive tests for x86 cover the > common functionality, a single test should be enough to verify > that the aarch64 part continue to work. > > Signed-off-by: Jonathan Cameron > --- > tests/qtest/cxl-test.c | 59 - > tests/qtest/meson.build | 1 + > 2 files changed, 47 insertions(+), 13 deletions(-) > > diff --git a/tests/qtest/cxl-test.c b/tests/qtest/cxl-test.c > index a600331843..c7189d6222 100644 > --- a/tests/qtest/cxl-test.c > +++ b/tests/qtest/cxl-test.c > @@ -19,6 +19,12 @@ > "-device pxb-cxl,id=cxl.1,bus=pcie.0,bus_nr=53 " \ > "-M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.targets.1=cxl.1,cxl-fmw.0.size=4G > " > > +#define QEMU_VIRT_2PXB_CMD \ > +"-machine virt,cxl=on -cpu max " \ > +"-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 " \ > +"-device pxb-cxl,id=cxl.1,bus=pcie.0,bus_nr=53 " \ > +"-M > cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.targets.1=cxl.1,cxl-fmw.0.size=4G " > + > #define QEMU_RP \ > "-device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 " > > @@ -197,25 +203,52 @@ static void cxl_2pxb_4rp_4t3d(void) > qtest_end(); > rmdir(tmpfs); > } > + > +static void cxl_virt_2pxb_4rp_4t3d(void) > +{ > +g_autoptr(GString) cmdline = g_string_new(NULL); > +char template[] = "/tmp/cxl-test-XX"; > +const char *tmpfs; > + > +tmpfs = mkdtemp(template); > + > +g_string_printf(cmdline, QEMU_VIRT_2PXB_CMD QEMU_4RP QEMU_4T3D, > +tmpfs, tmpfs, tmpfs, tmpfs, tmpfs, tmpfs, > +tmpfs, tmpfs); > + > +qtest_start(cmdline->str); > +qtest_end(); > +rmdir(tmpfs); > +} > #endif /* CONFIG_POSIX */ > > int main(int argc, char **argv) > { > -g_test_init(&argc, &argv, NULL); > +const char *arch = qtest_get_arch(); > > -qtest_add_func("/pci/cxl/basic_hostbridge", cxl_basic_hb); > -qtest_add_func("/pci/cxl/basic_pxb", cxl_basic_pxb); > -qtest_add_func("/pci/cxl/pxb_with_window", cxl_pxb_with_window); > -qtest_add_func("/pci/cxl/pxb_x2_with_window", cxl_2pxb_with_window); > -qtest_add_func("/pci/cxl/rp", cxl_root_port); > -qtest_add_func("/pci/cxl/rp_x2", cxl_2root_port); > +g_test_init(&argc, &argv, NULL); > +if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) { > +qtest_add_func("/pci/cxl/basic_hostbridge", cxl_basic_hb); > +qtest_add_func("/pci/cxl/basic_pxb", cxl_basic_pxb); > +qtest_add_func("/pci/cxl/pxb_with_window", cxl_pxb_with_window); > +qtest_add_func("/pci/cxl/pxb_x2_with_window", cxl_2pxb_with_window); > +qtest_add_func("/pci/cxl/rp", cxl_root_port); > +qtest_add_func("/pci/cxl/rp_x2", cxl_2root_port); > #ifdef CONFIG_POSIX > -qtest_add_func("/pci/cxl/type3_device", cxl_t3d_deprecated); > -qtest_add_func("/pci/cxl/type3_device_pmem", cxl_t3d_persistent); > -qtest_add_func("/pci/cxl/type3_device_vmem", cxl_t3d_volatile); > -qtest_add_func("/pci/cxl/type3_device_vmem_lsa", cxl_t3d_volatile_lsa); > -qtest_add_func("/pci/cxl/rp_x2_type3_x2", cxl_1pxb_2rp_2t3d); > -qtest_add_func("/pci/cxl/pxb_x2_root_port_x4_type3_x4", > cxl_2pxb_4rp_4t3d); > +qtest_add_func("/pci/cxl/type3_device", cxl_t3d_deprecated); > +qtest_add_func("/pci/cxl/type3_device_pmem", cxl_t3d_persistent); > +qtest_add_func("/pci/cxl/type3_device_vmem", cxl_t3d_volatile); > +qtest_add_func("/pci/cxl/type3_device_vmem_lsa", > cxl_t3d_volatile_lsa); > +qtest_add_func("/pci/cxl/rp_x2_type3_x2", cxl_1pxb_2rp_2t3d); > +qtest_add_func("/pci/cxl/pxb_x2_root_port_x4_type3_x4", > + cxl_2pxb_4rp_4t3d); > #endif > +} else if (strcmp(arch, "aarch64") == 0) { > +#ifdef CONFIG_POSIX > +qtest_add_func("/pci/cxl/virt/pxb_x2_root_port_x4_type3_x4", > + cxl_virt_2pxb_4rp_4t3d); > +#endif > +} > + > return g_test_run(); > } > diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build > index e60e92fe9d..f5e7fb060e 100644 > --- a/tests/qtest/meson.build > +++ b/tests/qtest/meson.build > @@ -257,6 +257,7 @@ qtests_aarch64 = \ > (config_all_accel.has_key('CONFIG_TCG') and >\ >config_all_devices.has_key('CONFIG_TPM_TIS_I2C') ? ['tpm-tis-i2c-test'] : > []) + \ > (config_all_devices.has_key('CONFIG_ASPEED_SOC') ? qtests_aspeed64 : []) + \ > + qtests_cxl + >\ > ['arm-cpu-features', >'numa-test', >'boot-serial-test', > -- > 2.43.0 > In Ubuntu 22.04 LTS, cxl-test applied on top of today’s QEMU upstream master branch cxl-test fails: $ ./tests/qtest/cxl-test # random seed: R02S2a8b02df7b32b79d086ce22f7f8ebeab 1..1 # Start of aarch64 tests # Start of pci tests # Start of cxl tests # Start of virt tests # s
Re: [PATCH v3 06/26] target/arm/kvm-rme: Initialize vCPU
On 11/26/24 5:56 AM, Jean-Philippe Brucker wrote: The target code calls kvm_arm_vcpu_init() to mark the vCPU as part of a Realm. For a Realm vCPU, only x0-x7 can be set at runtime. Before boot, the PC can also be set, and is ignored at runtime. KVM also accepts a few system register changes during initial configuration, as returned by KVM_GET_REG_LIST. Signed-off-by: Jean-Philippe Brucker --- target/arm/cpu.h | 3 +++ target/arm/kvm_arm.h | 15 +++ target/arm/kvm-rme.c | 10 target/arm/kvm.c | 61 4 files changed, 89 insertions(+) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index d86e641280..f617591921 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -961,6 +961,9 @@ struct ArchCPU { OnOffAuto kvm_steal_time; #endif /* CONFIG_KVM */ +/* Realm Management Extension */ +bool kvm_rme; + /* Uniprocessor system with MP extensions */ bool mp_is_up; diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h index 9d6a89f9b1..8b52a881b0 100644 --- a/target/arm/kvm_arm.h +++ b/target/arm/kvm_arm.h @@ -245,6 +245,16 @@ int kvm_arm_rme_init(MachineState *ms); */ int kvm_arm_rme_vm_type(MachineState *ms); +/** + * kvm_arm_rme_vcpu_init + * @cs: the CPU + * + * If the user requested a Realm, setup the given vCPU accordingly. Realm vCPUs + * behave a little differently, for example most of their register state is + * hidden from the host. + */ +int kvm_arm_rme_vcpu_init(CPUState *cs); + #else /* @@ -339,6 +349,11 @@ static inline int kvm_arm_rme_vm_type(MachineState *ms) g_assert_not_reached(); } +static inline int kvm_arm_rme_vcpu_init(CPUState *cs) +{ +g_assert_not_reached(); +} + #endif #endif diff --git a/target/arm/kvm-rme.c b/target/arm/kvm-rme.c index 60d967a842..e3cc37538a 100644 --- a/target/arm/kvm-rme.c +++ b/target/arm/kvm-rme.c @@ -137,6 +137,16 @@ int kvm_arm_rme_init(MachineState *ms) return 0; } +int kvm_arm_rme_vcpu_init(CPUState *cs) +{ +ARMCPU *cpu = ARM_CPU(cs); + +if (rme_guest) { +cpu->kvm_rme = true; +} +return 0; +} + int kvm_arm_rme_vm_type(MachineState *ms) { if (rme_guest) { diff --git a/target/arm/kvm.c b/target/arm/kvm.c index 0c80992f7c..a0de2efc41 100644 --- a/target/arm/kvm.c +++ b/target/arm/kvm.c @@ -1926,6 +1926,11 @@ int kvm_arch_init_vcpu(CPUState *cs) return ret; } +ret = kvm_arm_rme_vcpu_init(cs); +if (ret) { +return ret; +} + if (cpu_isar_feature(aa64_sve, cpu)) { ret = kvm_arm_sve_set_vls(cpu); if (ret) { @@ -2062,6 +2067,35 @@ static int kvm_arch_put_sve(CPUState *cs) return 0; } +static int kvm_arm_rme_put_core_regs(CPUState *cs, Error **errp) +{ +int i, ret; +struct kvm_one_reg reg; +ARMCPU *cpu = ARM_CPU(cs); +CPUARMState *env = &cpu->env; + +/* + * The RME ABI only allows us to set 8 GPRs and the PC + */ Needn't to span for multiple lines. +for (i = 0; i < 8; i++) { +reg.id = AARCH64_CORE_REG(regs.regs[i]); +reg.addr = (uintptr_t) &env->xregs[i]; +ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®); +if (ret) { +return ret; +} +} + +reg.id = AARCH64_CORE_REG(regs.pc); +reg.addr = (uintptr_t) &env->pc; +ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®); +if (ret) { +return ret; +} + +return 0; +} + Nice place to use kvm_set_one_reg(). With it, @reg can be dropped. static int kvm_arm_put_core_regs(CPUState *cs, int level, Error **errp) { uint64_t val; @@ -2072,6 +2106,10 @@ static int kvm_arm_put_core_regs(CPUState *cs, int level, Error **errp) ARMCPU *cpu = ARM_CPU(cs); CPUARMState *env = &cpu->env; +if (cpu->kvm_rme) { +return kvm_arm_rme_put_core_regs(cs, errp); +} + /* If we are in AArch32 mode then we need to copy the AArch32 regs to the * AArch64 registers before pushing them out to 64-bit KVM. */ @@ -2259,6 +2297,25 @@ static int kvm_arch_get_sve(CPUState *cs) return 0; } +static int kvm_arm_rme_get_core_regs(CPUState *cs, Error **errp) +{ +int i, ret; +struct kvm_one_reg reg; +ARMCPU *cpu = ARM_CPU(cs); +CPUARMState *env = &cpu->env; + +for (i = 0; i < 8; i++) { +reg.id = AARCH64_CORE_REG(regs.regs[i]); +reg.addr = (uintptr_t) &env->xregs[i]; +ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, ®); +if (ret) { +return ret; +} +} + +return 0; +} + Similiarly, kvm_get_one_reg() can be used. static int kvm_arm_get_core_regs(CPUState *cs, Error **errp) { uint64_t val; @@ -2269,6 +2326,10 @@ static int kvm_arm_get_core_regs(CPUState *cs, Error **errp) ARMCPU *cpu = ARM_CPU(cs); CPUARMState *env = &cpu->env; +if (cpu->kvm_rme) { +return kvm_arm_rme_get_core_regs(cs, errp); +} + fo
Re: [PATCH v2 11/15] block/export: Add option to allow export of inactive nodes
On Fri, Jan 31, 2025 at 10:50:47AM +0100, Kevin Wolf wrote: > Add an option in BlockExportOptions to allow creating an export on an > inactive node without activating the node. This mode needs to be > explicitly supported by the export type (so that it doesn't perform any > operations that are forbidden for inactive nodes), so this patch alone > doesn't allow this option to be successfully used yet. > > Signed-off-by: Kevin Wolf > --- > qapi/block-export.json | 10 +- > include/block/block-global-state.h | 3 +++ > include/block/export.h | 3 +++ > block.c| 4 > block/export/export.c | 31 -- > 5 files changed, 40 insertions(+), 11 deletions(-) > > diff --git a/qapi/block-export.json b/qapi/block-export.json > index ce33fe378d..117b05d13c 100644 > --- a/qapi/block-export.json > +++ b/qapi/block-export.json > @@ -372,6 +372,13 @@ > # cannot be moved to the iothread. The default is false. > # (since: 5.2) > # > +# @allow-inactive: If true, the export allows the exported node to be > inactive. > +# If it is created for an inactive block node, the node remains > inactive. If > +# the export type doesn't support running on an inactive node, an error > is > +# returned. If false, inactive block nodes are automatically activated > before > +# creating the export and trying to inactivate them later fails. > +# (since: 10.0; default: false) Exposing activation in the API is ugly but I don't see a cleaner option given that we cannot change block-export-add's existing behavior of activating the node by default. :( Ideally block-export-add would not modify active/inactive and leave it up to user to provide a node in the desired state. Reviewed-by: Stefan Hajnoczi signature.asc Description: PGP signature
Re: [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls
On Mon, Feb 03, 2025 at 02:57:36PM +0100, Maciej S. Szmigiero wrote: > On 2.02.2025 13:45, Dr. David Alan Gilbert wrote: > > * Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote: > > > On 2.02.2025 03:06, Dr. David Alan Gilbert wrote: > > > > * Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote: > > > > > From: "Maciej S. Szmigiero" > > > > > > > > > > postcopy_ram_listen_thread() is a free running thread, so it needs to > > > > > take BQL around function calls to migration methods requiring BQL. > > > > > > > > > > qemu_loadvm_state_main() needs BQL held since it ultimately calls > > > > > "load_state" SaveVMHandlers. > > > > > > > > > > migration_incoming_state_destroy() needs BQL held since it ultimately > > > > > calls > > > > > "load_cleanup" SaveVMHandlers. > > > > > > > > > > Signed-off-by: Maciej S. Szmigiero > > > > > --- > > > > >migration/savevm.c | 4 > > > > >1 file changed, 4 insertions(+) > > > > > > > > > > diff --git a/migration/savevm.c b/migration/savevm.c > > > > > index b0b74140daea..0ceea9638cc1 100644 > > > > > --- a/migration/savevm.c > > > > > +++ b/migration/savevm.c > > > > > @@ -2013,7 +2013,9 @@ static void *postcopy_ram_listen_thread(void > > > > > *opaque) > > > > > * in qemu_file, and thus we must be blocking now. > > > > > */ > > > > >qemu_file_set_blocking(f, true); > > > > > +bql_lock(); > > > > >load_res = qemu_loadvm_state_main(f, mis); > > > > > +bql_unlock(); > > > > > > > > Doesn't that leave that held for a heck of a long time? > > > > > > Yes, and it effectively broke "postcopy recover" test but I > > > think the reason for that is qemu_loadvm_state_main() and > > > its children don't drop BQL while waiting for I/O. > > > > > > I've described this case in more detail in my reply to Fabiano here: > > > https://lore.kernel.org/qemu-devel/0a09e627-955e-4f26-8d08-0192ecd25...@maciej.szmigiero.name/ > > > > While it might be the cause in this case, my feeling is it's more > > fundamental > > here - it's the whole reason that postcopy has a separate ram listen > > thread. As the destination is running, after it loads it's devices > > and as it starts up the destination will be still loading RAM > > (and other postcopiable devices) potentially for quite a while. > > Holding the bql around the ram listen thread means that the > > execution of the destination won't be able to take that lock > > until the postcopy load has finished; so while that might apparently > > complete, it'll lead to the destination stalling until that's finished > > which defeats the whole point of postcopy. > > That last one probably won't fail a test but it will lead to a long stall > > if you give it a nice big guest with lots of RAM that it's rapidly > > changing. > > Okay, I understand the postcopy case/flow now. > Thanks for explaining it clearly. > > > > I still think that "load_state" SaveVMHandlers need to be called > > > with BQL held since implementations apparently expect it that way: > > > for example, I think PCI device configuration restore calls > > > address space manipulation methods which abort() if called > > > without BQL held. > > > > However, the only devices that *should* be arriving on the channel > > that the postcopy_ram_listen_thread is reading from are those > > that are postcopiable (i.e. RAM and hmm block's dirty_bitmap). > > Those load handlers are safe to be run while the other devices > > are being changed. Note the *should* - you could add a check > > to fail if any other device arrives on that channel. > > I think ultimately there should be either an explicit check, or, > as you suggest in the paragraph below, a separate SaveVMHandler > that runs without BQL held. To me those are bugs happening during postcopy, so those abort()s in memory.c are indeed for catching these issues too. > Since the current state of just running these SaveVMHandlers > without BQL in this case and hoping that nothing breaks is > clearly sub-optimal. > > > > I have previously even submitted a patch to explicitly document > > > "load_state" SaveVMHandler as requiring BQL (which was also > > > included in the previous version of this patch set) and it > > > received a "Reviewed-by:" tag: > > > https://lore.kernel.org/qemu-devel/6976f129df610c8207da4e531c8c0475ec204fa4.1730203967.git.maciej.szmigi...@oracle.com/ > > > https://lore.kernel.org/qemu-devel/e1949839932efaa531e2fe63ac13324e5787439c.1731773021.git.maciej.szmigi...@oracle.com/ > > > https://lore.kernel.org/qemu-devel/87o732bti7@suse.de/ > > > > It happens! > > You could make this safer by having a load_state and a load_state_postcopy > > member, and only mark the load_state as requiring the lock. > > To not digress too much from the subject of this patch set > (multifd VFIO device state transfer) for now I've just updated the > TODO comment around that qemu_loadvm_state_main(), so hopefully this > discussion won't get forgotten: > https://gitlab
Re: [PATCH v4 13/33] migration/multifd: Device state transfer support - receive side
On Thu, Jan 30, 2025 at 11:08:34AM +0100, Maciej S. Szmigiero wrote: > From: "Maciej S. Szmigiero" > > Add a basic support for receiving device state via multifd channels - > channels that are shared with RAM transfers. > > Depending whether MULTIFD_FLAG_DEVICE_STATE flag is present or not in the > packet header either device state (MultiFDPacketDeviceState_t) or RAM > data (existing MultiFDPacket_t) is read. > > The received device state data is provided to > qemu_loadvm_load_state_buffer() function for processing in the > device's load_state_buffer handler. > > Signed-off-by: Maciej S. Szmigiero I think I acked this one. You could keep my R-b if... [...] > diff --git a/migration/multifd.h b/migration/multifd.h > index 9e4baa066312..abf3acdcee40 100644 > --- a/migration/multifd.h > +++ b/migration/multifd.h > @@ -62,6 +62,12 @@ MultiFDRecvData *multifd_get_recv_data(void); > #define MULTIFD_FLAG_UADK (8 << 1) > #define MULTIFD_FLAG_QATZIP (16 << 1) > > +/* > + * If set it means that this packet contains device state > + * (MultiFDPacketDeviceState_t), not RAM data (MultiFDPacket_t). > + */ > +#define MULTIFD_FLAG_DEVICE_STATE (1 << 6) ... if this won't conflict with MULTIFD_FLAG_QATZIP. I think we should stick with one way to write it, then when rebase you can see such conflicts - either your patch uses 32 << 1, or perhaps we should start to switch to BIT() for all above instead.. -- Peter Xu
RE: [PATCH v5 13/17] aspeed/soc: Add AST2700 support
Hi Philippe, > From: Philippe Mathieu-Daudé > Sent: Tuesday, February 4, 2025 12:41 AM > To: Jamin Lin ; Cédric Le Goater ; > Peter Maydell ; Andrew Jeffery > ; Joel Stanley ; Alistair > Francis ; Cleber Rosa ; Wainer > dos Santos Moschetta ; Beraldo Leal > ; open list:ASPEED BMCs ; open > list:All patches CC here ; Jinjie Ruan > > Cc: Troy Lee ; Yunlin Tang > > Subject: Re: [PATCH v5 13/17] aspeed/soc: Add AST2700 support > > On 3/2/25 08:43, Jamin Lin wrote: > > Hi Philippe, > > > >> From: Jamin Lin > >> Sent: Monday, February 3, 2025 3:29 PM > >> To: Philippe Mathieu-Daudé ; Cédric Le Goater > >> ; Peter Maydell ; Andrew > >> Jeffery ; Joel Stanley ; > >> Alistair Francis ; Cleber Rosa > >> ; Wainer dos Santos Moschetta > >> ; Beraldo Leal ; open > >> list:ASPEED BMCs ; open list:All patches CC here > >> ; Jinjie Ruan > >> Cc: Troy Lee ; Yunlin Tang > >> > >> Subject: RE: [PATCH v5 13/17] aspeed/soc: Add AST2700 support > >> > >> Hi Philippe, > >> > >>> From: Philippe Mathieu-Daudé > >>> Sent: Thursday, January 30, 2025 11:14 PM > >>> To: Jamin Lin ; Cédric Le Goater > >>> ; Peter Maydell ; Andrew > >>> Jeffery ; Joel Stanley > >>> ; Alistair Francis ; Cleber > >>> Rosa ; Wainer dos Santos Moschetta > >> ; > >>> Beraldo Leal ; open list:ASPEED BMCs > >>> ; open list:All patches CC here > >>> ; Jinjie Ruan > >>> Cc: Troy Lee ; Yunlin Tang > >>> > >>> Subject: Re: [PATCH v5 13/17] aspeed/soc: Add AST2700 support > >>> > >>> Hi Jamin, > >>> > >>> On 4/6/24 07:44, Jamin Lin wrote: > Initial definitions for a simple machine using an AST2700 SOC > (Cortex-a35 > >>> CPU). > > AST2700 SOC and its interrupt controller are too complex to handle > in the common Aspeed SoC framework. We introduce a new ast2700 > class with instance_init and realize handlers. > > AST2700 is a 64 bits quad core cpus and support 8 watchdog. > Update maximum ASPEED_CPUS_NUM to 4 and ASPEED_WDTS_NUM to > 8. > In addition, update AspeedSocState to support scuio, sli, sliio and intc. > > Add TYPE_ASPEED27X0_SOC machine type. > > The SDMC controller is unlocked at SPL stage. > At present, only supports to emulate booting start from u-boot stage. > Set SDMC controller unlocked by default. > > In INTC, each interrupt of INT 128 to INT 136 combines 32 interrupts. > It connect GICINT IRQ GPIO-OUTPUT pins to GIC device with irq 128 to > 136. > And, if a device irq is 128 to 136, its irq GPIO-OUTPUT pin is > connected to GICINT or-gates instead of GIC device. > > Signed-off-by: Troy Lee > Signed-off-by: Jamin Lin > --- > hw/arm/aspeed_ast27x0.c | 563 > >>> > hw/arm/meson.build | 1 + > include/hw/arm/aspeed_soc.h | 28 +- > 3 files changed, 590 insertions(+), 2 deletions(-) > create mode 100644 hw/arm/aspeed_ast27x0.c > >>> > >>> > +static bool aspeed_soc_ast2700_gic_realize(DeviceState *dev, Error > +**errp) { > +Aspeed27x0SoCState *a = ASPEED27X0_SOC(dev); > +AspeedSoCState *s = ASPEED_SOC(dev); > +AspeedSoCClass *sc = ASPEED_SOC_GET_CLASS(s); > +SysBusDevice *gicbusdev; > +DeviceState *gicdev; > +QList *redist_region_count; > +int i; > + > +gicbusdev = SYS_BUS_DEVICE(&a->gic); > +gicdev = DEVICE(&a->gic); > +qdev_prop_set_uint32(gicdev, "revision", 3); > +qdev_prop_set_uint32(gicdev, "num-cpu", sc->num_cpus); > +qdev_prop_set_uint32(gicdev, "num-irq", AST2700_MAX_IRQ); > + > +redist_region_count = qlist_new(); > +qlist_append_int(redist_region_count, sc->num_cpus); > +qdev_prop_set_array(gicdev, "redist-region-count", > + redist_region_count); > + > +if (!sysbus_realize(gicbusdev, errp)) { > +return false; > +} > +sysbus_mmio_map(gicbusdev, 0, > sc->memmap[ASPEED_GIC_DIST]); > +sysbus_mmio_map(gicbusdev, 1, > >> sc->memmap[ASPEED_GIC_REDIST]); > + > +for (i = 0; i < sc->num_cpus; i++) { > +DeviceState *cpudev = DEVICE(&a->cpu[i]); > +int NUM_IRQS = 256, ARCH_GIC_MAINT_IRQ = 9, > >>> VIRTUAL_PMU_IRQ = 7; > +int ppibase = NUM_IRQS + i * GIC_INTERNAL + GIC_NR_SGIS; > + > +const int timer_irq[] = { > +[GTIMER_PHYS] = 14, > +[GTIMER_VIRT] = 11, > +[GTIMER_HYP] = 10, > +[GTIMER_SEC] = 13, > +}; > +int j; > + > +for (j = 0; j < ARRAY_SIZE(timer_irq); j++) { > +qdev_connect_gpio_out(cpudev, j, > +qdev_get_gpio_in(gicdev, ppibase + > >> timer_irq[j])); > +} > + > +qemu_irq irq = qdev_get_gpio_in(gicdev, > +ppibase + > >>
Re: [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls
On 3.02.2025 21:36, Peter Xu wrote: On Mon, Feb 03, 2025 at 09:15:52PM +0100, Maciej S. Szmigiero wrote: On 3.02.2025 20:58, Peter Xu wrote: On Mon, Feb 03, 2025 at 02:57:36PM +0100, Maciej S. Szmigiero wrote: On 2.02.2025 13:45, Dr. David Alan Gilbert wrote: * Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote: On 2.02.2025 03:06, Dr. David Alan Gilbert wrote: * Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote: From: "Maciej S. Szmigiero" postcopy_ram_listen_thread() is a free running thread, so it needs to take BQL around function calls to migration methods requiring BQL. qemu_loadvm_state_main() needs BQL held since it ultimately calls "load_state" SaveVMHandlers. migration_incoming_state_destroy() needs BQL held since it ultimately calls "load_cleanup" SaveVMHandlers. Signed-off-by: Maciej S. Szmigiero --- migration/savevm.c | 4 1 file changed, 4 insertions(+) diff --git a/migration/savevm.c b/migration/savevm.c index b0b74140daea..0ceea9638cc1 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -2013,7 +2013,9 @@ static void *postcopy_ram_listen_thread(void *opaque) * in qemu_file, and thus we must be blocking now. */ qemu_file_set_blocking(f, true); +bql_lock(); load_res = qemu_loadvm_state_main(f, mis); +bql_unlock(); Doesn't that leave that held for a heck of a long time? Yes, and it effectively broke "postcopy recover" test but I think the reason for that is qemu_loadvm_state_main() and its children don't drop BQL while waiting for I/O. I've described this case in more detail in my reply to Fabiano here: https://lore.kernel.org/qemu-devel/0a09e627-955e-4f26-8d08-0192ecd25...@maciej.szmigiero.name/ While it might be the cause in this case, my feeling is it's more fundamental here - it's the whole reason that postcopy has a separate ram listen thread. As the destination is running, after it loads it's devices and as it starts up the destination will be still loading RAM (and other postcopiable devices) potentially for quite a while. Holding the bql around the ram listen thread means that the execution of the destination won't be able to take that lock until the postcopy load has finished; so while that might apparently complete, it'll lead to the destination stalling until that's finished which defeats the whole point of postcopy. That last one probably won't fail a test but it will lead to a long stall if you give it a nice big guest with lots of RAM that it's rapidly changing. Okay, I understand the postcopy case/flow now. Thanks for explaining it clearly. I still think that "load_state" SaveVMHandlers need to be called with BQL held since implementations apparently expect it that way: for example, I think PCI device configuration restore calls address space manipulation methods which abort() if called without BQL held. However, the only devices that *should* be arriving on the channel that the postcopy_ram_listen_thread is reading from are those that are postcopiable (i.e. RAM and hmm block's dirty_bitmap). Those load handlers are safe to be run while the other devices are being changed. Note the *should* - you could add a check to fail if any other device arrives on that channel. I think ultimately there should be either an explicit check, or, as you suggest in the paragraph below, a separate SaveVMHandler that runs without BQL held. To me those are bugs happening during postcopy, so those abort()s in memory.c are indeed for catching these issues too. Since the current state of just running these SaveVMHandlers without BQL in this case and hoping that nothing breaks is clearly sub-optimal. I have previously even submitted a patch to explicitly document "load_state" SaveVMHandler as requiring BQL (which was also included in the previous version of this patch set) and it received a "Reviewed-by:" tag: https://lore.kernel.org/qemu-devel/6976f129df610c8207da4e531c8c0475ec204fa4.1730203967.git.maciej.szmigi...@oracle.com/ https://lore.kernel.org/qemu-devel/e1949839932efaa531e2fe63ac13324e5787439c.1731773021.git.maciej.szmigi...@oracle.com/ https://lore.kernel.org/qemu-devel/87o732bti7@suse.de/ It happens! You could make this safer by having a load_state and a load_state_postcopy member, and only mark the load_state as requiring the lock. To not digress too much from the subject of this patch set (multifd VFIO device state transfer) for now I've just updated the TODO comment around that qemu_loadvm_state_main(), so hopefully this discussion won't get forgotten: https://gitlab.com/maciejsszmigiero/qemu/-/commit/046e3deac5b1dbc406b3e9571f62468bd6743e79 The commit message may still need some touch ups, e.g.: postcopy_ram_listen_thread() is a free running thread, so it needs to take BQL around function calls to migration methods requiring BQL. This sentence is still not correct, IMHO. As Dave explained, the ram load thread is designed to run without BQL at least for t
Re: [PATCH v4 08/33] migration/multifd: Allow premature EOF on TLS incoming channels
On 3.02.2025 21:20, Peter Xu wrote: On Mon, Feb 03, 2025 at 07:53:00PM +0100, Maciej S. Szmigiero wrote: On 3.02.2025 19:20, Peter Xu wrote: On Thu, Jan 30, 2025 at 11:08:29AM +0100, Maciej S. Szmigiero wrote: From: "Maciej S. Szmigiero" Multifd send channels are terminated by calling qio_channel_shutdown(QIO_CHANNEL_SHUTDOWN_BOTH) in multifd_send_terminate_threads(), which in the TLS case essentially calls shutdown(SHUT_RDWR) on the underlying raw socket. Unfortunately, this does not terminate the TLS session properly and the receive side sees this as a GNUTLS_E_PREMATURE_TERMINATION error. The only reason why this wasn't causing migration failures is because the current migration code apparently does not check for migration error being set after the end of the multifd receive process. However, this will change soon so the multifd receive code has to be prepared to not return an error on such premature TLS session EOF. Use the newly introduced QIOChannelTLS method for that. It's worth noting that even if the sender were to be changed to terminate the TLS connection properly the receive side still needs to remain compatible with older QEMU bit stream which does not do this. If this is an existing bug, we could add a Fixes. It is an existing issue but only uncovered by this patch set. As far as I can see it was always there, so it would need some thought where to point that Fixes tag. If there's no way to trigger a real functional bug anyway, it's also ok we omit the Fixes. Two pure questions.. - What is the correct way to terminate the TLS session without this flag? I guess one would need to call gnutls_bye() like in this GnuTLS example: https://gitlab.com/gnutls/gnutls/-/blob/2b8c3e4c71ad380bbbffb32e6003b34ecad596e3/doc/examples/ex-client-anon.c#L102 - Why this is only needed by multifd sessions? What uncovered the issue was switching the load threads to using migrate_set_error() instead of their own result variable (load_threads_ret) which you had requested during the previous patch set version review: https://lore.kernel.org/qemu-devel/Z1DbH5fwBaxtgrvH@x1n/ Turns out that the multifd receive code always returned error in the TLS case, just nothing was previously checking for that error presence. What I was curious is whether this issue also exists for the main migration channel when with tls, especially when e.g. multifd not enabled at all. As I don't see anywhere that qemu uses gnutls_bye() for any tls session. I think it's a good to find that we overlooked this before.. and IMHO it's always good we could fix this. Does it mean we need proper gnutls_bye() somewhere? If we need an explicit gnutls_bye(), then I wonder if that should be done on the main channel as well. That's a good question and looking at the code qemu_loadvm_state_main() exits on receiving "QEMU_VM_EOF" section (that's different from receiving socket EOF) and then optionally "QEMU_VM_VMDESCRIPTION" section is read with explicit size in qemu_loadvm_state() - so still not until channel EOF. Then I can't see anything else reading the channel until it is closed in migration_incoming_state_destroy(). So most likely the main migration channel will never read far enough to reach that GNUTLS_E_PREMATURE_TERMINATION error. If we don't need gnutls_bye(), then should we always ignore pre-mature termination of tls no matter if it's multifd or non-multifd channel (or even a tls session that is not migration-related)? So basically have this patch extended to calling qio_channel_tls_set_premature_eof_okay() also on the main migration channel? Thanks, Thanks, Maciej
Re: [PATCH v4 16/33] migration/multifd: Device state transfer support - send side
On Thu, Jan 30, 2025 at 11:08:37AM +0100, Maciej S. Szmigiero wrote: > From: "Maciej S. Szmigiero" > > A new function multifd_queue_device_state() is provided for device to queue > its state for transmission via a multifd channel. > > Signed-off-by: Maciej S. Szmigiero Reviewed-by: Peter Xu -- Peter Xu
Re: [PATCH v4 10/33] error: define g_autoptr() cleanup function for the Error type
On 3.02.2025 22:13, Daniel P. Berrangé wrote: On Thu, Jan 30, 2025 at 11:08:31AM +0100, Maciej S. Szmigiero wrote: From: "Maciej S. Szmigiero" Automatic memory management helps avoid memory safety issues. Signed-off-by: Maciej S. Szmigiero --- include/qapi/error.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/qapi/error.h b/include/qapi/error.h index 71f8fb2c50ee..649ec8f1b6a2 100644 --- a/include/qapi/error.h +++ b/include/qapi/error.h @@ -437,6 +437,8 @@ Error *error_copy(const Error *err);> q */ void error_free(Error *err); +G_DEFINE_AUTOPTR_CLEANUP_FUNC(Error, error_free) + This has been rejected by Markus in the past when I proposed. See the rationale at the time here: https://lists.nongnu.org/archive/html/qemu-devel/2024-07/msg05503.html Thanks for the pointer, I wasn't expecting this change to be controversial. If you want this, the commit message will need to explain the use case and justify why the existing error usage patterns are insufficient. In this case it's about giving received Error to migrate_set_error() which does *not* take ownership of it. And the reason why migrate_set_error() does not take ownership of incoming Error is that it might have an Error already set in MigrationState, in this case it simply ignores the passed Error (almost like being a NOP in this case). I don't know whether this is enough of a justification for introducing g_autoptr(Error). I'm happy to drop this commit and change it to manual memory management instead if it is not. @Markus, what's your opinion here? With regards, Daniel Thanks, Maciej
Re: [PATCH v4 08/33] migration/multifd: Allow premature EOF on TLS incoming channels
On Mon, Feb 03, 2025 at 07:53:00PM +0100, Maciej S. Szmigiero wrote: > On 3.02.2025 19:20, Peter Xu wrote: > > On Thu, Jan 30, 2025 at 11:08:29AM +0100, Maciej S. Szmigiero wrote: > > > From: "Maciej S. Szmigiero" > > > > > > Multifd send channels are terminated by calling > > > qio_channel_shutdown(QIO_CHANNEL_SHUTDOWN_BOTH) in > > > multifd_send_terminate_threads(), which in the TLS case essentially > > > calls shutdown(SHUT_RDWR) on the underlying raw socket. > > > > > > Unfortunately, this does not terminate the TLS session properly and > > > the receive side sees this as a GNUTLS_E_PREMATURE_TERMINATION error. > > > > > > The only reason why this wasn't causing migration failures is because > > > the current migration code apparently does not check for migration > > > error being set after the end of the multifd receive process. > > > > > > However, this will change soon so the multifd receive code has to be > > > prepared to not return an error on such premature TLS session EOF. > > > Use the newly introduced QIOChannelTLS method for that. > > > > > > It's worth noting that even if the sender were to be changed to terminate > > > the TLS connection properly the receive side still needs to remain > > > compatible with older QEMU bit stream which does not do this. > > > > If this is an existing bug, we could add a Fixes. > > It is an existing issue but only uncovered by this patch set. > > As far as I can see it was always there, so it would need some > thought where to point that Fixes tag. If there's no way to trigger a real functional bug anyway, it's also ok we omit the Fixes. > > Two pure questions.. > > > >- What is the correct way to terminate the TLS session without this flag? > > I guess one would need to call gnutls_bye() like in this GnuTLS example: > https://gitlab.com/gnutls/gnutls/-/blob/2b8c3e4c71ad380bbbffb32e6003b34ecad596e3/doc/examples/ex-client-anon.c#L102 > > >- Why this is only needed by multifd sessions? > > What uncovered the issue was switching the load threads to using > migrate_set_error() instead of their own result variable > (load_threads_ret) which you had requested during the previous > patch set version review: > https://lore.kernel.org/qemu-devel/Z1DbH5fwBaxtgrvH@x1n/ > > Turns out that the multifd receive code always returned > error in the TLS case, just nothing was previously checking for > that error presence. What I was curious is whether this issue also exists for the main migration channel when with tls, especially when e.g. multifd not enabled at all. As I don't see anywhere that qemu uses gnutls_bye() for any tls session. I think it's a good to find that we overlooked this before.. and IMHO it's always good we could fix this. Does it mean we need proper gnutls_bye() somewhere? If we need an explicit gnutls_bye(), then I wonder if that should be done on the main channel as well. If we don't need gnutls_bye(), then should we always ignore pre-mature termination of tls no matter if it's multifd or non-multifd channel (or even a tls session that is not migration-related)? Thanks, > > Another option would be to simply return to using > load_threads_ret like the previous versions did and not > experiment with touching global migration state because > as we can see other places can unintentionally break. > > If we go this route then these TLS EOF patches could be > dropped. > > > Thanks, > > > > Thanks, > Maciej > -- Peter Xu
Re: [PATCH v2 14/15] iotests: Add qsd-migrate case
Am 03.02.2025 um 20:35 hat Eric Blake geschrieben: > On Fri, Jan 31, 2025 at 10:50:50AM +0100, Kevin Wolf wrote: > > Test that it's possible to migrate a VM that uses an image on shared > > storage through qemu-storage-daemon. > > > > Signed-off-by: Kevin Wolf > > --- > > tests/qemu-iotests/tests/qsd-migrate | 132 +++ > > tests/qemu-iotests/tests/qsd-migrate.out | 51 + > > 2 files changed, 183 insertions(+) > > create mode 100755 tests/qemu-iotests/tests/qsd-migrate > > create mode 100644 tests/qemu-iotests/tests/qsd-migrate.out > > > > diff --git a/tests/qemu-iotests/tests/qsd-migrate > > b/tests/qemu-iotests/tests/qsd-migrate > > new file mode 100755 > > index 00..687bda6f93 > > --- /dev/null > > +++ b/tests/qemu-iotests/tests/qsd-migrate > > @@ -0,0 +1,132 @@ > > +#!/usr/bin/env python3 > > +# group: rw quick > > > + > > +with iotests.FilePath('disk.img') as path, \ > > + iotests.FilePath('nbd-src.sock', base_dir=iotests.sock_dir) as > > nbd_src, \ > > + iotests.FilePath('nbd-dst.sock', base_dir=iotests.sock_dir) as > > nbd_dst, \ > > + iotests.FilePath('migrate.sock', base_dir=iotests.sock_dir) as > > mig_sock, \ > > + iotests.VM(path_suffix="-src") as vm_src, \ > > + iotests.VM(path_suffix="-dst") as vm_dst: > > + > > > + > > +iotests.log('\nTest I/O on the source') > > +vm_src.hmp_qemu_io('virtio0/virtio-backend', 'write -P 0x11 0 4k', > > + use_log=True, qdev=True) > > +vm_src.hmp_qemu_io('virtio0/virtio-backend', 'read -P 0x11 0 4k', > > + use_log=True, qdev=True) > > + > > +iotests.log('\nStarting migration...') > > > Is it worth adding a test that qemu_io fails to write on the > destination while it is inactive (to ensure we are properly rejecting > modification of an inactive image)? The problem with that is that the failure mode for qemu_io (which acts as if it were a device, not an external interface) is an assertion failure. The other test (in patch 15) tests writes on the NBD export, which fails gracefully. > > + > > +mig_caps = [ > > +{'capability': 'events', 'state': True}, > > +{'capability': 'pause-before-switchover', 'state': True}, > > +] > > +vm_src.qmp_log('migrate-set-capabilities', capabilities=mig_caps) > > +vm_dst.qmp_log('migrate-set-capabilities', capabilities=mig_caps) > > +vm_src.qmp_log('migrate', uri=f'unix:{mig_sock}', > > + filters=[iotests.filter_qmp_testfiles]) > > + > > +vm_src.event_wait('MIGRATION', > > + match={'data': {'status': 'pre-switchover'}}) > > + > > +iotests.log('\nPre-switchover: Reconfigure QSD instances') > > + > > +iotests.log(qsd_src.qmp('blockdev-set-active', {'active': False})) > > +iotests.log(qsd_dst.qmp('blockdev-set-active', {'active': True})) > > Also, should you attempt a read on both src and dst while both sides > are inactive, to prove that reads can take a snapshot in the middle of > the handover? I think this could be done without any problems. Kevin
Re: [PATCH v4 10/33] error: define g_autoptr() cleanup function for the Error type
On Thu, Jan 30, 2025 at 11:08:31AM +0100, Maciej S. Szmigiero wrote: > From: "Maciej S. Szmigiero" > > Automatic memory management helps avoid memory safety issues. > > Signed-off-by: Maciej S. Szmigiero > --- > include/qapi/error.h | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/include/qapi/error.h b/include/qapi/error.h > index 71f8fb2c50ee..649ec8f1b6a2 100644 > --- a/include/qapi/error.h > +++ b/include/qapi/error.h > @@ -437,6 +437,8 @@ Error *error_copy(const Error *err);> q */ > void error_free(Error *err); > > +G_DEFINE_AUTOPTR_CLEANUP_FUNC(Error, error_free) > + This has been rejected by Markus in the past when I proposed. See the rationale at the time here: https://lists.nongnu.org/archive/html/qemu-devel/2024-07/msg05503.html If you want this, the commit message will need to explain the use case and justify why the existing error usage patterns are insufficient. With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [PATCH V1 05/26] vfio/container: preserve descriptors
On 2/3/2025 12:48 PM, Cédric Le Goater wrote: On 1/29/25 15:43, Steve Sistare wrote: At vfio creation time, save the value of vfio container, group, and device descriptors in CPR state. On qemu restart, vfio_realize() finds and uses the saved descriptors, and remembers the reused status for subsequent patches. The reused status is cleared when vmstate load finishes. During reuse, device and iommu state is already configured, so operations in vfio_realize that would modify the configuration, such as vfio ioctl's, are skipped. The result is that vfio_realize constructs qemu data structures that reflect the current state of the device. Signed-off-by: Steve Sistare --- hw/vfio/container.c | 105 ++ hw/vfio/cpr-legacy.c | 17 +++ include/hw/vfio/vfio-common.h | 2 + 3 files changed, 105 insertions(+), 19 deletions(-) diff --git a/hw/vfio/container.c b/hw/vfio/container.c index a90ce6c..81d0ccc 100644 --- a/hw/vfio/container.c +++ b/hw/vfio/container.c @@ -31,6 +31,7 @@ #include "system/reset.h" #include "trace.h" #include "qapi/error.h" +#include "migration/cpr.h" #include "pci.h" VFIOGroupList vfio_group_list = @@ -415,12 +416,28 @@ static bool vfio_set_iommu(int container_fd, int group_fd, } static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group, - Error **errp) + bool reused, Error **errp) Please rename 'reused' to 'cpr_reused'. We should know what this parameter is for and I don't see any other use than CPR. Hi Cedric, glad to virtually meet you, and thanks for reviewing this. There is no other notion of "reused" in qemu -- CPR is the first to introduce it. Thus "reused" is unambiguous, it always refers to CPR. IMO shorter names without underscores make the code more readable, as long as they are unambiguous. Also, the "reused" identifier already appears in the initial series for cpr-transfer, and to switch now to a different identifier leaves us with two names for the same functionality. Right now I can cscope "reused" and find everything. For those reasons, I prefer reused, but if you feel strongly, I will rename it. { int iommu_type; const char *vioc_name; VFIOContainer *container; + /* + * If container is reused, just set its type and skip the ioctls, as the + * container and group are already configured in the kernel. + * VFIO_TYPE1v2_IOMMU is the only type that supports reuse/cpr. + */ + if (reused) { + if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) { + iommu_type = VFIO_TYPE1v2_IOMMU; + goto skip_iommu; + } else { + error_setg(errp, "container was reused but VFIO_TYPE1v2_IOMMU " + "is not supported"); + return NULL; + } + } + Can we use 'iommu_type' below instead and avoid VFIO_CHECK_EXTENSION ioctl ? and then set the iommu unless CPR reused is set. Sure, I'll mke that change. iommu_type = vfio_get_iommu_type(fd, errp); if (iommu_type < 0) { return NULL; @@ -430,10 +447,12 @@ static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group, return NULL; } +skip_iommu: I think we can avoid this 'skip_iommu' label with some minor refactoring. vioc_name = vfio_get_iommu_class_name(iommu_type); container = VFIO_IOMMU_LEGACY(object_new(vioc_name)); container->fd = fd; + container->reused = reused; container->iommu_type = iommu_type; return container; } @@ -543,10 +562,13 @@ static bool vfio_connect_container(VFIOGroup *group, AddressSpace *as, VFIOContainer *container; VFIOContainerBase *bcontainer; int ret, fd; + bool reused; cpr_reused. VFIOAddressSpace *space; VFIOIOMMUClass *vioc; space = vfio_get_address_space(as); + fd = cpr_find_fd("vfio_container_for_group", group->groupid); + reused = (fd > 0); hmm, so we are deducing from the existence of a CprFd state element that we are doing a live update of the VM. This seems to me to be a somewhat quick heuristic. Isn't there a global helper ? Isn't the VM aware that it's being restarted after a live update ? I am not familiar with the CPR sequence. There is a global mode that can be checked, but we would still need to fetch the fd. Checking the fd alone yields tighter code. It also seems perfectly logical to me when reading the code. Can't find the cpr fd? Then we are not doing cpr. BTW, it is not heuristic. The cpr fd exists at creation time iff we are doing cpr. /* * VFIO is currently incompatible with discarding of RAM insofar as the @@ -579,28 +601,52 @@ static bool vfio_connect_container(VFIOGroup *group, AddressSpace *as, * details once we know which type of IOMMU we are using. */ + /* + * If the container is reused,
Re: [PATCH V1 04/26] vfio/container: register container for cpr
On 2/3/2025 12:01 PM, Cédric Le Goater wrote: On 1/29/25 15:43, Steve Sistare wrote: Register a legacy container for cpr-transfer. Add a blocker if the kernel does not support VFIO_UPDATE_VADDR or VFIO_UNMAP_ALL. This is mostly boiler plate. The fields to to saved and restored are added in subsequent patches. Signed-off-by: Steve Sistare --- hw/vfio/container.c | 6 ++-- hw/vfio/cpr-legacy.c | 68 +++ hw/vfio/meson.build | 3 +- include/hw/vfio/vfio-common.h | 3 ++ 4 files changed, 76 insertions(+), 4 deletions(-) create mode 100644 hw/vfio/cpr-legacy.c diff --git a/hw/vfio/container.c b/hw/vfio/container.c index 4ebb526..a90ce6c 100644 --- a/hw/vfio/container.c +++ b/hw/vfio/container.c @@ -618,7 +618,7 @@ static bool vfio_connect_container(VFIOGroup *group, AddressSpace *as, } bcontainer = &container->bcontainer; - if (!vfio_cpr_register_container(bcontainer, errp)) { + if (!vfio_legacy_cpr_register_container(container, errp)) { goto free_container_exit; } @@ -666,7 +666,7 @@ enable_discards_exit: vfio_ram_block_discard_disable(container, false); unregister_container_exit: - vfio_cpr_unregister_container(bcontainer); + vfio_legacy_cpr_unregister_container(container); free_container_exit: object_unref(container); @@ -710,7 +710,7 @@ static void vfio_disconnect_container(VFIOGroup *group) VFIOAddressSpace *space = bcontainer->space; trace_vfio_disconnect_container(container->fd); - vfio_cpr_unregister_container(bcontainer); + vfio_legacy_cpr_unregister_container(container); close(container->fd); object_unref(container); diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c new file mode 100644 index 000..d3bbc05 --- /dev/null +++ b/hw/vfio/cpr-legacy.c @@ -0,0 +1,68 @@ +/* + * Copyright (c) 2021-2025 Oracle and/or its affiliates. + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include +#include "qemu/osdep.h" +#include "hw/vfio/vfio-common.h" +#include "migration/blocker.h" +#include "migration/cpr.h" +#include "migration/migration.h" +#include "migration/vmstate.h" +#include "qapi/error.h" + +static bool vfio_cpr_supported(VFIOContainer *container, Error **errp) +{ + if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UPDATE_VADDR)) { + error_setg(errp, "VFIO container does not support VFIO_UPDATE_VADDR"); + return false; + + } else if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UNMAP_ALL)) { + error_setg(errp, "VFIO container does not support VFIO_UNMAP_ALL"); + return false; + + } else { + return true; + } +} + +static const VMStateDescription vfio_container_vmstate = { + .name = "vfio-container", + .version_id = 0, + .minimum_version_id = 0, + .needed = cpr_needed_for_reuse, + .fields = (VMStateField[]) { + VMSTATE_END_OF_LIST() + } +}; + +bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp) +{ + VFIOContainerBase *bcontainer = &container->bcontainer; + Error **cpr_blocker = &container->cpr_blocker; + + if (!vfio_cpr_register_container(bcontainer, errp)) { + return false; + } + + if (!vfio_cpr_supported(container, cpr_blocker)) { + return migrate_add_blocker_modes(cpr_blocker, errp, + MIG_MODE_CPR_TRANSFER, -1) == 0; + } + + vmstate_register(NULL, -1, &vfio_container_vmstate, container); + + return true; +} + +void vfio_legacy_cpr_unregister_container(VFIOContainer *container) +{ + VFIOContainerBase *bcontainer = &container->bcontainer; + + vfio_cpr_unregister_container(bcontainer); + migrate_del_blocker(&container->cpr_blocker); + vmstate_unregister(NULL, &vfio_container_vmstate, container); +} diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build index bba776f..5487815 100644 --- a/hw/vfio/meson.build +++ b/hw/vfio/meson.build @@ -5,13 +5,14 @@ vfio_ss.add(files( 'container-base.c', 'container.c', 'migration.c', - 'cpr.c', )) vfio_ss.add(when: 'CONFIG_PSERIES', if_true: files('spapr.c')) vfio_ss.add(when: 'CONFIG_IOMMUFD', if_true: files( 'iommufd.c', )) vfio_ss.add(when: 'CONFIG_VFIO_PCI', if_true: files( + 'cpr.c', + 'cpr-legacy.c', 'display.c', 'pci-quirks.c', 'pci.c', diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index 0c60be5..53e554f 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -84,6 +84,7 @@ typedef struct VFIOContainer { VFIOContainerBase bcontainer; int fd; /* /dev/vfio/vfio, empowered by the attached groups */ unsigned iommu_type; + Error *cpr_blocker; QLIST_HEAD(, VFIOGroup) group_list; } VFIOContainer; @@ -258,6 +259,8 @@ int vfio_kvm_device_del_fd(int fd, Error
[PATCH 0/2] nbd: Allow debugging tuning of handshake limit
Reviving a patch that has been sitting in my tree for a while. It's mostly useful for low-level integration testing (such as debugging libnbd as an NBD client). Eric Blake (2): qemu-nbd: Allow users to adjust handshake limit nbd/server: Allow users to adjust handshake limit in QMP docs/tools/qemu-nbd.rst| 5 + qapi/block-export.json | 10 + include/block/nbd.h| 6 ++--- block/monitor/block-hmp-cmds.c | 4 ++-- blockdev-nbd.c | 26 ++--- qemu-nbd.c | 41 +- 6 files changed, 64 insertions(+), 28 deletions(-) -- 2.48.1
[PATCH 2/2] nbd/server: Allow users to adjust handshake limit in QMP
Although defaulting the handshake limit to 10 seconds was a nice QoI change to weed out intentionally slow clients, it can interfere with integration testing done with manual NBD_OPT commands over 'nbdsh --opt-mode'. Expose a QMP knob 'handshake-max-secs' to allow the user to alter the timeout away from the default. The parameter name here intentionally matches the spelling of the constant added in commit fb1c2aaa98, and not the command-line spelling added in the previous patch for qemu-nbd; that's because in QMP, longer names serve as good self-documentation, and unlike the command line, machines don't have problems generating longer spellings. Signed-off-by: Eric Blake --- qapi/block-export.json | 10 ++ include/block/nbd.h| 6 +++--- block/monitor/block-hmp-cmds.c | 4 ++-- blockdev-nbd.c | 26 ++ 4 files changed, 33 insertions(+), 13 deletions(-) diff --git a/qapi/block-export.json b/qapi/block-export.json index ce33fe378df..58ae6a5e1d7 100644 --- a/qapi/block-export.json +++ b/qapi/block-export.json @@ -17,6 +17,10 @@ # # @addr: Address on which to listen. # +# @handshake-max-secs: Time limit, in seconds, at which a client that +# has not completed the negotiation handshake will be disconnected, +# or 0 for no limit (since 10.0; default: 10). +# # @tls-creds: ID of the TLS credentials object (since 2.6). # # @tls-authz: ID of the QAuthZ authorization object used to validate @@ -34,6 +38,7 @@ ## { 'struct': 'NbdServerOptions', 'data': { 'addr': 'SocketAddress', +'*handshake-max-secs': 'uint32', '*tls-creds': 'str', '*tls-authz': 'str', '*max-connections': 'uint32' } } @@ -52,6 +57,10 @@ # # @addr: Address on which to listen. # +# @handshake-max-secs: Time limit, in seconds, at which a client that +# has not completed the negotiation handshake will be disconnected, +# or 0 for no limit (since 10.0; default: 10). +# # @tls-creds: ID of the TLS credentials object (since 2.6). # # @tls-authz: ID of the QAuthZ authorization object used to validate @@ -72,6 +81,7 @@ ## { 'command': 'nbd-server-start', 'data': { 'addr': 'SocketAddressLegacy', +'*handshake-max-secs': 'uint32', '*tls-creds': 'str', '*tls-authz': 'str', '*max-connections': 'uint32' }, diff --git a/include/block/nbd.h b/include/block/nbd.h index d4f8b21aecc..92987c76fd6 100644 --- a/include/block/nbd.h +++ b/include/block/nbd.h @@ -428,9 +428,9 @@ void nbd_client_put(NBDClient *client); void nbd_server_is_qemu_nbd(int max_connections); bool nbd_server_is_running(void); int nbd_server_max_connections(void); -void nbd_server_start(SocketAddress *addr, const char *tls_creds, - const char *tls_authz, uint32_t max_connections, - Error **errp); +void nbd_server_start(SocketAddress *addr, uint32_t handshake_max_secs, + const char *tls_creds, const char *tls_authz, + uint32_t max_connections, Error **errp); void nbd_server_start_options(NbdServerOptions *arg, Error **errp); /* nbd_read diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c index 1d312513fc4..0cfcbfe7c21 100644 --- a/block/monitor/block-hmp-cmds.c +++ b/block/monitor/block-hmp-cmds.c @@ -402,8 +402,8 @@ void hmp_nbd_server_start(Monitor *mon, const QDict *qdict) goto exit; } -nbd_server_start(addr, NULL, NULL, NBD_DEFAULT_MAX_CONNECTIONS, - &local_err); +nbd_server_start(addr, NBD_DEFAULT_HANDSHAKE_MAX_SECS, NULL, NULL, + NBD_DEFAULT_MAX_CONNECTIONS, &local_err); qapi_free_SocketAddress(addr); if (local_err != NULL) { goto exit; diff --git a/blockdev-nbd.c b/blockdev-nbd.c index 9e61fbaf2b2..e9f53e83d48 100644 --- a/blockdev-nbd.c +++ b/blockdev-nbd.c @@ -28,6 +28,7 @@ typedef struct NBDConn { typedef struct NBDServerData { QIONetListener *listener; +uint32_t handshake_max_secs; QCryptoTLSCreds *tlscreds; char *tlsauthz; uint32_t max_connections; @@ -84,8 +85,7 @@ static void nbd_accept(QIONetListener *listener, QIOChannelSocket *cioc, nbd_update_server_watch(nbd_server); qio_channel_set_name(QIO_CHANNEL(cioc), "nbd-server"); -/* TODO - expose handshake timeout as QMP option */ -nbd_client_new(cioc, NBD_DEFAULT_HANDSHAKE_MAX_SECS, +nbd_client_new(cioc, nbd_server->handshake_max_secs, nbd_server->tlscreds, nbd_server->tlsauthz, nbd_blockdev_client_closed, conn); } @@ -162,9 +162,9 @@ static QCryptoTLSCreds *nbd_get_tls_creds(const char *id, Error **errp) } -void nbd_server_start(SocketAddress *addr, const char *tls_creds, - const char *tls_authz, uint32_t max_connections, - Error **errp) +void nbd_server_start(SocketAddress *addr, uint32_t handshake_max
Re: [PATCH V1 06/26] vfio/container: preserve DMA mappings
On 2/3/2025 1:25 PM, Cédric Le Goater wrote: On 1/29/25 15:43, Steve Sistare wrote: Preserve DMA mappings during cpr-transfer. In the container pre_save handler, suspend the use of virtual addresses in DMA mappings with VFIO_DMA_UNMAP_FLAG_VADDR, because guest RAM will be remapped at a different VA after exec. DMA to already-mapped pages continues. Because the vaddr is temporarily invalid, mediated devices cannot be supported, so add a blocker for them. This restriction will not apply to iommufd containers when CPR is added for them in a future patch. In new QEMU, do not register the memory listener at device creation time. Register it later, in the container post_load handler, after all vmstate that may affect regions and mapping boundaries has been loaded. The post_load registration will cause the listener to invoke its callback on each flat section, and the calls will match the mappings remembered by the kernel. Modify vfio_dma_map (which is called by the listener) to pass the new VA to the kernel using VFIO_DMA_MAP_FLAG_VADDR. Signed-off-by: Steve Sistare --- hw/vfio/container.c | 44 +++ hw/vfio/cpr-legacy.c | 32 +++ include/hw/vfio/vfio-common.h | 3 +++ 3 files changed, 75 insertions(+), 4 deletions(-) diff --git a/hw/vfio/container.c b/hw/vfio/container.c index 81d0ccc..2b5125e 100644 --- a/hw/vfio/container.c +++ b/hw/vfio/container.c @@ -32,6 +32,7 @@ #include "trace.h" #include "qapi/error.h" #include "migration/cpr.h" +#include "migration/blocker.h" #include "pci.h" VFIOGroupList vfio_group_list = @@ -132,6 +133,8 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer, int ret; Error *local_err = NULL; + assert(!container->reused); + if (iotlb && vfio_devices_all_dirty_tracking_started(bcontainer)) { if (!vfio_devices_all_device_dirty_tracking(bcontainer) && bcontainer->dirty_pages_supported) { @@ -183,12 +186,24 @@ static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova, bcontainer); struct vfio_iommu_type1_dma_map map = { .argsz = sizeof(map), - .flags = VFIO_DMA_MAP_FLAG_READ, .vaddr = (__u64)(uintptr_t)vaddr, .iova = iova, .size = size, }; + /* + * Set the new vaddr for any mappings registered during cpr load. + * Reused is cleared thereafter. + */ + if (container->reused) { + map.flags = VFIO_DMA_MAP_FLAG_VADDR; + if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map)) { + goto fail; + } + return 0; + } This is a bit ugly. When reaching routine vfio_attach_device(), could we detect that CPR is in progress and replace the 'VFIOIOMMUClass *' temporarily with a set of CPR specific handlers ? Good idea, I'll try it. I wrote this code years ago before the dma map and unmap functions were defined in an ops vector. + + map.flags = VFIO_DMA_MAP_FLAG_READ; if (!readonly) { map.flags |= VFIO_DMA_MAP_FLAG_WRITE; } @@ -205,7 +220,11 @@ static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova, return 0; } - error_report("VFIO_MAP_DMA failed: %s", strerror(errno)); +fail: + error_report("vfio_dma_map %s (iova %lu, size %ld, va %p): %s", + (container->reused ? "VADDR" : ""), iova, size, vaddr, + strerror(errno)); + FYI, I am currently trying to remove this error report. return -errno; } @@ -689,8 +708,17 @@ static bool vfio_connect_container(VFIOGroup *group, AddressSpace *as, group->container = container; QLIST_INSERT_HEAD(&container->group_list, group, container_next); - bcontainer->listener = vfio_memory_listener; - memory_listener_register(&bcontainer->listener, bcontainer->space->as); + /* + * If reused, register the listener later, after all state that may + * affect regions and mapping boundaries has been cpr load'ed. Later, + * the listener will invoke its callback on each flat section and call + * vfio_dma_map to supply the new vaddr, and the calls will match the + * mappings remembered by the kernel. + */ + if (!reused) { + bcontainer->listener = vfio_memory_listener; + memory_listener_register(&bcontainer->listener, bcontainer->space->as); + } oh ! This is an important change. Please move in its own patch. OK. if (bcontainer->error) { error_propagate_prepend(errp, bcontainer->error, @@ -1002,6 +1030,13 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev, return false; } + if (vbasedev->mdev) { + error_setg(&vbasedev->cpr_mdev_blocker, + "CPR does not support vfio mdev %s", vbasedev->name); + migrate_add_blocker_modes(&vbasedev->cpr_mdev_blocker, &error_fatal
[PATCH 1/2] qemu-nbd: Allow users to adjust handshake limit
Although defaulting the handshake limit to 10 seconds was a nice QoI change to weed out intentionally slow clients, it can interfere with integration testing done with manual NBD_OPT commands over 'nbdsh --opt-mode'. Expose a command line option to allow the user to alter the timeout away from the default. This option is unlikely to be used in enough scenarios to warrant a short option letter. The option --handshake-limit intentionally differs from the name of the constant added in commit fb1c2aaa98 (limit instead of max_secs) and the QMP name to be added in the next commit; this is because typing a longer command-line name is undesirable and there is sufficient --help text to document the units. Signed-off-by: Eric Blake --- docs/tools/qemu-nbd.rst | 5 + qemu-nbd.c | 41 ++--- 2 files changed, 31 insertions(+), 15 deletions(-) diff --git a/docs/tools/qemu-nbd.rst b/docs/tools/qemu-nbd.rst index 4f21b7904ac..f82ea5fd77b 100644 --- a/docs/tools/qemu-nbd.rst +++ b/docs/tools/qemu-nbd.rst @@ -156,6 +156,11 @@ driver options if :option:`--image-opts` is specified. Set the NBD volume export description, as a human-readable string. +.. option:: --handshake-limit=N + + Set the timeout for a client to successfully complete its handshake + to N seconds (default 10), or 0 for no limit. + .. option:: -L, --list Connect as a client and list all details about the exports exposed by diff --git a/qemu-nbd.c b/qemu-nbd.c index e7a961a5562..2eb32bc700c 100644 --- a/qemu-nbd.c +++ b/qemu-nbd.c @@ -57,19 +57,20 @@ #define HAVE_NBD_DEVICE 0 #endif -#define SOCKET_PATH"/var/lock/qemu-nbd-%s" -#define QEMU_NBD_OPT_CACHE 256 -#define QEMU_NBD_OPT_AIO 257 -#define QEMU_NBD_OPT_DISCARD 258 -#define QEMU_NBD_OPT_DETECT_ZEROES 259 -#define QEMU_NBD_OPT_OBJECT260 -#define QEMU_NBD_OPT_TLSCREDS 261 -#define QEMU_NBD_OPT_IMAGE_OPTS262 -#define QEMU_NBD_OPT_FORK 263 -#define QEMU_NBD_OPT_TLSAUTHZ 264 -#define QEMU_NBD_OPT_PID_FILE 265 -#define QEMU_NBD_OPT_SELINUX_LABEL 266 -#define QEMU_NBD_OPT_TLSHOSTNAME 267 +#define SOCKET_PATH "/var/lock/qemu-nbd-%s" +#define QEMU_NBD_OPT_CACHE 256 +#define QEMU_NBD_OPT_AIO 257 +#define QEMU_NBD_OPT_DISCARD 258 +#define QEMU_NBD_OPT_DETECT_ZEROES 259 +#define QEMU_NBD_OPT_OBJECT 260 +#define QEMU_NBD_OPT_TLSCREDS261 +#define QEMU_NBD_OPT_IMAGE_OPTS 262 +#define QEMU_NBD_OPT_FORK263 +#define QEMU_NBD_OPT_TLSAUTHZ264 +#define QEMU_NBD_OPT_PID_FILE265 +#define QEMU_NBD_OPT_SELINUX_LABEL 266 +#define QEMU_NBD_OPT_TLSHOSTNAME 267 +#define QEMU_NBD_OPT_HANDSHAKE_LIMIT 268 #define MBR_SIZE 512 @@ -80,6 +81,7 @@ static int nb_fds; static QIONetListener *server; static QCryptoTLSCreds *tlscreds; static const char *tlsauthz; +static int handshake_limit = NBD_DEFAULT_HANDSHAKE_MAX_SECS; static void usage(const char *name) { @@ -101,6 +103,7 @@ static void usage(const char *name) " -v, --verbose display extra debugging information\n" " -x, --export-name=NAMEexpose export by name (default is empty string)\n" " -D, --description=TEXTexport a human-readable description\n" +" --handshake-limit=N limit client's handshake to N seconds (default 10)\n" "\n" "Exposing part of the image:\n" " -o, --offset=OFFSET offset into the image\n" @@ -390,8 +393,7 @@ static void nbd_accept(QIONetListener *listener, QIOChannelSocket *cioc, nb_fds++; nbd_update_server_watch(); -/* TODO - expose handshake timeout as command line option */ -nbd_client_new(cioc, NBD_DEFAULT_HANDSHAKE_MAX_SECS, +nbd_client_new(cioc, handshake_limit, tlscreds, tlsauthz, nbd_client_closed, NULL); } @@ -569,6 +571,8 @@ int main(int argc, char **argv) { "object", required_argument, NULL, QEMU_NBD_OPT_OBJECT }, { "export-name", required_argument, NULL, 'x' }, { "description", required_argument, NULL, 'D' }, +{ "handshake-limit", required_argument, NULL, + QEMU_NBD_OPT_HANDSHAKE_LIMIT }, { "tls-creds", required_argument, NULL, QEMU_NBD_OPT_TLSCREDS }, { "tls-hostname", required_argument, NULL, QEMU_NBD_OPT_TLSHOSTNAME }, { "tls-authz", required_argument, NULL, QEMU_NBD_OPT_TLSAUTHZ }, @@ -815,6 +819,13 @@ int main(int argc, char **argv) case QEMU_NBD_OPT_SELINUX_LABEL: selinux_label = optarg; break; +case QEMU_NBD_OPT_HANDSHAKE_LIMIT: +if (qemu_strtoi(optarg, NULL, 0, &handshake_limit) < 0 || +handshake_limit < 0) { +error_report("Invalid handshake limit '%s'", optarg); +exit(EXIT_FAILURE); +} +break; } } -- 2.48.1
Re: [PATCH v2 00/14] meson: Deprecate 32-bit host support
+Xen maintainers On Mon, 3 Feb 2025, Richard Henderson wrote: > On 2/3/25 04:54, Paolo Bonzini wrote: > > On 2/3/25 04:18, Richard Henderson wrote: > > > v1: 20250128004254.33442-1-richard.hender...@linaro.org > > > > > > For v2, immediately disable 64-on-32 TCG. > > > > > > I *suspect* that we should disable 64-on-32 for *all* accelerators. > > > The idea that an i686 binary on an x86_64 host may be used to spawn > > > an x86_64 guest via kvm is silly and a bit more than niche. > > > > At least Xen used to be commonly used with 32-bit dom0, because it saved > > memory and dom0 would map in guest buffers as needed. I'm not sure how > > common that is these days, perhaps Stefano knows. > > As a data-point, debian does not ship libxen-dev for i686. > We cannot build-test this configuration at all. > > I can build-test Xen for armhf, and I guess it would use i386-softmmu; it's > unclear whether x86_64-softmmu and aarch64-softmmu are relevant or useful for > an armhf host, or as an armhf binary running on an aarch64 host. On the Xen side, there are two different use cases: x86 32-bit and ARM 32-bit. For x86 32-bit, while it was a very important use case in the past, I believe it is far less so now. I will let the x86 maintainers comment on how important it is today. For ARM 32-bit, I do not think we ever had many deployments, as most are 64-bit. Even when there are deployments, they do not typically use QEMU, as QEMU is less important for Xen on ARM compared to x86. Therefore, I would not block your cleanup and deprecation because of that. I will let the other ARM maintainers chime in.
Re: [PATCH 20/21] hw/i2c: Import TCA6416 emulation from Xilinx
Am 2. Februar 2025 17:09:06 UTC schrieb "Philippe Mathieu-Daudé" : >On 1/2/25 16:28, Bernhard Beschow wrote: >> >> >> Am 30. Januar 2025 23:05:53 UTC schrieb "Philippe Mathieu-Daudé" >> : >>> Cc'ing AMD folks >>> >>> Hi Bernhard, >>> >>> TL;DR; can't you use the PCF8574 which is a more complete model of I/O >>> expander? (See hw/gpio/pcf8574.c) >> >> If it is software-compatible then I could use it. I'm modeling a real board >> whose device tree I'd like to use unchanged, so I don't have much choice. >> The only reason I need it is because its absence clogs the i2c bus. > >No clue about compatibility. If you unfortunately need to add it, >then please address my comments in the next version. Sure, I'll link your and Zoltan's comments as todos in the next version. Best regards, Bernhard > >> >> Best regards, >> Bernhard >> >>> >>> On 20/1/25 21:37, Bernhard Beschow wrote: Xilinx QEMU implements a TCA6416 device model which may be useful for the broader QEMU community, so upstream it. In the Xilinx fork, the device model gets compiled whenever CONFIG_CADENCE is true, so have it maintained by the "hw/*/cadence_*" maintainers. The code is based on Xilinx QEMU version xilinx_v2024.2 plus the following modifications: * Use OBJECT_DECLARE_SIMPLE_TYPE() * Port from DPRINTF() to trace events * Follow QEMU conventions for naming of state struct * Rename type to not contain a ',' * Use DEFINE_TYPES() macro * Implement under hw/gpio since device is i2c client and gpio provider * Have dedicated Kconfig switch Signed-off-by: Bernhard Beschow -- I have a use for TCA6416 emulation. Question: Are Xilinx Zynq maintainers willing to maintain this device model? --- MAINTAINERS | 1 + hw/gpio/tca6416.c| 122 +++ hw/gpio/Kconfig | 5 ++ hw/gpio/meson.build | 1 + hw/gpio/trace-events | 4 ++ 5 files changed, 133 insertions(+) create mode 100644 hw/gpio/tca6416.c diff --git a/MAINTAINERS b/MAINTAINERS index 7531d65429..0cac9c90bc 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1030,6 +1030,7 @@ S: Maintained F: hw/*/xilinx_* F: hw/*/cadence_* F: hw/misc/zynq_slcr.c +F: hw/gpio/tca6416.c F: hw/adc/zynq-xadc.c F: include/hw/misc/zynq_slcr.h F: include/hw/adc/zynq-xadc.h diff --git a/hw/gpio/tca6416.c b/hw/gpio/tca6416.c new file mode 100644 index 00..81ed7a654d --- /dev/null +++ b/hw/gpio/tca6416.c @@ -0,0 +1,122 @@ +/* + * QEMU model of TCA6416 16-Bit I/O Expander >>> >>> Please add datasheet reference. >>> + * + * Copyright (c) 2018 Xilinx Inc. + * + * Written by Sai Pavan Boddu + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. >>> >>> Sai/Edgar/Francisco, should we update to AMD here? Remove or update the >>> email address? Also, can you Ack the replacement of this license by a >>> SPDX tag? >>> + */ +#include "qemu/osdep.h" +#include "hw/i2c/i2c.h" +#include "trace.h" + +#define TYPE_TCA6416 "tca6416" +OBJECT_DECLARE_SIMPLE_TYPE(Tca6416State, TCA6416) + +#define IN_PORT00 +#define IN_PORT11 +#define OUT_PORT0 2 +#define OUT_PORT1 3 +#define POL_INV04 +#define POL_INV15 +#define CONF_PORT0 6 +#define CONF_PORT1 7 >>> >>> enum up to here? >>> +#define RMAX (CONF_PORT1 + 1) + +enum tca6416_events { + ADDR_DONE, + ADDRESSING, +}; + +struct Tca6416State { + I2CSlave i2c; + + uint8_t addr; >>>
Re: [PATCH 0/7] hw/arm/raspi4b: Add models with 4GB and 8GB of DRAM
On Mon, 3 Feb 2025, Philippe Mathieu-Daudé wrote: On 3/2/25 15:50, Daniel P. Berrangé wrote: On Mon, Feb 03, 2025 at 02:45:06PM +, Peter Maydell wrote: On Mon, 3 Feb 2025 at 14:33, Daniel P. Berrangé wrote: On Mon, Feb 03, 2025 at 02:29:49PM +, Alex Bennée wrote: Peter Maydell writes: On Sat, 1 Feb 2025 at 12:57, BALATON Zoltan wrote: On Sat, 1 Feb 2025, Philippe Mathieu-Daudé wrote: - Deprecate the 'raspi4b' machine name, renaming it as 'raspi4b-1g' on 32-bit hosts, 'raspi4b-2g' otherwise. - Add the 'raspi4b-4g' and 'raspi4b-8g' machines, with respectively 4GB and 8GB of DRAM. IMHO (meaning you can ignore it, just my opinion) if the only difference is the memory size -machine raspi4b -memory 4g would be better user experience than having a lot of different machines. Yes, I think I agree. We have a way for users to specify how much memory they want, and I think it makes more sense to use that than to have lots of different machine types. I guess for the Pi we should validate the -memory supplied is on of the supported grid of devices rather than an arbitrary value? If the user wants to create a rpi4 with 6 GB RAM why should we stop them ? It is their choice if they want to precisely replicate RAM size from a physical model, or use something different when virtualized. The board revision code (reported to the guest via the emulated firmware interface) only supports reporting 256MB, 512MB, 1GB, 2GB, 4GB or 8GB: https://www.raspberrypi.com/documentation/computers/raspberry-pi.html#new-style-revision-codes I think it would be valid to report the revision code for the memory size that doesn't exceed what QEMU has configured. eg if configured with 6 GB, then report code for 4 GB. We need to distinct between physical machines VS virtual ones. Guests on virtual machines have some way to figure the virtual hardware (ACPI tables, DeviceTree blob, fw-cfg, ...). Guests for physical machines usually expect fixed hardware (not considering devices on busses). For the particular case of the Raspberry Pi machines, their bootloader gets the board layout by reading the RPI_FWREQ_GET_BOARD_REVISION constant value. What would be the point of emulating a raspi machine with 6GB if the FW is not going to consider besides 4GB? Besides, someone modify a guest to work with 6GB, it won't work on real HW. Usually the point of such non-standard configs would be running Linux via -kernel which could use whatever is configured if it has a way to detect it or maybe for memory it could even be specified on the kernel command line. But maybe this is not a common enough config to support so reporting error for memory size that's not on the list of valid sizes might be enough. This is similar to qemu-system-ppc -machine sam460ex which has a memory controller register that can only describe existing DIMM sizes. Originally I allowed users to specify whatever memory size and only warn for sizes not matching a DIMM size because Linux only looks at the device tree which QEMU can generate when booting with -kernel so it works but the firmware detects RAM from the SPD data and can only support certain sizes so only less RAM that could be convered with SPD data would be visible for guests. But then others thought it's better to return error for such cases and changed it and removed the support for arbitrary memory size so now it returns error when non power of 2 memory size is specified. Regards, BALATON Zoltan For Arm embedded boards we mostly tend to "restrict the user to what you can actually do", except for older boards where we tended not to write any kind of sanity checking on CPU type, memory size, etc. If we're going to strictly limit memory size that's accepted I wonder how we could information users/mgmt apps about what's permitted ? Expressing valid combinations of configs across different args gets pretty complicated quickly :-( I'll try to address Zoltan and Peter request to have a dynamic raspi machine. It is a bit unfortunate we didn't insisted on that when we decided to expose a fixed set of existing boards in order to not be bothered by inconsistent bug reports, back in 2019. Regards, Phil.
Re: [PATCH v2 07/14] accel/stubs: Expand stubs for TCG
On 2/3/25 09:38, Thomas Huth wrote: On 03/02/2025 17.43, Richard Henderson wrote: On 2/3/25 02:22, Thomas Huth wrote: On 03/02/2025 04.18, Richard Henderson wrote: Add tcg_allowed, qmp_x_query_jit, qmp_x_query_opcount. These are referenced when CONFIG_TCG is enabled globally, but not for a specific target. Signed-off-by: Richard Henderson --- accel/stubs/tcg-stub.c | 24 1 file changed, 24 insertions(+) diff --git a/accel/stubs/tcg-stub.c b/accel/stubs/tcg-stub.c index 7f4208fddf..9c2e2dc6e1 100644 --- a/accel/stubs/tcg-stub.c +++ b/accel/stubs/tcg-stub.c @@ -13,6 +13,18 @@ #include "qemu/osdep.h" #include "exec/tb-flush.h" #include "exec/exec-all.h" +#include "qapi/error.h" + +/* + * This file *ought* to be built once and linked only when required. + * However, it is built per-target, which means qemu/osdep.h has already + * undef'ed CONFIG_TCG, which hides the auto-generated declaration. So why don't we only build this file once? I think we'd have to create a static library for it. It didn't seem worth the effort at the time. I can re-investigate if you like. I think something like this might work: I think we need some of Philippe's include/exec/ cleanup work first. We currently use exec/exec-all.h, which requires cpu.h, which requires building per-target. r~
Re: [PATCH v4 13/33] migration/multifd: Device state transfer support - receive side
On Mon, Feb 03, 2025 at 11:18:11PM +0100, Maciej S. Szmigiero wrote: > On 3.02.2025 22:27, Peter Xu wrote: > > On Thu, Jan 30, 2025 at 11:08:34AM +0100, Maciej S. Szmigiero wrote: > > > From: "Maciej S. Szmigiero" > > > > > > Add a basic support for receiving device state via multifd channels - > > > channels that are shared with RAM transfers. > > > > > > Depending whether MULTIFD_FLAG_DEVICE_STATE flag is present or not in the > > > packet header either device state (MultiFDPacketDeviceState_t) or RAM > > > data (existing MultiFDPacket_t) is read. > > > > > > The received device state data is provided to > > > qemu_loadvm_load_state_buffer() function for processing in the > > > device's load_state_buffer handler. > > > > > > Signed-off-by: Maciej S. Szmigiero > > > > I think I acked this one. You could keep my R-b if... > > > > [...] > > > > > diff --git a/migration/multifd.h b/migration/multifd.h > > > index 9e4baa066312..abf3acdcee40 100644 > > > --- a/migration/multifd.h > > > +++ b/migration/multifd.h > > > @@ -62,6 +62,12 @@ MultiFDRecvData *multifd_get_recv_data(void); > > > #define MULTIFD_FLAG_UADK (8 << 1) > > > #define MULTIFD_FLAG_QATZIP (16 << 1) > > > +/* > > > + * If set it means that this packet contains device state > > > + * (MultiFDPacketDeviceState_t), not RAM data (MultiFDPacket_t). > > > + */ > > > +#define MULTIFD_FLAG_DEVICE_STATE (1 << 6) > > > > ... if this won't conflict with MULTIFD_FLAG_QATZIP. > > Hmm, isn't (16 << 1) = 32 while (1 << 6) = 64? Oops. :) > > I think we should stick with one way to write it, then when rebase you can > > see such conflicts - either your patch uses 32 << 1, or perhaps we should > > start to switch to BIT() for all above instead.. Still, do you mind switch to "32 << 1" (or use BIT())? With either, feel free to take: Reviewed-by: Peter Xu -- Peter Xu
Re: [PATCH v4 08/33] migration/multifd: Allow premature EOF on TLS incoming channels
On Mon, Feb 03, 2025 at 10:41:32PM +0100, Maciej S. Szmigiero wrote: > On 3.02.2025 21:20, Peter Xu wrote: > > On Mon, Feb 03, 2025 at 07:53:00PM +0100, Maciej S. Szmigiero wrote: > > > On 3.02.2025 19:20, Peter Xu wrote: > > > > On Thu, Jan 30, 2025 at 11:08:29AM +0100, Maciej S. Szmigiero wrote: > > > > > From: "Maciej S. Szmigiero" > > > > > > > > > > Multifd send channels are terminated by calling > > > > > qio_channel_shutdown(QIO_CHANNEL_SHUTDOWN_BOTH) in > > > > > multifd_send_terminate_threads(), which in the TLS case essentially > > > > > calls shutdown(SHUT_RDWR) on the underlying raw socket. > > > > > > > > > > Unfortunately, this does not terminate the TLS session properly and > > > > > the receive side sees this as a GNUTLS_E_PREMATURE_TERMINATION error. > > > > > > > > > > The only reason why this wasn't causing migration failures is because > > > > > the current migration code apparently does not check for migration > > > > > error being set after the end of the multifd receive process. > > > > > > > > > > However, this will change soon so the multifd receive code has to be > > > > > prepared to not return an error on such premature TLS session EOF. > > > > > Use the newly introduced QIOChannelTLS method for that. > > > > > > > > > > It's worth noting that even if the sender were to be changed to > > > > > terminate > > > > > the TLS connection properly the receive side still needs to remain > > > > > compatible with older QEMU bit stream which does not do this. > > > > > > > > If this is an existing bug, we could add a Fixes. > > > > > > It is an existing issue but only uncovered by this patch set. > > > > > > As far as I can see it was always there, so it would need some > > > thought where to point that Fixes tag. > > > > If there's no way to trigger a real functional bug anyway, it's also ok we > > omit the Fixes. > > > > > > Two pure questions.. > > > > > > > > - What is the correct way to terminate the TLS session without this > > > > flag? > > > > > > I guess one would need to call gnutls_bye() like in this GnuTLS example: > > > https://gitlab.com/gnutls/gnutls/-/blob/2b8c3e4c71ad380bbbffb32e6003b34ecad596e3/doc/examples/ex-client-anon.c#L102 > > > > > > > - Why this is only needed by multifd sessions? > > > > > > What uncovered the issue was switching the load threads to using > > > migrate_set_error() instead of their own result variable > > > (load_threads_ret) which you had requested during the previous > > > patch set version review: > > > https://lore.kernel.org/qemu-devel/Z1DbH5fwBaxtgrvH@x1n/ > > > > > > Turns out that the multifd receive code always returned > > > error in the TLS case, just nothing was previously checking for > > > that error presence. > > > > What I was curious is whether this issue also exists for the main migration > > channel when with tls, especially when e.g. multifd not enabled at all. As > > I don't see anywhere that qemu uses gnutls_bye() for any tls session. > > > > I think it's a good to find that we overlooked this before.. and IMHO it's > > always good we could fix this. > > > > Does it mean we need proper gnutls_bye() somewhere? > > > > If we need an explicit gnutls_bye(), then I wonder if that should be done > > on the main channel as well. > > That's a good question and looking at the code qemu_loadvm_state_main() exits > on receiving "QEMU_VM_EOF" section (that's different from receiving socket > EOF) > and then optionally "QEMU_VM_VMDESCRIPTION" section is read with explicit size > in qemu_loadvm_state() - so still not until channel EOF. I had a closer look, I do feel like such pre-mature termination is caused by explicit shutdown()s of the iochannels, looks like that can cause issue even after everything is sent. Then I noticed indeed multifd sender iochannels will get explicit shutdown()s since commit 077fbb5942, while we don't do that for the main channel. Maybe that is a major difference. Now I wonder whether we should shutdown() the channel at all if migration succeeded, because looks like it can cause tls session to interrupt even if the shutdown() is done after sent everything, and if so it'll explain why you hit the issue with tls. > > Then I can't see anything else reading the channel until it is closed in > migration_incoming_state_destroy(). > > So most likely the main migration channel will never read far enough to > reach that GNUTLS_E_PREMATURE_TERMINATION error. > > > If we don't need gnutls_bye(), then should we always ignore pre-mature > > termination of tls no matter if it's multifd or non-multifd channel (or > > even a tls session that is not migration-related)? > > So basically have this patch extended to calling > qio_channel_tls_set_premature_eof_okay() also on the main migration channel? If above theory can stand, then eof-okay could be a workaround papering over the real problem that we shouldn't always shutdown().. Could you have a look at below patch and see whether it ca
Re: [PATCH 08/21] hw/arm/fsl-imx8mp: Add USDHC storage controllers
Am 21. Januar 2025 02:52:58 UTC schrieb BALATON Zoltan : >On Mon, 20 Jan 2025, Bernhard Beschow wrote: >> The USDHC emulation allows for running real-world images such as those >> generated >> by Buildroot. Convert the board documentation accordingly instead of running >> a >> Linux kernel with ephemeral storage. >> >> Signed-off-by: Bernhard Beschow >> --- >> docs/system/arm/imx8mp-evk.rst | 39 +++--- >> include/hw/arm/fsl-imx8mp.h| 7 ++ >> hw/arm/fsl-imx8mp.c| 28 >> hw/arm/imx8mp-evk.c| 18 >> hw/arm/Kconfig | 1 + >> 5 files changed, 81 insertions(+), 12 deletions(-) >> >> diff --git a/docs/system/arm/imx8mp-evk.rst b/docs/system/arm/imx8mp-evk.rst >> index d7d08cc198..1514bc5864 100644 >> --- a/docs/system/arm/imx8mp-evk.rst >> +++ b/docs/system/arm/imx8mp-evk.rst >> @@ -13,6 +13,7 @@ The ``imx8mp-evk`` machine implements the following >> devices: >> * Up to 4 Cortex-A53 Cores >> * Generic Interrupt Controller (GICv3) >> * 4 UARTs >> + * 3 USDHC Storage Controllers >> * Secure Non-Volatile Storage (SNVS) including an RTC >> * Clock Tree >> >> @@ -25,25 +26,39 @@ for loading a Linux kernel. >> Direct Linux Kernel Boot >> >> >> -Linux mainline v6.12 release is tested at the time of writing. To build a >> Linux >> -mainline kernel that can be booted by the ``imx8mp-evk`` machine, simply >> -configure the kernel using the defconfig configuration: >> +Probably the easiest way to get started with a whole Linux system on the >> machine >> +is to generate an image with Buildroot. Version 2024.11.1 is tested at the >> time >> +of writing and involves three steps. First run the following command in the >> +toplevel directory of the Buildroot source tree: >> >> .. code-block:: bash >> >> - $ export ARCH=arm64 >> - $ export CROSS_COMPILE=aarch64-linux-gnu- >> - $ make defconfig >> + $ make freescale_imx8mpevk_defconfig >> $ make > >Adding documentation that is removed a few patches later seems unnecessary. >Maybe you could keep the bare kernel docs as that could also be useful or not >add it in the first place? But if this is already written keeping it may make >more sense than dropping it. I'd like the documentation to be complete and easy to follow at the same time. Buildroot achieves both, and allows for generating a cpio initrd instead. So I'd start with that which just requires minor adjustments here. Best regards, Bernhard > >Regards, >BALATON Zoltan > >> >> -To boot the newly built Linux kernel in QEMU with the ``imx8mp-evk`` >> machine, >> -run: >> +Once finished successfully there is an ``output/image`` subfolder. Navigate >> into >> +it and resize the SD card image to a power of two: >> + >> +.. code-block:: bash >> + >> + $ qemu-img resize sdcard.img 256M >> + >> +Finally, the device tree needs to be patched with the following commands >> which >> +will remove the ``cpu-idle-states`` properties from CPU nodes: >> + >> +.. code-block:: bash >> + >> + $ dtc imx8mp-evk.dtb | sed '/cpu-idle-states/d' > imx8mp-evk-patched.dts >> + $ dtc imx8mp-evk-patched.dts -o imx8mp-evk-patched.dtb >> + >> +Now that everything is prepared the newly built image can be run in the QEMU >> +``imx8mp-evk`` machine: >> >> .. code-block:: bash >> >> $ qemu-system-aarch64 -M imx8mp-evk -smp 4 -m 3G \ >> -display none -serial null -serial stdio \ >> - -kernel arch/arm64/boot/Image \ >> - -dtb arch/arm64/boot/dts/freescale/imx8mp-evk.dtb \ >> - -initrd /path/to/rootfs.ext4 \ >> - -append "root=/dev/ram" >> + -kernel Image \ >> + -dtb imx8mp-evk-patched.dtb \ >> + -append "root=/dev/mmcblk2p2" \ >> + -drive file=sdcard.img,if=sd,bus=2,format=raw,id=mmcblk2 >> diff --git a/include/hw/arm/fsl-imx8mp.h b/include/hw/arm/fsl-imx8mp.h >> index 9d2a757c2a..9832c05e8c 100644 >> --- a/include/hw/arm/fsl-imx8mp.h >> +++ b/include/hw/arm/fsl-imx8mp.h >> @@ -14,6 +14,7 @@ >> #include "hw/intc/arm_gicv3_common.h" >> #include "hw/misc/imx7_snvs.h" >> #include "hw/misc/imx8mp_ccm.h" >> +#include "hw/sd/sdhci.h" >> #include "qom/object.h" >> #include "qemu/units.h" >> >> @@ -27,6 +28,7 @@ enum FslImx8mpConfiguration { >> FSL_IMX8MP_NUM_CPUS = 4, >> FSL_IMX8MP_NUM_IRQS = 160, >> FSL_IMX8MP_NUM_UARTS= 4, >> +FSL_IMX8MP_NUM_USDHCS = 3, >> }; >> >> struct FslImx8mpState { >> @@ -38,6 +40,7 @@ struct FslImx8mpState { >> IMX8MPAnalogState analog; >> IMX7SNVSState snvs; >> IMXSerialState uart[FSL_IMX8MP_NUM_UARTS]; >> +SDHCIState usdhc[FSL_IMX8MP_NUM_USDHCS]; >> }; >> >> enum FslImx8mpMemoryRegions { >> @@ -183,6 +186,10 @@ enum FslImx8mpMemoryRegions { >> }; >> >> enum FslImx8mpIrqs { >> +FSL_IMX8MP_USDHC1_IRQ = 22, >> +FSL_IMX8MP_USDHC2_IRQ = 23, >> +FSL_IMX8MP_USDHC3_IRQ = 24, >> + >> FSL_IMX8MP_UART1_IRQ= 26, >>
Re: [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls
On Mon, Feb 03, 2025 at 10:41:43PM +0100, Maciej S. Szmigiero wrote: > On 3.02.2025 21:36, Peter Xu wrote: > > On Mon, Feb 03, 2025 at 09:15:52PM +0100, Maciej S. Szmigiero wrote: > > > On 3.02.2025 20:58, Peter Xu wrote: > > > > On Mon, Feb 03, 2025 at 02:57:36PM +0100, Maciej S. Szmigiero wrote: > > > > > On 2.02.2025 13:45, Dr. David Alan Gilbert wrote: > > > > > > * Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote: > > > > > > > On 2.02.2025 03:06, Dr. David Alan Gilbert wrote: > > > > > > > > * Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote: > > > > > > > > > From: "Maciej S. Szmigiero" > > > > > > > > > > > > > > > > > > postcopy_ram_listen_thread() is a free running thread, so it > > > > > > > > > needs to > > > > > > > > > take BQL around function calls to migration methods requiring > > > > > > > > > BQL. > > > > > > > > > > > > > > > > > > qemu_loadvm_state_main() needs BQL held since it ultimately > > > > > > > > > calls > > > > > > > > > "load_state" SaveVMHandlers. > > > > > > > > > > > > > > > > > > migration_incoming_state_destroy() needs BQL held since it > > > > > > > > > ultimately calls > > > > > > > > > "load_cleanup" SaveVMHandlers. > > > > > > > > > > > > > > > > > > Signed-off-by: Maciej S. Szmigiero > > > > > > > > > > > > > > > > > > --- > > > > > > > > > migration/savevm.c | 4 > > > > > > > > > 1 file changed, 4 insertions(+) > > > > > > > > > > > > > > > > > > diff --git a/migration/savevm.c b/migration/savevm.c > > > > > > > > > index b0b74140daea..0ceea9638cc1 100644 > > > > > > > > > --- a/migration/savevm.c > > > > > > > > > +++ b/migration/savevm.c > > > > > > > > > @@ -2013,7 +2013,9 @@ static void > > > > > > > > > *postcopy_ram_listen_thread(void *opaque) > > > > > > > > > * in qemu_file, and thus we must be blocking now. > > > > > > > > > */ > > > > > > > > > qemu_file_set_blocking(f, true); > > > > > > > > > +bql_lock(); > > > > > > > > > load_res = qemu_loadvm_state_main(f, mis); > > > > > > > > > +bql_unlock(); > > > > > > > > > > > > > > > > Doesn't that leave that held for a heck of a long time? > > > > > > > > > > > > > > Yes, and it effectively broke "postcopy recover" test but I > > > > > > > think the reason for that is qemu_loadvm_state_main() and > > > > > > > its children don't drop BQL while waiting for I/O. > > > > > > > > > > > > > > I've described this case in more detail in my reply to Fabiano > > > > > > > here: > > > > > > > https://lore.kernel.org/qemu-devel/0a09e627-955e-4f26-8d08-0192ecd25...@maciej.szmigiero.name/ > > > > > > > > > > > > While it might be the cause in this case, my feeling is it's more > > > > > > fundamental > > > > > > here - it's the whole reason that postcopy has a separate ram listen > > > > > > thread. As the destination is running, after it loads it's devices > > > > > > and as it starts up the destination will be still loading RAM > > > > > > (and other postcopiable devices) potentially for quite a while. > > > > > > Holding the bql around the ram listen thread means that the > > > > > > execution of the destination won't be able to take that lock > > > > > > until the postcopy load has finished; so while that might apparently > > > > > > complete, it'll lead to the destination stalling until that's > > > > > > finished > > > > > > which defeats the whole point of postcopy. > > > > > > That last one probably won't fail a test but it will lead to a long > > > > > > stall > > > > > > if you give it a nice big guest with lots of RAM that it's rapidly > > > > > > changing. > > > > > > > > > > Okay, I understand the postcopy case/flow now. > > > > > Thanks for explaining it clearly. > > > > > > > > > > > > I still think that "load_state" SaveVMHandlers need to be called > > > > > > > with BQL held since implementations apparently expect it that way: > > > > > > > for example, I think PCI device configuration restore calls > > > > > > > address space manipulation methods which abort() if called > > > > > > > without BQL held. > > > > > > > > > > > > However, the only devices that *should* be arriving on the channel > > > > > > that the postcopy_ram_listen_thread is reading from are those > > > > > > that are postcopiable (i.e. RAM and hmm block's dirty_bitmap). > > > > > > Those load handlers are safe to be run while the other devices > > > > > > are being changed. Note the *should* - you could add a check > > > > > > to fail if any other device arrives on that channel. > > > > > > > > > > I think ultimately there should be either an explicit check, or, > > > > > as you suggest in the paragraph below, a separate SaveVMHandler > > > > > that runs without BQL held. > > > > > > > > To me those are bugs happening during postcopy, so those abort()s in > > > > memory.c are indeed for catching these issues too. > > > > > > > > > Since the current state of just running these SaveVMHandlers > > > > > without BQL in this case and hoping
Re: [PATCH 21/21] hw/gpio/imx_gpio: Don't clear input GPIO values upon reset
Am 28. Januar 2025 14:33:27 UTC schrieb Gustavo Romero : >Hi, > >On 1/20/25 17:37, Bernhard Beschow wrote: >> Input GPIO values such as a present SD card may get notified before the GPIO >> controller itself gets reset. Claring the input values thus loses data. >> Assuming > >^- nit: Clearing > > Peter asked for a three-way reset in inbound devices while keeping the logic here as is. I'd drop this patch then. Best regards, Bernhard >Cheers, >Gustavo > >> that input GPIO events are only fired when the state changes, the input >> values >> shouldn't be reset. >> >> Signed-off-by: Bernhard Beschow >> --- >> hw/gpio/imx_gpio.c | 1 - >> 1 file changed, 1 deletion(-) >> >> diff --git a/hw/gpio/imx_gpio.c b/hw/gpio/imx_gpio.c >> index 549a281ed7..25546221e0 100644 >> --- a/hw/gpio/imx_gpio.c >> +++ b/hw/gpio/imx_gpio.c >> @@ -298,7 +298,6 @@ static void imx_gpio_reset(DeviceState *dev) >> s->dr = 0; >> s->gdir = 0; >> -s->psr = 0; >> s->icr = 0; >> s->imr = 0; >> s->isr = 0; >
Re: [PATCH v4 13/33] migration/multifd: Device state transfer support - receive side
On 3.02.2025 22:27, Peter Xu wrote: On Thu, Jan 30, 2025 at 11:08:34AM +0100, Maciej S. Szmigiero wrote: From: "Maciej S. Szmigiero" Add a basic support for receiving device state via multifd channels - channels that are shared with RAM transfers. Depending whether MULTIFD_FLAG_DEVICE_STATE flag is present or not in the packet header either device state (MultiFDPacketDeviceState_t) or RAM data (existing MultiFDPacket_t) is read. The received device state data is provided to qemu_loadvm_load_state_buffer() function for processing in the device's load_state_buffer handler. Signed-off-by: Maciej S. Szmigiero I think I acked this one. You could keep my R-b if... [...] diff --git a/migration/multifd.h b/migration/multifd.h index 9e4baa066312..abf3acdcee40 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -62,6 +62,12 @@ MultiFDRecvData *multifd_get_recv_data(void); #define MULTIFD_FLAG_UADK (8 << 1) #define MULTIFD_FLAG_QATZIP (16 << 1) +/* + * If set it means that this packet contains device state + * (MultiFDPacketDeviceState_t), not RAM data (MultiFDPacket_t). + */ +#define MULTIFD_FLAG_DEVICE_STATE (1 << 6) ... if this won't conflict with MULTIFD_FLAG_QATZIP. Hmm, isn't (16 << 1) = 32 while (1 << 6) = 64? I think we should stick with one way to write it, then when rebase you can see such conflicts - either your patch uses 32 << 1, or perhaps we should start to switch to BIT() for all above instead.. Thanks, Maciej
RE: [PATCH v6 00/10] Support virtio-gpu DRM native context
> Subject: Re: [PATCH v6 00/10] Support virtio-gpu DRM native context > > "Kim, Dongwon" writes: > > > Hi, > > > > The commit below could change the timing of drawing by making the > > drawing done at refresh cycle instead of via drawing event. So it > > looks like either dmabuf or client's framebuffer is being written and > > read at the same time. Hey, can you describe how the corruption looks > > like? Is it just garbage image with random noise or the actual frame with > > some > defects like tearing...? > > The terminal gets mirrored upside down and the mouse creates damage as it > moves about. I am wondering if this is reproducible without virgl and drm native context (like w/ sw rasterizer on the guest) as well. > > > > >> Subject: Re: [PATCH v6 00/10] Support virtio-gpu DRM native context > >> > >> Dmitry Osipenko writes: > >> > >> > On 1/27/25 19:17, Alex Bennée wrote: > >> > ... > >> >> I'm still seeing corruption with -display gtk,gl=on on my x86 > >> >> system BTW. I would like to understand if that is a problem with > >> >> QEMU, GTK or something else in the stack before we merge. > >> > > >> > I reproduced the display mirroring/corruption issue and bisected it > >> > to the following commit. The problem only happens when QEMU/GTK > >> > uses Wayland display directly, while previously I was running QEMU > >> > with XWayland that doesn't have the problem. Why this change breaks > >> > dmabuf displaying with Wayland/GTK is unclear. > >> > >> Ahh that makes sense - I obviously forgot to mention I'm running > >> sway/wayland across both machines. > >> > >> > Reverting commit fixes the bug. > >> > > >> > +Dongwon Kim +Vivek Kasireddy > >> > > >> > commit 77bf310084dad38b3a2badf01766c659056f1cf2 > >> > Author: Dongwon Kim > >> > Date: Fri Apr 26 15:50:59 2024 -0700 > >> > > >> > ui/gtk: Draw guest frame at refresh cycle > >> > > >> > Draw routine needs to be manually invoked in the next refresh > >> > if there is a scanout blob from the guest. This is to prevent > >> > a situation where there is a scheduled draw event but it won't > >> > happen bacause the window is currently in inactive state > >> > (minimized or tabified). If draw is not done for a long time, > >> > gl_block timeout and/or fence timeout (on the guest) will happen > >> > eventually. > >> > > >> > v2: Use gd_gl_area_draw(vc) in gtk-gl-area.c > >> > > >> > Suggested-by: Vivek Kasireddy > >> > Cc: Gerd Hoffmann > >> > Cc: Marc-André Lureau > >> > Cc: Daniel P. Berrangé > >> > Signed-off-by: Dongwon Kim > >> > Acked-by: Marc-André Lureau > >> > Message-Id: <20240426225059.3871283-1-dongwon@intel.com> > >> > >> > >> Maybe a race on: > >> > >> QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf; ? > >> > >> -- > >> Alex Bennée > >> Virtualisation Tech Lead @ Linaro > > -- > Alex Bennée > Virtualisation Tech Lead @ Linaro
Re: [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls
On Mon, Feb 03, 2025 at 09:15:52PM +0100, Maciej S. Szmigiero wrote: > On 3.02.2025 20:58, Peter Xu wrote: > > On Mon, Feb 03, 2025 at 02:57:36PM +0100, Maciej S. Szmigiero wrote: > > > On 2.02.2025 13:45, Dr. David Alan Gilbert wrote: > > > > * Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote: > > > > > On 2.02.2025 03:06, Dr. David Alan Gilbert wrote: > > > > > > * Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote: > > > > > > > From: "Maciej S. Szmigiero" > > > > > > > > > > > > > > postcopy_ram_listen_thread() is a free running thread, so it > > > > > > > needs to > > > > > > > take BQL around function calls to migration methods requiring BQL. > > > > > > > > > > > > > > qemu_loadvm_state_main() needs BQL held since it ultimately calls > > > > > > > "load_state" SaveVMHandlers. > > > > > > > > > > > > > > migration_incoming_state_destroy() needs BQL held since it > > > > > > > ultimately calls > > > > > > > "load_cleanup" SaveVMHandlers. > > > > > > > > > > > > > > Signed-off-by: Maciej S. Szmigiero > > > > > > > --- > > > > > > > migration/savevm.c | 4 > > > > > > > 1 file changed, 4 insertions(+) > > > > > > > > > > > > > > diff --git a/migration/savevm.c b/migration/savevm.c > > > > > > > index b0b74140daea..0ceea9638cc1 100644 > > > > > > > --- a/migration/savevm.c > > > > > > > +++ b/migration/savevm.c > > > > > > > @@ -2013,7 +2013,9 @@ static void > > > > > > > *postcopy_ram_listen_thread(void *opaque) > > > > > > > * in qemu_file, and thus we must be blocking now. > > > > > > > */ > > > > > > > qemu_file_set_blocking(f, true); > > > > > > > +bql_lock(); > > > > > > > load_res = qemu_loadvm_state_main(f, mis); > > > > > > > +bql_unlock(); > > > > > > > > > > > > Doesn't that leave that held for a heck of a long time? > > > > > > > > > > Yes, and it effectively broke "postcopy recover" test but I > > > > > think the reason for that is qemu_loadvm_state_main() and > > > > > its children don't drop BQL while waiting for I/O. > > > > > > > > > > I've described this case in more detail in my reply to Fabiano here: > > > > > https://lore.kernel.org/qemu-devel/0a09e627-955e-4f26-8d08-0192ecd25...@maciej.szmigiero.name/ > > > > > > > > While it might be the cause in this case, my feeling is it's more > > > > fundamental > > > > here - it's the whole reason that postcopy has a separate ram listen > > > > thread. As the destination is running, after it loads it's devices > > > > and as it starts up the destination will be still loading RAM > > > > (and other postcopiable devices) potentially for quite a while. > > > > Holding the bql around the ram listen thread means that the > > > > execution of the destination won't be able to take that lock > > > > until the postcopy load has finished; so while that might apparently > > > > complete, it'll lead to the destination stalling until that's finished > > > > which defeats the whole point of postcopy. > > > > That last one probably won't fail a test but it will lead to a long > > > > stall > > > > if you give it a nice big guest with lots of RAM that it's rapidly > > > > changing. > > > > > > Okay, I understand the postcopy case/flow now. > > > Thanks for explaining it clearly. > > > > > > > > I still think that "load_state" SaveVMHandlers need to be called > > > > > with BQL held since implementations apparently expect it that way: > > > > > for example, I think PCI device configuration restore calls > > > > > address space manipulation methods which abort() if called > > > > > without BQL held. > > > > > > > > However, the only devices that *should* be arriving on the channel > > > > that the postcopy_ram_listen_thread is reading from are those > > > > that are postcopiable (i.e. RAM and hmm block's dirty_bitmap). > > > > Those load handlers are safe to be run while the other devices > > > > are being changed. Note the *should* - you could add a check > > > > to fail if any other device arrives on that channel. > > > > > > I think ultimately there should be either an explicit check, or, > > > as you suggest in the paragraph below, a separate SaveVMHandler > > > that runs without BQL held. > > > > To me those are bugs happening during postcopy, so those abort()s in > > memory.c are indeed for catching these issues too. > > > > > Since the current state of just running these SaveVMHandlers > > > without BQL in this case and hoping that nothing breaks is > > > clearly sub-optimal. > > > > > > > > I have previously even submitted a patch to explicitly document > > > > > "load_state" SaveVMHandler as requiring BQL (which was also > > > > > included in the previous version of this patch set) and it > > > > > received a "Reviewed-by:" tag: > > > > > https://lore.kernel.org/qemu-devel/6976f129df610c8207da4e531c8c0475ec204fa4.1730203967.git.maciej.szmigi...@oracle.com/ > > > > > https://lore.kernel.org/qemu-devel/e1949839932efaa531e2fe63ac13324e5787439c.1731773021.git.maciej.
Re: [PATCH v2 12/15] nbd/server: Support inactive nodes
On Fri, Jan 31, 2025 at 10:50:48AM +0100, Kevin Wolf wrote: > In order to support running an NBD export on inactive nodes, we must > make sure to return errors for any operations that aren't allowed on > inactive nodes. Reads are the only operation we know we need for > inactive images, so to err on the side of caution, return errors for > everything else, even if some operations could possibly be okay. > > Signed-off-by: Kevin Wolf > --- > nbd/server.c | 17 + > 1 file changed, 17 insertions(+) > > diff --git a/nbd/server.c b/nbd/server.c > index f64e47270c..2076fb2666 100644 > --- a/nbd/server.c > +++ b/nbd/server.c > @@ -2026,6 +2026,7 @@ static void nbd_export_delete(BlockExport *blk_exp) > const BlockExportDriver blk_exp_nbd = { > .type = BLOCK_EXPORT_TYPE_NBD, > .instance_size = sizeof(NBDExport), > +.supports_inactive = true, > .create = nbd_export_create, > .delete = nbd_export_delete, > .request_shutdown = nbd_export_request_shutdown, > @@ -2920,6 +2921,22 @@ static coroutine_fn int nbd_handle_request(NBDClient > *client, > NBDExport *exp = client->exp; > char *msg; > size_t i; > +bool inactive; > + > +WITH_GRAPH_RDLOCK_GUARD() { > +inactive = bdrv_is_inactive(blk_bs(exp->common.blk)); > +if (inactive) { > +switch (request->type) { > +case NBD_CMD_READ: > +/* These commands are allowed on inactive nodes */ > +break; > +default: > +/* Return an error for the rest */ > +return nbd_send_generic_reply(client, request, -EPERM, > + "export is inactive", errp); > +} > +} > +} Hmm...end of lock guard. What prevents the race where inactive changes before the request is performed? > > switch (request->type) { > case NBD_CMD_CACHE: > -- > 2.48.1 > signature.asc Description: PGP signature
Re: [PATCH v2 11/15] block/export: Add option to allow export of inactive nodes
On Fri, Jan 31, 2025 at 10:50:47AM +0100, Kevin Wolf wrote: > Add an option in BlockExportOptions to allow creating an export on an > inactive node without activating the node. This mode needs to be > explicitly supported by the export type (so that it doesn't perform any > operations that are forbidden for inactive nodes), so this patch alone > doesn't allow this option to be successfully used yet. > > Signed-off-by: Kevin Wolf > --- > qapi/block-export.json | 10 +- > include/block/block-global-state.h | 3 +++ > include/block/export.h | 3 +++ > block.c| 4 > block/export/export.c | 31 -- > 5 files changed, 40 insertions(+), 11 deletions(-) > Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
Re: [PATCH v2 15/15] iotests: Add (NBD-based) tests for inactive nodes
On Fri, Jan 31, 2025 at 10:50:51AM +0100, Kevin Wolf wrote: > This tests different types of operations on inactive block nodes > (including graph changes, block jobs and NBD exports) to make sure that > users manually activating and inactivating nodes doesn't break things. > > Support for inactive nodes in other export types will have to come with > separate test cases because they have different dependencies like blkio > or root permissions and we don't want to disable this basic test when > they are not fulfilled. > > Signed-off-by: Kevin Wolf > --- > tests/qemu-iotests/iotests.py | 4 + > tests/qemu-iotests/tests/inactive-node-nbd| 303 ++ > .../qemu-iotests/tests/inactive-node-nbd.out | 239 ++ > 3 files changed, 546 insertions(+) > create mode 100755 tests/qemu-iotests/tests/inactive-node-nbd > create mode 100644 tests/qemu-iotests/tests/inactive-node-nbd.out > > +iotests.log('\nMirror from active source to inactive target') > + > +iotests.log('disk-fmt active: %s' % node_is_active(vm, 'disk-fmt')) > +iotests.log('snap-fmt active: %s' % node_is_active(vm, 'snap-fmt')) > +iotests.log('snap2-fmt active: %s' % node_is_active(vm, 'snap2-fmt')) > +iotests.log('target-fmt active: %s' % node_is_active(vm, 'target-fmt')) > + > +# Activating snap2-fmt recursively activates the whole backing chain > +vm.qmp_log('blockdev-set-active', node_name='snap2-fmt', active=True) > +vm.qmp_log('blockdev-set-active', node_name='target-fmt', active=False) Here, you have "Activating ... recursively activates"... > + > +iotests.log('disk-fmt active: %s' % node_is_active(vm, 'disk-fmt')) > +iotests.log('snap-fmt active: %s' % node_is_active(vm, 'snap-fmt')) > +iotests.log('snap2-fmt active: %s' % node_is_active(vm, 'snap2-fmt')) > +iotests.log('target-fmt active: %s' % node_is_active(vm, 'target-fmt')) > + > +vm.qmp_log('blockdev-mirror', job_id='job0', device='snap2-fmt', > + target='target-fmt', sync='full', > + filters=[iotests.filter_qmp_generated_node_ids]) > + > +iotests.log('\nBackup from active source to inactive target') > + > +vm.qmp_log('blockdev-backup', job_id='job0', device='snap2-fmt', > + target='target-fmt', sync='full', > + filters=[iotests.filter_qmp_generated_node_ids]) > + > +iotests.log('\nBackup from inactive source to active target') > + > +# Activating snap2-fmt recursively inactivates the whole backing chain > +vm.qmp_log('blockdev-set-active', node_name='snap2-fmt', active=False) > +vm.qmp_log('blockdev-set-active', node_name='target-fmt', active=True) ...but here, "Activating ... recursively inactivates". Is one of these statements wrong? Overall a nice barrage of tests, and I can see how adding this many tests caused your v2 to fix some bugs that it discovered in v1. -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
Re: [PATCH v4 10/33] error: define g_autoptr() cleanup function for the Error type
On Thu, Jan 30, 2025 at 11:08:31AM +0100, Maciej S. Szmigiero wrote: > From: "Maciej S. Szmigiero" > > Automatic memory management helps avoid memory safety issues. > > Signed-off-by: Maciej S. Szmigiero Reviewed-by: Peter Xu -- Peter Xu
Re: [PATCH v2 00/14] meson: Deprecate 32-bit host support
On 2/3/25 04:54, Paolo Bonzini wrote: On 2/3/25 04:18, Richard Henderson wrote: v1: 20250128004254.33442-1-richard.hender...@linaro.org For v2, immediately disable 64-on-32 TCG. I *suspect* that we should disable 64-on-32 for *all* accelerators. The idea that an i686 binary on an x86_64 host may be used to spawn an x86_64 guest via kvm is silly and a bit more than niche. At least Xen used to be commonly used with 32-bit dom0, because it saved memory and dom0 would map in guest buffers as needed. I'm not sure how common that is these days, perhaps Stefano knows. As a data-point, debian does not ship libxen-dev for i686. We cannot build-test this configuration at all. I can build-test Xen for armhf, and I guess it would use i386-softmmu; it's unclear whether x86_64-softmmu and aarch64-softmmu are relevant or useful for an armhf host, or as an armhf binary running on an aarch64 host. r~
[PATCH] tests/functional: Extend the ppc64 e500 test
The test sequence boots a ppce500 machine from kernel and disk. The buildroot is built with the qemu_ppc64_e5500_defconfig config. Signed-off-by: Cédric Le Goater --- tests/functional/test_ppc64_e500.py | 30 + 1 file changed, 30 insertions(+) diff --git a/tests/functional/test_ppc64_e500.py b/tests/functional/test_ppc64_e500.py index b92fe0b0e75e..f21d7d84177e 100755 --- a/tests/functional/test_ppc64_e500.py +++ b/tests/functional/test_ppc64_e500.py @@ -5,6 +5,7 @@ # SPDX-License-Identifier: GPL-2.0-or-later from qemu_test import LinuxKernelTest, Asset +from qemu_test import exec_command_and_wait_for_pattern class E500Test(LinuxKernelTest): @@ -20,5 +21,34 @@ def test_ppc64_e500(self): self.launch_kernel(self.scratch_file('day19', 'uImage'), wait_for='QEMU advent calendar') +ASSET_BR2_E5500_UIMAGE = Asset( + 'https://github.com/legoater/qemu-ppc-boot/raw/refs/heads/main/buildroot/qemu_ppc64_e5500-2023.11-8-gdcd9f0f6eb-20240104/uImage', +'2478187c455d6cca3984e9dfde9c635d824ea16236b85fd6b4809f744706deda') + +ASSET_BR2_E5500_ROOTFS = Asset( + 'https://github.com/legoater/qemu-ppc-boot/raw/refs/heads/main//buildroot/qemu_ppc64_e5500-2023.11-8-gdcd9f0f6eb-20240104/rootfs.ext2', +'9035ef97237c84c7522baaff17d25cdfca4bb7a053d5e296e902919473423d76') + +def test_ppc64_e500_buildroot(self): +self.set_machine('ppce500') +self.cpu = 'e5500' + +uimage_path = self.ASSET_BR2_E5500_UIMAGE.fetch() +rootfs_path = self.ASSET_BR2_E5500_ROOTFS.fetch() + +self.vm.set_console() +self.vm.add_args('-kernel', uimage_path, + '-append', 'root=/dev/vda', + '-drive', f'file={rootfs_path},if=virtio,format=raw', + '-snapshot', '-no-shutdown') +self.vm.launch() + +self.wait_for_console_pattern('Linux version') +self.wait_for_console_pattern('/init as init process') +self.wait_for_console_pattern('lease of 10.0.2.15') +self.wait_for_console_pattern('buildroot login:') +exec_command_and_wait_for_pattern(self, 'root', '#') +exec_command_and_wait_for_pattern(self, 'poweroff', 'Power down') + if __name__ == '__main__': LinuxKernelTest.main() -- 2.48.1
Re: [PATCH v2 01/14] meson: Drop tcg as a module
On 03/02/2025 04.18, Richard Henderson wrote: The fact that this is only enabled for x86 probably means it was done incorrectly. Certainly the set of files selected to go into the module is woefully incomplete. Drop it for now. Signed-off-by: Richard Henderson --- accel/tcg/meson.build | 11 --- meson.build | 18 +- 2 files changed, 5 insertions(+), 24 deletions(-) Looking at the cover letter https://lore.kernel.org/qemu-devel/20210624103836.2382472-1-kra...@redhat.com/ it indeed only mentions "a small fraction of tcg (x86 only)", and since there were no follow up patches, it sounds like an incomplete conversion to me. So reverting it three and a half years later sounds reasonable. Reviewed-by: Thomas Huth
[PATCH v3] spapr: nested: Add support for reporting Hostwide state counter
Add support for reporting Hostwide state counters for nested KVM pseries guests running with 'cap-nested-papr' on Qemu-TCG acting as L0-hypervisor. sPAPR supports reporting various stats counters for Guest-Management-Area(GMA) thats owned by L0-Hypervisor and are documented at [1]. These stats counters are exposed via a new bit-flag named 'getHostWideState' for the H_GUEST_GET_STATE hcall. Once this flag is set the hcall should populate the Guest-State-Elements in the requested GSB with the stat counter values. Currently following five counters are supported: * host_heap : The currently used bytes in the Hypervisor's Guest Management Space associated with the Host Partition. * host_heap_max : The maximum bytes available in the Hypervisor's Guest Management Space associated with the Host Partition. * host_pagetable: The currently used bytes in the Hypervisor's Guest Page Table Management Space associated with the Host Partition. * host_pagetable_max: The maximum bytes available in the Hypervisor's Guest Page Table Management Space associated with the Host Partition. * host_pagetable_reclaim: The amount of space in bytes that has been reclaimed due to overcommit in the Hypervisor's Guest Page Table Management Space associated with the Host Partition. At the moment '0' is being reported for all these counters as these counters doesnt align with how L0-Qemu manages Guest memory. The patch implements support for these counters by adding new members to the 'struct SpaprMachineStateNested'. These new members are then plugged into the existing 'guest_state_element_types[]' with the help of a new macro 'GSBE_NESTED_MACHINE_DW' together with a new helper 'get_machine_ptr()'. guest_state_request_check() is updated to ensure correctness of the requested GSB and finally h_guest_getset_state() is updated to handle the newly introduced flag 'GUEST_STATE_REQUEST_HOST_WIDE'. This patch is tested with the proposed linux-kernel implementation to expose these stat-counter as perf-events at [2]. [1] https://lore.kernel.org/all/20241222140247.174998-2-vaib...@linux.ibm.com [2] https://lore.kernel.org/all/20241222140247.174998-1-vaib...@linux.ibm.com Signed-off-by: Vaibhav Jain Reviewed-by: Harsh Prateek Bora --- Changelog: v2->v1: * Fixed minor nits suggested [Harsh] * s/GUEST_STATE_ELEMENT_TYPE_FLAG_HOST_WIDE/GUEST_STATE_REQUEST_HOST_WIDE/ in guest_state_request_check() [Harsh] * MInor change in the order of comparision in h_guest_getset_state() [Harsh] * Added reviewed-by of Harsh v1->v2: * Introduced new flags to correctly compare hcall flags for H_GUEST_{GET,SET}_STATE [Harsh] * Fixed ordering of new GSB elements in spapr_nested.h [Harsh] * s/GSBE_MACHINE_NESTED_DW/GSBE_NESTED_MACHINE_DW/ * Minor tweaks to patch description * Updated recipients list --- hw/ppc/spapr_nested.c | 82 ++- include/hw/ppc/spapr_nested.h | 39 ++--- 2 files changed, 96 insertions(+), 25 deletions(-) diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c index 7def8eb73b..d1aa6fc866 100644 --- a/hw/ppc/spapr_nested.c +++ b/hw/ppc/spapr_nested.c @@ -64,10 +64,9 @@ static SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState *spapr, target_ulong guestid) { -SpaprMachineStateNestedGuest *guest; - -guest = g_hash_table_lookup(spapr->nested.guests, GINT_TO_POINTER(guestid)); -return guest; +return spapr->nested.guests ? +g_hash_table_lookup(spapr->nested.guests, +GINT_TO_POINTER(guestid)) : NULL; } bool spapr_get_pate_nested_papr(SpaprMachineState *spapr, PowerPCCPU *cpu, @@ -613,6 +612,13 @@ static void *get_guest_ptr(SpaprMachineStateNestedGuest *guest, return guest; /* for GSBE_NESTED */ } +static void *get_machine_ptr(SpaprMachineStateNestedGuest *guest, + target_ulong vcpuid) +{ +SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine()); +return &spapr->nested; +} + /* * set=1 means the L1 is trying to set some state * set=0 means the L1 is trying to get some state @@ -1012,7 +1018,12 @@ struct guest_state_element_type guest_state_element_types[] = { GSBE_NESTED_VCPU(GSB_VCPU_OUT_BUFFER, 0x10, runbufout, copy_state_runbuf), GSBE_NESTED_VCPU(GSB_VCPU_OUT_BUF_MIN_SZ, 0x8, runbufout, out_buf_min_size), GSBE_NESTED_VCPU(GSB_VCPU_HDEC_EXPIRY_TB, 0x8, hdecr_expiry_tb, - copy_state_hdecr) + copy_state_hdecr), +GSBE_NESTED_MACHINE_DW(GSB_GUEST_HEAP, current_guest_heap), +GSBE_NESTED_MACHINE_DW(GSB_GUEST_HEAP_MAX, max_guest_heap)
Re: [PATCH v2 04/14] meson: Introduce CONFIG_TCG_TARGET
On 03/02/2025 04.18, Richard Henderson wrote: Use CONFIG_TCG as a project-wide flag to indicate that TCG is enabled for *some* target. Use CONFIG_TCG_TARGET to indicate that TCG is enabled for a specific target. Within a specific compilation unit, we can remap CONFIG_TCG based on CONFIG_TCG_TARGET. This allows us to avoid changes to the bulk of the code base. Within meson.build, while CONFIG_TCG may be set in config_host_data, it may not be set within config_target. Thus all references to CONFIG_TCG in source_set 'when:' need not be updated. For the moment, CONFIG_TCG and CONFIG_TCG_TARGET are identical. Signed-off-by: Richard Henderson --- include/qemu/osdep.h | 7 +++ meson.build | 11 +++ 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h index 112ebdff21..1f6f73a148 100644 --- a/include/qemu/osdep.h +++ b/include/qemu/osdep.h @@ -34,9 +34,16 @@ #include "config-host.h" #ifdef COMPILING_PER_TARGET #include CONFIG_TARGET +# ifdef CONFIG_TCG_TARGET +# undef CONFIG_TCG_TARGET +# else +# undef CONFIG_TCG +# endif #else #include "exec/poison.h" #endif +#pragma GCC poison CONFIG_TCG_TARGET Shouldn't that rather go before the "#endif" instead? Also, would it be possible to rather adjust scripts/make-config-poison.sh instead of poisoning this switch manually? Thomas /* * HOST_WORDS_BIGENDIAN was replaced with HOST_BIG_ENDIAN. Prevent it from diff --git a/meson.build b/meson.build index b72114819b..5ca3cc3f34 100644 --- a/meson.build +++ b/meson.build @@ -3270,11 +3270,14 @@ foreach target : target_dirs target_kconfig = [] foreach sym: accelerators -if sym == 'CONFIG_TCG' or target in accelerator_targets.get(sym, []) - config_target += { sym: 'y' } - config_all_accel += { sym: 'y' } - target_kconfig += [ sym + '=y' ] +if sym == 'CONFIG_TCG' + config_target += { 'CONFIG_TCG_TARGET': 'y' } +elif target not in accelerator_targets.get(sym, []) + continue endif +config_target += { sym: 'y' } +config_all_accel += { sym: 'y' } +target_kconfig += [ sym + '=y' ] endforeach if target_kconfig.length() == 0 if default_targets
Re: [PATCH v2] spapr: nested: Add support for reporting Hostwide state counter
Harsh Prateek Bora writes: > On 1/23/25 17:25, Vaibhav Jain wrote: >> Add support for reporting Hostwide state counters for nested KVM pseries >> guests running with 'cap-nested-papr' on Qemu-TCG acting as >> L0-hypervisor. sPAPR supports reporting various stats counters for >> Guest-Management-Area(GMA) thats owned by L0-Hypervisor and are documented >> at [1]. These stats counters are exposed via a new bit-flag named >> 'getHostWideState' for the H_GUEST_GET_STATE hcall. Once this flag is set >> the hcall should populate the Guest-State-Elements in the requested GSB >> with the stat counter values. Currently following five counters are >> supported: >> >> * host_heap : The currently used bytes in the >>Hypervisor's Guest Management Space >>associated with the Host Partition. >> * host_heap_max : The maximum bytes available in the >>Hypervisor's Guest Management Space >>associated with the Host Partition. >> * host_pagetable : The currently used bytes in the >>Hypervisor's Guest Page Table Management >>Space associated with the Host Partition. >> * host_pagetable_max : The maximum bytes available in the >>Hypervisor's Guest Page Table Management >>Space associated with the Host Partition. >> * host_pagetable_reclaim: The amount of space in bytes that has >>been reclaimed due to overcommit in the >>Hypervisor's Guest Page Table Management >>Space associated with the Host Partition. >> >> At the moment '0' is being reported for all these counters as these >> counters doesnt align with how L0-Qemu manages Guest memory. >> >> The patch implements support for these counters by adding new members to >> the 'struct SpaprMachineStateNested'. These new members are then plugged >> into the existing 'guest_state_element_types[]' with the help of a new >> macro 'GSBE_NESTED_MACHINE_DW' together with a new helper >> 'get_machine_ptr()'. guest_state_request_check() is updated to ensure >> correctness of the requested GSB and finally h_guest_getset_state() is >> updated to handle the newly introduced flag >> 'GUEST_STATE_REQUEST_HOST_WIDE'. >> >> This patch is tested with the proposed linux-kernel implementation to >> expose these stat-counter as perf-events at [2]. >> >> [1] >> https://lore.kernel.org/all/20241222140247.174998-2-vaib...@linux.ibm.com >> >> [2] >> https://lore.kernel.org/all/20241222140247.174998-1-vaib...@linux.ibm.com >> >> Signed-off-by: Vaibhav Jain >> --- >> Changelog: >> >> v1->v2: >> * Introduced new flags to correctly compare hcall flags >>for H_GUEST_{GET,SET}_STATE [Harsh] >> * Fixed ordering of new GSB elements in spapr_nested.h [Harsh] >> * s/GSBE_MACHINE_NESTED_DW/GSBE_NESTED_MACHINE_DW/ >> * Minor tweaks to patch description >> * Updated recipients list >> --- >> hw/ppc/spapr_nested.c | 82 ++- >> include/hw/ppc/spapr_nested.h | 39 ++--- >> 2 files changed, 96 insertions(+), 25 deletions(-) >> >> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c >> index 7def8eb73b..7f484bb3e7 100644 >> --- a/hw/ppc/spapr_nested.c >> +++ b/hw/ppc/spapr_nested.c >> @@ -64,10 +64,9 @@ static >> SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState >> *spapr, >>target_ulong guestid) >> { >> -SpaprMachineStateNestedGuest *guest; >> - >> -guest = g_hash_table_lookup(spapr->nested.guests, >> GINT_TO_POINTER(guestid)); >> -return guest; >> +return spapr->nested.guests ? >> +g_hash_table_lookup(spapr->nested.guests, >> +GINT_TO_POINTER(guestid)) : NULL; >> } >> >> bool spapr_get_pate_nested_papr(SpaprMachineState *spapr, PowerPCCPU *cpu, >> @@ -613,6 +612,13 @@ static void *get_guest_ptr(SpaprMachineStateNestedGuest >> *guest, >> return guest; /* for GSBE_NESTED */ >> } >> >> +static void *get_machine_ptr(SpaprMachineStateNestedGuest *guest, >> + target_ulong vcpuid) >> +{ >> +SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine()); >> +return &spapr->nested; >> +} >> + >> /* >>* set=1 means the L1 is trying to set some state >>* set=0 means the L1 is trying to get some state >> @@ -1012,7 +1018,12 @@ struct guest_state_element_type >> guest_state_element_types[] = { >> GSBE_NESTED_VCPU(GSB_VCPU_OUT_BUFFER, 0x10, runbufout, >> copy_state_runbuf), >> GSBE_NESTED_VCPU(GSB_VCPU_OUT_BUF_MIN_SZ, 0x8, runbufout, >> out_buf_min_size), >> GSBE_NESTED_VCPU(GSB_VCPU_HDEC_EXPIRY_TB, 0x8, hdecr_expiry_tb, >> - copy_state_hdecr) >> + copy_state_hdecr), >> +GSBE_NESTED_MACHINE_DW(GSB_GUEST_HEAP, curr
Re: [PATCH] tests/functional: Extend the ppc64 e500 test
On 03/02/2025 10.57, Cédric Le Goater wrote: The test sequence boots a ppce500 machine from kernel and disk. The buildroot is built with the qemu_ppc64_e5500_defconfig config. Signed-off-by: Cédric Le Goater --- tests/functional/test_ppc64_e500.py | 30 + 1 file changed, 30 insertions(+) diff --git a/tests/functional/test_ppc64_e500.py b/tests/functional/test_ppc64_e500.py index b92fe0b0e75e..f21d7d84177e 100755 --- a/tests/functional/test_ppc64_e500.py +++ b/tests/functional/test_ppc64_e500.py @@ -5,6 +5,7 @@ # SPDX-License-Identifier: GPL-2.0-or-later from qemu_test import LinuxKernelTest, Asset +from qemu_test import exec_command_and_wait_for_pattern class E500Test(LinuxKernelTest): @@ -20,5 +21,34 @@ def test_ppc64_e500(self): self.launch_kernel(self.scratch_file('day19', 'uImage'), wait_for='QEMU advent calendar') +ASSET_BR2_E5500_UIMAGE = Asset( + 'https://github.com/legoater/qemu-ppc-boot/raw/refs/heads/main/buildroot/qemu_ppc64_e5500-2023.11-8-gdcd9f0f6eb-20240104/uImage', +'2478187c455d6cca3984e9dfde9c635d824ea16236b85fd6b4809f744706deda') + +ASSET_BR2_E5500_ROOTFS = Asset( + 'https://github.com/legoater/qemu-ppc-boot/raw/refs/heads/main//buildroot/qemu_ppc64_e5500-2023.11-8-gdcd9f0f6eb-20240104/rootfs.ext2', +'9035ef97237c84c7522baaff17d25cdfca4bb7a053d5e296e902919473423d76') Hmm, the advent calendar test that is already available in this file is also based on build root ... so I think we don't need both tests here. IIRC I built most of the advent calendar images without networking stack (to keep them smaller), so your image is likely better suited here, thus I'd suggest to remove the advent calendar image now when you add your new test. WDYT? Thomas
Re: [PATCH v2 03/14] plugins: Uninline qemu_plugin_add_opts
On 03/02/2025 04.18, Richard Henderson wrote: No need to expand this function inline. Unexport qemu_plugin_opts to match. Signed-off-by: Richard Henderson --- include/qemu/plugin.h | 9 + plugins/loader.c | 7 ++- 2 files changed, 7 insertions(+), 9 deletions(-) Reviewed-by: Thomas Huth
Re: [PATCH v2 02/14] tcg: Move stubs in tcg/perf.h to tcg/perf-stubs.c
On 03/02/2025 04.18, Richard Henderson wrote: These are not called so frequently as to be performance sensitive. Signed-off-by: Richard Henderson --- include/tcg/perf.h | 23 --- tcg/perf-stubs.c | 26 ++ tcg/meson.build| 2 ++ 3 files changed, 28 insertions(+), 23 deletions(-) create mode 100644 tcg/perf-stubs.c Reviewed-by: Thomas Huth
Re: [PATCH v2 05/14] tcg: Link only when required in system mode
On 03/02/2025 04.18, Richard Henderson wrote: Rather than unconditional linkage via system_ss, conditinally include the static library via specific_ss. This will elide the code when CONFIG_TCG is disabled for a specific target. Signed-off-by: Richard Henderson --- tcg/meson.build | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/tcg/meson.build b/tcg/meson.build index 2977df5862..8266bcb324 100644 --- a/tcg/meson.build +++ b/tcg/meson.build @@ -49,4 +49,8 @@ libtcg_system = static_library('tcg_system', tcg_system = declare_dependency(objects: libtcg_system.extract_all_objects(recursive: false), dependencies: tcg_ss.dependencies()) -system_ss.add(tcg_system) + +specific_ss.add(when: ['CONFIG_SYSTEM_ONLY', 'CONFIG_TCG'], if_true: tcg_system) +if host_os == 'linux' + specific_ss.add(when: 'CONFIG_TCG', if_false: files('perf-stubs.c')) +endif Reviewed-by: Thomas Huth
Re: [PATCH v2 06/14] plugins: Link only when required in system mode
On 03/02/2025 04.18, Richard Henderson wrote: Provide out-of-line versions of some of the qemu/plugin.h API. These will be referenced with --enable-plugin, but CONFIG_TCG is disabled for a specific target. Signed-off-by: Richard Henderson --- plugins/stubs.c | 49 + plugins/meson.build | 5 - 2 files changed, 53 insertions(+), 1 deletion(-) create mode 100644 plugins/stubs.c Reviewed-by: Thomas Huth
Re: [PATCH v2 07/14] accel/stubs: Expand stubs for TCG
On 03/02/2025 04.18, Richard Henderson wrote: Add tcg_allowed, qmp_x_query_jit, qmp_x_query_opcount. These are referenced when CONFIG_TCG is enabled globally, but not for a specific target. Signed-off-by: Richard Henderson --- accel/stubs/tcg-stub.c | 24 1 file changed, 24 insertions(+) diff --git a/accel/stubs/tcg-stub.c b/accel/stubs/tcg-stub.c index 7f4208fddf..9c2e2dc6e1 100644 --- a/accel/stubs/tcg-stub.c +++ b/accel/stubs/tcg-stub.c @@ -13,6 +13,18 @@ #include "qemu/osdep.h" #include "exec/tb-flush.h" #include "exec/exec-all.h" +#include "qapi/error.h" + +/* + * This file *ought* to be built once and linked only when required. + * However, it is built per-target, which means qemu/osdep.h has already + * undef'ed CONFIG_TCG, which hides the auto-generated declaration. So why don't we only build this file once? Thomas + */ +#define CONFIG_TCG +#include "qapi/qapi-commands-machine.h" + + +const bool tcg_allowed = false; void tb_flush(CPUState *cpu) { @@ -27,3 +39,15 @@ G_NORETURN void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc) { g_assert_not_reached(); } + +HumanReadableText *qmp_x_query_jit(Error **errp) +{ +error_setg(errp, "JIT information is only available with accel=tcg"); +return NULL; +} + +HumanReadableText *qmp_x_query_opcount(Error **errp) +{ +error_setg(errp, "Opcode count information is only available with accel=tcg"); +return NULL; +}
Re: [PATCH v2 08/14] target/mips: Protect objects with CONFIG_TCG
On 03/02/2025 04.18, Richard Henderson wrote: Hack around mips32 host allowing kvm acceleration of mips64 guest, but tcg is disabled. Signed-off-by: Richard Henderson --- target/mips/tcg/meson.build| 4 ++-- target/mips/tcg/system/meson.build | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-) Reviewed-by: Thomas Huth
Re: [PATCH v2 09/14] gitlab: Replace aarch64 with arm in cross-i686-tci build
On 03/02/2025 04.18, Richard Henderson wrote: Configuration of 64-bit host on 32-bit guest will shortly be denied. Use a 32-bit guest instead. Signed-off-by: Richard Henderson --- .gitlab-ci.d/crossbuilds.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.gitlab-ci.d/crossbuilds.yml b/.gitlab-ci.d/crossbuilds.yml index 95dfc39224..7ae0f966f1 100644 --- a/.gitlab-ci.d/crossbuilds.yml +++ b/.gitlab-ci.d/crossbuilds.yml @@ -61,7 +61,7 @@ cross-i686-tci: variables: IMAGE: debian-i686-cross ACCEL: tcg-interpreter -EXTRA_CONFIGURE_OPTS: --target-list=i386-softmmu,i386-linux-user,aarch64-softmmu,aarch64-linux-user,ppc-softmmu,ppc-linux-user --disable-plugins --disable-kvm +EXTRA_CONFIGURE_OPTS: --target-list=i386-softmmu,i386-linux-user,arm-softmmu,arm-linux-user,ppc-softmmu,ppc-linux-user --disable-plugins --disable-kvm # Force tests to run with reduced parallelism, to see whether this # reduces the flakiness of this CI job. The CI # environment by default shows us 8 CPUs and so we Reviewed-by: Thomas Huth
Re: [PATCH v2 10/14] configure: Define TARGET_LONG_BITS in configs/targets/*.mak
On 03/02/2025 04.18, Richard Henderson wrote: Define TARGET_LONG_BITS in each target's configure fragment. Do this without removing the define in target/*/cpu-param.h so that errors are caught like so: In file included from .../src/include/exec/cpu-defs.h:26, from ../src/target/hppa/cpu.h:24, from ../src/linux-user/qemu.h:4, from ../src/linux-user/hppa/cpu_loop.c:21: ../src/target/hppa/cpu-param.h:11: error: "TARGET_LONG_BITS" redefined [-Werror] 11 | #define TARGET_LONG_BITS 64 | In file included from .../src/include/qemu/osdep.h:36, from ../src/linux-user/hppa/cpu_loop.c:20: ./hppa-linux-user-config-target.h:32: note: this is the location of the previous definition 32 | #define TARGET_LONG_BITS 32 | cc1: all warnings being treated as errors Signed-off-by: Richard Henderson --- [...]> diff --git a/configs/targets/hppa-linux-user.mak b/configs/targets/hppa-linux-user.mak index 8e0a80492f..4295cf384e 100644 --- a/configs/targets/hppa-linux-user.mak +++ b/configs/targets/hppa-linux-user.mak @@ -3,3 +3,5 @@ TARGET_ABI32=y TARGET_SYSTBL_ABI=common,32 TARGET_SYSTBL=syscall.tbl TARGET_BIG_ENDIAN=y +# Compromise to ease maintainence vs system mode s/maintainence/maintenance/ diff --git a/configs/targets/mipsn32-linux-user.mak b/configs/targets/mipsn32-linux-user.mak index 206095da64..39ae214633 100644 --- a/configs/targets/mipsn32-linux-user.mak +++ b/configs/targets/mipsn32-linux-user.mak @@ -5,3 +5,4 @@ TARGET_BASE_ARCH=mips TARGET_SYSTBL_ABI=n32 TARGET_SYSTBL=syscall_n32.tbl TARGET_BIG_ENDIAN=y +TARGET_LONG_BITS=64 Why is this 64 ? diff --git a/configs/targets/mipsn32el-linux-user.mak b/configs/targets/mipsn32el-linux-user.mak index ca2a3ed753..d9b61d6990 100644 --- a/configs/targets/mipsn32el-linux-user.mak +++ b/configs/targets/mipsn32el-linux-user.mak @@ -4,3 +4,4 @@ TARGET_ABI32=y TARGET_BASE_ARCH=mips TARGET_SYSTBL_ABI=n32 TARGET_SYSTBL=syscall_n32.tbl +TARGET_LONG_BITS=64 dito? diff --git a/configs/targets/sparc32plus-linux-user.mak b/configs/targets/sparc32plus-linux-user.mak index 6cc8fa516b..7a16934fd1 100644 --- a/configs/targets/sparc32plus-linux-user.mak +++ b/configs/targets/sparc32plus-linux-user.mak @@ -5,3 +5,4 @@ TARGET_ABI_DIR=sparc TARGET_SYSTBL_ABI=common,32 TARGET_SYSTBL=syscall.tbl TARGET_BIG_ENDIAN=y +TARGET_LONG_BITS=64 Same question here: Why 64? If this isn't a mistake, could you maybe add a comment? Thanks, Thomas
Re: [PATCH v2 11/14] target/*: Remove TARGET_LONG_BITS from cpu-param.h
On 03/02/2025 04.18, Richard Henderson wrote: This is now handled by the configs/targets/*.mak fragment. Signed-off-by: Richard Henderson --- target/alpha/cpu-param.h | 2 -- target/arm/cpu-param.h| 2 -- target/avr/cpu-param.h| 1 - target/hexagon/cpu-param.h| 1 - target/hppa/cpu-param.h | 2 -- target/i386/cpu-param.h | 2 -- target/loongarch/cpu-param.h | 1 - target/m68k/cpu-param.h | 1 - target/microblaze/cpu-param.h | 2 -- target/mips/cpu-param.h | 5 - target/openrisc/cpu-param.h | 1 - target/ppc/cpu-param.h| 2 -- target/riscv/cpu-param.h | 2 -- target/rx/cpu-param.h | 1 - target/s390x/cpu-param.h | 1 - target/sh4/cpu-param.h| 1 - target/sparc/cpu-param.h | 2 -- target/tricore/cpu-param.h| 1 - target/xtensa/cpu-param.h | 1 - 19 files changed, 31 deletions(-) Reviewed-by: Thomas Huth
Re: [PATCH v2 13/14] meson: Deprecate 32-bit host support
On 03/02/2025 04.18, Richard Henderson wrote: We deprecated i686 system mode support for qemu 8.0. However, to make real cleanups to TCG we need to deprecate all 32-bit hosts. Signed-off-by: Richard Henderson --- docs/about/deprecated.rst | 7 +++ meson.build | 6 ++ 2 files changed, 9 insertions(+), 4 deletions(-) Reviewed-by: Thomas Huth
Re: [PATCH v3 06/14] acpi/ghes: only set hw_error_le or hest_addr_le
On Fri, 31 Jan 2025 18:42:47 +0100 Mauro Carvalho Chehab wrote: > The hw_error_le pointer is used for legacy support (virt-9.2). > Starting from virt-10.0, HEST table is accessed via hest_addr_le. > > Remove fw_cfg logic for legacy support if virt is 10.0 or upper. > > Signed-off-by: Mauro Carvalho Chehab Reviewed-by: Jonathan Cameron
Re: [PATCH v3 08/14] acpi/ghes: Cleanup the code which gets ghes ged state
On Fri, 31 Jan 2025 18:42:49 +0100 Mauro Carvalho Chehab wrote: > Move the check logic into a common function and simplify the > code which checks if GHES is enabled and was properly setup. > > Signed-off-by: Mauro Carvalho Chehab > Reviewed-by: Jonathan Cameron > Reviewed-by: Igor Mammedov One minor comment inline on a change I think should be in an earlier patch. > -void ghes_record_cper_errors(const void *cper, size_t len, > +void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t > len, > uint16_t source_id, Error **errp) > { > uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register; > -AcpiGedState *acpi_ged_state; > -AcpiGhesState *ags; > > if (len > ACPI_GHES_MAX_RAW_DATA_LENGTH) { > error_setg(errp, "GHES CPER record is too big: %zd", len); > return; > } > > -acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED, > - NULL)); > -if (!acpi_ged_state) { > -error_setg(errp, "Can't find ACPI_GED object"); > -return; > -} > -ags = &acpi_ged_state->ghes_state; > - > -if (!ags->hest_addr_le) { > +if (!ags->use_hest_addr) { Should this change be moved back to patch 3? use_hest_addr was available at that point and it would reduce churn a tiny bit. > get_hw_error_offsets(le64_to_cpu(ags->hw_error_le), > &cper_addr, &read_ack_register_addr); > } else { > @@ -547,11 +531,6 @@ void ghes_record_cper_errors(const void *cper, size_t > len, > &cper_addr, &read_ack_register_addr, errp); > } > > -if (!cper_addr) { > -error_setg(errp, "can not find Generic Error Status Block"); > -return; > -} > - > cpu_physical_memory_read(read_ack_register_addr, > &read_ack_register, sizeof(read_ack_register)); >
Re: [PATCH v3 03/14] acpi/ghes: Use HEST table offsets when preparing GHES records
On Fri, 31 Jan 2025 18:42:44 +0100 Mauro Carvalho Chehab wrote: > There are two pointers that are needed during error injection: > > 1. The start address of the CPER block to be stored; > 2. The address of the ack. > > It is preferable to calculate them from the HEST table. This allows > checking the source ID, the size of the table and the type of the > HEST error block structures. > > Yet, keep the old code, as this is needed for migration purposes. > > Signed-off-by: Mauro Carvalho Chehab Tiny niggle on patch split up inline. Either way Reviewed-by: Jonathan Cameron > @@ -212,14 +237,6 @@ static void build_ghes_error_table(GArray > *hardware_errors, BIOSLinker *linker, > { > int i, error_status_block_offset; > > -/* > - * TODO: Current version supports only one source. > - * A further patch will drop this check, after adding a proper migration > - * code, as, for the code to work, we need to store a bios pointer to the > - * HEST table. > - */ > -assert(num_sources == 1); > - > /* Build error_block_address */ > for (i = 0; i < num_sources; i++) { > build_append_int_noprefix(hardware_errors, 0, sizeof(uint64_t)); > @@ -352,6 +369,14 @@ void acpi_build_hest(GArray *table_data, GArray > *hardware_errors, > .oem_id = oem_id, .oem_table_id = oem_table_id }; > uint32_t hest_offset; > int i; > +AcpiGedState *acpi_ged_state; > +AcpiGhesState *ags = NULL; > + > +acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED, > + NULL)); > +if (acpi_ged_state) { > +ags = &acpi_ged_state->ghes_state; > +} > > hest_offset = table_data->len; > > @@ -371,10 +396,12 @@ void acpi_build_hest(GArray *table_data, GArray > *hardware_errors, > * Tell firmware to write into GPA the address of HEST via fw_cfg, > * once initialized. > */ > -bios_linker_loader_write_pointer(linker, > - ACPI_HEST_ADDR_FW_CFG_FILE, 0, > - sizeof(uint64_t), > - ACPI_BUILD_TABLE_FILE, hest_offset); > +if (ags->use_hest_addr) { Maybe move ags->use_hest_addr introduction to previous patch to avoid churn here? It's not set yet anyway. > +bios_linker_loader_write_pointer(linker, > + ACPI_HEST_ADDR_FW_CFG_FILE, 0, > + sizeof(uint64_t), > + ACPI_BUILD_TABLE_FILE, hest_offset); > +} > } > > void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s, > @@ -420,6 +447,78 @@ static void get_hw_error_offsets(uint64_t ghes_addr, > *read_ack_register_addr = ghes_addr + sizeof(uint64_t); > }
Re: [PATCH v2 12/14] meson: Disallow 64-bit on 32-bit TCG emulation
On 03/02/2025 04.18, Richard Henderson wrote: For system mode, we can rarely support the amount of RAM that the guest requires. Emulation is restricted to round-robin mode, which solves many of the atomicity issues, but not those associated with virtio. In any case, round-robin does nothing to help the speed of emulation. For user mode, most emulation does not succeed at all. Most of the time we cannot even load 64-bit non-PIE binaries due to lack of a 64-bit address space. Threads are run in parallel, not round-robin, which means that atomicity is not handled. Signed-off-by: Richard Henderson --- meson.build | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) Reviewed-by: Thomas Huth
Re: [PATCH v3 12/14] tests/acpi: virt: add a HEST table to aarch64 virt and update DSDT
On Fri, 31 Jan 2025 18:42:53 +0100 Mauro Carvalho Chehab wrote: > --- a/DSDT.dsl2025-01-28 09:38:15.155347858 +0100 > +++ b/DSDT.dsl2025-01-28 09:39:01.684836954 +0100 > @@ -9,9 +9,9 @@ > * > * Original Table Header: > * Signature"DSDT" > - * Length 0x1516 (5398) > + * Length 0x1542 (5442) > * Revision 0x02 > - * Checksum 0x0F > + * Checksum 0xE9 > * OEM ID "BOCHS " > * OEM Table ID "BXPC" > * OEM Revision 0x0001 (1) > @@ -1931,6 +1931,11 @@ > { > Notify (PWRB, 0x80) // Status Change > } > + > +If (((Local0 & 0x10) == 0x10)) > +{ > +Notify (GEDD, 0x80) // Status Change > +} > } > } > > @@ -1939,6 +1944,12 @@ > Name (_HID, "PNP0C0C" /* Power Button Device */) // _HID: > Hardware ID > Name (_UID, Zero) // _UID: Unique ID > } > + > +Device (GEDD) > +{ > +Name (_HID, "PNP0C33" /* Error Device */) // _HID: Hardware ID > +Name (_UID, Zero) // _UID: Unique ID > +} > } > } > > Signed-off-by: Mauro Carvalho Chehab Diff looks good. Reviewed-by: Jonathan Cameron
Re: [PATCH v3 14/14] scripts/ghes_inject: add a script to generate GHES error inject
On Fri, 31 Jan 2025 18:42:55 +0100 Mauro Carvalho Chehab wrote: > Using the QMP GHESv2 API requires preparing a raw data array > containing a CPER record. > > Add a helper script with subcommands to prepare such data. > > Currently, only ARM Processor error CPER record is supported, by > using: > $ ghes_inject.py arm > > which produces those warnings on Linux: > > [ 705.032426] [Firmware Warn]: GHES: Unhandled processor error type 0x02: > cache error > [ 774.866308] {4}[Hardware Error]: Hardware error from APEI Generic Hardware > Error Source: 1 > [ 774.866583] {4}[Hardware Error]: event severity: recoverable > [ 774.866738] {4}[Hardware Error]: Error 0, type: recoverable > [ 774.866889] {4}[Hardware Error]: section_type: ARM processor error > [ 774.867048] {4}[Hardware Error]: MIDR: 0x000f0510 > [ 774.867189] {4}[Hardware Error]: running state: 0x0 > [ 774.867321] {4}[Hardware Error]: Power State Coordination Interface > state: 0 > [ 774.867511] {4}[Hardware Error]: Error info structure 0: > [ 774.867679] {4}[Hardware Error]: num errors: 2 > [ 774.867801] {4}[Hardware Error]:error_type: 0x02: cache error > [ 774.867962] {4}[Hardware Error]:error_info: 0x0091000f > [ 774.868124] {4}[Hardware Error]: transaction type: Data Access > [ 774.868280] {4}[Hardware Error]: cache error, operation type: Data > write > [ 774.868465] {4}[Hardware Error]: cache level: 2 > [ 774.868592] {4}[Hardware Error]: processor context not corrupted > [ 774.868774] [Firmware Warn]: GHES: Unhandled processor error type 0x02: > cache error > > Such script allows customizing the error data, allowing to change > all fields at the record. Please use: > > $ ghes_inject.py arm -h > > For more details about its usage. > > Signed-off-by: Mauro Carvalho Chehab Thanks for examples. Given my poor python skills take this one with a pinch of salt. Reviewed-by: Jonathan Cameron
Re: [PATCH 0/2] target/i386: Fix 0 * Inf + QNaN regression
On Fri, 24 Jan 2025 at 17:22, Paolo Bonzini wrote: > > Queued, thanks. Thanks; do you plan to send a pullreq with these in soon? I ask because the Arm FEAT_AFP set is now ready to land and it has a dependency on these. thanks -- PMM
RE: [PATCH v1 14/18] hw/arm/aspeed: Add SoC and Machine Support for AST2700 A1
Hi Andrew, > From: Andrew Jeffery > Sent: Thursday, January 30, 2025 12:22 PM > To: Jamin Lin ; Cédric Le Goater ; > Peter Maydell ; Steven Lee > ; Troy Lee ; Joel Stanley > ; open list:ASPEED BMCs ; open > list:All patches CC here > Cc: Troy Lee ; Yunlin Tang > > Subject: Re: [PATCH v1 14/18] hw/arm/aspeed: Add SoC and Machine Support > for AST2700 A1 > > On Tue, 2025-01-21 at 15:04 +0800, Jamin Lin wrote: > > The memory map for AST2700 A1 remains compatible with AST2700 A0. > > However, the IRQ mapping has been updated for AST2700 A1, with GIC > > interrupts now ranging from 192 to 201. Add a new IRQ map table for > > AST2700 A1. > > > > Introduce "aspeed_machine_ast2700_evb_class_init" to initialize the > > AST2700 EVB > > machine. Add "aspeed_soc_ast2700_class_init" to initialize the > > AST2700 A1 SoC. > > > > Signed-off-by: Jamin Lin > > --- > > hw/arm/aspeed.c | 24 + > > hw/arm/aspeed_ast27x0.c | 80 > > + > > 2 files changed, 104 insertions(+) > > > > diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c index > > 402d55c556..254fa5316d 100644 > > --- a/hw/arm/aspeed.c > > +++ b/hw/arm/aspeed.c > > @@ -1672,6 +1672,26 @@ static void > > aspeed_machine_ast2700a0_evb_class_init(ObjectClass *oc, void *data) > > mc->default_ram_size = 1 * GiB; > > aspeed_machine_class_init_cpus_defaults(mc); > > } > > + > > +static void aspeed_machine_ast2700_evb_class_init(ObjectClass *oc, > > void *data) > > +{ > > + MachineClass *mc = MACHINE_CLASS(oc); > > + AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc); > > + > > + mc->desc = "Aspeed AST2700 EVB (Cortex-A35)"; > > + amc->soc_name = "ast2700-a1"; > > + amc->hw_strap1 = AST2700_EVB_HW_STRAP1; > > + amc->hw_strap2 = AST2700_EVB_HW_STRAP2; > > + amc->fmc_model = "w25q01jvq"; > > + amc->spi_model = "w25q512jv"; > > + amc->num_cs = 2; > > + amc->macs_mask = ASPEED_MAC0_ON | ASPEED_MAC1_ON | > > ASPEED_MAC2_ON; > > + amc->uart_default = ASPEED_DEV_UART12; > > + amc->i2c_init = ast2700_evb_i2c_init; > > + mc->default_ram_size = 1 * GiB; > > + aspeed_machine_class_init_cpus_defaults(mc); > > +} > > + > > #endif > > > > static void aspeed_machine_qcom_dc_scm_v1_class_init(ObjectClass > > *oc, > > @@ -1798,6 +1818,10 @@ static const TypeInfo aspeed_machine_types[] = > > { > > .name = > MACHINE_TYPE_NAME("ast2700a0-evb"), > > .parent = TYPE_ASPEED_MACHINE, > > .class_init = aspeed_machine_ast2700a0_evb_class_init, > > + }, { > > + .name = MACHINE_TYPE_NAME("ast2700-evb"), > > + .parent = TYPE_ASPEED_MACHINE, > > + .class_init = aspeed_machine_ast2700_evb_class_init, > > #endif > > }, { > > .name = TYPE_ASPEED_MACHINE, diff --git > > a/hw/arm/aspeed_ast27x0.c b/hw/arm/aspeed_ast27x0.c index > > b32c4fcc35..e0a29c9053 100644 > > --- a/hw/arm/aspeed_ast27x0.c > > +++ b/hw/arm/aspeed_ast27x0.c > > @@ -119,6 +119,52 @@ static const int aspeed_soc_ast2700a0_irqmap[] = > > { > > [ASPEED_DEV_SDHCI] = 133, > > }; > > > > +static const int aspeed_soc_ast2700_irqmap[] = { > > + [ASPEED_DEV_UART0] = 196, > > + [ASPEED_DEV_UART1] = 196, > > + [ASPEED_DEV_UART2] = 196, > > + [ASPEED_DEV_UART3] = 196, > > + [ASPEED_DEV_UART4] = 8, > > + [ASPEED_DEV_UART5] = 196, > > + [ASPEED_DEV_UART6] = 196, > > + [ASPEED_DEV_UART7] = 196, > > + [ASPEED_DEV_UART8] = 196, > > + [ASPEED_DEV_UART9] = 196, > > + [ASPEED_DEV_UART10] = 196, > > + [ASPEED_DEV_UART11] = 196, > > + [ASPEED_DEV_UART12] = 196, > > + [ASPEED_DEV_FMC] = 195, > > + [ASPEED_DEV_SDMC] = 0, > > + [ASPEED_DEV_SCU] = 12, > > + [ASPEED_DEV_ADC] = 194, > > + [ASPEED_DEV_XDMA] = 5, > > + [ASPEED_DEV_EMMC] = 15, > > + [ASPEED_DEV_GPIO] = 194, > > + [ASPEED_DEV_RTC] = 13, > > + [ASPEED_DEV_TIMER1] = 16, > > + [ASPEED_DEV_TIMER2] = 17, > > + [ASPEED_DEV_TIMER3] = 18, > > + [ASPEED_DEV_TIMER4] = 19, > > + [ASPEED_DEV_TIMER5] = 20, > > + [ASPEED_DEV_TIMER6] = 21, > > + [ASPEED_DEV_TIMER7] = 22, > > + [ASPEED_DEV_TIMER8] = 23, > > + [ASPEED_DEV_WDT] = 195, > > + [ASPEED_DEV_PWM] = 195, > > + [ASPEED_DEV_LPC] = 192, > > + [ASPEED_DEV_IBT] = 192, > > + [ASPEED_DEV_I2C] = 194, > > + [ASPEED_DEV_PECI] = 197, > > + [ASPEED_DEV_ETH1] = 196, > > + [ASPEED_DEV_ETH2] = 196, > > + [ASPEED_DEV_ETH3] = 196, > > + [ASPEED_DEV_HACE] = 4, > > + [ASPEED_DEV_KCS] = 192, > > + [ASPEED_DEV_DP] = 28, > > + [ASPEED_DEV_I3C] = 195, > > + [ASPEED_DEV_SDHCI] = 197, > > +}; > > Bit of a nit, but can we sort this table? Perhaps by interrupt value? Thanks for review and suggestion. Will update this table Jamin
Re: [PATCH v9 0/4] chardev: implement backend chardev multiplexing
Hi Marc-Andre, Do you plan to pull the latest version of this series, or is there something which I have to address? Please let me know. Thanks. -- Roman On Thu, Jan 23, 2025 at 9:53 AM Roman Penyaev wrote: > > Mux is a character backend (host side) device, which multiplexes > multiple frontends with one backend device. The following is a > few lines from the QEMU manpage [1]: > > A multiplexer is a "1:N" device, and here the "1" end is your > specified chardev backend, and the "N" end is the various parts > of QEMU that can talk to a chardev. > > But sadly multiple backends are not supported. > > This work implements a new chardev backend `hub` device, which > aggregates input from multiple backend devices and forwards it to a > single frontend device. Additionally, `hub` device takes the output > from the frontend device and sends it back to all the connected > backend devices. This allows for seamless interaction between > different backend devices and a single frontend interface. > > The motivation is the EVE project [2], where it would be very > convenient to have a virtio console frontend device on the guest that > can be controlled from multiple backend devices, namely VNC and local > TTY emulator. The following is an example of the QEMU command line: > > -chardev pty,path=/tmp/pty,id=pty0 \ > -chardev vc,id=vc0 \ > -chardev hub,id=hub0,chardevs.0=pty0,chardevs.1=vc0 \ > -device virtconsole,chardev=hub0 \ > -vnc 0.0.0.0:0 > > Which creates two backend devices: > > * Text virtual console (`vc0`) > * A pseudo TTY (`pty0`) connected to the single virtio hvc console with the > help of a new backend aggregator (`hub0`) > > `vc0` renders text to an image, which can be shared over the VNC > protocol. `pty0` is a pseudo TTY backend which provides bidirectional > communication to the virtio hvc console. > > Once QEMU starts, the VNC client and any TTY emulator can be used to > control a single hvc console. For example, these two different > consoles should have similar input and output due to the buffer > aggregation: > > # Start TTY emulator > tio /tmp/pty > > # Start VNC client and switch to virtual console Ctrl-Alt-2 > vncviewer :0 > > 'chardevs.N' list syntax is used for the sake of compatibility with > the representation of JSON lists in 'key=val' pairs format of the > util/keyval.c, despite the fact that modern QAPI way of parsing, > namely qobject_input_visitor_new_str(), is not used. Choice of keeping > QAPI list syntax may help to smoothly switch to modern parsing in the > future. > > v8 .. v9: > > * Incorporate Reviewed-by tags > > v7 .. v8: > > * No need for a separate `->frontend` pointer in the hub device > structure, use `hub->parent.fe` directly. > * Remove special handling of !EAGAIN error while serving write > to all backends. This should be safe, because detached backends > are handled by the `->be_open` flag check. > * Combine `hub_chr_write_to_all()` and `hub_chr_write()` calls. > * Fix docs generation: no single backtick, but double, so not > a `hub` but ``hub`` in qemu-options.hx > > v6 .. v7: > > After discussing v6 it was decided to: > > * Rename "multiplexer" to "aggregator" > * Rename "mux-be" device type to "hub" > * Drop all changes related to the original multiplexer implementation > > Code changes: > > * Added counting of CHR_EVENT_OPENED and CHR_EVENT_CLOSED events > coming from backend devices. This prevents frontend devices from > closing if one of the backend devices has been disconnected. The > logic is simple: "the last one turns off the light". > > v5 .. v6: > > * Rebased on latest master > * Changed how chardev is attached to a multiplexer: with version 6 > mux should specify list elements with ID of chardevs: > > chardevs.0=ID[,chardevs.N=ID] > > 'chardevs.N' list syntax is used for the sake of compatibility with > the representation of JSON lists in 'key=val' pairs format of the > util/keyval.c, despite the fact that modern QAPI way of parsing, > namely qobject_input_visitor_new_str(), is not used. Choice of keeping > QAPI list syntax may help to smoothly switch to modern parsing in the > future. > > v4 .. v5: > > * Spelling fixes in qemu-options description > * Memory leaks fixes in mux-be tests > * Add sanity checks to chardev to avoid stacking of mux devices > * Add corresponding unit test case to cover the creation of stacked > muxers: `-chardev mux-be,mux-id-be=ID`, which is forbidden > * Reflect the fact that stacking is not supported in the documentation > > v3 .. v4: > > * Rebase on latest chardev changes > * Add unit tests which test corner cases: >* Inability to remove mux with active frontend >* Inability to add more chardevs to a mux than `MUX_MAX` >* Inability to mix mux-fe and mux-be for the same chardev > > v2 .. v3: > > * Split frontend and backend multiplexer implementations and > move them to separate files: char-mux-fe.c and char-mux-be.c > > v1 .. v2: > > * Separate type f
Re: [PATCH v6 00/10] Support virtio-gpu DRM native context
Dmitry Osipenko writes: > On 1/27/25 19:17, Alex Bennée wrote: > ... >> I'm still seeing corruption with -display gtk,gl=on on my x86 system >> BTW. I would like to understand if that is a problem with QEMU, GTK or >> something else in the stack before we merge. > > I reproduced the display mirroring/corruption issue and bisected it to > the following commit. The problem only happens when QEMU/GTK uses > Wayland display directly, while previously I was running QEMU with > XWayland that doesn't have the problem. Why this change breaks dmabuf > displaying with Wayland/GTK is unclear. Ahh that makes sense - I obviously forgot to mention I'm running sway/wayland across both machines. > Reverting commit fixes the bug. > > +Dongwon Kim +Vivek Kasireddy > > commit 77bf310084dad38b3a2badf01766c659056f1cf2 > Author: Dongwon Kim > Date: Fri Apr 26 15:50:59 2024 -0700 > > ui/gtk: Draw guest frame at refresh cycle > > Draw routine needs to be manually invoked in the next refresh > if there is a scanout blob from the guest. This is to prevent > a situation where there is a scheduled draw event but it won't > happen bacause the window is currently in inactive state > (minimized or tabified). If draw is not done for a long time, > gl_block timeout and/or fence timeout (on the guest) will happen > eventually. > > v2: Use gd_gl_area_draw(vc) in gtk-gl-area.c > > Suggested-by: Vivek Kasireddy > Cc: Gerd Hoffmann > Cc: Marc-André Lureau > Cc: Daniel P. Berrangé > Signed-off-by: Dongwon Kim > Acked-by: Marc-André Lureau > Message-Id: <20240426225059.3871283-1-dongwon@intel.com> Maybe a race on: QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf; ? -- Alex Bennée Virtualisation Tech Lead @ Linaro
Re: [PATCH 0/1] meson: Deprecate 32-bit host systems
Peter Maydell writes: > On Wed, 29 Jan 2025 at 06:23, Thomas Huth wrote: >> So unless someone complains immediately with a good reason, I'm also in >> favor of marking it as deprecated now. If then someone complains during the >> deprecation period, we still can reconsider and remove the deprecation note >> again. > > Well, I mean the reason would be that I suspect we do still have > users who are using QEMU for some purposes on 32-bit arm hosts. > That doesn't mean they're trying to run massively complex or > high memory guests or that they care that our whole test suite > doesn't run. > > I'm not really strongly opposed to dropping 32-bit host support, > but I don't think a thread on qemu-devel is exactly likely to > get the attention of the people who might be using this > functionality. (You could argue that functionality without > representation among the developer community is fair game > for being dumped even if it has users, of course.) FWIW random internet poll: https://mastodon.org.uk/deck/@stsquad/113905257703721811 26% 32 bit 74% 64 bit with 41 respondents. > > -- PMM -- Alex Bennée Virtualisation Tech Lead @ Linaro
Re: [PULL 0/8] Ui patches
On Mon, Feb 3, 2025 at 7:58 AM wrote: > > From: Marc-André Lureau > > The following changes since commit 6fccaa2fba391815308a746d68f7fa197bc93586: > > Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into > staging (2025-02-02 11:09:10 -0500) > > are available in the Git repository at: > > https://gitlab.com/marcandre.lureau/qemu.git tags/ui-pull-request > > for you to fetch changes up to f327a2cea1502a8cad2beb13bc4e2c12b95b10ba: > > dbus: add -audio dbus nsamples option (2025-02-03 13:58:08 +0400) > > > UI/chardev-related patch queue > > > > Marc-André Lureau (4): > ui/dbus: on win32, allow ANONYMOUS with p2p Hi Marc-André, There is an unexpected submodule update in this commit. Although it's not included in the patch email sent to the mailing list, GitLab shows it: https://gitlab.com/marcandre.lureau/qemu/-/commit/31d9023965ba1963afd1e0e0f48c75399a7bc23e Please rebase onto qemu.git/master and remove the spurious libvirt-ci submodule update before resending this pull request. Thank you! Stefan > ui/dbus: clarify the kind of win32 handle that is shared > plugins: fix -Werror=maybe-uninitialized false-positive > dbus: add -audio dbus nsamples option > > Roman Penyaev (4): > chardev/char-pty: send CHR_EVENT_CLOSED on disconnect > chardev/char-hub: implement backend chardev aggregator > tests/unit/test-char: add unit tests for hub chardev backend > qemu-options.hx: describe hub chardev and aggregation of several > backends > > qapi/audio.json| 22 +- > qapi/char.json | 27 +++ > chardev/chardev-internal.h | 51 - > include/chardev/char.h | 1 + > audio/dbusaudio.c | 29 ++- > chardev/char-hub.c | 301 > chardev/char-pty.c | 4 +- > chardev/char.c | 23 ++- > contrib/plugins/cache.c| 2 +- > tests/unit/test-char.c | 398 + > ui/dbus-console.c | 8 +- > ui/dbus.c | 10 +- > chardev/meson.build| 1 + > qemu-options.hx| 49 - > ui/dbus-display1.xml | 16 +- > 15 files changed, 923 insertions(+), 19 deletions(-) > create mode 100644 chardev/char-hub.c > > -- > 2.47.0 > >
Re: [PATCH v4 00/33] Multifd 🔀 device state transfer support with VFIO consumer
Hello Maciej, This patch set is targeting QEMU 10.0. What's not yet present is documentation update under docs/devel/migration but I didn't want to delay posting the code any longer. Such doc can still be merged later when the design is 100% finalized. The changes are quite complex, the design is not trivial, the benefits are not huge as far as we know. I'd rather have the doc update first please. Thanks, C.
Re: [PATCH 0/7] hw/arm/raspi4b: Add models with 4GB and 8GB of DRAM
Peter Maydell writes: > On Sat, 1 Feb 2025 at 12:57, BALATON Zoltan wrote: >> >> On Sat, 1 Feb 2025, Philippe Mathieu-Daudé wrote: >> > - Deprecate the 'raspi4b' machine name, renaming it as >> > 'raspi4b-1g' on 32-bit hosts, 'raspi4b-2g' otherwise. >> > - Add the 'raspi4b-4g' and 'raspi4b-8g' machines, with >> > respectively 4GB and 8GB of DRAM. >> >> IMHO (meaning you can ignore it, just my opinion) if the only difference >> is the memory size -machine raspi4b -memory 4g would be better user >> experience than having a lot of different machines. > > Yes, I think I agree. We have a way for users to specify > how much memory they want, and I think it makes more sense > to use that than to have lots of different machine types. I guess for the Pi we should validate the -memory supplied is on of the supported grid of devices rather than an arbitrary value? -- Alex Bennée Virtualisation Tech Lead @ Linaro
Re: [PATCH 0/7] hw/arm/raspi4b: Add models with 4GB and 8GB of DRAM
On Mon, Feb 03, 2025 at 02:29:49PM +, Alex Bennée wrote: > Peter Maydell writes: > > > On Sat, 1 Feb 2025 at 12:57, BALATON Zoltan wrote: > >> > >> On Sat, 1 Feb 2025, Philippe Mathieu-Daudé wrote: > >> > - Deprecate the 'raspi4b' machine name, renaming it as > >> > 'raspi4b-1g' on 32-bit hosts, 'raspi4b-2g' otherwise. > >> > - Add the 'raspi4b-4g' and 'raspi4b-8g' machines, with > >> > respectively 4GB and 8GB of DRAM. > >> > >> IMHO (meaning you can ignore it, just my opinion) if the only difference > >> is the memory size -machine raspi4b -memory 4g would be better user > >> experience than having a lot of different machines. > > > > Yes, I think I agree. We have a way for users to specify > > how much memory they want, and I think it makes more sense > > to use that than to have lots of different machine types. > > I guess for the Pi we should validate the -memory supplied is on of the > supported grid of devices rather than an arbitrary value? If the user wants to create a rpi4 with 6 GB RAM why should we stop them ? It is their choice if they want to precisely replicate RAM size from a physical model, or use something different when virtualized. With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|