[PATCH v2 11/11] tests/migration-test: Add a test for postcopy hangs during RECOVER

2023-09-12 Thread Peter Xu
From: Fabiano Rosas To do so, create two paired sockets, but make them not provide real data. Feed those fake sockets to src/dst QEMUs for recovery to let them go into RECOVER stage without going out. Test that we can always kick it out and recover again with the right ports. This patch is ba

[PATCH v2 01/11] migration: Display error in query-migrate irrelevant of status

2023-09-12 Thread Peter Xu
Display it as long as it is set, regardless of FAILED status. E.g., it may also be applicable to PAUSED stage of postcopy, to provide hint on what has gone wrong. The error_mutex seems to be overlooked when referencing the error, add it to be very safe. Bugzilla: https://bugzilla.redhat.com/show

[PATCH v2 06/11] qemufile: Always return a verbose error

2023-09-12 Thread Peter Xu
There are a lot of cases where we only have an errno set in last_error but without a detailed error description. When this happens, try to generate an error that contains the errno as a descriptive error. This will be helpful in cases where one relies on the Error*. E.g., migration state only caches E

[PATCH v2 05/11] migration: Deliver return path file error to migrate state too

2023-09-12 Thread Peter Xu
We've already done this for most of the return path thread errors, but not yet for the IO errors that happened on the return path qemufile. Do that too. Remember to always reset "err", because we no longer own it; otherwise we're prone to use-after-free later after recovery. Re-export qem

Re: [PATCH 3/3] iotests: distinguish 'skipped' and 'not run' states

2023-09-12 Thread Denis V. Lunev
On 9/12/23 22:03, Vladimir Sementsov-Ogievskiy wrote: On 06.09.23 17:09, Denis V. Lunev wrote: Each particular testcase could be skipped intentionally or accidentally. For example the test is not designed for a particular image format or is not run due to a missing library. The latter case is un

[PATCH v2 10/11] migration: Allow RECOVER->PAUSED convertion for dest qemu

2023-09-12 Thread Peter Xu
There's a bug on dest that if a double fault triggered on dest qemu (a network issue during postcopy-recover), we won't set PAUSED correctly because we assumed we always came from ACTIVE. Fix that by always overwriting the state to PAUSE. We could also check for these two states, but maybe it's a

[PATCH v2 04/11] migration: Refactor error handling in source return path

2023-09-12 Thread Peter Xu
rp_state.error was a boolean used to show error happened in return path thread. That's not only duplicating error reporting (migrate_set_error), but also not good enough in that we only do error_report() and set it to true, we never can keep a history of the exact error and show it in query-migrat

[PATCH v2 03/11] migration: Introduce migrate_has_error()

2023-09-12 Thread Peter Xu
Introduce a helper to detect whether MigrationState.error is set for whatever reason. It intentionally does not take the error_mutex here because we neither reference nor modify the pointer. State why it's safe to do so. This is preparation work for any thread (e.g. source re

[PATCH v2 07/11] migration: Remember num of ramblocks to sync during recovery

2023-09-12 Thread Peter Xu
Instead of only relying on the count of rp_sem, make the counter be part of RAMState so it can be used in both threads to synchronize on the process. rp_sem will be further reused as a way to kick the main thread, e.g., on recovery failures. Reviewed-by: Fabiano Rosas Signed-off-by: Peter Xu --

[PATCH v2 02/11] migration: Let migrate_set_error() take ownership

2023-09-12 Thread Peter Xu
migrate_set_error() used one error_copy() so it always copies an error. However that's not the major use case - the major use case is one would like to pass the error to migrate_set_error() without further touching the error. It can be proved if we see most of the callers are freeing the error expli

qemu-riscv32 usermode still broken?

2023-09-12 Thread Andreas K. Huettel
Dear all, I've once more tried to build up a riscv32 linux install in a qemu-riscv32 usermode systemd-nspawn, and am running into the same problems as some time ago... https://dev.gentoo.org/~dilfridge/riscv32/riscv32.tar.xz (220M) The problems manifest themselves mostly in bash; if I replace

[PATCH v3 05/12] gdbstub: Introduce GDBFeature structure

2023-09-12 Thread Akihiko Odaki
Before this change, the information from an XML file was stored in an array that is not descriptive. Introduce a dedicated structure type to make it easier to understand and to extend with more fields. Signed-off-by: Akihiko Odaki Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Alex Bennée Revi

[PATCH v3 03/12] plugins: Check if vCPU is realized

2023-09-12 Thread Akihiko Odaki
The created member of CPUState tells if the vCPU thread is started, and will be always false for the user space emulation that manages threads independently. Use the realized member of DeviceState, which is valid for both of the system and user space emulation. Fixes: 54cb65d858 ("plugin: add core

[PATCH v3 02/12] gdbstub: Fix target.xml response

2023-09-12 Thread Akihiko Odaki
It was failing to return target.xml after the first request. Fixes: 56e534bd11 ("gdbstub: refactor get_feature_xml") Signed-off-by: Akihiko Odaki --- gdbstub/gdbstub.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c index 349d348c7b..3841

[PATCH v3 08/12] gdbstub: Use g_markup_printf_escaped()

2023-09-12 Thread Akihiko Odaki
g_markup_printf_escaped() is a safer alternative to simple printf() as it automatically escapes values. Signed-off-by: Akihiko Odaki --- gdbstub/gdbstub.c | 36 +--- 1 file changed, 21 insertions(+), 15 deletions(-) diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstu
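
As a rough illustration of why escaping matters when values are interpolated into XML, here is a minimal Python sketch. It uses Python's html.escape as a stand-in for g_markup_printf_escaped(); the feature_tag helper and the sample inputs are hypothetical and are not taken from the patch.

```python
from html import escape


def feature_tag(name: str) -> str:
    """Build an XML element with the attribute value escaped (hypothetical helper)."""
    # escape(..., quote=True) rewrites &, <, >, " and ' as character entities,
    # so the value cannot close the attribute or open a new element.
    return '<feature name="{}"/>'.format(escape(name, quote=True))


# A harmless sample value passes through unchanged.
plain = feature_tag("org.gnu.gdb.arm.core")

# A value containing markup characters is neutralised instead of
# being injected into the document structure.
hostile = feature_tag('x"/><evil a="')

print(plain)
print(hostile)
```

With a plain printf-style format, the second value would have produced an extra element in the output; with escaping it stays a single attribute value.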

[PATCH v3 00/12] gdbstub and TCG plugin improvements

2023-09-12 Thread Akihiko Odaki
This series extracts fixes and refactorings that can be applied independently from "[PATCH RESEND v5 00/26] plugins: Allow to read registers" as suggested by Nicholas Piggin. Patch "target/ppc: Remove references to gdb_has_xml" is also updated to remove some dead code I missed earlier and thus the

[PATCH v3 12/12] gdbstub: Replace gdb_regs with an array

2023-09-12 Thread Akihiko Odaki
An array is a more appropriate data structure than a list for gdb_regs since it is initialized only with append operation and read-only after initialization. Signed-off-by: Akihiko Odaki --- include/hw/core/cpu.h | 2 +- gdbstub/gdbstub.c | 34 -- 2 files cha

[PATCH v3 01/12] gdbstub: Fix target_xml initialization

2023-09-12 Thread Akihiko Odaki
target_xml is no longer a fixed-length array but a pointer to a variable-length memory. Fixes: 56e534bd11 ("gdbstub: refactor get_feature_xml") Signed-off-by: Akihiko Odaki --- gdbstub/softmmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gdbstub/softmmu.c b/gdbstub/softm

[PATCH v3 11/12] gdbstub: Remove gdb_has_xml variable

2023-09-12 Thread Akihiko Odaki
GDB has XML support since 6.7 which was released in 2007. It's time to remove support for old GDB versions without XML support. Signed-off-by: Akihiko Odaki --- gdbstub/internals.h| 2 -- include/exec/gdbstub.h | 8 gdbstub/gdbstub.c | 15 --- 3 files changed, 25

[PATCH v3 06/12] target/arm: Move the reference to arm-core.xml

2023-09-12 Thread Akihiko Odaki
Some subclasses overwrite gdb_core_xml_file member but others don't. Always initialize the member in the subclasses for consistency. This especially helps for AArch64; in a following change, the file specified by gdb_core_xml_file is always looked up even if it's going to be overwritten later. Loo

[PATCH v3 07/12] hw/core/cpu: Return static value with gdb_arch_name()

2023-09-12 Thread Akihiko Odaki
All implementations of gdb_arch_name() return dynamic duplicates of static strings. It's also unlikely that there will be an implementation of gdb_arch_name() that returns a truly dynamic value due to the nature of the function returning a well-known identifier. Qualify the value gdb_arch_name()

[PATCH v3 09/12] target/arm: Remove references to gdb_has_xml

2023-09-12 Thread Akihiko Odaki
GDB has XML support since 6.7 which was released in 2007. It's time to remove support for old GDB versions without XML support. Signed-off-by: Akihiko Odaki Acked-by: Alex Bennée --- target/arm/gdbstub.c | 32 ++-- 1 file changed, 2 insertions(+), 30 deletions(-) di

[PATCH v3 10/12] target/ppc: Remove references to gdb_has_xml

2023-09-12 Thread Akihiko Odaki
GDB has XML support since 6.7 which was released in 2007. It's time to remove support for old GDB versions without XML support. Signed-off-by: Akihiko Odaki --- target/ppc/gdbstub.c | 18 -- 1 file changed, 18 deletions(-) diff --git a/target/ppc/gdbstub.c b/target/ppc/gdbstub.c

[PATCH v3 04/12] contrib/plugins: Use GRWLock in execlog

2023-09-12 Thread Akihiko Odaki
execlog had the following comment: > As we could have multiple threads trying to do this we need to > serialise the expansion under a lock. Threads accessing already > created entries can continue without issue even if the ptr array > gets reallocated during resize. However, when the ptr array get

Re: [PATCH 9/9] migration/postcopy: Allow network to fail even during recovery

2023-09-12 Thread Fabiano Rosas
Peter Xu writes: >> Scenario 1: >> /x86_64/migration/postcopy/recovery/fail-twice >> >> the stacks are: >> >> Thread 8 (Thread 0x7fffd5ffe700 (LWP 30282) "live_migration"): >> qemu_sem_wait >> ram_dirty_bitmap_sync_all >> ram_resume_prepare >> qemu_savevm_state_resume_prepare >> postcopy_d

Re: [PATCH v7 14/18] cpu: Call plugin hooks only when ready

2023-09-12 Thread Akihiko Odaki
On 2023/09/12 17:46, Philippe Mathieu-Daudé wrote: Hi Akihiko, On 12/9/23 09:12, Akihiko Odaki wrote: The initialization and exit hooks will not affect the state of vCPU, What about:  qemu_plugin_vcpu_init_hook()    -> plugin_cpu_update__locked()   -> plugin_cpu_update__async()  

CI container image interference between staging and staging-7.2

2023-09-12 Thread Stefan Hajnoczi
Hi, TL;DR Michael: Please check that the staging-7.2 branch has Dan's commit e28112d00703abd136e2411d23931f4f891c9244 ("gitlab: stable staging branches publish containers in a separate tag"). I couldn't explain a check-cfi-x86_64 failure (https://gitlab.com/qemu-project/qemu/-/jobs/5072006964), so

[PATCH v3 3/5] block-backend: process I/O in the current AioContext

2023-09-12 Thread Stefan Hajnoczi
Switch blk_aio_*() APIs over to multi-queue by using qemu_get_current_aio_context() instead of blk_get_aio_context(). This change will allow devices to process I/O in multiple IOThreads in the future. I audited existing blk_aio_*() callers: - migration/block.c: blk_mig_lock() protects the data acc

[PATCH v3 2/5] test-bdrv-drain: avoid race with BH in IOThread drain test

2023-09-12 Thread Stefan Hajnoczi
This patch fixes a race condition in test-bdrv-drain that is difficult to reproduce. test-bdrv-drain sometimes fails without an error message on the block pull request sent by Kevin Wolf on Sep 4, 2023. I was able to reproduce it locally and found that "block-backend: process I/O in the current Aio

[PATCH v3 5/5] block-coroutine-wrapper: use qemu_get_current_aio_context()

2023-09-12 Thread Stefan Hajnoczi
Use qemu_get_current_aio_context() in mixed wrappers and coroutine wrappers so that code runs in the caller's AioContext instead of moving to the BlockDriverState's AioContext. This change is necessary for the multi-queue block layer where any thread can call into the block layer. Most wrappers ar

[PATCH v3 0/5] block-backend: process I/O in the current AioContext

2023-09-12 Thread Stefan Hajnoczi
v3 - Add Patch 2 to fix a race condition in test-bdrv-drain. This was the CI failure that bumped this patch series from Kevin's pull request. - Add missing 051.pc.out file. I tried qemu-system-aarch64 to see if 051.out also needs to be updated, but no changes were necessary. [Kevin] v2 - Add pa

[PATCH v3 4/5] block-backend: process zoned requests in the current AioContext

2023-09-12 Thread Stefan Hajnoczi
Process zoned requests in the current thread's AioContext instead of in the BlockBackend's AioContext. There is no need to use the BlockBackend's AioContext thanks to CoMutex bs->wps->colock, which protects zone metadata. Signed-off-by: Stefan Hajnoczi --- block/block-backend.c | 12 ++-

[PATCH v3 1/5] block: remove AIOCBInfo->get_aio_context()

2023-09-12 Thread Stefan Hajnoczi
The synchronous bdrv_aio_cancel() function needs the acb's AioContext so it can call aio_poll() to wait for cancellation. It turns out that all users run under the BQL in the main AioContext, so this callback is not needed. Remove the callback, mark bdrv_aio_cancel() GLOBAL_STATE_CODE just like i

Re: [PATCH 9/9] migration/postcopy: Allow network to fail even during recovery

2023-09-12 Thread Peter Xu
On Tue, Sep 12, 2023 at 07:49:37PM -0300, Fabiano Rosas wrote: > I figured what is going on here (test #1). At postcopy_pause_incoming() > the state transition is ACTIVE -> PAUSED, but when the first recovery > fails on the incoming side, the transition would have to be RECOVER -> > PAUSED. > > Co

[RFC PATCH 0/3] target/ppc: Change CR registers from i32 to tl

2023-09-12 Thread Nicholas Piggin
This is a bit of churn so I might leave it for later in the cycle (or defer if we get a lot of other changes) since it's a relatively mechanical change. So don't spend time reviewing details, I'm just wondering about concept and general approach. I'm not sure the history of why these are 32-bit, m

[RFC PATCH 2/3] target/ppc: Use FP CR1 update helper more widely

2023-09-12 Thread Nicholas Piggin
Several places open-code this FP CR1 update. Move them to call gen_set_cr1_from_fpscr(). FPSCR_OX = 28 so move that to the symbolic constant while we are here. Signed-off-by: Nicholas Piggin --- target/ppc/translate/fp-impl.c.inc | 16 ++-- 1 file changed, 6 insertions(+), 10 deleti

[RFC PATCH 3/3] target/ppc: Optimise after CR register tl conversion

2023-09-12 Thread Nicholas Piggin
After changing CR registers from i32 to tl, a number of places that previously did type conversion are now redundant moves between variables that can be removed. Signed-off-by: Nicholas Piggin --- target/ppc/translate.c | 97 +- target/ppc/translate/f

[RFC PATCH 1/3] target/ppc: Change CR registers from i32 to tl

2023-09-12 Thread Nicholas Piggin
tl is more convenient to work with because it matches most other registers. Change the type to tl. Keep generated code changes to a minimum with trivial conversions (e.g., tcg_gen_trunc_tl_i32 -> tcg_gen_mov_tl). Optimisation is done with a subsequent change. Signed-off-by: Nicholas Piggin ---

Re: [PATCH v13 0/9] rutabaga_gfx + gfxstream

2023-09-12 Thread Gurchetan Singh
On Tue, Sep 12, 2023 at 6:59 AM Marc-André Lureau < marcandre.lur...@gmail.com> wrote: > Hi Gurchetan > > On Wed, Sep 6, 2023 at 5:22 AM Gurchetan Singh > wrote: > > > > > > > > On Wed, Aug 30, 2023 at 7:26 PM Huang Rui wrote: > >> > >> On Tue, Aug 29, 2023 at 08:36:20AM +0800, Gurchetan Singh w

Re: [PATCH v11 0/9] rutabaga_gfx + gfxstream

2023-09-12 Thread Gurchetan Singh
On Tue, Sep 12, 2023 at 1:53 AM Alyssa Ross wrote: > Gurchetan Singh writes: > > > On Fri, Aug 25, 2023 at 12:37 PM Alyssa Ross wrote: > > > >> Alyssa Ross writes: > >> > >> > Gurchetan Singh writes: > >> > > >> >> On Fri, Aug 25, 2023 at 12:11 AM Alyssa Ross wrote: > >> >> > >> >>> Gurcheta

Re: qemu-riscv32 usermode still broken?

2023-09-12 Thread LIU Zhiwei
On 2023/9/13 6:31, Andreas K. Huettel wrote: Dear all, I've once more tried to build up a riscv32 linux install in a qemu-riscv32 usermode systemd-nspawn, and am running into the same problems as some time ago... https://dev.gentoo.org/~dilfridge/riscv32/riscv32.tar.xz (220M) The problems

Re: [PATCH 0/4] net: avoid variable length arrays

2023-09-12 Thread Jason Wang
On Tue, Sep 12, 2023 at 10:20 PM Peter Maydell wrote: > > Hi, Jason. This patchset has been reviewed -- do you want to > pick it up via the net tree? Yes, I've queued this. Thanks > > thanks > -- PMM > > On Thu, 24 Aug 2023 at 16:32, Peter Maydell wrote: > > > > This patchset removes the use o

Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

2023-09-12 Thread Gupta, Pankaj
From: William Roche AMD guests can't currently deal with BUS_MCEERR_AO MCE injection as it panics the VM kernel. We filter this event and provide a warning message. Signed-off-by: William Roche --- v3: - New patch v4: - Remove redundant check for AO errors --- target/i386/kvm/kvm.c | 9

Re: [PATCH v2 00/10] Adds CPU hot-plug support to Loongarch

2023-09-12 Thread lixianglai
Hi, Salil Mehta : Hi Xianglai, From: qemu-devel-bounces+salil.mehta=huawei@nongnu.org On Behalf Of xianglai li Sent: Tuesday, September 12, 2023 3:12 AM To: qemu-devel@nongnu.org Cc: Salil Mehta ; Xiaojuan Yang ; Song Gao ; Michael S. Tsirkin ; Igor Mammedov ; Ani Sinha ; Paolo Bonzini ;

Re: [PATCH 3/4] gitlab: make Cirrus CI timeout explicit

2023-09-12 Thread Philippe Mathieu-Daudé
On 12/9/23 20:41, Daniel P. Berrangé wrote: On the GitLab side we're invoking the Cirrus CI job using the cirrus-run tool which speaks to the Cirrus REST API. Cirrus sometimes takes 5-10 minutes to actually schedule the task, and thus the execution time of 'cirrus-run' inside GitLab will be sligh

[PATCH] ppc/xive: Fix uint32_t overflow

2023-09-12 Thread Cédric Le Goater
As reported by Coverity, "idx << xive->pc_shift" is evaluated using 32-bit arithmetic, and then used in a context expecting a "uint64_t". Add a uint64_t cast. Fixes: Coverity CID 1519049 Fixes: b68147b7a5bf ("ppc/xive: Add support for the PC MMIOs") Signed-off-by: Cédric Le Goater --- hw/intc/pn
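
The class of bug described here can be modelled in a few lines. The sketch below is only an illustration in Python, which has arbitrary-precision integers, so the 32-bit and 64-bit behaviour is emulated with masks. The function names and the sample values of idx and shift are assumptions, not values from hw/intc, and the model assumes a shift smaller than 32.

```python
U32_MASK = 0xFFFFFFFF
U64_MASK = 0xFFFFFFFFFFFFFFFF


def shift_as_u32(idx: int, shift: int) -> int:
    # Models C's `idx << shift` when idx is a uint32_t:
    # bits shifted past bit 31 are lost before any widening happens.
    return (idx << shift) & U32_MASK


def shift_as_u64(idx: int, shift: int) -> int:
    # Models `(uint64_t)idx << shift`: widen first, then shift,
    # so the high bits survive.
    return ((idx & U32_MASK) << shift) & U64_MASK


# Hypothetical values chosen so that the result needs more than 32 bits.
idx, shift = 0x100, 24

truncated = shift_as_u32(idx, shift)
widened = shift_as_u64(idx, shift)
print(hex(truncated), hex(widened))
```

Assigning the truncated result to a 64-bit variable afterwards does not bring the lost bits back, which is why the cast has to be applied to the operand before the shift.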

Re: [PULL v1 0/1] Merge tpm 2023/09/12 v1

2023-09-12 Thread Philippe Mathieu-Daudé
On 12/9/23 23:41, Stefan Berger wrote: Hello! This PR contains a fix for the case where the TPM file descriptor is >= 1024 and the select() call cannot be used. Regards, Stefan The following changes since commit 9ef497755afc252fb8e060c9ea6b0987abfd20b6: Merge tag 'pull-vfio-20230911

Re: [PATCH v3 03/12] plugins: Check if vCPU is realized

2023-09-12 Thread Philippe Mathieu-Daudé
On 13/9/23 00:40, Akihiko Odaki wrote: The created member of CPUState tells if the vCPU thread is started, and will be always false for the user space emulation that manages threads independently. Per the docstring: /** * CPUState: * @created: Indicates whether the CPU thread has been

Re: [BUG] virtio-fs: Corruption when running binaries from virtiofsd-backed fs

2023-09-12 Thread Erik Schilling
On Fri Sep 1, 2023 at 12:37 PM CEST, Erik Schilling wrote: > On Wed Aug 30, 2023 at 10:20 AM CEST, Erik Schilling wrote: > > Hi all! > > > > Some days ago I posted to #virtiofs:matrix.org, describing that I am > > observing what looks like a corruption when executing programs from a > > virtiofs-ba

Re: [RFC PATCH 2/3] target/ppc: Use FP CR1 update helper more widely

2023-09-12 Thread Philippe Mathieu-Daudé
On 13/9/23 02:58, Nicholas Piggin wrote: Several places open-code this FP CR1 update. Move them to call gen_set_cr1_from_fpscr(). FPSCR_OX = 28 so move that to the symbolic constant while we are here. Signed-off-by: Nicholas Piggin --- target/ppc/translate/fp-impl.c.inc | 16 ++--

Re: [PATCH] ppc/xive: Fix uint32_t overflow

2023-09-12 Thread Philippe Mathieu-Daudé
On 13/9/23 07:56, Cédric Le Goater wrote: As reported by Coverity, "idx << xive->pc_shift" is evaluated using 32-bit arithmetic, and then used in a context expecting a "uint64_t". Add a uint64_t cast. Fixes: Coverity CID 1519049 Fixes: b68147b7a5bf ("ppc/xive: Add support for the PC MMIOs") Sign

Re: [PATCH] ppc/xive: Fix uint32_t overflow

2023-09-12 Thread Frederic Barrat
On 13/09/2023 07:56, Cédric Le Goater wrote: As reported by Coverity, "idx << xive->pc_shift" is evaluated using 32-bit arithmetic, and then used in a context expecting a "uint64_t". Add a uint64_t cast. Fixes: Coverity CID 1519049 Fixes: b68147b7a5bf ("ppc/xive: Add support for the PC MMIOs"

Re: [PATCH] hw/i386/pc_piix: Mark the machine types from version 1.4 to 1.7 as deprecated

2023-09-12 Thread Philippe Mathieu-Daudé
On 18/1/22 09:49, Thomas Huth wrote: On 17/01/2022 21.12, Daniel P. Berrangé wrote: On Mon, Jan 17, 2022 at 08:16:39PM +0100, Thomas Huth wrote: The list of machine types grows larger and larger each release ... and it is unlikely that many people still use the very old ones for live migration.

Re: [PATCH 0/4] hw/cxl: Minor CXL emulation fixes and cleanup

2023-09-12 Thread Philippe Mathieu-Daudé
Cc'ing qemu-trivial@ On 4/9/23 15:28, Jonathan Cameron wrote: A small set gathering patches that have been posted and reviewed on list over the last few months. Looking to get these upstream before making any significant changes to the CXL emulation for this cycle. More wide spread cleanup will

Re: [PULL v1 1/1] tpm: fix crash when FD >= 1024

2023-09-12 Thread Michael Tokarev
13.09.2023 00:41, Stefan Berger wrote: From: Marc-Andr޸ Lureau Replace select() with poll() to fix a crash when QEMU has a large number of FDs. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2020133 Please keep these on the same line next time. And there's an UTF8 issue with Marc-André'
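
For readers unfamiliar with the difference: select() works on fixed-size descriptor sets bounded by FD_SETSIZE (typically 1024), while poll() takes a list of descriptors and has no such bound. The following is a minimal Python sketch of waiting on one descriptor with poll(); it is not the TPM code, the wait_readable helper is hypothetical, and it assumes a POSIX system where select.poll is available.

```python
import os
import select


def wait_readable(fd: int, timeout_ms: int) -> bool:
    """Return True if fd becomes readable within timeout_ms (hypothetical helper)."""
    poller = select.poll()
    # poll() tracks descriptors by number, so a value >= 1024 is not a problem.
    poller.register(fd, select.POLLIN)
    events = poller.poll(timeout_ms)
    return any(efd == fd and (mask & select.POLLIN) for efd, mask in events)


# Demonstrate on a pipe: nothing to read at first, then one byte is written.
r, w = os.pipe()
before = wait_readable(r, 0)
os.write(w, b"x")
after = wait_readable(r, 1000)
os.close(r)
os.close(w)
print(before, after)
```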

Re: [PATCH] gitlab: remove unreliable avocado CI jobs

2023-09-12 Thread Philippe Mathieu-Daudé
On 12/9/23 21:58, Thomas Huth wrote: On 12/09/2023 17.06, Stefan Hajnoczi wrote: The avocado-system-alpine, avocado-system-fedora, and avocado-system-ubuntu jobs are unreliable. I identified them while looking over CI failures from the past week: https://gitlab.com/qemu-project/qemu/-/jobs/50586

Re: [PATCH v3 3/4] hw/cxl: Fix and use same calculation for HDM decoder block size everywhere

2023-09-12 Thread Philippe Mathieu-Daudé
On 11/9/23 13:43, Jonathan Cameron wrote: In order to avoid having the size of the per HDM decoder register block repeated in lots of places, create the register definitions for HDM decoder 1 and use the offset between the first registers in HDM decoder 0 and HDM decoder 1 to establish the offset
