Re: [PATCH v2 0/6] migration/block: disk activation rewrite

2024-12-17 Thread Fabiano Rosas
Peter Xu writes:

> CI: https://gitlab.com/peterx/qemu/-/pipelines/1577280033
>  (note: it's a pipeline of two patchsets, to save CI credits and time)
>
> v1: https://lore.kernel.org/r/20241204005138.702289-1-pet...@redhat.com
>
> This is v2 of the series, removing RFC tag, because my goal is to have them
> (or some newer version) merged.
>
> The major change is that I merged the last three patches and made quite a
> few changes here and there to make sure the global disk activation status
> is always consistent.  The overall idea is still the same, so I'd say a
> detailed changelog wouldn't help.
>
> I also temporarily dropped Fabiano's ping-pong test cases to avoid
> different versions floating on the list (as I know a new version is coming
> at some point. Fabiano: you're taking over the 10.0 pulls, so I assume
> you're aware so there's no concern on order of merges).  I'll review the
> test cases separately when they're ready, but this series is still tested
> with that pingpong test and it keeps working.
>
> I started looking at this problem as a whole when reviewing Fabiano's
> series, especially the patch (for a QEMU crash [1]):
>
> https://lore.kernel.org/r/20241125144612.16194-5-faro...@suse.de
>
> The proposed patch could work, but adding such a side effect to migration
> is undesirable.  So I started to think about whether we can provide a
> cleaner approach, because migration doesn't need the disks to be active to
> work at all.  Hence we should try to avoid adding a migration ABI (which
> doesn't matter now, but may matter some day) into the prepare phase on disk
> activation status.  Migration should happen with disks inactivated.
>
> It's also a pure wish that bdrv_inactivate_all() could be benign to call
> even if all disks are already inactive.  Then the bug would also be gone.
> After all, a similar call to bdrv_activate_all() on all-active disks is
> fine.  I hope that wish is still fair, but I don't know the block layer
> well enough to say anything meaningful.
>
> And when I was looking at that, I found more things spread all over the
> place on disk activation.  I decided to clean all of them up, while
> hopefully fixing the QEMU crash [1] too.
>
> For this v2 I did some more tests: I want to make sure all the past paths
> keep working, at least on failure or cancel races, and also in postcopy
> failure cases.  So I ran the tests below and they all pass (when I say
> "emulated" below, I mean I hacked something to trigger those races / rare
> failures, because they aren't easy to trigger with a vanilla binary):
>
> - Tested generic migrate_cancel during precopy, disk activation won't be
>   affected.  Disk status reports correct values in tracepoints.
>
> - Tested Fabiano's ping-pong migration tests on PAUSED state VM.
>
> - Emulated precopy failure before sending non-iterable, disk inactivation
>   won't happen, and also activation won't trigger after migration cleanups
>   (even if activation on top of active disks is benign, I checked traces
>   to make sure it'll provide consistent disk status, skipping activation).
>
> - Emulated precopy failure right after sending non-iterable. Disks will be
>   inactivated, but then can be reactivated properly before VM starts.
>
> - Emulated postcopy failure when sending the packed data (which is after
>   disk invalidated), and making sure src VM will get back to live properly,
>   re-activate the disks before starting.
>
> - Emulated concurrent migrate_cancel at the end of migration_completion()
>   of precopy, after disk inactivated.  Disks can be reactivated properly.
>
>   NOTE: here if dest QEMU didn't quit before migrate_cancel,
>   bdrv_activate_all() can crash src QEMU.  This behavior should be the same
>   before/after this patch.
>
> Comments welcomed, thanks.
>
> [1] https://gitlab.com/qemu-project/qemu/-/issues/2395
>
> Peter Xu (6):
>   migration: Add helper to get target runstate
>   qmp/cont: Only activate disks if migration completed
>   migration/block: Make late-block-active the default
>   migration/block: Apply late-block-active behavior to postcopy
>   migration/block: Fix possible race with block_inactive
>   migration/block: Rewrite disk activation
>
>  include/migration/misc.h |   4 ++
>  migration/migration.h    |   6 +-
>  migration/block-active.c |  94 +++
>  migration/colo.c         |   2 +-
>  migration/migration.c    | 136 +++
>  migration/savevm.c       |  46 ++---
>  monitor/qmp-cmds.c       |  22 +++
>  migration/meson.build    |   1 +
>  migration/trace-events   |   3 +
>  9 files changed, 188 insertions(+), 126 deletions(-)
>  create mode 100644 migration/block-active.c

Queued, thanks!
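
For readers skimming the thread, the "global disk activation status" idea
from the quoted cover letter can be sketched as below.  This is only an
illustration under stated assumptions, not the code in
migration/block-active.c: every name here (sketch_block_activate(),
sketch_block_inactivate(), the fake_bdrv_*() stand-ins and the counters) is
hypothetical.  The point it shows is that a single tracked flag makes
repeated activate or inactivate requests a harmless no-op.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the block layer calls; they only count calls. */
static int activate_calls;
static int inactivate_calls;

static int fake_bdrv_activate_all(void)   { activate_calls++;   return 0; }
static int fake_bdrv_inactivate_all(void) { inactivate_calls++; return 0; }

/* Single tracked status: does this process currently own the disks? */
static bool disks_active = true;

/* Activate disks; a no-op when they are already active. */
static bool sketch_block_activate(void)
{
    if (disks_active) {
        return true;
    }
    if (fake_bdrv_activate_all() < 0) {
        return false;
    }
    disks_active = true;
    return true;
}

/* Inactivate disks; a no-op when they are already inactive. */
static bool sketch_block_inactivate(void)
{
    if (!disks_active) {
        return true;
    }
    if (fake_bdrv_inactivate_all() < 0) {
        return false;
    }
    disks_active = false;
    return true;
}

/* Exercise the helpers: the second call of each pair must be skipped. */
static void sketch_demo(void)
{
    assert(sketch_block_inactivate());
    assert(sketch_block_inactivate());
    assert(sketch_block_activate());
    assert(sketch_block_activate());
}
```

Calling sketch_demo() reaches each stand-in exactly once, even though each
helper is invoked twice.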



Re: [PATCH 0/2] include: Two cleanups around missing 'qemu/atomic.h'

2024-12-17 Thread Ilya Leoshkevich
On Tue, 2024-12-17 at 15:13 +0100, Philippe Mathieu-Daudé wrote:
> We have 2 headers using qatomic_read() without including
> its declaration from "qemu/atomic.h". Include the missing
> header. For my own convenience I plan to merge these 2 patches
> via my tree.
> 
> Regards,
> 
> Phil.
> 
> Philippe Mathieu-Daudé (2):
>   exec/translation-block: Include missing 'qemu/atomic.h' header
>   qemu/coroutine: Include missing 'qemu/atomic.h' header
> 
>  include/exec/translation-block.h | 1 +
>  include/qemu/coroutine.h         | 1 +
>  2 files changed, 2 insertions(+)

Acked-by: Ilya Leoshkevich 



[PATCH 1/2] exec/translation-block: Include missing 'qemu/atomic.h' header

2024-12-17 Thread Philippe Mathieu-Daudé
When moving tb_cflags() in commit 88d4b5138a8 ("tcg: Make
tb_cflags() usable from target-agnostic code") we forgot to
include "qemu/atomic.h", which declares qatomic_read().
Explicitly include it now to avoid issues when refactoring
unrelated headers.

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/exec/translation-block.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/exec/translation-block.h b/include/exec/translation-block.h
index b99afb00779..81299b7bdb5 100644
--- a/include/exec/translation-block.h
+++ b/include/exec/translation-block.h
@@ -7,6 +7,7 @@
 #ifndef EXEC_TRANSLATION_BLOCK_H
 #define EXEC_TRANSLATION_BLOCK_H
 
+#include "qemu/atomic.h"
 #include "qemu/thread.h"
 #include "exec/cpu-common.h"
 #include "exec/vaddr.h"
-- 
2.45.2
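
As an aside for readers, the pattern this patch protects can be sketched in
a few lines.  This is a simplified illustration, not QEMU code:
sketch_atomic_read() is a hypothetical stand-in for qatomic_read() that
assumes the GCC/Clang __atomic builtins, and SketchTB / sketch_tb_cflags()
are made-up names.  A header that defines an inline accessor like this
should itself pull in whatever declares the atomic helper, rather than rely
on another header including it indirectly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for qatomic_read(); assumes GCC/Clang builtins. */
#define sketch_atomic_read(ptr) __atomic_load_n(ptr, __ATOMIC_RELAXED)

/* Made-up, minimal structure holding only a flags word. */
typedef struct SketchTB {
    uint32_t cflags;
} SketchTB;

/* Inline accessor: it compiles only if the atomic helper is visible here. */
static inline uint32_t sketch_tb_cflags(const SketchTB *tb)
{
    return sketch_atomic_read(&tb->cflags);
}

/* Small self-check showing the accessor returns the stored value. */
static void sketch_tb_cflags_selftest(void)
{
    SketchTB tb = { .cflags = 0x40 };
    assert(sketch_tb_cflags(&tb) == 0x40);
}
```

If the macro definition were removed from this file and only reached it
through some unrelated include, refactoring that include would break the
accessor, which is the situation the commit message describes.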




Re: [PATCH 0/2] include: Two cleanups around missing 'qemu/atomic.h'

2024-12-17 Thread Richard Henderson

On 12/17/24 08:13, Philippe Mathieu-Daudé wrote:
> Philippe Mathieu-Daudé (2):
>   exec/translation-block: Include missing 'qemu/atomic.h' header
>   qemu/coroutine: Include missing 'qemu/atomic.h' header


Reviewed-by: Richard Henderson 

r~



[PATCH 2/2] qemu/coroutine: Include missing 'qemu/atomic.h' header

2024-12-17 Thread Philippe Mathieu-Daudé
Commit 944f3d5dd21 ("coroutine: Add qemu_co_mutex_assert_locked")
added an inline method which uses qatomic_read(), itself declared
in "qemu/atomic.h". Explicitly include it now to avoid issues when
refactoring unrelated headers.

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/qemu/coroutine.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index ff3084538b8..e545bbf620f 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -16,6 +16,7 @@
 #define QEMU_COROUTINE_H
 
 #include "qemu/coroutine-core.h"
+#include "qemu/atomic.h"
 #include "qemu/queue.h"
 #include "qemu/timer.h"
 
-- 
2.45.2




[PATCH 0/2] include: Two cleanups around missing 'qemu/atomic.h'

2024-12-17 Thread Philippe Mathieu-Daudé
We have 2 headers using qatomic_read() without including
its declaration from "qemu/atomic.h". Include the missing
header. For my own convenience I plan to merge these 2 patches
via my tree.

Regards,

Phil.

Philippe Mathieu-Daudé (2):
  exec/translation-block: Include missing 'qemu/atomic.h' header
  qemu/coroutine: Include missing 'qemu/atomic.h' header

 include/exec/translation-block.h | 1 +
 include/qemu/coroutine.h         | 1 +
 2 files changed, 2 insertions(+)

-- 
2.45.2




[PATCH] meson.build: Disallow libnfs v6 to fix the broken macOS build

2024-12-17 Thread Thomas Huth
The macOS build in our CI is currently broken since homebrew
updated libnfs to version 6 - and that version apparently comes
with a big API breakage. Disallow that version for now to get the
broken CI job working again. Once somebody has had enough time to
adapt our code in block/nfs.c, we can revert this change again.

Signed-off-by: Thomas Huth 
---
 meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 85f7485473..6149b50db2 100644
--- a/meson.build
+++ b/meson.build
@@ -1145,7 +1145,7 @@ endif
 
 libnfs = not_found
 if not get_option('libnfs').auto() or have_block
-  libnfs = dependency('libnfs', version: '>=1.9.3',
+  libnfs = dependency('libnfs', version: ['>=1.9.3', '<6.0.0'],
                       required: get_option('libnfs'),
                       method: 'pkg-config')
 endif
-- 
2.47.1