On 02.05.23 23:48, Peter Xu wrote:
On Fri, Apr 28, 2023 at 10:49:25PM +0300, Vladimir Sementsov-Ogievskiy wrote:
Originally, migration_incoming_co was introduced by
25d0c16f625feb3b6
"migration: Switch to COLO process after finishing loadvm"
to be able to enter from COLO code to one specific yield point, added
by 25d0c16f625feb3b6.
Later in 923709896b1b0
"migration: poll the cm event for destination qemu"
we reused this variable to wake the migration incoming coroutine from
RDMA code.
That was doubtful idea. Entering coroutines is a very fragile thing:
you should be absolutely sure which yield point you are going to enter.
I don't know how much is it safe to enter during qemu_loadvm_state()
which I think what RDMA want to do. But for sure RDMA shouldn't enter
the special COLO-related yield-point. As well, COLO code doesn't want
to enter during qemu_loadvm_state(), it want to enter it's own specific
yield-point.
As well, when in 8e48ac95865ac97d
"COLO: Add block replication into colo process" we added
bdrv_invalidate_cache_all() call (now it's called activate_all())
it became possible to enter the migration incoming coroutine during
that call which is wrong too.
So, let't make these things separate and disjoint: loadvm_co for RDMA,
non-NULL during qemu_loadvm_state(), and colo_incoming_co for COLO,
non-NULL only around specific yield.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@yandex-team.ru>
---
migration/colo.c | 4 ++--
migration/migration.c | 8 ++++++--
migration/migration.h | 9 ++++++++-
3 files changed, 16 insertions(+), 5 deletions(-)
The idea looks right to me, but I really know mostly nothing on coroutines
and also rdma+colo..
Is the other ref in rdma.c (rdma_cm_poll_handler()) still missing?
Oops right.. I was building with rdma disabled. Will fix.
Thanks a lot for reviewing!
--
Best regards,
Vladimir