On 2025/08/27 6:52, Fabiano Rosas wrote:
> Michael Tokarev <m...@tls.msk.ru> writes:
>
> +CC Akihiko
>
>> Hi!
>>
>> This is
>>
>> commit c0b32426ce56182c1ce2a12904f3a702c2ecc460
>> Author: Marco Cavenati <marco.caven...@eurecom.fr>
>> Date: Wed Mar 26 17:22:30 2025 +0100
>>
>>     migration: fix SEEK_CUR offset calculation in qio_channel_block_seek
>>
>> which went to 10.0.0-rc2, and has been cherry-picked to
>> 7.2 and 9.2 stable series.
>>
>> Reportedly it breaks migration in 7.2.18 and up. Which is
>> kinda strange, as it shouldn't do any harm?
>
> Yeah, this is not it. Unless you're using colo or mapped-ram.
>
>> https://bugs.debian.org/1112044
>>
>> any guess what's going on?
>
> The virtio changes are probably the issue. One of them touches
> mhdr.num_buffers, under mergeable_rx_bufs, which is migrated state. The
> flag in turn depends on VIRTIO_NET_F_MRG_RXBUF, which is set on the
> cmdline with -device virtio-net-pci,mrg_rxbuf= but also reset by
> virtio_set_features_nocheck, if I'm reading this right.
I don't think commit cefd67f25430 ("virtio-net: Fix num_buffers for
version 1") is related to the issue. Commit ce1431615292 ("virtio: Call
set_features during reset") is more likely.
virtio_set_features_nocheck() shouldn't reset VIRTIO_NET_F_MRG_RXBUF. It
calls virtio_net_set_features(), which does not clear features.
virtio_net_get_features() clears features, but it is called before
migration.
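
To illustrate the distinction, here is a toy Python model, not QEMU code. The toy_get_features() and toy_set_features() names only echo the functions above; their bodies are simplified assumptions, and backend_mask is a hypothetical parameter. The only value taken from outside this thread is bit number 15 for VIRTIO_NET_F_MRG_RXBUF, as defined in the virtio specification.

```python
VIRTIO_NET_F_MRG_RXBUF = 15  # feature bit number from the virtio spec


def has_feature(features: int, bit: int) -> bool:
    """Return True if the given feature bit is set."""
    return bool(features & (1 << bit))


def toy_get_features(offered: int, backend_mask: int) -> int:
    """Negotiation-time step (assumed): may clear bits the backend lacks."""
    return offered & backend_mask


def toy_set_features(features: int) -> dict:
    """Apply already-negotiated bits: derives state, clears nothing."""
    return {
        "features": features,
        "mergeable_rx_bufs": has_feature(features, VIRTIO_NET_F_MRG_RXBUF),
    }
```

In this model only the get step can drop a bit; the set step passes the bits through and merely derives the mergeable flag from them.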
The posted call trace indicates a lockup happens in the control path,
but commit cefd67f25430 ("virtio-net: Fix num_buffers for version 1")
changes the data path.
On the other hand, I can come up with a possible failure scenario with
commit ce1431615292 ("virtio: Call set_features during reset"). Perhaps
it changed the machine state before loading the migrated state, and
caused a mismatch between them.
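
A purely hypothetical sketch of that kind of scenario follows. ToyDevice, its fields, and the assumption that the loader restores the raw feature field without re-deriving dependent state are all invented for illustration; whether QEMU's load path behaves like this is exactly what is not yet known.

```python
VIRTIO_NET_F_MRG_RXBUF = 15  # feature bit number from the virtio spec


class ToyDevice:
    """Hypothetical device; field names are illustrative, not QEMU's."""

    def __init__(self, features: int):
        self.guest_features = features
        # Derived state that only set_features() refreshes.
        self.derived_mrg = bool(features & (1 << VIRTIO_NET_F_MRG_RXBUF))

    def set_features(self, features: int) -> None:
        self.guest_features = features
        self.derived_mrg = bool(features & (1 << VIRTIO_NET_F_MRG_RXBUF))

    def reset(self, call_set_features: bool) -> None:
        self.guest_features = 0
        if call_set_features:
            # Models the behaviour change: reset now invokes the callback.
            self.set_features(0)

    def load(self, migrated_features: int) -> None:
        # Assumed loader: restores the raw field, does not re-derive.
        self.guest_features = migrated_features

    def consistent(self) -> bool:
        expected = bool(self.guest_features & (1 << VIRTIO_NET_F_MRG_RXBUF))
        return self.derived_mrg == expected


def migrate(call_set_features: bool) -> bool:
    """Reset the destination, load migrated features, report consistency."""
    features = 1 << VIRTIO_NET_F_MRG_RXBUF
    dest = ToyDevice(features)  # destination configured like the source
    dest.reset(call_set_features)
    dest.load(features)
    return dest.consistent()
```

Under these assumptions migrate(False) leaves the derived flag in step with the loaded features, while migrate(True) does not, because the reset-time callback changed the derived state before the load.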
I need more information to understand the issue. A command line that
reproduces it would be especially helpful, because options like
mrg_rxbuf=, which you mentioned, show which features are enabled, and
that is valuable information.
Regards,
Akihiko Odaki