Hi Daniel,
I have been thinking about some solutions for this and wanted to discuss them
before going ahead. I have also added Juan and Peter to the loop.
1. Earlier I was thinking that, on the destination side, the first data sent
on the default and multiFD channels is currently MAGIC_NUMBER and VERSION, so
maybe we can decide the mapping based on that. But that does not work for the
newly added post-copy preempt channel, as it does not send any MAGIC number.
Also, even for multiFD, the MAGIC number alone does not tell which multiFD
channel number it is, although as far as I can see that does not matter. So
the MAGIC number should be good enough for identifying default vs multiFD
channels?
2. For post-copy preempt, maybe we can initiate this channel only after we
have received a request from the remote side, e.g. a remote page fault. This
looks safest to me, considering the post-copy recovery case too. I cannot
think of any dependency on the post-copy preempt channel that requires it to
be initialised very early. Maybe Peter can confirm this.
3. Another thing we can do is to have a 2-way handshake on every channel
creation with some additional metadata. This looks like the cleanest and most
durable approach to me. I understand that it can break migration to/from old
QEMU, but then it could be introduced as a migration capability?
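To make point 1 concrete, here is a minimal Python sketch of classifying an incoming connection by peeking at its first four bytes. It is only an illustration, not QEMU code: classify_channel and the constant names are made up for this example, the two magic values are the ones from the error message quoted further down, and big-endian byte order on the wire is an assumption.

```python
import socket
import struct

# Magic values as they appear in the error message quoted in this thread.
DEFAULT_MAGIC = 0x5145564D  # first word seen on the default channel
MULTIFD_MAGIC = 0x11223344  # first word expected on a multiFD channel


def classify_channel(sock):
    """Guess the channel type from its first 4 bytes without consuming them.

    Hypothetical helper for illustration only, not a QEMU function.
    """
    # MSG_PEEK leaves the bytes in the socket buffer, so the normal
    # channel setup code can still read the magic afterwards.
    head = sock.recv(4, socket.MSG_PEEK)
    if len(head) < 4:
        return "unknown"  # a real receiver would wait for more data
    (magic,) = struct.unpack(">I", head)  # assumes big-endian on the wire
    if magic == DEFAULT_MAGIC:
        return "default"
    if magic == MULTIFD_MAGIC:
        return "multifd"
    return "unknown"  # no recognised magic at the start of the stream
```

Note that the peek blocks until the peer has sent something, so a channel that does not send a magic first (post-copy preempt, as described in point 1) cannot be classified this way and would still need something like point 2 or 3.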
Please let me know if any of these work, or if you have other suggestions.
Thanks
Manish Mishra
On 13/10/22 1:45 pm, Daniel P. Berrangé wrote:
On Thu, Oct 13, 2022 at 01:23:40AM +0530, manish.mishra wrote:
Hi Everyone,
Hope everyone is doing great. I have seen some live migration issues with
qemu-4.2 when using multiFD. The signature of the issue is something like this:
2022-10-01T09:57:53.972864Z qemu-kvm: failed to receive packet via multifd
channel 0: multifd: received packet magic 5145564d expected 11223344
Basically, a packet from the default live migration channel is received on a
multiFD channel.
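As a quick sanity check of the log above, the two magic values can be decoded by hand. This is only an illustrative Python sketch: the variable names are made up here, and viewing the values as big-endian 32-bit words is an assumption.

```python
import struct

# Both values are copied from the error message above.
received = 0x5145564D  # what arrived on multifd channel 0
expected = 0x11223344  # what the multifd receiver wanted

# Pack as big-endian 32-bit words to view the bytes in wire order.
received_bytes = struct.pack(">I", received)
expected_bytes = struct.pack(">I", expected)

print(received_bytes)        # b'QEVM'
print(expected_bytes.hex())  # 11223344
```

The received word is printable ASCII ("QEVM"), which fits the description that the start of the default migration stream arrived on a connection the destination treated as a multiFD channel.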
I see an older patch explaining a potential reason for this behavior:
https://lists.gnu.org/archive/html/qemu-devel/2019-10/msg05920.html
[PATCH 3/3] migration/multifd: fix potential wrong acception order of IO.
But I see this patch was not merged. Looking at the qemu master code, I
could not find any other patch that handles this issue. So as
per my understanding this is still a potential issue even in qemu
master. I mainly wanted to check why this patch was dropped?
See my replies in that message - it broke compatibility of data on
the wire, meaning old QEMU can't talk to new QEMU and vice versa.
We need a fix for this issue, but it needs to take into account
wire compatibility.
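One way to respect the wire-compatibility requirement is to make any new per-channel handshake opt-in, so that it is only used when both sides have agreed to it and the legacy byte stream is left untouched otherwise. The Python sketch below is purely illustrative: HANDSHAKE_MAGIC, the channel type IDs, the message layout and the handshake_enabled flag are all hypothetical and do not correspond to existing QEMU code.

```python
import socket
import struct

HANDSHAKE_MAGIC = 0x48534B31                  # hypothetical value
CH_DEFAULT, CH_MULTIFD, CH_PREEMPT = 0, 1, 2  # hypothetical type IDs

HELLO = struct.Struct(">IBB")  # magic, channel type, channel index
ACK = struct.Struct(">I")      # magic echoed back by the destination


def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed during handshake")
        buf += chunk
    return buf


def send_hello(sock, ch_type, index, handshake_enabled):
    """Source side: announce what this channel is, then wait for the ack."""
    if not handshake_enabled:
        return  # legacy wire format: nothing extra is sent
    sock.sendall(HELLO.pack(HANDSHAKE_MAGIC, ch_type, index))
    (magic,) = ACK.unpack(recv_exact(sock, ACK.size))
    if magic != HANDSHAKE_MAGIC:
        raise ValueError("bad handshake ack")


def accept_hello(sock, handshake_enabled):
    """Destination side: return (type, index), or None on the legacy path."""
    if not handshake_enabled:
        return None  # caller falls back to the existing channel ordering
    magic, ch_type, index = HELLO.unpack(recv_exact(sock, HELLO.size))
    if magic != HANDSHAKE_MAGIC:
        raise ValueError("bad handshake magic")
    sock.sendall(ACK.pack(HANDSHAKE_MAGIC))
    return ch_type, index
```

With handshake_enabled set to False on both sides, neither function touches the socket, so the bytes on the wire are the same as today. How the two sides would agree on the flag is left open in this sketch.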
With regards,
Daniel