Hi Daniel,

I have been thinking about some solutions for this and wanted to discuss them 
before going ahead. I have also added Juan and Peter to the loop.

1. Earlier I was thinking that, since on the destination side the first data 
sent on the default and multiFD channels is currently MAGIC_NUMBER and VERSION, 
we could decide the mapping based on that. But that does not work for the newly 
added post-copy preempt channel, as it does not send any magic number. Also, 
even for multiFD, the magic number alone does not tell which multiFD channel 
number it is, although as far as I can tell that does not matter. So the magic 
number should be good enough for identifying the default vs. a multiFD channel?
2. For post-copy preempt, maybe we can initiate this channel only after we have 
received a request from the remote, e.g. a remote page fault. This looks safest 
to me, considering the post-copy recovery case too. I cannot think of any 
dependency on the post-copy preempt channel that requires it to be initialised 
very early. Maybe Peter can confirm this.
3. Another thing we can do is to have a 2-way handshake on every channel 
creation with some additional metadata. This looks like the cleanest and most 
durable approach to me. I understand that it can break migration to/from old 
QEMU, but then it could be introduced as a migration capability?
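
As a rough illustration of option 1 above, the sketch below peeks at the first 
four bytes of a new connection and classifies it by magic number. It is Python 
rather than QEMU code: `classify_channel` is a hypothetical helper, the two 
magic values are the ones from the error message quoted further down, and 
big-endian byte order on the wire is an assumption.

```python
import socket
import struct

# Magic values taken from the error message quoted in this thread.
DEFAULT_MAGIC = 0x5145564D  # magic seen when a default-channel packet arrived
MULTIFD_MAGIC = 0x11223344  # magic a multifd channel expects


def classify_channel(sock):
    """Peek at the first 4 bytes (without consuming them) and name the channel.

    Hypothetical helper for illustration only, not a QEMU API.
    """
    # MSG_PEEK leaves the data in the socket buffer for the real reader.
    data = sock.recv(4, socket.MSG_PEEK)
    if len(data) < 4:
        return "unknown"
    (magic,) = struct.unpack(">I", data)  # assumption: big-endian on the wire
    if magic == DEFAULT_MAGIC:
        return "default"
    if magic == MULTIFD_MAGIC:
        return "multifd"
    # A channel that sends some other first word cannot be identified this way.
    return "unknown"


if __name__ == "__main__":
    sender, receiver = socket.socketpair()
    sender.sendall(struct.pack(">I", MULTIFD_MAGIC))
    print(classify_channel(receiver))
    sender.close()
    receiver.close()
```

Because the bytes are only peeked, the normal channel setup code could still 
read the magic afterwards. A channel that never sends anything first (such as 
post-copy preempt) would leave the peek waiting, which is the limitation noted 
in point 1.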

Please let me know if any of these work, or if you have other suggestions.

Thanks

Manish Mishra


On 13/10/22 1:45 pm, Daniel P. Berrangé wrote:
On Thu, Oct 13, 2022 at 01:23:40AM +0530, manish.mishra wrote:
Hi Everyone,
I hope everyone is doing well. I have seen some live migration issues with 
qemu-4.2 when using multiFD. The signature of the issue looks like this:
2022-10-01T09:57:53.972864Z qemu-kvm: failed to receive packet via multifd 
channel 0: multifd: received packet magic 5145564d expected 11223344

Basically, a default live migration channel packet is received on a multiFD 
channel. I see an older patch explaining a potential reason for this behavior:
https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.gnu.org_archive_html_qemu-2Ddevel_2019-2D10_msg05920.html&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=c4KON2DiMd-szjwjggQcuUvTsPWblztAL0gVzaHnNmc&m=LZBcU_C3HMbpUCFZgqxkS-pV8C2mHOjqUTzt45LlLwa26DA0pCAjJVDoamnX8vnC&s=B-b_HMnn_ee6JeA87-PVNBrBqxzdWYgo5PpaP91dqT8&e=
[PATCH 3/3] migration/multifd: fix potential wrong acception order of IO.
But I see this patch was not merged. Looking at the QEMU master code, I
could not find any other patch that handles this issue either. So, as
per my understanding, this is still a potential issue even in QEMU
master. I mainly wanted to check why this patch was dropped?

See my replies in that message - it broke compatibility of data on
the wire, meaning old QEMU can't talk to new QEMU and vice versa.

We need a fix for this issue, but it needs to take into account
wire compatibility.

With regards,
Daniel
