Re: [PATCH 0/2] Postcopy migration and vhost-user errors

Prasad Pandit Mon, 15 Jul 2024 03:14:57 -0700

On Thu, 11 Jul 2024 at 21:08, Peter Xu <pet...@redhat.com> wrote:
> Hmm, I thought it was one of the vcpu threads that invoked
> vhost_dev_start(), rather than any migration thread?


     [QEMU=vhost-user-front-end]  <===========>   [QEMU=vhost-user-front-end]
                  ^                                            |
                  |                                            |
                  |                                            |
                  |                                            V
[external-process=vhost-user-back-end]      [external-process=vhost-user-back-end]
===
vhost-user-protocol:
    -> https://www.qemu.org/docs/master/interop/vhost-user.html#vhost-user-proto

* It is not clear which thread calls the vhost_dev_start() routine; it
could be a vCPU thread. Sending the 'postcopy_end' message to the
'vhost-user-back-end' hints that the device was being migrated and
that migration finished before the device set-up was done. The
protocol above says:

    "...The nature of the channel is implementation-defined, but it
must generally behave like a pipe: The writing end will write all the
data it has into it, signalling the end of data by closing its end.
The reading end must read all of this data (until encountering the end
of file) and process it."

* It does not mention sending the 'postcopy_end' message. But it talks
about the front-end sending 'VHOST_USER_CHECK_DEVICE_STATE' to the
back-end to check if the migration of the device state was successful
or not.
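
For illustration only, the pipe-like behaviour quoted above can be
sketched in plain POSIX C. This is not QEMU or vhost-user code: the
names write_all(), read_until_eof(), pipe_roundtrip() and
DEMO_MAX_STATE are hypothetical, and a single process plays both the
writing and the reading end, so the payload is assumed to be small
enough to fit in the pipe buffer.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical cap so that one process can write before it reads
 * without blocking; assumed to fit in the pipe buffer. */
#define DEMO_MAX_STATE 512

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read until end of file. Returns the number of bytes read, or -1 on
 * error or if the buffer fills up before end of file is seen. */
static ssize_t read_until_eof(int fd, char *buf, size_t cap)
{
    size_t total = 0;

    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return (ssize_t)total;   /* writer closed its end */
        total += (size_t)n;
    }
    return -1;
}

/* Writer sends everything and closes; reader drains until EOF. */
ssize_t pipe_roundtrip(const char *state, size_t len, char *out, size_t cap)
{
    int fds[2];
    ssize_t got;

    if (len > DEMO_MAX_STATE || pipe(fds) < 0)
        return -1;
    if (write_all(fds[1], state, len) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);                   /* closing signals end of data */
    got = read_until_eof(fds[0], out, cap);
    close(fds[0]);
    return got;
}
```

In this sketch the reader learns that the transfer is complete only
from the end of file. Whether the data it received was valid is a
separate question, which is what the 'VHOST_USER_CHECK_DEVICE_STATE'
request is for.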

> I remember after you added the rwlock, there's still a hang issue.
> Did you investigated that?  Or do you mean this series will fix all the 
> problems?

* No, this series does not fix the guest hang issue. The root cause of
that is still a mystery. If migration ends abruptly before all of the
guest state is migrated, the guest hang scenario seems possible.
Adding the vhost-user-rw-lock does not address the issue of migration
ending early.

* From the protocol page above, it is not clear if the front-end
should allow/have multiple threads talking to the same vhost-user
device.

Thank you.
---
  - Prasad
