On Thu, 11 Jul 2024 at 21:08, Peter Xu <pet...@redhat.com> wrote:
> Hmm, I thought it was one of the vcpu threads that invoked
> vhost_dev_start(), rather than any migration thread?

 [QEMU=vhost-user-front-end]  <===========>  [QEMU=vhost-user-front-end]
             ^                                           |
             |                                           |
             |                                           |
             |                                           V
 [external-process=vhost-user-back-end]  [external-process=vhost-user-back-end]
===
vhost-user-protocol:
    -> https://www.qemu.org/docs/master/interop/vhost-user.html#vhost-user-proto

* It is not clear which thread calls the vhost_dev_start() routine; it
  could be a vCPU thread. Sending the 'postcopy_end' message to the
  'vhost-user-back-end' hints that the device was being migrated and
  that migration finished before the device set-up was done. The
  protocol above says

    "...The nature of the channel is implementation-defined, but it
     must generally behave like a pipe: The writing end will write all
     the data it has into it, signalling the end of data by closing its
     end. The reading end must read all of this data (until
     encountering the end of file) and process it."

* It does not mention sending the 'postcopy_end' message. But it does
  talk about the front-end sending 'VHOST_USER_CHECK_DEVICE_STATE' to
  the back-end to check whether the migration of the device state was
  successful or not.

> I remember after you added the rwlock, there's still a hang issue.
> Did you investigated that?  Or do you mean this series will fix all the
> problems?

* No, this series does not fix the guest hang issue. The root cause of
  that is still a mystery. If migration ends abruptly before all of the
  guest state is migrated, the guest hang scenario seems possible.
  Adding the vhost-user-rw-lock does not address the issue of how
  migration ends.

* From the protocol page above, it is not clear whether the front-end
  should allow/have multiple threads talking to the same vhost-user
  device.

Thank you.
---
  - Prasad