> On May 23, 2022, at 11:09 AM, Alexander Duyck <alexander.du...@gmail.com>
> wrote:
>
> From: Alexander Duyck <alexanderdu...@fb.com>
>
> When I run Multi-process QEMU with an e1000 as the remote device and SMP
> enabled I see the combination lock up and become unresponsive. The QEMU build
> is a fairly standard x86_64-softmmu setup. After doing some digging I tracked
> the lockup down to what appears to be a race between the mpqemu-link msg_send
> and msg_receive functions and the reacquisition of the lock.
>
> I am assuming the issue is some sort of lock inversion, though I haven't
> identified exactly what the other lock involved is yet. For now, removing
> the logic to unlock the iothread and then reacquire the lock seems to
> resolve the issue. I am assuming the releasing of the lock was some form of
> optimization, but I am not certain, so I am submitting this as an RFC.
Hi Alexander,

We are working on moving away from Multi-process QEMU and toward a
vfio-user based approach. The vfio-user patches are under review. I believe
we would drop the Multi-process support once vfio-user is merged.

We release the lock here while communicating with the remote process via
the QIOChannel. It is to prevent a lockup of the VM in case the QIOChannel
hangs.

I was able to reproduce this issue at my end. There is a deadlock between
"mpqemu_msg_send() -> qemu_mutex_lock_iothread()" and
"mpqemu_msg_send_and_await_reply() -> QEMU_LOCK_GUARD(&pdev->io_mutex)".
From what I can tell, as soon as one vcpu thread drops the iothread lock,
another thread running mpqemu_msg_send_and_await_reply() grabs it and holds
on to it. That prevents the first thread from completing. The backtrace is
attached below, and a stand-alone sketch of the inversion follows at the end
of this mail.

To avoid the deadlock, I think we should drop both the iothread lock and
io_mutex and reacquire them in the correct order - first the iothread lock
and then io_mutex. Given that Multi-process QEMU would be dropped in the
near future, I suppose we don't have to proceed further along these lines.

I tested your patch, and it fixes the e1000 issue at my end also. I believe
we could adopt it.

Thank you!
--
Jag

Thread 6 (Thread 0x7f2d12281700 (LWP 31758)):
#0  0x00007f2d9b7ac54d in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007f2d9b7a7e9b in _L_lock_883 () at /lib64/libpthread.so.0
#2  0x00007f2d9b7a7d68 in pthread_mutex_lock () at /lib64/libpthread.so.0
#3  0x000055bdeb48663f in qemu_mutex_lock_impl (mutex=0x55bdebf68800 <qemu_global_mutex>, file=0x55bdeb5c5c5a "../hw/remote/mpqemu-link.c", line=79) at ../util/qemu-thread-posix.c:88
#4  0x000055bdeb006546 in qemu_mutex_lock_iothread_impl (file=0x55bdeb5c5c5a "../hw/remote/mpqemu-link.c", line=79) at ../softmmu/cpus.c:502
#5  0x000055bdeafed3ff in mpqemu_msg_send (msg=0x7f2d12280430, ioc=0x55bdeeb02600, errp=0x7f2d12280420) at ../hw/remote/mpqemu-link.c:79
#6  0x000055bdeafed93c in mpqemu_msg_send_and_await_reply (msg=0x7f2d12280430, pdev=0x55bdeeaff8e0, errp=0x7f2d12280420) at ../hw/remote/mpqemu-link.c:198
#7  0x000055bdeafefe0e in send_bar_access_msg (pdev=0x55bdeeaff8e0, mr=0x55bdeeb00460, write=false, addr=192, val=0x7f2d12280578, size=4, memory=true) at ../hw/remote/proxy.c:256
#8  0x000055bdeafeff3e in proxy_bar_read (opaque=0x55bdeeb00450, addr=192, size=4) at ../hw/remote/proxy.c:280
#9  0x000055bdeb1f3759 in memory_region_read_accessor (mr=0x55bdeeb00460, addr=192, value=0x7f2d12280750, size=4, shift=0, mask=4294967295, attrs=...) at ../softmmu/memory.c:440
#10 0x000055bdeb1f3c8e in access_with_adjusted_size (addr=192, value=0x7f2d12280750, size=4, access_size_min=1, access_size_max=8, access_fn=0x55bdeb1f3716 <memory_region_read_accessor>, mr=0x55bdeeb00460, attrs=...) at ../softmmu/memory.c:554
#11 0x000055bdeb1f695f in memory_region_dispatch_read1 (mr=0x55bdeeb00460, addr=192, pval=0x7f2d12280750, size=4, attrs=...) at ../softmmu/memory.c:1424
#12 0x000055bdeb1f6a79 in memory_region_dispatch_read (mr=0x55bdeeb00460, addr=192, pval=0x7f2d12280750, op=MO_32, attrs=...)
    at ../softmmu/memory.c:1457
#13 0x000055bdeb20451a in flatview_read_continue (fv=0x7f2d0c30ef50, addr=4273602752, attrs=..., ptr=0x7f2d9d988028, len=4, addr1=192, l=4, mr=0x55bdeeb00460) at ../softmmu/physmem.c:2881
#14 0x000055bdeb204692 in flatview_read (fv=0x7f2d0c30ef50, addr=4273602752, attrs=..., buf=0x7f2d9d988028, len=4) at ../softmmu/physmem.c:2923
#15 0x000055bdeb20471b in address_space_read_full (as=0x55bdebf705e0 <address_space_memory>, addr=4273602752, attrs=..., buf=0x7f2d9d988028, len=4) at ../softmmu/physmem.c:2936
#16 0x000055bdeb20483f in address_space_rw (as=0x55bdebf705e0 <address_space_memory>, addr=4273602752, attrs=..., buf=0x7f2d9d988028, len=4, is_write=false) at ../softmmu/physmem.c:2964
#17 0x000055bdeb29a60a in kvm_cpu_exec (cpu=0x55bdedcb0410) at ../accel/kvm/kvm-all.c:2929
#18 0x000055bdeb29c3fc in kvm_vcpu_thread_fn (arg=0x55bdedcb0410) at ../accel/kvm/kvm-accel-ops.c:49
#19 0x000055bdeb4872f8 in qemu_thread_start (args=0x55bdedcbf700) at ../util/qemu-thread-posix.c:504
#20 0x00007f2d9b7a5ea5 in start_thread () at /lib64/libpthread.so.0
#21 0x00007f2d9b4ceb0d in clone () at /lib64/libc.so.6

Thread 3 (Thread 0x7f2d13a84700 (LWP 31753)):
#0  0x00007f2d9b7ac54d in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007f2d9b7a7e9b in _L_lock_883 () at /lib64/libpthread.so.0
#2  0x00007f2d9b7a7d68 in pthread_mutex_lock () at /lib64/libpthread.so.0
#3  0x000055bdeb48663f in qemu_mutex_lock_impl (mutex=0x55bdeeb00308, file=0x55bdeb5c5aa0 "/home/qemu-separation/qemu-split/include/qemu/thread.h", line=118) at ../util/qemu-thread-posix.c:88
#4  0x000055bdeafecf00 in qemu_mutex_lock (mutex=0x55bdeeb00308) at /home/qemu-separation/qemu-split/include/qemu/thread.h:118
#5  0x000055bdeafecf4a in qemu_lockable_lock (x=0x7f2d13a83310) at /home/qemu-separation/qemu-split/include/qemu/lockable.h:95
#6  0x000055bdeafecf88 in qemu_lockable_auto_lock (x=0x7f2d13a83310) at /home/qemu-separation/qemu-split/include/qemu/lockable.h:105
#7  0x000055bdeafed90e in mpqemu_msg_send_and_await_reply (msg=0x7f2d13a83490, pdev=0x55bdeeaff8e0, errp=0x7f2d13a83480) at ../hw/remote/mpqemu-link.c:197
#8  0x000055bdeafefe0e in send_bar_access_msg (pdev=0x55bdeeaff8e0, mr=0x55bdeeb00460, write=true, addr=200, val=0x7f2d13a835b8, size=4, memory=true) at ../hw/remote/proxy.c:256
#9  0x000055bdeafefec9 in proxy_bar_write (opaque=0x55bdeeb00450, addr=200, val=4, size=4) at ../hw/remote/proxy.c:271
#10 0x000055bdeb1f3a48 in memory_region_write_accessor (mr=0x55bdeeb00460, addr=200, value=0x7f2d13a836e8, size=4, shift=0, mask=4294967295, attrs=...) at ../softmmu/memory.c:492
#11 0x000055bdeb1f3c8e in access_with_adjusted_size (addr=200, value=0x7f2d13a836e8, size=4, access_size_min=1, access_size_max=8, access_fn=0x55bdeb1f3952 <memory_region_write_accessor>, mr=0x55bdeeb00460, attrs=...) at ../softmmu/memory.c:554
#12 0x000055bdeb1f6d8b in memory_region_dispatch_write (mr=0x55bdeeb00460, addr=200, data=4, op=MO_32, attrs=...)
    at ../softmmu/memory.c:1514
#13 0x000055bdeb20429f in flatview_write_continue (fv=0x7f2d0c30ef50, addr=4273602760, attrs=..., ptr=0x7f2d9d991028, len=4, addr1=200, l=4, mr=0x55bdeeb00460) at ../softmmu/physmem.c:2814
#14 0x000055bdeb204402 in flatview_write (fv=0x7f2d0c30ef50, addr=4273602760, attrs=..., buf=0x7f2d9d991028, len=4) at ../softmmu/physmem.c:2856
#15 0x000055bdeb2047b2 in address_space_write (as=0x55bdebf705e0 <address_space_memory>, addr=4273602760, attrs=..., buf=0x7f2d9d991028, len=4) at ../softmmu/physmem.c:2952
#16 0x000055bdeb20481f in address_space_rw (as=0x55bdebf705e0 <address_space_memory>, addr=4273602760, attrs=..., buf=0x7f2d9d991028, len=4, is_write=true) at ../softmmu/physmem.c:2962
#17 0x000055bdeb29a60a in kvm_cpu_exec (cpu=0x55bdedc56930) at ../accel/kvm/kvm-all.c:2929
#18 0x000055bdeb29c3fc in kvm_vcpu_thread_fn (arg=0x55bdedc56930) at ../accel/kvm/kvm-accel-ops.c:49
#19 0x000055bdeb4872f8 in qemu_thread_start (args=0x55bdedc64fa0) at ../util/qemu-thread-posix.c:504
#20 0x00007f2d9b7a5ea5 in start_thread () at /lib64/libpthread.so.0
#21 0x00007f2d9b4ceb0d in clone () at /lib64/libc.so.6

> Signed-off-by: Alexander Duyck <alexanderdu...@fb.com>
> ---
>  hw/remote/mpqemu-link.c | 25 -------------------------
>  1 file changed, 25 deletions(-)
>
> diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
> index 9bd98e82197e..3e7818f54a63 100644
> --- a/hw/remote/mpqemu-link.c
> +++ b/hw/remote/mpqemu-link.c
> @@ -33,7 +33,6 @@
>   */
>  bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
>  {
> -    bool iolock = qemu_mutex_iothread_locked();
>      bool iothread = qemu_in_iothread();
>      struct iovec send[2] = {};
>      int *fds = NULL;
> @@ -57,16 +56,6 @@ bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
>       */
>      assert(qemu_in_coroutine() || !iothread);
>
> -    /*
> -     * Skip unlocking/locking iothread lock when the IOThread is running
> -     * in co-routine context. Co-routine context is asserted above
> -     * for IOThread case.
> -     * Also skip lock handling while in a co-routine in the main context.
> -     */
> -    if (iolock && !iothread && !qemu_in_coroutine()) {
> -        qemu_mutex_unlock_iothread();
> -    }
> -
>      if (!qio_channel_writev_full_all(ioc, send, G_N_ELEMENTS(send),
>                                       fds, nfds, 0, errp)) {
>          ret = true;
> @@ -74,11 +63,6 @@ bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
>          trace_mpqemu_send_io_error(msg->cmd, msg->size, nfds);
>      }
>
> -    if (iolock && !iothread && !qemu_in_coroutine()) {
> -        /* See above comment why skip locking here. */
> -        qemu_mutex_lock_iothread();
> -    }
> -
>      return ret;
>  }
>
> @@ -96,7 +80,6 @@ static ssize_t mpqemu_read(QIOChannel *ioc, void *buf, size_t len, int **fds,
>                             size_t *nfds, Error **errp)
>  {
>      struct iovec iov = { .iov_base = buf, .iov_len = len };
> -    bool iolock = qemu_mutex_iothread_locked();
>      bool iothread = qemu_in_iothread();
>      int ret = -1;
>
> @@ -106,16 +89,8 @@ static ssize_t mpqemu_read(QIOChannel *ioc, void *buf, size_t len, int **fds,
>                             size_t *nfds, Error **errp)
>       */
>      assert(qemu_in_coroutine() || !iothread);
>
> -    if (iolock && !iothread && !qemu_in_coroutine()) {
> -        qemu_mutex_unlock_iothread();
> -    }
> -
>      ret = qio_channel_readv_full_all_eof(ioc, &iov, 1, fds, nfds, errp);
>
> -    if (iolock && !iothread && !qemu_in_coroutine()) {
> -        qemu_mutex_lock_iothread();
> -    }
> -
>      return (ret <= 0) ? ret : iov.iov_len;
>  }
>
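
For reference, here is a minimal stand-alone pthreads sketch of the
inversion seen in the two backtraces. Every name in it (abba_demo.c, the
bql and io_mutex variables, vcpu_sending/vcpu_waiting) is a hypothetical
stand-in for illustration only, not code from the QEMU tree:

/* abba_demo.c - illustration only, all names are hypothetical stand-ins.
 * Build: gcc -pthread abba_demo.c */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t bql = PTHREAD_MUTEX_INITIALIZER;      /* "iothread lock"   */
static pthread_mutex_t io_mutex = PTHREAD_MUTEX_INITIALIZER; /* "pdev->io_mutex"  */

/* Mimics Thread 6 above: a vcpu that took io_mutex in
 * mpqemu_msg_send_and_await_reply(), dropped the iothread lock around the
 * channel write (the old mpqemu_msg_send() behaviour), and now re-takes it. */
static void *vcpu_sending(void *arg)
{
    pthread_mutex_lock(&bql);       /* vcpu enters MMIO dispatch with the BQL  */
    pthread_mutex_lock(&io_mutex);  /* QEMU_LOCK_GUARD(&pdev->io_mutex)        */
    pthread_mutex_unlock(&bql);     /* old code: drop the BQL before writing   */
    sleep(1);                       /* stands in for the channel write         */
    pthread_mutex_lock(&bql);       /* old code: re-take the BQL -> blocks     */
    pthread_mutex_unlock(&bql);
    pthread_mutex_unlock(&io_mutex);
    return NULL;
}

/* Mimics Thread 3 above: a second vcpu that grabbed the BQL as soon as it
 * was dropped and is now waiting for io_mutex. */
static void *vcpu_waiting(void *arg)
{
    pthread_mutex_lock(&bql);       /* free, because the sender dropped it     */
    pthread_mutex_lock(&io_mutex);  /* blocks: the sender still holds io_mutex */
    pthread_mutex_unlock(&io_mutex);
    pthread_mutex_unlock(&bql);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, vcpu_sending, NULL);
    usleep(100 * 1000);             /* let the sender reach the "write"        */
    pthread_create(&t2, NULL, vcpu_waiting, NULL);
    pthread_join(t1, NULL);         /* in practice never returns: ABBA deadlock */
    pthread_join(t2, NULL);
    printf("not reached\n");
    return 0;
}

With your patch the sending vcpu keeps the iothread lock across the write,
so every thread takes the two locks in the same order (iothread lock first,
then io_mutex) and the cycle cannot form. Releasing both locks and
reacquiring them in that same order, as suggested above, would achieve the
same ordering.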