Re: [Qemu-devel] [PATCH] monitor: avoid potential dead-lock when cleaning up

Markus Armbruster Mon, 20 Aug 2018 22:08:53 -0700

Marc-André Lureau <marcandre.lur...@gmail.com> writes:

> Hi
>
> On Mon, Aug 20, 2018 at 8:57 AM Markus Armbruster <arm...@redhat.com> wrote:
>>
>> Marc-André Lureau <marcandre.lur...@gmail.com> writes:
>>
>> > Hi
>> > On Wed, Aug 1, 2018 at 5:09 PM Markus Armbruster <arm...@redhat.com> wrote:
>> >>
>> >> Marc-André Lureau <marcandre.lur...@gmail.com> writes:
>> >>
>> >> > Hi
>> >> >
>> >> > On Wed, Aug 1, 2018 at 3:19 PM, Markus Armbruster <arm...@redhat.com> 
>> >> > wrote:
>> >> >> Marc-André Lureau <marcandre.lur...@redhat.com> writes:
>> >> >>
>> >> >>> When a monitor is connected to a Spice chardev, the monitor cleanup
>> >> >>> can dead-lock:
>> >> >>>
>> >> >>>  #0  0x00007f43446637fd in __lll_lock_wait () at 
>> >> >>> /lib64/libpthread.so.0
>> >> >>>  #1  0x00007f434465ccf4 in pthread_mutex_lock () at 
>> >> >>> /lib64/libpthread.so.0
>> >> >>>  #2  0x0000556dd79f22ba in qemu_mutex_lock_impl (mutex=0x556dd81c9220 
>> >> >>> <monitor_lock>, file=0x556dd7ae3648 "/home/elmarco/src/qq/monitor.c", 
>> >> >>> line=645) at /home/elmarco/src/qq/util/qemu-thread-posix.c:66
>> >> >>>  #3  0x0000556dd7431bd5 in monitor_qapi_event_queue 
>> >> >>> (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x556dd9abc850, 
>> >> >>> errp=0x7fffb7bbddd8) at /home/elmarco/src/qq/monitor.c:645
>> >> >>>  #4  0x0000556dd79d476b in qapi_event_send_spice_disconnected 
>> >> >>> (server=0x556dd98ee760, client=0x556ddaaa8560, errp=0x556dd82180d0 
>> >> >>> <error_abort>) at qapi/qapi-events-ui.c:149
>> >> >>>  #5  0x0000556dd7870fc1 in channel_event (event=3, 
>> >> >>> info=0x556ddad1b590) at /home/elmarco/src/qq/ui/spice-core.c:235
>> >> >>>  #6  0x00007f434560a6bb in reds_handle_channel_event (reds=<optimized 
>> >> >>> out>, event=3, info=0x556ddad1b590) at reds.c:316
>> >> >>>  #7  0x00007f43455f393b in main_dispatcher_self_handle_channel_event 
>> >> >>> (info=0x556ddad1b590, event=3, self=0x556dd9a7d8c0) at 
>> >> >>> main-dispatcher.c:197
>> >> >>>  #8  0x00007f43455f393b in main_dispatcher_channel_event 
>> >> >>> (self=0x556dd9a7d8c0, event=event@entry=3, info=0x556ddad1b590) at 
>> >> >>> main-dispatcher.c:197
>> >> >>>  #9  0x00007f4345612833 in red_stream_push_channel_event 
>> >> >>> (s=s@entry=0x556ddae2ef40, event=event@entry=3) at red-stream.c:414
>> >> >>>  #10 0x00007f434561286b in red_stream_free (s=0x556ddae2ef40) at 
>> >> >>> red-stream.c:388
>> >> >>>  #11 0x00007f43455f9ddc in red_channel_client_finalize 
>> >> >>> (object=0x556dd9bb21a0) at red-channel-client.c:347
>> >> >>>  #12 0x00007f434b5f9fb9 in g_object_unref () at 
>> >> >>> /lib64/libgobject-2.0.so.0
>> >> >>>  #13 0x00007f43455fc212 in red_channel_client_push 
>> >> >>> (rcc=0x556dd9bb21a0) at red-channel-client.c:1341
>> >> >>>  #14 0x0000556dd76081ba in spice_port_set_fe_open 
>> >> >>> (chr=0x556dd9925e20, fe_open=0) at 
>> >> >>> /home/elmarco/src/qq/chardev/spice.c:241
>> >> >>>  #15 0x0000556dd796d74a in qemu_chr_fe_set_open (be=0x556dd9a37c00, 
>> >> >>> fe_open=0) at /home/elmarco/src/qq/chardev/char-fe.c:340
>> >> >>>  #16 0x0000556dd796d4d9 in qemu_chr_fe_set_handlers 
>> >> >>> (b=0x556dd9a37c00, fd_can_read=0x0, fd_read=0x0, fd_event=0x0, 
>> >> >>> be_change=0x0, opaque=0x0, context=0x0, set_open=true) at 
>> >> >>> /home/elmarco/src/qq/chardev/char-fe.c:280
>> >> >>>  #17 0x0000556dd796d359 in qemu_chr_fe_deinit (b=0x556dd9a37c00, 
>> >> >>> del=false) at /home/elmarco/src/qq/chardev/char-fe.c:233
>> >> >>>  #18 0x0000556dd7432240 in monitor_data_destroy (mon=0x556dd9a37c00) 
>> >> >>> at /home/elmarco/src/qq/monitor.c:786
>> >> >>>  #19 0x0000556dd743b968 in monitor_cleanup () at 
>> >> >>> /home/elmarco/src/qq/monitor.c:4683
>> >> >>>  #20 0x0000556dd75ce776 in main (argc=3, argv=0x7fffb7bbe458, 
>> >> >>> envp=0x7fffb7bbe478) at /home/elmarco/src/qq/vl.c:4660
>> >> >>>
>> >> >>> Because spice code tries to emit a "disconnected" signal on the
>> >> >>> monitors. Fix this situation by tightening the monitor lock time to
>> >> >>> the monitor list removal.
>> >> >>>
>> >> >>> Signed-off-by: Marc-André Lureau <marcandre.lur...@redhat.com>
>> >> >>
>> >> >> Do you think this should go into 3.0?
>> >> >>
>> >> >>> ---
>> >> >>>  monitor.c | 22 +++++++++++++++-------
>> >> >>>  1 file changed, 15 insertions(+), 7 deletions(-)
>> >> >>>
>> >> >>> diff --git a/monitor.c b/monitor.c
>> >> >>> index 0fa0910a2a..a16a6c5311 100644
>> >> >>> --- a/monitor.c
>> >> >>> +++ b/monitor.c
>> >> >>> @@ -4702,8 +4702,6 @@ void monitor_init(Chardev *chr, int flags)
>> >> >>>
>> >> >>>  void monitor_cleanup(void)
>> >> >>>  {
>> >> >>> -    Monitor *mon, *next;
>> >> >>> -
>> >> >>>      /*
>> >> >>>       * We need to explicitly stop the I/O thread (but not destroy 
>> >> >>> it),
>> >> >>>       * clean up the monitor resources, then destroy the I/O thread 
>> >> >>> since
>> >> >>> @@ -4719,14 +4717,24 @@ void monitor_cleanup(void)
>> >> >>>      monitor_qmp_bh_responder(NULL);
>> >> >>>
>> >> >>>      /* Flush output buffers and destroy monitors */
>> >> >>> -    qemu_mutex_lock(&monitor_lock);
>> >> >>> -    QTAILQ_FOREACH_SAFE(mon, &mon_list, entry, next) {
>> >> >>> -        QTAILQ_REMOVE(&mon_list, mon, entry);
>> >> >>> +    do {
>> >> >>
>> >> >> for (;;), please.
>> >> >>
>> >> >>> +        Monitor *mon;
>> >> >>> +
>> >> >>> +        qemu_mutex_lock(&monitor_lock);
>> >> >>> +        mon = QTAILQ_FIRST(&mon_list);
>> >> >>> +        if (mon) {
>> >> >>> +            QTAILQ_REMOVE(&mon_list, mon, entry);
>> >> >>> +        }
>> >> >>> +        qemu_mutex_unlock(&monitor_lock);
>> >> >>> +
>> >> >>> +        if (!mon) {
>> >> >>> +            break;
>> >> >>> +        }
>> >> >>> +
>> >> >>>          monitor_flush(mon);
>> >> >>>          monitor_data_destroy(mon);
>> >> >>>          g_free(mon);
>> >> >>> -    }
>> >> >>> -    qemu_mutex_unlock(&monitor_lock);
>> >> >>> +    } while (true);
>> >> >>>
>> >> >>>      /* QEMUBHs needs to be deleted before destroying the I/O thread 
>> >> >>> */
>> >> >>>      qemu_bh_delete(qmp_dispatcher_bh);
>> >> >>
>> >> >> Iterating safely over a list protected by a lock should be easier than
>> >> >> that.  Sad.
>> >> >>
>> >> >> Hmm, what about this:
>> >> >>
>> >> >> diff --git a/monitor.c b/monitor.c
>> >> >> index 77861e96af..4a23f6c7bc 100644
>> >> >> --- a/monitor.c
>> >> >> +++ b/monitor.c
>> >> >> @@ -4721,9 +4721,11 @@ void monitor_cleanup(void)
>> >> >>      qemu_mutex_lock(&monitor_lock);
>> >> >>      QTAILQ_FOREACH_SAFE(mon, &mon_list, entry, next) {
>> >> >>          QTAILQ_REMOVE(&mon_list, mon, entry);
>> >> >> +        qemu_mutex_unlock(&monitor_lock);
>> >> >>          monitor_flush(mon);
>> >> >>          monitor_data_destroy(mon);
>> >> >>          g_free(mon);
>> >> >> +        qemu_mutex_lock(&monitor_lock);
>> >> >
>> >> > Although unlikely, there is a chance the monitor list could be
>> >> > modified while flushing/cleaning up, I suppose, in this case we could
>> >> > miss the new monitors (if next is NULL).
>> >>
>> >> Your loop prevents that from happening while it runs, but does nothing
>> >> to stop it from happening afterwards.  If we want to lock out new
>> >> monitors, we need to make monitor_init() fail or impossible to call.
>> >
>> > Not so trivial. Is there other threads capable of calling
>> > monitor_init() by the time monitor_cleanup() is called? It looks like
>> > monitor_init() may only be called from the main thread.
>>
>> Callers:
>>
>> * gdbserver_start()
>>
>>   CLI option -gdb, HMP command gdbserver, linux user CLI option -g and
>>   environment variable QEMU_GDB
>>
>>   The interesting one is the HMP command.  Does your loop lock it out?
>>   If we run it only in the main thread, and we run the HMP command only
>>   in the main thread, it obviously does.
>>
>> * mon_init_func()
>>
>>   CLI option -mon and its convenience buddies -monitor, -qmp,
>>   -qmp-pretty
>>
>>   We don't have a monitor command to spawn off a new monitor, but we
>>   could have.
>>
>> * qemu_chr_new_noreplay()
>>
>>   gdbserver_start() again, and qemu_chr_new(), which is called all over
>>   the place.  I lack the time to review these calls.  Are you sure this
>>   one can only run in the main thread?
>>
>
> No, I am not sure, but I would consider it a bug today. However, if
> it's possible to keep using the monitor or create new monitor after
> monitor_cleanup() is called, we have probably have more issues to
> solve.
>
> However, this problem is not directly related to the dead-lock fixed
> here, and the problem is pre-existing.
>
> Probably the cleanup code would have to look different if we want to
> solve the init/cleanup races, but that's a different fix. Do we have
> to solve it first, or can we add a FIXME?


FIXME is fine.

Can we go with the simpler fix I sketched?

>> Synchronizing monitor creation and cleanup explicitly might be cleaner.
>> I guess monitor_lock kind of sort of almost does that before your patch,
>> but it can deadlock because it's too coarse.
>>
>> I'm afraid we need to rethink the set of locks protecting shared monitor
>> state.
>
> Yes, and probably change a bit monitor/chardev creation to be under
> tighter control...

We're going to have fun...

Re: [Qemu-devel] [PATCH] monitor: avoid potential dead-lock when cleaning up

Reply via email to