On Fri, Mar 31, 2017 at 06:53:56PM +0200, Paolo Bonzini wrote:
>
>
> On 31/03/2017 18:43, Stefan Hajnoczi wrote:
> > The ISA serial port device's output can hang when the pipe on stdout
> > becomes full. This is a race condition where the vcpu thread executing
> > serial emulation code adds a watch on stdout while the main loop thread
> > is blocked in ppoll(2). If no timer or other event wakes up the main
> > loop, there will be no further output from the serial device even when
> > the pipe becomes writable.
> >
> > Richard W. M. Jones was able to reproduce the hang on recent versions of
> > guestfs-tools-c and libglib2 on Fedora 26 hosts.
> >
> > This patch kicks the main loop so the next iteration invokes ppoll(2)
> > with the watch fd.
> >
> > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1435432
> > Reported-by: Richard W. M. Jones <rjo...@redhat.com>
> > Tested-by: Richard W. M. Jones <rjo...@redhat.com>
> > Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
> > ---
> >  chardev/char.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/chardev/char.c b/chardev/char.c
> > index 3df1163..6c99c34 100644
> > --- a/chardev/char.c
> > +++ b/chardev/char.c
> > @@ -1059,6 +1059,11 @@ guint qemu_chr_fe_add_watch(CharBackend *be,
> > GIOCondition cond,
> >      tag = g_source_attach(src, NULL);
> >      g_source_unref(src);
> >
> > +    /* The main loop may be in blocked waiting on events in another thread.
> > +     * Kick it so the new watch will be added.
> > +     */
> > +    qemu_notify_event();
> > +
> >      return tag;
> >  }
> >
> >
>
> Thanks for looking at this, I was quite stuck and now I understand
> what's going on. However, I don't believe your patch is the right
> solution.
>
> According to Richard's bisection, the bug was introduced by the patch
> at https://bug761102.bugzilla-attachments.gnome.org/attachment.cgi?id=319699.
>
> The g_wakeup_signal that is removed (actually made conditional) in that
> patch is doing exactly the same thing as qemu_notify_event, which is
> fishy... It would still be a QEMU bug according to the theory below but,
> depending on how they handle backwards-compatibility, they might
> consider undoing this change.
>
> glib is expecting QEMU to use g_main_context_acquire around accesses to
> GMainContext. However QEMU is not doing that, instead it is taking its
> own mutex. So we should add g_main_context_acquire and
> g_main_context_release in the two implementations of
> os_host_main_loop_wait; these should undo the effect of Frediano's
> glib patch.
>
> In all fairness, the docs do say "You must be the owner of a context
> before you can call g_main_context_prepare(), g_main_context_query(),
> g_main_context_check(), g_main_context_dispatch()". However, it has
> worked until now and the documentation does not say exactly why that
> is necessary.

Thanks Paolo, very interesting. I never realized that glib has its own
inter-thread signal.

NACK

Stefan
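
Illustration (not part of the original messages): the race described in the
quoted patch, and the effect of an inter-thread wakeup, can be sketched with
plain POSIX calls. This is a minimal sketch under stated assumptions, not
QEMU or glib code. The names struct loop, loop_thread and run_demo are
hypothetical; the wakeup pipe stands in for what qemu_notify_event() or
glib's own wakeup signal would do, and the 200 ms poll timeout stands in for
"no timer or other event wakes up the main loop".

```c
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical stand-in for a main loop; not QEMU or glib code. */
struct loop {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int fds_built;   /* set once the loop has built its first fd set */
    int wake_rd;     /* read end of the wakeup pipe */
    int wake_wr;     /* write end: writing one byte "kicks" the loop */
    int watch_fd;    /* fd added by another thread, -1 until then */
    int serviced;    /* set when the loop sees watch_fd become writable */
};

static void *loop_thread(void *opaque)
{
    struct loop *l = opaque;

    for (int iter = 0; iter < 2; iter++) {
        struct pollfd fds[2];
        nfds_t n = 1;
        char c;

        fds[0] = (struct pollfd){ .fd = l->wake_rd, .events = POLLIN };

        /* Snapshot the watch list, as a main loop does before polling. */
        pthread_mutex_lock(&l->lock);
        if (l->watch_fd >= 0) {
            fds[1] = (struct pollfd){ .fd = l->watch_fd, .events = POLLOUT };
            n = 2;
        }
        l->fds_built = 1;
        pthread_cond_signal(&l->cond);
        pthread_mutex_unlock(&l->lock);

        /* The 200 ms timeout models "no timer or other event". */
        if (poll(fds, n, 200) <= 0) {
            break;              /* nothing woke us: the new watch is missed */
        }
        if (n == 2 && (fds[1].revents & POLLOUT)) {
            l->serviced = 1;    /* the watch made it into the fd set */
            break;
        }
        /* Kicked: consume the byte, then loop to rebuild the fd set. */
        if (read(l->wake_rd, &c, 1) != 1) {
            break;
        }
    }
    return NULL;
}

/* Returns 1 if the late-added watch was serviced, 0 if missed, -1 on error. */
int run_demo(int kick)
{
    struct loop l = { .watch_fd = -1 };
    int wake[2], out[2];
    int rc = -1;
    pthread_t tid;

    if (pipe(wake) != 0) {
        return -1;
    }
    if (pipe(out) != 0) {
        close(wake[0]);
        close(wake[1]);
        return -1;
    }
    l.wake_rd = wake[0];
    l.wake_wr = wake[1];
    pthread_mutex_init(&l.lock, NULL);
    pthread_cond_init(&l.cond, NULL);

    if (pthread_create(&tid, NULL, loop_thread, &l) == 0) {
        ssize_t w = 1;

        /* Wait until the loop has built its fd set, then add the watch. */
        pthread_mutex_lock(&l.lock);
        while (!l.fds_built) {
            pthread_cond_wait(&l.cond, &l.lock);
        }
        l.watch_fd = out[1];    /* an empty pipe is writable right away */
        pthread_mutex_unlock(&l.lock);

        if (kick) {
            w = write(l.wake_wr, "k", 1);   /* the "kick" */
        }
        pthread_join(tid, NULL);
        rc = (w == 1) ? l.serviced : -1;
    }

    pthread_mutex_destroy(&l.lock);
    pthread_cond_destroy(&l.cond);
    close(wake[0]);
    close(wake[1]);
    close(out[0]);
    close(out[1]);
    return rc;
}
```

In this sketch run_demo(0) models the hang: the watch is added after the fd
set was built, nothing wakes the loop, and the writable pipe goes unnoticed.
run_demo(1) models the kick: the loop wakes up, rebuilds its fd set and sees
the new fd. Paolo's point is that glib already provides such a wakeup
internally, provided the caller owns the GMainContext via
g_main_context_acquire.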