On Wed, Feb 26, 2025 at 10:38:56AM +0100, Thomas Huth wrote:
> On 26/02/2025 10.15, Daniel P. Berrangé wrote:
> > On Wed, Feb 26, 2025 at 09:50:15AM +0100, Thomas Huth wrote:
> > > When compiling QEMU with --enable-trace-backends=simple , the
> > > iotest 233 is currently hanging. This happens because qemu-nbd
> > > calls trace_init_backends() first - which causes simpletrace to
> > > install its writer thread and the atexit() handler - before
> > > calling fork(). But the simpletrace writer thread is then only
> > > available in the parent process, not in the child process anymore.
> > > Thus when the child process exits, its atexit handler waits forever
> > > on the trace_empty_cond condition to be set by the non-existing
> > > writer thread, so the process never finishes.
> > > 
> > > Fix it by installing a pthread_atfork() handler, too, which
> > > makes sure that the trace_writeout_enabled variable gets set
> > > to false again in the child process, so we can use it in the
> > > atexit() handler to check whether we still need to wait on the
> > > writer thread or not.
> > > 
> > > Signed-off-by: Thomas Huth <th...@redhat.com>
> > > ---
> > >   trace/simple.c | 17 ++++++++++++++++-
> > >   1 file changed, 16 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/trace/simple.c b/trace/simple.c
> > > index c0aba00cb7f..269bbda69f1 100644
> > > --- a/trace/simple.c
> > > +++ b/trace/simple.c
> > > @@ -380,8 +380,22 @@ void st_print_trace_file_status(void)
> > >   void st_flush_trace_buffer(void)
> > >   {
> > > -    flush_trace_file(true);
> > > +    flush_trace_file(trace_writeout_enabled);
> > > +}
> > > +
> > > +#ifndef _WIN32
> > > +static void trace_thread_atfork(void)
> > > +{
> > > +    /*
> > > +     * If we fork, the writer thread does not exist in the child, so
> > > +     * make sure to allow st_flush_trace_buffer() to clean up correctly.
> > > +     */
> > > +    g_mutex_lock(&trace_lock);
> > > +    trace_writeout_enabled = false;
> > > +    g_cond_signal(&trace_empty_cond);
> > > +    g_mutex_unlock(&trace_lock);
> > >   }
> > > +#endif
> > 
> > This doesn't seem right to me. This is being run in the child and while
> > it may avoid the hang when the child exits, surely it still leaves tracing
> > non-functional in the child as we're lacking the thread to write out the
> > trace data.
> 
> Well, you cannot write to the same file from the parent and child at the
> same time, so one of both needs to be shut up AFAIU. And the simpletrace
> code cannot now which one of the two processes should be allowed to continue
> with the logging, so we either have to disable tracing in one of the two
> processes, or think of something completely different, e.g. using
> pthread_atfork(abort, NULL, NULL) to make people aware that they are not
> allowed to start tracing before calling fork()...? But in that case we still
> need a qemu-nbd expert to fix qemu-nbd, so that it does not initialize the
> trace backend before calling fork().

As precedent, in system/vl.c we delay trace_init() until after daemonizing
which is the simple way to avoid the worst of the danger.

It would still be nice to have an atfork() handler to fully eliminate the
danger though

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Reply via email to