On Wed, Feb 26, 2025 at 09:53:53AM +0000, Daniel P. Berrangé wrote:
> On Wed, Feb 26, 2025 at 10:38:56AM +0100, Thomas Huth wrote:
> > On 26/02/2025 10.15, Daniel P. Berrangé wrote:
> > > On Wed, Feb 26, 2025 at 09:50:15AM +0100, Thomas Huth wrote:
> > > > When compiling QEMU with --enable-trace-backends=simple , the
> > > > iotest 233 is currently hanging. This happens because qemu-nbd
> > > > calls trace_init_backends() first - which causes simpletrace to
> > > > install its writer thread and the atexit() handler - before
> > > > calling fork(). But the simpletrace writer thread is then only
> > > > available in the parent process, not in the child process anymore.
> > > > Thus when the child process exits, its atexit handler waits forever
> > > > on the trace_empty_cond condition to be set by the non-existing
> > > > writer thread, so the process never finishes.
> > > > 
> > > > Fix it by installing a pthread_atfork() handler, too, which
> > > > makes sure that the trace_writeout_enabled variable gets set
> > > > to false again in the child process, so we can use it in the
> > > > atexit() handler to check whether we still need to wait on the
> > > > writer thread or not.
> > > > 
> > > > Signed-off-by: Thomas Huth <th...@redhat.com>
> > > > ---
> > > >   trace/simple.c | 17 ++++++++++++++++-
> > > >   1 file changed, 16 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/trace/simple.c b/trace/simple.c
> > > > index c0aba00cb7f..269bbda69f1 100644
> > > > --- a/trace/simple.c
> > > > +++ b/trace/simple.c
> > > > @@ -380,8 +380,22 @@ void st_print_trace_file_status(void)
> > > >   void st_flush_trace_buffer(void)
> > > >   {
> > > > -    flush_trace_file(true);
> > > > +    flush_trace_file(trace_writeout_enabled);
> > > > +}
> > > > +
> > > > +#ifndef _WIN32
> > > > +static void trace_thread_atfork(void)
> > > > +{
> > > > +    /*
> > > > +     * If we fork, the writer thread does not exist in the child, so
> > > > +     * make sure to allow st_flush_trace_buffer() to clean up 
> > > > correctly.
> > > > +     */
> > > > +    g_mutex_lock(&trace_lock);
> > > > +    trace_writeout_enabled = false;
> > > > +    g_cond_signal(&trace_empty_cond);
> > > > +    g_mutex_unlock(&trace_lock);
> > > >   }
> > > > +#endif
> > > 
> > > This doesn't seem right to me. This is being run in the child and while
> > > it may avoid the hang when the child exits, surely it still leaves tracing
> > > non-functional in the child as we're lacking the thread to write out the
> > > trace data.
> > 
> > Well, you cannot write to the same file from the parent and child at the
> > same time, so one of both needs to be shut up AFAIU. And the simpletrace
> > code cannot now which one of the two processes should be allowed to continue
> > with the logging, so we either have to disable tracing in one of the two
> > processes, or think of something completely different, e.g. using
> > pthread_atfork(abort, NULL, NULL) to make people aware that they are not
> > allowed to start tracing before calling fork()...? But in that case we still
> > need a qemu-nbd expert to fix qemu-nbd, so that it does not initialize the
> > trace backend before calling fork().
> 
> As precedent, in system/vl.c we delay trace_init() until after daemonizing
> which is the simple way to avoid the worst of the danger.

That sounds good to me.

Adding Daniel Henrique Barboza because he fixed a similar issue in
commit 10b6ee1616f9 ("vl.c: do not execute trace_init_backends() before
daemonizing").

Stefan

Attachment: signature.asc
Description: PGP signature

Reply via email to