On 26/02/2025 10.15, Daniel P. Berrangé wrote:
On Wed, Feb 26, 2025 at 09:50:15AM +0100, Thomas Huth wrote:
When compiling QEMU with --enable-trace-backends=simple , the
iotest 233 is currently hanging. This happens because qemu-nbd
calls trace_init_backends() first - which causes simpletrace to
install its writer thread and the atexit() handler - before
calling fork(). But the simpletrace writer thread is then only
available in the parent process, not in the child process anymore.
Thus when the child process exits, its atexit handler waits forever
on the trace_empty_cond condition to be set by the non-existing
writer thread, so the process never finishes.
Fix it by installing a pthread_atfork() handler, too, which
makes sure that the trace_writeout_enabled variable gets set
to false again in the child process, so we can use it in the
atexit() handler to check whether we still need to wait on the
writer thread or not.
Signed-off-by: Thomas Huth <th...@redhat.com>
---
trace/simple.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/trace/simple.c b/trace/simple.c
index c0aba00cb7f..269bbda69f1 100644
--- a/trace/simple.c
+++ b/trace/simple.c
@@ -380,8 +380,22 @@ void st_print_trace_file_status(void)
void st_flush_trace_buffer(void)
{
- flush_trace_file(true);
+ flush_trace_file(trace_writeout_enabled);
+}
+
+#ifndef _WIN32
+static void trace_thread_atfork(void)
+{
+ /*
+ * If we fork, the writer thread does not exist in the child, so
+ * make sure to allow st_flush_trace_buffer() to clean up correctly.
+ */
+ g_mutex_lock(&trace_lock);
+ trace_writeout_enabled = false;
+ g_cond_signal(&trace_empty_cond);
+ g_mutex_unlock(&trace_lock);
}
+#endif
This doesn't seem right to me. This is being run in the child and while
it may avoid the hang when the child exits, surely it still leaves tracing
non-functional in the child as we're lacking the thread to write out the
trace data.
Well, you cannot write to the same file from the parent and child at the
same time, so one of both needs to be shut up AFAIU. And the simpletrace
code cannot now which one of the two processes should be allowed to continue
with the logging, so we either have to disable tracing in one of the two
processes, or think of something completely different, e.g. using
pthread_atfork(abort, NULL, NULL) to make people aware that they are not
allowed to start tracing before calling fork()...? But in that case we still
need a qemu-nbd expert to fix qemu-nbd, so that it does not initialize the
trace backend before calling fork().
Thomas