On 2/22/23 09:09, Richard W.M. Jones wrote:
> On Tue, Feb 21, 2023 at 05:03:23PM +0100, Laszlo Ersek wrote:
>> Rich mentioned that libnbd had actually encountered a bug of this kind,
>> just not specifically in exec*p*().
>
> Probably this one?
>
> https://github.com/libguestfs/libguestfs/commit/e1c9bbb3d1d5ef81490977060120dda0963eb567

Yes, that's the one you mentioned before! Here:

http://mid.mail-archive.com/20230131130753.GA7636@redhat.com

> glibc was pretty tolerant of this code bug, and the error only
> manifested itself when we used glibc.malloc.check=1

Thank you for pointing me to the same commit again.

Yesterday, when Daniel described that malloc() was -- in practice --
safe to call in a child process forked from a multi-threaded process, I
wrote the following test program:

(The program starts 8 threads calling malloc+free in a busy loop, then
the main thread enters an infinite loop, forking and reaping a child
process in each iteration, and printing a dot for each child reaped. The
child process, forked from the multi-threaded parent process, calls a
single malloc+free pair, and then exits.)

   1  #define _XOPEN_SOURCE 700
   2
   3  #include <pthread.h>
   4  #include <stdlib.h>
   5  #include <sys/wait.h>
   6  #include <unistd.h>
   7
   8  static const size_t size = 16 * 1024 * 1024;
   9
  10  static void *
  11  threadfn (void *arg)
  12  {
  13    while (1)
  14      free (malloc (size));
  15  }
  16
  17  int
  18  main (void)
  19  {
  20    unsigned i;
  21
  22    for (i = 0; i < 8; ++i) {
  23      pthread_t thread;
  24
  25      if (pthread_create (&thread, NULL, threadfn, NULL) != 0)
  26        _exit (EXIT_FAILURE);
  27    }
  28
  29    while (1) {
  30      pid_t pid;
  31
  32      pid = fork ();
  33      switch (pid) {
  34      case -1:
  35        _exit (EXIT_FAILURE);
  36
  37      case 0:
  38        /* child */
  39        free (malloc (size));
  40        _exit (EXIT_SUCCESS);
  41
  42      default:
  43        /* parent */
  44        if (waitpid (pid, NULL, 0) == -1 ||
  45            write (STDOUT_FILENO, ".", 1) == -1)
  46          _exit (EXIT_FAILURE);
  47      }
  48    }
  49  }

To my shock, the program ran totally fine (on RHEL-9.1), producing a
constant stream of dots on standard output, proving Daniel *right*.

However, the commit message you now reference highlights
"GLIBC_TUNABLES glibc.malloc.check=1".

Doing a web search for that, I'm led to

https://www.gnu.org/software/libc/manual/html_node/Memory-Allocation-Tunables.html

which states that this tunable actually depends on pre-loading
"libc_malloc_debug".

So, if I re-run the program like this:

$ LD_PRELOAD=/usr/lib64/libc_malloc_debug.so.0 \
  ./malloc-test

then it continues running; if I re-run it like this:

$ GLIBC_TUNABLES=glibc.malloc.check=1 \
  ./malloc-test

then it continues running; but if I re-run it like *this*:

$ LD_PRELOAD=/usr/lib64/libc_malloc_debug.so.0 \
  GLIBC_TUNABLES=glibc.malloc.check=1 \
  ./malloc-test

then it *instantly* deadlocks; it doesn't print a single dot.

According to gdb, Thread 1 of the parent process is blocked in waitpid()
on line 44, the other threads of the parent process are executing
threadfn() -- I can see that on my CPU load indicator too --, and the
child process is deadlocked in malloc():

#0  0x00007ffbae0934fb in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007ffbae218f38 in malloc_check () from /usr/lib64/libc_malloc_debug.so.0
#2  0x00007ffbae219c05 in malloc () from /usr/lib64/libc_malloc_debug.so.0
#3  0x000000000040121a in main () at malloc-test.c:39

(Yes, I understand that libc_malloc_debug is not meant for production
use; still...)

Laszlo
_______________________________________________
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs