On 2/22/23 09:09, Richard W.M. Jones wrote:
> On Tue, Feb 21, 2023 at 05:03:23PM +0100, Laszlo Ersek wrote:
>> Rich mentioned that libnbd had actually encountered a bug of this kind,
>> just not specifically in exec*p*().
> 
> Probably this one?
> 
> https://github.com/libguestfs/libguestfs/commit/e1c9bbb3d1d5ef81490977060120dda0963eb567

Yes, that's the one you mentioned before! Here:

http://mid.mail-archive.com/20230131130753.GA7636@redhat.com

> glibc was pretty tolerant of this code bug, and the error only
> manifested itself when we used glibc.malloc.check=1

Thank you for pointing me to the same commit again.

Yesterday, when Daniel described that malloc() was -- in practice -- safe to 
call in a child process forked from a multi-threaded process, I wrote the 
following test program:

(The program starts 8 threads calling malloc+free in a busy loop, then the main 
thread enters an infinite loop, forking and reaping a child process in each 
iteration, and printing a dot for each child reaped. The child process, forked 
from the multi-threaded parent process, calls a single malloc+free pair, and 
then exits.)

     1  #define _XOPEN_SOURCE 700
     2
     3  #include <pthread.h>
     4  #include <stdlib.h>
     5  #include <sys/wait.h>
     6  #include <unistd.h>
     7
     8  static const size_t size = 16 * 1024 * 1024;
     9
    10  static void *
    11  threadfn (void *arg)
    12  {
    13    while (1)
    14      free (malloc (size));
    15  }
    16
    17  int
    18  main (void)
    19  {
    20    unsigned i;
    21
    22    for (i = 0; i < 8; ++i) {
    23      pthread_t thread;
    24
    25      if (pthread_create (&thread, NULL, threadfn, NULL) != 0)
    26        _exit (EXIT_FAILURE);
    27    }
    28
    29    while (1) {
    30      pid_t pid;
    31
    32      pid = fork ();
    33      switch (pid) {
    34      case -1:
    35        _exit (EXIT_FAILURE);
    36
    37      case 0:
    38        /* child */
    39        free (malloc (size));
    40        _exit (EXIT_SUCCESS);
    41
    42      default:
    43        /* parent */
    44        if (waitpid (pid, NULL, 0) == -1 ||
    45            write (STDOUT_FILENO, ".", 1) == -1)
    46          _exit (EXIT_FAILURE);
    47      }
    48    }
    49  }


To my shock, the program ran totally fine (on RHEL-9.1), producing a constant 
stream of dots on standard output, proving Daniel *right*.

However, the commit message you now reference highlights "GLIBC_TUNABLES 
glibc.malloc.check=1". Doing a web search for that, I'm led to

  https://www.gnu.org/software/libc/manual/html_node/Memory-Allocation-Tunables.html

which states that this tunable actually depends on pre-loading 
"libc_malloc_debug".

So, if I re-run the program like this:

$ LD_PRELOAD=/usr/lib64/libc_malloc_debug.so.0 \
  ./malloc-test

then it continues running; if I re-run it like this:

$ GLIBC_TUNABLES=glibc.malloc.check=1 \
  ./malloc-test

then it continues running; but if I re-run it like *this*:

$ LD_PRELOAD=/usr/lib64/libc_malloc_debug.so.0 \
  GLIBC_TUNABLES=glibc.malloc.check=1 \
  ./malloc-test

then it *instantly* deadlocks; it doesn't print a single dot.

According to gdb, Thread 1 of the parent process is blocked in waitpid() on 
line 44, the other threads of the parent process are executing threadfn() (I 
can see that on my CPU load indicator too), and the child process is 
deadlocked in malloc():

#0  0x00007ffbae0934fb in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007ffbae218f38 in malloc_check () from /usr/lib64/libc_malloc_debug.so.0
#2  0x00007ffbae219c05 in malloc () from /usr/lib64/libc_malloc_debug.so.0
#3  0x000000000040121a in main () at malloc-test.c:39
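
The backtrace is consistent with the child waiting on a lock that some other 
thread of the parent happened to hold at the moment of fork(); that thread 
does not exist in the child, so nobody can ever release the lock there. I have 
not verified that this is what malloc_check() does internally. The following 
is only a minimal sketch of the general mechanism, using an ordinary pthread 
mutex in place of the allocator's lock. The names holder(), 
child_sees_held_lock() and lock_held are made up for this illustration, and 
pthread_mutex_trylock() is itself not an async-signal-safe function; it is used 
in the child purely to observe the lock state without hanging.

```c
#define _XOPEN_SOURCE 700

#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for a lock internal to an allocator (an assumption). */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t lock_held;

/* Helper thread: take the lock, announce it, and never release it. */
static void *
holder (void *arg)
{
  (void) arg;
  pthread_mutex_lock (&lock);
  sem_post (&lock_held);
  while (1)
    pause ();
  return NULL;
}

/* Call once per process.  Returns 1 if the forked child found the lock
 * already taken, 0 if the child could take it, -1 on error.
 */
static int
child_sees_held_lock (void)
{
  pthread_t thread;
  pid_t pid;
  int status;

  if (sem_init (&lock_held, 0, 0) == -1 ||
      pthread_create (&thread, NULL, holder, NULL) != 0)
    return -1;

  /* Wait until the helper thread owns the lock. */
  while (sem_wait (&lock_held) == -1)
    if (errno != EINTR)
      return -1;

  pid = fork ();
  switch (pid) {
  case -1:
    return -1;

  case 0:
    /* child: only the forking thread exists here, yet the copied lock is
     * still marked as taken; a blocking pthread_mutex_lock() would wait
     * forever.
     */
    _exit (pthread_mutex_trylock (&lock) == EBUSY ? 1 : 0);

  default:
    /* parent */
    if (waitpid (pid, &status, 0) == -1 || !WIFEXITED (status))
      return -1;
    return WEXITSTATUS (status);
  }
}
```

Calling child_sees_held_lock() from main() and checking for a return value of 
1 shows the child inheriting a lock that has no owner on its side of the fork.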

(Yes, I understand that libc_malloc_debug is not meant for production use; 
still...)
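
For comparison, here is a minimal sketch of one way to keep the allocator out 
of the child entirely, assuming the child's memory needs are known before 
fork(): the parent allocates, and the child only uses the inherited buffer and 
calls _exit(). The helper name run_child_with_buffer() is hypothetical, and I 
am not claiming this is what the referenced commit does.

```c
#define _XOPEN_SOURCE 700

#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork one child that needs SIZE bytes of scratch
 * memory.  Returns the child's exit status, or -1 on error.
 */
static int
run_child_with_buffer (size_t size)
{
  char *buf;
  pid_t pid;
  int status;

  if (size == 0)
    return -1;

  /* Allocate in the parent, before fork(). */
  buf = malloc (size);
  if (buf == NULL)
    return -1;

  pid = fork ();
  switch (pid) {
  case -1:
    free (buf);
    return -1;

  case 0:
    /* child: touch the inherited buffer, but call neither malloc() nor
     * free(); the memory goes away when the child exits.
     */
    buf[0] = 1;
    buf[size - 1] = 1;
    _exit (EXIT_SUCCESS);

  default:
    /* parent: the child has its own copy, so freeing here is fine. */
    free (buf);
    if (waitpid (pid, &status, 0) == -1 || !WIFEXITED (status))
      return -1;
    return WEXITSTATUS (status);
  }
}
```

With the same 16 MiB size as in the test program, 
run_child_with_buffer (16 * 1024 * 1024) should return 0 whether or not 
libc_malloc_debug is preloaded, since the child never enters malloc().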

Laszlo