On Thu, Mar 09, 2023 at 09:13:24AM +0000, Daniel P. Berrangé wrote: > On Thu, Mar 09, 2023 at 08:44:51AM +0000, Richard W.M. Jones wrote: > > When a highly multi-threaded program such as nbdcopy encounters an > > error, there is a race condition in the library which can cause an > > assertion failure and thus a core dump: > > > > (1) An error occurs on one of the threads. nbdcopy calls exit(3). > > > > (2) In lib/errors.c, the destructor calls pthread_key_delete. > > > > (3) Another thread which is still running also encounters an error, > > and inside libnbd the library calls set_error(). > > > > (4) The call to set_error() calls pthread_getspecific which returns > > NULL (since the key has already been destroyed in step (2)), and this > > causes us to call pthread_setspecific which returns EINVAL because > > glibc is able to detect invalid use of a pthread_key_t after it has > > been destroyed. In any case, the error message is lost, and any > > subsequent call to nbd_get_error() will return NULL. > > > > (5) We enter the %DEAD state, where there is an assertion that > > nbd_get_error() is not NULL. This assertion is supposed to be for > > checking that the code called set_error() before entering the %DEAD > > state. Although we did call set_error(), the error message was lost > > at step (4) and nbd_get_error() will always return NULL. This > > assertion failure causes a crash. > > > > There aren't any good ways to fix this. I chose to leak the > > pthread_key_t on the exit path. > > --- > > lib/errors.c | 6 +++++- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/lib/errors.c b/lib/errors.c > > index 8b77650ef3..6fbfaacd34 100644 > > --- a/lib/errors.c > > +++ b/lib/errors.c > > @@ -69,7 +69,11 @@ errors_key_destroy (void) > > free (last_error->error); > > free (last_error); > > } > > - pthread_key_delete (errors_key); > > + > > + /* We could do this, but that causes a race condition described here: > > + * https://listman.redhat.com/archives/libguestfs/2023-March/031002.html > > + */ > > + //pthread_key_delete (errors_key); > > } > > While I think this fixes the problem with pthread_setspecific crashing, > I believe this then opens apps up to the problem hit by libvirt with > destructors being run which don't exist in memory. > > Consider a thread is running that has done something with libnbd which > caused allocate_last_error_on_demand() to run, which populates the > 'errors_key' data for that thread. This thread is no longer doing anything > with libnbd, but the 'errors_key' data remains populated none the less > because that's just how thread locals work. > > Now nothing at all is using libnbd.so, so some thread calls dlclose(), > causing libnbd.so to be unmapped from memory. Since we've not called > pthread_key_delete, the thread local still exists and, crucially, so > does its registered destructor function. > > The thread mentioned in the previous paragraph now terminates and this > causes the destructor function 'free_errors_key' to be run.... except > this function isn't mapped in memory any more. This will cause the > program to crash.
Indeed it does exactly this. v4 posted. Rich. > With regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v _______________________________________________ Libguestfs mailing list Libguestfs@redhat.com https://listman.redhat.com/mailman/listinfo/libguestfs