When a highly multi-threaded program such as nbdcopy encounters an error, there is a race condition in the library which can cause an assertion failure and thus a core dump:
(1) An error occurs on one of the threads. nbdcopy calls exit(3). (2) In lib/errors.c, the destructor calls pthread_key_delete. (3) Another thread which is still running also encounters an error, and inside libnbd the library calls set_error(). (4) The call to set_error() calls pthread_getspecific which returns NULL (since the key has already been destroyed in step (2)), and this causes us to call pthread_setspecific which returns EINVAL because glibc is able to detect invalid use of a pthread_key_t after it has been destroyed. In any case, the error message is lost, and any subsequent call to nbd_get_error() will return NULL. (5) We enter the %DEAD state, where there is an assertion that nbd_get_error() is not NULL. This assertion is supposed to be for checking that the code called set_error() before entering the %DEAD state. Although we did call set_error(), the error message was lost at step (4) and nbd_get_error() will always return NULL. This assertion failure causes a crash. There aren't any good ways to fix this. I chose to remove the assertions that might trigger, and ignore pthread_getspecific returning EINVAL. This reverts a small part of commit 831cbf7ee14c ("generator: Allow DEAD state actions to run"). --- generator/state_machine_generator.ml | 5 +---- generator/states.c | 2 -- lib/errors.c | 5 ++++- 3 files changed, 5 insertions(+), 7 deletions(-) diff --git a/generator/state_machine_generator.ml b/generator/state_machine_generator.ml index 43c1ce4c14..9f94bed1f3 100644 --- a/generator/state_machine_generator.ml +++ b/generator/state_machine_generator.ml @@ -449,10 +449,7 @@ let pr " abort (); /* Should never happen, but keeps GCC happy. */\n"; pr " }\n"; pr "\n"; - pr " if (r == -1) {\n"; - pr " assert (nbd_get_error () != NULL);\n"; - pr " return -1;\n"; - pr " }\n"; + pr " if (r == -1) return -1;\n"; pr " } while (!blocked);\n"; pr " return 0;\n"; pr "}\n"; diff --git a/generator/states.c b/generator/states.c index fa0f8d716e..c0cf5a7f26 100644 --- a/generator/states.c +++ b/generator/states.c @@ -191,8 +191,6 @@ STATE_MACHINE { return 0; DEAD: - /* The caller should have used set_error() before reaching here */ - assert (nbd_get_error ()); abort_option (h); nbd_internal_abort_commands (h, &h->cmds_to_issue); nbd_internal_abort_commands (h, &h->cmds_in_flight); diff --git a/lib/errors.c b/lib/errors.c index 8b77650ef3..9142a0843e 100644 --- a/lib/errors.c +++ b/lib/errors.c @@ -93,7 +93,10 @@ allocate_last_error_on_demand (void) last_error = calloc (1, sizeof *last_error); if (last_error) { int err = pthread_setspecific (errors_key, last_error); - if (err != 0) { + /* Ignore EINVAL because that can happen in a race with other + * threads when we are exiting. + */ + if (err != 0 && err != EINVAL) { /* This is not supposed to happen (XXX). */ fprintf (stderr, "%s: %s: %s\n", "libnbd", "pthread_setspecific", strerror (err)); -- 2.39.2 _______________________________________________ Libguestfs mailing list Libguestfs@redhat.com https://listman.redhat.com/mailman/listinfo/libguestfs