Fei Li <lifei1...@126.com> writes:

> On 2019/1/9 at 1:29 AM, Markus Armbruster wrote:
>> fei <lifei1...@126.com> writes:
>>
>>>> On Jan 8, 2019, at 01:55, Markus Armbruster <arm...@redhat.com> wrote:
>>>>
>>>> Fei Li <f...@suse.com> writes:
>>>>
>>>>> To avoid the segmentation fault in qemu_thread_join(), just directly
>>>>> return when the QemuThread *thread failed to be created in either
>>>>> qemu-thread-posix.c or qemu-thread-win32.c.
>>>>>
>>>>> Cc: Stefan Weil <s...@weilnetz.de>
>>>>> Signed-off-by: Fei Li <f...@suse.com>
>>>>> Reviewed-by: Fam Zheng <f...@redhat.com>
>>>>> ---
>>>>>  util/qemu-thread-posix.c | 3 +++
>>>>>  util/qemu-thread-win32.c | 2 +-
>>>>>  2 files changed, 4 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
>>>>> index 39834b0551..3548935dac 100644
>>>>> --- a/util/qemu-thread-posix.c
>>>>> +++ b/util/qemu-thread-posix.c
>>>>> @@ -571,6 +571,9 @@ void *qemu_thread_join(QemuThread *thread)
>>>>>      int err;
>>>>>      void *ret;
>>>>>
>>>>> +    if (!thread->thread) {
>>>>> +        return NULL;
>>>>> +    }
>>>> How can this happen?
>>> I think I have answered this earlier, please check the following link to
>>> see whether it helps:
>>> http://lists.nongnu.org/archive/html/qemu-devel/2018-11/msg06554.html
>> Thanks for the pointer.  Unfortunately, I don't understand your
>> explanation.  You also wrote there "I will remove this patch in next
>> version"; looks like you've since changed your mind.
> Emm, issues left over from history.  The background is that I was in a
> hurry to get those five Reviewed-by patches merged, including this
> v9 16/16 patch, but not the real qemu_thread_create() modification.
> But this patch actually fixes the segmentation fault introduced once
> the qemu_thread_create() related functions are modified, even though
> it had received a Reviewed-by earlier. :) So, to avoid confusion, I
> wrote the "remove..." sentence to separate it from those five
> Reviewed-by patches, and planned to send only four patches.
> But later I got a message that these five patches were not that urgent
> to catch qemu v3.1, so I folded the earlier five R-b patches into the
> later v8 & v9 for a better review.
>
> Sorry for the trouble; I needed to explain this without involving too
> much background.
>
> Back at the farm: in our current qemu code, some cleanups use a loop
> to join() the total number of threads when the caller fails.  This is
> not a problem until the qemu_thread_create() modification is applied.
> E.g. when compress_threads_save_setup() fails while trying to create
> the last do_data_compress thread, a segmentation fault occurs when
> join() is called on it (sadly there is no condition available to
> filter out the unsuccessfully created thread), because that thread
> was never actually created.
>
> Hope the above makes it clear. :)
Alright, let's have a look at compress_threads_save_setup():

    static int compress_threads_save_setup(void)
    {
        int i, thread_count;

        if (!migrate_use_compression()) {
            return 0;
        }
        thread_count = migrate_compress_threads();
        compress_threads = g_new0(QemuThread, thread_count);
        comp_param = g_new0(CompressParam, thread_count);
        qemu_cond_init(&comp_done_cond);
        qemu_mutex_init(&comp_done_lock);
        for (i = 0; i < thread_count; i++) {
            comp_param[i].originbuf = g_try_malloc(TARGET_PAGE_SIZE);
            if (!comp_param[i].originbuf) {
                goto exit;
            }

            if (deflateInit(&comp_param[i].stream,
                            migrate_compress_level()) != Z_OK) {
                g_free(comp_param[i].originbuf);
                goto exit;
            }

            /* comp_param[i].file is just used as a dummy buffer to save data,
             * set its ops to empty.
             */
            comp_param[i].file = qemu_fopen_ops(NULL, &empty_ops);
            comp_param[i].done = true;
            comp_param[i].quit = false;
            qemu_mutex_init(&comp_param[i].mutex);
            qemu_cond_init(&comp_param[i].cond);
            qemu_thread_create(compress_threads + i, "compress",
                               do_data_compress, comp_param + i,
                               QEMU_THREAD_JOINABLE);
        }

        return 0;

    exit:
        compress_threads_save_cleanup();
        return -1;
    }

At label exit, we have @i threads, all fully initialized.  That's an
invariant.
compress_threads_save_cleanup() finds the threads to clean up by
checking comp_param[i].file:

    static void compress_threads_save_cleanup(void)
    {
        int i, thread_count;

        if (!migrate_use_compression() || !comp_param) {
            return;
        }

        thread_count = migrate_compress_threads();
        for (i = 0; i < thread_count; i++) {
            /*
             * we use it as a indicator which shows if the thread is
             * properly init'd or not
             */
--->        if (!comp_param[i].file) {
--->            break;
--->        }
            qemu_mutex_lock(&comp_param[i].mutex);
            comp_param[i].quit = true;
            qemu_cond_signal(&comp_param[i].cond);
            qemu_mutex_unlock(&comp_param[i].mutex);

            qemu_thread_join(compress_threads + i);
            qemu_mutex_destroy(&comp_param[i].mutex);
            qemu_cond_destroy(&comp_param[i].cond);
            deflateEnd(&comp_param[i].stream);
            g_free(comp_param[i].originbuf);
            qemu_fclose(comp_param[i].file);
            comp_param[i].file = NULL;
        }
        qemu_mutex_destroy(&comp_done_lock);
        qemu_cond_destroy(&comp_done_cond);
        g_free(compress_threads);
        g_free(comp_param);
        compress_threads = NULL;
        comp_param = NULL;
    }

Due to the invariant, a comp_param[i] with a null .file doesn't need
*any* cleanup.

To maintain the invariant, compress_threads_save_setup() carefully
cleans up any partial initializations itself before a goto exit.  Since
the code is arranged smartly, the only such cleanup is the
g_free(comp_param[i].originbuf) before the second goto exit.

Your PATCH 13 adds a third goto exit, but neglects to clean up partial
initializations.  Breaks the invariant.

I see two sane solutions:

1. compress_threads_save_setup() carefully cleans up partial
   initializations itself.  compress_threads_save_cleanup() copes only
   with fully initialized comp_param[i].  This is how things work
   before your series.

2. compress_threads_save_cleanup() copes with partially initialized
   comp_param[i], i.e. does the right thing for each goto exit in
   compress_threads_save_setup().  compress_threads_save_setup()
   doesn't clean up partial initializations.

Your PATCH 13 together with the fixup PATCH 16 does 3.
A confusing mix of the two.  Don't.

> Have a nice day
> Fei
>>
>> What exactly breaks if we omit this patch?  Assuming something does
>> break: imagine we did omit this patch, then forgot we ever saw it, and
>> now you've discovered the breakage.  Write us the bug report, complete
>> with reproducer.
>>
>> [...]