[ https://issues.apache.org/jira/browse/KUDU-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Serbin updated KUDU-3635:
--------------------------------
    Affects Version/s: 1.17.1
                       1.17.0

> kudu CLI tool sometimes crashes on exit with SIGSEGV in OPENSSL_cleanup
> -----------------------------------------------------------------------
>
>                 Key: KUDU-3635
>                 URL: https://issues.apache.org/jira/browse/KUDU-3635
>             Project: Kudu
>          Issue Type: Bug
>          Components: CLI
>    Affects Versions: 1.17.0, 1.18.0, 1.17.1
>            Reporter: Alexey Serbin
>            Priority: Major
>
> The kudu CLI tools sometimes crash on exit with SIGSEGV.
> I haven't had a chance to look at this closely, but the problem seems to be
> related to the order in which different libraries are cleaned up and the
> overall unexpected state of the runtime at the point when the implicitly
> installed cleanup handler for the OpenSSL library is called.
> Below is a snippet from the output of the
> {{RebalanceIgnoredTserversTest.Basic}} test scenario. It was generated by
> Kudu bits built in RELEASE configuration on an Ubuntu 18.04.6 LTS machine and
> run via dist-test on Ubuntu 18.04.6 LTS as well.
> BTW, we have been suppressing TSAN warnings in the OpenSSL cleanup paths for
> a long time due to a well-known issue in the OpenSSL library (see [this TSAN
> suppression|https://github.com/apache/kudu/blob/2b9a2012f6d7b59931119dfad03e8d40e3031a0e/src/kudu/util/sanitizer_options.cc#L177-L184]),
> so there might be other issues in this area that we haven't paid attention
> to for a long time.
> It's probably time to follow the [best practices for at-exit cleanup of
> applications using
> OpenSSL|https://developers.redhat.com/articles/2022/10/31/best-practices-application-shutdown-openssl#].
> In essence, the following works with at least v1.1.1 and newer versions of
> the OpenSSL library: pass the {{OPENSSL_INIT_NO_ATEXIT}} option to
> {{OPENSSL_init_ssl()}} at initialization and then explicitly call
> {{OPENSSL_cleanup()}} upon exit/shutdown.
> {noformat}
> *** SIGSEGV (@0x10000562bd5) received by PID 1447 (TID 0x7fb1cda47480) from PID 5647317; stack trace: ***
>     @     0x7fb1d6307980 (unknown) at ??:0
>     @     0x7fb1d5a37873 tcmalloc::ThreadCache::ReleaseToCentralCache() at ??:0
>     @     0x7fb1d5a37be7 tcmalloc::ThreadCache::Scavenge() at ??:0
>     @     0x7fb1d3bce271 OPENSSL_LH_free at ??:0
>     @     0x7fb1d3bacbfd (unknown) at ??:0
>     @     0x7fb1d3bcbe10 OPENSSL_cleanup at ??:0
>     @     0x7fb1d434e161 (unknown) at ??:0
>     @     0x7fb1d434e25a exit at ??:0
>     @     0x7fb1d432cbfe __libc_start_main at ??:0
>     @     0x562bc9f8300a _start at ??:0
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)