Alexey Serbin created KUDU-3635:
-----------------------------------

             Summary: kudu CLI tool sometimes crashes on exit with SIGSEGV in OPENSSL_cleanup
                 Key: KUDU-3635
                 URL: https://issues.apache.org/jira/browse/KUDU-3635
             Project: Kudu
          Issue Type: Bug
          Components: CLI
    Affects Versions: 1.18.0
            Reporter: Alexey Serbin


The kudu CLI tools sometimes crash on exit with SIGSEGV.

I haven't had a chance to look at this closely, but the problem seems to be 
related to the order in which different libraries are cleaned up and the 
overall unexpected state of the runtime at the time the implicitly installed 
cleanup handler for the OpenSSL library is called.

Below is a snippet from the output of the 
{{RebalanceIgnoredTserversTest.Basic}} test scenario.  It was generated by 
Kudu bits built in DEBUG configuration on an Ubuntu 18.04.6 LTS machine and run 
via dist-test, also on Ubuntu 18.04.6 LTS.

BTW, we have been suppressing TSAN warnings in the OpenSSL cleanup paths for a 
long time due to a well-known issue in the OpenSSL library (see [this TSAN 
suppression|https://github.com/apache/kudu/blob/2b9a2012f6d7b59931119dfad03e8d40e3031a0e/src/kudu/util/sanitizer_options.cc#L177-L184]),
 so there might be other issues in that area that we haven't paid attention to 
for a long time.

It's probably time to follow [best practices for at-exit cleanup of 
applications using 
OpenSSL|https://developers.redhat.com/articles/2022/10/31/best-practices-application-shutdown-openssl#].
  In essence, the approach works at least with v1.1.1 and newer versions of the 
OpenSSL library: pass the {{OPENSSL_INIT_NO_ATEXIT}} option to the 
{{OPENSSL_init_crypto()}} call at initialization, and then explicitly call 
{{OPENSSL_cleanup()}} upon exit/shutdown.

{noformat}
*** SIGSEGV (@0x10000562bd5) received by PID 1447 (TID 0x7fb1cda47480) from PID 5647317; stack trace: ***
    @     0x7fb1d6307980 (unknown) at ??:0                                      
    @     0x7fb1d5a37873 tcmalloc::ThreadCache::ReleaseToCentralCache() at ??:0 
    @     0x7fb1d5a37be7 tcmalloc::ThreadCache::Scavenge() at ??:0              
    @     0x7fb1d3bce271 OPENSSL_LH_free at ??:0                                
    @     0x7fb1d3bacbfd (unknown) at ??:0                                      
    @     0x7fb1d3bcbe10 OPENSSL_cleanup at ??:0                                
    @     0x7fb1d434e161 (unknown) at ??:0                                      
    @     0x7fb1d434e25a exit at ??:0                                           
    @     0x7fb1d432cbfe __libc_start_main at ??:0                              
    @     0x562bc9f8300a _start at ??:0   
{noformat}



