[ 
https://issues.apache.org/jira/browse/KUDU-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin updated KUDU-3635:
--------------------------------
    Description: 
The kudu CLI tools sometimes crash on exit with SIGSEGV.

I haven't had a chance to look at this closely, but the problem seems to be 
related to the order in which different libraries are cleaned up and to the 
overall unexpected state of the runtime when the implicitly installed cleanup 
handler for the OpenSSL library is called.

Below is a snippet from the output of the 
{{RebalanceIgnoredTserversTest.Basic}} test scenario.  It was generated by 
Kudu bits built in DEBUG configuration on an Ubuntu 18.04.6 LTS machine and 
run via dist-test, also on Ubuntu 18.04.6 LTS.

BTW, we have been suppressing TSAN warnings in the OpenSSL cleanup paths for a 
long time due to a well-known issue in the OpenSSL library (see [this TSAN 
suppression|https://github.com/apache/kudu/blob/2b9a2012f6d7b59931119dfad03e8d40e3031a0e/src/kudu/util/sanitizer_options.cc#L177-L184]),
 so there might be other issues in that area that we haven't paid attention to 
for a long time.

Probably, it's time to follow [best practices for at-exit cleanup of 
applications using 
OpenSSL|https://developers.redhat.com/articles/2022/10/31/best-practices-application-shutdown-openssl#].
  In essence, the approach, which works with at least v1.1.1 and newer 
versions of the OpenSSL library, is to pass the {{OPENSSL_INIT_NO_ATEXIT}} 
option to {{OPENSSL_init_ssl()}} at initialization and then explicitly call 
{{OPENSSL_cleanup()}} upon exit/shutdown.

{noformat}
*** SIGSEGV (@0x10000562bd5) received by PID 1447 (TID 0x7fb1cda47480) from PID 5647317; stack trace: ***
    @     0x7fb1d6307980 (unknown) at ??:0                                      
    @     0x7fb1d5a37873 tcmalloc::ThreadCache::ReleaseToCentralCache() at ??:0 
    @     0x7fb1d5a37be7 tcmalloc::ThreadCache::Scavenge() at ??:0              
    @     0x7fb1d3bce271 OPENSSL_LH_free at ??:0                                
    @     0x7fb1d3bacbfd (unknown) at ??:0                                      
    @     0x7fb1d3bcbe10 OPENSSL_cleanup at ??:0                                
    @     0x7fb1d434e161 (unknown) at ??:0                                      
    @     0x7fb1d434e25a exit at ??:0                                           
    @     0x7fb1d432cbfe __libc_start_main at ??:0                              
    @     0x562bc9f8300a _start at ??:0   
{noformat}

  was:
The kudu CLI tools sometimes crash on exit with SIGSEGV.

I haven't had a chance to look at this closely, but the problem seems to be 
related to the order in which different libraries are cleaned up and to the 
overall unexpected state of the runtime when the implicitly installed cleanup 
handler for the OpenSSL library is called.

Below is a snippet from the output of the 
{{RebalanceIgnoredTserversTest.Basic}} test scenario.  It was generated by 
Kudu bits built in DEBUG configuration on an Ubuntu 18.04.6 LTS machine and 
run via dist-test, also on Ubuntu 18.04.6 LTS.

BTW, we have been suppressing TSAN warnings in the OpenSSL cleanup paths for a 
long time due to a well-known issue in the OpenSSL library (see [this TSAN 
suppression|https://github.com/apache/kudu/blob/2b9a2012f6d7b59931119dfad03e8d40e3031a0e/src/kudu/util/sanitizer_options.cc#L177-L184]),
 so there might be other issues in that area that we haven't paid attention to 
for a long time.

Probably, it's time to follow [best practices for at-exit cleanup of 
applications using 
OpenSSL|https://developers.redhat.com/articles/2022/10/31/best-practices-application-shutdown-openssl#].
  In essence, the approach, which works with at least v1.1.1 and newer 
versions of the OpenSSL library, is to pass the {{OPENSSL_INIT_NO_ATEXIT}} 
option to the {{OPENSSL_init_crypto()}} call at initialization and then 
explicitly call {{OPENSSL_cleanup()}} upon exit/shutdown.

{noformat}
*** SIGSEGV (@0x10000562bd5) received by PID 1447 (TID 0x7fb1cda47480) from PID 5647317; stack trace: ***
    @     0x7fb1d6307980 (unknown) at ??:0                                      
    @     0x7fb1d5a37873 tcmalloc::ThreadCache::ReleaseToCentralCache() at ??:0 
    @     0x7fb1d5a37be7 tcmalloc::ThreadCache::Scavenge() at ??:0              
    @     0x7fb1d3bce271 OPENSSL_LH_free at ??:0                                
    @     0x7fb1d3bacbfd (unknown) at ??:0                                      
    @     0x7fb1d3bcbe10 OPENSSL_cleanup at ??:0                                
    @     0x7fb1d434e161 (unknown) at ??:0                                      
    @     0x7fb1d434e25a exit at ??:0                                           
    @     0x7fb1d432cbfe __libc_start_main at ??:0                              
    @     0x562bc9f8300a _start at ??:0   
{noformat}


> kudu CLI tool sometimes crashes on exit with SIGSEGV in OPENSSL_cleanup
> -----------------------------------------------------------------------
>
>                 Key: KUDU-3635
>                 URL: https://issues.apache.org/jira/browse/KUDU-3635
>             Project: Kudu
>          Issue Type: Bug
>          Components: CLI
>    Affects Versions: 1.18.0
>            Reporter: Alexey Serbin
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
