schneuwlym created an issue (kamailio/kamailio#4185)

### Description

This issue is a follow up of the discussion in the [mailing 
list](https://lists.kamailio.org/mailman3/hyperkitty/list/sr-us...@lists.kamailio.org/thread/IO63U64NPN2NB3YGJBCOSZTIERTB54CD/)

We are using Kamailio 5.8.4 in a yocto scarthgap based image. We noticed on 
multiple nodes, that suddenly phones lost their registrations and the log 
showed messages like `send_fd(): sendmsg failed sending 55 on 52: Too many 
references: cannot splice (109)`.

After debugging we guess that the `send_fd` messages are just a symptom, not 
the cause. Generating a backtrace using `kamctl trap` shows that all TCP 
receiver are waiting for a `pthread_rwlock_...` lock.

Our deployments, where we randomly see the issue, are very small; 10-30 
registrations.

I've tested kamailio version 5.8.4, 5.8.5, 5.8.6 and 6.0.1 and I was able to 
reproduce the issue with all versions.

I've also tested version 5.5.4, but with an older yocto (dunfell) which used 
openssl 1.1.1w, **which did not block!**

Yocto Scarthgap uses OpenSSL 3.2.3 and we also updated to version 3.4.1, which 
didn't show any difference.

### Troubleshooting

#### Reproduction

Meanwhile we found a way to reproduce the issue within seconds. It requires a 
bit of brute force, but it locks kamailio reliable.

We are generating 3 parallel SIP REGISTER streams with sipexer to 3 different 
accounts. Following an example of one stream:
```
watch -n 0.1 "./sipexer -register -vl 3 -co -com -ex 3600 -fu "2"  -au "2" -ap 
"test" -tk <path to key>.key -tc <path to certificate>.cer -ti -sd -ru 
sip:<domain> -cu sip:2@<domain> tls:<fqdn>:5061;sleep 0.1;./sipexer -register 
-vl 3 -co -com -ex 0 -fu "2"  -au "2" -ap "test" -tk <path to key>.key -tc 
<path to cert>.cer -ti -sd -ru sip:<domain> -cu sip:2@<domain> tls:<fqdn>:5061"
```

Our dialplan is quite complex, but I was also able to reproduce the issue with 
the default 
[dialplan](https://raw.githubusercontent.com/kamailio/kamailio/refs/heads/5.8/etc/kamailio.cfg)
 with the following changes:
```
169a170,171
> #!define WITH_TLS
> 
205a208,219
> alias=udp:<domain>:5060
> alias=udp:127.0.0.1:5060
> alias=udp:10.0.6.1:5060
> alias=tls:<domain>
> alias=tls:10.0.6.1:5061
> alias=tls:10.165.42.76:5061
> alias=tls:<domain>:5061
> alias=udp:<domain>
> listen=udp:127.0.0.1:5060
> listen=tls:10.0.6.1:5061
> listen=udp:10.0.6.1:5060
> listen=tls:10.165.42.76:5061
498c512
< modparam("tls", "config", "/usr/local/etc/kamailio/tls.cfg")
---
> modparam("tls", "config", "/etc/kamailio/kamailio-tls.cfg")
```

```
# cat /etc/kamailio/kamailio-tls.cfg
[server:default]
method = TLSv1+
verify_certificate = yes
require_certificate = yes
private_key = "<path to key>.key"
certificate = "<path to cert>.crt"
ca_list = "<path to ca>.pem"

[client:default]
method = TLSv1+
verify_certificate = yes
require_certificate = yes
private_key = "<path to key>.key"
certificate = "<path to cert>.crt"
ca_list = "<path to ca>.pem"

```

#### Debugging Data

Following the backtrace of one tcp listener process. The full backtrace can be 
found 
[here](https://github.com/user-attachments/files/19430313/gdb_kamailio_20250321_154253.txt)

```
---start 4608 -----------------------------------------------------
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
__futex_abstimed_wait_common (futex_word=futex_word@entry=0x7fc855758a0c, 
expected=expected@entry=3, clockid=clockid@entry=0, abstime=abstime@entry=0x0, 
private=<optimized out>, cancel=cancel@entry=false) at futex-internal.c:103

warning: 103    futex-internal.c: No such file or directory
#0  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x7fc855758a0c, 
expected=expected@entry=3, clockid=clockid@entry=0, abstime=abstime@entry=0x0, 
private=<optimized out>, cancel=cancel@entry=false) at futex-internal.c:103
        err = -512
        clockbit = 256
        op = <optimized out>
#1  0x00007fc85a59429c in __GI___futex_abstimed_wait64 
(futex_word=futex_word@entry=0x7fc855758a0c, expected=expected@entry=3, 
clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>) at 
futex-internal.c:128
No locals.
#2  0x00007fc85a59d830 in __pthread_rwlock_wrlock_full64 (abstime=0x0, 
clockid=0, rwlock=0x7fc855758a00) at 
/usr/src/debug/glibc/2.39+git/nptl/pthread_rwlock_common.c:730
        private = <optimized out>
        wf = <optimized out>
        err = <optimized out>
        prefer_writer = false
        may_share_futex_used_flag = true
        wpf = <optimized out>
        ready = <optimized out>
        r = <optimized out>
        may_share_futex_used_flag = <optimized out>
        r = <optimized out>
        done = <optimized out>
        wpf = <optimized out>
        ready = <optimized out>
        __value = <optimized out>
        prefer_writer = <optimized out>
        private = <optimized out>
        wf = <optimized out>
        err = <optimized out>
        w = <optimized out>
        w = <optimized out>
        private = <optimized out>
        err = <optimized out>
        w = <optimized out>
        wf = <optimized out>
        wf = <optimized out>
        __value = <optimized out>
#3  ___pthread_rwlock_wrlock (rwlock=0x7fc855758a00) at 
pthread_rwlock_wrlock.c:26
        result = <optimized out>
#4  0x00007fc859820b49 in CRYPTO_THREAD_write_lock (lock=<optimized out>) at 
/usr/src/debug/openssl/3.2.3/crypto/threads_pthread.c:108
No locals.
#5  0x00007fc8599bd5da in ossl_prov_drbg_generate (drbg=0x7fc855710040, 
out=0x7fc856471980 "", outlen=48, strength=256, prediction_resistance=0, 
adin=0x7ffeec8b67e0 "@", adinlen=8) at 
/usr/src/debug/openssl/3.2.3/providers/implementations/rands/drbg.c:645
        fork_id = <optimized out>
        reseed_required = 0
        ret = 0
        __func__ = "ossl_prov_drbg_generate"
        err = <optimized out>
        now = <optimized out>
#6  0x00007fc8599bd9ca in ossl_drbg_get_seed (vdrbg=<optimized out>, 
pout=0x7ffeec8b6880, entropy=<optimized out>, min_len=<optimized out>, 
max_len=<optimized out>, prediction_resistance=0, adin=0x7ffeec8b6838 
"\220'\037V\310\177", adin_len=8) at 
/usr/src/debug/openssl/3.2.3/providers/implementations/rands/drbg.c:175
        drbg = 0x7fc855710040
        bytes_needed = 48
        buffer = 0x7fc856471980 ""
        __func__ = "ossl_drbg_get_seed"
#7  0x00007fc8599bc907 in get_entropy (drbg=<optimized out>, 
drbg@entry=0x7fc8561f2790, pout=pout@entry=0x7ffeec8b6880, entropy=<optimized 
out>, min_len=48, max_len=48, 
prediction_resistance=prediction_resistance@entry=0) at 
/usr/src/debug/openssl/3.2.3/providers/implementations/rands/drbg.c:245
        bytes = <optimized out>
        p_str = 256
        __func__ = "get_entropy"
#8  0x00007fc8599bd26c in ossl_prov_drbg_reseed_unlocked 
(drbg=drbg@entry=0x7fc8561f2790, 
prediction_resistance=prediction_resistance@entry=0, ent=ent@entry=0x0, 
ent_len=ent_len@entry=0, adin=adin@entry=0x0, adinlen=adinlen@entry=0) at 
/usr/src/debug/openssl/3.2.3/providers/implementations/rands/drbg.c:573
        entropy = 0x0
        entropylen = 0
        __func__ = "ossl_prov_drbg_reseed_unlocked"
#9  0x00007fc8599bd68e in ossl_prov_drbg_generate (adinlen=0, adin=0x0, 
prediction_resistance=0, strength=<optimized out>, outlen=32, 
out=0x7fc8564717c8 "", drbg=0x7fc8561f2790) at 
/usr/src/debug/openssl/3.2.3/providers/implementations/rands/drbg.c:697
        fork_id = <optimized out>
        reseed_required = <optimized out>
        ret = <optimized out>
        fork_id = <optimized out>
        reseed_required = <optimized out>
        ret = <optimized out>
        __func__ = "ossl_prov_drbg_generate"
        err = <optimized out>
        now = <optimized out>
#10 ossl_prov_drbg_generate (drbg=0x7fc8561f2790, out=0x7fc8564717c8 "", 
outlen=32, strength=<optimized out>, prediction_resistance=0, adin=0x0, 
adinlen=0) at 
/usr/src/debug/openssl/3.2.3/providers/implementations/rands/drbg.c:634
        fork_id = <optimized out>
        reseed_required = 0
        ret = 0
        __func__ = "ossl_prov_drbg_generate"
        err = <optimized out>
        now = <optimized out>
#11 0x00007fc8597e75c6 in evp_rand_generate_locked 
(ctx=ctx@entry=0x7fc8561f26f0, out=out@entry=0x7fc8564717c8 "", 
outlen=outlen@entry=32, strength=strength@entry=0, 
prediction_resistance=<optimized out>, prediction_resistance@entry=0, 
addin=0x0, addin_len=0) at 
/usr/src/debug/openssl/3.2.3/crypto/evp/evp_rand.c:570
        chunk = 32
        max_request = 65536
        params = {{key = 0x7fc8599e9521 "max_request", data_type = 2, data = 
0x7ffeec8b6968, data_size = 8, return_size = 8}, {key = 0x0, data_type = 0, 
data = 0x0, data_size = 0, return_size = 0}}
        __func__ = "evp_rand_generate_locked"
#12 0x00007fc8597e840b in EVP_RAND_generate (ctx=0x7fc8561f26f0, 
out=out@entry=0x7fc8564717c8 "", outlen=outlen@entry=32, 
strength=strength@entry=0, prediction_resistance=prediction_resistance@entry=0, 
addin=addin@entry=0x0, addin_len=0) at 
/usr/src/debug/openssl/3.2.3/crypto/evp/evp_rand.c:592
        res = <optimized out>
#13 0x00007fc8598f9282 in RAND_bytes_ex (ctx=warning: could not convert 
'ossl_lib_ctx_st' from the host encoding (ANSI_X3.4-1968) to UTF-32.
This normally should not happen, please file a bug report.
0x0, buf=0x7fc8564717c8 "", num=32, strength=0) at 
/usr/src/debug/openssl/3.2.3/crypto/rand/rand_lib.c:380
        rand = <optimized out>
        meth = <optimized out>
        __func__ = "RAND_bytes_ex"
#14 0x00007fc859bdb62c in def_generate_session_id 
(ssl=ssl@entry=0x7fc856436e50, id=id@entry=0x7fc8564717c8 "", 
id_len=id_len@entry=0x7ffeec8b6ae4) at 
/usr/src/debug/openssl/3.2.3/ssl/ssl_sess.c:326
        retry = 0
#15 0x00007fc859bdc132 in ssl_generate_session_id (s=s@entry=0x7fc856436e50, 
ss=ss@entry=0x7fc856471570) at /usr/src/debug/openssl/3.2.3/ssl/ssl_sess.c:409
        tmp = 32
        cb = 0x7fc859bdb600 <def_generate_session_id>
        ssl = <optimized out>
        __func__ = "ssl_generate_session_id"
#16 0x00007fc859bdc31e in ssl_get_new_session (s=s@entry=0x7fc856436e50, 
session=session@entry=1) at /usr/src/debug/openssl/3.2.3/ssl/ssl_sess.c:464
        ss = 0x7fc856471570
        __func__ = "ssl_get_new_session"
#17 0x00007fc859c3f169 in tls_early_post_process_client_hello 
(s=0x7fc856436e50) at /usr/src/debug/openssl/3.2.3/ssl/statem/statem_srvr.c:1898
        i = <optimized out>
        loop = <optimized out>
        sctx = 0x7fc855d24280
        j = <optimized out>
        dgrd = DOWNGRADE_NONE
        protverr = <optimized out>
        c = <optimized out>
        clienthello = 0x7fc856470a90
        al = 80
        id = <optimized out>
        comp = 0x0
        ciphers = 0x7fc856471330
        scsvs = 0x7fc8564713c0
        ssl = 0x7fc856436e50
        j = <optimized out>
        i = <optimized out>
        al = <optimized out>
        protverr = <optimized out>
        loop = <optimized out>
        id = <optimized out>
        comp = <optimized out>
        c = <optimized out>
        ciphers = <optimized out>
        scsvs = <optimized out>
        clienthello = <optimized out>
        dgrd = <optimized out>
        sctx = <optimized out>
        ssl = <optimized out>
        __func__ = "tls_early_post_process_client_hello"
        err = <optimized out>
        cipher = <optimized out>
        trc_out = <optimized out>
        pos = <optimized out>
        pref_cipher = <optimized out>
        master_key_length = <optimized out>
        m = <optimized out>
        comp_id = <optimized out>
        k = <optimized out>
        m = <optimized out>
        nn = <optimized out>
        v = <optimized out>
        done = <optimized out>
        o = <optimized out>
#18 tls_post_process_client_hello (s=0x7fc856436e50, wst=<optimized out>) at 
/usr/src/debug/openssl/3.2.3/ssl/statem/statem_srvr.c:2295
        rv = <optimized out>
        cipher = <optimized out>
        ssl = 0x7fc856436e50
        err = <optimized out>
        __func__ = "tls_post_process_client_hello"
#19 0x00007fc859c2bc1b in read_state_machine (s=0x7fc856436e50) at 
/usr/src/debug/openssl/3.2.3/ssl/statem/statem.c:712
        mt = 1
        pkt = {curr = 0x7fc856466125 "", remaining = 0}
        max_message_size = 0x7fc859c3e3e0 <ossl_statem_server_max_message_size>
        ret = <optimized out>
        process_message = 0x7fc859c42400 <ossl_statem_server_process_message>
        post_process_message = 0x7fc859c414f0 
<ossl_statem_server_post_process_message>
        st = 0x7fc856436ee0
        len = 273
        transition = 0x7fc859c3d330 <ossl_statem_server_read_transition>
        cb = 0x7fc859c91d90 <sr_ssl_ctx_info_callback>
        ssl = 0x7fc856436e50
        st = <optimized out>
        ret = <optimized out>
        mt = <optimized out>
        len = <optimized out>
        transition = <optimized out>
        pkt = <optimized out>
        process_message = <optimized out>
        post_process_message = <optimized out>
        max_message_size = <optimized out>
        cb = <optimized out>
        ssl = <optimized out>
        __func__ = "read_state_machine"
#20 state_machine (s=0x7fc856436e50, server=1) at 
/usr/src/debug/openssl/3.2.3/ssl/statem/statem.c:478
        buf = 0x0
        cb = 0x7fc859c91d90 <sr_ssl_ctx_info_callback>
        st = 0x7fc856436ee0
        ret = -1
        ssret = <optimized out>
        ssl = 0x7fc856436e50
        __func__ = "state_machine"
#21 0x00007fc859cbf036 in tls_accept (c=c@entry=0x7fc856461be0, 
error=error@entry=0x7ffeec8b6dd8) at tls_server.c:457
        ret = <optimized out>
        ssl = 0x7fc856436e50
        cert = <optimized out>
        tls_c = 0x7fc8564171b0
        tls_log = <optimized out>
        __func__ = "tls_accept"
```

### Additional Information

  * **Kamailio Version** - output of `kamailio -v`

```
version: kamailio 5.8.4 (x86_64/linux) f1d83d
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, 
USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, MEM_JOIN_FREE, Q_MALLOC, 
F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, 
USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLOCKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_SEND_BUFFER_SIZE 
262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: f1d83d 
compiled on 10:25:58 Nov 12 2024 with x86_64-ruag-linux-gcc 13.3.0

```

* **Operating System**:

It is a yocto scarthgap based image (Version 5.0.6 LTS)

```
Linux ttel 6.6.54-yocto-standard #1 SMP PREEMPT_DYNAMIC Tue Oct  8 03:03:40 UTC 
2024 x86_64 GNU/Linux
```


-- 
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/4185
You are receiving this because you are subscribed to this thread.

Message ID: <kamailio/kamailio/issues/4...@github.com>
_______________________________________________
Kamailio - Development Mailing List -- sr-dev@lists.kamailio.org
To unsubscribe send an email to sr-dev-le...@lists.kamailio.org
Important: keep the mailing list in the recipients, do not reply only to the 
sender!

Reply via email to