Hey liangbaolin,

I tried to look into this, but the backtrace is a bit strange, because
the line numbers in it don't match the source at all:
https://github.com/kamailio/kamailio/blob/5.8.1/src/modules/cdp/receiver.c#L942

Maybe I'm looking at a different version than you are, or the core was
generated from a different version of the source code?
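
In case it helps with cross-checking, here is a rough sketch of how the
frames in a pasted backtrace could be turned into links against the 5.8.1
tag, so each reported line can be compared with the source. The helper
names, the regex and the list of cdp file names are assumptions made for
illustration; nothing here comes from Kamailio itself.

```python
import re

# One gdb frame: "#3  0x... in receive_loop (args) at receiver.c:942"
FRAME_RE = re.compile(r"#(\d+)\s+0x[0-9a-f]+ in (\w+) \(.*?\) at (\S+):(\d+)")

# Assumption: these file names from the backtrace live in src/modules/cdp.
CDP_FILES = {"acctstatemachine.c", "receiver.c", "diameter_peer.c", "cdp_mod.c"}
URL = "https://github.com/kamailio/kamailio/blob/{tag}/src/modules/cdp/{name}#L{line}"


def parse_frames(backtrace):
    """Return (frame_no, function, file, line) tuples from pasted gdb output."""
    # Strip mail quote markers and re-join frames wrapped by the mail client.
    text = " ".join(l.lstrip("> ").strip() for l in backtrace.splitlines())
    return [(int(n), fn, path, int(line))
            for n, fn, path, line in FRAME_RE.findall(text)]


def cdp_urls(backtrace, tag="5.8.1"):
    """Build a source link for every frame that points into a cdp file."""
    return [URL.format(tag=tag, name=path, line=line)
            for _, _, path, line in parse_frames(backtrace)
            if path in CDP_FILES]


SAMPLE = """\
> #2  0x00007f9860fb16eb in disconnect_serviced_peer (sp=0x7f98e28287d0, 
> locked=0) at receiver.c:232
> #3  0x00007f9860fbd56b in receive_loop (original_peer=0x0) at receiver.c:942
> #17 0x000055720d23d70a in main_loop () at main.c:1942
"""

for url in cdp_urls(SAMPLE):
    print(url)
```

If the function shown at a linked line is not the one named in the frame,
the core most likely came from a build of different sources. Frames outside
the cdp module (main.c here) are skipped on purpose, since their paths
would need a different prefix.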

Cheers,
-Dragos

On Fri, Oct 11, 2024 at 12:27 PM liangbaolin via sr-dev <
sr-dev@lists.kamailio.org> wrote:

> Description
>
> Hi, I have run into a problem where the CDP module is extremely prone to
> process crashes. Screenshots of the logs and the core file output follow.
> I couldn't find the code that triggers the fault, but I suspect that the
> TCP connection was established properly while the peer was not initialized
> or did not handle an error properly, so that a packet was parsed
> incorrectly and the fault occurred when the connection was closed later.
> In addition, because the socket does not belong to a normal peer, the
> connection is re-established constantly, and CDP does not recognize and
> handle this properly. The number of sockets keeps growing, but the number
> of peers does not increase.
> _20241011175120.png (view on web)
> <https://github.com/user-attachments/assets/88a0fc31-9ff6-4f46-9394-f49c787850e8>
> Troubleshooting
>
> Reproduction
>
> Debugging Data
>
> Core was generated by `/usr/sbin/kamailio -f 
> /etc/kamailio_dra/kamailio_dra.cfg -P /var/run/kamailio_d'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00007f9860fae336 in cc_acc_client_stateful_sm_process 
> (s=0x7f9861bd8980, event=21874, msg=0x55720d57c702 <qm_free+8029>) at 
> acctstatemachine.c:304
> 304   }
> (gdb) bt
> #0  0x00007f9860fae336 in cc_acc_client_stateful_sm_process 
> (s=0x7f9861bd8980, event=21874, msg=0x55720d57c702 <qm_free+8029>) at 
> acctstatemachine.c:304
> #1  0x00007f9860fae391 in atomic_get_and_set_int (var=0x776f6e6b6e752072, 
> v=32766) at ../../core/mem/../atomic/atomic_x86.h:242
> #2  0x00007f9860fb16eb in disconnect_serviced_peer (sp=0x7f98e28287d0, 
> locked=0) at receiver.c:232
> #3  0x00007f9860fbd56b in receive_loop (original_peer=0x0) at receiver.c:942
> #4  0x00007f9860fb4c02 in receiver_process (p=0x0) at receiver.c:488
> #5  0x00007f9860f50c80 in diameter_peer_start (blocking=0) at 
> diameter_peer.c:278
> #6  0x00007f9860f413df in cdp_child_init (rank=0) at cdp_mod.c:274
> #7  0x000055720d3df6c2 in init_mod_child (m=0x7f98e279b180, rank=0) at 
> core/sr_module.c:920
> #8  0x000055720d3df2b5 in init_mod_child (m=0x7f98e279be50, rank=0) at 
> core/sr_module.c:912
> #9  0x000055720d3df2b5 in init_mod_child (m=0x7f98e279c2c0, rank=0) at 
> core/sr_module.c:912
> #10 0x000055720d3df2b5 in init_mod_child (m=0x7f98e279cc40, rank=0) at 
> core/sr_module.c:912
> #11 0x000055720d3df2b5 in init_mod_child (m=0x7f98e279d130, rank=0) at 
> core/sr_module.c:912
> #12 0x000055720d3df2b5 in init_mod_child (m=0x7f98e279d6c0, rank=0) at 
> core/sr_module.c:912
> #13 0x000055720d3df2b5 in init_mod_child (m=0x7f98e279db10, rank=0) at 
> core/sr_module.c:912
> #14 0x000055720d3df2b5 in init_mod_child (m=0x7f98e279dff0, rank=0) at 
> core/sr_module.c:912
> #15 0x000055720d3df2b5 in init_mod_child (m=0x7f98e279fae0, rank=0) at 
> core/sr_module.c:912
> #16 0x000055720d3dfff0 in init_child (rank=0) at core/sr_module.c:999
> #17 0x000055720d23d70a in main_loop () at main.c:1942
> #18 0x000055720d2488f5 in main (argc=16, argv=0x7ffea4fde838) at main.c:3256
> (gdb) bt full
> #0  0x00007f9860fae336 in cc_acc_client_stateful_sm_process 
> (s=0x7f9861bd8980, event=21874, msg=0x55720d57c702 <qm_free+8029>) at 
> acctstatemachine.c:304
>         x = 0x7f9861b97000
>         ret = 441
>         rc = 1627304660
>         record_type = 32664
>         __func__ = "cc_acc_client_stateful_sm_process"
> #1  0x00007f9860fae391 in atomic_get_and_set_int (var=0x776f6e6b6e752072, 
> v=32766) at ../../core/mem/../atomic/atomic_x86.h:242
> No locals.
> #2  0x00007f9860fb16eb in disconnect_serviced_peer (sp=0x7f98e28287d0, 
> locked=0) at receiver.c:232
>         __llevel = 0
>         __func__ = "disconnect_serviced_peer"
> #3  0x00007f9860fbd56b in receive_loop (original_peer=0x0) at receiver.c:942
>         __llevel = -1526863824
>         rfds = {__fds_bits = {0, 0, 0, 0, 1024, 0 <repeats 11 times>}}
>         efds = {__fds_bits = {0 <repeats 16 times>}}
>         tv = {tv_sec = 0, tv_usec = 883496}
>         n = 1
>         max = 298
>         cnt = 0
>         msg = 0x0
>         sp = 0x7f98e28287d0
>         sp2 = 0x7f98e2827e30
>         p = 0x0
>         fd = 295
>         fd_exchange_pipe_local = 28
>         __func__ = "receive_loop"
> #4  0x00007f9860fb4c02 in receiver_process (p=0x0) at receiver.c:488
>         __llevel = -990730168
>         __func__ = "receiver_process"
> #5  0x00007f9860f50c80 in diameter_peer_start (blocking=0) at 
> diameter_peer.c:278
>         pid = 0
>         k = 1
>         seed = 1112701621
>         p = 0x0
>         __func__ = "diameter_peer_start"
> #6  0x00007f9860f413df in cdp_child_init (rank=0) at cdp_mod.c:274
>         __llevel = 0
>         __func__ = "cdp_child_init"
> #7  0x000055720d3df6c2 in init_mod_child (m=0x7f98e279b180, rank=0) at 
> core/sr_module.c:920
>         ret = 0
>         __func__ = "init_mod_child"
> #8  0x000055720d3df2b5 in init_mod_child (m=0x7f98e279be50, rank=0) at 
> core/sr_module.c:912
>         ret = 1
>         __func__ = "init_mod_child"
> #9  0x000055720d3df2b5 in init_mod_child (m=0x7f98e279c2c0, rank=0) at 
> core/sr_module.c:912
>         ret = 0
>         __func__ = "init_mod_child"
> #10 0x000055720d3df2b5 in init_mod_child (m=0x7f98e279cc40, rank=0) at 
> core/sr_module.c:912
>         ret = 0
>         __func__ = "init_mod_child"
> #11 0x000055720d3df2b5 in init_mod_child (m=0x7f98e279d130, rank=0) at 
> core/sr_module.c:912
>         ret = 0
>         __func__ = "init_mod_child"
> #12 0x000055720d3df2b5 in init_mod_child (m=0x7f98e279d6c0, rank=0) at 
> core/sr_module.c:912
>         ret = 0
>         __func__ = "init_mod_child"
> #13 0x000055720d3df2b5 in init_mod_child (m=0x7f98e279db10, rank=0) at 
> core/sr_module.c:912
>         ret = 32766
>         __func__ = "init_mod_child"
> #14 0x000055720d3df2b5 in init_mod_child (m=0x7f98e279dff0, rank=0) at 
> core/sr_module.c:912
>         ret = 21874
>         __func__ = "init_mod_child"
> #15 0x000055720d3df2b5 in init_mod_child (m=0x7f98e279fae0, rank=0) at 
> core/sr_module.c:912
>         ret = 12
>         __func__ = "init_mod_child"
> #16 0x000055720d3dfff0 in init_child (rank=0) at core/sr_module.c:999
>         ret = 0
>         type = 0x55720d6ece7b "PROC_MAIN"
>         __func__ = "init_child"
> #17 0x000055720d23d70a in main_loop () at main.c:1942
>         i = 1639542784
>         pid = 50
>         si = 0x0
>         si_desc = 
> "\240\233\"\rrU\000\000@\361\367a\230\177\000\000\000\343\375\244\376\177\000\000\327~2\rrU\000\000\000\343\375\244\376\177\000\000\025\t>\r\005\000\000\000\000\000\000\000\037\000\000\000\000;\213\222\213\017\r\202h\r\000\000\000\000\000\000\060\000\000\000\000\000\000\000\240\233\"\rrU\000\000\060\350\375\244\376\177",
>  '\000' <repeats 18 times>, "\340\342\375\244\376\177\000\000 \351O\rrU\000"
>         nrprocs = 21874
>         woneinit = 0
>         __func__ = "main_loop"
> #18 0x000055720d2488f5 in main (argc=16, argv=0x7ffea4fde838) at main.c:3256
>         cfg_stream = 0x55720ef75260
>         c = -1
>         r = 0
>         tmp = 0x7ffea4fdfee3 ""
>         tmp_len = 32766
>         port = 5060
>         proto = 0
>         aproto = 0
>         ahost = 0x0
>         aport = 0
>         options = 0x55720d6b3698 
> ":f:cm:M:dVIhEeb:B:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:"
>         ret = -1
>         seed = 2959679812
>         rfd = 4
>         debug_save = 0
>         debug_flag = 0
>         dont_fork_cnt = 2
>         n_lst = 0x0
>         p = 0x7f996220a3d0 ""
>         st = {st_dev = 50, st_ino = 2995745, st_nlink = 1, st_mode = 16877, 
> st_uid = 103, st_gid = 105, __pad0 = 0, st_rdev = 0, st_size = 4096, 
> st_blksize = 4096, st_blocks = 8, st_atim = {tv_sec = 1726638407, tv_nsec = 
> 558385502}, st_mtim = {tv_sec = 1727831142,
>             tv_nsec = 822744119}, st_ctim = {tv_sec = 1727831142, tv_nsec = 
> 822744119}, __glibc_reserved = {0, 0, 0}}
>         l1 = 2048
>         tbuf = 
> "pQ!b\231\177\000\000\070-\000b\231\177\000\000\020\347\375\244\376\177\000\000\367\344\376a\231\177",
>  '\000' <repeats 18 times>, 
> "\001\000\000\000\000\000\000\000(W!b\231\177\000\000\000Q!b\231\177\000\000\001\000\000\000\000\000\000\000\300\200
>  b\231\177\000\000\017Q\377a\231\177\000\000\020W!b\231\177", '\000' <repeats 
> 19 times>, 
> "S\376\244\376\177\000\000\300\212\225\001\000\000\000\000\207\026=a\231\177\000\000`\347\375\244\376\177\000\000\220Q\376\244\376\177\000\000\002\000\000\000\231\177\000\000\000\000\000\000\000\000\000\000\300\346\375\244\376\177\000\000\003\000\000\000\000\000\000\000\260\346\375\244\376\177\000\000\000\000\000\000\000\000\000\000"...
>         option_index = 0
>         long_options = {{name = 0x55720d6b59f6 "help", has_arg = 0, flag = 
> 0x0, val = 104}, {name = 0x55720d6b087c "version", has_arg = 0, flag = 0x0, 
> val = 118}, {name = 0x55720d6b59fb "alias", has_arg = 1, flag = 0x0, val = 
> 1024}, {name = 0x55720d6b5a01 "subst", has_arg = 1,
>             flag = 0x0, val = 1025}, {name = 0x55720d6b5a07 "substdef", 
> has_arg = 1, flag = 0x0, val = 1026}, {name = 0x55720d6b5a10 "substdefs", 
> has_arg = 1, flag = 0x0, val = 1027}, {name = 0x55720d6b5a1a "server-id", 
> has_arg = 1, flag = 0x0, val = 1028}, {
>             name = 0x55720d6b5a24 "loadmodule", has_arg = 1, flag = 0x0, val 
> = 1029}, {name = 0x55720d6b5a2f "modparam", has_arg = 1, flag = 0x0, val = 
> 1030}, {name = 0x55720d6b5a38 "log-engine", has_arg = 1, flag = 0x0, val = 
> 1031}, {name = 0x55720d6b5a43 "debug", has_arg = 1,
>             flag = 0x0, val = 1032}, {name = 0x55720d6b5a49 "cfg-print", 
> has_arg = 0, flag = 0x0, val = 1033}, {name = 0x55720d6b5a53 "atexit", 
> has_arg = 1, flag = 0x0, val = 1034}, {name = 0x55720d6b5a5a "all-errors", 
> has_arg = 0, flag = 0x0, val = 1035}, {name = 0x0,
>             has_arg = 0, flag = 0x0, val = 0}}
>         __func__ = "main"
> (gdb)
> (gdb)   info locals
> cfg_stream = 0x55720ef75260
> c = -1
> r = 0
> tmp = 0x7ffea4fdfee3 ""
> tmp_len = 32766
> port = 5060
> proto = 0
> aproto = 0
> ahost = 0x0
> aport = 0
> options = 0x55720d6b3698 
> ":f:cm:M:dVIhEeb:B:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:"
> ret = -1
> seed = 2959679812
> rfd = 4
> debug_save = 0
> debug_flag = 0
> dont_fork_cnt = 2
> n_lst = 0x0
> p = 0x7f996220a3d0 ""
> st = {st_dev = 50, st_ino = 2995745, st_nlink = 1, st_mode = 16877, st_uid = 
> 103, st_gid = 105, __pad0 = 0, st_rdev = 0, st_size = 4096, st_blksize = 
> 4096, st_blocks = 8, st_atim = {tv_sec = 1726638407, tv_nsec = 558385502}, 
> st_mtim = {tv_sec = 1727831142, tv_nsec = 822744119},
>   st_ctim = {tv_sec = 1727831142, tv_nsec = 822744119}, __glibc_reserved = 
> {0, 0, 0}}
> l1 = 2048
> tbuf = 
> "pQ!b\231\177\000\000\070-\000b\231\177\000\000\020\347\375\244\376\177\000\000\367\344\376a\231\177",
>  '\000' <repeats 18 times>, 
> "\001\000\000\000\000\000\000\000(W!b\231\177\000\000\000Q!b\231\177\000\000\001\000\000\000\000\000\000\000\300\200
>  b\231\177\000\000\017Q\377a\231\177\000\000\020W!b\231\177", '\000' <repeats 
> 19 times>, 
> "S\376\244\376\177\000\000\300\212\225\001\000\000\000\000\207\026=a\231\177\000\000`\347\375\244\376\177\000\000\220Q\376\244\376\177\000\000\002\000\000\000\231\177\000\000\000\000\000\000\000\000\000\000\300\346\375\244\376\177\000\000\003\000\000\000\000\000\000\000\260\346\375\244\376\177\000\000\000\000\000\000\000\000\000\000"...
> option_index = 0
> long_options = {{name = 0x55720d6b59f6 "help", has_arg = 0, flag = 0x0, val = 
> 104}, {name = 0x55720d6b087c "version", has_arg = 0, flag = 0x0, val = 118}, 
> {name = 0x55720d6b59fb "alias", has_arg = 1, flag = 0x0, val = 1024}, {name = 
> 0x55720d6b5a01 "subst", has_arg = 1,
>     flag = 0x0, val = 1025}, {name = 0x55720d6b5a07 "substdef", has_arg = 1, 
> flag = 0x0, val = 1026}, {name = 0x55720d6b5a10 "substdefs", has_arg = 1, 
> flag = 0x0, val = 1027}, {name = 0x55720d6b5a1a "server-id", has_arg = 1, 
> flag = 0x0, val = 1028}, {
>     name = 0x55720d6b5a24 "loadmodule", has_arg = 1, flag = 0x0, val = 1029}, 
> {name = 0x55720d6b5a2f "modparam", has_arg = 1, flag = 0x0, val = 1030}, 
> {name = 0x55720d6b5a38 "log-engine", has_arg = 1, flag = 0x0, val = 1031}, 
> {name = 0x55720d6b5a43 "debug", has_arg = 1,
>     flag = 0x0, val = 1032}, {name = 0x55720d6b5a49 "cfg-print", has_arg = 0, 
> flag = 0x0, val = 1033}, {name = 0x55720d6b5a53 "atexit", has_arg = 1, flag = 
> 0x0, val = 1034}, {name = 0x55720d6b5a5a "all-errors", has_arg = 0, flag = 
> 0x0, val = 1035}, {name = 0x0, has_arg = 0,
>     flag = 0x0, val = 0}}
> __func__ = "main"
> (gdb) list
> 299           if(s) {
> 300                   AAASessionsUnlock(s->hash);
> 301           }
> 302   
> 303           return ret;
> 304   }
>
> Log Messages
>
> 2024-10-11T08:23:20.234947572Z 21(60) ERROR: cdp [receiver.c:783]: 
> receive_loop(): select_recv(): Bad file descriptor
> 2024-10-11T08:23:24.906121404Z 21(60) ERROR: cdp [receiver.c:783]: 
> receive_loop(): select_recv(): Bad file descriptor
> 2024-10-11T08:23:41.857946233Z 21(60) ERROR: cdp [receiver.c:783]: 
> receive_loop(): select_recv(): Bad file descriptor
> 2024-10-11T08:25:18.095639136Z 25(64) WARNING: cdp [peermanager.c:337]: 
> peer_timer(): Inactivity on peer [scscf32.ims.mnc011.mcc460.3gppnetwork.org] 
> and no DWA, Closing peer...
> 2024-10-11T08:43:38.138243609Z 31(70) CRITICAL: <core> [core/pass_fd.c:281]: 
> receive_fd(): EOF on 34
> 2024-10-11T08:43:47.476315811Z  0(39) ALERT: <core> [main.c:805]: 
> handle_sigs(): child process 60 exited by a signal 11
> 2024-10-11T08:43:47.476380054Z  0(39) ALERT: <core> [main.c:809]: 
> handle_sigs(): core was generated
> 2024-10-11T08:43:47.503780368Z  0(39) CRITICAL: cdp [diameter_peer.c:447]: 
> diameter_peer_destroy(): destroy_diameter_peer(): Bye Bye from C Diameter 
> Peer test
>
>
>
>
>    - *Operating System*:
>
> version: kamailio 5.8.1 (x86_64/linux) 07b761
> flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS,
> DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, MEM_JOIN_FREE,
> Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX,
> FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR,
> USE_DST_BLOCKLIST, HAVE_RESOLV_RES, TLS_PTHREAD_MUTEX_SHARED
> ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144,
> MAX_SEND_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT
> PKG_SIZE 8MB
> poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
> id: 07b761
> compiled on 10:00:57 Oct 9 2024 with gcc 7.5.0
>
> View it on GitHub:
> <https://github.com/kamailio/kamailio/issues/3999>
> _______________________________________________
> Kamailio (SER) - Development Mailing List
> To unsubscribe send an email to sr-dev-le...@lists.kamailio.org
>