Daniel, I was unable to locate them. We made the mem_safety change and put the machine back into Production. It has been running for 5 hours without crashing. The only error that seems different than before is the following
Apr 10 17:36:29 tel-vc-fs03 /usr/local/sbin/kamailio[31365]: ERROR: tm [t_fwd.c:1609]: t_send_branch(): ERROR: t_send_branch: sending request on branch 0 failed Apr 10 17:36:29 tel-vc-fs03 /usr/local/sbin/kamailio[31365]: ERROR: sl [sl_funcs.c:387]: sl_reply_error(): ERROR: sl_reply_error used: Unfortunately error on sending to next hop occurred (477/SL) Apr 10 17:37:02 tel-vc-fs03 /usr/local/sbin/kamailio[31368]: : <core> [mem/q_malloc.c:468]: qm_free(): BUG: qm_free: freeing already freed pointer (0x7f8c19c22f10), called from tm: h_table.c: free_cell(186), first free tm: h_table .c: free_cell(186) - aborting On Apr 9, 2014, at 12:34 PM, Daniel-Constantin Mierla <mico...@gmail.com> wrote: > Hello, > > the logs show that the core file was generated: > > Apr 3 13:14:55 tel-vc-fs03 /usr/local/sbin/kamailio[7553]: ALERT: <core> > [main.c:778]: handle_sigs(): core was generated > > Unless you deleted them, then you should have them somewhere on the file > system. > > mem_safety won't affect any kind of processing - it is just a protection for > a double call of free() function. Actually, it is turned on with f_malloc > memory manager which was the default in the past. Now we use q_malloc as > default which is kind of f_malloc with defrag option. We should make this > back on by default, but catching double free() is also good to solve anyhow. > > Cheers, > Daniel > > On 09/04/14 17:15, Samuel Ware wrote: >> Daniel, >> >> Here is the version information. >> >> r...@tel-vc-fs03.telariscom.com> kamailio -v >> version: kamailio 4.1.2 (x86_64/linux) 73ea61 >> flags: STATS: Off, USE_TCP, USE_TLS, TLS_HOOKS, USE_RAW_SOCKS, >> DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, >> DBG_QM_MALLOC, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, >> USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES >> ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, >> MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 4MB >> poll method support: poll, epoll_lt, epoll_et, sigio_rt, select. >> id: 73ea61 >> compiled on 20:41:21 Apr 1 2014 with gcc 4.4.7 >> >> We don’t see this happen in the lab environment. It only happens in >> production so we are little concern about it failing. Unfortunately, we >> don’t have any core dumps of this occurring; we didn’t have that feature >> enabled when this happened. >> >> By enabling “men_safety”, will this affect the processing of any of other >> signaling messages when an error occurs or will it only affect the >> transaction in which the error occurred. There isn’t much detail in the >> description of this flag in the documentation to layout how it affects the >> overall system in an otherwise “abort” situation. Your assistance is >> appreciated and I hope that it doesn’t sound like we are being difficult but >> this element is setting in front of our entire network and failure causes us >> to block all calls. We were running close to 800 CPS through it when it >> failed. We don’t see any load on the machine that would indicate that the >> issue isn’t software based. If the “men_safety” will not affect the rest of >> the traffic, we can add this as well as enabling core dumps and try it again >> when we can closely monitor the system. Please provide the additional >> details about “men_safety” so we can assess the situation. Is there any >> other information that I can provide to assist your input like the part of >> the config file or what else? >> >> >> >> Sam >> >> >> On Apr 9, 2014, at 2:27 AM, Daniel-Constantin Mierla <mico...@gmail.com> >> wrote: >> >>> Can you get the output of kamailio -v? >>> >>> Also, send the backtrace for the corefiles taken with gdb - the logs show >>> that the core were generated (check / or the value of -w parameter): >>> >>> gdb /usr/local/sbin/kamailio /path/to/core >>> bt full >>> >>> To protect against such cases, you can use in config file: >>> >>> mem_safety=1 >>> >>> Cheers, >>> Daniel >>> >>> On 08/04/14 17:34, Samuel Ware wrote: >>>> Here are the logs from the first crash >>>> >>>> Apr 3 13:13:59 tel-vc-fs03 /usr/local/sbin/kamailio[7602]: ERROR: tm >>>> [../../forward.h:199]: msg_send(): msg_send: ERROR: udp_send failed >>>> Apr 3 13:13:59 tel-vc-fs03 /usr/local/sbin/kamailio[7602]: ERROR: tm >>>> [t_fwd.c:1609]: t_send_branch(): ERROR: t_send_branch: sending request on >>>> branch 0 failed >>>> Apr 3 13:13:59 tel-vc-fs03 /usr/local/sbin/kamailio[7602]: ERROR: sl >>>> [sl_funcs.c:387]: sl_reply_error(): ERROR: sl_reply_error used: >>>> Unfortunately error on sending to next hop occurred (477/SL) >>>> Apr 3 13:14:05 tel-vc-fs03 /usr/local/sbin/kamailio[7579]: ERROR: >>>> <core> [udp_server.c:591]: udp_send(): ERROR: udp_send: >>>> sendto(sock,0x7feac6836310,65572,0,205.251.172.14:5060,16): Message too >>>> long(90) >>>> Apr 3 13:14:05 tel-vc-fs03 /usr/local/sbin/kamailio[7579]: ERROR: tm >>>> [../../forward.h:199]: msg_send(): msg_send: ERROR: udp_send failed >>>> Apr 3 13:14:05 tel-vc-fs03 /usr/local/sbin/kamailio[7579]: ERROR: tm >>>> [t_fwd.c:1609]: t_send_branch(): ERROR: t_send_branch: sending request on >>>> branch 0 failed >>>> Apr 3 13:14:05 tel-vc-fs03 /usr/local/sbin/kamailio[7579]: ERROR: sl >>>> [sl_funcs.c:387]: sl_reply_error(): ERROR: sl_reply_error used: >>>> Unfortunately error on sending to next hop occurred (477/SL) >>>> Apr 3 13:14:20 tel-vc-fs03 /usr/local/sbin/kamailio[7594]: ERROR: >>>> <core> [udp_server.c:591]: udp_send(): ERROR: udp_send: >>>> sendto(sock,0x7feaace02c98,65533,0,205.251.172.14:5060,16): Message too >>>> long(90) >>>> Apr 3 13:14:20 tel-vc-fs03 /usr/local/sbin/kamailio[7594]: ERROR: tm >>>> [../../forward.h:199]: msg_send(): msg_send: ERROR: udp_send failed >>>> Apr 3 13:14:20 tel-vc-fs03 /usr/local/sbin/kamailio[7594]: ERROR: tm >>>> [t_fwd.c:1609]: t_send_branch(): ERROR: t_send_branch: sending request on >>>> branch 0 failed >>>> Apr 3 13:14:20 tel-vc-fs03 /usr/local/sbin/kamailio[7594]: ERROR: sl >>>> [sl_funcs.c:387]: sl_reply_error(): ERROR: sl_reply_error used: >>>> Unfortunately error on sending to next hop occurred (477/SL) >>>> Apr 3 13:14:27 tel-vc-fs03 /usr/local/sbin/kamailio[7583]: ERROR: >>>> <core> [udp_server.c:591]: udp_send(): ERROR: udp_send: >>>> sendto(sock,0x7feac6836310,65569,0,205.251.172.14:5060,16): Message too >>>> long(90) >>>> Apr 3 13:14:27 tel-vc-fs03 /usr/local/sbin/kamailio[7583]: ERROR: tm >>>> [../../forward.h:199]: msg_send(): msg_send: ERROR: udp_send failed >>>> Apr 3 13:14:27 tel-vc-fs03 /usr/local/sbin/kamailio[7583]: ERROR: tm >>>> [t_fwd.c:1609]: t_send_branch(): ERROR: t_send_branch: sending request on >>>> branch 0 failed >>>> Apr 3 13:14:27 tel-vc-fs03 /usr/local/sbin/kamailio[7583]: ERROR: sl >>>> [sl_funcs.c:387]: sl_reply_error(): ERROR: sl_reply_error used: >>>> Unfortunately error on sending to next hop occurred (477/SL) >>>> Apr 3 13:14:32 tel-vc-fs03 /usr/local/sbin/kamailio[7589]: : <core> >>>> [mem/q_malloc.c:468]: qm_free(): BUG: qm_free: freeing already freed >>>> pointer (0x7feaca6c9738), called from <core>: mem/shm_mem.c: >>>> sh_realloc(88), first free <core>: mem/shm_mem.c: sh_realloc(88) - aborting >>>> Apr 3 13:14:55 tel-vc-fs03 /usr/local/sbin/kamailio[7553]: ALERT: >>>> <core> [main.c:775]: handle_sigs(): child process 7589 exited by a signal 6 >>>> Apr 3 13:14:55 tel-vc-fs03 /usr/local/sbin/kamailio[7553]: ALERT: >>>> <core> [main.c:778]: handle_sigs(): core was generated >>>> Apr 3 13:14:55 tel-vc-fs03 /usr/local/sbin/kamailio[7553]: INFO: >>>> <core> [main.c:790]: handle_sigs(): INFO: terminating due to SIGCHLD >>>> Apr 3 13:14:55 tel-vc-fs03 /usr/local/sbin/kamailio[7604]: INFO: >>>> <core> [main.c:841]: sig_usr(): INFO: signal 15 received >>>> Apr 3 13:14:55 tel-vc-fs03 /usr/local/sbin/kamailio[7597]: INFO: >>>> <core> [main.c:841]: sig_usr(): INFO: signal 15 received >>>> Apr 3 13:14:55 tel-vc-fs03 /usr/local/sbin/kamailio[7579]: INFO: >>>> <core> [main.c:841]: sig_usr(): INFO: signal 15 received >>>> Apr 3 13:14:55 tel-vc-fs03 /usr/local/sbin/kamailio[7603]: INFO: >>>> <core> [main.c:841]: sig_usr(): INFO: signal 15 received >>>> Apr 3 13:14:55 tel-vc-fs03 /usr/local/sbin/kamailio[7601]: INFO: >>>> <core> [main.c:841]: sig_usr(): INFO: signal 15 received >>>> >>>> >>>> Logs from second crash >>>> >>>> Apr 3 14:38:42 tel-vc-fs03 /usr/local/sbin/kamailio[3979]: ERROR: <core> >>>> [udp_server.c:591]: udp_send(): ERROR: udp_send: >>>> sendto(sock,0x7f34ec4f6ed8,65569,0,205.251.172.14:5060,16): Message too >>>> long(90) >>>> Apr 3 14:38:42 tel-vc-fs03 /usr/local/sbin/kamailio[3979]: ERROR: tm >>>> [../../forward.h:199]: msg_send(): msg_send: ERROR: udp_send failed >>>> Apr 3 14:38:42 tel-vc-fs03 /usr/local/sbin/kamailio[3979]: ERROR: tm >>>> [t_fwd.c:1609]: t_send_branch(): ERROR: t_send_branch: sending request on >>>> branch 0 failed >>>> Apr 3 14:38:42 tel-vc-fs03 /usr/local/sbin/kamailio[3979]: ERROR: sl >>>> [sl_funcs.c:387]: sl_reply_error(): ERROR: sl_reply_error used: >>>> Unfortunately error on sending to next hop occurred (477/SL) >>>> Apr 3 14:38:44 tel-vc-fs03 /usr/local/sbin/kamailio[3965]: ERROR: >>>> <core> [udp_server.c:591]: udp_send(): ERROR: udp_send: >>>> sendto(sock,0x7f35242e5d28,65535,0,205.251.172.14:5060,16): Message too >>>> long(90) >>>> Apr 3 14:38:44 tel-vc-fs03 /usr/local/sbin/kamailio[3965]: ERROR: tm >>>> [../../forward.h:199]: msg_send(): msg_send: ERROR: udp_send failed >>>> Apr 3 14:38:44 tel-vc-fs03 /usr/local/sbin/kamailio[3965]: ERROR: tm >>>> [t_fwd.c:1609]: t_send_branch(): ERROR: t_send_branch: sending request on >>>> branch 0 failed >>>> Apr 3 14:38:44 tel-vc-fs03 /usr/local/sbin/kamailio[3965]: ERROR: sl >>>> [sl_funcs.c:387]: sl_reply_error(): ERROR: sl_reply_error used: >>>> Unfortunately error on sending to next hop occurred (477/SL) >>>> Apr 3 14:38:57 tel-vc-fs03 /usr/local/sbin/kamailio[3968]: ERROR: >>>> <core> [udp_server.c:591]: udp_send(): ERROR: udp_send: >>>> sendto(sock,0x7f3516fb6720,65578,0,205.251.172.14:5060,16): Message too >>>> long(90) >>>> Apr 3 14:38:57 tel-vc-fs03 /usr/local/sbin/kamailio[3968]: ERROR: tm >>>> [../../forward.h:199]: msg_send(): msg_send: ERROR: udp_send failed >>>> Apr 3 14:38:57 tel-vc-fs03 /usr/local/sbin/kamailio[3968]: ERROR: tm >>>> [t_fwd.c:1609]: t_send_branch(): ERROR: t_send_branch: sending request on >>>> branch 0 failed >>>> Apr 3 14:38:57 tel-vc-fs03 /usr/local/sbin/kamailio[3968]: ERROR: >>>> <core> [udp_server.c:591]: udp_send(): ERROR: udp_send: >>>> sendto(sock,0x7f37d8e91088,65522,0,205.251.172.14:5060,16): Message too >>>> long(90) >>>> Apr 3 14:38:57 tel-vc-fs03 /usr/local/sbin/kamailio[3968]: ERROR: sl >>>> [../../forward.h:199]: msg_send(): msg_send: ERROR: udp_send failed >>>> Apr 3 14:38:57 tel-vc-fs03 /usr/local/sbin/kamailio[3968]: ERROR: sl >>>> [sl_funcs.c:387]: sl_reply_error(): ERROR: sl_reply_error used: >>>> Unfortunately error on sending to next hop occurred (477/SL) >>>> Apr 3 14:38:57 tel-vc-fs03 /usr/local/sbin/kamailio[3968]: ERROR: >>>> <core> [udp_server.c:591]: udp_send(): ERROR: udp_send: >>>> sendto(sock,0x7f37d8e91088,65522,0,205.251.172.14:5060,16): Message too >>>> long(90) >>>> Apr 3 14:38:57 tel-vc-fs03 /usr/local/sbin/kamailio[3968]: ERROR: tm >>>> [../../forward.h:199]: msg_send(): msg_send: ERROR: udp_send failed >>>> Apr 3 14:38:58 tel-vc-fs03 /usr/local/sbin/kamailio[3961]: ERROR: >>>> <core> [udp_server.c:591]: udp_send(): ERROR: udp_send: >>>> sendto(sock,0x7f37d81fa960,65522,0,205.251.172.14:5060,16): Message too >>>> long(90) >>>> Apr 3 14:38:58 tel-vc-fs03 /usr/local/sbin/kamailio[3961]: ERROR: tm >>>> [../../forward.h:199]: msg_send(): msg_send: ERROR: udp_send failed >>>> Apr 3 14:38:58 tel-vc-fs03 /usr/local/sbin/kamailio[3974]: : <core> >>>> [mem/q_malloc.c:468]: qm_free(): BUG: qm_free: freeing already freed >>>> pointer (0x7f34d7d6b720), called from <core>: mem/shm_mem.c: >>>> sh_realloc(88), first free <core>: mem/shm_mem.c: sh_realloc(88) - aborting >>>> Apr 3 14:39:20 tel-vc-fs03 /usr/local/sbin/kamailio[3935]: ALERT: >>>> <core> [main.c:775]: handle_sigs(): child process 3974 exited by a signal 6 >>>> Apr 3 14:39:20 tel-vc-fs03 /usr/local/sbin/kamailio[3935]: ALERT: >>>> <core> [main.c:778]: handle_sigs(): core was generated >>>> Apr 3 14:39:20 tel-vc-fs03 /usr/local/sbin/kamailio[3935]: INFO: >>>> <core> [main.c:790]: handle_sigs(): INFO: terminating due to SIGCHLD >>>> Apr 3 14:39:20 tel-vc-fs03 /usr/local/sbin/kamailio[3985]: INFO: >>>> <core> [main.c:841]: sig_usr(): INFO: signal 15 received >>>> Apr 3 14:39:20 tel-vc-fs03 /usr/local/sbin/kamailio[3983]: INFO: >>>> <core> [main.c:841]: sig_usr(): INFO: signal 15 received >>>> Apr 3 14:39:20 tel-vc-fs03 /usr/local/sbin/kamailio[3966]: INFO: >>>> <core> [main.c:841]: sig_usr(): INFO: signal 15 received >>>> Apr 3 14:39:20 tel-vc-fs03 /usr/local/sbin/kamailio[3988]: INFO: >>>> <core> [main.c:841]: sig_usr(): INFO: signal 15 received >>>> Apr 3 14:39:20 tel-vc-fs03 /usr/local/sbin/kamailio[3972]: INFO: >>>> <core> [main.c:841]: sig_usr(): INFO: signal 15 received >>>> Apr 3 14:39:20 tel-vc-fs03 /usr/local/sbin/kamailio[3984]: INFO: >>>> <core> [main.c:841]: sig_usr(): INFO: signal 15 received >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list >>>> sr-users@lists.sip-router.org >>>> http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users >>> -- >>> Daniel-Constantin Mierla - http://www.asipto.com >>> http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda >>> > > -- > Daniel-Constantin Mierla - http://www.asipto.com > http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda > > _______________________________________________ SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list sr-users@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users