Hi Stanislav Zaikin,

Maybe it is a memory overwrite issue, especially of a vlib buffer. Because VPP uses vector packet processing rather than scalar packet processing, one node can corrupt a buffer and a later node can be the one that trips over it, so you can see unexpected nodes in the trace even if the stack is not corrupted. You can use a debug build with CLIB_DEBUG > 0 to locate the root cause.
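To illustrate the vector-processing point, here is a toy C model. Every name in it is hypothetical, and it is not VPP code or VPP's real buffer layout. Buffers sit back to back in one arena, "node A" overruns a buffer's payload without crashing, and the damage only shows up when "node B" reads the next buffer's config index, much like the vnet_get_config_data crashes reported in the original message. The range check in node B stands in for the kind of extra assertion a debug build enables.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model, NOT VPP code: each buffer is laid out as
   [4-byte config index][payload], and buffers are contiguous in one arena. */
#define TOY_N_BUFS    4
#define TOY_DATA_SIZE 16
#define TOY_STRIDE    (sizeof (uint32_t) + TOY_DATA_SIZE)
#define TOY_N_CONFIGS TOY_N_BUFS

static uint8_t toy_arena[TOY_N_BUFS * TOY_STRIDE];

/* Give every buffer a valid config index (buffer i -> config i). */
void toy_init (void)
{
  for (uint32_t i = 0; i < TOY_N_BUFS; i++)
    memcpy (&toy_arena[i * TOY_STRIDE], &i, sizeof i);
}

/* "Node A": copies a header into buffer bi's payload without checking len
   against TOY_DATA_SIZE (the bug). An oversized copy silently spills into
   the NEXT buffer's config index. The assert only keeps the toy inside the
   arena itself; it does not catch the per-buffer overrun. */
void toy_buggy_rewrite_node (uint32_t bi, const uint8_t *hdr, size_t len)
{
  size_t off = bi * TOY_STRIDE + sizeof (uint32_t);
  assert (off + len <= sizeof toy_arena);
  memcpy (&toy_arena[off], hdr, len);
}

/* "Node B": reads buffer bi's config index before it would be used to index
   a config table. The range check plays the role of a debug-build assertion;
   without it the bad index would be used and the fault would be reported
   here, in node B, although the bug is in node A. Returns -1 on corruption. */
int toy_feature_node_lookup (uint32_t bi)
{
  uint32_t ci;
  memcpy (&ci, &toy_arena[bi * TOY_STRIDE], sizeof ci);
  if (ci >= TOY_N_CONFIGS)
    return -1;
  return (int) ci;
}
```

After toy_init(), writing TOY_DATA_SIZE + 4 bytes into buffer 0 with toy_buggy_rewrite_node leaves buffer 0's own config index intact, but toy_feature_node_lookup(1) then returns -1: the bug lives in one node and the symptom in another.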
zhangguangm...@baicells.com

From: Stanislav Zaikine
Date: 2023-02-01 19:17
To: vpp-dev
Subject: [vpp-dev] sigsegv and its handler

Hello folks,

I've been experiencing rare crashes (one crash in 3 months or so), and it looks like the heap is corrupted somehow. Sometimes the trace shows very unexpected nodes (like ip6-map-t, although I don't configure any IPv6 MAP), and sometimes it's just a crash inside ip4-rewrite-node.

After a look, I found that the last 2 crashes occurred in the same way:

1. vnet_feature_arc_start_w_cfg_index or vnet_feature_arc_start call
2. vnet_get_config_data call

But then VPP received and handled a SIGSEGV signal. It completely broke the stack trace in the core dump (for the corresponding worker):

#0 0x00007f44fa0812c6 in __GI_epoll_pwait (epfd=8, events=0x7f44babe52d8, maxevents=<optimized out>, timeout=9, set=0x7f44fa5c66f8 <linux_epoll_input_inline.unblock_all_signals>) at ../sysdeps/unix/sysv/linux/epoll_pwait.c:42
#1 0x000000089f6fab2b in ?? ()
#2 0x00007f44babe52d8 in ?? ()
#3 0x0000000900000100 in ?? ()
#4 0x00007f44fa5c66f8 in _vlib_init_function_init_linux_epoll_input_init () from /lib/x86_64-linux-gnu/libvlib.so.22.10.0
#5 0x0000000000000000 in ?? ()

So I can't analyze the core dump. Any ideas on how to catch this crash correctly? Disable receiving SIGSEGV? Or is there a way to restore the original stack trace of the worker?
For reference, the stack traces from syslog:

vnet[2856086]: received signal SIGSEGV, PC 0x7f44b76dbee3, faulting address 0xb0040114
vnet[2856086]: #0 0x00007f44fa43885b 0x7f44fa43885b (unix_signal_handler+379)
vnet[2856086]: #1 0x00007f44fa34f3c0 0x7f44fa34f3c0 (__funlockfile)
vnet[2856086]: #2 0x00007f44b76dbee3 0x7f44b76dbee3 (ip6_map_t+675)
vnet[2856086]: #3 0x00007f44fa3c86fb vlib_worker_loop + 0x1b3b
vnet[2856086]: #4 0x00007f44fa41aafa vlib_worker_thread_fn + 0xaa
vnet[2856086]: #5 0x00007f44fa414e01 vlib_worker_thread_bootstrap_fn + 0x51
vnet[2856086]: #6 0x00007f44fa343609 start_thread + 0xd9
vnet[2856086]: #7 0x00007f44fa081163 clone + 0x43

vnet[944491]: received signal SIGSEGV, PC 0x7faf922ca6ae, faulting address 0x7fb3519530fc
vnet[944491]: #0 0x00007faf9102785b 0x7faf9102785b
vnet[944491]: #1 0x00007faf90f3e3c0 0x7faf90f3e3c0
vnet[944491]: #2 0x00007faf922ca6ae ip4_rewrite_node_fn_skx + 0x149e
vnet[944491]: #3 0x00007faf90fb76fb vlib_worker_loop + 0x1b3b
vnet[944491]: #4 0x00007faf91009afa vlib_worker_thread_fn + 0xaa
vnet[944491]: #5 0x00007faf91003e01 vlib_worker_thread_bootstrap_fn + 0x51
vnet[944491]: #6 0x00007faf90f32609 start_thread + 0xd9
vnet[944491]: #7 0x00007faf90c70163 clone + 0x43

Line information:

Line 135 of "/home/runner/work/vpp/vpp/src/vnet/config.h" starts at address 0x7f44b76dbee3 <ip6_map_t+675> and ends at 0x7f44b76dbee7 <ip6_map_t+679>.
Line 135 of "/home/runner/work/vpp/vpp/src/vnet/config.h" starts at address 0x7f44fb6db6ae <ip4_rewrite_node_fn_skx+5278> and ends at 0x7f44fb6db6b1 <ip4_rewrite_node_fn_skx+5281>.

--
Best regards
Stanislav Zaikin
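On the question of keeping a usable core dump: one generic POSIX pattern is to log from the handler and then let the signal's default action (terminate with a core) run on the faulting thread, so the core is written from the interrupted context. This is a sketch under that assumption, not VPP's unix_signal_handler, and fatal_signal_handler and install_fatal_handler are hypothetical names. Whether VPP's own startup configuration offers a switch for this is not covered here.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Generic POSIX sketch, NOT VPP's unix_signal_handler: print a short message,
   then let the default action for sig terminate the process with a core. */
static void fatal_signal_handler (int sig, siginfo_t *si, void *uctx)
{
  static const char msg[] = "fatal signal caught, re-raising for a core dump\n";
  (void) si;
  (void) uctx;
  /* write() is async-signal-safe; printf() is not. */
  ssize_t n = write (STDERR_FILENO, msg, sizeof msg - 1);
  (void) n;
  /* SA_RESETHAND has already restored SIG_DFL on entry. On Linux, sig stays
     blocked while this handler runs (SA_NODEFER is not set), so the raised
     signal remains pending and is delivered with the default action after
     the handler returns and the interrupted context is restored. */
  raise (sig);
}

/* Install the handler for one fatal signal, e.g. SIGSEGV. Returns 0 on
   success, -1 with errno set on failure (same as sigaction). */
int install_fatal_handler (int sig)
{
  struct sigaction sa;
  memset (&sa, 0, sizeof sa);
  sa.sa_sigaction = fatal_signal_handler;
  sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
  sigemptyset (&sa.sa_mask);
  return sigaction (sig, &sa, NULL);
}
```

For a hardware fault, returning from the handler after the reset to SIG_DFL would also re-execute the faulting instruction and end in the default action; the explicit raise() additionally covers a SIGSEGV sent with kill(). Alternatively, running the process under gdb stops at the first SIGSEGV before any handler runs, since that is gdb's default for this signal.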