Hi Stanislav Zaikin,

  This may be a memory overwrite issue, especially in vlib buffers. Because VPP uses vector
packet processing rather than scalar packet processing, the trace can show
unexpected nodes even when the stack itself is not corrupted.
  You can use a debug build (CLIB_DEBUG > 0) to locate the root cause.
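The point about vector processing can be illustrated with a deliberately simplified, hypothetical model. The buffer layout, the guard word, and all `toy_*` names are invented for this sketch; it is not `vlib_buffer_t` or any VPP API. One node overruns a buffer's payload into the neighbouring buffer's metadata, and the damage only shows up later, when whichever node next reads that metadata runs. A guard-word check, the kind of extra validation a debug build can afford, flags the corruption right after the offending node instead of at the crash site.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy layout, NOT vlib_buffer_t: each buffer is
   [guard word][config_index][payload], packed back to back in one pool. */
#define TOY_N_BUFFERS 4
#define TOY_META_SIZE 8
#define TOY_DATA_SIZE 24
#define TOY_BUF_SIZE (TOY_META_SIZE + TOY_DATA_SIZE)
#define TOY_GUARD 0xfeedfaceu

static uint8_t toy_pool[TOY_N_BUFFERS * TOY_BUF_SIZE];

static uint8_t *toy_buf(uint32_t bi) { return toy_pool + (size_t)bi * TOY_BUF_SIZE; }

/* Write the guard word and a config index into a buffer's metadata. */
static void toy_set_meta(uint32_t bi, uint32_t config_index) {
  uint32_t guard = TOY_GUARD;
  memcpy(toy_buf(bi), &guard, sizeof guard);
  memcpy(toy_buf(bi) + 4, &config_index, sizeof config_index);
}

static uint32_t toy_config_index(uint32_t bi) {
  uint32_t v;
  memcpy(&v, toy_buf(bi) + 4, sizeof v);
  return v;
}

/* Debug-style check: index of the first buffer whose guard word was
   overwritten, or -1 if every guard is intact. */
static int toy_first_corrupted(void) {
  for (uint32_t bi = 0; bi < TOY_N_BUFFERS; bi++) {
    uint32_t g;
    memcpy(&g, toy_buf(bi), sizeof g);
    if (g != TOY_GUARD) return (int)bi;
  }
  return -1;
}

/* Buggy "node A": writes len payload bytes into each buffer of its frame
   without checking len <= TOY_DATA_SIZE, so it can spill into the next
   buffer's metadata. The assert only keeps the toy inside toy_pool. */
static void toy_node_a(const uint32_t *frame, size_t n, size_t len) {
  for (size_t i = 0; i < n; i++) {
    uint8_t *data = toy_buf(frame[i]) + TOY_META_SIZE;
    assert((size_t)(data - toy_pool) + len <= sizeof toy_pool);
    memset(data, 0xab, len);
  }
}
```

Buffer 1 is never in node A's frame, yet its config index ends up as `0xabababab`; a node that later uses that index would be the one that appears in the crash trace.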


zhangguangm...@baicells.com
 
From: Stanislav Zaikine 
Date: 2023-02-01 19:17
To: vpp-dev
Subject: [vpp-dev] sigsegv and its handler
Hello folks,

I've been experiencing rare crashes (one crash in 3 months or so), and it looks
like the heap is corrupted somehow. Sometimes the trace shows very unexpected
nodes (like ip6-map-t, although I don't configure any IPv6 MAP), and sometimes
it's just a crash inside ip4-rewrite-node.

After taking a look, I found that the last 2 crashes occurred in the same way:
1. vnet_feature_arc_start_w_cfg_index or vnet_feature_arc_start call
2. vnet_get_config_data call
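A simplified, hypothetical model of why a crash can surface inside `vnet_get_config_data`: if the config index carried in a buffer's metadata has been overwritten, the lookup reads from a wild address. The layout below (a flat array of `u32` words, each feature's data followed by its next-node index) and every `toy_*` name are assumptions for illustration, not VPP's actual `vnet_config_main_t`. The checked variant shows where a bounds assertion would report the bad index instead of faulting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened feature config: for each feature on an arc,
   n_data_words of per-feature data followed by one next-node index. */
typedef struct {
  const uint32_t *words;
  size_t n_words;
} toy_config_main_t;

/* Fast-path style lookup that trusts *config_index. With a corrupted
   index the read of d[n_data_words] is out of bounds (undefined
   behaviour; in a real process this is where a SIGSEGV would surface). */
static const uint32_t *toy_get_config_data(const toy_config_main_t *cm,
                                           uint32_t *config_index,
                                           uint32_t *next_index,
                                           uint32_t n_data_words) {
  const uint32_t *d = cm->words + *config_index;
  *next_index = d[n_data_words];     /* next node for this feature */
  *config_index += n_data_words + 1; /* advance to the next feature */
  return d;                          /* per-feature data */
}

/* Debug-style variant: validates the index first and returns NULL
   (leaving the outputs untouched) instead of reading out of bounds. */
static const uint32_t *toy_get_config_data_checked(const toy_config_main_t *cm,
                                                   uint32_t *config_index,
                                                   uint32_t *next_index,
                                                   uint32_t n_data_words) {
  if (*config_index >= cm->n_words ||
      n_data_words >= cm->n_words - *config_index)
    return NULL;
  return toy_get_config_data(cm, config_index, next_index, n_data_words);
}
```

With a valid index the two functions behave identically; only the checked one survives an index like `0xabababab`.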

But then VPP received and handled a SIGSEGV signal, which completely broke the
stack trace in the core dump (for the corresponding worker):
#0  0x00007f44fa0812c6 in __GI_epoll_pwait (epfd=8, events=0x7f44babe52d8, maxevents=<optimized out>, timeout=9, set=0x7f44fa5c66f8 <linux_epoll_input_inline.unblock_all_signals>) at ../sysdeps/unix/sysv/linux/epoll_pwait.c:42
#1  0x000000089f6fab2b in ?? ()
#2  0x00007f44babe52d8 in ?? ()
#3  0x0000000900000100 in ?? ()
#4  0x00007f44fa5c66f8 in _vlib_init_function_init_linux_epoll_input_init () from /lib/x86_64-linux-gnu/libvlib.so.22.10.0
#5  0x0000000000000000 in ?? ()

So, I can't analyze the core dump. Any ideas on how to catch this crash
correctly? Should I disable VPP's handling of SIGSEGV? Or is there a way to
restore the original stack trace of the worker?
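One generic POSIX pattern for logging a fault while still getting a usable core: install the handler with `SA_RESETHAND` so the disposition reverts to `SIG_DFL` on entry, log using async-signal-safe calls only, then re-raise. On Linux the re-raised signal is typically delivered once the handler returns and the interrupted context is restored, so the core should show the faulting thread at the faulting instruction. This is a standalone sketch, not VPP's `unix_signal_handler`, and it does not show how to change VPP's own handler; `install_fatal_handler` and `run_crashing_child` are hypothetical names.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical fatal-signal handler: log, then let the default action
   (terminate + core dump) happen. SA_RESETHAND has already restored
   SIG_DFL by the time this runs. */
static void fatal_signal_handler(int sig, siginfo_t *si, void *uctx) {
  (void)si;   /* si->si_addr holds the faulting address for real faults */
  (void)uctx;
  static const char msg[] = "fatal signal: re-raising with default action\n";
  ssize_t r = write(STDERR_FILENO, msg, sizeof msg - 1); /* async-signal-safe */
  (void)r;
  /* The signal is blocked while the handler runs, so this leaves it
     pending; it is delivered with SIG_DFL once the handler returns.
     For a real fault, returning would also re-execute the bad access. */
  raise(sig);
}

static int install_fatal_handler(int sig) {
  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sa.sa_sigaction = fatal_signal_handler;
  sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
  sigemptyset(&sa.sa_mask);
  return sigaction(sig, &sa, NULL);
}

/* Self-test helper: a forked child installs the handler and simulates a
   crash; returns the signal that terminated the child, or -1. */
static int run_crashing_child(void) {
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    struct rlimit no_core = {0, 0}; /* demo only: skip writing a core file */
    setrlimit(RLIMIT_CORE, &no_core);
    if (install_fatal_handler(SIGSEGV) != 0) _exit(1);
    raise(SIGSEGV);
    _exit(0); /* not reached */
  }
  int status;
  if (waitpid(pid, &status, 0) != pid) return -1;
  return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

In a real deployment the `setrlimit` call would be dropped, since the core file is the whole point.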

For reference, the stack traces from syslog:
vnet[2856086]: received signal SIGSEGV, PC 0x7f44b76dbee3, faulting address 0xb0040114
vnet[2856086]: #0  0x00007f44fa43885b 0x7f44fa43885b (unix_signal_handler+379)
vnet[2856086]: #1  0x00007f44fa34f3c0 0x7f44fa34f3c0 (__funlockfile)
vnet[2856086]: #2  0x00007f44b76dbee3 0x7f44b76dbee3 (ip6_map_t+675)
vnet[2856086]: #3  0x00007f44fa3c86fb vlib_worker_loop + 0x1b3b
vnet[2856086]: #4  0x00007f44fa41aafa vlib_worker_thread_fn + 0xaa
vnet[2856086]: #5  0x00007f44fa414e01 vlib_worker_thread_bootstrap_fn + 0x51
vnet[2856086]: #6  0x00007f44fa343609 start_thread + 0xd9
vnet[2856086]: #7  0x00007f44fa081163 clone + 0x43

vnet[944491]: received signal SIGSEGV, PC 0x7faf922ca6ae, faulting address 0x7fb3519530fc
vnet[944491]: #0  0x00007faf9102785b 0x7faf9102785b
vnet[944491]: #1  0x00007faf90f3e3c0 0x7faf90f3e3c0
vnet[944491]: #2  0x00007faf922ca6ae ip4_rewrite_node_fn_skx + 0x149e
vnet[944491]: #3  0x00007faf90fb76fb vlib_worker_loop + 0x1b3b
vnet[944491]: #4  0x00007faf91009afa vlib_worker_thread_fn + 0xaa
vnet[944491]: #5  0x00007faf91003e01 vlib_worker_thread_bootstrap_fn + 0x51
vnet[944491]: #6  0x00007faf90f32609 start_thread + 0xd9
vnet[944491]: #7  0x00007faf90c70163 clone + 0x43

Line information:
Line 135 of "/home/runner/work/vpp/vpp/src/vnet/config.h" starts at address 0x7f44b76dbee3 <ip6_map_t+675> and ends at 0x7f44b76dbee7 <ip6_map_t+679>.

Line 135 of "/home/runner/work/vpp/vpp/src/vnet/config.h" starts at address 0x7f44fb6db6ae <ip4_rewrite_node_fn_skx+5278> and ends at 0x7f44fb6db6b1 <ip4_rewrite_node_fn_skx+5281>.
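The addresses above can be cross-checked with plain arithmetic: a PC minus its `+offset` gives the start of the containing function. Libraries load at a different address in each process, so the absolute PCs differ, but `0x149e` in the syslog trace and `+5278` in the gdb output are the same offset written in hex and decimal. Assuming both come from the same build of the library, the syslog PC `0x7faf922ca6ae` and gdb's `0x7f44fb6db6ae` therefore name the same instruction at config.h line 135. The helper names below are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Start address of the function containing pc, given the "+offset"
   printed next to the symbol in a backtrace. */
static uint64_t symbol_start(uint64_t pc, uint64_t offset) {
  return pc - offset;
}

/* Offset of pc within its function; stable across processes for the
   same build, unlike the absolute address. */
static uint64_t offset_in_symbol(uint64_t pc, uint64_t start) {
  return pc - start;
}
```

The two computed start addresses of `ip4_rewrite_node_fn_skx` differ by a whole number of 4 KiB pages, which is what a change of load address alone would produce.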

-- 
Best regards
Stanislav Zaikin
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#22532): https://lists.fd.io/g/vpp-dev/message/22532
-=-=-=-=-=-=-=-=-=-=-=-