Hello VPP experts,

fdio version: 2106

We are seeing the following crash with this version. With the earlier
version we were using (fdio 2005), we did not see this problem.
Have you seen a similar issue earlier?
Any idea what could be the root cause based on the information given below?

(gdb) bt
#0  0x00007f3f5bf872a2 in raise () from /lib64/libc.so.6
#1  0x00007f3f5bf708a4 in abort () from /lib64/libc.so.6
#2  0x0000563169f469a0 in os_panic () at
/usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vpp/vnet/main.c:453
#3  0x00007f3f5c334597 in clib_mem_alloc_aligned_at_offset (size=<optimized
out>, align=8, align_offset=<optimized out>, os_out_of_memory_on_failure=1)
at
/usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vppinfra/mem.h:243
#4  vec_resize_allocate_memory (v=<optimized out>, v@entry=0x7f3f4ef613e0,
length_increment=<optimized out>, length_increment@entry=1,
data_bytes=167075112, header_bytes=8, header_bytes@entry=0,
data_align=data_align@entry=8,
    numa_id=numa_id@entry=255) at
/usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vppinfra/vec.c:111
#5  0x00007f3f5c4025cb in _vec_resize_inline (v=0x7f3f4ef613e0,
length_increment=1, data_bytes=<optimized out>, header_bytes=0,
data_align=8, numa_id=255)
    at
/usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vppinfra/vec.h:170
#6  vlib_put_next_frame (vm=<optimized out>, vm@entry=0x7f3f30239780,
r=r@entry=0x7f3f30d1b180, next_index=next_index@entry=2,
n_vectors_left=<optimized out>)
    at
/usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vlib/main.c:543
#7  0x00007f3f5c5a61cd in enqueue_one (vm=0x7f3f30239780,
node=0x7f3f30d1b180, used_elt_bmp=0x7f3bf9bf8f20, next_index=<optimized
out>, buffers=0x7f3f30562090, nexts=0x7f3bf9bf9420, n_buffers=<optimized
out>,
    n_left=<optimized out>, tmp=0x7f3bf9bf8f60) at
/usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vlib/buffer_funcs.c:105
#8  vlib_buffer_enqueue_to_next_fn_skx (vm=0x7f3f30239780,
node=0x7f3f30d1b180, buffers=0x7f3f30562090, nexts=<optimized out>,
count=<optimized out>)
    at
/usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vlib/buffer_funcs.c:153
#9  0x00007f3edac5bcaf in vlib_buffer_enqueue_to_next (count=<optimized
out>, nexts=0x7f3bf9bf9420, buffers=<optimized out>, node=0x7f3f30d1b180,
vm=0x7f3f30239780)
    at
/usr/cna/bld-dataplane_base/base/cni-infra-dataplane/fdio/src/fdio.2106/build-root/install-vpp_debug-native/vpp/include/vlib/buffer_node.h:344
#10 an_ppe_router_input_inline (is_trace=<optimized out>, frame=<optimized
out>, node=<optimized out>, p_vlib_main=<optimized out>)
    at
/src/cna/.build/dbg/external-package/fdio/src/fdio.2106/src/an-plugins/an_ppe_router-plugin/an_ppe_router/an_ppe_router_input_node.c:298
#11 an_ppe_router_input_node_fn (vm=0x7f3f30239780, node=<optimized out>,
frame=0x7f3f30562080)
    at
/src/cna/.build/dbg/external-package/fdio/src/fdio.2106/src/an-plugins/an_ppe_router-plugin/an_ppe_router/an_ppe_router_input_node.c:315
#12 0x00007f3f5c405427 in dispatch_node (vm=0x7f3f30239780,
node=0x7f3f30d1b180, type=VLIB_NODE_TYPE_INTERNAL,
dispatch_state=VLIB_NODE_STATE_POLLING, frame=<optimized out>,
last_time_stamp=<optimized out>)
    at
/usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vlib/main.c:1058
#13 dispatch_pending_node (vm=0x7f3f30239780, pending_frame_index=10442192,
last_time_stamp=<optimized out>) at
/usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vlib/main.c:1238
#14 vlib_main_or_worker_loop (vm=0x7f3f30239780, is_main=0) at
/usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vlib/main.c:1822
#15 vlib_worker_loop (vm=vm@entry=0x7f3f30239780) at
/usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vlib/main.c:1956
#16 0x00007f3f5c48817d in vlib_worker_thread_fn (arg=<optimized out>) at
/usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vlib/threads.c:1617
#17 0x00007f3f5c33f56c in clib_calljmp () at
/usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vppinfra/longjmp.S:123
#18 0x00007f3bfb7fdc30 in ?? ()
#19 0x00007f3f5c46f0e7 in vlib_worker_thread_bootstrap_fn
(arg=0x7f3edc721a40) at
/usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vlib/threads.c:488
#20 0x0000000000000000 in ?? ()
(gdb) thread apply all bt


A couple of things we saw while debugging this problem:

1. The pending_frames vector length is huge:
(gdb) p nm->pending_frames
$4 = (vlib_pending_frame_t *) 0x7f3f4532fe30
(gdb) get_vec_len 0x7f3f4532fe30
$5 = 10236250
(gdb)
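
As a rough cross-check of this length against the failing allocation, the
sketch below back-computes the element count from data_bytes=167075112 and
header_bytes=8 in frame #4 of the backtrace. It assumes that
sizeof(vlib_pending_frame_t) is 16 bytes on x86_64 (one pointer plus two u32
fields, matching the three fields gdb prints for each entry) and that
data_bytes already includes the vector header. The helper name
implied_vec_len is made up for illustration.

```python
# Values copied from the backtrace (frame #4 and frame #13).
DATA_BYTES = 167075112          # data_bytes in vec_resize_allocate_memory
HEADER_BYTES = 8                # header_bytes in the same frame
PENDING_FRAME_INDEX = 10442192  # pending_frame_index in dispatch_pending_node

# Assumption: vlib_pending_frame_t is {pointer, u32, u32} => 16 bytes on x86_64.
ELT_SIZE = 8 + 4 + 4


def implied_vec_len(data_bytes, header_bytes, elt_size):
    """Number of elements that fit in data_bytes after removing the header."""
    payload = data_bytes - header_bytes
    if payload % elt_size != 0:
        raise ValueError("payload is not a whole number of elements")
    return payload // elt_size


n_elts = implied_vec_len(DATA_BYTES, HEADER_BYTES, ELT_SIZE)
print(n_elts)                        # 10442194
print(n_elts - PENDING_FRAME_INDEX)  # 2
```

Under those assumptions the requested size corresponds to 10442194 entries,
which is within 2 of pending_frame_index=10442192 in frame #13. The length
printed above (10236250) is of the same order but not identical.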

2. The pending_frames vector has duplicate entries:
(gdb) p nm->pending_frames[0]
$10 = {frame = 0x7f3f30709b80, node_runtime_index = 648, next_frame_index =
4294967295}
(gdb) p nm->pending_frames[1]
$11 = {frame = 0x7f3f30709280, node_runtime_index = 550, next_frame_index =
2299}
(gdb) p nm->pending_frames[2]
$12 = {frame = 0x7f3f302e2140, node_runtime_index = 548, next_frame_index =
2315}
(gdb) p nm->pending_frames[3]
$13 = {frame = 0x7f3f302e2a80, node_runtime_index = 569, next_frame_index =
949}
(gdb) p nm->pending_frames[4]
$14 = {frame = 0x7f3f302e4140, node_runtime_index = 567, next_frame_index =
2629}
(gdb) p nm->pending_frames[5]
$15 = {frame = 0x7f3f30523280, node_runtime_index = 251, next_frame_index =
2605}
(gdb) p nm->pending_frames[6]
$16 = {frame = 0x7f3f302e2a80, node_runtime_index = 569, next_frame_index =
949}
(gdb) p nm->pending_frames[7]
$17 = {frame = 0x7f3f302e4140, node_runtime_index = 567, next_frame_index =
2629}
(gdb) p nm->pending_frames[8]
$18 = {frame = 0x7f3f30523280, node_runtime_index = 251, next_frame_index =
2605}
(gdb) p nm->pending_frames[100]
$19 = {frame = 0x7f3f302e4140, node_runtime_index = 567, next_frame_index =
2629}
(gdb) p nm->pending_frames[101]
$20 = {frame = 0x7f3f30523280, node_runtime_index = 251, next_frame_index =
2605}
(gdb) p nm->pending_frames[102]
$21 = {frame = 0x7f3f302e2a80, node_runtime_index = 569, next_frame_index =
949}
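
The repetition in the samples above can be checked mechanically. The sketch
below hard-codes the twelve printed entries as (frame, node_runtime_index,
next_frame_index) tuples, counts the duplicates, and tests whether every
sampled entry from index 3 onward follows one repeating three-entry cycle.
Only the values come from the dump; the helper name follows_cycle is made up
for illustration.

```python
from collections import Counter

# index -> (frame, node_runtime_index, next_frame_index), copied from gdb.
SAMPLES = {
    0: (0x7f3f30709b80, 648, 4294967295),
    1: (0x7f3f30709280, 550, 2299),
    2: (0x7f3f302e2140, 548, 2315),
    3: (0x7f3f302e2a80, 569, 949),
    4: (0x7f3f302e4140, 567, 2629),
    5: (0x7f3f30523280, 251, 2605),
    6: (0x7f3f302e2a80, 569, 949),
    7: (0x7f3f302e4140, 567, 2629),
    8: (0x7f3f30523280, 251, 2605),
    100: (0x7f3f302e4140, 567, 2629),
    101: (0x7f3f30523280, 251, 2605),
    102: (0x7f3f302e2a80, 569, 949),
}


def follows_cycle(samples, start, period):
    """True if every sampled index >= start repeats the entries at
    start .. start + period - 1."""
    cycle = [samples[start + k] for k in range(period)]
    return all(
        entry == cycle[(idx - start) % period]
        for idx, entry in samples.items()
        if idx >= start
    )


counts = Counter(SAMPLES.values())
duplicates = {entry: n for entry, n in counts.items() if n > 1}

for (frame, runtime_index, next_index), n in duplicates.items():
    print(f"frame={frame:#x} node_runtime_index={runtime_index} "
          f"next_frame_index={next_index} seen {n} times")
print("period-3 cycle from index 3:", follows_cycle(SAMPLES, 3, 3))
```

On these samples, three distinct entries (node_runtime_index 569, 567 and
251) each appear three times, and indices 100 to 102 line up with the same
three-entry cycle that starts at index 3.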

3. All the frames point to the same buffer:
(gdb) get_buf_index_from_frame   0x7f3f30709b80
$56 = (u32 *) 0x7f3f30709b90
(gdb) p *$56
$57 = 4988539
(gdb) get_buf_index_from_frame 0x7f3f30709280
$58 = (u32 *) 0x7f3f30709290
(gdb) p *$58
$59 = 4988539
(gdb) get_buf_index_from_frame 0x7f3f302e2140
$60 = (u32 *) 0x7f3f302e2150
(gdb) p *$60
$61 = 4988539
(gdb) get_buf_index_from_frame 0x7f3f302e2a80
$62 = (u32 *) 0x7f3f302e2a90
(gdb) p *$62
$63 = 4988539
(gdb) get_buf_index_from_frame 0x7f3f302e4140
$64 = (u32 *) 0x7f3f302e4150
(gdb) p *$64
$65 = 4988539
(gdb) get_buf_index_from_frame 0x7f3f30523280
$66 = (u32 *) 0x7f3f30523290
(gdb) p *$66
$67 = 4988539
(gdb)

4. The buffer data shows that it is an IPv6 BGP packet.

5. The next_frames vector:
(gdb) p nm->next_frames
$50 = (vlib_next_frame_t *) 0x7f3f30852c00
(gdb) p nm->next_frames[2605]
$51 = {frame = 0x0, node_runtime_index = 251, flags = 32772,
vectors_since_last_overflow = 3412084}
(gdb) p /t nm->next_frames[2605].flags
$52 = 1000000000000100
(gdb) p nm->nodes_by_type[VLIB_NODE_TYPE_INTERNAL]
$53 = (vlib_node_runtime_t *) 0x7f3f30b2ad00
(gdb) p nm->nodes_by_type[VLIB_NODE_TYPE_INTERNAL][251]
$54 = {cacheline0 = 0x7f3f30b32a80 "", function = 0x7f3edac5b300
<an_ppe_router_input_node_fn>, errors = 0x7f3edcaf89e0,
clocks_since_last_overflow = 1303441754, max_clock = 17932868, max_clock_n
= 1,
  calls_since_last_overflow = 3412083, vectors_since_last_overflow =
3412083, next_frame_index = 947, node_index = 291,
input_main_loops_per_call = 0, main_loop_count_last_dispatch = 92468689,
main_loop_vector_stats = {0, 3412081},
  flags = 0, state = 0, n_next_nodes = 3, cached_next_index = 2,
thread_index = 3, runtime_data = 0x7f3f30b32ac6 ""}
(gdb)
(gdb) get_vec_len nm->next_frames
$69 = 4005
(gdb)
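
For reference, the flags value 32772 printed above can be broken into bit
positions as in the small sketch below. It only does the arithmetic; which
VLIB_FRAME_* names those bits correspond to is not shown in the dump and
would need to be checked against src/vlib/node.h in the 21.06 tree. The
helper name set_bits is made up for illustration.

```python
FLAGS = 32772  # nm->next_frames[2605].flags from the gdb output


def set_bits(value):
    """Return the positions of the bits set in value, lowest first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


print(format(FLAGS, "b"))  # 1000000000000100
print(set_bits(FLAGS))     # [2, 15]
```

So the value is (1 << 15) | (1 << 2), which matches the binary form that
gdb printed with p /t.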

6. Node runtime index to name mapping:
(gdb) p nm->nodes[648].name
$70 = (u8 *) 0x7f3edcd6e130 "ip4-sv-reassembly"
(gdb) p nm->nodes[550].name
$71 = (u8 *) 0x7f3edcd3f630 "esp4-decrypt-post"
(gdb) p nm->nodes[548].name
$72 = (u8 *) 0x7f3edcd3dec0 "esp6-decrypt-post"
(gdb) p nm->nodes[569].name
$73 = (u8 *) 0x7f3edcd48680 "ah4-decrypt-handoff"
(gdb) p nm->nodes[567].name
$74 = (u8 *) 0x7f3edcd47cf0 "ipsec4-input-feature"
(gdb) p nm->nodes[251].name
$75 = (u8 *) 0x7f3edcac17c0 "dns46_reply"
(gdb)


Thanks
Vipin A.