The abort() itself happens because you are out of heap:
https://git.fd.io/vpp/tree/src/vppinfra/mem.h?h=stable/2106#n243

As you pointed out, this is caused by the ridiculous size of the pending vector. All this smells like corruption of the pending frames: do you enqueue the same frame several times? I see you have a private plugin; it would be good to see whether you can reproduce the issue with only upstream VPP (no private code).
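
To check the duplicate-entry theory on your side, something like the following could help. It is only a minimal sketch in plain C, not VPP code: pending_entry_t is a hypothetical stand-in that mirrors the three fields gdb prints for each pending frame entry (with the frame pointer stored as a plain 64-bit address), and count_duplicate_entries is a made-up helper name. It sorts a private copy of the entries and counts those that repeat an earlier (frame, node_runtime_index, next_frame_index) triple.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one pending frame entry. It only mirrors the
   fields shown in the gdb dump and is NOT the VPP structure definition. */
typedef struct
{
  uint64_t frame_addr;		/* frame pointer, kept as a raw address */
  uint32_t node_runtime_index;
  uint32_t next_frame_index;
} pending_entry_t;

/* Three-way compare that cannot overflow. */
static int
cmp_u64 (uint64_t a, uint64_t b)
{
  return (a > b) - (a < b);
}

/* Order entries by (frame, node_runtime_index, next_frame_index). */
static int
cmp_entry (const void *pa, const void *pb)
{
  const pending_entry_t *a = pa, *b = pb;
  int c = cmp_u64 (a->frame_addr, b->frame_addr);
  if (c == 0)
    c = cmp_u64 (a->node_runtime_index, b->node_runtime_index);
  if (c == 0)
    c = cmp_u64 (a->next_frame_index, b->next_frame_index);
  return c;
}

/* Count entries that repeat an earlier identical entry.
   Works on a sorted private copy, so the input is left untouched.
   Returns (size_t) -1 if the copy cannot be allocated. */
size_t
count_duplicate_entries (const pending_entry_t *v, size_t n)
{
  pending_entry_t *copy;
  size_t i, dups = 0;

  assert (n == 0 || v != NULL);
  if (n < 2)
    return 0;

  copy = malloc (n * sizeof (copy[0]));
  if (copy == NULL)
    return (size_t) -1;
  memcpy (copy, v, n * sizeof (copy[0]));
  qsort (copy, n, sizeof (copy[0]), cmp_entry);

  /* After sorting, identical entries are adjacent. */
  for (i = 1; i < n; i++)
    if (cmp_entry (&copy[i - 1], &copy[i]) == 0)
      dups++;

  free (copy);
  return dups;
}
```

Fed the nine entries pending_frames[0] to pending_frames[8] from your dump, this reports 3 repeats (entries 6, 7 and 8 repeat entries 3, 4 and 5).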

Best
ben

> -----Original Message-----
> From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of vipin
> allawadhi
> Sent: mardi 14 décembre 2021 10:20
> To: vpp-dev@lists.fd.io
> Subject: [vpp-dev]crash in vec_resize_allocate_memory
>
> Hello VPP experts,
>
> fdio version: 2106
>
> We are seeing the following crash with this version. with the earlier
> version (we were using fdio 2005), we don't see any problem.
> Have you seen a similar issue earlier?
> Any idea what could be the root cause based on the information given
> below?
>
> (gdb) bt
>
> #0 0x00007f3f5bf872a2 in raise () from /lib64/libc.so.6
> #1 0x00007f3f5bf708a4 in abort () from /lib64/libc.so.6
> #2 0x0000563169f469a0 in os_panic () at /usr/src/debug/vpp-21.06.0-
> 3~g50650da54_dirty.x86_64/src/vpp/vnet/main.c:453
> #3 0x00007f3f5c334597 in clib_mem_alloc_aligned_at_offset
> (size=<optimized out>, align=8, align_offset=<optimized out>,
> os_out_of_memory_on_failure=1) at /usr/src/debug/vpp-21.06.0-
> 3~g50650da54_dirty.x86_64/src/vppinfra/mem.h:243
> #4 vec_resize_allocate_memory (v=<optimized out>, v@entry=0x7f3f4ef613e0,
> length_increment=<optimized out>, length_increment@entry=1,
> data_bytes=167075112, header_bytes=8, header_bytes@entry=0,
> data_align=data_align@entry=8,
> numa_id=numa_id@entry=255) at /usr/src/debug/vpp-21.06.0-
> 3~g50650da54_dirty.x86_64/src/vppinfra/vec.c:111
> #5 0x00007f3f5c4025cb in _vec_resize_inline (v=0x7f3f4ef613e0,
> length_increment=1, data_bytes=<optimized out>, header_bytes=0,
> data_align=8, numa_id=255)
> at /usr/src/debug/vpp-21.06.0-
> 3~g50650da54_dirty.x86_64/src/vppinfra/vec.h:170
> #6 vlib_put_next_frame (vm=<optimized out>, vm@entry=0x7f3f30239780,
> r=r@entry=0x7f3f30d1b180, next_index=next_index@entry=2,
> n_vectors_left=<optimized out>)
> at /usr/src/debug/vpp-21.06.0-
> 3~g50650da54_dirty.x86_64/src/vlib/main.c:543
> #7 0x00007f3f5c5a61cd in enqueue_one (vm=0x7f3f30239780,
> node=0x7f3f30d1b180, used_elt_bmp=0x7f3bf9bf8f20, next_index=<optimized
> out>, buffers=0x7f3f30562090, nexts=0x7f3bf9bf9420, n_buffers=<optimized
> out>,
> n_left=<optimized out>, tmp=0x7f3bf9bf8f60) at /usr/src/debug/vpp-
> 21.06.0-3~g50650da54_dirty.x86_64/src/vlib/buffer_funcs.c:105
> #8 vlib_buffer_enqueue_to_next_fn_skx (vm=0x7f3f30239780,
> node=0x7f3f30d1b180, buffers=0x7f3f30562090, nexts=<optimized out>,
> count=<optimized out>)
> at /usr/src/debug/vpp-21.06.0-
> 3~g50650da54_dirty.x86_64/src/vlib/buffer_funcs.c:153
> #9 0x00007f3edac5bcaf in vlib_buffer_enqueue_to_next (count=<optimized
> out>, nexts=0x7f3bf9bf9420, buffers=<optimized out>, node=0x7f3f30d1b180,
> vm=0x7f3f30239780)
> at /usr/cna/bld-dataplane_base/base/cni-infra-
> dataplane/fdio/src/fdio.2106/build-root/install-vpp_debug-
> native/vpp/include/vlib/buffer_node.h:344
> #10 an_ppe_router_input_inline (is_trace=<optimized out>, frame=<optimized
> out>, node=<optimized out>, p_vlib_main=<optimized out>)
> at /src/cna/.build/dbg/external-package/fdio/src/fdio.2106/src/an-
> plugins/an_ppe_router-plugin/an_ppe_router/an_ppe_router_input_node.c:298
> #11 an_ppe_router_input_node_fn (vm=0x7f3f30239780, node=<optimized out>,
> frame=0x7f3f30562080)
> at /src/cna/.build/dbg/external-package/fdio/src/fdio.2106/src/an-
> plugins/an_ppe_router-plugin/an_ppe_router/an_ppe_router_input_node.c:315
> #12 0x00007f3f5c405427 in dispatch_node (vm=0x7f3f30239780,
> node=0x7f3f30d1b180, type=VLIB_NODE_TYPE_INTERNAL,
> dispatch_state=VLIB_NODE_STATE_POLLING, frame=<optimized out>,
> last_time_stamp=<optimized out>)
> at /usr/src/debug/vpp-21.06.0-
> 3~g50650da54_dirty.x86_64/src/vlib/main.c:1058
> #13 dispatch_pending_node (vm=0x7f3f30239780,
> pending_frame_index=10442192, last_time_stamp=<optimized out>) at
> /usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vlib/main.c:1238
> #14 vlib_main_or_worker_loop (vm=0x7f3f30239780, is_main=0) at
> /usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vlib/main.c:1822
> #15 vlib_worker_loop (vm=vm@entry=0x7f3f30239780) at /usr/src/debug/vpp-
> 21.06.0-3~g50650da54_dirty.x86_64/src/vlib/main.c:1956
> #16 0x00007f3f5c48817d in vlib_worker_thread_fn (arg=<optimized out>) at
> /usr/src/debug/vpp-21.06.0-
> 3~g50650da54_dirty.x86_64/src/vlib/threads.c:1617
> #17 0x00007f3f5c33f56c in clib_calljmp () at /usr/src/debug/vpp-21.06.0-
> 3~g50650da54_dirty.x86_64/src/vppinfra/longjmp.S:123
> #18 0x00007f3bfb7fdc30 in ?? ()
> #19 0x00007f3f5c46f0e7 in vlib_worker_thread_bootstrap_fn
> (arg=0x7f3edc721a40) at /usr/src/debug/vpp-21.06.0-
> 3~g50650da54_dirty.x86_64/src/vlib/threads.c:488
> #20 0x0000000000000000 in ?? ()
> (gdb) thread apply all bt
>
>
> couple of things we saw while debugging this problem:
>
> 1. pending_frame vector len is huge:
> (gdb) p nm->pending_frames
> $4 = (vlib_pending_frame_t *) 0x7f3f4532fe30
> (gdb) get_vec_len 0x7f3f4532fe30
> $5 = 10236250
> (gdb)
>
> 2. pending_frame vector has duplicate entries:
> (gdb) p nm->pending_frames[0]
> $10 = {frame = 0x7f3f30709b80, node_runtime_index = 648, next_frame_index
> = 4294967295}
> (gdb) p nm->pending_frames[1]
> $11 = {frame = 0x7f3f30709280, node_runtime_index = 550, next_frame_index
> = 2299}
> (gdb) p nm->pending_frames[2]
> $12 = {frame = 0x7f3f302e2140, node_runtime_index = 548, next_frame_index
> = 2315}
> (gdb) p nm->pending_frames[3]
> $13 = {frame = 0x7f3f302e2a80, node_runtime_index = 569, next_frame_index
> = 949}
> (gdb) p nm->pending_frames[4]
> $14 = {frame = 0x7f3f302e4140, node_runtime_index = 567, next_frame_index
> = 2629}
> (gdb) p nm->pending_frames[5]
> $15 = {frame = 0x7f3f30523280, node_runtime_index = 251, next_frame_index
> = 2605}
> (gdb) p nm->pending_frames[6]
> $16 = {frame = 0x7f3f302e2a80, node_runtime_index = 569, next_frame_index
> = 949}
> (gdb) p nm->pending_frames[7]
> $17 = {frame = 0x7f3f302e4140, node_runtime_index = 567, next_frame_index
> = 2629}
> (gdb) p nm->pending_frames[8]
> $18 = {frame = 0x7f3f30523280, node_runtime_index = 251, next_frame_index
> = 2605}
> (gdb) p nm->pending_frames[100]
> $19 = {frame = 0x7f3f302e4140, node_runtime_index = 567, next_frame_index
> = 2629}
> (gdb) p nm->pending_frames[101]
> $20 = {frame = 0x7f3f30523280, node_runtime_index = 251, next_frame_index
> = 2605}
> (gdb) p nm->pending_frames[102]
> $21 = {frame = 0x7f3f302e2a80, node_runtime_index = 569, next_frame_index
> = 949}
>
> 3. all the frame points to same buffer:
> (gdb) get_buf_index_from_frame 0x7f3f30709b80
> $56 = (u32 *) 0x7f3f30709b90
> (gdb) p *$56
> $57 = 4988539
> (gdb) get_buf_index_from_frame 0x7f3f30709280
> $58 = (u32 *) 0x7f3f30709290
> (gdb) p *$58
> $59 = 4988539
> (gdb) get_buf_index_from_frame 0x7f3f302e2140
> $60 = (u32 *) 0x7f3f302e2150
> (gdb) p *$60
>
> $61 = 4988539
> (gdb) get_buf_index_from_frame 0x7f3f302e2a80
> $62 = (u32 *) 0x7f3f302e2a90
> (gdb) p *$62
> $63 = 4988539
> (gdb) get_buf_index_from_frame 0x7f3f302e4140
> $64 = (u32 *) 0x7f3f302e4150
> (gdb) p *$64
> $65 = 4988539
> (gdb) get_buf_index_from_frame 0x7f3f30523280
> $66 = (u32 *) 0x7f3f30523290
> (gdb) p *$66
> $67 = 4988539
> (gdb)
>
> 4. buffer data shows that it's a v6 BGP packet.
>
> 5. next_frame vector:
> (gdb) p nm->next_frames
> $50 = (vlib_next_frame_t *) 0x7f3f30852c00
> (gdb) p nm->next_frames[2605]
> $51 = {frame = 0x0, node_runtime_index = 251, flags = 32772,
> vectors_since_last_overflow = 3412084}
> (gdb) p /t nm->next_frames[2605].flags
> $52 = 1000000000000100
> (gdb) p nm->nodes_by_type[VLIB_NODE_TYPE_INTERNAL]
> $53 = (vlib_node_runtime_t *) 0x7f3f30b2ad00
> (gdb) p nm->nodes_by_type[VLIB_NODE_TYPE_INTERNAL][251]
> $54 = {cacheline0 = 0x7f3f30b32a80 "", function = 0x7f3edac5b300
> <an_ppe_router_input_node_fn>, errors = 0x7f3edcaf89e0,
> clocks_since_last_overflow = 1303441754, max_clock = 17932868, max_clock_n
> = 1,
> calls_since_last_overflow = 3412083, vectors_since_last_overflow =
> 3412083, next_frame_index = 947, node_index = 291,
> input_main_loops_per_call = 0, main_loop_count_last_dispatch = 92468689,
> main_loop_vector_stats = {0, 3412081},
> flags = 0, state = 0, n_next_nodes = 3, cached_next_index = 2,
> thread_index = 3, runtime_data = 0x7f3f30b32ac6 ""}
> (gdb)
> (gdb) get_vec_len nm->next_frames
> $69 = 4005
> (gdb)
>
> 6. node runtime index to name mapping:
> (gdb) p nm->nodes[648].name
> $70 = (u8 *) 0x7f3edcd6e130 "ip4-sv-reassembly"
> (gdb) p nm->nodes[550].name
> $71 = (u8 *) 0x7f3edcd3f630 "esp4-decrypt-post"
> (gdb) p nm->nodes[548].name
> $72 = (u8 *) 0x7f3edcd3dec0 "esp6-decrypt-post"
> (gdb) p nm->nodes[569].name
> $73 = (u8 *) 0x7f3edcd48680 "ah4-decrypt-handoff"
> (gdb) p nm->nodes[567].name
> $74 = (u8 *) 0x7f3edcd47cf0 "ipsec4-input-feature"
> (gdb) p nm->nodes[251].name
>
> $75 = (u8 *) 0x7f3edcac17c0 "dns46_reply"
> (gdb)
>
>
> Thanks
> Vipin A.