The abort() itself happens because you are out of heap memory:
https://git.fd.io/vpp/tree/src/vppinfra/mem.h?h=stable/2106#n243
As you pointed out, this is caused by the ridiculous size of the pending vector.
All this smells like corruption of the pending frames: do you enqueue the same
frame several times?
I see you have a private plugin; it would be good to see whether you can
reproduce the issue with upstream VPP only (no private code).
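
To make the suspected failure mode concrete, here is a toy model in plain C. It is not vlib code: every type and function name is hypothetical, and it assumes a frame is appended to the pending vector only while a per-frame "pending" marker is clear, which is roughly the invariant vlib_put_next_frame() is expected to maintain. If that marker is lost, or a double enqueue bypasses it, every put appends another entry and the vector grows without bound. That would be consistent with both the duplicate entries and the ~167 MB resize (data_bytes=167075112) in the backtrace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy model only: names are hypothetical, this is not the vlib code. */
typedef struct
{
  int pending; /* assumed "already on the pending vector" marker */
} toy_frame_t;

/* Mirrors the three fields printed from nm->pending_frames in the dump. */
typedef struct
{
  toy_frame_t *frame;
  uint32_t node_runtime_index;
  uint32_t next_frame_index;
} toy_pending_frame_t;

typedef struct
{
  toy_pending_frame_t *v;
  size_t len, cap;
} toy_pending_vec_t;

/* Queue a frame for dispatch. The pending check is what keeps the vector
   bounded: a frame that is already queued is not appended again.
   Returns 0 on success, -1 if growing the vector fails (VPP instead ends
   up in os_panic()/abort(), as in the backtrace). */
static int
toy_put_next_frame (toy_pending_vec_t *pv, toy_frame_t *f,
                    uint32_t node_runtime_index, uint32_t next_frame_index)
{
  assert (pv != NULL && f != NULL);
  if (f->pending)
    return 0;

  if (pv->len == pv->cap)
    {
      size_t new_cap = pv->cap ? 2 * pv->cap : 8;
      toy_pending_frame_t *nv = realloc (pv->v, new_cap * sizeof *nv);
      if (nv == NULL)
        return -1;
      pv->v = nv;
      pv->cap = new_cap;
    }

  pv->v[pv->len++] = (toy_pending_frame_t){ f, node_runtime_index,
                                            next_frame_index };
  f->pending = 1;
  return 0;
}

/* Dispatch every queued frame: clear its marker and empty the vector. */
static void
toy_dispatch_pending (toy_pending_vec_t *pv)
{
  for (size_t i = 0; i < pv->len; i++)
    pv->v[i].frame->pending = 0;
  pv->len = 0;
}
```

With the marker intact, putting the same frame 1000 times leaves a single entry; clearing it before each put (standing in for corruption or a double enqueue that sidesteps the check) leaves 1000 entries.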

Best
ben

> -----Original Message-----
> From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of vipin allawadhi
> Sent: Tuesday, 14 December 2021 10:20
> To: vpp-dev@lists.fd.io
> Subject: [vpp-dev] crash in vec_resize_allocate_memory
> 
> Hello VPP experts,
> 
> fdio version: 2106
> 
> We are seeing the following crash with this version. With the earlier
> version we were using (fdio 2005), we did not see any problem.
> Have you seen a similar issue before?
> Any idea what the root cause could be, based on the information given
> below?
> 
> (gdb) bt
> 
> #0  0x00007f3f5bf872a2 in raise () from /lib64/libc.so.6
> #1  0x00007f3f5bf708a4 in abort () from /lib64/libc.so.6
> #2  0x0000563169f469a0 in os_panic () at /usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vpp/vnet/main.c:453
> #3  0x00007f3f5c334597 in clib_mem_alloc_aligned_at_offset (size=<optimized out>, align=8, align_offset=<optimized out>, os_out_of_memory_on_failure=1) at /usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vppinfra/mem.h:243
> #4  vec_resize_allocate_memory (v=<optimized out>, v@entry=0x7f3f4ef613e0, length_increment=<optimized out>, length_increment@entry=1, data_bytes=167075112, header_bytes=8, header_bytes@entry=0, data_align=data_align@entry=8,
>     numa_id=numa_id@entry=255) at /usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vppinfra/vec.c:111
> #5  0x00007f3f5c4025cb in _vec_resize_inline (v=0x7f3f4ef613e0, length_increment=1, data_bytes=<optimized out>, header_bytes=0, data_align=8, numa_id=255)
>     at /usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vppinfra/vec.h:170
> #6  vlib_put_next_frame (vm=<optimized out>, vm@entry=0x7f3f30239780, r=r@entry=0x7f3f30d1b180, next_index=next_index@entry=2, n_vectors_left=<optimized out>)
>     at /usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vlib/main.c:543
> #7  0x00007f3f5c5a61cd in enqueue_one (vm=0x7f3f30239780, node=0x7f3f30d1b180, used_elt_bmp=0x7f3bf9bf8f20, next_index=<optimized out>, buffers=0x7f3f30562090, nexts=0x7f3bf9bf9420, n_buffers=<optimized out>,
>     n_left=<optimized out>, tmp=0x7f3bf9bf8f60) at /usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vlib/buffer_funcs.c:105
> #8  vlib_buffer_enqueue_to_next_fn_skx (vm=0x7f3f30239780, node=0x7f3f30d1b180, buffers=0x7f3f30562090, nexts=<optimized out>, count=<optimized out>)
>     at /usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vlib/buffer_funcs.c:153
> #9  0x00007f3edac5bcaf in vlib_buffer_enqueue_to_next (count=<optimized out>, nexts=0x7f3bf9bf9420, buffers=<optimized out>, node=0x7f3f30d1b180, vm=0x7f3f30239780)
>     at /usr/cna/bld-dataplane_base/base/cni-infra-dataplane/fdio/src/fdio.2106/build-root/install-vpp_debug-native/vpp/include/vlib/buffer_node.h:344
> #10 an_ppe_router_input_inline (is_trace=<optimized out>, frame=<optimized out>, node=<optimized out>, p_vlib_main=<optimized out>)
>     at /src/cna/.build/dbg/external-package/fdio/src/fdio.2106/src/an-plugins/an_ppe_router-plugin/an_ppe_router/an_ppe_router_input_node.c:298
> #11 an_ppe_router_input_node_fn (vm=0x7f3f30239780, node=<optimized out>, frame=0x7f3f30562080)
>     at /src/cna/.build/dbg/external-package/fdio/src/fdio.2106/src/an-plugins/an_ppe_router-plugin/an_ppe_router/an_ppe_router_input_node.c:315
> #12 0x00007f3f5c405427 in dispatch_node (vm=0x7f3f30239780, node=0x7f3f30d1b180, type=VLIB_NODE_TYPE_INTERNAL, dispatch_state=VLIB_NODE_STATE_POLLING, frame=<optimized out>, last_time_stamp=<optimized out>)
>     at /usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vlib/main.c:1058
> #13 dispatch_pending_node (vm=0x7f3f30239780, pending_frame_index=10442192, last_time_stamp=<optimized out>) at /usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vlib/main.c:1238
> #14 vlib_main_or_worker_loop (vm=0x7f3f30239780, is_main=0) at /usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vlib/main.c:1822
> #15 vlib_worker_loop (vm=vm@entry=0x7f3f30239780) at /usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vlib/main.c:1956
> #16 0x00007f3f5c48817d in vlib_worker_thread_fn (arg=<optimized out>) at /usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vlib/threads.c:1617
> #17 0x00007f3f5c33f56c in clib_calljmp () at /usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vppinfra/longjmp.S:123
> #18 0x00007f3bfb7fdc30 in ?? ()
> #19 0x00007f3f5c46f0e7 in vlib_worker_thread_bootstrap_fn (arg=0x7f3edc721a40) at /usr/src/debug/vpp-21.06.0-3~g50650da54_dirty.x86_64/src/vlib/threads.c:488
> #20 0x0000000000000000 in ?? ()
> (gdb) thread apply all bt
> 
> 
> A couple of things we saw while debugging this problem:
> 
> 1. The pending_frames vector length is huge:
> (gdb) p nm->pending_frames
> $4 = (vlib_pending_frame_t *) 0x7f3f4532fe30
> (gdb) get_vec_len 0x7f3f4532fe30
> $5 = 10236250
> (gdb)
> 
> 2. The pending_frames vector has duplicate entries:
> (gdb) p nm->pending_frames[0]
> $10 = {frame = 0x7f3f30709b80, node_runtime_index = 648, next_frame_index = 4294967295}
> (gdb) p nm->pending_frames[1]
> $11 = {frame = 0x7f3f30709280, node_runtime_index = 550, next_frame_index = 2299}
> (gdb) p nm->pending_frames[2]
> $12 = {frame = 0x7f3f302e2140, node_runtime_index = 548, next_frame_index = 2315}
> (gdb) p nm->pending_frames[3]
> $13 = {frame = 0x7f3f302e2a80, node_runtime_index = 569, next_frame_index = 949}
> (gdb) p nm->pending_frames[4]
> $14 = {frame = 0x7f3f302e4140, node_runtime_index = 567, next_frame_index = 2629}
> (gdb) p nm->pending_frames[5]
> $15 = {frame = 0x7f3f30523280, node_runtime_index = 251, next_frame_index = 2605}
> (gdb) p nm->pending_frames[6]
> $16 = {frame = 0x7f3f302e2a80, node_runtime_index = 569, next_frame_index = 949}
> (gdb) p nm->pending_frames[7]
> $17 = {frame = 0x7f3f302e4140, node_runtime_index = 567, next_frame_index = 2629}
> (gdb) p nm->pending_frames[8]
> $18 = {frame = 0x7f3f30523280, node_runtime_index = 251, next_frame_index = 2605}
> (gdb) p nm->pending_frames[100]
> $19 = {frame = 0x7f3f302e4140, node_runtime_index = 567, next_frame_index = 2629}
> (gdb) p nm->pending_frames[101]
> $20 = {frame = 0x7f3f30523280, node_runtime_index = 251, next_frame_index = 2605}
> (gdb) p nm->pending_frames[102]
> $21 = {frame = 0x7f3f302e2a80, node_runtime_index = 569, next_frame_index = 949}
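
Printing entries one by one only samples a 10M-entry vector. A small offline helper like the one below (hypothetical, not part of VPP) can count how many distinct frame pointers the whole vector holds, assuming the frame field of each nm->pending_frames element has first been copied out of the core into a plain array of 64-bit addresses (how you export it is up to you). A count far below the vector length would confirm that a handful of frames are being queued over and over.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for 64-bit addresses. */
static int
cmp_u64 (const void *a, const void *b)
{
  uint64_t x = *(const uint64_t *) a;
  uint64_t y = *(const uint64_t *) b;
  return (x > y) - (x < y);
}

/* Count distinct frame addresses in frames[0..n). Sorting a copy keeps
   this O(n log n), which matters for ~10M entries.
   Returns (size_t)-1 if the temporary copy cannot be allocated. */
static size_t
count_distinct_frames (const uint64_t *frames, size_t n)
{
  assert (frames != NULL || n == 0);
  if (n == 0)
    return 0;

  uint64_t *tmp = malloc (n * sizeof *tmp);
  if (tmp == NULL)
    return (size_t) -1;
  memcpy (tmp, frames, n * sizeof *tmp);
  qsort (tmp, n, sizeof *tmp, cmp_u64);

  size_t distinct = 1;
  for (size_t i = 1; i < n; i++)
    if (tmp[i] != tmp[i - 1])
      distinct++;

  free (tmp);
  return distinct;
}
```

Fed the frame addresses of pending_frames[0] to [8] printed above, it reports 6 distinct frames for 9 entries, since [6] to [8] repeat [3] to [5].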
> 
> 3. All the frames point to the same buffer:
> (gdb) get_buf_index_from_frame   0x7f3f30709b80
> $56 = (u32 *) 0x7f3f30709b90
> (gdb) p *$56
> $57 = 4988539
> (gdb) get_buf_index_from_frame 0x7f3f30709280
> $58 = (u32 *) 0x7f3f30709290
> (gdb) p *$58
> $59 = 4988539
> (gdb) get_buf_index_from_frame 0x7f3f302e2140
> $60 = (u32 *) 0x7f3f302e2150
> (gdb) p *$60
> $61 = 4988539
> (gdb) get_buf_index_from_frame 0x7f3f302e2a80
> $62 = (u32 *) 0x7f3f302e2a90
> (gdb) p *$62
> $63 = 4988539
> (gdb) get_buf_index_from_frame 0x7f3f302e4140
> $64 = (u32 *) 0x7f3f302e4150
> (gdb) p *$64
> $65 = 4988539
> (gdb) get_buf_index_from_frame 0x7f3f30523280
> $66 = (u32 *) 0x7f3f30523290
> (gdb) p *$66
> $67 = 4988539
> (gdb)
> 
> 4. The buffer data shows that it is an IPv6 BGP packet.
> 
> 5. The next_frames vector:
> (gdb) p nm->next_frames
> $50 = (vlib_next_frame_t *) 0x7f3f30852c00
> (gdb) p nm->next_frames[2605]
> $51 = {frame = 0x0, node_runtime_index = 251, flags = 32772, vectors_since_last_overflow = 3412084}
> (gdb) p /t nm->next_frames[2605].flags
> $52 = 1000000000000100
> (gdb) p nm->nodes_by_type[VLIB_NODE_TYPE_INTERNAL]
> $53 = (vlib_node_runtime_t *) 0x7f3f30b2ad00
> (gdb) p nm->nodes_by_type[VLIB_NODE_TYPE_INTERNAL][251]
> $54 = {cacheline0 = 0x7f3f30b32a80 "", function = 0x7f3edac5b300 <an_ppe_router_input_node_fn>, errors = 0x7f3edcaf89e0, clocks_since_last_overflow = 1303441754, max_clock = 17932868, max_clock_n = 1,
>   calls_since_last_overflow = 3412083, vectors_since_last_overflow = 3412083, next_frame_index = 947, node_index = 291, input_main_loops_per_call = 0, main_loop_count_last_dispatch = 92468689, main_loop_vector_stats = {0, 3412081},
>   flags = 0, state = 0, n_next_nodes = 3, cached_next_index = 2, thread_index = 3, runtime_data = 0x7f3f30b32ac6 ""}
> (gdb)
> (gdb) get_vec_len nm->next_frames
> $69 = 4005
> (gdb)
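
The flags value in item 5 can be decoded mechanically: 32772 is 0x8004, i.e. bits 2 and 15, matching the /t output. The sketch below names those two bits under an assumption: the mapping to VLIB_FRAME_PENDING (an alias of VLIB_NODE_FLAG_IS_DROP) and VLIB_FRAME_OWNER is not verified against the 21.06 src/vlib/node.h and should be checked there. If it holds, next_frames[2605] is flagged as pending while its frame pointer is 0x0, which may be worth cross-checking against pending_frames[5] and [8], whose next_frame_index is 2605.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: bit values believed to match VLIB_FRAME_PENDING and
   VLIB_FRAME_OWNER in src/vlib/node.h; verify before trusting the decode. */
#define ASSUMED_FRAME_PENDING (1u << 2)
#define ASSUMED_FRAME_OWNER (1u << 15)

/* The value printed for nm->next_frames[2605].flags is exactly these bits. */
_Static_assert ((ASSUMED_FRAME_PENDING | ASSUMED_FRAME_OWNER) == 32772u,
                "0x8004 is bit 2 plus bit 15");

/* Nonzero when the (assumed) pending bit is set. */
static int
next_frame_is_pending (uint32_t flags)
{
  return (flags & ASSUMED_FRAME_PENDING) != 0;
}

/* Bits not explained by the two assumed definitions (0 = fully explained). */
static uint32_t
unexplained_bits (uint32_t flags)
{
  return flags & ~(ASSUMED_FRAME_PENDING | ASSUMED_FRAME_OWNER);
}
```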
> 
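> 6. Node runtime index to name mapping (caveat: nm->nodes is indexed by node
> index, not runtime index; runtime 251 in item 5 has node_index = 291 and runs
> an_ppe_router_input_node_fn, so these names may not match the runtimes):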
> (gdb) p nm->nodes[648].name
> $70 = (u8 *) 0x7f3edcd6e130 "ip4-sv-reassembly"
> (gdb) p nm->nodes[550].name
> $71 = (u8 *) 0x7f3edcd3f630 "esp4-decrypt-post"
> (gdb) p nm->nodes[548].name
> $72 = (u8 *) 0x7f3edcd3dec0 "esp6-decrypt-post"
> (gdb) p nm->nodes[569].name
> $73 = (u8 *) 0x7f3edcd48680 "ah4-decrypt-handoff"
> (gdb) p nm->nodes[567].name
> $74 = (u8 *) 0x7f3edcd47cf0 "ipsec4-input-feature"
> (gdb) p nm->nodes[251].name
> $75 = (u8 *) 0x7f3edcac17c0 "dns46_reply"
> (gdb)
> 
> 
> Thanks
> Vipin A.
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#20635): https://lists.fd.io/g/vpp-dev/message/20635
-=-=-=-=-=-=-=-=-=-=-=-
