Hi Stanislav, 

The only thing I can think of is that the main thread grows the pool, or the 
pool's free-index bitmap, without holding the worker barrier while the worker 
that asserts is trying to access it. That would also explain why pifi reports 
the index as in use afterwards: by the time you inspect it in gdb, the resize 
has finished and the bitmap is consistent again. Is the main thread busy 
doing something (e.g., adding routes/interfaces) when the assert happens? 
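
If that's what's happening, a worker can transiently see an in-use index as 
free even though nothing ever deleted the dpo. Here's a toy model of the 
interleaving (illustrative only, not VPP code; the names and the bitmap 
layout are simplified stand-ins for vppinfra's pool_header_t and its 
free_bitmap):

/* Toy model of the suspected race; illustrative only, not VPP code. */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
  uint64_t *free_bitmap;	/* 1 bit per element, 1 = free */
  uint32_t n_elts;
} toy_pool_t;

/* Worker side: roughly the check behind the ASSERT in pool_elt_at_index. */
static int
toy_index_is_free (toy_pool_t *p, uint32_t i)
{
  return (p->free_bitmap[i / 64] >> (i % 64)) & 1;
}

/* Main-thread side: growing the pool can realloc the bitmap, i.e. move it
   in memory. A worker reading free_bitmap concurrently may hit the old
   (freed) allocation or a half-initialized new one and transiently see a
   live index as free. */
static void
toy_pool_grow (toy_pool_t *p, uint32_t new_n_elts)
{
  uint32_t old_words = (p->n_elts + 63) / 64;
  uint32_t new_words = (new_n_elts + 63) / 64;

  p->free_bitmap = realloc (p->free_bitmap, new_words * sizeof (uint64_t));
  /* new tail elements are handed out immediately, so mark them in-use */
  memset (p->free_bitmap + old_words, 0,
	  (new_words - old_words) * sizeof (uint64_t));
  p->n_elts = new_n_elts;
}

The barrier exists precisely so workers are parked while the main thread 
does this kind of resize.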

Regards,
Florin

> On Oct 13, 2021, at 2:52 PM, Stanislav Zaikin <zsta...@gmail.com> wrote:
> 
> Hi Florin,
> 
> I wasn't aware of those helper functions, thanks! But yeah, it also returns 
> 0 (sorry, this trace is from a different occurrence of the crash)
> 
> Thread 3 "vpp_wk_0" received signal SIGABRT, Aborted.
> [Switching to Thread 0x7f9cc0f6a700 (LWP 3546)]
> __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> (gdb) bt
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x00007f9d61542921 in __GI_abort () at abort.c:79
> #2  0x00007f9d624da799 in os_panic () at 
> /home/vpp/vpp/src/vppinfra/unix-misc.c:177
> #3  0x00007f9d62420f49 in debugger () at /home/vpp/vpp/src/vppinfra/error.c:84
> #4  0x00007f9d62420cc7 in _clib_error (how_to_die=2, function_name=0x0, 
> line_number=0, fmt=0x7f9d644348d0 "%s:%d (%s) assertion `%s' fails") at 
> /home/vpp/vpp/src/vppinfra/error.c:143
> #5  0x00007f9d636695b4 in load_balance_get (lbi=4569) at 
> /home/vpp/vpp/src/vnet/dpo/load_balance.h:222
> #6  0x00007f9d63668247 in mpls_lookup_node_fn_hsw (vm=0x7f9ceb0138c0, 
> node=0x7f9ceee6f700, from_frame=0x7f9cef9c9240) at 
> /home/vpp/vpp/src/vnet/mpls/mpls_lookup.c:229
> #7  0x00007f9d63008076 in dispatch_node (vm=0x7f9ceb0138c0, 
> node=0x7f9ceee6f700, type=VLIB_NODE_TYPE_INTERNAL, 
> dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x7f9cef9c9240, 
> last_time_stamp=1837178878370487) at /home/vpp/vpp/src/vlib/main.c:1217
> #8  0x00007f9d630089e7 in dispatch_pending_node (vm=0x7f9ceb0138c0, 
> pending_frame_index=2, last_time_stamp=1837178878370487) at 
> /home/vpp/vpp/src/vlib/main.c:1376
> #9  0x00007f9d63002441 in vlib_main_or_worker_loop (vm=0x7f9ceb0138c0, 
> is_main=0) at /home/vpp/vpp/src/vlib/main.c:1904
> #10 0x00007f9d630012e7 in vlib_worker_loop (vm=0x7f9ceb0138c0) at 
> /home/vpp/vpp/src/vlib/main.c:2038
> #11 0x00007f9d6305995d in vlib_worker_thread_fn (arg=0x7f9ce1b88540) at 
> /home/vpp/vpp/src/vlib/threads.c:1868
> #12 0x00007f9d62445214 in clib_calljmp () at 
> /home/vpp/vpp/src/vppinfra/longjmp.S:123
> #13 0x00007f9cc0f69c90 in ?? ()
> #14 0x00007f9d63051b83 in vlib_worker_thread_bootstrap_fn 
> (arg=0x7f9ce1b88540) at /home/vpp/vpp/src/vlib/threads.c:585
> #15 0x00007f9cda360355 in eal_thread_loop (arg=0x0) at 
> ../src-dpdk/lib/librte_eal/linux/eal_thread.c:127
> #16 0x00007f9d629246db in start_thread (arg=0x7f9cc0f6a700) at 
> pthread_create.c:463
> #17 0x00007f9d6162371f in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> (gdb) select 5
> (gdb) print pifi( load_balance_pool, 4569 )
> $1 = 0
> (gdb) source ~/vpp/extras/gdb/gdbinit 
> Loading vpp functions...
> Load vl
> Load pe
> Load pifi
> Load node_name_from_index
> Load vnet_buffer_opaque
> Load vnet_buffer_opaque2
> Load bitmap_get
> Done loading vpp functions...
> (gdb) pifi load_balance_pool 4569
> pool_is_free_index (load_balance_pool, 4569)$2 = 0
> 
> On Wed, 13 Oct 2021 at 21:55, Florin Coras <fcoras.li...@gmail.com> wrote:
> Hi Stanislav, 
> 
> Just to make sure the gdb macro is okay, could you run from gdb: pifi(pool, 
> index)? The function is defined in gdb_funcs.c.
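> 
> For example, with the index from your trace:
> 
> (gdb) print pifi (load_balance_pool, 3604)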
> 
> Regards,
> Florin
> 
>> On Oct 13, 2021, at 11:30 AM, Stanislav Zaikin <zsta...@gmail.com> wrote:
>> 
>> Hello folks,
>> 
>> I'm facing a strange issue with 2 worker threads. Sometimes I get a crash 
>> in either the "ip6-lookup" or the "mpls-lookup" node. It's always an 
>> assert from the pool_elt_at_index macro, inside the "load_balance_get" 
>> function. But the load_balance dpo itself looks perfectly good: it still 
>> holds a lock, and on a regular deletion (i.e., when the load_balance dpo 
>> really is deleted) it would have been erased properly with dpo_reset. It 
>> usually happens while the main core is executing 
>> vlib_worker_thread_barrier_sync_int() and the other worker is executing 
>> vlib_worker_thread_barrier_check(). And the strangest thing is: when I run 
>> vpp's gdb helper pool_is_free_index (pifi), it shows me that the index 
>> isn't free (in which case the macro shouldn't fire at all).
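>> 
>> For context, load_balance_get boils down to pool_elt_at_index 
>> (load_balance_pool, lbi), and the assert that fires is the one inside 
>> that macro (roughly, from vppinfra/pool.h; note the _e temporary, which 
>> is also visible in gdb below):
>> 
>> #define pool_elt_at_index(p, i)                 \
>>   ({                                            \
>>     typeof (p) _e = (p) + (i);                  \
>>     ASSERT (!pool_is_free (p, _e));             \
>>     _e;                                         \
>>   })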
>> 
>> Any thoughts and inputs are appreciated.
>> 
>> Thread 3 "vpp_wk_0" received signal SIGABRT, Aborted.
>> [Switching to Thread 0x7fb4f2e22700 (LWP 3244)]
>> __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
>> 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>> (gdb) bt
>> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
>> #1  0x00007fb5933fa921 in __GI_abort () at abort.c:79
>> #2  0x00007fb594392799 in os_panic () at 
>> /home/vpp/vpp/src/vppinfra/unix-misc.c:177
>> #3  0x00007fb5942d8f49 in debugger () at 
>> /home/vpp/vpp/src/vppinfra/error.c:84
>> #4  0x00007fb5942d8cc7 in _clib_error (how_to_die=2, function_name=0x0, 
>> line_number=0, fmt=0x7fb5962ec8d0 "%s:%d (%s) assertion `%s' fails") at 
>> /home/vpp/vpp/src/vppinfra/error.c:143
>> #5  0x00007fb5954bd694 in load_balance_get (lbi=3604) at 
>> /home/vpp/vpp/src/vnet/dpo/load_balance.h:222
>> #6  0x00007fb5954bc070 in ip6_lookup_inline (vm=0x7fb51ceccd00, 
>> node=0x7fb520f6b700, frame=0x7fb52128e4c0) at 
>> /home/vpp/vpp/src/vnet/ip/ip6_forward.h:117
>> #7  0x00007fb5954bbdd5 in ip6_lookup_node_fn_hsw (vm=0x7fb51ceccd00, 
>> node=0x7fb520f6b700, frame=0x7fb52128e4c0) at 
>> /home/vpp/vpp/src/vnet/ip/ip6_forward.c:736
>> #8  0x00007fb594ec0076 in dispatch_node (vm=0x7fb51ceccd00, 
>> node=0x7fb520f6b700, type=VLIB_NODE_TYPE_INTERNAL, 
>> dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x7fb52128e4c0, 
>> last_time_stamp=1808528151240447) at /home/vpp/vpp/src/vlib/main.c:1217
>> #9  0x00007fb594ec09e7 in dispatch_pending_node (vm=0x7fb51ceccd00, 
>> pending_frame_index=5, last_time_stamp=1808528151240447) at 
>> /home/vpp/vpp/src/vlib/main.c:1376
>> #10 0x00007fb594eba441 in vlib_main_or_worker_loop (vm=0x7fb51ceccd00, 
>> is_main=0) at /home/vpp/vpp/src/vlib/main.c:1904
>> #11 0x00007fb594eb92e7 in vlib_worker_loop (vm=0x7fb51ceccd00) at 
>> /home/vpp/vpp/src/vlib/main.c:2038
>> #12 0x00007fb594f1195d in vlib_worker_thread_fn (arg=0x7fb513a48100) at 
>> /home/vpp/vpp/src/vlib/threads.c:1868
>> #13 0x00007fb5942fd214 in clib_calljmp () at 
>> /home/vpp/vpp/src/vppinfra/longjmp.S:123
>> #14 0x00007fb4f2e21c90 in ?? ()
>> #15 0x00007fb594f09b83 in vlib_worker_thread_bootstrap_fn 
>> (arg=0x7fb513a48100) at /home/vpp/vpp/src/vlib/threads.c:585
>> #16 0x00007fb50c218355 in eal_thread_loop (arg=0x0) at 
>> ../src-dpdk/lib/librte_eal/linux/eal_thread.c:127
>> #17 0x00007fb5947dc6db in start_thread (arg=0x7fb4f2e22700) at 
>> pthread_create.c:463
>> #18 0x00007fb5934db71f in clone () at 
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>> (gdb) select 5
>> (gdb) print _e
>> $1 = (load_balance_t *) 0x7fb52651e580
>> (gdb) print load_balance_pool[3604]
>> $2 = {cacheline0 = 0x7fb52651e580 "\001", lb_n_buckets = 1, 
>> lb_n_buckets_minus_1 = 0, lb_proto = DPO_PROTO_IP6, lb_flags = 
>> LOAD_BALANCE_FLAG_NONE, lb_fib_entry_flags = (FIB_ENTRY_FLAG_CONNECTED | 
>> FIB_ENTRY_FLAG_LOCAL), lb_locks = 1, lb_map = 4294967295, lb_urpf = 4094, 
>> lb_hash_config = 31, lb_buckets = 0x0, 
>>   lb_buckets_inline = {{{{dpoi_type = DPO_RECEIVE, dpoi_proto = 
>> DPO_PROTO_IP6, dpoi_next_node = 2, dpoi_index = 2094}, as_u64 = 
>> 8993661649164}}, {{{dpoi_type = DPO_FIRST, dpoi_proto = DPO_PROTO_IP4, 
>> dpoi_next_node = 0, dpoi_index = 0}, as_u64 = 0}}, {{{dpoi_type = DPO_FIRST, 
>> dpoi_proto = DPO_PROTO_IP4, 
>>           dpoi_next_node = 0, dpoi_index = 0}, as_u64 = 0}}, {{{dpoi_type = 
>> DPO_FIRST, dpoi_proto = DPO_PROTO_IP4, dpoi_next_node = 0, dpoi_index = 0}, 
>> as_u64 = 0}}}}
>> (gdb) print &load_balance_pool[3604]
>> $3 = (load_balance_t *) 0x7fb52651e580
>> (gdb) source ~/vpp/extras/gdb/gdbinit 
>> Loading vpp functions...
>> Load vl
>> Load pe
>> Load pifi
>> Load node_name_from_index
>> Load vnet_buffer_opaque
>> Load vnet_buffer_opaque2
>> Load bitmap_get
>> Done loading vpp functions...
>> (gdb) pifi load_balance_pool 3604
>> pool_is_free_index (load_balance_pool, 3604)$4 = 0
>> (gdb) info threads
>>   Id   Target Id         Frame 
>>   1    Thread 0x7fb596bd2c40 (LWP 727) "vpp_main" 0x00007fb594f1439b in 
>> clib_time_now_internal (c=0x7fb59517ccc0 <vlib_global_main>, 
>> n=1808528155236639) at /home/vpp/vpp/src/vppinfra/time.h:215
>>   2    Thread 0x7fb4f3623700 (LWP 2976) "eal-intr-thread" 0x00007fb5934dba47 
>> in epoll_wait (epfd=17, events=0x7fb4f3622d80, maxevents=1, timeout=-1) at 
>> ../sysdeps/unix/sysv/linux/epoll_wait.c:30
>> * 3    Thread 0x7fb4f2e22700 (LWP 3244) "vpp_wk_0" __GI_raise 
>> (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
>>   4    Thread 0x7fb4f2621700 (LWP 3246) "vpp_wk_1" 0x00007fb594ebf897 in 
>> vlib_worker_thread_barrier_check () at /home/vpp/vpp/src/vlib/threads.h:439
>> 
>> -- 
>> Best regards
>> Stanislav Zaikin
>> 
>> 
>> 
> 
> 
> 
> -- 
> Best regards
> Stanislav Zaikin
