Hi Stanislav,

I see no smoking guns :(

The only cause I can think of is that when a load-balance is returned to the 
pool, the pool’s bitmap of free indices may expand, which would confuse 
readers/workers. But I don’t see any of your threads having just called pool_put on a 
load-balance. Since you have a reliable reproduction environment, could you 
cook your own pool_put_would_expand macro to test this theory?

/neale


From: Stanislav Zaikin <zsta...@gmail.com>
Date: Friday, 22 October 2021 at 15:06
To: Neale Ranns <ne...@graphiant.com>
Cc: vpp-dev <vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] assert in pool_elt_at_index
Hi Neale,

Sure, here it is:
https://gist.github.com/zstas/c2316d4e95a84fa28f0e0be00eb6fb19

Thanks in advance.

On Fri, 22 Oct 2021 at 09:55, Neale Ranns <ne...@graphiant.com> wrote:
Hi Stanislav,

Can you do:
  thread apply all bt
I’d like to see what the other threads are doing.

/neale

From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> on behalf of Stanislav Zaikin via lists.fd.io <zstaseg=gmail....@lists.fd.io>
Date: Wednesday, 13 October 2021 at 20:30
To: vpp-dev <vpp-dev@lists.fd.io>
Subject: [vpp-dev] assert in pool_elt_at_index
Hello folks,

I'm facing a strange issue with two worker threads. Sometimes I get a crash 
in either the "ip6-lookup" or the "mpls-lookup" node. It is always the assert in 
the pool_elt_at_index macro, and always inside the "load_balance_get" function. 
But the load_balance DPO looks perfectly fine: it still holds a lock, and on 
regular deletion (when the load_balance DPO is deleted) it should be erased 
properly (with dpo_reset). It usually happens while the main thread is 
executing vlib_worker_thread_barrier_sync_int() and the other worker is 
executing vlib_worker_thread_barrier_check().
And the strangest thing is that when I run VPP's gdb helper for checking 
pool_is_free_index ("pifi"), it shows that the index isn't free (and in that 
case the assert shouldn't fire).
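For reference, the assert in pool_elt_at_index amounts to checking that the index is not free. Below is a simplified, self-contained model of that check. It assumes, as in vppinfra, that an index counts as free when it lies past the end of the pool or its bit is set in the pool's free bitmap; the function names and signatures are hypothetical. Note that the assert reads the bitmap at the moment of the lookup while pifi reads it later from gdb, so a transiently wrong read (for example through a bitmap pointer that was just reallocated) could make the two disagree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of pool_is_free_index (not the vppinfra source):
 * out-of-range indices are free; otherwise test bit i in the free bitmap. */
static bool
model_pool_is_free_index (size_t pool_len, const uint64_t *free_bitmap,
			  size_t bitmap_words, size_t i)
{
  if (i >= pool_len)
    return true;		/* past the end of the pool counts as free */
  size_t word = i / 64;
  if (word >= bitmap_words)
    return false;		/* bits beyond the bitmap read as 0 (in use) */
  return (free_bitmap[word] >> (i % 64)) & 1;
}

/* Hypothetical model of pool_elt_at_index: assert "in use", then index. */
static int *
model_pool_elt_at_index (int *pool, size_t pool_len,
			 const uint64_t *free_bitmap, size_t bitmap_words,
			 size_t i)
{
  assert (!model_pool_is_free_index (pool_len, free_bitmap, bitmap_words, i));
  return &pool[i];
}
```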

Any thoughts and inputs are appreciated.

Thread 3 "vpp_wk_0" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fb4f2e22700 (LWP 3244)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007fb5933fa921 in __GI_abort () at abort.c:79
#2  0x00007fb594392799 in os_panic () at /home/vpp/vpp/src/vppinfra/unix-misc.c:177
#3  0x00007fb5942d8f49 in debugger () at /home/vpp/vpp/src/vppinfra/error.c:84
#4  0x00007fb5942d8cc7 in _clib_error (how_to_die=2, function_name=0x0, line_number=0, fmt=0x7fb5962ec8d0 "%s:%d (%s) assertion `%s' fails") at /home/vpp/vpp/src/vppinfra/error.c:143
#5  0x00007fb5954bd694 in load_balance_get (lbi=3604) at /home/vpp/vpp/src/vnet/dpo/load_balance.h:222
#6  0x00007fb5954bc070 in ip6_lookup_inline (vm=0x7fb51ceccd00, node=0x7fb520f6b700, frame=0x7fb52128e4c0) at /home/vpp/vpp/src/vnet/ip/ip6_forward.h:117
#7  0x00007fb5954bbdd5 in ip6_lookup_node_fn_hsw (vm=0x7fb51ceccd00, node=0x7fb520f6b700, frame=0x7fb52128e4c0) at /home/vpp/vpp/src/vnet/ip/ip6_forward.c:736
#8  0x00007fb594ec0076 in dispatch_node (vm=0x7fb51ceccd00, node=0x7fb520f6b700, type=VLIB_NODE_TYPE_INTERNAL, dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x7fb52128e4c0, last_time_stamp=1808528151240447) at /home/vpp/vpp/src/vlib/main.c:1217
#9  0x00007fb594ec09e7 in dispatch_pending_node (vm=0x7fb51ceccd00, pending_frame_index=5, last_time_stamp=1808528151240447) at /home/vpp/vpp/src/vlib/main.c:1376
#10 0x00007fb594eba441 in vlib_main_or_worker_loop (vm=0x7fb51ceccd00, is_main=0) at /home/vpp/vpp/src/vlib/main.c:1904
#11 0x00007fb594eb92e7 in vlib_worker_loop (vm=0x7fb51ceccd00) at /home/vpp/vpp/src/vlib/main.c:2038
#12 0x00007fb594f1195d in vlib_worker_thread_fn (arg=0x7fb513a48100) at /home/vpp/vpp/src/vlib/threads.c:1868
#13 0x00007fb5942fd214 in clib_calljmp () at /home/vpp/vpp/src/vppinfra/longjmp.S:123
#14 0x00007fb4f2e21c90 in ?? ()
#15 0x00007fb594f09b83 in vlib_worker_thread_bootstrap_fn (arg=0x7fb513a48100) at /home/vpp/vpp/src/vlib/threads.c:585
#16 0x00007fb50c218355 in eal_thread_loop (arg=0x0) at ../src-dpdk/lib/librte_eal/linux/eal_thread.c:127
#17 0x00007fb5947dc6db in start_thread (arg=0x7fb4f2e22700) at pthread_create.c:463
#18 0x00007fb5934db71f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) select 5
(gdb) print _e
$1 = (load_balance_t *) 0x7fb52651e580
(gdb) print load_balance_pool[3604]
$2 = {cacheline0 = 0x7fb52651e580 "\001", lb_n_buckets = 1, lb_n_buckets_minus_1 = 0, lb_proto = DPO_PROTO_IP6, lb_flags = LOAD_BALANCE_FLAG_NONE, lb_fib_entry_flags = (FIB_ENTRY_FLAG_CONNECTED | FIB_ENTRY_FLAG_LOCAL), lb_locks = 1, lb_map = 4294967295, lb_urpf = 4094, lb_hash_config = 31, lb_buckets = 0x0,
  lb_buckets_inline = {{{{dpoi_type = DPO_RECEIVE, dpoi_proto = DPO_PROTO_IP6, dpoi_next_node = 2, dpoi_index = 2094}, as_u64 = 8993661649164}}, {{{dpoi_type = DPO_FIRST, dpoi_proto = DPO_PROTO_IP4, dpoi_next_node = 0, dpoi_index = 0}, as_u64 = 0}}, {{{dpoi_type = DPO_FIRST, dpoi_proto = DPO_PROTO_IP4,
          dpoi_next_node = 0, dpoi_index = 0}, as_u64 = 0}}, {{{dpoi_type = DPO_FIRST, dpoi_proto = DPO_PROTO_IP4, dpoi_next_node = 0, dpoi_index = 0}, as_u64 = 0}}}}
(gdb) print &load_balance_pool[3604]
$3 = (load_balance_t *) 0x7fb52651e580
(gdb) source ~/vpp/extras/gdb/gdbinit
Loading vpp functions...
Load vlLoad pe
Load pifi
Load node_name_from_index
Load vnet_buffer_opaque
Load vnet_buffer_opaque2
Load bitmap_get
Done loading vpp functions...
(gdb) pifi load_balance_pool 3604
pool_is_free_index (load_balance_pool, 3604)$4 = 0
(gdb) info threads
  Id   Target Id         Frame
  1    Thread 0x7fb596bd2c40 (LWP 727) "vpp_main" 0x00007fb594f1439b in clib_time_now_internal (c=0x7fb59517ccc0 <vlib_global_main>, n=1808528155236639) at /home/vpp/vpp/src/vppinfra/time.h:215
  2    Thread 0x7fb4f3623700 (LWP 2976) "eal-intr-thread" 0x00007fb5934dba47 in epoll_wait (epfd=17, events=0x7fb4f3622d80, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
* 3    Thread 0x7fb4f2e22700 (LWP 3244) "vpp_wk_0" __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
  4    Thread 0x7fb4f2621700 (LWP 3246) "vpp_wk_1" 0x00007fb594ebf897 in vlib_worker_thread_barrier_check () at /home/vpp/vpp/src/vlib/threads.h:439
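The thread states above (main waiting on the barrier per the description, vpp_wk_1 parked in vlib_worker_thread_barrier_check, vpp_wk_0 still dispatching) can be read against a simplified main/worker barrier. The sketch below is a hypothetical model using C11 atomics and pthreads, not vlib's implementation, and all names in it are illustrative: main raises a flag, waits until every worker has parked, mutates shared state, then releases them. Under this model a worker that is still dispatching has not parked, so main should not yet be modifying shared pools at that moment.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified main/worker barrier; NOT vlib's implementation. */
static atomic_int wait_at_barrier;	/* main sets this to ask workers to park */
static atomic_int workers_at_barrier;	/* number of workers currently parked */
static atomic_bool stop_workers;	/* lets the demo shut workers down */
static int shared_table_version;	/* only main writes this, under barrier */

/* Worker side: polled once per dispatch-loop iteration. */
static void
model_barrier_check (void)
{
  if (atomic_load (&wait_at_barrier))
    {
      atomic_fetch_add (&workers_at_barrier, 1);	/* announce: parked */
      while (atomic_load (&wait_at_barrier))
	;			/* spin until main releases the barrier */
      atomic_fetch_sub (&workers_at_barrier, 1);	/* back to work */
    }
}

static void *
model_worker_loop (void *arg)
{
  (void) arg;
  while (!atomic_load (&stop_workers))
    {
      /* ... dispatch nodes here; a worker in this section is not parked ... */
      model_barrier_check ();
    }
  return NULL;
}

/* Main side: park every worker, mutate shared state, then release. */
static void
model_barrier_sync_and_update (int n_workers)
{
  assert (n_workers > 0);
  atomic_store (&wait_at_barrier, 1);
  while (atomic_load (&workers_at_barrier) != n_workers)
    ;				/* blocked while any worker is still dispatching */
  shared_table_version++;	/* safe: every worker is parked */
  atomic_store (&wait_at_barrier, 0);
}
```

This model handles a single sync at a time; back-to-back syncs would need extra care, for example waiting for the parked count to drop back to zero before raising the flag again.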

--
Best regards
Stanislav Zaikin


View/Reply Online (#20363): https://lists.fd.io/g/vpp-dev/message/20363