Hi Benoit,
I tried it, but got this crash. Maybe some fixes were merged to handle this
issue? (I'm using 21.01 plus additional patches from master.)

Program received signal SIGSEGV, Segmentation fault.
0x000000000042c037 in __asan::FakeStack::AddrIsInFakeStack(unsigned long, unsigned long*, unsigned long*) ()
(gdb) bt
#0  0x000000000042c037 in __asan::FakeStack::AddrIsInFakeStack(unsigned long, unsigned long*, unsigned long*) ()
#1  0x00000000004a31b4 in __asan::ThreadStackContainsAddress(__sanitizer::ThreadContextBase*, void*) ()
#2  0x00000000004b2f4a in __sanitizer::ThreadRegistry::FindThreadContextLocked(bool (*)(__sanitizer::ThreadContextBase*, void*), void*) ()
#3  0x00000000004a30eb in __asan::FindThreadByStackAddress(unsigned long) ()
#4  0x0000000000429265 in __asan::AddressDescription::AddressDescription(unsigned long, unsigned long, bool) ()
#5  0x000000000042adb3 in __asan::ErrorGeneric::ErrorGeneric(unsigned int, unsigned long, unsigned long, unsigned long, unsigned long, bool, unsigned long) ()
#6  0x000000000049cd81 in __asan::ReportGenericError(unsigned long, unsigned long, unsigned long, unsigned long, bool, unsigned long, unsigned int, bool) ()
#7  0x000000000049d5e8 in __asan_report_load4 ()
#8  0x00007ffff4fc18e8 in clib_march_select_fn_ptr_by_name (r=0x7ffff7858740 <vnet_interface_output_node_march_fn_registration>, name=0x7ffff6fd5980 <.str> "hsw") at /home/vpp/vpp/src/vppinfra/cpu.h:94
#9  0x00007ffff4fc17a1 in vnet_interface_output_node_get (vm=0x7ffff3a2d340 <vlib_global_main>) at /home/vpp/vpp/src/vnet/interface_output.c:538
#10 0x00007ffff4f152cd in vnet_register_interface (vnm=0x7ffff7a6b320 <vnet_main>, dev_class_index=34, dev_instance=0, hw_class_index=31, hw_instance=0) at /home/vpp/vpp/src/vnet/interface.c:810
#11 0x00007ffff4ff34ca in vnet_main_init (vm=0x7ffff3a2d340 <vlib_global_main>) at /home/vpp/vpp/src/vnet/misc.c:81
#12 0x00007ffff353d476 in call_init_exit_functions_internal (vm=0x7ffff3a2d340 <vlib_global_main>, headp=0x7ffff3a2d998 <vlib_global_main+1624>, call_once=1, do_sort=1) at /home/vpp/vpp/src/vlib/init.c:350
#13 0x00007ffff353d258 in vlib_call_init_exit_functions (vm=0x7ffff3a2d340 <vlib_global_main>, headp=0x7ffff3a2d998 <vlib_global_main+1624>, call_once=1) at /home/vpp/vpp/src/vlib/init.c:364
#14 0x00007ffff353d561 in vlib_call_all_init_functions (vm=0x7ffff3a2d340 <vlib_global_main>) at /home/vpp/vpp/src/vlib/init.c:386
#15 0x00007ffff35c8711 in vlib_main (vm=0x7ffff3a2d340 <vlib_global_main>, input=0x7fff9cb4bec0) at /home/vpp/vpp/src/vlib/main.c:2213
#16 0x00007ffff376b808 in thread0 (arg=140737280922432) at /home/vpp/vpp/src/vlib/unix/main.c:670
#17 0x00007ffff269204c in clib_calljmp () at /home/vpp/vpp/src/vppinfra/longjmp.S:123
#18 0x00007fffffffc980 in ?? ()
#19 0x00007ffff376ad55 in vlib_unix_main (argc=2, argv=0x7fffffffe498) at /home/vpp/vpp/src/vlib/unix/main.c:747
#20 0x00000000004c8fa8 in main (argc=2, argv=0x7fffffffe498) at /home/vpp/vpp/src/vpp/vnet/main.c:338
(gdb)
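
For context: frames #0-#7 are all inside libasan itself, so ASan segfaulted while building the address description for a 4-byte load it had flagged in frame #8. That frame is the march-variant selection loop in vppinfra/cpu.h; from memory it is shaped roughly like this (a paraphrased sketch, not the exact tree code; take the struct layout as an assumption):

    /* approximate shape of clib_march_select_fn_ptr_by_name():
       walk the static list of per-march function registrations
       and return the variant whose name matches */
    static_always_inline void *
    clib_march_select_fn_ptr_by_name (clib_march_fn_registration * r,
                                      char *name)
    {
      while (r)
        {
          /* the reported load4 lands somewhere in this walk */
          if (strcmp (r->name, name) == 0)
            return r->function;
          r = r->next;
        }
      return 0;
    }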

On Mon, 18 Oct 2021 at 14:29, Benoit Ganne (bganne) <bga...@cisco.com>
wrote:

> You can try running with AddressSanitizer:
> https://fd.io/docs/vpp/master/troubleshooting/sanitizer.html#id2
> That could catch the corruption earlier with more clues.
>
> Best
> ben
>
> > -----Original Message-----
> > From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Stanislav
> > Zaikin
> > Sent: Friday, October 15, 2021 14:54
> > To: vpp-dev <vpp-dev@lists.fd.io>
> > Subject: Re: [vpp-dev] assert in pool_elt_at_index
> >
> > Any thoughts would be much appreciated. This bug is quite reproducible in
> > my environment, but I don't know what else to check.
> >
> >
> > On Thu, 14 Oct 2021 at 08:51, Stanislav Zaikin via lists.fd.io
> > <zstaseg=gmail....@lists.fd.io> wrote:
> >
> >
> >       Hi Florin,
> >       Hi Rajith,
> >
> >       It shouldn't be the pool expansion case; I have
> > 8341f76fd1cd4351961cd8161cfed2814fc55103. Moreover, in that case _e would
> > differ from &load_balance_pool[3604]. I've found some of those expansions
> > (in other places), and in those cases the pointer to the element has a
> > different address.
> >
> >
> >
> >
> >       On Thu, 14 Oct 2021 at 06:45, Rajith PR <raj...@rtbrick.com> wrote:
> >
> >
> >               Hi Stanislav,
> >
> >               My guess is you don't have the commit below.
> >
> >               commit 8341f76fd1cd4351961cd8161cfed2814fc55103
> >               Author: Dave Barach <d...@barachs.net>
> >               Date:   Wed Jun 3 08:05:15 2020 -0400
> >
> >                   fib: add barrier sync, pool/vector expand cases
> >
> >                   load_balance_alloc_i(...) is not thread safe when the
> >                   load_balance_pool or combined counter vectors expand.
> >
> >                   Type: fix
> >
> >                   Signed-off-by: Dave Barach <d...@barachs.net>
> >                   Change-Id: I7f295ed77350d1df0434d5ff461eedafe79131de
> >
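> >               The pattern the commit adds is roughly the following
> >               (paraphrased from memory, not the exact diff; see the
> >               commit itself for the real code):
> >
> >                   /* allocation runs on the main thread; sync workers
> >                      only when the pool (or counter vector) must grow */
> >                   u8 need_barrier_sync = 0;
> >
> >                   pool_get_aligned_will_expand (load_balance_pool,
> >                                                 need_barrier_sync,
> >                                                 CLIB_CACHE_LINE_BYTES);
> >                   if (need_barrier_sync)
> >                     vlib_worker_thread_barrier_sync (vm);
> >
> >                   pool_get_aligned (load_balance_pool, lb,
> >                                     CLIB_CACHE_LINE_BYTES);
> >
> >                   if (need_barrier_sync)
> >                     vlib_worker_thread_barrier_release (vm);
> >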
> >
> >               Thanks,
> >               Rajith
> >
> >               On Thu, Oct 14, 2021 at 3:57 AM Florin Coras
> > <fcoras.li...@gmail.com> wrote:
> >
> >
> >                       Hi Stanislav,
> >
> >                       The only thing I can think of is that the main
> > thread grows the pool, or the pool's bitmap, without a worker barrier
> > while the worker that asserts is trying to access it. Is the main thread
> > busy doing something (e.g., adding routes/interfaces) when the assert
> > happens?
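> >
> >                       Concretely, pool_elt_at_index() expands to roughly
> >                       this (a from-memory sketch of vppinfra/pool.h, not
> >                       verbatim):
> >
> >                           typeof (p) _e = (p) + (i);       /* uses current pool base */
> >                           ASSERT (! pool_is_free (p, _e)); /* reads header's free bitmap */
> >                           /* _e is then dereferenced by the caller */
> >
> >                       If the pool vector or its free bitmap is being
> >                       reallocated or rewritten at that moment, the bitmap
> >                       read can observe a torn state and the ASSERT fires
> >                       for an index that is valid both before and after,
> >                       which would also explain pifi returning 0 in the
> >                       post-mortem.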
> >
> >                       Regards,
> >                       Florin
> >
> >
> >
> >                               On Oct 13, 2021, at 2:52 PM, Stanislav Zaikin <zsta...@gmail.com> wrote:
> >
> >                               Hi Florin,
> >
> >                               I wasn't aware of those helper functions,
> > thanks! But yeah, it also returns 0 (sorry, here's the trace of another
> > crash):
> >
> >                               Thread 3 "vpp_wk_0" received signal SIGABRT, Aborted.
> >                               [Switching to Thread 0x7f9cc0f6a700 (LWP 3546)]
> >                               __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> >                               51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> >                               (gdb) bt
> >                               #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> >                               #1  0x00007f9d61542921 in __GI_abort () at abort.c:79
> >                               #2  0x00007f9d624da799 in os_panic () at /home/vpp/vpp/src/vppinfra/unix-misc.c:177
> >                               #3  0x00007f9d62420f49 in debugger () at /home/vpp/vpp/src/vppinfra/error.c:84
> >                               #4  0x00007f9d62420cc7 in _clib_error (how_to_die=2, function_name=0x0, line_number=0, fmt=0x7f9d644348d0 "%s:%d (%s) assertion `%s' fails") at /home/vpp/vpp/src/vppinfra/error.c:143
> >                               #5  0x00007f9d636695b4 in load_balance_get (lbi=4569) at /home/vpp/vpp/src/vnet/dpo/load_balance.h:222
> >                               #6  0x00007f9d63668247 in mpls_lookup_node_fn_hsw (vm=0x7f9ceb0138c0, node=0x7f9ceee6f700, from_frame=0x7f9cef9c9240) at /home/vpp/vpp/src/vnet/mpls/mpls_lookup.c:229
> >                               #7  0x00007f9d63008076 in dispatch_node (vm=0x7f9ceb0138c0, node=0x7f9ceee6f700, type=VLIB_NODE_TYPE_INTERNAL, dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x7f9cef9c9240, last_time_stamp=1837178878370487) at /home/vpp/vpp/src/vlib/main.c:1217
> >                               #8  0x00007f9d630089e7 in dispatch_pending_node (vm=0x7f9ceb0138c0, pending_frame_index=2, last_time_stamp=1837178878370487) at /home/vpp/vpp/src/vlib/main.c:1376
> >                               #9  0x00007f9d63002441 in vlib_main_or_worker_loop (vm=0x7f9ceb0138c0, is_main=0) at /home/vpp/vpp/src/vlib/main.c:1904
> >                               #10 0x00007f9d630012e7 in vlib_worker_loop (vm=0x7f9ceb0138c0) at /home/vpp/vpp/src/vlib/main.c:2038
> >                               #11 0x00007f9d6305995d in vlib_worker_thread_fn (arg=0x7f9ce1b88540) at /home/vpp/vpp/src/vlib/threads.c:1868
> >                               #12 0x00007f9d62445214 in clib_calljmp () at /home/vpp/vpp/src/vppinfra/longjmp.S:123
> >                               #13 0x00007f9cc0f69c90 in ?? ()
> >                               #14 0x00007f9d63051b83 in vlib_worker_thread_bootstrap_fn (arg=0x7f9ce1b88540) at /home/vpp/vpp/src/vlib/threads.c:585
> >                               #15 0x00007f9cda360355 in eal_thread_loop (arg=0x0) at ../src-dpdk/lib/librte_eal/linux/eal_thread.c:127
> >                               #16 0x00007f9d629246db in start_thread (arg=0x7f9cc0f6a700) at pthread_create.c:463
> >                               #17 0x00007f9d6162371f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> >                               (gdb) select 5
> >                               (gdb) print pifi( load_balance_pool, 4569 )
> >                               $1 = 0
> >                               (gdb) source ~/vpp/extras/gdb/gdbinit
> >                               Loading vpp functions...
> >                               Load vl
> >                               Load pe
> >                               Load pifi
> >                               Load node_name_from_index
> >                               Load vnet_buffer_opaque
> >                               Load vnet_buffer_opaque2
> >                               Load bitmap_get
> >                               Done loading vpp functions...
> >                               (gdb) pifi load_balance_pool 4569
> >                               pool_is_free_index (load_balance_pool, 4569)
> >                               $2 = 0
> >
> >
> >                               On Wed, 13 Oct 2021 at 21:55, Florin Coras <fcoras.li...@gmail.com> wrote:
> >
> >
> >                                       Hi Stanislav,
> >
> >                                       Just to make sure the gdb macro is
> > okay, could you run from gdb: pifi(pool, index)? The function is defined
> > in gdb_funcs.c.
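> >
> >                                       (pifi itself is just a thin
> >                                       wrapper, approximately:
> >
> >                                         uword
> >                                         pifi (void *p, u32 index)
> >                                         {
> >                                           return pool_is_free_index (p, index);
> >                                         }
> >
> >                                       so it exercises the same check the
> >                                       ASSERT uses.)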
> >
> >                                       Regards,
> >                                       Florin
> >
> >
> >                                       On Oct 13, 2021, at 11:30 AM, Stanislav Zaikin <zsta...@gmail.com> wrote:
> >
> >                                       Hello folks,
> >
> >                                       I'm facing a strange issue with 2
> > worker threads. Sometimes I get a crash in either the "ip6-lookup" or
> > "mpls-lookup" node. The crashes happen on the assert in the
> > pool_elt_at_index macro, always inside the "load_balance_get" function.
> > But the load_balance dpo looks perfectly good: it still holds a lock, and
> > on regular deletion (in the case when the load_balance dpo is deleted) it
> > would be erased properly (with dpo_reset). It usually happens while the
> > main core is executing vlib_worker_thread_barrier_sync_int() and the
> > other worker is executing vlib_worker_thread_barrier_check().
> >                                       And the strangest thing is that
> > when I run VPP's gdb helper "pool_is_free_index" (pifi), it shows that
> > the index isn't free (in which case the assert shouldn't fire).
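> >
> >                                       For reference, load_balance_get()
> >                                       is just the asserting pool lookup
> >                                       (paraphrased from
> >                                       vnet/dpo/load_balance.h):
> >
> >                                         static inline load_balance_t *
> >                                         load_balance_get (index_t lbi)
> >                                         {
> >                                           /* asserts that lbi is not
> >                                              free in the pool bitmap */
> >                                           return (pool_elt_at_index
> >                                                   (load_balance_pool, lbi));
> >                                         }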
> >
> >
> >                                       Any thoughts and inputs are appreciated.
> >
> >
> >                                       Thread 3 "vpp_wk_0" received signal SIGABRT, Aborted.
> >                                       [Switching to Thread 0x7fb4f2e22700 (LWP 3244)]
> >                                       __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> >                                       51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> >                                       (gdb) bt
> >                                       #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> >                                       #1  0x00007fb5933fa921 in __GI_abort () at abort.c:79
> >                                       #2  0x00007fb594392799 in os_panic () at /home/vpp/vpp/src/vppinfra/unix-misc.c:177
> >                                       #3  0x00007fb5942d8f49 in debugger () at /home/vpp/vpp/src/vppinfra/error.c:84
> >                                       #4  0x00007fb5942d8cc7 in _clib_error (how_to_die=2, function_name=0x0, line_number=0, fmt=0x7fb5962ec8d0 "%s:%d (%s) assertion `%s' fails") at /home/vpp/vpp/src/vppinfra/error.c:143
> >                                       #5  0x00007fb5954bd694 in load_balance_get (lbi=3604) at /home/vpp/vpp/src/vnet/dpo/load_balance.h:222
> >                                       #6  0x00007fb5954bc070 in ip6_lookup_inline (vm=0x7fb51ceccd00, node=0x7fb520f6b700, frame=0x7fb52128e4c0) at /home/vpp/vpp/src/vnet/ip/ip6_forward.h:117
> >                                       #7  0x00007fb5954bbdd5 in ip6_lookup_node_fn_hsw (vm=0x7fb51ceccd00, node=0x7fb520f6b700, frame=0x7fb52128e4c0) at /home/vpp/vpp/src/vnet/ip/ip6_forward.c:736
> >                                       #8  0x00007fb594ec0076 in dispatch_node (vm=0x7fb51ceccd00, node=0x7fb520f6b700, type=VLIB_NODE_TYPE_INTERNAL, dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x7fb52128e4c0, last_time_stamp=1808528151240447) at /home/vpp/vpp/src/vlib/main.c:1217
> >                                       #9  0x00007fb594ec09e7 in dispatch_pending_node (vm=0x7fb51ceccd00, pending_frame_index=5, last_time_stamp=1808528151240447) at /home/vpp/vpp/src/vlib/main.c:1376
> >                                       #10 0x00007fb594eba441 in vlib_main_or_worker_loop (vm=0x7fb51ceccd00, is_main=0) at /home/vpp/vpp/src/vlib/main.c:1904
> >                                       #11 0x00007fb594eb92e7 in vlib_worker_loop (vm=0x7fb51ceccd00) at /home/vpp/vpp/src/vlib/main.c:2038
> >                                       #12 0x00007fb594f1195d in vlib_worker_thread_fn (arg=0x7fb513a48100) at /home/vpp/vpp/src/vlib/threads.c:1868
> >                                       #13 0x00007fb5942fd214 in clib_calljmp () at /home/vpp/vpp/src/vppinfra/longjmp.S:123
> >                                       #14 0x00007fb4f2e21c90 in ?? ()
> >                                       #15 0x00007fb594f09b83 in vlib_worker_thread_bootstrap_fn (arg=0x7fb513a48100) at /home/vpp/vpp/src/vlib/threads.c:585
> >                                       #16 0x00007fb50c218355 in eal_thread_loop (arg=0x0) at ../src-dpdk/lib/librte_eal/linux/eal_thread.c:127
> >                                       #17 0x00007fb5947dc6db in start_thread (arg=0x7fb4f2e22700) at pthread_create.c:463
> >                                       #18 0x00007fb5934db71f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> >                                       (gdb) select 5
> >                                       (gdb) print _e
> >                                       $1 = (load_balance_t *) 0x7fb52651e580
> >                                       (gdb) print load_balance_pool[3604]
> >                                       $2 = {cacheline0 = 0x7fb52651e580 "\001", lb_n_buckets = 1, lb_n_buckets_minus_1 = 0, lb_proto = DPO_PROTO_IP6, lb_flags = LOAD_BALANCE_FLAG_NONE, lb_fib_entry_flags = (FIB_ENTRY_FLAG_CONNECTED | FIB_ENTRY_FLAG_LOCAL), lb_locks = 1, lb_map = 4294967295, lb_urpf = 4094, lb_hash_config = 31, lb_buckets = 0x0,
> >                                         lb_buckets_inline = {{{{dpoi_type = DPO_RECEIVE, dpoi_proto = DPO_PROTO_IP6, dpoi_next_node = 2, dpoi_index = 2094}, as_u64 = 8993661649164}},
> >                                           {{{dpoi_type = DPO_FIRST, dpoi_proto = DPO_PROTO_IP4, dpoi_next_node = 0, dpoi_index = 0}, as_u64 = 0}},
> >                                           {{{dpoi_type = DPO_FIRST, dpoi_proto = DPO_PROTO_IP4, dpoi_next_node = 0, dpoi_index = 0}, as_u64 = 0}},
> >                                           {{{dpoi_type = DPO_FIRST, dpoi_proto = DPO_PROTO_IP4, dpoi_next_node = 0, dpoi_index = 0}, as_u64 = 0}}}}
> >                                       (gdb) print &load_balance_pool[3604]
> >                                       $3 = (load_balance_t *) 0x7fb52651e580
> >                                       (gdb) source ~/vpp/extras/gdb/gdbinit
> >                                       Loading vpp functions...
> >                                       Load vl
> >                                       Load pe
> >                                       Load pifi
> >                                       Load node_name_from_index
> >                                       Load vnet_buffer_opaque
> >                                       Load vnet_buffer_opaque2
> >                                       Load bitmap_get
> >                                       Done loading vpp functions...
> >                                       (gdb) pifi load_balance_pool 3604
> >                                       pool_is_free_index (load_balance_pool, 3604)
> >                                       $4 = 0
> >                                       (gdb) info threads
> >                                         Id   Target Id         Frame
> >                                         1    Thread 0x7fb596bd2c40 (LWP 727) "vpp_main" 0x00007fb594f1439b in clib_time_now_internal (c=0x7fb59517ccc0 <vlib_global_main>, n=1808528155236639) at /home/vpp/vpp/src/vppinfra/time.h:215
> >                                         2    Thread 0x7fb4f3623700 (LWP 2976) "eal-intr-thread" 0x00007fb5934dba47 in epoll_wait (epfd=17, events=0x7fb4f3622d80, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
> >                                       * 3    Thread 0x7fb4f2e22700 (LWP 3244) "vpp_wk_0" __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> >                                         4    Thread 0x7fb4f2621700 (LWP 3246) "vpp_wk_1" 0x00007fb594ebf897 in vlib_worker_thread_barrier_check () at /home/vpp/vpp/src/vlib/threads.h:439
> >
> >
> >                                       --
> >
> >                                       Best regards
> >                                       Stanislav Zaikin
> >
> >
> >
> >
> >
> >
> >
> >                               --
> >
> >                               Best regards
> >                               Stanislav Zaikin
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >       --
> >
> >       Best regards
> >       Stanislav Zaikin
> >
> >
> >
> >
> >
> >
> > --
> >
> > Best regards
> > Stanislav Zaikin
>


-- 
Best regards
Stanislav Zaikin