Thanks Dave for the quick analysis. Are there some Debug CLIs that I can run to analyse? We are not using the VPP timers as we have our own timer library. In VPP, we have added a couple of VPP nodes(process, internal and input). Could these be causing the problem?
Thanks, Rajith On Wed, Sep 2, 2020 at 6:43 PM Dave Barach (dbarach) <dbar...@cisco.com> wrote: > Given the amount of soak-time / perf/scale / stress testing which the > tw_timer code has experienced, it’s reasonably likely that your application > is responsible. > > > > Caution is required when dealing with timers other than the timer which > has expired. > > > > If you have > 1 timer per object and you manipulate timer B when timer A > expires, there’s no guarantee that timer B isn’t already on the expired > timer list. That’s almost always good for trouble. > > > > HTH... Dave > > > > *From:* vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> *On Behalf Of *Rajith > PR via lists.fd.io > *Sent:* Wednesday, September 2, 2020 12:39 AM > *To:* vpp-dev <vpp-dev@lists.fd.io> > *Subject:* [vpp-dev]: Crash in Timer wheel infra > > > > Hi All, > > > > We are facing a crash in VPP's Timer wheel INFRA. Please find the details > below. > > > > Version : *19.08* > > Configuration: *2 workers and the main thread.* > > Bactraces: thread apply all bt > > > > Thread 1 (Thread 0x7ff41d586d00 (LWP 253)): > > ---Type <return> to continue, or q <return> to quit--- > > #0 0x00007ff41c696722 in __GI___waitpid (pid=707, > > stat_loc=stat_loc@entry=0x7ff39f18ca18, options=options@entry=0) > > at ../sysdeps/unix/sysv/linux/waitpid.c:30 > > #1 0x00007ff41c601107 in do_system (line=<optimized out>) > > at ../sysdeps/posix/system.c:149 > > #2 0x00007ff41d11a76b in bd_signal_handler_cb (signo=6) > > at /development/librtbrickinfra/bd/src/bd.c:770 > > #3 0x00007ff410ce907b in rtb_bd_signal_handler (signo=6) > > at /development/libvpp/src/vlib/unix/main.c:80 > > #4 0x00007ff410ce9416 in unix_signal_handler (signum=6, si=0x7ff39f18d1f0, > > uc=0x7ff39f18d0c0) at /development/libvpp/src/vlib/unix/main.c:180 > > #5 <signal handler called> > > #6 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 > > #7 0x00007ff41c5f28b1 in __GI_abort () at abort.c:79 > > #8 0x00007ff41248ee66 in os_panic () > > at /development/libvpp/src/vpp/vnet/main.c:559 > > #9 0x00007ff410922825 in debugger () > > at /development/libvpp/src/vppinfra/error.c:84 > > #10 0x00007ff410922bf4 in _clib_error (how_to_die=2, function_name=0x0, > > line_number=0, fmt=0x7ff4109e8a78 "%s:%d (%s) assertion `%s' fails") > > at /development/libvpp/src/vppinfra/error.c:143 > > #11 0x00007ff4109a64dd in tw_timer_expire_timers_internal_1t_3w_1024sl_ov ( > > tw=0x7ff39fdf7a40, now=327.59939269500001, > > ---Type <return> to continue, or q <return> to quit--- > > callback_vector_arg=0x7ff39fdfab00) > > at /development/libvpp/src/vppinfra/tw_timer_template.c:753 > > #12 0x00007ff4109a6b36 in tw_timer_expire_timers_vec_1t_3w_1024sl_ov ( > > tw=0x7ff39fdf7a40, now=327.59939269500001, vec=0x7ff39fdfab00) > > at /development/libvpp/src/vppinfra/tw_timer_template.c:814 > > #13 0x00007ff410c8321a in vlib_main_or_worker_loop ( > > vm=0x7ff410f22e40 <vlib_global_main>, is_main=1) > > at /development/libvpp/src/vlib/main.c:1859 > > #14 0x00007ff410c83965 in vlib_main_loop (vm=0x7ff410f22e40 > <vlib_global_main>) > > at /development/libvpp/src/vlib/main.c:1930 > > #15 0x00007ff410c8462c in vlib_main (vm=0x7ff410f22e40 <vlib_global_main>, > > input=0x7ff39f18dfb0) at /development/libvpp/src/vlib/main.c:2147 > > #16 0x00007ff410ceabc9 in thread0 (arg=140686233054784) > > at /development/libvpp/src/vlib/unix/main.c:666 > > #17 0x00007ff410943600 in clib_calljmp () > > from /usr/local/lib/libvppinfra.so.1.0.1 > > #18 0x00007ffe4d981390 in ?? () > > #19 0x00007ff410ceb13f in vlib_unix_main (argc=55, argv=0x556c398eb100) > > at /development/libvpp/src/vlib/unix/main.c:736 > > #20 0x00007ff41248e7cb in rtb_vpp_core_init (argc=55, argv=0x556c398eb100) > > at /development/libvpp/src/vpp/vnet/main.c:483 > > #21 0x00007ff41256189a in rtb_vpp_main () > > at /development/libvpp/src/vpp/rtbrick/rtb_vpp_main.c:113 > > ---Type <return> to continue, or q <return> to quit--- > > #22 0x00007ff41d11a15a in bd_load_daemon_lib ( > > dmn_lib_cfg=0x7ff41d337860 <bd_json_global+21408>) > > at /development/librtbrickinfra/bd/src/bd.c:627 > > #23 0x00007ff41d11a205 in bd_load_all_daemon_libs () > > at /development/librtbrickinfra/bd/src/bd.c:646 > > #24 0x00007ff41d11b676 in bd_start_process () > > at /development/librtbrickinfra/bd/src/bd.c:1128 > > #25 0x00007ff419e92200 in bds_bd_init () > > at /development/librtbrickinfra/libbds/code/bds/src/bds.c:651 > > #26 0x00007ff419f1aa5d in pubsub_bd_init_expiry (data=0x0) > > at > /development/librtbrickinfra/libbds/code/pubsub/src/pubsub_helper.c:1412 > > #27 0x00007ff41cc23070 in timer_dispatch (item=0x556c39997cf0, p=QB_LOOP_HIGH) > > at /development/librtbrickinfra/libqb/lib/loop_timerlist.c:56 > > #28 0x00007ff41cc1f006 in qb_loop_run_level (level=0x556c366fb3e0) > > at /development/librtbrickinfra/libqb/lib/loop.c:43 > > #29 0x00007ff41cc1f77b in qb_loop_run (lp=0x556c366fb370) > > at /development/librtbrickinfra/libqb/lib/loop.c:210 > > #30 0x00007ff41cc30b3f in lib_qb_service_start_event_loop () > > at /development/librtbrickinfra/libqb/lib/wrapper/lib_qb_service.c:257 > > #31 0x0000556c358c7153 in main () > > Thread 11 (Thread 0x7ff35b622700 (LWP 413)): > > #0 rtb_vpp_shm_rx_burst (port_id=3, queue_id=0, burst_size=64 '@') > > at /development/libvpp/src/vpp/rtbrick/rtb_vpp_shm_node.c:317 > > #1 0x00007ff4125ee043 in rtb_vpp_shm_device_input (vm=0x7ff39f89ac80, > > shmm=0x7ff41285e180 <rtb_vpp_shm_main_ctx>, shmif=0x7ff39f8ad940, > > node=0x7ff39d461480, frame=0x0, thread_index=2, queue_id=0) > > at /development/libvpp/src/vpp/rtbrick/rtb_vpp_shm_node.c:359 > > #2 0x00007ff4125ee839 in rtb_vpp_shm_input_node_fn (vm=0x7ff39f89ac80, > > node=0x7ff39d461480, f=0x0) > > at /development/libvpp/src/vpp/rtbrick/rtb_vpp_shm_node.c:452 > > #3 0x00007ff410c80cef in dispatch_node (vm=0x7ff39f89ac80, > > node=0x7ff39d461480, type=VLIB_NODE_TYPE_INPUT, > > dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x0, > > last_time_stamp=11572457044265548) > > at /development/libvpp/src/vlib/main.c:1207 > > #4 0x00007ff410c82e50 in vlib_main_or_worker_loop (vm=0x7ff39f89ac80, > > is_main=0) at /development/libvpp/src/vlib/main.c:1781 > > #5 0x00007ff410c83985 in vlib_worker_loop (vm=0x7ff39f89ac80) > > at /development/libvpp/src/vlib/main.c:1936 > > #6 0x00007ff410cc26af in vlib_worker_thread_fn (arg=0x7ff39d6eba80) > > at /development/libvpp/src/vlib/threads.c:1803 > > #7 0x00007ff410943600 in clib_calljmp () > > ---Type <return> to continue, or q <return> to quit--- > > from /usr/local/lib/libvppinfra.so.1.0.1 > > #8 0x00007ff35b621ec0 in ?? () > > #9 0x00007ff410cbcc46 in vlib_worker_thread_bootstrap_fn (arg=0x7ff39d6eba80) > > at /development/libvpp/src/vlib/threads.c:573 > > Backtrace stopped: previous frame inner to this frame (corrupt stack?) > > > > Thread 10 (Thread 0x7ff35be23700 (LWP 412)): > > #0 vlib_node_runtime_update_main_loop_vector_stats (vm=0x7ff39f89a540, > > node=0x7ff3a0248940, n_vectors=0) > > at /development/libvpp/src/vlib/node_funcs.h:1066 > > #1 0x00007ff410c7e5df in vlib_node_runtime_update_stats (vm=0x7ff39f89a540, > > node=0x7ff3a0248940, n_calls=1, n_vectors=0, n_clocks=8543, n_ticks0=0, > > n_ticks1=0) at /development/libvpp/src/vlib/main.c:657 > > #2 0x00007ff410c80df9 in dispatch_node (vm=0x7ff39f89a540, > > node=0x7ff3a0248940, type=VLIB_NODE_TYPE_INPUT, > > dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x0, > > last_time_stamp=11572457044240565) > > at /development/libvpp/src/vlib/main.c:1227 > > #3 0x00007ff410c82e50 in vlib_main_or_worker_loop (vm=0x7ff39f89a540, > > is_main=0) at /development/libvpp/src/vlib/main.c:1781 > > #4 0x00007ff410c83985 in vlib_worker_loop (vm=0x7ff39f89a540) > > at /development/libvpp/src/vlib/main.c:1936 > > #5 0x00007ff410cc26af in vlib_worker_thread_fn (arg=0x7ff39d6eb980) > > ---Type <return> to continue, or q <return> to quit--- > > at /development/libvpp/src/vlib/threads.c:1803 > > #6 0x00007ff410943600 in clib_calljmp () > > from /usr/local/lib/libvppinfra.so.1.0.1 > > #7 0x00007ff35be22ec0 in ?? () > > #8 0x00007ff410cbcc46 in vlib_worker_thread_bootstrap_fn (arg=0x7ff39d6eb980) > > at /development/libvpp/src/vlib/threads.c:573 > > Backtrace stopped: previous frame inner to this frame (corrupt stack?) > > The ASSERT at line 753 seems to be hit, the next_index seems to be > invalid. > > > > 750 > <http://rajith/lxr/http/source/src/vppinfra/tw_timer_template.c?sn=vpp-19-08#L750> > */* Construct vector of expired timer handles to give the user */* > > 751 > <http://rajith/lxr/http/source/src/vppinfra/tw_timer_template.c?sn=vpp-19-08#L751> > while (next_index != ts->head_index) > > 752 > <http://rajith/lxr/http/source/src/vppinfra/tw_timer_template.c?sn=vpp-19-08#L752> > { > > 753 > <http://rajith/lxr/http/source/src/vppinfra/tw_timer_template.c?sn=vpp-19-08#L753> > t <http://rajith/lxr/http/ident?sn=vpp-19-08;i=t> = > pool_elt_at_index > <http://rajith/lxr/http/ident?sn=vpp-19-08;i=pool_elt_at_index> (tw->timers, > next_index <http://rajith/lxr/http/ident?sn=vpp-19-08;i=next_index>); > > 754 > <http://rajith/lxr/http/source/src/vppinfra/tw_timer_template.c?sn=vpp-19-08#L754> > next_index > <http://rajith/lxr/http/ident?sn=vpp-19-08;i=next_index> = t > <http://rajith/lxr/http/ident?sn=vpp-19-08;i=t>->next; > > 755 > <http://rajith/lxr/http/source/src/vppinfra/tw_timer_template.c?sn=vpp-19-08#L755> > vec_add1 <http://rajith/lxr/http/ident?sn=vpp-19-08;i=vec_add1> > (callback_vector, t > <http://rajith/lxr/http/ident?sn=vpp-19-08;i=t>->user_handle); > > 756 > <http://rajith/lxr/http/source/src/vppinfra/tw_timer_template.c?sn=vpp-19-08#L756> > #if TW_START_STOP_TRACE_SIZE > <http://rajith/lxr/http/ident?sn=vpp-19-08;i=TW_START_STOP_TRACE_SIZE> > 0 > > 757 > <http://rajith/lxr/http/source/src/vppinfra/tw_timer_template.c?sn=vpp-19-08#L757> > TW <http://rajith/lxr/http/ident?sn=vpp-19-08;i=TW> > (tw_timer_trace <http://rajith/lxr/http/ident?sn=vpp-19-08;i=tw_timer_trace>) > (tw, 0xfe, t <http://rajith/lxr/http/ident?sn=vpp-19-08;i=t>->user_handle, t > <http://rajith/lxr/http/ident?sn=vpp-19-08;i=t> - tw->timers); > > 758 > <http://rajith/lxr/http/source/src/vppinfra/tw_timer_template.c?sn=vpp-19-08#L758> > #endif > > 759 > <http://rajith/lxr/http/source/src/vppinfra/tw_timer_template.c?sn=vpp-19-08#L759> > pool_put (tw->timers, t > <http://rajith/lxr/http/ident?sn=vpp-19-08;i=t>); > > 760 > <http://rajith/lxr/http/source/src/vppinfra/tw_timer_template.c?sn=vpp-19-08#L760> > } > > > > Thanks, > > Rajith > > > >
-=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#17319): https://lists.fd.io/g/vpp-dev/message/17319 Mute This Topic: https://lists.fd.io/mt/76576572/21656 Group Owner: vpp-dev+ow...@lists.fd.io Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-