VPP "process" nodes are cooperative multitasking threads. Emphasis on 
cooperative. There is no force of physics which will cause such a thread to 
deschedule unless it chooses to give up the CPU. Toss in a runtime limit plus 
a call to vlib_process_suspend (...) and see if that improves matters...
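
A rough sketch of the "runtime limit + suspend" pattern follows, written as a 
standalone simulation rather than real VPP code. Every sketch_* name, the 
integer microsecond clock, and the 10 us per-route cost are assumptions made 
up for illustration. In an actual process node the clock would be 
vlib_time_now (vm) and the yield would be vlib_process_suspend (vm, dt).

```c
#include <assert.h>
#include <stdint.h>

/* Simulated clock in microseconds. Hypothetical stand-in for
   vlib_time_now (vm), which returns seconds as an f64 in real VPP. */
static uint64_t sketch_now_us = 0;

static uint64_t
sketch_time_now_us (void)
{
  return sketch_now_us;
}

/* Hypothetical stand-in for vlib_process_suspend (vm, dt). Real code
   would return control to the main loop here; the simulation only
   advances the clock by the requested sleep time. */
static void
sketch_suspend_us (uint64_t dt_us)
{
  sketch_now_us += dt_us;
}

/* Pretend that installing one route costs 10 microseconds. */
static void
sketch_install_route (int index)
{
  (void) index;
  sketch_now_us += 10;
}

/* Install n_routes, giving up the CPU whenever the current run has
   lasted at least budget_us. Returns how many times the loop yielded. */
static int
download_routes (int n_routes, uint64_t budget_us)
{
  uint64_t start = sketch_time_now_us ();
  int n_yields = 0;
  int i;

  assert (budget_us > 0);

  for (i = 0; i < n_routes; i++)
    {
      sketch_install_route (i);

      if (sketch_time_now_us () - start >= budget_us)
        {
          sketch_suspend_us (1000);	/* yield for 1 ms (arbitrary) */
          n_yields++;
          start = sketch_time_now_us ();	/* begin a new budget window */
        }
    }
  return n_yields;
}
```

With these made-up costs, download_routes (1000, 100) yields after every 
tenth route. The point is only that the budget check sits inside the loop, so 
a long download cannot hold the main thread indefinitely.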

Dave

________________________________

From: Rajith PR <raj...@rtbrick.com>
Sent: Friday, June 19, 2020 9:16 AM
To: Dave Barach (dbarach) <dbar...@cisco.com>
Cc: vpp-dev <vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] VPP_Main Thread Gets Stuck

Version is 19.08. We suspect that one of our own process nodes is in a tight 
loop doing route download. However, neither "show run" nor "show run max" 
indicates any high clock time on them.
Is there any other way to detect the problem node?

Thanks,
Rajith

On Fri, Jun 19, 2020 at 5:26 PM Dave Barach (dbarach) <dbar...@cisco.com> 
wrote:

Vpp version? Configuration? Backtraces from other threads? The timer wheel code 
is not likely to be directly responsible.



Earlier this year, we addressed a number of issues in vppinfra/time.[ch] having 
to do with NTP and/or manual time changes which could lead to symptoms like 
this.



If you don’t have those patches, it would be best to acquire them at your 
earliest convenience. T=131 seconds is within the plausible range for an NTP 
timebase earthquake.



HTH... Dave



Please refer to 
https://fd.io/docs/vpp/master/troubleshooting/reportingissues/reportingissues.html



From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Rajith PR via 
lists.fd.io
Sent: Friday, June 19, 2020 12:30 AM
To: vpp-dev <vpp-dev@lists.fd.io>
Subject: [vpp-dev] VPP_Main Thread Gets Stuck



Hi All,



During scale tests with large numbers of routes, we occasionally hit a 
strange issue in our container. The vpp process became unresponsive. After 
attaching gdb to the process, we could see that the vpp_main thread is stuck 
in a specific function. Any pointers on debugging such issues would be of 
great help.



Back Trace:



#0 0x00007f6895f1bc56 in clib_bitmap_get (ai=0x7f683ad339c0, i=826) at /development/libvpp/src/vppinfra/bitmap.h:201
#1 0x00007f6895f20357 in tw_timer_expire_timers_internal_1t_3w_1024sl_ov (tw=0x7f683ad30000, now=131.6111045732342, callback_vector_arg=0x7f683ad330c0) at /development/libvpp/src/vppinfra/tw_timer_template.c:744
#2 0x00007f6895f20b36 in tw_timer_expire_timers_vec_1t_3w_1024sl_ov (tw=0x7f683ad30000, now=131.6111045732342, vec=0x7f683ad330c0) at /development/libvpp/src/vppinfra/tw_timer_template.c:814
#3 0x00007f68961fd166 in vlib_main_or_worker_loop (vm=0x7f689649ce00 <vlib_global_main>, is_main=1) at /development/libvpp/src/vlib/main.c:1857
#4 0x00007f68961fd8b1 in vlib_main_loop (vm=0x7f689649ce00 <vlib_global_main>) at /development/libvpp/src/vlib/main.c:1928
#5 0x00007f68961fe578 in vlib_main (vm=0x7f689649ce00 <vlib_global_main>, input=0x7f683a60ffb0) at /development/libvpp/src/vlib/main.c:2145
#6 0x00007f6896264865 in thread0 (arg=140087174745600) at /development/libvpp/src/vlib/unix/main.c:666
#7 0x00007f6895ebd600 in clib_calljmp () from /usr/local/lib/libvppinfra.so.1.0.1
#8 0x00007fff47e2f760 in ?? ()
#9 0x00007f6896264ddb in vlib_unix_main (argc=21, argv=0x563cecf5f900) at /development/libvpp/src/vlib/unix/main.c:736



Thanks,

Rajith