vpp "process" nodes are cooperative multitasking threads. Emphasis on cooperative. There is no force of physics which will cause such a thread to deschedule unless it chooses to give up the CPU. Toss in a runtime limit + vlib_process_suspend (...) and see if that improves matters...
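A minimal, self-contained sketch of the "runtime limit" half of that advice. It assumes the route download can be split into small units of work. `route_download_t`, `route_download_step ()`, `run_download ()` and the 64-route check interval are all hypothetical names and values for illustration, not VPP APIs. In a real process node, the point where the step function returns 1 is where the node would call `vlib_process_suspend (vm, dt)` before resuming. Here a plain loop stands in for the vlib process scheduler.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#include <assert.h>
#include <time.h>

/* Hypothetical state for one route download; not a VPP type. */
typedef struct
{
  unsigned next;            /* index of the next route to program */
  unsigned total;           /* number of routes in this download */
  unsigned long programmed; /* routes programmed so far */
} route_download_t;

/* Monotonic time in seconds. */
static double
now_seconds (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (double) ts.tv_sec + (double) ts.tv_nsec * 1e-9;
}

/* Program routes for roughly `budget` seconds, then hand the CPU back.
   Returns 1 if work remains (caller should yield), 0 when finished. */
static int
route_download_step (route_download_t * rd, double budget)
{
  double start = now_seconds ();
  unsigned done = 0; /* routes programmed in this slice */

  while (rd->next < rd->total)
    {
      /* Read the clock only every 64 routes to keep the check cheap;
         done > 0 guarantees each slice makes some progress. */
      if (done > 0 && (done & 63) == 0 && now_seconds () - start > budget)
        return 1;

      rd->programmed++; /* placeholder for the real route add */
      rd->next++;
      done++;
    }
  return 0;
}

/* Stand-in for the process node body: run slices until done.
   In VPP, each yield would be vlib_process_suspend (vm, dt).
   Returns how many times the download yielded. */
static unsigned
run_download (unsigned total, double budget)
{
  route_download_t rd = { .next = 0, .total = total, .programmed = 0 };
  unsigned yields = 0;

  while (route_download_step (&rd, budget))
    yields++; /* other nodes and timers would get to run here */

  assert (rd.programmed == total);
  return yields;
}
```

With a negative budget the sketch yields at every 64-route boundary, so 640 routes take ten slices (nine yields). The check interval and budget are tuning knobs; the right values depend on how expensive each route add is.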
Dave

________________________________
From: Rajith PR <raj...@rtbrick.com>
Sent: Friday, June 19, 2020 9:16 AM
To: Dave Barach (dbarach) <dbar...@cisco.com>
Cc: vpp-dev <vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] VPP_Main Thread Gets Stuck

Version is 19.08. We suspect one of our own process nodes is in a tight loop doing route download. However, "show run" and "show run max" do not show high clock times for any of them. Is there any other way to detect the problem node?

Thanks,
Rajith

On Fri, Jun 19, 2020 at 5:26 PM Dave Barach (dbarach) <dbar...@cisco.com> wrote:

Vpp version? Configuration? Backtraces from other threads?

The timer wheel code is not likely to be directly responsible. Earlier this year, we addressed a number of issues in vppinfra/time.[ch] having to do with NTP and/or manual time changes which could lead to symptoms like this. If you don’t have those patches, it would be best to acquire them at your earliest convenience. T=131 seconds is within the plausible range for an NTP timebase earthquake.

HTH... Dave

Please refer to https://fd.io/docs/vpp/master/troubleshooting/reportingissues/reportingissues.html

From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Rajith PR via lists.fd.io
Sent: Friday, June 19, 2020 12:30 AM
To: vpp-dev <vpp-dev@lists.fd.io>
Subject: [vpp-dev] VPP_Main Thread Gets Stuck

Hi All,

During scale tests with large numbers of routes, we occasionally hit a strange issue in our container. The vpp process became unresponsive. After attaching gdb to the process, we could see that the vpp_main thread is stuck in a specific function. Any pointers for debugging such issues would be of great help.
Backtrace:
#0 0x00007f6895f1bc56 in clib_bitmap_get (ai=0x7f683ad339c0, i=826) at /development/libvpp/src/vppinfra/bitmap.h:201
#1 0x00007f6895f20357 in tw_timer_expire_timers_internal_1t_3w_1024sl_ov (tw=0x7f683ad30000, now=131.6111045732342, callback_vector_arg=0x7f683ad330c0) at /development/libvpp/src/vppinfra/tw_timer_template.c:744
#2 0x00007f6895f20b36 in tw_timer_expire_timers_vec_1t_3w_1024sl_ov (tw=0x7f683ad30000, now=131.6111045732342, vec=0x7f683ad330c0) at /development/libvpp/src/vppinfra/tw_timer_template.c:814
#3 0x00007f68961fd166 in vlib_main_or_worker_loop (vm=0x7f689649ce00 <vlib_global_main>, is_main=1) at /development/libvpp/src/vlib/main.c:1857
#4 0x00007f68961fd8b1 in vlib_main_loop (vm=0x7f689649ce00 <vlib_global_main>) at /development/libvpp/src/vlib/main.c:1928
#5 0x00007f68961fe578 in vlib_main (vm=0x7f689649ce00 <vlib_global_main>, input=0x7f683a60ffb0) at /development/libvpp/src/vlib/main.c:2145
#6 0x00007f6896264865 in thread0 (arg=140087174745600) at /development/libvpp/src/vlib/unix/main.c:666
#7 0x00007f6895ebd600 in clib_calljmp () from /usr/local/lib/libvppinfra.so.1.0.1
#8 0x00007fff47e2f760 in ?? ()
#9 0x00007f6896264ddb in vlib_unix_main (argc=21, argv=0x563cecf5f900) at /development/libvpp/src/vlib/unix/main.c:736

Thanks,
Rajith
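One plausible reading of Dave's NTP remark, sketched as arithmetic rather than stated as VPP's actual behavior: assume the expire function (frame #1 above) advances the wheel one tick at a time from its last run time up to `now`. Under that assumption, the work done in a single call grows linearly with any forward jump in the timebase. The hypothetical helper `catch_up_ticks ()` and the 10 µs tick (100000 ticks per second) used below are illustrative assumptions, not values taken from this report or from the VPP source.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks a timer wheel would walk to catch up from last_run_time to now,
   assuming it processes every elapsed tick individually.
   A non-positive delta (time stood still or went backwards) yields 0. */
static uint64_t
catch_up_ticks (double last_run_time, double now, double ticks_per_second)
{
  double delta = now - last_run_time;

  if (delta <= 0.0)
    return 0;
  return (uint64_t) (delta * ticks_per_second);
}
```

At 100000 ticks per second, a 1 ms gap between calls means about 100 ticks, while a one-hour forward step in the timebase means 360000000 ticks in a single call. That is enough to keep the main thread inside the wheel for a long time, which would look like the hang above.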