Hi Dave,

I understand a lot has changed between 18.01 and the latest release. But based on the pstack we were seeing, I went ahead and cherry-picked a small change from latest in vlib/threads.c, in function 'vlib_worker_thread_barrier_sync_int'.
I replaced this:

    while ((now = vlib_time_now (vm)) < vm->barrier_no_close_before);

with this block:

    while (1)
      {
        now = vlib_time_now (vm);
        /* Barrier hold-down timer expired? */
        if (now >= vm->barrier_no_close_before)
          break;
        /* Remaining wait is implausibly long: assume the clock changed */
        if ((vm->barrier_no_close_before - now)
            > (2.0 * BARRIER_MINIMUM_OPEN_LIMIT))
          {
            clib_warning ("clock change: would have waited for %.4f seconds",
                          (vm->barrier_no_close_before - now));
            break;
          }
      }

This seems to resolve some of the problems. The VAPI client no longer gets disconnected, and the CLI keeps working, so the main thread no longer gets stuck when the system time is changed. I also see the clib_warning above.

However, the main thread keeps running at 100% CPU for the duration reported by that warning. During the same period, throughput drops sharply and the workers underperform. In other words, the degradation lasts for the period printed in "clock change: would have waited for xxx seconds". After that, the system returns to normal.

Is there something else I could pick up to get this right?

Regards,
Siddarth

On Fri, Feb 7, 2020 at 9:57 PM Dave Barach (dbarach) <dbar...@cisco.com> wrote:

> FWIW, master/latest continues to pass traffic w/ "date -s" deltas of both
> plus and minus a couple of minutes. This is not a huge surprise, nor is it
> a surprise that stable/1801 fails miserably under similar albeit less
> draconian circumstances.
>
> The algorithm changes mentioned below don't involve a lot of code, but
> they are pretty first-order important.
>
> HTH... Dave
>
> *From:* vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> *On Behalf Of* Dave
> Barach via Lists.Fd.Io
> *Sent:* Friday, February 7, 2020 10:53 AM
> *To:* siddarth rai <sid...@gmail.com>; vpp-dev <vpp-dev@lists.fd.io>
> *Cc:* vpp-dev@lists.fd.io
> *Subject:* Re: [vpp-dev] VPP main threads gets stuck when system time is
> changed
>
> Try patching src/vppinfra/time.[ch] from master/latest.
> The algorithms involved have been changed quite a bit since 18.01...
>
> Dave
>
> *From:* vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> *On Behalf Of* siddarth
> rai
> *Sent:* Friday, February 7, 2020 9:17 AM
> *To:* vpp-dev <vpp-dev@lists.fd.io>
> *Subject:* [vpp-dev] VPP main threads gets stuck when system time is
> changed
>
> Hi,
>
> We have VPP 18.01 in one of our systems. I understand VPP 18.01 is no
> longer supported, but I would appreciate any advice nevertheless.
>
> The system time is changed by a few seconds using 'date -s'. Then the VPP
> main thread goes to 100% CPU utilization. The issue only reproduces when
> traffic is running.
>
> I attached gdb to VPP and saw that while the worker thread is working
> normally, the main thread seems to be stuck in clib_cpu_time_now:
>
> https://pastebin.com/iJm0uZqx
>
> Also, here is the backtrace of the main thread:
>
> https://pastebin.com/CSjv4KsW
>
> Please help. Any pointers will be much appreciated.
>
> Regards,
>
> Siddarth
View/Reply Online (#15376): https://lists.fd.io/g/vpp-dev/message/15376