Hi Dave,

In addition to the change mentioned earlier, I have tried one more change. In file vppinfra/time.h, I replaced 'CLOCK_REALTIME' with 'CLOCK_MONOTONIC'. This seems to have done the trick for now.
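
For reference, here is a minimal sketch of how the two POSIX clocks differ. It is not VPP code: the helper names clock_seconds, monotonic_elapsed and wall_clock_now are made up for illustration. CLOCK_REALTIME can be stepped by 'date -s', while CLOCK_MONOTONIC cannot, but its starting point is unspecified, so its value is not a usable wall-clock timestamp.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical helper: read a POSIX clock and return seconds as a double. */
double
clock_seconds (clockid_t id)
{
  struct timespec ts;
  int rc = clock_gettime (id, &ts);
  assert (rc == 0);
  (void) rc;
  return (double) ts.tv_sec + 1e-9 * (double) ts.tv_nsec;
}

/* Interval measurement: 'start' must be an earlier CLOCK_MONOTONIC reading.
   The result is not disturbed when the system time is set manually. */
double
monotonic_elapsed (double start)
{
  return clock_seconds (CLOCK_MONOTONIC) - start;
}

/* Absolute timestamp: seconds since the Unix epoch. This value jumps when
   the system time is set, so it should not be used to measure intervals. */
double
wall_clock_now (void)
{
  return clock_seconds (CLOCK_REALTIME);
}
```

One thing to watch for after such a swap is any caller that expects seconds since the Unix epoch (log timestamps, values exported to other systems), since a monotonic reading does not carry that meaning.
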
Just wondering what the impact of this change could be elsewhere. Should we watch out for any blind spots?

Regards,
Siddarth

P.S.: We have moved to 19.08, but some deployments are already live, so we have no choice but to keep working with 18.01 there and resolve the issues that come up.

On Tue, Feb 11, 2020 at 6:21 PM Dave Barach (dbarach) <dbar...@cisco.com> wrote:

> Start vpp under gdb, and produce the condition. Interrupt vpp, switch to
> thread 0 and collect a backtrace. Based on the backtrace, it should be
> fairly clear what’s happening.
>
> Once again: vpp 18.01 is 2+ year old software which the community no
> longer supports. If at all possible, please rebase your work onto 19.08
> (LTS), or 20.01 (current release).
>
> HTH... Dave
>
> *From:* siddarth rai <sid...@gmail.com>
> *Sent:* Tuesday, February 11, 2020 2:06 AM
> *To:* Dave Barach (dbarach) <dbar...@cisco.com>
> *Cc:* vpp-dev <vpp-dev@lists.fd.io>
> *Subject:* Re: [vpp-dev] VPP main threads gets stuck when system time is
> changed
>
> Hi Dave,
>
> I understand a lot of things have changed in between 1801 and latest
> release.
>
> But based on the pstack we were seeing, I went ahead and cherry picked a
> small change from latest in file
> vlib/threads.c in function 'vlib_worker_thread_barrier_sync_int'
>
> I replaced this --
>
>   while ((now = vlib_time_now (vm)) < vm->barrier_no_close_before);
>
> with this block
>
>   while (1)
>     {
>       now = vlib_time_now (vm);
>       /* Barrier hold-down timer expired? */
>       if (now >= vm->barrier_no_close_before)
>         break;
>       if ((vm->barrier_no_close_before - now)
>           > (2.0 * BARRIER_MINIMUM_OPEN_LIMIT))
>         {
>           clib_warning
>             ("clock change: would have waited for %.4f seconds",
>              (vm->barrier_no_close_before - now));
>           break;
>         }
>     }
>
> This seems to resolve some of the problems. The vapi client doesn't get
> disconnected any more. The CLI also keeps working, so main thread is not
> stuck anymore when the system time is changed. I do see the above clib
> warning also.
>
> However, the main thread keeps running at 100% CPU utilization for the
> time reported by the clib warning.
>
> I see that the throughput goes way down and the workers are kind of
> under-performing
>
> This under-performance of workers again happens for the same period of
> time as printed in this log -"clock change: would have waited for xxx
> seconds". After this period, the system returns to normal.
>
> I was wondering if I could pickup something else to get this right ?
>
> Regards,
> Siddarth
>
> On Fri, Feb 7, 2020 at 9:57 PM Dave Barach (dbarach) <dbar...@cisco.com>
> wrote:
>
> FWIW, master/latest continues to pass traffic w/ “date -s” deltas of both
> plus and minus a couple of minutes. This is not a huge surprise, nor is it
> a surprise that stable/1801 fails miserably under similar albeit less
> draconian circumstances.
>
> The algorithm changes mentioned below don’t involve a lot of code, but
> they are pretty first-order important.
>
> HTH... Dave
>
> *From:* vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> *On Behalf Of *Dave
> Barach via Lists.Fd.Io
> *Sent:* Friday, February 7, 2020 10:53 AM
> *To:* siddarth rai <sid...@gmail.com>; vpp-dev <vpp-dev@lists.fd.io>
> *Cc:* vpp-dev@lists.fd.io
> *Subject:* Re: [vpp-dev] VPP main threads gets stuck when system time is
> changed
>
> Try patching src/vppinfra/time.[ch] from master/latest. The algorithms
> involved have been changed quite a bit since 18.01...
>
> Dave
>
> *From:* vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> *On Behalf Of *siddarth
> rai
> *Sent:* Friday, February 7, 2020 9:17 AM
> *To:* vpp-dev <vpp-dev@lists.fd.io>
> *Subject:* [vpp-dev] VPP main threads gets stuck when system time is
> changed
>
> Hi,
>
> We have VPP 1801 in one of our systems. I understand the support for VPP
> 1801 is not there anymore , but requesting for any advice nevertheless.
>
> System time is changed by a few seconds using 'date -s'. Then the VPP main
> thread goes to 100% CPU utilization.
>
> The issue is only reproduced when the traffic is running.
>
> I attached gdb to VPP and saw that while the worker thread is working
> normally, the main thread seems to be stuck at clib_cpu_time_now.
>
> https://pastebin.com/iJm0uZqx
>
> Also, here is the bt of main :
>
> https://pastebin.com/CSjv4KsW
>
> Please help. Any pointers will be much appreciated
>
> Regards,
> Siddarth
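
The bounded hold-down loop quoted earlier in this thread can be tried outside VPP against a simulated clock. The following is only a sketch under stated assumptions: hold_down_wait, fake_clock_t and fake_now are hypothetical names, and HOLD_DOWN_LIMIT is a stand-in for BARRIER_MINIMUM_OPEN_LIMIT, whose real value is not given in this thread.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed stand-in for BARRIER_MINIMUM_OPEN_LIMIT (value chosen for the demo). */
#define HOLD_DOWN_LIMIT 0.001

/* Clock source passed in by the caller so the loop can be tested. */
typedef double (*time_fn_t) (void *ctx);

/* Spin until 'no_close_before' is reached.
   Returns 0 if the deadline was reached normally, or 1 if the remaining
   wait exceeded twice the limit, which is treated as a clock change. */
int
hold_down_wait (time_fn_t now_fn, void *ctx, double no_close_before)
{
  for (;;)
    {
      double now = now_fn (ctx);
      /* Hold-down timer expired? */
      if (now >= no_close_before)
        return 0;
      if ((no_close_before - now) > (2.0 * HOLD_DOWN_LIMIT))
        {
          fprintf (stderr,
                   "clock change: would have waited for %.4f seconds\n",
                   no_close_before - now);
          return 1;
        }
    }
}

/* Simulated clock: every read advances the time by 'step' seconds. */
typedef struct
{
  double t;
  double step;
} fake_clock_t;

double
fake_now (void *ctx)
{
  fake_clock_t *c = ctx;
  c->t += c->step;
  return c->t;
}
```

With a clock that starts just before the deadline, hold_down_wait spins briefly and returns 0. With a clock that has been set back by 30 seconds, it returns 1 on the first iteration instead of spinning for the whole 30 seconds. The sketch covers only this one loop and says nothing about other time-dependent code paths in the main thread.
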