Hi Dave,

I understand that a lot has changed between 18.01 and the latest release.
But based on the pstack we were seeing, I went ahead and cherry-picked a
small change from latest in the file vlib/threads.c, in the function
'vlib_worker_thread_barrier_sync_int'.

I replaced this:

      while ((now = vlib_time_now (vm)) < vm->barrier_no_close_before);

with this block:

      while (1)
        {
          now = vlib_time_now (vm);
          /* Barrier hold-down timer expired? */
          if (now >= vm->barrier_no_close_before)
            break;
          if ((vm->barrier_no_close_before - now)
              > (2.0 * BARRIER_MINIMUM_OPEN_LIMIT))
            {
              clib_warning
                ("clock change: would have waited for %.4f seconds",
                 (vm->barrier_no_close_before - now));
              break;
            }
        }
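
To show what this change does in isolation, here is a minimal standalone
sketch that runs both loops against a simulated clock. It is not VPP code:
fake_vm_t, fake_time_now, old_wait and new_wait are hypothetical names made up
for the illustration, and the value used for BARRIER_MINIMUM_OPEN_LIMIT is an
assumption that should be checked against vlib/threads.c.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value, for illustration only; check vlib/threads.c in your tree. */
#define BARRIER_MINIMUM_OPEN_LIMIT 0.001

/* Hypothetical stand-in for the state the barrier loop reads. */
typedef struct
{
  double fake_now;                /* simulated clock, in seconds */
  double barrier_no_close_before; /* hold-down deadline, in seconds */
} fake_vm_t;

/* Hypothetical replacement for vlib_time_now (): advances 100 us per call. */
static double
fake_time_now (fake_vm_t * vm)
{
  vm->fake_now += 0.0001;
  return vm->fake_now;
}

/* Old behaviour: poll the clock until the deadline has passed.
   Returns the number of clock polls that were needed. */
static unsigned long
old_wait (fake_vm_t * vm)
{
  unsigned long polls = 1;
  while (fake_time_now (vm) < vm->barrier_no_close_before)
    polls++;
  return polls;
}

/* Patched behaviour: same wait, but give up when the remaining wait is
   implausibly long. Returns 1 if the bailout fired, 0 if the hold-down
   expired normally; the number of clock polls is stored in *polls. */
static int
new_wait (fake_vm_t * vm, unsigned long *polls)
{
  double now;

  assert (polls != NULL);
  *polls = 0;
  while (1)
    {
      now = fake_time_now (vm);
      (*polls)++;
      /* Barrier hold-down timer expired? */
      if (now >= vm->barrier_no_close_before)
        return 0;
      /* Deadline is far in the future: treat it as a clock change. */
      if ((vm->barrier_no_close_before - now)
          > (2.0 * BARRIER_MINIMUM_OPEN_LIMIT))
        return 1;
    }
}
```

Starting from fake_now = 0 with the deadline set 5 simulated seconds ahead,
old_wait () has to poll the fake clock tens of thousands of times before it
returns, while new_wait () returns 1 after the first poll. With a deadline
only 0.5 ms ahead, new_wait () waits it out and returns 0.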

This seems to resolve some of the problems. The vapi client no longer gets
disconnected. The CLI also keeps working, so the main thread is no longer
stuck when the system time is changed. I also see the clib warning shown
above.

However, the main thread keeps running at 100% CPU utilization for the
duration reported by the clib warning. Throughput drops sharply and the
workers under-perform for that same period, i.e. the time printed in the
log message "clock change: would have waited for xxx seconds". After this
period, the system returns to normal.

Is there something else I could pick up to get this right?

Regards,
Siddarth



On Fri, Feb 7, 2020 at 9:57 PM Dave Barach (dbarach) <dbar...@cisco.com>
wrote:

> FWIW, master/latest continues to pass traffic w/ “date -s” deltas of both
> plus and minus a couple of minutes. This is not a huge surprise, nor is it
> a surprise that stable/1801 fails miserably under similar albeit less
> draconian circumstances.
>
>
>
> The algorithm changes mentioned below don’t involve a lot of code, but
> they are pretty first-order important.
>
>
>
> HTH... Dave
>
>
>
> *From:* vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> *On Behalf Of *Dave
> Barach via Lists.Fd.Io
> *Sent:* Friday, February 7, 2020 10:53 AM
> *To:* siddarth rai <sid...@gmail.com>; vpp-dev <vpp-dev@lists.fd.io>
> *Cc:* vpp-dev@lists.fd.io
> *Subject:* Re: [vpp-dev] VPP main threads gets stuck when system time is
> changed
>
>
>
> Try patching src/vppinfra/time.[ch] from master/latest. The algorithms
> involved have been changed quite a bit since 18.01...
>
>
>
> Dave
>
>
>
> *From:* vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> *On Behalf Of *siddarth
> rai
> *Sent:* Friday, February 7, 2020 9:17 AM
> *To:* vpp-dev <vpp-dev@lists.fd.io>
> *Subject:* [vpp-dev] VPP main threads gets stuck when system time is
> changed
>
>
>
> Hi,
>
>
>
> We have VPP 18.01 in one of our systems. I understand that VPP 18.01 is no
> longer supported, but I would appreciate any advice nevertheless.
>
>
>
> System time is changed by a few seconds using 'date -s'. Then the VPP main
> thread goes to 100% CPU utilization.
>
> The issue is only reproduced when the traffic is running.
>
>
>
> I attached gdb to VPP and saw that while the worker thread is working
> normally, the main thread seems to be stuck at clib_cpu_time_now.
>
>              https://pastebin.com/iJm0uZqx
>
>
>
> Also, here is the bt of main :
>
>              https://pastebin.com/CSjv4KsW
>
>
>
> Please help. Any pointers will be much appreciated
>
>
>
> Regards,
>
> Siddarth
>
>
>