In 18.04 the rd-cp-process ran very frequently in the main thread, which caused the CPU usage you observed. Your "show run" output for the main thread shows that this process went through the suspend/run cycle many times (93544417 suspends in the rd-cp-process row quoted below).
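For anyone who wants to spot this kind of process quickly, here is a minimal sketch that ranks the rows of a "show run" dump by the Suspends column. It is not part of VPP: the helper name top_suspenders, the regular expression, and the SAMPLE text (a few rows copied from the output quoted in this thread) are all illustrative assumptions, and the pattern only knows the node states that appear in this thread.

```python
import re

# Hypothetical helper, not part of VPP. Matches one "show run" row:
# Name, State, Calls, Vectors, Suspends, Clocks, Vectors/Call.
# Only the states seen in this thread are recognised (assumption).
ROW = re.compile(
    r"^(?P<name>\S+)\s+"
    r"(?P<state>active|polling|done|event wait|any wait)\s+"
    r"(?P<calls>\d+)\s+(?P<vectors>\d+)\s+(?P<suspends>\d+)\s+"
    r"(?P<clocks>\S+)\s+(?P<vpc>\S+)\s*$"
)


def top_suspenders(show_run_text, limit=3):
    """Return up to `limit` (name, suspends) pairs, highest suspend count first."""
    rows = []
    for line in show_run_text.splitlines():
        match = ROW.match(line.strip())
        if match:  # header and summary lines do not match and are skipped
            rows.append((match.group("name"), int(match.group("suspends"))))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


# A few rows copied from the main-thread output quoted in this thread.
SAMPLE = """\
Name State Calls Vectors Suspends Clocks Vectors/Call
fib-walk any wait 0 0 1097 2.19e3 0.00
ikev2-manager-process any wait 0 0 2193 2.16e3 0.00
rd-cp-process any wait 0 0 93544417 2.71e2 0.00
unix-epoll-input polling 68445636 0 0 1.16e4 0.00
"""

if __name__ == "__main__":
    for name, suspends in top_suspenders(SAMPLE):
        print(f"{name:32s} {suspends}")
```

Run against the full dump, a process whose suspend count is several orders of magnitude above the rest is the one to look at first.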
This was fixed under Jira ticket VPP-1256 and is available in 18.07. The patch is https://gerrit.fd.io/r/#/c/12521/

Regards,
John

From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of sheckman
Sent: Tuesday, September 18, 2018 5:55 PM
To: Dave Barach (dbarach) <dbar...@cisco.com>; vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Increase in main core CPU usage between 17.10 and 18.04

Dave,

That isn't our real application, which does run higher PPS rates:

vpp# show run
Thread 0 vpp_main (lcore 30)
Time 50182.3, average vectors/node 1.00, last 128 main loops 0.00 per node 0.00
  vector rates in 6.7225e-1, out 6.8335e-1, drop 5.9782e-3, punt 0.0000e0
Thread 1 vpp_wk_0 (lcore 12)
Time 50182.3, average vectors/node 1.97, last 128 main loops 0.00 per node 0.00
  vector rates in 2.2826e5, out 8.6201e4, drop 2.6843e0, punt 4.6188e-1
Thread 2 vpp_wk_1 (lcore 32)
Time 50182.3, average vectors/node 2.18, last 128 main loops 0.00 per node 0.00
  vector rates in 6.1042e5, out 1.6054e5, drop 1.5624e0, punt 0.0000e0

The workers run on isolated tickless cores and the main runs on a regular Linux core.

What I sent below is a plain vanilla vpp run to investigate whether our code caused the main core CPU usage increase or if it was due to a change in vpp (which it appears to be).

As I said, in 17.10, we observed much lower CPU usage on vpp_main (0.3%). We were just curious what the design change was in 18.0x that bumped the main core CPU usage up to 85%. (In 17.10 and 18.04, the workers always run at 100%).

(BTW: This is the change we back-merged: https://github.com/FDio/vpp/commit/85aa49019f4b4b2b7a4fce4313fdc0f2de65c277)

Thanks,
Steve

On 09/18/2018 05:27 PM, Dave Barach (dbarach) wrote:

At that PPS rate, you don't need two worker threads. The worker threads burn a bunch of cycles - poll-wait or not - doing next-to-nothing. Try running the main thread all by itself...

D.

-----Original Message-----
From: Heckman, Steve <steve.heck...@arris.com>
Sent: Tuesday, September 18, 2018 5:15 PM
To: Dave Barach (dbarach) <dbar...@cisco.com>; vpp-dev@lists.fd.io
Subject: Re: Increase in main core CPU usage between 17.10 and 18.04

We back-merged the unix poll wait timeout and a 100 usec delay gets us down to maybe 15%. Just wondering why the change originally.

top -H:

top - 17:09:45 up 19 days, 2:07, 7 users, load average: 5.72, 5.85, 5.69
Threads: 491 total, 7 running, 484 sleeping, 0 stopped, 0 zombie
%Cpu(s): 8.9 us, 0.0 sy, 0.0 ni, 91.0 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 26409392+total, 10683445+free, 15314609+used, 4113380 buff/cache
KiB Swap: 26830848+total, 26830848+free, 0 used. 10981197+avail Mem

  PID USER  PR   NI  VIRT     RES     SHR    S  %CPU  %MEM  TIME+     COMMAND
 7621 root  -51  0   11.789g  119608  17500  R  99.7  0.0   39:21.32  vpp_wk_0
 7622 root  -51  0   11.789g  119608  17500  R  99.7  0.0   39:21.32  vpp_wk_1
 7615 root  -51  0   11.789g  119608  17500  R  83.8  0.0   34:24.68  vpp_main

sudo strace -p 7615:

epoll_pwait(5, [], 256, 0, [], 8) = 0
epoll_pwait(5, [], 256, 0, [], 8) = 0
epoll_pwait(5, [], 256, 0, [], 8) = 0
epoll_pwait(5, [], 256, 0, [], 8) = 0
epoll_pwait(5, [], 256, 0, [], 8) = 0
epoll_pwait(5, [], 256, 0, [], 8) = 0
epoll_pwait(5, [], 256, 0, [], 8) = 0

Thread 0 vpp_main (lcore 16)
Time 2234.4, average vectors/node 1.07, last 128 main loops 0.00 per node 0.00
  vector rates in 0.0000e0, out 1.6111e-2, drop 2.2377e-2, punt 0.0000e0
Name                            State       Calls       Vectors  Suspends  Clocks   Vectors/Call
TenGigabitEthernet5/0/0-output  active      3           3        0         4.35e3   1.00
TenGigabitEthernet5/0/1-output  active      17          17       0         3.45e3   1.00
TenGigabitEthernet8/0/0-output  active      18          18       0         1.84e3   1.00
TenGigabitEthernet8/0/0-tx      active      18          18       0         7.47e3   1.00
TenGigabitEthernet8/0/1-output  active      18          18       0         1.62e3   1.00
TenGigabitEthernet8/0/1-tx      active      18          18       0         6.92e3   1.00
acl-plugin-fa-cleaner-process   event wait  0           0        1         7.96e3   0.00
admin-up-down-process           event wait  0           0        1         1.70e3   0.00
api-rx-from-ring                any wait    0           0        119       7.05e4   0.00
avf-process                     event wait  0           0        1         9.06e3   0.00
bfd-process                     event wait  0           0        1         5.34e3   0.00
cdp-process                     any wait    0           0        1         7.69e6   0.00
dhcp-client-process             any wait    0           0        23        7.64e3   0.00
dns-resolver-process            any wait    0           0        3         1.13e4   0.00
dpdk-ipsec-process              done        1           0        0         7.18e4   0.00
dpdk-process                    any wait    0           0        732       7.76e5   0.00
error-drop                      active      45          50       0         4.24e3   1.11
ethernet-input                  active      16          16       0         2.89e3   1.00
fib-walk                        any wait    0           0        1097      2.19e3   0.00
flow-report-process             any wait    0           0        1         1.21e3   0.00
flowprobe-timer-process         any wait    0           0        1         6.17e3   0.00
icmp6-router-advertisement      active      15          15       0         5.36e3   1.00
icmp6-router-solicitation       active      60          73       0         4.19e3   1.22
igmp-timer-process              event wait  0           0        1         8.86e3   0.00
ikev2-manager-process           any wait    0           0        2193      2.16e3   0.00
ioam-export-process             any wait    0           0        1         1.49e3   0.00
ip-route-resolver-process       any wait    0           0        23        4.47e3   0.00
ip4-reassembly-expire-walk      any wait    0           0        221       2.94e3   0.00
ip6-drop                        active      30          30       0         1.38e3   1.00
ip6-icmp-error                  active      1           1        0         2.09e3   1.00
ip6-icmp-input                  active      16          16       0         1.34e3   1.00
ip6-icmp-neighbor-discovery-ev  any wait    0           0        2193      2.24e3   0.00
ip6-input                       active      16          16       0         1.83e3   1.00
ip6-link-local                  active      1           1        0         1.50e3   1.00
ip6-local                       active      17          17       0         3.54e3   1.00
ip6-lookup                      active      3           3        0         5.49e3   1.00
ip6-mfib-forward-lookup         active      16          16       0         3.86e3   1.00
ip6-mfib-forward-rpf            active      16          16       0         3.35e3   1.00
ip6-reassembly-expire-walk      any wait    0           0        221       2.99e3   0.00
ip6-replicate                   active      16          16       0         2.72e3   1.00
ip6-rewrite                     active      1           1        0         5.67e3   1.00
ip6-rewrite-mcast               active      55          64       0         1.98e3   1.16
l2fib-mac-age-scanner-process   event wait  0           0        1         1.93e3   0.00
lacp-process                    event wait  0           0        1         1.02e7   0.00
lisp-retry-service              any wait    0           0        1097      2.96e3   0.00
lldp-process                    event wait  0           0        1         7.15e6   0.00
loop0-output                    active      16          16       0         1.81e3   1.00
loop0-tx                        active      16          0        0         1.56e3   0.00
memif-process                   event wait  0           0        1         1.28e4   0.00
nat-det-expire-walk             done        1           0        0         2.12e3   0.00
nat64-expire-walk               event wait  0           0        1         2.68e4   0.00
rd-cp-process                   any wait    0           0        93544417  2.71e2   0.00
send-garp-na-process            event wait  0           0        1         1.42e3   0.00
send-rs-process                 any wait    0           0        1         1.41e3   0.00
startup-config-process          done        1           0        1         1.60e9   0.00
udp-ping-local                  active      1           1        0         1.59e4   1.00
udp-ping-process                any wait    0           0        1         1.96e4   0.00
unix-cli-127.0.0.1:52318        active      0           0        24        1.09e8   0.00
unix-cli-stdin                  event wait  0           0        1         2.11e9   0.00
unix-epoll-input                polling     68445636    0        0         1.16e4   0.00
vhost-user-process              any wait    0           0        1         1.58e3   0.00
vhost-user-send-interrupt-proc  any wait    0           0        1         1.37e3   0.00
vpe-link-state-process          event wait  0           0        2         5.49e3   0.00
vpe-oam-process                 any wait    0           0        1076      3.17e3   0.00
vxlan-gpe-ioam-export-process   any wait    0           0        1         2.16e3   0.00
wildcard-ip4-arp-publisher-pro  event wait  0           0        1         1.87e3   0.00
---------------
Thread 1 vpp_wk_0 (lcore 4)
Time 2234.4, average vectors/node 1.00, last 128 main loops 0.00 per node 0.00
  vector rates in 1.7902e-3, out 0.0000e0, drop 1.7902e-3, punt 0.0000e0
Name                            State       Calls       Vectors  Suspends  Clocks   Vectors/Call
arp-input                       active      3           3        0         1.57e4   1.00
dpdk-input                      polling     7030810415  4        0         1.09e12  0.00
error-drop                      active      4           4        0         6.25e3   1.00
ethernet-input                  active      3           3        0         3.17e3   1.00
icmp6-neighbor-advertisement    active      1           1        0         1.35e4   1.00
ip6-drop                        active      1           1        0         5.52e2   1.00
ip6-icmp-input                  active      1           1        0         1.43e3   1.00
ip6-input                       active      1           1        0         2.27e3   1.00
ip6-local                       active      1           1        0         1.67e3   1.00
ip6-lookup                      active      1           1        0         3.63e3   1.00
unix-epoll-input                polling     1877343     0        0         5.71e-4  0.00
---------------
Thread 2 vpp_wk_1 (lcore 20)
Time 2234.4, average vectors/node 1.00, last 128 main loops 0.00 per node 0.00
  vector rates in 8.9508e-4, out 4.4754e-4, drop 4.4754e-4, punt 0.0000e0
Name                            State       Calls       Vectors  Suspends  Clocks   Vectors/Call
TenGigabitEthernet8/0/0-output  active      1           1        0         3.00e3   1.00
TenGigabitEthernet8/0/0-tx      active      1           1        0         3.16e3   1.00
dpdk-input                      polling     7017175581  2        0         2.17e12  0.00
error-drop                      active      1           1        0         1.23e4   1.00
icmp6-neighbor-advertisement    active      1           1        0         1.45e4   1.00
icmp6-neighbor-solicitation     active      1           1        0         1.22e4   1.00
interface-output                active      1           1        0         5.67e3   1.00
ip6-drop                        active      1           1        0         1.75e3   1.00
ip6-icmp-input                  active      2           2        0         2.84e3   1.00
ip6-input                       active      2           2        0         5.28e3   1.00
ip6-local                       active      2           2        0         4.12e3   1.00
ip6-lookup                      active      2           2        0         4.60e3   1.00
unix-epoll-input                polling     1877343     0        0         2.29e3   0.00

Thanks,
Steve

On 09/18/2018 05:06 PM, Dave Barach (dbarach) wrote:

"show run" please...

-----Original Message-----
From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of sheckman
Sent: Tuesday, September 18, 2018 4:58 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Increase in main core CPU usage between 17.10 and 18.04

I've been seeing a dramatic increase in CPU usage by the vpp_main task. It used to be around 0.3%. Now it's around 85%. I've pored over the release notes and design docs, but haven't found an explanation for this. Why the increase?

Thanks,
Steve Heckman
Principal Software Engineer
Arris Group
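
The strace output quoted above shows epoll_pwait being called with a timeout of 0, so each call returns immediately and the thread spins. The sketch below illustrates that general effect on Linux and is not VPP code: the function count_polls, the idle pipe, and the 1 ms timeout are assumptions made for the illustration. Python's epoll timeout has millisecond granularity, so 1 ms stands in here for the 100 usec delay mentioned in the thread.

```python
import os
import select
import time


def count_polls(timeout_s, duration_s=0.2):
    """Count how often an idle epoll loop returns within duration_s seconds.

    Illustrative only (Linux): the watched fd is an idle pipe, so every
    poll call comes back with no events.
    """
    read_fd, write_fd = os.pipe()  # nothing is ever written to this pipe
    ep = select.epoll()
    ep.register(read_fd, select.EPOLLIN)
    polls = 0
    deadline = time.monotonic() + duration_s
    try:
        while time.monotonic() < deadline:
            # timeout 0 returns at once; a positive timeout sleeps in the kernel
            ep.poll(timeout_s)
            polls += 1
    finally:
        ep.close()
        os.close(read_fd)
        os.close(write_fd)
    return polls


if __name__ == "__main__":
    busy = count_polls(0)
    waited = count_polls(0.001)
    print(f"timeout 0:    {busy} polls")
    print(f"timeout 1 ms: {waited} polls")
```

With a zero timeout the loop iterates as fast as the CPU allows, which is the busy-polling pattern visible in the strace. With a small positive timeout the thread sleeps between iterations, at the cost of added latency before it notices new events.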