Thanks, Damjan, for the hints. The second point you mentioned looks interesting. I did see drops on the rx queue, especially with the "show hardware" command.
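(A minimal way to bound those drops to a known interval, assuming "clear hardware-interfaces" is available in this build, is to reset the counters and re-read them after a fixed run; this is a sketch of the workflow, not captured output:)

  vpp# clear hardware-interfaces
  (run traffic for a fixed interval)
  vpp# show hardware-interfaces TenGigabitEthernet3/0/0

Any rx-miss increment seen over that interval corresponds to packets the NIC dropped because its rx ring was not drained in time.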
I was able to see a significant performance improvement with multiple worker threads, each pinned to a dedicated logical core. There are 4 worker threads, each polling the 10G and 1G ports, with one Rx queue configured per port. The 1G ports are admin down. Below are the rx-placement and thread placement. With this config I am able to send/receive 95% of 10G traffic (1500-byte frames) in both directions without any drops. However, when I admin-enable the 1G ports, "rx miss" and "tx-error" drops start to appear on the 10G ports, and I had to bring the traffic down to 85% of 10G to see no drops. Any thoughts on why this could be happening? A separate worker thread polls the 1G port queues and there is no traffic on the 1G ports, so I am looking to understand the relationship.

vpp# show interface rx-placement
Thread 1 (vpp_wk_0):
  node dpdk-input:
    TenGigabitEthernet3/0/0 queue 0 (polling)
Thread 2 (vpp_wk_1):
  node dpdk-input:
    TenGigabitEthernet3/0/1 queue 0 (polling)
Thread 3 (vpp_wk_2):
  node dpdk-input:
    GigabitEthernet5/0/0 queue 0 (polling)
Thread 4 (vpp_wk_3):
  node dpdk-input:
    GigabitEthernet5/0/1 queue 0 (polling)

vpp# show threads
ID  Name      Type     LWP   Sched Policy (Priority)  lcore  Core  Socket  State
0   vpp_main           1454  other (0)                2      2     0
1   vpp_wk_0  workers  1458  other (0)                10     2     0
2   vpp_wk_1  workers  1459  other (0)                11     3     0
3   vpp_wk_2  workers  1460  other (0)                12     4     0
4   vpp_wk_3  workers  1461  other (0)                13     5     0
5   stats              1462  other (0)                0      0     0

Thanks,
Vijay

From: Damjan Marion <dmar...@me.com>
Date: Tuesday, March 26, 2019 at 2:30 AM
To: "Chandra Mohan, Vijay Mohan" <vijch...@ciena.com>
Cc: "vpp-dev@lists.fd.io" <vpp-dev@lists.fd.io>
Subject: [**EXTERNAL**] Re: [vpp-dev] VPP Performance question

Few hints:

1. When you look at "show run" statistics, always do:
   1. start traffic
   2. clear run
   3. wait a bit
   4. show run
   Otherwise the statistics show an average that includes the period without traffic.

2. Debug CLI commands typically cause a barrier sync (unless the handler is explicitly marked thread safe), and that can stop worker threads for more than 500 usec. In such situations it is normal and expected to observe a small amount of rx tail drops, simply because the worker is not servicing a specific NIC queue for a significant amount of time.
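(Expressed as CLI, hint 1 amounts to roughly the following sequence, with "clear run" and "show run" being the usual abbreviations of "clear runtime" and "show runtime"; treat it as a sketch of the workflow rather than literal output:)

  (start traffic on the tester)
  vpp# clear run
  (wait a few seconds with traffic still running)
  vpp# show run

That way the Vectors/Call and Clocks columns cover only the window with traffic, and, per hint 2, no other debug CLI is issued inside that window, so the workers are not stalled by a barrier sync while measuring.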
On 26 Mar 2019, at 04:04, Chandra Mohan, Vijay Mohan <vijch...@ciena.com> wrote:

Hi Everyone,

I am working on measuring the performance of an xconnect between two sub-interfaces. I did see quite a few performance-related questions and answers in the community that were very helpful in getting to this point. However, I am still facing rx and tx queue drops ("rx miss" and "tx-error"). Here is the config:

l2 xconnect TenGigabitEthernet3/0/0.1 TenGigabitEthernet3/0/1.1
l2 xconnect TenGigabitEthernet3/0/1.1 TenGigabitEthernet3/0/0.1

I am passing traffic at 70% of line rate (10G) in both directions and I do not see any drops; I ran the traffic for 30 minutes with no drops. Below are the runtime stats. CPU affinity is in place and "vpp_wk_0" is on dedicated logical core 9. However, I see that the average vectors/node is 26.03, whereas I was expecting 255.99. Is that something that can be seen only with a high burst of traffic? I may be missing something here and am looking to understand what that may be.

Thread 1 vpp_wk_0 (lcore 9)
Time 531.8, average vectors/node 26.03, last 128 main loops 1.31 per node 21.00
  vector rates in 1.1539e6, out 1.1539e6, drop 0.0000e0, punt 0.0000e0
             Name                    State      Calls       Vectors    Suspends   Clocks   Vectors/Call
TenGigabitEthernet3/0/0-output      active    18857661    306865019       0       1.54e2      16.27
TenGigabitEthernet3/0/0-tx          active    18857661    306865019       0       2.62e2      16.27
TenGigabitEthernet3/0/1-output      active    18857661    306865172       0       1.63e2      16.27
TenGigabitEthernet3/0/1-tx          active    18857661    306865172       0       2.67e2      16.27
dpdk-input                          polling   18857661    613730191       0       4.48e2      32.55
ethernet-input                      active    18864470    613730191       0       6.92e2      32.53
l2-input                            active    18864470    613730191       0       1.15e2      32.53
l2-output                           active    18864470    613730191       0       1.31e2      32.53

There are two rx-queues and two tx-queues assigned to each of the 10 Gig ports, with a queue depth of 1024. Following is the queue placement:

Thread 1 (vpp_wk_0):
  node dpdk-input:
    TenGigabitEthernet3/0/0 queue 0 (polling)
    TenGigabitEthernet3/0/0 queue 1 (polling)
    TenGigabitEthernet3/0/1 queue 0 (polling)
    TenGigabitEthernet3/0/1 queue 1 (polling)

Now, when I increase the rate to 75% of 10G, I see drops due to "rx-miss":

DBGvpp# sho int
              Name             Idx   State   MTU (L3/IP4/IP6/MPLS)    Counter          Count
TenGigabitEthernet3/0/0         1     up         9000/0/0/0          rx packets        26235935
                                                                     rx bytes       39248958760
                                                                     tx packets        26236104
                                                                     tx bytes       39249211584
                                                                     rx-miss                697
TenGigabitEthernet3/0/0.1       3     up            0/0/0/0          rx packets        26235935
                                                                     rx bytes       39248958760
                                                                     tx packets        26236104
                                                                     tx bytes       39249211584
TenGigabitEthernet3/0/1         2     up         9000/0/0/0          rx packets        26236104
                                                                     rx bytes       39249211584
                                                                     tx packets        26235935
                                                                     tx bytes       39248958760
                                                                     rx-miss                711
TenGigabitEthernet3/0/1.1       4     up            0/0/0/0          rx packets        26236104
                                                                     rx bytes       39249211584
                                                                     tx packets        26235935
                                                                     tx bytes       39248958760
local0                          0    down           0/0/0/0

Here are the runtime stats when that happens:

Thread 1 vpp_wk_0 (lcore 9)
Time 59.0, average vectors/node 34.58, last 128 main loops 1.69 per node 27.00
  vector rates in 1.2365e6, out 1.2365e6, drop 0.0000e0, punt 0.0000e0
             Name                    State      Calls       Vectors    Suspends   Clocks   Vectors/Call
TenGigabitEthernet3/0/0-output      active     1682608     36482575       0       1.33e2      21.68
TenGigabitEthernet3/0/0-tx          active     1682608     36482575       0       2.48e2      21.68
TenGigabitEthernet3/0/1-output      active     1682608     36482560       0       1.42e2      21.68
TenGigabitEthernet3/0/1-tx          active     1682608     36482560       0       2.53e2      21.68
dpdk-input                          polling    1682608     72965135       0       4.11e2      43.36
ethernet-input                      active     1691495     72965135       0       6.77e2      43.14
l2-input                            active     1691495     72965135       0       1.08e2      43.14
l2-output                           active     1691495     72965135       0       1.07e2      43.14

Would increasing the number of cores or worker threads be of any help? Or, given that vectors/node is 34.58, does it mean there is still room to process more frames? Also, there are two Rx queues configured. Is there a command to check whether they are equally serviced? I am looking to understand how the load is distributed over the two rx-queues and two tx-queues (see the configuration sketch after this message). Any help in determining why this drop might be happening would be great.

Thanks,
Vijay
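(For reference, a sketch of how two rx/tx queues per port and 1024-descriptor rings are typically declared in startup.conf, and how an individual queue can be moved to another worker from the CLI. The PCI address below is a placeholder, and the exact parameter names should be checked against the startup.conf documentation for the VPP version in use:)

dpdk {
  dev 0000:03:00.0 {
    num-rx-queues 2
    num-tx-queues 2
    num-rx-desc 1024
    num-tx-desc 1024
  }
}

vpp# set interface rx-placement TenGigabitEthernet3/0/0 queue 1 worker 1

With more than one worker available, moving queue 1 of each port to a second worker is one way to spread the rx load; whether the two queues actually receive comparable traffic also depends on how the NIC's RSS hash distributes the incoming flows.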