Thanks, Damjan, for the hints. The second point you mentioned looks 
interesting. I did see drops on the rx queue, especially with the “show 
hardware” command.

I was able to see a significant performance improvement with multiple worker 
threads, each pinned to a dedicated logical core. There are 4 worker threads, 
each polling one port: two poll the 10G ports and two poll the 1G ports. There 
is one Rx queue configured per port, and the 1G ports are admin down. Below are 
the rx-placement and the thread placement. With this config, I am able to 
send/receive 95% of 10G traffic (1500-byte frames) in both directions without 
any drops. However, when I admin-enable the 1G ports, “rx miss” and “tx-error” 
drops start to appear on the 10G ports, and I had to bring the traffic down to 
85% of 10G to see no drops. Any thoughts on why this could be happening? There 
is a separate worker thread polling each 1G port queue and there is no traffic 
on the 1G ports, so I am looking to understand how the two are related.

vpp# show interface rx-placement
Thread 1 (vpp_wk_0):
  node dpdk-input:
    TenGigabitEthernet3/0/0 queue 0 (polling)
Thread 2 (vpp_wk_1):
  node dpdk-input:
    TenGigabitEthernet3/0/1 queue 0 (polling)
Thread 3 (vpp_wk_2):
  node dpdk-input:
    GigabitEthernet5/0/0 queue 0 (polling)
Thread 4 (vpp_wk_3):
  node dpdk-input:
    GigabitEthernet5/0/1 queue 0 (polling)

vpp# show threads
ID     Name                Type        LWP     Sched Policy (Priority)  lcore  Core   Socket State
0      vpp_main                        1454    other (0)                2      2      0
1      vpp_wk_0            workers     1458    other (0)                10     2      0
2      vpp_wk_1            workers     1459    other (0)                11     3      0
3      vpp_wk_2            workers     1460    other (0)                12     4      0
4      vpp_wk_3            workers     1461    other (0)                13     5      0
5                          stats       1462    other (0)                0      0      0

Thanks,
Vijay

From: Damjan Marion <dmar...@me.com>
Date: Tuesday, March 26, 2019 at 2:30 AM
To: "Chandra Mohan, Vijay Mohan" <vijch...@ciena.com>
Cc: "vpp-dev@lists.fd.io" <vpp-dev@lists.fd.io>
Subject: [**EXTERNAL**] Re: [vpp-dev] VPP Performance question


A few hints:

1. When you observe “show run” statistics, always do the following:

   1. start traffic
   2. clear run
   3. wait a bit
   4. show run

Otherwise the statistics will show an average that includes the period 
without traffic.


2. Debug CLI commands typically cause a barrier sync (unless the handler is 
explicitly marked as thread safe), and that can stop worker threads for more 
than 500 usec. In such situations it is normal and expected that you will 
observe a small number of rx tail drops, simply because the worker is not 
servicing a specific NIC queue for a significant amount of time.
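
To put the 500 usec figure in perspective, here is a rough back-of-the-envelope 
sketch in Python. It estimates how many frames arrive on one 10G port while a 
worker is stalled, and how long a stall an rx ring could absorb. The 20 bytes 
of per-frame wire overhead, and the reading of "1500 byte frame" as the full 
Ethernet frame, are assumptions of mine; the ring depth of 1024 and the load 
levels are the ones mentioned in this thread. It ignores that the ring is 
rarely empty when the stall begins, so treat the numbers as an upper bound on 
the headroom, not a measurement.

```python
# Assumption (not from the thread): each frame costs 20 extra bytes on the
# wire (8-byte preamble/SFD + 12-byte inter-frame gap), and the 1500-byte
# frame size already includes the Ethernet header and FCS.
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(link_bps: float, frame_bytes: int, load: float = 1.0) -> float:
    """Frames per second on one port at the given fraction of line rate."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return load * link_bps / bits_per_frame


def packets_during_stall(pps: float, stall_usec: float) -> float:
    """Frames that arrive while the worker is not polling the queue."""
    return pps * stall_usec * 1e-6


def ring_fill_time_usec(pps: float, ring_depth: int) -> float:
    """Time for an initially empty rx ring to fill if nobody drains it."""
    return ring_depth / pps * 1e6


if __name__ == "__main__":
    for load in (0.70, 0.75, 0.85, 0.95):
        pps = packets_per_second(10e9, 1500, load)
        print(
            f"load {load:.0%}: {pps:,.0f} pps per port, "
            f"{packets_during_stall(pps, 500):.0f} frames in a 500 usec stall, "
            f"1024-entry ring fills in {ring_fill_time_usec(pps, 1024):.0f} usec"
        )
```

If a port's traffic is spread over several rx queues, each ring sees only its 
share of the packet rate, so the per-ring fill time would be correspondingly 
longer.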



On 26 Mar 2019, at 04:04, Chandra Mohan, Vijay Mohan <vijch...@ciena.com> 
wrote:

Hi Everyone,

I am working on measuring the performance with a xconnect of two 
sub-interfaces. I did see quite a few performance related questions & answers 
in the community which were very helpful to get to this point. However, I’m 
still facing rx and tx queue drops (“rx misses” and “tx-error”).

Here is the config :
l2 xconnect TenGigabitEthernet3/0/0.1 TenGigabitEthernet3/0/1.1
l2 xconnect TenGigabitEthernet3/0/1.1 TenGigabitEthernet3/0/0.1

I’m passing traffic at 70% of the line rate (10G) in both directions and I do 
not see any drops. I ran the traffic for 30 minutes with no drops. Below are 
the runtime stats. I have CPU affinity in place and “vpp_wk_0” is on dedicated 
logical core 9. However, I see that “vectors/node” is 26.03; I was expecting 
to see 255.99. Is that something that can be seen only with high bursts of 
traffic? I may be missing something here and am looking to understand what 
that may be.
Thread 1 vpp_wk_0 (lcore 9)
Time 531.8, average vectors/node 26.03, last 128 main loops 1.31 per node 21.00
  vector rates in 1.1539e6, out 1.1539e6, drop 0.0000e0, punt 0.0000e0
             Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
TenGigabitEthernet3/0/0-output   active           18857661       306865019               0          1.54e2           16.27
TenGigabitEthernet3/0/0-tx       active           18857661       306865019               0          2.62e2           16.27
TenGigabitEthernet3/0/1-output   active           18857661       306865172               0          1.63e2           16.27
TenGigabitEthernet3/0/1-tx       active           18857661       306865172               0          2.67e2           16.27
dpdk-input                       polling          18857661       613730191               0          4.48e2           32.55
ethernet-input                   active           18864470       613730191               0          6.92e2           32.53
l2-input                         active           18864470       613730191               0          1.15e2           32.53
l2-output                        active           18864470       613730191               0          1.31e2           32.53
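
As a sanity check on how to read the Vectors/Call column, the short Python 
sketch below recomputes it as Vectors divided by Calls from the rows above. 
The Calls and Vectors values are copied from this output. The figure of 256 
for the maximum vector size is an assumption on my part, suggested by the 
255.99 expectation mentioned earlier. This only covers the per-node column; 
it does not try to reproduce the "average vectors/node" summary line, whose 
exact formula is not shown in this thread.

```python
# (node name, calls, vectors) copied from the "show run" output above.
ROWS = [
    ("TenGigabitEthernet3/0/0-output", 18857661, 306865019),
    ("TenGigabitEthernet3/0/0-tx", 18857661, 306865019),
    ("TenGigabitEthernet3/0/1-output", 18857661, 306865172),
    ("TenGigabitEthernet3/0/1-tx", 18857661, 306865172),
    ("dpdk-input", 18857661, 613730191),
    ("ethernet-input", 18864470, 613730191),
    ("l2-input", 18864470, 613730191),
    ("l2-output", 18864470, 613730191),
]

# Assumption: the largest vector a node can be handed in one call.
MAX_VECTOR_SIZE = 256


def vectors_per_call(calls: int, vectors: int) -> float:
    """Average number of packets handled per invocation of a node."""
    return vectors / calls if calls else 0.0


if __name__ == "__main__":
    for name, calls, vectors in ROWS:
        vpc = vectors_per_call(calls, vectors)
        print(f"{name:32s} {vpc:8.2f}  ({vpc / MAX_VECTOR_SIZE:.1%} of a full vector)")
```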

There are two rx-queues and two tx-queues assigned to each of the 10 Gig ports. 
Queue depth is 1024. Following is the queue placement:
Thread 1 (vpp_wk_0):
  node dpdk-input:
    TenGigabitEthernet3/0/0 queue 0 (polling)
    TenGigabitEthernet3/0/0 queue 1 (polling)
    TenGigabitEthernet3/0/1 queue 0 (polling)
    TenGigabitEthernet3/0/1 queue 1 (polling)

Now, when I increase the rate to 75% of 10G, I see drops due to “rx-miss”:
DBGvpp# sho int
              Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter          Count
TenGigabitEthernet3/0/0           1      up          9000/0/0/0     rx packets              26235935
                                                                    rx bytes             39248958760
                                                                    tx packets              26236104
                                                                    tx bytes             39249211584
                                                                    rx-miss                      697
TenGigabitEthernet3/0/0.1         3      up           0/0/0/0       rx packets              26235935
                                                                    rx bytes             39248958760
                                                                    tx packets              26236104
                                                                    tx bytes             39249211584
TenGigabitEthernet3/0/1           2      up          9000/0/0/0     rx packets              26236104
                                                                    rx bytes             39249211584
                                                                    tx packets              26235935
                                                                    tx bytes             39248958760
                                                                    rx-miss                      711
TenGigabitEthernet3/0/1.1         4      up           0/0/0/0       rx packets              26236104
                                                                    rx bytes             39249211584
                                                                    tx packets              26235935
                                                                    tx bytes             39248958760
local0                            0     down          0/0/0/0

Here are the runtime stats when that happens:
Thread 1 vpp_wk_0 (lcore 9)
Time 59.0, average vectors/node 34.58, last 128 main loops 1.69 per node 27.00
  vector rates in 1.2365e6, out 1.2365e6, drop 0.0000e0, punt 0.0000e0
             Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
TenGigabitEthernet3/0/0-output   active            1682608        36482575               0          1.33e2           21.68
TenGigabitEthernet3/0/0-tx       active            1682608        36482575               0          2.48e2           21.68
TenGigabitEthernet3/0/1-output   active            1682608        36482560               0          1.42e2           21.68
TenGigabitEthernet3/0/1-tx       active            1682608        36482560               0          2.53e2           21.68
dpdk-input                       polling           1682608        72965135               0          4.11e2           43.36
ethernet-input                   active            1691495        72965135               0          6.77e2           43.14
l2-input                         active            1691495        72965135               0          1.08e2           43.14
l2-output                        active            1691495        72965135               0          1.07e2           43.14
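
One rough way to judge how busy this worker is would be to add up the clock 
cycles these nodes account for per second, as in the Python sketch below. It 
assumes that the Clocks column is the average number of clocks per packet for 
each node, which is my reading of the output and is not confirmed in this 
thread. The Vectors and Clocks values and the 59.0 second interval are copied 
from the output above. The CPU frequency is not stated anywhere in this 
thread, so `cpu_hz` is a hypothetical parameter to be filled in for the actual 
machine.

```python
# Measurement interval in seconds, from the "Time 59.0" line above.
ELAPSED_SECONDS = 59.0

# (node name, vectors, clocks) copied from the "show run" output above.
# Assumption: "clocks" is the average cost in CPU clocks per packet.
ROWS = [
    ("TenGigabitEthernet3/0/0-output", 36482575, 1.33e2),
    ("TenGigabitEthernet3/0/0-tx", 36482575, 2.48e2),
    ("TenGigabitEthernet3/0/1-output", 36482560, 1.42e2),
    ("TenGigabitEthernet3/0/1-tx", 36482560, 2.53e2),
    ("dpdk-input", 72965135, 4.11e2),
    ("ethernet-input", 72965135, 6.77e2),
    ("l2-input", 72965135, 1.08e2),
    ("l2-output", 72965135, 1.07e2),
]


def clocks_per_second(rows, elapsed_seconds: float) -> float:
    """Total clocks attributed to the listed nodes, per second of wall time."""
    return sum(vectors * clocks for _, vectors, clocks in rows) / elapsed_seconds


def worker_utilization(rows, elapsed_seconds: float, cpu_hz: float) -> float:
    """Fraction of one core's cycles attributed to the listed nodes."""
    return clocks_per_second(rows, elapsed_seconds) / cpu_hz


if __name__ == "__main__":
    print(f"{clocks_per_second(ROWS, ELAPSED_SECONDS):.3e} clocks/sec")
    # cpu_hz below is a placeholder, not a value taken from this thread.
    print(f"{worker_utilization(ROWS, ELAPSED_SECONDS, cpu_hz=2.5e9):.0%} of a 2.5 GHz core")
```

With these figures the total comes out on the order of 2e9 clocks per second, 
so how much headroom remains depends heavily on the core's actual clock 
frequency. Per-packet cost also tends to change with vector size, so this is 
only a snapshot at this load.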

Would increasing the number of cores and threads be of any help? Or, given 
that vectors/node is 34.58, does it mean there is still room to process more 
frames?

Also, there are two Rx queues configured. Is there a command to check whether 
they are equally serviced? I am looking to understand how the load is 
distributed over the two rx queues and two tx queues.

Any help to determine why this drop might be happening will be great.


Thanks,
Vijay
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#12635): https://lists.fd.io/g/vpp-dev/message/12635
Mute This Topic: https://lists.fd.io/mt/30778968/675642
Group Owner: vpp-dev+ow...@lists.fd.io<mailto:vpp-dev+ow...@lists.fd.io>
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  
[dmar...@me.com<mailto:dmar...@me.com>]
-=-=-=-=-=-=-=-=-=-=-=-
