Hi! I'm doing a throughput test of OVS 2.4.1 and I'm seeing abnormally high latency variation when sending traffic via a physical port connected to NUMA node 1 while the OVS control plane is running on NUMA node 0.
The problem is that when a PMD thread running on NUMA node 0 reads a packet from the vhost queue and transmits it to the Tx queue of a physical port located on NUMA node 1, the packet gets stuck there until the next packet arrives. In the code, a PMD thread running on NUMA node 0 reads a packet from the vhost queue and calls dpdk_queue_pkts to queue it in struct dpdk_tx_queue for the physical port located on NUMA node 1. In this test dev->tx_q[i].flush_tx is false, which means that dpdk_queue_flush__ is not called (except if the Tx queue gets full, i.e. reaches MAX_TX_QUEUE_LEN packets, but that is not of interest here). For the queues on NUMA node 1 no other trigger mechanism for a flush is present, so the packet gets stuck in struct dpdk_tx_queue until the next packet arrives. What I think happens in my test is that the last packets I send don't fill up the Tx queue, so no flush of it happens. The flush only happens when I send an additional packet; by then the DRAIN_TSC interval has passed, which forces a call to dpdk_queue_flush__.

This behavior is not seen when sending packets to a physical port connected to NUMA node 0, where the OVS control plane is running, because dev->tx_q[i].flush_tx is true in that situation. The behavior seems to be controlled by how the Tx queue is configured in relation to the CPU id the PMD thread is executing on.

Below is code from a function in netdev-dpdk.c (it looks similar in the latest versions on the master and 2.4.1 branches). Why isn't dev->tx_q[i].flush_tx always enabled (set to true)? If I change the code and always enable it, the problem seems to disappear (the exact change I tried is sketched at the end of this mail). I've attached some logs describing the CPU layout, where my different PMD threads execute, and how the Tx queues for my two physical ports get configured. I would very much appreciate some help in understanding how this is intended to work.

Thanks in advance!

BR,
Mattias
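For reference, here is a small self-contained toy model of the gating behaviour I describe above. This is my own code and naming (toy_* is mine, not from OVS); it only illustrates that with flush_tx false, buffered packets are pushed out only when the *next* enqueue happens and either the queue is full or DRAIN_TSC has expired:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_MAX_TX_QUEUE_LEN 64        /* stands in for MAX_TX_QUEUE_LEN */
#define TOY_DRAIN_TSC        200000ULL /* stands in for DRAIN_TSC, arbitrary value */

struct toy_tx_queue {
    bool flush_tx;           /* flush after every enqueue? */
    int count;               /* packets currently buffered */
    uint64_t last_flush_tsc; /* timestamp of the last flush */
};

static void
toy_flush(struct toy_tx_queue *txq, uint64_t now)
{
    if (txq->count) {
        printf("flushing %d packet(s) at tsc %llu\n",
               txq->count, (unsigned long long) now);
        txq->count = 0;
    }
    txq->last_flush_tsc = now;
}

static void
toy_enqueue(struct toy_tx_queue *txq, int n_pkts, uint64_t now)
{
    txq->count += n_pkts;

    if (txq->flush_tx || txq->count >= TOY_MAX_TX_QUEUE_LEN) {
        /* Queue is configured for immediate flush, or it filled up. */
        toy_flush(txq, now);
    } else if (now - txq->last_flush_tsc >= TOY_DRAIN_TSC) {
        /* Drain-timer path: only evaluated when another enqueue happens,
         * so the tail of a burst can sit buffered until further traffic
         * arrives. */
        toy_flush(txq, now);
    }
}

int
main(void)
{
    /* Models a txq on the remote NUMA node, where flush_tx ends up false. */
    struct toy_tx_queue remote = { .flush_tx = false };

    toy_enqueue(&remote, 2, 100);     /* buffered, no flush yet */
    toy_enqueue(&remote, 1, 500000);  /* next packet much later: only now flushed */
    return 0;
}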
Unmodified code snippet from: https://github.com/openvswitch/ovs/blob/master/lib/netdev-dpdk.c (362ca39)

static void
netdev_dpdk_alloc_txq(struct netdev_dpdk *dev, unsigned int n_txqs)
{
    unsigned i;

    dev->tx_q = dpdk_rte_mzalloc(n_txqs * sizeof *dev->tx_q);
    for (i = 0; i < n_txqs; i++) {
        int numa_id = ovs_numa_get_numa_id(i);

        if (!dev->txq_needs_locking) {
            /* Each index is considered as a cpu core id, since there should
             * be one tx queue for each cpu core. If the corresponding core
             * is not on the same numa node as 'dev', flags the
             * 'flush_tx'. */
            dev->tx_q[i].flush_tx = dev->socket_id == numa_id;
        } else {
            /* Queues are shared among CPUs. Always flush */
            dev->tx_q[i].flush_tx = true;
        }

        /* Initialize map for vhost devices. */
        dev->tx_q[i].map = OVS_VHOST_QUEUE_MAP_UNKNOWN;
        rte_spinlock_init(&dev->tx_q[i].tx_lock);
    }
}

Traffic tool output:

burst size:          2
frequency:           100000 burst/s
payload size:        30 B
frame size:          64 B
length of test:      2 s
tx:                  399730
rx:                  399730
bits received:       0 Gb
elapsed:             4 s
tx lost:             0
rx lost:             0
out of order:        0
corrupt:             0
noise:               0
min:                 14 us
max:                 1994061 us
mean latency:        95.254 us
standard deviation:  8344.242 us

Latencies (us):
      0 <= x <    50    158747
     50 <= x <   100    200266
    100 <= x <   150     40552
    150 <= x <   200        32
    200 <= x <   250        30
    250 <= x <   300        20
    300 <= x <   350        20
    350 <= x <   400        22
    400 <= x <   450        16
    450 <= x <   500        10
    500 <= x <   550         8
  50000 <= x                  7

OVS threads:

Name          CPU
pmd128        2
pmd129        39
pmd127        22
pmd126        19
ovs-vswitchd  0

cpu_layout.py:

============================================================
Core and Socket Information (as reported by '/proc/cpuinfo')
============================================================

cores =  [0, 1, 2, 3, 4, 8, 9, 10, 11, 12]
sockets =  [0, 1]

         Socket 0    Socket 1
         --------    --------
Core 0   [0, 20]     [1, 21]
Core 1   [2, 22]     [3, 23]
Core 2   [4, 24]     [5, 25]
Core 3   [6, 26]     [7, 27]
Core 4   [8, 28]     [9, 29]
Core 8   [10, 30]    [11, 31]
Core 9   [12, 32]    [13, 33]
Core 10  [14, 34]    [15, 35]
Core 11  [16, 36]    [17, 37]
Core 12  [18, 38]    [19, 39]

How Tx flush is configured for dpdk0 on NUMA node 0:

2016-04-19T08:34:36.389Z|00024|dpdk|INFO|Enable flush for tx queue 0
2016-04-19T08:34:36.389Z|00025|dpdk|INFO|Disable flush for tx queue 1
2016-04-19T08:34:36.389Z|00026|dpdk|INFO|Enable flush for tx queue 2
2016-04-19T08:34:36.389Z|00027|dpdk|INFO|Disable flush for tx queue 3
2016-04-19T08:34:36.389Z|00028|dpdk|INFO|Enable flush for tx queue 4
2016-04-19T08:34:36.389Z|00029|dpdk|INFO|Disable flush for tx queue 5
2016-04-19T08:34:36.389Z|00030|dpdk|INFO|Enable flush for tx queue 6
2016-04-19T08:34:36.389Z|00031|dpdk|INFO|Disable flush for tx queue 7
2016-04-19T08:34:36.389Z|00032|dpdk|INFO|Enable flush for tx queue 8
2016-04-19T08:34:36.389Z|00033|dpdk|INFO|Disable flush for tx queue 9
2016-04-19T08:34:36.389Z|00034|dpdk|INFO|Enable flush for tx queue 10
2016-04-19T08:34:36.389Z|00035|dpdk|INFO|Disable flush for tx queue 11
2016-04-19T08:34:36.389Z|00036|dpdk|INFO|Enable flush for tx queue 12
2016-04-19T08:34:36.389Z|00037|dpdk|INFO|Disable flush for tx queue 13
2016-04-19T08:34:36.389Z|00038|dpdk|INFO|Enable flush for tx queue 14
2016-04-19T08:34:36.389Z|00039|dpdk|INFO|Disable flush for tx queue 15
2016-04-19T08:34:36.389Z|00040|dpdk|INFO|Enable flush for tx queue 16
2016-04-19T08:34:36.389Z|00041|dpdk|INFO|Disable flush for tx queue 17
2016-04-19T08:34:36.389Z|00042|dpdk|INFO|Enable flush for tx queue 18
2016-04-19T08:34:36.389Z|00043|dpdk|INFO|Disable flush for tx queue 19
2016-04-19T08:34:36.389Z|00044|dpdk|INFO|Enable flush for tx queue 20
2016-04-19T08:34:36.389Z|00045|dpdk|INFO|Disable flush for tx queue 21
2016-04-19T08:34:36.389Z|00046|dpdk|INFO|Enable flush for tx queue 22
2016-04-19T08:34:36.389Z|00047|dpdk|INFO|Disable flush for tx queue 23
2016-04-19T08:34:36.389Z|00048|dpdk|INFO|Enable flush for tx queue 24
2016-04-19T08:34:36.389Z|00049|dpdk|INFO|Disable flush for tx queue 25
2016-04-19T08:34:36.389Z|00050|dpdk|INFO|Enable flush for tx queue 26
2016-04-19T08:34:36.389Z|00051|dpdk|INFO|Disable flush for tx queue 27
2016-04-19T08:34:36.389Z|00052|dpdk|INFO|Enable flush for tx queue 28
2016-04-19T08:34:36.389Z|00053|dpdk|INFO|Disable flush for tx queue 29
2016-04-19T08:34:36.389Z|00054|dpdk|INFO|Enable flush for tx queue 30
2016-04-19T08:34:36.389Z|00055|dpdk|INFO|Disable flush for tx queue 31
2016-04-19T08:34:36.389Z|00056|dpdk|INFO|Enable flush for tx queue 32
2016-04-19T08:34:36.389Z|00057|dpdk|INFO|Disable flush for tx queue 33
2016-04-19T08:34:36.389Z|00058|dpdk|INFO|Enable flush for tx queue 34
2016-04-19T08:34:36.389Z|00059|dpdk|INFO|Disable flush for tx queue 35
2016-04-19T08:34:36.389Z|00060|dpdk|INFO|Enable flush for tx queue 36
2016-04-19T08:34:36.389Z|00061|dpdk|INFO|Disable flush for tx queue 37
2016-04-19T08:34:36.389Z|00062|dpdk|INFO|Enable flush for tx queue 38
2016-04-19T08:34:36.389Z|00063|dpdk|INFO|Disable flush for tx queue 39
2016-04-19T08:34:36.389Z|00064|dpdk|INFO|Disable flush for tx queue 40

How Tx flush is configured for dpdk1 on NUMA node 1:

2016-04-19T08:34:37.495Z|00069|dpdk|INFO|Disable flush for tx queue 0
2016-04-19T08:34:37.495Z|00070|dpdk|INFO|Enable flush for tx queue 1
2016-04-19T08:34:37.495Z|00071|dpdk|INFO|Disable flush for tx queue 2
2016-04-19T08:34:37.495Z|00072|dpdk|INFO|Enable flush for tx queue 3
2016-04-19T08:34:37.495Z|00073|dpdk|INFO|Disable flush for tx queue 4
2016-04-19T08:34:37.495Z|00074|dpdk|INFO|Enable flush for tx queue 5
2016-04-19T08:34:37.495Z|00075|dpdk|INFO|Disable flush for tx queue 6
2016-04-19T08:34:37.495Z|00076|dpdk|INFO|Enable flush for tx queue 7
2016-04-19T08:34:37.495Z|00077|dpdk|INFO|Disable flush for tx queue 8
2016-04-19T08:34:37.495Z|00078|dpdk|INFO|Enable flush for tx queue 9
2016-04-19T08:34:37.495Z|00079|dpdk|INFO|Disable flush for tx queue 10
2016-04-19T08:34:37.495Z|00080|dpdk|INFO|Enable flush for tx queue 11
2016-04-19T08:34:37.495Z|00081|dpdk|INFO|Disable flush for tx queue 12
2016-04-19T08:34:37.495Z|00082|dpdk|INFO|Enable flush for tx queue 13
2016-04-19T08:34:37.495Z|00083|dpdk|INFO|Disable flush for tx queue 14
2016-04-19T08:34:37.495Z|00084|dpdk|INFO|Enable flush for tx queue 15
2016-04-19T08:34:37.495Z|00085|dpdk|INFO|Disable flush for tx queue 16
2016-04-19T08:34:37.495Z|00086|dpdk|INFO|Enable flush for tx queue 17
2016-04-19T08:34:37.495Z|00087|dpdk|INFO|Disable flush for tx queue 18
2016-04-19T08:34:37.495Z|00088|dpdk|INFO|Enable flush for tx queue 19
2016-04-19T08:34:37.495Z|00089|dpdk|INFO|Disable flush for tx queue 20
2016-04-19T08:34:37.495Z|00090|dpdk|INFO|Enable flush for tx queue 21
2016-04-19T08:34:37.495Z|00091|dpdk|INFO|Disable flush for tx queue 22
2016-04-19T08:34:37.495Z|00092|dpdk|INFO|Enable flush for tx queue 23
2016-04-19T08:34:37.495Z|00093|dpdk|INFO|Disable flush for tx queue 24
2016-04-19T08:34:37.495Z|00094|dpdk|INFO|Enable flush for tx queue 25
2016-04-19T08:34:37.495Z|00095|dpdk|INFO|Disable flush for tx queue 26
2016-04-19T08:34:37.495Z|00096|dpdk|INFO|Enable flush for tx queue 27
2016-04-19T08:34:37.495Z|00097|dpdk|INFO|Disable flush for tx queue 28
2016-04-19T08:34:37.495Z|00098|dpdk|INFO|Enable flush for tx queue 29
2016-04-19T08:34:37.495Z|00099|dpdk|INFO|Disable flush for tx queue 30
2016-04-19T08:34:37.495Z|00100|dpdk|INFO|Enable flush for tx queue 31
2016-04-19T08:34:37.495Z|00101|dpdk|INFO|Disable flush for tx queue 32
2016-04-19T08:34:37.495Z|00102|dpdk|INFO|Enable flush for tx queue 33
2016-04-19T08:34:37.495Z|00103|dpdk|INFO|Disable flush for tx queue 34
2016-04-19T08:34:37.495Z|00104|dpdk|INFO|Enable flush for tx queue 35
2016-04-19T08:34:37.495Z|00105|dpdk|INFO|Disable flush for tx queue 36
2016-04-19T08:34:37.495Z|00106|dpdk|INFO|Enable flush for tx queue 37
2016-04-19T08:34:37.495Z|00107|dpdk|INFO|Disable flush for tx queue 38
2016-04-19T08:34:37.495Z|00108|dpdk|INFO|Enable flush for tx queue 39
2016-04-19T08:34:37.495Z|00109|dpdk|INFO|Disable flush for tx queue 40
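For completeness, the local change I experimented with is roughly the following: in netdev_dpdk_alloc_txq above, I dropped the NUMA comparison so that flush_tx is always enabled. This is just a sketch of the experiment, not a polished patch:

        if (!dev->txq_needs_locking) {
            /* Was: dev->tx_q[i].flush_tx = dev->socket_id == numa_id; */
            dev->tx_q[i].flush_tx = true;
        } else {
            /* Queues are shared among CPUs. Always flush */
            dev->tx_q[i].flush_tx = true;
        }

With this change in place, the stuck-packet behaviour described above is no longer seen in my test.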