Hi Mattias,

thanks for reporting this!

The delayed flushing logic was implemented for performance reasons: the
idea is to reduce the number of calls to rte_eth_tx_burst(). The tx queue
is also supposed to be flushed in netdev_dpdk_rxq_recv(), but that doesn't
always work as intended, because we rely on the CPU id being related to
the tx qid and the rx qid.
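
To make the intended behaviour concrete, below is a minimal standalone
sketch of the idea (a toy model written just for illustration, not the
actual netdev-dpdk.c code; all the names in it are made up).  In the real
code the buffering happens in dpdk_queue_pkts() and the actual send in
dpdk_queue_flush__()/rte_eth_tx_burst().

#include <stdbool.h>
#include <stdio.h>

#define MODEL_QUEUE_LEN 32      /* stands in for MAX_TX_QUEUE_LEN */

/* Toy model of one tx queue: a counter of buffered packets plus the
 * per-queue 'flush_tx' flag. */
struct txq_model {
    int count;
    bool flush_tx;
};

/* Stand-in for rte_eth_tx_burst(): pretend to hand everything that is
 * currently buffered to the NIC. */
static void
model_flush(struct txq_model *txq)
{
    if (txq->count) {
        printf("flushed %d packet(s) to the NIC\n", txq->count);
        txq->count = 0;
    }
}

/* Stand-in for the queueing step: buffer the packets and flush only if
 * the queue filled up or 'flush_tx' is set.  With flush_tx == false the
 * tail of a burst stays buffered until something else triggers a flush. */
static void
model_queue(struct txq_model *txq, int n_pkts)
{
    txq->count += n_pkts;
    if (txq->count >= MODEL_QUEUE_LEN || txq->flush_tx) {
        model_flush(txq);
    }
}

int
main(void)
{
    struct txq_model txq = { .count = 0, .flush_tx = false };

    model_queue(&txq, 2);   /* nothing is sent: 2 packets sit buffered */
    model_queue(&txq, 2);   /* still nothing: the queue never fills up */

    txq.flush_tx = true;    /* the "always flush" configuration */
    model_queue(&txq, 2);   /* everything buffered so far goes out now */
    return 0;
}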

Now that we've implemented tx output batching, I think the performance
benefit of delayed flushing shouldn't matter as much.

Also, as you clearly show, the increased latency might be problematic.

We should either rework delayed flushing (e.g. by introducing an explicit
call from the datapath layer), or remove it completely.
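
For the rework option, what I have in mind is roughly the sketch below.
This is purely hypothetical: netdev_txq_flush() does not exist today, the
name and signature are made up only to show where the call would sit.

/* Hypothetical sketch only: 'netdev_txq_flush' is an invented name for a
 * new netdev operation, not an existing OVS function. */

struct netdev;                      /* opaque, as in lib/netdev.h */

/* For netdev-dpdk ports this would push out whatever is buffered on tx
 * queue 'qid'; netdev classes that don't buffer would make it a no-op. */
int netdev_txq_flush(struct netdev *netdev, int qid);

/* The pmd thread would then call it once per polling iteration, after the
 * received batch has been processed, for every port it transmitted on:
 *
 *     for each output port used in this iteration:
 *         netdev_txq_flush(port->netdev, tx_qid);
 *
 * so a buffered packet waits at most one polling iteration instead of
 * waiting for the next packet to the same queue. */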

Ilya proposed a patch to remove it:

http://openvswitch.org/pipermail/dev/2016-May/070900.html

Thanks,

Daniele

2016-05-04 6:15 GMT-07:00 Mattias Johansson G <mattias.g.johans...@ericsson.com>:

> Hi!
>
> I'm doing a throughput test of OVS 2.4.1 and I'm seeing an issue with
> abnormally high latency variation when sending traffic via a physical
> port connected to numa node 1 while the OVS control plane is running on
> numa node 0.
>
> The problem is that when a PMD thread running on numa node 0 reads a
> packet from the vhost queue and transmits it to the physical port's Tx
> queue located on numa node 1, the packet gets stuck there until the
> next packet arrives.
>
> In the code a PMD thread running on numa node 0 reads a packet from the
> vhost queue,
> and calls dpdk_queue_pkts to queue it in struct dpdk_tx_queue for the
> physical port
> located on numa node 1.
>
> In this test dev->tx_q[i].flush_tx is false, which means that
> dpdk_queue_flush__ is not called (except if the Tx queue gets full,
> i.e. reaches MAX_TX_QUEUE_LEN packets, but that is not of interest
> here).
>
> For the queues on numa node 1 no other trigger mechanism for a flush
> is present, so the packet gets stuck in struct dpdk_tx_queue until the
> next packet arrives.
>
> What I think happens in my test is that the last packets I send don't
> fill up the Tx queue, hence no flush happens. The flush only happens
> when I send an additional packet after the DRAIN_TSC interval has
> passed, which forces a call to dpdk_queue_flush__.
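>
> Paraphrasing the logic (simplified, not the exact code from
> netdev-dpdk.c), the buffered packets are only pushed out from the
> enqueue path, i.e. when new packets arrive for the same queue:
>
>     /* Roughly when dpdk_queue_flush__ ends up being called while
>      * queueing packets; note the timestamp is only checked on the
>      * next enqueue, never asynchronously. */
>     if (txq->count == MAX_TX_QUEUE_LEN     /* queue is full */
>         || txq->flush_tx                   /* always-flush flag */
>         || now - txq->tsc >= DRAIN_TSC) {  /* buffered for too long */
>         dpdk_queue_flush__(dev, qid);
>     }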
>
> This behavior is not seen when sending packets to a physical port
> connected to numa node 0, where the OVS control plane is running, since
> dev->tx_q[i].flush_tx is true in that situation.
>
> The behavior seems to be controlled by how the Tx queue is configured
> in relation to the CPU id the PMD thread is executing on. Below is code
> from a function in netdev-dpdk.c (it seems to be similar in the latest
> version on the master and 2.4.1 branches).
>
> Why isn't dev->tx_q[i].flush_tx always enabled (set to true)? If I
> change the code and always enable it, the problem seems to disappear.
>
> I've attached some logs describing the cpu layout, where my different
> PMD threads execute, and how the queues for the two physical ports I
> have get configured.
>
> I would very much appreciate some help in understanding how this is
> intended to work.
>
> Thanks in advance!
>
> BR,
> Mattias
>
> Unmodified code snippet from:
> https://github.com/openvswitch/ovs/blob/master/lib/netdev-dpdk.c (362ca39)
> static void
> netdev_dpdk_alloc_txq(struct netdev_dpdk *dev, unsigned int n_txqs)
> {
>     unsigned i;
>
>     dev->tx_q = dpdk_rte_mzalloc(n_txqs * sizeof *dev->tx_q);
>     for (i = 0; i < n_txqs; i++) {
>         int numa_id = ovs_numa_get_numa_id(i);
>
>         if (!dev->txq_needs_locking) {
>             /* Each index is considered as a cpu core id, since there should
>              * be one tx queue for each cpu core.  If the corresponding core
>              * is not on the same numa node as 'dev', flags the
>              * 'flush_tx'. */
>             dev->tx_q[i].flush_tx = dev->socket_id == numa_id;
>         } else {
>             /* Queues are shared among CPUs. Always flush */
>             dev->tx_q[i].flush_tx = true;
>         }
>
>         /* Initialize map for vhost devices. */
>         dev->tx_q[i].map = OVS_VHOST_QUEUE_MAP_UNKNOWN;
>         rte_spinlock_init(&dev->tx_q[i].tx_lock);
>     }
> }
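>
> (For reference, the change I tried is simply forcing the flag on for
> every queue, i.e. replacing the flush_tx assignment above with:)
>
>             /* Always flush, regardless of which numa node the core and
>              * the device are on. */
>             dev->tx_q[i].flush_tx = true;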
>
> Traffic tool output:
> burst size:                           2
> frequency:                       100000 burst/s
> payload size:                        30 B
> frame size:                          64 B
> length of test:                       2 s
>
> tx:                              399730
> rx:                              399730
> bits received:                        0 Gb
> elapsed:                              4 s
>
> tx lost:                              0
> rx lost:                              0
> out of order:                         0
> corrupt:                              0
> noise:                                0
>
> min:                                 14 us
> max:                            1994061 us
> mean latency:                    95.254 us
> standard deviation:            8344.242 us
>
> Latencies (us):
>        0 <= x < 50               158747
>       50 <= x < 100              200266
>      100 <= x < 150               40552
>      150 <= x < 200                  32
>      200 <= x < 250                  30
>      250 <= x < 300                  20
>      300 <= x < 350                  20
>      350 <= x < 400                  22
>      400 <= x < 450                  16
>      450 <= x < 500                  10
>      500 <= x < 550                   8
>     50000 <= x                         7
>
>
>
> OVS threads:
> Name          CPU
> pmd128          2
> pmd129         39
> pmd127         22
> pmd126         19
> ovs-vswitchd    0
>
> cpu_layout.py:
> ============================================================
> Core and Socket Information (as reported by '/proc/cpuinfo')
> ============================================================
>
> cores =  [0, 1, 2, 3, 4, 8, 9, 10, 11, 12] sockets =  [0, 1]
>
>         Socket 0        Socket 1
>         --------        --------
> Core 0  [0, 20]         [1, 21]
>
> Core 1  [2, 22]         [3, 23]
>
> Core 2  [4, 24]         [5, 25]
>
> Core 3  [6, 26]         [7, 27]
>
> Core 4  [8, 28]         [9, 29]
>
> Core 8  [10, 30]        [11, 31]
>
> Core 9  [12, 32]        [13, 33]
>
> Core 10 [14, 34]        [15, 35]
>
> Core 11 [16, 36]        [17, 37]
>
> Core 12 [18, 38]        [19, 39]
>
>
> How Tx flush is configured for dpdk0 on numa node 0:
> 2016-04-19T08:34:36.389Z|00024|dpdk|INFO|Enable flush for tx queue 0
> 2016-04-19T08:34:36.389Z|00025|dpdk|INFO|Disable flush for tx queue 1
> 2016-04-19T08:34:36.389Z|00026|dpdk|INFO|Enable flush for tx queue 2
> 2016-04-19T08:34:36.389Z|00027|dpdk|INFO|Disable flush for tx queue 3
> 2016-04-19T08:34:36.389Z|00028|dpdk|INFO|Enable flush for tx queue 4
> 2016-04-19T08:34:36.389Z|00029|dpdk|INFO|Disable flush for tx queue 5
> 2016-04-19T08:34:36.389Z|00030|dpdk|INFO|Enable flush for tx queue 6
> 2016-04-19T08:34:36.389Z|00031|dpdk|INFO|Disable flush for tx queue 7
> 2016-04-19T08:34:36.389Z|00032|dpdk|INFO|Enable flush for tx queue 8
> 2016-04-19T08:34:36.389Z|00033|dpdk|INFO|Disable flush for tx queue 9
> 2016-04-19T08:34:36.389Z|00034|dpdk|INFO|Enable flush for tx queue 10
> 2016-04-19T08:34:36.389Z|00035|dpdk|INFO|Disable flush for tx queue 11
> 2016-04-19T08:34:36.389Z|00036|dpdk|INFO|Enable flush for tx queue 12
> 2016-04-19T08:34:36.389Z|00037|dpdk|INFO|Disable flush for tx queue 13
> 2016-04-19T08:34:36.389Z|00038|dpdk|INFO|Enable flush for tx queue 14
> 2016-04-19T08:34:36.389Z|00039|dpdk|INFO|Disable flush for tx queue 15
> 2016-04-19T08:34:36.389Z|00040|dpdk|INFO|Enable flush for tx queue 16
> 2016-04-19T08:34:36.389Z|00041|dpdk|INFO|Disable flush for tx queue 17
> 2016-04-19T08:34:36.389Z|00042|dpdk|INFO|Enable flush for tx queue 18
> 2016-04-19T08:34:36.389Z|00043|dpdk|INFO|Disable flush for tx queue 19
> 2016-04-19T08:34:36.389Z|00044|dpdk|INFO|Enable flush for tx queue 20
> 2016-04-19T08:34:36.389Z|00045|dpdk|INFO|Disable flush for tx queue 21
> 2016-04-19T08:34:36.389Z|00046|dpdk|INFO|Enable flush for tx queue 22
> 2016-04-19T08:34:36.389Z|00047|dpdk|INFO|Disable flush for tx queue 23
> 2016-04-19T08:34:36.389Z|00048|dpdk|INFO|Enable flush for tx queue 24
> 2016-04-19T08:34:36.389Z|00049|dpdk|INFO|Disable flush for tx queue 25
> 2016-04-19T08:34:36.389Z|00050|dpdk|INFO|Enable flush for tx queue 26
> 2016-04-19T08:34:36.389Z|00051|dpdk|INFO|Disable flush for tx queue 27
> 2016-04-19T08:34:36.389Z|00052|dpdk|INFO|Enable flush for tx queue 28
> 2016-04-19T08:34:36.389Z|00053|dpdk|INFO|Disable flush for tx queue 29
> 2016-04-19T08:34:36.389Z|00054|dpdk|INFO|Enable flush for tx queue 30
> 2016-04-19T08:34:36.389Z|00055|dpdk|INFO|Disable flush for tx queue 31
> 2016-04-19T08:34:36.389Z|00056|dpdk|INFO|Enable flush for tx queue 32
> 2016-04-19T08:34:36.389Z|00057|dpdk|INFO|Disable flush for tx queue 33
> 2016-04-19T08:34:36.389Z|00058|dpdk|INFO|Enable flush for tx queue 34
> 2016-04-19T08:34:36.389Z|00059|dpdk|INFO|Disable flush for tx queue 35
> 2016-04-19T08:34:36.389Z|00060|dpdk|INFO|Enable flush for tx queue 36
> 2016-04-19T08:34:36.389Z|00061|dpdk|INFO|Disable flush for tx queue 37
> 2016-04-19T08:34:36.389Z|00062|dpdk|INFO|Enable flush for tx queue 38
> 2016-04-19T08:34:36.389Z|00063|dpdk|INFO|Disable flush for tx queue 39
> 2016-04-19T08:34:36.389Z|00064|dpdk|INFO|Disable flush for tx queue 40
>
> How Tx flush is configured for dpdk1 on numa node 1:
> 2016-04-19T08:34:37.495Z|00069|dpdk|INFO|Disable flush for tx queue 0
> 2016-04-19T08:34:37.495Z|00070|dpdk|INFO|Enable flush for tx queue 1
> 2016-04-19T08:34:37.495Z|00071|dpdk|INFO|Disable flush for tx queue 2
> 2016-04-19T08:34:37.495Z|00072|dpdk|INFO|Enable flush for tx queue 3
> 2016-04-19T08:34:37.495Z|00073|dpdk|INFO|Disable flush for tx queue 4
> 2016-04-19T08:34:37.495Z|00074|dpdk|INFO|Enable flush for tx queue 5
> 2016-04-19T08:34:37.495Z|00075|dpdk|INFO|Disable flush for tx queue 6
> 2016-04-19T08:34:37.495Z|00076|dpdk|INFO|Enable flush for tx queue 7
> 2016-04-19T08:34:37.495Z|00077|dpdk|INFO|Disable flush for tx queue 8
> 2016-04-19T08:34:37.495Z|00078|dpdk|INFO|Enable flush for tx queue 9
> 2016-04-19T08:34:37.495Z|00079|dpdk|INFO|Disable flush for tx queue 10
> 2016-04-19T08:34:37.495Z|00080|dpdk|INFO|Enable flush for tx queue 11
> 2016-04-19T08:34:37.495Z|00081|dpdk|INFO|Disable flush for tx queue 12
> 2016-04-19T08:34:37.495Z|00082|dpdk|INFO|Enable flush for tx queue 13
> 2016-04-19T08:34:37.495Z|00083|dpdk|INFO|Disable flush for tx queue 14
> 2016-04-19T08:34:37.495Z|00084|dpdk|INFO|Enable flush for tx queue 15
> 2016-04-19T08:34:37.495Z|00085|dpdk|INFO|Disable flush for tx queue 16
> 2016-04-19T08:34:37.495Z|00086|dpdk|INFO|Enable flush for tx queue 17
> 2016-04-19T08:34:37.495Z|00087|dpdk|INFO|Disable flush for tx queue 18
> 2016-04-19T08:34:37.495Z|00088|dpdk|INFO|Enable flush for tx queue 19
> 2016-04-19T08:34:37.495Z|00089|dpdk|INFO|Disable flush for tx queue 20
> 2016-04-19T08:34:37.495Z|00090|dpdk|INFO|Enable flush for tx queue 21
> 2016-04-19T08:34:37.495Z|00091|dpdk|INFO|Disable flush for tx queue 22
> 2016-04-19T08:34:37.495Z|00092|dpdk|INFO|Enable flush for tx queue 23
> 2016-04-19T08:34:37.495Z|00093|dpdk|INFO|Disable flush for tx queue 24
> 2016-04-19T08:34:37.495Z|00094|dpdk|INFO|Enable flush for tx queue 25
> 2016-04-19T08:34:37.495Z|00095|dpdk|INFO|Disable flush for tx queue 26
> 2016-04-19T08:34:37.495Z|00096|dpdk|INFO|Enable flush for tx queue 27
> 2016-04-19T08:34:37.495Z|00097|dpdk|INFO|Disable flush for tx queue 28
> 2016-04-19T08:34:37.495Z|00098|dpdk|INFO|Enable flush for tx queue 29
> 2016-04-19T08:34:37.495Z|00099|dpdk|INFO|Disable flush for tx queue 30
> 2016-04-19T08:34:37.495Z|00100|dpdk|INFO|Enable flush for tx queue 31
> 2016-04-19T08:34:37.495Z|00101|dpdk|INFO|Disable flush for tx queue 32
> 2016-04-19T08:34:37.495Z|00102|dpdk|INFO|Enable flush for tx queue 33
> 2016-04-19T08:34:37.495Z|00103|dpdk|INFO|Disable flush for tx queue 34
> 2016-04-19T08:34:37.495Z|00104|dpdk|INFO|Enable flush for tx queue 35
> 2016-04-19T08:34:37.495Z|00105|dpdk|INFO|Disable flush for tx queue 36
> 2016-04-19T08:34:37.495Z|00106|dpdk|INFO|Enable flush for tx queue 37
> 2016-04-19T08:34:37.495Z|00107|dpdk|INFO|Disable flush for tx queue 38
> 2016-04-19T08:34:37.495Z|00108|dpdk|INFO|Enable flush for tx queue 39
> 2016-04-19T08:34:37.495Z|00109|dpdk|INFO|Disable flush for tx queue 40
_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss
