On 03/08/2017 07:01 AM, Yao, Lei A wrote:
-----Original Message-----
From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
Sent: Monday, March 6, 2017 10:11 PM
To: Yuanhan Liu <yuanhan....@linux.intel.com>
Cc: Liang, Cunming <cunming.li...@intel.com>; Tan, Jianfeng
<jianfeng....@intel.com>; dev@dpdk.org; Wang, Zhihong
<zhihong.w...@intel.com>; Yao, Lei A <lei.a....@intel.com>
Subject: Re: [RFC PATCH] net/virtio: Align Virtio-net header on cache line in
receive path
On 03/06/2017 09:46 AM, Yuanhan Liu wrote:
On Wed, Mar 01, 2017 at 08:36:24AM +0100, Maxime Coquelin wrote:
On 02/23/2017 06:49 AM, Yuanhan Liu wrote:
On Wed, Feb 22, 2017 at 10:36:36AM +0100, Maxime Coquelin wrote:
On 02/22/2017 02:37 AM, Yuanhan Liu wrote:
On Tue, Feb 21, 2017 at 06:32:43PM +0100, Maxime Coquelin wrote:
This patch aligns the Virtio-net header on a cache-line boundary to
optimize cache utilization, as it puts the Virtio-net header (which
is always accessed) on the same cache line as the packet header.
For example, with an application that forwards packets at the L2 level,
a single cache line will be accessed with this patch, instead of
two before.
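To illustrate the cache-line arithmetic behind this, here is a minimal
standalone sketch. It assumes a 64-byte cache line, the 12-byte mergeable
virtio-net header and the default 128-byte mbuf headroom; the offsets are
illustrative of the idea, not taken from the patch itself.

#include <stdio.h>

#define CACHE_LINE     64
#define VNET_HDR_LEN   12   /* sizeof(struct virtio_net_hdr_mrg_rxbuf) */
#define ETH_HDR_LEN    14
#define MBUF_HEADROOM 128   /* default RTE_PKTMBUF_HEADROOM */

/* Number of distinct cache lines covered by [off, off + len) */
static unsigned int lines_touched(unsigned int off, unsigned int len)
{
    return (off + len - 1) / CACHE_LINE - off / CACHE_LINE + 1;
}

int main(void)
{
    /*
     * Before: packet data starts at the cache-aligned headroom offset
     * and the vnet header is written just before it, so the header and
     * the Ethernet header end up on two different cache lines.
     */
    unsigned int hdr_off = MBUF_HEADROOM - VNET_HDR_LEN;        /* 116 */

    printf("unaligned hdr: %u cache line(s)\n",
           lines_touched(hdr_off, VNET_HDR_LEN + ETH_HDR_LEN)); /* 2 */

    /*
     * After: the vnet header itself is cache-line aligned, so the
     * header and the Ethernet header share a single line. A 64B frame
     * still spills onto a second line (only 64 - 12 = 52 payload bytes
     * fit next to the header); the saving is on the header + Ethernet
     * header accesses.
     */
    hdr_off = MBUF_HEADROOM;                                    /* 128 */

    printf("aligned hdr:   %u cache line(s)\n",
           lines_touched(hdr_off, VNET_HDR_LEN + ETH_HDR_LEN)); /* 1 */

    return 0;
}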
I'm assuming you were testing pkt size <= (64 - hdr_size)?
No, I tested with 64-byte packets only.
Oh, my bad, I overlooked it. While you were saying "a single cache
line", I was thinking of putting the virtio net hdr and the "whole"
packet data in a single cache line, which is not possible for a 64B
pkt size.
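(For reference: with a 64-byte cache line and the 12-byte mergeable
virtio-net header, 64 - 12 = 52, so only frames of up to 52 bytes could
sit entirely in the header's cache line; a 64-byte frame necessarily
spills onto a second line either way.)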
I ran some more tests this morning with different packet sizes,
and also changed the mbuf size on the guest side to get multi-
buffer packets:
+-------+--------+--------+-------------------------+
| Txpkt | Rxmbuf | v17.02 | v17.02 + vnet hdr align |
+-------+--------+--------+-------------------------+
|    64 |   2048 |  11.05 |                   11.78 |
|   128 |   2048 |  10.66 |                   11.48 |
|   256 |   2048 |  10.47 |                   11.21 |
|   512 |   2048 |  10.22 |                   10.88 |
|  1024 |   2048 |   7.65 |                    7.84 |
|  1500 |   2048 |   6.25 |                    6.45 |
|  2000 |   2048 |   5.31 |                    5.43 |
|  2048 |   2048 |   5.32 |                    4.25 |
|  1500 |    512 |   3.89 |                    3.98 |
|  2048 |    512 |   1.96 |                    2.02 |
+-------+--------+--------+-------------------------+
Could you share more info, say is it a PVP test? Is mergeable on?
What's the fwd mode?
No, this is not a PVP benchmark; I have neither another server nor a
packet generator connected back-to-back to my Haswell machine.
This is a simple micro-benchmark: vhost PMD in txonly, virtio PMD in
rxonly. In this configuration, mergeable is ON and no offloads are
disabled in the QEMU cmdline.
Okay, I see. So the boost, as you have stated, comes from reducing two
cache-line accesses to one. Before that, vhost writes 2 cache lines,
while the virtio PMD reads 2 cache lines: one for reading the header,
another one for reading the ether header to update xstats (there
is no other ether access in the fwd mode you tested).
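To make that second read concrete, here is a simplified standalone
sketch of that kind of per-packet accounting. Names and the struct
layout are illustrative assumptions, not the actual driver code, but it
shows why the destination MAC, and therefore the packet's first cache
line, is read even in rxonly mode.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stats counters, loosely modelled on per-queue xstats */
struct queue_stats {
    uint64_t packets;
    uint64_t bytes;
    uint64_t multicast;
    uint64_t broadcast;
};

static void update_packet_stats(struct queue_stats *st,
                                const uint8_t *pkt, uint32_t len)
{
    static const uint8_t bcast[6] =
        { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

    st->packets++;
    st->bytes += len;

    /*
     * Classifying the frame requires reading the destination MAC,
     * which is what pulls in the cache line holding the start of
     * the packet, even when nothing else touches the payload.
     */
    if (pkt[0] & 0x01) {    /* group bit set: multicast or broadcast */
        if (memcmp(pkt, bcast, sizeof(bcast)) == 0)
            st->broadcast++;
        else
            st->multicast++;
    }
}

int main(void)
{
    struct queue_stats st = { 0, 0, 0, 0 };
    uint8_t frame[64] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

    update_packet_stats(&st, frame, sizeof(frame));
    printf("packets=%llu broadcast=%llu\n",
           (unsigned long long)st.packets,
           (unsigned long long)st.broadcast);
    return 0;
}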
That's why I would be interested in more testing on recent hardware
with the PVP benchmark. Is it something that could be run in the Intel lab?
I think Yao Lei could help on that. But as stated, I think it may
hurt the performance for big packets. And I also won't expect a big
boost even for 64B in the PVP test, judging that it's only a 6% boost in
micro benchmarking.
That would be great.
Note that on SandyBridge, on which I see a drop in perf with the
microbenchmark, I get a 4% gain on the PVP benchmark. So on recent hardware
that shows a gain on the microbenchmark, I'm curious about the gain with the
PVP bench.
Hi Maxime, Yuanhan,
I have executed the PVP and loopback performance tests on my Ivy Bridge server.
OS: Ubuntu 16.04
CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
Kernel: 4.4.0
gcc: 5.4.0
I used MAC forwarding for the test.
The performance baseline is commit f5472703c0bdfc29c46fc4b2ca445bce3dc08c9f,
"eal: optimize aligned memcpy on x86".
I can see a big performance drop on the mergeable and non-mergeable paths
after applying this patch.
Mergeable path loopback test
packet size    performance compare
64             -21.76%
128            -17.79%
260            -20.25%
520            -14.80%
1024           -9.34%
1500           -6.16%

Non-mergeable path loopback test
packet size    performance compare
64             -13.72%
128            -10.35%
260            -16.40%
520            -14.78%
1024           -10.48%
1500           -6.91%

Mergeable path PVP test
packet size    performance compare
64             -16.33%

Non-mergeable path PVP test
packet size    performance compare
64             -8.69%
Thanks Yao for the testing.
I'm surprised by the PVP results, as even on SandyBridge, where I get
a perf drop on micro benchmarks, I get an improvement with PVP.
I'll try to reproduce some tests with Ivy Bridge to understand what is
happening.
Cheers,
Maxime