On Thu, Apr 23, 2015 at 11:40 AM, Daniele Di Proietto
<diproiet...@vmware.com> wrote:
> Initializing the dp_packet's metadata can be a hot spot, especially
> for very simple pipelines. Therefore improving the code here can
> sometimes make a difference.
>
> Using memcpy instead of a plain assignment helps GCC and clang generate
> faster code. Here's a comparison of the compiler generated code (GCC 4.8)
> with or without this commit.
>
> BEFORE (assignment)          | AFTER (memcpy)
>
> c8: add $0x8,%r8             | d8: mov (%rsi),%r8
>     mov (%rcx),%r9           |     mov (%rdx),%rdi
>     mov (%rbx),%r11d         |     add $0x1,%ecx
>     mov %r10,%rcx            |     add $0x8,%rsi
>     cmp %rsi,%r8             |     cmp -0x870(%rbp),%ecx
>     lea 0x88(%r9),%rdi       |     mov %rdi,0x88(%r8)
>     rep stos %rax,%es:(%rdi) |     mov 0x8(%rdx),%rdi
>     mov %r11d,0xb8(%r9)      |     lea 0x88(%r8),%rax
>     mov %r8,%rcx             |     mov %rdi,0x90(%r8)
>     jne c8                   |     mov 0x10(%rdx),%rdi
>                              |     mov %rdi,0x98(%r8)
>                              |     mov 0x18(%rdx),%rdi
>                              |     mov %rdi,0xa0(%r8)
>                              |     mov 0x20(%rdx),%r8
>                              |     mov %r8,0x20(%rax)
>                              |     mov 0x28(%rdx),%r8
>                              |     mov %r8,0x28(%rax)
>                              |     mov 0x30(%rdx),%r8
>                              |     mov %r8,0x30(%rax)
>                              |     jl d8
>
> The old code uses a 'rep stos' and fetches the 'port_no' value from
> the 'port' member at every iteration ('mov (%rbx),%r11d'), while the
> new code uses a series of mov operations to accomplish everything.
>
> I can measure a throughput improvement of ~7% on a single-flow phy-phy
> test with 64-byte UDP packets.
>
> The improvement has been observed on an Intel Xeon Sandy Bridge (2012)
> and on an Intel Xeon Westmere (2010).
>
> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
> ---
> lib/dpif-netdev.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index f1d65f5..7d55997 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -2507,13 +2507,16 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
> error = netdev_rxq_recv(rxq, packets, &cnt);
> cycles_count_end(pmd, PMD_CYCLES_POLLING);
> if (!error) {
> + const struct pkt_metadata md = PKT_METADATA_INITIALIZER(port->port_no);
This change looks good. But I think we can improve it even more by
storing a pre-initialized pkt_metadata in the port in place of
port->port_no, so that we do not need to initialize this structure on
every packet receive.
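
Roughly what I have in mind, as a minimal sketch and not a patch against
dpif-netdev.c. The sketch_* types and both functions are hypothetical,
simplified stand-ins (the real struct pkt_metadata and port structure
have more members). The idea is that the metadata template is built once
when the port is set up, and the receive path only does a fixed-size
memcpy per packet:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct pkt_metadata. */
struct sketch_md {
    uint32_t recirc_id;
    uint32_t dp_hash;
    uint32_t skb_priority;
    uint32_t pkt_mark;
    uint32_t in_port;
};

/* Hypothetical stand-in for a packet that carries its metadata. */
struct sketch_packet {
    struct sketch_md md;
};

/* Hypothetical stand-in for the port: it caches a ready-made metadata
 * template instead of only the port number. */
struct sketch_port {
    struct sketch_md md_template;
};

/* Runs once per port: zero the template and record the input port. */
static void
sketch_port_init(struct sketch_port *port, uint32_t port_no)
{
    memset(&port->md_template, 0, sizeof port->md_template);
    port->md_template.in_port = port_no;
}

/* Runs on every receive: each packet gets a copy of the cached template,
 * so nothing is re-initialized from port_no in the hot path. */
static void
sketch_process_batch(const struct sketch_port *port,
                     struct sketch_packet *packets[], int cnt)
{
    for (int i = 0; i < cnt; i++) {
        memcpy(&packets[i]->md, &port->md_template,
               sizeof packets[i]->md);
    }
}
```

Whether this actually beats the on-stack initializer in the patch would
need to be measured with the same phy-phy test.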