Hi,
I've set up a simple packet forwarding performance test on a dual-port 10G
82599ES: one port receives 64-byte UDP packets, the other sends them out,
using one core. I used the latest OVS with DPDK 2.1, and the first result
was only 13.2 Mpps, which was a bit short of the 13.9 Mpps I saw last
year with the same test. The first thing I changed was to revert to the
old behaviour regarding this issue:
http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/22731
So instead of the new default I passed 2048 + RTE_PKTMBUF_HEADROOM.
That increased the performance to 13.5 Mpps, but to figure out what was
wrong I started to experiment with the receive functions. First I disabled
the vector PMD, but ixgbe_recv_pkts_bulk_alloc() was even worse, only
12.5 Mpps. Then I enabled scattered RX, and with
ixgbe_recv_pkts_lro_bulk_alloc() I managed to get 13.98 Mpps, which I
guess is as close as possible to the 14.2 Mpps line rate (on my hardware
at least, with one core).
Does anyone have a good explanation of why the vector PMD performs so
much worse? I would expect a single core of a 3.2 GHz i5-4570 to be able
to reach ~14 Mpps, and whether scattered RX (SG) or the vector PMD is used
shouldn't make a difference.
I tried to look into it with oprofile, but the results were quite
strange: 35% of the samples came from miniflow_extract, at the point where
parse_vlan calls data_pull to skip past the MAC addresses. Here is the
oprofile snippet (1M samples):
511454      19   0.0037  flow.c:511
511458     149   0.0292  dp-packet.h:266
51145f    4264   0.8357  dp-packet.h:267
511466      18   0.0035  dp-packet.h:268
51146d      43   0.0084  dp-packet.h:269
511474     172   0.0337  flow.c:511
51147a    4320   0.8467  string3.h:51
51147e  358763  70.3176  flow.c:99
511482       2  3.9e-04  string3.h:51
511485    3060   0.5998  string3.h:51
511488    1693   0.3318  string3.h:51
51148c    2933   0.5749  flow.c:326
511491      47   0.0092  flow.c:326
And the corresponding disassembled code:
511454: 49 83 f9 0d            cmp    r9,0xd
511458: c6 83 81 00 00 00 00   mov    BYTE PTR [rbx+0x81],0x0
51145f: 66 89 83 82 00 00 00   mov    WORD PTR [rbx+0x82],ax
511466: 66 89 93 84 00 00 00   mov    WORD PTR [rbx+0x84],dx
51146d: 66 89 8b 86 00 00 00   mov    WORD PTR [rbx+0x86],cx
511474: 0f 86 af 01 00 00      jbe    511629 <miniflow_extract+0x279>
51147a: 48 8b 45 00            mov    rax,QWORD PTR [rbp+0x0]
51147e: 4c 8d 5d 0c            lea    r11,[rbp+0xc]
511482: 49 89 00               mov    QWORD PTR [r8],rax
511485: 8b 45 08               mov    eax,DWORD PTR [rbp+0x8]
511488: 41 89 40 08            mov    DWORD PTR [r8+0x8],eax
51148c: 44 0f b7 55 0c         movzx  r10d,WORD PTR [rbp+0xc]
511491: 66 41 81 fa 81 00      cmp    r10w,0x81
My only explanation for this so far is that I'm misunderstanding
something about the oprofile results.
Regards,
Zoltan
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev