Hi,

I've set up a simple packet forwarding performance test on a dual-port 10G 82599ES: one port receives 64-byte UDP packets, the other sends them out, and one core is used. I used the latest OVS with DPDK 2.1, and the first result was only 13.2 Mpps, somewhat short of the 13.9 Mpps I saw last year with the same test. The first thing I changed was to revert to the old behaviour regarding this issue:

http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/22731

So instead of the new default I passed 2048 + RTE_PKTMBUF_HEADROOM. That increased the performance to 13.5 Mpps, but to figure out what's wrong I started to play with the receive functions. First I disabled the vector PMD, but ixgbe_recv_pkts_bulk_alloc() was even worse, only 12.5 Mpps. Then I enabled scattered RX, and with ixgbe_recv_pkts_lro_bulk_alloc() I managed to get 13.98 Mpps, which I guess is as close as possible to the 14.2 Mpps line rate (on my HW at least, with one core).

Does anyone have a good explanation for why the vector PMD performs so significantly worse? I would expect that on a 3.2 GHz i5-4570 one core should be able to reach ~14 Mpps, and that SG and vector PMD shouldn't make a difference.

I've tried to look into it with oprofile, but the results were quite strange: 35% of the samples were from miniflow_extract, in the part where parse_vlan calls data_pull to jump past the MAC addresses. The oprofile snippet (1M samples):

  511454 19        0.0037  flow.c:511
  511458 149       0.0292  dp-packet.h:266
  51145f 4264      0.8357  dp-packet.h:267
  511466 18        0.0035  dp-packet.h:268
  51146d 43        0.0084  dp-packet.h:269
  511474 172       0.0337  flow.c:511
  51147a 4320      0.8467  string3.h:51
  51147e 358763   70.3176  flow.c:99
  511482 2        3.9e-04  string3.h:51
  511485 3060      0.5998  string3.h:51
  511488 1693      0.3318  string3.h:51
  51148c 2933      0.5749  flow.c:326
  511491 47        0.0092  flow.c:326
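
A minimal sketch of the arithmetic behind the 35% figure, in C. It assumes the third column is the percentage relative to the enclosing symbol (miniflow_extract) and takes the total as a round 1M samples; the function names are hypothetical and only for illustration.

```c
#include <assert.h>

/* Share of all collected samples (in percent) that landed on one address. */
double share_of_total(unsigned long samples, unsigned long total_samples)
{
    assert(total_samples > 0);
    return 100.0 * (double)samples / (double)total_samples;
}

/* Total sample count of the enclosing symbol, implied by one row's
 * sample count and its per-symbol percentage (assumed meaning of the
 * third column). */
double implied_symbol_total(unsigned long samples, double percent_of_symbol)
{
    assert(percent_of_symbol > 0.0);
    return (double)samples * 100.0 / percent_of_symbol;
}
```

For the row at 51147e, share_of_total(358763, 1000000) comes out at about 35.9%, and implied_symbol_total(358763, 70.3176) at roughly 510k samples for miniflow_extract as a whole, so under these assumptions a single address accounts for about 70% of the function.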

And the corresponding disassembled code:

  511454:       49 83 f9 0d             cmp    r9,0xd
  511458:       c6 83 81 00 00 00 00    mov    BYTE PTR [rbx+0x81],0x0
  51145f:       66 89 83 82 00 00 00    mov    WORD PTR [rbx+0x82],ax
  511466:       66 89 93 84 00 00 00    mov    WORD PTR [rbx+0x84],dx
  51146d:       66 89 8b 86 00 00 00    mov    WORD PTR [rbx+0x86],cx
  511474:       0f 86 af 01 00 00       jbe    511629 <miniflow_extract+0x279>
  51147a:       48 8b 45 00             mov    rax,QWORD PTR [rbp+0x0]
  51147e:       4c 8d 5d 0c             lea    r11,[rbp+0xc]
  511482:       49 89 00                mov    QWORD PTR [r8],rax
  511485:       8b 45 08                mov    eax,DWORD PTR [rbp+0x8]
  511488:       41 89 40 08             mov    DWORD PTR [r8+0x8],eax
  51148c:       44 0f b7 55 0c          movzx  r10d,WORD PTR [rbp+0xc]
  511491:       66 41 81 fa 81 00       cmp    r10w,0x81

My only explanation for this so far is that I misunderstand something about the oprofile results.
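
For reference, a back-of-the-envelope sketch of the per-packet cycle budget behind the ~14 Mpps expectation above. It assumes the core runs at its nominal 3.2 GHz (no turbo), that "64 byte" means the Ethernet frame size including FCS, and the standard 20 bytes of per-frame wire overhead (preamble, SFD and inter-frame gap); the function names are hypothetical.

```c
#include <assert.h>

/* Per-frame wire overhead on Ethernet:
 * 7-byte preamble + 1-byte SFD + 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD_BYTES 20

/* Theoretical packets per second for a given link speed and frame size
 * (frame size includes the FCS). */
double line_rate_pps(double link_bps, unsigned frame_bytes)
{
    assert(frame_bytes > 0);
    return link_bps / (8.0 * (frame_bytes + ETH_WIRE_OVERHEAD_BYTES));
}

/* Average CPU cycles available per packet when a single core
 * forwards `pps` packets per second. */
double cycles_per_packet(double core_hz, double pps)
{
    assert(pps > 0.0);
    return core_hz / pps;
}
```

Under these assumptions line_rate_pps(10e9, 64) is about 14.88 Mpps, and cycles_per_packet(3.2e9, pps) gives roughly 229 cycles per packet at 13.98 Mpps versus roughly 242 at 13.2 Mpps, so the difference between the receive paths is on the order of 13 cycles per packet.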

Regards,

Zoltan
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
