>
> Hi,
>
> I've set up a simple packet forwarding perf test on a dual-port 10G
> 82599ES: one port receives 64-byte UDP packets, the other sends them out,
> using one core. I've used the latest OVS with DPDK 2.1, and the first
> result was only 13.2 Mpps, noticeably below the 13.9 Mpps I saw last
> year with the same test. The first thing I changed was to revert to
> the old behaviour regarding this issue:
>
> http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/22731
>
> So instead of the new default, I passed 2048 + RTE_PKTMBUF_HEADROOM.

We'll post a patch this week that should resolve this (if it is the same issue).

> That increased the performance to 13.5 Mpps, but to figure out what was
> wrong I started to play with the receive functions. First I disabled the
> vector PMD, but ixgbe_recv_pkts_bulk_alloc() was even worse, only
> 12.5 Mpps. Then I enabled scattered RX, and with
> ixgbe_recv_pkts_lro_bulk_alloc() I managed to get 13.98 Mpps, which I
> guess is as close as possible to the 14.2 Mpps line rate (at least on my
> HW, with one core).
>
> Does anyone have a good explanation for why the vector PMD performs so
> much worse? I would expect a single core of a 3.2 GHz i5-4570 to reach
> ~14 Mpps, and the choice between SG and vector PMD shouldn't make a
> difference.
>
> I've tried to look into it with oprofile, but the results were quite
> strange: 35% of the samples were from miniflow_extract, in the part where
> parse_vlan calls data_pull to skip past the MAC addresses. The oprofile
> snippet (1M samples):
>
>   511454       19   0.0037  flow.c:511
>   511458      149   0.0292  dp-packet.h:266
>   51145f     4264   0.8357  dp-packet.h:267
>   511466       18   0.0035  dp-packet.h:268
>   51146d       43   0.0084  dp-packet.h:269
>   511474      172   0.0337  flow.c:511
>   51147a     4320   0.8467  string3.h:51
>   51147e   358763  70.3176  flow.c:99
>   511482        2  3.9e-04  string3.h:51
>   511485     3060   0.5998  string3.h:51
>   511488     1693   0.3318  string3.h:51
>   51148c     2933   0.5749  flow.c:326
>   511491       47   0.0092  flow.c:326
>
> And the corresponding disassembled code:
>
>   511454: 49 83 f9 0d            cmp    r9,0xd
>   511458: c6 83 81 00 00 00 00   mov    BYTE PTR [rbx+0x81],0x0
>   51145f: 66 89 83 82 00 00 00   mov    WORD PTR [rbx+0x82],ax
>   511466: 66 89 93 84 00 00 00   mov    WORD PTR [rbx+0x84],dx
>   51146d: 66 89 8b 86 00 00 00   mov    WORD PTR [rbx+0x86],cx
>   511474: 0f 86 af 01 00 00      jbe    511629 <miniflow_extract+0x279>
>   51147a: 48 8b 45 00            mov    rax,QWORD PTR [rbp+0x0]
>   51147e: 4c 8d 5d 0c            lea    r11,[rbp+0xc]
>   511482: 49 89 00               mov    QWORD PTR [r8],rax
>   511485: 8b 45 08               mov    eax,DWORD PTR [rbp+0x8]
>   511488: 41 89 40 08            mov    DWORD PTR [r8+0x8],eax
>   51148c: 44 0f b7 55 0c         movzx  r10d,WORD PTR [rbp+0xc]
>   511491: 66 41 81 fa 81 00      cmp    r10w,0x81
>
> My only explanation for this so far is that I misunderstand something
> about the oprofile results.
>
> Regards,
>
> Zoltan
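For readers tracing the hot spot: the instructions around flow.c:99 appear to copy the 12 bytes of Ethernet addresses (an 8-byte and a 4-byte load/store pair), advance the data pointer by 12 (`lea r11,[rbp+0xc]`), and compare the next 16-bit field against 0x81, which is htons(0x8100), the 802.1Q TPID, as read on little-endian x86. The C below is a simplified, hypothetical sketch of that pattern, not the OVS source; `struct macs`, `pull()` and `copy_macs_and_check_vlan()` are illustrative names. One plausible reading of the profile, offered as an assumption rather than a diagnosis, is that the first load from packet data (`mov rax,QWORD PTR [rbp+0x0]`) is where a cache miss stalls, and sampling skid attributes that stall to the following `lea`.

```c
#include <arpa/inet.h> /* htons */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDR_LEN 6
#define ETH_TYPE_VLAN_8021Q 0x8100

/* Hypothetical destination for the copied addresses (12 bytes, no padding). */
struct macs {
    uint8_t dst[ETH_ADDR_LEN];
    uint8_t src[ETH_ADDR_LEN];
};

/* Return the current data pointer, then advance it by 'size' bytes.
 * Illustrative helper in the spirit of a data_pull(); not the OVS code. */
static const uint8_t *
pull(const uint8_t **datap, size_t *sizep, size_t size)
{
    const uint8_t *data = *datap;

    assert(*sizep >= size);
    *datap = data + size;
    *sizep -= size;
    return data;
}

/* Copy dst/src MACs into 'out', skip past them, and return 1 if the next
 * 16-bit field is the 802.1Q TPID (0x8100), else 0. */
static int
copy_macs_and_check_vlan(const uint8_t *frame, size_t len, struct macs *out)
{
    const uint8_t *data = frame;
    size_t size = len;
    uint16_t type;

    assert(len >= 2 * ETH_ADDR_LEN + sizeof type);

    /* First read of packet bytes: 12 bytes, which a compiler may emit as
     * one 8-byte and one 4-byte load/store pair, as in the listing. */
    memcpy(out, pull(&data, &size, 2 * ETH_ADDR_LEN), 2 * ETH_ADDR_LEN);

    /* Read the type/TPID field as stored on the wire (big-endian). On a
     * little-endian CPU htons(0x8100) is 0x0081, matching cmp r10w,0x81. */
    memcpy(&type, data, sizeof type);
    return type == htons(ETH_TYPE_VLAN_8021Q);
}
```

Comparing against `htons()` of the constant, rather than byte-swapping the packet field, lets the compiler fold the swap into an immediate, which is consistent with the single `cmp` in the listing.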
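To put the Mpps figures above in terms of per-packet work, this sketch computes the theoretical 10GbE packet rate and the cycle budget per packet on one core. It assumes the 64-byte size includes the FCS (as is usual for these benchmarks), 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, inter-frame gap), and a core held at the quoted 3.2 GHz; `line_rate_pps()` and `cycles_per_packet()` are hypothetical helper names.

```c
#include <assert.h>

/* Ethernet wire overhead per frame, in bytes: 7-byte preamble,
 * 1-byte start-of-frame delimiter, 12-byte minimum inter-frame gap. */
#define ETH_WIRE_OVERHEAD_BYTES 20.0

/* Theoretical frames per second on a link of 'link_bps' bits/s.
 * 'frame_bytes' includes the 4-byte FCS, so minimum-size frames are 64. */
static double
line_rate_pps(double link_bps, double frame_bytes)
{
    assert(link_bps > 0.0 && frame_bytes > 0.0);
    return link_bps / ((frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8.0);
}

/* Cycles one core running at 'cpu_hz' can spend on each packet at 'pps'. */
static double
cycles_per_packet(double cpu_hz, double pps)
{
    assert(cpu_hz > 0.0 && pps > 0.0);
    return cpu_hz / pps;
}
```

For 64-byte frames, `line_rate_pps(10e9, 64.0)` is about 14.88 Mpps, so a 3.2 GHz core has roughly 215 cycles per packet at full line rate. The measured 13.5 Mpps (vector PMD, after the mbuf change) corresponds to about 237 cycles per packet and 13.98 Mpps (LRO bulk-alloc path) to about 229, so the gap between the two RX paths is on the order of 8 cycles per packet.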