On 02/20/2017 08:43 AM, Thomas F Herbert wrote:
>
> On 02/17/2017 06:18 PM, Alec Hothan (ahothan) wrote:
>>
>> Hi Karl
>>
>> Can you also tell which version of DPDK you were using for OVS and for
>> VPP (for VPP, is it the one bundled with 17.01?).
>
> DPDK 16.11 and VPP 17.01.
>
>> “The pps is the bi-directional sum of the packets received back at the
>> traffic generator.”
>>
>> Just to make sure….
>>
>> If your traffic gen sends 1 Mpps to each of the 2 interfaces and you
>> get no drop (meaning you receive 1 Mpps from each interface), what do
>> you report? 2 Mpps or 4 Mpps?
2 Mpps

>> You seem to say 2 Mpps (sum of all RX).
>>
>> The CSIT perf numbers report the sum(TX) = in the above example CSIT
>> reports 2 Mpps.
>>
>> The CSIT numbers for 1 vhost/1 VM (practically similar to yours) are
>> at about half of what you report.
>>
>> https://docs.fd.io/csit/rls1701/report/vpp_performance_results_hw/performance_results_hw.html#ge2p1x520-dot1q-l2xcbase-eth-2vhost-1vm-ndrpdrdisc
>>
>> Scroll down the table to tc13/tc14, 4t4c (4 threads), L2XC, 64B NDR:
>> 5.95 Mpps (aggregated TX of the 2 interfaces), PDR 7.47 Mpps,
>> while the results in your slides put it at around 11 Mpps.
>>
>> So either your testbed really switches 2 times more packets than the
>> CSIT one, or you’re actually reporting double the amount compared to
>> how CSIT reports it…

tc13 and tc14 both say "4 threads, 4 phy cores, 2 receive queues per NIC
port". In our configuration, when doing 2 queues we are actually using 8
CPU threads on 4 cores -- a DPDK thread on one core thread and a
vhost-user thread on the other core thread. Our comparison of 1 thread
per core versus 2 threads per core (slide 3) showed that very little
performance was lost when packing the threads onto the cores in this way.

For tc13 and tc14 I assume that each thread is polling on both the DPDK
and vhost-user interfaces at the same time, is that accurate? If so,
that is a lot different than our test where each thread is only polling
a single interface.

Attached is a dump of some vppctl command output that hopefully shows
exactly how our setup is configured.
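To make the two reporting conventions in this exchange concrete, here is a minimal sketch; the function and variable names are invented for illustration and are not from any test harness.

```python
# Illustrative arithmetic only for the two pps-reporting conventions.

def aggregate_rx_pps(rx_per_port):
    """Sum of packets/sec received back at the traffic generator across
    all ports (the convention used in the Red Hat slides)."""
    return sum(rx_per_port)

def aggregate_tx_pps(tx_per_port):
    """Sum of packets/sec transmitted by the traffic generator across
    all ports (the convention used in the CSIT reports)."""
    return sum(tx_per_port)

# 1 Mpps offered to each of 2 interfaces with zero drop: at the no-drop
# rate the two conventions report the same 2 Mpps aggregate, and only
# diverge once packets are lost.
tx = [1_000_000, 1_000_000]
rx = [1_000_000, 1_000_000]
print(aggregate_tx_pps(tx))  # 2000000
print(aggregate_rx_pps(rx))  # 2000000
```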
>> Thanks
>>
>> Alec
>>
>> *From:* Karl Rister <kris...@redhat.com>
>> *Organization:* Red Hat
>> *Reply-To:* "kris...@redhat.com" <kris...@redhat.com>
>> *Date:* Thursday, February 16, 2017 at 11:09 AM
>> *To:* "Alec Hothan (ahothan)" <ahot...@cisco.com>, "Maciek Konstantynowicz (mkonstan)" <mkons...@cisco.com>, Thomas F Herbert <therb...@redhat.com>
>> *Cc:* Andrew Theurer <atheu...@redhat.com>, Douglas Shakshober <dsh...@redhat.com>, "csit-...@lists.fd.io" <csit-...@lists.fd.io>, vpp-dev <vpp-dev@lists.fd.io>
>> *Subject:* Re: [vpp-dev] Interesting perf test results from Red Hat's test team
>>
>> On 02/15/2017 08:58 PM, Alec Hothan (ahothan) wrote:
>>
>>     Great summary slides Karl, I have a few more questions on the slides.
>>
>>     · Did you use OSP10/OSPD/ML2 to deploy your testpmd VM/configure
>>       the vswitch or is it direct launch using libvirt and direct config of
>>       the vswitches? (this is a bit related to Maciek’s question on the exact
>>       interface configs in the vswitch)
>>
>> There was no use of OSP in these tests; the guest is launched via
>> libvirt and the vswitches are manually launched and configured with
>> shell scripts.
>>
>>     · Unclear if all the chart results were measured using 4 phys
>>       cores (no HT) or 2 phys cores (4 threads with HT)
>>
>> Only slide 3 has any 4-core (no HT) data; all other data is captured
>> using HT on the appropriate number of cores: 2 for single queue, 4 for
>> two queue, and 6 for three queue.
>>
>>     · How do you report your pps?
>> ;-) Are those
>>       o vswitch centric (how many packets the vswitch forwards per second
>>         coming from traffic gen and from VMs)
>>       o or traffic gen centric aggregated TX (how many pps are sent by the
>>         traffic gen on both interfaces)
>>       o or traffic gen centric aggregated TX+RX (how many pps are sent and
>>         received by the traffic gen on both interfaces)
>>
>> The pps is the bi-directional sum of the packets received back at the
>> traffic generator.
>>
>>     · From the numbers shown, it looks like it is the first or the last.
>>     · Unidirectional or symmetric bi-directional traffic?
>>
>> Symmetric bi-directional.
>>
>>     · BIOS Turbo Boost enabled or disabled?
>>
>> Disabled.
>>
>>     · How many vCPUs running the testpmd VM?
>>
>> 3, 5, or 7. 1 vCPU for housekeeping and then 2 vCPUs for each queue
>> configuration. Only the required vCPUs are active for any
>> configuration, so the vCPU count varies depending on the configuration
>> being tested.
>>
>>     · How do you range the combinations in your 1M flows src/dest
>>       MAC? I’m not aware of any real NFV cloud deployment/VNF that handles
>>       that type of flow pattern, do you?
>>
>> We increment all the fields being modified by one for each packet until
>> we hit a million, and then we restart at the base value and repeat. So
>> all IPs and/or MACs get modified in unison.
>>
>> We actually arrived at the srcMac,dstMac configuration in a backwards
>> manner. On one of the systems where we develop the traffic generator we
>> were getting an error when doing srcMac,dstMac,srcIp,dstIp that we
>> couldn't figure out in the time needed for this work, so we were going to
>> just go with srcMac,dstMac due to time constraints. However, on the
>> system where we actually did the testing both worked, so I just collected
>> both out of curiosity.
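The flow-cycling scheme Karl describes can be sketched as follows; this is a simplification with made-up base values, not the actual traffic-generator code.

```python
# Sketch of the 1M-flow pattern: every varied header field shares one
# packet-indexed offset that wraps at 1,000,000, so all fields change
# in unison. The base MAC values below are arbitrary placeholders.

FLOW_COUNT = 1_000_000

def nth_flow(packet_index, base_src_mac, base_dst_mac):
    """Return the (srcMac, dstMac) pair carried by the nth packet,
    with both MACs treated as plain integers for simplicity."""
    offset = packet_index % FLOW_COUNT
    return (base_src_mac + offset, base_dst_mac + offset)

# Packet 0 and packet 1,000,000 carry identical headers because the
# shared offset wraps back to the base value:
first = nth_flow(0, 0x001000000000, 0x002000000000)
wrapped = nth_flow(FLOW_COUNT, 0x001000000000, 0x002000000000)
assert first == wrapped
```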
>> Thanks
>>
>> Alec
>>
>> *From:* <vpp-dev-boun...@lists.fd.io> on behalf of "Maciek Konstantynowicz (mkonstan)" <mkons...@cisco.com>
>> *Date:* Wednesday, February 15, 2017 at 1:28 PM
>> *To:* Thomas F Herbert <therb...@redhat.com>
>> *Cc:* Andrew Theurer <atheu...@redhat.com>, Douglas Shakshober <dsh...@redhat.com>, "csit-...@lists.fd.io" <csit-...@lists.fd.io>, vpp-dev <vpp-dev@lists.fd.io>, Karl Rister <kris...@redhat.com>
>> *Subject:* Re: [vpp-dev] Interesting perf test results from Red Hat's test team
>>
>> Thomas, many thanks for sending this.
>>
>> A few comments and questions after reading the slides:
>>
>> 1. s3 clarification - host and data plane thread setup - vswitch pmd
>>    (data plane) thread placement
>>    a. "1PMD/core (4 core)” - HT (SMT) disabled, 4 phy cores used
>>       for vswitch, each with a data plane thread.
>>    b. “2PMD/core (2 core)” - HT (SMT) enabled, 2 phy cores, 4
>>       logical cores used for vswitch, each with a data plane thread.
>>    c. in both cases each data plane thread handles a single
>>       interface - 2* physical, 2* vhost => 4 threads, all busy.
>>    d. in both cases frames are dropped by the vswitch or in the vring due
>>       to the vswitch not keeping up - IOW testpmd in the kvm guest is not the DUT.
>> 2. s3 question - vswitch setup - it is unclear what the forwarding
>>    mode of each vswitch is, as only srcIp changed in flows
>>    a. flow or MAC learning mode?
>>    b. port to port crossconnect?
>> 3. s3 comment - host and data plane thread setup
>>    a. “2PMD/core (2 core)” case - thread placement may yield
>>       different results:
>>       - physical interface threads as siblings vs.
>>       - physical and virtual interface threads as siblings.
>>    b. "1PMD/core (4 core)” - one would expect these to be much
>>       higher than “2PMD/core (2 core)”
>>       - speculation: possibly due to "instruction load" imbalance
>>         between threads.
>>       - two types of thread with different "instruction load":
>>         phy->vhost vs. vhost->phy
>>       - "instruction load" = instr/pkt, instr/cycle (IPC efficiency).
>> 4. s4 comment - results look as expected for vpp
>> 5. s5 question - unclear why throughput doubled
>>    a. e.g. for vpp from "11.16 Mpps" to "22.03 Mpps"
>>    b. if only queues increased - or have cpu resources increased as well?
>> 6. s6 question - similar to point 5 - unclear cpu and thread resources.
>> 7. s7 comment - anomaly for 3q (virtio multi-queue) for (srcMac,dstMac)
>>    a. could be due to flow hashing inefficiency.
>>
>> -Maciek
>>
>> On 15 Feb 2017, at 17:34, Thomas F Herbert <therb...@redhat.com> wrote:
>>
>>     Here are test results on VPP 17.01 compared with OVS-DPDK
>>     2.6/16.11 performed by Karl Rister of Red Hat.
>>
>>     This is PVP testing with 1, 2 and 3 queues. It is an interesting
>>     comparison with the CSIT results. Of particular interest is the
>>     drop off on the 3 queue results.
>>     --TFH
>>
>>     --
>>     *Thomas F Herbert*
>>     SDN Group
>>     Office of Technology
>>     *Red Hat*
>>
>>     <vpp-17.01_vs_ovs-2.6.pdf>
>>     _______________________________________________
>>     vpp-dev mailing list
>>     vpp-dev@lists.fd.io
>>     https://lists.fd.io/mailman/listinfo/vpp-dev
>>
>> --
>> Karl Rister <kris...@redhat.com>
>
> --
> *Thomas F Herbert*
> SDN Group
> Office of Technology
> *Red Hat*

--
Karl Rister <kris...@redhat.com>
+ vppctl show interface
              Name               Idx       State          Counter          Count
TenGigabitEthernet1/0/0           1         up
TenGigabitEthernet1/0/1           2         up
VirtualEthernet0/0/0              3         up
VirtualEthernet0/0/1              4         up
local0                            0        down

+ vppctl show interface address
TenGigabitEthernet1/0/0 (up):
  l2 xconnect VirtualEthernet0/0/0
TenGigabitEthernet1/0/1 (up):
  l2 xconnect VirtualEthernet0/0/1
VirtualEthernet0/0/0 (up):
  l2 xconnect TenGigabitEthernet1/0/0
VirtualEthernet0/0/1 (up):
  l2 xconnect TenGigabitEthernet1/0/1
local0 (dn):

+ vppctl show hardware
              Name                Idx   Link  Hardware
TenGigabitEthernet1/0/0            1     up   TenGigabitEthernet1/0/0
  Ethernet address 24:6e:96:19:de:d8
  Intel 82599
    carrier up full duplex speed 10000 mtu 9216 promisc
    rx queues 2, rx desc 2048, tx queues 2, tx desc 2048
    cpu socket 0
    extended stats:
      mac local errors            3
      mac remote errors           1
TenGigabitEthernet1/0/1            2     up   TenGigabitEthernet1/0/1
  Ethernet address 24:6e:96:19:de:da
  Intel 82599
    carrier up full duplex speed 10000 mtu 9216 promisc
    rx queues 2, rx desc 2048, tx queues 2, tx desc 2048
    cpu socket 0
    extended stats:
      mac local errors            4
      mac remote errors           2
VirtualEthernet0/0/0               3     up   VirtualEthernet0/0/0
  Ethernet address 02:fe:e6:e7:8d:e8
VirtualEthernet0/0/1               4     up   VirtualEthernet0/0/1
  Ethernet address 02:fe:0d:5e:7f:3b
local0                             0    down  local0
  local

+ vppctl show threads
ID  Name       Type     LWP    Sched Policy (Priority)  lcore  Core  Socket  State
0   vpp_main            37859  other (0)                6      3     0       wait
1   vpp_wk_0   workers  37867  fifo (95)                10     5     0       running
2   vpp_wk_1   workers  37868  fifo (95)                12     6     0       running
3   vpp_wk_2   workers  37869  fifo (95)                14     8     0       running
4   vpp_wk_3   workers  37870  fifo (95)                16     9     0       running
5   vpp_wk_4   workers  37871  fifo (95)                38     5     0       running
6   vpp_wk_5   workers  37872  fifo (95)                40     6     0       running
7   vpp_wk_6   workers  37873  fifo (95)                42     8     0       running
8   vpp_wk_7   workers  37874  fifo (95)                44     9     0       running
9   stats               37877  other (0)                0      0     0       wait

+ vppctl show dpdk interface placement
Thread 5 (vpp_wk_4 at lcore 38):
  TenGigabitEthernet1/0/0 queue 0
Thread 6 (vpp_wk_5 at lcore 40):
  TenGigabitEthernet1/0/0 queue 1
Thread 7 (vpp_wk_6 at lcore 42):
  TenGigabitEthernet1/0/1 queue 0
Thread 8 (vpp_wk_7 at lcore 44):
  TenGigabitEthernet1/0/1 queue 1

+ vppctl show vhost-user
Virtio vhost-user interfaces
Global:
  coalesce frames 32 time 1e-3
Interface: VirtualEthernet0/0/0 (ifindex 3)
virtio_net_hdr_sz 12
 features mask (0xffffffffffffffff):
 features (0x50408000):
   VIRTIO_NET_F_MRG_RXBUF (15)
   VIRTIO_NET_F_MQ (22)
   VIRTIO_F_INDIRECT_DESC (28)
   VHOST_USER_F_PROTOCOL_FEATURES (30)
  protocol features (0x3)
   VHOST_USER_PROTOCOL_F_MQ (0)
   VHOST_USER_PROTOCOL_F_LOG_SHMFD (1)

 socket filename /var/run/vpp/vhost1 type server errno "Success"

 rx placement:
   thread 1 on vring 1
   thread 2 on vring 3
 tx placement: spin-lock
   thread 0 on vring 0
   thread 1 on vring 2
   thread 2 on vring 0
   thread 3 on vring 2
   thread 4 on vring 0
   thread 5 on vring 2
   thread 6 on vring 0
   thread 7 on vring 2
   thread 8 on vring 0

 Memory regions (total 2)
 region fd    guest_phys_addr    memory_size        userspace_addr     mmap_offset        mmap_addr
 ====== ===== ================== ================== ================== ================== ==================
  0     60    0x0000000100000000 0x0000000040000000 0x00007ff7c0000000 0x00000000c0000000 0x00002aaf00000000
  1     61    0x0000000000000000 0x00000000c0000000 0x00007ff700000000 0x0000000000000000 0x00002aaac0000000

 Virtqueue 0 (TX)
  qsz 256 last_avail_idx 0 last_used_idx 0
  avail.flags 1 avail.idx 256 used.flags 1 used.idx 0
  kickfd 58 callfd 59 errfd -1

 Virtqueue 1 (RX)
  qsz 256 last_avail_idx 0 last_used_idx 0
  avail.flags 1 avail.idx 0 used.flags 1 used.idx 0
  kickfd 62 callfd 63 errfd -1

 Virtqueue 2 (TX)
  qsz 256 last_avail_idx 0 last_used_idx 0
  avail.flags 1 avail.idx 256 used.flags 1 used.idx 0
  kickfd 64 callfd 65 errfd -1

 Virtqueue 3 (RX)
  qsz 256 last_avail_idx 0 last_used_idx 0
  avail.flags 1 avail.idx 0 used.flags 1 used.idx 0
  kickfd 66 callfd 67 errfd -1

Interface: VirtualEthernet0/0/1 (ifindex 4)
virtio_net_hdr_sz 12
 features mask (0xffffffffffffffff):
 features (0x50408000):
   VIRTIO_NET_F_MRG_RXBUF (15)
   VIRTIO_NET_F_MQ (22)
   VIRTIO_F_INDIRECT_DESC (28)
   VHOST_USER_F_PROTOCOL_FEATURES (30)
  protocol features (0x3)
   VHOST_USER_PROTOCOL_F_MQ (0)
   VHOST_USER_PROTOCOL_F_LOG_SHMFD (1)

 socket filename /var/run/vpp/vhost2 type server errno "Success"

 rx placement:
   thread 3 on vring 1
   thread 4 on vring 3
 tx placement: spin-lock
   thread 0 on vring 0
   thread 1 on vring 2
   thread 2 on vring 0
   thread 3 on vring 2
   thread 4 on vring 0
   thread 5 on vring 2
   thread 6 on vring 0
   thread 7 on vring 2
   thread 8 on vring 0

 Memory regions (total 2)
 region fd    guest_phys_addr    memory_size        userspace_addr     mmap_offset        mmap_addr
 ====== ===== ================== ================== ================== ================== ==================
  0     68    0x0000000100000000 0x0000000040000000 0x00007ff7c0000000 0x00000000c0000000 0x00002aac80000000
  1     71    0x0000000000000000 0x00000000c0000000 0x00007ff700000000 0x0000000000000000 0x00002aad00000000

 Virtqueue 0 (TX)
  qsz 256 last_avail_idx 0 last_used_idx 0
  avail.flags 1 avail.idx 256 used.flags 1 used.idx 0
  kickfd 69 callfd 70 errfd -1

 Virtqueue 1 (RX)
  qsz 256 last_avail_idx 0 last_used_idx 0
  avail.flags 1 avail.idx 0 used.flags 1 used.idx 0
  kickfd 72 callfd 73 errfd -1

 Virtqueue 2 (TX)
  qsz 256 last_avail_idx 0 last_used_idx 0
  avail.flags 1 avail.idx 256 used.flags 1 used.idx 0
  kickfd 74 callfd 75 errfd -1

 Virtqueue 3 (RX)
  qsz 256 last_avail_idx 0 last_used_idx 0
  avail.flags 1 avail.idx 0 used.flags 1 used.idx 0
  kickfd 76 callfd 77 errfd -1

+ vppctl show run
Thread 0 vpp_main (lcore 6)
Time 659.6, average vectors/node 0.00, last 128 main loops 0.00 per node 0.00
  vector rates in 0.0000e0, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
             Name                 State        Calls       Vectors     Suspends    Clocks  Vectors/Call
admin-up-down-process            event wait            0           0           1  7.51e4          0.00
api-rx-from-ring                 active                0           0         310  8.45e6          0.00
bfd-process                      event wait            0           0           1  2.32e4          0.00
cdp-process                      any wait              0           0         305  3.13e4          0.00
dhcp-client-process              any wait              0           0           7  3.64e3          0.00
dpdk-process                     any wait              0           0         202  4.37e5          0.00
fib-walk                         any wait              0           0      605875  2.35e2          0.00
flow-report-process              any wait              0           0           1  8.84e3          0.00
gmon-process                     time wait             0           0         121  1.54e3          0.00
ip6-icmp-neighbor-discovery-ev   any wait              0           0         604  5.06e2          0.00
l2fib-mac-age-scanner-process    event wait            0           0           1  1.39e4          0.00
lisp-retry-service               any wait              0           0         303  9.28e2          0.00
lldp-process                     event wait            0           0           1  7.99e6          0.00
startup-config-process           done                  1           0           1  2.44e4          0.00
unix-epoll-input                 polling       892432380           0           0  1.53e3          0.00
vhost-user-process               any wait              0           0         204  2.48e3          0.00
vpe-link-state-process           event wait            0           0          23  2.73e3          0.00
vpe-oam-process                  any wait              0           0         297  5.97e2          0.00
vpe-route-resolver-process       any wait              0           0           7  1.23e4          0.00
---------------
Thread 1 vpp_wk_0 (lcore 10)
Time 659.6, average vectors/node 0.00, last 128 main loops 0.00 per node 0.00
  vector rates in 0.0000e0, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
             Name                 State        Calls       Vectors     Suspends    Clocks  Vectors/Call
dpdk-input                       disabled        1348503           0           0  2.78e2          0.00
vhost-user-input                 polling      1484883589           0           0  2.89e2          0.00
---------------
Thread 2 vpp_wk_1 (lcore 12)
Time 659.6, average vectors/node 0.00, last 128 main loops 0.00 per node 0.00
  vector rates in 0.0000e0, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
             Name                 State        Calls       Vectors     Suspends    Clocks  Vectors/Call
dpdk-input                       disabled        1557231           0           0  2.74e2          0.00
vhost-user-input                 polling      1609161637           0           0  2.63e2          0.00
---------------
Thread 3 vpp_wk_2 (lcore 14)
Time 659.6, average vectors/node 0.00, last 128 main loops 0.00 per node 0.00
  vector rates in 0.0000e0, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
             Name                 State        Calls       Vectors     Suspends    Clocks  Vectors/Call
dpdk-input                       disabled        1759965           0           0  2.74e2          0.00
vhost-user-input                 polling      1521831133           0           0  2.80e2          0.00
---------------
Thread 4 vpp_wk_3 (lcore 16)
Time 659.6, average vectors/node 0.00, last 128 main loops 0.00 per node 0.00
  vector rates in 0.0000e0, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
             Name                 State        Calls       Vectors     Suspends    Clocks  Vectors/Call
dpdk-input                       disabled        1974685           0           0  2.69e2          0.00
vhost-user-input                 polling      1555190382           0           0  2.71e2          0.00
---------------
Thread 5 vpp_wk_4 (lcore 38)
Time 659.6, average vectors/node 0.00, last 128 main loops 0.00 per node 0.00
  vector rates in 0.0000e0, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
             Name                 State        Calls       Vectors     Suspends    Clocks  Vectors/Call
dpdk-input                       polling      3776242407           0           0  2.97e2          0.00
---------------
Thread 6 vpp_wk_5 (lcore 40)
Time 659.6, average vectors/node 0.00, last 128 main loops 0.00 per node 0.00
  vector rates in 0.0000e0, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
             Name                 State        Calls       Vectors     Suspends    Clocks  Vectors/Call
dpdk-input                       polling      3333481960           0           0  3.37e2          0.00
---------------
Thread 7 vpp_wk_6 (lcore 42)
Time 659.6, average vectors/node 0.00, last 128 main loops 0.00 per node 0.00
  vector rates in 0.0000e0, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
             Name                 State        Calls       Vectors     Suspends    Clocks  Vectors/Call
dpdk-input                       polling      3709143401           0           0  3.07e2          0.00
---------------
Thread 8 vpp_wk_7 (lcore 44)
Time 659.6, average vectors/node 0.00, last 128 main loops 0.00 per node 0.00
  vector rates in 0.0000e0, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
             Name                 State        Calls       Vectors     Suspends    Clocks  Vectors/Call
dpdk-input                       polling      3693949514           0           0  3.07e2          0.00
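For context on the thread packing discussed in this thread (e.g. lcore 10 and lcore 38 both showing Core 5 in `show threads`), hyperthread sibling pairs can be confirmed from sysfs on a standard Linux host. The helper below is a sketch, not part of the test setup; only the list-format parsing is exercised, and the expected pairings are an assumption read off the `show threads` output.

```python
# Hypothetical helper for checking hyperthread siblings via Linux sysfs.
# parse_siblings() handles both the "10,38" and "0-3" list formats that
# thread_siblings_list can contain.

def parse_siblings(text):
    """Parse a sysfs CPU list such as '10,38' or '0-3' into ints."""
    cpus = []
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return sorted(cpus)

def siblings_of(cpu):
    """Read the sibling list for one logical CPU (requires Linux)."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    with open(path) as f:
        return parse_siblings(f.read())

# If the placement above holds, one would expect siblings_of(10) to
# return [10, 38], siblings_of(12) to return [12, 40], and so on.
print(parse_siblings("10,38"))  # [10, 38]
```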