> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Thursday, April 21, 2016 5:54 PM
> To: Wang, Zhihong <zhihong.wang at intel.com>
> Cc: dev at dpdk.org; De Lara Guarch, Pablo <pablo.de.lara.guarch at intel.com>
> Subject: Re: [dpdk-dev] [RFC PATCH 0/2] performance utility in testpmd
>
> 2016-04-20 18:43, Zhihong Wang:
> > This RFC patch proposes a general purpose forwarding engine in testpmd
> > named "portfwd", to enable performance analysis and tuning for poll mode
> > drivers in vSwitching scenarios.
> >
> >
> > Problem statement
> > -----------------
> >
> > vSwitching is more I/O bound in a lot of cases since there are a lot of
> > LLC/cross-core memory accesses.
> >
> > In order to reveal memory/cache behavior in real usage scenarios and
> > enable efficient performance analysis and tuning for vSwitching, DPDK
> > needs a sample application that supports traffic flows close to real
> > deployment, e.g. multi-tenancy, service chaining.
> >
> > There is currently a vhost sample application to enable simple vSwitching
> > scenarios, but it comes with several limitations:
> >
> > 1) Traffic flow is too simple and not flexible
> >
> > 2) Switching based on MAC/VLAN only
> >
> > 3) Not enough performance metrics
> >
> >
> > Proposed solution
> > -----------------
> >
> > The testpmd sample application is a good choice: it's a powerful poll
> > mode driver management framework that hosts various forwarding engines.
>
> Not sure it is a good choice.
> The goal of testpmd is to test every PMD feature.
> How far can we go in adding some stack processing while keeping it
> easily maintainable?
Thanks for the quick response!

This utility is not for vSwitching in particular, it just adds more
forwarding setup capabilities to testpmd.

testpmd is composed of separate components:

 1) the PMD management framework

 2) the forwarding engines, each consisting of:
    a) a traffic setup function
    b) a forwarding function

When adding a new fwd engine, only the new traffic setup function and
forwarding function (and maybe command handlers) are added; no existing
code is touched (a rough sketch of what registering such an engine looks
like follows the quoted text further below). So it doesn't make testpmd
harder to maintain.

It also doesn't change the current behavior at all: by default it's still
iofwd, and the user switches to portfwd only when flexible forwarding
rules are needed.

Also, I believe that in both the DPDK and OVS-DPDK communities testpmd
has already become a widely used tool to set up performance and
functional tests, and there have been complaints about its usability and
flexibility.

Just one of many examples to show why we need a feature-rich fwd engine:

There was an OVS bug reported by Red Hat that took both OVS and DPDK a
long time to investigate, and it turned out to be a testpmd setup issue:
they used testpmd in the guest to do the forwarding, and when multiqueue
is enabled, the current testpmd has to use a separate core for each rxq,
so insufficient cores result in unattended rxqs, which is neither the
expected result nor a necessary limitation.

Also, while OVS-DPDK is integrating multiqueue, a lot of cores have to be
assigned to the VM to handle all the rxqs for the test, which limits both
performance and functional testing because a single NUMA node has only a
limited number of cores.

Another point is the learning curve of the DPDK sample applications: we
could use portfwd for all kinds of PMD tests (both host and guest; nic,
vhost and virtio pmds, etc.), and it's simple to use, instead of using
different apps, like the vhost sample in the host and testpmd in the
guest.

> > Now with the vhost pmd feature, it can also handle vhost devices, only a
> > new forwarding engine is needed to make use of it.
>
> Why is a new forwarding engine needed for vhost?

Apologies for my poor English. What I meant is that with the vhost pmd
feature, testpmd has already become a vSwitch; we just need to add more
forwarding setup capability to make use of it.

> > portfwd is implemented to this end.
> >
> > Features of portfwd:
> >
> > 1) Build up traffic from simple rx/tx to complex scenarios easily
> >
> > 2) Rich performance statistics for all ports
>
> Have you checked CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES and
> CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS?

These stats are good; it would be even better to have per rx/tx cycle and
burst size info for each port in portfwd, like:

cycle stat (since last show)
----------------------------
port 0, burst 32,
rx, run, min, avg, max,
 0, 0, 0, 0, 0,
 1, 21, 596, 663, 752,
 2, 289, 580, 725, 1056,
 3, 6, 644, 686, 796,
 4, 153, 656, 724, 896,
[...]
32, 1208666, 756, 1206, 19212,
tx, run, min, avg, max,
 0, 0, 0, 0, 0,
[...]
32, 1208652, 476, 559, 12144,

> > 3) Core affinity manipulation
> >
> > 4) Commands for run time configuration
> >
> > Notice that portfwd has fair performance, but it's not for getting the
> > "maximum" numbers:
> >
> > 1) It buffers packets for burst send efficiency analysis, which increases
> > latency
> >
> > 2) It touches the packet header and collects performance statistics,
> > which adds overhead
> >
> > These "extra" overheads are actually what happens in real applications.
> [...]
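To illustrate the maintainability point above: adding portfwd would
basically mean one more fwd_engine entry next to the existing ones,
roughly like the sketch below. struct fwd_engine, packet_fwd_t and
fwd_engines[] are the existing testpmd definitions; the portfwd symbols
are just placeholders, not the actual patch code.

/* Rough sketch only -- how a new engine would plug into testpmd's
 * existing framework (app/test-pmd). The forwarding function body is
 * omitted here; see the fwd function sketch at the end of this mail. */

#include "testpmd.h"

/* hypothetical portfwd forwarding function: one rte_eth_rx_burst(),
 * a dynamic destination lookup, then rte_eth_tx_burst() per stream */
static void
pkt_burst_port_forward(struct fwd_stream *fs)
{
	(void)fs; /* placeholder body */
}

struct fwd_engine port_fwd_engine = {
	.fwd_mode_name  = "portfwd",
	.port_fwd_begin = NULL,  /* or a portfwd-specific setup hook */
	.port_fwd_end   = NULL,
	.packet_fwd     = pkt_burst_port_forward,
};

/* plus one more entry in the fwd_engines[] list in testpmd.c:
 *
 *     struct fwd_engine *fwd_engines[] = {
 *             &io_fwd_engine,
 *             ...
 *             &port_fwd_engine,
 *             NULL,
 *     };
 */

None of the existing engines would need to change for this.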
> > Implementation details
> > ----------------------
> >
> > To enable flexible traffic flow setup, each port has 2 ways to forward
> > packets in portfwd:
>
> Shouldn't it be 2 forward engines?
> Please first describe the existing engines to help make a decision.

It's actually 1 engine. A fwd engine means a forwarding function to be
called in the testpmd framework.

Take iofwd for example. Its fwd function, pkt_burst_io_forward, simply
calls rte_eth_rx_burst for an rxq and then rte_eth_tx_burst to the fixed
mapping txq.

In portfwd it's basically the same, except that we get the dst port and
queue dynamically before rte_eth_tx_burst, that's all (see the sketch at
the end of this mail).

Current engines are:

 * csumonly.c
 * flowgen.c
 * icmpecho.c
 * ieee1588fwd.c
 * iofwd.c
 * macfwd.c
 * macfwd-retry.c
 * macswap.c
 * rxonly.c
 * txonly.c

All of them have a fixed traffic setup. For instance, if we have 3 ports,
the traffic will look like this:

Logical Core 14 (socket 0) forwards packets on 3 streams:
  0: RX P=0/Q=0 (socket 0) -> TX P=2/Q=0 (socket 0) peer=02:00:00:00:00:02
  0: RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
  0: RX P=2/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00

And you can't change it into something like:

  port 0 -> port 1 -> port 2

Not to mention the multiqueue limitation and core affinity manipulation.
For example, when we have 2 ports, each with 2 queues, running on 1 core,
the traffic will be:

Logical Core 14 (socket 0) forwards packets on 1 streams:
  0: RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01

Only 1 rxq will be handled. This is the Red Hat issue I mentioned above.

> > 1) Forward based on dst ip
> [...]
> > 2) Forward to a fixed port
> [...]
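To make the iofwd vs. portfwd comparison above concrete, here is a rough
sketch of the two styles of fwd function. It is not the actual patch
code: the lookup_dst() helper is purely hypothetical and stands for the
dst-ip / fixed-port rules configured at run time.

/* Sketch only. Both styles share the same rx/tx burst pattern; the
 * difference is how the destination port/queue is chosen. */

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* hypothetical rule lookup, not part of DPDK or the patch */
extern void lookup_dst(uint16_t rx_port, struct rte_mbuf *m,
		       uint16_t *dst_port, uint16_t *dst_q);

/* iofwd style: dst port/queue fixed per stream at setup time */
void
fwd_fixed(uint16_t rx_port, uint16_t rx_q, uint16_t tx_port, uint16_t tx_q)
{
	struct rte_mbuf *pkts[BURST_SIZE];
	uint16_t nb_rx, nb_tx;

	nb_rx = rte_eth_rx_burst(rx_port, rx_q, pkts, BURST_SIZE);
	if (nb_rx == 0)
		return;
	nb_tx = rte_eth_tx_burst(tx_port, tx_q, pkts, nb_rx);
	while (nb_tx < nb_rx)            /* drop what could not be sent */
		rte_pktmbuf_free(pkts[nb_tx++]);
}

/* portfwd style: dst resolved per packet from run-time rules */
void
fwd_dynamic(uint16_t rx_port, uint16_t rx_q)
{
	struct rte_mbuf *pkts[BURST_SIZE];
	uint16_t nb_rx, i;

	nb_rx = rte_eth_rx_burst(rx_port, rx_q, pkts, BURST_SIZE);
	for (i = 0; i < nb_rx; i++) {
		uint16_t dst_port, dst_q;

		/* in the real engine, packets would be buffered per
		 * destination to keep tx bursts large, which is the
		 * latency/efficiency trade-off mentioned above */
		lookup_dst(rx_port, pkts[i], &dst_port, &dst_q);
		if (rte_eth_tx_burst(dst_port, dst_q, &pkts[i], 1) == 0)
			rte_pktmbuf_free(pkts[i]);
	}
}

The rx/tx burst pattern is identical; only the way the destination is
chosen differs, which is why it stays a single engine.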