>> From: pbhagavat...@marvell.com <pbhagavat...@marvell.com>
>> Sent: Wednesday, March 24, 2021 10:35 AM
>> To: jer...@marvell.com; Jayatheerthan, Jay <jay.jayatheert...@intel.com>;
>>     Carrillo, Erik G <erik.g.carri...@intel.com>;
>>     Gujjar, Abhinandan S <abhinandan.guj...@intel.com>;
>>     McDaniel, Timothy <timothy.mcdan...@intel.com>; hemant.agra...@nxp.com;
>>     Van Haaren, Harry <harry.van.haa...@intel.com>;
>>     mattias.ronnblom <mattias.ronnb...@ericsson.com>;
>>     Ma, Liang J <liang.j...@intel.com>
>> Cc: dev@dpdk.org; Pavan Nikhilesh <pbhagavat...@marvell.com>
>> Subject: [dpdk-dev] [PATCH v5 0/8] Introduce event vectorization
>>
>> From: Pavan Nikhilesh <pbhagavat...@marvell.com>
>>
>> In the traditional event programming model, events are identified by a
>> flow-id and a uintptr_t. The flow-id uniquely identifies a given event
>> and determines the order of scheduling based on schedule type; the
>> uintptr_t holds a single object.
>>
>> Event devices also support burst mode with a configurable dequeue depth,
>> i.e. each dequeue call can return multiple events, and each event might
>> be at a different stage of the pipeline.
>> A dequeue burst containing events that belong to different stages is not
>> only difficult to vectorize but also increases the scheduler overhead and
>> the application overhead of pipelining events further.
>> Using event vectors we see a performance gain of ~628%, as shown in [1].
>
>This is a very impressive performance boost. Thanks so much for putting
>this patchset together! Just curious, was any performance measurement
>done for existing applications (non-vector)?
>
>> By introducing event vectorization, each event will be capable of holding
>> multiple uintptr_t of the same flow, thereby allowing applications to
>> vectorize their pipeline and reduce the complexity of pipelining events
>> across multiple stages. This also reduces the complexity of handling
>> enqueue and dequeue on an event device.
>>
>> Since event devices are transparent to the events they are scheduling,
>> the event producers such as eth_rx_adapter, crypto_adapter, etc. are
>> responsible for vectorizing the buffers of the same flow into a single
>> event.
>>
>> The series also breaks ABI in patch [8/8], which is targeted at the
>> v21.11 release.
>>
>> The dpdk-test-eventdev application has been updated with options to test
>> multiple vector sizes and timeouts.
>>
>> [1]
>> As for performance improvement: measured on an ARM Cortex-A72 equivalent
>> processor, with the software event device (--vdev=event_sw0), a single
>> worker core, a single stage, and one service core for the Rx adapter,
>> Tx adapter and scheduling.
>>
>> Without event vectorization:
>> ./build/app/dpdk-test-eventdev -l 7-23 -s 0x700 --vdev="event_sw0" --
>> --prod_type_ethdev --nb_pkts=0 --verbose 2 --test=pipeline_queue
>> --stlist=a --wlcores=20
>> Port[0] using Rx adapter[0] configured
>> Port[0] using Tx adapter[0] Configured
>> 4.728 mpps avg 4.728 mpps
>
>Is this number from before the patchset? If so, it would help to put a
>similar number measured with the patchset applied but without using the
>vectorization feature.
I don't remember the exact clock frequency I was using when I ran the
above test, but with equal clocks:

1. Without the patchset applied:                       5.071 mpps
2. With the patchset applied, w/o enabling vector:     5.123 mpps
3. With the patchset applied, with vector enabled:
       vector_sz@256                                  42.715 mpps
       vector_sz@512                                  45.335 mpps

>>
>> With event vectorization:
>> ./build/app/dpdk-test-eventdev -l 7-23 -s 0x700 --vdev="event_sw0" --
>> --prod_type_ethdev --nb_pkts=0 --verbose 2 --test=pipeline_queue
>> --stlist=a --wlcores=20 --enable_vector --nb_eth_queues 1
>> --vector_size 256
>> Port[0] using Rx adapter[0] configured
>> Port[0] using Tx adapter[0] Configured
>> 34.383 mpps avg 34.383 mpps
>>
>> Having dedicated service cores for each Rx queue and tweaking the vector
>> and dequeue burst sizes would further improve performance.
>>
>> API usage is shown below:
>>
>> Configuration:
>>
>>     struct rte_event_eth_rx_adapter_event_vector_config vec_conf;
>>
>>     vector_pool = rte_event_vector_pool_create("vector_pool",
>>                     nb_elem, 0, vector_size, socket_id);
>>
>>     rte_event_eth_rx_adapter_create(id, event_id, &adptr_conf);
>>     rte_event_eth_rx_adapter_queue_add(id, eth_id, -1, &queue_conf);
>>     if (cap & RTE_EVENT_ETH_RX_ADAPTER_CAP_EVENT_VECTOR) {
>>             vec_conf.vector_sz = vector_size;
>>             vec_conf.vector_timeout_ns = vector_tmo_nsec;
>>             vec_conf.vector_mp = vector_pool;
>>             rte_event_eth_rx_adapter_queue_event_vector_config(id,
>>                             eth_id, -1, &vec_conf);
>>     }
>>
>> Fastpath:
>>
>>     num = rte_event_dequeue_burst(event_id, port_id, &ev, 1, 0);
>>     if (!num)
>>             continue;
>>
>>     if (ev.event_type & RTE_EVENT_TYPE_VECTOR) {
>>             switch (ev.event_type) {
>>             case RTE_EVENT_TYPE_ETHDEV_VECTOR:
>>             case RTE_EVENT_TYPE_ETH_RX_ADAPTER_VECTOR: {
>>                     struct rte_mbuf **mbufs;
>>
>>                     mbufs = ev.vector_ev->mbufs;
>>                     for (i = 0; i < ev.vector_ev->nb_elem; i++) {
>>                             /* Process mbufs. */
>>                     }
>>                     break;
>>             }
>>             case ...
>>             }
>>     }
>>     ...
>>
>> v5 Changes:
>> - Make `rte_event_vector_pool_create` non-inline to ease ABI stability. (Ray)
>> - Move `rte_event_eth_rx_adapter_queue_event_vector_config` and
>>   `rte_event_eth_rx_adapter_vector_limits_get` implementations to the
>>   patch where they are initially defined. (Ray)
>> - Multiple grammatical and style fixes. (Jerin)
>> - Add missing release notes. (Jerin)
>>
>> v4 Changes:
>> - Fix missing event vector structure in event structure. (Jay)
>>
>> v3 Changes:
>> - Fix unintended formatting changes.
>>
>> v2 Changes:
>> - Multiple grammatical and style fixes. (Jerin)
>> - Add parameter to define vector size in powers of 2. (Jerin)
>> - Redo patch series w/o breaking ABI till the last patch. (David)
>> - Add deprecation notice to announce ABI break in 21.11. (David)
>> - Add vector limits validation to app/test-eventdev.
>>
>> Pavan Nikhilesh (8):
>>   eventdev: introduce event vector capability
>>   eventdev: introduce event vector Rx capability
>>   eventdev: introduce event vector Tx capability
>>   eventdev: add Rx adapter event vector support
>>   eventdev: add Tx adapter event vector support
>>   app/eventdev: add event vector mode in pipeline test
>>   doc: announce event Rx adapter config changes
>>   eventdev: simplify Rx adapter event vector config
>>
>>  app/test-eventdev/evt_common.h                 |   4 +
>>  app/test-eventdev/evt_options.c                |  52 +++
>>  app/test-eventdev/evt_options.h                |   4 +
>>  app/test-eventdev/test_pipeline_atq.c          | 310 +++++++++++++++--
>>  app/test-eventdev/test_pipeline_common.c       | 105 +++++-
>>  app/test-eventdev/test_pipeline_common.h       |  18 +
>>  app/test-eventdev/test_pipeline_queue.c        | 320 ++++++++++++++++--
>>  .../prog_guide/event_ethernet_rx_adapter.rst   |  38 +++
>>  .../prog_guide/event_ethernet_tx_adapter.rst   |  12 +
>>  doc/guides/prog_guide/eventdev.rst             |  36 +-
>>  doc/guides/rel_notes/deprecation.rst           |   9 +
>>  doc/guides/rel_notes/release_21_05.rst         |   8 +
>>  doc/guides/tools/testeventdev.rst              |  45 ++-
>>  lib/librte_eventdev/eventdev_pmd.h             |  31 +-
>>  .../rte_event_eth_rx_adapter.c                 | 305 ++++++++++++++++-
>>  .../rte_event_eth_rx_adapter.h                 |  78 +++++
>>  .../rte_event_eth_tx_adapter.c                 |  66 +++-
>>  lib/librte_eventdev/rte_eventdev.c             |  53 ++-
>>  lib/librte_eventdev/rte_eventdev.h             | 113 ++++++-
>>  lib/librte_eventdev/version.map                |   4 +
>>  20 files changed, 1524 insertions(+), 87 deletions(-)
>>
>> --
>> 2.17.1
>
>Just a heads up: the v5 patchset doesn't apply cleanly on HEAD
>(5f0849c1155849dfdbf950c91c52cdf9cd301f59). It does, however, apply
>cleanly on "app/eventdev: fix timeout accuracy"
>(c33d48387dc8ccf1b432820f6e0cd4992ab486df).

This patchset is currently rebased on the main branch; I will rebase it
on dpdk-next-event in the next version.
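
For readers following the "Configuration" snippet quoted above, here is a
rough, self-contained sketch of that setup flow. It is only illustrative:
rx_adapter_setup(), VECTOR_SIZE, VECTOR_TMO_NS and NB_VECTOR_ELEM are
hypothetical names, adptr_conf/queue_conf are assumed to be filled in
elsewhere, and the rte_event_eth_rx_adapter_queue_event_vector_config()
call follows the v5 API used in this series (patch 8/8 later reworks this
configuration path).

    #include <errno.h>
    #include <string.h>
    #include <rte_eventdev.h>
    #include <rte_event_eth_rx_adapter.h>
    #include <rte_mempool.h>

    /* Hypothetical parameters; an application would derive these from its
     * own configuration (and from the adapter's vector limits). */
    #define VECTOR_SIZE    256            /* mbufs aggregated per vector */
    #define VECTOR_TMO_NS  (250 * 1000)   /* 250 us aggregation timeout  */
    #define NB_VECTOR_ELEM 4096           /* vectors in the mempool      */

    static struct rte_mempool *vector_pool;

    static int
    rx_adapter_setup(uint8_t id, uint8_t event_id, uint16_t eth_id,
                     int socket_id, struct rte_event_port_conf *adptr_conf,
                     struct rte_event_eth_rx_adapter_queue_conf *queue_conf)
    {
        struct rte_event_eth_rx_adapter_event_vector_config vec_conf;
        uint32_t cap;
        int ret;

        /* Pool of rte_event_vector objects, each holding up to VECTOR_SIZE mbufs. */
        vector_pool = rte_event_vector_pool_create("vector_pool", NB_VECTOR_ELEM,
                                                   0, VECTOR_SIZE, socket_id);
        if (vector_pool == NULL)
            return -ENOMEM;

        ret = rte_event_eth_rx_adapter_create(id, event_id, adptr_conf);
        if (ret)
            return ret;

        /* -1 adds all Rx queues of the ethdev to the adapter. */
        ret = rte_event_eth_rx_adapter_queue_add(id, eth_id, -1, queue_conf);
        if (ret)
            return ret;

        /* Enable vectorization only if the adapter advertises the capability. */
        ret = rte_event_eth_rx_adapter_caps_get(event_id, eth_id, &cap);
        if (ret)
            return ret;

        if (cap & RTE_EVENT_ETH_RX_ADAPTER_CAP_EVENT_VECTOR) {
            memset(&vec_conf, 0, sizeof(vec_conf));
            vec_conf.vector_sz = VECTOR_SIZE;
            vec_conf.vector_timeout_ns = VECTOR_TMO_NS;
            vec_conf.vector_mp = vector_pool;
            ret = rte_event_eth_rx_adapter_queue_event_vector_config(
                id, eth_id, -1, &vec_conf);
        }

        return ret;
    }

As in the cover letter, vector aggregation is only configured when the
adapter reports RTE_EVENT_ETH_RX_ADAPTER_CAP_EVENT_VECTOR.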
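
Similarly, a minimal worker-loop sketch expanding the "Fastpath" snippet
above. Field names such as ev.vector_ev follow that snippet and may differ
in the final merged API; process_mbuf() and next_queue are hypothetical
placeholders for the application's own pipeline logic.

    #include <rte_common.h>
    #include <rte_eventdev.h>
    #include <rte_mbuf.h>

    /* Hypothetical per-packet work; replace with the application's logic. */
    static void
    process_mbuf(struct rte_mbuf *m)
    {
        RTE_SET_USED(m);
    }

    /* Dequeue one event at a time, handle scalar and vector events, and
     * forward the event to the next pipeline stage. */
    static void
    worker_loop(uint8_t dev_id, uint8_t port_id, uint8_t next_queue)
    {
        struct rte_event ev;
        uint16_t i, num;

        while (1) {
            num = rte_event_dequeue_burst(dev_id, port_id, &ev, 1, 0);
            if (!num)
                continue;

            if (ev.event_type & RTE_EVENT_TYPE_VECTOR) {
                /* All mbufs in the vector belong to the same flow. */
                struct rte_mbuf **mbufs = ev.vector_ev->mbufs;

                for (i = 0; i < ev.vector_ev->nb_elem; i++)
                    process_mbuf(mbufs[i]);
            } else {
                /* Scalar event carrying a single mbuf. */
                process_mbuf(ev.mbuf);
            }

            /* Forward the (possibly vector) event to the next stage. */
            ev.queue_id = next_queue;
            ev.op = RTE_EVENT_OP_FORWARD;
            while (rte_event_enqueue_burst(dev_id, port_id, &ev, 1) != 1)
                ;
        }
    }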