> -----Original Message-----
> From: pbhagavat...@marvell.com <pbhagavat...@marvell.com>
> Sent: Wednesday, March 24, 2021 10:35 AM
> To: jer...@marvell.com; Jayatheerthan, Jay <jay.jayatheert...@intel.com>;
> Carrillo, Erik G <erik.g.carri...@intel.com>; Gujjar, Abhinandan S
> <abhinandan.guj...@intel.com>; McDaniel, Timothy
> <timothy.mcdan...@intel.com>; hemant.agra...@nxp.com; Van Haaren, Harry
> <harry.van.haa...@intel.com>; mattias.ronnblom
> <mattias.ronnb...@ericsson.com>; Ma, Liang J <liang.j...@intel.com>
> Cc: dev@dpdk.org; Pavan Nikhilesh <pbhagavat...@marvell.com>
> Subject: [dpdk-dev] [PATCH v5 0/8] Introduce event vectorization
>
> From: Pavan Nikhilesh <pbhagavat...@marvell.com>
>
> In the traditional event programming model, events are identified by a
> flow-id and a uintptr_t. The flow-id uniquely identifies a given event
> and determines the order of scheduling based on the schedule type; the
> uintptr_t holds a single object.
>
> Event devices also support burst mode with a configurable dequeue depth,
> i.e. each dequeue call returns multiple events, and each event might be
> at a different stage of the pipeline.
> Having a burst of events belonging to different stages in a dequeue
> burst is not only difficult to vectorize, but also increases the
> scheduler overhead and the application overhead of pipelining events
> further.
> Using event vectors we see a performance gain of ~628%, as shown in [1].

This is a very impressive performance boost. Thanks so much for putting this patchset together! Just curious, was any performance measurement done for existing applications (non-vector)?

>
> By introducing event vectorization, each event will be capable of holding
> multiple uintptr_t of the same flow, thereby allowing applications
> to vectorize their pipeline and reduce the complexity of pipelining
> events across multiple stages. This also reduces the complexity of
> handling enqueue and dequeue on an event device.
>
> Since event devices are transparent to the events they are scheduling,
> event producers such as the eth_rx_adapter, crypto_adapter, etc. are
> responsible for vectorizing buffers of the same flow into a single
> event.
>
> The series also breaks ABI in patch [8/8], which is targeted at the
> v21.11 release.
>
> The dpdk-test-eventdev application has been updated with options to test
> multiple vector sizes and timeouts.
>
> [1]
> As for the performance improvement: with an ARM Cortex-A72 equivalent
> processor, the software event device (--vdev=event_sw0), a single worker
> core, a single stage, and one service core for the Rx adapter, Tx adapter
> and scheduling:
>
> Without event vectorization:
> ./build/app/dpdk-test-eventdev -l 7-23 -s 0x700 --vdev="event_sw0" --
> --prod_type_ethdev --nb_pkts=0 --verbose 2 --test=pipeline_queue
> --stlist=a --wlcores=20
> Port[0] using Rx adapter[0] configured
> Port[0] using Tx adapter[0] Configured
> 4.728 mpps avg 4.728 mpps

Is this number from before the patchset? If so, it would help to put a similar number measured with the patchset applied but without using the vectorization feature.
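On the API itself, for anyone else reading along: piecing together rte_event_vector_pool_create() and the fastpath snippet further down, my understanding is that the vector event carries an element count plus an array of same-flow objects, roughly along these lines. This is my paraphrase from the patches, not the authoritative definition (patch 1/8 has that); only nb_elem and mbufs[] appear verbatim in the cover letter below:

        /*
         * Rough shape of the vector event, inferred from the usage in
         * this cover letter. Everything except nb_elem and mbufs[] is
         * a placeholder.
         */
        struct rte_event_vector {
                uint16_t nb_elem;       /* number of valid objects */
                /* ... attributes: source port/queue validity, etc. ... */
                union {
                        struct rte_mbuf *mbufs[0]; /* same-flow packets */
                        void *ptrs[0];             /* generic objects */
                };
        };

So a single dequeue hands the worker a whole same-flow, same-stage burst, which is what makes per-stage processing vectorizable.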
>
> With event vectorization:
> ./build/app/dpdk-test-eventdev -l 7-23 -s 0x700 --vdev="event_sw0" --
> --prod_type_ethdev --nb_pkts=0 --verbose 2 --test=pipeline_queue
> --stlist=a --wlcores=20 --enable_vector --nb_eth_queues 1
> --vector_size 256
> Port[0] using Rx adapter[0] configured
> Port[0] using Tx adapter[0] Configured
> 34.383 mpps avg 34.383 mpps
>
> Having a dedicated service core for each Rx queue and tweaking the vector
> and dequeue burst sizes would further improve performance.
>
> API usage is shown below:
>
> Configuration:
>
>         struct rte_event_eth_rx_adapter_event_vector_config vec_conf;
>
>         vector_pool = rte_event_vector_pool_create("vector_pool",
>                         nb_elem, 0, vector_size, socket_id);
>
>         rte_event_eth_rx_adapter_create(id, event_id, &adptr_conf);
>         rte_event_eth_rx_adapter_queue_add(id, eth_id, -1, &queue_conf);
>         if (cap & RTE_EVENT_ETH_RX_ADAPTER_CAP_EVENT_VECTOR) {
>                 vec_conf.vector_sz = vector_size;
>                 vec_conf.vector_timeout_ns = vector_tmo_nsec;
>                 vec_conf.vector_mp = vector_pool;
>                 rte_event_eth_rx_adapter_queue_event_vector_config(id,
>                                 eth_id, -1, &vec_conf);
>         }
>
> Fastpath:
>
>         num = rte_event_dequeue_burst(event_id, port_id, &ev, 1, 0);
>         if (!num)
>                 continue;
>
>         if (ev.event_type & RTE_EVENT_TYPE_VECTOR) {
>                 switch (ev.event_type) {
>                 case RTE_EVENT_TYPE_ETHDEV_VECTOR:
>                 case RTE_EVENT_TYPE_ETH_RX_ADAPTER_VECTOR:
>                         struct rte_mbuf **mbufs;
>
>                         mbufs = ev.vector_ev->mbufs;
>                         for (i = 0; i < ev.vector_ev->nb_elem; i++)
>                                 // Process mbufs.
>                         break;
>                 case ...
>                 }
>         }
>         ...
>
> v5 Changes:
> - Make `rte_event_vector_pool_create` non-inline to ease ABI stability. (Ray)
> - Move `rte_event_eth_rx_adapter_queue_event_vector_config` and
>   `rte_event_eth_rx_adapter_vector_limits_get` implementations to the patch
>   where they are initially defined. (Ray)
> - Multiple grammatical and style fixes. (Jerin)
> - Add missing release notes. (Jerin)
>
> v4 Changes:
> - Fix missing event vector structure in event structure. (Jay)
>
> v3 Changes:
> - Fix unintended formatting changes.
>
> v2 Changes:
> - Multiple grammatical and style fixes. (Jerin)
> - Add parameter to define vector size as a power of 2. (Jerin)
> - Redo patch series w/o breaking ABI till the last patch. (David)
> - Add deprecation notice to announce the ABI break in 21.11. (David)
> - Add vector limits validation to app/test-eventdev.
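One small suggestion on the configuration example above: since v5 also exposes rte_event_eth_rx_adapter_vector_limits_get(), it may be worth showing the requested size and timeout being validated against the adapter limits before the queue config call. Something along these lines (a sketch only; the limit field names are my reading of the patches and may not match exactly):

        struct rte_event_eth_rx_adapter_vector_limits limits;

        rte_event_eth_rx_adapter_vector_limits_get(event_id, eth_id, &limits);
        /* Reject (or clamp) values outside what the adapter supports. */
        if (vector_size < limits.min_sz || vector_size > limits.max_sz ||
            vector_tmo_nsec < limits.min_timeout_ns ||
            vector_tmo_nsec > limits.max_timeout_ns)
                rte_exit(EXIT_FAILURE, "vector config out of adapter limits\n");

        vec_conf.vector_sz = vector_size;
        vec_conf.vector_timeout_ns = vector_tmo_nsec;
        ...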
>
> Pavan Nikhilesh (8):
>   eventdev: introduce event vector capability
>   eventdev: introduce event vector Rx capability
>   eventdev: introduce event vector Tx capability
>   eventdev: add Rx adapter event vector support
>   eventdev: add Tx adapter event vector support
>   app/eventdev: add event vector mode in pipeline test
>   doc: announce event Rx adapter config changes
>   eventdev: simplify Rx adapter event vector config
>
>  app/test-eventdev/evt_common.h                 |   4 +
>  app/test-eventdev/evt_options.c                |  52 +++
>  app/test-eventdev/evt_options.h                |   4 +
>  app/test-eventdev/test_pipeline_atq.c          | 310 +++++++++++++++--
>  app/test-eventdev/test_pipeline_common.c       | 105 +++++-
>  app/test-eventdev/test_pipeline_common.h       |  18 +
>  app/test-eventdev/test_pipeline_queue.c        | 320 ++++++++++++++++--
>  .../prog_guide/event_ethernet_rx_adapter.rst   |  38 +++
>  .../prog_guide/event_ethernet_tx_adapter.rst   |  12 +
>  doc/guides/prog_guide/eventdev.rst             |  36 +-
>  doc/guides/rel_notes/deprecation.rst           |   9 +
>  doc/guides/rel_notes/release_21_05.rst         |   8 +
>  doc/guides/tools/testeventdev.rst              |  45 ++-
>  lib/librte_eventdev/eventdev_pmd.h             |  31 +-
>  .../rte_event_eth_rx_adapter.c                 | 305 ++++++++++++++++-
>  .../rte_event_eth_rx_adapter.h                 |  78 +++++
>  .../rte_event_eth_tx_adapter.c                 |  66 +++-
>  lib/librte_eventdev/rte_eventdev.c             |  53 ++-
>  lib/librte_eventdev/rte_eventdev.h             | 113 ++++++-
>  lib/librte_eventdev/version.map                |   4 +
>  20 files changed, 1524 insertions(+), 87 deletions(-)
>
> --
> 2.17.1
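One more thought on the fastpath example in the cover letter: for the eventdev programmer's guide it might help to expand it into a complete worker loop, so the scalar and vector paths sit side by side. A rough sketch of what I mean (process_pkt() and the done flag are placeholders, and the per-stage queue update / Tx handoff is elided):

        static volatile bool done;

        static void
        worker_loop(uint8_t event_id, uint8_t port_id)
        {
                struct rte_event ev;
                uint16_t i, num;

                while (!done) {
                        num = rte_event_dequeue_burst(event_id, port_id,
                                                      &ev, 1, 0);
                        if (!num)
                                continue;

                        if (ev.event_type & RTE_EVENT_TYPE_VECTOR) {
                                /* All elements belong to the same flow. */
                                for (i = 0; i < ev.vector_ev->nb_elem; i++)
                                        process_pkt(ev.vector_ev->mbufs[i]);
                        } else {
                                /* Plain event carrying a single mbuf. */
                                process_pkt(ev.mbuf);
                        }

                        /*
                         * Forward the event to the next stage; the vector
                         * stays intact across stages.
                         */
                        ev.op = RTE_EVENT_OP_FORWARD;
                        rte_event_enqueue_burst(event_id, port_id, &ev, 1);
                }
        }

That keeps the mbuf loop identical to the snippet above while making the enqueue side explicit.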
Just a heads up: the v5 patchset doesn't apply cleanly on HEAD (5f0849c1155849dfdbf950c91c52cdf9cd301f59). It does, however, apply cleanly on top of "app/eventdev: fix timeout accuracy" (c33d48387dc8ccf1b432820f6e0cd4992ab486df).