l3fwd-graph: introduce multicore dispatch worker model

Yan, Zhirun Mon, 05 Jun 2023 22:11:08 -0700


> -----Original Message-----
> From: Jerin Jacob <jerinjac...@gmail.com>
> Sent: Monday, June 5, 2023 9:42 PM
> To: Yan, Zhirun <zhirun....@intel.com>
> Cc: dev@dpdk.org; jer...@marvell.com; kirankum...@marvell.com;
> ndabilpu...@marvell.com; step...@networkplumber.org;
> pbhagavat...@marvell.com; Liang, Cunming <cunming.li...@intel.com>; Wang,
> Haiyue <haiyue.w...@intel.com>; mattias.ronnblom
> <mattias.ronnb...@ericsson.com>
> Subject: Re: [PATCH v7 15/17] examples/l3fwd-graph: introduce multicore
> dispatch worker model
> 
> On Mon, Jun 5, 2023 at 4:57 PM Zhirun Yan <zhirun....@intel.com> wrote:
> >
> > Add new parameter "model" to choose mcore dispatch or rtc model.
> > And in dispatch model, the node will affinity to worker core successively.
> >
> > Note:
> > RTE_GRAPH_MODEL_SELECT is set to GRAPH_MODEL_RTC by default. Must
> set
> > model the same as RTE_GRAPH_MODEL_SELECT If set it as rtc or mcore
> > dispatch explicitly. GRAPH_MODEL_MCORE_RUNTIME_SELECT means it could
> > choose by model in runtime.
> > Only support one RX node for mcore dispatch model in current
> > implementation.
> >
> > ./dpdk-l3fwd-graph  -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P
> > --model="dispatch"
> >
> > Signed-off-by: Haiyue Wang <haiyue.w...@intel.com>
> > Signed-off-by: Cunming Liang <cunming.li...@intel.com>
> > Signed-off-by: Zhirun Yan <zhirun....@intel.com>
> > ---
> >  examples/l3fwd-graph/main.c  | 231 +++++++++++++++++++++++++++++------
> >  lib/graph/rte_graph_worker.h |   3 +
> >  2 files changed, 196 insertions(+), 38 deletions(-)
> >
> > diff --git a/examples/l3fwd-graph/main.c b/examples/l3fwd-graph/main.c
> > index 5feeab4f0f..4ecc6c9af4 100644
> > --- a/examples/l3fwd-graph/main.c
> > +++ b/examples/l3fwd-graph/main.c
> > @@ -23,6 +23,12 @@
> >  #include <rte_cycles.h>
> >  #include <rte_eal.h>
> >  #include <rte_ethdev.h>
> > +#define GRAPH_MODEL_RTC 0 /* Run-to-completion model, set by default.
> > +*/ #define GRAPH_MODEL_MCORE_DISPATCH 1 /* Dispatch model. */
> #define
> > +GRAPH_MODEL_MCORE_RUNTIME_SELECT 2 /* Support to select model by
> */
> > +                                          /* parsing model in
> > +cmdline. */
> 
> After moving model to graph->model, Can you check the performance.


In my env, I test l3fwd-graph, I got the same throughput.(slight improve could 
be treated as jitter)
For graph_perf_autotest in test app, there is slight drop (About 0.2% call, 
similar cycles/call) 
Can it be treated as jitter?

Old:
+-------------------------------+---------------+---------------+---------------+---------------+---------------+-----------+
|Node                           |calls          |objs           |realloc_count  
|objs/call      |objs/sec(10E6) |cycles/call|
+-------------------------------+---------------+---------------+---------------+---------------+---------------+-----------+
|test_graph_perf_worker-0-0     |10175176       |2604845056     |1              
|256.000        |2015.394304    |27.0000    |
|test_graph_perf_worker-1-0     |10175542       |2604938752     |1              
|256.000        |2015.488000    |28.0000    |
|test_graph_perf_worker-2-0     |10175565       |2604944640     |1              
|256.000        |2015.493888    |28.0000    |
|test_graph_perf_worker-3-0     |10175593       |2604951808     |1              
|256.000        |2015.501056    |27.0000    |
|test_graph_perf_source-0       |10175623       |2604959488     |2              
|256.000        |2015.508480    |27.0000    |
|test_graph_perf_sink-0         |10175642       |2604964352     |1              
|256.000        |2015.513600    |27.0000    |
+-------------------------------+---------------+---------------+---------------+---------------+---------------+-----------+

New:
+-------------------------------+---------------+---------------+---------------+---------------+---------------+-----------+
|Node                           |calls          |objs           |realloc_count  
|objs/call      |objs/sec(10E6) |cycles/call|
+-------------------------------+---------------+---------------+---------------+---------------+---------------+-----------+
|test_graph_perf_worker-0-0     |10154953       |2599667968     |1              
|256.000        |2010.960128    |27.0000    |
|test_graph_perf_worker-1-0     |10155316       |2599760896     |1              
|256.000        |2011.053056    |27.0000    |
|test_graph_perf_worker-2-0     |10155338       |2599766528     |1              
|256.000        |2011.058688    |28.0000    |
|test_graph_perf_worker-3-0     |10155357       |2599771392     |1              
|256.000        |2011.063552    |28.0000    |
|test_graph_perf_source-0       |10155394       |2599780864     |2              
|256.000        |2011.072768    |27.0000    |
|test_graph_perf_sink-0         |10155422       |2599788032     |1              
|256.000        |2011.080192    |27.0000    |
+-------------------------------+---------------+---------------+---------------+---------------+---------------+-----------+

> This may not be needed for l3fwd
> 
Do you mean graph->model?

> or if there is not much code duplication,
> 
> Do the following remove the limitation,
>  #define RTE_GRAPH_MODEL_SELECT RTE_GRAPH_MODEL_RTC.
> 
> graph_main_loop change to graph_main_rtc_loop
> 
>  #define RTE_GRAPH_MODEL_SELECT GRAPH_MODEL_MCORE_DISPATCH
> 
> graph_main_loop change to graph_main_mcore_loop
> 
> Select the following based on runtime option
>         /* Launch per-lcore init on every worker lcore */
>         rte_eal_mp_remote_launch(graph_main_rtc_loop, NULL, SKIP_MAIN); or
>         rte_eal_mp_remote_launch(graph_main_mcore_loop, NULL, SKIP_MAIN);
> 

We want to 1. Use same API (rte_graph_walk()) for diff models.
2. no performance drop for rtc (use RTE_GRAPH_MODEL_SELECT in compile time)

If I understand correctly, I need remove graph->model and only use
RTE_GRAPH_MODEL_SELECT to select models? 

And change it as
graph_main_rtc_loop()
{
  While(1)
    rte_graph_walk_rtc()
}

But actually, I think graph->model is need, especially for config stage and for 
runtime config
If set RTE_GRAPH_MODEL_SELECT_RUNTIME.
We need the model type to decide to alloc workqueue and use 
RTE_GRAPH_MODEL_SELECT
to choose the walk.


> >         memset(&rewrite_data, 0, sizeof(rewrite_data));
> >         rewrite_len = sizeof(rewrite_data); diff --git
> > a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h index
> > 541c373cb1..19b4c1514f 100644
> > --- a/lib/graph/rte_graph_worker.h
> > +++ b/lib/graph/rte_graph_worker.h
> > @@ -26,6 +26,9 @@ __rte_experimental
> >  static inline void
> >  rte_graph_walk(struct rte_graph *graph)  {
> > +#define RTE_GRAPH_MODEL_RTC 0
> > +#define RTE_GRAPH_MODEL_MCORE_DISPATCH 1
> 
> No need for duplicate enum. Please remove enum make  this as in public header
> file.
> 
Yes, it will cause no defined warnings.
Thanks for your comments.
I will remove enum and define model type macros in public header. And also 
change
the related structs/APIs.

> 
> > +
> 
> Add comment here, On how  application uses this, aka.  before inlcuding the
> worker header file #define RTE_GRAPH_MODEL_SELECT
> RTE_GRAPH_MODEL_RTC.
> Please change the text as needed.
Yes, I will add comment and add the usage in document.

> 
> 
> >  #if !defined(RTE_GRAPH_MODEL_SELECT) || RTE_GRAPH_MODEL_SELECT ==
> RTE_GRAPH_MODEL_RTC
> >         rte_graph_walk_rtc(graph);
> >  #elif RTE_GRAPH_MODEL_SELECT ==
> RTE_GRAPH_MODEL_MCORE_DISPATCH
> > --
> > 2.37.2
> >

RE: [PATCH v7 15/17] examples/l3fwd-graph: introduce multicore dispatch worker model

Reply via email to