> -----Original Message----- > From: Jerin Jacob <jerinjac...@gmail.com> > Sent: Tuesday, June 6, 2023 1:55 PM > To: Yan, Zhirun <zhirun....@intel.com> > Cc: dev@dpdk.org; jer...@marvell.com; kirankum...@marvell.com; > ndabilpu...@marvell.com; step...@networkplumber.org; > pbhagavat...@marvell.com; Liang, Cunming <cunming.li...@intel.com>; Wang, > Haiyue <haiyue.w...@intel.com>; mattias.ronnblom > <mattias.ronnb...@ericsson.com> > Subject: Re: [PATCH v7 15/17] examples/l3fwd-graph: introduce multicore > dispatch worker model > > On Tue, Jun 6, 2023 at 10:41 AM Yan, Zhirun <zhirun....@intel.com> wrote: > > > > > > > > > -----Original Message----- > > > From: Jerin Jacob <jerinjac...@gmail.com> > > > Sent: Monday, June 5, 2023 9:42 PM > > > To: Yan, Zhirun <zhirun....@intel.com> > > > Cc: dev@dpdk.org; jer...@marvell.com; kirankum...@marvell.com; > > > ndabilpu...@marvell.com; step...@networkplumber.org; > > > pbhagavat...@marvell.com; Liang, Cunming <cunming.li...@intel.com>; > > > Wang, Haiyue <haiyue.w...@intel.com>; mattias.ronnblom > > > <mattias.ronnb...@ericsson.com> > > > Subject: Re: [PATCH v7 15/17] examples/l3fwd-graph: introduce > > > multicore dispatch worker model > > > > > > On Mon, Jun 5, 2023 at 4:57 PM Zhirun Yan <zhirun....@intel.com> wrote: > > > > > > > > Add new parameter "model" to choose mcore dispatch or rtc model. > > > > And in dispatch model, the node will affinity to worker core > > > > successively. > > > > > > > > Note: > > > > RTE_GRAPH_MODEL_SELECT is set to GRAPH_MODEL_RTC by default. > Must > > > set > > > > model the same as RTE_GRAPH_MODEL_SELECT If set it as rtc or mcore > > > > dispatch explicitly. GRAPH_MODEL_MCORE_RUNTIME_SELECT means it > > > > could choose by model in runtime. > > > > Only support one RX node for mcore dispatch model in current > > > > implementation. > > > > > > > > ./dpdk-l3fwd-graph -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" > > > > -P --model="dispatch" > > > > > > > > Signed-off-by: Haiyue Wang <haiyue.w...@intel.com> > > > > Signed-off-by: Cunming Liang <cunming.li...@intel.com> > > > > Signed-off-by: Zhirun Yan <zhirun....@intel.com> > > > > --- > > > > examples/l3fwd-graph/main.c | 231 +++++++++++++++++++++++++++++-- > ---- > > > > lib/graph/rte_graph_worker.h | 3 + > > > > 2 files changed, 196 insertions(+), 38 deletions(-) > > > > > > > > diff --git a/examples/l3fwd-graph/main.c > > > > b/examples/l3fwd-graph/main.c index 5feeab4f0f..4ecc6c9af4 100644 > > > > --- a/examples/l3fwd-graph/main.c > > > > +++ b/examples/l3fwd-graph/main.c > > > > @@ -23,6 +23,12 @@ > > > > #include <rte_cycles.h> > > > > #include <rte_eal.h> > > > > #include <rte_ethdev.h> > > > > +#define GRAPH_MODEL_RTC 0 /* Run-to-completion model, set by > default. > > > > +*/ #define GRAPH_MODEL_MCORE_DISPATCH 1 /* Dispatch model. */ > > > #define > > > > +GRAPH_MODEL_MCORE_RUNTIME_SELECT 2 /* Support to select model > by > > > */ > > > > + /* parsing model in > > > > +cmdline. */ > > > > > > After moving model to graph->model, Can you check the performance. > > > > In my env, I test l3fwd-graph, I got the same throughput.(slight > > improve could be treated as jitter) For graph_perf_autotest in test > > app, there is slight drop (About 0.2% call, similar cycles/call) Can it be > > treated > as jitter? > > Most likely. > Try in following in fasth path. > const ... model = graph->model; >
By default we set RTE_GRAPH_MODEL_SELECT == RTE_GRAPH_MODEL_RTC, So get model is only in stats, not in fast path. I found the root cause is coming from the additional unit test. But actually, it should not be called. All added UT is under graph_autotest, not in graph_perf_autotest. It's strange if I destroy the cloned graph in testcase in the additional UT, can get same stats as we expected(even better performance). A little more calls, objs and better cycles/call (28->27). New: +-------------------------------+---------------+---------------+---------------+---------------+---------------+-----------+ |Node |calls |objs |realloc_count |objs/call |objs/sec(10E6) |cycles/call| +-------------------------------+---------------+---------------+---------------+---------------+---------------+-----------+ |test_graph_perf_worker-0-0 |10286353 |2633306368 |1 |256.000 |2037.676032 |27.0000 | |test_graph_perf_worker-1-0 |10286709 |2633397504 |1 |256.000 |2037.767168 |27.0000 | |test_graph_perf_worker-2-0 |10286732 |2633403392 |1 |256.000 |2037.773056 |27.0000 | |test_graph_perf_worker-3-0 |10286751 |2633408256 |1 |256.000 |2037.777920 |27.0000 | |test_graph_perf_source-0 |10286774 |2633414144 |2 |256.000 |2037.783552 |27.0000 | |test_graph_perf_sink-0 |10286791 |2633418496 |1 |256.000 |2037.788160 |27.0000 | +-------------------------------+---------------+---------------+---------------+---------------+---------------+-----------+ > > > > Old: > > +-------------------------------+---------------+---------------+---------------+------------ > ---+---------------+-----------+ > > |Node |calls |objs > > |realloc_count |objs/call > |objs/sec(10E6) |cycles/call| > > +-------------------------------+---------------+---------------+---------------+------------ > ---+---------------+-----------+ > > |test_graph_perf_worker-0-0 |10175176 |2604845056 |1 > |256.000 |2015.394304 |27.0000 | > > |test_graph_perf_worker-1-0 |10175542 |2604938752 |1 > |256.000 |2015.488000 |28.0000 | > > |test_graph_perf_worker-2-0 |10175565 |2604944640 |1 > |256.000 |2015.493888 |28.0000 | > > |test_graph_perf_worker-3-0 |10175593 |2604951808 |1 > |256.000 |2015.501056 |27.0000 | > > |test_graph_perf_source-0 |10175623 |2604959488 |2 > |256.000 |2015.508480 |27.0000 | > > |test_graph_perf_sink-0 |10175642 |2604964352 |1 > |256.000 |2015.513600 |27.0000 | > > +-------------------------------+---------------+---------------+---------------+------------ > ---+---------------+-----------+ > > > > New: > > +-------------------------------+---------------+---------------+---------------+------------ > ---+---------------+-----------+ > > |Node |calls |objs > > |realloc_count |objs/call > |objs/sec(10E6) |cycles/call| > > +-------------------------------+---------------+---------------+---------------+------------ > ---+---------------+-----------+ > > |test_graph_perf_worker-0-0 |10154953 |2599667968 |1 > |256.000 |2010.960128 |27.0000 | > > |test_graph_perf_worker-1-0 |10155316 |2599760896 |1 > |256.000 |2011.053056 |27.0000 | > > |test_graph_perf_worker-2-0 |10155338 |2599766528 |1 > |256.000 |2011.058688 |28.0000 | > > |test_graph_perf_worker-3-0 |10155357 |2599771392 |1 > |256.000 |2011.063552 |28.0000 | > > |test_graph_perf_source-0 |10155394 |2599780864 |2 > |256.000 |2011.072768 |27.0000 | > > |test_graph_perf_sink-0 |10155422 |2599788032 |1 > |256.000 |2011.080192 |27.0000 | > > +-------------------------------+---------------+---------------+---------------+------------ > ---+---------------+-----------+ > > > > > This may not be needed for l3fwd > > > > > Do you mean graph->model? > > Yes. > > > > > > or if there is not much code duplication, > > > > > > Do the following remove the limitation, #define > > > RTE_GRAPH_MODEL_SELECT RTE_GRAPH_MODEL_RTC. > > > > > > graph_main_loop change to graph_main_rtc_loop > > > > > > #define RTE_GRAPH_MODEL_SELECT GRAPH_MODEL_MCORE_DISPATCH > > > > > > graph_main_loop change to graph_main_mcore_loop > > > > > > Select the following based on runtime option > > > /* Launch per-lcore init on every worker lcore */ > > > rte_eal_mp_remote_launch(graph_main_rtc_loop, NULL, SKIP_MAIN); > or > > > rte_eal_mp_remote_launch(graph_main_mcore_loop, NULL, > > > SKIP_MAIN); > > > > > > > We want to 1. Use same API (rte_graph_walk()) for diff models. > > 2. no performance drop for rtc (use RTE_GRAPH_MODEL_SELECT in compile > > time) > > > > If I understand correctly, I need remove graph->model and only use > > RTE_GRAPH_MODEL_SELECT to select models? > > > > And change it as > > graph_main_rtc_loop() > > { > > While(1) > > rte_graph_walk_rtc() > > } > > > > But actually, I think graph->model is need, especially for config > > stage and for runtime config If set RTE_GRAPH_MODEL_SELECT_RUNTIME. > > Yes. Agree. If there is no MAJOR performance issues lets use > RTE_GRAPH_MODEL_SELECT_RUNTIME for l3fwd. > Thanks, there is no major performance issues. I will keep to use current scheme. > > We need the model type to decide to alloc workqueue and use > > RTE_GRAPH_MODEL_SELECT to choose the walk. > > > > > > > > memset(&rewrite_data, 0, sizeof(rewrite_data)); > > > > rewrite_len = sizeof(rewrite_data); diff --git > > > > a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h > > > > index 541c373cb1..19b4c1514f 100644 > > > > --- a/lib/graph/rte_graph_worker.h > > > > +++ b/lib/graph/rte_graph_worker.h > > > > @@ -26,6 +26,9 @@ __rte_experimental static inline void > > > > rte_graph_walk(struct rte_graph *graph) { > > > > +#define RTE_GRAPH_MODEL_RTC 0 > > > > +#define RTE_GRAPH_MODEL_MCORE_DISPATCH 1 > > > > > > No need for duplicate enum. Please remove enum make this as in > > > public header file. > > > > > Yes, it will cause no defined warnings. > > Thanks for your comments. > > I will remove enum and define model type macros in public header. And > > also change the related structs/APIs. > > Also add a comment in RTE_GRAPH_MODEL_MCORE_DISPATCH, If adding new > entry, then update graph_is_valid API. > Got it. Thanks very much. > > > > > > > > > + > > > > > > Add comment here, On how application uses this, aka. before > > > inlcuding the worker header file #define RTE_GRAPH_MODEL_SELECT > > > RTE_GRAPH_MODEL_RTC. > > > Please change the text as needed. > > Yes, I will add comment and add the usage in document. > > > > > > > > > > > > #if !defined(RTE_GRAPH_MODEL_SELECT) || > RTE_GRAPH_MODEL_SELECT == > > > RTE_GRAPH_MODEL_RTC > > > > rte_graph_walk_rtc(graph); #elif RTE_GRAPH_MODEL_SELECT > > > > == > > > RTE_GRAPH_MODEL_MCORE_DISPATCH > > > > -- > > > > 2.37.2 > > > >