Hi, Pavan > -----邮件原件----- > 发件人: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com> > 发送时间: 2021年1月12日 16:29 > 收件人: Feifei Wang <feifei.wa...@arm.com>; jer...@marvell.com; Harry > van Haaren <harry.van.haa...@intel.com> > 抄送: dev@dpdk.org; nd <n...@arm.com>; Honnappa Nagarahalli > <honnappa.nagaraha...@arm.com>; sta...@dpdk.org; Ruifeng Wang > <ruifeng.w...@arm.com>; nd <n...@arm.com>; nd <n...@arm.com>; nd > <n...@arm.com> > 主题: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline > test > > Hi Feifei, > > >Hi Feifei, > > > >>Hi, Pavan > >> > >>> -----邮件原件----- > >>> 发件人: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com> > >>> 发送时间: 2021年1月8日 17:13 > >>> 收件人: Feifei Wang <feifei.wa...@arm.com>; > >jer...@marvell.com; > >>Harry > >>> van Haaren <harry.van.haa...@intel.com> > >>> 抄送: dev@dpdk.org; nd <n...@arm.com>; Honnappa Nagarahalli > >>> <honnappa.nagaraha...@arm.com>; sta...@dpdk.org; Ruifeng > >Wang > >>> <ruifeng.w...@arm.com>; nd <n...@arm.com>; nd <n...@arm.com> > >>> 主题: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for > >>pipeline > >>> test > >>> > >>> Hi Feifei, > >>> > >>> >Hi, Pavan > >>> > > >>> > > >>> > > >>> >> -----邮件原件----- > >>> > > >>> >> 发件人: Pavan Nikhilesh Bhagavatula > >><mailto:pbhagavat...@marvell.com> > >>> > > >>> >> 发送时间: 2021年1月5日 17:29 > >>> > > >>> >> 收件人: Feifei Wang <mailto:feifei.wa...@arm.com>; > >>mailto:jer...@marvell.com; > >>> >Harry > >>> > > >>> >> van Haaren <mailto:harry.van.haa...@intel.com> > >>> > > >>> >> 抄送: mailto:dev@dpdk.org; nd <mailto:n...@arm.com>; > >Honnappa > >>Nagarahalli > >>> > > >>> >> <mailto:honnappa.nagaraha...@arm.com>; > >>mailto:sta...@dpdk.org; Ruifeng Wang > >>> > > >>> >> <mailto:ruifeng.w...@arm.com>; nd <mailto:n...@arm.com> > >>> > > >>> >> 主题: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers > >>for > >>> >pipeline > >>> > > >>> >> test > >>> > > >>> >> > >>> > > >>> >> Hi Feifei, > >>> > > >>> >> > >>> > > >>> >> >Hi, Pavan > >>> > > >>> >> > > >>> > > >>> >> >Sorry for my late reply and thanks very much for your review. > >>> > > >>> >> > > >>> > > >>> >> >> -----Original Message----- > >>> > > >>> >> >> From: Pavan Nikhilesh Bhagavatula > >>> > >>><mailto:pbhagavat...@marvell.com<mailto:pbhagavatula@marvell. > >c > >>om>> > >>> > > >>> >> >> Sent: 2020年12月22日 18:33 > >>> > > >>> >> >> To: Feifei Wang > >>> > >><mailto:feifei.wa...@arm.com<mailto:feifei.wa...@arm.com>>; > >>> >mailto:jer...@marvell.com<mailto:jer...@marvell.com>; > >>> > > >>> >> >Harry van > >>> > > >>> >> >> Haaren > >>> > >>><mailto:harry.van.haa...@intel.com<mailto:harry.van.haaren@intel > >. > >>com>>; > >>> >Pavan Nikhilesh > >>> > > >>> >> >> > >>> > >>><pbhagavat...@caviumnetworks.com<mailto:pbhagavatula@caviu > >m > >>n > >>> >etworks.com>> > >>> > > >>> >> >> Cc: mailto:dev@dpdk.org<mailto:dev@dpdk.org>; nd > >>> ><mailto:n...@arm.com<mailto:n...@arm.com>>; Honnappa > >>Nagarahalli > >>> > > >>> >> >> > >>> > >>><honnappa.nagaraha...@arm.com<mailto:Honnappa.Nagarahalli@a > >r > >>m > >>> >.com>>; mailto:sta...@dpdk.org<mailto:sta...@dpdk.org>; Phil > >>Yang > >>> > > >>> >> >> <mailto:phil.y...@arm.com<mailto:phil.y...@arm.com>> > >>> > > >>> >> >> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release > >>> >barriers > >>> > > >>> >> >for > >>> > > >>> >> >> pipeline test > >>> > > >>> >> >> > >>> > > >>> >> >> > >>> > > >>> >> >> >Add release barriers before updating the processed packets > >for > >>> > > >>> >> >worker > >>> > > >>> >> >> >lcores to ensure the worker lcore has really finished data > >>> > > >>> >> >> >processing and then it can update the processed packets > >>number. > >>> > > >>> >> >> > > >>> > > >>> >> >> > >>> > > >>> >> >> I believe we can live with minor inaccuracies in stats being > >>> > > >>> >> >> presented > >>> > > >>> >> >as > >>> > > >>> >> >> atomics are pretty heavy when scheduler is limited to burst > >size > >>> >> >> as > >>> >1. > >>> > > >>> >> >> > >>> > > >>> >> >> One option is to move it before a pipeline operation > >>> > > >>> >> >(pipeline_event_tx, > >>> > > >>> >> >> pipeline_fwd_event etc.) as they imply implicit release > >>> >> >> barrier (as > >>> > > >>> >> >> all > >>> > > >>> >> >the > >>> > > >>> >> >> changes done to the event should be visible to the next core). > >>> > > >>> >> > > >>> > > >>> >> >If I understand correctly, your meaning is that move release > >>> >> >barriers > >>> > > >>> >> >before pipeline_event_tx or pipeline_fwd_event. This can ensure > >>the > >>> > > >>> >> >event has been processed before the next core begins to tx/fwd. > >>For > >>> > > >>> >> >example: > >>> > > >>> >> > >>> > > >>> >> What I meant was event APIs such as `rte_event_enqueue_burst`, > >>> > > >>> >> `rte_event_eth_tx_adapter_enqueue` > >>> > > >>> >> act as an implicit release barrier and the API > >>> >`rte_event_dequeue_burst` act > >>> > > >>> >> as an implicit acquire barrier. > >>> > > >>> > > >>> > > >>> >> > >>> > > >>> >> Since, pipeline_* test starts with a dequeue() and ends with an > >>> >enqueue() I > >>> > > >>> >> don’t believe we need barriers in Between. > >>> > > >>> > > >>> > > >>> >Sorry for my misunderstanding this. And I agree with you that no > >>> >barriers are > >>> > > >>> >needed between dequeue and enqueue. > >>> > > >>> > > >>> > > >>> >Now, let's go back to the beginning. Actually with this patch, our > >>> >barrier is mainly > >>> > > >>> >for the synchronous variable " w->processed_pkts ". As we all > >know, > >>the > >>> >event is firstly > >>> > > >>> >dequeued and then enqueued, after this, the event can be treated > >as > >>the > >>> >processed event > >>> > > >>> >and included in the statistics("w->processed_pkts++"). > >>> > > >>> > > >>> > > >>> >Thus, we add a release barrier before " w->processed_pkts++" is to > >>> >prevent this operation > >>> > > >>> >being executed ahead of time. For example: > >>> > > >>> >dequeue -> w->processed_pkts++ -> enqueue > >>> > > >>> >This cause that the worker doesn't actually finish this event > >>> >processing, but the event is treated > >>> > > >>> >as the processed one and included in the statistics. > >>> > > >>> > >>> But the current sequence is dequeue-> enqueue-> w- > processed_pkts++ > >>>and enqueue already acts as an explicit release barrier right? > >>> > >> > >>Sorry maybe I cannot understand how “enqueue” as an explicit release > >>barrier. I think of two possibilities: > >>1. As you say before, all the changes done to the event should be > >visible > >>to the next core and enqueue is a operation for event, so the next > >core > >>should wait for the event to be enqueued. > >>I think this is due to data dependence for the same variable. However, > >>‘w->processed_pkts’ and ‘ev’ are different variables, so this cannot > >>prevent ‘w->processed_pkts++’ before enqueue. > >>And the main core may load updated ‘w->processed_pkts’ but actually > >>the event is still being processed. For example: > >> > >> Time Slot Worker 1 Main core > >> 1 dequeue > >> 2 w->processed_pkts++ > >> 3 load w- > >>processed_pkts > >> 4 enqueue > >> > >>2. Some release barriers have been included in enqueue. There is a > >>release barrier in rte_ring_enqueue : > >>move head -> copy elements to the ring -> release barrier -> update > >tail > >>-> w->processed_pkts++ > >>However, this barrier cannot prevent ‘w->processed_pkts++’ before > >>update tail, and when update_tail has been finished, the enqueue > >>process can be seen completed. > > > >I was talking about case 2 in particular almost all enqueue calls have > >some kind of release barrier in place. I do agree w->processed_pkts++ > >might get reordered with tail update but since enqueue itself is a ldr > >+ blr I was hoping that it wouldn't occur. > > > >We can continue the discussion once I have some performance data. > > > >Thanks for your patience :) > >Pavan. > > > >> > >>> > > >>> > > >>> > >>>_______________________________________________________ > _ > >_ > >>> _ > >>> >____________________ > >>> > > >>> > > >>> > > >>> >By the way, I have two other questions about pipeline process test > >in > >>> >"test_pipeline_queue". > >>> > > >>> >1. when do we start counting processed events (w- > >>>processed_pkts)? > >>> > > >>> >For the fwd mode (internal_port = false), when we choose single > >>stage, > >>> >application increments > >>> > > >>> >the number events processed after "pipeline_event enqueue". > >>> >However, when we choose multiple > >>> > > >>> >stage, application increments the number events processed before > >>> >"pipleline_event_enqueue". > >>> > >>> We count an event as process when all the stages have completed > >>and its > >>> Trasnmitted. > >>> > >>> >So, > >>> > > >>> >maybe we can unify this. For example of multiple stage: > >>> > > >>> > > >>> > > >>> > if (cq_id == last_queue) { > >>> > > >>> > ev.queue_id = > >>> > tx_queue[ev.mbuf->port]; > >>> > > >>> > > >>> >rte_event_eth_tx_adapter_txq_set(ev.mbuf, > >>> >0); > >>> > > >>> > > >>> >pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC); > >>> > > >>> > + > >>> > pipeline_event_enqueue(dev, port, &ev); > >>> > > >>> > > >>> > w->processed_pkts++; > >>> > > >>> > } else { > >>> > > >>> > ev.queue_id++; > >>> > > >>> > > >>> >pipeline_fwd_event(&ev, sched_type_list[cq_id]); > >>> > > >>> > + > >>> > pipeline_event_enqueue(dev, port, &ev); > >>> > > >>> > } > >>> > > >>> > > >>> > > >>> > - pipeline_event_enqueue(dev, port, &ev); > >>> > > >>> > > >>> > >>> The above change makes sense. > >>> > >>Thanks for your review, and I’ll update this change into the next > >>version. > >>> > > >>> >2. Whether "pipeline_event_enqueue" is needed after > >>> >"pipeline_event_tx" for tx mode? > >>> > > >>> >For single_stage_burst_tx mode, after "pipeline_event_tx", the > >>worker > >>> >has to enqueue again > >>> > > >>> >due to "pipeline_event_enqueue_burst", so maybe we should jump > >>out of > >>> >the loop after > >>> > > >>> >“pipeline_event_tx”, > >>> > >>> We call enqueue burst to release the events i.e. enqueue events with > >>> RTE_EVENT_OP_RELEASE. > >>> > >>However, > >>In case of single event, for ' pipeline_queue_worker_single_stage_tx' > >>and ' pipeline_queue_worker_multi_stage_tx', > >>after tx, there is no release operation. > >> > > I think this was done before the feature disable_implicit_release > `RTE_EVENT_DEV_CAP_IMPLICIT_RELEASE_DISABLE ` was implemented. > We can remove the event_op_release section of the code and have a > continue. > Ok, thanks for your explain. I will upload the new series to fix above two bugs. > >>> > > >>> for example: > >>> > > >>> > > >>> > > >>> > if > >>> >(ev[i].sched_type == > >>> >RTE_SCHED_TYPE_ATOMIC) { > >>> > > >>> > > >>> > pipeline_event_tx(dev, port, &ev[i]); > >>> > > >>> > > >>> > ev[i].op = RTE_EVENT_OP_RELEASE; > >>> > > >>> > > >>> > w->processed_pkts++; > >>> > > >>> > + continue; > >>> > > >>> > } else { > >>> > > >>> > > >>> > ev[i].queue_id++; > >>> > > >>> > > >>> > pipeline_fwd_event(&ev[i], > >>> > > >>> > > >>> >RTE_SCHED_TYPE_ATOMIC); > >>> > > >>> > } > >>> > > >>> > } > >>> > > >>> > > >>> > > >>> > pipeline_event_enqueue_burst(dev, > >>> > port, ev, nb_rx); > >>> > > >>> > > >>> > > >>> > > >>> > > >>> >> > >>> > > >>> >> > > >>> > > >>> >> >if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) { > >>> > > >>> >> > + > >>> >__atomic_thread_fence(__ATOMIC_RELEASE); > >>> > > >>> >> > pipeline_event_tx(dev, > >>> >> > port, &ev); > >>> > > >>> >> > w->processed_pkts++; > >>> > > >>> >> > } else { > >>> > > >>> >> > ev.queue_id++; > >>> > > >>> >> > + > >>> >__atomic_thread_fence(__ATOMIC_RELEASE); > >>> > > >>> >> > > >>> >> > pipeline_fwd_event(&ev, > >>> > > >>> >> >RTE_SCHED_TYPE_ATOMIC); > >>> > > >>> >> > > >>> >> > pipeline_event_enqueue(dev, port, &ev); > >>> > > >>> >> > > >>> > > >>> >> >However, there are two reasons to prevent this: > >>> > > >>> >> > > >>> > > >>> >> >First, compare with other tests in app/eventdev, for example, > >the > >>> > > >>> >> >eventdev perf test, the wmb is after event operation to ensure > >>> > > >>> >> >operation has been finished and then w->processed_pkts++. > >>> > > >>> >> > >>> > > >>> >> In case of perf_* tests start with a dequeue() and finally ends > >>> >> with a > >>> > > >>> >> mempool_put() should also act as implicit acquire release pairs > >>> >making stats > >>> > > >>> >> consistent? > >>> > > >>> > > >>> > > >>> >For perf tests, this consistency refers to that there is a wmb > >>> >after mempool_put(). > >>> > > >>> >Please refer to this link: > >>> > > >>> >https://urldefense.proofpoint.com/v2/url?u=http- > >>> > >>>3A__patches.dpdk.org_patch_85634_&d=DwIGaQ&c=nKjWec2b6R0 > >m > >>O > >>> > >>>yPaz7xtfQ&r=1cjuAHrGh745jHNmj2fD85sUMIJ2IPIDsIJzo6FN6Z0&m= > >z > >>g > >>> >QHeSDiXWfI1PIIUxXBqMS6E- > >>2_3G46nhrzGXoBpHI&s=0FwTxPXjWflh- > >>> >sdmnkY133IPlJB780x0yxe7Am3JCBw&e= > >>> > > >>> > > >>> > > >>> >> > >>> > > >>> >> >So, if we move release barriers before tx/fwd, it may cause that > >>the > >>> > > >>> >> >tests of app/eventdev become inconsistent.This may reduce the > >>> > > >>> >> >maintainability of the code and make it difficult to understand. > >>> > > >>> >> > > >>> > > >>> >> >Second, it is a test case, though heavy thread may cause > >>> >performance > >>> > > >>> >> >degradation, it can ensure that the operation process and the > >test > >>> > > >>> >> >result are correct. And maybe for a test case, correctness is > >more > >>> > > >>> >> >important than performance. > >>> > > >>> >> > > >>> > > >>> >> > >>> > > >>> >> Most of our internal perf test run on 24/48 core combinations > >and > >>> >since > >>> > > >>> >> Octeontx2 event device driver supports a burst size of 1, it will > >>> >> show > >>> >up as > >>> > > >>> >> Huge performance degradation. > >>> > > >>> > > >>> > > >>> >For the impact on performance, I do the test using software driver, > >>> >following are some test results: > >>> > > >>> >------------------------------------------------------------------- > >>> >---- > >>> >---------------- > >>> >--------------------------------------------- > >>> > > >>> >Architecture: aarch64 > >>> > > >>> >Nics: ixgbe-82599 > >>> > > >>> >CPU: Cortex-A72 > >>> > > >>> >BURST_SIZE: 1 > >>> > > >>> >Order: ./dpdk-test-eventdev -l 0-15 -s 0x2 --vdev=event_sw0 -- -- > >>> >test=pipeline_queue --wlcore=4-14 --prod_type_ethdev --stlist=a,a > >>> > > >>> >Flow: one flow, 64bits package, TX rate: 1.4Mpps > >>> > > >>> > > >>> > > >>> >Without this patch: > >>> > > >>> >0.954 mpps avg 0.953 mpps > >>> > > >>> > > >>> > > >>> >With this patch: > >>> > > >>> >0.932 mpps avg 0.930 mpps > >>> > > >>> >------------------------------------------------------------------- > >>> >---- > >>> >---------------- > >>> >--------------------------------------------- > >>> > > >>> > > >>> > > >>> >Based on the result above, there is no significant performance > >>> >degradation with this patch. > >>> > > >>> >This is because the release barrier is only for “w- > >>>processed_pkts++”. > >>> >It just ensures that the worker core > >>> > > >>> >increments the number events processed after enqueue, and it > >>doesn’t > >>> >affect dequeue/enqueue: > >>> > > >>> > > >>> > > >>> >dequeue -> enqueue -> release barrier -> w->processed_pkts++ > >>> > > >>> > >>> Here enqueue already acts as an explicit release barrier. > >>> > >>Please refer above reasons. > >> > >>> > > >>> > > >>> >On the other hand, I infer the reason for the slight decrease in > >>> >measurement performance is that the release barrier > >>> > > >>> >prevent “w->processed_pkts++” before that the event has been > >>processed > >>> >(enqueue). But I think this test result is closer > >>> > > >>> >to the real performance. > >>> > > >>> >And sorry for that we have no octentx2 device, so there is no test > >>> >result on Octeontx2 event device driver. Would you please > >>> > > >>> >help us test this patch on octentx2 when you are convenient. > >Thanks > >>> >very much. > >>> > > >>> > >>> I will report the performance numbers on Monday. > >>> > >> > >>That’s great, Thanks very much for your help. > > We are seeing a ~2% performance loss with the series applied on pipeline > tests. > I have taken the performance numbers on IXIA to eliminate any > discrepancies in stats. > > Do you see any improvement in stats being reported with and w/o the > barrier with Traffic generator vs eventdev application? because I don’t see > any difference when verifying with IXIA. > Thanks very much for your test. We also use IXIA to test with SW driver and have ~2% performance loss according to the above data.
> Can we skip this patch? > Also, note that the barrier wouldn’t help in cases where: > 1. SW eventdev Tx routine runs on service core which would have it's on > latency in processing the packets. > 2. HW eventdev Tx routine that have shaping and scheduling. > That's Ok, I'll remove this patch from this series. And thanks very much for this great discussion. Best Regards Feifei > Regards, > Pavan. > > >> > >>Best Regards > >>Feifei > >> > >>> > > >>> > > >>> >Best Regards > >>> > > >>> >Feifei > >>> > >>> Regards, > >>> Pavan. > >>> > >>> > > >>> > > >>> > > >>> >> > >>> > > >>> >> >So, due to two reasons above, I'm ambivalent about how we > >>should > >>> >do in > >>> > > >>> >> >the next step. > >>> > > >>> >> > > >>> > > >>> >> >Best Regards > >>> > > >>> >> >Feifei > >>> > > >>> >> > >>> > > >>> >> Regards, > >>> > > >>> >> Pavan. > >>> > > >>> >> > >>> > > >>> >> > > >>> > > >>> >> >> >Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue > >>worker > >>> > > >>> >> >> >functions") > >>> > > >>> >> >> >Cc: > >>> > >>>mailto:pbhagavat...@marvell.com<mailto:pbhagavatula@marvell.c > >o > >>m> > >>> > > >>> >> >> >Cc: mailto:sta...@dpdk.org<mailto:sta...@dpdk.org> > >>> > > >>> >> >> > > >>> > > >>> >> >> >Signed-off-by: Phil Yang > >>> ><mailto:phil.y...@arm.com<mailto:phil.y...@arm.com>> > >>> > > >>> >> >> >Signed-off-by: Feifei Wang > >>> ><mailto:feifei.wa...@arm.com<mailto:feifei.wa...@arm.com>> > >>> > > >>> >> >> >Reviewed-by: Ruifeng Wang > >>> > >><mailto:ruifeng.w...@arm.com<mailto:ruifeng.w...@arm.com>> > >>> > > >>> >> >> >--- > >>> > > >>> >> >> > app/test-eventdev/test_pipeline_queue.c | 64 > >>> > > >>> >> >> >+++++++++++++++++++++---- > >>> > > >>> >> >> > 1 file changed, 56 insertions(+), 8 deletions(-) > >>> > > >>> >> >> > > >>> > > >>> >> >> >diff --git a/app/test-eventdev/test_pipeline_queue.c > >>b/app/test- > >>> > > >>> >> >> >eventdev/test_pipeline_queue.c index 7bebac34f..0c0ec0ceb > >>> > > >>> >> >100644 > >>> > > >>> >> >> >--- a/app/test-eventdev/test_pipeline_queue.c > >>> > > >>> >> >> >+++ b/app/test-eventdev/test_pipeline_queue.c > >>> > > >>> >> >> >@@ -30,7 +30,13 @@ > >>> >pipeline_queue_worker_single_stage_tx(void > >>> > > >>> >> >> >*arg) > >>> > > >>> >> >> > > >>> > > >>> >> >> > if (ev.sched_type == > >>> >> >> > RTE_SCHED_TYPE_ATOMIC) { > >>> > > >>> >> >> > pipeline_event_tx(dev, > >>> >> >> > port, &ev); > >>> > > >>> >> >> >- w->processed_pkts++; > >>> > > >>> >> >> >+ > >>> > > >>> >> >> >+ /* release barrier here > >>> >> >> >+ ensures stored > >>> >operation > >>> > > >>> >> >> >+ * of the event completes > >>> >> >> >+ before the number > >>> >of > >>> > > >>> >> >> >+ * processed pkts is visible > >>> >> >> >+ to the main core > >>> > > >>> >> >> >+ */ > >>> > > >>> >> >> >+ > >>> >> >> >+ __atomic_fetch_add(&(w->processed_pkts), > >>> >1, > >>> > > >>> >> >> >+ > >>> >> >> >+ __ATOMIC_RELEASE); > >>> > > >>> >> >> > } else { > >>> > > >>> >> >> > ev.queue_id++; > >>> > > >>> >> >> > pipeline_fwd_event(&ev, > >>> > > >>> >> >> >RTE_SCHED_TYPE_ATOMIC); > >>> > > >>> >> >> >@@ -59,7 +65,13 @@ > >>> > > >>> >> >pipeline_queue_worker_single_stage_fwd(void > >>> > > >>> >> >> >*arg) > >>> > > >>> >> >> > rte_event_eth_tx_adapter_txq_set(ev.mbuf, > >>> >> >> > 0); > >>> > > >>> >> >> > pipeline_fwd_event(&ev, > >>> >> >> > RTE_SCHED_TYPE_ATOMIC); > >>> > > >>> >> >> > pipeline_event_enqueue(dev, port, &ev); > >>> > > >>> >> >> >- w->processed_pkts++; > >>> > > >>> >> >> >+ > >>> > > >>> >> >> >+ /* release barrier here ensures stored > >>> >> >> >+ operation > >>> > > >>> >> >> >+ * of the event completes before the number > >>> >> >> >+ of > >>> > > >>> >> >> >+ * processed pkts is visible to the main > >>> >> >> >+ core > >>> > > >>> >> >> >+ */ > >>> > > >>> >> >> >+ __atomic_fetch_add(&(w->processed_pkts), 1, > >>> > > >>> >> >> >+ > >>> >> >> >+ __ATOMIC_RELEASE); > >>> > > >>> >> >> > } > >>> > > >>> >> >> > > >>> > > >>> >> >> > return 0; > >>> > > >>> >> >> >@@ -84,7 +96,13 @@ > >>> > > >>> >> >> >pipeline_queue_worker_single_stage_burst_tx(void *arg) > >>> > > >>> >> >> > if (ev[i].sched_type == > >>> > > >>> >> >> >RTE_SCHED_TYPE_ATOMIC) { > >>> > > >>> >> >> > > >>> >> >> > pipeline_event_tx(dev, port, &ev[i]); > >>> > > >>> >> >> > ev[i].op > >>> >> >> > = RTE_EVENT_OP_RELEASE; > >>> > > >>> >> >> >- > >>> >> >> >w->processed_pkts++; > >>> > > >>> >> >> >+ > >>> > > >>> >> >> >+ /* release > >>> >> >> >+ barrier here ensures stored > >>> > > >>> >> >> >operation > >>> > > >>> >> >> >+ * of the > >>> >> >> >+ event completes before the > >>> > > >>> >> >> >number of > >>> > > >>> >> >> >+ * processed > >>> >> >> >+ pkts is visible to the main > >>> > > >>> >> >> >core > >>> > > >>> >> >> >+ */ > >>> > > >>> >> >> >+ > >>> >> >> >+ __atomic_fetch_add(&(w- > >>> > > >>> >> >> >>processed_pkts), 1, > >>> > > >>> >> >> >+ > >>> >__ATOMIC_RELEASE); > >>> > > >>> >> >> > } else { > >>> > > >>> >> >> > > >>> >> >> > ev[i].queue_id++; > >>> > > >>> >> >> > > >>> >> >> > pipeline_fwd_event(&ev[i], > >>> > > >>> >> >> >@@ -121,7 +139,13 @@ > >>> > > >>> >> >> >pipeline_queue_worker_single_stage_burst_fwd(void *arg) > >>> > > >>> >> >> > } > >>> > > >>> >> >> > > >>> > > >>> >> >> > pipeline_event_enqueue_burst(dev, port, > >>> >> >> > ev, nb_rx); > >>> > > >>> >> >> >- w->processed_pkts += nb_rx; > >>> > > >>> >> >> >+ > >>> > > >>> >> >> >+ /* release barrier here ensures stored > >>> >> >> >+ operation > >>> > > >>> >> >> >+ * of the event completes before the number > >>> >> >> >+ of > >>> > > >>> >> >> >+ * processed pkts is visible to the main > >>> >> >> >+ core > >>> > > >>> >> >> >+ */ > >>> > > >>> >> >> >+ __atomic_fetch_add(&(w->processed_pkts), > >nb_rx, > >>> > > >>> >> >> >+ > >>> >> >> >+ __ATOMIC_RELEASE); > >>> > > >>> >> >> > } > >>> > > >>> >> >> > > >>> > > >>> >> >> > return 0; > >>> > > >>> >> >> >@@ -146,7 +170,13 @@ > >>> > > >>> >> >pipeline_queue_worker_multi_stage_tx(void > >>> > > >>> >> >> >*arg) > >>> > > >>> >> >> > > >>> > > >>> >> >> > if (ev.queue_id == > >>> >> >> > tx_queue[ev.mbuf->port]) { > >>> > > >>> >> >> > pipeline_event_tx(dev, > >>> >> >> > port, &ev); > >>> > > >>> >> >> >- w->processed_pkts++; > >>> > > >>> >> >> >+ > >>> > > >>> >> >> >+ /* release barrier here > >>> >> >> >+ ensures stored > >>> >operation > >>> > > >>> >> >> >+ * of the event completes > >>> >> >> >+ before the number > >>> >of > >>> > > >>> >> >> >+ * processed pkts is visible > >>> >> >> >+ to the main core > >>> > > >>> >> >> >+ */ > >>> > > >>> >> >> >+ > >>> >> >> >+ __atomic_fetch_add(&(w->processed_pkts), > >>> >1, > >>> > > >>> >> >> >+ > >>> >> >> >+ __ATOMIC_RELEASE); > >>> > > >>> >> >> > continue; > >>> > > >>> >> >> > } > >>> > > >>> >> >> > > >>> > > >>> >> >> >@@ -180,7 +210,13 @@ > >>> > > >>> >> >> >pipeline_queue_worker_multi_stage_fwd(void *arg) > >>> > > >>> >> >> > ev.queue_id = > >>> >> >> > tx_queue[ev.mbuf->port]; > >>> > > >>> >> >> > > >>> >> >> > rte_event_eth_tx_adapter_txq_set(ev.mbuf, > >>> >0); > >>> > > >>> >> >> > pipeline_fwd_event(&ev, > >>> > > >>> >> >> >RTE_SCHED_TYPE_ATOMIC); > >>> > > >>> >> >> >- w->processed_pkts++; > >>> > > >>> >> >> >+ > >>> > > >>> >> >> >+ /* release barrier here > >>> >> >> >+ ensures stored > >>> >operation > >>> > > >>> >> >> >+ * of the event completes > >>> >> >> >+ before the number > >>> >of > >>> > > >>> >> >> >+ * processed pkts is visible > >>> >> >> >+ to the main core > >>> > > >>> >> >> >+ */ > >>> > > >>> >> >> >+ > >>> >> >> >+ __atomic_fetch_add(&(w->processed_pkts), > >>> >1, > >>> > > >>> >> >> >+ > >>> >> >> >+ __ATOMIC_RELEASE); > >>> > > >>> >> >> > } else { > >>> > > >>> >> >> > ev.queue_id++; > >>> > > >>> >> >> > pipeline_fwd_event(&ev, > >>> > > >>> >> >> >sched_type_list[cq_id]); > >>> > > >>> >> >> >@@ -214,7 +250,13 @@ > >>> > > >>> >> >> >pipeline_queue_worker_multi_stage_burst_tx(void *arg) > >>> > > >>> >> >> > if (ev[i].queue_id == > >>> >> >> > tx_queue[ev[i].mbuf- > >>> > > >>> >> >> >>port]) { > >>> > > >>> >> >> > > >>> >> >> > pipeline_event_tx(dev, port, &ev[i]); > >>> > > >>> >> >> > ev[i].op > >>> >> >> > = RTE_EVENT_OP_RELEASE; > >>> > > >>> >> >> >- > >>> >> >> >w->processed_pkts++; > >>> > > >>> >> >> >+ > >>> > > >>> >> >> >+ /* release > >>> >> >> >+ barrier here ensures stored > >>> > > >>> >> >> >operation > >>> > > >>> >> >> >+ * of the > >>> >> >> >+ event completes before the > >>> > > >>> >> >> >number of > >>> > > >>> >> >> >+ * processed > >>> >> >> >+ pkts is visible to the main > >>> > > >>> >> >> >core > >>> > > >>> >> >> >+ */ > >>> > > >>> >> >> >+ > >>> >> >> >+ __atomic_fetch_add(&(w- > >>> > > >>> >> >> >>processed_pkts), 1, > >>> > > >>> >> >> >+ > >>> >__ATOMIC_RELEASE); > >>> > > >>> >> >> > continue; > >>> > > >>> >> >> > } > >>> > > >>> >> >> > > >>> > > >>> >> >> >@@ -254,7 +296,13 @@ > >>> > > >>> >> >> >pipeline_queue_worker_multi_stage_burst_fwd(void *arg) > >>> > > >>> >> >> > > >>> > > >>> >> >> > rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0); > >>> > > >>> >> >> > > >>> >> >> > pipeline_fwd_event(&ev[i], > >>> > > >>> >> >> > > >>> > > >>> >> >> > RTE_SCHED_TYPE_ATOMIC); > >>> > > >>> >> >> >- > >>> >> >> >w->processed_pkts++; > >>> > > >>> >> >> >+ > >>> > > >>> >> >> >+ /* release > >>> >> >> >+ barrier here ensures stored > >>> > > >>> >> >> >operation > >>> > > >>> >> >> >+ * of the > >>> >> >> >+ event completes before the > >>> > > >>> >> >> >number of > >>> > > >>> >> >> >+ * processed > >>> >> >> >+ pkts is visible to the main > >>> > > >>> >> >> >core > >>> > > >>> >> >> >+ */ > >>> > > >>> >> >> >+ > >>> >> >> >+ __atomic_fetch_add(&(w- > >>> > > >>> >> >> >>processed_pkts), 1, > >>> > > >>> >> >> >+ > >>> >__ATOMIC_RELEASE); > >>> > > >>> >> >> > } else { > >>> > > >>> >> >> > > >>> >> >> > ev[i].queue_id++; > >>> > > >>> >> >> > > >>> >> >> > pipeline_fwd_event(&ev[i], > >>> > > >>> >> >> >-- > >>> > > >>> >> >> >2.17.1 > >>> >