Hi Feifei,

> Hi, Pavan
>
>> -----Original Message-----
>> From: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com>
>> Sent: January 8, 2021 17:13
>> To: Feifei Wang <feifei.wa...@arm.com>; jer...@marvell.com; Harry
>> van Haaren <harry.van.haa...@intel.com>
>> Cc: dev@dpdk.org; nd <n...@arm.com>; Honnappa Nagarahalli
>> <honnappa.nagaraha...@arm.com>; sta...@dpdk.org; Ruifeng Wang
>> <ruifeng.w...@arm.com>; nd <n...@arm.com>; nd <n...@arm.com>
>> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for
>> pipeline test
>>
>> Hi Feifei,
>>
>>> Hi, Pavan
>>>
>>>> -----Original Message-----
>>>> From: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com>
>>>> Sent: January 5, 2021 17:29
>>>> To: Feifei Wang <feifei.wa...@arm.com>; jer...@marvell.com; Harry
>>>> van Haaren <harry.van.haa...@intel.com>
>>>> Cc: dev@dpdk.org; nd <n...@arm.com>; Honnappa Nagarahalli
>>>> <honnappa.nagaraha...@arm.com>; sta...@dpdk.org; Ruifeng Wang
>>>> <ruifeng.w...@arm.com>; nd <n...@arm.com>
>>>> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers
>>>> for pipeline test
>>>>
>>>> Hi Feifei,
>>>>
>>>>> Hi, Pavan
>>>>>
>>>>> Sorry for my late reply, and thanks very much for your review.
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com>
>>>>>> Sent: December 22, 2020 18:33
>>>>>> To: Feifei Wang <feifei.wa...@arm.com>; jer...@marvell.com;
>>>>>> Harry van Haaren <harry.van.haa...@intel.com>; Pavan Nikhilesh
>>>>>> <pbhagavat...@caviumnetworks.com>
>>>>>> Cc: dev@dpdk.org; nd <n...@arm.com>; Honnappa Nagarahalli
>>>>>> <honnappa.nagaraha...@arm.com>; sta...@dpdk.org; Phil Yang
>>>>>> <phil.y...@arm.com>
>>>>>> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers
>>>>>> for pipeline test
>>>>>>
>>>>>>> Add release barriers before updating the processed packets for
>>>>>>> worker lcores to ensure the worker lcore has really finished data
>>>>>>> processing and then it can update the processed packets number.
>>>>>>
>>>>>> I believe we can live with minor inaccuracies in the stats being
>>>>>> presented, as atomics are pretty heavy when the scheduler is
>>>>>> limited to a burst size of 1.
>>>>>>
>>>>>> One option is to move it before a pipeline operation
>>>>>> (pipeline_event_tx, pipeline_fwd_event etc.) as they imply an
>>>>>> implicit release barrier (all the changes done to the event
>>>>>> should be visible to the next core).
>>>>>
>>>>> If I understand correctly, your meaning is to move the release
>>>>> barriers before pipeline_event_tx or pipeline_fwd_event. This can
>>>>> ensure the event has been processed before the next core begins to
>>>>> tx/fwd.
>>>>> For example:
>>>>
>>>> What I meant was that event APIs such as `rte_event_enqueue_burst`
>>>> and `rte_event_eth_tx_adapter_enqueue` act as an implicit release
>>>> barrier, and the API `rte_event_dequeue_burst` acts as an implicit
>>>> acquire barrier.
>>>>
>>>> Since a pipeline_* test starts with a dequeue() and ends with an
>>>> enqueue(), I don't believe we need barriers in between.
>>>
>>> Sorry for my misunderstanding this. And I agree with you that no
>>> barriers are needed between dequeue and enqueue.
>>>
>>> Now, let's go back to the beginning. Actually, with this patch, our
>>> barrier is mainly for the synchronization variable
>>> "w->processed_pkts". As we all know, the event is first dequeued and
>>> then enqueued; after this, the event can be treated as a processed
>>> event and included in the statistics ("w->processed_pkts++").
>>>
>>> Thus, we add a release barrier before "w->processed_pkts++" to
>>> prevent this operation from being executed ahead of time. For
>>> example:
>>>
>>> dequeue -> w->processed_pkts++ -> enqueue
>>>
>>> Here the worker hasn't actually finished processing the event, but
>>> the event is treated as processed and included in the statistics.
>>
>> But the current sequence is dequeue -> enqueue -> w->processed_pkts++,
>> and enqueue already acts as an explicit release barrier, right?
>
> Sorry, maybe I cannot understand how "enqueue" acts as an explicit
> release barrier. I can think of two possibilities:
>
> 1. As you said before, all the changes done to the event should be
> visible to the next core, and enqueue is an operation on the event, so
> the next core should wait for the event to be enqueued. I think this
> is due to a data dependence on the same variable. However,
> 'w->processed_pkts' and 'ev' are different variables, so this cannot
> prevent 'w->processed_pkts++' from happening before the enqueue. And
> the main core may load an updated 'w->processed_pkts' while the event
> is actually still being processed. For example:
>
>	Time slot	Worker 1		Main core
>	1		dequeue
>	2		w->processed_pkts++
>	3					load w->processed_pkts
>	4		enqueue
>
> 2. Some release barriers are included in the enqueue itself. There is
> a release barrier in rte_ring_enqueue:
>
> move head -> copy elements to the ring -> release barrier ->
> update tail -> w->processed_pkts++
>
> However, this barrier cannot prevent 'w->processed_pkts++' from being
> reordered before the tail update, and once update_tail has finished,
> the enqueue can be considered complete.
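To put possibility 2 in code, here is a minimal sketch of a generic
ring-style publish (the names are illustrative; this is not the actual
rte_ring implementation):

	#include <stdint.h>

	struct ring {
		uint32_t tail;
		void *slots[1024];
	};

	/* Typical enqueue publish step: write the element, then update
	 * the tail with a release store so the element contents are
	 * visible to consumers before the new tail is.
	 */
	static inline void
	ring_publish(struct ring *r, void *obj, uint32_t head)
	{
		r->slots[head & 1023] = obj;
		__atomic_store_n(&r->tail, head + 1, __ATOMIC_RELEASE);
	}

A release store only orders the accesses *before* it, so in

	ring_publish(r, ev, head);
	w->processed_pkts++;	/* plain increment, may be hoisted */

the increment is still free to move above the tail update, which is
exactly the window described above; making the increment itself a
release operation closes it.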
I was talking about case 2 in particular; almost all enqueue calls have
some kind of release barrier in place. I do agree w->processed_pkts++
might get reordered with the tail update, but since enqueue itself is a
ldr + blr, I was hoping that it wouldn't occur. We can continue the
discussion once I have some performance data. Thanks for your
patience :)

Pavan.

>>> ___________________________________________________________________
>>>
>>> By the way, I have two other questions about the pipeline process
>>> test in "test_pipeline_queue".
>>>
>>> 1. When do we start counting processed events (w->processed_pkts)?
>>> For the fwd mode (internal_port = false), when we choose a single
>>> stage, the application increments the number of events processed
>>> after "pipeline_event_enqueue". However, when we choose multiple
>>> stages, the application increments the number of events processed
>>> before "pipeline_event_enqueue".
>>
>> We count an event as processed when all the stages have completed
>> and it is transmitted.
>>
>>> So, maybe we can unify this. For example, for multiple stages:
>>>
>>>		if (cq_id == last_queue) {
>>>			ev.queue_id = tx_queue[ev.mbuf->port];
>>>			rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
>>>			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>>>+			pipeline_event_enqueue(dev, port, &ev);
>>>			w->processed_pkts++;
>>>		} else {
>>>			ev.queue_id++;
>>>			pipeline_fwd_event(&ev, sched_type_list[cq_id]);
>>>+			pipeline_event_enqueue(dev, port, &ev);
>>>		}
>>>
>>>-		pipeline_event_enqueue(dev, port, &ev);
>>
>> The above change makes sense.
>
> Thanks for your review, and I'll update this change in the next
> version.
>
>>> 2. Is "pipeline_event_enqueue" needed after "pipeline_event_tx" for
>>> tx mode? For single_stage_burst_tx mode, after "pipeline_event_tx",
>>> the worker has to enqueue again due to
>>> "pipeline_event_enqueue_burst", so maybe we should jump out of the
>>> loop after "pipeline_event_tx",
>>
>> We call enqueue burst to release the events, i.e. enqueue events with
>> RTE_EVENT_OP_RELEASE.
>
> However, in the case of a single event, for
> 'pipeline_queue_worker_single_stage_tx' and
> 'pipeline_queue_worker_multi_stage_tx', after tx there is no release
> operation.
>>> for example:
>>>
>>>			if (ev[i].sched_type == RTE_SCHED_TYPE_ATOMIC) {
>>>				pipeline_event_tx(dev, port, &ev[i]);
>>>				ev[i].op = RTE_EVENT_OP_RELEASE;
>>>				w->processed_pkts++;
>>>+				continue;
>>>			} else {
>>>				ev[i].queue_id++;
>>>				pipeline_fwd_event(&ev[i],
>>>						RTE_SCHED_TYPE_ATOMIC);
>>>			}
>>>		}
>>>
>>>		pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
>>>>>
>>>>>	if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
>>>>>+		__atomic_thread_fence(__ATOMIC_RELEASE);
>>>>>		pipeline_event_tx(dev, port, &ev);
>>>>>		w->processed_pkts++;
>>>>>	} else {
>>>>>		ev.queue_id++;
>>>>>+		__atomic_thread_fence(__ATOMIC_RELEASE);
>>>>>		pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>>>>>		pipeline_event_enqueue(dev, port, &ev);
>>>>>
>>>>> However, there are two reasons to prevent this:
>>>>>
>>>>> First, compared with the other tests in app/eventdev, for example
>>>>> the eventdev perf test, the wmb is placed after the event
>>>>> operation to ensure the operation has finished before
>>>>> w->processed_pkts++.
>>>>
>>>> In the case of perf_* tests, which start with a dequeue() and
>>>> finally end with a mempool_put(), shouldn't those also act as
>>>> implicit acquire/release pairs, making the stats consistent?
>>>
>>> For perf tests, this consistency refers to there being a wmb after
>>> mempool_put(). Please refer to this link:
>>> http://patches.dpdk.org/patch/85634/
>>>
>>>>> So, if we move the release barriers before tx/fwd, it may make the
>>>>> tests of app/eventdev inconsistent with each other. This may
>>>>> reduce the maintainability of the code and make it difficult to
>>>>> understand.
>>>>>
>>>>> Second, it is a test case: though a heavy thread fence may cause
>>>>> some performance degradation, it ensures that the operation order
>>>>> and the test result are correct. And maybe for a test case,
>>>>> correctness is more important than performance.
>>>>
>>>> Most of our internal perf tests run on 24/48 core combinations,
>>>> and since the Octeontx2 event device driver supports a burst size
>>>> of 1, it will show up as a huge performance degradation.
>>> For the impact on performance, I did the test using the software
>>> driver; the following are some test results:
>>> -------------------------------------------------------------------
>>> Architecture: aarch64
>>> NIC: ixgbe-82599
>>> CPU: Cortex-A72
>>> BURST_SIZE: 1
>>> Command: ./dpdk-test-eventdev -l 0-15 -s 0x2 --vdev=event_sw0 -- \
>>>	--test=pipeline_queue --wlcore=4-14 --prod_type_ethdev --stlist=a,a
>>> Flow: one flow, 64-byte packets, TX rate: 1.4 Mpps
>>>
>>> Without this patch:	0.954 Mpps	avg 0.953 Mpps
>>> With this patch:	0.932 Mpps	avg 0.930 Mpps
>>> -------------------------------------------------------------------
>>>
>>> Based on the results above, there is no significant performance
>>> degradation with this patch. This is because the release barrier is
>>> only for "w->processed_pkts++". It just ensures that the worker
>>> core increments the number of events processed after the enqueue,
>>> and it doesn't affect dequeue/enqueue:
>>>
>>> dequeue -> enqueue -> release barrier -> w->processed_pkts++
>>
>> Here enqueue already acts as an explicit release barrier.
>
> Please refer to the reasons above.
>
>>> On the other hand, I infer the reason for the slight decrease in
>>> the measured performance is that the release barrier prevents
>>> "w->processed_pkts++" from happening before the event has been
>>> processed (enqueued). But I think this test result is closer to the
>>> real performance.
>>>
>>> And sorry that we have no Octeontx2 device, so there is no test
>>> result on the Octeontx2 event device driver. Would you please help
>>> us test this patch on Octeontx2 when it is convenient? Thanks very
>>> much.
>>
>> I will report the performance numbers on Monday.
>
> That's great, thanks very much for your help.
>
> Best Regards
> Feifei
>
>>> Best Regards
>>> Feifei
>>
>> Regards,
>> Pavan.
>>
>>>>> So, due to the two reasons above, I'm ambivalent about what we
>>>>> should do in the next step.
>>>>>
>>>>> Best Regards
>>>>> Feifei
>>>>
>>>> Regards,
>>>> Pavan.
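A side note on the consumer of these counters: the release increment
only does its job if the main lcore's read side pairs with it. A
minimal sketch of that pairing (illustrative only -- the actual stats
collection code is outside this patch, and "struct worker" here is a
stand-in for the test's per-worker data):

	#include <stdint.h>

	struct worker {
		uint64_t processed_pkts;
	};

	/* Main-lcore side: acquire loads pair with the workers'
	 * __ATOMIC_RELEASE increments, so that when a count is
	 * observed, the event stores that preceded it on the worker
	 * are visible too.
	 */
	static uint64_t
	total_processed(const struct worker *w, unsigned int nb_workers)
	{
		uint64_t total = 0;
		unsigned int i;

		for (i = 0; i < nb_workers; i++)
			total += __atomic_load_n(&w[i].processed_pkts,
					__ATOMIC_ACQUIRE);
		return total;
	}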
>>>>>>> Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue worker
>>>>>>> functions")
>>>>>>> Cc: pbhagavat...@marvell.com
>>>>>>> Cc: sta...@dpdk.org
>>>>>>>
>>>>>>> Signed-off-by: Phil Yang <phil.y...@arm.com>
>>>>>>> Signed-off-by: Feifei Wang <feifei.wa...@arm.com>
>>>>>>> Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
>>>>>>> ---
>>>>>>>  app/test-eventdev/test_pipeline_queue.c | 64 +++++++++++++++++++++----
>>>>>>>  1 file changed, 56 insertions(+), 8 deletions(-)
>>>>>>>
>>>>>>> diff --git a/app/test-eventdev/test_pipeline_queue.c b/app/test-eventdev/test_pipeline_queue.c
>>>>>>> index 7bebac34f..0c0ec0ceb 100644
>>>>>>> --- a/app/test-eventdev/test_pipeline_queue.c
>>>>>>> +++ b/app/test-eventdev/test_pipeline_queue.c
>>>>>>> @@ -30,7 +30,13 @@ pipeline_queue_worker_single_stage_tx(void *arg)
>>>>>>>
>>>>>>>  		if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
>>>>>>>  			pipeline_event_tx(dev, port, &ev);
>>>>>>> -			w->processed_pkts++;
>>>>>>> +
>>>>>>> +			/* release barrier here ensures stored operation
>>>>>>> +			 * of the event completes before the number of
>>>>>>> +			 * processed pkts is visible to the main core
>>>>>>> +			 */
>>>>>>> +			__atomic_fetch_add(&(w->processed_pkts), 1,
>>>>>>> +					__ATOMIC_RELEASE);
>>>>>>>  		} else {
>>>>>>>  			ev.queue_id++;
>>>>>>>  			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>>>>>>> @@ -59,7 +65,13 @@ pipeline_queue_worker_single_stage_fwd(void *arg)
>>>>>>>  		rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
>>>>>>>  		pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>>>>>>>  		pipeline_event_enqueue(dev, port, &ev);
>>>>>>> -		w->processed_pkts++;
>>>>>>> +
>>>>>>> +		/* release barrier here ensures stored operation
>>>>>>> +		 * of the event completes before the number of
>>>>>>> +		 * processed pkts is visible to the main core
>>>>>>> +		 */
>>>>>>> +		__atomic_fetch_add(&(w->processed_pkts), 1,
>>>>>>> +				__ATOMIC_RELEASE);
>>>>>>>  	}
>>>>>>>
>>>>>>>  	return 0;
>>>>>>> @@ -84,7 +96,13 @@ pipeline_queue_worker_single_stage_burst_tx(void *arg)
>>>>>>>  			if (ev[i].sched_type == RTE_SCHED_TYPE_ATOMIC) {
>>>>>>>  				pipeline_event_tx(dev, port, &ev[i]);
>>>>>>>  				ev[i].op = RTE_EVENT_OP_RELEASE;
>>>>>>> -				w->processed_pkts++;
>>>>>>> +
>>>>>>> +				/* release barrier here ensures stored
>>>>>>> +				 * operation of the event completes before
>>>>>>> +				 * the number of processed pkts is visible
>>>>>>> +				 * to the main core
>>>>>>> +				 */
>>>>>>> +				__atomic_fetch_add(&(w->processed_pkts), 1,
>>>>>>> +						__ATOMIC_RELEASE);
>>>>>>>  			} else {
>>>>>>>  				ev[i].queue_id++;
>>>>>>>  				pipeline_fwd_event(&ev[i],
>>>>>>> @@ -121,7 +139,13 @@ pipeline_queue_worker_single_stage_burst_fwd(void *arg)
>>>>>>>  		}
>>>>>>>
>>>>>>>  		pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
>>>>>>> -		w->processed_pkts += nb_rx;
>>>>>>> +
>>>>>>> +		/* release barrier here ensures stored operation
>>>>>>> +		 * of the event completes before the number of
>>>>>>> +		 * processed pkts is visible to the main core
>>>>>>> +		 */
>>>>>>> +		__atomic_fetch_add(&(w->processed_pkts), nb_rx,
>>>>>>> +				__ATOMIC_RELEASE);
>>>>>>>  	}
>>>>>>>
>>>>>>>  	return 0;
>>>>>>> @@ -146,7 +170,13 @@ pipeline_queue_worker_multi_stage_tx(void *arg)
>>>>>>>
>>>>>>>  		if (ev.queue_id == tx_queue[ev.mbuf->port]) {
>>>>>>>  			pipeline_event_tx(dev, port, &ev);
>>>>>>> -			w->processed_pkts++;
>>>>>>> +
>>>>>>> +			/* release barrier here ensures stored operation
>>>>>>> +			 * of the event completes before the number of
>>>>>>> +			 * processed pkts is visible to the main core
>>>>>>> +			 */
>>>>>>> +			__atomic_fetch_add(&(w->processed_pkts), 1,
>>>>>>> +					__ATOMIC_RELEASE);
>>>>>>>  			continue;
>>>>>>>  		}
>>>>>>>
>>>>>>> @@ -180,7 +210,13 @@ pipeline_queue_worker_multi_stage_fwd(void *arg)
>>>>>>>  			ev.queue_id = tx_queue[ev.mbuf->port];
>>>>>>>  			rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
>>>>>>>  			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>>>>>>> -			w->processed_pkts++;
>>>>>>> +
>>>>>>> +			/* release barrier here ensures stored operation
>>>>>>> +			 * of the event completes before the number of
>>>>>>> +			 * processed pkts is visible to the main core
>>>>>>> +			 */
>>>>>>> +			__atomic_fetch_add(&(w->processed_pkts), 1,
>>>>>>> +					__ATOMIC_RELEASE);
>>>>>>>  		} else {
>>>>>>>  			ev.queue_id++;
>>>>>>>  			pipeline_fwd_event(&ev, sched_type_list[cq_id]);
>>>>>>> @@ -214,7 +250,13 @@ pipeline_queue_worker_multi_stage_burst_tx(void *arg)
>>>>>>>  				if (ev[i].queue_id == tx_queue[ev[i].mbuf->port]) {
>>>>>>>  					pipeline_event_tx(dev, port, &ev[i]);
>>>>>>>  					ev[i].op = RTE_EVENT_OP_RELEASE;
>>>>>>> -					w->processed_pkts++;
>>>>>>> +
>>>>>>> +					/* release barrier here ensures stored
>>>>>>> +					 * operation of the event completes before
>>>>>>> +					 * the number of processed pkts is visible
>>>>>>> +					 * to the main core
>>>>>>> +					 */
>>>>>>> +					__atomic_fetch_add(&(w->processed_pkts), 1,
>>>>>>> +							__ATOMIC_RELEASE);
>>>>>>>  					continue;
>>>>>>>  				}
>>>>>>>
>>>>>>> @@ -254,7 +296,13 @@ pipeline_queue_worker_multi_stage_burst_fwd(void *arg)
>>>>>>>  				rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
>>>>>>>  				pipeline_fwd_event(&ev[i],
>>>>>>>  						RTE_SCHED_TYPE_ATOMIC);
>>>>>>> -				w->processed_pkts++;
>>>>>>> +
>>>>>>> +				/* release barrier here ensures stored
>>>>>>> +				 * operation of the event completes before
>>>>>>> +				 * the number of processed pkts is visible
>>>>>>> +				 * to the main core
>>>>>>> +				 */
>>>>>>> +				__atomic_fetch_add(&(w->processed_pkts), 1,
>>>>>>> +						__ATOMIC_RELEASE);
>>>>>>>  			} else {
>>>>>>>  				ev[i].queue_id++;
>>>>>>>  				pipeline_fwd_event(&ev[i],
>>>>>>> --
>>>>>>> 2.17.1
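For comparison, the two counting styles that came up in this thread,
side by side as a minimal compilable sketch ("struct worker" is a
stand-in for the test's per-worker data; on the architectures these
tests target, both publish the prior event stores before the counter
update, while (b) additionally makes the update itself atomic):

	#include <stdint.h>

	struct worker {
		uint64_t processed_pkts;
	};

	/* (a) Fence style, as referenced above for the perf tests:
	 * a release fence after the event operation orders it before
	 * the plain counter increment that follows.
	 */
	static void
	count_with_fence(struct worker *w)
	{
		__atomic_thread_fence(__ATOMIC_RELEASE);
		w->processed_pkts++;
	}

	/* (b) This diff: a single release read-modify-write, which
	 * both orders the prior event stores and keeps the counter
	 * update atomic.
	 */
	static void
	count_with_rmw(struct worker *w)
	{
		__atomic_fetch_add(&w->processed_pkts, 1,
				__ATOMIC_RELEASE);
	}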