eventdev: add release barriers for pipeline test

Feifei Wang Wed, 13 Jan 2021 22:07:40 -0800

Hi, Pavan

> -----邮件原件-----
> 发件人: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com>
> 发送时间: 2021年1月12日 16:29
> 收件人: Feifei Wang <feifei.wa...@arm.com>; jer...@marvell.com; Harry
> van Haaren <harry.van.haa...@intel.com>
> 抄送: dev@dpdk.org; nd <n...@arm.com>; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>; sta...@dpdk.org; Ruifeng Wang
> <ruifeng.w...@arm.com>; nd <n...@arm.com>; nd <n...@arm.com>; nd
> <n...@arm.com>
> 主题: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline
> test
> 
> Hi Feifei,
> 
> >Hi Feifei,
> >
> >>Hi, Pavan
> >>
> >>> -----邮件原件-----
> >>> 发件人: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com>
> >>> 发送时间: 2021年1月8日 17:13
> >>> 收件人: Feifei Wang <feifei.wa...@arm.com>;
> >jer...@marvell.com;
> >>Harry
> >>> van Haaren <harry.van.haa...@intel.com>
> >>> 抄送: dev@dpdk.org; nd <n...@arm.com>; Honnappa Nagarahalli
> >>> <honnappa.nagaraha...@arm.com>; sta...@dpdk.org; Ruifeng
> >Wang
> >>> <ruifeng.w...@arm.com>; nd <n...@arm.com>; nd <n...@arm.com>
> >>> 主题: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for
> >>pipeline
> >>> test
> >>>
> >>> Hi Feifei,
> >>>
> >>> >Hi, Pavan
> >>> >
> >>> >
> >>> >
> >>> >> -----邮件原件-----
> >>> >
> >>> >> 发件人: Pavan Nikhilesh Bhagavatula
> >><mailto:pbhagavat...@marvell.com>
> >>> >
> >>> >> 发送时间: 2021年1月5日 17:29
> >>> >
> >>> >> 收件人: Feifei Wang <mailto:feifei.wa...@arm.com>;
> >>mailto:jer...@marvell.com;
> >>> >Harry
> >>> >
> >>> >> van Haaren <mailto:harry.van.haa...@intel.com>
> >>> >
> >>> >> 抄送: mailto:dev@dpdk.org; nd <mailto:n...@arm.com>;
> >Honnappa
> >>Nagarahalli
> >>> >
> >>> >> <mailto:honnappa.nagaraha...@arm.com>;
> >>mailto:sta...@dpdk.org; Ruifeng Wang
> >>> >
> >>> >> <mailto:ruifeng.w...@arm.com>; nd <mailto:n...@arm.com>
> >>> >
> >>> >> 主题: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers
> >>for
> >>> >pipeline
> >>> >
> >>> >> test
> >>> >
> >>> >>
> >>> >
> >>> >> Hi Feifei,
> >>> >
> >>> >>
> >>> >
> >>> >> >Hi, Pavan
> >>> >
> >>> >> >
> >>> >
> >>> >> >Sorry for my late reply and thanks very much for your review.
> >>> >
> >>> >> >
> >>> >
> >>> >> >> -----Original Message-----
> >>> >
> >>> >> >> From: Pavan Nikhilesh Bhagavatula
> >>>
> >>><mailto:pbhagavat...@marvell.com<mailto:pbhagavatula@marvell.
> >c
> >>om>>
> >>> >
> >>> >> >> Sent: 2020年12月22日 18:33
> >>> >
> >>> >> >> To: Feifei Wang
> >>>
> >><mailto:feifei.wa...@arm.com<mailto:feifei.wa...@arm.com>>;
> >>> >mailto:jer...@marvell.com<mailto:jer...@marvell.com>;
> >>> >
> >>> >> >Harry van
> >>> >
> >>> >> >> Haaren
> >>>
> >>><mailto:harry.van.haa...@intel.com<mailto:harry.van.haaren@intel
> >.
> >>com>>;
> >>> >Pavan Nikhilesh
> >>> >
> >>> >> >>
> >>>
> >>><pbhagavat...@caviumnetworks.com<mailto:pbhagavatula@caviu
> >m
> >>n
> >>> >etworks.com>>
> >>> >
> >>> >> >> Cc: mailto:dev@dpdk.org<mailto:dev@dpdk.org>; nd
> >>> ><mailto:n...@arm.com<mailto:n...@arm.com>>; Honnappa
> >>Nagarahalli
> >>> >
> >>> >> >>
> >>>
> >>><honnappa.nagaraha...@arm.com<mailto:Honnappa.Nagarahalli@a
> >r
> >>m
> >>> >.com>>; mailto:sta...@dpdk.org<mailto:sta...@dpdk.org>; Phil
> >>Yang
> >>> >
> >>> >> >> <mailto:phil.y...@arm.com<mailto:phil.y...@arm.com>>
> >>> >
> >>> >> >> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release
> >>> >barriers
> >>> >
> >>> >> >for
> >>> >
> >>> >> >> pipeline test
> >>> >
> >>> >> >>
> >>> >
> >>> >> >>
> >>> >
> >>> >> >> >Add release barriers before updating the processed packets
> >for
> >>> >
> >>> >> >worker
> >>> >
> >>> >> >> >lcores to ensure the worker lcore has really finished data
> >>> >
> >>> >> >> >processing and then it can update the processed packets
> >>number.
> >>> >
> >>> >> >> >
> >>> >
> >>> >> >>
> >>> >
> >>> >> >> I believe we can live with minor inaccuracies in stats being
> >>> >
> >>> >> >> presented
> >>> >
> >>> >> >as
> >>> >
> >>> >> >> atomics are pretty heavy when scheduler is limited to burst
> >size
> >>> >> >> as
> >>> >1.
> >>> >
> >>> >> >>
> >>> >
> >>> >> >> One option is to move it before a pipeline operation
> >>> >
> >>> >> >(pipeline_event_tx,
> >>> >
> >>> >> >> pipeline_fwd_event etc.) as they imply implicit release
> >>> >> >> barrier (as
> >>> >
> >>> >> >> all
> >>> >
> >>> >> >the
> >>> >
> >>> >> >> changes done to the event should be visible to the next core).
> >>> >
> >>> >> >
> >>> >
> >>> >> >If I understand correctly, your meaning is that move release
> >>> >> >barriers
> >>> >
> >>> >> >before pipeline_event_tx or pipeline_fwd_event. This can ensure
> >>the
> >>> >
> >>> >> >event has been processed before the next core begins to tx/fwd.
> >>For
> >>> >
> >>> >> >example:
> >>> >
> >>> >>
> >>> >
> >>> >> What I meant was event APIs such as `rte_event_enqueue_burst`,
> >>> >
> >>> >> `rte_event_eth_tx_adapter_enqueue`
> >>> >
> >>> >> act as an implicit release barrier and the API
> >>> >`rte_event_dequeue_burst` act
> >>> >
> >>> >> as an implicit acquire barrier.
> >>> >
> >>> >
> >>> >
> >>> >>
> >>> >
> >>> >> Since, pipeline_* test starts with a dequeue() and ends with an
> >>> >enqueue() I
> >>> >
> >>> >> don’t believe we need barriers in Between.
> >>> >
> >>> >
> >>> >
> >>> >Sorry for my misunderstanding this. And I agree with you that no
> >>> >barriers are
> >>> >
> >>> >needed between dequeue and enqueue.
> >>> >
> >>> >
> >>> >
> >>> >Now, let's go back to the beginning. Actually with this patch, our
> >>> >barrier is mainly
> >>> >
> >>> >for the synchronous variable " w->processed_pkts ". As we all
> >know,
> >>the
> >>> >event is firstly
> >>> >
> >>> >dequeued and then enqueued, after this, the event can be treated
> >as
> >>the
> >>> >processed event
> >>> >
> >>> >and included in the statistics("w->processed_pkts++").
> >>> >
> >>> >
> >>> >
> >>> >Thus, we add a release barrier before " w->processed_pkts++" is to
> >>> >prevent this operation
> >>> >
> >>> >being executed ahead of time. For example:
> >>> >
> >>> >dequeue  ->  w->processed_pkts++  ->  enqueue
> >>> >
> >>> >This cause that the worker doesn't actually finish this event
> >>> >processing, but the event is treated
> >>> >
> >>> >as the processed one and included in the statistics.
> >>> >
> >>>
> >>> But the current sequence is dequeue-> enqueue-> w-
> processed_pkts++
> >>>and enqueue already acts as an explicit release barrier right?
> >>>
> >>
> >>Sorry maybe I cannot understand how “enqueue” as an explicit release
> >>barrier. I think of two possibilities:
> >>1. As you say before, all the changes done to the event should be
> >visible
> >>to the next core and enqueue is a operation for event, so the next
> >core
> >>should wait for the event to be enqueued.
> >>I think this is due to data dependence for the same variable. However,
> >>‘w->processed_pkts’ and ‘ev’ are different variables, so this cannot
> >>prevent ‘w->processed_pkts++’ before enqueue.
> >>And the main core may load updated ‘w->processed_pkts’ but actually
> >>the event is still being processed. For example:
> >>
> >> Time Slot     Worker 1                                      Main core
> >>   1                           dequeue
> >>   2                  w->processed_pkts++
> >>   3                                                              load w-
> >>processed_pkts
> >>   4                          enqueue
> >>
> >>2. Some release barriers have been included in enqueue. There is a
> >>release barrier in rte_ring_enqueue :
> >>move head -> copy elements to the ring -> release barrier -> update
> >tail
> >>-> w->processed_pkts++
> >>However, this barrier cannot prevent ‘w->processed_pkts++’ before
> >>update tail, and when update_tail has been finished, the enqueue
> >>process  can be seen completed.
> >
> >I was talking about case 2 in particular almost all enqueue calls have
> >some kind of release barrier in place. I do agree w->processed_pkts++
> >might get reordered with tail update but since enqueue itself is a ldr
> >+ blr I was hoping that it wouldn't occur.
> >
> >We can continue the discussion once I have some performance data.
> >
> >Thanks for your patience :)
> >Pavan.
> >
> >>
> >>> >
> >>> >
> >>>
> >>>_______________________________________________________
> _
> >_
> >>> _
> >>> >____________________
> >>> >
> >>> >
> >>> >
> >>> >By the way, I have two other questions about pipeline process test
> >in
> >>> >"test_pipeline_queue".
> >>> >
> >>> >1. when do we start counting processed events (w-
> >>>processed_pkts)?
> >>> >
> >>> >For the fwd mode (internal_port = false), when we choose single
> >>stage,
> >>> >application increments
> >>> >
> >>> >the number events processed after "pipeline_event enqueue".
> >>> >However, when we choose multiple
> >>> >
> >>> >stage, application increments the number events processed before
> >>> >"pipleline_event_enqueue".
> >>>
> >>> We count an event as process when all the stages have completed
> >>and its
> >>> Trasnmitted.
> >>>
> >>> >So,
> >>> >
> >>> >maybe we can unify this. For example of multiple stage:
> >>> >
> >>> >
> >>> >
> >>> >                                if (cq_id == last_queue) {
> >>> >
> >>> >                                                ev.queue_id =
> >>> > tx_queue[ev.mbuf->port];
> >>> >
> >>> >
> >>> >rte_event_eth_tx_adapter_txq_set(ev.mbuf,
> >>> >0);
> >>> >
> >>> >
> >>> >pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
> >>> >
> >>> >                                +             
> >>> > pipeline_event_enqueue(dev, port, &ev);
> >>> >
> >>> >
> >>> > w->processed_pkts++;
> >>> >
> >>> >                                } else {
> >>> >
> >>> >                                                ev.queue_id++;
> >>> >
> >>> >
> >>> >pipeline_fwd_event(&ev, sched_type_list[cq_id]);
> >>> >
> >>> >                                +             
> >>> > pipeline_event_enqueue(dev, port, &ev);
> >>> >
> >>> >                                }
> >>> >
> >>> >
> >>> >
> >>> >                -              pipeline_event_enqueue(dev, port, &ev);
> >>> >
> >>> >
> >>>
> >>> The above change makes sense.
> >>>
> >>Thanks for your review, and I’ll update this change into the next
> >>version.
> >>> >
> >>> >2. Whether  "pipeline_event_enqueue" is needed after
> >>> >"pipeline_event_tx" for tx mode?
> >>> >
> >>> >For single_stage_burst_tx mode, after "pipeline_event_tx", the
> >>worker
> >>> >has to enqueue again
> >>> >
> >>> >due to  "pipeline_event_enqueue_burst", so maybe we should jump
> >>out of
> >>> >the loop after
> >>> >
> >>> >“pipeline_event_tx”,
> >>>
> >>> We call enqueue burst to release the events i.e. enqueue events with
> >>> RTE_EVENT_OP_RELEASE.
> >>>
> >>However,
> >>In case of single event, for ' pipeline_queue_worker_single_stage_tx'
> >>and ' pipeline_queue_worker_multi_stage_tx',
> >>after tx, there is no release operation.
> >>
> 
> I think this was done before the feature disable_implicit_release
> `RTE_EVENT_DEV_CAP_IMPLICIT_RELEASE_DISABLE ` was implemented.
> We can remove the event_op_release section of the code and have a
> continue.
> 
Ok, thanks for your explain. I will upload the new series to fix above two bugs.
 
> >>> >
> >>>  for example:
> >>> >
> >>> >
> >>> >
> >>> >                                                if
> >>> >(ev[i].sched_type ==
> >>> >RTE_SCHED_TYPE_ATOMIC) {
> >>> >
> >>> >
> >>> > pipeline_event_tx(dev, port, &ev[i]);
> >>> >
> >>> >
> >>> > ev[i].op = RTE_EVENT_OP_RELEASE;
> >>> >
> >>> >
> >>> > w->processed_pkts++;
> >>> >
> >>> >                                                +             continue;
> >>> >
> >>> >                                                } else {
> >>> >
> >>> >
> >>> > ev[i].queue_id++;
> >>> >
> >>> >
> >>> > pipeline_fwd_event(&ev[i],
> >>> >
> >>> >
> >>> >RTE_SCHED_TYPE_ATOMIC);
> >>> >
> >>> >                                                }
> >>> >
> >>> >                                }
> >>> >
> >>> >
> >>> >
> >>> >                                pipeline_event_enqueue_burst(dev,
> >>> > port, ev, nb_rx);
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >>
> >>> >
> >>> >> >
> >>> >
> >>> >> >if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
> >>> >
> >>> >> >                          +
> >>> >__atomic_thread_fence(__ATOMIC_RELEASE);
> >>> >
> >>> >> >                                          pipeline_event_tx(dev,
> >>> >> > port, &ev);
> >>> >
> >>> >> >                                          w->processed_pkts++;
> >>> >
> >>> >> >                          } else {
> >>> >
> >>> >> >                                          ev.queue_id++;
> >>> >
> >>> >> >                          +
> >>> >__atomic_thread_fence(__ATOMIC_RELEASE);
> >>> >
> >>> >> >
> >>> >> > pipeline_fwd_event(&ev,
> >>> >
> >>> >> >RTE_SCHED_TYPE_ATOMIC);
> >>> >
> >>> >> >
> >>> >> > pipeline_event_enqueue(dev, port, &ev);
> >>> >
> >>> >> >
> >>> >
> >>> >> >However, there are two reasons to prevent this:
> >>> >
> >>> >> >
> >>> >
> >>> >> >First, compare with other tests in app/eventdev, for example,
> >the
> >>> >
> >>> >> >eventdev perf test, the wmb is after event operation to ensure
> >>> >
> >>> >> >operation has been finished and then w->processed_pkts++.
> >>> >
> >>> >>
> >>> >
> >>> >> In case of perf_* tests start with a dequeue() and finally ends
> >>> >> with a
> >>> >
> >>> >> mempool_put() should also act as implicit acquire release pairs
> >>> >making stats
> >>> >
> >>> >> consistent?
> >>> >
> >>> >
> >>> >
> >>> >For perf tests, this consistency refers to that there is a wmb
> >>> >after mempool_put().
> >>> >
> >>> >Please refer to this link:
> >>> >
> >>> >https://urldefense.proofpoint.com/v2/url?u=http-
> >>>
> >>>3A__patches.dpdk.org_patch_85634_&d=DwIGaQ&c=nKjWec2b6R0
> >m
> >>O
> >>>
> >>>yPaz7xtfQ&r=1cjuAHrGh745jHNmj2fD85sUMIJ2IPIDsIJzo6FN6Z0&m=
> >z
> >>g
> >>> >QHeSDiXWfI1PIIUxXBqMS6E-
> >>2_3G46nhrzGXoBpHI&s=0FwTxPXjWflh-
> >>> >sdmnkY133IPlJB780x0yxe7Am3JCBw&e=
> >>> >
> >>> >
> >>> >
> >>> >>
> >>> >
> >>> >> >So, if we move release barriers before tx/fwd, it may cause that
> >>the
> >>> >
> >>> >> >tests of app/eventdev become  inconsistent.This may reduce the
> >>> >
> >>> >> >maintainability of the code and make it difficult to understand.
> >>> >
> >>> >> >
> >>> >
> >>> >> >Second, it is a test case, though heavy thread may cause
> >>> >performance
> >>> >
> >>> >> >degradation, it can ensure that the operation process and the
> >test
> >>> >
> >>> >> >result are correct. And maybe for a test case, correctness is
> >more
> >>> >
> >>> >> >important than performance.
> >>> >
> >>> >> >
> >>> >
> >>> >>
> >>> >
> >>> >> Most of our internal perf test run on 24/48 core combinations
> >and
> >>> >since
> >>> >
> >>> >> Octeontx2 event device driver supports a burst size of 1, it will
> >>> >> show
> >>> >up as
> >>> >
> >>> >> Huge performance degradation.
> >>> >
> >>> >
> >>> >
> >>> >For the impact on performance, I do the test using software driver,
> >>> >following are some test results:
> >>> >
> >>> >-------------------------------------------------------------------
> >>> >----
> >>> >----------------
> >>> >---------------------------------------------
> >>> >
> >>> >Architecture: aarch64
> >>> >
> >>> >Nics: ixgbe-82599
> >>> >
> >>> >CPU: Cortex-A72
> >>> >
> >>> >BURST_SIZE: 1
> >>> >
> >>> >Order: ./dpdk-test-eventdev -l 0-15 -s 0x2 --vdev=event_sw0 -- --
> >>> >test=pipeline_queue --wlcore=4-14 --prod_type_ethdev --stlist=a,a
> >>> >
> >>> >Flow: one flow, 64bits package, TX rate: 1.4Mpps
> >>> >
> >>> >
> >>> >
> >>> >Without this patch:
> >>> >
> >>> >0.954 mpps avg 0.953 mpps
> >>> >
> >>> >
> >>> >
> >>> >With this patch:
> >>> >
> >>> >0.932 mpps avg 0.930 mpps
> >>> >
> >>> >-------------------------------------------------------------------
> >>> >----
> >>> >----------------
> >>> >---------------------------------------------
> >>> >
> >>> >
> >>> >
> >>> >Based on the result above, there is no significant performance
> >>> >degradation with this patch.
> >>> >
> >>> >This is because the release barrier is only for  “w-
> >>>processed_pkts++”.
> >>> >It just ensures that the worker core
> >>> >
> >>> >increments the number events processed after enqueue, and it
> >>doesn’t
> >>> >affect dequeue/enqueue:
> >>> >
> >>> >
> >>> >
> >>> >dequeue -> enqueue -> release barrier -> w->processed_pkts++
> >>> >
> >>>
> >>> Here enqueue already acts as an explicit release barrier.
> >>>
> >>Please refer above reasons.
> >>
> >>> >
> >>> >
> >>> >On the other hand, I infer the reason for the slight decrease in
> >>> >measurement performance is that the release barrier
> >>> >
> >>> >prevent “w->processed_pkts++” before that the event has been
> >>processed
> >>> >(enqueue). But I think this test result is closer
> >>> >
> >>> >to the real performance.
> >>> >
> >>> >And sorry for that we have no octentx2 device, so there is no test
> >>> >result on Octeontx2 event device driver. Would you please
> >>> >
> >>> >help us test this patch on octentx2 when you are convenient.
> >Thanks
> >>> >very much.
> >>> >
> >>>
> >>> I will report the performance numbers on Monday.
> >>>
> >>
> >>That’s great, Thanks very much for your help.
> 
> We are seeing a ~2% performance loss with the series applied on pipeline
> tests.
> I have taken the performance numbers on IXIA to eliminate any
> discrepancies in stats.
> 
> Do you see any improvement in stats being reported with and w/o the
> barrier with Traffic generator vs eventdev application? because I don’t see
> any difference when verifying with IXIA.
> 
Thanks very much for your test. We also use IXIA to test with SW driver and 
have ~2% performance loss according to the above data.


> Can we skip this patch?
> Also, note that the barrier wouldn’t help in cases where:
> 1. SW eventdev Tx routine runs on service core which would have it's on
> latency in processing the packets.
> 2. HW eventdev Tx routine that have shaping and scheduling.
> 

That's Ok, I'll remove this patch from this series. And thanks very much for 
this great discussion.

Best Regards
Feifei

> Regards,
> Pavan.
> 
> >>
> >>Best Regards
> >>Feifei
> >>
> >>> >
> >>> >
> >>> >Best Regards
> >>> >
> >>> >Feifei
> >>>
> >>> Regards,
> >>> Pavan.
> >>>
> >>> >
> >>> >
> >>> >
> >>> >>
> >>> >
> >>> >> >So, due to two reasons above, I'm ambivalent about how we
> >>should
> >>> >do in
> >>> >
> >>> >> >the next step.
> >>> >
> >>> >> >
> >>> >
> >>> >> >Best Regards
> >>> >
> >>> >> >Feifei
> >>> >
> >>> >>
> >>> >
> >>> >> Regards,
> >>> >
> >>> >> Pavan.
> >>> >
> >>> >>
> >>> >
> >>> >> >
> >>> >
> >>> >> >> >Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue
> >>worker
> >>> >
> >>> >> >> >functions")
> >>> >
> >>> >> >> >Cc:
> >>>
> >>>mailto:pbhagavat...@marvell.com<mailto:pbhagavatula@marvell.c
> >o
> >>m>
> >>> >
> >>> >> >> >Cc: mailto:sta...@dpdk.org<mailto:sta...@dpdk.org>
> >>> >
> >>> >> >> >
> >>> >
> >>> >> >> >Signed-off-by: Phil Yang
> >>> ><mailto:phil.y...@arm.com<mailto:phil.y...@arm.com>>
> >>> >
> >>> >> >> >Signed-off-by: Feifei Wang
> >>> ><mailto:feifei.wa...@arm.com<mailto:feifei.wa...@arm.com>>
> >>> >
> >>> >> >> >Reviewed-by: Ruifeng Wang
> >>>
> >><mailto:ruifeng.w...@arm.com<mailto:ruifeng.w...@arm.com>>
> >>> >
> >>> >> >> >---
> >>> >
> >>> >> >> > app/test-eventdev/test_pipeline_queue.c | 64
> >>> >
> >>> >> >> >+++++++++++++++++++++----
> >>> >
> >>> >> >> > 1 file changed, 56 insertions(+), 8 deletions(-)
> >>> >
> >>> >> >> >
> >>> >
> >>> >> >> >diff --git a/app/test-eventdev/test_pipeline_queue.c
> >>b/app/test-
> >>> >
> >>> >> >> >eventdev/test_pipeline_queue.c index 7bebac34f..0c0ec0ceb
> >>> >
> >>> >> >100644
> >>> >
> >>> >> >> >--- a/app/test-eventdev/test_pipeline_queue.c
> >>> >
> >>> >> >> >+++ b/app/test-eventdev/test_pipeline_queue.c
> >>> >
> >>> >> >> >@@ -30,7 +30,13 @@
> >>> >pipeline_queue_worker_single_stage_tx(void
> >>> >
> >>> >> >> >*arg)
> >>> >
> >>> >> >> >
> >>> >
> >>> >> >> >                    if (ev.sched_type ==
> >>> >> >> > RTE_SCHED_TYPE_ATOMIC) {
> >>> >
> >>> >> >> >                                    pipeline_event_tx(dev,
> >>> >> >> > port, &ev);
> >>> >
> >>> >> >> >-                                  w->processed_pkts++;
> >>> >
> >>> >> >> >+
> >>> >
> >>> >> >> >+                                 /* release barrier here
> >>> >> >> >+ ensures stored
> >>> >operation
> >>> >
> >>> >> >> >+                                 * of the event completes
> >>> >> >> >+ before the number
> >>> >of
> >>> >
> >>> >> >> >+                                 * processed pkts is visible
> >>> >> >> >+ to the main core
> >>> >
> >>> >> >> >+                                 */
> >>> >
> >>> >> >> >+
> >>> >> >> >+ __atomic_fetch_add(&(w->processed_pkts),
> >>> >1,
> >>> >
> >>> >> >> >+
> >>> >> >> >+ __ATOMIC_RELEASE);
> >>> >
> >>> >> >> >                    } else {
> >>> >
> >>> >> >> >                                    ev.queue_id++;
> >>> >
> >>> >> >> >                                    pipeline_fwd_event(&ev,
> >>> >
> >>> >> >> >RTE_SCHED_TYPE_ATOMIC);
> >>> >
> >>> >> >> >@@ -59,7 +65,13 @@
> >>> >
> >>> >> >pipeline_queue_worker_single_stage_fwd(void
> >>> >
> >>> >> >> >*arg)
> >>> >
> >>> >> >> >                    rte_event_eth_tx_adapter_txq_set(ev.mbuf,
> >>> >> >> > 0);
> >>> >
> >>> >> >> >                    pipeline_fwd_event(&ev,
> >>> >> >> > RTE_SCHED_TYPE_ATOMIC);
> >>> >
> >>> >> >> >                    pipeline_event_enqueue(dev, port, &ev);
> >>> >
> >>> >> >> >-                  w->processed_pkts++;
> >>> >
> >>> >> >> >+
> >>> >
> >>> >> >> >+                 /* release barrier here ensures stored
> >>> >> >> >+ operation
> >>> >
> >>> >> >> >+                 * of the event completes before the number
> >>> >> >> >+ of
> >>> >
> >>> >> >> >+                 * processed pkts is visible to the main
> >>> >> >> >+ core
> >>> >
> >>> >> >> >+                 */
> >>> >
> >>> >> >> >+                 __atomic_fetch_add(&(w->processed_pkts), 1,
> >>> >
> >>> >> >> >+
> >>> >> >> >+ __ATOMIC_RELEASE);
> >>> >
> >>> >> >> >    }
> >>> >
> >>> >> >> >
> >>> >
> >>> >> >> >    return 0;
> >>> >
> >>> >> >> >@@ -84,7 +96,13 @@
> >>> >
> >>> >> >> >pipeline_queue_worker_single_stage_burst_tx(void *arg)
> >>> >
> >>> >> >> >                                    if (ev[i].sched_type ==
> >>> >
> >>> >> >> >RTE_SCHED_TYPE_ATOMIC) {
> >>> >
> >>> >> >> >
> >>> >> >> > pipeline_event_tx(dev, port, &ev[i]);
> >>> >
> >>> >> >> >                                                    ev[i].op
> >>> >> >> > = RTE_EVENT_OP_RELEASE;
> >>> >
> >>> >> >> >-                                                  
> >>> >> >> >w->processed_pkts++;
> >>> >
> >>> >> >> >+
> >>> >
> >>> >> >> >+                                                 /* release
> >>> >> >> >+ barrier here ensures stored
> >>> >
> >>> >> >> >operation
> >>> >
> >>> >> >> >+                                                 * of the
> >>> >> >> >+ event completes before the
> >>> >
> >>> >> >> >number of
> >>> >
> >>> >> >> >+                                                 * processed
> >>> >> >> >+ pkts is visible to the main
> >>> >
> >>> >> >> >core
> >>> >
> >>> >> >> >+                                                 */
> >>> >
> >>> >> >> >+
> >>> >> >> >+ __atomic_fetch_add(&(w-
> >>> >
> >>> >> >> >>processed_pkts), 1,
> >>> >
> >>> >> >> >+
> >>> >__ATOMIC_RELEASE);
> >>> >
> >>> >> >> >                                    } else {
> >>> >
> >>> >> >> >
> >>> >> >> > ev[i].queue_id++;
> >>> >
> >>> >> >> >
> >>> >> >> > pipeline_fwd_event(&ev[i],
> >>> >
> >>> >> >> >@@ -121,7 +139,13 @@
> >>> >
> >>> >> >> >pipeline_queue_worker_single_stage_burst_fwd(void *arg)
> >>> >
> >>> >> >> >                    }
> >>> >
> >>> >> >> >
> >>> >
> >>> >> >> >                    pipeline_event_enqueue_burst(dev, port,
> >>> >> >> > ev, nb_rx);
> >>> >
> >>> >> >> >-                  w->processed_pkts += nb_rx;
> >>> >
> >>> >> >> >+
> >>> >
> >>> >> >> >+                 /* release barrier here ensures stored
> >>> >> >> >+ operation
> >>> >
> >>> >> >> >+                 * of the event completes before the number
> >>> >> >> >+ of
> >>> >
> >>> >> >> >+                 * processed pkts is visible to the main
> >>> >> >> >+ core
> >>> >
> >>> >> >> >+                 */
> >>> >
> >>> >> >> >+                 __atomic_fetch_add(&(w->processed_pkts),
> >nb_rx,
> >>> >
> >>> >> >> >+
> >>> >> >> >+ __ATOMIC_RELEASE);
> >>> >
> >>> >> >> >    }
> >>> >
> >>> >> >> >
> >>> >
> >>> >> >> >    return 0;
> >>> >
> >>> >> >> >@@ -146,7 +170,13 @@
> >>> >
> >>> >> >pipeline_queue_worker_multi_stage_tx(void
> >>> >
> >>> >> >> >*arg)
> >>> >
> >>> >> >> >
> >>> >
> >>> >> >> >                    if (ev.queue_id ==
> >>> >> >> > tx_queue[ev.mbuf->port]) {
> >>> >
> >>> >> >> >                                    pipeline_event_tx(dev,
> >>> >> >> > port, &ev);
> >>> >
> >>> >> >> >-                                  w->processed_pkts++;
> >>> >
> >>> >> >> >+
> >>> >
> >>> >> >> >+                                 /* release barrier here
> >>> >> >> >+ ensures stored
> >>> >operation
> >>> >
> >>> >> >> >+                                 * of the event completes
> >>> >> >> >+ before the number
> >>> >of
> >>> >
> >>> >> >> >+                                 * processed pkts is visible
> >>> >> >> >+ to the main core
> >>> >
> >>> >> >> >+                                 */
> >>> >
> >>> >> >> >+
> >>> >> >> >+ __atomic_fetch_add(&(w->processed_pkts),
> >>> >1,
> >>> >
> >>> >> >> >+
> >>> >> >> >+ __ATOMIC_RELEASE);
> >>> >
> >>> >> >> >                                    continue;
> >>> >
> >>> >> >> >                    }
> >>> >
> >>> >> >> >
> >>> >
> >>> >> >> >@@ -180,7 +210,13 @@
> >>> >
> >>> >> >> >pipeline_queue_worker_multi_stage_fwd(void *arg)
> >>> >
> >>> >> >> >                                    ev.queue_id =
> >>> >> >> > tx_queue[ev.mbuf->port];
> >>> >
> >>> >> >> >
> >>> >> >> > rte_event_eth_tx_adapter_txq_set(ev.mbuf,
> >>> >0);
> >>> >
> >>> >> >> >                                    pipeline_fwd_event(&ev,
> >>> >
> >>> >> >> >RTE_SCHED_TYPE_ATOMIC);
> >>> >
> >>> >> >> >-                                  w->processed_pkts++;
> >>> >
> >>> >> >> >+
> >>> >
> >>> >> >> >+                                 /* release barrier here
> >>> >> >> >+ ensures stored
> >>> >operation
> >>> >
> >>> >> >> >+                                 * of the event completes
> >>> >> >> >+ before the number
> >>> >of
> >>> >
> >>> >> >> >+                                 * processed pkts is visible
> >>> >> >> >+ to the main core
> >>> >
> >>> >> >> >+                                 */
> >>> >
> >>> >> >> >+
> >>> >> >> >+ __atomic_fetch_add(&(w->processed_pkts),
> >>> >1,
> >>> >
> >>> >> >> >+
> >>> >> >> >+ __ATOMIC_RELEASE);
> >>> >
> >>> >> >> >                    } else {
> >>> >
> >>> >> >> >                                    ev.queue_id++;
> >>> >
> >>> >> >> >                                    pipeline_fwd_event(&ev,
> >>> >
> >>> >> >> >sched_type_list[cq_id]);
> >>> >
> >>> >> >> >@@ -214,7 +250,13 @@
> >>> >
> >>> >> >> >pipeline_queue_worker_multi_stage_burst_tx(void *arg)
> >>> >
> >>> >> >> >                                    if (ev[i].queue_id ==
> >>> >> >> > tx_queue[ev[i].mbuf-
> >>> >
> >>> >> >> >>port]) {
> >>> >
> >>> >> >> >
> >>> >> >> > pipeline_event_tx(dev, port, &ev[i]);
> >>> >
> >>> >> >> >                                                    ev[i].op
> >>> >> >> > = RTE_EVENT_OP_RELEASE;
> >>> >
> >>> >> >> >-                                                  
> >>> >> >> >w->processed_pkts++;
> >>> >
> >>> >> >> >+
> >>> >
> >>> >> >> >+                                                 /* release
> >>> >> >> >+ barrier here ensures stored
> >>> >
> >>> >> >> >operation
> >>> >
> >>> >> >> >+                                                 * of the
> >>> >> >> >+ event completes before the
> >>> >
> >>> >> >> >number of
> >>> >
> >>> >> >> >+                                                 * processed
> >>> >> >> >+ pkts is visible to the main
> >>> >
> >>> >> >> >core
> >>> >
> >>> >> >> >+                                                 */
> >>> >
> >>> >> >> >+
> >>> >> >> >+ __atomic_fetch_add(&(w-
> >>> >
> >>> >> >> >>processed_pkts), 1,
> >>> >
> >>> >> >> >+
> >>> >__ATOMIC_RELEASE);
> >>> >
> >>> >> >> >                                                    continue;
> >>> >
> >>> >> >> >                                    }
> >>> >
> >>> >> >> >
> >>> >
> >>> >> >> >@@ -254,7 +296,13 @@
> >>> >
> >>> >> >> >pipeline_queue_worker_multi_stage_burst_fwd(void *arg)
> >>> >
> >>> >> >> >
> >>> >
> >>> >> >> >    rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
> >>> >
> >>> >> >> >
> >>> >> >> > pipeline_fwd_event(&ev[i],
> >>> >
> >>> >> >> >
> >>> >
> >>> >> >> >    RTE_SCHED_TYPE_ATOMIC);
> >>> >
> >>> >> >> >-                                                  
> >>> >> >> >w->processed_pkts++;
> >>> >
> >>> >> >> >+
> >>> >
> >>> >> >> >+                                                 /* release
> >>> >> >> >+ barrier here ensures stored
> >>> >
> >>> >> >> >operation
> >>> >
> >>> >> >> >+                                                 * of the
> >>> >> >> >+ event completes before the
> >>> >
> >>> >> >> >number of
> >>> >
> >>> >> >> >+                                                 * processed
> >>> >> >> >+ pkts is visible to the main
> >>> >
> >>> >> >> >core
> >>> >
> >>> >> >> >+                                                 */
> >>> >
> >>> >> >> >+
> >>> >> >> >+ __atomic_fetch_add(&(w-
> >>> >
> >>> >> >> >>processed_pkts), 1,
> >>> >
> >>> >> >> >+
> >>> >__ATOMIC_RELEASE);
> >>> >
> >>> >> >> >                                    } else {
> >>> >
> >>> >> >> >
> >>> >> >> > ev[i].queue_id++;
> >>> >
> >>> >> >> >
> >>> >> >> > pipeline_fwd_event(&ev[i],
> >>> >
> >>> >> >> >--
> >>> >
> >>> >> >> >2.17.1
> >>> >

[dpdk-dev] 回复: [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test

Reply via email to