Hi Feifei,

> Hi, Pavan
>
>> -----Original Message-----
>> From: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com>
>> Sent: January 8, 2021 17:13
>> To: Feifei Wang <feifei.wa...@arm.com>; jer...@marvell.com; Harry
>> van Haaren <harry.van.haa...@intel.com>
>> Cc: dev@dpdk.org; nd <n...@arm.com>; Honnappa Nagarahalli
>> <honnappa.nagaraha...@arm.com>; sta...@dpdk.org; Ruifeng Wang
>> <ruifeng.w...@arm.com>; nd <n...@arm.com>; nd <n...@arm.com>
>> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for
>> pipeline test
>>
>> Hi Feifei,
>>
>>> Hi, Pavan
>>>
>>>> -----Original Message-----
>>>> From: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com>
>>>> Sent: January 5, 2021 17:29
>>>> To: Feifei Wang <feifei.wa...@arm.com>; jer...@marvell.com; Harry
>>>> van Haaren <harry.van.haa...@intel.com>
>>>> Cc: dev@dpdk.org; nd <n...@arm.com>; Honnappa Nagarahalli
>>>> <honnappa.nagaraha...@arm.com>; sta...@dpdk.org; Ruifeng Wang
>>>> <ruifeng.w...@arm.com>; nd <n...@arm.com>
>>>> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers
>>>> for pipeline test
>>>>
>>>> Hi Feifei,
>>>>
>>>>> Hi, Pavan
>>>>>
>>>>> Sorry for my late reply, and thanks very much for your review.
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com>
>>>>>> Sent: December 22, 2020 18:33
>>>>>> To: Feifei Wang <feifei.wa...@arm.com>; jer...@marvell.com;
>>>>>> Harry van Haaren <harry.van.haa...@intel.com>; Pavan Nikhilesh
>>>>>> <pbhagavat...@caviumnetworks.com>
>>>>>> Cc: dev@dpdk.org; nd <n...@arm.com>; Honnappa Nagarahalli
>>>>>> <honnappa.nagaraha...@arm.com>; sta...@dpdk.org; Phil Yang
>>>>>> <phil.y...@arm.com>
>>>>>> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers
>>>>>> for pipeline test
>>>>>>
>>>>>>> Add release barriers before updating the processed packets for
>>>>>>> worker lcores to ensure the worker lcore has really finished data
>>>>>>> processing and then it can update the processed packets number.
>>>>>>
>>>>>> I believe we can live with minor inaccuracies in the stats being
>>>>>> presented, as atomics are pretty heavy when the scheduler is
>>>>>> limited to a burst size of 1.
>>>>>>
>>>>>> One option is to move it before a pipeline operation
>>>>>> (pipeline_event_tx, pipeline_fwd_event etc.) as they imply an
>>>>>> implicit release barrier (all the changes done to the event
>>>>>> should be visible to the next core).
>>>>>
>>>>> If I understand correctly, your meaning is to move the release
>>>>> barriers before pipeline_event_tx or pipeline_fwd_event. This can
>>>>> ensure the event has been processed before the next core begins to
>>>>> tx/fwd.
>>>>> For example:
>>>>
>>>> What I meant was that event APIs such as `rte_event_enqueue_burst`
>>>> and `rte_event_eth_tx_adapter_enqueue` act as an implicit release
>>>> barrier, and the API `rte_event_dequeue_burst` acts as an implicit
>>>> acquire barrier.
>>>>
>>>> Since a pipeline_* test starts with a dequeue() and ends with an
>>>> enqueue(), I don't believe we need barriers in between.
>>>
>>> Sorry for my misunderstanding this. And I agree with you that no
>>> barriers are needed between dequeue and enqueue.
>>>
>>> Now, let's go back to the beginning. Actually, with this patch, our
>>> barrier is mainly for the synchronization variable
>>> "w->processed_pkts". As we all know, the event is first dequeued and
>>> then enqueued; after this, the event can be treated as a processed
>>> event and included in the statistics ("w->processed_pkts++").
>>>
>>> Thus, we add a release barrier before "w->processed_pkts++" to
>>> prevent this operation from being executed ahead of time. For
>>> example:
>>>
>>> dequeue -> w->processed_pkts++ -> enqueue
>>>
>>> Here the worker hasn't actually finished processing the event, but
>>> the event is treated as processed and included in the statistics.
>>
>> But the current sequence is dequeue -> enqueue -> w->processed_pkts++,
>> and enqueue already acts as an explicit release barrier, right?
>
> Sorry, maybe I cannot understand how "enqueue" acts as an explicit
> release barrier. I can think of two possibilities:
>
> 1. As you said before, all the changes done to the event should be
> visible to the next core, and enqueue is an operation on the event, so
> the next core should wait for the event to be enqueued. I think this
> is due to a data dependence on the same variable. However,
> 'w->processed_pkts' and 'ev' are different variables, so this cannot
> prevent 'w->processed_pkts++' from happening before the enqueue. And
> the main core may load an updated 'w->processed_pkts' while the event
> is actually still being processed. For example:
>
>	Time slot	Worker 1		Main core
>	1		dequeue
>	2		w->processed_pkts++
>	3					load w->processed_pkts
>	4		enqueue
>
> 2. Some release barriers are included in the enqueue itself. There is
> a release barrier in rte_ring_enqueue:
>
> move head -> copy elements to the ring -> release barrier ->
> update tail -> w->processed_pkts++
>
> However, this barrier cannot prevent 'w->processed_pkts++' from being
> reordered before the tail update, and once update_tail has finished,
> the enqueue can be considered complete.
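To put possibility 2 in code, here is a minimal sketch of a generic
ring-style publish (the names are illustrative; this is not the actual
rte_ring implementation):

	#include <stdint.h>

	struct ring {
		uint32_t tail;
		void *slots[1024];
	};

	/* Typical enqueue publish step: write the element, then update
	 * the tail with a release store so the element contents are
	 * visible to consumers before the new tail is.
	 */
	static inline void
	ring_publish(struct ring *r, void *obj, uint32_t head)
	{
		r->slots[head & 1023] = obj;
		__atomic_store_n(&r->tail, head + 1, __ATOMIC_RELEASE);
	}

A release store only orders the accesses *before* it, so in

	ring_publish(r, ev, head);
	w->processed_pkts++;	/* plain increment, may be hoisted */

the increment is still free to move above the tail update, which is
exactly the window described above; making the increment itself a
release operation closes it.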
I was talking about case 2 in particular; almost all enqueue calls have
some kind of release barrier in place. I do agree w->processed_pkts++
might get reordered with the tail update, but since enqueue itself is a
ldr + blr, I was hoping that it wouldn't occur. We can continue the
discussion once I have some performance data. Thanks for your
patience :)

Pavan.

>>> ___________________________________________________________________
>>>
>>> By the way, I have two other questions about the pipeline process
>>> test in "test_pipeline_queue".
>>>
>>> 1. When do we start counting processed events (w->processed_pkts)?
>>> For the fwd mode (internal_port = false), when we choose a single
>>> stage, the application increments the number of events processed
>>> after "pipeline_event_enqueue". However, when we choose multiple
>>> stages, the application increments the number of events processed
>>> before "pipeline_event_enqueue".
>>
>> We count an event as processed when all the stages have completed
>> and it is transmitted.
>>
>>> So, maybe we can unify this. For example, for multiple stages:
>>>
>>>		if (cq_id == last_queue) {
>>>			ev.queue_id = tx_queue[ev.mbuf->port];
>>>			rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
>>>			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>>>+			pipeline_event_enqueue(dev, port, &ev);
>>>			w->processed_pkts++;
>>>		} else {
>>>			ev.queue_id++;
>>>			pipeline_fwd_event(&ev, sched_type_list[cq_id]);
>>>+			pipeline_event_enqueue(dev, port, &ev);
>>>		}
>>>
>>>-		pipeline_event_enqueue(dev, port, &ev);
>>
>> The above change makes sense.
>
> Thanks for your review, and I'll update this change in the next
> version.
>
>>> 2. Is "pipeline_event_enqueue" needed after "pipeline_event_tx" for
>>> tx mode? For single_stage_burst_tx mode, after "pipeline_event_tx",
>>> the worker has to enqueue again due to
>>> "pipeline_event_enqueue_burst", so maybe we should jump out of the
>>> loop after "pipeline_event_tx",
>>
>> We call enqueue burst to release the events, i.e. enqueue events with
>> RTE_EVENT_OP_RELEASE.
>
> However, in the case of a single event, for
> 'pipeline_queue_worker_single_stage_tx' and
> 'pipeline_queue_worker_multi_stage_tx', after tx there is no release
> operation.
>>> for example:
>>>
>>>			if (ev[i].sched_type == RTE_SCHED_TYPE_ATOMIC) {
>>>				pipeline_event_tx(dev, port, &ev[i]);
>>>				ev[i].op = RTE_EVENT_OP_RELEASE;
>>>				w->processed_pkts++;
>>>+				continue;
>>>			} else {
>>>				ev[i].queue_id++;
>>>				pipeline_fwd_event(&ev[i],
>>>						RTE_SCHED_TYPE_ATOMIC);
>>>			}
>>>		}
>>>
>>>		pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
>>>>>
>>>>>	if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
>>>>>+		__atomic_thread_fence(__ATOMIC_RELEASE);
>>>>>		pipeline_event_tx(dev, port, &ev);
>>>>>		w->processed_pkts++;
>>>>>	} else {
>>>>>		ev.queue_id++;
>>>>>+		__atomic_thread_fence(__ATOMIC_RELEASE);
>>>>>		pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>>>>>		pipeline_event_enqueue(dev, port, &ev);
>>>>>
>>>>> However, there are two reasons to prevent this:
>>>>>
>>>>> First, compared with the other tests in app/eventdev, for example
>>>>> the eventdev perf test, the wmb is placed after the event
>>>>> operation to ensure the operation has finished before
>>>>> w->processed_pkts++.
>>>>
>>>> In the case of perf_* tests, which start with a dequeue() and
>>>> finally end with a mempool_put(), shouldn't those also act as
>>>> implicit acquire/release pairs, making the stats consistent?
>>>
>>> For perf tests, this consistency refers to there being a wmb after
>>> mempool_put(). Please refer to this link:
>>> http://patches.dpdk.org/patch/85634/
>>>
>>>>> So, if we move the release barriers before tx/fwd, it may make the
>>>>> tests of app/eventdev inconsistent with each other. This may
>>>>> reduce the maintainability of the code and make it difficult to
>>>>> understand.
>>>>>
>>>>> Second, it is a test case: though a heavy thread fence may cause
>>>>> some performance degradation, it ensures that the operation order
>>>>> and the test result are correct. And maybe for a test case,
>>>>> correctness is more important than performance.
>>>>
>>>> Most of our internal perf tests run on 24/48 core combinations,
>>>> and since the Octeontx2 event device driver supports a burst size
>>>> of 1, it will show up as a huge performance degradation.
>>> For the impact on performance, I did the test using the software
>>> driver; the following are some test results:
>>> -------------------------------------------------------------------
>>> Architecture: aarch64
>>> NIC: ixgbe-82599
>>> CPU: Cortex-A72
>>> BURST_SIZE: 1
>>> Command: ./dpdk-test-eventdev -l 0-15 -s 0x2 --vdev=event_sw0 -- \
>>>	--test=pipeline_queue --wlcore=4-14 --prod_type_ethdev --stlist=a,a
>>> Flow: one flow, 64-byte packets, TX rate: 1.4 Mpps
>>>
>>> Without this patch:	0.954 Mpps	avg 0.953 Mpps
>>> With this patch:	0.932 Mpps	avg 0.930 Mpps
>>> -------------------------------------------------------------------
>>>
>>> Based on the results above, there is no significant performance
>>> degradation with this patch. This is because the release barrier is
>>> only for "w->processed_pkts++". It just ensures that the worker
>>> core increments the number of events processed after the enqueue,
>>> and it doesn't affect dequeue/enqueue:
>>>
>>> dequeue -> enqueue -> release barrier -> w->processed_pkts++
>>
>> Here enqueue already acts as an explicit release barrier.
>
> Please refer to the reasons above.
>
>>> On the other hand, I infer the reason for the slight decrease in
>>> the measured performance is that the release barrier prevents
>>> "w->processed_pkts++" from happening before the event has been
>>> processed (enqueued). But I think this test result is closer to the
>>> real performance.
>>>
>>> And sorry that we have no Octeontx2 device, so there is no test
>>> result on the Octeontx2 event device driver. Would you please help
>>> us test this patch on Octeontx2 when it is convenient? Thanks very
>>> much.
>>
>> I will report the performance numbers on Monday.
>
> That's great, thanks very much for your help.
>
> Best Regards
> Feifei
>
>>> Best Regards
>>> Feifei
>>
>> Regards,
>> Pavan.
>>
>>>>> So, due to the two reasons above, I'm ambivalent about what we
>>>>> should do in the next step.
>>>>>
>>>>> Best Regards
>>>>> Feifei
>>>>
>>>> Regards,
>>>> Pavan.
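A side note on the consumer of these counters: the release increment
only does its job if the main lcore's read side pairs with it. A
minimal sketch of that pairing (illustrative only -- the actual stats
collection code is outside this patch, and "struct worker" here is a
stand-in for the test's per-worker data):

	#include <stdint.h>

	struct worker {
		uint64_t processed_pkts;
	};

	/* Main-lcore side: acquire loads pair with the workers'
	 * __ATOMIC_RELEASE increments, so that when a count is
	 * observed, the event stores that preceded it on the worker
	 * are visible too.
	 */
	static uint64_t
	total_processed(const struct worker *w, unsigned int nb_workers)
	{
		uint64_t total = 0;
		unsigned int i;

		for (i = 0; i < nb_workers; i++)
			total += __atomic_load_n(&w[i].processed_pkts,
					__ATOMIC_ACQUIRE);
		return total;
	}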
>>>>>>> Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue worker
>>>>>>> functions")
>>>>>>> Cc: pbhagavat...@marvell.com
>>>>>>> Cc: sta...@dpdk.org
>>>>>>>
>>>>>>> Signed-off-by: Phil Yang <phil.y...@arm.com>
>>>>>>> Signed-off-by: Feifei Wang <feifei.wa...@arm.com>
>>>>>>> Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
>>>>>>> ---
>>>>>>>  app/test-eventdev/test_pipeline_queue.c | 64 +++++++++++++++++++++----
>>>>>>>  1 file changed, 56 insertions(+), 8 deletions(-)
>>>>>>>
>>>>>>> diff --git a/app/test-eventdev/test_pipeline_queue.c b/app/test-eventdev/test_pipeline_queue.c
>>>>>>> index 7bebac34f..0c0ec0ceb 100644
>>>>>>> --- a/app/test-eventdev/test_pipeline_queue.c
>>>>>>> +++ b/app/test-eventdev/test_pipeline_queue.c
>>>>>>> @@ -30,7 +30,13 @@ pipeline_queue_worker_single_stage_tx(void *arg)
>>>>>>>
>>>>>>>  		if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
>>>>>>>  			pipeline_event_tx(dev, port, &ev);
>>>>>>> -			w->processed_pkts++;
>>>>>>> +
>>>>>>> +			/* release barrier here ensures stored operation
>>>>>>> +			 * of the event completes before the number of
>>>>>>> +			 * processed pkts is visible to the main core
>>>>>>> +			 */
>>>>>>> +			__atomic_fetch_add(&(w->processed_pkts), 1,
>>>>>>> +					__ATOMIC_RELEASE);
>>>>>>>  		} else {
>>>>>>>  			ev.queue_id++;
>>>>>>>  			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>>>>>>> @@ -59,7 +65,13 @@ pipeline_queue_worker_single_stage_fwd(void *arg)
>>>>>>>  		rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
>>>>>>>  		pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>>>>>>>  		pipeline_event_enqueue(dev, port, &ev);
>>>>>>> -		w->processed_pkts++;
>>>>>>> +
>>>>>>> +		/* release barrier here ensures stored operation
>>>>>>> +		 * of the event completes before the number of
>>>>>>> +		 * processed pkts is visible to the main core
>>>>>>> +		 */
>>>>>>> +		__atomic_fetch_add(&(w->processed_pkts), 1,
>>>>>>> +				__ATOMIC_RELEASE);
>>>>>>>  	}
>>>>>>>
>>>>>>>  	return 0;
>>>>>>> @@ -84,7 +96,13 @@ pipeline_queue_worker_single_stage_burst_tx(void *arg)
>>>>>>>  			if (ev[i].sched_type == RTE_SCHED_TYPE_ATOMIC) {
>>>>>>>  				pipeline_event_tx(dev, port, &ev[i]);
>>>>>>>  				ev[i].op = RTE_EVENT_OP_RELEASE;
>>>>>>> -				w->processed_pkts++;
>>>>>>> +
>>>>>>> +				/* release barrier here ensures stored
>>>>>>> +				 * operation of the event completes before
>>>>>>> +				 * the number of processed pkts is visible
>>>>>>> +				 * to the main core
>>>>>>> +				 */
>>>>>>> +				__atomic_fetch_add(&(w->processed_pkts), 1,
>>>>>>> +						__ATOMIC_RELEASE);
>>>>>>>  			} else {
>>>>>>>  				ev[i].queue_id++;
>>>>>>>  				pipeline_fwd_event(&ev[i],
>>>>>>> @@ -121,7 +139,13 @@ pipeline_queue_worker_single_stage_burst_fwd(void *arg)
>>>>>>>  		}
>>>>>>>
>>>>>>>  		pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
>>>>>>> -		w->processed_pkts += nb_rx;
>>>>>>> +
>>>>>>> +		/* release barrier here ensures stored operation
>>>>>>> +		 * of the event completes before the number of
>>>>>>> +		 * processed pkts is visible to the main core
>>>>>>> +		 */
>>>>>>> +		__atomic_fetch_add(&(w->processed_pkts), nb_rx,
>>>>>>> +				__ATOMIC_RELEASE);
>>>>>>>  	}
>>>>>>>
>>>>>>>  	return 0;
>>>>>>> @@ -146,7 +170,13 @@ pipeline_queue_worker_multi_stage_tx(void *arg)
>>>>>>>
>>>>>>>  		if (ev.queue_id == tx_queue[ev.mbuf->port]) {
>>>>>>>  			pipeline_event_tx(dev, port, &ev);
>>>>>>> -			w->processed_pkts++;
>>>>>>> +
>>>>>>> +			/* release barrier here ensures stored operation
>>>>>>> +			 * of the event completes before the number of
>>>>>>> +			 * processed pkts is visible to the main core
>>>>>>> +			 */
>>>>>>> +			__atomic_fetch_add(&(w->processed_pkts), 1,
>>>>>>> +					__ATOMIC_RELEASE);
>>>>>>>  			continue;
>>>>>>>  		}
>>>>>>>
>>>>>>> @@ -180,7 +210,13 @@ pipeline_queue_worker_multi_stage_fwd(void *arg)
>>>>>>>  			ev.queue_id = tx_queue[ev.mbuf->port];
>>>>>>>  			rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
>>>>>>>  			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>>>>>>> -			w->processed_pkts++;
>>>>>>> +
>>>>>>> +			/* release barrier here ensures stored operation
>>>>>>> +			 * of the event completes before the number of
>>>>>>> +			 * processed pkts is visible to the main core
>>>>>>> +			 */
>>>>>>> +			__atomic_fetch_add(&(w->processed_pkts), 1,
>>>>>>> +					__ATOMIC_RELEASE);
>>>>>>>  		} else {
>>>>>>>  			ev.queue_id++;
>>>>>>>  			pipeline_fwd_event(&ev, sched_type_list[cq_id]);
>>>>>>> @@ -214,7 +250,13 @@ pipeline_queue_worker_multi_stage_burst_tx(void *arg)
>>>>>>>  				if (ev[i].queue_id == tx_queue[ev[i].mbuf->port]) {
>>>>>>>  					pipeline_event_tx(dev, port, &ev[i]);
>>>>>>>  					ev[i].op = RTE_EVENT_OP_RELEASE;
>>>>>>> -					w->processed_pkts++;
>>>>>>> +
>>>>>>> +					/* release barrier here ensures stored
>>>>>>> +					 * operation of the event completes before
>>>>>>> +					 * the number of processed pkts is visible
>>>>>>> +					 * to the main core
>>>>>>> +					 */
>>>>>>> +					__atomic_fetch_add(&(w->processed_pkts), 1,
>>>>>>> +							__ATOMIC_RELEASE);
>>>>>>>  					continue;
>>>>>>>  				}
>>>>>>>
>>>>>>> @@ -254,7 +296,13 @@ pipeline_queue_worker_multi_stage_burst_fwd(void *arg)
>>>>>>>  				rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
>>>>>>>  				pipeline_fwd_event(&ev[i],
>>>>>>>  						RTE_SCHED_TYPE_ATOMIC);
>>>>>>> -				w->processed_pkts++;
>>>>>>> +
>>>>>>> +				/* release barrier here ensures stored
>>>>>>> +				 * operation of the event completes before
>>>>>>> +				 * the number of processed pkts is visible
>>>>>>> +				 * to the main core
>>>>>>> +				 */
>>>>>>> +				__atomic_fetch_add(&(w->processed_pkts), 1,
>>>>>>> +						__ATOMIC_RELEASE);
>>>>>>>  			} else {
>>>>>>>  				ev[i].queue_id++;
>>>>>>>  				pipeline_fwd_event(&ev[i],
>>>>>>> --
>>>>>>> 2.17.1
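For comparison, the two counting styles that came up in this thread,
side by side as a minimal compilable sketch ("struct worker" is a
stand-in for the test's per-worker data; on the architectures these
tests target, both publish the prior event stores before the counter
update, while (b) additionally makes the update itself atomic):

	#include <stdint.h>

	struct worker {
		uint64_t processed_pkts;
	};

	/* (a) Fence style, as referenced above for the perf tests:
	 * a release fence after the event operation orders it before
	 * the plain counter increment that follows.
	 */
	static void
	count_with_fence(struct worker *w)
	{
		__atomic_thread_fence(__ATOMIC_RELEASE);
		w->processed_pkts++;
	}

	/* (b) This diff: a single release read-modify-write, which
	 * both orders the prior event stores and keeps the counter
	 * update atomic.
	 */
	static void
	count_with_rmw(struct worker *w)
	{
		__atomic_fetch_add(&w->processed_pkts, 1,
				__ATOMIC_RELEASE);
	}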