On 2023-05-15 14:38, Jerin Jacob wrote:
On Fri, May 12, 2023 at 6:45 PM Mattias Rönnblom
<mattias.ronnb...@ericsson.com> wrote:
On 2023-05-12 13:59, Jerin Jacob wrote:
On Thu, May 11, 2023 at 2:00 PM Mattias Rönnblom
<mattias.ronnb...@ericsson.com> wrote:
Use non-burst event enqueue and dequeue calls from burst enqueue and
dequeue only when the burst size is compile-time constant (and equal
to one).
Signed-off-by: Mattias Rönnblom <mattias.ronnb...@ericsson.com>
---
v3: Actually include the change v2 claimed to contain.
v2: Wrap builtin call in __extension__, to avoid compiler warnings if
application is compiled with -pedantic. (Morten Brørup)
---
lib/eventdev/rte_eventdev.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
index a90e23ac8b..a471caeb6d 100644
--- a/lib/eventdev/rte_eventdev.h
+++ b/lib/eventdev/rte_eventdev.h
@@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id,
* Allow zero cost non burst mode routine invocation if application
* requests nb_events as const one
*/
- if (nb_events == 1)
+ if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)
"Why" part is not clear from the commit message. Is this to avoid
nb_events read if it is built-in const.
The __builtin_constant_p() is introduced to avoid having the compiler
generate a conditional branch and two different code paths in case
nb_elem is a run-time variable.
In particular, this matters if nb_elems is run-time variable and varies
between 1 and some larger value.
I should have mention this in the commit message.
A very slight performance improvement. It also makes the code better
match the comment, imo. Zero cost for const one enqueues, but no impact
non-compile-time-constant-length enqueues.
Feel free to ignore.
I did some performance comparison of the patch.
A low-end ARM machines shows 0.7% drop with single event case. No
regression see with high-end ARM cores with single event case.
IMO, optimizing the check for burst mode(the new patch) may not show
any real improvement as the cost is divided by number of event.
Whereas optimizing the check for single event case(The current code)
shows better performance with single event case and no regression
with burst mode as cost is divided by number of events.
I ran some tests on an AMD Zen 3 with DSW.
In the below tests the enqueue burst size is not compile-time constant.
Enqueue burst size Performance improvement
Run-time constant 1 ~5%
Run-time constant 2 ~0%
Run-time variable 1-2 ~9%
Run-time variable 1-16 ~0%
The run-time variable enqueue sizes randomly (uniformly) distributed in
the specified range.
The first result may come as a surprise. The benchmark is using
RTE_EVENT_OP_FORWARD type events (which likely is the dominating op type
in most apps). The single-event enqueue function only exists in a
generic variant (i.e., no rte_event_enqueue_forward_burst() equivalent).
I suspect that is the reason for the performance improvement.
This effect is large-enough to make it somewhat beneficial (+~1%) to use
run-time variable single-event enqueue compared to keeping the burst
size compile-time constant.
The performance gain is counted toward both enqueue and dequeue costs
(+benchmark app overhead), so an under-estimation if see this as an
enqueue performance improvement.
If you agree, then we can skip this patch.
I have no strong opinion if this should be included or not.
It was up to me, I would drop the single-enqueue special case handling
altogether in the next ABI update.
If so, check should be following. Right?
if (__extension__((__builtin_constant_p(nb_events)) && nb_events == 1)
|| nb_events == 1)
At least, It was my original intention in the code.
return (fp_ops->enqueue)(port, ev);
else
return fn(port, ev, nb_events);
@@ -2200,7 +2200,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id,
struct rte_event ev[],
* Allow zero cost non burst mode routine invocation if application
* requests nb_events as const one
*/
- if (nb_events == 1)
+ if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)
return (fp_ops->dequeue)(port, ev, timeout_ticks);
else
return (fp_ops->dequeue_burst)(port, ev, nb_events,
--
2.34.1