On Fri, May 12, 2023 at 6:45 PM Mattias Rönnblom
<mattias.ronnb...@ericsson.com> wrote:
>
> On 2023-05-12 13:59, Jerin Jacob wrote:
> > On Thu, May 11, 2023 at 2:00 PM Mattias Rönnblom
> > <mattias.ronnb...@ericsson.com> wrote:
> >>
> >> Use non-burst event enqueue and dequeue calls from burst enqueue and
> >> dequeue only when the burst size is compile-time constant (and equal
> >> to one).
> >>
> >> Signed-off-by: Mattias Rönnblom <mattias.ronnb...@ericsson.com>
> >>
> >> ---
> >>
> >> v3: Actually include the change v2 claimed to contain.
> >> v2: Wrap builtin call in __extension__, to avoid compiler warnings if
> >>      application is compiled with -pedantic. (Morten Brørup)
> >> ---
> >>   lib/eventdev/rte_eventdev.h | 4 ++--
> >>   1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
> >> index a90e23ac8b..a471caeb6d 100644
> >> --- a/lib/eventdev/rte_eventdev.h
> >> +++ b/lib/eventdev/rte_eventdev.h
> >> @@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t 
> >> port_id,
> >>           * Allow zero cost non burst mode routine invocation if 
> >> application
> >>           * requests nb_events as const one
> >>           */
> >> -       if (nb_events == 1)
> >> +       if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 
> >> 1)
> >
> > "Why" part is not clear from the commit message. Is this to avoid
> > nb_events read if it is built-in const.
>
> The __builtin_constant_p() is introduced to avoid having the compiler
> generate a conditional branch and two different code paths in case
> nb_elem is a run-time variable.
>
> In particular, this matters if nb_elems is run-time variable and varies
> between 1 and some larger value.
>
> I should have mention this in the commit message.
>
> A very slight performance improvement. It also makes the code better
> match the comment, imo. Zero cost for const one enqueues, but no impact
> non-compile-time-constant-length enqueues.
>
> Feel free to ignore.


I did some performance comparison of the patch.
A low-end ARM machines shows 0.7%  drop with single event case. No
regression see with high-end ARM cores with single event case.

IMO, optimizing the check for burst mode(the new patch) may not show
any real improvement as the cost is divided by number of event.
Whereas optimizing the check for single event case(The current code)
shows better performance with single event case and no regression
with burst mode as cost is divided by number of events.
If you agree, then we can skip this patch.


>
> > If so, check should be following. Right?
> >
> > if (__extension__((__builtin_constant_p(nb_events)) && nb_events == 1)
> > || nb_events  == 1)
> >
> > At least, It was my original intention in the code.
> >
> >
> >
> >>                  return (fp_ops->enqueue)(port, ev);
> >>          else
> >>                  return fn(port, ev, nb_events);
> >> @@ -2200,7 +2200,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t 
> >> port_id, struct rte_event ev[],
> >>           * Allow zero cost non burst mode routine invocation if 
> >> application
> >>           * requests nb_events as const one
> >>           */
> >> -       if (nb_events == 1)
> >> +       if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 
> >> 1)
> >>                  return (fp_ops->dequeue)(port, ev, timeout_ticks);
> >>          else
> >>                  return (fp_ops->dequeue_burst)(port, ev, nb_events,
> >> --
> >> 2.34.1
> >>
>

Reply via email to