On Tue, May 16, 2023 at 2:22 AM Mattias Rönnblom <hof...@lysator.liu.se> wrote: > > On 2023-05-15 14:38, Jerin Jacob wrote: > > On Fri, May 12, 2023 at 6:45 PM Mattias Rönnblom > > <mattias.ronnb...@ericsson.com> wrote: > >> > >> On 2023-05-12 13:59, Jerin Jacob wrote: > >>> On Thu, May 11, 2023 at 2:00 PM Mattias Rönnblom > >>> <mattias.ronnb...@ericsson.com> wrote: > >>>> > >>>> Use non-burst event enqueue and dequeue calls from burst enqueue and > >>>> dequeue only when the burst size is compile-time constant (and equal > >>>> to one). > >>>> > >>>> Signed-off-by: Mattias Rönnblom <mattias.ronnb...@ericsson.com> > >>>> > >>>> --- > >>>> > >>>> v3: Actually include the change v2 claimed to contain. > >>>> v2: Wrap builtin call in __extension__, to avoid compiler warnings if > >>>> application is compiled with -pedantic. (Morten Brørup) > >>>> --- > >>>> lib/eventdev/rte_eventdev.h | 4 ++-- > >>>> 1 file changed, 2 insertions(+), 2 deletions(-) > >>>> > >>>> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h > >>>> index a90e23ac8b..a471caeb6d 100644 > >>>> --- a/lib/eventdev/rte_eventdev.h > >>>> +++ b/lib/eventdev/rte_eventdev.h > >>>> @@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t > >>>> port_id, > >>>> * Allow zero cost non burst mode routine invocation if > >>>> application > >>>> * requests nb_events as const one > >>>> */ > >>>> - if (nb_events == 1) > >>>> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events > >>>> == 1) > >>> > >>> "Why" part is not clear from the commit message. Is this to avoid > >>> nb_events read if it is built-in const. > >> > >> The __builtin_constant_p() is introduced to avoid having the compiler > >> generate a conditional branch and two different code paths in case > >> nb_elem is a run-time variable. > >> > >> In particular, this matters if nb_elems is run-time variable and varies > >> between 1 and some larger value. > >> > >> I should have mention this in the commit message. > >> > >> A very slight performance improvement. It also makes the code better > >> match the comment, imo. Zero cost for const one enqueues, but no impact > >> non-compile-time-constant-length enqueues. > >> > >> Feel free to ignore. > > > > > > I did some performance comparison of the patch. > > A low-end ARM machines shows 0.7% drop with single event case. No > > regression see with high-end ARM cores with single event case. > > > > IMO, optimizing the check for burst mode(the new patch) may not show > > any real improvement as the cost is divided by number of event. > > Whereas optimizing the check for single event case(The current code) > > shows better performance with single event case and no regression > > with burst mode as cost is divided by number of events. > > I ran some tests on an AMD Zen 3 with DSW. > In the below tests the enqueue burst size is not compile-time constant. > > Enqueue burst size Performance improvement > Run-time constant 1 ~5% > Run-time constant 2 ~0% > Run-time variable 1-2 ~9% > Run-time variable 1-16 ~0% > > The run-time variable enqueue sizes randomly (uniformly) distributed in > the specified range. > > The first result may come as a surprise. The benchmark is using > RTE_EVENT_OP_FORWARD type events (which likely is the dominating op type > in most apps). The single-event enqueue function only exists in a > generic variant (i.e., no rte_event_enqueue_forward_burst() equivalent). > I suspect that is the reason for the performance improvement. > > This effect is large-enough to make it somewhat beneficial (+~1%) to use > run-time variable single-event enqueue compared to keeping the burst > size compile-time constant.
# Interesting, Could you share your testeventdev command to test it. # By having quick glance on DSW code, following change can be added(or similar cases). Not sure such change in DSW driver is making a difference or nor? diff --git a/drivers/event/dsw/dsw_event.c b/drivers/event/dsw/dsw_event.c index e84b65d99f..455470997b 100644 --- a/drivers/event/dsw/dsw_event.c +++ b/drivers/event/dsw/dsw_event.c @@ -1251,7 +1251,7 @@ dsw_port_flush_out_buffers(struct dsw_evdev *dsw, struct dsw_port *source_port) uint16_t dsw_event_enqueue(void *port, const struct rte_event *ev) { - return dsw_event_enqueue_burst(port, ev, unlikely(ev == NULL) ? 0 : 1); + return dsw_event_enqueue_burst(port, ev, 1); } static __rte_always_inline uint16_t @@ -1340,7 +1340,7 @@ dsw_event_enqueue_burst_generic(struct dsw_port *source_port, return (num_non_release + num_release); } -uint16_t +inline uint16_t dsw_event_enqueue_burst(void *port, const struct rte_event events[], uint16_t events_len) { # I am testing with command like this "/build/app/dpdk-test-eventdev -l 0-23 -a 0002:0e:00.0 -- --test=perf_atq --plcores 1 --wlcores 8 --stlist p --nb_pkts=10000000000" > > The performance gain is counted toward both enqueue and dequeue costs > (+benchmark app overhead), so an under-estimation if see this as an > enqueue performance improvement. > > > If you agree, then we can skip this patch. > > > > I have no strong opinion if this should be included or not. > > It was up to me, I would drop the single-enqueue special case handling > altogether in the next ABI update. That's a reasonable path. If we are willing to push a patch, we can test it and give feedback. Or in our spare time, We can do that as well. > > > > >> > >>> If so, check should be following. Right? > >>> > >>> if (__extension__((__builtin_constant_p(nb_events)) && nb_events == 1) > >>> || nb_events == 1) > >>> > >>> At least, It was my original intention in the code. > >>> > >>> > >>> > >>>> return (fp_ops->enqueue)(port, ev); > >>>> else > >>>> return fn(port, ev, nb_events); > >>>> @@ -2200,7 +2200,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t > >>>> port_id, struct rte_event ev[], > >>>> * Allow zero cost non burst mode routine invocation if > >>>> application > >>>> * requests nb_events as const one > >>>> */ > >>>> - if (nb_events == 1) > >>>> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events > >>>> == 1) > >>>> return (fp_ops->dequeue)(port, ev, timeout_ticks); > >>>> else > >>>> return (fp_ops->dequeue_burst)(port, ev, nb_events, > >>>> -- > >>>> 2.34.1 > >>>> > >>