PING mempool maintainers. If you don't provide an ACK or Review to patches, how should Thomas know that they are ready for merging?
-Morten

> From: Morten Brørup [mailto:m...@smartsharesystems.com]
> Sent: Wednesday, 16 November 2022 18.39
>
> > From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com]
> > Sent: Wednesday, 16 November 2022 17.27
> >
> > <snip>
> >
> > > > > Micro-optimization:
> > > > > Reduced the most likely code path in the generic put function
> > > > > by moving an unlikely check out of the most likely code path
> > > > > and further down.
> > > > >
> > > > > Also updated the comments in the function.
> > > > >
> > > > > v2 (feedback from Andrew Rybchenko):
> > > > > * Modified the comparison to prevent overflow if n is really
> > > > >   huge and len is non-zero.
> > > > > * Added an assertion about the invariant preventing overflow
> > > > >   in the comparison.
> > > > > * Crossing the threshold is not extremely unlikely, so removed
> > > > >   likely() from that comparison.
> > > > >   The compiler will generate code with optimal static branch
> > > > >   prediction here anyway.
> > > > >
> > > > > Signed-off-by: Morten Brørup <m...@smartsharesystems.com>
> > > > > ---
> > > > >  lib/mempool/rte_mempool.h | 36 ++++++++++++++++++++----------------
> > > > >  1 file changed, 20 insertions(+), 16 deletions(-)
> > > > >
> > > > > diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> > > > > index 9f530db24b..dd1a3177d6 100644
> > > > > --- a/lib/mempool/rte_mempool.h
> > > > > +++ b/lib/mempool/rte_mempool.h
> > > > > @@ -1364,32 +1364,36 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
> > > > >  {
> > > > >  	void **cache_objs;
> > > > >
> > > > > -	/* No cache provided */
> > > > > +	/* No cache provided? */
> > > > >  	if (unlikely(cache == NULL))
> > > > >  		goto driver_enqueue;
> > > > >
> > > > > -	/* increment stat now, adding in mempool always success */
> > > > > +	/* Increment stats now, adding in mempool always succeeds. */
> > > > >  	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > > >  	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > > >
> > > > > -	/* The request itself is too big for the cache */
> > > > > -	if (unlikely(n > cache->flushthresh))
> > > > > -		goto driver_enqueue_stats_incremented;
> > > > > -
> > > > > -	/*
> > > > > -	 * The cache follows the following algorithm:
> > > > > -	 *   1. If the objects cannot be added to the cache without crossing
> > > > > -	 *      the flush threshold, flush the cache to the backend.
> > > > > -	 *   2. Add the objects to the cache.
> > > > > -	 */
> > > > > +	/* Assert the invariant preventing overflow in the comparison below. */
> > > > > +	RTE_ASSERT(cache->len <= cache->flushthresh);
> > > > >
> > > > > -	if (cache->len + n <= cache->flushthresh) {
> > > > > +	if (n <= cache->flushthresh - cache->len) {
> > > > > +		/*
> > > > > +		 * The objects can be added to the cache without crossing the
> > > > > +		 * flush threshold.
> > > > > +		 */
> > > > >  		cache_objs = &cache->objs[cache->len];
> > > > >  		cache->len += n;
> > > > > -	} else {
> > > > > +	} else if (likely(n <= cache->flushthresh)) {
> > > >
> > > > IMO, this is a misconfiguration on the application part. In the
> > > > PMDs I have looked at, the max value of 'n' is controlled by
> > > > compile-time constants. The application could do a compile-time
> > > > check on the cache threshold, or we could have another RTE_ASSERT
> > > > on this.
> > >
> > > There could be applications using a mempool for something other
> > > than mbufs.
> >
> > Agree
> >
> > > In that case, the application should be allowed to get/put many
> > > objects in one transaction.
> >
> > Still, this is a misconfiguration on the application. On one hand,
> > the threshold is configured for 'x', but they are sending a request
> > which is more than 'x'. It should be possible to change the
> > threshold configuration or reduce the request size.
> >
> > If the application does not fix the misconfiguration, it is possible
> > that it will always hit this case and never get the benefit of using
> > the per-core cache.
>
> Correct. I suppose this is the intended behavior of this API.
>
> The zero-copy API proposed in another patch [1] has stricter
> requirements on the bulk size.
>
> [1]: http://inbox.dpdk.org/dev/20221115161822.70886-1-m...@smartsharesystems.com/T/#u
>
> > With this check, we are introducing an additional memcpy as well.
> > I am not sure if reusing the latest buffers is better than having
> > a memcpy.
>
> There is no additional memcpy. The large bulk transfer is stored
> directly in the backend pool, bypassing the mempool cache.
>
> Please note that this check is not new; it has just been moved.
> Before this patch, it was checked on every call (if a cache is
> present); with this patch, it is only checked if the entire request
> cannot go directly into the cache.
>
> > > > > +		/*
> > > > > +		 * The request itself fits into the cache.
> > > > > +		 * But first, the cache must be flushed to the backend, so
> > > > > +		 * adding the objects does not cross the flush threshold.
> > > > > +		 */
> > > > >  		cache_objs = &cache->objs[0];
> > > > >  		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
> > > > >  		cache->len = n;
> > > > > +	} else {
> > > > > +		/* The request itself is too big for the cache. */
> > > > > +		goto driver_enqueue_stats_incremented;
> > > > >  	}
> > > > >
> > > > >  	/* Add the objects to the cache. */
> > > > > @@ -1399,13 +1403,13 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
> > > > >
> > > > >  driver_enqueue:
> > > > >
> > > > > -	/* increment stat now, adding in mempool always success */
> > > > > +	/* Increment stats now, adding in mempool always succeeds. */
> > > > >  	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > > > >  	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> > > > >
> > > > >  driver_enqueue_stats_incremented:
> > > > >
> > > > > -	/* push objects to the backend */
> > > > > +	/* Push the objects to the backend. */
> > > > >  	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
> > > > >  }
> > > > >
> > > > > --
> > > > > 2.17.1
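
To illustrate the v2 overflow fix discussed above: a minimal standalone
sketch, not DPDK code. The function names and values are hypothetical;
len, flushthresh and n stand in for cache->len, cache->flushthresh and
the request size. With unsigned arithmetic, "len + n" can wrap around
when n is huge, whereas "flushthresh - len" cannot wrap as long as the
asserted invariant len <= flushthresh holds.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Old comparison: "len + n" may wrap around to a small value. */
static int
fits_old(uint32_t len, uint32_t flushthresh, uint32_t n)
{
	return len + n <= flushthresh;
}

/* New comparison: "flushthresh - len" cannot wrap, given the invariant. */
static int
fits_new(uint32_t len, uint32_t flushthresh, uint32_t n)
{
	assert(len <= flushthresh);
	return n <= flushthresh - len;
}

int
main(void)
{
	uint32_t len = 16, flushthresh = 512;
	uint32_t n = UINT32_MAX - 8;	/* huge n: len + n wraps around to 7 */

	printf("old: %d (false positive)\n", fits_old(len, flushthresh, n));
	printf("new: %d (correct)\n", fits_new(len, flushthresh, n));
	return 0;
}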
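
And a sketch of the compile-time check suggested above, assuming the
application's maximum put burst and the mempool cache size are both
compile-time constants (the APP_* names are hypothetical). The flush
threshold is derived from the cache size and is no smaller than it, so
bounding the burst by the cache size conservatively guarantees that the
request also fits within the flush threshold.

#include <assert.h>	/* static_assert, C11 */

/* Hypothetical application constants. */
#define APP_MEMPOOL_CACHE_SIZE	512	/* cache_size given to rte_mempool_create() */
#define APP_MAX_PUT_BURST	64	/* largest n used with rte_mempool_put_bulk() */

/* Fail the build if a put burst could ever exceed the cache. */
static_assert(APP_MAX_PUT_BURST <= APP_MEMPOOL_CACHE_SIZE,
	"max put burst exceeds mempool cache size");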