Hi Morten,

> PING mempool maintainers.
> 
> If you don't provide ACK or Review to patches, how should Thomas know that it 
> is ready for merging?
> 
> -Morten

The code changes themselves look ok to me.
Though I don't think they would really bring any noticeable speedup.
But yes, it looks a bit nicer this way, especially with the extra comments.
One question though: why do you feel this assert:
RTE_ASSERT(cache->len <= cache->flushthresh);
is necessary?
I can't see any way it could happen, unless something is totally broken
within the app.
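
Just to spell out what I think the assert documents: the new comparison is
only safe while that invariant holds, otherwise the subtraction underflows.
A minimal standalone sketch of the difference (hypothetical values, not the
actual mempool code):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            uint32_t len = 100;           /* cache->len */
            uint32_t flushthresh = 512;   /* cache->flushthresh */
            uint32_t n = UINT32_MAX - 50; /* absurdly huge request */

            /* Old comparison: len + n wraps around to 49, which is
             * <= flushthresh, so the huge request would wrongly be
             * treated as fitting into the cache. */
            printf("old: %d\n", len + n <= flushthresh); /* prints 1 */

            /* New comparison: flushthresh - len cannot underflow as
             * long as len <= flushthresh, which is exactly the
             * invariant the RTE_ASSERT documents. */
            printf("new: %d\n", n <= flushthresh - len); /* prints 0 */

            return 0;
    }

But as said above, I can't see how len could ever exceed flushthresh in
practice.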

> > From: Morten Brørup [mailto:m...@smartsharesystems.com]
> > Sent: Wednesday, 16 November 2022 18.39
> >
> > > From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com]
> > > Sent: Wednesday, 16 November 2022 17.27
> > >
> > >
> > > > > >
> > > > > > Micro-optimization:
> > > > > > Reduced the most likely code path in the generic put function by
> > > > > > moving an unlikely check out of the most likely code path and
> > > > > > further down.
> > > > > >
> > > > > > Also updated the comments in the function.
> > > > > >
> > > > > > v2 (feedback from Andrew Rybchenko):
> > > > > > * Modified comparison to prevent overflow if n is really huge and
> > > > > >   len is non-zero.
> > > > > > * Added assertion about the invariant preventing overflow in the
> > > > > >   comparison.
> > > > > > * Crossing the threshold is not extremely unlikely, so removed
> > > > > >   likely() from that comparison.
> > > > > >   The compiler will generate code with optimal static branch
> > > > > >   prediction here anyway.
> > > > > >
> > > > > > Signed-off-by: Morten Brørup <m...@smartsharesystems.com>
> > > > > > ---
> > > > > >  lib/mempool/rte_mempool.h | 36 ++++++++++++++++++++----------------
> > > > > >  1 file changed, 20 insertions(+), 16 deletions(-)
> > > > > >
> > > > > >
> > > > > > diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> > > > > > index 9f530db24b..dd1a3177d6 100644
> > > > > > --- a/lib/mempool/rte_mempool.h
> > > > > > +++ b/lib/mempool/rte_mempool.h
> > > > > > @@ -1364,32 +1364,36 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
> > > > > >  {
> > > > > >     void **cache_objs;
> > > > > >
> > > > > > -   /* No cache provided */
> > > > > > +   /* No cache provided? */
> > > > > >     if (unlikely(cache == NULL))
> > > > > >             goto driver_enqueue;
> > > > > >
> > > > > > -   /* increment stat now, adding in mempool always success */
> > > > > > +   /* Increment stats now, adding in mempool always succeeds. */
> > > > > >     RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > > > >     RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > > > >
> > > > > > -   /* The request itself is too big for the cache */
> > > > > > -   if (unlikely(n > cache->flushthresh))
> > > > > > -           goto driver_enqueue_stats_incremented;
> > > > > > -
> > > > > > -   /*
> > > > > > -    * The cache follows the following algorithm:
> > > > > > -    *   1. If the objects cannot be added to the cache without crossing
> > > > > > -    *      the flush threshold, flush the cache to the backend.
> > > > > > -    *   2. Add the objects to the cache.
> > > > > > -    */
> > > > > > +   /* Assert the invariant preventing overflow in the comparison below. */
> > > > > > +   RTE_ASSERT(cache->len <= cache->flushthresh);
> > > > > >
> > > > > > -   if (cache->len + n <= cache->flushthresh) {
> > > > > > +   if (n <= cache->flushthresh - cache->len) {
> > > > > > +           /*
> > > > > > +            * The objects can be added to the cache without crossing the
> > > > > > +            * flush threshold.
> > > > > > +            */
> > > > > >             cache_objs = &cache->objs[cache->len];
> > > > > >             cache->len += n;
> > > > > > -   } else {
> > > > > > +   } else if (likely(n <= cache->flushthresh)) {
> > > > > IMO, this is a misconfiguration on the application's part. In the
> > > > > PMDs I have looked at, the max value of 'n' is controlled by
> > > > > compile time constants. The application could do a compile time
> > > > > check against the cache threshold, or we could have another
> > > > > RTE_ASSERT on this.
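> > > > > Something along these lines (just a sketch; MAX_PKT_BURST and
> > > > > APP_CACHE_SIZE are hypothetical application-defined constants, and
> > > > > the 1.5x factor assumes the current flush threshold calculation):
> > > > >
> > > > >     #include <rte_common.h>
> > > > >
> > > > >     #define MAX_PKT_BURST 32   /* hypothetical app constant */
> > > > >     #define APP_CACHE_SIZE 256 /* hypothetical app constant */
> > > > >
> > > > >     static void
> > > > >     app_check_mempool_config(void)
> > > > >     {
> > > > >             /* Fail the build if a full burst could ever exceed
> > > > >              * the cache flush threshold. */
> > > > >             RTE_BUILD_BUG_ON(MAX_PKT_BURST > APP_CACHE_SIZE * 3 / 2);
> > > > >     }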
> > > >
> > > > There could be applications using a mempool for something other than
> > > > mbufs.
> > > Agree
> > >
> > > >
> > > > In that case, the application should be allowed to get/put many
> > > > objects in one transaction.
> > > Still, this is a misconfiguration on the application's part. On one
> > > hand the threshold is configured for 'x', but on the other they are
> > > sending a request for more than 'x'. It should be possible to change
> > > the threshold configuration or reduce the request size.
> > >
> > > If the application does not fix the misconfiguration, it is possible
> > > that it will always hit this case and never get the benefit of using
> > > the per-core cache.
> >
> > Correct. I suppose this is the intended behavior of this API.
> >
> > The zero-copy API proposed in another patch [1] has stricter
> > requirements on the bulk size.
> >
> > [1]: http://inbox.dpdk.org/dev/20221115161822.70886-1-m...@smartsharesystems.com/T/#u
> >
> > >
> > > With this check, we are introducing an additional memcpy as well. I
> > > am not sure if reusing the latest buffers is better than having a
> > > memcpy.
> >
> > There is no additional memcpy. The large bulk transfer is stored
> > directly in the backend pool, bypassing the mempool cache.
> >
> > Please note that this check is not new, it has just been moved. Before
> > this patch, it was checked on every call (if a cache is present); with
> > this patch, it is only checked if the entire request cannot go directly
> > into the cache.
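> >
> > To illustrate, the control flow before and after this patch (simplified
> > sketch, omitting the stats and the no-cache path):
> >
> >     /* Before: the too-big check was performed on every call. */
> >     if (unlikely(n > cache->flushthresh))
> >             goto driver_enqueue_stats_incremented;
> >     if (cache->len + n <= cache->flushthresh) {
> >             /* Add the objects to the cache. */
> >     } else {
> >             /* Flush the cache, then add the objects to it. */
> >     }
> >
> >     /* After: the likely path is a single comparison. */
> >     if (n <= cache->flushthresh - cache->len) {
> >             /* Add the objects to the cache. */
> >     } else if (likely(n <= cache->flushthresh)) {
> >             /* Flush the cache, then add the objects to it. */
> >     } else {
> >             /* Too big for the cache; bypass it. */
> >             goto driver_enqueue_stats_incremented;
> >     }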
> >
> > >
> > > >
> > > > >
> > > > > > +           /*
> > > > > > +            * The request itself fits into the cache.
> > > > > > +            * But first, the cache must be flushed to the backend, so
> > > > > > +            * adding the objects does not cross the flush threshold.
> > > > > > +            */
> > > > > >             cache_objs = &cache->objs[0];
> > > > > >             rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
> > > > > >             cache->len = n;
> > > > > > +   } else {
> > > > > > +           /* The request itself is too big for the cache. */
> > > > > > +           goto driver_enqueue_stats_incremented;
> > > > > >     }
> > > > > >
> > > > > >     /* Add the objects to the cache. */
> > > > > > @@ -1399,13 +1403,13 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
> > > > > >
> > > > > >  driver_enqueue:
> > > > > >
> > > > > > -   /* increment stat now, adding in mempool always success */
> > > > > > +   /* Increment stats now, adding in mempool always succeeds. */
> > > > > >     RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > > > > >     RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> > > > > >
> > > > > >  driver_enqueue_stats_incremented:
> > > > > >
> > > > > > -   /* push objects to the backend */
> > > > > > +   /* Push the objects to the backend. */
> > > > > >     rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
> > > > > >  }
> > > > > >
> > > > > > --
> > > > > > 2.17.1
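
For reference, applications reach the code path under discussion through
the public put API, e.g. (a sketch, assuming mp was created with a
non-zero cache_size):

    #include <rte_mempool.h>

    static void
    burst_free(struct rte_mempool *mp, void **objs, unsigned int n)
    {
            /* Ends up in rte_mempool_do_generic_put(); the size of n
             * relative to the cache flush threshold determines which
             * of the three branches above is taken. */
            rte_mempool_put_bulk(mp, objs, n);
    }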
