> From: Andrew Rybchenko [mailto:andrew.rybche...@oktetlabs.ru]
> Sent: Sunday, 9 October 2022 15.38
> 
> From: Morten Brørup <m...@smartsharesystems.com>
> 
> Fix the rte_mempool_do_generic_put() cache flushing algorithm to
> keep hot objects in the cache instead of cold ones.
> 
> The algorithm was:
>  1. Add the objects to the cache.
>  2. Anything greater than the cache size (if it crosses the cache flush
>     threshold) is flushed to the backend.
> 
> Please note that the description in the source code said that it kept
> "cache min value" objects after flushing, but the function actually
> kept
> the cache full after flushing, which the above description reflects.
> 
> Now, the algorithm is:
>  1. If the objects cannot be added to the cache without crossing the
>     flush threshold, flush some cached objects to the backend to
>     free up required space.
>  2. Add the objects to the cache.
> 
> The most recent (hot) objects were flushed, leaving the oldest (cold)
> objects in the mempool cache. This bug degraded performance, because
> flushing prevented immediate reuse of the (hot) objects already in
> the CPU cache. Now, the existing (cold) objects in the mempool cache
> are flushed before the new (hot) objects are added to the mempool
> cache.
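
For clarity, the new put path can be sketched roughly like this. This is a simplified standalone model, not the actual DPDK code; `backend_enqueue()` stands in for rte_mempool_ops_enqueue_bulk(), and the fixed sizes are illustrative:

```c
#include <string.h>

#define CACHE_SIZE        32
#define FLUSH_THRESH      48                 /* size * 1.5 */
#define CACHE_ARRAY_SIZE  (CACHE_SIZE * 2)

static void *cache_objs[CACHE_ARRAY_SIZE];
static unsigned int cache_len;
static unsigned int backend_count;           /* objects pushed to the backend */

/* Stand-in for rte_mempool_ops_enqueue_bulk(). */
static void backend_enqueue(void * const *objs, unsigned int n)
{
	(void)objs;
	backend_count += n;
}

/* Sketch of the fixed put logic: flush the existing (cold) objects
 * first when needed, then keep the new (hot) objects in the cache. */
static void generic_put(void * const *objs, unsigned int n)
{
	void **dst;

	if (cache_len + n <= FLUSH_THRESH) {
		/* The new objects fit below the threshold: just append. */
		dst = &cache_objs[cache_len];
		cache_len += n;
	} else {
		/* Flush the old (cold) objects to the backend, then store
		 * the new (hot) objects at the start of the cache. */
		backend_enqueue(cache_objs, cache_len);
		dst = &cache_objs[0];
		cache_len = n;
	}
	memcpy(dst, objs, sizeof(void *) * n);
}
```

With two bursts of 32 objects and a threshold of 48, the second put flushes the 32 cold objects and keeps the 32 hot ones cached.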
> 
> Since nearby code is touched anyway, fix the flush threshold comparison
> to flush only if the threshold is actually exceeded, not merely reached.
> I.e. it must be "len > flushthresh", not "len >= flushthresh".
> Consider a flush multiplier of 1 instead of 1.5: the cache would be
> flushed already when reaching "size" objects, not when exceeding
> "size" objects. In other words, the cache would never be able to hold
> "size" objects, which is clearly a bug. The bug could degrade
> performance due to premature flushing.
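
To make the off-by-one concrete: with a hypothetical flush multiplier of 1, "size" and "flushthresh" coincide, and the old ">=" comparison flushes the moment the cache becomes full, so the cache can never actually hold "size" objects. A minimal illustration (values are hypothetical, not the real defaults):

```c
/* Hypothetical values for illustration only: a multiplier of 1 makes
 * the flush threshold equal to the cache size. */
enum { CACHE_SIZE = 32, FLUSH_THRESH = CACHE_SIZE * 1 };

/* Old comparison: triggers as soon as the cache is merely full. */
static int old_would_flush(unsigned int len)
{
	return len >= FLUSH_THRESH;
}

/* Fixed comparison: triggers only when the threshold is exceeded. */
static int new_would_flush(unsigned int len)
{
	return len > FLUSH_THRESH;
}
```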
> 
> Since we never exceed the flush threshold now, the cache size in the
> mempool may be decreased from RTE_MEMPOOL_CACHE_MAX_SIZE * 3 to
> RTE_MEMPOOL_CACHE_MAX_SIZE * 2. In fact, it could be
> CALC_CACHE_FLUSHTHRESH(RTE_MEMPOOL_CACHE_MAX_SIZE), but the flush
> threshold multiplier is internal.
> 
> Signed-off-by: Morten Brørup <m...@smartsharesystems.com>
> Signed-off-by: Andrew Rybchenko <andrew.rybche...@oktetlabs.ru>
> ---

[...]

> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -90,7 +90,7 @@ struct rte_mempool_cache {
>        * Cache is allocated to this size to allow it to overflow in
> certain
>        * cases to avoid needless emptying of cache.
>        */
> -     void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 3]; /**< Cache objects */
> +     void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2]; /**< Cache objects */
>  } __rte_cache_aligned;

How much are we allowed to break the ABI here?

This patch reduces the size of the structure by removing a now unused part at 
the end, which should be harmless.

If we may also move the objs array, I would add __rte_cache_aligned to it. 
It makes no difference in the general case, but if get/put operations always 
handle 32 objects, it reduces the number of memory (or last-level cache) 
accesses from five to four 64 B cache lines per get/put operation.
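
The five-vs-four figure can be sanity-checked with simple arithmetic, assuming 8 B pointers and 64 B cache lines: 32 pointers occupy 256 B, which is exactly 4 lines when the array starts on a line boundary, but straddles 5 lines when it starts mid-line. `lines_touched()` below is just an illustrative helper, not DPDK code:

```c
#define CACHE_LINE  64   /* assumed cache line size in bytes */
#define PTR_SIZE    8    /* assumed pointer size in bytes */

/* Number of cache lines touched by an array of n pointers starting
 * at byte offset `off` from a cache line boundary. */
static unsigned int lines_touched(unsigned int off, unsigned int n)
{
	unsigned int first = off / CACHE_LINE;
	unsigned int last  = (off + n * PTR_SIZE - 1) / CACHE_LINE;

	return last - first + 1;
}
```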

        uint32_t len;         /**< Current cache count */
-       /*
-        * Cache is allocated to this size to allow it to overflow in certain
-        * cases to avoid needless emptying of cache.
-        */
-       void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 3]; /**< Cache objects */
+       /**
+        * Cache objects
+        *
+        * Cache is allocated to this size to allow it to overflow in certain
+        * cases to avoid needless emptying of cache.
+        */
+       void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2] __rte_cache_aligned;
} __rte_cache_aligned;

With or without the above suggested optimization...

Reviewed-by: Morten Brørup <m...@smartsharesystems.com>
