> From: Andrew Rybchenko [mailto:andrew.rybche...@oktetlabs.ru]
> Sent: Sunday, 9 October 2022 15.38
>
> From: Morten Brørup <m...@smartsharesystems.com>
>
> Fix the rte_mempool_do_generic_put() cache flushing algorithm to
> keep hot objects in the cache instead of cold ones.
>
> The algorithm was:
> 1. Add the objects to the cache.
> 2. Anything greater than the cache size (if it crosses the cache
>    flush threshold) is flushed to the backend.
>
> Please note that the description in the source code said that it
> kept "cache min value" objects after flushing, but the function
> actually kept the cache full after flushing, which the above
> description reflects.
>
> Now, the algorithm is:
> 1. If the objects cannot be added to the cache without crossing the
>    flush threshold, flush some cached objects to the backend to
>    free up the required space.
> 2. Add the objects to the cache.
>
> The most recent (hot) objects were flushed, leaving the oldest
> (cold) objects in the mempool cache. The bug degraded performance,
> because flushing prevented immediate reuse of the (hot) objects
> already in the CPU cache. Now, the existing (cold) objects in the
> mempool cache are flushed before the new (hot) objects are added to
> the mempool cache.
>
> Since nearby code is touched anyway, fix the flush threshold
> comparison to do flushing only if the threshold is really exceeded,
> not just reached. I.e. it must be "len > flushthresh", not
> "len >= flushthresh". Consider a flush multiplier of 1 instead of
> 1.5; the cache would already be flushed when reaching "size"
> objects, not when exceeding "size" objects. In other words, the
> cache would not be able to hold "size" objects, which is clearly a
> bug. The bug could degrade performance due to premature flushing.
>
> Since we never exceed the flush threshold now, the cache size in the
> mempool may be decreased from RTE_MEMPOOL_CACHE_MAX_SIZE * 3 to
> RTE_MEMPOOL_CACHE_MAX_SIZE * 2.
> In fact it could be
> CALC_CACHE_FLUSHTHRESH(RTE_MEMPOOL_CACHE_MAX_SIZE), but the flush
> threshold multiplier is internal.
>
> Signed-off-by: Morten Brørup <m...@smartsharesystems.com>
> Signed-off-by: Andrew Rybchenko <andrew.rybche...@oktetlabs.ru>
> ---
> [...]
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -90,7 +90,7 @@ struct rte_mempool_cache {
>  	 * Cache is allocated to this size to allow it to overflow in certain
>  	 * cases to avoid needless emptying of cache.
>  	 */
> -	void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 3]; /**< Cache objects */
> +	void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2]; /**< Cache objects */
>  } __rte_cache_aligned;

How much are we allowed to break the ABI here?

This patch reduces the size of the structure by removing a now unused
part at the end, which should be harmless.

If we may also move the position of the objs array, I would add
__rte_cache_aligned to the objs array. It makes no difference in the
general case, but if get/put operations are always 32 objects, it will
reduce the number of memory (or last level cache) accesses from five
to four 64 B cache lines for every get/put operation:

 	uint32_t len;	/**< Current cache count */
-	/*
-	 * Cache is allocated to this size to allow it to overflow in certain
-	 * cases to avoid needless emptying of cache.
-	 */
-	void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 3]; /**< Cache objects */
+	/**
+	 * Cache objects
+	 *
+	 * Cache is allocated to this size to allow it to overflow in certain
+	 * cases to avoid needless emptying of cache.
+	 */
+	void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2] __rte_cache_aligned;
 } __rte_cache_aligned;

With or without the above suggested optimization...

Reviewed-by: Morten Brørup <m...@smartsharesystems.com>