On 10/9/22 17:31, Morten Brørup wrote:
From: Andrew Rybchenko [mailto:andrew.rybche...@oktetlabs.ru]
Sent: Sunday, 9 October 2022 15.38
From: Morten Brørup <m...@smartsharesystems.com>
Fix the rte_mempool_do_generic_put() caching flushing algorithm to
keep hot objects in cache instead of cold ones.
The algorithm was:
1. Add the objects to the cache.
2. Anything greater than the cache size (if it crosses the cache flush
threshold) is flushed to the backend.
Please note that the description in the source code said that it kept
"cache min value" objects after flushing, but the function actually
kept
the cache full after flushing, which the above description reflects.
Now, the algorithm is:
1. If the objects cannot be added to the cache without crossing the
flush threshold, flush some cached objects to the backend to
free up required space.
2. Add the objects to the cache.
The most recent (hot) objects were flushed, leaving the oldest (cold)
objects in the mempool cache. The bug degraded performance, because
flushing prevented immediate reuse of the (hot) objects already in
the CPU cache. Now, the existing (cold) objects in the mempool cache
are flushed before the new (hot) objects are added the to the mempool
cache.
Since nearby code is touched anyway fix flush threshold comparison
to do flushing if the threshold is really exceed, not just reached.
I.e. it must be "len > flushthresh", not "len >= flushthresh".
Consider a flush multiplier of 1 instead of 1.5; the cache would be
flushed already when reaching size objects, not when exceeding size
objects. In other words, the cache would not be able to hold "size"
objects, which is clearly a bug. The bug could degraded performance
due to premature flushing.
Since we never exceed flush threshold now, cache size in the mempool
may be decreased from RTE_MEMPOOL_CACHE_MAX_SIZE * 3 to
RTE_MEMPOOL_CACHE_MAX_SIZE * 2. In fact it could be
CALC_CACHE_FLUSHTHRESH(RTE_MEMPOOL_CACHE_MAX_SIZE), but flush
threshold multiplier is internal.
Signed-off-by: Morten Brørup <m...@smartsharesystems.com>
Signed-off-by: Andrew Rybchenko <andrew.rybche...@oktetlabs.ru>
---
[...]
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -90,7 +90,7 @@ struct rte_mempool_cache {
* Cache is allocated to this size to allow it to overflow in
certain
* cases to avoid needless emptying of cache.
*/
- void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 3]; /**< Cache objects */
+ void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2]; /**< Cache objects */
} __rte_cache_aligned;
How much are we allowed to break the ABI here?
This patch reduces the size of the structure by removing a now unused part at
the end, which should be harmless.
If we may also move the position of the objs array, I would add
__rte_cache_aligned to the objs array. It makes no difference in the general
case, but if get/put operations are always 32 objects, it will reduce the
number of memory (or last level cache) accesses from five to four 64 B cache
lines for every get/put operation.
uint32_t len; /**< Current cache count */
- /*
- * Cache is allocated to this size to allow it to overflow in certain
- * cases to avoid needless emptying of cache.
- */
- void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 3]; /**< Cache objects */
+ /**
+ * Cache objects
+ *
+ * Cache is allocated to this size to allow it to overflow in certain
+ * cases to avoid needless emptying of cache.
+ */
+ void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2] __rte_cache_aligned;
} __rte_cache_aligned;
I think aligning objs on cacheline should be a separate patch.
With or without the above suggested optimization...
Reviewed-by: Morten Brørup <m...@smartsharesystems.com>