> From: Andrew Rybchenko [mailto:andrew.rybche...@oktetlabs.ru]
> Sent: Tuesday, 4 October 2022 14.54
> To: Olivier Matz
> Cc: dev@dpdk.org; Morten Brørup; Beilei Xing; Bruce Richardson; Jerin Jacob Kollanukkaran
> Subject: [PATCH v3] mempool: fix get objects from mempool with cache
>
> From: Morten Brørup <m...@smartsharesystems.com>
>
> A flush threshold for the mempool cache was introduced in DPDK version
> 1.3, but rte_mempool_do_generic_get() was not completely updated back
> then, and some inefficiencies were introduced.
>
> Fix the following in rte_mempool_do_generic_get():
>
> 1. The code that initially screens the cache request was not updated
> with the change in DPDK version 1.3.
> The initial screening compared the request length to the cache size,
> which was correct before, but became irrelevant with the introduction
> of the flush threshold. E.g. the cache can hold up to flushthresh
> objects, which is more than its size, so some requests were not served
> from the cache, even though they could be.
> The initial screening has now been corrected to match the initial
> screening in rte_mempool_do_generic_put(), which verifies that a cache
> is present, and that the length of the request does not overflow the
> memory allocated for the cache.
>
> This bug caused a major performance degradation in scenarios where the
> application burst length is the same as the cache size. In such cases,
> the objects were never fetched from the mempool cache, regardless of
> whether they could have been.
> This scenario occurs e.g. if an application has configured a mempool
> with a size matching the application's burst size.
>
> 2. The function is a helper for rte_mempool_generic_get(), so it must
> behave according to the description of that function.
> Specifically, objects must first be returned from the cache,
> subsequently from the ring.
> After the change in DPDK version 1.3, this was not the behavior when
> the request was partially satisfied from the cache; instead, the
> objects from the ring were returned ahead of the objects from the
> cache.
> This bug degraded application performance on CPUs with a small L1
> cache, which benefit from having the hot objects first in the
> returned array. (This is probably also the reason why the function
> returns the objects in reverse order, which it still does.)
> Now, all code paths first return objects from the cache, subsequently
> from the ring.
>
> The function was not behaving as described (by the function using it)
> and expected by applications using it. This in itself is also a bug.
>
> 3. If the cache could not be backfilled, the function would attempt
> to get all the requested objects from the ring (instead of only the
> number of requested objects minus the objects available in the
> cache), and the function would fail if that failed.
> Now, the first part of the request is always satisfied from the
> cache, and if the subsequent backfilling of the cache from the ring
> fails, only the remaining requested objects are retrieved from the
> ring.
>
> The function could fail despite there being enough objects in the
> cache plus the common pool.
>
> 4. The code flow for satisfying the request from the cache was
> slightly inefficient:
> The likely code path, where the objects are simply served from the
> cache, was treated as unlikely. Now it is treated as likely.
>
> Signed-off-by: Morten Brørup <m...@smartsharesystems.com>
> Signed-off-by: Andrew Rybchenko <andrew.rybche...@oktetlabs.ru>
> ---
> v3 changes (Andrew Rybchenko)
> - Always get first objects from the cache even if request is bigger
>   than cache size. Remove one corresponding condition from the path
>   when request is fully served from cache.
> - Simplify code to avoid duplication:
>   - Get objects directly from backend in single place only.
>   - Share code which gets from the cache first, regardless of whether
>     everything is obtained from the cache or just the first part.
>   - Rollback cache length in unlikely failure branch to avoid cache
>     vs NULL check in success branch.
>
> v2 changes
> - Do not modify description of return value. This belongs in a
>   separate doc fix.
> - Elaborate even more on which bugs the modifications fix.
>
> lib/mempool/rte_mempool.h | 74 +++++++++++++++++++++++++--------------
> 1 file changed, 48 insertions(+), 26 deletions(-)
Thank you, Andrew.

I haven't compared the resulting assembler output (regarding performance), but I have carefully reviewed the resulting v3 source code for potential bugs in all code paths and for performance, and I think it looks good.

The RTE_MIN() macro looks like it prefers the first parameter, so static branch prediction for len = RTE_MIN(remaining, cache->len) should be correct.

You could consider adding likely() around (cache != NULL) near the bottom of the function, so it matches the unlikely(cache == NULL) at the top of the function; mainly for symmetry in the source code, as I expect it to be the compiler default anyway.

Also, you could add "remaining" to the comment:

/* Get the objects directly from the ring. */

As is, or with suggested modifications...

Reviewed-by: Morten Brørup <m...@smartsharesystems.com>