> From: Andrew Rybchenko [mailto:andrew.rybche...@oktetlabs.ru]
> Sent: Tuesday, 4 October 2022 14.54
> To: Olivier Matz
> Cc: dev@dpdk.org; Morten Brørup; Beilei Xing; Bruce Richardson; Jerin
> Jacob Kollanukkaran
> Subject: [PATCH v3] mempool: fix get objects from mempool with cache
> 
> From: Morten Brørup <m...@smartsharesystems.com>
> 
> A flush threshold for the mempool cache was introduced in DPDK version
> 1.3, but rte_mempool_do_generic_get() was not completely updated back
> then, and some inefficiencies were introduced.
> 
> Fix the following in rte_mempool_do_generic_get():
> 
> 1. The code that initially screens the cache request was not updated
> with the change in DPDK version 1.3.
> The initial screening compared the request length to the cache size,
> which was correct before, but became irrelevant with the introduction
> of
> the flush threshold. E.g. the cache can hold up to flushthresh objects,
> which is more than its size, so some requests were not served from the
> cache, even though they could be.
> The initial screening has now been corrected to match the initial
> screening in rte_mempool_do_generic_put(), which verifies that a cache
> is present, and that the length of the request does not overflow the
> memory allocated for the cache.
> 
> This bug caused a major performance degradation in scenarios where the
> application burst length is the same as the cache size. In such cases,
> the objects were never fetched from the mempool cache, even when they
> could have been.
> This scenario occurs e.g. if an application has configured a mempool
> with a size matching the application's burst size.
> 
> 2. The function is a helper for rte_mempool_generic_get(), so it must
> behave according to the description of that function.
> Specifically, objects must first be returned from the cache,
> subsequently from the ring.
> After the change in DPDK version 1.3, this was not the behavior when
> the request was partially satisfied from the cache; instead, the
> objects
> from the ring were returned ahead of the objects from the cache.
> This bug degraded application performance on CPUs with a small L1
> cache,
> which benefit from having the hot objects first in the returned array.
> (This is probably also the reason why the function returns the objects
> in reverse order, which it still does.)
> Now, all code paths first return objects from the cache, subsequently
> from the ring.
> 
> The function was thus not behaving as described by the function using
> it, nor as expected by the applications calling it. This in itself is
> also a bug.
> 
> 3. If the cache could not be backfilled, the function would attempt
> to get all the requested objects from the ring (instead of only the
> number of requested objects minus the objects already available in the
> cache), and the function would fail if that failed.
> Now, the first part of the request is always satisfied from the cache,
> and if the subsequent backfilling of the cache from the ring fails,
> only
> the remaining requested objects are retrieved from the ring.
> 
> In other words, the function could fail despite there being enough
> objects available in the cache plus the common pool.
> 
> 4. The code flow for satisfying the request from the cache was slightly
> inefficient:
> The likely code path where the objects are simply served from the cache
> was treated as unlikely. Now it is treated as likely.
> 
> Signed-off-by: Morten Brørup <m...@smartsharesystems.com>
> Signed-off-by: Andrew Rybchenko <andrew.rybche...@oktetlabs.ru>
> ---
> v3 changes (Andrew Rybchenko)
>  - Always get first objects from the cache even if request is bigger
>    than cache size. Remove one corresponding condition from the path
>    when request is fully served from cache.
>  - Simplify code to avoid duplication:
>     - Get objects directly from backend in single place only.
>     - Share code which gets from the cache first, regardless of whether
>       everything is obtained from the cache or just the first part.
>  - Rollback cache length in unlikely failure branch to avoid cache
>    vs NULL check in success branch.
> 
> v2 changes
> - Do not modify description of return value. This belongs in a separate
> doc fix.
> - Elaborate even more on which bugs the modifications fix.
> 
>  lib/mempool/rte_mempool.h | 74 +++++++++++++++++++++++++--------------
>  1 file changed, 48 insertions(+), 26 deletions(-)

Thank you, Andrew.

I haven't compared the resulting assembler output (regarding performance), but 
I have carefully reviewed the resulting v3 source code for potential bugs in 
all code paths and for performance, and think it looks good.

The RTE_MIN() macro looks like it prefers the first parameter, so static branch 
prediction for len=RTE_MIN(remaining, cache->len) should be correct.

You could consider adding likely() around (cache != NULL) near the bottom of 
the function, so it matches the unlikely(cache == NULL) at the top of the 
function; mainly for symmetry in the source code, as I expect it to be the 
compiler default anyway.

Also, you could add "remaining" to the comment:
/* Get the objects directly from the ring. */
so it becomes:
/* Get the remaining objects directly from the ring. */

As is, or with suggested modifications...

Reviewed-by: Morten Brørup <m...@smartsharesystems.com>
