> From: Olivier Matz [mailto:olivier.m...@6wind.com]
> Sent: Monday, 27 February 2023 10.10
> 
> Hi,
> 
> On Sun, Feb 26, 2023 at 11:45:48AM +0100, Morten Brørup wrote:
> > > From: Harris, James R [mailto:james.r.har...@intel.com]
> > > Sent: Friday, 24 February 2023 17.43
> > >
> > > > On Feb 24, 2023, at 6:56 AM, Honnappa Nagarahalli
> > > > <honnappa.nagaraha...@arm.com> wrote:
> > > >
> > > >> From: Morten Brørup <m...@smartsharesystems.com>
> > > >> Sent: Friday, February 24, 2023 6:13 AM
> > > >> To: Harris, James R <james.r.har...@intel.com>; dev@dpdk.org
> > > >> Subject: RE: Bug in rte_mempool_do_generic_get?
> > > >>
> > > >>> If you have a mempool with 2048 objects, shouldn't 4 cores each be
> > > >>> able to do a 512 buffer bulk get, regardless of the configured
> > > >>> cache size?
> > > >>
> > > >> No, the scenario you described above is the expected behavior. I
> > > >> think it is documented somewhere that objects in the caches are
> > > >> unavailable for other cores, but now I cannot find where this is
> > > >> documented.
> > > >>
> > >
> > > Thanks Morten.
> > >
> > > Yeah, I think it is documented somewhere, but I also couldn’t find it.
> > > I was aware of cores not being able to allocate from another core’s
> > > cache.  My surprise was that in a pristine new mempool, 4 cores could
> > > not each do one initial 512 buffer bulk get.  But I also see that even
> > > before the a2833ecc5 patch, the cache would get populated on gets of
> > > less than the cache size, in addition to the buffers requested by the
> > > user.  So if the cache size is 256, and the bulk get is for 128
> > > buffers, it pulls 384 buffers from the backing pool - 128 for the
> > > caller, another 256 to prefill the cache.  Your patch makes this cache
> > > filling consistent between the less-than-cache-size and
> > > greater-than-or-equal-to-cache-size cases.
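
To make the scenario concrete, here is a minimal sketch of mine (the pool
name, object size and failure handling are made up, not from SPDK): a
2048-object pool with a 256-object per-core cache, and a 512-object bulk
get on each of 4 cores.

    #include <stdio.h>
    #include <rte_lcore.h>
    #include <rte_mempool.h>

    static struct rte_mempool *
    make_pool(void)
    {
            return rte_mempool_create("example_pool",
                            2048,   /* objects in the pool */
                            64,     /* object size, arbitrary here */
                            256,    /* per-core cache size */
                            0,      /* private data size */
                            NULL, NULL, NULL, NULL,
                            rte_socket_id(), 0);
    }

    static int
    grab_512(struct rte_mempool *mp)
    {
            void *objs[512];

            /* On a core with an empty cache, this pulls more than 512
             * objects from the backing ring (512 for the caller plus a
             * cache refill), so a later core's get can fail even though
             * 4 * 512 == 2048. */
            if (rte_mempool_get_bulk(mp, objs, 512) != 0) {
                    printf("lcore %u: bulk get failed\n", rte_lcore_id());
                    return -1;
            }
            return 0;
    }
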
> >
> > Yeah... I have tried hard to clean up some of the mess introduced by
> > the <insert curse words here> patch [1] that increased the effective
> > cache size by a factor of 1.5.
> >
> > [1] http://git.dpdk.org/dpdk/commit/lib/librte_mempool/rte_mempool.h?id=ea5dd2744b90b330f07fd10f327ab99ef55c7266
> >
> > E.g., somewhat related to your use case, I have argued against the
> > design goal of trying to keep the mempool caches full at all times, but
> > have not yet been able to convince the community otherwise.
> >
> > Overall, this is one of the absolute core DPDK libraries, so support
> > for the changes I would like to see is near zero - even minor
> > adjustments are considered very high risk, which I must admit has quite
> > a lot of merit.
> >
> > So instead of fixing the library's implementation to make it behave as
> > reasonably expected according to its documentation, the library's
> > implementation is considered the reference. And as a consequence, the
> > library's internals, such as the cache size multiplier, are getting
> > promoted to be part of the public API, e.g. for applications to
> > implement mempool sizing calculations like the one below.
> >
> > I recall running into problems once myself, when I had sized a mempool
> > with a specific cache size for a specific number of cores, because 50 %
> > additional objects got stuck in the caches. If you ask me, this bug is
> > so fundamental that it should be fixed at the root, not by trying to
> > document the weird behavior of allocating 50 % more than specified.
> 
> I think we should document it: even if the 1.5 factor is removed, it is
> helpful to document that a user needs to reserve more objects when using
> a cache, to ensure that all cores can allocate their objects.
> 
> Morten, what would you think about this change below?

A big improvement! Only a few comments, inline below.

> 
>   --- a/doc/guides/prog_guide/mempool_lib.rst
>   +++ b/doc/guides/prog_guide/mempool_lib.rst
>   @@ -89,9 +89,13 @@ In this way, each core has full access to its own cache (with locks) of free obj
>    only when the cache fills does the core need to shuffle some of the free objects back to the pools ring or
>    obtain more objects when the cache is empty.
> 
>   -While this may mean a number of buffers may sit idle on some core's cache,
>   +While this may mean a number of buffers may sit idle on some core's cache (up to ``1.5 * cache_length``),

Typo: cache_length -> cache_size

Looking at this, I noticed another detail: Chapter 10.4 uses the term 
"buffers", but it should use the term "objects", because a mempool may hold 
objects other than packet buffers. (Using the term "packet buffers" in the 
example in chapter 10.3 is correct.)

>    the speed at which a core can access its own cache for a specific memory pool without locks provides performance gains.
> 
>   +However, to ensure that all cores can get objects when a cache is used, it is required to allocate
>   +more objects: if N objects are needed, allocate a mempool with ``N + (number_of_active_cores *
>   +cache_size * 1.5)`` objects.
>   +

Perhaps add here that the factor 1.5 stems from the cache being allowed to 
exceed its configured size by 50 %.
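
To make the proposed rule concrete, an application could compute its pool
size along these lines - a sketch of mine, where the function name and
parameters are illustrative, and the 3 / 2 integer math mirrors the 1.5
multiplier:

    /* Sizing rule from the proposed documentation: worst case, every
     * active core parks up to cache_size * 1.5 objects (the cache
     * flush threshold) in its per-core cache. */
    static unsigned int
    pool_size_needed(unsigned int n_objs, unsigned int cache_size,
                     unsigned int n_cores)
    {
            return n_objs + n_cores * cache_size * 3 / 2;
    }

    /* Example: 4 cores, cache_size 256, 4096 objects actually needed:
     * 4096 + 4 * 384 = 5632. */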

>    The cache is composed of a small, per-core table of pointers and its length (used as a stack).
>    This internal cache can be enabled or disabled at creation of the pool.
> 

If something similar is added to the description of the "cache_size" parameter 
to the "rte_mempool_create()" function, the most obvious places to look for 
documentation are covered.

A lengthy description might be redundant here, so perhaps only mention that the 
actual cache size is 1.5 * cache_size, because the cache is allowed to exceed 
its configured size by 50 %.

> 
> >
> > >
> > > >> Furthermore, since the effective per-core cache size is 1.5 *
> > > >> configured cache size, a configured cache size of 256 may leave up
> > > >> to 384 objects in each per-core cache.
> > > >>
> > > >> With 4 cores, you can expect up to 3 * 384 = 1152 objects sitting
> > > >> in the caches of other cores. If you want to be able to pull 512
> > > >> objects with each core, the pool size should be 4 * 512 + 1152
> > > >> objects.
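
Spelling out the numbers (my arithmetic, assuming every other core's cache
is filled to its flush threshold):

    effective cache per core:   1.5 * 256      =  384 objects
    held in 3 other caches:     3 * 384        = 1152 objects
    pool size needed:           4 * 512 + 1152 = 3200 objects

That is well above the 2048 objects in James's pool, so four 512-object
gets cannot be guaranteed to succeed there.
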
> > > > Maybe we should document this in the mempool library?
> > > >
> > >
> > > Maybe.  But the case I described here is a bit wonky - SPDK should
> > > never have been specifying a non-zero cache in this case.  We only
> > > noticed this change in behavior because we were creating the mempool
> > > with a cache when we shouldn’t have.
> >
> > So, looking at the positive side, this regression revealed a "bug" in SPDK.
> ;-)
> >
> > PS: I assume that you are aware that mempools generally perform better
> > with a cache, so I assume that you know what you are doing when you say
> > that you don't need the cache.
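
For reference, opting out of the cache is just a matter of passing
cache_size = 0 at creation - a sketch of mine, same includes as the sketch
further up, pool name and sizes made up:

    static struct rte_mempool *
    make_uncached_pool(void)
    {
            /* cache_size = 0: no per-core cache is created, so every
             * get/put goes straight to the shared ring and no objects
             * get stuck in per-core caches. */
            return rte_mempool_create("shared_pool", 2048, 64,
                            0,      /* cache_size */
                            0, NULL, NULL, NULL, NULL,
                            rte_socket_id(), 0);
    }
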
> >
> > PPS: Sorry about venting here. If nothing else, I hope it had some
> > entertainment value. :-)
> >
