On Mon, Feb 27, 2023 at 10:48:09AM +0100, Morten Brørup wrote:
> > From: Olivier Matz [mailto:olivier.m...@6wind.com]
> > Sent: Monday, 27 February 2023 10.10
> >
> > Hi,
> >
> > On Sun, Feb 26, 2023 at 11:45:48AM +0100, Morten Brørup wrote:
> > > > From: Harris, James R [mailto:james.r.har...@intel.com]
> > > > Sent: Friday, 24 February 2023 17.43
> > > >
> > > > > On Feb 24, 2023, at 6:56 AM, Honnappa Nagarahalli
> > > > > <honnappa.nagaraha...@arm.com> wrote:
> > > > >
> > > > >> From: Morten Brørup <m...@smartsharesystems.com>
> > > > >> Sent: Friday, February 24, 2023 6:13 AM
> > > > >> To: Harris, James R <james.r.har...@intel.com>; dev@dpdk.org
> > > > >> Subject: RE: Bug in rte_mempool_do_generic_get?
> > > > >>
> > > > >>> If you have a mempool with 2048 objects, shouldn't 4 cores each be
> > > > >>> able to do a 512 buffer bulk get, regardless of the configured
> > > > >>> cache size?
> > > > >>
> > > > >> No, the scenario you described above is the expected behavior. I
> > > > >> think it is documented somewhere that objects in the caches are
> > > > >> unavailable for other cores, but now I cannot find where this is
> > > > >> documented.
> > > > >>
> > > >
> > > > Thanks Morten.
> > > >
> > > > Yeah, I think it is documented somewhere, but I also couldn’t find it.
> > > > I was aware of cores not being able to allocate from another core’s
> > > > cache. My surprise was that in a pristine new mempool, 4 cores could
> > > > not each do one initial 512 buffer bulk get. But I also see that even
> > > > before the a2833ecc5 patch, the cache would get populated on gets less
> > > > than cache size, in addition to the buffers requested by the user. So
> > > > if cache size is 256, and bulk get is for 128 buffers, it pulls 384
> > > > buffers from the backing pool - 128 for the caller, another 256 to
> > > > prefill the cache. Your patch makes this cache filling consistent
> > > > between less-than-cache-size and greater-than-or-equal-to-cache-size
> > > > cases.
> > >
> > > Yeah... I have tried hard to clean up some of the mess introduced by
> > > the <insert curse words here> patch [1] that increased the effective
> > > cache size by a factor of 1.5.
> > >
> > > [1]
> > > http://git.dpdk.org/dpdk/commit/lib/librte_mempool/rte_mempool.h?id=ea5dd2744b90b330f07fd10f327ab99ef55c7266
> > >
> > > E.g., somewhat related to your use case, I have argued against the
> > > design goal of trying to keep the mempool caches full at all times,
> > > but have not yet been able to convince the community against it.
> > >
> > > Overall, this is one of the absolute core DPDK libraries, so the
> > > support for the changes I would like to see is near zero - even minor
> > > adjustments are considered very high risk, which I must admit has
> > > quite a lot of merit.
> > >
> > > So instead of fixing the library's implementation to make it behave as
> > > reasonably expected according to its documentation, the library's
> > > implementation is considered the reference. And as a consequence, the
> > > library's internals, such as the cache size multiplier, are getting
> > > promoted to be part of the public API, e.g. for applications to
> > > implement mempool sizing calculations like the one below.
> > >
> > > I recall running into problems once myself, when I had sized a mempool
> > > with a specific cache size for a specific number of cores, because
> > > 50 % additional objects got stuck in the caches.
> > > If you ask me, this bug is so fundamental that it should be fixed at
> > > the root, not by trying to document the weird behavior of allocating
> > > 50 % more than specified.
> >
> > I think we should document it: even in the case the 1.5 factor is
> > removed, it is helpful to document that a user needs to reserve more
> > objects when using a cache, to ensure that all cores can allocate their
> > objects.
> >
> > Morten, what would you think about this change below?
>
> A big improvement! Only a few comments, inline below.
>
> >
> > --- a/doc/guides/prog_guide/mempool_lib.rst
> > +++ b/doc/guides/prog_guide/mempool_lib.rst
> > @@ -89,9 +89,13 @@ In this way, each core has full access to its own cache (with locks) of free obj
> >  only when the cache fills does the core need to shuffle some of the free objects back to the pools ring or
> >  obtain more objects when the cache is empty.
> >
> > -While this may mean a number of buffers may sit idle on some core's cache,
> > +While this may mean a number of buffers may sit idle on some core's cache (up to ``1.5 * cache_length``),
>
> Typo: cache_length -> cache_size
>
> Looking at this, I noticed another detail: Chapter 10.4 uses the term
> "buffers", but it should use the term "objects", because a mempool may
> hold other objects than packet buffers. (Using the term "packet buffers"
> in the example in chapter 10.3 is correct.)
>
> >  the speed at which a core can access its own cache for a specific memory
> >  pool without locks provides performance gains.
> >
> > +However, to ensure that all cores can get objects when a cache is used, it is required to allocate
> > +more objects: if N objects are needed, allocate a mempool with ``N + (number_of_active_cores *
> > +cache_size * 1.5)`` objects.
> > +
>
> Perhaps add here that the factor 1.5 stems from the cache being allowed
> to exceed its configured size by 50 %.
>
> >  The cache is composed of a small, per-core table of pointers and its length (used as a stack).
> >  This internal cache can be enabled or disabled at creation of the pool.
> >
>
> If something similar is added to the description of the "cache_size"
> parameter to the "rte_mempool_create()" function, the most obvious places
> to look for documentation are covered.
>
> A lengthy description might be redundant here, so perhaps only mention
> that the actual cache size is 1.5 * cache_size, because the cache is
> allowed to exceed its configured size by 50 %.

Thanks for the quick feedback, I'll submit a patch for this.
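
For illustration, here is a rough sketch of how an application could apply
that sizing rule today. This is not part of the patch: the APP_* constants,
the pool name and the helper function are made-up values, and the snippet
assumes rte_eal_init() has already been called.

#include <stdio.h>

#include <rte_errno.h>
#include <rte_lcore.h>
#include <rte_mempool.h>

/* Illustrative values only - not taken from this thread. */
#define APP_NEEDED_OBJS 2048   /* objects the application itself needs */
#define APP_OBJ_SIZE    2048   /* size of each object in bytes */
#define APP_CACHE_SIZE  256    /* per-lcore cache size */

static struct rte_mempool *
app_create_pool(void)
{
	/*
	 * Sizing rule from the proposed documentation text: reserve
	 * N + (number_of_active_cores * cache_size * 1.5) objects,
	 * because each per-lcore cache may hold up to 1.5 * cache_size
	 * objects that are unavailable to the other cores.
	 */
	unsigned int cores = rte_lcore_count();
	unsigned int pool_size = APP_NEEDED_OBJS +
			cores * (APP_CACHE_SIZE * 3 / 2);

	struct rte_mempool *mp = rte_mempool_create("app_pool", pool_size,
			APP_OBJ_SIZE, APP_CACHE_SIZE, 0,
			NULL, NULL, NULL, NULL,
			rte_socket_id(), 0);
	if (mp == NULL)
		printf("mempool creation failed: %s\n",
				rte_strerror(rte_errno));
	return mp;
}

Whether applications should have to spell out the 1.5 factor like this,
rather than the library accounting for it internally, is exactly the point
of contention earlier in the thread.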
> > > > >
> > > > >
> > > > >> Furthermore, since the effective per-core cache size is 1.5 *
> > > > >> configured cache size, a configured cache size of 256 may leave up
> > > > >> to 384 objects in each per-core cache.
> > > > >>
> > > > >> With 4 cores, you can expect up to 3 * 384 = 1152 objects sitting
> > > > >> in the caches of other cores. If you want to be able to pull 512
> > > > >> objects with each core, the pool size should be 4 * 512 + 1152
> > > > >> objects.
> > > > > Maybe we should document this in the mempool library?
> > > > >
> > > >
> > > > Maybe. But this case I described here is a bit wonky - SPDK should
> > > > never have been specifying a non-zero cache in this case. We only
> > > > noticed this change in behavior because we were creating the mempool
> > > > with a cache when we shouldn’t have.
> > >
> > > So, looking at the positive side, this regression revealed a "bug" in
> > > SPDK. ;-)
> > >
> > > PS: I assume that you are aware that mempools generally perform better
> > > with cache, so I assume that you know what you are doing when you say
> > > that you don't need the cache.
> > >
> > > PPS: Sorry about venting here. If nothing else, I hope it had some
> > > entertainment value. :-)
> > >
> >
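
To make the arithmetic quoted above concrete (cache size 256, 4 cores,
512-object bulk gets), here is a minimal stand-alone sketch that only
reproduces the worst-case numbers from the thread; the 3 / 2 factor stands
for the 1.5 effective-cache-size multiplier discussed above.

#include <stdio.h>

int main(void)
{
	const unsigned int cache_size = 256; /* configured per-core cache */
	const unsigned int cores = 4;
	const unsigned int bulk = 512;       /* objects each core wants */

	/* A per-core cache may grow to 1.5 * cache_size before flushing. */
	unsigned int max_cached_per_core = cache_size * 3 / 2;       /* 384 */
	unsigned int stuck_in_other_caches =
			(cores - 1) * max_cached_per_core;           /* 1152 */
	unsigned int min_pool_size =
			cores * bulk + stuck_in_other_caches;        /* 3200 */

	printf("worst case cached per core:        %u\n", max_cached_per_core);
	printf("unavailable to the last core:      %u\n", stuck_in_other_caches);
	printf("pool size needed for 4 x 512 gets: %u\n", min_pool_size);
	printf("a 2048-object pool is therefore %s\n",
			min_pool_size > 2048 ? "too small" : "large enough");
	return 0;
}

This is the same calculation as in the quoted mail: 4 * 512 + 3 * 384 = 3200
objects, which is why a 2048-object pool cannot be expected to satisfy four
512-object bulk gets once per-core caching is enabled.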