+Jerin, you had feedback regarding the mempool. > From: Morten Brørup > Sent: Friday, 28 January 2022 09.41 > > +Honnappa, you have shown a keen interest in this topic. > > > From: Olivier Matz [mailto:olivier.m...@6wind.com] > > Sent: Monday, 24 January 2022 16.57 > > > > On Mon, Jan 10, 2022 at 09:40:48AM +0000, Bruce Richardson wrote: > > > On Sat, Jan 08, 2022 at 12:00:17PM +0100, Morten Brørup wrote: > > > > > From: Bruce Richardson [mailto:bruce.richard...@intel.com] > > > > > Sent: Friday, 7 January 2022 16.12 > > > > > > > > > > On Tue, Dec 28, 2021 at 03:28:45PM +0100, Morten Brørup wrote: > > > > > > Hi mempool maintainers and DPDK team. > > > > > > > > > > > > Does anyone know the reason or history why > > > > > CACHE_FLUSHTHRESH_MULTIPLIER was chosen to be 1.5? I think it > is > > > > > counterintuitive. > > > > > > > > > > > > The mempool cache flush threshold was introduced in DPDK > > version 1.3; > > > > > it was not in DPDK version 1.2. The copyright notice for > > rte_mempool.c > > > > > says year 2012. > > > > > > > > > > > > > > > > > > Here is my analysis: > > > > > > > > > > > > With the multiplier of 1.5, a mempool cache is allowed to be > > filled > > > > > up to 50 % above than its target size before its excess entries > > are > > > > > flushed to the mempool (thereby reducing the cache length to > the > > target > > > > > size). > > > > > > > > > > > > In the opposite direction, a mempool cache is allowed to be > > drained > > > > > completely, i.e. up to 100 % below its target size. > > > > > > > > > > > > My instinct tells me that it would be more natural to let a > > mempool > > > > > cache go the same amount above and below its target size, i.e. > > using a > > > > > flush multiplier of 2 instead of 1.5. > > > > > > > > > > > > Also, the cache should be allowed to fill up to and including > > the > > > > > flush threshold, so it is flushed when the threshold is > exceeded, > > > > > instead of when it is reached. > > > > > > > > > > > > Here is a simplified example: > > > > > > > > > > > > Imagine a cache target size of 32, corresponding to a typical > > packet > > > > > burst. With a flush threshold of 2 (and len > threshold instead > > of len > > > > > >= threshold), the cache could hold 1 +/-1 packet bursts. With > > the > > > > > current multiplier it can only hold [0 .. 1.5[ packet bursts, > not > > > > > really providing a lot of elasticity. > > > > > > > > > > > Hi Morten, > > > > > > > > > > Interesting to see this being looked at again. The original > idea > > of > > > > > adding > > > > > in some extra room above the requested value was to avoid the > > worst- > > > > > case > > > > > scenario of a pool oscillating between full and empty > repeatedly > > due to > > > > > the > > > > > addition/removal of perhaps a single packet. As for why 1.5 was > > chosen > > > > > as > > > > > the value, I don't recall any particular reason for it myself. > > The main > > > > > objective was to have a separate flush and size values so that > we > > could > > > > > go > > > > > a bit above full, and when flushing, not emptying the entire > > cache down > > > > > to > > > > > zero. > > > > > > > > Thanks for providing the historical background for this feature, > > Bruce. > > > > > > > > > > > > > > In terms of the behavioural points you make above, I wonder if > > symmetry > > > > > is > > > > > actually necessary or desirable in this case. After all, the > > ideal case > > > > > is > > > > > probably to keep the mempool neither full nor empty, so that > both > > > > > allocations or frees can be done without having to go to the > > underlying > > > > > shared data structure. To accomodate this, the mempool will > only > > flush > > > > > when > > > > > the number of elements is greater than size * 1.5, and then it > > only > > > > > flushes > > > > > elements down to size, ensuring that allocations can still take > > place. > > > > > On allocation, new buffers are taken when we don't have enough > in > > the > > > > > cache > > > > > to fullfil the request, and then the cache is filled up to > size, > > not to > > > > > the > > > > > flush threshold. > > > > > > > > I agree with the ideal case. > > > > > > > > However, it looks like the addition of the flush threshold also > > changed the "size" parameter to effectively become "desired length" > > instead. This interpretation is also supported by the flush > algorithm, > > which doesn't flush below the "size", but to the "size". So based on > > interpretation, I was wondering why it is not symmetrical around the > > "desired length", but allowed to go 100 % below and only 50 % above. > > > > > > > > > > > > > > Now, for the scenario you describe - where the mempool cache > size > > is > > > > > set to > > > > > be the same as the burst size, this scheme probably does break > > down, in > > > > > that we don't really have any burst elasticity. However, I > would > > query > > > > > if > > > > > that is a configuration that is used, since to the user it > should > > > > > appear > > > > > correctly to provide no elasticity. Looking at testpmd, and our > > example > > > > > apps, the standard there is a burst size of 32 and mempool > cache > > of > > > > > ~256. > > > > > In OVS code, netdev-dpdk.c seems to initialize the mempool > with > > cache > > > > > size > > > > > of RTE_MEMPOOL_CACHE_MAX_SIZE (through define MP_CACHE_SZ). In > > all > > > > > these > > > > > cases, I think the 1.5 threshold should work just fine for us. > > > > > > > > My example was for demonstration only, and hopefully not being > used > > by any applications. > > > > > > > > The simplified example was intended to demonstrate the > theoretical > > effect of the unbalance in using the 1.5 threshold. It will be > similar > > with a cache size of 256 objects: You will be allowed to go 8 bursts > > (calculation: 256 objects / 32 objects per burst) below the cache > > "size" without retrieving objects from the backing pool, but only 4 > > bursts above the cache "size" before flushing objects to the backing > > pool. > > > > > > > > > That said, > > > > > if you want to bump it up to 2x, I can't say I'd object > strongly > > as it > > > > > should be harmless, I think. > > > > > > > > From an academic viewpoint, 2x seems much better to me than 1.5x. > > And based on your background information from the introduction of the > > flush threshold, there was no academic reason why 1.5x was chosen - > > this is certainly valuable feedback. > > > > > > > > Just to say I agree with Morten's analysis. Having a threshold at 2x > > size would > > make more sense to me. > > > > Without the corner cases (big requests, or common pool empty), I > would > > have suggested this, which looks more symetric than what we have > today: > > > > get() > > fill request with as many objs as possible from cache > > if cache is empty, request "size" more objects from common pool > > fill the rest of the request from cache > > I agree, and this is doable without a performance penalty. > > > > > put() > > fill cache with provided objects up to 2x size > > if cache exceeds threshold, fill common pool with "size" objects > > fill the cache with the remaining objs from request > > I tried implementing this already, but it cannot be done without a > performance cost, one way or the other: > > 1. If we flush the top of the cache (which is a stack), we are flushing > the hot objects, leaving cold objects at the new top of the cache. > > 2. If we flush the bottom of the cache, we need to move the remaining > objects in the cache down. E.g. flushing objects at index [0..256[ in > the cache means that the objects at index [256..len[ in the cache must > be moved down to index[0..len-256[. > > So I came to the conclusion that flushing the cache needs to flush it > completely. I also consider this the best choice for real applications: > > Either, an lcore thread in the application is in a state of balance, > where it uses the mempool cache within its flush/refill boundaries - > which makes the flush method less important. > > Or, an lcore thread in the application is out of balance (either > permanently or temporarily), and mostly gets or puts objects from/to > the mempool. If it mostly puts objects, not flushing all of the objects > will cause more frequent flushing. E.g.: > > Cache size=256, flushthresh=512 (2x, not 1.5x), initial len=256; > application burst len=32. > > If flushing leaves "size" objects in the cache, the cache is flushed > every 8th burst. And with flushthresh=384 (1.5x), flushing occurs every > 4th burst, which is what the mempool does today! > > If the cache is flushed completely, the cache is only flushed at every > 16th burst. > > > And when/if the application thread breaks its pattern of continuously > putting objects, and suddenly starts to get objects instead, it will > either get objects already in the cache, or the get() function will > refill the cache. > > The concept of not flushing the cache completely is based on the > assumption that it is more likely for an application's lcore thread to > get() after flushing than to put() after flushing. I strongly disagree > with this assumption! If an application thread is continuously putting > so much that it overflows the cache, it is much more likely to keep > putting than it is to start getting. This is CPU branch prediction 101. > > > > > But this is a quite sensitive area, and modifying these functions > > should > > be done with care, as it impacts mbuf allocation. > > Certainly! > > The current mempool caching algorithm is defective, and I'm trying to > repair it. Honnappa already provided very valuable feedback regarding > CPUs with small L1 cache, which must be taken carefully into > consideration.
I'm sorry... The L1 cache feedback was from Jerin, not Honnappa. > > For now, there seems to be consensus regarding changing the flush > threshold multiplier from 1.5 to 2. > > > > > > > I will consider providing a patch. Worst case it probably doesn't > > do any harm. And if anyone is really interested, they can look at the > > mempool debug stats to see the difference it makes for their specific > > application, e.g. the put_common_pool_bulk/put_bulk and > > get_common_pool_bulk/get_success_bulk ratios should decrease. > > > > > > Sure, thanks. > > > > > > /Bruce