On 5/26/2026 10:00 PM, Morten Brørup wrote:
> This patch refactors the mempool cache to eliminate some unexpected
> behaviour and reduce the mempool cache miss rate.
>
> 1.
> The actual cache size was 1.5 times the cache size specified at run-time
> mempool creation.
> This was obviously not expected by application developers.
>
> 2.
> In get operations, the check for when to use the cache as bounce buffer
> did not respect the run-time configured cache size,
> but compared to the build time maximum possible cache size
> (RTE_MEMPOOL_CACHE_MAX_SIZE, default 512).
> E.g. with a configured cache size of 32 objects, getting 256 objects
> would first fetch 32 + 256 = 288 objects into the cache,
> and then move the 256 objects from the cache to the destination memory,
> instead of fetching the 256 objects directly to the destination memory.
> This had a performance cost.
> However, this is unlikely to occur in real applications, so it is not
> important in itself.
>
> 3.
> When putting objects into a mempool, and the mempool cache did not have
> free space for so many objects,
> the cache was flushed completely, and the new objects were then put into
> the cache.
> I.e. the cache drain level was zero.
> This (complete cache flush) meant that a subsequent get operation (with
> the same number of objects) completely emptied the cache,
> so another subsequent get operation required replenishing the cache.
>
> Similarly,
> When getting objects from a mempool, and the mempool cache did not hold so
> many objects,
> the cache was replenished to cache->size + remaining objects,
> and then (the remaining part of) the requested objects were fetched via
> the cache,
> which left the cache filled (to cache->size) at completion.
> I.e. the cache refill level was cache->size (plus some, depending on
> request size).
>
> (1) was improved by generally comparing to cache->size instead of
> cache->flushthresh, when considering the capacity of the cache.
> The cache->flushthresh field is kept for API/ABI compatibility purposes,
> and initialized to cache->size instead of cache->size * 1.5.
>
> (2) was improved by generally comparing to cache->size / 2 instead of
> RTE_MEMPOOL_CACHE_MAX_SIZE, when checking the bounce buffer limit.
>
> (3) was improved by flushing and replenishing the cache by half its size,
> so a flush/refill can be followed randomly by get or put requests.
> This also reduced the number of objects in each flush/refill operation.
>
> As a consequence of these changes, the size of the array holding the
> objects in the cache (cache->objs[]) no longer needs to be
> 2 * RTE_MEMPOOL_CACHE_MAX_SIZE, and can be reduced to
> RTE_MEMPOOL_CACHE_MAX_SIZE at an API/ABI breaking release.
>
> Performance data:
> With a real WAN Optimization application, where the number of allocated
> packets varies (as they are held in e.g. shaper queues), the mempool
> cache miss rate dropped from ca. 1/20 objects to ca. 1/48 objects.
> This was deployed in production at an ISP, and using an effective cache
> size of 384 objects.

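To check my reading of (2) and (3), here is a minimal toy model of the policy as I understand it from the description above. It is a sketch only, not the rte_mempool code: it tracks object counts instead of object pointers, and the names toy_cache, toy_put, toy_get and backend_ops are hypothetical. It assumes a request larger than size / 2 bypasses the cache, and that a flush or refill always moves exactly size / 2 objects.

```c
#include <assert.h>

/* Toy model, NOT the DPDK implementation: counts only, no object pointers. */
struct toy_cache {
	unsigned int size;        /* configured cache size */
	unsigned int len;         /* objects currently held in the cache */
	unsigned int backend_ops; /* accesses to the backing pool (cache misses) */
};

/* Put n objects, flushing half the cache size when there is no room. */
static void toy_put(struct toy_cache *c, unsigned int n)
{
	if (n > c->size / 2) {
		/* Assumed bounce buffer limit: large requests bypass the cache. */
		c->backend_ops++;
		return;
	}
	if (c->len + n > c->size) {
		/* Drain by size / 2 instead of flushing down to zero. */
		c->len -= c->size / 2;
		c->backend_ops++;
	}
	c->len += n;
	assert(c->len <= c->size);
}

/* Get n objects, refilling by half the cache size when too few are held. */
static void toy_get(struct toy_cache *c, unsigned int n)
{
	if (n > c->size / 2) {
		/* Assumed bounce buffer limit: large requests bypass the cache. */
		c->backend_ops++;
		return;
	}
	if (c->len < n) {
		/* Refill by size / 2 instead of filling up to size. */
		c->len += c->size / 2;
		c->backend_ops++;
	}
	c->len -= n;
	assert(c->len <= c->size);
}
```

With size = 32 and a full cache, putting 8 objects drains 16 and leaves 24 in the cache, so the following gets or puts of 8 objects are served without touching the backing pool again. Under the old behaviour described in (3), the same put would have left only the 8 new objects in the cache.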
Does the application run in RTC (run-to-completion) mode? What about a pipeline model, where one thread receives packets from the NIC and enqueues them to a ring, and another worker thread dequeues the packets, processes them, and then frees the packet mbufs?

