> From: fengchengwen [mailto:[email protected]]
> Sent: Friday, 29 May 2026 10.54
>
> On 5/26/2026 10:00 PM, Morten Brørup wrote:
> > This patch refactors the mempool cache to eliminate some unexpected
> > behaviour and reduce the mempool cache miss rate.
> >
> > 1.
> > The actual cache size was 1.5 times the cache size specified at
> > run-time mempool creation.
> > This was obviously not expected by application developers.
> >
> > 2.
> > In get operations, the check for when to use the cache as bounce
> > buffer did not respect the run-time configured cache size,
> > but compared to the build time maximum possible cache size
> > (RTE_MEMPOOL_CACHE_MAX_SIZE, default 512).
> > E.g. with a configured cache size of 32 objects, getting 256 objects
> > would first fetch 32 + 256 = 288 objects into the cache,
> > and then move the 256 objects from the cache to the destination
> > memory, instead of fetching the 256 objects directly to the
> > destination memory.
> > This had a performance cost.
> > However, this is unlikely to occur in real applications, so it is not
> > important in itself.
> >
> > 3.
> > When putting objects into a mempool, and the mempool cache did not
> > have free space for so many objects,
> > the cache was flushed completely, and the new objects were then put
> > into the cache.
> > I.e. the cache drain level was zero.
> > This (complete cache flush) meant that a subsequent get operation
> > (with the same number of objects) completely emptied the cache,
> > so another subsequent get operation required replenishing the cache.
> >
> > Similarly,
> > When getting objects from a mempool, and the mempool cache did not
> > hold so many objects,
> > the cache was replenished to cache->size + remaining objects,
> > and then (the remaining part of) the requested objects were fetched
> > via the cache,
> > which left the cache filled (to cache->size) at completion.
> > I.e. the cache refill level was cache->size (plus some, depending on
> > request size).
> >
> > (1) was improved by generally comparing to cache->size instead of
> > cache->flushthresh, when considering the capacity of the cache.
> > The cache->flushthresh field is kept for API/ABI compatibility
> > purposes, and initialized to cache->size instead of cache->size * 1.5.
> >
> > (2) was improved by generally comparing to cache->size / 2 instead of
> > RTE_MEMPOOL_CACHE_MAX_SIZE, when checking the bounce buffer limit.
> >
> > (3) was improved by flushing and replenishing the cache by half its
> > size, so a flush/refill can be followed randomly by get or put
> > requests.
> > This also reduced the number of objects in each flush/refill
> > operation.
> >
> > As a consequence of these changes, the size of the array holding the
> > objects in the cache (cache->objs[]) no longer needs to be
> > 2 * RTE_MEMPOOL_CACHE_MAX_SIZE, and can be reduced to
> > RTE_MEMPOOL_CACHE_MAX_SIZE at an API/ABI breaking release.
> >
> > Performance data:
> > With a real WAN Optimization application, where the number of
> > allocated packets varies (as they are held in e.g. shaper queues),
> > the mempool cache miss rate dropped from ca. 1/20 objects to
> > ca. 1/48 objects.
> > This was deployed in production at an ISP, and using an effective
> > cache size of 384 objects.
>
> Does the application run as a RTC (run-to-complete) mode?

Yes, the application runs in RTC mode.

> How about pipeline model which NIC recv packets and enqueue ring,
> another work thread dequeue packets, process packets and then free
> packets mbuf?
>

If one thread only receives packets (mempool get) and another thread
only transmits/frees packets (mempool put), the cache miss rate of each
thread roughly doubles.
But the number of objects copied to/from the backend per cache miss
roughly halves, to exactly size/2.
And the backend copy operations become CPU cache aligned (assuming all
transactions with the backend go via the mempool cache).

The release notes mention that such pipelined applications should double
their configured mempool cache size.
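
To put rough numbers on the get-only thread, here is a toy model of the old and new refill policies as they are described in the patch text. It is only a sketch and not the DPDK implementation: struct toy_cache and the toy_* functions are hypothetical names made up for this illustration, the model ignores the bounce-buffer path, and it assumes every request is at most size/2 objects.

```c
#include <assert.h>

/* Toy model of one lcore's mempool cache; NOT the DPDK implementation. */
struct toy_cache {
    unsigned int size;   /* configured cache size, in objects */
    unsigned int len;    /* objects currently held in the cache */
    unsigned int misses; /* number of backend transactions */
    unsigned int moved;  /* objects copied from the backend */
};

/* Old get policy as described in the patch text: on a miss, fetch
 * size + remaining objects, which leaves the cache holding 'size' objects
 * once the request has been served. */
static void toy_get_old(struct toy_cache *c, unsigned int n)
{
    if (c->len < n) {
        c->misses++;
        c->moved += c->size + (n - c->len);
        c->len = c->size;
        return;
    }
    c->len -= n;
}

/* New get policy: on a miss, fetch exactly size / 2 objects. */
static void toy_get_new(struct toy_cache *c, unsigned int n)
{
    assert(n <= c->size / 2); /* larger requests are out of scope here */
    if (c->len < n) {
        c->misses++;
        c->moved += c->size / 2;
        c->len += c->size / 2;
    }
    c->len -= n;
}

/* Run 'bursts' get-only requests of 'n' objects each, starting from an
 * empty cache; 'use_new' selects the refill policy. */
static struct toy_cache toy_run(unsigned int size, unsigned int n,
                                unsigned int bursts, int use_new)
{
    struct toy_cache c = { .size = size };

    for (unsigned int i = 0; i < bursts; i++) {
        if (use_new)
            toy_get_new(&c, n);
        else
            toy_get_old(&c, n);
    }
    return c;
}
```

In this model, 72 get-only bursts of 32 objects with size = 256 give 8 misses of 288 objects each under the old policy (toy_run(256, 32, 72, 0)) and 18 misses of 128 objects each under the new policy (toy_run(256, 32, 72, 1)). Doubling the cache size to 512 under the new policy (toy_run(512, 32, 72, 1)) brings it back to 9 misses of 256 objects each.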

