Hi Dharmik,
> >> Current mempool per core cache implementation stores pointers to mbufs
> >> On 64b architectures, each pointer consumes 8B
> >> This patch replaces it with index-based implementation,
> >> where in each buffer is addressed by (pool base address + index)
> >> It reduces the amount of memory/cache required for per core cache
> >>
> >> L3Fwd performance testing reveals minor improvements in the cache
> >> performance (L1 and L2 misses reduced by 0.60%)
> >> with no change in throughput
> >
> > I feel really sceptical about that patch and the whole idea in general:
> > - From what I read above there is no real performance improvement observed.
> >   (In fact on my IA boxes mempool_perf_autotest reports ~20% slowdown,
> >   see below for more details).
>
> Currently, the optimizations (loop unroll and vectorization) are only
> implemented for ARM64.
> Similar optimizations can be implemented for x86 platforms which should close
> the performance gap
> and in my understanding should give better performance for a bulk size of 32.

Might be, but I still don't see the reason for such an effort.
As you mentioned, there is no performance improvement in 'real' apps (l3fwd,
etc.) on ARM64, even with the vectorized version of the code.

> > - Space utilization difference looks neglectable too.
>
> Sorry, I did not understand this point.

As I understand it, one of the expectations from that patch was to reduce the
memory/cache required, which should improve cache utilization (fewer misses,
etc.).
Though I think such improvements would be negligible and wouldn't cause any
real performance gain.

> > - The change introduces a new build time config option with a major
> >   limitation:
> >   All memzones in a pool have to be within the same 4GB boundary.
> >   To address it properly, extra changes will be required in init(/populate)
> >   part of the code.
>
> I agree to the above mentioned challenges and I am currently working on
> resolving these issues.

I still think that to justify such changes some really noticeable performance
improvement needs to be demonstrated: a double-digit speedup for
l3fwd/ipsec-secgw/...
Otherwise it is just not worth the hassle.
(For reference, a rough sketch of how I read the index-based addressing, and
where the 4GB limit comes from, is at the end of this mail.)

> > All that will complicate mempool code, will make it more error prone
> > and harder to maintain.
> > But, as there is no real gain in return - no point to add such extra
> > complexity at all.
> >
> > Konstantin
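
To make the discussion concrete, here is a minimal sketch of how I read the
index-based per-core cache idea. The names (idx_cache, obj_to_idx, idx_to_obj)
and the raw byte-offset encoding are illustrative assumptions on my side, not
the actual patch code or any existing DPDK API; the real patch may, for
example, scale the index by object size instead of storing a byte offset.
It also shows where the 4GB limitation comes from: a 32-bit offset can only
reach objects within 4GB of the pool base address.

/*
 * Illustrative sketch only -- not the patch code or a DPDK API.
 * The per-core cache stores 32-bit offsets from the pool base address
 * instead of full 64-bit pointers, halving the cache footprint.
 */
#include <stdint.h>

struct idx_cache {
        uintptr_t base;      /* pool base address (e.g. lowest memzone VA) */
        uint32_t  len;       /* number of cached objects */
        uint32_t  objs[512]; /* 32-bit offsets instead of 8-byte pointers */
};

/* On enqueue to the cache: convert an object pointer to a 32-bit offset.
 * This only works if obj lies within 4GB of base, hence the requirement
 * that all memzones of a pool stay within the same 4GB boundary. */
static inline uint32_t
obj_to_idx(const struct idx_cache *c, void *obj)
{
        return (uint32_t)((uintptr_t)obj - c->base);
}

/* On dequeue from the cache: convert a cached offset back to a pointer. */
static inline void *
idx_to_obj(const struct idx_cache *c, uint32_t idx)
{
        return (void *)(c->base + idx);
}

With such a layout each cache entry shrinks from 8B to 4B on 64b
architectures, e.g. a 512-entry per-core cache goes from 4KB to 2KB, which is
the memory/cache saving the cover letter refers to.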