On Sat, Nov 04, 2023 at 06:29:40PM +0100, Morten Brørup wrote:
> I tried a little experiment, which gave a 25 % improvement in mempool
> perf tests for long bursts (n_get_bulk=32 n_put_bulk=32 n_keep=512
> constant_n=0) on a Xeon E5-2620 v4 based system.
> 
> This is the concept:
> 
> If all accesses to the mempool driver go through the mempool cache,
> we can ensure that these bulk load/stores are always CPU cache aligned,
> by using cache->size when loading/storing to the mempool driver.
> 
> Furthermore, it is rumored that most applications use the default
> mempool cache size, so if the driver tests for that specific value,
> it can use rte_memcpy(dst,src,N) with N known at build time, allowing
> optimal performance for copying the array of objects.
> 
> Unfortunately, I need to change the flush threshold from 1.5 to 2 to
> be able to always use cache->size when loading/storing to the mempool
> driver.
> 
> What do you think?
> 
> PS: If we can't get rid of the mempool cache size threshold factor,
> we really need to expose it through public APIs. A job for another day.
> 
> Signed-off-by: Morten Brørup <m...@smartsharesystems.com>
> ---
Interesting, thanks.

Out of interest, is there any difference in performance you observe if using
regular libc memcpy vs rte_memcpy for the ring copies? Since the copy
amount is constant, a regular memcpy call should be expanded by the
compiler itself, and so should be pretty efficient.
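
To illustrate the question, here is a minimal sketch of a copy helper with a
fast path for one build-time constant size, using plain libc memcpy. The name
copy_objs and the value ASSUMED_CACHE_SIZE are hypothetical, chosen only for
this example, and are not taken from the patch. Whether the compiler actually
expands the constant-size call inline depends on the compiler and flags, so
this would need measuring.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical value for illustration only; not taken from the patch. */
#define ASSUMED_CACHE_SIZE 512

/*
 * Copy n object pointers from src to dst.
 * When n matches the assumed size, the byte count passed to memcpy is a
 * build-time constant, which gives the compiler the chance to expand the
 * copy inline. Any other n takes the generic variable-size path.
 */
static inline void
copy_objs(void **dst, void * const *src, size_t n)
{
	if (n == ASSUMED_CACHE_SIZE)
		memcpy(dst, src, ASSUMED_CACHE_SIZE * sizeof(void *));
	else
		memcpy(dst, src, n * sizeof(void *));
}
```

Swapping memcpy for rte_memcpy in the constant-size branch would give the
other side of the comparison.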

/Bruce
