On Thu, Feb 27, 2025 at 10:14:27AM +0100, Morten Brørup wrote:
> > From: Bruce Richardson [mailto:bruce.richard...@intel.com]
> > Sent: Wednesday, 26 February 2025 17.53
> >
> > On Wed, Feb 26, 2025 at 03:59:22PM +0000, Morten Brørup wrote:
> > > The comparisons lcore_id < RTE_MAX_LCORE and lcore_id !=
> > > LCORE_ID_ANY are equivalent, but the latter compiles to fewer
> > > bytes of code space. Similarly for lcore_id >= RTE_MAX_LCORE and
> > > lcore_id == LCORE_ID_ANY.
> > >
> > > The rte_mempool_get_ops() function is also used in the fast path,
> > > so RTE_VERIFY() was replaced by RTE_ASSERT().
> > >
> > > Compilers implicitly consider comparisons of variable == 0 likely,
> > > so unlikely() was added to the check for no mempool cache
> > > (mp->cache_size == 0) in the rte_mempool_default_cache() function.
> > >
> > > The rte_mempool_do_generic_put() function for adding objects to a
> > > mempool was refactored as follows:
> > > - The comparison for the request itself being too big, which is
> > >   considered unlikely, was moved down and out of the code path
> > >   where the cache has sufficient room for the added objects, which
> > >   is considered the most likely code path.
> > > - Added __rte_assume() about the cache length, size and threshold,
> > >   for compiler optimization when "n" is a compile-time constant.
> > > - Added __rte_assume() about "ret" being zero, so other functions
> > >   using the value returned by this function can potentially be
> > >   optimized by the compiler; especially when it merges multiple
> > >   sequential code paths of inlined code depending on the return
> > >   value being either zero or negative.
> > > - The refactored source code (with comments) made the separate
> > >   comment describing the cache flush/add algorithm superfluous, so
> > >   it was removed.
> > >
> > > A few more likely()/unlikely() were added.
> > >
> > > A few comments were improved for readability.
> > >
> > > Some assertions, RTE_ASSERT(), were added. Most importantly to
> > > assert that the return values of the mempool drivers' enqueue and
> > > dequeue operations are API compliant, i.e. 0 (for success) or
> > > negative (for failure), and never positive.
> > >
> > > Signed-off-by: Morten Brørup <m...@smartsharesystems.com>
> > > ---
> > >  lib/mempool/rte_mempool.h | 67 ++++++++++++++++++++++-----------------
> > >  1 file changed, 38 insertions(+), 29 deletions(-)
> > >
> > Is there any measurable performance change with these modifications?
>
> It varies.
> Here are some of the good ones, tested on a VM under VMware:
>
> mempool_autotest cache=512 cores=1
> n_get_bulk=64 n_put_bulk=64 n_keep=128 constant_n=0
> rate_persec=1309408130 -> 1417067889 : +8.2 %
>
> mempool_autotest cache=512 cores=1
> n_get_bulk=64 n_put_bulk=64 n_keep=128 constant_n=1
> rate_persec=1479812844 -> 1573307159 : +6.3 %
>
> mempool_autotest cache=512 cores=1
> n_max_bulk=32 n_keep=128 constant_n=0
> rate_persec=825183959 -> 868013386 : +5.2 %
>
> The last result is from a new type of test, where the size of every
> get/put varies between 1 and n_max_bulk, so the CPU's dynamic branch
> predictor cannot predict the request size.
> I'll probably provide a separate patch for test_mempool_perf.c with
> this new test type when I have finished it.
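For illustration, here is a minimal standalone sketch of the
comparison-equivalence claim in the patch notes above. It is not the
rte_mempool.h code itself; the two constants are redefined locally,
mirroring DPDK's defaults, so the example compiles without the DPDK
headers:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define RTE_MAX_LCORE 128        /* DPDK's default build-time limit */
#define LCORE_ID_ANY  UINT32_MAX /* sentinel meaning "no lcore" */

int
main(void)
{
	uint32_t lcore_id;

	/*
	 * EAL guarantees lcore_id is either a valid id
	 * (< RTE_MAX_LCORE) or exactly LCORE_ID_ANY, so the two
	 * predicates agree on every value that can actually occur.
	 */
	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++)
		assert((lcore_id < RTE_MAX_LCORE) ==
		       (lcore_id != LCORE_ID_ANY));
	lcore_id = LCORE_ID_ANY;
	assert((lcore_id < RTE_MAX_LCORE) == (lcore_id != LCORE_ID_ANY));

	/*
	 * The code-size saving: on x86-64, "cmp eax, -1" encodes with
	 * a sign-extended 8-bit immediate, while "cmp eax, 128" needs
	 * a full 32-bit immediate.
	 */
	printf("both predicates agree on all permitted lcore_id values\n");
	return 0;
}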
Thanks, those results look worthwhile, then.

/Bruce
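For readers following along, a simplified sketch of the put-path
restructuring described above. This is a hand-written stand-in for
rte_mempool_do_generic_put(), not a copy of it: the toy_* names, cache
layout and thresholds are invented for the example, and a portable
__builtin_unreachable() idiom stands in for DPDK's __rte_assume():

#include <stdint.h>
#include <string.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Portable stand-in for __rte_assume(): promise an invariant to the
 * optimizer; undefined behaviour if the promise is broken. */
#define toy_assume(e) do { if (!(e)) __builtin_unreachable(); } while (0)

#define TOY_CACHE_MAX 512 /* stand-in for RTE_MEMPOOL_CACHE_MAX_SIZE */

struct toy_cache {
	uint32_t len;         /* objects currently held in the cache */
	uint32_t flushthresh; /* flush once len would exceed this */
	void *objs[TOY_CACHE_MAX * 2];
};

/* Pretend driver enqueue, honouring the mempool API contract of
 * returning 0 (success) or negative (failure), never positive. */
static int
toy_driver_enqueue(void * const *objs, unsigned int n)
{
	(void)objs;
	(void)n;
	return 0;
}

static inline void
toy_put(struct toy_cache *cache, void * const *objs, unsigned int n)
{
	/* An invariant the compiler cannot prove on its own; with a
	 * compile-time constant n it can then prune dead branches. */
	toy_assume(cache->flushthresh <= TOY_CACHE_MAX * 2);

	/* Most likely path first: the whole request fits in the cache. */
	if (likely(cache->len + n <= cache->flushthresh)) {
		memcpy(&cache->objs[cache->len], objs, n * sizeof(void *));
		cache->len += n;
		return;
	}

	/* The "request itself too big" check sits outside the fast
	 * path, as the patch notes describe. */
	if (unlikely(n > cache->flushthresh)) {
		toy_driver_enqueue(objs, n); /* bypass the cache */
		return;
	}

	/* Otherwise flush the cache to the driver, then cache the
	 * new objects. */
	toy_driver_enqueue(cache->objs, cache->len);
	memcpy(cache->objs, objs, n * sizeof(void *));
	cache->len = n;
}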