Hi Jerin,

I just ran a couple of tests on this patch on the latest master head on a couple of machines: an older quad-socket E5-4650 and a quad-socket E5-2699 v3.

E5-4650:
I'm seeing a gain of 2% for un-cached tests and a gain of 9% on the cached tests.

E5-2699 v3:
I'm seeing a loss of 0.1% for un-cached tests and a gain of 11% on the cached tests.

This is purely the autotest comparison; I don't have traffic generator results. But based on the above, I don't think there are any performance issues with the patch.

Regards,
Dave.

On 24/5/2016 4:17 PM, Jerin Jacob wrote:
> On Tue, May 24, 2016 at 04:59:47PM +0200, Olivier Matz wrote:
>> Hi Jerin,
>>
>>
>> On 05/24/2016 04:50 PM, Jerin Jacob wrote:
>>> Signed-off-by: Jerin Jacob <jerin.jacob at caviumnetworks.com>
>>> ---
>>>  lib/librte_mempool/rte_mempool.h | 5 ++---
>>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
>>> index ed2c110..ebe399a 100644
>>> --- a/lib/librte_mempool/rte_mempool.h
>>> +++ b/lib/librte_mempool/rte_mempool.h
>>> @@ -74,6 +74,7 @@
>>>  #include <rte_memory.h>
>>>  #include <rte_branch_prediction.h>
>>>  #include <rte_ring.h>
>>> +#include <rte_memcpy.h>
>>>
>>>  #ifdef __cplusplus
>>>  extern "C" {
>>> @@ -917,7 +918,6 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
>>>  		unsigned n, __rte_unused int is_mp)
>>>  {
>>>  	struct rte_mempool_cache *cache;
>>> -	uint32_t index;
>>>  	void **cache_objs;
>>>  	unsigned lcore_id = rte_lcore_id();
>>>  	uint32_t cache_size = mp->cache_size;
>>> @@ -946,8 +946,7 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
>>>  	 */
>>>
>>>  	/* Add elements back into the cache */
>>> -	for (index = 0; index < n; ++index, obj_table++)
>>> -		cache_objs[index] = *obj_table;
>>> +	rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
>>>
>>>  	cache->len += n;
>>>
>>>
>> The commit title should be "mempool" instead of "mbuf".
> I will fix it.
>
>> Are you seeing some performance improvement by using rte_memcpy()?
> Yes, in some cases. In the default case, the loop was already replaced
> with memcpy by the compiler itself (gcc 5.3). But when I tried the external
> mempool manager patch, performance dropped by almost 800 Kpps. Debugging
> further, it turned out that an unrelated change in the external mempool
> manager patch was knocking out the memcpy. An explicit rte_memcpy brought
> back 500 Kpps. The remaining 300 Kpps drop is still unexplained (in my test
> setup, packets are in the local cache, so it must be something to do with a
> change in the text alignment of __mempool_put_bulk, or similar).
>
> Has anyone else observed a performance drop with the external pool manager?
>
> Jerin
>
>> Regards
>> Olivier