Hi Olivier,

On 23/5/2016 1:55 PM, Olivier Matz wrote:
> Hi David,
>
> Please find some comments below.
>
> On 05/19/2016 04:48 PM, David Hunt wrote:
>> [...]
>> +++ b/lib/librte_mempool/rte_mempool_stack.c
>> @@ -0,0 +1,145 @@
>> +/*-
>> + *   BSD LICENSE
>> + *
>> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
>> + *   All rights reserved.
>
> Should be 2016?

Yes, fixed.

>> ...
>> +
>> +static void *
>> +common_stack_alloc(struct rte_mempool *mp)
>> +{
>> +        struct rte_mempool_common_stack *s;
>> +        unsigned n = mp->size;
>> +        int size = sizeof(*s) + (n+16)*sizeof(void *);
>> +
>> +        /* Allocate our local memory structure */
>> +        s = rte_zmalloc_socket("common-stack",
>
> "mempool-stack" ?

Done.

>> +                size,
>> +                RTE_CACHE_LINE_SIZE,
>> +                mp->socket_id);
>> +        if (s == NULL) {
>> +                RTE_LOG(ERR, MEMPOOL, "Cannot allocate stack!\n");
>> +                return NULL;
>> +        }
>> +
>> +        rte_spinlock_init(&s->sl);
>> +
>> +        s->size = n;
>> +        mp->pool = s;
>> +        rte_mempool_set_handler(mp, "stack");
>
> rte_mempool_set_handler() is a user function, it should not be
> called here

Removed.

>> +
>> +        return s;
>> +}
>> +
>> +static int common_stack_put(void *p, void * const *obj_table,
>> +                unsigned n)
>> +{
>> +        struct rte_mempool_common_stack *s = p;
>> +        void **cache_objs;
>> +        unsigned index;
>> +
>> +        rte_spinlock_lock(&s->sl);
>> +        cache_objs = &s->objs[s->len];
>> +
>> +        /* Is there sufficient space in the stack? */
>> +        if ((s->len + n) > s->size) {
>> +                rte_spinlock_unlock(&s->sl);
>> +                return -ENOENT;
>> +        }
>
> The usual return value for a failing put() is ENOBUFS (see rte_ring).

Done.
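To be precise about the change: the failure branch of
common_stack_put() in the next revision will read as below (identical
to the hunk above, only the errno changes):

        /* Is there sufficient space in the stack? */
        if ((s->len + n) > s->size) {
                rte_spinlock_unlock(&s->sl);
                return -ENOBUFS; /* was -ENOENT */
        }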
> After reading it, I realize that it's nearly exactly the same code as
> in "app/test: test external mempool handler".
> http://patchwork.dpdk.org/dev/patchwork/patch/12896/
>
> We should drop one of them. If this stack handler is really useful
> for a performance use-case, it could go in librte_mempool. At first
> read, the code looks like a demo example: it uses a simple spinlock
> for concurrent accesses to the common pool. Maybe the mempool cache
> hides this cost, in which case we could also consider removing the
> use of the rte_ring.

While I agree that the code is similar, the handler in the test is a
ring-based handler, whereas this patch adds an array-based handler. I
think the case for leaving it in as a test of the external handler API
introduced by the previous mempool patch is valid, but maybe there is
a case for removing it if we add the stack handler. Maybe a future
patch?

> Do you have some performance numbers? Do you know if it scales with
> the number of cores?

For mempool_perf_autotest, I'm seeing a 30% increase in performance
for the local-cache use case from 1 to 36 cores (results vary between
a 10% and a 45% gain across those tests, averaging a 30% gain over
all of them). However, for the tests with no local cache configured,
enqueue/dequeue throughput drops by about 30%, with the 36-core case
yielding the largest drop, at 40%. So this handler would not be
recommended for no-cache applications. For anyone who wants to
reproduce the cached case, there is a short setup sketch below my
signature.

> If we can identify the conditions where this mempool handler
> outperforms the default handler, it would be valuable to have them
> in the documentation.

Regards,
Dave.
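P.S. The setup for the cached case is roughly the sketch below. It
uses rte_mempool_set_handler() from this patch series together with
the create-empty/populate flow from the mempool rework it is rebased
on; the pool name, element count, and sizes are illustrative, not the
exact perf-test parameters:

        #include <rte_mempool.h>

        static struct rte_mempool *
        create_stack_pool(void)
        {
                struct rte_mempool *mp;

                /* A non-zero per-lcore cache is what hides the cost
                 * of the stack's spinlock; with cache_size == 0,
                 * every enqueue/dequeue takes the lock, which is
                 * where the ~30% drop comes from. */
                mp = rte_mempool_create_empty("stack_pool",
                                8191,   /* number of elements */
                                2048,   /* element size */
                                256,    /* per-lcore cache size */
                                0,      /* private data size */
                                SOCKET_ID_ANY, 0);
                if (mp == NULL)
                        return NULL;

                /* Select the stack handler before populating */
                rte_mempool_set_handler(mp, "stack");

                if (rte_mempool_populate_default(mp) < 0) {
                        rte_mempool_free(mp);
                        return NULL;
                }
                return mp;
        }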