Hi Dmitry,

On Wed, Oct 13, 2021 at 02:01:28PM +0300, Dmitry Kozlyuk wrote:
> Data path performance can benefit if the PMD knows which memory it will
> need to handle in advance, before the first mbuf is sent to the PMD.
> It is impractical, however, to consider all allocated memory for this
> purpose. Most often mbuf memory comes from mempools that can come and
> go. PMD can enumerate existing mempools on device start, but it also
> needs to track creation and destruction of mempools after the forwarding
> starts but before an mbuf from the new mempool is sent to the device.
> 
> Add an API to register callback for mempool life cycle events:
> * rte_mempool_event_callback_register()
> * rte_mempool_event_callback_unregister()
> Currently tracked events are:
> * RTE_MEMPOOL_EVENT_READY (after populating a mempool)
> * RTE_MEMPOOL_EVENT_DESTROY (before freeing a mempool)
> Provide a unit test for the new API.
> The new API is internal, because it is primarily demanded by PMDs that
> may need to deal with any mempools and do not control their creation,
> while an application, on the other hand, knows which mempools it creates
> and doesn't care about internal mempools PMDs might create.
> 
> Signed-off-by: Dmitry Kozlyuk <dkozl...@nvidia.com>
> Acked-by: Matan Azrad <ma...@nvidia.com>
> ---
>  app/test/test_mempool.c   | 209 ++++++++++++++++++++++++++++++++++++++
>  lib/mempool/rte_mempool.c | 137 +++++++++++++++++++++++++
>  lib/mempool/rte_mempool.h |  61 +++++++++++
>  lib/mempool/version.map   |   8 ++
>  4 files changed, 415 insertions(+)

(...)
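For context for readers of the thread, here is a minimal usage sketch of
the proposed API as I understand it (the handler name, messages and the
setup function are mine, not part of the patch):

    #include <stdio.h>
    #include <rte_mempool.h>

    static void
    event_cb(enum rte_mempool_event event, struct rte_mempool *mp,
             void *user_data)
    {
            /* READY is reported after a mempool is fully populated,
             * DESTROY just before it is freed. */
            if (event == RTE_MEMPOOL_EVENT_READY)
                    printf("mempool %s is ready\n", mp->name);
            else if (event == RTE_MEMPOOL_EVENT_DESTROY)
                    printf("mempool %s is being destroyed\n", mp->name);
    }

    static int
    setup_mempool_tracking(void)
    {
            /* Typically done once, e.g. at device start. */
            return rte_mempool_event_callback_register(event_cb, NULL);
    }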

> --- a/lib/mempool/rte_mempool.c
> +++ b/lib/mempool/rte_mempool.c
> @@ -42,6 +42,18 @@ static struct rte_tailq_elem rte_mempool_tailq = {
>  };
>  EAL_REGISTER_TAILQ(rte_mempool_tailq)
>  
> +TAILQ_HEAD(mempool_callback_list, rte_tailq_entry);
> +
> +static struct rte_tailq_elem callback_tailq = {
> +     .name = "RTE_MEMPOOL_CALLBACK",
> +};
> +EAL_REGISTER_TAILQ(callback_tailq)
> +
> +/* Invoke all registered mempool event callbacks. */
> +static void
> +mempool_event_callback_invoke(enum rte_mempool_event event,
> +                           struct rte_mempool *mp);
> +
>  #define CACHE_FLUSHTHRESH_MULTIPLIER 1.5
>  #define CALC_CACHE_FLUSHTHRESH(c)    \
>       ((typeof(c))((c) * CACHE_FLUSHTHRESH_MULTIPLIER))
> @@ -360,6 +372,10 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
>       STAILQ_INSERT_TAIL(&mp->mem_list, memhdr, next);
>       mp->nb_mem_chunks++;
>  
> +     /* Report the mempool as ready only when fully populated. */
> +     if (mp->populated_size >= mp->size)
> +             mempool_event_callback_invoke(RTE_MEMPOOL_EVENT_READY, mp);
> +

One small comment here. I don't think it happens today, but in the
future, something like this could happen:
  - create empty mempool
  - populate mempool
  - use mempool
  - populate mempool with more objects
  - use mempool

I've seen one such usage here: https://www.youtube.com/watch?v=SzQFn9tm4Sw

In that case, a POPULATE event would be required instead of the
READY event.

Enhancing the documentation to better explain when the callback is
invoked seems sufficient to me for the moment.
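To make the scenario concrete, a minimal sketch using the existing
mempool API (names and sizes are illustrative only):

    struct rte_mempool *mp;

    /* Created empty: no event is reported yet. */
    mp = rte_mempool_create_empty("example", 1024, 2048, 0, 0,
                                  SOCKET_ID_ANY, 0);
    rte_mempool_set_ops_byname(mp, "ring_mp_mc", NULL);

    /* Fully populated: with this patch, READY is reported here,
     * since populated_size reaches size. */
    rte_mempool_populate_default(mp);

    /* ... use the mempool ... */

    /* If a future API allowed adding objects beyond the initial
     * size, READY would already have been consumed, hence the
     * need for a POPULATE event. */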

>       rte_mempool_trace_populate_iova(mp, vaddr, iova, len, free_cb, opaque);
>       return i;
>  
> @@ -722,6 +738,7 @@ rte_mempool_free(struct rte_mempool *mp)
>       }
>       rte_mcfg_tailq_write_unlock();
>  
> +     mempool_event_callback_invoke(RTE_MEMPOOL_EVENT_DESTROY, mp);
>       rte_mempool_trace_free(mp);
>       rte_mempool_free_memchunks(mp);
>       rte_mempool_ops_free(mp);
> @@ -1343,3 +1360,123 @@ void rte_mempool_walk(void (*func)(struct rte_mempool *, void *),
>  
>       rte_mcfg_mempool_read_unlock();
>  }
> +
> +struct mempool_callback {
> +     rte_mempool_event_callback *func;
> +     void *user_data;
> +};
> +
> +static void
> +mempool_event_callback_invoke(enum rte_mempool_event event,
> +                           struct rte_mempool *mp)
> +{
> +     struct mempool_callback_list *list;
> +     struct rte_tailq_entry *te;
> +     void *tmp_te;
> +
> +     rte_mcfg_tailq_read_lock();
> +     list = RTE_TAILQ_CAST(callback_tailq.head, mempool_callback_list);
> +     RTE_TAILQ_FOREACH_SAFE(te, list, next, tmp_te) {
> +             struct mempool_callback *cb = te->data;
> +             rte_mcfg_tailq_read_unlock();
> +             cb->func(event, mp, cb->user_data);
> +             rte_mcfg_tailq_read_lock();

I think it is dangerous to unlock the list before invoking the callback.
During that time, another thread can remove the next mempool callback, and
the next iteration will access a freed element, causing undefined
behavior.

Is it a problem to keep the lock held during the callback invocation?

I see that you have a test for this, and that you wrote a comment in the
documentation:

 * rte_mempool_event_callback_register() may be called from within the callback,
 * but the callbacks registered this way will not be invoked for the same event.
 * rte_mempool_event_callback_unregister() may only be safely called
 * to remove the running callback.

But is there a use-case for this?
If not, I'd tend to say that we can document that it is not allowed to
create, free, or list mempools, or to register/unregister callbacks, from
within a callback.
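For instance, a sketch of what I have in mind, keeping the read lock held
for the whole loop (based on the code above):

    static void
    mempool_event_callback_invoke(enum rte_mempool_event event,
                                  struct rte_mempool *mp)
    {
            struct mempool_callback_list *list;
            struct rte_tailq_entry *te;

            rte_mcfg_tailq_read_lock();
            list = RTE_TAILQ_CAST(callback_tailq.head, mempool_callback_list);
            TAILQ_FOREACH(te, list, next) {
                    struct mempool_callback *cb = te->data;

                    /* The lock stays held: the list cannot change under
                     * us, but callbacks must not touch the tailqs. */
                    cb->func(event, mp, cb->user_data);
            }
            rte_mcfg_tailq_read_unlock();
    }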


> +     }
> +     rte_mcfg_tailq_read_unlock();
> +}
> +
> +int
> +rte_mempool_event_callback_register(rte_mempool_event_callback *func,
> +                                 void *user_data)
> +{
> +     struct mempool_callback_list *list;
> +     struct rte_tailq_entry *te = NULL;
> +     struct mempool_callback *cb;
> +     void *tmp_te;
> +     int ret;
> +
> +     if (func == NULL) {
> +             rte_errno = EINVAL;
> +             return -rte_errno;
> +     }
> +
> +     rte_mcfg_mempool_read_lock();
> +     rte_mcfg_tailq_write_lock();
> +
> +     list = RTE_TAILQ_CAST(callback_tailq.head, mempool_callback_list);
> +     RTE_TAILQ_FOREACH_SAFE(te, list, next, tmp_te) {
> +             struct mempool_callback *cb =
> +                                     (struct mempool_callback *)te->data;
> +             if (cb->func == func && cb->user_data == user_data) {
> +                     ret = -EEXIST;
> +                     goto exit;
> +             }
> +     }
> +
> +     te = rte_zmalloc("MEMPOOL_TAILQ_ENTRY", sizeof(*te), 0);
> +     if (te == NULL) {
> +             RTE_LOG(ERR, MEMPOOL,
> +                     "Cannot allocate event callback tailq entry!\n");
> +             ret = -ENOMEM;
> +             goto exit;
> +     }
> +
> +     cb = rte_malloc("MEMPOOL_EVENT_CALLBACK", sizeof(*cb), 0);
> +     if (cb == NULL) {
> +             RTE_LOG(ERR, MEMPOOL,
> +                     "Cannot allocate event callback!\n");
> +             rte_free(te);
> +             ret = -ENOMEM;
> +             goto exit;
> +     }
> +
> +     cb->func = func;
> +     cb->user_data = user_data;
> +     te->data = cb;
> +     TAILQ_INSERT_TAIL(list, te, next);
> +     ret = 0;
> +
> +exit:
> +     rte_mcfg_tailq_write_unlock();
> +     rte_mcfg_mempool_read_unlock();
> +     rte_errno = -ret;
> +     return ret;
> +}
> +
> +int
> +rte_mempool_event_callback_unregister(rte_mempool_event_callback *func,
> +                                   void *user_data)
> +{
> +     struct mempool_callback_list *list;
> +     struct rte_tailq_entry *te = NULL;
> +     struct mempool_callback *cb;
> +     int ret;
> +
> +     if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
> +             rte_errno = EPERM;
> +             return -1;
> +     }

The doc comment of the register function says:

 * Callbacks will be invoked in the process that creates the mempool.

So registration is allowed from a primary or a secondary process. Can't a
secondary process unregister a callback it has registered?

> +
> +     rte_mcfg_mempool_read_lock();
> +     rte_mcfg_tailq_write_lock();

I don't understand why there are 2 locks here.

After looking at the code, I think the locking model is already
incorrect in current mempool code:

   rte_mcfg_tailq_write_lock() is used in create and free to protect the
     access to the mempool tailq

   rte_mcfg_mempool_write_lock() is used in create(), to protect against
     concurrent creation (with the same name, for instance), but I doubt it
     is absolutely needed, because memzone_reserve is already protected.

   rte_mcfg_mempool_read_lock() is used in dump functions, but to me
     it should use rte_mcfg_tailq_read_lock() instead.
     Currently, doing a dump and a free concurrently can cause a crash
     because they are not using the same lock.

In your case, I suggest using only one lock to protect the callback
list. I think it can be rte_mcfg_tailq_*_lock(), e.g. something like
the sketch below.
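A sketch only, based on the code above, assuming the EPERM check is
dropped and with minimal error handling:

    int
    rte_mempool_event_callback_unregister(rte_mempool_event_callback *func,
                                          void *user_data)
    {
            struct mempool_callback_list *list;
            struct rte_tailq_entry *te;
            struct mempool_callback *cb;
            int ret = -ENOENT;

            rte_mcfg_tailq_write_lock();
            list = RTE_TAILQ_CAST(callback_tailq.head, mempool_callback_list);
            TAILQ_FOREACH(te, list, next) {
                    cb = te->data;
                    if (cb->func == func && cb->user_data == user_data) {
                            /* Safe to remove: we hold the write lock
                             * and stop iterating right after. */
                            TAILQ_REMOVE(list, te, next);
                            rte_free(te);
                            rte_free(cb);
                            ret = 0;
                            break;
                    }
            }
            rte_mcfg_tailq_write_unlock();

            rte_errno = -ret;
            return ret;
    }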
