> > >
> > > In an eventdev world, multiple workers (with ordered queue) will be
> > > working on IPsec ESP processing. The ESP header's sequence number is
> > > unique and has to be sequentially incremented in an orderly manner.
> > > This rises a need for incrementing sequence number in crypto stage
> > > especially in event crypto adapter. By adding a user callback to
> > > cryptodev at enqueue burst, the user callback will get executed in the
> > > context of event crypto adapter. This helps the application to
> > > increment the ESP sequence number atomically and orderly manner.
> > >
> > > This patch adds APIs to add/remove callback functions. The callback
> > > function will be called for each burst of crypto ops received on a
> > > given crypto device queue pair.
> > >
> > > v1->v2:
> > > Moved callback related members to the end of cryptodev struct Added
> > > support for RCU
> > >
> > > Signed-off-by: Abhinandan Gujjar <abhinandan.guj...@intel.com>
> > > ---
> > > config/common_base | 1 +
> > > lib/librte_cryptodev/Makefile | 2 +-
> > > lib/librte_cryptodev/rte_cryptodev.c | 157
> > +++++++++++++++++++++++++
> > > lib/librte_cryptodev/rte_cryptodev.h | 154
> > +++++++++++++++++++++++-
> > > lib/librte_cryptodev/rte_cryptodev_version.map | 6 +
> > > 5 files changed, 318 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/config/common_base b/config/common_base index
> > > fbf0ee7..f5ebde4 100644
> > > --- a/config/common_base
> > > +++ b/config/common_base
> > > @@ -599,6 +599,7 @@
> > CONFIG_RTE_LIBRTE_PMD_BBDEV_FPGA_5GNR_FEC=y
> > > #
> > > CONFIG_RTE_LIBRTE_CRYPTODEV=y
> > > CONFIG_RTE_CRYPTO_MAX_DEVS=64
> > > +CONFIG_RTE_CRYPTODEV_CALLBACKS=y
> > >
> > > #
> > > # Compile PMD for ARMv8 Crypto device diff --git
> > > a/lib/librte_cryptodev/Makefile b/lib/librte_cryptodev/Makefile index
> > > 73e77a2..514d552 100644
> > > --- a/lib/librte_cryptodev/Makefile
> > > +++ b/lib/librte_cryptodev/Makefile
> > > @@ -10,7 +10,7 @@ LIB = librte_cryptodev.a CFLAGS += -O3 CFLAGS +=
> > > $(WERROR_FLAGS) LDLIBS += -lrte_eal -lrte_mempool -lrte_ring
> > > -lrte_mbuf -LDLIBS += -lrte_kvargs
> > > +LDLIBS += -lrte_kvargs -lrte_rcu
> > >
> > > # library source files
> > > SRCS-y += rte_cryptodev.c rte_cryptodev_pmd.c
> > > cryptodev_trace_points.c diff --git
> > > a/lib/librte_cryptodev/rte_cryptodev.c
> > > b/lib/librte_cryptodev/rte_cryptodev.c
> > > index 1dd795b..2fb3e35 100644
> > > --- a/lib/librte_cryptodev/rte_cryptodev.c
> > > +++ b/lib/librte_cryptodev/rte_cryptodev.c
> > > @@ -38,6 +38,7 @@
> > > #include <rte_string_fns.h>
> > > #include <rte_compat.h>
> > > #include <rte_function_versioning.h>
> > > +#include <rte_rcu_qsbr.h>
> > >
> > > #include "rte_crypto.h"
> > > #include "rte_cryptodev.h"
> > > @@ -499,6 +500,10 @@ struct
> > rte_cryptodev_sym_session_pool_private_data {
> > > return 0;
> > > }
> > >
> > > +#ifdef RTE_CRYPTODEV_CALLBACKS
> > > +/* spinlock for crypto device enq callbacks */ static rte_spinlock_t
> > > +rte_cryptodev_enq_cb_lock = RTE_SPINLOCK_INITIALIZER; #endif
> > >
> > > const char *
> > > rte_cryptodev_get_feature_name(uint64_t flag) @@ -1449,6 +1454,158
> > @@
> > > struct rte_cryptodev *
> > > rte_spinlock_unlock(&rte_cryptodev_cb_lock);
> > > }
> > >
> > > +#ifdef RTE_CRYPTODEV_CALLBACKS
> > > +int
> > > +rte_cryptodev_rcu_qsbr_add(uint8_t dev_id, struct rte_rcu_qsbr *qsbr)
> > > +{
> > > +
> > > + struct rte_cryptodev *dev;
> > > +
> > > + if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) {
> > > + CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id);
> > > + return -EINVAL;
> > > + }
> > > +
> > > + dev = &rte_crypto_devices[dev_id];
> > > + dev->qsbr = qsbr;
> > > + return 0;
> > > +}
> >
> > So if I understand your patch correctly you propose a new working model for
> > crypto-devs:
> > 1. Control-plane has to allocate/setup rcu_qsbr and do
> > rte_cryptodev_rcu_qsbr_add().
> > 2. Data-plane has somehow to obtain pointer to that rcu_qsbr and wrap
> > cryptodev_enqueue()
> > with rcu_qsbr_quiescent() or rcu_qsbr_online()/rcu_qsbr_offline().
> Yes. I think, it is not a new model. It is same as RCU integration with LPM.
> Please refer: https://patches.dpdk.org/cover/73673/
I am talking about new working model for crypto-dev enqueue/dequeue.
As I said above now it becomes data-plane thread responsibility to:
-somehow to obtain pointer to that rcu_qsbr for each cryptodev it is using.
-call rcu sync functions (quiescent/online/offline) on a regular basis.
Note that now data-plane thread would have to do that always -
even if there are now callbacks installed for that cryptodev queue right now.
All that changes behaviour of existing apps and I presume would reduce
adoption of that fature.
I still think all this callback mechanism should be totally opaque to data-plane
threads - user shouldn't change his app code depending on would some
enqueue/dequeue callbacks be installed or not.
>
> >
> > That seems quite a big change and I don't think it is acceptable for most
> > users.
> > From my perspective adding/installing call-backs to the dev has to be opaque
> > to the data-plane code.
> > Also note that different callbacks can be installed by different entities
> > (libs)
> > and might have no idea about each other.
> > That's why I thought it would be better to make all this RCU stuff internal
> > inside cryptodev:
> > hide all this rcu_qsbr allocation/setup inside cryptod somehow to
> > obtain pointer to that rcu_qsbr ev init/queue setup
> > invoke rcu_qsbr_online()/rcu_qsbr_offline() inside cryptodev_enqueue().
> I have already tried exploring above stuffs. There are too many constraints.
> The changes don't fit in, as per RCU design.
Hmm could you be more specific here - what constraints are you referring to?
> Moreover, having rcu api under enqueue_burst() will affect the performance.
It most likely will. Though my expectation it will affect performance only when
some callbacks are installed. My thought here:
callback function by itself will affect cryptdev_enqueue performance anyway,
so adding extra overhead for sync is probably ok here.
Though for situation when no callbacks are installed - perfomance should be
left unaffected (or impact should be as small as possible).
> The changes are more on control plane side, which is one time.
> The data plane changes are minimal.
I still think upper layer data-plane code should stay unaffected (zero
changes).
> >
> > > +
> > > +struct rte_cryptodev_enq_callback *
> > > +rte_cryptodev_add_enq_callback(uint8_t dev_id,
> > > + uint16_t qp_id,
> > > + rte_cryptodev_enq_cb_fn cb_fn,
> > > + void *cb_arg)
> > > +{
> > > + struct rte_cryptodev *dev;
> > > + struct rte_cryptodev_enq_callback *cb, *tail;
> > > +
> > > + if (!cb_fn)
> > > + return NULL;
> > > +
> > > + if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) {
> > > + CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id);
> > > + return NULL;
> > > + }
> > > +
> > > + dev = &rte_crypto_devices[dev_id];
> > > + if (qp_id >= dev->data->nb_queue_pairs) {
> > > + CDEV_LOG_ERR("Invalid queue_pair_id=%d", qp_id);
> > > + return NULL;
> > > + }
> > > +
> > > + cb = rte_zmalloc(NULL, sizeof(*cb), 0);
> > > + if (cb == NULL) {
> > > + CDEV_LOG_ERR("Failed to allocate memory for callback on "
> > > + "dev=%d, queue_pair_id=%d", dev_id, qp_id);
> > > + rte_errno = ENOMEM;
> > > + return NULL;
> > > + }
> > > +
> > > + cb->fn = cb_fn;
> > > + cb->arg = cb_arg;
> > > +
> > > + rte_spinlock_lock(&rte_cryptodev_enq_cb_lock);
> > > + if (dev->enq_cbs == NULL) {
> > > + dev->enq_cbs = rte_zmalloc(NULL, sizeof(cb) *
> > > + dev->data->nb_queue_pairs, 0);
> > > + if (dev->enq_cbs == NULL) {
> > > + CDEV_LOG_ERR("Failed to allocate memory for
> > callbacks");
> > > + rte_errno = ENOMEM;
> > > + rte_free(cb);
> > > + return NULL;
> > > + }
> > > + }
> > > +
> > > + /* Add the callbacks in fifo order. */
> > > + tail = dev->enq_cbs[qp_id];
> > > + if (tail) {
> > > + while (tail->next)
> > > + tail = tail->next;
> > > + tail->next = cb;
> > > + } else
> > > + dev->enq_cbs[qp_id] = cb;
> > > +
> > > + rte_spinlock_unlock(&rte_cryptodev_enq_cb_lock);
> > > +
> > > + return cb;
> > > +}
> > > +
> > > +int
> > > +rte_cryptodev_remove_enq_callback(uint8_t dev_id,
> > > + uint16_t qp_id,
> > > + struct rte_cryptodev_enq_callback *cb) {
> > > + struct rte_cryptodev *dev;
> > > + struct rte_cryptodev_enq_callback **prev_cb, *curr_cb;
> > > + uint16_t qp;
> > > + int free_mem;
> > > + int ret;
> > > +
> > > + free_mem = 1;
> > > +
> > > + if (!cb) {
> > > + CDEV_LOG_ERR("cb is NULL");
> > > + return -EINVAL;
> > > + }
> > > +
> > > + if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) {
> > > + CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id);
> > > + return -EINVAL;
> > > + }
> > > +
> > > + dev = &rte_crypto_devices[dev_id];
> > > + if (qp_id >= dev->data->nb_queue_pairs) {
> > > + CDEV_LOG_ERR("Invalid queue_pair_id=%d", qp_id);
> > > + return -EINVAL;
> > > + }
> > > +
> > > + if (!dev->qsbr) {
> > > + CDEV_LOG_ERR("Rcu qsbr is NULL");
> > > + return -EINVAL;
> > > + }
> > > +
> > > + rte_spinlock_lock(&rte_cryptodev_enq_cb_lock);
> > > + if (dev->enq_cbs == NULL) {
> > > + rte_spinlock_unlock(&rte_cryptodev_enq_cb_lock);
> > > + return -EINVAL;
> > > + }
> > > +
> > > + prev_cb = &dev->enq_cbs[qp_id];
> > > + for (; *prev_cb != NULL; prev_cb = &curr_cb->next) {
> > > + curr_cb = *prev_cb;
> > > + if (curr_cb == cb) {
> > > + /* Remove the user cb from the callback list. */
> > > + *prev_cb = curr_cb->next;
> > > + ret = 0;
> > > + break;
> > > + }
> > > + }
> > > +
> > > + if (!ret) {
> > > + /* Call sync with invalid thread id as this is part of
> > > + * control plane API */
> > > + rte_rcu_qsbr_synchronize(dev->qsbr,
> > > + RTE_QSBR_THRID_INVALID);
> > > + rte_free(cb);
> > > + }
> > > +
> > > + for (qp = 0; qp < dev->data->nb_queue_pairs; qp++)
> > > + if (dev->enq_cbs[qp] != NULL) {
> >
> > Some reference count (number of callbacks) seems like a better approach
> > here.
> Ok.
> >
> > > + free_mem = 0;
> > > + break;
> > > + }
> > > +
> > > + if (free_mem) {
> > > + rte_free(dev->enq_cbs);
> > > + dev->enq_cbs = NULL;
> > > + }
> > > +
> > > + rte_spinlock_unlock(&rte_cryptodev_enq_cb_lock);
> > > +
> > > + return ret;
> > > +}
> > > +#endif
> > >
> > > int
> > > rte_cryptodev_sym_session_init(uint8_t dev_id, diff --git
> > > a/lib/librte_cryptodev/rte_cryptodev.h
> > > b/lib/librte_cryptodev/rte_cryptodev.h
> > > index 7b3ebc2..2c7a47b 100644
> > > --- a/lib/librte_cryptodev/rte_cryptodev.h
> > > +++ b/lib/librte_cryptodev/rte_cryptodev.h
> > > @@ -530,6 +530,32 @@ struct rte_cryptodev_qp_conf { };
> > >
> > > /**
> > > + * Function type used for pre processing crypto ops when enqueue
> > > +burst is
> > > + * called.
> > > + *
> > > + * The callback function is called on enqueue burst immediately
> > > + * before the crypto ops are put onto the hardware queue for processing.
> > > + *
> > > + * @param dev_id The identifier of the device.
> > > + * @param qp_id The index of the queue pair in which
> > > ops are
> > > + * to be enqueued for processing. The value
> > > + * must be in the range [0, nb_queue_pairs
> > > - 1]
> > > + * previously supplied to
> > > + * *rte_cryptodev_configure*.
> > > + * @param ops The address of an array of *nb_ops*
> > > pointers
> > > + * to *rte_crypto_op* structures which
> > > contain
> > > + * the crypto operations to be processed.
> > > + * @param nb_ops The number of operations to process.
> > > + * @param user_param The arbitrary user parameter passed in
> > > by the
> > > + * application when the callback was
> > > originally
> > > + * registered.
> > > + * @return The number of ops to be enqueued to the
> > > + * crypto device.
> > > + */
> > > +typedef uint16_t (*rte_cryptodev_enq_cb_fn)(uint16_t dev_id, uint16_t
> > qp_id,
> > > + struct rte_crypto_op **ops, uint16_t nb_ops, void
> > *user_param);
> > > +
> > > +/**
> > > * Typedef for application callback function to be registered by
> > > application
> > > * software for notification of device events
> > > *
> > > @@ -853,7 +879,6 @@ struct rte_cryptodev_config {
> > > enum rte_cryptodev_event_type event,
> > > rte_cryptodev_cb_fn cb_fn, void *cb_arg);
> > >
> > > -
> > > typedef uint16_t (*dequeue_pkt_burst_t)(void *qp,
> > > struct rte_crypto_op **ops, uint16_t nb_ops);
> > > /**< Dequeue processed packets from queue pair of a device. */ @@
> > > -870,6 +895,17 @@ typedef uint16_t (*enqueue_pkt_burst_t)(void *qp,
> > > /** Structure to keep track of registered callbacks */
> > > TAILQ_HEAD(rte_cryptodev_cb_list, rte_cryptodev_callback);
> > >
> > > +/**
> > > + * @internal
> > > + * Structure used to hold information about the callbacks to be
> > > +called for a
> > > + * queue pair on enqueue.
> > > + */
> > > +struct rte_cryptodev_enq_callback {
> > > + struct rte_cryptodev_enq_callback *next;
> > > + rte_cryptodev_enq_cb_fn fn;
> > > + void *arg;
> > > +};
> > > +
> > > /** The data structure associated with each crypto device. */ struct
> > > rte_cryptodev {
> > > dequeue_pkt_burst_t dequeue_burst;
> > > @@ -898,6 +934,14 @@ struct rte_cryptodev {
> > > __extension__
> > > uint8_t attached : 1;
> > > /**< Flag indicating the device is attached */
> > > +
> > > +#ifdef RTE_CRYPTODEV_CALLBACKS
> > > + struct rte_cryptodev_enq_callback **enq_cbs;
> > > + /**< User application callback for pre enqueue processing */
> > > +
> > > + struct rte_rcu_qsbr *qsbr;
> > > + /** < RCU QSBR variable for rte_cryptodev_enq_callback */
> >
> > Probably better to have both these fields per queue.
> > Space for them can be allocated at dev_configure() or so.
> enq_cbs is allocated during callback add.
> Unlike ethdev, each cryptodev have their own max queue pair. There is no
> macro for that.
> I think, single RCU should be good enough, as it has mechanism to track all
> its reporting threads.
>
> > BTW, wouldn't it make sense to have ability to add callback for dequeue too?
> As mentioned in the commit message, this patch was driven by a requirement.
> If required, callback for the dequeue can be added in a separate patch.
> >
> > > +#endif
> > > } __rte_cache_aligned;
> > >
> > > void *
> > > @@ -1019,6 +1063,18 @@ struct rte_cryptodev_data {
> > > struct rte_crypto_op **ops, uint16_t nb_ops) {
> > > struct rte_cryptodev *dev = &rte_cryptodevs[dev_id];
> > > +#ifdef RTE_CRYPTODEV_CALLBACKS
> > > + if (unlikely(dev->enq_cbs != NULL && dev->enq_cbs[qp_id] !=
> > NULL)) {
> > > + struct rte_cryptodev_enq_callback *cb =
> > > + dev->enq_cbs[qp_id];
> > > +
> > > + do {
> > > + nb_ops = cb->fn(dev_id, qp_id, ops, nb_ops,
> > > + cb->arg);
> > > + cb = cb->next;
> > > + } while (cb != NULL);
> > > + }
> > > +#endif
> > >
> > > rte_cryptodev_trace_enqueue_burst(dev_id, qp_id, (void **)ops,
> > nb_ops);
> > > return (*dev->enqueue_burst)(
> > > @@ -1351,6 +1407,102 @@ struct rte_cryptodev_asym_session *
> > > struct rte_cryptodev_sym_session *sess, union rte_crypto_sym_ofs
> > ofs,
> > > struct rte_crypto_sym_vec *vec);
> > >
> > > +#ifdef RTE_CRYPTODEV_CALLBACKS
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Add a user callback for a given crypto device and queue pair which
> > > +will be
> > > + * called on crypto ops enqueue.
> > > + *
> > > + * This API configures a function to be called for each burst of
> > > +crypto ops
> > > + * received on a given crypto device queue pair. The return value is
> > > +a pointer
> > > + * that can be used later to remove the callback using
> > > + * rte_cryptodev_remove_enq_callback().
> > > + *
> > > + * Multiple functions are called in the order that they are added.
> > > + *
> > > + * @param dev_id The identifier of the device.
> > > + * @param qp_id The index of the queue pair in which
> > > ops are
> > > + * to be enqueued for processing. The value
> > > + * must be in the range [0, nb_queue_pairs
> > > - 1]
> > > + * previously supplied to
> > > + * *rte_cryptodev_configure*.
> > > + * @param cb_fn The callback function
> > > + * @param cb_arg A generic pointer parameter which will
> > > be
> > passed
> > > + * to each invocation of the callback
> > > function on
> > > + * this crypto device and queue pair.
> > > + *
> > > + * @return
> > > + * NULL on error.
> > > + * On success, a pointer value which can later be used to remove the
> > callback.
> > > + */
> > > +
> > > +__rte_experimental
> > > +struct rte_cryptodev_enq_callback *
> > > +rte_cryptodev_add_enq_callback(uint8_t dev_id,
> > > + uint16_t qp_id,
> > > + rte_cryptodev_enq_cb_fn cb_fn,
> > > + void *cb_arg);
> > > +
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Remove a user callback function for given crypto device and queue
> > > pair.
> > > + *
> > > + * This function is used to removed callbacks that were added to a
> > > +crypto
> > > + * device queue pair using rte_cryptodev_add_enq_callback().
> > > + *
> > > + * Note: The callback expects a RCU QSBR to be configured to
> > > +synchronize
> > > + * to free the memory. Application is expected to configure RCU QSBR
> > > +after
> > > + * adding an enqueue callback.
> > > + *
> > > + *
> > > + * @param dev_id The identifier of the device.
> > > + * @param qp_id The index of the queue pair in which
> > > ops are
> > > + * to be enqueued for processing. The value
> > > + * must be in the range [0, nb_queue_pairs
> > > - 1]
> > > + * previously supplied to
> > > + * *rte_cryptodev_configure*.
> > > + * @param cb Pointer to user supplied callback
> > > created via
> > > + * rte_cryptodev_add_enq_callback().
> > > + *
> > > + * @return
> > > + * - 0: Success. Callback was removed.
> > > + * - -EINVAL: The dev_id or the qp_id is out of range, or the callback
> > > + * is NULL or not found for the crypto device queue pair.
> > > + */
> > > +
> > > +__rte_experimental
> > > +int rte_cryptodev_remove_enq_callback(uint8_t dev_id,
> > > + uint16_t qp_id,
> > > + struct rte_cryptodev_enq_callback *cb);
> > > +
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Associate RCU QSBR variable with a cryptodev.
> > > + *
> > > + * This function is used to add RCU QSBR to a crypto device.
> > > + * The purpose of RCU is to help multiple threads to synchronize
> > > + * with each other before initiating adding/removing callback
> > > + * while dataplane threads are running enqueue callbacks.
> > > + *
> > > + * @param dev_id The identifier of the device.
> > > + * @param qsr RCU QSBR configuration
> > > + * @return
> > > + * On success - 0
> > > + * On error - EINVAL.
> > > + */
> > > +
> > > +__rte_experimental
> > > +int rte_cryptodev_rcu_qsbr_add(uint8_t dev_id, struct rte_rcu_qsbr
> > > +*qsbr); #endif
> > > +
> > > #ifdef __cplusplus
> > > }
> > > #endif
> > > diff --git a/lib/librte_cryptodev/rte_cryptodev_version.map
> > > b/lib/librte_cryptodev/rte_cryptodev_version.map
> > > index 02f6dcf..46de3ca 100644
> > > --- a/lib/librte_cryptodev/rte_cryptodev_version.map
> > > +++ b/lib/librte_cryptodev/rte_cryptodev_version.map
> > > @@ -64,6 +64,7 @@ DPDK_20.0 {
> > > rte_cryptodev_sym_capability_get;
> > > };
> > >
> > > +
> > > EXPERIMENTAL {
> > > global:
> > >
> > > @@ -105,4 +106,9 @@ EXPERIMENTAL {
> > >
> > > # added in 20.08
> > > rte_cryptodev_get_qp_status;
> > > +
> > > + # added in 20.11
> > > + rte_cryptodev_add_enq_callback;
> > > + rte_cryptodev_remove_enq_callback;
> > > + rte_cryptodev_rcu_qsbr_add;
> > > };
> > > --
> > > 1.9.1