<snip> > Subject: RE: [PATCH v3 3/9] ring: introduce RTS ring mode > > > > Introduce relaxed tail sync (RTS) mode for MT ring synchronization. > > > Aim to reduce stall times in case when ring is used on overcommited > > > cpus (multiple active threads on the same cpu). > > > The main difference from original MP/MC algorithm is that tail value > > > is increased not by every thread that finished enqueue/dequeue, but > > > only by the last one. > > > That allows threads to avoid spinning on ring tail value, leaving > > > actual tail value change to the last thread in the update queue. > > > > > > check-abi.sh reports what I believe is a false-positive about ring > > > cons/prod changes. As a workaround, devtools/libabigail.abignore is > > > updated to suppress *struct ring* related errors. > > This can be removed from the commit message. > > > > > > > > Signed-off-by: Konstantin Ananyev <konstantin.anan...@intel.com> > > > --- > > > devtools/libabigail.abignore | 7 + > > > lib/librte_ring/Makefile | 5 +- > > > lib/librte_ring/meson.build | 5 +- > > > lib/librte_ring/rte_ring.c | 100 +++++++- > > > lib/librte_ring/rte_ring.h | 110 ++++++++- > > > lib/librte_ring/rte_ring_elem.h | 86 ++++++- > > > lib/librte_ring/rte_ring_rts.h | 316 +++++++++++++++++++++++++ > > > lib/librte_ring/rte_ring_rts_elem.h | 205 ++++++++++++++++ > > > lib/librte_ring/rte_ring_rts_generic.h | 210 ++++++++++++++++ > > > 9 files changed, 1015 insertions(+), 29 deletions(-) create mode > > > 100644 lib/librte_ring/rte_ring_rts.h create mode 100644 > > > lib/librte_ring/rte_ring_rts_elem.h > > > create mode 100644 lib/librte_ring/rte_ring_rts_generic.h > > > > > > diff --git a/devtools/libabigail.abignore > > > b/devtools/libabigail.abignore index a59df8f13..cd86d89ca 100644 > > > --- a/devtools/libabigail.abignore > > > +++ b/devtools/libabigail.abignore > > > @@ -11,3 +11,10 @@ > > > type_kind = enum > > > name = rte_crypto_asym_xform_type > > > changed_enumerators = > RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END > > > +; Ignore updates of ring prod/cons > > > +[suppress_type] > > > + type_kind = struct > > > + name = rte_ring > > > +[suppress_type] > > > + type_kind = struct > > > + name = rte_event_ring > > Does this block the reporting of these structures forever? > > Till we'll have a fix in libabigail, then we can remove these lines. > I don't know any better alternative. David, does this block all issues in the future for rte_ring library?
> > > > > > diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile > > > index 917c560ad..8f5c284cc 100644 > > > --- a/lib/librte_ring/Makefile > > > +++ b/lib/librte_ring/Makefile > > > @@ -18,6 +18,9 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c > > > SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \ > > > rte_ring_elem.h \ > > > rte_ring_generic.h \ > > > - rte_ring_c11_mem.h > > > + rte_ring_c11_mem.h \ > > > + rte_ring_rts.h \ > > > + rte_ring_rts_elem.h \ > > > + rte_ring_rts_generic.h > > > > > > include $(RTE_SDK)/mk/rte.lib.mk > > > diff --git a/lib/librte_ring/meson.build > > > b/lib/librte_ring/meson.build index f2f3ccc88..612936afb 100644 > > > --- a/lib/librte_ring/meson.build > > > +++ b/lib/librte_ring/meson.build > > > @@ -5,7 +5,10 @@ sources = files('rte_ring.c') headers = > > > files('rte_ring.h', > > > 'rte_ring_elem.h', > > > 'rte_ring_c11_mem.h', > > > - 'rte_ring_generic.h') > > > + 'rte_ring_generic.h', > > > + 'rte_ring_rts.h', > > > + 'rte_ring_rts_elem.h', > > > + 'rte_ring_rts_generic.h') > > > > > > # rte_ring_create_elem and rte_ring_get_memsize_elem are > > > experimental allow_experimental_apis = true diff --git > > > a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c index > > > fa5733907..222eec0fb 100644 > > > --- a/lib/librte_ring/rte_ring.c > > > +++ b/lib/librte_ring/rte_ring.c > > > @@ -45,6 +45,9 @@ EAL_REGISTER_TAILQ(rte_ring_tailq) > > > /* true if x is a power of 2 */ > > > #define POWEROF2(x) ((((x)-1) & (x)) == 0) > > > > > > +/* by default set head/tail distance as 1/8 of ring capacity */ > > > +#define HTD_MAX_DEF 8 > > > + > > > /* return the size of memory occupied by a ring */ ssize_t > > > rte_ring_get_memsize_elem(unsigned int esize, unsigned int count) @@ > > > - > > > 79,11 +82,84 @@ rte_ring_get_memsize(unsigned int count) > > > return rte_ring_get_memsize_elem(sizeof(void *), count); } > > > > > > +/* > > > + * internal helper function to reset prod/cons head-tail values. > > > + */ > > > +static void > > > +reset_headtail(void *p) > > > +{ > > > + struct rte_ring_headtail *ht; > > > + struct rte_ring_rts_headtail *ht_rts; > > > + > > > + ht = p; > > > + ht_rts = p; > > > + > > > + switch (ht->sync_type) { > > > + case RTE_RING_SYNC_MT: > > > + case RTE_RING_SYNC_ST: > > > + ht->head = 0; > > > + ht->tail = 0; > > > + break; > > > + case RTE_RING_SYNC_MT_RTS: > > > + ht_rts->head.raw = 0; > > > + ht_rts->tail.raw = 0; > > > + break; > > > + default: > > > + /* unknown sync mode */ > > > + RTE_ASSERT(0); > > > + } > > > +} > > > + > > > void > > > rte_ring_reset(struct rte_ring *r) > > > { > > > - r->prod.head = r->cons.head = 0; > > > - r->prod.tail = r->cons.tail = 0; > > > + reset_headtail(&r->prod); > > > + reset_headtail(&r->cons); > > > +} > > > + > > > +/* > > > + * helper function, calculates sync_type values for prod and cons > > > + * based on input flags. Returns zero at success or negative > > > + * errno value otherwise. > > > + */ > > > +static int > > > +get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st, > > > + enum rte_ring_sync_type *cons_st) > > > +{ > > > + static const uint32_t prod_st_flags = > > > + (RING_F_SP_ENQ | RING_F_MP_RTS_ENQ); > > > + static const uint32_t cons_st_flags = > > > + (RING_F_SC_DEQ | RING_F_MC_RTS_DEQ); > > > + > > > + switch (flags & prod_st_flags) { > > > + case 0: > > > + *prod_st = RTE_RING_SYNC_MT; > > > + break; > > > + case RING_F_SP_ENQ: > > > + *prod_st = RTE_RING_SYNC_ST; > > > + break; > > > + case RING_F_MP_RTS_ENQ: > > > + *prod_st = RTE_RING_SYNC_MT_RTS; > > > + break; > > > + default: > > > + return -EINVAL; > > > + } > > > + > > > + switch (flags & cons_st_flags) { > > > + case 0: > > > + *cons_st = RTE_RING_SYNC_MT; > > > + break; > > > + case RING_F_SC_DEQ: > > > + *cons_st = RTE_RING_SYNC_ST; > > > + break; > > > + case RING_F_MC_RTS_DEQ: > > > + *cons_st = RTE_RING_SYNC_MT_RTS; > > > + break; > > > + default: > > > + return -EINVAL; > > > + } > > > + > > > + return 0; > > > } > > > > > > int > > > @@ -100,16 +176,20 @@ rte_ring_init(struct rte_ring *r, const char > > > *name, unsigned count, > > > RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) & > > > RTE_CACHE_LINE_MASK) != 0); > > > > > > + RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) != > > > + offsetof(struct rte_ring_rts_headtail, sync_type)); > > > + RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) != > > > + offsetof(struct rte_ring_rts_headtail, tail.val.pos)); > > > + > > > /* init the ring structure */ > > > memset(r, 0, sizeof(*r)); > > > ret = strlcpy(r->name, name, sizeof(r->name)); > > > if (ret < 0 || ret >= (int)sizeof(r->name)) > > > return -ENAMETOOLONG; > > > r->flags = flags; > > > - r->prod.sync_type = (flags & RING_F_SP_ENQ) ? > > > - RTE_RING_SYNC_ST : RTE_RING_SYNC_MT; > > > - r->cons.sync_type = (flags & RING_F_SC_DEQ) ? > > > - RTE_RING_SYNC_ST : RTE_RING_SYNC_MT; > > > + ret = get_sync_type(flags, &r->prod.sync_type, &r->cons.sync_type); > > > + if (ret != 0) > > > + return ret; > > > > > > if (flags & RING_F_EXACT_SZ) { > > > r->size = rte_align32pow2(count + 1); @@ -126,8 +206,12 > @@ > > > rte_ring_init(struct rte_ring *r, const char *name, unsigned count, > > > r->mask = count - 1; > > > r->capacity = r->mask; > > > } > > > - r->prod.head = r->cons.head = 0; > > > - r->prod.tail = r->cons.tail = 0; > > > + > > > + /* set default values for head-tail distance */ > > > + if (flags & RING_F_MP_RTS_ENQ) > > > + rte_ring_set_prod_htd_max(r, r->capacity / HTD_MAX_DEF); > > > + if (flags & RING_F_MC_RTS_DEQ) > > > + rte_ring_set_cons_htd_max(r, r->capacity / HTD_MAX_DEF); > > > > > > return 0; > > > } > > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h > > > index > > > d4775a063..f6f084d79 100644 > > > --- a/lib/librte_ring/rte_ring.h > > > +++ b/lib/librte_ring/rte_ring.h > > > @@ -48,6 +48,7 @@ extern "C" { > > > #include <rte_branch_prediction.h> > > > #include <rte_memzone.h> > > > #include <rte_pause.h> > > > +#include <rte_debug.h> > > > > > > #define RTE_TAILQ_RING_NAME "RTE_RING" > > > > > > @@ -65,10 +66,13 @@ enum rte_ring_queue_behavior { enum > > > rte_ring_sync_type { > > > RTE_RING_SYNC_MT, /**< multi-thread safe (default mode) */ > > > RTE_RING_SYNC_ST, /**< single thread only */ > > > +#ifdef ALLOW_EXPERIMENTAL_API > > > + RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */ > > > #endif > > > }; > > > > > > /** > > > - * structure to hold a pair of head/tail values and other metadata. > > > + * structures to hold a pair of head/tail values and other metadata. > > > * Depending on sync_type format of that structure might be different, > > > * but offset for *sync_type* and *tail* values should remain the same. > > > */ > > > @@ -84,6 +88,21 @@ struct rte_ring_headtail { > > > }; > > > }; > > > > > > +union rte_ring_ht_poscnt { > > nit, this is specific to RTS, may be change this to rte_ring_rts_ht_poscnt? > > Ok. > > > > > > + uint64_t raw; > > > + struct { > > > + uint32_t cnt; /**< head/tail reference counter */ > > > + uint32_t pos; /**< head/tail position */ > > > + } val; > > > +}; > > > + > > > +struct rte_ring_rts_headtail { > > > + volatile union rte_ring_ht_poscnt tail; > > > + enum rte_ring_sync_type sync_type; /**< sync type of prod/cons */ > > > + uint32_t htd_max; /**< max allowed distance between head/tail */ > > > + volatile union rte_ring_ht_poscnt head; }; > > > + > > > /** > > > * An RTE ring structure. > > > * > > > @@ -111,11 +130,21 @@ struct rte_ring { > > > char pad0 __rte_cache_aligned; /**< empty cache line */ > > > > > > /** Ring producer status. */ > > > - struct rte_ring_headtail prod __rte_cache_aligned; > > > + RTE_STD_C11 > > > + union { > > > + struct rte_ring_headtail prod; > > > + struct rte_ring_rts_headtail rts_prod; > > > + } __rte_cache_aligned; > > > + > > > char pad1 __rte_cache_aligned; /**< empty cache line */ > > > > > > /** Ring consumer status. */ > > > - struct rte_ring_headtail cons __rte_cache_aligned; > > > + RTE_STD_C11 > > > + union { > > > + struct rte_ring_headtail cons; > > > + struct rte_ring_rts_headtail rts_cons; > > > + } __rte_cache_aligned; > > > + > > > char pad2 __rte_cache_aligned; /**< empty cache line */ }; > > > > > > @@ -132,6 +161,9 @@ struct rte_ring { #define RING_F_EXACT_SZ > > > 0x0004 #define RTE_RING_SZ_MASK (0x7fffffffU) /**< Ring size mask > > > */ > > > > > > +#define RING_F_MP_RTS_ENQ 0x0008 /**< The default enqueue is "MP > RTS". > > > +*/ #define RING_F_MC_RTS_DEQ 0x0010 /**< The default dequeue is > "MC > > > +RTS". */ > > > + > > > #define __IS_SP RTE_RING_SYNC_ST > > > #define __IS_MP RTE_RING_SYNC_MT > > > #define __IS_SC RTE_RING_SYNC_ST > > > @@ -461,6 +493,10 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, > > > void * const *obj_table, > > > RTE_RING_SYNC_ST, free_space); > > > } > > > > > > +#ifdef ALLOW_EXPERIMENTAL_API > > > +#include <rte_ring_rts.h> > > > +#endif > > > + > > > /** > > > * Enqueue several objects on a ring. > > > * > > > @@ -484,8 +520,21 @@ static __rte_always_inline unsigned int > > > rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table, > > > unsigned int n, unsigned int *free_space) { > > > - return __rte_ring_do_enqueue(r, obj_table, n, > > > RTE_RING_QUEUE_FIXED, > > > - r->prod.sync_type, free_space); > > > + switch (r->prod.sync_type) { > > > + case RTE_RING_SYNC_MT: > > > + return rte_ring_mp_enqueue_bulk(r, obj_table, n, free_space); > > > + case RTE_RING_SYNC_ST: > > > + return rte_ring_sp_enqueue_bulk(r, obj_table, n, free_space); > > Have you validated if these affect the performance for the existing APIs? > > I run ring_pmd_perf_autotest > (AFAIK, that's the only one of our perf tests that calls > rte_ring_enqueue/dequeue), and didn't see any real difference in perf > numbers. > > > I am also wondering why should we support these new modes in the legacy > APIs? > > Majority of DPDK users still do use legacy API, and I am not sure all of them > will be happy to switch to _elem_ one manually. > Plus I can't see how we can justify that after let say: > rte_ring_init(ring, ..., RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ); returns > with success valid call to rte_ring_enqueue(ring,...) should fail. Agree, I think the only way right now is through documentation. > > > I think users should move to use rte_ring_xxx_elem APIs. If users want to > use RTS/HTS it will be a good time for them to move to new APIs. > > If they use rte_ring_enqueue/dequeue all they have to do - just change flags > in ring_create/ring_init call. > With what you suggest - they have to change every > rte_ring_enqueue/dequeue to rte_ring_elem_enqueue/dequeue. > That's much bigger code churn. But these are just 1 to 1 mapped. I would think, there are not a whole lot of them in the application code, may be ~10 lines? I think the bigger factor for the user here is the algorithm changes in rte_ring library. Bigger effort for the users would be testing rather than code changes in the applications. > > > They anyway have to test their code for RTS/HTS, might as well make the > change to new APIs and test both. > > It will be less code to maintain for the community as well. > > That's true, right now there is a lot of duplication between _elem_ and legacy > code. > Actually the only real diff between them - actual copying of the objects. > But I thought we are going to deal with that, just by changing one day all > legacy API to wrappers around _elem_ calls, i.e something like: > > static __rte_always_inline unsigned int > rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table, > unsigned int n, unsigned int *free_space) { > return rte_ring_enqueue_elem_bulk(r, obj_table, sizeof(uintptr_t), n, > free_space); } > > That way users will switch to new API automatically, without any extra effort > for them, and we will be able to remove legacy code. > Do you have some other thoughts here how to deal with this legacy/elem > conversion? Yes, that is what I was thinking, but had not considered any addition of new APIs. But, I am wondering if we should look at deprecation? If we decide to deprecate, it would be good to avoid making the users of RTS/HTS do the work twice (once to make the switch to RTS/HTS and then another to _elem_ APIs). One thing we can do is to implement the wrappers you mentioned above for RTS/HTS now. I also it is worth considering to switch to these wrappers 20.05 so that come 20.11, we have a code base that has gone through couple of releases' testing. > > > > > > #ifdef > > > +ALLOW_EXPERIMENTAL_API > > > + case RTE_RING_SYNC_MT_RTS: > > > + return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n, > > > + free_space); > > > +#endif > > > + } > > > + > > > + /* valid ring should never reach this point */ > > > + RTE_ASSERT(0); > > > + return 0; > > > } > > > <snip> > > > > > > #ifdef __cplusplus > > > diff --git a/lib/librte_ring/rte_ring_elem.h > > > b/lib/librte_ring/rte_ring_elem.h index 28f9836e6..5de0850dc 100644 > > > --- a/lib/librte_ring/rte_ring_elem.h > > > +++ b/lib/librte_ring/rte_ring_elem.h > > > @@ -542,6 +542,8 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring > > > *r, const void *obj_table, > > > RTE_RING_QUEUE_FIXED, __IS_SP, free_space); } > > > > > > +#include <rte_ring_rts_elem.h> <snip> > > > > > > #ifdef __cplusplus > > > diff --git a/lib/librte_ring/rte_ring_rts.h > > > b/lib/librte_ring/rte_ring_rts.h new file mode 100644 index > > > 000000000..18404fe48 > > > --- /dev/null > > > +++ b/lib/librte_ring/rte_ring_rts.h > > IMO, we should not provide these APIs. > > You mean only _elem_ ones, as discussed above? Yes > > > > > > @@ -0,0 +1,316 @@ > > > +/* SPDX-License-Identifier: BSD-3-Clause > > > + * > > > + * Copyright (c) 2010-2017 Intel Corporation > > nit, the year should change to 2020? Look at others too. > > ack, will do. > > > > > > + * Copyright (c) 2007-2009 Kip Macy km...@freebsd.org > > > + * All rights reserved. > > > + * Derived from FreeBSD's bufring.h > > > + * Used as BSD-3 Licensed with permission from Kip Macy. > > > + */ > > > + > > > +#ifndef _RTE_RING_RTS_H_ > > > +#define _RTE_RING_RTS_H_ > > > + > > > +/** > > > + * @file rte_ring_rts.h > > > + * @b EXPERIMENTAL: this API may change without prior notice > > > + * It is not recommended to include this file directly. > > > + * Please include <rte_ring.h> instead. > > > + * > > > + * Contains functions for Relaxed Tail Sync (RTS) ring mode. > > > + * The main idea remains the same as for our original MP/MC > > > > ^^^ the > > > +synchronization > > > + * mechanism. > > > + * The main difference is that tail value is increased not > > > + * by every thread that finished enqueue/dequeue, > > > + * but only by the last one doing enqueue/dequeue. > > should we say 'current last' or 'last thread at a given instance'? > > > > > + * That allows threads to skip spinning on tail value, > > > + * leaving actual tail value change to last thread in the update queue. > > nit, I understand what you mean by 'update queue' here. IMO, we should > remove it as it might confuse some. > > > > > + * RTS requires 2 64-bit CAS for each enqueue(/dequeue) operation: > > > + * one for head update, second for tail update. > > > + * As a gain it allows thread to avoid spinning/waiting on tail value. > > > + * In comparision original MP/MC algorithm requires one 32-bit CAS > > > + * for head update and waiting/spinning on tail value. > > > + * > > > + * Brief outline: > > > + * - introduce refcnt for both head and tail. > > Suggesting using the same names as used in the structures. > > > > > + * - increment head.refcnt for each head.value update > > > + * - write head:value and head:refcnt atomically (64-bit CAS) > > > + * - move tail.value ahead only when tail.refcnt + 1 == > > > + head.refcnt > > May be add '(indicating that this is the last thread updating the tail)' > > > > > + * - increment tail.refcnt when each enqueue/dequeue op finishes > > May be add 'otherwise' at the beginning. > > > > > + * (no matter is tail:value going to change or not) > > nit ^^ if > > > + * - write tail.value and tail.recnt atomically (64-bit CAS) > > > + * > > > + * To avoid producer/consumer starvation: > > > + * - limit max allowed distance between head and tail value (HTD_MAX). > > > + * I.E. thread is allowed to proceed with changing head.value, > > > + * only when: head.value - tail.value <= HTD_MAX > > > + * HTD_MAX is an optional parameter. > > > + * With HTD_MAX == 0 we'll have fully serialized ring - > > > + * i.e. only one thread at a time will be able to enqueue/dequeue > > > + * to/from the ring. > > > + * With HTD_MAX >= ring.capacity - no limitation. > > > + * By default HTD_MAX == ring.capacity / 8. > > > + */ > > > + > > > +#ifdef __cplusplus > > > +extern "C" { > > > +#endif > > > + > > > +#include <rte_ring_rts_generic.h> > > > + <snip> > > > + > > > +#endif /* _RTE_RING_RTS_H_ */ > > > diff --git a/lib/librte_ring/rte_ring_rts_elem.h > > > b/lib/librte_ring/rte_ring_rts_elem.h > > > new file mode 100644 > > > index 000000000..71a331b23 > > > --- /dev/null > > > +++ b/lib/librte_ring/rte_ring_rts_elem.h > > > @@ -0,0 +1,205 @@ > > > +/* SPDX-License-Identifier: BSD-3-Clause > > > + * > > > + * Copyright (c) 2010-2017 Intel Corporation > > > + * Copyright (c) 2007-2009 Kip Macy km...@freebsd.org > > > + * All rights reserved. > > > + * Derived from FreeBSD's bufring.h > > > + * Used as BSD-3 Licensed with permission from Kip Macy. > > > + */ > > > + > > > +#ifndef _RTE_RING_RTS_ELEM_H_ > > > +#define _RTE_RING_RTS_ELEM_H_ > > > + > > > +/** > > > + * @file rte_ring_rts_elem.h > > > + * @b EXPERIMENTAL: this API may change without prior notice > > > + * > > > + * It is not recommended to include this file directly. > > > + * Please include <rte_ring_elem.h> instead. > > > + * Contains *ring_elem* functions for Relaxed Tail Sync (RTS) ring mode. > > > + * for more details please refer to <rte_ring_rts.h>. > > > + */ > > > + > > > +#ifdef __cplusplus > > > +extern "C" { > > > +#endif > > > + > > > +#include <rte_ring_rts_generic.h> > > > + > > > +/** > > > + * @internal Enqueue several objects on the RTS ring. > > > + * > > > + * @param r > > > + * A pointer to the ring structure. > > > + * @param obj_table > > > + * A pointer to a table of void * pointers (objects). > > > + * @param n > > > + * The number of objects to add in the ring from the obj_table. > > > + * @param behavior > > > + * RTE_RING_QUEUE_FIXED: Enqueue a fixed number of items from a > ring > > > + * RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible > from > > > ring > > > + * @param free_space > > > + * returns the amount of space after the enqueue operation has finished > > > + * @return > > > + * Actual number of objects enqueued. > > > + * If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only. > > > + */ > > > +static __rte_always_inline unsigned int > > > +__rte_ring_do_rts_enqueue_elem(struct rte_ring *r, void * const > > > +*obj_table, > > obj_table should be of type 'const void * obj_table' (looks like copy paste > error). Please check the other APIs below too. > > > > > + uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior, > > 'esize' is not documented in the comments above the function. You can > > copy the header from rte_ring_elem.h file. Please check other APIs as well. > > Ack to both, will fix. > > > > > > + uint32_t *free_space) > > > +{ > > > + uint32_t free, head; > > > + > > > + n = __rte_ring_rts_move_prod_head(r, n, behavior, &head, &free); > > > + > > > + if (n != 0) { > > > + __rte_ring_enqueue_elems(r, head, obj_table, esize, n); > > > + __rte_ring_rts_update_tail(&r->rts_prod); > > > + } > > > + > > > + if (free_space != NULL) > > > + *free_space = free - n; > > > + return n; > > > +} > > > + <snip> > > > + > > > +#endif /* _RTE_RING_RTS_ELEM_H_ */ > > > diff --git a/lib/librte_ring/rte_ring_rts_generic.h > > > b/lib/librte_ring/rte_ring_rts_generic.h > > > new file mode 100644 > > > index 000000000..f88460d47 > > > --- /dev/null > > > +++ b/lib/librte_ring/rte_ring_rts_generic.h > > I do not know the benefit to providing the generic version. Do you know > why this was done in the legacy APIs? > > I think at first we had generic API only, then later C11 was added. > As I remember, C11 one on IA was measured as a bit slower then generic, > so it was decided to keep both. > > > If there is no performance difference between generic and C11 versions, > should we just skip the generic version? > > The oldest compiler in CI are GCC 4.8.5 and Clang 3.4.2 and C11 built-ins > are supported earlier than these compiler versions. > > I feel the code is growing exponentially in rte_ring library and we should > > try > to cut non-value-ass code/APIs aggressively. > > I'll check is there perf difference for RTS and HTS between generic and C11 > versions on IA. > Meanwhile please have a proper look at C11 implementation, I am not that > familiar with C11 atomics yet. ok > If there would be no problems with it and no noticeable diff in performance - > I am ok to have for RTS/HTS modes C11 version only. > > > > > > @@ -0,0 +1,210 @@ > > > +/* SPDX-License-Identifier: BSD-3-Clause > > > + * > > > + * Copyright (c) 2010-2017 Intel Corporation > > > + * Copyright (c) 2007-2009 Kip Macy km...@freebsd.org > > > + * All rights reserved. > > > + * Derived from FreeBSD's bufring.h > > > + * Used as BSD-3 Licensed with permission from Kip Macy. > > > + */ > > > + > > > +#ifndef _RTE_RING_RTS_GENERIC_H_ > > > +#define _RTE_RING_RTS_GENERIC_H_ > > > + > > > +/** > > > + * @file rte_ring_rts_generic.h > > > + * It is not recommended to include this file directly, > > > + * include <rte_ring.h> instead. > > > + * Contains internal helper functions for Relaxed Tail Sync (RTS) ring > mode. > > > + * For more information please refer to <rte_ring_rts.h>. > > > + */ <snip>