<snip> > > > diff --git a/lib/librte_ring/rte_ring_hts_generic.h > > > b/lib/librte_ring/rte_ring_hts_generic.h > > > new file mode 100644 > > > index 000000000..da08f1d94 > > > --- /dev/null > > > +++ b/lib/librte_ring/rte_ring_hts_generic.h > > > @@ -0,0 +1,198 @@
<snip> > > > + > > > +/** > > > + * @internal This function updates the producer head for enqueue > > > + * > > > + * @param r > > > + * A pointer to the ring structure > > > + * @param is_sp > > > + * Indicates whether multi-producer path is needed or not > > > + * @param n > > > + * The number of elements we will want to enqueue, i.e. how far should > the > > > + * head be moved > > > + * @param behavior > > > + * RTE_RING_QUEUE_FIXED: Enqueue a fixed number of items from a > ring > > > + * RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible > from > > > ring > > > + * @param old_head > > > + * Returns head value as it was before the move, i.e. where enqueue > starts > > > + * @param new_head > > > + * Returns the current/new head value i.e. where enqueue finishes > > Ups, copy/paste thing - will remove. > > > Would be good to return the new_head from this function and use it in > '__rte_ring_hts_update_tail'. > > I think old_head + num should be enough here (see above). > > > > > > + * @param free_entries > > > + * Returns the amount of free space in the ring BEFORE head was > moved > > > + * @return > > > + * Actual number of objects enqueued. > > > + * If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only. > > > + */ > > Minor, suggest removing the elaborate comments, it is not required and > difficult to maintain. > > I think we should do the same thing for other files too. > > Sorry, didn't get you here: what exactly do you suggest to remove? Following function is an internal function, we can skip the elaborate comments. I see that you have done this in other places. > > > > > > +static __rte_always_inline unsigned int > > > +__rte_ring_hts_move_prod_head(struct rte_ring *r, unsigned int num, > > > + enum rte_ring_queue_behavior behavior, uint32_t *old_head, > > > + uint32_t *free_entries) > > > +{ > > > + uint32_t n; > > > + union rte_ring_ht_pos np, op; > > > + > > > + const uint32_t capacity = r->capacity; > > > + > > > + do { > > > + /* Reset n to the initial burst count */ > > > + n = num; > > > + > > > + /* wait for tail to be equal to head */ > > > + __rte_ring_hts_head_wait(&r->hts_prod, &op); > > > + > > > + /* add rmb barrier to avoid load/load reorder in weak > > > + * memory model. It is noop on x86 > > > + */ > > > + rte_smp_rmb(); > > > + > > > + /* > > > + * The subtraction is done between two unsigned 32bits > > > value > > > + * (the result is always modulo 32 bits even if we have > > > + * *old_head > cons_tail). So 'free_entries' is always between > > > 0 > > > + * and capacity (which is < size). > > > + */ > > > + *free_entries = capacity + r->cons.tail - op.pos.head; > > > + > > > + /* check that we have enough room in ring */ > > > + if (unlikely(n > *free_entries)) > > > + n = (behavior == RTE_RING_QUEUE_FIXED) ? > > > + 0 : *free_entries; > > > + > > > + if (n == 0) > > > + break; > > > + > > > + np.pos.tail = op.pos.tail; > > > + np.pos.head = op.pos.head + n; > > > + > > > + } while (rte_atomic64_cmpset(&r->hts_prod.ht.raw, > > > + op.raw, np.raw) == 0); > > I think we can use 32b atomic operation here and just update the head. > > I think we have to do proper 64 bit CAS here, otherwise ABA race could arise: > Thread reads head/tail values, then get suspended just before CAS instruction > for a while. > Thread resumes when ring head value is equal to thread's local head value, > but tail differs (some other thread enqueuing into the ring). Good point, ACK > If we'll do CAS just for head - it would succeed, though it shouldn't. > I understand that with 32-bit head/tail values probability of such situation > is > really low, but still. Using 64b values would be good. Both Arm and x86 support 128b CAS, not sure about POWER. > > > > > > + > > > + *old_head = op.pos.head; > > > + return n; > > > +} > > > + <snip>