<snip>
> > > > > > > > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > > > > > > > >
> > > > > > > > > > From: Ruifeng Wang <ruifeng.w...@arm.com>
> > > > > > > > > >
> > > > > > > > > > The peek API allows fetching the next available object in the ring
> > > > > > > > > > without dequeuing it. This helps in scenarios where dequeuing of
> > > > > > > > > > objects depends on their value.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Dharmik Thakkar <dharmik.thak...@arm.com>
> > > > > > > > > > Signed-off-by: Ruifeng Wang <ruifeng.w...@arm.com>
> > > > > > > > > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > > > > > > > > > Reviewed-by: Gavin Hu <gavin...@arm.com>
> > > > > > > > > > ---
> > > > > > > > > >  lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
> > > > > > > > > >  1 file changed, 30 insertions(+)
> > > > > > > > > >
> > > > > > > > > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > > > > > > > > index 2a9f768a1..d3d0d5e18 100644
> > > > > > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > > > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
> > > > > > > > > >  				r->cons.single, available);
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > +/**
> > > > > > > > > > + * Peek one object from a ring.
> > > > > > > > > > + *
> > > > > > > > > > + * The peek API allows fetching the next available object in the ring
> > > > > > > > > > + * without dequeuing it. This API is not multi-thread safe with respect
> > > > > > > > > > + * to other consumer threads.
> > > > > > > > > > + *
> > > > > > > > > > + * @param r
> > > > > > > > > > + *   A pointer to the ring structure.
> > > > > > > > > > + * @param obj_p
> > > > > > > > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > > > > > > > + * @return
> > > > > > > > > > + *   - 0: Success, object available
> > > > > > > > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > > > > > > > + */
> > > > > > > > > > +__rte_experimental
> > > > > > > > > > +static __rte_always_inline int rte_ring_peek(struct rte_ring *r, void **obj_p)
> > > > > > > > >
> > > > > > > > > As it is not MT safe, I think we need _sc_ in the name, to follow the
> > > > > > > > > other rte_ring functions' naming conventions (rte_ring_sc_peek() or so).
> > > > > > > > Agree
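Just to illustrate the intended usage pattern behind the commit message, a rough sketch follows. It assumes the rename to rte_ring_sc_peek() discussed above, and obj_is_ready()/process_obj() are hypothetical application hooks, not anything in this patch:

#include <errno.h>
#include <rte_ring.h>

/* Hypothetical application hooks. */
extern int obj_is_ready(void *obj);
extern void process_obj(void *obj);

/* Single consumer: look at the head of the ring and dequeue it only when
 * the application-specific check says the object can be consumed. */
static int
try_consume_one(struct rte_ring *r)
{
	void *obj;

	if (rte_ring_sc_peek(r, &obj) != 0)
		return -ENOENT;		/* ring is empty */

	if (!obj_is_ready(obj))
		return -EAGAIN;		/* head object not ready, leave it in the ring */

	/* Only one consumer thread, so the head cannot change between the
	 * peek above and this dequeue. */
	rte_ring_sc_dequeue(r, &obj);
	process_obj(obj);

	return 0;
}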
> > > > > > > > >
> > > > > > > > > As a better alternative, what do you think about introducing
> > > > > > > > > serialized versions of the DPDK rte_ring dequeue functions?
> > > > > > > > > Something like that:
> > > > > > > > >
> > > > > > > > > /* same as original ring dequeue, but:
> > > > > > > > >  * 1) move cons.head only if cons.head == cons.tail
> > > > > > > > >  * 2) don't update cons.tail
> > > > > > > > >  */
> > > > > > > > > unsigned int
> > > > > > > > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > > > > > > > > 	unsigned int n, unsigned int *available);
> > > > > > > > >
> > > > > > > > > /* sets both cons.head and cons.tail to cons.head + num */
> > > > > > > > > void rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num);
> > > > > > > > >
> > > > > > > > > /* resets cons.head to cons.tail value */
> > > > > > > > > void rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > > > > > > > >
> > > > > > > > > Then your dq_reclaim cycle function will look like that:
> > > > > > > > >
> > > > > > > > > const uint32_t nb_elt = dq->elt_size/8 + 1;
> > > > > > > > > uint32_t avl, n;
> > > > > > > > > uintptr_t elt[nb_elt];
> > > > > > > > > ...
> > > > > > > > > do {
> > > > > > > > > 	/* read next elem from the queue */
> > > > > > > > > 	n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > > > > > > > > 	if (n == 0)
> > > > > > > > > 		break;
> > > > > > > > >
> > > > > > > > > 	/* wrong period, keep elem in the queue */
> > > > > > > > > 	if (rte_rcu_qsbr_check(dq->v, elt[0]) != 1) {
> > > > > > > > > 		rte_ring_serial_dequeue_abort(dq->r);
> > > > > > > > > 		break;
> > > > > > > > > 	}
> > > > > > > > >
> > > > > > > > > 	/* can reclaim, remove elem from the queue */
> > > > > > > > > 	rte_ring_serial_dequeue_finish(dq->r, nb_elt);
> > > > > > > > >
> > > > > > > > > 	/* call reclaim function */
> > > > > > > > > 	dq->f(dq->p, elt);
> > > > > > > > > } while (avl >= nb_elt);
> > > > > > > > >
> > > > > > > > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > > > > > > > > As long as the actual reclamation callback itself is MT safe, of course.
> > > > > > > >
> > > > > > > > I think it is a great idea. The other writers would still be polling for
> > > > > > > > the current writer to update the tail or update the head. This makes it
> > > > > > > > a blocking solution.
> > > > > > >
> > > > > > > Yep, it is a blocking one.
> > > > > > >
> > > > > > > > We can make the other threads not poll, i.e. they will quit reclaiming
> > > > > > > > if they see that other writers are dequeuing from the queue.
> > > > > > >
> > > > > > > Actually, I didn't think about that possibility, but yes, it should be
> > > > > > > possible to have _try_ semantics too.
> > > > > > >
> > > > > > > > The other way is to use per thread queues.
> > > > > > > >
> > > > > > > > The other requirement I see is to support unbounded-size data structures,
> > > > > > > > wherein the data structures do not have a pre-determined number of
> > > > > > > > entries. Also, currently the defer queue size is equal to the total
> > > > > > > > number of entries in a given data structure. There are plans to support
> > > > > > > > a dynamically resizable defer queue. This means memory allocation, which
> > > > > > > > will affect the lock-free-ness of the solution.
> > > > > > > >
> > > > > > > > So, IMO:
> > > > > > > > 1) The API should provide the capability to support different
> > > > > > > > algorithms - maybe through some flags?
> > > > > > > > 2) The requirements for the ring are pretty unique to the problem we
> > > > > > > > have here (for ex: move the cons-head only if cons-tail is also the
> > > > > > > > same, skip polling). So, we should probably implement a ring within
> > > > > > > > the RCU library?
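On 1), the flags idea, something along these lines is what I have in mind. The names below are illustrative only, not a final API proposal:

/* Hypothetical defer-queue behaviour flags for rte_rcu_qsbr_dq_parameters.
 * Exactly one would be set by the application when the dq is created. */
#define RTE_RCU_QSBR_DQ_FLAG_FIXED_SIZE	0x01	/* queue is never resized;
						 * enqueue fails when it is full */
#define RTE_RCU_QSBR_DQ_FLAG_RESIZABLE	0x02	/* queue grows/shrinks in chunks
						 * of 'size' elements; resizing
						 * allocates memory, so this mode
						 * is not lock free */

Which behaviour actually gets implemented first can then be decided independently of the API.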
> > > > > > >
> > > > > > > Personally, I think such a serialization ring API would be useful for
> > > > > > > other cases too. There are a few cases where the user needs to read the
> > > > > > > contents of the queue without removing elements from it.
> > > > > > > Let's say we do use a similar approach inside TLDK to implement the TCP
> > > > > > > transmit queue. If such an API existed in DPDK, we could just use it
> > > > > > > straight away, without maintaining a separate one.
> > > > > > ok
> > > > > > > >
> > > > > > > > From the timeline perspective, adding all these capabilities would be
> > > > > > > > difficult to get done within the 19.11 timeline. What I have here
> > > > > > > > satisfies my current needs. I suggest that we make provisions in the
> > > > > > > > APIs now to support all these features, but do the implementation in
> > > > > > > > the coming releases. Does this sound ok to you?
> > > > > > >
> > > > > > > Not sure I understand your suggestion here...
> > > > > > > Could you explain it a bit more - how the new API will look and what
> > > > > > > would be left for the future.
> > > > > > For this patch, I suggest we do not add any more complexity. If someone
> > > > > > wants a lock-free/block-free mechanism, it is available by creating per
> > > > > > thread defer queues.
> > > > > >
> > > > > > We push the following to the future:
> > > > > > 1) Dynamically size adjustable defer queue. IMO, with this, the
> > > > > > lock-free/block-free reclamation will not be available (memory allocation
> > > > > > requires locking). The memory for the defer queue will be allocated/freed
> > > > > > in chunks of 'size' elements as the queue grows/shrinks.
> > > > >
> > > > > That one is fine by me.
> > > > > In fact I don't know whether there would be a real use case for a dynamic
> > > > > defer queue for an rcu var...
> > > > > But I suppose that's a subject for another discussion.
> > > > Currently, the defer queue size is equal to the number of resources in the
> > > > data structure. This is unnecessary as the reclamation is done regularly.
> > > > If a smaller queue size is used, the queue might get full (even after
> > > > reclamation), in which case the queue size should be increased.
> > >
> > > I understand the intention.
> > > Though I am not very happy with an approach where, to free one resource,
> > > we first have to allocate another one.
> > > Sounds like a source of deadlocks and, for that case, probably an
> > > unnecessary complication.
> > It depends on the use case. For some use cases lock-free reader-writer
> > concurrency is enough (in which case there is no need to have a queue large
> > enough to hold all the resources), and some would require lock-free
> > reader-writer and writer-writer concurrency (where, theoretically, a queue
> > large enough to hold all the resources would be required).
> >
> > > But again, as it is not for 19.11 we don't have to discuss it now.
> > >
> > > > > > 2) Constant size defer queue with lock-free and block-free reclamation
> > > > > > (single option). The defer queue will be of fixed length 'size'. If the
> > > > > > queue gets full, an error is returned. The user could provide a 'size'
> > > > > > equal to the number of elements in a data structure to ensure the queue
> > > > > > never gets full.
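To make the per thread defer queue suggestion above concrete, the sketch below keeps one queue per lcore. It assumes the dq API shape being proposed in this series (rte_rcu_qsbr_dq_enqueue() on a per-writer rte_rcu_qsbr_dq), and defer_free() is a hypothetical wrapper, not existing code:

#include <rte_lcore.h>
#include <rte_rcu_qsbr.h>

/* One defer queue per writer thread: writers never contend on the queue
 * itself, so no writer blocks on another writer's reclaim cycle. Only the
 * reclamation callback itself still needs to be MT safe. */
static struct rte_rcu_qsbr_dq *dq_per_lcore[RTE_MAX_LCORE];

/* Hypothetical wrapper a writer would call instead of using one shared queue. */
static int
defer_free(void *resource)
{
	return rte_rcu_qsbr_dq_enqueue(dq_per_lcore[rte_lcore_id()], resource);
}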
> > > > >
> > > > > Ok, so for 19.11 what enqueue/dequeue model do you plan to support?
> > > > > - MP/MC
> > > > > - MP/SC
> > > > > - SP/SC
> > > > Just SP/SC
> > >
> > > Ok, just to confirm we are on the same page:
> > > there would be a possibility for one thread to do dq_enqueue() and a second
> > > one to do dq_reclaim() simultaneously (of course, if the actual reclamation
> > > function is thread safe)?
> > Yes, that is allowed. Mutual exclusion is required only around dq_reclaim.
This is not completely correct (as you have pointed out below), as dq_enqueue will
end up calling dq_reclaim.
>
> Ok, and that is probably due to the nature of ring_sc_peek(), right?
> But the user can set the reclaim threshold higher than the number of elems in
> the defer queue, and that should help to prevent dq_reclaim() from inside
> dq_enqueue(), correct?
Yes, that is possible.
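In other words, the enqueue-side logic would look roughly like the sketch below ('reclaim_thd' and dq_sketch are hypothetical names, not part of the proposed API). With reclaim_thd set larger than the ring can ever hold, the reclaim branch inside enqueue is never taken, so the caller can serialize dq_reclaim() on its own:

#include <rte_ring.h>

struct dq_sketch {
	struct rte_ring *r;	/* defer queue storage */
	uint32_t reclaim_thd;	/* hypothetical reclaim trigger threshold */
};

void dq_reclaim_sketch(struct dq_sketch *dq);	/* reclaim cycle, not shown here */

static int
dq_enqueue_sketch(struct dq_sketch *dq, void *e)
{
	/* If reclaim_thd is larger than the number of entries the ring can
	 * hold, this condition is never true and enqueue never reclaims
	 * internally. */
	if (rte_ring_count(dq->r) >= dq->reclaim_thd)
		dq_reclaim_sketch(dq);

	return rte_ring_sp_enqueue(dq->r, e);
}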
> If so, I have no objections in general to the proposed plan.
> Konstantin
>
> > > > > - non MT at all (only the same single thread can do enqueue and dequeue)
> > > > If MT safe is required, one should use 1 defer queue per thread for now.
> > > > >
> > > > > And a related question:
> > > > > What additional rte_ring API do you plan to introduce in that case?
> > > > > - None
> > > > > - rte_ring_sc_peek()
> > > > rte_ring_peek will be changed to rte_ring_sc_peek
> > > > > - rte_ring_serial_dequeue()
> > > > > >
> > > > > > I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and provide
> > > > > > 2 #defines, one for a dynamically variable size defer queue and the
> > > > > > other for a constant size defer queue.
> > > > > >
> > > > > > However, IMO, using a per thread defer queue is a much simpler way to
> > > > > > achieve 2. It does not add any significant burden to the user either.
> > > > > >
> > > > > > > > > >
> > > > > > > > > > +{
> > > > > > > > > > +	uint32_t prod_tail = r->prod.tail;
> > > > > > > > > > +	uint32_t cons_head = r->cons.head;
> > > > > > > > > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > > > > > > > > +	unsigned int n = 1;
> > > > > > > > > > +	if (count) {
> > > > > > > > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > > > > > > > > +		return 0;
> > > > > > > > > > +	}
> > > > > > > > > > +	return -ENOENT;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > >  #ifdef __cplusplus
> > > > > > > > > >  }
> > > > > > > > > >  #endif
> > > > > > > > > > --
> > > > > > > > > > 2.17.1
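To tie the rename back to the defer queue use case, the reclaim step we have in mind looks roughly like this. It is a sketch only: the token-first element layout follows the example earlier in this thread, rte_ring_sc_peek() is the renamed API from this patch, and free_fn stands in for the user's reclamation callback:

#include <stdbool.h>
#include <rte_ring.h>
#include <rte_rcu_qsbr.h>

/* Single reclaiming thread: peek at the token of the oldest element; if its
 * grace period is over, dequeue the whole element (token + payload) and hand
 * it to the reclamation callback, otherwise leave the queue untouched. */
static void
dq_reclaim_one(struct rte_rcu_qsbr *v, struct rte_ring *r, unsigned int nb_elt,
	void (*free_fn)(void *p, void *elt), void *p)
{
	uintptr_t elt[nb_elt];	/* elt[0] = token, rest = payload */
	void *token;

	if (rte_ring_sc_peek(r, &token) != 0)
		return;		/* defer queue is empty */

	if (rte_rcu_qsbr_check(v, (uint64_t)(uintptr_t)token, false) != 1)
		return;		/* grace period not over, keep the element queued */

	/* Safe to reclaim: now actually remove the element and free it. */
	if (rte_ring_sc_dequeue_bulk(r, (void **)elt, nb_elt, NULL) != 0)
		free_fn(p, elt);
}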