On Wed, Oct 25, 2023 at 09:40:54AM +0200, Mattias Rönnblom wrote:
> On 2023-10-24 11:10, Bruce Richardson wrote:
> > On Tue, Oct 24, 2023 at 09:10:30AM +0100, Bruce Richardson wrote:
> > > On Mon, Oct 23, 2023 at 06:10:54PM +0200, Mattias Rönnblom wrote:
> > > > Hi.
> > > > 
> > > > Consider an Eventdev app using atomic-type scheduling doing something 
> > > > like:
> > > > 
> > > >      struct rte_event events[3];
> > > > 
> > > >      rte_event_dequeue_burst(dev_id, port_id, events, 3, 0);
> > > > 
> > > >      /* Assume three events were dequeued, and the application
> > > >       * decides it's best off processing events 0 and 2
> > > >       * consecutively. */
> > > > 
> > > >      process(&events[0]);
> > > >      process(&events[2]);
> > > > 
> > > >      events[0].queue_id++;
> > > >      events[0].op = RTE_EVENT_OP_FORWARD;
> > > >      events[2].queue_id++;
> > > >      events[2].op = RTE_EVENT_OP_FORWARD;
> > > > 
> > > >      rte_event_enqueue_burst(dev_id, port_id, &events[0], 1);
> > > >      rte_event_enqueue_burst(dev_id, port_id, &events[2], 1);
> > > > 
> > > >      process(&events[1]);
> > > >      events[1].queue_id++;
> > > >      events[1].op = RTE_EVENT_OP_FORWARD;
> > > > 
> > > >      rte_event_enqueue_burst(dev_id, port_id, &events[1], 1);
> > > > 
> > > > Judging only by the Eventdev API spec, one might expect this to work
> > > > (especially since impl_opaque is hinted at as potentially being useful
> > > > for the purpose of identifying events).
> > > > 
> > > > However, on certain event devices, it doesn't (and maybe rightly so).
> > > > If events 0 and 2 belong to the same flow (queue id + flow id pair),
> > > > and event 1 belongs to some other flow, then that other flow would be
> > > > "unlocked" at the point of the second enqueue operation (and could
> > > > thus be processed on some other core, in parallel). The first flow
> > > > would still be needlessly "locked".
> > > > 
> > > > Such event devices require the order of the enqueued events to be the
> > > > same as that of the dequeued events, with RTE_EVENT_OP_RELEASE type
> > > > events used as "fillers" for dropped events.
> > > > 
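> > > > If my understanding is correct, a conforming version of the above
> > > > example would have to enqueue the events in their original dequeue
> > > > order, regardless of the order of processing, e.g.:
> > > > 
> > > >      uint16_t i;
> > > > 
> > > >      process(&events[0]);
> > > >      process(&events[2]);
> > > >      process(&events[1]);
> > > > 
> > > >      for (i = 0; i < 3; i++) {
> > > >              events[i].queue_id++;
> > > >              events[i].op = RTE_EVENT_OP_FORWARD;
> > > >      }
> > > > 
> > > >      /* Enqueue in the original dequeue order. */
> > > >      rte_event_enqueue_burst(dev_id, port_id, events, 3);
> > > > 
> > > > Had event 1 been dropped instead of forwarded, its slot would still
> > > > have to be enqueued in place, with the op field set to
> > > > RTE_EVENT_OP_RELEASE.
> > > > 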
> > > > Am I missing something in the Eventdev API documentation?
> > > > 
> > > 
> > > Much more likely is that the documentation is missing something. We should
> > > explicitly clarify this behaviour, as it's required by a number of 
> > > drivers.
> > > 
> > > > Could an event device use the impl_opaque field to track the identity
> > > > of an event (and thus relax the ordering requirements) and still be
> > > > compliant with the API?
> > > > 
> > > 
> > > Possibly, but the documentation also doesn't state that the impl_opaque
> > > field must be preserved between dequeue and enqueue. When forwarding a
> > > packet, it's entirely possible for an app to extract an mbuf from a
> > > dequeued event and create a new event for sending it back into the
> > > eventdev. For
> 
> Such behavior would be in violation of a part of the Eventdev API
> contract that actually is specified. The rte_event struct documentation
> says about impl_opaque that "An implementation may use this field to hold
> implementation specific value to share between dequeue and enqueue
> operation. The application should not modify this field."
> 
> I see no other way to read this than that "an implementation" here is
> referring to an event device PMD. The requirement that the application
> can't modify this field only makes sense in the context of "from dequeue
> to enqueue".
> 

Yep, you are completely correct. For some reason, I had this in my head the
other way round, that it was for internal use between the enqueue and
dequeue. My mistake! :-(

> > > example, if the first stage post-RX is doing classify, it's entirely
> > > possible for every single field in the event header to differ between
> > > the event returned and the one dequeued (flow_id recomputed, event
> > > type/source adjusted, target queue_id and priority updated, op type
> > > changed from new to forward, etc. etc.).
> > > 
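Per the correction above, though, such a rewrite would have to leave
impl_opaque untouched. A rough, untested sketch (the flow/queue values
are made up for illustration):

     /* ev as filled in by rte_event_dequeue_burst() */
     ev.flow_id = new_flow_id;     /* e.g., recomputed by classify stage */
     ev.queue_id = next_queue_id;
     ev.priority = RTE_EVENT_DEV_PRIORITY_NORMAL;
     ev.op = RTE_EVENT_OP_FORWARD;
     /* ev.impl_opaque deliberately left as dequeued */
     rte_event_enqueue_burst(dev_id, port_id, &ev, 1);
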
> > > > What happens if a RTE_EVENT_OP_NEW event is inserted into the mix of
> > > > OP_FORWARD and OP_RELEASE type events being enqueued? Again I'm not 
> > > > clear on
> > > > what the API says, if anything.
> > > > 
> > > OP_NEW should have no effect on the "history-list" of events previously
> > > dequeued. Again, our docs should clarify that explicitly. Thanks for
> > > calling all this out.
> > > 
> > Looking at the docs we have, I would propose adding a new subsection "Event
> > Operations", as section 49.1.6 to [1]. There we could explain "New",
> > "Forward" and "Release" events - what they mean for the different queue
> > types and how to use them. That section could also cover the enqueue
> > ordering rules, as the use of event "history" is necessary to explain
> > releases and forwards.
> > 
> > Does this seem reasonable? If nobody else has already started on updating
> > the docs for this, I'm happy enough to give it a stab.
> > 
> 
> Batch dequeues not only provide an opportunity to amortize the
> per-interaction overhead of the event device, they also allow the
> application to reshuffle the order in which it processes the events.
> 
> Such reshuffling may have a very significant impact on performance. At a
> minimum, cache locality improves, and in case the app is able to do
> "vector processing" (e.g., something akin to what fd.io VPP does), the
> gains may be further increased.
> 
> One may argue the app/core should just "do what it's told" by the event
> device. After all, an event device is a work scheduler, and reshuffling
> items of work certainly counts as (micro-)scheduling work.
> 
> However, it is too much to hope for that a fairly generic function,
> especially one that comes in the form of hardware, with a design frozen
> years ago, would be able to arrange the work in whatever order is
> currently optimal for one particular application.
> 
> What such an app can do (or must do, if it has efficiency constraints) is
> to buffer the events on the output side, rearranging them in accordance
> with the seemingly as-yet-undocumented Eventdev API contract. That's
> certainly possible, and not very difficult, but it seems to me that this
> really is the job of something in the platform (e.g., Eventdev itself or
> the event device PMD).
> 
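> To make this concrete, here is a minimal sketch of what such an
> output-side buffer could look like (struct and function names are made
> up; MAX_BURST is an app-defined constant; error handling omitted):
> 
>      struct out_buf {
>              struct rte_event events[MAX_BURST]; /* as dequeued */
>              uint16_t num;
>      };
> 
>      /* Event at dequeue index 'idx' was processed; forward it. */
>      static void
>      out_buf_forward(struct out_buf *buf, uint16_t idx, uint8_t queue_id)
>      {
>              buf->events[idx].queue_id = queue_id;
>              buf->events[idx].op = RTE_EVENT_OP_FORWARD;
>      }
> 
>      /* Event at dequeue index 'idx' was dropped; release it. */
>      static void
>      out_buf_release(struct out_buf *buf, uint16_t idx)
>      {
>              buf->events[idx].op = RTE_EVENT_OP_RELEASE;
>      }
> 
>      /* Enqueue everything, preserving the original dequeue order. */
>      static void
>      out_buf_flush(struct out_buf *buf, uint8_t dev_id, uint8_t port_id)
>      {
>              uint16_t sent = 0;
> 
>              while (sent < buf->num)
>                      sent += rte_event_enqueue_burst(dev_id, port_id,
>                                                      buf->events + sent,
>                                                      buf->num - sent);
>      }
> 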
> One way out of this could be to add an "implicit release-*only*" mode of
> operation for eventdev.
> 
> In such a mode, the RTE_SCHED_TYPE_ATOMIC per-flow "lock" (and its
> ORDERED equivalent, if there is one) would be held until the next
> dequeue. The only difference between OP_FORWARD and OP_NEW events would
> then be the back-pressure watermark (new_event_threshold).
> 
> That pre-rte_event_enqueue_burst() buffering would prevent the event
> device from releasing "locks" that could otherwise be released, but the
> typical cost of event device interaction is so high that I have my doubts
> about how useful such a feature would be. If you are worried about
> "locks" being held for a long time, you may need to use short bursts
> anyway (since worst-case critical section length is not reduced by such
> RELEASEs).
> 
> Another option would be to have the current RTE_EVENT_DEV_CAP_BURST_MODE
> capable PMDs start using the "impl_opaque" field for the purpose of
> matching enqueued events against previously dequeued ones. It would
> require applications to actually start adhering to the "don't touch
> impl_opaque" requirement of the Eventdev API.
> 
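> A sketch of what I have in mind (purely hypothetical PMD internals; the
> helper functions are made up):
> 
>      /* At dequeue: stamp each event with its history-list slot. */
>      for (i = 0; i < n; i++)
>              ev[i].impl_opaque = hist_slot_of(port, i);
> 
>      /* At enqueue: find the slot by the stamp, not by position. */
>      uint8_t slot = ev->impl_opaque;
>      hist_slot_release(port, slot);
> 
> Note that impl_opaque is only 8 bits wide, so this would cap the
> history list at 256 entries.
> 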
> Those "fixes" are not mutually exclusive.
> 
> A side note: it's unfortunate there are no bits in the rte_event struct that
> can be used for "event id"/"event SN"/"event dequeue idx" type information,
> if an app would like to work around this issue with current PMDs.
> 
Lots of good points here. We'll take a look and see what we can do in our
drivers, and consider any other ideas or suggestions.

/Bruce
