On Wed, Oct 25, 2023 at 09:40:54AM +0200, Mattias Rönnblom wrote:
> On 2023-10-24 11:10, Bruce Richardson wrote:
> > On Tue, Oct 24, 2023 at 09:10:30AM +0100, Bruce Richardson wrote:
> > > On Mon, Oct 23, 2023 at 06:10:54PM +0200, Mattias Rönnblom wrote:
> > > > Hi.
> > > >
> > > > Consider an Eventdev app using atomic-type scheduling doing
> > > > something like:
> > > >
> > > > struct rte_event events[3];
> > > >
> > > > rte_event_dequeue_burst(dev_id, port_id, events, 3, 0);
> > > >
> > > > /* Assume three events were dequeued, and the application decides
> > > >  * it's best off processing events 0 and 2 consecutively. */
> > > >
> > > > process(&events[0]);
> > > > process(&events[2]);
> > > >
> > > > events[0].queue_id++;
> > > > events[0].op = RTE_EVENT_OP_FORWARD;
> > > > events[2].queue_id++;
> > > > events[2].op = RTE_EVENT_OP_FORWARD;
> > > >
> > > > rte_event_enqueue_burst(dev_id, port_id, &events[0], 1);
> > > > rte_event_enqueue_burst(dev_id, port_id, &events[2], 1);
> > > >
> > > > process(&events[1]);
> > > > events[1].queue_id++;
> > > > events[1].op = RTE_EVENT_OP_FORWARD;
> > > >
> > > > rte_event_enqueue_burst(dev_id, port_id, &events[1], 1);
> > > >
> > > > If one were just to read the Eventdev API spec, they might expect
> > > > this to work (especially since impl_opaque hints at being
> > > > potentially useful for the purpose of identifying events).
> > > >
> > > > However, on certain event devices, it doesn't (and maybe rightly
> > > > so). If events 0 and 2 belong to the same flow (queue id + flow id
> > > > pair), and event 1 belongs to some other, then this other flow
> > > > would be "unlocked" at the point of the second enqueue operation
> > > > (and thus be processed on some other core, in parallel). The first
> > > > flow would still be needlessly "locked".
> > > >
> > > > Such event devices require the order of the enqueued events to be
> > > > the same as that of the dequeued events, using RTE_EVENT_OP_RELEASE
> > > > type events as "fillers" for dropped events.
> > > >
> > > > Am I missing something in the Eventdev API documentation?
> > > >
> > >
> > > Much more likely is that the documentation is missing something. We
> > > should explicitly clarify this behaviour, as it's required by a
> > > number of drivers.
> > >
> > > > Could an event device use the impl_opaque field to track the
> > > > identity of an event (and thus relax ordering requirements) and
> > > > still be compliant with the API?
> > > >
> > >
> > > Possibly, but the documentation also doesn't report that the
> > > impl_opaque field must be preserved between dequeue and enqueue. When
> > > forwarding a packet it's entirely possible for an app to extract an
> > > mbuf from a dequeued event and create a new event for sending it back
> > > into the eventdev. For
>
> Such a behavior would be in violation of the part of the Eventdev API
> contract that actually is specified. The rte_event struct documentation
> says about impl_opaque that "An implementation may use this field to hold
> implementation specific value to share between dequeue and enqueue
> operation. The application should not modify this field."
>
> I see no other way to read this than that "an implementation" here is
> referring to an event device PMD. The requirement that the application
> can't modify this field only makes sense in the context of "from dequeue
> to enqueue".
>
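[Editor's note: to make the ordering requirement quoted above concrete, here is a small self-contained sketch. It mocks the relevant rte_event fields in a hypothetical `mock_event` struct rather than using DPDK headers, so the names `prepare_enqueue`, `mock_event`, and the `drop` flags are illustrative assumptions, not DPDK API.]

```c
#include <stdint.h>

/* Mock of the rte_event fields relevant here; NOT the real DPDK struct. */
enum mock_op { OP_NEW, OP_FORWARD, OP_RELEASE };

struct mock_event {
	uint8_t queue_id;
	uint32_t flow_id;
	enum mock_op op;
};

/*
 * Build an enqueue burst from a dequeue burst, preserving dequeue order.
 * Events flagged in 'drop' become OP_RELEASE "fillers", so the event
 * device can still unlock the corresponding flows; all other events are
 * forwarded to the next pipeline stage.
 */
static void
prepare_enqueue(const struct mock_event *deq, struct mock_event *enq,
		const int *drop, int n)
{
	for (int i = 0; i < n; i++) {
		enq[i] = deq[i];
		if (drop[i]) {
			enq[i].op = OP_RELEASE;	/* filler for dropped event */
		} else {
			enq[i].op = OP_FORWARD;
			enq[i].queue_id++;	/* next pipeline stage */
		}
	}
}
```

The point of the sketch is only the invariant: slot i of the enqueue burst always corresponds to slot i of the dequeue burst, regardless of which events were dropped.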
Yep, you are completely correct. For some reason, I had this in my head
the other way round, that it was for internal use between the enqueue and
dequeue. My mistake! :-(

> > > example, if the first stage post-RX is doing classify, it's entirely
> > > possible for every single field in the event header to be different
> > > for the event returned compared to dequeue (flow_id recomputed, event
> > > type/source adjusted, target queue_id and priority updated, op type
> > > changed to forward from new, etc. etc.).
> > >
> > > > What happens if a RTE_EVENT_OP_NEW event is inserted into the mix
> > > > of OP_FORWARD and OP_RELEASE type events being enqueued? Again I'm
> > > > not clear on what the API says, if anything.
> > > >
> > > OP_NEW should have no effect on the "history list" of events
> > > previously dequeued. Again, our docs should clarify that explicitly.
> > > Thanks for calling all this out.
> > >
> > Looking at the docs we have, I would propose adding a new subsection
> > "Event Operations", as section 49.1.6 to [1]. There we could explain
> > "New", "Forward" and "Release" events - what they mean for the
> > different queue types and how to use them. That section could also
> > cover the enqueue ordering rules, as the use of event "history" is
> > necessary to explain releases and forwards.
> >
> > Does this seem reasonable? If nobody else has already started on
> > updating the docs for this, I'm happy enough to give it a stab.
> >
>
> Batch dequeue not only provides an opportunity to amortize
> per-interaction overhead with the event device; it also allows the
> application to reshuffle the order in which it decides to process the
> events.
>
> Such reshuffling may have a very significant impact on performance. At a
> minimum, cache locality improves, and in case the app is able to do
> "vector processing" (e.g., something akin to what fd.io VPP does), the
> gains may be further increased.
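[Editor's note: the reshuffling idea above can be illustrated with a toy sketch. The `toy_event` struct and `group_by_flow` helper below are hypothetical stand-ins, not DPDK code: they compute a processing order in which same-flow events are adjacent, while recording the original dequeue positions so the app can still enqueue in dequeue order afterwards.]

```c
#include <stdint.h>

/* Toy stand-in for the (queue_id, flow_id) pair of an rte_event. */
struct toy_event {
	uint8_t queue_id;
	uint32_t flow_id;
};

static int
cmp_flow(const struct toy_event *a, const struct toy_event *b)
{
	if (a->queue_id != b->queue_id)
		return a->queue_id < b->queue_id ? -1 : 1;
	if (a->flow_id != b->flow_id)
		return a->flow_id < b->flow_id ? -1 : 1;
	return 0;
}

/*
 * Fill 'order' with dequeue indices reordered so that same-flow events
 * are adjacent, improving cache locality when per-flow state is touched.
 * A stable insertion sort is fine for the small burst sizes typical of
 * eventdev dequeues.
 */
static void
group_by_flow(const struct toy_event *ev, int *order, int n)
{
	for (int i = 0; i < n; i++)
		order[i] = i;
	for (int i = 1; i < n; i++) {
		int idx = order[i], j = i;
		while (j > 0 && cmp_flow(&ev[order[j - 1]], &ev[idx]) > 0) {
			order[j] = order[j - 1];
			j--;
		}
		order[j] = idx;
	}
}
```

Processing then walks `ev[order[0]], ev[order[1]], ...`, while the enqueue side still uses positions 0..n-1 in the original dequeue order.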
>
> One may argue the app/core should just "do what it's told" by the event
> device. After all, an event device is a work scheduler, and reshuffling
> items of work certainly counts as (micro-)scheduling work.
>
> However, it is too much to hope that a fairly generic function,
> especially one that comes in the form of hardware with a design frozen
> years ago, will be able to arrange the work in whatever order is
> currently optimal for one particular application.
>
> What such an app can do (or must do, if it has efficiency constraints)
> is to buffer the events on the output side, rearranging them in
> accordance with the yet-seemingly-undocumented Eventdev API contract.
> That's certainly possible, and not very difficult, but it seems to me
> that this really is the job of something in the platform (e.g., in
> Eventdev or the event device PMD).
>
> One way out of this could be to add an "implicit release-*only*" mode of
> operation for eventdev.
>
> In such a mode, the RTE_SCHED_TYPE_ATOMIC per-flow "lock" (and its
> ORDERED equivalent, if there is one) would be held until the next
> dequeue. In such a mode, the difference between OP_FORWARD and OP_NEW
> events would just be the back-pressure watermark (new_event_threshold).
>
> That pre-rte_event_enqueue_burst() buffering would prevent the event
> device from releasing "locks" that could otherwise be released, but the
> typical cost of event device interaction is so high that I have my
> doubts about how useful that feature is. If you are worried about
> "locks" held for a long time, one may need to use short bursts anyway
> (since worst-case critical section length is not reduced by such
> RELEASEs).
>
> Another option would be to have the current RTE_EVENT_DEV_CAP_BURST_MODE
> capable PMDs start using the "impl_opaque" field for the purpose of
> matching in and out events. It would require applications to actually
> start adhering to the "don't touch impl_opaque" requirement of the
> Eventdev API.
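[Editor's note: the output-side buffering described above might look like the following self-contained sketch. The `reorder_buf` type, its functions, and the MAX_BURST of 64 are all assumptions for illustration; real code would additionally map the flushable slots back to actual rte_event enqueues, with dropped slots sent as OP_RELEASE fillers.]

```c
#include <stdint.h>

enum buf_op { BUF_PENDING, BUF_FORWARD, BUF_RELEASE };

/* Reorder buffer for one dequeue burst; slot i tracks the i-th
 * dequeued event. MAX_BURST of 64 is an arbitrary sketch-level limit. */
struct reorder_buf {
	enum buf_op state[64];
	int n;			/* burst size */
	int next_flush;		/* first slot not yet handed to enqueue */
};

static void
rob_init(struct reorder_buf *rb, int n)
{
	rb->n = n;
	rb->next_flush = 0;
	for (int i = 0; i < n; i++)
		rb->state[i] = BUF_PENDING;
}

/* Mark event i as processed (forward) or dropped (release). The app may
 * call this in any processing order it likes. */
static void
rob_done(struct reorder_buf *rb, int i, int dropped)
{
	rb->state[i] = dropped ? BUF_RELEASE : BUF_FORWARD;
}

/*
 * Return how many slots, starting at next_flush, are now completed and
 * may be enqueued in dequeue order. Advances next_flush past them.
 */
static int
rob_flushable(struct reorder_buf *rb)
{
	int start = rb->next_flush;

	while (rb->next_flush < rb->n &&
	       rb->state[rb->next_flush] != BUF_PENDING)
		rb->next_flush++;
	return rb->next_flush - start;
}
```

Note how the buffer exhibits exactly the cost Mattias mentions: a completed event sitting behind a still-pending predecessor cannot be flushed, so its flow's "lock" is held longer than strictly necessary.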
>
> Those "fixes" are not mutually exclusive.
>
> A side note: it's unfortunate there are no bits in the rte_event struct
> that can be used for "event id"/"event SN"/"event dequeue idx" type
> information, if an app would like to work around this issue with current
> PMDs.
>

Lots of good points here. We'll take a look and see what we can do in our
drivers, and we welcome any other ideas or suggestions.

/Bruce