On 2020-04-09 15:32, Jerin Jacob wrote:
> On Thu, Apr 9, 2020 at 5:51 PM Mattias Rönnblom
> <mattias.ronnb...@ericsson.com> wrote:
>> On 2020-04-08 21:36, Jerin Jacob wrote:
>>> On Wed, Apr 8, 2020 at 11:27 PM Mattias Rönnblom
>>> <mattias.ronnb...@ericsson.com> wrote:
>>>> Extend Eventdev API to allow for event devices which require various
>>>> forms of internal processing to happen, even when events are not
>>>> enqueued to or dequeued from a port.
>>>>
>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnb...@ericsson.com>
>>>> ---
>>>> lib/librte_eventdev/rte_eventdev.h | 65 ++++++++++++++++++++++++++
>>>> lib/librte_eventdev/rte_eventdev_pmd.h | 14 ++++++
>>>> 2 files changed, 79 insertions(+)
>>>>
>>>> diff --git a/lib/librte_eventdev/rte_eventdev.h
>>>> b/lib/librte_eventdev/rte_eventdev.h
>>>> index 226f352ad..d69150792 100644
>>>> --- a/lib/librte_eventdev/rte_eventdev.h
>>>> +++ b/lib/librte_eventdev/rte_eventdev.h
>>>> @@ -289,6 +289,15 @@ struct rte_event;
>>>> * single queue to each port or map a single queue to many port.
>>>> */
>>>>
>>>> +#define RTE_EVENT_DEV_CAP_REQUIRES_MAINT (1ULL << 9)
>>>> +/**< Event device requires calls to rte_event_maintain() during
>>> This scheme would call for DSW specific API handling in fastpath.
>>
>> Initially this would be so, but buffering events might yield performance
>> benefits for more event devices than DSW.
>>
>>
>> In an application, it's often convenient, but sub-optimal from a
>> performance point of view, to do single-event enqueue operations. The
>> alternative is to use an application-level buffer, and then flush this
>> buffer with rte_event_enqueue_burst(). If you allow the event device to
>> buffer, you get the simplicity of single-event enqueue operations, but
>> without taking any noticeable performance hit.
> IMO, It is better to aggregate the burst by the application, as sending
> event by event to the driver to aggregate has a performance cost due to
> function pointer overhead.

That's a very slight overhead - but for optimal performance, sure. It'll
come at a cost in terms of code complexity. Just look at the adapters.
They do this already. I think some applications are ready to take the
extra 5-10 clock cycles or so it'll cost them to do the function call
(provided the event device has buffering support).

> Another concern is the frequency of calling rte_event_maintain() function by
> the application, as the timing requirements will vary differently by
> the driver to driver and application to application.
> IMO, It is not portable and I believe the application should not be
> aware of those details. If the driver needs specific maintenance
> function for any other reason then better to use DPDK SERVICE core infra.

The only thing the application needs to be aware of is that it needs to
call rte_event_maintain() as often as it would have called dequeue() in
your "typical worker" example. Making sure this call is cheap enough is
up to the driver, and this needs to hold true for all event devices that
need maintenance.

If you plan to use a non-buffering hardware device driver or a soft,
centralized scheduler that doesn't need this, it will also not set the
flag, and thus the application need not care about the
rte_event_maintain() function. DPDK code such as the eventdev adapters
does need to care, but the increase in complexity is slight, and the cost
of calling rte_event_maintain() on a maintenance-free device is very low
(since the then-NULL function pointer is in the eventdev struct, likely
on a cache line already dragged in).

Unfortunately, DPDK doesn't have a per-core delayed-work mechanism.
Flushing event buffers (and other DSW "background work") can't be done
on a service core, since the service core would then operate on
non-MT-safe data structures on the worker thread's event ports.

>>
>>>> + * periods when neither rte_event_dequeue_burst() nor
>>> The typical worker thread will be
>>> while (1) {
>>> rte_event_dequeue_burst();
>>> ..process..
>>> rte_event_enqueue_burst();
>>> }
>>> If so, Why DSW driver can't do the maintenance in driver context in
>>> dequeue() call.
>>>
>> DSW already does maintenance on dequeue, and works well in the above
>> scenario. The typical worker does not need to care about the
>> rte_event_maintain() function, since it dequeues events on a regular basis.
>>
>>
>> What this RFC addresses is the more atypical (but still fairly common)
>> case of a port being neither dequeued from nor enqueued to on a regular
>> basis. The timer and ethernet rx adapters are examples of such.
> If it is an Adapter specific use case problem then maybe, we have
> an option to fix the problem in adapter specific API usage or in that area.
>

It's not adapter specific, I think. There might be producer-only ports,
for example, which don't provide a constant stream of events, but rather
intermittent bursts. A traffic generator is one example of such an
application, and there might be other, less synthetic ones as well.

>>
>>>> + * rte_event_enqueue_burst() are called on a port. This will allow the
>>>> + * event device to perform internal processing, such as flushing
>>>> + * buffered events, return credits to a global pool, or process
>>>> + * signaling related to load balancing.
>>>> + */
>>
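
To make the producer-only case a bit more concrete, here is a rough,
self-contained sketch. It is not DPDK code: every sketch_* name is a
made-up stand-in, the "port" is a toy model of a buffering event device,
and sketch_maintain() only mimics the role that rte_event_maintain() would
play (the signature proposed in the patch is not reproduced here). The
point is just that a producer which has nothing to enqueue makes the
maintenance call instead, so buffered events don't get stuck.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names exist in DPDK. */
#define SKETCH_BUF_SIZE 4

struct sketch_port {
	uint64_t buf[SKETCH_BUF_SIZE]; /* events buffered by the "driver" */
	size_t buffered;               /* number of valid entries in buf */
	size_t delivered;              /* events handed on to the "scheduler" */
	bool requires_maint;           /* models RTE_EVENT_DEV_CAP_REQUIRES_MAINT */
};

/* Hand all buffered events on; stands in for the driver's internal work. */
static void sketch_flush(struct sketch_port *p)
{
	p->delivered += p->buffered;
	p->buffered = 0;
}

/* Single-event enqueue: buffer the event, flush only when the buffer fills. */
static void sketch_enqueue(struct sketch_port *p, uint64_t ev)
{
	assert(p->buffered < SKETCH_BUF_SIZE);
	p->buf[p->buffered++] = ev;
	if (p->buffered == SKETCH_BUF_SIZE)
		sketch_flush(p);
}

/* Plays the role of the maintenance call: a no-op unless the device needs it. */
static void sketch_maintain(struct sketch_port *p)
{
	if (p->requires_maint && p->buffered > 0)
		sketch_flush(p);
}

/*
 * One iteration of a producer-only loop. ev == NULL means the producer is
 * idle in this iteration, so it maintains the port instead of enqueuing.
 */
static void sketch_producer_iter(struct sketch_port *p, const uint64_t *ev)
{
	if (ev != NULL)
		sketch_enqueue(p, *ev);
	else
		sketch_maintain(p);
}
```

With a port that has requires_maint set, a single enqueue followed by an
idle iteration leaves nothing buffered. Without the idle-time call, that
one event would sit in the buffer until the next burst happened to fill it.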