Hi All,

> From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> Sent: Friday, November 25, 2016 12:24 AM
> To: Thomas Monjalon <thomas.monjalon at 6wind.com>
> Cc: dev at dpdk.org; Richardson, Bruce <bruce.richardson at intel.com>;
> Van Haaren, Harry <harry.van.haaren at intel.com>; hemant.agrawal at nxp.com;
> Eads, Gage <gage.eads at intel.com>
> Subject: Re: [dpdk-dev] [PATCH 1/4] eventdev: introduce event driven programming model
>
> On Thu, Nov 24, 2016 at 04:35:56PM +0100, Thomas Monjalon wrote:
> > 2016-11-24 07:29, Jerin Jacob:
> > > On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> > > > 2016-11-18 11:14, Jerin Jacob:
> > > > > +Eventdev API - EXPERIMENTAL
> > > > > +M: Jerin Jacob <jerin.jacob at caviumnetworks.com>
> > > > > +F: lib/librte_eventdev/
> > > >
> > > > OK to mark it experimental.
> > > > What is the plan to remove the experimental word?
> > >
> > > IMO, the EXPERIMENTAL status can be changed when:
> > > - At least two event drivers are available (Intel and Cavium are working
> > >   on SW and HW event drivers)
> > > - Functional test applications are fine with at least two drivers
> > > - A portable example application showcases the features of the library
> > > - eventdev is integrated with another DPDK subsystem such as ethdev
> > >
> > > Thoughts? I am not sure of the criteria used in the cryptodev case.
> >
> > Sounds good.
> > We will be more confident when drivers and tests are implemented.
> >
> > I think the roadmap for the SW driver targets the release 17.05.
> > Do you still plan 17.02 for this API and the Cavium driver?
>
> No. 17.02 is too short for upstreaming the Cavium driver. However, I think the API
> and the skeleton event driver can go in 17.02 if there are no objections.
>
> > > > > +#define EVENTDEV_NAME_SKELETON_PMD event_skeleton
> > > > > +/**< Skeleton event device PMD name */
> > > >
> > > > I do not understand this #define.
> > >
> > > Applications can explicitly request a specific driver through the driver
> > > name. This will go as the argument to rte_event_dev_get_dev_id(const char *name).
> > > The reason for keeping this #define in rte_eventdev.h is that the
> > > application needs to include only rte_eventdev.h, not rte_eventdev_pmd.h.
> >
> > So each driver must register its name in the API?
> > Is it really needed?
>
> Otherwise, how does the application know the name of the driver?
> A similar scheme is used in cryptodev:
> http://dpdk.org/browse/dpdk/tree/lib/librte_cryptodev/rte_cryptodev.h#n53
> No strong opinion here. Open for suggestions.
>
> > > > > +struct rte_event_dev_config {
> > > > > +	uint32_t dequeue_wait_ns;
> > > > > +	/**< rte_event_dequeue() wait for *dequeue_wait_ns* ns on this device.
> > > >
> > > > Please explain exactly when the wait occurs and why.
> > >
> > > Here is the explanation from the rte_event_dequeue() API definition:
> > > -
> > > @param wait
> > > 0 - no-wait, returns immediately if there is no event.
> > > >0 - wait for the event. If the device is configured with
> > > RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT then this function will wait until
> > > the event is available or the *wait* time has elapsed.
> > > If the device is not configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
> > > then this function will wait until the event is available or for *dequeue_wait_ns*
> > >                                                                 ^^^^^^^^^^^^^^^^^^
> > > ns, which was previously supplied to rte_event_dev_configure().
> > > -
> > > This gives the application control over how long the implementation
> > > should wait if an event is not available.
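If I'm reading the above right, the two modes would look roughly like the sketch
below (illustrative only, not compile-tested; names are taken from the patch as
quoted, so the exact signatures may differ):

#include <rte_eventdev.h>   /* header proposed in this patch set */

/* Sketch: dev_id/port_id are assumed valid and already set up elsewhere. */
static void
dequeue_wait_example(uint8_t dev_id, uint8_t port_id, uint64_t per_call_wait)
{
	struct rte_event ev;

	/* wait == 0: return immediately if no event is available */
	rte_event_dequeue(dev_id, port_id, &ev, 0);

	/* wait > 0, device configured WITHOUT RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT:
	 * block for up to the dequeue_wait_ns value supplied to
	 * rte_event_dev_configure(); the per-call value only enables waiting */
	rte_event_dequeue(dev_id, port_id, &ev, 1);

	/* wait > 0, device configured WITH RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT:
	 * this call's own wait value bounds how long the call may block */
	rte_event_dequeue(dev_id, port_id, &ev, per_call_wait);
}

In other words, the config flag decides whether the per-call argument is a real
bound or just a wait/no-wait switch - spelling that out in the comment would
probably answer Thomas' question.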
> > > Let me know what exact changes are required if the details are not enough
> > > in the rte_event_dequeue() API definition.
> >
> > Maybe timeout would be a better name.
> > It waits only if there is nothing in the queue.
> > It could be interesting to highlight in this comment that this parameter
> > makes the dequeue function a blocking call.
>
> OK. I will change it to timeout then.
>
> > > > > +/** Event port configuration structure */
> > > > > +struct rte_event_port_conf {
> > > > > +	int32_t new_event_threshold;
> > > > > +	/**< A backpressure threshold for new event enqueues on this port.
> > > > > +	 * Use for *closed system* event dev where event capacity is limited,
> > > > > +	 * and cannot exceed the capacity of the event dev.
> > > > > +	 * Configuring ports with different thresholds can make higher priority
> > > > > +	 * traffic less likely to be backpressured.
> > > > > +	 * For example, a port used to inject NIC Rx packets into the event dev
> > > > > +	 * can have a lower threshold so as not to overwhelm the device,
> > > > > +	 * while ports used for worker pools can have a higher threshold.
> > > > > +	 * This value cannot exceed the *nb_events_limit*
> > > > > +	 * which was previously supplied to rte_event_dev_configure()
> > > > > +	 */
> > > > > +	uint8_t dequeue_depth;
> > > > > +	/**< Configure number of bulk dequeues for this event port.
> > > > > +	 * This value cannot exceed the *nb_event_port_dequeue_depth*
> > > > > +	 * which was previously supplied to rte_event_dev_configure()
> > > > > +	 */
> > > > > +	uint8_t enqueue_depth;
> > > > > +	/**< Configure number of bulk enqueues for this event port.
> > > > > +	 * This value cannot exceed the *nb_event_port_enqueue_depth*
> > > > > +	 * which was previously supplied to rte_event_dev_configure()
> > > > > +	 */
> > > > > +};
> > > >
> > > > The depth configuration is not clear to me.
> > >
> > > Basically, it is the maximum number of events that can be enqueued/dequeued
> > > at a time from a given event port. A depth of one == non-burst mode.
> >
> > OK so depth is the queue size. Please could you reword?
>
> OK
>
> > > > > +/* Event types to classify the event source */
> > > >
> > > > Why is this classification needed?
> > >
> > > This is for application pipelining and cases where the application wants
> > > to know which subsystem generated the event.
> > >
> > > Example packet forwarding loop on the worker cores:
> > >
> > > while(1) {
> > > 	ev = dequeue()
> > > 	// event from ethdev subsystem
> > > 	if (ev.event_type == RTE_EVENT_TYPE_ETHDEV) {
> > > 		- swap the mac address
> > > 		- push to atomic queue for ingress flow order maintenance
> > > 		  by CORE
> > > 	/* events from core */
> > > 	} else if (ev.event_type == RTE_EVENT_TYPE_CORE) {
> > >
> > > 	}
> > > 	enqueue(ev);
> > > }
> >
> > I don't know why but I feel this classification is weak.
> > You need to track the source of the event. Does it make sense to go beyond
> > and identify the source device?
>
> No, dequeue has a dev_id argument, so the event comes only from that device.
>
> > > > > +#define RTE_EVENT_TYPE_ETHDEV 0x0
> > > > > +/**< The event generated from ethdev subsystem */
> > > > > +#define RTE_EVENT_TYPE_CRYPTODEV 0x1
> > > > > +/**< The event generated from cryptodev subsystem */
> > > > > +#define RTE_EVENT_TYPE_TIMERDEV 0x2
> > > > > +/**< The event generated from timerdev subsystem */
> > > > > +#define RTE_EVENT_TYPE_CORE 0x3
> > > > > +/**< The event generated from core.
> > > >
> > > > What is core?
> > >
> > > The events are generated by an lcore for pipelining. Any suggestion for a
> > > better name? lcore?
> >
> > What about CPU or SW?
>
> No strong opinion here. I will go with CPU then.
+1 for CPU (as SW is the software PMD name).

> > > > > +		/**< Opaque event pointer */
> > > > > +		struct rte_mbuf *mbuf;
> > > > > +		/**< mbuf pointer if dequeued event is associated with mbuf */
> > > >
> > > > How do we know that an event is associated with an mbuf?
> > >
> > > By looking at the event source/type RTE_EVENT_TYPE_*
> > >
> > > > Does it mean that such events are always converted into mbuf even if the
> > > > application does not need it?
> > >
> > > Hardware has a dependency on getting the physical address of the event, so
> > > any struct that has "phys_addr_t buf_physaddr" works.
> >
> > I do not understand.
>
> In HW-based implementations, the event pointer will be submitted to HW.
> As you know, since HW can't understand a virtual address and it needs to be
> converted to a physical address, any DPDK object that provides phys_addr_t,
> such as mbuf, can be used with libeventdev.
>
> > I thought that decoding the event would be the responsibility of the app
> > by calling a function like
> > rte_eventdev_convert_to_mbuf(struct rte_event *, struct rte_mbuf *).
>
> It can be. But it is costly, i.e. yet another function-pointer-based driver
> interface on the fast path. Instead, the driver itself can convert to mbuf
> (in case of an ETHDEV device) and tag the source/event type as
> RTE_EVENT_TYPE_ETHDEV.
> IMO the proposed scheme also helps SW-based implementations as there is no
> real mbuf conversion. Something we can revisit in the ethdev integration if
> required.
>
> > > > > +struct rte_eventdev_driver;
> > > > > +struct rte_eventdev_ops;
> > > >
> > > > I think it is better to split API and driver interface in two files.
> > > > (we should do this split in ethdev)
> > >
> > > I thought so, but then the "static inline" versions of the northbound
> > > API (like rte_event_enqueue) would go in another file (due to the fact that
> > > the implementation needs to dereference "dev->data->ports[port_id]"). Do you
> > > want it that way?
> > > I would like to keep all the northbound API in rte_eventdev.h and not any
> > > of it in rte_eventdev_pmd.h.
> >
> > My comment was confusing.
> > You are doing 2 files, one for the API (what you call northbound I think)
> > and the other one for the driver interface (what you call southbound I think),
> > it's very fine.
>
> > > > > +/**
> > > > > + * Enqueue the event object supplied in the *rte_event* structure on an
> > > > > + * event device designated by its *dev_id* through the event port specified by
> > > > > + * *port_id*. The event object specifies the event queue on which this
> > > > > + * event will be enqueued.
> > > > > + *
> > > > > + * @param dev_id
> > > > > + *   Event device identifier.
> > > > > + * @param port_id
> > > > > + *   The identifier of the event port.
> > > > > + * @param ev
> > > > > + *   Pointer to struct rte_event
> > > > > + *
> > > > > + * @return
> > > > > + *  - 0 on success
> > > > > + *  - <0 on failure. Failure can occur if the event port's output queue is
> > > > > + *    backpressured, for instance.
> > > > > + */
> > > > > +static inline int
> > > > > +rte_event_enqueue(uint8_t dev_id, uint8_t port_id, struct rte_event *ev)
> > > >
> > > > Is it really needed to have a non-burst variant of enqueue/dequeue?
> > >
> > > Yes. Certain HW can work only with non-burst variants.
> >
> > Same comment as Bruce, we must keep only the burst variant.
> > We cannot have different APIs for different HW.
>
> I don't think there is any portability issue here, I can explain.
> At the application level, we have two more use cases to deal with for the
> non-burst variant:
>
> - latency-critical work
> - on dequeue, if the application wants to deal with only one flow (i.e. to
>   avoid processing two different application flows and so avoid cache thrashing)
>
> Selection of the burst variants will be based on rte_event_dev_info_get()
> and rte_event_dev_configure() (see max_event_port_dequeue_depth,
> max_event_port_enqueue_depth, nb_event_port_dequeue_depth,
> nb_event_port_enqueue_depth).
> So I don't think there is a portability issue here, and I don't want to waste
> CPU cycles on the for loop if the application is known to be working with the
> non-burst variant, like below:
>
> nb_events = rte_event_dequeue_burst();
> for(i = 0; i < nb_events; i++){
> 	process ev[i]
> }
>
> And most importantly, the NPU can get almost the same throughput without the
> burst variant, so why not?

Perhaps I'm mis-understanding, but can you not just dequeue 1 from the burst()
function?

struct rte_event ev;
rte_event_dequeue_burst(dev, port, &ev, 1, wait);
process( &ev );

I mean, if an application *demands* to not use bursts, the above allows it.
Of course it won't scale to other implementations that would benefit from
burst - but that's the application author's choice?

> > > > > +/**
> > > > > + * Converts nanoseconds to *wait* value for rte_event_dequeue()
> > > > > + *
> > > > > + * If the device is configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT flag then
> > > > > + * application can use this function to convert wait value in nanoseconds to
> > > > > + * implementations specific wait value supplied in rte_event_dequeue()
> > > >
> > > > Why is it implementation-specific?
> > > > Why is this conversion not internal to the driver?
> > >
> > > This is a performance optimization; otherwise, drivers would need to
> > > convert ns to ticks in the "fast path".
> >
> > So why not define the unit of this timeout as CPU cycles, like the ones
> > returned by rte_get_timer_cycles()?
>
> Because the HW co-processor can run in a different clock domain. It need not
> be at the CPU frequency.
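For what it's worth, the usage I'd then expect on the application side is to do
the conversion once at setup and reuse the cached value in the hot loop - a rough
sketch (not compile-tested; it assumes the helper returns the converted value
directly and that the non-burst dequeue reports whether an event was returned,
neither of which is fully settled by the patch as quoted):

#include <rte_eventdev.h>   /* header proposed in this patch set */

/* Sketch: convert ns to the implementation-specific wait value once, outside
 * the fast path, then reuse it for every dequeue. dev_id/port_id and the
 * handler callback are placeholders for the application's own code. */
static void
worker_loop(uint8_t dev_id, uint8_t port_id,
	    void (*handle_event)(struct rte_event *ev))
{
	/* one-time ns -> device-tick conversion (device clock, not CPU cycles) */
	uint64_t wait = rte_event_dequeue_wait_time(dev_id, 100 * 1000 /* 100 us */);
	struct rte_event ev;

	for (;;) {
		/* hot loop: no per-call ns conversion needed in the driver */
		if (rte_event_dequeue(dev_id, port_id, &ev, wait))
			handle_event(&ev);
		/* else: timed out with no event available */
	}
}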