Hi Jerin,

Thanks for bringing a big new piece to DPDK.
I made some comments below.

2016-11-18 11:14, Jerin Jacob:
> +Eventdev API - EXPERIMENTAL
> +M: Jerin Jacob <jerin.jacob at caviumnetworks.com>
> +F: lib/librte_eventdev/

OK to mark it experimental.
What is the plan to remove the experimental word?

> + * RTE event device drivers do not use interrupts for enqueue or dequeue
> + * operation. Instead, Event drivers export Poll-Mode enqueue and dequeue
> + * functions to applications.

To the question "what makes DPDK different?", it could be answered
that DPDK event drivers implement polling functions :)

> +#include <stdbool.h>
> +
> +#include <rte_pci.h>
> +#include <rte_dev.h>
> +#include <rte_memory.h>

Is it possible to remove some of these includes from the API?

> +
> +#define EVENTDEV_NAME_SKELETON_PMD event_skeleton
> +/**< Skeleton event device PMD name */

I do not understand this #define.
And it is not properly prefixed.

> +struct rte_event_dev_info {
> +	const char *driver_name;	/**< Event driver name */
> +	struct rte_pci_device *pci_dev;	/**< PCI information */

There is some work in progress to remove PCI information from ethdev.
Please do not add any PCI-related structure in eventdev.
The generic structure is rte_device.

> +struct rte_event_dev_config {
> +	uint32_t dequeue_wait_ns;
> +	/**< rte_event_dequeue() wait for *dequeue_wait_ns* ns on this device.

Please explain exactly when the wait occurs and why.

> +	 * This value should be in the range of *min_dequeue_wait_ns* and
> +	 * *max_dequeue_wait_ns* which was previously provided in
> +	 * rte_event_dev_info_get()
> +	 * \see RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT

I think the @see syntax would be more consistent than \see.

> +	uint8_t nb_event_port_dequeue_depth;
> +	/**< Number of dequeue queue depth for any event port on this device.

I think it deserves more explanation.

> +	uint32_t event_dev_cfg;
> +	/**< Event device config flags(RTE_EVENT_DEV_CFG_)*/

How does this field differ from the others in the struct?
Should it be named flags?

> +	uint32_t event_queue_cfg; /**< Queue config flags(EVENT_QUEUE_CFG_) */

Same comment about the naming of this field, in the event_queue
config struct.

> +/** Event port configuration structure */
> +struct rte_event_port_conf {
> +	int32_t new_event_threshold;
> +	/**< A backpressure threshold for new event enqueues on this port.
> +	 * Use for *closed system* event dev where event capacity is limited,
> +	 * and cannot exceed the capacity of the event dev.
> +	 * Configuring ports with different thresholds can make higher priority
> +	 * traffic less likely to be backpressured.
> +	 * For example, a port used to inject NIC Rx packets into the event dev
> +	 * can have a lower threshold so as not to overwhelm the device,
> +	 * while ports used for worker pools can have a higher threshold.
> +	 * This value cannot exceed the *nb_events_limit*
> +	 * which was previously supplied to rte_event_dev_configure()
> +	 */
> +	uint8_t dequeue_depth;
> +	/**< Configure number of bulk dequeues for this event port.
> +	 * This value cannot exceed the *nb_event_port_dequeue_depth*
> +	 * which was previously supplied to rte_event_dev_configure()
> +	 */
> +	uint8_t enqueue_depth;
> +	/**< Configure number of bulk enqueues for this event port.
> +	 * This value cannot exceed the *nb_event_port_enqueue_depth*
> +	 * which was previously supplied to rte_event_dev_configure()
> +	 */
> +};

The depth configuration is not clear to me.
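To illustrate where I am lost, here is how I would naively fill this
structure as an application writer (just a sketch; the values are
arbitrary and the interpretation in the comments is my guess, not
taken from the patch):

	struct rte_event_port_conf conf = {
		.new_event_threshold = 1024, /* in-flight new events allowed? */
		.dequeue_depth = 16, /* max burst of one dequeue call? or a ring size? */
		.enqueue_depth = 16, /* max burst of one enqueue call? */
	};

It is not obvious whether these depths bound the burst size of a single
call or size an internal per-port buffer, nor what happens when a burst
exceeds them.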
> +/* Event types to classify the event source */

Why is this classification needed?

> +#define RTE_EVENT_TYPE_ETHDEV		0x0
> +/**< The event generated from ethdev subsystem */
> +#define RTE_EVENT_TYPE_CRYPTODEV	0x1
> +/**< The event generated from cryptodev subsystem */
> +#define RTE_EVENT_TYPE_TIMERDEV		0x2
> +/**< The event generated from timerdev subsystem */
> +#define RTE_EVENT_TYPE_CORE		0x3
> +/**< The event generated from core.

What is core?

> +/* Event enqueue operations */

I feel a longer explanation is needed here to describe what an
operation is and where this data is useful.

> +#define RTE_EVENT_OP_NEW		0
> +/**< New event without previous context */
> +#define RTE_EVENT_OP_FORWARD		1
> +/**< Re-enqueue previously dequeued event */
> +#define RTE_EVENT_OP_RELEASE		2

There is no comment for the release operation.

> +/**
> + * Release the flow context associated with the schedule type.
> + * [...]
> + */

There is no function declaration below this comment.

> +/**
> + * The generic *rte_event* structure to hold the event attributes
> + * for dequeue and enqueue operation
> + */
> +struct rte_event {
> +	/** WORD0 */
> +	RTE_STD_C11
> +	union {
> +		uint64_t event;
[...]
> +	};
> +	/** WORD1 */
> +	RTE_STD_C11
> +	union {
> +		uintptr_t event_ptr;

I wonder if it can be a problem to have the size of this field
not constant across machines.

> +		/**< Opaque event pointer */
> +		struct rte_mbuf *mbuf;
> +		/**< mbuf pointer if dequeued event is associated with mbuf */

How do we know that an event is associated with an mbuf?
Does it mean that such events are always converted into an mbuf,
even if the application does not need it?

> +struct rte_eventdev_driver;
> +struct rte_eventdev_ops;

I think it is better to split the API and the driver interface
in two files. (we should do this split in ethdev)

> +/**
> + * Enqueue the event object supplied in the *rte_event* structure on an
> + * event device designated by its *dev_id* through the event port specified by
> + * *port_id*. The event object specifies the event queue on which this
> + * event will be enqueued.
> + *
> + * @param dev_id
> + *   Event device identifier.
> + * @param port_id
> + *   The identifier of the event port.
> + * @param ev
> + *   Pointer to struct rte_event
> + *
> + * @return
> + *  - 0 on success
> + *  - <0 on failure. Failure can occur if the event port's output queue is
> + *    backpressured, for instance.
> + */
> +static inline int
> +rte_event_enqueue(uint8_t dev_id, uint8_t port_id, struct rte_event *ev)

Is it really needed to have a non-burst variant of enqueue/dequeue?

> +/**
> + * Converts nanoseconds to *wait* value for rte_event_dequeue()
> + *
> + * If the device is configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT flag
> + * then application can use this function to convert wait value in
> + * nanoseconds to implementation-specific wait value supplied
> + * in rte_event_dequeue()

Why is it implementation-specific?
Why is this conversion not internal to the driver?

End of review for this patch ;)
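PS: about the non-burst variant question above, I would expect the
single-event enqueue to reduce to a one-element burst. A rough sketch,
assuming a burst function modeled on rte_eth_tx_burst
(rte_event_enqueue_burst is my assumption here, not a function from
the patch):

	static inline int
	rte_event_enqueue(uint8_t dev_id, uint8_t port_id, struct rte_event *ev)
	{
		/* hypothetical burst variant returning the number of events
		 * actually enqueued, as rte_eth_tx_burst does for packets */
		if (rte_event_enqueue_burst(dev_id, port_id, ev, 1) == 1)
			return 0;
		return -ENOSPC; /* port backpressured */
	}

If such a wrapper is enough, the drivers would only have to implement
the burst callbacks.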