From: Marc Sune <marc.sune at bisdn.de<mailto:marc.s...@bisdn.de>> Date: Monday, April 20, 2015 at 1:51 AM To: Keith Wiles <keith.wiles at intel.com<mailto:keith.wiles at intel.com>>, "dev at dpdk.org<mailto:dev at dpdk.org>" <dev at dpdk.org<mailto:dev at dpdk.org>> Subject: Re: [dpdk-dev] [RFC PATCH 0/4] pktdev
On 17/04/15 21:50, Wiles, Keith wrote: Hi Marc and Bruce, Hi Keith, Bruce, On 4/17/15, 1:49 PM, "Marc Sune" <marc.sune at bisdn.de><mailto:marc.sune at bisdn.de> wrote: On 17/04/15 17:16, Bruce Richardson wrote: Hi all, to continue this discussion a bit more, here is my, slightly different, slant on what a pktdev abstraction may look like. The primary objective I had in mind when drafting this is to provide the minimal abstraction that can be *easily* used as a common device abstraction for existing (and future) device types to be passed to dataplane code. The patchset demonstrates this by defining a minimal interface for pktdev - since I firmly believe the interface should be as small as possible - and then showing how that common interface can be used to unify rings and ethdevs under a common API for the datapath. I believe any attempt to unify things much beyond this to the control plane or setup phase is not worth doing - at least not initially - as at init time the code always needs to be aware of the underlying resource type in order to configure it properly for dataplane use. The overall objective I look to achieve is illustrated by the final patch in the series, which is a sample app where the same code is used for all cores, irrespective of the underlying device type. To get to that point, patch 1 defines the minimal API - just RX and TX. The .c file in the library is empty for simplicity, though I would see some functionality moving there when/if it makes sense e.g. the callback support from ethdev, as is done in Keith's patchset. Patch 2 then makes very minimal changes to ethdev to allow ethdevs to be used as pktdevs, and to make use of the pktdev functions when appropriate Patch 3 was, for me, the key test for this implementation - how hard was it to make an rte_ring usable as a pktdev too. Two single-line functions for RX/TX and a separate "converter" function proved to be all that was necessary here - and I believe simpler solutions may be possible too, as the extra structures allocated on conversion could be merged into the rte_ring structure itself and initialized on ring creation if we prefer that option. It is hoped/presumed that wrapping other structures, such as KNI, may prove to be just as easily done. [Not attempted yet - left as an exercise for the reader :-)]. Now, in terms of pktdev vs ethdev, there is nothing in this proposal that cannot also be done using ethdev AFAIK. However, pktdev as outlined here should make the process far easier than trying to create a full PMD for something. All NIC specific functions, including things like stop/start, are stripped out, as they don't make sense for an rte_ring or other software objects. Also, the other thing this provides is that we can move away from just using port ids. Instead in the same way as we now reference rings/mempools/KNIs etc via pointer, we can do the same with ethernet ports as pktdevs on the data path. There was discussion previously on moving beyond 8-bit port ids. If we look to use ethdev as a common abstraction, I feel that change will soon have to be made causing a large amount of code churn. Hi Richard, First thank you both for taking the time to look at this. I did not not reply to Keith because you Richard summarized most of my concerns. I had a brief look to this second proposal. It is more aligned to what I had in mind. But still I feel it is slightly too complicated. I don't like much the necessary (in your approach) MACRO-like pkt_dev_data struct. It is also slightly inconvenient that the user has to do: + struct rte_pkt_dev *in = rte_eth_get_dev(0); + struct rte_pkt_dev *out = rte_ring_get_dev( + rte_ring_create(name, 4096, rte_socket_id(), 0)); What about something like (~pseudo-code): rte_pkt_dev_data.h: enum rte_pkt_dev_type{ RTE_PKT_DEV_ETH, RTE_PKT_DEV_RING, RTE_PKT_DEV_KNI, //Keep adding as more PMDs are supported }; //This struct may be redundant if there is nothing more struct rte_pkt_dev_data{ enum rte_pkt_dev_type; //Placeholder, maybe we need more... }; //Make RX/TX pktdev APIs more readable, but not really needed typedef void pkt_dev_t; (In all PMDs and e.g. KNI and RINGs): struct rte_eth_dev { struct rte_pkt_dev_data pkt_dev;// <++++++++++++++++++++++++++++++++++ eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */ eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */ struct rte_eth_dev_data *data; /**< Pointer to device data */ /... rte_pkt_dev.h: #include <rte_ethdev.h> //Include PMD (and non-PMD) TX/RX headers... static inline uint16_t rte_pkt_tx_burst(pkt_dev_t* dev, uint16_t queue_id, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) { switch (((struct rte_pkt_dev_data*)dev)->type){ case RTE_PKT_DEV_ETH: struct rte_eth_dev* eth_dev = (struct rte_eth_dev*)pkt_dev; rte_pkt_tx_burst(eth_dev, queue_id, tx_pkts, nb_pkts); break; case RTE_PKT_DEV_RING: //... } } //... I do not see the functional different from my proposal other then a bit more search/replace, which does effect some of the PMD?s. The changes to the PMD is just to use the macros as we now have a nested structure. I think what I am proposing is different. See below. The nesting and movement some structures and code to a common location is to provide a better way to add other devices. Moving code into a common set of structures and APIs should only improve the code as we get more devices added to DPDK. The basic flow and design is the same. In your proposal you are movig TX/RX queues as well as some shared stuff and, most importantly, you are using function pointers to execute the RX/TX routines. What I was proposing is to try to add the minimum common shared state in order to properly demultiplex the RX/TX call and have a common set of abstract calls (the pkt_dev type). In a way, I was proposing to deliberately not have a shared struct rte_dev_data because I think the internals of the "pkt_dev" can be very different across devices (e.g. queues in kni vs eth port vs. crypto?). I treat the pkt_dev as a "black box" that conforms to TX/RX API, leaving the developer of that device to define its internal structures as it better suites the needs. I only use each of the specific device type TX/RX APIs (external to us, pkt_dev library) in rte_pkt_dev.h. This also simplifies the refactor required to eventually integrate the rte_pkt_dev library and builds it "on top" of the existing APIs. The other important difference with both, Bruce and your approach, and mine is the use of function pointers for RX/TX. I don't use them, which makes the entire abstracted TX/RX (including the final RX/TX routines itself) functions be "inlinable". Btw, I forgot to add something basic in the previous pseudo-code. The different types have to be conditionally compiled according to compiled-in DPDK libs: rte_pkt_dev.h: #include <rte_config.h> //Eth devices #ifdef RTE_LIBRTE_ETHER #include <rte_ethdev.h> #endif //KNI #ifdef RTE_LIBRTE_KNI #include <rte_kni.h> #endif //... //Include PMD (and non-PMD) TX/RX headers... static inline uint16_t rte_pkt_tx_burst(pkt_dev_t* dev, uint16_t queue_id, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) { switch (((struct rte_pkt_dev_data*)dev)->type){ #ifdef RTE_LIBRTE_ETHER case RTE_PKT_DEV_ETH: struct rte_eth_dev* eth_dev = (struct rte_eth_dev*)pkt_dev; rte_pkt_tx_burst(eth_dev, queue_id, tx_pkts, nb_pkts); break; #endif #ifdef RTE_LIBRTE_KNI case RTE_PKT_DEV_KNI: //... break; #endif default: //Corrupted type or unsupported (without compiled support) //Ignore or fail(fatal error)? break; } } //... This adds a switch into the performance path, not a great solution IMO.