Re: [dpdk-dev] [PATCH v3] net/pcap: rx_iface_in stream type support
> -----Original Message-----
> From: Ferruh Yigit
> Sent: Wednesday, June 27, 2018 4:59 PM
> To: Ido Goshen; Bruce Richardson; John McNamara; Marko Kovacevic
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v3] net/pcap: rx_iface_in stream type support
>
> On 6/27/2018 1:04 PM, ido goshen wrote:
> > From: ido g
> >
> > Support rx of in direction packets only Useful for apps that also tx
> > to eth_pcap ports in order to not see them echoed back in as rx when
> > out direction is also captured
>
> Can you please add your command, which was in previous mails, on how to
> reproduce the issue of capturing transferred packets in Rx path; for future.

[idog] I think one can just use the new doc example below (the one w/o the _in option), but I can add it in the commit log too...

>
> And overall looks good, there are a few syntax comments below.
>
> >
> > Signed-off-by: ido g
> > ---
> > v3:
> > * merge to updated dpdk-next-net code
> > * pcap_ring doc update
> >
> > v2:
> > * clean checkpatch warning
> >
> > doc/guides/nics/pcap_ring.rst | 25 ++-
> > drivers/net/pcap/rte_eth_pcap.c | 45 ++---
> > 2 files changed, 66 insertions(+), 4 deletions(-)
> >
> > diff --git a/doc/guides/nics/pcap_ring.rst b/doc/guides/nics/pcap_ring.rst
> > index 7fd063c..6282be6 100644
> > --- a/doc/guides/nics/pcap_ring.rst
> > +++ b/doc/guides/nics/pcap_ring.rst
> > @@ -71,11 +71,19 @@ The different stream types are:
> > tx_pcap=/path/to/file.pcap
> >
> > * rx_iface: Defines a reception stream based on a network interface name.
> > -The driver reads packets coming from the given interface using the Linux kernel driver for that interface.
> > +The driver reads packets from the given interface using the Linux kernel driver for that interface.
> > +The driver captures both the incoming and outgoing packets on that interface.
>
> This is only true if tx_iface parameter given for that interface, right? It can be
> good to clarify to not confuse people. I am for keeping first sentences, and add
> a note about this special case, something like (feel free to update):

[idog] No, this is true regardless of what the other params are, i.e.:
In case rx_iface is given, the dpdk app will see not only packets coming into that iface (e.g. echo request) but also what linux apps are sending out of that iface (e.g. echo reply).
In case rx_iface_in is given, it will see only incoming traffic (only the echo requests).
Giving tx_iface with the same iface just exposes that behavior and makes it worse, because it will also capture back what the dpdk app is sending to that iface and not only what Linux sends.
Therefore I think the documentation is correct.

> "
> The driver reads packets coming from the given interface using the Linux
> kernel driver for that interface.
> When tx_iface argument given for same interface, Tx packets also captured.
> "
>
> > The value is an interface name.
> >
> > rx_iface=eth0
> >
> > +* rx_iface_in: Defines a reception stream based on a network interface name.
> > +The driver reads packets from the given interface using the Linux kernel driver for that interface.
> > +The driver captures only the incoming packets on that interface.
>
> Again I am for keeping "... reads packets coming from the given interface ..."
> and clarify the difference in next sentences specific to tx_iface usage.
>
> > +The value is an interface name.
> > +
> > +rx_iface_in=eth0
> > +
> > * tx_iface: Defines a transmission stream based on a network interface name.
> > The driver sends packets to the given interface using the Linux kernel driver for that interface.
> > The value is an interface name.
> > @@ -122,6 +130,21 @@ Forward packets through two network interfaces:
> > $RTE_TARGET/app/testpmd -l 0-3 -n 4 \
> > --vdev 'net_pcap0,iface=eth0' --vdev='net_pcap1;iface=eth1'
> >
> > +Enable 2 tx queues on a network interface:
> > +
> > +.. code-block:: console
> > +
> > +$RTE_TARGET/app/testpmd -l 0-3 -n 4 \
> > +--vdev 'net_pcap0,rx_iface=eth1,tx_iface=eth1,tx_iface=eth1' \
> > +-- --txq 2
> > +
> > +Read only incoming packets from a network interface:
>
> This title is confusing, the sample is not for "read only incoming packets" it Tx
> also :). I understand what you mean, but I believe it would be better to clarify
> this.

[idog] Would this make it clearer?
"Read only incoming packets from a network interface and write them back to that network interface:"

> > +
> > +.. code-block:: console
> > +
> > +$RTE_TARGET/app/testpmd -l 0-3 -n 4 \
> > +--vdev 'net_pcap0,rx_iface_in=eth1,tx_iface=eth1'
> > +
> > Using libpcap-based PMD with the testpmd Application
> >
> >
> > diff --git a/drivers/net/pcap/rte_eth_pcap.c
> > b/drivers/net/pcap/rte_eth_pcap.c index b21930b
Re: [dpdk-dev] [PATCH V4 8/9] app/testpmd: show example to handle hot unplug
Hi Jeff

A good advance, thank you, but as I said on the previous version, this patch introduces a bug and the next one fixes it. Patch 9 should come before patch 8, while this patch just adds 1 more option for EAL hotplug.

Please see 1 more comment below.

From: Jeff Guo
> Use testpmd for example, to show how an application smoothly handle failure
> when device being hot unplug. If app have enabled the device event monitor
> and register the hot plug event’s callback before running, once app detect the
> removal event, the callback would be called. It will first stop the packet
> forwarding, then stop the port, close the port, and finally detach the port to
> remove the device out from the device lists.
>
> Signed-off-by: Jeff Guo
> ---
> v4->v3:
> remove some unused code
> ---
> app/test-pmd/testpmd.c | 13 +
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index 24c1998..42ed196 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -2196,6 +2196,9 @@ static void
> eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
> 	__rte_unused void *arg)
> {
> +	uint16_t port_id;
> +	int ret;
> +
> 	if (type >= RTE_DEV_EVENT_MAX) {
> 		fprintf(stderr, "%s called upon invalid event %d\n",
> 			__func__, type);
> @@ -2206,9 +2209,12 @@ eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
> 	case RTE_DEV_EVENT_REMOVE:
> 		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
> 			device_name);
> -		/* TODO: After finish failure handle, begin to stop
> -		 * packet forward, stop port, close port, detach port.
> -		 */
> +		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);

As you probably know, 1 rte_device may be associated with more than one ethdev port, so the ethdev port name can be different from the rte_device name. Looks like we need a new ethdev API to get all the ports associated with one rte_device.

> +		if (ret) {
> +			printf("can not get port by device %s!\n",
> 				device_name);
> +			return;
> +		}
> +		rmv_event_callback((void *)(intptr_t)port_id);
> 		break;
> 	case RTE_DEV_EVENT_ADD:
> 		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
> @@ -2736,7 +2742,6 @@ main(int argc, char** argv)
> 			return -1;
> 		}
> 		eth_dev_event_callback_register();
> -
> 	}
>
> 	if (start_port(RTE_PORT_ALL) != 0)
> --
> 2.7.4
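
The review comment about one rte_device backing several ethdev ports can be illustrated with a small stand-alone model. This is only a sketch: `port_entry` and `ports_of_device` are hypothetical names invented for the illustration and are not part of the ethdev API. It only shows why a single name-to-port lookup cannot cover the multi-port case.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: each ethdev port records the name of the
 * rte_device that backs it. This is not a real DPDK structure.
 */
struct port_entry {
	uint16_t port_id;
	const char *device_name;
};

/* Collect every port backed by device_name into out[]; at most max
 * entries are written. Returns the total number of matching ports.
 */
static size_t
ports_of_device(const struct port_entry *tbl, size_t n,
		const char *device_name, uint16_t *out, size_t max)
{
	size_t found = 0;

	for (size_t i = 0; i < n; i++) {
		if (strcmp(tbl[i].device_name, device_name) != 0)
			continue;
		if (found < max)
			out[found] = tbl[i].port_id;
		found++;
	}
	return found;
}
```

With such a lookup, a removal callback would loop over all returned ports and stop, close and detach each of them, instead of handling only the one port whose name happens to match the device name.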
Re: [dpdk-dev] [PATCH v5 03/15] vhost: vring address setup for packed queues
On 06/29/2018 05:59 PM, Tiwei Bie wrote:

@@ -888,7 +914,8 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *pmsg)
 static int
 vq_is_ready(struct vhost_virtqueue *vq)
 {
-	return vq && vq->desc && vq->avail && vq->used &&
+	return vq &&
+		(vq->desc_packed || (vq->desc && vq->avail && vq->used)) &&
 		vq->kickfd != VIRTIO_UNINITIALIZED_EVENTFD &&
 		vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD;

It seems that the check is wrong here as desc_packed and desc are in a union. We may have to check whether packed ring has been negotiated.
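
To make the concern concrete: because desc and desc_packed share storage, `vq->desc_packed || (...)` is true as soon as a split-ring desc pointer is set, even if avail and used are still missing. A minimal stand-alone sketch of a feature-based check follows. The struct, the `SKETCH_*` constants and the function name are assumptions made for the illustration, not the vhost library's definitions; only the feature bit number (34, VIRTIO_F_RING_PACKED) comes from the virtio specification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_F_RING_PACKED	34	/* VIRTIO_F_RING_PACKED bit number */
#define SKETCH_UNINIT_EVENTFD	(-2)	/* placeholder sentinel, assumed */

/* Simplified stand-in for the virtqueue; desc and desc_packed alias. */
struct vq_sketch {
	union {
		void *desc;		/* split ring descriptor table */
		void *desc_packed;	/* packed ring descriptor ring */
	};
	void *avail;
	void *used;
	int kickfd;
	int callfd;
};

/* Pick the ring test from the negotiated features rather than from
 * which union member happens to be non-NULL.
 */
static bool
vq_is_ready_sketch(uint64_t features, const struct vq_sketch *vq)
{
	bool rings_ok;

	if (vq == NULL)
		return false;

	if (features & (1ULL << SKETCH_F_RING_PACKED))
		rings_ok = vq->desc_packed != NULL;
	else
		rings_ok = vq->desc && vq->avail && vq->used;

	return rings_ok &&
		vq->kickfd != SKETCH_UNINIT_EVENTFD &&
		vq->callfd != SKETCH_UNINIT_EVENTFD;
}
```

A real packed-ring check may need to look at more than the descriptor pointer; the sketch only shows where the negotiated feature bit would enter the decision.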
[dpdk-dev] [PATCH v4] net/pcap: rx_iface_in stream type support
From: ido g

Support rx of in direction packets only.
Useful for apps that also tx to eth_pcap ports in order to not see them
echoed back in as rx when out direction is also captured.

Example:
When using rx_iface and sending a *single* packet to eth1, it will loop
forever, as when it is sent to tx_iface=eth1 it will be captured again on
rx_iface=eth1 and so on.

$RTE_TARGET/app/testpmd -l 0-3 -n 4 \
--vdev 'net_pcap0,rx_iface=eth1,tx_iface=eth1'
…
-- Forward statistics for port 0
RX-packets: 758    RX-dropped: 0    RX-total: 758
TX-packets: 758    TX-dropped: 0    TX-total: 758
--

While if using rx_iface_in, it will not be captured on the way out and
will be forwarded only once.

$RTE_TARGET/app/testpmd -l 0-3 -n 4 \
--vdev 'net_pcap0,rx_iface_in=eth1,tx_iface=eth1'
…
-- Forward statistics for port 0
RX-packets: 1    RX-dropped: 0    RX-total: 1
TX-packets: 1    TX-dropped: 0    TX-total: 1
--

Signed-off-by: ido g
---
v4:
* fix order of rx_iface and rx_iface_in mix
* reword pcap_ring doc example
* cosmetics code alignments
* adding example commands in commit log

v3:
* merge to updated dpdk-next-net code
* pcap_ring doc update

v2:
* clean checkpatch warning

 doc/guides/nics/pcap_ring.rst   | 25 +++-
 drivers/net/pcap/rte_eth_pcap.c | 51 +
 2 files changed, 70 insertions(+), 6 deletions(-)

diff --git a/doc/guides/nics/pcap_ring.rst b/doc/guides/nics/pcap_ring.rst
index 7fd063c..879e543 100644
--- a/doc/guides/nics/pcap_ring.rst
+++ b/doc/guides/nics/pcap_ring.rst
@@ -71,11 +71,19 @@ The different stream types are:
 tx_pcap=/path/to/file.pcap

 * rx_iface: Defines a reception stream based on a network interface name.
-The driver reads packets coming from the given interface using the Linux kernel driver for that interface.
+The driver reads packets from the given interface using the Linux kernel driver for that interface.
+The driver captures both the incoming and outgoing packets on that interface.
 The value is an interface name.

 rx_iface=eth0

+* rx_iface_in: Defines a reception stream based on a network interface name.
+The driver reads packets from the given interface using the Linux kernel driver for that interface.
+The driver captures only the incoming packets on that interface.
+The value is an interface name.
+
+rx_iface_in=eth0
+
 * tx_iface: Defines a transmission stream based on a network interface name.
 The driver sends packets to the given interface using the Linux kernel driver for that interface.
 The value is an interface name.
@@ -122,6 +130,21 @@ Forward packets through two network interfaces:
 $RTE_TARGET/app/testpmd -l 0-3 -n 4 \
 --vdev 'net_pcap0,iface=eth0' --vdev='net_pcap1;iface=eth1'

+Enable 2 tx queues on a network interface:
+
+.. code-block:: console
+
+$RTE_TARGET/app/testpmd -l 0-3 -n 4 \
+--vdev 'net_pcap0,rx_iface=eth1,tx_iface=eth1,tx_iface=eth1' \
+-- --txq 2
+
+Read only incoming packets from a network interface and write them back to the same network interface:
+
+.. code-block:: console
+
+$RTE_TARGET/app/testpmd -l 0-3 -n 4 \
+--vdev 'net_pcap0,rx_iface_in=eth1,tx_iface=eth1'
+
 Using libpcap-based PMD with the testpmd Application

diff --git a/drivers/net/pcap/rte_eth_pcap.c b/drivers/net/pcap/rte_eth_pcap.c
index b21930b..0a89b24 100644
--- a/drivers/net/pcap/rte_eth_pcap.c
+++ b/drivers/net/pcap/rte_eth_pcap.c
@@ -26,6 +26,7 @@
 #define ETH_PCAP_RX_PCAP_ARG "rx_pcap"
 #define ETH_PCAP_TX_PCAP_ARG "tx_pcap"
 #define ETH_PCAP_RX_IFACE_ARG "rx_iface"
+#define ETH_PCAP_RX_IFACE_IN_ARG "rx_iface_in"
 #define ETH_PCAP_TX_IFACE_ARG "tx_iface"
 #define ETH_PCAP_IFACE_ARG "iface"

@@ -83,6 +84,7 @@ struct pmd_devargs {
 	ETH_PCAP_RX_PCAP_ARG,
 	ETH_PCAP_TX_PCAP_ARG,
 	ETH_PCAP_RX_IFACE_ARG,
+	ETH_PCAP_RX_IFACE_IN_ARG,
 	ETH_PCAP_TX_IFACE_ARG,
 	ETH_PCAP_IFACE_ARG,
 	NULL
@@ -739,6 +741,21 @@ struct pmd_devargs {
 }

 static inline int
+set_iface_direction(const char *iface, pcap_t *pcap,
+		pcap_direction_t direction)
+{
+	const char *direction_str = (direction == PCAP_D_IN) ? "IN" : "OUT";
+	if (pcap_setdirection(pcap, direction) < 0) {
+		PMD_LOG(ERR, "Setting %s pcap direction %s failed - %s\n",
+			iface, direction_str, pcap_geterr(pcap));
+		return -1;
+	}
+	PMD_LOG(INFO, "Setting %s pcap direction %s\n",
+		iface, direction_str);
+	return 0;
+}
+
+static inline int
 open_
[dpdk-dev] 17.05 --> 17.11, minimum hash table key size
Hello,

We are in the process of migrating our design from DPDK 17.05 to 17.11 and we ran into a small problem. Within our design, we have some hash tables with 4-byte keys. While going through the changes made in 17.11, we found that a key_size check was added, which now requires key_size >= 8 bytes (see check_params_create() in rte_table_hash_ext.c).

I am not seeing any other options, so I was hoping someone could advise on how to support a 4-byte hash key size in 17.11 and going forward.

Regards,
Mike
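
One generic approach, which nobody in this thread has confirmed for rte_table_hash, is to zero-pad each 4-byte key into an 8-byte buffer before every add and lookup, so that the table only ever sees 8-byte keys. The helper name `pad_key4` and the constant below are hypothetical, and the sketch makes no DPDK calls:

```c
#include <stdint.h>
#include <string.h>

#define PADDED_KEY_SIZE 8	/* assumed minimum key size from the check */

/* Copy a 4-byte key into the first bytes of an 8-byte buffer and zero
 * the rest, so that equal 4-byte keys always give equal 8-byte keys.
 */
static void
pad_key4(uint32_t key, uint8_t out[PADDED_KEY_SIZE])
{
	memset(out, 0, PADDED_KEY_SIZE);
	memcpy(out, &key, sizeof(key));
}
```

The padding has to be applied identically on every add and lookup path; a key that is left unpadded or padded with stale bytes would no longer match.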
Re: [dpdk-dev] [PATCH] net/mlx4: refinements to Rx packet type report
Thursday, June 28, 2018 3:40 PM, Adrien Mazarguil:
> Subject: Re: [dpdk-dev] [PATCH] net/mlx4: refinements to Rx packet type
> report
>
> On Thu, Jun 28, 2018 at 09:30:28AM +0300, Moti Haimovsky wrote:
> > This commit refines the Rx Packet type flags reported by the PMD for
> > each packet being received in order to make the report more accurate.
> >
> > Signed-off-by: Moti Haimovsky
>
> Patch looks good, thanks.
>
> Acked-by: Adrien Mazarguil

Applied to next-net-mlx, thanks.

>
> --
> Adrien Mazarguil
> 6WIND
[dpdk-dev] [PATCH] net/thunderx: add support for Rx VLAN offload
From: "Kudurumalla, Rakesh" This feature is used to offload stripping of vlan header from recevied packets and update vlan_tci field in mbuf when DEV_RX_OFFLOAD_VLAN_STRIP & ETH_VLAN_STRIP_MASK flag is set. Signed-off-by: Rakesh Kudurumalla Signed-off-by: Pavan Nikhilesh --- drivers/net/thunderx/base/nicvf_hw.c | 1 + drivers/net/thunderx/nicvf_ethdev.c | 59 +-- drivers/net/thunderx/nicvf_rxtx.c| 70 drivers/net/thunderx/nicvf_rxtx.h| 15 -- drivers/net/thunderx/nicvf_struct.h | 1 + 5 files changed, 119 insertions(+), 27 deletions(-) diff --git a/drivers/net/thunderx/base/nicvf_hw.c b/drivers/net/thunderx/base/nicvf_hw.c index b07a2937d..5b1abe201 100644 --- a/drivers/net/thunderx/base/nicvf_hw.c +++ b/drivers/net/thunderx/base/nicvf_hw.c @@ -699,6 +699,7 @@ nicvf_vlan_hw_strip(struct nicvf *nic, bool enable) else val &= ~((STRIP_SECOND_VLAN | STRIP_FIRST_VLAN) << 25); + nic->vlan_strip = enable; nicvf_reg_write(nic, NIC_VNIC_RQ_GEN_CFG, val); } diff --git a/drivers/net/thunderx/nicvf_ethdev.c b/drivers/net/thunderx/nicvf_ethdev.c index 76fed9f99..4f58b2e33 100644 --- a/drivers/net/thunderx/nicvf_ethdev.c +++ b/drivers/net/thunderx/nicvf_ethdev.c @@ -52,6 +52,8 @@ static void nicvf_dev_stop(struct rte_eth_dev *dev); static void nicvf_dev_stop_cleanup(struct rte_eth_dev *dev, bool cleanup); static void nicvf_vf_stop(struct rte_eth_dev *dev, struct nicvf *nic, bool cleanup); +static int nicvf_vlan_offload_config(struct rte_eth_dev *dev, int mask); +static int nicvf_vlan_offload_set(struct rte_eth_dev *dev, int mask); RTE_INIT(nicvf_init_log); static void @@ -357,11 +359,9 @@ nicvf_dev_supported_ptypes_get(struct rte_eth_dev *dev) } memcpy((char *)ptypes + copied, &ptypes_end, sizeof(ptypes_end)); - if (dev->rx_pkt_burst == nicvf_recv_pkts || - dev->rx_pkt_burst == nicvf_recv_pkts_multiseg) - return ptypes; - return NULL; + /* All Ptypes are supported in all Rx functions. 
*/ + return ptypes; } static void @@ -918,13 +918,18 @@ nicvf_set_tx_function(struct rte_eth_dev *dev) static void nicvf_set_rx_function(struct rte_eth_dev *dev) { - if (dev->data->scattered_rx) { - PMD_DRV_LOG(DEBUG, "Using multi-segment rx callback"); - dev->rx_pkt_burst = nicvf_recv_pkts_multiseg; - } else { - PMD_DRV_LOG(DEBUG, "Using single-segment rx callback"); - dev->rx_pkt_burst = nicvf_recv_pkts; - } + struct nicvf *nic = nicvf_pmd_priv(dev); + + const eth_rx_burst_t rx_burst_func[2][2] = { + /* [NORMAL/SCATTER] [VLAN_STRIP/NO_VLAN_STRIP] */ + [0][0] = nicvf_recv_pkts_no_offload, + [0][1] = nicvf_recv_pkts_vlan_strip, + [1][0] = nicvf_recv_pkts_multiseg_no_offload, + [1][1] = nicvf_recv_pkts_multiseg_vlan_strip, + }; + + dev->rx_pkt_burst = + rx_burst_func[dev->data->scattered_rx][nic->vlan_strip]; } static int @@ -1469,7 +1474,7 @@ nicvf_vf_start(struct rte_eth_dev *dev, struct nicvf *nic, uint32_t rbdrsz) struct rte_mbuf *mbuf; uint16_t rx_start, rx_end; uint16_t tx_start, tx_end; - bool vlan_strip; + int mask; PMD_INIT_FUNC_TRACE(); @@ -1590,9 +1595,9 @@ nicvf_vf_start(struct rte_eth_dev *dev, struct nicvf *nic, uint32_t rbdrsz) nic->rbdr->tail, nb_rbdr_desc, nic->vf_id); /* Configure VLAN Strip */ - vlan_strip = !!(dev->data->dev_conf.rxmode.offloads & - DEV_RX_OFFLOAD_VLAN_STRIP); - nicvf_vlan_hw_strip(nic, vlan_strip); + mask = ETH_VLAN_STRIP_MASK | ETH_VLAN_FILTER_MASK | + ETH_VLAN_EXTEND_MASK; + ret = nicvf_vlan_offload_config(dev, mask); /* Based on the packet type(IPv4 or IPv6), the nicvf HW aligns L3 data * to the 64bit memory address. 
@@ -1983,6 +1988,7 @@ static const struct eth_dev_ops nicvf_eth_dev_ops = { .dev_infos_get= nicvf_dev_info_get, .dev_supported_ptypes_get = nicvf_dev_supported_ptypes_get, .mtu_set = nicvf_dev_set_mtu, + .vlan_offload_set = nicvf_vlan_offload_set, .reta_update = nicvf_dev_reta_update, .reta_query = nicvf_dev_reta_query, .rss_hash_update = nicvf_dev_rss_hash_update, @@ -1999,6 +2005,29 @@ static const struct eth_dev_ops nicvf_eth_dev_ops = { .get_reg = nicvf_dev_get_regs, }; +static int +nicvf_vlan_offload_config(struct rte_eth_dev *dev, int mask) +{ + struct rte_eth_rxmode *rxmode; + struct nicvf *nic = nicvf_pmd_priv(dev); + rxmode = &dev->data->dev_conf.rxmode; + if (mask & ETH_VLAN_STRIP_MASK) { + if (rxmode->offloads & DEV_RX_OFFLOAD_VLAN_
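
The table-driven selection in nicvf_set_rx_function() can be tried out in isolation. The sketch below uses hypothetical stub functions and a simplified signature rather than the driver's real burst functions. Note that in the patch the second index is 0 for no offload and 1 for VLAN strip, which is the reverse of the order written in its table comment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified burst function type for the sketch (not eth_rx_burst_t). */
typedef uint16_t (*rx_burst_fn)(void *rxq, void **pkts, uint16_t n);

/* Stubs that return distinct values so the selection is observable. */
static uint16_t rx_single(void *q, void **p, uint16_t n)
{ (void)q; (void)p; (void)n; return 0; }
static uint16_t rx_single_vlan_strip(void *q, void **p, uint16_t n)
{ (void)q; (void)p; (void)n; return 1; }
static uint16_t rx_multiseg(void *q, void **p, uint16_t n)
{ (void)q; (void)p; (void)n; return 2; }
static uint16_t rx_multiseg_vlan_strip(void *q, void **p, uint16_t n)
{ (void)q; (void)p; (void)n; return 3; }

/* First index: 0 = single segment, 1 = scattered Rx.
 * Second index: 0 = no VLAN strip, 1 = VLAN strip.
 */
static rx_burst_fn
select_rx_burst(bool scattered_rx, bool vlan_strip)
{
	static const rx_burst_fn tbl[2][2] = {
		[0][0] = rx_single,
		[0][1] = rx_single_vlan_strip,
		[1][0] = rx_multiseg,
		[1][1] = rx_multiseg_vlan_strip,
	};

	return tbl[scattered_rx][vlan_strip];
}
```

Indexing with bool keeps both indices in the range 0..1. The patch indexes with dev->data->scattered_rx and nic->vlan_strip directly, so it relies on both fields only ever holding 0 or 1.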
Re: [dpdk-dev] [PATCH v6] net/fm10k: add support for check descriptor status APIs
Hi, Ferruh

> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Friday, June 29, 2018 7:04 PM
> To: Zhao1, Wei ; dev@dpdk.org
> Cc: Zhang, Qi Z
> Subject: Re: [dpdk-dev] [PATCH v6] net/fm10k: add support for check
> descriptor status APIs
>
> On 6/29/2018 2:48 AM, Wei Zhao wrote:
> > rte_eth_rx_descriptor_status and rte_eth_tx_descriptor_status are
> > supported by fm10K.
> >
> > Signed-off-by: Wei Zhao
> >
> > ---
> >
> > v2:
> > -fix DD check error in tx descriptor
> >
> > v3:
> > -fix DD check index error
> >
> > v4:
> > -fix error in RS bit list poll
> >
> > v5:
> > -rebase code to branch and delete useless variable
> >
> > v6:
> > -change release note
> > ---
> > doc/guides/rel_notes/release_18_08.rst | 6 +++
> > drivers/net/fm10k/fm10k.h | 7 +++
> > drivers/net/fm10k/fm10k_ethdev.c | 2 +
> > drivers/net/fm10k/fm10k_rxtx.c | 78 ++
>
> Can you please update fm10k*.ini files to announce newly added "Rx
> descriptor status" & "Tx descriptor status" features?

Ok, thank you. I will send a new patch.
[dpdk-dev] [PATCH v4 2/5] eventdev: improve err handling for Rx adapter queue add/del
The new WRR sequence applicable after queue add/del is set up after setting the new queue state, so a memory allocation failure will leave behind an incorrect state. This change separates the memory sizing + allocation for the Rx poll and WRR array from calculation of the WRR sequence. If there is a memory allocation failure, existing Rx queue configuration remains unchanged. Signed-off-by: Nikhil Rao --- lib/librte_eventdev/rte_event_eth_rx_adapter.c | 418 ++--- 1 file changed, 302 insertions(+), 116 deletions(-) diff --git a/lib/librte_eventdev/rte_event_eth_rx_adapter.c b/lib/librte_eventdev/rte_event_eth_rx_adapter.c index 9361d48..926f83a 100644 --- a/lib/librte_eventdev/rte_event_eth_rx_adapter.c +++ b/lib/librte_eventdev/rte_event_eth_rx_adapter.c @@ -109,10 +109,16 @@ struct eth_device_info { * rx_adapter_stop callback needs to be invoked */ uint8_t dev_rx_started; - /* If nb_dev_queues > 0, the start callback will + /* Number of queues added for this device */ + uint16_t nb_dev_queues; + /* If nb_rx_poll > 0, the start callback will * be invoked if not already invoked */ - uint16_t nb_dev_queues; + uint16_t nb_rx_poll; + /* sum(wrr(q)) for all queues within the device +* useful when deleting all device queues +*/ + uint32_t wrr_len; }; /* Per Rx queue */ @@ -188,13 +194,170 @@ static uint16_t rxa_gcd_u16(uint16_t a, uint16_t b) } } -/* Precalculate WRR polling sequence for all queues in rx_adapter */ +static inline int +rxa_polled_queue(struct eth_device_info *dev_info, + int rx_queue_id) +{ + struct eth_rx_queue_info *queue_info; + + queue_info = &dev_info->rx_queue[rx_queue_id]; + return !dev_info->internal_event_port && + dev_info->rx_queue && + queue_info->queue_enabled && queue_info->wt != 0; +} + +/* Calculate size of the eth_rx_poll and wrr_sched arrays + * after deleting poll mode rx queues + */ +static void +rxa_calc_nb_post_poll_del(struct rte_event_eth_rx_adapter *rx_adapter, + struct eth_device_info *dev_info, + int rx_queue_id, + uint32_t 
*nb_rx_poll, + uint32_t *nb_wrr) +{ + uint32_t poll_diff; + uint32_t wrr_len_diff; + + if (rx_queue_id == -1) { + poll_diff = dev_info->nb_rx_poll; + wrr_len_diff = dev_info->wrr_len; + } else { + poll_diff = rxa_polled_queue(dev_info, rx_queue_id); + wrr_len_diff = poll_diff ? dev_info->rx_queue[rx_queue_id].wt : + 0; + } + + *nb_rx_poll = rx_adapter->num_rx_polled - poll_diff; + *nb_wrr = rx_adapter->wrr_len - wrr_len_diff; +} + +/* Calculate nb_rx_* after adding poll mode rx queues + */ +static void +rxa_calc_nb_post_add_poll(struct rte_event_eth_rx_adapter *rx_adapter, + struct eth_device_info *dev_info, + int rx_queue_id, + uint16_t wt, + uint32_t *nb_rx_poll, + uint32_t *nb_wrr) +{ + uint32_t poll_diff; + uint32_t wrr_len_diff; + + if (rx_queue_id == -1) { + poll_diff = dev_info->dev->data->nb_rx_queues - + dev_info->nb_rx_poll; + wrr_len_diff = wt*dev_info->dev->data->nb_rx_queues + - dev_info->wrr_len; + } else { + poll_diff = !rxa_polled_queue(dev_info, rx_queue_id); + wrr_len_diff = rxa_polled_queue(dev_info, rx_queue_id) ? 
+ wt - dev_info->rx_queue[rx_queue_id].wt : + wt; + } + + *nb_rx_poll = rx_adapter->num_rx_polled + poll_diff; + *nb_wrr = rx_adapter->wrr_len + wrr_len_diff; +} + +/* Calculate nb_rx_* after adding rx_queue_id */ +static void +rxa_calc_nb_post_add(struct rte_event_eth_rx_adapter *rx_adapter, + struct eth_device_info *dev_info, + int rx_queue_id, + uint16_t wt, + uint32_t *nb_rx_poll, + uint32_t *nb_wrr) +{ + rxa_calc_nb_post_add_poll(rx_adapter, dev_info, rx_queue_id, + wt, nb_rx_poll, nb_wrr); +} + +/* Calculate nb_rx_* after deleting rx_queue_id */ +static void +rxa_calc_nb_post_del(struct rte_event_eth_rx_adapter *rx_adapter, + struct eth_device_info *dev_info, + int rx_queue_id, + uint32_t *nb_rx_poll, + uint32_t *nb_wrr) +{ + rxa_calc_nb_post_poll_del(rx_adapter, dev_info, rx_queue_id, nb_rx_poll, + nb_wrr); +} + +/* + * Allocate the rx_poll array + */ +static struct eth_rx_poll_entry * +rxa_alloc_poll(struct rte_event_eth_rx_adapter *rx_adapter, + uint32_t nu
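
The sizing step in rxa_calc_nb_post_poll_del() can be followed with a reduced model. The `sk_*` structures below are simplified stand-ins that keep only the fields the calculation reads; they are not the adapter's real definitions. The sketch reads state but never writes it, which is what lets the caller size and allocate the new arrays before touching the queue configuration.

```c
#include <stdint.h>

#define SK_MAX_QUEUES 4

/* Reduced stand-ins for the per-queue, per-device and adapter state. */
struct sk_queue {
	int enabled;
	uint16_t wt;		/* servicing weight, 0 = not polled */
};

struct sk_dev {
	uint16_t nb_rx_poll;	/* polled queues on this device */
	uint32_t wrr_len;	/* sum of weights of polled queues */
	struct sk_queue q[SK_MAX_QUEUES];
};

struct sk_adapter {
	uint32_t num_rx_polled;	/* polled queues across all devices */
	uint32_t wrr_len;	/* length of the WRR schedule */
};

static int
sk_polled(const struct sk_dev *d, int qid)
{
	return d->q[qid].enabled && d->q[qid].wt != 0;
}

/* Array sizes needed after deleting qid (-1 = all queues of the
 * device). Only reads state, so a later allocation failure leaves
 * the existing configuration untouched.
 */
static void
sk_calc_post_del(const struct sk_adapter *a, const struct sk_dev *d,
		int qid, uint32_t *nb_rx_poll, uint32_t *nb_wrr)
{
	uint32_t poll_diff;
	uint32_t wrr_diff;

	if (qid == -1) {
		poll_diff = d->nb_rx_poll;
		wrr_diff = d->wrr_len;
	} else {
		poll_diff = sk_polled(d, qid);
		wrr_diff = poll_diff ? d->q[qid].wt : 0;
	}

	*nb_rx_poll = a->num_rx_polled - poll_diff;
	*nb_wrr = a->wrr_len - wrr_diff;
}
```

For example, with three polled queues and a schedule length of 7 in the adapter, deleting a queue of weight 4 leaves two polled queues and a schedule length of 3.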
[dpdk-dev] [PATCH v4 0/5] eventdev: add interrupt driven queues to Rx adapter
This patch series adds support for interrupt driven queues to the ethernet Rx adapter. The first 3 patches prepare the code to handle both poll and interrupt driven Rx queues, the 4th patch has code changes specific to interrupt driven queues, and the final patch has test code.

Changelog:

v3->v4:
* Fix FreeBSD build breakage.

v2->v3:
* Fix shared build breakage.
* Fix FreeBSD build breakage.
* Reduce epoll maxevents parameter by 1, since thread wakeup uses pthread_cancel as opposed to an exit message through a file monitored by epoll_wait().
* Check intr_handle before access, it is NULL when zero Rx queue interrupts are configured.
* Remove thread_stop flag; in the event of a pthread_cancel, it is not possible to check this flag as the thread stack is unwound without returning to rxa_intr_thread.

v1->v2:
* Move rte_service_component_runstate_set such that it is called only when cap & RTE_EVENT_ETH_RX_ADAPTER_CAP_INTERNAL_PORT is false. (Jerin Jacob)
* Fix meson build. (Jerin Jacob)
* Replace calls to pthread_* with rte_ctrl_thread_create(). (Jerin Jacob)
* Move adapter test code to separate patch. (Jerin Jacob)

Note: I haven't removed the note about devices created after rte_event_eth_rx_adapter_create, will fix in a separate patch.

Nikhil Rao (5):
  eventdev: standardize Rx adapter internal function names
  eventdev: improve err handling for Rx adapter queue add/del
  eventdev: move Rx adapter eth Rx to separate function
  eventdev: add interrupt driven queues to Rx adapter
  eventdev: add Rx adapter tests for interrupt driven queues

 config/rte_config.h                            |    1 +
 lib/librte_eventdev/rte_event_eth_rx_adapter.h |    5 +-
 lib/librte_eventdev/rte_event_eth_rx_adapter.c | 1526 +---
 test/test/test_event_eth_rx_adapter.c          |  261 +++-
 .../prog_guide/event_ethernet_rx_adapter.rst   |   24 +
 config/common_base                             |    1 +
 lib/librte_eventdev/Makefile                   |    9 +-
 7 files changed, 1588 insertions(+), 239 deletions(-)

--
1.8.3.1
[dpdk-dev] [PATCH v4 5/5] eventdev: add Rx adapter tests for interrupt driven queues
Add test for queue add and delete, the add/delete calls also switch queues between poll and interrupt mode. Signed-off-by: Nikhil Rao --- test/test/test_event_eth_rx_adapter.c | 261 +++--- 1 file changed, 242 insertions(+), 19 deletions(-) diff --git a/test/test/test_event_eth_rx_adapter.c b/test/test/test_event_eth_rx_adapter.c index d432731..2337e54 100644 --- a/test/test/test_event_eth_rx_adapter.c +++ b/test/test/test_event_eth_rx_adapter.c @@ -25,28 +25,17 @@ struct event_eth_rx_adapter_test_params { struct rte_mempool *mp; uint16_t rx_rings, tx_rings; uint32_t caps; + int rx_intr_port_inited; + uint16_t rx_intr_port; }; static struct event_eth_rx_adapter_test_params default_params; static inline int -port_init(uint8_t port, struct rte_mempool *mp) +port_init_common(uint8_t port, const struct rte_eth_conf *port_conf, + struct rte_mempool *mp) { - static const struct rte_eth_conf port_conf_default = { - .rxmode = { - .mq_mode = ETH_MQ_RX_RSS, - .max_rx_pkt_len = ETHER_MAX_LEN - }, - .rx_adv_conf = { - .rss_conf = { - .rss_hf = ETH_RSS_IP | - ETH_RSS_TCP | - ETH_RSS_UDP, - } - } - }; const uint16_t rx_ring_size = 512, tx_ring_size = 512; - struct rte_eth_conf port_conf = port_conf_default; int retval; uint16_t q; struct rte_eth_dev_info dev_info; @@ -54,7 +43,7 @@ struct event_eth_rx_adapter_test_params { if (!rte_eth_dev_is_valid_port(port)) return -1; - retval = rte_eth_dev_configure(port, 0, 0, &port_conf); + retval = rte_eth_dev_configure(port, 0, 0, port_conf); rte_eth_dev_info_get(port, &dev_info); @@ -64,7 +53,7 @@ struct event_eth_rx_adapter_test_params { /* Configure the Ethernet device. 
*/ retval = rte_eth_dev_configure(port, default_params.rx_rings, - default_params.tx_rings, &port_conf); + default_params.tx_rings, port_conf); if (retval != 0) return retval; @@ -104,6 +93,77 @@ struct event_eth_rx_adapter_test_params { return 0; } +static inline int +port_init_rx_intr(uint8_t port, struct rte_mempool *mp) +{ + static const struct rte_eth_conf port_conf_default = { + .rxmode = { + .mq_mode = ETH_MQ_RX_RSS, + .max_rx_pkt_len = ETHER_MAX_LEN + }, + .intr_conf = { + .rxq = 1, + }, + }; + + return port_init_common(port, &port_conf_default, mp); +} + +static inline int +port_init(uint8_t port, struct rte_mempool *mp) +{ + static const struct rte_eth_conf port_conf_default = { + .rxmode = { + .mq_mode = ETH_MQ_RX_RSS, + .max_rx_pkt_len = ETHER_MAX_LEN + }, + .rx_adv_conf = { + .rss_conf = { + .rss_hf = ETH_RSS_IP | + ETH_RSS_TCP | + ETH_RSS_UDP, + } + } + }; + + return port_init_common(port, &port_conf_default, mp); +} + +static int +init_port_rx_intr(int num_ports) +{ + int retval; + uint16_t portid; + int err; + + default_params.mp = rte_pktmbuf_pool_create("packet_pool", + NB_MBUFS, + MBUF_CACHE_SIZE, + MBUF_PRIV_SIZE, + RTE_MBUF_DEFAULT_BUF_SIZE, + rte_socket_id()); + if (!default_params.mp) + return -ENOMEM; + + RTE_ETH_FOREACH_DEV(portid) { + retval = port_init_rx_intr(portid, default_params.mp); + if (retval) + continue; + err = rte_event_eth_rx_adapter_caps_get(TEST_DEV_ID, portid, + &default_params.caps); + if (err) + continue; + if (!(default_params.caps & + RTE_EVENT_ETH_RX_ADAPTER_CAP_INTERNAL_PORT)) { + default_params.rx_intr_port_inited = 1; + default_params.rx_intr_port = portid; + return 0; + } + rte_eth_dev_stop(portid); + } + return 0; +} + static int init_ports(int num_ports) { @@ -181,6 +241,57 @@ struct event_eth_rx_adapter_test_params {
[dpdk-dev] [PATCH v4 4/5] eventdev: add interrupt driven queues to Rx adapter
Add support for interrupt driven queues when eth device is configured for rxq interrupts and servicing weight for the queue is configured to be zero. A interrupt driven packet received counter has been added to rte_event_eth_rx_adapter_stats. Signed-off-by: Nikhil Rao --- config/rte_config.h| 1 + lib/librte_eventdev/rte_event_eth_rx_adapter.h | 5 +- lib/librte_eventdev/rte_event_eth_rx_adapter.c | 940 - .../prog_guide/event_ethernet_rx_adapter.rst | 24 + config/common_base | 1 + lib/librte_eventdev/Makefile | 9 +- 6 files changed, 950 insertions(+), 30 deletions(-) diff --git a/config/rte_config.h b/config/rte_config.h index a1d0175..ec88f14 100644 --- a/config/rte_config.h +++ b/config/rte_config.h @@ -64,6 +64,7 @@ #define RTE_EVENT_MAX_DEVS 16 #define RTE_EVENT_MAX_QUEUES_PER_DEV 64 #define RTE_EVENT_TIMER_ADAPTER_NUM_MAX 32 +#define RTE_EVENT_ETH_INTR_RING_SIZE 1024 #define RTE_EVENT_CRYPTO_ADAPTER_MAX_INSTANCE 32 /* rawdev defines */ diff --git a/lib/librte_eventdev/rte_event_eth_rx_adapter.h b/lib/librte_eventdev/rte_event_eth_rx_adapter.h index 307b2b5..97f25e9 100644 --- a/lib/librte_eventdev/rte_event_eth_rx_adapter.h +++ b/lib/librte_eventdev/rte_event_eth_rx_adapter.h @@ -64,8 +64,7 @@ * the service function ID of the adapter in this case. * * Note: - * 1) Interrupt driven receive queues are currently unimplemented. - * 2) Devices created after an instance of rte_event_eth_rx_adapter_create + * 1) Devices created after an instance of rte_event_eth_rx_adapter_create * should be added to a new instance of the rx adapter. */ @@ -199,6 +198,8 @@ struct rte_event_eth_rx_adapter_stats { * block cycles can be used to compute the percentage of * cycles the service is blocked by the event device. 
*/ + uint64_t rx_intr_packets; + /**< Received packet count for interrupt mode Rx queues */ }; /** diff --git a/lib/librte_eventdev/rte_event_eth_rx_adapter.c b/lib/librte_eventdev/rte_event_eth_rx_adapter.c index 8fe037f..42dd7f8 100644 --- a/lib/librte_eventdev/rte_event_eth_rx_adapter.c +++ b/lib/librte_eventdev/rte_event_eth_rx_adapter.c @@ -2,6 +2,11 @@ * Copyright(c) 2017 Intel Corporation. * All rights reserved. */ +#if defined(LINUX) +#include +#endif +#include + #include #include #include @@ -11,6 +16,7 @@ #include #include #include +#include #include "rte_eventdev.h" #include "rte_eventdev_pmd.h" @@ -24,6 +30,22 @@ #define ETH_RX_ADAPTER_MEM_NAME_LEN32 #define RSS_KEY_SIZE 40 +/* value written to intr thread pipe to signal thread exit */ +#define ETH_BRIDGE_INTR_THREAD_EXIT1 +/* Sentinel value to detect initialized file handle */ +#define INIT_FD-1 + +/* + * Used to store port and queue ID of interrupting Rx queue + */ +union queue_data { + RTE_STD_C11 + void *ptr; + struct { + uint16_t port; + uint16_t queue; + }; +}; /* * There is an instance of this struct per polled Rx queue added to the @@ -75,6 +97,30 @@ struct rte_event_eth_rx_adapter { uint16_t enq_block_count; /* Block start ts */ uint64_t rx_enq_block_start_ts; + /* epoll fd used to wait for Rx interrupts */ + int epd; + /* Num of interrupt driven interrupt queues */ + uint32_t num_rx_intr; + /* Used to send of interrupting Rx queues from +* the interrupt thread to the Rx thread +*/ + struct rte_ring *intr_ring; + /* Rx Queue data (dev id, queue id) for the last non-empty +* queue polled +*/ + union queue_data qd; + /* queue_data is valid */ + int qd_valid; + /* Interrupt ring lock, synchronizes Rx thread +* and interrupt thread +*/ + rte_spinlock_t intr_ring_lock; + /* event array passed to rte_poll_wait */ + struct rte_epoll_event *epoll_events; + /* Count of interrupt vectors in use */ + uint32_t num_intr_vec; + /* Thread blocked on Rx interrupts */ + pthread_t rx_intr_thread; /* 
Configuration callback for rte_service configuration */ rte_event_eth_rx_adapter_conf_cb conf_cb; /* Configuration callback argument */ @@ -93,6 +139,8 @@ struct rte_event_eth_rx_adapter { uint32_t service_id; /* Adapter started flag */ uint8_t rxa_started; + /* Adapter ID */ + uint8_t id; } __rte_cache_aligned; /* Per eth device */ @@ -111,19 +159,40 @@ struct eth_device_info { uint8_t dev_rx_started; /* Number of queues added for this device */ uint16_t nb_dev_queues; - /* If nb_rx_poll > 0, the start callback will + /* Number of poll based queues +* If nb_rx_poll > 0, the start callback will * be invoked if not already invoked */ uint16_t nb
[dpdk-dev] [PATCH v4 1/5] eventdev: standardize Rx adapter internal function names
Add a common prefix to function names and rename few to better match functionality Signed-off-by: Nikhil Rao Acked-by: Jerin Jacob --- lib/librte_eventdev/rte_event_eth_rx_adapter.c | 167 - 1 file changed, 80 insertions(+), 87 deletions(-) diff --git a/lib/librte_eventdev/rte_event_eth_rx_adapter.c b/lib/librte_eventdev/rte_event_eth_rx_adapter.c index ce1f62d..9361d48 100644 --- a/lib/librte_eventdev/rte_event_eth_rx_adapter.c +++ b/lib/librte_eventdev/rte_event_eth_rx_adapter.c @@ -129,30 +129,30 @@ struct eth_rx_queue_info { static struct rte_event_eth_rx_adapter **event_eth_rx_adapter; static inline int -valid_id(uint8_t id) +rxa_validate_id(uint8_t id) { return id < RTE_EVENT_ETH_RX_ADAPTER_MAX_INSTANCE; } #define RTE_EVENT_ETH_RX_ADAPTER_ID_VALID_OR_ERR_RET(id, retval) do { \ - if (!valid_id(id)) { \ + if (!rxa_validate_id(id)) { \ RTE_EDEV_LOG_ERR("Invalid eth Rx adapter id = %d\n", id); \ return retval; \ } \ } while (0) static inline int -sw_rx_adapter_queue_count(struct rte_event_eth_rx_adapter *rx_adapter) +rxa_sw_adapter_queue_count(struct rte_event_eth_rx_adapter *rx_adapter) { return rx_adapter->num_rx_polled; } /* Greatest common divisor */ -static uint16_t gcd_u16(uint16_t a, uint16_t b) +static uint16_t rxa_gcd_u16(uint16_t a, uint16_t b) { uint16_t r = a % b; - return r ? gcd_u16(b, r) : b; + return r ? 
rxa_gcd_u16(b, r) : b; } /* Returns the next queue in the polling sequence @@ -160,7 +160,7 @@ static uint16_t gcd_u16(uint16_t a, uint16_t b) * http://kb.linuxvirtualserver.org/wiki/Weighted_Round-Robin_Scheduling */ static int -wrr_next(struct rte_event_eth_rx_adapter *rx_adapter, +rxa_wrr_next(struct rte_event_eth_rx_adapter *rx_adapter, unsigned int n, int *cw, struct eth_rx_poll_entry *eth_rx_poll, uint16_t max_wt, uint16_t gcd, int prev) @@ -190,7 +190,7 @@ static uint16_t gcd_u16(uint16_t a, uint16_t b) /* Precalculate WRR polling sequence for all queues in rx_adapter */ static int -eth_poll_wrr_calc(struct rte_event_eth_rx_adapter *rx_adapter) +rxa_calc_wrr_sequence(struct rte_event_eth_rx_adapter *rx_adapter) { uint16_t d; uint16_t q; @@ -239,7 +239,7 @@ static uint16_t gcd_u16(uint16_t a, uint16_t b) rx_poll[poll_q].eth_rx_qid = q; max_wrr_pos += wt; max_wt = RTE_MAX(max_wt, wt); - gcd = (gcd) ? gcd_u16(gcd, wt) : wt; + gcd = (gcd) ? rxa_gcd_u16(gcd, wt) : wt; poll_q++; } } @@ -259,7 +259,7 @@ static uint16_t gcd_u16(uint16_t a, uint16_t b) int prev = -1; int cw = -1; for (i = 0; i < max_wrr_pos; i++) { - rx_wrr[i] = wrr_next(rx_adapter, poll_q, &cw, + rx_wrr[i] = rxa_wrr_next(rx_adapter, poll_q, &cw, rx_poll, max_wt, gcd, prev); prev = rx_wrr[i]; } @@ -276,7 +276,7 @@ static uint16_t gcd_u16(uint16_t a, uint16_t b) } static inline void -mtoip(struct rte_mbuf *m, struct ipv4_hdr **ipv4_hdr, +rxa_mtoip(struct rte_mbuf *m, struct ipv4_hdr **ipv4_hdr, struct ipv6_hdr **ipv6_hdr) { struct ether_hdr *eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *); @@ -315,7 +315,7 @@ static uint16_t gcd_u16(uint16_t a, uint16_t b) /* Calculate RSS hash for IPv4/6 */ static inline uint32_t -do_softrss(struct rte_mbuf *m, const uint8_t *rss_key_be) +rxa_do_softrss(struct rte_mbuf *m, const uint8_t *rss_key_be) { uint32_t input_len; void *tuple; @@ -324,7 +324,7 @@ static uint16_t gcd_u16(uint16_t a, uint16_t b) struct ipv4_hdr *ipv4_hdr; struct ipv6_hdr *ipv6_hdr; - mtoip(m, 
&ipv4_hdr, &ipv6_hdr); + rxa_mtoip(m, &ipv4_hdr, &ipv6_hdr); if (ipv4_hdr) { ipv4_tuple.src_addr = rte_be_to_cpu_32(ipv4_hdr->src_addr); @@ -343,13 +343,13 @@ static uint16_t gcd_u16(uint16_t a, uint16_t b) } static inline int -rx_enq_blocked(struct rte_event_eth_rx_adapter *rx_adapter) +rxa_enq_blocked(struct rte_event_eth_rx_adapter *rx_adapter) { return !!rx_adapter->enq_block_count; } static inline void -rx_enq_block_start_ts(struct rte_event_eth_rx_adapter *rx_adapter) +rxa_enq_block_start_ts(struct rte_event_eth_rx_adapter *rx_adapter) { if (rx_adapter->rx_enq_block_start_ts) return; @@ -362,13 +362,13 @@ static uint16_t gcd_u16(uint16_t a, uint16_t b) } static inline void -rx_enq_block_end_ts(struct rte_event_eth_rx_adapter *rx_adapter, +rxa_enq_block_end_ts(struct rte_event_eth_rx_adapter *rx_adapter, struct rte_event_eth_rx_adapter_stats *stats) { if (unlikely(
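For readers following the renamed helpers (rxa_gcd_u16, rxa_wrr_next, rxa_calc_wrr_sequence), here is a minimal standalone sketch of how a weighted round-robin polling sequence can be precalculated from per-queue weights. It follows the scheme linked in the code comment above; the names wrr_next and wrr_build_sequence and the plain weight array are assumptions for illustration, not the adapter's real data structures. Weights are assumed to be non-zero.

```c
#include <stdint.h>

/* Greatest common divisor, same recursion as rxa_gcd_u16() in the patch. */
static uint16_t gcd_u16(uint16_t a, uint16_t b)
{
	uint16_t r = a % b;

	return r ? gcd_u16(b, r) : b;
}

/*
 * Hypothetical helper: return the index of the next queue to poll.
 * cw is the current weight threshold; it is lowered by gcd on every
 * full pass over the queues and reset to max_wt when it reaches zero.
 */
static int wrr_next(const uint16_t *wt, int n, int *cw,
		    uint16_t max_wt, uint16_t gcd, int prev)
{
	int i = prev;

	for (;;) {
		i = (i + 1) % n;
		if (i == 0) {
			*cw -= gcd;
			if (*cw <= 0)
				*cw = max_wt;
		}
		if (wt[i] >= *cw)
			return i;
	}
}

/*
 * Hypothetical helper: fill seq[] with one full WRR cycle.
 * Returns the cycle length (sum of weights) or -1 if seq[] is too small.
 * All weights must be non-zero.
 */
static int wrr_build_sequence(const uint16_t *wt, int n, int *seq, int max_len)
{
	uint16_t max_wt = 0, gcd = 0;
	int len = 0, cw = -1, prev = -1, i;

	for (i = 0; i < n; i++) {
		len += wt[i];
		if (wt[i] > max_wt)
			max_wt = wt[i];
		gcd = gcd ? gcd_u16(gcd, wt[i]) : wt[i];
	}
	if (len > max_len)
		return -1;

	for (i = 0; i < len; i++) {
		seq[i] = wrr_next(wt, n, &cw, max_wt, gcd, prev);
		prev = seq[i];
	}
	return len;
}
```

With weights {4, 2} one cycle has six slots, and queue 0 is polled twice as often as queue 1.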
[dpdk-dev] [PATCH v4 3/5] eventdev: move Rx adapter eth Rx to separate function
Create a separate function that handles eth receive and enqueue to event buffer. This function will also be called for interrupt driven receive queues. Signed-off-by: Nikhil Rao Acked-by: Jerin Jacob --- lib/librte_eventdev/rte_event_eth_rx_adapter.c | 67 ++ 1 file changed, 47 insertions(+), 20 deletions(-) diff --git a/lib/librte_eventdev/rte_event_eth_rx_adapter.c b/lib/librte_eventdev/rte_event_eth_rx_adapter.c index 926f83a..8fe037f 100644 --- a/lib/librte_eventdev/rte_event_eth_rx_adapter.c +++ b/lib/librte_eventdev/rte_event_eth_rx_adapter.c @@ -616,6 +616,45 @@ static uint16_t rxa_gcd_u16(uint16_t a, uint16_t b) } } +/* Enqueue packets fromto event buffer */ +static inline uint32_t +rxa_eth_rx(struct rte_event_eth_rx_adapter *rx_adapter, + uint16_t port_id, + uint16_t queue_id, + uint32_t rx_count, + uint32_t max_rx) +{ + struct rte_mbuf *mbufs[BATCH_SIZE]; + struct rte_eth_event_enqueue_buffer *buf = + &rx_adapter->event_enqueue_buffer; + struct rte_event_eth_rx_adapter_stats *stats = + &rx_adapter->stats; + uint16_t n; + uint32_t nb_rx = 0; + + /* Don't do a batch dequeue from the rx queue if there isn't +* enough space in the enqueue buffer. +*/ + while (BATCH_SIZE <= (RTE_DIM(buf->events) - buf->count)) { + if (buf->count >= BATCH_SIZE) + rxa_flush_event_buffer(rx_adapter); + + stats->rx_poll_count++; + n = rte_eth_rx_burst(port_id, queue_id, mbufs, BATCH_SIZE); + if (unlikely(!n)) + break; + rxa_buffer_mbufs(rx_adapter, port_id, queue_id, mbufs, n); + nb_rx += n; + if (rx_count + nb_rx > max_rx) + break; + } + + if (buf->count >= BATCH_SIZE) + rxa_flush_event_buffer(rx_adapter); + + return nb_rx; +} + /* * Polls receive queues added to the event adapter and enqueues received * packets to the event device. 
@@ -633,17 +672,16 @@ static uint16_t rxa_gcd_u16(uint16_t a, uint16_t b) rxa_poll(struct rte_event_eth_rx_adapter *rx_adapter) { uint32_t num_queue; - uint16_t n; uint32_t nb_rx = 0; - struct rte_mbuf *mbufs[BATCH_SIZE]; struct rte_eth_event_enqueue_buffer *buf; uint32_t wrr_pos; uint32_t max_nb_rx; + struct rte_event_eth_rx_adapter_stats *stats; wrr_pos = rx_adapter->wrr_pos; max_nb_rx = rx_adapter->max_nb_rx; buf = &rx_adapter->event_enqueue_buffer; - struct rte_event_eth_rx_adapter_stats *stats = &rx_adapter->stats; + stats = &rx_adapter->stats; /* Iterate through a WRR sequence */ for (num_queue = 0; num_queue < rx_adapter->wrr_len; num_queue++) { @@ -658,32 +696,21 @@ static uint16_t rxa_gcd_u16(uint16_t a, uint16_t b) rxa_flush_event_buffer(rx_adapter); if (BATCH_SIZE > (ETH_EVENT_BUFFER_SIZE - buf->count)) { rx_adapter->wrr_pos = wrr_pos; - return; + break; } - stats->rx_poll_count++; - n = rte_eth_rx_burst(d, qid, mbufs, BATCH_SIZE); - - if (n) { - stats->rx_packets += n; - /* The check before rte_eth_rx_burst() ensures that -* all n mbufs can be buffered -*/ - rxa_buffer_mbufs(rx_adapter, d, qid, mbufs, n); - nb_rx += n; - if (nb_rx > max_nb_rx) { - rx_adapter->wrr_pos = + nb_rx += rxa_eth_rx(rx_adapter, d, qid, nb_rx, max_nb_rx); + if (nb_rx > max_nb_rx) { + rx_adapter->wrr_pos = (wrr_pos + 1) % rx_adapter->wrr_len; - break; - } + break; } if (++wrr_pos == rx_adapter->wrr_len) wrr_pos = 0; } - if (buf->count >= BATCH_SIZE) - rxa_flush_event_buffer(rx_adapter); + stats->rx_packets += nb_rx; } static int -- 1.8.3.1
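The control flow of the new rxa_eth_rx() is easier to see without the DPDK types. The sketch below mirrors the loop in the patch (only receive a batch when the event buffer has room for it, flush once a full batch is buffered, stop when the Rx budget is exceeded). The sim_* types and functions are hypothetical stand-ins, and the flush is assumed to always drain the whole buffer, which the real event enqueue does not guarantee.

```c
#include <stdint.h>

#define BATCH_SIZE        32
#define EVENT_BUFFER_SIZE (4 * BATCH_SIZE)

/* Hypothetical stand-ins for an eth Rx queue and the event enqueue buffer. */
struct sim_queue  { uint32_t pending; };
struct sim_buffer { uint32_t count; };

/* Dequeue up to max packets; stands in for rte_eth_rx_burst(). */
static uint16_t sim_rx_burst(struct sim_queue *q, uint16_t max)
{
	uint16_t n = q->pending < max ? (uint16_t)q->pending : max;

	q->pending -= n;
	return n;
}

/* Assumption: the flush always succeeds and empties the buffer. */
static void sim_flush(struct sim_buffer *buf)
{
	buf->count = 0;
}

/* Same loop shape as rxa_eth_rx() in the patch. */
static uint32_t sim_eth_rx(struct sim_queue *q, struct sim_buffer *buf,
			   uint32_t rx_count, uint32_t max_rx)
{
	uint32_t nb_rx = 0;
	uint16_t n;

	/* Only dequeue a batch if the buffer can hold a full batch. */
	while (BATCH_SIZE <= EVENT_BUFFER_SIZE - buf->count) {
		if (buf->count >= BATCH_SIZE)
			sim_flush(buf);

		n = sim_rx_burst(q, BATCH_SIZE);
		if (n == 0)
			break;
		buf->count += n;
		nb_rx += n;
		/* rx_count is what the caller already received this round. */
		if (rx_count + nb_rx > max_rx)
			break;
	}

	if (buf->count >= BATCH_SIZE)
		sim_flush(buf);

	return nb_rx;
}
```

Note that the budget check uses a strict '>' after the batch has been added, so a call can overshoot max_rx by up to one batch.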
[dpdk-dev] [Bug 67] multi_process/l2fwd_fork failed to compile
https://bugs.dpdk.org/show_bug.cgi?id=67

Bug ID: 67
Summary: multi_process/l2fwd_fork failed to compile
Product: DPDK
Version: 18.05
Hardware: All
OS: All
Status: CONFIRMED
Severity: normal
Priority: Normal
Component: examples
Assignee: dev@dpdk.org
Reporter: wangl...@infoch.cn
Target Milestone: ---

  CC main.o
/root/dpdk-18.05/examples/multi_process/l2fwd_fork/main.c: In function ‘main’:
/root/dpdk-18.05/examples/multi_process/l2fwd_fork/main.c:1043:33: error: ‘dev_info’ undeclared (first use in this function)
  rte_eth_dev_info_get(portid, &dev_info);
                                ^
/root/dpdk-18.05/examples/multi_process/l2fwd_fork/main.c:1043:33: note: each undeclared identifier is reported only once for each function it appears in
/root/dpdk-18.05/examples/multi_process/l2fwd_fork/main.c:1077:11: error: ‘struct rte_eth_txconf’ has no member named ‘tx_offloads’
  txq_conf.tx_offloads = local_port_conf.txmode.offloads;
          ^
make[1]: *** [main.o] Error 1
make: *** [all] Error 2

--
You are receiving this mail because:
You are the assignee for the bug.
[dpdk-dev] [PATCH] eal: fix device be attached twice
If an already attached PCI device is attached again,
rte_pci_device->device.name gets corrupted by an unexpected
rte_devargs_remove.

Fixes: 7e8b26650146 ("eal: fix hotplug add / remove")
Cc: sta...@dpdk.org

Signed-off-by: Qi Zhang
---
 lib/librte_eal/common/eal_common_dev.c | 21 +++--
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index 61cb3b162..14c5f05fa 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -42,18 +42,6 @@ static struct dev_event_cb_list dev_event_cbs;
 /* spinlock for device callbacks */
 static rte_spinlock_t dev_event_lock = RTE_SPINLOCK_INITIALIZER;
 
-static int cmp_detached_dev_name(const struct rte_device *dev,
-	const void *_name)
-{
-	const char *name = _name;
-
-	/* skip attached devices */
-	if (dev->driver != NULL)
-		return 1;
-
-	return strcmp(dev->name, name);
-}
-
 static int cmp_dev_name(const struct rte_device *dev, const void *_name)
 {
 	const char *name = _name;
@@ -151,14 +139,19 @@ int __rte_experimental rte_eal_hotplug_add(const char *busname, const char *devn
 	if (ret)
 		goto err_devarg;
 
-	dev = bus->find_device(NULL, cmp_detached_dev_name, devname);
+	dev = bus->find_device(NULL, cmp_dev_name, devname);
 	if (dev == NULL) {
-		RTE_LOG(ERR, EAL, "Cannot find unplugged device (%s)\n",
+		RTE_LOG(ERR, EAL, "Cannot find device (%s)\n",
 			devname);
 		ret = -ENODEV;
 		goto err_devarg;
 	}
 
+	if (dev->driver != NULL) {
+		RTE_LOG(ERR, EAL, "Device is already plugged\n");
+		return -EEXIST;
+	}
+
 	ret = bus->plug(dev);
 	if (ret) {
 		RTE_LOG(ERR, EAL, "Driver cannot attach the device (%s)\n",
-- 
2.13.6
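To illustrate the behaviour change, here is a small standalone sketch of the lookup order after this patch: find the device by name first, then reject it if a driver is already bound, instead of treating an attached device as "not found". The sim_device type and sim_hotplug_add function are hypothetical and only model the two checks; they do not touch devargs or a real bus.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct rte_device. */
struct sim_device {
	const char *name;
	const void *driver;	/* NULL while the device is detached */
};

static struct sim_device *
sim_find_device(struct sim_device *devs, size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(devs[i].name, name) == 0)
			return &devs[i];
	return NULL;
}

/* Models the check order in rte_eal_hotplug_add() after the patch. */
static int
sim_hotplug_add(struct sim_device *devs, size_t n, const char *name,
		const void *driver)
{
	struct sim_device *dev = sim_find_device(devs, n, name);

	if (dev == NULL)
		return -ENODEV;		/* no such device on the bus */
	if (dev->driver != NULL)
		return -EEXIST;		/* already plugged, leave it alone */

	dev->driver = driver;		/* stands in for bus->plug(dev) */
	return 0;
}
```

A second attach of the same name now returns -EEXIST and leaves the first attachment untouched.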
[dpdk-dev] [PATCH] net/ixgbe: fix missing NULL point check
Add the missing NULL pointer check in ixgbe_pf_host_uninit; without it,
detaching a device may cause a segmentation fault.

Fixes: cf80ba6e2038 ("net/ixgbe: add support for representor ports")
Cc: sta...@dpdk.org

Signed-off-by: Qi Zhang
---
 drivers/net/ixgbe/ixgbe_pf.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_pf.c b/drivers/net/ixgbe/ixgbe_pf.c
index 4d199c8..73f0e43 100644
--- a/drivers/net/ixgbe/ixgbe_pf.c
+++ b/drivers/net/ixgbe/ixgbe_pf.c
@@ -128,21 +128,24 @@ void ixgbe_pf_host_uninit(struct rte_eth_dev *eth_dev)
 
 	PMD_INIT_FUNC_TRACE();
 
-	vfinfo = IXGBE_DEV_PRIVATE_TO_P_VFDATA(eth_dev->data->dev_private);
-
 	RTE_ETH_DEV_SRIOV(eth_dev).active = 0;
 	RTE_ETH_DEV_SRIOV(eth_dev).nb_q_per_pool = 0;
 	RTE_ETH_DEV_SRIOV(eth_dev).def_vmdq_idx = 0;
 	RTE_ETH_DEV_SRIOV(eth_dev).def_pool_q_idx = 0;
 
-	ret = rte_eth_switch_domain_free((*vfinfo)->switch_domain_id);
-	if (ret)
-		PMD_INIT_LOG(WARNING, "failed to free switch domain: %d", ret);
-
 	vf_num = dev_num_vf(eth_dev);
 	if (vf_num == 0)
 		return;
 
+	vfinfo = IXGBE_DEV_PRIVATE_TO_P_VFDATA(eth_dev->data->dev_private);
+
+	if (*vfinfo == NULL)
+		return;
+
+	ret = rte_eth_switch_domain_free((*vfinfo)->switch_domain_id);
+	if (ret)
+		PMD_INIT_LOG(WARNING, "failed to free switch domain: %d", ret);
+
 	rte_free(*vfinfo);
 	*vfinfo = NULL;
 }
-- 
2.5.5
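The reordering matters because the old code dereferenced *vfinfo before knowing whether any VF data existed. A minimal sketch of the resulting guard order, using hypothetical sim_* types rather than the ixgbe structures, could look like this; the domains_freed counter only stands in for the call to rte_eth_switch_domain_free().

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the per-VF data and the eth device. */
struct sim_vf_info { unsigned short switch_domain_id; };

struct sim_dev {
	unsigned short num_vfs;
	struct sim_vf_info *vfinfo;	/* NULL when never allocated or already freed */
	int domains_freed;		/* counts simulated switch domain frees */
};

/* Same guard order as ixgbe_pf_host_uninit() after the patch. */
static void sim_pf_host_uninit(struct sim_dev *dev)
{
	if (dev->num_vfs == 0)
		return;

	/* Nothing to release, and nothing safe to dereference. */
	if (dev->vfinfo == NULL)
		return;

	dev->domains_freed++;	/* only reached with valid VF data */

	free(dev->vfinfo);
	dev->vfinfo = NULL;
}
```

Calling the function a second time is then a no-op instead of a NULL dereference.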
[dpdk-dev] [PATCH] net/mlx5: activate Verbs cleanup on removal
Starting from rdma-core v19 and Mellanox OFED 4.4, the Verbs resources
cleanup is properly activated in the plug-out process when the
MLX5_DEVICE_FATAL_CLEANUP environment variable is set to 1.

Set the aforementioned variable to 1.

Signed-off-by: Matan Azrad
---
 drivers/net/mlx5/mlx5.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index f0e6ed7..d081bdd 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1409,6 +1409,11 @@
 	/* Match the size of Rx completion entry to the size of a cacheline. */
 	if (RTE_CACHE_LINE_SIZE == 128)
 		setenv("MLX5_CQE_SIZE", "128", 0);
+	/*
+	 * MLX5_DEVICE_FATAL_CLEANUP tells ibv_destroy functions to
+	 * cleanup all the Verbs resources even when the device was removed.
+	 */
+	setenv("MLX5_DEVICE_FATAL_CLEANUP", "1", 1);
 #ifdef RTE_LIBRTE_MLX5_DLOPEN_DEPS
 	if (mlx5_glue_init())
 		return;
-- 
1.9.5
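One detail worth spelling out is the last argument of setenv(): MLX5_CQE_SIZE is set with overwrite = 0, so a value exported by the user wins, while MLX5_DEVICE_FATAL_CLEANUP is set with overwrite = 1, so the driver's value always wins. The sketch below shows just that POSIX behaviour; sim_mlx5_env_init is a hypothetical function that sets both variables unconditionally, unlike the driver, which sets MLX5_CQE_SIZE only for 128-byte cache lines.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L	/* for setenv() */
#endif
#include <stdlib.h>
#include <string.h>

/* Return 1 if the environment variable exists and equals expected. */
static int env_is(const char *name, const char *expected)
{
	const char *v = getenv(name);

	return v != NULL && strcmp(v, expected) == 0;
}

/* Hypothetical init routine showing the two overwrite modes. */
static void sim_mlx5_env_init(void)
{
	/* overwrite = 0: keep whatever the user already exported */
	setenv("MLX5_CQE_SIZE", "128", 0);
	/* overwrite = 1: force the value regardless of the environment */
	setenv("MLX5_DEVICE_FATAL_CLEANUP", "1", 1);
}
```

So a user who exports MLX5_DEVICE_FATAL_CLEANUP=0 before starting the application would still end up with the value 1 after initialization.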
[dpdk-dev] [PATCH] net/i40e: fix link speed issue
When link needs to go up, I40E_AQ_PHY_AN_ENABLED is always be set in DPDK. So all speeds are always set. This causes speed config never works. This patch fixes this issue and only allows to set available speeds. If link needs to go up and speed setting is not supported, it will print warning and set default available speeds. And when link needs to go down, link speed field should be set to non-zero to avoid link down issue when binding back to kernel driver. Fixes: ca7e599d4506 ("net/i40e: fix link management") Fixes: 1bb8f661168d ("net/i40e: fix link down and negotiation") Cc: sta...@dpdk.org Signed-off-by: Xiaoyun Li --- drivers/net/i40e/i40e_ethdev.c | 58 ++ 1 file changed, 36 insertions(+), 22 deletions(-) diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index 13c5d32..272a975 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -2026,27 +2026,38 @@ i40e_phy_conf_link(struct i40e_hw *hw, struct i40e_aq_get_phy_abilities_resp phy_ab; struct i40e_aq_set_phy_config phy_conf; enum i40e_aq_phy_type cnt; + uint8_t avail_speed; uint32_t phy_type_mask = 0; const uint8_t mask = I40E_AQ_PHY_FLAG_PAUSE_TX | I40E_AQ_PHY_FLAG_PAUSE_RX | I40E_AQ_PHY_FLAG_PAUSE_RX | I40E_AQ_PHY_FLAG_LOW_POWER; - const uint8_t advt = I40E_LINK_SPEED_40GB | - I40E_LINK_SPEED_25GB | - I40E_LINK_SPEED_10GB | - I40E_LINK_SPEED_1GB | - I40E_LINK_SPEED_100MB; int ret = -ENOTSUP; + /* To get phy capabilities of available speeds. */ + status = i40e_aq_get_phy_capabilities(hw, false, true, &phy_ab, + NULL); + if (status) { + PMD_DRV_LOG(ERR, "Failed to get PHY capabilities: %d\n", + status); + return ret; + } + avail_speed = phy_ab.link_speed; + /* To get the current phy config. 
*/ status = i40e_aq_get_phy_capabilities(hw, false, false, &phy_ab, NULL); - if (status) + if (status) { + PMD_DRV_LOG(ERR, "Failed to get the current PHY config: %d\n", + status); return ret; + } - /* If link already up, no need to set up again */ - if (is_up && phy_ab.phy_type != 0) + /* If link needs to go up and its speed values are OK, no need +* to set up again. +*/ + if (is_up && phy_ab.phy_type != 0 && phy_ab.link_speed != 0) return I40E_SUCCESS; memset(&phy_conf, 0, sizeof(phy_conf)); @@ -2055,15 +2066,17 @@ i40e_phy_conf_link(struct i40e_hw *hw, abilities &= ~mask; abilities |= phy_ab.abilities & mask; - /* update ablities and speed */ - if (abilities & I40E_AQ_PHY_AN_ENABLED) - phy_conf.link_speed = advt; - else - phy_conf.link_speed = is_up ? force_speed : phy_ab.link_speed; - phy_conf.abilities = abilities; - + /* If link needs to go up, but the force speed is not supported, +* Warn users and config the default available speeds. +*/ + if (is_up && !(force_speed & avail_speed)) { + PMD_DRV_LOG(WARNING, "Invalid speed setting, set to default!\n"); + phy_conf.link_speed = avail_speed; + } else { + phy_conf.link_speed = is_up ? 
force_speed : avail_speed; + } /* To enable link, phy_type mask needs to include each type */ for (cnt = I40E_PHY_TYPE_SGMII; cnt < I40E_PHY_TYPE_MAX; cnt++) @@ -2099,6 +2112,14 @@ i40e_apply_link_speed(struct rte_eth_dev *dev) struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private); struct rte_eth_conf *conf = &dev->data->dev_conf; + if (conf->link_speeds == ETH_LINK_SPEED_AUTONEG) { + conf->link_speeds = ETH_LINK_SPEED_40G | + ETH_LINK_SPEED_25G | + ETH_LINK_SPEED_20G | + ETH_LINK_SPEED_10G | + ETH_LINK_SPEED_1G | + ETH_LINK_SPEED_100M; + } speed = i40e_parse_link_speeds(conf->link_speeds); abilities |= I40E_AQ_PHY_ENABLE_ATOMIC_LINK; if (!(conf->link_speeds & ETH_LINK_SPEED_FIXED)) @@ -2220,13 +2241,6 @@ i40e_dev_start(struct rte_eth_dev *dev) } /* Apply link configure */ - if (dev->data->dev_conf.link_speeds & ~(ETH_LINK_SPEED_100M | - ETH_LINK_SPEED_1G | ETH_LINK_SPEED_10G | - ETH_LINK_SPEED_20G | ETH_LINK_SPEED_25G | - ETH_LINK_SPEED_40G)) { -
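The speed selection added to i40e_phy_conf_link() boils down to a small decision on two bitmasks. Here is a standalone sketch of that decision; the SIM_SPEED_* bit values and the function name are made up for illustration and are not the I40E_LINK_SPEED_* constants.

```c
#include <stdint.h>

/* Hypothetical speed bits, not the real I40E_LINK_SPEED_* values. */
#define SIM_SPEED_1G  0x01
#define SIM_SPEED_10G 0x02
#define SIM_SPEED_40G 0x04

/*
 * Pick the link_speed field for the PHY config:
 * - link up, requested speed not available: warn, fall back to all
 *   available speeds
 * - link up, requested speed available: use the requested speed
 * - link down: keep the available speeds so the field is non-zero
 * *warned is set to 1 when the fallback path is taken.
 */
static uint8_t sim_pick_link_speed(int is_up, uint8_t force_speed,
				   uint8_t avail_speed, int *warned)
{
	*warned = 0;

	if (is_up && !(force_speed & avail_speed)) {
		*warned = 1;
		return avail_speed;
	}

	return is_up ? force_speed : avail_speed;
}
```

As written in the patch, the check passes as soon as one requested bit is available, so a request mixing supported and unsupported speeds is passed through unchanged.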
[dpdk-dev] [PATCH v8 00/19] enable hotplug on multi-process
v8:
- update rte_eal_version.map due to new API added.
- minor rewording of the release notes.
- minor fixes to commit logs and code style.

NOTE:
  Some issues that are not related to this patchset are expected when
  playing with the hotplug_mp sample, as below.
  - Attaching a PCI device twice may leave the device unable to be
    detached; the fix below is required:
    https://patches.dpdk.org/patch/42030/
  - An ixgbe device can't be detached; the fix below is required:
    https://patches.dpdk.org/patch/42031/

v7:
- update rte_ethdev_version.map for new APIs.
- improve code readability in __handle_secondary_request by using goto.
- add comments to explain why rte_eal_alarm_set needs to be called.
- add error log when process_mp_init_callbacks fails.
- reword release notes based on Anatoly's suggestion.
- add back previous "Acked-by" and "Reviewed-by" in commit log.

NOTE: the current patchset depends on the IPC fix below, or it may not
  be able to attach a shared vdev.
  https://patches.dpdk.org/patch/41647/

v6:
- remove bus->scan_one, since an ABI break is not necessary.
- remove the patch for the failsafe PMD since it will not support secondary.
- fix wrong implementation on ixgbe.
- add rte_eth_dev_release_port_private into rte_eth_dev_pci_generic_remove
  for secondary process, so we don't need to patch a PMD if it uses the
  default remove function.
- add release notes update.
- agreed to use strdup(peer) as a workaround for replying to a sync request
  in a separate thread.

v5:
- since we will keep the mp thread separate from the interrupt thread, it
  is not necessary to use a temporary thread; we use rte_eal_alarm_set.
- remove the change in rte_eth_dev_release_port, since there is a better
  way to prevent rte_eth_dev_release_port from being called after
  rte_eth_dev_release_port_private.
- fix the issue that the lock does not take effect on secondary due to
  the previous re-work.
- fix the issue when the first attached device is a private device from
  secondary. (patch 8/24)
- workaround for replying to a sync request in a separate thread; this is
  still open and under discussion as below.
  https://mails.dpdk.org/archives/dev/2018-June/105359.html

v4:
- since the mp thread will be merged into the interrupt thread, the v3 fix
  for the sync IPC deadlock will not work. The new version enables a
  mechanism to invoke an mp action callback in a temporary thread to
  avoid the IPC deadlock. With this, the secondary to primary request
  implementation is also simplified, since we can use a sync request
  directly in a separate thread.

v3:
- enable mp init callback registration to help non-EAL modules initialize
  the mp channel during rte_eal_init.
- fixes when attaching a shared device from secondary.
  1) deadlock due to sync IPC being invoked in rte_malloc in the primary
     process when handling a secondary request to attach a device; the
     solution is for the primary process to issue the shared device
     attach/detach in the interrupt thread.
  2) returned port_id not correct.
- check nb_sent and nb_received in sync IPC.
- fix memory leak during error handling in attach_on_secondary.
- improve clean_lock_callback to only lock/unlock the spinlock once.
- improve error code return in check-reply during async IPC.
- remove rte_ prefix of internal functions in ethdev_mp.c.
- sample code improvement.
  1) rename sample to "hotplug_mp", and move to example/multi-process.
  2) cleanup header include.
  3) call rte_eal_cleanup before exit.

v2:
- rename rte_ethdev_mp.* to ethdev_mp.*
- rename rte_ethdev_lock.* to ethdev_lock.*
- move internal functions to ethdev_private.h
- separate rte_eth_dev_[un]lock into rte_eth_dev_[un]lock and
  rte_eth_dev_[un]lock_with_callback
- lock callbacks will be removed automatically after the device is detached.
- add experimental tag for all new APIs.
- fix coding style issue.
- fix wrong license header in sample code.
- fix spelling
- fix meson.build.
- improve comments.

Background:
===========

Currently a secondary process will only sync ethdev from the primary
process at the init stage, but it will not be aware if a device is
attached/detached on the primary process at runtime.
Meanwhile, there is a requirement from applications that use the
primary-secondary process model. The primary process works as a resource
management process: it creates/destroys virtual devices at runtime,
while the secondary processes deal with the network traffic on these
devices.

Solution:
=========

So the original intention is to fix this gap, but beyond that the patch
set provides a more comprehensive solution to handle different hotplug
cases in multi-process situations. It covers the scenarios below:

1. Attach a shared device from primary
2. Detach a shared device from primary
3. Attach a shared device from secondary
4. Detach a shared device from secondary
5. Attach a private device from secondary
6. Detach a private device from secondary
7. Detach a shared device from secondary privately
8. Attach a shared device from secondary privately

In the primary-secondary process model, we assume Ethernet devices are
shared by default. That means attaching or detaching a device on any
process will broa
[dpdk-dev] [PATCH v8 05/19] ethdev: support attach or detach share device from secondary
This patch covers the multi-process hotplug case when a shared device
attach/detach request is issued from a secondary process.

device attach on secondary:
a) secondary sends a sync request to primary.
b) primary receives the request and attaches the new device;
   if that fails, goto i).
c) primary forwards the attach sync request to all secondaries.
d) each secondary receives the request, attaches the device and sends a reply.
e) primary checks the replies; if all succeeded, goto j).
f) primary sends an attach rollback sync request to all secondaries.
g) each secondary receives the request, detaches the device and sends a reply.
h) primary receives the replies and detaches the device as rollback action.
i) send fail reply to secondary, goto k).
j) send success reply to secondary.
k) the secondary process receives the reply of step a) and returns.

device detach on secondary:
a) secondary sends a sync request to primary.
b) primary receives the request and performs the pre-detach check;
   if the device is locked, goto j).
c) primary sends a pre-detach sync request to all secondaries.
d) each secondary performs the pre-detach check and sends a reply.
e) primary checks the replies; if any failed, goto j).
f) primary sends a detach sync request to all secondaries.
g) each secondary detaches the device and sends a reply.
h) primary detaches the device.
i) send success reply to secondary, goto k).
j) send fail reply to secondary.
k) the secondary process receives the reply of step a) and returns.

Signed-off-by: Qi Zhang
Reviewed-by: Anatoly Burakov
---
 lib/librte_ethdev/ethdev_mp.c | 179 --
 1 file changed, 173 insertions(+), 6 deletions(-)

diff --git a/lib/librte_ethdev/ethdev_mp.c b/lib/librte_ethdev/ethdev_mp.c
index 1d148cd5e..8d13da591 100644
--- a/lib/librte_ethdev/ethdev_mp.c
+++ b/lib/librte_ethdev/ethdev_mp.c
@@ -5,8 +5,44 @@
 #include
 #include "rte_ethdev_driver.h"
+
 #include "ethdev_mp.h"
 #include "ethdev_lock.h"
+#include "ethdev_private.h"
+
+/**
+ *
+ * secondary to primary request.
+ * start from function eth_dev_request_to_primary.
+ * + * device attach on secondary: + * a) seconary send sycn request to primary + * b) primary receive the request and attach the new device thread, + *if failed goto i). + * c) primary forward attach request to all secondary as sync request + * d) secondary receive request and attach device and send reply. + * e) primary check the reply if all success go to j). + * f) primary send attach rollback sync request to all secondary. + * g) secondary receive the request and detach device and send reply. + * h) primary receive the reply and detach device as rollback action. + * i) send fail sync reply to secondary, goto k). + * j) send success sync reply to secondary. + * k) secondary process receive reply of step a) and return. + * + * device detach on secondary: + * a) secondary send detach sync request to primary + * b) primary receive the request and perform pre-detach check, if device + *is locked, goto j). + * c) primary send pre-detach sync request to all secondary. + * d) secondary perform pre-detach check and send reply. + * e) primary check the reply if any fail goto j). + * f) primary send detach sync request to all secondary + * g) secondary detach the device and send reply + * h) primary detach the device. + * i) send success sync reply to secondary, goto k). + * j) send fail sync reply to secondary. + * k) secondary process receive reply of step a) and return. 
+ */ #define MP_TIMEOUT_S 5 /**< 5 seconds timeouts */ @@ -84,11 +120,122 @@ static int attach_on_secondary(const char *devargs, uint16_t port_id) } static int -handle_secondary_request(const struct rte_mp_msg *msg, const void *peer) +send_response_to_secondary(const struct eth_dev_mp_req *req, + int result, + const void *peer) +{ + struct rte_mp_msg mp_resp; + struct eth_dev_mp_req *resp = + (struct eth_dev_mp_req *)mp_resp.param; + int ret; + + memset(&mp_resp, 0, sizeof(mp_resp)); + mp_resp.len_param = sizeof(*resp); + strcpy(mp_resp.name, ETH_DEV_MP_ACTION_REQUEST); + memcpy(resp, req, sizeof(*req)); + resp->result = result; + + ret = rte_mp_reply(&mp_resp, peer); + if (ret) + ethdev_log(ERR, "failed to send response to secondary\n"); + + return ret; +} + +int eth_dev_request_to_secondary(struct eth_dev_mp_req *req); + +static void +__handle_secondary_request(void *param) +{ + struct mp_reply_bundle *bundle = param; + const struct rte_mp_msg *msg = &bundle->msg; + const struct eth_dev_mp_req *req = + (const struct eth_dev_mp_req *)msg->param; + struct eth_dev_mp_req tmp_req; + uint16_t port_id; + int ret = 0; + + tmp_req = *req; + + if (req->t == REQ_TYPE_ATTACH) { + ret = do_eth_dev_attach(req->devargs, &port_id); + if (ret) + goto finish; + + tmp_req.port_id = port_id; + ret = eth_dev_request_to_secondar
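The attach sequence a) to k) from the commit message can be modelled without any IPC to show the rollback logic on the primary side. The sketch below is only an in-process simulation under that assumption: sim_proc, sim_attach and sim_primary_handle_attach are hypothetical names, and each "sync request" is reduced to a direct function call.

```c
#include <stddef.h>

/* Hypothetical model of one process taking part in the hotplug. */
struct sim_proc {
	int attach_ok;	/* whether a local attach will succeed */
	int attached;	/* local view: is the device attached? */
};

static int sim_attach(struct sim_proc *p)
{
	if (!p->attach_ok)
		return -1;
	p->attached = 1;
	return 0;
}

static void sim_detach(struct sim_proc *p)
{
	p->attached = 0;
}

/*
 * Primary handling an attach request from a secondary.
 * Returns 0 for a success reply (step j) and -1 for a fail reply (step i).
 */
static int sim_primary_handle_attach(struct sim_proc *primary,
				     struct sim_proc *sec, size_t n)
{
	size_t i;
	int failed = 0;

	/* b) attach locally first; on failure reply fail right away */
	if (sim_attach(primary) != 0)
		return -1;

	/* c), d) forward the attach to every secondary, collect replies */
	for (i = 0; i < n; i++)
		if (sim_attach(&sec[i]) != 0)
			failed = 1;

	/* e) all replies successful */
	if (!failed)
		return 0;

	/* f), g) roll back on every secondary */
	for (i = 0; i < n; i++)
		sim_detach(&sec[i]);

	/* h) roll back locally, then i) reply fail */
	sim_detach(primary);
	return -1;
}
```

The property to keep in mind is that after a fail reply no process is left with the device attached.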
[dpdk-dev] [PATCH v8 02/19] eal: enable multi process init callback
Introduce new API rte_eal_register_mp_init that help to register a callback function which will be invoked right after multi-process channel be established (rte_mp_channel_init). Typically the API will be used by other module that want it's mp channel action callbacks can be registered during rte_eal_init automatically. Signed-off-by: Qi Zhang Acked-by: Anatoly Burakov --- lib/librte_eal/common/eal_common_proc.c | 57 +++-- lib/librte_eal/common/eal_private.h | 5 +++ lib/librte_eal/common/include/rte_eal.h | 34 lib/librte_eal/linuxapp/eal/eal.c | 2 ++ lib/librte_eal/rte_eal_version.map | 1 + 5 files changed, 97 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c index f010ef59e..f6d7c83e4 100644 --- a/lib/librte_eal/common/eal_common_proc.c +++ b/lib/librte_eal/common/eal_common_proc.c @@ -619,11 +619,47 @@ unlink_sockets(const char *filter) return 0; } +struct mp_init_entry { + TAILQ_ENTRY(mp_init_entry) next; + rte_eal_mp_init_callback_t callback; +}; + +TAILQ_HEAD(mp_init_entry_list, mp_init_entry); +static struct mp_init_entry_list mp_init_entry_list = + TAILQ_HEAD_INITIALIZER(mp_init_entry_list); + +static int process_mp_init_callbacks(void) +{ + struct mp_init_entry *entry; + int ret; + + TAILQ_FOREACH(entry, &mp_init_entry_list, next) { + ret = entry->callback(); + if (ret) + return ret; + } + return 0; +} + +int __rte_experimental +rte_eal_register_mp_init(rte_eal_mp_init_callback_t callback) +{ + struct mp_init_entry *entry = calloc(1, sizeof(struct mp_init_entry)); + + if (entry == NULL) + return -ENOMEM; + + entry->callback = callback; + TAILQ_INSERT_TAIL(&mp_init_entry_list, entry, next); + + return 0; +} + int rte_mp_channel_init(void) { char path[PATH_MAX]; - int dir_fd; + int dir_fd, ret; pthread_t mp_handle_tid, async_reply_handle_tid; /* create filter path */ @@ -686,7 +722,24 @@ rte_mp_channel_init(void) flock(dir_fd, LOCK_UN); close(dir_fd); - return 0; + ret = 
process_mp_init_callbacks(); + if (ret) + RTE_LOG(ERR, EAL, "failed to process mp init callbacks\n"); + + return ret; +} + +void rte_mp_init_callback_cleanup(void) +{ + struct mp_init_entry *entry; + + while (!TAILQ_EMPTY(&mp_init_entry_list)) { + TAILQ_FOREACH(entry, &mp_init_entry_list, next) { + TAILQ_REMOVE(&mp_init_entry_list, entry, next); + free(entry); + break; + } + } } /** diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h index bdadc4d50..bc230ee23 100644 --- a/lib/librte_eal/common/eal_private.h +++ b/lib/librte_eal/common/eal_private.h @@ -247,6 +247,11 @@ struct rte_bus *rte_bus_find_by_device_name(const char *str); int rte_mp_channel_init(void); /** + * Cleanup all mp channel init callbacks. + */ +void rte_mp_init_callback_cleanup(void); + +/** * Internal Executes all the user application registered callbacks for * the specific device. It is for DPDK internal user only. User * application should not call it directly. diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h index 8de5d69e8..506f17f34 100644 --- a/lib/librte_eal/common/include/rte_eal.h +++ b/lib/librte_eal/common/include/rte_eal.h @@ -512,6 +512,40 @@ __rte_deprecated const char * rte_eal_mbuf_default_mempool_ops(void); +/** + * Callback function right after multi-process channel be established. + * Typical implementation of these functions is to register mp channel + * action callbacks + * + * @return + * - 0 on success. + * - (<0) on failure. + */ +typedef int (*rte_eal_mp_init_callback_t)(void); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Register a callback function that will be invoked right after + * multi-process channel be established (rte_mp_channel_init). Typically + * the function is used by other module that want it's mp channel + * action callbacks can be registered during rte_eal_init automatically. 
+ * + * @note + * This function only take effect when be called before rte_eal_init, + * and all registered callback will be clear during rte_eal_cleanup. + * + * @param callback + * function be called at that moment. + * + * @return + * - 0 on success. + * - (<0) on failure. + */ +int __rte_experimental +rte_eal_register_mp_init(rte_eal_mp_init_callback_t callback); + #ifdef __cplusplus } #endif diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index 8655b8691..45cccff7e 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal
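For reference, the register / process / cleanup pattern used for the mp init callbacks can be tried out with nothing but sys/queue.h. This is a rough sketch under assumptions: the cb_* names and the two example callbacks are invented, and the cleanup loop uses TAILQ_FIRST instead of the while/TAILQ_FOREACH/break construct in the patch, which should be equivalent in effect.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/queue.h>

typedef int (*mp_init_cb_t)(void);

struct cb_entry {
	TAILQ_ENTRY(cb_entry) next;
	mp_init_cb_t cb;
};

TAILQ_HEAD(cb_list, cb_entry);
static struct cb_list cb_head = TAILQ_HEAD_INITIALIZER(cb_head);

/* Append a callback; it runs later, in registration order. */
static int cb_register(mp_init_cb_t cb)
{
	struct cb_entry *e = calloc(1, sizeof(*e));

	if (e == NULL)
		return -ENOMEM;
	e->cb = cb;
	TAILQ_INSERT_TAIL(&cb_head, e, next);
	return 0;
}

/* Run all callbacks; stop at the first failure and return its code. */
static int cb_process(void)
{
	struct cb_entry *e;
	int ret;

	TAILQ_FOREACH(e, &cb_head, next) {
		ret = e->cb();
		if (ret)
			return ret;
	}
	return 0;
}

/* Free every entry by repeatedly removing the list head. */
static void cb_cleanup(void)
{
	struct cb_entry *e;

	while ((e = TAILQ_FIRST(&cb_head)) != NULL) {
		TAILQ_REMOVE(&cb_head, e, next);
		free(e);
	}
}

/* Example callbacks (hypothetical) that count how often they ran. */
static int cb_calls;
static int cb_ok(void)   { cb_calls++; return 0; }
static int cb_fail(void) { cb_calls++; return -5; }
```

Since processing stops at the first failing callback, callbacks registered after it are never invoked for that init attempt.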
[dpdk-dev] [PATCH v8 01/19] ethdev: add function to release port in local process
Add driver API rte_eth_release_port_private to support the requirement that an ethdev only be released on secondary process, so only local state be set to unused, share data will not be reset so primary process can still use it. Signed-off-by: Qi Zhang Reviewed-by: Andrew Rybchenko Acked-by: Remy Horton --- lib/librte_ethdev/rte_ethdev.c| 12 lib/librte_ethdev/rte_ethdev_driver.h | 13 + lib/librte_ethdev/rte_ethdev_pci.h| 3 +++ 3 files changed, 28 insertions(+) diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c index a9977df97..52a97694c 100644 --- a/lib/librte_ethdev/rte_ethdev.c +++ b/lib/librte_ethdev/rte_ethdev.c @@ -359,6 +359,18 @@ rte_eth_dev_attach_secondary(const char *name) } int +rte_eth_dev_release_port_private(struct rte_eth_dev *eth_dev) +{ + if (eth_dev == NULL) + return -EINVAL; + + _rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_DESTROY, NULL); + eth_dev->state = RTE_ETH_DEV_UNUSED; + + return 0; +} + +int rte_eth_dev_release_port(struct rte_eth_dev *eth_dev) { if (eth_dev == NULL) diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h index c9c825e3f..49c27223d 100644 --- a/lib/librte_ethdev/rte_ethdev_driver.h +++ b/lib/librte_ethdev/rte_ethdev_driver.h @@ -70,6 +70,19 @@ int rte_eth_dev_release_port(struct rte_eth_dev *eth_dev); /** * @internal + * Release the specified ethdev port in local process, only set to ethdev + * state to unused, but not reset share data since it assume other process + * is still using it, typically it is called by secondary process. + * + * @param eth_dev + * The *eth_dev* pointer is the address of the *rte_eth_dev* structure. + * @return + * - 0 on success, negative on error + */ +int rte_eth_dev_release_port_private(struct rte_eth_dev *eth_dev); + +/** + * @internal * Release device queues and clear its configuration to force the user * application to reconfigure it. It is for internal use only. 
* diff --git a/lib/librte_ethdev/rte_ethdev_pci.h b/lib/librte_ethdev/rte_ethdev_pci.h index 2cfd37274..eeb944146 100644 --- a/lib/librte_ethdev/rte_ethdev_pci.h +++ b/lib/librte_ethdev/rte_ethdev_pci.h @@ -197,6 +197,9 @@ rte_eth_dev_pci_generic_remove(struct rte_pci_device *pci_dev, if (!eth_dev) return -ENODEV; + if (rte_eal_process_type() != RTE_PROC_PRIMARY) + return rte_eth_dev_release_port_private(eth_dev); + if (dev_uninit) { ret = dev_uninit(eth_dev); if (ret) -- 2.13.6
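Editorial note: the difference between a local-only release and a full release can be illustrated with a small stand-alone model. This is only a sketch, not DPDK code: the types and the names release_port_private and release_port are hypothetical stand-ins, and it assumes that the full release clears the shared data, which is what the commit message implies.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the ethdev state; not the DPDK definitions. */
enum port_state { PORT_UNUSED = 0, PORT_ATTACHED };

/* Models the data that lives in shared memory and is seen by all processes. */
struct shared_port_data {
	char name[32];
};

/* Models the per-process view of a port. */
struct local_port {
	enum port_state state;
	struct shared_port_data *data;
};

/* Local-only release: only this process's state changes. */
int release_port_private(struct local_port *p)
{
	if (p == NULL)
		return -1;
	p->state = PORT_UNUSED;
	return 0;
}

/* Full release (assumed behaviour): local state and shared data are reset. */
int release_port(struct local_port *p)
{
	if (p == NULL)
		return -1;
	p->state = PORT_UNUSED;
	memset(p->data, 0, sizeof(*p->data));
	return 0;
}
```

In this model, a secondary that calls release_port_private leaves the name in the shared data intact, so the primary's view of the port stays valid.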
[dpdk-dev] [PATCH v8 06/19] ethdev: support attach private device as first
When a private device is attached from a secondary process as the first device, we need to make sure rte_eth_dev_shared_data is initialized. This patch adds the necessary IPC for the secondary to ask the primary to do the initialization. Signed-off-by: Qi Zhang --- lib/librte_ethdev/ethdev_mp.c | 2 ++ lib/librte_ethdev/ethdev_mp.h | 1 + lib/librte_ethdev/ethdev_private.h | 3 +++ lib/librte_ethdev/rte_ethdev.c | 31 --- 4 files changed, 26 insertions(+), 11 deletions(-) diff --git a/lib/librte_ethdev/ethdev_mp.c b/lib/librte_ethdev/ethdev_mp.c index 8d13da591..28f89dba9 100644 --- a/lib/librte_ethdev/ethdev_mp.c +++ b/lib/librte_ethdev/ethdev_mp.c @@ -189,6 +189,8 @@ __handle_secondary_request(void *param) } else { ret = tmp_req.result; } + } else if (req->t == REQ_TYPE_SHARE_DATA_PREPARE) { + eth_dev_shared_data_prepare(); } else { ethdev_log(ERR, "unsupported secondary to primary request\n"); ret = -ENOTSUP; diff --git a/lib/librte_ethdev/ethdev_mp.h b/lib/librte_ethdev/ethdev_mp.h index 40be46c89..61fc381da 100644 --- a/lib/librte_ethdev/ethdev_mp.h +++ b/lib/librte_ethdev/ethdev_mp.h @@ -15,6 +15,7 @@ enum eth_dev_req_type { REQ_TYPE_PRE_DETACH, REQ_TYPE_DETACH, REQ_TYPE_ATTACH_ROLLBACK, + REQ_TYPE_SHARE_DATA_PREPARE, }; struct eth_dev_mp_req { diff --git a/lib/librte_ethdev/ethdev_private.h b/lib/librte_ethdev/ethdev_private.h index 981e7de8a..005d63afc 100644 --- a/lib/librte_ethdev/ethdev_private.h +++ b/lib/librte_ethdev/ethdev_private.h @@ -36,4 +36,7 @@ int do_eth_dev_attach(const char *devargs, uint16_t *port_id); */ int do_eth_dev_detach(uint16_t port_id); +/* Prepare shared data for multi-process */ +void eth_dev_shared_data_prepare(void); + #endif diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c index 7d89d9f95..408a49f44 100644 --- a/lib/librte_ethdev/rte_ethdev.c +++ b/lib/librte_ethdev/rte_ethdev.c @@ -199,11 +199,14 @@ rte_eth_find_next(uint16_t port_id) return port_id; } -static void -rte_eth_dev_shared_data_prepare(void) +void 
+eth_dev_shared_data_prepare(void) { const unsigned flags = 0; const struct rte_memzone *mz; + struct eth_dev_mp_req req; + + memset(&req, 0, sizeof(req)); rte_spinlock_lock(&rte_eth_shared_data_lock); @@ -215,6 +218,12 @@ rte_eth_dev_shared_data_prepare(void) rte_socket_id(), flags); } else mz = rte_memzone_lookup(MZ_RTE_ETH_DEV_DATA); + /* if secondary attach a private device first */ + if (mz == NULL && rte_eal_process_type() != RTE_PROC_PRIMARY) { + req.t = REQ_TYPE_SHARE_DATA_PREPARE; + eth_dev_request_to_primary(&req); + mz = rte_memzone_lookup(MZ_RTE_ETH_DEV_DATA); + } if (mz == NULL) rte_panic("Cannot allocate ethdev shared data\n"); @@ -255,7 +264,7 @@ rte_eth_dev_allocated(const char *name) { struct rte_eth_dev *ethdev; - rte_eth_dev_shared_data_prepare(); + eth_dev_shared_data_prepare(); rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock); @@ -300,7 +309,7 @@ rte_eth_dev_allocate(const char *name) uint16_t port_id; struct rte_eth_dev *eth_dev = NULL; - rte_eth_dev_shared_data_prepare(); + eth_dev_shared_data_prepare(); /* Synchronize port creation between primary and secondary threads. */ rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock); @@ -339,7 +348,7 @@ rte_eth_dev_attach_secondary(const char *name) uint16_t i; struct rte_eth_dev *eth_dev = NULL; - rte_eth_dev_shared_data_prepare(); + eth_dev_shared_data_prepare(); /* Synchronize port attachment to primary port creation and release. 
*/ rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock); @@ -379,7 +388,7 @@ rte_eth_dev_release_port(struct rte_eth_dev *eth_dev) if (eth_dev == NULL) return -EINVAL; - rte_eth_dev_shared_data_prepare(); + eth_dev_shared_data_prepare(); _rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_DESTROY, NULL); @@ -433,7 +442,7 @@ rte_eth_find_next_owned_by(uint16_t port_id, const uint64_t owner_id) int __rte_experimental rte_eth_dev_owner_new(uint64_t *owner_id) { - rte_eth_dev_shared_data_prepare(); + eth_dev_shared_data_prepare(); rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock); @@ -488,7 +497,7 @@ rte_eth_dev_owner_set(const uint16_t port_id, { int ret; - rte_eth_dev_shared_data_prepare(); + eth_dev_shared_data_prepare(); rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock); @@ -505,7 +514,7 @@ rte_eth_dev_owner_u
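Editorial note: the fallback added to eth_dev_shared_data_prepare (look up the memzone, and if a secondary cannot find it, ask the primary to create it and look again) can be modelled in a few lines. This is a hedged sketch only: zone_lookup, request_to_primary and shared_data_prepare are hypothetical stand-ins for the memzone lookup and the IPC request, and the "IPC" here is a plain function call in the same process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the shared memzone; in DPDK this lives in shared memory. */
int zone;
bool zone_reserved;
int ipc_requests; /* counts how often the secondary had to ask the primary */

/* Models a memzone lookup: NULL until the zone has been reserved. */
void *zone_lookup(void)
{
	return zone_reserved ? &zone : NULL;
}

/* Models the primary handling a "prepare shared data" request. */
void request_to_primary(void)
{
	ipc_requests++;
	zone_reserved = true;
}

/* Returns the shared zone, creating it via the primary if needed. */
void *shared_data_prepare(bool is_primary)
{
	void *mz;

	if (is_primary) {
		zone_reserved = true; /* the primary reserves the zone itself */
		return zone_lookup();
	}

	mz = zone_lookup();
	if (mz == NULL) {
		/* the secondary attached a private device first */
		request_to_primary();
		mz = zone_lookup();
	}
	return mz;
}
```

The request is only sent when the first lookup fails, so later calls from the secondary find the zone directly.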
[dpdk-dev] [PATCH v8 03/19] ethdev: enable hotplug on multi-process
We are going to introduce a solution to handle the different hotplug cases in a multi-process setup. It covers the scenarios below: 1. Attach a shared device from primary 2. Detach a shared device from primary 3. Attach a shared device from secondary 4. Detach a shared device from secondary 5. Attach a private device from secondary 6. Detach a private device from secondary 7. Detach a shared device from secondary privately 8. Attach a shared device from secondary privately In the primary-secondary process model, we assume a device is shared by default. That means an attach or detach of a device on any process is broadcast to all other processes through the mp channel, and the device information is then synchronized on all processes. Any failure during the attach process will cause inconsistent status between processes, so a proper rollback action should be considered. Also, it is not safe to detach a shared device while another process still uses it, so a handshake mechanism is introduced. This patch covers the implementation of cases 1, 2, 5, 6, 7 and 8. Cases 3 and 4 will be implemented in a separate patch, as will the handshake mechanism. Scenario for cases 1, 2: attach device a) primary attaches the new device; if this fails, goto h). b) primary sends an attach sync request to all secondaries. c) secondary receives the request, attaches the device and sends a reply. d) primary checks the replies; if all succeeded, goto i). e) primary sends an attach rollback sync request to all secondaries. f) secondary receives the request, detaches the device and sends a reply. g) primary receives the replies and detaches the device as rollback action. h) attach fail i) attach success detach device a) primary performs the pre-detach check; if the device is locked, goto i). b) primary sends a pre-detach sync request to all secondaries. c) secondary performs the pre-detach check and sends a reply. d) primary checks the replies; if any failed, goto i). e) primary sends a detach sync request to all secondaries. f) secondary detaches the device and sends a reply (assume no failure). g) primary detaches the device. 
h) detach success i) detach failed Case 5, 6: A secondary process can attach a private device which is only visible to itself. In this case no IPC is involved. The primary process is not allowed to have private devices so far. Case 7, 8: A secondary process can also temporarily detach a shared device "privately" and attach it back later; this action does not impact other processes either. API changes: rte_eth_dev_attach and rte_eth_dev_detach are extended to support shared device attach/detach in the primary-secondary process model; they will be called in cases 1, 2, 3 and 4. New APIs rte_eth_dev_attach_private and rte_eth_dev_detach_private are introduced to cover cases 5, 6, 7 and 8; these APIs can only be invoked in a secondary process. Signed-off-by: Qi Zhang --- lib/librte_ethdev/Makefile | 1 + lib/librte_ethdev/ethdev_mp.c| 261 +++ lib/librte_ethdev/ethdev_mp.h| 41 + lib/librte_ethdev/ethdev_private.h | 39 + lib/librte_ethdev/meson.build| 1 + lib/librte_ethdev/rte_ethdev.c | 210 +++-- lib/librte_ethdev/rte_ethdev.h | 45 ++ lib/librte_ethdev/rte_ethdev_core.h | 5 + lib/librte_ethdev/rte_ethdev_version.map | 2 + 9 files changed, 588 insertions(+), 17 deletions(-) create mode 100644 lib/librte_ethdev/ethdev_mp.c create mode 100644 lib/librte_ethdev/ethdev_mp.h create mode 100644 lib/librte_ethdev/ethdev_private.h diff --git a/lib/librte_ethdev/Makefile b/lib/librte_ethdev/Makefile index c2f2f7d82..d0a059b83 100644 --- a/lib/librte_ethdev/Makefile +++ b/lib/librte_ethdev/Makefile @@ -19,6 +19,7 @@ EXPORT_MAP := rte_ethdev_version.map LIBABIVER := 9 SRCS-y += rte_ethdev.c +SRCS-y += ethdev_mp.c SRCS-y += rte_flow.c SRCS-y += rte_tm.c SRCS-y += rte_mtr.c diff --git a/lib/librte_ethdev/ethdev_mp.c b/lib/librte_ethdev/ethdev_mp.c new file mode 100644 index 0..0f9d8990d --- /dev/null +++ b/lib/librte_ethdev/ethdev_mp.c @@ -0,0 +1,261 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2018 Intel Corporation + */ +#include +#include + +#include "rte_ethdev_driver.h" +#include "ethdev_mp.h" + 
+#define MP_TIMEOUT_S 5 /**< 5 seconds timeouts */ + +struct mp_reply_bundle { + struct rte_mp_msg msg; + void *peer; +}; + +static int detach_on_secondary(uint16_t port_id) +{ + struct rte_device *dev; + struct rte_bus *bus; + int ret = 0; + + if (rte_eth_devices[port_id].state == RTE_ETH_DEV_UNUSED) { + ethdev_log(ERR, "detach on secondary: invalid port %d\n", + port_id); + return -ENODEV; + } + + dev = rte_eth_devices[port_id].device; + if (dev == NULL) + return -EINVAL; + + bus = rte_bus_find_by_device(dev); + if (bus == NULL) + return -ENOENT; + + ret = rte_eal_hotplug_remove(bus->name, dev->name); + if (ret) { + ethdev_log(ERR, "failed to h
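Editorial note: the attach sequence a) to i) from the commit message of this patch (attach on primary, sync to all secondaries, roll back everywhere if any secondary fails) can be sketched as a single-process model. Everything here is hypothetical: struct proc, the fail_attach flag and attach_shared are illustration names, and the real code exchanges the requests over the mp channel rather than looping over an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one process's view of the device. */
struct proc {
	bool attached;
	bool fail_attach; /* test knob: make the attach fail on this process */
};

/* Returns 0 if the device ends up attached everywhere, -1 otherwise. */
int attach_shared(struct proc *primary, struct proc *secondaries, size_t n)
{
	bool all_ok = true;
	size_t i;

	/* a) primary attaches the new device; on failure go to h) */
	if (primary->fail_attach)
		return -1;
	primary->attached = true;

	/* b), c) every secondary receives the request and tries to attach */
	for (i = 0; i < n; i++) {
		if (secondaries[i].fail_attach)
			all_ok = false;
		else
			secondaries[i].attached = true;
	}

	/* d) all replies report success: i) attach success */
	if (all_ok)
		return 0;

	/* e), f) rollback request: every secondary detaches again */
	for (i = 0; i < n; i++)
		secondaries[i].attached = false;

	/* g) primary detaches as rollback action, h) attach fail */
	primary->attached = false;
	return -1;
}
```

The point of the rollback steps is that after a failed attach no process is left holding the device, so all processes agree on the final state.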
[dpdk-dev] [PATCH v8 07/19] net/i40e: enable port detach on secondary process
Previously, detaching a port on a secondary process would corrupt the primary process's state and prevent the same device from being attached again. By taking advantage of rte_eth_dev_release_port_private, we can support this with a minor change. Signed-off-by: Qi Zhang --- drivers/net/i40e/i40e_ethdev.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index 13c5d3296..7d1f98422 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -678,6 +678,8 @@ static int eth_i40e_pci_remove(struct rte_pci_device *pci_dev) if (!ethdev) return -ENODEV; + if (rte_eal_process_type() != RTE_PROC_PRIMARY) + return rte_eth_dev_release_port_private(ethdev); if (ethdev->data->dev_flags & RTE_ETH_DEV_REPRESENTOR) return rte_eth_dev_destroy(ethdev, i40e_vf_representor_uninit); -- 2.13.6
[dpdk-dev] [PATCH v8 04/19] ethdev: introduce device lock
Introduce APIs rte_eth_dev_lock and rte_eth_dev_unlock to let an application lock or unlock a specific ethdev. A locked device can't be detached; this helps an application prevent unexpected device detaching, especially in a multi-process environment. Also introduce the new APIs rte_eth_dev_lock_with_callback and rte_eth_dev_unlock_with_callback to let an application register a callback function which will be invoked before a device is detached. The return value of the function decides whether the device will be detached or not; this lets the application do a condition check at runtime. Signed-off-by: Qi Zhang Reviewed-by: Anatoly Burakov --- lib/librte_ethdev/Makefile | 1 + lib/librte_ethdev/ethdev_lock.c | 140 +++ lib/librte_ethdev/ethdev_lock.h | 31 +++ lib/librte_ethdev/ethdev_mp.c| 3 +- lib/librte_ethdev/meson.build| 1 + lib/librte_ethdev/rte_ethdev.c | 60 - lib/librte_ethdev/rte_ethdev.h | 124 +++ lib/librte_ethdev/rte_ethdev_version.map | 2 + 8 files changed, 360 insertions(+), 2 deletions(-) create mode 100644 lib/librte_ethdev/ethdev_lock.c create mode 100644 lib/librte_ethdev/ethdev_lock.h diff --git a/lib/librte_ethdev/Makefile b/lib/librte_ethdev/Makefile index d0a059b83..62bef03fc 100644 --- a/lib/librte_ethdev/Makefile +++ b/lib/librte_ethdev/Makefile @@ -20,6 +20,7 @@ LIBABIVER := 9 SRCS-y += rte_ethdev.c SRCS-y += ethdev_mp.c +SRCS-y += ethdev_lock.c SRCS-y += rte_flow.c SRCS-y += rte_tm.c SRCS-y += rte_mtr.c diff --git a/lib/librte_ethdev/ethdev_lock.c b/lib/librte_ethdev/ethdev_lock.c new file mode 100644 index 0..6379519e3 --- /dev/null +++ b/lib/librte_ethdev/ethdev_lock.c @@ -0,0 +1,140 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2018 Intel Corporation + */ +#include "ethdev_lock.h" + +struct lock_entry { + TAILQ_ENTRY(lock_entry) next; + rte_eth_dev_lock_callback_t callback; + uint16_t port_id; + void *user_args; + int ref_count; +}; + +TAILQ_HEAD(lock_entry_list, lock_entry); +static struct lock_entry_list 
lock_entry_list = + TAILQ_HEAD_INITIALIZER(lock_entry_list); +static rte_spinlock_t lock_entry_lock = RTE_SPINLOCK_INITIALIZER; + +int +register_lock_callback(uint16_t port_id, + rte_eth_dev_lock_callback_t callback, + void *user_args) +{ + struct lock_entry *le; + + rte_spinlock_lock(&lock_entry_lock); + + TAILQ_FOREACH(le, &lock_entry_list, next) { + if (le->port_id == port_id && + le->callback == callback && + le->user_args == user_args) + break; + } + + if (le == NULL) { + le = calloc(1, sizeof(struct lock_entry)); + if (le == NULL) { + rte_spinlock_unlock(&lock_entry_lock); + return -ENOMEM; + } + le->callback = callback; + le->port_id = port_id; + le->user_args = user_args; + TAILQ_INSERT_TAIL(&lock_entry_list, le, next); + } + le->ref_count++; + + rte_spinlock_unlock(&lock_entry_lock); + return 0; +} + +int +unregister_lock_callback(uint16_t port_id, + rte_eth_dev_lock_callback_t callback, + void *user_args) +{ + struct lock_entry *le; + int ret = 0; + + rte_spinlock_lock(&lock_entry_lock); + + TAILQ_FOREACH(le, &lock_entry_list, next) { + if (le->port_id == port_id && + le->callback == callback && + le->user_args == user_args) + break; + } + + if (le != NULL) { + le->ref_count--; + if (le->ref_count == 0) { + TAILQ_REMOVE(&lock_entry_list, le, next); + free(le); + } + } else { + ret = -ENOENT; + } + + rte_spinlock_unlock(&lock_entry_lock); + return ret; +} + +static int clean_lock_callback_one(uint16_t port_id) +{ + struct lock_entry *le; + int ret = 0; + + TAILQ_FOREACH(le, &lock_entry_list, next) { + if (le->port_id == port_id) + break; + } + + if (le != NULL) { + le->ref_count--; + if (le->ref_count == 0) { + TAILQ_REMOVE(&lock_entry_list, le, next); + free(le); + } + } else { + ret = -ENOENT; + } + + return ret; + +} + +void clean_lock_callback(uint16_t port_id) +{ + int ret; + + rte_spinlock_lock(&lock_entry_lock); + + for (;;) { + ret = clean_lock_callback_one(port_id); + if (ret == -ENOENT) + break;
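Editorial note: the commit message of this patch says a registered callback is invoked before a detach and its return value decides whether the detach proceeds. A minimal sketch of such a pre-detach check follows. It is not the patch's implementation: the fixed-size table, the names lock_with_callback and pre_detach_check, and the convention that a non-zero callback result denies the detach are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed callback shape: return 0 to allow the detach, non-zero to deny. */
typedef int (*lock_cb_t)(uint16_t port_id, void *user_args);

struct lock_slot {
	uint16_t port_id;
	lock_cb_t cb;
	void *user_args;
};

#define MAX_SLOTS 16
struct lock_slot slots[MAX_SLOTS];
size_t nb_slots;

/* Registers a callback for a port; returns -ENOSPC when the table is full. */
int lock_with_callback(uint16_t port_id, lock_cb_t cb, void *user_args)
{
	if (nb_slots == MAX_SLOTS)
		return -ENOSPC;
	slots[nb_slots].port_id = port_id;
	slots[nb_slots].cb = cb;
	slots[nb_slots].user_args = user_args;
	nb_slots++;
	return 0;
}

/* Runs every callback registered for the port; any denial blocks the detach. */
int pre_detach_check(uint16_t port_id)
{
	size_t i;

	for (i = 0; i < nb_slots; i++) {
		if (slots[i].port_id != port_id)
			continue;
		if (slots[i].cb(port_id, slots[i].user_args) != 0)
			return -EBUSY;
	}
	return 0;
}

/* Example callback: deny while the int pointed to by user_args is non-zero. */
int deny_while_busy(uint16_t port_id, void *user_args)
{
	(void)port_id;
	return *(int *)user_args != 0 ? -1 : 0;
}
```

An application could register deny_while_busy with a pointer to its own "busy" flag, so a detach request is refused until the flag is cleared.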
[dpdk-dev] [PATCH v8 08/19] net/ixgbe: enable port detach on secondary process
Previously, detaching a port on a secondary process would corrupt the primary process's state and prevent the same device from being attached again. By taking advantage of rte_eth_dev_release_port_private, we can support this with a minor change. Signed-off-by: Qi Zhang --- drivers/net/ixgbe/ixgbe_ethdev.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c index 87d2ad090..161a15f05 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.c +++ b/drivers/net/ixgbe/ixgbe_ethdev.c @@ -1792,6 +1792,9 @@ static int eth_ixgbe_pci_remove(struct rte_pci_device *pci_dev) if (!ethdev) return -ENODEV; + if (rte_eal_process_type() != RTE_PROC_PRIMARY) + return rte_eth_dev_release_port_private(ethdev); + if (ethdev->data->dev_flags & RTE_ETH_DEV_REPRESENTOR) return rte_eth_dev_destroy(ethdev, ixgbe_vf_representor_uninit); else -- 2.13.6
[dpdk-dev] [PATCH v8 11/19] net/kni: enable port detach on secondary process
Previously, detaching a port on a secondary process would corrupt the primary process's state and prevent the same device from being attached again. By taking advantage of rte_eth_dev_release_port_private, we can support this with a minor change. Signed-off-by: Qi Zhang --- drivers/net/kni/rte_eth_kni.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/net/kni/rte_eth_kni.c b/drivers/net/kni/rte_eth_kni.c index ab63ea427..e5679c76a 100644 --- a/drivers/net/kni/rte_eth_kni.c +++ b/drivers/net/kni/rte_eth_kni.c @@ -419,6 +419,7 @@ eth_kni_probe(struct rte_vdev_device *vdev) } /* TODO: request info from primary to set up Rx and Tx */ eth_dev->dev_ops = &eth_kni_ops; + eth_dev->device = &vdev->device; rte_eth_dev_probing_finish(eth_dev); return 0; } @@ -463,6 +464,16 @@ eth_kni_remove(struct rte_vdev_device *vdev) if (eth_dev == NULL) return -1; + if (rte_eal_process_type() != RTE_PROC_PRIMARY) { + /* detach device on local process only */ + if (strlen(rte_vdev_device_args(vdev)) == 0) + return rte_eth_dev_release_port_private(eth_dev); + /** +* else this is a private device for current process +* so continue with normal detach scenario +*/ + } + eth_kni_dev_stop(eth_dev); internals = eth_dev->data->dev_private; -- 2.13.6
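Editorial note: the vdev remove handlers in this series share one decision: in a secondary process, a device whose argument string is empty is released locally only, while a device with arguments is treated as private to that process and goes through the normal detach path. The sketch below models just that decision. The enum and the function vdev_remove_action are hypothetical names, and the reading that an empty argument string marks a device synced from the primary is inferred from the code comments in the patches, not stated by the author.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical result of the decision made at the top of a remove handler. */
enum remove_action {
	REMOVE_FULL,       /* normal detach: free private data, release port */
	REMOVE_LOCAL_ONLY  /* secondary only resets its local port state */
};

/*
 * is_primary models rte_eal_process_type() == RTE_PROC_PRIMARY and args
 * models the string returned by rte_vdev_device_args().
 */
enum remove_action vdev_remove_action(bool is_primary, const char *args)
{
	if (!is_primary && strlen(args) == 0)
		return REMOVE_LOCAL_ONLY;
	return REMOVE_FULL;
}
```

For example, vdev_remove_action(false, "") selects the local-only release, while vdev_remove_action(false, "iface=eth0") and any call with is_primary set fall through to the full detach.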
[dpdk-dev] [PATCH v8 09/19] net/af_packet: enable port detach on secondary process
Previously, detaching a port on a secondary process would corrupt the primary process's state and prevent the same device from being attached again. By taking advantage of rte_eth_dev_release_port_private, we can support this with a minor change. Signed-off-by: Qi Zhang --- drivers/net/af_packet/rte_eth_af_packet.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c index ea47abbf8..33ac19de8 100644 --- a/drivers/net/af_packet/rte_eth_af_packet.c +++ b/drivers/net/af_packet/rte_eth_af_packet.c @@ -935,6 +935,7 @@ rte_pmd_af_packet_probe(struct rte_vdev_device *dev) } /* TODO: request info from primary to set up Rx and Tx */ eth_dev->dev_ops = &ops; + eth_dev->device = &dev->device; rte_eth_dev_probing_finish(eth_dev); return 0; } @@ -986,6 +987,16 @@ rte_pmd_af_packet_remove(struct rte_vdev_device *dev) if (eth_dev == NULL) return -1; + if (rte_eal_process_type() != RTE_PROC_PRIMARY) { + /* detach device on local process only */ + if (strlen(rte_vdev_device_args(dev)) == 0) + return rte_eth_dev_release_port_private(eth_dev); + /** +* else this is a private device for current process +* so continue with normal detach scenario +*/ + } + internals = eth_dev->data->dev_private; for (q = 0; q < internals->nb_queues; q++) { rte_free(internals->rx_queue[q].rd); -- 2.13.6
[dpdk-dev] [PATCH v8 12/19] net/null: enable port detach on secondary process
Previously, detaching a port on a secondary process would corrupt the primary process's state and prevent the same device from being attached again. By taking advantage of rte_eth_dev_release_port_private, we can support this with a minor change. Signed-off-by: Qi Zhang --- drivers/net/null/rte_eth_null.c | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/net/null/rte_eth_null.c b/drivers/net/null/rte_eth_null.c index 1d2e6b9e9..2f040729b 100644 --- a/drivers/net/null/rte_eth_null.c +++ b/drivers/net/null/rte_eth_null.c @@ -623,6 +623,7 @@ rte_pmd_null_probe(struct rte_vdev_device *dev) } /* TODO: request info from primary to set up Rx and Tx */ eth_dev->dev_ops = &ops; + eth_dev->device = &dev->device; rte_eth_dev_probing_finish(eth_dev); return 0; } @@ -667,18 +668,31 @@ static int rte_pmd_null_remove(struct rte_vdev_device *dev) { struct rte_eth_dev *eth_dev = NULL; + const char *name; if (!dev) return -EINVAL; + name = rte_vdev_device_name(dev); + PMD_LOG(INFO, "Closing null ethdev on numa socket %u", rte_socket_id()); /* find the ethdev entry */ - eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(dev)); + eth_dev = rte_eth_dev_allocated(name); if (eth_dev == NULL) return -1; + if (rte_eal_process_type() != RTE_PROC_PRIMARY) { + /* detach device on local process only */ + if (strlen(rte_vdev_device_args(dev)) == 0) + return rte_eth_dev_release_port_private(eth_dev); + /** +* else this is a private device for current process +* so continue with normal detach scenario +*/ + } + rte_free(eth_dev->data->dev_private); rte_eth_dev_release_port(eth_dev); -- 2.13.6
[dpdk-dev] [PATCH v8 10/19] net/bonding: enable port detach on secondary process
Previously, detaching a port on a secondary process would corrupt the primary process's state and prevent the same device from being attached again. By taking advantage of rte_eth_dev_release_port_private, we can support this with a minor change. Signed-off-by: Qi Zhang --- drivers/net/bonding/rte_eth_bond_pmd.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c index f155ff779..da45ba9ba 100644 --- a/drivers/net/bonding/rte_eth_bond_pmd.c +++ b/drivers/net/bonding/rte_eth_bond_pmd.c @@ -3062,6 +3062,7 @@ bond_probe(struct rte_vdev_device *dev) } /* TODO: request info from primary to set up Rx and Tx */ eth_dev->dev_ops = &default_dev_ops; + eth_dev->device = &dev->device; rte_eth_dev_probing_finish(eth_dev); return 0; } @@ -3168,6 +3169,16 @@ bond_remove(struct rte_vdev_device *dev) if (eth_dev == NULL) return -ENODEV; + if (rte_eal_process_type() != RTE_PROC_PRIMARY) { + /* detach device on local process only */ + if (strlen(rte_vdev_device_args(dev)) == 0) + return rte_eth_dev_release_port_private(eth_dev); + /** +* else this is a private device for current process +* so continue with normal detach scenario +*/ + } + RTE_ASSERT(eth_dev->device == &dev->device); internals = eth_dev->data->dev_private; -- 2.13.6
[dpdk-dev] [PATCH v8 13/19] net/octeontx: enable port detach on secondary process
Previously, detaching a port on a secondary process would corrupt the primary process's state and prevent the same device from being attached again. By taking advantage of rte_eth_dev_release_port_private, we can support this with a minor change. Signed-off-by: Qi Zhang --- drivers/net/octeontx/octeontx_ethdev.c | 16 1 file changed, 16 insertions(+) diff --git a/drivers/net/octeontx/octeontx_ethdev.c b/drivers/net/octeontx/octeontx_ethdev.c index 1eb453b21..497bacdc6 100644 --- a/drivers/net/octeontx/octeontx_ethdev.c +++ b/drivers/net/octeontx/octeontx_ethdev.c @@ -1016,6 +1016,7 @@ octeontx_create(struct rte_vdev_device *dev, int port, uint8_t evdev, eth_dev->tx_pkt_burst = octeontx_xmit_pkts; eth_dev->rx_pkt_burst = octeontx_recv_pkts; + eth_dev->device = &dev->device; rte_eth_dev_probing_finish(eth_dev); return 0; } @@ -1138,6 +1139,18 @@ octeontx_remove(struct rte_vdev_device *dev) if (eth_dev == NULL) return -ENODEV; + if (rte_eal_process_type() != RTE_PROC_PRIMARY) { + /* detach device on local process only */ + if (strlen(rte_vdev_device_args(dev)) == 0) { + rte_eth_dev_release_port_private(eth_dev); + continue; + } + /** +* else this is a private device for current process +* so continue with normal detach scenario +*/ + } + nic = octeontx_pmd_priv(eth_dev); rte_event_dev_stop(nic->evdev); PMD_INIT_LOG(INFO, "Closing octeontx device %s", octtx_name); @@ -1148,6 +1161,9 @@ octeontx_remove(struct rte_vdev_device *dev) rte_event_dev_close(nic->evdev); } + if (rte_eal_process_type() != RTE_PROC_PRIMARY) + return 0; + /* Free FC resource */ octeontx_pko_fc_free(); -- 2.13.6
[dpdk-dev] [PATCH v8 15/19] net/softnic: enable port detach on secondary process
Previously, detaching a port on a secondary process would corrupt the primary process's state and prevent the same device from being attached again. By taking advantage of rte_eth_dev_release_port_private, we can support this with a minor change. Signed-off-by: Qi Zhang --- drivers/net/softnic/rte_eth_softnic.c | 19 --- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/drivers/net/softnic/rte_eth_softnic.c b/drivers/net/softnic/rte_eth_softnic.c index 6b3c13e5c..a45a7b0dd 100644 --- a/drivers/net/softnic/rte_eth_softnic.c +++ b/drivers/net/softnic/rte_eth_softnic.c @@ -750,6 +750,7 @@ pmd_probe(struct rte_vdev_device *vdev) } /* TODO: request info from primary to set up Rx and Tx */ eth_dev->dev_ops = &pmd_ops; + eth_dev->device = &vdev->device; rte_eth_dev_probing_finish(eth_dev); return 0; } @@ -803,17 +804,29 @@ pmd_remove(struct rte_vdev_device *vdev) { struct rte_eth_dev *dev = NULL; struct pmd_internals *p; + const char *name; if (!vdev) return -EINVAL; - PMD_LOG(INFO, "Removing device \"%s\"", - rte_vdev_device_name(vdev)); + name = rte_vdev_device_name(vdev); + PMD_LOG(INFO, "Removing device \"%s\"", name); /* Find the ethdev entry */ - dev = rte_eth_dev_allocated(rte_vdev_device_name(vdev)); + dev = rte_eth_dev_allocated(name); if (dev == NULL) return -ENODEV; + + if (rte_eal_process_type() != RTE_PROC_PRIMARY) { + /* detach device on local process only */ + if (strlen(rte_vdev_device_args(vdev)) == 0) + return rte_eth_dev_release_port_private(dev); + /** +* else this is a private device for current process +* so continue with normal detach scenario +*/ + } + p = dev->data->dev_private; /* Free device data structures*/ -- 2.13.6
[dpdk-dev] [PATCH v8 14/19] net/pcap: enable port detach on secondary process
Previously, detaching a port on a secondary process would corrupt the primary process's state and prevent the same device from being attached again. By taking advantage of rte_eth_dev_release_port_private, we can support this with a minor change. Signed-off-by: Qi Zhang --- drivers/net/pcap/rte_eth_pcap.c | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/net/pcap/rte_eth_pcap.c b/drivers/net/pcap/rte_eth_pcap.c index 6bd4a7d79..6cc20c2b2 100644 --- a/drivers/net/pcap/rte_eth_pcap.c +++ b/drivers/net/pcap/rte_eth_pcap.c @@ -925,6 +925,7 @@ pmd_pcap_probe(struct rte_vdev_device *dev) } /* TODO: request info from primary to set up Rx and Tx */ eth_dev->dev_ops = &ops; + eth_dev->device = &dev->device; rte_eth_dev_probing_finish(eth_dev); return 0; } @@ -1016,6 +1017,7 @@ static int pmd_pcap_remove(struct rte_vdev_device *dev) { struct rte_eth_dev *eth_dev = NULL; + const char *name; PMD_LOG(INFO, "Closing pcap ethdev on numa socket %d", rte_socket_id()); @@ -1023,11 +1025,22 @@ pmd_pcap_remove(struct rte_vdev_device *dev) if (!dev) return -1; + name = rte_vdev_device_name(dev); /* reserve an ethdev entry */ - eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(dev)); + eth_dev = rte_eth_dev_allocated(name); if (eth_dev == NULL) return -1; + if (rte_eal_process_type() != RTE_PROC_PRIMARY) { + /* detach device on local process only */ + if (strlen(rte_vdev_device_args(dev)) == 0) + return rte_eth_dev_release_port_private(eth_dev); + /** +* else this is a private device for current process +* so continue with normal detach scenario +*/ + } + rte_free(eth_dev->data->dev_private); rte_eth_dev_release_port(eth_dev); -- 2.13.6
[dpdk-dev] [PATCH v8 17/19] net/vhost: enable port detach on secondary process
Previously, detaching a port on a secondary process would corrupt the primary process's state and prevent the same device from being attached again. By taking advantage of rte_eth_dev_release_port_private, we can support this with a minor change. Signed-off-by: Qi Zhang --- drivers/net/vhost/rte_eth_vhost.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c index ba9d768a0..f773711b4 100644 --- a/drivers/net/vhost/rte_eth_vhost.c +++ b/drivers/net/vhost/rte_eth_vhost.c @@ -1353,6 +1353,7 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev) } /* TODO: request info from primary to set up Rx and Tx */ eth_dev->dev_ops = &ops; + eth_dev->device = &dev->device; rte_eth_dev_probing_finish(eth_dev); return 0; } @@ -1435,6 +1436,16 @@ rte_pmd_vhost_remove(struct rte_vdev_device *dev) if (eth_dev == NULL) return -ENODEV; + if (rte_eal_process_type() != RTE_PROC_PRIMARY) { + /* detach device on local process only */ + if (strlen(rte_vdev_device_args(dev)) == 0) + return rte_eth_dev_release_port_private(eth_dev); + /** +* else this is a private device for current process +* so continue with normal detach scenario +*/ + } + eth_dev_close(eth_dev); rte_free(vring_states[eth_dev->data->port_id]); -- 2.13.6
[dpdk-dev] [PATCH v8 16/19] net/tap: enable port detach on secondary process
Previously, detaching a port on a secondary process would corrupt the primary process's state and prevent the same device from being attached again. By taking advantage of rte_eth_dev_release_port_private, we can support this with a minor change. Signed-off-by: Qi Zhang Acked-by: Keith Wiles --- drivers/net/tap/rte_eth_tap.c | 17 +++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c index df396bfde..bb5f20b01 100644 --- a/drivers/net/tap/rte_eth_tap.c +++ b/drivers/net/tap/rte_eth_tap.c @@ -1759,6 +1759,7 @@ rte_pmd_tap_probe(struct rte_vdev_device *dev) } /* TODO: request info from primary to set up Rx and Tx */ eth_dev->dev_ops = &ops; + eth_dev->device = &dev->device; rte_eth_dev_probing_finish(eth_dev); return 0; } @@ -1827,12 +1828,24 @@ rte_pmd_tap_remove(struct rte_vdev_device *dev) { struct rte_eth_dev *eth_dev = NULL; struct pmd_internals *internals; + const char *name; int i; + name = rte_vdev_device_name(dev); /* find the ethdev entry */ - eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(dev)); + eth_dev = rte_eth_dev_allocated(name); if (!eth_dev) - return 0; + return -ENODEV; + + if (rte_eal_process_type() != RTE_PROC_PRIMARY) { + /* detach device on local process only */ + if (strlen(rte_vdev_device_args(dev)) == 0) + return rte_eth_dev_release_port_private(eth_dev); + /** +* else this is a private device for current process +* so continue with normal detach scenario +*/ + } internals = eth_dev->data->dev_private; -- 2.13.6
[dpdk-dev] [PATCH v8 18/19] examples/multi_process: add hotplug sample
The sample code demonstrates device (ethdev only) management in a multi-process environment. The user can attach/detach a device on the primary process and see it synced on the secondary process automatically. The user can also lock a device to prevent it from being detached, or unlock it to go back to the default behaviour. How to start? ./hotplug_mp --proc-type=auto Command Line Example: >help >list /* attach an af_packet vdev */ >attach net_af_packet,iface=eth0 /* detach port 0 */ >detach 0 /* attach a private af_packet vdev (secondary process only)*/ >attachp net_af_packet,iface=eth0 /* detach a private device (secondary process only) */ >detachp 0 /* lock port 0 */ >lock 0 /* unlock port 0 */ >unlock 0 Signed-off-by: Qi Zhang --- examples/multi_process/Makefile | 1 + examples/multi_process/hotplug_mp/Makefile | 23 ++ examples/multi_process/hotplug_mp/commands.c | 356 +++ examples/multi_process/hotplug_mp/commands.h | 10 + examples/multi_process/hotplug_mp/main.c | 41 +++ 5 files changed, 431 insertions(+) create mode 100644 examples/multi_process/hotplug_mp/Makefile create mode 100644 examples/multi_process/hotplug_mp/commands.c create mode 100644 examples/multi_process/hotplug_mp/commands.h create mode 100644 examples/multi_process/hotplug_mp/main.c diff --git a/examples/multi_process/Makefile b/examples/multi_process/Makefile index a6708b7e4..b76b02fcb 100644 --- a/examples/multi_process/Makefile +++ b/examples/multi_process/Makefile @@ -13,5 +13,6 @@ include $(RTE_SDK)/mk/rte.vars.mk DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += client_server_mp DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += simple_mp DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += symmetric_mp +DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += hotplug_mp include $(RTE_SDK)/mk/rte.extsubdir.mk diff --git a/examples/multi_process/hotplug_mp/Makefile b/examples/multi_process/hotplug_mp/Makefile new file mode 100644 index 0..c09a57bfa --- /dev/null +++ b/examples/multi_process/hotplug_mp/Makefile @@ -0,0 +1,23 @@ +# SPDX-License-Identifier: BSD-3-Clause 
+# Copyright(c) 2010-2014 Intel Corporation + +ifeq ($(RTE_SDK),) +$(error "Please define RTE_SDK environment variable") +endif + +# Default target, can be overridden by command line or environment +RTE_TARGET ?= x86_64-native-linuxapp-gcc + +include $(RTE_SDK)/mk/rte.vars.mk + +# binary name +APP = hotplug_mp + +# all source are stored in SRCS-y +SRCS-y := main.c commands.c + +CFLAGS += -O3 +CFLAGS += $(WERROR_FLAGS) +CFLAGS += -DALLOW_EXPERIMENTAL_API + +include $(RTE_SDK)/mk/rte.extapp.mk diff --git a/examples/multi_process/hotplug_mp/commands.c b/examples/multi_process/hotplug_mp/commands.c new file mode 100644 index 0..31f9e2e15 --- /dev/null +++ b/examples/multi_process/hotplug_mp/commands.c @@ -0,0 +1,356 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2018 Intel Corporation. + */ + +#include +#include +#include +#include +#include +#include +#include + +/**/ + +struct cmd_help_result { + cmdline_fixed_string_t help; +}; + +static void cmd_help_parsed(__attribute__((unused)) void *parsed_result, + struct cmdline *cl, + __attribute__((unused)) void *data) +{ + cmdline_printf(cl, + "commands:\n" + "- attach \n" + "- detach \n" + "- attachp \n" + "- detachp \n" + "- lock \n" + "- unlock \n" + "- list\n\n"); +} + +cmdline_parse_token_string_t cmd_help_help = + TOKEN_STRING_INITIALIZER(struct cmd_help_result, help, "help"); + +cmdline_parse_inst_t cmd_help = { + .f = cmd_help_parsed, /* function to call */ + .data = NULL, /* 2nd arg of func */ + .help_str = "show help", + .tokens = {/* token list, NULL terminated */ + (void *)&cmd_help_help, + NULL, + }, +}; + +/**/ + +struct cmd_quit_result { + cmdline_fixed_string_t quit; +}; + +static void cmd_quit_parsed(__attribute__((unused)) void *parsed_result, + struct cmdline *cl, + __attribute__((unused)) void *data) +{ + cmdline_quit(cl); +} + +cmdline_parse_token_string_t cmd_quit_quit = + TOKEN_STRING_INITIALIZER(struct cmd_quit_result, quit, "quit"); + +cmdline_parse_inst_t cmd_quit = { + .f = 
cmd_quit_parsed, /* function to call */ + .data = NULL, /* 2nd arg of func */ + .help_str = "quit", + .tokens = {/* token list, NULL terminated */ + (void *)&cmd_quit_quit, + NULL, + }, +}; + +/**/ + +struct cmd_list_result { + cmdline_fixed_string_t list; +}; + +static void cmd_
[dpdk-dev] [PATCH v8 19/19] doc: update release notes for multi process hotplug
Update release notes for the new multi process hotplug feature. Signed-off-by: Qi Zhang --- doc/guides/rel_notes/release_18_08.rst | 20 1 file changed, 20 insertions(+) diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst index bc0124295..93a813340 100644 --- a/doc/guides/rel_notes/release_18_08.rst +++ b/doc/guides/rel_notes/release_18_08.rst @@ -46,6 +46,21 @@ New Features Flow API support has been added to CXGBE Poll Mode Driver to offload flows to Chelsio T5/T6 NICs. +* **Support etherdev multi-process hotplug.** + + Hotplug and hot-unplug for ethdev devices will now be supported in + multiprocessing scenario. Any ethdev devices created in the primary + process will be regarded as shared and will be available for all DPDK + processes, while secondary processes will have a choice between adding + a private (non-shared) or a shared device. Synchronization between + processes will be done using DPDK IPC. + +* **Support etherdev locking.** + + Application can now lock an ethernet device to prevent unexpected device + removal. Devices can either be locked unconditionally, or an application + can register for a callback before unplug for the purposes of performing + cleanup before releasing the device (or have a chance to deny unplug) API Changes --- @@ -60,6 +75,11 @@ API Changes Also, make sure to start the actual text at the margin. = +* ethdev: scope of rte_eth_dev_attach and rte_eth_dev_detach is extended. + + In primary-secondary process model, ``rte_eth_dev_attach`` will guarantee + that device be attached on all processes, while ``rte_eth_dev_detach`` + will guarantee device be detached on all processes. ABI Changes --- -- 2.13.6
[dpdk-dev] [PATCH v5 1/9] vhost: advertise support in-order feature
Devices that always use descriptors in the same order in which they were made available can offer the VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge allows a device to notify the virtio driver that a batch of buffers has been used by writing only the used ring index. The vhost-user device supports this feature by default. If vhost dequeue zero copy is enabled, VIRTIO_F_IN_ORDER should be disabled, as vhost cannot assure that descriptors returned from the NIC are in order. Signed-off-by: Marvin Liu Reviewed-by: Maxime Coquelin diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c index 0399c37bc..d63031747 100644 --- a/lib/librte_vhost/socket.c +++ b/lib/librte_vhost/socket.c @@ -853,6 +853,12 @@ rte_vhost_driver_register(const char *path, uint64_t flags) vsocket->supported_features = VIRTIO_NET_SUPPORTED_FEATURES; vsocket->features = VIRTIO_NET_SUPPORTED_FEATURES; + /* Dequeue zero copy can't assure descriptors returned in order */ + if (vsocket->dequeue_zero_copy) { + vsocket->supported_features &= ~(1ULL << VIRTIO_F_IN_ORDER); + vsocket->features &= ~(1ULL << VIRTIO_F_IN_ORDER); + } + if (!(flags & RTE_VHOST_USER_IOMMU_SUPPORT)) { vsocket->supported_features &= ~(1ULL << VIRTIO_F_IOMMU_PLATFORM); vsocket->features &= ~(1ULL << VIRTIO_F_IOMMU_PLATFORM); diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 786a74f64..3437b996b 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -191,6 +191,13 @@ struct vhost_msg { #define VIRTIO_F_VERSION_1 32 #endif +/* + * Available and used descs are in same order + */ +#ifndef VIRTIO_F_IN_ORDER +#define VIRTIO_F_IN_ORDER 35 +#endif + /* Features supported by this builtin vhost-user net driver. 
*/ #define VIRTIO_NET_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \ (1ULL << VIRTIO_F_ANY_LAYOUT) | \ @@ -214,7 +221,8 @@ struct vhost_msg { (1ULL << VIRTIO_NET_F_GUEST_ECN) | \ (1ULL << VIRTIO_RING_F_INDIRECT_DESC) | \ (1ULL << VIRTIO_RING_F_EVENT_IDX) | \ - (1ULL << VIRTIO_NET_F_MTU) | \ + (1ULL << VIRTIO_NET_F_MTU) | \ + (1ULL << VIRTIO_F_IN_ORDER) | \ (1ULL << VIRTIO_F_IOMMU_PLATFORM)) -- 2.17.0
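As a rough illustration of the masking added to rte_vhost_driver_register() in the patch above, the sketch below clears the in-order bit from a feature mask when dequeue zero copy is requested. The helper name sketch_vhost_features and the SKETCH_ macro are hypothetical and not part of librte_vhost; only the bit number 35 is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position as defined in the patch (VIRTIO_F_IN_ORDER is 35). */
#define SKETCH_VIRTIO_F_IN_ORDER 35

/*
 * Hypothetical helper, not a librte_vhost function: return the feature
 * mask to advertise, dropping in-order when dequeue zero copy is on.
 */
static uint64_t
sketch_vhost_features(uint64_t supported, bool dequeue_zero_copy)
{
	/* 1ULL is required: bit 35 does not fit in a 32-bit constant. */
	if (dequeue_zero_copy)
		supported &= ~(1ULL << SKETCH_VIRTIO_F_IN_ORDER);
	return supported;
}
```

Calling sketch_vhost_features(mask, true) returns the mask with bit 35 cleared and all other bits untouched; with false the mask is returned unchanged.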
[dpdk-dev] [PATCH v5 0/9] support in-order feature
In the latest virtio spec, a new feature bit, VIRTIO_F_IN_ORDER, was introduced. When this feature has been negotiated, the virtio driver will use descriptors in ring order: starting from offset 0 in the table, and wrapping around at the end of the table. Vhost devices will always use descriptors in the same order in which they have been made available. This can reduce virtio accesses to the used ring.

Based on the updated virtio spec, this series implements an IN_ORDER prototype in the virtio driver. Because new [RT]x paths are added to the selection, two new parameters, mrg_rxbuf and in_order, are also added to the virtio-user vdev parameter list. This allows the user to configure feature bits and thereby affect [RT]x path selection.

Performance of virtio user with IN_ORDER feature:

Platform: Purley
CPU: Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
DPDK baseline: 18.05
Setup: testpmd with vhost vdev + testpmd with virtio vdev

+---------------+----------+----------+----------+
| Vhost->Virtio | 1 Queue  | 2 Queues | 4 Queues |
+---------------+----------+----------+----------+
| Inorder       | 12.0Mpps | 24.2Mpps | 26.0Mpps |
| Normal        | 12.1Mpps | 18.5Mpps | 18.9Mpps |
+---------------+----------+----------+----------+

+---------------+----------+-----------------+----------+
| Virtio->Vhost | 1 Queue  | 2 Queues        | 4 Queues |
+---------------+----------+-----------------+----------+
| Inorder       | 13.8Mpps | 10.7 ~ 15.2Mpps | 11.5Mpps |
| Normal        | 13.3Mpps | 9.8 ~ 14Mpps    | 10.5Mpps |
+---------------+----------+-----------------+----------+

+----------+---------+----------------+-----------------+
| Loopback | 1 Queue | 2 Queues       | 4 Queues        |
+----------+---------+----------------+-----------------+
| Inorder  | 7.4Mpps | 9.1 ~ 11.6Mpps | 10.5 ~ 11.3Mpps |
+----------+---------+----------------+-----------------+
| Normal   | 7.5Mpps | 7.7 ~ 9.0Mpps  | 7.6 ~ 7.8Mpps   |
+----------+---------+----------------+-----------------+

v5:
- disable simple Tx when in-order negotiated
- doc update

v4:
- disable simple [RT]x function for ARM
- squash doc update into relevant patches
- fix git-check-log and checkpatch errors

v3:
- refine [RT]x function selection logic
- fix in-order mergeable packets index error
- combine unsupport mask patch
- doc virtio in-order update
- fix checkpatch error

v2:
- merge to latest dpdk-net-virtio
- not use in_direct for normal xmit packets
- update available ring for each descriptor
- clean up IN_ORDER xmit function
- unmask feature bits when disabled in_order or mrg_rxbuf
- extract common part between IN_ORDER and normal functions
- update performance result

Marvin Liu (9):
  vhost: advertise support in-order feature
  net/virtio: add in-order feature bit definition
  net/virtio-user: add unsupported features mask
  net/virtio-user: add mrg-rxbuf and in-order vdev parameters
  net/virtio: free in-order descriptors before device start
  net/virtio: extract common part for in-order functions
  net/virtio: support in-order Rx and Tx
  net/virtio: add in-order Rx/Tx into selection
  net/virtio: advertise support in-order feature

 doc/guides/nics/virtio.rst                    |  23 +-
 drivers/net/virtio/virtio_ethdev.c            |  32 +-
 drivers/net/virtio/virtio_ethdev.h            |   7 +
 drivers/net/virtio/virtio_pci.h               |   8 +
 drivers/net/virtio/virtio_rxtx.c              | 639 --
 .../net/virtio/virtio_user/virtio_user_dev.c  |  30 +-
 .../net/virtio/virtio_user/virtio_user_dev.h  |   4 +-
 drivers/net/virtio/virtio_user_ethdev.c       |  47 +-
 drivers/net/virtio/virtqueue.c                |   8 +
 drivers/net/virtio/virtqueue.h                |   2 +
 lib/librte_vhost/socket.c                     |   6 +
 lib/librte_vhost/vhost.h                      |  10 +-
 12 files changed, 736 insertions(+), 80 deletions(-)

--
2.17.0
[dpdk-dev] [PATCH v5 4/9] net/virtio-user: add mrg-rxbuf and in-order vdev parameters
Add parameters for configuring VIRTIO_NET_F_MRG_RXBUF and VIRTIO_F_IN_ORDER feature bits. If feature is disabled, also update corresponding unsupported feature bit. Signed-off-by: Marvin Liu Reviewed-by: Maxime Coquelin diff --git a/doc/guides/nics/virtio.rst b/doc/guides/nics/virtio.rst index a42d1bb30..46e292c4d 100644 --- a/doc/guides/nics/virtio.rst +++ b/doc/guides/nics/virtio.rst @@ -331,3 +331,13 @@ The user can specify below argument in devargs. driver, and works as a HW vhost backend. This argument is used to specify a virtio device needs to work in vDPA mode. (Default: 0 (disabled)) + +#. ``mrg_rxbuf``: + +It is used to enable virtio device mergeable Rx buffer feature. +(Default: 1 (enabled)) + +#. ``in_order``: + +It is used to enable virtio device in-order feature. +(Default: 1 (enabled)) diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c index e0e956888..953c46055 100644 --- a/drivers/net/virtio/virtio_user/virtio_user_dev.c +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c @@ -375,7 +375,8 @@ virtio_user_dev_setup(struct virtio_user_dev *dev) int virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues, -int cq, int queue_size, const char *mac, char **ifname) +int cq, int queue_size, const char *mac, char **ifname, +int mrg_rxbuf, int in_order) { pthread_mutex_init(&dev->mutex, NULL); snprintf(dev->path, PATH_MAX, "%s", path); @@ -420,6 +421,16 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues, dev->device_features = VIRTIO_USER_SUPPORTED_FEATURES; } + if (!mrg_rxbuf) { + dev->device_features &= ~(1ull << VIRTIO_NET_F_MRG_RXBUF); + dev->unsupported_features |= (1ull << VIRTIO_NET_F_MRG_RXBUF); + } + + if (!in_order) { + dev->device_features &= ~(1ull << VIRTIO_F_IN_ORDER); + dev->unsupported_features |= (1ull << VIRTIO_F_IN_ORDER); + } + if (dev->mac_specified) { dev->device_features |= (1ull << VIRTIO_NET_F_MAC); } else { diff --git 
a/drivers/net/virtio/virtio_user/virtio_user_dev.h b/drivers/net/virtio/virtio_user/virtio_user_dev.h index c23ddfcc5..d6e0e137b 100644 --- a/drivers/net/virtio/virtio_user/virtio_user_dev.h +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.h @@ -48,7 +48,8 @@ int is_vhost_user_by_type(const char *path); int virtio_user_start_device(struct virtio_user_dev *dev); int virtio_user_stop_device(struct virtio_user_dev *dev); int virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues, -int cq, int queue_size, const char *mac, char **ifname); +int cq, int queue_size, const char *mac, char **ifname, +int mrg_rxbuf, int in_order); void virtio_user_dev_uninit(struct virtio_user_dev *dev); void virtio_user_handle_cq(struct virtio_user_dev *dev, uint16_t queue_idx); uint8_t virtio_user_handle_mq(struct virtio_user_dev *dev, uint16_t q_pairs); diff --git a/drivers/net/virtio/virtio_user_ethdev.c b/drivers/net/virtio/virtio_user_ethdev.c index 08fa4bd47..fcd30251f 100644 --- a/drivers/net/virtio/virtio_user_ethdev.c +++ b/drivers/net/virtio/virtio_user_ethdev.c @@ -358,8 +358,12 @@ static const char *valid_args[] = { VIRTIO_USER_ARG_QUEUE_SIZE, #define VIRTIO_USER_ARG_INTERFACE_NAME "iface" VIRTIO_USER_ARG_INTERFACE_NAME, -#define VIRTIO_USER_ARG_SERVER_MODE "server" +#define VIRTIO_USER_ARG_SERVER_MODE"server" VIRTIO_USER_ARG_SERVER_MODE, +#define VIRTIO_USER_ARG_MRG_RXBUF "mrg_rxbuf" + VIRTIO_USER_ARG_MRG_RXBUF, +#define VIRTIO_USER_ARG_IN_ORDER "in_order" + VIRTIO_USER_ARG_IN_ORDER, NULL }; @@ -464,6 +468,8 @@ virtio_user_pmd_probe(struct rte_vdev_device *dev) uint64_t cq = VIRTIO_USER_DEF_CQ_EN; uint64_t queue_size = VIRTIO_USER_DEF_Q_SZ; uint64_t server_mode = VIRTIO_USER_DEF_SERVER_MODE; + uint64_t mrg_rxbuf = 1; + uint64_t in_order = 1; char *path = NULL; char *ifname = NULL; char *mac_addr = NULL; @@ -563,6 +569,24 @@ virtio_user_pmd_probe(struct rte_vdev_device *dev) goto end; } + if (rte_kvargs_count(kvlist, VIRTIO_USER_ARG_MRG_RXBUF) == 1) { + if 
(rte_kvargs_process(kvlist, VIRTIO_USER_ARG_MRG_RXBUF, + &get_integer_arg, &mrg_rxbuf) < 0) { + PMD_INIT_LOG(ERR, "error to parse %s", +VIRTIO_USER_ARG_MRG_RXBUF); + goto end; + } + } + + if (rte_kvargs_count(kvlist, VIRTIO_USER_ARG_IN_ORDER) == 1) { + if (rte_kvargs_process(kvlist, VIRTIO_USER_ARG_IN_ORDER, +
[dpdk-dev] [PATCH v5 2/9] net/virtio: add in-order feature bit definition
If VIRTIO_F_IN_ORDER has been negotiated, driver will use descriptors in ring order: starting from offset 0 in the table, and wrapping around at the end of the table. Also introduce use_inorder_[rt]x flag for selection of IN_ORDER [RT]x handlers. Signed-off-by: Marvin Liu Reviewed-by: Maxime Coquelin diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h index a28ba8339..77f805df6 100644 --- a/drivers/net/virtio/virtio_pci.h +++ b/drivers/net/virtio/virtio_pci.h @@ -121,6 +121,12 @@ struct virtnet_ctl; #define VIRTIO_TRANSPORT_F_START 28 #define VIRTIO_TRANSPORT_F_END 34 +/* + * Inorder feature indicates that all buffers are used by the device + * in the same order in which they have been made available. + */ +#define VIRTIO_F_IN_ORDER 35 + /* The Guest publishes the used index for which it expects an interrupt * at the end of the avail ring. Host should ignore the avail->flags field. */ /* The Host publishes the avail index for which it expects a kick @@ -233,6 +239,8 @@ struct virtio_hw { uint8_t modern; uint8_t use_simple_rx; uint8_t use_simple_tx; + uint8_t use_inorder_rx; + uint8_t use_inorder_tx; uint16_tport_id; uint8_t mac_addr[ETHER_ADDR_LEN]; uint32_tnotify_off_multiplier; diff --git a/drivers/net/virtio/virtio_user_ethdev.c b/drivers/net/virtio/virtio_user_ethdev.c index 1c102ca72..8747cbf94 100644 --- a/drivers/net/virtio/virtio_user_ethdev.c +++ b/drivers/net/virtio/virtio_user_ethdev.c @@ -441,6 +441,8 @@ virtio_user_eth_dev_alloc(struct rte_vdev_device *vdev) hw->modern = 0; hw->use_simple_rx = 0; hw->use_simple_tx = 0; + hw->use_inorder_rx = 0; + hw->use_inorder_tx = 0; hw->virtio_user_dev = dev; return eth_dev; } -- 2.17.0
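To make "ring order" from the commit message above concrete, here is a minimal sketch. It assumes one descriptor per buffer and a power-of-two ring size, in which case the n-th buffer made available since start lands in slot n modulo the ring size. The name sketch_inorder_slot is hypothetical and does not exist in the driver.

```c
#include <stdint.h>

/*
 * Hypothetical helper: slot used by the n-th buffer (counting from 0)
 * when descriptors are consumed strictly in ring order.
 * Assumes ring_size is a power of two and one descriptor per buffer.
 */
static uint16_t
sketch_inorder_slot(uint16_t n, uint16_t ring_size)
{
	/* Masking equals n % ring_size for power-of-two sizes. */
	return (uint16_t)(n & (ring_size - 1));
}
```

Because the ring size divides 65536, the result stays correct even after the free-running 16-bit counter n wraps around.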
[dpdk-dev] [PATCH v5 5/9] net/virtio: free in-order descriptors before device start
Add a new function for freeing IN_ORDER descriptors. When the IN_ORDER feature has been negotiated, descriptors are allocated and freed sequentially, so there is no need to manage freed descriptors with a chain; updating the index is enough. Signed-off-by: Marvin Liu Reviewed-by: Maxime Coquelin diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 92fab2174..0bca29855 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -47,6 +47,13 @@ virtio_dev_rx_queue_done(void *rxq, uint16_t offset) return VIRTQUEUE_NUSED(vq) >= offset; } +void +vq_ring_free_inorder(struct virtqueue *vq, uint16_t desc_idx, uint16_t num) +{ + vq->vq_free_cnt += num; + vq->vq_desc_tail_idx = desc_idx & (vq->vq_nentries - 1); +} + void vq_ring_free_chain(struct virtqueue *vq, uint16_t desc_idx) { diff --git a/drivers/net/virtio/virtqueue.c b/drivers/net/virtio/virtqueue.c index a7d0a9cbe..56a77cc71 100644 --- a/drivers/net/virtio/virtqueue.c +++ b/drivers/net/virtio/virtqueue.c @@ -74,6 +74,14 @@ virtqueue_rxvq_flush(struct virtqueue *vq) desc_idx = used_idx; rte_pktmbuf_free(vq->sw_ring[desc_idx]); vq->vq_free_cnt++; + } else if (hw->use_inorder_rx) { + desc_idx = (uint16_t)uep->id; + dxp = &vq->vq_descx[desc_idx]; + if (dxp->cookie != NULL) { + rte_pktmbuf_free(dxp->cookie); + dxp->cookie = NULL; + } + vq_ring_free_inorder(vq, desc_idx, 1); } else { desc_idx = (uint16_t)uep->id; dxp = &vq->vq_descx[desc_idx]; diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h index 14364f356..26518ed98 100644 --- a/drivers/net/virtio/virtqueue.h +++ b/drivers/net/virtio/virtqueue.h @@ -306,6 +306,8 @@ virtio_get_queue_type(struct virtio_hw *hw, uint16_t vtpci_queue_idx) #define VIRTQUEUE_NUSED(vq) ((uint16_t)((vq)->vq_ring.used->idx - (vq)->vq_used_cons_idx)) void vq_ring_free_chain(struct virtqueue *vq, uint16_t desc_idx); +void vq_ring_free_inorder(struct virtqueue *vq, uint16_t desc_idx, + uint16_t num); 
static inline void vq_update_avail_idx(struct virtqueue *vq) -- 2.17.0
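The index-only bookkeeping described in this patch can be tried in isolation. The sketch below mirrors the two statements of vq_ring_free_inorder() on a stripped-down structure; struct sketch_vq and sketch_free_inorder are stand-ins invented for this note, not the driver's struct virtqueue, and the ring size is assumed to be a power of two.

```c
#include <stdint.h>

/* Hypothetical, reduced stand-in for the driver's struct virtqueue. */
struct sketch_vq {
	uint16_t nentries;      /* ring size, assumed to be a power of two */
	uint16_t free_cnt;      /* number of free descriptors */
	uint16_t desc_tail_idx; /* last freed descriptor slot */
};

/*
 * Free num descriptors ending at desc_idx without walking a chain:
 * only the free count and the tail index are updated.
 */
static void
sketch_free_inorder(struct sketch_vq *vq, uint16_t desc_idx, uint16_t num)
{
	vq->free_cnt = (uint16_t)(vq->free_cnt + num);
	vq->desc_tail_idx = (uint16_t)(desc_idx & (vq->nentries - 1));
}
```

Compared with a chained free list, nothing is written to the descriptor table itself, which is where the saving comes from.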
[dpdk-dev] [PATCH v5 3/9] net/virtio-user: add unsupported features mask
This patch introduces unsupported features mask for virtio-user device. For virtio-user server mode, when reconnecting virtio-user will retrieve vhost device features as base and then unmask unsupported features. Signed-off-by: Marvin Liu Reviewed-by: Maxime Coquelin diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c index 4322527f2..e0e956888 100644 --- a/drivers/net/virtio/virtio_user/virtio_user_dev.c +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c @@ -384,6 +384,7 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues, dev->queue_pairs = 1; /* mq disabled by default */ dev->queue_size = queue_size; dev->mac_specified = 0; + dev->unsupported_features = 0; parse_mac(dev, mac); if (*ifname) { @@ -419,10 +420,12 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues, dev->device_features = VIRTIO_USER_SUPPORTED_FEATURES; } - if (dev->mac_specified) + if (dev->mac_specified) { dev->device_features |= (1ull << VIRTIO_NET_F_MAC); - else + } else { dev->device_features &= ~(1ull << VIRTIO_NET_F_MAC); + dev->unsupported_features |= (1ull << VIRTIO_NET_F_MAC); + } if (cq) { /* device does not really need to know anything about CQ, @@ -437,6 +440,14 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues, dev->device_features &= ~(1ull << VIRTIO_NET_F_GUEST_ANNOUNCE); dev->device_features &= ~(1ull << VIRTIO_NET_F_MQ); dev->device_features &= ~(1ull << VIRTIO_NET_F_CTRL_MAC_ADDR); + dev->unsupported_features |= (1ull << VIRTIO_NET_F_CTRL_VQ); + dev->unsupported_features |= (1ull << VIRTIO_NET_F_CTRL_RX); + dev->unsupported_features |= (1ull << VIRTIO_NET_F_CTRL_VLAN); + dev->unsupported_features |= + (1ull << VIRTIO_NET_F_GUEST_ANNOUNCE); + dev->unsupported_features |= (1ull << VIRTIO_NET_F_MQ); + dev->unsupported_features |= + (1ull << VIRTIO_NET_F_CTRL_MAC_ADDR); } /* The backend will not report this feature, we add it explicitly */ @@ 
-444,6 +455,7 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues, dev->device_features |= (1ull << VIRTIO_NET_F_STATUS); dev->device_features &= VIRTIO_USER_SUPPORTED_FEATURES; + dev->unsupported_features |= ~VIRTIO_USER_SUPPORTED_FEATURES; if (rte_mem_event_callback_register(VIRTIO_USER_MEM_EVENT_CLB_NAME, virtio_user_mem_event_cb, dev)) { diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.h b/drivers/net/virtio/virtio_user/virtio_user_dev.h index d2d4cb825..c23ddfcc5 100644 --- a/drivers/net/virtio/virtio_user/virtio_user_dev.h +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.h @@ -33,6 +33,7 @@ struct virtio_user_dev { * and will be sync with device */ uint64_tdevice_features; /* supported features by device */ + uint64_tunsupported_features; /* unsupported features mask */ uint8_t status; uint16_tport_id; uint8_t mac_addr[ETHER_ADDR_LEN]; diff --git a/drivers/net/virtio/virtio_user_ethdev.c b/drivers/net/virtio/virtio_user_ethdev.c index 8747cbf94..08fa4bd47 100644 --- a/drivers/net/virtio/virtio_user_ethdev.c +++ b/drivers/net/virtio/virtio_user_ethdev.c @@ -30,7 +30,6 @@ virtio_user_server_reconnect(struct virtio_user_dev *dev) int ret; int flag; int connectfd; - uint64_t features = dev->device_features; struct rte_eth_dev *eth_dev = &rte_eth_devices[dev->port_id]; connectfd = accept(dev->listenfd, NULL, NULL); @@ -45,15 +44,8 @@ virtio_user_server_reconnect(struct virtio_user_dev *dev) return -1; } - features &= ~dev->device_features; - /* For following bits, vhost-user doesn't really need to know */ - features &= ~(1ull << VIRTIO_NET_F_MAC); - features &= ~(1ull << VIRTIO_NET_F_CTRL_VLAN); - features &= ~(1ull << VIRTIO_NET_F_CTRL_MAC_ADDR); - features &= ~(1ull << VIRTIO_NET_F_STATUS); - if (features) - PMD_INIT_LOG(ERR, "WARNING: Some features 0x%" PRIx64 " are not supported by vhost-user!", -features); + /* umask vhost-user unsupported features */ + dev->device_features &= ~(dev->unsupported_features); dev->features 
&= dev->device_features; -- 2.17.0
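A minimal sketch of the reconnect step described in this patch, assuming the backend's feature mask has already been fetched: the device features are reset to the backend's set, the recorded unsupported bits are cleared, and the negotiated features are trimmed to match. struct sketch_dev, sketch_reconnect and the backend_features parameter are hypothetical; the real driver obtains the backend features over vhost-user, which is not shown in the hunk above.

```c
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct virtio_user_dev. */
struct sketch_dev {
	uint64_t device_features;      /* features offered by the device */
	uint64_t unsupported_features; /* mask recorded at init time */
	uint64_t features;             /* features negotiated so far */
};

/* Recompute feature masks after the vhost backend reconnects. */
static void
sketch_reconnect(struct sketch_dev *dev, uint64_t backend_features)
{
	/* Start from what the backend reports (assumed already fetched). */
	dev->device_features = backend_features;
	/* Clear every bit that was marked unsupported during init. */
	dev->device_features &= ~dev->unsupported_features;
	/* Negotiated features can never exceed the device features. */
	dev->features &= dev->device_features;
}
```

Keeping the unsupported bits in one mask means new user-disabled features only need to set a bit at init time rather than being special-cased in the reconnect path.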
[dpdk-dev] [PATCH v5 6/9] net/virtio: extract common part for in-order functions
The IN_ORDER virtio-user Tx function supports Tx checksum offloading and TSO, which the normal Tx function also supports. Extract the common part into a separate function for reuse. Signed-off-by: Marvin Liu Reviewed-by: Maxime Coquelin diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 0bca29855..e9b1b496e 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -246,6 +246,55 @@ tx_offload_enabled(struct virtio_hw *hw) (var) = (val); \ } while (0) +static inline void +virtqueue_xmit_offload(struct virtio_net_hdr *hdr, + struct rte_mbuf *cookie, + int offload) +{ + if (offload) { + if (cookie->ol_flags & PKT_TX_TCP_SEG) + cookie->ol_flags |= PKT_TX_TCP_CKSUM; + + switch (cookie->ol_flags & PKT_TX_L4_MASK) { + case PKT_TX_UDP_CKSUM: + hdr->csum_start = cookie->l2_len + cookie->l3_len; + hdr->csum_offset = offsetof(struct udp_hdr, + dgram_cksum); + hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM; + break; + + case PKT_TX_TCP_CKSUM: + hdr->csum_start = cookie->l2_len + cookie->l3_len; + hdr->csum_offset = offsetof(struct tcp_hdr, cksum); + hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM; + break; + + default: + ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0); + ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0); + ASSIGN_UNLESS_EQUAL(hdr->flags, 0); + break; + } + + /* TCP Segmentation Offload */ + if (cookie->ol_flags & PKT_TX_TCP_SEG) { + virtio_tso_fix_cksum(cookie); + hdr->gso_type = (cookie->ol_flags & PKT_TX_IPV6) ? 
+ VIRTIO_NET_HDR_GSO_TCPV6 : + VIRTIO_NET_HDR_GSO_TCPV4; + hdr->gso_size = cookie->tso_segsz; + hdr->hdr_len = + cookie->l2_len + + cookie->l3_len + + cookie->l4_len; + } else { + ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0); + ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0); + ASSIGN_UNLESS_EQUAL(hdr->hdr_len, 0); + } + } +} + static inline void virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie, uint16_t needed, int use_indirect, int can_push) @@ -315,49 +364,7 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie, idx = start_dp[idx].next; } - /* Checksum Offload / TSO */ - if (offload) { - if (cookie->ol_flags & PKT_TX_TCP_SEG) - cookie->ol_flags |= PKT_TX_TCP_CKSUM; - - switch (cookie->ol_flags & PKT_TX_L4_MASK) { - case PKT_TX_UDP_CKSUM: - hdr->csum_start = cookie->l2_len + cookie->l3_len; - hdr->csum_offset = offsetof(struct udp_hdr, - dgram_cksum); - hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM; - break; - - case PKT_TX_TCP_CKSUM: - hdr->csum_start = cookie->l2_len + cookie->l3_len; - hdr->csum_offset = offsetof(struct tcp_hdr, cksum); - hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM; - break; - - default: - ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0); - ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0); - ASSIGN_UNLESS_EQUAL(hdr->flags, 0); - break; - } - - /* TCP Segmentation Offload */ - if (cookie->ol_flags & PKT_TX_TCP_SEG) { - virtio_tso_fix_cksum(cookie); - hdr->gso_type = (cookie->ol_flags & PKT_TX_IPV6) ? - VIRTIO_NET_HDR_GSO_TCPV6 : - VIRTIO_NET_HDR_GSO_TCPV4; - hdr->gso_size = cookie->tso_segsz; - hdr->hdr_len = - cookie->l2_len + - cookie->l3_len + - cookie->l4_len; - } else { - ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0); - ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0); - ASSIGN_UNLESS_EQUAL(hdr->hdr_len, 0); - } - } + virtqueue_xmit_offload(hdr, cookie, offload); do { st
[dpdk-dev] [PATCH v5 7/9] net/virtio: support in-order Rx and Tx
The IN_ORDER Rx function depends on the mergeable feature. Descriptor allocation and freeing are done in bulk.

Virtio dequeue logic:
    dequeue_burst_rx(burst mbufs)
    for (each mbuf b) {
        if (b need merge) {
            merge remained mbufs
            add merged mbuf to return mbufs list
        } else {
            add mbuf to return mbufs list
        }
    }
    if (last mbuf c need merge) {
        dequeue_burst_rx(required mbufs)
        merge last mbuf c
    }
    refill_avail_ring_bulk()
    update_avail_ring()
    return mbufs list

The IN_ORDER Tx function supports offloading features. Packets that match the "can_push" option are handled by the simple xmit function. Packets that cannot match "can_push" are handled by the original xmit function with the in-order flag.

Virtio enqueue logic:
    xmit_cleanup(used descs)
    for (each xmit mbuf b) {
        if (b can inorder xmit) {
            add mbuf b to inorder burst list
            continue
        } else {
            xmit inorder burst list
            xmit mbuf b by original function
        }
    }
    if (inorder burst list not empty) {
        xmit inorder burst list
    }
    update_avail_ring()

Signed-off-by: Marvin Liu Reviewed-by: Maxime Coquelin diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h index bb40064ea..cd8070248 100644 --- a/drivers/net/virtio/virtio_ethdev.h +++ b/drivers/net/virtio/virtio_ethdev.h @@ -83,9 +83,15 @@ uint16_t virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t virtio_recv_mergeable_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts); +uint16_t virtio_recv_mergeable_pkts_inorder(void *rx_queue, + struct rte_mbuf **rx_pkts, uint16_t nb_pkts); + uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts); +uint16_t virtio_xmit_pkts_inorder(void *tx_queue, struct rte_mbuf **tx_pkts, + uint16_t nb_pkts); + uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts); diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index e9b1b496e..6394071b8 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c 
@@ -122,6 +122,44 @@ virtqueue_dequeue_burst_rx(struct virtqueue *vq, struct rte_mbuf **rx_pkts, return i; } +static uint16_t +virtqueue_dequeue_rx_inorder(struct virtqueue *vq, + struct rte_mbuf **rx_pkts, + uint32_t *len, + uint16_t num) +{ + struct vring_used_elem *uep; + struct rte_mbuf *cookie; + uint16_t used_idx = 0; + uint16_t i; + + if (unlikely(num == 0)) + return 0; + + for (i = 0; i < num; i++) { + used_idx = vq->vq_used_cons_idx & (vq->vq_nentries - 1); + /* Desc idx same as used idx */ + uep = &vq->vq_ring.used->ring[used_idx]; + len[i] = uep->len; + cookie = (struct rte_mbuf *)vq->vq_descx[used_idx].cookie; + + if (unlikely(cookie == NULL)) { + PMD_DRV_LOG(ERR, "vring descriptor with no mbuf cookie at %u", + vq->vq_used_cons_idx); + break; + } + + rte_prefetch0(cookie); + rte_packet_prefetch(rte_pktmbuf_mtod(cookie, void *)); + rx_pkts[i] = cookie; + vq->vq_used_cons_idx++; + vq->vq_descx[used_idx].cookie = NULL; + } + + vq_ring_free_inorder(vq, used_idx, i); + return i; +} + #ifndef DEFAULT_TX_FREE_THRESH #define DEFAULT_TX_FREE_THRESH 32 #endif @@ -150,6 +188,83 @@ virtio_xmit_cleanup(struct virtqueue *vq, uint16_t num) } } +/* Cleanup from completed inorder transmits. */ +static void +virtio_xmit_cleanup_inorder(struct virtqueue *vq, uint16_t num) +{ + uint16_t i, used_idx, desc_idx, last_idx; + int16_t free_cnt = 0; + struct vq_desc_extra *dxp = NULL; + + if (unlikely(num == 0)) + return; + + for (i = 0; i < num; i++) { + struct vring_used_elem *uep; + + used_idx = vq->vq_used_cons_idx & (vq->vq_nentries - 1); + uep = &vq->vq_ring.used->ring[used_idx]; + desc_idx = (uint16_t)uep->id; + + dxp = &vq->vq_descx[desc_idx]; + vq->vq_used_cons_idx++; + + if (dxp->cookie != NULL) { + rte_pktmbuf_free(dxp->cookie); + dxp->cookie = NULL; + } + } + + last_idx = desc_idx + dxp->ndescs - 1; + free_cnt = last_idx - vq->vq_desc_tail_idx; + if (free_cnt <= 0) + free_cnt += vq->vq_nentries; + + vq_ring_free_inorder(vq,
[dpdk-dev] [PATCH v5 8/9] net/virtio: add in-order Rx/Tx into selection
With the IN_ORDER Rx/Tx paths added, the Rx/Tx path selection logic needs to be updated.

Rx path selection logic: if IN_ORDER and mergeable are both enabled, the IN_ORDER Rx path is selected. If IN_ORDER is enabled while Rx offload and mergeable are disabled, the simple Rx path is selected. Otherwise the normal Rx path is selected.

Tx path selection logic: if IN_ORDER is enabled, the IN_ORDER Tx path is selected. Otherwise the default Tx path is selected.

Signed-off-by: Marvin Liu Reviewed-by: Maxime Coquelin diff --git a/doc/guides/nics/virtio.rst b/doc/guides/nics/virtio.rst index 46e292c4d..7c099fb7c 100644 --- a/doc/guides/nics/virtio.rst +++ b/doc/guides/nics/virtio.rst @@ -201,7 +201,7 @@ The packet transmission flow is: Virtio PMD Rx/Tx Callbacks -- -Virtio driver has 3 Rx callbacks and 2 Tx callbacks. +Virtio driver has 4 Rx callbacks and 3 Tx callbacks. Rx callbacks: @@ -215,6 +215,9 @@ Rx callbacks: Vector version without mergeable Rx buffer support, also fixes the available ring indexes and uses vector instructions to optimize performance. +#. ``virtio_recv_mergeable_pkts_inorder``: + In-order version with mergeable Rx buffer support. + Tx callbacks: #. ``virtio_xmit_pkts``: @@ -223,6 +226,8 @@ Tx callbacks: #. ``virtio_xmit_pkts_simple``: Vector version fixes the available ring indexes to optimize performance. +#. ``virtio_xmit_pkts_inorder``: + In-order version. By default, the non-vector callbacks are used: @@ -254,6 +259,12 @@ Example of using the vector version of the virtio poll mode driver in testpmd -l 0-2 -n 4 -- -i --tx-offloads=0x0 --rxq=1 --txq=1 --nb-cores=1 +In-order callbacks only work on simulated virtio user vdev. + +* For Rx: If mergeable Rx buffers is enabled and in-order is enabled then +``virtio_recv_mergeable_pkts_inorder`` is used. + +* For Tx: If in-order is enabled then ``virtio_xmit_pkts_inorder`` is used. 
Interrupt mode -- diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index df50a571a..df7981ddb 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -1320,6 +1320,11 @@ set_rxtx_funcs(struct rte_eth_dev *eth_dev) PMD_INIT_LOG(INFO, "virtio: using simple Rx path on port %u", eth_dev->data->port_id); eth_dev->rx_pkt_burst = virtio_recv_pkts_vec; + } else if (hw->use_inorder_rx) { + PMD_INIT_LOG(INFO, + "virtio: using inorder mergeable buffer Rx path on port %u", + eth_dev->data->port_id); + eth_dev->rx_pkt_burst = &virtio_recv_mergeable_pkts_inorder; } else if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) { PMD_INIT_LOG(INFO, "virtio: using mergeable buffer Rx path on port %u", @@ -1335,6 +1340,10 @@ set_rxtx_funcs(struct rte_eth_dev *eth_dev) PMD_INIT_LOG(INFO, "virtio: using simple Tx path on port %u", eth_dev->data->port_id); eth_dev->tx_pkt_burst = virtio_xmit_pkts_simple; + } else if (hw->use_inorder_tx) { + PMD_INIT_LOG(INFO, "virtio: using inorder Tx path on port %u", + eth_dev->data->port_id); + eth_dev->tx_pkt_burst = virtio_xmit_pkts_inorder; } else { PMD_INIT_LOG(INFO, "virtio: using standard Tx path on port %u", eth_dev->data->port_id); @@ -1874,20 +1883,27 @@ virtio_dev_configure(struct rte_eth_dev *dev) hw->use_simple_rx = 1; hw->use_simple_tx = 1; + if (vtpci_with_feature(hw, VIRTIO_F_IN_ORDER)) { + /* Simple Tx not compatible with in-order ring */ + hw->use_inorder_tx = 1; + hw->use_simple_tx = 0; + if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) { + hw->use_inorder_rx = 1; + hw->use_simple_rx = 0; + } else { + hw->use_inorder_rx = 0; + if (rx_offloads & (DEV_RX_OFFLOAD_UDP_CKSUM | + DEV_RX_OFFLOAD_TCP_CKSUM)) + hw->use_simple_rx = 0; + } + } + #if defined RTE_ARCH_ARM64 || defined RTE_ARCH_ARM if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) { hw->use_simple_rx = 0; hw->use_simple_tx = 0; } #endif - if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) { - hw->use_simple_rx 
= 0; - hw->use_simple_tx = 0; - } - - if (rx_offloads & (DEV_RX_OFFLOAD_UDP_CKSUM | - DEV_RX_OFFLOAD_TCP_CKSUM)) - hw->use_simple_rx = 0; return 0; } -- 2.17.0
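The selection rules stated in this patch's commit message can be written down as two small pure functions. This is only a sketch of the rules as worded there, not the driver's set_rxtx_funcs(): the enum and function names are hypothetical, and other conditions the driver checks (for example the ARM NEON test) are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical labels for the paths named in the commit message. */
enum sketch_rx_path { SKETCH_RX_NORMAL, SKETCH_RX_SIMPLE, SKETCH_RX_INORDER };
enum sketch_tx_path { SKETCH_TX_DEFAULT, SKETCH_TX_INORDER };

/* Rx: in-order + mergeable picks the in-order path; in-order with no
 * mergeable buffers and no Rx checksum offload picks the simple path;
 * everything else falls back to the normal path. */
static enum sketch_rx_path
sketch_select_rx(bool in_order, bool mrg_rxbuf, bool rx_cksum_offload)
{
	if (in_order && mrg_rxbuf)
		return SKETCH_RX_INORDER;
	if (in_order && !mrg_rxbuf && !rx_cksum_offload)
		return SKETCH_RX_SIMPLE;
	return SKETCH_RX_NORMAL;
}

/* Tx: in-order picks the in-order path, otherwise the default path. */
static enum sketch_tx_path
sketch_select_tx(bool in_order)
{
	return in_order ? SKETCH_TX_INORDER : SKETCH_TX_DEFAULT;
}
```

Expressing the rules as pure functions of the negotiated bits makes it easy to check each combination of in_order and mrg_rxbuf before trying it on a virtio-user vdev.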
[dpdk-dev] [PATCH v5 9/9] net/virtio: advertise support in-order feature
Signed-off-by: Marvin Liu Reviewed-by: Maxime Coquelin diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h index cd8070248..350e9ce73 100644 --- a/drivers/net/virtio/virtio_ethdev.h +++ b/drivers/net/virtio/virtio_ethdev.h @@ -36,6 +36,7 @@ 1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE | \ 1u << VIRTIO_RING_F_INDIRECT_DESC |\ 1ULL << VIRTIO_F_VERSION_1 | \ +1ULL << VIRTIO_F_IN_ORDER| \ 1ULL << VIRTIO_F_IOMMU_PLATFORM) #define VIRTIO_PMD_SUPPORTED_GUEST_FEATURES\ diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c index 953c46055..7df600b02 100644 --- a/drivers/net/virtio/virtio_user/virtio_user_dev.c +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c @@ -371,6 +371,7 @@ virtio_user_dev_setup(struct virtio_user_dev *dev) 1ULL << VIRTIO_NET_F_GUEST_CSUM| \ 1ULL << VIRTIO_NET_F_GUEST_TSO4| \ 1ULL << VIRTIO_NET_F_GUEST_TSO6| \ +1ULL << VIRTIO_F_IN_ORDER | \ 1ULL << VIRTIO_F_VERSION_1) int -- 2.17.0
[dpdk-dev] [PATCH v2] mempool/octeontx: fix pool to aura mapping
HW needs each pool to be mapped to an aura set of 16 auras.
Previously, pool to aura mapping was considered to be 1:1.

Fixes: 02fd6c744350 ("mempool/octeontx: support allocation")
Cc: sta...@dpdk.org

Signed-off-by: Pavan Nikhilesh
Acked-by: Santosh Shukla
---
 v2 Changes:
 - use macro to avoid code duplication (Santosh).
 - use uint16_t for gaura id.

 drivers/event/octeontx/timvf_evdev.c | 2 +-
 drivers/mempool/octeontx/octeontx_fpavf.c | 45 ++-
 drivers/mempool/octeontx/octeontx_fpavf.h | 9 +
 drivers/net/octeontx/octeontx_ethdev.c| 6 +--
 drivers/net/octeontx/octeontx_rxtx.c | 2 +-
 5 files changed, 42 insertions(+), 22 deletions(-)

diff --git a/drivers/event/octeontx/timvf_evdev.c b/drivers/event/octeontx/timvf_evdev.c
index c4fbd2d86..8a045c250 100644
--- a/drivers/event/octeontx/timvf_evdev.c
+++ b/drivers/event/octeontx/timvf_evdev.c
@@ -174,7 +174,7 @@ timvf_ring_start(const struct rte_event_timer_adapter *adptr)
 	if (use_fpa) {
 		pool = (uintptr_t)((struct rte_mempool *)
 				timr->chunk_pool)->pool_id;
-		ret = octeontx_fpa_bufpool_gpool(pool);
+		ret = octeontx_fpa_bufpool_gaura(pool);
 		if (ret < 0) {
 			timvf_log_dbg("Unable to get gaura id");
 			ret = -ENOMEM;
diff --git a/drivers/mempool/octeontx/octeontx_fpavf.c b/drivers/mempool/octeontx/octeontx_fpavf.c
index 7aecaa85d..e5918c866 100644
--- a/drivers/mempool/octeontx/octeontx_fpavf.c
+++ b/drivers/mempool/octeontx/octeontx_fpavf.c
@@ -243,7 +243,7 @@ octeontx_fpapf_pool_setup(unsigned int gpool, unsigned int buf_size,
 		POOL_LTYPE(0x2) | POOL_STYPE(0) | POOL_SET_NAT_ALIGN |
 		POOL_ENA;
 
-	cfg.aid = 0;
+	cfg.aid = FPA_AURA_IDX(gpool);
 	cfg.pool_cfg = reg;
 	cfg.pool_stack_base = phys_addr;
 	cfg.pool_stack_end = phys_addr + memsz;
@@ -327,7 +327,7 @@ octeontx_fpapf_aura_attach(unsigned int gpool_index)
 	hdr.vfid = gpool_index;
 	hdr.res_code = 0;
 	memset(&cfg, 0x0, sizeof(struct octeontx_mbox_fpa_cfg));
-	cfg.aid = gpool_index; /* gpool is guara */
+	cfg.aid = gpool_index << FPA_GAURA_SHIFT;
 
 	ret = octeontx_mbox_send(&hdr, &cfg,
 					sizeof(struct octeontx_mbox_fpa_cfg),
@@ -335,7 +335,8 @@ octeontx_fpapf_aura_attach(unsigned int gpool_index)
 	if (ret < 0) {
 		fpavf_log_err("Could not attach fpa ");
 		fpavf_log_err("aura %d to pool %d. Err=%d. FuncErr=%d\n",
-			gpool_index, gpool_index, ret, hdr.res_code);
+			gpool_index << FPA_GAURA_SHIFT, gpool_index, ret,
+			hdr.res_code);
 		ret = -EACCES;
 		goto err;
 	}
@@ -355,14 +356,15 @@ octeontx_fpapf_aura_detach(unsigned int gpool_index)
 		goto err;
 	}
 
-	cfg.aid = gpool_index; /* gpool is gaura */
+	cfg.aid = gpool_index << FPA_GAURA_SHIFT;
 	hdr.coproc = FPA_COPROC;
 	hdr.msg = FPA_DETACHAURA;
 	hdr.vfid = gpool_index;
 	ret = octeontx_mbox_send(&hdr, &cfg, sizeof(cfg), NULL, 0);
 	if (ret < 0) {
 		fpavf_log_err("Couldn't detach FPA aura %d Err=%d FuncErr=%d\n",
-			gpool_index, ret, hdr.res_code);
+			gpool_index << FPA_GAURA_SHIFT, ret,
+			hdr.res_code);
 		ret = -EINVAL;
 	}
 
@@ -469,6 +471,7 @@ octeontx_fpa_bufpool_free_count(uintptr_t handle)
 {
 	uint64_t cnt, limit, avail;
 	uint8_t gpool;
+	uint16_t gaura;
 	uintptr_t pool_bar;
 
 	if (unlikely(!octeontx_fpa_handle_valid(handle)))
@@ -476,14 +479,16 @@ octeontx_fpa_bufpool_free_count(uintptr_t handle)
 
 	/* get the gpool */
 	gpool = octeontx_fpa_bufpool_gpool(handle);
+	/* get the aura */
+	gaura = octeontx_fpa_bufpool_gaura(handle);
 
 	/* Get pool bar address from handle */
 	pool_bar = handle & ~(uint64_t)FPA_GPOOL_MASK;
 
 	cnt = fpavf_read64((void *)((uintptr_t)pool_bar +
-				FPA_VF_VHAURA_CNT(gpool)));
+				FPA_VF_VHAURA_CNT(gaura)));
 	limit = fpavf_read64((void *)((uintptr_t)pool_bar +
-				FPA_VF_VHAURA_CNT_LIMIT(gpool)));
+				FPA_VF_VHAURA_CNT_LIMIT(gaura)));
 
 	avail = fpavf_read64((void *)((uintptr_t)pool_bar +
 				FPA_VF_VHPOOL_AVAILABLE(gpool)));
@@ -496,6 +501,7 @@ octeontx_fpa_bufpool_create(unsigned int object_size, unsigned int object_count,
 				unsigned int buf_offset, int node_id)
 {
 	unsigned int gpool;
+	unsigned int gaura;
 	uintptr_t gpool_handle;
 	uintptr_t pool_bar;
 	int res;
@@ -545,16 +551,18 @@ octeontx
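To make the commit message concrete: if each pool owns a set of 16 auras, the first aura of pool N would sit at index N * 16. The sketch below assumes a shift of 4 (log2 of 16) for FPA_GAURA_SHIFT; the header that defines the real macros is not visible in this part of the thread, so the `SKETCH_*` constants and both helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: 16 auras per pool, hence a shift of log2(16) = 4. */
#define SKETCH_AURAS_PER_POOL 16u
#define SKETCH_GAURA_SHIFT    4u

static_assert((1u << SKETCH_GAURA_SHIFT) == SKETCH_AURAS_PER_POOL,
	      "shift must match the aura set size");

/* First aura of the set owned by gpool (hypothetical helper). */
static uint16_t sketch_gpool_to_gaura(uint8_t gpool)
{
	/* Widen before shifting: the result may not fit in 8 bits. */
	return (uint16_t)((uint16_t)gpool << SKETCH_GAURA_SHIFT);
}

/* Pool that owns a given aura (hypothetical helper). */
static uint8_t sketch_gaura_to_gpool(uint16_t gaura)
{
	return (uint8_t)(gaura >> SKETCH_GAURA_SHIFT);
}
```

Under these assumptions the old 1:1 mapping is only correct for pool 0, and the widened result type lines up with the v2 note about using uint16_t for the gaura id, since a uint8_t gpool shifted left by 4 can exceed 255.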
Re: [dpdk-dev] [PATCH v2 4/4] net/ena: enable WC
2018-06-28 15:15 GMT+02:00 Rafal Kozik :
>
> Write combining (WC) increases NIC performance by making better
> utilization of PCI bus. ENA PMD may make usage of this feature.
>
> To enable it load igb_uio driver with wc_activate set to 1.
>
> Signed-off-by: Rafal Kozik
> Acked-by: Bruce Richardson

Acked-by: Michal Krawczyk

> ---
>  drivers/net/ena/ena_ethdev.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
> index 9ae73e3..1870edf 100644
> --- a/drivers/net/ena/ena_ethdev.c
> +++ b/drivers/net/ena/ena_ethdev.c
> @@ -2210,7 +2210,8 @@ static int eth_ena_pci_remove(struct rte_pci_device
> *pci_dev)
>
>  static struct rte_pci_driver rte_ena_pmd = {
>  	.id_table = pci_id_ena_map,
> -	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
> +	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
> +		     RTE_PCI_DRV_WC_ACTIVATE,
>  	.probe = eth_ena_pci_probe,
>  	.remove = eth_ena_pci_remove,
>  };
> --
> 2.7.4
>