Re: [dpdk-dev] [RFC 0/5] rte_flow extension for vSwitch acceleration
Hi Zhang,

On Wed, Dec 20, 2017 at 04:34:44PM -0500, Qi Zhang wrote:
> This patch extends the rte_flow API.
> The purpose is to provide a comfortable programming interface for virtual switch
> software (such as OVS) to take advantage of the vSwitch acceleration capability
> of incoming devices when using DPDK as the data plane.
>
> Below is a summary of the changes:
>
> 1. Support specifying a flow's destination as an ethdev interface.
>
> Add action RTE_FLOW_ACTION_TYPE_ETHDEV_PORT, using port_id to identify
> the destination. A typical use case: with a smart NIC used for vSwitch
> acceleration, a flow is defined to redirect packets between switch ports
> that are managed by a Port Representor.
> See patch for Port Representor: http://dpdk.org/dev/patchwork/patch/31458/
>
> 2. Enhanced flow statistics query.
> Enhance action RTE_FLOW_ACTION_COUNT by adding last-hit timestamp tracking,
> which is a requirement from OVS.

This seems to introduce a regression for drop flows, and for hardware unable
to timestamp packets, which already use the count action.
Why not use the timestamp API in conjunction with the mark id action?

> 3. Add flow timeout support, as required by OVS.
> The application is able to:
> a) Set the time duration of a flow; the flow is expected to be deleted
>    automatically on timeout.
> b) Ping a flow to check whether it is active.
> c) Register a callback function invoked when a flow is deleted due to timeout.
>
> 4. Add protocol headers which will be supported by incoming devices.
>
> New protocol headers include IPv4 ARP, IPv6 ICMP and the IPv6 extension header.
>
> 5. Add packet modification actions which will be supported by incoming devices.
>
> Add new actions that can be used to modify packet content with generic semantics:
>
> RTE_FLOW_ACTION_TYPE_FIELD_UPDATE: update a specific field of the packet
> RTE_FLOW_ACTION_TYPE_FIELD_INCREMENT: increment a specific field of the packet
> RTE_FLOW_ACTION_TYPE_FIELD_DECREMENT: decrement a specific field of the packet
> RTE_FLOW_ACTION_TYPE_FIELD_COPY: copy data from one field to another in the
> packet.
>
> All actions use a struct rte_flow_item parameter to match the pattern that is
> going to be modified; if the pattern does not match, the action is simply skipped.
> These actions are non-terminating: they do not affect the fate of the
> packet, since pattern matching is expected to be performed before the packet
> is modified.
>
> Note:
> The RFC patch is based on v17.11.
> Testpmd command line support is not included.
>
> Qi Zhang (4):
>   ether: add flow action to redirect packet in a switch domain
>   ether: add flow last hit query support
>   ether: Add flow timeout support
>   ether: add packet modification aciton in rte_flow
>
> Thomas Monjalon (1):
>   version: 17.11.0
>
>  doc/guides/prog_guide/rte_flow.rst          | 148 +++
>  lib/librte_eal/common/include/rte_version.h |   4 +-
>  lib/librte_ether/rte_flow.c                 |  38 +++
>  lib/librte_ether/rte_flow.h                 | 149 +++-
>  lib/librte_ether/rte_flow_driver.h          |  12 +++
>  pkg/dpdk.spec                               |   2 +-
>  6 files changed, 349 insertions(+), 4 deletions(-)
>
> --
> 2.7.4
>

Regards,

--
Nélio Laranjeiro
6WIND
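Items 2 and 3 of the RFC describe per-flow last-hit tracking plus idle timeouts with a "ping" query and an expiry callback. The sketch below models only those semantics in plain C, independent of DPDK: `struct sketch_flow`, `sketch_flow_hit()`, `sketch_flow_is_active()` and `sketch_flow_age()` are hypothetical names, not rte_flow API, and timestamps are assumed to be monotonic seconds supplied by the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow record; not part of rte_flow or of the RFC patches. */
struct sketch_flow {
	uint32_t id;
	uint64_t last_hit; /* time of the last matching packet (seconds) */
	uint64_t timeout;  /* idle timeout in seconds, 0 = never expires */
	bool active;
};

typedef void (*sketch_flow_expired_cb)(const struct sketch_flow *flow, void *arg);

/* Record a hit, as a COUNT action with last-hit tracking would report it. */
static void sketch_flow_hit(struct sketch_flow *flow, uint64_t now)
{
	flow->last_hit = now;
}

/* "Ping": is the flow still alive at time now? Assumes now >= last_hit. */
static bool sketch_flow_is_active(const struct sketch_flow *flow, uint64_t now)
{
	if (!flow->active)
		return false;
	if (flow->timeout == 0)
		return true;
	return now - flow->last_hit < flow->timeout;
}

/* Deactivate idle flows, notify the application, return how many expired. */
static unsigned int sketch_flow_age(struct sketch_flow *flows, size_t n,
				    uint64_t now, sketch_flow_expired_cb cb,
				    void *arg)
{
	unsigned int expired = 0;

	for (size_t i = 0; i < n; i++) {
		if (flows[i].active && !sketch_flow_is_active(&flows[i], now)) {
			flows[i].active = false;
			if (cb != NULL)
				cb(&flows[i], arg);
			expired++;
		}
	}
	return expired;
}
```

Treating a timeout of 0 as "never expires" is a choice made for this sketch; the RFC text does not say how such a value would be handled.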
[dpdk-dev] [PATCH v2 0/2] vhost: introduce rte_vhost_vring_call()
v2:
 * Add internal vhost_vring_call() helper function [Maxime]

These patches eliminate code duplication for vhost_virtqueue->callfd users by
introducing rte_vhost_vring_call() (public API) and vhost_vring_call()
(librte_vhost-internal API).

Stefan Hajnoczi (2):
  vhost: add vhost_vring_call() helper
  vhost: introduce rte_vhost_vring_call()

 lib/librte_vhost/rte_vhost.h           | 15 +++
 lib/librte_vhost/vhost.h               | 12 
 examples/vhost/virtio_net.c            | 11 ++-
 examples/vhost_scsi/vhost_scsi.c       |  6 +++---
 lib/librte_vhost/vhost.c               | 21 +
 lib/librte_vhost/virtio_net.c          | 23 +++
 lib/librte_vhost/rte_vhost_version.map |  7 +++
 7 files changed, 63 insertions(+), 32 deletions(-)

-- 
2.14.3
[dpdk-dev] [PATCH v2 1/2] vhost: add vhost_vring_call() helper
Extract the callfd eventfd signal operation so virtio_net.c does not have
to repeat it multiple times.

Signed-off-by: Stefan Hajnoczi
---
 lib/librte_vhost/vhost.h      | 12 
 lib/librte_vhost/virtio_net.c | 23 +++
 2 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 04f54cb60..ac81d83bb 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -394,4 +394,16 @@ vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	return __vhost_iova_to_vva(dev, vq, iova, size, perm);
 }
 
+static __rte_always_inline void
+vhost_vring_call(struct vhost_virtqueue *vq)
+{
+	/* Flush used->idx update before we read avail->flags. */
+	rte_mb();
+
+	/* Kick the guest if necessary. */
+	if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
+			&& (vq->callfd >= 0))
+		eventfd_write(vq->callfd, (eventfd_t)1);
+}
+
 #endif /* _VHOST_NET_CDEV_H_ */
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 79d80f7fd..a92a5181d 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -408,13 +408,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 			offsetof(struct vring_used, idx),
 			sizeof(vq->used->idx));
 
-	/* flush used->idx update before we read avail->flags. */
-	rte_mb();
-
-	/* Kick the guest if necessary. */
-	if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
-			&& (vq->callfd >= 0))
-		eventfd_write(vq->callfd, (eventfd_t)1);
+	vhost_vring_call(vq);
 out:
 	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
 		vhost_user_iotlb_rd_unlock(vq);
@@ -701,14 +695,7 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 
 	if (likely(vq->shadow_used_idx)) {
 		flush_shadow_used_ring(dev, vq);
-
-		/* flush used->idx update before we read avail->flags. */
-		rte_mb();
-
-		/* Kick the guest if necessary. */
-		if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
-				&& (vq->callfd >= 0))
-			eventfd_write(vq->callfd, (eventfd_t)1);
+		vhost_vring_call(vq);
 	}
 
 out:
@@ -1107,11 +1094,7 @@ update_used_idx(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	vq->used->idx += count;
 	vhost_log_used_vring(dev, vq, offsetof(struct vring_used, idx),
 			sizeof(vq->used->idx));
-
-	/* Kick guest if required. */
-	if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
-			&& (vq->callfd >= 0))
-		eventfd_write(vq->callfd, (eventfd_t)1);
+	vhost_vring_call(vq);
 }
 
 static __rte_always_inline struct zcopy_mbuf *
-- 
2.14.3
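The new helper consolidates one notification pattern: a full memory barrier, then an eventfd write only if the guest has not set VRING_AVAIL_F_NO_INTERRUPT in avail->flags and a call eventfd exists. A self-contained, Linux-only sketch of that logic follows. `struct sketch_vq` and `sketch_vring_call()` are hypothetical stand-ins for the librte_vhost internals, `__atomic_thread_fence(__ATOMIC_SEQ_CST)` replaces rte_mb(), and, unlike the void helper in the patch, the sketch returns a status so its behaviour can be checked.

```c
#include <stdint.h>
#include <sys/eventfd.h>

/* Virtio spec: the driver sets this bit in avail->flags to suppress interrupts. */
#define SKETCH_VRING_AVAIL_F_NO_INTERRUPT 1

/* Hypothetical stand-in for the two fields vhost_vring_call() reads. */
struct sketch_vq {
	volatile uint16_t *avail_flags; /* points into the guest-shared avail ring */
	int callfd;                     /* call eventfd from the front end, or -1 */
};

/* Returns 1 if the guest was notified, 0 if suppressed, -1 on write error. */
static int sketch_vring_call(struct sketch_vq *vq)
{
	/* Full barrier: the used->idx store must be visible before we load
	 * avail->flags, otherwise a needed notification could be missed. */
	__atomic_thread_fence(__ATOMIC_SEQ_CST);

	if ((*vq->avail_flags & SKETCH_VRING_AVAIL_F_NO_INTERRUPT) || vq->callfd < 0)
		return 0;

	return eventfd_write(vq->callfd, (eventfd_t)1) == 0 ? 1 : -1;
}
```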
[dpdk-dev] [PATCH v2 2/2] vhost: introduce rte_vhost_vring_call()
Users of librte_vhost currently implement the vring call operation
themselves. Each caller performs the operation slightly differently.

This patch introduces a new librte_vhost API called rte_vhost_vring_call()
that performs the operation so that vhost-user applications don't have to
duplicate it.

Signed-off-by: Stefan Hajnoczi
---
 lib/librte_vhost/rte_vhost.h           | 15 +++
 examples/vhost/virtio_net.c            | 11 ++-
 examples/vhost_scsi/vhost_scsi.c       |  6 +++---
 lib/librte_vhost/vhost.c               | 21 +
 lib/librte_vhost/rte_vhost_version.map |  7 +++
 5 files changed, 48 insertions(+), 12 deletions(-)

diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index f65364495..890f8a831 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -86,7 +86,9 @@ struct rte_vhost_vring {
 	struct vring_used	*used;
 	uint64_t		log_guest_addr;
 
+	/** Deprecated, use rte_vhost_vring_call() instead. */
 	int			callfd;
+
 	int			kickfd;
 	uint16_t		size;
 };
@@ -436,6 +438,19 @@ int rte_vhost_get_mem_table(int vid, struct rte_vhost_memory **mem);
 int rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
 			      struct rte_vhost_vring *vring);
 
+/**
+ * Notify the guest that used descriptors have been added to the vring.  This
+ * function acts as a memory barrier.
+ *
+ * @param vid
+ *  vhost device ID
+ * @param vring_idx
+ *  vring index
+ * @return
+ *  0 on success, -1 on failure
+ */
+int rte_vhost_vring_call(int vid, uint16_t vring_idx);
+
 /**
  * Get vhost RX queue avail count.
  *
diff --git a/examples/vhost/virtio_net.c b/examples/vhost/virtio_net.c
index 1ab57f526..252c5b8ce 100644
--- a/examples/vhost/virtio_net.c
+++ b/examples/vhost/virtio_net.c
@@ -207,13 +207,8 @@ vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 	*(volatile uint16_t *)&vr->used->idx += count;
 	queue->last_used_idx += count;
 
-	/* flush used->idx update before we read avail->flags. */
-	rte_mb();
+	rte_vhost_vring_call(dev->vid, queue_id);
 
-	/* Kick the guest if necessary. */
-	if (!(vr->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
-			&& (vr->callfd >= 0))
-		eventfd_write(vr->callfd, (eventfd_t)1);
 	return count;
 }
 
@@ -396,9 +391,7 @@ vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 
 	vr->used->idx += i;
 
-	if (!(vr->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
-			&& (vr->callfd >= 0))
-		eventfd_write(vr->callfd, (eventfd_t)1);
+	rte_vhost_vring_call(dev->vid, queue_id);
 
 	return i;
 }
diff --git a/examples/vhost_scsi/vhost_scsi.c b/examples/vhost_scsi/vhost_scsi.c
index b4f1f8d27..e30e61f6d 100644
--- a/examples/vhost_scsi/vhost_scsi.c
+++ b/examples/vhost_scsi/vhost_scsi.c
@@ -110,7 +110,7 @@ descriptor_is_wr(struct vring_desc *cur_desc)
 }
 
 static void
-submit_completion(struct vhost_scsi_task *task)
+submit_completion(struct vhost_scsi_task *task, uint32_t q_idx)
 {
 	struct rte_vhost_vring *vq;
 	struct vring_used *used;
@@ -131,7 +131,7 @@ submit_completion(struct vhost_scsi_task *task)
 	/* Send an interrupt back to the guest VM so that it knows
 	 * a completion is ready to be processed.
 	 */
-	eventfd_write(vq->callfd, (eventfd_t)1);
+	rte_vhost_vring_call(task->bdev->vid, q_idx);
 }
 
 static void
@@ -263,7 +263,7 @@ process_requestq(struct vhost_scsi_ctrlr *ctrlr, uint32_t q_idx)
 			task->resp->status = 0;
 			task->resp->resid = 0;
 		}
-		submit_completion(task);
+		submit_completion(task, q_idx);
 		rte_free(task);
 	}
 }
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 4f8b73a09..1244d76d4 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -519,6 +519,27 @@ rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
 	return 0;
 }
 
+int
+rte_vhost_vring_call(int vid, uint16_t vring_idx)
+{
+	struct virtio_net *dev;
+	struct vhost_virtqueue *vq;
+
+	dev = get_device(vid);
+	if (!dev)
+		return -1;
+
+	if (vring_idx >= VHOST_MAX_VRING)
+		return -1;
+
+	vq = dev->virtqueue[vring_idx];
+	if (!vq)
+		return -1;
+
+	vhost_vring_call(vq);
+	return 0;
+}
+
 uint16_t
 rte_vhost_avail_entries(int vid, uint16_t queue_id)
 {
diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 1e7049535..b30187601 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -52,3 +52,10 @@ DPDK_17.08 {
 	rte_vhost_rx_queue_count;
 
 } DPDK_17.05;
+
+EXPERIMENTAL {
+	glob
Re: [dpdk-dev] [PATCH v6 1/2] eal: add uevent monitor for hot plug
On 12/26/2017 2:06 AM, Stephen Hemminger wrote:
> On Thu, 2 Nov 2017 04:16:44 +0800
> Jeff Guo wrote:
>
>> +int
>> +rte_dev_bind_driver(const char *dev_name, const char *drv_type) {
>
> Bracket left after declaration.

Thanks.

>> +	snprintf(drv_override_path, sizeof(drv_override_path),
>> +		"/sys/bus/pci/devices/%s/driver_override", dev_name);
>> +
>> +	/* specify the driver for a device by writing to driver_override */
>> +	drv_override_fd = open(drv_override_path, O_WRONLY);
>> +	if (drv_override_fd < 0) {
>> +		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
>> +			drv_override_path, strerror(errno));
>> +		goto err;
>> +	}
>
> You should not have dev functions that assume PCI.
> Please split into common and bus specific code.

Makes sense; I will move it into bus-specific code.

>> +static int
>> +dev_uev_parse(const char *buf, struct rte_eal_uevent *event)
>> +{
>> +	char action[RTE_EAL_UEVENT_MSG_LEN];
>> +	char subsystem[RTE_EAL_UEVENT_MSG_LEN];
>> +	char dev_path[RTE_EAL_UEVENT_MSG_LEN];
>> +	char pci_slot_name[RTE_EAL_UEVENT_MSG_LEN];
>> +	int i = 0;
>> +
>> +	memset(action, 0, RTE_EAL_UEVENT_MSG_LEN);
>> +	memset(subsystem, 0, RTE_EAL_UEVENT_MSG_LEN);
>> +	memset(dev_path, 0, RTE_EAL_UEVENT_MSG_LEN);
>> +	memset(pci_slot_name, 0, RTE_EAL_UEVENT_MSG_LEN);
>> +
>> +	while (i < RTE_EAL_UEVENT_MSG_LEN) {
>> +		for (; i < RTE_EAL_UEVENT_MSG_LEN; i++) {
>> +			if (*buf)
>> +				break;
>> +			buf++;
>> +		}
>> +		if (!strncmp(buf, "libudev", 7)) {
>> +			buf += 7;
>> +			i += 7;
>> +			event->group = UEV_MONITOR_UDEV;
>> +		}
>> +		if (!strncmp(buf, "ACTION=", 7)) {
>> +			buf += 7;
>> +			i += 7;
>> +			snprintf(action, sizeof(action), "%s", buf);
>
> Why snprintf rather than strncpy?

With snprintf there is no need to write the terminating '\0' manually, and the
source length is not known explicitly. If the concern is the cost of snprintf
scanning the source string, I will constrain the destination buffer length.

>> +		} else if (!strncmp(buf, "DEVPATH=", 8)) {
>> +			buf += 8;
>> +			i += 8;
>> +			snprintf(dev_path, sizeof(dev_path), "%s", buf);
>> +		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
>> +			buf += 10;
>> +			i += 10;
>> +			snprintf(subsystem, sizeof(subsystem), "%s", buf);
>> +		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
>> +			buf += 14;
>> +			i += 14;
>> +			snprintf(pci_slot_name, sizeof(subsystem), "%s", buf);
>> +		}
>> +		for (; i < RTE_EAL_UEVENT_MSG_LEN; i++) {
>> +			if (*buf == '\0')
>> +				break;
>> +			buf++;
>> +		}
>> +	}
>> +
>> +	if (!strncmp(subsystem, "pci", 3))
>> +		event->subsystem = UEV_SUBSYSTEM_PCI;
>> +	if (!strncmp(action, "add", 3))
>> +		event->type = RTE_EAL_DEV_EVENT_ADD;
>> +	if (!strncmp(action, "remove", 6))
>> +		event->type = RTE_EAL_DEV_EVENT_REMOVE;
>> +	event->devname = pci_slot_name;
>
> Why do you need to first capture the strings, then set state variables?
> Instead why not update event->xxx directly?

I think doing it outside the loop makes the code easier to read and maintain.
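Stephen's last question suggests writing fields straight into the event. One further point about the quoted code: `pci_slot_name` is a local array of dev_uev_parse(), so `event->devname` dangles once the function returns. The following hypothetical sketch (names such as `sketch_uev_parse()` and `struct sketch_uevent` are invented, and it handles only kernel records, not the "libudev" header) fills the event directly, copies the slot name into storage owned by the event, and never reads past the received length.

```c
#include <stddef.h>
#include <string.h>

enum sketch_uev_type { SKETCH_UEV_UNKNOWN, SKETCH_UEV_ADD, SKETCH_UEV_REMOVE };

/* Hypothetical event: devname is a copy, not a pointer to a parser-local buffer. */
struct sketch_uevent {
	enum sketch_uev_type type;
	int is_pci;
	char devname[64];
};

/* Is record rec (rlen bytes, not NUL-terminated) exactly key followed by value? */
static int rec_is(const char *rec, size_t rlen, const char *key, const char *value)
{
	size_t klen = strlen(key), vlen = strlen(value);

	return rlen == klen + vlen && memcmp(rec, key, klen) == 0 &&
	       memcmp(rec + klen, value, vlen) == 0;
}

/* Does record rec (rlen bytes) start with key? */
static int rec_has_key(const char *rec, size_t rlen, const char *key)
{
	size_t klen = strlen(key);

	return rlen >= klen && memcmp(rec, key, klen) == 0;
}

/* Parse "action@devpath\0KEY=VALUE\0..." of len bytes straight into *ev. */
static void sketch_uev_parse(const char *buf, size_t len, struct sketch_uevent *ev)
{
	static const char slot_key[] = "PCI_SLOT_NAME=";
	size_t i = 0;

	memset(ev, 0, sizeof(*ev)); /* type = SKETCH_UEV_UNKNOWN, is_pci = 0, devname = "" */
	while (i < len) {
		const char *rec = buf + i;
		size_t rlen = 0;

		/* Record length, bounded by the end of the buffer. */
		while (i + rlen < len && rec[rlen] != '\0')
			rlen++;

		if (rec_is(rec, rlen, "ACTION=", "add"))
			ev->type = SKETCH_UEV_ADD;
		else if (rec_is(rec, rlen, "ACTION=", "remove"))
			ev->type = SKETCH_UEV_REMOVE;
		else if (rec_is(rec, rlen, "SUBSYSTEM=", "pci"))
			ev->is_pci = 1;
		else if (rec_has_key(rec, rlen, slot_key)) {
			size_t n = rlen - (sizeof(slot_key) - 1);

			if (n >= sizeof(ev->devname))
				n = sizeof(ev->devname) - 1; /* truncate over-long names */
			memcpy(ev->devname, rec + sizeof(slot_key) - 1, n);
			ev->devname[n] = '\0';
		}
		i += rlen + 1; /* skip the record and its NUL separator */
	}
}
```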
[dpdk-dev] [PATCH v7 1/2] eal: add uevent monitor for hot plug
This patch aims to add a general uevent mechanism in the EAL device layer to
enable hot plug monitoring of all Linux kernel objects, so that users can use
these APIs to monitor and read out the device status information sent from
the kernel side and handle it accordingly, for example by detaching or
attaching the device, or by using it to perform fail-safe work smoothly.

1) About uevent monitoring:
a: Add one epoll loop to poll the netlink socket and monitor device uevents;
   add device_state in struct rte_device to track the device state machine.
b: Add enum rte_eal_dev_event_type and struct rte_eal_uevent.
c: Add the APIs below in the rte eal device common layer:
	rte_eal_dev_monitor_enable
	rte_dev_callback_register
	rte_dev_callback_unregister
	_rte_dev_callback_process
	rte_dev_monitor_start
	rte_dev_monitor_stop

2) About the failure handler, using PCI UIO as an example: add
pci_remap_device in the bus layer and the functions below to process it:
	rte_pci_remap_device
	pci_uio_remap_resource
	pci_map_private_resource
Add rte_pci_dev_bind_driver to bind a PCI device to an explicit driver.

Signed-off-by: Jeff Guo
---
v7->v6:
a. modify the vdev part according to the vdev rework
b. re-define and split the functions into common and bus-specific code
c. fix some incorrect issues
d. fix the system hang after sending packets
---
 drivers/bus/pci/bsd/pci.c                          |  30 ++
 drivers/bus/pci/linux/pci.c                        |  87 +
 drivers/bus/pci/linux/pci_init.h                   |   1 +
 drivers/bus/pci/pci_common.c                       |  43 +++
 drivers/bus/pci/pci_common_uio.c                   |  28 ++
 drivers/bus/pci/private.h                          |  12 +
 drivers/bus/pci/rte_bus_pci.h                      |  25 ++
 drivers/bus/vdev/vdev.c                            |  36 +++
 lib/librte_eal/bsdapp/eal/eal_dev.c                |  64 
 .../bsdapp/eal/include/exec-env/rte_dev.h          | 106 ++
 lib/librte_eal/common/eal_common_bus.c             |  30 ++
 lib/librte_eal/common/eal_common_dev.c             | 169 ++
 lib/librte_eal/common/include/rte_bus.h            |  69 
 lib/librte_eal/common/include/rte_dev.h            |  89 ++
 lib/librte_eal/linuxapp/eal/Makefile               |   3 +-
 lib/librte_eal/linuxapp/eal/eal_alarm.c            |   5 +
 lib/librte_eal/linuxapp/eal/eal_dev.c              | 356 +
 .../linuxapp/eal/include/exec-env/rte_dev.h        | 106 ++
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c          |   6 +
 lib/librte_pci/rte_pci.c                           |  20 ++
 lib/librte_pci/rte_pci.h                           |  17 +
 21 files changed, 1301 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index b8e2178..d58dbf6 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -126,6 +126,29 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
 	}
 }
 
+/* re-map pci device */
+int
+rte_pci_remap_device(struct rte_pci_device *dev)
+{
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	switch (dev->kdrv) {
+	case RTE_KDRV_NIC_UIO:
+		ret = pci_uio_remap_resource(dev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			" Not managed by a supported kernel driver, skipped\n");
+		ret = 1;
+		break;
+	}
+
+	return ret;
+}
+
 void
 pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res)
@@ -678,3 +701,10 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
 
 	return ret;
 }
+
+int
+rte_pci_dev_bind_driver(const char *dev_name, const char *drv_type)
+{
+	return -1;
+}
+
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 5da6728..792fd2c 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -145,6 +145,38 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
 	}
 }
 
+/* Map pci device */
+int
+rte_pci_remap_device(struct rte_pci_device *dev)
+{
+	int ret = -1;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	switch (dev->kdrv) {
+	case RTE_KDRV_VFIO:
+#ifdef VFIO_PRESENT
+		/* no thing to do */
+#endif
+		break;
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+		if (rte_eal_using_phys_addrs()) {
+			/* map resources for devices that use uio */
+			ret = pci_uio_remap_resource(dev);
+		}
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			" Not managed b
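The commit message says the EAL polls a netlink socket for kernel uevents. The eal_dev.c code that does this is not part of this excerpt, so the following is only a minimal Linux sketch of how such a socket is typically opened, with hypothetical function names. The returned descriptor can be added to an epoll set and read with recv(); each datagram is one "action@devpath" header followed by NUL-separated KEY=VALUE records.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <unistd.h>

/* The kernel multicasts kobject uevents on netlink group 1; udev
 * re-broadcasts its processed events on a different group, ignored here. */
#define SKETCH_UEVENT_KERNEL_GROUP 1

static void sketch_uevent_addr(struct sockaddr_nl *addr)
{
	memset(addr, 0, sizeof(*addr));
	addr->nl_family = AF_NETLINK;
	addr->nl_pid = 0; /* let the kernel assign the port id */
	addr->nl_groups = SKETCH_UEVENT_KERNEL_GROUP;
}

/* Returns a non-blocking fd suitable for an epoll set, or -1 on failure. */
static int sketch_uevent_open(void)
{
	struct sockaddr_nl addr;
	int fd;

	fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC | SOCK_NONBLOCK,
		    NETLINK_KOBJECT_UEVENT);
	if (fd < 0)
		return -1;

	sketch_uevent_addr(&addr);
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```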
[dpdk-dev] [PATCH v7 0/2] add uevent monitor for hot plug
So far, for hot plug in DPDK, we already have the hot plug add/remove API and
the fail-safe driver to offload fail-safe work from the application. But a
general event API is still lacking, since the interrupt events that hot plug
relies on differ between devices and drivers, such as mlx4, the PCI driver
and others.

Take the hot removal event as an example: not all PCI drivers expose a remove
interrupt, so to make the hot plug feature easy to use with PCI drivers,
something must detect the remove event at the kernel level and offer a new
interrupt line to user space.

Based on the kernel's kobject uevent mechanism, we can monitor the hot plug
status not only of uio/vfio PCI bus devices but also of others, such as
cpu/usb/pci-express bus devices.

The idea is as follows.

a. The uevent message received from FD monitoring, which will be useful:
remove@/devices/pci:80/:80:02.2/:82:00.0/:83:03.0/:84:00.2/uio/uio2
ACTION=remove
DEVPATH=/devices/pci:80/:80:02.2/:82:00.0/:83:03.0/:84:00.2/uio/uio2
SUBSYSTEM=uio
MAJOR=243
MINOR=2
DEVNAME=uio2
SEQNUM=11366

b. Add the uevent monitoring mechanism: add several general APIs to enable
uevent monitoring.

c. Add a common uevent handler and a uevent failure handler: device uevents
should be handled at the bus or device layer, and memory read/write failures
during hot removal should be handled correctly before detaching.

d. Show how to use the uevent monitor: enable uevent monitoring in testpmd or
fail-safe to demonstrate usage.

patchset history:
v7->v6:
1. modify the vdev part according to the vdev rework
2. re-define and split the functions into common and bus-specific code
3. fix some incorrect issues
4. fix the system hang after sending packets
v6->v5:
1. add a hot plug policy: by default, EAL prepares hot plug work for all PCI
   devices, then lets the app decide which devices need hot plug
2. manage event callbacks per device
3. fix a system hang when igb_uio is released
4. move the PCI part to bus-pci, based on the bus rework
5. add a hot plug policy in the app, showing how to use a hotplug list to
   decide which devices need hot plug
v5->v4:
1. move uevent monitor epolling from eal interrupt to the eal device layer
2. redefine the eal device API as common, and distinguish between linux and bsd
3. add a failure handler helper API in the bus layer; add a function to find a
   device by name
4. replace individual per-device fd binding with a single common fd that polls
   all devices
5. add hot insertion monitoring and processing; add a function to auto-bind the
   driver before the user adds a device
6. refine some coding style issues and typos
7. add a new callback to process hot insertion
v4->v3:
1. move the uevent monitor API from eal interrupt to the eal device layer
2. create uevent type and struct in eal device
3. move the uevent handler for each driver to the eal layer
4. add a uevent failure handler to process signal fault issues
5. add an example of requesting and using uevent monitoring in testpmd
v3->v2:
1. refine some error returns
2. refine the string searching logic to avoid memory issues
v2->v1:
1. remove the global variable hotplug_fd; add uevent_fd in rte_intr_handle so
   that each PCI device maintains its own fd, fixing the dual device fd issue
2. fix some typos
Jeff Guo (2):
  eal: add uevent monitor for hot plug
  app/testpmd: use uevent to monitor hotplug

 app/test-pmd/testpmd.c                             | 178 +++
 app/test-pmd/testpmd.h                             |   9 +
 drivers/bus/pci/bsd/pci.c                          |  30 ++
 drivers/bus/pci/linux/pci.c                        |  87 +
 drivers/bus/pci/linux/pci_init.h                   |   1 +
 drivers/bus/pci/pci_common.c                       |  43 +++
 drivers/bus/pci/pci_common_uio.c                   |  28 ++
 drivers/bus/pci/private.h                          |  12 +
 drivers/bus/pci/rte_bus_pci.h                      |  25 ++
 drivers/bus/vdev/vdev.c                            |  36 +++
 lib/librte_eal/bsdapp/eal/eal_dev.c                |  64 
 .../bsdapp/eal/include/exec-env/rte_dev.h          | 106 ++
 lib/librte_eal/common/eal_common_bus.c             |  30 ++
 lib/librte_eal/common/eal_common_dev.c             | 169 ++
 lib/librte_eal/common/include/rte_bus.h            |  69 
 lib/librte_eal/common/include/rte_dev.h            |  89 ++
 lib/librte_eal/linuxapp/eal/Makefile               |   3 +-
 lib/librte_eal/linuxapp/eal/eal_alarm.c            |   5 +
 lib/librte_eal/linuxapp/eal/eal_dev.c              | 356 +
 .../linuxapp/eal/include/exec-env/rte_dev.h        | 106 ++
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c          |   6 +
 lib/librte_pci/rte_pci.c                           |  20 ++
 lib/librte_pci/rte_pci.h                           |  17 +
 23 files changed, 1488 insertions(+), 1 deletion(-)
 create mode 100644
[dpdk-dev] [PATCH v7 2/2] app/testpmd: use uevent to monitor hotplug
Use testpmd as an example to show how an application can request and use
uevent monitoring to handle hot removal and hot insertion events.

Signed-off-by: Jeff Guo
---
v7->v6:
fix the system hang after sending packets.
---
 app/test-pmd/testpmd.c | 178 +
 app/test-pmd/testpmd.h |   9 +++
 2 files changed, 187 insertions(+)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index c3ab448..97b4999 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -401,6 +401,8 @@ uint8_t bitrate_enabled;
 struct gro_status gro_ports[RTE_MAX_ETHPORTS];
 uint8_t gro_flush_cycles = GRO_DEFAULT_FLUSH_CYCLES;
 
+static struct hotplug_request_list hp_list;
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(portid_t pi,
 						   struct rte_port *port);
@@ -408,6 +410,13 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(portid_t port_id,
 			      enum rte_eth_event_type type,
 			      void *param, void *ret_param);
+static int eth_uevent_callback(enum rte_eal_dev_event_type type,
+			       void *param, void *ret_param);
+static int eth_uevent_callback_register(portid_t pid);
+static int in_hotplug_list(const char *dev_name);
+
+static int hotplug_list_add(const char *dev_name,
+			    enum rte_eal_dev_event_type event);
 
 /*
  * Check if all the ports are started.
@@ -1757,6 +1766,31 @@ reset_port(portid_t pid)
 	printf("Done\n");
 }
 
+static int
+eth_uevent_callback_register(portid_t pid) {
+	int diag;
+	struct rte_eth_dev *dev;
+	enum rte_eal_dev_event_type dev_event_type;
+
+	/* register the uevent callback */
+	dev = &rte_eth_devices[pid];
+	for (dev_event_type = RTE_EAL_DEV_EVENT_ADD;
+	     dev_event_type < RTE_EAL_DEV_EVENT_CHANGE;
+	     dev_event_type++) {
+		diag = rte_dev_callback_register(dev->device, dev_event_type,
+			eth_uevent_callback,
+			(void *)(intptr_t)pid);
+		if (diag) {
+			printf("Failed to setup uevent callback for"
+				" device event %d\n",
+				dev_event_type);
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
 void
 attach_port(char *identifier)
 {
@@ -1773,6 +1807,8 @@ attach_port(char *identifier)
 	if (rte_eth_dev_attach(identifier, &pi))
 		return;
 
+	eth_uevent_callback_register(pi);
+
 	socket_id = (unsigned)rte_eth_dev_socket_id(pi);
 	/* if socket_id is invalid, set to 0 */
 	if (check_socket_id(socket_id) < 0)
@@ -1784,6 +1820,8 @@ attach_port(char *identifier)
 
 	ports[pi].port_status = RTE_PORT_STOPPED;
 
+	hotplug_list_add(identifier, RTE_EAL_DEV_EVENT_REMOVE);
+
 	printf("Port %d is attached. Now total ports is %d\n", pi, nb_ports);
 	printf("Done\n");
 }
@@ -1810,6 +1848,9 @@ detach_port(portid_t port_id)
 
 	nb_ports = rte_eth_dev_count();
 
+	hotplug_list_add(rte_eth_devices[port_id].device->name,
+			 RTE_EAL_DEV_EVENT_ADD);
+
 	printf("Port '%s' is detached. Now total ports is %d\n",
 			name, nb_ports);
 	printf("Done\n");
@@ -1833,6 +1874,9 @@ pmd_test_exit(void)
 			close_port(pt_id);
 		}
 	}
+
+	rte_dev_monitor_stop();
+
 	printf("\nBye...\n");
 }
 
@@ -1917,6 +1961,49 @@ rmv_event_callback(void *arg)
 			dev->device->name);
 }
 
+static void
+rmv_uevent_callback(void *arg)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint8_t port_id = (intptr_t)arg;
+
+	rte_eal_alarm_cancel(rmv_uevent_callback, arg);
+
+	RTE_ETH_VALID_PORTID_OR_RET(port_id);
+	printf("removing port id:%u\n", port_id);
+
+	if (!in_hotplug_list(rte_eth_devices[port_id].device->name))
+		return;
+
+	stop_packet_forwarding();
+
+	stop_port(port_id);
+	close_port(port_id);
+	if (rte_eth_dev_detach(port_id, name)) {
+		RTE_LOG(ERR, USER1, "Failed to detach port '%s'\n", name);
+		return;
+	}
+
+	nb_ports = rte_eth_dev_count();
+
+	printf("Port '%s' is detached. Now total ports is %d\n",
+			name, nb_ports);
+}
+
+static void
+add_uevent_callback(void *arg)
+{
+	char *dev_name = (char *)arg;
+
+	rte_eal_alarm_cancel(add_uevent_callback, arg);
+
+	if (!in_hotplug_list(dev_name))
+		return;
+
+	RTE_LOG(ERR, EAL, "add device: %s\n", dev_name);
+	attach_port(dev_name);
+}
+
 /* This function is used by the interrupt thread */
 static int
 eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
@@ -1959,6
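The testpmd patch declares `in_hotplug_list()`, `hotplug_list_add()` and a `struct hotplug_request_list`, but their definitions fall outside this excerpt. One plausible shape is a small name-keyed table recording which event each device is expected to raise next. The sketch below is hypothetical: every name is invented and a fixed-capacity array is assumed.

```c
#include <stdio.h>
#include <string.h>

#define SKETCH_HP_MAX   32
#define SKETCH_NAME_LEN 64

enum sketch_dev_event { SKETCH_EVENT_ADD, SKETCH_EVENT_REMOVE };

struct sketch_hp_entry {
	char name[SKETCH_NAME_LEN];  /* device name, e.g. a PCI address */
	enum sketch_dev_event event; /* event the app is prepared to handle */
};

struct sketch_hp_list {
	struct sketch_hp_entry entries[SKETCH_HP_MAX];
	unsigned int count;
};

/* Returns 1 if dev_name has an entry, 0 otherwise. */
static int sketch_in_hotplug_list(const struct sketch_hp_list *l, const char *dev_name)
{
	for (unsigned int i = 0; i < l->count; i++)
		if (strcmp(l->entries[i].name, dev_name) == 0)
			return 1;
	return 0;
}

/* Adds dev_name or updates its expected event; returns -1 if the list is full.
 * Names longer than SKETCH_NAME_LEN - 1 are truncated by snprintf(). */
static int sketch_hotplug_list_add(struct sketch_hp_list *l, const char *dev_name,
				   enum sketch_dev_event ev)
{
	for (unsigned int i = 0; i < l->count; i++) {
		if (strcmp(l->entries[i].name, dev_name) == 0) {
			l->entries[i].event = ev;
			return 0;
		}
	}
	if (l->count == SKETCH_HP_MAX)
		return -1;
	snprintf(l->entries[l->count].name, SKETCH_NAME_LEN, "%s", dev_name);
	l->entries[l->count].event = ev;
	l->count++;
	return 0;
}
```

In this model, attach_port() would record a device as expecting REMOVE and detach_port() would flip it to ADD, mirroring the two hotplug_list_add() calls in the patch.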
Re: [dpdk-dev] [PATCH v6 0/2] add uevent monitor for hot plug
Hi Moti,

Please see the v7 patch set, thanks.

On 12/24/2017 11:12 PM, Mordechay Haimovsky wrote:
> Thanks Jeff,
> Do you have an estimate of when these patches will be ready?
>
> Moti H.
>
>> -Original Message-
>> From: Guo, Jia [mailto:jia@intel.com]
>> Sent: Friday, December 22, 2017 2:16 AM
>> To: Gaëtan Rivet ; Mordechay Haimovsky
>> Cc: dev@dpdk.org
>> Subject: RE: [dpdk-dev] [PATCH v6 0/2] add uevent monitor for hot plug
>>
>> Hello Moti, and sorry for replying so late. As Gaëtan said, there will
>> definitely be some changes in the next version; anyway, I will create a
>> new version so that you can all review and test it further.
>>
>> Best regards,
>> Jeff Guo
>>
>>> -Original Message-
>>> From: Gaëtan Rivet [mailto:gaetan.ri...@6wind.com]
>>> Sent: Thursday, December 14, 2017 6:21 PM
>>> To: Mordechay Haimovsky
>>> Cc: Guo, Jia ; dev@dpdk.org
>>> Subject: Re: [dpdk-dev] [PATCH v6 0/2] add uevent monitor for hot plug
>>>
>>> Hello Moti,
>>>
>>> On Thu, Dec 14, 2017 at 09:48:23AM +, Mordechay Haimovsky wrote:
>>>> Hello,
>>>> I would like to apply this patch in order to review it.
>>>
>>> In the absence of an answer from Jeff,
>>>
>>>> Trying to apply it on 17.11 (and latest) fails due to missing
>>>> lib/librte_eal/common/eal_common_vdev.c
>>>> Trying to apply it on 17.08.1 fails on missing drivers/bus/pci/bsd/pci.c file
>>>
>>> Those two paths were modified during the 17.08 release: both pci and vdev
>>> buses have been moved to drivers/bus. Only the pci bus move was integrated
>>> by Jeff into this version of the udev monitor. The vdev bus move, however,
>>> came later and should be rebased upon.
>>>
>>>> So, on what DPDK version should I apply it? Or maybe there is a bunch of
>>>> other patches I have to apply in order to use this patch?
>>>
>>> You should apply it on 17.11 IMO. Either you take it upon yourself to make
>>> it work with the new tree, or wait for Jeff to send a new version.
>>>
>>> --
>>> Gaëtan Rivet
>>> 6WIND
[dpdk-dev] [PATCH v2] examples/l2fwd: increase pktmbuf pool size
Make the pktmbuf pool size a function of the number of ports and lcores
detected, instead of using a constant 8192.

Signed-off-by: Pavan Nikhilesh
---

v2 Changes:
 - use the variable in place rather than a macro.

 examples/l2fwd/main.c | 25 +++--
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/examples/l2fwd/main.c b/examples/l2fwd/main.c
index e89e2e1bf..e6229955f 100644
--- a/examples/l2fwd/main.c
+++ b/examples/l2fwd/main.c
@@ -75,8 +75,6 @@ static int mac_updating = 1;
 
 #define RTE_LOGTYPE_L2FWD RTE_LOGTYPE_USER1
 
-#define NB_MBUF 8192
-
 #define MAX_PKT_BURST 32
 #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
 #define MEMPOOL_CACHE_SIZE 256
@@ -556,6 +554,8 @@ main(int argc, char **argv)
 	uint16_t portid, last_port;
 	unsigned lcore_id, rx_lcore_id;
 	unsigned nb_ports_in_mask = 0;
+	unsigned int nb_lcores = 0;
+	unsigned int nb_mbufs;
 
 	/* init EAL */
 	ret = rte_eal_init(argc, argv);
@@ -578,13 +578,6 @@ main(int argc, char **argv)
 	/* convert to number of cycles */
 	timer_period *= rte_get_timer_hz();
 
-	/* create the mbuf pool */
-	l2fwd_pktmbuf_pool = rte_pktmbuf_pool_create("mbuf_pool", NB_MBUF,
-		MEMPOOL_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE,
-		rte_socket_id());
-	if (l2fwd_pktmbuf_pool == NULL)
-		rte_exit(EXIT_FAILURE, "Cannot init mbuf pool\n");
-
 	nb_ports = rte_eth_dev_count();
 	if (nb_ports == 0)
 		rte_exit(EXIT_FAILURE, "No Ethernet ports - bye\n");
@@ -636,9 +629,11 @@ main(int argc, char **argv)
 				rte_exit(EXIT_FAILURE, "Not enough cores\n");
 		}
 
-		if (qconf != &lcore_queue_conf[rx_lcore_id])
+		if (qconf != &lcore_queue_conf[rx_lcore_id]) {
 			/* Assigned a new logical core in the loop above. */
 			qconf = &lcore_queue_conf[rx_lcore_id];
+			nb_lcores++;
+		}
 
 		qconf->rx_port_list[qconf->n_rx_port] = portid;
 		qconf->n_rx_port++;
@@ -647,6 +642,16 @@ main(int argc, char **argv)
 
 	nb_ports_available = nb_ports;
 
+	nb_mbufs = RTE_MAX(nb_ports * (nb_rxd + nb_txd + MAX_PKT_BURST +
+		nb_lcores * MEMPOOL_CACHE_SIZE), 8192U);
+
+	/* create the mbuf pool */
+	l2fwd_pktmbuf_pool = rte_pktmbuf_pool_create("mbuf_pool", nb_mbufs,
+		MEMPOOL_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE,
+		rte_socket_id());
+	if (l2fwd_pktmbuf_pool == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot init mbuf pool\n");
+
 	/* Initialise each port */
 	for (portid = 0; portid < nb_ports; portid++) {
 		/* skip ports that are not enabled */
-- 
2.15.1
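Under the new formula, each port gets enough mbufs for a full RX ring, a full TX ring and one burst, plus one mempool cache per lcore, with the old 8192 kept as a floor. The arithmetic can be checked with a small standalone function. `l2fwd_nb_mbufs()` is a hypothetical helper, not part of l2fwd; MAX_PKT_BURST and MEMPOOL_CACHE_SIZE take the values from the diff context, and the descriptor counts in the usage note are example inputs, not l2fwd's defaults.

```c
/* Values from examples/l2fwd/main.c as shown in the diff context. */
#define MAX_PKT_BURST      32
#define MEMPOOL_CACHE_SIZE 256

/* Same expression as the patch: RTE_MAX(nb_ports * (...), 8192U). */
static unsigned int l2fwd_nb_mbufs(unsigned int nb_ports, unsigned int nb_lcores,
				   unsigned int nb_rxd, unsigned int nb_txd)
{
	unsigned int n = nb_ports * (nb_rxd + nb_txd + MAX_PKT_BURST +
				     nb_lcores * MEMPOOL_CACHE_SIZE);

	return n > 8192U ? n : 8192U; /* never below the old fixed pool size */
}
```

For example, 2 ports, 1 lcore, 128 RX and 512 TX descriptors give 2 × (128 + 512 + 32 + 256) = 1856, so the 8192 floor applies; 4 ports, 4 lcores and 1024/1024 descriptors give 4 × 3104 = 12416.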
[dpdk-dev] [PATCH v3] lib/librte_vhost: move fdset_del out of conn_mutex
From: wang zhike

v3:
* Fix duplicate variable name, which leads to unexpected memory write.
v2:
* Move fdset_del before conn destroy.
* Fix coding style.

This patch fixes the race condition below:

1. One thread calls rte_vhost_driver_unregister() -> lock conn_mutex
   -> fdset_del() -> loop checking fd.busy.
2. Another thread calls fdset_event_dispatch(), and the busy flag is only
   changed AFTER the fd has been handled, i.e. after rcb(). However, the
   rcb, such as vhost_user_read_cb(), tries to acquire conn_mutex.

So the first thread loops checking the flag while holding the mutex, while
the second thread is blocked on the mutex and cannot change the flag. The
result is a deadlock.

Signed-off-by: zhike wang
---
 lib/librte_vhost/socket.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index 422da00..ea01327 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -749,6 +749,9 @@ struct vhost_user_reconnect_list {
 		struct vhost_user_socket *vsocket = vhost_user.vsockets[i];
 
 		if (!strcmp(vsocket->path, path)) {
+			int del_fds[MAX_FDS];
+			int num_of_fds = 0, fd_index;
+
 			if (vsocket->is_server) {
 				fdset_del(&vhost_user.fdset, vsocket->socket_fd);
 				close(vsocket->socket_fd);
@@ -757,13 +760,26 @@ struct vhost_user_reconnect_list {
 				vhost_user_remove_reconnect(vsocket);
 			}
 
+			/* fdset_del() must be called without conn_mutex. */
+			pthread_mutex_lock(&vsocket->conn_mutex);
+			for (conn = TAILQ_FIRST(&vsocket->conn_list);
+			     conn != NULL;
+			     conn = next) {
+				next = TAILQ_NEXT(conn, next);
+
+				del_fds[num_of_fds++] = conn->connfd;
+			}
+			pthread_mutex_unlock(&vsocket->conn_mutex);
+
+			for (fd_index = 0; fd_index < num_of_fds; fd_index++)
+				fdset_del(&vhost_user.fdset, del_fds[fd_index]);
+
 			pthread_mutex_lock(&vsocket->conn_mutex);
 			for (conn = TAILQ_FIRST(&vsocket->conn_list);
 			     conn != NULL;
 			     conn = next) {
 				next = TAILQ_NEXT(conn, next);
 
-				fdset_del(&vhost_user.fdset, conn->connfd);
 				RTE_LOG(INFO, VHOST_CONFIG,
 					"free connfd = %d for device '%s'\n",
 					conn->connfd, path);
-- 
1.8.3.1
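The fix follows a common pattern: copy what is needed while holding the lock, release the lock, and only then perform the operation that may wait on something that itself needs the lock. The generic pthread sketch below illustrates that pattern; `struct sketch_conn_list`, `sketch_remove_all()` and `sketch_del_locked()` are hypothetical. In this single-threaded sketch the callback simply re-acquires the same mutex, which would deadlock (or be undefined behaviour for a default mutex) if the lock were still held; in the real bug the lock is wanted by the fdset dispatch thread instead.

```c
#include <pthread.h>

#define SKETCH_MAX_FDS 64

struct sketch_conn_list {
	pthread_mutex_t lock;
	int fds[SKETCH_MAX_FDS];
	int count;
};

typedef void (*sketch_fd_del_fn)(int fd, void *arg);

/* Snapshot the fds under the lock, then call del() with the lock released. */
static int sketch_remove_all(struct sketch_conn_list *cl, sketch_fd_del_fn del, void *arg)
{
	int snapshot[SKETCH_MAX_FDS];
	int n;

	pthread_mutex_lock(&cl->lock);
	for (n = 0; n < cl->count; n++)
		snapshot[n] = cl->fds[n];
	pthread_mutex_unlock(&cl->lock);

	for (int i = 0; i < n; i++)
		del(snapshot[i], arg);
	return n;
}

/* Example del(): needs the same lock, like vhost_user_read_cb() needs conn_mutex. */
static void sketch_del_locked(int fd, void *arg)
{
	struct sketch_conn_list *cl = arg;

	pthread_mutex_lock(&cl->lock);
	for (int i = 0; i < cl->count; i++) {
		if (cl->fds[i] == fd) {
			cl->fds[i] = cl->fds[--cl->count]; /* swap-remove */
			break;
		}
	}
	pthread_mutex_unlock(&cl->lock);
}
```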
Re: [dpdk-dev] [PATCH v2] eal/x86: get hypervisor name
-----Original Message-----
> Date: Sat, 30 Dec 2017 23:47:23 +0100
> From: Thomas Monjalon
> To: dev@dpdk.org
> Cc: Stephen Hemminger, Jerin Jacob
>
> Subject: [PATCH v2] eal/x86: get hypervisor name
> X-Mailer: git-send-email 2.15.1
>
> The CPUID instruction is caught by the hypervisor, which can return
> a flag indicating that one is running, and its name.
>
> Suggested-by: Stephen Hemminger
> Signed-off-by: Thomas Monjalon
> ---
> v2 changes:
> - remove C99 style declaration
> - move code in rte_hypervisor.* files
> - add a function to get the name string

From the new API change perspective,
Acked-by: Jerin Jacob
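For background on the commit message, here is a minimal sketch of how the two CPUID facts it relies on can be read with the GCC/Clang `<cpuid.h>` helpers. Bit 31 of ECX from leaf 1 is the "hypervisor present" flag, and leaf 0x40000000 returns a 12-byte vendor signature in EBX, ECX and EDX. This is not the code from the patch, which exposes its own rte_hypervisor API; `hv_signature()` and `hv_probe()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h> /* GCC/Clang helpers: __get_cpuid(), __cpuid() */
#endif

/* Store a 32-bit register value as 4 little-endian bytes. */
static void put_le32(char *dst, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		dst[i] = (char)((v >> (8 * i)) & 0xff);
}

/* Build the 12-byte vendor signature that a hypervisor returns in
 * EBX, ECX, EDX (in that order) for CPUID leaf 0x40000000. */
static void hv_signature(uint32_t ebx, uint32_t ecx, uint32_t edx,
			 char out[13])
{
	put_le32(out, ebx);
	put_le32(out + 4, ecx);
	put_le32(out + 8, edx);
	out[12] = '\0';
}

/* Return 1 and fill 'name' if CPUID reports a hypervisor, else 0. */
static int hv_probe(char name[13])
{
	name[0] = '\0';
#if defined(__i386__) || defined(__x86_64__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return 0;
	if (!(ecx & (1u << 31))) /* CPUID.1:ECX[31]: hypervisor present */
		return 0;
	/* __get_cpuid() range-checks against the basic max leaf, which
	 * would reject 0x40000000, so use the raw __cpuid() macro here. */
	__cpuid(0x40000000, eax, ebx, ecx, edx);
	hv_signature(ebx, ecx, edx, name);
	return 1;
#else
	return 0;
#endif
}
```

On non-x86 builds `hv_probe()` simply reports "no hypervisor". Under KVM, for example, the decoded signature is "KVMKVMKVM"; VMware reports "VMwareVMware".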
Re: [dpdk-dev] [PATCH v2 0/2] vhost: introduce rte_vhost_vring_call()
Hi Stefan, On 01/02/2018 10:31 AM, Stefan Hajnoczi wrote: v2: * Add internal vhost_vring_call() helper function [Maxime] These patches eliminate code duplication for vhost_virtqueue->callfd users by introducing rte_vhost_vring_call() (public API) and vhost_vring_call() (librte_vhost-internal API). Stefan Hajnoczi (2): vhost: add vhost_vring_call() helper vhost: introduce rte_vhost_vring_call() lib/librte_vhost/rte_vhost.h | 15 +++ lib/librte_vhost/vhost.h | 12 examples/vhost/virtio_net.c| 11 ++- examples/vhost_scsi/vhost_scsi.c | 6 +++--- lib/librte_vhost/vhost.c | 21 + lib/librte_vhost/virtio_net.c | 23 +++ lib/librte_vhost/rte_vhost_version.map | 7 +++ 7 files changed, 63 insertions(+), 32 deletions(-) I just wonder whether tagging the new API as experimental is needed, but apart from that it looks good to me: Reviewed-by: Maxime Coquelin Thanks! Maxime
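The call path that such a helper centralizes is small: check that a call eventfd is configured and that the guest has not suppressed interrupts, then write to the eventfd. The sketch below shows that logic with a hypothetical, trimmed-down `struct sketch_vq` and the Linux `eventfd_write()` wrapper. It is not the librte_vhost code; in particular, a real implementation also needs a memory barrier so the used-ring update is visible before the flags are read.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h> /* Linux-specific */

/* VRING_AVAIL_F_NO_INTERRUPT from the virtio specification. */
#define SKETCH_VRING_AVAIL_F_NO_INTERRUPT 1

/* Hypothetical, trimmed-down virtqueue: only what the call path needs. */
struct sketch_vq {
	uint16_t avail_flags; /* flags the guest wrote in the avail ring */
	int callfd;           /* eventfd that interrupts the guest, or -1 */
};

/* Kick the guest unless it suppressed interrupts or no callfd is set.
 * Returns 1 if a notification was written, 0 if skipped, -1 on error. */
static int sketch_vring_call(const struct sketch_vq *vq)
{
	if (vq->callfd < 0)
		return 0;
	if (vq->avail_flags & SKETCH_VRING_AVAIL_F_NO_INTERRUPT)
		return 0;
	return eventfd_write(vq->callfd, (eventfd_t)1) == 0 ? 1 : -1;
}
```

Keeping this check-and-write in one function is what lets the example applications and the library stop open-coding `eventfd_write()` on `callfd`.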
Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement
Hi Declan,

On 12/22/2017 12:21 AM, Doherty, Declan wrote:
This RFC contains a proposal to add a new tunnel endpoint API to DPDK that,
when used in conjunction with rte_flow, enables the configuration of inline
data path encapsulation and decapsulation of tunnel endpoint network
overlays on accelerated IO devices.

The proposed new API would provide for the creation, destruction, and
monitoring of a tunnel endpoint in supporting hw, as well as capabilities
APIs to allow the acceleration features to be discovered by applications.

/** Tunnel Endpoint context, opaque structure */
struct rte_tep;

enum rte_tep_type {
	RTE_TEP_TYPE_VXLAN = 1, /**< VXLAN Protocol */
	RTE_TEP_TYPE_NVGRE,     /**< NVGRE Protocol */
	...
};

/** Tunnel Endpoint Attributes */
struct rte_tep_attr {
	enum rte_tep_type type;

	/* other endpoint attributes here */
};

/**
 * Create a tunnel end-point context as specified by the flow attribute and pattern
 *
 * @param port_id  Port identifier of Ethernet device.
 * @param attr     Flow rule attributes.
 * @param pattern  Pattern specification by list of rte_flow_items.
 * @return
 *  - On success returns pointer to TEP context
 *  - On failure returns NULL
 */
struct rte_tep *rte_tep_create(uint16_t port_id,
		struct rte_tep_attr *attr, struct rte_flow_item pattern[]);

/**
 * Destroy an existing tunnel end-point context. The whole end-point context
 * will be destroyed, so all active flows using the tep should be freed before
 * destroying the context.
 *
 * @param port_id  Port identifier of Ethernet device.
 * @param tep      Tunnel endpoint context
 * @return
 *  - On success returns 0
 *  - On failure returns 1
 */
int rte_tep_destroy(uint16_t port_id, struct rte_tep *tep);

/**
 * Get tunnel endpoint statistics
 *
 * @param port_id  Port identifier of Ethernet device.
 * @param tep      Tunnel endpoint context
 * @param stats    Tunnel endpoint statistics
 *
 * @return
 *  - On success returns 0
 *  - On failure returns 1
 */
int rte_tep_stats_get(uint16_t port_id, struct rte_tep *tep,
		struct rte_tep_stats *stats);

/**
 * Get port's tunnel endpoint capabilities
 *
 * @param port_id       Port identifier of Ethernet device.
 * @param capabilities  Tunnel endpoint capabilities
 *
 * @return
 *  - On success returns 0
 *  - On failure returns 1
 */
int rte_tep_capabilities_get(uint16_t port_id,
		struct rte_tep_capabilities *capabilities);

To direct traffic flows to a hw-terminated tunnel endpoint, the rte_flow API
is enhanced with a new flow item type. This contains a pointer to the TEP
context as well as the overlay flow id with which the traffic flow is
associated.

struct rte_flow_item_tep {
	struct rte_tep *tep;
	uint32_t flow_id;
};

Also, 2 new generic action types are added: encapsulation and decapsulation.

RTE_FLOW_ACTION_TYPE_ENCAP
RTE_FLOW_ACTION_TYPE_DECAP

struct rte_flow_action_encap {
	struct rte_flow_item *item;
};

struct rte_flow_action_decap {
	struct rte_flow_item *item;
};

The following section outlines the intended usage of the new APIs and then
how they are combined with the existing rte_flow APIs.

Tunnel endpoints are created with rte_tep_create() on logical ports which
support the capability, using a combination of TEP attributes and
rte_flow_items. In the example below a new IPv4 VxLAN endpoint is being
defined. The attrs parameter sets the TEP type, and could be used for other
possible attributes.

struct rte_tep_attr attrs = { .type = RTE_TEP_TYPE_VXLAN };

The values for the headers which make up the tunnel endpoint are then
defined using the spec parameter in the rte_flow items (IPv4, UDP and VxLAN
in this case).

struct rte_flow_item_ipv4 ipv4_item = {
	.hdr = { .src_addr = saddr, .dst_addr = daddr }
};

struct rte_flow_item_udp udp_item = {
	.hdr = { .src_port = sport, .dst_port = dport }
};

struct rte_flow_item_vxlan vxlan_item = { .flags = vxlan_flags };

struct rte_flow_item pattern[] = {
	{ .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &ipv4_item },
	{ .type = RTE_FLOW_ITEM_TYPE_UDP, .spec = &udp_item },
	{ .type = RTE_FLOW_ITEM_TYPE_VXLAN, .spec = &vxlan_item },
	{ .type = RTE_FLOW_ITEM_TYPE_END }
};

The tunnel endpoint can then be created on the port. Whether or not any hw
configuration is required at this point would be hw dependent, but if not,
the context for the TEP is available for use in programming flows, so the
application is not forced to redefine the TEP parameters on each flow
addition.

struct rte_tep *tep = rte_tep_create(port_id, &attrs, pattern);

Once the tep context is created, flows can then be directed to that endpoint
for processing. The following sections will outline how the author envisages
flow programming will work and also how TEP
Re: [dpdk-dev] [PATCH v3 1/2] gro: code cleanup
On Fri, Dec 22, 2017 at 03:25:43PM +0800, Jiayu Hu wrote: > - Remove needless checks and variables > - For better understanding, update the programmer guide and rename > internal functions and variables > - For supporting tunneled gro, move common internal functions from > gro_tcp4.c to gro_tcp4.h > - Comply with RFC 6864 to process the IPv4 ID field > > Signed-off-by: Jiayu Hu > --- > .../prog_guide/generic_receive_offload_lib.rst | 246 --- > doc/guides/prog_guide/img/gro-key-algorithm.png| Bin 0 -> 28231 bytes Rather than binary PNG images, please use SVG files (note, real SVG, not an SVG file with a binary blob pasted into it). Thanks, /Bruce
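RFC 6864 lets a sender set the IPv4 ID of an atomic datagram (DF set, MF clear, fragment offset zero) to any value, so a GRO merge check cannot rely on it. For non-atomic datagrams this sketch assumes the common expectation that IDs increase by one per segment. The struct, macros and function names below are hypothetical, and this is not the librte_gro code; it only illustrates how the ID rule could be applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits of the IPv4 flags/fragment-offset field, host byte order. */
#define SKETCH_IPV4_DF     0x4000
#define SKETCH_IPV4_MF     0x2000
#define SKETCH_IPV4_OFFSET 0x1fff

/* Hypothetical per-segment view: only the fields this check needs. */
struct sketch_seg {
	uint16_t frag_off; /* flags + fragment offset, host byte order */
	uint16_t ip_id;    /* IPv4 ID, host byte order */
};

/* RFC 6864 "atomic" datagram: DF set, MF clear, offset zero. */
static bool sketch_is_atomic(const struct sketch_seg *s)
{
	return (s->frag_off & (SKETCH_IPV4_DF | SKETCH_IPV4_MF |
			       SKETCH_IPV4_OFFSET)) == SKETCH_IPV4_DF;
}

/* May 'next' be appended to a run that starts with 'first' and already
 * holds 'nb_merged' segments? Only the IPv4 ID rule is checked here. */
static bool sketch_ip_id_ok(const struct sketch_seg *first,
			    uint16_t nb_merged, const struct sketch_seg *next)
{
	bool a1 = sketch_is_atomic(first);
	bool a2 = sketch_is_atomic(next);

	if (a1 != a2)
		return false; /* do not mix atomic and non-atomic segments */
	if (a1)
		return true;  /* ID carries no meaning: ignore it */
	/* Non-atomic: expect IDs to increase by one per segment (mod 2^16). */
	return next->ip_id == (uint16_t)(first->ip_id + nb_merged);
}
```

A full merge decision would of course also compare addresses, ports and TCP sequence numbers; only the ID handling is shown.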
Re: [dpdk-dev] [PATCH v3 1/5] ethdev: remove useless parameter in callback process
Hi Thomas, > -----Original Message----- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas Monjalon > Sent: Friday, December 29, 2017 1:37 PM > To: dev@dpdk.org > Cc: Yigit, Ferruh > Subject: [dpdk-dev] [PATCH v3 1/5] ethdev: remove useless parameter in > callback process > > The pointer to the user parameter of the callback registration is > automatically > passed to the callback function. > There is no point in allowing a caller to change this user parameter. > That's why this parameter is always set to NULL by PMDs and set only in the ethdev > layer before calling the callback function. > > The history is that the user parameter was initially used by the callback > implementation to pass some information between the application and the > driver: > c1ceaf3ad056 ("ethdev: add an argument to internal callback function") > Then a new parameter was added to leave the user parameter to its > standard usage of context given at registration: > d6af1a13d7a1 ("ethdev: add return values to callback process API") > > The NULL parameter in the internal callback processing function is now > removed. This makes it clear that the callback parameter is user managed and > opaque from a DPDK point of view.
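The key idea in the commit message is that the user parameter is bound when the callback is registered, so the code that fires the event never needs to supply it; drivers pass only an optional `ret_param`. The self-contained sketch below models that split. All names (`sketch_cb_register()`, `sketch_cb_process()`, `sketch_app_cb()`) are hypothetical; the callback shape loosely mirrors ethdev's `(port_id, event, cb_arg, ret_param)` form, but this is not the ethdev implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event types and callback shape. */
enum sketch_event { SKETCH_EVENT_LSC, SKETCH_EVENT_VF_MBOX };

typedef int (*sketch_cb_fn)(uint16_t port_id, enum sketch_event event,
			    void *cb_arg, void *ret_param);

struct sketch_cb_entry {
	sketch_cb_fn fn;
	enum sketch_event event;
	void *cb_arg; /* user context, fixed at registration time */
};

#define SKETCH_MAX_CBS 4
static struct sketch_cb_entry cbs[SKETCH_MAX_CBS];
static int nb_cbs;

/* Application side: the user context is supplied exactly once, here. */
static int sketch_cb_register(enum sketch_event event, sketch_cb_fn fn,
			      void *cb_arg)
{
	if (nb_cbs == SKETCH_MAX_CBS)
		return -1;
	cbs[nb_cbs++] = (struct sketch_cb_entry){ fn, event, cb_arg };
	return 0;
}

/* Driver side: only the event and an optional ret_param are passed;
 * cb_arg always comes from the registration entry. Returns the value
 * of the last matching callback (0 if none matched). */
static int sketch_cb_process(uint16_t port_id, enum sketch_event event,
			     void *ret_param)
{
	int rc = 0;

	for (int i = 0; i < nb_cbs; i++)
		if (cbs[i].event == event)
			rc = cbs[i].fn(port_id, event, cbs[i].cb_arg,
				       ret_param);
	return rc;
}

/* Example application callback that records what it received. */
static void *seen_cb_arg, *seen_ret_param;

static int sketch_app_cb(uint16_t port_id, enum sketch_event event,
			 void *cb_arg, void *ret_param)
{
	(void)port_id;
	(void)event;
	seen_cb_arg = cb_arg;
	seen_ret_param = ret_param;
	return 0;
}
```

Because `cb_arg` never crosses the driver-facing call, a driver has no way to override it, which is exactly what dropping the extra NULL argument makes explicit.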
> > Signed-off-by: Thomas Monjalon > Reviewed-by: Ferruh Yigit > --- > v2: add history > v3: no change > --- > doc/guides/prog_guide/poll_mode_drv.rst | 4 ++-- > drivers/net/bnxt/rte_pmd_bnxt.c | 2 +- > drivers/net/bonding/rte_eth_bond_pmd.c | 6 +++--- > drivers/net/dpaa2/dpaa2_ethdev.c| 2 +- > drivers/net/e1000/em_ethdev.c | 2 +- > drivers/net/e1000/igb_ethdev.c | 4 ++-- > drivers/net/enic/enic_main.c| 2 +- > drivers/net/failsafe/failsafe_ether.c | 2 +- > drivers/net/fm10k/fm10k_ethdev.c| 8 > drivers/net/i40e/i40e_ethdev.c | 2 +- > drivers/net/i40e/i40e_ethdev_vf.c | 2 +- > drivers/net/i40e/i40e_pf.c | 3 +-- > drivers/net/ixgbe/ixgbe_ethdev.c| 6 +++--- > drivers/net/ixgbe/ixgbe_pf.c| 4 ++-- > drivers/net/mlx4/mlx4_intr.c| 4 ++-- > drivers/net/mlx5/mlx5_ethdev.c | 9 +++-- > drivers/net/nfp/nfp_net.c | 2 +- > drivers/net/sfc/sfc_intr.c | 4 ++-- > drivers/net/thunderx/nicvf_ethdev.c | 2 +- > drivers/net/vhost/rte_eth_vhost.c | 9 +++-- > drivers/net/virtio/virtio_ethdev.c | 2 +- > drivers/net/vmxnet3/vmxnet3_ethdev.c| 2 +- > lib/librte_ether/rte_ethdev.c | 4 +--- > lib/librte_ether/rte_ethdev.h | 4 +--- > test/test/virtual_pmd.c | 2 +- > 25 files changed, 41 insertions(+), 52 deletions(-) > > diff --git a/doc/guides/prog_guide/poll_mode_drv.rst > b/doc/guides/prog_guide/poll_mode_drv.rst > index 6a0c9f992..d1d4b1cb7 100644 > --- a/doc/guides/prog_guide/poll_mode_drv.rst > +++ b/doc/guides/prog_guide/poll_mode_drv.rst > @@ -581,8 +581,8 @@ thread safety all these operations should be called from > the same thread. > For example when PF is reset, the PF sends a message to notify VFs of this > event and also trigger an interrupt to VFs. Then in the interrupt service > routine > the VFs detects this notification message and calls - > _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET, NULL, - > NULL). This means that a PF reset triggers an RTE_ETH_EVENT_INTR_RESET > +_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET, NULL). 
> +This means that a PF reset triggers an RTE_ETH_EVENT_INTR_RESET > event within VFs. The function _rte_eth_dev_callback_process() will call the > registered callback function. The callback function can trigger the > application to > handle all operations the VF reset requires including diff --git > a/drivers/net/bnxt/rte_pmd_bnxt.c b/drivers/net/bnxt/rte_pmd_bnxt.c index > a31340742..e86e670dc 100644 > --- a/drivers/net/bnxt/rte_pmd_bnxt.c > +++ b/drivers/net/bnxt/rte_pmd_bnxt.c > @@ -57,7 +57,7 @@ int bnxt_rcv_msg_from_vf(struct bnxt *bp, uint16_t vf_id, > void *msg) > ret_param.msg = msg; > > _rte_eth_dev_callback_process(bp->eth_dev, > RTE_ETH_EVENT_VF_MBOX, > - NULL, &ret_param); > + &ret_param); > > /* Default to approve */ > if (ret_param.retval == RTE_PMD_BNXT_MB_EVENT_PROCEED) diff -- > git a/drivers/net/bonding/rte_eth_bond_pmd.c > b/drivers/net/bonding/rte_eth_bond_pmd.c > index fe2328954..68952c4c0 100644 > --- a/drivers/net/bonding/rte_eth_bond_pmd.c > +++ b/drivers/net/bonding/rte_eth_bond_pmd.c > @@ -2476,7 +2476,7 @@ bond_ethdev_delayed_lsc_propagation(void *arg) > return; > > _rte_eth_dev_callback_process((struct rte_eth_dev *)arg, > - RTE_ETH_EVENT_INTR_LSC, NULL, NULL); > + RTE_ETH_EVENT_INTR_LSC, NULL); > } > > int > @@ -2584,7 +2584,7 @@ bond_ethdev_lsc_event_callback(uint16_t port_id, > enum rte_eth_event_type type, >
Re: [dpdk-dev] [PATCH v3 1/5] ethdev: remove useless parameter in callback process
On Tue, Jan 02, 2018 at 11:35:02AM +, Iremonger, Bernard wrote: > Hi Thomas, > > > -----Original Message----- > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas Monjalon > > Sent: Friday, December 29, 2017 1:37 PM > > To: dev@dpdk.org > > Cc: Yigit, Ferruh > > Subject: [dpdk-dev] [PATCH v3 1/5] ethdev: remove useless parameter in > > callback process > > > > The pointer to the user parameter of the callback registration is > > automatically > > passed to the callback function. > > There is no point in allowing a caller to change this user parameter. > > That's why this parameter is always set to NULL by PMDs and set only in the > > ethdev > > layer before calling the callback function. > > > > The history is that the user parameter was initially used by the callback > > implementation to pass some information between the application and the > > driver: > > c1ceaf3ad056 ("ethdev: add an argument to internal callback function") > > Then a new parameter was added to leave the user parameter to its > > standard usage of context given at registration: > > d6af1a13d7a1 ("ethdev: add return values to callback process API") > > > > The NULL parameter in the internal callback processing function is now > > removed. This makes it clear that the callback parameter is user managed and > > opaque from a DPDK point of view.
> > > > Signed-off-by: Thomas Monjalon > > Reviewed-by: Ferruh Yigit > > --- > > v2: add history > > v3: no change > > --- > > doc/guides/prog_guide/poll_mode_drv.rst | 4 ++-- > > drivers/net/bnxt/rte_pmd_bnxt.c | 2 +- > > drivers/net/bonding/rte_eth_bond_pmd.c | 6 +++--- > > drivers/net/dpaa2/dpaa2_ethdev.c| 2 +- > > drivers/net/e1000/em_ethdev.c | 2 +- > > drivers/net/e1000/igb_ethdev.c | 4 ++-- > > drivers/net/enic/enic_main.c| 2 +- > > drivers/net/failsafe/failsafe_ether.c | 2 +- > > drivers/net/fm10k/fm10k_ethdev.c| 8 > > drivers/net/i40e/i40e_ethdev.c | 2 +- > > drivers/net/i40e/i40e_ethdev_vf.c | 2 +- > > drivers/net/i40e/i40e_pf.c | 3 +-- > > drivers/net/ixgbe/ixgbe_ethdev.c| 6 +++--- > > drivers/net/ixgbe/ixgbe_pf.c| 4 ++-- > > drivers/net/mlx4/mlx4_intr.c| 4 ++-- > > drivers/net/mlx5/mlx5_ethdev.c | 9 +++-- > > drivers/net/nfp/nfp_net.c | 2 +- > > drivers/net/sfc/sfc_intr.c | 4 ++-- > > drivers/net/thunderx/nicvf_ethdev.c | 2 +- > > drivers/net/vhost/rte_eth_vhost.c | 9 +++-- > > drivers/net/virtio/virtio_ethdev.c | 2 +- > > drivers/net/vmxnet3/vmxnet3_ethdev.c| 2 +- > > lib/librte_ether/rte_ethdev.c | 4 +--- > > lib/librte_ether/rte_ethdev.h | 4 +--- > > test/test/virtual_pmd.c | 2 +- > > 25 files changed, 41 insertions(+), 52 deletions(-) > > > > diff --git a/doc/guides/prog_guide/poll_mode_drv.rst > > b/doc/guides/prog_guide/poll_mode_drv.rst > > index 6a0c9f992..d1d4b1cb7 100644 > > --- a/doc/guides/prog_guide/poll_mode_drv.rst > > +++ b/doc/guides/prog_guide/poll_mode_drv.rst > > @@ -581,8 +581,8 @@ thread safety all these operations should be called from > > the same thread. > > For example when PF is reset, the PF sends a message to notify VFs of this > > event and also trigger an interrupt to VFs. Then in the interrupt service > > routine > > the VFs detects this notification message and calls - > > _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET, NULL, - > > NULL). 
This means that a PF reset triggers an RTE_ETH_EVENT_INTR_RESET > > +_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET, NULL). > > +This means that a PF reset triggers an RTE_ETH_EVENT_INTR_RESET > > event within VFs. The function _rte_eth_dev_callback_process() will call > > the > > registered callback function. The callback function can trigger the > > application to > > handle all operations the VF reset requires including diff --git > > a/drivers/net/bnxt/rte_pmd_bnxt.c b/drivers/net/bnxt/rte_pmd_bnxt.c index > > a31340742..e86e670dc 100644 > > --- a/drivers/net/bnxt/rte_pmd_bnxt.c > > +++ b/drivers/net/bnxt/rte_pmd_bnxt.c > > @@ -57,7 +57,7 @@ int bnxt_rcv_msg_from_vf(struct bnxt *bp, uint16_t vf_id, > > void *msg) > > ret_param.msg = msg; > > > > _rte_eth_dev_callback_process(bp->eth_dev, > > RTE_ETH_EVENT_VF_MBOX, > > - NULL, &ret_param); > > + &ret_param); > > > > /* Default to approve */ > > if (ret_param.retval == RTE_PMD_BNXT_MB_EVENT_PROCEED) diff -- > > git a/drivers/net/bonding/rte_eth_bond_pmd.c > > b/drivers/net/bonding/rte_eth_bond_pmd.c > > index fe2328954..68952c4c0 100644 > > --- a/drivers/net/bonding/rte_eth_bond_pmd.c > > +++ b/drivers/net/bonding/rte_eth_bond_pmd.c > > @@ -2476,7 +2476,7 @@ bond_ethdev_delayed_lsc_propagation(void *arg) > > return; > > > > _rte_eth_dev_callback_process((struct r
[dpdk-dev] [PATCH v1 1/5] rawdev: introduce raw device library support
A device in DPDK has a flavor to it: ethernet, crypto, event, etc.

A rawdevice represents a generic device mappable to a device flavor that is not currently handled out of the box by the DPDK framework. A device which can be scanned on an installed bus (pci, fslmc, ...) or instantiated through devargs can be interfaced using standardized APIs, just like other standardized devices.

This library introduces an API set which can be plugged on the northbound side to the application layer, and on the southbound side to the driver layer.

The APIs of the rawdev library expose some generic operations which can enable configuration and I/O with the raw devices.

Signed-off-by: Shreyansh Jain --- lib/librte_rawdev/Makefile | 27 ++ lib/librte_rawdev/rte_rawdev.c | 540 lib/librte_rawdev/rte_rawdev.h | 586 ++ lib/librte_rawdev/rte_rawdev_pmd.h | 588 +++ lib/librte_rawdev/rte_rawdev_version.map | 33 ++ 5 files changed, 1774 insertions(+) create mode 100644 lib/librte_rawdev/Makefile create mode 100644 lib/librte_rawdev/rte_rawdev.c create mode 100644 lib/librte_rawdev/rte_rawdev.h create mode 100644 lib/librte_rawdev/rte_rawdev_pmd.h create mode 100644 lib/librte_rawdev/rte_rawdev_version.map diff --git a/lib/librte_rawdev/Makefile b/lib/librte_rawdev/Makefile new file mode 100644 index 0..addb288d7 --- /dev/null +++ b/lib/librte_rawdev/Makefile @@ -0,0 +1,27 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright 2017 NXP + +include $(RTE_SDK)/mk/rte.vars.mk + +# library name +LIB = librte_rawdev.a + +# library version +LIBABIVER := 1 + +# build flags +CFLAGS += -O3 +CFLAGS += $(WERROR_FLAGS) +LDLIBS += -lrte_eal + +# library source files +SRCS-y += rte_rawdev.c + +# export include files +SYMLINK-y-include += rte_rawdev.h +SYMLINK-y-include += rte_rawdev_pmd.h + +# versioning export map +EXPORT_MAP := rte_rawdev_version.map + +include $(RTE_SDK)/mk/rte.lib.mk diff --git a/lib/librte_rawdev/rte_rawdev.c b/lib/librte_rawdev/rte_rawdev.c new file mode 100644 index 0..2d34d9b6d ---
/dev/null +++ b/lib/librte_rawdev/rte_rawdev.c @@ -0,0 +1,540 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright 2017 NXP + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "rte_rawdev.h" +#include "rte_rawdev_pmd.h" + +/* dynamic log identifier */ +int librawdev_logtype; + +/* Maximum rawdevices supported by system. + */ +#define RTE_MAX_RAWDEVPORTS10 + +struct rte_rawdev rte_rawdevices[RTE_RAWDEV_MAX_DEVS]; + +struct rte_rawdev *rte_rawdevs = &rte_rawdevices[0]; + +static struct rte_rawdev_global rawdev_globals = { + .nb_devs= 0 +}; + +struct rte_rawdev_global *rte_rawdev_globals = &rawdev_globals; + +/* Raw device, northbound API implementation */ +uint8_t +rte_rawdev_count(void) +{ + return rte_rawdev_globals->nb_devs; +} + +uint16_t +rte_rawdev_get_dev_id(const char *name) +{ + uint16_t i; + + if (!name) + return -EINVAL; + + for (i = 0; i < rte_rawdev_globals->nb_devs; i++) + if ((strcmp(rte_rawdevices[i].name, name) + == 0) && + (rte_rawdevices[i].attached == + RTE_RAWDEV_ATTACHED)) + return i; + return -ENODEV; +} + +int +rte_rawdev_socket_id(uint16_t dev_id) +{ + struct rte_rawdev *dev; + + RTE_RAWDEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL); + dev = &rte_rawdevs[dev_id]; + + return dev->socket_id; +} + +int +rte_rawdev_info_get(uint16_t dev_id, struct rte_rawdev_info *dev_info) +{ + struct rte_rawdev *rawdev; + + RTE_RAWDEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL); + RTE_FUNC_PTR_OR_ERR_RET(dev_info, -EINVAL); + + if (dev_info == NULL) + return -EINVAL; + + rawdev = &rte_rawdevs[dev_id]; + + RTE_FUNC_PTR_OR_ERR_RET(*rawdev->dev_ops->dev_info_get, -ENOTSUP); + (*rawdev->dev_ops->dev_info_get)(rawdev, dev_info->dev_private); + + if (dev_info) { + + dev_info->driver_name = rawdev->driver_name; + dev_info->device = rawdev->device; + } + 
+ return 0; +} + +int +rte_rawdev_configure(uint16_t dev_id, struct rte_rawdev_info *dev_conf) +{ + struct rte_rawdev *dev; + int diag; + + RTE_RAWDEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL); + RTE_FUNC_PTR_OR_ERR_RET(dev_conf, -EINVAL); + + dev = &rte_rawdevs[dev_id]; + + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_configure, -ENOTSUP); + + if (dev->started) { + RTE_RDEV_ERR( + "device %d must be stopped to al
[dpdk-dev] [PATCH v1 0/5] Introduce generic 'rawdevice' support
Rawdevice Support in DPDK
-------------------------

RFC [1]: http://dpdk.org/ml/archives/dev/2017-November/081550.html
(from: hemant.agra...@nxp.com)

This patchset introduces rawdevice, or generic device, support in DPDK.

Motivation
==========

In terms of device flavor (type) support, DPDK currently has ethernet
(lib_ether), cryptodev (libcryptodev), eventdev (libeventdev) and vdev
(virtual device) support.

For a new type of device, for example an accelerator, there are few options
other than one of:
1. Create another lib/librte_MySpecialDev and driver/MySpecialDrv and use
   it through the Bus/PMD model.
2. Create a vdev and implement the necessary custom APIs, which are directly
   exposed from the driver layer. However, this may still require changes in
   bus code in DPDK.

Either method is unclean (touching lib for a specific context) and possibly
non-upstreamable (custom APIs). Applications and customers prefer a uniform
device view and programming model.

Scope
=====

The rawdevice implementation is targeted towards various accelerator use
cases which cannot be generalized within existing device models. The aim is
to provide a generalized structure at the cost of a portability guarantee.
Specific PMDs may also expose any specific config APIs. Applications built
over such devices are special use cases involving IP blocks.

The rawdevice may also connect to other standard devices using adapters or
other methods, similar to the eventdev adapter for ethernet/crypto devices.

Proposed Solution
=================

Define a very generic superset of device type and its device operations
that can be exposed such that any new/upcoming/experimental device can be
layered over it. The 'rawdevice' semantic in this patchset represents a
device that doesn't have any advertised flavor/type associated with it
(like net, crypto, etc.).

A *rte_rawdevice* is a raw/generic device without any standard
configuration or input/output method assumption.
Thus, a driver for a new accelerator block, which requires operations for
start/stop/enqueue/dequeue, can be quickly strapped over this rawdevice
layer. Thereafter, any appropriate bus can scan for it (assuming the device
is discoverable over Linux interfaces like sysfs) and match it against
registered drivers.

Similarly, for a new accelerator or a wireless device which doesn't fit the
eth type, a driver can be registered with a bus (on which its device would
be scannable) and use this layer for configuring the device.

It can also serve as a staging area for new types of devices until they find
some commonality and can be standardized.

The outline of this proposed library is the same as that of the existing
ether/crypto devices:

[Diagram: Application(s) on top of the DPDK Framework (APIs), which dispatches crypto ops, eth ops and rawdev ops to the crypto, ethdev and raw libraries respectively; drivers (DrvA, DrvB) bind at Bus Probe to devices DevA to DevF found by Bus Scan (PCI, BusA).]

* It is assumed above that DrvB is a PCI type driver which registers itself
  with the PCI bus.
* Thereafter, when the PCI scan is done, during probe DrvB would match the
  rawdev DevF ID and take control of the device.
* Applications can then continue using the device through the rawdev API
  interfaces.

Proposed Interfaces
===================

The following broad API categories are exposed by the rawdevice:
1) Device State Operations (start/stop/reset)
2) Communication Channel setup/teardown (queue)
3) Attribute get/set operations (~ioctls)
4) Enqueue/Dequeue Operations (using opaque buffers)
5) Firmware Operations (Load/unload)

Notes:
For (1), in addition to the standard start/stop, reset has been added. This
is for cases where a device power cycle has various definitions. The
semantics of stop->start versus reset are still open-ended.
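One way to picture the five API categories listed above is as a per-driver table of function pointers, with the generic layer returning `-ENOTSUP` for anything a driver leaves unset (the same convention that `RTE_FUNC_PTR_OR_ERR_RET` follows in the library patch). The sketch below is hypothetical throughout: `struct sketch_rawdev_ops`, the wrapper functions and the FIFO-backed dummy driver are illustrations, not the API proposed in this patchset.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sketch_rawdev;

/* Hypothetical ops table: one group per API category in the list. */
struct sketch_rawdev_ops {
	/* 1) device state */
	int (*start)(struct sketch_rawdev *dev);
	int (*stop)(struct sketch_rawdev *dev);
	int (*reset)(struct sketch_rawdev *dev);
	/* 2) communication channels */
	int (*queue_setup)(struct sketch_rawdev *dev, unsigned int queue_id);
	/* 3) attributes (~ioctls) */
	int (*attr_get)(struct sketch_rawdev *dev, const char *name,
			unsigned long *value);
	int (*attr_set)(struct sketch_rawdev *dev, const char *name,
			unsigned long value);
	/* 4) opaque-buffer I/O; 'context' is driver-defined (e.g. a queue id) */
	int (*enqueue)(struct sketch_rawdev *dev, void **bufs,
		       unsigned int count, void *context);
	int (*dequeue)(struct sketch_rawdev *dev, void **bufs,
		       unsigned int count, void *context);
	/* 5) firmware */
	int (*firmware_load)(struct sketch_rawdev *dev, const void *image);
};

struct sketch_rawdev {
	const struct sketch_rawdev_ops *ops;
	void *priv; /* driver-private state */
};

/* Generic layer: dispatch to the driver, or report "not supported". */
static int sketch_rawdev_enqueue(struct sketch_rawdev *dev, void **bufs,
				 unsigned int n, void *ctx)
{
	return dev->ops->enqueue ? dev->ops->enqueue(dev, bufs, n, ctx)
				 : -ENOTSUP;
}

static int sketch_rawdev_dequeue(struct sketch_rawdev *dev, void **bufs,
				 unsigned int n, void *ctx)
{
	return dev->ops->dequeue ? dev->ops->dequeue(dev, bufs, n, ctx)
				 : -ENOTSUP;
}

static int sketch_rawdev_firmware_load(struct sketch_rawdev *dev,
				       const void *image)
{
	return dev->ops->firmware_load ? dev->ops->firmware_load(dev, image)
				       : -ENOTSUP;
}

/* Dummy driver: a fixed-size FIFO of opaque buffer pointers. */
#define DUMMY_FIFO_SIZE 8

struct dummy_priv {
	void *slots[DUMMY_FIFO_SIZE];
	unsigned int head, tail; /* free-running counters */
};

static int dummy_enqueue(struct sketch_rawdev *dev, void **bufs,
			 unsigned int n, void *ctx)
{
	struct dummy_priv *p = dev->priv;
	unsigned int i;

	(void)ctx; /* a real driver might decode a queue id from this */
	for (i = 0; i < n && p->tail - p->head < DUMMY_FIFO_SIZE; i++)
		p->slots[p->tail++ % DUMMY_FIFO_SIZE] = bufs[i];
	return (int)i;
}

static int dummy_dequeue(struct sketch_rawdev *dev, void **bufs,
			 unsigned int n, void *ctx)
{
	struct dummy_priv *p = dev->priv;
	unsigned int i;

	(void)ctx;
	for (i = 0; i < n && p->head != p->tail; i++)
		bufs[i] = p->slots[p->head++ % DUMMY_FIFO_SIZE];
	return (int)i;
}

/* Only I/O is implemented; every other op stays NULL (unsupported). */
static const struct sketch_rawdev_ops dummy_ops = {
	.enqueue = dummy_enqueue,
	.dequeue = dummy_dequeue,
};
```

Because the buffers are opaque `void *` pointers and the context is driver-defined, the generic layer never needs to know what kind of device sits underneath.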
For (2), though currently `queue` has been used as a semantic, it would be possible in implementation to use this with other
[dpdk-dev] [PATCH v1 4/5] config: enable compilation of rawdev skeleton driver
Signed-off-by: Shreyansh Jain --- config/common_base | 1 + drivers/Makefile | 2 ++ mk/rte.app.mk | 4 3 files changed, 7 insertions(+) diff --git a/config/common_base b/config/common_base index 3d2e12c31..5f7dfd9e9 100644 --- a/config/common_base +++ b/config/common_base @@ -798,6 +798,7 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n # CONFIG_RTE_LIBRTE_RAWDEV=y CONFIG_RTE_MAX_RAWDEVS=10 +CONFIG_RTE_LIBRTE_PMD_SKELETON_RAWDEV=y # # Compile vhost PMD diff --git a/drivers/Makefile b/drivers/Makefile index db0cd76ee..407f22a3c 100644 --- a/drivers/Makefile +++ b/drivers/Makefile @@ -40,5 +40,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += crypto DEPDIRS-crypto := bus mempool DIRS-$(CONFIG_RTE_LIBRTE_EVENTDEV) += event DEPDIRS-event := bus mempool net +DIRS-$(CONFIG_RTE_LIBRTE_RAWDEV) += raw +DEPDIRS-raw := bus mempool net event include $(RTE_SDK)/mk/rte.subdir.mk diff --git a/mk/rte.app.mk b/mk/rte.app.mk index d783de2c1..ce037827b 100644 --- a/mk/rte.app.mk +++ b/mk/rte.app.mk @@ -203,6 +203,10 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_OCTEONTX_MEMPOOL) += -lrte_mempool_octeontx _LDLIBS-$(CONFIG_RTE_LIBRTE_OCTEONTX_PMD) += -lrte_pmd_octeontx endif # CONFIG_RTE_LIBRTE_EVENTDEV +ifeq ($(CONFIG_RTE_LIBRTE_RAWDEV),y) +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_SKELETON_RAWDEV) += -lrte_pmd_skeleton_rawdev +endif # CONFIG_RTE_LIBRTE_RAWDEV + ifeq ($(CONFIG_RTE_LIBRTE_DPAA2_PMD),y) _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_PMD) += -lrte_bus_fslmc _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_PMD) += -lrte_mempool_dpaa2 -- 2.14.1
[dpdk-dev] [PATCH v1 2/5] config: enable compilation of rawdev library
Add config option CONFIG_RTE_LIBRTE_RAWDEV for toggling rawdev library support. This patch also enables compilation of the library. Signed-off-by: Shreyansh Jain --- config/common_base | 7 +++ lib/Makefile | 3 +++ mk/rte.app.mk | 1 + 3 files changed, 11 insertions(+) diff --git a/config/common_base b/config/common_base index e74febef4..3d2e12c31 100644 --- a/config/common_base +++ b/config/common_base @@ -792,6 +792,13 @@ CONFIG_RTE_LIBRTE_VHOST=n CONFIG_RTE_LIBRTE_VHOST_NUMA=n CONFIG_RTE_LIBRTE_VHOST_DEBUG=n +# +# Compile raw device support +# EXPERIMENTAL: API may change without prior notice +# +CONFIG_RTE_LIBRTE_RAWDEV=y +CONFIG_RTE_MAX_RAWDEVS=10 + # # Compile vhost PMD # To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled. diff --git a/lib/Makefile b/lib/Makefile index dc4e8df70..c75b7a694 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -126,4 +126,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni endif DEPDIRS-librte_kni := librte_eal librte_mempool librte_mbuf librte_ether +DIRS-$(CONFIG_RTE_LIBRTE_RAWDEV) += librte_rawdev +DEPDIRS-librte_rawdev := librte_eal librte_ether + include $(RTE_SDK)/mk/rte.subdir.mk diff --git a/mk/rte.app.mk b/mk/rte.app.mk index 6a6a7452e..d783de2c1 100644 --- a/mk/rte.app.mk +++ b/mk/rte.app.mk @@ -104,6 +104,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)+= -lrte_eal _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)+= -lrte_cmdline _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)+= -lrte_reorder _LDLIBS-$(CONFIG_RTE_LIBRTE_SCHED) += -lrte_sched +_LDLIBS-$(CONFIG_RTE_LIBRTE_RAWDEV) += -lrte_rawdev ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y) _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)+= -lrte_kni -- 2.14.1
[dpdk-dev] [PATCH v1 5/5] test: support for rawdev testcases
Patch introduces rawdev unit testcase for validation against the Skeleton rawdev dummy PMD implementation. Signed-off-by: Shreyansh Jain --- test/test/Makefile | 4 + test/test/test_rawdev.c | 376 2 files changed, 380 insertions(+) create mode 100644 test/test/test_rawdev.c diff --git a/test/test/Makefile b/test/test/Makefile index bb54c9808..038343d38 100644 --- a/test/test/Makefile +++ b/test/test/Makefile @@ -214,6 +214,10 @@ SRCS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += test_eventdev_sw.c SRCS-$(CONFIG_RTE_LIBRTE_PMD_OCTEONTX_SSOVF) += test_eventdev_octeontx.c endif +ifeq ($(CONFIG_RTE_LIBRTE_RAWDEV),y) +SRCS-y += test_rawdev.c +endif + SRCS-$(CONFIG_RTE_LIBRTE_KVARGS) += test_kvargs.c CFLAGS += -O3 diff --git a/test/test/test_rawdev.c b/test/test/test_rawdev.c new file mode 100644 index 0..000331387 --- /dev/null +++ b/test/test/test_rawdev.c @@ -0,0 +1,376 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright 2017 NXP + */ + +#include +#include +#include +#include +#include +#include +#include + +#include "test.h" + +/* Using relative path as skeleton_rawdev is not part of exported headers */ +#include "../../drivers/raw/skeleton_rawdev/skeleton_rawdev.h" + +#define TEST_DEV_ID 0 +#define TEST_DEV_NAME "rawdev_skeleton" + +static int +testsuite_setup(void) +{ + uint8_t count; + count = rte_rawdev_count(); + if (!count) { + printf("\tNo existing rawdev; Creating 'skeldev_rawdev'\n"); + return rte_vdev_init(TEST_DEV_NAME, NULL); + } + + return TEST_SUCCESS; +} + +static void local_teardown(void); + +static void +testsuite_teardown(void) +{ + local_teardown(); +} + +static void +local_teardown(void) +{ + rte_vdev_uninit(TEST_DEV_NAME); +} + +static int +test_rawdev_count(void) +{ + uint8_t count; + count = rte_rawdev_count(); + TEST_ASSERT(count > 0, "Invalid rawdev count %" PRIu8, count); + return TEST_SUCCESS; +} + +static int +test_rawdev_get_dev_id(void) +{ + int ret; + ret = rte_rawdev_get_dev_id("invalid_rawdev_device"); + TEST_ASSERT_FAIL(ret, 
"Expected <0 for invalid dev name ret=%d", ret); + return TEST_SUCCESS; +} + +static int +test_rawdev_socket_id(void) +{ + int socket_id; + socket_id = rte_rawdev_socket_id(TEST_DEV_ID); + TEST_ASSERT(socket_id != -EINVAL, "Failed to get socket_id %d", + socket_id); + socket_id = rte_rawdev_socket_id(RTE_RAWDEV_MAX_DEVS); + TEST_ASSERT(socket_id == -EINVAL, "Expected -EINVAL %d", socket_id); + + return TEST_SUCCESS; +} + +static int +test_rawdev_info_get(void) +{ + int ret; + struct rte_rawdev_info rdev_info = {0}; + struct skeleton_rawdev_conf skel_conf = {0}; + + ret = rte_rawdev_info_get(TEST_DEV_ID, NULL); + TEST_ASSERT(ret == -EINVAL, "Expected -EINVAL, %d", ret); + + rdev_info.dev_private = &skel_conf; + + ret = rte_rawdev_info_get(TEST_DEV_ID, &rdev_info); + TEST_ASSERT_SUCCESS(ret, "Failed to get raw dev info"); + + return TEST_SUCCESS; +} + +static int +test_rawdev_configure(void) +{ + int ret; + struct rte_rawdev_info rdev_info = {0}; + struct skeleton_rawdev_conf rdev_conf_set = {0}; + struct skeleton_rawdev_conf rdev_conf_get = {0}; + + /* Check invalid configuration */ + ret = rte_rawdev_configure(TEST_DEV_ID, NULL); + TEST_ASSERT(ret == -EINVAL, "Null configure; Expected -EINVAL, got %d", + ret); + + /* Valid configuration test */ + rdev_conf_set.num_queues = 1; + rdev_conf_set.capabilities = SKELETON_CAPA_FW_LOAD | +SKELETON_CAPA_FW_RESET; + + rdev_info.dev_private = &rdev_conf_set; + ret = rte_rawdev_configure(TEST_DEV_ID, + (rte_rawdev_obj_t)&rdev_info); + TEST_ASSERT_SUCCESS(ret, "Failed to configure rawdev (%d)", ret); + + rdev_info.dev_private = &rdev_conf_get; + ret = rte_rawdev_info_get(TEST_DEV_ID, + (rte_rawdev_obj_t)&rdev_info); + TEST_ASSERT_SUCCESS(ret, "Failed to obtain rawdev configuration (%d)", + ret); + + TEST_ASSERT_EQUAL(rdev_conf_set.num_queues, rdev_conf_get.num_queues, + "Configuration test failed; num_queues (%d)(%d)", + rdev_conf_set.num_queues, rdev_conf_get.num_queues); + TEST_ASSERT_EQUAL(rdev_conf_set.capabilities, + 
rdev_conf_get.capabilities, + "Configuration test failed; capabilities"); + + return TEST_SUCCESS; +} + +static int +test_rawdev_queue_default_conf_get(void) +{ + int ret, i; + struct rte_rawdev_info rdev_info = {0}; + struct skeleton_rawdev_conf rdev_conf_get = {0}; + struct skeleton_rawdev_queue q = {0};
[dpdk-dev] [PATCH v1 3/5] drivers/raw: introduce skeleton rawdev driver
Skeleton rawdevice driver, on the lines of eventdev skeleton, is for showcasing the rawdev library. This driver implements some of the operations of the library based on which a test module can be developed. * Design of skeleton involves a virtual device which is plugged into VDEV bus on initialization. * Enqueue and Dequeue buffers essentially hold a series of buffers for a sequence enqueue->dequeue operation. Important in this is use of context as parameter to define opaque information (like queue_id) transacted between application and driver. * Device start and stop are dummy-fied operations for enabling and disabling the device. * Firmware operations are not implemented but can be easily extended with corresponding test cases in the unittest framework. Signed-off-by: Shreyansh Jain --- drivers/raw/Makefile | 9 + drivers/raw/skeleton_rawdev/Makefile | 25 + .../rte_pmd_skeleton_rawdev_version.map| 4 + drivers/raw/skeleton_rawdev/skeleton_rawdev.c | 668 + drivers/raw/skeleton_rawdev/skeleton_rawdev.h | 130 5 files changed, 836 insertions(+) create mode 100644 drivers/raw/Makefile create mode 100644 drivers/raw/skeleton_rawdev/Makefile create mode 100644 drivers/raw/skeleton_rawdev/rte_pmd_skeleton_rawdev_version.map create mode 100644 drivers/raw/skeleton_rawdev/skeleton_rawdev.c create mode 100644 drivers/raw/skeleton_rawdev/skeleton_rawdev.h diff --git a/drivers/raw/Makefile b/drivers/raw/Makefile new file mode 100644 index 0..da7c8b449 --- /dev/null +++ b/drivers/raw/Makefile @@ -0,0 +1,9 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright 2017 NXP + +include $(RTE_SDK)/mk/rte.vars.mk + +# DIRS-$() += +DIRS-$(CONFIG_RTE_LIBRTE_PMD_SKELETON_RAWDEV) += skeleton_rawdev + +include $(RTE_SDK)/mk/rte.subdir.mk diff --git a/drivers/raw/skeleton_rawdev/Makefile b/drivers/raw/skeleton_rawdev/Makefile new file mode 100644 index 0..4d9b2f804 --- /dev/null +++ b/drivers/raw/skeleton_rawdev/Makefile @@ -0,0 +1,25 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright 
2017 NXP + +include $(RTE_SDK)/mk/rte.vars.mk + +# +# library name +# +LIB = librte_pmd_skeleton_rawdev.a + +CFLAGS += $(WERROR_FLAGS) +LDLIBS += -lrte_eal +LDLIBS += -lrte_rawdev +LDLIBS += -lrte_bus_vdev + +EXPORT_MAP := rte_pmd_skeleton_rawdev_version.map + +LIBABIVER := 1 + +# +# all source are stored in SRCS-y +# +SRCS-$(CONFIG_RTE_LIBRTE_PMD_SKELETON_RAWDEV) += skeleton_rawdev.c + +include $(RTE_SDK)/mk/rte.lib.mk diff --git a/drivers/raw/skeleton_rawdev/rte_pmd_skeleton_rawdev_version.map b/drivers/raw/skeleton_rawdev/rte_pmd_skeleton_rawdev_version.map new file mode 100644 index 0..179140fb8 --- /dev/null +++ b/drivers/raw/skeleton_rawdev/rte_pmd_skeleton_rawdev_version.map @@ -0,0 +1,4 @@ +DPDK_18.02 { + + local: *; +}; diff --git a/drivers/raw/skeleton_rawdev/skeleton_rawdev.c b/drivers/raw/skeleton_rawdev/skeleton_rawdev.c new file mode 100644 index 0..56e6805df --- /dev/null +++ b/drivers/raw/skeleton_rawdev/skeleton_rawdev.c @@ -0,0 +1,668 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright 2017 NXP + */ + +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#include "skeleton_rawdev.h" + +/* Dynamic log type identifier */ +int skeleton_pmd_logtype; + +#define SKELETON_PMD_RAWDEV_NAME rawdev_skeleton +/**< Rawdev Skeleton dummy driver name */ + +static struct rte_vdev_driver skeleton_pmd_drv; +/**< Skeleton rawdev driver object */ + +struct queue_buffers { + void *bufs[SKELETON_QUEUE_MAX_DEPTH]; +}; + +static struct queue_buffers queue_buf[SKELETON_MAX_QUEUES] = {0}; +static void clear_queue_bufs(int queue_id); + +static void skeleton_rawdev_info_get(struct rte_rawdev *dev, +rte_rawdev_obj_t dev_info) +{ + struct skeleton_rawdev *skeldev; + struct skeleton_rawdev_conf *skeldev_conf; + + SKELETON_PMD_FUNC_TRACE(); + + if (!dev_info) { + SKELETON_PMD_ERR("Invalid request"); + return; + } + + skeldev = 
skeleton_rawdev_get_priv(dev); + + skeldev_conf = dev_info; + + skeldev_conf->num_queues = skeldev->num_queues; + skeldev_conf->capabilities = skeldev->capabilities; + skeldev_conf->device_state = skeldev->device_state; + skeldev_conf->firmware_state = skeldev->fw.firmware_state; +} + +static int skeleton_rawdev_configure(const struct rte_rawdev *dev, +rte_rawdev_obj_t config) +{ + struct skeleton_rawdev *skeldev; + struct skeleton_rawdev_conf *skeldev_conf; + + SKELETON_PMD_FUNC_TRACE(); + + RTE_FUNC_PTR_OR_ERR_RET(dev, -EINVAL); + + if (!config) { + SKELETON_PMD_ERR("Invalid configuration"); +
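The commit message describes enqueue/dequeue as holding a series of buffers per queue, with an opaque context (such as a queue_id) passed between the application and the driver. The sketch below illustrates that design in plain C. It is hypothetical and is not the driver's code: `skel_enqueue`, `skel_dequeue`, `SKEL_MAX_QUEUES` and `SKEL_QUEUE_MAX_DEPTH` stand in for the driver's `queue_buf[]` array and `SKELETON_*` constants, and the context is assumed to point at a `uint16_t` queue id.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical limits; the real values live in skeleton_rawdev.h. */
#define SKEL_MAX_QUEUES      4
#define SKEL_QUEUE_MAX_DEPTH 16

struct skel_queue {
	void *bufs[SKEL_QUEUE_MAX_DEPTH];
	unsigned int count; /* buffers currently held */
};

static struct skel_queue skel_queues[SKEL_MAX_QUEUES];

/* Enqueue up to n buffers on the queue selected by the opaque context.
 * Returns the number accepted, or -1 if the context is invalid. */
static int skel_enqueue(void **bufs, unsigned int n, void *context)
{
	uint16_t qid;
	struct skel_queue *q;
	unsigned int i;

	if (context == NULL)
		return -1;
	qid = *(const uint16_t *)context;
	if (qid >= SKEL_MAX_QUEUES)
		return -1;
	q = &skel_queues[qid];
	for (i = 0; i < n && q->count < SKEL_QUEUE_MAX_DEPTH; i++)
		q->bufs[q->count++] = bufs[i];
	return (int)i;
}

/* Dequeue up to n buffers, oldest first, from the queue in the context. */
static int skel_dequeue(void **bufs, unsigned int n, void *context)
{
	uint16_t qid;
	struct skel_queue *q;
	unsigned int taken;

	if (context == NULL)
		return -1;
	qid = *(const uint16_t *)context;
	if (qid >= SKEL_MAX_QUEUES)
		return -1;
	q = &skel_queues[qid];
	taken = n < q->count ? n : q->count;
	memcpy(bufs, q->bufs, taken * sizeof(void *));
	/* Shift the remaining buffers down so the queue stays FIFO. */
	memmove(q->bufs, q->bufs + taken, (q->count - taken) * sizeof(void *));
	q->count -= taken;
	return (int)taken;
}
```

Keeping the queue id in the context rather than in the function signature mirrors the generic rawdev style, where the library does not interpret per-driver arguments.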
Re: [dpdk-dev] [PATCH v6 1/2] net/i40e: support input set configuration
> -Original Message- > From: Xing, Beilei > Sent: Friday, December 8, 2017 3:52 PM > To: Wu, Jingjing ; Lu, Wenzhuo > ; Zhang, Qi Z > Cc: dev@dpdk.org; Chilikin, Andrey > Subject: [PATCH v6 1/2] net/i40e: support input set configuration > > This patch supports getting/setting input set info for RSS/FDIR/FDIR flexible > payload. > Also add some helper functions for input set configuration. > > Signed-off-by: Beilei Xing > --- > drivers/net/i40e/rte_pmd_i40e.c | 141 > ++ > drivers/net/i40e/rte_pmd_i40e.h | 138 > + > drivers/net/i40e/rte_pmd_i40e_version.map | 10 +++ > 3 files changed, 289 insertions(+) > > diff --git a/drivers/net/i40e/rte_pmd_i40e.c > b/drivers/net/i40e/rte_pmd_i40e.c index aeb92af..1f95f91 100644 > --- a/drivers/net/i40e/rte_pmd_i40e.c > +++ b/drivers/net/i40e/rte_pmd_i40e.c > @@ -2985,3 +2985,144 @@ int > rte_pmd_i40e_flow_add_del_packet_template( > > return i40e_flow_add_del_fdir_filter(dev, &filter_conf, add); } > + > +int > +rte_pmd_i40e_inset_get(uint16_t port, uint8_t pctype, > +struct rte_pmd_i40e_inset *inset, > +enum rte_pmd_i40e_inset_type inset_type) { > + struct rte_eth_dev *dev; > + struct i40e_hw *hw; > + uint64_t inset_reg; > + uint32_t mask_reg[2]; > + int i; > + > + RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV); > + > + dev = &rte_eth_devices[port]; > + > + if (!is_i40e_supported(dev)) > + return -ENOTSUP; > + > + if (pctype > 63) > + return -EINVAL; > + > + hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private); > + memset(inset, 0, sizeof(struct rte_pmd_i40e_inset)); > + > + switch (inset_type) { > + case INSET_HASH: > + /* Get input set */ > + inset_reg = > + i40e_read_rx_ctl(hw, I40E_GLQF_HASH_INSET(1, pctype)); > + inset_reg <<= I40E_32_BIT_WIDTH; > + inset_reg |= > + i40e_read_rx_ctl(hw, I40E_GLQF_HASH_INSET(0, pctype)); > + /* Get field mask */ > + mask_reg[0] = > + i40e_read_rx_ctl(hw, I40E_GLQF_HASH_MSK(0, pctype)); > + mask_reg[1] = > + i40e_read_rx_ctl(hw, I40E_GLQF_HASH_MSK(1, pctype)); > + break; > + case INSET_FDIR: > + 
inset_reg = > + i40e_read_rx_ctl(hw, I40E_PRTQF_FD_INSET(pctype, 1)); > + inset_reg <<= I40E_32_BIT_WIDTH; > + inset_reg |= > + i40e_read_rx_ctl(hw, I40E_PRTQF_FD_INSET(pctype, 0)); > + mask_reg[0] = > + i40e_read_rx_ctl(hw, I40E_GLQF_FD_MSK(0, pctype)); > + mask_reg[1] = > + i40e_read_rx_ctl(hw, I40E_GLQF_FD_MSK(1, pctype)); > + break; > + case INSET_FDIR_FLX: > + inset_reg = > + i40e_read_rx_ctl(hw, I40E_PRTQF_FD_FLXINSET(pctype)); > + mask_reg[0] = > + i40e_read_rx_ctl(hw, I40E_PRTQF_FD_MSK(pctype, 0)); > + mask_reg[1] = > + i40e_read_rx_ctl(hw, I40E_PRTQF_FD_MSK(pctype, 1)); > + break; > + default: > + PMD_DRV_LOG(ERR, "Unsupported input set type."); > + return -EINVAL; > + } > + > + inset->inset = inset_reg; > + > + for (i = 0; i < 2; i++) { > + inset->mask[i].field_idx = ((mask_reg[i] >> 16) & 0x3F); > + inset->mask[i].mask = mask_reg[i] & 0xFFFF; > + } > + > + return 0; > +} > + > +int > +rte_pmd_i40e_inset_set(uint16_t port, uint8_t pctype, > +struct rte_pmd_i40e_inset *inset, > +enum rte_pmd_i40e_inset_type inset_type) { > + struct rte_eth_dev *dev; > + struct i40e_hw *hw; > + uint64_t inset_reg; > + uint32_t mask_reg[2]; > + int i; > + > + RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV); > + > + dev = &rte_eth_devices[port]; > + > + if (!is_i40e_supported(dev)) > + return -ENOTSUP; > + > + if (pctype > 63) > + return -EINVAL; > + > + hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private); > + > + /* Clear mask first */ > + for (i = 0; i < 2; i++) > + i40e_check_write_reg(hw, I40E_GLQF_FD_MSK(i, pctype), 0); > + > + inset_reg = inset->inset; > + for (i = 0; i < 2; i++) > + mask_reg[i] = (inset->mask[i].field_idx << 16) | > + inset->mask[i].mask; > + > + switch (inset_type) { > + case INSET_HASH: > + i40e_check_write_reg(hw, I40E_GLQF_HASH_INSET(0, pctype), > + (uint32_t)(inset_reg & UINT32_MAX)); > + i40e_check_write_reg(hw, I40E_GLQF_HASH_INSET(1, pctype), > + (uint32_t)((inset_reg >> > + I40E_32_BIT_WIDTH) & UINT32_MA
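The get path above combines two 32-bit input-set registers into one 64-bit value and decodes each field-mask register, and the set path reverses both steps. The standalone sketch below isolates that bit manipulation, with no hardware access. From the shifts in the patch it assumes that bits 21:16 of a mask register hold the field index and bits 15:0 the 16-bit mask. `inset_combine`, `inset_split`, `mask_decode`, `mask_encode` and `struct inset_mask` are hypothetical names, not part of the rte_pmd_i40e API.

```c
#include <stdint.h>

/* Hypothetical mirror of one mask entry in struct rte_pmd_i40e_inset. */
struct inset_mask {
	uint8_t field_idx; /* 6-bit field selector */
	uint16_t mask;     /* 16-bit field mask */
};

/* High register shifted up by 32 bits, OR'ed with the low register. */
static uint64_t inset_combine(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Split a 64-bit input set into the two 32-bit registers to write. */
static void inset_split(uint64_t inset, uint32_t *lo, uint32_t *hi)
{
	*lo = (uint32_t)(inset & UINT32_MAX);
	*hi = (uint32_t)(inset >> 32);
}

/* Decode a mask register: bits 21:16 field index, bits 15:0 mask. */
static struct inset_mask mask_decode(uint32_t reg)
{
	struct inset_mask m;

	m.field_idx = (uint8_t)((reg >> 16) & 0x3F);
	m.mask = (uint16_t)(reg & 0xFFFF);
	return m;
}

/* Inverse of mask_decode(), as done on the set path. */
static uint32_t mask_encode(struct inset_mask m)
{
	return ((uint32_t)(m.field_idx & 0x3F) << 16) | m.mask;
}
```

Round-tripping a register through `mask_decode()` and `mask_encode()` should return the original value whenever bits 31:22 are zero.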
[dpdk-dev] [PATCH] bus/fslmc: add support for scanned device count
The FSLMC bus detects multiple types of logical objects representing components of the datapath. Given a device type, a newly introduced API, rte_fslmc_get_device_count, returns the number of scanned devices of that type. Signed-off-by: Shreyansh Jain --- :: This patch is based on *net-next* tree. drivers/bus/fslmc/fslmc_bus.c | 12 drivers/bus/fslmc/rte_bus_fslmc_version.map | 1 + drivers/bus/fslmc/rte_fslmc.h | 18 +++--- 3 files changed, 28 insertions(+), 3 deletions(-) diff --git a/drivers/bus/fslmc/fslmc_bus.c b/drivers/bus/fslmc/fslmc_bus.c index 63c333a59..39478f7f3 100644 --- a/drivers/bus/fslmc/fslmc_bus.c +++ b/drivers/bus/fslmc/fslmc_bus.c @@ -53,6 +53,14 @@ struct rte_fslmc_bus rte_fslmc_bus; uint8_t dpaa2_virt_mode; +uint32_t +rte_fslmc_get_device_count(enum rte_dpaa2_dev_type device_type) +{ + if (device_type > DPAA2_DEVTYPE_MAX) + return 0; + return rte_fslmc_bus.device_count[device_type]; +} + static void cleanup_fslmc_device_list(void) { @@ -164,6 +172,9 @@ scan_one_fslmc_device(char *dev_name) else dev->dev_type = DPAA2_UNKNOWN; + /* Update the device found into the device_count table */ + rte_fslmc_bus.device_count[dev->dev_type]++; + t_ptr = strtok(NULL, "."); if (!t_ptr) { FSLMC_BUS_LOG(ERR, "Incorrect device string observed (%s).", @@ -408,6 +419,7 @@ struct rte_fslmc_bus rte_fslmc_bus = { }, .device_list = TAILQ_HEAD_INITIALIZER(rte_fslmc_bus.device_list), .driver_list = TAILQ_HEAD_INITIALIZER(rte_fslmc_bus.driver_list), + .device_count = {0}, }; RTE_REGISTER_BUS(fslmc, rte_fslmc_bus.bus); diff --git a/drivers/bus/fslmc/rte_bus_fslmc_version.map b/drivers/bus/fslmc/rte_bus_fslmc_version.map index f59fc671f..16b759d8b 100644 --- a/drivers/bus/fslmc/rte_bus_fslmc_version.map +++ b/drivers/bus/fslmc/rte_bus_fslmc_version.map @@ -97,5 +97,6 @@ DPDK_18.02 { dpaa2_virt_mode; qbman_fq_query_state; qbman_fq_state_frame_count; + rte_fslmc_get_device_count; } DPDK_17.11; diff --git a/drivers/bus/fslmc/rte_fslmc.h b/drivers/bus/fslmc/rte_fslmc.h
index fd52e2b84..e6314b5cb 100644 --- a/drivers/bus/fslmc/rte_fslmc.h +++ b/drivers/bus/fslmc/rte_fslmc.h @@ -88,7 +88,8 @@ enum rte_dpaa2_dev_type { DPAA2_CI, /**< DPCI type device */ DPAA2_MPORTAL, /**< DPMCP type device */ /* Unknown device placeholder */ - DPAA2_UNKNOWN + DPAA2_UNKNOWN, + DPAA2_DEVTYPE_MAX, }; TAILQ_HEAD(rte_dpaa2_object_list, rte_dpaa2_object); @@ -150,8 +151,8 @@ struct rte_fslmc_bus { /**< FSLMC DPAA2 Device list */ struct rte_fslmc_driver_list driver_list; /**< FSLMC DPAA2 Driver list */ - int device_count; - /**< Optional: Count of devices on bus */ + int device_count[DPAA2_DEVTYPE_MAX]; + /**< Count of all devices scanned */ }; /** @@ -191,6 +192,17 @@ RTE_PMD_EXPORT_NAME(nm, __COUNTER__) */ void rte_fslmc_object_register(struct rte_dpaa2_object *object); +/** + * Count of a particular type of DPAA2 device scanned on the bus. + * + * @param dev_type + * Type of device as rte_dpaa2_dev_type enumerator + * @return + * >=0 for count; 0 indicates either no device of the said type scanned or + * invalid device type. + */ +uint32_t rte_fslmc_get_device_count(enum rte_dpaa2_dev_type device_type); + /** Helper for DPAA2 object registration */ #define RTE_PMD_REGISTER_DPAA2_OBJECT(nm, dpaa2_obj) \ RTE_INIT(dpaa2objinitfn_ ##nm); \ -- 2.14.1
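In the patch, the accessor rejects `device_type > DPAA2_DEVTYPE_MAX`, but `device_count` is declared with `DPAA2_DEVTYPE_MAX` elements, so passing `DPAA2_DEVTYPE_MAX` itself would read one element past the end of the array. The reduced sketch below shows the bounds check written with `>=`, together with the per-type increment done at scan time. The enum members and function names are hypothetical stand-ins, not the fslmc bus API.

```c
#include <stdint.h>

/* Hypothetical cut-down type list; only the trailing sentinel matters. */
enum dev_type { DEV_ETH, DEV_CRYPTO, DEV_UNKNOWN, DEV_TYPE_MAX };

/* One counter per type: valid indices are 0 .. DEV_TYPE_MAX - 1. */
static uint32_t device_count[DEV_TYPE_MAX];

/* Called once per scanned object, like the increment in scan_one_fslmc_device(). */
static void record_scanned(enum dev_type t)
{
	if ((unsigned int)t < DEV_TYPE_MAX)
		device_count[t]++;
}

static uint32_t get_device_count(enum dev_type device_type)
{
	/* '>=' also rejects the sentinel; '>' would read device_count[DEV_TYPE_MAX]. */
	if ((unsigned int)device_type >= DEV_TYPE_MAX)
		return 0;
	return device_count[device_type];
}
```

Returning 0 for an invalid type keeps the documented contract ("0 indicates either no device of the said type scanned or invalid device type").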
Re: [dpdk-dev] [PATCH v4] net/mlx5: load libmlx5 and libibverbs in run-time
Hi Shachar, Please see a small comment below. On Sun, Dec 31, 2017 at 07:52:51AM +0000, Shachar Beiser wrote: > MLX5 PMD loads libraries: libibverbs and libmlx5. > MLX5 PMD is not linked to external libraries. > > Signed-off-by: Shachar Beiser > --- > v1: > load external libraries in run-time > v2: > * fix checkpatch warnings > v3: > * fix checkpatch warnings > v4: > New MACROs in order to reuse code > --- > config/common_base | 1 + > drivers/net/mlx5/Makefile| 22 ++- > drivers/net/mlx5/lib/mlx5_dll.c | 294 > +++ > drivers/net/mlx5/lib/mlx5_dll.h | 103 ++ > drivers/net/mlx5/mlx5.c | 17 ++- > drivers/net/mlx5/mlx5.h | 4 + > drivers/net/mlx5/mlx5_flow.c | 4 + > drivers/net/mlx5/mlx5_mac.c | 4 + > drivers/net/mlx5/mlx5_mr.c | 4 + > drivers/net/mlx5/mlx5_rss.c | 4 + > drivers/net/mlx5/mlx5_rxmode.c | 4 + > drivers/net/mlx5/mlx5_rxq.c | 4 + > drivers/net/mlx5/mlx5_rxtx.c | 4 + > drivers/net/mlx5/mlx5_rxtx.h | 6 +- > drivers/net/mlx5/mlx5_rxtx_vec.c | 4 + > drivers/net/mlx5/mlx5_txq.c | 4 + > mk/rte.app.mk| 8 +- > 17 files changed, 479 insertions(+), 12 deletions(-) > create mode 100644 drivers/net/mlx5/lib/mlx5_dll.c > create mode 100644 drivers/net/mlx5/lib/mlx5_dll.h > > diff --git a/config/common_base b/config/common_base > index b8ee8f9..30c8fcf 100644 > --- a/config/common_base > +++ b/config/common_base > @@ -236,6 +236,7 @@ CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8 > # Compile burst-oriented Mellanox ConnectX-4 & ConnectX-5 (MLX5) PMD > # > CONFIG_RTE_LIBRTE_MLX5_PMD=n > +CONFIG_RTE_LIBRTE_MLX5_DLL=y > CONFIG_RTE_LIBRTE_MLX5_DEBUG=n > CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE=8 I am not sure a new configuration item is allowed. If it is, documentation for this variable is missing.
> diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile > index a3984eb..24fa127 100644 > --- a/drivers/net/mlx5/Makefile > +++ b/drivers/net/mlx5/Makefile > @@ -53,18 +53,25 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rss.c > SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mr.c > SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow.c > SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_socket.c > - > +ifeq ($(CONFIG_RTE_LIBRTE_MLX5_DLL),y) > +SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += lib/mlx5_dll.c > +endif > # Basic CFLAGS. > CFLAGS += -O3 > CFLAGS += -std=c11 -Wall -Wextra > CFLAGS += -g > CFLAGS += -I. > +CFLAGS += -I$(SRCDIR) > CFLAGS += -D_BSD_SOURCE > CFLAGS += -D_DEFAULT_SOURCE > CFLAGS += -D_XOPEN_SOURCE=600 > CFLAGS += $(WERROR_FLAGS) > CFLAGS += -Wno-strict-prototypes > +ifeq ($(CONFIG_RTE_LIBRTE_MLX5_DLL),y) > +LDLIBS += -ldl > +else > LDLIBS += -libverbs -lmlx5 > +endif > LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring > LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs > LDLIBS += -lrte_bus_pci > @@ -105,26 +112,28 @@ endif > > mlx5_autoconf.h.new: FORCE > > +VERBS_H := infiniband/verbs.h > +MLX5DV_H := infiniband/mlx5dv.h > mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh > $Q $(RM) -f -- '$@' > $Q sh -- '$<' '$@' \ > HAVE_IBV_DEVICE_VXLAN_SUPPORT \ > - infiniband/verbs.h \ > + $(VERBS_H) \ > enum IBV_DEVICE_VXLAN_SUPPORT \ > $(AUTOCONF_OUTPUT) > $Q sh -- '$<' '$@' \ > HAVE_IBV_WQ_FLAG_RX_END_PADDING \ > - infiniband/verbs.h \ > + $(VERBS_H) \ > enum IBV_WQ_FLAG_RX_END_PADDING \ > $(AUTOCONF_OUTPUT) > $Q sh -- '$<' '$@' \ > HAVE_IBV_MLX5_MOD_MPW \ > - infiniband/mlx5dv.h \ > + $(MLX5DV_H) \ > enum MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED \ > $(AUTOCONF_OUTPUT) > $Q sh -- '$<' '$@' \ > HAVE_IBV_MLX5_MOD_CQE_128B_COMP \ > - infiniband/mlx5dv.h \ > + $(MLX5DV_H) \ > enum MLX5DV_CONTEXT_FLAGS_CQE_128B_COMP \ > $(AUTOCONF_OUTPUT) > $Q sh -- '$<' '$@' \ > @@ -144,10 +153,9 @@ mlx5_autoconf.h.new: > $(RTE_SDK)/buildtools/auto-config-h.sh > 
$(AUTOCONF_OUTPUT) > $Q sh -- '$<' '$@' \ > HAVE_IBV_DEVICE_COUNTERS_SET_SUPPORT \ > - infiniband/verbs.h \ > + $(VERBS_H) \ > enum IBV_FLOW_SPEC_ACTION_COUNT \ > $(AUTOCONF_OUTPUT) This modification should be in its own patch; it is not directly related to this patch. > - > # Create mlx5_autoconf.h or update it in case it differs from the new one. > > mlx5_autoconf.h: mlx5_autoconf.h.new > diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c > index cd66fe1..eeef782 100644 > --- a/drivers/net/mlx5/mlx5.c > +++ b/drivers/net/mlx5/mlx5.c > @@ -30,7 +30,8 @@ > * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE >
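The commit message says the PMD loads libibverbs and libmlx5 at run time instead of linking against them, and the Makefile switches LDLIBS to `-ldl` when CONFIG_RTE_LIBRTE_MLX5_DLL=y. The underlying POSIX mechanism is `dlopen()` followed by `dlsym()`. The helper below is a generic sketch of that pattern, not the contents of the patch's mlx5_dll.c (which is not shown). `load_symbol` is a hypothetical name. On glibc older than 2.34 the program must be linked with `-ldl`.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Open 'lib' (NULL means the running program) and resolve 'sym'.
 * On success, returns the symbol address and stores the handle in *handle
 * (the caller must dlclose() it). On failure, returns NULL and *handle is NULL. */
static void *load_symbol(const char *lib, const char *sym, void **handle)
{
	void *h, *addr;
	const char *err;

	*handle = NULL;
	h = dlopen(lib, RTLD_LAZY);
	if (h == NULL) {
		err = dlerror();
		fprintf(stderr, "dlopen(%s): %s\n", lib ? lib : "(self)",
			err ? err : "unknown error");
		return NULL;
	}
	dlerror(); /* clear any stale error before calling dlsym() */
	addr = dlsym(h, sym);
	if (addr == NULL) {
		err = dlerror();
		fprintf(stderr, "dlsym(%s): %s\n", sym,
			err ? err : "symbol resolved to NULL");
		dlclose(h);
		return NULL;
	}
	*handle = h;
	return addr;
}
```

For example, `load_symbol("libibverbs.so.1", "ibv_get_device_list", &h)` would resolve one verbs entry point. A driver using this approach would typically resolve every required symbol once at initialization and fail probing cleanly if any is missing.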
Re: [dpdk-dev] [PATCH v1 3/6] flow_classify: fix issue in exported header
> -Original Message- > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com] > Sent: Thursday, December 21, 2017 1:00 PM > To: dev@dpdk.org > Cc: Yigit, Ferruh ; Iremonger, Bernard > > Subject: [PATCH v1 3/6] flow_classify: fix issue in exported header > > Reported by check-includes.sh: > > [...]/rte_flow_classify.h:85:47: error: ISO C does not permit named > variadic macros [-Werror=variadic-macros] > #define RTE_FLOW_CLASSIFY_LOG(level, fmt, args...) \ > ^ > > Fixes: be41ac2a330f ("flow_classify: introduce flow classify library") > Cc: Ferruh Yigit > Cc: Bernard Iremonger > > Signed-off-by: Adrien Mazarguil Acked-by: Bernard Iremonger
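The error quoted above comes from the GNU `args...` named variadic parameter, which ISO C does not accept under -pedantic. The portable form takes the trailing arguments as `...` and forwards them with `__VA_ARGS__`. The sketch below is a hypothetical stand-in (`CLASSIFY_LOG_TO` is not a DPDK macro) that formats into a buffer instead of calling rte_log(), so it can be checked without DPDK. Placing the string literal `#level ": "` directly before `__VA_ARGS__` also forces the first variadic argument to be a string literal, because adjacent literals are concatenated at compile time.

```c
#include <stdio.h>

/*
 * GNU form, rejected by ISO C with -pedantic:
 *   #define LOG(level, fmt, args...) printf(fmt, ## args)
 * ISO C form: everything after 'level' arrives as __VA_ARGS__, and the
 * first variadic argument is expected to be the format string literal.
 * 'buf' must be an array so that sizeof(buf) is its full size.
 */
#define CLASSIFY_LOG_TO(buf, level, ...) \
	snprintf((buf), sizeof(buf), #level ": " __VA_ARGS__)
```

A call such as `CLASSIFY_LOG_TO(b, ERR, "x=%d", 5)` expands to `snprintf((b), sizeof(b), "ERR" ": " "x=%d", 5)` and writes "ERR: x=5". A call with only a format string, such as `CLASSIFY_LOG_TO(b, INFO, "ready")`, also works without any GNU `##` extension.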
Re: [dpdk-dev] [RFC 0/5] Port Representor for control and monitoring of VF devices
On 27/12/2017 15:50, Alex Rosenbaum wrote: On Wed, Dec 27, 2017 at 11:40 AM, Mohammad Abdul Awal wrote: On 22/12/2017 22:33, Alex Rosenbaum wrote: On Fri, Dec 22, 2017 at 4:31 PM, Mohammad Abdul Awal On 21/12/2017 14:51, Alex Rosenbaum wrote: By hotplug I did not mean HW hotplug; rather, I meant software hotplug of a port representor, so that an application can add/delete a representor at run time. What is the expected result if an application adds/deletes a representor at run time? From my understanding, for OVS it would make sense to enumerate the representors at startup, and an active/inactive 'state' alone would be enough to indicate the state of a VF. On the other hand, for a system with a variety of NICs/FPGAs/SmartNICs supporting hundreds (if not thousands) of VFs with different capabilities, we may not want to allocate representors that are not being used, and we may not be able to control this without a broker. This is definitely a matter of design discussion for now; the ultimate outcome is the same, i.e. having a representor to control a VF. I would expect VF hotplug to depend on the PF configuration, so that new/removed VFs would change a representor's state or existence. I agree, and as I said above, these are different ways of doing the same thing with limited/flexible ability. Alex Regards, Awal
Re: [dpdk-dev] [PATCH v2] sched: fix overflow errors in WRR weighting code
Hi Alan, NAK for now. There is a good reason for truncating the WRR cost to 8-bit value, which is keeping the size of the rte_sched_pipe structure to single cache line (64 bytes). This is done for performance reasons at the expense of some accuracy loss for the scheduling of the 4x queues per traffic class. Is there a way to make the improvement while working with 8-bit WRR cost values in the pipe structure? > -Original Message- > From: alangordonde...@gmail.com [mailto:alangordonde...@gmail.com] > Sent: Thursday, November 30, 2017 9:05 AM > To: Dumitrescu, Cristian > Cc: dev@dpdk.org; Alan Dewar > Subject: [PATCH v2] sched: fix overflow errors in WRR weighting code > > From: Alan Dewar > > Revised patch - this version fixes an issue when a small wrr_cost is > shifted so far right that its value becomes zero. > > The WRR code calculates the lowest common denominator between the > four > WRR weights as a uint32_t value and divides the LCD by each of the WRR > weights and casts the results as a uint8_t. This casting can cause > the ratios of the computed wrr costs to be wrong. For example with > WRR weights of 3, 5, 7 and 11, the LCD is computed to be > 1155. The WRR costs get computed as: Picking prime numbers for the weights is generally a bad idea. If you would pick e.g. 4, 6, 8 and 12 rather than 3, 5, 7 and 11 you would avoid any issues due to the 8-bit truncation. > > 1155/3 = 385, 1155/5 = 231, 1155/7 = 165, 1155/11 = 105. > > When the value 385 is cast into an uint8_t it ends up as 129. > Rather than casting straight into a uint8_t, this patch shifts the > computed WRR costs right so that the largest value is only eight bits > wide. > > In grinder_schedule, the packet length is multiplied by the WRR cost > and added to the grinder's wrr_tokens value. The grinder's wrr_tokens > field is a uint16_t, so combination of a packet length of 1500 bytes > and a wrr cost of 44 will overflow this field on the first packet. 
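The arithmetic in the commit message can be checked with a few lines of standalone C. The sketch computes the least common multiple of the four weights (what the patch calls the "LCD"), derives each cost as LCM / weight, and shows the effect of a plain 8-bit cast. `gcd_u32`, `lcm4` and `wrr_costs_u8` are hypothetical helper names, not the rte_sched functions.

```c
#include <stdint.h>

/* Greatest common divisor (Euclid's algorithm). */
static uint32_t gcd_u32(uint32_t a, uint32_t b)
{
	while (b != 0) {
		uint32_t t = a % b;
		a = b;
		b = t;
	}
	return a;
}

/* Least common multiple of four non-zero weights. */
static uint32_t lcm4(const uint32_t w[4])
{
	uint32_t l = w[0];
	int i;

	for (i = 1; i < 4; i++)
		l = l / gcd_u32(l, w[i]) * w[i];
	return l;
}

/* Cost per queue = LCM / weight, truncated by a uint8_t cast.
 * Returns the LCM so the caller can inspect it. */
static uint32_t wrr_costs_u8(const uint32_t w[4], uint8_t cost[4])
{
	uint32_t l = lcm4(w);
	int i;

	for (i = 0; i < 4; i++)
		cost[i] = (uint8_t)(l / w[i]);
	return l;
}
```

With weights 3, 5, 7 and 11 the LCM is 1155 and the first cost, 385, wraps to 129, as described. With 4, 6, 8 and 12 the LCM is 24 and the costs 6, 4, 3 and 2 fit in 8 bits unchanged. Separately, 1500 * 44 = 66000 exceeds UINT16_MAX (65535), so a uint16_t token counter wraps to 464 on the first such packet.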
> > This patch increases the width of the grinder's wrr_tokens and > wrr_mask fields from uint16_t to uint32_t. > Increasing the size of the grinder fields is OK, but I am not sure whether it is really helpful, as the values saved in the pipe structure are 8-bit. > In grinder_wrr_store, the remaining tokens in the grinder's wrr_tokens > array are copied to the appropriate pipe's wrr_tokens array. However > the pipe's wrr_tokens array is only a uint8_t array so unused tokens > were quite frequently lost which upsets the balance of traffic across > the four WRR queues. > > This patch increases the width of the pipe's wrr_tokens array from > a uint8_t to uint32_t. This is not allowed for performance reasons, as having 16-bit or 32-bit WRR cost values in pipe structure would increase the size of the pipe structure from one cache line to two cache lines. > > Signed-off-by: Alan Dewar > --- > v2 - fixed bug in the wrr_cost calculation code that could result > in a zero wrr_cost > > lib/librte_sched/rte_sched.c| 59 +--- > - > lib/librte_sched/rte_sched_common.h | 15 ++ > 2 files changed, 61 insertions(+), 13 deletions(-) > > diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c > index 7252f85..324743d 100644 > --- a/lib/librte_sched/rte_sched.c > +++ b/lib/librte_sched/rte_sched.c > @@ -130,7 +130,7 @@ struct rte_sched_pipe { > uint32_t tc_credits[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE]; > > /* Weighted Round Robin (WRR) */ > - uint8_t wrr_tokens[RTE_SCHED_QUEUES_PER_PIPE]; > + uint32_t wrr_tokens[RTE_SCHED_QUEUES_PER_PIPE]; > > /* TC oversubscription */ > uint32_t tc_ov_credits; > @@ -205,8 +205,8 @@ struct rte_sched_grinder { > struct rte_mbuf *pkt; > > /* WRR */ > - uint16_t wrr_tokens[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS]; > - uint16_t wrr_mask[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS]; > + uint32_t wrr_tokens[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS]; > + uint32_t wrr_mask[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS]; > uint8_t wrr_cost[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS]; 
> }; > > @@ -542,6 +542,17 @@ rte_sched_time_ms_to_bytes(uint32_t time_ms, > uint32_t rate) > return time; > } > > +static uint32_t rte_sched_reduce_to_byte(uint32_t value) > +{ > + uint32_t shift = 0; > + > + while (value & 0xFFFFFF00) { > + value >>= 1; > + shift++; > + } > + return shift; > +} > + > static void > rte_sched_port_config_pipe_profile_table(struct rte_sched_port *port, > struct rte_sched_port_params *params) > { > @@ -583,6 +594,8 @@ rte_sched_port_config_pipe_profile_table(struct > rte_sched_port *port, struct rte > uint32_t > wrr_cost[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS]; > uint32_t lcd, lcd1, lcd2; > uint32_t qindex; > + uint32_t low_pos; > + uint32_t shift; > > qindex
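The helper added by the patch counts how many right shifts a cost needs before it fits in one byte. The standalone sketch below reproduces that loop, using the mask 0xFFFFFF00 so that every bit above the low byte is tested, and applies the shift required by the largest cost to all four costs, which roughly preserves their ratios. Clamping a cost that shifts down to 0 back to 1 is only one plausible guard for the zero-cost case mentioned in the v2 notes. The patch's actual fix is not visible here, and `reduce_to_byte_shift` and `scale_costs` are hypothetical names.

```c
#include <stdint.h>

/* Number of right shifts needed so that 'value' fits in 8 bits. */
static uint32_t reduce_to_byte_shift(uint32_t value)
{
	uint32_t shift = 0;

	while (value & 0xFFFFFF00u) {
		value >>= 1;
		shift++;
	}
	return shift;
}

/* Scale four costs by the shift needed for the largest one. */
static void scale_costs(const uint32_t cost[4], uint8_t out[4])
{
	uint32_t max = 0, shift, c;
	int i;

	for (i = 0; i < 4; i++)
		if (cost[i] > max)
			max = cost[i];
	shift = reduce_to_byte_shift(max);
	for (i = 0; i < 4; i++) {
		c = cost[i] >> shift;
		/* Hypothetical guard: never let a cost collapse to 0. */
		out[i] = (uint8_t)(c == 0 ? 1 : c);
	}
}
```

For the costs 385, 231, 165 and 105 from the commit message, one shift gives 192, 115, 82 and 52. A mask of only 0xFF00 would miss values such as 0x10000 (0x10000 & 0xFF00 is 0), which actually need nine shifts.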
Re: [dpdk-dev] [PATCH v2] test: new sched WRR unit-test
> -Original Message- > From: alangordonde...@gmail.com [mailto:alangordonde...@gmail.com] > Sent: Thursday, November 30, 2017 9:05 AM > To: Dumitrescu, Cristian > Cc: dev@dpdk.org; Alan Dewar > Subject: [PATCH v2] test: new sched WRR unit-test > > From: Alan Dewar > > New unit-test for the librte_sched WRR weighting code. > > With the standard 17.11 code, the first three sub-tests pass, but > the last three fail due to bugs in the WRR weighting code. > > With v1 of the "sched: fix overflow errors in WRR weighting code" > patch the first five sub-tests pass, and the last sub-test fails badly. > > With v2 of the "sched: fix overflow errors in WRR weighting code" > patch the first five sub-tests pass, and the last sub-test is a very > near miss (i.e. measured packets counts are one away from the expected > counts). > > Signed-off-by: Alan Dewar > --- > v2 - add new 255-254-253-1 weightings sub-test > > test/test/Makefile | 1 + > test/test/test_sched_wrr.c | 491 > + > 2 files changed, 492 insertions(+) > create mode 100644 test/test/test_sched_wrr.c > > diff --git a/test/test/Makefile b/test/test/Makefile > index bb54c98..0ab0ed3 100644 > --- a/test/test/Makefile > +++ b/test/test/Makefile > @@ -173,6 +173,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_NET) += test_crc.c > ifeq ($(CONFIG_RTE_LIBRTE_SCHED),y) > SRCS-y += test_red.c > SRCS-y += test_sched.c > +SRCS-y += test_sched_wrr.c > endif > > SRCS-$(CONFIG_RTE_LIBRTE_METER) += test_meter.c > diff --git a/test/test/test_sched_wrr.c b/test/test/test_sched_wrr.c > new file mode 100644 > index 000..df5a231 > --- /dev/null > +++ b/test/test/test_sched_wrr.c > @@ -0,0 +1,491 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. > + * Copyright(c) 2017 ATT Intellectual Property. All rights reserved. > + * All rights reserved. 
> + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * * Neither the name of Intel Corporation nor the names of its > + * contributors may be used to endorse or promote products derived > + * from this software without specific prior written permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND > CONTRIBUTORS > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT > NOT > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND > FITNESS FOR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE > COPYRIGHT > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, > INCIDENTAL, > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, > BUT NOT > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; > LOSS OF USE, > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED > AND ON ANY > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR > TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT > OF THE USE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH > DAMAGE. 
> + * > + */ > + > +#include > +#include > +#include > +#include > +#include > + > +#include "test.h" > + > +#include > +#include > +#include > +#include > +#include > + > + > +#define SUBPORT 0 > +#define PIPE1 > +#define TC 0 > +#define QUEUE 3 > + > +static struct rte_sched_subport_params subport_param[] = { > + { > + .tb_rate = 125000, > + .tb_size = 100, > + > + .tc_rate = {125000, 125000, 125000, > 125000}, > + .tc_period = 10, > + }, > +}; > + > +static struct rte_sched_pipe_params pipe_profile[] = { > + { /* Profile #0 */ > + .tb_rate = 3051750, > + .tb_size = 100, > + > + .tc_rate = {3051750, 3051750, 3051750, 3051750}, > + .tc_period = 160, > + > + .wrr_weights = {1, 1, 1, 1, > + 1, 1, 1, 1, > + 1, 1, 1, 1, > + 1, 1, 1, 1}, > + }, > +}; > + > +static struct rte_sched_port_params port_param = { > + .socket = 0, /* computed */ > + .rate = 0, /* computed */ > + .mtu = 1522, > + .frame_overhead = RTE_SCHED_FRAME_OVERHEAD_DEFAULT, > + .n_subports_per_port = 1, > + .n_pipes_per_subport = 1024, > + .qsize = {32, 32, 32, 32}, > + .pipe_profiles = pipe_profile
Re: [dpdk-dev] [PATCH v4] sched: make RED scaling configurable
> -Original Message- > From: Dumitrescu, Cristian > Sent: Tuesday, October 3, 2017 6:16 PM > To: alangordonde...@gmail.com; Kantecki, Tomasz > > Cc: dev@dpdk.org; Alan Dewar > Subject: RE: [PATCH v4] sched: make RED scaling configurable > > Adding Tomasz. > > > -Original Message- > > From: alangordonde...@gmail.com > [mailto:alangordonde...@gmail.com] > > Sent: Tuesday, October 3, 2017 10:22 AM > > To: Dumitrescu, Cristian > > Cc: dev@dpdk.org; Alan Dewar > > Subject: [PATCH v4] sched: make RED scaling configurable > > > > From: Alan Dewar > > > > The RED code stores the weighted moving average in a 32-bit integer as > > a pseudo fixed-point floating number with 10 fractional bits. Twelve > > other bits are used to encode the filter weight, leaving just 10 bits > > for the queue length. This limits the maximum queue length supported > > by RED queues to 1024 packets. > > > > Introduce a new API to allow the RED scaling factor to be configured > > based upon maximum queue length. If this API is not called, the RED > > scaling factor remains at its default value. > > > > Added some new RED scaling unit-tests to test with RED queue-lengths > > up to 8192 packets long. 
> > > > Signed-off-by: Alan Dewar > > --- > > lib/librte_sched/rte_red.c | 53 ++- > > lib/librte_sched/rte_red.h | 63 ++-- > > lib/librte_sched/rte_sched_version.map | 6 + > > test/test/test_red.c | 274 > > - > > 4 files changed, 374 insertions(+), 22 deletions(-) > > > > diff --git a/lib/librte_sched/rte_red.c b/lib/librte_sched/rte_red.c > > index ade57d1..0dc8d28 100644 > > --- a/lib/librte_sched/rte_red.c > > +++ b/lib/librte_sched/rte_red.c > > @@ -43,6 +43,8 @@ > > static int rte_red_init_done = 0; /**< Flag to indicate that global > > initialisation is done */ > > uint32_t rte_red_rand_val = 0;/**< Random value cache */ > > uint32_t rte_red_rand_seed = 0; /**< Seed for random number > > generation */ > > +uint8_t rte_red_scaling = RTE_RED_SCALING_DEFAULT; > > +uint16_t rte_red_max_threshold = RTE_RED_DEFAULT_QUEUE_LENGTH > - > > 1; > > > > /** > > * table[i] = log2(1-Wq) * Scale * -1 > > @@ -66,7 +68,7 @@ __rte_red_init_tables(void) > > double scale = 0.0; > > double table_size = 0.0; > > > > - scale = (double)(1 << RTE_RED_SCALING); > > + scale = (double)(1 << rte_red_scaling); > > table_size = (double)(RTE_DIM(rte_red_pow2_frac_inv)); > > > > for (i = 0; i < RTE_DIM(rte_red_pow2_frac_inv); i++) { > > @@ -119,7 +121,7 @@ rte_red_config_init(struct rte_red_config > *red_cfg, > > if (red_cfg == NULL) { > > return -1; > > } > > - if (max_th > RTE_RED_MAX_TH_MAX) { > > + if (max_th > rte_red_max_threshold) { > > return -2; > > } > > if (min_th >= max_th) { > > @@ -148,11 +150,52 @@ rte_red_config_init(struct rte_red_config > > *red_cfg, > > rte_red_init_done = 1; > > } > > > > - red_cfg->min_th = ((uint32_t) min_th) << (wq_log2 + > > RTE_RED_SCALING); > > - red_cfg->max_th = ((uint32_t) max_th) << (wq_log2 + > > RTE_RED_SCALING); > > - red_cfg->pa_const = (2 * (max_th - min_th) * maxp_inv) << > > RTE_RED_SCALING; > > + red_cfg->min_th = ((uint32_t) min_th) << (wq_log2 + > > rte_red_scaling); > > + red_cfg->max_th = ((uint32_t) max_th) << (wq_log2 + > > 
rte_red_scaling); > > + red_cfg->pa_const = (2 * (max_th - min_th) * maxp_inv) << > > + rte_red_scaling; > > red_cfg->maxp_inv = maxp_inv; > > red_cfg->wq_log2 = wq_log2; > > > > return 0; > > } > > + > > +int > > +rte_red_set_scaling(uint16_t max_red_queue_length) > > +{ > > + int8_t count; > > + > > + if (rte_red_init_done) > > + /** > > +* Can't change the scaling once the red table has been > > +* computed. > > +*/ > > + return -1; > > + > > + if (max_red_queue_length < RTE_RED_MIN_QUEUE_LENGTH) > > + return -2; > > + > > + if (max_red_queue_length > RTE_RED_MAX_QUEUE_LENGTH) > > + return -3; > > + > > + if (!rte_is_power_of_2(max_red_queue_length)) > > + return -4; > > + > > + count = 0; > > + while (max_red_queue_length != 0) { > > + max_red_queue_length >>= 1; > > + count++; > > + } > > + > > + rte_red_scaling -= count - RTE_RED_SCALING_DEFAULT; > > + rte_red_max_threshold = max_red_queue_length - 1; > > + return 0; > > +} > > + > > +void > > +rte_red_reset_scaling(void) > > +{ > > + rte_red_init_done = 0; > > + rte_red_scaling = RTE_RED_SCALING_DEFAULT; > > + rte_red_max_threshold = RTE_RED_DEFAULT_QUEUE_LENGTH - 1; > > +} > > diff --git a/lib/librte_sched/rte_red.h b/lib/librte_sched/rte_red.h > > index ca12227..be1fb0f 100644 > > --- a/lib/librte_sched/rte_red.h > > +++ b/lib/librte_sched/rte_red.h > > @@ -52,14 +52,31 @@ extern "C" { > > #include > > #include > > > > -#define RTE_RED_SCALING
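One detail in the quoted rte_red_set_scaling(): the while loop shifts max_red_queue_length down to 0, so the later `rte_red_max_threshold = max_red_queue_length - 1;` evaluates to -1, which becomes 65535 in the uint16_t, rather than the caller's length minus one. The sketch below avoids this by counting bits on a copy. It derives the number of fractional bits from the bit budget in the commit message (32-bit average, 12 bits for the filter weight, the rest shared between the queue length and the fraction), which gives 10 for 1024 packets and 7 for 8192. Whether this matches the RTE_RED_SCALING_DEFAULT arithmetic in the header is an assumption, since that part of the patch is truncated here. `red_scaling_for` and the 64..8192 bounds are hypothetical.

```c
#include <stdint.h>

#define AVG_BITS 32    /* width of the fixed-point moving average */
#define WQ_BITS  12    /* bits reserved for the filter weight */
#define MIN_QLEN 64u   /* hypothetical lower bound */
#define MAX_QLEN 8192u /* largest length used by the new unit tests */

/* Returns the number of fractional bits for a power-of-two maximum queue
 * length and stores max_len - 1 in *max_th, or returns -1 if the length
 * is out of range or not a power of two. */
static int red_scaling_for(uint16_t max_len, uint16_t *max_th)
{
	uint32_t v = max_len; /* work on a copy so max_len stays intact */
	int bits = 0;

	if (max_len < MIN_QLEN || max_len > MAX_QLEN)
		return -1;
	if ((max_len & (max_len - 1)) != 0)
		return -1; /* not a power of two */
	while (v > 1) {
		v >>= 1;
		bits++; /* ends as log2(max_len) */
	}
	*max_th = (uint16_t)(max_len - 1);
	return AVG_BITS - WQ_BITS - bits; /* bits left for the fraction */
}
```

Each doubling of the maximum queue length costs one fractional bit, which is the trade-off the patch makes to support RED queues longer than 1024 packets.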
Re: [dpdk-dev] [PATCH v2] eal/x86: get hypervisor name
On Sat, 30 Dec 2017 23:47:23 +0100 Thomas Monjalon wrote: > The CPUID instruction is caught by the hypervisor, which can return > a flag indicating one is running, and its name. > > Suggested-by: Stephen Hemminger > Signed-off-by: Thomas Monjalon > --- > v2 changes: > - remove C99 style declaration > - move code into rte_hypervisor.* files > - add a function to get the name string Looks good to me. What about Xen?
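For readers unfamiliar with the mechanism, here is a minimal sketch of how CPUID-based hypervisor detection commonly works on x86: leaf 1 reports a "hypervisor present" bit (ECX bit 31), and leaf 0x40000000 returns a 12-byte vendor signature in EBX, ECX and EDX, in that order (for example "KVMKVMKVM" for KVM or "XenVMMXenVMM" for Xen). The helper names are hypothetical and this is not the rte_hypervisor code from the patch; the CPUID query assumes GCC/Clang's <cpuid.h> and is compiled only on x86.

```c
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper: turn the EBX, ECX, EDX signature registers into a
 * NUL-terminated 12-character string. */
void hv_decode_signature(uint32_t ebx, uint32_t ecx, uint32_t edx,
			 char out[13])
{
	const uint32_t regs[3] = { ebx, ecx, edx };

	for (int r = 0; r < 3; r++)
		for (int b = 0; b < 4; b++)
			/* The low byte of each register holds the first character. */
			out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xff);
	out[12] = '\0';
}

/* Hypothetical helper: fill 'name' and return 1 if a hypervisor is
 * reported, else return 0 (always 0 on non-x86 builds). */
int hv_get_signature(char name[13])
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* Leaf 1, ECX bit 31: set by hypervisors, clear on bare metal. */
	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 31)))
		return 0;
	/* Leaf 0x40000000 is reserved for hypervisor information. */
	__cpuid(0x40000000, eax, ebx, ecx, edx);
	hv_decode_signature(ebx, ecx, edx, name);
	return 1;
#else
	(void)name;
	return 0;
#endif
}
```

Decoding byte by byte from the low end keeps the helper independent of host endianness.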
Re: [dpdk-dev] [PATCH v4] sched: make RED scaling configurable
Hi Alan, Thanks for your work! I do have some comments (see below), but generally looks good to me. > -Original Message- > From: alangordonde...@gmail.com [mailto:alangordonde...@gmail.com] > Sent: Tuesday, October 3, 2017 10:22 AM > To: Dumitrescu, Cristian > Cc: dev@dpdk.org; Alan Dewar > Subject: [PATCH v4] sched: make RED scaling configurable > > From: Alan Dewar > > The RED code stores the weighted moving average in a 32-bit integer as > a pseudo fixed-point floating number with 10 fractional bits. Twelve > other bits are used to encode the filter weight, leaving just 10 bits > for the queue length. This limits the maximum queue length supported > by RED queues to 1024 packets. > > Introduce a new API to allow the RED scaling factor to be configured > based upon maximum queue length. If this API is not called, the RED > scaling factor remains at its default value. > > Added some new RED scaling unit-tests to test with RED queue-lengths > up to 8192 packets long. > > Signed-off-by: Alan Dewar > --- > lib/librte_sched/rte_red.c | 53 ++- > lib/librte_sched/rte_red.h | 63 ++-- > lib/librte_sched/rte_sched_version.map | 6 + > test/test/test_red.c | 274 > - > 4 files changed, 374 insertions(+), 22 deletions(-) > > diff --git a/lib/librte_sched/rte_red.c b/lib/librte_sched/rte_red.c > index ade57d1..0dc8d28 100644 > --- a/lib/librte_sched/rte_red.c > +++ b/lib/librte_sched/rte_red.c > @@ -43,6 +43,8 @@ > static int rte_red_init_done = 0; /**< Flag to indicate that global > initialisation is done */ > uint32_t rte_red_rand_val = 0;/**< Random value cache */ > uint32_t rte_red_rand_seed = 0; /**< Seed for random number > generation */ > +uint8_t rte_red_scaling = RTE_RED_SCALING_DEFAULT; > +uint16_t rte_red_max_threshold = RTE_RED_DEFAULT_QUEUE_LENGTH - > 1; > > /** > * table[i] = log2(1-Wq) * Scale * -1 > @@ -66,7 +68,7 @@ __rte_red_init_tables(void) > double scale = 0.0; > double table_size = 0.0; > > - scale = (double)(1 << RTE_RED_SCALING); > + scale = (double)(1 
<< rte_red_scaling); > table_size = (double)(RTE_DIM(rte_red_pow2_frac_inv)); > > for (i = 0; i < RTE_DIM(rte_red_pow2_frac_inv); i++) { > @@ -119,7 +121,7 @@ rte_red_config_init(struct rte_red_config *red_cfg, > if (red_cfg == NULL) { > return -1; > } > - if (max_th > RTE_RED_MAX_TH_MAX) { > + if (max_th > rte_red_max_threshold) { > return -2; > } > if (min_th >= max_th) { > @@ -148,11 +150,52 @@ rte_red_config_init(struct rte_red_config > *red_cfg, > rte_red_init_done = 1; > } > > - red_cfg->min_th = ((uint32_t) min_th) << (wq_log2 + > RTE_RED_SCALING); > - red_cfg->max_th = ((uint32_t) max_th) << (wq_log2 + > RTE_RED_SCALING); > - red_cfg->pa_const = (2 * (max_th - min_th) * maxp_inv) << > RTE_RED_SCALING; > + red_cfg->min_th = ((uint32_t) min_th) << (wq_log2 + > rte_red_scaling); > + red_cfg->max_th = ((uint32_t) max_th) << (wq_log2 + > rte_red_scaling); > + red_cfg->pa_const = (2 * (max_th - min_th) * maxp_inv) << > + rte_red_scaling; > red_cfg->maxp_inv = maxp_inv; > red_cfg->wq_log2 = wq_log2; > > return 0; > } > + > +int > +rte_red_set_scaling(uint16_t max_red_queue_length) > +{ > + int8_t count; > + > + if (rte_red_init_done) > + /** > + * Can't change the scaling once the red table has been > + * computed. > + */ > + return -1; > + > + if (max_red_queue_length < RTE_RED_MIN_QUEUE_LENGTH) > + return -2; > + > + if (max_red_queue_length > RTE_RED_MAX_QUEUE_LENGTH) > + return -3; > + > + if (!rte_is_power_of_2(max_red_queue_length)) > + return -4; > + > + count = 0; > + while (max_red_queue_length != 0) { > + max_red_queue_length >>= 1; > + count++; > + } > + > + rte_red_scaling -= count - RTE_RED_SCALING_DEFAULT; > + rte_red_max_threshold = max_red_queue_length - 1; > + return 0; > +} > + > +void > +rte_red_reset_scaling(void) > +{ > + rte_red_init_done = 0; > + rte_red_scaling = RTE_RED_SCALING_DEFAULT; > + rte_red_max_threshold = RTE_RED_DEFAULT_QUEUE_LENGTH - 1; > +} Why do we need this function? 
These global variables are already initialized at the top of the file. My vote is to remove it. > diff --git a/lib/librte_sched/rte_red.h b/lib/librte_sched/rte_red.h > index ca12227..be1fb0f 100644 > --- a/lib/librte_sched/rte_red.h > +++ b/lib/librte_sched/rte_red.h > @@ -52,14 +52,31 @@ extern "C" { > #include > #include > > -#define RTE_RED_SCALING 10 /**< Fraction size > for fixed- > point */ > -#define RTE_RED_S (1 << 22) /**< Packet size > multiplied > by number of leaf queues */ > -#define
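The bit budget in the commit message can be made concrete. The average is a 32-bit value and 12 bits are reserved for the filter weight. The remaining bits are split between the queue length and the fraction. For a power-of-two maximum queue length of 2^n, that leaves 32 - 12 - n = 20 - n fractional bits, which gives the default of 10 for 1024 packets.

The sketch below uses a hypothetical helper, not the patch's rte_red_set_scaling(). It takes the logarithm on a copy of the length, because the quoted loop shifts max_red_queue_length down to 0 before that variable is used to compute rte_red_max_threshold. The minimum and maximum lengths are assumed values.

```c
#include <stdint.h>

/* Assumed bit budget, from the commit message: 32-bit average,
 * 12 bits reserved for the filter weight (wq_log2). */
#define SKETCH_AVG_BITS      32
#define SKETCH_WQ_LOG2_BITS  12
/* Assumed bounds on the maximum queue length (hypothetical). */
#define SKETCH_MIN_QLEN      1024u
#define SKETCH_MAX_QLEN      32768u

/* Hypothetical helper: derive the fixed-point scaling and the maximum
 * threshold from a power-of-two maximum queue length.
 * Returns 0 on success, -1 if out of range, -2 if not a power of two. */
int red_scaling_for_qlen(uint16_t max_qlen, uint8_t *scaling,
			 uint16_t *max_th)
{
	uint16_t tmp = max_qlen;   /* shift a copy, keep max_qlen intact */
	uint8_t n = 0;

	if (max_qlen < SKETCH_MIN_QLEN || max_qlen > SKETCH_MAX_QLEN)
		return -1;
	if ((max_qlen & (max_qlen - 1)) != 0)
		return -2;

	while (tmp > 1) {          /* n = log2(max_qlen) */
		tmp >>= 1;
		n++;
	}

	*scaling = (uint8_t)(SKETCH_AVG_BITS - SKETCH_WQ_LOG2_BITS - n);
	*max_th = (uint16_t)(max_qlen - 1);
	return 0;
}
```

For 8192 packets this yields 7 fractional bits, trading average precision for queue-length range.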
Re: [dpdk-dev] [PATCH v7 1/2] eal: add uevent monitor for hot plug
Hi Jeff, Maybe I'm touching on previous discussions, but please see some comments/questions below. From: Jeff Guo: > This patch aims to add a general uevent mechanism in eal device layer, > to enable all linux kernel object hot plug monitoring, so user could use these > APIs to monitor and read out the device status info that sent from the kernel > side, then corresponding to handle it, such as detach or attach the > device, and even benefit to use it to do smoothly fail safe work. > > 1) About uevent monitoring: > a: add one epolling to poll the netlink socket, to monitor the uevent of >the device, add device_state in struct of rte_device, to identify the >device state machine. > b: add enum of rte_eal_dev_event_type and struct of rte_eal_uevent. > c: add below API in rte eal device common layer. >rte_eal_dev_monitor_enable >rte_dev_callback_register >rte_dev_callback_unregister >_rte_dev_callback_process >rte_dev_monitor_start >rte_dev_monitor_stop > > 2) About failure handler, use pci uio for example, >add pci_remap_device in bus layer and below function to process it: >rte_pci_remap_device >pci_uio_remap_resource >pci_map_private_resource >add rte_pci_dev_bind_driver to bind pci device with explicit driver. > > Signed-off-by: Jeff Guo > --- > v7->v6: > a.modify vdev part according to the vdev rework > b.re-define and split the func into common and bus specific code > c.fix some incorrect issue. > b.fix the system hung after send packet issue.
> --- > drivers/bus/pci/bsd/pci.c | 30 ++ > drivers/bus/pci/linux/pci.c| 87 + > drivers/bus/pci/linux/pci_init.h | 1 + > drivers/bus/pci/pci_common.c | 43 +++ > drivers/bus/pci/pci_common_uio.c | 28 ++ > drivers/bus/pci/private.h | 12 + > drivers/bus/pci/rte_bus_pci.h | 25 ++ > drivers/bus/vdev/vdev.c| 36 +++ > lib/librte_eal/bsdapp/eal/eal_dev.c| 64 > .../bsdapp/eal/include/exec-env/rte_dev.h | 106 ++ > lib/librte_eal/common/eal_common_bus.c | 30 ++ > lib/librte_eal/common/eal_common_dev.c | 169 ++ > lib/librte_eal/common/include/rte_bus.h| 69 > lib/librte_eal/common/include/rte_dev.h| 89 ++ > lib/librte_eal/linuxapp/eal/Makefile | 3 +- > lib/librte_eal/linuxapp/eal/eal_alarm.c| 5 + > lib/librte_eal/linuxapp/eal/eal_dev.c | 356 > + > .../linuxapp/eal/include/exec-env/rte_dev.h| 106 ++ > lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 6 + > lib/librte_pci/rte_pci.c | 20 ++ > lib/librte_pci/rte_pci.h | 17 + > 21 files changed, 1301 insertions(+), 1 deletion(-) > create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c > create mode 100644 lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h > create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c > create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h > > diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c > index b8e2178..d58dbf6 100644 > --- a/drivers/bus/pci/bsd/pci.c > +++ b/drivers/bus/pci/bsd/pci.c > @@ -126,6 +126,29 @@ rte_pci_unmap_device(struct rte_pci_device *dev) > } > } > > +/* re-map pci device */ > +int > +rte_pci_remap_device(struct rte_pci_device *dev) > +{ > + int ret; > + > + if (dev == NULL) > + return -EINVAL; > + > + switch (dev->kdrv) { > + case RTE_KDRV_NIC_UIO: > + ret = pci_uio_remap_resource(dev); > + break; > + default: > + RTE_LOG(DEBUG, EAL, > + " Not managed by a supported kernel driver, > skipped\n"); > + ret = 1; > + break; > + } > + > + return ret; > +} > + > void > pci_uio_free_resource(struct rte_pci_device *dev, > struct mapped_pci_resource 
*uio_res) > @@ -678,3 +701,10 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p) > > return ret; > } > + > +int > +rte_pci_dev_bind_driver(const char *dev_name, const char *drv_type) > +{ > + return -1; > +} > + > diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c > index 5da6728..792fd2c 100644 > --- a/drivers/bus/pci/linux/pci.c > +++ b/drivers/bus/pci/linux/pci.c > @@ -145,6 +145,38 @@ rte_pci_unmap_device(struct rte_pci_device *dev) > } > } > > +/* Map pci device */ > +int > +rte_pci_remap_device(struct rte_pci_device *dev) > +{ > + int ret = -1; > + > + if (dev == NULL) > + return -EINVAL; > + > + switch (dev->kdrv) { > + case RTE_KDRV_VFIO: > +#ifdef VFIO_PRESENT > + /* no thing to do */ > +#endif > + break; > + case RTE_KDRV_IGB_UIO: > + case RTE_KDRV_UIO_GENER
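For context on the monitoring side, the Linux kernel broadcasts uevents on a NETLINK_KOBJECT_UEVENT socket. Multicast group 1 carries the kernel's own events. Each message is a header such as "add@/devices/..." followed by NUL-separated KEY=VALUE fields.

The sketch below opens such a socket and extracts one key from a received message. The function names are hypothetical, and this is not the eal_dev.c code from the patch.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <unistd.h>

/* Hypothetical helper: open a socket that receives kernel uevents. */
int uevent_open(void)
{
	struct sockaddr_nl addr;
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_KOBJECT_UEVENT);

	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_pid = 0;          /* let the kernel assign the port id */
	addr.nl_groups = 1;       /* kernel uevent multicast group */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: find "key=value" in a uevent message of 'len'
 * bytes and return a pointer to the value, or NULL if it is absent. */
const char *uevent_get(const char *msg, size_t len, const char *key)
{
	size_t klen = strlen(key);
	size_t off = 0;

	while (off < len) {
		const char *field = msg + off;
		const char *end = memchr(field, '\0', len - off);
		size_t flen = end ? (size_t)(end - field) : len - off;

		if (flen > klen && strncmp(field, key, klen) == 0 &&
		    field[klen] == '=')
			return field + klen + 1;
		off += flen + 1;  /* skip the field and its NUL terminator */
	}
	return NULL;
}
```

A monitor thread would recv() into a buffer from uevent_open()'s descriptor and then query fields such as ACTION and SUBSYSTEM with uevent_get().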
Re: [dpdk-dev] [PATCH 2/2] net/mlx5: fix IPv6 header fields
> On Dec 30, 2017, at 11:34 PM, Shachar Beiser wrote: > > Some parameters in vtc_label are not copied from > the spec to the Verbs structure. > > Fixes: 43e9d97 ("net/mlx5: support upstream rdma-core") > Cc: sta...@dpdk.org > > Signed-off-by: Shachar Beiser > --- Acked-by: Yongseok Koh Thanks
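As background for this fix, the first 32-bit word of the IPv6 header (vtc_flow) packs three fields in network byte order. These are the version (4 bits), the traffic class (8 bits) and the flow label (20 bits).

Below is a minimal sketch of building and splitting that word, applied to the host-order value, for example after ntohl(). The helper names are hypothetical; these are not the mlx5 or Verbs definitions.

```c
#include <stdint.h>

/* Hypothetical helpers on the host-order first word of an IPv6 header:
 * version:4 | traffic class:8 | flow label:20. */
static inline uint32_t ipv6_vtc_flow_pack(uint8_t version, uint8_t tclass,
					  uint32_t flow_label)
{
	return ((uint32_t)(version & 0xf) << 28) |
	       ((uint32_t)tclass << 20) |
	       (flow_label & 0xfffff);
}

static inline uint8_t ipv6_tclass(uint32_t vtc_flow)
{
	return (uint8_t)((vtc_flow >> 20) & 0xff);
}

static inline uint32_t ipv6_flow_label(uint32_t vtc_flow)
{
	return vtc_flow & 0xfffff;
}
```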
[dpdk-dev] [PATCH 0/2] mlx5: remove dependency on kernel version
This series tries to eliminate all runtime checks of the kernel version used to select an API, because they are a source of portability problems in distributions. Stephen Hemminger (2): mlx5: don't pass unused argument to sub-functions mlx5: don't depend on kernel version drivers/net/mlx5/mlx5_ethdev.c | 118 + 1 file changed, 48 insertions(+), 70 deletions(-) -- 2.15.1
[dpdk-dev] [PATCH 2/2] mlx5: don't depend on kernel version
This driver uses ethtool to get link status. The ethtool API has new and old deprecated API. Rather than checking kernel version, use the same algorithm that the ethtool command does; check the new API first and if that fails, try the old one. Also, use common code for getting link state up/down and comparing for changes. Signed-off-by: Stephen Hemminger --- drivers/net/mlx5/mlx5_ethdev.c | 110 ++--- 1 file changed, 47 insertions(+), 63 deletions(-) diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c index 388507f109f7..2dc32cdf58b9 100644 --- a/drivers/net/mlx5/mlx5_ethdev.c +++ b/drivers/net/mlx5/mlx5_ethdev.c @@ -49,7 +49,6 @@ #include #include #include -#include #include #include #include @@ -757,36 +756,25 @@ mlx5_dev_supported_ptypes_get(struct rte_eth_dev *dev) * Pointer to Ethernet device structure. */ static int -mlx5_link_update_unlocked_gset(struct rte_eth_dev *dev) +mlx5_link_update_unlocked_gset(struct priv *priv, struct ifreq *ifr, + struct rte_eth_link *dev_link) { - struct priv *priv = mlx5_get_priv(dev); struct ethtool_cmd edata = { .cmd = ETHTOOL_GSET /* Deprecated since Linux v4.5. */ }; - struct ifreq ifr; - struct rte_eth_link dev_link; int link_speed = 0; - /* priv_lock() is not taken to allow concurrent calls. 
*/ - - if (priv_ifreq(priv, SIOCGIFFLAGS, &ifr)) { - WARN("ioctl(SIOCGIFFLAGS) failed: %s", strerror(errno)); - return -1; - } - memset(&dev_link, 0, sizeof(dev_link)); - dev_link.link_status = ((ifr.ifr_flags & IFF_UP) && - (ifr.ifr_flags & IFF_RUNNING)); - ifr.ifr_data = (void *)&edata; - if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) { + ifr->ifr_data = (void *)&edata; + if (priv_ifreq(priv, SIOCETHTOOL, ifr)) { WARN("ioctl(SIOCETHTOOL, ETHTOOL_GSET) failed: %s", strerror(errno)); return -1; } link_speed = ethtool_cmd_speed(&edata); if (link_speed == -1) - dev_link.link_speed = 0; + dev_link->link_speed = 0; else - dev_link.link_speed = link_speed; + dev_link->link_speed = link_speed; priv->link_speed_capa = 0; if (edata.supported & SUPPORTED_Autoneg) priv->link_speed_capa |= ETH_LINK_SPEED_AUTONEG; @@ -800,17 +788,9 @@ mlx5_link_update_unlocked_gset(struct rte_eth_dev *dev) SUPPORTED_4baseSR4_Full | SUPPORTED_4baseLR4_Full)) priv->link_speed_capa |= ETH_LINK_SPEED_40G; - dev_link.link_duplex = ((edata.duplex == DUPLEX_HALF) ? + dev_link->link_duplex = ((edata.duplex == DUPLEX_HALF) ? ETH_LINK_HALF_DUPLEX : ETH_LINK_FULL_DUPLEX); - dev_link.link_autoneg = !(dev->data->dev_conf.link_speeds & - ETH_LINK_SPEED_FIXED); - if (memcmp(&dev_link, &dev->data->dev_link, sizeof(dev_link))) { - /* Link status changed. */ - dev->data->dev_link = dev_link; - return 0; - } - /* Link status is still the same. */ - return -1; + return 0; } /** @@ -820,23 +800,14 @@ mlx5_link_update_unlocked_gset(struct rte_eth_dev *dev) * Pointer to Ethernet device structure. 
*/ static int -mlx5_link_update_unlocked_gs(struct rte_eth_dev *dev) +mlx5_link_update_unlocked_gs(struct priv *priv, struct ifreq *ifr, +struct rte_eth_link *dev_link) { - struct priv *priv = mlx5_get_priv(dev); struct ethtool_link_settings gcmd = { .cmd = ETHTOOL_GLINKSETTINGS }; - struct ifreq ifr; - struct rte_eth_link dev_link; uint64_t sc; - if (priv_ifreq(priv, SIOCGIFFLAGS, &ifr)) { - WARN("ioctl(SIOCGIFFLAGS) failed: %s", strerror(errno)); - return -1; - } - memset(&dev_link, 0, sizeof(dev_link)); - dev_link.link_status = ((ifr.ifr_flags & IFF_UP) && - (ifr.ifr_flags & IFF_RUNNING)); - ifr.ifr_data = (void *)&gcmd; - if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) { + ifr->ifr_data = (void *)&gcmd; + if (priv_ifreq(priv, SIOCETHTOOL, ifr)) { DEBUG("ioctl(SIOCETHTOOL, ETHTOOL_GLINKSETTINGS) failed: %s", strerror(errno)); return -1; @@ -849,13 +820,13 @@ mlx5_link_update_unlocked_gs(struct rte_eth_dev *dev) struct ethtool_link_settings *ecmd = (void *)data; *ecmd = gcmd; - ifr.ifr_data = (void *)ecmd; - if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) { + ifr->ifr_data = (void *)ecmd; + if (priv_ifreq(priv, SIOCETHTOOL, ifr)) { DEBUG("ioctl(SIOCETHTOOL, ETHTOOL_GLINKSETTINGS) failed:
[dpdk-dev] [PATCH 1/2] mlx5: don't pass unused argument to sub-functions
Since wait_to_complete is unused, don't pass it to helper functions. Use the standard RTE macro to indicate this is an unused parameter. Signed-off-by: Stephen Hemminger --- drivers/net/mlx5/mlx5_ethdev.c | 16 +--- 1 file changed, 5 insertions(+), 11 deletions(-) diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c index a3cef6891d03..388507f109f7 100644 --- a/drivers/net/mlx5/mlx5_ethdev.c +++ b/drivers/net/mlx5/mlx5_ethdev.c @@ -755,11 +755,9 @@ mlx5_dev_supported_ptypes_get(struct rte_eth_dev *dev) * * @param dev * Pointer to Ethernet device structure. - * @param wait_to_complete - * Wait for request completion (ignored). */ static int -mlx5_link_update_unlocked_gset(struct rte_eth_dev *dev, int wait_to_complete) +mlx5_link_update_unlocked_gset(struct rte_eth_dev *dev) { struct priv *priv = mlx5_get_priv(dev); struct ethtool_cmd edata = { @@ -771,7 +769,6 @@ mlx5_link_update_unlocked_gset(struct rte_eth_dev *dev, int wait_to_complete) /* priv_lock() is not taken to allow concurrent calls. */ - (void)wait_to_complete; if (priv_ifreq(priv, SIOCGIFFLAGS, &ifr)) { WARN("ioctl(SIOCGIFFLAGS) failed: %s", strerror(errno)); return -1; @@ -821,11 +818,9 @@ mlx5_link_update_unlocked_gset(struct rte_eth_dev *dev, int wait_to_complete) * * @param dev * Pointer to Ethernet device structure. - * @param wait_to_complete - * Wait for request completion (ignored). 
*/ static int -mlx5_link_update_unlocked_gs(struct rte_eth_dev *dev, int wait_to_complete) +mlx5_link_update_unlocked_gs(struct rte_eth_dev *dev) { struct priv *priv = mlx5_get_priv(dev); struct ethtool_link_settings gcmd = { .cmd = ETHTOOL_GLINKSETTINGS }; @@ -833,7 +828,6 @@ mlx5_link_update_unlocked_gs(struct rte_eth_dev *dev, int wait_to_complete) struct rte_eth_link dev_link; uint64_t sc; - (void)wait_to_complete; if (priv_ifreq(priv, SIOCGIFFLAGS, &ifr)) { WARN("ioctl(SIOCGIFFLAGS) failed: %s", strerror(errno)); return -1; @@ -921,7 +915,7 @@ mlx5_link_update_unlocked_gs(struct rte_eth_dev *dev, int wait_to_complete) * Wait for request completion (ignored). */ int -mlx5_link_update(struct rte_eth_dev *dev, int wait_to_complete) +mlx5_link_update(struct rte_eth_dev *dev, int wait_to_complete __rte_unused) { struct utsname utsname; int ver[3]; @@ -930,8 +924,8 @@ mlx5_link_update(struct rte_eth_dev *dev, int wait_to_complete) sscanf(utsname.release, "%d.%d.%d", &ver[0], &ver[1], &ver[2]) != 3 || KERNEL_VERSION(ver[0], ver[1], ver[2]) < KERNEL_VERSION(4, 9, 0)) - return mlx5_link_update_unlocked_gset(dev, wait_to_complete); - return mlx5_link_update_unlocked_gs(dev, wait_to_complete); + return mlx5_link_update_unlocked_gset(dev); + return mlx5_link_update_unlocked_gs(dev); } /** -- 2.15.1
[dpdk-dev] [RFC] mlx5: update NIC documentation on RDMA core version
The current driver requires rdma-core v16. It will not work or build with older versions (such as the one in Debian stable). Note: libmlx5 is rolled into rdma-core in current versions. Mlx4 probably requires a similar documentation update. Signed-off-by: Stephen Hemminger --- doc/guides/nics/mlx5.rst | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index f9558da89b61..603dd4e9c1cd 100644 --- a/doc/guides/nics/mlx5.rst +++ b/doc/guides/nics/mlx5.rst @@ -299,26 +299,26 @@ Prerequisites - This driver relies on external libraries and kernel drivers for resources -allocations and initialization. The following dependencies are not part of -DPDK and must be installed separately: +allocations and initialization. The following packages come from the +Linux RDMA core https://github.com/linux-rdma/rdma-core. The current +version of this driver requires version 16 or later. - **libibverbs** User space Verbs framework used by librte_pmd_mlx5. This library provides - a generic interface between the kernel and low-level user space drivers - such as libmlx5. + a generic interface between the kernel and low-level user space drivers. It allows slow and privileged operations (context initialization, hardware resources allocations) to be managed by the kernel and fast operations to never leave user space. -- **libmlx5** + The development package (libibverbs-dev or libibverbs-devel) is necessary + for compilation. - Low-level user space driver library for Mellanox ConnectX-4/ConnectX-5 - devices, it is automatically loaded by libibverbs. +- **rdma-core** - This library basically implements send/receive calls to the hardware - queues. + The basic userspace infrastructure for interaction with the RDMA subsystem + on Linux. - **Kernel modules** -- 2.15.1
Re: [dpdk-dev] [PATCH v3 1/2] gro: code cleanup
> -Original Message- > From: Richardson, Bruce > Sent: Tuesday, January 2, 2018 7:26 PM > To: Hu, Jiayu > Cc: dev@dpdk.org; Tan, Jianfeng; Chen, Junjie J; Ananyev, Konstantin; > step...@networkplumber.org; Yigit, Ferruh; Yao, Lei A > Subject: Re: [dpdk-dev] [PATCH v3 1/2] gro: code cleanup > > On Fri, Dec 22, 2017 at 03:25:43PM +0800, Jiayu Hu wrote: > > - Remove needless check and variants > > - For better understanding, update the programmer guide and rename > > internal functions and variants > > - For supporting tunneled gro, move common internal functions from > > gro_tcp4.c to gro_tcp4.h > > - Comply RFC 6864 to process the IPv4 ID field > > > > Signed-off-by: Jiayu Hu > > --- > > .../prog_guide/generic_receive_offload_lib.rst | 246 --- > > doc/guides/prog_guide/img/gro-key-algorithm.png| Bin 0 -> 28231 > bytes > > Rather than binary PNG images, please use SVG files (note, real SVG, not > an SVG file with a binary blob pasted into it). Based on my limited experience, there is no shortcut for this other than redrawing the picture with a tool such as Visio. Thanks, Jianfeng > > Thanks, > /Bruce
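The last item in the change list refers to RFC 6864 and the IPv4 ID field. Under RFC 6864, a datagram with DF set, MF clear and a zero fragment offset is "atomic", and its ID carries no meaning. A GRO engine therefore should not require consecutive IDs for such packets.

Below is a minimal sketch of that rule. The names are hypothetical, and the sketch assumes a policy of requiring ID continuity only for non-atomic packets, which is not necessarily what gro_tcp4 does.

```c
#include <stdbool.h>
#include <stdint.h>

#define IPV4_FLAG_DF      0x4000  /* don't fragment */
#define IPV4_FLAG_MF      0x2000  /* more fragments */
#define IPV4_OFFSET_MASK  0x1fff  /* fragment offset */

/* RFC 6864: DF=1, MF=0 and offset 0 make a datagram "atomic". */
static bool ipv4_is_atomic(uint16_t frag_off_host)
{
	return (frag_off_host & IPV4_FLAG_DF) &&
	       !(frag_off_host & (IPV4_FLAG_MF | IPV4_OFFSET_MASK));
}

/* Hypothetical merge check: 'first_id' is the ID of the first packet in
 * the flow item, 'nb_merged' the number of packets it already holds. */
bool gro_ipv4_id_ok(uint16_t first_id, uint16_t nb_merged,
		    uint16_t new_id, uint16_t new_frag_off_host)
{
	if (ipv4_is_atomic(new_frag_off_host))
		return true;   /* ID is meaningless for atomic datagrams */
	/* Non-atomic: IDs must continue the sequence, modulo 2^16. */
	return new_id == (uint16_t)(first_id + nb_merged);
}
```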
Re: [dpdk-dev] [PATCH v2] bus/vdev: add custom scan hook
> -Original Message- > From: Thomas Monjalon [mailto:tho...@monjalon.net] > Sent: Sunday, December 31, 2017 5:22 AM > To: Tan, Jianfeng > Cc: dev@dpdk.org > Subject: [PATCH v2] bus/vdev: add custom scan hook > > The scan callback allows to spawn a vdev automatically > given some custom scan rules. > It is especially useful to create a TAP device automatically > connected to a netdevice as remote. > > Signed-off-by: Thomas Monjalon It might be worth adding a note in the new features section of the release notes. Besides that, Acked-by: Jianfeng Tan Thanks, Jianfeng
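A custom scan hook amounts to a list of callbacks invoked when the bus is scanned. Each callback can then add devices on the fly, for example a TAP device. The sketch below shows that pattern with a fixed-size table.

Every name in it is hypothetical, and it is not the vdev bus API added by the patch. It also assumes that hooks are registered before any scan runs, so it takes no lock.

```c
#include <stddef.h>

typedef void (*scan_cb_t)(void *user_arg);

#define MAX_SCAN_CBS 8

static struct {
	scan_cb_t cb;
	void *arg;
} scan_cbs[MAX_SCAN_CBS];
static size_t nb_scan_cbs;

/* Hypothetical: register a hook; returns 0, or -1 if invalid or full. */
int sketch_add_custom_scan(scan_cb_t cb, void *arg)
{
	if (cb == NULL || nb_scan_cbs == MAX_SCAN_CBS)
		return -1;
	scan_cbs[nb_scan_cbs].cb = cb;
	scan_cbs[nb_scan_cbs].arg = arg;
	nb_scan_cbs++;
	return 0;
}

/* Hypothetical bus scan: run every hook (regular device scan not shown). */
void sketch_bus_scan(void)
{
	for (size_t i = 0; i < nb_scan_cbs; i++)
		scan_cbs[i].cb(scan_cbs[i].arg);
}

/* Example hook: counts how many times the scan invoked it. */
void count_calls(void *arg)
{
	(*(int *)arg)++;
}
```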
Re: [dpdk-dev] [PATCH v3 1/2] gro: code cleanup
On Wed, 3 Jan 2018 01:07:37 + "Tan, Jianfeng" wrote: > > -Original Message- > > From: Richardson, Bruce > > Sent: Tuesday, January 2, 2018 7:26 PM > > To: Hu, Jiayu > > Cc: dev@dpdk.org; Tan, Jianfeng; Chen, Junjie J; Ananyev, Konstantin; > > step...@networkplumber.org; Yigit, Ferruh; Yao, Lei A > > Subject: Re: [dpdk-dev] [PATCH v3 1/2] gro: code cleanup > > > > On Fri, Dec 22, 2017 at 03:25:43PM +0800, Jiayu Hu wrote: > > > - Remove needless check and variants > > > - For better understanding, update the programmer guide and rename > > > internal functions and variants > > > - For supporting tunneled gro, move common internal functions from > > > gro_tcp4.c to gro_tcp4.h > > > - Comply RFC 6864 to process the IPv4 ID field > > > > > > Signed-off-by: Jiayu Hu > > > --- > > > .../prog_guide/generic_receive_offload_lib.rst | 246 --- > > > doc/guides/prog_guide/img/gro-key-algorithm.png| Bin 0 -> 28231 > > bytes > > > > Rather than binary PNG images, please use SVG files (note, real SVG, not > > an SVG file with a binary blob pasted into it). > > Based on my limited experience, there is no shortcut for this, but re-draw > the picture with tools like visio. Inkscape is open source and produces svg files.
Re: [dpdk-dev] [PATCH v2 2/2] net/virtio: support GUEST ANNOUNCE
Hi, > -Original Message- > From: Bie, Tiwei > Sent: Monday, December 4, 2017 4:47 PM > To: Wang, Xiao W > Cc: y...@fridaylinux.org; dev@dpdk.org; step...@networkplumber.org > Subject: Re: [PATCH v2 2/2] net/virtio: support GUEST ANNOUNCE > > On Mon, Dec 04, 2017 at 06:02:08AM -0800, Xiao Wang wrote: > > When live migration is done, for the backup VM, either the virtio > > frontend or the vhost backend needs to send out gratuitous RARP packet > > to announce its new network location. > > > > To support GUEST ANNOUNCE, do we just need to send a RARP packet? > Will it work in an IPv6-only network? Will try to send out another one for IPv6 in next version. > > > This patch enables VIRTIO_NET_F_GUEST_ANNOUNCE feature to support > live > [...] > > + > > +static int > > +virtio_dev_pause(struct rte_eth_dev *dev) > > +{ > > + struct virtio_hw *hw = dev->data->dev_private; > > + > > + if (hw->started == 0) > > + return -1; > > + hw->started = 0; > > + /* > > +* Prevent the worker thread from touching queues to avoid condition, > > Typo. Avoid "contention"? Will fix it in next version. > > > +* 1 ms should be enough for the ongoing Tx function to finish. > > +*/ > > + rte_delay_ms(1); > > + return 0; > > +} > > + > > +static void > > +virtio_dev_resume(struct rte_eth_dev *dev) > > +{ > > + struct virtio_hw *hw = dev->data->dev_private; > > + > > + hw->started = 1; > > +} > > + > > +static void > > +generate_rarp(struct rte_eth_dev *dev) > > You can give it a better name, e.g. virtio_notify_peers(). Good suggestion. 
> > > +{ > > + struct virtio_hw *hw = dev->data->dev_private; > > + struct rte_mbuf *rarp_mbuf = NULL; > > + struct virtnet_tx *txvq = dev->data->tx_queues[0]; > > + struct virtnet_rx *rxvq = dev->data->rx_queues[0]; > > + > > + rarp_mbuf = rte_mbuf_raw_alloc(rxvq->mpool); > > + if (rarp_mbuf == NULL) { > > + PMD_DRV_LOG(ERR, "mbuf allocate failed"); > > + return; > > + } > > + > > + if (make_rarp_packet(rarp_mbuf, (struct ether_addr *)hw->mac_addr)) > { > > + rte_pktmbuf_free(rarp_mbuf); > > + rarp_mbuf = NULL; > > + return; > > + } > > + > > + /* If virtio port just stopped, no need to send RARP */ > > + if (virtio_dev_pause(dev) < 0) > > + return; > > + > > + virtio_inject_pkts(txvq, &rarp_mbuf, 1); > > + /* Recover the stored hw status to let worker thread continue */ > > + virtio_dev_resume(dev); > > +} > > + > > +static void > > +virtnet_ack_link_announce(struct rte_eth_dev *dev) > > Why use "virtnet_" prefix? I think "virtio_" would be better. Yes, that would be similar to other function names in this file. > > > +{ > > + struct virtio_hw *hw = dev->data->dev_private; > > + struct virtio_pmd_ctrl ctrl; > > + int len; > > + > > + ctrl.hdr.class = VIRTIO_NET_CTRL_ANNOUNCE; > > + ctrl.hdr.cmd = VIRTIO_NET_CTRL_ANNOUNCE_ACK; > > + len = 0; > > + > > + virtio_send_command(hw->cvq, &ctrl, &len, 0); > > +} > > + > > /* > > - * Process Virtio Config changed interrupt and call the callback > > - * if link state changed. > > + * Process virtio config changed interrupt. Call the callback > > + * if link state changed; generate gratuitous RARP packet if > > Better to replace ";" with "," OK. Will update in next version. > > > + * the status indicates an ANNOUNCE. > > */ > > void > > virtio_interrupt_handler(void *param) > > @@ -1274,6 +1391,12 @@ static int virtio_dev_xstats_get_names(struct > rte_eth_dev *dev, > > NULL, NULL); > > } > > > > + if (isr & VIRTIO_NET_S_ANNOUNCE) { > > + rte_spinlock_lock(&hw->sl); > > + generate_rarp(dev); > > Just curious. 
Do you need to make sure that the RARP packet > would be sent successfully? The pause will make the ring get drained. > > > + virtnet_ack_link_announce(dev); > > + rte_spinlock_unlock(&hw->sl); > > + } > > } > [...] > > diff --git a/drivers/net/virtio/virtio_pci.h > > b/drivers/net/virtio/virtio_pci.h > > index 3c5ce66..3cd367e 100644 > > --- a/drivers/net/virtio/virtio_pci.h > > +++ b/drivers/net/virtio/virtio_pci.h > > @@ -270,6 +270,7 @@ struct virtio_hw { > > struct virtio_pci_common_cfg *common_cfg; > > struct virtio_net_config *dev_cfg; > > void*virtio_user_dev; > > + rte_spinlock_t sl; > > Some detailed comments need to be added in the code to > document the usage of this lock. OK. Will add it in v3. > > > > > struct virtqueue **vqs; > > }; > [...] > > diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h > > index 2305d91..ed420e9 100644 > > --- a/drivers/net/virtio/virtqueue.h > > +++ b/drivers/net/virtio/virtqueue.h > > @@ -158,6 +158,17 @@ struct virtio_net_ctrl_mac { > > #define VIRTIO_NET_CTRL_VLAN_ADD 0 > > #define VIRTIO_NET_CTRL_VLAN_DEL 1 > > > > +/* > > + * Control link announce acknowledgement > > + * > > + * T
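Since the announcement is a gratuitous RARP frame, its layout is worth spelling out. It consists of a broadcast Ethernet header with EtherType 0x8035, followed by an ARP-format body with opcode 3 (reverse request). The port's MAC address appears as both the sender and target hardware address.

The sketch below builds that 42-byte frame into a plain buffer. The function name is hypothetical, it is not the make_rarp_packet() used by the patch, and padding up to the Ethernet minimum frame size is left out.

```c
#include <stdint.h>
#include <string.h>

#define RARP_FRAME_LEN 42   /* 14-byte Ethernet header + 28-byte ARP body */

static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/* Hypothetical helper: write a gratuitous RARP frame announcing 'mac'
 * into 'buf' (at least RARP_FRAME_LEN bytes); returns the frame length. */
size_t build_rarp_announce(uint8_t *buf, const uint8_t mac[6])
{
	memset(buf, 0, RARP_FRAME_LEN);
	memset(buf, 0xff, 6);           /* destination: broadcast */
	memcpy(buf + 6, mac, 6);        /* source: our MAC */
	put_be16(buf + 12, 0x8035);     /* EtherType: RARP */

	put_be16(buf + 14, 1);          /* hardware type: Ethernet */
	put_be16(buf + 16, 0x0800);     /* protocol type: IPv4 */
	buf[18] = 6;                    /* hardware address length */
	buf[19] = 4;                    /* protocol address length */
	put_be16(buf + 20, 3);          /* opcode: reverse request */
	memcpy(buf + 22, mac, 6);       /* sender hardware address */
	/* sender protocol address (bytes 28..31) stays 0.0.0.0 */
	memcpy(buf + 32, mac, 6);       /* target hardware address */
	/* target protocol address (bytes 38..41) stays 0.0.0.0 */
	return RARP_FRAME_LEN;
}
```

Note that this covers IPv4/L2 learning only. The IPv6-only question raised above would need a separate announcement.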
Re: [dpdk-dev] [PATCH v2 2/2] net/virtio: support GUEST ANNOUNCE
> -Original Message- > From: Bie, Tiwei > Sent: Wednesday, December 6, 2017 7:23 PM > To: Wang, Xiao W > Cc: y...@fridaylinux.org; dev@dpdk.org; step...@networkplumber.org > Subject: Re: [PATCH v2 2/2] net/virtio: support GUEST ANNOUNCE > > On Mon, Dec 04, 2017 at 06:02:08AM -0800, Xiao Wang wrote: > [...] > > diff --git a/drivers/net/virtio/virtio_rxtx.c > > b/drivers/net/virtio/virtio_rxtx.c > > index 6a24fde..7313bdd 100644 > > --- a/drivers/net/virtio/virtio_rxtx.c > > +++ b/drivers/net/virtio/virtio_rxtx.c > > @@ -1100,3 +1100,84 @@ > > > > return nb_tx; > > } > > + > > +uint16_t > > +virtio_inject_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t > nb_pkts) > > +{ > > + struct virtnet_tx *txvq = tx_queue; > > + struct virtqueue *vq = txvq->vq; > > + struct virtio_hw *hw = vq->hw; > > + uint16_t hdr_size = hw->vtnet_hdr_size; > > + uint16_t nb_used, nb_tx = 0; > > + > > + if (unlikely(nb_pkts < 1)) > > + return nb_pkts; > > + > > + PMD_TX_LOG(DEBUG, "%d packets to xmit", nb_pkts); > > + nb_used = VIRTQUEUE_NUSED(vq); > > + > > + virtio_rmb(); > > + if (likely(nb_used > vq->vq_nentries - vq->vq_free_thresh)) > > + virtio_xmit_cleanup(vq, nb_used); > > + > > + for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) { > > + struct rte_mbuf *txm = tx_pkts[nb_tx]; > > + int can_push = 0, use_indirect = 0, slots, need; > > + > > + /* optimize ring usage */ > > + if ((vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) || > > + vtpci_with_feature(hw, > VIRTIO_F_VERSION_1)) && > > + rte_mbuf_refcnt_read(txm) == 1 && > > + RTE_MBUF_DIRECT(txm) && > > + txm->nb_segs == 1 && > > + rte_pktmbuf_headroom(txm) >= hdr_size && > > + rte_is_aligned(rte_pktmbuf_mtod(txm, char *), > > + __alignof__(struct virtio_net_hdr_mrg_rxbuf))) > > + can_push = 1; > > + else if (vtpci_with_feature(hw, > VIRTIO_RING_F_INDIRECT_DESC) && > > +txm->nb_segs < VIRTIO_MAX_TX_INDIRECT) > > + use_indirect = 1; > > + > > + /* How many main ring entries are needed to this Tx? 
> > +* any_layout => number of segments > > +* indirect => 1 > > +* default=> number of segments + 1 > > +*/ > > + slots = use_indirect ? 1 : (txm->nb_segs + !can_push); > > + need = slots - vq->vq_free_cnt; > > + > > + /* Positive value indicates it need free vring descriptors */ > > + if (unlikely(need > 0)) { > > + nb_used = VIRTQUEUE_NUSED(vq); > > + virtio_rmb(); > > + need = RTE_MIN(need, (int)nb_used); > > + > > + virtio_xmit_cleanup(vq, need); > > + need = slots - vq->vq_free_cnt; > > + if (unlikely(need > 0)) { > > + PMD_TX_LOG(ERR, > > + "No free tx descriptors to > transmit"); > > + break; > > + } > > + } > > + > > + /* Enqueue Packet buffers */ > > + virtqueue_enqueue_xmit(txvq, txm, slots, use_indirect, > can_push); > > + > > + txvq->stats.bytes += txm->pkt_len; > > + virtio_update_packet_stats(&txvq->stats, txm); > > + } > > + > > + txvq->stats.packets += nb_tx; > > + > > + if (likely(nb_tx)) { > > + vq_update_avail_idx(vq); > > + > > + if (unlikely(virtqueue_kick_prepare(vq))) { > > + virtqueue_notify(vq); > > + PMD_TX_LOG(DEBUG, "Notified backend after xmit"); > > + } > > + } > > + > > + return nb_tx; > > +} > > Simple Tx has some special assumptions and setups of the txq. > Basically the current implementation of virtio_inject_pkts() > is a mirror of virtio_xmit_pkts(). So when simple Tx function > is chosen, calling virtio_inject_pkts() could cause problems. I will have a static mbuf ** pointer for rarp packets in next version, which can also avoid code duplication. BRs, Xiao
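The descriptor accounting in the quoted comment ("any_layout => number of segments, indirect => 1, default => number of segments + 1") can be checked in isolation. The function below restates the quoted expression under a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Main-ring descriptors needed for one mbuf chain of nb_segs segments:
 * - can_push: the virtio-net header goes into the mbuf headroom,
 *   so only the data segments need descriptors;
 * - use_indirect: one descriptor points at an indirect table;
 * - otherwise: one extra descriptor carries the header. */
int tx_slots_needed(uint16_t nb_segs, bool can_push, bool use_indirect)
{
	return use_indirect ? 1 : nb_segs + !can_push;
}
```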
Re: [dpdk-dev] [PATCH 2/2] net/virtio: support GUEST ANNOUNCE
> -Original Message- > From: Yuanhan Liu [mailto:y...@fridaylinux.org] > Sent: Tuesday, December 5, 2017 10:26 PM > To: Wang, Xiao W > Cc: dev@dpdk.org; Bie, Tiwei > Subject: Re: [PATCH 2/2] net/virtio: support GUEST ANNOUNCE > > On Thu, Nov 30, 2017 at 02:41:12AM +, Wang, Xiao W wrote: > > > > > > > -Original Message- > > > From: Yuanhan Liu [mailto:y...@fridaylinux.org] > > > Sent: Monday, November 27, 2017 8:49 PM > > > To: Wang, Xiao W > > > Cc: dev@dpdk.org > > > Subject: Re: [PATCH 2/2] net/virtio: support GUEST ANNOUNCE > > > > > > On Fri, Nov 24, 2017 at 03:04:00AM -0800, Xiao Wang wrote: > > > > When live migration is done, for the backup VM, either the virtio > > > > frontend or the vhost backend needs to send out gratuitous RARP packet > > > > to announce its new network location. > > > > > > > > This patch enables VIRTIO_NET_F_GUEST_ANNOUNCE feature to support > > > live > > > > migration scenario where the vhost backend doesn't have the ability to > > > > generate RARP packet. > > > > > > Yes, it's a feature good to have. > > > > > > > +static int > > > > +virtio_dev_pause(struct rte_eth_dev *dev) > > > > +{ > > > > + struct virtio_hw *hw = dev->data->dev_private; > > > > + > > > > + if (hw->started == 0) > > > > + return -1; > > > > + hw->started = 0; > > > > + /* > > > > +* Prevent the worker thread from touching queues to avoid > > > > condition, > > > > +* 1 ms should be enough for the ongoing Tx function to finish. > > > > +*/ > > > > + rte_delay_ms(1); > > > > + return 0; > > > > +} > > > > + > > > > +static void > > > > +virtio_dev_resume(struct rte_eth_dev *dev) > > > > +{ > > > > + struct virtio_hw *hw = dev->data->dev_private; > > > > + > > > > + hw->started = 1; > > > > +} > > > > > > However, the implementation (stop first, pause for 1ms, duplicate another > > > Tx function, resume) doesn't seem elegant. 
> > > > > > You probably could try something like DPDK vhost does: > > > > > > - set a flag when S_ANNOUCE is received > > > - inject a pkt when such flag is set in the xmit function > > > > > > You then should be able to get rid of all of above stuffs. > > > > > > --yliu > > > > The difference is that the virtio port may just receive packet, without > > xmit. > > Thanks, I missed that. > > However, you really should not add a duplicate function. It adds more > maintain effort. I think you probably could just invoke the tx_pkt_burst > callback directly. You have stopped the device after all. What's the > necessary to duplicate it? > > --yliu Invoking the tx_pkt_burst callback directly would be ideal. In the next version I will add a static mbuf ** pointer for RARP packets, so that the tx_pkt_burst function lets the RARP packet through even when the queue is stopped. This avoids the code duplication. BRs, Xiao
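The flag-based alternative suggested in this exchange works like this: the control path stores the packet to announce, and the burst function sends it first, even while the port is paused. The sketch below is minimal and assumes a single pending-packet slot. `struct pkt` stands in for `struct rte_mbuf`, and `struct port`, `ring_enqueue` and `xmit_burst` are hypothetical. C11 atomics carry the hand-off between the control thread and the data-path thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct rte_mbuf; only what the sketch needs. */
struct pkt {
	uint32_t len;
};

/* Hypothetical per-port state. */
struct port {
	atomic_int started;           /* 0 while paused for injection */
	_Atomic(struct pkt *) inject; /* pending RARP packet, or NULL */
	uint32_t sent;                /* packets handed to the "ring" */
};

/* Stand-in for enqueueing one packet on the Tx ring. */
static void
ring_enqueue(struct port *p, struct pkt *m)
{
	assert(m != NULL);
	p->sent++;
}

/*
 * Burst function: a pending injected packet is always sent (and the slot
 * cleared); regular traffic is only sent while the port is started.
 */
static uint16_t
xmit_burst(struct port *p, struct pkt **pkts, uint16_t nb_pkts)
{
	struct pkt *rarp = atomic_exchange(&p->inject, NULL);
	uint16_t i;

	if (rarp != NULL)
		ring_enqueue(p, rarp);
	if (!atomic_load(&p->started))
		return 0;
	for (i = 0; i < nb_pkts; i++)
		ring_enqueue(p, pkts[i]);
	return nb_pkts;
}
```

The approach Xiao describes recognizes the injected packet by comparing against a static pointer instead. The common idea is the same: the regular Tx path handles the announcement, so no duplicate Tx function is needed.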
Re: [dpdk-dev] [PATCH v2] bus/pci: fix wrong intr_handle.type with uio_pci_generic
> -Original Message- > From: Thomas Monjalon [mailto:tho...@monjalon.net] > Sent: Friday, December 29, 2017 7:07 PM > To: Yang, Zhiyong > Cc: dev@dpdk.org; Yigit, Ferruh ; sta...@dpdk.org > Subject: Re: [PATCH v2] bus/pci: fix wrong intr_handle.type with > uio_pci_generic > > 29/12/2017 08:55, Zhiyong Yang: > > For virtio legacy device, testpmd startup fails when using > > uio_pci_generic. The issue is caused by invoking the function > > pci_ioport_map. The right intr_handle.type is already set before > > calling it, we should avoid overwriting the default value "RTE_ > > INTR_HANDLE_UNKNOWN" in it. Besides, the removal has no harm to other > > cases since it already is set to this value (0) at init. > > To be more precise, it is set to 0 by a memset on the whole struct during > allocation in the scan function (pci_scan_one). > > > --- a/drivers/bus/pci/linux/pci.c > > +++ b/drivers/bus/pci/linux/pci.c > > @@ -723,7 +723,6 @@ pci_ioport_map(struct rte_pci_device *dev, int bar > __rte_unused, > > if (!found) > > return -1; > > > > - dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN; > > There is the same assignment in pci_vfio_map_resource_primary(), > pci_vfio_map_resource_secondary() and pci_uio_map_resource(). > > Please could you check why there is such assignments? In general, the assignment in those three functions is meant to initialize intr_handle.type. For example, pci_uio_map_resource() wants the status to read "unknown" if the code returns abnormally after that initialization. If the code runs smoothly, dev->intr_handle.type is then set to "RTE_INTR_HANDLE_UIO" in the BSD environment, or to "RTE_INTR_HANDLE_UIO" or "RTE_INTR_HANDLE_UIO_INTX" in the Linux environment. Given the memset in pci_scan_one(), the assignment can be removed without harming the existing logic, although keeping it is also fine. pci_vfio_map_resource_primary() and pci_vfio_map_resource_secondary() are similar.
The author was emphasizing that intr_handle.type should first be initialized (to 0) and then assigned the correct value, so that if a later step fails, the status reads "unknown". That is my guess. Back to the patch: assigning "unknown" unconditionally in pci_ioport_map() is crude. That function is not only used for genuinely "unknown" devices; it also handles the uio_pci_generic driver on the x86 platform, and that special case is what causes the error with uio_pci_generic. Thanks Zhiyong
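The ordering problem discussed above can be modeled in a few lines. The sketch below is minimal and uses local stand-in types. `enum intr_type`, `struct pci_dev`, `scan_one`, `uio_map` and `ioport_map` are hypothetical, not the DPDK bus/pci code. As in DPDK's `enum rte_intr_handle_type`, the "unknown" value is 0. The `reset_type` flag toggles the assignment that the patch removes.

```c
#include <assert.h>
#include <string.h>

/* Local stand-ins; only the zero value for UNKNOWN mirrors DPDK. */
enum intr_type {
	INTR_UNKNOWN = 0,
	INTR_UIO,
	INTR_UIO_INTX,
};

struct pci_dev {
	enum intr_type intr_type;
	int ioport_mapped;
};

/* Scan: the whole device struct is zeroed, so the type starts UNKNOWN. */
static void
scan_one(struct pci_dev *dev)
{
	memset(dev, 0, sizeof(*dev));
	assert(dev->intr_type == INTR_UNKNOWN);
}

/* UIO mapping records the interrupt type (INTX for uio_pci_generic). */
static void
uio_map(struct pci_dev *dev, int uio_generic)
{
	dev->intr_type = uio_generic ? INTR_UIO_INTX : INTR_UIO;
}

/* I/O port mapping; reset_type models the line the patch removes. */
static int
ioport_map(struct pci_dev *dev, int reset_type)
{
	if (reset_type)
		dev->intr_type = INTR_UNKNOWN; /* clobbers the UIO result */
	dev->ioport_mapped = 1;
	return 0;
}
```

With the reset enabled, a device bound to uio_pci_generic ends up with an "unknown" interrupt type after the I/O port mapping. Without the reset, the type recorded by the UIO mapping survives. A device that never went through UIO mapping still reads "unknown", thanks to the memset.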
Re: [dpdk-dev] [PATCH v2] net/i40e: fix port segmentation fault when restart
Hi, zhangqi > -Original Message- > From: Zhang, Qi Z > Sent: Friday, December 22, 2017 11:31 AM > To: Zhao1, Wei ; dev@dpdk.org > Cc: Zhao1, Wei > Subject: RE: [dpdk-dev] [PATCH v2] net/i40e: fix port segmentation fault > when restart > > > -Original Message- > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Wei Zhao > > Sent: Wednesday, November 15, 2017 1:55 PM > > To: dev@dpdk.org > > Cc: Zhao1, Wei > > Subject: [dpdk-dev] [PATCH v2] net/i40e: fix port segmentation fault > > when restart > > > > It will clear all queue region related configuration when dev stop > > even if threr > > s/threr/there > > > is no queue region config, so this may cause error. So add check when > > flush queue region config and delete it when device stop. > > > > Fixes: 7cbecc2f742 ("net/i40e: support queue region set and flush") > > > > Signed-off-by: Wei Zhao > > > > --- > > > > v2: > > -fix patch check warning. > > --- > > drivers/net/i40e/i40e_ethdev.c | 3 --- > > drivers/net/i40e/rte_pmd_i40e.c | 27 ++- > > 2 files changed, 14 insertions(+), 16 deletions(-) > > > > diff --git a/drivers/net/i40e/i40e_ethdev.c > > b/drivers/net/i40e/i40e_ethdev.c index 811cc9f..7a1290b 100644 > > --- a/drivers/net/i40e/i40e_ethdev.c > > +++ b/drivers/net/i40e/i40e_ethdev.c > > @@ -2154,9 +2154,6 @@ i40e_dev_stop(struct rte_eth_dev *dev) > > /* reset hierarchy commit */ > > pf->tm_conf.committed = false; > > > > - /* Remove all the queue region configuration */ > > - i40e_flush_queue_region_all_conf(dev, hw, pf, 0); > > - > From the commit log, the reason you remove above line is because the > function can't handle the situation when no queue region config, but what > about the case that queue region config does exist? Could you add more > message to explain why this not be impacted. OK, I will add more information about the case where a queue region configuration does exist. Previously, the code treated the case with no queue region configuration as if one existed.
> > > hw->adapter_stopped = 1; > > } > > > > diff --git a/drivers/net/i40e/rte_pmd_i40e.c > > b/drivers/net/i40e/rte_pmd_i40e.c index aeb92af..c2e2466 100644 > > --- a/drivers/net/i40e/rte_pmd_i40e.c > > +++ b/drivers/net/i40e/rte_pmd_i40e.c > > @@ -2845,22 +2845,23 @@ i40e_flush_queue_region_all_conf(struct > > rte_eth_dev *dev, > > return 0; > > } > > > > - info->queue_region_number = 1; > > - info->region[0].queue_num = main_vsi->nb_used_qps; > > - info->region[0].queue_start_index = 0; > > + if (info->queue_region_number) { > > + info->queue_region_number = 1; > > + info->region[0].queue_num = main_vsi->nb_used_qps; > > + info->region[0].queue_start_index = 0; > > > > - ret = i40e_vsi_update_queue_region_mapping(hw, pf); > > - if (ret != I40E_SUCCESS) > > - PMD_DRV_LOG(INFO, "Failed to flush queue region > mapping."); > > - > > - ret = i40e_dcb_init_configure(dev, TRUE); > > - if (ret != I40E_SUCCESS) { > > - PMD_DRV_LOG(INFO, "Failed to flush dcb."); > > - pf->flags &= ~I40E_FLAG_DCB; > > - } > > + ret = i40e_vsi_update_queue_region_mapping(hw, pf); > > + if (ret != I40E_SUCCESS) > > + PMD_DRV_LOG(INFO, "Failed to flush queue region > > mapping."); > > WARNING: line over 80 characters Ferruh Yigit told me about a newer rule: log message strings in double quotation marks may exceed 80 characters. > Regards > Qi > > > > > - i40e_init_queue_region_conf(dev); > > + ret = i40e_dcb_init_configure(dev, TRUE); > > + if (ret != I40E_SUCCESS) { > > + PMD_DRV_LOG(INFO, "Failed to flush dcb."); > > + pf->flags &= ~I40E_FLAG_DCB; > > + } > > > > + i40e_init_queue_region_conf(dev); > > + } > > return 0; > > } > > > > -- > > 2.7.4
[dpdk-dev] [PATCH v2 0/7] convert mlx PMDs to new ethdev offloads API
This series is to convert mlx4 and mlx5 PMDs to the new offloads API [1]. On v2: - New design to hold PMD specific args and combine them with offloads requested. - Fix missing IPV4 checksum flag on vector function selection. - Verify Txq flags ignore bit before checking for valid offloads configuration. - Removed strict offloads check from mlx4. [1] http://dpdk.org/ml/archives/dev/2017-October/077329.html Nelio Laranjeiro (1): net/mlx5: rename counter set in configuration Shahaf Shuler (6): net/mlx5: change pkt burst select function prototype net/mlx5: add device configuration structure net/mlx5: convert to new Tx offloads API net/mlx5: convert to new Rx offloads API net/mlx4: convert to new Tx offloads API net/mlx4: convert to new Rx offloads API doc/guides/nics/mlx5.rst | 15 +- drivers/net/mlx4/mlx4_ethdev.c | 17 +-- drivers/net/mlx4/mlx4_flow.c | 5 +- drivers/net/mlx4/mlx4_rxq.c | 78 ++- drivers/net/mlx4/mlx4_rxtx.h | 3 + drivers/net/mlx4/mlx4_txq.c | 71 +- drivers/net/mlx5/mlx5.c | 190 + drivers/net/mlx5/mlx5.h | 57 +--- drivers/net/mlx5/mlx5_ethdev.c | 113 --- drivers/net/mlx5/mlx5_flow.c | 2 +- drivers/net/mlx5/mlx5_rxq.c | 121 +--- drivers/net/mlx5/mlx5_rxtx.c | 6 +- drivers/net/mlx5/mlx5_rxtx.h | 10 +- drivers/net/mlx5/mlx5_rxtx_vec.c | 40 +++--- drivers/net/mlx5/mlx5_rxtx_vec.h | 12 ++ drivers/net/mlx5/mlx5_trigger.c | 4 +- drivers/net/mlx5/mlx5_txq.c | 254 +- drivers/net/mlx5/mlx5_vlan.c | 7 +- 18 files changed, 662 insertions(+), 343 deletions(-) -- 2.12.0
[dpdk-dev] [PATCH v2 1/7] net/mlx5: change pkt burst select function prototype
Change the function prototype to return the function pointer of the selected Tx/Rx burst function instead of assigning it directly to the device context. This change makes it possible to use those select functions to query which burst function will be selected according to the device configuration. Signed-off-by: Shahaf Shuler Acked-by: Nelio Laranjeiro --- drivers/net/mlx5/mlx5.c | 11 -- drivers/net/mlx5/mlx5.h | 4 ++-- drivers/net/mlx5/mlx5_ethdev.c | 41 +--- drivers/net/mlx5/mlx5_trigger.c | 4 ++-- 4 files changed, 37 insertions(+), 23 deletions(-) diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index cd66fe162..0192815f2 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -712,8 +712,15 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev) err = -err; goto error; } - priv_dev_select_rx_function(priv, eth_dev); - priv_dev_select_tx_function(priv, eth_dev); + /* +* Ethdev pointer is still required as input since +* the primary device is not accessible from the +* secondary process.
+*/ + eth_dev->rx_pkt_burst = + priv_select_rx_function(priv, eth_dev); + eth_dev->tx_pkt_burst = + priv_select_tx_function(priv, eth_dev); continue; } diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index e6a69b823..3e3259b55 100644 --- a/drivers/net/mlx5/mlx5.h +++ b/drivers/net/mlx5/mlx5.h @@ -206,8 +206,8 @@ void priv_dev_interrupt_handler_uninstall(struct priv *, struct rte_eth_dev *); void priv_dev_interrupt_handler_install(struct priv *, struct rte_eth_dev *); int mlx5_set_link_down(struct rte_eth_dev *dev); int mlx5_set_link_up(struct rte_eth_dev *dev); -void priv_dev_select_tx_function(struct priv *priv, struct rte_eth_dev *dev); -void priv_dev_select_rx_function(struct priv *priv, struct rte_eth_dev *dev); +eth_tx_burst_t priv_select_tx_function(struct priv *, struct rte_eth_dev *); +eth_rx_burst_t priv_select_rx_function(struct priv *, struct rte_eth_dev *); /* mlx5_mac.c */ diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c index 282ef241e..28183534a 100644 --- a/drivers/net/mlx5/mlx5_ethdev.c +++ b/drivers/net/mlx5/mlx5_ethdev.c @@ -1325,8 +1325,8 @@ priv_dev_set_link(struct priv *priv, struct rte_eth_dev *dev, int up) err = priv_set_flags(priv, ~IFF_UP, IFF_UP); if (err) return err; - priv_dev_select_tx_function(priv, dev); - priv_dev_select_rx_function(priv, dev); + dev->tx_pkt_burst = priv_select_tx_function(priv, dev); + dev->rx_pkt_burst = priv_select_rx_function(priv, dev); } else { err = priv_set_flags(priv, ~IFF_UP, ~IFF_UP); if (err) @@ -1386,32 +1386,36 @@ mlx5_set_link_up(struct rte_eth_dev *dev) * Pointer to private data structure. * @param dev * Pointer to rte_eth_dev structure. + * + * @return + * Pointer to selected Tx burst function. 
*/ -void -priv_dev_select_tx_function(struct priv *priv, struct rte_eth_dev *dev) +eth_tx_burst_t +priv_select_tx_function(struct priv *priv, __rte_unused struct rte_eth_dev *dev) { + eth_tx_burst_t tx_pkt_burst = mlx5_tx_burst; + assert(priv != NULL); - assert(dev != NULL); - dev->tx_pkt_burst = mlx5_tx_burst; /* Select appropriate TX function. */ if (priv->mps == MLX5_MPW_ENHANCED) { if (priv_check_vec_tx_support(priv) > 0) { if (priv_check_raw_vec_tx_support(priv) > 0) - dev->tx_pkt_burst = mlx5_tx_burst_raw_vec; + tx_pkt_burst = mlx5_tx_burst_raw_vec; else - dev->tx_pkt_burst = mlx5_tx_burst_vec; + tx_pkt_burst = mlx5_tx_burst_vec; DEBUG("selected Enhanced MPW TX vectorized function"); } else { - dev->tx_pkt_burst = mlx5_tx_burst_empw; + tx_pkt_burst = mlx5_tx_burst_empw; DEBUG("selected Enhanced MPW TX function"); } } else if (priv->mps && priv->txq_inline) { - dev->tx_pkt_burst = mlx5_tx_burst_mpw_inline; + tx_pkt_burst = mlx5_tx_burst_mpw_inline; DEBUG("selected MPW inline TX function"); } else if (priv->mps) { - dev->tx_pkt_burst = mlx5_tx_burst_mpw; + tx_pkt_burst = mlx5_tx_burst_mpw; D
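With this patch, the selector becomes a pure function of the configuration, and the caller decides whether to install the result. The sketch below is minimal. `tx_burst_t` stands in for `eth_tx_burst_t`, and the dummy burst functions, `struct dev_cfg` and `select_tx_function()` are hypothetical. The branch order follows the quoted priv_select_tx_function(), with the raw-vector case folded into one `vec_tx` flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for eth_tx_burst_t. */
typedef uint16_t (*tx_burst_t)(void *txq, void **pkts, uint16_t nb_pkts);

static int last_path; /* records which burst function ran (demo only) */

/* Dummy burst functions; each pretends to send every packet. */
static uint16_t
tx_burst(void *q, void **p, uint16_t n)
{
	(void)q; (void)p; last_path = 0; return n;
}
static uint16_t
tx_burst_mpw(void *q, void **p, uint16_t n)
{
	(void)q; (void)p; last_path = 1; return n;
}
static uint16_t
tx_burst_mpw_inline(void *q, void **p, uint16_t n)
{
	(void)q; (void)p; last_path = 2; return n;
}
static uint16_t
tx_burst_empw(void *q, void **p, uint16_t n)
{
	(void)q; (void)p; last_path = 3; return n;
}
static uint16_t
tx_burst_vec(void *q, void **p, uint16_t n)
{
	(void)q; (void)p; last_path = 4; return n;
}

enum mps_mode { MPW_DISABLED, MPW_LEGACY, MPW_ENHANCED };

struct dev_cfg {
	enum mps_mode mps;
	int vec_tx;     /* vectorized Tx is usable */
	int txq_inline; /* requested inline size, 0 if none */
};

/* Returns the selected function instead of writing it into a device. */
static tx_burst_t
select_tx_function(const struct dev_cfg *cfg)
{
	tx_burst_t fn = tx_burst;

	assert(cfg != NULL);
	if (cfg->mps == MPW_ENHANCED)
		fn = cfg->vec_tx ? tx_burst_vec : tx_burst_empw;
	else if (cfg->mps == MPW_LEGACY && cfg->txq_inline)
		fn = tx_burst_mpw_inline;
	else if (cfg->mps == MPW_LEGACY)
		fn = tx_burst_mpw;
	return fn;
}
```

The caller then assigns `dev->tx_pkt_burst = select_tx_function(&cfg);`, and the same call can be made just to inspect which function a configuration would pick.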
[dpdk-dev] [PATCH v2 4/7] net/mlx5: convert to new Tx offloads API
Ethdev Tx offloads API has changed since: commit cba7f53b717d ("ethdev: introduce Tx queue offloads API") This commit support the new Tx offloads API. Signed-off-by: Shahaf Shuler Acked-by: Nelio Laranjeiro --- doc/guides/nics/mlx5.rst | 15 +++ drivers/net/mlx5/mlx5.c | 18 ++-- drivers/net/mlx5/mlx5.h | 2 +- drivers/net/mlx5/mlx5_ethdev.c | 37 drivers/net/mlx5/mlx5_rxtx.c | 6 ++- drivers/net/mlx5/mlx5_rxtx.h | 7 +-- drivers/net/mlx5/mlx5_rxtx_vec.c | 32 +++--- drivers/net/mlx5/mlx5_rxtx_vec.h | 12 ++ drivers/net/mlx5/mlx5_txq.c | 80 --- 9 files changed, 142 insertions(+), 67 deletions(-) diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index 154db64d7..bdc2216c0 100644 --- a/doc/guides/nics/mlx5.rst +++ b/doc/guides/nics/mlx5.rst @@ -262,8 +262,9 @@ Run-time configuration Enhanced MPS supports hybrid mode - mixing inlined packets and pointers in the same descriptor. - This option cannot be used in conjunction with ``tso`` below. When ``tso`` - is set, ``txq_mpw_en`` is disabled. + This option cannot be used with certain offloads such as ``DEV_TX_OFFLOAD_TCP_TSO, + DEV_TX_OFFLOAD_VXLAN_TNL_TSO, DEV_TX_OFFLOAD_GRE_TNL_TSO, DEV_TX_OFFLOAD_VLAN_INSERT``. + When those offloads are requested the MPS send function will not be used. It is currently only supported on the ConnectX-4 Lx and ConnectX-5 families of adapters. Enabled by default. @@ -284,17 +285,15 @@ Run-time configuration Effective only when Enhanced MPS is supported. The default value is 256. -- ``tso`` parameter [int] - - A nonzero value enables hardware TSO. - When hardware TSO is enabled, packets marked with TCP segmentation - offload will be divided into segments by the hardware. Disabled by default. - - ``tx_vec_en`` parameter [int] A nonzero value enables Tx vector on ConnectX-5 only NIC if the number of global Tx queues on the port is lesser than MLX5_VPMD_MIN_TXQS. 
+ This option cannot be used with certain offloads such as ``DEV_TX_OFFLOAD_TCP_TSO, + DEV_TX_OFFLOAD_VXLAN_TNL_TSO, DEV_TX_OFFLOAD_GRE_TNL_TSO, DEV_TX_OFFLOAD_VLAN_INSERT``. + When those offloads are requested the MPS send function will not be used. + Enabled by default on ConnectX-5. - ``rx_vec_en`` parameter [int] diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index ca44a0a59..1c95f3520 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -85,9 +85,6 @@ /* Device parameter to limit the size of inlining packet. */ #define MLX5_TXQ_MAX_INLINE_LEN "txq_max_inline_len" -/* Device parameter to enable hardware TSO offload. */ -#define MLX5_TSO "tso" - /* Device parameter to enable hardware Tx vector. */ #define MLX5_TX_VEC_EN "tx_vec_en" @@ -406,8 +403,6 @@ mlx5_args_check(const char *key, const char *val, void *opaque) config->mpw_hdr_dseg = !!tmp; } else if (strcmp(MLX5_TXQ_MAX_INLINE_LEN, key) == 0) { config->inline_max_packet_sz = tmp; - } else if (strcmp(MLX5_TSO, key) == 0) { - config->tso = !!tmp; } else if (strcmp(MLX5_TX_VEC_EN, key) == 0) { config->tx_vec_en = !!tmp; } else if (strcmp(MLX5_RX_VEC_EN, key) == 0) { @@ -440,7 +435,6 @@ mlx5_args(struct mlx5_dev_config *config, struct rte_devargs *devargs) MLX5_TXQ_MPW_EN, MLX5_TXQ_MPW_HDR_DSEG_EN, MLX5_TXQ_MAX_INLINE_LEN, - MLX5_TSO, MLX5_TX_VEC_EN, MLX5_RX_VEC_EN, NULL, @@ -629,7 +623,6 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev) .cqe_comp = cqe_comp, .mps = mps, .tunnel_en = tunnel_en, - .tso = 0, .tx_vec_en = 1, .rx_vec_en = 1, .mpw_hdr_dseg = 0, @@ -793,10 +786,9 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev) priv_get_num_vfs(priv, &num_vfs); config.sriov = (num_vfs || sriov); - if (config.tso) - config.tso = ((device_attr_ex.tso_caps.max_tso > 0) && - (device_attr_ex.tso_caps.supported_qpts & - (1 << IBV_QPT_RAW_PACKET))); + config.tso = ((device_attr_ex.tso_caps.max_tso > 0) && + 
(device_attr_ex.tso_caps.supported_qpts & + (1 << IBV_QPT_RAW_PACKET))); if (config.tso) config.tso_max_payload_sz = device_attr_ex.tso_caps.max_tso; @@ -805,10 +797,6 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev) " (" MLX
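According to the updated mlx5.rst text, the MPS send function is only used when none of the listed Tx offloads (TCP TSO, VXLAN/GRE tunnel TSO, VLAN insertion) is requested. The sketch below is minimal and uses local stand-in bit values rather than the real DEV_TX_OFFLOAD_* constants. `mps_send_usable()` is a hypothetical helper, not a driver function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in bits; the real DEV_TX_OFFLOAD_* values differ. */
#define TXO_VLAN_INSERT   (UINT64_C(1) << 0)
#define TXO_TCP_TSO       (UINT64_C(1) << 1)
#define TXO_VXLAN_TNL_TSO (UINT64_C(1) << 2)
#define TXO_GRE_TNL_TSO   (UINT64_C(1) << 3)
#define TXO_IPV4_CKSUM    (UINT64_C(1) << 4)

/* Offloads the MPS send path cannot honor, per the documentation text. */
#define TXO_MPS_INCOMPATIBLE (TXO_VLAN_INSERT | TXO_TCP_TSO | \
			      TXO_VXLAN_TNL_TSO | TXO_GRE_TNL_TSO)

/* MPS is chosen only if enabled and no incompatible offload is set. */
static bool
mps_send_usable(bool mps_enabled, uint64_t tx_offloads)
{
	return mps_enabled && !(tx_offloads & TXO_MPS_INCOMPATIBLE);
}
```

Because TSO is now requested through the offloads API rather than the removed `tso` devarg, this choice can follow the application's configuration instead of a separate driver parameter.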
[dpdk-dev] [PATCH v2 2/7] net/mlx5: add device configuration structure
Move device configuration and features capabilities to its own structure. This structure is filled by mlx5_pci_probe(), outside of this function it should be treated as *read only*. This configuration struct will be used for the Tx/Rx queue setup to select the Tx/Rx queue parameters based on the user configuration and device capabilities. In addition it will be used by the burst selection function to decide on the best pkt burst to be used. Signed-off-by: Shahaf Shuler Signed-off-by: Nelio Laranjeiro --- drivers/net/mlx5/mlx5.c | 178 +++-- drivers/net/mlx5/mlx5.h | 53 ++ drivers/net/mlx5/mlx5_ethdev.c | 26 ++--- drivers/net/mlx5/mlx5_flow.c | 2 +- drivers/net/mlx5/mlx5_rxq.c | 21 ++-- drivers/net/mlx5/mlx5_rxtx_vec.c | 10 +- drivers/net/mlx5/mlx5_txq.c | 182 ++ drivers/net/mlx5/mlx5_vlan.c | 4 +- 8 files changed, 245 insertions(+), 231 deletions(-) diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index 0192815f2..fdd4710f1 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -94,9 +94,6 @@ /* Device parameter to enable hardware Rx vector. */ #define MLX5_RX_VEC_EN "rx_vec_en" -/* Default PMD specific parameter value. */ -#define MLX5_ARG_UNSET (-1) - #ifndef HAVE_IBV_MLX5_MOD_MPW #define MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED (1 << 2) #define MLX5DV_CONTEXT_FLAGS_ENHANCED_MPW (1 << 3) @@ -106,17 +103,6 @@ #define MLX5DV_CONTEXT_FLAGS_CQE_128B_COMP (1 << 4) #endif -struct mlx5_args { - int cqe_comp; - int txq_inline; - int txqs_inline; - int mps; - int mpw_hdr_dseg; - int inline_max_packet_sz; - int tso; - int tx_vec_en; - int rx_vec_en; -}; /** * Retrieve integer value from environment variable. 
* @@ -399,7 +385,7 @@ mlx5_dev_idx(struct rte_pci_addr *pci_addr) static int mlx5_args_check(const char *key, const char *val, void *opaque) { - struct mlx5_args *args = opaque; + struct mlx5_dev_config *config = opaque; unsigned long tmp; errno = 0; @@ -409,23 +395,23 @@ mlx5_args_check(const char *key, const char *val, void *opaque) return errno; } if (strcmp(MLX5_RXQ_CQE_COMP_EN, key) == 0) { - args->cqe_comp = !!tmp; + config->cqe_comp = !!tmp; } else if (strcmp(MLX5_TXQ_INLINE, key) == 0) { - args->txq_inline = tmp; + config->txq_inline = tmp; } else if (strcmp(MLX5_TXQS_MIN_INLINE, key) == 0) { - args->txqs_inline = tmp; + config->txqs_inline = tmp; } else if (strcmp(MLX5_TXQ_MPW_EN, key) == 0) { - args->mps = !!tmp; + config->mps = !!tmp ? config->mps : 0; } else if (strcmp(MLX5_TXQ_MPW_HDR_DSEG_EN, key) == 0) { - args->mpw_hdr_dseg = !!tmp; + config->mpw_hdr_dseg = !!tmp; } else if (strcmp(MLX5_TXQ_MAX_INLINE_LEN, key) == 0) { - args->inline_max_packet_sz = tmp; + config->inline_max_packet_sz = tmp; } else if (strcmp(MLX5_TSO, key) == 0) { - args->tso = !!tmp; + config->tso = !!tmp; } else if (strcmp(MLX5_TX_VEC_EN, key) == 0) { - args->tx_vec_en = !!tmp; + config->tx_vec_en = !!tmp; } else if (strcmp(MLX5_RX_VEC_EN, key) == 0) { - args->rx_vec_en = !!tmp; + config->rx_vec_en = !!tmp; } else { WARN("%s: unknown parameter", key); return -EINVAL; @@ -436,8 +422,8 @@ mlx5_args_check(const char *key, const char *val, void *opaque) /** * Parse device parameters. * - * @param priv - * Pointer to private structure. + * @param config + * Pointer to device configuration structure. * @param devargs * Device arguments structure. * @@ -445,7 +431,7 @@ mlx5_args_check(const char *key, const char *val, void *opaque) * 0 on success, errno value on failure. 
*/ static int -mlx5_args(struct mlx5_args *args, struct rte_devargs *devargs) +mlx5_args(struct mlx5_dev_config *config, struct rte_devargs *devargs) { const char **params = (const char *[]){ MLX5_RXQ_CQE_COMP_EN, @@ -473,7 +459,7 @@ mlx5_args(struct mlx5_args *args, struct rte_devargs *devargs) for (i = 0; (params[i] != NULL); ++i) { if (rte_kvargs_count(kvlist, params[i])) { ret = rte_kvargs_process(kvlist, params[i], -mlx5_args_check, args); +mlx5_args_check, config); if (ret != 0) { rte_kvargs_free(kvlist); return ret; @@ -487,38 +473,6 @@ mlx5_args(struct mlx5_args *args, struct rte_devargs *devargs) static struct rte_pci_driver mlx5_driver; /** - *
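The reworked mlx5_args_check() writes devargs directly into the configuration structure. `txq_mpw_en` can only turn MPS off, never enable a mode the device did not report (`config->mps = !!tmp ? config->mps : 0`). The sketch below is a minimal handler with the same (key, value, opaque) shape as an rte_kvargs callback. The struct and function names are hypothetical, only a few of the mlx5 keys are handled, and both errors return -EINVAL. The quoted driver code returns errno for a parse failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of the device configuration. */
struct dev_config {
	unsigned int cqe_comp:1;
	unsigned int mps:2;    /* mode detected at probe time */
	unsigned int tx_vec_en:1;
	int txq_inline;
};

static int
args_check(const char *key, const char *val, void *opaque)
{
	struct dev_config *config = opaque;
	unsigned long tmp;
	char *end;

	assert(key != NULL && val != NULL && opaque != NULL);
	errno = 0;
	tmp = strtoul(val, &end, 0);
	if (errno != 0 || end == val || *end != '\0')
		return -EINVAL; /* not a number */
	if (strcmp("rxq_cqe_comp_en", key) == 0) {
		config->cqe_comp = !!tmp;
	} else if (strcmp("txq_mpw_en", key) == 0) {
		/* Users may only disable MPS, not enable an absent mode. */
		config->mps = tmp ? config->mps : 0;
	} else if (strcmp("tx_vec_en", key) == 0) {
		config->tx_vec_en = !!tmp;
	} else if (strcmp("txq_inline", key) == 0) {
		config->txq_inline = (int)tmp;
	} else {
		return -EINVAL; /* unknown key */
	}
	return 0;
}
```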
[dpdk-dev] [PATCH v2 3/7] net/mlx5: rename counter set in configuration
From: Nelio Laranjeiro The counter set is the counter used for flows when the device supports it. Rename the configuration field to flow counter. Signed-off-by: Nelio Laranjeiro --- drivers/net/mlx5/mlx5.c | 3 +-- drivers/net/mlx5/mlx5.h | 2 +- drivers/net/mlx5/mlx5_flow.c | 2 +- 3 files changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index fdd4710f1..ca44a0a59 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -759,8 +759,7 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev) (config.hw_csum_l2tun ? "" : "not ")); #ifdef HAVE_IBV_DEVICE_COUNTERS_SET_SUPPORT - config.counter_set_supported = - !!(device_attr.max_counter_sets); + config.flow_counter_en = !!(device_attr.max_counter_sets); ibv_describe_counter_set(ctx, 0, &cs_desc); DEBUG("counter type = %d, num of cs = %ld, attributes = %d", cs_desc.counter_type, cs_desc.num_of_cs, diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index 04f0b2557..171b3a933 100644 --- a/drivers/net/mlx5/mlx5.h +++ b/drivers/net/mlx5/mlx5.h @@ -110,7 +110,7 @@ struct mlx5_dev_config { unsigned int sriov:1; /* This is a VF or PF with VF devices. */ unsigned int mps:2; /* Multi-packet send supported mode. */ unsigned int tunnel_en:1; /* Whether tunnel is supported. */ - unsigned int counter_set_supported:1; /* Counter set is supported. */ + unsigned int flow_counter_en:1; /* Whether flow counter is supported. */ unsigned int cqe_comp:1; /* CQE compression is enabled. */ unsigned int tso:1; /* Whether TSO is enabled. */ unsigned int tx_vec_en:1; /* Tx vector is enabled.
*/ diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c index 8ad07b839..334a4f4ba 100644 --- a/drivers/net/mlx5/mlx5_flow.c +++ b/drivers/net/mlx5/mlx5_flow.c @@ -771,7 +771,7 @@ priv_flow_convert_actions(struct priv *priv, } else if (actions->type == RTE_FLOW_ACTION_TYPE_FLAG) { parser->mark = 1; } else if (actions->type == RTE_FLOW_ACTION_TYPE_COUNT && - priv->config.counter_set_supported) { + priv->config.flow_counter_en) { parser->count = 1; } else { goto exit_action_not_supported; -- 2.12.0
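After the rename, the COUNT action is accepted only when `flow_counter_en` is set, and any other unhandled action makes the conversion fail. The sketch below is minimal. The enum and struct names are hypothetical stand-ins for rte_flow actions and the mlx5 parser, and only the gating logic mirrors the quoted priv_flow_convert_actions() hunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for rte_flow action types. */
enum action_type { ACTION_END, ACTION_MARK, ACTION_FLAG, ACTION_COUNT };

struct flow_cfg {
	unsigned int flow_counter_en:1; /* device supports flow counters */
};

/*
 * Walks an END-terminated action list. Returns 0 if every action is
 * supported (setting *count when COUNT was requested), -1 otherwise.
 */
static int
convert_actions(const struct flow_cfg *cfg, const enum action_type *actions,
		int *count)
{
	assert(cfg != NULL && actions != NULL && count != NULL);
	*count = 0;
	for (; *actions != ACTION_END; ++actions) {
		if (*actions == ACTION_MARK || *actions == ACTION_FLAG)
			continue;
		else if (*actions == ACTION_COUNT && cfg->flow_counter_en)
			*count = 1;
		else
			return -1; /* e.g. COUNT without counter support */
	}
	return 0;
}
```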
[dpdk-dev] [PATCH v2 5/7] net/mlx5: convert to new Rx offloads API
Ethdev Rx offloads API has changed since: commit ce17eddefc20 ("ethdev: introduce Rx queue offloads API") This commit support the new Rx offloads API. Signed-off-by: Shahaf Shuler Acked-by: Nelio Laranjeiro --- drivers/net/mlx5/mlx5_ethdev.c | 23 +--- drivers/net/mlx5/mlx5_rxq.c| 106 +++- drivers/net/mlx5/mlx5_rxtx.h | 3 + drivers/net/mlx5/mlx5_vlan.c | 3 +- 4 files changed, 111 insertions(+), 24 deletions(-) diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c index 8be4f43f7..adaa34fff 100644 --- a/drivers/net/mlx5/mlx5_ethdev.c +++ b/drivers/net/mlx5/mlx5_ethdev.c @@ -553,6 +553,10 @@ dev_configure(struct rte_eth_dev *dev) !!dev->data->dev_conf.rx_adv_conf.rss_conf.rss_key; uint64_t supp_tx_offloads = mlx5_priv_get_tx_port_offloads(priv); uint64_t tx_offloads = dev->data->dev_conf.txmode.offloads; + uint64_t supp_rx_offloads = + (mlx5_priv_get_rx_port_offloads(priv) | +mlx5_priv_get_rx_queue_offloads(priv)); + uint64_t rx_offloads = dev->data->dev_conf.rxmode.offloads; if ((tx_offloads & supp_tx_offloads) != tx_offloads) { ERROR("Some Tx offloads are not supported " @@ -560,6 +564,12 @@ dev_configure(struct rte_eth_dev *dev) tx_offloads, supp_tx_offloads); return ENOTSUP; } + if ((rx_offloads & supp_rx_offloads) != rx_offloads) { + ERROR("Some Rx offloads are not supported " + "requested 0x%lx supported 0x%lx\n", + rx_offloads, supp_rx_offloads); + return ENOTSUP; + } if (use_app_rss_key && (dev->data->dev_conf.rx_adv_conf.rss_conf.rss_key_len != rss_hash_default_key_len)) { @@ -671,15 +681,10 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info) info->max_rx_queues = max; info->max_tx_queues = max; info->max_mac_addrs = RTE_DIM(priv->mac); - info->rx_offload_capa = - (config->hw_csum ? -(DEV_RX_OFFLOAD_IPV4_CKSUM | - DEV_RX_OFFLOAD_UDP_CKSUM | - DEV_RX_OFFLOAD_TCP_CKSUM) : -0) | - (priv->config.hw_vlan_strip ? 
DEV_RX_OFFLOAD_VLAN_STRIP : 0) | - DEV_RX_OFFLOAD_TIMESTAMP; - + info->rx_queue_offload_capa = + mlx5_priv_get_rx_queue_offloads(priv); + info->rx_offload_capa = (mlx5_priv_get_rx_port_offloads(priv) | +info->rx_queue_offload_capa); info->tx_offload_capa = mlx5_priv_get_tx_port_offloads(priv); if (priv_get_ifname(priv, &ifname) == 0) info->if_index = if_nametoindex(ifname); diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c index 81363ecd7..232e660ce 100644 --- a/drivers/net/mlx5/mlx5_rxq.c +++ b/drivers/net/mlx5/mlx5_rxq.c @@ -213,6 +213,78 @@ mlx5_rxq_cleanup(struct mlx5_rxq_ctrl *rxq_ctrl) } /** + * Returns the per-queue supported offloads. + * + * @param priv + * Pointer to private structure. + * + * @return + * Supported Rx offloads. + */ +uint64_t +mlx5_priv_get_rx_queue_offloads(struct priv *priv) +{ + struct mlx5_dev_config *config = &priv->config; + uint64_t offloads = (DEV_RX_OFFLOAD_SCATTER | +DEV_RX_OFFLOAD_TIMESTAMP | +DEV_RX_OFFLOAD_JUMBO_FRAME); + + if (config->hw_fcs_strip) + offloads |= DEV_RX_OFFLOAD_CRC_STRIP; + if (config->hw_csum) + offloads |= (DEV_RX_OFFLOAD_IPV4_CKSUM | +DEV_RX_OFFLOAD_UDP_CKSUM | +DEV_RX_OFFLOAD_TCP_CKSUM); + if (config->hw_vlan_strip) + offloads |= DEV_RX_OFFLOAD_VLAN_STRIP; + return offloads; +} + + +/** + * Returns the per-port supported offloads. + * + * @param priv + * Pointer to private structure. + * @return + * Supported Rx offloads. + */ +uint64_t +mlx5_priv_get_rx_port_offloads(struct priv *priv __rte_unused) +{ + uint64_t offloads = DEV_RX_OFFLOAD_VLAN_FILTER; + + return offloads; +} + +/** + * Checks if the per-queue offload configuration is valid. + * + * @param priv + * Pointer to private structure. + * @param offloads + * Per-queue offloads configuration. + * + * @return + * 1 if the configuration is valid, 0 otherwise. 
+ */ +static int +priv_is_rx_queue_offloads_allowed(struct priv *priv, uint64_t offloads) +{ + uint64_t port_offloads = priv->dev->data->dev_conf.rxmode.offloads; + uint64_t queue_supp_offloads = + mlx5_priv_get_rx_queue_offloads(priv); + uint64_t port_supp_offloads = mlx5_priv_get_rx_port_offloads(priv); + + if ((offloads & (queue_supp_offloads | port_supp_offloads)) != + offloads) + return 0; + if (((port_offloads ^ offloads) & port_supp_
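The Rx checks in this patch work at two levels. dev_configure() rejects any requested bit outside the union of port and queue capabilities. Queue setup additionally requires port-only offloads to match the port configuration. The second condition is cut off in the archived patch. The sketch below assumes it has the same form as the complete mlx4 version in this series, `((port_offloads ^ offloads) & port_supp_offloads)`. The bit values are local stand-ins, not DEV_RX_OFFLOAD_* values, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in bits, not the DPDK DEV_RX_OFFLOAD_* values. */
#define RXO_VLAN_STRIP  (UINT64_C(1) << 0)
#define RXO_IPV4_CKSUM  (UINT64_C(1) << 1)
#define RXO_VLAN_FILTER (UINT64_C(1) << 2)
#define RXO_SCATTER     (UINT64_C(1) << 3)

/* Every requested bit must be supported at port or queue level. */
static bool
rx_offloads_supported(uint64_t requested, uint64_t port_supp,
		      uint64_t queue_supp)
{
	return (requested & (port_supp | queue_supp)) == requested;
}

/*
 * Queue-level check: bits must be supported, and port-only offloads
 * must be set identically on the queue and on the port.
 */
static bool
rx_queue_conf_ok(uint64_t queue_req, uint64_t port_req,
		 uint64_t port_supp, uint64_t queue_supp)
{
	if (!rx_offloads_supported(queue_req, port_supp, queue_supp))
		return false;
	return ((port_req ^ queue_req) & port_supp) == 0;
}
```

In effect, a queue may add per-queue offloads on top of the port configuration but may not drop or add a port-only one such as VLAN filtering.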
[dpdk-dev] [PATCH v2 7/7] net/mlx4: convert to new Rx offloads API
Ethdev Rx offloads API has changed since: commit ce17eddefc20 ("ethdev: introduce Rx queue offloads API") This commit support the new Rx offloads API. Signed-off-by: Shahaf Shuler --- drivers/net/mlx4/mlx4_ethdev.c | 10 ++--- drivers/net/mlx4/mlx4_flow.c | 5 ++- drivers/net/mlx4/mlx4_rxq.c| 78 ++--- drivers/net/mlx4/mlx4_rxtx.h | 2 + 4 files changed, 82 insertions(+), 13 deletions(-) diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c index 63e00b1da..39a23ee7b 100644 --- a/drivers/net/mlx4/mlx4_ethdev.c +++ b/drivers/net/mlx4/mlx4_ethdev.c @@ -766,13 +766,11 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info) info->max_rx_queues = max; info->max_tx_queues = max; info->max_mac_addrs = RTE_DIM(priv->mac); - info->rx_offload_capa = 0; info->tx_offload_capa = mlx4_priv_get_tx_port_offloads(priv); - if (priv->hw_csum) { - info->rx_offload_capa |= (DEV_RX_OFFLOAD_IPV4_CKSUM | - DEV_RX_OFFLOAD_UDP_CKSUM | - DEV_RX_OFFLOAD_TCP_CKSUM); - } + info->rx_queue_offload_capa = + mlx4_priv_get_rx_queue_offloads(priv); + info->rx_offload_capa = (mlx4_priv_get_rx_port_offloads(priv) | +info->rx_queue_offload_capa); if (mlx4_get_ifname(priv, &ifname) == 0) info->if_index = if_nametoindex(ifname); info->hash_key_size = MLX4_RSS_HASH_KEY_SIZE; diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c index 69025da42..96a6a6fa7 100644 --- a/drivers/net/mlx4/mlx4_flow.c +++ b/drivers/net/mlx4/mlx4_flow.c @@ -1232,7 +1232,7 @@ mlx4_flow_internal_next_vlan(struct priv *priv, uint16_t vlan) * - MAC flow rules are generated from @p dev->data->mac_addrs * (@p priv->mac array). * - An additional flow rule for Ethernet broadcasts is also generated. - * - All these are per-VLAN if @p dev->data->dev_conf.rxmode.hw_vlan_filter + * - All these are per-VLAN if @p DEV_RX_OFFLOAD_VLAN_FILTER * is enabled and VLAN filters are configured. 
* * @param priv @@ -1300,7 +1300,8 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error) }; struct ether_addr *rule_mac = ð_spec.dst; rte_be16_t *rule_vlan = - priv->dev->data->dev_conf.rxmode.hw_vlan_filter && + (priv->dev->data->dev_conf.rxmode.offloads & +DEV_RX_OFFLOAD_VLAN_FILTER) && !priv->dev->data->promiscuous ? &vlan_spec.tci : NULL; diff --git a/drivers/net/mlx4/mlx4_rxq.c b/drivers/net/mlx4/mlx4_rxq.c index 53313c56f..0cad28269 100644 --- a/drivers/net/mlx4/mlx4_rxq.c +++ b/drivers/net/mlx4/mlx4_rxq.c @@ -663,6 +663,64 @@ mlx4_rxq_detach(struct rxq *rxq) } /** + * Returns the per-queue supported offloads. + * + * @param priv + * Pointer to private structure. + * + * @return + * Supported Tx offloads. + */ +uint64_t +mlx4_priv_get_rx_queue_offloads(struct priv *priv) +{ + uint64_t offloads = DEV_RX_OFFLOAD_SCATTER; + + if (priv->hw_csum) + offloads |= DEV_RX_OFFLOAD_CHECKSUM; + return offloads; +} + +/** + * Returns the per-port supported offloads. + * + * @param priv + * Pointer to private strucute. + * + * @return + * Supported Rx offloads. + */ +uint64_t +mlx4_priv_get_rx_port_offloads(struct priv *priv __rte_unused) +{ + uint64_t offloads = DEV_RX_OFFLOAD_VLAN_FILTER; + + return offloads; +} + +/** + * Checks if the per-queue offload configuration is valid. + * + * @param priv + * Pointer to private structure. + * @param offloads + * Per-queue offloads configuration. + * + * @return + * 1 if the configuration is valid, 0 otherwise. + */ +static int +priv_is_rx_queue_offloads_allowed(struct priv *priv, uint64_t offloads) +{ + uint64_t port_offloads = priv->dev->data->dev_conf.rxmode.offloads; + uint64_t port_supp_offloads = mlx4_priv_get_rx_port_offloads(priv); + + if (((port_offloads ^ offloads) & port_supp_offloads)) + return 0; + return 1; +} + +/** * DPDK callback to configure a Rx queue. 
* * @param dev @@ -707,6 +765,16 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc, (void)conf; /* Thresholds configuration (ignored). */ DEBUG("%p: configuring queue %u for %u descriptors", (void *)dev, idx, desc); + if (!priv_is_rx_queue_offloads_allowed(priv, conf->offloads)) { + rte_errno = ENOTSUP; + ERROR("%p: Rx queue offloads 0x%lx don't match port " + "offloads 0x%lx or supported offloads 0x%lx", + (void *)dev, conf->offloads, + dev->data->dev_conf.rxmode.offloads, + (mlx4_priv_get_rx_port_offloads(priv) | +
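In the mlx4 patch, mlx4_dev_infos_get() reports per-queue Rx capabilities separately and advertises their union with the port-only ones as rx_offload_capa. The sketch below is minimal and uses local stand-in bits. `RXO_CHECKSUM` plays the role of DEV_RX_OFFLOAD_CHECKSUM, which groups the IPv4, UDP and TCP checksum offloads. The struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in bits, not the DPDK DEV_RX_OFFLOAD_* values. */
#define RXO_IPV4_CKSUM  (UINT64_C(1) << 0)
#define RXO_UDP_CKSUM   (UINT64_C(1) << 1)
#define RXO_TCP_CKSUM   (UINT64_C(1) << 2)
#define RXO_SCATTER     (UINT64_C(1) << 3)
#define RXO_VLAN_FILTER (UINT64_C(1) << 4)
#define RXO_CHECKSUM    (RXO_IPV4_CKSUM | RXO_UDP_CKSUM | RXO_TCP_CKSUM)

struct rx_caps {
	uint64_t rx_offload_capa;       /* everything the port can do */
	uint64_t rx_queue_offload_capa; /* subset settable per queue */
};

static uint64_t
rx_queue_offloads(bool hw_csum)
{
	uint64_t offloads = RXO_SCATTER;

	if (hw_csum)
		offloads |= RXO_CHECKSUM;
	return offloads;
}

static uint64_t
rx_port_offloads(void)
{
	return RXO_VLAN_FILTER;
}

static struct rx_caps
fill_rx_caps(bool hw_csum)
{
	struct rx_caps caps;

	caps.rx_queue_offload_capa = rx_queue_offloads(hw_csum);
	/* Per-queue offloads can also be enabled for the whole port. */
	caps.rx_offload_capa = rx_port_offloads() | caps.rx_queue_offload_capa;
	return caps;
}
```

As quoted, the mlx4 priv_is_rx_queue_offloads_allowed() only checks that port-level bits match between queue and port. Unlike the mlx5 version, it does not itself reject unsupported per-queue bits.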
[dpdk-dev] [PATCH v2 6/7] net/mlx4: convert to new Tx offloads API
Ethdev Tx offloads API has changed since: commit cba7f53b717d ("ethdev: introduce Tx queue offloads API") This commit support the new Tx offloads API. Signed-off-by: Shahaf Shuler --- drivers/net/mlx4/mlx4_ethdev.c | 7 +--- drivers/net/mlx4/mlx4_rxtx.h | 1 + drivers/net/mlx4/mlx4_txq.c| 71 +++-- 3 files changed, 70 insertions(+), 9 deletions(-) diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c index 2f69e7d4f..63e00b1da 100644 --- a/drivers/net/mlx4/mlx4_ethdev.c +++ b/drivers/net/mlx4/mlx4_ethdev.c @@ -767,17 +767,12 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info) info->max_tx_queues = max; info->max_mac_addrs = RTE_DIM(priv->mac); info->rx_offload_capa = 0; - info->tx_offload_capa = 0; + info->tx_offload_capa = mlx4_priv_get_tx_port_offloads(priv); if (priv->hw_csum) { - info->tx_offload_capa |= (DEV_TX_OFFLOAD_IPV4_CKSUM | - DEV_TX_OFFLOAD_UDP_CKSUM | - DEV_TX_OFFLOAD_TCP_CKSUM); info->rx_offload_capa |= (DEV_RX_OFFLOAD_IPV4_CKSUM | DEV_RX_OFFLOAD_UDP_CKSUM | DEV_RX_OFFLOAD_TCP_CKSUM); } - if (priv->hw_csum_l2tun) - info->tx_offload_capa |= DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM; if (mlx4_get_ifname(priv, &ifname) == 0) info->if_index = if_nametoindex(ifname); info->hash_key_size = MLX4_RSS_HASH_KEY_SIZE; diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h index b93e2bcda..91971c4fb 100644 --- a/drivers/net/mlx4/mlx4_rxtx.h +++ b/drivers/net/mlx4/mlx4_rxtx.h @@ -184,6 +184,7 @@ int mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc, unsigned int socket, const struct rte_eth_txconf *conf); void mlx4_tx_queue_release(void *dpdk_txq); +uint64_t mlx4_priv_get_tx_port_offloads(struct priv *priv); /** * Get memory region (MR) <-> memory pool (MP) association from txq->mp2mr[]. 
diff --git a/drivers/net/mlx4/mlx4_txq.c b/drivers/net/mlx4/mlx4_txq.c index d651e4980..f74e4a735 100644 --- a/drivers/net/mlx4/mlx4_txq.c +++ b/drivers/net/mlx4/mlx4_txq.c @@ -182,6 +182,53 @@ mlx4_txq_fill_dv_obj_info(struct txq *txq, struct mlx4dv_obj *mlxdv) } /** + * Returns the per-port supported offloads. + * + * @param priv + * Pointer to private structure. + * + * @return + * Supported Tx offloads. + */ +uint64_t +mlx4_priv_get_tx_port_offloads(struct priv *priv) +{ + uint64_t offloads = DEV_TX_OFFLOAD_MULTI_SEGS; + + if (priv->hw_csum) { + offloads |= (DEV_TX_OFFLOAD_IPV4_CKSUM | +DEV_TX_OFFLOAD_UDP_CKSUM | +DEV_TX_OFFLOAD_TCP_CKSUM); + } + if (priv->hw_csum_l2tun) + offloads |= DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM; + + return offloads; +} + +/** + * Checks if the per-queue offload configuration is valid. + * + * @param priv + * Pointer to private structure. + * @param offloads + * Per-queue offloads configuration. + * + * @return + * 1 if the configuration is valid, 0 otherwise. + */ +static int +priv_is_tx_queue_offloads_allowed(struct priv *priv, uint64_t offloads) +{ + uint64_t port_offloads = priv->dev->data->dev_conf.txmode.offloads; + uint64_t port_supp_offloads = mlx4_priv_get_tx_port_offloads(priv); + + if ((port_offloads ^ offloads) & port_supp_offloads) + return 0; + return 1; +} + +/** * DPDK callback to configure a Tx queue. * * @param dev @@ -229,9 +276,22 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc, }; int ret; - (void)conf; /* Thresholds configuration (ignored). */ DEBUG("%p: configuring queue %u for %u descriptors", (void *)dev, idx, desc); + /* +* Don't verify port offloads for application which +* use the old API. 
+*/ + if (!!(conf->txq_flags & ETH_TXQ_FLAGS_IGNORE) && + !priv_is_tx_queue_offloads_allowed(priv, conf->offloads)) { + rte_errno = ENOTSUP; + ERROR("%p: Tx queue offloads 0x%lx don't match port " + "offloads 0x%lx or supported offloads 0x%lx", + (void *)dev, conf->offloads, + dev->data->dev_conf.txmode.offloads, + mlx4_priv_get_tx_port_offloads(priv)); + return -rte_errno; + } if (idx >= dev->data->nb_tx_queues) { rte_errno = EOVERFLOW; ERROR("%p: queue index out of range (%u >= %u)", @@ -281,8 +341,13 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc, RTE_MIN(MLX4_PMD_TX_PER_COMP_REQ, desc / 4),
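The two helpers in this patch (capabilities derived from hardware flags, then a per-queue check against the port configuration) can be modelled in a few lines of standalone C. This is a minimal sketch, not the driver code: the `TX_OFFLOAD_*` bit values and the function names are hypothetical stand-ins for the `DEV_TX_OFFLOAD_*` flags, `mlx4_priv_get_tx_port_offloads()` and `priv_is_tx_queue_offloads_allowed()`. It mirrors the patch's `(port_offloads ^ offloads) & port_supp_offloads` test; in the driver that test only runs when the application sets `ETH_TXQ_FLAGS_IGNORE`, i.e. uses the new API.

```c
#include <stdint.h>

/* Hypothetical stand-ins for DEV_TX_OFFLOAD_* bits; values are illustrative only. */
#define TX_OFFLOAD_IPV4_CKSUM       (1ULL << 1)
#define TX_OFFLOAD_UDP_CKSUM        (1ULL << 2)
#define TX_OFFLOAD_TCP_CKSUM        (1ULL << 3)
#define TX_OFFLOAD_OUTER_IPV4_CKSUM (1ULL << 7)
#define TX_OFFLOAD_MULTI_SEGS       (1ULL << 15)

/* Port-level Tx capabilities derived from hardware flags. */
static uint64_t tx_port_offloads(int hw_csum, int hw_csum_l2tun)
{
	uint64_t offloads = TX_OFFLOAD_MULTI_SEGS;

	if (hw_csum)
		offloads |= TX_OFFLOAD_IPV4_CKSUM | TX_OFFLOAD_UDP_CKSUM |
			    TX_OFFLOAD_TCP_CKSUM;
	if (hw_csum_l2tun)
		offloads |= TX_OFFLOAD_OUTER_IPV4_CKSUM;
	return offloads;
}

/*
 * Returns 1 if the queue configuration agrees with the port configuration
 * on every supported offload, 0 otherwise. XOR exposes the bits that
 * differ; the mask keeps only the bits that are supported (port-level).
 */
static int tx_queue_offloads_allowed(uint64_t port_conf, uint64_t queue_conf,
				     uint64_t supported)
{
	return ((port_conf ^ queue_conf) & supported) == 0;
}
```

As written, the check compares supported bits only: a queue that requests an offload outside the supported mask (for example outer IPv4 checksum when `hw_csum_l2tun` is 0) is not rejected by this expression.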
[dpdk-dev] [PATCH v3] net/i40e: fix port segmentation fault when restart
Currently, all queue region related configuration is cleared when the device is stopped, even if no queue region has been configured, which may cause an error. Add a check that a queue region configuration exists before flushing it, and remove the flush from the device stop path, so that queue regions are cleared only at device initialization or when the PMD receives a flush command. Fixes: 7cbecc2f742 ("net/i40e: support queue region set and flush") Signed-off-by: Wei Zhao --- v2: -fix patch check warning. v3: -add more log messages. --- drivers/net/i40e/i40e_ethdev.c | 3 --- drivers/net/i40e/rte_pmd_i40e.c | 27 ++- 2 files changed, 14 insertions(+), 16 deletions(-) diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index 811cc9f..7a1290b 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -2154,9 +2154,6 @@ i40e_dev_stop(struct rte_eth_dev *dev) /* reset hierarchy commit */ pf->tm_conf.committed = false; - /* Remove all the queue region configuration */ - i40e_flush_queue_region_all_conf(dev, hw, pf, 0); - hw->adapter_stopped = 1; } diff --git a/drivers/net/i40e/rte_pmd_i40e.c b/drivers/net/i40e/rte_pmd_i40e.c index aeb92af..c2e2466 100644 --- a/drivers/net/i40e/rte_pmd_i40e.c +++ b/drivers/net/i40e/rte_pmd_i40e.c @@ -2845,22 +2845,23 @@ i40e_flush_queue_region_all_conf(struct rte_eth_dev *dev, return 0; } - info->queue_region_number = 1; - info->region[0].queue_num = main_vsi->nb_used_qps; - info->region[0].queue_start_index = 0; + if (info->queue_region_number) { + info->queue_region_number = 1; + info->region[0].queue_num = main_vsi->nb_used_qps; + info->region[0].queue_start_index = 0; - ret = i40e_vsi_update_queue_region_mapping(hw, pf); - if (ret != I40E_SUCCESS) - PMD_DRV_LOG(INFO, "Failed to flush queue region mapping."); - - ret = i40e_dcb_init_configure(dev, TRUE); - if (ret != I40E_SUCCESS) { - PMD_DRV_LOG(INFO, "Failed to flush dcb."); - pf->flags &= ~I40E_FLAG_DCB; - } + ret = i40e_vsi_update_queue_region_mapping(hw, pf); + if (ret != 
I40E_SUCCESS) + PMD_DRV_LOG(INFO, "Failed to flush queue region mapping."); - i40e_init_queue_region_conf(dev); + ret = i40e_dcb_init_configure(dev, TRUE); + if (ret != I40E_SUCCESS) { + PMD_DRV_LOG(INFO, "Failed to flush dcb."); + pf->flags &= ~I40E_FLAG_DCB; + } + i40e_init_queue_region_conf(dev); + } return 0; } -- 2.9.3
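The guard added to `i40e_flush_queue_region_all_conf()` can be illustrated with a small standalone model. This is a sketch under stated assumptions: the structures are a hypothetical, trimmed-down version of the PF queue region info, the counter stands in for the mapping update and DCB re-initialization, and `init_queue_region_conf()` is a hypothetical stand-in for `i40e_init_queue_region_conf()` that is *assumed* to reset the software state to "no region configured".

```c
#include <stdint.h>
#include <string.h>

#define MAX_REGIONS 8 /* hypothetical capacity */

/* Hypothetical, trimmed-down model of the PF's queue region info. */
struct region {
	uint16_t queue_start_index;
	uint16_t queue_num;
};

struct queue_region_info {
	uint16_t queue_region_number; /* 0: no region configured */
	struct region region[MAX_REGIONS];
};

/* Counts how often the hardware reprogramming path runs. */
static int hw_reprogram_count;

/* Assumed behaviour of i40e_init_queue_region_conf(): reset software state. */
static void init_queue_region_conf(struct queue_region_info *info)
{
	memset(info, 0, sizeof(*info));
}

static int flush_queue_region_all_conf(struct queue_region_info *info,
				       uint16_t nb_used_qps)
{
	/* The fix: touch nothing unless a queue region was configured. */
	if (info->queue_region_number) {
		/* Fall back to a single region spanning all used queue pairs. */
		info->queue_region_number = 1;
		info->region[0].queue_num = nb_used_qps;
		info->region[0].queue_start_index = 0;
		/* Stands in for the mapping update and DCB re-initialization. */
		hw_reprogram_count++;
		init_queue_region_conf(info);
	}
	return 0;
}
```

With the guard in place, flushing an unconfigured port is a no-op, which is what makes it safe to drop the unconditional flush from `i40e_dev_stop()`.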
Re: [dpdk-dev] [PATCH 2/2] mlx5: don't depend on kernel version
Hi Stephen, Please see a few comments below, On Tue, Jan 02, 2018 at 12:53:10PM -0800, Stephen Hemminger wrote: > This driver uses ethtool to get link status. The ethtool API has new > and old deprecated API. Rather than checking kernel version, use the > same algorithm that the ethtool command does; check the new API first > and if that fails, try the old one. > > Also, use common code for getting link state up/down and comparing > for changes. > > Signed-off-by: Stephen Hemminger > --- > drivers/net/mlx5/mlx5_ethdev.c | 110 > ++--- > 1 file changed, 47 insertions(+), 63 deletions(-) > > diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c > index 388507f109f7..2dc32cdf58b9 100644 > --- a/drivers/net/mlx5/mlx5_ethdev.c > +++ b/drivers/net/mlx5/mlx5_ethdev.c > @@ -49,7 +49,6 @@ > #include > #include > #include > -#include > #include > #include > #include > @@ -757,36 +756,25 @@ mlx5_dev_supported_ptypes_get(struct rte_eth_dev *dev) > * Pointer to Ethernet device structure. > */ > static int > -mlx5_link_update_unlocked_gset(struct rte_eth_dev *dev) > +mlx5_link_update_unlocked_gset(struct priv *priv, struct ifreq *ifr, > +struct rte_eth_link *dev_link) The function documentation should also describe the new parameters. > { > - struct priv *priv = mlx5_get_priv(dev); > struct ethtool_cmd edata = { > .cmd = ETHTOOL_GSET /* Deprecated since Linux v4.5. */ > }; > - struct ifreq ifr; > - struct rte_eth_link dev_link; > int link_speed = 0; > > - /* priv_lock() is not taken to allow concurrent calls.
*/ > - > - if (priv_ifreq(priv, SIOCGIFFLAGS, &ifr)) { > - WARN("ioctl(SIOCGIFFLAGS) failed: %s", strerror(errno)); > - return -1; > - } > - memset(&dev_link, 0, sizeof(dev_link)); > - dev_link.link_status = ((ifr.ifr_flags & IFF_UP) && > - (ifr.ifr_flags & IFF_RUNNING)); > - ifr.ifr_data = (void *)&edata; > - if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) { > + ifr->ifr_data = (void *)&edata; > + if (priv_ifreq(priv, SIOCETHTOOL, ifr)) { > WARN("ioctl(SIOCETHTOOL, ETHTOOL_GSET) failed: %s", >strerror(errno)); > return -1; > } > link_speed = ethtool_cmd_speed(&edata); > if (link_speed == -1) > - dev_link.link_speed = 0; > + dev_link->link_speed = 0; > else > - dev_link.link_speed = link_speed; > + dev_link->link_speed = link_speed; > priv->link_speed_capa = 0; > if (edata.supported & SUPPORTED_Autoneg) > priv->link_speed_capa |= ETH_LINK_SPEED_AUTONEG; > @@ -800,17 +788,9 @@ mlx5_link_update_unlocked_gset(struct rte_eth_dev *dev) > SUPPORTED_4baseSR4_Full | > SUPPORTED_4baseLR4_Full)) > priv->link_speed_capa |= ETH_LINK_SPEED_40G; > - dev_link.link_duplex = ((edata.duplex == DUPLEX_HALF) ? > + dev_link->link_duplex = ((edata.duplex == DUPLEX_HALF) ? > ETH_LINK_HALF_DUPLEX : ETH_LINK_FULL_DUPLEX); > - dev_link.link_autoneg = !(dev->data->dev_conf.link_speeds & > - ETH_LINK_SPEED_FIXED); > - if (memcmp(&dev_link, &dev->data->dev_link, sizeof(dev_link))) { > - /* Link status changed. */ > - dev->data->dev_link = dev_link; > - return 0; > - } > - /* Link status is still the same. */ > - return -1; > + return 0; > } > > /** > @@ -820,23 +800,14 @@ mlx5_link_update_unlocked_gset(struct rte_eth_dev *dev) > * Pointer to Ethernet device structure. > */ > static int > -mlx5_link_update_unlocked_gs(struct rte_eth_dev *dev) > +mlx5_link_update_unlocked_gs(struct priv *priv, struct ifreq *ifr, > + struct rte_eth_link *dev_link) Same here. 
> { > - struct priv *priv = mlx5_get_priv(dev); > struct ethtool_link_settings gcmd = { .cmd = ETHTOOL_GLINKSETTINGS }; > - struct ifreq ifr; > - struct rte_eth_link dev_link; > uint64_t sc; > > - if (priv_ifreq(priv, SIOCGIFFLAGS, &ifr)) { > - WARN("ioctl(SIOCGIFFLAGS) failed: %s", strerror(errno)); > - return -1; > - } > - memset(&dev_link, 0, sizeof(dev_link)); > - dev_link.link_status = ((ifr.ifr_flags & IFF_UP) && > - (ifr.ifr_flags & IFF_RUNNING)); > - ifr.ifr_data = (void *)&gcmd; > - if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) { > + ifr->ifr_data = (void *)&gcmd; > + if (priv_ifreq(priv, SIOCETHTOOL, ifr)) { > DEBUG("ioctl(SIOCETHTOOL, ETHTOOL_GLINKSETTINGS) failed: %s", > strerror(errno)); > return -1; > @@ -849,13 +820,13 @@ mlx5_link_update_unlocked_gs(struct rte_eth_dev *dev) > struct ethtool_link_settings *ecmd
Re: [dpdk-dev] [PATCH 1/2] mlx5: don't pass unused argument to sub-functions
Hi Stephen, On Tue, Jan 02, 2018 at 12:53:09PM -0800, Stephen Hemminger wrote: > Since wait_to_complete is unused, don't pass it to helper functions. > Use the standard RTE macro to indicate this is an unused parameter. I would suggest using a (void) cast, as is done throughout the driver; a separate patch should convert all the sources to the __rte_unused macro at once. Thanks, -- Nélio Laranjeiro 6WIND
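The two conventions discussed above can be compared side by side. The helper function names below are hypothetical. The `__rte_unused` definition mirrors DPDK's `rte_common.h`, where it expands to the GCC/Clang `__attribute__((__unused__))` extension; it is redefined here only so that the sketch compiles on its own.

```c
/* Mirrors DPDK's rte_common.h; defined here only for a standalone build. */
#ifndef __rte_unused
#define __rte_unused __attribute__((__unused__))
#endif

/* Convention currently used throughout the mlx5 driver: cast to void. */
static int link_update_void_cast(int wait_to_complete)
{
	(void)wait_to_complete;
	return 0;
}

/* Convention proposed in the patch: annotate the parameter itself. */
static int link_update_attr(__rte_unused int wait_to_complete)
{
	return 0;
}
```

Both silence `-Wunused-parameter`. Mixing the two styles in one driver is what the review asks to avoid until a tree-wide conversion is done.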
Re: [dpdk-dev] [RFC] mlx5: update NIC documentation on RDMA core version
Hi Stephen, It seems you missed an important point: MLNX_OFED is still supported by this driver, which allows it to work on stable releases such as stock Debian 9, i.e. without updating the Linux kernel. On Tue, Jan 02, 2018 at 01:44:21PM -0800, Stephen Hemminger wrote: > The current driver requires v16. It will not work or build with > the older version (as in Debian stable). Note: libmlx5 is rolled > into rdma-core in current versions. > > Mlx4 probably requires similar documentation update. > > Signed-off-by: Stephen Hemminger > --- > doc/guides/nics/mlx5.rst | 18 +- > 1 file changed, 9 insertions(+), 9 deletions(-) > > diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst > index f9558da89b61..603dd4e9c1cd 100644 > --- a/doc/guides/nics/mlx5.rst > +++ b/doc/guides/nics/mlx5.rst > @@ -299,26 +299,26 @@ Prerequisites > - > > This driver relies on external libraries and kernel drivers for resources > -allocations and initialization. The following dependencies are not part of > -DPDK and must be installed separately: > +allocations and initialization. The following packages come from the > +Linux RDMA core https://github.com/linux-rdma/rdma-core. The current > +version of this driver requires version version 16 or later. Not only from there: they are also provided by MLNX_OFED >= 4.2 for regular distributions, without updating the Linux kernel. In that situation, installing RDMA-Core is unnecessary. > - **libibverbs** > >User space Verbs framework used by librte_pmd_mlx5. This library provides > - a generic interface between the kernel and low-level user space drivers > - such as libmlx5. > + a generic interface between the kernel and low-level user space drivers. > >It allows slow and privileged operations (context initialization, hardware >resources allocations) to be managed by the kernel and fast operations to >never leave user space. > > -- **libmlx5** > + The development package (libibverbs-dev or libibverbs-devel) are necessary > + for compilation.
Not really: either MLNX_OFED >= 4.2, or libibverbs-dev / libibverbs-devel, is necessary. People using MLNX_OFED must not install the libibverbs-dev(el) packages. > - Low-level user space driver library for Mellanox ConnectX-4/ConnectX-5 > - devices, it is automatically loaded by libibverbs. > +- **rdma-core** > > - This library basically implements send/receive calls to the hardware > - queues. > + The basic userspace infrastructure for interaction with RDMA subsystem > + on Linux. > > - **Kernel modules** > > -- > 2.15.1 Please keep both situations in the documentation, for customers who use: - a stable distribution in conjunction with MLNX_OFED - a stable distribution with an updated Linux kernel using RDMA-Core. Both are supported. Thanks, -- Nélio Laranjeiro 6WIND