Re: [dpdk-dev] [PATCH] net/virtio-user: support changing tap interface name
On Sat, Mar 11, 2017 at 03:36:58PM +, Wenfeng Liu wrote: > This patch adds a new option 'iface' to change the interface name of > tap device with vhost-kernel as backend. > > Signed-off-by: Wenfeng Liu Jianfeng, any comments? --yliu
[dpdk-dev] Compiling DPDK 17.02 using musl instead of glibc
Hi, I've managed to compile DPDK 17.02 for musl instead of glibc. In essence, I had to change a number of files. I've not prepared a patch at this stage, but I wanted to share my changes with the list so that others can benefit. Firstly, I apply the following minor sed edits:- # Assumes glibc sed -i -e 's;#include ;#include \n#include ;g' lib/librte_eal/common/include/rte_lcore.h sed -i -e 's;#include ;#include \n#include ;g' lib/librte_eal/linuxapp/eal/eal_hugepage_info.c sed -i -e 's;#include ;#include \n#include ;g' lib/librte_eal/linuxapp/eal/eal_memory.c sed -i -e 's;#include ;#include \n#include \n;g' lib/librte_eal/linuxapp/eal/eal_pci_uio.c sed -i -e 's;#define PAGE_SIZE;#undef PAGE_SIZE\n#define PAGE_SIZE;g' lib/librte_eal/linuxapp/eal/eal_pci_vfio.c sed -i -e 's;#include ;#include \n#include ;g' app/test/test_eal_flags.c # ? sed -i -e 's;uint hash_key_len;uint8_t hash_key_len;g' app/test-pmd/testpmd.c app/test-pmd/testpmd.h app/t Secondly, I replaced these files with dummies as they use glibc functionality without equivalence:- lib/librte_eal/linuxapp/eal/eal_debug.c lib/librte_eal/linuxapp/eal/eal_log.c lib/librte_eal/linuxapp/eal/eal_vfio_mp_sync.c That's not ideal Lastly, I also had to modify musl itself:- - Musl does not ship sys/queue.h. Perhaps in future DPDK could ship its own copy of this? Musl will not be shipping it, either - Musl lacks some functions in bits/io.h for x86_64. They seem open to adding them, but need a contribution from someone knowledgeable ( http://www.openwall.com/lists/musl/2015/11/10/1). There is a patch shipped with OSv for it. Raph
[dpdk-dev] Changes to DPDK 17.02 to cross compile for Linux on Mac OS X
Hi, If anyone's interested, there are a few small changes needed to pmdinfogen to get it to cross-compile using Homebrew on Mac OS X. These aren't really patch worthy, as my changes will break it for other BSD-based systems. Firstly local libelfPrefix="$(brew --prefix libelf)" local extra="-I${libelfPrefix}/include/libelf -I${libelfPrefix}/include -isystem${dpdkSrcDir}/buildtools/pmdinfogen" sed -i -e 's;-g;-g '"$extra"';g' buildtools/pmdinfogen/Makefile sed -i -e 's;#include ;#include "endian.h";g' buildtools/pmdinfogen/pmdinfogen.h Secondly, create a simple endian.h file inside buildtools/pmdinfogen. There are a few ones floating around on the web. Thirdly, create a stripped down elf.h inside buildtools/pmdinfogen. There are a few public domain ones floating around on the web. I'm not sure about the copyright status of these, so I've not linked to them. I'm using these changes currently as part of a Rust crate wrapping DPDK, called `dpdk-sys` (it has a mid-level Rust wrapper called `dpdk`). It compiles but it's early days - see https://github.com/lemonrock/dpdk . Raph
Re: [dpdk-dev] [PATCH 1/3] app/testpmd: add port reset command into testpmd
Hi, Ferruh > -Original Message- > From: Yigit, Ferruh > Sent: Wednesday, March 8, 2017 7:07 PM > To: Zhao1, Wei ; dev@dpdk.org > Cc: Lu, Wenzhuo > Subject: Re: [dpdk-dev] [PATCH 1/3] app/testpmd: add port reset command > into testpmd > > On 3/3/2017 4:56 AM, Wei Zhao wrote: > > Add vf port reset command into testpmd project, it is the interface > > for user to reset vf port. > > I think it is better to change the order of this patch, first implement new > API > in ethdev, later this patch implement new API in testpmd. > You means change the order of patch 0001 and 0002 ?0003 remain the same? Ok,I will do that in v2 patch > > > > Signed-off-by: Wei Zhao > > Signed-off-by: Wenzhuo Lu > > --- > > app/test-pmd/cmdline.c | 17 ++--- app/test-pmd/testpmd.c | > > 67 ++ > > app/test-pmd/testpmd.h | 1 + > > 3 files changed, 81 insertions(+), 4 deletions(-) > > > > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index > > 43fc636..59db672 100644 > > --- a/app/test-pmd/cmdline.c > > +++ b/app/test-pmd/cmdline.c > > @@ -596,6 +596,9 @@ static void cmd_help_long_parsed(void > *parsed_result, > > "port close (port_id|all)\n" > > "Close all ports or port_id.\n\n" > > > > + "port reset (port_id|all)\n" > > + "Reset all ports or port_id.\n\n" > > It is not clear what reset does to the port. This is only for VF right? > Adding reset here hides that it is for VF. Yes, it is only for VF port reset, maybe I should change cmd line command to "port vf_reset index" that mode to avoid ambiguous for user in v2 > > <...> > > > @@ -601,6 +602,7 @@ init_config(void) > > if (init_fwd_streams() < 0) > > rte_exit(EXIT_FAILURE, "FAIL from init_fwd_streams()\n"); > > > > + > > This may be unintentional. Yes, I will fix it in v2 patch. > > <...> > > > @@ -1350,6 +1363,10 @@ start_port(portid_t pid) > > return -1; > > } > > } > > + > > + /* register reset interrupt callback */ > > + rte_eth_dev_callback_register(pi, > RTE_ETH_EVENT_INTR_RESET, > > + reset_event_callback, NULL); > > So each port started will register a callback to handle reset events, > > 1- isn't this overkill for the usecases that does not need this reset? > 2- should there be an unregister event? > 3- This issue can be fixed in testpmd, but for user application, is this the > suggested way? 1. it is hard to know which port may be need to reset in the feature at the beginning stage, so there is register event for all ports. 2. this is only a demo for user application. I think in user application, there should be a specific thread to check whether reset_ports is set by I40e function _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET, NULL), so RTE_ETH_EVENT_INTR_RESET must be registered. IF true, then user application should call function rte_eth_dev_reset(pi) 3. . this is only a demo for user application, user application can create a new specific thread to do this work , but testpmd app can not because that may be introduce A lot of unknow problem for existing feature. > > > if (port->need_reconfig_queues > 0) { > > port->need_reconfig_queues = 0; > > /* setup tx queues */ > > @@ -1559,6 +1576,56 @@ close_port(portid_t pid) } > > > > void > > +reset_port(portid_t pid) > > +{ > > + portid_t pi; > > + struct rte_port *port; > > + > > + if (port_id_is_invalid(pid, ENABLED_WARN)) > > + return; > > + > > + printf("Closing ports...\n"); > > + > > + FOREACH_PORT(pi, ports) { > > Since we already know the port_id (pid), why iterating through all ports? This is usefully, because when user command is “port reset all”, the port_id is 0x If all of these ports are vf, then it will need to retrieve all port. > > > + if (pid != pi && pid != (portid_t)RTE_PORT_ALL) > > + continue; > > + > > + if (port_is_forwarding(pi) != 0 && test_done == 0) { > > + printf("Please remove port %d from forwarding " > > + "configuration.\n", pi); > > + continue; > > + } > > + > > + if (port_is_bonding_slave(pi)) { > > + printf("Please remove port %d from " > > + "bonded device.\n", pi); > > + continue; > > + } > > + > > + if (!reset_ports[pi]) { > > + printf("vf must get reset port %d info from " > > + "pf before reset.\n", pi); > > + continue; > > + } > > Can there be a timing issue here? Is it possible that reset occurred already > and we are in the middle of the callback function when this check don
[dpdk-dev] Compiling DPDK 17.02 with a Busybox-based tar
Hi, To compile DPDK on a system with Busybox tar installed, it's necessary to make a small change to the build system:- sed -i -e '/--keep-newer-files/d' mk/rte.sdkinstall.mk sed -i -e 's;--strip-components=1 \\;--strip-components=1;g' mk/ rte.sdkinstall.mk I'm not sure whether the impact of this change fundamentally affects DPDK. I'm a little surprised that tar is needed at all for a compile + install, but I haven't investigated further. Is it being used to do a copy? Raph
Re: [dpdk-dev] [PATCHv8 19/46] pool/dpaa2: add DPAA2 hardware offloaded mempool
Hi Hemant, On Tue, 14 Mar 2017 12:12:17 +0530, Hemant Agrawal wrote: > On 3/9/2017 11:27 AM, Hemant Agrawal wrote: > > On 3/8/2017 9:09 PM, Thomas Monjalon wrote: > >> 2017-03-08 18:22, Hemant Agrawal: > On Fri, 3 Mar 2017 18:16:36 +0530, Hemant Agrawal > wrote: > I think the current mempool handlers should be moved first in a > separate patch. > >> > >> Yes it should have been done earlier. > >> > >>> Are you seeing any benefit by making it a separate patch series? > >> > >> A separate patchset for moving mempool handlers will be easy to review > >> and accept. > >> If integrated in this series, it is kind of hidden and prevent the > >> visibility and review it deserves. > >> By the way the mempool move should be directly committed in the main > >> repository, while this series targets next-net. > >> > > > > hw mempool has dependency on mc-bus. So, we will break it into *5* series, > > > > 1. mc-bus - which can go on main repo - Rebased over main repo > > 2. dpaa2-pool - it can also go on main rep (depends on mc-bus) - Rebased > > over 'main repo + 1'. > > 3. dpaa2-pmd - this depends on mc-bus and dpaa2-pool > > A. mc-bus -another series rebased over net-next > > B. dpaa2-pool - another series rebased over 'net-next + A' > > C. the pmd itself rebased over 'net-next + A + B' > > > > Are you sure, you will like us to do the above and flood the mailing > > list :) > > > > I think, this will make the review difficult. > > > > Bruce, Ferruh, Olivier - does the above help you? If so, we can do it. > > Any views, we can rebase our next version accordingly. > we are looking forward to get it merged in 17.05. Yes, moving the other mempool drivers in the drivers/ directory before adding another one there makes sense to me, in order to avoid having two places for mempool handlers. As Thomas stated, it's probably better to have it in another patchset, so it can be quickly reviewed and integrated. Regards, Olivier
[dpdk-dev] 答复: [PATCH] net/virtio-user: fix overflow
Hi Yuanhan, >On Mon, Mar 13, 2017 at 03:09:15PM +, Wenfeng Liu wrote: >> This commit fixes an array overflow when number of queues is higher than 8. > >Firstly, this commit log could be a bit more informative, to something >like: > >virtio-user limits the qeueue number to 8 but provides no limit >check against the queue number input from user. If a bigger queue >number (> 8) is given, there is an overflow issue. Doing a sanity >check could avoid it. > Sure, I will revise the commit log accordingly. >> >> Fixes: 37a7eb2ae816 ("net/virtio-user: add device emulation layer") >> >> Signed-off-by: Wenfeng Liu >> --- >> drivers/net/virtio/virtio_pci.h | 3 ++- >> drivers/net/virtio/virtio_user/virtio_user_dev.c | 2 +- >> drivers/net/virtio/virtio_user/virtio_user_dev.h | 6 +++--- >> drivers/net/virtio/virtio_user_ethdev.c | 7 +++ >> 4 files changed, 13 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/net/virtio/virtio_pci.h >> b/drivers/net/virtio/virtio_pci.h index 59e45c4..bd940b4 100644 >> --- a/drivers/net/virtio/virtio_pci.h >> +++ b/drivers/net/virtio/virtio_pci.h >> @@ -160,7 +160,8 @@ >> /* >> * Maximum number of virtqueues per device. >> */ >> -#define VIRTIO_MAX_VIRTQUEUES 8 >> +#define VIRTIO_MAX_VIRTQUEUE_PAIRS 8 >> +#define VIRTIO_MAX_VIRTQUEUES VIRTIO_MAX_VIRTQUEUE_PAIRS * 2 + 1 >> >> /* Common configuration */ >> #define VIRTIO_PCI_CAP_COMMON_CFG 1 >> diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c >> b/drivers/net/virtio/virtio_user/virtio_user_dev.c >> index e7fd65f..5b81676 100644 >> --- a/drivers/net/virtio/virtio_user/virtio_user_dev.c >> +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c >> @@ -234,7 +234,7 @@ int virtio_user_stop_device(struct virtio_user_dev *dev) >> uint32_t i, q; >> >> dev->vhostfd = -1; >> -for (i = 0; i < VIRTIO_MAX_VIRTQUEUES * 2 + 1; ++i) { >> +for (i = 0; i < VIRTIO_MAX_VIRTQUEUES; ++i) { > >Right, we don't need setup callfd and kickfd for the ctrl-queue. I did not remove the ctrl-queue. I just redefined the MACRO according to DRY principle: #define VIRTIO_MAX_VIRTQUEUES VIRTIO_MAX_VIRTQUEUE_PAIRS * 2 + 1 I noticed that I missed the bracket in the MACRO and will add it in next version. > >> dev->kickfds[i] = -1; >> dev->callfds[i] = -1; >> } >> diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.h >> b/drivers/net/virtio/virtio_user/virtio_user_dev.h >> index 6ecb91e..ba80d05 100644 >> --- a/drivers/net/virtio/virtio_user/virtio_user_dev.h >> +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.h >> @@ -49,8 +49,8 @@ struct virtio_user_dev { >> int *tapfds; >> >> /* for both vhost_user and vhost_kernel */ >> -int callfds[VIRTIO_MAX_VIRTQUEUES * 2 + 1]; >> -int kickfds[VIRTIO_MAX_VIRTQUEUES * 2 + 1]; >> +int callfds[VIRTIO_MAX_VIRTQUEUES]; >> +int kickfds[VIRTIO_MAX_VIRTQUEUES]; >> int mac_specified; >> uint32_tmax_queue_pairs; >> uint32_tqueue_pairs; >> @@ -62,7 +62,7 @@ struct virtio_user_dev { >> uint8_t status; >> uint8_t mac_addr[ETHER_ADDR_LEN]; >> charpath[PATH_MAX]; >> -struct vringvrings[VIRTIO_MAX_VIRTQUEUES * 2 + 1]; >> +struct vringvrings[VIRTIO_MAX_VIRTQUEUES]; > >Have you actually tested your patch? You are removing the vring for ctrl-queue, which is wrong to me. > >> struct virtio_user_backend_ops *ops; }; >> >> diff --git a/drivers/net/virtio/virtio_user_ethdev.c >> b/drivers/net/virtio/virtio_user_ethdev.c >> index 16d1526..d476a2d 100644 >> --- a/drivers/net/virtio/virtio_user_ethdev.c >> +++ b/drivers/net/virtio/virtio_user_ethdev.c >> @@ -433,6 +433,13 @@ >> goto end; >> } >> >> +if (queues > VIRTIO_MAX_VIRTQUEUE_PAIRS) { >> +PMD_INIT_LOG(ERR, "arg %s %u exceeds the limit %u", >> +VIRTIO_USER_ARG_QUEUES_NUM, queues, >> +VIRTIO_MAX_VIRTQUEUE_PAIRS); >> +goto end; >> +} > >Yes, we need this check. So, to me, you were actually doing two things in this patch: > >- check the queue number, to avoid overflow >- remove the callfds and kickfds for ctrl-queue. > >That said, please do them in two patches. > >Thanks. > > --yliu >> + >> eth_dev = virtio_user_eth_dev_alloc(name); >> if (!eth_dev) { >> PMD_INIT_LOG(ERR, "virtio_user fails to alloc device"); >> -- >> 1.8.3.1 Regards, liuwf
[dpdk-dev] [PATCH v4 1/8] net/tap: remove wrong IFF_NOARP flags
There is no reason not to support ARP on a tap netdevice. Focus on IFF_UP when a link status change is required. Fixes: f457b472b1f2 ("net/tap: add link up and down operations") Signed-off-by: Pascal Mazon --- drivers/net/tap/rte_eth_tap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c index ece3a5fcc897..b75b7542a329 100644 --- a/drivers/net/tap/rte_eth_tap.c +++ b/drivers/net/tap/rte_eth_tap.c @@ -327,7 +327,7 @@ tap_link_set_down(struct rte_eth_dev *dev) struct pmd_internals *pmd = dev->data->dev_private; dev->data->dev_link.link_status = ETH_LINK_DOWN; - return tap_link_set_flags(pmd, IFF_UP | IFF_NOARP, 0); + return tap_link_set_flags(pmd, IFF_UP, 0); } static int @@ -336,7 +336,7 @@ tap_link_set_up(struct rte_eth_dev *dev) struct pmd_internals *pmd = dev->data->dev_private; dev->data->dev_link.link_status = ETH_LINK_UP; - return tap_link_set_flags(pmd, IFF_UP | IFF_NOARP, 1); + return tap_link_set_flags(pmd, IFF_UP, 1); } static int -- 2.8.0.rc0
[dpdk-dev] [PATCH v4 0/8] net/tap: add additional management ops
Add a few eth_dev ops to the tap PMD for completeness. We want it to behave as much as possible as a standard PMD. v2 change: - use snprintf in tap_mtu set v3 change: - call tap_mac_set() only once in tap_setup_queue() v4 changes: - use new tap_ioctl function for shared behavior between ops - update supported packet types - remove IFF_NOARP flag from link status change - sync desired MTU to both remote and tap netdevices - remove support for mac_add and mac_remove ops Pascal Mazon (8): net/tap: remove wrong IFF_NOARP flags net/tap: refactor ioctl calls net/tap: add MAC address management net/tap: add MTU management net/tap: add speed capabilities net/tap: add multicast addresses management net/tap: add packet type management net/tap: add flow control management doc/guides/nics/features/tap.ini | 6 + drivers/net/tap/rte_eth_tap.c| 255 +++ 2 files changed, 214 insertions(+), 47 deletions(-) -- 2.8.0.rc0
[dpdk-dev] [PATCH v4 2/8] net/tap: refactor ioctl calls
Create a socket for ioctl at tap device creation instead of opening it and closing it every call to tap_link_set_flags(). Use a common tap_ioctl() function that can be extended for various uses (such as MTU change, MAC address change, ...). Signed-off-by: Pascal Mazon --- drivers/net/tap/rte_eth_tap.c | 90 +++ 1 file changed, 48 insertions(+), 42 deletions(-) diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c index b75b7542a329..18edac57eead 100644 --- a/drivers/net/tap/rte_eth_tap.c +++ b/drivers/net/tap/rte_eth_tap.c @@ -107,6 +107,7 @@ struct pmd_internals { struct ether_addr eth_addr; /* Mac address of the device port */ int if_index; /* IF_INDEX for the port */ + int ioctl_sock; /* socket for ioctl calls */ struct rx_queue rxq[RTE_PMD_TAP_MAX_QUEUES];/* List of RX queues */ struct tx_queue txq[RTE_PMD_TAP_MAX_QUEUES];/* List of TX queues */ @@ -280,63 +281,55 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) } static int -tap_link_set_flags(struct pmd_internals *pmd, short flags, int add) +tap_ioctl(struct pmd_internals *pmd, unsigned long request, + struct ifreq *ifr, int set) { - struct ifreq ifr; - int err, s; - - /* -* An AF_INET/DGRAM socket is needed for -* SIOCGIFFLAGS/SIOCSIFFLAGS, using fd won't work. -*/ - s = socket(AF_INET, SOCK_DGRAM, 0); - if (s < 0) { - RTE_LOG(ERR, PMD, - "Unable to get a socket to set flags: %s\n", - strerror(errno)); - return -1; - } - memset(&ifr, 0, sizeof(ifr)); - snprintf(ifr.ifr_name, IFNAMSIZ, "%s", pmd->name); - err = ioctl(s, SIOCGIFFLAGS, &ifr); - if (err < 0) { - RTE_LOG(WARNING, PMD, "Unable to get %s device flags: %s\n", - pmd->name, strerror(errno)); - close(s); - return -1; - } - if (add) - ifr.ifr_flags |= flags; - else - ifr.ifr_flags &= ~flags; - err = ioctl(s, SIOCSIFFLAGS, &ifr); - if (err < 0) { - RTE_LOG(WARNING, PMD, "Unable to %s flags 0x%x: %s\n", - add ? "set" : "unset", flags, strerror(errno)); - close(s); - return -1; - } - close(s); + short req_flags = ifr->ifr_flags; + snprintf(ifr->ifr_name, IFNAMSIZ, "%s", pmd->name); + switch (request) { + case SIOCSIFFLAGS: + /* fetch current flags to leave other flags untouched */ + if (ioctl(pmd->ioctl_sock, SIOCGIFFLAGS, ifr) < 0) + goto error; + if (set) + ifr->ifr_flags |= req_flags; + else + ifr->ifr_flags &= ~req_flags; + break; + default: + RTE_LOG(WARNING, PMD, "%s: ioctl() called with wrong arg\n", + pmd->name); + return -EINVAL; + } + if (ioctl(pmd->ioctl_sock, request, ifr) < 0) + goto error; return 0; + +error: + RTE_LOG(ERR, PMD, "%s: ioctl(%lu) failed with error: %s\n", + ifr->ifr_name, request, strerror(errno)); + return -errno; } static int tap_link_set_down(struct rte_eth_dev *dev) { struct pmd_internals *pmd = dev->data->dev_private; + struct ifreq ifr = { .ifr_flags = IFF_UP }; dev->data->dev_link.link_status = ETH_LINK_DOWN; - return tap_link_set_flags(pmd, IFF_UP, 0); + return tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 0); } static int tap_link_set_up(struct rte_eth_dev *dev) { struct pmd_internals *pmd = dev->data->dev_private; + struct ifreq ifr = { .ifr_flags = IFF_UP }; dev->data->dev_link.link_status = ETH_LINK_UP; - return tap_link_set_flags(pmd, IFF_UP, 1); + return tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 1); } static int @@ -470,36 +463,40 @@ static void tap_promisc_enable(struct rte_eth_dev *dev) { struct pmd_internals *pmd = dev->data->dev_private; + struct ifreq ifr = { .ifr_flags = IFF_PROMISC }; dev->data->promiscuous = 1; - tap_link_set_flags(pmd, IFF_PROMISC, 1); + tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 1); } static void tap_promisc_disable(struct rte_eth_dev *dev) { struct pmd_internals *pmd = dev->data->dev_private; + struct ifreq ifr = { .ifr_flags = IFF_PROMISC }; dev->data->promiscuous = 0; - tap_link_set_flags(pmd, IFF_PROMISC, 0); + tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 0); } static void tap_allmulti_enable(struct rte_eth_dev *dev) { struct pmd_internals *pmd = dev->data->dev_private; + struct ifreq ifr = { .ifr_flags = IFF_ALLMULTI }; dev->data->all_multicast = 1; - t
[dpdk-dev] [PATCH v4 4/8] net/tap: add MTU management
The MTU is assigned to the tap netdevice according to the argument, but packet transmission and reception just write/read on an fd with the default limit being the socket buffer size. As a new rte_eth_dev_data is allocated during tap device init, ensure it is set again dev->data->mtu. Once the actual netdevice is created via tun_alloc(), make sure to apply the desired MTU to the netdevice. Signed-off-by: Pascal Mazon --- doc/guides/nics/features/tap.ini | 1 + drivers/net/tap/rte_eth_tap.c| 26 ++ 2 files changed, 27 insertions(+) diff --git a/doc/guides/nics/features/tap.ini b/doc/guides/nics/features/tap.ini index d9b47a003654..b80577753240 100644 --- a/doc/guides/nics/features/tap.ini +++ b/doc/guides/nics/features/tap.ini @@ -9,6 +9,7 @@ Jumbo frame = Y Promiscuous mode = Y Allmulticast mode= Y Basic stats = Y +MTU update = Y Unicast MAC filter = Y Other kdrv = Y ARMv7= Y diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c index 06c1faa92dce..3e61f7bda555 100644 --- a/drivers/net/tap/rte_eth_tap.c +++ b/drivers/net/tap/rte_eth_tap.c @@ -302,6 +302,7 @@ tap_ioctl(struct pmd_internals *pmd, unsigned long request, break; case SIOCGIFHWADDR: case SIOCSIFHWADDR: + case SIOCSIFMTU: break; default: RTE_LOG(WARNING, PMD, "%s: ioctl() called with wrong arg\n", @@ -547,6 +548,15 @@ tap_setup_queue(struct rte_eth_dev *dev, pmd->name, qid); return -1; } + if (qid == 0) { + struct ifreq ifr; + + ifr.ifr_mtu = dev->data->mtu; + if (tap_ioctl(pmd, SIOCSIFMTU, &ifr, 1) < 0) { + close(fd); + return -1; + } + } } } @@ -642,6 +652,20 @@ tap_tx_queue_setup(struct rte_eth_dev *dev, return 0; } +static int +tap_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) +{ + struct pmd_internals *pmd = dev->data->dev_private; + struct ifreq ifr = { .ifr_mtu = mtu }; + int err = 0; + + err = tap_ioctl(pmd, SIOCSIFMTU, &ifr, 1); + if (!err) + dev->data->mtu = mtu; + + return err; +} + static const struct eth_dev_ops ops = { .dev_start = tap_dev_start, .dev_stop = tap_dev_stop, @@ -660,6 +684,7 @@ static const struct eth_dev_ops ops = { .allmulticast_enable= tap_allmulti_enable, .allmulticast_disable = tap_allmulti_disable, .mac_addr_set = tap_mac_set, + .mtu_set= tap_mtu_set, .stats_get = tap_stats_get, .stats_reset= tap_stats_reset, }; @@ -710,6 +735,7 @@ eth_dev_tap_create(const char *name, char *tap_name) /* Setup some default values */ data->dev_private = pmd; data->port_id = dev->data->port_id; + data->mtu = dev->data->mtu; data->dev_flags = RTE_ETH_DEV_DETACHABLE; data->kdrv = RTE_KDRV_NONE; data->drv_name = pmd_tap_drv.driver.name; -- 2.8.0.rc0
[dpdk-dev] [PATCH v4 6/8] net/tap: add multicast addresses management
A tap netdevice actually receives every packet, without any filtering whatsoever. There is no need for any multicast address registration to receive multicast packets. Signed-off-by: Pascal Mazon --- doc/guides/nics/features/tap.ini | 1 + drivers/net/tap/rte_eth_tap.c| 13 + 2 files changed, 14 insertions(+) diff --git a/doc/guides/nics/features/tap.ini b/doc/guides/nics/features/tap.ini index 6c07352088bf..6aa11874e2bc 100644 --- a/doc/guides/nics/features/tap.ini +++ b/doc/guides/nics/features/tap.ini @@ -10,6 +10,7 @@ Promiscuous mode = Y Allmulticast mode= Y Basic stats = Y MTU update = Y +Multicast MAC filter = Y Speed capabilities = Y Unicast MAC filter = Y Other kdrv = Y diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c index 86d2b3b4c106..e9aa2f45ba54 100644 --- a/drivers/net/tap/rte_eth_tap.c +++ b/drivers/net/tap/rte_eth_tap.c @@ -701,6 +701,18 @@ tap_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) return err; } +static int +tap_set_mc_addr_list(struct rte_eth_dev *dev __rte_unused, +struct ether_addr *mc_addr_set __rte_unused, +uint32_t nb_mc_addr __rte_unused) +{ + /* +* Nothing to do actually: the tap has no filtering whatsoever, every +* packet is received. +*/ + return 0; +} + static const struct eth_dev_ops ops = { .dev_start = tap_dev_start, .dev_stop = tap_dev_stop, @@ -720,6 +732,7 @@ static const struct eth_dev_ops ops = { .allmulticast_disable = tap_allmulti_disable, .mac_addr_set = tap_mac_set, .mtu_set= tap_mtu_set, + .set_mc_addr_list = tap_set_mc_addr_list, .stats_get = tap_stats_get, .stats_reset= tap_stats_reset, }; -- 2.8.0.rc0
[dpdk-dev] [PATCH v4 5/8] net/tap: add speed capabilities
Tap PMD is flexible, it supports any speed. Signed-off-by: Pascal Mazon --- doc/guides/nics/features/tap.ini | 1 + drivers/net/tap/rte_eth_tap.c| 35 +++ 2 files changed, 36 insertions(+) diff --git a/doc/guides/nics/features/tap.ini b/doc/guides/nics/features/tap.ini index b80577753240..6c07352088bf 100644 --- a/doc/guides/nics/features/tap.ini +++ b/doc/guides/nics/features/tap.ini @@ -10,6 +10,7 @@ Promiscuous mode = Y Allmulticast mode= Y Basic stats = Y MTU update = Y +Speed capabilities = Y Unicast MAC filter = Y Other kdrv = Y ARMv7= Y diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c index 3e61f7bda555..86d2b3b4c106 100644 --- a/drivers/net/tap/rte_eth_tap.c +++ b/drivers/net/tap/rte_eth_tap.c @@ -359,6 +359,40 @@ tap_dev_configure(struct rte_eth_dev *dev __rte_unused) return 0; } +static uint32_t +tap_dev_speed_capa(void) +{ + uint32_t speed = pmd_link.link_speed; + uint32_t capa = 0; + + if (speed >= ETH_SPEED_NUM_10M) + capa |= ETH_LINK_SPEED_10M; + if (speed >= ETH_SPEED_NUM_100M) + capa |= ETH_LINK_SPEED_100M; + if (speed >= ETH_SPEED_NUM_1G) + capa |= ETH_LINK_SPEED_1G; + if (speed >= ETH_SPEED_NUM_5G) + capa |= ETH_LINK_SPEED_2_5G; + if (speed >= ETH_SPEED_NUM_5G) + capa |= ETH_LINK_SPEED_5G; + if (speed >= ETH_SPEED_NUM_10G) + capa |= ETH_LINK_SPEED_10G; + if (speed >= ETH_SPEED_NUM_20G) + capa |= ETH_LINK_SPEED_20G; + if (speed >= ETH_SPEED_NUM_25G) + capa |= ETH_LINK_SPEED_25G; + if (speed >= ETH_SPEED_NUM_40G) + capa |= ETH_LINK_SPEED_40G; + if (speed >= ETH_SPEED_NUM_50G) + capa |= ETH_LINK_SPEED_50G; + if (speed >= ETH_SPEED_NUM_56G) + capa |= ETH_LINK_SPEED_56G; + if (speed >= ETH_SPEED_NUM_100G) + capa |= ETH_LINK_SPEED_100G; + + return capa; +} + static void tap_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) { @@ -371,6 +405,7 @@ tap_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) dev_info->max_tx_queues = internals->nb_queues; dev_info->min_rx_bufsize = 0; dev_info->pci_dev = NULL; + dev_info->speed_capa = tap_dev_speed_capa(); } static void -- 2.8.0.rc0
[dpdk-dev] [PATCH v4 3/8] net/tap: add MAC address management
As soon as the netdevice is created, update pmd->mac_addr with its actual MAC address. Signed-off-by: Pascal Mazon --- doc/guides/nics/features/tap.ini | 1 + drivers/net/tap/rte_eth_tap.c| 39 +-- 2 files changed, 34 insertions(+), 6 deletions(-) diff --git a/doc/guides/nics/features/tap.ini b/doc/guides/nics/features/tap.ini index f4aca6921ddc..d9b47a003654 100644 --- a/doc/guides/nics/features/tap.ini +++ b/doc/guides/nics/features/tap.ini @@ -9,6 +9,7 @@ Jumbo frame = Y Promiscuous mode = Y Allmulticast mode= Y Basic stats = Y +Unicast MAC filter = Y Other kdrv = Y ARMv7= Y ARMv8= Y diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c index 18edac57eead..06c1faa92dce 100644 --- a/drivers/net/tap/rte_eth_tap.c +++ b/drivers/net/tap/rte_eth_tap.c @@ -113,6 +113,10 @@ struct pmd_internals { struct tx_queue txq[RTE_PMD_TAP_MAX_QUEUES];/* List of TX queues */ }; +static int +tap_ioctl(struct pmd_internals *pmd, unsigned long request, + struct ifreq *ifr, int set); + /* Tun/Tap allocation routine * * name is the number of the interface to use, unless NULL to take the host @@ -178,13 +182,12 @@ tun_alloc(struct pmd_internals *pmd, uint16_t qid) } if (qid == 0) { - if (ioctl(fd, SIOCGIFHWADDR, &ifr) == -1) { - RTE_LOG(ERR, PMD, "ioctl failed (SIOCGIFHWADDR) (%s)\n", - ifr.ifr_name); - goto error; - } + struct ifreq ifr; - rte_memcpy(&pmd->eth_addr, ifr.ifr_hwaddr.sa_data, ETH_ALEN); + if (tap_ioctl(pmd, SIOCGIFHWADDR, &ifr, 0) < 0) + goto error; + rte_memcpy(&pmd->eth_addr, ifr.ifr_hwaddr.sa_data, + ETHER_ADDR_LEN); } return fd; @@ -297,6 +300,9 @@ tap_ioctl(struct pmd_internals *pmd, unsigned long request, else ifr->ifr_flags &= ~req_flags; break; + case SIOCGIFHWADDR: + case SIOCSIFHWADDR: + break; default: RTE_LOG(WARNING, PMD, "%s: ioctl() called with wrong arg\n", pmd->name); @@ -499,6 +505,26 @@ tap_allmulti_disable(struct rte_eth_dev *dev) tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 0); } + +static void +tap_mac_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr) +{ + struct pmd_internals *pmd = dev->data->dev_private; + struct ifreq ifr; + + if (is_zero_ether_addr(mac_addr)) { + RTE_LOG(ERR, PMD, "%s: can't set an empty MAC address\n", + dev->data->name); + return; + } + + ifr.ifr_hwaddr.sa_family = AF_LOCAL; + rte_memcpy(ifr.ifr_hwaddr.sa_data, mac_addr, ETHER_ADDR_LEN); + if (tap_ioctl(pmd, SIOCSIFHWADDR, &ifr, 1) < 0) + return; + rte_memcpy(&pmd->eth_addr, mac_addr, ETHER_ADDR_LEN); +} + static int tap_setup_queue(struct rte_eth_dev *dev, struct pmd_internals *internals, @@ -633,6 +659,7 @@ static const struct eth_dev_ops ops = { .promiscuous_disable= tap_promisc_disable, .allmulticast_enable= tap_allmulti_enable, .allmulticast_disable = tap_allmulti_disable, + .mac_addr_set = tap_mac_set, .stats_get = tap_stats_get, .stats_reset= tap_stats_reset, }; -- 2.8.0.rc0
[dpdk-dev] [PATCH v4 7/8] net/tap: add packet type management
Advertize packet types supported by the librte_net. Signed-off-by: Pascal Mazon --- doc/guides/nics/features/tap.ini | 1 + drivers/net/tap/rte_eth_tap.c| 35 +++ 2 files changed, 36 insertions(+) diff --git a/doc/guides/nics/features/tap.ini b/doc/guides/nics/features/tap.ini index 6aa11874e2bc..7f3f4d661dd7 100644 --- a/doc/guides/nics/features/tap.ini +++ b/doc/guides/nics/features/tap.ini @@ -13,6 +13,7 @@ MTU update = Y Multicast MAC filter = Y Speed capabilities = Y Unicast MAC filter = Y +Packet type parsing = Y Other kdrv = Y ARMv7= Y ARMv8= Y diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c index e9aa2f45ba54..371249b0daed 100644 --- a/drivers/net/tap/rte_eth_tap.c +++ b/drivers/net/tap/rte_eth_tap.c @@ -36,6 +36,7 @@ #include #include #include +#include #include #include @@ -228,6 +229,8 @@ pmd_rx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) mbuf->data_len = len; mbuf->pkt_len = len; mbuf->port = rxq->in_port; + mbuf->packet_type = rte_net_get_ptype(mbuf, NULL, + RTE_PTYPE_ALL_MASK); /* account for the receive frame */ bufs[num_rx++] = mbuf; @@ -713,6 +716,37 @@ tap_set_mc_addr_list(struct rte_eth_dev *dev __rte_unused, return 0; } +static const uint32_t* +tap_dev_supported_ptypes_get(struct rte_eth_dev *dev __rte_unused) +{ + static const uint32_t ptypes[] = { + RTE_PTYPE_INNER_L2_ETHER, + RTE_PTYPE_INNER_L2_ETHER_VLAN, + RTE_PTYPE_INNER_L2_ETHER_QINQ, + RTE_PTYPE_INNER_L3_IPV4, + RTE_PTYPE_INNER_L3_IPV4_EXT, + RTE_PTYPE_INNER_L3_IPV6, + RTE_PTYPE_INNER_L3_IPV6_EXT, + RTE_PTYPE_INNER_L4_FRAG, + RTE_PTYPE_INNER_L4_UDP, + RTE_PTYPE_INNER_L4_TCP, + RTE_PTYPE_INNER_L4_SCTP, + RTE_PTYPE_L2_ETHER, + RTE_PTYPE_L2_ETHER_VLAN, + RTE_PTYPE_L2_ETHER_QINQ, + RTE_PTYPE_L3_IPV4, + RTE_PTYPE_L3_IPV4_EXT, + RTE_PTYPE_L3_IPV6_EXT, + RTE_PTYPE_L3_IPV6, + RTE_PTYPE_L4_FRAG, + RTE_PTYPE_L4_UDP, + RTE_PTYPE_L4_TCP, + RTE_PTYPE_L4_SCTP, + }; + + return ptypes; +} + static const struct eth_dev_ops ops = { .dev_start = tap_dev_start, .dev_stop = tap_dev_stop, @@ -735,6 +769,7 @@ static const struct eth_dev_ops ops = { .set_mc_addr_list = tap_set_mc_addr_list, .stats_get = tap_stats_get, .stats_reset= tap_stats_reset, + .dev_supported_ptypes_get = tap_dev_supported_ptypes_get, }; static int -- 2.8.0.rc0
[dpdk-dev] [PATCH v4 8/8] net/tap: add flow control management
A tap netdevice does not support flow control; ensure nothing but RTE_FC_NONE mode can be set. Signed-off-by: Pascal Mazon --- doc/guides/nics/features/tap.ini | 1 + drivers/net/tap/rte_eth_tap.c| 19 +++ 2 files changed, 20 insertions(+) diff --git a/doc/guides/nics/features/tap.ini b/doc/guides/nics/features/tap.ini index 7f3f4d661dd7..a51712dce066 100644 --- a/doc/guides/nics/features/tap.ini +++ b/doc/guides/nics/features/tap.ini @@ -14,6 +14,7 @@ Multicast MAC filter = Y Speed capabilities = Y Unicast MAC filter = Y Packet type parsing = Y +Flow control = Y Other kdrv = Y ARMv7= Y ARMv8= Y diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c index 371249b0daed..7557abb2ebfc 100644 --- a/drivers/net/tap/rte_eth_tap.c +++ b/drivers/net/tap/rte_eth_tap.c @@ -747,6 +747,23 @@ tap_dev_supported_ptypes_get(struct rte_eth_dev *dev __rte_unused) return ptypes; } +static int +tap_flow_ctrl_get(struct rte_eth_dev *dev __rte_unused, + struct rte_eth_fc_conf *fc_conf) +{ + fc_conf->mode = RTE_FC_NONE; + return 0; +} + +static int +tap_flow_ctrl_set(struct rte_eth_dev *dev __rte_unused, + struct rte_eth_fc_conf *fc_conf) +{ + if (fc_conf->mode != RTE_FC_NONE) + return -ENOTSUP; + return 0; +} + static const struct eth_dev_ops ops = { .dev_start = tap_dev_start, .dev_stop = tap_dev_stop, @@ -757,6 +774,8 @@ static const struct eth_dev_ops ops = { .tx_queue_setup = tap_tx_queue_setup, .rx_queue_release = tap_rx_queue_release, .tx_queue_release = tap_tx_queue_release, + .flow_ctrl_get = tap_flow_ctrl_get, + .flow_ctrl_set = tap_flow_ctrl_set, .link_update= tap_link_update, .dev_set_link_up= tap_link_set_up, .dev_set_link_down = tap_link_set_down, -- 2.8.0.rc0
Re: [dpdk-dev] 答复: [PATCH] net/virtio-user: fix overflow
On Tue, Mar 14, 2017 at 04:20:39PM +0800, Wenfeng Liu wrote: > Hi Yuanhan, > > >On Mon, Mar 13, 2017 at 03:09:15PM +, Wenfeng Liu wrote: > >> This commit fixes an array overflow when number of queues is higher than > 8. > > > >Firstly, this commit log could be a bit more informative, to something > >like: > > > >virtio-user limits the qeueue number to 8 but provides no limit > >check against the queue number input from user. If a bigger queue > >number (> 8) is given, there is an overflow issue. Doing a sanity > >check could avoid it. > > > > Sure, I will revise the commit log accordingly. > > >> > >> Fixes: 37a7eb2ae816 ("net/virtio-user: add device emulation layer") > >> > >> Signed-off-by: Wenfeng Liu > >> --- > >> drivers/net/virtio/virtio_pci.h | 3 ++- > >> drivers/net/virtio/virtio_user/virtio_user_dev.c | 2 +- > >> drivers/net/virtio/virtio_user/virtio_user_dev.h | 6 +++--- > >> drivers/net/virtio/virtio_user_ethdev.c | 7 +++ > >> 4 files changed, 13 insertions(+), 5 deletions(-) > >> > >> diff --git a/drivers/net/virtio/virtio_pci.h > >> b/drivers/net/virtio/virtio_pci.h index 59e45c4..bd940b4 100644 > >> --- a/drivers/net/virtio/virtio_pci.h > >> +++ b/drivers/net/virtio/virtio_pci.h > >> @@ -160,7 +160,8 @@ > >> /* > >> * Maximum number of virtqueues per device. > >> */ > >> -#define VIRTIO_MAX_VIRTQUEUES 8 > >> +#define VIRTIO_MAX_VIRTQUEUE_PAIRS 8 > >> +#define VIRTIO_MAX_VIRTQUEUES VIRTIO_MAX_VIRTQUEUE_PAIRS * 2 + 1 > >> > >> /* Common configuration */ > >> #define VIRTIO_PCI_CAP_COMMON_CFG 1 > >> diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c > >> b/drivers/net/virtio/virtio_user/virtio_user_dev.c > >> index e7fd65f..5b81676 100644 > >> --- a/drivers/net/virtio/virtio_user/virtio_user_dev.c > >> +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c > >> @@ -234,7 +234,7 @@ int virtio_user_stop_device(struct virtio_user_dev > *dev) > >>uint32_t i, q; > >> > >>dev->vhostfd = -1; > >> - for (i = 0; i < VIRTIO_MAX_VIRTQUEUES * 2 + 1; ++i) { > >> + for (i = 0; i < VIRTIO_MAX_VIRTQUEUES; ++i) { > > > >Right, we don't need setup callfd and kickfd for the ctrl-queue. > > I did not remove the ctrl-queue. I just redefined the MACRO according to DRY > principle: Oh, right, sorry, I overlooked it. Then for this patch, it's okay to me. > #define VIRTIO_MAX_VIRTQUEUES VIRTIO_MAX_VIRTQUEUE_PAIRS * 2 + 1 > I noticed that I missed the bracket in the MACRO and will add it in next > version. Yes, please. Also, please put a "Cc: sta...@dpdk.org" before your Signed-of-by: it's a candidate for stable releases. --yliu
[dpdk-dev] [PATCH v4 1/4] net/tap: move private elements to external header
In the next patch, access to struct pmd_internals will be necessary in tap_flow.c to store the flows. Signed-off-by: Pascal Mazon Acked-by: Olga Shern --- drivers/net/tap/Makefile | 1 + drivers/net/tap/rte_eth_tap.c | 35 ++-- drivers/net/tap/rte_eth_tap.h | 74 +++ 3 files changed, 77 insertions(+), 33 deletions(-) create mode 100644 drivers/net/tap/rte_eth_tap.h diff --git a/drivers/net/tap/Makefile b/drivers/net/tap/Makefile index e18f30c56f52..bdbe69e62a4e 100644 --- a/drivers/net/tap/Makefile +++ b/drivers/net/tap/Makefile @@ -40,6 +40,7 @@ EXPORT_MAP := rte_pmd_tap_version.map LIBABIVER := 1 CFLAGS += -O3 +CFLAGS += -I$(SRCDIR) CFLAGS += $(WERROR_FLAGS) # diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c index 7557abb2ebfc..536fd206e789 100644 --- a/drivers/net/tap/rte_eth_tap.c +++ b/drivers/net/tap/rte_eth_tap.c @@ -51,6 +51,8 @@ #include #include +#include + /* Linux based path to the TUN device */ #define TUN_TAP_DEV_PATH"/dev/net/tun" #define DEFAULT_TAP_NAME"dtap" @@ -81,39 +83,6 @@ static struct rte_eth_link pmd_link = { .link_autoneg = ETH_LINK_SPEED_AUTONEG }; -struct pkt_stats { - uint64_t opackets; /* Number of output packets */ - uint64_t ipackets; /* Number of input packets */ - uint64_t obytes;/* Number of bytes on output */ - uint64_t ibytes;/* Number of bytes on input */ - uint64_t errs; /* Number of error packets */ -}; - -struct rx_queue { - struct rte_mempool *mp; /* Mempool for RX packets */ - uint16_t in_port; /* Port ID */ - int fd; - - struct pkt_stats stats; /* Stats for this RX queue */ -}; - -struct tx_queue { - int fd; - struct pkt_stats stats; /* Stats for this TX queue */ -}; - -struct pmd_internals { - char name[RTE_ETH_NAME_MAX_LEN];/* Internal Tap device name */ - uint16_t nb_queues; /* Number of queues supported */ - struct ether_addr eth_addr; /* Mac address of the device port */ - - int if_index; /* IF_INDEX for the port */ - int ioctl_sock; /* socket for ioctl calls */ - - struct rx_queue rxq[RTE_PMD_TAP_MAX_QUEUES];/* List of RX queues */ - struct tx_queue txq[RTE_PMD_TAP_MAX_QUEUES];/* List of TX queues */ -}; - static int tap_ioctl(struct pmd_internals *pmd, unsigned long request, struct ifreq *ifr, int set); diff --git a/drivers/net/tap/rte_eth_tap.h b/drivers/net/tap/rte_eth_tap.h new file mode 100644 index ..880ec1b4fc8c --- /dev/null +++ b/drivers/net/tap/rte_eth_tap.h @@ -0,0 +1,74 @@ +/*- + * BSD LICENSE + * + * Copyright 2017 6WIND S.A. + * Copyright 2017 Mellanox. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of 6WIND S.A. nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _RTE_ETH_TAP_H_ +#define _RTE_ETH_TAP_H_ + +#include + +#include +#include + +#define RTE_PMD_TAP_MAX_QUEUES 16 + +struct pkt_stats { + uint64_t opackets; /* Number of output packets */ + uint64_t ipackets; /* Number of input packets */ + uint64_t obytes; /* Number of bytes on output */ + uint64_t ibytes; /* Number of bytes on input */ + uint64_t errs; /* Number of error packets */ +}; + +struct rx_queue { + struct rte_mempool *mp; /*
[dpdk-dev] [PATCH v4 2/4] net/tap: add preliminary support for rte_flow
The flow API provides the ability to classify packets received by a tap netdevice. This patch only implements skeleton functions for flow API support, no patterns are supported yet. Signed-off-by: Pascal Mazon Acked-by: Olga Shern --- doc/guides/nics/features/tap.ini | 1 + drivers/net/tap/Makefile | 1 + drivers/net/tap/rte_eth_tap.c| 6 ++ drivers/net/tap/rte_eth_tap.h| 2 + drivers/net/tap/tap_flow.c | 185 +++ drivers/net/tap/tap_flow.h | 46 ++ 6 files changed, 241 insertions(+) create mode 100644 drivers/net/tap/tap_flow.c create mode 100644 drivers/net/tap/tap_flow.h diff --git a/doc/guides/nics/features/tap.ini b/doc/guides/nics/features/tap.ini index a51712dce066..9d73f61cca3b 100644 --- a/doc/guides/nics/features/tap.ini +++ b/doc/guides/nics/features/tap.ini @@ -9,6 +9,7 @@ Jumbo frame = Y Promiscuous mode = Y Allmulticast mode= Y Basic stats = Y +Flow API = Y MTU update = Y Multicast MAC filter = Y Speed capabilities = Y diff --git a/drivers/net/tap/Makefile b/drivers/net/tap/Makefile index bdbe69e62a4e..386b8b0594d3 100644 --- a/drivers/net/tap/Makefile +++ b/drivers/net/tap/Makefile @@ -47,6 +47,7 @@ CFLAGS += $(WERROR_FLAGS) # all source are stored in SRCS-y # SRCS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += rte_eth_tap.c +SRCS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += tap_flow.c # this lib depends upon: DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += lib/librte_eal diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c index 536fd206e789..78eac9a11ea0 100644 --- a/drivers/net/tap/rte_eth_tap.c +++ b/drivers/net/tap/rte_eth_tap.c @@ -52,6 +52,7 @@ #include #include +#include /* Linux based path to the TUN device */ #define TUN_TAP_DEV_PATH"/dev/net/tun" @@ -435,6 +436,7 @@ tap_dev_close(struct rte_eth_dev *dev __rte_unused) struct pmd_internals *internals = dev->data->dev_private; tap_link_set_down(dev); + tap_flow_flush(dev, NULL); for (i = 0; i < internals->nb_queues; i++) { if (internals->rxq[i].fd != -1) @@ -758,6 +760,7 @@ static const struct eth_dev_ops ops = { .stats_get = tap_stats_get, .stats_reset= tap_stats_reset, .dev_supported_ptypes_get = tap_dev_supported_ptypes_get, + .filter_ctrl= tap_dev_filter_ctrl, }; static int @@ -829,6 +832,8 @@ eth_dev_tap_create(const char *name, char *tap_name) pmd->txq[i].fd = -1; } + LIST_INIT(&pmd->flows); + return 0; error_exit: @@ -942,6 +947,7 @@ rte_pmd_tap_remove(const char *name) return 0; internals = eth_dev->data->dev_private; + tap_flow_flush(eth_dev, NULL); for (i = 0; i < internals->nb_queues; i++) if (internals->rxq[i].fd != -1) close(internals->rxq[i].fd); diff --git a/drivers/net/tap/rte_eth_tap.h b/drivers/net/tap/rte_eth_tap.h index 880ec1b4fc8c..a64116f5a35e 100644 --- a/drivers/net/tap/rte_eth_tap.h +++ b/drivers/net/tap/rte_eth_tap.h @@ -34,6 +34,7 @@ #ifndef _RTE_ETH_TAP_H_ #define _RTE_ETH_TAP_H_ +#include #include #include @@ -67,6 +68,7 @@ struct pmd_internals { struct ether_addr eth_addr; /* Mac address of the device port */ int if_index; /* IF_INDEX for the port */ int ioctl_sock; /* socket for ioctl calls */ + LIST_HEAD(tap_flows, rte_flow) flows;/* rte_flow rules */ struct rx_queue rxq[RTE_PMD_TAP_MAX_QUEUES]; /* List of RX queues */ struct tx_queue txq[RTE_PMD_TAP_MAX_QUEUES]; /* List of TX queues */ }; diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c new file mode 100644 index ..c32ed382d745 --- /dev/null +++ b/drivers/net/tap/tap_flow.c @@ -0,0 +1,185 @@ +/*- + * BSD LICENSE + * + * Copyright 2017 6WIND S.A. + * Copyright 2017 Mellanox. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of 6WIND S.A. nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES
[dpdk-dev] [PATCH v4 0/4] net/tap: support flow API
This series add support for the flow API in tap PMD. It enables filtering specific packets incoming on the tap netdevice, to process only desired ones. Under the hood, it uses kernel TC (traffic control), which takes place very early in the stack, and supports most common pattern items and actions defined in the flow API. This series applies on top of: [PATCH 0/6] net/tap: add additional management ops v2 changes: - support compilation on kernels < 4.2 (where flower support appeared) - set whitespaces in tap.h - remove unnecessary goto v3 changes: - vlan patterns enabled depending on running kernel (4.9+) - update doc/guides/nics/tap.rst for Flow API support - rebase on top of "net/tap: add additional management ops" series v4 changes: - rebase on top of "net/tap: add additional management ops" series - fix a few netlink doxygen comments - rename tap.h -> rte_eth_tap.h - flush flow rules only when applicable Pascal Mazon (4): net/tap: move private elements to external header net/tap: add preliminary support for rte_flow net/tap: add netlink back-end for flow API net/tap: add basic flow API patterns and actions doc/guides/nics/features/tap.ini |1 + doc/guides/nics/tap.rst | 23 + drivers/net/tap/Makefile | 44 ++ drivers/net/tap/rte_eth_tap.c| 100 ++-- drivers/net/tap/rte_eth_tap.h| 79 +++ drivers/net/tap/tap_flow.c | 1078 ++ drivers/net/tap/tap_flow.h | 58 ++ drivers/net/tap/tap_netlink.c| 367 + drivers/net/tap/tap_netlink.h| 69 +++ drivers/net/tap/tap_tcmsgs.c | 378 + drivers/net/tap/tap_tcmsgs.h | 63 +++ 11 files changed, 2226 insertions(+), 34 deletions(-) create mode 100644 drivers/net/tap/rte_eth_tap.h create mode 100644 drivers/net/tap/tap_flow.c create mode 100644 drivers/net/tap/tap_flow.h create mode 100644 drivers/net/tap/tap_netlink.c create mode 100644 drivers/net/tap/tap_netlink.h create mode 100644 drivers/net/tap/tap_tcmsgs.c create mode 100644 drivers/net/tap/tap_tcmsgs.h -- 2.8.0.rc0
[dpdk-dev] [PATCH v4 3/4] net/tap: add netlink back-end for flow API
Each kernel netdevice may have queueing disciplines set for it, which determine how to handle the packet (mostly on egress). That's part of the TC (Traffic Control) mechanism. Through TC, it is possible to set filter rules that match specific packets, and act according to what is in the rule. This is a perfect candidate to implement the flow API for the tap PMD, as it has an associated kernel netdevice automatically. Each flow API rule will be translated into its TC counterpart. To leverage TC, it is necessary to communicate with the kernel using netlink. This patch introduces a library to help that communication. Inside netlink.c, functions are generic for any netlink messaging. Inside tcmsgs.c, functions are specific to deal with TC rules. Signed-off-by: Pascal Mazon Acked-by: Olga Shern --- drivers/net/tap/Makefile | 2 + drivers/net/tap/tap_netlink.c | 367 drivers/net/tap/tap_netlink.h | 69 drivers/net/tap/tap_tcmsgs.c | 378 ++ drivers/net/tap/tap_tcmsgs.h | 63 +++ 5 files changed, 879 insertions(+) create mode 100644 drivers/net/tap/tap_netlink.c create mode 100644 drivers/net/tap/tap_netlink.h create mode 100644 drivers/net/tap/tap_tcmsgs.c create mode 100644 drivers/net/tap/tap_tcmsgs.h diff --git a/drivers/net/tap/Makefile b/drivers/net/tap/Makefile index 386b8b0594d3..4ae2ca6cfbab 100644 --- a/drivers/net/tap/Makefile +++ b/drivers/net/tap/Makefile @@ -48,6 +48,8 @@ CFLAGS += $(WERROR_FLAGS) # SRCS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += rte_eth_tap.c SRCS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += tap_flow.c +SRCS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += tap_netlink.c +SRCS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += tap_tcmsgs.c # this lib depends upon: DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += lib/librte_eal diff --git a/drivers/net/tap/tap_netlink.c b/drivers/net/tap/tap_netlink.c new file mode 100644 index ..10f00d1931c6 --- /dev/null +++ b/drivers/net/tap/tap_netlink.c @@ -0,0 +1,367 @@ +/*- + * BSD LICENSE + * + * Copyright 2017 6WIND S.A. + * Copyright 2017 Mellanox. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of 6WIND S.A. nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +/* Must be quite large to support dumping a huge list of QDISC or filters. */ +#define BUF_SIZE (32 * 1024) /* Size of the buffer to receive kernel messages */ +#define SNDBUF_SIZE 32768 /* Send buffer size for the netlink socket */ +#define RCVBUF_SIZE 32768 /* Receive buffer size for the netlink socket */ + +struct nested_tail { + struct rtattr *tail; + struct nested_tail *prev; +}; + +/** + * Initialize a netlink socket for communicating with the kernel. + * + * @return + * netlink socket file descriptor on success, -1 otherwise. + */ +int +nl_init(void) +{ + int fd, sndbuf_size = SNDBUF_SIZE, rcvbuf_size = RCVBUF_SIZE; + struct sockaddr_nl local = { .nl_family = AF_NETLINK }; + + fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE); + if (fd < 0) { + RTE_LOG(ERR, PMD, "Unable to create a netlink socket\n"); + return -1; + } + if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf_size, sizeof(int))) { + RTE_LOG(ERR, PMD, "Unable to set socket buffer send size\n"); + return -1; + } + if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvb
[dpdk-dev] [PATCH v4 4/4] net/tap: add basic flow API patterns and actions
Supported flow rules are now mapped to TC rules on the tap netdevice. The netlink message used for creating the TC rule is stored in struct rte_flow. That way, by simply changing a metadata in it, we can require for the rule deletion without further parsing. Supported items: - eth: src and dst (with variable masks), and eth_type (0x mask). - vlan: vid, pcp, tpid, but not eid. - ipv4/6: src and dst (with variable masks), and ip_proto (0x mask). - udp/tcp: src and dst port (0x) mask. Supported actions: - DROP - QUEUE - PASSTHRU It is generally not possible to provide a "last" item. However, if the "last" item, once masked, is identical to the masked spec, then it is supported. Only IPv4/6 and MAC addresses can use a variable mask. All other items need a full mask (exact match). Support for VLAN requires kernel headers >= 4.9, checked using auto-config.sh. Signed-off-by: Pascal Mazon Acked-by: Olga Shern --- doc/guides/nics/tap.rst | 23 ++ drivers/net/tap/Makefile | 40 ++ drivers/net/tap/rte_eth_tap.c | 61 ++- drivers/net/tap/rte_eth_tap.h | 3 + drivers/net/tap/tap_flow.c| 919 +- drivers/net/tap/tap_flow.h| 12 + 6 files changed, 1043 insertions(+), 15 deletions(-) diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst index c4f207be3b47..cdb528b5eae4 100644 --- a/doc/guides/nics/tap.rst +++ b/doc/guides/nics/tap.rst @@ -82,6 +82,29 @@ can utilize that stack to handle the network protocols. Plus you would be able to address the interface using an IP address assigned to the internal interface. +Flow API support + + +The tap PMD supports major flow API pattern items and actions, when running on +linux kernels above 4.2 ("Flower" classifier required). Supported items: + +- eth: src and dst (with variable masks), and eth_type (0x mask). +- vlan: vid, pcp, tpid, but not eid. (requires kernel 4.9) +- ipv4/6: src and dst (with variable masks), and ip_proto (0x mask). +- udp/tcp: src and dst port (0x) mask. + +Supported actions: + +- DROP +- QUEUE +- PASSTHRU + +It is generally not possible to provide a "last" item. However, if the "last" +item, once masked, is identical to the masked spec, then it is supported. + +Only IPv4/6 and MAC addresses can use a variable mask. All other items need a +full mask (exact match). + Example --- diff --git a/drivers/net/tap/Makefile b/drivers/net/tap/Makefile index 4ae2ca6cfbab..a6542dad1d66 100644 --- a/drivers/net/tap/Makefile +++ b/drivers/net/tap/Makefile @@ -41,6 +41,7 @@ LIBABIVER := 1 CFLAGS += -O3 CFLAGS += -I$(SRCDIR) +CFLAGS += -I. CFLAGS += $(WERROR_FLAGS) # @@ -57,5 +58,44 @@ DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += lib/librte_mbuf DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += lib/librte_mempool DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += lib/librte_ether DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += lib/librte_kvargs +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += lib/librte_hash include $(RTE_SDK)/mk/rte.lib.mk + +# Generate and clean-up tap_autoconf.h. + +export CC CFLAGS CPPFLAGS EXTRA_CFLAGS EXTRA_CPPFLAGS +export AUTO_CONFIG_CFLAGS = -Wno-error + +ifndef V +AUTOCONF_OUTPUT := >/dev/null +endif + +tap_autoconf.h.new: FORCE + +tap_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh + $Q $(RM) -f -- '$@' + $Q sh -- '$<' '$@' \ + HAVE_TC_FLOWER \ + linux/pkt_cls.h \ + enum TCA_FLOWER_UNSPEC \ + $(AUTOCONF_OUTPUT) + $Q sh -- '$<' '$@' \ + HAVE_TC_VLAN_ID \ + linux/pkt_cls.h \ + enum TCA_FLOWER_KEY_VLAN_PRIO \ + $(AUTOCONF_OUTPUT) + +# Create tap_autoconf.h or update it in case it differs from the new one. + +tap_autoconf.h: tap_autoconf.h.new + $Q [ -f '$@' ] && \ + cmp '$<' '$@' $(AUTOCONF_OUTPUT) || \ + mv '$<' '$@' + +$(SRCS-$(CONFIG_RTE_LIBRTE_PMD_TAP):.c=.o): tap_autoconf.h + +clean_tap: FORCE + $Q rm -f -- tap_autoconf.h tap_autoconf.h.new + +clean: clean_tap diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c index 78eac9a11ea0..ed2099212e2a 100644 --- a/drivers/net/tap/rte_eth_tap.c +++ b/drivers/net/tap/rte_eth_tap.c @@ -42,17 +42,20 @@ #include #include #include +#include #include #include #include #include -#include +#include #include #include +#include #include #include #include +#include /* Linux based path to the TUN device */ #define TUN_TAP_DEV_PATH"/dev/net/tun" @@ -67,6 +70,9 @@ #define RTE_PMD_TAP_MAX_QUEUES 1 #endif +#define FLOWER_KERNEL_VERSION KERNEL_VERSION(4, 2, 0) +#define FLOWER_VLAN_KERNEL_VERSION KERNEL_VERSION(4, 9, 0) + static struct rte_vdev_driver pmd_tap_drv; static const char *valid_arguments[] = { @@ -159,6 +165,28 @@ tun_alloc(struct pmd_internals *pmd, uint16_t qid) goto error; rte_memcpy(&pmd->eth_addr, ifr.ifr_hwaddr.sa
[dpdk-dev] [PATCH v4] net/tap: remote netdevice traffic capture
This patchset adds the special "remote" feature to the tap PMD, that actually enables capturing traffic from another netdevice. This is especially useful to get packets into the DPDK app, when the remote netdevice has no DPDK support. The "remote" feature requires flow API support as flow rules will be configured on the remote netdevice for redirection, using the same mechanism. This series applies on top of: [PATCH 0/4] net/tap: support flow API v2 changes: - rebase on top of updated "net/tap: support flow API" - fix implicit flow flush when closing the netdevices v3 changes: - memset(0) for remote_iface in rte_pmd_tap_probe() - use snprintf instead of strncpy to correctly handle terminating \0 v4 changes: - rebase on top of updated "net/tap: support flow API" - use only a single patch now as MTU, MAC and flags can be easily managed with tap_ioctl() Pascal Mazon (1): net/tap: add remote netdevice traffic capture doc/guides/nics/tap.rst | 17 ++ drivers/net/tap/rte_eth_tap.c | 101 +- drivers/net/tap/rte_eth_tap.h | 4 + drivers/net/tap/tap_flow.c| 451 -- drivers/net/tap/tap_flow.h| 24 +++ 5 files changed, 575 insertions(+), 22 deletions(-) -- 2.8.0.rc0
[dpdk-dev] [PATCH v4] net/tap: add remote netdevice traffic capture
By default, a tap netdevice is of no use when not fed by a separate process. The ability to automatically feed it from another netdevice allows applications to capture any kind of traffic normally destined to the kernel stack. This patch implements this ability through a new optional "remote" parameter. Packets matching filtering rules created with the flow API are matched on the remote device and redirected to the tap PMD, where the relevant action will be performed. Signed-off-by: Pascal Mazon Acked-by: Olga Shern --- doc/guides/nics/tap.rst | 17 ++ drivers/net/tap/rte_eth_tap.c | 101 +- drivers/net/tap/rte_eth_tap.h | 4 + drivers/net/tap/tap_flow.c| 451 -- drivers/net/tap/tap_flow.h| 24 +++ 5 files changed, 575 insertions(+), 22 deletions(-) diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst index cdb528b5eae4..676a569b00ca 100644 --- a/doc/guides/nics/tap.rst +++ b/doc/guides/nics/tap.rst @@ -58,6 +58,23 @@ needed, but the interface does not enforce that speed, for example:: --vdev=net_tap0,iface=foo0,speed=25000 +It is possible to specify a remote netdevice to capture packets from by adding +``remote=foo1``, for example:: + + --vdev=net_tap,iface=tap0,remote=foo1 + +If a ``remote`` is set, then all packets with the tap PMD's local MAC coming +in on the remote netdevice will be redirected to the tap. +If the tap is in promiscuous mode, then all packets will be redirected. +In allmulti mode, all multicast packets will be redirected. +It is possible to add explicit rte_flow rules on the tap PMD to capture specific +traffic. For instance, in testpmd, the following rte_flow rule would capture +packets with the given MAC address from the remote, and send it to the tap RX +QUEUE 3:: + + testpmd> flow create 0 ingress pattern eth src is 02:03:04:05:06:07 / \ +end actions queue index 3 / end + After the DPDK application is started you can send and receive packets on the interface using the standard rx_burst/tx_burst APIs in DPDK. From the host point of view you can use any host tool like tcpdump, Wireshark, ping, Pktgen diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c index ed2099212e2a..b4601d49fa2c 100644 --- a/drivers/net/tap/rte_eth_tap.c +++ b/drivers/net/tap/rte_eth_tap.c @@ -63,6 +63,7 @@ #define ETH_TAP_IFACE_ARG "iface" #define ETH_TAP_SPEED_ARG "speed" +#define ETH_TAP_REMOTE_ARG "remote" #ifdef IFF_MULTI_QUEUE #define RTE_PMD_TAP_MAX_QUEUES 16 @@ -78,6 +79,7 @@ static struct rte_vdev_driver pmd_tap_drv; static const char *valid_arguments[] = { ETH_TAP_IFACE_ARG, ETH_TAP_SPEED_ARG, + ETH_TAP_REMOTE_ARG, NULL }; @@ -187,10 +189,43 @@ tun_alloc(struct pmd_internals *pmd, uint16_t qid) pmd->name); return fd; } + if (pmd->remote_if_index) { + /* +* Flush usually returns negative value because it tries +* to delete every QDISC (and on a running device, one +* QDISC at least is needed). Ignore negative return +* value. +*/ + qdisc_flush(pmd->nlsk_fd, pmd->remote_if_index); + if (qdisc_create_ingress(pmd->nlsk_fd, +pmd->remote_if_index) < 0) + goto remote_fail; + LIST_INIT(&pmd->implicit_flows); + if (tap_flow_implicit_create( + pmd, TAP_REMOTE_LOCAL_MAC) < 0) + goto remote_fail; + if (tap_flow_implicit_create( + pmd, TAP_REMOTE_BROADCAST) < 0) + goto remote_fail; + if (tap_flow_implicit_create( + pmd, TAP_REMOTE_BROADCASTV6) < 0) + goto remote_fail; + if (tap_flow_implicit_create( + pmd, TAP_REMOTE_TX) < 0) + goto remote_fail; + } } return fd; +remote_fail: + RTE_LOG(ERR, PMD, + "Could not set up remote flow rules for %s: remote disabled.\n", + pmd->name); + pmd->remote_if_index = 0; + tap_flow_implicit_flush(pmd, NULL); + return fd; + error: if (fd > 0) close(fd); @@ -289,8 +324,17 @@ tap_ioctl(struct pmd_internals *pmd, unsigned long request, struct ifreq *ifr, int set) { short req_flags = ifr->ifr_flags; + int remote = !!pmd->remote_if_index; - snprintf(ifr->ifr_name, IFNAMSIZ, "%s", pmd->name); + /* +* If there is a remote netdevice, apply ioctl
Re: [dpdk-dev] [PATCH v1 12/14] ring: separate out head index manipulation for enq/deq
On Wed, 8 Mar 2017 12:06:54 +, Bruce Richardson wrote: > On Wed, Mar 08, 2017 at 11:49:06AM +0100, Olivier MATZ wrote: > > On Thu, 23 Feb 2017 17:24:05 +, Bruce Richardson > > wrote: > > > We can write a single common function for head manipulation for enq > > > and a common one for deq, allowing us to have a single worker function > > > for enq and deq, rather than two of each. Update all other inline > > > functions to use the new functions. > > > > > > Signed-off-by: Bruce Richardson > > > --- > > > lib/librte_ring/rte_ring.c | 4 +- > > > lib/librte_ring/rte_ring.h | 328 > > > - > > > 2 files changed, 149 insertions(+), 183 deletions(-) > > > > > > > [...] > > > > > +static inline __attribute__((always_inline)) unsigned int > > > +__rte_ring_do_enqueue(struct rte_ring *r, void * const *obj_table, > > > + unsigned int n, enum rte_ring_queue_behavior behavior, > > > + int is_sp, unsigned int *free_space) > > > { > > > - uint32_t prod_head, cons_tail; > > > - uint32_t prod_next, free_entries; > > > - uint32_t mask = r->mask; > > > - > > > - prod_head = r->prod.head; > > > - cons_tail = r->cons.tail; > > > - /* The subtraction is done between two unsigned 32bits value > > > - * (the result is always modulo 32 bits even if we have > > > - * prod_head > cons_tail). So 'free_entries' is always between 0 > > > - * and size(ring)-1. */ > > > - free_entries = mask + cons_tail - prod_head; > > > - > > > - /* check that we have enough room in ring */ > > > - if (unlikely(n > free_entries)) > > > - n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : free_entries; > > > + uint32_t prod_head, prod_next; > > > + uint32_t free_entries; > > > > > > + n = __rte_ring_move_prod_head(r, is_sp, n, behavior, > > > + &prod_head, &prod_next, &free_entries); > > > if (n == 0) > > > goto end; > > > > > > - > > > - prod_next = prod_head + n; > > > - r->prod.head = prod_next; > > > - > > > - /* write entries in ring */ > > > ENQUEUE_PTRS(); > > > rte_smp_wmb(); > > > > > > + /* > > > + * If there are other enqueues in progress that preceded us, > > > + * we need to wait for them to complete > > > + */ > > > + while (unlikely(r->prod.tail != prod_head)) > > > + rte_pause(); > > > + > > > > I'd say this part should not be done in case is_sp == 1. > > Since it is sometimes a constant arg in an inline func, it may be better > > to add the if (is_sp == 0). > > > > [...] > > > > Yes, it's an unnecessary check. However, having it in place for the sp > case made no performance difference in my test, so I decided to keep > the code shorter by avoiding an additional branch. If there is a > performance hit I'll remove it, but I would rather not add more branches > to the code in the absense of a real impact to not having them. Ok. Maybe it's worth checking the numbers given by the unit test. Olivier
Re: [dpdk-dev] [PATCH v2 07/14] ring: make bulk and burst fn return vals consistent
On Wed, 8 Mar 2017 12:08:42 +, Bruce Richardson wrote: > On Wed, Mar 08, 2017 at 11:22:40AM +0100, Olivier MATZ wrote: > > On Tue, 7 Mar 2017 11:32:10 +, Bruce Richardson > > wrote: > > > The bulk fns for rings returns 0 for all elements enqueued and negative > > > for no space. Change that to make them consistent with the burst functions > > > in returning the number of elements enqueued/dequeued, i.e. 0 or N. > > > This change also allows the return value from enq/deq to be used directly > > > without a branch for error checking. > > > > > > Signed-off-by: Bruce Richardson > > > > [...] > > > > > @@ -716,7 +695,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * > > > const *obj_table, > > > static inline int __attribute__((always_inline)) > > > rte_ring_mp_enqueue(struct rte_ring *r, void *obj) > > > { > > > - return rte_ring_mp_enqueue_bulk(r, &obj, 1); > > > + return rte_ring_mp_enqueue_bulk(r, &obj, 1) ? 0 : -ENOBUFS; > > > } > > > > > > /** > > > > I'm wondering if these functions (enqueue/dequeue of one element) should > > be modified to return 0 (fail) or 1 (success) too, for consistency with > > the bulk functions. > > > > Any opinion? > > > I thought about that, but I would view it as risky, unless we want to go > changing the parameters to the function also, as the compiler won't flag > a change in return value like that. > Ok, I have no better solution anyway. Olivier
Re: [dpdk-dev] [PATCH v2 00/14] refactor and cleanup of rte_ring
On Tue, 7 Mar 2017 11:32:03 +, Bruce Richardson wrote: > NOTE: this set depends on the v2 cleanup set sent previously. > http://dpdk.org/ml/archives/dev/2017-February/thread.html#58200 > > This patchset make a set of, sometimes non-backward compatible, cleanup > changes to the rte_ring code in order to improve it. The resulting code is > shorter, since the existing functions are restructured to reduce code > duplication, as well as being more consistent in behaviour. The specific > changes made are explained in each patch which makes that change. > > Changes in V2: > * Eliminated extra cacheline padding where cachelines are 128B > * Renamed rte_ring_ht_ptr struct to rte_ring_headtail > * Removed missed references to ring watermarks in test code and docs > > This patchset is largely the same as that posted previously on-list as > an RFC: > http://dpdk.org/ml/archives/dev/2017-February/thread.html#56982 > > Changes in V1 from RFC: > * Included release notes updates as changes are made in each patch > * Fixed some missed comment updates when changing the code > * Separated some initial fixup patches from this set to send separately > * Dropped the final two patches for an rte_event_ring, as not relevant > for this set. That can be done as a separate set later. > * The macros for copying the pointers have an extra parameter added, > indicating the start of the ring buffer itself. This allows more > flexibility for reusing them in other ring implementations. > > Bruce Richardson (14): > ring: remove split cacheline build setting > ring: create common structure for prod and cons metadata > ring: eliminate duplication of size and mask fields > ring: remove debug setting > ring: remove the yield when waiting for tail update > ring: remove watermark support > ring: make bulk and burst fn return vals consistent > ring: allow enqueue fns to return free space value > ring: allow dequeue fns to return remaining entry count > examples/quota_watermark: use ring space for watermarks > ring: reduce scope of local variables > ring: separate out head index manipulation for enq/deq > ring: create common function for updating tail idx > ring: make ring struct and enq/deq macros type agnostic > > app/pdump/main.c | 2 +- > config/common_base | 3 - > doc/guides/prog_guide/env_abstraction_layer.rst| 5 - > doc/guides/prog_guide/ring_lib.rst | 15 - > doc/guides/rel_notes/release_17_05.rst | 32 + > doc/guides/sample_app_ug/server_node_efd.rst | 2 +- > drivers/crypto/null/null_crypto_pmd.c | 2 +- > drivers/net/bonding/rte_eth_bond_pmd.c | 3 +- > drivers/net/ring/rte_eth_ring.c| 4 +- > examples/distributor/main.c| 5 +- > examples/load_balancer/runtime.c | 34 +- > .../client_server_mp/mp_client/client.c| 9 +- > .../client_server_mp/mp_server/main.c | 2 +- > examples/packet_ordering/main.c| 13 +- > examples/qos_sched/app_thread.c| 14 +- > examples/quota_watermark/qw/init.c | 5 +- > examples/quota_watermark/qw/main.c | 21 +- > examples/quota_watermark/qw/main.h | 1 + > examples/quota_watermark/qwctl/commands.c | 4 +- > examples/quota_watermark/qwctl/qwctl.c | 2 + > examples/quota_watermark/qwctl/qwctl.h | 1 + > examples/server_node_efd/node/node.c | 2 +- > examples/server_node_efd/server/main.c | 2 +- > lib/librte_hash/rte_cuckoo_hash.c | 5 +- > lib/librte_mempool/rte_mempool_ring.c | 12 +- > lib/librte_pdump/rte_pdump.c | 2 +- > lib/librte_port/rte_port_frag.c| 3 +- > lib/librte_port/rte_port_ras.c | 2 +- > lib/librte_port/rte_port_ring.c| 34 +- > lib/librte_ring/rte_ring.c | 76 +-- > lib/librte_ring/rte_ring.h | 760 > - > test/test-pipeline/pipeline_hash.c | 5 +- > test/test-pipeline/runtime.c | 19 +- > test/test/autotest_test_funcs.py | 7 - > test/test/commands.c | 52 -- > test/test/test_link_bonding_mode4.c| 6 +- > test/test/test_pmd_ring_perf.c | 12 +- > test/test/test_ring.c | 704 +++ > test/test/test_ring_perf.c | 36 +- > test/test/test_table_acl.c | 2 +- > test/test/test_table_pipeline.c| 2 +- > test/test/test_table_ports.c | 12 +- > test/test/virtual_pmd.c
[dpdk-dev] [PATCH v4] mbuf: use pktmbuf helper to create the pool
From: Hemant Agrawal When possible, replace the uses of rte_mempool_create() with the helper provided in librte_mbuf: rte_pktmbuf_pool_create(). This is the preferred way to create a mbuf pool. This also updates the documentation. Signed-off-by: Olivier Matz Signed-off-by: Hemant Agrawal Acked-by: Olivier Matz --- Hi Hemant, I noticed I did not ack this patch. I just rebased it on top of master branch (app/test was moved). v4: * rebase on top of master v3: * removing changes from ip_reassembly app v2: * removing compilation error from ip_reassmebly * fix minor patch apply warnings doc/guides/sample_app_ug/ipv4_multicast.rst| 12 doc/guides/sample_app_ug/l2_forward_job_stats.rst | 33 -- .../sample_app_ug/l2_forward_real_virtual.rst | 26 +++-- doc/guides/sample_app_ug/ptpclient.rst | 11 ++-- doc/guides/sample_app_ug/quota_watermark.rst | 26 ++--- drivers/net/bonding/rte_eth_bond_8023ad.c | 13 - examples/ip_pipeline/init.c| 18 ++-- examples/multi_process/l2fwd_fork/main.c | 13 +++-- examples/tep_termination/main.c| 16 +-- lib/librte_mbuf/rte_mbuf.c | 7 +++-- lib/librte_mbuf/rte_mbuf.h | 29 +++ test/test/test_link_bonding_rssconf.c | 11 12 files changed, 90 insertions(+), 125 deletions(-) diff --git a/doc/guides/sample_app_ug/ipv4_multicast.rst b/doc/guides/sample_app_ug/ipv4_multicast.rst index 2f65eed..3e30f50 100644 --- a/doc/guides/sample_app_ug/ipv4_multicast.rst +++ b/doc/guides/sample_app_ug/ipv4_multicast.rst @@ -145,12 +145,12 @@ Memory pools for indirect buffers are initialized differently from the memory po .. code-block:: c -packet_pool = rte_mempool_create("packet_pool", NB_PKT_MBUF, PKT_MBUF_SIZE, 32, sizeof(struct rte_pktmbuf_pool_private), - rte_pktmbuf_pool_init, NULL, rte_pktmbuf_init, NULL, rte_socket_id(), 0); - -header_pool = rte_mempool_create("header_pool", NB_HDR_MBUF, HDR_MBUF_SIZE, 32, 0, NULL, NULL, rte_pktmbuf_init, NULL, rte_socket_id(), 0); -clone_pool = rte_mempool_create("clone_pool", NB_CLONE_MBUF, -CLONE_MBUF_SIZE, 32, 0, NULL, NULL, rte_pktmbuf_init, NULL, rte_socket_id(), 0); +packet_pool = rte_pktmbuf_pool_create("packet_pool", NB_PKT_MBUF, 32, + 0, PKT_MBUF_DATA_SIZE, rte_socket_id()); +header_pool = rte_pktmbuf_pool_create("header_pool", NB_HDR_MBUF, 32, + 0, HDR_MBUF_DATA_SIZE, rte_socket_id()); +clone_pool = rte_pktmbuf_pool_create("clone_pool", NB_CLONE_MBUF, 32, + 0, 0, rte_socket_id()); The reason for this is because indirect buffers are not supposed to hold any packet data and therefore can be initialized with lower amount of reserved memory for each buffer. diff --git a/doc/guides/sample_app_ug/l2_forward_job_stats.rst b/doc/guides/sample_app_ug/l2_forward_job_stats.rst index 456e85e..1029cc8 100644 --- a/doc/guides/sample_app_ug/l2_forward_job_stats.rst +++ b/doc/guides/sample_app_ug/l2_forward_job_stats.rst @@ -193,36 +193,25 @@ and the application to store network packet data: .. code-block:: c /* create the mbuf pool */ -l2fwd_pktmbuf_pool = -rte_mempool_create("mbuf_pool", NB_MBUF, - MBUF_SIZE, 32, - sizeof(struct rte_pktmbuf_pool_private), - rte_pktmbuf_pool_init, NULL, - rte_pktmbuf_init, NULL, - rte_socket_id(), 0); +l2fwd_pktmbuf_pool = rte_pktmbuf_pool_create("mbuf_pool", NB_MBUF, + MEMPOOL_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, + rte_socket_id()); if (l2fwd_pktmbuf_pool == NULL) rte_exit(EXIT_FAILURE, "Cannot init mbuf pool\n"); The rte_mempool is a generic structure used to handle pools of objects. -In this case, it is necessary to create a pool that will be used by the driver, -which expects to have some reserved space in the mempool structure, -sizeof(struct rte_pktmbuf_pool_private) bytes. -The number of allocated pkt mbufs is NB_MBUF, with a size of MBUF_SIZE each. -A per-lcore cache of 32 mbufs is kept. +In this case, it is necessary to create a pool that will be used by the driver. +The number of allocated pkt mbufs is NB_MBUF, with a data room size of +RTE_MBUF_DEFAULT_BUF_SIZE each. +A per-lcore cache of MEMPOOL_CACHE_SIZE mbufs is kept. The memory is allocated in rte_socket_id() socket, but it is possible to extend this code to allocate one mbuf pool per socket. -Two callback pointers are also given to the rte_mempool_create() function: - -* The first callback pointer is to rte_pktmbuf_pool_init() and is used -to initialize the private data of the mempool, which is needed by the driver. -This function is provided by the mbuf API, bu
[dpdk-dev] Reg DPDK with unsupported NIC
Hi , Please find my query, Currently i am planning to develop DPDK APP (linux env). I do not have DPDK supported NIC. Till then i would still like to develop DPDK APP and want DPDK to use OS interface to TX/RX packets from NIC. How can i make it? I went through KNI and my understanding is you cannot use it - is this correct? In what way i can still develop DPDK APP with non supported NIC till get the DPDK NIC. Thanks Raman
Re: [dpdk-dev] i40e SR-IOV with Linux PF
2017-03-14 04:44, Wu, Jingjing: > From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] > > > Hi i40e developers, > > > > > > Referring to the VFD discussion, I thought basic behaviours were the > > > same regardless of the PF driver: > > > http://dpdk.org/ml/archives/dev/2016-December/053056.html > > > " > > > > In the meanwhile, we have some test models ongoing to validate > > > > combination of Linux and DPDK drivers for VF and PF. > > > > We'll fully support below 4 cases going forward. > > > > 1. DPDK PF + DPDK VF > > > > 2. DPDK PF + Linux VF > > > > 3. Linux PF + DPDK VF > > > > 4. Linux PF + Linux VF (it's not our scope) > > > [...] > > > > Linux PF + DPDK VF has been tested with 1.0 API long time ago. > > > > There is some test activities ongoing. > > > " > > > > > > I think the Linux PF case is important and deserves more consideration. > > > When looking at the code, specifically i40evf_vlan_offload_set() and > > > i40evf_vlan_pvid_set(), I read this: > > > " > > > /* Linux pf host doesn't support vlan offload yet */ > > > if (vf->version_major == I40E_DPDK_VERSION_MAJOR) { " > > > > > > Is there some work in progress on Linux side to get the same behaviour > > > as with a DPDK PF? > > > > > As I know, VFD features are marked with an "EXPERIMENTAL" tag. > And we are working on the extendable interface (feature based) with > PF kernel driver. The VLAN offload is not a VFD feature. It is a basic driver feature. It is said that it is supported in the documentation but it is not with a Linux PF. Please consider the rest of my email: > > At least, it must be documented in > > doc/guides/nics/features/i40e_vf.ini > > and marked as partially supported (P instead of Y) in > > doc/guides/nics/i40e.rst
Re: [dpdk-dev] i40e queues per VF
ping again 2017-03-08 12:25, Thomas Monjalon: > ping - still waiting for answers to below questions, please > > 2017-02-16 15:55, Thomas Monjalon: > > 2017-02-16 13:58, Wu, Jingjing: > > > From: Thomas Monjalon > > > > > > > > Hi, > > > > > > > > When reading the documentation, it is not easy to understand the > > > > capability of > > > > i40evf for the number of queues. > > > > > > > > First, please could you explain why we need a build-time config option? > > > > In the doc, there is neither justification nor tuning guidelines: > > > > > > > > http://dpdk.org/doc/guides/nics/i40e.html#config-file-options > > > > " > > > > CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_PF (default 64) Number of > > > > queues reserved for PF. > > > > CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VF (default 4) Number of > > > > queues reserved for each SR-IOV VF. > > > > " > > > > > > This number is used as initialization time to allocate queue number > > > for PF/VF for HW's queue pool. Will add more description in i40e.rst. > > > > The description "Number of queues reserved for each SR-IOV VF" seems > > partially wrong. Please explain it is a queue pair. > > > > > > I feel these are hard limits and should be some constants in the code, > > > > not some > > > > build configuration options. > > > > > > > > The other doc to look at is: > > > > http://dpdk.org/doc/guides/nics/intel_vf.html#intel-fortville-10-40-gigabit- > > > > ethernet-controller-vf-infrastructure > > > > " > > > > Each VF can have a maximum of 16 queue pairs. > > > > " > > > > > > > > Do we agree that a queue pair is 1 Rx queue / 1 Tx queue? > > > > Note: the concept of queue pairs in Intel VF should be explained > > > > somewhere. > > > > > > > Yes. > > > > Below, a different limitation is given: > > > > " > > > > The available queue number(at most 4) per VF depends on the total > > > > number of > > > > pool, which is determined by the max number of VF at PF initialization > > > > stage and > > > > the number of queue specified in config " > > > > > > > I think there may be some inconsistent description in doc intel_vf.rst > > > due to > > > Multiple kinds of NICs. We should correct them. > > > Thanks for pointing that. > > > > > > > So what is the real maximum of queue pairs? 4 or 16? > > > > The datasheet talks about 16 queues. Is it 8 pairs? > > > > > > That's is 16 queue pairs. 16 RX queues and 16 Tx queues. > > > > > > > > Is there something to configure the number of queues when creating VF > > > > with the > > > > kernel driver? > > > > > > In kernel driver, it seems at most only 4 queues are supported. That's > > > Why we add build-time config option to make more queues are possible. > > > > If we can create 16 queue pairs, why restrict the default configuration to > > 4? > > Why is it a build-time config option?
Re: [dpdk-dev] Reg DPDK with unsupported NIC
2017-03-14 10:28 keltezéssel, raman geetha gopalakrishnan írta: Hi , Please find my query, Currently i am planning to develop DPDK APP (linux env). I do not have DPDK supported NIC. Till then i would still like to develop DPDK APP and want DPDK to use OS interface to TX/RX packets from NIC. How can i make it? I went through KNI and my understanding is you cannot use it - is this correct? In what way i can still develop DPDK APP with non supported NIC till get the DPDK NIC. Thanks Raman Hi, My recommendation is to do your development in a KVM VM. DPDK works perfectly with virtio NICs. Geza
Re: [dpdk-dev] [PATCH v9 16/18] examples/distributor: give Rx thread a core
On 10/3/2017 4:51 PM, Bruce Richardson wrote: On Mon, Mar 06, 2017 at 09:10:31AM +, David Hunt wrote: This so that with the increased amount of stats we are counting, we don't interfere with the rx core. Where are the stats being counted in the current code and how would they interfere? /Bruce Previous version of the distributor example did not print out several lines of statistics every second, which the new version does. I felt that it it would be better to separate this off to it's own core, so that the rx core performance would not be impacted. I'll add an extra comment in the commit message. Regards, Dave.
Re: [dpdk-dev] Compiling DPDK 17.02 with a Busybox-based tar
2017-03-14 07:58, Raphael Cohn: > Hi, > > To compile DPDK on a system with Busybox tar installed, it's necessary to > make a small change to the build system:- > > sed -i -e '/--keep-newer-files/d' mk/rte.sdkinstall.mk > sed -i -e 's;--strip-components=1 \\;--strip-components=1;g' mk/ > rte.sdkinstall.mk > > I'm not sure whether the impact of this change fundamentally affects DPDK. > I'm a little surprised that tar is needed at all for a compile + install, > but I haven't investigated further. Is it being used to do a copy? Yes it is used to make a copy. It is a convenient one-liner. I am a bit surprised that you need to install DPDK with busybox. The busybox systems are generally cross-built and prepared out of the box, with the host tools. However, if you feel it is important to install DPDK on such target, you are welcome to propose a patch.
Re: [dpdk-dev] [PATCH] net/ixgbevf: fix stats update after a PF reset
Hi Wei, On Thu, 16 Feb 2017 17:49:22 +0100, Olivier Matz wrote: > Hi Wei, > > On Thu, 9 Feb 2017 14:50:05 +, "Dai, Wei" wrote: > > > -Original Message- > > > From: Dai, Wei > > > Sent: Thursday, February 9, 2017 10:38 PM > > > To: 'Olivier Matz' ; dev@dpdk.org; Zhang, > > > Helin ; Ananyev, Konstantin > > > > > > Cc: Guo Fengtian ; sta...@dpdk.org > > > Subject: RE: [dpdk-dev] [PATCH] net/ixgbevf: fix stats update after > > > a PF reset > > > > > > The stats register can rewind to zero when the port is running for > > > a long period. So I am afraid that this check is not always correct. > > > Why not introduce a variable to directly indicate whether the > > > resulted stats should be updated or not. > > > > Another way is to clear > > hw_stats->last_vfgprc/last_vfgorc/last_vfgptc/last_vfmprc at the same > > time PF is set down. > > > > Can we know easily in VF if the PF was set down? Any guideline for this? Or is it something we cannot fix? Thanks, Olivier
Re: [dpdk-dev] Reg DPDK with unsupported NIC
2017-03-14 14:58, raman geetha gopalakrishnan: > In what way i can still develop DPDK APP with non supported NIC till get > the DPDK NIC. You should check the bottom of this page (section Ohers): http://dpdk.org/doc/nics The simplest is to use the ring PMD which do not require a NIC at all. Or you can try the other solutions (pcap, tap, af_packet).
Re: [dpdk-dev] [PATCH 01/17] vhost: introduce driver features related APIs
On 03/03/2017 10:51 AM, Yuanhan Liu wrote: Introduce few APIs to set/get/enable/disable driver features. Signed-off-by: Yuanhan Liu --- lib/librte_vhost/rte_vhost_version.map | 10 lib/librte_vhost/rte_virtio_net.h | 9 lib/librte_vhost/socket.c | 90 ++ 3 files changed, 109 insertions(+) Nice! Reviewed-by: Maxime Coquelin I wonder whether we could protect from setting/enabling/disabling features once negotiation is done? Thanks, Maxime
Re: [dpdk-dev] Reg DPDK with unsupported NIC
Sent from my iPhone > On Mar 14, 2017, at 5:29 PM, raman geetha gopalakrishnan > wrote: > > Hi , > > Please find my query, Currently i am planning to develop DPDK APP (linux > env). I do not have DPDK supported NIC. Till then i would still like to > develop DPDK APP and want DPDK to use OS interface to TX/RX packets from > NIC. How can i make it? I went through KNI and my understanding is you > cannot use it - is this correct? > > In what way i can still develop DPDK APP with non supported NIC till get > the DPDK NIC. Take a look at the TAP PMD driver and look at the doc file in the doc directory. It will not be high performance but it can work nicely using tap and some like socat. > > Thanks > Raman
Re: [dpdk-dev] [PATCH 01/17] vhost: introduce driver features related APIs
On 03/14/2017 10:46 AM, Maxime Coquelin wrote: On 03/03/2017 10:51 AM, Yuanhan Liu wrote: Introduce few APIs to set/get/enable/disable driver features. Signed-off-by: Yuanhan Liu --- lib/librte_vhost/rte_vhost_version.map | 10 lib/librte_vhost/rte_virtio_net.h | 9 lib/librte_vhost/socket.c | 90 ++ 3 files changed, 109 insertions(+) Nice! Reviewed-by: Maxime Coquelin I wonder whether we could protect from setting/enabling/disabling features once negotiation is done? Oh, I forgot one comment on this patch. Since these new functions are part to the API, shouldn't them be documented? Maxime
[dpdk-dev] [PATCH] doc: add maintainer role about replying questions
The first line of the MAINTAINERS file are: " The intention of this file is to provide a set of names that we can rely on for helping in patch reviews and questions. " Unfortunately, some maintainers do not endorse their role for questions asked on the mailing list. Hope making it clear in the contribution guide will enforce more responsive maintainer participation. Signed-off-by: Thomas Monjalon --- doc/guides/contributing/patches.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/doc/guides/contributing/patches.rst b/doc/guides/contributing/patches.rst index fe71381..84a5dab 100644 --- a/doc/guides/contributing/patches.rst +++ b/doc/guides/contributing/patches.rst @@ -67,6 +67,7 @@ The role of the component maintainers is to: * Review patches for the component or delegate the review. The review should be done, ideally, within 1 week of submission to the mailing list. * Add an ``acked-by`` to patches, or patchsets, that are ready for committing to a tree. +* Reply to questions asked about the component. Component maintainers can be added or removed by submitting a patch to the ``MAINTAINERS`` file. Maintainers should have demonstrated a reasonable level of contributions or reviews to the component area. -- 2.7.0
[dpdk-dev] [PATCH v2] net/virtio-user: fix overflow
virtio-user limits the qeueue number to 8 but provides no limit check against the queue number input from user. If a bigger queue number (> 8) is given, there is an overflow issue. Doing a sanity check could avoid it. Fixes: 37a7eb2ae816 ("net/virtio-user: add device emulation layer") Cc: sta...@dpdk.org Signed-off-by: Wenfeng Liu --- drivers/net/virtio/virtio_pci.h | 3 ++- drivers/net/virtio/virtio_user/virtio_user_dev.c | 2 +- drivers/net/virtio/virtio_user/virtio_user_dev.h | 6 +++--- drivers/net/virtio/virtio_user_ethdev.c | 7 +++ 4 files changed, 13 insertions(+), 5 deletions(-) diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h index 59e45c4..1302556 100644 --- a/drivers/net/virtio/virtio_pci.h +++ b/drivers/net/virtio/virtio_pci.h @@ -160,7 +160,8 @@ /* * Maximum number of virtqueues per device. */ -#define VIRTIO_MAX_VIRTQUEUES 8 +#define VIRTIO_MAX_VIRTQUEUE_PAIRS 8 +#define VIRTIO_MAX_VIRTQUEUES (VIRTIO_MAX_VIRTQUEUE_PAIRS * 2 + 1) /* Common configuration */ #define VIRTIO_PCI_CAP_COMMON_CFG 1 diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c index e7fd65f..5b81676 100644 --- a/drivers/net/virtio/virtio_user/virtio_user_dev.c +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c @@ -234,7 +234,7 @@ int virtio_user_stop_device(struct virtio_user_dev *dev) uint32_t i, q; dev->vhostfd = -1; - for (i = 0; i < VIRTIO_MAX_VIRTQUEUES * 2 + 1; ++i) { + for (i = 0; i < VIRTIO_MAX_VIRTQUEUES; ++i) { dev->kickfds[i] = -1; dev->callfds[i] = -1; } diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.h b/drivers/net/virtio/virtio_user/virtio_user_dev.h index 6ecb91e..ba80d05 100644 --- a/drivers/net/virtio/virtio_user/virtio_user_dev.h +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.h @@ -49,8 +49,8 @@ struct virtio_user_dev { int *tapfds; /* for both vhost_user and vhost_kernel */ - int callfds[VIRTIO_MAX_VIRTQUEUES * 2 + 1]; - int kickfds[VIRTIO_MAX_VIRTQUEUES * 2 + 1]; + int callfds[VIRTIO_MAX_VIRTQUEUES]; + int kickfds[VIRTIO_MAX_VIRTQUEUES]; int mac_specified; uint32_tmax_queue_pairs; uint32_tqueue_pairs; @@ -62,7 +62,7 @@ struct virtio_user_dev { uint8_t status; uint8_t mac_addr[ETHER_ADDR_LEN]; charpath[PATH_MAX]; - struct vringvrings[VIRTIO_MAX_VIRTQUEUES * 2 + 1]; + struct vringvrings[VIRTIO_MAX_VIRTQUEUES]; struct virtio_user_backend_ops *ops; }; diff --git a/drivers/net/virtio/virtio_user_ethdev.c b/drivers/net/virtio/virtio_user_ethdev.c index 16d1526..d476a2d 100644 --- a/drivers/net/virtio/virtio_user_ethdev.c +++ b/drivers/net/virtio/virtio_user_ethdev.c @@ -433,6 +433,13 @@ goto end; } + if (queues > VIRTIO_MAX_VIRTQUEUE_PAIRS) { + PMD_INIT_LOG(ERR, "arg %s %u exceeds the limit %u", + VIRTIO_USER_ARG_QUEUES_NUM, queues, + VIRTIO_MAX_VIRTQUEUE_PAIRS); + goto end; + } + eth_dev = virtio_user_eth_dev_alloc(name); if (!eth_dev) { PMD_INIT_LOG(ERR, "virtio_user fails to alloc device"); -- 1.8.3.1
Re: [dpdk-dev] Reg DPDK with unsupported NIC
Thanks Keith For the quick Reply. On Tue, Mar 14, 2017 at 3:22 PM, Wiles, Keith wrote: > > > Sent from my iPhone > > > On Mar 14, 2017, at 5:29 PM, raman geetha gopalakrishnan < > glowing...@gmail.com> wrote: > > > > Hi , > > > > Please find my query, Currently i am planning to develop DPDK APP (linux > > env). I do not have DPDK supported NIC. Till then i would still like to > > develop DPDK APP and want DPDK to use OS interface to TX/RX packets from > > NIC. How can i make it? I went through KNI and my understanding is you > > cannot use it - is this correct? > > > > In what way i can still develop DPDK APP with non supported NIC till get > > the DPDK NIC. > > Take a look at the TAP PMD driver and look at the doc file in the doc > directory. > > It will not be high performance but it can work nicely using tap and some > like socat. > > > > Thanks > > Raman >
Re: [dpdk-dev] Reg DPDK with unsupported NIC
Thanks Monjalon For the quick reply. On Tue, Mar 14, 2017 at 3:16 PM, Thomas Monjalon wrote: > 2017-03-14 14:58, raman geetha gopalakrishnan: > > In what way i can still develop DPDK APP with non supported NIC till get > > the DPDK NIC. > > You should check the bottom of this page (section Ohers): > http://dpdk.org/doc/nics > The simplest is to use the ring PMD which do not require a NIC at all. > Or you can try the other solutions (pcap, tap, af_packet). >
Re: [dpdk-dev] [PATCH 02/17] net/vhost: remove feature related APIs
On 03/03/2017 10:51 AM, Yuanhan Liu wrote: vdev options is better for disabling/enabling some features. Signed-off-by: Yuanhan Liu --- drivers/net/vhost/rte_eth_vhost.c | 25 drivers/net/vhost/rte_eth_vhost.h | 30 - drivers/net/vhost/rte_pmd_vhost_version.map | 3 --- 3 files changed, 58 deletions(-) Acked-by: Maxime Coquelin Thanks, Maxime
Re: [dpdk-dev] [PATCH 03/17] vhost: use new APIs to handle features
On 03/03/2017 10:51 AM, Yuanhan Liu wrote: Signed-off-by: Yuanhan Liu --- examples/tep_termination/main.c| 4 +++- examples/vhost/main.c | 43 +- lib/librte_vhost/rte_vhost_version.map | 3 --- lib/librte_vhost/rte_virtio_net.h | 13 -- lib/librte_vhost/socket.c | 23 +- lib/librte_vhost/vhost.c | 41 lib/librte_vhost/vhost.h | 20 lib/librte_vhost/vhost_user.c | 8 +++ 8 files changed, 76 insertions(+), 79 deletions(-) diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index 8433a54..f7227bf 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -143,9 +143,9 @@ * The features that we support are requested. */ static uint64_t -vhost_user_get_features(void) +vhost_user_get_features(struct virtio_net *dev) { - return VHOST_FEATURES; + return rte_vhost_driver_get_features(dev->ifname); } /* @@ -154,7 +154,7 @@ static int vhost_user_set_features(struct virtio_net *dev, uint64_t features) { - if (features & ~VHOST_FEATURES) + if (features & ~rte_vhost_driver_get_features(dev->ifname)) rte_vhost_driver_get_features() returns -1 if the socket is not found. It would result in accepting any feature trying to be set. Maxime
Re: [dpdk-dev] [PATCH v9 04/18] lib: add new distributor code
On 10/3/2017 4:03 PM, Bruce Richardson wrote: On Mon, Mar 06, 2017 at 09:10:19AM +, David Hunt wrote: This patch includes public header file which will be used once we add in the symbol versioning for v20 and v1705 APIs. Also includes v1702 header file, and code for new Now 1705. burst-capable distributor library. This will be re-named as rte_distributor.h later in the patch-set The new distributor code contains a very similar API to the legacy code, but now sends bursts of up to 8 mbufs to each worker. Flow ID's are reduced to 15 bits for an optimal flow matching algorithm. Signed-off-by: David Hunt --- lib/librte_distributor/Makefile | 1 + lib/librte_distributor/rte_distributor.c | 628 +++ lib/librte_distributor/rte_distributor_private.h | 7 +- lib/librte_distributor/rte_distributor_v1705.h | 269 ++ 4 files changed, 904 insertions(+), 1 deletion(-) create mode 100644 lib/librte_distributor/rte_distributor.c create mode 100644 lib/librte_distributor/rte_distributor_v1705.h Minor nit, I think this patch might be squashed into the previous one, to have new structures and code together. /Bruce Done in the next version. Dave.
Re: [dpdk-dev] [PATCH v9 12/18] examples/distributor: allow for extra stats
On 10/3/2017 4:46 PM, Bruce Richardson wrote: On Mon, Mar 06, 2017 at 09:10:27AM +, David Hunt wrote: + freq = rte_get_timer_hz(); + t = rte_rdtsc() + freq; + while (!quit_signal_dist) { + if (t < rte_rdtsc()) { + print_stats(); + t = rte_rdtsc() + freq; + } + } + You can probably put in a usleep or nanosleep inot the while loop above. No need to burn an entire core by polling the tsc. /Bruce Done in the next version. Dave.
Re: [dpdk-dev] [PATCH v9 14/18] examples/distributor: give distributor a core
On 10/3/2017 4:49 PM, Bruce Richardson wrote: On Mon, Mar 06, 2017 at 09:10:29AM +, David Hunt wrote: Signed-off-by: David Hunt --- Title could do with some rewording - e.g. "make distributor API calls on dedicated core" This also requires an explanation as to why the change is being made. Does it not also need an update to the sample app guide about how the app works? /Bruce Yes, sure. And I'll add some more info into the dist_app.rst file. Rgds, Dave.
Re: [dpdk-dev] [PATCH v9 15/18] examples/distributor: limit number of Tx rings
On 10/3/2017 4:50 PM, Bruce Richardson wrote: On Mon, Mar 06, 2017 at 09:10:30AM +, David Hunt wrote: Signed-off-by: David Hunt --- Please explain reason for change. /Bruce I've re-worked this change, as it's mostly do to with performance, resulting in about 10% gain. The number of Tx rings has been removed, as it was part of an investigation for high core count operation, and is not relevant. Rgds, Dave.
Re: [dpdk-dev] [PATCH 04/17] vhost: make notify ops per vhost driver
On 03/03/2017 10:51 AM, Yuanhan Liu wrote: Assume there is an application both support vhost-user net and vhost-user scsi, the callback should be different. Making notify ops per vhost driver allow application define different set of callbacks for different driver. Signed-off-by: Yuanhan Liu --- drivers/net/vhost/rte_eth_vhost.c | 20 +++- examples/tep_termination/main.c | 3 ++- examples/vhost/main.c | 5 +++-- lib/librte_vhost/rte_virtio_net.h | 3 ++- lib/librte_vhost/socket.c | 32 lib/librte_vhost/vhost.c | 16 +--- lib/librte_vhost/vhost.h | 5 - lib/librte_vhost/vhost_user.c | 15 +-- 8 files changed, 64 insertions(+), 35 deletions(-) diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c index 0ebaa4a..816a9a0 100644 --- a/drivers/net/vhost/rte_eth_vhost.c +++ b/drivers/net/vhost/rte_eth_vhost.c @@ -667,6 +667,12 @@ struct vhost_xstats_name_off { return 0; } +static struct virtio_net_device_ops vhost_ops = { + .new_device = new_device, + .destroy_device = destroy_device, + .vring_state_changed = vring_state_changed, +}; + int rte_eth_vhost_get_queue_event(uint8_t port_id, struct rte_eth_vhost_queue_event *event) @@ -736,15 +742,6 @@ struct vhost_xstats_name_off { static void * vhost_driver_session(void *param __rte_unused) { - static struct virtio_net_device_ops vhost_ops; - - /* set vhost arguments */ - vhost_ops.new_device = new_device; - vhost_ops.destroy_device = destroy_device; - vhost_ops.vring_state_changed = vring_state_changed; - if (rte_vhost_driver_callback_register(&vhost_ops) < 0) - RTE_LOG(ERR, PMD, "Can't register callbacks\n"); - /* start event handling */ rte_vhost_driver_session_start(); @@ -1077,6 +1074,11 @@ struct vhost_xstats_name_off { if (rte_vhost_driver_register(iface_name, flags)) goto error; + if (rte_vhost_driver_callback_register(iface_name, &vhost_ops) < 0) { + RTE_LOG(ERR, PMD, "Can't register callbacks\n"); + goto error; + } + /* We need only one message handling thread */ if (rte_atomic16_add_return(&nb_started_ports, 1) == 1) { if (vhost_driver_session_start()) diff --git a/examples/tep_termination/main.c b/examples/tep_termination/main.c index 8c45128..03c0fbe 100644 --- a/examples/tep_termination/main.c +++ b/examples/tep_termination/main.c @@ -1258,7 +1258,8 @@ static inline void __attribute__((always_inline)) rte_vhost_driver_disable_features(dev_basename, 1ULL << VIRTIO_NET_F_MRG_RXBUF); - rte_vhost_driver_callback_register(&virtio_net_device_ops); + rte_vhost_driver_callback_register(dev_basename, + &virtio_net_device_ops); Return should be checked here, as this function can now return -1. rte_vhost_driver_session_start(); diff --git a/examples/vhost/main.c b/examples/vhost/main.c index 972a6a8..867efc6 100644 --- a/examples/vhost/main.c +++ b/examples/vhost/main.c @@ -1538,9 +1538,10 @@ static inline void __attribute__((always_inline)) rte_vhost_driver_enable_features(file, 1ULL << VIRTIO_NET_F_CTRL_RX); } - } - rte_vhost_driver_callback_register(&virtio_net_device_ops); + rte_vhost_driver_callback_register(file, + &virtio_net_device_ops); + } Ditto. rte_vhost_driver_session_start(); return 0; diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h index 51f5166..3bfd0b7 100644 --- a/lib/librte_vhost/rte_virtio_net.h +++ b/lib/librte_vhost/rte_virtio_net.h @@ -91,7 +91,8 @@ struct virtio_net_device_ops { int rte_vhost_driver_disable_features(const char *path, uint64_t features); /* Register callbacks. */ -int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const); +int rte_vhost_driver_callback_register(const char *path, + struct virtio_net_device_ops const * const); /* Start vhost driver session blocking loop. */ int rte_vhost_driver_session_start(void); diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c index 9e0ec05..550af64 100644 --- a/lib/librte_vhost/socket.c +++ b/lib/librte_vhost/socket.c @@ -73,6 +73,8 @@ struct vhost_user_socket { */ uint64_t supported_features; uint64_t features; + + struct virtio_net_device_ops const *notify_ops; }; struct vhost_user_connection { @@ -718,6 +720,36 @@ struct vhost_user_reconnect_list { return -1; } +/* + * Register ops so that we can add/remove device to data core. + */ +int +rte_vhost_driver_callback_register(const char *path, + struct virtio_net_device_ops const * const ops) +{ + struc
Re: [dpdk-dev] [PATCH 05/17] vhost: export guest memory regions
On 03/03/2017 10:51 AM, Yuanhan Liu wrote: Some vhost-user driver may need this info to setup its own page tables for GPA (guest physical addr) to HPA (host physical addr) translation. SPDK (Storage Performance Development Kit) is one example. Besides, by exporting this memory info, we could also export the gpa_to_vva() as an inline function, which helps for performance. Otherwise, it has to be referenced indirectly by a "vid". Signed-off-by: Yuanhan Liu --- lib/librte_vhost/rte_vhost_version.map | 1 + lib/librte_vhost/rte_virtio_net.h | 24 lib/librte_vhost/vhost.c | 23 +++ lib/librte_vhost/vhost.h | 28 ++-- lib/librte_vhost/vhost_user.c | 12 ++-- 5 files changed, 56 insertions(+), 32 deletions(-) diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map index ee72d5f..b890da6 100644 --- a/lib/librte_vhost/rte_vhost_version.map +++ b/lib/librte_vhost/rte_vhost_version.map @@ -35,5 +35,6 @@ DPDK_17.05 { rte_vhost_driver_enable_features; rte_vhost_driver_get_features; rte_vhost_driver_set_features; + rte_vhost_get_vhost_memory; } DPDK_16.07; diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h index 3bfd0b7..eddf0f4 100644 --- a/lib/librte_vhost/rte_virtio_net.h +++ b/lib/librte_vhost/rte_virtio_net.h @@ -59,6 +59,28 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM}; /** + * Information relating to memory regions including offsets to + * addresses in QEMUs memory file. + */ +struct rte_vhost_mem_region { + uint64_t guest_phys_addr; + uint64_t guest_user_addr; + uint64_t host_user_addr; + uint64_t size; + void *mmap_addr; + uint64_t mmap_size; + int fd; +}; + +/** + * Memory structure includes region and mapping information. + */ +struct rte_vhost_memory { + uint32_t nregions; + struct rte_vhost_mem_region regions[0]; +}; + +/** * Device and vring operations. */ struct virtio_net_device_ops { @@ -187,4 +209,6 @@ uint16_t rte_vhost_enqueue_burst(int vid, uint16_t queue_id, uint16_t rte_vhost_dequeue_burst(int vid, uint16_t queue_id, struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count); +int rte_vhost_get_vhost_memory(int vid, struct rte_vhost_memory **mem); + #endif /* _VIRTIO_NET_H_ */ diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index 0088f87..eee229a 100644 --- a/lib/librte_vhost/vhost.c +++ b/lib/librte_vhost/vhost.c @@ -340,6 +340,29 @@ struct virtio_net * return 0; } +int +rte_vhost_get_vhost_memory(int vid, struct rte_vhost_memory **mem) +{ + struct virtio_net *dev; + struct rte_vhost_memory *m; + size_t size; + + dev = get_device(vid); + if (!dev) + return -1; + + size = dev->mem->nregions * sizeof(struct rte_vhost_mem_region); + m = malloc(size); + if (!m) + return -1; + + m->nregions = dev->mem->nregions; + memcpy(m->regions, dev->mem->regions, size); + *mem = m; + + return 0; +} + Might worth to be documented, as API. Other than that: Reviewed-by: Maxime Coquelin Thanks, Maxime
Re: [dpdk-dev] [PATCH 06/17] vhost: introduce API to fetch negotiated features
On 03/03/2017 10:51 AM, Yuanhan Liu wrote: Signed-off-by: Yuanhan Liu --- lib/librte_vhost/rte_vhost_version.map | 1 + lib/librte_vhost/rte_virtio_net.h | 1 + lib/librte_vhost/vhost.c | 12 3 files changed, 14 insertions(+) diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map index b890da6..85a2796 100644 --- a/lib/librte_vhost/rte_vhost_version.map +++ b/lib/librte_vhost/rte_vhost_version.map @@ -35,6 +35,7 @@ DPDK_17.05 { rte_vhost_driver_enable_features; rte_vhost_driver_get_features; rte_vhost_driver_set_features; + rte_vhost_get_negotiated_features rte_vhost_get_vhost_memory; } DPDK_16.07; diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h index eddf0f4..e7b1599 100644 --- a/lib/librte_vhost/rte_virtio_net.h +++ b/lib/librte_vhost/rte_virtio_net.h @@ -210,5 +210,6 @@ uint16_t rte_vhost_dequeue_burst(int vid, uint16_t queue_id, struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count); int rte_vhost_get_vhost_memory(int vid, struct rte_vhost_memory **mem); +uint64_t rte_vhost_get_negotiated_features(int vid); #endif /* _VIRTIO_NET_H_ */ diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index eee229a..8046aef 100644 --- a/lib/librte_vhost/vhost.c +++ b/lib/librte_vhost/vhost.c @@ -340,6 +340,18 @@ struct virtio_net * return 0; } +uint64_t +rte_vhost_get_negotiated_features(int vid) +{ + struct virtio_net *dev; + + dev = get_device(vid); + if (!dev) + return -1; + + return dev->features; +} Are we sure the negotiation is done when we can get the device? Maxime
Re: [dpdk-dev] rte_malloc_heap usage scenario
On 09/03/2017 04:48, Yerden Zhumabekov wrote: Hello, Can anybody explain why rte_malloc_heap.h is advertised as a public interface? It only contains malloc_heap struct which is used internally and violates DPDK naming conventions for public struct. I haven't found any use of it outside of EAL and it's undocumented. If it is in fact public, will it continue to be? As you point out, the struct name does not follow dpdk conventions for public structs, that is how it was historically and was never changed. IMHO the struct itself is not meant to be public (probably we should add some comments) but it needs to be as it is a field in public 'struct rte_mem_config' which is also a field in public 'struct rte_config'. That being said, it doesn't look like rte_mem_config should be a public struct at all, although there are a couple of PMDs (virtio_user and ena) accessing mem_config info through rte_config. I cannot guarantee that would remain public, but there should be at least a notice in case of any changes. Sergio I'd like to use it (and some code in librte_eal/common) for region-based allocation in a secondary process for easy memory reclamation in case of failure. Is it safe to reuse malloc_heap struct?
Re: [dpdk-dev] Reg DPDK with unsupported NIC
On 3/14/2017 9:28 AM, raman geetha gopalakrishnan wrote: > Hi , > > Please find my query, Currently i am planning to develop DPDK APP (linux > env). I do not have DPDK supported NIC. Till then i would still like to > develop DPDK APP and want DPDK to use OS interface to TX/RX packets from > NIC. How can i make it? > I went through KNI and my understanding is you > cannot use it - is this correct? You can use it. KNI does not require a physical device at all. But with the KNI support in main tree, you need to use KNI specific APIs, - which KNI sample app shows. With the KNI PMD in next-net tree, it is possible to use KNI as any other device. > > In what way i can still develop DPDK APP with non supported NIC till get > the DPDK NIC. > > Thanks > Raman >
Re: [dpdk-dev] [PATCH 07/17] vhost: export vhost vring info
On 03/03/2017 10:51 AM, Yuanhan Liu wrote: Signed-off-by: Yuanhan Liu --- lib/librte_vhost/rte_vhost_version.map | 1 + lib/librte_vhost/rte_virtio_net.h | 13 + lib/librte_vhost/vhost.c | 30 ++ lib/librte_vhost/vhost.h | 2 ++ 4 files changed, 46 insertions(+) diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map index 85a2796..efde936 100644 --- a/lib/librte_vhost/rte_vhost_version.map +++ b/lib/librte_vhost/rte_vhost_version.map @@ -37,5 +37,6 @@ DPDK_17.05 { rte_vhost_driver_set_features; rte_vhost_get_negotiated_features rte_vhost_get_vhost_memory; + rte_vhost_get_vhost_vring; } DPDK_16.07; diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h index e7b1599..80209ad 100644 --- a/lib/librte_vhost/rte_virtio_net.h +++ b/lib/librte_vhost/rte_virtio_net.h @@ -80,6 +80,17 @@ struct rte_vhost_memory { struct rte_vhost_mem_region regions[0]; }; +struct rte_vhost_vring { + struct vring_desc *desc; + struct vring_avail *avail; + struct vring_used *used; + uint64_tlog_guest_addr; + + int callfd; + int kickfd; + uint16_tsize; +}; + /** * Device and vring operations. */ @@ -211,5 +222,7 @@ uint16_t rte_vhost_dequeue_burst(int vid, uint16_t queue_id, int rte_vhost_get_vhost_memory(int vid, struct rte_vhost_memory **mem); uint64_t rte_vhost_get_negotiated_features(int vid); +int rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx, + struct rte_vhost_vring *vring); #endif /* _VIRTIO_NET_H_ */ diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index 8046aef..b175f0e 100644 --- a/lib/librte_vhost/vhost.c +++ b/lib/librte_vhost/vhost.c @@ -375,6 +375,36 @@ struct virtio_net * return 0; } +int +rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx, + struct rte_vhost_vring *vring) +{ + struct virtio_net *dev; + struct vhost_virtqueue *vq; + + dev = get_device(vid); + if (!dev) + return -1; + + if (vring_idx > VHOST_MAX_VRING) Shouldn't be ">=" ? + return -1; + + vq = dev->virtqueue[vring_idx]; + if (!vq) + return -1; + + vring->desc = vq->desc; + vring->avail = vq->avail; + vring->used = vq->used; + vring->log_guest_addr = vq->log_guest_addr; + + vring->callfd = vq->callfd; + vring->kickfd = vq->kickfd; + vring->size= vq->size; + + return 0; +} + uint16_t rte_vhost_avail_entries(int vid, uint16_t queue_id) { diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index ed7b2c9..b0ef0cc 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -138,6 +138,8 @@ struct vhost_virtqueue { #ifndef VIRTIO_NET_F_MQ #define VIRTIO_NET_F_MQ 22 #endif + +#define VHOST_MAX_VRING0x100 Looking at the code, I'm not clear where this limitation comes from. It seems that it can be up to 0x1, no? struct virtio_net { ... struct vhost_virtqueue *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2]; with: #ifdef VIRTIO_NET_F_MQ #define VHOST_MAX_QUEUE_PAIRS VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX #define VHOST_SUPPORTS_MQ (1ULL << VIRTIO_NET_F_MQ) #else #define VHOST_MAX_QUEUE_PAIRS 1 #define VHOST_SUPPORTS_MQ 0 #endif and: #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX0x8000 Regards, Maxime
Re: [dpdk-dev] [PATCH 08/17] vhost: export API to translate gpa to vva
On 03/03/2017 10:51 AM, Yuanhan Liu wrote: Signed-off-by: Yuanhan Liu --- lib/librte_vhost/rte_vhost_version.map | 1 + lib/librte_vhost/rte_virtio_net.h | 21 + lib/librte_vhost/vhost.h | 19 --- lib/librte_vhost/virtio_net.c | 23 +-- 4 files changed, 35 insertions(+), 29 deletions(-) Same remark as previous patches, the API should be documented. Other than that: Reviewed-by: Maxime Coquelin Thanks, Maxime
Re: [dpdk-dev] [PATCH 09/17] vhost: turn queue pair to vring
On 03/03/2017 10:51 AM, Yuanhan Liu wrote: The queue pair is very virtio-net specific, other devices don't have such concept. To make it generic, we should log the number of vrings instead of the number of queue pairs. This patch just does a simple convert, a later patch would export the number of vrings to applications. Signed-off-by: Yuanhan Liu --- lib/librte_vhost/vhost.c | 74 ++- lib/librte_vhost/vhost.h | 4 +-- lib/librte_vhost/vhost_user.c | 28 +--- lib/librte_vhost/virtio_net.c | 10 +++--- 4 files changed, 38 insertions(+), 78 deletions(-) Nice. Making it generic simplifies the net specific code. Reviewed-by: Maxime Coquelin Thanks, Maxime
Re: [dpdk-dev] [PATCH 1/2] eal_interrupts: support external rxq interrupt handlers
On 3/12/2017 8:30 AM, Shahaf Shuler wrote: > Thursday, March 9, 2017 8:10 PM, Thomas Monjalon: >> 2017-02-08 15:57, Shahaf Shuler: >>> /** >>> + * It deletes registered eventfds. >>> + * >>> + * @param intr_handle >>> + * Pointer to the interrupt handle. >>> + */ >>> +void >>> +rte_intr_free_epoll_fd(struct rte_intr_handle *intr_handle); >> >> This patch looks OK except a miss in the EAL map files for this new function. >> The title should be "eal/linux: support external Rx interrupt" >> >> I think this patch should target next-net. >> Please Ferruh, take care of this patchset. Thanks > > Ferruh, any more comments before submitting a v2? Hi Shahaf, No extra comments, only second patch does not cleanly apply to next-net, please rebase it to before sending. Thanks, ferruh
Re: [dpdk-dev] [PATCH 10/17] vhost: export the number of vrings
On 03/03/2017 10:51 AM, Yuanhan Liu wrote: We used to use rte_vhost_get_queue_num() for telling how many vrings. However, the return value is the number of "queue pairs", which is very virtio-net specific. To make it generic, we should return the number of vrings instead, and let the driver do the proper translation. Say, virtio-net driver could turn it to the number of queue pairs by dividing 2. Meanwhile, mark rte_vhost_get_queue_num as deprecated. Signed-off-by: Yuanhan Liu --- drivers/net/vhost/rte_eth_vhost.c | 2 +- lib/librte_vhost/rte_vhost_version.map | 1 + lib/librte_vhost/rte_virtio_net.h | 17 + lib/librte_vhost/vhost.c | 11 +++ 4 files changed, 30 insertions(+), 1 deletion(-) Reviewed-by: Maxime Coquelin Thanks, Maxime
Re: [dpdk-dev] [PATCH 11/17] vhost: move the device ready check at proper place
On 03/03/2017 10:51 AM, Yuanhan Liu wrote: Currently, we check vq->desc, vq->kickfd and vq->callfd to know whether a virtio device is ready or not. However, we only do it when handling SET_VRING_KICK message, which could be wrong if a vhost-user frontend send SET_VRING_KICK first and SET_VRING_CALL later. To work for all possible vhost-user frontend implementations, we could move the ready check at the end of vhost-user message handler. Meanwhile, since we do the check more often than before, the "virtio not ready" message is dropped, to not flood the screen. Signed-off-by: Yuanhan Liu --- lib/librte_vhost/vhost_user.c | 32 ++-- 1 file changed, 14 insertions(+), 18 deletions(-) Makes sense: Reviewed-by: Maxime Coquelin Thanks, Maxime
Re: [dpdk-dev] [PATCH 12/17] vhost: drop the Rx and Tx queue macro
On 03/03/2017 10:51 AM, Yuanhan Liu wrote: They are virti-net specific and should be defined inside the virtio-net s/virti-net/virtio-net/ driver. Signed-off-by: Yuanhan Liu --- drivers/net/vhost/rte_eth_vhost.c | 2 ++ examples/tep_termination/main.h | 2 ++ examples/vhost/main.h | 2 ++ lib/librte_vhost/rte_virtio_net.h | 3 --- 4 files changed, 6 insertions(+), 3 deletions(-) With the commit msg fixed: Reviewed-by: Maxime Coquelin Thanks, Maxime
Re: [dpdk-dev] [PATCH 13/17] vhost: do not include net specific headers
On 03/03/2017 10:51 AM, Yuanhan Liu wrote: Include it internally, at vhost.h. Signed-off-by: Yuanhan Liu --- examples/vhost/main.h | 2 ++ lib/librte_vhost/rte_virtio_net.h | 4 lib/librte_vhost/vhost.h | 4 3 files changed, 6 insertions(+), 4 deletions(-) Reviewed-by: Maxime Coquelin
Re: [dpdk-dev] [PATCH 14/17] vhost: rename device ops struct
On 03/03/2017 10:51 AM, Yuanhan Liu wrote: rename "virtio_net_device_ops" to "vhost_device_ops", to not let it be virtio-net specific. Signed-off-by: Yuanhan Liu --- drivers/net/vhost/rte_eth_vhost.c | 2 +- examples/tep_termination/main.c | 4 ++-- examples/vhost/main.c | 4 ++-- lib/librte_vhost/Makefile | 2 +- lib/librte_vhost/rte_virtio_net.h | 4 ++-- lib/librte_vhost/socket.c | 6 +++--- lib/librte_vhost/vhost.h | 4 ++-- 7 files changed, 13 insertions(+), 13 deletions(-) Reviewed-by: Maxime Coquelin Thanks, Maxime
Re: [dpdk-dev] [PATCH 15/17] vhost: rename virtio-net to vhost
On 03/03/2017 10:51 AM, Yuanhan Liu wrote: Rename "virtio-net" to "vhost" in the API comments. Signed-off-by: Yuanhan Liu --- lib/librte_vhost/rte_virtio_net.h | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) Reviewed-by: Maxime Coquelin Thanks, Maxime
[dpdk-dev] [PATCH 2/2] net/tap: add Rx trigger
This commit adds a signal-based trigger to the Rx burst function in order to avoid unnecessary system calls while Rx queues are empty. Triggered Rx bursts put less pressure on the kernel, free up CPU resources for applications and result in a noticeable performance improvement when sharing CPU threads with other PMDs. Measuring the traffic forwarding rate between two physical devices in testpmd (IO mode, single thread, 64B packets) before and after adding two tap PMD instances (4 ports total) that do not process any traffic and comparing results yields: Without Rx trigger: -15% (--burst=32) -62% (--burst=1) With Rx trigger: -0.3% (--burst=32) -6% (--burst=1) Signed-off-by: Adrien Mazarguil Acked-by: Pascal Mazon --- drivers/net/tap/rte_eth_tap.c | 59 ++ 1 file changed, 59 insertions(+) diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c index c757a7c..d5d467a 100644 --- a/drivers/net/tap/rte_eth_tap.c +++ b/drivers/net/tap/rte_eth_tap.c @@ -31,6 +31,8 @@ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ +#include +#include #include #include #include @@ -42,6 +44,9 @@ #include #include #include +#include +#include +#include #include #include #include @@ -72,6 +77,8 @@ static const char *valid_arguments[] = { static int tap_unit; +static volatile uint32_t tap_trigger; /* Rx trigger */ + static struct rte_eth_link pmd_link = { .link_speed = ETH_SPEED_NUM_10G, .link_duplex = ETH_LINK_FULL_DUPLEX, @@ -89,6 +96,7 @@ struct pkt_stats { struct rx_queue { struct rte_mempool *mp; /* Mempool for RX packets */ + uint32_t trigger_seen; /* Last seen Rx trigger value */ uint16_t in_port; /* Port ID */ int fd; @@ -111,6 +119,13 @@ struct pmd_internals { struct tx_queue txq[RTE_PMD_TAP_MAX_QUEUES];/* List of TX queues */ }; +static void +tap_trigger_cb(int sig __rte_unused) +{ + /* Valid trigger values are nonzero */ + tap_trigger = (tap_trigger + 1) | 0x8000; +} + /* Tun/Tap allocation routine * * name is the number of the interface to use, unless NULL to take the host @@ -175,6 +190,43 @@ tun_alloc(struct pmd_internals *pmd, uint16_t qid) goto error; } + /* Set up trigger to optimize empty Rx bursts */ + errno = 0; + do { + struct sigaction sa; + int flags = fcntl(fd, F_GETFL); + + if (flags == -1 || sigaction(SIGIO, NULL, &sa) == -1) + break; + if (sa.sa_handler != tap_trigger_cb) { + /* +* Make sure SIGIO is not already taken. This is done +* as late as possible to leave the application a +* chance to set up its own signal handler first. +*/ + if (sa.sa_handler != SIG_IGN && + sa.sa_handler != SIG_DFL) { + errno = EBUSY; + break; + } + sa = (struct sigaction){ + .sa_flags = SA_RESTART, + .sa_handler = tap_trigger_cb, + }; + if (sigaction(SIGIO, &sa, NULL) == -1) + break; + } + /* Enable SIGIO on file descriptor */ + fcntl(fd, F_SETFL, flags | O_ASYNC); + fcntl(fd, F_SETOWN, getpid()); + } while (0); + if (errno) { + /* Disable trigger globally in case of error */ + tap_trigger = 0; + RTE_LOG(WARNING, PMD, "Rx trigger disabled: %s\n", + strerror(errno)); + } + if (qid == 0) { if (ioctl(fd, SIOCGIFHWADDR, &ifr) == -1) { RTE_LOG(ERR, PMD, "ioctl failed (SIOCGIFHWADDR) (%s)\n", @@ -204,7 +256,13 @@ pmd_rx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) struct rx_queue *rxq = queue; uint16_t num_rx; unsigned long num_rx_bytes = 0; + uint32_t trigger = tap_trigger; + if (trigger == rxq->trigger_seen) + return 0; + if (trigger) + rxq->trigger_seen = trigger; + rte_compiler_barrier(); for (num_rx = 0; num_rx < nb_pkts; ) { /* allocate the next mbuf */ mbuf = rte_pktmbuf_alloc(rxq->mp); @@ -563,6 +621,7 @@ tap_rx_queue_setup(struct rte_eth_dev *dev, } internals->rxq[rx_queue_id].mp = mp; + internals->rxq[rx_queue_id].trigger_seen = 1; /* force initial burst */ internals->rxq[rx_queue_id].in_port = dev->data->port_id; /* Now get the space available for data in the mbuf */ -- 2.1.4
[dpdk-dev] [PATCH 1/2] net/tap: remove redundant syscall on Tx
Polling the Tx queue file descriptor before writing to it is not mandatory since it is configured as non-blocking. Signed-off-by: Adrien Mazarguil Acked-by: Pascal Mazon --- drivers/net/tap/rte_eth_tap.c | 26 -- 1 file changed, 8 insertions(+), 18 deletions(-) diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c index efc4426..c757a7c 100644 --- a/drivers/net/tap/rte_eth_tap.c +++ b/drivers/net/tap/rte_eth_tap.c @@ -43,7 +43,6 @@ #include #include #include -#include #include #include #include @@ -242,7 +241,6 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) { struct rte_mbuf *mbuf; struct tx_queue *txq = queue; - struct pollfd pfd; uint16_t num_tx = 0; unsigned long num_tx_bytes = 0; int i, n; @@ -250,26 +248,18 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) if (unlikely(nb_pkts == 0)) return 0; - pfd.events = POLLOUT; - pfd.fd = txq->fd; for (i = 0; i < nb_pkts; i++) { - n = poll(&pfd, 1, 0); - + /* copy the tx frame data */ + mbuf = bufs[num_tx]; + n = write(txq->fd, + rte_pktmbuf_mtod(mbuf, void *), + rte_pktmbuf_pkt_len(mbuf)); if (n <= 0) break; - if (pfd.revents & POLLOUT) { - /* copy the tx frame data */ - mbuf = bufs[num_tx]; - n = write(pfd.fd, rte_pktmbuf_mtod(mbuf, void*), - rte_pktmbuf_pkt_len(mbuf)); - if (n <= 0) - break; - - num_tx++; - num_tx_bytes += mbuf->pkt_len; - rte_pktmbuf_free(mbuf); - } + num_tx++; + num_tx_bytes += mbuf->pkt_len; + rte_pktmbuf_free(mbuf); } txq->stats.opackets += num_tx; -- 2.1.4
Re: [dpdk-dev] [PATCH 16/17] vhost: rename header file
On 03/03/2017 10:51 AM, Yuanhan Liu wrote: Rename "rte_virtio_net.h" to "rte_vhost.h", to not let it be virtio net specific. Signed-off-by: Yuanhan Liu --- doc/guides/rel_notes/deprecation.rst | 9 -- drivers/net/vhost/rte_eth_vhost.c | 2 +- drivers/net/vhost/rte_eth_vhost.h | 2 +- examples/tep_termination/main.c| 2 +- examples/tep_termination/vxlan_setup.c | 2 +- examples/vhost/main.c | 2 +- lib/librte_vhost/Makefile | 2 +- lib/librte_vhost/rte_vhost.h | 259 + lib/librte_vhost/rte_virtio_net.h | 259 - Did you perform the rename using git mv? lib/librte_vhost/vhost.c | 2 +- lib/librte_vhost/vhost.h | 2 +- lib/librte_vhost/vhost_user.h | 2 +- lib/librte_vhost/virtio_net.c | 2 +- 13 files changed, 269 insertions(+), 278 deletions(-) create mode 100644 lib/librte_vhost/rte_vhost.h delete mode 100644 lib/librte_vhost/rte_virtio_net.h Other than that, the change looks good to me: Reviewed-by: Maxime Coquelin Thanks, Maxime
[dpdk-dev] [PATCH v2 0/2] net/mlx5 add rxq interrupt support
This patchset adds support for rxq interrupts on MLX5 PMD. The first patch introduces changes on eal_interrupt to support external interrupt handlers. The second patch implements the support on mlx5 PMD and demonstrate the use of the changes made in the first patch on v2: * add new function to EAL map file. * change commit title. * add documentation to guide and release notes. [PATCH v2 1/2] eal/linux: support external Rx interrupt [PATCH v2 2/2] net/mlx5: support user space rxq interrupt event
[dpdk-dev] [PATCH v2 1/2] eal/linux: support external Rx interrupt
Prior to this patch only UIO/VFIO interrupt handlers types were supported. This patch adds support for the external interrupt handler type, allowing external drivers to set their own fds with specific interrupt handlers. Signed-off-by: Shahaf Shuler --- lib/librte_eal/linuxapp/eal/eal_interrupts.c | 33 ++ .../linuxapp/eal/include/exec-env/rte_interrupts.h | 9 ++ lib/librte_eal/linuxapp/eal/rte_eal_version.map| 7 + 3 files changed, 37 insertions(+), 12 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c index 92a19cb..c19eb8c 100644 --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c @@ -939,6 +939,8 @@ static __attribute__((noreturn)) void * bytes_read = sizeof(buf.vfio_intr_count); break; #endif + case RTE_INTR_HANDLE_EXT: + return; default: bytes_read = 1; RTE_LOG(INFO, EAL, "unexpected intr type\n"); @@ -1167,6 +1169,24 @@ static __attribute__((noreturn)) void * return rc; } +void +rte_intr_free_epoll_fd(struct rte_intr_handle *intr_handle) +{ + uint32_t i; + struct rte_epoll_event *rev; + + for (i = 0; i < intr_handle->nb_efd; i++) { + rev = &intr_handle->elist[i]; + if (rev->status == RTE_EPOLL_INVALID) + continue; + if (rte_epoll_ctl(rev->epfd, EPOLL_CTL_DEL, rev->fd, rev)) { + /* force free if the entry valid */ + eal_epoll_data_safe_free(rev); + rev->status = RTE_EPOLL_INVALID; + } + } +} + int rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd) { @@ -1202,19 +1222,8 @@ static __attribute__((noreturn)) void * rte_intr_efd_disable(struct rte_intr_handle *intr_handle) { uint32_t i; - struct rte_epoll_event *rev; - - for (i = 0; i < intr_handle->nb_efd; i++) { - rev = &intr_handle->elist[i]; - if (rev->status == RTE_EPOLL_INVALID) - continue; - if (rte_epoll_ctl(rev->epfd, EPOLL_CTL_DEL, rev->fd, rev)) { - /* force free if the entry valid */ - eal_epoll_data_safe_free(rev); - rev->status = RTE_EPOLL_INVALID; - } - } + rte_intr_free_epoll_fd(intr_handle); if (intr_handle->max_intr > intr_handle->nb_efd) { for (i = 0; i < intr_handle->nb_efd; i++) close(intr_handle->efds[i]); diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h index d459bf4..735d307 100644 --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h @@ -171,6 +171,15 @@ struct rte_intr_handle { int epfd, int op, unsigned int vec, void *data); /** + * It deletes registered eventfds. + * + * @param intr_handle + * Pointer to the interrupt handle. + */ +void +rte_intr_free_epoll_fd(struct rte_intr_handle *intr_handle); + +/** * It enables the packet I/O interrupt event if it's necessary. * It creates event fd for each interrupt vector when MSIX is used, * otherwise it multiplexes a single event fd. diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map index 9c134b4..905d8de 100644 --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map @@ -186,3 +186,10 @@ DPDK_17.02 { rte_bus_unregister; } DPDK_16.11; + +DPDK_17.05 { + global: + + rte_intr_free_epoll_fd; + +} DPDK_17.02; -- 1.8.3.1
[dpdk-dev] [PATCH v2 2/2] net/mlx5: support user space rxq interrupt event
Implement rxq interrupt callbacks Signed-off-by: Shahaf Shuler --- doc/guides/nics/features/mlx5.ini | 1 + doc/guides/rel_notes/release_17_05.rst | 6 ++ drivers/net/mlx5/Makefile | 5 ++ drivers/net/mlx5/mlx5.c| 2 + drivers/net/mlx5/mlx5_rxq.c| 126 - drivers/net/mlx5/mlx5_rxtx.c | 73 +++ drivers/net/mlx5/mlx5_rxtx.h | 7 ++ drivers/net/mlx5/mlx5_trigger.c| 9 +++ 8 files changed, 228 insertions(+), 1 deletion(-) diff --git a/doc/guides/nics/features/mlx5.ini b/doc/guides/nics/features/mlx5.ini index 1814f82..532c0ef 100644 --- a/doc/guides/nics/features/mlx5.ini +++ b/doc/guides/nics/features/mlx5.ini @@ -7,6 +7,7 @@ Speed capabilities = Y Link status = Y Link status event= Y +Rx interrupt = Y Queue start/stop = Y MTU update = Y Jumbo frame = Y diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst index 49e9640..f9f79c2 100644 --- a/doc/guides/rel_notes/release_17_05.rst +++ b/doc/guides/rel_notes/release_17_05.rst @@ -62,6 +62,11 @@ New Features Added support for Hardware TSO for tunneled and non-tunneled packets. Tunneling protocols supported are GRE and VXLAN. +* **Added support for Rx interrupts on mlx5 driver.** + + Rx queues can be armed with an interrupt which will trigger on the + next packet arrival. + * **Added vmxnet3 version 3 support.** Added support for vmxnet3 version 3 which includes several @@ -73,6 +78,7 @@ New Features * Generic flow API support for Ethernet, VLAN, IPv4, IPv6, UDP and TCP pattern items with QUEUE action for ingress traffic. + Resolved Issues --- diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile index 671089c..084044a 100644 --- a/drivers/net/mlx5/Makefile +++ b/drivers/net/mlx5/Makefile @@ -137,6 +137,11 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh /usr/include/linux/ethtool.h \ enum ETHTOOL_LINK_MODE_10baseKR4_Full_BIT \ $(AUTOCONF_OUTPUT) + $Q sh -- '$<' '$@' \ + HAVE_UPDATE_CQ_CI \ + infiniband/mlx5_hw.h \ + func ibv_mlx5_exp_update_cq_ci \ + $(AUTOCONF_OUTPUT) # Create mlx5_autoconf.h or update it in case it differs from the new one. diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index 6f42948..ebc7984 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -225,6 +225,8 @@ .rss_hash_update = mlx5_rss_hash_update, .rss_hash_conf_get = mlx5_rss_hash_conf_get, .filter_ctrl = mlx5_dev_filter_ctrl, + .rx_queue_intr_enable = mlx5_rx_intr_enable, + .rx_queue_intr_disable = mlx5_rx_intr_disable, }; static struct { diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c index 28e93d3..e6070a0 100644 --- a/drivers/net/mlx5/mlx5_rxq.c +++ b/drivers/net/mlx5/mlx5_rxq.c @@ -36,6 +36,7 @@ #include #include #include +#include /* Verbs header. */ /* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */ @@ -57,6 +58,7 @@ #include #include #include +#include #ifdef PEDANTIC #pragma GCC diagnostic error "-Wpedantic" #endif @@ -773,6 +775,8 @@ claim_zero(ibv_exp_destroy_wq(rxq_ctrl->wq)); if (rxq_ctrl->cq != NULL) claim_zero(ibv_destroy_cq(rxq_ctrl->cq)); + if (rxq_ctrl->channel != NULL) + claim_zero(ibv_destroy_comp_channel(rxq_ctrl->channel)); if (rxq_ctrl->rd != NULL) { struct ibv_exp_destroy_res_domain_attr attr = { .comp_mask = 0, @@ -1014,6 +1018,16 @@ (void *)dev, strerror(ret)); goto error; } + if (dev->data->dev_conf.intr_conf.rxq) { + tmpl.channel = ibv_create_comp_channel(priv->ctx); + if (tmpl.channel == NULL) { + dev->data->dev_conf.intr_conf.rxq = 0; + ret = ENOMEM; + ERROR("%p: Comp Channel creation failure: %s", + (void *)dev, strerror(ret)); + goto error; + } + } attr.cq = (struct ibv_exp_cq_init_attr){ .comp_mask = IBV_EXP_CQ_INIT_ATTR_RES_DOMAIN, .res_domain = tmpl.rd, @@ -1023,7 +1037,7 @@ attr.cq.flags |= IBV_EXP_CQ_COMPRESSED_CQE; cqe_n = (desc * 2) - 1; /* Double the number of CQEs. */ } - tmpl.cq = ibv_exp_create_cq(priv->ctx, cqe_n, NULL, NULL, 0, + tmpl.cq = ibv_exp_create_cq(priv->ctx, cqe_n, NULL, tmpl.channel, 0, &attr.cq); if (tmpl.cq == NULL) { ret = ENOMEM; @@ -1347,3 +1361,113 @@ rxq = (*priv->rxqs)[index]; return priv->dev->rx_pkt_
Re: [dpdk-dev] [PATCH 1/2] net/i40e: enable VF untag drop
On 3/9/2017 3:24 AM, Zhang, Qi Z wrote: > Hi Ferruh: > >> -Original Message- >> From: Yigit, Ferruh >> Sent: Tuesday, March 7, 2017 6:51 PM >> To: Zhang, Qi Z ; Wu, Jingjing ; >> Zhang, Helin >> Cc: dev@dpdk.org >> Subject: Re: [dpdk-dev] [PATCH 1/2] net/i40e: enable VF untag drop >> >> On 3/3/2017 1:59 AM, Qi Zhang wrote: >>> Add a new private API to support the untag drop enable/disable for >>> specific VF. >>> >>> Signed-off-by: Qi Zhang >>> --- >>> drivers/net/i40e/i40e_ethdev.c | 49 >>> + >>> drivers/net/i40e/rte_pmd_i40e.h | 18 +++ >> >> Shared library is giving build error because of API is missing in >> *version.map file >> >>> 2 files changed, 67 insertions(+) >>> >> >> <...> >> >>> diff --git a/drivers/net/i40e/rte_pmd_i40e.h >>> b/drivers/net/i40e/rte_pmd_i40e.h index a0ad88c..895e2cc 100644 >>> --- a/drivers/net/i40e/rte_pmd_i40e.h >>> +++ b/drivers/net/i40e/rte_pmd_i40e.h >>> @@ -332,4 +332,22 @@ int rte_pmd_i40e_get_vf_stats(uint8_t port, int >>> rte_pmd_i40e_reset_vf_stats(uint8_t port, >>> uint16_t vf_id); >>> >>> +/** >>> + * Enable/Disable VF untag drop >>> + * >>> + * @param port >>> + *The port identifier of the Ethernet device. >>> + * @param vf_id >>> + *VF on witch to enable/disable >>> + * @param on >>> + *Enable or Disable >>> + * @retura >> >> @return >> >>> + * - (0) if successful. >>> + * -(-ENODEVE) if *port* invalid >>> + * -(-EINVAL) if bad parameter. >>> + */ >>> +int rte_pmd_i40e_set_vf_vlan_untag_drop(uint8_t port, >>> + uint16_t vf_id, >>> + uint8_t on); >> >> As discussed previously, I believe it is good to keep following syntax in >> API: >> __, for this API it becomes: > I think, current naming rule is __ right? Overall, I am not aware of any defined naming rule, I am for defining one. > See below > rte_pmd_i40e_set_vf_vlan_anti_spoof; > rte_pmd_i40e_set_vf_vlan_filter; > rte_pmd_i40e_set_vf_vlan_insert; > rte_pmd_i40e_set_vf_vlan_stripq; > rte_pmd_i40e_set_vf_vlan_tag; > so what's wrong with this? This breaks hierarchical approach, if you think API name as tree. Easier to see this when you sort the APIs, ns_set_x, ns_reset_x, ns_del_x will spread to different locations. This looks OK when you work on one type of object already, but with all APIs in concern, I believe object based grouping is better than action based grouping. And why do you think above one is better? Again, as long as one is agreed on, I am OK. >> >> rte_pmd_i40e_vf_vlan_untag_drop_set(), and perhaps "set" can be removed? >> >>> + >>> #endif /* _PMD_I40E_H_ */ >>> >
Re: [dpdk-dev] [PATCH 2/2] app/testpmd: enable VF untag drop in testpmd
On 3/9/2017 2:59 AM, Zhang, Qi Z wrote: > > >> -Original Message- >> From: Yigit, Ferruh >> Sent: Tuesday, March 7, 2017 7:14 PM >> To: Zhang, Qi Z ; Wu, Jingjing ; >> Zhang, Helin >> Cc: dev@dpdk.org >> Subject: Re: [dpdk-dev] [PATCH 2/2] app/testpmd: enable VF untag drop in >> testpmd >> >> On 3/3/2017 1:59 AM, Qi Zhang wrote: >>> Add command line to support untag drop to specific VF in testpmd. >>> >>> Signed-off-by: Qi Zhang >>> --- >>> app/test-pmd/cmdline.c | 104 >>> + >>> 1 file changed, 104 insertions(+) >>> >>> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index >>> 43fc636..4ddc2c9 100644 >>> --- a/app/test-pmd/cmdline.c >>> +++ b/app/test-pmd/cmdline.c >>> @@ -311,6 +311,10 @@ static void cmd_help_long_parsed(void >>> *parsed_result, >>> >>> "set vf vlan antispoof (port_id) (vf_id) (on|off)\n" >>> "Set VLAN antispoof for a VF from the PF.\n\n" >>> +#ifdef RTE_LIBRTE_I40E_PMD >>> + "set vf vlan untagdrop (port_id) (vf_id) (on|off)\n" >>> + "Set VLAN untag drop for a VF from the PF.\n\n" >>> +#endif >> >> We should decide how to implement PMD specific APIs in testpmd, and be >> consistent about it. >> >> Currently there are two approaches: >> >> 1- Wrap PMD specific feature and API with and PMD #ifdef, as done here. >> >> 2- Enable feature by default, return -ENOTSUP for port_id that does not >> support >> it. Ex: cmd_vf_rxvlan_filter. >> >> I am for second option. And explicitly not disabling I40E driver does not >> mean >> you should have those NICs in your runtime environment, so the effect will be >> same as always enabling it. >> > Yes, I notice this problem, during implementation, I saw both patterns exist, > so I have to choose one of them > We'd better align this. > Both option ok for me, but a little bit prefer option 1 , since it's not > necessary to explore a command if no backend device, that make the hint more > clean. I agree it's not necessary to explore a command if no backend device, but first option does not guaranties it. What it checks is compile time option, not runtime device detection. I40E driver is enabled by default, and unless user explicitly disables it, testpmd command will be there independent from device is there or not. >> >> And since number of PMD specific APIs are increasing, perhaps we should find >> a >> better approach for testpmd to prevent them corrupting testpmd. > Will think about this, also like to know if you or anyone have any good > suggestion. >> >> Also it may worth to discuss why number of PMD specific APIs are increasing. >> >>> >>> "set vf vlan tag (port_id) (vf_id) (on|off)\n" >>> "Set VLAN tag for a VF from the PF.\n\n" >>> @@ -10995,6 +10999,103 @@ cmdline_parse_inst_t >> cmd_set_vf_vlan_anti_spoof = { >>> }, >>> }; >>> >> <...>
Re: [dpdk-dev] [PATCH 1/2] net/tap: remove redundant syscall on Tx
> On Mar 14, 2017, at 8:51 PM, Adrien Mazarguil > wrote: > > Polling the Tx queue file descriptor before writing to it is not mandatory > since it is configured as non-blocking. > > Signed-off-by: Adrien Mazarguil > Acked-by: Pascal Mazon Acked-by: Keith Wiles > --- > drivers/net/tap/rte_eth_tap.c | 26 -- > 1 file changed, 8 insertions(+), 18 deletions(-) > > diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c > index efc4426..c757a7c 100644 > --- a/drivers/net/tap/rte_eth_tap.c > +++ b/drivers/net/tap/rte_eth_tap.c > @@ -43,7 +43,6 @@ > #include > #include > #include > -#include > #include > #include > #include > @@ -242,7 +241,6 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, > uint16_t nb_pkts) > { > struct rte_mbuf *mbuf; > struct tx_queue *txq = queue; > - struct pollfd pfd; > uint16_t num_tx = 0; > unsigned long num_tx_bytes = 0; > int i, n; > @@ -250,26 +248,18 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, > uint16_t nb_pkts) > if (unlikely(nb_pkts == 0)) > return 0; > > - pfd.events = POLLOUT; > - pfd.fd = txq->fd; > for (i = 0; i < nb_pkts; i++) { > - n = poll(&pfd, 1, 0); > - > + /* copy the tx frame data */ > + mbuf = bufs[num_tx]; > + n = write(txq->fd, > + rte_pktmbuf_mtod(mbuf, void *), > + rte_pktmbuf_pkt_len(mbuf)); > if (n <= 0) > break; > > - if (pfd.revents & POLLOUT) { > - /* copy the tx frame data */ > - mbuf = bufs[num_tx]; > - n = write(pfd.fd, rte_pktmbuf_mtod(mbuf, void*), > - rte_pktmbuf_pkt_len(mbuf)); > - if (n <= 0) > - break; > - > - num_tx++; > - num_tx_bytes += mbuf->pkt_len; > - rte_pktmbuf_free(mbuf); > - } > + num_tx++; > + num_tx_bytes += mbuf->pkt_len; > + rte_pktmbuf_free(mbuf); > } > > txq->stats.opackets += num_tx; > -- > 2.1.4 > Regards, Keith
Re: [dpdk-dev] [PATCH 2/2] net/tap: add Rx trigger
> On Mar 14, 2017, at 8:51 PM, Adrien Mazarguil > wrote: > > This commit adds a signal-based trigger to the Rx burst function in order > to avoid unnecessary system calls while Rx queues are empty. > > Triggered Rx bursts put less pressure on the kernel, free up CPU resources > for applications and result in a noticeable performance improvement when > sharing CPU threads with other PMDs. > > Measuring the traffic forwarding rate between two physical devices in > testpmd (IO mode, single thread, 64B packets) before and after adding two > tap PMD instances (4 ports total) that do not process any traffic and > comparing results yields: > > Without Rx trigger: > > -15% (--burst=32) > -62% (--burst=1) > > With Rx trigger: > > -0.3% (--burst=32) > -6% (--burst=1) > > Signed-off-by: Adrien Mazarguil > Acked-by: Pascal Mazon Acked-by: Keith Wiles > --- > drivers/net/tap/rte_eth_tap.c | 59 ++ > 1 file changed, 59 insertions(+) > > diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c > index c757a7c..d5d467a 100644 > --- a/drivers/net/tap/rte_eth_tap.c > +++ b/drivers/net/tap/rte_eth_tap.c > @@ -31,6 +31,8 @@ > * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > */ > > +#include > +#include > #include > #include > #include > @@ -42,6 +44,9 @@ > #include > #include > #include > +#include > +#include > +#include > #include > #include > #include > @@ -72,6 +77,8 @@ static const char *valid_arguments[] = { > > static int tap_unit; > > +static volatile uint32_t tap_trigger;/* Rx trigger */ > + > static struct rte_eth_link pmd_link = { > .link_speed = ETH_SPEED_NUM_10G, > .link_duplex = ETH_LINK_FULL_DUPLEX, > @@ -89,6 +96,7 @@ struct pkt_stats { > > struct rx_queue { > struct rte_mempool *mp; /* Mempool for RX packets */ > + uint32_t trigger_seen; /* Last seen Rx trigger value */ > uint16_t in_port; /* Port ID */ > int fd; > > @@ -111,6 +119,13 @@ struct pmd_internals { > struct tx_queue txq[RTE_PMD_TAP_MAX_QUEUES];/* List of TX queues */ > }; > > +static void > +tap_trigger_cb(int sig __rte_unused) > +{ > + /* Valid trigger values are nonzero */ > + tap_trigger = (tap_trigger + 1) | 0x8000; > +} > + > /* Tun/Tap allocation routine > * > * name is the number of the interface to use, unless NULL to take the host > @@ -175,6 +190,43 @@ tun_alloc(struct pmd_internals *pmd, uint16_t qid) > goto error; > } > > + /* Set up trigger to optimize empty Rx bursts */ > + errno = 0; > + do { > + struct sigaction sa; > + int flags = fcntl(fd, F_GETFL); > + > + if (flags == -1 || sigaction(SIGIO, NULL, &sa) == -1) > + break; > + if (sa.sa_handler != tap_trigger_cb) { > + /* > + * Make sure SIGIO is not already taken. This is done > + * as late as possible to leave the application a > + * chance to set up its own signal handler first. > + */ > + if (sa.sa_handler != SIG_IGN && > + sa.sa_handler != SIG_DFL) { > + errno = EBUSY; > + break; > + } > + sa = (struct sigaction){ > + .sa_flags = SA_RESTART, > + .sa_handler = tap_trigger_cb, > + }; > + if (sigaction(SIGIO, &sa, NULL) == -1) > + break; > + } > + /* Enable SIGIO on file descriptor */ > + fcntl(fd, F_SETFL, flags | O_ASYNC); > + fcntl(fd, F_SETOWN, getpid()); > + } while (0); > + if (errno) { > + /* Disable trigger globally in case of error */ > + tap_trigger = 0; > + RTE_LOG(WARNING, PMD, "Rx trigger disabled: %s\n", > + strerror(errno)); > + } > + > if (qid == 0) { > if (ioctl(fd, SIOCGIFHWADDR, &ifr) == -1) { > RTE_LOG(ERR, PMD, "ioctl failed (SIOCGIFHWADDR) (%s)\n", > @@ -204,7 +256,13 @@ pmd_rx_burst(void *queue, struct rte_mbuf **bufs, > uint16_t nb_pkts) > struct rx_queue *rxq = queue; > uint16_t num_rx; > unsigned long num_rx_bytes = 0; > + uint32_t trigger = tap_trigger; > > + if (trigger == rxq->trigger_seen) > + return 0; > + if (trigger) > + rxq->trigger_seen = trigger; > + rte_compiler_barrier(); > for (num_rx = 0; num_rx < nb_pkts; ) { > /* allocate the next mbuf */ > mbuf = rte_pktmbuf_alloc(rxq->mp); > @@ -563,6 +621,7 @@ tap_rx_queue_setup(struct rte_eth_dev *dev, > } > > internals->rxq[rx_queue_id].mp = mp; > + internals->rxq[r
Re: [dpdk-dev] i40e SR-IOV with Linux PF
> > +Cc dev@dpdk.org > > 2017-03-13 15:29, Thomas Monjalon: > > Hi i40e developers, > > > > Referring to the VFD discussion, I thought basic behaviours were the > > same regardless of the PF driver: > > http://dpdk.org/ml/archives/dev/2016-December/053056.html > > " > > > In the meanwhile, we have some test models ongoing to validate > > > combination of Linux and DPDK drivers for VF and PF. > > > We'll fully support below 4 cases going forward. > > > 1. DPDK PF + DPDK VF > > > 2. DPDK PF + Linux VF > > > 3. Linux PF + DPDK VF > > > 4. Linux PF + Linux VF (it's not our scope) > > [...] > > > Linux PF + DPDK VF has been tested with 1.0 API long time ago. > > > There is some test activities ongoing. > > " > > > > I think the Linux PF case is important and deserves more consideration. > > When looking at the code, specifically i40evf_vlan_offload_set() and > > i40evf_vlan_pvid_set(), I read this: > > " > > /* Linux pf host doesn't support vlan offload yet */ > > if (vf->version_major == I40E_DPDK_VERSION_MAJOR) { " > > > > Is there some work in progress on Linux side to get the same behaviour > > as with a DPDK PF? > > > > At least, it must be documented in > > doc/guides/nics/features/i40e_vf.ini > > and marked as partially supported (P instead of Y) in > > doc/guides/nics/i40e.rst Thanks for pointing it out. We'll sync our code with latest kernel driver and document and comment properly soon.
Re: [dpdk-dev] [PATCH v4] net/tap: add remote netdevice traffic capture
> On Mar 14, 2017, at 4:34 PM, Pascal Mazon wrote: > > By default, a tap netdevice is of no use when not fed by a separate > process. The ability to automatically feed it from another netdevice > allows applications to capture any kind of traffic normally destined to > the kernel stack. > > This patch implements this ability through a new optional "remote" > parameter. > > Packets matching filtering rules created with the flow API are matched > on the remote device and redirected to the tap PMD, where the relevant > action will be performed. > > Signed-off-by: Pascal Mazon > Acked-by: Olga Shern Acked-by: Keith Wiles > --- > doc/guides/nics/tap.rst | 17 ++ > drivers/net/tap/rte_eth_tap.c | 101 +- > drivers/net/tap/rte_eth_tap.h | 4 + > drivers/net/tap/tap_flow.c| 451 -- > drivers/net/tap/tap_flow.h| 24 +++ > 5 files changed, 575 insertions(+), 22 deletions(-) Regards, Keith
Re: [dpdk-dev] [PATCH v4 3/4] net/tap: add netlink back-end for flow API
> On Mar 14, 2017, at 4:29 PM, Pascal Mazon wrote: > > Each kernel netdevice may have queueing disciplines set for it, which > determine how to handle the packet (mostly on egress). That's part of > the TC (Traffic Control) mechanism. > > Through TC, it is possible to set filter rules that match specific > packets, and act according to what is in the rule. This is a perfect > candidate to implement the flow API for the tap PMD, as it has an > associated kernel netdevice automatically. > > Each flow API rule will be translated into its TC counterpart. > > To leverage TC, it is necessary to communicate with the kernel using > netlink. This patch introduces a library to help that communication. > > Inside netlink.c, functions are generic for any netlink messaging. > Inside tcmsgs.c, functions are specific to deal with TC rules. > > Signed-off-by: Pascal Mazon > Acked-by: Olga Shern Acked-by: Keith Wiles Regards, Keith
Re: [dpdk-dev] [PATCH v4 1/4] net/tap: move private elements to external header
> On Mar 14, 2017, at 4:29 PM, Pascal Mazon wrote: > > In the next patch, access to struct pmd_internals will be necessary in > tap_flow.c to store the flows. > > Signed-off-by: Pascal Mazon > Acked-by: Olga Shern Acked-by: Keith Wiles Regards, Keith
Re: [dpdk-dev] Compiling DPDK 17.02 with a Busybox-based tar
Why it is surprising that I need to compile DPDK on a system Busybox? I currently build DPDK both for my local Alpine Linux system (busybox) using the system compiler, use busybox for my cross-tools toolchain, and also use it in Libertine Linux. It means a system using DPDK can be much more minimal. Likewise, Aboriginal Linux uses Busybox (and, shortly, Toybox), for similar reasons. It also means users of my rust crate can work in a wider range of systems than RHEL / Ubuntu. On 14 March 2017 at 09:39, Thomas Monjalon wrote: > 2017-03-14 07:58, Raphael Cohn: > > Hi, > > > > To compile DPDK on a system with Busybox tar installed, it's necessary to > > make a small change to the build system:- > > > > sed -i -e '/--keep-newer-files/d' mk/rte.sdkinstall.mk > > sed -i -e 's;--strip-components=1 \\;--strip-components=1;g' mk/ > > rte.sdkinstall.mk > > > > I'm not sure whether the impact of this change fundamentally affects > DPDK. > > I'm a little surprised that tar is needed at all for a compile + install, > > but I haven't investigated further. Is it being used to do a copy? > > Yes it is used to make a copy. > It is a convenient one-liner. > > I am a bit surprised that you need to install DPDK with busybox. > The busybox systems are generally cross-built and prepared out of the box, > with the host tools. > However, if you feel it is important to install DPDK on such target, > you are welcome to propose a patch. > >
Re: [dpdk-dev] [PATCH 1/2] net/i40e: enable VF untag drop
> -Original Message- > From: Yigit, Ferruh > Sent: Tuesday, March 14, 2017 9:30 PM > To: Zhang, Qi Z ; Wu, Jingjing > ; Zhang, Helin > Cc: dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH 1/2] net/i40e: enable VF untag drop > > On 3/9/2017 3:24 AM, Zhang, Qi Z wrote: > > Hi Ferruh: > > > >> -Original Message- > >> From: Yigit, Ferruh > >> Sent: Tuesday, March 7, 2017 6:51 PM > >> To: Zhang, Qi Z ; Wu, Jingjing > >> ; Zhang, Helin > >> Cc: dev@dpdk.org > >> Subject: Re: [dpdk-dev] [PATCH 1/2] net/i40e: enable VF untag drop > >> > >> On 3/3/2017 1:59 AM, Qi Zhang wrote: > >>> Add a new private API to support the untag drop enable/disable for > >>> specific VF. > >>> > >>> Signed-off-by: Qi Zhang > >>> --- > >>> drivers/net/i40e/i40e_ethdev.c | 49 > >>> + > >>> drivers/net/i40e/rte_pmd_i40e.h | 18 +++ > >> > >> Shared library is giving build error because of API is missing in > >> *version.map file > >> > >>> 2 files changed, 67 insertions(+) > >>> > >> > >> <...> > >> > >>> diff --git a/drivers/net/i40e/rte_pmd_i40e.h > >>> b/drivers/net/i40e/rte_pmd_i40e.h index a0ad88c..895e2cc 100644 > >>> --- a/drivers/net/i40e/rte_pmd_i40e.h > >>> +++ b/drivers/net/i40e/rte_pmd_i40e.h > >>> @@ -332,4 +332,22 @@ int rte_pmd_i40e_get_vf_stats(uint8_t port, > >>> int rte_pmd_i40e_reset_vf_stats(uint8_t port, > >>> uint16_t vf_id); > >>> > >>> +/** > >>> + * Enable/Disable VF untag drop > >>> + * > >>> + * @param port > >>> + *The port identifier of the Ethernet device. > >>> + * @param vf_id > >>> + *VF on witch to enable/disable > >>> + * @param on > >>> + *Enable or Disable > >>> + * @retura > >> > >> @return > >> > >>> + * - (0) if successful. > >>> + * -(-ENODEVE) if *port* invalid > >>> + * -(-EINVAL) if bad parameter. > >>> + */ > >>> +int rte_pmd_i40e_set_vf_vlan_untag_drop(uint8_t port, > >>> + uint16_t vf_id, > >>> + uint8_t on); > >> > >> As discussed previously, I believe it is good to keep following syntax in > API: > >> __, for this API it becomes: > > I think, current naming rule is __ right? > > Overall, I am not aware of any defined naming rule, I am for defining one. > > > See below > > rte_pmd_i40e_set_vf_vlan_anti_spoof; > > rte_pmd_i40e_set_vf_vlan_filter; > > rte_pmd_i40e_set_vf_vlan_insert; > > rte_pmd_i40e_set_vf_vlan_stripq; > > rte_pmd_i40e_set_vf_vlan_tag; > > so what's wrong with this > > This breaks hierarchical approach, if you think API name as tree. Easier to > see this when you sort the APIs, ns_set_x, ns_reset_x, ns_del_x will spread > to different locations. I agree with your point, I had thought your concern is only about this patch, but actually it's not. > > This looks OK when you work on one type of object already, but with all APIs > in concern, I believe object based grouping is better than action based > grouping. > > And why do you think above one is better? Again, as long as one is agreed on, I don't, sorry for make you misunderstand > > >> > >> rte_pmd_i40e_vf_vlan_untag_drop_set(), and perhaps "set" can be > removed? > >> > >>> + > >>> #endif /* _PMD_I40E_H_ */ > >>> > >
Re: [dpdk-dev] [PATCH 2/2] app/testpmd: enable VF untag drop in testpmd
> -Original Message- > From: Yigit, Ferruh > Sent: Tuesday, March 14, 2017 9:32 PM > To: Zhang, Qi Z ; Wu, Jingjing > ; Zhang, Helin > Cc: dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH 2/2] app/testpmd: enable VF untag drop in > testpmd > > On 3/9/2017 2:59 AM, Zhang, Qi Z wrote: > > > > > >> -Original Message- > >> From: Yigit, Ferruh > >> Sent: Tuesday, March 7, 2017 7:14 PM > >> To: Zhang, Qi Z ; Wu, Jingjing > >> ; Zhang, Helin > >> Cc: dev@dpdk.org > >> Subject: Re: [dpdk-dev] [PATCH 2/2] app/testpmd: enable VF untag drop > >> in testpmd > >> > >> On 3/3/2017 1:59 AM, Qi Zhang wrote: > >>> Add command line to support untag drop to specific VF in testpmd. > >>> > >>> Signed-off-by: Qi Zhang > >>> --- > >>> app/test-pmd/cmdline.c | 104 > >>> + > >>> 1 file changed, 104 insertions(+) > >>> > >>> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index > >>> 43fc636..4ddc2c9 100644 > >>> --- a/app/test-pmd/cmdline.c > >>> +++ b/app/test-pmd/cmdline.c > >>> @@ -311,6 +311,10 @@ static void cmd_help_long_parsed(void > >>> *parsed_result, > >>> > >>> "set vf vlan antispoof (port_id) (vf_id) (on|off)\n" > >>> "Set VLAN antispoof for a VF from the PF.\n\n" > >>> +#ifdef RTE_LIBRTE_I40E_PMD > >>> + "set vf vlan untagdrop (port_id) (vf_id) (on|off)\n" > >>> + "Set VLAN untag drop for a VF from the PF.\n\n" > >>> +#endif > >> > >> We should decide how to implement PMD specific APIs in testpmd, and > >> be consistent about it. > >> > >> Currently there are two approaches: > >> > >> 1- Wrap PMD specific feature and API with and PMD #ifdef, as done here. > >> > >> 2- Enable feature by default, return -ENOTSUP for port_id that does > >> not support it. Ex: cmd_vf_rxvlan_filter. > >> > >> I am for second option. And explicitly not disabling I40E driver does > >> not mean you should have those NICs in your runtime environment, so > >> the effect will be same as always enabling it. > >> > > Yes, I notice this problem, during implementation, I saw both patterns > > exist, so I have to choose one of them We'd better align this. > > Both option ok for me, but a little bit prefer option 1 , since it's not > necessary to explore a command if no backend device, that make the hint > more clean. > > I agree it's not necessary to explore a command if no backend device, but > first option does not guaranties it. What it checks is compile time option, > not > runtime device detection. > I40E driver is enabled by default, and unless user explicitly disables it, > testpmd command will be there independent from device is there or not. So this still benefit custom build, right? Anyway, as I said, both are ok for me, I followed option 2 since most APIs follow this. Please check v2. > > >> > >> And since number of PMD specific APIs are increasing, perhaps we > >> should find a better approach for testpmd to prevent them corrupting > testpmd. > > Will think about this, also like to know if you or anyone have any good > suggestion. > >> > >> Also it may worth to discuss why number of PMD specific APIs are > increasing. > >> > >>> > >>> "set vf vlan tag (port_id) (vf_id) (on|off)\n" > >>> "Set VLAN tag for a VF from the PF.\n\n" > >>> @@ -10995,6 +10999,103 @@ cmdline_parse_inst_t > >> cmd_set_vf_vlan_anti_spoof = { > >>> }, > >>> }; > >>> > >> <...>
Re: [dpdk-dev] i40e SR-IOV with Linux PF
> -Original Message- > From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] > Sent: Tuesday, March 14, 2017 2:31 AM > To: Wu, Jingjing > Cc: Zhang, Helin ; Chen, Jing D > ; Yigit, Ferruh ; > vincent.jar...@6wind.com; dev@dpdk.org > Subject: Re: i40e SR-IOV with Linux PF > > 2017-03-14 04:44, Wu, Jingjing: > > From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] > > > > Hi i40e developers, > > > > > > > > Referring to the VFD discussion, I thought basic behaviours were > > > > the same regardless of the PF driver: > > > > http://dpdk.org/ml/archives/dev/2016-December/053056.html > > > > " > > > > > In the meanwhile, we have some test models ongoing to validate > > > > > combination of Linux and DPDK drivers for VF and PF. > > > > > We'll fully support below 4 cases going forward. > > > > > 1. DPDK PF + DPDK VF > > > > > 2. DPDK PF + Linux VF > > > > > 3. Linux PF + DPDK VF > > > > > 4. Linux PF + Linux VF (it's not our scope) > > > > [...] > > > > > Linux PF + DPDK VF has been tested with 1.0 API long time ago. > > > > > There is some test activities ongoing. > > > > " > > > > > > > > I think the Linux PF case is important and deserves more consideration. > > > > When looking at the code, specifically i40evf_vlan_offload_set() > > > > and i40evf_vlan_pvid_set(), I read this: > > > > " > > > > /* Linux pf host doesn't support vlan offload yet */ > > > > if (vf->version_major == I40E_DPDK_VERSION_MAJOR) { " > > > > > > > > Is there some work in progress on Linux side to get the same > > > > behaviour as with a DPDK PF? > > > > > > > > As I know, VFD features are marked with an "EXPERIMENTAL" tag. > > And we are working on the extendable interface (feature based) with PF > > kernel driver. > > The VLAN offload is not a VFD feature. It is a basic driver feature. > It is said that it is supported in the documentation but it is not with a > Linux PF. > > Please consider the rest of my email: > Yes, I was saying VLAN offload. What we are working to make VF and PF sync based on feature ability but not whether it is DPDK PF or Kernel PF. And, about the doc, yes, we need to add such limitation here. Thanks Jingjing > > > At least, it must be documented in > > > doc/guides/nics/features/i40e_vf.ini > > > and marked as partially supported (P instead of Y) in > > > doc/guides/nics/i40e.rst
Re: [dpdk-dev] [PATCH v2 00/13] introduce fail-safe PMD
> The central question that I would like to tackle is this: why should > we require from our users declaring a bonding device to have > hot-plug support? > We'll, strictly speaking, I suppose we don't have to require it. But by that same token, we don't need to do it in a separate PMD either, there are lots of other options. I think I used an ambiguous formulation here. To be sure that we understand each other, what I want to express is that it will certainly seem very strange for a user to declare a bond on a single device, first, and to be expected to do so if they wanted hot-plug support on that particular device. The bonding here would only be used as a place-holder, which does not really make sense from the user point of view. I think you understood that, and what I take from your response is that while agreeing you would like to do so differently. I just wanted to be clear. > I took some time to illustrate a few modes of operation: > > Fig. 1 > >.-. >| application | >`--.--' > | > .'-.-. <-- init, conf, Rx/Tx > | | | > | .---|--.--|--. <--- conf, link check, Rx/Tx > | | | | | | > v | v v v v > .-. | .---. .--. > | bonding | | | ixgbe | | mlx4 | > `.' | `---' `--' > | | > `--' > > Typical link fail-over. > > > Fig. 2 > > .-. > | application | > `--.--' > | < init, conf, Rx/Tx > v > .---. > | fail-safe | > `-.-' > | > .---'. <--- init, conf, dev check, Rx/Tx > || > vv > .---. .--. > | ixgbe | | mlx4 | > `---' `--' > > Typical automatic hot-plug handling with device fail-over. > > [...] Yes, I think we all understand the purpose of your PMD - its there to provide a placeholder device that can respond to application requests in a sane manner, until such time as real hardware is put in its place via a hot plug/failure. Thats all well and good, I'm saying there are better ways to go about this that can provide the same functionality without having to add an extra 4k lines of code to the project, many of which already exist. Ah, yes, I didn't want to imply that the purpose of this PMD wasn't understood already by many. These figures are there to illustrate some use cases that some users could recognize, and serve as a support for the point made afterward. The main thing that can be taken from these is the division along the link-level and device-level checking that is done. This explains most of my position. The nature of those checks imply different kind of code, most of which is thus actually not duplicated / would require pretty much the same amount of code to be implemented either as libraries or as part of the bonding PMD. This is the crux of my argument, which I expand upon below. > 1. LSC vs. RMV > > A link status change is a valid state for a device. It calls for > specific responses, e.g. a link switch in a bonding device, without > losing the general configuration of the port. > > The removal of a device calls for more than pausing operations and > switching an active device. The party responsible for initializing the > device should take care of closing it properly. If this party also > wants to be able to restore the device if it was plugged back in, it > would need be able to initialize it back and reconfigure its previous > state. > > As we can see that in [Fig. 1], this responsibility lies upon the > application. > Again, yes, I think we all see the benefit of centralizing hot plug operations, no one is disagreing with that, its the code/functional duplication that is concerning. Certainly, I will try to explain why the code is not actually duplicated / why the functions are actually only superficially overlapping. > 2. Bonding and link availability > > The hot-plug functionality is not a core function of the bonding PMD. > It is only interested in knowing if the link is active or not. > Currently, yes. The suggestion was that you augment the bonding driver so that hot plug is a core function of bonding. > Adding the device persistence to the bonding PMD would mean adding the > ability to flexibly parse device definitions to cope with plug-ins in > evolving busses (PCI hot-plug could mean changing bus addresses), being > able to emulate the EAL and the ether layer and to properly store the > device configuration. This means formally describing the life of a > device in a DPDK application from start to finish. > Which seems to me to be exactly what your PMD does. I don't see why its fundamentally harder to do that in an existing pmd, than it is in a new one. Indeed it does. I must emphasize the "formally describe the life of a device". The hot-plug functionality goes beyong the link-level check. The description of a device from a DPDK
Re: [dpdk-dev] Compiling DPDK 17.02 with a Busybox-based tar
2017-03-14 14:20, Raphael Cohn: > Why it is surprising that I need to compile DPDK on a system Busybox? > > I currently build DPDK both for my local Alpine Linux system (busybox) > using the system compiler, use busybox for my cross-tools toolchain, and > also use it in Libertine Linux. It means a system using DPDK can be much > more minimal. Likewise, Aboriginal Linux uses Busybox (and, shortly, > Toybox), for similar reasons. It also means users of my rust crate can work > in a wider range of systems than RHEL / Ubuntu. I understand the need for busybox on the target system. I was just saying that generally we use a more complete system on the host building a target. But you are free to use any tool you want :) PS: please reply in-line > On 14 March 2017 at 09:39, Thomas Monjalon > wrote: > > > 2017-03-14 07:58, Raphael Cohn: > > > Hi, > > > > > > To compile DPDK on a system with Busybox tar installed, it's necessary to > > > make a small change to the build system:- > > > > > > sed -i -e '/--keep-newer-files/d' mk/rte.sdkinstall.mk > > > sed -i -e 's;--strip-components=1 \\;--strip-components=1;g' mk/ > > > rte.sdkinstall.mk > > > > > > I'm not sure whether the impact of this change fundamentally affects > > DPDK. > > > I'm a little surprised that tar is needed at all for a compile + install, > > > but I haven't investigated further. Is it being used to do a copy? > > > > Yes it is used to make a copy. > > It is a convenient one-liner. > > > > I am a bit surprised that you need to install DPDK with busybox. > > The busybox systems are generally cross-built and prepared out of the box, > > with the host tools. > > However, if you feel it is important to install DPDK on such target, > > you are welcome to propose a patch. > > > >
[dpdk-dev] [PATCH v2] examples/ip_fragmentation: fix check of packet type
The packet_type in mbuf is not correctly filled by ixgbe 82599 NIC. To use the ether_type in ethernet header to check packet type is more reliaber. Fixes: 3c0184cc0c60 ("examples: replace some offload flags with packet type") Fixes: ab351fe1c95c ("mbuf: remove packet type from offload flags") Cc: sta...@dpdk.org Reported-by: Fangfang Wei Signed-off-by: Wei Dai --- examples/ip_fragmentation/main.c | 74 1 file changed, 74 insertions(+) diff --git a/examples/ip_fragmentation/main.c b/examples/ip_fragmentation/main.c index 9e9ecae..151f059 100644 --- a/examples/ip_fragmentation/main.c +++ b/examples/ip_fragmentation/main.c @@ -653,6 +653,74 @@ check_all_ports_link_status(uint8_t port_num, uint32_t port_mask) } } +/* Check L3 packet type detection capablity of the NIC port */ +static int +check_ptype(int portid) +{ + int i, ret; + int ptype_l3_ipv4 = 0, ptype_l3_ipv6 = 0; + uint32_t ptype_mask = RTE_PTYPE_L3_MASK; + + ret = rte_eth_dev_get_supported_ptypes(portid, ptype_mask, NULL, 0); + if (ret <= 0) + return 0; + + uint32_t ptypes[ret]; + + ret = rte_eth_dev_get_supported_ptypes(portid, ptype_mask, ptypes, ret); + for (i = 0; i < ret; ++i) { + if (ptypes[i] & RTE_PTYPE_L3_IPV4) + ptype_l3_ipv4 = 1; + if (ptypes[i] & RTE_PTYPE_L3_IPV6) + ptype_l3_ipv6 = 1; + } + + if (ptype_l3_ipv4 == 0) + printf("port %d cannot parse RTE_PTYPE_L3_IPV4\n", portid); + + if (ptype_l3_ipv6 == 0) + printf("port %d cannot parse RTE_PTYPE_L3_IPV6\n", portid); + + if (ptype_l3_ipv4 && ptype_l3_ipv6) + return 1; + + return 0; + +} + +/* Parse packet type of a packet by SW */ +static inline void +parse_ptype(struct rte_mbuf *m) +{ + struct ether_hdr *eth_hdr; + uint32_t packet_type = RTE_PTYPE_UNKNOWN; + uint16_t ether_type; + + eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *); + ether_type = eth_hdr->ether_type; + if (ether_type == rte_cpu_to_be_16(ETHER_TYPE_IPv4)) + packet_type |= RTE_PTYPE_L3_IPV4_EXT_UNKNOWN; + else if (ether_type == rte_cpu_to_be_16(ETHER_TYPE_IPv6)) + packet_type |= RTE_PTYPE_L3_IPV6_EXT_UNKNOWN; + + m->packet_type = packet_type; +} + +/* callback function to detect packet type for a queue of a port */ +static uint16_t +cb_parse_ptype(uint8_t port __rte_unused, uint16_t queue __rte_unused, + struct rte_mbuf *pkts[], uint16_t nb_pkts, + uint16_t max_pkts __rte_unused, + void *user_param __rte_unused) +{ + uint16_t i; + + for (i = 0; i < nb_pkts; ++i) + parse_ptype(pkts[i]); + + return nb_pkts; +} + static int init_routing_table(void) { @@ -944,6 +1012,12 @@ main(int argc, char **argv) ret, portid); rte_eth_promiscuous_enable(portid); + + if (check_ptype(portid) == 0) { + rte_eth_add_rx_callback(portid, 0, cb_parse_ptype, NULL); + printf("Add Rx callback funciton to detect L3 packet type by SW :" + " port = %d\n", portid); + } } if (init_routing_table() < 0) -- 2.7.4
[dpdk-dev] [PATCH] ip_frag: handle MTU sizes not aligned to 8 bytes
The rte_ipv4_fragment_packet API expects that the link/interface MTU value passed in be divisible by 8 bytes. Given the name of the parameter is "mtu" rather than "frag_size" it is not necessarily the case that it will be divisible by 8. An MTU of 1500 happens to produce a max fragment size of 1480 (1500 - sizeof(ipv4_hdr)) which is divisible by 8 but other MTU values such as 1600 or 9000 do not produce values that are divisible by 8. Unfortunately, the API checks that the frag_size value produced is divisible by 8 with a call to RTE_ASSERT which is only enabled when the RTE_LOG_LEVEL >= RTE_LOG_DEBUG. In cases where the log level is set normally the code silently continues and produces IP fragments that have invalid fragment offset values. An application may not have control over what MTU a user selects and rather than have each application adjust the MTU to pass a suitable value to the fragmentation API this change modifies the fragmentation API to handle cases where the "mtu" argument is not divisible by 8 and automatically adjust the internal "frag_size". Signed-off-by: Allain Legacy --- lib/librte_ip_frag/rte_ipv4_fragmentation.c | 19 --- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/lib/librte_ip_frag/rte_ipv4_fragmentation.c b/lib/librte_ip_frag/rte_ipv4_fragmentation.c index a2259e8..8c5f5ec 100644 --- a/lib/librte_ip_frag/rte_ipv4_fragmentation.c +++ b/lib/librte_ip_frag/rte_ipv4_fragmentation.c @@ -48,7 +48,7 @@ #defineIPV4_HDR_DF_MASK(1 << IPV4_HDR_DF_SHIFT) #defineIPV4_HDR_MF_MASK(1 << IPV4_HDR_MF_SHIFT) -#defineIPV4_HDR_FO_MASK((1 << IPV4_HDR_FO_SHIFT) - 1) +#defineIPV4_HDR_FO_ALIGN (1 << IPV4_HDR_FO_SHIFT) static inline void __fill_ipv4hdr_frag(struct ipv4_hdr *dst, const struct ipv4_hdr *src, uint16_t len, uint16_t fofs, @@ -103,11 +103,14 @@ static inline void __free_fragments(struct rte_mbuf *mb[], uint32_t num) uint32_t out_pkt_pos, in_seg_data_pos; uint32_t more_in_segs; uint16_t fragment_offset, flag_offset, frag_size; + uint16_t frag_bytes_remaining; - frag_size = (uint16_t)(mtu_size - sizeof(struct ipv4_hdr)); - - /* Fragment size should be a multiply of 8. */ - RTE_ASSERT((frag_size & IPV4_HDR_FO_MASK) == 0); + /* +* Ensure the IP payload length of all fragments is aligned to a +* multiple of 8 bytes as per RFC791 section 2.3. +*/ + frag_size = RTE_ALIGN_FLOOR((mtu_size - sizeof(struct ipv4_hdr)), + IPV4_HDR_FO_ALIGN); in_hdr = rte_pktmbuf_mtod(pkt_in, struct ipv4_hdr *); flag_offset = rte_cpu_to_be_16(in_hdr->fragment_offset); @@ -142,6 +145,7 @@ static inline void __free_fragments(struct rte_mbuf *mb[], uint32_t num) /* Reserve space for the IP header that will be built later */ out_pkt->data_len = sizeof(struct ipv4_hdr); out_pkt->pkt_len = sizeof(struct ipv4_hdr); + frag_bytes_remaining = frag_size; out_seg_prev = out_pkt; more_out_segs = 1; @@ -161,7 +165,7 @@ static inline void __free_fragments(struct rte_mbuf *mb[], uint32_t num) /* Prepare indirect buffer */ rte_pktmbuf_attach(out_seg, in_seg); - len = mtu_size - out_pkt->pkt_len; + len = frag_bytes_remaining; if (len > (in_seg->data_len - in_seg_data_pos)) { len = in_seg->data_len - in_seg_data_pos; } @@ -171,9 +175,10 @@ static inline void __free_fragments(struct rte_mbuf *mb[], uint32_t num) out_pkt->pkt_len); out_pkt->nb_segs += 1; in_seg_data_pos += len; + frag_bytes_remaining -= len; /* Current output packet (i.e. fragment) done ? */ - if (unlikely(out_pkt->pkt_len >= mtu_size)) + if (unlikely(frag_bytes_remaining == 0)) more_out_segs = 0; /* Current input segment done ? */ -- 1.8.3.1
Re: [dpdk-dev] [PATCH v2] net/sfc: set multicast address list in started state only
On 3/9/2017 4:21 PM, Andrew Rybchenko wrote: > From: Ivan Malov > > According to 'libefx' API requirements, one is allowed to > apply multicast address list to the port in started state > only, otherwise the new array should be copied to a local > storage in order to be applied during the next port start > > Fixes: 0fa0070e4391 ("net/sfc: support multicast addresses list controls") > Coverity issue: 141296 > > Fixes: e9ddf37a507d ("net/sfc: fix setting empty multicast list") > Cc: sta...@dpdk.org > > Signed-off-by: Ivan Malov > Signed-off-by: Andrew Rybchenko > Reviewed-by: Andrew Lee > Reviewed-by: Andy Moreton Applied to dpdk-next-net/master, thanks.
Re: [dpdk-dev] [PATCH] net/e1000: advertise offload capabilities for the EM PMD
On 3/13/2017 3:15 AM, Lu, Wenzhuo wrote: > Hi, > >> -Original Message- >> From: Allain Legacy [mailto:allain.leg...@windriver.com] >> Sent: Friday, March 10, 2017 10:38 PM >> To: Lu, Wenzhuo >> Cc: dev@dpdk.org >> Subject: [PATCH] net/e1000: advertise offload capabilities for the EM PMD >> >> The hardware offload capabilities are not being advertised for the EM PMD. >> Because of this, applications that only enable these features if the device >> advertises them will never do so. >> >> Normally this is not an issue since normal packet processing should work >> even if >> hardware offload is not available. But, in older versions of Virtual Box >> the e1000 >> device emulation (Intel PRO/1000 MT Desktop 82540EM) assumes that it should >> enable VLAN stripping even if the driver does not request it. This means >> that any >> ingress packets that have a VLAN tag will be stripped. Since the >> application did >> not request to enable VLAN stripping it is not expecting these packets so >> they >> are not processed as VLAN packets. >> >> Regardless of the Virtual Box issue, the driver should be advertising >> supported >> capabilities as is done in other drivers. >> >> Signed-off-by: Allain Legacy > Acked-by: Wenzhuo Lu Applied to dpdk-next-net/master, thanks.
Re: [dpdk-dev] [PATCH v2] net/thunderx: check data offset alignment requirement
On 3/13/2017 8:02 AM, Jerin Jacob wrote: > nicvf HW expects the DMA address of the packet data to be > aligned with cache line size. > > Packet data offset is a function of struct mbuf size, > mbuf private size and headroom. mbuf private size can > be changed from the application in pool creation, this > check detects HW alignment requirement constraint in pmd > start function. > > Signed-off-by: Jerin Jacob > Acked-by: Hemant Agrawal Applied to dpdk-next-net/master, thanks.
Re: [dpdk-dev] [PATCH] net/vmxnet3: fix broken debug build
On 3/13/2017 5:19 PM, Shrikrishna Khare wrote: > > > On Mon, 13 Mar 2017, Andrew Rybchenko wrote: > >> Fixes: 4dd452523000 ("net/vmxnet3: add receive data ring support") >> >> Signed-off-by: Andrew Rybchenko > > Thank you for catching and fixing this. > > Acked-by: Shrikrishna Khare Squashed into original commit in next-net [1], thanks. [1] 4dd452523000 ("net/vmxnet3: add receive data ring support")
Re: [dpdk-dev] [PATCH] net/vmxnet3: fix queue size changes
On Mon, Mar 13, 2017 at 11:41 PM, Charles (Chas) Williams wrote: > If the user reconfigures the queues size, then the previosly allocated > memzone may potentially be too small. Instead, always free the old > memzone and allocate a new one. > > Fixes: dfaff37fc46d ("vmxnet3: import new vmxnet3 poll mode driver > implementation") > > Signed-off-by: Chas Williams > --- > drivers/net/vmxnet3/vmxnet3_rxtx.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c > b/drivers/net/vmxnet3/vmxnet3_rxtx.c > index 6649c3f..104e040 100644 > --- a/drivers/net/vmxnet3/vmxnet3_rxtx.c > +++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c > @@ -893,8 +893,8 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf > **rx_pkts, uint16_t nb_pkts) > > /* > * Create memzone for device rings. malloc can't be used as the physical > address is > - * needed. If the memzone is already created, then this function returns a > ptr > - * to the old one. > + * needed. If the memzone already exists, we free it since it may have been > created > + * with a different size. > */ > static const struct rte_memzone * > ring_dma_zone_reserve(struct rte_eth_dev *dev, const char *ring_name, > @@ -909,7 +909,7 @@ ring_dma_zone_reserve(struct rte_eth_dev *dev, const char > *ring_name, > > mz = rte_memzone_lookup(z_name); > if (mz) > - return mz; > + rte_memzone_free(mz); > > return rte_memzone_reserve_aligned(z_name, ring_size, >socket_id, 0, > VMXNET3_RING_BA_ALIGN); Chas, Thanks for hunting this one down. Wouldn't the rte_memzone_free() better fit into vmxnet3_cmd_ring_release() ? Also the ring_dma_zone_reserve() could get replaced by rte_eth_dma_zone_reserve() (see also http://dpdk.org/dev/patchwork/patch/21432/).
[dpdk-dev] [PATCH] igb_uio: support devices with at least 1 bar defined
From: Matt Peters Allow the BAR setup to succeed if a device has at least 1 BAR region defined. Previously, the device probe would only succeed if at least one memory BAR existed, but there are devices that have only port I/O BARs. For example, on Virtual Box a virtio device has only a single I/O BAR because by default MSI-X is not enabled. While in qemu/kvm the virtio device has MSI-X enabled and therefore has both an I/O and Memory BAR. The following are excerpts from "lspci -nn -s 00:09.0" on both types of systems. Virtual Box: Region 0: I/O ports at d260 [size=32] Capabilities: [80] #00 [] QEMU/KVM: Region 0: I/O ports at c060 [size=32] Region 1: Memory at febd1000 (32-bit, non-prefetchable) [size=4K] Expansion ROM at feb8 [disabled] [size=256K] Capabilities: [40] MSI-X: Enable+ Count=3 Masked- Vector table: BAR=1 offset= PBA: BAR=1 offset=0800 Signed-off-by: Matt Peters Signed-off-by: Allain Legacy --- lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c index df41e45..a910eb8 100644 --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c @@ -314,7 +314,7 @@ struct rte_uio_pci_dev { } } - return (iom != 0) ? ret : -ENOENT; + return (iom != 0 || iop != 0) ? ret : -ENOENT; } #if LINUX_VERSION_CODE < KERNEL_VERSION(3, 8, 0) -- 1.8.3.1
Re: [dpdk-dev] [PATCH] net/vmxnet3: fix queue size changes
On 03/14/2017 12:11 PM, Jan Blunck wrote: On Mon, Mar 13, 2017 at 11:41 PM, Charles (Chas) Williams wrote: If the user reconfigures the queues size, then the previosly allocated memzone may potentially be too small. Instead, always free the old memzone and allocate a new one. Fixes: dfaff37fc46d ("vmxnet3: import new vmxnet3 poll mode driver implementation") Signed-off-by: Chas Williams --- drivers/net/vmxnet3/vmxnet3_rxtx.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c b/drivers/net/vmxnet3/vmxnet3_rxtx.c index 6649c3f..104e040 100644 --- a/drivers/net/vmxnet3/vmxnet3_rxtx.c +++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c @@ -893,8 +893,8 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts) /* * Create memzone for device rings. malloc can't be used as the physical address is - * needed. If the memzone is already created, then this function returns a ptr - * to the old one. + * needed. If the memzone already exists, we free it since it may have been created + * with a different size. */ static const struct rte_memzone * ring_dma_zone_reserve(struct rte_eth_dev *dev, const char *ring_name, @@ -909,7 +909,7 @@ ring_dma_zone_reserve(struct rte_eth_dev *dev, const char *ring_name, mz = rte_memzone_lookup(z_name); if (mz) - return mz; + rte_memzone_free(mz); return rte_memzone_reserve_aligned(z_name, ring_size, socket_id, 0, VMXNET3_RING_BA_ALIGN); Chas, Thanks for hunting this one down. Wouldn't the rte_memzone_free() better fit into vmxnet3_cmd_ring_release() ? I don't care which way it goes. I just did what is basically done in gpa_zone_reserve() to match the "style". Tracking the current ring size and avoiding reallocating a potentially large chunk of memory seems like a better idea. Also the ring_dma_zone_reserve() could get replaced by rte_eth_dma_zone_reserve() (see also Yes, it probably should get changed to that along with tracking the size.
Re: [dpdk-dev] [PATCH v2 1/2] eal/linux: support external Rx interrupt
> On Mar 14, 2017, at 6:03 AM, Shahaf Shuler wrote: > > Prior to this patch only UIO/VFIO interrupt handlers types were supported. > This patch adds support for the external interrupt handler type, allowing > external drivers to set their own fds with specific interrupt handlers. > > Signed-off-by: Shahaf Shuler Acked-by: Yongseok Koh Regards, Yongseok
Re: [dpdk-dev] [PATCH v2 2/2] net/mlx5: support user space rxq interrupt event
> On Mar 14, 2017, at 6:03 AM, Shahaf Shuler wrote: > > Implement rxq interrupt callbacks > > Signed-off-by: Shahaf Shuler Acked-by: Yongseok Koh Regards, Yongseok
Re: [dpdk-dev] [PATCH] igb_uio: support devices with at least 1 bar defined
On 3/14/2017 4:33 PM, Allain Legacy wrote: > From: Matt Peters > > Allow the BAR setup to succeed if a device has at least 1 BAR region > defined. Previously, the device probe would only succeed if at least one > memory BAR existed, but there are devices that have only port I/O BARs. > > For example, on Virtual Box a virtio device has only a single I/O BAR > because by default MSI-X is not enabled. While in qemu/kvm the virtio > device has MSI-X enabled and therefore has both an I/O and Memory BAR. > > The following are excerpts from "lspci -nn -s 00:09.0" on both types of > systems. > > Virtual Box: > > Region 0: I/O ports at d260 [size=32] > Capabilities: [80] #00 [] > > QEMU/KVM: > > Region 0: I/O ports at c060 [size=32] > Region 1: Memory at febd1000 (32-bit, non-prefetchable) [size=4K] > Expansion ROM at feb8 [disabled] [size=256K] > Capabilities: [40] MSI-X: Enable+ Count=3 Masked- > Vector table: BAR=1 offset= > PBA: BAR=1 offset=0800 > > Signed-off-by: Matt Peters > Signed-off-by: Allain Legacy Acked-by: Ferruh Yigit
Re: [dpdk-dev] [PATCH 1/2] net/tap: remove redundant syscall on Tx
On 3/14/2017 1:52 PM, Wiles, Keith wrote: > >> On Mar 14, 2017, at 8:51 PM, Adrien Mazarguil >> wrote: >> >> Polling the Tx queue file descriptor before writing to it is not mandatory >> since it is configured as non-blocking. >> >> Signed-off-by: Adrien Mazarguil >> Acked-by: Pascal Mazon > > Acked-by: Keith Wiles Series applied to dpdk-next-net/master, thanks.
[dpdk-dev] [PATCH v3] lpm: extend IPv6 next hop field
This patch extend next_hop field from 8-bits to 21-bits in LPM library for IPv6. Added versioning symbols to functions and updated library and applications that have a dependency on LPM library. Signed-off-by: Vladyslav Buslov Acked-by: Bruce Richardson --- Fixed compilation error in l3fwd_lpm.h doc/guides/prog_guide/lpm6_lib.rst | 2 +- doc/guides/rel_notes/release_17_05.rst | 5 + examples/ip_fragmentation/main.c| 17 +-- examples/ip_reassembly/main.c | 17 +-- examples/ipsec-secgw/ipsec-secgw.c | 2 +- examples/l3fwd/l3fwd_lpm.h | 2 +- examples/l3fwd/l3fwd_lpm_sse.h | 24 ++--- examples/performance-thread/l3fwd-thread/main.c | 11 +- lib/librte_lpm/rte_lpm6.c | 134 +--- lib/librte_lpm/rte_lpm6.h | 32 +- lib/librte_lpm/rte_lpm_version.map | 10 ++ lib/librte_table/rte_table_lpm_ipv6.c | 9 +- test/test/test_lpm6.c | 115 ++-- test/test/test_lpm6_perf.c | 4 +- 14 files changed, 293 insertions(+), 91 deletions(-) diff --git a/doc/guides/prog_guide/lpm6_lib.rst b/doc/guides/prog_guide/lpm6_lib.rst index 0aea5c5..f791507 100644 --- a/doc/guides/prog_guide/lpm6_lib.rst +++ b/doc/guides/prog_guide/lpm6_lib.rst @@ -53,7 +53,7 @@ several thousand IPv6 rules, but the number can vary depending on the case. An LPM prefix is represented by a pair of parameters (128-bit key, depth), with depth in the range of 1 to 128. An LPM rule is represented by an LPM prefix and some user data associated with the prefix. The prefix serves as the unique identifier for the LPM rule. -In this implementation, the user data is 1-byte long and is called "next hop", +In this implementation, the user data is 21-bits long and is called "next hop", which corresponds to its main use of storing the ID of the next hop in a routing table entry. The main methods exported for the LPM component are: diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst index 4b90036..918f483 100644 --- a/doc/guides/rel_notes/release_17_05.rst +++ b/doc/guides/rel_notes/release_17_05.rst @@ -41,6 +41,9 @@ New Features Also, make sure to start the actual text at the margin. = +* **Increased number of next hops for LPM IPv6 to 2^21.** + + The next_hop field is extended from 8 bits to 21 bits for IPv6. * **Added powerpc support in pci probing for vfio-pci devices.** @@ -114,6 +117,8 @@ API Changes Also, make sure to start the actual text at the margin. = +* The LPM ``next_hop`` field is extended from 8 bits to 21 bits for IPv6 + while keeping ABI compatibility. ABI Changes --- diff --git a/examples/ip_fragmentation/main.c b/examples/ip_fragmentation/main.c index 9e9ecae..1b005b5 100644 --- a/examples/ip_fragmentation/main.c +++ b/examples/ip_fragmentation/main.c @@ -265,8 +265,8 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf, uint8_t queueid, uint8_t port_in) { struct rx_queue *rxq; - uint32_t i, len, next_hop_ipv4; - uint8_t next_hop_ipv6, port_out, ipv6; + uint32_t i, len, next_hop; + uint8_t port_out, ipv6; int32_t len2; ipv6 = 0; @@ -290,9 +290,9 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf, ip_dst = rte_be_to_cpu_32(ip_hdr->dst_addr); /* Find destination port */ - if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop_ipv4) == 0 && - (enabled_port_mask & 1 << next_hop_ipv4) != 0) { - port_out = next_hop_ipv4; + if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop) == 0 && + (enabled_port_mask & 1 << next_hop) != 0) { + port_out = next_hop; /* Build transmission burst for new port */ len = qconf->tx_mbufs[port_out].len; @@ -326,9 +326,10 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf, ip_hdr = rte_pktmbuf_mtod(m, struct ipv6_hdr *); /* Find destination port */ - if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr, &next_hop_ipv6) == 0 && - (enabled_port_mask & 1 << next_hop_ipv6) != 0) { - port_out = next_hop_ipv6; + if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr, + &next_hop) == 0 && + (enabled_port_mask & 1 << next_hop) != 0) { + port_out = next_hop; /* Build transmission burst for new port
Re: [dpdk-dev] [PATCH v4 00/17] Wind River Systems AVP PMD vs virtio?
Allain, see inline, + I did restore the thread from http://dpdk.org/ml/archives/dev/2017-March/060087.html into the same email. To make it short, using ivshmem, you keep people unfocused from virtio. Vincent, Perhaps you can help me understand why the performance or functionality of AVP vs. Virtio is relevant to the decision of accepting this driver. There are many drivers in the DPDK; most of which provide the same functionality at comparable performance rates. AVP is just another such driver. The fact that it is virtual rather than physical, in my opinion, should not influence the decision of accepting this driver. No, it is a different logic: your driver is about Qemu's vNIC support. Other PMDs are here to support vendors and different hypervisors. For isntance, in case of vmxnet3, different versions, evolutions and optimisations can be managed inside vmxnet3. We need to avoid the proliferation of PMDs for a same hypervisor while there is already an efficient solution, virtio. On the other hand, code quality/complexity or lack of a maintainer are reasonable reasons for rejecting. If our driver is accepted we are committed to maintaining it and testing changes as required by any driver framework changes which may impact all drivers. But, then it is unfocusing your capabilities. As a community, we need to be sure that we are all focused on improving existing solutions. Since virtio is the one, I would rather prefer to see more people working on improving the virtio's community instead of getting everyone unfocused. Along the same lines, I do not understand why upstreaming AVP in to the Linux kernel or qemu/kvm should be a prerequisite for inclusion in the DPDK. Continuing my analogy from above, the AVP device is a commercial offering tied to the Wind River Systems Titanium product line. It enables virtualized DPDK applications and increases DPDK adoption. Similarly to how a driver from company XYX is tied to a commercial NIC that must be purchased by a customer, our AVP device is available to operators that choose to leverage our Titanium product to implement their Cloud solutions. It is not our intention to upstream the qemu/kvm or host vswitch portion of the AVP device. Our qemu/kvm extensions are GPL so they are available to our customers if they desire to rebuild qemu/kvm with their own proprietary extensions It was a solution before 2013, but now we are in 2017, vhost-user is here, show them the new proper path. Let's be focus, please. Our AVP device was implemented in 2013 in response to the challenge of lower than required performance of qemu/virtio in both user space and DPDK applications in the VM. Rather than making complex changes to qemu/virtio and continuously have to forward prop those as we upgraded to newer versions of qemu we decided to decouple ourselves from that code base. We developed the AVP device based on an evolution of KNI+ivshmem by enhancing both with features that would meet the needs of our customers; better performance, multi-queue support, live-migration support, hot-plug support. As I said in my earlier response, since 2013, qemu/virtio has seen improved performance with the introduction of vhost-user. The performance of vhost-user still has not yet achieved performance levels equal to our AVP PMD. Frankly, I was on the boat than you: I did strong pitch the use of ivshmem for vNICs for many benefits. But Redhat folks did push bash to ask every one to be focused in virtio. So, back to 2013, I would have supported your AVP approach, but now we are in 2017 and we must stack focused on a single and proper qemu solution => virtio/vhost. I acknowledge that the AVP driver could exist as an out-of-tree driver loaded as a shared library at runtime. In fact, 2 years ago we released our driver source on github for this very reason. We provide instructions and support for building the AVP PMD as a shared library. Some customers have adopted this method while many insist on an in-tree driver for several reasons. Most importantly, they want to eliminate the burden of needing to build and support an additional package into their product. An in-tree driver would eliminate the need for a separate build/packaging process. Also, they want an option that allows them to be able to develop directly on the bleeding edge of DPDK rather than waiting on us to update our out-of-tree driver based on stable releases of the DPDK. In this regard, an in-tree driver would allow our customers to work directly on the latest DPDK. Make them move to virtio, then this pain will disappear. An in-tree driver provides obvious benefits to our customers, but keep in mind that this also provides a benefit to the DPDK. If a customer must develop on a stable release because they must use an out-of-tree driver then they are less likely to contribute fixes/enhancements/testing upstream. I know this first hand because I work with software from different sources on a daily ba
Re: [dpdk-dev] [PATCH] net/i40e: fix error log in descriptor done function
On 3/1/2017 2:45 PM, Olivier Matz wrote: > It's not queue identifier but a descriptor identifier. > > Fixes: 4861cde46116 ("i40e: new poll mode driver") > > Signed-off-by: Olivier Matz Acked-by: Ferruh Yigit