Re: [dpdk-dev] [PATCH] net/virtio-user: support changing tap interface name

2017-03-14 Thread Yuanhan Liu

On Sat, Mar 11, 2017 at 03:36:58PM +, Wenfeng Liu wrote:
> This patch adds a new option 'iface' to change the interface name of
> tap device with vhost-kernel as backend.
> 
> Signed-off-by: Wenfeng Liu 

Jianfeng, any comments?

--yliu


[dpdk-dev] Compiling DPDK 17.02 using musl instead of glibc

2017-03-14 Thread Raphael Cohn
Hi,

I've managed to compile DPDK 17.02 for musl instead of glibc. In essence, I
had to change a number of files. I've not prepared a patch at this stage,
but I wanted to share my changes with the list so that others can benefit.

Firstly, I apply the following minor sed edits:-

# Assumes glibc
sed -i -e 's;#include ;#include \n#include
;g' lib/librte_eal/common/include/rte_lcore.h
sed -i -e 's;#include ;#include \n#include
;g' lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
sed -i -e 's;#include ;#include \n#include
;g' lib/librte_eal/linuxapp/eal/eal_memory.c
sed -i -e 's;#include ;#include \n#include
\n;g' lib/librte_eal/linuxapp/eal/eal_pci_uio.c
sed -i -e 's;#define PAGE_SIZE;#undef PAGE_SIZE\n#define PAGE_SIZE;g'
lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
sed -i -e 's;#include ;#include \n#include
;g' app/test/test_eal_flags.c

# ?
sed -i -e 's;uint hash_key_len;uint8_t hash_key_len;g'
app/test-pmd/testpmd.c app/test-pmd/testpmd.h app/t

Secondly, I replaced these files with dummies as they use glibc
functionality without equivalence:-
lib/librte_eal/linuxapp/eal/eal_debug.c
lib/librte_eal/linuxapp/eal/eal_log.c
lib/librte_eal/linuxapp/eal/eal_vfio_mp_sync.c

That's not ideal

Lastly, I also had to modify musl itself:-
- Musl does not ship sys/queue.h. Perhaps in future DPDK could ship its own
copy of this? Musl will not be shipping it, either
- Musl lacks some functions in bits/io.h for x86_64. They seem open to
adding them, but need a contribution from someone knowledgeable (
http://www.openwall.com/lists/musl/2015/11/10/1). There is a patch shipped
with OSv for it.

Raph


[dpdk-dev] Changes to DPDK 17.02 to cross compile for Linux on Mac OS X

2017-03-14 Thread Raphael Cohn
Hi,

If anyone's interested, there are a few small changes needed to pmdinfogen
to get it to cross-compile using Homebrew on Mac OS X. These aren't really
patch worthy, as my changes will break it for other BSD-based systems.

Firstly
local libelfPrefix="$(brew --prefix libelf)"
local extra="-I${libelfPrefix}/include/libelf
-I${libelfPrefix}/include -isystem${dpdkSrcDir}/buildtools/pmdinfogen"
sed -i -e 's;-g;-g '"$extra"';g'
buildtools/pmdinfogen/Makefile
sed -i -e 's;#include ;#include
"endian.h";g' buildtools/pmdinfogen/pmdinfogen.h

Secondly, create a simple endian.h file inside buildtools/pmdinfogen. There
are a few ones floating around on the web.

Thirdly, create a stripped down elf.h inside buildtools/pmdinfogen. There
are a few public domain ones floating around on the web. I'm not sure about
the copyright status of these, so I've not linked to them.

I'm using these changes currently as part of a Rust crate wrapping DPDK,
called `dpdk-sys` (it has a mid-level Rust wrapper called `dpdk`). It
compiles but it's early days - see https://github.com/lemonrock/dpdk .

Raph


Re: [dpdk-dev] [PATCH 1/3] app/testpmd: add port reset command into testpmd

2017-03-14 Thread Zhao1, Wei
Hi, Ferruh 

> -Original Message-
> From: Yigit, Ferruh
> Sent: Wednesday, March 8, 2017 7:07 PM
> To: Zhao1, Wei ; dev@dpdk.org
> Cc: Lu, Wenzhuo 
> Subject: Re: [dpdk-dev] [PATCH 1/3] app/testpmd: add port reset command
> into testpmd
> 
> On 3/3/2017 4:56 AM, Wei Zhao wrote:
> > Add vf port reset command into testpmd project, it is the interface
> > for user to reset vf port.
> 
> I think it is better to change the order of this patch, first implement new 
> API
> in ethdev, later this patch implement new API in testpmd.
> 

You means change the order of patch 0001 and 0002 ?0003 remain the same?
Ok,I will do that in v2 patch
 
> >
> > Signed-off-by: Wei Zhao 
> > Signed-off-by: Wenzhuo Lu 
> > ---
> >  app/test-pmd/cmdline.c | 17 ++---  app/test-pmd/testpmd.c |
> > 67 ++
> >  app/test-pmd/testpmd.h |  1 +
> >  3 files changed, 81 insertions(+), 4 deletions(-)
> >
> > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
> > 43fc636..59db672 100644
> > --- a/app/test-pmd/cmdline.c
> > +++ b/app/test-pmd/cmdline.c
> > @@ -596,6 +596,9 @@ static void cmd_help_long_parsed(void
> *parsed_result,
> > "port close (port_id|all)\n"
> > "Close all ports or port_id.\n\n"
> >
> > +   "port reset (port_id|all)\n"
> > +   "Reset all ports or port_id.\n\n"
> 
> It is not clear what reset does to the port. This is only for VF right?
> Adding reset here hides that it is for VF.

Yes, it is only for VF port reset, maybe I should change cmd line command to 
"port vf_reset  index"
that mode to avoid ambiguous for user in v2

> 
> <...>
> 
> > @@ -601,6 +602,7 @@ init_config(void)
> > if (init_fwd_streams() < 0)
> > rte_exit(EXIT_FAILURE, "FAIL from init_fwd_streams()\n");
> >
> > +
> 
> This may be unintentional.

Yes, I will fix it in v2 patch. 

> 
> <...>
> 
> > @@ -1350,6 +1363,10 @@ start_port(portid_t pid)
> > return -1;
> > }
> > }
> > +
> > +   /* register reset interrupt callback */
> > +   rte_eth_dev_callback_register(pi,
> RTE_ETH_EVENT_INTR_RESET,
> > +   reset_event_callback, NULL);
> 
> So each port started will register a callback to handle reset events,
> 
> 1- isn't this overkill for the usecases that does not need this reset?
> 2- should there be an unregister event?
> 3- This issue can be fixed in testpmd, but for user application, is this the
> suggested way?

1. it is hard to know which port may be need to reset in the feature at the 
beginning stage, so there is  register event for all ports.
2. this is only a demo for user application. I think in user application, there 
should be a specific thread to check whether reset_ports is set by 
 I40e function _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET, 
NULL),  so  RTE_ETH_EVENT_INTR_RESET must be registered.
 IF true, then user application should call function rte_eth_dev_reset(pi)
3. . this is only a demo for user application, user application can create a 
new specific thread to do this work , but testpmd app can not because that may 
be introduce 
   A lot of unknow problem for existing feature.

> 
> > if (port->need_reconfig_queues > 0) {
> > port->need_reconfig_queues = 0;
> > /* setup tx queues */
> > @@ -1559,6 +1576,56 @@ close_port(portid_t pid)  }
> >
> >  void
> > +reset_port(portid_t pid)
> > +{
> > +   portid_t pi;
> > +   struct rte_port *port;
> > +
> > +   if (port_id_is_invalid(pid, ENABLED_WARN))
> > +   return;
> > +
> > +   printf("Closing ports...\n");
> > +
> > +   FOREACH_PORT(pi, ports) {
> 
> Since we already know the port_id (pid), why iterating through all ports?

This is usefully, because when user command is “port reset all”, the port_id is 
0x
If all of these ports are vf, then it will need to retrieve all port.

> 
> > +   if (pid != pi && pid != (portid_t)RTE_PORT_ALL)
> > +   continue;
> > +
> > +   if (port_is_forwarding(pi) != 0 && test_done == 0) {
> > +   printf("Please remove port %d from forwarding "
> > +   "configuration.\n", pi);
> > +   continue;
> > +   }
> > +
> > +   if (port_is_bonding_slave(pi)) {
> > +   printf("Please remove port %d from "
> > +   "bonded device.\n", pi);
> > +   continue;
> > +   }
> > +
> > +   if (!reset_ports[pi]) {
> > +   printf("vf must get reset port %d info from "
> > +   "pf before reset.\n", pi);
> > +   continue;
> > +   }
> 
> Can there be a timing issue here? Is it possible that reset occurred already
> and we are in the middle of the callback function when this check don

[dpdk-dev] Compiling DPDK 17.02 with a Busybox-based tar

2017-03-14 Thread Raphael Cohn
Hi,

To compile DPDK on a system with Busybox tar installed, it's necessary to
make a small change to the build system:-

sed -i -e '/--keep-newer-files/d' mk/rte.sdkinstall.mk
sed -i -e 's;--strip-components=1 \\;--strip-components=1;g' mk/
rte.sdkinstall.mk

I'm not sure whether the impact of this change fundamentally affects DPDK.
I'm a little surprised that tar is needed at all for a compile + install,
but I haven't investigated further. Is it being used to do a copy?

Raph


Re: [dpdk-dev] [PATCHv8 19/46] pool/dpaa2: add DPAA2 hardware offloaded mempool

2017-03-14 Thread Olivier Matz
Hi Hemant,

On Tue, 14 Mar 2017 12:12:17 +0530, Hemant Agrawal  
wrote:
> On 3/9/2017 11:27 AM, Hemant Agrawal wrote:
> > On 3/8/2017 9:09 PM, Thomas Monjalon wrote:  
> >> 2017-03-08 18:22, Hemant Agrawal:  
>  On Fri, 3 Mar 2017 18:16:36 +0530, Hemant Agrawal
>   wrote:
>  I think the current mempool handlers should be moved first in a
>  separate patch.  
> >>
> >> Yes it should have been done earlier.
> >>  
> >>> Are you seeing any benefit by making it a separate patch series?  
> >>
> >> A separate patchset for moving mempool handlers will be easy to review
> >> and accept.
> >> If integrated in this series, it is kind of hidden and prevent the
> >> visibility and review it deserves.
> >> By the way the mempool move should be directly committed in the main
> >> repository, while this series targets next-net.
> >>  
> >
> > hw mempool has dependency on mc-bus. So, we will break it into *5* series,
> >
> > 1. mc-bus - which can go on main repo - Rebased over main repo
> > 2. dpaa2-pool - it can also go on main rep (depends on mc-bus) - Rebased
> > over 'main repo + 1'.
> > 3. dpaa2-pmd - this depends on mc-bus and dpaa2-pool
> >   A. mc-bus -another series rebased over net-next
> >   B. dpaa2-pool - another series rebased over 'net-next + A'
> >   C. the pmd itself rebased over 'net-next + A + B'
> >
> > Are you sure, you will like us to do the above and flood the mailing
> > list :)
> >
> > I think, this will make the review difficult.
> >
> > Bruce, Ferruh, Olivier - does the above help you? If so, we can do it.  
> 
> Any views, we can rebase our next version accordingly.
> we are looking forward to get it merged in 17.05.

Yes, moving the other mempool drivers in the drivers/ directory
before adding another one there makes sense to me, in order to
avoid having two places for mempool handlers.

As Thomas stated, it's probably better to have it in another
patchset, so it can be quickly reviewed and integrated.

Regards,
Olivier


[dpdk-dev] 答复: [PATCH] net/virtio-user: fix overflow

2017-03-14 Thread Wenfeng Liu
Hi Yuanhan,

>On Mon, Mar 13, 2017 at 03:09:15PM +, Wenfeng Liu wrote:
>> This commit fixes an array overflow when number of queues is higher than
8.
>
>Firstly, this commit log could be a bit more informative, to something
>like:
>
>virtio-user limits the qeueue number to 8 but provides no limit
>check against the queue number input from user. If a bigger queue
>number (> 8) is given, there is an overflow issue. Doing a sanity
>check could avoid it.
>

Sure, I will revise the commit log accordingly.

>> 
>> Fixes: 37a7eb2ae816 ("net/virtio-user: add device emulation layer")
>> 
>> Signed-off-by: Wenfeng Liu 
>> ---
>>  drivers/net/virtio/virtio_pci.h  | 3 ++-
>>  drivers/net/virtio/virtio_user/virtio_user_dev.c | 2 +-  
>> drivers/net/virtio/virtio_user/virtio_user_dev.h | 6 +++---
>>  drivers/net/virtio/virtio_user_ethdev.c  | 7 +++
>>  4 files changed, 13 insertions(+), 5 deletions(-)
>> 
>> diff --git a/drivers/net/virtio/virtio_pci.h 
>> b/drivers/net/virtio/virtio_pci.h index 59e45c4..bd940b4 100644
>> --- a/drivers/net/virtio/virtio_pci.h
>> +++ b/drivers/net/virtio/virtio_pci.h
>> @@ -160,7 +160,8 @@
>>  /*
>>   * Maximum number of virtqueues per device.
>>   */
>> -#define VIRTIO_MAX_VIRTQUEUES 8
>> +#define VIRTIO_MAX_VIRTQUEUE_PAIRS 8
>> +#define VIRTIO_MAX_VIRTQUEUES VIRTIO_MAX_VIRTQUEUE_PAIRS * 2 + 1
>>  
>>  /* Common configuration */
>>  #define VIRTIO_PCI_CAP_COMMON_CFG   1
>> diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c 
>> b/drivers/net/virtio/virtio_user/virtio_user_dev.c
>> index e7fd65f..5b81676 100644
>> --- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
>> +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
>> @@ -234,7 +234,7 @@ int virtio_user_stop_device(struct virtio_user_dev
*dev)
>>  uint32_t i, q;
>>  
>>  dev->vhostfd = -1;
>> -for (i = 0; i < VIRTIO_MAX_VIRTQUEUES * 2 + 1; ++i) {
>> +for (i = 0; i < VIRTIO_MAX_VIRTQUEUES; ++i) {
>
>Right, we don't need setup callfd and kickfd for the ctrl-queue.

I did not remove the ctrl-queue. I just redefined the MACRO according to DRY
principle:
#define VIRTIO_MAX_VIRTQUEUES VIRTIO_MAX_VIRTQUEUE_PAIRS * 2 + 1
I noticed that I missed the bracket in the MACRO and will add it in next
version.

>
>>  dev->kickfds[i] = -1;
>>  dev->callfds[i] = -1;
>>  }
>> diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.h 
>> b/drivers/net/virtio/virtio_user/virtio_user_dev.h
>> index 6ecb91e..ba80d05 100644
>> --- a/drivers/net/virtio/virtio_user/virtio_user_dev.h
>> +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.h
>> @@ -49,8 +49,8 @@ struct virtio_user_dev {
>>  int *tapfds;
>>  
>>  /* for both vhost_user and vhost_kernel */
>> -int callfds[VIRTIO_MAX_VIRTQUEUES * 2 + 1];
>> -int kickfds[VIRTIO_MAX_VIRTQUEUES * 2 + 1];
>> +int callfds[VIRTIO_MAX_VIRTQUEUES];
>> +int kickfds[VIRTIO_MAX_VIRTQUEUES];
>>  int mac_specified;
>>  uint32_tmax_queue_pairs;
>>  uint32_tqueue_pairs;
>> @@ -62,7 +62,7 @@ struct virtio_user_dev {
>>  uint8_t status;
>>  uint8_t mac_addr[ETHER_ADDR_LEN];
>>  charpath[PATH_MAX];
>> -struct vringvrings[VIRTIO_MAX_VIRTQUEUES * 2 + 1];
>> +struct vringvrings[VIRTIO_MAX_VIRTQUEUES];
>
>Have you actually tested your patch? You are removing the vring for
ctrl-queue, which is wrong to me.
>
>>  struct virtio_user_backend_ops *ops;  };
>>  
>> diff --git a/drivers/net/virtio/virtio_user_ethdev.c 
>> b/drivers/net/virtio/virtio_user_ethdev.c
>> index 16d1526..d476a2d 100644
>> --- a/drivers/net/virtio/virtio_user_ethdev.c
>> +++ b/drivers/net/virtio/virtio_user_ethdev.c
>> @@ -433,6 +433,13 @@
>>  goto end;
>>  }
>>  
>> +if (queues > VIRTIO_MAX_VIRTQUEUE_PAIRS) {
>> +PMD_INIT_LOG(ERR, "arg %s %u exceeds the limit %u",
>> +VIRTIO_USER_ARG_QUEUES_NUM, queues,
>> +VIRTIO_MAX_VIRTQUEUE_PAIRS);
>> +goto end;
>> +}
>
>Yes, we need this check. So, to me, you were actually doing two things in
this patch:
>
>- check the queue number, to avoid overflow
>- remove the callfds and kickfds for ctrl-queue.
>
>That said, please do them in two patches.
>
>Thanks.
>
>   --yliu
>> +
>>  eth_dev = virtio_user_eth_dev_alloc(name);
>>  if (!eth_dev) {
>>  PMD_INIT_LOG(ERR, "virtio_user fails to alloc device");
>> --
>> 1.8.3.1

Regards,
liuwf



[dpdk-dev] [PATCH v4 1/8] net/tap: remove wrong IFF_NOARP flags

2017-03-14 Thread Pascal Mazon
There is no reason not to support ARP on a tap netdevice.
Focus on IFF_UP when a link status change is required.

Fixes: f457b472b1f2 ("net/tap: add link up and down operations")
Signed-off-by: Pascal Mazon 
---
 drivers/net/tap/rte_eth_tap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index ece3a5fcc897..b75b7542a329 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -327,7 +327,7 @@ tap_link_set_down(struct rte_eth_dev *dev)
struct pmd_internals *pmd = dev->data->dev_private;
 
dev->data->dev_link.link_status = ETH_LINK_DOWN;
-   return tap_link_set_flags(pmd, IFF_UP | IFF_NOARP, 0);
+   return tap_link_set_flags(pmd, IFF_UP, 0);
 }
 
 static int
@@ -336,7 +336,7 @@ tap_link_set_up(struct rte_eth_dev *dev)
struct pmd_internals *pmd = dev->data->dev_private;
 
dev->data->dev_link.link_status = ETH_LINK_UP;
-   return tap_link_set_flags(pmd, IFF_UP | IFF_NOARP, 1);
+   return tap_link_set_flags(pmd, IFF_UP, 1);
 }
 
 static int
-- 
2.8.0.rc0



[dpdk-dev] [PATCH v4 0/8] net/tap: add additional management ops

2017-03-14 Thread Pascal Mazon
Add a few eth_dev ops to the tap PMD for completeness.
We want it to behave as much as possible as a standard PMD.

v2 change:
  - use snprintf in tap_mtu set

v3 change:
  - call tap_mac_set() only once in tap_setup_queue()

v4 changes:
  - use new tap_ioctl function for shared behavior between ops
  - update supported packet types
  - remove IFF_NOARP flag from link status change
  - sync desired MTU to both remote and tap netdevices
  - remove support for mac_add and mac_remove ops

Pascal Mazon (8):
  net/tap: remove wrong IFF_NOARP flags
  net/tap: refactor ioctl calls
  net/tap: add MAC address management
  net/tap: add MTU management
  net/tap: add speed capabilities
  net/tap: add multicast addresses management
  net/tap: add packet type management
  net/tap: add flow control management

 doc/guides/nics/features/tap.ini |   6 +
 drivers/net/tap/rte_eth_tap.c| 255 +++
 2 files changed, 214 insertions(+), 47 deletions(-)

-- 
2.8.0.rc0



[dpdk-dev] [PATCH v4 2/8] net/tap: refactor ioctl calls

2017-03-14 Thread Pascal Mazon
Create a socket for ioctl at tap device creation instead of opening it
and closing it every call to tap_link_set_flags().

Use a common tap_ioctl() function that can be extended for various uses
(such as MTU change, MAC address change, ...).

Signed-off-by: Pascal Mazon 
---
 drivers/net/tap/rte_eth_tap.c | 90 +++
 1 file changed, 48 insertions(+), 42 deletions(-)

diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index b75b7542a329..18edac57eead 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -107,6 +107,7 @@ struct pmd_internals {
struct ether_addr eth_addr; /* Mac address of the device port */
 
int if_index;   /* IF_INDEX for the port */
+   int ioctl_sock; /* socket for ioctl calls */
 
struct rx_queue rxq[RTE_PMD_TAP_MAX_QUEUES];/* List of RX queues */
struct tx_queue txq[RTE_PMD_TAP_MAX_QUEUES];/* List of TX queues */
@@ -280,63 +281,55 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, 
uint16_t nb_pkts)
 }
 
 static int
-tap_link_set_flags(struct pmd_internals *pmd, short flags, int add)
+tap_ioctl(struct pmd_internals *pmd, unsigned long request,
+ struct ifreq *ifr, int set)
 {
-   struct ifreq ifr;
-   int err, s;
-
-   /*
-* An AF_INET/DGRAM socket is needed for
-* SIOCGIFFLAGS/SIOCSIFFLAGS, using fd won't work.
-*/
-   s = socket(AF_INET, SOCK_DGRAM, 0);
-   if (s < 0) {
-   RTE_LOG(ERR, PMD,
-   "Unable to get a socket to set flags: %s\n",
-   strerror(errno));
-   return -1;
-   }
-   memset(&ifr, 0, sizeof(ifr));
-   snprintf(ifr.ifr_name, IFNAMSIZ, "%s", pmd->name);
-   err = ioctl(s, SIOCGIFFLAGS, &ifr);
-   if (err < 0) {
-   RTE_LOG(WARNING, PMD, "Unable to get %s device flags: %s\n",
-   pmd->name, strerror(errno));
-   close(s);
-   return -1;
-   }
-   if (add)
-   ifr.ifr_flags |= flags;
-   else
-   ifr.ifr_flags &= ~flags;
-   err = ioctl(s, SIOCSIFFLAGS, &ifr);
-   if (err < 0) {
-   RTE_LOG(WARNING, PMD, "Unable to %s flags 0x%x: %s\n",
-   add ? "set" : "unset", flags, strerror(errno));
-   close(s);
-   return -1;
-   }
-   close(s);
+   short req_flags = ifr->ifr_flags;
 
+   snprintf(ifr->ifr_name, IFNAMSIZ, "%s", pmd->name);
+   switch (request) {
+   case SIOCSIFFLAGS:
+   /* fetch current flags to leave other flags untouched */
+   if (ioctl(pmd->ioctl_sock, SIOCGIFFLAGS, ifr) < 0)
+   goto error;
+   if (set)
+   ifr->ifr_flags |= req_flags;
+   else
+   ifr->ifr_flags &= ~req_flags;
+   break;
+   default:
+   RTE_LOG(WARNING, PMD, "%s: ioctl() called with wrong arg\n",
+   pmd->name);
+   return -EINVAL;
+   }
+   if (ioctl(pmd->ioctl_sock, request, ifr) < 0)
+   goto error;
return 0;
+
+error:
+   RTE_LOG(ERR, PMD, "%s: ioctl(%lu) failed with error: %s\n",
+   ifr->ifr_name, request, strerror(errno));
+   return -errno;
 }
 
 static int
 tap_link_set_down(struct rte_eth_dev *dev)
 {
struct pmd_internals *pmd = dev->data->dev_private;
+   struct ifreq ifr = { .ifr_flags = IFF_UP };
 
dev->data->dev_link.link_status = ETH_LINK_DOWN;
-   return tap_link_set_flags(pmd, IFF_UP, 0);
+   return tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 0);
 }
 
 static int
 tap_link_set_up(struct rte_eth_dev *dev)
 {
struct pmd_internals *pmd = dev->data->dev_private;
+   struct ifreq ifr = { .ifr_flags = IFF_UP };
 
dev->data->dev_link.link_status = ETH_LINK_UP;
-   return tap_link_set_flags(pmd, IFF_UP, 1);
+   return tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 1);
 }
 
 static int
@@ -470,36 +463,40 @@ static void
 tap_promisc_enable(struct rte_eth_dev *dev)
 {
struct pmd_internals *pmd = dev->data->dev_private;
+   struct ifreq ifr = { .ifr_flags = IFF_PROMISC };
 
dev->data->promiscuous = 1;
-   tap_link_set_flags(pmd, IFF_PROMISC, 1);
+   tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 1);
 }
 
 static void
 tap_promisc_disable(struct rte_eth_dev *dev)
 {
struct pmd_internals *pmd = dev->data->dev_private;
+   struct ifreq ifr = { .ifr_flags = IFF_PROMISC };
 
dev->data->promiscuous = 0;
-   tap_link_set_flags(pmd, IFF_PROMISC, 0);
+   tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 0);
 }
 
 static void
 tap_allmulti_enable(struct rte_eth_dev *dev)
 {
struct pmd_internals *pmd = dev->data->dev_private;
+   struct ifreq ifr = { .ifr_flags = IFF_ALLMULTI };
 
dev->data->all_multicast = 1;
-   t

[dpdk-dev] [PATCH v4 4/8] net/tap: add MTU management

2017-03-14 Thread Pascal Mazon
The MTU is assigned to the tap netdevice according to the argument, but
packet transmission and reception just write/read on an fd with the
default limit being the socket buffer size.

As a new rte_eth_dev_data is allocated during tap device init, ensure it
is set again dev->data->mtu.
Once the actual netdevice is created via tun_alloc(), make sure to apply
the desired MTU to the netdevice.

Signed-off-by: Pascal Mazon 
---
 doc/guides/nics/features/tap.ini |  1 +
 drivers/net/tap/rte_eth_tap.c| 26 ++
 2 files changed, 27 insertions(+)

diff --git a/doc/guides/nics/features/tap.ini b/doc/guides/nics/features/tap.ini
index d9b47a003654..b80577753240 100644
--- a/doc/guides/nics/features/tap.ini
+++ b/doc/guides/nics/features/tap.ini
@@ -9,6 +9,7 @@ Jumbo frame  = Y
 Promiscuous mode = Y
 Allmulticast mode= Y
 Basic stats  = Y
+MTU update   = Y
 Unicast MAC filter   = Y
 Other kdrv   = Y
 ARMv7= Y
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 06c1faa92dce..3e61f7bda555 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -302,6 +302,7 @@ tap_ioctl(struct pmd_internals *pmd, unsigned long request,
break;
case SIOCGIFHWADDR:
case SIOCSIFHWADDR:
+   case SIOCSIFMTU:
break;
default:
RTE_LOG(WARNING, PMD, "%s: ioctl() called with wrong arg\n",
@@ -547,6 +548,15 @@ tap_setup_queue(struct rte_eth_dev *dev,
pmd->name, qid);
return -1;
}
+   if (qid == 0) {
+   struct ifreq ifr;
+
+   ifr.ifr_mtu = dev->data->mtu;
+   if (tap_ioctl(pmd, SIOCSIFMTU, &ifr, 1) < 0) {
+   close(fd);
+   return -1;
+   }
+   }
}
}
 
@@ -642,6 +652,20 @@ tap_tx_queue_setup(struct rte_eth_dev *dev,
return 0;
 }
 
+static int
+tap_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+   struct pmd_internals *pmd = dev->data->dev_private;
+   struct ifreq ifr = { .ifr_mtu = mtu };
+   int err = 0;
+
+   err = tap_ioctl(pmd, SIOCSIFMTU, &ifr, 1);
+   if (!err)
+   dev->data->mtu = mtu;
+
+   return err;
+}
+
 static const struct eth_dev_ops ops = {
.dev_start  = tap_dev_start,
.dev_stop   = tap_dev_stop,
@@ -660,6 +684,7 @@ static const struct eth_dev_ops ops = {
.allmulticast_enable= tap_allmulti_enable,
.allmulticast_disable   = tap_allmulti_disable,
.mac_addr_set   = tap_mac_set,
+   .mtu_set= tap_mtu_set,
.stats_get  = tap_stats_get,
.stats_reset= tap_stats_reset,
 };
@@ -710,6 +735,7 @@ eth_dev_tap_create(const char *name, char *tap_name)
/* Setup some default values */
data->dev_private = pmd;
data->port_id = dev->data->port_id;
+   data->mtu = dev->data->mtu;
data->dev_flags = RTE_ETH_DEV_DETACHABLE;
data->kdrv = RTE_KDRV_NONE;
data->drv_name = pmd_tap_drv.driver.name;
-- 
2.8.0.rc0



[dpdk-dev] [PATCH v4 6/8] net/tap: add multicast addresses management

2017-03-14 Thread Pascal Mazon
A tap netdevice actually receives every packet, without any filtering
whatsoever. There is no need for any multicast address registration
to receive multicast packets.

Signed-off-by: Pascal Mazon 
---
 doc/guides/nics/features/tap.ini |  1 +
 drivers/net/tap/rte_eth_tap.c| 13 +
 2 files changed, 14 insertions(+)

diff --git a/doc/guides/nics/features/tap.ini b/doc/guides/nics/features/tap.ini
index 6c07352088bf..6aa11874e2bc 100644
--- a/doc/guides/nics/features/tap.ini
+++ b/doc/guides/nics/features/tap.ini
@@ -10,6 +10,7 @@ Promiscuous mode = Y
 Allmulticast mode= Y
 Basic stats  = Y
 MTU update   = Y
+Multicast MAC filter = Y
 Speed capabilities   = Y
 Unicast MAC filter   = Y
 Other kdrv   = Y
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 86d2b3b4c106..e9aa2f45ba54 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -701,6 +701,18 @@ tap_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
return err;
 }
 
+static int
+tap_set_mc_addr_list(struct rte_eth_dev *dev __rte_unused,
+struct ether_addr *mc_addr_set __rte_unused,
+uint32_t nb_mc_addr __rte_unused)
+{
+   /*
+* Nothing to do actually: the tap has no filtering whatsoever, every
+* packet is received.
+*/
+   return 0;
+}
+
 static const struct eth_dev_ops ops = {
.dev_start  = tap_dev_start,
.dev_stop   = tap_dev_stop,
@@ -720,6 +732,7 @@ static const struct eth_dev_ops ops = {
.allmulticast_disable   = tap_allmulti_disable,
.mac_addr_set   = tap_mac_set,
.mtu_set= tap_mtu_set,
+   .set_mc_addr_list   = tap_set_mc_addr_list,
.stats_get  = tap_stats_get,
.stats_reset= tap_stats_reset,
 };
-- 
2.8.0.rc0



[dpdk-dev] [PATCH v4 5/8] net/tap: add speed capabilities

2017-03-14 Thread Pascal Mazon
Tap PMD is flexible, it supports any speed.

Signed-off-by: Pascal Mazon 
---
 doc/guides/nics/features/tap.ini |  1 +
 drivers/net/tap/rte_eth_tap.c| 35 +++
 2 files changed, 36 insertions(+)

diff --git a/doc/guides/nics/features/tap.ini b/doc/guides/nics/features/tap.ini
index b80577753240..6c07352088bf 100644
--- a/doc/guides/nics/features/tap.ini
+++ b/doc/guides/nics/features/tap.ini
@@ -10,6 +10,7 @@ Promiscuous mode = Y
 Allmulticast mode= Y
 Basic stats  = Y
 MTU update   = Y
+Speed capabilities   = Y
 Unicast MAC filter   = Y
 Other kdrv   = Y
 ARMv7= Y
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 3e61f7bda555..86d2b3b4c106 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -359,6 +359,40 @@ tap_dev_configure(struct rte_eth_dev *dev __rte_unused)
return 0;
 }
 
+static uint32_t
+tap_dev_speed_capa(void)
+{
+   uint32_t speed = pmd_link.link_speed;
+   uint32_t capa = 0;
+
+   if (speed >= ETH_SPEED_NUM_10M)
+   capa |= ETH_LINK_SPEED_10M;
+   if (speed >= ETH_SPEED_NUM_100M)
+   capa |= ETH_LINK_SPEED_100M;
+   if (speed >= ETH_SPEED_NUM_1G)
+   capa |= ETH_LINK_SPEED_1G;
+   if (speed >= ETH_SPEED_NUM_5G)
+   capa |= ETH_LINK_SPEED_2_5G;
+   if (speed >= ETH_SPEED_NUM_5G)
+   capa |= ETH_LINK_SPEED_5G;
+   if (speed >= ETH_SPEED_NUM_10G)
+   capa |= ETH_LINK_SPEED_10G;
+   if (speed >= ETH_SPEED_NUM_20G)
+   capa |= ETH_LINK_SPEED_20G;
+   if (speed >= ETH_SPEED_NUM_25G)
+   capa |= ETH_LINK_SPEED_25G;
+   if (speed >= ETH_SPEED_NUM_40G)
+   capa |= ETH_LINK_SPEED_40G;
+   if (speed >= ETH_SPEED_NUM_50G)
+   capa |= ETH_LINK_SPEED_50G;
+   if (speed >= ETH_SPEED_NUM_56G)
+   capa |= ETH_LINK_SPEED_56G;
+   if (speed >= ETH_SPEED_NUM_100G)
+   capa |= ETH_LINK_SPEED_100G;
+
+   return capa;
+}
+
 static void
 tap_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 {
@@ -371,6 +405,7 @@ tap_dev_info(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
dev_info->max_tx_queues = internals->nb_queues;
dev_info->min_rx_bufsize = 0;
dev_info->pci_dev = NULL;
+   dev_info->speed_capa = tap_dev_speed_capa();
 }
 
 static void
-- 
2.8.0.rc0



[dpdk-dev] [PATCH v4 3/8] net/tap: add MAC address management

2017-03-14 Thread Pascal Mazon
As soon as the netdevice is created, update pmd->mac_addr with its
actual MAC address.

Signed-off-by: Pascal Mazon 
---
 doc/guides/nics/features/tap.ini |  1 +
 drivers/net/tap/rte_eth_tap.c| 39 +--
 2 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/doc/guides/nics/features/tap.ini b/doc/guides/nics/features/tap.ini
index f4aca6921ddc..d9b47a003654 100644
--- a/doc/guides/nics/features/tap.ini
+++ b/doc/guides/nics/features/tap.ini
@@ -9,6 +9,7 @@ Jumbo frame  = Y
 Promiscuous mode = Y
 Allmulticast mode= Y
 Basic stats  = Y
+Unicast MAC filter   = Y
 Other kdrv   = Y
 ARMv7= Y
 ARMv8= Y
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 18edac57eead..06c1faa92dce 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -113,6 +113,10 @@ struct pmd_internals {
struct tx_queue txq[RTE_PMD_TAP_MAX_QUEUES];/* List of TX queues */
 };
 
+static int
+tap_ioctl(struct pmd_internals *pmd, unsigned long request,
+ struct ifreq *ifr, int set);
+
 /* Tun/Tap allocation routine
  *
  * name is the number of the interface to use, unless NULL to take the host
@@ -178,13 +182,12 @@ tun_alloc(struct pmd_internals *pmd, uint16_t qid)
}
 
if (qid == 0) {
-   if (ioctl(fd, SIOCGIFHWADDR, &ifr) == -1) {
-   RTE_LOG(ERR, PMD, "ioctl failed (SIOCGIFHWADDR) (%s)\n",
-   ifr.ifr_name);
-   goto error;
-   }
+   struct ifreq ifr;
 
-   rte_memcpy(&pmd->eth_addr, ifr.ifr_hwaddr.sa_data, ETH_ALEN);
+   if (tap_ioctl(pmd, SIOCGIFHWADDR, &ifr, 0) < 0)
+   goto error;
+   rte_memcpy(&pmd->eth_addr, ifr.ifr_hwaddr.sa_data,
+  ETHER_ADDR_LEN);
}
 
return fd;
@@ -297,6 +300,9 @@ tap_ioctl(struct pmd_internals *pmd, unsigned long request,
else
ifr->ifr_flags &= ~req_flags;
break;
+   case SIOCGIFHWADDR:
+   case SIOCSIFHWADDR:
+   break;
default:
RTE_LOG(WARNING, PMD, "%s: ioctl() called with wrong arg\n",
pmd->name);
@@ -499,6 +505,26 @@ tap_allmulti_disable(struct rte_eth_dev *dev)
tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 0);
 }
 
+
+static void
+tap_mac_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
+{
+   struct pmd_internals *pmd = dev->data->dev_private;
+   struct ifreq ifr;
+
+   if (is_zero_ether_addr(mac_addr)) {
+   RTE_LOG(ERR, PMD, "%s: can't set an empty MAC address\n",
+   dev->data->name);
+   return;
+   }
+
+   ifr.ifr_hwaddr.sa_family = AF_LOCAL;
+   rte_memcpy(ifr.ifr_hwaddr.sa_data, mac_addr, ETHER_ADDR_LEN);
+   if (tap_ioctl(pmd, SIOCSIFHWADDR, &ifr, 1) < 0)
+   return;
+   rte_memcpy(&pmd->eth_addr, mac_addr, ETHER_ADDR_LEN);
+}
+
 static int
 tap_setup_queue(struct rte_eth_dev *dev,
struct pmd_internals *internals,
@@ -633,6 +659,7 @@ static const struct eth_dev_ops ops = {
.promiscuous_disable= tap_promisc_disable,
.allmulticast_enable= tap_allmulti_enable,
.allmulticast_disable   = tap_allmulti_disable,
+   .mac_addr_set   = tap_mac_set,
.stats_get  = tap_stats_get,
.stats_reset= tap_stats_reset,
 };
-- 
2.8.0.rc0



[dpdk-dev] [PATCH v4 7/8] net/tap: add packet type management

2017-03-14 Thread Pascal Mazon
Advertize packet types supported by the librte_net.

Signed-off-by: Pascal Mazon 
---
 doc/guides/nics/features/tap.ini |  1 +
 drivers/net/tap/rte_eth_tap.c| 35 +++
 2 files changed, 36 insertions(+)

diff --git a/doc/guides/nics/features/tap.ini b/doc/guides/nics/features/tap.ini
index 6aa11874e2bc..7f3f4d661dd7 100644
--- a/doc/guides/nics/features/tap.ini
+++ b/doc/guides/nics/features/tap.ini
@@ -13,6 +13,7 @@ MTU update   = Y
 Multicast MAC filter = Y
 Speed capabilities   = Y
 Unicast MAC filter   = Y
+Packet type parsing  = Y
 Other kdrv   = Y
 ARMv7= Y
 ARMv8= Y
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index e9aa2f45ba54..371249b0daed 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -228,6 +229,8 @@ pmd_rx_burst(void *queue, struct rte_mbuf **bufs, uint16_t 
nb_pkts)
mbuf->data_len = len;
mbuf->pkt_len = len;
mbuf->port = rxq->in_port;
+   mbuf->packet_type = rte_net_get_ptype(mbuf, NULL,
+ RTE_PTYPE_ALL_MASK);
 
/* account for the receive frame */
bufs[num_rx++] = mbuf;
@@ -713,6 +716,37 @@ tap_set_mc_addr_list(struct rte_eth_dev *dev __rte_unused,
return 0;
 }
 
+static const uint32_t*
+tap_dev_supported_ptypes_get(struct rte_eth_dev *dev __rte_unused)
+{
+   static const uint32_t ptypes[] = {
+   RTE_PTYPE_INNER_L2_ETHER,
+   RTE_PTYPE_INNER_L2_ETHER_VLAN,
+   RTE_PTYPE_INNER_L2_ETHER_QINQ,
+   RTE_PTYPE_INNER_L3_IPV4,
+   RTE_PTYPE_INNER_L3_IPV4_EXT,
+   RTE_PTYPE_INNER_L3_IPV6,
+   RTE_PTYPE_INNER_L3_IPV6_EXT,
+   RTE_PTYPE_INNER_L4_FRAG,
+   RTE_PTYPE_INNER_L4_UDP,
+   RTE_PTYPE_INNER_L4_TCP,
+   RTE_PTYPE_INNER_L4_SCTP,
+   RTE_PTYPE_L2_ETHER,
+   RTE_PTYPE_L2_ETHER_VLAN,
+   RTE_PTYPE_L2_ETHER_QINQ,
+   RTE_PTYPE_L3_IPV4,
+   RTE_PTYPE_L3_IPV4_EXT,
+   RTE_PTYPE_L3_IPV6_EXT,
+   RTE_PTYPE_L3_IPV6,
+   RTE_PTYPE_L4_FRAG,
+   RTE_PTYPE_L4_UDP,
+   RTE_PTYPE_L4_TCP,
+   RTE_PTYPE_L4_SCTP,
+   };
+
+   return ptypes;
+}
+
 static const struct eth_dev_ops ops = {
.dev_start  = tap_dev_start,
.dev_stop   = tap_dev_stop,
@@ -735,6 +769,7 @@ static const struct eth_dev_ops ops = {
.set_mc_addr_list   = tap_set_mc_addr_list,
.stats_get  = tap_stats_get,
.stats_reset= tap_stats_reset,
+   .dev_supported_ptypes_get = tap_dev_supported_ptypes_get,
 };
 
 static int
-- 
2.8.0.rc0



[dpdk-dev] [PATCH v4 8/8] net/tap: add flow control management

2017-03-14 Thread Pascal Mazon
A tap netdevice does not support flow control; ensure nothing but
RTE_FC_NONE mode can be set.

Signed-off-by: Pascal Mazon 
---
 doc/guides/nics/features/tap.ini |  1 +
 drivers/net/tap/rte_eth_tap.c| 19 +++
 2 files changed, 20 insertions(+)

diff --git a/doc/guides/nics/features/tap.ini b/doc/guides/nics/features/tap.ini
index 7f3f4d661dd7..a51712dce066 100644
--- a/doc/guides/nics/features/tap.ini
+++ b/doc/guides/nics/features/tap.ini
@@ -14,6 +14,7 @@ Multicast MAC filter = Y
 Speed capabilities   = Y
 Unicast MAC filter   = Y
 Packet type parsing  = Y
+Flow control = Y
 Other kdrv   = Y
 ARMv7= Y
 ARMv8= Y
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 371249b0daed..7557abb2ebfc 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -747,6 +747,23 @@ tap_dev_supported_ptypes_get(struct rte_eth_dev *dev 
__rte_unused)
return ptypes;
 }
 
+static int
+tap_flow_ctrl_get(struct rte_eth_dev *dev __rte_unused,
+ struct rte_eth_fc_conf *fc_conf)
+{
+   fc_conf->mode = RTE_FC_NONE;
+   return 0;
+}
+
+static int
+tap_flow_ctrl_set(struct rte_eth_dev *dev __rte_unused,
+ struct rte_eth_fc_conf *fc_conf)
+{
+   if (fc_conf->mode != RTE_FC_NONE)
+   return -ENOTSUP;
+   return 0;
+}
+
 static const struct eth_dev_ops ops = {
.dev_start  = tap_dev_start,
.dev_stop   = tap_dev_stop,
@@ -757,6 +774,8 @@ static const struct eth_dev_ops ops = {
.tx_queue_setup = tap_tx_queue_setup,
.rx_queue_release   = tap_rx_queue_release,
.tx_queue_release   = tap_tx_queue_release,
+   .flow_ctrl_get  = tap_flow_ctrl_get,
+   .flow_ctrl_set  = tap_flow_ctrl_set,
.link_update= tap_link_update,
.dev_set_link_up= tap_link_set_up,
.dev_set_link_down  = tap_link_set_down,
-- 
2.8.0.rc0



Re: [dpdk-dev] 答复: [PATCH] net/virtio-user: fix overflow

2017-03-14 Thread Yuanhan Liu
On Tue, Mar 14, 2017 at 04:20:39PM +0800, Wenfeng Liu wrote:
> Hi Yuanhan,
> 
> >On Mon, Mar 13, 2017 at 03:09:15PM +, Wenfeng Liu wrote:
> >> This commit fixes an array overflow when number of queues is higher than
> 8.
> >
> >Firstly, this commit log could be a bit more informative, to something
> >like:
> >
> >virtio-user limits the qeueue number to 8 but provides no limit
> >check against the queue number input from user. If a bigger queue
> >number (> 8) is given, there is an overflow issue. Doing a sanity
> >check could avoid it.
> >
> 
> Sure, I will revise the commit log accordingly.
> 
> >> 
> >> Fixes: 37a7eb2ae816 ("net/virtio-user: add device emulation layer")
> >> 
> >> Signed-off-by: Wenfeng Liu 
> >> ---
> >>  drivers/net/virtio/virtio_pci.h  | 3 ++-
> >>  drivers/net/virtio/virtio_user/virtio_user_dev.c | 2 +-  
> >> drivers/net/virtio/virtio_user/virtio_user_dev.h | 6 +++---
> >>  drivers/net/virtio/virtio_user_ethdev.c  | 7 +++
> >>  4 files changed, 13 insertions(+), 5 deletions(-)
> >> 
> >> diff --git a/drivers/net/virtio/virtio_pci.h 
> >> b/drivers/net/virtio/virtio_pci.h index 59e45c4..bd940b4 100644
> >> --- a/drivers/net/virtio/virtio_pci.h
> >> +++ b/drivers/net/virtio/virtio_pci.h
> >> @@ -160,7 +160,8 @@
> >>  /*
> >>   * Maximum number of virtqueues per device.
> >>   */
> >> -#define VIRTIO_MAX_VIRTQUEUES 8
> >> +#define VIRTIO_MAX_VIRTQUEUE_PAIRS 8
> >> +#define VIRTIO_MAX_VIRTQUEUES VIRTIO_MAX_VIRTQUEUE_PAIRS * 2 + 1
> >>  
> >>  /* Common configuration */
> >>  #define VIRTIO_PCI_CAP_COMMON_CFG 1
> >> diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c 
> >> b/drivers/net/virtio/virtio_user/virtio_user_dev.c
> >> index e7fd65f..5b81676 100644
> >> --- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
> >> +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
> >> @@ -234,7 +234,7 @@ int virtio_user_stop_device(struct virtio_user_dev
> *dev)
> >>uint32_t i, q;
> >>  
> >>dev->vhostfd = -1;
> >> -  for (i = 0; i < VIRTIO_MAX_VIRTQUEUES * 2 + 1; ++i) {
> >> +  for (i = 0; i < VIRTIO_MAX_VIRTQUEUES; ++i) {
> >
> >Right, we don't need setup callfd and kickfd for the ctrl-queue.
> 
> I did not remove the ctrl-queue. I just redefined the MACRO according to DRY
> principle:

Oh, right, sorry, I overlooked it. Then for this patch, it's okay to me.

> #define VIRTIO_MAX_VIRTQUEUES VIRTIO_MAX_VIRTQUEUE_PAIRS * 2 + 1
> I noticed that I missed the bracket in the MACRO and will add it in next
> version.

Yes, please. Also, please put a "Cc: sta...@dpdk.org" before your
Signed-of-by: it's a candidate for stable releases.

--yliu


[dpdk-dev] [PATCH v4 1/4] net/tap: move private elements to external header

2017-03-14 Thread Pascal Mazon
In the next patch, access to struct pmd_internals will be necessary in
tap_flow.c to store the flows.

Signed-off-by: Pascal Mazon 
Acked-by: Olga Shern 
---
 drivers/net/tap/Makefile  |  1 +
 drivers/net/tap/rte_eth_tap.c | 35 ++--
 drivers/net/tap/rte_eth_tap.h | 74 +++
 3 files changed, 77 insertions(+), 33 deletions(-)
 create mode 100644 drivers/net/tap/rte_eth_tap.h

diff --git a/drivers/net/tap/Makefile b/drivers/net/tap/Makefile
index e18f30c56f52..bdbe69e62a4e 100644
--- a/drivers/net/tap/Makefile
+++ b/drivers/net/tap/Makefile
@@ -40,6 +40,7 @@ EXPORT_MAP := rte_pmd_tap_version.map
 LIBABIVER := 1
 
 CFLAGS += -O3
+CFLAGS += -I$(SRCDIR)
 CFLAGS += $(WERROR_FLAGS)
 
 #
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 7557abb2ebfc..536fd206e789 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -51,6 +51,8 @@
 #include 
 #include 
 
+#include 
+
 /* Linux based path to the TUN device */
 #define TUN_TAP_DEV_PATH"/dev/net/tun"
 #define DEFAULT_TAP_NAME"dtap"
@@ -81,39 +83,6 @@ static struct rte_eth_link pmd_link = {
.link_autoneg = ETH_LINK_SPEED_AUTONEG
 };
 
-struct pkt_stats {
-   uint64_t opackets;  /* Number of output packets */
-   uint64_t ipackets;  /* Number of input packets */
-   uint64_t obytes;/* Number of bytes on output */
-   uint64_t ibytes;/* Number of bytes on input */
-   uint64_t errs;  /* Number of error packets */
-};
-
-struct rx_queue {
-   struct rte_mempool *mp; /* Mempool for RX packets */
-   uint16_t in_port;   /* Port ID */
-   int fd;
-
-   struct pkt_stats stats; /* Stats for this RX queue */
-};
-
-struct tx_queue {
-   int fd;
-   struct pkt_stats stats; /* Stats for this TX queue */
-};
-
-struct pmd_internals {
-   char name[RTE_ETH_NAME_MAX_LEN];/* Internal Tap device name */
-   uint16_t nb_queues; /* Number of queues supported */
-   struct ether_addr eth_addr; /* Mac address of the device port */
-
-   int if_index;   /* IF_INDEX for the port */
-   int ioctl_sock; /* socket for ioctl calls */
-
-   struct rx_queue rxq[RTE_PMD_TAP_MAX_QUEUES];/* List of RX queues */
-   struct tx_queue txq[RTE_PMD_TAP_MAX_QUEUES];/* List of TX queues */
-};
-
 static int
 tap_ioctl(struct pmd_internals *pmd, unsigned long request,
  struct ifreq *ifr, int set);
diff --git a/drivers/net/tap/rte_eth_tap.h b/drivers/net/tap/rte_eth_tap.h
new file mode 100644
index ..880ec1b4fc8c
--- /dev/null
+++ b/drivers/net/tap/rte_eth_tap.h
@@ -0,0 +1,74 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_TAP_H_
+#define _RTE_ETH_TAP_H_
+
+#include 
+
+#include 
+#include 
+
+#define RTE_PMD_TAP_MAX_QUEUES 16
+
+struct pkt_stats {
+   uint64_t opackets; /* Number of output packets */
+   uint64_t ipackets; /* Number of input packets */
+   uint64_t obytes; /* Number of bytes on output */
+   uint64_t ibytes; /* Number of bytes on input */
+   uint64_t errs; /* Number of error packets */
+};
+
+struct rx_queue {
+   struct rte_mempool *mp; /*

[dpdk-dev] [PATCH v4 2/4] net/tap: add preliminary support for rte_flow

2017-03-14 Thread Pascal Mazon
The flow API provides the ability to classify packets received by a tap
netdevice.

This patch only implements skeleton functions for flow API support, no
patterns are supported yet.

Signed-off-by: Pascal Mazon 
Acked-by: Olga Shern 
---
 doc/guides/nics/features/tap.ini |   1 +
 drivers/net/tap/Makefile |   1 +
 drivers/net/tap/rte_eth_tap.c|   6 ++
 drivers/net/tap/rte_eth_tap.h|   2 +
 drivers/net/tap/tap_flow.c   | 185 +++
 drivers/net/tap/tap_flow.h   |  46 ++
 6 files changed, 241 insertions(+)
 create mode 100644 drivers/net/tap/tap_flow.c
 create mode 100644 drivers/net/tap/tap_flow.h

diff --git a/doc/guides/nics/features/tap.ini b/doc/guides/nics/features/tap.ini
index a51712dce066..9d73f61cca3b 100644
--- a/doc/guides/nics/features/tap.ini
+++ b/doc/guides/nics/features/tap.ini
@@ -9,6 +9,7 @@ Jumbo frame  = Y
 Promiscuous mode = Y
 Allmulticast mode= Y
 Basic stats  = Y
+Flow API = Y
 MTU update   = Y
 Multicast MAC filter = Y
 Speed capabilities   = Y
diff --git a/drivers/net/tap/Makefile b/drivers/net/tap/Makefile
index bdbe69e62a4e..386b8b0594d3 100644
--- a/drivers/net/tap/Makefile
+++ b/drivers/net/tap/Makefile
@@ -47,6 +47,7 @@ CFLAGS += $(WERROR_FLAGS)
 # all source are stored in SRCS-y
 #
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += rte_eth_tap.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += tap_flow.c
 
 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += lib/librte_eal
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 536fd206e789..78eac9a11ea0 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -52,6 +52,7 @@
 #include 
 
 #include 
+#include 
 
 /* Linux based path to the TUN device */
 #define TUN_TAP_DEV_PATH"/dev/net/tun"
@@ -435,6 +436,7 @@ tap_dev_close(struct rte_eth_dev *dev __rte_unused)
struct pmd_internals *internals = dev->data->dev_private;
 
tap_link_set_down(dev);
+   tap_flow_flush(dev, NULL);
 
for (i = 0; i < internals->nb_queues; i++) {
if (internals->rxq[i].fd != -1)
@@ -758,6 +760,7 @@ static const struct eth_dev_ops ops = {
.stats_get  = tap_stats_get,
.stats_reset= tap_stats_reset,
.dev_supported_ptypes_get = tap_dev_supported_ptypes_get,
+   .filter_ctrl= tap_dev_filter_ctrl,
 };
 
 static int
@@ -829,6 +832,8 @@ eth_dev_tap_create(const char *name, char *tap_name)
pmd->txq[i].fd = -1;
}
 
+   LIST_INIT(&pmd->flows);
+
return 0;
 
 error_exit:
@@ -942,6 +947,7 @@ rte_pmd_tap_remove(const char *name)
return 0;
 
internals = eth_dev->data->dev_private;
+   tap_flow_flush(eth_dev, NULL);
for (i = 0; i < internals->nb_queues; i++)
if (internals->rxq[i].fd != -1)
close(internals->rxq[i].fd);
diff --git a/drivers/net/tap/rte_eth_tap.h b/drivers/net/tap/rte_eth_tap.h
index 880ec1b4fc8c..a64116f5a35e 100644
--- a/drivers/net/tap/rte_eth_tap.h
+++ b/drivers/net/tap/rte_eth_tap.h
@@ -34,6 +34,7 @@
 #ifndef _RTE_ETH_TAP_H_
 #define _RTE_ETH_TAP_H_
 
+#include 
 #include 
 
 #include 
@@ -67,6 +68,7 @@ struct pmd_internals {
struct ether_addr eth_addr;   /* Mac address of the device port */
int if_index; /* IF_INDEX for the port */
int ioctl_sock;   /* socket for ioctl calls */
+   LIST_HEAD(tap_flows, rte_flow) flows;/* rte_flow rules */
struct rx_queue rxq[RTE_PMD_TAP_MAX_QUEUES]; /* List of RX queues */
struct tx_queue txq[RTE_PMD_TAP_MAX_QUEUES]; /* List of TX queues */
 };
diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c
new file mode 100644
index ..c32ed382d745
--- /dev/null
+++ b/drivers/net/tap/tap_flow.c
@@ -0,0 +1,185 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES 

[dpdk-dev] [PATCH v4 0/4] net/tap: support flow API

2017-03-14 Thread Pascal Mazon
This series add support for the flow API in tap PMD.

It enables filtering specific packets incoming on the tap netdevice, to
process only desired ones. Under the hood, it uses kernel TC (traffic
control), which takes place very early in the stack, and supports most
common pattern items and actions defined in the flow API.

This series applies on top of:

  [PATCH 0/6] net/tap: add additional management ops

v2 changes:
  - support compilation on kernels < 4.2 (where flower support appeared)
  - set whitespaces in tap.h
  - remove unnecessary goto

v3 changes:
  - vlan patterns enabled depending on running kernel (4.9+)
  - update doc/guides/nics/tap.rst for Flow API support
  - rebase on top of "net/tap: add additional management ops" series

v4 changes:
  - rebase on top of "net/tap: add additional management ops" series
  - fix a few netlink doxygen comments
  - rename tap.h -> rte_eth_tap.h
  - flush flow rules only when applicable

Pascal Mazon (4):
  net/tap: move private elements to external header
  net/tap: add preliminary support for rte_flow
  net/tap: add netlink back-end for flow API
  net/tap: add basic flow API patterns and actions

 doc/guides/nics/features/tap.ini |1 +
 doc/guides/nics/tap.rst  |   23 +
 drivers/net/tap/Makefile |   44 ++
 drivers/net/tap/rte_eth_tap.c|  100 ++--
 drivers/net/tap/rte_eth_tap.h|   79 +++
 drivers/net/tap/tap_flow.c   | 1078 ++
 drivers/net/tap/tap_flow.h   |   58 ++
 drivers/net/tap/tap_netlink.c|  367 +
 drivers/net/tap/tap_netlink.h|   69 +++
 drivers/net/tap/tap_tcmsgs.c |  378 +
 drivers/net/tap/tap_tcmsgs.h |   63 +++
 11 files changed, 2226 insertions(+), 34 deletions(-)
 create mode 100644 drivers/net/tap/rte_eth_tap.h
 create mode 100644 drivers/net/tap/tap_flow.c
 create mode 100644 drivers/net/tap/tap_flow.h
 create mode 100644 drivers/net/tap/tap_netlink.c
 create mode 100644 drivers/net/tap/tap_netlink.h
 create mode 100644 drivers/net/tap/tap_tcmsgs.c
 create mode 100644 drivers/net/tap/tap_tcmsgs.h

-- 
2.8.0.rc0



[dpdk-dev] [PATCH v4 3/4] net/tap: add netlink back-end for flow API

2017-03-14 Thread Pascal Mazon
Each kernel netdevice may have queueing disciplines set for it, which
determine how to handle the packet (mostly on egress). That's part of
the TC (Traffic Control) mechanism.

Through TC, it is possible to set filter rules that match specific
packets, and act according to what is in the rule. This is a perfect
candidate to implement the flow API for the tap PMD, as it has an
associated kernel netdevice automatically.

Each flow API rule will be translated into its TC counterpart.

To leverage TC, it is necessary to communicate with the kernel using
netlink. This patch introduces a library to help that communication.

Inside netlink.c, functions are generic for any netlink messaging.
Inside tcmsgs.c, functions are specific to deal with TC rules.

Signed-off-by: Pascal Mazon 
Acked-by: Olga Shern 
---
 drivers/net/tap/Makefile  |   2 +
 drivers/net/tap/tap_netlink.c | 367 
 drivers/net/tap/tap_netlink.h |  69 
 drivers/net/tap/tap_tcmsgs.c  | 378 ++
 drivers/net/tap/tap_tcmsgs.h  |  63 +++
 5 files changed, 879 insertions(+)
 create mode 100644 drivers/net/tap/tap_netlink.c
 create mode 100644 drivers/net/tap/tap_netlink.h
 create mode 100644 drivers/net/tap/tap_tcmsgs.c
 create mode 100644 drivers/net/tap/tap_tcmsgs.h

diff --git a/drivers/net/tap/Makefile b/drivers/net/tap/Makefile
index 386b8b0594d3..4ae2ca6cfbab 100644
--- a/drivers/net/tap/Makefile
+++ b/drivers/net/tap/Makefile
@@ -48,6 +48,8 @@ CFLAGS += $(WERROR_FLAGS)
 #
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += rte_eth_tap.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += tap_flow.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += tap_netlink.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += tap_tcmsgs.c
 
 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += lib/librte_eal
diff --git a/drivers/net/tap/tap_netlink.c b/drivers/net/tap/tap_netlink.c
new file mode 100644
index ..10f00d1931c6
--- /dev/null
+++ b/drivers/net/tap/tap_netlink.c
@@ -0,0 +1,367 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+/* Must be quite large to support dumping a huge list of QDISC or filters. */
+#define BUF_SIZE (32 * 1024) /* Size of the buffer to receive kernel messages 
*/
+#define SNDBUF_SIZE 32768 /* Send buffer size for the netlink socket */
+#define RCVBUF_SIZE 32768 /* Receive buffer size for the netlink socket */
+
+struct nested_tail {
+   struct rtattr *tail;
+   struct nested_tail *prev;
+};
+
+/**
+ * Initialize a netlink socket for communicating with the kernel.
+ *
+ * @return
+ *   netlink socket file descriptor on success, -1 otherwise.
+ */
+int
+nl_init(void)
+{
+   int fd, sndbuf_size = SNDBUF_SIZE, rcvbuf_size = RCVBUF_SIZE;
+   struct sockaddr_nl local = { .nl_family = AF_NETLINK };
+
+   fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE);
+   if (fd < 0) {
+   RTE_LOG(ERR, PMD, "Unable to create a netlink socket\n");
+   return -1;
+   }
+   if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf_size, sizeof(int))) {
+   RTE_LOG(ERR, PMD, "Unable to set socket buffer send size\n");
+   return -1;
+   }
+   if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvb

[dpdk-dev] [PATCH v4 4/4] net/tap: add basic flow API patterns and actions

2017-03-14 Thread Pascal Mazon
Supported flow rules are now mapped to TC rules on the tap netdevice.
The netlink message used for creating the TC rule is stored in struct
rte_flow. That way, by simply changing a metadata in it, we can require
for the rule deletion without further parsing.

Supported items:
- eth: src and dst (with variable masks), and eth_type (0x mask).
- vlan: vid, pcp, tpid, but not eid.
- ipv4/6: src and dst (with variable masks), and ip_proto (0x mask).
- udp/tcp: src and dst port (0x) mask.

Supported actions:
- DROP
- QUEUE
- PASSTHRU

It is generally not possible to provide a "last" item. However, if the
"last" item, once masked, is identical to the masked spec, then it is
supported.

Only IPv4/6 and MAC addresses can use a variable mask. All other
items need a full mask (exact match).

Support for VLAN requires kernel headers >= 4.9, checked using
auto-config.sh.

Signed-off-by: Pascal Mazon 
Acked-by: Olga Shern 
---
 doc/guides/nics/tap.rst   |  23 ++
 drivers/net/tap/Makefile  |  40 ++
 drivers/net/tap/rte_eth_tap.c |  61 ++-
 drivers/net/tap/rte_eth_tap.h |   3 +
 drivers/net/tap/tap_flow.c| 919 +-
 drivers/net/tap/tap_flow.h|  12 +
 6 files changed, 1043 insertions(+), 15 deletions(-)

diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst
index c4f207be3b47..cdb528b5eae4 100644
--- a/doc/guides/nics/tap.rst
+++ b/doc/guides/nics/tap.rst
@@ -82,6 +82,29 @@ can utilize that stack to handle the network protocols. Plus 
you would be able
 to address the interface using an IP address assigned to the internal
 interface.
 
+Flow API support
+
+
+The tap PMD supports major flow API pattern items and actions, when running on
+linux kernels above 4.2 ("Flower" classifier required). Supported items:
+
+- eth: src and dst (with variable masks), and eth_type (0x mask).
+- vlan: vid, pcp, tpid, but not eid. (requires kernel 4.9)
+- ipv4/6: src and dst (with variable masks), and ip_proto (0x mask).
+- udp/tcp: src and dst port (0x) mask.
+
+Supported actions:
+
+- DROP
+- QUEUE
+- PASSTHRU
+
+It is generally not possible to provide a "last" item. However, if the "last"
+item, once masked, is identical to the masked spec, then it is supported.
+
+Only IPv4/6 and MAC addresses can use a variable mask. All other items need a
+full mask (exact match).
+
 Example
 ---
 
diff --git a/drivers/net/tap/Makefile b/drivers/net/tap/Makefile
index 4ae2ca6cfbab..a6542dad1d66 100644
--- a/drivers/net/tap/Makefile
+++ b/drivers/net/tap/Makefile
@@ -41,6 +41,7 @@ LIBABIVER := 1
 
 CFLAGS += -O3
 CFLAGS += -I$(SRCDIR)
+CFLAGS += -I.
 CFLAGS += $(WERROR_FLAGS)
 
 #
@@ -57,5 +58,44 @@ DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += lib/librte_mbuf
 DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += lib/librte_mempool
 DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += lib/librte_ether
 DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += lib/librte_hash
 
 include $(RTE_SDK)/mk/rte.lib.mk
+
+# Generate and clean-up tap_autoconf.h.
+
+export CC CFLAGS CPPFLAGS EXTRA_CFLAGS EXTRA_CPPFLAGS
+export AUTO_CONFIG_CFLAGS = -Wno-error
+
+ifndef V
+AUTOCONF_OUTPUT := >/dev/null
+endif
+
+tap_autoconf.h.new: FORCE
+
+tap_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
+   $Q $(RM) -f -- '$@'
+   $Q sh -- '$<' '$@' \
+   HAVE_TC_FLOWER \
+   linux/pkt_cls.h \
+   enum TCA_FLOWER_UNSPEC \
+   $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_TC_VLAN_ID \
+   linux/pkt_cls.h \
+   enum TCA_FLOWER_KEY_VLAN_PRIO \
+   $(AUTOCONF_OUTPUT)
+
+# Create tap_autoconf.h or update it in case it differs from the new one.
+
+tap_autoconf.h: tap_autoconf.h.new
+   $Q [ -f '$@' ] && \
+   cmp '$<' '$@' $(AUTOCONF_OUTPUT) || \
+   mv '$<' '$@'
+
+$(SRCS-$(CONFIG_RTE_LIBRTE_PMD_TAP):.c=.o): tap_autoconf.h
+
+clean_tap: FORCE
+   $Q rm -f -- tap_autoconf.h tap_autoconf.h.new
+
+clean: clean_tap
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 78eac9a11ea0..ed2099212e2a 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -42,17 +42,20 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
 #include 
+#include 
 
 /* Linux based path to the TUN device */
 #define TUN_TAP_DEV_PATH"/dev/net/tun"
@@ -67,6 +70,9 @@
 #define RTE_PMD_TAP_MAX_QUEUES 1
 #endif
 
+#define FLOWER_KERNEL_VERSION KERNEL_VERSION(4, 2, 0)
+#define FLOWER_VLAN_KERNEL_VERSION KERNEL_VERSION(4, 9, 0)
+
 static struct rte_vdev_driver pmd_tap_drv;
 
 static const char *valid_arguments[] = {
@@ -159,6 +165,28 @@ tun_alloc(struct pmd_internals *pmd, uint16_t qid)
goto error;
rte_memcpy(&pmd->eth_addr, ifr.ifr_hwaddr.sa

[dpdk-dev] [PATCH v4] net/tap: remote netdevice traffic capture

2017-03-14 Thread Pascal Mazon
This patchset adds the special "remote" feature to the tap PMD, that
actually enables capturing traffic from another netdevice. This is
especially useful to get packets into the DPDK app, when the remote
netdevice has no DPDK support.

The "remote" feature requires flow API support as flow rules will be
configured on the remote netdevice for redirection, using the same
mechanism.

This series applies on top of:

  [PATCH 0/4] net/tap: support flow API

v2 changes:
  - rebase on top of updated "net/tap: support flow API"
  - fix implicit flow flush when closing the netdevices

v3 changes:
  - memset(0) for remote_iface in rte_pmd_tap_probe()
  - use snprintf instead of strncpy to correctly handle terminating \0

v4 changes:
  - rebase on top of updated "net/tap: support flow API"
  - use only a single patch now as MTU, MAC and flags can be easily managed
with tap_ioctl()

Pascal Mazon (1):
  net/tap: add remote netdevice traffic capture

 doc/guides/nics/tap.rst   |  17 ++
 drivers/net/tap/rte_eth_tap.c | 101 +-
 drivers/net/tap/rte_eth_tap.h |   4 +
 drivers/net/tap/tap_flow.c| 451 --
 drivers/net/tap/tap_flow.h|  24 +++
 5 files changed, 575 insertions(+), 22 deletions(-)

-- 
2.8.0.rc0



[dpdk-dev] [PATCH v4] net/tap: add remote netdevice traffic capture

2017-03-14 Thread Pascal Mazon
By default, a tap netdevice is of no use when not fed by a separate
process. The ability to automatically feed it from another netdevice
allows applications to capture any kind of traffic normally destined to
the kernel stack.

This patch implements this ability through a new optional "remote"
parameter.

Packets matching filtering rules created with the flow API are matched
on the remote device and redirected to the tap PMD, where the relevant
action will be performed.

Signed-off-by: Pascal Mazon 
Acked-by: Olga Shern 
---
 doc/guides/nics/tap.rst   |  17 ++
 drivers/net/tap/rte_eth_tap.c | 101 +-
 drivers/net/tap/rte_eth_tap.h |   4 +
 drivers/net/tap/tap_flow.c| 451 --
 drivers/net/tap/tap_flow.h|  24 +++
 5 files changed, 575 insertions(+), 22 deletions(-)

diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst
index cdb528b5eae4..676a569b00ca 100644
--- a/doc/guides/nics/tap.rst
+++ b/doc/guides/nics/tap.rst
@@ -58,6 +58,23 @@ needed, but the interface does not enforce that speed, for 
example::
 
--vdev=net_tap0,iface=foo0,speed=25000
 
+It is possible to specify a remote netdevice to capture packets from by adding
+``remote=foo1``, for example::
+
+   --vdev=net_tap,iface=tap0,remote=foo1
+
+If a ``remote`` is set, then all packets with the tap PMD's local MAC coming
+in on the remote netdevice will be redirected to the tap.
+If the tap is in promiscuous mode, then all packets will be redirected.
+In allmulti mode, all multicast packets will be redirected.
+It is possible to add explicit rte_flow rules on the tap PMD to capture 
specific
+traffic. For instance, in testpmd, the following rte_flow rule would capture
+packets with the given MAC address from the remote, and send it to the tap RX
+QUEUE 3::
+
+   testpmd> flow create 0 ingress pattern eth src is 02:03:04:05:06:07 / \
+end actions queue index 3 / end
+
 After the DPDK application is started you can send and receive packets on the
 interface using the standard rx_burst/tx_burst APIs in DPDK. From the host
 point of view you can use any host tool like tcpdump, Wireshark, ping, Pktgen
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index ed2099212e2a..b4601d49fa2c 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -63,6 +63,7 @@
 
 #define ETH_TAP_IFACE_ARG   "iface"
 #define ETH_TAP_SPEED_ARG   "speed"
+#define ETH_TAP_REMOTE_ARG  "remote"
 
 #ifdef IFF_MULTI_QUEUE
 #define RTE_PMD_TAP_MAX_QUEUES 16
@@ -78,6 +79,7 @@ static struct rte_vdev_driver pmd_tap_drv;
 static const char *valid_arguments[] = {
ETH_TAP_IFACE_ARG,
ETH_TAP_SPEED_ARG,
+   ETH_TAP_REMOTE_ARG,
NULL
 };
 
@@ -187,10 +189,43 @@ tun_alloc(struct pmd_internals *pmd, uint16_t qid)
pmd->name);
return fd;
}
+   if (pmd->remote_if_index) {
+   /*
+* Flush usually returns negative value because it tries
+* to delete every QDISC (and on a running device, one
+* QDISC at least is needed). Ignore negative return
+* value.
+*/
+   qdisc_flush(pmd->nlsk_fd, pmd->remote_if_index);
+   if (qdisc_create_ingress(pmd->nlsk_fd,
+pmd->remote_if_index) < 0)
+   goto remote_fail;
+   LIST_INIT(&pmd->implicit_flows);
+   if (tap_flow_implicit_create(
+   pmd, TAP_REMOTE_LOCAL_MAC) < 0)
+   goto remote_fail;
+   if (tap_flow_implicit_create(
+   pmd, TAP_REMOTE_BROADCAST) < 0)
+   goto remote_fail;
+   if (tap_flow_implicit_create(
+   pmd, TAP_REMOTE_BROADCASTV6) < 0)
+   goto remote_fail;
+   if (tap_flow_implicit_create(
+   pmd, TAP_REMOTE_TX) < 0)
+   goto remote_fail;
+   }
}
 
return fd;
 
+remote_fail:
+   RTE_LOG(ERR, PMD,
+   "Could not set up remote flow rules for %s: remote disabled.\n",
+   pmd->name);
+   pmd->remote_if_index = 0;
+   tap_flow_implicit_flush(pmd, NULL);
+   return fd;
+
 error:
if (fd > 0)
close(fd);
@@ -289,8 +324,17 @@ tap_ioctl(struct pmd_internals *pmd, unsigned long request,
  struct ifreq *ifr, int set)
 {
short req_flags = ifr->ifr_flags;
+   int remote = !!pmd->remote_if_index;
 
-   snprintf(ifr->ifr_name, IFNAMSIZ, "%s", pmd->name);
+   /*
+* If there is a remote netdevice, apply ioctl

Re: [dpdk-dev] [PATCH v1 12/14] ring: separate out head index manipulation for enq/deq

2017-03-14 Thread Olivier Matz
On Wed, 8 Mar 2017 12:06:54 +, Bruce Richardson 
 wrote:
> On Wed, Mar 08, 2017 at 11:49:06AM +0100, Olivier MATZ wrote:
> > On Thu, 23 Feb 2017 17:24:05 +, Bruce Richardson 
> >  wrote:  
> > > We can write a single common function for head manipulation for enq
> > > and a common one for deq, allowing us to have a single worker function
> > > for enq and deq, rather than two of each. Update all other inline
> > > functions to use the new functions.
> > > 
> > > Signed-off-by: Bruce Richardson 
> > > ---
> > >  lib/librte_ring/rte_ring.c |   4 +-
> > >  lib/librte_ring/rte_ring.h | 328 
> > > -
> > >  2 files changed, 149 insertions(+), 183 deletions(-)
> > >   
> > 
> > [...]
> >   
> > > +static inline __attribute__((always_inline)) unsigned int
> > > +__rte_ring_do_enqueue(struct rte_ring *r, void * const *obj_table,
> > > +  unsigned int n, enum rte_ring_queue_behavior behavior,
> > > +  int is_sp, unsigned int *free_space)
> > >  {
> > > - uint32_t prod_head, cons_tail;
> > > - uint32_t prod_next, free_entries;
> > > - uint32_t mask = r->mask;
> > > -
> > > - prod_head = r->prod.head;
> > > - cons_tail = r->cons.tail;
> > > - /* The subtraction is done between two unsigned 32bits value
> > > -  * (the result is always modulo 32 bits even if we have
> > > -  * prod_head > cons_tail). So 'free_entries' is always between 0
> > > -  * and size(ring)-1. */
> > > - free_entries = mask + cons_tail - prod_head;
> > > -
> > > - /* check that we have enough room in ring */
> > > - if (unlikely(n > free_entries))
> > > - n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : free_entries;
> > > + uint32_t prod_head, prod_next;
> > > + uint32_t free_entries;
> > >  
> > > + n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
> > > + &prod_head, &prod_next, &free_entries);
> > >   if (n == 0)
> > >   goto end;
> > >  
> > > -
> > > - prod_next = prod_head + n;
> > > - r->prod.head = prod_next;
> > > -
> > > - /* write entries in ring */
> > >   ENQUEUE_PTRS();
> > >   rte_smp_wmb();
> > >  
> > > + /*
> > > +  * If there are other enqueues in progress that preceded us,
> > > +  * we need to wait for them to complete
> > > +  */
> > > + while (unlikely(r->prod.tail != prod_head))
> > > + rte_pause();
> > > +  
> > 
> > I'd say this part should not be done in case is_sp == 1.
> > Since it is sometimes a constant arg in an inline func, it may be better
> > to add the if (is_sp == 0).
> > 
> > [...]
> >   
> 
> Yes, it's an unnecessary check. However, having it in place for the sp
> case made no performance difference in my test, so I decided to keep
> the code shorter by avoiding an additional branch. If there is a
> performance hit I'll remove it, but I would rather not add more branches
> to the code in the absense of a real impact to not having them.

Ok.
Maybe it's worth checking the numbers given by the unit test.

Olivier



Re: [dpdk-dev] [PATCH v2 07/14] ring: make bulk and burst fn return vals consistent

2017-03-14 Thread Olivier Matz
On Wed, 8 Mar 2017 12:08:42 +, Bruce Richardson 
 wrote:
> On Wed, Mar 08, 2017 at 11:22:40AM +0100, Olivier MATZ wrote:
> > On Tue,  7 Mar 2017 11:32:10 +, Bruce Richardson 
> >  wrote:  
> > > The bulk fns for rings returns 0 for all elements enqueued and negative
> > > for no space. Change that to make them consistent with the burst functions
> > > in returning the number of elements enqueued/dequeued, i.e. 0 or N.
> > > This change also allows the return value from enq/deq to be used directly
> > > without a branch for error checking.
> > > 
> > > Signed-off-by: Bruce Richardson   
> > 
> > [...]
> >   
> > > @@ -716,7 +695,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * 
> > > const *obj_table,
> > >  static inline int __attribute__((always_inline))
> > >  rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
> > >  {
> > > - return rte_ring_mp_enqueue_bulk(r, &obj, 1);
> > > + return rte_ring_mp_enqueue_bulk(r, &obj, 1) ? 0 : -ENOBUFS;
> > >  }
> > >  
> > >  /**  
> > 
> > I'm wondering if these functions (enqueue/dequeue of one element) should
> > be modified to return 0 (fail) or 1 (success) too, for consistency with
> > the bulk functions.
> > 
> > Any opinion?
> >   
> I thought about that, but I would view it as risky, unless we want to go
> changing the parameters to the function also, as the compiler won't flag
> a change in return value like that.
> 

Ok, I have no better solution anyway.

Olivier


Re: [dpdk-dev] [PATCH v2 00/14] refactor and cleanup of rte_ring

2017-03-14 Thread Olivier Matz
On Tue,  7 Mar 2017 11:32:03 +, Bruce Richardson 
 wrote:
> NOTE: this set depends on the v2 cleanup set sent previously.
>   http://dpdk.org/ml/archives/dev/2017-February/thread.html#58200
> 
> This patchset make a set of, sometimes non-backward compatible, cleanup
> changes to the rte_ring code in order to improve it. The resulting code is
> shorter, since the existing functions are restructured to reduce code
> duplication, as well as being more consistent in behaviour. The specific
> changes made are explained in each patch which makes that change.
> 
> Changes in V2:
> * Eliminated extra cacheline padding where cachelines are 128B
> * Renamed rte_ring_ht_ptr struct to rte_ring_headtail
> * Removed missed references to ring watermarks in test code and docs
> 
> This patchset is largely the same as that posted previously on-list as
> an RFC:
>   http://dpdk.org/ml/archives/dev/2017-February/thread.html#56982
> 
> Changes in V1 from RFC:
> * Included release notes updates as changes are made in each patch
> * Fixed some missed comment updates when changing the code
> * Separated some initial fixup patches from this set to send separately
> * Dropped the final two patches for an rte_event_ring, as not relevant
>   for this set. That can be done as a separate set later.
> * The macros for copying the pointers have an extra parameter added,
>   indicating the start of the ring buffer itself. This allows more
>   flexibility for reusing them in other ring implementations.
> 
> Bruce Richardson (14):
>   ring: remove split cacheline build setting
>   ring: create common structure for prod and cons metadata
>   ring: eliminate duplication of size and mask fields
>   ring: remove debug setting
>   ring: remove the yield when waiting for tail update
>   ring: remove watermark support
>   ring: make bulk and burst fn return vals consistent
>   ring: allow enqueue fns to return free space value
>   ring: allow dequeue fns to return remaining entry count
>   examples/quota_watermark: use ring space for watermarks
>   ring: reduce scope of local variables
>   ring: separate out head index manipulation for enq/deq
>   ring: create common function for updating tail idx
>   ring: make ring struct and enq/deq macros type agnostic
> 
>  app/pdump/main.c   |   2 +-
>  config/common_base |   3 -
>  doc/guides/prog_guide/env_abstraction_layer.rst|   5 -
>  doc/guides/prog_guide/ring_lib.rst |  15 -
>  doc/guides/rel_notes/release_17_05.rst |  32 +
>  doc/guides/sample_app_ug/server_node_efd.rst   |   2 +-
>  drivers/crypto/null/null_crypto_pmd.c  |   2 +-
>  drivers/net/bonding/rte_eth_bond_pmd.c |   3 +-
>  drivers/net/ring/rte_eth_ring.c|   4 +-
>  examples/distributor/main.c|   5 +-
>  examples/load_balancer/runtime.c   |  34 +-
>  .../client_server_mp/mp_client/client.c|   9 +-
>  .../client_server_mp/mp_server/main.c  |   2 +-
>  examples/packet_ordering/main.c|  13 +-
>  examples/qos_sched/app_thread.c|  14 +-
>  examples/quota_watermark/qw/init.c |   5 +-
>  examples/quota_watermark/qw/main.c |  21 +-
>  examples/quota_watermark/qw/main.h |   1 +
>  examples/quota_watermark/qwctl/commands.c  |   4 +-
>  examples/quota_watermark/qwctl/qwctl.c |   2 +
>  examples/quota_watermark/qwctl/qwctl.h |   1 +
>  examples/server_node_efd/node/node.c   |   2 +-
>  examples/server_node_efd/server/main.c |   2 +-
>  lib/librte_hash/rte_cuckoo_hash.c  |   5 +-
>  lib/librte_mempool/rte_mempool_ring.c  |  12 +-
>  lib/librte_pdump/rte_pdump.c   |   2 +-
>  lib/librte_port/rte_port_frag.c|   3 +-
>  lib/librte_port/rte_port_ras.c |   2 +-
>  lib/librte_port/rte_port_ring.c|  34 +-
>  lib/librte_ring/rte_ring.c |  76 +--
>  lib/librte_ring/rte_ring.h | 760 
> -
>  test/test-pipeline/pipeline_hash.c |   5 +-
>  test/test-pipeline/runtime.c   |  19 +-
>  test/test/autotest_test_funcs.py   |   7 -
>  test/test/commands.c   |  52 --
>  test/test/test_link_bonding_mode4.c|   6 +-
>  test/test/test_pmd_ring_perf.c |  12 +-
>  test/test/test_ring.c  | 704 +++
>  test/test/test_ring_perf.c |  36 +-
>  test/test/test_table_acl.c |   2 +-
>  test/test/test_table_pipeline.c|   2 +-
>  test/test/test_table_ports.c   |  12 +-
>  test/test/virtual_pmd.c

[dpdk-dev] [PATCH v4] mbuf: use pktmbuf helper to create the pool

2017-03-14 Thread Olivier Matz
From: Hemant Agrawal 

When possible, replace the uses of rte_mempool_create() with
the helper provided in librte_mbuf: rte_pktmbuf_pool_create().

This is the preferred way to create a mbuf pool.

This also updates the documentation.

Signed-off-by: Olivier Matz 
Signed-off-by: Hemant Agrawal 
Acked-by: Olivier Matz 

---

Hi Hemant,

I noticed I did not ack this patch. I just rebased it on top
of master branch (app/test was moved).


v4:
* rebase on top of master

v3:
* removing changes from ip_reassembly app

v2:
* removing compilation error from ip_reassmebly
* fix minor patch apply warnings


 doc/guides/sample_app_ug/ipv4_multicast.rst| 12 
 doc/guides/sample_app_ug/l2_forward_job_stats.rst  | 33 --
 .../sample_app_ug/l2_forward_real_virtual.rst  | 26 +++--
 doc/guides/sample_app_ug/ptpclient.rst | 11 ++--
 doc/guides/sample_app_ug/quota_watermark.rst   | 26 ++---
 drivers/net/bonding/rte_eth_bond_8023ad.c  | 13 -
 examples/ip_pipeline/init.c| 18 ++--
 examples/multi_process/l2fwd_fork/main.c   | 13 +++--
 examples/tep_termination/main.c| 16 +--
 lib/librte_mbuf/rte_mbuf.c |  7 +++--
 lib/librte_mbuf/rte_mbuf.h | 29 +++
 test/test/test_link_bonding_rssconf.c  | 11 
 12 files changed, 90 insertions(+), 125 deletions(-)

diff --git a/doc/guides/sample_app_ug/ipv4_multicast.rst 
b/doc/guides/sample_app_ug/ipv4_multicast.rst
index 2f65eed..3e30f50 100644
--- a/doc/guides/sample_app_ug/ipv4_multicast.rst
+++ b/doc/guides/sample_app_ug/ipv4_multicast.rst
@@ -145,12 +145,12 @@ Memory pools for indirect buffers are initialized 
differently from the memory po
 
 .. code-block:: c
 
-packet_pool = rte_mempool_create("packet_pool", NB_PKT_MBUF, 
PKT_MBUF_SIZE, 32, sizeof(struct rte_pktmbuf_pool_private),
- rte_pktmbuf_pool_init, NULL, 
rte_pktmbuf_init, NULL, rte_socket_id(), 0);
-
-header_pool = rte_mempool_create("header_pool", NB_HDR_MBUF, 
HDR_MBUF_SIZE, 32, 0, NULL, NULL, rte_pktmbuf_init, NULL, rte_socket_id(), 0);
-clone_pool = rte_mempool_create("clone_pool", NB_CLONE_MBUF,
-CLONE_MBUF_SIZE, 32, 0, NULL, NULL, rte_pktmbuf_init, NULL, 
rte_socket_id(), 0);
+packet_pool = rte_pktmbuf_pool_create("packet_pool", NB_PKT_MBUF, 32,
+   0, PKT_MBUF_DATA_SIZE, rte_socket_id());
+header_pool = rte_pktmbuf_pool_create("header_pool", NB_HDR_MBUF, 32,
+   0, HDR_MBUF_DATA_SIZE, rte_socket_id());
+clone_pool = rte_pktmbuf_pool_create("clone_pool", NB_CLONE_MBUF, 32,
+   0, 0, rte_socket_id());
 
 The reason for this is because indirect buffers are not supposed to hold any 
packet data and
 therefore can be initialized with lower amount of reserved memory for each 
buffer.
diff --git a/doc/guides/sample_app_ug/l2_forward_job_stats.rst 
b/doc/guides/sample_app_ug/l2_forward_job_stats.rst
index 456e85e..1029cc8 100644
--- a/doc/guides/sample_app_ug/l2_forward_job_stats.rst
+++ b/doc/guides/sample_app_ug/l2_forward_job_stats.rst
@@ -193,36 +193,25 @@ and the application to store network packet data:
 .. code-block:: c
 
 /* create the mbuf pool */
-l2fwd_pktmbuf_pool =
-rte_mempool_create("mbuf_pool", NB_MBUF,
-   MBUF_SIZE, 32,
-   sizeof(struct rte_pktmbuf_pool_private),
-   rte_pktmbuf_pool_init, NULL,
-   rte_pktmbuf_init, NULL,
-   rte_socket_id(), 0);
+l2fwd_pktmbuf_pool = rte_pktmbuf_pool_create("mbuf_pool", NB_MBUF,
+   MEMPOOL_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE,
+   rte_socket_id());
 
 if (l2fwd_pktmbuf_pool == NULL)
 rte_exit(EXIT_FAILURE, "Cannot init mbuf pool\n");
 
 The rte_mempool is a generic structure used to handle pools of objects.
-In this case, it is necessary to create a pool that will be used by the driver,
-which expects to have some reserved space in the mempool structure,
-sizeof(struct rte_pktmbuf_pool_private) bytes.
-The number of allocated pkt mbufs is NB_MBUF, with a size of MBUF_SIZE each.
-A per-lcore cache of 32 mbufs is kept.
+In this case, it is necessary to create a pool that will be used by the driver.
+The number of allocated pkt mbufs is NB_MBUF, with a data room size of
+RTE_MBUF_DEFAULT_BUF_SIZE each.
+A per-lcore cache of MEMPOOL_CACHE_SIZE mbufs is kept.
 The memory is allocated in rte_socket_id() socket,
 but it is possible to extend this code to allocate one mbuf pool per socket.
 
-Two callback pointers are also given to the rte_mempool_create() function:
-
-*   The first callback pointer is to rte_pktmbuf_pool_init() and is used
-to initialize the private data of the mempool, which is needed by the 
driver.
-This function is provided by the mbuf API, bu

[dpdk-dev] Reg DPDK with unsupported NIC

2017-03-14 Thread raman geetha gopalakrishnan
Hi ,

Please find my query, Currently i am planning to develop DPDK APP (linux
env). I do not have DPDK supported NIC. Till then i would still like to
develop DPDK APP and want DPDK  to use OS interface to TX/RX packets from
NIC. How can i make it? I went through KNI and my understanding is you
cannot use it - is this correct?

In what way i can still develop DPDK APP with non supported NIC till get
the DPDK NIC.

Thanks
Raman


Re: [dpdk-dev] i40e SR-IOV with Linux PF

2017-03-14 Thread Thomas Monjalon
2017-03-14 04:44, Wu, Jingjing:
> From: Thomas Monjalon [mailto:thomas.monja...@6wind.com]
> > > Hi i40e developers,
> > >
> > > Referring to the VFD discussion, I thought basic behaviours were the
> > > same regardless of the PF driver:
> > > http://dpdk.org/ml/archives/dev/2016-December/053056.html
> > > "
> > > > In the meanwhile, we have some test models ongoing to validate
> > > > combination of Linux and DPDK drivers for VF and PF.
> > > > We'll fully support below 4 cases going forward.
> > > > 1. DPDK PF + DPDK VF
> > > > 2. DPDK PF + Linux VF
> > > > 3. Linux PF + DPDK VF
> > > > 4. Linux PF + Linux VF (it's not our scope)
> > > [...]
> > > > Linux PF + DPDK VF has been tested with 1.0 API long time ago.
> > > > There is some test activities ongoing.
> > > "
> > >
> > > I think the Linux PF case is important and deserves more consideration.
> > > When looking at the code, specifically i40evf_vlan_offload_set() and
> > > i40evf_vlan_pvid_set(), I read this:
> > > "
> > > /* Linux pf host doesn't support vlan offload yet */
> > > if (vf->version_major == I40E_DPDK_VERSION_MAJOR) { "
> > >
> > > Is there some work in progress on Linux side to get the same behaviour
> > > as with a DPDK PF?
> > >
> 
> As I know, VFD features are marked with an "EXPERIMENTAL" tag.
> And we are working on the extendable interface (feature based) with
> PF kernel driver. 

The VLAN offload is not a VFD feature. It is a basic driver feature.
It is said that it is supported in the documentation but it is not
with a Linux PF.

Please consider the rest of my email:

> > At least, it must be documented in
> >   doc/guides/nics/features/i40e_vf.ini
> > and marked as partially supported (P instead of Y) in
> >   doc/guides/nics/i40e.rst


Re: [dpdk-dev] i40e queues per VF

2017-03-14 Thread Thomas Monjalon
ping again

2017-03-08 12:25, Thomas Monjalon:
> ping - still waiting for answers to below questions, please
> 
> 2017-02-16 15:55, Thomas Monjalon:
> > 2017-02-16 13:58, Wu, Jingjing:
> > > From: Thomas Monjalon
> > > > 
> > > > Hi,
> > > > 
> > > > When reading the documentation, it is not easy to understand the 
> > > > capability of
> > > > i40evf for the number of queues.
> > > > 
> > > > First, please could you explain why we need a build-time config option?
> > > > In the doc, there is neither justification nor tuning guidelines:
> > > > 
> > > > http://dpdk.org/doc/guides/nics/i40e.html#config-file-options
> > > > "
> > > > CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_PF (default 64) Number of
> > > > queues reserved for PF.
> > > > CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VF (default 4) Number of
> > > > queues reserved for each SR-IOV VF.
> > > > "
> > > 
> > > This number is used as initialization time to allocate queue number
> > > for PF/VF for HW's queue pool. Will add more description in i40e.rst.
> > 
> > The description "Number of queues reserved for each SR-IOV VF" seems
> > partially wrong. Please explain it is a queue pair.
> > 
> > > > I feel these are hard limits and should be some constants in the code, 
> > > > not some
> > > > build configuration options.
> > > > 
> > > > The other doc to look at is:
> > > > http://dpdk.org/doc/guides/nics/intel_vf.html#intel-fortville-10-40-gigabit-
> > > > ethernet-controller-vf-infrastructure
> > > > "
> > > > Each VF can have a maximum of 16 queue pairs.
> > > > "
> > > > 
> > > > Do we agree that a queue pair is 1 Rx queue / 1 Tx queue?
> > > > Note: the concept of queue pairs in Intel VF should be explained 
> > > > somewhere.
> > > > 
> > > Yes.
> > > > Below, a different limitation is given:
> > > > "
> > > > The available queue number(at most 4) per VF depends on the total 
> > > > number of
> > > > pool, which is determined by the max number of VF at PF initialization 
> > > > stage and
> > > > the number of queue specified in config "
> > > >
> > > I think there may be some inconsistent description in  doc intel_vf.rst 
> > > due to
> > > Multiple kinds of NICs. We should correct them.
> > > Thanks for pointing that.
> > > 
> > > > So what is the real maximum of queue pairs? 4 or 16?
> > > > The datasheet talks about 16 queues. Is it 8 pairs?
> > > 
> > > That's is 16 queue pairs. 16 RX queues and 16 Tx queues.
> > > > 
> > > > Is there something to configure the number of queues when creating VF 
> > > > with the
> > > > kernel driver?
> > > 
> > > In kernel driver, it seems at most only 4 queues are supported. That's
> > > Why we add  build-time config option to make more queues are possible.
> > 
> > If we can create 16 queue pairs, why restrict the default configuration to 
> > 4?
> > Why is it a build-time config option?



Re: [dpdk-dev] Reg DPDK with unsupported NIC

2017-03-14 Thread Gémes Géza

2017-03-14 10:28 keltezéssel, raman geetha gopalakrishnan írta:

Hi ,

Please find my query, Currently i am planning to develop DPDK APP (linux
env). I do not have DPDK supported NIC. Till then i would still like to
develop DPDK APP and want DPDK  to use OS interface to TX/RX packets from
NIC. How can i make it? I went through KNI and my understanding is you
cannot use it - is this correct?

In what way i can still develop DPDK APP with non supported NIC till get
the DPDK NIC.

Thanks
Raman


Hi,

My recommendation is to do your development in a KVM VM. DPDK works 
perfectly with virtio NICs.


Geza



Re: [dpdk-dev] [PATCH v9 16/18] examples/distributor: give Rx thread a core

2017-03-14 Thread Hunt, David



On 10/3/2017 4:51 PM, Bruce Richardson wrote:

On Mon, Mar 06, 2017 at 09:10:31AM +, David Hunt wrote:

This so that with the increased amount of stats we are counting,
we don't interfere with the rx core.


Where are the stats being counted in the current code and how would they
interfere?

/Bruce


Previous version of the distributor example did not print out several 
lines of
statistics every second, which the new version does. I felt that it it 
would be
better to separate this off to it's own core, so that the rx core 
performance

would not be impacted.

I'll add an extra comment in the commit message.

Regards,
Dave.


Re: [dpdk-dev] Compiling DPDK 17.02 with a Busybox-based tar

2017-03-14 Thread Thomas Monjalon
2017-03-14 07:58, Raphael Cohn:
> Hi,
> 
> To compile DPDK on a system with Busybox tar installed, it's necessary to
> make a small change to the build system:-
> 
> sed -i -e '/--keep-newer-files/d' mk/rte.sdkinstall.mk
> sed -i -e 's;--strip-components=1 \\;--strip-components=1;g' mk/
> rte.sdkinstall.mk
> 
> I'm not sure whether the impact of this change fundamentally affects DPDK.
> I'm a little surprised that tar is needed at all for a compile + install,
> but I haven't investigated further. Is it being used to do a copy?

Yes it is used to make a copy.
It is a convenient one-liner.

I am a bit surprised that you need to install DPDK with busybox.
The busybox systems are generally cross-built and prepared out of the box,
with the host tools.
However, if you feel it is important to install DPDK on such target,
you are welcome to propose a patch.



Re: [dpdk-dev] [PATCH] net/ixgbevf: fix stats update after a PF reset

2017-03-14 Thread Olivier Matz
Hi Wei,

On Thu, 16 Feb 2017 17:49:22 +0100, Olivier Matz  wrote:
> Hi Wei,
> 
> On Thu, 9 Feb 2017 14:50:05 +, "Dai, Wei"  wrote:
> > > -Original Message-
> > > From: Dai, Wei
> > > Sent: Thursday, February 9, 2017 10:38 PM
> > > To: 'Olivier Matz' ; dev@dpdk.org; Zhang,
> > > Helin ; Ananyev, Konstantin
> > > 
> > > Cc: Guo Fengtian ; sta...@dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH] net/ixgbevf: fix stats update after
> > > a PF reset
> > > 
> > > The stats register can rewind to zero when the port is running for
> > > a long period. So I am afraid that this check is not always correct.
> > > Why not introduce a variable to directly indicate whether the
> > > resulted stats should be updated or not.  
> >
> > Another way is to clear
> > hw_stats->last_vfgprc/last_vfgorc/last_vfgptc/last_vfmprc at the same
> > time PF is set down.
> >   
> 
> Can we know easily in VF if the PF was set down?

Any guideline for this? Or is it something we cannot fix?

Thanks,
Olivier


Re: [dpdk-dev] Reg DPDK with unsupported NIC

2017-03-14 Thread Thomas Monjalon
2017-03-14 14:58, raman geetha gopalakrishnan:
> In what way i can still develop DPDK APP with non supported NIC till get
> the DPDK NIC.

You should check the bottom of this page (section Ohers):
http://dpdk.org/doc/nics
The simplest is to use the ring PMD which do not require a NIC at all.
Or you can try the other solutions (pcap, tap, af_packet).


Re: [dpdk-dev] [PATCH 01/17] vhost: introduce driver features related APIs

2017-03-14 Thread Maxime Coquelin



On 03/03/2017 10:51 AM, Yuanhan Liu wrote:

Introduce few APIs to set/get/enable/disable driver features.

Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/rte_vhost_version.map | 10 
 lib/librte_vhost/rte_virtio_net.h  |  9 
 lib/librte_vhost/socket.c  | 90 ++
 3 files changed, 109 insertions(+)


Nice!
Reviewed-by: Maxime Coquelin 

I wonder whether we could protect from setting/enabling/disabling
features once negotiation is done?

Thanks,
Maxime


Re: [dpdk-dev] Reg DPDK with unsupported NIC

2017-03-14 Thread Wiles, Keith


Sent from my iPhone

> On Mar 14, 2017, at 5:29 PM, raman geetha gopalakrishnan 
>  wrote:
> 
> Hi ,
> 
> Please find my query, Currently i am planning to develop DPDK APP (linux
> env). I do not have DPDK supported NIC. Till then i would still like to
> develop DPDK APP and want DPDK  to use OS interface to TX/RX packets from
> NIC. How can i make it? I went through KNI and my understanding is you
> cannot use it - is this correct?
> 
> In what way i can still develop DPDK APP with non supported NIC till get
> the DPDK NIC.

Take a look at the TAP PMD driver and look at the doc file in the doc 
directory. 

It will not be high performance but it can work nicely using tap and some like 
socat.
> 
> Thanks
> Raman


Re: [dpdk-dev] [PATCH 01/17] vhost: introduce driver features related APIs

2017-03-14 Thread Maxime Coquelin



On 03/14/2017 10:46 AM, Maxime Coquelin wrote:



On 03/03/2017 10:51 AM, Yuanhan Liu wrote:

Introduce few APIs to set/get/enable/disable driver features.

Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/rte_vhost_version.map | 10 
 lib/librte_vhost/rte_virtio_net.h  |  9 
 lib/librte_vhost/socket.c  | 90
++
 3 files changed, 109 insertions(+)


Nice!
Reviewed-by: Maxime Coquelin 

I wonder whether we could protect from setting/enabling/disabling
features once negotiation is done?


Oh, I forgot one comment on this patch.
Since these new functions are part to the API, shouldn't them be
documented?

Maxime


[dpdk-dev] [PATCH] doc: add maintainer role about replying questions

2017-03-14 Thread Thomas Monjalon
The first line of the MAINTAINERS file are:
"
The intention of this file is to provide a set of names that we can rely on
for helping in patch reviews and questions.
"

Unfortunately, some maintainers do not endorse their role for questions
asked on the mailing list.
Hope making it clear in the contribution guide will enforce more
responsive maintainer participation.

Signed-off-by: Thomas Monjalon 
---
 doc/guides/contributing/patches.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/guides/contributing/patches.rst 
b/doc/guides/contributing/patches.rst
index fe71381..84a5dab 100644
--- a/doc/guides/contributing/patches.rst
+++ b/doc/guides/contributing/patches.rst
@@ -67,6 +67,7 @@ The role of the component maintainers is to:
 * Review patches for the component or delegate the review.
   The review should be done, ideally, within 1 week of submission to the 
mailing list.
 * Add an ``acked-by`` to patches, or patchsets, that are ready for committing 
to a tree.
+* Reply to questions asked about the component.
 
 Component maintainers can be added or removed by submitting a patch to the 
``MAINTAINERS`` file.
 Maintainers should have demonstrated a reasonable level of contributions or 
reviews to the component area.
-- 
2.7.0



[dpdk-dev] [PATCH v2] net/virtio-user: fix overflow

2017-03-14 Thread Wenfeng Liu
virtio-user limits the qeueue number to 8 but provides no limit
check against the queue number input from user. If a bigger queue
number (> 8) is given, there is an overflow issue. Doing a sanity
check could avoid it.

Fixes: 37a7eb2ae816 ("net/virtio-user: add device emulation layer")
Cc: sta...@dpdk.org

Signed-off-by: Wenfeng Liu 
---
 drivers/net/virtio/virtio_pci.h  | 3 ++-
 drivers/net/virtio/virtio_user/virtio_user_dev.c | 2 +-
 drivers/net/virtio/virtio_user/virtio_user_dev.h | 6 +++---
 drivers/net/virtio/virtio_user_ethdev.c  | 7 +++
 4 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index 59e45c4..1302556 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -160,7 +160,8 @@
 /*
  * Maximum number of virtqueues per device.
  */
-#define VIRTIO_MAX_VIRTQUEUES 8
+#define VIRTIO_MAX_VIRTQUEUE_PAIRS 8
+#define VIRTIO_MAX_VIRTQUEUES (VIRTIO_MAX_VIRTQUEUE_PAIRS * 2 + 1)
 
 /* Common configuration */
 #define VIRTIO_PCI_CAP_COMMON_CFG  1
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c 
b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index e7fd65f..5b81676 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -234,7 +234,7 @@ int virtio_user_stop_device(struct virtio_user_dev *dev)
uint32_t i, q;
 
dev->vhostfd = -1;
-   for (i = 0; i < VIRTIO_MAX_VIRTQUEUES * 2 + 1; ++i) {
+   for (i = 0; i < VIRTIO_MAX_VIRTQUEUES; ++i) {
dev->kickfds[i] = -1;
dev->callfds[i] = -1;
}
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.h 
b/drivers/net/virtio/virtio_user/virtio_user_dev.h
index 6ecb91e..ba80d05 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.h
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.h
@@ -49,8 +49,8 @@ struct virtio_user_dev {
int *tapfds;
 
/* for both vhost_user and vhost_kernel */
-   int callfds[VIRTIO_MAX_VIRTQUEUES * 2 + 1];
-   int kickfds[VIRTIO_MAX_VIRTQUEUES * 2 + 1];
+   int callfds[VIRTIO_MAX_VIRTQUEUES];
+   int kickfds[VIRTIO_MAX_VIRTQUEUES];
int mac_specified;
uint32_tmax_queue_pairs;
uint32_tqueue_pairs;
@@ -62,7 +62,7 @@ struct virtio_user_dev {
uint8_t status;
uint8_t mac_addr[ETHER_ADDR_LEN];
charpath[PATH_MAX];
-   struct vringvrings[VIRTIO_MAX_VIRTQUEUES * 2 + 1];
+   struct vringvrings[VIRTIO_MAX_VIRTQUEUES];
struct virtio_user_backend_ops *ops;
 };
 
diff --git a/drivers/net/virtio/virtio_user_ethdev.c 
b/drivers/net/virtio/virtio_user_ethdev.c
index 16d1526..d476a2d 100644
--- a/drivers/net/virtio/virtio_user_ethdev.c
+++ b/drivers/net/virtio/virtio_user_ethdev.c
@@ -433,6 +433,13 @@
goto end;
}
 
+   if (queues > VIRTIO_MAX_VIRTQUEUE_PAIRS) {
+   PMD_INIT_LOG(ERR, "arg %s %u exceeds the limit %u",
+   VIRTIO_USER_ARG_QUEUES_NUM, queues,
+   VIRTIO_MAX_VIRTQUEUE_PAIRS);
+   goto end;
+   }
+
eth_dev = virtio_user_eth_dev_alloc(name);
if (!eth_dev) {
PMD_INIT_LOG(ERR, "virtio_user fails to alloc device");
-- 
1.8.3.1



Re: [dpdk-dev] Reg DPDK with unsupported NIC

2017-03-14 Thread raman geetha gopalakrishnan
Thanks Keith For the quick Reply.

On Tue, Mar 14, 2017 at 3:22 PM, Wiles, Keith  wrote:

>
>
> Sent from my iPhone
>
> > On Mar 14, 2017, at 5:29 PM, raman geetha gopalakrishnan <
> glowing...@gmail.com> wrote:
> >
> > Hi ,
> >
> > Please find my query, Currently i am planning to develop DPDK APP (linux
> > env). I do not have DPDK supported NIC. Till then i would still like to
> > develop DPDK APP and want DPDK  to use OS interface to TX/RX packets from
> > NIC. How can i make it? I went through KNI and my understanding is you
> > cannot use it - is this correct?
> >
> > In what way i can still develop DPDK APP with non supported NIC till get
> > the DPDK NIC.
>
> Take a look at the TAP PMD driver and look at the doc file in the doc
> directory.
>
> It will not be high performance but it can work nicely using tap and some
> like socat.
> >
> > Thanks
> > Raman
>


Re: [dpdk-dev] Reg DPDK with unsupported NIC

2017-03-14 Thread raman geetha gopalakrishnan
Thanks Monjalon For the quick reply.

On Tue, Mar 14, 2017 at 3:16 PM, Thomas Monjalon 
wrote:

> 2017-03-14 14:58, raman geetha gopalakrishnan:
> > In what way i can still develop DPDK APP with non supported NIC till get
> > the DPDK NIC.
>
> You should check the bottom of this page (section Ohers):
> http://dpdk.org/doc/nics
> The simplest is to use the ring PMD which do not require a NIC at all.
> Or you can try the other solutions (pcap, tap, af_packet).
>


Re: [dpdk-dev] [PATCH 02/17] net/vhost: remove feature related APIs

2017-03-14 Thread Maxime Coquelin



On 03/03/2017 10:51 AM, Yuanhan Liu wrote:

vdev options is better for disabling/enabling some features.

Signed-off-by: Yuanhan Liu 
---
 drivers/net/vhost/rte_eth_vhost.c   | 25 
 drivers/net/vhost/rte_eth_vhost.h   | 30 -
 drivers/net/vhost/rte_pmd_vhost_version.map |  3 ---
 3 files changed, 58 deletions(-)


Acked-by: Maxime Coquelin 

Thanks,
Maxime


Re: [dpdk-dev] [PATCH 03/17] vhost: use new APIs to handle features

2017-03-14 Thread Maxime Coquelin



On 03/03/2017 10:51 AM, Yuanhan Liu wrote:

Signed-off-by: Yuanhan Liu 
---
 examples/tep_termination/main.c|  4 +++-
 examples/vhost/main.c  | 43 +-
 lib/librte_vhost/rte_vhost_version.map |  3 ---
 lib/librte_vhost/rte_virtio_net.h  | 13 --
 lib/librte_vhost/socket.c  | 23 +-
 lib/librte_vhost/vhost.c   | 41 
 lib/librte_vhost/vhost.h   | 20 
 lib/librte_vhost/vhost_user.c  |  8 +++
 8 files changed, 76 insertions(+), 79 deletions(-)







diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 8433a54..f7227bf 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -143,9 +143,9 @@
  * The features that we support are requested.
  */
 static uint64_t
-vhost_user_get_features(void)
+vhost_user_get_features(struct virtio_net *dev)
 {
-   return VHOST_FEATURES;
+   return rte_vhost_driver_get_features(dev->ifname);
 }

 /*
@@ -154,7 +154,7 @@
 static int
 vhost_user_set_features(struct virtio_net *dev, uint64_t features)
 {
-   if (features & ~VHOST_FEATURES)
+   if (features & ~rte_vhost_driver_get_features(dev->ifname))


rte_vhost_driver_get_features() returns -1 if the socket is not found.
It would result in accepting any feature trying to be set.

Maxime


Re: [dpdk-dev] [PATCH v9 04/18] lib: add new distributor code

2017-03-14 Thread Hunt, David



On 10/3/2017 4:03 PM, Bruce Richardson wrote:

On Mon, Mar 06, 2017 at 09:10:19AM +, David Hunt wrote:

This patch includes public header file which will be used once
we add in the symbol versioning for v20 and v1705 APIs.

Also includes v1702 header file, and code for new

Now 1705.


burst-capable distributor library. This will be re-named as
rte_distributor.h later in the patch-set

The new distributor code contains a very similar API to the legacy code,
but now sends bursts of up to 8 mbufs to each worker. Flow ID's are
reduced to 15 bits for an optimal flow matching algorithm.

Signed-off-by: David Hunt 
---
  lib/librte_distributor/Makefile  |   1 +
  lib/librte_distributor/rte_distributor.c | 628 +++
  lib/librte_distributor/rte_distributor_private.h |   7 +-
  lib/librte_distributor/rte_distributor_v1705.h   | 269 ++
  4 files changed, 904 insertions(+), 1 deletion(-)
  create mode 100644 lib/librte_distributor/rte_distributor.c
  create mode 100644 lib/librte_distributor/rte_distributor_v1705.h


Minor nit, I think this patch might be squashed into the previous one,
to have new structures and code together.

/Bruce


Done in the next version.
Dave.


Re: [dpdk-dev] [PATCH v9 12/18] examples/distributor: allow for extra stats

2017-03-14 Thread Hunt, David



On 10/3/2017 4:46 PM, Bruce Richardson wrote:

On Mon, Mar 06, 2017 at 09:10:27AM +, David Hunt wrote:

+   freq = rte_get_timer_hz();
+   t = rte_rdtsc() + freq;
+   while (!quit_signal_dist) {
+   if (t < rte_rdtsc()) {
+   print_stats();
+   t = rte_rdtsc() + freq;
+   }
+   }
+

You can probably put in a usleep or nanosleep inot the while loop above.
No need to burn an entire core by polling the tsc.

/Bruce



Done in the next version.
Dave.


Re: [dpdk-dev] [PATCH v9 14/18] examples/distributor: give distributor a core

2017-03-14 Thread Hunt, David



On 10/3/2017 4:49 PM, Bruce Richardson wrote:

On Mon, Mar 06, 2017 at 09:10:29AM +, David Hunt wrote:

Signed-off-by: David Hunt 
---

Title could do with some rewording - e.g. "make distributor API calls on
dedicated core"

This also requires an explanation as to why the change is being made.
Does it not also need an update to the sample app guide about how the
app works?

/Bruce


Yes, sure. And I'll add some more info into the dist_app.rst file.
Rgds,
Dave.


Re: [dpdk-dev] [PATCH v9 15/18] examples/distributor: limit number of Tx rings

2017-03-14 Thread Hunt, David


On 10/3/2017 4:50 PM, Bruce Richardson wrote:

On Mon, Mar 06, 2017 at 09:10:30AM +, David Hunt wrote:

Signed-off-by: David Hunt 
---

Please explain reason for change.

/Bruce


I've re-worked this change, as it's mostly do to with performance, 
resulting in about 10% gain.
The number of Tx rings has been removed, as it was part of an 
investigation for

high core count operation, and is not relevant.
Rgds,
Dave.


Re: [dpdk-dev] [PATCH 04/17] vhost: make notify ops per vhost driver

2017-03-14 Thread Maxime Coquelin



On 03/03/2017 10:51 AM, Yuanhan Liu wrote:

Assume there is an application both support vhost-user net and
vhost-user scsi, the callback should be different. Making notify
ops per vhost driver allow application define different set of
callbacks for different driver.

Signed-off-by: Yuanhan Liu 
---
 drivers/net/vhost/rte_eth_vhost.c | 20 +++-
 examples/tep_termination/main.c   |  3 ++-
 examples/vhost/main.c |  5 +++--
 lib/librte_vhost/rte_virtio_net.h |  3 ++-
 lib/librte_vhost/socket.c | 32 
 lib/librte_vhost/vhost.c  | 16 +---
 lib/librte_vhost/vhost.h  |  5 -
 lib/librte_vhost/vhost_user.c | 15 +--
 8 files changed, 64 insertions(+), 35 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index 0ebaa4a..816a9a0 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -667,6 +667,12 @@ struct vhost_xstats_name_off {
return 0;
 }

+static struct virtio_net_device_ops vhost_ops = {
+   .new_device  = new_device,
+   .destroy_device  = destroy_device,
+   .vring_state_changed = vring_state_changed,
+};
+
 int
 rte_eth_vhost_get_queue_event(uint8_t port_id,
struct rte_eth_vhost_queue_event *event)
@@ -736,15 +742,6 @@ struct vhost_xstats_name_off {
 static void *
 vhost_driver_session(void *param __rte_unused)
 {
-   static struct virtio_net_device_ops vhost_ops;
-
-   /* set vhost arguments */
-   vhost_ops.new_device = new_device;
-   vhost_ops.destroy_device = destroy_device;
-   vhost_ops.vring_state_changed = vring_state_changed;
-   if (rte_vhost_driver_callback_register(&vhost_ops) < 0)
-   RTE_LOG(ERR, PMD, "Can't register callbacks\n");
-
/* start event handling */
rte_vhost_driver_session_start();

@@ -1077,6 +1074,11 @@ struct vhost_xstats_name_off {
if (rte_vhost_driver_register(iface_name, flags))
goto error;

+   if (rte_vhost_driver_callback_register(iface_name, &vhost_ops) < 0) {
+   RTE_LOG(ERR, PMD, "Can't register callbacks\n");
+   goto error;
+   }
+
/* We need only one message handling thread */
if (rte_atomic16_add_return(&nb_started_ports, 1) == 1) {
if (vhost_driver_session_start())
diff --git a/examples/tep_termination/main.c b/examples/tep_termination/main.c
index 8c45128..03c0fbe 100644
--- a/examples/tep_termination/main.c
+++ b/examples/tep_termination/main.c
@@ -1258,7 +1258,8 @@ static inline void __attribute__((always_inline))
rte_vhost_driver_disable_features(dev_basename,
1ULL << VIRTIO_NET_F_MRG_RXBUF);

-   rte_vhost_driver_callback_register(&virtio_net_device_ops);
+   rte_vhost_driver_callback_register(dev_basename,
+   &virtio_net_device_ops);


Return should be checked here, as this function can now return -1.



rte_vhost_driver_session_start();

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 972a6a8..867efc6 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -1538,9 +1538,10 @@ static inline void __attribute__((always_inline))
rte_vhost_driver_enable_features(file,
1ULL << VIRTIO_NET_F_CTRL_RX);
}
-   }

-   rte_vhost_driver_callback_register(&virtio_net_device_ops);
+   rte_vhost_driver_callback_register(file,
+   &virtio_net_device_ops);
+   }


Ditto.



rte_vhost_driver_session_start();
return 0;
diff --git a/lib/librte_vhost/rte_virtio_net.h 
b/lib/librte_vhost/rte_virtio_net.h
index 51f5166..3bfd0b7 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -91,7 +91,8 @@ struct virtio_net_device_ops {
 int rte_vhost_driver_disable_features(const char *path, uint64_t features);

 /* Register callbacks. */
-int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * 
const);
+int rte_vhost_driver_callback_register(const char *path,
+   struct virtio_net_device_ops const * const);
 /* Start vhost driver session blocking loop. */
 int rte_vhost_driver_session_start(void);

diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index 9e0ec05..550af64 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -73,6 +73,8 @@ struct vhost_user_socket {
 */
uint64_t supported_features;
uint64_t features;
+
+   struct virtio_net_device_ops const *notify_ops;
 };

 struct vhost_user_connection {
@@ -718,6 +720,36 @@ struct vhost_user_reconnect_list {
return -1;
 }

+/*
+ * Register ops so that we can add/remove device to data core.
+ */
+int
+rte_vhost_driver_callback_register(const char *path,
+   struct virtio_net_device_ops const * const ops)
+{
+   struc

Re: [dpdk-dev] [PATCH 05/17] vhost: export guest memory regions

2017-03-14 Thread Maxime Coquelin



On 03/03/2017 10:51 AM, Yuanhan Liu wrote:

Some vhost-user driver may need this info to setup its own page tables
for GPA (guest physical addr) to HPA (host physical addr) translation.
SPDK (Storage Performance Development Kit) is one example.

Besides, by exporting this memory info, we could also export the
gpa_to_vva() as an inline function, which helps for performance.
Otherwise, it has to be referenced indirectly by a "vid".

Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/rte_vhost_version.map |  1 +
 lib/librte_vhost/rte_virtio_net.h  | 24 
 lib/librte_vhost/vhost.c   | 23 +++
 lib/librte_vhost/vhost.h   | 28 ++--
 lib/librte_vhost/vhost_user.c  | 12 ++--
 5 files changed, 56 insertions(+), 32 deletions(-)

diff --git a/lib/librte_vhost/rte_vhost_version.map 
b/lib/librte_vhost/rte_vhost_version.map
index ee72d5f..b890da6 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -35,5 +35,6 @@ DPDK_17.05 {
rte_vhost_driver_enable_features;
rte_vhost_driver_get_features;
rte_vhost_driver_set_features;
+   rte_vhost_get_vhost_memory;

 } DPDK_16.07;
diff --git a/lib/librte_vhost/rte_virtio_net.h 
b/lib/librte_vhost/rte_virtio_net.h
index 3bfd0b7..eddf0f4 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -59,6 +59,28 @@
 enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};

 /**
+ * Information relating to memory regions including offsets to
+ * addresses in QEMUs memory file.
+ */
+struct rte_vhost_mem_region {
+   uint64_t guest_phys_addr;
+   uint64_t guest_user_addr;
+   uint64_t host_user_addr;
+   uint64_t size;
+   void *mmap_addr;
+   uint64_t mmap_size;
+   int fd;
+};
+
+/**
+ * Memory structure includes region and mapping information.
+ */
+struct rte_vhost_memory {
+   uint32_t nregions;
+   struct rte_vhost_mem_region regions[0];
+};
+
+/**
  * Device and vring operations.
  */
 struct virtio_net_device_ops {
@@ -187,4 +209,6 @@ uint16_t rte_vhost_enqueue_burst(int vid, uint16_t queue_id,
 uint16_t rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count);

+int rte_vhost_get_vhost_memory(int vid, struct rte_vhost_memory **mem);
+
 #endif /* _VIRTIO_NET_H_ */
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 0088f87..eee229a 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -340,6 +340,29 @@ struct virtio_net *
return 0;
 }

+int
+rte_vhost_get_vhost_memory(int vid, struct rte_vhost_memory **mem)
+{
+   struct virtio_net *dev;
+   struct rte_vhost_memory *m;
+   size_t size;
+
+   dev = get_device(vid);
+   if (!dev)
+   return -1;
+
+   size = dev->mem->nregions * sizeof(struct rte_vhost_mem_region);
+   m = malloc(size);
+   if (!m)
+   return -1;
+
+   m->nregions = dev->mem->nregions;
+   memcpy(m->regions, dev->mem->regions, size);
+   *mem = m;
+
+   return 0;
+}
+


Might worth to be documented, as API.
Other than that:
Reviewed-by: Maxime Coquelin 

Thanks,
Maxime


Re: [dpdk-dev] [PATCH 06/17] vhost: introduce API to fetch negotiated features

2017-03-14 Thread Maxime Coquelin



On 03/03/2017 10:51 AM, Yuanhan Liu wrote:

Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/rte_vhost_version.map |  1 +
 lib/librte_vhost/rte_virtio_net.h  |  1 +
 lib/librte_vhost/vhost.c   | 12 
 3 files changed, 14 insertions(+)

diff --git a/lib/librte_vhost/rte_vhost_version.map 
b/lib/librte_vhost/rte_vhost_version.map
index b890da6..85a2796 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -35,6 +35,7 @@ DPDK_17.05 {
rte_vhost_driver_enable_features;
rte_vhost_driver_get_features;
rte_vhost_driver_set_features;
+   rte_vhost_get_negotiated_features
rte_vhost_get_vhost_memory;

 } DPDK_16.07;
diff --git a/lib/librte_vhost/rte_virtio_net.h 
b/lib/librte_vhost/rte_virtio_net.h
index eddf0f4..e7b1599 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -210,5 +210,6 @@ uint16_t rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count);

 int rte_vhost_get_vhost_memory(int vid, struct rte_vhost_memory **mem);
+uint64_t rte_vhost_get_negotiated_features(int vid);

 #endif /* _VIRTIO_NET_H_ */
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index eee229a..8046aef 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -340,6 +340,18 @@ struct virtio_net *
return 0;
 }

+uint64_t
+rte_vhost_get_negotiated_features(int vid)
+{
+   struct virtio_net *dev;
+
+   dev = get_device(vid);
+   if (!dev)
+   return -1;
+
+   return dev->features;
+}

Are we sure the negotiation is done when we can get the device?

Maxime


Re: [dpdk-dev] rte_malloc_heap usage scenario

2017-03-14 Thread Sergio Gonzalez Monroy

On 09/03/2017 04:48, Yerden Zhumabekov wrote:

Hello,

Can anybody explain why rte_malloc_heap.h is advertised as a public 
interface? It only contains malloc_heap struct which is used 
internally and violates DPDK naming conventions for public struct. I 
haven't found any use of it outside of EAL and it's undocumented. If 
it is in fact public, will it continue to be?


As you point out, the struct name does not follow dpdk conventions for 
public structs, that is how it was historically and was never changed.
IMHO the struct itself is not meant to be public (probably we should add 
some comments) but it needs to be as it is a field in public 'struct 
rte_mem_config' which is also a field in public 'struct rte_config'.


That being said, it doesn't look like rte_mem_config should be a public 
struct at all, although there are a couple of PMDs (virtio_user and ena) 
accessing mem_config info through rte_config.


I cannot guarantee that would remain public, but there should be at 
least a notice in case of any changes.


Sergio



I'd like to use it (and some code in librte_eal/common) for 
region-based allocation in a secondary process for easy memory 
reclamation in case of failure. Is it safe to reuse malloc_heap struct?







Re: [dpdk-dev] Reg DPDK with unsupported NIC

2017-03-14 Thread Ferruh Yigit
On 3/14/2017 9:28 AM, raman geetha gopalakrishnan wrote:
> Hi ,
> 
> Please find my query, Currently i am planning to develop DPDK APP (linux
> env). I do not have DPDK supported NIC. Till then i would still like to
> develop DPDK APP and want DPDK  to use OS interface to TX/RX packets from
> NIC. How can i make it? 

> I went through KNI and my understanding is you
> cannot use it - is this correct?

You can use it. KNI does not require a physical device at all.

But with the KNI support in main tree, you need to use KNI specific
APIs, - which KNI sample app shows.

With the KNI PMD in next-net tree, it is possible to use KNI as any
other device.

> 
> In what way i can still develop DPDK APP with non supported NIC till get
> the DPDK NIC.
> 
> Thanks
> Raman
> 



Re: [dpdk-dev] [PATCH 07/17] vhost: export vhost vring info

2017-03-14 Thread Maxime Coquelin



On 03/03/2017 10:51 AM, Yuanhan Liu wrote:

Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/rte_vhost_version.map |  1 +
 lib/librte_vhost/rte_virtio_net.h  | 13 +
 lib/librte_vhost/vhost.c   | 30 ++
 lib/librte_vhost/vhost.h   |  2 ++
 4 files changed, 46 insertions(+)

diff --git a/lib/librte_vhost/rte_vhost_version.map 
b/lib/librte_vhost/rte_vhost_version.map
index 85a2796..efde936 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -37,5 +37,6 @@ DPDK_17.05 {
rte_vhost_driver_set_features;
rte_vhost_get_negotiated_features
rte_vhost_get_vhost_memory;
+   rte_vhost_get_vhost_vring;

 } DPDK_16.07;
diff --git a/lib/librte_vhost/rte_virtio_net.h 
b/lib/librte_vhost/rte_virtio_net.h
index e7b1599..80209ad 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -80,6 +80,17 @@ struct rte_vhost_memory {
struct rte_vhost_mem_region regions[0];
 };

+struct rte_vhost_vring {
+   struct vring_desc   *desc;
+   struct vring_avail  *avail;
+   struct vring_used   *used;
+   uint64_tlog_guest_addr;
+
+   int callfd;
+   int kickfd;
+   uint16_tsize;
+};
+
 /**
  * Device and vring operations.
  */
@@ -211,5 +222,7 @@ uint16_t rte_vhost_dequeue_burst(int vid, uint16_t queue_id,

 int rte_vhost_get_vhost_memory(int vid, struct rte_vhost_memory **mem);
 uint64_t rte_vhost_get_negotiated_features(int vid);
+int rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
+ struct rte_vhost_vring *vring);

 #endif /* _VIRTIO_NET_H_ */
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 8046aef..b175f0e 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -375,6 +375,36 @@ struct virtio_net *
return 0;
 }

+int
+rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
+ struct rte_vhost_vring *vring)
+{
+   struct virtio_net *dev;
+   struct vhost_virtqueue *vq;
+
+   dev = get_device(vid);
+   if (!dev)
+   return -1;
+
+   if (vring_idx > VHOST_MAX_VRING)

Shouldn't be ">=" ?


+   return -1;
+
+   vq = dev->virtqueue[vring_idx];
+   if (!vq)
+   return -1;
+
+   vring->desc  = vq->desc;
+   vring->avail = vq->avail;
+   vring->used  = vq->used;
+   vring->log_guest_addr  = vq->log_guest_addr;
+
+   vring->callfd  = vq->callfd;
+   vring->kickfd  = vq->kickfd;
+   vring->size= vq->size;
+
+   return 0;
+}
+
 uint16_t
 rte_vhost_avail_entries(int vid, uint16_t queue_id)
 {
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index ed7b2c9..b0ef0cc 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -138,6 +138,8 @@ struct vhost_virtqueue {
 #ifndef VIRTIO_NET_F_MQ
  #define VIRTIO_NET_F_MQ   22
 #endif
+
+#define VHOST_MAX_VRING0x100

Looking at the code, I'm not clear where this limitation comes from.
It seems that it can be up to 0x1, no?

struct virtio_net {
...
struct vhost_virtqueue  *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];

with:
#ifdef VIRTIO_NET_F_MQ
 #define VHOST_MAX_QUEUE_PAIRS  VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX
 #define VHOST_SUPPORTS_MQ  (1ULL << VIRTIO_NET_F_MQ)
#else
 #define VHOST_MAX_QUEUE_PAIRS  1
 #define VHOST_SUPPORTS_MQ  0
#endif

and:
#define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX0x8000


Regards,
Maxime


Re: [dpdk-dev] [PATCH 08/17] vhost: export API to translate gpa to vva

2017-03-14 Thread Maxime Coquelin



On 03/03/2017 10:51 AM, Yuanhan Liu wrote:

Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/rte_vhost_version.map |  1 +
 lib/librte_vhost/rte_virtio_net.h  | 21 +
 lib/librte_vhost/vhost.h   | 19 ---
 lib/librte_vhost/virtio_net.c  | 23 +--
 4 files changed, 35 insertions(+), 29 deletions(-)



Same remark as previous patches, the API should be documented.
Other than that:
Reviewed-by: Maxime Coquelin 

Thanks,
Maxime


Re: [dpdk-dev] [PATCH 09/17] vhost: turn queue pair to vring

2017-03-14 Thread Maxime Coquelin



On 03/03/2017 10:51 AM, Yuanhan Liu wrote:

The queue pair is very virtio-net specific, other devices don't have
such concept. To make it generic, we should log the number of vrings
instead of the number of queue pairs.

This patch just does a simple convert, a later patch would export the
number of vrings to applications.

Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/vhost.c  | 74 ++-
 lib/librte_vhost/vhost.h  |  4 +--
 lib/librte_vhost/vhost_user.c | 28 +---
 lib/librte_vhost/virtio_net.c | 10 +++---
 4 files changed, 38 insertions(+), 78 deletions(-)



Nice. Making it generic simplifies the net specific code.
Reviewed-by: Maxime Coquelin 

Thanks,
Maxime


Re: [dpdk-dev] [PATCH 1/2] eal_interrupts: support external rxq interrupt handlers

2017-03-14 Thread Ferruh Yigit
On 3/12/2017 8:30 AM, Shahaf Shuler wrote:
> Thursday, March 9, 2017 8:10 PM, Thomas Monjalon:
>> 2017-02-08 15:57, Shahaf Shuler:
>>>  /**
>>> + * It deletes registered eventfds.
>>> + *
>>> + * @param intr_handle
>>> + *   Pointer to the interrupt handle.
>>> + */
>>> +void
>>> +rte_intr_free_epoll_fd(struct rte_intr_handle *intr_handle);
>>
>> This patch looks OK except a miss in the EAL map files for this new function.
>> The title should be "eal/linux: support external Rx interrupt"
>>
>> I think this patch should target next-net.
>> Please Ferruh, take care of this patchset. Thanks
> 
> Ferruh, any more comments before submitting a v2? 

Hi Shahaf,

No extra comments, only second patch does not cleanly apply to next-net,
please rebase it to before sending.

Thanks,
ferruh




Re: [dpdk-dev] [PATCH 10/17] vhost: export the number of vrings

2017-03-14 Thread Maxime Coquelin



On 03/03/2017 10:51 AM, Yuanhan Liu wrote:

We used to use rte_vhost_get_queue_num() for telling how many vrings.
However, the return value is the number of "queue pairs", which is
very virtio-net specific. To make it generic, we should return the
number of vrings instead, and let the driver do the proper translation.
Say, virtio-net driver could turn it to the number of queue pairs by
dividing 2.

Meanwhile, mark rte_vhost_get_queue_num as deprecated.

Signed-off-by: Yuanhan Liu 
---
 drivers/net/vhost/rte_eth_vhost.c  |  2 +-
 lib/librte_vhost/rte_vhost_version.map |  1 +
 lib/librte_vhost/rte_virtio_net.h  | 17 +
 lib/librte_vhost/vhost.c   | 11 +++
 4 files changed, 30 insertions(+), 1 deletion(-)


Reviewed-by: Maxime Coquelin 

Thanks,
Maxime


Re: [dpdk-dev] [PATCH 11/17] vhost: move the device ready check at proper place

2017-03-14 Thread Maxime Coquelin



On 03/03/2017 10:51 AM, Yuanhan Liu wrote:

Currently, we check vq->desc, vq->kickfd and vq->callfd to know whether
a virtio device is ready or not. However, we only do it when handling
SET_VRING_KICK message, which could be wrong if a vhost-user frontend
send SET_VRING_KICK first and SET_VRING_CALL later.

To work for all possible vhost-user frontend implementations, we could
move the ready check at the end of vhost-user message handler.

Meanwhile, since we do the check more often than before, the "virtio
not ready" message is dropped, to not flood the screen.

Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/vhost_user.c | 32 ++--
 1 file changed, 14 insertions(+), 18 deletions(-)


Makes sense:
Reviewed-by: Maxime Coquelin 

Thanks,
Maxime


Re: [dpdk-dev] [PATCH 12/17] vhost: drop the Rx and Tx queue macro

2017-03-14 Thread Maxime Coquelin



On 03/03/2017 10:51 AM, Yuanhan Liu wrote:

They are virti-net specific and should be defined inside the virtio-net

s/virti-net/virtio-net/


driver.

Signed-off-by: Yuanhan Liu 
---
 drivers/net/vhost/rte_eth_vhost.c | 2 ++
 examples/tep_termination/main.h   | 2 ++
 examples/vhost/main.h | 2 ++
 lib/librte_vhost/rte_virtio_net.h | 3 ---
 4 files changed, 6 insertions(+), 3 deletions(-)


With the commit msg fixed:
Reviewed-by: Maxime Coquelin 

Thanks,
Maxime


Re: [dpdk-dev] [PATCH 13/17] vhost: do not include net specific headers

2017-03-14 Thread Maxime Coquelin



On 03/03/2017 10:51 AM, Yuanhan Liu wrote:

Include it internally, at vhost.h.

Signed-off-by: Yuanhan Liu 
---
 examples/vhost/main.h | 2 ++
 lib/librte_vhost/rte_virtio_net.h | 4 
 lib/librte_vhost/vhost.h  | 4 
 3 files changed, 6 insertions(+), 4 deletions(-)


Reviewed-by: Maxime Coquelin 



Re: [dpdk-dev] [PATCH 14/17] vhost: rename device ops struct

2017-03-14 Thread Maxime Coquelin



On 03/03/2017 10:51 AM, Yuanhan Liu wrote:

rename "virtio_net_device_ops" to "vhost_device_ops", to not let it
be virtio-net specific.

Signed-off-by: Yuanhan Liu 
---
 drivers/net/vhost/rte_eth_vhost.c | 2 +-
 examples/tep_termination/main.c   | 4 ++--
 examples/vhost/main.c | 4 ++--
 lib/librte_vhost/Makefile | 2 +-
 lib/librte_vhost/rte_virtio_net.h | 4 ++--
 lib/librte_vhost/socket.c | 6 +++---
 lib/librte_vhost/vhost.h  | 4 ++--
 7 files changed, 13 insertions(+), 13 deletions(-)



Reviewed-by: Maxime Coquelin 

Thanks,
Maxime


Re: [dpdk-dev] [PATCH 15/17] vhost: rename virtio-net to vhost

2017-03-14 Thread Maxime Coquelin



On 03/03/2017 10:51 AM, Yuanhan Liu wrote:

Rename "virtio-net" to "vhost" in the API comments.

Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/rte_virtio_net.h | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)



Reviewed-by: Maxime Coquelin 
Thanks,
Maxime


[dpdk-dev] [PATCH 2/2] net/tap: add Rx trigger

2017-03-14 Thread Adrien Mazarguil
This commit adds a signal-based trigger to the Rx burst function in order
to avoid unnecessary system calls while Rx queues are empty.

Triggered Rx bursts put less pressure on the kernel, free up CPU resources
for applications and result in a noticeable performance improvement when
sharing CPU threads with other PMDs.

Measuring the traffic forwarding rate between two physical devices in
testpmd (IO mode, single thread, 64B packets) before and after adding two
tap PMD instances (4 ports total) that do not process any traffic and
comparing results yields:

Without Rx trigger:

 -15% (--burst=32)
 -62% (--burst=1)

With Rx trigger:

 -0.3% (--burst=32)
 -6% (--burst=1)

Signed-off-by: Adrien Mazarguil 
Acked-by: Pascal Mazon 
---
 drivers/net/tap/rte_eth_tap.c | 59 ++
 1 file changed, 59 insertions(+)

diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index c757a7c..d5d467a 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -31,6 +31,8 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -42,6 +44,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -72,6 +77,8 @@ static const char *valid_arguments[] = {
 
 static int tap_unit;
 
+static volatile uint32_t tap_trigger;  /* Rx trigger */
+
 static struct rte_eth_link pmd_link = {
.link_speed = ETH_SPEED_NUM_10G,
.link_duplex = ETH_LINK_FULL_DUPLEX,
@@ -89,6 +96,7 @@ struct pkt_stats {
 
 struct rx_queue {
struct rte_mempool *mp; /* Mempool for RX packets */
+   uint32_t trigger_seen;  /* Last seen Rx trigger value */
uint16_t in_port;   /* Port ID */
int fd;
 
@@ -111,6 +119,13 @@ struct pmd_internals {
struct tx_queue txq[RTE_PMD_TAP_MAX_QUEUES];/* List of TX queues */
 };
 
+static void
+tap_trigger_cb(int sig __rte_unused)
+{
+   /* Valid trigger values are nonzero */
+   tap_trigger = (tap_trigger + 1) | 0x8000;
+}
+
 /* Tun/Tap allocation routine
  *
  * name is the number of the interface to use, unless NULL to take the host
@@ -175,6 +190,43 @@ tun_alloc(struct pmd_internals *pmd, uint16_t qid)
goto error;
}
 
+   /* Set up trigger to optimize empty Rx bursts */
+   errno = 0;
+   do {
+   struct sigaction sa;
+   int flags = fcntl(fd, F_GETFL);
+
+   if (flags == -1 || sigaction(SIGIO, NULL, &sa) == -1)
+   break;
+   if (sa.sa_handler != tap_trigger_cb) {
+   /*
+* Make sure SIGIO is not already taken. This is done
+* as late as possible to leave the application a
+* chance to set up its own signal handler first.
+*/
+   if (sa.sa_handler != SIG_IGN &&
+   sa.sa_handler != SIG_DFL) {
+   errno = EBUSY;
+   break;
+   }
+   sa = (struct sigaction){
+   .sa_flags = SA_RESTART,
+   .sa_handler = tap_trigger_cb,
+   };
+   if (sigaction(SIGIO, &sa, NULL) == -1)
+   break;
+   }
+   /* Enable SIGIO on file descriptor */
+   fcntl(fd, F_SETFL, flags | O_ASYNC);
+   fcntl(fd, F_SETOWN, getpid());
+   } while (0);
+   if (errno) {
+   /* Disable trigger globally in case of error */
+   tap_trigger = 0;
+   RTE_LOG(WARNING, PMD, "Rx trigger disabled: %s\n",
+   strerror(errno));
+   }
+
if (qid == 0) {
if (ioctl(fd, SIOCGIFHWADDR, &ifr) == -1) {
RTE_LOG(ERR, PMD, "ioctl failed (SIOCGIFHWADDR) (%s)\n",
@@ -204,7 +256,13 @@ pmd_rx_burst(void *queue, struct rte_mbuf **bufs, uint16_t 
nb_pkts)
struct rx_queue *rxq = queue;
uint16_t num_rx;
unsigned long num_rx_bytes = 0;
+   uint32_t trigger = tap_trigger;
 
+   if (trigger == rxq->trigger_seen)
+   return 0;
+   if (trigger)
+   rxq->trigger_seen = trigger;
+   rte_compiler_barrier();
for (num_rx = 0; num_rx < nb_pkts; ) {
/* allocate the next mbuf */
mbuf = rte_pktmbuf_alloc(rxq->mp);
@@ -563,6 +621,7 @@ tap_rx_queue_setup(struct rte_eth_dev *dev,
}
 
internals->rxq[rx_queue_id].mp = mp;
+   internals->rxq[rx_queue_id].trigger_seen = 1; /* force initial burst */
internals->rxq[rx_queue_id].in_port = dev->data->port_id;
 
/* Now get the space available for data in the mbuf */
-- 
2.1.4



[dpdk-dev] [PATCH 1/2] net/tap: remove redundant syscall on Tx

2017-03-14 Thread Adrien Mazarguil
Polling the Tx queue file descriptor before writing to it is not mandatory
since it is configured as non-blocking.

Signed-off-by: Adrien Mazarguil 
Acked-by: Pascal Mazon 
---
 drivers/net/tap/rte_eth_tap.c | 26 --
 1 file changed, 8 insertions(+), 18 deletions(-)

diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index efc4426..c757a7c 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -43,7 +43,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -242,7 +241,6 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, uint16_t 
nb_pkts)
 {
struct rte_mbuf *mbuf;
struct tx_queue *txq = queue;
-   struct pollfd pfd;
uint16_t num_tx = 0;
unsigned long num_tx_bytes = 0;
int i, n;
@@ -250,26 +248,18 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, 
uint16_t nb_pkts)
if (unlikely(nb_pkts == 0))
return 0;
 
-   pfd.events = POLLOUT;
-   pfd.fd = txq->fd;
for (i = 0; i < nb_pkts; i++) {
-   n = poll(&pfd, 1, 0);
-
+   /* copy the tx frame data */
+   mbuf = bufs[num_tx];
+   n = write(txq->fd,
+ rte_pktmbuf_mtod(mbuf, void *),
+ rte_pktmbuf_pkt_len(mbuf));
if (n <= 0)
break;
 
-   if (pfd.revents & POLLOUT) {
-   /* copy the tx frame data */
-   mbuf = bufs[num_tx];
-   n = write(pfd.fd, rte_pktmbuf_mtod(mbuf, void*),
- rte_pktmbuf_pkt_len(mbuf));
-   if (n <= 0)
-   break;
-
-   num_tx++;
-   num_tx_bytes += mbuf->pkt_len;
-   rte_pktmbuf_free(mbuf);
-   }
+   num_tx++;
+   num_tx_bytes += mbuf->pkt_len;
+   rte_pktmbuf_free(mbuf);
}
 
txq->stats.opackets += num_tx;
-- 
2.1.4



Re: [dpdk-dev] [PATCH 16/17] vhost: rename header file

2017-03-14 Thread Maxime Coquelin



On 03/03/2017 10:51 AM, Yuanhan Liu wrote:

Rename "rte_virtio_net.h" to "rte_vhost.h", to not let it be virtio
net specific.

Signed-off-by: Yuanhan Liu 
---
 doc/guides/rel_notes/deprecation.rst   |   9 --
 drivers/net/vhost/rte_eth_vhost.c  |   2 +-
 drivers/net/vhost/rte_eth_vhost.h  |   2 +-
 examples/tep_termination/main.c|   2 +-
 examples/tep_termination/vxlan_setup.c |   2 +-
 examples/vhost/main.c  |   2 +-
 lib/librte_vhost/Makefile  |   2 +-
 lib/librte_vhost/rte_vhost.h   | 259 +
 lib/librte_vhost/rte_virtio_net.h  | 259 -

Did you perform the rename using git mv?


 lib/librte_vhost/vhost.c   |   2 +-
 lib/librte_vhost/vhost.h   |   2 +-
 lib/librte_vhost/vhost_user.h  |   2 +-
 lib/librte_vhost/virtio_net.c  |   2 +-
 13 files changed, 269 insertions(+), 278 deletions(-)
 create mode 100644 lib/librte_vhost/rte_vhost.h
 delete mode 100644 lib/librte_vhost/rte_virtio_net.h


Other than that, the change looks good to me:
Reviewed-by: Maxime Coquelin 

Thanks,
Maxime


[dpdk-dev] [PATCH v2 0/2] net/mlx5 add rxq interrupt support

2017-03-14 Thread Shahaf Shuler
This patchset adds support for rxq interrupts on MLX5 PMD.
The first patch introduces changes on eal_interrupt to support external 
interrupt handlers.
The second patch implements the support on mlx5 PMD and demonstrate the use of 
the changes made in the first patch

on v2:
 * add new function to EAL map file.
 * change commit title.
 * add documentation to guide and release notes.

[PATCH v2 1/2] eal/linux: support external Rx interrupt
[PATCH v2 2/2] net/mlx5: support user space rxq interrupt event


[dpdk-dev] [PATCH v2 1/2] eal/linux: support external Rx interrupt

2017-03-14 Thread Shahaf Shuler
Prior to this patch only UIO/VFIO interrupt handlers types were supported.
This patch adds support for the external interrupt handler type, allowing
external drivers to set their own fds with specific interrupt handlers.

Signed-off-by: Shahaf Shuler 
---
 lib/librte_eal/linuxapp/eal/eal_interrupts.c   | 33 ++
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |  9 ++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map|  7 +
 3 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c 
b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 92a19cb..c19eb8c 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -939,6 +939,8 @@ static __attribute__((noreturn)) void *
bytes_read = sizeof(buf.vfio_intr_count);
break;
 #endif
+   case RTE_INTR_HANDLE_EXT:
+   return;
default:
bytes_read = 1;
RTE_LOG(INFO, EAL, "unexpected intr type\n");
@@ -1167,6 +1169,24 @@ static __attribute__((noreturn)) void *
return rc;
 }
 
+void
+rte_intr_free_epoll_fd(struct rte_intr_handle *intr_handle)
+{
+   uint32_t i;
+   struct rte_epoll_event *rev;
+
+   for (i = 0; i < intr_handle->nb_efd; i++) {
+   rev = &intr_handle->elist[i];
+   if (rev->status == RTE_EPOLL_INVALID)
+   continue;
+   if (rte_epoll_ctl(rev->epfd, EPOLL_CTL_DEL, rev->fd, rev)) {
+   /* force free if the entry valid */
+   eal_epoll_data_safe_free(rev);
+   rev->status = RTE_EPOLL_INVALID;
+   }
+   }
+}
+
 int
 rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd)
 {
@@ -1202,19 +1222,8 @@ static __attribute__((noreturn)) void *
 rte_intr_efd_disable(struct rte_intr_handle *intr_handle)
 {
uint32_t i;
-   struct rte_epoll_event *rev;
-
-   for (i = 0; i < intr_handle->nb_efd; i++) {
-   rev = &intr_handle->elist[i];
-   if (rev->status == RTE_EPOLL_INVALID)
-   continue;
-   if (rte_epoll_ctl(rev->epfd, EPOLL_CTL_DEL, rev->fd, rev)) {
-   /* force free if the entry valid */
-   eal_epoll_data_safe_free(rev);
-   rev->status = RTE_EPOLL_INVALID;
-   }
-   }
 
+   rte_intr_free_epoll_fd(intr_handle);
if (intr_handle->max_intr > intr_handle->nb_efd) {
for (i = 0; i < intr_handle->nb_efd; i++)
close(intr_handle->efds[i]);
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h 
b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
index d459bf4..735d307 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
@@ -171,6 +171,15 @@ struct rte_intr_handle {
int epfd, int op, unsigned int vec, void *data);
 
 /**
+ * It deletes registered eventfds.
+ *
+ * @param intr_handle
+ *   Pointer to the interrupt handle.
+ */
+void
+rte_intr_free_epoll_fd(struct rte_intr_handle *intr_handle);
+
+/**
  * It enables the packet I/O interrupt event if it's necessary.
  * It creates event fd for each interrupt vector when MSIX is used,
  * otherwise it multiplexes a single event fd.
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map 
b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 9c134b4..905d8de 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -186,3 +186,10 @@ DPDK_17.02 {
rte_bus_unregister;
 
 } DPDK_16.11;
+
+DPDK_17.05 {
+   global:
+
+   rte_intr_free_epoll_fd;
+
+} DPDK_17.02;
-- 
1.8.3.1



[dpdk-dev] [PATCH v2 2/2] net/mlx5: support user space rxq interrupt event

2017-03-14 Thread Shahaf Shuler
Implement rxq interrupt callbacks

Signed-off-by: Shahaf Shuler 
---
 doc/guides/nics/features/mlx5.ini  |   1 +
 doc/guides/rel_notes/release_17_05.rst |   6 ++
 drivers/net/mlx5/Makefile  |   5 ++
 drivers/net/mlx5/mlx5.c|   2 +
 drivers/net/mlx5/mlx5_rxq.c| 126 -
 drivers/net/mlx5/mlx5_rxtx.c   |  73 +++
 drivers/net/mlx5/mlx5_rxtx.h   |   7 ++
 drivers/net/mlx5/mlx5_trigger.c|   9 +++
 8 files changed, 228 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/features/mlx5.ini 
b/doc/guides/nics/features/mlx5.ini
index 1814f82..532c0ef 100644
--- a/doc/guides/nics/features/mlx5.ini
+++ b/doc/guides/nics/features/mlx5.ini
@@ -7,6 +7,7 @@
 Speed capabilities   = Y
 Link status  = Y
 Link status event= Y
+Rx interrupt = Y
 Queue start/stop = Y
 MTU update   = Y
 Jumbo frame  = Y
diff --git a/doc/guides/rel_notes/release_17_05.rst 
b/doc/guides/rel_notes/release_17_05.rst
index 49e9640..f9f79c2 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -62,6 +62,11 @@ New Features
   Added support for Hardware TSO for tunneled and non-tunneled packets.
   Tunneling protocols supported are GRE and VXLAN.
 
+* **Added support for Rx interrupts on mlx5 driver.**
+
+  Rx queues can be armed with an interrupt which will trigger on the
+  next packet arrival.
+
 * **Added vmxnet3 version 3 support.**
 
   Added support for vmxnet3 version 3 which includes several
@@ -73,6 +78,7 @@ New Features
   * Generic flow API support for Ethernet, VLAN, IPv4, IPv6, UDP and TCP
 pattern items with QUEUE action for ingress traffic.
 
+
 Resolved Issues
 ---
 
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 671089c..084044a 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -137,6 +137,11 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
/usr/include/linux/ethtool.h \
enum ETHTOOL_LINK_MODE_10baseKR4_Full_BIT \
$(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_UPDATE_CQ_CI \
+   infiniband/mlx5_hw.h \
+   func ibv_mlx5_exp_update_cq_ci \
+   $(AUTOCONF_OUTPUT)
 
 # Create mlx5_autoconf.h or update it in case it differs from the new one.
 
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 6f42948..ebc7984 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -225,6 +225,8 @@
.rss_hash_update = mlx5_rss_hash_update,
.rss_hash_conf_get = mlx5_rss_hash_conf_get,
.filter_ctrl = mlx5_dev_filter_ctrl,
+   .rx_queue_intr_enable = mlx5_rx_intr_enable,
+   .rx_queue_intr_disable = mlx5_rx_intr_disable,
 };
 
 static struct {
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 28e93d3..e6070a0 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* Verbs header. */
 /* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
@@ -57,6 +58,7 @@
 #include 
 #include 
 #include 
+#include 
 #ifdef PEDANTIC
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
@@ -773,6 +775,8 @@
claim_zero(ibv_exp_destroy_wq(rxq_ctrl->wq));
if (rxq_ctrl->cq != NULL)
claim_zero(ibv_destroy_cq(rxq_ctrl->cq));
+   if (rxq_ctrl->channel != NULL)
+   claim_zero(ibv_destroy_comp_channel(rxq_ctrl->channel));
if (rxq_ctrl->rd != NULL) {
struct ibv_exp_destroy_res_domain_attr attr = {
.comp_mask = 0,
@@ -1014,6 +1018,16 @@
  (void *)dev, strerror(ret));
goto error;
}
+   if (dev->data->dev_conf.intr_conf.rxq) {
+   tmpl.channel = ibv_create_comp_channel(priv->ctx);
+   if (tmpl.channel == NULL) {
+   dev->data->dev_conf.intr_conf.rxq = 0;
+   ret = ENOMEM;
+   ERROR("%p: Comp Channel creation failure: %s",
+   (void *)dev, strerror(ret));
+   goto error;
+   }
+   }
attr.cq = (struct ibv_exp_cq_init_attr){
.comp_mask = IBV_EXP_CQ_INIT_ATTR_RES_DOMAIN,
.res_domain = tmpl.rd,
@@ -1023,7 +1037,7 @@
attr.cq.flags |= IBV_EXP_CQ_COMPRESSED_CQE;
cqe_n = (desc * 2) - 1; /* Double the number of CQEs. */
}
-   tmpl.cq = ibv_exp_create_cq(priv->ctx, cqe_n, NULL, NULL, 0,
+   tmpl.cq = ibv_exp_create_cq(priv->ctx, cqe_n, NULL, tmpl.channel, 0,
&attr.cq);
if (tmpl.cq == NULL) {
ret = ENOMEM;
@@ -1347,3 +1361,113 @@
rxq = (*priv->rxqs)[index];
return priv->dev->rx_pkt_

Re: [dpdk-dev] [PATCH 1/2] net/i40e: enable VF untag drop

2017-03-14 Thread Ferruh Yigit
On 3/9/2017 3:24 AM, Zhang, Qi Z wrote:
> Hi Ferruh:
> 
>> -Original Message-
>> From: Yigit, Ferruh
>> Sent: Tuesday, March 7, 2017 6:51 PM
>> To: Zhang, Qi Z ; Wu, Jingjing ;
>> Zhang, Helin 
>> Cc: dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH 1/2] net/i40e: enable VF untag drop
>>
>> On 3/3/2017 1:59 AM, Qi Zhang wrote:
>>> Add a new private API to support the untag drop enable/disable for
>>> specific VF.
>>>
>>> Signed-off-by: Qi Zhang 
>>> ---
>>>  drivers/net/i40e/i40e_ethdev.c  | 49
>>> +
>>>  drivers/net/i40e/rte_pmd_i40e.h | 18 +++
>>
>> Shared library is giving build error because of API is missing in 
>> *version.map file
>>
>>>  2 files changed, 67 insertions(+)
>>>
>>
>> <...>
>>
>>> diff --git a/drivers/net/i40e/rte_pmd_i40e.h
>>> b/drivers/net/i40e/rte_pmd_i40e.h index a0ad88c..895e2cc 100644
>>> --- a/drivers/net/i40e/rte_pmd_i40e.h
>>> +++ b/drivers/net/i40e/rte_pmd_i40e.h
>>> @@ -332,4 +332,22 @@ int rte_pmd_i40e_get_vf_stats(uint8_t port,  int
>>> rte_pmd_i40e_reset_vf_stats(uint8_t port,
>>> uint16_t vf_id);
>>>
>>> +/**
>>> + * Enable/Disable VF untag drop
>>> + *
>>> + * @param port
>>> + *The port identifier of the Ethernet device.
>>> + * @param vf_id
>>> + *VF on witch to enable/disable
>>> + * @param on
>>> + *Enable or Disable
>>> + * @retura
>>
>> @return
>>
>>> + *  - (0) if successful.
>>> + *  -(-ENODEVE) if *port* invalid
>>> + *  -(-EINVAL) if bad parameter.
>>> + */
>>> +int rte_pmd_i40e_set_vf_vlan_untag_drop(uint8_t port,
>>> +   uint16_t vf_id,
>>> +   uint8_t on);
>>
>> As discussed previously, I believe it is good to keep following syntax in 
>> API:
>> __, for this API it becomes:
> I think, current naming rule is __ right? 

Overall, I am not aware of any defined naming rule, I am for defining one.

> See below
>   rte_pmd_i40e_set_vf_vlan_anti_spoof;
> rte_pmd_i40e_set_vf_vlan_filter;
> rte_pmd_i40e_set_vf_vlan_insert;
> rte_pmd_i40e_set_vf_vlan_stripq;
> rte_pmd_i40e_set_vf_vlan_tag;
> so what's wrong with this?

This breaks hierarchical approach, if you think API name as tree. Easier
to see this when you sort the APIs, ns_set_x, ns_reset_x, ns_del_x will
spread to different locations.

This looks OK when you work on one type of object already, but with all
APIs in concern, I believe object based grouping is better than action
based grouping.

And why do you think above one is better? Again, as long as one is
agreed on, I am OK.

>>
>> rte_pmd_i40e_vf_vlan_untag_drop_set(), and perhaps "set" can be removed?
>>
>>> +
>>>  #endif /* _PMD_I40E_H_ */
>>>
> 



Re: [dpdk-dev] [PATCH 2/2] app/testpmd: enable VF untag drop in testpmd

2017-03-14 Thread Ferruh Yigit
On 3/9/2017 2:59 AM, Zhang, Qi Z wrote:
> 
> 
>> -Original Message-
>> From: Yigit, Ferruh
>> Sent: Tuesday, March 7, 2017 7:14 PM
>> To: Zhang, Qi Z ; Wu, Jingjing ;
>> Zhang, Helin 
>> Cc: dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH 2/2] app/testpmd: enable VF untag drop in
>> testpmd
>>
>> On 3/3/2017 1:59 AM, Qi Zhang wrote:
>>> Add command line to support untag drop to specific VF in testpmd.
>>>
>>> Signed-off-by: Qi Zhang 
>>> ---
>>>  app/test-pmd/cmdline.c | 104
>>> +
>>>  1 file changed, 104 insertions(+)
>>>
>>> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
>>> 43fc636..4ddc2c9 100644
>>> --- a/app/test-pmd/cmdline.c
>>> +++ b/app/test-pmd/cmdline.c
>>> @@ -311,6 +311,10 @@ static void cmd_help_long_parsed(void
>>> *parsed_result,
>>>
>>> "set vf vlan antispoof (port_id) (vf_id) (on|off)\n"
>>> "Set VLAN antispoof for a VF from the PF.\n\n"
>>> +#ifdef RTE_LIBRTE_I40E_PMD
>>> +   "set vf vlan untagdrop (port_id) (vf_id) (on|off)\n"
>>> +   "Set VLAN untag drop for a VF from the PF.\n\n"
>>> +#endif
>>
>> We should decide how to implement PMD specific APIs in testpmd, and be
>> consistent about it.
>>
>> Currently there are two approaches:
>>
>> 1- Wrap PMD specific feature and API with and PMD #ifdef, as done here.
>>
>> 2- Enable feature by default, return -ENOTSUP for port_id that does not 
>> support
>> it. Ex: cmd_vf_rxvlan_filter.
>>
>> I am for second option. And explicitly not disabling I40E driver does not 
>> mean
>> you should have those NICs in your runtime environment, so the effect will be
>> same as always enabling it.
>>
> Yes, I notice this problem, during implementation, I saw both patterns exist, 
> so I have to choose one of them
> We'd better align this.
> Both option ok for me, but a little bit prefer option 1 , since it's not 
> necessary to explore a command if no backend device, that make the hint more 
> clean.

I agree it's not necessary to explore a command if no backend device,
but first option does not guaranties it. What it checks is compile time
option, not runtime device detection.
I40E driver is enabled by default, and unless user explicitly disables
it, testpmd command will be there independent from device is there or not.

>>
>> And since number of PMD specific APIs are increasing, perhaps we should find 
>> a
>> better approach for testpmd to prevent them corrupting testpmd.
> Will think about this, also like to know if you or anyone have any good 
> suggestion.
>>
>> Also it may worth to discuss why number of PMD specific APIs are increasing.
>>
>>>
>>> "set vf vlan tag (port_id) (vf_id) (on|off)\n"
>>> "Set VLAN tag for a VF from the PF.\n\n"
>>> @@ -10995,6 +10999,103 @@ cmdline_parse_inst_t
>> cmd_set_vf_vlan_anti_spoof = {
>>> },
>>>  };
>>>
>> <...>



Re: [dpdk-dev] [PATCH 1/2] net/tap: remove redundant syscall on Tx

2017-03-14 Thread Wiles, Keith

> On Mar 14, 2017, at 8:51 PM, Adrien Mazarguil  
> wrote:
> 
> Polling the Tx queue file descriptor before writing to it is not mandatory
> since it is configured as non-blocking.
> 
> Signed-off-by: Adrien Mazarguil 
> Acked-by: Pascal Mazon 

Acked-by: Keith Wiles 

> ---
> drivers/net/tap/rte_eth_tap.c | 26 --
> 1 file changed, 8 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
> index efc4426..c757a7c 100644
> --- a/drivers/net/tap/rte_eth_tap.c
> +++ b/drivers/net/tap/rte_eth_tap.c
> @@ -43,7 +43,6 @@
> #include 
> #include 
> #include 
> -#include 
> #include 
> #include 
> #include 
> @@ -242,7 +241,6 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, 
> uint16_t nb_pkts)
> {
>   struct rte_mbuf *mbuf;
>   struct tx_queue *txq = queue;
> - struct pollfd pfd;
>   uint16_t num_tx = 0;
>   unsigned long num_tx_bytes = 0;
>   int i, n;
> @@ -250,26 +248,18 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, 
> uint16_t nb_pkts)
>   if (unlikely(nb_pkts == 0))
>   return 0;
> 
> - pfd.events = POLLOUT;
> - pfd.fd = txq->fd;
>   for (i = 0; i < nb_pkts; i++) {
> - n = poll(&pfd, 1, 0);
> -
> + /* copy the tx frame data */
> + mbuf = bufs[num_tx];
> + n = write(txq->fd,
> +   rte_pktmbuf_mtod(mbuf, void *),
> +   rte_pktmbuf_pkt_len(mbuf));
>   if (n <= 0)
>   break;
> 
> - if (pfd.revents & POLLOUT) {
> - /* copy the tx frame data */
> - mbuf = bufs[num_tx];
> - n = write(pfd.fd, rte_pktmbuf_mtod(mbuf, void*),
> -   rte_pktmbuf_pkt_len(mbuf));
> - if (n <= 0)
> - break;
> -
> - num_tx++;
> - num_tx_bytes += mbuf->pkt_len;
> - rte_pktmbuf_free(mbuf);
> - }
> + num_tx++;
> + num_tx_bytes += mbuf->pkt_len;
> + rte_pktmbuf_free(mbuf);
>   }
> 
>   txq->stats.opackets += num_tx;
> -- 
> 2.1.4
> 

Regards,
Keith



Re: [dpdk-dev] [PATCH 2/2] net/tap: add Rx trigger

2017-03-14 Thread Wiles, Keith

> On Mar 14, 2017, at 8:51 PM, Adrien Mazarguil  
> wrote:
> 
> This commit adds a signal-based trigger to the Rx burst function in order
> to avoid unnecessary system calls while Rx queues are empty.
> 
> Triggered Rx bursts put less pressure on the kernel, free up CPU resources
> for applications and result in a noticeable performance improvement when
> sharing CPU threads with other PMDs.
> 
> Measuring the traffic forwarding rate between two physical devices in
> testpmd (IO mode, single thread, 64B packets) before and after adding two
> tap PMD instances (4 ports total) that do not process any traffic and
> comparing results yields:
> 
> Without Rx trigger:
> 
> -15% (--burst=32)
> -62% (--burst=1)
> 
> With Rx trigger:
> 
> -0.3% (--burst=32)
> -6% (--burst=1)
> 
> Signed-off-by: Adrien Mazarguil 
> Acked-by: Pascal Mazon 

Acked-by: Keith Wiles 

> ---
> drivers/net/tap/rte_eth_tap.c | 59 ++
> 1 file changed, 59 insertions(+)
> 
> diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
> index c757a7c..d5d467a 100644
> --- a/drivers/net/tap/rte_eth_tap.c
> +++ b/drivers/net/tap/rte_eth_tap.c
> @@ -31,6 +31,8 @@
>  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>  */
> 
> +#include 
> +#include 
> #include 
> #include 
> #include 
> @@ -42,6 +44,9 @@
> #include 
> #include 
> #include 
> +#include 
> +#include 
> +#include 
> #include 
> #include 
> #include 
> @@ -72,6 +77,8 @@ static const char *valid_arguments[] = {
> 
> static int tap_unit;
> 
> +static volatile uint32_t tap_trigger;/* Rx trigger */
> +
> static struct rte_eth_link pmd_link = {
>   .link_speed = ETH_SPEED_NUM_10G,
>   .link_duplex = ETH_LINK_FULL_DUPLEX,
> @@ -89,6 +96,7 @@ struct pkt_stats {
> 
> struct rx_queue {
>   struct rte_mempool *mp; /* Mempool for RX packets */
> + uint32_t trigger_seen;  /* Last seen Rx trigger value */
>   uint16_t in_port;   /* Port ID */
>   int fd;
> 
> @@ -111,6 +119,13 @@ struct pmd_internals {
>   struct tx_queue txq[RTE_PMD_TAP_MAX_QUEUES];/* List of TX queues */
> };
> 
> +static void
> +tap_trigger_cb(int sig __rte_unused)
> +{
> + /* Valid trigger values are nonzero */
> + tap_trigger = (tap_trigger + 1) | 0x8000;
> +}
> +
> /* Tun/Tap allocation routine
>  *
>  * name is the number of the interface to use, unless NULL to take the host
> @@ -175,6 +190,43 @@ tun_alloc(struct pmd_internals *pmd, uint16_t qid)
>   goto error;
>   }
> 
> + /* Set up trigger to optimize empty Rx bursts */
> + errno = 0;
> + do {
> + struct sigaction sa;
> + int flags = fcntl(fd, F_GETFL);
> +
> + if (flags == -1 || sigaction(SIGIO, NULL, &sa) == -1)
> + break;
> + if (sa.sa_handler != tap_trigger_cb) {
> + /*
> +  * Make sure SIGIO is not already taken. This is done
> +  * as late as possible to leave the application a
> +  * chance to set up its own signal handler first.
> +  */
> + if (sa.sa_handler != SIG_IGN &&
> + sa.sa_handler != SIG_DFL) {
> + errno = EBUSY;
> + break;
> + }
> + sa = (struct sigaction){
> + .sa_flags = SA_RESTART,
> + .sa_handler = tap_trigger_cb,
> + };
> + if (sigaction(SIGIO, &sa, NULL) == -1)
> + break;
> + }
> + /* Enable SIGIO on file descriptor */
> + fcntl(fd, F_SETFL, flags | O_ASYNC);
> + fcntl(fd, F_SETOWN, getpid());
> + } while (0);
> + if (errno) {
> + /* Disable trigger globally in case of error */
> + tap_trigger = 0;
> + RTE_LOG(WARNING, PMD, "Rx trigger disabled: %s\n",
> + strerror(errno));
> + }
> +
>   if (qid == 0) {
>   if (ioctl(fd, SIOCGIFHWADDR, &ifr) == -1) {
>   RTE_LOG(ERR, PMD, "ioctl failed (SIOCGIFHWADDR) (%s)\n",
> @@ -204,7 +256,13 @@ pmd_rx_burst(void *queue, struct rte_mbuf **bufs, 
> uint16_t nb_pkts)
>   struct rx_queue *rxq = queue;
>   uint16_t num_rx;
>   unsigned long num_rx_bytes = 0;
> + uint32_t trigger = tap_trigger;
> 
> + if (trigger == rxq->trigger_seen)
> + return 0;
> + if (trigger)
> + rxq->trigger_seen = trigger;
> + rte_compiler_barrier();
>   for (num_rx = 0; num_rx < nb_pkts; ) {
>   /* allocate the next mbuf */
>   mbuf = rte_pktmbuf_alloc(rxq->mp);
> @@ -563,6 +621,7 @@ tap_rx_queue_setup(struct rte_eth_dev *dev,
>   }
> 
>   internals->rxq[rx_queue_id].mp = mp;
> + internals->rxq[r

Re: [dpdk-dev] i40e SR-IOV with Linux PF

2017-03-14 Thread Chen, Jing D
> 
> +Cc dev@dpdk.org
> 
> 2017-03-13 15:29, Thomas Monjalon:
> > Hi i40e developers,
> >
> > Referring to the VFD discussion, I thought basic behaviours were the
> > same regardless of the PF driver:
> > http://dpdk.org/ml/archives/dev/2016-December/053056.html
> > "
> > > In the meanwhile, we have some test models ongoing to validate
> > > combination of Linux and DPDK drivers for VF and PF.
> > > We'll fully support below 4 cases going forward.
> > > 1. DPDK PF + DPDK VF
> > > 2. DPDK PF + Linux VF
> > > 3. Linux PF + DPDK VF
> > > 4. Linux PF + Linux VF (it's not our scope)
> > [...]
> > > Linux PF + DPDK VF has been tested with 1.0 API long time ago.
> > > There is some test activities ongoing.
> > "
> >
> > I think the Linux PF case is important and deserves more consideration.
> > When looking at the code, specifically i40evf_vlan_offload_set() and
> > i40evf_vlan_pvid_set(), I read this:
> > "
> > /* Linux pf host doesn't support vlan offload yet */
> > if (vf->version_major == I40E_DPDK_VERSION_MAJOR) { "
> >
> > Is there some work in progress on Linux side to get the same behaviour
> > as with a DPDK PF?
> >
> > At least, it must be documented in
> > doc/guides/nics/features/i40e_vf.ini
> > and marked as partially supported (P instead of Y) in
> > doc/guides/nics/i40e.rst

Thanks for pointing it out. We'll sync our code with latest kernel driver and
document and comment properly soon.


Re: [dpdk-dev] [PATCH v4] net/tap: add remote netdevice traffic capture

2017-03-14 Thread Wiles, Keith

> On Mar 14, 2017, at 4:34 PM, Pascal Mazon  wrote:
> 
> By default, a tap netdevice is of no use when not fed by a separate
> process. The ability to automatically feed it from another netdevice
> allows applications to capture any kind of traffic normally destined to
> the kernel stack.
> 
> This patch implements this ability through a new optional "remote"
> parameter.
> 
> Packets matching filtering rules created with the flow API are matched
> on the remote device and redirected to the tap PMD, where the relevant
> action will be performed.
> 
> Signed-off-by: Pascal Mazon 
> Acked-by: Olga Shern 

Acked-by: Keith Wiles 

> ---
> doc/guides/nics/tap.rst   |  17 ++
> drivers/net/tap/rte_eth_tap.c | 101 +-
> drivers/net/tap/rte_eth_tap.h |   4 +
> drivers/net/tap/tap_flow.c| 451 --
> drivers/net/tap/tap_flow.h|  24 +++
> 5 files changed, 575 insertions(+), 22 deletions(-)

Regards,
Keith



Re: [dpdk-dev] [PATCH v4 3/4] net/tap: add netlink back-end for flow API

2017-03-14 Thread Wiles, Keith

> On Mar 14, 2017, at 4:29 PM, Pascal Mazon  wrote:
> 
> Each kernel netdevice may have queueing disciplines set for it, which
> determine how to handle the packet (mostly on egress). That's part of
> the TC (Traffic Control) mechanism.
> 
> Through TC, it is possible to set filter rules that match specific
> packets, and act according to what is in the rule. This is a perfect
> candidate to implement the flow API for the tap PMD, as it has an
> associated kernel netdevice automatically.
> 
> Each flow API rule will be translated into its TC counterpart.
> 
> To leverage TC, it is necessary to communicate with the kernel using
> netlink. This patch introduces a library to help that communication.
> 
> Inside netlink.c, functions are generic for any netlink messaging.
> Inside tcmsgs.c, functions are specific to deal with TC rules.
> 
> Signed-off-by: Pascal Mazon 
> Acked-by: Olga Shern 

Acked-by: Keith Wiles 


Regards,
Keith



Re: [dpdk-dev] [PATCH v4 1/4] net/tap: move private elements to external header

2017-03-14 Thread Wiles, Keith

> On Mar 14, 2017, at 4:29 PM, Pascal Mazon  wrote:
> 
> In the next patch, access to struct pmd_internals will be necessary in
> tap_flow.c to store the flows.
> 
> Signed-off-by: Pascal Mazon 
> Acked-by: Olga Shern 

Acked-by: Keith Wiles 

Regards,
Keith



Re: [dpdk-dev] Compiling DPDK 17.02 with a Busybox-based tar

2017-03-14 Thread Raphael Cohn
Why it is surprising that I need to compile DPDK on a system Busybox?

I currently build DPDK both for my local Alpine Linux system (busybox)
using the system compiler, use busybox for my cross-tools toolchain, and
also use it in Libertine Linux. It means a system using DPDK can be much
more minimal. Likewise, Aboriginal Linux uses Busybox (and, shortly,
Toybox), for similar reasons. It also means users of my rust crate can work
in a wider range of systems than RHEL / Ubuntu.

On 14 March 2017 at 09:39, Thomas Monjalon 
wrote:

> 2017-03-14 07:58, Raphael Cohn:
> > Hi,
> >
> > To compile DPDK on a system with Busybox tar installed, it's necessary to
> > make a small change to the build system:-
> >
> > sed -i -e '/--keep-newer-files/d' mk/rte.sdkinstall.mk
> > sed -i -e 's;--strip-components=1 \\;--strip-components=1;g' mk/
> > rte.sdkinstall.mk
> >
> > I'm not sure whether the impact of this change fundamentally affects
> DPDK.
> > I'm a little surprised that tar is needed at all for a compile + install,
> > but I haven't investigated further. Is it being used to do a copy?
>
> Yes it is used to make a copy.
> It is a convenient one-liner.
>
> I am a bit surprised that you need to install DPDK with busybox.
> The busybox systems are generally cross-built and prepared out of the box,
> with the host tools.
> However, if you feel it is important to install DPDK on such target,
> you are welcome to propose a patch.
>
>


Re: [dpdk-dev] [PATCH 1/2] net/i40e: enable VF untag drop

2017-03-14 Thread Zhang, Qi Z


> -Original Message-
> From: Yigit, Ferruh
> Sent: Tuesday, March 14, 2017 9:30 PM
> To: Zhang, Qi Z ; Wu, Jingjing
> ; Zhang, Helin 
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/2] net/i40e: enable VF untag drop
> 
> On 3/9/2017 3:24 AM, Zhang, Qi Z wrote:
> > Hi Ferruh:
> >
> >> -Original Message-
> >> From: Yigit, Ferruh
> >> Sent: Tuesday, March 7, 2017 6:51 PM
> >> To: Zhang, Qi Z ; Wu, Jingjing
> >> ; Zhang, Helin 
> >> Cc: dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH 1/2] net/i40e: enable VF untag drop
> >>
> >> On 3/3/2017 1:59 AM, Qi Zhang wrote:
> >>> Add a new private API to support the untag drop enable/disable for
> >>> specific VF.
> >>>
> >>> Signed-off-by: Qi Zhang 
> >>> ---
> >>>  drivers/net/i40e/i40e_ethdev.c  | 49
> >>> +
> >>>  drivers/net/i40e/rte_pmd_i40e.h | 18 +++
> >>
> >> Shared library is giving build error because of API is missing in
> >> *version.map file
> >>
> >>>  2 files changed, 67 insertions(+)
> >>>
> >>
> >> <...>
> >>
> >>> diff --git a/drivers/net/i40e/rte_pmd_i40e.h
> >>> b/drivers/net/i40e/rte_pmd_i40e.h index a0ad88c..895e2cc 100644
> >>> --- a/drivers/net/i40e/rte_pmd_i40e.h
> >>> +++ b/drivers/net/i40e/rte_pmd_i40e.h
> >>> @@ -332,4 +332,22 @@ int rte_pmd_i40e_get_vf_stats(uint8_t port,
> >>> int rte_pmd_i40e_reset_vf_stats(uint8_t port,
> >>>   uint16_t vf_id);
> >>>
> >>> +/**
> >>> + * Enable/Disable VF untag drop
> >>> + *
> >>> + * @param port
> >>> + *The port identifier of the Ethernet device.
> >>> + * @param vf_id
> >>> + *VF on witch to enable/disable
> >>> + * @param on
> >>> + *Enable or Disable
> >>> + * @retura
> >>
> >> @return
> >>
> >>> + *  - (0) if successful.
> >>> + *  -(-ENODEVE) if *port* invalid
> >>> + *  -(-EINVAL) if bad parameter.
> >>> + */
> >>> +int rte_pmd_i40e_set_vf_vlan_untag_drop(uint8_t port,
> >>> + uint16_t vf_id,
> >>> + uint8_t on);
> >>
> >> As discussed previously, I believe it is good to keep following syntax in
> API:  
> >> __, for this API it becomes:
> > I think, current naming rule is __ right?
> 
> Overall, I am not aware of any defined naming rule, I am for defining one.
> 
> > See below
> > rte_pmd_i40e_set_vf_vlan_anti_spoof;
> > rte_pmd_i40e_set_vf_vlan_filter;
> > rte_pmd_i40e_set_vf_vlan_insert;
> > rte_pmd_i40e_set_vf_vlan_stripq;
> > rte_pmd_i40e_set_vf_vlan_tag;
> > so what's wrong with this 
> 
> This breaks hierarchical approach, if you think API name as tree. Easier to
> see this when you sort the APIs, ns_set_x, ns_reset_x, ns_del_x will spread
> to different locations.
I agree with your point, I had thought your concern is only about this patch, 
but actually it's not.
> 
> This looks OK when you work on one type of object already, but with all APIs
> in concern, I believe object based grouping is better than action based
> grouping.

> 
> And why do you think above one is better? Again, as long as one is agreed on,
I don't, sorry for make you misunderstand
> 
> >>
> >> rte_pmd_i40e_vf_vlan_untag_drop_set(), and perhaps "set" can be
> removed?
> >>
> >>> +
> >>>  #endif /* _PMD_I40E_H_ */
> >>>
> >



Re: [dpdk-dev] [PATCH 2/2] app/testpmd: enable VF untag drop in testpmd

2017-03-14 Thread Zhang, Qi Z


> -Original Message-
> From: Yigit, Ferruh
> Sent: Tuesday, March 14, 2017 9:32 PM
> To: Zhang, Qi Z ; Wu, Jingjing
> ; Zhang, Helin 
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/2] app/testpmd: enable VF untag drop in
> testpmd
> 
> On 3/9/2017 2:59 AM, Zhang, Qi Z wrote:
> >
> >
> >> -Original Message-
> >> From: Yigit, Ferruh
> >> Sent: Tuesday, March 7, 2017 7:14 PM
> >> To: Zhang, Qi Z ; Wu, Jingjing
> >> ; Zhang, Helin 
> >> Cc: dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH 2/2] app/testpmd: enable VF untag drop
> >> in testpmd
> >>
> >> On 3/3/2017 1:59 AM, Qi Zhang wrote:
> >>> Add command line to support untag drop to specific VF in testpmd.
> >>>
> >>> Signed-off-by: Qi Zhang 
> >>> ---
> >>>  app/test-pmd/cmdline.c | 104
> >>> +
> >>>  1 file changed, 104 insertions(+)
> >>>
> >>> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
> >>> 43fc636..4ddc2c9 100644
> >>> --- a/app/test-pmd/cmdline.c
> >>> +++ b/app/test-pmd/cmdline.c
> >>> @@ -311,6 +311,10 @@ static void cmd_help_long_parsed(void
> >>> *parsed_result,
> >>>
> >>>   "set vf vlan antispoof (port_id) (vf_id) (on|off)\n"
> >>>   "Set VLAN antispoof for a VF from the PF.\n\n"
> >>> +#ifdef RTE_LIBRTE_I40E_PMD
> >>> + "set vf vlan untagdrop (port_id) (vf_id) (on|off)\n"
> >>> + "Set VLAN untag drop for a VF from the PF.\n\n"
> >>> +#endif
> >>
> >> We should decide how to implement PMD specific APIs in testpmd, and
> >> be consistent about it.
> >>
> >> Currently there are two approaches:
> >>
> >> 1- Wrap PMD specific feature and API with and PMD #ifdef, as done here.
> >>
> >> 2- Enable feature by default, return -ENOTSUP for port_id that does
> >> not support it. Ex: cmd_vf_rxvlan_filter.
> >>
> >> I am for second option. And explicitly not disabling I40E driver does
> >> not mean you should have those NICs in your runtime environment, so
> >> the effect will be same as always enabling it.
> >>
> > Yes, I notice this problem, during implementation, I saw both patterns
> > exist, so I have to choose one of them We'd better align this.
> > Both option ok for me, but a little bit prefer option 1 , since it's not
> necessary to explore a command if no backend device, that make the hint
> more clean.
> 
> I agree it's not necessary to explore a command if no backend device, but
> first option does not guaranties it. What it checks is compile time option, 
> not
> runtime device detection.
> I40E driver is enabled by default, and unless user explicitly disables it,
> testpmd command will be there independent from device is there or not.
So this still benefit custom build, right?
Anyway, as I said, both are ok for me, I followed option 2 since most APIs 
follow this.
Please check v2.
> 
> >>
> >> And since number of PMD specific APIs are increasing, perhaps we
> >> should find a better approach for testpmd to prevent them corrupting
> testpmd.
> > Will think about this, also like to know if you or anyone have any good
> suggestion.
> >>
> >> Also it may worth to discuss why number of PMD specific APIs are
> increasing.
> >>
> >>>
> >>>   "set vf vlan tag (port_id) (vf_id) (on|off)\n"
> >>>   "Set VLAN tag for a VF from the PF.\n\n"
> >>> @@ -10995,6 +10999,103 @@ cmdline_parse_inst_t
> >> cmd_set_vf_vlan_anti_spoof = {
> >>>   },
> >>>  };
> >>>
> >> <...>



Re: [dpdk-dev] i40e SR-IOV with Linux PF

2017-03-14 Thread Wu, Jingjing


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monja...@6wind.com]
> Sent: Tuesday, March 14, 2017 2:31 AM
> To: Wu, Jingjing 
> Cc: Zhang, Helin ; Chen, Jing D
> ; Yigit, Ferruh ;
> vincent.jar...@6wind.com; dev@dpdk.org
> Subject: Re: i40e SR-IOV with Linux PF
> 
> 2017-03-14 04:44, Wu, Jingjing:
> > From: Thomas Monjalon [mailto:thomas.monja...@6wind.com]
> > > > Hi i40e developers,
> > > >
> > > > Referring to the VFD discussion, I thought basic behaviours were
> > > > the same regardless of the PF driver:
> > > > http://dpdk.org/ml/archives/dev/2016-December/053056.html
> > > > "
> > > > > In the meanwhile, we have some test models ongoing to validate
> > > > > combination of Linux and DPDK drivers for VF and PF.
> > > > > We'll fully support below 4 cases going forward.
> > > > > 1. DPDK PF + DPDK VF
> > > > > 2. DPDK PF + Linux VF
> > > > > 3. Linux PF + DPDK VF
> > > > > 4. Linux PF + Linux VF (it's not our scope)
> > > > [...]
> > > > > Linux PF + DPDK VF has been tested with 1.0 API long time ago.
> > > > > There is some test activities ongoing.
> > > > "
> > > >
> > > > I think the Linux PF case is important and deserves more consideration.
> > > > When looking at the code, specifically i40evf_vlan_offload_set()
> > > > and i40evf_vlan_pvid_set(), I read this:
> > > > "
> > > > /* Linux pf host doesn't support vlan offload yet */
> > > > if (vf->version_major == I40E_DPDK_VERSION_MAJOR) { "
> > > >
> > > > Is there some work in progress on Linux side to get the same
> > > > behaviour as with a DPDK PF?
> > > >
> >
> > As I know, VFD features are marked with an "EXPERIMENTAL" tag.
> > And we are working on the extendable interface (feature based) with PF
> > kernel driver.
> 
> The VLAN offload is not a VFD feature. It is a basic driver feature.
> It is said that it is supported in the documentation but it is not with a 
> Linux PF.
> 
> Please consider the rest of my email:
> 
Yes, I was saying VLAN offload. What we are working to make VF and PF sync
based on feature ability but not whether it is DPDK PF or Kernel PF.

And, about the doc, yes, we need to add such limitation here.

Thanks
Jingjing
> > > At least, it must be documented in
> > >   doc/guides/nics/features/i40e_vf.ini
> > > and marked as partially supported (P instead of Y) in
> > >   doc/guides/nics/i40e.rst


Re: [dpdk-dev] [PATCH v2 00/13] introduce fail-safe PMD

2017-03-14 Thread Gaëtan Rivet

> The central question that I would like to tackle is this: why should
> we require from our users declaring a bonding device to have
> hot-plug support?
>
We'll, strictly speaking, I suppose we don't have to require it.  But by that
same token, we don't need to do it in a separate PMD either, there are lots of
other options.



I think I used an ambiguous formulation here. To be sure that we
understand each other, what I want to express is that it will certainly
seem very strange for a user to declare a bond on a single device,
first, and to be expected to do so if they wanted hot-plug support on
that particular device.

The bonding here would only be used as a place-holder, which does not
really make sense from the user point of view.

I think you understood that, and what I take from your response is that
while agreeing you would like to do so differently. I just wanted to be
clear.


> I took some time to illustrate a few modes of operation:
>
> Fig. 1
>
>.-.
>| application |
>`--.--'
>   |
>  .'-.-. <-- init, conf, Rx/Tx
>  |  | |
>  |  .---|--.--|--. <--- conf, link check, Rx/Tx
>  |  |   |  |  |  |
>  v  |   v  v  v  v
> .-. | .---. .--.
> | bonding | | | ixgbe | | mlx4 |
> `.' | `---' `--'
>  |  |
>  `--'
>
> Typical link fail-over.
>
>
> Fig. 2
>
>  .-.
>  | application |
>  `--.--'
> | < init, conf, Rx/Tx
> v
>   .---.
>   | fail-safe |
>   `-.-'
> |
> .---'. <--- init, conf, dev check, Rx/Tx
> ||
> vv
> .---. .--.
> | ixgbe | | mlx4 |
> `---' `--'
>
> Typical automatic hot-plug handling with device fail-over.
>
> [...]

Yes, I think we all understand the purpose of your PMD - its there to provide a
placeholder device that can respond to application requests in a sane manner,
until such time as real hardware is put in its place via a hot plug/failure.  
Thats all
well and good, I'm saying there are better ways to go about this that can
provide the same functionality without having to add an extra 4k lines of code
to the project, many of which already exist.



Ah, yes, I didn't want to imply that the purpose of this PMD wasn't
understood already by many. These figures are there to illustrate some
use cases that some users could recognize, and serve as a support for
the point made afterward.

The main thing that can be taken from these is the division along the
link-level and device-level checking that is done. This explains most of
my position. The nature of those checks imply different kind of code,
most of which is thus actually not duplicated / would require pretty
much the same amount of code to be implemented either as libraries or as
part of the bonding PMD. This is the crux of my argument, which I expand
upon below.


> 1. LSC vs. RMV
>
>  A link status change is a valid state for a device. It calls for
>  specific responses, e.g. a link switch in a bonding device, without
>  losing the general configuration of the port.
>
>  The removal of a device calls for more than pausing operations and
>  switching an active device. The party responsible for initializing the
>  device should take care of closing it properly. If this party also
>  wants to be able to restore the device if it was plugged back in, it
>  would need be able to initialize it back and reconfigure its previous
>  state.
>
>  As we can see that in [Fig. 1], this responsibility lies upon the
>  application.
>
Again, yes, I think we all see the benefit of centralizing hot plug operations,
no one is disagreing with that, its the code/functional duplication that is
concerning.



Certainly, I will try to explain why the code is not actually duplicated
/ why the functions are actually only superficially overlapping.


> 2. Bonding and link availability
>
>  The hot-plug functionality is not a core function of the bonding PMD.
>  It is only interested in knowing if the link is active or not.
>
Currently, yes.  The suggestion was that you augment the bonding driver so that
hot plug is a core function of bonding.

>  Adding the device persistence to the bonding PMD would mean adding the
>  ability to flexibly parse device definitions to cope with plug-ins in
>  evolving busses (PCI hot-plug could mean changing bus addresses), being
>  able to emulate the EAL and the ether layer and to properly store the
>  device configuration.  This means formally describing the life of a
>  device in a DPDK application from start to finish.
>
Which seems to me to be exactly what your PMD does.  I don't see why its
fundamentally harder to do that in an existing pmd, than it is in a new one.



Indeed it does. I must emphasize the "formally describe the life of a
device". The hot-plug functionality goes beyong the link-level check.
The description of a device from a DPDK

Re: [dpdk-dev] Compiling DPDK 17.02 with a Busybox-based tar

2017-03-14 Thread Thomas Monjalon
2017-03-14 14:20, Raphael Cohn:
> Why it is surprising that I need to compile DPDK on a system Busybox?
> 
> I currently build DPDK both for my local Alpine Linux system (busybox)
> using the system compiler, use busybox for my cross-tools toolchain, and
> also use it in Libertine Linux. It means a system using DPDK can be much
> more minimal. Likewise, Aboriginal Linux uses Busybox (and, shortly,
> Toybox), for similar reasons. It also means users of my rust crate can work
> in a wider range of systems than RHEL / Ubuntu.

I understand the need for busybox on the target system.
I was just saying that generally we use a more complete system on the
host building a target. But you are free to use any tool you want :)

PS: please reply in-line


> On 14 March 2017 at 09:39, Thomas Monjalon 
> wrote:
> 
> > 2017-03-14 07:58, Raphael Cohn:
> > > Hi,
> > >
> > > To compile DPDK on a system with Busybox tar installed, it's necessary to
> > > make a small change to the build system:-
> > >
> > > sed -i -e '/--keep-newer-files/d' mk/rte.sdkinstall.mk
> > > sed -i -e 's;--strip-components=1 \\;--strip-components=1;g' mk/
> > > rte.sdkinstall.mk
> > >
> > > I'm not sure whether the impact of this change fundamentally affects
> > DPDK.
> > > I'm a little surprised that tar is needed at all for a compile + install,
> > > but I haven't investigated further. Is it being used to do a copy?
> >
> > Yes it is used to make a copy.
> > It is a convenient one-liner.
> >
> > I am a bit surprised that you need to install DPDK with busybox.
> > The busybox systems are generally cross-built and prepared out of the box,
> > with the host tools.
> > However, if you feel it is important to install DPDK on such target,
> > you are welcome to propose a patch.
> >
> >




[dpdk-dev] [PATCH v2] examples/ip_fragmentation: fix check of packet type

2017-03-14 Thread Wei Dai
The packet_type in mbuf is not correctly filled by ixgbe 82599 NIC.
To use the ether_type in ethernet header to check packet type is
more reliaber.

Fixes: 3c0184cc0c60 ("examples: replace some offload flags with packet type")
Fixes: ab351fe1c95c ("mbuf: remove packet type from offload flags")

Cc: sta...@dpdk.org

Reported-by: Fangfang Wei 
Signed-off-by: Wei Dai 
---
 examples/ip_fragmentation/main.c | 74 
 1 file changed, 74 insertions(+)

diff --git a/examples/ip_fragmentation/main.c b/examples/ip_fragmentation/main.c
index 9e9ecae..151f059 100644
--- a/examples/ip_fragmentation/main.c
+++ b/examples/ip_fragmentation/main.c
@@ -653,6 +653,74 @@ check_all_ports_link_status(uint8_t port_num, uint32_t 
port_mask)
}
 }
 
+/* Check L3 packet type detection capablity of the NIC port */
+static int
+check_ptype(int portid)
+{
+   int i, ret;
+   int ptype_l3_ipv4 = 0, ptype_l3_ipv6 = 0;
+   uint32_t ptype_mask = RTE_PTYPE_L3_MASK;
+
+   ret = rte_eth_dev_get_supported_ptypes(portid, ptype_mask, NULL, 0);
+   if (ret <= 0)
+   return 0;
+
+   uint32_t ptypes[ret];
+
+   ret = rte_eth_dev_get_supported_ptypes(portid, ptype_mask, ptypes, ret);
+   for (i = 0; i < ret; ++i) {
+   if (ptypes[i] & RTE_PTYPE_L3_IPV4)
+   ptype_l3_ipv4 = 1;
+   if (ptypes[i] & RTE_PTYPE_L3_IPV6)
+   ptype_l3_ipv6 = 1;
+   }
+
+   if (ptype_l3_ipv4 == 0)
+   printf("port %d cannot parse RTE_PTYPE_L3_IPV4\n", portid);
+
+   if (ptype_l3_ipv6 == 0)
+   printf("port %d cannot parse RTE_PTYPE_L3_IPV6\n", portid);
+
+   if (ptype_l3_ipv4 && ptype_l3_ipv6)
+   return 1;
+
+   return 0;
+
+}
+
+/* Parse packet type of a packet by SW */
+static inline void
+parse_ptype(struct rte_mbuf *m)
+{
+   struct ether_hdr *eth_hdr;
+   uint32_t packet_type = RTE_PTYPE_UNKNOWN;
+   uint16_t ether_type;
+
+   eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
+   ether_type = eth_hdr->ether_type;
+   if (ether_type == rte_cpu_to_be_16(ETHER_TYPE_IPv4))
+   packet_type |= RTE_PTYPE_L3_IPV4_EXT_UNKNOWN;
+   else if (ether_type == rte_cpu_to_be_16(ETHER_TYPE_IPv6))
+   packet_type |= RTE_PTYPE_L3_IPV6_EXT_UNKNOWN;
+
+   m->packet_type = packet_type;
+}
+
+/* callback function to detect packet type for a queue of a port */
+static uint16_t
+cb_parse_ptype(uint8_t port __rte_unused, uint16_t queue __rte_unused,
+  struct rte_mbuf *pkts[], uint16_t nb_pkts,
+  uint16_t max_pkts __rte_unused,
+  void *user_param __rte_unused)
+{
+   uint16_t i;
+
+   for (i = 0; i < nb_pkts; ++i)
+   parse_ptype(pkts[i]);
+
+   return nb_pkts;
+}
+
 static int
 init_routing_table(void)
 {
@@ -944,6 +1012,12 @@ main(int argc, char **argv)
ret, portid);
 
rte_eth_promiscuous_enable(portid);
+
+   if (check_ptype(portid) == 0) {
+   rte_eth_add_rx_callback(portid, 0, cb_parse_ptype, 
NULL);
+   printf("Add Rx callback funciton to detect L3 packet 
type by SW :"
+   " port = %d\n", portid);
+   }
}
 
if (init_routing_table() < 0)
-- 
2.7.4



[dpdk-dev] [PATCH] ip_frag: handle MTU sizes not aligned to 8 bytes

2017-03-14 Thread Allain Legacy
The rte_ipv4_fragment_packet API expects that the link/interface MTU value
passed in be divisible by 8 bytes.  Given the name of the parameter is
"mtu" rather than "frag_size" it is not necessarily the case that it will
be divisible by 8.  An MTU of 1500 happens to produce a max fragment size
of 1480 (1500 - sizeof(ipv4_hdr)) which is divisible by 8 but other MTU
values such as 1600 or 9000 do not produce values that are divisible by 8.

Unfortunately, the API checks that the frag_size value produced is
divisible by 8 with a call to RTE_ASSERT which is only enabled when the
RTE_LOG_LEVEL >= RTE_LOG_DEBUG.  In cases where the log level is set
normally the code silently continues and produces IP fragments that have
invalid fragment offset values.

An application may not have control over what MTU a user selects and rather
than have each application adjust the MTU to pass a suitable value to the
fragmentation API this change modifies the fragmentation API to handle
cases where the "mtu" argument is not divisible by 8 and automatically
adjust the internal "frag_size".

Signed-off-by: Allain Legacy 
---
 lib/librte_ip_frag/rte_ipv4_fragmentation.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/lib/librte_ip_frag/rte_ipv4_fragmentation.c 
b/lib/librte_ip_frag/rte_ipv4_fragmentation.c
index a2259e8..8c5f5ec 100644
--- a/lib/librte_ip_frag/rte_ipv4_fragmentation.c
+++ b/lib/librte_ip_frag/rte_ipv4_fragmentation.c
@@ -48,7 +48,7 @@
 #defineIPV4_HDR_DF_MASK(1 << IPV4_HDR_DF_SHIFT)
 #defineIPV4_HDR_MF_MASK(1 << IPV4_HDR_MF_SHIFT)
 
-#defineIPV4_HDR_FO_MASK((1 << 
IPV4_HDR_FO_SHIFT) - 1)
+#defineIPV4_HDR_FO_ALIGN   (1 << IPV4_HDR_FO_SHIFT)
 
 static inline void __fill_ipv4hdr_frag(struct ipv4_hdr *dst,
const struct ipv4_hdr *src, uint16_t len, uint16_t fofs,
@@ -103,11 +103,14 @@ static inline void __free_fragments(struct rte_mbuf 
*mb[], uint32_t num)
uint32_t out_pkt_pos, in_seg_data_pos;
uint32_t more_in_segs;
uint16_t fragment_offset, flag_offset, frag_size;
+   uint16_t frag_bytes_remaining;
 
-   frag_size = (uint16_t)(mtu_size - sizeof(struct ipv4_hdr));
-
-   /* Fragment size should be a multiply of 8. */
-   RTE_ASSERT((frag_size & IPV4_HDR_FO_MASK) == 0);
+   /*
+* Ensure the IP payload length of all fragments is aligned to a
+* multiple of 8 bytes as per RFC791 section 2.3.
+*/
+   frag_size = RTE_ALIGN_FLOOR((mtu_size - sizeof(struct ipv4_hdr)),
+   IPV4_HDR_FO_ALIGN);
 
in_hdr = rte_pktmbuf_mtod(pkt_in, struct ipv4_hdr *);
flag_offset = rte_cpu_to_be_16(in_hdr->fragment_offset);
@@ -142,6 +145,7 @@ static inline void __free_fragments(struct rte_mbuf *mb[], 
uint32_t num)
/* Reserve space for the IP header that will be built later */
out_pkt->data_len = sizeof(struct ipv4_hdr);
out_pkt->pkt_len = sizeof(struct ipv4_hdr);
+   frag_bytes_remaining = frag_size;
 
out_seg_prev = out_pkt;
more_out_segs = 1;
@@ -161,7 +165,7 @@ static inline void __free_fragments(struct rte_mbuf *mb[], 
uint32_t num)
 
/* Prepare indirect buffer */
rte_pktmbuf_attach(out_seg, in_seg);
-   len = mtu_size - out_pkt->pkt_len;
+   len = frag_bytes_remaining;
if (len > (in_seg->data_len - in_seg_data_pos)) {
len = in_seg->data_len - in_seg_data_pos;
}
@@ -171,9 +175,10 @@ static inline void __free_fragments(struct rte_mbuf *mb[], 
uint32_t num)
out_pkt->pkt_len);
out_pkt->nb_segs += 1;
in_seg_data_pos += len;
+   frag_bytes_remaining -= len;
 
/* Current output packet (i.e. fragment) done ? */
-   if (unlikely(out_pkt->pkt_len >= mtu_size))
+   if (unlikely(frag_bytes_remaining == 0))
more_out_segs = 0;
 
/* Current input segment done ? */
-- 
1.8.3.1



Re: [dpdk-dev] [PATCH v2] net/sfc: set multicast address list in started state only

2017-03-14 Thread Ferruh Yigit
On 3/9/2017 4:21 PM, Andrew Rybchenko wrote:
> From: Ivan Malov 
> 
> According to 'libefx' API requirements, one is allowed to
> apply multicast address list to the port in started state
> only, otherwise the new array should be copied to a local
> storage in order to be applied during the next port start
> 
> Fixes: 0fa0070e4391 ("net/sfc: support multicast addresses list controls")
> Coverity issue: 141296
> 
> Fixes: e9ddf37a507d ("net/sfc: fix setting empty multicast list")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Ivan Malov 
> Signed-off-by: Andrew Rybchenko 
> Reviewed-by: Andrew Lee 
> Reviewed-by: Andy Moreton 

Applied to dpdk-next-net/master, thanks.



Re: [dpdk-dev] [PATCH] net/e1000: advertise offload capabilities for the EM PMD

2017-03-14 Thread Ferruh Yigit
On 3/13/2017 3:15 AM, Lu, Wenzhuo wrote:
> Hi,
> 
>> -Original Message-
>> From: Allain Legacy [mailto:allain.leg...@windriver.com]
>> Sent: Friday, March 10, 2017 10:38 PM
>> To: Lu, Wenzhuo
>> Cc: dev@dpdk.org
>> Subject: [PATCH] net/e1000: advertise offload capabilities for the EM PMD
>>
>> The hardware offload capabilities are not being advertised for the EM PMD.
>> Because of this, applications that only enable these features if the device
>> advertises them will never do so.
>>
>> Normally this is not an issue since normal packet processing should work 
>> even if
>> hardware offload is not available.  But, in older versions of Virtual Box 
>> the e1000
>> device emulation (Intel PRO/1000 MT Desktop 82540EM) assumes that it should
>> enable VLAN stripping even if the driver does not request it.  This means 
>> that any
>> ingress packets that have a VLAN tag will be stripped.  Since the 
>> application did
>> not request to enable VLAN stripping it is not expecting these packets so 
>> they
>> are not processed as VLAN packets.
>>
>> Regardless of the Virtual Box issue, the driver should be advertising 
>> supported
>> capabilities as is done in other drivers.
>>
>> Signed-off-by: Allain Legacy 
> Acked-by: Wenzhuo Lu 

Applied to dpdk-next-net/master, thanks.



Re: [dpdk-dev] [PATCH v2] net/thunderx: check data offset alignment requirement

2017-03-14 Thread Ferruh Yigit
On 3/13/2017 8:02 AM, Jerin Jacob wrote:
> nicvf HW expects the DMA address of the packet data to be
> aligned with cache line size.
> 
> Packet data offset is a function of struct mbuf size,
> mbuf private size and headroom. mbuf private size can
> be changed from the application in pool creation, this
> check detects HW alignment requirement constraint in pmd
> start function.
> 
> Signed-off-by: Jerin Jacob 
> Acked-by: Hemant Agrawal 

Applied to dpdk-next-net/master, thanks.



Re: [dpdk-dev] [PATCH] net/vmxnet3: fix broken debug build

2017-03-14 Thread Ferruh Yigit
On 3/13/2017 5:19 PM, Shrikrishna Khare wrote:
> 
> 
> On Mon, 13 Mar 2017, Andrew Rybchenko wrote:
> 
>> Fixes: 4dd452523000 ("net/vmxnet3: add receive data ring support")
>>
>> Signed-off-by: Andrew Rybchenko 
> 
> Thank you for catching and fixing this.
> 
> Acked-by: Shrikrishna Khare 

Squashed into original commit in next-net [1], thanks.

[1]
4dd452523000 ("net/vmxnet3: add receive data ring support")


Re: [dpdk-dev] [PATCH] net/vmxnet3: fix queue size changes

2017-03-14 Thread Jan Blunck
On Mon, Mar 13, 2017 at 11:41 PM, Charles (Chas) Williams
 wrote:
> If the user reconfigures the queues size, then the previosly allocated
> memzone may potentially be too small.  Instead, always free the old
> memzone and allocate a new one.
>
> Fixes: dfaff37fc46d ("vmxnet3: import new vmxnet3 poll mode driver 
> implementation")
>
> Signed-off-by: Chas Williams 
> ---
>  drivers/net/vmxnet3/vmxnet3_rxtx.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c 
> b/drivers/net/vmxnet3/vmxnet3_rxtx.c
> index 6649c3f..104e040 100644
> --- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
> +++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
> @@ -893,8 +893,8 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf 
> **rx_pkts, uint16_t nb_pkts)
>
>  /*
>   * Create memzone for device rings. malloc can't be used as the physical 
> address is
> - * needed. If the memzone is already created, then this function returns a 
> ptr
> - * to the old one.
> + * needed. If the memzone already exists, we free it since it may have been 
> created
> + * with a different size.
>   */
>  static const struct rte_memzone *
>  ring_dma_zone_reserve(struct rte_eth_dev *dev, const char *ring_name,
> @@ -909,7 +909,7 @@ ring_dma_zone_reserve(struct rte_eth_dev *dev, const char 
> *ring_name,
>
> mz = rte_memzone_lookup(z_name);
> if (mz)
> -   return mz;
> +   rte_memzone_free(mz);
>
> return rte_memzone_reserve_aligned(z_name, ring_size,
>socket_id, 0, 
> VMXNET3_RING_BA_ALIGN);

Chas,

Thanks for hunting this one down. Wouldn't the rte_memzone_free()
better fit into vmxnet3_cmd_ring_release() ?

Also the ring_dma_zone_reserve() could get replaced by
rte_eth_dma_zone_reserve() (see also
http://dpdk.org/dev/patchwork/patch/21432/).


[dpdk-dev] [PATCH] igb_uio: support devices with at least 1 bar defined

2017-03-14 Thread Allain Legacy
From: Matt Peters 

Allow the BAR setup to succeed if a device has at least 1 BAR region
defined.  Previously, the device probe would only succeed if at least one
memory BAR existed, but there are devices that have only port I/O BARs.

For example, on Virtual Box a virtio device has only a single I/O BAR
because by default MSI-X is not enabled.  While in qemu/kvm the virtio
device has MSI-X enabled and therefore has both an I/O and Memory BAR.

The following are excerpts from "lspci -nn -s 00:09.0" on both types of
systems.

Virtual Box:

Region 0: I/O ports at d260 [size=32]
Capabilities: [80] #00 []

QEMU/KVM:

Region 0: I/O ports at c060 [size=32]
Region 1: Memory at febd1000 (32-bit, non-prefetchable) [size=4K]
Expansion ROM at feb8 [disabled] [size=256K]
Capabilities: [40] MSI-X: Enable+ Count=3 Masked-
Vector table: BAR=1 offset=
PBA: BAR=1 offset=0800

Signed-off-by: Matt Peters 
Signed-off-by: Allain Legacy 
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c 
b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index df41e45..a910eb8 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -314,7 +314,7 @@ struct rte_uio_pci_dev {
}
}
 
-   return (iom != 0) ? ret : -ENOENT;
+   return (iom != 0 || iop != 0) ? ret : -ENOENT;
 }
 
 #if LINUX_VERSION_CODE < KERNEL_VERSION(3, 8, 0)
-- 
1.8.3.1



Re: [dpdk-dev] [PATCH] net/vmxnet3: fix queue size changes

2017-03-14 Thread Charles (Chas) Williams



On 03/14/2017 12:11 PM, Jan Blunck wrote:

On Mon, Mar 13, 2017 at 11:41 PM, Charles (Chas) Williams
 wrote:

If the user reconfigures the queues size, then the previosly allocated
memzone may potentially be too small.  Instead, always free the old
memzone and allocate a new one.

Fixes: dfaff37fc46d ("vmxnet3: import new vmxnet3 poll mode driver 
implementation")

Signed-off-by: Chas Williams 
---
 drivers/net/vmxnet3/vmxnet3_rxtx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c 
b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index 6649c3f..104e040 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -893,8 +893,8 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf 
**rx_pkts, uint16_t nb_pkts)

 /*
  * Create memzone for device rings. malloc can't be used as the physical 
address is
- * needed. If the memzone is already created, then this function returns a ptr
- * to the old one.
+ * needed. If the memzone already exists, we free it since it may have been 
created
+ * with a different size.
  */
 static const struct rte_memzone *
 ring_dma_zone_reserve(struct rte_eth_dev *dev, const char *ring_name,
@@ -909,7 +909,7 @@ ring_dma_zone_reserve(struct rte_eth_dev *dev, const char 
*ring_name,

mz = rte_memzone_lookup(z_name);
if (mz)
-   return mz;
+   rte_memzone_free(mz);

return rte_memzone_reserve_aligned(z_name, ring_size,
   socket_id, 0, VMXNET3_RING_BA_ALIGN);


Chas,

Thanks for hunting this one down. Wouldn't the rte_memzone_free()
better fit into vmxnet3_cmd_ring_release() ?


I don't care which way it goes.  I just did what is basically done in
gpa_zone_reserve() to match the "style".  Tracking the current ring size
and avoiding reallocating a potentially large chunk of memory seems like
a better idea.


Also the ring_dma_zone_reserve() could get replaced by
rte_eth_dma_zone_reserve() (see also


Yes, it probably should get changed to that along with tracking the size.


Re: [dpdk-dev] [PATCH v2 1/2] eal/linux: support external Rx interrupt

2017-03-14 Thread Yongseok Koh

> On Mar 14, 2017, at 6:03 AM, Shahaf Shuler  wrote:
> 
> Prior to this patch only UIO/VFIO interrupt handlers types were supported.
> This patch adds support for the external interrupt handler type, allowing
> external drivers to set their own fds with specific interrupt handlers.
> 
> Signed-off-by: Shahaf Shuler 

Acked-by: Yongseok Koh 

Regards,
Yongseok



Re: [dpdk-dev] [PATCH v2 2/2] net/mlx5: support user space rxq interrupt event

2017-03-14 Thread Yongseok Koh

> On Mar 14, 2017, at 6:03 AM, Shahaf Shuler  wrote:
> 
> Implement rxq interrupt callbacks
> 
> Signed-off-by: Shahaf Shuler 

Acked-by: Yongseok Koh 

Regards,
Yongseok


Re: [dpdk-dev] [PATCH] igb_uio: support devices with at least 1 bar defined

2017-03-14 Thread Ferruh Yigit
On 3/14/2017 4:33 PM, Allain Legacy wrote:
> From: Matt Peters 
> 
> Allow the BAR setup to succeed if a device has at least 1 BAR region
> defined.  Previously, the device probe would only succeed if at least one
> memory BAR existed, but there are devices that have only port I/O BARs.
> 
> For example, on Virtual Box a virtio device has only a single I/O BAR
> because by default MSI-X is not enabled.  While in qemu/kvm the virtio
> device has MSI-X enabled and therefore has both an I/O and Memory BAR.
> 
> The following are excerpts from "lspci -nn -s 00:09.0" on both types of
> systems.
> 
> Virtual Box:
> 
> Region 0: I/O ports at d260 [size=32]
> Capabilities: [80] #00 []
> 
> QEMU/KVM:
> 
> Region 0: I/O ports at c060 [size=32]
> Region 1: Memory at febd1000 (32-bit, non-prefetchable) [size=4K]
> Expansion ROM at feb8 [disabled] [size=256K]
> Capabilities: [40] MSI-X: Enable+ Count=3 Masked-
> Vector table: BAR=1 offset=
> PBA: BAR=1 offset=0800
> 
> Signed-off-by: Matt Peters 
> Signed-off-by: Allain Legacy 

Acked-by: Ferruh Yigit 



Re: [dpdk-dev] [PATCH 1/2] net/tap: remove redundant syscall on Tx

2017-03-14 Thread Ferruh Yigit
On 3/14/2017 1:52 PM, Wiles, Keith wrote:
> 
>> On Mar 14, 2017, at 8:51 PM, Adrien Mazarguil  
>> wrote:
>>
>> Polling the Tx queue file descriptor before writing to it is not mandatory
>> since it is configured as non-blocking.
>>
>> Signed-off-by: Adrien Mazarguil 
>> Acked-by: Pascal Mazon 
> 
> Acked-by: Keith Wiles 

Series applied to dpdk-next-net/master, thanks.



[dpdk-dev] [PATCH v3] lpm: extend IPv6 next hop field

2017-03-14 Thread Vladyslav Buslov
This patch extend next_hop field from 8-bits to 21-bits in LPM library
for IPv6.

Added versioning symbols to functions and updated
library and applications that have a dependency on LPM library.

Signed-off-by: Vladyslav Buslov 
Acked-by: Bruce Richardson 
---

Fixed compilation error in l3fwd_lpm.h

 doc/guides/prog_guide/lpm6_lib.rst  |   2 +-
 doc/guides/rel_notes/release_17_05.rst  |   5 +
 examples/ip_fragmentation/main.c|  17 +--
 examples/ip_reassembly/main.c   |  17 +--
 examples/ipsec-secgw/ipsec-secgw.c  |   2 +-
 examples/l3fwd/l3fwd_lpm.h  |   2 +-
 examples/l3fwd/l3fwd_lpm_sse.h  |  24 ++---
 examples/performance-thread/l3fwd-thread/main.c |  11 +-
 lib/librte_lpm/rte_lpm6.c   | 134 +---
 lib/librte_lpm/rte_lpm6.h   |  32 +-
 lib/librte_lpm/rte_lpm_version.map  |  10 ++
 lib/librte_table/rte_table_lpm_ipv6.c   |   9 +-
 test/test/test_lpm6.c   | 115 ++--
 test/test/test_lpm6_perf.c  |   4 +-
 14 files changed, 293 insertions(+), 91 deletions(-)

diff --git a/doc/guides/prog_guide/lpm6_lib.rst 
b/doc/guides/prog_guide/lpm6_lib.rst
index 0aea5c5..f791507 100644
--- a/doc/guides/prog_guide/lpm6_lib.rst
+++ b/doc/guides/prog_guide/lpm6_lib.rst
@@ -53,7 +53,7 @@ several thousand IPv6 rules, but the number can vary 
depending on the case.
 An LPM prefix is represented by a pair of parameters (128-bit key, depth), 
with depth in the range of 1 to 128.
 An LPM rule is represented by an LPM prefix and some user data associated with 
the prefix.
 The prefix serves as the unique identifier for the LPM rule.
-In this implementation, the user data is 1-byte long and is called "next hop",
+In this implementation, the user data is 21-bits long and is called "next hop",
 which corresponds to its main use of storing the ID of the next hop in a 
routing table entry.
 
 The main methods exported for the LPM component are:
diff --git a/doc/guides/rel_notes/release_17_05.rst 
b/doc/guides/rel_notes/release_17_05.rst
index 4b90036..918f483 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -41,6 +41,9 @@ New Features
  Also, make sure to start the actual text at the margin.
  =
 
+* **Increased number of next hops for LPM IPv6 to 2^21.**
+
+  The next_hop field is extended from 8 bits to 21 bits for IPv6.
 
 * **Added powerpc support in pci probing for vfio-pci devices.**
 
@@ -114,6 +117,8 @@ API Changes
Also, make sure to start the actual text at the margin.
=
 
+* The LPM ``next_hop`` field is extended from 8 bits to 21 bits for IPv6
+  while keeping ABI compatibility.
 
 ABI Changes
 ---
diff --git a/examples/ip_fragmentation/main.c b/examples/ip_fragmentation/main.c
index 9e9ecae..1b005b5 100644
--- a/examples/ip_fragmentation/main.c
+++ b/examples/ip_fragmentation/main.c
@@ -265,8 +265,8 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct 
lcore_queue_conf *qconf,
uint8_t queueid, uint8_t port_in)
 {
struct rx_queue *rxq;
-   uint32_t i, len, next_hop_ipv4;
-   uint8_t next_hop_ipv6, port_out, ipv6;
+   uint32_t i, len, next_hop;
+   uint8_t port_out, ipv6;
int32_t len2;
 
ipv6 = 0;
@@ -290,9 +290,9 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct 
lcore_queue_conf *qconf,
ip_dst = rte_be_to_cpu_32(ip_hdr->dst_addr);
 
/* Find destination port */
-   if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop_ipv4) == 0 &&
-   (enabled_port_mask & 1 << next_hop_ipv4) != 0) {
-   port_out = next_hop_ipv4;
+   if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop) == 0 &&
+   (enabled_port_mask & 1 << next_hop) != 0) {
+   port_out = next_hop;
 
/* Build transmission burst for new port */
len = qconf->tx_mbufs[port_out].len;
@@ -326,9 +326,10 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct 
lcore_queue_conf *qconf,
ip_hdr = rte_pktmbuf_mtod(m, struct ipv6_hdr *);
 
/* Find destination port */
-   if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr, 
&next_hop_ipv6) == 0 &&
-   (enabled_port_mask & 1 << next_hop_ipv6) != 0) {
-   port_out = next_hop_ipv6;
+   if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr,
+   &next_hop) == 0 &&
+   (enabled_port_mask & 1 << next_hop) != 0) {
+   port_out = next_hop;
 
/* Build transmission burst for new port

Re: [dpdk-dev] [PATCH v4 00/17] Wind River Systems AVP PMD vs virtio?

2017-03-14 Thread Vincent JARDIN

Allain,

see inline,
+ I did restore the thread from 
http://dpdk.org/ml/archives/dev/2017-March/060087.html into the same email.


To make it short, using ivshmem, you keep people unfocused from virtio.


Vincent,
Perhaps you can help me understand why the performance or functionality of
AVP vs. Virtio is relevant to the decision of accepting this driver. There
are many drivers in the DPDK; most of which provide the same functionality
at comparable performance rates. AVP is just another such driver. The fact
that it is virtual rather than physical, in my opinion, should not influence
the decision of accepting this driver.


No, it is a different logic: your driver is about Qemu's vNIC support.
Other PMDs are here to support vendors and different hypervisors. For 
isntance, in case of vmxnet3, different versions, evolutions and 
optimisations can be managed inside vmxnet3. We need to avoid the 
proliferation of PMDs for a same hypervisor while there is already an 
efficient solution, virtio.



On the other hand, code
quality/complexity or lack of a maintainer are reasonable reasons for
rejecting. If our driver is accepted we are committed to maintaining it and
testing changes as required by any driver framework changes which may impact
all drivers.


But, then it is unfocusing your capabilities. As a community, we need to 
be sure that we are all focused on improving existing solutions. Since 
virtio is the one, I would rather prefer to see more people working on 
improving the virtio's community instead of getting everyone unfocused.




Along the same lines, I do not understand why upstreaming AVP in to the Linux
kernel or qemu/kvm should be a prerequisite for inclusion in the DPDK.
Continuing my analogy from above, the AVP device is a commercial offering
tied to the Wind River Systems Titanium product line. It enables virtualized
DPDK applications and increases DPDK adoption. Similarly to how a driver from
company XYX is tied to a commercial NIC that must be purchased by a customer,
our AVP device is available to operators that choose to leverage our Titanium
product to implement their Cloud solutions. It is not our intention to
upstream the qemu/kvm or host vswitch portion of the AVP device. Our qemu/kvm
extensions are GPL so they are available to our customers if they desire to
rebuild qemu/kvm with their own proprietary extensions


It was a solution before 2013, but now we are in 2017, vhost-user is 
here, show them the new proper path. Let's be focus, please.



Our AVP device was implemented in 2013 in response to the challenge of lower
than required performance of qemu/virtio in both user space and DPDK
applications in the VM. Rather than making complex changes to qemu/virtio
and continuously have to forward prop those as we upgraded to newer versions
of qemu we decided to decouple ourselves from that code base. We developed the
AVP device based on an evolution of KNI+ivshmem by enhancing both with
features that would meet the needs of our customers; better performance,
multi-queue support, live-migration support, hot-plug support. As I said in
my earlier response, since 2013, qemu/virtio has seen improved performance
with the introduction of vhost-user. The performance of vhost-user still has
not yet achieved performance levels equal to our AVP PMD.


Frankly, I was on the boat than you: I did strong pitch the use of 
ivshmem for vNICs for many benefits. But Redhat folks did push bash to 
ask every one to be focused in virtio. So, back to 2013, I would have 
supported your AVP approach, but now we are in 2017 and we must stack 
focused on a single and proper qemu solution => virtio/vhost.




I acknowledge that the AVP driver could exist as an out-of-tree driver loaded
as a shared library at runtime. In fact, 2 years ago we released our driver
source on github for this very reason.  We provide instructions and support
for building the AVP PMD as a shared library. Some customers have adopted
this method while many insist on an in-tree driver for several reasons.

Most importantly, they want to eliminate the burden of needing to build and
support an additional package into their product. An in-tree driver would
eliminate the need for a separate build/packaging process. Also, they want
an option that allows them to be able to develop directly on the bleeding
edge of DPDK rather than waiting on us to update our out-of-tree driver
based on stable releases of the DPDK. In this regard, an in-tree driver
would allow our customers to work directly on the latest DPDK.


Make them move to virtio, then this pain will disappear.


An in-tree driver provides obvious benefits to our customers, but keep in
mind that this also provides a benefit to the DPDK. If a customer must
develop on a stable release because they must use an out-of-tree driver then
they are less likely to contribute fixes/enhancements/testing upstream. I
know this first hand because I work with software from different sources on
a daily ba

Re: [dpdk-dev] [PATCH] net/i40e: fix error log in descriptor done function

2017-03-14 Thread Ferruh Yigit
On 3/1/2017 2:45 PM, Olivier Matz wrote:
> It's not queue identifier but a descriptor identifier.
> 
> Fixes: 4861cde46116 ("i40e: new poll mode driver")
> 
> Signed-off-by: Olivier Matz 

Acked-by: Ferruh Yigit 



  1   2   >