[dpdk-dev] [PATCH] i40e: fix build of VXLAN packet identification debug

2014-11-06 Thread Choonho Son
Commit 15dbb63ef9e9f108e7dcd837b88234f27a1ec258 does not compile
if CONFIG_RTE_LIBRTE_I40E_DEBUG_DRIVER is enabled.

Signed-off-by: Choonho Son 
---
 lib/librte_pmd_i40e/i40e_ethdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c 
b/lib/librte_pmd_i40e/i40e_ethdev.c
index fc78b20..4570795 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -4722,8 +4722,8 @@ i40e_add_vxlan_port(struct i40e_pf *pf, uint16_t port)
return -1;
}

-   PMD_DRV_LOG(INFO, "Added %s port %d with AQ command with index %d",
-port,  filter_index);
+   PMD_DRV_LOG(INFO, "Added port %d with AQ command with index %d",
+port,  filter_idx);

/* New port: add it and mark its index in the bitmap */
pf->vxlan_ports[idx] = port;
-- 
1.9.1



[dpdk-dev] Re: Re: [PATCH] eal: map uio resources after hugepages when the base_virtaddr is configured.

2014-11-06 Thread XU Liang
I have a multi-process application. When starting a secondary process, we got 
the error message "EAL: pci_map_resource(): cannot mmap(11, 0x77fba000, 
0x2, 0x0): Bad file descriptor (0x77fb9000)". The secondary process links 
different shared libraries, so the address 0x77fba000 is already in use.
--
From: Burakov, Anatoly
Sent: Wednesday, November 5, 2014 23:59
To: dev at dpdk.org
Subject: RE: Re: [dpdk-dev] [PATCH] eal: map uio resources after hugepages 
when the base_virtaddr is configured.







Hi Liang

Yes it is a problem. Even if it was carefully selected by user, nothing stops 
the DPDK application from mapping something into where you're trying to map 
your UIO devices. Plus, this changes the default behavior where a wrong 
base-virtaddr leads to a failure to initialize, rather than simply using a 
different address (remember that pci_map_resource fails if it cannot map the 
resource at the exact address you requested).

A very crude way of finding out where hugepages end would be to walk the 
hugepage memory (walk through memsegs and note the maximum start addr + length 
of that memseg).

Could you perhaps explain what is the problem that you're trying to solve with 
this? I can't think of a situation where the location of UIO maps would matter, 
to be honest.

Thanks,
Anatoly
From: XU Liang [mailto:liang...@cinfotech.cn]
Sent: Wednesday, November 5, 2014 3:49 PM
To: Burakov, Anatoly; dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH] eal: map uio resources after hugepages when the 
base_virtaddr is configured.


I think the base_virtaddr will be carefully selected by the user when they 
need it. So maybe it's not a real problem. :>

The real reason is that I can't find an easy way to get the end address of the 
hugepages. Can you give me some suggestions?



--
From: Burakov, Anatoly
Sent: Wednesday, November 5, 2014 23:10
To: dev at dpdk.org
Subject: RE: [dpdk-dev] [PATCH] eal: map uio resources after hugepages when the 
base_virtaddr is configured.

I have a slight problem with this patch.

The base_virtaddr doesn't necessarily correspond to an address that everything 
gets mapped to. It's a "hint" of sorts, that may or may not be taken into 
account by mmap. Therefore we can't simply assume that if we requested a 
base-virtaddr, everything will get mapped at exactly that address. We also 
can't assume that hugepages will be ordered one after the other and occupy 
neatly all the contiguous virtual memory between base_virtaddr and 
base_virtaddr + internal_config.memory - there may be holes, for whatever 
reasons.



Also, 



Thanks,

Anatoly



-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of lxu
Sent: Wednesday, November 5, 2014 1:25 PM
To: dev at dpdk.org
Subject: [dpdk-dev] [PATCH] eal: map uio resources after hugepages when the 
base_virtaddr is configured.

---
lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 9 -
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 7e62266..bc7ed3a 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -289,6 +289,11 @@ pci_uio_map_resource(struct rte_pci_device *dev)
struct rte_pci_addr *loc = &dev->addr;
struct mapped_pci_resource *uio_res;
struct pci_map *maps;
+ static void * requested_addr = NULL;
+ if (internal_config.base_virtaddr && NULL == requested_addr) {
+ requested_addr = (uint8_t *) internal_config.base_virtaddr 
+ + internal_config.memory;
+ }

dev->intr_handle.fd = -1;
dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
@@ -371,10 +376,12 @@ pci_uio_map_resource(struct rte_pci_device *dev)
if (maps[j].addr != NULL)
fail = 1;
else {
- mapaddr = pci_map_resource(NULL, fd, (off_t)offset,
+ mapaddr = pci_map_

[dpdk-dev] [PATCH] lib/librte_vhost: code style fixes

2014-11-06 Thread Huawei Xie
This patch fixes code style issues and refines some comments in vhost library.

Signed-off-by: Huawei Xie 
---
 lib/librte_vhost/eventfd_link/eventfd_link.c | 244 ++---
 lib/librte_vhost/eventfd_link/eventfd_link.h | 127 ++-
 lib/librte_vhost/rte_virtio_net.h|   3 +-
 lib/librte_vhost/vhost-net-cdev.c| 187 +---
 lib/librte_vhost/vhost_rxtx.c|  13 +-
 lib/librte_vhost/virtio-net.c| 317 +--
 6 files changed, 494 insertions(+), 397 deletions(-)

diff --git a/lib/librte_vhost/eventfd_link/eventfd_link.c 
b/lib/librte_vhost/eventfd_link/eventfd_link.c
index fc0653a..542ec2c 100644
--- a/lib/librte_vhost/eventfd_link/eventfd_link.c
+++ b/lib/librte_vhost/eventfd_link/eventfd_link.c
@@ -1,26 +1,26 @@
 /*-
- *  * GPL LICENSE SUMMARY
- *  *
- *  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *  *
- *  *   This program is free software; you can redistribute it and/or modify
- *  *   it under the terms of version 2 of the GNU General Public License as
- *  *   published by the Free Software Foundation.
- *  *
- *  *   This program is distributed in the hope that it will be useful, but
- *  *   WITHOUT ANY WARRANTY; without even the implied warranty of
- *  *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- *  *   General Public License for more details.
- *  *
- *  *   You should have received a copy of the GNU General Public License
- *  *   along with this program; if not, write to the Free Software
- *  *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 
USA.
- *  *   The full GNU General Public License is included in this distribution
- *  *   in the file called LICENSE.GPL.
- *  *
- *  *   Contact Information:
- *  *   Intel Corporation
- *   */
+ * GPL LICENSE SUMMARY
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of version 2 of the GNU General Public License as
+ *   published by the Free Software Foundation.
+ *
+ *   This program is distributed in the hope that it will be useful, but
+ *   WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *   General Public License for more details.
+ *
+ *   You should have received a copy of the GNU General Public License
+ *   along with this program; if not, write to the Free Software
+ *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *   The full GNU General Public License is included in this distribution
+ *   in the file called LICENSE.GPL.
+ *
+ *   Contact Information:
+ *   Intel Corporation
+ */

 #include 
 #include 
@@ -42,15 +42,15 @@
  * get_files_struct is copied from fs/file.c
  */
 struct files_struct *
-get_files_struct (struct task_struct *task)
+get_files_struct(struct task_struct *task)
 {
struct files_struct *files;

-   task_lock (task);
+   task_lock(task);
files = task->files;
if (files)
-   atomic_inc (&files->count);
-   task_unlock (task);
+   atomic_inc(&files->count);
+   task_unlock(task);

return files;
 }
@@ -59,17 +59,15 @@ get_files_struct (struct task_struct *task)
  * put_files_struct is extracted from fs/file.c
  */
 void
-put_files_struct (struct files_struct *files)
+put_files_struct(struct files_struct *files)
 {
-   if (atomic_dec_and_test (&files->count))
-   {
-   BUG ();
-   }
+   if (atomic_dec_and_test(&files->count))
+   BUG();
 }


 static long
-eventfd_link_ioctl (struct file *f, unsigned int ioctl, unsigned long arg)
+eventfd_link_ioctl(struct file *f, unsigned int ioctl, unsigned long arg)
 {
void __user *argp = (void __user *) arg;
struct task_struct *task_target = NULL;
@@ -78,96 +76,88 @@ eventfd_link_ioctl (struct file *f, unsigned int ioctl, 
unsigned long arg)
struct fdtable *fdt;
struct eventfd_copy eventfd_copy;

-   switch (ioctl)
-   {
-   case EVENTFD_COPY:
-   if (copy_from_user (&eventfd_copy, argp, sizeof (struct 
eventfd_copy)))
-   return -EFAULT;
-
-   /*
-* Find the task struct for the target pid
-*/
-   task_target =
-   pid_task (find_vpid (eventfd_copy.target_pid), 
PIDTYPE_PID);
-   if (task_target == NULL)
-   {
-   printk (KERN_DEBUG "Failed to get mem ctx for 
target pid\n");
-   return -EFAULT;
-   }
-
-   files = get_files_struct (current);
-   if (files == NULL)
-   {
-

[dpdk-dev] [PATCH 1/2] lib/librte_vhost: code style fixes

2014-11-06 Thread Huawei Xie
fixes alignment issues, lengthy lines, misordered type and other coding style 
issues.

Signed-off-by: Huawei Xie 
---
 lib/librte_vhost/eventfd_link/eventfd_link.c | 244 ++---
 lib/librte_vhost/eventfd_link/eventfd_link.h | 127 ++-
 lib/librte_vhost/rte_virtio_net.h|   3 +-
 lib/librte_vhost/vhost-net-cdev.c| 187 +---
 lib/librte_vhost/vhost_rxtx.c|  13 +-
 lib/librte_vhost/virtio-net.c| 317 +--
 6 files changed, 494 insertions(+), 397 deletions(-)

diff --git a/lib/librte_vhost/eventfd_link/eventfd_link.c 
b/lib/librte_vhost/eventfd_link/eventfd_link.c
index fc0653a..542ec2c 100644
--- a/lib/librte_vhost/eventfd_link/eventfd_link.c
+++ b/lib/librte_vhost/eventfd_link/eventfd_link.c
@@ -1,26 +1,26 @@
 /*-
- *  * GPL LICENSE SUMMARY
- *  *
- *  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *  *
- *  *   This program is free software; you can redistribute it and/or modify
- *  *   it under the terms of version 2 of the GNU General Public License as
- *  *   published by the Free Software Foundation.
- *  *
- *  *   This program is distributed in the hope that it will be useful, but
- *  *   WITHOUT ANY WARRANTY; without even the implied warranty of
- *  *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- *  *   General Public License for more details.
- *  *
- *  *   You should have received a copy of the GNU General Public License
- *  *   along with this program; if not, write to the Free Software
- *  *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 
USA.
- *  *   The full GNU General Public License is included in this distribution
- *  *   in the file called LICENSE.GPL.
- *  *
- *  *   Contact Information:
- *  *   Intel Corporation
- *   */
+ * GPL LICENSE SUMMARY
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of version 2 of the GNU General Public License as
+ *   published by the Free Software Foundation.
+ *
+ *   This program is distributed in the hope that it will be useful, but
+ *   WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *   General Public License for more details.
+ *
+ *   You should have received a copy of the GNU General Public License
+ *   along with this program; if not, write to the Free Software
+ *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *   The full GNU General Public License is included in this distribution
+ *   in the file called LICENSE.GPL.
+ *
+ *   Contact Information:
+ *   Intel Corporation
+ */

 #include 
 #include 
@@ -42,15 +42,15 @@
  * get_files_struct is copied from fs/file.c
  */
 struct files_struct *
-get_files_struct (struct task_struct *task)
+get_files_struct(struct task_struct *task)
 {
struct files_struct *files;

-   task_lock (task);
+   task_lock(task);
files = task->files;
if (files)
-   atomic_inc (&files->count);
-   task_unlock (task);
+   atomic_inc(&files->count);
+   task_unlock(task);

return files;
 }
@@ -59,17 +59,15 @@ get_files_struct (struct task_struct *task)
  * put_files_struct is extracted from fs/file.c
  */
 void
-put_files_struct (struct files_struct *files)
+put_files_struct(struct files_struct *files)
 {
-   if (atomic_dec_and_test (&files->count))
-   {
-   BUG ();
-   }
+   if (atomic_dec_and_test(&files->count))
+   BUG();
 }


 static long
-eventfd_link_ioctl (struct file *f, unsigned int ioctl, unsigned long arg)
+eventfd_link_ioctl(struct file *f, unsigned int ioctl, unsigned long arg)
 {
void __user *argp = (void __user *) arg;
struct task_struct *task_target = NULL;
@@ -78,96 +76,88 @@ eventfd_link_ioctl (struct file *f, unsigned int ioctl, 
unsigned long arg)
struct fdtable *fdt;
struct eventfd_copy eventfd_copy;

-   switch (ioctl)
-   {
-   case EVENTFD_COPY:
-   if (copy_from_user (&eventfd_copy, argp, sizeof (struct 
eventfd_copy)))
-   return -EFAULT;
-
-   /*
-* Find the task struct for the target pid
-*/
-   task_target =
-   pid_task (find_vpid (eventfd_copy.target_pid), 
PIDTYPE_PID);
-   if (task_target == NULL)
-   {
-   printk (KERN_DEBUG "Failed to get mem ctx for 
target pid\n");
-   return -EFAULT;
-   }
-
-   files = get_files_struct (current);
-   if (files == NULL)
-   {
-

[dpdk-dev] [PATCH 2/2] lib/librte_vhost: printk->pr_debug

2014-11-06 Thread Huawei Xie
printk -> pr_debug

Signed-off-by: Huawei Xie 
---
 lib/librte_vhost/eventfd_link/eventfd_link.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/librte_vhost/eventfd_link/eventfd_link.c 
b/lib/librte_vhost/eventfd_link/eventfd_link.c
index 542ec2c..7755dd6 100644
--- a/lib/librte_vhost/eventfd_link/eventfd_link.c
+++ b/lib/librte_vhost/eventfd_link/eventfd_link.c
@@ -88,13 +88,13 @@ eventfd_link_ioctl(struct file *f, unsigned int ioctl, 
unsigned long arg)
task_target =
pid_task(find_vpid(eventfd_copy.target_pid), 
PIDTYPE_PID);
if (task_target == NULL) {
-   printk(KERN_DEBUG "Failed to get mem ctx for target 
pid\n");
+   pr_debug("Failed to get mem ctx for target pid\n");
return -EFAULT;
}

files = get_files_struct(current);
if (files == NULL) {
-   printk(KERN_DEBUG "Failed to get files struct\n");
+   pr_debug("Failed to get files struct\n");
return -EFAULT;
}

@@ -109,7 +109,7 @@ eventfd_link_ioctl(struct file *f, unsigned int ioctl, 
unsigned long arg)
put_files_struct(files);

if (file == NULL) {
-   printk(KERN_DEBUG "Failed to get file from source 
pid\n");
+   pr_debug("Failed to get file from source pid\n");
return 0;
}

@@ -128,7 +128,7 @@ eventfd_link_ioctl(struct file *f, unsigned int ioctl, 
unsigned long arg)

files = get_files_struct(task_target);
if (files == NULL) {
-   printk(KERN_DEBUG "Failed to get files struct\n");
+   pr_debug("Failed to get files struct\n");
return -EFAULT;
}

@@ -143,7 +143,7 @@ eventfd_link_ioctl(struct file *f, unsigned int ioctl, 
unsigned long arg)
put_files_struct(files);

if (file == NULL) {
-   printk(KERN_DEBUG "Failed to get file from target 
pid\n");
+   pr_debug("Failed to get file from target pid\n");
return 0;
}

-- 
1.8.1.4



[dpdk-dev] [PATCH v2] eal: add option --master-lcore

2014-11-06 Thread Simon Kuenzer

Thanks, that was quick! Quicker than me, actually. ;-)

I started to rebase and modify my previous patch as well but I got a bit 
stuck because I wanted to try to solve the command order issue that 
Aaron mentioned. In fact, I agree with him on this point. However, I 
figured out that more preceding patches might be required to get this 
done properly.
The issue is that I would like to have all code that changes the master 
lcore placed completely in eal_common_options.c.
Something like eal_common_sanity_check() and eal_common_configure() 
functions would be needed because eal_parse_common_option() is just 
called within the option parsing loop. Maybe we can improve this later?

I am fine with this patch but I have some comments inlined.
In general: acknowledged from my side.

Thanks a lot,

Simon

On 04.11.2014 22:40, Thomas Monjalon wrote:
> From: Simon Kuenzer 
>
> Enable users to specify the lcore id that is used as master lcore.
>
> Signed-off-by: Simon Kuenzer 
> Signed-off-by: Thomas Monjalon 
> ---
>
> changes in v2:
> - rebase on HEAD including common options for BSD and Linux
> - use strtol() instead of atoi() to check syntax errors
> - unit tests
>
>   app/test/test.c |  1 +
>   app/test/test_eal_flags.c   | 49 
> +
>   lib/librte_eal/bsdapp/eal/eal.c |  7 +
>   lib/librte_eal/common/eal_common_options.c  | 31 ++
>   lib/librte_eal/common/include/eal_options.h |  2 ++
>   lib/librte_eal/linuxapp/eal/eal.c   |  7 +
>   6 files changed, 97 insertions(+)
>
> diff --git a/app/test/test.c b/app/test/test.c
> index 9bee6bb..2fecff5 100644
> --- a/app/test/test.c
> +++ b/app/test/test.c
> @@ -82,6 +82,7 @@ do_recursive_call(void)
>   } actions[] =  {
>   { "run_secondary_instances", test_mp_secondary },
>   { "test_missing_c_flag", no_action },
> + { "test_master_lcore_flag", no_action },
>   { "test_missing_n_flag", no_action },
>   { "test_no_hpet_flag", no_action },
>   { "test_whitelist_flag", no_action },
> diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
> index 21e6cca..45020b8 100644
> --- a/app/test/test_eal_flags.c
> +++ b/app/test/test_eal_flags.c
> @@ -520,6 +520,49 @@ test_missing_c_flag(void)
>   }
>
>   /*
> + * Test --master-lcore option with matching coremask
> + */
> +static int
> +test_master_lcore_flag(void)
> +{
> +#ifdef RTE_EXEC_ENV_BSDAPP
> + /* BSD target doesn't support prefixes at this point */
> + const char * prefix = "";
> +#else
> + char prefix[PATH_MAX], tmp[PATH_MAX];
> + if (get_current_prefix(tmp, sizeof(tmp)) == NULL) {
> + printf("Error - unable to get current prefix!\n");
> + return -1;
> + }
> + snprintf(prefix, sizeof(prefix), "--file-prefix=%s", tmp);
> +#endif
> +
> + /* --master-lcore flag but no value */
> + const char *argv1[] = { prgname, prefix, mp_flag, "-n", "1", "-c", "3", 
> "--master-lcore"};
> + /* --master-lcore flag with invalid value */
> + const char *argv2[] = { prgname, prefix, mp_flag, "-n", "1", "-c", "3", 
> "--master-lcore", "-1"};
> + /* --master-lcore flag with invalid value */
> + const char *argv3[] = { prgname, prefix, mp_flag, "-n", "1", "-c", "3", 
> "--master-lcore", "X"};
> + /* master lcore not in coremask */
> + const char *argv4[] = { prgname, prefix, mp_flag, "-n", "1", "-c", "3", 
> "--master-lcore", "2"};
> + /* valid value */
> + const char *argv5[] = { prgname, prefix, mp_flag, "-n", "1", "-c", "3", 
> "--master-lcore", "1"};
> +
> + if (launch_proc(argv1) == 0
> + || launch_proc(argv2) == 0
> + || launch_proc(argv3) == 0
> + || launch_proc(argv4) == 0) {
> + printf("Error - process ran without error with wrong 
> --master-lcore\n");
> + return -1;
> + }
> + if (launch_proc(argv5) != 0) {
> + printf("Error - process did not run ok with valid 
> --master-lcore\n");
> + return -1;
> + }
> + return 0;
> +}
> +
> +/*
>* Test that the app doesn't run without the -n flag. In all cases
>* should give an error and fail to run.
>* Since -n is not compulsory for MP, we instead use --no-huge and 
> --no-shconf
> @@ -1214,6 +1257,12 @@ test_eal_flags(void)
>   return ret;
>   }
>
> + ret = test_master_lcore_flag();
> + if (ret < 0) {
> + printf("Error in test_master_lcore_flag()\n");
> + return ret;
> + }
> +
>   ret = test_missing_n_flag();
>   if (ret < 0) {
>   printf("Error in test_missing_n_flag()\n");
> diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
> index ca99cb9..c764fec 100644
> --- a/lib/librte_eal/bsdapp/eal/eal.c
> +++ b/lib/librte_eal/bsdapp/

[dpdk-dev] [PATCH] lib/librte_pmd_i40e: i40e vlan filter set fix

2014-11-06 Thread Xie, Huawei
Thomas, comments for this patch?

> -Original Message-
> From: Xie, Huawei
> Sent: Saturday, September 27, 2014 10:49 PM
> To: dev at dpdk.org
> Cc: Xie, Huawei; Chen, Jing D; Zhang, Helin
> Subject: [PATCH] lib/librte_pmd_i40e: i40e vlan filter set fix
> 
> the right shift bits should be 5 rather than 4.
> vid_idx = (uint32_t) ((vlan_id >> 5 ) & 0x7F)
> 
> Signed-off-by: Huawei Xie 
> CC: Jing Chen 
> CC: Helin Zhang 
> 
> ---
>  lib/librte_pmd_i40e/i40e_ethdev.c | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c
> b/lib/librte_pmd_i40e/i40e_ethdev.c
> index 9009bd4..9c9d831 100644
> --- a/lib/librte_pmd_i40e/i40e_ethdev.c
> +++ b/lib/librte_pmd_i40e/i40e_ethdev.c
> @@ -3786,14 +3786,11 @@ i40e_set_vlan_filter(struct i40e_vsi *vsi,
>  {
>   uint32_t vid_idx, vid_bit;
> 
> -#define UINT32_BIT_MASK  0x1F
> -#define VALID_VLAN_BIT_MASK  0xFFF
>   /* VFTA is 32-bits size array, each element contains 32 vlan bits, Find 
> the
>*  element first, then find the bits it belongs to
>*/
> - vid_idx = (uint32_t) ((vlan_id & VALID_VLAN_BIT_MASK) >>
> -   sizeof(uint32_t));
> - vid_bit = (uint32_t) (1 << (vlan_id & UINT32_BIT_MASK));
> + vid_idx = (uint32_t) ((vlan_id >> 5 ) & 0x7F);
> + vid_bit = (uint32_t) (1 << (vlan_id & 0x1F));
> 
>   if (on)
>   vsi->vfta[vid_idx] |= vid_bit;
> --
> 1.8.1.4



[dpdk-dev] [PATCH v4 3/3] vhost: Check offset value

2014-11-06 Thread Ouyang, Changchun
Agree with Thomas! Small patches are easier to understand. 
Merging and mixing things together is not a good idea.

Changchun

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Thursday, November 6, 2014 1:01 AM
> To: Xie, Huawei
> Cc: dev at dpdk.org; Ouyang, Changchun
> Subject: Re: [dpdk-dev] [PATCH v4 3/3] vhost: Check offset value
> 
> 2014-11-05 16:52, Xie, Huawei:
> > Why don't we merge 1,2,3 patches?
> 
> Because it's simpler to understand small patches with a dedicated
> explanation in the commit log of each patch.
> Why do you want to merge them?
> 
> --
> Thomas


[dpdk-dev] [PATCH v4 7/8] ethdev: support of multiple sizes of redirection table

2014-11-06 Thread Zhang, Helin
Hi Thomas

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Thursday, November 6, 2014 4:53 AM
> To: Zhang, Helin
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 7/8] ethdev: support of multiple sizes of
> redirection table
> 
> 2014-10-31 17:03, Helin Zhang:
> >  #define ETH_RSS_RETA_SIZE_64  64
> >  #define ETH_RSS_RETA_SIZE_128 128
> >  #define ETH_RSS_RETA_SIZE_512 512
> 
> Are these values still needed?
They are widely used in the igb/ixgbe/i40e code and in app/testpmd. It is good 
to keep them there, though we could define them separately in each component. 
Keeping them is more convenient for PMDs and user applications.

> Why 256 is forbidden?
256 is not a valid table size for any currently supported NIC. If another or 
future NIC supports this size, it can be added later as needed.

> Maybe that some comments are needed here.
Comments might not be needed, as the names clearly say what they are. Did you 
mean that some other annotations should be added for these macros? I am open 
to adding any good annotations for them.

> 
> > +#define RTE_RETA_GROUP_SIZE   64



[dpdk-dev] bifurcated driver

2014-11-06 Thread Vincent JARDIN
+Or

On 05/11/2014 23:48, Zhou, Danny wrote:
> Hi Thomas,
>
> Thanks for sharing the links to ibverbs, I will take a close look at it and 
> compare it to bifurcated driver. My take
> after a rough review is that idea is very much similar, but bifurcated driver 
> implementation is generic for any
> Ethernet device based on existing af_packet mechanism, with extension of 
> exchanging the messages between
> user space and kernel space driver.
>
> I have an internal document that summarizes the pros and cons of the 
> solutions below, except for ibverbs, which I will be adding shortly.
>
> - igb_uio
> - uio_pci_generic
> - VFIO
> - bifurcated driver
>
> Short answers to your questions:
>>  - upstream status
> Adding IOMMU based memory protection and generic descriptor description 
> support now, into version 2
> kernel patches.
>
>>  - usable with kernel netdev
> af_packet based, and relevant patchset will be submitted to netdev for sure.
>
>>  - usable in a vm
> No, it does not coexist with SRIOV for a number of reasons. But if you 
> pass through a PF to a VM, it works perfectly.
>
>>  - usable for Ethernet
> It could work with all Ethernet NICs, as flow director is available and NIC 
> driver support new net_ops to split off
> queue pairs for user space.
>
>>  - hardware requirements
> No specific hardware requirements. All mainstream NICs have multiple qpairs 
> and flow director support.
>
>>  - security protection
> Leverage IOMMU to provide memory protection on Intel platform. Other archs 
> provide similar memory protection
> mechanism, so we only use arch-agnostic DMA memory allocation APIs in kernel 
> to support memory protection.
>
>>  - performance
> DPDK native performance on user space queues, as long as drop_en is enabled 
> to avoid head-of-line blocking.
>
> -Danny
>
>> -Original Message-
>> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
>> Sent: Wednesday, November 05, 2014 9:01 PM
>> To: Zhou, Danny
>> Cc: dev at dpdk.org; Fastabend, John R
>> Subject: Re: [dpdk-dev] bifurcated driver
>>
>> Hi Danny,
>>
>> 2014-10-31 17:36, O'driscoll, Tim:
>>> Bifurcated Driver (Danny.Zhou at intel.com)
>>
>> Thanks for the presentation of bifurcated driver during the community call.
>> I asked if you looked at ibverbs and you wanted a link to check.
>> The kernel module is here:
>>  
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/core
>> The userspace library:
>>  http://git.kernel.org/cgit/libs/infiniband/libibverbs.git
>>
>> Extract from Kconfig:
>> "
>> config INFINIBAND_USER_ACCESS
>>  tristate "InfiniBand userspace access (verbs and CM)"
>>  select ANON_INODES
>>  ---help---
>>Userspace InfiniBand access support.  This enables the
>>kernel side of userspace verbs and the userspace
>>communication manager (CM).  This allows userspace processes
>>to set up connections and directly access InfiniBand
>>hardware for fast-path operations.  You will also need
>>libibverbs, libibcm and a hardware driver library from
>>.
>> "
>>
>> It seems to be close to the bifurcated driver needs.
>> Not sure if it can solve the security issues if there is no dedicated MMU
>> in the NIC.
>>
>> I feel we should sum up pros and cons of
>>  - igb_uio
>>  - uio_pci_generic
>>  - VFIO
>>  - ibverbs
>>  - bifurcated driver
>> I suggest considering these criteria:
>>  - upstream status
>>  - usable with kernel netdev
>>  - usable in a vm
>>  - usable for ethernet
>>  - hardware requirements
>>  - security protection
>>  - performance
>>
>> --
>> Thomas



[dpdk-dev] Could virtio-net-pmd co-exist with virtio-net.ko?

2014-11-06 Thread GongJinrong
Hi, Guys

 When I run virtio-net-pmd in a VM, I get a "virtio-net device is already
used by another driver" error message. After I removed virtio-net.ko it
worked, but now I cannot use the virtio-net driver for another virtual NIC,
so normal (non-DPDK) network performance drops a lot. Can virtio-net-pmd
co-exist with the standard virtio-net driver?

BR
John Gong


[dpdk-dev] [PATCH] librte_vhost: Fix the path test issue

2014-11-06 Thread Xie, Huawei
>   path = realpath(memfile, resolved_path);
> - if (path == NULL) {
> + if ((path == NULL) && (strlen(resolved_path) == 0)) {
>   RTE_LOG(ERR, VHOST_CONFIG,
>   "(%"PRIu64") Failed to resolve fd directory\n",
>   dev->device_fh);
Changchun:
According to the API description, resolved_path is undefined when realpath() 
fails (for some strange file, say), so we shouldn't check it.
To make the loop go on, we could use "continue" when we detect path is NULL.

RETURN VALUE
   If there is no error, realpath() returns a pointer to the resolved_path.

   Otherwise it returns a NULL pointer, and the contents of the array 
   resolved_path are undefined, and errno is set to indicate the error.



[dpdk-dev] [PATCH v5 2/5] ethdev: add enum type and relevant structures for hash filter control

2014-11-06 Thread Zhang, Helin
Thanks for your good comments!

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Monday, November 3, 2014 3:57 PM
> To: Zhang, Helin
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v5 2/5] ethdev: add enum type and relevant
> structures for hash filter control
> 
> 2014-10-21 11:14, Helin Zhang:
> > +enum rte_eth_hash_filter_info_type {
> > +   RTE_ETH_HASH_FILTER_INFO_TYPE_UNKNOWN = 0,
> > +   RTE_ETH_HASH_FILTER_INFO_TYPE_SYM_HASH_ENA_PER_PCTYPE,
> 
> PCTYPE is an unknown word in the API layer.
> Could you replace it by something more generic?
So you suggest removing the words PCTYPE, pctype, or packet classification 
type?
I am not trying to rename ETH_RSS_IPV4_SHIFT, ..., ETH_RSS_NONF_IPV4_UDP_SHIFT, 
..., etc.
They are actually pctype in i40e, and not only for RSS. I would like to rename 
them to generic names. Any good naming ideas from you or others? My idea is to 
rename them like RTE_ETH_FLOW_TYPE_XX.

> 
> > +   RTE_ETH_HASH_FILTER_INFO_TYPE_SYM_HASH_ENA_PER_PORT,
> > +   RTE_ETH_HASH_FILTER_INFO_TYPE_FILTER_SWAP,
> > +   RTE_ETH_HASH_FILTER_INFO_TYPE_HASH_FUNCTION,
> > +   RTE_ETH_HASH_FILTER_INFO_TYPE_MAX,
> > +};
> 
> You should comment each constant.
OK. Good to know that!

> 
> > +struct rte_eth_sym_hash_ena_info {
> > +   /**< packet classification type, defined in rte_ethdev.h */
> > +   uint8_t pctype;
> 
> No, PCTYPE is not anymore defined in ethdev.
We need a generic name for that, how about flow_type? Good comments are welcome!

> 
> > +/**
> > + * A structure used to set or get filter swap information, to support
> > + * 'RTE_ETH_FILTER_HASH', 'RTE_ETH_FILTER_GET/RTE_ETH_FILTER_SET',
> > + * with information type 'RTE_ETH_HASH_FILTER_INFO_TYPE_FILTER_SWAP'.
> > + */
> > +struct rte_eth_filter_swap_info {
> > +   /**< Packet classification type, defined in rte_ethdev.h */
> > +   uint8_t pctype;
> > +   /**< Offset of the 1st field of the 1st couple to be swapped. */
> > +   uint8_t off0_src0;
> > +   /**< Offset of the 2nd field of the 1st couple to be swapped. */
> > +   uint8_t off0_src1;
> > +   /**< Field length of the first couple. */
> > +   uint8_t len0;
> > +   /**< Offset of the 1st field of the 2nd couple to be swapped. */
> > +   uint8_t off1_src0;
> > +   /**< Offset of the 2nd field of the 2nd couple to be swapped. */
> > +   uint8_t off1_src1;
> > +   /**< Field length of the second couple. */
> > +   uint8_t len1;
> > +};
> 
> I guess it would be easier to understand if
> RTE_ETH_HASH_FILTER_INFO_TYPE_FILTER_SWAP was defined previously.
It has already been defined before this structure definition.
I don't think I have understood your idea. Could you help to explain more? 
Thanks!

> 
> --
> Thomas

Regards,
Helin


[dpdk-dev] bifurcated driver

2014-11-06 Thread Zhou, Danny
I roughly read the libibverbs-related code and the relevant InfiniBand/RDMA
documents, and found that although many concepts in libibverbs look similar to
the bifurcated driver, there are still lots of differences, as illustrated
below based on my understanding:

1) Queue pairs defined in the RDMA specification are an abstract concept,
  whereas the queue pairs referred to in the bifurcated driver are rx/tx queue
  pairs in the NIC.
2) The bifurcated PMD in DPDK directly accesses NIC resources as a slave driver
  (no NIC control), while libibverbs, as a user space library rather than a
  driver, offloads certain operations to the kernel driver and NIC by invoking
  "verbs" APIs.
3) Libibverbs invokes InfiniBand-specific system calls to allow user/kernel
  space communication based on "verbs" defined in the InfiniBand/RDMA spec,
  while the bifurcated driver builds on top of the af_packet module and new
  socket options to do operations such as hw queue split-off, mapping certain
  pages of I/O space to user space, etc.
4) There is a specific embedded MMU in InfiniBand/RDMA hardware to provide
  memory protection, while the bifurcated driver uses the IOMMU rather than
  the NIC to provide memory protection.

IMHO, libibverbs and the corresponding kernel modules/drivers are specifically
designed and implemented for direct access to RDMA hardware from userspace.
They depend heavily on the "verbs"-related system calls supported by the
InfiniBand/RDMA mechanism in the kernel, rather than on the netdev mechanism
that the bifurcated driver solution depends on.

> -Original Message-
> From: Vincent JARDIN [mailto:vincent.jardin at 6wind.com]
> Sent: Thursday, November 06, 2014 9:31 AM
> To: Zhou, Danny
> Cc: Thomas Monjalon; dev at dpdk.org; Fastabend, John R; Or Gerlitz
> Subject: Re: [dpdk-dev] bifurcated driver
> 
> +Or
> 
> On 05/11/2014 23:48, Zhou, Danny wrote:
> > Hi Thomas,
> >
> > Thanks for sharing the links to ibverbs, I will take a close look at it and 
> > compare it to bifurcated driver. My take
> > after a rough review is that idea is very much similar, but bifurcated 
> > driver implementation is generic for any
> > Ethernet device based on existing af_packet mechanism, with extension of 
> > exchanging the messages between
> > user space and kernel space driver.
> >
> > I have an internal document to summary the pros and cons of below 
> > solutions, except for ibvers, but
> > will be adding it shortly.
> >
> > - igb_uio
> > - uio_pci_generic
> > - VFIO
> > - bifurcated driver
> >
> > Short answers to your questions:
> >>- upstream status
> > Adding IOMMU based memory protection and generic descriptor description 
> > support now, into version 2
> > kernel patches.
> >
> >>- usable with kernel netdev
> > af_packet based, and relevant patchset will be submitted to netdev for sure.
> >
> >>- usable in a vm
> > No, it does no coexist with SRIOV for number of reasons. but if you 
> > pass-through a PF to a VM, it works perfect.
> >
> >>- usable for Ethernet
> > It could work with all Ethernet NICs, as flow director is available and NIC 
> > driver support new net_ops to split off
> > queue pairs for user space.
> >
> >>- hardware requirements
> > No specific hardware requirements. All mainstream NICs have multiple qpairs 
> > and flow director support.
> >
> >>- security protection
> > Leverage IOMMU to provide memory protection on Intel platform. Other archs 
> > provide similar memory protection
> > mechanism, so we only use arch-agnostic DMA memory allocation APIs in 
> > kernel to support memory protection.
> >
> >>- performance
> > DPDK native performance on user space queues, as long as drop_en is enabled 
> > to avoid head-of-line blocking.
> >
> > -Danny
> >
> >> -Original Message-
> >> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> >> Sent: Wednesday, November 05, 2014 9:01 PM
> >> To: Zhou, Danny
> >> Cc: dev at dpdk.org; Fastabend, John R
> >> Subject: Re: [dpdk-dev] bifurcated driver
> >>
> >> Hi Danny,
> >>
> >> 2014-10-31 17:36, O'driscoll, Tim:
> >>> Bifurcated Driver (Danny.Zhou at intel.com)
> >>
> >> Thanks for the presentation of bifurcated driver during the community call.
> >> I asked if you looked at ibverbs and you wanted a link to check.
> >> The kernel module is here:
> >>
> >> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/core
> >> The userspace library:
> >>http://git.kernel.org/cgit/libs/infiniband/libibverbs.git
> >>
> >> Extract from Kconfig:
> >> "
> >> config INFINIBAND_USER_ACCESS
> >>tristate "InfiniBand userspace access (verbs and CM)"
> >>select ANON_INODES
> >>---help---
> >>  Userspace InfiniBand access support.  This enables the
> >>  kernel side of userspace verbs and the userspace
> >>  communication manager (CM).  This allows userspace processes
> >>  to set up connections and directly access InfiniBand
> >>  hardware for fast-path operations.  You will also need
> >>  libibverbs, libibcm and a har

[dpdk-dev] [PATCH 0/2] lib/librte_vhost: coding style fixes

2014-11-06 Thread Huawei Xie
This patchset fixes serious coding style issues in the vhost library.

Huawei Xie (2):
  fix alignment, lengthy lines, misordered type and other style issues 
  printk -> pr_debug

 lib/librte_vhost/eventfd_link/eventfd_link.c | 244 ++---
 lib/librte_vhost/eventfd_link/eventfd_link.h | 127 ++-
 lib/librte_vhost/rte_virtio_net.h|   3 +-
 lib/librte_vhost/vhost-net-cdev.c| 187 +---
 lib/librte_vhost/vhost_rxtx.c|  13 +-
 lib/librte_vhost/virtio-net.c| 317 +--
 6 files changed, 494 insertions(+), 397 deletions(-)

-- 
1.8.1.4



[dpdk-dev] [PATCH] librte_vhost: Fix the path test issue

2014-11-06 Thread Ouyang, Changchun
Hi Huawei,
Thanks for the comments.
My responses are as follows.

> -Original Message-
> From: Xie, Huawei
> Sent: Thursday, November 6, 2014 10:39 AM
> To: Ouyang, Changchun; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] librte_vhost: Fix the path test issue
> 
> > path = realpath(memfile, resolved_path);
> > -   if (path == NULL) {
> > +   if ((path == NULL) && (strlen(resolved_path) == 0)) {
> > RTE_LOG(ERR, VHOST_CONFIG,
> > "(%"PRIu64") Failed to resolve fd directory\n",
> > dev->device_fh);
> Changchun:
> For some strange file, according to API description, we shouldn't check
> resolved_path as it is undefined.
> To make the loop go on, we could use "continue" when we detect path is
> NULL.
> 
> RETURN VALUE
>If there is no error, realpath() returns a pointer to the 
> resolved_path.
> 
>Otherwise it returns a NULL pointer, and the contents of the array
> resolved_path are undefined, and errno is set to indicate the error.

After investigating this issue, I found that using 'continue' doesn't work.

The reason is that procmap.fname itself is
"/dev/hugepages/qemu_back_mem.pc.ram.zxfqLq".
It is not a normal path, so in this case path is NULL, while resolved_path is
/dev/hugepages/qemu_back_mem.pc.ram.zxfqLq

If 'continue' is used, procmap.fname cannot be matched in the directory list,
and the app then exits after reporting: "Failed to find memory file for pid"

So I have to keep it.

Thanks again
Changchun



[dpdk-dev] [PATCH v5 1/5] i40e: Use constant random hash keys

2014-11-06 Thread Zhang, Helin


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Monday, November 3, 2014 4:59 PM
> To: Zhang, Helin
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v5 1/5] i40e: Use constant random hash keys
> 
> 2014-11-03 08:18, Zhang, Helin:
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > The title is a bit surprising:
> > > - it should be about RSS
> >
> > RSS makes use of hash function to route received packets, though hash
> > function can be used for other cases, e.g. Flow director.
> 
> Yes but this patch is only changing rss_key_default so I guess it's only 
> related to
> RSS, right?
Yes, it is related to RSS.

> 
> > > - a constant cannot be really random ;)
> >
> > The hash keys are generated by libc random function.
> > It is preparatory to avoid calling random function for each port.
> 
> Here, you remove the call to rte_rand by a constant value.
Yes. The hardware just needs some random keys by default; the actual values
do not matter. So a constant array of pre-generated random data is better: it
needs no CPU cycles at runtime, and it is safer in multi-threaded environments.
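
A minimal sketch of the idea, assuming a key of 13 32-bit words. The word
count, the helper name rss_default_key_get() and the key values below are all
illustrative assumptions, not the contents of the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Key length in 32-bit words; 13 is an assumption made for this sketch. */
#define RSS_KEY_WORDS 13

/*
 * Pre-generated default key. The values are placeholders for illustration.
 * Being const, the table is never written at runtime, so every port and
 * every thread reads the same data without any synchronization.
 */
static const uint32_t rss_key_default[RSS_KEY_WORDS] = {
	0x3c1a9f52, 0x7be04d16, 0x05d3a8c7, 0x9a6e21fb,
	0x48f70b3d, 0xe2159c60, 0x1d8b47a9, 0xb3602ef4,
	0x6fa5d018, 0x24c97e83, 0xd10e536a, 0x8b72f4c5,
	0x5e3906bd,
};

/* Hypothetical helper: copy the default key into a caller buffer. */
static void rss_default_key_get(uint32_t *key, size_t words)
{
	assert(key != NULL && words <= RSS_KEY_WORDS);
	memcpy(key, rss_key_default, words * sizeof(uint32_t));
}
```

Two ports calling the helper get identical keys, which is the point of
replacing the per-port call to a random function.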

> 
> > > 2014-10-21 11:14, Helin Zhang:
> > > > To be simpler, and remove the race condition, it uses prepared
> > > > constant random hash keys to replace runtime generating the hash keys.
> > >
> > > Could you explain what is the role of rss_key_default?
> >
> > Hash function needs to be configured with keys, before end users
> > configured them with specific keys, we need to provide a default keys
> > which is generated by libc random function.
> > The random keys can get the hash function to route the received
> > packets to all the queues well-proportioned.

Regards,
Helin


[dpdk-dev] bifurcated driver

2014-11-06 Thread Alex Markuze
Danny sums up the issue perfectly IMHO.
While both verbs and DPDK aim to provide generic user space networking, the
similarities end there.
verbs and RDMA HW are closely coupled and behave differently than standard
eth NICs, and are not related to netdev mechanisms.

Or, welcome to this discussion.

Those interested can read the IB specs (1K+ pages) available from
openfabrics*.
*https://www.openfabrics.org/index.php




On Thu, Nov 6, 2014 at 6:45 AM, Zhou, Danny  wrote:

> I roughly read libibverbs related code and relevant infiniband/rdma
> documents, and found though
> many concepts in libibverbs looks similar to bifurcated driver, but there
> are still lots of differences as
> illustrated below based on my understanding:
>
> 1) Queue pair defined in RDMA specification are abstract concept, where
> the queue pairs term used in
>   bifurcated driver are rx/tx queue pairs in the NIC.
> 2) Bifurcated PMD in DPDK directly access NIC resources as a slave driver
> (no NIC control), while libibverbs
>   as a user space library rather than driver offloads certain operations
> to kernel driver and NIC by invoking
>   "verbs" APIs.
> 3) Libibverbs invokes infiniband specific system calls to allow
> user/kernel space communication based on
>   "verbs" defined in infiniband/RDMA spec, while bifurcated driver build
> on top of af_packet module
>   and new socket options to do things like hw queue split-off , map
> certain pages on I/O space to user space
>   operations, etc.
> 4) There is a specific embedded MMU unit in Infiniband/RDMA to provides
> memory protection, while
>   bifurcated driver uses IOMMU rather than NIC to provide memory
> protection.
>
> IMHO, libibverbs and corresponding kernel modules/drivers are specifically
> designed and implemented for
> direct access to RDMA hardware from userspace, and it highly depends on
> "verbs" related system calls
> supported by infiniband/rdma mechanism in kernel, rather than netdev
> mechanism that bifurcated driver
> solution depends on.
>
> > -Original Message-
> > From: Vincent JARDIN [mailto:vincent.jardin at 6wind.com]
> > Sent: Thursday, November 06, 2014 9:31 AM
> > To: Zhou, Danny
> > Cc: Thomas Monjalon; dev at dpdk.org; Fastabend, John R; Or Gerlitz
> > Subject: Re: [dpdk-dev] bifurcated driver
> >
> > +Or
> >
> > On 05/11/2014 23:48, Zhou, Danny wrote:
> > > Hi Thomas,
> > >
> > > Thanks for sharing the links to ibverbs, I will take a close look at
> it and compare it to bifurcated driver. My take
> > > after a rough review is that idea is very much similar, but bifurcated
> driver implementation is generic for any
> > > Ethernet device based on existing af_packet mechanism, with extension
> of exchanging the messages between
> > > user space and kernel space driver.
> > >
> > > I have an internal document to summary the pros and cons of below
> solutions, except for ibvers, but
> > > will be adding it shortly.
> > >
> > > - igb_uio
> > > - uio_pci_generic
> > > - VFIO
> > > - bifurcated driver
> > >
> > > Short answers to your questions:
> > >>- upstream status
> > > Adding IOMMU based memory protection and generic descriptor
> description support now, into version 2
> > > kernel patches.
> > >
> > >>- usable with kernel netdev
> > > af_packet based, and relevant patchset will be submitted to netdev for
> sure.
> > >
> > >>- usable in a vm
> > > No, it does no coexist with SRIOV for number of reasons. but if you
> pass-through a PF to a VM, it works perfect.
> > >
> > >>- usable for Ethernet
> > > It could work with all Ethernet NICs, as flow director is available
> and NIC driver support new net_ops to split off
> > > queue pairs for user space.
> > >
> > >>- hardware requirements
> > > No specific hardware requirements. All mainstream NICs have multiple
> qpairs and flow director support.
> > >
> > >>- security protection
> > > Leverage IOMMU to provide memory protection on Intel platform. Other
> archs provide similar memory protection
> > > mechanism, so we only use arch-agnostic DMA memory allocation APIs in
> kernel to support memory protection.
> > >
> > >>- performance
> > > DPDK native performance on user space queues, as long as drop_en is
> enabled to avoid head-of-line blocking.
> > >
> > > -Danny
> > >
> > >> -Original Message-
> > >> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > >> Sent: Wednesday, November 05, 2014 9:01 PM
> > >> To: Zhou, Danny
> > >> Cc: dev at dpdk.org; Fastabend, John R
> > >> Subject: Re: [dpdk-dev] bifurcated driver
> > >>
> > >> Hi Danny,
> > >>
> > >> 2014-10-31 17:36, O'driscoll, Tim:
> > >>> Bifurcated Driver (Danny.Zhou at intel.com)
> > >>
> > >> Thanks for the presentation of bifurcated driver during the community
> call.
> > >> I asked if you looked at ibverbs and you wanted a link to check.
> > >> The kernel module is here:
> > >>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/core
> > >> The userspace l

[dpdk-dev] Could virtio-net-pmd co-exist with virtio-net.ko?

2014-11-06 Thread Matthew Hall
On Thu, Nov 06, 2014 at 10:24:11AM +0800, GongJinrong wrote:
> Hi, Guys
> 
>  When I run virtio-net-pmd in VM, I got "virtio-net device is already
> used by another driver" error message, after I removed the virtio-net.ko, it
> worked, but now I cannot use the virio-net driver for another virtual NIC,
> this cost that normal network performance(non-DPDK application) drops a lot,
> could the virtio-net-pmd co-exist with standard virio-net driver?
> 
> BR
> John Gong

I have no proof it will work perfectly, as I never got to use the virtio PMDs 
because neither works in VirtualBox (developer-friendly / desktop 
virtualization).

But there is a script included in DPDK, dpdk_nic_bind.py, which should let you 
configure this more intelligently on a per-VNIC basis. You could try something 
similar to this:

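# build_directory and PCI_ID are placeholders: set them for your environment.
# PCI_ID is the PCI address of the VNIC you want to hand over to DPDK.
export RTE_SDK="${build_directory}/external/dpdk"
export RTE_TOOLS="${RTE_SDK}/tools"
export RTE_NIC_BIND="${RTE_TOOLS}/dpdk_nic_bind.py"

# Show the current binding, unbind the device, bind it to igb_uio, then
# check the result. Other VNICs keep their kernel driver.
"${RTE_NIC_BIND}" --status | fgrep "${PCI_ID}"
"${RTE_NIC_BIND}" -b none  "${PCI_ID}"
"${RTE_NIC_BIND}" -b igb_uio   "${PCI_ID}"
"${RTE_NIC_BIND}" --status | fgrep "${PCI_ID}"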

Good Luck!
Matthew.


[dpdk-dev] [PATCH v4 7/8] ethdev: support of multiple sizes of redirection table

2014-11-06 Thread Thomas Monjalon
2014-11-06 01:02, Zhang, Helin:
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > 2014-10-31 17:03, Helin Zhang:
> > >  #define ETH_RSS_RETA_SIZE_64  64
> > >  #define ETH_RSS_RETA_SIZE_128 128
> > >  #define ETH_RSS_RETA_SIZE_512 512
> > 
> > Are these values still needed?
> 
> It was widely used in igb/ixgbe/i40e code, and app/testpmd. It is good to be 
> kept there,
> though we can define them separately in each component. This would be more 
> convenient
> for PMDs and user applications.

If it should be used by applications, it must stay in ethdev.

> > Why 256 is forbidden?
> 
> 256 is not a valid table size of current supported NICs, for other/future NIC 
> which supports
> this size, it can be added later as needed.

The problem is that we don't know which value is supported for each driver.
You should add a comment like this:
/**@{
 * Some RSS RETA sizes may be not supported by some drivers.
 * Check in the PMD documentation.
 */
#define ETH_RSS_RETA_SIZE_64  64
#define ETH_RSS_RETA_SIZE_128 128
#define ETH_RSS_RETA_SIZE_512 512
/**@}*/

And then add some comments in the PMD to describe the supported sizes.

> > Maybe that some comments are needed here.
> 
> Comments might not be needed, as their names tell us what they are clearly. 
> Did you mean
> any other annotations to be added for these macros? I am open for that to add 
> any good
> annotations for them.

We just have to keep in mind that the API reference for users is in doxygen.
Some details are obvious for you but not clear for the user, especially if he
doesn't care about i40e.

Thanks
-- 
Thomas


[dpdk-dev] [PATCH] lib/librte_pmd_i40e: i40e vlan filter set fix

2014-11-06 Thread Thomas Monjalon
2014-11-06 00:22, Xie, Huawei:
> Thomas, comments for this patch?

No, but Jing Chen made a comment that you didn't reply to:
"Please try to use macro to replace numbers."
So I am waiting for a v2.

> > -Original Message-
> > From: Xie, Huawei
> > Sent: Saturday, September 27, 2014 10:49 PM
> > To: dev at dpdk.org
> > Cc: Xie, Huawei; Chen, Jing D; Zhang, Helin
> > Subject: [PATCH] lib/librte_pmd_i40e: i40e vlan filter set fix
> > 
> > the right shift bits should be 5 rather than 4.
> > vid_idx = (uint32_t) ((vlan_id >> 5 ) & 0x7F)
> > 
> > Signed-off-by: Huawei Xie 
> > CC: Jing Chen 
> > CC: Helin Zhang 
> > 
> > ---
> >  lib/librte_pmd_i40e/i40e_ethdev.c | 7 ++-
> >  1 file changed, 2 insertions(+), 5 deletions(-)
> > 
> > diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c
> > b/lib/librte_pmd_i40e/i40e_ethdev.c
> > index 9009bd4..9c9d831 100644
> > --- a/lib/librte_pmd_i40e/i40e_ethdev.c
> > +++ b/lib/librte_pmd_i40e/i40e_ethdev.c
> > @@ -3786,14 +3786,11 @@ i40e_set_vlan_filter(struct i40e_vsi *vsi,
> >  {
> > uint32_t vid_idx, vid_bit;
> > 
> > -#define UINT32_BIT_MASK  0x1F
> > -#define VALID_VLAN_BIT_MASK  0xFFF
> > /* VFTA is 32-bits size array, each element contains 32 vlan bits, Find 
> > the
> >  *  element first, then find the bits it belongs to
> >  */
> > -   vid_idx = (uint32_t) ((vlan_id & VALID_VLAN_BIT_MASK) >>
> > - sizeof(uint32_t));
> > -   vid_bit = (uint32_t) (1 << (vlan_id & UINT32_BIT_MASK));
> > +   vid_idx = (uint32_t) ((vlan_id >> 5 ) & 0x7F);
> > +   vid_bit = (uint32_t) (1 << (vlan_id & 0x1F));
> > 
> > if (on)
> > vsi->vfta[vid_idx] |= vid_bit;
> > --
> > 1.8.1.4



[dpdk-dev] [PATCH v5 2/5] ethdev: add enum type and relevant structures for hash filter control

2014-11-06 Thread Thomas Monjalon
2014-11-06 03:41, Zhang, Helin:
> > > +/**
> > > + * A structure used to set or get filter swap information, to support
> > > + * 'RTE_ETH_FILTER_HASH', 'RTE_ETH_FILTER_GET/RTE_ETH_FILTER_SET',
> > > + * with information type 'RTE_ETH_HASH_FILTER_INFO_TYPE_FILTER_SWAP'.
> > > + */
> > > +struct rte_eth_filter_swap_info {
> > > + /**< Packet classification type, defined in rte_ethdev.h */
> > > + uint8_t pctype;
> > > + /**< Offset of the 1st field of the 1st couple to be swapped. */
> > > + uint8_t off0_src0;
> > > + /**< Offset of the 2nd field of the 1st couple to be swapped. */
> > > + uint8_t off0_src1;
> > > + /**< Field length of the first couple. */
> > > + uint8_t len0;
> > > + /**< Offset of the 1st field of the 2nd couple to be swapped. */
> > > + uint8_t off1_src0;
> > > + /**< Offset of the 2nd field of the 2nd couple to be swapped. */
> > > + uint8_t off1_src1;
> > > + /**< Field length of the second couple. */
> > > + uint8_t len1;
> > > +};
> > 
> > I guess it would be easier to understand if
> > RTE_ETH_HASH_FILTER_INFO_TYPE_FILTER_SWAP was defined previously.
> 
> It has already been defined before this structure definition.
> I don't think I have understood your idea. Could you help to explain more? 
> Thanks!

By "defined", I mean "explained". What is the action of swap filter?
You offer new features in API without explaining them. It's probably obvious
for you but not for me.

-- 
Thomas


[dpdk-dev] [PATCH v4 7/8] ethdev: support of multiple sizes of redirection table

2014-11-06 Thread Zhang, Helin


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Thursday, November 6, 2014 4:33 PM
> To: Zhang, Helin
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 7/8] ethdev: support of multiple sizes of
> redirection table
> 
> 2014-11-06 01:02, Zhang, Helin:
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > 2014-10-31 17:03, Helin Zhang:
> > > >  #define ETH_RSS_RETA_SIZE_64  64
> > > >  #define ETH_RSS_RETA_SIZE_128 128
> > > >  #define ETH_RSS_RETA_SIZE_512 512
> > >
> > > Are these values still needed?
> >
> > It was widely used in igb/ixgbe/i40e code, and app/testpmd. It is good
> > to be kept there, though we can define them separately in each
> > component. This would be more convenient for PMDs and user applications.
> 
> If it should be used by applications, it must stay in ethdev.
Good to get it aligned with us.

> 
> > > Why 256 is forbidden?
> >
> > 256 is not a valid table size of current supported NICs, for
> > other/future NIC which supports this size, it can be added later as needed.
> 
> The problem is that we don't know which value is supported for each driver.
> You should add a comment like this:
> /**@{
>  * Some RSS RETA sizes may be not supported by some drivers.
>  * Check in the PMD documentation.
>  */
> #define ETH_RSS_RETA_SIZE_64  64
> #define ETH_RSS_RETA_SIZE_128 128
> #define ETH_RSS_RETA_SIZE_512 512
> /**@}*/
In rte_ethdev.h, there are comments for rte_eth_dev_rss_reta_update() and
rte_eth_dev_rss_reta_query() saying that the RETA size can be queried with
rte_eth_dev_info_get().
So end users can learn the RETA size of each NIC by reading its datasheet,
or by calling that function to query the size directly.
The macros defined here make the RETA sizes more straightforward and easy to use.
OK, it is good to add some annotations here. Thanks!
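
As a small sketch of how an application might use the macros, the check below
tests a table size (for example one reported by the device) against the three
named values. reta_size_is_known() is a hypothetical helper, not an ethdev
function, and which sizes a given driver really accepts still has to come from
that PMD:

```c
#include <assert.h>
#include <stdint.h>

#define ETH_RSS_RETA_SIZE_64  64
#define ETH_RSS_RETA_SIZE_128 128
#define ETH_RSS_RETA_SIZE_512 512

/*
 * Hypothetical helper: returns 1 if reta_size matches one of the sizes
 * named by the macros above, 0 otherwise. It says nothing about whether
 * a particular driver supports that size.
 */
static int reta_size_is_known(uint16_t reta_size)
{
	switch (reta_size) {
	case ETH_RSS_RETA_SIZE_64:
	case ETH_RSS_RETA_SIZE_128:
	case ETH_RSS_RETA_SIZE_512:
		return 1;
	default:
		return 0;
	}
}
```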

> 
> And then add some comments in the PMD to describe the supported sizes.
> 
> > > Maybe that some comments are needed here.
> >
> > Comments might not be needed, as their names tell us what they are
> > clearly. Did you mean any other annotations to be added for these
> > macros? I am open for that to add any good annotations for them.
> 
> We just have to keep in mind that the API reference for users is in doxygen.
> Some details are obvious for you but not clear for the user, especially if he
> doesn't care about i40e.
> 
> Thanks
> --
> Thomas

Regards,
Helin


[dpdk-dev] [PATCH v5 2/5] ethdev: add enum type and relevant structures for hash filter control

2014-11-06 Thread Zhang, Helin


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Thursday, November 6, 2014 4:43 PM
> To: Zhang, Helin
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v5 2/5] ethdev: add enum type and relevant
> structures for hash filter control
> 
> 2014-11-06 03:41, Zhang, Helin:
> > > > +/**
> > > > + * A structure used to set or get filter swap information, to support
> > > > + * 'RTE_ETH_FILTER_HASH', 'RTE_ETH_FILTER_GET/RTE_ETH_FILTER_SET',
> > > > + * with information type 'RTE_ETH_HASH_FILTER_INFO_TYPE_FILTER_SWAP'.
> > > > + */
> > > > +struct rte_eth_filter_swap_info {
> > > > +   /**< Packet classification type, defined in rte_ethdev.h */
> > > > +   uint8_t pctype;
> > > > +   /**< Offset of the 1st field of the 1st couple to be swapped. */
> > > > +   uint8_t off0_src0;
> > > > +   /**< Offset of the 2nd field of the 1st couple to be swapped. */
> > > > +   uint8_t off0_src1;
> > > > +   /**< Field length of the first couple. */
> > > > +   uint8_t len0;
> > > > +   /**< Offset of the 1st field of the 2nd couple to be swapped. */
> > > > +   uint8_t off1_src0;
> > > > +   /**< Offset of the 2nd field of the 2nd couple to be swapped. */
> > > > +   uint8_t off1_src1;
> > > > +   /**< Field length of the second couple. */
> > > > +   uint8_t len1;
> > > > +};
> > >
> > > I guess it would be easier to understand if
> > > RTE_ETH_HASH_FILTER_INFO_TYPE_FILTER_SWAP was defined previously.
> >
> > It has already been defined before this structure definition.
> > I don't think I have understood your idea. Could you help to explain more?
> > Thanks!
> 
> By "defined", I mean "explained". What is the action of swap filter?
> You offer new features in API without explaining them. It's probably obvious
> for you but not for me.
Yes, they should be explained.

> 
> --
> Thomas

Regards,
Helin


[dpdk-dev] bifurcated driver

2014-11-06 Thread Nicolas Dichtel
Also CC netdev, this thread may interest network folks.

On 06/11/2014 09:13, Alex Markuze wrote:
> Danny sums up the issue perfectly IMHO.
> While both verbs and DPDK aim to provide generic user space networking, the
> similarities end there.
> verbs and RDMA HW are closely coupled and behave differently then standard
> eth nics and are not related to netdev mechanisms.
>
> Or, welcome to this discussion.
>
> Those interested can read the IB spec's (+1K pages) available from
> openfabrics*.
> *https://www.openfabrics.org/index.php
>
>
>
>
> On Thu, Nov 6, 2014 at 6:45 AM, Zhou, Danny  wrote:
>
>> I roughly read libibverbs related code and relevant infiniband/rdma
>> documents, and found though
>> many concepts in libibverbs looks similar to bifurcated driver, but there
>> are still lots of differences as
>> illustrated below based on my understanding:
>>
>> 1) Queue pair defined in RDMA specification are abstract concept, where
>> the queue pairs term used in
>>bifurcated driver are rx/tx queue pairs in the NIC.
>> 2) Bifurcated PMD in DPDK directly access NIC resources as a slave driver
>> (no NIC control), while libibverbs
>>as a user space library rather than driver offloads certain operations
>> to kernel driver and NIC by invoking
>>"verbs" APIs.
>> 3) Libibverbs invokes infiniband specific system calls to allow
>> user/kernel space communication based on
>>"verbs" defined in infiniband/RDMA spec, while bifurcated driver build
>> on top of af_packet module
>>and new socket options to do things like hw queue split-off , map
>> certain pages on I/O space to user space
>>operations, etc.
>> 4) There is a specific embedded MMU unit in Infiniband/RDMA to provides
>> memory protection, while
>>bifurcated driver uses IOMMU rather than NIC to provide memory
>> protection.
>>
>> IMHO, libibverbs and corresponding kernel modules/drivers are specifically
>> designed and implemented for
>> direct access to RDMA hardware from userspace, and it highly depends on
>> "verbs" related system calls
>> supported by infiniband/rdma mechanism in kernel, rather than netdev
>> mechanism that bifurcated driver
>> solution depends on.
>>
>>> -Original Message-
>>> From: Vincent JARDIN [mailto:vincent.jardin at 6wind.com]
>>> Sent: Thursday, November 06, 2014 9:31 AM
>>> To: Zhou, Danny
>>> Cc: Thomas Monjalon; dev at dpdk.org; Fastabend, John R; Or Gerlitz
>>> Subject: Re: [dpdk-dev] bifurcated driver
>>>
>>> +Or
>>>
>>> On 05/11/2014 23:48, Zhou, Danny wrote:
>>>> Hi Thomas,
>>>>
>>>> Thanks for sharing the links to ibverbs, I will take a close look at
>> it and compare it to bifurcated driver. My take
>>>> after a rough review is that idea is very much similar, but bifurcated
>> driver implementation is generic for any
>>>> Ethernet device based on existing af_packet mechanism, with extension
>> of exchanging the messages between
>>>> user space and kernel space driver.
>>>>
>>>> I have an internal document to summary the pros and cons of below
>> solutions, except for ibvers, but
>>>> will be adding it shortly.
>>>>
>>>> - igb_uio
>>>> - uio_pci_generic
>>>> - VFIO
>>>> - bifurcated driver
>>>>
>>>> Short answers to your questions:
>>>>> - upstream status
>>>> Adding IOMMU based memory protection and generic descriptor
>> description support now, into version 2
>>>> kernel patches.
>>>>
>>>>> - usable with kernel netdev
>>>> af_packet based, and relevant patchset will be submitted to netdev for
>> sure.
>>>>
>>>>> - usable in a vm
>>>> No, it does no coexist with SRIOV for number of reasons. but if you
>> pass-through a PF to a VM, it works perfect.
>>>>
>>>>> - usable for Ethernet
>>>> It could work with all Ethernet NICs, as flow director is available
>> and NIC driver support new net_ops to split off
>>>> queue pairs for user space.
>>>>
>>>>> - hardware requirements
>>>> No specific hardware requirements. All mainstream NICs have multiple
>> qpairs and flow director support.
>>>>
>>>>> - security protection
>>>> Leverage IOMMU to provide memory protection on Intel platform. Other
>> archs provide similar memory protection
>>>> mechanism, so we only use arch-agnostic DMA memory allocation APIs in
>> kernel to support memory protection.
>>>>
>>>>> - performance
>>>> DPDK native performance on user space queues, as long as drop_en is
>> enabled to avoid head-of-line blocking.
>>>>
>>>> -Danny
>>>>
>>>>> -Original Message-
>>>>> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
>>>>> Sent: Wednesday, November 05, 2014 9:01 PM
>>>>> To: Zhou, Danny
>>>>> Cc: dev at dpdk.org; Fastabend, John R
>>>>> Subject: Re: [dpdk-dev] bifurcated driver
>>>>>
>>>>> Hi Danny,
>>>>>
>>>>> 2014-10-31 17:36, O'driscoll, Tim:
>>>>>> Bifurcated Driver (Danny.Zhou at intel.com)
>>>>>
>>>>> Thanks for the presentation of bifurcated driver during the community
>> call.
>>>>> I asked if you looked at ibverbs and you wanted a link to check.
>>>>> The kernel module 

[dpdk-dev] [PATCH] Add user defined tag calculation callback to librte_distributor.

2014-11-06 Thread Bruce Richardson
On Wed, Nov 05, 2014 at 07:24:13PM +0200, jigsaw wrote:
> Hi Bruce,
> 
> OK understood. Then there's no real need to make any change.
> But the question remains about this line:
> 
> http://dpdk.org/browse/dpdk/tree/lib/librte_distributor/rte_distributor.c#n285
> 
> new_tag = (next_mb->hash.rss | 1);
> 
> Why the logical OR is needed?

That's needed to ensure that we never track a tag with an actual value of zero.
We instead always force the low bit to be 1, so that we can use zero as an
"empty" value.

/Bruce

> 
> thx &
> rgds,
> 
> -qinglai
> 
> On Wed, Nov 5, 2014 at 6:36 PM, Bruce Richardson <bruce.richardson at intel.com>
> wrote:
> 
> > On Wed, Nov 05, 2014 at 05:11:51PM +0200, jigsaw wrote:
> > > Hi Bruce,
> > >
> > > Thanks for reply.
> > > The idea is triggered by real life use case, where the flow id is buried
> > in
> > > L3 payload. Deep packet inspection is one of the scenarios, tunneled pkts
> > > is another.
> > > However, only functionality is verified. Performance impact has not been
> > > checked yet.
> > >
> > > To add distributor and another void * as params is nice.
> > >
> > > Your advice of extract tags in a row inspired me another solution, which
> > is
> > > to change the union hash inside rte_mbuf:
> > >
> > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > index e8f9bfc..5b13c0b 100644
> > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > @@ -185,6 +185,7 @@ struct rte_mbuf {
> > > uint16_t id;
> > > > } fdir;   /**< Filter identifier if FDIR enabled */
> > > uint32_t sched;   /**< Hierarchical scheduler */
> > > +   uint32_t user;/**< User defined hash tag */
> > > } hash;   /**< hash information */
> > >
> > > /* second cache line - fields only used in slow path or on TX */
> > >
> > > The new union field user is actually for documentation purpose only, coz
> > > user application can set hash.rss value and have the same result.
> > > Therefore, the user application is free to calculate the tag in burst
> > mode
> > > before calling rte_distributor_process.
> > >
> > > Then rte_distributor_process needs to read next_mb->hash.user.
> > > > Does it sound better?
> >
> > What you propose is the exact original intent, though I did not try to add
> > a new union member purely for documentation purposes. I had planned, but
> > perhaps did not explain well enough, that the application would itself set
> > up
> > the tag as it thought best before passing packets to the distributor. I
> > suspect
> > that overloading the RSS field for this impeded that idea getting through.
> >
> > /Bruce
> >
> > >
> > > > I have another question: why is the bitwise OR with 1 applied to new_tag?
> > >
> > > thx &
> > > rgds,
> > > -qinglai
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Wed, Nov 5, 2014 at 4:27 PM, Bruce Richardson <
> > bruce.richardson at intel.com
> > > > wrote:
> > >
> > > > On Wed, Nov 05, 2014 at 03:30:37PM +0200, Qinglai Xiao wrote:
> > > > > User defined tag calculation has access to mbuf.
> > > > > Default tag is RSS hash result.
> > > > >
> > > >
> > > > Interesting idea.
> > > > Did you investigate was there any performance improvement or regression
> > > > comparing
> > > > whether the callback was called per-packet as packets were dequeued for
> > > > distribution
> > > > (i.e. how you have things now in your patch), compared to calling
> > > > the callback in a loop to extract the tags for all packets initially? I
> > > > suspect
> > > > there probably isn't much performance difference either way, but it
> > may be
> > > > worth
> > > > checking.
> > > > One other point, is that I think the callback to extract the tag should
> > > > have
> > > > additional parameters - at least one, if not two. I would suggest that
> > the
> > > > distributor pointer be passed in, as well as an arbitrary void *
> > pointer.
> > > >
> > > > Regards,
> > > > /Bruce
> > > >
> > > > > Signed-off-by: Qinglai Xiao 
> > > > > ---
> > > > >  app/test/test_distributor.c  |6 +++---
> > > > >  app/test/test_distributor_perf.c |2 +-
> > > > >  lib/librte_distributor/rte_distributor.c |   12 ++--
> > > > >  lib/librte_distributor/rte_distributor.h |7 ++-
> > > > >  4 files changed, 20 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/app/test/test_distributor.c
> > b/app/test/test_distributor.c
> > > > > index ce06436..6ea4943 100644
> > > > > --- a/app/test/test_distributor.c
> > > > > +++ b/app/test/test_distributor.c
> > > > > @@ -452,7 +452,7 @@ int test_error_distributor_create_name(void)
> > > > >   char *name = NULL;
> > > > >
> > > > >   d = rte_distributor_create(name, rte_socket_id(),
> > > > > - rte_lcore_count() - 1);
> > > > > + rte_lcore_count() - 1, NULL);
> > > > >   if (d != NULL || rte_errno != EINVAL) {
> > > > >   

[dpdk-dev] Could virtio-net-pmd co-exist with virtio-net.ko?

2014-11-06 Thread GongJinrong
Hi, Matthew, Thanks a lot, I will try it.

-Original Message-
From: Matthew Hall [mailto:mh...@mhcomputing.net] 
Sent: Thursday, November 06, 2014 4:21 PM
To: GongJinrong
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] Could virtio-net-pmd co-exist with virtio-net.ko?

On Thu, Nov 06, 2014 at 10:24:11AM +0800, GongJinrong wrote:
> Hi, Guys
> 
>  When I run virtio-net-pmd in a VM, I get a "virtio-net device is
> already used by another driver" error message. After I removed
> virtio-net.ko, it worked, but now I cannot use the virtio-net driver
> for another virtual NIC, which causes normal (non-DPDK) network
> performance to drop a lot. Could virtio-net-pmd co-exist with the
> standard virtio-net driver?
> 
> BR
> John Gong

I have no proof it will work perfectly, as I never got to use the virtio
PMDs because neither works in VirtualBox (developer-friendly / desktop
virtualization).

But there is a script included in DPDK, dpdk_nic_bind.py, which should let
you configure this more intelligently on a per-VNIC basis. You could try
something similar to this:

export RTE_SDK="${build_directory}/external/dpdk"
export RTE_TOOLS="${RTE_SDK}/tools"
export RTE_NIC_BIND="${RTE_TOOLS}/dpdk_nic_bind.py"

"${RTE_NIC_BIND}" --status | fgrep "${PCI_ID}"
"${RTE_NIC_BIND}" -b none  "${PCI_ID}"
"${RTE_NIC_BIND}" -b igb_uio   "${PCI_ID}"
"${RTE_NIC_BIND}" --status | fgrep "${PCI_ID}"

Good Luck!
Matthew.


[dpdk-dev] [PATCH] Add user defined tag calculation callback to librte_distributor.

2014-11-06 Thread jigsaw
OK understood. Thanks. -qinglai

On Thu, Nov 6, 2014 at 11:22 AM, Bruce Richardson <
bruce.richardson at intel.com> wrote:

> On Wed, Nov 05, 2014 at 07:24:13PM +0200, jigsaw wrote:
> > Hi Bruce,
> >
> > OK understood. Then there's no real need to make any change.
> > But the question remains about this line:
> >
> >
> http://dpdk.org/browse/dpdk/tree/lib/librte_distributor/rte_distributor.c#n285
> >
> > new_tag = (next_mb->hash.rss | 1);
> >
> > Why is the bitwise OR needed?
>
> That's needed to ensure that we never track a tag with an actual value of
> zero.
> We instead always force the low bit to be 1, so that we can use zero as an
> "empty" value.
>
> /Bruce
>

[dpdk-dev] [PATCH 0/2] lib/librte_vhost: coding style fixes

2014-11-06 Thread De Lara Guarch, Pablo
Hi Huawei,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Huawei Xie
> Sent: Thursday, November 06, 2014 4:46 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH 0/2] lib/librte_vhost: coding style fixes
> 
> This patchset fixes serious coding style issues in vhost library.
> 
> Huawei Xie (2):
>   fix alignment, lengthy lines, misordered type and other style issues
>   printk -> pr_debug
> 
>  lib/librte_vhost/eventfd_link/eventfd_link.c | 244 ++---
>  lib/librte_vhost/eventfd_link/eventfd_link.h | 127 ++-
>  lib/librte_vhost/rte_virtio_net.h|   3 +-
>  lib/librte_vhost/vhost-net-cdev.c| 187 +---
>  lib/librte_vhost/vhost_rxtx.c|  13 +-
>  lib/librte_vhost/virtio-net.c| 317 
> +--
>  6 files changed, 494 insertions(+), 397 deletions(-)
> 
> --
> 1.8.1.4

I assume this is the cover letter of the two patches that you sent 5 hours
before. You should probably send this again, as it seems to not be
connected to the patches
(look here: http://dpdk.org/ml/archives/dev/2014-November/)


Thanks,
Pablo



[dpdk-dev] [PULL REQUEST] doc: TestPMD Application User Guide.

2014-11-06 Thread Thomas Monjalon
> Bernard Iremonger (1):
>   doc: TestPMD Application User Guide

Pulled in the main repository.

This document should now be updated to reflect the recent changes to
testpmd.
From now on, patches for testpmd should include a doc update.

Thanks
-- 
Thomas


[dpdk-dev] [PATCH] Add user defined tag calculation callback to librte_distributor.

2014-11-06 Thread Thomas Monjalon
2014-11-06 09:22, Bruce Richardson:
> On Wed, Nov 05, 2014 at 07:24:13PM +0200, jigsaw wrote:
> > http://dpdk.org/browse/dpdk/tree/lib/librte_distributor/rte_distributor.c#n285
> > 
> > new_tag = (next_mb->hash.rss | 1);
> > 
> > Why is the bitwise OR needed?
> 
> That's needed to ensure that we never track a tag with an actual value of 
> zero.
> We instead always force the low bit to be 1, so that we can use zero as an
> "empty" value.

Bruce, could you check how this code may be better commented please?
This discussion shows that the distributor library probably needs more
explanations in the code or doxygen.
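
As an illustration of the kind of comment that could help, here is a minimal
standalone sketch of the tag trick discussed above. It is not the
rte_distributor code: TAG_EMPTY, make_tag() and find_tag() are hypothetical
names used only for this example, and the slot array stands in for whatever
per-worker tracking the library actually does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel: a slot holding 0 has no flow assigned to it. */
#define TAG_EMPTY 0u

/*
 * Derive a trackable tag from a 32-bit hash (for example the RSS hash).
 * The bitwise OR sets bit 0, so the result can never equal TAG_EMPTY.
 * The cost is that two hashes differing only in bit 0 share one tag.
 */
static inline uint32_t
make_tag(uint32_t hash)
{
	uint32_t tag = hash | 1;

	assert(tag != TAG_EMPTY);
	return tag;
}

/* Return the index of the slot tracking 'tag', or -1 if no slot does. */
static inline int
find_tag(const uint32_t *slots, int n, uint32_t tag)
{
	int i;

	for (i = 0; i < n; i++)
		if (slots[i] == tag)
			return i;
	return -1;
}
```

Because an empty slot and a real tag can never compare equal, the lookup
needs no separate "in use" flag per slot.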

Thanks
-- 
Thomas


[dpdk-dev] [RFC PATCH 0/7] lib/librte_vhost: Add vhost-user extension

2014-11-06 Thread Tetsuya Mukawa
Hi Xie,

Here are RFC patches to add vhost-user extension to librte_vhost.

It seems you are now merging a patch that fixes the coding style of
librte_vhost.
Unfortunately my patches are based on the latest tree, so I will submit
them again after your patch is acked.
Because of this, I haven't checked coding style strictly. When I
rebase on your new patch, I will check coding style too.

Anyway, could you please check the patches?

Thanks,
Tetsuya

Tetsuya Mukawa (7):
  lib/librte_vhost: Fix host_memory_map() to handle various memory
regions
  lib/librte_vhost: Add an abstraction layer for vhost backends
  lib/librte_vhost: Add an abstraction layer to interpret messages
  lib/librte_vhost: Move vhost vhost-cuse device list and accessor
functions
  lib/librte_vhost: Add a vhost session abstraction
  lib/librte_vhost: Add vhost-cuse/user specific initialization
  lib/librte_vhost: Add vhost-user implementation

 lib/librte_vhost/Makefile  |   2 +-
 lib/librte_vhost/rte_virtio_net.h  |  49 ++-
 lib/librte_vhost/vhost-net-cdev.c  |  29 +-
 lib/librte_vhost/vhost-net-cdev.h  | 113 ---
 lib/librte_vhost/vhost-net-user.c  | 541 ++
 lib/librte_vhost/vhost-net.c   | 132 
 lib/librte_vhost/vhost-net.h   | 127 +++
 lib/librte_vhost/vhost_rxtx.c  |   2 +-
 lib/librte_vhost/virtio-net-cdev.c | 624 ++
 lib/librte_vhost/virtio-net-user.c | 410 +++
 lib/librte_vhost/virtio-net.c  | 669 -
 11 files changed, 2032 insertions(+), 666 deletions(-)
 delete mode 100644 lib/librte_vhost/vhost-net-cdev.h
 create mode 100644 lib/librte_vhost/vhost-net-user.c
 create mode 100644 lib/librte_vhost/vhost-net.c
 create mode 100644 lib/librte_vhost/vhost-net.h
 create mode 100644 lib/librte_vhost/virtio-net-cdev.c
 create mode 100644 lib/librte_vhost/virtio-net-user.c

-- 
1.9.1



[dpdk-dev] [RFC PATCH 1/7] lib/librte_vhost: Fix host_memory_map() to handle various memory regions

2014-11-06 Thread Tetsuya Mukawa
Without this patch, host_memory_map() can only handle a region that
starts at the head of a guest physical memory mapping. The patch fixes
host_memory_map() so it also handles regions in the middle of that memory.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_vhost/virtio-net.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 8015dd8..9155a68 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -83,6 +83,7 @@ const uint32_t BUFSIZE = PATH_MAX;
 /* Structure containing information gathered from maps file. */
 struct procmap {
uint64_t va_start;   /* Start virtual address in file. */
+   uint64_t va_end;     /* End virtual address in file. */
uint64_t len;        /* Size of file. */
uint64_t pgoff;      /* Not used. */
uint32_t maj;        /* Not used. */
@@ -176,7 +177,7 @@ host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
return -1;
}

-   procmap.len = strtoull(in[1], &end, 16);
+   procmap.va_end = strtoull(in[1], &end, 16);
if ((in[1] == '\0') || (end == NULL) || (*end != '\0') || 
(errno != 0)) {
fclose(fmap);
return -1;
@@ -209,8 +210,8 @@ host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
memcpy(&procmap.prot, in[2], PROT_SZ);
memcpy(&procmap.fname, in[7], PATH_MAX);

-   if (procmap.va_start == addr) {
-   procmap.len = procmap.len - procmap.va_start;
+   if ((procmap.va_start <= addr) && (procmap.va_end >= addr)) {
+   procmap.len = procmap.va_end - procmap.va_start;
found = 1;
break;
}
-- 
1.9.1



[dpdk-dev] [RFC PATCH 2/7] lib/librte_vhost: Add an abstraction layer for vhost backends

2014-11-06 Thread Tetsuya Mukawa
The patch adds an abstraction layer for vhost backends.
So far CUSE is the only vhost backend, but QEMU 2.1 can provide one
more backend called vhost-user. An abstraction layer of this kind is
needed to handle both backends.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_vhost/Makefile |   2 +-
 lib/librte_vhost/rte_virtio_net.h |  30 --
 lib/librte_vhost/vhost-net-cdev.c |  24 
 lib/librte_vhost/vhost-net-cdev.h | 113 --
 lib/librte_vhost/vhost-net.c  |  97 
 lib/librte_vhost/vhost-net.h  | 113 ++
 lib/librte_vhost/vhost_rxtx.c |   2 +-
 lib/librte_vhost/virtio-net.c |   2 +-
 8 files changed, 252 insertions(+), 131 deletions(-)
 delete mode 100644 lib/librte_vhost/vhost-net-cdev.h
 create mode 100644 lib/librte_vhost/vhost-net.c
 create mode 100644 lib/librte_vhost/vhost-net.h

diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index c008d64..0d4aa98 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -37,7 +37,7 @@ LIB = librte_vhost.a
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -D_FILE_OFFSET_BITS=64 -lfuse
 LDFLAGS += -lfuse
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost-net-cdev.c virtio-net.c vhost_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost-net.c virtio-net.c vhost_rxtx.c

 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index b6548a1..a36c0e3 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -31,8 +31,8 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */

-#ifndef _VIRTIO_NET_H_
-#define _VIRTIO_NET_H_
+#ifndef _RTE_VIRTIO_NET_H_
+#define _RTE_VIRTIO_NET_H_

 /**
  * @file
@@ -71,6 +71,25 @@ struct buf_vector {
 };

 /**
+ * Enum for vhost driver types.
+ */
+typedef enum {
+   VHOST_DRV_CUSE, /* cuse driver */
+   VHOST_DRV_NUM   /* the number of vhost driver types */
+} vhost_driver_type_t;
+
+/**
+ * Structure contains information relating vhost driver.
+ */
+struct vhost_driver {
+   vhost_driver_type_t type;   /**< driver type. */
+   const char  *dev_name;  /**< accessing device name. */
+   union {
+   struct fuse_session *session;   /**< fuse session. */
+   };
+};
+
+/**
  * Structure contains variables relevant to RX/TX virtqueues.
  */
 struct vhost_virtqueue {
@@ -176,12 +195,13 @@ uint64_t rte_vhost_feature_get(void);
 int rte_vhost_enable_guest_notification(struct virtio_net *dev, uint16_t 
queue_id, int enable);

 /* Register vhost driver. dev_name could be different for multiple instance 
support. */
-int rte_vhost_driver_register(const char *dev_name);
+struct vhost_driver *rte_vhost_driver_register(
+   const char *dev_name, vhost_driver_type_t type);

 /* Register callbacks. */
 int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * 
const);
 /* Start vhost driver session blocking loop. */
-int rte_vhost_driver_session_start(void);
+int rte_vhost_driver_session_start(struct vhost_driver *drv);

 /**
  * This function adds buffers to the virtio devices RX virtqueue. Buffers can
@@ -210,4 +230,4 @@ uint16_t rte_vhost_enqueue_burst(struct virtio_net *dev, 
uint16_t queue_id,
 uint16_t rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count);

-#endif /* _VIRTIO_NET_H_ */
+#endif /* _RTE_VIRTIO_NET_H_ */
diff --git a/lib/librte_vhost/vhost-net-cdev.c b/lib/librte_vhost/vhost-net-cdev.c
index 91ff0d8..83e1d14 100644
--- a/lib/librte_vhost/vhost-net-cdev.c
+++ b/lib/librte_vhost/vhost-net-cdev.c
@@ -2,7 +2,6 @@
  *   BSD LICENSE
  *
  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -44,7 +43,7 @@
 #include 
 #include 

-#include "vhost-net-cdev.h"
+#include "vhost-net.h"

 #define FUSE_OPT_DUMMY "\0\0"
 #define FUSE_OPT_FORE  "-f\0\0"
@@ -55,7 +54,6 @@ static const uint32_t default_minor = 1;
 static const char  cuse_device_name[]  = "/dev/cuse";
 static const char  default_cdev[] = "vhost-net";

-static struct fuse_session *session;
 static struct vhost_net_device_ops const *ops;

 /*
@@ -300,9 +298,10 @@ static const struct cuse_lowlevel_ops vhost_net_ops = {
  * cuse_info is populated and used to register the cuse device. 
vhost_net_device_ops are
  * also passed when the device is registered in main.c.
  */
-int
-rte_vhost_driver_register(const char *dev_name)
+static int
+vhost_cuse_driver_register(struct vhost_driver *drv)
 {
+   const char *dev_name;
   

[dpdk-dev] [RFC PATCH 3/7] lib/librte_vhost: Add an abstraction layer to interpret messages

2014-11-06 Thread Tetsuya Mukawa
This patch adds an abstraction layer to interpret messages from QEMU.
This abstraction layer is needed because there are differences in
message formats between vhost-cuse and vhost-user.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_vhost/vhost-net-cdev.c  |   2 +-
 lib/librte_vhost/vhost-net.h   |   3 +-
 lib/librte_vhost/virtio-net-cdev.c | 492 +
 lib/librte_vhost/virtio-net.c  | 484 ++--
 4 files changed, 517 insertions(+), 464 deletions(-)
 create mode 100644 lib/librte_vhost/virtio-net-cdev.c

diff --git a/lib/librte_vhost/vhost-net-cdev.c b/lib/librte_vhost/vhost-net-cdev.c
index 83e1d14..12d0f68 100644
--- a/lib/librte_vhost/vhost-net-cdev.c
+++ b/lib/librte_vhost/vhost-net-cdev.c
@@ -342,7 +342,7 @@ vhost_cuse_driver_register(struct vhost_driver *drv)
cuse_info.dev_info_argv = device_argv;
cuse_info.flags = CUSE_UNRESTRICTED_IOCTL;

-   ops = get_virtio_net_callbacks();
+   ops = get_virtio_net_callbacks(drv->type);

drv->session = cuse_lowlevel_setup(3, fuse_argv,
&cuse_info, &vhost_net_ops, 0, NULL);
diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index 03a5c57..09a99ce 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -109,5 +109,6 @@ struct vhost_net_device_ops {
 };


-struct vhost_net_device_ops const *get_virtio_net_callbacks(void);
+struct vhost_net_device_ops const *get_virtio_net_callbacks(
+   vhost_driver_type_t type);
 #endif /* _VHOST_NET_CDEV_H_ */
diff --git a/lib/librte_vhost/virtio-net-cdev.c b/lib/librte_vhost/virtio-net-cdev.c
new file mode 100644
index 000..f225bf5
--- /dev/null
+++ b/lib/librte_vhost/virtio-net-cdev.c
@@ -0,0 +1,492 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2014 IGEL Co.,Ltd. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of IGEL nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "vhost-net.h"
+#include "eventfd_link/eventfd_link.h"
+
+const char eventfd_cdev[] = "/dev/eventfd-link";
+
+/* Line size for reading maps file. */
+const uint32_t BUFSIZE = PATH_MAX;
+
+/* Size of prot char array in procmap. */
+#define PROT_SZ 5
+
+/* Number of elements in procmap struct. */
+#define PROCMAP_SZ 8
+
+/* Structure containing information gathered from maps file. */
+struct procmap {
+   uint64_t va_start;       /* Start virtual address in file. */
+   uint64_t va_end;         /* End virtual address in file. */
+   uint64_t len;            /* Size of file. */
+   uint64_t pgoff;          /* Not used. */
+   uint32_t maj;            /* Not used. */
+   uint32_t min;            /* Not used. */
+   uint32_t ino;            /* Not used. */
+   char prot[PROT_SZ];      /* Not used. */
+   char fname[PATH_MAX];    /* File name. */
+};
+
+/*
+ * Locate the file containing QEMU's memory space and map it to our address 
space.
+ */
+static int
+host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
+   pid_t pid, uint64_t addr)
+{
+   struct dirent *dptr = NULL;
+   struct procmap procmap;
+   DIR *dp = NULL;
+   int fd;
+   int i;
+   char memfile[PATH_MAX];
+   char mapfile[PATH_MAX];
+   char procdir[PATH_MAX];
+   char resolved_path[PATH_

[dpdk-dev] [RFC PATCH 4/7] lib/librte_vhost: Move vhost vhost-cuse device list and accessor functions

2014-11-06 Thread Tetsuya Mukawa
vhost-cuse and vhost-user should have independent device lists.
This patch moves vhost-cuse device list and list accessor functions
to 'virtio-net-cdev.c'.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_vhost/vhost-net-cdev.c  |   1 +
 lib/librte_vhost/vhost-net.h   |   5 +-
 lib/librte_vhost/virtio-net-cdev.c | 133 +++--
 lib/librte_vhost/virtio-net.c  | 121 ++---
 4 files changed, 181 insertions(+), 79 deletions(-)

diff --git a/lib/librte_vhost/vhost-net-cdev.c b/lib/librte_vhost/vhost-net-cdev.c
index 12d0f68..090c6fc 100644
--- a/lib/librte_vhost/vhost-net-cdev.c
+++ b/lib/librte_vhost/vhost-net-cdev.c
@@ -66,6 +66,7 @@ fuse_req_to_vhost_ctx(fuse_req_t req, struct fuse_file_info 
*fi)
struct vhost_device_ctx ctx;
struct fuse_ctx const *const req_ctx = fuse_req_ctx(req);

+   ctx.type = VHOST_DRV_CUSE;
ctx.pid = req_ctx->pid;
ctx.fh = fi->fh;

diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index 09a99ce..64873d0 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -77,8 +77,9 @@
  * Structure used to identify device context.
  */
 struct vhost_device_ctx {
-   pid_t    pid;   /* PID of process calling the IOCTL. */
-   uint64_t fh;    /* Populated with fi->fh to track the device index. */
+   vhost_driver_type_t type;   /* driver type. */
+   pid_t    pid;   /* PID of process calling the IOCTL. */
+   uint64_t fh;    /* Populated with fi->fh to track the device index. */
 };

 /*
diff --git a/lib/librte_vhost/virtio-net-cdev.c b/lib/librte_vhost/virtio-net-cdev.c
index f225bf5..70bc578 100644
--- a/lib/librte_vhost/virtio-net-cdev.c
+++ b/lib/librte_vhost/virtio-net-cdev.c
@@ -41,6 +41,24 @@
 #include "vhost-net.h"
 #include "eventfd_link/eventfd_link.h"

+/* Functions defined in virtio_net.c */
+static void init_device(struct virtio_net *dev);
+static void cleanup_device(struct virtio_net *dev);
+static void free_device(struct virtio_net_config_ll *ll_dev);
+static int new_device(struct vhost_device_ctx ctx);
+static void destroy_device(struct vhost_device_ctx ctx);
+static int set_owner(struct vhost_device_ctx ctx);
+static int reset_owner(struct vhost_device_ctx ctx);
+static int get_features(struct vhost_device_ctx ctx, uint64_t *pu);
+static int set_features(struct vhost_device_ctx ctx, uint64_t *pu);
+static int set_vring_num(struct vhost_device_ctx ctx, struct vhost_vring_state 
*state);
+static int set_vring_addr(struct vhost_device_ctx ctx, struct vhost_vring_addr 
*addr);
+static int set_vring_base(struct vhost_device_ctx ctx, struct 
vhost_vring_state *state);
+static int set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file 
*file);
+
+/* Root address of the linked list in the configuration core. */
+static struct virtio_net_config_ll *cdev_ll_root;
+
 const char eventfd_cdev[] = "/dev/eventfd-link";

 /* Line size for reading maps file. */
@@ -65,11 +83,114 @@ struct procmap {
char fname[PATH_MAX];    /* File name. */
 };

+/**
+ * Retrieves an entry from the devices configuration linked list.
+ */
+static struct virtio_net_config_ll *
+cdev_get_config_ll_entry(struct vhost_device_ctx ctx)
+{
+   struct virtio_net_config_ll *ll_dev = cdev_ll_root;
+
+   /* Loop through linked list until the device_fh is found. */
+   while (ll_dev != NULL) {
+   if (ll_dev->dev.device_fh == ctx.fh)
+   return ll_dev;
+   ll_dev = ll_dev->next;
+   }
+
+   return NULL;
+}
+
+/**
+ * Searches the configuration core linked list and retrieves the device if it 
exists.
+ */
+static struct virtio_net *
+cdev_get_device(struct vhost_device_ctx ctx)
+{
+   struct virtio_net_config_ll *ll_dev;
+
+   ll_dev = cdev_get_config_ll_entry(ctx);
+
+   /* If a matching entry is found in the linked list, return the device 
in that entry. */
+   if (ll_dev)
+   return &ll_dev->dev;
+
+   RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Device not found in linked 
list.\n", ctx.fh);
+   return NULL;
+}
+
+/**
+ * Add entry containing a device to the device configuration linked list.
+ */
+static void
+cdev_add_config_ll_entry(struct virtio_net_config_ll *new_ll_dev)
+{
+   struct virtio_net_config_ll *ll_dev = cdev_ll_root;
+
+   /* If ll_dev == NULL then this is the first device so go to else */
+   if (ll_dev) {
+   /* If the 1st device_fh != 0 then we insert our device here. */
+   if (ll_dev->dev.device_fh != 0) {
+   new_ll_dev->dev.device_fh = 0;
+   new_ll_dev->next = ll_dev;
+   cdev_ll_root = new_ll_dev;
+   } else {
+   /* Increment through the ll until we find an unused device_fh. Insert the device at that entry */
+   while

[dpdk-dev] [RFC PATCH 5/7] lib/librte_vhost: Add a vhost session abstraction

2014-11-06 Thread Tetsuya Mukawa
A vhost session relates the vhost communication layer to the virtio-net
device layer. Because vhost-cuse and vhost-user have different
session information, this patch is needed.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_vhost/rte_virtio_net.h  | 2 +-
 lib/librte_vhost/vhost-net-cdev.c  | 8 
 lib/librte_vhost/vhost-net.h   | 7 ++-
 lib/librte_vhost/virtio-net-cdev.c | 7 ---
 4 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index a36c0e3..a9e20ea 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -85,7 +85,7 @@ struct vhost_driver {
vhost_driver_type_t type;   /**< driver type. */
const char  *dev_name;  /**< accessing device name. */
union {
-   struct fuse_session *session;   /**< fuse session. */
+   struct fuse_session *cuse_session;  /**< fuse session. */
};
 };

diff --git a/lib/librte_vhost/vhost-net-cdev.c b/lib/librte_vhost/vhost-net-cdev.c
index 090c6fc..6754548 100644
--- a/lib/librte_vhost/vhost-net-cdev.c
+++ b/lib/librte_vhost/vhost-net-cdev.c
@@ -67,7 +67,7 @@ fuse_req_to_vhost_ctx(fuse_req_t req, struct fuse_file_info 
*fi)
struct fuse_ctx const *const req_ctx = fuse_req_ctx(req);

ctx.type = VHOST_DRV_CUSE;
-   ctx.pid = req_ctx->pid;
+   ctx.cdev.pid = req_ctx->pid;
ctx.fh = fi->fh;

return ctx;
@@ -345,9 +345,9 @@ vhost_cuse_driver_register(struct vhost_driver *drv)

ops = get_virtio_net_callbacks(drv->type);

-   drv->session = cuse_lowlevel_setup(3, fuse_argv,
+   drv->cuse_session = cuse_lowlevel_setup(3, fuse_argv,
&cuse_info, &vhost_net_ops, 0, NULL);
-   if (drv->session == NULL)
+   if (drv->cuse_session == NULL)
return -1;

return 0;
@@ -359,7 +359,7 @@ vhost_cuse_driver_register(struct vhost_driver *drv)
 static int
 vhost_cuse_driver_session_start(struct vhost_driver *drv)
 {
-   fuse_session_loop(drv->session);
+   fuse_session_loop(drv->cuse_session);

return 0;
 }
diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index 64873d0..ef04832 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -72,14 +72,19 @@
 #define PRINT_PACKET(device, addr, size, header) do {} while (0)
 #endif

+struct vhost_device_cuse_ctx {
+   pid_t   pid;/* PID of process calling the IOCTL. */
+};

 /*
  * Structure used to identify device context.
  */
 struct vhost_device_ctx {
vhost_driver_type_t type;   /* driver type. */
-   pid_t   pid;/* PID of process calling the IOCTL. */
uint64_t fh;    /* Populated with fi->fh to track the device index. */
+   union {
+   struct vhost_device_cuse_ctx cdev;
+   };
 };

 /*
diff --git a/lib/librte_vhost/virtio-net-cdev.c b/lib/librte_vhost/virtio-net-cdev.c
index 70bc578..ac97551 100644
--- a/lib/librte_vhost/virtio-net-cdev.c
+++ b/lib/librte_vhost/virtio-net-cdev.c
@@ -412,7 +412,8 @@ cuse_set_mem_table(struct vhost_device_ctx ctx, const void 
*mem_regions_addr,
if (mem->regions[regionidx].guest_phys_address == 0x0) {
mem->base_address = 
mem->regions[regionidx].userspace_address;
/* Map VM memory file */
-   if (cdev_host_memory_map(dev, mem, ctx.pid, 
mem->base_address) != 0) {
+   if (cdev_host_memory_map(dev, mem, ctx.cdev.pid,
+   mem->base_address) != 0) {
free(mem);
return -1;
}
@@ -543,7 +544,7 @@ cuse_set_vring_call(struct vhost_device_ctx ctx, struct 
vhost_vring_file *file)
vq->kickfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
eventfd_kick.source_fd = vq->kickfd;
eventfd_kick.target_fd = file->fd;
-   eventfd_kick.target_pid = ctx.pid;
+   eventfd_kick.target_pid = ctx.cdev.pid;

if (eventfd_copy(dev, &eventfd_kick))
return -1;
@@ -577,7 +578,7 @@ cuse_set_vring_kick(struct vhost_device_ctx ctx, struct 
vhost_vring_file *file)
vq->callfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
eventfd_call.source_fd = vq->callfd;
eventfd_call.target_fd = file->fd;
-   eventfd_call.target_pid = ctx.pid;
+   eventfd_call.target_pid = ctx.cdev.pid;

if (eventfd_copy(dev, &eventfd_call))
return -1;
-- 
1.9.1
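
The context layout this patch introduces can be sketched as follows. This is only a minimal illustration under stated assumptions: the struct and field names follow the diff above, but make_cuse_ctx() is a hypothetical helper that is not part of the patch, and the anonymous union assumes a C11 (or GNU C) compiler.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>

/* Driver types; only the CUSE variant exists at this point in the series. */
typedef enum {
	VHOST_DRV_CUSE,
	VHOST_DRV_NUM
} vhost_driver_type_t;

/* CUSE-specific part of the device context. */
struct vhost_device_cuse_ctx {
	pid_t pid;	/* PID of process calling the IOCTL. */
};

/* Common context; driver-specific data lives in the anonymous union. */
struct vhost_device_ctx {
	vhost_driver_type_t type;	/* selects the valid union member */
	uint64_t fh;			/* device index */
	union {
		struct vhost_device_cuse_ctx cdev;
	};
};

/* Hypothetical helper (not in the patch): build a CUSE context. */
static struct vhost_device_ctx
make_cuse_ctx(pid_t pid, uint64_t fh)
{
	struct vhost_device_ctx ctx;

	assert(pid > 0);	/* a calling process always has a positive PID */
	ctx.type = VHOST_DRV_CUSE;
	ctx.cdev.pid = pid;	/* was ctx.pid before this patch */
	ctx.fh = fh;
	return ctx;
}
```

Keeping the PID inside a per-driver member means that a later driver type can add its own member to the union without touching the common fields.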



[dpdk-dev] [RFC PATCH 6/7] lib/librte_vhost: Add vhost-cuse/user specific initialization

2014-11-06 Thread Tetsuya Mukawa
vhost-cuse and vhost-user are initialized differently.
This patch adds a hook so that each driver type can run its own
initialization.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_vhost/virtio-net-cdev.c | 12 +++-
 lib/librte_vhost/virtio-net.c  | 13 ++---
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/lib/librte_vhost/virtio-net-cdev.c 
b/lib/librte_vhost/virtio-net-cdev.c
index ac97551..a1ba1f9 100644
--- a/lib/librte_vhost/virtio-net-cdev.c
+++ b/lib/librte_vhost/virtio-net-cdev.c
@@ -42,7 +42,7 @@
 #include "eventfd_link/eventfd_link.h"

 /* Functions defined in virtio_net.c */
-static void init_device(struct virtio_net *dev);
+static void init_device(struct vhost_device_ctx ctx, struct virtio_net *dev);
 static void cleanup_device(struct virtio_net *dev);
 static void free_device(struct virtio_net_config_ll *ll_dev);
 static int new_device(struct vhost_device_ctx ctx);
@@ -186,6 +186,16 @@ cdev_get_config_ll_root(void)
return cdev_ll_root;
 }

+
+/**
+ * CUSE specific device initialization.
+ */
+static void
+cdev_init_device(struct vhost_device_ctx ctx __rte_unused,
+   struct virtio_net *dev __rte_unused)
+{
+}
+
 /*
  * Locate the file containing QEMU's memory space and map it to our address 
space.
  */
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 603bb09..13fbb6f 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -212,7 +212,7 @@ get_config_ll_root(struct vhost_device_ctx ctx)
  *  Initialise all variables in device structure.
  */
 static void
-init_device(struct virtio_net *dev)
+init_device(struct vhost_device_ctx ctx, struct virtio_net *dev)
 {
uint64_t vq_offset;

@@ -228,6 +228,13 @@ init_device(struct virtio_net *dev)
/* Backends are set to -1 indicating an inactive device. */
dev->virtqueue[VIRTIO_RXQ]->backend = VIRTIO_DEV_STOPPED;
dev->virtqueue[VIRTIO_TXQ]->backend = VIRTIO_DEV_STOPPED;
+
+   switch (ctx.type) {
+   case VHOST_DRV_CUSE:
+   return cdev_init_device(ctx, dev);
+   default:
+   break;
+   }
 }

 /*
@@ -273,7 +280,7 @@ new_device(struct vhost_device_ctx ctx)
new_ll_dev->dev.virtqueue[VIRTIO_TXQ] = virtqueue_tx;

/* Initialise device and virtqueues. */
-   init_device(&new_ll_dev->dev);
+   init_device(ctx, &new_ll_dev->dev);

new_ll_dev->next = NULL;

@@ -339,7 +346,7 @@ reset_owner(struct vhost_device_ctx ctx)
ll_dev = get_config_ll_entry(ctx);

cleanup_device(&ll_dev->dev);
-   init_device(&ll_dev->dev);
+   init_device(ctx, &ll_dev->dev);

return 0;
 }
-- 
1.9.1
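
The dispatch added to init_device() can be sketched roughly as below. All names here (drv_type_t, struct dev, cuse_init()) are hypothetical stand-ins rather than the library's types, and the sketch calls the hook and then breaks instead of returning the hook's void result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for vhost_driver_type_t and struct virtio_net. */
typedef enum { DRV_CUSE, DRV_USER, DRV_NUM } drv_type_t;

struct dev {
	int backend;	/* -1 means the device is stopped */
	int cuse_ready;	/* set only by the CUSE-specific hook */
};

/* Driver-specific hook; the real cdev_init_device() is currently empty. */
static void
cuse_init(struct dev *d)
{
	d->cuse_ready = 1;
}

/* Common initialization followed by a per-driver hook. */
static void
init_device(drv_type_t type, struct dev *d)
{
	assert(d != NULL);

	/* Common part: runs for every driver type. */
	d->backend = -1;
	d->cuse_ready = 0;

	/* Driver-specific part: selected by the context type. */
	switch (type) {
	case DRV_CUSE:
		cuse_init(d);
		break;
	default:
		break;
	}
}
```

For example, init_device(DRV_CUSE, &d) leaves d.cuse_ready set, while any other type only runs the common part.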



[dpdk-dev] [RFC PATCH 7/7] lib/librte_vhost: Add vhost-user implementation

2014-11-06 Thread Tetsuya Mukawa
This patch adds a vhost-user implementation to librte_vhost.
To communicate with vhost-user of QEMU, specify VHOST_DRV_USER as
the vhost_driver_type_t argument to rte_vhost_driver_register().

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_vhost/rte_virtio_net.h  |  19 +-
 lib/librte_vhost/vhost-net-user.c  | 541 +
 lib/librte_vhost/vhost-net.c   |  39 ++-
 lib/librte_vhost/vhost-net.h   |   7 +
 lib/librte_vhost/virtio-net-user.c | 410 
 lib/librte_vhost/virtio-net.c  |  64 -
 6 files changed, 1073 insertions(+), 7 deletions(-)
 create mode 100644 lib/librte_vhost/vhost-net-user.c
 create mode 100644 lib/librte_vhost/virtio-net-user.c

diff --git a/lib/librte_vhost/rte_virtio_net.h 
b/lib/librte_vhost/rte_virtio_net.h
index a9e20ea..af07900 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -75,17 +75,32 @@ struct buf_vector {
  */
 typedef enum {
VHOST_DRV_CUSE, /* cuse driver */
+   VHOST_DRV_USER, /* vhost-user driver */
VHOST_DRV_NUM   /* the number of vhost driver types */
 } vhost_driver_type_t;

+
+/**
+ * Structure contains vhost-user session specific information
+ */
+struct vhost_user_session {
+   int fh; /**< session identifier */
+   pthread_t   tid;/**< thread id of session handler */
+   int socketfd;   /**< fd of socket */
+   int interval;   /**< reconnection interval of session */
+};
+
 /**
  * Structure contains information relating vhost driver.
  */
 struct vhost_driver {
vhost_driver_type_t type;   /**< driver type. */
const char  *dev_name;  /**< accessing device name. */
+   void*priv;  /**< private data. */
union {
struct fuse_session *cuse_session;  /**< fuse session. */
+   struct vhost_user_session *user_session;
+   /**< vhost-user session. */
};
 };

@@ -199,9 +214,11 @@ struct vhost_driver *rte_vhost_driver_register(
const char *dev_name, vhost_driver_type_t type);

 /* Register callbacks. */
-int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * 
const);
+int rte_vhost_driver_callback_register(struct vhost_driver *drv,
+   struct virtio_net_device_ops const * const, void *priv);
 /* Start vhost driver session blocking loop. */
 int rte_vhost_driver_session_start(struct vhost_driver *drv);
+void rte_vhost_driver_session_stop(struct vhost_driver *drv);

 /**
  * This function adds buffers to the virtio devices RX virtqueue. Buffers can
diff --git a/lib/librte_vhost/vhost-net-user.c 
b/lib/librte_vhost/vhost-net-user.c
new file mode 100644
index 000..434f20f
--- /dev/null
+++ b/lib/librte_vhost/vhost-net-user.c
@@ -0,0 +1,541 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) 2014 IGEL Co/.Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of IGEL nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+
+#define VHOST_USER_MAX_DEVICE  (32)
+#define VHOST_USER_MAX_FD_NUM  (3)
+
+/* start id of vhost user device */
+rte_atomic16_t vhost_user_device_id;
+
+static struct vhost_net_device_ops const *ops;
+
+typedef enum VhostUserRequest {
+   VHOST_USER_NONE = 0,
+   VHOST_USER_GET_FEATURES = 1,
+ 

[dpdk-dev] [PATCH 1/2] lib/librte_vhost: code style fixes

2014-11-06 Thread Neil Horman
On Thu, Nov 06, 2014 at 07:31:41AM +0800, Huawei Xie wrote:
> fixes alignment issues, lengthy lines, misordered type and other coding style 
> issues.
> 
> Signed-off-by: Huawei Xie 
> ---
>  lib/librte_vhost/eventfd_link/eventfd_link.c | 244 ++---
>  lib/librte_vhost/eventfd_link/eventfd_link.h | 127 ++-
>  lib/librte_vhost/rte_virtio_net.h|   3 +-
>  lib/librte_vhost/vhost-net-cdev.c| 187 +---
>  lib/librte_vhost/vhost_rxtx.c|  13 +-
>  lib/librte_vhost/virtio-net.c| 317 
> +--
>  6 files changed, 494 insertions(+), 397 deletions(-)
> 
Acked-by: Neil Horman 



[dpdk-dev] [PATCH 2/2] lib/librte_vhost: printk->pr_debug

2014-11-06 Thread Neil Horman
On Thu, Nov 06, 2014 at 07:31:42AM +0800, Huawei Xie wrote:
> printk -> pr_debug
> 
> Signed-off-by: Huawei Xie 
> ---
>  lib/librte_vhost/eventfd_link/eventfd_link.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/librte_vhost/eventfd_link/eventfd_link.c 
> b/lib/librte_vhost/eventfd_link/eventfd_link.c
> index 542ec2c..7755dd6 100644
> --- a/lib/librte_vhost/eventfd_link/eventfd_link.c
> +++ b/lib/librte_vhost/eventfd_link/eventfd_link.c
> @@ -88,13 +88,13 @@ eventfd_link_ioctl(struct file *f, unsigned int ioctl, 
> unsigned long arg)
>   task_target =
>   pid_task(find_vpid(eventfd_copy.target_pid), 
> PIDTYPE_PID);
>   if (task_target == NULL) {
> - printk(KERN_DEBUG "Failed to get mem ctx for target 
> pid\n");
> + pr_debug("Failed to get mem ctx for target pid\n");
>   return -EFAULT;
>   }
>  
>   files = get_files_struct(current);
>   if (files == NULL) {
> - printk(KERN_DEBUG "Failed to get files struct\n");
> + pr_debug("Failed to get files struct\n");
>   return -EFAULT;
>   }
>  
> @@ -109,7 +109,7 @@ eventfd_link_ioctl(struct file *f, unsigned int ioctl, 
> unsigned long arg)
>   put_files_struct(files);
>  
>   if (file == NULL) {
> - printk(KERN_DEBUG "Failed to get file from source 
> pid\n");
> + pr_debug("Failed to get file from source pid\n");
>   return 0;
>   }
>  
> @@ -128,7 +128,7 @@ eventfd_link_ioctl(struct file *f, unsigned int ioctl, 
> unsigned long arg)
>  
>   files = get_files_struct(task_target);
>   if (files == NULL) {
> - printk(KERN_DEBUG "Failed to get files struct\n");
> + pr_debug("Failed to get files struct\n");
>   return -EFAULT;
>   }
>  
> @@ -143,7 +143,7 @@ eventfd_link_ioctl(struct file *f, unsigned int ioctl, 
> unsigned long arg)
>   put_files_struct(files);
>  
>   if (file == NULL) {
> - printk(KERN_DEBUG "Failed to get file from target 
> pid\n");
> + pr_debug("Failed to get file from target pid\n");
>   return 0;
>   }
>  
> -- 
> 1.8.1.4
> 
> 
Acked-by: Neil Horman 


[dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum offload

2014-11-06 Thread Liu, Jijiang


> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Wednesday, November 5, 2014 6:28 PM
> To: Liu, Jijiang
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum
> offload
> 
> Hi Jijiang,
> 
> Thank you for your answer. Please find some comments below.
> 
> On 11/05/2014 07:02 AM, Liu, Jijiang wrote:
> >> First, the code checks if the mbuf has the flag PKT_RX_TUNNEL_IPV4_HDR.
> >> What is the meaning of this flag? It was added by [3], but there is
> >> no description in comments or in the commit log explaining in which
> >> case this flag is set by the driver. The name supposes that this flag
> >> is set when the received packet is an IPv4 tunnel, but the commit log
> >> talks about vxlan.
> >
> > The flag PKT_RX_TUNNEL_IPV4_HDR can be used for all tunneling packet
> > types with outer IPV4 header.
> > For example:
> > IPv4 --> GRE/Teredo/VXLAN --> MAC --> IPv4:
> > MAC, IPV4, GRENAT, MAC, IPV4, SCTP, PAY4
> > MAC, IPV4, GRENAT, MAC, IPV6, UDP, PAY4
> > MAC, IPV4, GRENAT, MAC, IPV6, UDP, PAY4
> > These tunneling packet formats have a common point that is outer IPv4
> > header here.
> >
> > Only VXLAN tunneling packet is supported in DPDK for i40e now, so the
> > commit log talks about VXLAN.
> 
> Is it possible to have a more formal definition? For instance, is the 
> following
> definition below correct?
> 
>   "the PKT_RX_TUNNEL_IPV4_HDR flag CAN be set by a driver if the packet
>contains a tunneling protocol inside an IPv4 header".

Yes, correct.

> If the definition above is correct, I don't see how this flag can help an 
> application
> to run faster. There is already a flag telling if there is a valid IPv4 header
> (PKT_RX_IPV4_HDR). As the PKT_RX_TUNNEL_IPV4_HDR flag does not tell what
> is ip->proto, the work done by an application to dissect a packet would be 
> exactly
> the same with or without this flag.

If the PKT_RX_TUNNEL_IPV4_HDR flag is set, the driver is telling the
application that the incoming packet is an encapsulated packet, and the
application will process / analyse the packet according to the tunneling
format indicated by packet_type.

Take the VXLAN packet format (MAC,IPv4,UDP,VXLAN,MAC,IP,TCP,PAY4) as an
example: if only the PKT_RX_IPV4_HDR flag is set, the application regards
the payload as "from VXLAN to PAY4", but the real payload is PAY4.
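
To make the difference concrete, here is a minimal sketch of where the payload starts in each interpretation. The SK_* flag bits are hypothetical stand-ins for the mbuf flags, not their real values, and the header lengths assume no VLAN tags, no IPv4 options, an inner IPv4 header and a TCP header without options.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits standing in for the real mbuf ol_flags values. */
#define SK_RX_IPV4_HDR        (1u << 0)
#define SK_RX_TUNNEL_IPV4_HDR (1u << 1)

/* Fixed header lengths in bytes (assumed: no VLAN, no IP/TCP options). */
enum {
	ETH_LEN = 14,
	IPV4_LEN = 20,
	UDP_LEN = 8,
	VXLAN_LEN = 8,
	TCP_LEN = 20
};

/* Offset of what the application would treat as payload. */
static uint32_t
payload_offset(uint32_t flags)
{
	/* Outer MAC + outer IPv4 + outer UDP are present in both cases. */
	uint32_t off = ETH_LEN + IPV4_LEN + UDP_LEN;

	assert(flags & (SK_RX_IPV4_HDR | SK_RX_TUNNEL_IPV4_HDR));

	/* Tunnel flag set: skip VXLAN and the inner MAC/IPv4/TCP headers. */
	if (flags & SK_RX_TUNNEL_IPV4_HDR)
		off += VXLAN_LEN + ETH_LEN + IPV4_LEN + TCP_LEN;

	return off;
}
```

Under these assumptions the plain IPv4 interpretation stops at byte 42, which is the start of the VXLAN header, while the tunnel interpretation reaches PAY4 at byte 104.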

> Please, can you give an example showing in which conditions this flag can 
> help an
> application?

http://dpdk.org/ml/archives/dev/2014-October/007151.html
http://dpdk.org/ml/archives/dev/2014-October/007156.html

We used PKT_RX_TUNNEL_IPV4_HDR in these two patches to help the application
identify that an incoming packet is a tunneling packet.




[dpdk-dev] [TEST] vhost pmd for testing vhost-user

2014-11-06 Thread Tetsuya Mukawa
Hi Xie,

I've written a vhost PMD for testing vhost-user.
This patch may be useful when you test vhost-user.

Here are the steps I use to test vhost-user.

1. Start testpmd on the host
$ sudo ./x86_64-native-linuxapp-gcc/app/testpmd -c f -n 1 -m 1024 \
--vdev 'eth_vhost0,iface=/tmp/virtq0' \
--vdev 'eth_vhost1,iface=/tmp/virtq1' -- -i

2. Start QEMU as follows.
$ sudo qemu-system-x86_64 -M pc-1.0 -cpu host -m 4096 -smp 4 -enable-kvm \
-drive file=,if=none,id=drive-virtio-disk0,format=raw \
-device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 \
-object memory-backend-file,id=mem,size=4096M,mem-path=/mnt/huge,share=on \
-numa node,memdev=mem \
-chardev socket,id=chr0,path=/tmp/virtq0,server \
-netdev vhost-user,id=net0,chardev=chr0,vhostforce \
-device virtio-net-pci,netdev=net0 \
-chardev socket,id=chr1,path=/tmp/virtq1,server \
-netdev vhost-user,id=net1,chardev=chr1,vhostforce \
-device virtio-net-pci,netdev=net1 \
-vnc :2

3. Bind 2 virtio-net devices to igb_uio on the guest.

4. Start testpmd on the guest.
$ sudo ./x86_64-native-linuxapp-gcc/app/testpmd -c f -n 1 -m 1024 -- -i

5. Start forwarding on the guest.
testpmd> start

6. Start forwarding on the host.
testpmd> start tx_first

7. Stop forwarding.

Thanks,
Tetsuya

Tetsuya Mukawa (1):
  lib/librte_pmd_vhost: Add vhost pmd

 config/common_linuxapp   |   5 +
 lib/Makefile |   1 +
 lib/librte_pmd_vhost/Makefile|  57 
 lib/librte_pmd_vhost/rte_eth_vhost.c | 487 +++
 lib/librte_pmd_vhost/rte_eth_vhost.h |  55 
 mk/rte.app.mk|   4 +
 6 files changed, 609 insertions(+)
 create mode 100644 lib/librte_pmd_vhost/Makefile
 create mode 100644 lib/librte_pmd_vhost/rte_eth_vhost.c
 create mode 100644 lib/librte_pmd_vhost/rte_eth_vhost.h

-- 
1.9.1



[dpdk-dev] [TEST] lib/librte_pmd_vhost: Add vhost pmd

2014-11-06 Thread Tetsuya Mukawa
The vhost pmd is a poll mode driver built on the librte_vhost library. Its
relation to librte_vhost is similar to that of the ring pmd to librte_ring.

Here is an example QEMU command (QEMU above 2.1) for communicating with the vhost pmd.
qemu-system-x86_64 -M pc-1.0 -cpu host -m 4096 \
-object memory-backend-file,id=mem,size=4096M,mem-path=/mnt/huge,share=on \
-numa node,memdev=mem \
-chardev socket,id=chr0,path=/tmp/virtq0,server \
-netdev vhost-user,id=net0,chardev=chr0 \
-device virtio-net-pci,netdev=net0

Here is a testpmd example as well.
testpmd -c f -n 1 -m 1024 --vdev 'eth_vhost0,iface=/tmp/virtq0' -- -i

You can invoke QEMU and testpmd in any order. However, if you invoke QEMU
after testpmd, please set the '-m' option for qemu not to use all hugepage
memory.

Signed-off-by: Tetsuya Mukawa 
---
 config/common_linuxapp   |   5 +
 lib/Makefile |   1 +
 lib/librte_pmd_vhost/Makefile|  57 
 lib/librte_pmd_vhost/rte_eth_vhost.c | 487 +++
 lib/librte_pmd_vhost/rte_eth_vhost.h |  55 
 mk/rte.app.mk|   4 +
 6 files changed, 609 insertions(+)
 create mode 100644 lib/librte_pmd_vhost/Makefile
 create mode 100644 lib/librte_pmd_vhost/rte_eth_vhost.c
 create mode 100644 lib/librte_pmd_vhost/rte_eth_vhost.h

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 8be79c3..71a54fc 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -243,6 +243,11 @@ CONFIG_RTE_PMD_RING_MAX_TX_RINGS=16
 CONFIG_RTE_LIBRTE_PMD_PCAP=n

 #
+# Compile burst-oriented VHOST PMD driver
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=n
+
+#
 # Compile link bonding PMD library
 #
 CONFIG_RTE_LIBRTE_PMD_BOND=y
diff --git a/lib/Makefile b/lib/Makefile
index e3237ff..4f314a7 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -47,6 +47,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += librte_pmd_i40e
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += librte_pmd_bond
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_RING) += librte_pmd_ring
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += librte_pmd_pcap
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += librte_pmd_vhost
 DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += librte_pmd_virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += librte_pmd_vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += librte_pmd_xenvirt
diff --git a/lib/librte_pmd_vhost/Makefile b/lib/librte_pmd_vhost/Makefile
new file mode 100644
index 000..6c85d55
--- /dev/null
+++ b/lib/librte_pmd_vhost/Makefile
@@ -0,0 +1,57 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhost.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
+
+#
+# Export include files
+#
+SYMLINK-y-include += rte_eth_vhost.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_eal lib/librte_vhost
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_pmd_vhost/rte_eth_vhost.c 
b/lib/librte_pmd_vhost/rte_eth_vhost.c
new file mode 100644
index 000..9e5f622
--- /dev/null
+++ b/lib/librte_pmd_vhost/rte_eth_vhost.c
@@ -0,0 +1,487 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   C

[dpdk-dev] [TEST] lib/librte_pmd_vhost: Add vhost pmd

2014-11-06 Thread Tetsuya Mukawa
Addition:
This patch was written only for testing the vhost-user extension patches,
so it still has a lot of issues.

Thanks,
Tetsuya


[dpdk-dev] Re: [PATCH] Add user defined tag calculation callback to librte_distributor.

2014-11-06 Thread Qinglai Xiao
Hi Bruce,

There is a subtle case: if two flows have tag values 2 and 3, both become 3
after the OR with 1, so these two tags cannot be distinguished. There should
be a better way to handle this situation.

thx &
rgds
-qinglai

--
From: "Thomas Monjalon"
Sent: 2014/11/6 12:36
To: "Bruce Richardson"
Cc: "dev at dpdk.org"; "jigsaw"
Subject: Re: [dpdk-dev] [PATCH] Add user defined tag calculation callback
to librte_distributor.

2014-11-06 09:22, Bruce Richardson:
> On Wed, Nov 05, 2014 at 07:24:13PM +0200, jigsaw wrote:
> > http://dpdk.org/browse/dpdk/tree/lib/librte_distributor/rte_distributor.c#n285
> > 
> > new_tag = (next_mb->hash.rss | 1);
> > 
> > Why the logical OR is needed?
> 
> That's needed to ensure that we never track a tag with an actual value of 
> zero.
> We instead always force the low bit to be 1, so that we can use zero as an
> "empty" value.

Bruce, could you check how this code may be better commented please?
This discussion shows that the distributor library probably needs more
explanations in the code or doxygen.

Thanks
-- 
Thomas
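
The effect discussed in this thread can be reproduced with a few lines of C. This is only a sketch: track_tag() and tags_collide() are hypothetical names, and only the expression (hash | 1), a bitwise OR, is taken from the quoted distributor code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Force the low bit to 1 so that a tracked tag is never zero;
 * zero can then be used as the "empty" value.
 */
static uint32_t
track_tag(uint32_t rss_hash)
{
	uint32_t tag = rss_hash | 1;

	assert(tag != 0);	/* holds for every input */
	return tag;
}

/* Two hashes collide if they differ only in the lowest bit. */
static int
tags_collide(uint32_t a, uint32_t b)
{
	return track_tag(a) == track_tag(b);
}
```

With this scheme tags_collide(2, 3) is true, because both values are tracked as 3, while tags_collide(2, 4) is false. In effect the tag space is halved in exchange for a reserved "empty" value.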


[dpdk-dev] [PATCH v3 0/5] support of configurable CRC stripping in VF

2014-11-06 Thread Helin Zhang
To support configurable CRC stripping in both PF host
and VF, a new operation and a new structure are added
to carry more configurations from VF to PF host.

v2 changes:
* Put all the renaming and code style fixes into a patch.
* Put processing crc stripping configuration in PF host
  into a single patch.
* Put setting the crc stripping into a single patch.
* Put the configuring crc stripping in VF into a single patch.
* Added several more code style fixes reported by checkpatch.pl.

v3 changes:
* Added a macro of calculating memory size for configuring
  vsi queues.
* Used array of memory in stack to replace the memory
  allocated by rte_zmalloc().
* Added an input parameter for configuring crc stripping in
  RX queue context.
* Put configuring crc stripping of both PF host and VF
  into a single patch.
* Defined below new structures for the configuring specifically.
  - struct i40e_virtchnl_rxq_ext_info;
  - struct i40e_virtchnl_queue_pair_ext_info;
  - struct i40e_virtchnl_vsi_queue_config_ext_info;
* Renamed 'I40E_VIRTCHNL_OP_CONFIG_VSI_QUEUES_EX' to
  'I40E_VIRTCHNL_OP_CONFIG_VSI_QUEUES_EXT'.

Helin Zhang (5):
  config: remove useless i40e items in config files
  i40evf: Remove 'host_is_dpdk', and use version number instead
  i40e: renaming and code style fix
  i40e: support of configurable crc stripping in rx queue
  i40e: support of configurable VF crc stripping

 config/common_bsdapp |   1 -
 config/common_linuxapp   |   1 -
 lib/librte_pmd_i40e/i40e_ethdev.h|   3 +-
 lib/librte_pmd_i40e/i40e_ethdev_vf.c | 218 +++
 lib/librte_pmd_i40e/i40e_pf.c| 134 +++--
 lib/librte_pmd_i40e/i40e_pf.h|  57 +++--
 6 files changed, 297 insertions(+), 117 deletions(-)

-- 
1.8.1.4



[dpdk-dev] [PATCH v3 1/5] config: remove useless i40e items in config files

2014-11-06 Thread Helin Zhang
Remove 'CONFIG_RTE_LIBRTE_I40E_PF_DISABLE_STRIP_CRC'
from config files, as it is not used anywhere.

Signed-off-by: Helin Zhang 
---
 config/common_bsdapp   | 1 -
 config/common_linuxapp | 1 -
 2 files changed, 2 deletions(-)

diff --git a/config/common_bsdapp b/config/common_bsdapp
index 9dc9f56..57cad76 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -179,7 +179,6 @@ CONFIG_RTE_LIBRTE_I40E_DEBUG_RX=n
 CONFIG_RTE_LIBRTE_I40E_DEBUG_TX=n
 CONFIG_RTE_LIBRTE_I40E_DEBUG_TX_FREE=n
 CONFIG_RTE_LIBRTE_I40E_DEBUG_DRIVER=n
-CONFIG_RTE_LIBRTE_I40E_PF_DISABLE_STRIP_CRC=y
 CONFIG_RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC=n
 CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC=n
 CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VF=4
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 8be79c3..57b61c9 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -202,7 +202,6 @@ CONFIG_RTE_LIBRTE_I40E_DEBUG_RX=n
 CONFIG_RTE_LIBRTE_I40E_DEBUG_TX=n
 CONFIG_RTE_LIBRTE_I40E_DEBUG_TX_FREE=n
 CONFIG_RTE_LIBRTE_I40E_DEBUG_DRIVER=n
-CONFIG_RTE_LIBRTE_I40E_PF_DISABLE_STRIP_CRC=n
 CONFIG_RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC=y
 CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC=n
 CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VF=4
-- 
1.8.1.4



[dpdk-dev] [PATCH v3 3/5] i40e: renaming and code style fix

2014-11-06 Thread Helin Zhang
Rename some local variables so that their names are more accurate
and concise. Fix several code style issues reported by
checkpatch.pl. Wrap source lines that are longer than 80
characters, and merge source lines that do not actually need
wrapping. Add macros for numeric constants and for calculating
memory sizes.

Signed-off-by: Helin Zhang 
---
 lib/librte_pmd_i40e/i40e_ethdev_vf.c | 86 ++--
 lib/librte_pmd_i40e/i40e_pf.c| 46 +--
 lib/librte_pmd_i40e/i40e_pf.h| 29 +---
 3 files changed, 84 insertions(+), 77 deletions(-)

v2 changes:
* Put all the renaming and code style fixes into a patch.
* Added several more code style fixes for i40e_pf.c.

v3 changes:
* Added a macro of calculating memory size for configuring
  vsi queues.
* Used array of memory in stack to replace the memory
  allocated by rte_zmalloc().
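
The two v3 changes above (a size macro and a stack buffer instead of rte_zmalloc()) can be sketched as follows. The struct layout, VSI_QUEUE_CONFIG_SIZE() and build_queue_msg() are hypothetical simplifications and not the i40e definitions. Unlike the patch, the sketch fills the byte buffer with memcpy() rather than casting it to a struct pointer, which sidesteps alignment and aliasing questions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified per-queue-pair record. */
struct qpair_info {
	uint16_t queue_id;
	uint16_t ring_len;
};

/* Hypothetical message header followed by a variable number of pairs. */
struct vsi_queue_config {
	uint16_t vsi_id;
	uint16_t num_queue_pairs;
	struct qpair_info qpair[];	/* flexible array member */
};

/* Size of the header plus n queue-pair records. */
#define VSI_QUEUE_CONFIG_SIZE(p, n) \
	(sizeof(*(p)) + sizeof((p)->qpair[0]) * (n))

/* Build the message on the stack, then copy it out; returns 0 if too big. */
static uint32_t
build_queue_msg(uint16_t vsi_id, uint16_t nb_qp, uint8_t *out, uint32_t out_len)
{
	struct vsi_queue_config hdr = {
		.vsi_id = vsi_id,
		.num_queue_pairs = nb_qp,
	};
	const uint32_t size = VSI_QUEUE_CONFIG_SIZE(&hdr, nb_qp);
	uint8_t buff[size];	/* variable-length array, no heap allocation */
	uint16_t i;

	assert(out != NULL);

	memset(buff, 0, sizeof(buff));
	memcpy(buff, &hdr, sizeof(hdr));
	for (i = 0; i < nb_qp; i++) {
		struct qpair_info qp = { .queue_id = i, .ring_len = 0 };

		memcpy(buff + sizeof(hdr) + i * sizeof(qp), &qp, sizeof(qp));
	}

	if (size > out_len)
		return 0;
	memcpy(out, buff, size);
	return size;
}
```

A stack buffer removes the allocation failure path, at the cost of stack usage that grows with the number of queue pairs, so it suits small, bounded counts.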

diff --git a/lib/librte_pmd_i40e/i40e_ethdev_vf.c 
b/lib/librte_pmd_i40e/i40e_ethdev_vf.c
index 966f02f..9a8cdc8 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev_vf.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev_vf.c
@@ -537,78 +537,76 @@ static int
 i40evf_configure_queues(struct rte_eth_dev *dev)
 {
struct i40e_vf *vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
-   struct i40e_virtchnl_vsi_queue_config_info *queue_info;
-   struct i40e_virtchnl_queue_pair_info *queue_cfg;
struct i40e_rx_queue **rxq =
(struct i40e_rx_queue **)dev->data->rx_queues;
struct i40e_tx_queue **txq =
(struct i40e_tx_queue **)dev->data->tx_queues;
-   int i, len, nb_qpairs, num_rxq, num_txq;
-   int err;
+   struct i40e_virtchnl_vsi_queue_config_info *vc_vqci;
+   struct i40e_virtchnl_queue_pair_info *vc_qpi;
struct vf_cmd_info args;
-   struct rte_pktmbuf_pool_private *mbp_priv;
+   uint16_t i, nb_qp = vf->num_queue_pairs;
+   const uint32_t size =
+   I40E_VIRTCHNL_CONFIG_VSI_QUEUES_SIZE(vc_vqci, nb_qp);
+   uint8_t buff[size];
+   int ret;

-   nb_qpairs = vf->num_queue_pairs;
-   len = sizeof(*queue_info) + sizeof(*queue_cfg) * nb_qpairs;
-   queue_info = rte_zmalloc("queue_info", len, 0);
-   if (queue_info == NULL) {
-   PMD_INIT_LOG(ERR, "failed alloc memory for queue_info");
-   return -1;
-   }
-   queue_info->vsi_id = vf->vsi_res->vsi_id;
-   queue_info->num_queue_pairs = nb_qpairs;
-   queue_cfg = queue_info->qpair;
+   memset(buff, 0, sizeof(buff));
+   vc_vqci = (struct i40e_virtchnl_vsi_queue_config_info *)buff;
+   vc_vqci->vsi_id = vf->vsi_res->vsi_id;
+   vc_vqci->num_queue_pairs = nb_qp;
+   vc_qpi = vc_vqci->qpair;

-   num_rxq = dev->data->nb_rx_queues;
-   num_txq = dev->data->nb_tx_queues;
/*
 * PF host driver required to configure queues in pairs, which means
 * rxq_num should equals to txq_num. The actual usage won't always
 * work that way. The solution is fills 0 with HW ring option in case
 * they are not equal.
 */
-   for (i = 0; i < nb_qpairs; i++) {
+   for (i = 0; i < nb_qp; i++) {
/*Fill TX info */
-   queue_cfg->txq.vsi_id = queue_info->vsi_id;
-   queue_cfg->txq.queue_id = i;
-   if (i < num_txq) {
-   queue_cfg->txq.ring_len = txq[i]->nb_tx_desc;
-   queue_cfg->txq.dma_ring_addr = 
txq[i]->tx_ring_phys_addr;
+   vc_qpi->txq.vsi_id = vc_vqci->vsi_id;
+   vc_qpi->txq.queue_id = i;
+   if (i < dev->data->nb_tx_queues) {
+   vc_qpi->txq.ring_len = txq[i]->nb_tx_desc;
+   vc_qpi->txq.dma_ring_addr = txq[i]->tx_ring_phys_addr;
} else {
-   queue_cfg->txq.ring_len = 0;
-   queue_cfg->txq.dma_ring_addr = 0;
+   vc_qpi->txq.ring_len = 0;
+   vc_qpi->txq.dma_ring_addr = 0;
}

/* Fill RX info */
-   queue_cfg->rxq.vsi_id = queue_info->vsi_id;
-   queue_cfg->rxq.queue_id = i;
-   queue_cfg->rxq.max_pkt_size = vf->max_pkt_len;
-   if (i < num_rxq) {
-   mbp_priv = rte_mempool_get_priv(rxq[i]->mp);
-   queue_cfg->rxq.databuffer_size = 
mbp_priv->mbuf_data_room_size -
-  RTE_PKTMBUF_HEADROOM;;
-   queue_cfg->rxq.ring_len = rxq[i]->nb_rx_desc;
-   queue_cfg->rxq.dma_ring_addr = 
rxq[i]->rx_ring_phys_addr;;
+   vc_qpi->rxq.vsi_id = vc_vqci->vsi_id;
+   vc_qpi->rxq.queue_id = i;
+   vc_qpi->rxq.max_pkt_size = vf->max_pkt_len;
+   if (i < dev->data->nb_rx_queues) {
+   struct rte_pktmbuf_pool_private *mbp_priv =
+  

[dpdk-dev] [PATCH v3 4/5] i40e: support of configurable crc stripping in rx queue

2014-11-06 Thread Helin Zhang
Support configurable CRC stripping in the context of
VF RX queues.

Signed-off-by: Helin Zhang 
---
 lib/librte_pmd_i40e/i40e_pf.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

v2 changes:
* Put setting the crc stripping into a single patch.

v3 changes:
* Added an input parameter for configuring crc stripping in
  RX queue context.

diff --git a/lib/librte_pmd_i40e/i40e_pf.c b/lib/librte_pmd_i40e/i40e_pf.c
index f4b4f2d..7f98636 100644
--- a/lib/librte_pmd_i40e/i40e_pf.c
+++ b/lib/librte_pmd_i40e/i40e_pf.c
@@ -56,6 +56,8 @@
 #include "i40e_rxtx.h"
 #include "i40e_pf.h"

+#define I40E_CFG_CRCSTRIP_DEFAULT 1
+
 static int
 i40e_pf_host_switch_queues(struct i40e_pf_vf *vf,
   struct i40e_virtchnl_queue_select *qsel,
@@ -325,7 +327,8 @@ send_msg:
 static int
 i40e_pf_host_hmc_config_rxq(struct i40e_hw *hw,
struct i40e_pf_vf *vf,
-   struct i40e_virtchnl_rxq_info *rxq)
+   struct i40e_virtchnl_rxq_info *rxq,
+   uint8_t crcstrip)
 {
int err = I40E_SUCCESS;
struct i40e_hmc_obj_rxq rx_ctx;
@@ -354,7 +357,7 @@ i40e_pf_host_hmc_config_rxq(struct i40e_hw *hw,
rx_ctx.tphdata_ena = 1;
rx_ctx.tphhead_ena = 1;
rx_ctx.lrxqthresh = 2;
-   rx_ctx.crcstrip = 1;
+   rx_ctx.crcstrip = crcstrip;
rx_ctx.l2tsel = 1;
rx_ctx.prefena = 1;

@@ -434,8 +437,8 @@ i40e_pf_host_process_cmd_config_vsi_queues(struct 
i40e_pf_vf *vf,
}

/* Apply VF RX queue setting to HMC */
-   if (i40e_pf_host_hmc_config_rxq(hw, vf, &vc_qpi[i].rxq)
-   != I40E_SUCCESS) {
+   if (i40e_pf_host_hmc_config_rxq(hw, vf, &vc_qpi[i].rxq,
+   I40E_CFG_CRCSTRIP_DEFAULT) != I40E_SUCCESS) {
PMD_DRV_LOG(ERR, "Configure RX queue HMC failed");
ret = I40E_ERR_PARAM;
goto send_msg;
-- 
1.8.1.4
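
The shape of this change, turning a hard-coded field into a parameter with a default, can be sketched as below. struct rxq_ctx is a hypothetical two-field subset of the real HMC RX queue context, and config_rxq() and config_rxq_default() are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirrors I40E_CFG_CRCSTRIP_DEFAULT from the patch: strip CRC by default. */
#define CFG_CRCSTRIP_DEFAULT 1

/* Hypothetical subset of the RX queue context. */
struct rxq_ctx {
	uint8_t crcstrip;	/* 1: hardware strips the CRC, 0: CRC is kept */
	uint8_t prefena;
};

/* Configure the context; the caller now decides about CRC stripping. */
static int
config_rxq(struct rxq_ctx *ctx, uint8_t crcstrip)
{
	if (ctx == NULL || crcstrip > 1)
		return -1;

	memset(ctx, 0, sizeof(*ctx));
	ctx->crcstrip = crcstrip;	/* was a hard-coded 1 before */
	ctx->prefena = 1;
	return 0;
}

/* Callers without a preference keep the previous behaviour. */
static void
config_rxq_default(struct rxq_ctx *ctx)
{
	int ret = config_rxq(ctx, CFG_CRCSTRIP_DEFAULT);

	assert(ret == 0);
	(void)ret;	/* avoid an unused warning when NDEBUG is defined */
}
```

Passing the default from the existing call site keeps behaviour unchanged while leaving room for a caller that wants to choose the value.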



[dpdk-dev] [PATCH v3 2/5] i40evf: Remove 'host_is_dpdk', and use version number instead

2014-11-06 Thread Helin Zhang
The API version number is straightforward enough for identifying
the PF host, so there is no need to use 'host_is_dpdk'.

Signed-off-by: Helin Zhang 
---
 lib/librte_pmd_i40e/i40e_ethdev.h|  3 ++-
 lib/librte_pmd_i40e/i40e_ethdev_vf.c | 29 +++--
 2 files changed, 17 insertions(+), 15 deletions(-)
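
The version-based check that replaces the flag can be sketched as a small classification function. The SK_* values are placeholders chosen for illustration and are not the values of I40E_DPDK_VERSION_MAJOR or I40E_VIRTCHNL_VERSION_MAJOR/MINOR; enum pf_host and classify_pf_host() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values; the real constants live in the i40e headers. */
#define SK_DPDK_VERSION_MAJOR     100u
#define SK_VIRTCHNL_VERSION_MAJOR 1u
#define SK_VIRTCHNL_VERSION_MINOR 0u

/* The scheme only works if the two majors can be told apart. */
static_assert(SK_DPDK_VERSION_MAJOR != SK_VIRTCHNL_VERSION_MAJOR,
	      "DPDK and Linux PF hosts must report different majors");

enum pf_host {
	PF_HOST_DPDK,		/* peer is a DPDK PF host */
	PF_HOST_LINUX,		/* peer is a Linux PF host driver */
	PF_HOST_MISMATCH	/* unknown or incompatible API version */
};

/* Classify the peer from the version it reported. */
static enum pf_host
classify_pf_host(uint32_t major, uint32_t minor)
{
	if (major == SK_DPDK_VERSION_MAJOR)
		return PF_HOST_DPDK;
	if (major == SK_VIRTCHNL_VERSION_MAJOR &&
	    minor == SK_VIRTCHNL_VERSION_MINOR)
		return PF_HOST_LINUX;
	return PF_HOST_MISMATCH;
}
```

Storing the reported version and deriving the host type from it avoids keeping a separate boolean in sync, and later checks such as the VLAN offload path can compare the stored major directly.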

diff --git a/lib/librte_pmd_i40e/i40e_ethdev.h 
b/lib/librte_pmd_i40e/i40e_ethdev.h
index afa14aa..96361c2 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.h
+++ b/lib/librte_pmd_i40e/i40e_ethdev.h
@@ -323,7 +323,8 @@ struct i40e_vf {
bool promisc_unicast_enabled;
bool promisc_multicast_enabled;

-   bool host_is_dpdk; /* The flag indicates if the host is DPDK */
+   uint32_t version_major; /* Major version number */
+   uint32_t version_minor; /* Minor version number */
uint16_t promisc_flags; /* Promiscuous setting */
uint32_t vlan[I40E_VFTA_SIZE]; /* VLAN bit map */

diff --git a/lib/librte_pmd_i40e/i40e_ethdev_vf.c 
b/lib/librte_pmd_i40e/i40e_ethdev_vf.c
index fa838e6..966f02f 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev_vf.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev_vf.c
@@ -393,17 +393,18 @@ i40evf_check_api_version(struct rte_eth_dev *dev)
}

pver = (struct i40e_virtchnl_version_info *)args.out_buffer;
-   /* We are talking with DPDK host */
-   if (pver->major == I40E_DPDK_VERSION_MAJOR) {
-   vf->host_is_dpdk = TRUE;
-   PMD_DRV_LOG(INFO, "Detect PF host is DPDK app");
-   }
-   /* It's linux host driver */
-   else if ((pver->major != version.major) ||
-   (pver->minor != version.minor)) {
-   PMD_INIT_LOG(ERR, "pf/vf API version mismatch. "
-"(%u.%u)-(%u.%u)", pver->major, pver->minor,
-version.major, version.minor);
+   vf->version_major = pver->major;
+   vf->version_minor = pver->minor;
+   if (vf->version_major == I40E_DPDK_VERSION_MAJOR)
+   PMD_DRV_LOG(INFO, "Peer is DPDK PF host");
+   else if ((vf->version_major == I40E_VIRTCHNL_VERSION_MAJOR) &&
+   (vf->version_minor == I40E_VIRTCHNL_VERSION_MINOR))
+   PMD_DRV_LOG(INFO, "Peer is Linux PF host");
+   else {
+   PMD_INIT_LOG(ERR, "PF/VF API version mismatch:(%u.%u)-(%u.%u)",
+   vf->version_major, vf->version_minor,
+   I40E_VIRTCHNL_VERSION_MAJOR,
+   I40E_VIRTCHNL_VERSION_MINOR);
return -1;
}

@@ -1182,7 +1183,7 @@ i40evf_vlan_offload_set(struct rte_eth_dev *dev, int mask)
struct i40e_vf *vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);

/* Linux pf host doesn't support vlan offload yet */
-   if (vf->host_is_dpdk) {
+   if (vf->version_major == I40E_DPDK_VERSION_MAJOR) {
/* Vlan stripping setting */
if (mask & ETH_VLAN_STRIP_MASK) {
/* Enable or disable VLAN stripping */
@@ -1207,7 +1208,7 @@ i40evf_vlan_pvid_set(struct rte_eth_dev *dev, uint16_t 
pvid, int on)
info.on = on;

/* Linux pf host don't support vlan offload yet */
-   if (vf->host_is_dpdk) {
+   if (vf->version_major == I40E_DPDK_VERSION_MAJOR) {
if (info.on)
info.config.pvid = pvid;
else {
@@ -1480,7 +1481,7 @@ i40evf_dev_link_update(struct rte_eth_dev *dev,
 * DPDK pf host provide interfacet to acquire link status
 * while Linux driver does not
 */
-   if (vf->host_is_dpdk)
+   if (vf->version_major == I40E_DPDK_VERSION_MAJOR)
i40evf_get_link_status(dev, &new_link);
else {
/* Always assume it's up, for Linux driver PF host */
-- 
1.8.1.4
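
For readers following the version-based check, here is a minimal standalone
sketch of the classification done in i40evf_check_api_version() above. The
macro names and numeric values are placeholders chosen for illustration (the
real constants come from the i40e headers), and classify_pf_peer() and
peer_supports_vlan_offload() are hypothetical helpers, not driver functions.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values for illustration only; not taken from the i40e headers. */
#define VIRTCHNL_VERSION_MAJOR 1u
#define VIRTCHNL_VERSION_MINOR 0u
#define DPDK_VERSION_MAJOR     (VIRTCHNL_VERSION_MAJOR + 1000u)

enum pf_peer { PF_PEER_DPDK, PF_PEER_LINUX, PF_PEER_MISMATCH };

/* Classify the PF host from the API version it reported to the VF. */
enum pf_peer
classify_pf_peer(uint32_t major, uint32_t minor)
{
	if (major == DPDK_VERSION_MAJOR)
		return PF_PEER_DPDK;
	if (major == VIRTCHNL_VERSION_MAJOR && minor == VIRTCHNL_VERSION_MINOR)
		return PF_PEER_LINUX;
	return PF_PEER_MISMATCH;
}

/* Features such as VLAN offload are only requested from a DPDK PF host. */
int
peer_supports_vlan_offload(uint32_t major)
{
	return major == DPDK_VERSION_MAJOR;
}
```

Storing the two version numbers keeps the "is the host DPDK?" question as a
plain comparison, so no separate boolean has to be kept in sync.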



[dpdk-dev] [PATCH v3 5/5] i40e: support of configurable VF crc stripping

2014-11-06 Thread Helin Zhang
Configurable CRC stripping needs to be supported in VF,
and the configuration should be finally set in relevant
RX queue context with PF host support.

Signed-off-by: Helin Zhang 
---
 lib/librte_pmd_i40e/i40e_ethdev_vf.c | 155 +--
 lib/librte_pmd_i40e/i40e_pf.c|  83 +--
 lib/librte_pmd_i40e/i40e_pf.h|  28 +++
 3 files changed, 218 insertions(+), 48 deletions(-)

v2 changes:
* Put setting the crc stripping of PF host into a single patch.
* Put configuring crc stripping in VF into a single patch.

v3 changes:
* Put configuring crc stripping of both PF host and VF
  into a single patch.
* Defined below new structures for the configuring specifically.
  - struct i40e_virtchnl_rxq_ext_info;
  - struct i40e_virtchnl_queue_pair_ext_info;
  - struct i40e_virtchnl_vsi_queue_config_ext_info;
* Renamed 'I40E_VIRTCHNL_OP_CONFIG_VSI_QUEUES_EX' to
  'I40E_VIRTCHNL_OP_CONFIG_VSI_QUEUES_EXT'.

diff --git a/lib/librte_pmd_i40e/i40e_ethdev_vf.c 
b/lib/librte_pmd_i40e/i40e_ethdev_vf.c
index 9a8cdc8..554d9d7 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev_vf.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev_vf.c
@@ -533,8 +533,46 @@ i40evf_config_vlan_pvid(struct rte_eth_dev *dev,
return err;
 }

+static void
+i40evf_fill_virtchnl_vsi_txq_info(struct i40e_virtchnl_txq_info *txq_info,
+ uint16_t vsi_id,
+ uint16_t queue_id,
+ uint16_t nb_txq,
+ struct i40e_tx_queue *txq)
+{
+   txq_info->vsi_id = vsi_id;
+   txq_info->queue_id = queue_id;
+   if (queue_id < nb_txq) {
+   txq_info->ring_len = txq->nb_tx_desc;
+   txq_info->dma_ring_addr = txq->tx_ring_phys_addr;
+   }
+}
+
+static void
+i40evf_fill_virtchnl_vsi_rxq_info(struct i40e_virtchnl_rxq_info *rxq_info,
+ uint16_t vsi_id,
+ uint16_t queue_id,
+ uint16_t nb_rxq,
+ uint32_t max_pkt_size,
+ struct i40e_rx_queue *rxq)
+{
+   rxq_info->vsi_id = vsi_id;
+   rxq_info->queue_id = queue_id;
+   rxq_info->max_pkt_size = max_pkt_size;
+   if (queue_id < nb_rxq) {
+   struct rte_pktmbuf_pool_private *mbp_priv;
+
+   rxq_info->ring_len = rxq->nb_rx_desc;
+   rxq_info->dma_ring_addr = rxq->rx_ring_phys_addr;
+   mbp_priv = rte_mempool_get_priv(rxq->mp);
+   rxq_info->databuffer_size =
+   mbp_priv->mbuf_data_room_size - RTE_PKTMBUF_HEADROOM;
+   }
+}
+
+/* It configures VSI queues to co-work with Linux PF host */
 static int
-i40evf_configure_queues(struct rte_eth_dev *dev)
+i40evf_configure_vsi_queues(struct rte_eth_dev *dev)
 {
struct i40e_vf *vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
struct i40e_rx_queue **rxq =
@@ -554,47 +592,14 @@ i40evf_configure_queues(struct rte_eth_dev *dev)
vc_vqci = (struct i40e_virtchnl_vsi_queue_config_info *)buff;
vc_vqci->vsi_id = vf->vsi_res->vsi_id;
vc_vqci->num_queue_pairs = nb_qp;
-   vc_qpi = vc_vqci->qpair;
-
-   /*
-* PF host driver required to configure queues in pairs, which means
-* rxq_num should equals to txq_num. The actual usage won't always
-* work that way. The solution is fills 0 with HW ring option in case
-* they are not equal.
-*/
-   for (i = 0; i < nb_qp; i++) {
-   /*Fill TX info */
-   vc_qpi->txq.vsi_id = vc_vqci->vsi_id;
-   vc_qpi->txq.queue_id = i;
-   if (i < dev->data->nb_tx_queues) {
-   vc_qpi->txq.ring_len = txq[i]->nb_tx_desc;
-   vc_qpi->txq.dma_ring_addr = txq[i]->tx_ring_phys_addr;
-   } else {
-   vc_qpi->txq.ring_len = 0;
-   vc_qpi->txq.dma_ring_addr = 0;
-   }

-   /* Fill RX info */
-   vc_qpi->rxq.vsi_id = vc_vqci->vsi_id;
-   vc_qpi->rxq.queue_id = i;
-   vc_qpi->rxq.max_pkt_size = vf->max_pkt_len;
-   if (i < dev->data->nb_rx_queues) {
-   struct rte_pktmbuf_pool_private *mbp_priv =
-   rte_mempool_get_priv(rxq[i]->mp);
-
-   vc_qpi->rxq.databuffer_size =
-   mbp_priv->mbuf_data_room_size -
-   RTE_PKTMBUF_HEADROOM;
-   vc_qpi->rxq.ring_len = rxq[i]->nb_rx_desc;
-   vc_qpi->rxq.dma_ring_addr = rxq[i]->rx_ring_phys_addr;
-   } else {
-   vc_qpi->rxq.ring_len = 0;
-   vc_qpi->rxq.dma_ring_addr = 0;
-   vc_qpi->rxq.databuffer_size = 0;
-   }
-   vc_qpi++;

[dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum offload

2014-11-06 Thread Olivier MATZ
Hello Jijiang,

On 11/06/2014 12:24 PM, Liu, Jijiang wrote:
>> Is it possible to have a more formal definition? For instance, is the 
>> following
>> definition below correct?
>>
>>   "the PKT_RX_TUNNEL_IPV4_HDR flag CAN be set by a driver if the packet
>>contains a tunneling protocol inside an IPv4 header".
> 
> Yes, correct.
> 
>> If the definition above is correct, I don't see how this flag can help an 
>> application
>> to run faster. There is already a flag telling if there is a valid IPv4 
>> header
>> (PKT_RX_IPV4_HDR). As the PKT_RX_TUNNEL_IPV4_HDR flag does not tell what
>> is ip->proto, the work done by an application to dissect a packet would be 
>> exactly
>> the same with or without this flag.
> 
> If the PKT_RX_TUNNEL_IPV4_HDR flag is set, which means driver tell 
> application that incoming packet is encapsulated packet, and application will 
> process / analyse the packet according to tunneling format indicated by 
> packet_type.

Where is it written that when the PKT_RX_TUNNEL_IPV4_HDR flag is set,
the packet_type is also set?

Which header does packet_type refer to? Inner or outer? Does it depend?

What are the possible values for packet_type?

Is the PKT_RX_TUNNEL_IPV4_HDR flag set in mbuf related to the commands
rx_vxlan_port add|del? If yes, it should be written in the API!
(assuming this is the right API design)

When the PKT_RX_TUNNEL_IPV4_HDR flag is set, do PKT_RX_IPV4_HDR and
PKT_RX_VLAN_PKT concern the inner or outer headers? I hope they still
concern the first ones, else it would break many applications relying
on these flags.

As you can see, today, an application cannot use PKT_RX_TUNNEL_IPV4_HDR
or m->packet_type because it is not documented.


> In terms of VXLAN packet format (MAC,IPv4,UDP,VXLAN,MAC,IP,TCP,PAY4), if only 
> the PKT_RX_IPV4_HDR flag is set, and application regard its payload as "from 
> VXLAN to PAY4", but actually, the real payload is PAY4.
>   
>> Please, can you give an example showing in which conditions this flag can 
>> help an
>> application?
> 
> http://dpdk.org/ml/archives/dev/2014-October/007151.html
> http://dpdk.org/ml/archives/dev/2014-October/007156.html
> 
> We used the PKT_RX_TUNNEL_IPV4_HDR in the two patches to help application 
> identify incoming packet is tunneling packet.

As you agreed on "the PKT_RX_TUNNEL_IPV4_HDR flag CAN be set by a
driver", it means that if the flag is not present, the application
should do the check in software. And there are several reasons why
the flag may not be present:
 - the packet is not a VxLAN packet
 - the hw or driver was not able to recognize it (I don't know, maybe
   if there are IP options the hw will not recognize it?)
 - the hw or driver does not support it (all drivers except i40e)

So the application has to provide the software equivalent code
to process PAY4.
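
To make the point concrete, here is a rough sketch of what such a software
fallback could look like. Everything in it is an assumption made for
illustration: HYP_RX_TUNNEL_IPV4_HDR is a made-up flag bit (not the mbuf
value), the frame is taken as a flat untagged Ethernet buffer, and only VXLAN
on the IANA port 4789 is recognized. It does not handle fragments or VLAN tags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HYP_RX_TUNNEL_IPV4_HDR (1u << 0) /* hypothetical flag bit for this sketch */
#define VXLAN_UDP_PORT 4789u             /* IANA-assigned port; deployments may differ */

/* Read a 16-bit big-endian field. */
uint16_t
rd_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Software check: Ethernet / IPv4 / UDP with the VXLAN destination port.
 * IP options are skipped by honouring the IHL field.
 */
int
sw_is_vxlan_ipv4(const uint8_t *frame, size_t len)
{
	const size_t eth_len = 14;
	size_t ihl;

	if (len < eth_len + 20)
		return 0;
	if (rd_be16(frame + 12) != 0x0800)    /* EtherType: IPv4 */
		return 0;
	if ((frame[eth_len] >> 4) != 4)       /* IP version */
		return 0;
	ihl = (size_t)(frame[eth_len] & 0x0f) * 4;
	if (ihl < 20 || len < eth_len + ihl + 8)
		return 0;
	if (frame[eth_len + 9] != 17)         /* IP protocol: UDP */
		return 0;
	return rd_be16(frame + eth_len + ihl + 2) == VXLAN_UDP_PORT;
}

/* Use the hardware hint when present, otherwise dissect in software. */
int
pkt_is_tunnel_ipv4(uint32_t rx_flags, const uint8_t *frame, size_t len)
{
	if (rx_flags & HYP_RX_TUNNEL_IPV4_HDR)
		return 1;
	return sw_is_vxlan_ipv4(frame, len);
}
```

With a structure like this the flag is only a shortcut: the result is the same
on a NIC that never sets it, which is the property the csum engine lost.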

The "csum" testpmd forwarding engine is now a bad example because it
is not able to do the same processing in software or hardware. It
now only works with an i40e driver, which was not the case before. Also,
the semantic of the command line arguments changed. Before, the meaning
was "if the flag is set, process the checksum in the NIC, else in SW".
Now, it's "huh... it depends on the flag."

I will submit a rework of the csum forwarding engine to clarify its
behavior.

Regards,
Olivier


[dpdk-dev] [PATCH v3 1/5] ethdev: add vmdq rx mode

2014-11-06 Thread Thomas Monjalon
2014-10-31 13:19, Ouyang Changchun:
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -577,6 +577,7 @@ struct rte_eth_vmdq_rx_conf {
>   uint8_t default_pool; /**< The default pool, if applicable */
>   uint8_t enable_loop_back; /**< Enable VT loop back */
>   uint8_t nb_pool_maps; /**< We can have up to 64 filters/mappings */
> + uint32_t rx_mode; /**< RX mode for vmdq */

You are adding the field rx_mode in struct rte_eth_vmdq_rx_conf.
So the comment "RX mode for vmdq" is not really informative :)
It would be more interesting to explain which kind of value this field
must contain. Something like "flags from ETH_VMDQ_ACCEPT_*".

-- 
Thomas


[dpdk-dev] [PATCH] distributor: add comments to make code more readable

2014-11-06 Thread Bruce Richardson
From: "Bruce Richardson" 

Add in some additional comments around more complex areas of the code
so as to make the code easier to read and understand.

Signed-off-by: Bruce Richardson 
---
 lib/librte_distributor/rte_distributor.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/lib/librte_distributor/rte_distributor.c 
b/lib/librte_distributor/rte_distributor.c
index 585ff88..656ee5c 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -92,6 +92,7 @@ struct rte_distributor {
unsigned num_workers; /**< Number of workers polling */

uint32_t in_flight_tags[RTE_MAX_LCORE];
+   /**< Tracks the tag being processed per core, 0 == no pkt */
struct rte_distributor_backlog backlog[RTE_MAX_LCORE];

union rte_distributor_buffer bufs[RTE_MAX_LCORE];
@@ -282,10 +283,22 @@ rte_distributor_process(struct rte_distributor *d,
next_mb = mbufs[next_idx++];
next_value = (((int64_t)(uintptr_t)next_mb)
<< RTE_DISTRIB_FLAG_BITS);
+   /*
+* Set the low bit on the tag, so we can guarantee that
+* we never store a tag value of zero. That means we can
+* use the zero-value to indicate that no packet is
+* being processed by a worker.
+*/
new_tag = (next_mb->hash.rss | 1);

uint32_t match = 0;
unsigned i;
+   /*
+* to scan for a match use "xor" and "not" to get a 0/1
+* value, then use shifting to merge to single "match"
+* variable, where a one-bit indicates a match for the
+* worker given by the bit-position
+*/
for (i = 0; i < d->num_workers; i++)
match |= (!(d->in_flight_tags[i] ^ new_tag)
<< i);
-- 
2.1.1
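
The two comments added by this patch can be tried outside the library with a
small standalone sketch. The helper names make_tag() and
find_matching_workers() are hypothetical; the expressions inside them are
lifted from the hunk above. The sketch assumes at most 32 workers so that each
one maps to a bit of the 32-bit mask.

```c
#include <assert.h>
#include <stdint.h>

/* Force the low bit so that a stored tag is never 0 (0 == no packet). */
uint32_t
make_tag(uint32_t rss_hash)
{
	return rss_hash | 1;
}

/*
 * Build a bitmask of workers whose in-flight tag equals new_tag.
 * Bit i is set when worker i is already processing that tag.
 */
uint32_t
find_matching_workers(const uint32_t *in_flight_tags, unsigned num_workers,
		uint32_t new_tag)
{
	uint32_t match = 0;
	unsigned i;

	for (i = 0; i < num_workers; i++)
		/* xor is 0 only for equal tags; "!" turns that into 1 */
		match |= (uint32_t)(!(in_flight_tags[i] ^ new_tag)) << i;
	return match;
}
```

Because make_tag() never returns 0, an idle worker (in-flight tag 0) can never
show up in the returned mask.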



[dpdk-dev] [PATCH] Add user defined tag calculation callback to librte_distributor.

2014-11-06 Thread Bruce Richardson
On Thu, Nov 06, 2014 at 11:36:09AM +0100, Thomas Monjalon wrote:
> 2014-11-06 09:22, Bruce Richardson:
> > On Wed, Nov 05, 2014 at 07:24:13PM +0200, jigsaw wrote:
> > > http://dpdk.org/browse/dpdk/tree/lib/librte_distributor/rte_distributor.c#n285
> > > 
> > > new_tag = (next_mb->hash.rss | 1);
> > > 
> > > Why the logical OR is needed?
> > 
> > That's needed to ensure that we never track a tag with an actual value of 
> > zero.
> > We instead always force the low bit to be 1, so that we can use zero as an
> > "empty" value.
> 
> Bruce, could you check how this code may be better commented please?
> This discussion shows that the distributor library probably needs more
> explanations in the code or doxygen.
>

I've sent a patch adding in a couple of comments where I thought some additional
clarification might be needed. Any other places where more info is needed, just
let me know, and I'll be happy to patch that extra info in too.

/Bruce


[dpdk-dev] [PATCH v3 3/5] ixgbe: Config PFVML2FLT register

2014-11-06 Thread Thomas Monjalon
The title should be more high level.
Example: "ixgbe: configure Rx mode for VMDQ"

2014-10-31 13:19, Ouyang Changchun:
> + for (i = 0; i < (int)num_pools; i++) {
> + if (cfg->rx_mode & ETH_VMDQ_ACCEPT_UNTAG)
> + vmolr |= IXGBE_VMOLR_AUPE;
> + if (cfg->rx_mode & ETH_VMDQ_ACCEPT_HASH_MC)
> + vmolr |= IXGBE_VMOLR_ROMPE;
> + if (cfg->rx_mode & ETH_VMDQ_ACCEPT_HASH_UC)
> + vmolr |= IXGBE_VMOLR_ROPE;
> + if (cfg->rx_mode & ETH_VMDQ_ACCEPT_BROADCAST)
> + vmolr |= IXGBE_VMOLR_BAM;
> + if (cfg->rx_mode & ETH_VMDQ_ACCEPT_MULTICAST)
> + vmolr |= IXGBE_VMOLR_MPE;
> +
> + IXGBE_WRITE_REG(hw, IXGBE_VMOLR(i), vmolr);
> + }

Please factorize code with ixgbe_set_pool_rx_mode() which is really similar.

-- 
Thomas
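
One possible shape for the factorized code is a small table-driven helper that
both call sites could share. This is only a sketch: the macro names mirror the
ones in the quoted hunk, but the bit values below are placeholders for
illustration, and vmdq_rx_mode_to_vmolr() is a hypothetical name, not an
existing ixgbe function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values for this sketch; real code uses the header definitions. */
#define VMDQ_ACCEPT_UNTAG     0x0001u
#define VMDQ_ACCEPT_HASH_MC   0x0002u
#define VMDQ_ACCEPT_HASH_UC   0x0004u
#define VMDQ_ACCEPT_BROADCAST 0x0008u
#define VMDQ_ACCEPT_MULTICAST 0x0010u

#define VMOLR_AUPE  0x01000000u
#define VMOLR_ROMPE 0x02000000u
#define VMOLR_ROPE  0x04000000u
#define VMOLR_BAM   0x08000000u
#define VMOLR_MPE   0x10000000u

struct mode_map {
	uint32_t rx_flag;   /* ethdev-level accept flag */
	uint32_t vmolr_bit; /* matching register bit */
};

static const struct mode_map vmdq_mode_map[] = {
	{ VMDQ_ACCEPT_UNTAG,     VMOLR_AUPE  },
	{ VMDQ_ACCEPT_HASH_MC,   VMOLR_ROMPE },
	{ VMDQ_ACCEPT_HASH_UC,   VMOLR_ROPE  },
	{ VMDQ_ACCEPT_BROADCAST, VMOLR_BAM   },
	{ VMDQ_ACCEPT_MULTICAST, VMOLR_MPE   },
};

/* Return vmolr with every register bit set whose accept flag is in rx_mode. */
uint32_t
vmdq_rx_mode_to_vmolr(uint32_t rx_mode, uint32_t vmolr)
{
	size_t i;

	for (i = 0; i < sizeof(vmdq_mode_map) / sizeof(vmdq_mode_map[0]); i++)
		if (rx_mode & vmdq_mode_map[i].rx_flag)
			vmolr |= vmdq_mode_map[i].vmolr_bit;
	return vmolr;
}
```

The per-pool loop would then reduce to one call followed by the register write.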


[dpdk-dev] [PATCH v3 4/5] virtio: New API for promisc and allmulticast

2014-11-06 Thread Thomas Monjalon
2014-10-31 13:19, Ouyang Changchun:
> Add new API in virtio for supporting promiscuous and allmulticast enable and 
> disable.

It's not a new API because there is no difference for application programming.
It should be something like "virtio: support promiscuous and allmulticast"

-- 
Thomas


[dpdk-dev] Re: [PATCH] Add user defined tag calculation callback to librte_distributor.

2014-11-06 Thread Bruce Richardson
On Thu, Nov 06, 2014 at 02:36:09PM +0200, Qinglai Xiao wrote:
> Hi Bruce,
> 
> There is a subtle case in which tag values are 2 and 3, respectively. Then 
> these two tags cannot be distinguished. There should be a better way so as to 
> handle this situation.

It's not just in that case, it's in any case where a pair of tags differ
only in the lowest bit. I've been assuming that the tag is likely to be a hash
value in most cases - given that it's only 32-bit - in which case it just
doesn't matter which bit we choose to permanently set to 1, but if there are
scenarios where it's likely that the low bits are used but the high ones not
so, we can look to change which bit is set to 1. Either way, the distributor
just uses a 31-bit tag rather than a 32-bit one.

/Bruce
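
A tiny sketch of the trade-off described above: tag_low_bit() is the
expression the distributor uses today, while tag_high_bit() is a hypothetical
alternative shown only to illustrate what "changing which bit is set" would
mean. Neither name exists in the library.

```c
#include <assert.h>
#include <stdint.h>

/* Tag as currently computed: low bit forced to 1. */
uint32_t
tag_low_bit(uint32_t v)
{
	return v | 1u;
}

/* Hypothetical alternative: force the top bit instead. */
uint32_t
tag_high_bit(uint32_t v)
{
	return v | (1u << 31);
}
```

With the low bit forced, inputs 2 and 3 map to the same tag; with the top bit
forced they stay distinct, but inputs that differ only in bit 31 collide
instead. Either way 31 bits of the input survive.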

> 
> thx &
> rgds
> -qinglai
> 
> --
> From: "Thomas Monjalon" 
> Sent: 2014/11/6 12:36
> To: "Bruce Richardson" 
> Cc: "dev at dpdk.org" ; "jigsaw" 
> Subject: Re: [dpdk-dev] [PATCH] Add user defined tag calculation callback 
> to librte_distributor.
> 
> 2014-11-06 09:22, Bruce Richardson:
> > On Wed, Nov 05, 2014 at 07:24:13PM +0200, jigsaw wrote:
> > > http://dpdk.org/browse/dpdk/tree/lib/librte_distributor/rte_distributor.c#n285
> > > 
> > > new_tag = (next_mb->hash.rss | 1);
> > > 
> > > Why the logical OR is needed?
> > 
> > That's needed to ensure that we never track a tag with an actual value of 
> > zero.
> > We instead always force the low bit to be 1, so that we can use zero as an
> > "empty" value.
> 
> Bruce, could you check how this code may be better commented please?
> This discussion shows that the distributor library probably needs more
> explanations in the code or doxygen.
> 
> Thanks
> -- 
> Thomas


[dpdk-dev] [PATCH v2] eal: map uio resources after hugepages when the base_virtaddr is configured.

2014-11-06 Thread lxu
---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 7e62266..a591da3 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -273,6 +273,24 @@ pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
return uio_num;
 }

+static inline const struct rte_memseg *
+get_physmem_last(void)
+{
+   const struct rte_memseg * seg = rte_eal_get_physmem_layout();
+   const struct rte_memseg * last = seg;
+   unsigned i = 0;
+
+   for (i=0; iaddr == NULL)
+   break;
+
+   if(seg->addr > last->addr)
+   last = seg;
+
+   }
+   return last;
+}
+
 /* map the PCI resource of a PCI device in virtual memory */
 int
 pci_uio_map_resource(struct rte_pci_device *dev)
@@ -290,6 +308,13 @@ pci_uio_map_resource(struct rte_pci_device *dev)
struct mapped_pci_resource *uio_res;
struct pci_map *maps;

+   /* map uio resource into user required virtual address */
+   static void * requested_addr = NULL;
+   if (internal_config.base_virtaddr && NULL == requested_addr) {
+   const struct rte_memseg * last = get_physmem_last();
+   requested_addr = (void *)(last->addr_64 + last->len);
+   }
+
dev->intr_handle.fd = -1;
dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;

@@ -371,10 +396,12 @@ pci_uio_map_resource(struct rte_pci_device *dev)
if (maps[j].addr != NULL)
fail = 1;
else {
-   mapaddr = pci_map_resource(NULL, fd, 
(off_t)offset,
+   mapaddr = pci_map_resource(requested_addr, fd, 
(off_t)offset,
(size_t)maps[j].size);
if (mapaddr == NULL)
fail = 1;
+   else if (NULL != requested_addr)
+   requested_addr = (uint8_t *)mapaddr + 
maps[j].size;
}

if (fail) {
-- 
1.9.1
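
Since the patch carries no description, here is a standalone sketch of what it
appears to do: find the memory segment with the highest start address, then
use the first address past it as the mapping hint for the first UIO resource.
struct memseg is a minimal stand-in for the EAL descriptor, and last_memseg()
and first_addr_after() are hypothetical names used only in this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the EAL memory segment descriptor. */
struct memseg {
	void *addr;   /* start virtual address, NULL marks the end of the table */
	size_t len;   /* length in bytes */
};

/* Return the segment with the highest start address (the first entry if none is higher). */
const struct memseg *
last_memseg(const struct memseg *segs, unsigned max_segs)
{
	const struct memseg *last = segs;
	unsigned i;

	for (i = 0; i < max_segs; i++) {
		if (segs[i].addr == NULL)
			break;
		if ((uintptr_t)segs[i].addr > (uintptr_t)last->addr)
			last = &segs[i];
	}
	return last;
}

/* First address past the given segment: candidate hint for the first UIO mapping. */
void *
first_addr_after(const struct memseg *seg)
{
	return (uint8_t *)seg->addr + seg->len;
}
```

In the patch, each successful mapping then advances the hint by the size of
the mapped resource, so later resources are placed back to back.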



[dpdk-dev] [PATCH v5 0/8] support of multiple sizes of redirection table

2014-11-06 Thread Helin Zhang
As e1000, ixgbe and i40e hardware use different sizes of redirection table in
PF or VF, ethdev and PMDs need to be reworked to support multiple sizes of that
table. In addition, commands in testpmd also need to be reworked to support
these changes.

v2 changes:
* Reorganized the patches.
* Added code style fixes.
* Added support of reta updating/querying in i40e VF.

v3 changes:
* Reorganized the patch set.
* Added returning default RX/TX configurations in VF (igb/ixgbe/i40e), as the
  patch set of it for PF has been accepted recently.

v4 changes:
* Renamed RTE_BIT_WIDTH_64 to RTE_RETA_GROUP_SIZE.
* Added more comments to rte_eth_dev_rss_reta_update() and
  rte_eth_dev_rss_reta_query().

v5 changes:
* Reworked the annotations of macros of RETA sizes in rte_ethdev.h.

Helin Zhang (8):
  app/testpmd: code style fix
  i40evf: code style fix
  i40e: support of setting hash lookup table size
  igb: implement ops of 'dev_infos_get' for PF and VF respectively
  ixgbe: implement ops of 'dev_infos_get' for PF and VF respectively
  i40e: rework of ops of 'dev_infos_get' for both PF and VF
  ethdev: support of multiple sizes of redirection table
  i40evf: support of updating/querying redirection table

 app/test-pmd/cmdline.c   | 166 +
 app/test-pmd/config.c|  37 ---
 app/test-pmd/testpmd.h   |   4 +-
 lib/librte_ether/rte_ethdev.c| 116 
 lib/librte_ether/rte_ethdev.h|  51 ++---
 lib/librte_pmd_e1000/igb_ethdev.c| 170 +++---
 lib/librte_pmd_i40e/i40e_ethdev.c| 122 +++--
 lib/librte_pmd_i40e/i40e_ethdev.h|  25 -
 lib/librte_pmd_i40e/i40e_ethdev_vf.c | 124 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c  | 198 ++-
 10 files changed, 694 insertions(+), 319 deletions(-)

-- 
1.8.1.4
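
As a rough illustration of how a table of any supported size can be described
in groups of 64 entries, consider the sketch below. struct reta_entry64 is a
stand-in modelled on how the patches in this series use
struct rte_eth_rss_reta_entry64 (a mask plus a reta[] array), and
reta_nb_groups() and reta_set() are hypothetical helpers, not ethdev functions.

```c
#include <assert.h>
#include <stdint.h>

#define RETA_GROUP_SIZE 64 /* entries per group in this sketch */

/* Stand-in for the per-group configuration entry. */
struct reta_entry64 {
	uint64_t mask;                 /* bit n set: reta[n] is valid */
	uint8_t reta[RETA_GROUP_SIZE]; /* queue index per table entry */
};

/* Number of groups needed to describe a table of reta_size entries. */
unsigned
reta_nb_groups(uint16_t reta_size)
{
	return (reta_size + RETA_GROUP_SIZE - 1) / RETA_GROUP_SIZE;
}

/* Mark entry i of the table as valid and point it at queue. */
void
reta_set(struct reta_entry64 *conf, uint16_t i, uint8_t queue)
{
	uint16_t idx = i / RETA_GROUP_SIZE;   /* which group */
	uint16_t shift = i % RETA_GROUP_SIZE; /* position inside the group */

	conf[idx].mask |= UINT64_C(1) << shift;
	conf[idx].reta[shift] = queue;
}
```

A 64-entry table needs one group, a 128-entry table two, and a 512-entry table
eight, which is why the caller passes reta_size alongside the array.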



[dpdk-dev] [PATCH v5 1/8] app/testpmd: code style fix

2014-11-06 Thread Helin Zhang
Fix of several code style issues.

Signed-off-by: Helin Zhang 
---
 app/test-pmd/cmdline.c | 28 +++-
 app/test-pmd/config.c  |  2 +-
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 4c3fc76..daba286 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -1602,7 +1602,7 @@ parse_reta_config(const char *str, struct 
rte_eth_rss_reta *reta_conf)
nb_queue = (uint8_t)int_fld[FLD_QUEUE];

if (hash_index >= ETH_RSS_RETA_NUM_ENTRIES) {
-   printf("Invalid RETA hash index=%d",hash_index);
+   printf("Invalid RETA hash index=%d", hash_index);
return -1;
}

@@ -1619,22 +1619,24 @@ parse_reta_config(const char *str, struct 
rte_eth_rss_reta *reta_conf)

 static void
 cmd_set_rss_reta_parsed(void *parsed_result,
-   __attribute__((unused)) struct cmdline *cl,
-   __attribute__((unused)) void *data)
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
 {
int ret;
struct rte_eth_rss_reta reta_conf;
struct cmd_config_rss_reta *res = parsed_result;

-   memset(&reta_conf,0,sizeof(struct rte_eth_rss_reta));
+   memset(&reta_conf, 0, sizeof(struct rte_eth_rss_reta));
if (!strcmp(res->list_name, "reta")) {
if (parse_reta_config(res->list_of_items, &reta_conf)) {
-   printf("Invalid RSS Redirection Table config 
entered\n");
+   printf("Invalid RSS Redirection Table config "
+   "entered\n");
return;
}
ret = rte_eth_dev_rss_reta_update(res->port_id, &reta_conf);
if (ret != 0)
-   printf("Bad redirection table parameter, return code = 
%d \n",ret);
+   printf("Bad redirection table parameter, "
+   "return code = %d \n", ret);
}
 }

@@ -1696,19 +1698,19 @@ static void cmd_showport_reta_parsed(void 
*parsed_result,
 }

 cmdline_parse_token_string_t cmd_showport_reta_show =
-TOKEN_STRING_INITIALIZER(struct  cmd_showport_reta, show, "show");
+   TOKEN_STRING_INITIALIZER(struct  cmd_showport_reta, show, "show");
 cmdline_parse_token_string_t cmd_showport_reta_port =
-TOKEN_STRING_INITIALIZER(struct  cmd_showport_reta, port, "port");
+   TOKEN_STRING_INITIALIZER(struct  cmd_showport_reta, port, "port");
 cmdline_parse_token_num_t cmd_showport_reta_port_id =
-TOKEN_NUM_INITIALIZER(struct cmd_showport_reta, port_id, UINT8);
+   TOKEN_NUM_INITIALIZER(struct cmd_showport_reta, port_id, UINT8);
 cmdline_parse_token_string_t cmd_showport_reta_rss =
-TOKEN_STRING_INITIALIZER(struct cmd_showport_reta, rss, "rss");
+   TOKEN_STRING_INITIALIZER(struct cmd_showport_reta, rss, "rss");
 cmdline_parse_token_string_t cmd_showport_reta_reta =
-TOKEN_STRING_INITIALIZER(struct cmd_showport_reta, reta, "reta");
+   TOKEN_STRING_INITIALIZER(struct cmd_showport_reta, reta, "reta");
 cmdline_parse_token_num_t cmd_showport_reta_mask_lo =
-TOKEN_NUM_INITIALIZER(struct cmd_showport_reta,mask_lo,UINT64);
+   TOKEN_NUM_INITIALIZER(struct cmd_showport_reta, mask_lo, UINT64);
 cmdline_parse_token_num_t cmd_showport_reta_mask_hi =
-   TOKEN_NUM_INITIALIZER(struct cmd_showport_reta,mask_hi,UINT64);
+   TOKEN_NUM_INITIALIZER(struct cmd_showport_reta, mask_hi, UINT64);

 cmdline_parse_inst_t cmd_showport_reta = {
.f = cmd_showport_reta_parsed,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 9bc08f4..73afcf5 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -764,7 +764,7 @@ rxtx_config_display(void)
 void
 port_rss_reta_info(portid_t port_id,struct rte_eth_rss_reta *reta_conf)
 {
-   uint8_t i,j;
+   uint8_t i, j;
int ret;

if (port_id_is_invalid(port_id))
-- 
1.8.1.4



[dpdk-dev] [PATCH v5 8/8] i40evf: support of updating/querying redirection table

2014-11-06 Thread Helin Zhang
Support of updating/querying redirection table has been added for VF.

Signed-off-by: Helin Zhang 
---
 lib/librte_pmd_i40e/i40e_ethdev_vf.c | 99 ++--
 1 file changed, 94 insertions(+), 5 deletions(-)

v2 changes:
* Add support of updating/querying i40e reta of VF.

v4 changes:
* Renamed RTE_BIT_WIDTH_64 to RTE_RETA_GROUP_SIZE.

diff --git a/lib/librte_pmd_i40e/i40e_ethdev_vf.c 
b/lib/librte_pmd_i40e/i40e_ethdev_vf.c
index 3e64666..03bc28e 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev_vf.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev_vf.c
@@ -126,11 +126,6 @@ static void i40evf_dev_allmulticast_disable(struct 
rte_eth_dev *dev);
 static int i40evf_get_link_status(struct rte_eth_dev *dev,
  struct rte_eth_link *link);
 static int i40evf_init_vlan(struct rte_eth_dev *dev);
-static int i40evf_config_rss(struct i40e_vf *vf);
-static int i40evf_dev_rss_hash_update(struct rte_eth_dev *dev,
- struct rte_eth_rss_conf *rss_conf);
-static int i40evf_dev_rss_hash_conf_get(struct rte_eth_dev *dev,
-   struct rte_eth_rss_conf *rss_conf);
 static int i40evf_dev_rx_queue_start(struct rte_eth_dev *dev,
 uint16_t rx_queue_id);
 static int i40evf_dev_rx_queue_stop(struct rte_eth_dev *dev,
@@ -139,6 +134,17 @@ static int i40evf_dev_tx_queue_start(struct rte_eth_dev 
*dev,
 uint16_t tx_queue_id);
 static int i40evf_dev_tx_queue_stop(struct rte_eth_dev *dev,
uint16_t tx_queue_id);
+static int i40evf_dev_rss_reta_update(struct rte_eth_dev *dev,
+   struct rte_eth_rss_reta_entry64 *reta_conf,
+   uint16_t reta_size);
+static int i40evf_dev_rss_reta_query(struct rte_eth_dev *dev,
+   struct rte_eth_rss_reta_entry64 *reta_conf,
+   uint16_t reta_size);
+static int i40evf_config_rss(struct i40e_vf *vf);
+static int i40evf_dev_rss_hash_update(struct rte_eth_dev *dev,
+ struct rte_eth_rss_conf *rss_conf);
+static int i40evf_dev_rss_hash_conf_get(struct rte_eth_dev *dev,
+   struct rte_eth_rss_conf *rss_conf);

 /* Default hash key buffer for RSS */
 static uint32_t rss_key_default[I40E_VFQF_HKEY_MAX_INDEX + 1];
@@ -166,6 +172,8 @@ static struct eth_dev_ops i40evf_eth_dev_ops = {
.rx_queue_release = i40e_dev_rx_queue_release,
.tx_queue_setup   = i40e_dev_tx_queue_setup,
.tx_queue_release = i40e_dev_tx_queue_release,
+   .reta_update  = i40evf_dev_rss_reta_update,
+   .reta_query   = i40evf_dev_rss_reta_query,
.rss_hash_update  = i40evf_dev_rss_hash_update,
.rss_hash_conf_get= i40evf_dev_rss_hash_conf_get,
 };
@@ -1611,6 +1619,87 @@ i40evf_dev_close(struct rte_eth_dev *dev)
 }

 static int
+i40evf_dev_rss_reta_update(struct rte_eth_dev *dev,
+  struct rte_eth_rss_reta_entry64 *reta_conf,
+  uint16_t reta_size)
+{
+   struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint32_t lut, l;
+   uint16_t i, j;
+   uint16_t idx, shift;
+   uint8_t mask;
+
+   if (reta_size != ETH_RSS_RETA_SIZE_64) {
+   PMD_DRV_LOG(ERR, "The size of hash lookup table configured "
+   "(%d) doesn't match the number of hardware can"
+   "support (%d)\n", reta_size, ETH_RSS_RETA_SIZE_64);
+   return -EINVAL;
+   }
+
+   for (i = 0; i < reta_size; i += I40E_4_BIT_WIDTH) {
+   idx = i / RTE_RETA_GROUP_SIZE;
+   shift = i % RTE_RETA_GROUP_SIZE;
+   mask = (uint8_t)((reta_conf[idx].mask >> shift) &
+   I40E_4_BIT_MASK);
+   if (!mask)
+   continue;
+   if (mask == I40E_4_BIT_MASK)
+   l = 0;
+   else
+   l = I40E_READ_REG(hw, I40E_VFQF_HLUT(i >> 2));
+
+   for (j = 0, lut = 0; j < I40E_4_BIT_WIDTH; j++) {
+   if (mask & (0x1 << j))
+   lut |= reta_conf[idx].reta[shift + j] <<
+   (CHAR_BIT * j);
+   else
+   lut |= l & (I40E_8_BIT_MASK << (CHAR_BIT * j));
+   }
+   I40E_WRITE_REG(hw, I40E_VFQF_HLUT(i >> 2), lut);
+   }
+
+   return 0;
+}
+
+static int
+i40evf_dev_rss_reta_query(struct rte_eth_dev *dev,
+ struct rte_eth_rss_reta_entry64 *reta_conf,
+ uint16_t reta_size)
+{
+   struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint32_t lut;
+   uint16_t i, j;
+   uint16_t idx, shift;
+   uint8_

[dpdk-dev] [PATCH v5 3/8] i40e: support of setting hash lookup table size

2014-11-06 Thread Helin Zhang
Add support of setting hash lookup table size according to the hardware
capability.

Signed-off-by: Helin Zhang 
---
 lib/librte_ether/rte_ethdev.h |  9 -
 lib/librte_pmd_i40e/i40e_ethdev.c | 14 +-
 lib/librte_pmd_i40e/i40e_ethdev.h |  1 +
 3 files changed, 22 insertions(+), 2 deletions(-)

v5 changes:
* Reworked the annotations of macros of RETA sizes.

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 7e4c998..93df7b1 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -443,9 +443,16 @@ struct rte_eth_rss_conf {
ETH_RSS_FRAG_IPV6 | \
ETH_RSS_L2_PAYLOAD)

-/* Definitions used for redirection table entry size */
+/*
+ * Definitions used for redirection table entry size.
+ * Some RSS RETA sizes may not be supported by some drivers, check the
+ * documentation or the description of relevant functions for more details.
+ */
 #define ETH_RSS_RETA_NUM_ENTRIES 128
 #define ETH_RSS_RETA_MAX_QUEUE   16
+#define ETH_RSS_RETA_SIZE_64  64
+#define ETH_RSS_RETA_SIZE_128 128
+#define ETH_RSS_RETA_SIZE_512 512

 /* Definitions used for VMDQ and DCB functionality */
 #define ETH_VMDQ_MAX_VLAN_FILTERS   64 /**< Maximum nb. of VMDQ vlan filters. 
*/
diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c 
b/lib/librte_pmd_i40e/i40e_ethdev.c
index 4570795..c6b52be 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -3195,7 +3195,19 @@ i40e_pf_setup(struct i40e_pf *pf)

/* Configure filter control */
memset(&settings, 0, sizeof(settings));
-   settings.hash_lut_size = I40E_HASH_LUT_SIZE_128;
+   if (hw->func_caps.rss_table_size == ETH_RSS_RETA_SIZE_128)
+   settings.hash_lut_size = I40E_HASH_LUT_SIZE_128;
+   else if (hw->func_caps.rss_table_size == ETH_RSS_RETA_SIZE_512)
+   settings.hash_lut_size = I40E_HASH_LUT_SIZE_512;
+   else {
+   PMD_DRV_LOG(ERR, "Hash lookup table size (%u) not supported\n",
+   hw->func_caps.rss_table_size);
+   return I40E_ERR_PARAM;
+   }
+   PMD_DRV_LOG(INFO, "Hardware capability of hash lookup table "
+   "size: %u\n", hw->func_caps.rss_table_size);
+   pf->hash_lut_size = hw->func_caps.rss_table_size;
+
/* Enable ethtype and macvlan filters */
settings.enable_ethtype = TRUE;
settings.enable_macvlan = TRUE;
diff --git a/lib/librte_pmd_i40e/i40e_ethdev.h 
b/lib/librte_pmd_i40e/i40e_ethdev.h
index afa14aa..28c0754 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.h
+++ b/lib/librte_pmd_i40e/i40e_ethdev.h
@@ -270,6 +270,7 @@ struct i40e_pf {
uint16_t vmdq_nb_qps; /* The number of queue pairs of VMDq */
uint16_t vf_nb_qps; /* The number of queue pairs of VF */
uint16_t fdir_nb_qps; /* The number of queue pairs of Flow Director */
+   uint16_t hash_lut_size; /* The size of hash lookup table */

/* store VXLAN UDP ports */
uint16_t vxlan_ports[I40E_MAX_PF_UDP_OFFLOAD_PORTS];
-- 
1.8.1.4



[dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum offload

2014-11-06 Thread Liu, Jijiang
Hi Olivier,

> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Thursday, November 6, 2014 9:09 PM
> To: Liu, Jijiang
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum
> offload
> 
> Hello Jijiang,
> 
> On 11/06/2014 12:24 PM, Liu, Jijiang wrote:
> >> Is it possible to have a more formal definition? For instance, is the
> >> following definition below correct?
> >>
> >>   "the PKT_RX_TUNNEL_IPV4_HDR flag CAN be set by a driver if the packet
> >>contains a tunneling protocol inside an IPv4 header".
> >
> > Yes, correct.
> >
> >> If the definition above is correct, I don't see how this flag can
> >> help an application to run faster. There is already a flag telling if
> >> there is a valid IPv4 header (PKT_RX_IPV4_HDR). As the
> >> PKT_RX_TUNNEL_IPV4_HDR flag does not tell what is ip->proto, the work
> >> done by an application to dissect a packet would be exactly the same with
> >> or without this flag.
> >
> > If the PKT_RX_TUNNEL_IPV4_HDR flag is set, the driver is telling the
> > application that the incoming packet is an encapsulated packet, and the
> > application will process / analyse the packet according to the tunneling
> > format indicated by packet_type.
> 
> Where is it written that when the PKT_RX_TUNNEL_IPV4_HDR flag is set, the
> packet_type is also set?
> 
> Which header does packet_type refer to? Inner or outer? Depends?
> 
> What are the possible values for packet_type?
> 
> Is the PKT_RX_TUNNEL_IPV4_HDR flag set in mbuf related to the commands
> rx_vxlan_port add|del? If yes, it should be written in the API!
> (assuming this is the right API design)
> 
> When the PKT_RX_TUNNEL_IPV4_HDR flag is set, does PKT_RX_IPV4_HDR or
> PKT_RX_VLAN_PKT concern the inner or outer headers? I hope it still concerns
> the first one, else it would break many applications relying on these flags.
> 
> As you can see, today, an application cannot use PKT_RX_TUNNEL_IPV4_HDR or
> m->packet_type because it is not documented.
> 
> 
> > In terms of VXLAN packet format (MAC,IPv4,UDP,VXLAN,MAC,IP,TCP,PAY4), if
> > only the PKT_RX_IPV4_HDR flag is set, the application regards the payload as
> > "from VXLAN to PAY4", but actually, the real payload is PAY4.
> >
> >> Please, can you give an example showing in which conditions this flag
> >> can help an application?
> >
> > http://dpdk.org/ml/archives/dev/2014-October/007151.html
> > http://dpdk.org/ml/archives/dev/2014-October/007156.html
> >
> > We used the PKT_RX_TUNNEL_IPV4_HDR in the two patches to help the
> > application identify that an incoming packet is a tunneling packet.
> 
> As you agreed on "the PKT_RX_TUNNEL_IPV4_HDR flag CAN be set by a driver",
> it means that if the flag is not present, the application should do the check
> in software. And there are several reasons why the flag may not be present:
>  - the packet is not a VxLAN packet
As long as it is a tunneling packet with an IPv4/6 header, the flag should be
set by the driver.

>  - the hw or driver was not able to recognize it (I don't know, maybe
>if there are IP options the hw will not recognize it?) 
>  - the hw or driver does not support it (all drivers except i40e)
E1000/ixgbe don't support VXLAN or other tunneling packets, so those drivers
don't need to set this flag.
As for other NICs that support tunneling packets, I don't know why the HW or
driver couldn't recognize them.

> So the application has to provide the software equivalent code to process
> PAY4.
> 
> The "csum" testpmd forwarding engine is now a bad example because it is not
> able to do the same processing in software or hardware. It now only works with
> an i40e driver, which was not the case before. Also, the semantic of the
> command line arguments changed. Before, the meaning was "if the flag is set,
> process the checksum in the NIC, else in SW".
> Now, it's "huh... it depends on the flag."


Currently, if the packet is a non-tunneling packet, I believe the "csum" testpmd
forwarding engine still works as well as before.
We changed the engine as follows, which is compatible with the previous
implementation.
-   if (pkt_ol_flags & PKT_RX_IPV4_HDR) {
+   if (pkt_ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_TUNNEL_IPV4_HDR)) {
...

-   else if (pkt_ol_flags & PKT_RX_IPV6_HDR) {
+   } else if (pkt_ol_flags & (PKT_RX_IPV6_HDR | 
PKT_RX_TUNNEL_IPV6_HDR)) {


> I will submit a rework of the csum forwarding engine to clarify its behavior.
OK. good.

> Regards,
> Olivier
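
The "software equivalent" mentioned in this thread could look roughly like the sketch below. It is only an illustration under stated assumptions, not code from testpmd or any driver: the helper name vxlan_inner_offset() and the macros are hypothetical, the outer frame is assumed to be untagged Ethernet carrying IPv4, the VXLAN UDP port is supplied by the caller, and IP fragments are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical stand-ins, not DPDK definitions. */
#define ETH_HDR_LEN      14
#define IPV4_MIN_HDR_LEN 20
#define UDP_HDR_LEN      8
#define VXLAN_HDR_LEN    8
#define ETHERTYPE_IPV4   0x0800
#define IP_PROTO_UDP     17

/*
 * Software fallback for the case where no RX tunnel flag is set:
 * return the byte offset of the inner Ethernet header of a
 * MAC/IPv4/UDP/VXLAN packet, or -1 if the packet does not match.
 * Assumes an untagged outer Ethernet header; fragments are not checked.
 */
static int
vxlan_inner_offset(const uint8_t *pkt, size_t len, uint16_t vxlan_port)
{
	size_t ip_off = ETH_HDR_LEN, ihl, udp_off;
	uint16_t ethertype, dst_port;

	assert(pkt != NULL);
	if (len < ip_off + IPV4_MIN_HDR_LEN)
		return -1;
	ethertype = (uint16_t)((pkt[12] << 8) | pkt[13]);
	if (ethertype != ETHERTYPE_IPV4 || (pkt[ip_off] >> 4) != 4)
		return -1;
	/* IHL counts 32-bit words; IP options make it larger than 20 bytes */
	ihl = (size_t)(pkt[ip_off] & 0x0f) * 4;
	if (ihl < IPV4_MIN_HDR_LEN || pkt[ip_off + 9] != IP_PROTO_UDP)
		return -1;
	udp_off = ip_off + ihl;
	if (len < udp_off + UDP_HDR_LEN + VXLAN_HDR_LEN)
		return -1;
	/* UDP destination port is in network byte order at offset 2 */
	dst_port = (uint16_t)((pkt[udp_off + 2] << 8) | pkt[udp_off + 3]);
	if (dst_port != vxlan_port)
		return -1;
	return (int)(udp_off + UDP_HDR_LEN + VXLAN_HDR_LEN);
}
```

An application would call such a helper only when the driver did not set a tunnel flag, so the same inner-header processing can run whether or not the NIC recognized the packet.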


[dpdk-dev] [PATCH v5 2/8] i40evf: code style fix

2014-11-06 Thread Helin Zhang
Fix of several code style issues.

Signed-off-by: Helin Zhang 
---
 lib/librte_pmd_i40e/i40e_ethdev_vf.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/lib/librte_pmd_i40e/i40e_ethdev_vf.c 
b/lib/librte_pmd_i40e/i40e_ethdev_vf.c
index fa838e6..5b8a3bf 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev_vf.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev_vf.c
@@ -131,10 +131,14 @@ static int i40evf_dev_rss_hash_update(struct rte_eth_dev 
*dev,
  struct rte_eth_rss_conf *rss_conf);
 static int i40evf_dev_rss_hash_conf_get(struct rte_eth_dev *dev,
struct rte_eth_rss_conf *rss_conf);
-static int i40evf_dev_rx_queue_start(struct rte_eth_dev *, uint16_t);
-static int i40evf_dev_rx_queue_stop(struct rte_eth_dev *, uint16_t);
-static int i40evf_dev_tx_queue_start(struct rte_eth_dev *, uint16_t);
-static int i40evf_dev_tx_queue_stop(struct rte_eth_dev *, uint16_t);
+static int i40evf_dev_rx_queue_start(struct rte_eth_dev *dev,
+uint16_t rx_queue_id);
+static int i40evf_dev_rx_queue_stop(struct rte_eth_dev *dev,
+   uint16_t rx_queue_id);
+static int i40evf_dev_tx_queue_start(struct rte_eth_dev *dev,
+uint16_t tx_queue_id);
+static int i40evf_dev_tx_queue_stop(struct rte_eth_dev *dev,
+   uint16_t tx_queue_id);

 /* Default hash key buffer for RSS */
 static uint32_t rss_key_default[I40E_VFQF_HKEY_MAX_INDEX + 1];
-- 
1.8.1.4



[dpdk-dev] [PATCH v5 6/8] i40e: rework of ops of 'dev_infos_get' for both PF and VF

2014-11-06 Thread Helin Zhang
The ops of 'dev_infos_get' for both PF and VF now return the redirection table
size. The ops of 'dev_infos_get' for VF now also returns the default RX/TX
configurations, which were missing before.

Signed-off-by: Helin Zhang 
---
 lib/librte_pmd_i40e/i40e_ethdev.c| 15 +++
 lib/librte_pmd_i40e/i40e_ethdev.h| 11 +++
 lib/librte_pmd_i40e/i40e_ethdev_vf.c | 23 +++
 3 files changed, 37 insertions(+), 12 deletions(-)

v2 changes:
* Put getting reta size of both i40e PF and VF into a single patch.

v3 changes:
* Returning default RX/TX configurations has been added in ops of
  'dev_infos_get' for VF, as it was added recently in that for PF.

diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c 
b/lib/librte_pmd_i40e/i40e_ethdev.c
index c6b52be..fa6ad01 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -59,17 +59,6 @@
 #include "i40e_rxtx.h"
 #include "i40e_pf.h"

-#define I40E_DEFAULT_RX_FREE_THRESH  32
-#define I40E_DEFAULT_RX_PTHRESH  8
-#define I40E_DEFAULT_RX_HTHRESH  8
-#define I40E_DEFAULT_RX_WTHRESH  0
-
-#define I40E_DEFAULT_TX_FREE_THRESH  32
-#define I40E_DEFAULT_TX_PTHRESH  32
-#define I40E_DEFAULT_TX_HTHRESH  0
-#define I40E_DEFAULT_TX_WTHRESH  0
-#define I40E_DEFAULT_TX_RSBIT_THRESH 32
-
 /* Maximun number of MAC addresses */
 #define I40E_NUM_MACADDR_MAX   64
 #define I40E_CLEAR_PXE_WAIT_MS 200
@@ -1443,6 +1432,7 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
DEV_TX_OFFLOAD_UDP_CKSUM |
DEV_TX_OFFLOAD_TCP_CKSUM |
DEV_TX_OFFLOAD_SCTP_CKSUM;
+   dev_info->reta_size = pf->hash_lut_size;

dev_info->default_rxconf = (struct rte_eth_rxconf) {
.rx_thresh = {
@@ -1462,7 +1452,8 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
},
.tx_free_thresh = I40E_DEFAULT_TX_FREE_THRESH,
.tx_rs_thresh = I40E_DEFAULT_TX_RSBIT_THRESH,
-   .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | 
ETH_TXQ_FLAGS_NOOFFLOADS,
+   .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
+   ETH_TXQ_FLAGS_NOOFFLOADS,
};

if (pf->flags | I40E_FLAG_VMDQ) {
diff --git a/lib/librte_pmd_i40e/i40e_ethdev.h 
b/lib/librte_pmd_i40e/i40e_ethdev.h
index 28c0754..afa4e5d 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.h
+++ b/lib/librte_pmd_i40e/i40e_ethdev.h
@@ -56,6 +56,17 @@
 /* Always assign pool 0 to main VSI, VMDQ will start from 1 */
 #define I40E_VMDQ_POOL_BASE   1

+#define I40E_DEFAULT_RX_FREE_THRESH  32
+#define I40E_DEFAULT_RX_PTHRESH  8
+#define I40E_DEFAULT_RX_HTHRESH  8
+#define I40E_DEFAULT_RX_WTHRESH  0
+
+#define I40E_DEFAULT_TX_FREE_THRESH  32
+#define I40E_DEFAULT_TX_PTHRESH  32
+#define I40E_DEFAULT_TX_HTHRESH  0
+#define I40E_DEFAULT_TX_WTHRESH  0
+#define I40E_DEFAULT_TX_RSBIT_THRESH 32
+
 /* i40e flags */
 #define I40E_FLAG_RSS   (1ULL << 0)
 #define I40E_FLAG_DCB   (1ULL << 1)
diff --git a/lib/librte_pmd_i40e/i40e_ethdev_vf.c 
b/lib/librte_pmd_i40e/i40e_ethdev_vf.c
index 5b8a3bf..3e64666 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev_vf.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev_vf.c
@@ -1567,6 +1567,29 @@ i40evf_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
dev_info->max_tx_queues = vf->vsi_res->num_queue_pairs;
dev_info->min_rx_bufsize = I40E_BUF_SIZE_MIN;
dev_info->max_rx_pktlen = I40E_FRAME_SIZE_MAX;
+   dev_info->reta_size = ETH_RSS_RETA_SIZE_64;
+
+   dev_info->default_rxconf = (struct rte_eth_rxconf) {
+   .rx_thresh = {
+   .pthresh = I40E_DEFAULT_RX_PTHRESH,
+   .hthresh = I40E_DEFAULT_RX_HTHRESH,
+   .wthresh = I40E_DEFAULT_RX_WTHRESH,
+   },
+   .rx_free_thresh = I40E_DEFAULT_RX_FREE_THRESH,
+   .rx_drop_en = 0,
+   };
+
+   dev_info->default_txconf = (struct rte_eth_txconf) {
+   .tx_thresh = {
+   .pthresh = I40E_DEFAULT_TX_PTHRESH,
+   .hthresh = I40E_DEFAULT_TX_HTHRESH,
+   .wthresh = I40E_DEFAULT_TX_WTHRESH,
+   },
+   .tx_free_thresh = I40E_DEFAULT_TX_FREE_THRESH,
+   .tx_rs_thresh = I40E_DEFAULT_TX_RSBIT_THRESH,
+   .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
+   ETH_TXQ_FLAGS_NOOFFLOADS,
+   };
 }

 static void
-- 
1.8.1.4



[dpdk-dev] [PATCH v5 4/8] igb: implement ops of 'dev_infos_get' for PF and VF respectively

2014-11-06 Thread Helin Zhang
As more and more information differs between PF and VF, ops of
'dev_infos_get' has been implemented separately for each. In addition, a new
field 'reta_size' has been added to 'struct rte_eth_dev_info' for returning
the redirection table size.

Signed-off-by: Helin Zhang 
---
 lib/librte_ether/rte_ethdev.h |  2 ++
 lib/librte_pmd_e1000/igb_ethdev.c | 61 ---
 2 files changed, 52 insertions(+), 11 deletions(-)

v2 changes:
* Added new function for ops of 'dev_infos_get' specifically for igb VF.

v3 changes:
* Put the adding new element of 'reta_size' in ethdev into this patch,
  as it is needed.
* Returning default RX/TX configurations has been added in ops of
  'dev_infos_get', as it was accepted recently in another patch.

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 93df7b1..d81629b 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -939,6 +939,8 @@ struct rte_eth_dev_info {
uint16_t max_vmdq_pools; /**< Maximum number of VMDq pools. */
uint32_t rx_offload_capa; /**< Device RX offload capabilities. */
uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
+   uint16_t reta_size;
+   /**< Device redirection table size, the total number of entries. */
struct rte_eth_rxconf default_rxconf; /**< Default RX configuration */
struct rte_eth_txconf default_txconf; /**< Default TX configuration */
uint16_t vmdq_queue_base; /**< First queue ID for VMDQ pools. */
diff --git a/lib/librte_pmd_e1000/igb_ethdev.c 
b/lib/librte_pmd_e1000/igb_ethdev.c
index c13ea05..bae4eb2 100644
--- a/lib/librte_pmd_e1000/igb_ethdev.c
+++ b/lib/librte_pmd_e1000/igb_ethdev.c
@@ -83,6 +83,8 @@ static void eth_igb_stats_get(struct rte_eth_dev *dev,
struct rte_eth_stats *rte_stats);
 static void eth_igb_stats_reset(struct rte_eth_dev *dev);
 static void eth_igb_infos_get(struct rte_eth_dev *dev,
+ struct rte_eth_dev_info *dev_info);
+static void eth_igbvf_infos_get(struct rte_eth_dev *dev,
struct rte_eth_dev_info *dev_info);
 static int  eth_igb_flow_ctrl_get(struct rte_eth_dev *dev,
struct rte_eth_fc_conf *fc_conf);
@@ -282,7 +284,7 @@ static struct eth_dev_ops igbvf_eth_dev_ops = {
.stats_get= eth_igbvf_stats_get,
.stats_reset  = eth_igbvf_stats_reset,
.vlan_filter_set  = igbvf_vlan_filter_set,
-   .dev_infos_get= eth_igb_infos_get,
+   .dev_infos_get= eth_igbvf_infos_get,
.rx_queue_setup   = eth_igb_rx_queue_setup,
.rx_queue_release = eth_igb_rx_queue_release,
.tx_queue_setup   = eth_igb_tx_queue_setup,
@@ -1268,8 +1270,7 @@ eth_igbvf_stats_reset(struct rte_eth_dev *dev)
 }

 static void
-eth_igb_infos_get(struct rte_eth_dev *dev,
-   struct rte_eth_dev_info *dev_info)
+eth_igb_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 {
struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);

@@ -1333,23 +1334,61 @@ eth_igb_infos_get(struct rte_eth_dev *dev,
dev_info->max_vmdq_pools = 0;
break;

+   default:
+   /* Should not happen */
+   break;
+   }
+   dev_info->reta_size = ETH_RSS_RETA_SIZE_128;
+
+   dev_info->default_rxconf = (struct rte_eth_rxconf) {
+   .rx_thresh = {
+   .pthresh = IGB_DEFAULT_RX_PTHRESH,
+   .hthresh = IGB_DEFAULT_RX_HTHRESH,
+   .wthresh = IGB_DEFAULT_RX_WTHRESH,
+   },
+   .rx_free_thresh = IGB_DEFAULT_RX_FREE_THRESH,
+   .rx_drop_en = 0,
+   };
+
+   dev_info->default_txconf = (struct rte_eth_txconf) {
+   .tx_thresh = {
+   .pthresh = IGB_DEFAULT_TX_PTHRESH,
+   .hthresh = IGB_DEFAULT_TX_HTHRESH,
+   .wthresh = IGB_DEFAULT_TX_WTHRESH,
+   },
+   .txq_flags = 0,
+   };
+}
+
+static void
+eth_igbvf_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+   struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+   dev_info->min_rx_bufsize = 256; /* See BSIZE field of RCTL register. */
+   dev_info->max_rx_pktlen  = 0x3FFF; /* See RLPML register. */
+   dev_info->max_mac_addrs = hw->mac.rar_entry_count;
+   dev_info->rx_offload_capa = DEV_RX_OFFLOAD_VLAN_STRIP |
+   DEV_RX_OFFLOAD_IPV4_CKSUM |
+   DEV_RX_OFFLOAD_UDP_CKSUM  |
+   DEV_RX_OFFLOAD_TCP_CKSUM;
+   dev_info->tx_offload_capa = DEV_TX_OFFLOAD_VLAN_INSERT |
+   DEV_TX_OFFLOAD_IPV4_CKSUM  |
+   DEV_TX_OFFLOAD_UDP_CKSUM   |
+   

[dpdk-dev] [PATCH v5 7/8] ethdev: support of multiple sizes of redirection table

2014-11-06 Thread Helin Zhang
As 40G NICs support redirection table sizes (128/512/64 entries) that differ
from the single size (128 entries) of 1G and 10G NICs, support for multiple
redirection table sizes is needed. It includes:
* Redefine 'struct rte_eth_rss_reta' in ethdev.
  - To 'struct rte_eth_rss_reta_entry64' which contains 64 entries and a 64-bit
mask.
  - An array of the above new structure can be used for any number of
redirection table entries, as long as the number is a multiple of 64. This is
quite flexible for future expansion of the redirection table.
* Redefinition of relevant interfaces in ethdev.
  - Interface of reta update has been redefined with new parameters.
  - Interface of reta query has been redefined with new parameters.
* Rework of 1G PMD in igb.
  - reta update has been reworked.
  - reta query has been reworked.
* Rework of 10G PMD in ixgbe.
  - reta update has been reworked.
  - reta query has been reworked.
* Rework of 40G PMD (PF only) in i40e.
  - reta update has been reworked.
  - reta query has been reworked.
* Implement relevant commands in testpmd.

Signed-off-by: Helin Zhang 
---
 app/test-pmd/cmdline.c  | 150 ++--
 app/test-pmd/config.c   |  37 +
 app/test-pmd/testpmd.h  |   4 +-
 lib/librte_ether/rte_ethdev.c   | 116 +---
 lib/librte_ether/rte_ethdev.h   |  40 ++
 lib/librte_pmd_e1000/igb_ethdev.c   | 109 +-
 lib/librte_pmd_i40e/i40e_ethdev.c   |  93 --
 lib/librte_pmd_i40e/i40e_ethdev.h   |  13 +++-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 108 ++
 9 files changed, 406 insertions(+), 264 deletions(-)

v2 changes:
* Put rework of updating/querying igb reta to a single patch.
* Put rework of updating/querying ixgbe reta to a single patch.
* Put rework of updating/querying i40e reta to a single patch.

v3 changes:
* Put all redefinitions of structures and interfaces into a single patch.
* Put all reworks of igb/ixgbe/i40e of supporting multiple sizes of reta into
  the same patch.
* Put all relevant testpmd reworks of supporting multiple sizes of reta into
  the same patch.

v4 changes:
* Renamed RTE_BIT_WIDTH_64 to RTE_RETA_GROUP_SIZE.
* Added more comments to rte_eth_dev_rss_reta_update() and
  rte_eth_dev_rss_reta_query().

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index daba286..cf252e1 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -59,6 +59,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -186,6 +187,11 @@ static void cmd_help_long_parsed(void *parsed_result,
"show port (info|stats|xstats|fdir|stat_qmap) 
(port_id|all)\n"
"Display information for port_id, or all.\n\n"

+   "show port X rss reta (size) (mask0,mask1,...)\n"
+   "Display the rss redirection table entry indicated"
+   " by masks on port X. size is used to indicate the"
+   " hardware supported reta size\n\n"
+
"show port rss-hash [key]\n"
"Display the RSS hash functions and RSS hash key"
" of port X\n\n"
@@ -1562,11 +1568,13 @@ struct cmd_config_rss_reta {
 };

 static int
-parse_reta_config(const char *str, struct rte_eth_rss_reta *reta_conf)
+parse_reta_config(const char *str,
+ struct rte_eth_rss_reta_entry64 *reta_conf,
+ uint16_t nb_entries)
 {
int i;
unsigned size;
-   uint8_t hash_index;
+   uint16_t hash_index, idx, shift;
uint8_t nb_queue;
char s[256];
const char *p, *p0 = str;
@@ -1594,24 +1602,23 @@ parse_reta_config(const char *str, struct 
rte_eth_rss_reta *reta_conf)
for (i = 0; i < _NUM_FLD; i++) {
errno = 0;
int_fld[i] = strtoul(str_fld[i], &end, 0);
-   if (errno != 0 || end == str_fld[i] || int_fld[i] > 255)
+   if (errno != 0 || end == str_fld[i] ||
+   int_fld[i] > 65535)
return -1;
}

-   hash_index = (uint8_t)int_fld[FLD_HASH_INDEX];
+   hash_index = (uint16_t)int_fld[FLD_HASH_INDEX];
nb_queue = (uint8_t)int_fld[FLD_QUEUE];

-   if (hash_index >= ETH_RSS_RETA_NUM_ENTRIES) {
-   printf("Invalid RETA hash index=%d", hash_index);
+   if (hash_index >= nb_entries) {
+   printf("Invalid RETA hash index=%d\n", hash_index);
return -1;
}

-   if (hash_index < ETH_RSS_RETA_NUM_ENTRIES/2)
-   reta_conf->mask_lo |= (1ULL << hash_index);
-   else
-   reta_conf->mask_hi |= (1ULL << (hash_index - 

[dpdk-dev] [PATCH v5 5/8] ixgbe: implement ops of 'dev_infos_get' for PF and VF respectively

2014-11-06 Thread Helin Zhang
As more and more information differs between PF and VF, ops of
'dev_infos_get' has been implemented separately for each. In addition, it now
returns the redirection table size.

Signed-off-by: Helin Zhang 
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 90 +
 1 file changed, 71 insertions(+), 19 deletions(-)

v2 changes:
* Added new function for ops of 'dev_infos_get' specifically for ixgbe VF.

v3 changes:
* Returning default RX/TX configurations has been added in ops of
  'dev_infos_get' for VF, as it was added recently in that for PF.

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index 9c73a30..5a17f3a 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -132,8 +132,9 @@ static int ixgbe_dev_queue_stats_mapping_set(struct 
rte_eth_dev *eth_dev,
 uint8_t stat_idx,
 uint8_t is_rx);
 static void ixgbe_dev_info_get(struct rte_eth_dev *dev,
-   struct rte_eth_dev_info *dev_info);
-
+  struct rte_eth_dev_info *dev_info);
+static void ixgbevf_dev_info_get(struct rte_eth_dev *dev,
+struct rte_eth_dev_info *dev_info);
 static int ixgbe_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu);

 static int ixgbe_vlan_filter_set(struct rte_eth_dev *dev,
@@ -391,7 +392,7 @@ static struct eth_dev_ops ixgbevf_eth_dev_ops = {
.stats_get= ixgbevf_dev_stats_get,
.stats_reset  = ixgbevf_dev_stats_reset,
.dev_close= ixgbevf_dev_close,
-   .dev_infos_get= ixgbe_dev_info_get,
+   .dev_infos_get= ixgbevf_dev_info_get,
.mtu_set  = ixgbevf_dev_set_mtu,
.vlan_filter_set  = ixgbevf_vlan_filter_set,
.vlan_strip_queue_set = ixgbevf_vlan_strip_queue_set,
@@ -1964,25 +1965,76 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
DEV_TX_OFFLOAD_SCTP_CKSUM;

dev_info->default_rxconf = (struct rte_eth_rxconf) {
-   .rx_thresh = {
-   .pthresh = IXGBE_DEFAULT_RX_PTHRESH,
-   .hthresh = IXGBE_DEFAULT_RX_HTHRESH,
-   .wthresh = IXGBE_DEFAULT_RX_WTHRESH,
-   },
-   .rx_free_thresh = IXGBE_DEFAULT_RX_FREE_THRESH,
-   .rx_drop_en = 0,
+   .rx_thresh = {
+   .pthresh = IXGBE_DEFAULT_RX_PTHRESH,
+   .hthresh = IXGBE_DEFAULT_RX_HTHRESH,
+   .wthresh = IXGBE_DEFAULT_RX_WTHRESH,
+   },
+   .rx_free_thresh = IXGBE_DEFAULT_RX_FREE_THRESH,
+   .rx_drop_en = 0,
+   };
+
+   dev_info->default_txconf = (struct rte_eth_txconf) {
+   .tx_thresh = {
+   .pthresh = IXGBE_DEFAULT_TX_PTHRESH,
+   .hthresh = IXGBE_DEFAULT_TX_HTHRESH,
+   .wthresh = IXGBE_DEFAULT_TX_WTHRESH,
+   },
+   .tx_free_thresh = IXGBE_DEFAULT_TX_FREE_THRESH,
+   .tx_rs_thresh = IXGBE_DEFAULT_TX_RSBIT_THRESH,
+   .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
+   ETH_TXQ_FLAGS_NOOFFLOADS,
};
+   dev_info->reta_size = ETH_RSS_RETA_SIZE_128;
+}

+static void
+ixgbevf_dev_info_get(struct rte_eth_dev *dev,
+struct rte_eth_dev_info *dev_info)
+{
+   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+   dev_info->max_rx_queues = (uint16_t)hw->mac.max_rx_queues;
+   dev_info->max_tx_queues = (uint16_t)hw->mac.max_tx_queues;
+   dev_info->min_rx_bufsize = 1024; /* cf BSIZEPACKET in SRRCTL reg */
+   dev_info->max_rx_pktlen = 15872; /* includes CRC, cf MAXFRS reg */
+   dev_info->max_mac_addrs = hw->mac.num_rar_entries;
+   dev_info->max_hash_mac_addrs = IXGBE_VMDQ_NUM_UC_MAC;
+   dev_info->max_vfs = dev->pci_dev->max_vfs;
+   if (hw->mac.type == ixgbe_mac_82598EB)
+   dev_info->max_vmdq_pools = ETH_16_POOLS;
+   else
+   dev_info->max_vmdq_pools = ETH_64_POOLS;
+   dev_info->rx_offload_capa = DEV_RX_OFFLOAD_VLAN_STRIP |
+   DEV_RX_OFFLOAD_IPV4_CKSUM |
+   DEV_RX_OFFLOAD_UDP_CKSUM  |
+   DEV_RX_OFFLOAD_TCP_CKSUM;
+   dev_info->tx_offload_capa = DEV_TX_OFFLOAD_VLAN_INSERT |
+   DEV_TX_OFFLOAD_IPV4_CKSUM  |
+   DEV_TX_OFFLOAD_UDP_CKSUM   |
+   DEV_TX_OFFLOAD_TCP_CKSUM   |
+   DEV_TX_OFFLOAD_SCTP_CKSUM;
+
+   dev_info->default_rxconf = (struct rte_eth_rxconf) {
+   .rx_thresh = {
+

[dpdk-dev] [PATCH v2] eal: map uio resources after hugepages when the base_virtaddr is configured.

2014-11-06 Thread Burakov, Anatoly
Few nitpicks.

Static variables are always initialized to 0, so "= NULL" isn't necessary, a 
declaration will suffice. Also, we have a macro RTE_PTR_ADD to add numbers to 
pointers, I think it would be better to use those. Otherwise, looks fine to me.

I still feel uneasy about depending on nothing being mapped directly after 
hugepages (perhaps we could do mmap(bar_size) before trying pci_map_resource, 
and increment requested_addr until we find a free spot?), but I imagine this 
case would be quite rare, so probably it's not worth the added kludge.

Thanks,
Anatoly
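
The probing idea floated above (reserve the range first, and move requested_addr forward until a free spot is found) might be sketched like this. It is not part of the patch: find_free_addr() and PTR_ADD are hypothetical names, PTR_ADD is only a local stand-in for RTE_PTR_ADD, and the hint and step are assumed to be page-aligned.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Local stand-in for DPDK's RTE_PTR_ADD: byte-wise pointer addition. */
#define PTR_ADD(ptr, x) ((void *)((uintptr_t)(ptr) + (x)))

/*
 * Hypothetical helper: starting at hint, look for a free range of size
 * bytes. Without MAP_FIXED the kernel treats hint as advisory, so a
 * result different from hint means the range was already in use; step
 * forward and retry. The probe mapping is released before returning, so
 * the caller can map the real resource there (another thread could
 * still take the range in between).
 */
static void *
find_free_addr(void *hint, size_t size, size_t step, unsigned max_tries)
{
	unsigned i;

	assert(size > 0 && step > 0);
	for (i = 0; i < max_tries; i++) {
		void *p = mmap(hint, size, PROT_NONE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			return NULL;
		munmap(p, size);
		if (p == hint)
			return hint;
		hint = PTR_ADD(hint, step);
	}
	return NULL;
}
```

Whether the extra probing is worth it is the trade-off discussed above; the sketch only shows that the check itself is small.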

-Original Message-
From: lxu [mailto:liang...@cinfotech.cn] 
Sent: Thursday, November 6, 2014 2:12 PM
To: dev at dpdk.org
Cc: Burakov, Anatoly
Subject: [PATCH v2] eal: map uio resources after hugepages when the 
base_virtaddr is configured.

---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 7e62266..a591da3 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -273,6 +273,24 @@ pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
return uio_num;
 }

+static inline const struct rte_memseg *
+get_physmem_last(void)
+{
+   const struct rte_memseg * seg = rte_eal_get_physmem_layout();
+   const struct rte_memseg * last = seg;
+   unsigned i = 0;
+
+   for (i=0; i<RTE_MAX_MEMSEG; i++, seg++) {
+   if (seg->addr == NULL)
+   break;
+
+   if(seg->addr > last->addr)
+   last = seg;
+
+   }
+   return last;
+}
+
 /* map the PCI resource of a PCI device in virtual memory */
 int
 pci_uio_map_resource(struct rte_pci_device *dev)
@@ -290,6 +308,13 @@ pci_uio_map_resource(struct rte_pci_device *dev)
struct mapped_pci_resource *uio_res;
struct pci_map *maps;

+   /* map uio resource into user required virtual address */
+   static void * requested_addr = NULL;
+   if (internal_config.base_virtaddr && NULL == requested_addr) {
+   const struct rte_memseg * last = get_physmem_last();
+   requested_addr = (void *)(last->addr_64 + last->len);
+   }
+
dev->intr_handle.fd = -1;
dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;

@@ -371,10 +396,12 @@ pci_uio_map_resource(struct rte_pci_device *dev)
if (maps[j].addr != NULL)
fail = 1;
else {
-   mapaddr = pci_map_resource(NULL, fd, 
(off_t)offset,
+   mapaddr = pci_map_resource(requested_addr, fd, 
(off_t)offset,
(size_t)maps[j].size);
if (mapaddr == NULL)
fail = 1;
+   else if (NULL != requested_addr)
+   requested_addr = (uint8_t *)mapaddr + 
maps[j].size;
}

if (fail) {
-- 
1.9.1



[dpdk-dev] [PATCH v3] eal: map uio resources after hugepages when the base_virtaddr is configured.

2014-11-06 Thread lxu
---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 7e62266..3a218d0 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -273,6 +273,24 @@ pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
return uio_num;
 }

+static inline const struct rte_memseg *
+get_physmem_last(void)
+{
+   const struct rte_memseg * seg = rte_eal_get_physmem_layout();
+   const struct rte_memseg * last = seg;
+   unsigned i = 0;
+
+   for (i=0; i<RTE_MAX_MEMSEG; i++, seg++) {
+   if (seg->addr == NULL)
+   break;
+
+   if(seg->addr > last->addr)
+   last = seg;
+
+   }
+   return last;
+}
+
 /* map the PCI resource of a PCI device in virtual memory */
 int
 pci_uio_map_resource(struct rte_pci_device *dev)
@@ -290,6 +308,13 @@ pci_uio_map_resource(struct rte_pci_device *dev)
struct mapped_pci_resource *uio_res;
struct pci_map *maps;

+   /* map uio resource into user required virtual address */
+   static void * requested_addr;
+   if (internal_config.base_virtaddr && NULL == requested_addr) {
+   const struct rte_memseg * last = get_physmem_last();
+   requested_addr = RTE_PTR_ADD(last->addr, last->len);
+   }
+
dev->intr_handle.fd = -1;
dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;

@@ -371,10 +396,12 @@ pci_uio_map_resource(struct rte_pci_device *dev)
if (maps[j].addr != NULL)
fail = 1;
else {
-   mapaddr = pci_map_resource(NULL, fd, 
(off_t)offset,
+   mapaddr = pci_map_resource(requested_addr, fd, 
(off_t)offset,
(size_t)maps[j].size);
if (mapaddr == NULL)
fail = 1;
+   else if (NULL != requested_addr)
+   requested_addr = (uint8_t *)mapaddr + 
maps[j].size;
}

if (fail) {
-- 
1.9.1



[dpdk-dev] Re: [PATCH v2] eal: map uio resources after hugepages when the base_virtaddr is configured.

2014-11-06 Thread 徐亮
When a user configures base_virtaddr, we should trust them to take care of it.
In my case, I always check /proc//maps to find a huge free address space, 
such as 0x20  , to map all the hugepages and uio resources.

-Original Message-
From: Burakov, Anatoly
Sent: Thursday, November 6, 2014 22:29
To/Cc: dev at dpdk.org
Subject: RE: [PATCH v2] eal: map uio resources after hugepages when the 
base_virtaddr is configured.

Few nitpicks.

Static variables are always initialized to 0, so "= NULL" isn't necessary, a 
declaration will suffice. Also, we have a macro RTE_PTR_ADD to add numbers to 
pointers, I think it would be better to use those. Otherwise, looks fine to me.

I still feel uneasy about depending on nothing being mapped directly after 
hugepages (perhaps we could do mmap(bar_size) before trying pci_map_resource, 
and increment requested_addr until we find a free spot?), but I imagine this 
case would be quite rare, so probably it's not worth the added kludge.

Thanks,
Anatoly

-Original Message-
From: lxu [mailto:liang...@cinfotech.cn] 
Sent: Thursday, November 6, 2014 2:12 PM
To: dev at dpdk.org
Cc: Burakov, Anatoly
Subject: [PATCH v2] eal: map uio resources after hugepages when the 
base_virtaddr is configured.

---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 7e62266..a591da3 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -273,6 +273,24 @@ pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
return uio_num;
 }

+static inline const struct rte_memseg *
+get_physmem_last(void)
+{
+   const struct rte_memseg * seg = rte_eal_get_physmem_layout();
+   const struct rte_memseg * last = seg;
+   unsigned i = 0;
+
+   for (i=0; i<RTE_MAX_MEMSEG; i++, seg++) {
+   if (seg->addr == NULL)
+   break;
+
+   if(seg->addr > last->addr)
+   last = seg;
+
+   }
+   return last;
+}
+
 /* map the PCI resource of a PCI device in virtual memory */
 int
 pci_uio_map_resource(struct rte_pci_device *dev)
@@ -290,6 +308,13 @@ pci_uio_map_resource(struct rte_pci_device *dev)
struct mapped_pci_resource *uio_res;
struct pci_map *maps;

+   /* map uio resource into user required virtual address */
+   static void * requested_addr = NULL;
+   if (internal_config.base_virtaddr && NULL == requested_addr) {
+   const struct rte_memseg * last = get_physmem_last();
+   requested_addr = (void *)(last->addr_64 + last->len);
+   }
+
dev->intr_handle.fd = -1;
dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;

@@ -371,10 +396,12 @@ pci_uio_map_resource(struct rte_pci_device *dev)
if (maps[j].addr != NULL)
fail = 1;
else {
-   mapaddr = pci_map_resource(NULL, fd, 
(off_t)offset,
+   mapaddr = pci_map_resource(requested_addr, fd, 
(off_t)offset,
(size_t)maps[j].size);
if (mapaddr == NULL)
fail = 1;
+   else if (NULL != requested_addr)
+   requested_addr = (uint8_t *)mapaddr + 
maps[j].size;
}

if (fail) {
-- 
1.9.1


[dpdk-dev] [PATCH v3] eal: map uio resources after hugepages when the base_virtaddr is configured.

2014-11-06 Thread Burakov, Anatoly
+   requested_addr = (uint8_t *)mapaddr + 
maps[j].size;

RTE_PTR_ADD?

Thanks,
Anatoly

-Original Message-
From: lxu [mailto:liang...@cinfotech.cn] 
Sent: Thursday, November 6, 2014 2:47 PM
To: dev at dpdk.org
Cc: Burakov, Anatoly
Subject: [PATCH v3] eal: map uio resources after hugepages when the 
base_virtaddr is configured.

---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 7e62266..3a218d0 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -273,6 +273,24 @@ pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
return uio_num;
 }

+static inline const struct rte_memseg *
+get_physmem_last(void)
+{
+   const struct rte_memseg * seg = rte_eal_get_physmem_layout();
+   const struct rte_memseg * last = seg;
+   unsigned i = 0;
+
+   for (i=0; i<RTE_MAX_MEMSEG; i++, seg++) {
+   if (seg->addr == NULL)
+   break;
+
+   if(seg->addr > last->addr)
+   last = seg;
+
+   }
+   return last;
+}
+
 /* map the PCI resource of a PCI device in virtual memory */
 int
 pci_uio_map_resource(struct rte_pci_device *dev)
@@ -290,6 +308,13 @@ pci_uio_map_resource(struct rte_pci_device *dev)
struct mapped_pci_resource *uio_res;
struct pci_map *maps;

+   /* map uio resource into user required virtual address */
+   static void * requested_addr;
+   if (internal_config.base_virtaddr && NULL == requested_addr) {
+   const struct rte_memseg * last = get_physmem_last();
+   requested_addr = RTE_PTR_ADD(last->addr, last->len);
+   }
+
dev->intr_handle.fd = -1;
dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;

@@ -371,10 +396,12 @@ pci_uio_map_resource(struct rte_pci_device *dev)
if (maps[j].addr != NULL)
fail = 1;
else {
-   mapaddr = pci_map_resource(NULL, fd, 
(off_t)offset,
+   mapaddr = pci_map_resource(requested_addr, fd, 
(off_t)offset,
(size_t)maps[j].size);
if (mapaddr == NULL)
fail = 1;
+   else if (NULL != requested_addr)
+   requested_addr = (uint8_t *)mapaddr + 
maps[j].size;
}

if (fail) {
--
1.9.1



[dpdk-dev] [PATCH v4] eal: map uio resources after hugepages when the base_virtaddr is configured.

2014-11-06 Thread lxu
---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 7e62266..a2c9ab6 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -273,6 +273,24 @@ pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
return uio_num;
 }

+static inline const struct rte_memseg *
+get_physmem_last(void)
+{
+   const struct rte_memseg * seg = rte_eal_get_physmem_layout();
+   const struct rte_memseg * last = seg;
+   unsigned i = 0;
+
+   for (i=0; i<RTE_MAX_MEMSEG; i++, seg++) {
+   if (seg->addr == NULL)
+   break;
+
+   if(seg->addr > last->addr)
+   last = seg;
+
+   }
+   return last;
+}
+
 /* map the PCI resource of a PCI device in virtual memory */
 int
 pci_uio_map_resource(struct rte_pci_device *dev)
@@ -290,6 +308,13 @@ pci_uio_map_resource(struct rte_pci_device *dev)
struct mapped_pci_resource *uio_res;
struct pci_map *maps;

+   /* map uio resource into user required virtual address */
+   static void * requested_addr;
+   if (internal_config.base_virtaddr && NULL == requested_addr) {
+   const struct rte_memseg * last = get_physmem_last();
+   requested_addr = RTE_PTR_ADD(last->addr, last->len);
+   }
+
dev->intr_handle.fd = -1;
dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;

@@ -371,10 +396,12 @@ pci_uio_map_resource(struct rte_pci_device *dev)
if (maps[j].addr != NULL)
fail = 1;
else {
-   mapaddr = pci_map_resource(NULL, fd, (off_t)offset,
+   mapaddr = pci_map_resource(requested_addr, fd, (off_t)offset,
(size_t)maps[j].size);
if (mapaddr == NULL)
fail = 1;
+   else if (NULL != requested_addr)
+   requested_addr = RTE_PTR_ADD(mapaddr, maps[j].size);
}

if (fail) {
-- 
1.9.1



[dpdk-dev] [PATCH v3] eal: map uio resources after hugepages when the base_virtaddr is configured.

2014-11-06 Thread De Lara Guarch, Pablo
Please include at least a Signed-off-by line.

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of lxu
> Sent: Thursday, November 06, 2014 2:47 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v3] eal: map uio resources after hugepages
> when the base_virtaddr is configured.
> 
> ---
>  lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 29
> -
>  1 file changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> index 7e62266..3a218d0 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> @@ -273,6 +273,24 @@ pci_get_uio_dev(struct rte_pci_device *dev, char
> *dstbuf,
>   return uio_num;
>  }
> 
> +static inline const struct rte_memseg *
> +get_physmem_last(void)
> +{
> + const struct rte_memseg * seg = rte_eal_get_physmem_layout();
> + const struct rte_memseg * last = seg;
> + unsigned i = 0;
> +
> + for (i=0; i<RTE_MAX_MEMSEG; i++, seg++) {
> + if (seg->addr == NULL)
> + break;
> +
> + if(seg->addr > last->addr)
> + last = seg;
> +
> + }
> + return last;
> +}
> +
>  /* map the PCI resource of a PCI device in virtual memory */
>  int
>  pci_uio_map_resource(struct rte_pci_device *dev)
> @@ -290,6 +308,13 @@ pci_uio_map_resource(struct rte_pci_device *dev)
>   struct mapped_pci_resource *uio_res;
>   struct pci_map *maps;
> 
> + /* map uio resource into user required virtual address */
> + static void * requested_addr;
> + if (internal_config.base_virtaddr && NULL == requested_addr) {
> + const struct rte_memseg * last = get_physmem_last();
> + requested_addr = RTE_PTR_ADD(last->addr, last->len);
> + }
> +
>   dev->intr_handle.fd = -1;
>   dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
> 
> @@ -371,10 +396,12 @@ pci_uio_map_resource(struct rte_pci_device *dev)
>   if (maps[j].addr != NULL)
>   fail = 1;
>   else {
> - mapaddr = pci_map_resource(NULL, fd, (off_t)offset,
> + mapaddr = pci_map_resource(requested_addr, fd, (off_t)offset,
>   (size_t)maps[j].size);
>   if (mapaddr == NULL)
>   fail = 1;
> + else if (NULL != requested_addr)
> + requested_addr = (uint8_t *)mapaddr + maps[j].size;
>   }
> 
>   if (fail) {
> --
> 1.9.1



[dpdk-dev] [PATCH v4] eal: map uio resources after hugepages when the base_virtaddr is configured.

2014-11-06 Thread lxu
Signed-off-by: lxu 
---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 7e62266..a2c9ab6 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -273,6 +273,24 @@ pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
return uio_num;
 }

+static inline const struct rte_memseg *
+get_physmem_last(void)
+{
+   const struct rte_memseg * seg = rte_eal_get_physmem_layout();
+   const struct rte_memseg * last = seg;
+   unsigned i = 0;
+
+   for (i=0; i<RTE_MAX_MEMSEG; i++, seg++) {
+   if (seg->addr == NULL)
+   break;
+
+   if(seg->addr > last->addr)
+   last = seg;
+
+   }
+   return last;
+}
+
 /* map the PCI resource of a PCI device in virtual memory */
 int
 pci_uio_map_resource(struct rte_pci_device *dev)
@@ -290,6 +308,13 @@ pci_uio_map_resource(struct rte_pci_device *dev)
struct mapped_pci_resource *uio_res;
struct pci_map *maps;

+   /* map uio resource into user required virtual address */
+   static void * requested_addr;
+   if (internal_config.base_virtaddr && NULL == requested_addr) {
+   const struct rte_memseg * last = get_physmem_last();
+   requested_addr = RTE_PTR_ADD(last->addr, last->len);
+   }
+
dev->intr_handle.fd = -1;
dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;

@@ -371,10 +396,12 @@ pci_uio_map_resource(struct rte_pci_device *dev)
if (maps[j].addr != NULL)
fail = 1;
else {
-   mapaddr = pci_map_resource(NULL, fd, (off_t)offset,
+   mapaddr = pci_map_resource(requested_addr, fd, (off_t)offset,
(size_t)maps[j].size);
if (mapaddr == NULL)
fail = 1;
+   else if (NULL != requested_addr)
+   requested_addr = RTE_PTR_ADD(mapaddr, maps[j].size);
}

if (fail) {
-- 
1.9.1



[dpdk-dev] [PATCH v4] eal: map uio resources after hugepages when the base_virtaddr is configured.

2014-11-06 Thread Thomas Monjalon
No explanation -> I won't review it.
No signed-off-by -> I'll ignore it.

Maybe your patch is useful, maybe not. I cannot know.

-- 
Thomas


[dpdk-dev] [PATCH v5] eal: map uio resources after hugepages when the base_virtaddr is configured.

2014-11-06 Thread lxu
Sorry, I'm still learning the right way to send a patch with git.
I have a multi-process application. When the secondary process starts, I
get the error message "EAL: pci_map_resource(): cannot mmap(11, 0x77fba000, 
0x2, 0x0): Bad file descriptor (0x77fb9000)".
The secondary process links many additional shared libraries, so the
address 0x77fba000 was already in use.
I had fixed similar hugepage mmap problems with base_virtaddr, so I believe the
uio resources should be mapped right after the hugepages when base_virtaddr is
configured.
This patch tries to fix it.


Signed-off-by: lxu 
---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 7e62266..a2c9ab6 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -273,6 +273,24 @@ pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
return uio_num;
 }

+static inline const struct rte_memseg *
+get_physmem_last(void)
+{
+   const struct rte_memseg * seg = rte_eal_get_physmem_layout();
+   const struct rte_memseg * last = seg;
+   unsigned i = 0;
+
+   for (i=0; i<RTE_MAX_MEMSEG; i++, seg++) {
+   if (seg->addr == NULL)
+   break;
+
+   if(seg->addr > last->addr)
+   last = seg;
+
+   }
+   return last;
+}
+
 /* map the PCI resource of a PCI device in virtual memory */
 int
 pci_uio_map_resource(struct rte_pci_device *dev)
@@ -290,6 +308,13 @@ pci_uio_map_resource(struct rte_pci_device *dev)
struct mapped_pci_resource *uio_res;
struct pci_map *maps;

+   /* map uio resource into user required virtual address */
+   static void * requested_addr;
+   if (internal_config.base_virtaddr && NULL == requested_addr) {
+   const struct rte_memseg * last = get_physmem_last();
+   requested_addr = RTE_PTR_ADD(last->addr, last->len);
+   }
+
dev->intr_handle.fd = -1;
dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;

@@ -371,10 +396,12 @@ pci_uio_map_resource(struct rte_pci_device *dev)
if (maps[j].addr != NULL)
fail = 1;
else {
-   mapaddr = pci_map_resource(NULL, fd, (off_t)offset,
+   mapaddr = pci_map_resource(requested_addr, fd, (off_t)offset,
(size_t)maps[j].size);
if (mapaddr == NULL)
fail = 1;
+   else if (NULL != requested_addr)
+   requested_addr = RTE_PTR_ADD(mapaddr, maps[j].size);
}

if (fail) {
-- 
1.9.1



[dpdk-dev] [PATCH v5] eal: map uio resources after hugepages when the base_virtaddr is configured.

2014-11-06 Thread Burakov, Anatoly
The explanation of the patch should be generic and impartial, i.e. when this 
and this happens, it results in such and such problem, and this patch fixes it 
by doing this and that. In other words, this will appear in the git history, so 
whoever is reading the commit log will be able to figure out what this patch 
does and why it's been applied.

Thomas, do we need to do similar changes to VFIO code, to keep consistency? 
Also, do we really need for this to depend on --base-virtaddr? Why not do it 
unconditionally, i.e. map PCI resources right after hugepages in memory?
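
For reference, the placement logic under discussion boils down to "find the end
of the highest-mapped segment and use it as the next mmap hint". A minimal
standalone sketch of that step (struct memseg and end_of_last_segment() are
stand-ins invented for this example, not the EAL definitions):

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the EAL memory segment descriptor; only the two fields
 * the patch relies on are modelled here. */
struct memseg {
	void *addr;   /* start virtual address, NULL marks an unused slot */
	size_t len;   /* length of the segment in bytes */
};

/* Return the first address past the highest-placed segment, or NULL if
 * the table has no used slots. This combines the scan done by
 * get_physmem_last() with the pointer addition done by the caller. */
static void *
end_of_last_segment(const struct memseg *segs, unsigned n)
{
	const struct memseg *last = NULL;
	unsigned i;

	for (i = 0; i < n; i++) {
		if (segs[i].addr == NULL)
			break; /* used slots are packed at the front */
		if (last == NULL ||
		    (uintptr_t)segs[i].addr > (uintptr_t)last->addr)
			last = &segs[i];
	}
	if (last == NULL)
		return NULL;
	return (void *)((uintptr_t)last->addr + last->len);
}
```

Unlike the patch, this sketch returns NULL for an empty table instead of
dereferencing the first slot; that difference is a choice made for the
example only.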

Thanks,
Anatoly

-Original Message-
From: lxu [mailto:liang...@cinfotech.cn] 
Sent: Thursday, November 6, 2014 3:32 PM
To: dev at dpdk.org
Cc: Burakov, Anatoly; thomas.monjalon at 6wind.com; De Lara Guarch, Pablo
Subject: [PATCH v5] eal: map uio resources after hugepages when the 
base_virtaddr is configured.

Sorry, I'm still learning the right way to send a patch with git.
I have a multi-process application. When the secondary process starts, I
get the error message "EAL: pci_map_resource(): cannot mmap(11, 0x77fba000, 
0x2, 0x0): Bad file descriptor (0x77fb9000)".
The secondary process links many additional shared libraries, so the
address 0x77fba000 was already in use.
I had fixed similar hugepage mmap problems with base_virtaddr, so I believe the
uio resources should be mapped right after the hugepages when base_virtaddr is
configured.
This patch tries to fix it.


Signed-off-by: lxu 
---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 7e62266..a2c9ab6 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -273,6 +273,24 @@ pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
return uio_num;
 }

+static inline const struct rte_memseg *
+get_physmem_last(void)
+{
+   const struct rte_memseg * seg = rte_eal_get_physmem_layout();
+   const struct rte_memseg * last = seg;
+   unsigned i = 0;
+
+   for (i=0; i<RTE_MAX_MEMSEG; i++, seg++) {
+   if (seg->addr == NULL)
+   break;
+
+   if(seg->addr > last->addr)
+   last = seg;
+
+   }
+   return last;
+}
+
 /* map the PCI resource of a PCI device in virtual memory */
 int
 pci_uio_map_resource(struct rte_pci_device *dev)
@@ -290,6 +308,13 @@ pci_uio_map_resource(struct rte_pci_device *dev)
struct mapped_pci_resource *uio_res;
struct pci_map *maps;

+   /* map uio resource into user required virtual address */
+   static void * requested_addr;
+   if (internal_config.base_virtaddr && NULL == requested_addr) {
+   const struct rte_memseg * last = get_physmem_last();
+   requested_addr = RTE_PTR_ADD(last->addr, last->len);
+   }
+
dev->intr_handle.fd = -1;
dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;

@@ -371,10 +396,12 @@ pci_uio_map_resource(struct rte_pci_device *dev)
if (maps[j].addr != NULL)
fail = 1;
else {
-   mapaddr = pci_map_resource(NULL, fd, (off_t)offset,
+   mapaddr = pci_map_resource(requested_addr, fd, (off_t)offset,
(size_t)maps[j].size);
if (mapaddr == NULL)
fail = 1;
+   else if (NULL != requested_addr)
+   requested_addr = RTE_PTR_ADD(mapaddr, maps[j].size);
}

if (fail) {
--
1.9.1



[dpdk-dev] [PATCH v3 0/5] support of configurable CRC stripping in VF

2014-11-06 Thread Ananyev, Konstantin

> From: Zhang, Helin
> Sent: Thursday, November 06, 2014 12:54 PM
> To: dev at dpdk.org
> Cc: Cao, Waterman; Cao, Min; Ananyev, Konstantin; Zhang, Helin
> Subject: [PATCH v3 0/5] support of configurable CRC stripping in VF
> 
> To support configurable CRC stripping in both PF host
> and VF, a new operation and a new structure are added
> to carry more configurations from VF to PF host.
> 
> v2 changes:
> * Put all the renaming and code style fixes into a patch.
> * Put processing crc stripping configuration in PF host
>   into a single patch.
> * Put setting the crc stripping into a single patch.
> * Put the configuring crc stripping in VF into a single patch.
> * Added several more code style fixes reported by checkpatch.pl.
> 
> v3 changes:
> * Added a macro of calculating memory size for configuring
>   vsi queues.
> * Used array of memory in stack to replace the memory
>   allocated by rte_zmalloc().
> * Added an input parameter for configuring crc stripping in
>   RX queue context.
> * Put configuring crc stripping of both PF host and VF
>   into a single patch.
> * Defined below new structures for the configuring specifically.
>   - struct i40e_virtchnl_rxq_ext_info;
>   - struct i40e_virtchnl_queue_pair_ext_info;
>   - struct i40e_virtchnl_vsi_queue_config_ext_info;
> * Renamed 'I40E_VIRTCHNL_OP_CONFIG_VSI_QUEUES_EX' to
>   'I40E_VIRTCHNL_OP_CONFIG_VSI_QUEUES_EXT'.
> 
> Helin Zhang (5):
>   config: remove useless i40e items in config files
>   i40evf: Remove 'host_is_dpdk', and use version number instead
>   i40e: renaming and code style fix
>   i40e: support of configurable crc stripping in rx queue
>   i40e: support of configurable VF crc stripping
> 
>  config/common_bsdapp |   1 -
>  config/common_linuxapp   |   1 -
>  lib/librte_pmd_i40e/i40e_ethdev.h|   3 +-
>  lib/librte_pmd_i40e/i40e_ethdev_vf.c | 218 
> +++
>  lib/librte_pmd_i40e/i40e_pf.c| 134 +++--
>  lib/librte_pmd_i40e/i40e_pf.h|  57 +++--
>  6 files changed, 297 insertions(+), 117 deletions(-)
> 
> --
> 1.8.1.4

Acked-by: Konstantin Ananyev 



[dpdk-dev] [PATCH v5] eal: map uio resources after hugepages when the base_virtaddr is configured.

2014-11-06 Thread Thomas Monjalon
2014-11-06 15:41, Burakov, Anatoly:
> Thomas, do we need to do similar changes to VFIO code, to keep consistency?
> Also, do we really need for this to depend on --base-virtaddr? Why not do it
> unconditionally, i.e. map PCI resources right after hugepages in memory?

I don't really like the secondary process mechanism at all.
So I won't give good advice here ;)
But I feel this option --base-virtaddr should be improved or removed.

-- 
Thomas


[dpdk-dev] UDP Checksum

2014-11-06 Thread Alex Markuze
Hi,
I'm seeing "UDP: bad checksum." messages (in dmesg) for packets sent by my dpdk
app to a socket on a remote machine.
Looking at the packets, the checksum value is set, it's just not what wireshark
expects.

When sending I'm setting these fields in the egress packets.

pkt->pkt.vlan_macip.f.l2_len = sizeof(struct ether_hdr);

pkt->pkt.vlan_macip.f.l3_len = sizeof(struct ipv4_hdr);

pkt->ol_flags |= (PKT_TX_IP_CKSUM | PKT_TX_L4_MASK);
//PKT_TX_OFFLOAD_MASK;


I'm working with a 82599 VF.


Any thoughts? I'm not sure what else to check.


[dpdk-dev] UDP Checksum

2014-11-06 Thread Olivier MATZ
Hello,

On 11/06/2014 05:05 PM, Alex Markuze wrote:
> I'm seeing "UDP: bad checksum." messages(dmesg) for packets sent by my dpdk
> app to a socket on a remote machine.
> Looking at the packets, the checksum value is set, it's just not what wireshark
> expects.
> 
> When sending I'm setting these fields in the egress packets.
> 
> pkt->pkt.vlan_macip.f.l2_len = sizeof(struct ether_hdr);
> 
> pkt->pkt.vlan_macip.f.l3_len = sizeof(struct ipv4_hdr);
> 
> pkt->ol_flags |= (PKT_TX_IP_CKSUM | PKT_TX_L4_MASK);
> //PKT_TX_OFFLOAD_MASK;

I think you need to do:

 pkt->pkt.vlan_macip.f.l2_len = sizeof(struct ether_hdr);
 pkt->pkt.vlan_macip.f.l3_len = sizeof(struct ipv4_hdr);
 pkt->ol_flags |= (PKT_TX_IP_CKSUM | PKT_TX_UDP_CKSUM);
 ipv4_hdr->hdr_checksum = 0;
 udp_hdr->dgram_cksum = 0;
 udp_hdr->dgram_cksum = get_ipv4_psd_sum(ipv4_hdr); /* see csumonly.c */


Regards,
Olivier


[dpdk-dev] UDP Checksum

2014-11-06 Thread Ananyev, Konstantin


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alex Markuze
> Sent: Thursday, November 06, 2014 4:05 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] UDP Checksum
> 
> Hi,
> I'm seeing "UDP: bad checksum." messages(dmesg) for packets sent by my dpdk
> app to a socket on a remote machine.
> Looking at the packets, the checksum value is set, it's just not what wireshark
> expects.
> 
> When sending I'm setting these fields in the egress packets.
> 
> pkt->pkt.vlan_macip.f.l2_len = sizeof(struct ether_hdr);
> 
> pkt->pkt.vlan_macip.f.l3_len = sizeof(struct ipv4_hdr);
> 
> pkt->ol_flags |= (PKT_TX_IP_CKSUM | PKT_TX_L4_MASK);
> //PKT_TX_OFFLOAD_MASK;
> 
> 
> I'm working with a 82599 VF.
> 
> 
> Any thoughts? I'm not sure what else to check.

As I remember, you have to set the IPv4 header checksum to 0, then
calculate and set the pseudo-header checksum for UDP.
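
As a rough sketch of what that pseudo-header sum looks like, written in plain C
rather than with the testpmd helper (sum16(), fold16() and
ipv4_udp_pseudo_sum() are names invented for this example; to my understanding
the value is left un-complemented so that the NIC can continue the sum over the
UDP header and payload):

```c
#include <stddef.h>
#include <stdint.h>

/* Add a byte buffer to a 32-bit accumulator as big-endian 16-bit words. */
static uint32_t
sum16(const uint8_t *p, size_t len, uint32_t acc)
{
	while (len > 1) {
		acc += (uint32_t)((p[0] << 8) | p[1]);
		p += 2;
		len -= 2;
	}
	if (len)
		acc += (uint32_t)(p[0] << 8); /* pad an odd trailing byte */
	return acc;
}

/* Fold the carries back in until the sum fits in 16 bits. */
static uint16_t
fold16(uint32_t acc)
{
	while (acc >> 16)
		acc = (acc & 0xffff) + (acc >> 16);
	return (uint16_t)acc;
}

/* One's-complement sum over the IPv4 pseudo header for UDP:
 * source address, destination address, zero byte + protocol (17),
 * and the UDP length. The result is NOT complemented. */
static uint16_t
ipv4_udp_pseudo_sum(const uint8_t src[4], const uint8_t dst[4],
		    uint16_t udp_len)
{
	uint32_t acc = 0;

	acc = sum16(src, 4, acc);
	acc = sum16(dst, 4, acc);
	acc += 17;      /* IPPROTO_UDP, preceded by a zero byte */
	acc += udp_len; /* UDP header + payload length in bytes */
	return fold16(acc);
}
```

The result here is a host-order number; it would need converting to network
byte order before being stored in the UDP checksum field.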


[dpdk-dev] [PATCH v5] eal: map uio resources after hugepages when the base_virtaddr is configured.

2014-11-06 Thread Burakov, Anatoly
Well, removing --base-virtaddr is not what I'm asking about.

The issue at hand here is that, given our secondary process mechanism (that you 
don't like :-) ), EAL may try to map something at an address where the 
secondary process has already mapped something else (some libraries, for 
example). This issue was originally discovered by OVDK, so we added a 
--base-virtaddr option to try and map hugepages at an exact virtual address, 
rather than wherever mmap decides to do so on its own.

The issue encountered by Liang (the author of the patch) is similar, only it's 
not the hugepages that are mapped into the occupied space, but rather the PCI 
resources (which are mapped with a NULL address hint by default, so can be 
mapped anywhere). 
Therefore he suggested a patch that maps the PCI resources into a space just 
after the last hugepage when --base-virtaddr is provided. I'm not sure we need 
the dependence on --base-virtaddr though, it can probably be done 
unconditionally. If you have no opinion on the matter, we can leave this detail 
of the patch as it is, then.

Also, I would suspect that if we are to modify where UIO resources are mapped, 
VFIO code should be modified the same way to avoid inconsistency between the 
two.

Thanks,
Anatoly

-Original Message-
From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] 
Sent: Thursday, November 6, 2014 3:58 PM
To: Burakov, Anatoly
Cc: lxu; dev at dpdk.org; De Lara Guarch, Pablo
Subject: Re: [PATCH v5] eal: map uio resources after hugepages when the 
base_virtaddr is configured.

2014-11-06 15:41, Burakov, Anatoly:
> Thomas, do we need to do similar changes to VFIO code, to keep consistency?
> Also, do we really need for this to depend on --base-virtaddr? Why not 
> do it unconditionally, i.e. map PCI resources right after hugepages in memory?

I don't really like the secondary process mechanism at all.
So I won't give good advice here ;)
But I feel this option --base-virtaddr should be improved or removed.

--
Thomas


[dpdk-dev] UDP Checksum

2014-11-06 Thread Alex Markuze
I was setting both the IP and UDP checksum fields to 0. PKT_TX_UDP_CKSUM ==
PKT_TX_L4_MASK = 0x6000.

I was not aware of get_ipv4_psd_sum(ipv4_hdr).
And I'm quite frankly surprised the HW doesn't already do this. Furthermore,
I don't remember kernel drivers messing with
L3 headers (bnx2x/mlx4). Is this true for all PMDs that do checksum offloads?

I will give it a try now.


On Thu, Nov 6, 2014 at 6:15 PM, Ananyev, Konstantin <
konstantin.ananyev at intel.com> wrote:

>
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alex Markuze
> > Sent: Thursday, November 06, 2014 4:05 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] UDP Checksum
> >
> > Hi,
> > I'm seeing "UDP: bad checksum." messages(dmesg) for packets sent by my
> dpdk
> > app to a socket on a remote machine.
> > Looking at the packets, the checksum value is set, it's just not what wireshark
> > expects.
> >
> > When sending I'm setting these fields in the egress packets.
> >
> > pkt->pkt.vlan_macip.f.l2_len = sizeof(struct ether_hdr);
> >
> > pkt->pkt.vlan_macip.f.l3_len = sizeof(struct ipv4_hdr);
> >
> > pkt->ol_flags |= (PKT_TX_IP_CKSUM | PKT_TX_L4_MASK);
> > //PKT_TX_OFFLOAD_MASK;
> >
> >
> > I'm working with a 82599 VF.
> >
> >
> > Any thoughts? I'm not sure what else to check.
>
> As I remember, you have to set the IPv4 header checksum to 0, then
> calculate and set the pseudo-header checksum for UDP.
> From app/test-pmd/csumonly.c:
> ...
> if (pkt_ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_TUNNEL_IPV4_HDR)) {
>
> /* Do not support ipv4 option field */
> l3_len = sizeof(struct ipv4_hdr) ;
>
> ...
>
> /* Do not delete, this is required by HW*/
> ipv4_hdr->hdr_checksum = 0;
>
>...
>
>   if (l4_proto == IPPROTO_UDP) {
> udp_hdr = (struct udp_hdr*)
> (rte_pktmbuf_mtod(mb,
> unsigned char *) + l2_len
> + l3_len);
> if (tx_ol_flags & 0x2) {
> /* HW Offload */
> ol_flags |= PKT_TX_UDP_CKSUM;
> if (ipv4_tunnel)
> udp_hdr->dgram_cksum = 0;
> else
> /* Pseudo header sum need
> be set properly */
> udp_hdr->dgram_cksum =
>
> get_ipv4_psd_sum(ipv4_hdr);
>
>
>
>


[dpdk-dev] UDP Checksum

2014-11-06 Thread Ananyev, Konstantin


> -Original Message-
> From: Ananyev, Konstantin
> Sent: Thursday, November 06, 2014 5:04 PM
> To: Ananyev, Konstantin
> Subject: FW: [dpdk-dev] UDP Checksum
> 
> 
> 
> From: Alex Markuze [mailto:alex at weka.io]
> Sent: Thursday, November 06, 2014 4:27 PM
> To: Ananyev, Konstantin
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] UDP Checksum
> 
> I was setting both the IP and UDP checksum fields to 0. PKT_TX_UDP_CKSUM ==
> PKT_TX_L4_MASK = 0x6000.
> 
> I was not aware of get_ipv4_psd_sum(ipv4_hdr).
> And I'm quite frankly surprised the HW doesn't already do this. Furthermore,
> I don't remember kernel drivers messing with
> L3 headers (bnx2x/mlx4). Is this true for all PMDs that do checksum offloads?

I suppose it depends on the HW implementation.
All Intel NICs I am aware of (e1000, ixgbe, i40e) expect SW to provide
the IP pseudo-header checksum in the L4 header.
I'm not sure what the story is with NICs from other manufacturers.

> 
> I will give it a try now.
> 
> 
> On Thu, Nov 6, 2014 at 6:15 PM, Ananyev, Konstantin <konstantin.ananyev at intel.com> wrote:
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alex Markuze
> > Sent: Thursday, November 06, 2014 4:05 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] UDP Checksum
> >
> > Hi,
> > I'm seeing "UDP: bad checksum." messages(dmesg) for packets sent by my dpdk
> > app to a socket on a remote machine.
> > Looking at the packets, the checksum value is set, it's just not what wireshark
> > expects.
> >
> > When sending I'm setting these fields in the egress packets.
> >
> >         pkt->pkt.vlan_macip.f.l2_len = sizeof(struct ether_hdr);
> >
> >         pkt->pkt.vlan_macip.f.l3_len = sizeof(struct ipv4_hdr);
> >
> >         pkt->ol_flags |= (PKT_TX_IP_CKSUM | PKT_TX_L4_MASK);
> > //PKT_TX_OFFLOAD_MASK;
> >
> >
> > I'm working with a 82599 VF.
> >
> >
> > Any thoughts? I'm not sure what else to check.
> As I remember, you have to set the IPv4 header checksum to 0, then
> calculate and set the pseudo-header checksum for UDP.
> From app/test-pmd/csumonly.c:
> ...
> if (pkt_ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_TUNNEL_IPV4_HDR)) {
> 
>                         /* Do not support ipv4 option field */
>                         l3_len = sizeof(struct ipv4_hdr) ;
> 
>                         ...
> 
>                         /* Do not delete, this is required by HW*/
>                         ipv4_hdr->hdr_checksum = 0;
> 
>                         ...
> 
>                         if (l4_proto == IPPROTO_UDP) {
>                                 udp_hdr = (struct udp_hdr*) (rte_pktmbuf_mtod(mb,
>                                                 unsigned char *) + l2_len + l3_len);
>                                 if (tx_ol_flags & 0x2) {
>                                         /* HW Offload */
>                                         ol_flags |= PKT_TX_UDP_CKSUM;
>                                         if (ipv4_tunnel)
>                                                 udp_hdr->dgram_cksum = 0;
>                                         else
>                                                 /* Pseudo header sum need be set properly */
>                                                 udp_hdr->dgram_cksum =
>                                                         get_ipv4_psd_sum(ipv4_hdr);
> 
> 



[dpdk-dev] [PATCH v5] eal: map uio resources after hugepages when the base_virtaddr is configured.

2014-11-06 Thread Bruce Richardson
On Thu, Nov 06, 2014 at 04:10:15PM +, Burakov, Anatoly wrote:
> Well, removing --base-virtaddr is not what I'm asking about.
> 
> The issue at hand here is that, given our secondary process mechanism (that 
> you don't like :-) ), EAL may try to map something at an address where the 
> secondary process has already mapped something else (some libraries, for 
> example). This issue was originally discovered by OVDK, so we added a 
> --base-virtaddr option to try and map hugepages at an exact virtual 
> address, rather than wherever mmap decides to do so on its own.
> 
> The issue encountered by Liang (the author of the patch) is similar, only 
> it's not the hugepages that are mapped into the occupied space, but rather 
> the PCI resources (which are mapped with a NULL address hint by default, so 
> can be mapped anywhere). 
> Therefore he suggested a patch that maps the PCI resources into a space just 
> after the last hugepage when --base-virtaddr is provided. I'm not sure we 
> need the dependence on --base-virtaddr though, it can probably be done 
> unconditionally. If you have no opinion on the matter, we can leave this 
> detail of the patch as it is, then.
> 
> Also, I would suspect that if we are to modify where UIO resources are 
> mapped, VFIO code should be modified the same way to avoid inconsistency 
> between the two.
> 

I find nothing wrong with your logic, Anatoly, it makes sense to me. :-)

I'm curious, however, as to what Thomas has in mind for how we might improve
the base-virtaddr flag.

/Bruce

> Thanks,
> Anatoly
> 
> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] 
> Sent: Thursday, November 6, 2014 3:58 PM
> To: Burakov, Anatoly
> Cc: lxu; dev at dpdk.org; De Lara Guarch, Pablo
> Subject: Re: [PATCH v5] eal: map uio resources after hugepages when the 
> base_virtaddr is configured.
> 
> 2014-11-06 15:41, Burakov, Anatoly:
> > Thomas, do we need to do similar changes to VFIO code, to keep consistency?
> > Also, do we really need for this to depend on --base-virtaddr? Why not 
> > do it unconditionally, i.e. map PCI resources right after hugepages in 
> > memory?
> 
> I don't really like the secondary process mechanism at all.
> So I won't give good advice here ;)
> But I feel this option --base-virtaddr should be improved or removed.
> 
> --
> Thomas


[dpdk-dev] Re: [PATCH] Add user defined tag calculation callback to librte_distributor.

2014-11-06 Thread jigsaw
Hi Bruce,

In my use case, unfortunately, the tag is not a hash. And the tag can be in
either the low or the high bits, depending on configuration.
I wonder if it is possible to let the user decide which bit to mask,
i.e. to add another param to rte_distributor_create to define the mask.

thx &
rgds,
-qinglai

On Thu, Nov 6, 2014 at 3:59 PM, Bruce Richardson  wrote:

> On Thu, Nov 06, 2014 at 02:36:09PM +0200, Qinglai Xiao wrote:
> > Hi Bruce,
> >
> > There is a subtle case in which tag values are 2 and 3, respectively.
> Then these two tags cannot be distinguished. There should be a better way
> so as to handle this situation.
>
> It's not just in that case, it's in any case where a pair of tags differ
> only in the lowest bit. I've been assuming that the tag is likely to be a hash
> value in most cases - given that it's only 32-bit - in which case it just
> doesn't matter which bit we choose to permanently set to 1, but if there are
> scenarios where it's likely that the low bits are used but the high ones not
> so, we can look to change which bit is set to 1. Either way, the distributor
> just uses a 31-bit tag rather than a 32-bit one.
>
> /Bruce
>
> >
> > thx &
> > rgds
> > -qinglai
> >
> > --
> > From: "Thomas Monjalon" 
> > Sent: 2014/11/6 12:36
> > To: "Bruce Richardson" 
> > Cc: "dev at dpdk.org" ; "jigsaw" 
> > Subject: Re: [dpdk-dev] [PATCH] Add user defined tag calculation callback to librte_distributor.
> >
> > 2014-11-06 09:22, Bruce Richardson:
> > > On Wed, Nov 05, 2014 at 07:24:13PM +0200, jigsaw wrote:
> > > >
> http://dpdk.org/browse/dpdk/tree/lib/librte_distributor/rte_distributor.c#n285
> > > >
> > > > new_tag = (next_mb->hash.rss | 1);
> > > >
> > > > Why the logical OR is needed?
> > >
> > > > That's needed to ensure that we never track a tag with an actual value of zero.
> > > > We instead always force the low bit to be 1, so that we can use zero as an
> > > > "empty" value.
> >
> > Bruce, could you check how this code may be better commented please?
> > This discussion shows that the distributor library probably needs more
> > explanations in the code or doxygen.
> >
> > Thanks
> > --
> > Thomas
>


[dpdk-dev] [RFC PATCH] Adding RTE_KNI_PREEMPT configuration option

2014-11-06 Thread Marc Sune
Hi guys,

Any comments, suggestions or objections to this patch? Otherwise I will 
send a non-RFC patch.

Thanks
Marc

On 05/11/14 01:17, Marc Sune wrote:
> This patch introduces CONFIG_RTE_KNI_PREEMPT flag. When set to 'no', KNI
> kernel thread(s) do not call schedule_timeout_interruptible(), which improves
> overall KNI performance at the expense of CPU cycles (polling).
>
> The default value is 'yes', maintaining the current behaviour.
>
> Note: this RFC patch is based on v1.7.1, since I was using a 1.7 application.
> It will eventually be rebased to 1.8 upon acceptance.
>
> Signed-off-by: Marc Sune 
> ---
>   config/common_linuxapp |1 +
>   lib/librte_eal/linuxapp/kni/kni_misc.c |4 
>   2 files changed, 5 insertions(+)
>
> diff --git a/config/common_linuxapp b/config/common_linuxapp
> index 9047975..9cebcf1 100644
> --- a/config/common_linuxapp
> +++ b/config/common_linuxapp
> @@ -382,6 +382,7 @@ CONFIG_RTE_LIBRTE_PIPELINE=y
>   # Compile librte_kni
>   #
>   CONFIG_RTE_LIBRTE_KNI=y
> +CONFIG_RTE_KNI_PREEMPT=y
>   CONFIG_RTE_KNI_KO_DEBUG=n
>   CONFIG_RTE_KNI_VHOST=n
>   CONFIG_RTE_KNI_VHOST_MAX_CACHE_SIZE=1024
> diff --git a/lib/librte_eal/linuxapp/kni/kni_misc.c 
> b/lib/librte_eal/linuxapp/kni/kni_misc.c
> index ba6..e7e6c27 100644
> --- a/lib/librte_eal/linuxapp/kni/kni_misc.c
> +++ b/lib/librte_eal/linuxapp/kni/kni_misc.c
> @@ -229,9 +229,11 @@ kni_thread_single(void *unused)
>   }
>   }
>   up_read(&kni_list_lock);
> +#ifdef RTE_KNI_PREEMPT
>   /* reschedule out for a while */
>   schedule_timeout_interruptible(usecs_to_jiffies( \
>   KNI_KTHREAD_RESCHEDULE_INTERVAL));
> +#endif
>   }
>   
>   return 0;
> @@ -252,8 +254,10 @@ kni_thread_multiple(void *param)
>   #endif
>   kni_net_poll_resp(dev);
>   }
> +#ifdef RTE_KNI_PREEMPT
>   schedule_timeout_interruptible(usecs_to_jiffies( \
>   KNI_KTHREAD_RESCHEDULE_INTERVAL));
> +#endif
>   }
>   
>   return 0;



[dpdk-dev] [PATCH] ADD mode 5(tlb) to link bonding pmd

2014-11-06 Thread De Lara Guarch, Pablo


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Daniel Mrzyglod
> Sent: Wednesday, September 17, 2014 11:01 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] ADD mode 5(tlb) to link bonding pmd
> 
> This patch set adds support of mode 5 to link bonding pmd
> 
> This patchset depend on  Declan Doherty patch set:
> http://dpdk.org/ml/archives/dev/2014-September/005069.html
> 
> Signed-off-by: Daniel Mrzyglod 
> ---
>  lib/librte_pmd_bond/rte_eth_bond.h |   23 
>  lib/librte_pmd_bond/rte_eth_bond_args.c|1 +
>  lib/librte_pmd_bond/rte_eth_bond_pmd.c |  163
> +++-
>  lib/librte_pmd_bond/rte_eth_bond_private.h |5 +-
>  4 files changed, 189 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_pmd_bond/rte_eth_bond.h
> b/lib/librte_pmd_bond/rte_eth_bond.h
> index bd59780..1bd76ce 100644
> --- a/lib/librte_pmd_bond/rte_eth_bond.h
> +++ b/lib/librte_pmd_bond/rte_eth_bond.h
> @@ -75,6 +75,29 @@ extern "C" {
>  /**< Broadcast (Mode 3).
>   * In this mode all transmitted packets will be transmitted on all available
>   * active slaves of the bonded. */
> +#define BONDING_MODE_ADAPTIVE_TRANSMIT_LOAD_BALANCING
>   (5)
> +/**< Broadcast (Mode 5)

Typo, should be Adaptive TLB (Mode 5).

> + * Adaptive transmit load balancing: channel bonding that
> + * does not require any special switch support.  The
> + * outgoing traffic is distributed according to the
> + * current load (computed relative to the speed) on each
> + * slave.  Incoming traffic is received by the current
> + * slave.  If the receiving slave fails, another slave
> + * takes over the MAC address of the failed receiving
> + * slave.*/
> +#define BONDING_MODE_ADAPTIVE_LOAD_BALANCING
>   (6)
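
The mode 5 description quoted above says outgoing traffic is distributed according to the current load on each slave, computed relative to its speed. A minimal sketch of that selection rule follows. It is not taken from the patch: the struct, its field names and pick_least_loaded_slave() are hypothetical, and it assumes every slave reports a non-zero link speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-slave statistics; these names are not from the patch. */
struct slave_load {
	uint64_t tx_bytes_per_sec; /* currently measured transmit load */
	uint64_t link_speed_mbps;  /* link speed, assumed non-zero */
};

/*
 * Return the index of the slave with the lowest load relative to its speed.
 * The ratios load/speed are compared by cross-multiplication so that no
 * floating point is needed. Ties keep the lower index.
 */
static size_t
pick_least_loaded_slave(const struct slave_load *slaves, size_t n)
{
	size_t best = 0;
	size_t i;

	assert(slaves != NULL && n > 0);

	for (i = 1; i < n; i++) {
		/* load_i / speed_i < load_best / speed_best */
		if (slaves[i].tx_bytes_per_sec * slaves[best].link_speed_mbps <
		    slaves[best].tx_bytes_per_sec * slaves[i].link_speed_mbps)
			best = i;
	}
	return best;
}
```

With two slaves of equal speed the less loaded one wins; with unequal speeds a faster slave can win even when it carries more bytes per second.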



[dpdk-dev] Cannot run l3fwd

2014-11-06 Thread Eduard Gibert Renart
Hi Everyone:

When I try to run l3fwd inside my VM I get the following error:

ubuntu at ubuntu-VirtualBox:~/Desktop/dpdk-1.7.1/examples/l3fwd/build$ sudo 
./l3fwd -c 0x3 -n 2 -- -p 0x3 --config="(0,0,0),(1,0,1)"
EAL: Cannot read numa node link for lcore 0 - using physical package id instead
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Cannot read numa node link for lcore 1 - using physical package id instead
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Support maximum 64 logical core(s) by configuration.
EAL: Detected 2 lcore(s)
EAL: Setting up memory...
EAL: cannot open /proc/self/numa_maps, consider that all memory is in socket_id 0
EAL: Ask a virtual area of 0x40 bytes
EAL: Virtual area found at 0xb640 (size = 0x40)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0xb600 (size = 0x20)
EAL: Ask a virtual area of 0xf00 bytes
EAL: Virtual area found at 0xa6e0 (size = 0xf00)
EAL: Ask a virtual area of 0x1e20 bytes
EAL: Virtual area found at 0x88a0 (size = 0x1e20)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x8860 (size = 0x20)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x8820 (size = 0x20)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x87e0 (size = 0x20)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x87a0 (size = 0x20)
EAL: Ask a virtual area of 0xe40 bytes
EAL: Virtual area found at 0x7940 (size = 0xe40)
EAL: Ask a virtual area of 0x220 bytes
EAL: Virtual area found at 0x7700 (size = 0x220)
EAL: Ask a virtual area of 0xc0 bytes
EAL: Virtual area found at 0x7620 (size = 0xc0)
EAL: Ask a virtual area of 0xe0 bytes
EAL: Virtual area found at 0x7520 (size = 0xe0)
EAL: Requesting 512 pages of size 2MB from socket 0
EAL: TSC frequency is ~2503668 KHz
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable 
clock cycles !
EAL: Master core 0 is ready (tid=b75c2800)
EAL: Core 1 is ready (tid=751ffb40)
EAL: PCI device :00:03.0 on NUMA socket -1
EAL:   probe driver: 8086:100e rte_em_pmd
EAL:   :00:03.0 not managed by UIO driver, skipping
EAL: PCI device :00:08.0 on NUMA socket -1
EAL:   probe driver: 8086:100e rte_em_pmd
EAL:   PCI memory mapped at 0xb7572000
EAL: PCI device :00:09.0 on NUMA socket -1
EAL:   probe driver: 8086:100e rte_em_pmd
EAL:   PCI memory mapped at 0xb7552000
EAL: PCI device :00:03.0 on NUMA socket -1
EAL:   probe driver: 8086:100e rte_em_pmd
EAL:   :00:03.0 not managed by UIO driver, skipping
Initializing port 0 ... Creating queues: nb_rxq=1 nb_txq=2... EAL: Error - 
exiting with code: 1
  Cause: Cannot configure device: err=-22, port=0

Any ideas on what is causing the error? 

Thanks,
Eduard Gibert Renart


[dpdk-dev] Re: [PATCH] Add user defined tag calculation callback to librte_distributor.

2014-11-06 Thread jigsaw
Hi Bruce,

Actually IMHO it is good to leave the user free to decide how to
interpret the tag value, i.e. remove the OR with 1.
If the tag value is zero, then we assume the programmer knows what he is
doing. Of course this shall be clearly documented in the comments/doxygen.


thx &
rgds,
-qinglai

On Thu, Nov 6, 2014 at 8:01 PM, jigsaw  wrote:

> Hi Bruce,
>
> In my use case, unfortunately the tag is not a hash. And the tag can be in
> either the low or the high bits, depending on configuration.
> I wonder if it is possible to let the user decide which bit to mask,
> i.e. to add another param to rte_distributor_create to define the mask.
>
> thx &
> rgds,
> -qinglai
>
> On Thu, Nov 6, 2014 at 3:59 PM, Bruce Richardson <
> bruce.richardson at intel.com> wrote:
>
>> On Thu, Nov 06, 2014 at 02:36:09PM +0200, Qinglai Xiao wrote:
>> > Hi Bruce,
>> >
>> > There is a subtle case in which the tag values are 2 and 3, respectively.
>> > Then these two tags cannot be distinguished. There should be a better way
>> > to handle this situation.
>>
>> It's not just in that case; it's in any case where a pair of tags differ
>> only in the bit we force to 1 (currently the lowest one). I've been assuming
>> that the tag is likely to be a hash value in most cases - given that it's
>> only 32-bit - in which case it just doesn't matter which bit we choose to
>> permanently set to 1. But if there are scenarios where it's likely that the
>> low bits are used but the high ones are not, we can look to change which bit
>> is set to 1. Either way, the distributor just uses a 31-bit tag rather than
>> a 32-bit one.
>>
>> /Bruce
>>
>> >
>> > thx &
>> > rgds
>> > -qinglai
>> >
>> > --
>> > From: "Thomas Monjalon"
>> > Sent: 2014/11/6 12:36
>> > To: "Bruce Richardson"
>> > Cc: "dev at dpdk.org" ; "jigsaw"
>> > Subject: Re: [dpdk-dev] [PATCH] Add user defined tag calculation callback
>> > to librte_distributor.
>> >
>> > 2014-11-06 09:22, Bruce Richardson:
>> > > On Wed, Nov 05, 2014 at 07:24:13PM +0200, jigsaw wrote:
>> > > >
>> http://dpdk.org/browse/dpdk/tree/lib/librte_distributor/rte_distributor.c#n285
>> > > >
>> > > > new_tag = (next_mb->hash.rss | 1);
>> > > >
>> > > > Why the logical OR is needed?
>> > >
>> > > That's needed to ensure that we never track a tag with an actual
>> > > value of zero.
>> > > We instead always force the low bit to be 1, so that we can use zero
>> > > as an "empty" value.
>> >
>> > Bruce, could you check how this code may be better commented please?
>> > This discussion shows that the distributor library probably needs more
>> > explanations in the code or doxygen.
>> >
>> > Thanks
>> > --
>> > Thomas
>>
>
>
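
The collision discussed in this thread can be reproduced in a few lines. The sketch below mirrors the quoted expression `new_tag = (next_mb->hash.rss | 1)` in tracked_tag(). The second helper, tracked_tag_with_bit(), is purely hypothetical: it illustrates the idea of letting the caller choose which bit is sacrificed and is not an existing librte_distributor API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same operation as the quoted distributor code: force the low bit to 1 so
 * that the value 0 stays free to mean "empty".
 */
static inline uint32_t
tracked_tag(uint32_t tag)
{
	return tag | 1;
}

/*
 * Hypothetical variant (not a DPDK function): the caller passes the single
 * bit that is forced to 1, e.g. the top bit when only low bits carry data.
 */
static inline uint32_t
tracked_tag_with_bit(uint32_t tag, uint32_t reserved_bit)
{
	/* reserved_bit must have exactly one bit set */
	assert(reserved_bit != 0 && (reserved_bit & (reserved_bit - 1)) == 0);
	return tag | reserved_bit;
}
```

Tags 2 and 3 both map to 3 under tracked_tag(), which is the case raised above. Forcing bit 31 instead keeps them distinct, at the cost of merging tags that differ only in bit 31. Either way one bit of the 32 is lost.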


[dpdk-dev] Max throughput Using QOS Scheduler

2014-11-06 Thread Dumitrescu, Cristian
Hi Srikanth,

>>Is there any difference between scheduler behavior  for above two scenarios  
>>while enqueing and de-queueing ??
All the pipe queues share the bandwidth allocated to their pipe. The 
distribution of available pipe bandwidth between the pipe queues is governed by 
features like traffic class strict priority, bandwidth sharing between pipe 
traffic classes, weights of the queues within the same traffic class, etc. In 
the case you mention, you are just using one queue for each traffic class.

Let's take an example:

-Configuration: pipe rate = 10 Mbps, pipe traffic class 0 .. 3 rates = 
[20% of pipe rate = 2 Mbps, 30% of pipe rate = 3 Mbps, 40% of pipe rate = 4 
Mbps, 100% of pipe rate = 10 Mbps]. Convention is that traffic class 0 is the 
highest priority.

-Injected traffic per traffic class for this pipe: [3, 0, 0, 0] Mbps => 
Output traffic per traffic class for this pipe: [2 , 0, 0, 0] Mbps

-Injected traffic per traffic class for this pipe: [0, 0, 0, 15] Mbps 
=> Output traffic per traffic class for this pipe: [0, 0, 0, 10] Mbps

-Injected traffic per traffic class for this pipe: [1, 10, 2, 15] Mbps 
=> Output traffic per traffic class for this pipe: [1, 3, 2, 4] Mbps

Makes sense?
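
The three cases above can be checked with a small model. This is only a sketch of the arithmetic in the example, not the librte_sched implementation: it assumes strict priority with traffic class 0 served first, one queue per traffic class, and each class capped both by its own rate and by whatever pipe rate is still unused. The names pipe_output() and N_TC are made up for the illustration; all rates are in Mbps.

```c
#include <assert.h>
#include <stdint.h>

#define N_TC 4 /* traffic classes per pipe, TC0 = highest priority */

static uint32_t
min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/*
 * Simplified model: serve traffic classes in strict priority order. Each
 * class gets the smallest of its injected rate, its configured class rate
 * and the pipe rate left over by the higher priority classes.
 */
static void
pipe_output(uint32_t pipe_rate, const uint32_t tc_rate[N_TC],
	    const uint32_t in[N_TC], uint32_t out[N_TC])
{
	uint32_t left = pipe_rate;
	int tc;

	assert(tc_rate != 0 && in != 0 && out != 0);

	for (tc = 0; tc < N_TC; tc++) {
		out[tc] = min_u32(min_u32(in[tc], tc_rate[tc]), left);
		left -= out[tc];
	}
}
```

For the last case, TC0 takes 1, TC1 is capped at 3 by its class rate, TC2 takes 2, and TC3 gets only the 4 Mbps that remain of the 10 Mbps pipe rate.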

>>Queue size is 64 , and number of packets enqueued and dequeued is 64 as well.
I strongly recommend you never use a dequeue burst size that is equal to 
enqueue burst size, as performance will be bad.

In the qos_sched sample app, we use [enqueue burst size, dequeue burst size] 
set to [64, 32]; other reasonable values could be [64, 48], [32, 16], etc. An 
enqueue burst bigger than the dequeue burst causes the big packet reservoir 
that is the traffic manager/port scheduler to fill up to a reasonable level 
that allows dequeue to function optimally, and then the system regulates 
itself.

The reason is: since we interlace enqueue and dequeue calls, if on every 
iteration you push e.g. 64 packets in and then look to get 64 packets out, 
you'll only ever have 64 packets in the queues, you'll work hard to find them, 
and you'll get out exactly those 64 packets that you pushed in.
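
A toy counting model of that argument, under loud assumptions: it only tracks how many packets sit inside the scheduler, it assumes a full enqueue burst is always available, and it drops whatever does not fit in a fixed capacity. occupancy_after() is a made-up helper, not a DPDK call.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model: number of packets held inside the scheduler after `iters`
 * rounds of "enqueue one burst, then dequeue one burst".
 */
static uint32_t
occupancy_after(uint32_t enq_burst, uint32_t deq_burst,
		uint32_t capacity, uint32_t iters)
{
	uint32_t held = 0;
	uint32_t i;

	assert(capacity > 0);

	for (i = 0; i < iters; i++) {
		held += enq_burst;
		if (held > capacity)
			held = capacity; /* overflow is dropped */
		held -= (held < deq_burst) ? held : deq_burst;
	}
	return held;
}
```

With bursts of [64, 64] the reservoir is empty after every round, so each dequeue can only find the 64 packets just pushed in. With [64, 32] it grows by 32 packets per round until it reaches the capacity limit, which gives dequeue a populated set of queues to pick from.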

>>And what is the improvements i would gain if i move to DPDK 1.7 w.r.t QOS ?
The QoS code is pretty stable since release 1.4, not many improvements added 
(maybe it's the right time to revisit this feature and push it to the next 
level?), but there are improvements in other DPDK libraries that are 
dependencies for QoS (e.g. packet Rx/Tx).

Hope this helps.

Regards,
Cristian



From: Srikanth Akula [mailto:srikanth...@gmail.com]
Sent: Thursday, October 30, 2014 4:10 PM
To: dev at dpdk.org; Dumitrescu, Cristian
Subject: Max throughput Using QOS Scheduler

Hello All ,

I am currently trying to implement a QOS scheduler using DPDK 1.6. I have 
configured 1 subport, 4096 pipes for the subport, and 4 TCs and 4 queues.

Currently I am trying to send packets destined to a single queue of the 
16 available queues of one of the pipes.

Could somebody explain what throughput we can achieve using this 
scheme? The reason for asking is that I see different behavior each 
time I send traffic destined to different destination queues.

for example :

1. Only one stream: stream destined for Q0 of TC0.


2. Four streams: 1st stream destined for Q3 of TC3
 2nd stream destined for Q2 of TC2
 3rd stream destined for Q1 of TC1
 4th stream destined for Q0 of TC0

Is there any difference in scheduler behavior between the above two scenarios 
while enqueuing and dequeuing?

Queue size is 64, and the number of packets enqueued and dequeued is 64 as well.
And what improvements would I gain if I move to DPDK 1.7 w.r.t. QOS?


Could you please clarify my queries ?


Thanks & Regards,
Srikanth





[dpdk-dev] [PATCH] librte_vhost: Fix the path test issue

2014-11-06 Thread Xie, Huawei


> -Original Message-
> From: Ouyang, Changchun
> Sent: Wednesday, November 05, 2014 10:20 PM
> To: Xie, Huawei; dev at dpdk.org
> Cc: Ouyang, Changchun
> Subject: RE: [dpdk-dev] [PATCH] librte_vhost: Fix the path test issue
> 
> Hi Huawei,
> Thanks for the comments,
> And my response as follows.
> 
> > -Original Message-
> > From: Xie, Huawei
> > Sent: Thursday, November 6, 2014 10:39 AM
> > To: Ouyang, Changchun; dev at dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH] librte_vhost: Fix the path test issue
> >
> > >   path = realpath(memfile, resolved_path);
> > > - if (path == NULL) {
> > > + if ((path == NULL) && (strlen(resolved_path) == 0)) {
> > >   RTE_LOG(ERR, VHOST_CONFIG,
> > >   "(%"PRIu64") Failed to resolve fd directory\n",
> > >   dev->device_fh);
> > Changchun:
> > For some strange file, according to API description, we shouldn't check
> > resolved_path as it is undefined.
> > To make the loop go on, we could use "continue" when we detect path is
> > NULL.
> >
> > RETURN VALUE
> >If there is no error, realpath() returns a pointer to the 
> > resolved_path.
> >
> >Otherwise it returns a NULL pointer, and the contents of the array
> > resolved_path are undefined, and errno is set to indicate the error.
> 
> After investigating this issue, I found that using 'continue' doesn't work.
> 
> The reason is that procmap.fname itself is
> "/dev/hugepages/qemu_back_mem.pc.ram.zxfqLq".
> It is not a normal path, so in this case path is NULL, while resolved_path is
> /dev/hugepages/qemu_back_mem.pc.ram.zxfqLq
> 
> If 'continue' is used, then procmap.fname cannot be matched in the directory
> list, and the app will exit after reporting "Failed to find memory file for pid".

I did some investigation. This is because qemu unlinks the file after it maps
the huge page file. So this is a special case: it is OK to check the resolved
path when path is NULL, if errno indicates "No such file or directory".
> 
> So I have to keep it.
> 
> Thanks again
> Changchun
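
One way to handle the unlinked-file case without reading resolved_path after a failed realpath() (its contents are undefined then, per the man page text quoted above) is to fall back to readlink() when errno is ENOENT. The sketch below is an illustration, not the librte_vhost code: resolve_fd_link() and has_deleted_suffix() are hypothetical helpers, and it relies on Linux appending " (deleted)" to the link text of an unlinked file under /proc/<pid>/fd.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Resolve /proc/<pid>/fd/<fd_name> into out (PATH_MAX bytes).
 * Returns 0 on success, -1 on failure with errno set.
 */
static int
resolve_fd_link(pid_t pid, const char *fd_name, char out[PATH_MAX])
{
	char link[PATH_MAX];
	ssize_t len;

	assert(fd_name != NULL && out != NULL);

	snprintf(link, sizeof(link), "/proc/%ld/fd/%s", (long)pid, fd_name);

	if (realpath(link, out) != NULL)
		return 0;
	if (errno != ENOENT)
		return -1;

	/*
	 * The target no longer exists (e.g. it was unlinked after mmap).
	 * Read the raw link text instead of trusting the undefined buffer.
	 */
	len = readlink(link, out, PATH_MAX - 1);
	if (len < 0)
		return -1;
	out[len] = '\0';
	return 0;
}

/* True if the kernel marked the link target as unlinked. */
static int
has_deleted_suffix(const char *path)
{
	static const char suffix[] = " (deleted)";
	size_t plen = strlen(path);
	size_t slen = sizeof(suffix) - 1;

	return plen >= slen && strcmp(path + plen - slen, suffix) == 0;
}
```

A caller comparing the result against a name from /proc/<pid>/maps would need to account for the same " (deleted)" suffix on both sides.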



[dpdk-dev] [PATCH] librte_vhost: Fix the path test issue

2014-11-06 Thread Xie, Huawei


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ouyang Changchun
> Sent: Monday, November 03, 2014 1:12 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] librte_vhost: Fix the path test issue
> 
> Commit aec8283d47d4e4366b6 fixes the compilation issue, but it leads to
> one runtime issue: early exit wrongly. In some case, 'path' is NULL, but
> 'resolved_path' has effective path, it should continue going ahead rather
> than exit.
> 
> Signed-off-by: Changchun Ouyang 
> ---
>  lib/librte_vhost/virtio-net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
> index 8015dd8..3fa1274 100644
> --- a/lib/librte_vhost/virtio-net.c
> +++ b/lib/librte_vhost/virtio-net.c
> @@ -237,7 +237,7 @@ host_memory_map(struct virtio_net *dev, struct
> virtio_memory *mem,
>   snprintf(memfile, PATH_MAX, "/proc/%u/fd/%s",
>   pid, dptr->d_name);
>   path = realpath(memfile, resolved_path);
> - if (path == NULL) {
> + if ((path == NULL) && (strlen(resolved_path) == 0)) {
>   RTE_LOG(ERR, VHOST_CONFIG,
>   "(%"PRIu64") Failed to resolve fd directory\n",
>   dev->device_fh);
> --
> 1.8.4.2

Acked-by: Huawei Xie 


[dpdk-dev] Cannot run l3fwd

2014-11-06 Thread Thomas Monjalon
Hi,
You replied to an email which is absolutely not related to your question,
polluting the thread.
Don't get me wrong: I don't want you to re-post your question in a new thread.
You have the source code of DPDK, and you neither checked where the error is
nor enabled the debug logs. So it's clearly too early to post to this
mailing list.

2014-11-06 14:39, Eduard Gibert Renart:
> Initializing port 0 ... Creating queues: nb_rxq=1 nb_txq=2... EAL: Error - 
> exiting with code: 1
>   Cause: Cannot configure device: err=-22, port=0

Maybe you didn't find the debug options. Please check in config files and 
rebuild.

-- 
Thomas


[dpdk-dev] Bugs in newest patches

2014-11-06 Thread Thomas Monjalon
Hi,

2014-11-07 05:21, Keunhong Lee:
> I just pulled new patches from the master branch, and found that it doesn't
> work with C++.
> 
> in  lib/librte_eal/common/include/generic/rte_cycles.h
> +#ifdef __cplusplus
> +extern "C" {
> +#endif

It's already included in lib/librte_eal/common/include/arch/x86/rte_cycles.h
which includes lib/librte_eal/common/include/generic/rte_cycles.h.

You shouldn't include the generic header directly.

-- 
Thomas


[dpdk-dev] Panic in rte MEMPOOL__mempool_check_cookies()

2014-11-06 Thread Kamraan Nasim
Greetings,

I have been hitting this issue fairly consistently for the ixgbe driver

MEMPOOL: obj=0x7ffeed1f5d00, mempool=0x7ffeecb69bc0, cookie=badbadbadadd2e55
PANIC in __mempool_check_cookies():
MEMPOOL: bad header cookie (get)

It seems to be a corruption in the mempool bound to my ixgbe port. What I
have observed is that this ONLY happens if I initialize DPDK (i.e. start the
DPDK application) AFTER traffic is already flowing in through the port. If
I initialize DPDK and bind BEFORE I start traffic, then things seem to work
fine.

Any clues on why this might be happening?

A bit stumped, so would really appreciate all the help I can get on this
one.

Thanks,
Kam


(bt for your reference)

#2  0x00408cc6 in __rte_panic (funcname=0x571100 "__mempool_check_cookies", format=0x568fb0 "MEMPOOL: bad header cookie (get)\n%.0s")
    at /b/knasim/bandwagon/sbn/src/share/dpdk/lib/librte_eal/linuxapp/eal/eal_debug.c:83
#3  0x004af027 in __mempool_check_cookies (rxq=)
    at /b/knasim/bandwagon/sbn/src/share/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:357
#4  rte_mempool_get_bulk (rxq=)
    at /b/knasim/bandwagon/sbn/src/share/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1094
#5  ixgbe_rx_alloc_bufs (rxq=)
    at /b/knasim/bandwagon/sbn/src/share/dpdk/lib/librte_pmd_ixgbe/ixgbe_rxtx.c:997
#6  0x004afce9 in rx_recv_pkts (rx_queue=0x7ffeec8edbc0, rx_pkts=0x900410, nb_pkts=)
    at /b/knasim/bandwagon/sbn/src/share/dpdk/lib/librte_pmd_ixgbe/ixgbe_rxtx.c:1074
#7  ixgbe_recv_pkts_bulk_alloc (rx_queue=0x7ffeec8edbc0, rx_pkts=0x900410, nb_pkts=)
    at /b/knasim/bandwagon/sbn/src/share/dpdk/lib/librte_pmd_ixgbe/ixgbe_rxtx.c:1124
#8  0x00520d36 in rte_eth_rx_burst (lp=0x900340, n_workers=14, bsz_rd=, bsz_wr=144, pos_lb=0 '\000')
    at /usr/lib/dpdk/include/rte_ethdev.h:2368


  1   2   >