[dpdk-dev] [DPDK-memory] how qemu waste such long time under dpdk huge page envriment?

2017-06-16 Thread Sam
Hi all,

I'm running `QEMU_CMD ...` to create a vm under dpdk huge page envriment
(which set huge page 1G). And I enable all events in qemu.

For qemu and ovs-dpdk(ovs-2.4.9 with dpdk-2.2.0) environment, detail log is:

> 30012@1497443246.678304:object_dynamic_cast_assert
qemu:memory-region->qemu:memory-region (/home/hu
> anghuai/cloud/contrib/qemu-2.6.0/memory.c:1076:memory_region_initfn)
> 30012@1497443256.274866:object_dynamic_cast_assert
qio-channel-socket->qio-channel-socket (io/chann
> el-socket.c:389:qio_channel_socket_init)


I don't know why qemu doing 'memory_region_initfn' function in this 10
second, does anyone know this?

static void memory_region_initfn(Object *obj)
> {
> MemoryRegion *mr = MEMORY_REGION(obj);
> ObjectProperty *op;
> mr->ops = &unassigned_mem_ops;
> mr->enabled = true;
> mr->romd_mode = true;
> mr->global_locking = true;
> mr->destructor = memory_region_destructor_none;
> QTAILQ_INIT(&mr->subregions);
> QTAILQ_INIT(&mr->coalesced);
> op = object_property_add(OBJECT(mr), "container",
>  "link<" TYPE_MEMORY_REGION ">",
>  memory_region_get_container,
>  NULL, /* memory_region_set_container */
>  NULL, NULL, &error_abort);
> op->resolve = memory_region_resolve_container;
> object_property_add(OBJECT(mr), "addr", "uint64",
> memory_region_get_addr,
> NULL, /* memory_region_set_addr */
> NULL, NULL, &error_abort);
> object_property_add(OBJECT(mr), "priority", "uint32",
> memory_region_get_priority,
> NULL, /* memory_region_set_priority */
> NULL, NULL, &error_abort);
> object_property_add_bool(OBJECT(mr), "may-overlap",
>  memory_region_get_may_overlap,
>  NULL, /* memory_region_set_may_overlap */
>  &error_abort);
> object_property_add(OBJECT(mr), "size", "uint64",
> memory_region_get_size,
> NULL, /* memory_region_set_size, */
> NULL, NULL, &error_abort);
> }


Thank you~


Re: [dpdk-dev] [DPDK-memory] how qemu waste such long time under dpdk huge page envriment?

2017-06-16 Thread Sam
BTW, while running ovs-dpdk, this log is also take long time, does that
mean dpdk request large memory take long time?

EAL: Setting up physically contiguous memory...


2017-06-16 16:11 GMT+08:00 Sam :

> Hi all,
>
> I'm running `QEMU_CMD ...` to create a vm under dpdk huge page envriment
> (which set huge page 1G). And I enable all events in qemu.
>
> For qemu and ovs-dpdk(ovs-2.4.9 with dpdk-2.2.0) environment, detail log
> is:
>
> > 30012@1497443246.678304:object_dynamic_cast_assert
> qemu:memory-region->qemu:memory-region (/home/hu
> > anghuai/cloud/contrib/qemu-2.6.0/memory.c:1076:memory_region_initfn)
> > 30012@1497443256.274866:object_dynamic_cast_assert
> qio-channel-socket->qio-channel-socket (io/chann
> > el-socket.c:389:qio_channel_socket_init)
>
>
> I don't know why qemu doing 'memory_region_initfn' function in this 10
> second, does anyone know this?
>
> static void memory_region_initfn(Object *obj)
>> {
>> MemoryRegion *mr = MEMORY_REGION(obj);
>> ObjectProperty *op;
>> mr->ops = &unassigned_mem_ops;
>> mr->enabled = true;
>> mr->romd_mode = true;
>> mr->global_locking = true;
>> mr->destructor = memory_region_destructor_none;
>> QTAILQ_INIT(&mr->subregions);
>> QTAILQ_INIT(&mr->coalesced);
>> op = object_property_add(OBJECT(mr), "container",
>>  "link<" TYPE_MEMORY_REGION ">",
>>  memory_region_get_container,
>>  NULL, /* memory_region_set_container */
>>  NULL, NULL, &error_abort);
>> op->resolve = memory_region_resolve_container;
>> object_property_add(OBJECT(mr), "addr", "uint64",
>> memory_region_get_addr,
>> NULL, /* memory_region_set_addr */
>> NULL, NULL, &error_abort);
>> object_property_add(OBJECT(mr), "priority", "uint32",
>> memory_region_get_priority,
>> NULL, /* memory_region_set_priority */
>> NULL, NULL, &error_abort);
>> object_property_add_bool(OBJECT(mr), "may-overlap",
>>  memory_region_get_may_overlap,
>>  NULL, /* memory_region_set_may_overlap */
>>  &error_abort);
>> object_property_add(OBJECT(mr), "size", "uint64",
>> memory_region_get_size,
>> NULL, /* memory_region_set_size, */
>> NULL, NULL, &error_abort);
>> }
>
>
> Thank you~
>


Re: [dpdk-dev] How to get "--base-virtaddr" when using DPDK

2017-06-16 Thread Bruce Richardson
On Thu, Jun 15, 2017 at 12:00:16PM -0600, Junguk Cho wrote:
> Hi,
> 
> I have the same situation which is explained here (
> https://github.com/collectd/collectd/issues/2284).
> 
> So, I would like to try "--base-virtaddr" option.
> I tried to use "struct rte_memseg *m = rte_eal_get_physmem_layout()" option
> based on (http://dpdk.org/doc/api/structrte__memseg.html).
> However, it always  returns different number.
> 
> How do I get virtual address value to use it as an input of "base-viraddr"?
> 
Hi,

there is no one way to get a virtual address to pass in here - if you
are aware of some suitable values just pass them directly. What I have
done in the past in a multi-process system, is to run the primary
normaly, and then when running the failing secondary turn up the log
level so I can see the mapped addresses before it fails. Then I record
the starting address that the secondary uses, and re-run the primary
passing that in as the base-virtaddr hint.

Regards,
/Bruce


Re: [dpdk-dev] [PATCH 01/38] eal: add support for 24 40 and 48 bit operations

2017-06-16 Thread Bruce Richardson
On Fri, Jun 16, 2017 at 11:10:31AM +0530, Shreyansh Jain wrote:
> From: Hemant Agrawal 
> 
> Bit Swap and LE<=>BE conversions for 23, 40 and 48 bit width
> 
> Signed-off-by: Hemant Agrawal 
> ---
>  .../common/include/generic/rte_byteorder.h | 78 
> ++
>  1 file changed, 78 insertions(+)
> 
Are these really common enough for inclusion in an generic EAL file?
Would they be better being driver specific, so that we don't end up with
lots of extra byte-swap routines for each possible size used by a
driver.

/Bruce


Re: [dpdk-dev] [DPDK-memory] how qemu waste such long time under dpdk huge page envriment?

2017-06-16 Thread Bruce Richardson
On Fri, Jun 16, 2017 at 04:26:40PM +0800, Sam wrote:
> BTW, while running ovs-dpdk, this log is also take long time, does that
> mean dpdk request large memory take long time?
> 
> EAL: Setting up physically contiguous memory...
> 

When running with 1G pages, I found that the mmap system call takes a
considerable amount of time to execute. I think this is due to the
kernel zero-ing out the 1G pages. IIRC on one system I measured it as taking
about 0.4 seconds per 1G page.

/Bruce

> 
> 2017-06-16 16:11 GMT+08:00 Sam :
> 
> > Hi all,
> >
> > I'm running `QEMU_CMD ...` to create a vm under dpdk huge page envriment
> > (which set huge page 1G). And I enable all events in qemu.
> >
> > For qemu and ovs-dpdk(ovs-2.4.9 with dpdk-2.2.0) environment, detail log
> > is:
> >
> > > 30012@1497443246.678304:object_dynamic_cast_assert
> > qemu:memory-region->qemu:memory-region (/home/hu
> > > anghuai/cloud/contrib/qemu-2.6.0/memory.c:1076:memory_region_initfn)
> > > 30012@1497443256.274866:object_dynamic_cast_assert
> > qio-channel-socket->qio-channel-socket (io/chann
> > > el-socket.c:389:qio_channel_socket_init)
> >
> >
> > I don't know why qemu doing 'memory_region_initfn' function in this 10
> > second, does anyone know this?
> >
> > static void memory_region_initfn(Object *obj)
> >> {
> >> MemoryRegion *mr = MEMORY_REGION(obj);
> >> ObjectProperty *op;
> >> mr->ops = &unassigned_mem_ops;
> >> mr->enabled = true;
> >> mr->romd_mode = true;
> >> mr->global_locking = true;
> >> mr->destructor = memory_region_destructor_none;
> >> QTAILQ_INIT(&mr->subregions);
> >> QTAILQ_INIT(&mr->coalesced);
> >> op = object_property_add(OBJECT(mr), "container",
> >>  "link<" TYPE_MEMORY_REGION ">",
> >>  memory_region_get_container,
> >>  NULL, /* memory_region_set_container */
> >>  NULL, NULL, &error_abort);
> >> op->resolve = memory_region_resolve_container;
> >> object_property_add(OBJECT(mr), "addr", "uint64",
> >> memory_region_get_addr,
> >> NULL, /* memory_region_set_addr */
> >> NULL, NULL, &error_abort);
> >> object_property_add(OBJECT(mr), "priority", "uint32",
> >> memory_region_get_priority,
> >> NULL, /* memory_region_set_priority */
> >> NULL, NULL, &error_abort);
> >> object_property_add_bool(OBJECT(mr), "may-overlap",
> >>  memory_region_get_may_overlap,
> >>  NULL, /* memory_region_set_may_overlap */
> >>  &error_abort);
> >> object_property_add(OBJECT(mr), "size", "uint64",
> >> memory_region_get_size,
> >> NULL, /* memory_region_set_size, */
> >> NULL, NULL, &error_abort);
> >> }
> >
> >
> > Thank you~
> >


Re: [dpdk-dev] [PATCH 01/38] eal: add support for 24 40 and 48 bit operations

2017-06-16 Thread Shreyansh Jain
Hi Bruce,

> -Original Message-
> From: Bruce Richardson [mailto:bruce.richard...@intel.com]
> Sent: Friday, June 16, 2017 2:27 PM
> To: Shreyansh Jain 
> Cc: dev@dpdk.org; ferruh.yi...@intel.com; Hemant Agrawal
> 
> Subject: Re: [dpdk-dev] [PATCH 01/38] eal: add support for 24 40 and 48 bit
> operations
> 
> On Fri, Jun 16, 2017 at 11:10:31AM +0530, Shreyansh Jain wrote:
> > From: Hemant Agrawal 
> >
> > Bit Swap and LE<=>BE conversions for 23, 40 and 48 bit width
> >
> > Signed-off-by: Hemant Agrawal 
> > ---
> >  .../common/include/generic/rte_byteorder.h | 78
> ++
> >  1 file changed, 78 insertions(+)
> >
> Are these really common enough for inclusion in an generic EAL file?
> Would they be better being driver specific, so that we don't end up with
> lots of extra byte-swap routines for each possible size used by a
> driver.
 
Reasoning was to keep all bit/byte swap at a single place and if it is
useful for others.

>From DPAA perspective, these macro can be anywhere. In case someone else too
has use of this (now or in near-future), probably then we can consider this
in EAL.
Else, if I don't get much responses in a few days, I will shift them to
DPAA driver in next version of this series.

-
Shreyansh


[dpdk-dev] [PATCH v3 3/3] doc: add new ddp get info command

2017-06-16 Thread Andrey Chilikin
This patch adds description of new command for dynamic device
personalization profiles: ddp get info (profile_path)

Signed-off-by: Andrey Chilikin 
---
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst 
b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 0e50c1071..c982cd895 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -380,6 +380,13 @@ For example::
testpmd> read txd 0 0 4
 0x0001 - 0x24C3C440 / 0x000F - 0x2330003C
 
+ddp get info
+
+
+Display information about dynamic device personalization profile::
+
+   testpmd> ddp get info (profile_patch)
+
 show vf stats
 ~
 
-- 
2.13.0



[dpdk-dev] [PATCH v3 0/3] net/i40e: get information about ddp profile

2017-06-16 Thread Andrey Chilikin
This patch adds ability to request information about dynamic device
personalization profile

v3:
Add command description to test-pmd guide

v2:
Add rte_pmd_i40e_get_ddp_info() function to i40e version map
Add CL to test-pmd for getting information about profile

Andrey Chilikin (3):
  net/i40e: get information about ddp profile
  app/testpmd: enable ddp get info feature
  doc: add new ddp get info command

 app/test-pmd/cmdline.c  | 131 ++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |   7 ++
 drivers/net/i40e/rte_pmd_i40e.c | 163 +++-
 drivers/net/i40e/rte_pmd_i40e.h |  46 
 drivers/net/i40e/rte_pmd_i40e_version.map   |   7 ++
 5 files changed, 350 insertions(+), 4 deletions(-)

-- 
2.13.0

Acked-by: Beilei Xing 


[dpdk-dev] [PATCH v3 1/3] net/i40e: get information about ddp profile

2017-06-16 Thread Andrey Chilikin
This patch adds ability to request information about dynamic device
personalization profile

Signed-off-by: Andrey Chilikin 
---
 drivers/net/i40e/rte_pmd_i40e.c   | 163 +-
 drivers/net/i40e/rte_pmd_i40e.h   |  46 +
 drivers/net/i40e/rte_pmd_i40e_version.map |   7 ++
 3 files changed, 212 insertions(+), 4 deletions(-)

diff --git a/drivers/net/i40e/rte_pmd_i40e.c b/drivers/net/i40e/rte_pmd_i40e.c
index f7ce62bb5..45cdcfaa3 100644
--- a/drivers/net/i40e/rte_pmd_i40e.c
+++ b/drivers/net/i40e/rte_pmd_i40e.c
@@ -1468,7 +1468,7 @@ rte_pmd_i40e_set_tc_strict_prio(uint8_t port, uint8_t 
tc_map)
return ret;
 }
 
-#define I40E_PROFILE_INFO_SIZE 48
+#define I40E_PROFILE_INFO_SIZE sizeof(struct rte_pmd_i40e_profile_info)
 #define I40E_MAX_PROFILE_NUM 16
 
 static void
@@ -1520,9 +1520,6 @@ i40e_add_rm_profile_info(struct i40e_hw *hw, uint8_t 
*profile_info_sec)
return status;
 }
 
-#define I40E_PROFILE_INFO_SIZE 48
-#define I40E_MAX_PROFILE_NUM 16
-
 /* Check if the profile info exists */
 static int
 i40e_check_profile_info(uint8_t port, uint8_t *profile_info_sec)
@@ -1682,6 +1679,164 @@ rte_pmd_i40e_process_ddp_package(uint8_t port, uint8_t 
*buff,
return status;
 }
 
+int rte_pmd_i40e_get_ddp_info(uint8_t *pkg_buff, uint32_t pkg_size,
+   uint8_t *info_buff, uint32_t info_size,
+   enum rte_pmd_i40e_package_info type)
+{
+   uint32_t ret_size;
+   struct i40e_package_header *pkg_hdr;
+   struct i40e_generic_seg_header *i40e_seg_hdr;
+   struct i40e_generic_seg_header *note_seg_hdr;
+   struct i40e_generic_seg_header *metadata_seg_hdr;
+
+   if (!info_buff) {
+   PMD_DRV_LOG(ERR, "Output info buff is invalid.");
+   return -EINVAL;
+   }
+
+   if (!pkg_buff || pkg_size < (sizeof(struct i40e_package_header) +
+   sizeof(struct i40e_metadata_segment) +
+   sizeof(uint32_t) * 2)) {
+   PMD_DRV_LOG(ERR, "Package buff is invalid.");
+   return -EINVAL;
+   }
+
+   pkg_hdr = (struct i40e_package_header *)pkg_buff;
+   if (pkg_hdr->segment_count < 2) {
+   PMD_DRV_LOG(ERR, "Segment_count should be 2 at least.");
+   return -EINVAL;
+   }
+
+   /* Find metadata segment */
+   metadata_seg_hdr = i40e_find_segment_in_package(SEGMENT_TYPE_METADATA,
+   pkg_hdr);
+
+   /* Find global notes segment */
+   note_seg_hdr = i40e_find_segment_in_package(SEGMENT_TYPE_NOTES,
+   pkg_hdr);
+
+   /* Find i40e profile segment */
+   i40e_seg_hdr = i40e_find_segment_in_package(SEGMENT_TYPE_I40E, pkg_hdr);
+
+   /* get global header info */
+   if (type == RTE_PMD_I40E_PKG_INFO_GLOBAL_HEADER) {
+   struct rte_pmd_i40e_profile_info *info =
+   (struct rte_pmd_i40e_profile_info *)info_buff;
+
+   if (info_size < sizeof(struct rte_pmd_i40e_profile_info)) {
+   PMD_DRV_LOG(ERR, "Output info buff size is invalid.");
+   return -EINVAL;
+   }
+
+   if (!metadata_seg_hdr) {
+   PMD_DRV_LOG(ERR, "Failed to find metadata segment 
header");
+   return -EINVAL;
+   }
+
+   memset(info, 0, sizeof(struct rte_pmd_i40e_profile_info));
+   info->owner = RTE_PMD_I40E_DDP_OWNER_UNKNOWN;
+   info->track_id =
+   ((struct i40e_metadata_segment 
*)metadata_seg_hdr)->track_id;
+
+   memcpy(info->name,
+   ((struct i40e_metadata_segment 
*)metadata_seg_hdr)->name,
+   I40E_DDP_NAME_SIZE);
+   memcpy(&info->version,
+   &((struct i40e_metadata_segment 
*)metadata_seg_hdr)->version,
+   sizeof(struct i40e_ddp_version));
+   return I40E_SUCCESS;
+   }
+
+   /* get global note size */
+   if (type == RTE_PMD_I40E_PKG_INFO_GLOBAL_NOTES_SIZE) {
+   if (info_size < sizeof(uint32_t)) {
+   PMD_DRV_LOG(ERR, "Invalid information buffer size");
+   return -EINVAL;
+   }
+   if (note_seg_hdr == NULL)
+   ret_size = 0;
+   else
+   ret_size = note_seg_hdr->size;
+   *(uint32_t *)info_buff = ret_size;
+   return I40E_SUCCESS;
+   }
+
+   /* get global note */
+   if (type == RTE_PMD_I40E_PKG_INFO_GLOBAL_NOTES) {
+   if (note_seg_hdr == NULL)
+   return -ENOTSUP;
+   if (info_size < note_seg_hdr->size) {
+   PMD_DRV_LOG(ERR, "Information buffer size is too 
small");
+   return -EINVAL;
+   }
+   memcpy(info_buff, ¬e_seg_hdr[1], note_seg_hdr->size);
+   return I40E_SUCCESS;
+   }

[dpdk-dev] [PATCH v3 2/3] app/testpmd: enable ddp get info feature

2017-06-16 Thread Andrey Chilikin
This patch demostrates how to get information about dynamic device
personalization profile. Command 'ddp get info (path_to_profile)'
extracts and prints information about the given profile.

Signed-off-by: Andrey Chilikin 
---
 app/test-pmd/cmdline.c | 131 +
 1 file changed, 131 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 0afac68c7..9e7858901 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -218,6 +218,9 @@ static void cmd_help_long_parsed(void *parsed_result,
"ddp get list (port_id)\n"
"Get ddp profile info list\n\n"
 
+   "ddp get info (profile_path)\n"
+   "Get ddp profile information.\n\n"
+
"show vf stats (port_id) (vf_id)\n"
"Display a VF's statistics.\n\n"
 
@@ -12904,6 +12907,133 @@ cmdline_parse_inst_t cmd_ddp_add = {
},
 };
 
+/* Get dynamic device personalization profile info */
+struct cmd_ddp_info_result {
+   cmdline_fixed_string_t ddp;
+   cmdline_fixed_string_t get;
+   cmdline_fixed_string_t info;
+   char filepath[];
+};
+
+cmdline_parse_token_string_t cmd_ddp_info_ddp =
+   TOKEN_STRING_INITIALIZER(struct cmd_ddp_info_result, ddp, "ddp");
+cmdline_parse_token_string_t cmd_ddp_info_get =
+   TOKEN_STRING_INITIALIZER(struct cmd_ddp_info_result, get, "get");
+cmdline_parse_token_string_t cmd_ddp_info_info =
+   TOKEN_STRING_INITIALIZER(struct cmd_ddp_info_result, info, "info");
+cmdline_parse_token_string_t cmd_ddp_info_filepath =
+   TOKEN_STRING_INITIALIZER(struct cmd_ddp_info_result, filepath, NULL);
+
+static void
+cmd_ddp_info_parsed(
+   void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_ddp_info_result *res = parsed_result;
+   uint8_t *pkg;
+   uint32_t pkg_size;
+   int ret = -ENOTSUP;
+#ifdef RTE_LIBRTE_I40E_PMD
+   uint32_t i;
+   uint8_t *buff;
+   uint32_t buff_size;
+   struct rte_pmd_i40e_profile_info info;
+   uint32_t dev_num;
+   struct rte_pmd_i40e_ddp_device_id *devs;
+#endif
+
+   pkg = open_ddp_package_file(res->filepath, &pkg_size);
+   if (!pkg)
+   return;
+
+#ifdef RTE_LIBRTE_I40E_PMD
+   ret = rte_pmd_i40e_get_ddp_info(pkg, pkg_size,
+   (uint8_t *)&info, sizeof(info),
+   RTE_PMD_I40E_PKG_INFO_GLOBAL_HEADER);
+   if (!ret) {
+   printf("Global Track id:   0x%x\n", info.track_id);
+   printf("Global Version:%d.%d.%d.%d\n",
+   info.version.major,
+   info.version.minor,
+   info.version.update,
+   info.version.draft);
+   printf("Global Package name:   %s\n\n", info.name);
+   }
+
+   ret = rte_pmd_i40e_get_ddp_info(pkg, pkg_size,
+   (uint8_t *)&info, sizeof(info),
+   RTE_PMD_I40E_PKG_INFO_HEADER);
+   if (!ret) {
+   printf("i40e Profile Track id: 0x%x\n", info.track_id);
+   printf("i40e Profile Version:  %d.%d.%d.%d\n",
+   info.version.major,
+   info.version.minor,
+   info.version.update,
+   info.version.draft);
+   printf("i40e Profile name: %s\n\n", info.name);
+   }
+
+   ret = rte_pmd_i40e_get_ddp_info(pkg, pkg_size,
+   (uint8_t *)&buff_size, sizeof(buff_size),
+   RTE_PMD_I40E_PKG_INFO_GLOBAL_NOTES_SIZE);
+   if (!ret && buff_size) {
+   buff = (uint8_t *)malloc(buff_size);
+   if (buff) {
+   ret = rte_pmd_i40e_get_ddp_info(pkg, pkg_size,
+   buff, buff_size,
+   
RTE_PMD_I40E_PKG_INFO_GLOBAL_NOTES);
+   if (!ret)
+   printf("Package Notes:\n%s\n\n", buff);
+   free(buff);
+   }
+   }
+
+   ret = rte_pmd_i40e_get_ddp_info(pkg, pkg_size,
+   (uint8_t *)&dev_num, sizeof(dev_num),
+   RTE_PMD_I40E_PKG_INFO_DEVID_NUM);
+   if (!ret && dev_num) {
+   devs = (struct rte_pmd_i40e_ddp_device_id *)malloc(dev_num *
+   sizeof(struct rte_pmd_i40e_ddp_device_id));
+   if (devs) {
+   ret = rte_pmd_i40e_get_ddp_info(pkg, pkg_size,
+   (uint8_t *)devs, dev_num *
+   sizeof(struct 
rte_pmd_i40e_ddp_device_id),
+   
RTE_PMD_

Re: [dpdk-dev] [PATCH 01/38] eal: add support for 24 40 and 48 bit operations

2017-06-16 Thread Thomas Monjalon
16/06/2017 11:21, Shreyansh Jain:
> Hi Bruce,
> 
> From: Bruce Richardson [mailto:bruce.richard...@intel.com]
> > On Fri, Jun 16, 2017 at 11:10:31AM +0530, Shreyansh Jain wrote:
> > > From: Hemant Agrawal 
> > >
> > > Bit Swap and LE<=>BE conversions for 23, 40 and 48 bit width
> > >
> > > Signed-off-by: Hemant Agrawal 
> > > ---
> > >  .../common/include/generic/rte_byteorder.h | 78
> > ++
> > >  1 file changed, 78 insertions(+)
> > >
> > Are these really common enough for inclusion in an generic EAL file?
> > Would they be better being driver specific, so that we don't end up with
> > lots of extra byte-swap routines for each possible size used by a
> > driver.
>  
> Reasoning was to keep all bit/byte swap at a single place and if it is
> useful for others.
> 
> From DPAA perspective, these macro can be anywhere. In case someone else too
> has use of this (now or in near-future), probably then we can consider this
> in EAL.
> Else, if I don't get much responses in a few days, I will shift them to
> DPAA driver in next version of this series.

I prefer keeping common functions in a common place.
So I like this patch.



Re: [dpdk-dev] [PATCH 01/38] eal: add support for 24 40 and 48 bit operations

2017-06-16 Thread Adrien Mazarguil
Hi Shreyansh,

On Fri, Jun 16, 2017 at 09:21:35AM +, Shreyansh Jain wrote:
> Hi Bruce,
> 
> > -Original Message-
> > From: Bruce Richardson [mailto:bruce.richard...@intel.com]
> > Sent: Friday, June 16, 2017 2:27 PM
> > To: Shreyansh Jain 
> > Cc: dev@dpdk.org; ferruh.yi...@intel.com; Hemant Agrawal
> > 
> > Subject: Re: [dpdk-dev] [PATCH 01/38] eal: add support for 24 40 and 48 bit
> > operations
> > 
> > On Fri, Jun 16, 2017 at 11:10:31AM +0530, Shreyansh Jain wrote:
> > > From: Hemant Agrawal 
> > >
> > > Bit Swap and LE<=>BE conversions for 23, 40 and 48 bit width
> > >
> > > Signed-off-by: Hemant Agrawal 
> > > ---
> > >  .../common/include/generic/rte_byteorder.h | 78
> > ++
> > >  1 file changed, 78 insertions(+)
> > >
> > Are these really common enough for inclusion in an generic EAL file?
> > Would they be better being driver specific, so that we don't end up with
> > lots of extra byte-swap routines for each possible size used by a
> > driver.
>  
> Reasoning was to keep all bit/byte swap at a single place and if it is
> useful for others.
> 
> From DPAA perspective, these macro can be anywhere. In case someone else too
> has use of this (now or in near-future), probably then we can consider this
> in EAL.
> Else, if I don't get much responses in a few days, I will shift them to
> DPAA driver in next version of this series.

While I'm not against exposing exotic byte swapping functions, they are not
completely safe and I'm not sure they should be part of public header files
on that basis.

Problem is their storage size is larger than the number of bytes they deal
with, which raises the question: are filler bytes prepended or appended to
the converted value? How about input values in non-native order? Answering
that is not so easy as it depends on the use case. We actually had a similar
issue when defining VXLAN's VNI field for rte_flow, which is 24-bit in
network order.

Take rte_constant_bswap48() for instance, assuming input value is
little-endian, output is supposed to be big-endian. While the shifts are
correct, filler bytes are not in the right place for a big-endian system,
and the resulting value stored on uint64_t cannot be used as-is. Again, that
depends on the use case, it could be correct if the resulting value was to
be used as is on a little-endian system.

I think the only safe way to deal with that is by defining specific types of
the proper size, e.g.:

 typedef uint8_t uint48_t[6];

These are cumbersome and cannot be used like normal integers though. With
such types, byte-swapping functions become meaningless.

Since these are supposed to be rather simple functions, I'm not sure
handling/documenting all this complexity in rte_byteorder.h makes sense.

-- 
Adrien Mazarguil
6WIND


[dpdk-dev] [PATCH] vhost: added error log in vhost_user_set_features

2017-06-16 Thread Dariusz Stojaczyk
Since vhost_user_set_features failure is not handled in any way, a
single error log has been added to at least to let the user know that
something has gone wrong.

Signed-off-by: Dariusz Stojaczyk 
---
 lib/librte_vhost/vhost_user.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 5c8058b..b3aff90 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -164,8 +164,12 @@ vhost_user_set_features(struct virtio_net *dev, uint64_t 
features)
uint64_t vhost_features = 0;
 
rte_vhost_driver_get_features(dev->ifname, &vhost_features);
-   if (features & ~vhost_features)
+   if (features & ~vhost_features) {
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "(%d) received invalid negotiated features.\n",
+   dev->vid);
return -1;
+   }
 
if ((dev->flags & VIRTIO_DEV_RUNNING) && dev->features != features) {
if (dev->notify_ops->features_changed)
-- 
2.7.4



[dpdk-dev] [PATCH] net/mlx4: fix assertion failure on link update

2017-06-16 Thread Adrien Mazarguil
The interrupt handler can sometimes be triggered for reasons other than a
link status event. An assertion failure happen when such events occur while
an asynchronous link status update is already scheduled.

Address this issue using the same approach as its mlx5 counterpart,
commit a9f2fbc42f0c ("net/mlx5: fix inconsistent link status")

Fixes: c4da6caa426d ("mlx4: handle link status interrupts")

Cc: sta...@dpdk.org
Signed-off-by: Adrien Mazarguil 
Acked-by: Nelio Laranjeiro 
---
 drivers/net/mlx4/mlx4.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 050d646..57b52ac 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -5343,6 +5343,7 @@ priv_dev_status_handler(struct priv *priv, struct 
rte_eth_dev *dev,
 {
struct ibv_async_event event;
int port_change = 0;
+   struct rte_eth_link *link = &dev->data->dev_link;
int ret = 0;
 
*events = 0;
@@ -5364,22 +5365,20 @@ priv_dev_status_handler(struct priv *priv, struct 
rte_eth_dev *dev,
  event.event_type, event.element.port_num);
ibv_ack_async_event(&event);
}
-
-   if (port_change ^ priv->pending_alarm) {
-   struct rte_eth_link *link = &dev->data->dev_link;
-
-   priv->pending_alarm = 0;
-   mlx4_link_update(dev, 0);
-   if (((link->link_speed == 0) && link->link_status) ||
-   ((link->link_speed != 0) && !link->link_status)) {
+   if (!port_change)
+   return ret;
+   mlx4_link_update(dev, 0);
+   if (((link->link_speed == 0) && link->link_status) ||
+   ((link->link_speed != 0) && !link->link_status)) {
+   if (!priv->pending_alarm) {
/* Inconsistent status, check again later. */
priv->pending_alarm = 1;
rte_eal_alarm_set(MLX4_ALARM_TIMEOUT_US,
  mlx4_dev_link_status_handler,
  dev);
-   } else {
-   *events |= (1 << RTE_ETH_EVENT_INTR_LSC);
}
+   } else {
+   *events |= (1 << RTE_ETH_EVENT_INTR_LSC);
}
return ret;
 }
@@ -5400,6 +5399,7 @@ mlx4_dev_link_status_handler(void *arg)
 
priv_lock(priv);
assert(priv->pending_alarm == 1);
+   priv->pending_alarm = 0;
ret = priv_dev_status_handler(priv, dev, &events);
priv_unlock(priv);
if (ret > 0 && events & (1 << RTE_ETH_EVENT_INTR_LSC))
-- 
2.1.4



Re: [dpdk-dev] [PATCH] vhost: added error log in vhost_user_set_features

2017-06-16 Thread Yuanhan Liu
On Fri, Jun 16, 2017 at 04:32:05PM +0200, Dariusz Stojaczyk wrote:
> Since vhost_user_set_features failure is not handled in any way, a
> single error log has been added to at least to let the user know that
> something has gone wrong.

Indeed. And applied to dpdk-next-virtio.

Thanks.

--yliu


Re: [dpdk-dev] [PATCH] doc: add testpmd commands for DDP

2017-06-16 Thread Ferruh Yigit
On 6/16/2017 3:43 AM, Beilei Xing wrote:
> Add testpmd commands for loading DDP package and getting
> loaded DDP info list.
> 
> Signed-off-by: Beilei Xing 

Applied to dpdk-next-net/master, thanks.

(Abbreviation DDP extended in document)


[dpdk-dev] [git pull] virtio changes for 17.08-rc1

2017-06-16 Thread Yuanhan Liu
Hi Thomas,

Please consider pulling following virtio changes for 17.08-rc1 at
git://dpdk.org/next/dpdk-next-virtiomaster

Thanks.

--yliu

---
Daniel Verkamp (1):
  vhost: access VhostUsrMsg via packed struct

Dariusz Stojaczyk (3):
  vhost: fix malloc size too small
  vhost: fix memory leak
  vhost: added error log for badly negotiated features

Jens Freimann (1):
  vhost: check malloc return value when allocating guest pages

Tiwei Bie (1):
  net/virtio: zero the whole memory zone

Yuanhan Liu (1):
  vhost: fix crash on NUMA

Zhihong Wang (1):
  vhost: support rx_queue_count


---
 drivers/net/vhost/rte_eth_vhost.c  | 13 
 drivers/net/virtio/virtio_ethdev.c |  2 +-
 lib/librte_vhost/rte_vhost.h   | 12 +++
 lib/librte_vhost/rte_vhost_version.map |  7 +
 lib/librte_vhost/vhost.c   | 28 -
 lib/librte_vhost/vhost_user.c  | 81 
+---
 6 files changed, 109 insertions(+), 34 deletions(-)


Re: [dpdk-dev] mlx debug build error with clang

2017-06-16 Thread Adrien Mazarguil
Hi Ferruh,

On Tue, Jun 13, 2017 at 04:32:03PM +0100, Ferruh Yigit wrote:
> Hi Adrien, Nelio,
> 
> I am getting following build error [1] with clang [2] when debug enabled
> for mlx4 and mlx5.
> 
> This started after I update my box, not sure what triggered this.
> Can you please help fixing this?
> 
> Thanks,
> ferruh
> 
> 
> [1]
> 
> .../drivers/net/mlx4/mlx4_flow.c:731:3: error: use of GNU statement
> expression extension [-Werror,-Wgnu-statement-expression]
> claim_zero(ibv_destroy_qp(fdq->qp));
> ^
> .../drivers/net/mlx4/mlx4.h:185:25: note: expanded from macro 'claim_zero'
> #define claim_zero(...) assert((__VA_ARGS__) == 0)
> ^
> /usr/include/assert.h:95:6: note: expanded from macro 'assert'
> ({  \
>  ^
> 
> 
> 
> .../drivers/net/mlx5/mlx5_fdir.c:278:2: error: use of GNU statement
> expression extension [-Werror,-Wgnu-statement-expression]
> assert(((uint8_t *)attr + sizeof(*attr)) == (uint8_t *)spec_offset);
> ^
> /usr/include/assert.h:95:6: note: expanded from macro 'assert'
> ({  \
>  ^
> 
> [Many of similar ...]
> 
> 
> [2]
> target: x86_64-native-linuxapp-clang
> 
> clang version 4.0.0 (tags/RELEASE_400/final)

Recent Glibc versions now apparently handle assert() through a nonstandard
({ }) construct, which is not pedantic-safe due to a missing __extension__
keyword.

assert.h still provides a standard assert() definition that shouldn't cause
compilation to fail when the following condition is met:

 #if !defined __GNUC__ || defined __STRICT_ANSI__

However __GNUC__ is (always?) defined by clang for maximum compatibility
with GCC while __STRICT_ANSI__ is not due to the -std=gnu99 parameter,
assert.h thinks it's OK to use a ({ }) construct but is then caught by
clang's -pedantic parameter.

There are two ways to address this issue while keeping our beloved -pedantic
parameter in debug mode:

1. Replacing -std=gnu99 with -std=c99 (which is even stricter) to bring back
   __STRICT_ANSI__.
2. Replacing assert() statements with RTE_ASSERT().

The former should be doable now that DPDK includes have been cleaned up, and
we're thinking about doing the latter at some point.

Since I don't have a recent Glibc handy, can you try replacing -std=gnu99
with -std=c99 in both Makefiles (mlx4 and mlx5), and report how GCC and
clang fare? (GCC at least seems to have no problem with that on my side)

-- 
Adrien Mazarguil
6WIND


Re: [dpdk-dev] [PATCH v3 0/3] net/i40e: get information about ddp profile

2017-06-16 Thread Ferruh Yigit
On 6/16/2017 10:25 AM, Andrey Chilikin wrote:
> This patch adds ability to request information about dynamic device
> personalization profile
> 
> v3:
> Add command description to test-pmd guide
> 
> v2:
> Add rte_pmd_i40e_get_ddp_info() function to i40e version map
> Add CL to test-pmd for getting information about profile
> 
> Andrey Chilikin (3):
>   net/i40e: get information about ddp profile
>   app/testpmd: enable ddp get info feature
>   doc: add new ddp get info command

Series applied to dpdk-next-net/master, thanks.

- Kept ACK from Beilei
- Merged 3/3 into 2/3


Re: [dpdk-dev] mlx debug build error with clang

2017-06-16 Thread Ferruh Yigit
On 6/16/2017 1:19 PM, Adrien Mazarguil wrote:
> Hi Ferruh,
> 
> On Tue, Jun 13, 2017 at 04:32:03PM +0100, Ferruh Yigit wrote:
>> Hi Adrien, Nelio,
>>
>> I am getting following build error [1] with clang [2] when debug enabled
>> for mlx4 and mlx5.
>>
>> This started after I update my box, not sure what triggered this.
>> Can you please help fixing this?
>>
>> Thanks,
>> ferruh
>>
>>
>> [1]
>>
>> .../drivers/net/mlx4/mlx4_flow.c:731:3: error: use of GNU statement
>> expression extension [-Werror,-Wgnu-statement-expression]
>> claim_zero(ibv_destroy_qp(fdq->qp));
>> ^
>> .../drivers/net/mlx4/mlx4.h:185:25: note: expanded from macro 'claim_zero'
>> #define claim_zero(...) assert((__VA_ARGS__) == 0)
>> ^
>> /usr/include/assert.h:95:6: note: expanded from macro 'assert'
>> ({  \
>>  ^
>>
>> 
>>
>> .../drivers/net/mlx5/mlx5_fdir.c:278:2: error: use of GNU statement
>> expression extension [-Werror,-Wgnu-statement-expression]
>> assert(((uint8_t *)attr + sizeof(*attr)) == (uint8_t *)spec_offset);
>> ^
>> /usr/include/assert.h:95:6: note: expanded from macro 'assert'
>> ({  \
>>  ^
>>
>> [Many of similar ...]
>>
>>
>> [2]
>> target: x86_64-native-linuxapp-clang
>>
>> clang version 4.0.0 (tags/RELEASE_400/final)
> 
> Recent Glibc versions now apparently handle assert() through a nonstandard
> ({ }) construct, which is not pedantic-safe due to a missing __extension__
> keyword.
> 
> assert.h still provides a standard assert() definition that shouldn't cause
> compilation to fail when the following condition is met:
> 
>  #if !defined __GNUC__ || defined __STRICT_ANSI__
> 
> However __GNUC__ is (always?) defined by clang for maximum compatibility
> with GCC while __STRICT_ANSI__ is not due to the -std=gnu99 parameter,
> assert.h thinks it's OK to use a ({ }) construct but is then caught by
> clang's -pedantic parameter.
> 
> There are two ways to address this issue while keeping our beloved -pedantic
> parameter in debug mode:
> 
> 1. Replacing -std=gnu99 with -std=c99 (which is even stricter) to bring back
>__STRICT_ANSI__.
> 2. Replacing assert() statements with RTE_ASSERT().
> 
> The former should be doable now that DPDK includes have been cleaned up, and
> we're thinking about doing the latter at some point.
> 
> Since I don't have a recent Glibc handy, can you try replacing -std=gnu99
> with -std=c99 in both Makefiles (mlx4 and mlx5), and report how GCC and
> clang fare? (GCC at least seems to have no problem with that on my side)

Sure, I will try -std=c99

> 



Re: [dpdk-dev] [PATCH 2/7] net/mlx4: fix Rx interrupts with multiple ports

2017-06-16 Thread Ferruh Yigit
On 6/14/2017 12:49 PM, Adrien Mazarguil wrote:
> Several Ethernet device structures are allocated on top of a common PCI
> device for mlx4 adapters with multiple ports. These inherit a common
> interrupt handle from their parent PCI device, which prevents Rx interrupts
> from working properly on all ports as their configuration is overwritten.
> 
> Use a local interrupt handle to address this issue.

Hi Adrien,

I am not clear why local copy required, and main concern from my point
of view is if this is a common problem for all PMDs and should be
addressed in higher level?

The variable is already per eth_dev, but this patch moves it the private
data. What overwrites it within eth_dev?

Thanks,
ferruh

> 
> Fixes: 9f05a4b81809 ("net/mlx4: support user space Rx interrupt event")
> 
> Signed-off-by: Adrien Mazarguil 
> Acked-by: Moti Haimovsky 
> ---
>  drivers/net/mlx4/mlx4.c | 9 +
>  drivers/net/mlx4/mlx4.h | 1 +
>  2 files changed, 10 insertions(+)
> 
> diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
> index 178562e..2b4722f 100644
> --- a/drivers/net/mlx4/mlx4.c
> +++ b/drivers/net/mlx4/mlx4.c
> @@ -6207,6 +6207,15 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct 
> rte_pci_device *pci_dev)
>  
>   eth_dev->device->driver = &mlx4_driver.driver;
>  
> + /*
> +  * Copy and override interrupt handle to prevent it from
> +  * being shared between all ethdev instances of a given PCI
> +  * device. This is required to properly handle Rx interrupts
> +  * on all ports.
> +  */
> + priv->intr_handle_dev = *eth_dev->intr_handle;
> + eth_dev->intr_handle = &priv->intr_handle_dev;
> +
>   priv->dev = eth_dev;
>   eth_dev->dev_ops = &mlx4_dev_ops;
>  
> diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
> index c46fc23..b74fbf8 100644
> --- a/drivers/net/mlx4/mlx4.h
> +++ b/drivers/net/mlx4/mlx4.h
> @@ -345,6 +345,7 @@ struct priv {
>   unsigned int txqs_n; /* TX queues array size. */
>   struct rxq *(*rxqs)[]; /* RX queues. */
>   struct txq *(*txqs)[]; /* TX queues. */
> + struct rte_intr_handle intr_handle_dev; /* Device interrupt handler. */
>   struct rte_intr_handle intr_handle; /* Interrupt handler. */
>   struct rte_flow_drop *flow_drop_queue; /* Flow drop queue. */
>   LIST_HEAD(mlx4_flows, rte_flow) flows;
> 



Re: [dpdk-dev] [PATCH 2/7] net/mlx4: fix Rx interrupts with multiple ports

2017-06-16 Thread Adrien Mazarguil
On Fri, Jun 16, 2017 at 02:07:54PM +0100, Ferruh Yigit wrote:
> On 6/14/2017 12:49 PM, Adrien Mazarguil wrote:
> > Several Ethernet device structures are allocated on top of a common PCI
> > device for mlx4 adapters with multiple ports. These inherit a common
> > interrupt handle from their parent PCI device, which prevents Rx interrupts
> > from working properly on all ports as their configuration is overwritten.
> > 
> > Use a local interrupt handle to address this issue.
> 
> Hi Adrien,
> 
> I am not clear why local copy required, and main concern from my point
> of view is if this is a common problem for all PMDs and should be
> addressed in higher level?

This issue only affects PMDs that handle multiple Ethernet ports through a
single PCI device. Such PMDs (like mlx4) identify themselves as PCI drivers
that manually have to register multiple rte_eth_dev instances through
rte_eth_dev_allocate(), which they then have to initialize.

> The variable is already per eth_dev, but this patch moves it the private
> data. What overwrites it within eth_dev?

Calling rte_eth_copy_pci_info() makes the rte_eth_dev structure inherit the
default interrupt handle of the underlying PCI device. By "inherit", I mean
eth_dev->intr_handle points to it, in that sense it's not per eth_dev but
per PCI device.

mlx4 Rx interrupts are associated with a given Verbs context, and each port
has its own Verbs context, so they cannot be shared, while other PMDs using
other methods for catching interrupts may be perfectly fine with a single
vector associated with the PCI device. It depends on the PMD, for instance
there is no such problem with mlx5 as exactly one PCI device is associated
with a given port.

This patch merely allocates a specific interrupt handle associated with the
eth_dev itself and makes the eth_dev handle point to that instead of the
default PCI handle. This "local" handle is initialized using the PCI handle
as a template before modifying the pointer. It's completely safe.

> > Fixes: 9f05a4b81809 ("net/mlx4: support user space Rx interrupt event")
> > 
> > Signed-off-by: Adrien Mazarguil 
> > Acked-by: Moti Haimovsky 
> > ---
> >  drivers/net/mlx4/mlx4.c | 9 +
> >  drivers/net/mlx4/mlx4.h | 1 +
> >  2 files changed, 10 insertions(+)
> > 
> > diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
> > index 178562e..2b4722f 100644
> > --- a/drivers/net/mlx4/mlx4.c
> > +++ b/drivers/net/mlx4/mlx4.c
> > @@ -6207,6 +6207,15 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, 
> > struct rte_pci_device *pci_dev)
> >  
> > eth_dev->device->driver = &mlx4_driver.driver;
> >  
> > +   /*
> > +* Copy and override interrupt handle to prevent it from
> > +* being shared between all ethdev instances of a given PCI
> > +* device. This is required to properly handle Rx interrupts
> > +* on all ports.
> > +*/
> > +   priv->intr_handle_dev = *eth_dev->intr_handle;
> > +   eth_dev->intr_handle = &priv->intr_handle_dev;
> > +
> > priv->dev = eth_dev;
> > eth_dev->dev_ops = &mlx4_dev_ops;
> >  
> > diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
> > index c46fc23..b74fbf8 100644
> > --- a/drivers/net/mlx4/mlx4.h
> > +++ b/drivers/net/mlx4/mlx4.h
> > @@ -345,6 +345,7 @@ struct priv {
> > unsigned int txqs_n; /* TX queues array size. */
> > struct rxq *(*rxqs)[]; /* RX queues. */
> > struct txq *(*txqs)[]; /* TX queues. */
> > +   struct rte_intr_handle intr_handle_dev; /* Device interrupt handler. */
> > struct rte_intr_handle intr_handle; /* Interrupt handler. */
> > struct rte_flow_drop *flow_drop_queue; /* Flow drop queue. */
> > LIST_HEAD(mlx4_flows, rte_flow) flows;
> > 
> 

-- 
Adrien Mazarguil
6WIND


Re: [dpdk-dev] [PATCH 2/7] net/mlx4: fix Rx interrupts with multiple ports

2017-06-16 Thread Ferruh Yigit
On 6/16/2017 2:39 PM, Adrien Mazarguil wrote:
> On Fri, Jun 16, 2017 at 02:07:54PM +0100, Ferruh Yigit wrote:
>> On 6/14/2017 12:49 PM, Adrien Mazarguil wrote:
>>> Several Ethernet device structures are allocated on top of a common PCI
>>> device for mlx4 adapters with multiple ports. These inherit a common
>>> interrupt handle from their parent PCI device, which prevents Rx interrupts
>>> from working properly on all ports as their configuration is overwritten.
>>>
>>> Use a local interrupt handle to address this issue.
>>
>> Hi Adrien,
>>
>> I am not clear why local copy required, and main concern from my point
>> of view is if this is a common problem for all PMDs and should be
>> addressed in higher level?
> 
> This issue only affects PMDs that handle multiple Ethernet ports through a
> single PCI device. Such PMDs (like mlx4) identify themselves as PCI drivers
> that manually have to register multiple rte_eth_dev instances through
> rte_eth_dev_allocate(), which they then have to initialize.
> 
>> The variable is already per eth_dev, but this patch moves it the private
>> data. What overwrites it within eth_dev?
> 
> Calling rte_eth_copy_pci_info() makes the rte_eth_dev structure inherit the
> default interrupt handle of the underlying PCI device. By "inherit", I mean
> eth_dev->intr_handle points to it, in that sense it's not per eth_dev but
> per PCI device.
> 
> mlx4 Rx interrupts are associated with a given Verbs context, and each port
> has its own Verbs context, so they cannot be shared, while other PMDs using
> other methods for catching interrupts may be perfectly fine with a single
> vector associated with the PCI device. It depends on the PMD, for instance
> there is no such problem with mlx5 as exactly one PCI device is associated
> with a given port.
> 
> This patch merely allocates a specific interrupt handle associated with the
> eth_dev itself and makes the eth_dev handle point to that instead of the
> default PCI handle. This "local" handle is initialized using the PCI handle
> as a template before modifying the pointer. It's completely safe.

Thanks for clarification.

> 
>>> Fixes: 9f05a4b81809 ("net/mlx4: support user space Rx interrupt event")
>>>
>>> Signed-off-by: Adrien Mazarguil 
>>> Acked-by: Moti Haimovsky 
>>> ---
>>>  drivers/net/mlx4/mlx4.c | 9 +
>>>  drivers/net/mlx4/mlx4.h | 1 +
>>>  2 files changed, 10 insertions(+)
>>>
>>> diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
>>> index 178562e..2b4722f 100644
>>> --- a/drivers/net/mlx4/mlx4.c
>>> +++ b/drivers/net/mlx4/mlx4.c
>>> @@ -6207,6 +6207,15 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, 
>>> struct rte_pci_device *pci_dev)
>>>  
>>> eth_dev->device->driver = &mlx4_driver.driver;
>>>  
>>> +   /*
>>> +* Copy and override interrupt handle to prevent it from
>>> +* being shared between all ethdev instances of a given PCI
>>> +* device. This is required to properly handle Rx interrupts
>>> +* on all ports.
>>> +*/
>>> +   priv->intr_handle_dev = *eth_dev->intr_handle;
>>> +   eth_dev->intr_handle = &priv->intr_handle_dev;
>>> +
>>> priv->dev = eth_dev;
>>> eth_dev->dev_ops = &mlx4_dev_ops;
>>>  
>>> diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
>>> index c46fc23..b74fbf8 100644
>>> --- a/drivers/net/mlx4/mlx4.h
>>> +++ b/drivers/net/mlx4/mlx4.h
>>> @@ -345,6 +345,7 @@ struct priv {
>>> unsigned int txqs_n; /* TX queues array size. */
>>> struct rxq *(*rxqs)[]; /* RX queues. */
>>> struct txq *(*txqs)[]; /* TX queues. */
>>> +   struct rte_intr_handle intr_handle_dev; /* Device interrupt handler. */
>>> struct rte_intr_handle intr_handle; /* Interrupt handler. */
>>> struct rte_flow_drop *flow_drop_queue; /* Flow drop queue. */
>>> LIST_HEAD(mlx4_flows, rte_flow) flows;
>>>
>>
> 



Re: [dpdk-dev] [PATCH 0/7] Fix Rx interrupt support in mlx4 and mlx5

2017-06-16 Thread Ferruh Yigit
On 6/14/2017 12:49 PM, Adrien Mazarguil wrote:
> Several issues mainly related to error handling were found in both
> implementations as they share most of their Rx interrupts handling code.
> 
> Another problem with the mlx4 implementation is that it does not work
> properly with multiple ports adapters that share a common PCI device.
> 
> Adrien Mazarguil (7):
>   net/mlx4: fix typos from prior commit
>   net/mlx4: fix Rx interrupts with multiple ports
>   net/mlx4: fix Rx interrupts management
>   net/mlx5: fix misplaced Rx interrupts functions
>   net/mlx5: fix Rx interrupts support checks
>   net/mlx5: fix return value in Rx interrupts code
>   net/mlx5: fix Rx interrupts management
> 
>  drivers/net/mlx4/mlx4.c | 179 ---
>  drivers/net/mlx4/mlx4.h |   1 +
>  drivers/net/mlx5/mlx5.c |   2 +
>  drivers/net/mlx5/mlx5_rxq.c | 142 ++-
>  drivers/net/mlx5/mlx5_rxtx.c|  73 --
>  drivers/net/mlx5/mlx5_rxtx.h|  12 +--
>  drivers/net/mlx5/mlx5_trigger.c |  16 ++--
>  7 files changed, 194 insertions(+), 231 deletions(-)

Series applied to dpdk-next-net/master, thanks.


Re: [dpdk-dev] [PATCH] app/testpmd: fix build without ixgbe

2017-06-16 Thread Thomas Monjalon
16/06/2017 03:29, Lu, Wenzhuo:
> From: Nicolau, Radu
> > 
> > Looks ok to me, but why would one enable IXGBE_BYPASS without enabling
> > IXGBE?
> 
> Although the scenario doesn't make sense, I think it's better to add more 
> check to avoid the users making any mistake.

Applied


Re: [dpdk-dev] [PATCH v2 1/3] eal: introduce big and little endian types

2017-06-16 Thread Thomas Monjalon
Series applied, thanks




Re: [dpdk-dev] [PATCH] eal: remove vdev probe by dev args

2017-06-16 Thread Thomas Monjalon
Please Jan, could you comment?

09/06/2017 11:21, Ferruh Yigit:
> On 6/8/2017 9:45 PM, Thomas Monjalon wrote:
> > 10/05/2017 13:01, Ferruh Yigit:
> >> Virtual device/driver probing done via name.
> >>
> >> A new alternative method introduced to probe the device with providing
> >> driver name in devargs as "driver=".
> >>
> >> This patch removes alternative method and fixes virtual device usages
> >> with proper device names.
> >>
> >> Fixes: 87c3bf29c642 ("test: do not short-circuit null device creation")
> >> Fixes: d39670086a63 ("eal: parse driver argument before probing drivers")
> >>
> >> Signed-off-by: Ferruh Yigit 
> >> ---
> > [...]
> >>  static int
> >>  vdev_probe_all_drivers(struct rte_vdev_device *dev)
> >>  {
> >>const char *name;
> >> -  char *drv_name;
> >>struct rte_vdev_driver *driver;
> >>int ret = 1;
> >>  
> >> -  drv_name = parse_driver_arg(rte_vdev_device_args(dev));
> >> -  name = drv_name ? drv_name : rte_vdev_device_name(dev);
> >> +  name = rte_vdev_device_name(dev);
> > 
> > It seems you are reverting the commit d39670086a63:
> > eal: parse driver argument before probing drivers
> 
> Mostly yes.
> 
> > 
> > In some cases the virtual device name should be totally different than
> > the driver being used for the device. Therefore lets parse the devargs 
> > for
> > the "driver" argument before probing drivers in 
> > vdev_probe_all_drivers().
> > 
> > Is this "driver" option useless?
> 
> There is already a generic way to probe, why need another method, I
> didn't get the motivation but this looks like a hack, only place I found
> used is in bonding unit test, which can be switched to generic way
> without this support.
> 
> And this is a hidden / an undocumented feature.
> 
> Thanks,
> ferruh
> 





Re: [dpdk-dev] [PATCH] mem: support page locking on FreeBSD

2017-06-16 Thread Bruce Richardson
On Thu, Jun 15, 2017 at 07:50:55PM +0200, Thomas Monjalon wrote:
> The function rte_mem_lock_page() was added for Linux only.
> The file eal_common_memory.c is a better place to make it
> available in FreeBSD also.
> 
> The issue is seen when trying to compile bnxt on FreeBSD:
>   bnxt_hwrm.c: undefined reference to `rte_mem_lock_page'
> 
> Fixes: 3097de6e6bfb ("mem: get physical address of any pointer")
> 
> Reported-by: Fangfang Wei 
> Signed-off-by: Thomas Monjalon 
> ---
This fixes gcc and clang errors on FreeBSD so

Acked-by: Bruce Richardson 



Re: [dpdk-dev] [PATCH] net/bonding: support bifurcated driver in eal cli using --vdev

2017-06-16 Thread Thomas Monjalon
14/06/2017 12:49, Gowrishankar:
> At present, creating bonding devices using --vdev is broken for PMD like
> mlx5 as it is neither UIO nor VFIO based and hence PMD driver is unknown
> to find_port_id_by_pci_addr(), as below.
> 
> testpmd  --vdev 'net_bonding0,mode=1,slave=,socket_id=0'
> 
> PMD: bond_ethdev_parse_slave_port_kvarg(150) - Invalid slave port value
>  () specified
> EAL: Failed to parse slave ports for bonded device net_bonding0

Thanks for reporting.

> This patch adds RTE_KDRV_BIFURCATED in rte_kernel_driver for the PMD
> driver like mlx5 to be a known one.

The current kdrv value should be RTE_KDRV_UNKNOWN for Mellanox devices.
I do not see the value of creating a new type.

I think the issue is in the bonding code (find_port_id_by_pci_addr):

* TODO: Once the PCI bus has arrived we should have a better
* way to test for being a PCI device or not.
*/
if (rte_eth_devices[i].data->kdrv == RTE_KDRV_UNKNOWN ||
rte_eth_devices[i].data->kdrv == RTE_KDRV_NONE)
continue;



Re: [dpdk-dev] [PATCH v2] test/test_mbuf: remove mempool global var

2017-06-16 Thread Olivier Matz
On Thu,  8 Jun 2017 14:28:31 +, Santosh Shukla 
 wrote:
> Let test_mbuf alloc and free mempool.
> 
> Cc: sta...@dpdk.org
> Signed-off-by: Santosh Shukla 

Acked-by: Olivier Matz 


Re: [dpdk-dev] mlx debug build error with clang

2017-06-16 Thread Ferruh Yigit
On 6/16/2017 1:58 PM, Ferruh Yigit wrote:
> On 6/16/2017 1:19 PM, Adrien Mazarguil wrote:
>> Hi Ferruh,
>>
>> On Tue, Jun 13, 2017 at 04:32:03PM +0100, Ferruh Yigit wrote:
>>> Hi Adrien, Nelio,
>>>
>>> I am getting following build error [1] with clang [2] when debug enabled
>>> for mlx4 and mlx5.
>>>
>>> This started after I update my box, not sure what triggered this.
>>> Can you please help fixing this?
>>>
>>> Thanks,
>>> ferruh
>>>
>>>
>>> [1]
>>>
>>> .../drivers/net/mlx4/mlx4_flow.c:731:3: error: use of GNU statement
>>> expression extension [-Werror,-Wgnu-statement-expression]
>>> claim_zero(ibv_destroy_qp(fdq->qp));
>>> ^
>>> .../drivers/net/mlx4/mlx4.h:185:25: note: expanded from macro 'claim_zero'
>>> #define claim_zero(...) assert((__VA_ARGS__) == 0)
>>> ^
>>> /usr/include/assert.h:95:6: note: expanded from macro 'assert'
>>> ({  \
>>>  ^
>>>
>>> 
>>>
>>> .../drivers/net/mlx5/mlx5_fdir.c:278:2: error: use of GNU statement
>>> expression extension [-Werror,-Wgnu-statement-expression]
>>> assert(((uint8_t *)attr + sizeof(*attr)) == (uint8_t *)spec_offset);
>>> ^
>>> /usr/include/assert.h:95:6: note: expanded from macro 'assert'
>>> ({  \
>>>  ^
>>>
>>> [Many of similar ...]
>>>
>>>
>>> [2]
>>> target: x86_64-native-linuxapp-clang
>>>
>>> clang version 4.0.0 (tags/RELEASE_400/final)
>>
>> Recent Glibc versions now apparently handle assert() through a nonstandard
>> ({ }) construct, which is not pedantic-safe due to a missing __extension__
>> keyword.
>>
>> assert.h still provides a standard assert() definition that shouldn't cause
>> compilation to fail when the following condition is met:
>>
>>  #if !defined __GNUC__ || defined __STRICT_ANSI__
>>
>> However __GNUC__ is (always?) defined by clang for maximum compatibility
>> with GCC while __STRICT_ANSI__ is not due to the -std=gnu99 parameter,
>> assert.h thinks it's OK to use a ({ }) construct but is then caught by
>> clang's -pedantic parameter.
>>
>> There are two ways to address this issue while keeping our beloved -pedantic
>> parameter in debug mode:
>>
>> 1. Replacing -std=gnu99 with -std=c99 (which is even stricter) to bring back
>>__STRICT_ANSI__.
>> 2. Replacing assert() statements with RTE_ASSERT().
>>
>> The former should be doable now that DPDK includes have been cleaned up, and
>> we're thinking about doing the latter at some point.
>>
>> Since I don't have a recent Glibc handy, can you try replacing -std=gnu99
>> with -std=c99 in both Makefiles (mlx4 and mlx5), and report how GCC and
>> clang fare? (GCC at least seems to have no problem with that on my side)
> 
> Sure, I will try -std=c99

This fixes the build error.

> 
>>
> 



Re: [dpdk-dev] [PATCH] mem: support page locking on FreeBSD

2017-06-16 Thread Thomas Monjalon
16/06/2017 16:31, Bruce Richardson:
> On Thu, Jun 15, 2017 at 07:50:55PM +0200, Thomas Monjalon wrote:
> > The function rte_mem_lock_page() was added for Linux only.
> > The file eal_common_memory.c is a better place to make it
> > available in FreeBSD also.
> > 
> > The issue is seen when trying to compile bnxt on FreeBSD:
> > bnxt_hwrm.c: undefined reference to `rte_mem_lock_page'
> > 
> > Fixes: 3097de6e6bfb ("mem: get physical address of any pointer")
> > 
> > Reported-by: Fangfang Wei 
> > Signed-off-by: Thomas Monjalon 
> > ---
> This fixes gcc and clang errors on FreeBSD so
> 
> Acked-by: Bruce Richardson 

Applied


Re: [dpdk-dev] [RFC] Kernel Control Path (KCP)

2017-06-16 Thread Ferruh Yigit
Hi Alex,

On 6/15/2017 1:07 PM, Alex Rosenbaum wrote:
> please excuse me if I missed out of the previous conversation and
> asking these questions again...
> 
> Why create a new driver instead of improving the existing KNI driver?

For control path, KNI uses Linux kernel driver within KNI kernel module.
This method works, but may not be best option, and technically not
extendable for some drivers. KNI control path currently supports only
two drivers, proposed KCP works for all PMDs by default.

Overall, KCP is outcome of the effort of improving KNI control path.

Initial proposal was (a year ago I guess) introducing two new modules,
one for control path and one for data path, and replace KNI completely.
But current target is have KCP to have better control path support.

Also, KNI handles both data and control path. But both are different
functionalities and not need to be in some module. For example an
application may not need exception data path to kernel, but may be
interested in controlling DPDK interfaces via common Linux tools.

> Can you share a table of the differences between the two driver /
> approaches [KNI vs KCP]?

KCP differences against KNI:

- KCP is only for control path
- Linux virtual interfaces created automatically, without DPDK
application modification.
- To create/destroy interfaces KCP uses rtnl, KNI uses ioctl. So
technically it is possible to use "ip" tool to create / destroy
interfaces supported by KCP.
- KCP kernel module and userspace counterpart communicates via netlink,
KNI uses ioctl.
- KCP works for all PMDs without update on PMDs.

> 
> Why do you want to remove features like data path that is provided by KNI 
> today?

There is not intention to remove exception data path, the focus is to
improve KNI.

> 
> thanks,
> Alex
> 



Re: [dpdk-dev] [RFC] Kernel Control Path (KCP)

2017-06-16 Thread Ferruh Yigit
On 5/28/2017 5:55 PM, Wiles, Keith wrote:
> 
>> On May 26, 2017, at 11:52 AM, Ferruh Yigit  wrote:
>>
>> We are looking for re-sending [1] the Kernel Control Path (KCP)
>> with some updates [2].
>>
>> Mainly this is an usability improvement for DPDK.
>>
>> And a quick reminder about what KCP is:
>>
>> "KCP is Linux virtual network interface that can control DPDK ports".
>>
>> So DPDK interfaces, somehow will be visible and it will be possible to
>> use common Linux tools on DPDK interfaces.
>>
>> This work can be done in multiple steps:
>>
>> - At first step virtual interfaces can be read-only, and can be used
>>  to get stats / information from DPDK ports.
>>
>> - Second step can be controlling the DPDK interfaces in a common way
>>  like Linux interfaces.
>>
>> It is good to remind that KCP is only for control path, and no data
>> traffic will be available on those interfaces, meaning not able to use
>> tcpdump or similar tools on those interfaces.
>>
>> I would like to hear about comments, requirements and objection about
>> the idea?
>>
>> Also the name "Kernel Control Path" can be too broad, I am open to a
>> name change, any comments on naming is welcome.
> 
> Using kernel in the name is not very useful, but netlink is the real part 
> that makes sense.
> 
> How about one of these:
> - DNI = DPDK Netlink Interface
> - DNC = DPDK Netlink Control
> - NCI = Netlink Control Interface
> 
> Being able to control DPDK interfaces via Netlink is one of the customer 
> needs I have heard of late.

My concern is this name my create a miss understanding that DPDK is
providing a netlink interface for other applications that they can use
to control DPDK application / interfaces.

Here although netlink sockets used to communicate between kernel and
userspace, DPDK application connects to the netlink socket provided by
kernel module, and DPDK interfaces controlled using virtual Linux
network interfaces, independent from what kind of communication method
used between kernel and userspace.

> 
>>
>>
>> [1]
>> http://dpdk.org/ml/archives/dev/2016-March/035139.html
>>
>> [2]
>> Updates planned to the latest version sent:
>> - Create control interfaces without requiring an API call from user
>>  application, this will let DPDK applications have this support
>>  without any modification.
>> - Default enabled interfaces will be read-only.
>> - Possible rename.
>>
>>
>> Thanks,
>> ferruh
>>
>>
>> Ferruh Yigit (4):
>>  ethtool: move from sample folder to lib folder
>>  kcp: add kernel control path kernel module
>>  rte_ctrl_if: add control interface library
>>  ethdev: add control interface support
>>
>> -- 
>> 2.9.3
>>
> 
> Regards,
> Keith
> 



Re: [dpdk-dev] [PATCH v2 1/2] net/e1000: fix flex type filter

2017-06-16 Thread Ferruh Yigit
On 6/16/2017 6:04 AM, Wei Zhao wrote:
> Fix bug in parse the flex info of wrong use local
> variable index which will cause filter fail, and some
> error in calculation mask.
> 
> Fixes: 7cd77faf7129 ("net/igb: parse flow API flex filter")
> 
> Signed-off-by: Wei Zhao 

Series applied to dpdk-next-net/master, thanks.


Re: [dpdk-dev] how to poll a 40G/100G NIC with multiqueues without RSS

2017-06-16 Thread Chilikin, Andrey
With XL710 NICs you can use flexible payload for RSS for ETH_RSS_L2_PAYLOAD. By 
default XL710 PMD will extract first 16 bytes of the payload. You can include 
these 16 bytes for RSS with rte_eth_dev_filter_ctrl(...,RTE_ETH_FILTER_HASH, 
RTE_ETH_FILTER_SET, ...)

/Andrey

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Chillance Zen
Sent: Friday, June 9, 2017 7:48 AM
To: dev@dpdk.org
Subject: [dpdk-dev] how to poll a 40G/100G NIC with multiqueues without RSS

Hi, everyone,
since I need to realize my own protocol with which RSS can not be configured.
but we need massive packets to be processed with  multiple cpus(even that one 
cpu is always polling is not enough)

does anyone know how to distribute traffic to multiple queues

thanks®ards

--
Linc @bjtu


Re: [dpdk-dev] [RFC] Kernel Control Path (KCP)

2017-06-16 Thread Stephen Hemminger
On Fri, 16 Jun 2017 16:27:47 +0100
Ferruh Yigit  wrote:

> Hi Alex,
> 
> On 6/15/2017 1:07 PM, Alex Rosenbaum wrote:
> > please excuse me if I missed out of the previous conversation and
> > asking these questions again...
> > 
> > Why create a new driver instead of improving the existing KNI driver?  
> 
> For control path, KNI uses Linux kernel driver within KNI kernel module.
> This method works, but may not be best option, and technically not
> extendable for some drivers. KNI control path currently supports only
> two drivers, proposed KCP works for all PMDs by default.
> 
> Overall, KCP is outcome of the effort of improving KNI control path.
> 
> Initial proposal was (a year ago I guess) introducing two new modules,
> one for control path and one for data path, and replace KNI completely.
> But current target is have KCP to have better control path support.
> 
> Also, KNI handles both data and control path. But both are different
> functionalities and not need to be in some module. For example an
> application may not need exception data path to kernel, but may be
> interested in controlling DPDK interfaces via common Linux tools.
> 
> > Can you share a table of the differences between the two driver /
> > approaches [KNI vs KCP]?  
> 
> KCP differences against KNI:
> 
> - KCP is only for control path
> - Linux virtual interfaces created automatically, without DPDK
> application modification.
> - To create/destroy interfaces KCP uses rtnl, KNI uses ioctl. So
> technically it is possible to use "ip" tool to create / destroy
> interfaces supported by KCP.
> - KCP kernel module and userspace counterpart communicates via netlink,
> KNI uses ioctl.
> - KCP works for all PMDs without update on PMDs.
> 
> > 
> > Why do you want to remove features like data path that is provided by KNI 
> > today?  
> 
> There is not intention to remove exception data path, the focus is to
> improve KNI.
> 
> > 
> > thanks,
> > Alex
> >   
> 

Hopefully KCP can be submitted for upstream kernel, and therefore be 
supportable over
the long term. KNI in its current form is not acceptable upstream for a number
of reasons: style, use of ioctl, races with control operations, etc.