[dpdk-dev] New driver (large patch) question.

2016-03-03 Thread Vincent JARDIN
Please,

On 02/03/2016 22:30, Stephen Hurd wrote:
> Too many of the DPDK drivers are bloated.
>>Recall the venerable paraphrase of Pascal, "I made this so long because I
>>did not have time to make it shorter."
>>https://en.wikipedia.org/wiki/Wikipedia:Too_long;_didn%27t_read

Keep It Simple; Small Is Beautiful. Big drivers with dead code are
not easy to maintain. We have a lot of duplicated effort between
the kernel and some DPDK PMDs.

Currently, the breakdown of lines of code per PMD is:

492 ring
522 null
666 af_packet
829 pcap
1229 szedata2
1300 mpipe
1411 xenvirt
2036 nfp
2260 vmxnet3
3074 virtio
4129 mlx4
4205 bonding
4524 mlx5
4904 enic
7654 cxgbe
7969 fm10k
27862 ixgbe
29209 e1000
31392 i40e
38031 bnx2x

(I used cloc.)

Vincent


[dpdk-dev] [PATCH] cryptodev: fix RTE_PMD_DEBUG_TRACE redefinition

2016-03-03 Thread Marc Sune
RTE_PMD_DEBUG_TRACE, which is used by RTE_FUNC_PTR_OR_ERR_RET, was
redefined in rte_cryptodev_pmd.h, which produced macro redefinition
warnings when including both rte_cryptodev_pmd.h and rte_ethdev.h.

This commit moves the macro definition to rte_cryptodev.c to prevent
this warning.
---
 lib/librte_cryptodev/rte_cryptodev.c | 7 +++
 lib/librte_cryptodev/rte_cryptodev_pmd.h | 7 ---
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/lib/librte_cryptodev/rte_cryptodev.c 
b/lib/librte_cryptodev/rte_cryptodev.c
index 2838852..90d2c30 100644
--- a/lib/librte_cryptodev/rte_cryptodev.c
+++ b/lib/librte_cryptodev/rte_cryptodev.c
@@ -71,6 +71,13 @@
 #include "rte_cryptodev.h"
 #include "rte_cryptodev_pmd.h"

+#ifdef RTE_LIBRTE_CRYPTODEV_DEBUG
+#define RTE_PMD_DEBUG_TRACE(...) \
+   rte_pmd_debug_trace(__func__, __VA_ARGS__)
+#else
+#define RTE_PMD_DEBUG_TRACE(fmt, args...)
+#endif
+
 struct rte_cryptodev rte_crypto_devices[RTE_CRYPTO_MAX_DEVS];

 struct rte_cryptodev *rte_cryptodevs = &rte_crypto_devices[0];
diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h 
b/lib/librte_cryptodev/rte_cryptodev_pmd.h
index 8270afa..c43680f 100644
--- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
+++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
@@ -62,13 +62,6 @@ struct rte_cryptodev_qp_conf;

 enum rte_cryptodev_event_type;

-#ifdef RTE_LIBRTE_CRYPTODEV_DEBUG
-#define RTE_PMD_DEBUG_TRACE(...) \
-   rte_pmd_debug_trace(__func__, __VA_ARGS__)
-#else
-#define RTE_PMD_DEBUG_TRACE(fmt, args...)
-#endif
-
 struct rte_cryptodev_session {
struct {
uint8_t dev_id;
-- 
2.1.4
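
The idea behind this fix can be sketched in isolation: a debug-trace macro
that lives in a single .c file cannot collide with a macro of the same name
defined by another library's header. The following is only a minimal
illustration, not the cryptodev code. DEMO_DEBUG, DEMO_DEBUG_TRACE,
demo_debug_trace() and demo_check() are hypothetical names, and the debug
switch is forced on here so the trace path is exercised.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical debug switch; a real build would set it from the config. */
#define DEMO_DEBUG 1

/* Counts trace calls so the behaviour can be checked. */
static int trace_calls;

/* Hypothetical stand-in for a library trace helper. */
static void demo_debug_trace(const char *func, const char *fmt, ...)
{
	va_list ap;

	trace_calls++;
	fprintf(stderr, "%s: ", func);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
}

/*
 * The macro is defined in the .c file, not in a header that other
 * code includes, so it stays private to this translation unit.
 */
#ifdef DEMO_DEBUG
#define DEMO_DEBUG_TRACE(...) demo_debug_trace(__func__, __VA_ARGS__)
#else
#define DEMO_DEBUG_TRACE(...) ((void)0)
#endif

/* Example user of the macro: reject a NULL parameter and trace it. */
static int demo_check(const void *p)
{
	if (p == NULL) {
		DEMO_DEBUG_TRACE("NULL parameter\n");
		return -1;
	}
	return 0;
}
```

With DEMO_DEBUG left undefined the macro expands to nothing, so callers
need no changes either way.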



[dpdk-dev] [PATCH] cmdline: include missing cmdline_parse.h

2016-03-03 Thread Marc Sune
cmdline_parse_*.h headers use struct cmdline_token_hdr /
cmdline_parse_token_hdr_t which is defined in cmdline_parse.h, but
do not include it, forcing manual inclusion.

This commit includes cmdline_parse.h in all cmdline_parse_*.h.
---
 lib/librte_cmdline/cmdline_parse_etheraddr.h | 2 ++
 lib/librte_cmdline/cmdline_parse_ipaddr.h| 1 +
 lib/librte_cmdline/cmdline_parse_num.h   | 2 ++
 lib/librte_cmdline/cmdline_parse_portlist.h  | 2 ++
 lib/librte_cmdline/cmdline_parse_string.h| 2 ++
 5 files changed, 9 insertions(+)

diff --git a/lib/librte_cmdline/cmdline_parse_etheraddr.h 
b/lib/librte_cmdline/cmdline_parse_etheraddr.h
index 0085bb3..e539fb6 100644
--- a/lib/librte_cmdline/cmdline_parse_etheraddr.h
+++ b/lib/librte_cmdline/cmdline_parse_etheraddr.h
@@ -61,6 +61,8 @@
 #ifndef _PARSE_ETHERADDR_H_
 #define _PARSE_ETHERADDR_H_

+#include <cmdline_parse.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
diff --git a/lib/librte_cmdline/cmdline_parse_ipaddr.h 
b/lib/librte_cmdline/cmdline_parse_ipaddr.h
index 46c6e1b..2b4266f 100644
--- a/lib/librte_cmdline/cmdline_parse_ipaddr.h
+++ b/lib/librte_cmdline/cmdline_parse_ipaddr.h
@@ -61,6 +61,7 @@
 #ifndef _PARSE_IPADDR_H_
 #define _PARSE_IPADDR_H_

+#include <cmdline_parse.h>
 #include 

 #ifdef __cplusplus
diff --git a/lib/librte_cmdline/cmdline_parse_num.h 
b/lib/librte_cmdline/cmdline_parse_num.h
index 5376806..2558cbf 100644
--- a/lib/librte_cmdline/cmdline_parse_num.h
+++ b/lib/librte_cmdline/cmdline_parse_num.h
@@ -61,6 +61,8 @@
 #ifndef _PARSE_NUM_H_
 #define _PARSE_NUM_H_

+#include <cmdline_parse.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
diff --git a/lib/librte_cmdline/cmdline_parse_portlist.h 
b/lib/librte_cmdline/cmdline_parse_portlist.h
index 8505059..73d70e0 100644
--- a/lib/librte_cmdline/cmdline_parse_portlist.h
+++ b/lib/librte_cmdline/cmdline_parse_portlist.h
@@ -61,6 +61,8 @@
 #ifndef _PARSE_PORTLIST_H_
 #define _PARSE_PORTLIST_H_

+#include <cmdline_parse.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
diff --git a/lib/librte_cmdline/cmdline_parse_string.h 
b/lib/librte_cmdline/cmdline_parse_string.h
index c205622..94aa1f1 100644
--- a/lib/librte_cmdline/cmdline_parse_string.h
+++ b/lib/librte_cmdline/cmdline_parse_string.h
@@ -61,6 +61,8 @@
 #ifndef _PARSE_STRING_H_
 #define _PARSE_STRING_H_

+#include <cmdline_parse.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
-- 
2.1.4



[dpdk-dev] New driver (large patch) question.

2016-03-03 Thread Thomas Monjalon
2016-03-02 15:10, Stephen Hurd:
> On Wed, Mar 2, 2016 at 2:15 PM, Thomas Monjalon 
> wrote:
> > > The driver itself doesn't have a lot of optional features in it, it's the
> > > header file that's too big.
> >
> > It is big because there are many different things.
> > You can split the file in different patches.
> > Examples:
> > - a patch for RSS will bring the hardware structures for RSS
> > - a patch for the stats will bring the hardware stats structures
> > etc
> 
> Should I split additional definitions/documentation that's not currently
> used in the driver as well?  Or should it stay as only enough to document
> what the driver already does?

I don't understand the question.
If something is not used, it should not be there.

> It's a fairly work-intensive project to deconstruct the existing driver
> into a series of small patches that work at each step, is this a hard
> requirement? (if so, I'd better get cracking)

There is no hard requirement. I'm just giving you some advice to attract
reviewers and make them confident when accepting your patches.
By the way, you would get more attention by introducing the device with
some web links and performance numbers in the cover letter.
Providing documentation in doc/guides/nics/ is also appreciated.
You could also fill in the (new) table in overview.rst.

Thanks


[dpdk-dev] [PATCH v6 0/5] Support VxLAN & NVGRE checksum off-load on X550

2016-03-03 Thread Wenzhuo Lu
This patch set adds VxLAN & NVGRE checksum off-load support.
Both RX and TX checksum off-load can be used for VxLAN & NVGRE.
The VxLAN port can also be set; that is implemented in this patch
set as well.

v2:
- Update release note.

v3:
- Update RX/TX offload capability.
- Reuse PKT_RX_EIP_CKSUM_BAD instead of adding a new flag.
- Correct the tunnel len for TX, and remove the useless out_l2_len.
- Don't set the tunnel type for TX, and remove the unused ol_flag_nvgre.

v4:
- Fix the MAC length not being set correctly.

v5:
- Change the behavior of VxLAN port add/del to make it align with i40e.

v6:
- Fix x86_64-native-linuxapp-gcc-shared compile error.

Wenzhuo Lu (5):
  lib/librte_ether: change function name of tunnel port config
  i40e: rename the tunnel port config functions
  ixgbe: support UDP tunnel port config
  ixgbe: support VxLAN &  NVGRE RX checksum off-load
  ixgbe: support VxLAN &  NVGRE TX checksum off-load

 app/test-pmd/cmdline.c |   6 +-
 doc/guides/rel_notes/release_16_04.rst |   9 +++
 drivers/net/i40e/i40e_ethdev.c |  22 +++---
 drivers/net/ixgbe/ixgbe_ethdev.c   | 131 +
 drivers/net/ixgbe/ixgbe_rxtx.c |  67 ++---
 drivers/net/ixgbe/ixgbe_rxtx.h |   6 +-
 examples/tep_termination/vxlan_setup.c |   2 +-
 lib/librte_ether/rte_ethdev.c  |  45 +++
 lib/librte_ether/rte_ethdev.h  |  19 +
 lib/librte_ether/rte_ether_version.map |   2 +
 lib/librte_mbuf/rte_mbuf.c |   2 +-
 lib/librte_mbuf/rte_mbuf.h |   2 +-
 12 files changed, 285 insertions(+), 28 deletions(-)

-- 
1.9.3



[dpdk-dev] [PATCH v6 1/5] lib/librte_ether: change function name of tunnel port config

2016-03-03 Thread Wenzhuo Lu
The names of the functions for tunnel port configuration are not
accurate. They are tunnel_add/del; tunnel_port_add/del is better.
As changing the names directly may be an ABI change, the
new functions are added without removing the old ones. The old
ones will be removed in the next release after an ABI change
announcement.

Signed-off-by: Wenzhuo Lu 
---
 app/test-pmd/cmdline.c |  6 +++--
 examples/tep_termination/vxlan_setup.c |  2 +-
 lib/librte_ether/rte_ethdev.c  | 45 ++
 lib/librte_ether/rte_ethdev.h  | 18 ++
 lib/librte_ether/rte_ether_version.map |  2 ++
 5 files changed, 70 insertions(+), 3 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 52e9f5f..0fae655 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -6782,9 +6782,11 @@ cmd_tunnel_udp_config_parsed(void *parsed_result,
tunnel_udp.prot_type = RTE_TUNNEL_TYPE_VXLAN;

if (!strcmp(res->what, "add"))
-   ret = rte_eth_dev_udp_tunnel_add(res->port_id, &tunnel_udp);
+   ret = rte_eth_dev_udp_tunnel_port_add(res->port_id,
+ &tunnel_udp);
else
-   ret = rte_eth_dev_udp_tunnel_delete(res->port_id, &tunnel_udp);
+   ret = rte_eth_dev_udp_tunnel_port_delete(res->port_id,
+&tunnel_udp);

if (ret < 0)
printf("udp tunneling add error: (%s)\n", strerror(-ret));
diff --git a/examples/tep_termination/vxlan_setup.c 
b/examples/tep_termination/vxlan_setup.c
index 51ad133..8836603 100644
--- a/examples/tep_termination/vxlan_setup.c
+++ b/examples/tep_termination/vxlan_setup.c
@@ -191,7 +191,7 @@ vxlan_port_init(uint8_t port, struct rte_mempool *mbuf_pool)
/* Configure UDP port for UDP tunneling */
tunnel_udp.udp_port = udp_port;
tunnel_udp.prot_type = RTE_TUNNEL_TYPE_VXLAN;
-   retval = rte_eth_dev_udp_tunnel_add(port, &tunnel_udp);
+   retval = rte_eth_dev_udp_tunnel_port_add(port, &tunnel_udp);
if (retval < 0)
return retval;
rte_eth_macaddr_get(port, &ports_eth_addr[port]);
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 1257965..937b348 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1949,6 +1949,28 @@ rte_eth_dev_udp_tunnel_add(uint8_t port_id,
 }

 int
+rte_eth_dev_udp_tunnel_port_add(uint8_t port_id,
+   struct rte_eth_udp_tunnel *udp_tunnel)
+{
+   struct rte_eth_dev *dev;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+   if (udp_tunnel == NULL) {
+   RTE_PMD_DEBUG_TRACE("Invalid udp_tunnel parameter\n");
+   return -EINVAL;
+   }
+
+   if (udp_tunnel->prot_type >= RTE_TUNNEL_TYPE_MAX) {
+   RTE_PMD_DEBUG_TRACE("Invalid tunnel type\n");
+   return -EINVAL;
+   }
+
+   dev = &rte_eth_devices[port_id];
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->udp_tunnel_port_add, -ENOTSUP);
+   return (*dev->dev_ops->udp_tunnel_port_add)(dev, udp_tunnel);
+}
+
+int
 rte_eth_dev_udp_tunnel_delete(uint8_t port_id,
  struct rte_eth_udp_tunnel *udp_tunnel)
 {
@@ -1972,6 +1994,29 @@ rte_eth_dev_udp_tunnel_delete(uint8_t port_id,
 }

 int
+rte_eth_dev_udp_tunnel_port_delete(uint8_t port_id,
+  struct rte_eth_udp_tunnel *udp_tunnel)
+{
+   struct rte_eth_dev *dev;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+   dev = &rte_eth_devices[port_id];
+
+   if (udp_tunnel == NULL) {
+   RTE_PMD_DEBUG_TRACE("Invalid udp_tunnel parameter\n");
+   return -EINVAL;
+   }
+
+   if (udp_tunnel->prot_type >= RTE_TUNNEL_TYPE_MAX) {
+   RTE_PMD_DEBUG_TRACE("Invalid tunnel type\n");
+   return -EINVAL;
+   }
+
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->udp_tunnel_port_del, -ENOTSUP);
+   return (*dev->dev_ops->udp_tunnel_port_del)(dev, udp_tunnel);
+}
+
+int
 rte_eth_led_on(uint8_t port_id)
 {
struct rte_eth_dev *dev;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 16da821..f1f96c1 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1261,6 +1261,14 @@ typedef int (*eth_set_eeprom_t)(struct rte_eth_dev *dev,
struct rte_dev_eeprom_info *info);
 /**< @internal Program eeprom data  */

+typedef int (*eth_udp_tunnel_port_add_t)(struct rte_eth_dev *dev,
+struct rte_eth_udp_tunnel *tunnel_udp);
+/**< @internal Add tunneling UDP port */
+
+typedef int (*eth_udp_tunnel_port_del_t)(struct rte_eth_dev *dev,
+struct rte_eth_udp_tunnel *tunnel_udp);
+/**< @internal Delete tunneling UDP po

[dpdk-dev] [PATCH v6 2/5] i40e: rename the tunnel port config functions

2016-03-03 Thread Wenzhuo Lu
As the names of tunnel port config functions are not
accurate, change them from tunnel_add/del to
tunnel_port_add/del.
And support both the old and new rte ops.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/i40e/i40e_ethdev.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index ef24122..3cc9384 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -369,10 +369,10 @@ static int i40e_dev_rss_hash_update(struct rte_eth_dev 
*dev,
struct rte_eth_rss_conf *rss_conf);
 static int i40e_dev_rss_hash_conf_get(struct rte_eth_dev *dev,
  struct rte_eth_rss_conf *rss_conf);
-static int i40e_dev_udp_tunnel_add(struct rte_eth_dev *dev,
-   struct rte_eth_udp_tunnel *udp_tunnel);
-static int i40e_dev_udp_tunnel_del(struct rte_eth_dev *dev,
-   struct rte_eth_udp_tunnel *udp_tunnel);
+static int i40e_dev_udp_tunnel_port_add(struct rte_eth_dev *dev,
+   struct rte_eth_udp_tunnel *udp_tunnel);
+static int i40e_dev_udp_tunnel_port_del(struct rte_eth_dev *dev,
+   struct rte_eth_udp_tunnel *udp_tunnel);
 static int i40e_ethertype_filter_set(struct i40e_pf *pf,
struct rte_eth_ethertype_filter *filter,
bool add);
@@ -467,8 +467,10 @@ static const struct eth_dev_ops i40e_eth_dev_ops = {
.reta_query   = i40e_dev_rss_reta_query,
.rss_hash_update  = i40e_dev_rss_hash_update,
.rss_hash_conf_get= i40e_dev_rss_hash_conf_get,
-   .udp_tunnel_add   = i40e_dev_udp_tunnel_add,
-   .udp_tunnel_del   = i40e_dev_udp_tunnel_del,
+   .udp_tunnel_add   = i40e_dev_udp_tunnel_port_add,
+   .udp_tunnel_del   = i40e_dev_udp_tunnel_port_del,
+   .udp_tunnel_port_add  = i40e_dev_udp_tunnel_port_add,
+   .udp_tunnel_port_del  = i40e_dev_udp_tunnel_port_del,
.filter_ctrl  = i40e_dev_filter_ctrl,
.rxq_info_get = i40e_rxq_info_get,
.txq_info_get = i40e_txq_info_get,
@@ -5976,8 +5978,8 @@ i40e_del_vxlan_port(struct i40e_pf *pf, uint16_t port)

 /* Add UDP tunneling port */
 static int
-i40e_dev_udp_tunnel_add(struct rte_eth_dev *dev,
-   struct rte_eth_udp_tunnel *udp_tunnel)
+i40e_dev_udp_tunnel_port_add(struct rte_eth_dev *dev,
+struct rte_eth_udp_tunnel *udp_tunnel)
 {
int ret = 0;
struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
@@ -6007,8 +6009,8 @@ i40e_dev_udp_tunnel_add(struct rte_eth_dev *dev,

 /* Remove UDP tunneling port */
 static int
-i40e_dev_udp_tunnel_del(struct rte_eth_dev *dev,
-   struct rte_eth_udp_tunnel *udp_tunnel)
+i40e_dev_udp_tunnel_port_del(struct rte_eth_dev *dev,
+struct rte_eth_udp_tunnel *udp_tunnel)
 {
int ret = 0;
struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-- 
1.9.3
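
The compatibility approach in this patch, where the old and new ops
entries both point at the same renamed driver function, can be sketched
with a simplified model. This is only an illustration under assumptions:
the demo_* structures and functions below are hypothetical stand-ins,
not the real rte_ethdev or i40e definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the ethdev structures. */
struct demo_udp_tunnel {
	uint16_t udp_port;
};

struct demo_dev;

typedef int (*demo_tunnel_op_t)(struct demo_dev *dev,
				struct demo_udp_tunnel *tunnel);

struct demo_dev_ops {
	demo_tunnel_op_t udp_tunnel_add;      /* old name, kept for now */
	demo_tunnel_op_t udp_tunnel_port_add; /* new, more accurate name */
};

struct demo_dev {
	const struct demo_dev_ops *ops;
	uint16_t vxlan_port;
};

/* Driver side: a single implementation carrying the new name. */
static int demo_drv_udp_tunnel_port_add(struct demo_dev *dev,
					struct demo_udp_tunnel *tunnel)
{
	dev->vxlan_port = tunnel->udp_port;
	return 0;
}

/* Both ops entries reference the same function. */
static const struct demo_dev_ops demo_ops = {
	.udp_tunnel_add      = demo_drv_udp_tunnel_port_add,
	.udp_tunnel_port_add = demo_drv_udp_tunnel_port_add,
};

/* Library side, old entry point: validate, then dispatch. */
static int demo_udp_tunnel_add(struct demo_dev *dev,
			       struct demo_udp_tunnel *tunnel)
{
	if (tunnel == NULL)
		return -EINVAL;
	if (dev->ops->udp_tunnel_add == NULL)
		return -ENOTSUP;
	return dev->ops->udp_tunnel_add(dev, tunnel);
}

/* Library side, new entry point: same checks, new ops slot. */
static int demo_udp_tunnel_port_add(struct demo_dev *dev,
				    struct demo_udp_tunnel *tunnel)
{
	if (tunnel == NULL)
		return -EINVAL;
	if (dev->ops->udp_tunnel_port_add == NULL)
		return -ENOTSUP;
	return dev->ops->udp_tunnel_port_add(dev, tunnel);
}
```

Applications calling either entry point reach the same driver code, so
the old name can be dropped later without touching the driver logic.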



[dpdk-dev] [PATCH v6 3/5] ixgbe: support UDP tunnel port config

2016-03-03 Thread Wenzhuo Lu
Add UDP tunnel port add/del support on ixgbe. For now, only
VxLAN port configuration is supported.
Although the specification gives the VxLAN port a default value
of 4789, it can be changed. We support VxLAN port configuration
to accommodate such changes.
Note that the default value of the VxLAN port in ixgbe NICs is 0, so
please set it when using VxLAN off-load.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 123 +++
 1 file changed, 123 insertions(+)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 3e6fe86..ec2ff0e 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -337,6 +337,10 @@ static int ixgbe_timesync_read_time(struct rte_eth_dev 
*dev,
   struct timespec *timestamp);
 static int ixgbe_timesync_write_time(struct rte_eth_dev *dev,
   const struct timespec *timestamp);
+static int ixgbe_dev_udp_tunnel_port_add(struct rte_eth_dev *dev,
+struct rte_eth_udp_tunnel *udp_tunnel);
+static int ixgbe_dev_udp_tunnel_port_del(struct rte_eth_dev *dev,
+struct rte_eth_udp_tunnel *udp_tunnel);

 /*
  * Define VF Stats MACRO for Non "cleared on read" register
@@ -495,6 +499,10 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = {
.timesync_adjust_time = ixgbe_timesync_adjust_time,
.timesync_read_time   = ixgbe_timesync_read_time,
.timesync_write_time  = ixgbe_timesync_write_time,
+   .udp_tunnel_add   = ixgbe_dev_udp_tunnel_port_add,
+   .udp_tunnel_del   = ixgbe_dev_udp_tunnel_port_del,
+   .udp_tunnel_port_add  = ixgbe_dev_udp_tunnel_port_add,
+   .udp_tunnel_port_del  = ixgbe_dev_udp_tunnel_port_del,
 };

 /*
@@ -6191,6 +6199,121 @@ ixgbe_dev_get_dcb_info(struct rte_eth_dev *dev,
return 0;
 }

+static int
+ixgbe_update_vxlan_port(struct ixgbe_hw *hw,
+   uint16_t port)
+{
+   IXGBE_WRITE_REG(hw, IXGBE_VXLANCTRL, port);
+   IXGBE_WRITE_FLUSH(hw);
+
+   return 0;
+}
+
+/* There's only one register for VxLAN UDP port.
+ * So, we cannot add several ports. Will update it.
+ */
+static int
+ixgbe_add_vxlan_port(struct ixgbe_hw *hw,
+uint16_t port)
+{
+   if (port == 0) {
+   PMD_DRV_LOG(ERR, "Add VxLAN port 0 is not allowed.");
+   return -EINVAL;
+   }
+
+   return ixgbe_update_vxlan_port(hw, port);
+}
+
+/* We cannot delete the VxLAN port. For there's a register for VxLAN
+ * UDP port, it must have a value.
+ * So, will reset it to the original value 0.
+ */
+static int
+ixgbe_del_vxlan_port(struct ixgbe_hw *hw,
+uint16_t port)
+{
+   uint16_t cur_port;
+
+   cur_port = (uint16_t)IXGBE_READ_REG(hw, IXGBE_VXLANCTRL);
+
+   if (cur_port != port) {
+   PMD_DRV_LOG(ERR, "Port %u does not exist.", port);
+   return -EINVAL;
+   }
+
+   return ixgbe_update_vxlan_port(hw, 0);
+}
+
+/* Add UDP tunneling port */
+static int
+ixgbe_dev_udp_tunnel_port_add(struct rte_eth_dev *dev,
+ struct rte_eth_udp_tunnel *udp_tunnel)
+{
+   int ret = 0;
+   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+   if (hw->mac.type != ixgbe_mac_X550 &&
+   hw->mac.type != ixgbe_mac_X550EM_x) {
+   return -ENOTSUP;
+   }
+
+   if (udp_tunnel == NULL)
+   return -EINVAL;
+
+   switch (udp_tunnel->prot_type) {
+   case RTE_TUNNEL_TYPE_VXLAN:
+   ret = ixgbe_add_vxlan_port(hw, udp_tunnel->udp_port);
+   break;
+
+   case RTE_TUNNEL_TYPE_GENEVE:
+   case RTE_TUNNEL_TYPE_TEREDO:
+   PMD_DRV_LOG(ERR, "Tunnel type is not supported now.");
+   ret = -1;
+   break;
+
+   default:
+   PMD_DRV_LOG(ERR, "Invalid tunnel type");
+   ret = -1;
+   break;
+   }
+
+   return ret;
+}
+
+/* Remove UDP tunneling port */
+static int
+ixgbe_dev_udp_tunnel_port_del(struct rte_eth_dev *dev,
+ struct rte_eth_udp_tunnel *udp_tunnel)
+{
+   int ret = 0;
+   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+   if (hw->mac.type != ixgbe_mac_X550 &&
+   hw->mac.type != ixgbe_mac_X550EM_x) {
+   return -ENOTSUP;
+   }
+
+   if (udp_tunnel == NULL)
+   return -EINVAL;
+
+   switch (udp_tunnel->prot_type) {
+   case RTE_TUNNEL_TYPE_VXLAN:
+   ret = ixgbe_del_vxlan_port(hw, udp_tunnel->udp_port);
+   break;
+   case RTE_TUNNEL_TYPE_GENEVE:
+   case RTE_TUNNEL_TYPE_TEREDO:
+   PMD_DRV_LOG(ERR, "Tunnel type is not supported now.");
+   ret = -1;
+   break;
+   default:
+   PMD_DRV_LOG(ERR, "Invalid tunnel
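
The single-register behaviour described in the comments of this patch
(adding a port overwrites the previous one, deleting resets the register
to 0 and only succeeds for the port currently programmed) can be modelled
without hardware. This is a minimal sketch under assumptions: struct
demo_hw and the demo_* functions are hypothetical, and a plain variable
stands in for the VxLAN port register.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model: one field stands in for the VxLAN port register. */
struct demo_hw {
	uint32_t vxlanctrl;
};

/* Adding a port overwrites whatever was programmed before. */
static int demo_add_vxlan_port(struct demo_hw *hw, uint16_t port)
{
	if (port == 0)
		return -EINVAL; /* 0 is the "unset" value, not a valid port */

	hw->vxlanctrl = port;
	return 0;
}

/* Deleting only succeeds for the port that is currently programmed. */
static int demo_del_vxlan_port(struct demo_hw *hw, uint16_t port)
{
	uint16_t cur_port = (uint16_t)hw->vxlanctrl;

	if (cur_port != port)
		return -EINVAL;

	hw->vxlanctrl = 0; /* reset to the original value */
	return 0;
}
```

A second add therefore silently replaces the first port, and a later
delete of the first port fails because it is no longer in the register.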

[dpdk-dev] [PATCH v6 4/5] ixgbe: support VxLAN & NVGRE RX checksum off-load

2016-03-03 Thread Wenzhuo Lu
X550 will do VxLAN & NVGRE RX checksum off-load automatically.
This patch exposes the result of the checksum off-load.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/ixgbe_ethdev.c |  4 
 drivers/net/ixgbe/ixgbe_rxtx.c   | 11 ++-
 lib/librte_ether/rte_ethdev.h|  1 +
 lib/librte_mbuf/rte_mbuf.c   |  2 +-
 lib/librte_mbuf/rte_mbuf.h   |  2 +-
 5 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index ec2ff0e..86afba4 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -2799,6 +2799,10 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
!RTE_ETH_DEV_SRIOV(dev).active)
dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_TCP_LRO;

+   if (hw->mac.type == ixgbe_mac_X550 ||
+   hw->mac.type == ixgbe_mac_X550EM_x)
+   dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_OUTER_IPV4_CKSUM;
+
dev_info->tx_offload_capa =
DEV_TX_OFFLOAD_VLAN_INSERT |
DEV_TX_OFFLOAD_IPV4_CKSUM  |
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index e95e6b7..6b913ee 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1003,6 +1003,8 @@ rx_desc_status_to_pkt_flags(uint32_t rx_status)
 static inline uint64_t
 rx_desc_error_to_pkt_flags(uint32_t rx_status)
 {
+   uint64_t pkt_flags;
+
/*
 * Bit 31: IPE, IPv4 checksum error
 * Bit 30: L4I, L4I integrity error
@@ -1011,8 +1013,15 @@ rx_desc_error_to_pkt_flags(uint32_t rx_status)
0,  PKT_RX_L4_CKSUM_BAD, PKT_RX_IP_CKSUM_BAD,
PKT_RX_IP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD
};
-   return error_to_pkt_flags_map[(rx_status >>
+   pkt_flags = error_to_pkt_flags_map[(rx_status >>
IXGBE_RXDADV_ERR_CKSUM_BIT) & IXGBE_RXDADV_ERR_CKSUM_MSK];
+
+   if ((rx_status & IXGBE_RXD_STAT_OUTERIPCS) &&
+   (rx_status & IXGBE_RXDADV_ERR_OUTERIPER)) {
+   pkt_flags |= PKT_RX_EIP_CKSUM_BAD;
+   }
+
+   return pkt_flags;
 }

 /*
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index f1f96c1..e7e7a66 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -810,6 +810,7 @@ struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_TCP_CKSUM   0x0008
 #define DEV_RX_OFFLOAD_TCP_LRO 0x0010
 #define DEV_RX_OFFLOAD_QINQ_STRIP  0x0020
+#define DEV_RX_OFFLOAD_OUTER_IPV4_CKSUM 0x0040

 /**
  * TX offload capabilities of a device.
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index c18b438..dc0467c 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -253,7 +253,7 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask)
case PKT_RX_FDIR: return "PKT_RX_FDIR";
case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
-   /* case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD"; */
+   case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD";
/* case PKT_RX_OVERSIZE: return "PKT_RX_OVERSIZE"; */
/* case PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW"; */
/* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index c973e9b..c4e7e25 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -88,7 +88,7 @@ extern "C" {
 #define PKT_RX_FDIR  (1ULL << 2)  /**< RX packet with FDIR match 
indicate. */
 #define PKT_RX_L4_CKSUM_BAD  (1ULL << 3)  /**< L4 cksum of RX pkt. is not OK. 
*/
 #define PKT_RX_IP_CKSUM_BAD  (1ULL << 4)  /**< IP cksum of RX pkt. is not OK. 
*/
-#define PKT_RX_EIP_CKSUM_BAD (0ULL << 0)  /**< External IP header checksum 
error. */
+#define PKT_RX_EIP_CKSUM_BAD (1ULL << 5)  /**< External IP header checksum 
error. */
 #define PKT_RX_OVERSIZE  (0ULL << 0)  /**< Num of desc of an RX pkt 
oversize. */
 #define PKT_RX_HBUF_OVERFLOW (0ULL << 0)  /**< Header buffer overflow. */
 #define PKT_RX_RECIP_ERR (0ULL << 0)  /**< Hardware processing error. */
-- 
1.9.3
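
The RX error mapping in this patch can be sketched as a stand-alone
function: a table lookup for the inner IP/L4 checksum errors, plus the
outer IP checksum flag when the hardware both checked the outer header
and reported an error. The three flag values are taken from the
rte_mbuf.h hunk above. The shift, mask and the two outer-IP bit
positions are assumptions for illustration (the shift and mask are
inferred from the "Bit 31"/"Bit 30" comment), and all DEMO_* names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as they appear in the rte_mbuf.h hunk. */
#define DEMO_RX_L4_CKSUM_BAD  (1ULL << 3)
#define DEMO_RX_IP_CKSUM_BAD  (1ULL << 4)
#define DEMO_RX_EIP_CKSUM_BAD (1ULL << 5)

/* Assumed layout: bit 30 = L4 error, bit 31 = IPv4 checksum error. */
#define DEMO_ERR_CKSUM_BIT  30
#define DEMO_ERR_CKSUM_MSK  3

/* Hypothetical bit positions, for illustration only. */
#define DEMO_STAT_OUTERIPCS (1u << 8)  /* outer IP checksum was checked */
#define DEMO_ERR_OUTERIPER  (1u << 26) /* outer IP checksum error */

static uint64_t demo_error_to_pkt_flags(uint32_t rx_status)
{
	/* Index: 0 = no error, 1 = L4, 2 = IP, 3 = both. */
	static const uint64_t error_to_pkt_flags_map[4] = {
		0,
		DEMO_RX_L4_CKSUM_BAD,
		DEMO_RX_IP_CKSUM_BAD,
		DEMO_RX_IP_CKSUM_BAD | DEMO_RX_L4_CKSUM_BAD
	};
	uint64_t pkt_flags;

	pkt_flags = error_to_pkt_flags_map[(rx_status >>
		DEMO_ERR_CKSUM_BIT) & DEMO_ERR_CKSUM_MSK];

	/* Report an outer IP error only if the outer header was checked. */
	if ((rx_status & DEMO_STAT_OUTERIPCS) &&
	    (rx_status & DEMO_ERR_OUTERIPER))
		pkt_flags |= DEMO_RX_EIP_CKSUM_BAD;

	return pkt_flags;
}
```

This also shows why the flag had to move from (0ULL << 0) to a real bit:
a flag whose value is 0 can be OR-ed in but never tested for.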



[dpdk-dev] [PATCH v6 5/5] ixgbe: support VxLAN & NVGRE TX checksum off-load

2016-03-03 Thread Wenzhuo Lu
This patch adds VxLAN & NVGRE TX checksum off-load. When the outer
IP header checksum offload flag is set, the context descriptor is
set up to enable this checksum off-load.

Also update release note for VxLAN & NVGRE checksum off-load support.

Signed-off-by: Wenzhuo Lu 
---
 doc/guides/rel_notes/release_16_04.rst |  9 ++
 drivers/net/ixgbe/ixgbe_ethdev.c   |  4 +++
 drivers/net/ixgbe/ixgbe_rxtx.c | 56 +++---
 drivers/net/ixgbe/ixgbe_rxtx.h |  6 +++-
 4 files changed, 63 insertions(+), 12 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_04.rst 
b/doc/guides/rel_notes/release_16_04.rst
index 8273817..a17c2fb 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -46,6 +46,15 @@ This section should contain new features added in this 
release. Sample format:

 * **Added vhost-user live migration support.**

+* **Added support for VxLAN & NVGRE checksum off-load on X550.**
+
+  * Added support for VxLAN & NVGRE RX/TX checksum off-load on
+X550. RX/TX checksum off-load is provided on both inner and
+outer IP header and TCP header.
+  * Added functions to support VxLAN port configuration. The
+default VxLAN port number is 4789 but this can be updated
+programmatically.
+

 Resolved Issues
 ---
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 86afba4..7ad7a84 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -2811,6 +2811,10 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
DEV_TX_OFFLOAD_SCTP_CKSUM  |
DEV_TX_OFFLOAD_TCP_TSO;

+   if (hw->mac.type == ixgbe_mac_X550 ||
+   hw->mac.type == ixgbe_mac_X550EM_x)
+   dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM;
+
dev_info->default_rxconf = (struct rte_eth_rxconf) {
.rx_thresh = {
.pthresh = IXGBE_DEFAULT_RX_PTHRESH,
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 6b913ee..c2c71de 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -85,7 +85,8 @@
PKT_TX_VLAN_PKT |\
PKT_TX_IP_CKSUM |\
PKT_TX_L4_MASK | \
-   PKT_TX_TCP_SEG)
+   PKT_TX_TCP_SEG | \
+   PKT_TX_OUTER_IP_CKSUM)

 static inline struct rte_mbuf *
 rte_rxmbuf_alloc(struct rte_mempool *mp)
@@ -364,9 +365,11 @@ ixgbe_set_xmit_ctx(struct ixgbe_tx_queue *txq,
uint32_t ctx_idx;
uint32_t vlan_macip_lens;
union ixgbe_tx_offload tx_offload_mask;
+   uint32_t seqnum_seed = 0;

ctx_idx = txq->ctx_curr;
-   tx_offload_mask.data = 0;
+   tx_offload_mask.data[0] = 0;
+   tx_offload_mask.data[1] = 0;
type_tucmd_mlhl = 0;

/* Specify which HW CTX to upload. */
@@ -430,18 +433,35 @@ ixgbe_set_xmit_ctx(struct ixgbe_tx_queue *txq,
}
}

+   if (ol_flags & PKT_TX_OUTER_IP_CKSUM) {
+   tx_offload_mask.outer_l2_len |= ~0;
+   tx_offload_mask.outer_l3_len |= ~0;
+   tx_offload_mask.l2_len |= ~0;
+   seqnum_seed |= tx_offload.outer_l3_len
+  << IXGBE_ADVTXD_OUTER_IPLEN;
+   seqnum_seed |= tx_offload.l2_len
+  << IXGBE_ADVTXD_TUNNEL_LEN;
+   }
+
txq->ctx_cache[ctx_idx].flags = ol_flags;
-   txq->ctx_cache[ctx_idx].tx_offload.data  =
-   tx_offload_mask.data & tx_offload.data;
+   txq->ctx_cache[ctx_idx].tx_offload.data[0]  =
+   tx_offload_mask.data[0] & tx_offload.data[0];
+   txq->ctx_cache[ctx_idx].tx_offload.data[1]  =
+   tx_offload_mask.data[1] & tx_offload.data[1];
txq->ctx_cache[ctx_idx].tx_offload_mask= tx_offload_mask;

ctx_txd->type_tucmd_mlhl = rte_cpu_to_le_32(type_tucmd_mlhl);
vlan_macip_lens = tx_offload.l3_len;
-   vlan_macip_lens |= (tx_offload.l2_len << IXGBE_ADVTXD_MACLEN_SHIFT);
+   if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
+   vlan_macip_lens |= (tx_offload.outer_l2_len <<
+   IXGBE_ADVTXD_MACLEN_SHIFT);
+   else
+   vlan_macip_lens |= (tx_offload.l2_len <<
+   IXGBE_ADVTXD_MACLEN_SHIFT);
vlan_macip_lens |= ((uint32_t)tx_offload.vlan_tci << 
IXGBE_ADVTXD_VLAN_SHIFT);
ctx_txd->vlan_macip_lens = rte_cpu_to_le_32(vlan_macip_lens);
ctx_txd->mss_l4len_idx   = rte_cpu_to_le_32(mss_l4len_idx);
-   ctx_txd->seqnum_seed = 0;
+   ctx_txd->seqnum_seed = seqnum_seed;
 }

 /*
@@ -454,16 +474,24 @@ what_advctx_update(struct ixgbe_tx_queue *txq, uint64_t 
flags,
 {
/* If match with the current used context */
if (likely((

[dpdk-dev] [PATCH] e1000: fix setting of VF MAC address

2016-03-03 Thread Lu, Wenzhuo
Hi Bernard,


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bernard Iremonger
> Sent: Thursday, March 3, 2016 12:09 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] e1000: fix setting of VF MAC address
> 
> Allow reprogramming of the RAR with a zero mac address, to ensure that the VF
> traffic goes to the PF after stop, close and detach of the VF.
> 
> Fixes: be2d648a2dd3 ("igb: add PF support")
> Fixes: d82170d27918 ("igb: add VF support")
> Signed-off-by: Bernard Iremonger 
> ---
>  drivers/net/e1000/igb_ethdev.c | 12 +++-
>  drivers/net/e1000/igb_pf.c |  8 +---
>  2 files changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
> index 4ed5e95..f1044b7 100644
> --- a/drivers/net/e1000/igb_ethdev.c
> +++ b/drivers/net/e1000/igb_ethdev.c
> @@ -1,7 +1,7 @@
>  /*-
>   *   BSD LICENSE
>   *
> - *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
> + *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
>   *   All rights reserved.
>   *
>   *   Redistribution and use in source and binary forms, with or without
> @@ -2819,6 +2819,7 @@ igbvf_dev_close(struct rte_eth_dev *dev)
>   struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data-
> >dev_private);
>   struct e1000_adapter *adapter =
>   E1000_DEV_PRIVATE(dev->data->dev_private);
> + struct ether_addr addr;
> 
>   PMD_INIT_FUNC_TRACE();
> 
> @@ -2827,6 +2828,15 @@ igbvf_dev_close(struct rte_eth_dev *dev)
>   igbvf_dev_stop(dev);
>   adapter->stopped = 1;
>   igb_dev_free_queues(dev);
> +
> + /**
> +  * reprogram the RAR with a zero mac address,
> +  * to ensure that the VF traffic goes to the PF
> +  * after stop, close and detach of the VF.
> +  **/
> +
> + memset(&addr, 0, sizeof(addr));
> + igbvf_default_mac_addr_set(dev, &addr);
>  }
> 
>  static int igbvf_set_vfta(struct e1000_hw *hw, uint16_t vid, bool on) diff 
> --git
> a/drivers/net/e1000/igb_pf.c b/drivers/net/e1000/igb_pf.c index
> 1d00dda..95204e9 100644
> --- a/drivers/net/e1000/igb_pf.c
> +++ b/drivers/net/e1000/igb_pf.c
> @@ -1,7 +1,7 @@
>  /*-
>   *   BSD LICENSE
>   *
> - *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
> + *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
>   *   All rights reserved.
>   *
>   *   Redistribution and use in source and binary forms, with or without
> @@ -332,8 +332,10 @@ igb_vf_set_mac_addr(struct rte_eth_dev *dev,
> uint32_t vf, uint32_t *msgbuf)
>   int rar_entry = hw->mac.rar_entry_count - (vf + 1);
>   uint8_t *new_mac = (uint8_t *)(&msgbuf[1]);
> 
> - if (is_valid_assigned_ether_addr((struct ether_addr*)new_mac)) {
> - rte_memcpy(vfinfo[vf].vf_mac_addresses, new_mac, 6);
> + if (is_unicast_ether_addr((struct ether_addr *)new_mac)) {
> + if (!is_zero_ether_addr((struct ether_addr *)new_mac))
> + rte_memcpy(vfinfo[vf].vf_mac_addresses, new_mac,
> + sizeof(vfinfo[vf].vf_mac_addresses));
>   hw->mac.ops.rar_set(hw, new_mac, rar_entry);
If the new MAC is 0, then after this the RAR is 0, but the address stored in vfinfo 
is unchanged and certainly not 0. Right?
So the two are no longer aligned with each other. Could that cause a problem?

>   return 0;
>   }
> --
> 2.6.3
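
The mismatch asked about above can be reproduced with a small model of
the patched logic. This is only a sketch under assumptions: struct
demo_vf and the demo_* helpers are hypothetical simplifications of
vfinfo, the RAR entry and the ether_addr checks, and the -1 return for
a rejected address is assumed since that part of the function is not
shown in the hunk.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ADDR_LEN 6

/* Hypothetical model of the per-VF state on the PF side. */
struct demo_vf {
	uint8_t stored_mac[DEMO_ADDR_LEN]; /* models vfinfo[vf].vf_mac_addresses */
	uint8_t rar[DEMO_ADDR_LEN];        /* models the RAR hardware entry */
};

static int demo_is_zero(const uint8_t *addr)
{
	static const uint8_t zero[DEMO_ADDR_LEN];

	return memcmp(addr, zero, DEMO_ADDR_LEN) == 0;
}

/* Unicast: the group bit (lowest bit of the first byte) is clear. */
static int demo_is_unicast(const uint8_t *addr)
{
	return (addr[0] & 0x01) == 0;
}

/* Mirrors the patched flow: a zero MAC updates the RAR only. */
static int demo_vf_set_mac(struct demo_vf *vf, const uint8_t *new_mac)
{
	if (!demo_is_unicast(new_mac))
		return -1; /* assumed error path */

	if (!demo_is_zero(new_mac))
		memcpy(vf->stored_mac, new_mac, sizeof(vf->stored_mac));
	memcpy(vf->rar, new_mac, sizeof(vf->rar));
	return 0;
}
```

After a real address is set and then a zero address is written, the
model ends with a zero RAR while stored_mac still holds the old address,
which is the situation the question refers to.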



[dpdk-dev] [PATCH v2] ixgbe: support multicast promiscuous mode on VF

2016-03-03 Thread Wang, Xiao W
Hi,

> > > +
> > > + err = mbx->ops.write_posted(hw, msgbuf, 2, 0);
> > > + if (err)
> > > + return err;
> > > +
> > > + err = mbx->ops.read_posted(hw, msgbuf, 2, 0);
> >
> > Is it more reasonable to read a message of size 1 than 2? Pf side only
> > write 1 word into mbx.
> Thanks for the comment. But actually PF writes 16 words into the mbx, and 2
> words have meaning.
> Word0 is used to check ack/nack. PF uses word1 to write the xcast_mode back.
> I don't check the word1 because I don't see the necessary:)
> 
> >

OK, I learned that the kernel PF puts 2 words into the mbx for this message.

Acked-by: Xiao Wang 


[dpdk-dev] [PATCH] hash: fix memcmp function pointer in multi-process environment

2016-03-03 Thread Qiu, Michael
On 3/3/2016 11:36 AM, Dhana Eadala wrote:
> We found a problem in dpdk-2.2 when using it in a multi-process environment.
> Here is a brief description of how we are using dpdk:
>
> We have two processes proc1, proc2 using dpdk. These proc1 and proc2 are two 
> different compiled binaries.
> proc1 is started as primary process and proc2 as secondary process.
>
> proc1:
> Calls srcHash = rte_hash_create("src_hash_name") to create rte_hash structure.
> As part of this, this api initalized the rte_hash structure and set the 
> srcHash->rte_hash_cmp_eq to the address of memcmp() from proc1 address space.
>
> proc2:
> calls srcHash =  rte_hash_find_existing("src_hash_name"). This returns the 
> rte_hash created by proc1.
> This srcHash->rte_hash_cmp_eq still points to the address of memcmp() from 
> proc1 address space.
> Later proc2  calls rte_hash_lookup_with_hash(srcHash, (const void*) &key, 
> key.sig);
> Under the hood, rte_hash_lookup_with_hash() invokes 
> __rte_hash_lookup_with_hash(), which in turn calls h->rte_hash_cmp_eq(key, 
> k->key, h->key_len).
> This leads to a crash as h->rte_hash_cmp_eq is an address from proc1 address 
> space and is invalid address in proc2 address space.
>
> We found, from dpdk documentation, that
>
> "
>  The use of function pointers between multiple processes running based of 
> different compiled
>  binaries is not supported, since the location of a given function in one 
> process may be different to
>  its location in a second. This prevents the librte_hash library from 
> behaving properly as in a  multi-
>  threaded instance, since it uses a pointer to the hash function internally.
>
>  To work around this issue, it is recommended that multi-process applications 
> perform the hash
>  calculations by directly calling the hashing function from the code and then 
> using the
>  rte_hash_add_with_hash()/rte_hash_lookup_with_hash() functions instead of 
> the functions which do
>  the hashing internally, such as rte_hash_add()/rte_hash_lookup().
> "
>
> We did follow the recommended steps by invoking rte_hash_lookup_with_hash().
> It was no issue up to and including dpdk-2.0. In later releases started 
> crashing because rte_hash_cmp_eq is introduced in dpdk-2.1
>
> We fixed it with the following patch and would like to submit the patch to 
> dpdk.org.
> Patch is created such that, if anyone wanted to use dpdk in multi-process 
> environment with function pointers not shared, they need to
> define RTE_LIB_MP_NO_FUNC_PTR in their Makefile. Without defining this flag 
> in Makefile, it works as it is now.
>
> Signed-off-by: Dhana Eadala 
> ---
>

Some comments:

1. Your commit log needs reformatting; please keep every line under
80 characters.

2. I think you could add the ifdef here in
lib/librte_hash/rte_cuckoo_hash.c :
/*
 * If x86 architecture is used, select appropriate compare function,
 * which may use x86 intrinsics, otherwise use memcmp
 */
#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_I686) ||\
 defined(RTE_ARCH_X86_X32) || defined(RTE_ARCH_ARM64)
/* Select function to compare keys */
switch (params->key_len) {
case 16:
h->rte_hash_cmp_eq = rte_hash_k16_cmp_eq;
break;
[...]
break;
default:
/* If key is not multiple of 16, use generic memcmp */
h->rte_hash_cmp_eq = memcmp;
}
#else
h->rte_hash_cmp_eq = memcmp;
#endif

That way the other #ifdefs in those lines could be removed.

3. I don't think asking others to define RTE_LIB_MP_NO_FUNC_PTR in their Makefile
is a good idea. If you really want to do that, please add documentation so that
others know about it.

Thanks,
Michael
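
For illustration only, here is a minimal sketch of one alternative to a build-time flag: keep a small selector in the shared structure and let each process resolve it through its own table of function addresses. All names here (cmp_kind, cmp_table, shared_hash and the helper functions) are hypothetical and are not part of librte_hash or of the patch under review.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical selector stored in shared memory instead of a raw pointer. */
enum cmp_kind {
	CMP_MEMCMP = 0,	/* generic byte-wise compare */
	CMP_KEY16,	/* specialised compare for 16-byte keys */
	CMP_KIND_COUNT
};

typedef int (*cmp_fn)(const void *a, const void *b, size_t len);

/* Stand-in for an optimised 16-byte compare; returns 0 when keys match. */
static int
key16_cmp_eq(const void *a, const void *b, size_t len)
{
	(void)len;
	return memcmp(a, b, 16) != 0;
}

/* Generic fallback; returns 0 when keys match. */
static int
generic_cmp(const void *a, const void *b, size_t len)
{
	return memcmp(a, b, len);
}

/*
 * Each process links its own copy of this table, so the addresses in it
 * are always valid in the address space of the process that uses them.
 */
static const cmp_fn cmp_table[CMP_KIND_COUNT] = {
	[CMP_MEMCMP] = generic_cmp,
	[CMP_KEY16]  = key16_cmp_eq,
};

/* Hypothetical shared header: holds only position-independent data. */
struct shared_hash {
	size_t key_len;
	enum cmp_kind cmp;
};

/* Called by the creating process; picks a selector, not an address. */
static void
shared_hash_init(struct shared_hash *h, size_t key_len)
{
	h->key_len = key_len;
	h->cmp = (key_len == 16) ? CMP_KEY16 : CMP_MEMCMP;
}

/* Returns 1 when the keys are equal, 0 otherwise. */
static int
shared_hash_keys_equal(const struct shared_hash *h,
		       const void *a, const void *b)
{
	return cmp_table[h->cmp](a, b, h->key_len) == 0;
}
```

With this layout a secondary process that attaches to the structure never dereferences an address taken from the primary process. The cost is one extra table lookup per comparison.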


[dpdk-dev] New driver (large patch) question.

2016-03-03 Thread Qiu, Michael
On 3/3/2016 7:11 AM, Stephen Hurd wrote:
> On Wed, Mar 2, 2016 at 2:15 PM, Thomas Monjalon 
> wrote:
>
>>> The comments in it are the only publicly available
>>> documentation on the hardware I'm aware of.
>> So you must keep the comments.
>>
> That's my goal, but the comments are well over the 300k limit.
>
>
>>> The driver itself doesn't have a lot of optional features in it, it's the
>>> header file that's too big.
>> It is big because there are many different things.
>> You can split the file in different patches.
>> Examples:
>> - a patch for RSS will bring the hardware structures for RSS
>> - a patch for the stats will bring the hardware stats structures
>> etc
>>
> Should I split additional definitions/documentation that's not currently
> used in the driver as well?  Or should it stay as only enough to document
> what the driver already does?
>
> The header file is expected to be publicly released in the future, so I
> tried to keep it as close to the original as possible.  I'm not strongly
> attached to this approach, but it does make it easier to support future
> firmware releases.
>
> It's a fairly work-intensive project to deconstruct the existing driver
> into a series of small patches that work at each step, is this a hard
> requirement? (if so, I'd better get cracking)

Does the original header file have its own commit log (as in other
projects)? If yes, it could make your life simpler.

Thanks,
Michael 
> PS: please answer inline
> Sorry, $work just switched us to GMail and I'm still learning the ropes.
>



[dpdk-dev] [PATCH v4 05/12] pmd/fm10k: add dev_ptype_info_get implementation

2016-03-03 Thread Tan, Jianfeng
Hi,

On 3/3/2016 4:11 AM, Chen, Jing D wrote:
> Hi,
>
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jianfeng Tan
> Sent: Thursday, February 25, 2016 6:09 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v4 05/12] pmd/fm10k: add dev_ptype_info_get 
> implementation
>
> Signed-off-by: Jianfeng Tan 
> ---
>   drivers/net/fm10k/fm10k_ethdev.c   | 50 
> ++
>   drivers/net/fm10k/fm10k_rxtx.c |  3 +++
>   drivers/net/fm10k/fm10k_rxtx_vec.c |  3 +++
>   3 files changed, 56 insertions(+)
>
> diff --git a/drivers/net/fm10k/fm10k_ethdev.c 
> b/drivers/net/fm10k/fm10k_ethdev.c
> index 421266b..429cbdd 100644
> --- a/drivers/net/fm10k/fm10k_ethdev.c
> +++ b/drivers/net/fm10k/fm10k_ethdev.c
> @@ -1335,6 +1335,55 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
>   };
>   }
>   
> +#ifdef RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE
> +static const uint32_t *
> +fm10k_dev_ptype_info_get(struct rte_eth_dev *dev) {
> + if (dev->rx_pkt_burst == fm10k_recv_pkts ||
> + dev->rx_pkt_burst == fm10k_recv_scattered_pkts) {
> + static uint32_t ptypes[] = {
> + /* refers to rx_desc_to_ol_flags() */
> + RTE_PTYPE_L2_ETHER,
> + RTE_PTYPE_L3_IPV4,
> + RTE_PTYPE_L3_IPV4_EXT,
> + RTE_PTYPE_L3_IPV6,
> + RTE_PTYPE_L3_IPV6_EXT,
> + RTE_PTYPE_L4_TCP,
> + RTE_PTYPE_L4_UDP,
> + RTE_PTYPE_UNKNOWN
> + };
> +
> + return ptypes;
> + } else if (dev->rx_pkt_burst == fm10k_recv_pkts_vec ||
> +dev->rx_pkt_burst == fm10k_recv_scattered_pkts_vec) {
> + static uint32_t ptypes_vec[] = {
> + /* refers to fm10k_desc_to_pktype_v() */
> + RTE_PTYPE_L3_IPV4,
> + RTE_PTYPE_L3_IPV4_EXT,
> + RTE_PTYPE_L3_IPV6,
> + RTE_PTYPE_L3_IPV6_EXT,
> + RTE_PTYPE_L4_TCP,
> + RTE_PTYPE_L4_UDP,
> + RTE_PTYPE_TUNNEL_GENEVE,
> + RTE_PTYPE_TUNNEL_NVGRE,
> + RTE_PTYPE_TUNNEL_VXLAN,
> + RTE_PTYPE_TUNNEL_GRE,
> + RTE_PTYPE_UNKNOWN
> + };
> +
> + return ptypes_vec;
> + }
> +
> + return NULL;
> +}
> May I know when " fm10k_dev_ptype_info_get " will be called? In fm10k, the 
> actual
> Rx/tx func will be decided after port is started.

Thank you for pointing this out. It is indeed an issue. It makes 
no difference when all rx functions fill the same ptypes, which, 
unfortunately, is not the case for all PMDs. According to my analysis, 
only in fm10k's case must ptype_info_get be called after dev_start(); 
for other PMDs, it can be called right after rx_queue_setup. So, in all, 
I need to add this as a caution in the API declaration.

__details__

eth_cxgbe_dev_init

eth_igb_dev_init
eth_igbvf_dev_init
eth_igb_rx_init <- eth_igb_start (makes no difference, rx functions fill same ptypes)
eth_igbvf_rx_init <- igbvf_dev_start (makes no difference, rx functions fill same ptypes)

eth_enicpmd_dev_init

fm10k_set_rx_function <- fm10k_dev_rx_init <- fm10k_dev_start

eth_i40e_dev_init
i40evf_dev_init
i40e_set_rx_function <- eth_i40e_dev_init
                     <- i40evf_dev_init
                     <- i40e_dev_rx_init <- i40e_dev_rxtx_init <- i40e_dev_start (makes no difference, rx functions fill same ptypes)
                     <- i40evf_rx_init <- i40evf_dev_start (makes no difference, rx functions fill same ptypes)

ixgbe_set_rx_function <- eth_ixgbe_dev_init
                      <- ixgbe_dev_rx_init <- ixgbe_dev_start (makes no difference, rx functions fill same ptypes)
                      <- ixgbevf_dev_rx_init

mlx4_rx_queue_setup
mlx4_dev_set_mtu (makes no difference, rx functions fill same ptypes)

mlx5_rx_queue_setup
mlx5_dev_set_mtu (makes no difference, rx functions fill same ptypes)

nfp_net_init

eth_vmxnet3_dev_init

Thanks,
Jianfeng
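
To make the ordering caveat concrete, here is a minimal mock, assuming a device whose rx function is only selected at start time (as described for fm10k above). Nothing here is DPDK API: mock_dev, mock_dev_start, mock_ptype_info_get and the PT_* values are hypothetical stand-ins.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet-type identifiers; PT_UNKNOWN terminates a list. */
#define PT_UNKNOWN       0u
#define PT_L2_ETHER      1u
#define PT_L3_IPV4       2u
#define PT_TUNNEL_VXLAN  3u

struct mock_dev;
typedef int (*rx_burst_fn)(struct mock_dev *dev);

struct mock_dev {
	rx_burst_fn rx_burst;	/* NULL until the device is started */
	int use_vector;		/* configuration chosen before start */
};

/* Placeholder receive functions; only their addresses matter here. */
static int rx_scalar(struct mock_dev *dev) { (void)dev; return 0; }
static int rx_vector(struct mock_dev *dev) { (void)dev; return 0; }

/* The rx function is decided here, not at init or queue setup. */
static void
mock_dev_start(struct mock_dev *dev)
{
	dev->rx_burst = dev->use_vector ? rx_vector : rx_scalar;
}

/* Returns the ptype list of the selected rx function, or NULL. */
static const uint32_t *
mock_ptype_info_get(const struct mock_dev *dev)
{
	static const uint32_t ptypes_scalar[] = {
		PT_L2_ETHER, PT_L3_IPV4, PT_UNKNOWN
	};
	static const uint32_t ptypes_vector[] = {
		PT_L3_IPV4, PT_TUNNEL_VXLAN, PT_UNKNOWN
	};

	if (dev->rx_burst == rx_scalar)
		return ptypes_scalar;
	if (dev->rx_burst == rx_vector)
		return ptypes_vector;
	return NULL;	/* rx function not selected yet */
}
```

In this mock, a query made before mock_dev_start() returns NULL, which is the situation the API note would need to warn about.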





[dpdk-dev] [PATCH 1/4] ixgbe: support UDP tunnel add/del

2016-03-03 Thread Qiu, Michael
On 1/11/2016 3:08 PM, Wenzhuo Lu wrote:
> Add UDP tunnel add/del support on ixgbe. Now it only support
> VxLAN port configuration.
> Although the VxLAN port has a default value 4789, it can be
> changed. We support VxLAN port configuration to meet the
> change.
> Note, the default value of VxLAN port in ixgbe NICs is 0. So
> please set it when using VxLAN off-load.
>
> Signed-off-by: Wenzhuo Lu 
> ---
>  drivers/net/ixgbe/ixgbe_ethdev.c | 93 
> 
>  1 file changed, 93 insertions(+)
>
> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c 
> b/drivers/net/ixgbe/ixgbe_ethdev.c
> index 4c4c6df..381cbad 100644
> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> @@ -337,6 +337,10 @@ static int ixgbe_timesync_read_time(struct rte_eth_dev 
> *dev,
>  struct timespec *timestamp);
>  static int ixgbe_timesync_write_time(struct rte_eth_dev *dev,
>  const struct timespec *timestamp);
> +static int ixgbe_dev_udp_tunnel_add(struct rte_eth_dev *dev,
> + struct rte_eth_udp_tunnel *udp_tunnel);
> +static int ixgbe_dev_udp_tunnel_del(struct rte_eth_dev *dev,
> + struct rte_eth_udp_tunnel *udp_tunnel);
>  
>  /*
>   * Define VF Stats MACRO for Non "cleared on read" register
> @@ -495,6 +499,8 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = {
>   .timesync_adjust_time = ixgbe_timesync_adjust_time,
>   .timesync_read_time   = ixgbe_timesync_read_time,
>   .timesync_write_time  = ixgbe_timesync_write_time,
> + .udp_tunnel_add   = ixgbe_dev_udp_tunnel_add,
> + .udp_tunnel_del   = ixgbe_dev_udp_tunnel_del,
>  };
>  
>  /*
> @@ -6191,6 +6197,93 @@ ixgbe_dev_get_dcb_info(struct rte_eth_dev *dev,
>   return 0;
>  }
>  
> +#define DEFAULT_VXLAN_PORT 4789
> +
> +/* on x550, there's only one register for VxLAN UDP port.
> + * So, we cannot add or del the port. We only update it.
> + */
> +static int
> +ixgbe_update_vxlan_port(struct ixgbe_hw *hw,
> + uint16_t port)
> +{
> + IXGBE_WRITE_REG(hw, IXGBE_VXLANCTRL, port);
> + IXGBE_WRITE_FLUSH(hw);
> +
> + return 0;
> +}
> +
> +/* Add UDP tunneling port */
> +static int
> +ixgbe_dev_udp_tunnel_add(struct rte_eth_dev *dev,
> +  struct rte_eth_udp_tunnel *udp_tunnel)
> +{
> + int ret = 0;
> + struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> +
> + if (hw->mac.type != ixgbe_mac_X550 &&
> + hw->mac.type != ixgbe_mac_X550EM_x) {
> + return -ENOTSUP;
> + }
> +
> + if (udp_tunnel == NULL)
> + return -EINVAL;
> +
> + switch (udp_tunnel->prot_type) {
> + case RTE_TUNNEL_TYPE_VXLAN:
> + /* cannot add a port, update the port value */
> + ret = ixgbe_update_vxlan_port(hw, udp_tunnel->udp_port);
> + break;
> +
> + case RTE_TUNNEL_TYPE_GENEVE:
> + case RTE_TUNNEL_TYPE_TEREDO:
> + PMD_DRV_LOG(ERR, "Tunnel type is not supported now.");
> + ret = -1;
> + break;
> +
> + default:
> + PMD_DRV_LOG(ERR, "Invalid tunnel type");
> + ret = -1;
> + break;
> + }
> +
> + return ret;
> +}
> +
> +/* Remove UDP tunneling port */
> +static int
> +ixgbe_dev_udp_tunnel_del(struct rte_eth_dev *dev,
> +  struct rte_eth_udp_tunnel *udp_tunnel)
> +{
> + int ret = 0;
> + struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> +
> + if (hw->mac.type != ixgbe_mac_X550 &&
> + hw->mac.type != ixgbe_mac_X550EM_x) {
> + return -ENOTSUP;
> + }
> +
> + if (udp_tunnel == NULL)
> + return -EINVAL;
> +
> + switch (udp_tunnel->prot_type) {
> + case RTE_TUNNEL_TYPE_VXLAN:
> + /* cannot del the port, reset it to default */
> + ret = ixgbe_update_vxlan_port(hw, DEFAULT_VXLAN_PORT);
> + break;
> + case RTE_TUNNEL_TYPE_GENEVE:
> + case RTE_TUNNEL_TYPE_TEREDO:
> + PMD_DRV_LOG(ERR, "Tunnel type is not supported now.");
> + ret = -1;

Better to use -EINVAL or another errno value; mixing return styles is never good.

Thanks,
Michael
> + break;
> + default:
> + PMD_DRV_LOG(ERR, "Invalid tunnel type");
> + ret = -1;
> + break;
> + }
> +
> + return ret;
> +}
> +
>  static struct rte_driver rte_ixgbe_driver = {
>   .type = PMD_PDEV,
>   .init = rte_ixgbe_pmd_init,



[dpdk-dev] [PATCH 1/4] ixgbe: support UDP tunnel add/del

2016-03-03 Thread Lu, Wenzhuo
Hi Michael,

> -Original Message-
> From: Qiu, Michael
> Sent: Thursday, March 3, 2016 2:58 PM
> To: Lu, Wenzhuo; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/4] ixgbe: support UDP tunnel add/del
> 
> On 1/11/2016 3:08 PM, Wenzhuo Lu wrote:
> > Add UDP tunnel add/del support on ixgbe. Now it only support VxLAN
> > port configuration.
> > Although the VxLAN port has a default value 4789, it can be changed.
> > We support VxLAN port configuration to meet the change.
> > Note, the default value of VxLAN port in ixgbe NICs is 0. So please
> > set it when using VxLAN off-load.
> >
> > Signed-off-by: Wenzhuo Lu 
> > ---
> >  drivers/net/ixgbe/ixgbe_ethdev.c | 93
> > 
> >  1 file changed, 93 insertions(+)
> >
> > diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
> > b/drivers/net/ixgbe/ixgbe_ethdev.c
> > index 4c4c6df..381cbad 100644
> > --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> > +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> > @@ -337,6 +337,10 @@ static int ixgbe_timesync_read_time(struct rte_eth_dev *dev,
> >struct timespec *timestamp);
> >  static int ixgbe_timesync_write_time(struct rte_eth_dev *dev,
> >const struct timespec *timestamp);
> > +static int ixgbe_dev_udp_tunnel_add(struct rte_eth_dev *dev,
> > +   struct rte_eth_udp_tunnel *udp_tunnel);
> > +static int ixgbe_dev_udp_tunnel_del(struct rte_eth_dev *dev,
> > +   struct rte_eth_udp_tunnel *udp_tunnel);
> >
> >  /*
> >   * Define VF Stats MACRO for Non "cleared on read" register
> > @@ -495,6 +499,8 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = {
> > .timesync_adjust_time = ixgbe_timesync_adjust_time,
> > .timesync_read_time   = ixgbe_timesync_read_time,
> > .timesync_write_time  = ixgbe_timesync_write_time,
> > +   .udp_tunnel_add   = ixgbe_dev_udp_tunnel_add,
> > +   .udp_tunnel_del   = ixgbe_dev_udp_tunnel_del,
> >  };
> >
> >  /*
> > @@ -6191,6 +6197,93 @@ ixgbe_dev_get_dcb_info(struct rte_eth_dev *dev,
> > return 0;
> >  }
> >
> > +#define DEFAULT_VXLAN_PORT 4789
> > +
> > +/* on x550, there's only one register for VxLAN UDP port.
> > + * So, we cannot add or del the port. We only update it.
> > + */
> > +static int
> > +ixgbe_update_vxlan_port(struct ixgbe_hw *hw,
> > +   uint16_t port)
> > +{
> > +   IXGBE_WRITE_REG(hw, IXGBE_VXLANCTRL, port);
> > +   IXGBE_WRITE_FLUSH(hw);
> > +
> > +   return 0;
> > +}
> > +
> > +/* Add UDP tunneling port */
> > +static int
> > +ixgbe_dev_udp_tunnel_add(struct rte_eth_dev *dev,
> > +struct rte_eth_udp_tunnel *udp_tunnel) {
> > +   int ret = 0;
> > +   struct ixgbe_hw *hw =
> > +IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > +
> > +   if (hw->mac.type != ixgbe_mac_X550 &&
> > +   hw->mac.type != ixgbe_mac_X550EM_x) {
> > +   return -ENOTSUP;
> > +   }
> > +
> > +   if (udp_tunnel == NULL)
> > +   return -EINVAL;
> > +
> > +   switch (udp_tunnel->prot_type) {
> > +   case RTE_TUNNEL_TYPE_VXLAN:
> > +   /* cannot add a port, update the port value */
> > +   ret = ixgbe_update_vxlan_port(hw, udp_tunnel->udp_port);
> > +   break;
> > +
> > +   case RTE_TUNNEL_TYPE_GENEVE:
> > +   case RTE_TUNNEL_TYPE_TEREDO:
> > +   PMD_DRV_LOG(ERR, "Tunnel type is not supported now.");
> > +   ret = -1;
> > +   break;
> > +
> > +   default:
> > +   PMD_DRV_LOG(ERR, "Invalid tunnel type");
> > +   ret = -1;
> > +   break;
> > +   }
> > +
> > +   return ret;
> > +}
> > +
> > +/* Remove UDP tunneling port */
> > +static int
> > +ixgbe_dev_udp_tunnel_del(struct rte_eth_dev *dev,
> > +struct rte_eth_udp_tunnel *udp_tunnel) {
> > +   int ret = 0;
> > +   struct ixgbe_hw *hw =
> > +IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > +
> > +   if (hw->mac.type != ixgbe_mac_X550 &&
> > +   hw->mac.type != ixgbe_mac_X550EM_x) {
> > +   return -ENOTSUP;
> > +   }
> > +
> > +   if (udp_tunnel == NULL)
> > +   return -EINVAL;
> > +
> > +   switch (udp_tunnel->prot_type) {
> > +   case RTE_TUNNEL_TYPE_VXLAN:
> > +   /* cannot del the port, reset it to default */
> > +   ret = ixgbe_update_vxlan_port(hw, DEFAULT_VXLAN_PORT);
> > +   break;
> > +   case RTE_TUNNEL_TYPE_GENEVE:
> > +   case RTE_TUNNEL_TYPE_TEREDO:
> > +   PMD_DRV_LOG(ERR, "Tunnel type is not supported now.");
> > +   ret = -1;
> 
> Better to use the -EINVAL or other, mixed style always not good.
Good suggestion, thanks. I'll change it.

> 
> Thanks,
> Michael
> > +   break;
> > +   default:
> > +   PMD_DRV_LOG(ERR, "Invalid tunnel type");
> > +   ret = -1;
> > +   break;
> > +   }
> > +
> > +   return ret;
> > +}
> > +
> >  static struct rte_driver rte_ixgbe_driver = {
> > .type = PMD_PDEV,
> > .init = rte_ixgbe_pmd_init,
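
A minimal sketch of what the consistent errno style could look like for the delete path. The split between -ENOTSUP for known but unimplemented types and -EINVAL for everything else is an assumption made here for illustration; the thread only asks for "-EINVAL or other". The enum, the register stand-in and the function names are hypothetical, not the ixgbe code.

```c
#include <errno.h>

/* Hypothetical tunnel types mirroring the cases in the patch. */
enum tunnel_type {
	TUNNEL_VXLAN,
	TUNNEL_GENEVE,
	TUNNEL_TEREDO,
	TUNNEL_MAX
};

#define DEFAULT_VXLAN_PORT 4789

/* Stand-in for the single VxLAN UDP port register. */
static unsigned int vxlan_port_reg;

static int
update_vxlan_port(unsigned int port)
{
	vxlan_port_reg = port;
	return 0;
}

/* Every failure path returns a negative errno value, never a bare -1. */
static int
tunnel_port_del(enum tunnel_type type)
{
	switch (type) {
	case TUNNEL_VXLAN:
		/* cannot delete the port, reset it to the default */
		return update_vxlan_port(DEFAULT_VXLAN_PORT);
	case TUNNEL_GENEVE:
	case TUNNEL_TEREDO:
		return -ENOTSUP;	/* known type, not implemented */
	default:
		return -EINVAL;		/* not a valid tunnel type */
	}
}
```

Callers can then tell "not supported" from "invalid argument" without parsing log output.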



[dpdk-dev] [PATCH 1/7] lib/librte_ether: Add 2/2.5/25/50Gbps link speeds

2016-03-03 Thread Simon Kågström
Hi!

On 2016-03-03 05:08, Stephen Hurd wrote:
> Add additional ETH_LINK_SPEED_* macros for 2, 2.5, 25, and 50 Gbps links
> 
> Signed-off-by: Stephen Hurd 
> ---
>  lib/librte_ether/rte_ethdev.h | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 16da821..cb40bbb 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -254,10 +254,14 @@ struct rte_eth_link {
>  #define ETH_LINK_SPEED_10   10  /**< 10 megabits/second. */
>  #define ETH_LINK_SPEED_100  100 /**< 100 megabits/second. */
>  #define ETH_LINK_SPEED_1000 1000/**< 1 gigabits/second. */
> +#define ETH_LINK_SPEED_2000 2000/**< 2 gigabits/second. */
> +#define ETH_LINK_SPEED_2500 2500/**< 2.5 gigabits/second. */
>  #define ETH_LINK_SPEED_10000    10000   /**< 10 gigabits/second. */
>  #define ETH_LINK_SPEED_10G      10000   /**< alias of 10 gigabits/second. */
>  #define ETH_LINK_SPEED_20G      20000   /**< 20 gigabits/second. */
> +#define ETH_LINK_SPEED_25G      25000   /**< 25 gigabits/second. */
>  #define ETH_LINK_SPEED_40G      40000   /**< 40 gigabits/second. */
> +#define ETH_LINK_SPEED_50G      50000   /**< 50 gigabits/second. */

I realize this is a more general question, but is it really meaningful
to have macros for all possible link speeds? We're working on a PMD
driver with a channelized interface exposed as DPDK ports. Each channel
can be configured with an arbitrary speed, so e.g., 1337 Mbps is also
possible.

To me, it would seem more natural to just have a number in mbits for the
link speed.

// Simon
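
As a rough sketch of the idea, assuming the link speed is simply carried as an integer in Mbps: the struct, the field name and the helper below are hypothetical and are not the rte_eth_link layout.

```c
#include <stdint.h>

/* Hypothetical link description: speed is a plain number in Mbps. */
struct link_info {
	uint32_t speed_mbps;	/* e.g. 1337 for a channelized port; 0 = unknown */
};

/* Named constants remain possible as a convenience, but are not required. */
#define SPEED_1G_MBPS   1000u
#define SPEED_25G_MBPS  25000u

/* Returns 1 when the link runs at min_mbps or faster, 0 otherwise. */
static int
link_at_least(const struct link_info *link, uint32_t min_mbps)
{
	return link->speed_mbps >= min_mbps;
}
```

An arbitrary channel speed such as 1337 Mbps then needs no new macro, and comparisons against well-known speeds still read naturally.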



[dpdk-dev] [PATCH 1/3] kcp: add kernel control path kernel module

2016-03-03 Thread Panu Matilainen
On 03/03/2016 12:35 AM, Thomas Monjalon wrote:
> 2016-03-02 12:21, Thomas Monjalon:
>> 2016-03-02 11:47, Vincent JARDIN:
>>> On 02/03/2016 09:27, Panu Matilainen wrote:
>> I'd like to see these be merged.
>>
>> Jay
>
> The code is really not ready. I am okay with cooperative development
> but the current code needs to go into a staging type tree.
> No compatibility, no ABI guarantees, more of an RFC.
> Don't want vendors building products with it then screaming when it
> gets rebuilt/reworked/scrapped.
>

 Exactly.
>>>
>>> +1 too
>>>
>>> We need to build on this innovation while there is a path for kernel
>>> mainstream. The logic of using a staging is a good one.
>>>
>>> Thomas,
>>>
>>> can we open a staging folder into the DPDK like it is done into the kernel?
>>
>> It's possible to create a staging directory if everybody agree.
>> It is important to state in a README file or in the doc/ that
>> there will be no guarantee (no stable ABI, no validation and can be dropped)
>> and that it is a work in progress, a suggestion to discuss with the kernel
>> community.
>>
>> The kernel modules must clearly target an upstream integration.
>
> Actually the examples directory has been used as a staging for ethtool and
> lthread. We also have the crypto API which is still experimental.
> So I think we must decide among these 3 solutions:
>   - no special directory, just mark and document an experimental state
>   - put only kcp/kdp in the staging directory
>   - put kcp/kdp in staging and move other experimental libs here

To answer this, I think we need to start by clarifying the kernel module 
situation. Quoting you from 
http://dpdk.org/ml/archives/dev/2016-January/032263.html:

> Sorry the kernel module party is over.
> One day, igb_uio will be removed.
> I suggest to make a first version without interrupt support
> and work with Linux community to fix your issues.

This to me reads "no more out-of-tree kernel modules, period" but here 
we are discussing the fate of another one.

If the policy truly is "no more kernel modules" (which I would fully 
back and applaud) then I think there's little to discuss - if the 
destination is kernel upstream then why should the modules pass through 
the dpdk codebase? Put it in another repo on dpdk.org, advertise it, 
make testing it as easy as possible and all (like have it integrate with 
dpdk makefiles if needed) instead.

The crypto API and ethtool are different in that the 
destination for them clearly is dpdk itself. I would like to see 
experimental code moved to a separate (staging or whatever) directory 
(or a repo/git submodule or such) to make the situation absolutely clear. 
I also still think experimental features 
should not be enabled by default in the configs; no other project that I 
know of does that, but that's another discussion.

- Panu -


[dpdk-dev] [PATCH v6 0/6] interrupt mode for fm10k

2016-03-03 Thread Liu, Yong
Hi Thomas,
On the patchwork website, this patch set seems to have been applied: 
http://dpdk.org/dev/patchwork/patch/10381/

But I can't find it in either the dpdk repo or the dpdk-next-net repo. 
Could you check on that?

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Shaopeng He
> Sent: Friday, February 05, 2016 12:58 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v6 0/6] interrupt mode for fm10k
> 
> This patch series adds interrupt mode support for fm10k,
> contains four major parts:
> 
> 1. implement rx_descriptor_done function in fm10k
> 2. add rx interrupt support in fm10k PF and VF
> 3. make sure default VID available in dev_init in fm10k
> 4. fix a memory leak for non-ip packet in l3fwd-power,
>which happens mostly when testing fm10k interrupt mode.
> 
> v6 changes:
>   - add fixes line
>   - relocate version change message in individual patch
> after the --- separator
> 
> v5 changes:
>   - remove one unnecessary NULL check for rte_free
>   - fix a wrong error message
>   - add more clean up when memory allocation fails
>   - split line over 80 characters to 2 lines
>   - update interrupt mode limitation in fm10k.rst
> 
> v4 changes:
>   - rebase to latest code
>   - update release 2.3 note in corresponding patches
> 
> v3 changes:
>   - rebase to latest code
>   - macro renaming according to the EAL change
> 
> v2 changes:
>   - reword some comments and commit messages
>   - split one big patch into three smaller ones
> 
> Shaopeng He (6):
>   fm10k: implement rx_descriptor_done function
>   fm10k: setup rx queue interrupts for PF and VF
>   fm10k: remove rx queue interrupts when dev stops
>   fm10k: add rx queue interrupt en/dis functions
>   fm10k: make sure default VID available in dev_init
>   l3fwd-power: fix a memory leak for non-ip packet
> 
>  doc/guides/nics/fm10k.rst|   7 ++
>  doc/guides/rel_notes/release_2_3.rst |   8 ++
>  drivers/net/fm10k/fm10k.h|   6 ++
>  drivers/net/fm10k/fm10k_ethdev.c | 174
> ---
>  drivers/net/fm10k/fm10k_rxtx.c   |  25 +
>  examples/l3fwd-power/main.c  |   3 +-
>  6 files changed, 211 insertions(+), 12 deletions(-)
> 
> --
> 1.9.3



[dpdk-dev] [PATCH v6 0/6] interrupt mode for fm10k

2016-03-03 Thread Thomas Monjalon
2016-03-03 08:36, Liu, Yong:
> Hi Thomas,
> In patchwork website, this patch set seem that has been applied. 
> http://dpdk.org/dev/patchwork/patch/10381/
> 
> But I can't find it neither in dpdk repo nor in dpdk-next-net repo. 
> Could you check with that?

It's here:
http://dpdk.org/browse/next/dpdk-next-net/log/?h=rel_16_04&ofs=50




[dpdk-dev] [PATCH v6 0/6] interrupt mode for fm10k

2016-03-03 Thread Liu, Yong
Thanks Thomas, found it.

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Thursday, March 03, 2016 5:11 PM
> To: Liu, Yong
> Cc: He, Shaopeng; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 0/6] interrupt mode for fm10k
> 
> 2016-03-03 08:36, Liu, Yong:
> > Hi Thomas,
> > In patchwork website, this patch set seem that has been applied.
> > http://dpdk.org/dev/patchwork/patch/10381/
> >
> > But I can't find it neither in dpdk repo nor in dpdk-next-net repo.
> > Could you check with that?
> 
> It's here:
> http://dpdk.org/browse/next/dpdk-next-net/log/?h=rel_16_04&ofs=50
> 



[dpdk-dev] [PATCH 1/7] lib/librte_ether: Add 2/2.5/25/50Gbps link speeds

2016-03-03 Thread Thomas Monjalon
Hi,

2016-03-03 08:53, Simon Kågström:
> Hi!
> 
> On 2016-03-03 05:08, Stephen Hurd wrote:
> > Add additional ETH_LINK_SPEED_* macros for 2, 2.5, 25, and 50 Gbps links

Stephen,
you could be interested in the rework done by Marc Sune:
http://dpdk.org/dev/patchwork/patch/10919/

> I realize this is a more general question, but is it really meaningful
> to have macros for all possible link speeds? We're working on a PMD
> driver with a channelized interface exposed as DPDK ports. Each channel
> can be configured with an arbitrary speed, so e.g., 1337 Mbps is also
> possible.

What is the benefit? Why not negotiate the maximum capability of the peer?

> To me, it would seem more natural to just have a number in mbits for the
> link speed.

Please jump in the thread initiated by Marc Sune months ago.


[dpdk-dev] [PATCH v3 1/2] librte_pipeline: add support for packet redirection at action handlers

2016-03-03 Thread Panu Matilainen
On 03/02/2016 10:41 PM, Jasvinder Singh wrote:
> Currently, there is no mechanism that allows the pipeline ports (in/out) and
> table action handlers to override the default forwarding decision (as
> previously configured per input port or in the table entry). Therefore, new
> pipeline API functions have been added which allows action handlers to
> hijack packets and remove them from the pipeline processing, and then either
> drop them or send them out of the pipeline on any output port. The port
> (in/out) and table action handler prototypes have been changed for making
> use of these new API functions. This feature will be helpful to implement
> functions such as exception handling (e.g. TTL =0), load balancing etc.
>
> Signed-off-by: Jasvinder Singh 
> Acked-by: Cristian Dumitrescu 
> ---
> v3
> * improved comments in "rte_pipeline.h"
>
> v2
> * rebased on master
>
>   doc/guides/rel_notes/deprecation.rst |   5 -
>   doc/guides/rel_notes/release_16_04.rst   |   6 +-
>   lib/librte_pipeline/Makefile |   4 +-
>   lib/librte_pipeline/rte_pipeline.c   | 461 
> ++-
>   lib/librte_pipeline/rte_pipeline.h   | 174 ++
>   lib/librte_pipeline/rte_pipeline_version.map |   8 +
>   6 files changed, 362 insertions(+), 296 deletions(-)
>
[...]

This causes a build failure:

== Build app/test-pipeline
   CC pipeline_stub.o
/srv/work/repos/dpdk/app/test-pipeline/pipeline_stub.c: In function 
‘app_main_loop_worker_pipeline_stub’:
/srv/work/repos/dpdk/app/test-pipeline/pipeline_stub.c:97:4: error: 
unknown field ‘f_action_bulk’ specified in initializer
 .f_action_bulk = NULL,
 ^
/srv/work/repos/dpdk/mk/internal/rte.compile-pre.mk:126: recipe for 
target 'pipeline_stub.o' failed

Each individual commit needs to be buildable. Since it's simply an 
incompatible API change, I guess there's no other way than updating the 
test app(s) in the same commit as the library. The other alternative 
would be temporarily disabling the test app(s) in the previous commit 
but that doesn't seem any better to me.

- Panu -


[dpdk-dev] [PATCH v6 1/5] lib/librte_ether: change function name of tunnel port config

2016-03-03 Thread Panu Matilainen
On 03/03/2016 03:22 AM, Wenzhuo Lu wrote:
> The names of function for tunnel port configuration are not
> accurate. They're tunnel_add/del, better change them to
> tunnel_port_add/del.
> As it may be an ABI change if change the names directly, the
> new functions are added but not remove the old ones. The old
> ones will be removed in the next release after an ABI change
> announcement.
>
> Signed-off-by: Wenzhuo Lu 
> ---
[...]
> diff --git a/lib/librte_ether/rte_ether_version.map 
> b/lib/librte_ether/rte_ether_version.map
> index d8db24d..5122217 100644
> --- a/lib/librte_ether/rte_ether_version.map
> +++ b/lib/librte_ether/rte_ether_version.map
> @@ -114,6 +114,8 @@ DPDK_2.2 {
>   rte_eth_tx_queue_setup;
>   rte_eth_xstats_get;
>   rte_eth_xstats_reset;
> + rte_eth_dev_udp_tunnel_port_add;
> + rte_eth_dev_udp_tunnel_port_delete;
>
>   local: *;
>   };

These symbols were not present in DPDK 2.2, hence they don't belong in 
that section. You need to declare a new version section, see 
http://dpdk.org/browse/dpdk/commit/?id=c2189745c38d944e3b0e0c99066d67d7bc7e7744 
for an example.

- Panu -




[dpdk-dev] [PATCH v3 1/2] librte_pipeline: add support for packet redirection at action handlers

2016-03-03 Thread Singh, Jasvinder


> -Original Message-
> From: Panu Matilainen [mailto:pmatilai at redhat.com]
> Sent: Thursday, March 3, 2016 9:35 AM
> To: Singh, Jasvinder ; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 1/2] librte_pipeline: add support for
> packet redirection at action handlers
> 
> On 03/02/2016 10:41 PM, Jasvinder Singh wrote:
> > Currently, there is no mechanism that allows the pipeline ports
> > (in/out) and table action handlers to override the default forwarding
> > decision (as previously configured per input port or in the table
> > entry). Therefore, new pipeline API functions have been added which
> > allows action handlers to hijack packets and remove them from the
> > pipeline processing, and then either drop them or send them out of the
> > pipeline on any output port. The port
> > (in/out) and table action handler prototypes have been changed for
> > making use of these new API functions. This feature will be helpful to
> > implement functions such as exception handling (e.g. TTL =0), load
> balancing etc.
> >
> > Signed-off-by: Jasvinder Singh 
> > Acked-by: Cristian Dumitrescu 
> > ---
> > v3
> > * improved comments in "rte_pipeline.h"
> >
> > v2
> > * rebased on master
> >
> >   doc/guides/rel_notes/deprecation.rst |   5 -
> >   doc/guides/rel_notes/release_16_04.rst   |   6 +-
> >   lib/librte_pipeline/Makefile |   4 +-
> >   lib/librte_pipeline/rte_pipeline.c   | 461 
> > ++-
> >   lib/librte_pipeline/rte_pipeline.h   | 174 ++
> >   lib/librte_pipeline/rte_pipeline_version.map |   8 +
> >   6 files changed, 362 insertions(+), 296 deletions(-)
> >
> [...]
> 
> This causes a build failure:
> 
> == Build app/test-pipeline
>CC pipeline_stub.o
> /srv/work/repos/dpdk/app/test-pipeline/pipeline_stub.c: In function
> ‘app_main_loop_worker_pipeline_stub’:
> /srv/work/repos/dpdk/app/test-pipeline/pipeline_stub.c:97:4: error:
> unknown field ‘f_action_bulk’ specified in initializer
>  .f_action_bulk = NULL,
>  ^
> /srv/work/repos/dpdk/mk/internal/rte.compile-pre.mk:126: recipe for
> target 'pipeline_stub.o' failed
> 
> Each individual commit needs to be buildable. Since its simply an
> incompatible API change, I guess there's no other way than updating the test
> app(s) in the same commit as the library. The other alternative would be
> temporarily disabling the test app(s) in the previous commit but that doesn't
> seem any better to me.
> 
I would prefer the first suggestion: merge both commits (library and app) into
one (library + app) so that all the build dependencies are resolved.

Thanks, 
Jasvinder
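
To illustrate why the library and the test apps have to change in the same commit, here is a minimal sketch. The struct and function names are hypothetical stand-ins, not the real rte_pipeline definitions; the point is only that a designated initializer naming a removed member stops compiling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical action handler type, for illustration only. */
typedef int (*demo_action_fn)(void *pkt, void *arg);

/* Hypothetical stand-in for an output-port parameter struct after the
 * API change: the bulk handler member no longer exists. */
struct demo_port_out_params {
	const void *ops;
	void *arg_create;
	demo_action_fn f_action;
	/* demo_action_fn f_action_bulk;   removed by the library change */
	void *arg_ah;
};

/* Builds the parameters the way an updated application would. */
static struct demo_port_out_params
demo_make_params(const void *ops, void *arg_create)
{
	assert(ops != NULL);

	struct demo_port_out_params p = {
		.ops = ops,
		.arg_create = arg_create,
		.f_action = NULL,
		/* .f_action_bulk = NULL,   would now fail: unknown field */
		.arg_ah = NULL,
	};
	return p;
}
```

Any caller that still names the removed member breaks at compile time, so a commit that changes only the library leaves the tree unbuildable until the callers are updated.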


[dpdk-dev] [PATCH] e1000: fix setting of VF MAC address

2016-03-03 Thread Iremonger, Bernard
Hi Wenzhuo,

> -----Original Message-----
> From: Lu, Wenzhuo
> Sent: Thursday, March 3, 2016 2:34 AM
> To: Iremonger, Bernard ; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] e1000: fix setting of VF MAC address
> 
> Hi Bernard,
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bernard
> Iremonger
> > Sent: Thursday, March 3, 2016 12:09 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH] e1000: fix setting of VF MAC address
> >
> > Allow reprogramming of the RAR with a zero mac address, to ensure that
> > the VF traffic goes to the PF after stop, close and detach of the VF.
> >
> > Fixes: be2d648a2dd3 ("igb: add PF support")
> > Fixes: d82170d27918 ("igb: add VF support")
> > Signed-off-by: Bernard Iremonger 
> > ---
> >  drivers/net/e1000/igb_ethdev.c | 12 +++-
> >  drivers/net/e1000/igb_pf.c |  8 +---
> >  2 files changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/e1000/igb_ethdev.c
> > b/drivers/net/e1000/igb_ethdev.c index 4ed5e95..f1044b7 100644
> > --- a/drivers/net/e1000/igb_ethdev.c
> > +++ b/drivers/net/e1000/igb_ethdev.c
> > @@ -1,7 +1,7 @@
> >  /*-
> >   *   BSD LICENSE
> >   *
> > - *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
> > + *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
> >   *   All rights reserved.
> >   *
> >   *   Redistribution and use in source and binary forms, with or without
> > @@ -2819,6 +2819,7 @@ igbvf_dev_close(struct rte_eth_dev *dev)
> > struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data-
> > >dev_private);
> > struct e1000_adapter *adapter =
> > E1000_DEV_PRIVATE(dev->data->dev_private);
> > +   struct ether_addr addr;
> >
> > PMD_INIT_FUNC_TRACE();
> >
> > @@ -2827,6 +2828,15 @@ igbvf_dev_close(struct rte_eth_dev *dev)
> > igbvf_dev_stop(dev);
> > adapter->stopped = 1;
> > igb_dev_free_queues(dev);
> > +
> > +   /**
> > +* reprogram the RAR with a zero mac address,
> > +* to ensure that the VF traffic goes to the PF
> > +* after stop, close and detach of the VF.
> > +**/
> > +
> > +   memset(&addr, 0, sizeof(addr));
> > +   igbvf_default_mac_addr_set(dev, &addr);
> >  }
> >
> >  static int igbvf_set_vfta(struct e1000_hw *hw, uint16_t vid, bool on)
> > diff --git a/drivers/net/e1000/igb_pf.c b/drivers/net/e1000/igb_pf.c
> > index
> > 1d00dda..95204e9 100644
> > --- a/drivers/net/e1000/igb_pf.c
> > +++ b/drivers/net/e1000/igb_pf.c
> > @@ -1,7 +1,7 @@
> >  /*-
> >   *   BSD LICENSE
> >   *
> > - *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
> > + *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
> >   *   All rights reserved.
> >   *
> >   *   Redistribution and use in source and binary forms, with or without
> > @@ -332,8 +332,10 @@ igb_vf_set_mac_addr(struct rte_eth_dev *dev,
> > uint32_t vf, uint32_t *msgbuf)
> > int rar_entry = hw->mac.rar_entry_count - (vf + 1);
> > uint8_t *new_mac = (uint8_t *)(&msgbuf[1]);
> >
> > -   if (is_valid_assigned_ether_addr((struct ether_addr*)new_mac)) {
> > -   rte_memcpy(vfinfo[vf].vf_mac_addresses, new_mac, 6);
> > +   if (is_unicast_ether_addr((struct ether_addr *)new_mac)) {
> > +   if (!is_zero_ether_addr((struct ether_addr *)new_mac))
> > +   rte_memcpy(vfinfo[vf].vf_mac_addresses,
> new_mac,
> > +   sizeof(vfinfo[vf].vf_mac_addresses));
> > hw->mac.ops.rar_set(hw, new_mac, rar_entry);
> If the new MAC is 0, then after this the RAR is 0, but the address stored in
> vfinfo is unchanged and certainly not 0. Right?
> So they are not aligned with each other. Could that cause a problem?
> 
> > return 0;
> > }
> > --
> > 2.6.3

vfinfo[vf].vf_mac_addresses contains the perm_addr MAC address.
I do not want to overwrite this with a zero address. If the VF is attached
again, it uses the perm_addr MAC address.
The following sequence works fine for me in testpmd, where the VF is port 1:

Testpmd> port stop 1
Testpmd> port close 1   /* VF MAC address is set to zero here */
Testpmd> port detach 1
Testpmd> port attach :04:10.0  
Testpmd> show port info 1   /* VF MAC address is perm_addr MAC address */

Regards,

Bernard.
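
The behaviour described above can be sketched in isolation as follows. This is a simplified model, not the igb PF code: struct demo_vf and the demo_* helpers are hypothetical names, and the "rar" array merely stands in for the hardware receive address register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ADDR_LEN 6

/* Hypothetical per-VF state: the permanent address the PF remembers and
 * the address currently programmed into the receive address register. */
struct demo_vf {
	uint8_t perm_addr[DEMO_ADDR_LEN];
	uint8_t rar[DEMO_ADDR_LEN];
};

static bool demo_is_zero(const uint8_t *a)
{
	static const uint8_t zero[DEMO_ADDR_LEN];

	return memcmp(a, zero, DEMO_ADDR_LEN) == 0;
}

/* Unicast addresses have the group bit (LSB of the first byte) clear. */
static bool demo_is_unicast(const uint8_t *a)
{
	return (a[0] & 0x01) == 0;
}

/* Returns 0 on success, -1 if the address is rejected. */
static int demo_vf_set_mac(struct demo_vf *vf, const uint8_t *new_mac)
{
	assert(vf != NULL && new_mac != NULL);

	if (!demo_is_unicast(new_mac))
		return -1;

	/* Remember only real addresses; a zero address must not clobber
	 * the stored permanent address. */
	if (!demo_is_zero(new_mac))
		memcpy(vf->perm_addr, new_mac, DEMO_ADDR_LEN);

	/* The register is always reprogrammed, even with zero. */
	memcpy(vf->rar, new_mac, DEMO_ADDR_LEN);
	return 0;
}
```

In this model the register and the stored address deliberately diverge after a zero write, so that a later re-attach can restore the permanent address.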



[dpdk-dev] [PATCH 1/3] kcp: add kernel control path kernel module

2016-03-03 Thread Ferruh Yigit
On 3/3/2016 8:31 AM, Panu Matilainen wrote:
> On 03/03/2016 12:35 AM, Thomas Monjalon wrote:
>> 2016-03-02 12:21, Thomas Monjalon:
>>> 2016-03-02 11:47, Vincent JARDIN:
 On 02/03/2016 09:27, Panu Matilainen wrote:
>>> I'd like to see these be merged.
>>>
>>> Jay
>>
>> The code is really not ready. I am okay with cooperative development
>> but the current code needs to go into a staging type tree.
>> No compatibility, no ABI guarantees, more of an RFC.
>> Don't want vendors building products with it then screaming when it
>> gets rebuilt/reworked/scrapped.
>>
>
> Exactly.

 +1 too

 We need to build on this innovation while there is a path for kernel
 mainstream. The logic of using a staging is a good one.

 Thomas,

 can we open a staging folder into the DPDK like it is done into the
 kernel?
>>>
>>> It's possible to create a staging directory if everybody agree.
>>> It is important to state in a README file or in the doc/ that
>>> there will be no guarantee (no stable ABI, no validation and can be
>>> dropped)
>>> and that it is a work in progress, a suggestion to discuss with the
>>> kernel
>>> community.
>>>
>>> The kernel modules must clearly target an upstream integration.
>>
>> Actually the examples directory has been used as a staging for ethtool
>> and
>> lthread. We also have the crypto API which is still experimental.
>> So I think we must decide among these 3 solutions:
>> - no special directory, just mark and document an experimental state
>> - put only kcp/kdp in the staging directory
>> - put kcp/kdp in staging and move other experimental libs here
> 
> To answer this, I think we need to start by clarifying the kernel module
> situation. Quoting you from
> http://dpdk.org/ml/archives/dev/2016-January/032263.html:
> 
>> Sorry the kernel module party is over.
>> One day, igb_uio will be removed.
>> I suggest to make a first version without interrupt support
>> and work with Linux community to fix your issues.
> 
> This to me reads "no more out-of-tree kernel modules, period" but here
> we are discussing the fate of another one.
> 
> If the policy truly is "no more kernel modules" (which I would fully
> back and applaud) then I think there's little to discuss - if the
> destination is kernel upstream then why should the modules pass through
> the dpdk codebase? Put it in another repo on dpdk.org, advertise it,
> make testing it as easy as possible and all (like have it integrate with
> dpdk makefiles if needed) instead.
> 
Hi Panu,

I just want to remind everyone that these modules are meant to replace the
existing KNI kernel module and to reduce its maintenance cost.
We are not adding new kernel modules for new features.

I believe replacing the KNI module with new code in DPDK is a required
improvement step. But before replacing it, KNI users should verify the new code.

Going directly from KNI to Linux upstream, if possible at all, is not easy.
Upstreaming should be done in incremental steps.

How about the following steps:
1- Add KCP/KDP with an EXPERIMENTAL flag.
2- When they are mature enough, remove KNI, remove EXPERIMENTAL from
KCP/KDP.
3- Work on upstreaming

Thanks,
ferruh

> The difference with crypto API and ethtool is different in that the
> destination for them clearly is dpdk itself. I would like to see
> experimental code moved to a separate (staging or whatever) directory
> (or a repo/git submodule) to make the situation absolutely clear. Or a
> repo/git submodule or such. I also still think experimental features
> should not be enabled by default in the configs, no other project that I
> know of does that, but that's another discussion.
> 
> - Panu -



[dpdk-dev] [PATCH] nfp: tx checksum offload fixes

2016-03-03 Thread Alejandro Lucero
Signed-off-by: Alejandro Lucero 
---
 drivers/net/nfp/nfp_net.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index fd4dd39..6078e9f 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -1522,7 +1522,7 @@ static inline void
 nfp_net_tx_cksum(struct nfp_net_txq *txq, struct nfp_net_tx_desc *txd,
 struct rte_mbuf *mb)
 {
-   uint16_t ol_flags;
+   uint64_t ol_flags;
struct nfp_net_hw *hw = txq->hw;

if (!(hw->cap & NFP_NET_CFG_CTRL_TXCSUM))
@@ -1543,7 +1543,8 @@ nfp_net_tx_cksum(struct nfp_net_txq *txq, struct 
nfp_net_tx_desc *txd,
break;
}

-   txd->flags |= PCIE_DESC_TX_CSUM;
+   if (ol_flags & (PKT_TX_IP_CKSUM | PKT_TX_L4_MASK))
+   txd->flags |= PCIE_DESC_TX_CSUM;
 }

 /* nfp_net_rx_cksum - set mbuf checksum flags based on RX descriptor flags */
-- 
1.7.9.5



[dpdk-dev] [PATCH 1/3] kcp: add kernel control path kernel module

2016-03-03 Thread Ferruh Yigit
On 3/2/2016 10:18 PM, Jay Rolette wrote:
> 
> On Tue, Mar 1, 2016 at 8:02 PM, Stephen Hemminger
> <stephen at networkplumber.org> wrote:
> 
> On Mon, 29 Feb 2016 08:33:25 -0600
> Jay Rolette <rolette at infiniteio.com> wrote:
> 
> > On Mon, Feb 29, 2016 at 5:06 AM, Thomas Monjalon
> > <thomas.monjalon at 6wind.com> wrote:
> >
> > > Hi,
> > > I totally agree with Avi's comments.
> > > This topic is really important for the future of DPDK.
> > > So I think we must give some time to continue the discussion
> > > and have netdev involved in the choices done.
> > > As a consequence, these series should not be merged in the
> release 16.04.
> > > Thanks for continuing the work.
> > >
> >
> > I know you guys are very interested in getting rid of the out-of-tree
> > drivers, but please do not block incremental improvements to DPDK
> in the
> > meantime. Ferruh's patch improves the usability of KNI. Don't
> throw out
> > good and useful enhancements just because it isn't where you want
> to be in
> > the end.
> >
> > I'd like to see these be merged.
> >
> > Jay
> 
> The code is really not ready. I am okay with cooperative development
> but the current code needs to go into a staging type tree.
> No compatibility, no ABI guarantees, more of an RFC.
> Don't want vendors building products with it then screaming when it
> gets rebuilt/reworked/scrapped.
> 
> 
> That's fair. To be clear, it wasn't my intent for code that wasn't baked
> yet to be merged. 
> 
> The main point of my comment was that I think it is important not to
> halt incremental improvements to existing capabilities (KNI in this
> case) just because there are philosophical or directional changes that
> the community would like to make longer-term.
> 
> Bird in the hand vs. two in the bush...
> 

There are two different statements here. The first, that the code is not
ready, is a fair point (although no argument was given for that statement,
which makes it hard to discuss, so I will put it aside). It implies that
when the code is ready it can go into the repo.

But refusing kernel modules altogether, regardless of how they compare to
what they are trying to replace, is something else. And that won't help
with the KNI-related problems.

Thanks,
ferruh



[dpdk-dev] [PATCH 1/3] kcp: add kernel control path kernel module

2016-03-03 Thread Thomas Monjalon
2016-03-03 10:05, Ferruh Yigit:
> On 3/3/2016 8:31 AM, Panu Matilainen wrote:
> > On 03/03/2016 12:35 AM, Thomas Monjalon wrote:
> >> 2016-03-02 12:21, Thomas Monjalon:
> >>> 2016-03-02 11:47, Vincent JARDIN:
>  On 02/03/2016 09:27, Panu Matilainen wrote:
> >>> I'd like to see these be merged.
> >>>
> >>> Jay
> >>
> >> The code is really not ready. I am okay with cooperative development
> >> but the current code needs to go into a staging type tree.
> >> No compatibility, no ABI guarantees, more of an RFC.
> >> Don't want vendors building products with it then screaming when it
> >> gets rebuilt/reworked/scrapped.
> >>
> >
> > Exactly.
> 
>  +1 too
> 
>  We need to build on this innovation while there is a path for kernel
>  mainstream. The logic of using a staging is a good one.
> 
>  Thomas,
> 
>  can we open a staging folder into the DPDK like it is done into the
>  kernel?
> >>>
> >>> It's possible to create a staging directory if everybody agree.
> >>> It is important to state in a README file or in the doc/ that
> >>> there will be no guarantee (no stable ABI, no validation and can be
> >>> dropped)
> >>> and that it is a work in progress, a suggestion to discuss with the
> >>> kernel
> >>> community.
> >>>
> >>> The kernel modules must clearly target an upstream integration.
> >>
> >> Actually the examples directory has been used as a staging for ethtool
> >> and
> >> lthread. We also have the crypto API which is still experimental.
> >> So I think we must decide among these 3 solutions:
> >> - no special directory, just mark and document an experimental state
> >> - put only kcp/kdp in the staging directory
> >> - put kcp/kdp in staging and move other experimental libs here
> > 
> > To answer this, I think we need to start by clarifying the kernel module
> > situation. Quoting you from
> > http://dpdk.org/ml/archives/dev/2016-January/032263.html:
> > 
> >> Sorry the kernel module party is over.
> >> One day, igb_uio will be removed.
> >> I suggest to make a first version without interrupt support
> >> and work with Linux community to fix your issues.
> > 
> > This to me reads "no more out-of-tree kernel modules, period" but here
> > we are discussing the fate of another one.
> > 
> > If the policy truly is "no more kernel modules" (which I would fully
> > back and applaud) then I think there's little to discuss - if the
> > destination is kernel upstream then why should the modules pass through
> > the dpdk codebase? Put it in another repo on dpdk.org, advertise it,
> > make testing it as easy as possible and all (like have it integrate with
> > dpdk makefiles if needed) instead.
> > 
> Hi Panu,
> 
> I just want to remind everyone that these modules are meant to replace the
> existing KNI kernel module and to reduce its maintenance cost.
> We are not adding new kernel modules for new features.
> 
> I believe replacing the KNI module with new code in DPDK is a required
> improvement step. But before replacing it, KNI users should verify the new code.
> 
> Going directly from KNI to Linux upstream, if possible at all, is not easy.
> Upstreaming should be done in incremental steps.
> 
> How about the following steps:
> 1- Add KCP/KDP with an EXPERIMENTAL flag.
> 2- When they are mature enough, remove KNI, remove EXPERIMENTAL from
> KCP/KDP.
> 3- Work on upstreaming

What about working with upstream early (step 3 before 2)?
KNI is not so nice but it was advertised and used.
If we want to advertise a replacement, it must be approved by upstream.
We need some stable and widely adopted interfaces to bring more confidence
in the project.


[dpdk-dev] [PATCH] nfp: tx checksum offload fixes

2016-03-03 Thread Thomas Monjalon
Hi Alejandro,

Please start the title with a verb (fix here),
describe briefly the bug,
and add a Fixes: tag.

More info in this doc:
http://dpdk.org/doc/guides/contributing/patches.html#commit-messages-subject-line

2016-03-03 10:08, Alejandro Lucero:
> Signed-off-by: Alejandro Lucero 
> ---
>  drivers/net/nfp/nfp_net.c |5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)



[dpdk-dev] [PATCH 1/7] lib/librte_ether: Add 2/2.5/25/50Gbps link speeds

2016-03-03 Thread Simon Kågström
On 2016-03-03 10:28, Thomas Monjalon wrote:
> 2016-03-03 08:53, Simon Kågström:

>> I realize this is a more general question, but is it really meaningful
>> to have macros for all possible link speeds? We're working on a PMD
>> driver with a channelized interface exposed as DPDK ports. Each channel
>> can be configured with an arbitrary speed, so e.g., 1337 Mbps is also
>> possible.
> 
> What is the benefit? Why not negotiate the maximum capability of the peer?

Communication is channelized over a backplane, and each channel has a
specific (and configurable) capacity.

>> To me, it would seem more natural to just have a number in mbits for the
>> link speed.
> 
> Please jump in the thread initiated by Marc Sune months ago.

OK. I haven't been following the DPDK mailing list for a while, so I
wasn't aware of this.

// Simon
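
A small sketch of the point being made here, under stated assumptions: the table of fixed speeds below is purely illustrative (it is not the list of speed macros in librte_ether), and demo_speed_has_fixed_entry is a hypothetical helper. It shows that an arbitrary channel capacity such as 1337 Mbps fits a plain integer in Mbps but not a closed set of predefined values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative closed set of speeds, in Mbps. Not the real macro list. */
static const uint32_t demo_fixed_speeds_mbps[] = {
	10, 100, 1000, 2500, 10000, 25000, 40000, 50000, 100000,
};

/* Returns 1 if the speed matches one of the predefined entries. */
static int demo_speed_has_fixed_entry(uint32_t mbps)
{
	const size_t n = sizeof(demo_fixed_speeds_mbps) /
			 sizeof(demo_fixed_speeds_mbps[0]);

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		if (demo_fixed_speeds_mbps[i] == mbps)
			return 1;
	return 0;
}
```

With a numeric field, a channelized port could report 1337 directly, whereas with a closed set the driver would have to round to the nearest predefined value.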



[dpdk-dev] [PATCH v3] mk: stop on warning only in developer build

2016-03-03 Thread Thomas Monjalon
2016-03-02 22:04, Bruce Richardson:
> On Wed, Mar 02, 2016 at 03:22:23PM +0100, Thomas Monjalon wrote:
> > From: Panu Matilainen 
> > 
> > Add RTE_DEVEL_BUILD make-variable which can be used to do things
> > differently when doing development vs building a release,
> > autodetected from source root .git presence and overridable via
> > commandline. It is used to enable the -Werror compiler flag and may
> > be extended to other checks.
> > 
> > Failing the build on warnings is a useful developer tool but it's bad
> > for release tarballs which can and do get built with newer
> > compilers than what was used/available during development. Compilers
> > routinely add new warnings so code which built silently with cc X
> > might no longer do so with X+1. This doesn't make the existing code
> > any buggier and failing the build in this case does not help
> > to improve the quality of an already released version either.
> > 
> > This changes the default flags, which can be tuned with EXTRA_CFLAGS.
> > 
> > Signed-off-by: Panu Matilainen 
> > Signed-off-by: Thomas Monjalon 
> 
> Acked-by: Bruce Richardson 

Applied, thanks

If people have problems compiling previous releases,
please remind them to use EXTRA_CFLAGS=-Wno-error


[dpdk-dev] [PATCH] mk: fix error message

2016-03-03 Thread Thomas Monjalon
2016-03-02 16:52, Thomas Monjalon:
> When specifying a wrong directory with RTE_SDK and RTE_TARGET
> to build an application, the error message about missing config
> file was wrong.
> 
> Fixes: 6b62a72a70d0 ("mk: install a standard cutomizable tree")
> 
> Reported-by: Steeven Lee 
> Signed-off-by: Thomas Monjalon 

Applied


[dpdk-dev] [PATCH] eal: fix symbol map version number

2016-03-03 Thread Thomas Monjalon
> The version 2.3 has been renamed 16.04.
> 
> Fixes: 6d7de6d2e357 ("version: switch to year.month numbers")
> 
> Reported-by: Panu Matilainen 
> Signed-off-by: Thomas Monjalon 

Applied


[dpdk-dev] [PATCH 1/3] kcp: add kernel control path kernel module

2016-03-03 Thread Panu Matilainen
On 03/03/2016 12:05 PM, Ferruh Yigit wrote:
> On 3/3/2016 8:31 AM, Panu Matilainen wrote:
>> On 03/03/2016 12:35 AM, Thomas Monjalon wrote:
>>> 2016-03-02 12:21, Thomas Monjalon:
 2016-03-02 11:47, Vincent JARDIN:
> On 02/03/2016 09:27, Panu Matilainen wrote:
 I'd like to see these be merged.

 Jay
>>>
>>> The code is really not ready. I am okay with cooperative development
>>> but the current code needs to go into a staging type tree.
>>> No compatibility, no ABI guarantees, more of an RFC.
>>> Don't want vendors building products with it then screaming when it
>>> gets rebuilt/reworked/scrapped.
>>>
>>
>> Exactly.
>
> +1 too
>
> We need to build on this innovation while there is a path for kernel
> mainstream. The logic of using a staging is a good one.
>
> Thomas,
>
> can we open a staging folder into the DPDK like it is done into the
> kernel?

 It's possible to create a staging directory if everybody agree.
 It is important to state in a README file or in the doc/ that
 there will be no guarantee (no stable ABI, no validation and can be
 dropped)
 and that it is a work in progress, a suggestion to discuss with the
 kernel
 community.

 The kernel modules must clearly target an upstream integration.
>>>
>>> Actually the examples directory has been used as a staging for ethtool
>>> and
>>> lthread. We also have the crypto API which is still experimental.
>>> So I think we must decide among these 3 solutions:
>>>  - no special directory, just mark and document an experimental state
>>>  - put only kcp/kdp in the staging directory
>>>  - put kcp/kdp in staging and move other experimental libs here
>>
>> To answer this, I think we need to start by clarifying the kernel module
>> situation. Quoting you from
>> http://dpdk.org/ml/archives/dev/2016-January/032263.html:
>>
>>> Sorry the kernel module party is over.
>>> One day, igb_uio will be removed.
>>> I suggest to make a first version without interrupt support
>>> and work with Linux community to fix your issues.
>>
>> This to me reads "no more out-of-tree kernel modules, period" but here
>> we are discussing the fate of another one.
>>
>> If the policy truly is "no more kernel modules" (which I would fully
>> back and applaud) then I think there's little to discuss - if the
>> destination is kernel upstream then why should the modules pass through
>> the dpdk codebase? Put it in another repo on dpdk.org, advertise it,
>> make testing it as easy as possible and all (like have it integrate with
>> dpdk makefiles if needed) instead.
>>
> Hi Panu,
>
> I just want to remind everyone that these modules are meant to replace the
> existing KNI kernel module and to reduce its maintenance cost.
> We are not adding new kernel modules for new features.
>
> I believe replacing the KNI module with new code in DPDK is a required
> improvement step. But before replacing it, KNI users should verify the new code.
>
> Going directly from KNI to Linux upstream, if possible at all, is not easy.
> Upstreaming should be done in incremental steps.
>
> How about the following steps:
> 1- Add KCP/KDP with an EXPERIMENTAL flag.
> 2- When they are mature enough, remove KNI, remove EXPERIMENTAL from
> KCP/KDP.
> 3- Work on upstreaming

And if upstream says no, as they just as well might? You're one step 
forward, two steps back.

You need to engage upstream NOW, as has been suggested in this thread 
several times already.

- Panu -


[dpdk-dev] [PATCH v3] mk: stop on warning only in developer build

2016-03-03 Thread Panu Matilainen
On 03/03/2016 12:36 PM, Thomas Monjalon wrote:
> 2016-03-02 22:04, Bruce Richardson:
>> On Wed, Mar 02, 2016 at 03:22:23PM +0100, Thomas Monjalon wrote:
>>> From: Panu Matilainen 
>>>
>>> Add RTE_DEVEL_BUILD make-variable which can be used to do things
>>> differently when doing development vs building a release,
>>> autodetected from source root .git presence and overridable via
>>> commandline. It is used to enable the -Werror compiler flag and may
>>> be extended to other checks.
>>>
>>> Failing the build on warnings is a useful developer tool but it's bad
>>> for release tarballs which can and do get built with newer
>>> compilers than what was used/available during development. Compilers
>>> routinely add new warnings so code which built silently with cc X
>>> might no longer do so with X+1. This doesn't make the existing code
>>> any buggier and failing the build in this case does not help
>>> to improve the quality of an already released version either.
>>>
>>> This changes the default flags, which can be tuned with EXTRA_CFLAGS.
>>>
>>> Signed-off-by: Panu Matilainen 
>>> Signed-off-by: Thomas Monjalon 
>>
>> Acked-by: Bruce Richardson 
>
> Applied, thanks

Thanks for dusting this up, I'd pretty much forgotten the whole thing.

- Panu -




[dpdk-dev] [PATCH v4] librte_pipeline: add support for packet redirection at action handlers

2016-03-03 Thread Jasvinder Singh
Currently, there is no mechanism that allows the pipeline ports (in/out) and
table action handlers to override the default forwarding decision (as
previously configured per input port or in the table entry). Therefore, new
pipeline API functions have been added which allow action handlers to
hijack packets and remove them from the pipeline processing, and then either
drop them or send them out of the pipeline on any output port. The port
(in/out) and table action handler prototypes have been changed to make
use of these new API functions. This feature will be helpful to implement
functions such as exception handling (e.g. TTL = 0), load balancing etc.
Changes are made to the port and table action handlers defined in
app/test-pipeline and the ip_pipeline sample application.

Signed-off-by: Jasvinder Singh 
Acked-by: Cristian Dumitrescu 
---
v4
* merged library and app commits

v3
* improved comments in "rte_pipeline.h"

v2
* rebased on master
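
For readers skimming the thread, the hijack-then-drop idea from the description can be modelled with a plain bitmask. This is only a sketch under assumptions: struct demo_burst and the demo_* functions are hypothetical and do not reproduce the rte_pipeline API or its internal data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one burst: bit i of pkts_mask is set while
 * packet i is still owned by the pipeline. */
struct demo_burst {
	uint8_t ttl[64];
	uint64_t pkts_mask;
	uint64_t dropped_mask;
};

/* Take the selected packets out of normal pipeline processing. */
static void demo_hijack(struct demo_burst *b, uint64_t mask)
{
	b->pkts_mask &= ~mask;
}

/* Action handler sketch: hijack and drop every packet whose TTL is 0.
 * Returns the mask of packets it removed. */
static uint64_t demo_ah_drop_ttl_zero(struct demo_burst *b)
{
	assert(b != NULL);

	uint64_t expired = 0;

	for (unsigned int i = 0; i < 64; i++)
		if ((b->pkts_mask & (1ULL << i)) && b->ttl[i] == 0)
			expired |= 1ULL << i;

	demo_hijack(b, expired);
	b->dropped_mask |= expired;
	return expired;
}
```

The remaining bits in pkts_mask are the packets that continue with the default forwarding decision; a handler could equally send the hijacked packets to another output port instead of dropping them.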

 app/test-pipeline/pipeline_acl.c   |   3 +-
 app/test-pipeline/pipeline_hash.c  |   3 +-
 app/test-pipeline/pipeline_lpm.c   |   3 +-
 app/test-pipeline/pipeline_lpm_ipv6.c  |   3 +-
 app/test-pipeline/pipeline_stub.c  |   3 +-
 doc/guides/rel_notes/deprecation.rst   |   5 -
 doc/guides/rel_notes/release_16_04.rst |   6 +-
 .../ip_pipeline/pipeline/pipeline_actions_common.h |  47 ++-
 .../ip_pipeline/pipeline/pipeline_firewall_be.c|   3 +-
 .../pipeline/pipeline_flow_actions_be.c|   3 +-
 .../pipeline/pipeline_flow_classification_be.c |   3 +-
 .../ip_pipeline/pipeline/pipeline_passthrough_be.c |   3 +-
 .../ip_pipeline/pipeline/pipeline_routing_be.c |   3 +-
 lib/librte_pipeline/Makefile   |   4 +-
 lib/librte_pipeline/rte_pipeline.c | 461 +++--
 lib/librte_pipeline/rte_pipeline.h | 174 +---
 lib/librte_pipeline/rte_pipeline_version.map   |   8 +
 17 files changed, 399 insertions(+), 336 deletions(-)

diff --git a/app/test-pipeline/pipeline_acl.c b/app/test-pipeline/pipeline_acl.c
index f163e55..22d5f36 100644
--- a/app/test-pipeline/pipeline_acl.c
+++ b/app/test-pipeline/pipeline_acl.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -159,7 +159,6 @@ app_main_loop_worker_pipeline_acl(void) {
.ops = &rte_port_ring_writer_ops,
.arg_create = (void *) &port_ring_params,
.f_action = NULL,
-   .f_action_bulk = NULL,
.arg_ah = NULL,
};

diff --git a/app/test-pipeline/pipeline_hash.c 
b/app/test-pipeline/pipeline_hash.c
index 8b888d7..f8aac0d 100644
--- a/app/test-pipeline/pipeline_hash.c
+++ b/app/test-pipeline/pipeline_hash.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -140,7 +140,6 @@ app_main_loop_worker_pipeline_hash(void) {
.ops = &rte_port_ring_writer_ops,
.arg_create = (void *) &port_ring_params,
.f_action = NULL,
-   .f_action_bulk = NULL,
.arg_ah = NULL,
};

diff --git a/app/test-pipeline/pipeline_lpm.c b/app/test-pipeline/pipeline_lpm.c
index 2d7bc01..916abd4 100644
--- a/app/test-pipeline/pipeline_lpm.c
+++ b/app/test-pipeline/pipeline_lpm.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -99,7 +99,6 @@ app_main_loop_worker_pipeline_lpm(void) {
.ops = &rte_port_ring_writer_ops,
.arg_create = (void *) &port_ring_params,
.f_action = NULL,
-   .f_action_bulk = NULL,
.arg_ah = NULL,
};

diff --git a/app/test-pipeline/pipeline_lpm_ipv6.c 
b/app/test-pipeline/pipeline_lpm_ipv6.c
index c895b62..3352e89 100644
--- a/app/test-pipeline/pipeline_lpm_ipv6.c
+++ b/app/test-pipeline/pipeline_lpm_ipv6.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without

[dpdk-dev] [PATCH v2] nfp: fix variable type in tx checksum offload

2016-03-03 Thread Alejandro Lucero
The mbuf ol_flags field was changed to uint64_t with DPDK version 1.8.

Fixes: b812daadad0d ("nfp: add Rx and Tx")

Signed-off-by: Alejandro Lucero 
---
 drivers/net/nfp/nfp_net.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index fd4dd39..0e3705e 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -1522,7 +1522,7 @@ static inline void
 nfp_net_tx_cksum(struct nfp_net_txq *txq, struct nfp_net_tx_desc *txd,
 struct rte_mbuf *mb)
 {
-   uint16_t ol_flags;
+   uint64_t ol_flags;
struct nfp_net_hw *hw = txq->hw;

if (!(hw->cap & NFP_NET_CFG_CTRL_TXCSUM))
-- 
1.7.9.5
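
A minimal sketch of why the variable width matters, assuming offload request bits that sit in the upper part of a 64-bit flags word. The DEMO_* bit positions are illustrative values chosen for this example, not a copy of the rte_mbuf definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions only: offload request flags placed in the
 * upper part of a 64-bit word. */
#define DEMO_TX_L4_MASK  (3ULL << 52)
#define DEMO_TX_IP_CKSUM (1ULL << 54)

/* Buggy variant: the 64-bit flags are narrowed to 16 bits first, so
 * every bit above bit 15 is lost before the test. */
static int demo_wants_cksum_u16(uint64_t mbuf_flags)
{
	uint16_t ol_flags = (uint16_t)mbuf_flags;

	return (ol_flags & (DEMO_TX_IP_CKSUM | DEMO_TX_L4_MASK)) != 0;
}

/* Fixed variant: the local copy keeps all 64 bits. */
static int demo_wants_cksum_u64(uint64_t mbuf_flags)
{
	uint64_t ol_flags = mbuf_flags;

	return (ol_flags & (DEMO_TX_IP_CKSUM | DEMO_TX_L4_MASK)) != 0;
}
```

With the 16-bit local, a checksum request in the high bits can never be seen, whatever the application asked for.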



[dpdk-dev] [PATCH] nfp: fix how tx checksum is advertised to firmware

2016-03-03 Thread Alejandro Lucero
Even with tx checksum offload available, do not set the flag by default.

Fixes: b812daadad0d ("nfp: add Rx and Tx")

Signed-off-by: Alejandro Lucero 
---
 drivers/net/nfp/nfp_net.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index 0e3705e..6078e9f 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -1543,7 +1543,8 @@ nfp_net_tx_cksum(struct nfp_net_txq *txq, struct 
nfp_net_tx_desc *txd,
break;
}

-   txd->flags |= PCIE_DESC_TX_CSUM;
+   if (ol_flags & (PKT_TX_IP_CKSUM | PKT_TX_L4_MASK))
+   txd->flags |= PCIE_DESC_TX_CSUM;
 }

 /* nfp_net_rx_cksum - set mbuf checksum flags based on RX descriptor flags */
-- 
1.7.9.5
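
The intent of this change can be sketched on its own as follows. The descriptor struct, the DEMO_* constants and demo_tx_cksum are hypothetical simplifications, not the nfp driver's definitions; the sketch only shows the descriptor flag being set when the packet actually requests an offload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only, not the driver's or mbuf's definitions. */
#define DEMO_TX_L4_MASK   (3ULL << 52)
#define DEMO_TX_IP_CKSUM  (1ULL << 54)
#define DEMO_DESC_TX_CSUM 0x01

/* Hypothetical, reduced TX descriptor. */
struct demo_tx_desc {
	uint8_t flags;
};

/* Mark the descriptor for checksum offload only when the packet's
 * offload flags request an IP or L4 checksum. */
static void demo_tx_cksum(struct demo_tx_desc *txd, uint64_t ol_flags)
{
	assert(txd != NULL);

	if (ol_flags & (DEMO_TX_IP_CKSUM | DEMO_TX_L4_MASK))
		txd->flags |= DEMO_DESC_TX_CSUM;
}
```

A packet with no offload request leaves the descriptor untouched, so the device is not told to compute a checksum nobody asked for.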



[dpdk-dev] [PATCH v3 0/3] Snow3G support for Intel Quick Assist Devices

2016-03-03 Thread Deepak Kumar JAIN
 This patchset contains fixes and refactoring for the Snow3G (UEA2 and
 UIA2) wireless algorithm for Intel Quick Assist devices.

 The QAT PMD previously supported only cipher/hash alg-chaining for AES/SHA.
 The code has been refactored to also support cipher-only and hash-only
 (for Snow3G only) functionality along with alg-chaining.

 Changes from v2:

 1) Rebased on the patchset mentioned below.

 This patchset depends on
 cryptodev API changes
 http://dpdk.org/ml/archives/dev/2016-February/034212.html
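
As a rough sketch of what splitting session setup into cipher-only, auth-only and chained cases can look like: the enum, the DEMO_* bits and the builder functions below are hypothetical and only model the control flow, not the QAT content descriptor layout or the PMD's real function signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical session command types. */
enum demo_cmd {
	DEMO_CMD_CIPHER,
	DEMO_CMD_AUTH,
	DEMO_CMD_CIPHER_HASH,
};

/* Hypothetical markers for which parts of a descriptor were built. */
#define DEMO_DESC_CIPHER 0x1u
#define DEMO_DESC_AUTH   0x2u

static unsigned int demo_build_cipher(unsigned int desc)
{
	return desc | DEMO_DESC_CIPHER;
}

static unsigned int demo_build_auth(unsigned int desc)
{
	return desc | DEMO_DESC_AUTH;
}

/* Builds only the parts the command needs. Returns 0 on success and
 * -1 for an unknown command. */
static int demo_session_setup(enum demo_cmd cmd, unsigned int *desc)
{
	assert(desc != NULL);

	*desc = 0;
	switch (cmd) {
	case DEMO_CMD_CIPHER:
		*desc = demo_build_cipher(*desc);
		return 0;
	case DEMO_CMD_AUTH:
		*desc = demo_build_auth(*desc);
		return 0;
	case DEMO_CMD_CIPHER_HASH:
		*desc = demo_build_auth(demo_build_cipher(*desc));
		return 0;
	}
	return -1;
}
```

Keeping the cipher and auth builders separate lets the chained case reuse both, instead of one monolithic function that always needs both keys.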

Deepak Kumar JAIN (3):
  crypto: add cipher/auth only support
  qat: add support for Snow3G
  app/test: add Snow3G tests

 app/test/test_cryptodev.c  | 1037 +++-
 app/test/test_cryptodev.h  |3 +-
 app/test/test_cryptodev_snow3g_hash_test_vectors.h |  415 
 app/test/test_cryptodev_snow3g_test_vectors.h  |  379 +++
 doc/guides/cryptodevs/qat.rst  |8 +-
 doc/guides/rel_notes/release_16_04.rst |6 +
 drivers/crypto/qat/qat_adf/qat_algs.h  |   19 +-
 drivers/crypto/qat/qat_adf/qat_algs_build_desc.c   |  280 +-
 drivers/crypto/qat/qat_crypto.c|  149 ++-
 drivers/crypto/qat/qat_crypto.h|   10 +
 10 files changed, 2231 insertions(+), 75 deletions(-)
 create mode 100644 app/test/test_cryptodev_snow3g_hash_test_vectors.h
 create mode 100644 app/test/test_cryptodev_snow3g_test_vectors.h

-- 
2.1.0



[dpdk-dev] [PATCH v3 1/3] crypto: add cipher/auth only support

2016-03-03 Thread Deepak Kumar JAIN
Refactored the existing functionality into
modular form to support the cipher/auth only
functionalities.

Signed-off-by: Deepak Kumar JAIN 
---
 drivers/crypto/qat/qat_adf/qat_algs.h|  18 +-
 drivers/crypto/qat/qat_adf/qat_algs_build_desc.c | 210 ---
 drivers/crypto/qat/qat_crypto.c  | 137 +++
 drivers/crypto/qat/qat_crypto.h  |  10 ++
 4 files changed, 308 insertions(+), 67 deletions(-)

diff --git a/drivers/crypto/qat/qat_adf/qat_algs.h 
b/drivers/crypto/qat/qat_adf/qat_algs.h
index 76c08c0..b73a5d0 100644
--- a/drivers/crypto/qat/qat_adf/qat_algs.h
+++ b/drivers/crypto/qat/qat_adf/qat_algs.h
@@ -3,7 +3,7 @@
  *  redistributing this file, you may do so under either license.
  *
  *  GPL LICENSE SUMMARY
- *  Copyright(c) 2015 Intel Corporation.
+ *  Copyright(c) 2015-2016 Intel Corporation.
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of version 2 of the GNU General Public License as
  *  published by the Free Software Foundation.
@@ -17,7 +17,7 @@
  *  qat-linux at intel.com
  *
  *  BSD LICENSE
- *  Copyright(c) 2015 Intel Corporation.
+ *  Copyright(c) 2015-2016 Intel Corporation.
  *  Redistribution and use in source and binary forms, with or without
  *  modification, are permitted provided that the following conditions
  *  are met:
@@ -104,11 +104,15 @@ struct qat_alg_ablkcipher_cd {

 int qat_get_inter_state_size(enum icp_qat_hw_auth_algo qat_hash_alg);

-int qat_alg_aead_session_create_content_desc(struct qat_session *cd,
-   uint8_t *enckey, uint32_t enckeylen,
-   uint8_t *authkey, uint32_t authkeylen,
-   uint32_t add_auth_data_length,
-   uint32_t digestsize);
+int qat_alg_aead_session_create_content_desc_cipher(struct qat_session *cd,
+   uint8_t *enckey,
+   uint32_t enckeylen);
+
+int qat_alg_aead_session_create_content_desc_auth(struct qat_session *cdesc,
+   uint8_t *authkey,
+   uint32_t authkeylen,
+   uint32_t add_auth_data_length,
+   uint32_t digestsize);

 void qat_alg_init_common_hdr(struct icp_qat_fw_comn_req_hdr *header);

diff --git a/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c 
b/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c
index ceaffb7..bef444b 100644
--- a/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c
+++ b/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c
@@ -3,7 +3,7 @@
  *  redistributing this file, you may do so under either license.
  *
  *  GPL LICENSE SUMMARY
- *  Copyright(c) 2015 Intel Corporation.
+ *  Copyright(c) 2015-2016 Intel Corporation.
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of version 2 of the GNU General Public License as
  *  published by the Free Software Foundation.
@@ -17,7 +17,7 @@
  *  qat-linux at intel.com
  *
  *  BSD LICENSE
- *  Copyright(c) 2015 Intel Corporation.
+ *  Copyright(c) 2015-2016 Intel Corporation.
  *  Redistribution and use in source and binary forms, with or without
  *  modification, are permitted provided that the following conditions
  *  are met:
@@ -359,15 +359,139 @@ void qat_alg_init_common_hdr(struct 
icp_qat_fw_comn_req_hdr *header)
   ICP_QAT_FW_LA_NO_UPDATE_STATE);
 }

-int qat_alg_aead_session_create_content_desc(struct qat_session *cdesc,
-   uint8_t *cipherkey, uint32_t cipherkeylen,
-   uint8_t *authkey, uint32_t authkeylen,
-   uint32_t add_auth_data_length,
-   uint32_t digestsize)
+int qat_alg_aead_session_create_content_desc_cipher(struct qat_session *cdesc,
+   uint8_t *cipherkey,
+   uint32_t cipherkeylen)
 {
-   struct qat_alg_cd *content_desc = &cdesc->cd;
-   struct icp_qat_hw_cipher_algo_blk *cipher = &content_desc->cipher;
-   struct icp_qat_hw_auth_algo_blk *hash = &content_desc->hash;
+   struct icp_qat_hw_cipher_algo_blk *cipher;
+   struct icp_qat_fw_la_bulk_req *req_tmpl = &cdesc->fw_req;
+   struct icp_qat_fw_comn_req_hdr_cd_pars *cd_pars = &req_tmpl->cd_pars;
+   struct icp_qat_fw_comn_req_hdr *header = &req_tmpl->comn_hdr;
+   void *ptr = &req_tmpl->cd_ctrl;
+   struct icp_qat_fw_cipher_cd_ctrl_hdr *cipher_cd_ctrl = ptr;
+   struct icp_qat_fw_auth_cd_ctrl_hdr *hash_cd_ctrl = ptr;
+   enum icp_qat_hw_cipher_convert key_convert;
+   uint16_t proto = ICP_QAT_FW_LA_NO_PROTO;/* no CCM/GCM/Snow3G */
+   uint16_t cipher_offset = 0;
+
+   PM

[dpdk-dev] [PATCH v3 2/3] qat: add support for Snow3G

2016-03-03 Thread Deepak Kumar JAIN
Signed-off-by: Deepak Kumar JAIN 
---
 doc/guides/cryptodevs/qat.rst|  8 ++-
 doc/guides/rel_notes/release_16_04.rst   |  6 ++
 drivers/crypto/qat/qat_adf/qat_algs.h|  1 +
 drivers/crypto/qat/qat_adf/qat_algs_build_desc.c | 86 +---
 drivers/crypto/qat/qat_crypto.c  | 12 +++-
 5 files changed, 100 insertions(+), 13 deletions(-)

diff --git a/doc/guides/cryptodevs/qat.rst b/doc/guides/cryptodevs/qat.rst
index 23402b4..af52047 100644
--- a/doc/guides/cryptodevs/qat.rst
+++ b/doc/guides/cryptodevs/qat.rst
@@ -1,5 +1,5 @@
 ..  BSD LICENSE
-Copyright(c) 2015 Intel Corporation. All rights reserved.
+Copyright(c) 2015-2016 Intel Corporation. All rights reserved.

 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions
@@ -47,6 +47,7 @@ Cipher algorithms:
 * ``RTE_CRYPTO_SYM_CIPHER_AES128_CBC``
 * ``RTE_CRYPTO_SYM_CIPHER_AES192_CBC``
 * ``RTE_CRYPTO_SYM_CIPHER_AES256_CBC``
+* ``RTE_CRYPTO_SYM_CIPHER_SNOW3G_UEA2``

 Hash algorithms:

@@ -54,14 +55,15 @@ Hash algorithms:
 * ``RTE_CRYPTO_AUTH_SHA256_HMAC``
 * ``RTE_CRYPTO_AUTH_SHA512_HMAC``
 * ``RTE_CRYPTO_AUTH_AES_XCBC_MAC``
+* ``RTE_CRYPTO_AUTH_SNOW3G_UIA2``


 Limitations
 ---

 * Chained mbufs are not supported.
-* Hash only is not supported.
-* Cipher only is not supported.
+* Hash only is not supported except Snow3G UIA2.
+* Cipher only is not supported except Snow3G UEA2.
 * Only in-place is currently supported (destination address is the same as 
source address).
 * Only supports the session-oriented API implementation (session-less APIs are 
not supported).
 * Not performance tuned.
diff --git a/doc/guides/rel_notes/release_16_04.rst 
b/doc/guides/rel_notes/release_16_04.rst
index 64e913d..d8ead62 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -35,6 +35,12 @@ This section should contain new features added in this 
release. Sample format:

   Refer to the previous release notes for examples.

+* **Added support of Snow3G (UEA2 and UIA2) for Intel Quick Assist Devices.**
+
+  Enabled support for Snow3g Wireless algorithm for Intel Quick Assist devices.
+  Support for cipher only, Hash only is also provided
+  along with alg-chaining operations.
+
 * **Enabled bulk allocation of mbufs.**

   A new function ``rte_pktmbuf_alloc_bulk()`` has been added to allow the user
diff --git a/drivers/crypto/qat/qat_adf/qat_algs.h 
b/drivers/crypto/qat/qat_adf/qat_algs.h
index b73a5d0..b47dbc2 100644
--- a/drivers/crypto/qat/qat_adf/qat_algs.h
+++ b/drivers/crypto/qat/qat_adf/qat_algs.h
@@ -125,5 +125,6 @@ void qat_alg_ablkcipher_init_dec(struct 
qat_alg_ablkcipher_cd *cd,
unsigned int keylen);

 int qat_alg_validate_aes_key(int key_len, enum icp_qat_hw_cipher_algo *alg);
+int qat_alg_validate_snow3g_key(int key_len, enum icp_qat_hw_cipher_algo *alg);

 #endif
diff --git a/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c 
b/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c
index bef444b..dd27476 100644
--- a/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c
+++ b/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c
@@ -376,7 +376,8 @@ int qat_alg_aead_session_create_content_desc_cipher(struct 
qat_session *cdesc,

PMD_INIT_FUNC_TRACE();

-   if (cdesc->qat_cmd == ICP_QAT_FW_LA_CMD_HASH_CIPHER) {
+   if (cdesc->qat_cmd == ICP_QAT_FW_LA_CMD_HASH_CIPHER &&
+   cdesc->qat_hash_alg != ICP_QAT_HW_AUTH_ALGO_SNOW_3G_UIA2) {
cipher =
(struct icp_qat_hw_cipher_algo_blk *)((char *)&cdesc->cd +
sizeof(struct icp_qat_hw_auth_algo_blk));
@@ -409,13 +410,20 @@ int 
qat_alg_aead_session_create_content_desc_cipher(struct qat_session *cdesc,
else
key_convert = ICP_QAT_HW_CIPHER_KEY_CONVERT;

+   if (cdesc->qat_hash_alg == ICP_QAT_HW_AUTH_ALGO_SNOW_3G_UIA2)
+   key_convert = ICP_QAT_HW_CIPHER_KEY_CONVERT;
+
/* For Snow3G, set key convert and other bits */
if (cdesc->qat_cipher_alg == ICP_QAT_HW_CIPHER_ALGO_SNOW_3G_UEA2) {
key_convert = ICP_QAT_HW_CIPHER_KEY_CONVERT;
ICP_QAT_FW_LA_RET_AUTH_SET(header->serv_specif_flags,
ICP_QAT_FW_LA_NO_RET_AUTH_RES);
-   ICP_QAT_FW_LA_CMP_AUTH_SET(header->serv_specif_flags,
-   ICP_QAT_FW_LA_NO_CMP_AUTH_RES);
+   if (cdesc->qat_cmd == ICP_QAT_FW_LA_CMD_HASH_CIPHER)  {
+   ICP_QAT_FW_LA_RET_AUTH_SET(header->serv_specif_flags,
+   ICP_QAT_FW_LA_RET_AUTH_RES);
+   ICP_QAT_FW_LA_CMP_AUTH_SET(header->serv_specif_flags,
+   ICP_QAT_FW_LA_NO_CMP_AUTH_RES);
+   }
}

cipher->aes.cipher_config.val =
@@ -431,7

[dpdk-dev] [PATCH v3 3/3] app/test: add Snow3G tests

2016-03-03 Thread Deepak Kumar JAIN
Signed-off-by: Deepak Kumar JAIN 
---
 app/test/test_cryptodev.c  | 1037 +++-
 app/test/test_cryptodev.h  |3 +-
 app/test/test_cryptodev_snow3g_hash_test_vectors.h |  415 
 app/test/test_cryptodev_snow3g_test_vectors.h  |  379 +++
 4 files changed, 1831 insertions(+), 3 deletions(-)
 create mode 100644 app/test/test_cryptodev_snow3g_hash_test_vectors.h
 create mode 100644 app/test/test_cryptodev_snow3g_test_vectors.h

diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c
index acba98a..a37018c 100644
--- a/app/test/test_cryptodev.c
+++ b/app/test/test_cryptodev.c
@@ -42,7 +42,8 @@

 #include "test.h"
 #include "test_cryptodev.h"
-
+#include "test_cryptodev_snow3g_test_vectors.h"
+#include "test_cryptodev_snow3g_hash_test_vectors.h"
 static enum rte_cryptodev_type gbl_cryptodev_type;

 struct crypto_testsuite_params {
@@ -68,6 +69,9 @@ struct crypto_unittest_params {
uint8_t *digest;
 };

+#define ALIGN_POW2_ROUNDUP(num, align) \
+   (((num) + (align) - 1) & ~((align) - 1))
+
 /*
  * Forward declarations.
  */
@@ -1748,6 +1752,997 @@ test_AES_CBC_HMAC_AES_XCBC_decrypt_digest_verify(void)
return TEST_SUCCESS;
 }

+/* * Snow3G Tests * */
+static int
+create_snow3g_hash_session(uint8_t dev_id,
+   const uint8_t *key, const uint8_t key_len,
+   const uint8_t aad_len, const uint8_t auth_len,
+   enum rte_crypto_auth_operation op)
+{
+   uint8_t hash_key[key_len];
+
+   struct crypto_unittest_params *ut_params = &unittest_params;
+
+   memcpy(hash_key, key, key_len);
+#ifdef RTE_APP_TEST_DEBUG
+   rte_hexdump(stdout, "key:", key, key_len);
+#endif
+   /* Setup Authentication Parameters */
+   ut_params->auth_xform.type = RTE_CRYPTO_SYM_XFORM_AUTH;
+   ut_params->auth_xform.next = NULL;
+
+   ut_params->auth_xform.auth.op = op;
+   ut_params->auth_xform.auth.algo = RTE_CRYPTO_AUTH_SNOW3G_UIA2;
+   ut_params->auth_xform.auth.key.length = key_len;
+   ut_params->auth_xform.auth.key.data = hash_key;
+   ut_params->auth_xform.auth.digest_length = auth_len;
+   ut_params->auth_xform.auth.add_auth_data_length = aad_len;
+   ut_params->sess = rte_cryptodev_sym_session_create(dev_id,
+   &ut_params->auth_xform);
+   TEST_ASSERT_NOT_NULL(ut_params->sess, "Session creation failed");
+   return 0;
+}
+static int
+create_snow3g_cipher_session(uint8_t dev_id,
+   enum rte_crypto_cipher_operation op,
+   const uint8_t *key, const uint8_t key_len)
+{
+   uint8_t cipher_key[key_len];
+
+   struct crypto_unittest_params *ut_params = &unittest_params;
+
+   memcpy(cipher_key, key, key_len);
+
+   /* Setup Cipher Parameters */
+   ut_params->cipher_xform.type = RTE_CRYPTO_SYM_XFORM_CIPHER;
+   ut_params->cipher_xform.next = NULL;
+
+   ut_params->cipher_xform.cipher.algo = RTE_CRYPTO_CIPHER_SNOW3G_UEA2;
+   ut_params->cipher_xform.cipher.op = op;
+   ut_params->cipher_xform.cipher.key.data = cipher_key;
+   ut_params->cipher_xform.cipher.key.length = key_len;
+
+#ifdef RTE_APP_TEST_DEBUG
+   rte_hexdump(stdout, "key:", key, key_len);
+#endif
+   /* Create Crypto session */
+   ut_params->sess = rte_cryptodev_sym_session_create(dev_id,
+   &ut_params->
+   cipher_xform);
+   TEST_ASSERT_NOT_NULL(ut_params->sess, "Session creation failed");
+   return 0;
+}
+
+static int
+create_snow3g_cipher_operation(const uint8_t *iv, const unsigned iv_len,
+   const unsigned data_len)
+{
+   struct crypto_testsuite_params *ts_params = &testsuite_params;
+   struct crypto_unittest_params *ut_params = &unittest_params;
+   unsigned iv_pad_len = 0;
+
+   /* Generate Crypto op data structure */
+   ut_params->op = rte_crypto_op_alloc(ts_params->op_mpool,
+   RTE_CRYPTO_OP_TYPE_SYMMETRIC);
+   TEST_ASSERT_NOT_NULL(ut_params->op,
+   "Failed to allocate pktmbuf offload");
+
+   /* Set crypto operation data parameters */
+   rte_crypto_op_attach_sym_session(ut_params->op, ut_params->sess);
+
+   struct rte_crypto_sym_op *sym_op = ut_params->op->sym;
+
+   /* set crypto operation source mbuf */
+   sym_op->m_src = ut_params->ibuf;
+
+   /* iv */
+   iv_pad_len = RTE_ALIGN_CEIL(iv_len, 16);
+   sym_op->cipher.iv.data = (uint8_t *)rte_pktmbuf_prepend(ut_params->ibuf
+   , iv_pad_len);
+
+   TEST_ASSERT_NOT_NULL(sym_op->cipher.iv.data, "no room to prepend iv");
+
+   memset(sym_op->cipher.iv.data, 0, iv_pad_len);
+   sym_op->cipher.iv.phys_addr = rte_pktmbuf_mtophys(ut_params->ibuf);
+   sym_op->cipher.iv.length = iv_pad_len;
+
+   rte_memcpy(sym_op->cipher.iv.data, iv, iv_len);
+
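
Note on the IV padding above: both the ALIGN_POW2_ROUNDUP() macro added
by this patch and the RTE_ALIGN_CEIL(iv_len, 16) call round a length up
to the next multiple of a power-of-two alignment. A minimal standalone
sketch of that arithmetic follows; the function name is illustrative
only and is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round num up to the next multiple of align.
 * align must be a nonzero power of two, otherwise the mask trick is invalid.
 */
static size_t
align_pow2_roundup(size_t num, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (num + align - 1) & ~(align - 1);
}
```

For example, a 12-byte IV padded to a 16-byte boundary occupies 16 bytes,
while a 16-byte IV is left unchanged.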

[dpdk-dev] [PATCH v1 0/3] virtio vector and misc

2016-03-03 Thread Thomas Monjalon
2016-03-02 14:11, Santosh Shukla:
> On Wed, Mar 2, 2016 at 2:02 PM, Yuanhan Liu  
> wrote:
> > On Tue, Mar 01, 2016 at 03:32:17PM +0530, Santosh Shukla wrote:
> >> - 1st patch: let non-x86 arch use virtio pmd driver in non-vec
> >> - 2nd patch: enable virtio arm support
> >> - 3rd patch: update virtio for arm feature entry in release guide.
> >
> > Series looks good to me:
> >
> > Acked-by: Yuanhan Liu 
> >
> > However, FYI, Thomas would like the release note to be added
> > __inside__ the patch that actually enables the feature, not in
> > a separate standalone patch. In other words, you should squash
> > patch 3 into patch 2.
> >
> > I'm wondering whether Thomas could do that for you this time while
> > applying your patches. This is just a kind reminder, though: please
> > do not submit it as a separate patch next time.
> >
> 
> Thanks!,
> 
> Thomas, Can you please take care?

I've squashed the patches and reworded the release note a bit.
Applied, thanks.


[dpdk-dev] [PATCH v1 0/3] virtio vector and misc

2016-03-03 Thread Santosh Shukla
On Thu, Mar 3, 2016 at 6:56 PM, Thomas Monjalon
 wrote:
> 2016-03-02 14:11, Santosh Shukla:
>> On Wed, Mar 2, 2016 at 2:02 PM, Yuanhan Liu  
>> wrote:
>> > On Tue, Mar 01, 2016 at 03:32:17PM +0530, Santosh Shukla wrote:
>> >> - 1st patch: let non-x86 arch use virtio pmd driver in non-vec
>> >> - 2nd patch: enable virtio arm support
>> >> - 3rd patch: update virtio for arm feature entry in release guide.
>> >
>> > Series looks good to me:
>> >
>> > Acked-by: Yuanhan Liu 
>> >
>> > However, FYI, Thomas would like the release note to be added
>> > __inside__ the patch that actually enables the feature, not in
>> > a separate standalone patch. In other words, you should squash
>> > patch 3 into patch 2.
>> >
>> > I'm wondering whether Thomas could do that for you this time while
>> > applying your patches. This is just a kind reminder, though: please
>> > do not submit it as a separate patch next time.
>> >
>>
>> Thanks!,
>>
>> Thomas, Can you please take care?
>
> I've squashed the patches and reworded the release note a bit.
> Applied, thanks.

Thank you!


[dpdk-dev] [PATCH v4] af_packet: make the device detachable

2016-03-03 Thread Iremonger, Bernard
> -----Original Message-----
> From: Wojciech Zmuda [mailto:woz at semihalf.com]
> Sent: Wednesday, March 2, 2016 11:56 AM
> To: dev at dpdk.org
> Cc: Iremonger, Bernard ;
> linville at tuxdriver.com; Richardson, Bruce ;
> pmatilai at redhat.com
> Subject: [PATCH v4] af_packet: make the device detachable
> 
> Allow dynamic deallocation of af_packet device through proper API
> functions. To achieve this:
> * set device flag to RTE_ETH_DEV_DETACHABLE
> * implement rte_pmd_af_packet_devuninit() and expose it
>   through rte_driver.uninit()
> * copy device name to ethdev->data to make it discoverable with
>   rte_eth_dev_allocated()
> Moreover, make af_packet init function static, as there is no reason to keep
> it public.
> 
> Signed-off-by: Wojciech Zmuda 

Acked-by: Bernard Iremonger 




[dpdk-dev] [PATCH v3 0/5] Add flow director and RX VLAN stripping support

2016-03-03 Thread Adrien Mazarguil
To preserve compatibility with Mellanox OFED 3.1, flow director and RX VLAN
stripping code is only enabled when compiled against Mellanox OFED 3.2 or
later.

Changes in v3:
- Fixed flow registration issue caused by missing masks in flow rules.
- Fixed packet duplication with overlapping FDIR rules.
- Added FDIR flush command support.
- Updated Mellanox OFED prerequisite to 3.2-2.0.0.0.

Changes in v2:
- Rebased patchset on top of dpdk-next-net/rel_16_04.
- Fixed trivial compilation warnings (positive errnos are left on purpose).
- Updated documentation and release notes for flow director and RX VLAN
  stripping features.
- Fixed missing Mellanox OFED >= 3.2 check for CQ family query interface
  version.

Yaacov Hazan (5):
  mlx5: refactor special flows handling
  mlx5: add special flows (broadcast and IPv6 multicast)
  mlx5: make flow steering rule generator more generic
  mlx5: add support for flow director
  mlx5: add support for RX VLAN stripping

 doc/guides/nics/mlx5.rst   |  16 +-
 doc/guides/rel_notes/release_16_04.rst |  13 +
 drivers/net/mlx5/Makefile  |   6 +
 drivers/net/mlx5/mlx5.c|  39 +-
 drivers/net/mlx5/mlx5.h|  19 +-
 drivers/net/mlx5/mlx5_defs.h   |  14 +
 drivers/net/mlx5/mlx5_ethdev.c |   3 +-
 drivers/net/mlx5/mlx5_fdir.c   | 980 +
 drivers/net/mlx5/mlx5_mac.c|  10 +-
 drivers/net/mlx5/mlx5_rxmode.c | 350 ++--
 drivers/net/mlx5/mlx5_rxq.c|  82 ++-
 drivers/net/mlx5/mlx5_rxtx.c   |  27 +
 drivers/net/mlx5/mlx5_rxtx.h   |  51 +-
 drivers/net/mlx5/mlx5_trigger.c|  21 +-
 drivers/net/mlx5/mlx5_vlan.c   | 104 
 15 files changed, 1508 insertions(+), 227 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_fdir.c

-- 
2.1.4



[dpdk-dev] [PATCH v3 1/5] mlx5: refactor special flows handling

2016-03-03 Thread Adrien Mazarguil
From: Yaacov Hazan 

Merge redundant code by adding a static initialization table to manage
promiscuous and allmulticast (special) flows.

New function priv_rehash_flows() implements the logic to enable/disable
relevant flows in one place from any context.
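
To illustrate the table-driven approach (a simplified sketch, not the
driver code): each special flow type indexes one entry holding its
destination MAC value and mask plus a bitmask of the hash RX queue types
it applies to. The sketch_* names are hypothetical stand-ins for the
enums and struct special_flow_init used by the patch, and the IPv6 entries
are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the hash RX queue types. */
enum sketch_hash_rxq_type {
	SKETCH_RXQ_TCPV4,
	SKETCH_RXQ_UDPV4,
	SKETCH_RXQ_IPV4,
	SKETCH_RXQ_ETH,
};

/* Simplified stand-in for the special flow types. */
enum sketch_flow_type {
	SKETCH_FLOW_PROMISC,
	SKETCH_FLOW_ALLMULTI,
};

struct sketch_flow_init {
	uint8_t dst_mac_val[6];
	uint8_t dst_mac_mask[6];
	unsigned int hash_types; /* bitmask of enum sketch_hash_rxq_type */
};

/* One entry per flow type, selected with designated initializers. */
static const struct sketch_flow_init sketch_flow_init[] = {
	[SKETCH_FLOW_PROMISC] = {
		.dst_mac_val = { 0x00 },  /* mask of zero matches any MAC */
		.dst_mac_mask = { 0x00 },
		.hash_types =
			1u << SKETCH_RXQ_TCPV4 |
			1u << SKETCH_RXQ_UDPV4 |
			1u << SKETCH_RXQ_IPV4 |
			1u << SKETCH_RXQ_ETH,
	},
	[SKETCH_FLOW_ALLMULTI] = {
		.dst_mac_val = { 0x01 },  /* group bit of the first byte */
		.dst_mac_mask = { 0x01 },
		.hash_types =
			1u << SKETCH_RXQ_UDPV4 |
			1u << SKETCH_RXQ_IPV4 |
			1u << SKETCH_RXQ_ETH,
	},
};

/* Return 1 when the flow type applies to the given hash RX queue type. */
static int
sketch_flow_applies(enum sketch_flow_type flow,
		    enum sketch_hash_rxq_type type)
{
	assert((size_t)flow <
	       sizeof(sketch_flow_init) / sizeof(sketch_flow_init[0]));
	return !!(sketch_flow_init[flow].hash_types & (1u << type));
}
```

Adding another special flow then only requires a new enum value and a new
table entry; the enable and disable paths stay unchanged.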

Signed-off-by: Yaacov Hazan 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.c |   4 +-
 drivers/net/mlx5/mlx5.h |   6 +-
 drivers/net/mlx5/mlx5_defs.h|   3 +
 drivers/net/mlx5/mlx5_rxmode.c  | 321 ++--
 drivers/net/mlx5/mlx5_rxq.c |  33 -
 drivers/net/mlx5/mlx5_rxtx.h|  29 +++-
 drivers/net/mlx5/mlx5_trigger.c |  14 +-
 7 files changed, 210 insertions(+), 200 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 30d88b5..52bf4b2 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -88,8 +88,8 @@ mlx5_dev_close(struct rte_eth_dev *dev)
  ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
/* In case mlx5_dev_stop() has not been called. */
priv_dev_interrupt_handler_uninstall(priv, dev);
-   priv_allmulticast_disable(priv);
-   priv_promiscuous_disable(priv);
+   priv_special_flow_disable(priv, HASH_RXQ_FLOW_TYPE_ALLMULTI);
+   priv_special_flow_disable(priv, HASH_RXQ_FLOW_TYPE_PROMISC);
priv_mac_addrs_disable(priv);
priv_destroy_hash_rxqs(priv);
/* Prevent crashes when queues are still in use. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 2f9a594..1c69bfa 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -195,13 +195,11 @@ int mlx5_dev_rss_reta_update(struct rte_eth_dev *,

 /* mlx5_rxmode.c */

-int priv_promiscuous_enable(struct priv *);
+int priv_special_flow_enable(struct priv *, enum hash_rxq_flow_type);
+void priv_special_flow_disable(struct priv *, enum hash_rxq_flow_type);
 void mlx5_promiscuous_enable(struct rte_eth_dev *);
-void priv_promiscuous_disable(struct priv *);
 void mlx5_promiscuous_disable(struct rte_eth_dev *);
-int priv_allmulticast_enable(struct priv *);
 void mlx5_allmulticast_enable(struct rte_eth_dev *);
-void priv_allmulticast_disable(struct priv *);
 void mlx5_allmulticast_disable(struct rte_eth_dev *);

 /* mlx5_stats.c */
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index bb82c9a..1ec14ef 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -43,6 +43,9 @@
 /* Maximum number of simultaneous VLAN filters. */
 #define MLX5_MAX_VLAN_IDS 128

+/* Maximum number of special flows. */
+#define MLX5_MAX_SPECIAL_FLOWS 2
+
 /* Request send completion once in every 64 sends, might be less. */
 #define MLX5_PMD_TX_PER_COMP_REQ 64

diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index 096fd18..b2ed17e 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -58,31 +58,96 @@
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"

-static void hash_rxq_promiscuous_disable(struct hash_rxq *);
-static void hash_rxq_allmulticast_disable(struct hash_rxq *);
+/* Initialization data for special flows. */
+static const struct special_flow_init special_flow_init[] = {
+   [HASH_RXQ_FLOW_TYPE_PROMISC] = {
+   .dst_mac_val = "\x00\x00\x00\x00\x00\x00",
+   .dst_mac_mask = "\x00\x00\x00\x00\x00\x00",
+   .hash_types =
+   1 << HASH_RXQ_TCPV4 |
+   1 << HASH_RXQ_UDPV4 |
+   1 << HASH_RXQ_IPV4 |
+#ifdef HAVE_FLOW_SPEC_IPV6
+   1 << HASH_RXQ_TCPV6 |
+   1 << HASH_RXQ_UDPV6 |
+   1 << HASH_RXQ_IPV6 |
+#endif /* HAVE_FLOW_SPEC_IPV6 */
+   1 << HASH_RXQ_ETH |
+   0,
+   },
+   [HASH_RXQ_FLOW_TYPE_ALLMULTI] = {
+   .dst_mac_val = "\x01\x00\x00\x00\x00\x00",
+   .dst_mac_mask = "\x01\x00\x00\x00\x00\x00",
+   .hash_types =
+   1 << HASH_RXQ_UDPV4 |
+   1 << HASH_RXQ_IPV4 |
+#ifdef HAVE_FLOW_SPEC_IPV6
+   1 << HASH_RXQ_UDPV6 |
+   1 << HASH_RXQ_IPV6 |
+#endif /* HAVE_FLOW_SPEC_IPV6 */
+   1 << HASH_RXQ_ETH |
+   0,
+   },
+};

 /**
- * Enable promiscuous mode in a hash RX queue.
+ * Enable a special flow in a hash RX queue.
  *
  * @param hash_rxq
  *   Pointer to hash RX queue structure.
+ * @param flow_type
+ *   Special flow type.
  *
  * @return
  *   0 on success, errno value on failure.
  */
 static int
-hash_rxq_promiscuous_enable(struct hash_rxq *hash_rxq)
+hash_rxq_special_flow_enable(struct hash_rxq *hash_rxq,
+enum hash_rxq_flow_type flow_type)
 {
struct ibv_exp_flow *flow;
FLOW_ATTR_SPEC_ETH(data, hash_rxq_flow_attr(hash_rxq, NULL, 0));
struct ibv_exp_flow_attr *attr = &data->attr;
+   stru

[dpdk-dev] [PATCH v3 2/5] mlx5: add special flows (broadcast and IPv6 multicast)

2016-03-03 Thread Adrien Mazarguil
From: Yaacov Hazan 

Until now, broadcast frames were handled like unicast. Moving the related
flow to the special flows table frees up the related unicast MAC entry.

The same method is used to handle IPv6 multicast frames.
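
A small sketch of the value/mask matching these entries rely on, for
illustration only. The broadcast value and mask are taken from this patch;
the IPv6 multicast entry assumes the standard 33:33 Ethernet prefix
(RFC 2464) with a two-byte mask. The sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAC_LEN 6

struct sketch_mac_rule {
	uint8_t val[SKETCH_MAC_LEN];
	uint8_t mask[SKETCH_MAC_LEN];
};

/* Broadcast: all six bytes must be 0xff. */
static const struct sketch_mac_rule sketch_broadcast = {
	.val  = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
	.mask = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
};

/* IPv6 multicast: only the 33:33 prefix is compared (assumed values). */
static const struct sketch_mac_rule sketch_ipv6_multi = {
	.val  = { 0x33, 0x33 },
	.mask = { 0xff, 0xff },
};

/* Return 1 when mac matches rule under the rule's mask, 0 otherwise. */
static int
sketch_mac_match(const struct sketch_mac_rule *rule,
		 const uint8_t mac[SKETCH_MAC_LEN])
{
	int i;

	assert(rule != NULL && mac != NULL);
	for (i = 0; i < SKETCH_MAC_LEN; ++i)
		if ((mac[i] & rule->mask[i]) !=
		    (rule->val[i] & rule->mask[i]))
			return 0;
	return 1;
}
```

Because broadcast becomes a masked rule rather than a stored address, it no
longer needs a slot in the unicast MAC array.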

Signed-off-by: Yaacov Hazan 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.c |  7 +++
 drivers/net/mlx5/mlx5_defs.h|  2 +-
 drivers/net/mlx5/mlx5_ethdev.c  |  3 +--
 drivers/net/mlx5/mlx5_mac.c |  6 ++
 drivers/net/mlx5/mlx5_rxmode.c  | 24 
 drivers/net/mlx5/mlx5_rxq.c | 10 ++
 drivers/net/mlx5/mlx5_rxtx.h|  6 ++
 drivers/net/mlx5/mlx5_trigger.c |  4 
 8 files changed, 51 insertions(+), 11 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 52bf4b2..cf7c4a5 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -90,6 +90,8 @@ mlx5_dev_close(struct rte_eth_dev *dev)
priv_dev_interrupt_handler_uninstall(priv, dev);
priv_special_flow_disable(priv, HASH_RXQ_FLOW_TYPE_ALLMULTI);
priv_special_flow_disable(priv, HASH_RXQ_FLOW_TYPE_PROMISC);
+   priv_special_flow_disable(priv, HASH_RXQ_FLOW_TYPE_BROADCAST);
+   priv_special_flow_disable(priv, HASH_RXQ_FLOW_TYPE_IPV6MULTI);
priv_mac_addrs_disable(priv);
priv_destroy_hash_rxqs(priv);
/* Prevent crashes when queues are still in use. */
@@ -416,13 +418,10 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
 mac.addr_bytes[0], mac.addr_bytes[1],
 mac.addr_bytes[2], mac.addr_bytes[3],
 mac.addr_bytes[4], mac.addr_bytes[5]);
-   /* Register MAC and broadcast addresses. */
+   /* Register MAC address. */
claim_zero(priv_mac_addr_add(priv, 0,
 (const uint8_t (*)[ETHER_ADDR_LEN])
 mac.addr_bytes));
-   claim_zero(priv_mac_addr_add(priv, (RTE_DIM(priv->mac) - 1),
-&(const uint8_t [ETHER_ADDR_LEN])
-{ "\xff\xff\xff\xff\xff\xff" }));
 #ifndef NDEBUG
{
char ifname[IF_NAMESIZE];
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 1ec14ef..67c3948 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -44,7 +44,7 @@
 #define MLX5_MAX_VLAN_IDS 128

 /* Maximum number of special flows. */
-#define MLX5_MAX_SPECIAL_FLOWS 2
+#define MLX5_MAX_SPECIAL_FLOWS 4

 /* Request send completion once in every 64 sends, might be less. */
 #define MLX5_PMD_TX_PER_COMP_REQ 64
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 1159fa3..6704382 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -501,8 +501,7 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *info)
max = 65535;
info->max_rx_queues = max;
info->max_tx_queues = max;
-   /* Last array entry is reserved for broadcast. */
-   info->max_mac_addrs = (RTE_DIM(priv->mac) - 1);
+   info->max_mac_addrs = RTE_DIM(priv->mac);
info->rx_offload_capa =
(priv->hw_csum ?
 (DEV_RX_OFFLOAD_IPV4_CKSUM |
diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index b1f34d9..a1a7ff5 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -212,8 +212,7 @@ mlx5_mac_addr_remove(struct rte_eth_dev *dev, uint32_t 
index)
priv_lock(priv);
DEBUG("%p: removing MAC address from index %" PRIu32,
  (void *)dev, index);
-   /* Last array entry is reserved for broadcast. */
-   if (index >= (RTE_DIM(priv->mac) - 1))
+   if (index >= RTE_DIM(priv->mac))
goto end;
priv_mac_addr_del(priv, index);
 end:
@@ -479,8 +478,7 @@ mlx5_mac_addr_add(struct rte_eth_dev *dev, struct 
ether_addr *mac_addr,
priv_lock(priv);
DEBUG("%p: adding MAC address at index %" PRIu32,
  (void *)dev, index);
-   /* Last array entry is reserved for broadcast. */
-   if (index >= (RTE_DIM(priv->mac) - 1))
+   if (index >= RTE_DIM(priv->mac))
goto end;
priv_mac_addr_add(priv, index,
  (const uint8_t (*)[ETHER_ADDR_LEN])
diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index b2ed17e..6ee7ce3 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -88,6 +88,30 @@ static const struct special_flow_init special_flow_init[] = {
1 << HASH_RXQ_ETH |
0,
},
+   [HASH_RXQ_FLOW_TYPE_BROADCAST] = {
+   .dst_mac_val = "\xff\xff\xff\xff\xff\xff",
+   .dst_mac_mask = "\xff\xff\xff\xff\xff\xff",
+   .has

[dpdk-dev] [PATCH v3 3/5] mlx5: make flow steering rule generator more generic

2016-03-03 Thread Adrien Mazarguil
From: Yaacov Hazan 

Upcoming flow director support will reuse this function to generate filter
rules.
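
For readers unfamiliar with the calling convention of this function: it
always returns the required buffer size and writes nothing when the
supplied buffer is too small, so callers first query the size with a NULL
buffer and then call again with real storage. A minimal sketch of that
pattern follows; the sketch_* types and function are hypothetical and much
simpler than the ibv_exp_flow_attr layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute header followed by trailing specifications. */
struct sketch_attr {
	unsigned int num_of_specs;
	unsigned int port;
};

struct sketch_spec {
	unsigned int type;
};

#define SKETCH_SPECS 2

/*
 * Return the total size required for the header and its specifications.
 * buf is filled only when buf_size is large enough; otherwise nothing is
 * written and the required size is still returned.
 */
static size_t
sketch_flow_attr(void *buf, size_t buf_size, unsigned int port)
{
	const size_t needed = sizeof(struct sketch_attr) +
			      SKETCH_SPECS * sizeof(struct sketch_spec);
	struct sketch_attr attr = {
		.num_of_specs = SKETCH_SPECS,
		.port = port,
	};
	struct sketch_spec spec;
	unsigned int i;

	/* A NULL buffer is only valid for a pure size query. */
	assert(buf != NULL || buf_size == 0);
	if (buf_size < needed)
		return needed;
	memcpy(buf, &attr, sizeof(attr));
	for (i = 0; i < SKETCH_SPECS; ++i) {
		spec.type = i;
		memcpy((char *)buf + sizeof(attr) + i * sizeof(spec),
		       &spec, sizeof(spec));
	}
	return needed;
}
```

Passing the queue type explicitly instead of a hash RX queue pointer keeps
this convention intact while letting callers that own no hash RX queue use
the same generator.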

Signed-off-by: Yaacov Hazan 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_mac.c|  4 ++--
 drivers/net/mlx5/mlx5_rxmode.c |  5 +++--
 drivers/net/mlx5/mlx5_rxq.c| 16 
 drivers/net/mlx5/mlx5_rxtx.h   |  4 ++--
 4 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index a1a7ff5..edb05ad 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -241,7 +241,7 @@ hash_rxq_add_mac_flow(struct hash_rxq *hash_rxq, unsigned 
int mac_index,
const uint8_t (*mac)[ETHER_ADDR_LEN] =
(const uint8_t (*)[ETHER_ADDR_LEN])
priv->mac[mac_index].addr_bytes;
-   FLOW_ATTR_SPEC_ETH(data, hash_rxq_flow_attr(hash_rxq, NULL, 0));
+   FLOW_ATTR_SPEC_ETH(data, priv_flow_attr(priv, NULL, 0, hash_rxq->type));
struct ibv_exp_flow_attr *attr = &data->attr;
struct ibv_exp_flow_spec_eth *spec = &data->spec;
unsigned int vlan_enabled = !!priv->vlan_filter_n;
@@ -256,7 +256,7 @@ hash_rxq_add_mac_flow(struct hash_rxq *hash_rxq, unsigned 
int mac_index,
 * This layout is expected by libibverbs.
 */
assert(((uint8_t *)attr + sizeof(*attr)) == (uint8_t *)spec);
-   hash_rxq_flow_attr(hash_rxq, attr, sizeof(data));
+   priv_flow_attr(priv, attr, sizeof(data), hash_rxq->type);
/* The first specification must be Ethernet. */
assert(spec->type == IBV_EXP_FLOW_SPEC_ETH);
assert(spec->size == sizeof(*spec));
diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index 6ee7ce3..9ac7a41 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -129,8 +129,9 @@ static int
 hash_rxq_special_flow_enable(struct hash_rxq *hash_rxq,
 enum hash_rxq_flow_type flow_type)
 {
+   struct priv *priv = hash_rxq->priv;
struct ibv_exp_flow *flow;
-   FLOW_ATTR_SPEC_ETH(data, hash_rxq_flow_attr(hash_rxq, NULL, 0));
+   FLOW_ATTR_SPEC_ETH(data, priv_flow_attr(priv, NULL, 0, hash_rxq->type));
struct ibv_exp_flow_attr *attr = &data->attr;
struct ibv_exp_flow_spec_eth *spec = &data->spec;
const uint8_t *mac;
@@ -148,7 +149,7 @@ hash_rxq_special_flow_enable(struct hash_rxq *hash_rxq,
 * This layout is expected by libibverbs.
 */
assert(((uint8_t *)attr + sizeof(*attr)) == (uint8_t *)spec);
-   hash_rxq_flow_attr(hash_rxq, attr, sizeof(data));
+   priv_flow_attr(priv, attr, sizeof(data), hash_rxq->type);
/* The first specification must be Ethernet. */
assert(spec->type == IBV_EXP_FLOW_SPEC_ETH);
assert(spec->size == sizeof(*spec));
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index fcf192a..36910b2 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -210,27 +210,27 @@ const size_t rss_hash_default_key_len = 
sizeof(rss_hash_default_key);
  * information from hash_rxq_init[]. Nothing is written to flow_attr when
  * flow_attr_size is not large enough, but the required size is still returned.
  *
- * @param[in] hash_rxq
- *   Pointer to hash RX queue.
+ * @param priv
+ *   Pointer to private structure.
  * @param[out] flow_attr
  *   Pointer to flow attribute structure to fill. Note that the allocated
  *   area must be larger and large enough to hold all flow specifications.
  * @param flow_attr_size
  *   Entire size of flow_attr and trailing room for flow specifications.
+ * @param type
+ *   Hash RX queue type to use for flow steering rule.
  *
  * @return
  *   Total size of the flow attribute buffer. No errors are defined.
  */
 size_t
-hash_rxq_flow_attr(const struct hash_rxq *hash_rxq,
-  struct ibv_exp_flow_attr *flow_attr,
-  size_t flow_attr_size)
+priv_flow_attr(struct priv *priv, struct ibv_exp_flow_attr *flow_attr,
+  size_t flow_attr_size, enum hash_rxq_type type)
 {
size_t offset = sizeof(*flow_attr);
-   enum hash_rxq_type type = hash_rxq->type;
const struct hash_rxq_init *init = &hash_rxq_init[type];

-   assert(hash_rxq->priv != NULL);
+   assert(priv != NULL);
assert((size_t)type < RTE_DIM(hash_rxq_init));
do {
offset += init->flow_spec.hdr.size;
@@ -244,7 +244,7 @@ hash_rxq_flow_attr(const struct hash_rxq *hash_rxq,
.type = IBV_EXP_FLOW_ATTR_NORMAL,
.priority = init->flow_priority,
.num_of_specs = 0,
-   .port = hash_rxq->priv->port,
+   .port = priv->port,
.flags = 0,
};
do {
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index d5a5019..c42bb8d 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -270,8 +270

[dpdk-dev] [PATCH v3 4/5] mlx5: add support for flow director

2016-03-03 Thread Adrien Mazarguil
From: Yaacov Hazan 

Add support for flow director filters (RTE_FDIR_MODE_PERFECT and
RTE_FDIR_MODE_PERFECT_MAC_VLAN modes).

This feature requires MLNX_OFED >= 3.2.
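
As background, a perfect filter steers a packet only when its fields match
a rule exactly, as opposed to matching on a hash signature. The sketch
below shows that idea in software under simplifying assumptions: the key
layout and the sketch_* names are hypothetical, and the real feature
programs hardware flow rules rather than scanning a table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical match key; real filters carry more fields and masks. */
struct sketch_fdir_key {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
};

struct sketch_fdir_rule {
	struct sketch_fdir_key key;
	uint16_t rx_queue; /* queue that receives matching packets */
};

/* Compare field by field to avoid depending on struct padding. */
static int
sketch_key_equal(const struct sketch_fdir_key *a,
		 const struct sketch_fdir_key *b)
{
	return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
	       a->src_port == b->src_port && a->dst_port == b->dst_port;
}

/*
 * Return the RX queue of the first rule whose key equals pkt exactly,
 * or -1 when no rule matches.
 */
static int
sketch_fdir_lookup(const struct sketch_fdir_rule *rules, size_t n,
		   const struct sketch_fdir_key *pkt)
{
	size_t i;

	assert(pkt != NULL && (rules != NULL || n == 0));
	for (i = 0; i < n; ++i)
		if (sketch_key_equal(&rules[i].key, pkt))
			return rules[i].rx_queue;
	return -1;
}
```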

Signed-off-by: Yaacov Hazan 
Signed-off-by: Adrien Mazarguil 
Signed-off-by: Raslan Darawsheh 
---
 doc/guides/nics/mlx5.rst   |  14 +-
 doc/guides/rel_notes/release_16_04.rst |   7 +
 drivers/net/mlx5/Makefile  |   6 +
 drivers/net/mlx5/mlx5.c|  12 +
 drivers/net/mlx5/mlx5.h|  10 +
 drivers/net/mlx5/mlx5_defs.h   |  11 +
 drivers/net/mlx5/mlx5_fdir.c   | 980 +
 drivers/net/mlx5/mlx5_rxq.c|   6 +
 drivers/net/mlx5/mlx5_rxtx.h   |   7 +
 drivers/net/mlx5/mlx5_trigger.c|   3 +
 10 files changed, 1055 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/mlx5/mlx5_fdir.c

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index b2a12ce..6bd452e 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -86,6 +86,7 @@ Features
 - Promiscuous mode.
 - Multicast promiscuous mode.
 - Hardware checksum offloads.
+- Flow director (RTE_FDIR_MODE_PERFECT and RTE_FDIR_MODE_PERFECT_MAC_VLAN).

 Limitations
 ---
@@ -214,7 +215,8 @@ DPDK and must be installed separately:

 Currently supported by DPDK:

-- Mellanox OFED **3.1-1.0.3** or **3.1-1.5.7.1** depending on usage.
+- Mellanox OFED **3.1-1.0.3**, **3.1-1.5.7.1** or **3.2-2.0.0.0** depending
+  on usage.

 The following features are supported with version **3.1-1.5.7.1** and
 above only:
@@ -223,6 +225,11 @@ Currently supported by DPDK:
 - RX checksum offloads.
 - IBM POWER8.

+The following features are supported with version **3.2-2.0.0.0** and
+above only:
+
+- Flow director.
+
 - Minimum firmware version:

   With MLNX_OFED **3.1-1.0.3**:
@@ -235,6 +242,11 @@ Currently supported by DPDK:
   - ConnectX-4: **12.13.0144**
   - ConnectX-4 Lx: **14.13.0144**

+  With MLNX_OFED **3.2-2.0.0.0**:
+
+  - ConnectX-4: **12.14.2036**
+  - ConnectX-4 Lx: **14.14.2036**
+
 Getting Mellanox OFED
 ~~~~~~~~~~~~~~~~~~~~~

diff --git a/doc/guides/rel_notes/release_16_04.rst b/doc/guides/rel_notes/release_16_04.rst
index 73494f9..c6c76d6 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -74,6 +74,13 @@ This section should contain new features added in this release. Sample format:

 * **szedata2: Add functions for setting link up/down.**

+* **mlx5: flow director support.**
+
+  Added flow director support (RTE_FDIR_MODE_PERFECT and
+  RTE_FDIR_MODE_PERFECT_MAC_VLAN).
+
+  Only available with Mellanox OFED >= 3.2.
+

 Resolved Issues
 ---------------
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 698f072..46a17e0 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -52,6 +52,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxmode.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_vlan.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_stats.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rss.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_fdir.c

 # Dependencies.
 DEPDIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += lib/librte_ether
@@ -125,6 +126,11 @@ mlx5_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
infiniband/verbs.h \
enum IBV_EXP_QP_BURST_CREATE_ENABLE_MULTI_PACKET_SEND_WR \
$(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_EXP_DEVICE_ATTR_VLAN_OFFLOADS \
+   infiniband/verbs.h \
+   enum IBV_EXP_DEVICE_ATTR_VLAN_OFFLOADS \
+   $(AUTOCONF_OUTPUT)

 $(SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD):.c=.o): mlx5_autoconf.h

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index cf7c4a5..43e24ff 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -94,6 +94,11 @@ mlx5_dev_close(struct rte_eth_dev *dev)
priv_special_flow_disable(priv, HASH_RXQ_FLOW_TYPE_IPV6MULTI);
priv_mac_addrs_disable(priv);
priv_destroy_hash_rxqs(priv);
+
+   /* Remove flow director elements. */
+   priv_fdir_disable(priv);
+   priv_fdir_delete_filters_list(priv);
+
/* Prevent crashes when queues are still in use. */
dev->rx_pkt_burst = removed_rx_burst;
dev->tx_pkt_burst = removed_tx_burst;
@@ -170,6 +175,9 @@ static const struct eth_dev_ops mlx5_dev_ops = {
.reta_query = mlx5_dev_rss_reta_query,
.rss_hash_update = mlx5_rss_hash_update,
.rss_hash_conf_get = mlx5_rss_hash_conf_get,
+#ifdef MLX5_FDIR_SUPPORT
+   .filter_ctrl = mlx5_dev_filter_ctrl,
+#endif /* MLX5_FDIR_SUPPORT */
 };

 static struct {
@@ -422,6 +430,10 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
claim_zero(priv_mac_addr_add(priv, 0,
 (const uint8_t (*)[ETHER_ADDR_LEN])

[dpdk-dev] [PATCH v3 5/5] mlx5: add support for RX VLAN stripping

2016-03-03 Thread Adrien Mazarguil
From: Yaacov Hazan 

Allows HW to strip the 802.1Q header from incoming frames and report it
through the mbuf structure.

This feature requires MLNX_OFED >= 3.2.

Signed-off-by: Yaacov Hazan 
Signed-off-by: Adrien Mazarguil 
---
 doc/guides/nics/mlx5.rst   |   2 +
 doc/guides/rel_notes/release_16_04.rst |   6 ++
 drivers/net/mlx5/mlx5.c|  16 -
 drivers/net/mlx5/mlx5.h|   3 +
 drivers/net/mlx5/mlx5_rxq.c|  17 +-
 drivers/net/mlx5/mlx5_rxtx.c   |  27 +
 drivers/net/mlx5/mlx5_rxtx.h   |   5 ++
 drivers/net/mlx5/mlx5_vlan.c   | 104 +
 8 files changed, 178 insertions(+), 2 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 6bd452e..edfbf1f 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -83,6 +83,7 @@ Features
 - Configurable RETA table.
 - Support for multiple MAC addresses.
 - VLAN filtering.
+- RX VLAN stripping.
 - Promiscuous mode.
 - Multicast promiscuous mode.
 - Hardware checksum offloads.
@@ -229,6 +230,7 @@ Currently supported by DPDK:
 above only:

 - Flow director.
+- RX VLAN stripping.

 - Minimum firmware version:

diff --git a/doc/guides/rel_notes/release_16_04.rst b/doc/guides/rel_notes/release_16_04.rst
index c6c76d6..c69e55e 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -81,6 +81,12 @@ This section should contain new features added in this release. Sample format:

   Only available with Mellanox OFED >= 3.2.

+* **mlx5: RX VLAN stripping support.**
+
+  Added support for RX VLAN stripping.
+
+  Only available with Mellanox OFED >= 3.2.
+

 Resolved Issues
 ---------------
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 43e24ff..575420e 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -171,6 +171,10 @@ static const struct eth_dev_ops mlx5_dev_ops = {
.mac_addr_add = mlx5_mac_addr_add,
.mac_addr_set = mlx5_mac_addr_set,
.mtu_set = mlx5_dev_set_mtu,
+#ifdef HAVE_EXP_DEVICE_ATTR_VLAN_OFFLOADS
+   .vlan_strip_queue_set = mlx5_vlan_strip_queue_set,
+   .vlan_offload_set = mlx5_vlan_offload_set,
+#endif /* HAVE_EXP_DEVICE_ATTR_VLAN_OFFLOADS */
.reta_update = mlx5_dev_rss_reta_update,
.reta_query = mlx5_dev_rss_reta_query,
.rss_hash_update = mlx5_rss_hash_update,
@@ -325,7 +329,11 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 #ifdef HAVE_EXP_QUERY_DEVICE
exp_device_attr.comp_mask =
IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS |
-   IBV_EXP_DEVICE_ATTR_RX_HASH;
+   IBV_EXP_DEVICE_ATTR_RX_HASH |
+#ifdef HAVE_EXP_DEVICE_ATTR_VLAN_OFFLOADS
+   IBV_EXP_DEVICE_ATTR_VLAN_OFFLOADS |
+#endif /* HAVE_EXP_DEVICE_ATTR_VLAN_OFFLOADS */
+   0;
 #endif /* HAVE_EXP_QUERY_DEVICE */

DEBUG("using port %u (%08" PRIx32 ")", port, test);
@@ -396,6 +404,12 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
priv->ind_table_max_size = RSS_INDIRECTION_TABLE_SIZE;
DEBUG("maximum RX indirection table size is %u",
  priv->ind_table_max_size);
+#ifdef HAVE_EXP_DEVICE_ATTR_VLAN_OFFLOADS
+   priv->hw_vlan_strip = !!(exp_device_attr.wq_vlan_offloads_cap &
+IBV_EXP_RECEIVE_WQ_CVLAN_STRIP);
+#endif /* HAVE_EXP_DEVICE_ATTR_VLAN_OFFLOADS */
+   DEBUG("VLAN stripping is %ssupported",
+ (priv->hw_vlan_strip ? "" : "not "));

 #else /* HAVE_EXP_QUERY_DEVICE */
priv->ind_table_max_size = RSS_INDIRECTION_TABLE_SIZE;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 8019ee3..8442016 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -101,6 +101,7 @@ struct priv {
unsigned int allmulti_req:1; /* All multicast mode requested. */
unsigned int hw_csum:1; /* Checksum offload is supported. */
unsigned int hw_csum_l2tun:1; /* Same for L2 tunnels. */
+   unsigned int hw_vlan_strip:1; /* VLAN stripping is supported. */
unsigned int vf:1; /* This is a VF device. */
unsigned int pending_alarm:1; /* An alarm is pending. */
/* RX/TX queues. */
@@ -211,6 +212,8 @@ void mlx5_stats_reset(struct rte_eth_dev *);
 /* mlx5_vlan.c */

 int mlx5_vlan_filter_set(struct rte_eth_dev *, uint16_t, int);
+void mlx5_vlan_offload_set(struct rte_eth_dev *, int);
+void mlx5_vlan_strip_queue_set(struct rte_eth_dev *, uint16_t, int);

 /* mlx5_trigger.c */

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 093f4e5..573ad8f 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1224,6 +1224,8 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq 

[dpdk-dev] [PATCH v3 0/7] Performance optimizations for mlx5 and mlx4

2016-03-03 Thread Adrien Mazarguil
This patchset improves the mlx5 PMD performance by doing better prefetching,
by reordering internal structure fields and by removing a few unnecessary
operations.

Note: should be applied after "Add flow director and RX VLAN stripping
support" to avoid conflicts.

Changes in v3:
- None; resubmitted because of a dependency on the previous patchset.

Changes in v2:
- Rebased patchset on top of dpdk-next-net/rel_16_04.
- Fixed missing update for receive function in rxq_rehash().
- Added a commit to register memory on page boundaries instead of mempool
  object boundaries for better performance (mlx4 and mlx5).

Adrien Mazarguil (1):
  mlx: use aligned memory to register regions

Nelio Laranjeiro (6):
  mlx5: prefetch next TX mbuf header and data
  mlx5: reorder TX/RX queue structure
  mlx5: remove one indirection level from RX/TX functions
  mlx5: process offload flags only when requested
  mlx5: avoid lkey retrieval for inlined packets
  mlx5: free buffers immediately after completion

 drivers/net/mlx4/mlx4.c  |  58 ++---
 drivers/net/mlx5/Makefile|   1 +
 drivers/net/mlx5/mlx5_rxq.c  |  22 +++--
 drivers/net/mlx5/mlx5_rxtx.c | 189 +++
 drivers/net/mlx5/mlx5_rxtx.h |  55 -
 drivers/net/mlx5/mlx5_txq.c  |  14 
 6 files changed, 236 insertions(+), 103 deletions(-)

-- 
2.1.4



[dpdk-dev] [PATCH v3 1/7] mlx5: prefetch next TX mbuf header and data

2016-03-03 Thread Adrien Mazarguil
From: Nelio Laranjeiro 

This change improves performance noticeably.

Signed-off-by: Nelio Laranjeiro 
---
 drivers/net/mlx5/mlx5_rxtx.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 7585570..bee5ce2 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -443,8 +443,11 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
unsigned int i;
unsigned int max;
int err;
+   struct rte_mbuf *buf = pkts[0];

assert(elts_comp_cd != 0);
+   /* Prefetch first packet cacheline. */
+   rte_prefetch0(buf);
txq_complete(txq);
max = (elts_n - (elts_head - txq->elts_tail));
if (max > elts_n)
@@ -458,7 +461,7 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
if (max > pkts_n)
max = pkts_n;
for (i = 0; (i != max); ++i) {
-   struct rte_mbuf *buf = pkts[i];
+   struct rte_mbuf *buf_next = pkts[i + 1];
unsigned int elts_head_next =
(((elts_head + 1) == elts_n) ? 0 : elts_head + 1);
struct txq_elt *elt_next = &(*txq->elts)[elts_head_next];
@@ -481,6 +484,8 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
tmp = next;
} while (tmp != NULL);
}
+   if (i + 1 < max)
+   rte_prefetch0(buf_next);
/* Request TX completion. */
if (unlikely(--elts_comp_cd == 0)) {
elts_comp_cd = txq->elts_comp_cd_init;
@@ -502,6 +507,7 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
uintptr_t addr;
uint32_t length;
uint32_t lkey;
+   uintptr_t buf_next_addr;

/* Retrieve buffer information. */
addr = rte_pktmbuf_mtod(buf, uintptr_t);
@@ -522,6 +528,13 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
rte_prefetch0((volatile void *)
  (uintptr_t)addr);
RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
+   /* Prefetch next buffer data. */
+   if (i + 1 < max) {
+   buf_next_addr =
+   rte_pktmbuf_mtod(buf_next, uintptr_t);
+   rte_prefetch0((volatile void *)
+ (uintptr_t)buf_next_addr);
+   }
/* Put packet into send queue. */
 #if MLX5_PMD_MAX_INLINE > 0
if (length <= txq->max_inline)
@@ -571,6 +584,7 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 #endif /* MLX5_PMD_SGE_WR_N > 1 */
}
elts_head = elts_head_next;
+   buf = buf_next;
 #ifdef MLX5_PMD_SOFT_COUNTERS
/* Increment sent bytes counter. */
txq->stats.obytes += sent_size;
-- 
2.1.4
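
For readers following the thread, the prefetch pattern in this patch can be
sketched outside the driver as below. This is only an illustration under
stated assumptions: struct pkt and burst_bytes() are hypothetical stand-ins
for struct rte_mbuf and the TX loop, and the GCC/Clang builtin
__builtin_prefetch() is used in place of rte_prefetch0(). Unlike the hunk
above, which loads pkts[i + 1] before the bounds check, the sketch reads the
next pointer only when it lies inside the burst.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet descriptor, standing in for struct rte_mbuf. */
struct pkt {
    const uint8_t *data; /* Payload address. */
    uint32_t len;        /* Payload length in bytes. */
};

/*
 * Sum the payload lengths of a burst while prefetching the descriptor
 * needed by the next iteration. __builtin_prefetch() is only a hint:
 * it does not change the result, it may only hide memory latency.
 */
static uint64_t
burst_bytes(struct pkt *const *pkts, unsigned int n)
{
    uint64_t total = 0;
    unsigned int i;

    if (n == 0)
        return 0;
    /* Prefetch the first descriptor before the loop starts. */
    __builtin_prefetch(pkts[0]);
    for (i = 0; i != n; ++i) {
        const struct pkt *cur = pkts[i];

        /* Only touch pkts[i + 1] when it is part of the burst. */
        if (i + 1 < n)
            __builtin_prefetch(pkts[i + 1]);
        total += cur->len;
    }
    return total;
}
```

Whether this helps depends on how much work the loop body does per packet;
with a very short body the next descriptor may not arrive in time.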



[dpdk-dev] [PATCH v3 2/7] mlx5: reorder TX/RX queue structure

2016-03-03 Thread Adrien Mazarguil
From: Nelio Laranjeiro 

Remove padding and move important fields to the beginning for better
performance.

Signed-off-by: Nelio Laranjeiro 
---
 drivers/net/mlx5/mlx5_rxtx.h | 31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index fde0ca2..4a857d8 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -105,7 +105,6 @@ struct priv;
 struct rxq {
struct priv *priv; /* Back pointer to private data. */
struct rte_mempool *mp; /* Memory Pool for allocations. */
-   struct ibv_mr *mr; /* Memory Region (for mp). */
struct ibv_cq *cq; /* Completion Queue. */
struct ibv_exp_wq *wq; /* Work Queue. */
struct ibv_exp_wq_family *if_wq; /* WQ burst interface. */
@@ -117,19 +116,20 @@ struct rxq {
unsigned int port_id; /* Port ID for incoming packets. */
unsigned int elts_n; /* (*elts)[] length. */
unsigned int elts_head; /* Current index in (*elts)[]. */
-   union {
-   struct rxq_elt_sp (*sp)[]; /* Scattered RX elements. */
-   struct rxq_elt (*no_sp)[]; /* RX elements. */
-   } elts;
unsigned int sp:1; /* Use scattered RX elements. */
unsigned int csum:1; /* Enable checksum offloading. */
unsigned int csum_l2tun:1; /* Same for L2 tunnels. */
unsigned int vlan_strip:1; /* Enable VLAN stripping. */
+   union {
+   struct rxq_elt_sp (*sp)[]; /* Scattered RX elements. */
+   struct rxq_elt (*no_sp)[]; /* RX elements. */
+   } elts;
uint32_t mb_len; /* Length of a mp-issued mbuf. */
-   struct mlx5_rxq_stats stats; /* RX queue counters. */
unsigned int socket; /* CPU socket ID for allocations. */
+   struct mlx5_rxq_stats stats; /* RX queue counters. */
struct ibv_exp_res_domain *rd; /* Resource Domain. */
struct fdir_queue fdir_queue; /* Flow director queue. */
+   struct ibv_mr *mr; /* Memory Region (for mp). */
 };

 /* Hash RX queue types. */
@@ -248,30 +248,31 @@ typedef uint8_t linear_t[16384];
 /* TX queue descriptor. */
 struct txq {
struct priv *priv; /* Back pointer to private data. */
-   struct {
-   const struct rte_mempool *mp; /* Cached Memory Pool. */
-   struct ibv_mr *mr; /* Memory Region (for mp). */
-   uint32_t lkey; /* mr->lkey */
-   } mp2mr[MLX5_PMD_TX_MP_CACHE]; /* MP to MR translation table. */
struct ibv_cq *cq; /* Completion Queue. */
struct ibv_qp *qp; /* Queue Pair. */
-   struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
-   struct ibv_exp_cq_family *if_cq; /* CQ interface. */
+   struct txq_elt (*elts)[]; /* TX elements. */
 #if MLX5_PMD_MAX_INLINE > 0
uint32_t max_inline; /* Max inline send size <= MLX5_PMD_MAX_INLINE. */
 #endif
unsigned int elts_n; /* (*elts)[] length. */
-   struct txq_elt (*elts)[]; /* TX elements. */
unsigned int elts_head; /* Current index in (*elts)[]. */
unsigned int elts_tail; /* First element awaiting completion. */
unsigned int elts_comp; /* Number of completion requests. */
unsigned int elts_comp_cd; /* Countdown for next completion request. */
unsigned int elts_comp_cd_init; /* Initial value for countdown. */
+   struct {
+   const struct rte_mempool *mp; /* Cached Memory Pool. */
+   struct ibv_mr *mr; /* Memory Region (for mp). */
+   uint32_t lkey; /* mr->lkey */
+   } mp2mr[MLX5_PMD_TX_MP_CACHE]; /* MP to MR translation table. */
struct mlx5_txq_stats stats; /* TX queue counters. */
+   /* Elements used only for init part are here. */
linear_t (*elts_linear)[]; /* Linearized buffers. */
struct ibv_mr *mr_linear; /* Memory Region for linearized buffers. */
-   unsigned int socket; /* CPU socket ID for allocations. */
+   struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
+   struct ibv_exp_cq_family *if_cq; /* CQ interface. */
struct ibv_exp_res_domain *rd; /* Resource Domain. */
+   unsigned int socket; /* CPU socket ID for allocations. */
 };

 /* mlx5_rxq.c */
-- 
2.1.4
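
As a rough illustration of why field order matters here, the sketch below
compares two layouts holding the same members. The structures are
hypothetical and much smaller than struct rxq or struct txq; the point is
only that a 32-bit field placed between two pointers usually forces padding
on LP64 targets, while grouping same-sized fields avoids it and keeps the
fast-path fields close together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 32-bit fields interleaved with pointers. */
struct queue_padded {
    void *cq;
    uint32_t elts_n;
    void *elts;
    uint32_t elts_head;
    void *stats;
};

/* Same members; pointers used on the fast path first, cold data last. */
struct queue_reordered {
    void *cq;
    void *elts;
    uint32_t elts_n;
    uint32_t elts_head;
    void *stats; /* Rarely accessed. */
};

/* Bytes saved by the reordered layout on the current ABI (may be 0). */
static size_t
layout_bytes_saved(void)
{
    return sizeof(struct queue_padded) - sizeof(struct queue_reordered);
}
```

On a typical LP64 ABI this should report 8 bytes saved (40 versus 32); on
ILP32 targets both layouts are normally the same size, so the gain there
comes only from locality.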



[dpdk-dev] [PATCH v3 3/7] mlx5: remove one indirection level from RX/TX functions

2016-03-03 Thread Adrien Mazarguil
From: Nelio Laranjeiro 

Avoid dereferencing pointers twice to get to fast Verbs functions by
storing them directly in RX/TX queue structures.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Yaacov Hazan 
---
 drivers/net/mlx5/Makefile|  1 +
 drivers/net/mlx5/mlx5_rxq.c  | 16 
 drivers/net/mlx5/mlx5_rxtx.c | 34 +-
 drivers/net/mlx5/mlx5_rxtx.h | 23 +--
 drivers/net/mlx5/mlx5_txq.c  | 14 ++
 5 files changed, 57 insertions(+), 31 deletions(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 46a17e0..39cdf2c 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -67,6 +67,7 @@ CFLAGS += -g
 CFLAGS += -I.
 CFLAGS += -D_XOPEN_SOURCE=600
 CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -Wno-strict-prototypes
 LDLIBS += -libverbs

 # A few warnings cannot be avoided in external headers.
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 573ad8f..55d002e 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -901,6 +901,8 @@ rxq_cleanup(struct rxq *rxq)
rxq_free_elts_sp(rxq);
else
rxq_free_elts(rxq);
+   rxq->poll = NULL;
+   rxq->recv = NULL;
if (rxq->if_wq != NULL) {
assert(rxq->priv != NULL);
assert(rxq->priv->ctx != NULL);
@@ -1103,6 +1105,10 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
err = EIO;
goto error;
}
+   if (tmpl.sp)
+   tmpl.recv = tmpl.if_wq->recv_sg_list;
+   else
+   tmpl.recv = tmpl.if_wq->recv_burst;
 error:
*rxq = tmpl;
assert(err >= 0);
@@ -1345,6 +1351,16 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
*rxq = tmpl;
DEBUG("%p: rxq updated with %p", (void *)rxq, (void *)&tmpl);
assert(ret == 0);
+   /* Assign function in queue. */
+#ifdef HAVE_EXP_DEVICE_ATTR_VLAN_OFFLOADS
+   rxq->poll = rxq->if_cq->poll_length_flags_cvlan;
+#else /* HAVE_EXP_DEVICE_ATTR_VLAN_OFFLOADS */
+   rxq->poll = rxq->if_cq->poll_length_flags;
+#endif /* HAVE_EXP_DEVICE_ATTR_VLAN_OFFLOADS */
+   if (rxq->sp)
+   rxq->recv = rxq->if_wq->recv_sg_list;
+   else
+   rxq->recv = rxq->if_wq->recv_burst;
return 0;
 error:
rxq_cleanup(&tmpl);
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index bee5ce2..63ddc53 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -93,7 +93,7 @@ txq_complete(struct txq *txq)
DEBUG("%p: processing %u work requests completions",
  (void *)txq, elts_comp);
 #endif
-   wcs_n = txq->if_cq->poll_cnt(txq->cq, elts_comp);
+   wcs_n = txq->poll_cnt(txq->cq, elts_comp);
if (unlikely(wcs_n == 0))
return 0;
if (unlikely(wcs_n < 0)) {
@@ -538,14 +538,14 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
/* Put packet into send queue. */
 #if MLX5_PMD_MAX_INLINE > 0
if (length <= txq->max_inline)
-   err = txq->if_qp->send_pending_inline
+   err = txq->send_pending_inline
(txq->qp,
 (void *)addr,
 length,
 send_flags);
else
 #endif
-   err = txq->if_qp->send_pending
+   err = txq->send_pending
(txq->qp,
 addr,
 length,
@@ -567,7 +567,7 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
goto stop;
RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
/* Put SG list into send queue. */
-   err = txq->if_qp->send_pending_sg_list
+   err = txq->send_pending_sg_list
(txq->qp,
 sges,
 ret.num,
@@ -599,7 +599,7 @@ stop:
txq->stats.opackets += i;
 #endif
/* Ring QP doorbell. */
-   err = txq->if_qp->send_flush(txq->qp);
+   err = txq->send_flush(txq->qp);
if (unlikely(err)) {
/* A nonzero value is not supposed to be returned.
 * Nothing can be done about it. */
@@ -733,14 +733,7 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
/* Sanity checks. */
assert(elts_head < rxq->elts_n);
assert(rxq->elts_head < rxq->elts_n);
-#ifdef HAVE_EXP_DEVICE_ATTR_VLAN_OFFLOADS
-   ret = rxq->if_cq->poll_length_flags_c
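
The idea behind this patch (copying function pointers out of the interface
table into the queue structure) can be sketched as follows. All names are
hypothetical: struct cq_family stands in for a Verbs interface family and
fake_poll_cnt() for a real polling function; none of this is the driver's
actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface table, standing in for a Verbs family struct. */
struct cq_family {
    int (*poll_cnt)(void *cq, int max);
};

struct txq_sketch {
    void *cq;
    struct cq_family *if_cq;            /* Kept for setup and cleanup. */
    int (*poll_cnt)(void *cq, int max); /* Cached copy for the fast path. */
};

/* Pretend every requested completion is ready. */
static int
fake_poll_cnt(void *cq, int max)
{
    (void)cq;
    return max;
}

static void
txq_sketch_setup(struct txq_sketch *txq, void *cq, struct cq_family *fam)
{
    txq->cq = cq;
    txq->if_cq = fam;
    /* One extra load at setup time saves one load per burst. */
    txq->poll_cnt = fam->poll_cnt;
}

static void
txq_sketch_cleanup(struct txq_sketch *txq)
{
    /* Clear the cached pointer so it cannot outlive the interface. */
    txq->poll_cnt = NULL;
    txq->if_cq = NULL;
}

static int
txq_sketch_complete(struct txq_sketch *txq, int elts_comp)
{
    assert(txq->poll_cnt != NULL);
    /* txq->poll_cnt(...) instead of txq->if_cq->poll_cnt(...). */
    return txq->poll_cnt(txq->cq, elts_comp);
}
```

The cost of the approach is that every place that rebuilds the interface
(for instance a rehash) must refresh the cached pointers as well.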

[dpdk-dev] [PATCH v3 4/7] mlx5: process offload flags only when requested

2016-03-03 Thread Adrien Mazarguil
From: Nelio Laranjeiro 

Improve performance by processing offloads only when requested by the
application.

Signed-off-by: Nelio Laranjeiro 
---
 drivers/net/mlx5/mlx5_rxtx.c | 29 -
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 63ddc53..c84ec8c 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -853,14 +853,16 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
NB_SEGS(pkt_buf) = j;
PORT(pkt_buf) = rxq->port_id;
PKT_LEN(pkt_buf) = pkt_buf_len;
-   pkt_buf->packet_type = rxq_cq_to_pkt_type(flags);
-   pkt_buf->ol_flags = rxq_cq_to_ol_flags(rxq, flags);
+   if (rxq->csum | rxq->csum_l2tun | rxq->vlan_strip) {
+   pkt_buf->packet_type = rxq_cq_to_pkt_type(flags);
+   pkt_buf->ol_flags = rxq_cq_to_ol_flags(rxq, flags);
 #ifdef HAVE_EXP_DEVICE_ATTR_VLAN_OFFLOADS
-   if (flags & IBV_EXP_CQ_RX_CVLAN_STRIPPED_V1) {
-   pkt_buf->ol_flags |= PKT_RX_VLAN_PKT;
-   pkt_buf->vlan_tci = vlan_tci;
-   }
+   if (flags & IBV_EXP_CQ_RX_CVLAN_STRIPPED_V1) {
+   pkt_buf->ol_flags |= PKT_RX_VLAN_PKT;
+   pkt_buf->vlan_tci = vlan_tci;
+   }
 #endif /* HAVE_EXP_DEVICE_ATTR_VLAN_OFFLOADS */
+   }

/* Return packet. */
*(pkts++) = pkt_buf;
@@ -1006,15 +1008,16 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
NEXT(seg) = NULL;
PKT_LEN(seg) = len;
DATA_LEN(seg) = len;
-   seg->packet_type = rxq_cq_to_pkt_type(flags);
-   seg->ol_flags = rxq_cq_to_ol_flags(rxq, flags);
+   if (rxq->csum | rxq->csum_l2tun | rxq->vlan_strip) {
+   seg->packet_type = rxq_cq_to_pkt_type(flags);
+   seg->ol_flags = rxq_cq_to_ol_flags(rxq, flags);
 #ifdef HAVE_EXP_DEVICE_ATTR_VLAN_OFFLOADS
-   if (flags & IBV_EXP_CQ_RX_CVLAN_STRIPPED_V1) {
-   seg->ol_flags |= PKT_RX_VLAN_PKT;
-   seg->vlan_tci = vlan_tci;
-   }
+   if (flags & IBV_EXP_CQ_RX_CVLAN_STRIPPED_V1) {
+   seg->ol_flags |= PKT_RX_VLAN_PKT;
+   seg->vlan_tci = vlan_tci;
+   }
 #endif /* HAVE_EXP_DEVICE_ATTR_VLAN_OFFLOADS */
-
+   }
/* Return packet. */
*(pkts++) = seg;
++pkts_ret;
-- 
2.1.4
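
A minimal sketch of the gating used in this patch, for illustration only.
The structures and the two translation helpers are hypothetical
placeholders for rxq_cq_to_pkt_type() and rxq_cq_to_ol_flags(). Note that
when no offload is enabled the metadata is left untouched, so the sketch
assumes the caller hands in zero-initialized metadata.

```c
#include <assert.h>
#include <stdint.h>

/* Per-queue offload switches, 1 bit each as in struct rxq. */
struct rxq_sketch {
    unsigned int csum:1;
    unsigned int csum_l2tun:1;
    unsigned int vlan_strip:1;
};

struct pkt_meta {
    uint32_t packet_type;
    uint64_t ol_flags;
};

/* Hypothetical translations of completion flags (placeholders). */
static uint32_t
flags_to_pkt_type(uint32_t flags)
{
    return flags & 0xff;
}

static uint64_t
flags_to_ol_flags(uint32_t flags)
{
    return (uint64_t)(flags >> 8);
}

static void
rx_fill_meta(const struct rxq_sketch *rxq, uint32_t flags,
             struct pkt_meta *meta)
{
    /*
     * Bitwise OR of 1-bit fields: one combined test rather than
     * three short-circuit branches.
     */
    if (rxq->csum | rxq->csum_l2tun | rxq->vlan_strip) {
        meta->packet_type = flags_to_pkt_type(flags);
        meta->ol_flags = flags_to_ol_flags(flags);
    }
}
```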



[dpdk-dev] [PATCH v3 5/7] mlx5: avoid lkey retrieval for inlined packets

2016-03-03 Thread Adrien Mazarguil
From: Nelio Laranjeiro 

Improves performance as the lkey is not needed by hardware in this case.

Signed-off-by: Nelio Laranjeiro 
---
 drivers/net/mlx5/mlx5_rxtx.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index c84ec8c..b82017e 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -512,16 +512,6 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
/* Retrieve buffer information. */
addr = rte_pktmbuf_mtod(buf, uintptr_t);
length = DATA_LEN(buf);
-   /* Retrieve Memory Region key for this memory pool. */
-   lkey = txq_mp2mr(txq, txq_mb2mp(buf));
-   if (unlikely(lkey == (uint32_t)-1)) {
-   /* MR does not exist. */
-   DEBUG("%p: unable to get MP <-> MR"
- " association", (void *)txq);
-   /* Clean up TX element. */
-   elt->buf = NULL;
-   goto stop;
-   }
/* Update element. */
elt->buf = buf;
if (txq->priv->vf)
@@ -545,12 +535,25 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 send_flags);
else
 #endif
+   {
+   /* Retrieve Memory Region key for this
+* memory pool. */
+   lkey = txq_mp2mr(txq, txq_mb2mp(buf));
+   if (unlikely(lkey == (uint32_t)-1)) {
+   /* MR does not exist. */
+   DEBUG("%p: unable to get MP <-> MR"
+ " association", (void *)txq);
+   /* Clean up TX element. */
+   elt->buf = NULL;
+   goto stop;
+   }
err = txq->send_pending
(txq->qp,
 addr,
 length,
 lkey,
 send_flags);
+   }
if (unlikely(err))
goto stop;
 #ifdef MLX5_PMD_SOFT_COUNTERS
-- 
2.1.4
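
The change above moves the key lookup into the only branch that needs it.
A small sketch of that control flow, with hypothetical names throughout
(SKETCH_MAX_INLINE, lookup_lkey() and send_one() are not driver symbols),
and a counter added purely so the effect can be observed:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_INLINE 64u /* Hypothetical inline threshold. */

static unsigned int lkey_lookups; /* Counts calls to the costly lookup. */

/* Hypothetical stand-in for txq_mp2mr(): always finds key 0x1234. */
static uint32_t
lookup_lkey(const void *buf)
{
    (void)buf;
    ++lkey_lookups;
    return 0x1234;
}

/*
 * Returns 1 when the packet is sent inline, 0 when sent by reference,
 * -1 when no memory region key is available.
 */
static int
send_one(const void *buf, uint32_t length, uint32_t *lkey_used)
{
    if (length <= SKETCH_MAX_INLINE) {
        /* Data is copied into the descriptor, no key required. */
        *lkey_used = 0;
        return 1;
    }
    /* Only the by-reference path pays for the lookup. */
    *lkey_used = lookup_lkey(buf);
    if (*lkey_used == (uint32_t)-1)
        return -1;
    return 0;
}
```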



[dpdk-dev] [PATCH v3 6/7] mlx5: free buffers immediately after completion

2016-03-03 Thread Adrien Mazarguil
From: Nelio Laranjeiro 

This lowers the number of cache misses.

Signed-off-by: Nelio Laranjeiro 
---
 drivers/net/mlx5/mlx5_rxtx.c | 35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index b82017e..622ac17 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -84,6 +84,7 @@ txq_complete(struct txq *txq)
 {
unsigned int elts_comp = txq->elts_comp;
unsigned int elts_tail = txq->elts_tail;
+   unsigned int elts_free = txq->elts_tail;
const unsigned int elts_n = txq->elts_n;
int wcs_n;

@@ -110,6 +111,25 @@ txq_complete(struct txq *txq)
elts_tail += wcs_n * txq->elts_comp_cd_init;
if (elts_tail >= elts_n)
elts_tail -= elts_n;
+
+   while (elts_free != elts_tail) {
+   struct txq_elt *elt = &(*txq->elts)[elts_free];
+   unsigned int elts_free_next =
+   (((elts_free + 1) == elts_n) ? 0 : elts_free + 1);
+   struct rte_mbuf *tmp = elt->buf;
+   struct txq_elt *elt_next = &(*txq->elts)[elts_free_next];
+
+   RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
+   /* Faster than rte_pktmbuf_free(). */
+   do {
+   struct rte_mbuf *next = NEXT(tmp);
+
+   rte_pktmbuf_free_seg(tmp);
+   tmp = next;
+   } while (tmp != NULL);
+   elts_free = elts_free_next;
+   }
+
txq->elts_tail = elts_tail;
txq->elts_comp = elts_comp;
return 0;
@@ -464,7 +484,6 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
struct rte_mbuf *buf_next = pkts[i + 1];
unsigned int elts_head_next =
(((elts_head + 1) == elts_n) ? 0 : elts_head + 1);
-   struct txq_elt *elt_next = &(*txq->elts)[elts_head_next];
struct txq_elt *elt = &(*txq->elts)[elts_head];
unsigned int segs = NB_SEGS(buf);
 #ifdef MLX5_PMD_SOFT_COUNTERS
@@ -472,18 +491,6 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 #endif
uint32_t send_flags = 0;

-   /* Clean up old buffer. */
-   if (likely(elt->buf != NULL)) {
-   struct rte_mbuf *tmp = elt->buf;
-
-   /* Faster than rte_pktmbuf_free(). */
-   do {
-   struct rte_mbuf *next = NEXT(tmp);
-
-   rte_pktmbuf_free_seg(tmp);
-   tmp = next;
-   } while (tmp != NULL);
-   }
if (i + 1 < max)
rte_prefetch0(buf_next);
/* Request TX completion. */
@@ -517,7 +524,6 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
if (txq->priv->vf)
rte_prefetch0((volatile void *)
  (uintptr_t)addr);
-   RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
/* Prefetch next buffer data. */
if (i + 1 < max) {
buf_next_addr =
@@ -568,7 +574,6 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
  &sges);
if (ret.length == (unsigned int)-1)
goto stop;
-   RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
/* Put SG list into send queue. */
err = txq->send_pending_sg_list
(txq->qp,
-- 
2.1.4
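
The loop added to txq_complete() walks the ring from the old tail to the
new one, wrapping at elts_n, and frees each segment chain on the way. A
self-contained sketch of that walk is given below. It is an approximation
under assumptions: struct seg and struct ring_sketch are hypothetical,
free() replaces rte_pktmbuf_free_seg(), and the inner loop is a plain while
so that empty slots are tolerated, whereas the patch uses do/while and
expects every completed slot to hold a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical segment, standing in for a chained struct rte_mbuf. */
struct seg {
    struct seg *next;
};

struct ring_sketch {
    struct seg **elts;      /* One chain head per slot. */
    unsigned int elts_n;    /* Number of slots. */
    unsigned int elts_tail; /* First slot awaiting completion. */
};

/*
 * Free every chain in [elts_tail, new_tail), wrapping around at elts_n.
 * Returns the number of segments released.
 */
static unsigned int
ring_release(struct ring_sketch *ring, unsigned int new_tail)
{
    unsigned int idx = ring->elts_tail;
    unsigned int freed = 0;

    assert(new_tail < ring->elts_n);
    while (idx != new_tail) {
        struct seg *tmp = ring->elts[idx];

        while (tmp != NULL) {
            struct seg *next = tmp->next;

            free(tmp);
            ++freed;
            tmp = next;
        }
        ring->elts[idx] = NULL;
        idx = ((idx + 1) == ring->elts_n) ? 0 : idx + 1;
    }
    ring->elts_tail = new_tail;
    return freed;
}
```

Releasing buffers here, right after the completion is seen, means the
send loop no longer has to inspect the slot it is about to reuse.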



[dpdk-dev] [PATCH v3 7/7] mlx: use aligned memory to register regions

2016-03-03 Thread Adrien Mazarguil
The first and last memory pool elements are usually cache-aligned but not
page-aligned, particularly when using huge pages.

Hardware performance can be improved significantly by registering memory
regions starting and ending on page boundaries.

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c  | 58 +---
 drivers/net/mlx5/mlx5_rxq.c  |  6 +
 drivers/net/mlx5/mlx5_rxtx.c | 52 ---
 drivers/net/mlx5/mlx5_rxtx.h |  1 +
 4 files changed, 99 insertions(+), 18 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 6688f66..3c1f4c2 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -86,6 +86,7 @@
 #include 
 #include 
 #include 
+#include 
 #ifdef PEDANTIC
 #pragma GCC diagnostic error "-pedantic"
 #endif
@@ -1177,6 +1178,52 @@ txq_complete(struct txq *txq)
return 0;
 }

+/* For best performance, this function should not be inlined. */
+static struct ibv_mr *mlx4_mp2mr(struct ibv_pd *, const struct rte_mempool *)
+   __attribute__((noinline));
+
+/**
+ * Register mempool as a memory region.
+ *
+ * @param pd
+ *   Pointer to protection domain.
+ * @param mp
+ *   Pointer to memory pool.
+ *
+ * @return
+ *   Memory region pointer, NULL in case of error.
+ */
+static struct ibv_mr *
+mlx4_mp2mr(struct ibv_pd *pd, const struct rte_mempool *mp)
+{
+   const struct rte_memseg *ms = rte_eal_get_physmem_layout();
+   uintptr_t start = mp->elt_va_start;
+   uintptr_t end = mp->elt_va_end;
+   unsigned int i;
+
+   DEBUG("mempool %p area start=%p end=%p size=%zu",
+ (const void *)mp, (void *)start, (void *)end,
+ (size_t)(end - start));
+   /* Round start and end to page boundary if found in memory segments. */
+   for (i = 0; (i < RTE_MAX_MEMSEG) && (ms[i].addr != NULL); ++i) {
+   uintptr_t addr = (uintptr_t)ms[i].addr;
+   size_t len = ms[i].len;
+   unsigned int align = ms[i].hugepage_sz;
+
+   if ((start > addr) && (start < addr + len))
+   start = RTE_ALIGN_FLOOR(start, align);
+   if ((end > addr) && (end < addr + len))
+   end = RTE_ALIGN_CEIL(end, align);
+   }
+   DEBUG("mempool %p using start=%p end=%p size=%zu for MR",
+ (const void *)mp, (void *)start, (void *)end,
+ (size_t)(end - start));
+   return ibv_reg_mr(pd,
+ (void *)start,
+ end - start,
+ IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);
+}
+
 /**
  * Get Memory Pool (MP) from mbuf. If mbuf is indirect, the pool from which
  * the cloned mbuf is allocated is returned instead.
@@ -1228,10 +1275,7 @@ txq_mp2mr(struct txq *txq, const struct rte_mempool *mp)
/* Add a new entry, register MR first. */
DEBUG("%p: discovered new memory pool \"%s\" (%p)",
  (void *)txq, mp->name, (const void *)mp);
-   mr = ibv_reg_mr(txq->priv->pd,
-   (void *)mp->elt_va_start,
-   (mp->elt_va_end - mp->elt_va_start),
-   (IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE));
+   mr = mlx4_mp2mr(txq->priv->pd, mp);
if (unlikely(mr == NULL)) {
DEBUG("%p: unable to configure MR, ibv_reg_mr() failed.",
  (void *)txq);
@@ -3713,11 +3757,7 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
DEBUG("%p: %s scattered packets support (%u WRs)",
  (void *)dev, (tmpl.sp ? "enabling" : "disabling"), desc);
/* Use the entire RX mempool as the memory region. */
-   tmpl.mr = ibv_reg_mr(priv->pd,
-(void *)mp->elt_va_start,
-(mp->elt_va_end - mp->elt_va_start),
-(IBV_ACCESS_LOCAL_WRITE |
- IBV_ACCESS_REMOTE_WRITE));
+   tmpl.mr = mlx4_mp2mr(priv->pd, mp);
if (tmpl.mr == NULL) {
ret = EINVAL;
ERROR("%p: MR creation failure: %s",
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 55d002e..0f5ac65 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1190,11 +1190,7 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
DEBUG("%p: %s scattered packets support (%u WRs)",
  (void *)dev, (tmpl.sp ? "enabling" : "disabling"), desc);
/* Use the entire RX mempool as the memory region. */
-   tmpl.mr = ibv_reg_mr(priv->pd,
-(void *)mp->elt_va_start,
-(mp->elt_va_end - mp->elt_va_start),
-(IBV_ACCESS_LOCAL_WRITE |
- IBV_ACCESS_REMOTE_WRITE));
+   tmpl.mr = mlx5_mp2mr(priv->pd, mp);
if (tmpl.mr == NULL) {
ret = EINVAL
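
The rounding performed by mlx4_mp2mr() in the patch above can be tried in
isolation with the sketch below. The helpers are hypothetical equivalents
of RTE_ALIGN_FLOOR and RTE_ALIGN_CEIL that assume the alignment is a power
of two, and struct seg_sketch is a simplified stand-in for struct
rte_memseg; the comparison logic follows the loop shown in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round down to a multiple of align (align must be a power of two). */
static uintptr_t
align_floor(uintptr_t value, uintptr_t align)
{
    return value & ~(align - 1);
}

/* Round up to a multiple of align (align must be a power of two). */
static uintptr_t
align_ceil(uintptr_t value, uintptr_t align)
{
    return align_floor(value + align - 1, align);
}

/* Simplified memory segment description (hypothetical). */
struct seg_sketch {
    uintptr_t addr;    /* Segment start address. */
    size_t len;        /* Segment length in bytes. */
    uintptr_t page_sz; /* Page size backing the segment. */
};

/* Widen [*start, *end) to the page boundaries of the segments it hits. */
static void
widen_to_pages(const struct seg_sketch *segs, unsigned int n,
               uintptr_t *start, uintptr_t *end)
{
    unsigned int i;

    for (i = 0; i != n; ++i) {
        uintptr_t addr = segs[i].addr;
        size_t len = segs[i].len;

        if ((*start > addr) && (*start < addr + len))
            *start = align_floor(*start, segs[i].page_sz);
        if ((*end > addr) && (*end < addr + len))
            *end = align_ceil(*end, segs[i].page_sz);
    }
}
```

With one 4 MiB segment at 0x200000 backed by 2 MiB pages, a pool spanning
0x200040 to 0x3fffc0 is widened to 0x200000 through 0x400000.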

[dpdk-dev] [PATCH v2 0/7] Assorted fixes for mlx4 and mlx5

2016-03-03 Thread Adrien Mazarguil
This patchset addresses several minor issues; release notes are updated
accordingly.

Note: should be applied after "Performance optimizations for mlx5 and mlx4".

Changes in v2:
- None; resubmitted because of a dependency on the previous patchset.

Adrien Mazarguil (3):
  mlx5: manage all special flow types at once
  mlx5: remove redundant debugging message
  mlx5: apply VLAN filtering to broadcast and IPv6 multicast flows

Or Ami (2):
  mlx5: fix possible crash during initialization
  mlx5: check if port is configured as Ethernet device

Robin Jarry (1):
  mlx4: make sure that number of RX queues is a power of 2

Yaacov Hazan (1):
  mlx5: fix RX checksum offload in non L3/L4 packets

 doc/guides/rel_notes/release_16_04.rst |  17 
 drivers/net/mlx4/mlx4.c|   6 ++
 drivers/net/mlx5/Makefile  |   5 ++
 drivers/net/mlx5/mlx5.c|  18 ++--
 drivers/net/mlx5/mlx5.h|   2 +
 drivers/net/mlx5/mlx5_rxmode.c | 147 +
 drivers/net/mlx5/mlx5_rxq.c|   5 +-
 drivers/net/mlx5/mlx5_rxtx.c   |  26 --
 drivers/net/mlx5/mlx5_rxtx.h   |   4 +-
 drivers/net/mlx5/mlx5_trigger.c|  10 +--
 drivers/net/mlx5/mlx5_vlan.c   |   5 +-
 11 files changed, 204 insertions(+), 41 deletions(-)

-- 
2.1.4



[dpdk-dev] [PATCH v2 1/7] mlx5: fix possible crash during initialization

2016-03-03 Thread Adrien Mazarguil
From: Or Ami 

RSS configuration should not be freed when priv is NULL.

Fixes: 2f97422e7759 ("mlx5: support RSS hash update and get")

Signed-off-by: Or Ami 
---
 doc/guides/rel_notes/release_16_04.rst | 4 
 drivers/net/mlx5/mlx5.c| 6 --
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_04.rst 
b/doc/guides/rel_notes/release_16_04.rst
index c69e55e..953eaa1 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -150,6 +150,10 @@ Drivers

 * **vmxnet3: add TSO support.**

+* **mlx5: Fixed possible crash during initialization.**
+
+  A crash could occur when failing to allocate private device context.
+

 Libraries
 ~
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 575420e..41dcbbf 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -497,8 +497,10 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
continue;

 port_error:
-   rte_free(priv->rss_conf);
-   rte_free(priv);
+   if (priv) {
+   rte_free(priv->rss_conf);
+   rte_free(priv);
+   }
if (pd)
claim_zero(ibv_dealloc_pd(pd));
if (ctx)
-- 
2.1.4



[dpdk-dev] [PATCH v2 2/7] mlx5: check if port is configured as Ethernet device

2016-03-03 Thread Adrien Mazarguil
From: Or Ami 

If the port link layer is not Ethernet, notify the user.

Signed-off-by: Or Ami 
---
 doc/guides/rel_notes/release_16_04.rst | 5 +
 drivers/net/mlx5/mlx5.c| 7 +++
 2 files changed, 12 insertions(+)

diff --git a/doc/guides/rel_notes/release_16_04.rst 
b/doc/guides/rel_notes/release_16_04.rst
index 953eaa1..73d0cfc 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -154,6 +154,11 @@ Drivers

   A crash could occur when failing to allocate private device context.

+* **mlx5: Added port type check.**
+
+  Done to prevent port initialization on non-Ethernet link layers and
+  to report an error.
+

 Libraries
 ~
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 41dcbbf..ae2576f 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -348,6 +348,13 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
ERROR("port query failed: %s", strerror(err));
goto port_error;
}
+
+   if (port_attr.link_layer != IBV_LINK_LAYER_ETHERNET) {
+   ERROR("port %d is not configured in Ethernet mode",
+ port);
+   goto port_error;
+   }
+
if (port_attr.state != IBV_PORT_ACTIVE)
DEBUG("port %d is not active: \"%s\" (%d)",
  port, ibv_port_state_str(port_attr.state),
-- 
2.1.4



[dpdk-dev] [PATCH v2 3/7] mlx5: manage all special flow types at once

2016-03-03 Thread Adrien Mazarguil
This commit adds helpers to remove redundant code.

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.c |  5 +
 drivers/net/mlx5/mlx5.h |  2 ++
 drivers/net/mlx5/mlx5_rxmode.c  | 40 
 drivers/net/mlx5/mlx5_trigger.c | 10 ++
 4 files changed, 45 insertions(+), 12 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ae2576f..ad69ec2 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -88,10 +88,7 @@ mlx5_dev_close(struct rte_eth_dev *dev)
  ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
/* In case mlx5_dev_stop() has not been called. */
priv_dev_interrupt_handler_uninstall(priv, dev);
-   priv_special_flow_disable(priv, HASH_RXQ_FLOW_TYPE_ALLMULTI);
-   priv_special_flow_disable(priv, HASH_RXQ_FLOW_TYPE_PROMISC);
-   priv_special_flow_disable(priv, HASH_RXQ_FLOW_TYPE_BROADCAST);
-   priv_special_flow_disable(priv, HASH_RXQ_FLOW_TYPE_IPV6MULTI);
+   priv_special_flow_disable_all(priv);
priv_mac_addrs_disable(priv);
priv_destroy_hash_rxqs(priv);

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 8442016..43b24fb 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -199,6 +199,8 @@ int mlx5_dev_rss_reta_update(struct rte_eth_dev *,

 int priv_special_flow_enable(struct priv *, enum hash_rxq_flow_type);
 void priv_special_flow_disable(struct priv *, enum hash_rxq_flow_type);
+int priv_special_flow_enable_all(struct priv *);
+void priv_special_flow_disable_all(struct priv *);
 void mlx5_promiscuous_enable(struct rte_eth_dev *);
 void mlx5_promiscuous_disable(struct rte_eth_dev *);
 void mlx5_allmulticast_enable(struct rte_eth_dev *);
diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index 9ac7a41..bcf4231 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -268,6 +268,46 @@ priv_special_flow_disable(struct priv *priv, enum 
hash_rxq_flow_type flow_type)
 }

 /**
+ * Enable all special flows in all hash RX queues.
+ *
+ * @param priv
+ *   Private structure.
+ */
+int
+priv_special_flow_enable_all(struct priv *priv)
+{
+   enum hash_rxq_flow_type flow_type;
+
+   for (flow_type = 0; flow_type != HASH_RXQ_FLOW_TYPE_MAC; ++flow_type) {
+   int ret;
+
+   ret = priv_special_flow_enable(priv, flow_type);
+   if (!ret)
+   continue;
+   /* Failure, rollback. */
+   while (flow_type)
+   priv_special_flow_disable(priv, --flow_type);
+   return ret;
+   }
+   return 0;
+}
+
+/**
+ * Disable all special flows in all hash RX queues.
+ *
+ * @param priv
+ *   Private structure.
+ */
+void
+priv_special_flow_disable_all(struct priv *priv)
+{
+   enum hash_rxq_flow_type flow_type;
+
+   for (flow_type = 0; flow_type != HASH_RXQ_FLOW_TYPE_MAC; ++flow_type)
+   priv_special_flow_disable(priv, flow_type);
+}
+
+/**
  * DPDK callback to enable promiscuous mode.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index db7890f..b5ca7d4 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -80,10 +80,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
  " %s",
  (void *)priv, strerror(err));
/* Rollback. */
-   priv_special_flow_disable(priv, HASH_RXQ_FLOW_TYPE_IPV6MULTI);
-   priv_special_flow_disable(priv, HASH_RXQ_FLOW_TYPE_BROADCAST);
-   priv_special_flow_disable(priv, HASH_RXQ_FLOW_TYPE_ALLMULTI);
-   priv_special_flow_disable(priv, HASH_RXQ_FLOW_TYPE_PROMISC);
+   priv_special_flow_disable_all(priv);
priv_mac_addrs_disable(priv);
priv_destroy_hash_rxqs(priv);
}
@@ -113,10 +110,7 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
return;
}
DEBUG("%p: cleaning up and destroying hash RX queues", (void *)dev);
-   priv_special_flow_disable(priv, HASH_RXQ_FLOW_TYPE_IPV6MULTI);
-   priv_special_flow_disable(priv, HASH_RXQ_FLOW_TYPE_BROADCAST);
-   priv_special_flow_disable(priv, HASH_RXQ_FLOW_TYPE_ALLMULTI);
-   priv_special_flow_disable(priv, HASH_RXQ_FLOW_TYPE_PROMISC);
+   priv_special_flow_disable_all(priv);
priv_mac_addrs_disable(priv);
priv_destroy_hash_rxqs(priv);
priv_fdir_disable(priv);
-- 
2.1.4
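
For readers following the rollback logic in priv_special_flow_enable_all(),
here is a minimal standalone sketch of the same pattern. All names below
(flow_type, flow_enable, flow_disable, fail_on, enabled) are hypothetical
stand-ins and not the driver's own; only the loop shape is taken from the
patch above.

```c
#include <errno.h>

/* Hypothetical flow types; the driver uses enum hash_rxq_flow_type. */
enum flow_type {
	FLOW_PROMISC,
	FLOW_ALLMULTI,
	FLOW_BROADCAST,
	FLOW_IPV6MULTI,
	FLOW_MAC, /* First non-special type, used as the loop bound. */
};

/* Tracks which special flows are currently enabled. */
static int enabled[FLOW_MAC];
/* Illustration knob: index of the type whose enable fails (-1 for none). */
static int fail_on = -1;

static int
flow_enable(unsigned int type)
{
	if ((int)type == fail_on)
		return EINVAL;
	enabled[type] = 1;
	return 0;
}

static void
flow_disable(unsigned int type)
{
	enabled[type] = 0;
}

/* Enable every special flow, or none of them if one fails. */
static int
flow_enable_all(void)
{
	unsigned int type;

	for (type = 0; type != FLOW_MAC; ++type) {
		int ret = flow_enable(type);

		if (!ret)
			continue;
		/* Failure: undo only what was enabled so far, newest first. */
		while (type)
			flow_disable(--type);
		return ret;
	}
	return 0;
}
```

The point of the pre-decrement in the while loop is that the type that failed
is never passed to the disable function; only the ones before it are.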



[dpdk-dev] [PATCH v2 4/7] mlx5: remove redundant debugging message

2016-03-03 Thread Adrien Mazarguil
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_rxmode.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index bcf4231..730527e 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -204,8 +204,6 @@ hash_rxq_special_flow_disable(struct hash_rxq *hash_rxq,
 {
if (hash_rxq->special_flow[flow_type] == NULL)
return;
-   DEBUG("%p: disabling special flow %s (%d)",
- (void *)hash_rxq, hash_rxq_flow_type_str(flow_type), flow_type);
claim_zero(ibv_exp_destroy_flow(hash_rxq->special_flow[flow_type]));
hash_rxq->special_flow[flow_type] = NULL;
DEBUG("%p: special flow %s (%d) disabled",
-- 
2.1.4



[dpdk-dev] [PATCH v2 5/7] mlx5: apply VLAN filtering to broadcast and IPv6 multicast flows

2016-03-03 Thread Adrien Mazarguil
Unlike promiscuous and allmulticast flows, broadcast and IPv6 multicast
flows should remain VLAN-specific.

Signed-off-by: Adrien Mazarguil 
---
 doc/guides/rel_notes/release_16_04.rst |   4 ++
 drivers/net/mlx5/mlx5_rxmode.c | 105 +
 drivers/net/mlx5/mlx5_rxq.c|   5 +-
 drivers/net/mlx5/mlx5_rxtx.h   |   4 +-
 drivers/net/mlx5/mlx5_vlan.c   |   5 +-
 5 files changed, 106 insertions(+), 17 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_04.rst 
b/doc/guides/rel_notes/release_16_04.rst
index 73d0cfc..6e94bbe 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -159,6 +159,10 @@ Drivers
   Done to prevent port initialization on non-Ethernet link layers and
   to report an error.

+* **mlx5: Applied VLAN filtering to broadcast and IPv6 multicast flows.**
+
+  Prevented reception of multicast frames outside of configured VLANs.
+

 Libraries
 ~
diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index 730527e..2bc005e 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -74,6 +74,7 @@ static const struct special_flow_init special_flow_init[] = {
 #endif /* HAVE_FLOW_SPEC_IPV6 */
1 << HASH_RXQ_ETH |
0,
+   .per_vlan = 0,
},
[HASH_RXQ_FLOW_TYPE_ALLMULTI] = {
.dst_mac_val = "\x01\x00\x00\x00\x00\x00",
@@ -87,6 +88,7 @@ static const struct special_flow_init special_flow_init[] = {
 #endif /* HAVE_FLOW_SPEC_IPV6 */
1 << HASH_RXQ_ETH |
0,
+   .per_vlan = 0,
},
[HASH_RXQ_FLOW_TYPE_BROADCAST] = {
.dst_mac_val = "\xff\xff\xff\xff\xff\xff",
@@ -100,6 +102,7 @@ static const struct special_flow_init special_flow_init[] = 
{
 #endif /* HAVE_FLOW_SPEC_IPV6 */
1 << HASH_RXQ_ETH |
0,
+   .per_vlan = 1,
},
 #ifdef HAVE_FLOW_SPEC_IPV6
[HASH_RXQ_FLOW_TYPE_IPV6MULTI] = {
@@ -110,24 +113,28 @@ static const struct special_flow_init special_flow_init[] 
= {
1 << HASH_RXQ_IPV6 |
1 << HASH_RXQ_ETH |
0,
+   .per_vlan = 1,
},
 #endif /* HAVE_FLOW_SPEC_IPV6 */
 };

 /**
- * Enable a special flow in a hash RX queue.
+ * Enable a special flow in a hash RX queue for a given VLAN index.
  *
  * @param hash_rxq
  *   Pointer to hash RX queue structure.
  * @param flow_type
  *   Special flow type.
+ * @param vlan_index
+ *   VLAN index to use.
  *
  * @return
  *   0 on success, errno value on failure.
  */
 static int
-hash_rxq_special_flow_enable(struct hash_rxq *hash_rxq,
-enum hash_rxq_flow_type flow_type)
+hash_rxq_special_flow_enable_vlan(struct hash_rxq *hash_rxq,
+ enum hash_rxq_flow_type flow_type,
+ unsigned int vlan_index)
 {
struct priv *priv = hash_rxq->priv;
struct ibv_exp_flow *flow;
@@ -136,12 +143,15 @@ hash_rxq_special_flow_enable(struct hash_rxq *hash_rxq,
struct ibv_exp_flow_spec_eth *spec = &data->spec;
const uint8_t *mac;
const uint8_t *mask;
+   unsigned int vlan_enabled = (priv->vlan_filter_n &&
+special_flow_init[flow_type].per_vlan);
+   unsigned int vlan_id = priv->vlan_filter[vlan_index];

/* Check if flow is relevant for this hash_rxq. */
if (!(special_flow_init[flow_type].hash_types & (1 << hash_rxq->type)))
return 0;
/* Check if flow already exists. */
-   if (hash_rxq->special_flow[flow_type] != NULL)
+   if (hash_rxq->special_flow[flow_type][vlan_index] != NULL)
return 0;

/*
@@ -164,12 +174,14 @@ hash_rxq_special_flow_enable(struct hash_rxq *hash_rxq,
mac[0], mac[1], mac[2],
mac[3], mac[4], mac[5],
},
+   .vlan_tag = (vlan_enabled ? htons(vlan_id) : 0),
},
.mask = {
.dst_mac = {
mask[0], mask[1], mask[2],
mask[3], mask[4], mask[5],
},
+   .vlan_tag = (vlan_enabled ? htons(0xfff) : 0),
},
};

@@ -184,9 +196,77 @@ hash_rxq_special_flow_enable(struct hash_rxq *hash_rxq,
return errno;
return EINVAL;
}
-   hash_rxq->special_flow[flow_type] = flow;
-   DEBUG("%p: enabling special flow %s (%d)",
- (void *)hash_rxq, hash_rxq_flow_type_str(flow_type), flow_type);
+   hash_rxq->special_flow[flow_type][vlan_index] = flow;
+   DEBUG("%p: special flow %s (index %d) VLAN %u (index %u) enabled",
+ (vo

[dpdk-dev] [PATCH v2 6/7] mlx5: fix RX checksum offload in non L3/L4 packets

2016-03-03 Thread Adrien Mazarguil
From: Yaacov Hazan 

Change rxq_cq_to_ol_flags() to set checksum flags according to packet type,
so for non L3/L4 packets the mbuf chksum_bad flags will not be set.

Fixes: 67fa62bc672d ("mlx5: support checksum offload")

Signed-off-by: Yaacov Hazan 
---
 doc/guides/rel_notes/release_16_04.rst |  4 
 drivers/net/mlx5/Makefile  |  5 +
 drivers/net/mlx5/mlx5_rxtx.c   | 26 ++
 3 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_04.rst 
b/doc/guides/rel_notes/release_16_04.rst
index 6e94bbe..8669515 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -163,6 +163,10 @@ Drivers

   Prevented reception of multicast frames outside of configured VLANs.

+* **mlx5: Fixed RX checksum offload in non L3/L4 packets.**
+
+  Fixed report of bad checksum for packets of unknown type.
+

 Libraries
 ~
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 39cdf2c..7076ae3 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -132,6 +132,11 @@ mlx5_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
infiniband/verbs.h \
enum IBV_EXP_DEVICE_ATTR_VLAN_OFFLOADS \
$(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_EXP_CQ_RX_TCP_PACKET \
+   infiniband/verbs.h \
+   enum IBV_EXP_CQ_RX_TCP_PACKET \
+   $(AUTOCONF_OUTPUT)

 $(SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD):.c=.o): mlx5_autoconf.h

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 4c53c7a..4919189 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -719,14 +719,24 @@ rxq_cq_to_ol_flags(const struct rxq *rxq, uint32_t flags)
 {
uint32_t ol_flags = 0;

-   if (rxq->csum)
-   ol_flags |=
-   TRANSPOSE(~flags,
- IBV_EXP_CQ_RX_IP_CSUM_OK,
- PKT_RX_IP_CKSUM_BAD) |
-   TRANSPOSE(~flags,
- IBV_EXP_CQ_RX_TCP_UDP_CSUM_OK,
- PKT_RX_L4_CKSUM_BAD);
+   if (rxq->csum) {
+   /* Set IP checksum flag only for IPv4/IPv6 packets. */
+   if (flags &
+   (IBV_EXP_CQ_RX_IPV4_PACKET | IBV_EXP_CQ_RX_IPV6_PACKET))
+   ol_flags |=
+   TRANSPOSE(~flags,
+   IBV_EXP_CQ_RX_IP_CSUM_OK,
+   PKT_RX_IP_CKSUM_BAD);
+#ifdef HAVE_EXP_CQ_RX_TCP_PACKET
+   /* Set L4 checksum flag only for TCP/UDP packets. */
+   if (flags &
+   (IBV_EXP_CQ_RX_TCP_PACKET | IBV_EXP_CQ_RX_UDP_PACKET))
+#endif /* HAVE_EXP_CQ_RX_TCP_PACKET */
+   ol_flags |=
+   TRANSPOSE(~flags,
+   IBV_EXP_CQ_RX_TCP_UDP_CSUM_OK,
+   PKT_RX_L4_CKSUM_BAD);
+   }
/*
 * PKT_RX_IP_CKSUM_BAD and PKT_RX_L4_CKSUM_BAD are used in place
 * of PKT_RX_EIP_CKSUM_BAD because the latter is not functional
-- 
2.1.4
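
To illustrate the intent of the change to rxq_cq_to_ol_flags(), the sketch
below gates each "bad checksum" flag on the packet type. The flag values are
hypothetical placeholders (not the real IBV_EXP_CQ_RX_* or PKT_RX_* values),
and plain branches are used instead of the driver's TRANSPOSE() macro, so
this shows the logic only and not the actual implementation.

```c
#include <stdint.h>

/* Hypothetical completion flags standing in for IBV_EXP_CQ_RX_*. */
#define CQ_IP_CSUM_OK   (1u << 0)
#define CQ_L4_CSUM_OK   (1u << 1)
#define CQ_IPV4_PACKET  (1u << 2)
#define CQ_IPV6_PACKET  (1u << 3)
#define CQ_TCP_PACKET   (1u << 4)
#define CQ_UDP_PACKET   (1u << 5)

/* Hypothetical mbuf flags standing in for PKT_RX_*_CKSUM_BAD. */
#define RX_IP_CKSUM_BAD (1u << 0)
#define RX_L4_CKSUM_BAD (1u << 1)

/* Translate completion flags into offload flags, by packet type. */
static uint32_t
cq_to_ol_flags(uint32_t flags)
{
	uint32_t ol_flags = 0;

	/* Report a bad IP checksum only for IPv4/IPv6 packets. */
	if ((flags & (CQ_IPV4_PACKET | CQ_IPV6_PACKET)) &&
	    !(flags & CQ_IP_CSUM_OK))
		ol_flags |= RX_IP_CKSUM_BAD;
	/* Report a bad L4 checksum only for TCP/UDP packets. */
	if ((flags & (CQ_TCP_PACKET | CQ_UDP_PACKET)) &&
	    !(flags & CQ_L4_CSUM_OK))
		ol_flags |= RX_L4_CKSUM_BAD;
	return ol_flags;
}
```

With this gating, a frame that is neither IP nor TCP/UDP (ARP for instance)
yields no checksum flags at all, which is the behavior the commit message
describes.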



[dpdk-dev] [PATCH v2 7/7] mlx4: make sure that number of RX queues is a power of 2

2016-03-03 Thread Adrien Mazarguil
From: Robin Jarry 

The documentation specifies that the hardware only supports a number of
RX queues that is a power of 2.

Since ibv_exp_create_qp may not return an error when the number of
queues is unsupported by hardware, sanitize the value in dev_configure.

Signed-off-by: Robin Jarry 
---
 drivers/net/mlx4/mlx4.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 3c1f4c2..67025c7 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -734,6 +734,12 @@ dev_configure(struct rte_eth_dev *dev)
}
if (rxqs_n == priv->rxqs_n)
return 0;
+   if ((rxqs_n & (rxqs_n - 1)) != 0) {
+   ERROR("%p: invalid number of RX queues (%u),"
+ " must be a power of 2",
+ (void *)dev, rxqs_n);
+   return EINVAL;
+   }
INFO("%p: RX queues number update: %u -> %u",
 (void *)dev, priv->rxqs_n, rxqs_n);
/* If RSS is enabled, disable it first. */
-- 
2.1.4
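
As a side note on the test used in dev_configure(), the expression
(n & (n - 1)) is zero exactly when n has at most one bit set. A minimal
sketch follows; the helper name is_power_of_2() is made up for illustration,
and the explicit rejection of zero is a choice of this sketch, since the bit
test on its own also lets 0 through.

```c
#include <stdbool.h>

/* Return true when n is a power of 2 (1, 2, 4, 8, ...). */
static bool
is_power_of_2(unsigned int n)
{
	/* Clearing the lowest set bit leaves 0 only if a single bit was set. */
	return n != 0 && (n & (n - 1)) == 0;
}
```

For example, 8 (binary 1000) and 7 (binary 0111) share no bits, so 8 passes,
while 6 (binary 110) and 5 (binary 101) share bit 2, so 6 is rejected.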



[dpdk-dev] [PATCH v2 0/5] Implement missing features in mlx5

2016-03-03 Thread Adrien Mazarguil
This patchset adds to mlx5 a few features available in mlx4 (TX from
secondary processes) or provided by Verbs (support for HW packet padding,
TX VLAN insertion).

Release notes and documentation are updated accordingly.

Note: should be applied after "Assorted fixes for mlx4 and mlx5".

Changes in v2:
- Added support for CRC stripping configuration.
- Updated packet padding feature macro and made cosmetic changes to its
  implementation to match CRC stripping's.
- Updated release notes about packet padding.
- Updated TX VLAN insertion documentation.

Olga Shern (2):
  mlx5: add RX CRC stripping configuration
  mlx5: add support for HW packet padding

Or Ami (2):
  mlx5: add callbacks to support link (up / down) changes
  mlx5: allow operation in secondary processes

Yaacov Hazan (1):
  mlx5: add VLAN insertion offload

 config/common_linuxapp |   1 +
 doc/guides/nics/mlx5.rst   |  28 ++-
 doc/guides/rel_notes/release_16_04.rst |  27 +++
 drivers/net/mlx5/Makefile  |  19 +++
 drivers/net/mlx5/mlx5.c|  79 -
 drivers/net/mlx5/mlx5.h|  20 +++
 drivers/net/mlx5/mlx5_defs.h   |   9 +
 drivers/net/mlx5/mlx5_ethdev.c | 299 -
 drivers/net/mlx5/mlx5_mac.c|   6 +
 drivers/net/mlx5/mlx5_rxmode.c |  12 ++
 drivers/net/mlx5/mlx5_rxq.c|  85 ++
 drivers/net/mlx5/mlx5_rxtx.c   | 115 ++---
 drivers/net/mlx5/mlx5_rxtx.h   |  22 +++
 drivers/net/mlx5/mlx5_stats.c  |   2 +-
 drivers/net/mlx5/mlx5_trigger.c|   6 +
 drivers/net/mlx5/mlx5_txq.c|  65 ++-
 16 files changed, 753 insertions(+), 42 deletions(-)

-- 
2.1.4



[dpdk-dev] [PATCH v2 1/5] mlx5: add callbacks to support link (up / down) changes

2016-03-03 Thread Adrien Mazarguil
From: Or Ami 

Burst functions are updated to make sure applications cannot attempt to
send/receive after link is brought down.

Signed-off-by: Or Ami 
---
 doc/guides/rel_notes/release_16_04.rst |  4 ++
 drivers/net/mlx5/mlx5.c|  2 +
 drivers/net/mlx5/mlx5.h|  2 +
 drivers/net/mlx5/mlx5_ethdev.c | 85 ++
 4 files changed, 93 insertions(+)

diff --git a/doc/guides/rel_notes/release_16_04.rst 
b/doc/guides/rel_notes/release_16_04.rst
index 8669515..5e43d50 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -87,6 +87,10 @@ This section should contain new features added in this 
release. Sample format:

   Only available with Mellanox OFED >= 3.2.

+* **mlx5: Added link up/down callbacks.**
+
+  Implemented callbacks to bring link up and down.
+

 Resolved Issues
 ---
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ad69ec2..14ac4ba 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -148,6 +148,8 @@ static const struct eth_dev_ops mlx5_dev_ops = {
.dev_configure = mlx5_dev_configure,
.dev_start = mlx5_dev_start,
.dev_stop = mlx5_dev_stop,
+   .dev_set_link_down = mlx5_set_link_down,
+   .dev_set_link_up = mlx5_set_link_up,
.dev_close = mlx5_dev_close,
.promiscuous_enable = mlx5_promiscuous_enable,
.promiscuous_disable = mlx5_promiscuous_disable,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 43b24fb..9a3f240 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -168,6 +168,8 @@ void mlx5_dev_link_status_handler(void *);
 void mlx5_dev_interrupt_handler(struct rte_intr_handle *, void *);
 void priv_dev_interrupt_handler_uninstall(struct priv *, struct rte_eth_dev *);
 void priv_dev_interrupt_handler_install(struct priv *, struct rte_eth_dev *);
+int mlx5_set_link_down(struct rte_eth_dev *dev);
+int mlx5_set_link_up(struct rte_eth_dev *dev);

 /* mlx5_mac.c */

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 6704382..f609e0f 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -968,3 +968,88 @@ priv_dev_interrupt_handler_install(struct priv *priv, 
struct rte_eth_dev *dev)
   dev);
}
 }
+
+/**
+ * Change the link state (UP / DOWN).
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param up
+ *   Nonzero for link up, otherwise link down.
+ *
+ * @return
+ *   0 on success, errno value on failure.
+ */
+static int
+priv_set_link(struct priv *priv, int up)
+{
+   struct rte_eth_dev *dev = priv->dev;
+   int err;
+   unsigned int i;
+
+   if (up) {
+   err = priv_set_flags(priv, ~IFF_UP, IFF_UP);
+   if (err)
+   return err;
+   for (i = 0; i < priv->rxqs_n; i++)
+   if ((*priv->rxqs)[i]->sp)
+   break;
+   /* Check if an sp queue exists.
+* Note: Some old frames might be received.
+*/
+   if (i == priv->rxqs_n)
+   dev->rx_pkt_burst = mlx5_rx_burst;
+   else
+   dev->rx_pkt_burst = mlx5_rx_burst_sp;
+   dev->tx_pkt_burst = mlx5_tx_burst;
+   } else {
+   err = priv_set_flags(priv, ~IFF_UP, ~IFF_UP);
+   if (err)
+   return err;
+   dev->rx_pkt_burst = removed_rx_burst;
+   dev->tx_pkt_burst = removed_tx_burst;
+   }
+   return 0;
+}
+
+/**
+ * DPDK callback to bring the link DOWN.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, errno value on failure.
+ */
+int
+mlx5_set_link_down(struct rte_eth_dev *dev)
+{
+   struct priv *priv = dev->data->dev_private;
+   int err;
+
+   priv_lock(priv);
+   err = priv_set_link(priv, 0);
+   priv_unlock(priv);
+   return err;
+}
+
+/**
+ * DPDK callback to bring the link UP.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, errno value on failure.
+ */
+int
+mlx5_set_link_up(struct rte_eth_dev *dev)
+{
+   struct priv *priv = dev->data->dev_private;
+   int err;
+
+   priv_lock(priv);
+   err = priv_set_link(priv, 1);
+   priv_unlock(priv);
+   return err;
+}
-- 
2.1.4
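
The way priv_set_link() keeps applications from sending or receiving on a
downed link is by swapping the burst function pointers. A rough standalone
sketch of that idea is given below; the types and names (burst_fn, struct
port, working_burst, removed_burst, port_set_link) are hypothetical
simplifications and do not match the DPDK callback signatures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical burst callback type; real DPDK callbacks take mbuf arrays. */
typedef uint16_t (*burst_fn)(void *queue, void **pkts, uint16_t n);

struct port {
	burst_fn rx_burst;
	burst_fn tx_burst;
};

/* Stand-in for a working burst function: pretends all packets went through. */
static uint16_t
working_burst(void *queue, void **pkts, uint16_t n)
{
	(void)queue;
	(void)pkts;
	return n;
}

/* Stub installed while the link is down: never touches the queue. */
static uint16_t
removed_burst(void *queue, void **pkts, uint16_t n)
{
	(void)queue;
	(void)pkts;
	(void)n;
	return 0;
}

/* Select burst functions according to the requested link state. */
static void
port_set_link(struct port *port, int up)
{
	port->rx_burst = up ? working_burst : removed_burst;
	port->tx_burst = up ? working_burst : removed_burst;
}
```

Since the data path only ever calls through the pointers, no per-packet link
state check is needed; a downed port simply reports zero packets.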



[dpdk-dev] [PATCH v2 2/5] mlx5: allow operation in secondary processes

2016-03-03 Thread Adrien Mazarguil
From: Or Ami 

Secondary processes are expected to use queues and other resources
allocated by the primary; however, Verbs resources can only be shared
between processes when inherited through fork().

This limitation can be worked around for TX by configuring separate queues
from secondary processes.

Signed-off-by: Or Ami 
---
 doc/guides/nics/mlx5.rst   |   3 +-
 doc/guides/rel_notes/release_16_04.rst |   4 +
 drivers/net/mlx5/mlx5.c|  42 +--
 drivers/net/mlx5/mlx5.h|  12 ++
 drivers/net/mlx5/mlx5_ethdev.c | 202 -
 drivers/net/mlx5/mlx5_mac.c|   6 +
 drivers/net/mlx5/mlx5_rxmode.c |  12 ++
 drivers/net/mlx5/mlx5_rxq.c|  46 
 drivers/net/mlx5/mlx5_rxtx.h   |   8 ++
 drivers/net/mlx5/mlx5_stats.c  |   2 +-
 drivers/net/mlx5/mlx5_trigger.c|   6 +
 drivers/net/mlx5/mlx5_txq.c|  50 +++-
 12 files changed, 378 insertions(+), 15 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index edfbf1f..f0d8a7e 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -88,6 +88,7 @@ Features
 - Multicast promiscuous mode.
 - Hardware checksum offloads.
 - Flow director (RTE_FDIR_MODE_PERFECT and RTE_FDIR_MODE_PERFECT_MAC_VLAN).
+- Secondary process TX is supported.

 Limitations
 ---
@@ -96,7 +97,7 @@ Limitations
 - Inner RSS for VXLAN frames is not supported yet.
 - Port statistics through software counters only.
 - Hardware checksum offloads for VXLAN inner header are not supported yet.
-- Secondary processes are not supported yet.
+- Secondary process RX is not supported.

 Configuration
 -
diff --git a/doc/guides/rel_notes/release_16_04.rst 
b/doc/guides/rel_notes/release_16_04.rst
index 5e43d50..49eed7e 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -91,6 +91,10 @@ This section should contain new features added in this 
release. Sample format:

   Implemented callbacks to bring link up and down.

+* **mlx5: Added support for operation in secondary processes.**
+
+  Implemented TX support in secondary processes (like mlx4).
+

 Resolved Issues
 ---
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 14ac4ba..998e6f0 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -78,7 +78,7 @@
 static void
 mlx5_dev_close(struct rte_eth_dev *dev)
 {
-   struct priv *priv = dev->data->dev_private;
+   struct priv *priv = mlx5_get_priv(dev);
void *tmp;
unsigned int i;

@@ -483,18 +483,44 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
goto port_error;
}

-   eth_dev->data->dev_private = priv;
-   eth_dev->pci_dev = pci_dev;
+   /* Secondary processes have to use local storage for their
+* private data as well as a copy of eth_dev->data, but this
+* pointer must not be modified before burst functions are
+* actually called. */
+   if (mlx5_is_secondary()) {
+   struct mlx5_secondary_data *sd =
+   &mlx5_secondary_data[eth_dev->data->port_id];
+   sd->primary_priv = eth_dev->data->dev_private;
+   if (sd->primary_priv == NULL) {
+   ERROR("no private data for port %u",
+   eth_dev->data->port_id);
+   err = EINVAL;
+   goto port_error;
+   }
+   sd->shared_dev_data = eth_dev->data;
+   rte_spinlock_init(&sd->lock);
+   memcpy(sd->data.name, sd->shared_dev_data->name,
+  sizeof(sd->data.name));
+   sd->data.dev_private = priv;
+   sd->data.rx_mbuf_alloc_failed = 0;
+   sd->data.mtu = ETHER_MTU;
+   sd->data.port_id = sd->shared_dev_data->port_id;
+   sd->data.mac_addrs = priv->mac;
+   eth_dev->tx_pkt_burst = mlx5_tx_burst_secondary_setup;
+   eth_dev->rx_pkt_burst = mlx5_rx_burst_secondary_setup;
+   } else {
+   eth_dev->data->dev_private = priv;
+   eth_dev->data->rx_mbuf_alloc_failed = 0;
+   eth_dev->data->mtu = ETHER_MTU;
+   eth_dev->data->mac_addrs = priv->mac;
+   }

+   eth_dev->pci_dev = pci_dev;
rte_eth_copy_pci_info(eth_dev, pci_dev);
-
eth_dev->driver = &mlx5_driver;
-   eth_dev->data->rx_mbuf_alloc_failed = 0;
-   eth_dev->data->mtu = ETHER_MTU;
-
priv->dev =

[dpdk-dev] [PATCH v2 3/5] mlx5: add RX CRC stripping configuration

2016-03-03 Thread Adrien Mazarguil
From: Olga Shern 

Until now, CRC was always stripped by hardware. Stripping can be
configured starting with MLNX_OFED 3.2.

Signed-off-by: Olga Shern 
---
 doc/guides/nics/mlx5.rst   |  2 ++
 doc/guides/rel_notes/release_16_04.rst |  6 ++
 drivers/net/mlx5/Makefile  |  5 +
 drivers/net/mlx5/mlx5.c|  7 +++
 drivers/net/mlx5/mlx5.h|  1 +
 drivers/net/mlx5/mlx5_rxq.c| 24 
 drivers/net/mlx5/mlx5_rxtx.c   |  6 --
 drivers/net/mlx5/mlx5_rxtx.h   |  1 +
 8 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index f0d8a7e..8b63f3f 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -84,6 +84,7 @@ Features
 - Support for multiple MAC addresses.
 - VLAN filtering.
 - RX VLAN stripping.
+- RX CRC stripping configuration.
 - Promiscuous mode.
 - Multicast promiscuous mode.
 - Hardware checksum offloads.
@@ -232,6 +233,7 @@ Currently supported by DPDK:

 - Flow director.
 - RX VLAN stripping.
+- RX CRC stripping configuration.

 - Minimum firmware version:

diff --git a/doc/guides/rel_notes/release_16_04.rst 
b/doc/guides/rel_notes/release_16_04.rst
index 49eed7e..01fb7ed 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -95,6 +95,12 @@ This section should contain new features added in this 
release. Sample format:

   Implemented TX support in secondary processes (like mlx4).

+* **mlx5: Added RX CRC stripping configuration.**
+
+  Until now, CRC was always stripped. It can now be configured.
+
+  Only available with Mellanox OFED >= 3.2.
+

 Resolved Issues
 ---
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 7076ae3..cc6de2d 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -137,6 +137,11 @@ mlx5_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
infiniband/verbs.h \
enum IBV_EXP_CQ_RX_TCP_PACKET \
$(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_VERBS_FCS \
+   infiniband/verbs.h \
+   enum IBV_EXP_CREATE_WQ_FLAG_SCATTER_FCS \
+   $(AUTOCONF_OUTPUT)

 $(SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD):.c=.o): mlx5_autoconf.h

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 998e6f0..acfb365 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -417,6 +417,13 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
DEBUG("VLAN stripping is %ssupported",
  (priv->hw_vlan_strip ? "" : "not "));

+#ifdef HAVE_VERBS_FCS
+   priv->hw_fcs_strip = !!(exp_device_attr.exp_device_cap_flags &
+   IBV_EXP_DEVICE_SCATTER_FCS);
+#endif /* HAVE_VERBS_FCS */
+   DEBUG("FCS stripping configuration is %ssupported",
+ (priv->hw_fcs_strip ? "" : "not "));
+
 #else /* HAVE_EXP_QUERY_DEVICE */
priv->ind_table_max_size = RSS_INDIRECTION_TABLE_SIZE;
 #endif /* HAVE_EXP_QUERY_DEVICE */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index bad9283..9690827 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -103,6 +103,7 @@ struct priv {
unsigned int hw_csum:1; /* Checksum offload is supported. */
unsigned int hw_csum_l2tun:1; /* Same for L2 tunnels. */
unsigned int hw_vlan_strip:1; /* VLAN stripping is supported. */
+   unsigned int hw_fcs_strip:1; /* FCS stripping is supported. */
unsigned int vf:1; /* This is a VF device. */
unsigned int pending_alarm:1; /* An alarm is pending. */
/* RX/TX queues. */
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 3d84f41..19a1119 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1258,6 +1258,30 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, 
uint16_t desc,
  0),
 #endif /* HAVE_EXP_DEVICE_ATTR_VLAN_OFFLOADS */
};
+
+#ifdef HAVE_VERBS_FCS
+   /* By default, FCS (CRC) is stripped by hardware. */
+   if (dev->data->dev_conf.rxmode.hw_strip_crc) {
+   tmpl.crc_present = 0;
+   } else if (priv->hw_fcs_strip) {
+   /* Ask HW/Verbs to leave CRC in place when supported. */
+   attr.wq.flags |= IBV_EXP_CREATE_WQ_FLAG_SCATTER_FCS;
+   attr.wq.comp_mask |= IBV_EXP_CREATE_WQ_FLAGS;
+   tmpl.crc_present = 1;
+   } else {
+   WARN("%p: CRC stripping has been disabled but will still"
+" be performed by hardware, make sure MLNX_OFED and"
+" firmware are up to date",
+(void *)dev);
+   tmpl.crc_present = 0;
+   }
+   DEBUG("%p: CRC stripping is %s, %u bytes will be subt

[dpdk-dev] [PATCH v2 4/5] mlx5: add support for HW packet padding

2016-03-03 Thread Adrien Mazarguil
From: Olga Shern 

Environment variable MLX5_PMD_ENABLE_PADDING enables HW packet padding
in PCI bus transactions.

When packet size is cache aligned and CRC stripping is enabled, 4 fewer
bytes are written to the PCI bus. Enabling padding makes such packets
aligned again.

In cases where PCI bandwidth is the bottleneck, padding can improve
performance by 10%.

This is disabled by default since this can also decrease performance for
unaligned packet sizes.
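
The arithmetic behind the paragraphs above can be sketched as follows. This
is only a simplified model: the 64-byte cache line size is an assumption,
the helper bytes_written() is hypothetical, and actual hardware behavior may
differ.

```c
#include <stddef.h>

#define CACHE_LINE 64u /* Assumed cache line size in bytes. */
#define CRC_LEN 4u     /* Ethernet FCS length in bytes. */

/* Model how many bytes reach the PCI bus for one received frame. */
static size_t
bytes_written(size_t frame_len, int strip_crc, int pad)
{
	size_t len = strip_crc ? frame_len - CRC_LEN : frame_len;

	if (pad) /* Round up to the next multiple of a cache line. */
		len = (len + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
	return len;
}
```

In this model a 128-byte frame with CRC stripped is written as 124 bytes
without padding and 128 bytes with it, while a 100-byte frame goes from 96
to 128 bytes, which hints at why padding may hurt for other packet sizes.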

Signed-off-by: Olga Shern 
---
 doc/guides/nics/mlx5.rst   | 14 ++
 doc/guides/rel_notes/release_16_04.rst |  7 +++
 drivers/net/mlx5/Makefile  |  5 +
 drivers/net/mlx5/mlx5.c| 28 
 drivers/net/mlx5/mlx5.h|  5 +
 drivers/net/mlx5/mlx5_rxq.c| 15 +++
 6 files changed, 74 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 8b63f3f..9df30be 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -156,6 +156,20 @@ Environment variables
   lower performance when there is no backpressure, it is not enabled by
   default.

+- ``MLX5_PMD_ENABLE_PADDING``
+
+  Enables HW packet padding in PCI bus transactions.
+
+  When packet size is cache aligned and CRC stripping is enabled, 4 fewer
+  bytes are written to the PCI bus. Enabling padding makes such packets
+  aligned again.
+
+  In cases where PCI bandwidth is the bottleneck, padding can improve
+  performance by 10%.
+
+  This is disabled by default since this can also decrease performance for
+  unaligned packet sizes.
+
 Run-time configuration
 ~~

diff --git a/doc/guides/rel_notes/release_16_04.rst 
b/doc/guides/rel_notes/release_16_04.rst
index 01fb7ed..6bcfad1 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -101,6 +101,13 @@ This section should contain new features added in this 
release. Sample format:

   Only available with Mellanox OFED >= 3.2.

+* **mlx5: Added optional packet padding by HW.**
+
+  Added an option to make PCI bus transactions rounded to multiple of a
+  cache line size for better alignment.
+
+  Only available with Mellanox OFED >= 3.2.
+

 Resolved Issues
 ---
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index cc6de2d..a6a3cab 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -142,6 +142,11 @@ mlx5_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
infiniband/verbs.h \
enum IBV_EXP_CREATE_WQ_FLAG_SCATTER_FCS \
$(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_VERBS_RX_END_PADDING \
+   infiniband/verbs.h \
+   enum IBV_EXP_CREATE_WQ_FLAG_RX_END_PADDING \
+   $(AUTOCONF_OUTPUT)

 $(SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD):.c=.o): mlx5_autoconf.h

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index acfb365..94eefb9 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -68,6 +68,25 @@
 #include "mlx5_defs.h"

 /**
+ * Retrieve integer value from environment variable.
+ *
+ * @param[in] name
+ *   Environment variable name.
+ *
+ * @return
+ *   Integer value, 0 if the variable is not set.
+ */
+int
+mlx5_getenv_int(const char *name)
+{
+   const char *val = getenv(name);
+
+   if (val == NULL)
+   return 0;
+   return atoi(val);
+}
+
+/**
  * DPDK callback to close the device.
  *
  * Destroy all queues and objects, free memory.
@@ -332,6 +351,9 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
 #ifdef HAVE_EXP_DEVICE_ATTR_VLAN_OFFLOADS
IBV_EXP_DEVICE_ATTR_VLAN_OFFLOADS |
 #endif /* HAVE_EXP_DEVICE_ATTR_VLAN_OFFLOADS */
+#ifdef HAVE_EXP_CREATE_WQ_FLAG_RX_END_PADDING
+   IBV_EXP_DEVICE_ATTR_RX_PAD_END_ALIGN |
+#endif /* HAVE_EXP_CREATE_WQ_FLAG_RX_END_PADDING */
0;
 #endif /* HAVE_EXP_QUERY_DEVICE */

@@ -424,6 +446,12 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
DEBUG("FCS stripping configuration is %ssupported",
  (priv->hw_fcs_strip ? "" : "not "));

+#ifdef HAVE_VERBS_RX_END_PADDING
+   priv->hw_padding = !!exp_device_attr.rx_pad_end_addr_align;
+#endif /* HAVE_VERBS_RX_END_PADDING */
+   DEBUG("hardware RX end alignment padding is %ssupported",
+ (priv->hw_padding ? "" : "not "));
+
 #else /* HAVE_EXP_QUERY_DEVICE */
priv->ind_table_max_size = RSS_INDIRECTION_TABLE_SIZE;
 #endif /* HAVE_EXP_QUERY_DEVICE */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 9690827..1904d54 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -104,6 +104,7 @@ struct priv {
unsigned int hw_csum_l2tun:1; /* Same for L2 tunnels. */
unsigned int hw_vlan_strip:1; /* VL

[dpdk-dev] [PATCH v2 5/5] mlx5: add VLAN insertion offload

2016-03-03 Thread Adrien Mazarguil
From: Yaacov Hazan 

VLAN insertion is done in software by the PMD by default unless
CONFIG_RTE_LIBRTE_MLX5_VERBS_VLAN_INSERTION is enabled and Verbs provides
support for hardware insertion.

When enabled, this option improves performance when VLAN insertion is
requested, however ConnectX-4 Lx boards cannot take advantage of
multi-packet send optimizations anymore.
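
For readers unfamiliar with what software VLAN insertion involves, here is a
minimal sketch on a flat buffer. It is not the PMD's code (which works on
mbufs); vlan_insert() and its parameters are hypothetical. It relies only on
the 802.1Q layout: a 4-byte tag (TPID 0x8100 followed by the 16-bit TCI)
placed right after the two 6-byte MAC addresses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_ADDRS_LEN 12 /* destination + source MAC addresses */
#define VLAN_TAG_LEN 4   /* TPID (2 bytes) + TCI (2 bytes) */

/*
 * Hypothetical helper: insert an 802.1Q tag into an untagged Ethernet frame
 * stored in frame[0..len). cap is the size of the underlying buffer.
 * Returns the new frame length, or 0 if the frame is too short or the
 * buffer has no room for the tag.
 */
static size_t
vlan_insert(uint8_t *frame, size_t len, size_t cap, uint16_t tci)
{
	if (len < MAC_ADDRS_LEN + 2 || len + VLAN_TAG_LEN > cap)
		return 0;
	/* Shift EtherType and payload by 4 bytes to make room for the tag. */
	memmove(frame + MAC_ADDRS_LEN + VLAN_TAG_LEN,
		frame + MAC_ADDRS_LEN, len - MAC_ADDRS_LEN);
	frame[12] = 0x81; /* TPID, big endian */
	frame[13] = 0x00;
	frame[14] = (uint8_t)(tci >> 8); /* TCI, big endian */
	frame[15] = (uint8_t)(tci & 0xff);
	return len + VLAN_TAG_LEN;
}
```

The memmove() over the whole payload is the kind of per-packet cost that a
hardware offload avoids.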

Signed-off-by: Yaacov Hazan 
Signed-off-by: Adrien Mazarguil 
---
 config/common_linuxapp |   1 +
 doc/guides/nics/mlx5.rst   |   9 +++
 doc/guides/rel_notes/release_16_04.rst |   6 ++
 drivers/net/mlx5/Makefile  |   9 +++
 drivers/net/mlx5/mlx5_defs.h   |   9 +++
 drivers/net/mlx5/mlx5_ethdev.c |  12 ++--
 drivers/net/mlx5/mlx5_rxtx.c   | 109 +++--
 drivers/net/mlx5/mlx5_rxtx.h   |  13 
 drivers/net/mlx5/mlx5_txq.c|  15 -
 9 files changed, 158 insertions(+), 25 deletions(-)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 7b5e49f..793d262 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -220,6 +220,7 @@ CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
 CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N=4
 CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE=0
 CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE=8
+CONFIG_RTE_LIBRTE_MLX5_VERBS_VLAN_INSERTION=n

 #
 # Compile burst-oriented Broadcom PMD driver
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 9df30be..e391518 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -84,6 +84,7 @@ Features
 - Support for multiple MAC addresses.
 - VLAN filtering.
 - RX VLAN stripping.
+- TX VLAN insertion.
 - RX CRC stripping configuration.
 - Promiscuous mode.
 - Multicast promiscuous mode.
@@ -143,6 +144,13 @@ These options can be modified in the ``.config`` file.

   This value is always 1 for RX queues since they use a single MP.

+- ``CONFIG_RTE_LIBRTE_MLX5_VERBS_VLAN_INSERTION`` (default **n**)
+
+  Use Verbs instead of PMD implementation for VLAN insertion. Disabled by
+  default since it prevents ConnectX-4 Lx adapters from taking advantage of
+  multi-packet send optimizations, otherwise provides better performance
+  when VLAN insertion is requested.
+
 Environment variables
 ~

@@ -247,6 +255,7 @@ Currently supported by DPDK:

 - Flow director.
 - RX VLAN stripping.
+- TX VLAN insertion.
 - RX CRC stripping configuration.

 - Minimum firmware version:
diff --git a/doc/guides/rel_notes/release_16_04.rst 
b/doc/guides/rel_notes/release_16_04.rst
index 6bcfad1..238ef84 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -108,6 +108,12 @@ This section should contain new features added in this 
release. Sample format:

   Only available with Mellanox OFED >= 3.2.

+* **mlx5: TX VLAN insertion support.**
+
+  Added support for TX VLAN insertion.
+
+  Only available with Mellanox OFED >= 3.2.
+

 Resolved Issues
 ---
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index a6a3cab..7d24fd2 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -101,6 +101,10 @@ ifdef CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE
 CFLAGS += -DMLX5_PMD_TX_MP_CACHE=$(CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE)
 endif

+ifeq ($(CONFIG_RTE_LIBRTE_MLX5_VERBS_VLAN_INSERTION),y)
+CFLAGS += -DMLX5_VERBS_VLAN_INSERTION
+endif
+
 include $(RTE_SDK)/mk/rte.lib.mk

 # Generate and clean-up mlx5_autoconf.h.
@@ -147,6 +151,11 @@ mlx5_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
infiniband/verbs.h \
enum IBV_EXP_CREATE_WQ_FLAG_RX_END_PADDING \
$(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_VERBS_VLAN_INSERTION \
+   infiniband/verbs.h \
+   enum IBV_EXP_RECEIVE_WQ_CVLAN_INSERTION \
+   $(AUTOCONF_OUTPUT)

 $(SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD):.c=.o): mlx5_autoconf.h

diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 5b00d8e..fb8db2e 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -95,4 +95,13 @@
 #define MLX5_FDIR_SUPPORT 1
 #endif

+/*
+ * Prevent compilation when HW VLAN insertion is requested by configuration
+ * but not supported by Verbs.
+ */
+#if defined(MLX5_VERBS_VLAN_INSERTION) && !defined(HAVE_VERBS_VLAN_INSERTION)
+#error CONFIG_RTE_LIBRTE_MLX5_VERBS_VLAN_INSERTION \
+   enabled in configuration but not supported by libibverbs.
+#endif
+
 #endif /* RTE_PMD_MLX5_DEFS_H_ */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 6b674a2..66115d2 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -544,12 +544,12 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *info)
  DEV_RX_OFFLOAD_UDP_CKSUM |
  DEV_RX_OFFLOAD_TCP_CKSUM) :
 0);
-   info->tx_offload_capa =
-   (priv->hw_c

[dpdk-dev] [PATCH] config: remove duplicate configuration information

2016-03-03 Thread Wiles, Keith
>To clean up the configuration files and reduce the amount of duplicated
>configuration information, add a new file called common_base which
>contains nearly all of the configuration lines in one place. The
>common_bsdapp and common_linuxapp files then include this one file
>and add only their OS-specific delta configuration lines.

Ping. I got a +1 for this patch and am just trying to get someone else to agree
and ack it. I know the current setup more or less works, but it requires
modifying multiple files, and while moving everything to a single place I found
at least one difference between them.

I would like to see this one go in unless it just does not make any sense.

Thanks
++Keith

>
>Signed-off-by: Keith Wiles 
>---
> config/common_base  | 498 
> config/common_bsdapp| 436 +---
> config/common_linuxapp  | 491 +--
> config/defconfig_x86_64-native-bsdapp-clang |   1 +
> config/defconfig_x86_64-native-bsdapp-gcc   |   1 +
> 5 files changed, 518 insertions(+), 909 deletions(-)
> create mode 100644 config/common_base
>
>diff --git a/config/common_base b/config/common_base
>new file mode 100644
>index 000..91a12eb
>--- /dev/null
>+++ b/config/common_base
>@@ -0,0 +1,498 @@
>+#   BSD LICENSE
>+#
>+#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
>+#   All rights reserved.
>+#
>+#   Redistribution and use in source and binary forms, with or without
>+#   modification, are permitted provided that the following conditions
>+#   are met:
>+#
>+# * Redistributions of source code must retain the above copyright
>+#   notice, this list of conditions and the following disclaimer.
>+# * Redistributions in binary form must reproduce the above copyright
>+#   notice, this list of conditions and the following disclaimer in
>+#   the documentation and/or other materials provided with the
>+#   distribution.
>+# * Neither the name of Intel Corporation nor the names of its
>+#   contributors may be used to endorse or promote products derived
>+#   from this software without specific prior written permission.
>+#
>+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>+#
>+
>+#
>+# Use intrinsics or assembly code for key routines
>+#
>+CONFIG_RTE_FORCE_INTRINSICS=n
>+
>+#
>+# Machine forces strict alignment constraints.
>+#
>+CONFIG_RTE_ARCH_STRICT_ALIGN=n
>+
>+#
>+# Compile to share library
>+#
>+CONFIG_RTE_BUILD_SHARED_LIB=n
>+
>+#
>+# Combine to one single library
>+#
>+CONFIG_RTE_BUILD_COMBINE_LIBS=n
>+
>+#
>+# Use newest code breaking previous ABI
>+#
>+CONFIG_RTE_NEXT_ABI=y
>+
>+#
>+# Machine's cache line size
>+#
>+CONFIG_RTE_CACHE_LINE_SIZE=64
>+
>+#
>+# Compile Environment Abstraction Layer
>+#
>+CONFIG_RTE_LIBRTE_EAL=y
>+CONFIG_RTE_MAX_LCORE=128
>+CONFIG_RTE_MAX_NUMA_NODES=8
>+CONFIG_RTE_MAX_MEMSEG=256
>+CONFIG_RTE_MAX_MEMZONE=2560
>+CONFIG_RTE_MAX_TAILQ=32
>+CONFIG_RTE_LOG_LEVEL=8
>+CONFIG_RTE_LOG_HISTORY=256
>+CONFIG_RTE_LIBEAL_USE_HPET=n
>+CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
>+CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
>+CONFIG_RTE_EAL_IGB_UIO=y
>+CONFIG_RTE_EAL_VFIO=y
>+CONFIG_RTE_MALLOC_DEBUG=n
>+
>+# Default driver path (or "" to disable)
>+CONFIG_RTE_EAL_PMD_PATH=""
>+
>+#
>+# Special configurations in PCI Config Space for high performance
>+#
>+CONFIG_RTE_PCI_CONFIG=n
>+CONFIG_RTE_PCI_EXTENDED_TAG=""
>+CONFIG_RTE_PCI_MAX_READ_REQUEST_SIZE=0
>+
>+#
>+# Compile Environment Abstraction Layer to support Vmware TSC map
>+#
>+CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y
>+
>+#
>+# Compile the argument parser library
>+#
>+CONFIG_RTE_LIBRTE_KVARGS=y
>+
>+#
>+# Compile generic ethernet library
>+#
>+CONFIG_RTE_LIBRTE_ETHER=y
>+CONFIG_RTE_LIBRTE_ETHDEV_DEBUG=n
>+CONFIG_RTE_MAX_ETHPORTS=32
>+CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
>+CONFIG_RTE_LIBRTE_IEEE1588=n
>+CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
>+CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
>+
>+#
>+# Support NIC bypass logic
>+#
>+CONFIG_RTE_NIC_BYPASS=n
>+
>+#
>+# Compile burst-oriented IGB & EM PMD drivers
>+#
>+CONFIG_RTE_LIBRTE_EM_PMD=y
>+CONFIG_RTE_LIBRTE_IGB_PMD=y
>+CONFIG_RTE_LIBRTE_E1000_DEBUG_INIT=n
>+CON

[dpdk-dev] [PATCH] eal: add missing long-options for short option arguments

2016-03-03 Thread Wiles, Keith
>>On Thu, Feb 25, 2016 at 01:09:16PM -0600, Keith Wiles wrote:
>>> A number of short options for EAL are missing long options
>>> and this patch adds those missing options.
>>> 
>>> The missing long options are for:
>>> -c add --coremask
>>> -d add --driver
>>> -l add --corelist
>>> -m add --memsize
>>> -n add --mem-channels
>>> -r add --mem-ranks
>>> -v add --version
>>> Add an alias for --lcores using --lcore-map
>>> 
>>> Signed-off-by: Keith Wiles 
>>
>>Why do we need long options for all the short options?
>
>I think we need the long options to match the short options simply because it
>makes sense to me to have long options for all short options. Take the case of
>-v: just about everyone else has a --version long option, but we do not.
>
>The real reason is to allow for DPDK configuration via a configuration file,
>and I wanted to use the same strings for the config file variables as for the
>command line options. I figured I would add the long options now as they do
>not affect the configuration file patch.

Ping. I really want long options for the short options so that I can use the
same names for the config file support I would like to add to DPDK. Config file
support is much more reasonable for live or production systems, IMHO. It would
also be very nice for the examples to ship a config file showing how each
example could be configured.

I can create the config file support without long option names for the short
ones, but it would be a lot cleaner to have the same names in the config file
and on the command line.

Thanks
++Keith
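
To show how a short option and its long form can share one handler, here is a
minimal sketch using getopt_long(). It is not the EAL parser: struct
eal_like_opts and parse_eal_like() are hypothetical, and only a few of the
option names proposed in the patch are covered.

```c
#include <getopt.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical container for a few of the options discussed above. */
struct eal_like_opts {
	const char *coremask; /* -c / --coremask */
	const char *corelist; /* -l / --corelist */
	long mem_channels;    /* -n / --mem-channels */
	int show_version;     /* -v / --version */
};

/*
 * Each long option returns the same value as its short form, so a single
 * switch statement handles both spellings.
 */
static const struct option long_opts[] = {
	{ "coremask",     required_argument, NULL, 'c' },
	{ "corelist",     required_argument, NULL, 'l' },
	{ "mem-channels", required_argument, NULL, 'n' },
	{ "version",      no_argument,       NULL, 'v' },
	{ NULL, 0, NULL, 0 }
};

/* Returns 0 on success, -1 on an unknown option or a missing argument. */
static int
parse_eal_like(int argc, char **argv, struct eal_like_opts *o)
{
	int c;

	o->coremask = NULL;
	o->corelist = NULL;
	o->mem_channels = 0;
	o->show_version = 0;
	optind = 1; /* start scanning at the first argument */
	while ((c = getopt_long(argc, argv, "c:l:n:v", long_opts, NULL)) != -1) {
		switch (c) {
		case 'c':
			o->coremask = optarg;
			break;
		case 'l':
			o->corelist = optarg;
			break;
		case 'n':
			o->mem_channels = strtol(optarg, NULL, 10);
			break;
		case 'v':
			o->show_version = 1;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

A config file reader could reuse the strings in long_opts as its key names,
which is the kind of sharing between config file and command line described
above.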

>>
>>/Bruce

[dpdk-dev] [PATCH] mk: add makefile extention support

2016-03-03 Thread Wiles, Keith
>>2016-02-28 21:47, Wiles, Keith:
>>> >Hi,
>>> >
>>> >2016-02-09 11:35, Keith Wiles:
>>> >> Add support to the build system for a Makefile.XXX
>>> >> extension in a subtree that already has Makefiles. These
>>> >> Makefiles could come from autotools or other places. Set
>>> >> the Makefile extension variable RTE_MKFILE_SUFFIX in a makefile
>>> >> subtree with 'export RTE_MKFILE_SUFFIX=.XXX' to use Makefile.XXX in
>>> >> that subtree.
>>> >> 
>>> >> The main reason I needed this feature was to integrate autotools-based
>>> >> open source projects with DPDK and keep the original Makefiles.
>>> >
>>> >Sorry I fail to understand why it is needed.
>>> >Are you trying to add autotool in DPDK? I don't think it is a good 
>>> >approach.
>>> >The DPDK must provide a pkgconfig interface to be integrated anywhere.
>>> 
>>> I was not trying to add autotools to DPDK. A number of times I have wanted
>>> to integrate an open source project with DPDK and use DPDK's build system,
>>> but because the open source project already contained Makefiles, you
>>> cannot use the DPDK build system without modifying or moving the original
>>> Makefiles. Using this method I can just add an exported variable and
>>> supply my own Makefile.XXX files.
>>> 
>>> One case was building FreeBSD source, but I did not want to modify the
>>> FreeBSD Makefiles (or rely on previously built Makefiles, as they would
>>> not work on Linux anyway) as I was pulling the source down from the
>>> freebsd.org repo. Using a patch to add the Makefiles with a different
>>> suffix allows me to build FreeBSD using DPDK without having to modify or
>>> own the FreeBSD source. I have had this problem a number of times with
>>> open source code I did not want to modify, but just build within the DPDK
>>> build system, and adding support for a different suffix to DPDK provided
>>> a clean way. The change does not affect the current build system and just
>>> allows someone to define a new suffix for a given subtree in the code.
>>
>>Why would you like to have another project inside the DPDK files tree?
>>If you want to integrate the lib inside an existing project, the solution
>>is pkgconfig.
>
>The goal for me was to use the DPDK build system for that project, instead of
>using autotools or some other makefile system. In the case of the FreeBSD
>code, the FreeBSD build system requires FreeBSD tools to be built, as the
>'make' and the Makefiles are very different on a Linux machine.

Does anyone find this patch useful? I would hate to see this one die, as it
does not affect the current builds but adds support for using the DPDK build
system without having to modify or move the existing Makefiles.

Thanks
++Keith

[dpdk-dev] [PATCH] eal: add missing long-options for short option arguments

2016-03-03 Thread David Marchand
On Thu, Feb 25, 2016 at 11:12 PM, Wiles, Keith  wrote:
>>On Thu, Feb 25, 2016 at 01:09:16PM -0600, Keith Wiles wrote:
>>> A number of short options for EAL are missing long options
>>> and this patch adds those missing options.
>>>
>>> The missing long options are for:
>>> -c add --coremask
>>> -d add --driver
>>> -l add --corelist
>>> -m add --memsize
>>> -n add --mem-channels
>>> -r add --mem-ranks
>>> -v add --version
>>> Add an alias for --lcores using --lcore-map
>>>
>>> Signed-off-by: Keith Wiles 
>>
>>Why do we need long options for all the short options?
>
> I think we need the long options to match the short options simply because it
> makes sense to me to have long options for all short options. Take the case
> of -v: just about everyone else has a --version long option, but we do not.
>
> The real reason is to allow for DPDK configuration via a configuration file,
> and I wanted to use the same strings for the config file variables as for the
> command line options. I figured I would add the long options now as they do
> not affect the configuration file patch.

No strong opinion on this.

Just, why "memsize" with no -  but "mem-channels" ?
And why cut down to mem rather than memory ?


-- 
David Marchand


[dpdk-dev] [PATCH] eal: add missing long-options for short option arguments

2016-03-03 Thread Wiles, Keith
>On Thu, Feb 25, 2016 at 11:12 PM, Wiles, Keith  
>wrote:
>>>On Thu, Feb 25, 2016 at 01:09:16PM -0600, Keith Wiles wrote:
>>>> A number of short options for EAL are missing long options
>>>> and this patch adds those missing options.
>>>>
>>>> The missing long options are for:
>>>> -c add --coremask
>>>> -d add --driver
>>>> -l add --corelist
>>>> -m add --memsize
>>>> -n add --mem-channels
>>>> -r add --mem-ranks
>>>> -v add --version
>>>> Add an alias for --lcores using --lcore-map
>>>>
>>>> Signed-off-by: Keith Wiles
>>>
>>>Why do we need long options for all the short options?
>>
>> I think we need the long options to match the short options simply because
>> it makes sense to me to have long options for all short options. Take the
>> case of -v: just about everyone else has a --version long option, but we
>> do not.
>>
>> The real reason is to allow for DPDK configuration via a configuration file,
>> and I wanted to use the same strings for the config file variables as for
>> the command line options. I figured I would add the long options now as they
>> do not affect the configuration file patch.
>
>No strong opinion on this.
>
>Just, why "memsize" with no -  but "mem-channels" ?
>And why cut down to mem rather than memory ?

I debated using mem-size, but I noticed that a couple of places used memsize. I
can change them to anything someone wants. If you want memory-channels and
memory-ranks, I am good with that too.


[dpdk-dev] [PATCH 01/10] ethdev: add a generic flow and new behavior switch to fdir

2016-03-03 Thread Olga Shern
I think what Thomas meant is that we should redesign the Flow Director feature
and call it something else; Mellanox calls it "Flow Steering". I agree that
"Filtering" may be a more generic name.
We have implemented the Flow Director API in the Mellanox ConnectX-4 PMD (part
of the DPDK 16.04 patches), but we did it in a very awkward way to fit the
current API, and some Mellanox features are missing from the current Flow
Director API.
Therefore I disagree with Jingjing's statement that this API is generic.
Frankly, it is very hard to understand, as Thomas mentioned; I am not sure
how DPDK users understand what each function or field means.

Best Regards,
Olga

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Wu, Jingjing
Sent: Friday, February 26, 2016 3:18 AM
To: Thomas Monjalon; Rahul Lakkireddy
Cc: dev at dpdk.org; Kumar A S; Nirranjan Kirubaharan
Subject: Re: [dpdk-dev] [PATCH 01/10] ethdev: add a generic flow and new 
behavior switch to fdir



> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Friday, February 26, 2016 2:25 AM
> To: Rahul Lakkireddy 
> Cc: Richardson, Bruce ; dev at dpdk.org; 
> Kumar A S ; Nirranjan Kirubaharan 
> ; Wu, Jingjing 
> Subject: Re: [dpdk-dev] [PATCH 01/10] ethdev: add a generic flow and 
> new behavior switch to fdir
> 
> 2016-02-25 15:03, Rahul Lakkireddy:
> > On Wednesday, February 02/24/16, 2016 at 14:17:58 -0800, Thomas Monjalon 
> > wrote:
> > > > A raw flow provides a generic way for vendors to add their 
> > > > vendor specific input flow.
> > >
> > > Please, "generic" and "vendor specific" in the same sentence.
> > > It's obviously wrong.
> >
> > I think this sentence is being mis-interpreted.
> > What I intended to say is: the fields are generic so that any vendor 
> > can hook-in. The fields themselves are not vendor specific.
> 
> We are trying to push some features into fields of an API instead of 
> thinking how to make it simple.
> 
> > > > In our case, it is possible to match several flows in a single 
> > > > rule.  For example, it's possible to set an ethernet, vlan, ip 
> > > > and tcp/udp flows all in a single rule.  We can specify all of 
> > > > these flows in a single raw input flow, which can then be passed 
> > > > to cxgbe flow director to set the corresponding filter.
> > >
> > > I feel we need to define what is an API.
> > > If the application wants to call something specific to the NIC, 
> > > why using the ethdev API? You just have to include cxgbe.h.
> >
> > Well, in that sense, flow-director is also very intel specific, no ?
> 
> Yes. I think the term "flow director" comes from Intel.
> 
> > What we are trying to do is make flow-director generic
> 
> So let's stop calling it flow director.
> We are talking about filtering, right?
> 
Hi Thomas

Are you suggesting chelsio to define a new filter type?

> Why is it so complex? We are talking about packet filtering, not rocket 
> science!
>
The complexity is due to different NICs behaving differently :-) As far as I
know, passing user-defined data is a common way to hand specific information
to a driver.

Even though flow director is a concept from Intel's NICs, I think it is the
most generic one compared with other kinds of filters. I think that is why
Rahul chose it to add their kind of filters.
As far as I know, the enic driver also uses the flow director API to support
its filters.

Whether the Chelsio NIC filter uses the flow director API or defines a new
filter type, I vote for the change to struct rte_eth_fdir_input: it
provides a RAW flow type, and there is also a mask field for it, which gives
the user a flexible way to configure filters.
Drivers can then parse the raw input to define the filter fields.

But for the change to struct rte_eth_fdir_action, only the SWITCH type is
added. Where to switch? Everything is in
behavior_arg[RTE_ETH_BEHAVIOR_ARG_MAX_LEN],
which is opaque to the user. Maybe your previous definition in the RFC makes
more sense. It is better to add a user-defined field, but not for all args.

Any better suggestion?
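
To make the raw-input-plus-mask idea concrete, here is a minimal sketch. It is
not the rte_eth_fdir_input layout; raw_flow_match() and its parameters are
hypothetical, and a real driver would parse the raw data into its own filter
fields rather than compare bytes in software.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical raw match: a packet matches when every bit selected by the
 * mask is equal in the packet and in the pattern. Bits cleared in the mask
 * are ignored.
 */
static bool
raw_flow_match(const uint8_t *pkt, const uint8_t *pattern,
	       const uint8_t *mask, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if ((pkt[i] ^ pattern[i]) & mask[i])
			return false;
	}
	return true;
}
```

Because the pattern and mask are just bytes, one rule can cover Ethernet,
VLAN, IP and TCP/UDP fields at once, at the price of being opaque to anyone
who does not know the layout the driver expects.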


[dpdk-dev] [PATCH v2 0/3] ABI change for RETA, cmdline

2016-03-03 Thread Adrien Mazarguil
On Tue, Jan 12, 2016 at 11:49:06AM +0100, Nelio Laranjeiro wrote:
> Previous version of commit
> "cmdline: increase command line buffer", had side effects and was breaking
> some commands.
> 
> In this version, I only applied John McNamara's solution which consists in
> increasing only RDLINE_BUF_SIZE define from 256 to 512 bytes [1].
> 
> [1] http://dpdk.org/ml/archives/dev/2015-November/027643.html

Acked-by: Adrien Mazarguil 

-- 
Adrien Mazarguil
6WIND


[dpdk-dev] [PATCH v4 05/12] pmd/fm10k: add dev_ptype_info_get implementation

2016-03-03 Thread Ananyev, Konstantin


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Tan, Jianfeng
> Sent: Thursday, March 03, 2016 6:04 AM
> To: Chen, Jing D; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 05/12] pmd/fm10k: add dev_ptype_info_get 
> implementation
> 
> Hi,
> 
> On 3/3/2016 4:11 AM, Chen, Jing D wrote:
> > Hi,
> >
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jianfeng Tan
> > Sent: Thursday, February 25, 2016 6:09 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH v4 05/12] pmd/fm10k: add dev_ptype_info_get 
> > implementation
> >
> > Signed-off-by: Jianfeng Tan 
> > ---
> >   drivers/net/fm10k/fm10k_ethdev.c   | 50 
> > ++
> >   drivers/net/fm10k/fm10k_rxtx.c |  3 +++
> >   drivers/net/fm10k/fm10k_rxtx_vec.c |  3 +++
> >   3 files changed, 56 insertions(+)
> >
> > diff --git a/drivers/net/fm10k/fm10k_ethdev.c 
> > b/drivers/net/fm10k/fm10k_ethdev.c
> > index 421266b..429cbdd 100644
> > --- a/drivers/net/fm10k/fm10k_ethdev.c
> > +++ b/drivers/net/fm10k/fm10k_ethdev.c
> > @@ -1335,6 +1335,55 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
> > };
> >   }
> >
> > +#ifdef RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE
> > +static const uint32_t *
> > +fm10k_dev_ptype_info_get(struct rte_eth_dev *dev) {
> > +   if (dev->rx_pkt_burst == fm10k_recv_pkts ||
> > +   dev->rx_pkt_burst == fm10k_recv_scattered_pkts) {
> > +   static uint32_t ptypes[] = {
> > +   /* refers to rx_desc_to_ol_flags() */
> > +   RTE_PTYPE_L2_ETHER,
> > +   RTE_PTYPE_L3_IPV4,
> > +   RTE_PTYPE_L3_IPV4_EXT,
> > +   RTE_PTYPE_L3_IPV6,
> > +   RTE_PTYPE_L3_IPV6_EXT,
> > +   RTE_PTYPE_L4_TCP,
> > +   RTE_PTYPE_L4_UDP,
> > +   RTE_PTYPE_UNKNOWN
> > +   };
> > +
> > +   return ptypes;
> > +   } else if (dev->rx_pkt_burst == fm10k_recv_pkts_vec ||
> > +  dev->rx_pkt_burst == fm10k_recv_scattered_pkts_vec) {
> > +   static uint32_t ptypes_vec[] = {
> > +   /* refers to fm10k_desc_to_pktype_v() */
> > +   RTE_PTYPE_L3_IPV4,
> > +   RTE_PTYPE_L3_IPV4_EXT,
> > +   RTE_PTYPE_L3_IPV6,
> > +   RTE_PTYPE_L3_IPV6_EXT,
> > +   RTE_PTYPE_L4_TCP,
> > +   RTE_PTYPE_L4_UDP,
> > +   RTE_PTYPE_TUNNEL_GENEVE,
> > +   RTE_PTYPE_TUNNEL_NVGRE,
> > +   RTE_PTYPE_TUNNEL_VXLAN,
> > +   RTE_PTYPE_TUNNEL_GRE,
> > +   RTE_PTYPE_UNKNOWN
> > +   };
> > +
> > +   return ptypes_vec;
> > +   }
> > +
> > +   return NULL;
> > +}
> > May I know when "fm10k_dev_ptype_info_get" will be called? In fm10k, the
> > actual Rx/Tx function is decided after the port is started.
> 
> Thank you for pointing this out. It is indeed an issue here. It makes
> no difference when all rx functions fill the same ptypes, which,
> unfortunately, does not apply to all PMDs. According to my analysis,
> only in fm10k's case should we call ptype_info_get after dev_start();
> for other PMDs, it can be called just after rx_queue_setup. So in all,
> I need to add this as a caution in the API declaration.

Good catch Mark :)
I think it should be called after dev_start() for all devices:
dev_start() is the usual point where the final decision about
which RX function to use is made,
at least for the PMDs I am aware of (ixgbe, i40e, igb).

Konstantin
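
The ordering issue can be shown with a toy model. All names below (toy_dev,
toy_dev_start(), toy_ptype_info_get(), the PT_* values) are hypothetical and
only mimic the pattern in the patch: the ptype list depends on which RX
function is installed, and that choice is only made at start time.

```c
#include <stddef.h>
#include <stdint.h>

enum { PT_END = 0, PT_L2 = 1, PT_L3_IPV4 = 2, PT_TUNNEL = 3 };

struct toy_dev;
typedef int (*rx_burst_t)(struct toy_dev *);

struct toy_dev {
	rx_burst_t rx_burst; /* NULL until toy_dev_start() picks one */
	int use_vector;      /* configuration known before start */
};

static int rx_scalar(struct toy_dev *d) { (void)d; return 0; }
static int rx_vector(struct toy_dev *d) { (void)d; return 0; }

/* The final RX function is only selected here. */
static void
toy_dev_start(struct toy_dev *d)
{
	d->rx_burst = d->use_vector ? rx_vector : rx_scalar;
}

/* Returns a PT_END-terminated list, or NULL if no RX function is set yet. */
static const uint32_t *
toy_ptype_info_get(const struct toy_dev *d)
{
	static const uint32_t scalar[] = { PT_L2, PT_L3_IPV4, PT_END };
	static const uint32_t vector[] = { PT_L3_IPV4, PT_TUNNEL, PT_END };

	if (d->rx_burst == rx_scalar)
		return scalar;
	if (d->rx_burst == rx_vector)
		return vector;
	return NULL;
}
```

Calling toy_ptype_info_get() before toy_dev_start() returns NULL in this
model, which is the situation an application would hit if the query were
made too early.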

> 
> __details__
> 
> eth_cxgbe_dev_init
> 
> eth_igb_dev_init
> eth_igbvf_dev_init
> eth_igb_rx_init <- eth_igb_start (makes no difference, rx functions fill
> same ptypes)
> eth_igbvf_rx_init <- igbvf_dev_start (makes no difference, rx functions
> fill same ptypes)
> 
> eth_enicpmd_dev_init
> 
> fm10k_set_rx_function <- fm10k_dev_rx_init <- fm10k_dev_start
> 
> eth_i40e_dev_init
> i40evf_dev_init
> i40e_set_rx_function <- eth_i40e_dev_init
>   <- i40evf_dev_init
>   <- i40e_dev_rx_init <-
> i40e_dev_rxtx_init <- i40e_dev_start (makes no difference, rx functions
> fill same ptypes)
>   <- i40evf_rx_init <-
> i40evf_dev_start (makes no difference, rx functions fill same ptypes)
> 
> ixgbe_set_rx_function <- eth_ixgbe_dev_init
> <- ixgbe_dev_rx_init <-
> ixgbe_dev_start (makes no difference, rx functions fill same ptypes)
> <- ixgbevf_dev_rx_init
> 
> mlx4_rx_queue_setup
> mlx4_dev_set_mtu (makes no difference, rx functions fill same ptypes)
> 
> mlx5_rx_queue_setup
> mlx5_dev_set_mtu (makes no difference, rx functions fill same ptypes)
> 
> nfp_net_init
> 
> eth_vmxnet3_dev_init
> 
> Thanks,
> Jianfeng
> 
> 



[dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst

2016-03-03 Thread Xie, Huawei
On 2/18/2016 9:48 PM, Yuanhan Liu wrote:
> The current rte_vhost_dequeue_burst() implementation is a bit messy
> and its logic is twisted. You can see repeated code here and there: it
> invokes rte_pktmbuf_alloc() three times at three different places!
>
> However, rte_vhost_dequeue_burst() actually does a simple job: copy
> the packet data from vring desc to mbuf. What's tricky here is:
>
> - desc buff could be chained (by desc->next field), so that you need
>   to fetch the next one if the current one is wholly drained.
>
> - One mbuf could not be big enough to hold all desc buff, hence you
>   need to chain the mbuf as well, by the mbuf->next field.
>
> Even so, the logic can be simple. Here is the pseudo code.
>
>   while (this_desc_is_not_drained_totally || has_next_desc) {
>   if (this_desc_has_drained_totally) {
>   this_desc = next_desc();
>   }
>
>   if (mbuf_has_no_room) {
>   mbuf = allocate_a_new_mbuf();
>   }
>
>   COPY(mbuf, desc);
>   }
>
> And this is how I refactored rte_vhost_dequeue_burst.
>
> Note that the old patch does a special handling for skipping virtio
> header. However, that could be simply done by adjusting desc_avail
> and desc_offset var:
>
>   desc_avail  = desc->len - vq->vhost_hlen;
>   desc_offset = vq->vhost_hlen;
>
> This refactor makes the code much more readable (IMO), yet it reduces
> binary code size (nearly 2K).
>
> Signed-off-by: Yuanhan Liu 
> ---
>
> v2: - fix potential NULL dereference bug of var "prev" and "head"
> ---
>  lib/librte_vhost/vhost_rxtx.c | 297 
> +-
>  1 file changed, 116 insertions(+), 181 deletions(-)
>
> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> index 5e7e5b1..d5cd0fa 100644
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -702,21 +702,104 @@ vhost_dequeue_offload(struct virtio_net_hdr *hdr, 
> struct rte_mbuf *m)
>   }
>  }
>  
> +static inline struct rte_mbuf *
> +copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
> +   uint16_t desc_idx, struct rte_mempool *mbuf_pool)
> +{
> + struct vring_desc *desc;
> + uint64_t desc_addr;
> + uint32_t desc_avail, desc_offset;
> + uint32_t mbuf_avail, mbuf_offset;
> + uint32_t cpy_len;
> + struct rte_mbuf *head = NULL;
> + struct rte_mbuf *cur = NULL, *prev = NULL;
> + struct virtio_net_hdr *hdr;
> +
> + desc = &vq->desc[desc_idx];
> + desc_addr = gpa_to_vva(dev, desc->addr);
> + rte_prefetch0((void *)(uintptr_t)desc_addr);
> +
> + /* Retrieve virtio net header */
> + hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr);
> + desc_avail  = desc->len - vq->vhost_hlen;

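There is a serious bug here: desc->len - vq->vhost_hlen could wrap around
(underflow) when desc->len is smaller than vq->vhost_hlen, leaving a huge
value in the uint32_t desc_avail. A VM could easily create this case. Let us
fix it here.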
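
A minimal sketch of the kind of check meant here. desc_payload_len() is a
hypothetical helper and not the fix that went into the tree; it only shows
validating the guest-supplied length before subtracting the header length.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the payload bytes left in a descriptor after
 * the header. Returns false, leaving *avail untouched, when the descriptor
 * is shorter than the header, instead of letting the unsigned subtraction
 * wrap around.
 */
static bool
desc_payload_len(uint32_t desc_len, uint32_t hdr_len, uint32_t *avail)
{
	if (desc_len < hdr_len)
		return false;
	*avail = desc_len - hdr_len;
	return true;
}
```

Without the check, a 4-byte descriptor and a 12-byte header would give
4294967288 "available" bytes, and the copy loop would then read far beyond
the descriptor buffer.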

> + desc_offset = vq->vhost_hlen;
> +
> + mbuf_avail  = 0;
> + mbuf_offset = 0;
> + while (desc_avail || (desc->flags & VRING_DESC_F_NEXT) != 0) {
> + /* This desc reachs to its end, get the next one */
> + if (desc_avail == 0) {
> + desc = &vq->desc[desc->next];
> +
> + desc_addr = gpa_to_vva(dev, desc->addr);
> + rte_prefetch0((void *)(uintptr_t)desc_addr);
> +
> + desc_offset = 0;
> + desc_avail  = desc->len;
> +
> + PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0);
> + }
> +
> + /*
> +  * This mbuf reachs to its end, get a new one
> +  * to hold more data.
> +  */
> + if (mbuf_avail == 0) {
> + cur = rte_pktmbuf_alloc(mbuf_pool);
> + if (unlikely(!cur)) {
> + RTE_LOG(ERR, VHOST_DATA, "Failed to "
> + "allocate memory for mbuf.\n");
> + if (head)
> + rte_pktmbuf_free(head);
> + return NULL;
> + }
> + if (!head) {
> + head = cur;
> + } else {
> + prev->next = cur;
> + prev->data_len = mbuf_offset;
> + head->nb_segs += 1;
> + }
> + head->pkt_len += mbuf_offset;
> + prev = cur;
> +
> + mbuf_offset = 0;
> + mbuf_avail  = cur->buf_len - RTE_PKTMBUF_HEADROOM;
> + }
> +
> + cpy_len = RTE_MIN(desc_avail, mbuf_avail);
> + rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, mbuf_offset),
> + (void *)((uintptr_t)(desc_addr + desc_offset)),
> + cpy_len);
> +
> +   

[dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst

2016-03-03 Thread Xie, Huawei
On 2/18/2016 9:48 PM, Yuanhan Liu wrote:
> + mbuf_avail  = 0;
> + mbuf_offset = 0;
> + while (desc_avail || (desc->flags & VRING_DESC_F_NEXT) != 0) {
> + /* This desc reachs to its end, get the next one */
> + if (desc_avail == 0) {
> + desc = &vq->desc[desc->next];
> +
> + desc_addr = gpa_to_vva(dev, desc->addr);
> + rte_prefetch0((void *)(uintptr_t)desc_addr);
> +
> + desc_offset = 0;
> + desc_avail  = desc->len;
> +
> + PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0);
> + }
> +
> + /*
> +  * This mbuf reachs to its end, get a new one
> +  * to hold more data.
> +  */
> + if (mbuf_avail == 0) {
> + cur = rte_pktmbuf_alloc(mbuf_pool);
> + if (unlikely(!cur)) {
> + RTE_LOG(ERR, VHOST_DATA, "Failed to "
> + "allocate memory for mbuf.\n");
> + if (head)
> + rte_pktmbuf_free(head);
> + return NULL;
> + }

We could always allocate the head mbuf before the loop; that saves the
following branch and makes the code more streamlined.
It reminds me that this change rules out mbuf bulk allocation. One
solution is to pass the head mbuf in through an additional parameter.
Btw, put unlikely() around the mbuf_avail check and the similar checks
elsewhere.

> + if (!head) {
> + head = cur;
> + } else {
> + prev->next = cur;
> + prev->data_len = mbuf_offset;
> + head->nb_segs += 1;
> + }
> + head->pkt_len += mbuf_offset;
> + prev = cur;
> +
> + mbuf_offset = 0;
> + mbuf_avail  = cur->buf_len - RTE_PKTMBUF_HEADROOM;
> + }
> +
> + cpy_len = RTE_MIN(desc_avail, mbuf_avail);
> + rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, mbuf_offset),
> + (void *)((uintptr_t)(desc_addr + desc_offset)),
> + cpy_len);
> +
> + mbuf_avail  -= cpy_len;
> + mbuf_offset += cpy_len;
> + desc_avail  -= cpy_len;
> + desc_offset += cpy_len;
> + }
> +



[dpdk-dev] [PATCH] mk: add makefile extention support

2016-03-03 Thread Thomas Monjalon
2016-03-03 14:55, Wiles, Keith:
> >>2016-02-28 21:47, Wiles, Keith:
> >>> >Hi,
> >>> >
> >>> >2016-02-09 11:35, Keith Wiles:
> >>> >> Adding support to the build system to allow for Makefile.XXX
> >>> >> extention to a subtree, which already has Makefiles. These
> >>> >> Makefiles could be from the autotools and others places. Using
> >>> >> the Makefile extention RTE_MKFILE_SUFFIX in a makefile subtree
> >>> >> using 'export RTE_MKFILE_SUFFIX=.XXX' to use Makefile.XXX in
> >>> >> that subtree.
> >>> >> 
> >>> >> The main reason I needed this feature was to integrate a autotool
> >>> >> open source projects with DPDK and keep the original Makefiles.
> >>> >
> >>> >Sorry I fail to understand why it is needed.
> >>> >Are you trying to add autotool in DPDK? I don't think it is a good 
> >>> >approach.
> >>> >The DPDK must provide a pkgconfig interface to be integrated anywhere.
> >>> 
> >>> I was not trying to add autotools to DPDK. On a number of times I wanted 
> >>> to integrate an open source project(s) with DPDK and use DPDK's build 
> >>> system, but because the open source project already contained Makefile 
> >>> files you can not use DPDK build system without modify or moving the 
> >>> original Makefile files. Using this method I can just add a exported 
> >>> variable and supply my own Makefile.XXX files.
> >>> 
> >>> One case was building FreeBSD source, but I did not want to modify 
> >>> FreeBSD Makefiles (or rely on previously built Makefiles as they would not 
> >>> work on Linux anyway) as I was pulling the source down from freebsd.org 
> >>> repo. Using a patch to add the Makefiles with a different suffix allows 
> >>> me to build FreeBSD using DPDK, without having to modify or own the 
> >>> FreeBSD source. I have had this problem a number of times with open 
> >>> source code I did not want to modify, but just build within DPDK build 
> >>> system and adding the support for a different suffix to DPDK provided a 
> >>> clean way. The change does not affect the current build system and just 
> >>> allows someone to define a new suffix for a given subtree in the code.
> >>
> >>Why would you like to have another project inside the DPDK files tree?
> >>If you want to integrate the lib inside an existing project, the solution
> >>is pkgconfig.
> >
> >The goal for me was to use DPDK build system for that project, instead of 
> >using autotools or some other makefile system. In the case of FreeBSD code, 
> >the FreeBSD build system requires FreeBSD tools to be built as the "make" 
> >and the Makefiles are very different on a Linux machine.
> 
> Does anyone find this patch useful? I would hate to see this one die, as it 
> does not affect the current builds but adds support for using the DPDK build 
> system without having to modify or move the existing Makefiles.

I would hate making the build system even more complicated to use it
for something which is not its role.
It opens the door to feature requests which are clearly out of its scope.



[dpdk-dev] [PATCH 1/3] kcp: add kernel control path kernel module

2016-03-03 Thread Stephen Hemminger
On Thu, 3 Mar 2016 10:11:57 +
Ferruh Yigit  wrote:

> On 3/2/2016 10:18 PM, Jay Rolette wrote:
> > 
> > On Tue, Mar 1, 2016 at 8:02 PM, Stephen Hemminger
> > <stephen at networkplumber.org> wrote:
> > 
> > On Mon, 29 Feb 2016 08:33:25 -0600
> > Jay Rolette <rolette at infiniteio.com> wrote:
> > 
> > > On Mon, Feb 29, 2016 at 5:06 AM, Thomas Monjalon
> > > <thomas.monjalon at 6wind.com> wrote:
> > >
> > > > Hi,
> > > > I totally agree with Avi's comments.
> > > > This topic is really important for the future of DPDK.
> > > > So I think we must give some time to continue the discussion
> > > > and have netdev involved in the choices done.
> > > > As a consequence, these series should not be merged in the
> > release 16.04.
> > > > Thanks for continuing the work.
> > > >
> > >
> > > I know you guys are very interested in getting rid of the out-of-tree
> > > drivers, but please do not block incremental improvements to DPDK
> > in the
> > > meantime. Ferruh's patch improves the usability of KNI. Don't
> > throw out
> > > good and useful enhancements just because it isn't where you want
> > to be in
> > > the end.
> > >
> > > I'd like to see these be merged.
> > >
> > > Jay
> > 
> > The code is really not ready. I am okay with cooperative development
> > but the current code needs to go into a staging type tree.
> > No compatibility, no ABI guarantees, more of an RFC.
> > Don't want vendors building products with it then screaming when it
> > gets rebuilt/reworked/scrapped.
> > 
> > 
> > That's fair. To be clear, it wasn't my intent for code that wasn't baked
> > yet to be merged. 
> > 
> > The main point of my comment was that I think it is important not to
> > halt incremental improvements to existing capabilities (KNI in this
> > case) just because there are philosophical or directional changes that
> > the community would like to make longer-term.
> > 
> > Bird in the hand vs. two in the bush...
> > 
> 
> There are two different statements, first, code being not ready, I agree
> a fair point (although there is no argument to that statement, it makes
> hard to discuss this, I will put aside this), this implies when code is
> ready it can go in to repo.
> 
> But not having kernel module, independent from their state against what
> they are trying to replace is something else. And this won't help on KNI
> related problems.
> 
> Thanks,
> ferruh
> 

Why not re-submit the patches, but put them in lib/librte_eal/staging or a
similar path and make sure they do not get built by the normal build process?


[dpdk-dev] [PATCH] config: remove duplicate configuration information

2016-03-03 Thread Thomas Monjalon
2016-03-03 14:43, Wiles, Keith:
> >In order to clean up the configuration files and reduce
> >the amount of duplicate configuration information, add a new
> >file called common_base which contains just about all of the
> >configuration lines in one place. Then have the common_bsdapp
> >and common_linuxapp files include this one file, and add the
> >delta configuration lines in those OS-specific files.
> 
> Ping. I got a +1 for this patch; I am just trying to get someone else to agree and 
> ack. I know the current stuff kind of works, but it does require modifying 
> multiple files, and while moving this to a single place to modify I did find 
> at least one difference.
> 
> I would like to see this one go in unless it just does not make any sense.

You have my +1
I'm going to review it.


[dpdk-dev] [PATCH] igb_uio: cast private data to correct struct type

2016-03-03 Thread Ferruh Yigit
Fixes: af75078fece3 ("first public release")

This was working fine because the addresses of the two structs are the same:

struct A {
struct B b;
} a;

In the sample above, "a" and "b" have the same address.

Now cast the private data back to the correct struct type, the one that was
stored.

Signed-off-by: Ferruh Yigit 
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 15 ---
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c 
b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index f5617d2..3374e44 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -561,24 +561,17 @@ fail_free:
 static void
 igbuio_pci_remove(struct pci_dev *dev)
 {
-   struct uio_info *info = pci_get_drvdata(dev);
-   struct rte_uio_pci_dev *udev;
-
-   if (info->priv == NULL) {
-   pr_notice("Not igbuio device\n");
-   return;
-   }
-   udev = info->priv;
+   struct rte_uio_pci_dev *udev = pci_get_drvdata(dev);

sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
-   uio_unregister_device(info);
-   igbuio_pci_release_iomem(info);
+   uio_unregister_device(&udev->info);
+   igbuio_pci_release_iomem(&udev->info);
if (udev->mode == RTE_INTR_MODE_MSIX)
pci_disable_msix(dev);
pci_release_regions(dev);
pci_disable_device(dev);
pci_set_drvdata(dev, NULL);
-   kfree(info);
+   kfree(udev);
 }

 static int
-- 
2.5.0
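
A small stand-alone sketch of why the old cast happened to work, assuming (as
the commit message implies) that the uio_info is the first member of
struct rte_uio_pci_dev. The struct and function names below are hypothetical
stand-ins, not the driver's own definitions.

```c
#include <assert.h>
#include <stddef.h>

struct inner { int x; };        /* stand-in for struct uio_info */

struct outer {                  /* stand-in for struct rte_uio_pci_dev */
    struct inner info;          /* first member, so it sits at offset 0 */
    int mode;
};

/* Returns 1 when the outer struct and its first member share an address. */
static int
first_member_shares_address(const struct outer *o)
{
    assert(o != NULL);
    return (const void *)o == (const void *)&o->info;
}

/* Recover the outer struct from a pointer to the embedded member. */
static struct outer *
outer_from_inner(struct inner *in)
{
    return (struct outer *)((char *)in - offsetof(struct outer, info));
}
```

Casting to the stored type, as the patch does, keeps the code correct even if
a member is ever placed in front of the embedded struct; the pointer-equality
trick silently breaks in that case.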



[dpdk-dev] [PATCH] igb_uio: use macros for array size calculation

2016-03-03 Thread Ferruh Yigit
Minor code cleanup.
Remove array size calculations and remove unnecessary assignment.

Signed-off-by: Ferruh Yigit 
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c 
b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 3374e44..563c57b 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -58,7 +58,7 @@ struct rte_uio_pci_dev {
enum rte_intr_mode mode;
 };

-static char *intr_mode = NULL;
+static char *intr_mode;
 static enum rte_intr_mode igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;

 /* sriov sysfs */
@@ -332,7 +332,7 @@ igbuio_pci_setup_iomem(struct pci_dev *dev, struct uio_info 
*info,
unsigned long addr, len;
void *internal_addr;

-   if (sizeof(info->mem) / sizeof(info->mem[0]) <= n)
+   if (n >= MAX_UIO_MAPS)
return -EINVAL;

addr = pci_resource_start(dev, pci_bar);
@@ -357,7 +357,7 @@ igbuio_pci_setup_ioport(struct pci_dev *dev, struct 
uio_info *info,
 {
unsigned long addr, len;

-   if (sizeof(info->port) / sizeof(info->port[0]) <= n)
+   if (n >= MAX_UIO_PORT_REGIONS)
return -EINVAL;

addr = pci_resource_start(dev, pci_bar);
@@ -402,7 +402,7 @@ igbuio_setup_bars(struct pci_dev *dev, struct uio_info 
*info)
iom = 0;
iop = 0;

-   for (i = 0; i != sizeof(bar_names) / sizeof(bar_names[0]); i++) {
+   for (i = 0; i < ARRAY_SIZE(bar_names); i++) {
if (pci_resource_len(dev, i) != 0 &&
pci_resource_start(dev, i) != 0) {
flags = pci_resource_flags(dev, i);
-- 
2.5.0



[dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst

2016-03-03 Thread Xie, Huawei
On 2/18/2016 9:48 PM, Yuanhan Liu wrote:
> [...]
CCed Changchun, the author of the chained handling of desc and mbuf.
The change makes the code more readable, but I think the following part of
the commit message is simple and sufficient.
>
>   while (this_desc_is_not_drained_totally || has_next_desc) {
>   if (this_desc_has_drained_totally) {
>   this_desc = next_desc();
>   }
>
>   if (mbuf_has_no_room) {
>   mbuf = allocate_a_new_mbuf();
>   }
>
>   COPY(mbuf, desc);
>   }
>
> [...]
>
> This refactor makes the code much more readable (IMO), yet it reduces
> binary code size (nearly 2K).
I guess the reduced binary code size comes from reduced inline calls to
mbuf allocation.
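
For readers who want to see the quoted pseudo-code as something compilable,
here is a minimal sketch of the same loop shape on plain memory. It assumes
nothing about vhost: struct seg and copy_chain() are hypothetical names,
source segments stand in for descriptors, and fixed-size slices of one flat
array stand in for mbufs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct seg {                    /* hypothetical source segment ("desc") */
    const uint8_t *data;
    uint32_t len;
};

/*
 * Copy a chain of source segments into nb_dst buffers of dst_buf_len bytes
 * each, laid out back to back in dst. Move to the next segment when the
 * current one is drained and to the next buffer when the current one is full.
 * Returns the number of buffers used, or -1 when the buffers run out.
 */
static int
copy_chain(const struct seg *src, uint32_t nb_src, uint8_t *dst,
           uint32_t dst_buf_len, uint32_t nb_dst, uint32_t *dst_len)
{
    uint32_t s = 0, d = 0, src_off = 0, dst_off = 0, i, cpy;

    assert(dst_buf_len > 0);
    for (i = 0; i < nb_dst; i++)
        dst_len[i] = 0;

    while (s < nb_src) {
        if (src_off == src[s].len) {    /* this segment is drained */
            s++;
            src_off = 0;
            continue;
        }
        if (dst_off == dst_buf_len) {   /* this buffer has no room */
            d++;
            dst_off = 0;
        }
        if (d == nb_dst)
            return -1;

        cpy = src[s].len - src_off;
        if (cpy > dst_buf_len - dst_off)
            cpy = dst_buf_len - dst_off;
        memcpy(dst + (size_t)d * dst_buf_len + dst_off,
               src[s].data + src_off, cpy);

        src_off += cpy;
        dst_off += cpy;
        dst_len[d] = dst_off;
    }
    return dst_off > 0 ? (int)d + 1 : 0;
}
```

Keeping both cursors (source and destination) in one loop is what lets either
side be chained independently of the other.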



[dpdk-dev] [PATCH] examples/l3fwd: em path performance fix

2016-03-03 Thread Tomasz Kulasek
It seems that for most use cases the previous hash_multi_lookup provides
better performance; moreover, sequential lookup can cause a significant
performance drop.

This patch makes the previously optional hash_multi_lookup method the default.
It also provides some minor optimizations, such as draining queues only on
used tx ports.

Fixes: 94c54b4158d5 ("examples/l3fwd: rework exact-match")

Reported-by: Qian Xu 
Signed-off-by: Tomasz Kulasek 
---
 examples/l3fwd/l3fwd.h|2 ++
 examples/l3fwd/l3fwd_em.c |6 +++---
 examples/l3fwd/l3fwd_em_hlm_sse.h |   12 ++--
 examples/l3fwd/l3fwd_em_sse.h |9 +
 examples/l3fwd/l3fwd_lpm.c|4 ++--
 examples/l3fwd/main.c |7 +++
 6 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
index da6d369..207a60a 100644
--- a/examples/l3fwd/l3fwd.h
+++ b/examples/l3fwd/l3fwd.h
@@ -84,6 +84,8 @@ struct lcore_rx_queue {
 struct lcore_conf {
uint16_t n_rx_queue;
struct lcore_rx_queue rx_queue_list[MAX_RX_QUEUE_PER_LCORE];
+   uint16_t n_tx_port;
+   uint16_t tx_port_id[RTE_MAX_ETHPORTS];
uint16_t tx_queue_id[RTE_MAX_ETHPORTS];
struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
void *ipv4_lookup_struct;
diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
index f6a65d8..c8c781d 100644
--- a/examples/l3fwd/l3fwd_em.c
+++ b/examples/l3fwd/l3fwd_em.c
@@ -305,7 +305,7 @@ em_get_ipv6_dst_port(void *ipv6_hdr,  uint8_t portid, void 
*lookup_struct)
  * buffer optimization i.e. ENABLE_MULTI_BUFFER_OPTIMIZE=1.
  */
 #if defined(__SSE4_1__)
-#ifndef HASH_MULTI_LOOKUP
+#ifdef NO_HASH_MULTI_LOOKUP
 #include "l3fwd_em_sse.h"
 #else
 #include "l3fwd_em_hlm_sse.h"
@@ -552,8 +552,8 @@ em_main_loop(__attribute__((unused)) void *dummy)
diff_tsc = cur_tsc - prev_tsc;
if (unlikely(diff_tsc > drain_tsc)) {

-   for (i = 0; i < qconf->n_rx_queue; i++) {
-   portid = qconf->rx_queue_list[i].port_id;
+   for (i = 0; i < qconf->n_tx_port; ++i) {
+   portid = qconf->tx_port_id[i];
if (qconf->tx_mbufs[portid].len == 0)
continue;
send_burst(qconf,
diff --git a/examples/l3fwd/l3fwd_em_hlm_sse.h 
b/examples/l3fwd/l3fwd_em_hlm_sse.h
index d3388da..517815a 100644
--- a/examples/l3fwd/l3fwd_em_hlm_sse.h
+++ b/examples/l3fwd/l3fwd_em_hlm_sse.h
@@ -34,17 +34,9 @@
 #ifndef __L3FWD_EM_HLM_SSE_H__
 #define __L3FWD_EM_HLM_SSE_H__

-/**
- * @file
- * This is an optional implementation of packet classification in Exact-Match
- * path using rte_hash_lookup_multi method from previous implementation.
- * While sequential classification seems to be faster, it's disabled by default
- * and can be enabled with HASH_LOOKUP_MULTI global define in compilation time.
- */
-
 #include "l3fwd_sse.h"

-static inline void
+static inline __attribute__((always_inline)) void
 em_get_dst_port_ipv4x8(struct lcore_conf *qconf, struct rte_mbuf *m[8],
uint8_t portid, uint16_t dst_port[8])
 {
@@ -168,7 +160,7 @@ get_ipv6_5tuple(struct rte_mbuf *m0, __m128i mask0,
key->xmm[2] = _mm_and_si128(tmpdata2, mask1);
 }

-static inline void
+static inline __attribute__((always_inline)) void
 em_get_dst_port_ipv6x8(struct lcore_conf *qconf, struct rte_mbuf *m[8],
uint8_t portid, uint16_t dst_port[8])
 {
diff --git a/examples/l3fwd/l3fwd_em_sse.h b/examples/l3fwd/l3fwd_em_sse.h
index 4c6d14f..7f10af4 100644
--- a/examples/l3fwd/l3fwd_em_sse.h
+++ b/examples/l3fwd/l3fwd_em_sse.h
@@ -34,6 +34,15 @@
 #ifndef __L3FWD_EM_SSE_H__
 #define __L3FWD_EM_SSE_H__

+/**
+ * @file
+ * This is an optional implementation of packet classification in Exact-Match
+ * path using sequential packet classification method.
+ * While hash lookup multi seems to provide better performance, this
+ * sequential method is disabled by default and can be enabled with the
+ * NO_HASH_MULTI_LOOKUP global define at compilation time.
+ */
+
 #include "l3fwd_sse.h"

 static inline __attribute__((always_inline)) uint16_t
diff --git a/examples/l3fwd/l3fwd_lpm.c b/examples/l3fwd/l3fwd_lpm.c
index e0ed3c4..8df762d 100644
--- a/examples/l3fwd/l3fwd_lpm.c
+++ b/examples/l3fwd/l3fwd_lpm.c
@@ -158,8 +158,8 @@ lpm_main_loop(__attribute__((unused)) void *dummy)
diff_tsc = cur_tsc - prev_tsc;
if (unlikely(diff_tsc > drain_tsc)) {

-   for (i = 0; i < qconf->n_rx_queue; i++) {
-   portid = qconf->rx_queue_list[i].port_id;
+   for (i = 0; i < qconf->n_tx_port; ++i) {
+   portid = qconf->tx_port_id[i];
if (qconf->tx_mbufs[portid].len == 0)
continue;
send_burst(

[dpdk-dev] [PATCH] igb_uio: use macros for array size calculation

2016-03-03 Thread Ananyev, Konstantin


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ferruh Yigit
> Sent: Thursday, March 03, 2016 5:08 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] igb_uio: use macros for array size calculation
> 
> Minor code cleanup.
> Remove array size calculations and remove unnecessary assignment.
> 
> Signed-off-by: Ferruh Yigit 
> ---
>  lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c 
> b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> index 3374e44..563c57b 100644
> --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> @@ -58,7 +58,7 @@ struct rte_uio_pci_dev {
>   enum rte_intr_mode mode;
>  };
> 
> -static char *intr_mode = NULL;
> +static char *intr_mode;
>  static enum rte_intr_mode igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;
> 
>  /* sriov sysfs */
> @@ -332,7 +332,7 @@ igbuio_pci_setup_iomem(struct pci_dev *dev, struct 
> uio_info *info,
>   unsigned long addr, len;
>   void *internal_addr;
> 
> - if (sizeof(info->mem) / sizeof(info->mem[0]) <= n)
> + if (n >= MAX_UIO_MAPS)

Why is using a hardcoded value better than sizeof()?
As I can see below, there is an ARRAY_SIZE macro; why not use it here then?
Konstantin

>   return -EINVAL;
> 
>   addr = pci_resource_start(dev, pci_bar);
> @@ -357,7 +357,7 @@ igbuio_pci_setup_ioport(struct pci_dev *dev, struct 
> uio_info *info,
>  {
>   unsigned long addr, len;
> 
> - if (sizeof(info->port) / sizeof(info->port[0]) <= n)
> + if (n >= MAX_UIO_PORT_REGIONS)
>   return -EINVAL;
> 
>   addr = pci_resource_start(dev, pci_bar);
> @@ -402,7 +402,7 @@ igbuio_setup_bars(struct pci_dev *dev, struct uio_info 
> *info)
>   iom = 0;
>   iop = 0;
> 
> - for (i = 0; i != sizeof(bar_names) / sizeof(bar_names[0]); i++) {
> + for (i = 0; i < ARRAY_SIZE(bar_names); i++) {
>   if (pci_resource_len(dev, i) != 0 &&
>   pci_resource_start(dev, i) != 0) {
>   flags = pci_resource_flags(dev, i);
> --
> 2.5.0



[dpdk-dev] [PATCH] igb_uio: use macros for array size calculation

2016-03-03 Thread Ferruh Yigit
On 3/3/2016 5:25 PM, Ananyev, Konstantin wrote:
> 
> 
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ferruh Yigit
>> Sent: Thursday, March 03, 2016 5:08 PM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] [PATCH] igb_uio: use macros for array size calculation
>>
>> Minor code cleanup.
>> Remove array size calculations and remove unnecessary assignment.
>>
>> Signed-off-by: Ferruh Yigit 
>> ---
>>  lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 8 
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c 
>> b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
>> index 3374e44..563c57b 100644
>> --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
>> +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
>> @@ -58,7 +58,7 @@ struct rte_uio_pci_dev {
>>  enum rte_intr_mode mode;
>>  };
>>
>> -static char *intr_mode = NULL;
>> +static char *intr_mode;
>>  static enum rte_intr_mode igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;
>>
>>  /* sriov sysfs */
>> @@ -332,7 +332,7 @@ igbuio_pci_setup_iomem(struct pci_dev *dev, struct 
>> uio_info *info,
>>  unsigned long addr, len;
>>  void *internal_addr;
>>
>> -if (sizeof(info->mem) / sizeof(info->mem[0]) <= n)
>> +if (n >= MAX_UIO_MAPS)
> 
> Why using hardcoded value is better than sizeof()?
> As I can see below there is a macro ARRAY_SIZE, why not to use it here then?

Both are valid, but in uio (uio_driver.h) the "mem" array is defined as:
 struct uio_mem  mem[MAX_UIO_MAPS];

So we already know the size of the array, and it is exposed to us, so why
calculate it? Is there any benefit to calculating it?

Thanks,
ferruh



[dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst

2016-03-03 Thread Xie, Huawei
On 2/18/2016 9:48 PM, Yuanhan Liu wrote:
> The current rte_vhost_dequeue_burst() implementation is a bit messy
[...]
> +
>  uint16_t
>  rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
>   struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
>  {
> - struct rte_mbuf *m, *prev;
>   struct vhost_virtqueue *vq;
> - struct vring_desc *desc;
> - uint64_t vb_addr = 0;
> - uint64_t vb_net_hdr_addr = 0;
> - uint32_t head[MAX_PKT_BURST];
> + uint32_t desc_indexes[MAX_PKT_BURST];

indices


>   uint32_t used_idx;
>   uint32_t i;
> - uint16_t free_entries, entry_success = 0;
> + uint16_t free_entries;
>   uint16_t avail_idx;
> - struct virtio_net_hdr *hdr = NULL;
> + struct rte_mbuf *m;
>  
>   if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->virt_qp_nb))) {
>   RTE_LOG(ERR, VHOST_DATA,
> @@ -730,197 +813,49 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, 
> uint16_t queue_id,
>   return 0;
>  
>   avail_idx =  *((volatile uint16_t *)&vq->avail->idx);
> -
> - /* If there are no available buffers then return. */
> - if (vq->last_used_idx == avail_idx)
> + free_entries = avail_idx - vq->last_used_idx;
> + if (free_entries == 0)
>   return 0;
>  
> - LOG_DEBUG(VHOST_DATA, "%s (%"PRIu64")\n", __func__,
> - dev->device_fh);
> + LOG_DEBUG(VHOST_DATA, "%s (%"PRIu64")\n", __func__, dev->device_fh);
>  
> - /* Prefetch available ring to retrieve head indexes. */
> - rte_prefetch0(&vq->avail->ring[vq->last_used_idx & (vq->size - 1)]);
> + used_idx = vq->last_used_idx & (vq->size -1);
>  
> - /*get the number of free entries in the ring*/
> - free_entries = (avail_idx - vq->last_used_idx);
> + /* Prefetch available ring to retrieve head indexes. */
> + rte_prefetch0(&vq->avail->ring[used_idx]);
>  
> - free_entries = RTE_MIN(free_entries, count);
> - /* Limit to MAX_PKT_BURST. */
> - free_entries = RTE_MIN(free_entries, MAX_PKT_BURST);
> + count = RTE_MIN(count, MAX_PKT_BURST);
> + count = RTE_MIN(count, free_entries);
> + LOG_DEBUG(VHOST_DATA, "(%"PRIu64") about to dequeue %u buffers\n",
> + dev->device_fh, count);
>  
> - LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Buffers available %d\n",
> - dev->device_fh, free_entries);
>   /* Retrieve all of the head indexes first to avoid caching issues. */
> - for (i = 0; i < free_entries; i++)
> - head[i] = vq->avail->ring[(vq->last_used_idx + i) & (vq->size - 
> 1)];
> + for (i = 0; i < count; i++) {
> + desc_indexes[i] = vq->avail->ring[(vq->last_used_idx + i) &
> + (vq->size - 1)];
> + }
>  
>   /* Prefetch descriptor index. */
> - rte_prefetch0(&vq->desc[head[entry_success]]);
> + rte_prefetch0(&vq->desc[desc_indexes[0]]);
>   rte_prefetch0(&vq->used->ring[vq->last_used_idx & (vq->size - 1)]);
>  
> - while (entry_success < free_entries) {
> - uint32_t vb_avail, vb_offset;
> - uint32_t seg_avail, seg_offset;
> - uint32_t cpy_len;
> - uint32_t seg_num = 0;
> - struct rte_mbuf *cur;
> - uint8_t alloc_err = 0;
> -
> - desc = &vq->desc[head[entry_success]];
> -
> - vb_net_hdr_addr = gpa_to_vva(dev, desc->addr);
> - hdr = (struct virtio_net_hdr *)((uintptr_t)vb_net_hdr_addr);
> -
> - /* Discard first buffer as it is the virtio header */
> - if (desc->flags & VRING_DESC_F_NEXT) {
> - desc = &vq->desc[desc->next];
> - vb_offset = 0;
> - vb_avail = desc->len;
> - } else {
> - vb_offset = vq->vhost_hlen;
> - vb_avail = desc->len - vb_offset;
> - }
> -
> - /* Buffer address translation. */
> - vb_addr = gpa_to_vva(dev, desc->addr);
> - /* Prefetch buffer address. */
> - rte_prefetch0((void *)(uintptr_t)vb_addr);
> -
> - used_idx = vq->last_used_idx & (vq->size - 1);
> -
> - if (entry_success < (free_entries - 1)) {
> - /* Prefetch descriptor index. */
> - rte_prefetch0(&vq->desc[head[entry_success+1]]);
> - rte_prefetch0(&vq->used->ring[(used_idx + 1) & 
> (vq->size - 1)]);
> - }

Why is this prefetch silently dropped in the patch?
> -
> - /* Update used index buffer information. */
> - vq->used->ring[used_idx].id = head[entry_success];
> - vq->used->ring[used_idx].len = 0;
> -
> - /* Allocate an mbuf and populate the structure. */
> - m = rte_pktmbuf_alloc(mbuf_pool);
> - if (unlikely(m == NULL)) {
> - RTE_LOG(ERR, VHOST_DATA,
> -   

[dpdk-dev] [PATCH] igb_uio: use macros for array size calculation

2016-03-03 Thread Ananyev, Konstantin


> -Original Message-
> From: Yigit, Ferruh
> Sent: Thursday, March 03, 2016 5:35 PM
> To: Ananyev, Konstantin; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] igb_uio: use macros for array size calculation
> 
> On 3/3/2016 5:25 PM, Ananyev, Konstantin wrote:
> >
> >
> >> -Original Message-
> >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ferruh Yigit
> >> Sent: Thursday, March 03, 2016 5:08 PM
> >> To: dev at dpdk.org
> >> Subject: [dpdk-dev] [PATCH] igb_uio: use macros for array size calculation
> >>
> >> Minor code cleanup.
> >> Remove array size calculations and remove unnecessary assignment.
> >>
> >> Signed-off-by: Ferruh Yigit 
> >> ---
> >>  lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 8 
> >>  1 file changed, 4 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c 
> >> b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> >> index 3374e44..563c57b 100644
> >> --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> >> +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> >> @@ -58,7 +58,7 @@ struct rte_uio_pci_dev {
> >>enum rte_intr_mode mode;
> >>  };
> >>
> >> -static char *intr_mode = NULL;
> >> +static char *intr_mode;
> >>  static enum rte_intr_mode igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;
> >>
> >>  /* sriov sysfs */
> >> @@ -332,7 +332,7 @@ igbuio_pci_setup_iomem(struct pci_dev *dev, struct 
> >> uio_info *info,
> >>unsigned long addr, len;
> >>void *internal_addr;
> >>
> >> -  if (sizeof(info->mem) / sizeof(info->mem[0]) <= n)
> >> +  if (n >= MAX_UIO_MAPS)
> >
> > Why using hardcoded value is better than sizeof()?
> > As I can see below there is a macro ARRAY_SIZE, why not to use it here then?
> 
> Both are valid, but in uio (uio_driver.h) "mem" array defined as:
>  struct uio_mem  mem[MAX_UIO_MAPS];

Yep, so if both are valid, why change it in the first place? :)

> 
> So we already know the size of the array, and it is exposed to us, why
> need to calculate. Is there any benefit of calculating it?

If in the future the definition of mem[] changed to, let's say:
struct uio_mem   mem[X]
your code would still be valid, with no need to update it. 
Konstantin
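
A short sketch of the point, assuming a user-space stand-in for the kernel
macro. ARRAY_SIZE_SKETCH, MAX_MAPS_SKETCH, struct info_sketch and
check_mem_index() are hypothetical names made up for the illustration; the
kernel's real ARRAY_SIZE() additionally rejects pointer arguments at compile
time.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the kernel's ARRAY_SIZE(). */
#define ARRAY_SIZE_SKETCH(a) (sizeof(a) / sizeof((a)[0]))

#define MAX_MAPS_SKETCH 5       /* plays the role of MAX_UIO_MAPS */

struct info_sketch {
    int mem[MAX_MAPS_SKETCH];
};

/* Returns 0 when n is a valid index into info->mem, -1 otherwise. */
static int
check_mem_index(const struct info_sketch *info, size_t n)
{
    assert(info != NULL);

    /* The bound follows the array declaration, whatever constant it uses. */
    if (n >= ARRAY_SIZE_SKETCH(info->mem))
        return -1;
    return 0;
}
```

Both forms give the same answer today; the sizeof-based one keeps doing so if
the declaration is later changed to use a different constant.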

> 
> Thanks,
> ferruh



[dpdk-dev] [PATCH] mk: add makefile extention support

2016-03-03 Thread Wiles, Keith
>2016-03-03 14:55, Wiles, Keith:
>> >>2016-02-28 21:47, Wiles, Keith:
>> >>> >Hi,
>> >>> >
>> >>> >2016-02-09 11:35, Keith Wiles:
>> >>> >> Adding support to the build system to allow for Makefile.XXX
>> >>> >> extention to a subtree, which already has Makefiles. These
>> >>> >> Makefiles could be from the autotools and others places. Using
>> >>> >> the Makefile extention RTE_MKFILE_SUFFIX in a makefile subtree
>> >>> >> using 'export RTE_MKFILE_SUFFIX=.XXX' to use Makefile.XXX in
>> >>> >> that subtree.
>> >>> >> 
>> >>> >> The main reason I needed this feature was to integrate a autotool
>> >>> >> open source projects with DPDK and keep the original Makefiles.
>> >>> >
>> >>> >Sorry I fail to understand why it is needed.
>> >>> >Are you trying to add autotool in DPDK? I don't think it is a good 
>> >>> >approach.
>> >>> >The DPDK must provide a pkgconfig interface to be integrated anywhere.
>> >>> 
>> >>> I was not trying to add autotools to DPDK. On a number of times I wanted 
>> >>> to integrate an open source project(s) with DPDK and use DPDK's build 
>> >>> system, but because the open source project already contained Makefile 
>> >>> files you can not use DPDK build system without modify or moving the 
>> >>> original Makefile files. Using this method I can just add a exported 
>> >>> variable and supply my own Makefile.XXX files.
>> >>> 
>> >>> One case was building FreeBSD source, but I did not want to modify 
>> >>> FreeBSD Makefiles (or rely on previously built Makefiles as they would 
>> >>> not work on Linux anyway) as I was pulling the source down from 
>> >>> freebsd.org repo. Using a patch to add the Makefiles with a different 
>> >>> suffix allows me to build FreeBSD using DPDK, without having to modify 
>> >>> or own the FreeBSD source. I have had this problem a number of times 
>> >>> with open source code I did not want to modify, but just build within 
>> >>> DPDK build system and adding the support for a different suffix to DPDK 
>> >>> provided a clean way. The change does not affect the current build 
>> >>> system and just allows someone to define a new suffix for a given 
>> >>> subtree in the code.
>> >>
>> >>Why would you like to have another project inside the DPDK files tree?
>> >>If you want to integrate the lib inside an existing project, the solution
>> >>is pkgconfig.
>> >
>> >The goal for me was to use DPDK build system for that project, instead of 
>> >using autotools or some other makefile system. In the case of FreeBSD code, 
>> >the FreeBSD build system requires FreeBSD tools to be built as the "make" 
>> >and the Makefiles are very different on a Linux machine.
>> 
>> Does anyone find this patch useful? I would hate to see this one die, as it 
>> does not affect the current builds but adds support for using the DPDK build 
>> system without having to modify or move the existing Makefiles.
>
>I would hate making the build system even more complicated to use it
>for something which is not its role.
>It opens the door to feature requests which are clearly out of its scope.

Ok, Thanks I will change the status in patchwork to rejected.
>
>


Regards,
Keith






[dpdk-dev] [PATCH] igb_uio: use macros for array size calculation

2016-03-03 Thread Ferruh Yigit
On 3/3/2016 5:45 PM, Ananyev, Konstantin wrote:
> 
> 
>> -Original Message-
>> From: Yigit, Ferruh
>> Sent: Thursday, March 03, 2016 5:35 PM
>> To: Ananyev, Konstantin; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] igb_uio: use macros for array size 
>> calculation
>>
>> On 3/3/2016 5:25 PM, Ananyev, Konstantin wrote:
>>>
>>>
>>>> -Original Message-
>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ferruh Yigit
>>>> Sent: Thursday, March 03, 2016 5:08 PM
>>>> To: dev at dpdk.org
>>>> Subject: [dpdk-dev] [PATCH] igb_uio: use macros for array size calculation
>>>>
>>>> Minor code cleanup.
>>>> Remove array size calculations and remove unnecessary assignment.
>>>>
>>>> Signed-off-by: Ferruh Yigit 
>>>> ---
>>>>  lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 8 
>>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c 
>>>> b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
>>>> index 3374e44..563c57b 100644
>>>> --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
>>>> +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
>>>> @@ -58,7 +58,7 @@ struct rte_uio_pci_dev {
>>>>    enum rte_intr_mode mode;
>>>>  };
>>>>
>>>> -static char *intr_mode = NULL;
>>>> +static char *intr_mode;
>>>>  static enum rte_intr_mode igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;
>>>>
>>>>  /* sriov sysfs */
>>>> @@ -332,7 +332,7 @@ igbuio_pci_setup_iomem(struct pci_dev *dev, struct 
>>>> uio_info *info,
>>>>    unsigned long addr, len;
>>>>    void *internal_addr;
>>>>
>>>> -  if (sizeof(info->mem) / sizeof(info->mem[0]) <= n)
>>>> +  if (n >= MAX_UIO_MAPS)
>>>
>>> Why is using a hardcoded value better than sizeof()?
>>> As I can see below there is an ARRAY_SIZE macro, so why not use it here?
>>
>> Both are valid, but in uio (uio_driver.h) the "mem" array is defined as:
>>  struct uio_mem  mem[MAX_UIO_MAPS];
> 
> Yep, so if both are valid, why change it in the first place? :)
> 
>>
>> So we already know the size of the array, and it is exposed to us, so why
>> calculate it? Is there any benefit to calculating it?
> 
> If in the future the definition of mem[] changes to, let's say:
> struct uio_mem   mem[X]
> your code would still be valid, with no need to update it.

Since it is part of the uio API, I expect this will remain the same;
otherwise igb_uio.c will already have problems, because there is other
code that already relies on this information.

But I don't have a strong opinion either way, so I will update this to
use ARRAY_SIZE().


Thanks,
ferruh
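
For readers following the ARRAY_SIZE() versus MAX_UIO_MAPS exchange above, here is a minimal userspace sketch of the two bounds checks being compared. It is not the igb_uio code: the `demo_mem` and `demo_info` types and both helper functions are hypothetical stand-ins, the value given to MAX_UIO_MAPS is an assumption (the real constant comes from uio_driver.h), and the ARRAY_SIZE() shown is a simplified version of the kernel macro.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's ARRAY_SIZE() macro. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Assumed value for illustration; the real one is in uio_driver.h. */
#define MAX_UIO_MAPS 5

/* Hypothetical stand-ins for struct uio_mem and struct uio_info. */
struct demo_mem { unsigned long addr; unsigned long size; };
struct demo_info { struct demo_mem mem[MAX_UIO_MAPS]; };

/* Check against the named constant, as in the patch as posted.
 * Returns 0 if slot n is within bounds, -1 otherwise. */
int check_slot_constant(const struct demo_info *info, size_t n)
{
	(void)info; /* the array itself is never consulted */
	return (n >= MAX_UIO_MAPS) ? -1 : 0;
}

/* Check derived from the array declaration, as suggested in review.
 * This stays correct even if mem[] is later declared with another size. */
int check_slot_array_size(const struct demo_info *info, size_t n)
{
	return (n >= ARRAY_SIZE(info->mem)) ? -1 : 0;
}
```

Both helpers give the same answer as long as mem[] is declared with MAX_UIO_MAPS elements. They only diverge if the declaration changes, which is the case Konstantin describes.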



[dpdk-dev] [PATCH 1/3] kcp: add kernel control path kernel module

2016-03-03 Thread Ferruh Yigit
On 3/3/2016 4:59 PM, Stephen Hemminger wrote:
> On Thu, 3 Mar 2016 10:11:57 +
> Ferruh Yigit  wrote:
> 
>> On 3/2/2016 10:18 PM, Jay Rolette wrote:
>>>
>>> On Tue, Mar 1, 2016 at 8:02 PM, Stephen Hemminger
>>> <stephen at networkplumber.org> wrote:
>>>
>>> On Mon, 29 Feb 2016 08:33:25 -0600
>>> Jay Rolette <rolette at infiniteio.com> wrote:
>>>
>>> > On Mon, Feb 29, 2016 at 5:06 AM, Thomas Monjalon
>>> > <thomas.monjalon at 6wind.com> wrote:
>>> >
>>> > > Hi,
>>> > > I totally agree with Avi's comments.
>>> > > This topic is really important for the future of DPDK.
>>> > > So I think we must give some time to continue the discussion
>>> > > and have netdev involved in the choices done.
>>> > > As a consequence, these series should not be merged in the
>>> > > release 16.04.
>>> > > Thanks for continuing the work.
>>> > >
>>> >
>>> > I know you guys are very interested in getting rid of the out-of-tree
>>> > drivers, but please do not block incremental improvements to DPDK
>>> > in the meantime. Ferruh's patch improves the usability of KNI. Don't
>>> > throw out good and useful enhancements just because it isn't where
>>> > you want to be in the end.
>>> >
>>> > I'd like to see these be merged.
>>> >
>>> > Jay
>>>
>>> The code is really not ready. I am okay with cooperative development
>>> but the current code needs to go into a staging type tree.
>>> No compatibility, no ABI guarantees, more of an RFC.
>>> Don't want vendors building products with it then screaming when it
>>> gets rebuilt/reworked/scrapped.
>>>
>>>
>>> That's fair. To be clear, it wasn't my intent for code that wasn't baked
>>> yet to be merged. 
>>>
>>> The main point of my comment was that I think it is important not to
>>> halt incremental improvements to existing capabilities (KNI in this
>>> case) just because there are philosophical or directional changes that
>>> the community would like to make longer-term.
>>>
>>> Bird in the hand vs. two in the bush...
>>>
>>
>> There are two different statements here. The first is that the code is not
>> ready, which I agree is a fair point (although no argument was given for
>> that statement, which makes it hard to discuss, so I will put it aside).
>> This implies that when the code is ready it can go into the repo.
>>
>> But not having kernel modules at all, independent of their state relative
>> to what they are trying to replace, is something else. And that won't help
>> with the KNI-related problems.
>>
>> Thanks,
>> ferruh
>>
> 
> Why not re-submit the patches but put them in lib/librte_eal/staging or a
> similar path, and make sure that it does not get built by the normal build
> process?
> 
I will do so when staging is ready/defined.

I will also start working on upstreaming the modules.

Thanks,
ferruh

