[dpdk-dev] LRU using DPDK 1.7

2014-09-23 Thread Saha, Avik (AWS)
Hello
   I was wondering if there is a way to use rte_table_hash_lru without 
building a pipeline: basically, using the same hash-table-like functionality 
(add, delete and lookup) without setting up a pipeline and connecting it to 
ports, etc.

Thanks
Avik



[dpdk-dev] [PATCH 1/5]librte_ether:use new filter framework

2014-09-23 Thread Jijiang Liu
Introduce a new filter framework in librte_ether. For the implementation 
discussion, please refer to 
http://dpdk.org/ml/archives/dev/2014-September/005179.html. The VF MACVLAN 
filter implementation is based on it.


Signed-off-by: Jijiang Liu 
Acked-by: Helin Zhang 
Acked-by: Jingjing Wu 
Acked-by: Changchun Ouyang 

---
 lib/librte_ether/Makefile   |1 +
 lib/librte_ether/rte_eth_ctrl.h |   79 +++
 lib/librte_ether/rte_ethdev.c   |   33 
 lib/librte_ether/rte_ethdev.h   |   48 +++-
 4 files changed, 160 insertions(+), 1 deletions(-)
 create mode 100644 lib/librte_ether/rte_eth_ctrl.h

diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index b310f8b..a461c31 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -46,6 +46,7 @@ SRCS-y += rte_ethdev.c
 #
 SYMLINK-y-include += rte_ether.h
 SYMLINK-y-include += rte_ethdev.h
+SYMLINK-y-include += rte_eth_ctrl.h

 # this lib depends upon:
 DEPDIRS-y += lib/librte_eal lib/librte_mempool lib/librte_ring lib/librte_mbuf
diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
new file mode 100644
index 000..66745a6
--- /dev/null
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -0,0 +1,79 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_CTRL_H_
+#define _RTE_ETH_CTRL_H_
+
+/**
+ * @file
+ *
+ * Ethernet device features and related data structures used
+ * by control APIs should be defined in this file.
+ *
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Feature filter types
+ */
+enum rte_filter_type {
+   RTE_ETH_FILTER_NONE = 0,
+   RTE_ETH_FILTER_RSS,
+   RTE_ETH_FILTER_FDIR,
+   RTE_ETH_FILTER_MAX,
+};
+
+/**
+ * All generic operations to filters
+ */
+enum rte_filter_op {
+   RTE_ETH_FILTER_OP_NONE = 0,
+   /**< used to check whether the type filter is supported */
+   RTE_ETH_FILTER_OP_ADD,  /**< add filter entry */
+   RTE_ETH_FILTER_OP_UPDATE,   /**< update filter entry */
+   RTE_ETH_FILTER_OP_DELETE,   /**< delete filter entry */
+   RTE_ETH_FILTER_OP_GET,  /**< get filter entry */
+   RTE_ETH_FILTER_OP_SET,  /**< configurations */
+   RTE_ETH_FILTER_OP_GET_INFO,
+   /**< get information of filter, such as status or statistics */
+   RTE_ETH_FILTER_OP_MAX,
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_ETH_CTRL_H_ */
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index fd1010a..a3f45a6 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3002,3 +3002,36 @@ rte_eth_dev_get_flex_filter(uint8_t port_id, uint16_t index,
return (*dev->dev_ops->get_flex_filter)(dev, index, filter,
rx_queue);
 }
+
+int
+rte_eth_dev_filter_supported(uint8_t port_id, enum rte_filter_type filter_type)
+{
+   struct rte_eth_dev *dev;
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return -ENODEV;
+   }
+
+   dev = &rte_eth_devices[port_id];
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->filter_ctrl, -ENOTSUP);
+   return (*dev->dev_ops->filter_ctrl)(dev, filter_type,
+   RTE_ET
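
The patch text is cut off above, so the rest of rte_eth_dev_filter_supported() is not visible here. As a rough, self-contained sketch of the dispatch pattern the patch describes (one per-driver filter_ctrl callback, probed with RTE_ETH_FILTER_OP_NONE), something like the following could work. The two enums are copied from the rte_eth_ctrl.h hunk; struct sketch_dev and every sketch_* function are hypothetical stand-ins, not the real rte_ethdev code, and passing OP_NONE as the "is it supported?" probe is an assumption based on the enum comment.

```c
#include <errno.h>
#include <stddef.h>

/* Filter types and operations as declared in the rte_eth_ctrl.h hunk. */
enum rte_filter_type {
	RTE_ETH_FILTER_NONE = 0,
	RTE_ETH_FILTER_RSS,
	RTE_ETH_FILTER_FDIR,
	RTE_ETH_FILTER_MAX,
};

enum rte_filter_op {
	RTE_ETH_FILTER_OP_NONE = 0,	/* probe: is this filter type supported? */
	RTE_ETH_FILTER_OP_ADD,
	RTE_ETH_FILTER_OP_UPDATE,
	RTE_ETH_FILTER_OP_DELETE,
	RTE_ETH_FILTER_OP_GET,
	RTE_ETH_FILTER_OP_SET,
	RTE_ETH_FILTER_OP_GET_INFO,
	RTE_ETH_FILTER_OP_MAX,
};

/* Hypothetical stand-in for struct rte_eth_dev: only the one callback. */
struct sketch_dev;
typedef int (*sketch_filter_ctrl_t)(struct sketch_dev *dev,
				    enum rte_filter_type type,
				    enum rte_filter_op op, void *arg);

struct sketch_dev {
	sketch_filter_ctrl_t filter_ctrl;	/* NULL if the driver has none */
};

/* Hypothetical driver callback that only knows about FDIR filters. */
int
sketch_drv_filter_ctrl(struct sketch_dev *dev, enum rte_filter_type type,
		       enum rte_filter_op op, void *arg)
{
	(void)dev;
	if (type != RTE_ETH_FILTER_FDIR)
		return -ENOTSUP;
	switch (op) {
	case RTE_ETH_FILTER_OP_NONE:
		return 0;			/* type is supported */
	case RTE_ETH_FILTER_OP_ADD:
	case RTE_ETH_FILTER_OP_DELETE:
		return arg != NULL ? 0 : -EINVAL;
	default:
		return -ENOTSUP;
	}
}

/* Generic entry point: validate, then forward to the driver callback. */
int
sketch_filter_ctrl(struct sketch_dev *dev, enum rte_filter_type type,
		   enum rte_filter_op op, void *arg)
{
	if (dev == NULL)
		return -ENODEV;
	if (dev->filter_ctrl == NULL)
		return -ENOTSUP;
	return dev->filter_ctrl(dev, type, op, arg);
}

/* Support check: the same path, using OP_NONE as the probe (assumption). */
int
sketch_filter_supported(struct sketch_dev *dev, enum rte_filter_type type)
{
	return sketch_filter_ctrl(dev, type, RTE_ETH_FILTER_OP_NONE, NULL);
}
```

With this shape a new filter type only needs a new enum value and a case in the driver callback; the generic layer stays unchanged.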

[dpdk-dev] [PATCH 3/5]i40e:optimize MACVLAN filter implementation

2014-09-23 Thread Jijiang Liu
This patch mainly optimizes the i40e_add_macvlan_filters() and 
i40e_remove_macvlan_filters() functions so that they provide a flexible 
configuration interface. Other related MACVLAN filter code is changed to use 
the new data structures.

Signed-off-by: Jijiang Liu 
Acked-by: Helin Zhang 
Acked-by: Jingjing Wu 
Acked-by: Changchun Ouyang 

---
 lib/librte_pmd_i40e/i40e_ethdev.c |  209 ++---
 lib/librte_pmd_i40e/i40e_ethdev.h |   18 +++-
 lib/librte_pmd_i40e/i40e_pf.c |7 +-
 3 files changed, 193 insertions(+), 41 deletions(-)

diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c b/lib/librte_pmd_i40e/i40e_ethdev.c
index a00d6ca..9cc2ece 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -205,6 +205,9 @@ static int i40e_dev_rss_hash_update(struct rte_eth_dev *dev,
struct rte_eth_rss_conf *rss_conf);
 static int i40e_dev_rss_hash_conf_get(struct rte_eth_dev *dev,
  struct rte_eth_rss_conf *rss_conf);
+static int i40e_dev_filter_ctrl(struct rte_eth_dev *dev,
+   enum rte_filter_type filter_type,
+   enum rte_filter_op filter_op, void *arg);

 /* Default hash key buffer for RSS */
 static uint32_t rss_key_default[I40E_PFQF_HKEY_MAX_INDEX + 1];
@@ -256,6 +259,7 @@ static struct eth_dev_ops i40e_eth_dev_ops = {
.reta_query   = i40e_dev_rss_reta_query,
.rss_hash_update  = i40e_dev_rss_hash_update,
.rss_hash_conf_get= i40e_dev_rss_hash_conf_get,
+   .filter_ctrl  = i40e_dev_filter_ctrl,
 };

 static struct eth_driver rte_i40e_pmd = {
@@ -1514,6 +1518,7 @@ i40e_macaddr_add(struct rte_eth_dev *dev,
 {
struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct i40e_mac_filter_info mac_filter;
struct i40e_vsi *vsi = pf->main_vsi;
struct ether_addr old_mac;
int ret;
@@ -1539,8 +1544,10 @@ i40e_macaddr_add(struct rte_eth_dev *dev,
(void)rte_memcpy(&old_mac, hw->mac.addr, ETHER_ADDR_LEN);
(void)rte_memcpy(hw->mac.addr, mac_addr->addr_bytes,
ETHER_ADDR_LEN);
+   (void)rte_memcpy(&mac_filter.mac_addr, mac_addr, ETHER_ADDR_LEN);
+   mac_filter.filter_type = RTE_MACVLAN_PERFECT_MATCH;

-   ret = i40e_vsi_add_mac(vsi, mac_addr);
+   ret = i40e_vsi_add_mac(vsi, &mac_filter);
if (ret != I40E_SUCCESS) {
PMD_DRV_LOG(ERR, "Failed to add MACVLAN filter");
return;
@@ -2457,6 +2464,7 @@ i40e_update_default_filter_setting(struct i40e_vsi *vsi)
 {
struct i40e_hw *hw = I40E_VSI_TO_HW(vsi);
struct i40e_aqc_remove_macvlan_element_data def_filter;
+   struct i40e_mac_filter_info filter;
int ret;

if (vsi->type != I40E_VSI_MAIN)
@@ -2470,6 +2478,7 @@ i40e_update_default_filter_setting(struct i40e_vsi *vsi)
ret = i40e_aq_remove_macvlan(hw, vsi->seid, &def_filter, 1, NULL);
if (ret != I40E_SUCCESS) {
struct i40e_mac_filter *f;
+   struct ether_addr *mac;

PMD_DRV_LOG(WARNING, "Cannot remove the default "
"macvlan filter");
@@ -2479,15 +2488,18 @@ i40e_update_default_filter_setting(struct i40e_vsi *vsi)
PMD_DRV_LOG(ERR, "failed to allocate memory");
return I40E_ERR_NO_MEMORY;
}
-   (void)rte_memcpy(&f->macaddr.addr_bytes, hw->mac.perm_addr,
+   mac = &f->mac_info.mac_addr;
+   (void)rte_memcpy(&mac->addr_bytes, hw->mac.perm_addr,
ETH_ADDR_LEN);
TAILQ_INSERT_TAIL(&vsi->mac_list, f, next);
vsi->mac_num++;

return ret;
}
-
-   return i40e_vsi_add_mac(vsi, (struct ether_addr *)(hw->mac.perm_addr));
+   (void)rte_memcpy(&filter.mac_addr,
+   (struct ether_addr *)(hw->mac.perm_addr), ETH_ADDR_LEN);
+   filter.filter_type = RTE_MACVLAN_PERFECT_MATCH;
+   return i40e_vsi_add_mac(vsi, &filter);
 }

 static int
@@ -2541,6 +2553,7 @@ i40e_vsi_setup(struct i40e_pf *pf,
 {
struct i40e_hw *hw = I40E_PF_TO_HW(pf);
struct i40e_vsi *vsi;
+   struct i40e_mac_filter_info filter;
int ret;
struct i40e_vsi_context ctxt;
struct ether_addr broadcast =
@@ -2751,7 +2764,10 @@ i40e_vsi_setup(struct i40e_pf *pf,
}

/* MAC/VLAN configuration */
-   ret = i40e_vsi_add_mac(vsi, &broadcast);
+   (void)rte_memcpy(&filter.mac_addr, &broadcast, ETHER_ADDR_LEN);
+   filter.filter_type = RTE_MACVLAN_PERFECT_MATCH;
+
+   ret = i40e_vsi_add_mac(vsi, &filter);
if (ret != I40E_SUCCESS) {
PMD_DRV_LOG(ERR, "Failed to add MACVLAN filter")

[dpdk-dev] [PATCH 5/5]testpmd:test VF MACVLAN filter for i40e

2014-09-23 Thread Jijiang Liu
Add a test command in testpmd to test VF MACVLAN filter feature.

Signed-off-by: Jijiang Liu 
Acked-by: Helin Zhang 
Acked-by: Jingjing Wu 
Acked-by: Changchun Ouyang 
---
 app/test-pmd/cmdline.c |  115 ++-
 1 files changed, 112 insertions(+), 3 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index b04a4e8..bfdf265 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -351,9 +351,14 @@ static void cmd_help_long_parsed(void *parsed_result,
"e.g., 'set stat_qmap rx 0 2 5' sets rx queue 2"
" on port 0 to mapping 5.\n\n"

-   "set port (port_id) vf (vf_id) rx|tx on|off \n"
+   "set port (port_id) vf (vf_id) rx|tx on|off\n"
"Enable/Disable a VF receive/tranmit from a port\n\n"

+   "set port (port_id) vf (vf_id) (mac_addr)"
+   " (exact-mac#exact-mac-vlan#hashmac|hashmac-vlan) on|off\n"
+   "   Add/Remove unicast or multicast MAC addr filter"
+   " for a VF.\n\n"
+
"set port (port_id) vf (vf_id) rxmode (AUPE|ROPE|BAM"
"|MPE) (on|off)\n"
"AUPE:accepts untagged VLAN;"
@@ -5795,6 +5800,108 @@ cmdline_parse_inst_t cmd_set_uc_all_hash_filter = {
},
 };

+/* *** CONFIGURE MACVLAN FILTER FOR VF(s) *** */
+struct cmd_set_vf_macvlan_filter {
+   cmdline_fixed_string_t set;
+   cmdline_fixed_string_t port;
+   uint8_t port_id;
+   cmdline_fixed_string_t vf;
+   uint8_t vf_id;
+   struct ether_addr address;
+   cmdline_fixed_string_t filter_type;
+   cmdline_fixed_string_t mode;
+};
+
+static void
+cmd_set_vf_macvlan_parsed(void *parsed_result,
+  __attribute__((unused)) struct cmdline *cl,
+  __attribute__((unused)) void *data)
+{
+   int is_on, ret = 0;
+   struct cmd_set_vf_macvlan_filter *res = parsed_result;
+   struct rte_eth_mac_filter filter;
+
+   memset(&filter, 0, sizeof(struct rte_eth_mac_filter));
+
+   (void)rte_memcpy(&filter.mac_addr, &res->address, ETHER_ADDR_LEN);
+   filter.id = res->vf_id;
+   filter.pf_vf_flag = 0;
+
+   if (!strcmp(res->filter_type, "exact-mac"))
+   filter.filter_type = RTE_MAC_PERFECT_MATCH;
+   else if (!strcmp(res->filter_type, "exact-mac-vlan"))
+   filter.filter_type = RTE_MACVLAN_PERFECT_MATCH;
+   else if (!strcmp(res->filter_type, "hashmac"))
+   filter.filter_type = RTE_MAC_HASH_MATCH;
+   else if (!strcmp(res->filter_type, "hashmac-vlan"))
+   filter.filter_type = RTE_MACVLAN_HASH_MATCH;
+
+   is_on = (strcmp(res->mode, "on") == 0) ? 1 : 0;
+
+   if (is_on)
+   ret = rte_eth_dev_filter_ctrl(res->port_id,
+   RTE_ETH_FILTER_MACVLAN,
+   RTE_ETH_FILTER_OP_ADD,
+&filter);
+   else
+   ret = rte_eth_dev_filter_ctrl(res->port_id,
+   RTE_ETH_FILTER_MACVLAN,
+   RTE_ETH_FILTER_OP_DELETE,
+   &filter);
+
+   if (ret < 0)
+   printf("bad set MAC hash parameter, return code = %d\n", ret);
+
+}
+
+cmdline_parse_token_string_t cmd_set_vf_macvlan_set =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_vf_macvlan_filter,
+set, "set");
+cmdline_parse_token_string_t cmd_set_vf_macvlan_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_vf_macvlan_filter,
+port, "port");
+cmdline_parse_token_num_t cmd_set_vf_macvlan_portid =
+   TOKEN_NUM_INITIALIZER(struct cmd_set_vf_macvlan_filter,
+ port_id, UINT8);
+cmdline_parse_token_string_t cmd_set_vf_macvlan_vf =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_vf_macvlan_filter,
+vf, "vf");
+cmdline_parse_token_num_t cmd_set_vf_macvlan_vf_id =
+   TOKEN_NUM_INITIALIZER(struct cmd_set_vf_macvlan_filter,
+   vf_id, UINT8);
+cmdline_parse_token_etheraddr_t cmd_set_vf_macvlan_mac =
+   TOKEN_ETHERADDR_INITIALIZER(struct cmd_set_vf_macvlan_filter,
+   address);
+cmdline_parse_token_string_t cmd_set_vf_macvlan_filter_type =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_vf_macvlan_filter,
+   filter_type, "exact-mac#exact-mac-vlan"
+   "#hashmac#hashmac-vlan");
+cmdline_parse_token_string_t cmd_set_vf_macvlan_mode =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_vf_macvlan_filter,
+mode, "on#off");
+
+cmdline_parse_inst_t cmd_set_vf_macvlan_filter = {
+   .f = cmd_set_vf_macvlan_
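
The strcmp() chain in cmd_set_vf_macvlan_parsed() leaves filter.filter_type at 0 when none of the four strings match (the struct is zeroed by memset first). A table-driven variant that reports an unknown name explicitly might look like the sketch below. The four macro values are copied from patch 2/5 of this series; sketch_parse_filter_type() is a hypothetical helper and is not part of the posted patch.

```c
#include <stdint.h>
#include <string.h>

/* Values as defined in patch 2/5 (rte_eth_ctrl.h). */
#define RTE_MAC_PERFECT_MATCH      0x0001
#define RTE_MACVLAN_PERFECT_MATCH  0x0002
#define RTE_MAC_HASH_MATCH         0x0004
#define RTE_MACVLAN_HASH_MATCH     0x0008

/*
 * Hypothetical helper: map a testpmd filter-type token to its flag.
 * Returns 0 and stores the flag in *type on success, -1 if the name
 * is unknown or an argument is NULL.
 */
int
sketch_parse_filter_type(const char *name, uint16_t *type)
{
	static const struct {
		const char *name;
		uint16_t type;
	} map[] = {
		{ "exact-mac",      RTE_MAC_PERFECT_MATCH },
		{ "exact-mac-vlan", RTE_MACVLAN_PERFECT_MATCH },
		{ "hashmac",        RTE_MAC_HASH_MATCH },
		{ "hashmac-vlan",   RTE_MACVLAN_HASH_MATCH },
	};
	size_t i;

	if (name == NULL || type == NULL)
		return -1;

	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
		if (strcmp(name, map[i].name) == 0) {
			*type = map[i].type;
			return 0;
		}
	}
	return -1;	/* unknown token: caller can refuse to program a filter */
}
```

In testpmd the token parser already restricts the input to the four strings, so the failure branch is mostly a safeguard for callers that bypass the command line.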

[dpdk-dev] [PATCH 4/5]i40e:add VF MACVLAN filter implementation in librte_pmd_i40e

2014-09-23 Thread Jijiang Liu
Add i40e_vf_mac_filter_set() function to support perfect match and hash match 
filter of MAC address and VLAN ID for a VF.

Signed-off-by: Jijiang Liu 
Acked-by: Helin Zhang 
Acked-by: Jingjing Wu 
Acked-by: Changchun Ouyang 

---
 lib/librte_pmd_i40e/i40e_ethdev.c |  117 -
 1 files changed, 114 insertions(+), 3 deletions(-)

diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c b/lib/librte_pmd_i40e/i40e_ethdev.c
index bdab17c..66ad3bb 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -1591,6 +1591,118 @@ i40e_macaddr_remove(struct rte_eth_dev *dev, uint32_t index)
memset(&pf->dev_addr, 0, sizeof(struct ether_addr));
 }

+/* Set perfect match or hash match of MAC and VLAN for a VF */
+static int
+i40e_vf_mac_filter_set(struct i40e_pf *pf,
+struct rte_eth_mac_filter *filter,
+bool add)
+{
+   struct i40e_hw *hw;
+   struct i40e_mac_filter_info mac_filter;
+   struct ether_addr old_mac;
+   struct ether_addr *new_mac;
+   struct i40e_pf_vf *vf = NULL;
+   uint16_t vf_id;
+   int ret;
+
+   if (pf == NULL) {
+   PMD_DRV_LOG(ERR, "Invalid PF argument\n");
+   return -EINVAL;
+   }
+   hw = I40E_PF_TO_HW(pf);
+
+   if (filter == NULL) {
+   PMD_DRV_LOG(ERR, "Invalid mac filter argument\n");
+   return -EINVAL;
+   }
+
+   new_mac = &filter->mac_addr;
+
+   if (is_zero_ether_addr(new_mac)) {
+   PMD_DRV_LOG(ERR, "Invalid ethernet address\n");
+   return -EINVAL;
+   }
+
+   vf_id = filter->id;
+
+   if (vf_id > pf->vf_num - 1 || !pf->vfs) {
+   PMD_DRV_LOG(ERR, "Invalid argument\n");
+   return -EINVAL;
+   }
+   vf = &pf->vfs[vf_id];
+
+   if (add && is_same_ether_addr(new_mac, &(pf->dev_addr))) {
+   PMD_DRV_LOG(INFO, "Ignore adding permanent MAC address\n");
+   return -EINVAL;
+   }
+
+   if (add) {
+   (void)rte_memcpy(&old_mac, hw->mac.addr, ETHER_ADDR_LEN);
+   (void)rte_memcpy(hw->mac.addr, new_mac->addr_bytes,
+   ETHER_ADDR_LEN);
+   (void)rte_memcpy(&mac_filter.mac_addr, &filter->mac_addr,
+ETHER_ADDR_LEN);
+
+   mac_filter.filter_type = filter->filter_type;
+   mac_filter.queue_id = filter->queue_id;
+   ret = i40e_vsi_add_mac(vf->vsi, &mac_filter);
+   if (ret != I40E_SUCCESS) {
+   PMD_DRV_LOG(ERR, "Failed to add MAC filter\n");
+   return -1;
+   }
+   ether_addr_copy(new_mac, &pf->dev_addr);
+   } else {
+   (void)rte_memcpy(hw->mac.addr, hw->mac.perm_addr,
+   ETHER_ADDR_LEN);
+   ret = i40e_vsi_delete_mac(vf->vsi, &filter->mac_addr);
+   if (ret != I40E_SUCCESS) {
+   PMD_DRV_LOG(ERR, "Failed to delete MAC filter\n");
+   return -1;
+   }
+
+   /* Clear device address as it has been removed */
+   if (is_same_ether_addr(&(pf->dev_addr), new_mac))
+   memset(&pf->dev_addr, 0, sizeof(struct ether_addr));
+   }
+
+   return 0;
+}
+
+static int
+i40e_mac_filter_handle(struct i40e_pf *pf, enum rte_filter_op filter_op,
+   void *arg)
+{
+   struct rte_eth_mac_filter *filter;
+   struct i40e_hw *hw = I40E_PF_TO_HW(pf);
+   int ret = I40E_NOT_SUPPORTED;
+
+   filter = (struct rte_eth_mac_filter *)(arg);
+
+   switch (filter_op) {
+   case RTE_ETH_FILTER_OP_NONE:
+   ret = I40E_SUCCESS;
+   break;
+   case RTE_ETH_FILTER_OP_ADD:
+   i40e_pf_disable_irq0(hw);
+   if (!filter->pf_vf_flag)
+   ret = i40e_vf_mac_filter_set(pf, filter, 1);
+   i40e_pf_enable_irq0(hw);
+   break;
+   case RTE_ETH_FILTER_OP_DELETE:
+   i40e_pf_disable_irq0(hw);
+   if (!filter->pf_vf_flag)
+   ret = i40e_vf_mac_filter_set(pf, filter, 0);
+   i40e_pf_enable_irq0(hw);
+   break;
+   default:
+   PMD_DRV_LOG(ERR, "unknown operation %u\n", filter_op);
+   ret = I40E_ERR_PARAM;
+   break;
+   }
+
+   return ret;
+}
+
 static int
 i40e_dev_rss_reta_update(struct rte_eth_dev *dev,
 struct rte_eth_rss_reta *reta_conf)
@@ -4224,16 +4336,15 @@ static int
 i40e_dev_filter_ctrl(struct rte_eth_dev *dev, enum rte_filter_type filter_type,
 enum rte_filter_op filter_op, void *arg) {

+   struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
int ret = 0;

-   if (dev == NULL)
-   return -EINVAL;
-
if (arg == 

[dpdk-dev] [PATCH 0/5]support filter of unicast and multicast MAC address for VF on Fortville

2014-09-23 Thread Jijiang Liu
The patch set makes the MACVLAN filter configurable and supports perfect-match 
and hash-match filtering of unicast and multicast MAC addresses for VFs on 
Fortville.

It mainly includes:
 - Use new filter mechanism discussed at 
http://dpdk.org/ml/archives/dev/2014-September/005179.html. 
 - Enhance MACVLAN filter to be configurable. Now the following options are 
configurable:  
   1. Perfect match of MAC address 
   2. Perfect match of MAC address and VLAN ID 
   3. Hash match of MAC address 
   4. Hash match of MAC address and perfect match of VLAN ID
   5. To Queue: use MAC and VLAN to point to a queue
 - Support perfect and hash match of unicast and multicast MAC address for VF 
for i40e 


jijiangl (5):
  Use new filter framework
  Add new definitions for MACVLAN filter enhancement in rte_eth_ctrl.h file
  Change parameters of MAC/VLAN filter to be configurable
  Add VF MACVLAN filter handle for i40e
  Test VF MACVLAN filter for i40e

 app/test-pmd/cmdline.c|  115 +-
 lib/librte_ether/Makefile |1 +
 lib/librte_ether/rte_eth_ctrl.h   |  104 
 lib/librte_ether/rte_ethdev.c |   33 
 lib/librte_ether/rte_ethdev.h |   48 ++-
 lib/librte_pmd_i40e/i40e_ethdev.c |  321 -
 lib/librte_pmd_i40e/i40e_ethdev.h |   18 ++-
 lib/librte_pmd_i40e/i40e_pf.c |7 +-
 8 files changed, 601 insertions(+), 46 deletions(-)
 create mode 100644 lib/librte_ether/rte_eth_ctrl.h

-- 
1.7.7.6



[dpdk-dev] [PATCH 2/5]librte_ether:extend data structures of MACVLAN filter

2014-09-23 Thread Jijiang Liu
Add new data definitions for MACVLAN filter enhancement in rte_eth_ctrl.h file.

Signed-off-by: Jijiang Liu 
Acked-by: Helin Zhang 
Acked-by: Jingjing Wu 
Acked-by: Changchun Ouyang 
---
 lib/librte_ether/rte_eth_ctrl.h |   25 +
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index 66745a6..0910376 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -53,6 +53,7 @@ enum rte_filter_type {
RTE_ETH_FILTER_NONE = 0,
RTE_ETH_FILTER_RSS,
RTE_ETH_FILTER_FDIR,
+   RTE_ETH_FILTER_MACVLAN,
RTE_ETH_FILTER_MAX,
 };

@@ -72,6 +73,30 @@ enum rte_filter_op {
RTE_ETH_FILTER_OP_MAX,
 };

+/* *** MACVLAN FILTER *** */
+
+/* MAC/VLAN filter type */
+#define RTE_MAC_PERFECT_MATCH  0x0001
+#define RTE_MACVLAN_PERFECT_MATCH  0x0002
+#define RTE_MAC_HASH_MATCH 0x0004
+#define RTE_MACVLAN_HASH_MATCH 0x0008
+#define RTE_MACVLAN_TO_QUEUE   0x0010
+
+/* MACVLAN filter type mask */
+#define RTE_MACVLAN_FILTER_MASK0x000F
+
+
+/**
+ * MAC filter structure
+ */
+struct rte_eth_mac_filter {
+   uint8_t  pf_vf_flag;  /**< 0 for PF;1 for VF */
+   uint16_t id;  /**< PF ID or VF ID */
+   uint16_t filter_type; /**< MAC/VLAN filter type */
+   uint16_t queue_id;/**< to queue ID */
+   struct ether_addr mac_addr;
+};
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.7.7.6
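
One detail of the definitions above: RTE_MACVLAN_FILTER_MASK (0x000F) covers the four match types but not RTE_MACVLAN_TO_QUEUE (0x0010), so the to-queue flag has to be tested separately from the match type. A small sketch of how a consumer might split and validate filter_type follows. The macro values are copied from the patch; the sketch_* helpers are hypothetical, and the rule "exactly one match bit must be set" is an assumption made for illustration, since the patch does not state it.

```c
#include <stdint.h>

/* Values as defined in the patch above. */
#define RTE_MAC_PERFECT_MATCH      0x0001
#define RTE_MACVLAN_PERFECT_MATCH  0x0002
#define RTE_MAC_HASH_MATCH         0x0004
#define RTE_MACVLAN_HASH_MATCH     0x0008
#define RTE_MACVLAN_TO_QUEUE       0x0010
#define RTE_MACVLAN_FILTER_MASK    0x000F

/* Match-type bits only (the to-queue flag is masked off). */
uint16_t
sketch_match_type(uint16_t filter_type)
{
	return (uint16_t)(filter_type & RTE_MACVLAN_FILTER_MASK);
}

/* Non-zero if the filter should also steer to filter->queue_id. */
int
sketch_to_queue(uint16_t filter_type)
{
	return (filter_type & RTE_MACVLAN_TO_QUEUE) != 0;
}

/*
 * Assumed validity rule: no unknown bits, and exactly one of the four
 * match types selected. Returns 1 if valid, 0 otherwise.
 */
int
sketch_filter_type_valid(uint16_t filter_type)
{
	uint16_t m = sketch_match_type(filter_type);

	if (filter_type & ~(RTE_MACVLAN_FILTER_MASK | RTE_MACVLAN_TO_QUEUE))
		return 0;			/* bits outside the defined set */
	return m != 0 && (m & (m - 1)) == 0;	/* power of two: one bit set */
}
```

For example, RTE_MACVLAN_PERFECT_MATCH | RTE_MACVLAN_TO_QUEUE passes this check, while RTE_MACVLAN_TO_QUEUE on its own does not, because no match type is selected.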



[dpdk-dev] LRU using DPDK 1.7

2014-09-23 Thread Saha, Avik (AWS)
So with DPDK 1.7 there are 2 separate implementations: one is rte_hash, which 
does not support LRU (at least to my understanding; I could be wrong here), 
and the other is the librte_table library, which has support for LRU in a 
hash table. I'm a little confused as to which one you are referring to, Matthew.

-----Original Message-----
From: Matthew Hall [mailto:mh...@mhcomputing.net] 
Sent: Monday, September 22, 2014 6:34 PM
To: Saha, Avik (AWS)
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] LRU using DPDK 1.7

On Tue, Sep 23, 2014 at 01:08:21AM +, Saha, Avik (AWS) wrote:
> I was wondering if there is way to use the rte_table_hash_lru without 
> building a pipeline - Basically using the same hash table like 
> functionality of add, delete and lookup without setting up a pipeline 
> and connect it to ports etc.

I've been finding that rte_hash is designed only for some very specialized 
purposes. It doesn't work well if you use unexpected sizes of keys or want 
behavior that isn't precisely doing what the designers of the hash used it 
for... it's not very general-purpose.

I did try to point out one example of the issue but I didn't get much response 
yet to my questions about its limitations and whether a more general-purpose 
table was available, or at least some discussion what rte_hash is for and what 
it's not for.

Matthew.


[dpdk-dev] [PATCH 06/10] Alternate implementation of librte_power for VM Power Management(Guest).

2014-09-23 Thread Carew, Alan
Hi Neil,


> -----Original Message-----
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Monday, September 22, 2014 8:18 PM
> To: Carew, Alan
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 06/10] Alternate implementation of
> librte_power for VM Power Management(Guest).
> 
> On Mon, Sep 22, 2014 at 07:34:35PM +0100, Alan Carew wrote:
> > Re-using the host based librte_power API the alternate implementation uses
> > the guest channel API to forward request for frequency changes to the host
> > monitor.
> > A subset of the librte_power API is supported:
> >  rte_power_init(unsigned lcore_id)
> >  rte_power_exit(unsigned lcore_id)
> >  rte_power_freq_up(unsigned lcore_id)
> >  rte_power_freq_down(unsigned lcore_id)
> >  rte_power_freq_min(unsigned lcore_id)
> >  rte_power_freq_max(unsigned lcore_id)
> >
> > The other unsupported APIs from librte_power return -ENOTSUP.
> >
> > Signed-off-by: Alan Carew 
> > ---
> >  lib/librte_power_vm/Makefile|  49 ++
> >  lib/librte_power_vm/rte_power.c | 146
> 
> >  2 files changed, 195 insertions(+)
> >  create mode 100644 lib/librte_power_vm/Makefile
> >  create mode 100644 lib/librte_power_vm/rte_power.c
> >
> NAK.
> This is a bad design choice.  Creating an alternate library with all the same
> symbols in place prevents an application from compiling in support for both
> host and guest power management in parallel (i.e. if an app wants to be able
> to do power management in either environment, and only gets built once, it
> won't work).
> 
> In fact, linking a statically built library with both
> CONFIG_RTE_LIBRTE_POWER=y and CONFIG_RTE_LIBRTE_POWER_VM=y yields the
> following link-time build break:
> 
> LD test
> /home/nhorman/git/dpdk/build/lib/librte_power.a(guest_channel.o): In function `guest_channel_host_connect':
> guest_channel.c:(.text+0x0): multiple definition of `guest_channel_host_connect'
> /home/nhorman/git/dpdk/build/lib/librte_power.a(guest_channel.o):guest_channel.c:(.text+0x0): first defined here
> /home/nhorman/git/dpdk/build/lib/librte_power.a(guest_channel.o): In function `guest_channel_send_msg':
> guest_channel.c:(.text+0x370): multiple definition of `guest_channel_send_msg'
> 
> Ad nauseam.
> 
> What you should do is merge this functionality in with the existing librte
> power library, and make the choice of implementation a run time decision, so
> there's only a single public facing API symbol set, and both implementations
> can coexist, getting chosen at run time (via initialization config option,
> environment detection, etc).  Konstantin and I had a similar discussion
> regarding the ACL library and the use of the match function.  I think we came
> up with some reasonably performant solutions.
> 
> Neil

Makes sense, I'll take a look at runtime configuration options and post a V2.

Thanks,
Alan 
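
For readers following the thread, the run-time selection Neil describes (one public symbol set, two back ends chosen at initialization) is commonly done with a table of function pointers. The sketch below is only an illustration of that pattern: every sketch_* name is hypothetical, the host and guest functions are stubs that return 1 and 2 so the selected back end is visible, and nothing here is taken from librte_power or from the eventual V2.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical selector for the execution environment. */
enum sketch_power_env {
	SKETCH_ENV_HOST,
	SKETCH_ENV_GUEST,
};

/* One operations table per implementation. */
struct sketch_power_ops {
	int (*freq_up)(unsigned lcore_id);
	int (*freq_down)(unsigned lcore_id);
};

/* Stubs standing in for the host implementation. */
static int host_freq_up(unsigned lcore_id)   { (void)lcore_id; return 1; }
static int host_freq_down(unsigned lcore_id) { (void)lcore_id; return 1; }

/* Stubs standing in for the guest (channel-forwarding) implementation. */
static int guest_freq_up(unsigned lcore_id)   { (void)lcore_id; return 2; }
static int guest_freq_down(unsigned lcore_id) { (void)lcore_id; return 2; }

static const struct sketch_power_ops host_ops = {
	host_freq_up, host_freq_down
};
static const struct sketch_power_ops guest_ops = {
	guest_freq_up, guest_freq_down
};

/* Back end selected at run time; NULL until sketch_power_set_env(). */
static const struct sketch_power_ops *active_ops;

/* Choose the back end, e.g. from a config option or environment probe. */
int
sketch_power_set_env(enum sketch_power_env env)
{
	switch (env) {
	case SKETCH_ENV_HOST:
		active_ops = &host_ops;
		return 0;
	case SKETCH_ENV_GUEST:
		active_ops = &guest_ops;
		return 0;
	}
	return -EINVAL;
}

/* Single public symbols: forward to whichever back end is active. */
int
sketch_power_freq_up(unsigned lcore_id)
{
	if (active_ops == NULL)
		return -ENOTSUP;
	return active_ops->freq_up(lcore_id);
}

int
sketch_power_freq_down(unsigned lcore_id)
{
	if (active_ops == NULL)
		return -ENOTSUP;
	return active_ops->freq_down(lcore_id);
}
```

Because both back ends are static functions behind one exported symbol set, they can be linked into the same binary without the duplicate-definition errors shown in the quoted build log.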


[dpdk-dev] DPDK 1.7 crashes on table initialization

2014-09-23 Thread Saha, Avik (AWS)
Hey guys
   My DPDK application is crashing on table creation when I specify an 
.action_data_size greater than 0. I could not find the constraints on this 
field in the documentation (a multiple of 4 or something?). Could someone 
please give some guidance on this issue?

Thanks
Avik


[dpdk-dev] KNI and memzones

2014-09-23 Thread Marc Sune
Hi all,

We are having some problems with KNI. In short, we have a DPDK 
application that creates and destroys KNI interfaces during its 
lifecycle and connects them to Docker containers. Interfaces may 
eventually even be given the same name (see below).

We were wondering why, even when calling rte_kni_release(), the hugepage 
memory was rapidly being exhausted, and we also realised that even after 
destruction you cannot reuse the same name for the interface.

After close inspection of the rte_kni lib, we think the core issue is 
mostly a design issue. rte_kni_alloc() ends up calling 
kni_memzone_reserve(), which in turn calls rte_memzone_reserve(); that 
reservation cannot be undone by rte_kni_release() (by design of 
memzones). The exhaustion is rapid due to the number of FIFOs created (6).

If this is right, we would propose, and try to provide, a patch as 
follows:

* Create a new rte_kni_init(unsigned int max_knis);

This would preallocate all the FIFO rings (TX, RX, ALLOC, FREE, Request 
and Response) for max_knis interfaces by calling kni_memzone_reserve(), 
and store them in a kni_fifo_pool. It should be called only once by DPDK 
applications, at bootstrapping time.

* rte_kni_alloc() would just take one slot from the kni_fifo_pool (one 
slot meaning a set of 6 FIFOs).
* rte_kni_release() would return the slot to the pool.

This should solve both issues. We would base the patch on 1.7.2.
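
A minimal, self-contained sketch of the slot-pool idea proposed above, assuming a fixed upper bound on the number of interfaces. All sketch_* names and the SKETCH_MAX_KNIS cap are hypothetical; the fifo pointers stand in for the memzone-backed rings that a real implementation would reserve once in the init function.

```c
#include <stddef.h>

#define SKETCH_FIFOS_PER_KNI 6	/* TX, RX, ALLOC, FREE, request, response */
#define SKETCH_MAX_KNIS      8	/* hypothetical compile-time cap */

/* One slot = the set of FIFOs needed by a single KNI interface. */
struct sketch_kni_slot {
	void *fifo[SKETCH_FIFOS_PER_KNI];	/* would point at reserved rings */
	int in_use;
};

static struct sketch_kni_slot pool[SKETCH_MAX_KNIS];
static unsigned pool_size;

/*
 * Called once at start-up. A real version would reserve the memzones
 * for every slot here, so that alloc/release never reserve again.
 */
int
sketch_kni_init(unsigned max_knis)
{
	unsigned i;

	if (max_knis == 0 || max_knis > SKETCH_MAX_KNIS)
		return -1;
	for (i = 0; i < max_knis; i++)
		pool[i].in_use = 0;
	pool_size = max_knis;
	return 0;
}

/* Alloc path: take a free slot, or NULL if every slot is in use. */
struct sketch_kni_slot *
sketch_kni_slot_get(void)
{
	unsigned i;

	for (i = 0; i < pool_size; i++) {
		if (!pool[i].in_use) {
			pool[i].in_use = 1;
			return &pool[i];
		}
	}
	return NULL;
}

/* Release path: hand the slot back so a later alloc can reuse it. */
void
sketch_kni_slot_put(struct sketch_kni_slot *slot)
{
	if (slot != NULL)
		slot->in_use = 0;
}
```

Since slots are recycled rather than re-reserved, memory use is bounded by max_knis regardless of how many create/destroy cycles the application goes through.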

Thoughts?
marc

p.s. Lately someone involved with DPDK said KNI would be deprecated in 
future DPDK releases; I haven't read or heard this before. Is this 
true? What would be the natural replacement then?


[dpdk-dev] compile error with linuxapp-clang target on Fedora 20 with 3.15.10 kernel

2014-09-23 Thread Bruce Richardson
On Mon, Sep 22, 2014 at 03:12:43PM -0700, Matthew Hall wrote:
> On Mon, Sep 22, 2014 at 04:05:29PM -0400, Neil Horman wrote:
> > On Mon, Sep 22, 2014 at 12:23:36PM -0700, Matthew Hall wrote:
> > > I fixed some of the clang errors a few weeks ago. But some of my patches 
> > > got sent back due to issues seen by others and I didn't have time to fix 
> > > them yet.
> > Can you elaborate on the specific issue here?
> > Neil
> 
> Sure...
> 
> Have a look at this thread. With this, I got it compiling fine with Clang on 
> Ubuntu 14.04 LTS.
> 
> Some of your stuff was funky kernel problems... I probably didn't get that as 
> I was using an earlier kernel release.
> 
> One of the patches was merged as it was trivial but the others involved 
> disabling some warnings on certain examples... but people said they preferred 
> using ifdef's instead to fix them, which I didn't get a chance to do yet.
> 
> Maybe we could try and make all of these clang fixes happen together. I
> really value the better error messages, I can fix bugs much quicker with all
> of those.
> 
> Matthew.

"make examples" on all the examples has failed for some time, but the 
compilation of the main libs used to work. I've pulled down a 3.14 kernel 
for fedora from koji and confirmed that building with 
"RTE_KERNELDIR=/usr/src/kernels/3.14.9-200.fc20.x86_64/" works fine. It's 
something that has changed in 3.15 and beyond that is causing clang flags to 
get passed in to gcc. I've confirmed that 3.16 also doesn't work.

/Bruce


[dpdk-dev] DPDK 1.7 crashes on table initialization

2014-09-23 Thread Neil Horman
On Tue, Sep 23, 2014 at 09:07:55AM +, Saha, Avik (AWS) wrote:
> Hey guys
>My DPDK application is crashing on a table creation when I specify 
> .action_data_size as greater than 0. I could not find the constraints on this 
> field in documentation (multiple of 4 or something). Could someone please 
> give some guidance on this issue.
> 
Post the backtrace when the error occurs?
Neil

> Thanks
> Avik
> 


[dpdk-dev] [PATCH 1/4] compat: Add infrastructure to support symbol versioning

2014-09-23 Thread Sergio Gonzalez Monroy
Hi Neil,

On Mon, Sep 15, 2014 at 03:23:48PM -0400, Neil Horman wrote:
> Add initial pass header files to support symbol versioning.
> 
> Signed-off-by: Neil Horman 
> CC: Thomas Monjalon 
> CC: "Richardson, Bruce" 
> ---
>  lib/Makefile   |  1 +
>  lib/librte_compat/Makefile | 38 +++
>  lib/librte_compat/rte_compat.h | 86 ++
>  mk/rte.lib.mk  |  6 +++
>  4 files changed, 131 insertions(+)
>  create mode 100644 lib/librte_compat/Makefile
>  create mode 100644 lib/librte_compat/rte_compat.h
> 
> diff --git a/lib/Makefile b/lib/Makefile
> index 10c5bb3..a85b55b 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -32,6 +32,7 @@
>  include $(RTE_SDK)/mk/rte.vars.mk
>  
>  DIRS-$(CONFIG_RTE_LIBC) += libc
> +DIRS-y += librte_compat
>  DIRS-$(CONFIG_RTE_LIBRTE_EAL) += librte_eal
>  DIRS-$(CONFIG_RTE_LIBRTE_MALLOC) += librte_malloc
>  DIRS-$(CONFIG_RTE_LIBRTE_RING) += librte_ring
> diff --git a/lib/librte_compat/Makefile b/lib/librte_compat/Makefile
> new file mode 100644
> index 000..a61511a
> --- /dev/null
> +++ b/lib/librte_compat/Makefile
> @@ -0,0 +1,38 @@
> +#   BSD LICENSE
> +#
> +#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> +#   All rights reserved.
> +#
> +#   Redistribution and use in source and binary forms, with or without
> +#   modification, are permitted provided that the following conditions
> +#   are met:
> +#
> +# * Redistributions of source code must retain the above copyright
> +#   notice, this list of conditions and the following disclaimer.
> +# * Redistributions in binary form must reproduce the above copyright
> +#   notice, this list of conditions and the following disclaimer in
> +#   the documentation and/or other materials provided with the
> +#   distribution.
> +# * Neither the name of Intel Corporation nor the names of its
> +#   contributors may be used to endorse or promote products derived
> +#   from this software without specific prior written permission.
> +#
> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +
> +# install includes
> +SYMLINK-y-include := rte_compat.h
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_compat/rte_compat.h b/lib/librte_compat/rte_compat.h
> new file mode 100644
> index 000..6d65a53
> --- /dev/null
> +++ b/lib/librte_compat/rte_compat.h
> @@ -0,0 +1,86 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + *   notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + *   notice, this list of conditions and the following disclaimer in
> + *   the documentation and/or other materials provided with the
> + *   distribution.
> + * * Neither the name of Intel Corporation nor the names of its
> + *   contributors may be used to endorse or promote products derived
> + *   from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVE

[dpdk-dev] [PATCH v2 0/5] Mbuf Structure Rework, part 3

2014-09-23 Thread Bruce Richardson
This is the final planned set of patches to make changes to the mbuf
data structure and associated files. This patch set makes more changes to
help improve performance following the mbuf changes and adds in two new
fields into the mbuf structure.

Further fields beyond the two provided here are planned, but the patches
adding those fields will be included in the patch sets for the changes
that make use of them, since adding them does not affect, or move, any
other mbuf fields.

Changes in V2:
* Updated userdata pointer in mbuf to always be 8 bytes big
* Updated a number of commit messages to have more details about the 
performance benefits of the changes proposed in the patches
* Removed old patch 5 which added the second vlan tag, and replaced it with a 
new, smaller patch which just moves the existing vlan_tci field above the 
16-bit reserved space.

Bruce Richardson (5):
  mbuf: ensure next pointer is set to null on free
  ixgbe: add prefetch to improve slow-path tx perf
  testpmd: Change rxfreet default to 32
  mbuf: add userdata pointer field
  mbuf: switch vlan_tci and reserved2 fields

 app/test-pmd/testpmd.c  |  4 +++-
 .../linuxapp/eal/include/exec-env/rte_kni_common.h  |  6 --
 lib/librte_mbuf/rte_mbuf.h  | 12 ++--
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 13 -
 4 files changed, 25 insertions(+), 10 deletions(-)

-- 
1.9.3



[dpdk-dev] [PATCH v2 4/5] mbuf: add userdata pointer field

2014-09-23 Thread Bruce Richardson
While some applications may store metadata about packets in the packet
mbuf headroom, this is not a workable solution for packet metadata which
is either:
* larger than the headroom (or headroom is needed for adding pkt headers)
* needs to be shared or copied among packets

To support these use cases in applications, we reserve a general
"userdata" pointer field inside the second cache-line of the mbuf. This
is better than having the application store the pointer to the external
metadata in the packet headroom, as it saves an additional cache-line
from being used.

Apart from storing metadata, this field also provides a general 8-byte
scratch space inside the mbuf for any other application uses that are
applicable.
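
As an illustration of the use case described above, here is a minimal
sketch of sharing one metadata object among several packets through the
userdata slot. The demo_mbuf and flow_meta types and the demo_* helpers
are hypothetical stand-ins, not part of the patch; only the union of a
pointer and a uint64_t mirrors the change itself. The anonymous union
needs C11 or the GNU extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for the second cache line of the mbuf. */
struct demo_mbuf {
	union {
		void *userdata;   /* pointer to external metadata */
		uint64_t udata64; /* keeps the slot 8 bytes wide on 32-bit */
	};
	void *pool;
	struct demo_mbuf *next;
};

/* Hypothetical per-flow metadata shared among several packets. */
struct flow_meta {
	uint32_t flow_id;
	uint32_t refcnt;
};

/* Attach shared metadata to a packet and take a reference on it. */
static inline void
demo_attach_meta(struct demo_mbuf *m, struct flow_meta *meta)
{
	assert(m != NULL && meta != NULL);
	meta->refcnt++;
	m->userdata = meta;
}

/* Return the metadata previously attached to the packet, or NULL. */
static inline struct flow_meta *
demo_get_meta(const struct demo_mbuf *m)
{
	return (struct flow_meta *)m->userdata;
}
```

Two packets of the same flow can then point at a single flow_meta
instead of each carrying a copy in its headroom.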

Changes in V2:
* made the userdata field always have 8-bytes available, even on 32-bit

Signed-off-by: Bruce Richardson 
---
 lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h | 6 --
 lib/librte_mbuf/rte_mbuf.h| 6 ++
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h 
b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
index 25ed672..e548161 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
@@ -116,8 +116,10 @@ struct rte_kni_mbuf {
char pad2[2];
uint16_t data_len;  /**< Amount of data in segment buffer. */
uint32_t pkt_len;   /**< Total pkt len: sum of all segment 
data_len. */
-   char pad3[8];
-   void *pool __attribute__((__aligned__(64)));
+
+   /* fields on second cache line */
+   char pad3[8] __attribute__((__aligned__(64)));
+   void *pool;
void *next;
 };

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 8e27d2e..9e70d3b 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -172,6 +172,12 @@ struct rte_mbuf {

/* second cache line - fields only used in slow path or on TX */
MARKER cacheline1 __rte_cache_aligned;
+
+   union {
+   void *userdata;   /**< Can be used for external metadata */
+   uint64_t udata64; /**< Allow 8-byte userdata on 32-bit */
+   };
+
struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
struct rte_mbuf *next;/**< Next segment of scattered packet. */

-- 
1.9.3



[dpdk-dev] [PATCH v2 5/5] mbuf: switch vlan_tci and reserved2 fields

2014-09-23 Thread Bruce Richardson
Move the vlan_tci field up by two bytes in the mbuf data structure. This
has two effects:
* Ensures the ixgbe vector driver places the vlan tag in the correct
  place in the mbuf.
* Allows a second vlan tag field, if one is added in the future, to be
  placed after the existing vlan field, rather than before.
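
The effect of the swap in the diff below can be checked with offsetof
on a reduced copy of the affected fields. The two structs here are
hypothetical excerpts that only reproduce the fields visible in the
hunk; they are not the full rte_mbuf definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical excerpt of the layout before the patch. */
struct demo_mbuf_before {
	uint16_t reserved2;
	uint16_t data_len;
	uint32_t pkt_len;
	uint16_t reserved;
	uint16_t vlan_tci;
	uint32_t rss;
};

/* Same excerpt after the patch: vlan_tci and reserved are swapped. */
struct demo_mbuf_after {
	uint16_t reserved2;
	uint16_t data_len;
	uint32_t pkt_len;
	uint16_t vlan_tci;
	uint16_t reserved;
	uint32_t rss;
};

/* Bytes by which vlan_tci moved towards the start of the structure. */
static inline size_t
demo_vlan_tci_shift(void)
{
	return offsetof(struct demo_mbuf_before, vlan_tci) -
	       offsetof(struct demo_mbuf_after, vlan_tci);
}
```

The overall size of the excerpt is unchanged, so no later field moves.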

Signed-off-by: Bruce Richardson 
---
 lib/librte_mbuf/rte_mbuf.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 9e70d3b..68304cc 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -159,8 +159,8 @@ struct rte_mbuf {
uint16_t reserved2;   /**< Unused field. Required for padding */
uint16_t data_len;/**< Amount of data in segment buffer. */
uint32_t pkt_len; /**< Total pkt len: sum of all segments. */
-   uint16_t reserved;
uint16_t vlan_tci;/**< VLAN Tag Control Identifier (CPU order) 
*/
+   uint16_t reserved;
union {
uint32_t rss; /**< RSS hash result if RSS enabled */
struct {
-- 
1.9.3



[dpdk-dev] [PATCH v2 3/5] testpmd: Change rxfreet default to 32

2014-09-23 Thread Bruce Richardson
To improve performance by using bulk alloc or vectored RX routines, we
need to set rx free threshold (rxfreet) value to 32, so make this the
testpmd default.

Thirty-two is the minimum setting needed to enable either the
bulk alloc or vector RX routines inside the ixgbe driver, so it's
best made the default for that reason. Please see
"check_rx_burst_bulk_alloc_preconditions()" in ixgbe_rxtx.c, and
RX function assignment logic in "ixgbe_dev_rx_queue_setup()" in
the same file.
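
A simplified sketch of the threshold condition described above. It
models only the "at least 32" requirement stated in this message; the
real precondition function in ixgbe_rxtx.c performs further checks that
are not reproduced here, and the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimum rx free threshold, as stated in the commit message. */
#define DEMO_RX_MIN_FREE_THRESH 32

/* Return true if the threshold alone would allow bulk alloc / vector RX. */
static inline bool
demo_bulk_alloc_allowed(uint16_t rx_free_thresh)
{
	return rx_free_thresh >= DEMO_RX_MIN_FREE_THRESH;
}
```

With the old testpmd default of 0 this returns false; with the new
default of 32 it returns true.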

The difference in IO performance for testpmd when called without any
optional parameters, and using 10G NICs using the ixgbe driver, can be
significant - approx 25% or more.

Updates in V2:
* Updated commit message with additional details

Signed-off-by: Bruce Richardson 
---
 app/test-pmd/testpmd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 9f6cdc4..f76406f 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -225,7 +225,9 @@ struct rte_eth_thresh tx_thresh = {
 /*
  * Configurable value of RX free threshold.
  */
-uint16_t rx_free_thresh = 0; /* Immediately free RX descriptors by default. */
+uint16_t rx_free_thresh = 32; /* Refill RX descriptors once every 32 packets,
+   This setting is needed for ixgbe to enable bulk alloc or vector
+   receive functionality. */

 /*
  * Configurable value of RX drop enable.
-- 
1.9.3



[dpdk-dev] [PATCH v2 1/5] mbuf: ensure next pointer is set to null on free

2014-09-23 Thread Bruce Richardson
The receive functions for packets do not modify the next pointer so
the next pointer should always be cleared on mbuf free, just in case.
The slow-path TX needs to clear it, and the standard mbuf free function
also needs to clear it. Fast path TX does not handle chained mbufs so
is unaffected.
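
The idea can be sketched on a reduced mbuf as follows. demo_mbuf, its
in_pool flag and the demo_* functions are hypothetical; the flag merely
stands in for returning the buffer to its mempool.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down mbuf: only the chaining pointer matters here. */
struct demo_mbuf {
	struct demo_mbuf *next;
	int in_pool; /* stand-in for "returned to the mempool" */
};

/* Free a single segment, clearing the next pointer first. */
static inline void
demo_free_seg(struct demo_mbuf *m)
{
	m->next = NULL; /* a later RX will not reset this for us */
	m->in_pool = 1;
}

/* Free a whole chain of segments. */
static inline void
demo_free_chain(struct demo_mbuf *m)
{
	while (m != NULL) {
		struct demo_mbuf *next = m->next; /* save before it is cleared */

		demo_free_seg(m);
		m = next;
	}
}
```

After demo_free_chain() every segment sits in the pool with next set to
NULL, so a receive path that never writes next still hands out
single-segment mbufs.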

Changes in V2:
* None

Signed-off-by: Bruce Richardson 
---
 lib/librte_mbuf/rte_mbuf.h| 4 +++-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 1c6e115..8e27d2e 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -682,8 +682,10 @@ __rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
 static inline void __attribute__((always_inline))
 rte_pktmbuf_free_seg(struct rte_mbuf *m)
 {
-   if (likely(NULL != (m = __rte_pktmbuf_prefree_seg(m))))
+   if (likely(NULL != (m = __rte_pktmbuf_prefree_seg(m)))) {
+   m->next = NULL;
__rte_mbuf_raw_free(m);
+   }
 }

 /**
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index a80cade..6f702b3 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -145,6 +145,7 @@ ixgbe_tx_free_bufs(struct igb_tx_queue *txq)
/* free buffers one at a time */
if ((txq->txq_flags & (uint32_t)ETH_TXQ_FLAGS_NOREFCOUNT) != 0) {
for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
+   txep->mbuf->next = NULL;
rte_mempool_put(txep->mbuf->pool, txep->mbuf);
txep->mbuf = NULL;
}
-- 
1.9.3



[dpdk-dev] [PATCH v2 2/5] ixgbe: add prefetch to improve slow-path tx perf

2014-09-23 Thread Bruce Richardson
Make a small improvement to slow path TX performance by adding in a
prefetch for the second mbuf cache line.
Also perform the assignment of the l2/l3 length values only when needed.

What I've done with the prefetches is two-fold:
1) changed it from prefetching the mbuf (first cache line) to prefetching
the mbuf pool pointer (second cache line) so that when we go to access
the pool pointer to free transmitted mbufs we don't get a cache miss. When
clearing the ring and freeing mbufs, the pool pointer is the only mbuf
field used, so we don't need that first cache line.
2) changed the code to prefetch earlier - in effect to prefetch one mbuf
ahead. The original code prefetched the mbuf to be freed as soon as it
started processing the mbuf to replace it. Instead now, every time we
calculate what the next mbuf position is going to be we prefetch the mbuf
in that position (i.e. the mbuf pool pointer we are going to free the mbuf
to), even while we are still updating the previous mbuf slot on the ring.
This gives the prefetch much more time to resolve and get the data we need
in the cache before we need it.
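
A minimal sketch of the "prefetch one entry ahead" pattern described in
the two points above. The ring and mbuf types are hypothetical
reductions, __builtin_prefetch (a GCC/Clang builtin) is used in place
of rte_prefetch0, and, unlike the patch, the sketch checks for NULL
before forming the address to prefetch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mbuf: pool sits past the first 64 bytes. */
struct demo_mbuf {
	char first_line[64]; /* stand-in for the first cache line */
	void *pool;          /* lives on the second cache line */
};

/* Hypothetical software ring entry. */
struct demo_tx_entry {
	struct demo_mbuf *mbuf;
	unsigned next_id;
};

/* Walk count ring entries from start, freeing old mbufs; return how many. */
static inline unsigned
demo_tx_clean(struct demo_tx_entry *ring, unsigned start, unsigned count)
{
	struct demo_tx_entry *txe = &ring[start];
	unsigned freed = 0;
	unsigned i;

	/* Warm up the very first entry before entering the loop. */
	if (txe->mbuf != NULL)
		__builtin_prefetch(&txe->mbuf->pool);

	for (i = 0; i < count; i++) {
		struct demo_tx_entry *txn = &ring[txe->next_id];

		/* Prefetch the pool pointer of the entry handled next. */
		if (txn->mbuf != NULL)
			__builtin_prefetch(&txn->mbuf->pool);

		if (txe->mbuf != NULL) {
			/* A real driver would return the mbuf to its pool here. */
			txe->mbuf = NULL;
			freed++;
		}
		txe = txn;
	}
	return freed;
}
```

The prefetch for entry N+1 is issued while entry N is still being
processed, which is what gives the load time to complete.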

In terms of performance difference, a quick sanity test using testpmd
on a Xeon (Sandy Bridge uarch) platform showed performance increases
between approx 8-18%, depending on the particular RX path used in
conjunction with this TX path code.

Changes in V2:
* Expanded commit message with extra details of change.

Signed-off-by: Bruce Richardson 
---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 6f702b3..c0bb49f 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -565,25 +565,26 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
ixgbe_xmit_cleanup(txq);
}

+   rte_prefetch0(&txe->mbuf->pool);
+
/* TX loop */
for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
new_ctx = 0;
tx_pkt = *tx_pkts++;
pkt_len = tx_pkt->pkt_len;

-   RTE_MBUF_PREFETCH_TO_FREE(txe->mbuf);
-
/*
 * Determine how many (if any) context descriptors
 * are needed for offload functionality.
 */
ol_flags = tx_pkt->ol_flags;
-   vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
-   vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;

/* If hardware offload required */
tx_ol_req = ol_flags & PKT_TX_OFFLOAD_MASK;
if (tx_ol_req) {
+   vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
+   vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
+
/* If new context need be built or reuse the exist ctx. 
*/
ctx = what_advctx_update(txq, tx_ol_req,
vlan_macip_lens.data);
@@ -720,7 +721,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
&txr[tx_id];

txn = &sw_ring[txe->next_id];
-   RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
+   rte_prefetch0(&txn->mbuf->pool);

if (txe->mbuf != NULL) {
rte_pktmbuf_free_seg(txe->mbuf);
@@ -749,6 +750,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
do {
txd = &txr[tx_id];
txn = &sw_ring[txe->next_id];
+   rte_prefetch0(&txn->mbuf->pool);

if (txe->mbuf != NULL)
rte_pktmbuf_free_seg(txe->mbuf);
-- 
1.9.3



[dpdk-dev] KNI and memzones

2014-09-23 Thread Jay Rolette
*> p.s. Lately someone involved with DPDK said KNI would be deprecated in
future DPDK releases; I haven't read or heard this before, is this
true? What would be the natural replacement then?*

KNI is a non-trivial part of the product I'm in the process of building.
I'd appreciate someone "in the know" addressing this one please. Are there
specific roadmap plans relative to KNI that we need to be aware of?

Regards,
Jay

On Tue, Sep 23, 2014 at 4:27 AM, Marc Sune  wrote:

> Hi all,
>
> So we are having some problems with KNI. In short, we have a DPDK
> application that creates and destroys KNI interfaces during its
> lifecycle, connecting them to Docker containers. Interfaces may
> eventually even be given the same name (see below).
>
> We were wondering why, even when calling rte_kni_release(), the hugepage
> memory was rapidly being exhausted, and we also realised that even after
> destruction you cannot reuse the same name for the interface.
>
> After close inspection of the rte_kni lib we think the core issue is
> mostly a design issue. rte_kni_alloc() ends up calling
> kni_memzone_reserve(), which in turn calls rte_memzone_reserve(); that
> memory cannot be unreserved by rte_kni_release() (by design of memzones).
> The exhaustion is rapid due to the number of FIFOs created (6).
>
> If this is right, we would propose, and try to provide, a patch as
> follows:
>
> * Create a new rte_kni_init(unsigned int max_knis);
>
> This would preallocate all the FIFO rings(TX, RX, ALLOC, FREE, Request
> and  Response)*max_knis by calling kni_memzone_reserve(), and store them in
> a kni_fifo_pool. This should only be called once by DPDK applications at
> bootstrapping time.
>
> * rte_kni_alloc would just use one slot of the kni_fifo_pool (one slot
> meaning a set of 6 FIFOs)
> * rte_kni_release would return the slot to the pool.
>
> This should solve both issues. We would base the patch on 1.7.2.
>
> Thoughts?
> marc
>
> p.s. Lately someone involved with DPDK said KNI would be deprecated in
> future DPDK releases; I haven't read or heard this before, is this
> true? What would be the natural replacement then?
>
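
For readers following the pooling proposal quoted above, here is a
minimal sketch of the slot pool it describes. Every name (demo_kni_pool,
demo_kni_slot_get and so on) is hypothetical, DEMO_MAX_KNIS is an
arbitrary compile-time cap chosen for the sketch, and the memzone
reservation itself is only marked by a comment; nothing here is taken
from the rte_kni sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_FIFOS_PER_KNI 6 /* TX, RX, ALLOC, FREE, Request, Response */
#define DEMO_MAX_KNIS      8 /* arbitrary cap for this sketch */

/* One slot: the set of FIFOs backing a single KNI interface. */
struct demo_kni_slot {
	void *fifo[DEMO_FIFOS_PER_KNI];
	bool in_use;
};

struct demo_kni_pool {
	struct demo_kni_slot slots[DEMO_MAX_KNIS];
	unsigned max_knis;
};

/* Called once at start-up; the real code would reserve memzones here. */
static inline int
demo_kni_pool_init(struct demo_kni_pool *p, unsigned max_knis)
{
	unsigned i;

	if (max_knis > DEMO_MAX_KNIS)
		return -1;
	p->max_knis = max_knis;
	for (i = 0; i < max_knis; i++)
		p->slots[i].in_use = false;
	return 0;
}

/* Take a free slot, or return NULL when all slots are in use. */
static inline struct demo_kni_slot *
demo_kni_slot_get(struct demo_kni_pool *p)
{
	unsigned i;

	for (i = 0; i < p->max_knis; i++) {
		if (!p->slots[i].in_use) {
			p->slots[i].in_use = true;
			return &p->slots[i];
		}
	}
	return NULL;
}

/* Hand a slot back; its FIFOs stay reserved and can be reused. */
static inline void
demo_kni_slot_put(struct demo_kni_slot *s)
{
	s->in_use = false;
}
```

Because slots are recycled rather than re-reserved, repeated
create/destroy cycles would not consume additional memzones, and a
released interface name could be reused with the same backing FIFOs.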


[dpdk-dev] [PATCH 0/3] distributor_app: new sample application for distributor library

2014-09-23 Thread Bruce Richardson
On Tue, Sep 16, 2014 at 01:13:24PM +0100, reshmapa wrote:
> From: Reshma Pattan 
> 
> A new sample app that shows the usage of the distributor library. This
> app works as follows:
> 
> * An RX thread runs which pulls packets from each ethernet port in turn
> and passes those packets to workers using a distributor component.
> 
> * The workers take the packets in turn, and determine the output port
> for those packets using basic l2forwarding doing an xor on the source
> port id.
> 
> * The RX thread takes the returned packets from the workers and enqueues
> those packets into an rte_ring structure.
> 
> * A TX thread pulls the packets off the rte_ring structure and then
> sends each packet out the output port specified previously by the worker.
> 
> Bruce Richardson (1):
>   distributor_app: new sample app
> 
> Reshma Pattan (2):
>   distributor_app: code review comments implementation
>   distributor_app: removed extra spaces
> 

Since this is just a sample app and the second two patches are just minor 
adjustments to it, I suggest that this be resubmitted as a single patch 
instead of a set. That should also fix the whitespace warnings one gets when 
using "git am" to apply the set.

Please also check the indentation used in the file. I see in a number of 
places that spaces are used instead of tabs for indentation. Running 
checkpatch.pl on the patch before submission should help catch these issues.

Regards,
/Bruce

>  examples/Makefile |   1 +
>  examples/distributor_app/Makefile |  57 
>  examples/distributor_app/main.c   | 586 
> ++
>  examples/distributor_app/main.h   |  46 +++
>  4 files changed, 690 insertions(+)
>  create mode 100644 examples/distributor_app/Makefile
>  create mode 100644 examples/distributor_app/main.c
>  create mode 100644 examples/distributor_app/main.h
> 
> -- 
> 1.8.3.1
> 


[dpdk-dev] [PATCH 1/6] ether: enhancement for VMDQ support

2014-09-23 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

The change includes several parts:
1. Clear pool bitmap when trying to remove specific MAC.
2. Define RSS, DCB and VMDQ flags to combine rx_mq_mode.
3. Use 'struct' to replace 'union', which expands the rx_adv_conf
   arguments to better support RSS, DCB and VMDQ.
4. Fix a bug in the rte_eth_dev_config_restore function, which restored
   all MAC addresses to the default pool.
5. Define 3 additional arguments for better VMDQ support.
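
To illustrate items 2 and 4, here is a small sketch of how flag-based
mode values and a per-address pool bitmap can be tested. The DEMO_*
names are hypothetical copies; the flag values match the ones in the
diff below, and the bitmap test follows the mac_pool_sel expression
used there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as the ETH_MQ_RX_*_FLAG defines in the patch. */
#define DEMO_MQ_RX_RSS_FLAG  0x1
#define DEMO_MQ_RX_DCB_FLAG  0x2
#define DEMO_MQ_RX_VMDQ_FLAG 0x4

/* Reduced copy of the mode enum, built from the flags. */
enum demo_rx_mq_mode {
	DEMO_MQ_RX_NONE = 0,
	DEMO_MQ_RX_RSS = DEMO_MQ_RX_RSS_FLAG,
	DEMO_MQ_RX_DCB = DEMO_MQ_RX_DCB_FLAG,
	DEMO_MQ_RX_VMDQ_ONLY = DEMO_MQ_RX_VMDQ_FLAG,
	DEMO_MQ_RX_VMDQ_RSS = DEMO_MQ_RX_RSS_FLAG | DEMO_MQ_RX_VMDQ_FLAG,
	DEMO_MQ_RX_VMDQ_DCB_RSS = DEMO_MQ_RX_RSS_FLAG | DEMO_MQ_RX_DCB_FLAG |
				  DEMO_MQ_RX_VMDQ_FLAG,
};

/* True if the given feature flag is part of the mode. */
static inline bool
demo_mode_has(enum demo_rx_mq_mode mode, unsigned flag)
{
	return ((unsigned)mode & flag) != 0;
}

/* True if the MAC address whose bitmap is pool_sel belongs to pool. */
static inline bool
demo_mac_in_pool(uint64_t pool_sel, unsigned pool)
{
	return pool < 64 && (pool_sel & (1ULL << pool)) != 0;
}
```

A driver can then ask "is VMDQ part of this mode?" with a single mask
test instead of enumerating every combined mode value.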

Signed-off-by: Chen Jing D(Mark) 
Acked-by: Konstantin Ananyev 
Acked-by: Jingjing Wu 
Acked-by: Jijiang Liu 
Acked-by: Huawei Xie 
---
 lib/librte_ether/rte_ethdev.c |   12 +++-
 lib/librte_ether/rte_ethdev.h |   39 ---
 2 files changed, 35 insertions(+), 16 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index fd1010a..b7ef56e 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -771,7 +771,8 @@ rte_eth_dev_config_restore(uint8_t port_id)
continue;

/* add address to the hardware */
-   if  (*dev->dev_ops->mac_addr_add)
+   if  (*dev->dev_ops->mac_addr_add &&
+   dev->data->mac_pool_sel[i] & (1ULL << pool))
(*dev->dev_ops->mac_addr_add)(dev, &addr, i, pool);
else {
PMD_DEBUG_TRACE("port %d: MAC address array not 
supported\n",
@@ -1249,10 +1250,8 @@ rte_eth_dev_info_get(uint8_t port_id, struct 
rte_eth_dev_info *dev_info)
}
dev = &rte_eth_devices[port_id];

-   /* Default device offload capabilities to zero */
-   dev_info->rx_offload_capa = 0;
-   dev_info->tx_offload_capa = 0;
-   dev_info->if_index = 0;
+   /* Set all fields with zero */
+   memset(dev_info, 0, sizeof(*dev_info));
FUNC_PTR_OR_RET(*dev->dev_ops->dev_infos_get);
(*dev->dev_ops->dev_infos_get)(dev, dev_info);
dev_info->pci_dev = dev->pci_dev;
@@ -2022,6 +2021,9 @@ rte_eth_dev_mac_addr_remove(uint8_t port_id, struct 
ether_addr *addr)
/* Update address in NIC data structure */
ether_addr_copy(&null_mac_addr, &dev->data->mac_addrs[index]);

+   /* Update pool bitmap in NIC data structure */
+   dev->data->mac_pool_sel[index] = 0;
+
return 0;
 }

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 50df654..8f3b6df 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -251,21 +251,34 @@ struct rte_eth_thresh {
uint8_t wthresh; /**< Ring writeback threshold. */
 };

+#define ETH_MQ_RX_RSS_FLAG  0x1
+#define ETH_MQ_RX_DCB_FLAG  0x2
+#define ETH_MQ_RX_VMDQ_FLAG 0x4
+
 /**
  *  A set of values to identify what method is to be used to route
  *  packets to multiple queues.
  */
 enum rte_eth_rx_mq_mode {
-   ETH_MQ_RX_NONE = 0,  /**< None of DCB,RSS or VMDQ mode */
-
-   ETH_MQ_RX_RSS,   /**< For RX side, only RSS is on */
-   ETH_MQ_RX_DCB,   /**< For RX side,only DCB is on. */
-   ETH_MQ_RX_DCB_RSS,   /**< Both DCB and RSS enable */
-
-   ETH_MQ_RX_VMDQ_ONLY, /**< Only VMDQ, no RSS nor DCB */
-   ETH_MQ_RX_VMDQ_RSS,  /**< RSS mode with VMDQ */
-   ETH_MQ_RX_VMDQ_DCB,  /**< Use VMDQ+DCB to route traffic to queues */
-   ETH_MQ_RX_VMDQ_DCB_RSS, /**< Enable both VMDQ and DCB in VMDq */
+   /**< None of DCB,RSS or VMDQ mode */
+   ETH_MQ_RX_NONE = 0,
+
+   /**< For RX side, only RSS is on */
+   ETH_MQ_RX_RSS = ETH_MQ_RX_RSS_FLAG,
+   /**< For RX side,only DCB is on. */
+   ETH_MQ_RX_DCB = ETH_MQ_RX_DCB_FLAG,
+   /**< Both DCB and RSS enable */
+   ETH_MQ_RX_DCB_RSS = ETH_MQ_RX_RSS_FLAG | ETH_MQ_RX_DCB_FLAG,
+
+   /**< Only VMDQ, no RSS nor DCB */
+   ETH_MQ_RX_VMDQ_ONLY = ETH_MQ_RX_VMDQ_FLAG,
+   /**< RSS mode with VMDQ */
+   ETH_MQ_RX_VMDQ_RSS = ETH_MQ_RX_RSS_FLAG | ETH_MQ_RX_VMDQ_FLAG,
+   /**< Use VMDQ+DCB to route traffic to queues */
+   ETH_MQ_RX_VMDQ_DCB = ETH_MQ_RX_VMDQ_FLAG | ETH_MQ_RX_DCB_FLAG,
+   /**< Enable both VMDQ and DCB in VMDq */
+   ETH_MQ_RX_VMDQ_DCB_RSS = ETH_MQ_RX_RSS_FLAG | ETH_MQ_RX_DCB_FLAG |
+ETH_MQ_RX_VMDQ_FLAG,
 };

 /**
@@ -840,7 +853,7 @@ struct rte_eth_conf {
 Read the datasheet of given ethernet controller
 for details. The possible values of this field
 are defined in implementation of each driver. 
*/
-   union {
+   struct {
struct rte_eth_rss_conf rss_conf; /**< Port RSS configuration */
struct rte_eth_vmdq_dcb_conf vmdq_dcb_conf;
/**< Port vmdq+dcb configuration. */
@@ -906,6 +919,10 @@ struct rte_eth_dev_info {
uint16_t max_vmdq_pools; /**< Maximum number of VMDq pools. */
uint32_t rx_offload_capa; /**< Device RX offload capabilities. */
  

[dpdk-dev] [PATCH 3/6] ixgbe: change for VMDQ arguments expansion

2014-09-23 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

Assign new VMDQ arguments with correct values.

Signed-off-by: Chen Jing D(Mark) 
Acked-by: Konstantin Ananyev 
Acked-by: Jingjing Wu 
Acked-by: Jijiang Liu 
Acked-by: Huawei Xie 
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index f4b590b..d0f9bcb 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -1933,6 +1933,7 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
dev_info->max_vmdq_pools = ETH_16_POOLS;
else
dev_info->max_vmdq_pools = ETH_64_POOLS;
+   dev_info->vmdq_queue_num = dev_info->max_rx_queues;
dev_info->rx_offload_capa =
DEV_RX_OFFLOAD_VLAN_STRIP |
DEV_RX_OFFLOAD_IPV4_CKSUM |
-- 
1.7.7.6



[dpdk-dev] [PATCH 6/6] i40e: Add full VMDQ pools support

2014-09-23 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

1. Rename the i40e_vsi_* functions to i40e_dev_*, since a PF can contain
   more than one VSI once VMDQ is enabled.
2. Change i40e_dev_rx/tx_queue_setup so that it can set up queues that
   belong to VMDQ pools.
3. Add queue mapping, which converts between the queue index used by the
   application and the real NIC queue index.
4. Change i40e_dev_start/stop so that they can switch VMDQ queues.
5. Change i40e_pf_config_rss to calculate the actual number of main VSI
   queues after VMDQ pools are introduced.
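
The queue mapping item can be pictured with the sketch below. It
assumes, purely for illustration, that application queue indexes cover
the main VSI queues first and the VMDQ pools afterwards, each pool with
the same number of queues; the demo_* names and this layout are
assumptions, not taken from the patch.

```c
#include <assert.h>

/* Result of a lookup: which VSI owns the queue, and its index there. */
struct demo_qmap {
	int pool;       /* -1 for the main VSI, otherwise VMDQ pool index */
	unsigned queue; /* queue index inside that VSI */
};

/*
 * Translate an application queue index. Returns 0 on success and -1 if
 * the index lies beyond the configured queues.
 */
static inline int
demo_map_queue(unsigned app_idx, unsigned main_qps, unsigned nb_pools,
	       unsigned qps_per_pool, struct demo_qmap *out)
{
	if (app_idx < main_qps) {
		out->pool = -1;
		out->queue = app_idx;
		return 0;
	}

	app_idx -= main_qps;
	if (qps_per_pool == 0 || app_idx / qps_per_pool >= nb_pools)
		return -1;

	out->pool = (int)(app_idx / qps_per_pool);
	out->queue = app_idx % qps_per_pool;
	return 0;
}
```

With 4 main queues and 2 pools of 4 queues each, application queue 9
would land on queue 1 of pool 1 under this assumed layout.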

Signed-off-by: Chen Jing D(Mark) 
Acked-by: Konstantin Ananyev 
Acked-by: Jingjing Wu 
Acked-by: Jijiang Liu 
Acked-by: Huawei Xie 
---
 lib/librte_pmd_i40e/i40e_ethdev.c |  183 +
 lib/librte_pmd_i40e/i40e_ethdev.h |4 +-
 lib/librte_pmd_i40e/i40e_rxtx.c   |  125 +-
 3 files changed, 231 insertions(+), 81 deletions(-)

diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c 
b/lib/librte_pmd_i40e/i40e_ethdev.c
index 3185654..9009bd4 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -167,7 +167,7 @@ static int i40e_dev_rss_reta_query(struct rte_eth_dev *dev,
 static int i40e_get_cap(struct i40e_hw *hw);
 static int i40e_pf_parameter_init(struct rte_eth_dev *dev);
 static int i40e_pf_setup(struct i40e_pf *pf);
-static int i40e_vsi_init(struct i40e_vsi *vsi);
+static int i40e_dev_rxtx_init(struct i40e_pf *pf);
 static int i40e_vmdq_setup(struct rte_eth_dev *dev);
 static void i40e_stat_update_32(struct i40e_hw *hw, uint32_t reg,
bool offset_loaded, uint64_t *offset, uint64_t *stat);
@@ -770,8 +770,8 @@ i40e_dev_start(struct rte_eth_dev *dev)
 {
struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-   struct i40e_vsi *vsi = pf->main_vsi;
-   int ret;
+   struct i40e_vsi *main_vsi = pf->main_vsi;
+   int ret, i;

if ((dev->data->dev_conf.link_duplex != ETH_LINK_AUTONEG_DUPLEX) &&
(dev->data->dev_conf.link_duplex != ETH_LINK_FULL_DUPLEX)) {
@@ -782,41 +782,53 @@ i40e_dev_start(struct rte_eth_dev *dev)
}

/* Initialize VSI */
-   ret = i40e_vsi_init(vsi);
+   ret = i40e_dev_rxtx_init(pf);
if (ret != I40E_SUCCESS) {
-   PMD_DRV_LOG(ERR, "Failed to init VSI");
+   PMD_DRV_LOG(ERR, "Failed to init rx/tx queues\n");
goto err_up;
}

/* Map queues with MSIX interrupt */
-   i40e_vsi_queues_bind_intr(vsi);
-   i40e_vsi_enable_queues_intr(vsi);
+   i40e_vsi_queues_bind_intr(main_vsi);
+   i40e_vsi_enable_queues_intr(main_vsi);
+
+   /* Map VMDQ VSI queues with MSIX interrupt */
+   for (i = 0; i < pf->nb_cfg_vmdq_vsi; i++) {
+   i40e_vsi_queues_bind_intr(pf->vmdq[i].vsi);
+   i40e_vsi_enable_queues_intr(pf->vmdq[i].vsi);
+   }

/* Enable all queues which have been configured */
-   ret = i40e_vsi_switch_queues(vsi, TRUE);
+   ret = i40e_dev_switch_queues(pf, TRUE);
if (ret != I40E_SUCCESS) {
PMD_DRV_LOG(ERR, "Failed to enable VSI");
goto err_up;
}

/* Enable receiving broadcast packets */
-   if ((vsi->type == I40E_VSI_MAIN) || (vsi->type == I40E_VSI_VMDQ2)) {
-   ret = i40e_aq_set_vsi_broadcast(hw, vsi->seid, true, NULL);
+   ret = i40e_aq_set_vsi_broadcast(hw, main_vsi->seid, true, NULL);
+   if (ret != I40E_SUCCESS)
+   PMD_DRV_LOG(INFO, "fail to set vsi broadcast\n");
+
+   for (i = 0; i < pf->nb_cfg_vmdq_vsi; i++) {
+   ret = i40e_aq_set_vsi_broadcast(hw, pf->vmdq[i].vsi->seid,
+   true, NULL);
if (ret != I40E_SUCCESS)
-   PMD_DRV_LOG(INFO, "fail to set vsi broadcast");
+   PMD_DRV_LOG(INFO, "fail to set vsi broadcast\n");
}

/* Apply link configure */
ret = i40e_apply_link_speed(dev);
if (I40E_SUCCESS != ret) {
-   PMD_DRV_LOG(ERR, "Fail to apply link setting");
+   PMD_DRV_LOG(ERR, "Fail to apply link setting\n");
goto err_up;
}

return I40E_SUCCESS;

 err_up:
-   i40e_vsi_switch_queues(vsi, FALSE);
+   i40e_dev_switch_queues(pf, FALSE);
+   i40e_dev_clear_queues(dev);

return ret;
 }
@@ -825,17 +837,26 @@ static void
 i40e_dev_stop(struct rte_eth_dev *dev)
 {
struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-   struct i40e_vsi *vsi = pf->main_vsi;
+   struct i40e_vsi *main_vsi = pf->main_vsi;
+   int i;

/* Disable all queues */
-   i40e_vsi_switch_queues(vsi, FALSE);
+   i40e_dev_switch_queues(pf, FALSE);
+
+   /* un-map queues with interrupt registers */
+   i40e_vsi_disable_queues_intr(main_vsi);
+   i40e_vs

[dpdk-dev] [PATCH v3 3/5] testpmd: adding parameter to reconfig method to set socket_id when adding new port to portlist

2014-09-23 Thread Declan Doherty

Signed-off-by: Declan Doherty 
---
 app/test-pmd/cmdline.c |2 +-
 app/test-pmd/testpmd.c |3 ++-
 app/test-pmd/testpmd.h |2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 67321f7..ed76eea 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -3614,7 +3614,7 @@ static void cmd_create_bonded_device_parsed(void 
*parsed_result,

/* Update number of ports */
nb_ports = rte_eth_dev_count();
-   reconfig(port_id);
+   reconfig(port_id, res->socket);
rte_eth_promiscuous_enable(port_id);
}

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 9f6cdc4..66e3c7c 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -628,7 +628,7 @@ init_config(void)


 void
-reconfig(portid_t new_port_id)
+reconfig(portid_t new_port_id, unsigned socket_id)
 {
struct rte_port *port;

@@ -647,6 +647,7 @@ reconfig(portid_t new_port_id)
/* set flag to initialize port/queue */
port->need_reconfig = 1;
port->need_reconfig_queues = 1;
+   port->socket_id = socket_id;

init_port_config();
 }
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 142091d..7b78cc5 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -455,7 +455,7 @@ void fwd_config_display(void);
 void rxtx_config_display(void);
 void fwd_config_setup(void);
 void set_def_fwd_config(void);
-void reconfig(portid_t new_port_id);
+void reconfig(portid_t new_port_id, unsigned socket_id);
 int init_fwd_streams(void);

 void port_mtu_set(portid_t port_id, uint16_t mtu);
-- 
1.7.4.1



[dpdk-dev] [PATCH 0/6] i40e VMDQ support

2014-09-23 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

Define extra VMDQ arguments to expand VMDQ configuration. This also
includes changes in the igb and ixgbe PMD drivers. In addition, fix 2
defects in the rte_ether library.

Add full VMDQ support in the i40e PMD driver: rename some functions and
set up the VMDQ VSIs after VMDQ is enabled in the application. Also improve
macaddr add/delete to support setting multiple MAC addresses for single
or multiple pools.

Finally, change i40e rx/tx_queue_setup and dev_start/stop functions to
configure/switch queues belonging to VMDQ pools.

Chen Jing D(Mark) (6):
  ether: enhancement for VMDQ support
  igb: change for VMDQ arguments expansion
  ixgbe: change for VMDQ arguments expansion
  i40e: add VMDQ support
  i40e: macaddr add/del enhancement
  i40e: Add full VMDQ pools support

 config/common_linuxapp  |1 +
 lib/librte_ether/rte_ethdev.c   |   12 +-
 lib/librte_ether/rte_ethdev.h   |   39 ++-
 lib/librte_pmd_e1000/igb_ethdev.c   |3 +
 lib/librte_pmd_i40e/i40e_ethdev.c   |  509 ++-
 lib/librte_pmd_i40e/i40e_ethdev.h   |   21 ++-
 lib/librte_pmd_i40e/i40e_rxtx.c |  125 +++--
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |1 +
 8 files changed, 537 insertions(+), 174 deletions(-)

-- 
1.7.7.6



[dpdk-dev] [PATCH 2/6] igb: change for VMDQ arguments expansion

2014-09-23 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

Assign new VMDQ arguments with correct values.

Signed-off-by: Chen Jing D(Mark) 
Acked-by: Konstantin Ananyev 
Acked-by: Jingjing Wu 
Acked-by: Jijiang Liu 
Acked-by: Huawei Xie 
---
 lib/librte_pmd_e1000/igb_ethdev.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/lib/librte_pmd_e1000/igb_ethdev.c 
b/lib/librte_pmd_e1000/igb_ethdev.c
index c9acdc5..dc0ea6d 100644
--- a/lib/librte_pmd_e1000/igb_ethdev.c
+++ b/lib/librte_pmd_e1000/igb_ethdev.c
@@ -1286,18 +1286,21 @@ eth_igb_infos_get(struct rte_eth_dev *dev,
dev_info->max_rx_queues = 16;
dev_info->max_tx_queues = 16;
dev_info->max_vmdq_pools = ETH_8_POOLS;
+   dev_info->vmdq_queue_num = 16;
break;

case e1000_82580:
dev_info->max_rx_queues = 8;
dev_info->max_tx_queues = 8;
dev_info->max_vmdq_pools = ETH_8_POOLS;
+   dev_info->vmdq_queue_num = 8;
break;

case e1000_i350:
dev_info->max_rx_queues = 8;
dev_info->max_tx_queues = 8;
dev_info->max_vmdq_pools = ETH_8_POOLS;
+   dev_info->vmdq_queue_num = 8;
break;

case e1000_i354:
-- 
1.7.7.6



[dpdk-dev] [PATCH 4/6] i40e: add VMDQ support

2014-09-23 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

The change includes several parts:
1. Get maximum number of VMDQ pools supported in dev_init.
2. Fill VMDQ info in i40e_dev_info_get.
3. Setup VMDQ pools in i40e_dev_configure.
4. i40e_vsi_setup change to support creation of VMDQ VSI.

Signed-off-by: Chen Jing D(Mark) 
Acked-by: Konstantin Ananyev 
Acked-by: Jingjing Wu 
Acked-by: Jijiang Liu 
Acked-by: Huawei Xie 
---
 config/common_linuxapp|1 +
 lib/librte_pmd_i40e/i40e_ethdev.c |  237 -
 lib/librte_pmd_i40e/i40e_ethdev.h |   17 +++-
 3 files changed, 225 insertions(+), 30 deletions(-)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 5bee910..d0bb3f7 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -208,6 +208,7 @@ CONFIG_RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC=y
 CONFIG_RTE_LIBRTE_I40E_ALLOW_UNSUPPORTED_SFP=n
 CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC=n
 CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VF=4
+CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM=4
 # interval up to 8160 us, aligned to 2 (or default value)
 CONFIG_RTE_LIBRTE_I40E_ITR_INTERVAL=-1

diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c b/lib/librte_pmd_i40e/i40e_ethdev.c
index a00d6ca..a267c96 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -168,6 +168,7 @@ static int i40e_get_cap(struct i40e_hw *hw);
 static int i40e_pf_parameter_init(struct rte_eth_dev *dev);
 static int i40e_pf_setup(struct i40e_pf *pf);
 static int i40e_vsi_init(struct i40e_vsi *vsi);
+static int i40e_vmdq_setup(struct rte_eth_dev *dev);
 static void i40e_stat_update_32(struct i40e_hw *hw, uint32_t reg,
bool offset_loaded, uint64_t *offset, uint64_t *stat);
 static void i40e_stat_update_48(struct i40e_hw *hw,
@@ -269,21 +270,11 @@ static struct eth_driver rte_i40e_pmd = {
 };

 static inline int
-i40e_prev_power_of_2(int n)
+i40e_align_floor(int n)
 {
-   int p = n;
-
-   --p;
-   p |= p >> 1;
-   p |= p >> 2;
-   p |= p >> 4;
-   p |= p >> 8;
-   p |= p >> 16;
-   if (p == (n - 1))
-   return n;
-   p >>= 1;
-
-   return ++p;
+   if (n == 0)
+   return 0;
+   return (1 << (sizeof(n) * CHAR_BIT - 1 - __builtin_clz(n)));
 }

 static inline int
@@ -500,7 +491,7 @@ eth_i40e_dev_init(__rte_unused struct eth_driver *eth_drv,
if (!dev->data->mac_addrs) {
PMD_INIT_LOG(ERR, "Failed to allocated memory "
"for storing mac address");
-   goto err_get_mac_addr;
+   goto err_mac_alloc;
}
ether_addr_copy((struct ether_addr *)hw->mac.perm_addr,
&dev->data->mac_addrs[0]);
@@ -521,8 +512,9 @@ eth_i40e_dev_init(__rte_unused struct eth_driver *eth_drv,

return 0;

+err_mac_alloc:
+   i40e_vsi_release(pf->main_vsi);
 err_setup_pf_switch:
-   rte_free(pf->main_vsi);
 err_get_mac_addr:
 err_configure_lan_hmc:
(void)i40e_shutdown_lan_hmc(hw);
@@ -541,6 +533,27 @@ err_get_capabilities:
 static int
 i40e_dev_configure(struct rte_eth_dev *dev)
 {
+   int ret;
+   enum rte_eth_rx_mq_mode mq_mode = dev->data->dev_conf.rxmode.mq_mode;
+
+   /* VMDQ setup.
+*  Needs to move VMDQ setting out of i40e_pf_config_mq_rx() as VMDQ and
+*  RSS setting have different requirements.
+*  General PMD driver call sequence are NIC init, configure,
+*  rx/tx_queue_setup and dev_start. In rx/tx_queue_setup() function, it
+*  will try to lookup the VSI that specific queue belongs to if VMDQ
+*  applicable. So, VMDQ setting has to be done before
+*  rx/tx_queue_setup(). This function is good  to place vmdq_setup.
+*  For RSS setting, it will try to calculate actual configured RX queue
+*  number, which will be available after rx_queue_setup(). dev_start()
+*  function is good to place RSS setup.
+*/
+   if (mq_mode & ETH_MQ_RX_VMDQ_FLAG) {
+   ret = i40e_vmdq_setup(dev);
+   if (ret)
+   return ret;
+   }
+
return i40e_dev_init_vlan(dev);
 }

@@ -1389,6 +1402,16 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
DEV_TX_OFFLOAD_UDP_CKSUM |
DEV_TX_OFFLOAD_TCP_CKSUM |
DEV_TX_OFFLOAD_SCTP_CKSUM;
+
+   if (pf->flags & I40E_FLAG_VMDQ) {
+   dev_info->max_vmdq_pools = pf->max_nb_vmdq_vsi;
+   dev_info->vmdq_queue_base = dev_info->max_rx_queues;
+   dev_info->vmdq_queue_num = pf->vmdq_nb_qps *
+   pf->max_nb_vmdq_vsi;
+   dev_info->vmdq_pool_base = I40E_VMDQ_POOL_BASE;
+   dev_info->max_rx_queues += dev_info->vmdq_queue_num;
+   dev_info->max_tx_queues += dev_info->vmdq_queue_num;
+   }
 }

 static int
@@ -1814,7 +1837,7 @@ i40e_pf_par

[dpdk-dev] [PATCH 5/6] i40e: macaddr add/del enhancement

2014-09-23 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

Change the i40e_macaddr_add and i40e_macaddr_remove functions to support
adding and deleting multiple MAC addresses. Also support MAC address
operations on different pools.
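
The per-pool removal in the diff below walks the 64-bit pool selection mask
stored for a MAC address, where bit 0 stands for the main VSI and bit i for
VMDQ pool i - 1. A minimal sketch of that walk, assuming a hypothetical
callback type pool_visit_fn and helper for_each_selected_pool that are not
part of the patch:

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback invoked once per selected pool index. */
typedef void (*pool_visit_fn)(unsigned int pool, void *ctx);

/*
 * Visit every pool whose bit is set in pool_sel and return how many
 * pools were selected. A NULL callback only counts the set bits.
 */
static unsigned int
for_each_selected_pool(uint64_t pool_sel, pool_visit_fn fn, void *ctx)
{
	unsigned int i, visited = 0;

	for (i = 0; i < sizeof(pool_sel) * CHAR_BIT; i++) {
		/* 1ULL keeps the shift 64 bits wide for i >= 32 */
		if (pool_sel & (1ULL << i)) {
			if (fn != NULL)
				fn(i, ctx);
			visited++;
		}
	}
	return visited;
}
```

In the driver the callback body would pick pf->main_vsi for index 0 and
pf->vmdq[i - 1].vsi otherwise, as the patch does inline.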

Signed-off-by: Chen Jing D(Mark) 
Acked-by: Konstantin Ananyev 
Acked-by: Jingjing Wu 
Acked-by: Jijiang Liu 
Acked-by: Huawei Xie 
---
 lib/librte_pmd_i40e/i40e_ethdev.c |   91 +---
 1 files changed, 43 insertions(+), 48 deletions(-)

diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c b/lib/librte_pmd_i40e/i40e_ethdev.c
index a267c96..3185654 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -1532,45 +1532,37 @@ i40e_priority_flow_ctrl_set(__rte_unused struct rte_eth_dev *dev,
 static void
 i40e_macaddr_add(struct rte_eth_dev *dev,
 struct ether_addr *mac_addr,
-__attribute__((unused)) uint32_t index,
-__attribute__((unused)) uint32_t pool)
+__rte_unused uint32_t index,
+uint32_t pool)
 {
struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-   struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-   struct i40e_vsi *vsi = pf->main_vsi;
-   struct ether_addr old_mac;
+   struct i40e_vsi *vsi;
int ret;

-   if (!is_valid_assigned_ether_addr(mac_addr)) {
-   PMD_DRV_LOG(ERR, "Invalid ethernet address");
-   return;
-   }
-
-   if (is_same_ether_addr(mac_addr, &(pf->dev_addr))) {
-   PMD_DRV_LOG(INFO, "Ignore adding permanent mac address");
+   /* If VMDQ not enabled or configured, return */
+   if (pool != 0 && (!(pf->flags & I40E_FLAG_VMDQ) ||
+   !pf->nb_cfg_vmdq_vsi)) {
+   PMD_DRV_LOG(ERR, "VMDQ not %s, can't set mac to pool %u\n",
+   pf->flags & I40E_FLAG_VMDQ ? "configured" : "enabled",
+   pool);
return;
}

-   /* Write mac address */
-   ret = i40e_aq_mac_address_write(hw, I40E_AQC_WRITE_TYPE_LAA_ONLY,
-   mac_addr->addr_bytes, NULL);
-   if (ret != I40E_SUCCESS) {
-   PMD_DRV_LOG(ERR, "Failed to write mac address");
+   if (pool > pf->nb_cfg_vmdq_vsi) {
+   PMD_DRV_LOG(ERR, "Pool number %u invalid. Max pool is %u\n",
+   pool, pf->nb_cfg_vmdq_vsi);
return;
}

-   (void)rte_memcpy(&old_mac, hw->mac.addr, ETHER_ADDR_LEN);
-   (void)rte_memcpy(hw->mac.addr, mac_addr->addr_bytes,
-   ETHER_ADDR_LEN);
+   if (pool == 0)
+   vsi = pf->main_vsi;
+   else
+   vsi = pf->vmdq[pool - 1].vsi;

ret = i40e_vsi_add_mac(vsi, mac_addr);
if (ret != I40E_SUCCESS) {
-   PMD_DRV_LOG(ERR, "Failed to add MACVLAN filter");
+   PMD_DRV_LOG(ERR, "Failed to add MACVLAN filter\n");
return;
}
-
-   ether_addr_copy(mac_addr, &pf->dev_addr);
-   i40e_vsi_delete_mac(vsi, &old_mac);
 }

 /* Remove a MAC address, and update filters */
@@ -1578,36 +1570,39 @@ static void
 i40e_macaddr_remove(struct rte_eth_dev *dev, uint32_t index)
 {
struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-   struct i40e_vsi *vsi = pf->main_vsi;
-   struct rte_eth_dev_data *data = I40E_VSI_TO_DEV_DATA(vsi);
+   struct i40e_vsi *vsi;
+   struct rte_eth_dev_data *data = dev->data;
struct ether_addr *macaddr;
int ret;
-   struct i40e_hw *hw =
-   I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-
-   if (index >= vsi->max_macaddrs)
-   return;
+   uint32_t i;
+   uint64_t pool_sel;

macaddr = &(data->mac_addrs[index]);
-   if (!is_valid_assigned_ether_addr(macaddr))
-   return;
-
-   ret = i40e_aq_mac_address_write(hw, I40E_AQC_WRITE_TYPE_LAA_ONLY,
-   hw->mac.perm_addr, NULL);
-   if (ret != I40E_SUCCESS) {
-   PMD_DRV_LOG(ERR, "Failed to write mac address");
-   return;
-   }
-
-   (void)rte_memcpy(hw->mac.addr, hw->mac.perm_addr, ETHER_ADDR_LEN);

-   ret = i40e_vsi_delete_mac(vsi, macaddr);
-   if (ret != I40E_SUCCESS)
-   return;
+   pool_sel = dev->data->mac_pool_sel[index];
+
+   for (i = 0; i < sizeof(pool_sel) * CHAR_BIT; i++) {
+   if (pool_sel & (1ULL << i)) {
+   if (i == 0)
+   vsi = pf->main_vsi;
+   else {
+   /* No VMDQ pool enabled or configured */
+   if (!(pf->flags & I40E_FLAG_VMDQ) ||
+   (i > pf->nb_cfg_vmdq_vsi)) {
+   PMD_DRV_LOG(ERR, "No VMDQ pool enabled"
+   "/configured\n");
+   

[dpdk-dev] [PATCH v3 1/5] bond: free mbufs if transmission fails in bonding tx_burst functions

2014-09-23 Thread Declan Doherty
Fix a number of corner cases where a failed transmission on slave devices
could lead to leaked mbufs.
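
For context, a tx burst call reports how many packets it accepted, and the
packets it did not accept still belong to the caller, so something has to
free (or retry) that tail or the buffers leak. The sketch below shows only
the generic caller-side pattern with stand-in types; struct pkt, pkt_free,
tx_half and tx_burst_free_unsent are hypothetical names for illustration
and are not the bonding PMD code in this patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for an mbuf; 'freed' records a release. */
struct pkt {
	int freed;
};

/* Hypothetical stand-in for returning a buffer to its pool. */
static void
pkt_free(struct pkt *p)
{
	p->freed = 1;
}

typedef uint16_t (*tx_burst_fn)(struct pkt **pkts, uint16_t nb_pkts);

/* Hypothetical transmit stub that accepts only the first half of a burst. */
static uint16_t
tx_half(struct pkt **pkts, uint16_t nb_pkts)
{
	(void)pkts;
	return nb_pkts / 2;
}

/*
 * Transmit a burst and free every packet the transmit function did not
 * accept, so that no buffer is leaked. Returns the number sent.
 */
static uint16_t
tx_burst_free_unsent(tx_burst_fn tx, struct pkt **pkts, uint16_t nb_pkts)
{
	uint16_t sent = tx(pkts, nb_pkts);
	uint16_t i;

	/* pkts[sent .. nb_pkts - 1] were not consumed by the device */
	for (i = sent; i < nb_pkts; i++)
		pkt_free(pkts[i]);

	return sent;
}
```

The round-robin test in this patch checks the related property that the
packets which failed on a slave end up at pkt_burst[tx_count] onwards, which
is what lets a caller apply this kind of cleanup.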

Signed-off-by: Declan Doherty 
---
 app/test/test_link_bonding.c   |  393 +++-
 app/test/virtual_pmd.c |   80 +--
 app/test/virtual_pmd.h |7 +
 lib/librte_pmd_bond/rte_eth_bond_pmd.c |   83 ++--
 4 files changed, 525 insertions(+), 38 deletions(-)

diff --git a/app/test/test_link_bonding.c b/app/test/test_link_bonding.c
index cce32ed..1a847eb 100644
--- a/app/test/test_link_bonding.c
+++ b/app/test/test_link_bonding.c
@@ -663,6 +663,9 @@ enable_bonded_slaves(void)
int i;

for (i = 0; i < test_params->bonded_slave_count; i++) {
+   virtual_ethdev_tx_burst_fn_set_success(test_params->slave_port_ids[i],
+   1);
+
virtual_ethdev_simulate_link_status_interrupt(
test_params->slave_port_ids[i], 1);
}
@@ -1413,6 +1416,135 @@ test_roundrobin_tx_burst(void)
 }

 static int
+verify_mbufs_ref_count(struct rte_mbuf **mbufs, int nb_mbufs, int val)
+{
+   int i, refcnt;
+
+   for (i = 0; i < nb_mbufs; i++) {
+   refcnt = rte_mbuf_refcnt_read(mbufs[i]);
+   TEST_ASSERT_EQUAL(refcnt, val,
+   "mbuf ref count (%d) is not the expected value (%d)",
+   refcnt, val);
+   }
+   return 0;
+}
+
+
+static void
+free_mbufs(struct rte_mbuf **mbufs, int nb_mbufs)
+{
+   int i;
+
+   for (i = 0; i < nb_mbufs; i++)
+   rte_pktmbuf_free(mbufs[i]);
+}
+
+#define TEST_RR_SLAVE_TX_FAIL_SLAVE_COUNT  (2)
+#define TEST_RR_SLAVE_TX_FAIL_BURST_SIZE   (64)
+#define TEST_RR_SLAVE_TX_FAIL_PACKETS_COUNT(22)
+#define TEST_RR_SLAVE_TX_FAIL_FAILING_SLAVE_IDX(1)
+
+static int
+test_roundrobin_tx_burst_slave_tx_fail(void)
+{
+   struct rte_mbuf *pkt_burst[MAX_PKT_BURST];
+   struct rte_mbuf *expected_tx_fail_pkts[MAX_PKT_BURST];
+
+   struct rte_eth_stats port_stats;
+
+   int i, first_fail_idx, tx_count;
+
+   TEST_ASSERT_SUCCESS(initialize_bonded_device_with_slaves(
+   BONDING_MODE_ROUND_ROBIN, 0,
+   TEST_RR_SLAVE_TX_FAIL_SLAVE_COUNT, 1),
+   "Failed to initialise bonded device");
+
+   /* Generate test bursts of packets to transmit */
+   TEST_ASSERT_EQUAL(generate_test_burst(pkt_burst,
+   TEST_RR_SLAVE_TX_FAIL_BURST_SIZE, 0, 1, 0, 0, 0),
+   TEST_RR_SLAVE_TX_FAIL_BURST_SIZE,
+   "Failed to generate test packet burst");
+
+   /* Copy references to packets which we expect not to be transmitted */
+   first_fail_idx = (TEST_RR_SLAVE_TX_FAIL_BURST_SIZE -
+   (TEST_RR_SLAVE_TX_FAIL_PACKETS_COUNT *
+   TEST_RR_SLAVE_TX_FAIL_SLAVE_COUNT)) +
+   TEST_RR_SLAVE_TX_FAIL_FAILING_SLAVE_IDX;
+
+   for (i = 0; i < TEST_RR_SLAVE_TX_FAIL_PACKETS_COUNT; i++) {
+   expected_tx_fail_pkts[i] = pkt_burst[first_fail_idx +
+   (i * TEST_RR_SLAVE_TX_FAIL_SLAVE_COUNT)];
+   }
+
+   /* Set virtual slave to only fail transmission of
+* TEST_RR_SLAVE_TX_FAIL_PACKETS_COUNT packets in burst */
+   virtual_ethdev_tx_burst_fn_set_success(
+   test_params->slave_port_ids[TEST_RR_SLAVE_TX_FAIL_FAILING_SLAVE_IDX],
+   0);
+
+   virtual_ethdev_tx_burst_fn_set_tx_pkt_fail_count(
+   test_params->slave_port_ids[TEST_RR_SLAVE_TX_FAIL_FAILING_SLAVE_IDX],
+   TEST_RR_SLAVE_TX_FAIL_PACKETS_COUNT);
+
+   tx_count = rte_eth_tx_burst(test_params->bonded_port_id, 0, pkt_burst,
+   TEST_RR_SLAVE_TX_FAIL_BURST_SIZE);
+
+   TEST_ASSERT_EQUAL(tx_count, TEST_RR_SLAVE_TX_FAIL_BURST_SIZE -
+   TEST_RR_SLAVE_TX_FAIL_PACKETS_COUNT,
+   "Transmitted (%d) an unexpected (%d) number of packets", tx_count,
+   TEST_RR_SLAVE_TX_FAIL_BURST_SIZE -
+   TEST_RR_SLAVE_TX_FAIL_PACKETS_COUNT);
+
+   /* Verify that failed packet are expected failed packets */
+   for (i = 0; i < TEST_RR_SLAVE_TX_FAIL_PACKETS_COUNT; i++) {
+   TEST_ASSERT_EQUAL(expected_tx_fail_pkts[i], pkt_burst[i + tx_count],
+   "expected mbuf (%d) pointer %p not expected pointer %p",
+   i, expected_tx_fail_pkts[i], pkt_burst[i + tx_count]);
+   }
+
+   /* Verify bonded port tx stats */
+   rte_eth_stats_get(test_params->bonded_port_id, &port_stats);
+
+   TEST_ASSERT_EQUAL(port_stats.opackets,
+   (uint64_t)TEST_RR_SLAVE_TX_FAIL_BURST_SIZE -
+   TEST_RR_SLAVE_TX_FAIL_PACKETS_COUNT,
+

[dpdk-dev] [PATCH v3 4/5] bond: lsc polling support

2014-09-23 Thread Declan Doherty
Adds link status polling functionality to bonding device as well as API
to set polling interval and link up / down propagation delay.
Also contains unit tests for testing polling functionality.


Signed-off-by: Declan Doherty 
---
 app/test/test.h|7 +-
 app/test/test_link_bonding.c   |  258 ---
 app/test/virtual_pmd.c |   17 +-
 app/test/virtual_pmd.h |   48 +++-
 lib/librte_pmd_bond/rte_eth_bond.h |   80 ++
 lib/librte_pmd_bond/rte_eth_bond_api.c |  309 +++
 lib/librte_pmd_bond/rte_eth_bond_args.c|   30 ++-
 lib/librte_pmd_bond/rte_eth_bond_pmd.c |  387 +---
 lib/librte_pmd_bond/rte_eth_bond_private.h |   71 --
 9 files changed, 861 insertions(+), 346 deletions(-)

diff --git a/app/test/test.h b/app/test/test.h
index 98ab804..24b1640 100644
--- a/app/test/test.h
+++ b/app/test/test.h
@@ -62,14 +62,15 @@

 #define TEST_ASSERT_SUCCESS(val, msg, ...) do { \
if (!(val == 0)) { \
-   printf("TestCase %s() line %d failed: " \
-   msg "\n", __func__, __LINE__, ##__VA_ARGS__); \
+   printf("TestCase %s() line %d failed (err %d): " \
+   msg "\n", __func__, __LINE__, val, \
+   ##__VA_ARGS__); \
return -1; \
} \
 } while (0)

 #define TEST_ASSERT_FAIL(val, msg, ...) do { \
-   if (!(val != -1)) { \
+   if (!(val != 0)) { \
printf("TestCase %s() line %d failed: " \
msg "\n", __func__, __LINE__, ##__VA_ARGS__); \
return -1; \
diff --git a/app/test/test_link_bonding.c b/app/test/test_link_bonding.c
index 50355a3..c32b685 100644
--- a/app/test/test_link_bonding.c
+++ b/app/test/test_link_bonding.c
@@ -234,42 +234,34 @@ configure_ethdev(uint8_t port_id, uint8_t start, uint8_t en_isr)
else
default_pmd_conf.intr_conf.lsc = 0;

-   if (rte_eth_dev_configure(port_id, test_params->nb_rx_q,
-   test_params->nb_tx_q, &default_pmd_conf) != 0) {
-   goto error;
-   }
+   TEST_ASSERT_SUCCESS(rte_eth_dev_configure(port_id, test_params->nb_rx_q,
+   test_params->nb_tx_q, &default_pmd_conf),
+   "rte_eth_dev_configure for port %d failed", port_id);

-   for (q_id = 0; q_id < test_params->nb_rx_q; q_id++) {
-   if (rte_eth_rx_queue_setup(port_id, q_id, RX_RING_SIZE,
+   for (q_id = 0; q_id < test_params->nb_rx_q; q_id++)
+   TEST_ASSERT_SUCCESS(rte_eth_rx_queue_setup(port_id, q_id, RX_RING_SIZE,
rte_eth_dev_socket_id(port_id), &rx_conf_default,
-   test_params->mbuf_pool) < 0) {
-   goto error;
-   }
-   }
+   test_params->mbuf_pool) ,
+   "rte_eth_rx_queue_setup for port %d failed", port_id);

-   for (q_id = 0; q_id < test_params->nb_tx_q; q_id++) {
-   if (rte_eth_tx_queue_setup(port_id, q_id, TX_RING_SIZE,
-   rte_eth_dev_socket_id(port_id), &tx_conf_default) < 0) {
-   printf("Failed to setup tx queue (%d).\n", q_id);
-   goto error;
-   }
-   }
+   for (q_id = 0; q_id < test_params->nb_tx_q; q_id++)
+   TEST_ASSERT_SUCCESS(rte_eth_tx_queue_setup(port_id, q_id, TX_RING_SIZE,
+   rte_eth_dev_socket_id(port_id), &tx_conf_default),
+   "rte_eth_tx_queue_setup for port %d failed", port_id);

-   if (start) {
-   if (rte_eth_dev_start(port_id) < 0) {
-   printf("Failed to start device (%d).\n", port_id);
-   goto error;
-   }
-   }
-   return 0;
+   if (start)
+   TEST_ASSERT_SUCCESS(rte_eth_dev_start(port_id),
+   "rte_eth_dev_start for port %d failed", port_id);

-error:
-   pri

[dpdk-dev] [PATCH v3 5/5] bond: unit test test macro refactor

2014-09-23 Thread Declan Doherty

Signed-off-by: Declan Doherty 
---
 app/test/test_link_bonding.c | 2574 +-
 1 files changed, 1036 insertions(+), 1538 deletions(-)

diff --git a/app/test/test_link_bonding.c b/app/test/test_link_bonding.c
index c32b685..c4fcaf7 100644
--- a/app/test/test_link_bonding.c
+++ b/app/test/test_link_bonding.c
@@ -31,6 +31,7 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */

+#include "unistd.h"
 #include 
 #include 
 #include 
@@ -265,7 +266,7 @@ static pthread_cond_t cvar = PTHREAD_COND_INITIALIZER;
 static int
 test_setup(void)
 {
-   int i, retval, nb_mbuf_per_pool;
+   int i, nb_mbuf_per_pool;
struct ether_addr *mac_addr = (struct ether_addr *)slave_mac;

/* Allocate ethernet packet header with space for VLAN header */
@@ -273,10 +274,8 @@ test_setup(void)
test_params->pkt_eth_hdr = malloc(sizeof(struct ether_hdr) +
sizeof(struct vlan_hdr));

-   if (test_params->pkt_eth_hdr == NULL) {
-   printf("ethernet header struct allocation failed!\n");
-   return -1;
-   }
+   TEST_ASSERT_NOT_NULL(test_params->pkt_eth_hdr,
+   "Ethernet header struct allocation failed!");
}

nb_mbuf_per_pool = RTE_TEST_RX_DESC_MAX + DEF_PKT_BURST +
@@ -286,10 +285,8 @@ test_setup(void)
MBUF_SIZE, MBUF_CACHE_SIZE, sizeof(struct 
rte_pktmbuf_pool_private),
rte_pktmbuf_pool_init, NULL, rte_pktmbuf_init, 
NULL,
rte_socket_id(), 0);
-   if (test_params->mbuf_pool == NULL) {
-   printf("rte_mempool_create failed\n");
-   return -1;
-   }
+   TEST_ASSERT_NOT_NULL(test_params->mbuf_pool,
+   "rte_mempool_create failed");
}

/* Create / Initialize virtual eth devs */
@@ -303,20 +300,12 @@ test_setup(void)

test_params->slave_port_ids[i] = virtual_ethdev_create(pmd_name,
mac_addr, rte_socket_id(), 1);
-   if (test_params->slave_port_ids[i] < 0) {
-   printf("Failed to create virtual virtual ethdev %s\n", pmd_name);
-   return -1;
-   }
+   TEST_ASSERT(test_params->slave_port_ids[i] >= 0,
+   "Failed to create virtual virtual ethdev %s", pmd_name);

-   printf("Created virtual ethdev %s\n", pmd_name);
-
-   retval = configure_ethdev(test_params->slave_port_ids[i], 1, 0);
-   if (retval != 0) {
-   printf("Failed to configure virtual ethdev %s\n", pmd_name);
-   return -1;
-   }
-
-   printf("Configured virtual ethdev %s\n", pmd_name);
+   TEST_ASSERT_SUCCESS(configure_ethdev(
+   test_params->slave_port_ids[i], 1, 0),
+   "Failed to configure virtual ethdev %s", pmd_name);
}
slaves_initialized = 1;
}
@@ -350,14 +339,14 @@ test_create_bonded_device(void)
current_slave_count = rte_eth_bond_slaves_get(test_params->bonded_port_id,
slaves, RTE_MAX_ETHPORTS);

-   TEST_ASSERT(current_slave_count == 0,
+   TEST_ASSERT_EQUAL(current_slave_count, 0,
"Number of slaves %d is great than expected %d.",
current_slave_count, 0);

current_slave_count = rte_eth_bond_active_slaves_get(
test_params->bonded_port_id, slaves, RTE_MAX_ETHPORTS);

-   TEST_ASSERT(current_slave_count == 0,
+   TEST_ASSERT_EQUAL(current_slave_count, 0,
"Number of active slaves %d is great than expected %d.",
current_slave_count, 0);

@@ -375,30 +364,21 @@ test_create_bonded_device_with_invalid_params(void)
/* Invalid name */
port_id = rte_eth_bond_create(NULL, test_params->bonding_mode,
rte_socket_id());
-   if (port_id >= 0) {
-   printf("Created bonded device unexpectedly.\n");
-   return -1;
-   }
+   TEST_ASSERT(port_id < 0, "Created bonded device unexpectedly");

test_params->bonding_mode = INVALID_BONDING_MODE;

/* Invalid bonding mode */
port_id = rte_eth_bond_create(BONDED_DEV_NAME, test_params->bonding_mode,
rte_socket_id());
-   if (port_id >= 0) {
-   printf("Created bonded device unexpectedly.\n");
-   return -1;
-   }
+   TEST_ASSERT(port_id < 0, "Created bonded device 

[dpdk-dev] [PATCH v3 0/5] link bonding

2014-09-23 Thread Declan Doherty
This patch set contains a typo fix for the bond free mbufs patch as well
as updates to the test app patch to rebase for changes in the mbuf patches.
It also contains a patch to add support for slave devices which don't
support link status interrupts, and a patch to tidy up the link bonding
unit test so that all tests use the new test macros.

Declan Doherty (5):
  bond: free mbufs if transmission fails in bonding tx_burst functions
  test app: adding support for generating variable sized packet
  testpmd: adding parameter to reconfig method to set socket_id when
adding new port to portlist
  bond: lsc polling support
  bond: unit test test macro refactor

 app/test-pmd/cmdline.c |2 +-
 app/test-pmd/testpmd.c |3 +-
 app/test-pmd/testpmd.h |2 +-
 app/test/packet_burst_generator.c  |   25 +-
 app/test/packet_burst_generator.h  |6 +-
 app/test/test.h|7 +-
 app/test/test_link_bonding.c   | 3245 ++--
 app/test/virtual_pmd.c |   97 +-
 app/test/virtual_pmd.h |   53 +-
 lib/librte_pmd_bond/rte_eth_bond.h |   80 +
 lib/librte_pmd_bond/rte_eth_bond_api.c |  309 ++-
 lib/librte_pmd_bond/rte_eth_bond_args.c|   30 +-
 lib/librte_pmd_bond/rte_eth_bond_pmd.c |  470 +++-
 lib/librte_pmd_bond/rte_eth_bond_private.h |   71 +-
 14 files changed, 2450 insertions(+), 1950 deletions(-)

-- 
1.7.4.1



[dpdk-dev] [PATCH v3 2/5] test app: adding support for generating variable sized packet

2014-09-23 Thread Declan Doherty

Signed-off-by: Declan Doherty 
---
 app/test/packet_burst_generator.c |   25 -
 app/test/packet_burst_generator.h |6 +-
 app/test/test_link_bonding.c  |   14 +-
 3 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/app/test/packet_burst_generator.c b/app/test/packet_burst_generator.c
index 9e747a4..b2824dc 100644
--- a/app/test/packet_burst_generator.c
+++ b/app/test/packet_burst_generator.c
@@ -74,8 +74,7 @@ static inline void
 copy_buf_to_pkt(void *buf, unsigned len, struct rte_mbuf *pkt, unsigned offset)
 {
if (offset + len <= pkt->data_len) {
-   rte_memcpy(rte_pktmbuf_mtod(pkt, char *) + offset,
-   buf, (size_t) len);
+   rte_memcpy(rte_pktmbuf_mtod(pkt, char *) + offset, buf, (size_t) len);
return;
}
copy_buf_to_pkt_segs(buf, len, pkt, offset);
@@ -191,20 +190,12 @@ initialize_ipv4_header(struct ipv4_hdr *ip_hdr, uint32_t src_addr,
  */
 #define RTE_MAX_SEGS_PER_PKT 255 /**< pkt.nb_segs is a 8-bit unsigned char. */

-#define TXONLY_DEF_PACKET_LEN 64
-#define TXONLY_DEF_PACKET_LEN_128 128
-
-uint16_t tx_pkt_length = TXONLY_DEF_PACKET_LEN;
-uint16_t tx_pkt_seg_lengths[RTE_MAX_SEGS_PER_PKT] = {
-   TXONLY_DEF_PACKET_LEN_128,
-};
-
-uint8_t  tx_pkt_nb_segs = 1;

 int
 generate_packet_burst(struct rte_mempool *mp, struct rte_mbuf **pkts_burst,
struct ether_hdr *eth_hdr, uint8_t vlan_enabled, void *ip_hdr,
-   uint8_t ipv4, struct udp_hdr *udp_hdr, int nb_pkt_per_burst)
+   uint8_t ipv4, struct udp_hdr *udp_hdr, int nb_pkt_per_burst,
+   uint8_t pkt_len, uint8_t nb_pkt_segs)
 {
int i, nb_pkt = 0;
size_t eth_hdr_size;
@@ -221,9 +212,9 @@ nomore_mbuf:
break;
}

-   pkt->data_len = tx_pkt_seg_lengths[0];
+   pkt->data_len = pkt_len;
pkt_seg = pkt;
-   for (i = 1; i < tx_pkt_nb_segs; i++) {
+   for (i = 1; i < nb_pkt_segs; i++) {
pkt_seg->next = rte_pktmbuf_alloc(mp);
if (pkt_seg->next == NULL) {
pkt->nb_segs = i;
@@ -231,7 +222,7 @@ nomore_mbuf:
goto nomore_mbuf;
}
pkt_seg = pkt_seg->next;
-   pkt_seg->data_len = tx_pkt_seg_lengths[i];
+   pkt_seg->data_len = pkt_len;
}
pkt_seg->next = NULL; /* Last segment of packet. */

@@ -259,8 +250,8 @@ nomore_mbuf:
 * Complete first mbuf of packet and append it to the
 * burst of packets to be transmitted.
 */
-   pkt->nb_segs = tx_pkt_nb_segs;
-   pkt->pkt_len = tx_pkt_length;
+   pkt->nb_segs = nb_pkt_segs;
+   pkt->pkt_len = pkt_len;
pkt->l2_len = eth_hdr_size;

if (ipv4) {
diff --git a/app/test/packet_burst_generator.h b/app/test/packet_burst_generator.h
index 5b3cd6c..f86589e 100644
--- a/app/test/packet_burst_generator.h
+++ b/app/test/packet_burst_generator.h
@@ -47,6 +47,9 @@ extern "C" {
 #define IPV4_ADDR(a, b, c, d)(((a & 0xff) << 24) | ((b & 0xff) << 16) | \
((c & 0xff) << 8) | (d & 0xff))

+#define PACKET_BURST_GEN_PKT_LEN 60
+#define PACKET_BURST_GEN_PKT_LEN_128 128
+

 void
 initialize_eth_header(struct ether_hdr *eth_hdr, struct ether_addr *src_mac,
@@ -68,7 +71,8 @@ initialize_ipv4_header(struct ipv4_hdr *ip_hdr, uint32_t src_addr,
 int
 generate_packet_burst(struct rte_mempool *mp, struct rte_mbuf **pkts_burst,
struct ether_hdr *eth_hdr, uint8_t vlan_enabled, void *ip_hdr,
-   uint8_t ipv4, struct udp_hdr *udp_hdr, int nb_pkt_per_burst);
+   uint8_t ipv4, struct udp_hdr *udp_hdr, int nb_pkt_per_burst,
+   uint8_t pkt_len, uint8_t nb_pkt_segs);

 #ifdef __cplusplus
 }
diff --git a/app/test/test_link_bonding.c b/app/test/test_link_bonding.c
index 1a847eb..50355a3 100644
--- a/app/test/test_link_bonding.c
+++ b/app/test/test_link_bonding.c
@@ -1338,7 +1338,8 @@ generate_test_burst(struct rte_mbuf **pkts_burst, uint16_t burst_size,
/* Generate burst of packets to transmit */
generated_burst_size = generate_packet_burst(test_params->mbuf_pool,
pkts_burst, test_params->pkt_eth_hdr, vlan, ip_hdr, ipv4,
-   test_params->pkt_udp_hdr, burst_size);
+   test_params->pkt_udp_hdr, burst_size, PACKET_BURST_GEN_PKT_LEN_128,
+   1);
if (generated_burst_size != burst_size) {
printf("Failed to generate packet burst");
return -1;
@@ -2056,7 +2057,7 @@ test_activebackup_tx_burst(void)
/* Generate a burst of packets to transmit */
generated_burst_size = gene

[dpdk-dev] [PATCH 1/4] compat: Add infrastructure to support symbol versioning

2014-09-23 Thread Neil Horman
On Tue, Sep 23, 2014 at 11:39:29AM +0100, Sergio Gonzalez Monroy wrote:
> Hi Neil,
> 
> On Mon, Sep 15, 2014 at 03:23:48PM -0400, Neil Horman wrote:
> > Add initial pass header files to support symbol versioning.
> > 
> > Signed-off-by: Neil Horman 
> > CC: Thomas Monjalon 
> > CC: "Richardson, Bruce" 
> > ---
> >  lib/Makefile   |  1 +
> >  lib/librte_compat/Makefile | 38 +++
> >  lib/librte_compat/rte_compat.h | 86 ++
> >  mk/rte.lib.mk  |  6 +++
> >  4 files changed, 131 insertions(+)
> >  create mode 100644 lib/librte_compat/Makefile
> >  create mode 100644 lib/librte_compat/rte_compat.h
> > 
> > diff --git a/lib/Makefile b/lib/Makefile
> > index 10c5bb3..a85b55b 100644
> > --- a/lib/Makefile
> > +++ b/lib/Makefile
> > @@ -32,6 +32,7 @@
> >  include $(RTE_SDK)/mk/rte.vars.mk
> >  
> >  DIRS-$(CONFIG_RTE_LIBC) += libc
> > +DIRS-y += librte_compat
> >  DIRS-$(CONFIG_RTE_LIBRTE_EAL) += librte_eal
> >  DIRS-$(CONFIG_RTE_LIBRTE_MALLOC) += librte_malloc
> >  DIRS-$(CONFIG_RTE_LIBRTE_RING) += librte_ring
> > diff --git a/lib/librte_compat/Makefile b/lib/librte_compat/Makefile
> > new file mode 100644
> > index 000..a61511a
> > --- /dev/null
> > +++ b/lib/librte_compat/Makefile
> > @@ -0,0 +1,38 @@
> > +#   BSD LICENSE
> > +#
> > +#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> > +#   All rights reserved.
> > +#
> > +#   Redistribution and use in source and binary forms, with or without
> > +#   modification, are permitted provided that the following conditions
> > +#   are met:
> > +#
> > +# * Redistributions of source code must retain the above copyright
> > +#   notice, this list of conditions and the following disclaimer.
> > +# * Redistributions in binary form must reproduce the above copyright
> > +#   notice, this list of conditions and the following disclaimer in
> > +#   the documentation and/or other materials provided with the
> > +#   distribution.
> > +# * Neither the name of Intel Corporation nor the names of its
> > +#   contributors may be used to endorse or promote products derived
> > +#   from this software without specific prior written permission.
> > +#
> > +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > +
> > +include $(RTE_SDK)/mk/rte.vars.mk
> > +
> > +
> > +# install includes
> > +SYMLINK-y-include := rte_compat.h
> > +
> > +include $(RTE_SDK)/mk/rte.lib.mk
> > diff --git a/lib/librte_compat/rte_compat.h b/lib/librte_compat/rte_compat.h
> > new file mode 100644
> > index 000..6d65a53
> > --- /dev/null
> > +++ b/lib/librte_compat/rte_compat.h
> > @@ -0,0 +1,86 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> > + *   All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + * * Redistributions of source code must retain the above copyright
> > + *   notice, this list of conditions and the following disclaimer.
> > + * * Redistributions in binary form must reproduce the above copyright
> > + *   notice, this list of conditions and the following disclaimer in
> > + *   the documentation and/or other materials provided with the
> > + *   distribution.
> > + * * Neither the name of Intel Corporation nor the names of its
> > + *   contributors may be used to endorse or promote products derived
> > + *   from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS O

[dpdk-dev] [PATCH] Change alarm cancel function to thread-safe.

2014-09-23 Thread Michal Jastrzebski
It eliminates a race between threads using rte_alarm_cancel and rte_alarm_set.

Signed-off-by: Pawel Wodkowski 
Reviewed-by: Michal Jastrzebski 
---
 lib/librte_eal/common/include/rte_alarm.h |3 +-
 lib/librte_eal/linuxapp/eal/eal_alarm.c   |   68 +++--
 2 files changed, 46 insertions(+), 25 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_alarm.h b/lib/librte_eal/common/include/rte_alarm.h
index d451522..f5f7de4 100644
--- a/lib/librte_eal/common/include/rte_alarm.h
+++ b/lib/librte_eal/common/include/rte_alarm.h
@@ -76,7 +76,8 @@ typedef void (*rte_eal_alarm_callback)(void *arg);
 int rte_eal_alarm_set(uint64_t us, rte_eal_alarm_callback cb, void *cb_arg);

 /**
- * Function to cancel an alarm callback which has been registered before.
+ * Function to cancel an alarm callback which has been registered before. If
+ * used outside an alarm callback, it waits for all callbacks to finish
+ * their execution.
  *
  * @param cb_fn
  *  alarm callback
diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c b/lib/librte_eal/linuxapp/eal/eal_alarm.c
index 480f0cb..0561dbf 100644
--- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
+++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
@@ -69,12 +69,14 @@ struct alarm_entry {
struct timeval time;
rte_eal_alarm_callback cb_fn;
void *cb_arg;
-   volatile int executing;
+   volatile uint8_t executing;
+   volatile pthread_t executing_id;
 };

 static LIST_HEAD(alarm_list, alarm_entry) alarm_list = LIST_HEAD_INITIALIZER();
 static rte_spinlock_t alarm_list_lk = RTE_SPINLOCK_INITIALIZER;

+
 static struct rte_intr_handle intr_handle = {.fd = -1 };
 static int handler_registered = 0;
 static void eal_alarm_callback(struct rte_intr_handle *hdl, void *arg);
@@ -108,11 +110,14 @@ eal_alarm_callback(struct rte_intr_handle *hdl __rte_unused,
(ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec == now.tv_sec &&
ap->time.tv_usec <= now.tv_usec))){
ap->executing = 1;
+   ap->executing_id = pthread_self();
rte_spinlock_unlock(&alarm_list_lk);

ap->cb_fn(ap->cb_arg);

rte_spinlock_lock(&alarm_list_lk);
+   ap->executing = 0;
+
LIST_REMOVE(ap, next);
rte_free(ap);
}
@@ -156,7 +161,6 @@ rte_eal_alarm_set(uint64_t us, rte_eal_alarm_callback cb_fn, void *cb_arg)
new_alarm->cb_arg = cb_arg;
new_alarm->time.tv_usec = (now.tv_usec + us) % US_PER_S;
new_alarm->time.tv_sec = now.tv_sec + ((now.tv_usec + us) / US_PER_S);
-   new_alarm->executing = 0;

rte_spinlock_lock(&alarm_list_lk);
if (!handler_registered) {
@@ -202,34 +206,50 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg)
 {
struct alarm_entry *ap, *ap_prev;
int count = 0;
+   int executing;

if (!cb_fn)
return -1;

-   rte_spinlock_lock(&alarm_list_lk);
-   /* remove any matches at the start of the list */
-   while ((ap = LIST_FIRST(&alarm_list)) != NULL &&
-   cb_fn == ap->cb_fn && ap->executing == 0 &&
-   (cb_arg == (void *)-1 || cb_arg == ap->cb_arg)) {
-   LIST_REMOVE(ap, next);
-   rte_free(ap);
-   count++;
-   }
-   ap_prev = ap;
-
-   /* now go through list, removing entries not at start */
-   LIST_FOREACH(ap, &alarm_list, next) {
-   /* this won't be true first time through */
-   if (cb_fn == ap->cb_fn &&  ap->executing == 0 &&
+   do {
+   executing = 0;
+   rte_spinlock_lock(&alarm_list_lk);
+   /* remove any matches at the start of the list */
+   while ((ap = LIST_FIRST(&alarm_list)) != NULL &&
+   cb_fn == ap->cb_fn &&
(cb_arg == (void *)-1 || cb_arg == ap->cb_arg)) {
-   LIST_REMOVE(ap,next);
-   rte_free(ap);
-   count++;
-   ap = ap_prev;
+
+   if (ap->executing == 0) {
+   LIST_REMOVE(ap, next);
+   rte_free(ap);
+   count++;
+   } else {
+   if (pthread_equal(ap->executing_id, pthread_self()) == 0)
+   executing++;
+
+   break;
+   }
}
ap_prev = ap;
-   }
-   rte_spinlock_unlock(&alarm_list_lk);
+
+   /* now go through list, removing entries not at start */
+   LIST_FOREACH(ap, &alarm_list, next) {
+   /* this won't be true first time through */
+   if (cb_fn == ap->cb_fn &&
+  
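
The cancel semantics this patch describes (wait for a callback that is running on another thread, but return at once when cancel is called from inside the callback itself) can be sketched in isolation. This is only an illustration under stated assumptions, not the EAL code: struct demo_alarm and the demo_* functions are hypothetical names, a single entry stands in for the alarm list, and a pthread mutex replaces rte_spinlock.

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical stand-in for one alarm entry; not the EAL's struct alarm_entry. */
struct demo_alarm {
	pthread_mutex_t lock;
	bool pending;            /* still registered */
	bool executing;          /* callback currently running */
	pthread_t executing_id;  /* thread that runs the callback */
};

/* Called just before the callback runs: record which thread executes it. */
static void demo_begin(struct demo_alarm *a)
{
	pthread_mutex_lock(&a->lock);
	a->executing = true;
	a->executing_id = pthread_self();
	pthread_mutex_unlock(&a->lock);
}

/* Called after the callback returns: clear the flag and drop the entry. */
static void demo_end(struct demo_alarm *a)
{
	pthread_mutex_lock(&a->lock);
	a->executing = false;
	a->pending = false;
	pthread_mutex_unlock(&a->lock);
}

/*
 * Returns 1 if a pending alarm was removed, 0 otherwise.
 * If the callback is running on another thread, retry until it has finished.
 * If it is running on the calling thread, return immediately, since waiting
 * for ourselves would never end.
 */
static int demo_cancel(struct demo_alarm *a)
{
	int removed = 0;
	bool must_wait;

	do {
		must_wait = false;
		pthread_mutex_lock(&a->lock);
		if (a->pending && !a->executing) {
			a->pending = false;
			removed = 1;
		} else if (a->executing &&
			   !pthread_equal(a->executing_id, pthread_self())) {
			must_wait = true;
		}
		pthread_mutex_unlock(&a->lock);
		if (must_wait)
			sched_yield();
	} while (must_wait);

	return removed;
}
```

The pthread_equal() test plays the same role as the executing_id comparison added in the diff: only a foreign thread has to wait.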

[dpdk-dev] [PATCH 0/7] cleanup option parsing in bsd/linux eal

2014-09-23 Thread Thomas Monjalon
2014-09-22 08:43, Neil Horman:
> On Mon, Sep 22, 2014 at 10:37:54AM +0200, David Marchand wrote:
> > Following Neil's comments, here is a patchset to rework the eal options
> > parsing.
> > I tried to have everything common to linux and bsd in a single file.
> > 
> > I ran a little make test on linux, it looks fine (at least I have as many
> > failures as before my changes).
> > 
> > There is still work in this part, but I want to stop here.
> > If anyone wants to continue ... :-)

Yes, many eal parts should be factored out into code common to Linux and BSD.
Cleanups are welcome!

> Series
> ACK

Applied

Thanks
-- 
Thomas


[dpdk-dev] [PATCH] eal: remove kni file from bsdapp eal

2014-09-23 Thread Thomas Monjalon
> KNI applies only to linux, so there should be no need for any kni files to
> be present in the bsdapp eal folder.
> 
> Signed-off-by: Bruce Richardson 

Acked-by: Thomas Monjalon 
Applied

Thanks
-- 
Thomas


[dpdk-dev] [PATCH 0/2] introduce dev_ops to get extended statistics of a device

2014-09-23 Thread Thomas Monjalon
2014-07-23 18:41, Richardson, Bruce:
> > The generic statistics structure is getting bigger as new statistics are
> > added in specific devices. For instance, fdir, tx_pause or loopback
> > stats do not apply on virtual devices. It won't be possible to add every
> > specific statistics in this generic stats structure, but on the other
> > hand these specific statistics are useful for debugging purposes.
> > 
> > These 2 patches introduce xstats_get() and xstats_reset() in
> > dev_ops. When registered by a device, they can be used to provide
> > arbitrary statistics that are identified by a name, as done by ethtool
> > in the kernel.
> > 
> > After that, some statistics could be moved from the generic structure to
> > this new framework, but it will be part of another patch series as it
> > should be discussed first.
> 
> I like the idea, so Ack on the concept. :-)

Applied with some minor fixes in the comments.

Next step is to move some fields from rte_eth_stats to extended stats.
First candidates to move are the VF-only counters.

Thanks
-- 
Thomas
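
As a rough illustration of the "statistics identified by a name" idea discussed in this thread, the sketch below exposes device-specific counters as name/value pairs. All names here (struct demo_xstat, demo_xstats_get, demo_xstats_find, the counter names) are hypothetical and are not taken from the patches themselves.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_XSTATS_NAME_SIZE 32

/* One extended statistic: a name plus a 64-bit counter (hypothetical layout). */
struct demo_xstat {
	char name[DEMO_XSTATS_NAME_SIZE];
	uint64_t value;
};

/* Hypothetical counters that only make sense for one kind of device. */
struct demo_dev_counters {
	uint64_t fdir_match;
	uint64_t tx_pause_xon;
};

/*
 * Copy up to n name/value pairs into out and return the total number of
 * statistics the device exposes, so a caller can size its array.
 */
static unsigned demo_xstats_get(const struct demo_dev_counters *c,
				struct demo_xstat *out, unsigned n)
{
	const struct demo_xstat all[] = {
		{ "fdir_match", c->fdir_match },
		{ "tx_pause_xon", c->tx_pause_xon },
	};
	const unsigned total = sizeof(all) / sizeof(all[0]);
	unsigned i;

	for (i = 0; i < total && i < n; i++)
		out[i] = all[i];
	return total;
}

/* Look a statistic up by name; returns 0 on success, -1 if it is unknown. */
static int demo_xstats_find(const struct demo_xstat *xs, unsigned n,
			    const char *name, uint64_t *value)
{
	unsigned i;

	for (i = 0; i < n; i++) {
		if (strcmp(xs[i].name, name) == 0) {
			*value = xs[i].value;
			return 0;
		}
	}
	return -1;
}
```

Because callers look counters up by name, a driver can add or drop a statistic without changing a shared structure, which is the motivation given in the cover letter.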


[dpdk-dev] [PATCH v4 0/5] lib/librte_vhost: user space vhost cuse driver library

2014-09-23 Thread Xie, Huawei
Hi Thomas:
Any comments on this and the vhost example patch?

BR.
Huawei

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Huawei Xie
> Sent: Friday, September 12, 2014 6:55 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v4 0/5] lib/librte_vhost: user space vhost cuse driver library
> 
> This set of patches transforms and refactors vhost example to a user
> space vhost cuse library. This library implements a user space vhost
> cuse driver, and provides generic APIs for user space ethernet vswitch
> to integrate us-vhost for fast packet switching with guest virtio.
> 
> Change notes:
> 
>  v2) Turn off vhost lib by default
> 
>  v3) Fixed checkpatch issues
> 
>  v4) Split the patch per Thomas's requirement
> 
> 
> Huawei Xie (5):
>   mv vhost example to vhost lib directory
>   copy the vhost rx/tx functions from main.c to new file vhost_rxtx.c
>   remove main.c main.h
>   remove Makefile
>   rename virtio-net.h to rte_virtio_net.h as API header file
>   vmdq, mac learning and other switch related logics are removed
>   zero copy feature isn't generic,and is removed.
>   add vhost lib Makefile.
>   Add TODOs for found new issues.
>   Fix coding style issue which are treated as errors by checkpatch.pl
>   add vhost lib support in makefile
>   turn off vhost lib by default as it requires fuse development package.
> 
>  config/common_linuxapp   |7 +
>  examples/vhost/Makefile  |   60 -
>  examples/vhost/eventfd_link/Makefile |   39 -
>  examples/vhost/eventfd_link/eventfd_link.c   |  205 --
>  examples/vhost/eventfd_link/eventfd_link.h   |   79 -
>  examples/vhost/libvirt/qemu-wrap.py  |  367 ---
>  examples/vhost/main.c| 3722 --
>  examples/vhost/main.h|   86 -
>  examples/vhost/vhost-net-cdev.c  |  367 ---
>  examples/vhost/vhost-net-cdev.h  |   83 -
>  examples/vhost/virtio-net.c  | 1165 
>  examples/vhost/virtio-net.h  |  161 --
>  lib/Makefile |1 +
>  lib/librte_vhost/Makefile|   48 +
>  lib/librte_vhost/eventfd_link/Makefile   |   39 +
>  lib/librte_vhost/eventfd_link/eventfd_link.c |  205 ++
>  lib/librte_vhost/eventfd_link/eventfd_link.h |   79 +
>  lib/librte_vhost/libvirt/qemu-wrap.py|  367 +++
>  lib/librte_vhost/rte_virtio_net.h|  192 ++
>  lib/librte_vhost/vhost-net-cdev.c|  362 +++
>  lib/librte_vhost/vhost-net-cdev.h|  112 +
>  lib/librte_vhost/vhost_rxtx.c|  301 +++
>  lib/librte_vhost/virtio-net.c| 1000 +++
>  mk/rte.app.mk|5 +
>  24 files changed, 2718 insertions(+), 6334 deletions(-)
>  delete mode 100644 examples/vhost/Makefile
>  delete mode 100644 examples/vhost/eventfd_link/Makefile
>  delete mode 100644 examples/vhost/eventfd_link/eventfd_link.c
>  delete mode 100644 examples/vhost/eventfd_link/eventfd_link.h
>  delete mode 100755 examples/vhost/libvirt/qemu-wrap.py
>  delete mode 100644 examples/vhost/main.c
>  delete mode 100644 examples/vhost/main.h
>  delete mode 100644 examples/vhost/vhost-net-cdev.c
>  delete mode 100644 examples/vhost/vhost-net-cdev.h
>  delete mode 100644 examples/vhost/virtio-net.c
>  delete mode 100644 examples/vhost/virtio-net.h
>  create mode 100644 lib/librte_vhost/Makefile
>  create mode 100644 lib/librte_vhost/eventfd_link/Makefile
>  create mode 100644 lib/librte_vhost/eventfd_link/eventfd_link.c
>  create mode 100644 lib/librte_vhost/eventfd_link/eventfd_link.h
>  create mode 100755 lib/librte_vhost/libvirt/qemu-wrap.py
>  create mode 100644 lib/librte_vhost/rte_virtio_net.h
>  create mode 100644 lib/librte_vhost/vhost-net-cdev.c
>  create mode 100644 lib/librte_vhost/vhost-net-cdev.h
>  create mode 100644 lib/librte_vhost/vhost_rxtx.c
>  create mode 100644 lib/librte_vhost/virtio-net.c
> 
> --
> 1.8.1.4



[dpdk-dev] [PATCH 1/4] compat: Add infrastructure to support symbol versioning

2014-09-23 Thread Sergio Gonzalez Monroy
On Tue, Sep 23, 2014 at 10:58:29AM -0400, Neil Horman wrote:
> On Tue, Sep 23, 2014 at 11:39:29AM +0100, Sergio Gonzalez Monroy wrote:
> > Hi Neil,
> > 
> > On Mon, Sep 15, 2014 at 03:23:48PM -0400, Neil Horman wrote:
> > > Add initial pass header files to support symbol versioning.
> > > 
> > > Signed-off-by: Neil Horman 
> > > CC: Thomas Monjalon 
> > > CC: "Richardson, Bruce" 
> > > ---
> > >  lib/Makefile   |  1 +
> > >  lib/librte_compat/Makefile | 38 +++
> > >  lib/librte_compat/rte_compat.h | 86 
> > > ++
> > >  mk/rte.lib.mk  |  6 +++
> > >  4 files changed, 131 insertions(+)
> > >  create mode 100644 lib/librte_compat/Makefile
> > >  create mode 100644 lib/librte_compat/rte_compat.h
> > > 
> > > diff --git a/lib/Makefile b/lib/Makefile
> > > index 10c5bb3..a85b55b 100644
> > > --- a/lib/Makefile
> > > +++ b/lib/Makefile
> > > @@ -32,6 +32,7 @@
> > >  include $(RTE_SDK)/mk/rte.vars.mk
> > >  
> > >  DIRS-$(CONFIG_RTE_LIBC) += libc
> > > +DIRS-y += librte_compat
> > >  DIRS-$(CONFIG_RTE_LIBRTE_EAL) += librte_eal
> > >  DIRS-$(CONFIG_RTE_LIBRTE_MALLOC) += librte_malloc
> > >  DIRS-$(CONFIG_RTE_LIBRTE_RING) += librte_ring
> > > diff --git a/lib/librte_compat/Makefile b/lib/librte_compat/Makefile
> > > new file mode 100644
> > > index 000..a61511a
> > > --- /dev/null
> > > +++ b/lib/librte_compat/Makefile
> > > @@ -0,0 +1,38 @@
> > > +#   BSD LICENSE
> > > +#
> > > +#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> > > +#   All rights reserved.
> > > +#
> > > +#   Redistribution and use in source and binary forms, with or without
> > > +#   modification, are permitted provided that the following conditions
> > > +#   are met:
> > > +#
> > > +# * Redistributions of source code must retain the above copyright
> > > +#   notice, this list of conditions and the following disclaimer.
> > > +# * Redistributions in binary form must reproduce the above copyright
> > > +#   notice, this list of conditions and the following disclaimer in
> > > +#   the documentation and/or other materials provided with the
> > > +#   distribution.
> > > +# * Neither the name of Intel Corporation nor the names of its
> > > +#   contributors may be used to endorse or promote products derived
> > > +#   from this software without specific prior written permission.
> > > +#
> > > +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > > +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > > +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > > +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > > +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > > +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > > +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > > +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > > +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > > +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > > +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > > +
> > > +include $(RTE_SDK)/mk/rte.vars.mk
> > > +
> > > +
> > > +# install includes
> > > +SYMLINK-y-include := rte_compat.h
> > > +
> > > +include $(RTE_SDK)/mk/rte.lib.mk
> > > diff --git a/lib/librte_compat/rte_compat.h 
> > > b/lib/librte_compat/rte_compat.h
> > > new file mode 100644
> > > index 000..6d65a53
> > > --- /dev/null
> > > +++ b/lib/librte_compat/rte_compat.h
> > > @@ -0,0 +1,86 @@
> > > +/*-
> > > + *   BSD LICENSE
> > > + *
> > > + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> > > + *   All rights reserved.
> > > + *
> > > + *   Redistribution and use in source and binary forms, with or without
> > > + *   modification, are permitted provided that the following conditions
> > > + *   are met:
> > > + *
> > > + * * Redistributions of source code must retain the above copyright
> > > + *   notice, this list of conditions and the following disclaimer.
> > > + * * Redistributions in binary form must reproduce the above 
> > > copyright
> > > + *   notice, this list of conditions and the following disclaimer in
> > > + *   the documentation and/or other materials provided with the
> > > + *   distribution.
> > > + * * Neither the name of Intel Corporation nor the names of its
> > > + *   contributors may be used to endorse or promote products derived
> > > + *   from this software without specific prior written permission.
> > > + *
> > > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 
> > > FOR

[dpdk-dev] KNI and memzones

2014-09-23 Thread Zhou, Danny

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jay Rolette
> Sent: Tuesday, September 23, 2014 8:39 PM
> To: Marc Sune
> Cc: ; dev-team at bisdn.de
> Subject: Re: [dpdk-dev] KNI and memzones
> 
> *> p.s. Lately someone involved with DPDK said KNI would be deprecated in
> future DPDK releases; I haven't read or heard this before, is this
> true? What would be the natural replacement then?*
> 
> KNI is a non-trivial part of the product I'm in the process of building.
> I'd appreciate someone "in the know" addressing this one please. Are there
> specific roadmap plans relative to KNI that we need to be aware of?
> 

KNI and multi-threaded KNI have several limitations:
1) Flow classification and packet distribution are both done in software,
specifically in the KNI user space library, at the cost of CPU cycles.
2) Low performance: skb creation/free and packet copies between skb and mbuf
hurt performance significantly.
3) Dedicated cores in user space and kernel space are responsible for rx/tx of
packets between the DPDK app and the KNI device, which seems to me to waste too
many core resources.
4) GPL license jail, as KNI sits in the kernel.

We actually have a bifurcated driver prototype that meets both the high
performance and the upstreamable requirements, and which is intended as an
alternative to KNI. The idea is to leverage the NIC's flow director capability
to bifurcate data plane packets to DPDK, while control plane packets, or
whatever packets need to go through the kernel's TCP/IP stack, continue to be
processed in the kernel (NIC driver + stack). Basically, the kernel NIC driver
and DPDK co-exist to drive the same NIC device, but manipulate different rx/tx
queue pairs. There is, though, a tough NIC control consistency issue which
needs to be resolved and upstreamed to the kernel, and whose details I do not
want to expose at the moment.

IMHO, KNI should NOT be removed unless there is a really good user space,
open-source and socket backward-compatible TCP/IP stack, which is unlikely to
appear very soon.
The bifurcated driver approach could certainly replace KNI for some use cases
where DPDK does not own the NIC control.

Do you mind sharing your KNI use case in more detail, to help determine whether
the bifurcated driver could help?

> Regards,
> Jay
> 
> On Tue, Sep 23, 2014 at 4:27 AM, Marc Sune  wrote:
> 
> > Hi all,
> >
> > So we are having some problems with KNI. In short, we have a DPDK
> > application that creates KNI interfaces and destroys them during its
> > lifecycle and connecting them to DOCKER containers. Interfaces may
> > eventually be even named the same (see below).
> >
> > We were wondering why, even after calling rte_kni_release(), the hugepages
> > memory was rapidly being exhausted, and we also realised that even after
> > destruction, you cannot use the same name for the interface.
> >
> > After close inspection of the rte_kni lib, we think the core issue is
> > mostly a design issue. rte_kni_alloc ends up calling kni_memzone_reserve(),
> > which in the end calls rte_memzone_reserve(), and that memory cannot be
> > unreserved by rte_kni_release() (by design of memzones). The exhaustion is
> > rapid due to the number of FIFOs created (6).
> >
> > If this would be right, we would propose and try to provide a patch as
> > follows:
> >
> > * Create a new rte_kni_init(unsigned int max_knis);
> >
> > This would preallocate all the FIFO rings(TX, RX, ALLOC, FREE, Request
> > and  Response)*max_knis by calling kni_memzone_reserve(), and store them in
> > a kni_fifo_pool. This should only be called once by DPDK applications at
> > bootstrapping time.
> >
> > * rte_kni_allocate would just use one of the kni_fifo_pool (one => meaning
> > a set of 6 FIFOs making a single slot)
> > * rte_kni_release would return to the pool.
> >
> > This should solve both issues. We would base the patch on 1.7.2.
> >
> > Thoughts?
> > marc
> >
> > p.s. Lately someone involved with DPDK said KNI would be deprecated in
> > future DPDK releases; I haven't read or heard this before, is this
> > true? What would be the natural replacement then?
> >
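
The pool proposal quoted above (reserve every FIFO set once at startup, then hand slots out and take them back) can be sketched as follows. This is a minimal sketch under stated assumptions, not the rte_kni code: the demo_* names are hypothetical, and calloc() stands in for the one-time memzone reservation that cannot be undone.

```c
#include <stdlib.h>

#define DEMO_FIFOS_PER_KNI 6  /* TX, RX, ALLOC, FREE, Request, Response */

/* One slot = the set of six FIFOs that a single KNI interface needs. */
struct demo_kni_slot {
	void *fifo[DEMO_FIFOS_PER_KNI];
	int in_use;
};

struct demo_kni_pool {
	struct demo_kni_slot *slots;
	char *mem;            /* backing memory, reserved exactly once */
	unsigned max_knis;
};

/* Reserve all FIFO memory up front; meant to be called once at startup. */
static int demo_kni_init(struct demo_kni_pool *p, unsigned max_knis,
			 size_t fifo_bytes)
{
	unsigned i, j;

	p->slots = calloc(max_knis, sizeof(*p->slots));
	p->mem = calloc((size_t)max_knis * DEMO_FIFOS_PER_KNI, fifo_bytes);
	if (p->slots == NULL || p->mem == NULL) {
		free(p->slots);
		free(p->mem);
		p->slots = NULL;
		p->mem = NULL;
		p->max_knis = 0;
		return -1;
	}
	p->max_knis = max_knis;
	for (i = 0; i < max_knis; i++)
		for (j = 0; j < DEMO_FIFOS_PER_KNI; j++)
			p->slots[i].fifo[j] = p->mem +
				((size_t)i * DEMO_FIFOS_PER_KNI + j) * fifo_bytes;
	return 0;
}

/* Take a free slot from the pool; returns NULL when all slots are in use. */
static struct demo_kni_slot *demo_kni_alloc(struct demo_kni_pool *p)
{
	unsigned i;

	for (i = 0; i < p->max_knis; i++) {
		if (!p->slots[i].in_use) {
			p->slots[i].in_use = 1;
			return &p->slots[i];
		}
	}
	return NULL;
}

/* Return a slot to the pool; its FIFO memory stays reserved for reuse. */
static void demo_kni_release(struct demo_kni_slot *s)
{
	s->in_use = 0;
}
```

With this shape, repeated create/destroy cycles reuse the same memory, so neither the hugepage exhaustion nor the name clash on re-reservation should occur, at the price of fixing max_knis at startup.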


[dpdk-dev] [PATCH v2 3/5] testpmd: Change rxfreet default to 32

2014-09-23 Thread Neil Horman
On Tue, Sep 23, 2014 at 12:08:15PM +0100, Bruce Richardson wrote:
> To improve performance by using bulk alloc or vectored RX routines, we
> need to set rx free threshold (rxfreet) value to 32, so make this the
> testpmd default.
> 
> Thirty-two is the minimum setting needed to enable either the
> bulk alloc or vector RX routines inside the ixgbe driver, so it's
> best made the default for that reason. Please see
> "check_rx_burst_bulk_alloc_preconditions()" in ixgbe_rxtx.c, and
> RX function assignment logic in "ixgbe_dev_rx_queue_setup()" in
> the same file.
> 
> The difference in IO performance for testpmd when called without any
> optional parameters, and using 10G NICs using the ixgbe driver, can be
> significant - approx 25% or more.
> 
> Updates in V2:
> * Updated commit message with additional details
> 
> Signed-off-by: Bruce Richardson 
> ---
>  app/test-pmd/testpmd.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index 9f6cdc4..f76406f 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -225,7 +225,9 @@ struct rte_eth_thresh tx_thresh = {
>  /*
>   * Configurable value of RX free threshold.
>   */
> -uint16_t rx_free_thresh = 0; /* Immediately free RX descriptors by default. */
> +uint16_t rx_free_thresh = 32; /* Refill RX descriptors once every 32 packets,
> + This setting is needed for ixgbe to enable bulk alloc or vector
> + receive functionality. */

I thought we were talking about making this a pmd private selectable item, or
allowing a reserved "let the pmd decide" setting.  Or are we saving that for a
later time?

Neil

>  
>  /*
>   * Configurable value of RX drop enable.
> -- 
> 1.9.3
> 
> 
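
The reserved "let the pmd decide" setting raised in this reply could look roughly like the sketch below. Everything in it is an assumption made for illustration: the sentinel value, the demo_* names, and the simplified precondition, which models only the minimum of 32 stated in the commit message and none of the other checks the ixgbe driver may perform.

```c
#include <stdint.h>

/* Hypothetical reserved value meaning "let the PMD choose". */
#define DEMO_RX_FREE_THRESH_AUTO UINT16_MAX

/* Minimum quoted in the commit message for bulk alloc or vector RX on ixgbe. */
#define DEMO_BULK_ALLOC_MIN_THRESH 32

/* Resolve the application's request against the driver's preferred default. */
static uint16_t demo_resolve_rx_free_thresh(uint16_t requested,
					    uint16_t pmd_default)
{
	if (requested == DEMO_RX_FREE_THRESH_AUTO)
		return pmd_default;
	return requested;
}

/* Simplified precondition: only the minimum threshold is modelled here. */
static int demo_bulk_alloc_allowed(uint16_t rx_free_thresh)
{
	return rx_free_thresh >= DEMO_BULK_ALLOC_MIN_THRESH;
}
```

An application that passes the sentinel gets whatever the driver prefers, while an explicit value, including 0, is still honoured as given.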


[dpdk-dev] compile error with linuxapp-clang target on Fedora 20 with 3.15.10 kernel

2014-09-23 Thread Matthew Hall
I fixed one main libs bug which blocked compile that was trivial and got it 
applied. I had examples working too but using an impolite method of doing so.

As for the latest kernel stuff, it sounds like we have to get a hand from LKML 
or a sublist to figure it out, eh? Doesn't seem like it's in the DPDK code.

Matthew.
-- 
Sent from my mobile device.

On September 23, 2014 2:59:47 AM PDT, Bruce Richardson  wrote:
>On Mon, Sep 22, 2014 at 03:12:43PM -0700, Matthew Hall wrote:
>> On Mon, Sep 22, 2014 at 04:05:29PM -0400, Neil Horman wrote:
>> > On Mon, Sep 22, 2014 at 12:23:36PM -0700, Matthew Hall wrote:
>> > > I fixed some of the clang errors a few weeks ago. But some of my
>patches got sent back due to issues seen by others and I didn't have
>time to fix them yet.
>> > Can you elaborate on the specific issue here?
>> > Neil
>> 
>> Sure...
>> 
>> Have a look at this thread. With this, I got it compiling fine with
>Clang on 
>> Ubuntu 14.04 LTS.
>> 
>> Some of your stuff was funky kernel problems... I probably didn't get
>that as 
>> I was using an earlier kernel release.
>> 
>> One of the patches was merged as it was trivial but the others
>involved 
>> disabling some warnings on certain examples... but people said they
>preferred 
>> using ifdef's instead to fix them, which I didn't get a chance to do
>yet.
>> 
>> Maybe we could try and make all of these clang fixes happen together.
>I really 
>> value the better error messages, I can fix bugs much quicker with all
>of 
>> those.
>> 
>> Matthew.
>
>"make examples" on all the examples has failed for some time, but the 
>compilation of the main libs used to work. I've pulled down a 3.14
>kernel 
>for fedora from koji and confirmed that building with 
>"RTE_KERNELDIR=/usr/src/kernels/3.14.9-200.fc20.x86_64/" works fine.
>It's 
>something that has changed in 3.15 and beyond that is causing clang
>flags to 
>get passed in to gcc. I've confirmed that 3.16 also doesn't work.
>
>/Bruce



[dpdk-dev] [PATCH 1/4] compat: Add infrastructure to support symbol versioning

2014-09-23 Thread Neil Horman
On Tue, Sep 23, 2014 at 05:29:48PM +0100, Sergio Gonzalez Monroy wrote:
> On Tue, Sep 23, 2014 at 10:58:29AM -0400, Neil Horman wrote:
> > On Tue, Sep 23, 2014 at 11:39:29AM +0100, Sergio Gonzalez Monroy wrote:
> > > Hi Neil,
> > > 
> > > On Mon, Sep 15, 2014 at 03:23:48PM -0400, Neil Horman wrote:
> > > > Add initial pass header files to support symbol versioning.
> > > > 
> > > > Signed-off-by: Neil Horman 
> > > > CC: Thomas Monjalon 
> > > > CC: "Richardson, Bruce" 
> > > > ---
> > > >  lib/Makefile   |  1 +
> > > >  lib/librte_compat/Makefile | 38 +++
> > > >  lib/librte_compat/rte_compat.h | 86 
> > > > ++
> > > >  mk/rte.lib.mk  |  6 +++
> > > >  4 files changed, 131 insertions(+)
> > > >  create mode 100644 lib/librte_compat/Makefile
> > > >  create mode 100644 lib/librte_compat/rte_compat.h
> > > > 
> > > > diff --git a/lib/Makefile b/lib/Makefile
> > > > index 10c5bb3..a85b55b 100644
> > > > --- a/lib/Makefile
> > > > +++ b/lib/Makefile
> > > > @@ -32,6 +32,7 @@
> > > >  include $(RTE_SDK)/mk/rte.vars.mk
> > > >  
> > > >  DIRS-$(CONFIG_RTE_LIBC) += libc
> > > > +DIRS-y += librte_compat
> > > >  DIRS-$(CONFIG_RTE_LIBRTE_EAL) += librte_eal
> > > >  DIRS-$(CONFIG_RTE_LIBRTE_MALLOC) += librte_malloc
> > > >  DIRS-$(CONFIG_RTE_LIBRTE_RING) += librte_ring
> > > > diff --git a/lib/librte_compat/Makefile b/lib/librte_compat/Makefile
> > > > new file mode 100644
> > > > index 000..a61511a
> > > > --- /dev/null
> > > > +++ b/lib/librte_compat/Makefile
> > > > @@ -0,0 +1,38 @@
> > > > +#   BSD LICENSE
> > > > +#
> > > > +#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> > > > +#   All rights reserved.
> > > > +#
> > > > +#   Redistribution and use in source and binary forms, with or without
> > > > +#   modification, are permitted provided that the following conditions
> > > > +#   are met:
> > > > +#
> > > > +# * Redistributions of source code must retain the above copyright
> > > > +#   notice, this list of conditions and the following disclaimer.
> > > > +# * Redistributions in binary form must reproduce the above 
> > > > copyright
> > > > +#   notice, this list of conditions and the following disclaimer in
> > > > +#   the documentation and/or other materials provided with the
> > > > +#   distribution.
> > > > +# * Neither the name of Intel Corporation nor the names of its
> > > > +#   contributors may be used to endorse or promote products derived
> > > > +#   from this software without specific prior written permission.
> > > > +#
> > > > +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > > > +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > > > +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 
> > > > FOR
> > > > +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
> > > > COPYRIGHT
> > > > +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, 
> > > > INCIDENTAL,
> > > > +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > > > +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF 
> > > > USE,
> > > > +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 
> > > > ANY
> > > > +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > > > +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE 
> > > > USE
> > > > +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH 
> > > > DAMAGE.
> > > > +
> > > > +include $(RTE_SDK)/mk/rte.vars.mk
> > > > +
> > > > +
> > > > +# install includes
> > > > +SYMLINK-y-include := rte_compat.h
> > > > +
> > > > +include $(RTE_SDK)/mk/rte.lib.mk
> > > > diff --git a/lib/librte_compat/rte_compat.h 
> > > > b/lib/librte_compat/rte_compat.h
> > > > new file mode 100644
> > > > index 000..6d65a53
> > > > --- /dev/null
> > > > +++ b/lib/librte_compat/rte_compat.h
> > > > @@ -0,0 +1,86 @@
> > > > +/*-
> > > > + *   BSD LICENSE
> > > > + *
> > > > + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> > > > + *   All rights reserved.
> > > > + *
> > > > + *   Redistribution and use in source and binary forms, with or without
> > > > + *   modification, are permitted provided that the following conditions
> > > > + *   are met:
> > > > + *
> > > > + * * Redistributions of source code must retain the above copyright
> > > > + *   notice, this list of conditions and the following disclaimer.
> > > > + * * Redistributions in binary form must reproduce the above 
> > > > copyright
> > > > + *   notice, this list of conditions and the following disclaimer 
> > > > in
> > > > + *   the documentation and/or other materials provided with the
> > > > + *   distribution.
> > > > + * * Neither the name of Intel Corporation nor the names of its
> > > > + *   contributors may be used to endorse or pr

[dpdk-dev] KNI and memzones

2014-09-23 Thread Jay Rolette
I can't discuss product details openly yet, but I'm happy to have a
detailed discussion under NDA with Intel. In fact, we had an early NDA
discussion with Intel about it a few months ago.

That said, the use case isn't tied so closely to my product that I can't
describe it in general terms...

Imagine a box that installs in your network as a transparent
bump-in-the-wire. Traffic comes in port 1 and is processed by our
DPDK-based engine, then the packets are forwarded out port 2, where they
head to their original destination. From a network topology point of view,
the box is mostly invisible.

Same process applies for traffic going the other way (RX on port 2,
special-sauce processing in DPDK app, TX on port 1).

If you are familiar with network security products, this is very much how
IPS devices work.

Where KNI comes into play is for several user-space apps that need to use
the normal network stack (sockets) to communicate over the _same_ ports
used on the main data path. We use KNI to create a virtual port with an IP
address overlaid on the "invisible" data path ports.

This isn't just for control traffic. It's obviously not line-rate
processing, but we need to get all the bandwidth we can out of it.

Let me know if that makes sense or if I need to clarify some things. If
you'd rather continue this as an NDA discussion, just shoot me an email
directly.

Regards,
Jay



On Tue, Sep 23, 2014 at 11:38 AM, Zhou, Danny  wrote:

>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jay Rolette
> > Sent: Tuesday, September 23, 2014 8:39 PM
> > To: Marc Sune
> > Cc: ; dev-team at bisdn.de
> > Subject: Re: [dpdk-dev] KNI and memzones
> >
> > *> p.s. Lately someone involved with DPDK said KNI would be deprecated in
> > future DPDK releases; I haven't read or heard this before, is this
> > true? What would be the natural replacement then?*
> >
> > KNI is a non-trivial part of the product I'm in the process of building.
> > I'd appreciate someone "in the know" addressing this one please. Are there
> > specific roadmap plans relative to KNI that we need to be aware of?
> >
>
> KNI and multi-threaded KNI have several limitations:
> 1) Flow classification and packet distribution are both done in software,
> specifically in the KNI user space library, at the cost of CPU cycles.
> 2) Low performance: skb creation/free and packet copies between skb and mbuf
> hurt performance significantly.
> 3) Dedicated cores in user space and kernel space are responsible for rx/tx
> of packets between the DPDK app and the KNI device, which seems to me to
> waste too many core resources.
> 4) GPL license jail, as KNI sits in the kernel.
>
> We actually have a bifurcated driver prototype that meets both the high
> performance and the upstreamable requirements, and which is intended as an
> alternative to KNI. The idea is to leverage the NIC's flow director
> capability to bifurcate data plane packets to DPDK, while control plane
> packets, or whatever packets need to go through the kernel's TCP/IP stack,
> continue to be processed in the kernel (NIC driver + stack). Basically, the
> kernel NIC driver and DPDK co-exist to drive the same NIC device, but
> manipulate different rx/tx queue pairs. There is, though, a tough NIC
> control consistency issue which needs to be resolved and upstreamed to the
> kernel, and whose details I do not want to expose at the moment.
>
> IMHO, KNI should NOT be removed unless there is a really good user space,
> open-source and socket backward-compatible TCP/IP stack, which is unlikely
> to appear very soon.
> The bifurcated driver approach could certainly replace KNI for some use
> cases where DPDK does not own the NIC control.
>
> Do you mind sharing your KNI use case in more detail, to help determine
> whether the bifurcated driver could help?
>
> > Regards,
> > Jay
> >
> > On Tue, Sep 23, 2014 at 4:27 AM, Marc Sune  wrote:
> >
> > > Hi all,
> > >
> > > So we are having some problems with KNI. In short, we have a DPDK
> > > application that creates KNI interfaces and destroys them during its
> > > lifecycle and connecting them to DOCKER containers. Interfaces may
> > > eventually be even named the same (see below).
> > >
> > > We were wondering why, even after calling rte_kni_release(), the
> > > hugepages memory was rapidly being exhausted, and we also realised that
> > > even after destruction, you cannot use the same name for the interface.
> > >
> > > After close inspection of the rte_kni lib, we think the core issue is
> > > mostly a design issue. rte_kni_alloc ends up calling
> > > kni_memzone_reserve(), which in the end calls rte_memzone_reserve(), and
> > > that memory cannot be unreserved by rte_kni_release() (by design of
> > > memzones). The exhaustion is rapid due to the number of FIFOs created (6).
> > >
> > > If this would be right, we would propose and try to provide a patch as
> > > follows:
> > >
> > > * Create a new rte_kni_init(unsigned int max_knis);
> > >
> > > This would preallocate all the FIFO rings(TX, RX, ALLOC, FREE, 

[dpdk-dev] Can not init NIC after merge to DPDK 1.7 problem

2014-09-23 Thread Wang, Shawn
Hi:

We are using our own Makefile to build our DPDK program. Recently we have been
working on upgrading from DPDK 1.3 to DPDK 1.7. I found that rte_ixgbe_pmd_init
has been replaced by PMD_REGISTER_DRIVER, so I deleted the rte_ixgbe_pmd_init
calls. But after that, our DPDK program could not correctly find the NIC
anymore. After digging into it a little more, I found the code does not
correctly register the driver type we are using, which is ixgbe.
To isolate the problem, I hacked the small l3fwd example so that it only has a
main.c file like this, for testing purposes.

#include <stdio.h>
#include <rte_eal.h>

#include "main.h"

int
MAIN(int argc, char **argv)
{
/* init EAL */
int ret = rte_eal_init(argc, argv);
printf("ret %d\n", ret);
return 0;
}

I found that if I use the Makefile provided in the example, the program will
find the ixgbe NIC. But if I just use these 2 commands to compile and link it,
it will not find the ixgbe NIC.

gcc -I../../x86_64-native-linuxapp-gcc/include 
-L../../x86_64-native-linuxapp-gcc/lib -lrte_eal -c main.c
gcc -o l3fwd main.o -L../../x86_64-native-linuxapp-gcc/lib -lrte_eal 
-lrte_distributor -lrte_pipeline -lrte_port -lrte_timer -lrte_hash -lrte_acl 
-lm -lrt -lrte_mbuf -lethdev -lrte_malloc -lrte_mempool -lrte_ring -lc -lm 
-lrte_cmdline -lrte_cfgfile -lrte_pmd_bond -lrte_pmd_ixgbe -lrte_pmd_e1000 
-lrte_pmd_ring -lpthread -ldl -lrt

Can someone shed some light on what magic in the DPDK Makefile correctly 
registers the NIC type?

Thank you so much.
Xingbo Wang


[dpdk-dev] Can not init NIC after merge to DPDK 1.7 problem

2014-09-23 Thread Neil Horman
On Tue, Sep 23, 2014 at 06:53:57PM +, Wang, Shawn wrote:
> Hi:
> 
> We are using our own Makefile in building dpdk program. Recently we are 
> working on upgrading from DPDK 1.3 to DPDK 1.7. I found the 
> rte_ixgbe_pmd_init has been replaced by PMD_REGISTER_DRIVER. So I delete 
> rte_ixgbe_pmd_init calls. But after that, our dpdk program could not 
> correctly find the NIC anymore. After digging into it a little more, I found 
> the code dose not correctly register the driver type we are using, which is 
> ixgbe.
> To isolate the problem, I hacked a smal example l3fwd, and only have the 
> main.c file like this for my testing purpose.
> 
> #include 
> #include 
> 
> #include "main.h"
> 
> int
> MAIN(int argc, char **argv)
> {
> /* init EAL */
> int ret = rte_eal_init(argc, argv);
> printf("ret %d\n", ret);
> return 0;
> }
> 
> I found if I use the Makefile provided in the example, the program will find 
> the ixgbe NIC. But if I just use these 2 commands to compile and link it. It 
> will not find the ixgbe NIC.
> 
> gcc -I../../x86_64-native-linuxapp-gcc/include 
> -L../../x86_64-native-linuxapp-gcc/lib -lrte_eal -c main.c
> gcc -o l3fwd main.o -L../../x86_64-native-linuxapp-gcc/lib -lrte_eal 
> -lrte_distributor -lrte_pipeline -lrte_port -lrte_timer -lrte_hash -lrte_acl 
> -lm -lrt -lrte_mbuf -lethdev -lrte_malloc -lrte_mempool -lrte_ring -lc -lm 
> -lrte_cmdline -lrte_cfgfile -lrte_pmd_bond -lrte_pmd_ixgbe -lrte_pmd_e1000 
> -lrte_pmd_ring -lpthread -ldl -lrt
> 
> Can someone share some light on what is magic of the dpdk Makefile to 
> correctly register the NIC type?
> 
> Thank you so much.
> Xingbo Wang
> 

I'm not really sure why you would strip out the DPDK Makefiles, but I suppose
that's not the germane question.

First, how are you building the DPDK?  As a set of shared libraries, or as a set
of static archives?  If you're building shared libraries, you need to pass
-shared to gcc, or the constructors will get stripped out using your command
line above.  There might be some other options that escape me, but you can find
out for sure by using the packaged makefiles and running make V=1 to see all the
options passed in the link stage.

Secondly, when you say register the NIC type, do you mean that you don't see the
NIC get registered with dpdk, or you don't see an instance of the NIC created?
If it's the former, you need to confirm that by running a debugger and looking at
what elements are on the device_list after your application starts.  If it's the
latter, that may well be a config error, as you may need to pass the --whitelist
option on the command line to trigger a device probe.

Neil



[dpdk-dev] KNI and memzones

2014-09-23 Thread Zhou, Danny
It looks like a typical network middlebox use case with IDS/IPS/DPI-style 
functionality. Good-enough performance rather than line-rate performance 
should be OK for this case, and multi-threaded KNI (multiple software rx/tx 
queues established between DPDK and a single vEth netdev, with multiple 
kernel threads affinitized to several lcores) should fit, with linear 
performance scaling if you can allocate multiple lcores to achieve satisfactory 
throughput for relatively big packets.

Since NIC control is still in DPDK's PMD for this case, the bifurcated driver 
does not fit, unless you only use DPDK to rx/tx packets in your box.

From: Jay Rolette [mailto:role...@infiniteio.com]
Sent: Wednesday, September 24, 2014 2:53 AM
To: Zhou, Danny
Cc: Marc Sune; ; dev-team at bisdn.de
Subject: Re: [dpdk-dev] KNI and memzones

I can't discuss product details openly yet, but I'm happy to have a detailed 
discussion under NDA with Intel. In fact, we had an early NDA discussion with 
Intel about it a few months ago.

That said, the use case isn't tied so closely to my product that I can't 
describe it in general terms...

Imagine a box that installs in your network as a transparent bump-in-the-wire. 
Traffic comes in port 1 and is processed by our DPDK-based engine, then the 
packets are forwarded out port 2, where they head to their original 
destination. From a network topology point of view, the box is mostly invisible.

Same process applies for traffic going the other way (RX on port 2, 
special-sauce processing in DPDK app, TX on port 1).

If you are familiar with network security products, this is very much how IPS 
devices work.

Where KNI comes into play is for several user-space apps that need to use the 
normal network stack (sockets) to communicate over the _same_ ports used on the 
main data path. We use KNI to create a virtual port with an IP address overlaid 
on the "invisible" data path ports.

This isn't just for control traffic. It's obviously not line-rate processing, 
but we need to get all the bandwidth we can out of it.

Let me know if that makes sense or if I need to clarify some things. If you'd 
rather continue this as an NDA discussion, just shoot me an email directly.

Regards,
Jay



On Tue, Sep 23, 2014 at 11:38 AM, Zhou, Danny <danny.zhou at intel.com> wrote:

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On 
> Behalf Of Jay Rolette
> Sent: Tuesday, September 23, 2014 8:39 PM
> To: Marc Sune
> Cc: <dev at dpdk.org>; dev-team at bisdn.de
> Subject: Re: [dpdk-dev] KNI and memzones
>
> *> p.s. Lately someone involved with DPDK said KNI would be deprecated in
> future DPDK releases; I haven't read or listen to this before, is this
> true? What would be the natural replacement then?*
>
> KNI is a non-trivial part of the product I'm in the process of building.
> I'd appreciate someone "in the know" addressing this one please. Are there
> specific roadmap plans relative to KNI that we need to be aware of?
>

KNI and multi-threaded KNI have several limitations:
1) Flow classification and packet distribution are both done in software, 
specifically in the KNI user space library, at the cost of CPU cycles.
2) Low performance: skb creation/free and packet copies between skb and mbuf 
hurt performance significantly.
3) Dedicated cores in user space and kernel space are responsible for rx/tx of 
packets between the DPDK app and the KNI device, which seems to me to waste too 
many core resources.
4) GPL license jail, as KNI sits in the kernel.

We actually have a bifurcated driver prototype that meets both the 
high-performance and upstreamable requirements, and which we treat as an 
alternative to KNI. The idea is to leverage the NIC's flow director capability 
to bifurcate data plane packets to DPDK, while control plane packets, or 
whatever packets need to go through the kernel's TCP/IP stack, continue to be 
processed in the kernel (NIC driver + stack). Basically, the kernel NIC driver 
and DPDK co-exist to drive the same NIC device, but manipulate different rx/tx 
queue pairs. There is, though, a tough NIC control consistency issue which needs 
to be resolved and upstreamed to the kernel, and whose details I do not want to 
expose at the moment.

IMHO, KNI should NOT be removed unless there is a really good user space, 
open-source and socket backward-compatible TCP/IP stack, which is not likely to 
become true very soon.
The bifurcated driver approach could certainly replace KNI for some use cases 
where DPDK does not own the NIC control.

Do you mind sharing your KNI use case in more detail to help determine whether 
the bifurcated driver could help?

> Regards,
> Jay
>
> On Tue, Sep 23, 2014 at 4:27 AM, Marc Sune  bisdn.de> wrote:
>
> > Hi all,
> >
> > So we are having some problems with KNI. In short, we have a DPDK
> > application that creates KNI interfaces and destroys them during its
> > lifecycle and connecting them to DOCKER containers

[dpdk-dev] KNI and memzones

2014-09-23 Thread Jay Rolette
Yep, good way to describe it. Not really related to network security
functions but very similar architecture.

On Tue, Sep 23, 2014 at 2:12 PM, Zhou, Danny  wrote:

>  It looks like a typical network middle box usage with IDS/IPS/DPI sort
> of functionalities.  Good enough performance rather than line-rate
> performance should be ok for this case, and multi-threaded KNI(multiple
> software rx/tx queues are established between DPDK and a single vEth netdev
> with multiple kernel threads affinities to several lcores) should fit, with
> linear performance scaling if you can allocate multiple lcores to achieve
> satisfied throughput for relatively big packets.
>
>
>
> Since NIC control is still in DPDK's PMD for this case, bifurcated driver
> does not fit, unless you only use DPDK to rx/tx packets in your box.
>
>
>
> *From:* Jay Rolette [mailto:rolette at infiniteio.com]
> *Sent:* Wednesday, September 24, 2014 2:53 AM
> *To:* Zhou, Danny
> *Cc:* Marc Sune; ; dev-team at bisdn.de
>
> *Subject:* Re: [dpdk-dev] KNI and memzones
>
>
>
> I can't discuss product details openly yet, but I'm happy to have a
> detailed discussion under NDA with Intel. In fact, we had an early NDA
> discussion with Intel about it a few months ago.
>
>
>
> That said, the use case isn't tied so closely to my product that I can't
> describe it in general terms...
>
>
>
> Imagine a box that installs in your network as a transparent
> bump-in-the-wire. Traffic comes in port 1 and is processed by our
> DPDK-based engine, then the packets are forwarded out port 2, where they
> head to their original destination. From a network topology point of view,
> the box is mostly invisible.
>
>
>
> Same process applies for traffic going the other way (RX on port 2,
> special-sauce processing in DPDK app, TX on port 1).
>
>
>
> If you are familiar with network security products, this is very much how
> IPS devices work.
>
>
>
> Where KNI comes into play is for several user-space apps that need to use
> the normal network stack (sockets) to communicate over the _same_ ports
> used on the main data path. We use KNI to create a virtual port with an IP
> address overlaid on the "invisible" data path ports.
>
>
>
> This isn't just for control traffic. It's obviously not line-rate
> processing, but we need to get all the bandwidth we can out of it.
>
>
>
> Let me know if that makes sense or if I need to clarify some things. If
> you'd rather continue this as an NDA discussion, just shoot me an email
> directly.
>
>
>
> Regards,
>
> Jay
>
>
>
>
>
>
>
> On Tue, Sep 23, 2014 at 11:38 AM, Zhou, Danny 
> wrote:
>
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jay Rolette
> > Sent: Tuesday, September 23, 2014 8:39 PM
> > To: Marc Sune
> > Cc: ; dev-team at bisdn.de
> > Subject: Re: [dpdk-dev] KNI and memzones
> >
> > *> p.s. Lately someone involved with DPDK said KNI would be deprecated in
> > future DPDK releases; I haven't read or listen to this before, is this
> > true? What would be the natural replacement then?*
> >
> > KNI is a non-trivial part of the product I'm in the process of building.
> > I'd appreciate someone "in the know" addressing this one please. Are
> there
> > specific roadmap plans relative to KNI that we need to be aware of?
> >
>
> KNI and multi-threaded KNI has several limitation:
> 1) Flow classification and packet distribution are done both software,
> specifically KNI user space library, at the cost of CPU cycles.
> 2) Low performance, skb creation/free and packetscopy between skb and mbuf
> kills performance significantly.
> 3) Dedicate cores in user space and kernel space responsible for rx/tx
> packets between DPDK App and KNI device, it seems to me waste too many core
> resources.
> 4) GPL license jail as KNI sits in kernel.
>
> We actually have a bifurcated driver prototype that meets both high
> performance and upstreamable requirement, which is treated as alternative
> solution of KNI. The idea is to
> leverage NIC' flow director capability to bifurcate data plane packets to
> DPDK and keep control plane packets or whatever packets need to go through
> kernel' TCP/IP stack remains
> being processed in kernel(NIC driver + stack). Basically, kernel NIC
> driver and DPDK co-exists to driver a same NIC device, but manipulate
> different rx/tx queue pairs. Though there is some
> tough consistent NIC control issue which needs to be resolved and
> upstreamed to kernel, which I do not want to expose details at the moment.
>
> IMHO, KNI should NOT be removed unless there is a really good user space,
> open-source and socket backward-compatible() TCP/IP stack which should not
> become true very soon.
> The bifurcated driver approach could certainly replace KNI for some use
> cases where DPDK does not own the NIC control.
>
> Do you mind share your KNI use case in more details to help determine
> whether bifurcate driver could help with?
>
>
> > Regards,
> > Jay
> >
> > On Tue, Sep 23, 2014 at 4:27

[dpdk-dev] Can not init NIC after merge to DPDK 1.7 problem

2014-09-23 Thread Matthew Hall
On Tue, Sep 23, 2014 at 06:53:57PM +, Wang, Shawn wrote:
> Can someone share some light on what is magic of the dpdk Makefile to 
> correctly register the NIC type?

I had the same problem as a guy who began using it before the auto-reg, 
stopped a while, and began again after.

You have to pass the following GNU LD option:

--whole-archive

Matthew.


[dpdk-dev] KNI and memzones

2014-09-23 Thread Marc Sune
Danny,

On 23/09/14 18:38, Zhou, Danny wrote:
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jay Rolette
>> Sent: Tuesday, September 23, 2014 8:39 PM
>> To: Marc Sune
>> Cc: ; dev-team at bisdn.de
>> Subject: Re: [dpdk-dev] KNI and memzones
>>
>> *> p.s. Lately someone involved with DPDK said KNI would be deprecated in
>> future DPDK releases; I haven't read or listen to this before, is this
>> true? What would be the natural replacement then?*
>>
>> KNI is a non-trivial part of the product I'm in the process of building.
>> I'd appreciate someone "in the know" addressing this one please. Are there
>> specific roadmap plans relative to KNI that we need to be aware of?
>>
> KNI and multi-threaded KNI has several limitation:
> 1) Flow classification and packet distribution are done both software, 
> specifically KNI user space library, at the cost of CPU cycles.
> 2) Low performance, skb creation/free and packetscopy between skb and mbuf 
> kills performance significantly.
> 3) Dedicate cores in user space and kernel space responsible for rx/tx 
> packets between DPDK App and KNI device, it seems to me waste too many core 
> resources.
> 4) GPL license jail as KNI sits in kernel.
>
> We actually have a bifurcated driver prototype that meets both high 
> performance and upstreamable requirement, which is treated as alternative 
> solution of KNI. The idea is to
> leverage NIC' flow director capability to bifurcate data plane packets to 
> DPDK and keep control plane packets or whatever packets need to go through 
> kernel' TCP/IP stack remains
> being processed in kernel(NIC driver + stack). Basically, kernel NIC driver 
> and DPDK co-exists to driver a same NIC device, but manipulate different 
> rx/tx queue pairs. Though there is some
> tough consistent NIC control issue which needs to be resolved and upstreamed 
> to kernel, which I do not want to expose details at the moment.
>
> IMHO, KNI should NOT be removed unless there is a really good user space, 
> open-source and socket backward-compatible() TCP/IP stack which should not 
> become true very soon.
> The bifurcated driver approach could certainly replace KNI for some use cases 
> where DPDK does not own the NIC control.
>
> Do you mind share your KNI use case in more details to help determine whether 
> bifurcate driver could help with?
I don't know if your question was (also) directed to me, but I will give 
an explanation, as short as I can, to put the problem in context, since 
the KNI issue is still open to me.

The use case is a set of experimental (still, though close to stable) 
extensions to xDPd's[1] multi-platform OpenFlow switch, as well as an 
orchestration framework prototype, developed by Politecnico di 
Torino & BISDN to support the deployment of NF graphs, in the framework of 
the UNIFY FP7 [2] research project (of which, btw, Intel is a partner). This 
prototype was publicly demoed at EWSDN'14. The switch extensions are 
already public, but still in a development branch. We would like to 
merge them into mainline, but we need to fix some issues first, the major 
one being this KNI problem. The code for the orchestration will be public soon 
too.

The idea is that the standard xdpd, in the gnu-linux-dpdk platform, is 
enhanced by creating and destroying virtual ports that hide behind VNFs. 
From the perspective of OF, these are just ports, so the OF controller 
can distribute traffic across VNFs and other OF LSIs (~virtual switches) 
via regular OF flowmods outputting there, and compose complex VNF 
graphs. We have implemented 3 types of ports: a) NATIVE, meaning the 
function is a DPDK primary process function (this is a placeholder for 
future work); b) SHMEM, meaning a VNF implemented as a secondary process 
communicating via rte_ring buffers (implemented, but needs to be profiled); 
and c) EXTERNAL ports, currently implemented using KNI interfaces.

Although KNI imposes performance penalties, it is still interesting for 
legacy applications that can be reused without any change, using DOCKER 
or other containers, as well as for low-performance functions. virtio for VMs 
is also a next step.

The bifurcated driver approach is something that would fit quite 
straightforwardly, since the HW hooks (used in ASICs and other HW accel) 
when installing flowmods can capture this and configure the NICs to 
shortcut the sw OF processing and send pkts directly to the VNF ports, 
if the flow matches flow director restrictions. But I am not sure about the 
way back to the switch, i.e. from the kernel to the PHY or other kernels, 
since there is no flow director. In any case, this is something we 
already had in the mid-term roadmap for the normal OF switch without the 
VNF port extensions.

I wouldn't want to go deeper into the specifics of the use case, because 
the important topic here is actually the librte_kni implementation. So 
please let me know if some of you would be interested in further 
details, also @Jay since the use-case 

[dpdk-dev] Can not init NIC after merge to DPDK 1.7 problem

2014-09-23 Thread Sanford, Robert
We ran into a similar problem when migrating to 1.7.
Here are the subtle flags, in dpdk/mk/rte.app.mk, that we needed:

LDLIBS += --whole-archive
...
LDLIBS += --no-whole-archive

This apparently tells the linker to pull in whole archive(s), even if it
thinks that we don't need all objects.
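
As a rough illustration of why this matters (a sketch only, with made-up
demo_* names; this is not the actual PMD_REGISTER_DRIVER macro or the DPDK
driver structures): as far as I understand it, driver registration in 1.7
happens from a constructor function that lives in the PMD's own object file.
Nothing in the application refers to that object by symbol, so when it sits
in a static archive the linker has no reason to pull it in, and the
constructor never runs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a driver record; not the real DPDK struct. */
struct demo_driver {
    const char *name;
    struct demo_driver *next;
};

/* Head of the list of registered drivers. */
static struct demo_driver *demo_driver_list;

/* Push a driver onto the global list. */
static void demo_register(struct demo_driver *drv)
{
    drv->next = demo_driver_list;
    demo_driver_list = drv;
}

/*
 * Hypothetical macro imitating the idea of constructor-based registration:
 * it emits a function that runs before main() and registers the driver.
 * Requires GCC or Clang (constructor attribute).
 */
#define DEMO_REGISTER_DRIVER(d) \
    static void __attribute__((constructor)) demo_init_##d(void) \
    { demo_register(&d); }

/* In a real PMD this would live in the driver's own .o inside the archive. */
static struct demo_driver demo_ixgbe = { .name = "demo_ixgbe" };
DEMO_REGISTER_DRIVER(demo_ixgbe)

/* Look up a driver by name; returns NULL if it was never registered. */
static const struct demo_driver *demo_find(const char *name)
{
    const struct demo_driver *d;

    for (d = demo_driver_list; d != NULL; d = d->next)
        if (strcmp(d->name, name) == 0)
            return d;
    return NULL;
}
```

Here everything is in one translation unit, so the constructor always runs
and demo_find("demo_ixgbe") succeeds when called from main(). Move the last
part into a separate object inside a .a file and link without
--whole-archive, and the lookup would be expected to fail, because that
object is never linked in.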




--
Regards,
Robert


>Hi:
>
>We are using our own Makefile in building dpdk program. Recently we are
>working on upgrading from DPDK 1.3 to DPDK 1.7. I found the
>rte_ixgbe_pmd_init has been replaced by PMD_REGISTER_DRIVER. So I delete
>rte_ixgbe_pmd_init calls. But after that, our dpdk program could not
>correctly find the NIC anymore. After digging into it a little more, I
>found the code dose not correctly register the driver type we are using,
>which is ixgbe.
>To isolate the problem, I hacked a smal example l3fwd, and only have the
>main.c file like this for my testing purpose.
>
>#include 
>#include 
>
>#include "main.h"
>
>int
>MAIN(int argc, char **argv)
>{
>/* init EAL */
>int ret = rte_eal_init(argc, argv);
>printf("ret %d\n", ret);
>return 0;
>}
>
>I found if I use the Makefile provided in the example, the program will
>find the ixgbe NIC. But if I just use these 2 commands to compile and
>link it. It will not find the ixgbe NIC.
>
>gcc -I../../x86_64-native-linuxapp-gcc/include
>-L../../x86_64-native-linuxapp-gcc/lib -lrte_eal -c main.c
>gcc -o l3fwd main.o -L../../x86_64-native-linuxapp-gcc/lib -lrte_eal
>-lrte_distributor -lrte_pipeline -lrte_port -lrte_timer -lrte_hash
>-lrte_acl -lm -lrt -lrte_mbuf -lethdev -lrte_malloc -lrte_mempool
>-lrte_ring -lc -lm -lrte_cmdline -lrte_cfgfile -lrte_pmd_bond
>-lrte_pmd_ixgbe -lrte_pmd_e1000 -lrte_pmd_ring -lpthread -ldl -lrt
>
>Can someone share some light on what is magic of the dpdk Makefile to
>correctly register the NIC type?
>
>Thank you so much.
>Xingbo Wang



[dpdk-dev] Can not init NIC after merge to DPDK 1.7 problem

2014-09-23 Thread Wang, Shawn
This does resolve the problem.

Thank you so much.


From: Sanford, Robert [rsanf...@akamai.com]
Sent: Tuesday, September 23, 2014 2:50 PM
To: Wang, Shawn; dev at dpdk.org
Subject: Re: [dpdk-dev] Can not init NIC after merge to DPDK 1.7 problem

We ran into a similar problem when migrating to 1.7.
Here are the subtle flags, in dpdk/mk/rte.app.mk, that we needed:

LDLIBS += --whole-archive
...
LDLIBS += --no-whole-archive

This apparently tells the linker to pull in whole archive(s), even if it
thinks that we don't need all objects.




--
Regards,
Robert


>Hi:
>
>We are using our own Makefile in building dpdk program. Recently we are
>working on upgrading from DPDK 1.3 to DPDK 1.7. I found the
>rte_ixgbe_pmd_init has been replaced by PMD_REGISTER_DRIVER. So I delete
>rte_ixgbe_pmd_init calls. But after that, our dpdk program could not
>correctly find the NIC anymore. After digging into it a little more, I
>found the code dose not correctly register the driver type we are using,
>which is ixgbe.
>To isolate the problem, I hacked a smal example l3fwd, and only have the
>main.c file like this for my testing purpose.
>
>#include 
>#include 
>
>#include "main.h"
>
>int
>MAIN(int argc, char **argv)
>{
>/* init EAL */
>int ret = rte_eal_init(argc, argv);
>printf("ret %d\n", ret);
>return 0;
>}
>
>I found if I use the Makefile provided in the example, the program will
>find the ixgbe NIC. But if I just use these 2 commands to compile and
>link it. It will not find the ixgbe NIC.
>
>gcc -I../../x86_64-native-linuxapp-gcc/include
>-L../../x86_64-native-linuxapp-gcc/lib -lrte_eal -c main.c
>gcc -o l3fwd main.o -L../../x86_64-native-linuxapp-gcc/lib -lrte_eal
>-lrte_distributor -lrte_pipeline -lrte_port -lrte_timer -lrte_hash
>-lrte_acl -lm -lrt -lrte_mbuf -lethdev -lrte_malloc -lrte_mempool
>-lrte_ring -lc -lm -lrte_cmdline -lrte_cfgfile -lrte_pmd_bond
>-lrte_pmd_ixgbe -lrte_pmd_e1000 -lrte_pmd_ring -lpthread -ldl -lrt
>
>Can someone share some light on what is magic of the dpdk Makefile to
>correctly register the NIC type?
>
>Thank you so much.
>Xingbo Wang