[dpdk-dev] [PATCH 0/4] enhancement to i40e PF host driver

2017-01-03 Thread Chen Jing D(Mark)
The current PF host driver serves DPDK VFs well, but the
implementation is not complete enough to support Linux VFs,
even though both DPDK and Linux VFs use the same API set.

This patch set makes the following changes:
1. Enhance the interface that serves VFs, so that
   both Linux and DPDK VFs are served properly.
2. Change the API version number so that both DPDK and Linux
   VFs can recognize it and select the proper commands and
   data structures to request service. The cost is that
   a DPDK VF can no longer identify the host driver
   (Linux or DPDK), so the extended functions provided
   by the DPDK PF host driver can't be used.
   This will change once a better mechanism for
   identifying both PF and VF has been negotiated
   with the Linux maintainers.

Chen Jing D(Mark) (4):
  net/i40e: change version number to support Linux VF
  net/i40e: return correct VSI id
  net/i40e: parse more VF parameter and configure
  net/i40e: support Linux VF to configure IRQ link list

 drivers/net/i40e/i40e_pf.c |  174 +++-
 1 files changed, 156 insertions(+), 18 deletions(-)

-- 
1.7.7.6



[dpdk-dev] [PATCH 1/4] net/i40e: change version number to support Linux VF

2017-01-03 Thread Chen Jing D(Mark)
The i40e PF host only supports working with the DPDK VF driver; the Linux
VF driver is not supported. This change adjusts the version
number returned to the VF.

The version info currently returned cannot be recognized
by the Linux VF driver, so change it to values that both the DPDK VF and
Linux VF drivers recognize.

The cost is that the original DPDK host specific features, such as
CFG_VLAN_PVID and CONFIG_VSI_QUEUES_EXT, will no longer be available.

The DPDK VF also can't identify the host driver from the version number
returned. It always assumes it is talking to a Linux PF.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/i40e/i40e_pf.c |   16 ++--
 1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/net/i40e/i40e_pf.c b/drivers/net/i40e/i40e_pf.c
index 97b8ecc..229f71a 100644
--- a/drivers/net/i40e/i40e_pf.c
+++ b/drivers/net/i40e/i40e_pf.c
@@ -278,8 +278,20 @@
 {
struct i40e_virtchnl_version_info info;
 
-   info.major = I40E_DPDK_VERSION_MAJOR;
-   info.minor = I40E_DPDK_VERSION_MINOR;
+   /* Respond like a Linux PF host in order to support both DPDK VF and
+* Linux VF driver. The expense is original DPDK host specific feature
+* like CFG_VLAN_PVID and CONFIG_VSI_QUEUES_EXT will not available.
+*
+* DPDK VF also can't identify host driver by version number returned.
+* It always assume talking with Linux PF.
+*
+* TODO:
+* Discuss with Linux driver maintainer if possible to carry more info
+* in this function to identify it's Linux or DPDK host.
+*/
+   info.major = I40E_VIRTCHNL_VERSION_MAJOR;
+   info.minor = I40E_VIRTCHNL_VERSION_MINOR_NO_VF_CAPS;
+
i40e_pf_host_send_msg_to_vf(vf, I40E_VIRTCHNL_OP_VERSION,
I40E_SUCCESS, (uint8_t *)&info, sizeof(info));
 }
-- 
1.7.7.6



[dpdk-dev] [PATCH 3/4] net/i40e: parse more VF parameter and configure

2017-01-03 Thread Chen Jing D(Mark)
When a VF requests TX queue configuration, a few parameters are
not configured in the PF host. This change parses and configures
more fields for the TX context.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/i40e/i40e_pf.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/net/i40e/i40e_pf.c b/drivers/net/i40e/i40e_pf.c
index 5314d9f..75c5f03 100644
--- a/drivers/net/i40e/i40e_pf.c
+++ b/drivers/net/i40e/i40e_pf.c
@@ -406,10 +406,12 @@
 
/* clear the context structure first */
memset(&tx_ctx, 0, sizeof(tx_ctx));
-   tx_ctx.new_context = 1;
tx_ctx.base = txq->dma_ring_addr / I40E_QUEUE_BASE_ADDR_UNIT;
tx_ctx.qlen = txq->ring_len;
tx_ctx.rdylist = rte_le_to_cpu_16(vf->vsi->info.qs_handle[0]);
+   tx_ctx.head_wb_ena = txq->headwb_enabled;
+   tx_ctx.head_wb_addr = txq->dma_headwb_addr;
+
err = i40e_clear_lan_tx_queue_context(hw, abs_queue_id);
if (err != I40E_SUCCESS)
return err;
-- 
1.7.7.6



[dpdk-dev] [PATCH 2/4] net/i40e: return correct VSI id

2017-01-03 Thread Chen Jing D(Mark)
The PF host didn't return the correct VSI id to the VF.
This change fixes it.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/i40e/i40e_pf.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/net/i40e/i40e_pf.c b/drivers/net/i40e/i40e_pf.c
index 229f71a..5314d9f 100644
--- a/drivers/net/i40e/i40e_pf.c
+++ b/drivers/net/i40e/i40e_pf.c
@@ -335,8 +335,7 @@
 
/* Change below setting if PF host can support more VSIs for VF */
vf_res->vsi_res[0].vsi_type = I40E_VSI_SRIOV;
-   /* As assume Vf only has single VSI now, always return 0 */
-   vf_res->vsi_res[0].vsi_id = 0;
+   vf_res->vsi_res[0].vsi_id = vf->vsi->vsi_id;
vf_res->vsi_res[0].num_queue_pairs = vf->vsi->nb_qps;
ether_addr_copy(&vf->mac_addr,
(struct ether_addr *)vf_res->vsi_res[0].default_mac_addr);
-- 
1.7.7.6



[dpdk-dev] [PATCH 4/4] net/i40e: support Linux VF to configure IRQ link list

2017-01-03 Thread Chen Jing D(Mark)
The i40e PF host only supports working with the DPDK VF driver; the Linux
VF driver is not supported. This change enhances the
configuration of the IRQ link list.

This change identifies the VF client by the number of vectors
requested. A DPDK VF asks for a single vector, while a Linux VF
requests at least 2. Each client gets a different
configuration: for a DPDK VF all queues are linked
together, while a Linux VF is configured per request.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/i40e/i40e_pf.c |  151 
 1 files changed, 138 insertions(+), 13 deletions(-)

diff --git a/drivers/net/i40e/i40e_pf.c b/drivers/net/i40e/i40e_pf.c
index 75c5f03..eee5e85 100644
--- a/drivers/net/i40e/i40e_pf.c
+++ b/drivers/net/i40e/i40e_pf.c
@@ -552,13 +552,115 @@
return ret;
 }
 
+static void
+i40e_pf_config_irq_link_list(struct i40e_pf_vf *vf,
+ struct i40e_virtchnl_vector_map *vvm)
+{
+   uint64_t linklistmap = 0, tempmap;
+   struct i40e_hw *hw = I40E_PF_TO_HW(vf->pf);
+   uint16_t qid;
+   bool b_first_q = true;
+   enum i40e_queue_type qtype;
+   uint16_t vector_id;
+   uint32_t reg, reg_idx;
+   uint16_t itr_idx = 0, i;
+
+   vector_id = vvm->vector_id;
+   /* setup the head */
+   if (!vector_id)
+   reg_idx = I40E_VPINT_LNKLST0(vf->vf_idx);
+   else
+   reg_idx = I40E_VPINT_LNKLSTN(
+   ((hw->func_caps.num_msix_vectors_vf - 1) * vf->vf_idx)
+   + (vector_id - 1));
+
+   if (vvm->rxq_map == 0 && vvm->txq_map == 0) {
+   I40E_WRITE_REG(hw, reg_idx,
+   I40E_VPINT_LNKLST0_FIRSTQ_INDX_MASK);
+   goto cfg_irq_done;
+   }
+
+   /* sort all rx and tx queues */
+   tempmap = vvm->rxq_map;
+   for (i = 0; i < sizeof(vvm->rxq_map) * 8; i++) {
+   if (tempmap & 0x1)
+   linklistmap |= (1ULL << (2 * i));
+   tempmap >>= 1;
+   }
+
+   tempmap = vvm->txq_map;
+   for (i = 0; i < sizeof(vvm->txq_map) * 8; i++) {
+   if (tempmap & 0x1)
+   linklistmap |= (1ULL << (2 * i + 1));
+   tempmap >>= 1;
+   }
+
+   /* Link all rx and tx queues into a chained list */
+   tempmap = linklistmap;
+   i = 0;
+   b_first_q = true;
+   do {
+   if (tempmap & 0x1) {
+   qtype = (enum i40e_queue_type)(i % 2);
+   qid = vf->vsi->base_queue + i / 2;
+   if (b_first_q) {
+   /* This is header */
+   b_first_q = false;
+   reg = ((qtype <<
+   I40E_VPINT_LNKLSTN_FIRSTQ_TYPE_SHIFT)
+   | qid);
+   } else {
+   /* element in the link list */
+   reg = (vector_id) |
+   (qtype << I40E_QINT_RQCTL_NEXTQ_TYPE_SHIFT) |
+   (qid << I40E_QINT_RQCTL_NEXTQ_INDX_SHIFT) |
+   BIT(I40E_QINT_RQCTL_CAUSE_ENA_SHIFT) |
+   (itr_idx << I40E_QINT_RQCTL_ITR_INDX_SHIFT);
+   }
+   I40E_WRITE_REG(hw, reg_idx, reg);
+   /* find next register to program */
+   switch (qtype) {
+   case I40E_QUEUE_TYPE_RX:
+   reg_idx = I40E_QINT_RQCTL(qid);
+   itr_idx = vvm->rxitr_idx;
+   break;
+   case I40E_QUEUE_TYPE_TX:
+   reg_idx = I40E_QINT_TQCTL(qid);
+   itr_idx = vvm->txitr_idx;
+   break;
+   default:
+   break;
+   }
+   }
+   i++;
+   tempmap >>= 1;
+   } while (tempmap);
+
+   /* Terminate the link list */
+   reg = (vector_id) |
+   (0 << I40E_QINT_RQCTL_NEXTQ_TYPE_SHIFT) |
+   (0x7FF << I40E_QINT_RQCTL_NEXTQ_INDX_SHIFT) |
+   BIT(I40E_QINT_RQCTL_CAUSE_ENA_SHIFT) |
+   (itr_idx << I40E_QINT_RQCTL_ITR_INDX_SHIFT);
+   I40E_WRITE_REG(hw, reg_idx, reg);
+
+cfg_irq_done:
+   I40E_WRITE_FLUSH(hw);
+}
+
 static int
 i40e_pf_host_process_cmd_config_irq_map(struct i40e_pf_vf *vf,
uint8_t *msg, uint16_t msglen)
 {
int ret = I40E_SUCCESS;
+   struct i40e_pf *pf = vf->pf;
+   struct i40e_hw *hw = I40E_PF_TO_HW(vf->pf);
struct i40e_virtchnl_irq_map_info *irqmap =
(struct i40e_virtchnl_irq_map_info *)msg;
+   struct i40e_virtchnl_vector_map *map;
+   int i;
+   uint16_t vector_id;
+   unsigned lo
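
For illustration only, the "sort all rx and tx queues" step in the hunk above can be looked at in isolation. The sketch below is not part of the patch: the function name is hypothetical and the 16-bit width of the two queue maps is an assumption. It shows the interleaving idea, where RX queue i lands on bit 2*i and TX queue i on bit 2*i + 1, so that walking the merged map from bit 0 upwards yields rx0, tx0, rx1, tx1, and so on.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch: merge an RX and a TX
 * queue bitmap into one interleaved map. Bit 2*i means "RX queue i is
 * in the list", bit 2*i + 1 means "TX queue i is in the list".
 * The 16-bit map width is an assumption made for this sketch.
 */
static uint64_t
sketch_interleave_queue_maps(uint16_t rxq_map, uint16_t txq_map)
{
	uint64_t linklistmap = 0;
	unsigned int i;

	for (i = 0; i < 16; i++) {
		/* use a 64-bit constant so the shift is done in 64 bits */
		if (rxq_map & (1u << i))
			linklistmap |= UINT64_C(1) << (2 * i);
		if (txq_map & (1u << i))
			linklistmap |= UINT64_C(1) << (2 * i + 1);
	}
	return linklistmap;
}
```

With such a map, bit position n decodes to queue type n % 2 and queue offset n / 2, which is how the loop in the hunk above recovers qtype and qid.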

Re: [dpdk-dev] [PATCH 2/2] vhost: start vhost servers once

2017-01-03 Thread Yuanhan Liu
On Fri, Dec 30, 2016 at 04:26:27PM -0500, Charles (Chas) Williams wrote:
> 
> 
> On 12/29/2016 10:15 PM, Yuanhan Liu wrote:
> >On Thu, Dec 29, 2016 at 10:58:11AM -0500, Charles (Chas) Williams wrote:
> >>On 12/29/2016 03:52 AM, Yuanhan Liu wrote:
> >>>On Wed, Dec 28, 2016 at 04:10:52PM -0500, Charles (Chas) Williams wrote:
> Start a vhost server once during devinit instead of during device start
> and stop.  Some vhost clients, such as QEMU, don't re-attach to sockets when
> the vhost server is stopped and later started.  Preserve existing behavior
> for vhost clients.
> >>>
> >>>I didn't quite get the idea what you are going to fix.
> >>
> >>The issue I am trying to fix is QEMU interaction when DPDK's vhost is
> >>acting as a server to QEMU vhost clients.  If you create a vhost server
> >>device, it doesn't create the actual datagram socket until you call
> >>.dev_start().  If you call .dev_stop() it also deletes those sockets.
> >>For QEMU, this is a problem since QEMU doesn't know how to re-attach to
> >>datagram sockets that have gone away.
> >
> >Thanks! I'd have appreciated it if you had written the commit log
> >this way in the first place.
> >
> >>.dev_start()/.dev_stop() seem to roughly mean link up and link down
> >>so I understand why you might want to add/remove the datagram sockets.
> >>However, in practice, this doesn't seem to make much sense for a DPDK
> >>vhost server.
> >
> >Agree.
> >
> >>This doesn't seem like the right way to indicate link
> >>status to vhost clients.
> >>
> >>It seems like it would just be easier to do this for both clients and
> >>servers, but I don't know why it was done this way originally so I
> >>chose to keep the client behavior.
> >
> >I don't think there are any differences between DPDK acting as client or
> >server. To me, the right logic seems to be (for both DPDK as server and
> >client).
> >
> >For register,
> >- register the vhost socket at probe stage (either at rte_pmd_vhost_probe
> >  or at eth_dev_vhost_create).
> >- start the vhost session right after the register when we haven't started
> >  it before.
> >
> >For unregister,
> >- invoke rte_vhost_driver_unregister() at rte_pmd_vhost_remove().
> 
> OK. This will be much easier than what I submitted.

Good.

> 
> >For dev_start/stop,
> >- set allow_queuing to 1/0 for start/stop, respectively.
> 
> Unfortunately, I don't think this will work.  new_device() doesn't happen
> until a client connects.  allow_queueing seems to be following the status
> of the "wire" as it were.  .dev_start()/.dev_stop() is the link of local
> port connected to the wire (administratively up or down as it were).
> 
> .dev_start() can happen before new_device() and attempting to RX for a
> client that doesn't exist doesn't seem like a good idea. 

Right.

> Perhaps another
> flag that follows dev_started, but for the queues?

I will comment on it in your v3 patches.

--yliu
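
As a rough illustration of the lifecycle agreed on in this exchange (socket registered at probe, unregistered at remove, start/stop leaving the socket alone), here is a toy state model. None of the names below are DPDK APIs; they are hypothetical stand-ins, and the real driver would call the vhost library where this sketch only flips booleans.

```c
#include <stdbool.h>

/* Hypothetical model of a vhost port; not a DPDK structure. */
struct sketch_vhost_port {
	bool socket_registered;	/* lives from probe until remove */
	bool started;		/* toggled by dev_start/dev_stop */
};

/* probe: the socket is registered once, before any dev_start */
static void
sketch_probe(struct sketch_vhost_port *p)
{
	p->socket_registered = true;
	p->started = false;
}

/* dev_start/dev_stop only change the local port state */
static void
sketch_dev_start(struct sketch_vhost_port *p)
{
	p->started = true;
}

static void
sketch_dev_stop(struct sketch_vhost_port *p)
{
	p->started = false;	/* the socket is deliberately left in place */
}

/* remove: the only place where the socket goes away */
static void
sketch_remove(struct sketch_vhost_port *p)
{
	p->started = false;
	p->socket_registered = false;
}
```

The point of the model is only that a stop followed by a start never tears down the socket, so a client such as QEMU has nothing to re-attach to.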


Re: [dpdk-dev] [PATCH v3 1/2] net/vhost: create datagram sockets immediately

2017-01-03 Thread Yuanhan Liu
On Sun, Jan 01, 2017 at 02:01:56PM -0500, Charles (Chas) Williams wrote:
> If you create a vhost server device, it doesn't create the actual datagram
> socket until you call .dev_start().  If you call .dev_stop() it also
> deletes those sockets.  For QEMU clients, this is a problem since QEMU
> doesn't know how to re-attach to datagram sockets that have gone away.
> 
> To work around this, register and unregister the datagram sockets during

I would not call it a "workaround"; to me, it is a "fix".

> device creation and removal.
> 
> Fixes: ee584e9710b9 ("vhost: add driver on top of the library")
> 
> Signed-off-by: Chas Williams 
> ---
>  drivers/net/vhost/rte_eth_vhost.c | 43 
> ---
>  1 file changed, 17 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
> index 60b0f51..6b11e40 100644
> --- a/drivers/net/vhost/rte_eth_vhost.c
> +++ b/drivers/net/vhost/rte_eth_vhost.c
> @@ -114,8 +114,6 @@ struct pmd_internal {
>   char *iface_name;
>   uint16_t max_queues;
>   uint64_t flags;

I think the "flags" field could also be dropped in this patch: it has no
users any more.

--yliu


Re: [dpdk-dev] [PATCH v3 16/33] drivers/pool/dpaa2: adding hw offloaded mempool

2017-01-03 Thread Hemant Agrawal

On 12/29/2016 12:38 PM, Santosh Shukla wrote:

Hi Shreyansh,

On Thu, Dec 29, 2016 at 10:46:35AM +0530, Shreyansh Jain wrote:

From: Hemant Agrawal 

Add NXP DPAA2 architecture specific mempool support.
Each mempool instance is represented by a DPBP object
from the FSL-MC bus.

This patch also registers a dpaa2 type MEMPOOL OPS.

Signed-off-by: Hemant Agrawal 
---
 config/common_base|   1 +
 config/defconfig_arm64-dpaa2-linuxapp-gcc |   4 +
 drivers/Makefile  |   1 +
 drivers/bus/fslmc/Makefile|   2 +
 drivers/bus/fslmc/fslmc_vfio.c|   9 +-
 drivers/bus/fslmc/fslmc_vfio.h|   2 +
 drivers/bus/fslmc/portal/dpaa2_hw_dpbp.c  | 137 +
 drivers/bus/fslmc/portal/dpaa2_hw_pvt.h   |  19 ++
 drivers/bus/fslmc/rte_pmd_fslmcbus_version.map|   2 +
 drivers/common/Makefile   |   3 +
 drivers/pool/Makefile |  38 +++
 drivers/pool/dpaa2/Makefile   |  67 +
 drivers/pool/dpaa2/dpaa2_hw_mempool.c | 339 ++
 drivers/pool/dpaa2/dpaa2_hw_mempool.h |  95 ++
 drivers/pool/dpaa2/rte_pmd_dpaa2_pool_version.map |   8 +
 mk/rte.app.mk |   1 +
 16 files changed, 727 insertions(+), 1 deletion(-)
 create mode 100644 drivers/bus/fslmc/portal/dpaa2_hw_dpbp.c
 create mode 100644 drivers/pool/Makefile
 create mode 100644 drivers/pool/dpaa2/Makefile
 create mode 100644 drivers/pool/dpaa2/dpaa2_hw_mempool.c
 create mode 100644 drivers/pool/dpaa2/dpaa2_hw_mempool.h
 create mode 100644 drivers/pool/dpaa2/rte_pmd_dpaa2_pool_version.map

diff --git a/config/common_base b/config/common_base
index d605e85..493811f 100644
--- a/config/common_base
+++ b/config/common_base
@@ -276,6 +276,7 @@ CONFIG_RTE_LIBRTE_THUNDERX_NICVF_DEBUG_MBOX=n
 # Compile Support Libraries for NXP DPAA2
 #
 CONFIG_RTE_LIBRTE_DPAA2_COMMON=n
+CONFIG_RTE_LIBRTE_DPAA2_POOL=n

 #
 # Compile NXP DPAA2 FSL-MC Bus
diff --git a/config/defconfig_arm64-dpaa2-linuxapp-gcc b/config/defconfig_arm64-dpaa2-linuxapp-gcc
index d3bc9d8..7665912 100644
--- a/config/defconfig_arm64-dpaa2-linuxapp-gcc
+++ b/config/defconfig_arm64-dpaa2-linuxapp-gcc
@@ -42,10 +42,14 @@ CONFIG_RTE_ARCH_ARM_TUNE="cortex-a57+fp+simd"
 CONFIG_RTE_MAX_LCORE=8
 CONFIG_RTE_MAX_NUMA_NODES=1

+CONFIG_RTE_PKTMBUF_HEADROOM=256
+
 #
 # Compile Support Libraries for DPAA2
 #
 CONFIG_RTE_LIBRTE_DPAA2_COMMON=y
+CONFIG_RTE_LIBRTE_DPAA2_POOL=n
+CONFIG_RTE_MBUF_DEFAULT_MEMPOOL_OPS="dpaa2"

 #
 # Compile NXP DPAA2 FSL-MC Bus
diff --git a/drivers/Makefile b/drivers/Makefile
index bdae63b..9fd268e 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -33,6 +33,7 @@ include $(RTE_SDK)/mk/rte.vars.mk

 DIRS-y += common
 DIRS-y += bus
+DIRS-y += pool
 DIRS-y += net
 DIRS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += crypto

diff --git a/drivers/bus/fslmc/Makefile b/drivers/bus/fslmc/Makefile
index 1b815dd..35f30ad 100644
--- a/drivers/bus/fslmc/Makefile
+++ b/drivers/bus/fslmc/Makefile
@@ -47,6 +47,7 @@ CFLAGS += "-Wno-strict-aliasing"
 CFLAGS += -I$(RTE_SDK)/drivers/bus/fslmc
 CFLAGS += -I$(RTE_SDK)/drivers/bus/fslmc/mc
 CFLAGS += -I$(RTE_SDK)/drivers/common/dpaa2/qbman/include
+CFLAGS += -I$(RTE_SDK)/drivers/pool/dpaa2
 CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal

 # versioning export map
@@ -63,6 +64,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_FSLMC_BUS) += \
 mc/mc_sys.c

 SRCS-$(CONFIG_RTE_LIBRTE_FSLMC_BUS) += portal/dpaa2_hw_dpio.c
+SRCS-$(CONFIG_RTE_LIBRTE_FSLMC_BUS) += portal/dpaa2_hw_dpbp.c
 SRCS-$(CONFIG_RTE_LIBRTE_FSLMC_BUS) += fslmc_vfio.c
 SRCS-$(CONFIG_RTE_LIBRTE_FSLMC_BUS) += fslmc_bus.c

diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index ed0a8b9..4e47ec8 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -268,7 +268,7 @@ int fslmc_vfio_process_group(struct rte_bus *bus)
char path[PATH_MAX];
int64_t v_addr;
int ndev_count;
-   int dpio_count = 0;
+   int dpio_count = 0, dpbp_count = 0;
struct fslmc_vfio_group *group = &vfio_groups[0];
static int process_once;

@@ -418,6 +418,11 @@ int fslmc_vfio_process_group(struct rte_bus *bus)
if (!ret)
dpio_count++;
}
+   if (!strcmp(object_type, "dpbp")) {
+   ret = dpaa2_create_dpbp_device(object_id);
+   if (!ret)
+   dpbp_count++;
+   }
}
closedir(d);

@@ -425,6 +430,8 @@ int fslmc_vfio_process_group(struct rte_bus *bus)
if (ret)
FSLMC_VFIO_LOG(DEBUG, "Error in affining qbman swp %d", ret);

+   FSLMC_VFIO_LOG(DEBUG, "DPAA2: Added dpbp_count = %d dpio_count=%d\n",
+ dpbp_count, dpio_count);
return 0;

 FAILURE:
diff --git a/drive

Re: [dpdk-dev] [PATCH v3 2/2] net/vhost: emulate device start/stop behavior

2017-01-03 Thread Yuanhan Liu
On Sun, Jan 01, 2017 at 02:01:57PM -0500, Charles (Chas) Williams wrote:
> .dev_start()/.dev_stop() roughly corresponds to the local device's
> port being up or down.  This is different from the remote client being
> connected which is roughly link up or down.  Emulate the behavior by
> separately tracking the local start/stop state to determine if we should
> allow packets to be queued to the remote client.
> 
> Signed-off-by: Chas Williams 
> ---
>  drivers/net/vhost/rte_eth_vhost.c | 65 
> ---
>  1 file changed, 54 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
> index 6b11e40..d5a4540 100644
> --- a/drivers/net/vhost/rte_eth_vhost.c
> +++ b/drivers/net/vhost/rte_eth_vhost.c
> @@ -100,7 +100,8 @@ struct vhost_stats {
>  
>  struct vhost_queue {
>   int vid;
> - rte_atomic32_t allow_queuing;
> + rte_atomic32_t connected;
> + rte_atomic32_t ready;
>   rte_atomic32_t while_queuing;
>   struct pmd_internal *internal;
>   struct rte_mempool *mb_pool;
> @@ -383,18 +384,25 @@ vhost_update_packet_xstats(struct vhost_queue *vq,
>   }
>  }
>  
> +static inline bool
> +queuing_stopped(struct vhost_queue *r)
> +{
> + return unlikely(rte_atomic32_read(&r->connected) == 0 ||
> + rte_atomic32_read(&r->ready) == 0);
> +}

That's one more check compared to the old code, which makes it a bit more
expensive than before.

I think we could keep the same cost by:

- introducing a per-device "started" flag: set/unset on dev_start/stop,
  respectively.

- introducing a per-device "dev_attached" flag: set/unset on
  new/destroy_device(), respectively.

On each update of either flag, set "allow_queuing" accordingly.

Is that okay with you?

--yliu
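
A minimal sketch of the suggestion above, assuming C11 atomics in place of rte_atomic32_t so that it builds without DPDK. All names are hypothetical, and a single allow_queuing flag stands in for the per-queue flags of the real driver. The idea is that the two slow-path events recompute one flag, so the data path still reads only one value.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-device state; not the driver's real structure. */
struct sketch_vhost_dev {
	atomic_int started;		/* set/unset on dev_start/dev_stop */
	atomic_int dev_attached;	/* set/unset on new/destroy_device */
	atomic_int allow_queuing;	/* per queue in the real driver */
};

static void
sketch_init(struct sketch_vhost_dev *dev)
{
	atomic_init(&dev->started, 0);
	atomic_init(&dev->dev_attached, 0);
	atomic_init(&dev->allow_queuing, 0);
}

/* Slow path: recompute the single flag that the data path checks. */
static void
sketch_update_queuing(struct sketch_vhost_dev *dev)
{
	int allow = atomic_load(&dev->started) &&
		    atomic_load(&dev->dev_attached);

	atomic_store(&dev->allow_queuing, allow);
}

static void
sketch_set_started(struct sketch_vhost_dev *dev, bool on)
{
	atomic_store(&dev->started, on ? 1 : 0);
	sketch_update_queuing(dev);
}

static void
sketch_set_attached(struct sketch_vhost_dev *dev, bool on)
{
	atomic_store(&dev->dev_attached, on ? 1 : 0);
	sketch_update_queuing(dev);
}

/* Fast path: one read, the same cost as before the patch. */
static bool
sketch_can_queue(struct sketch_vhost_dev *dev)
{
	return atomic_load(&dev->allow_queuing) != 0;
}
```

The sketch does not cover the while_queuing handshake that the driver uses to wait for in-flight bursts; it only shows how two conditions can be folded into one flag.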


Re: [dpdk-dev] [PATCH v3 1/4] ethdev: add firmware information get

2017-01-03 Thread Yang, Qiming
Hi, Ferruh
Please see the question below. In my opinion, etrack_id is just a name used to
identify the ID of one NIC.
The kernel version of ethtool prints this ID on the firmware-version line.
I know what etrack_id means, but I really don't know why it is named etrack_id.
Can you explain this?
 
-Original Message-
From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] 
Sent: Tuesday, January 3, 2017 4:40 PM
To: Yang, Qiming 
Subject: Re: [PATCH v3 1/4] ethdev: add firmware information get

Please reply below the question and on the mailing list.
You'll have to explain why this name etrack_id.

2017-01-03 03:28, Yang, Qiming:
> Hi, Thomas
> etrack_id is not established terminology; I chose the name.
> It stores the unique number of the firmware.
> firmware-version: 5.04 0x800024ca
> 800024ca is the etrack_id of this NIC.
> 
> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] 
> Sent: Monday, January 2, 2017 11:39 PM
> To: Yang, Qiming 
> Cc: dev@dpdk.org; Horton, Remy ; Yigit, Ferruh 
> 
> Subject: Re: [PATCH v3 1/4] ethdev: add firmware information get
> 
> 2016-12-27 20:30, Qiming Yang:
> >  /**
> > + * Retrieve the firmware version of a device.
> > + *
> > + * @param port_id
> > + *   The port identifier of the device.
> > + * @param fw_major
> > + *   A array pointer to store the major firmware version of a device.
> > + * @param fw_minor
> > + *   A array pointer to store the minor firmware version of a device.
> > + * @param fw_patch
> > + *   A array pointer to store the firmware patch number of a device.
> > + * @param etrack_id
> > + *   A array pointer to store the nvm version of a device.
> > + */
> > +void rte_eth_dev_fw_info_get(uint8_t port_id, uint32_t *fw_major,
> > +   uint32_t *fw_minor, uint32_t *fw_patch, uint32_t *etrack_id);
> 
> I have a reservation about the name etrack_id.
> Please could you point to a document explaining this ID?
> Is it known outside of Intel?
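
To make the discussion concrete, here is a small sketch that splits a string shaped like the ethtool example quoted above ("5.04 0x800024ca") into the fields the proposed API would return. The struct and function names are hypothetical, and the assumption that the string is always "major.minor 0xID" comes only from that single example.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container mirroring three of the proposed out-parameters. */
struct sketch_fw_info {
	uint32_t fw_major;
	uint32_t fw_minor;
	uint32_t etrack_id;
};

/*
 * Parse "major.minor 0xID". Returns 0 on success, -1 if the string does
 * not match. Note that "04" is read as the number 4, so the leading zero
 * of the minor version is not preserved.
 */
static int
sketch_parse_fw_version(const char *s, struct sketch_fw_info *out)
{
	unsigned int major, minor, etrack;

	if (sscanf(s, "%u.%u %x", &major, &minor, &etrack) != 3)
		return -1;

	out->fw_major = major;
	out->fw_minor = minor;
	out->etrack_id = etrack;
	return 0;
}
```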




Re: [dpdk-dev] [PATCH 1/2] ethdev: fix name index in xstats Api

2017-01-03 Thread Remy Horton

Been away, hence the somewhat late review..

On 16/12/2016 09:44, Olivier Matz wrote:
[..]

Today, each 'id' returned by rte_eth_xstats_get() is equal to the index
in the returned array, making this value useless. It also prevents a
driver from having different indexes for names and value, like in the
example below:


My original intention was to give free rein over what id numbers are
used, but for reasons I've now forgotten the implementation ended up
making everything sequential.
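
A small sketch of what decoupling ids from array positions allows, using local stand-in types rather than the ethdev structures (all names here are hypothetical). A driver can return values in any order, or only a subset, and the application resolves each name through the id instead of the position in the value array.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for a value entry and a name entry; not the ethdev types. */
struct sketch_xstat {
	uint64_t id;	/* index into the names array */
	uint64_t value;
};

struct sketch_xstat_name {
	const char *name;
};

/* Resolve the name of a value through its id, not its array position. */
static const char *
sketch_xstat_name_of(const struct sketch_xstat *xs,
		     const struct sketch_xstat_name *names, size_t nb_names)
{
	if (xs->id >= nb_names)
		return NULL;
	return names[xs->id].name;
}
```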



CC: sta...@dpdk.org
Fixes: bd6aa172cf35 ("ethdev: fetch extended statistics with integer ids")

Signed-off-by: Olivier Matz 


Acked-by: Remy Horton 


Re: [dpdk-dev] [PATCH v2 2/2] ethdev: clarify xstats Api documentation

2017-01-03 Thread Remy Horton


On 23/12/2016 20:35, Olivier Matz wrote:

Reword the Api documentation of xstats ethdev.

CC: sta...@dpdk.org
Signed-off-by: Olivier Matz 


Acked-by: Remy Horton 


Re: [dpdk-dev] XL710 with i40e driver drops packets on RX even on a small rates.

2017-01-03 Thread Martin Weiser
Hello,

we are also seeing this issue on one of our test systems while it does
not occur on other test systems with the same DPDK version (we tested
16.11 and current master).

The system that we can reproduce this issue on also has an X552 ixgbe NIC
which can forward the exact same traffic using the same testpmd
parameters without a problem.
Even if we install an 82599ES ixgbe NIC in the same PCI slot that the
XL710 was in, the 82599ES can forward the traffic without any drops.

Like in the issue reported by Ilya, all packet drops occur on the testpmd
side and are accounted as 'imissed'. Increasing the number of rx
descriptors only helps a little at low packet rates.

Drops start occurring at pretty low packet rates like 10 packets per
second.

Any suggestions would be greatly appreciated.

Best regards,
Martin
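
For a sense of scale, the time a descriptor ring can absorb at a perfectly steady arrival rate is simply ring size divided by packet rate. The helper below is a hypothetical back-of-the-envelope sketch, not a model of the NIC: with the figures quoted below, 2048 descriptors cover roughly 270 microseconds at ~7.59 Mpps and roughly 45 milliseconds at ~45 Kpps. Drops at the lower rate therefore point at bursty descriptor write-back or long gaps between polls rather than at a ring that is too small for the average rate.

```c
/*
 * Time, in microseconds, for traffic arriving at a steady rate of pps
 * packets per second to consume n_desc descriptors if the application
 * never refills the ring. Returns -1.0 for a non-positive rate.
 */
static double
sketch_ring_fill_time_us(unsigned int n_desc, double pps)
{
	if (pps <= 0.0)
		return -1.0;
	return (double)n_desc * 1e6 / pps;
}
```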



On 22.08.16 14:06, Ilya Maximets wrote:
> Hello, All.
>
> I've run into a really bad situation with packet drops at low
> packet rates (~45 Kpps) while using an XL710 NIC with the i40e DPDK driver.
>
> The issue was found while testing PHY-VM-PHY scenario with OVS and
> confirmed on PHY-PHY scenario with testpmd.
>
> DPDK version 16.07 was used in all cases.
> XL710 firmware-version: f5.0.40043 a1.5 n5.04 e2505
>
> Test description (PHY-PHY):
>
>   * Following cmdline was used:
>
>   # n_desc=2048
>   # ./testpmd -c 0xf -n 2 --socket-mem=8192,0 -w :05:00.0 -v \
>   -- --burst=32 --txd=${n_desc} --rxd=${n_desc} \
>   --rxq=1 --txq=1 --nb-cores=1 \
>   --eth-peer=0,a0:00:00:00:00:00 --forward-mode=mac
>
>   * DPDK-Pktgen application was used as a traffic generator.
> Single flow generated.
>
> Results:
>
>   * Packet size: 128B, rate: 90% of 10Gbps (~7.5 Mpps):
>
> On the generator's side:
>
> Total counts:
>   Tx:  759034368 packets
>   Rx:  759033239 packets
>   Lost  :   1129 packets
>
> Average rates:
>   Tx:7590344 pps
>   Rx:7590332 pps
>   Lost  : 11 pps
>
> All of these dropped packets are RX-dropped on testpmd's side:
>
> +++ Accumulated forward statistics for all ports+++
> RX-packets: 759033239  RX-dropped: 1129  RX-total: 759034368
> TX-packets: 759033239  TX-dropped: 0 TX-total: 759033239
> +++
>
> At the same time 10G NIC with IXGBE driver works perfectly
> without any packet drops in the same scenario.
>
> Much worse situation with PHY-VM-PHY scenario with OVS:
>
>   * testpmd application used inside guest to forward incoming packets.
> (almost same cmdline as for PHY-PHY)
>
>   * For packet size 256 B on rate 1% of 10Gbps (~45 Kpps):
>
> Total counts:
>   Tx:1358112 packets
>   Rx:1357990 packets
>   Lost  :122 packets
>
> Average rates:
>   Tx:  45270 pps
>   Rx:  45266 pps
>   Lost  :  4 pps
>
> All of these 122 dropped packets can be found in rx_dropped counter:
>
>   # ovs-vsctl get interface dpdk0 statistics:rx_dropped
>   122
>
>And again, no issues with IXGBE on the exactly same scenario.
>
>
> Results of my investigation:
>
>   * I found that all of these packets are 'imissed'. This means that the rx
> descriptor ring overflowed.
>
>   * I've modified the i40e driver to check the real number of free descriptors
> that had not yet been filled by the NIC and found that the HW fills
> rx descriptors at an uneven rate. It looks like it fills them in
> huge batches.
>
>   * So the root cause of the packet drops with XL710 is the somewhat uneven
> rate at which the NIC fills the hw rx descriptors. This exhausts the
> rx descriptors and makes the hardware drop packets. The 10G IXGBE NIC works
> more smoothly and the driver is able to refill the hw ring with rx descriptors
> in time.
>
>   * The issue becomes worse with OVS because of much bigger latencies
> between 'rte_eth_rx_burst()' calls.
>
> The easiest solution for this problem is to increase the number of RX descriptors.
> Increasing up to 4096 eliminates packet drops but decreases the performance a lot:
>
>   For OVS PHY-VM-PHY scenario by 10%
>   For OVS PHY-PHY scenario by 20%
>   For testpmd PHY-PHY scenario by 17% (22.1 Mpps --> 18.2 Mpps for 64B packets)
>
> As a result, we have a trade-off between a zero drop rate at low packet rates
> and higher maximum performance, which is very unfortunate.
>
> Using of 16B descriptors doesn't really help with performance.
> Upgrading the firmware from version 4.4 to 

Re: [dpdk-dev] [PATCH v5 00/20] Decouple ethdev from PCI device

2017-01-03 Thread Ferruh Yigit
On 12/25/2016 10:33 PM, Thomas Monjalon wrote:
> 2016-12-23 16:57, Jan Blunck:
>> This repost addresses the review comments of Thomas Monjalon to completely
>> remove the ethdev helper to further decrease the coupling of the ethdev and
>> the eal layers. This required me to squash together all patches using the
>> rte_eth_dev_to_pci() helper into "Decouple from PCI device" patch. As
>> discussed privately I'll keep the PCI information in rte_eth_dev_info
>> untouched.
> 
> Applied with some trivial fixes, thanks
> 

I rebased these changes onto the next-net tree and needed to update some sfc
and nfp patches [1] there.

Andrew, Alejandro,

Can you please review your driver in the latest next-net tree?

Thanks,
ferruh

[1]
nfp:
net/nfp: add Rx interrupts

sfc:
net/sfc: support link status change interrupt
net/sfc: interrupts support sufficient for event queue init
net/sfc: implement driver operation to init device on attach
net/sfc: libefx-based PMD stub sufficient to build and init


Re: [dpdk-dev] [PATCH v5 00/20] Decouple ethdev from PCI device

2017-01-03 Thread Ferruh Yigit
On 12/25/2016 10:33 PM, Thomas Monjalon wrote:
> 2016-12-23 16:57, Jan Blunck:
>> This repost addresses the review comments of Thomas Monjalon to completely
>> remove the ethdev helper to further decrease the coupling of the ethdev and
>> the eal layers. This required me to squash together all patches using the
>> rte_eth_dev_to_pci() helper into "Decouple from PCI device" patch. As
>> discussed privately I'll keep the PCI information in rte_eth_dev_info
>> untouched.
> 
> Applied with some trivial fixes, thanks
> 

I am getting the following build error for mlx5 [1], mainly because verbs.h
also uses the container_of macro.

[1]
In file included from
.../x86_64-native-linuxapp-gcc/include/rte_mbuf.h:57:0,
 from .../x86_64-native-linuxapp-gcc/include/rte_ether.h:52,
 from .../drivers/net/mlx5/mlx5_trigger.c:38:
/usr/include/infiniband/verbs.h: In function ‘verbs_get_device’:
.../x86_64-native-linuxapp-gcc/include/rte_common.h:350:40: error:
initialization discards ‘const’ qualifier from pointer target type
[-Werror=discarded-qualifiers]
typeof(((type *)0)->member) *_ptr = (ptr); \
^
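
For context, the warning comes from the temporary pointer inside the macro: it is declared without const, so passing a const pointer (as verbs_get_device does) drops the qualifier. Below is a sketch of one possible const-tolerant variant. It relies on GNU C extensions, every name in it is hypothetical, and it is not necessarily the fix that was adopted in the tree.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical container_of variant: the temporary is const-qualified,
 * so both const and non-const member pointers initialize it cleanly.
 * Uses GNU C __typeof__ and statement expressions.
 */
#define sketch_container_of(ptr, type, member) __extension__ ({		\
	const __typeof__(((type *)0)->member) *_ptr = (ptr);		\
	(type *)((uintptr_t)_ptr - offsetof(type, member)); })

struct sketch_inner {
	int x;
};

struct sketch_outer {
	int pad;
	struct sketch_inner in;
};

/* Takes a const member pointer, like the verbs.h caller in the log. */
static struct sketch_outer *
sketch_outer_of(const struct sketch_inner *inner)
{
	return sketch_container_of(inner, struct sketch_outer, in);
}
```

Note that this variant silently hands back a non-const outer pointer, which is the usual trade-off of this style of macro.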


[dpdk-dev] [PATCH v3] crypto/aesni_gcm: migration from MB library to ISA-L

2017-01-03 Thread Piotr Azarewicz
The current Cryptodev AES-NI GCM PMD is implemented using the Multi Buffer
Crypto library. This patch reimplements the device using the ISA-L Crypto
library: https://github.com/01org/isa-l_crypto.

The migration adds support for:
  * GMAC algorithm.
  * 256-bit cipher key.
  * Session-less mode.
  * Out-of-place processing.
  * Scatter-gather for chained mbufs (out-of-place only, and the
destination mbuf must be contiguous).

Verified the current unit tests and added new unit tests to cover the new
functionality.

Signed-off-by: Piotr Azarewicz 
---

To be applied on top of:
   [dpdk-dev] [PATCH v2 0/3] Fix iv sizes in crypto drivers capabilities

v3 changes:
- rebase on top of dpdk-next-crypto

v2 changes:
- implement native scatter-gather support for chained mbufs (out-of-place
only, and the destination mbuf must be contiguous)
- write unit test for session-less mode
- write unit test for out-of-place processing
- add support for GMAC authentication algorithm

 app/test/test_cryptodev.c|  739 +++---
 app/test/test_cryptodev_gcm_test_vectors.h   |  487 +-
 doc/guides/cryptodevs/aesni_gcm.rst  |   23 +-
 doc/guides/rel_notes/release_17_02.rst   |   13 +
 drivers/crypto/aesni_gcm/Makefile|8 +-
 drivers/crypto/aesni_gcm/aesni_gcm_ops.h |   95 +--
 drivers/crypto/aesni_gcm/aesni_gcm_pmd.c |  307 -
 drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c |   49 +-
 drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h |   15 +-
 mk/rte.app.mk|3 +-
 10 files changed, 1356 insertions(+), 383 deletions(-)

diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c
index ba6bbb5..ecbf765 100644
--- a/app/test/test_cryptodev.c
+++ b/app/test/test_cryptodev.c
@@ -3979,16 +3979,48 @@ static int test_snow3g_decryption_oop(const struct snow3g_test_data *tdata)
 }
 
 static int
+create_gcm_xforms(struct rte_crypto_op *op,
+   enum rte_crypto_cipher_operation cipher_op,
+   uint8_t *key, const uint8_t key_len,
+   const uint8_t aad_len, const uint8_t auth_len,
+   enum rte_crypto_auth_operation auth_op)
+{
+   TEST_ASSERT_NOT_NULL(rte_crypto_op_sym_xforms_alloc(op, 2),
+   "failed to allocate space for crypto transforms");
+
+   struct rte_crypto_sym_op *sym_op = op->sym;
+
+   /* Setup Cipher Parameters */
+   sym_op->xform->type = RTE_CRYPTO_SYM_XFORM_CIPHER;
+   sym_op->xform->cipher.algo = RTE_CRYPTO_CIPHER_AES_GCM;
+   sym_op->xform->cipher.op = cipher_op;
+   sym_op->xform->cipher.key.data = key;
+   sym_op->xform->cipher.key.length = key_len;
+
+   TEST_HEXDUMP(stdout, "key:", key, key_len);
+
+   /* Setup Authentication Parameters */
+   sym_op->xform->next->type = RTE_CRYPTO_SYM_XFORM_AUTH;
+   sym_op->xform->next->auth.algo = RTE_CRYPTO_AUTH_AES_GCM;
+   sym_op->xform->next->auth.op = auth_op;
+   sym_op->xform->next->auth.digest_length = auth_len;
+   sym_op->xform->next->auth.add_auth_data_length = aad_len;
+   sym_op->xform->next->auth.key.length = 0;
+   sym_op->xform->next->auth.key.data = NULL;
+   sym_op->xform->next->next = NULL;
+
+   return 0;
+}
+
+static int
 create_gcm_operation(enum rte_crypto_cipher_operation op,
-   const uint8_t *auth_tag, const unsigned auth_tag_len,
-   const uint8_t *iv, const unsigned iv_len,
-   const uint8_t *aad, const unsigned aad_len,
-   const unsigned data_len, unsigned data_pad_len)
+   const struct gcm_test_data *tdata)
 {
struct crypto_testsuite_params *ts_params = &testsuite_params;
struct crypto_unittest_params *ut_params = &unittest_params;
 
-   unsigned iv_pad_len = 0, aad_buffer_len;
+   uint8_t *plaintext, *ciphertext;
+   unsigned int iv_pad_len, aad_pad_len, plaintext_pad_len;
 
/* Generate Crypto op data structure */
ut_params->op = rte_crypto_op_alloc(ts_params->op_mpool,
@@ -3998,63 +4030,118 @@ static int test_snow3g_decryption_oop(const struct 
snow3g_test_data *tdata)
 
struct rte_crypto_sym_op *sym_op = ut_params->op->sym;
 
-   sym_op->auth.digest.data = (uint8_t *)rte_pktmbuf_append(
-   ut_params->ibuf, auth_tag_len);
-   TEST_ASSERT_NOT_NULL(sym_op->auth.digest.data,
-   "no room to append digest");
-   sym_op->auth.digest.phys_addr = rte_pktmbuf_mtophys_offset(
-   ut_params->ibuf, data_pad_len);
-   sym_op->auth.digest.length = auth_tag_len;
-
-   if (op == RTE_CRYPTO_CIPHER_OP_DECRYPT) {
-   rte_memcpy(sym_op->auth.digest.data, auth_tag, auth_tag_len);
-   TEST_HEXDUMP(stdout, "digest:",
-   sym_op->auth.digest.data,
-   sym_op->auth.digest.length);
-   }
+   /*

Re: [dpdk-dev] [PATCH 1/2] net/ixgbe: remove unused global variable

2017-01-03 Thread Ferruh Yigit
On 12/27/2016 10:09 AM, Jerin Jacob wrote:
> Removed unused "reg_info" global variable from ixgbe driver.
> 
> cat build/app/testpmd.map | grep "Allocating common symbols" -A 15
> Allocating common symbols
> Common symbol   sizefile
> reg_info0x18build/lib/librte_pmd_ixgbe.a(ixgbe_ethdev.o)
> 
> Signed-off-by: Jerin Jacob 

Acked-by: Ferruh Yigit 



Re: [dpdk-dev] [PATCH 2/2] app/testpmd: remove explicit ixgbe link request

2017-01-03 Thread Ferruh Yigit
On 12/27/2016 10:09 AM, Jerin Jacob wrote:
> Removed explicit ixgbe driver linkage request from
> app/testpmd makefile to mk/rte.app.mk to
> 1)Maintain the correct link ordering(from higher level libraries
> to lower level libraries)
> 2)In shared lib configuration, any application can use ixgbe
> exposed pmd specific APIs not just testpmd.

In testpmd, the explicit ixgbe driver linkage request was added because
testpmd uses ixgbe PMD specific APIs.

Overall, that line only matters for the shared library build; for the
static library build the result should be the same.

I believe it is good to keep it in the testpmd Makefile. Moving it to
rte.app.mk will:
- link the library into applications that do not use PMD specific APIs
and want to load the PMD dynamically.
- link the library into applications that won't use the driver at all. This
may break the distributed binaries, since testpmd will now depend on a
specific PMD.

> 
> Signed-off-by: Jerin Jacob 
> ---
>  app/test-pmd/Makefile | 2 --
>  mk/rte.app.mk | 2 +-
>  2 files changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/app/test-pmd/Makefile b/app/test-pmd/Makefile
> index 5988c3e..96e0c67 100644
> --- a/app/test-pmd/Makefile
> +++ b/app/test-pmd/Makefile
> @@ -59,8 +59,6 @@ SRCS-y += csumonly.c
>  SRCS-y += icmpecho.c
>  SRCS-$(CONFIG_RTE_LIBRTE_IEEE1588) += ieee1588fwd.c
>  
> -_LDLIBS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += -lrte_pmd_ixgbe
> -
>  CFLAGS_cmdline.o := -D_GNU_SOURCE
>  
>  # this application needs libraries first
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index f75f0e2..aee235c 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -101,6 +101,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)+= 
> -lrte_cfgfile
>  
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)   += -lrte_pmd_bond
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT)+= -lrte_pmd_xenvirt -lxenstore
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD)  += -lrte_pmd_ixgbe
>  
>  ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
>  # plugins (link only if static libraries)
> @@ -114,7 +115,6 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_ENA_PMD)+= 
> -lrte_pmd_ena
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_ENIC_PMD)   += -lrte_pmd_enic
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_FM10K_PMD)  += -lrte_pmd_fm10k
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_I40E_PMD)   += -lrte_pmd_i40e
> -_LDLIBS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD)  += -lrte_pmd_ixgbe
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)   += -lrte_pmd_mlx4 -libverbs
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)   += -lrte_pmd_mlx5 -libverbs
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_MPIPE_PMD)  += -lrte_pmd_mpipe -lgxio
> 



Re: [dpdk-dev] [PATCH v3 1/2] net/vhost: create datagram sockets immediately

2017-01-03 Thread Charles (Chas) Williams



On 01/03/2017 03:22 AM, Yuanhan Liu wrote:

On Sun, Jan 01, 2017 at 02:01:56PM -0500, Charles (Chas) Williams wrote:

If you create a vhost server device, it doesn't create the actual datagram
socket until you call .dev_start().  If you call .dev_stop(), it also
deletes those sockets.  For QEMU clients, this is a problem since QEMU
doesn't know how to re-attach to datagram sockets that have gone away.

To work around this, register and unregister the datagram sockets during


I would not call it a "workaround"; to me, it is a "fix".


OK.


device creation and removal.
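
The lifecycle change described above can be pictured with a toy model. This is only a sketch under stated assumptions: `struct toy_vhost_dev` and the `toy_dev_*` functions are hypothetical names, not the rte_eth_vhost driver code, and the socket is reduced to a flag. It shows the socket lifetime tied to device creation and removal rather than to start and stop.

```c
#include <stdbool.h>

/* Toy model only; hypothetical names, not the rte_eth_vhost driver code. */
struct toy_vhost_dev {
	bool socket_registered; /* The listening socket exists. */
	bool started;           /* The port is started. */
};

static void toy_dev_create(struct toy_vhost_dev *d)
{
	d->socket_registered = true; /* Socket is created with the device. */
	d->started = false;
}

static void toy_dev_start(struct toy_vhost_dev *d)
{
	d->started = true;
}

/* Stopping leaves the socket in place, so a client can stay attached. */
static void toy_dev_stop(struct toy_vhost_dev *d)
{
	d->started = false;
}

static void toy_dev_remove(struct toy_vhost_dev *d)
{
	d->started = false;
	d->socket_registered = false; /* Socket goes away with the device. */
}
```

With this ordering, a start/stop cycle never touches the socket flag; only removal clears it.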

Fixes: ee584e9710b9 ("vhost: add driver on top of the library")

Signed-off-by: Chas Williams 
---
 drivers/net/vhost/rte_eth_vhost.c | 43 ---
 1 file changed, 17 insertions(+), 26 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index 60b0f51..6b11e40 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -114,8 +114,6 @@ struct pmd_internal {
char *iface_name;
uint16_t max_queues;
uint64_t flags;


I think "flags" could also be dropped in this patch: it no longer has
any users.


Sorry, I hadn't noticed that -- Yes, it can go away.


Re: [dpdk-dev] [PATCH v5 00/20] Decouple ethdev from PCI device

2017-01-03 Thread Thomas Monjalon
2017-01-03 12:24, Ferruh Yigit:
> On 12/25/2016 10:33 PM, Thomas Monjalon wrote:
> > Applied with some trivial fixes, thanks
> 
> Getting following build error for mlx5 [1], it is mainly because verbs.h
> also using container_of macro.
> 
> [1]
> In file included from
> .../x86_64-native-linuxapp-gcc/include/rte_mbuf.h:57:0,
>  from .../x86_64-native-linuxapp-gcc/include/rte_ether.h:52,
>  from .../drivers/net/mlx5/mlx5_trigger.c:38:
> /usr/include/infiniband/verbs.h: In function ‘verbs_get_device’:
> .../x86_64-native-linuxapp-gcc/include/rte_common.h:350:40: error:
> initialization discards ‘const’ qualifier from pointer target type
> [-Werror=discarded-qualifiers]
> typeof(((type *)0)->member) *_ptr = (ptr); \
> ^

Yes, this issue is fixed by upgrading mofed to version 3.4.
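
For context, the diagnostic quoted above comes from a container_of-style macro that declares a non-const temporary pointer and initializes it from a pointer to const. The sketch below only illustrates that mechanism; it is not the fix used here (that was the MOFED upgrade). `struct device`, `struct wrapper`, `container_of_const` and `wrapper_from_device` are hypothetical names. The variant avoids `typeof` and keeps the const qualifier throughout.

```c
#include <stddef.h>

/* Hypothetical types: an inner object embedded in an outer one. */
struct device {
	int id;
};

struct wrapper {
	int flags;
	struct device dev;
};

/*
 * Const-preserving variant: no non-const temporary pointer is created,
 * so passing a pointer to const does not discard the qualifier.
 */
#define container_of_const(ptr, type, member) \
	((const type *)((const char *)(ptr) - offsetof(type, member)))

/* Recover the enclosing wrapper from a const pointer to its member. */
static const struct wrapper *
wrapper_from_device(const struct device *dev)
{
	return container_of_const(dev, struct wrapper, dev);
}
```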


Re: [dpdk-dev] [PATCH v2 14/18] net/ixgbe: parse L2 tunnel filter

2017-01-03 Thread Adrien Mazarguil
Hi Wei,

On Fri, Dec 30, 2016 at 03:53:06PM +0800, Wei Zhao wrote:
> check if the rule is a L2 tunnel rule, and get the L2 tunnel info.
> 
> Signed-off-by: Wei Zhao 
> Signed-off-by: Wenzhuo Lu 
> 
> ---
> 
> v2:
> --add new error set function
> --change return value type of parser function
> ---
>  drivers/net/ixgbe/ixgbe_ethdev.c | 269 
> +++
>  lib/librte_ether/rte_flow.h  |  32 +
>  2 files changed, 273 insertions(+), 28 deletions(-)
[...]
> diff --git a/lib/librte_ether/rte_flow.h b/lib/librte_ether/rte_flow.h
> index 98084ac..e9e6220 100644
> --- a/lib/librte_ether/rte_flow.h
> +++ b/lib/librte_ether/rte_flow.h
> @@ -268,6 +268,13 @@ enum rte_flow_item_type {
>* See struct rte_flow_item_vxlan.
>*/
>   RTE_FLOW_ITEM_TYPE_VXLAN,
> +
> + /**
> +   * Matches a E_TAG header.
> +   *
> +   * See struct rte_flow_item_e_tag.
> +   */
> + RTE_FLOW_ITEM_TYPE_E_TAG,
>  };
>  
>  /**
> @@ -454,6 +461,31 @@ struct rte_flow_item_vxlan {
>  };
>  
>  /**
> + * RTE_FLOW_ITEM_TYPE_E_TAG.
> + *
> + * Matches a E-tag header.
> + */
> +struct rte_flow_item_e_tag {
> + struct ether_addr dst; /**< Destination MAC. */
> + struct ether_addr src; /**< Source MAC. */
> + uint16_t e_tag_ethertype; /**< E-tag EtherType, 0x893F. */
> + uint16_t e_pcp:3; /**<  E-PCP */
> + uint16_t dei:1; /**< DEI */
> + uint16_t in_e_cid_base:12; /**< Ingress E-CID base */
> + uint16_t rsv:2; /**< reserved */
> + uint16_t grp:2; /**< GRP */
> + uint16_t e_cid_base:12; /**< E-CID base */
> + uint16_t in_e_cid_ext:8; /**< Ingress E-CID extend */
> + uint16_t e_cid_ext:8; /**< E-CID extend */
> + uint16_t type; /**< MAC type. */
> + unsigned int tags; /**< Number of 802.1Q/ad tags defined. */
> + struct {
> + uint16_t tpid; /**< Tag protocol identifier. */
> + uint16_t tci; /**< Tag control information. */
> + } tag[]; /**< 802.1Q/ad tag definitions, outermost first. */
> +};
[...]

See my previous reply [1], this definition is not endian-safe and comprises
protocols defined as independent items (namely ETH and VLAN). Here is an
untested suggestion:

 struct rte_flow_item_e_tag {
 uint16_t tpid; /**< Tag protocol identifier (0x893F). */
 /** E-Tag control information (E-TCI). */
 uint16_t epcp_edei_in_ecid_b; /**< E-PCP (3b), E-DEI (1b), ingress E-CID base (12b). */
 uint16_t rsvd_grp_ecid_b; /**< Reserved (2b), GRP (2b), E-CID base (12b). */
 uint8_t in_ecid_e; /**< Ingress E-CID ext. */
 uint8_t ecid_e; /**< E-CID ext. */
 };

Applications are responsible for breaking down and filling individual
fields properly. The Ethernet header would be provided as its own item, as
shown in this testpmd flow command example:

 flow create 0 ingress pattern eth / e_tag in_ecid_base is 42 / end actions drop / end

Note, all multibyte values are in network order like other protocol header
definitions.
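
As an illustration of what "breaking down and filling individual fields" could look like on the application side, here is a minimal sketch. The struct repeats the untested suggestion above; the helpers `e_tag_fill` and `e_tag_in_ecid_base` are hypothetical, and the bit positions assume the fields are packed most significant first in the order the comments list them. Plain `htons()`/`ntohs()` are used to keep the example self-contained; a DPDK application would more likely use `rte_cpu_to_be_16()`.

```c
#include <stdint.h>
#include <arpa/inet.h> /* htons(), ntohs() */

/* Item layout as suggested above; multibyte fields are in network order. */
struct rte_flow_item_e_tag {
	uint16_t tpid;                /* Tag protocol identifier (0x893F). */
	uint16_t epcp_edei_in_ecid_b; /* E-PCP (3b), E-DEI (1b), ingress E-CID base (12b). */
	uint16_t rsvd_grp_ecid_b;     /* Reserved (2b), GRP (2b), E-CID base (12b). */
	uint8_t in_ecid_e;            /* Ingress E-CID ext. */
	uint8_t ecid_e;               /* E-CID ext. */
};

/* Hypothetical helper: fill the item from host-order field values. */
static void
e_tag_fill(struct rte_flow_item_e_tag *item, uint8_t epcp, uint8_t edei,
	   uint16_t in_ecid_base, uint8_t grp, uint16_t ecid_base)
{
	item->tpid = htons(0x893F);
	item->epcp_edei_in_ecid_b = htons((uint16_t)(((epcp & 0x7) << 13) |
		((edei & 0x1) << 12) | (in_ecid_base & 0xFFF)));
	item->rsvd_grp_ecid_b = htons((uint16_t)(((grp & 0x3) << 12) |
		(ecid_base & 0xFFF)));
	item->in_ecid_e = 0; /* Extension bytes are left at zero here. */
	item->ecid_e = 0;
}

/* Hypothetical helper: read back the ingress E-CID base in host order. */
static uint16_t
e_tag_in_ecid_base(const struct rte_flow_item_e_tag *item)
{
	return ntohs(item->epcp_edei_in_ecid_b) & 0xFFF;
}
```

Because the fields are stored as whole 16-bit words in network order, the in-memory layout does not depend on how the compiler orders bit-fields or on host endianness.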

[1] http://dpdk.org/ml/archives/dev/2016-December/053181.html
Message ID: 20161223081310.gh10...@6wind.com

-- 
Adrien Mazarguil
6WIND


Re: [dpdk-dev] [PATCH v2 15/18] net/ixgbe: parse flow director filter

2017-01-03 Thread Adrien Mazarguil
Hi Wei,

On Fri, Dec 30, 2016 at 03:53:07PM +0800, Wei Zhao wrote:
> check if the rule is a flow director rule, and get the flow director info.
> 
> Signed-off-by: Wei Zhao 
> Signed-off-by: Wenzhuo Lu 
> 
> ---
> 
> v2:add new error set function
> ---
>  drivers/net/ixgbe/ixgbe_ethdev.c | 1467 
> +-
>  drivers/net/ixgbe/ixgbe_ethdev.h |   16 +
>  drivers/net/ixgbe/ixgbe_fdir.c   |  247 ---
>  lib/librte_ether/rte_flow.h  |   23 +
>  4 files changed, 1495 insertions(+), 258 deletions(-)
[...]
> diff --git a/lib/librte_ether/rte_flow.h b/lib/librte_ether/rte_flow.h
> index e9e6220..e59f458 100644
> --- a/lib/librte_ether/rte_flow.h
> +++ b/lib/librte_ether/rte_flow.h
> @@ -275,6 +275,13 @@ enum rte_flow_item_type {
> * See struct rte_flow_item_e_tag.
> */
>   RTE_FLOW_ITEM_TYPE_E_TAG,
> +
> + /**
> +  * Matches a NVGRE header.
> +  *
> +  * See struct rte_flow_item_nvgre.
> +  */
> + RTE_FLOW_ITEM_TYPE_NVGRE,
>  };
>  
>  /**
> @@ -486,6 +493,22 @@ struct rte_flow_item_e_tag {
>  };
>  
>  /**
> + * RTE_FLOW_ITEM_TYPE_NVGRE.
> + *
> + * Matches a NVGRE header.
> + */
> +struct rte_flow_item_nvgre {
> + uint32_t flags0:1; /**< 0 */
> + uint32_t rsvd1:1; /**< 1 bit not defined */
> + uint32_t flags1:2; /**< 2 bits, 1 0 */
> + uint32_t rsvd0:9; /**< Reserved0 */
> + uint32_t ver:3; /**< version */
> + uint32_t protocol:16; /**< protocol type, 0x6558 */
> + uint8_t tni[3]; /**< tenant network ID or virtual subnet ID */
> + uint8_t flow_id; /**< flow ID or Reserved */
> +};
[...]

See my previous reply [1], this definition is not endian-safe due to the use
of bit-fields and should look more like the VXLAN item. Here is an untested
suggestion (not sure about all values):

 struct rte_flow_item_nvgre {
 /**
  * Checksum (1b), undefined (1b), key bit (1b), sequence number (1b),
  * reserved 0 (9b), version (3b).
  *
  * \c_k_s_rsvd0_ver must have value 0x2000 according to RFC 7637.
  */ 
 uint16_t c_k_s_rsvd0_ver;
 uint16_t proto; /**< Protocol type (0x6558). */
 uint8_t vsid[3]; /**< Virtual subnet ID. */
 uint8_t flow_id; /**< Flow ID. */ 
 };

As with E-Tag, applications are responsible for breaking down and filling
individual fields properly.
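
A minimal sketch of how an application might fill such an item follows. The struct repeats the untested suggestion above; `nvgre_fill` and `nvgre_vsid` are hypothetical helpers, and storing the 24-bit VSID most significant byte first is an assumption based on network byte order.

```c
#include <stdint.h>
#include <arpa/inet.h> /* htons(), ntohs() */

/* Item layout as suggested above; multibyte fields are in network order. */
struct rte_flow_item_nvgre {
	uint16_t c_k_s_rsvd0_ver; /* C (1b), undefined (1b), K (1b), S (1b), reserved 0 (9b), version (3b). */
	uint16_t proto;           /* Protocol type (0x6558). */
	uint8_t vsid[3];          /* Virtual subnet ID. */
	uint8_t flow_id;          /* Flow ID. */
};

/* Hypothetical helper: build an item from a host-order 24-bit VSID. */
static void
nvgre_fill(struct rte_flow_item_nvgre *item, uint32_t vsid, uint8_t flow_id)
{
	item->c_k_s_rsvd0_ver = htons(0x2000); /* Only the key bit is set. */
	item->proto = htons(0x6558);
	item->vsid[0] = (uint8_t)(vsid >> 16); /* Most significant byte first. */
	item->vsid[1] = (uint8_t)(vsid >> 8);
	item->vsid[2] = (uint8_t)vsid;
	item->flow_id = flow_id;
}

/* Hypothetical helper: read the VSID back in host order. */
static uint32_t
nvgre_vsid(const struct rte_flow_item_nvgre *item)
{
	return ((uint32_t)item->vsid[0] << 16) |
	       ((uint32_t)item->vsid[1] << 8) |
	       (uint32_t)item->vsid[2];
}
```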

[1] http://dpdk.org/ml/archives/dev/2016-December/053181.html
Message ID: 20161223081310.gh10...@6wind.com

-- 
Adrien Mazarguil
6WIND


Re: [dpdk-dev] [PATCH v3] crypto/aesni_gcm: migration from MB library to ISA-L

2017-01-03 Thread Thomas Monjalon
2017-01-03 14:02, Piotr Azarewicz:
> Current Cryptodev AES-NI GCM PMD is implemented using Multi Buffer
> Crypto library.This patch reimplement the device using ISA-L Crypto
> library: https://github.com/01org/isa-l_crypto.
> 
> The migration entailed the following additional support for:
>   * GMAC algorithm.
>   * 256-bit cipher key.
>   * Session-less mode.
>   * Out-of place processing
>   * Scatter-gatter support for chained mbufs (only out-of place and
> destination mbuf must be contiguous)
> 
> Verified current unit tests and added new unit tests to verify new
> functionalities.
> 
> Signed-off-by: Piotr Azarewicz 
[...]
>  The AES-NI GCM PMD (**librte_pmd_aesni_gcm**) provides poll mode crypto 
> driver
> -support for utilizing Intel multi buffer library (see AES-NI Multi-buffer 
> PMD documentation
> -to learn more about it, including installation).
> -
> -The AES-NI GCM PMD has current only been tested on Fedora 21 64-bit with gcc.
> +support for utilizing Intel ISA-L crypto library, which provides operation 
> acceleration
> +through the AES-NI instruction sets for AES-GCM authenticated cipher 
> algorithm.

Please could you compare these libraries regarding the performance?

[...]
>  Features
>  
> @@ -49,16 +47,21 @@ Cipher algorithms:
>  Authentication algorithms:
>  
>  * RTE_CRYPTO_AUTH_AES_GCM
> +* RTE_CRYPTO_AUTH_AES_GMAC
> +
> +Installation
> +
> +
> +To build DPDK with the AESNI_GCM_PMD the user is required to install
> +the ``libisal_crypto`` library in the build environment.
> +For download and more details please visit 
> ``_.

[...]
>  Limitations
>  ---
>  
> -* Chained mbufs are not supported.
> +* Chained mbufs are supported but only out-of-place (destination mbuf must 
> be contiguous).
>  * Hash only is not supported.
>  * Cipher only is not supported.
> -* Only in-place is currently supported (destination address is the same as 
> source address).
> -* Only supports session-oriented API implementation (session-less APIs are 
> not supported).
>  *  Not performance tuned.

[...]
> --- a/drivers/crypto/aesni_gcm/Makefile
> +++ b/drivers/crypto/aesni_gcm/Makefile
> @@ -31,9 +31,6 @@
>  include $(RTE_SDK)/mk/rte.vars.mk
>  
>  ifneq ($(MAKECMDGOALS),clean)
> -ifeq ($(AESNI_MULTI_BUFFER_LIB_PATH),)
> -$(error "Please define AESNI_MULTI_BUFFER_LIB_PATH environment variable")
> -endif
>  endif
>  
>  # library name
> @@ -50,10 +47,7 @@ LIBABIVER := 1
>  EXPORT_MAP := rte_pmd_aesni_gcm_version.map
>  
>  # external library dependencies
> -CFLAGS += -I$(AESNI_MULTI_BUFFER_LIB_PATH)
> -CFLAGS += -I$(AESNI_MULTI_BUFFER_LIB_PATH)/include
> -LDLIBS += -L$(AESNI_MULTI_BUFFER_LIB_PATH) -lIPSec_MB
> -LDLIBS += -lcrypto
> +LDLIBS += -lisal_crypto

You need to update the script test-build.sh.
Thanks


Re: [dpdk-dev] [PATCH v2 02/18] net/ixgbe: store flow director filter

2017-01-03 Thread Dai, Wei
Hi, Wei Zhao

Would you please rebase this patch set on master?
When I do git pull and then apply this patch set with git am, the following errors are reported:
[root@dpdk4 dpdk-org]# git am ../patches/bundle-488-zhaowei-ixgbe-filter-api-v2.mbox

Applying: net/ixgbe: store SYN filter
Applying: net/ixgbe: store flow director filter
error: patch failed: drivers/net/ixgbe/ixgbe_ethdev.c:1284
error: drivers/net/ixgbe/ixgbe_ethdev.c: patch does not apply
Patch failed at 0002 net/ixgbe: store flow director filter
The copy of the patch that failed is found in: .git/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Wei Zhao
> Sent: Friday, December 30, 2016 3:53 PM
> To: dev@dpdk.org
> Cc: Lu, Wenzhuo ; Zhao1, Wei 
> Subject: [dpdk-dev] [PATCH v2 02/18] net/ixgbe: store flow director filter
> 
> Add support for storing flow director filter in SW.
> 
> Signed-off-by: Wenzhuo Lu 
> Signed-off-by: Wei Zhao 
> ---
> 
> v2:
> --add a fdir initialization function in device start process
> ---
>  drivers/net/ixgbe/ixgbe_ethdev.c |  55 
> drivers/net/ixgbe/ixgbe_ethdev.h |  19 ++-
>  drivers/net/ixgbe/ixgbe_fdir.c   | 105
> ++-
>  3 files changed, 176 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
> b/drivers/net/ixgbe/ixgbe_ethdev.c
> index 316e560..de27a73 100644
> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> @@ -60,6 +60,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include "ixgbe_logs.h"
>  #include "base/ixgbe_api.h"
> @@ -165,6 +166,7 @@ enum ixgbevf_xcast_modes {
> 
>  static int eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev);  static int
> eth_ixgbe_dev_uninit(struct rte_eth_dev *eth_dev);
> +static int ixgbe_fdir_filter_init(struct rte_eth_dev *eth_dev);
>  static int  ixgbe_dev_configure(struct rte_eth_dev *dev);  static int
> ixgbe_dev_start(struct rte_eth_dev *dev);  static void ixgbe_dev_stop(struct
> rte_eth_dev *dev); @@ -1276,6 +1278,9 @@ eth_ixgbe_dev_init(struct
> rte_eth_dev *eth_dev)
> 
>   /* initialize SYN filter */
>   filter_info->syn_info = 0;
> + /* initialize flow director filter list & hash */
> + ixgbe_fdir_filter_init(eth_dev);
> +
>   return 0;
>  }
> 
> @@ -1284,6 +1289,9 @@ eth_ixgbe_dev_uninit(struct rte_eth_dev *eth_dev)
> {
>   struct rte_pci_device *pci_dev;
>   struct ixgbe_hw *hw;
> + struct ixgbe_hw_fdir_info *fdir_info =
> + IXGBE_DEV_PRIVATE_TO_FDIR_INFO(eth_dev->data->dev_private);
> + struct ixgbe_fdir_filter *fdir_filter;
> 
>   PMD_INIT_FUNC_TRACE();
> 
> @@ -1317,9 +1325,56 @@ eth_ixgbe_dev_uninit(struct rte_eth_dev
> *eth_dev)
>   rte_free(eth_dev->data->hash_mac_addrs);
>   eth_dev->data->hash_mac_addrs = NULL;
> 
> + /* remove all the fdir filters & hash */
> + if (fdir_info->hash_map)
> + rte_free(fdir_info->hash_map);
> + if (fdir_info->hash_handle)
> + rte_hash_free(fdir_info->hash_handle);
> +
> + while ((fdir_filter = TAILQ_FIRST(&fdir_info->fdir_list))) {
> + TAILQ_REMOVE(&fdir_info->fdir_list,
> +  fdir_filter,
> +  entries);
> + rte_free(fdir_filter);
> + }
> +
>   return 0;
>  }
> 
> +static int ixgbe_fdir_filter_init(struct rte_eth_dev *eth_dev) {
> + struct ixgbe_hw_fdir_info *fdir_info =
> + IXGBE_DEV_PRIVATE_TO_FDIR_INFO(eth_dev->data->dev_private);
> + char fdir_hash_name[RTE_HASH_NAMESIZE];
> + struct rte_hash_parameters fdir_hash_params = {
> + .name = fdir_hash_name,
> + .entries = IXGBE_MAX_FDIR_FILTER_NUM,
> + .key_len = sizeof(union ixgbe_atr_input),
> + .hash_func = rte_hash_crc,
> + .hash_func_init_val = 0,
> + .socket_id = rte_socket_id(),
> + };
> +
> + TAILQ_INIT(&fdir_info->fdir_list);
> + snprintf(fdir_hash_name, RTE_HASH_NAMESIZE,
> +  "fdir_%s", eth_dev->data->name);
> + fdir_info->hash_handle = rte_hash_create(&fdir_hash_params);
> + if (!fdir_info->hash_handle) {
> + PMD_INIT_LOG(ERR, "Failed to create fdir hash table!");
> + return -EINVAL;
> + }
> + fdir_info->hash_map = rte_zmalloc("ixgbe",
> +   sizeof(struct ixgbe_fdir_filter *) *
> +   IXGBE_MAX_FDIR_FILTER_NUM,
> +   0);
> + if (!fdir_info->hash_map) {
> + PMD_INIT_LOG(ERR,
> +  "Failed to allocate memory for fdir hash map!");
> + return -ENOMEM;
> + }
> +
> + return 0;
> +}
>  /*
>   * Negotiate mail

Re: [dpdk-dev] [PATCH v2 01/18] net/ixgbe: store SYN filter

2017-01-03 Thread Dai, Wei
Hi, Wei Zhao 

I think you should provide a cover letter for a patch series like this.
You can describe the changes between v1 and v2 in the cover letter,
so there may be no need to describe them in each patch.

Thanks & Best Regards
-Wei

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Wei Zhao
> Sent: Friday, December 30, 2016 3:53 PM
> To: dev@dpdk.org
> Cc: Lu, Wenzhuo ; Zhao1, Wei 
> Subject: [dpdk-dev] [PATCH v2 01/18] net/ixgbe: store SYN filter
> 
> Add support for storing SYN filter in SW.
> 
> Signed-off-by: Wenzhuo Lu 
> Signed-off-by: Wei Zhao 
> ---
> 
> v2:
> --synqf assignment location change
> ---
>  drivers/net/ixgbe/ixgbe_ethdev.c | 14 +++---
> drivers/net/ixgbe/ixgbe_ethdev.h |  2 ++
>  2 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
> b/drivers/net/ixgbe/ixgbe_ethdev.c
> index a25bac8..316e560 100644
> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> @@ -1274,6 +1274,8 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
>   memset(filter_info->fivetuple_mask, 0,
>  sizeof(uint32_t) * IXGBE_5TUPLE_ARRAY_SIZE);
> 
> + /* initialize SYN filter */
> + filter_info->syn_info = 0;
>   return 0;
>  }
> 
> @@ -5580,15 +5582,18 @@ ixgbe_syn_filter_set(struct rte_eth_dev *dev,
>   bool add)
>  {
>   struct ixgbe_hw *hw =
> IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> + struct ixgbe_filter_info *filter_info =
> + IXGBE_DEV_PRIVATE_TO_FILTER_INFO(dev->data->dev_private);
> + uint32_t syn_info;
>   uint32_t synqf;
> 
>   if (filter->queue >= IXGBE_MAX_RX_QUEUE_NUM)
>   return -EINVAL;
> 
> - synqf = IXGBE_READ_REG(hw, IXGBE_SYNQF);
> + syn_info = filter_info->syn_info;
> 
>   if (add) {
> - if (synqf & IXGBE_SYN_FILTER_ENABLE)
> + if (syn_info & IXGBE_SYN_FILTER_ENABLE)
>   return -EINVAL;
>   synqf = (uint32_t)(((filter->queue <<
> IXGBE_SYN_FILTER_QUEUE_SHIFT) &
>   IXGBE_SYN_FILTER_QUEUE) | IXGBE_SYN_FILTER_ENABLE);
> @@ -5598,10 +5603,13 @@ ixgbe_syn_filter_set(struct rte_eth_dev *dev,
>   else
>   synqf &= ~IXGBE_SYN_FILTER_SYNQFP;
>   } else {
> - if (!(synqf & IXGBE_SYN_FILTER_ENABLE))
> + synqf = IXGBE_READ_REG(hw, IXGBE_SYNQF);
> + if (!(syn_info & IXGBE_SYN_FILTER_ENABLE))
>   return -ENOENT;
>   synqf &= ~(IXGBE_SYN_FILTER_QUEUE |
> IXGBE_SYN_FILTER_ENABLE);
>   }
> +
> + filter_info->syn_info = synqf;
>   IXGBE_WRITE_REG(hw, IXGBE_SYNQF, synqf);
>   IXGBE_WRITE_FLUSH(hw);
>   return 0;
> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h
> b/drivers/net/ixgbe/ixgbe_ethdev.h
> index 4ff6338..827026c 100644
> --- a/drivers/net/ixgbe/ixgbe_ethdev.h
> +++ b/drivers/net/ixgbe/ixgbe_ethdev.h
> @@ -262,6 +262,8 @@ struct ixgbe_filter_info {
>   /* Bit mask for every used 5tuple filter */
>   uint32_t fivetuple_mask[IXGBE_5TUPLE_ARRAY_SIZE];
>   struct ixgbe_5tuple_filter_list fivetuple_list;
> + /* store the SYN filter info */
> + uint32_t syn_info;
>  };
> 
>  /*
> --
> 2.5.5



Re: [dpdk-dev] [PATCH v5 00/20] Decouple ethdev from PCI device

2017-01-03 Thread Ferruh Yigit
On 1/3/2017 2:06 PM, Thomas Monjalon wrote:
> 2017-01-03 12:24, Ferruh Yigit:
>> On 12/25/2016 10:33 PM, Thomas Monjalon wrote:
>>> Applied with some trivial fixes, thanks
>>
>> Getting following build error for mlx5 [1], it is mainly because verbs.h
>> also using container_of macro.
>>
>> [1]
>> In file included from
>> .../x86_64-native-linuxapp-gcc/include/rte_mbuf.h:57:0,
>>  from .../x86_64-native-linuxapp-gcc/include/rte_ether.h:52,
>>  from .../drivers/net/mlx5/mlx5_trigger.c:38:
>> /usr/include/infiniband/verbs.h: In function ‘verbs_get_device’:
>> .../x86_64-native-linuxapp-gcc/include/rte_common.h:350:40: error:
>> initialization discards ‘const’ qualifier from pointer target type
>> [-Werror=discarded-qualifiers]
>> typeof(((type *)0)->member) *_ptr = (ptr); \
>> ^
> 
> Yes, this issue is fixed by upgrading mofed to version 3.4.
> 

Confirmed.


Re: [dpdk-dev] [PATCH v3 1/4] ethdev: add firmware information get

2017-01-03 Thread Ferruh Yigit
On 1/3/2017 9:05 AM, Yang, Qiming wrote:
> Hi, Ferruh
> Please see the question below. In my opinion, etrack_id is just a name used 
> to define the ID of one NIC.
> In kernel version ethtool, it will print this ID in the line of firmware 
> verison. 
> I know what is etrack_id mean, but I really don't know why this named 
> etrack_id.

Hi Qiming,

I suggested the API based on fields you already used in your patch.

So this API is meant to get the FW version. Is etrack_id something that
defines (part of) the firmware version?

Thanks,
ferruh


> Can you explain this question?
>  
> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] 
> Sent: Tuesday, January 3, 2017 4:40 PM
> To: Yang, Qiming 
> Subject: Re: [PATCH v3 1/4] ethdev: add firmware information get
> 
> Please reply below the question and on the mailing list.
> You'll have to explain why this name etrack_id.
> 
> 2017-01-03 03:28, Yang, Qiming:
>> Hi, Thomas
>> etrack_id is not a terminology, it's decided by me.
>> Which is store the unique number of the firmware.
>> firmware-version: 5.04 0x800024ca
>> 800024ca is the etrack_id of this NIC.
>>
>> -Original Message-
>> From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] 
>> Sent: Monday, January 2, 2017 11:39 PM
>> To: Yang, Qiming 
>> Cc: dev@dpdk.org; Horton, Remy ; Yigit, Ferruh 
>> 
>> Subject: Re: [PATCH v3 1/4] ethdev: add firmware information get
>>
>> 2016-12-27 20:30, Qiming Yang:
>>>  /**
>>> + * Retrieve the firmware version of a device.
>>> + *
>>> + * @param port_id
>>> + *   The port identifier of the device.
>>> + * @param fw_major
>>> + *   A array pointer to store the major firmware version of a device.
>>> + * @param fw_minor
>>> + *   A array pointer to store the minor firmware version of a device.
>>> + * @param fw_patch
>>> + *   A array pointer to store the firmware patch number of a device.
>>> + * @param etrack_id
>>> + *   A array pointer to store the nvm version of a device.
>>> + */
>>> +void rte_eth_dev_fw_info_get(uint8_t port_id, uint32_t *fw_major,
>>> +   uint32_t *fw_minor, uint32_t *fw_patch, uint32_t *etrack_id);
>>
>> I have a reserve about the naming etrack_id.
>> Please could you point to a document explaining this ID?
>> Is it known outside of Intel?
> 
> 



Re: [dpdk-dev] [PATCH v3 1/4] ethdev: add firmware information get

2017-01-03 Thread Ferruh Yigit
On 12/27/2016 12:30 PM, Qiming Yang wrote:
> This patch adds a new API 'rte_eth_dev_fw_info_get' for fetching
> firmware related information by a given device.
> 
> Signed-off-by: Qiming Yang 
> Acked-by: Remy Horton 
> ---
> v2 changes:
> * modified some comment statements.
> v3 changes:
> * change API, use rte_eth_dev_fw_info_get(uint8_t port_id,
>   uint32_t *fw_major, uint32_t *fw_minor, uint32_t *fw_patch,
>   uint32_t *etrack_id) instead of rte_eth_dev_fwver_get(uint8_t port_id,
>   char *fw_version, int fw_length).
>   Add statusment in /doc/guides/nics/features/default.ini and
>   release_17_02.rst.
> ---
> ---
>  doc/guides/nics/features/default.ini   |  1 +
>  doc/guides/rel_notes/release_17_02.rst |  4 
>  lib/librte_ether/rte_ethdev.c  | 14 ++
>  lib/librte_ether/rte_ethdev.h  | 23 +++
>  lib/librte_ether/rte_ether_version.map |  1 +
>  5 files changed, 43 insertions(+)

This patch should also remove the deprecation notice
(item 3 of the requested changes).

> 
> diff --git a/doc/guides/nics/features/default.ini 
> b/doc/guides/nics/features/default.ini
> index f1bf9bf..8237ee4 100644
> --- a/doc/guides/nics/features/default.ini
> +++ b/doc/guides/nics/features/default.ini
> @@ -66,3 +66,4 @@ x86-64   =
>  Usage doc=
>  Design doc   =
>  Perf doc =
> +FW version   =

I am not sure about this location. I think it could go before "EEPROM
dump"; what do you think?

> diff --git a/doc/guides/rel_notes/release_17_02.rst 
> b/doc/guides/rel_notes/release_17_02.rst
> index 180af82..f6dc6c0 100644
> --- a/doc/guides/rel_notes/release_17_02.rst
> +++ b/doc/guides/rel_notes/release_17_02.rst
> @@ -52,6 +52,10 @@ New Features
>See the :ref:`Generic flow API ` documentation for more
>information.
>  
> +* **Added firmware information get API.**
> + Added a new function ``rte_eth_dev_fw_info_get()`` to fetch firmware related
> + information by a given device. Information include major firmware version,
> + minor firmware version, patch number and etrack id.
>  
>  Resolved Issues
>  ---
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index 280f0db..f399f09 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -1586,6 +1586,20 @@ rte_eth_dev_set_rx_queue_stats_mapping(uint8_t 
> port_id, uint16_t rx_queue_id,
>  }
>  
>  void
> +rte_eth_dev_fw_info_get(uint8_t port_id, uint32_t *fw_major, uint32_t 
> *fw_minor,
> + uint32_t *fw_patch, uint32_t *etrack_id)

I am for rte_eth_dev_fw_version_get(), to limit the scope of the API.
Also, the API name and the eth_dev_ops name should match.

> +{
> + struct rte_eth_dev *dev;
> +
> + RTE_ETH_VALID_PORTID_OR_RET(port_id);
> + dev = &rte_eth_devices[port_id];
> +

What do you think about setting all arguments to zero here?

> + RTE_FUNC_PTR_OR_RET(*dev->dev_ops->fw_version_get);
> + (*dev->dev_ops->fw_version_get)(dev, fw_major, fw_minor,
> + fw_patch, etrack_id);
> +}
> +

<...>

> --- a/lib/librte_ether/rte_ether_version.map
> +++ b/lib/librte_ether/rte_ether_version.map
> @@ -156,5 +156,6 @@ DPDK_17.02 {
>   rte_flow_flush;
>   rte_flow_query;
>   rte_flow_validate;
> + rte_eth_dev_fw_info_get;

Please add this in alphabetical order.

>  
>  } DPDK_16.11;
> 
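
The suggestion above to zero all output arguments before dispatching to the driver could look roughly like this. It is a minimal sketch with hypothetical stand-in types (`struct fw_version`, `struct mock_dev`) and hypothetical function names; it is not the ethdev implementation, and grouping the four values in a struct is only for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ethdev structures; not the real DPDK types. */
struct fw_version {
	uint32_t major;
	uint32_t minor;
	uint32_t patch;
	uint32_t etrack_id;
};

struct mock_dev;
typedef void (*fw_version_get_t)(struct mock_dev *dev, struct fw_version *ver);

struct mock_dev {
	fw_version_get_t fw_version_get; /* NULL if the driver lacks support. */
};

/*
 * Zero the output first so that callers see well-defined values even when
 * the driver does not implement the callback or fills only some fields.
 */
static void
mock_fw_version_get(struct mock_dev *dev, struct fw_version *ver)
{
	*ver = (struct fw_version){ 0 };
	if (dev->fw_version_get == NULL)
		return;
	dev->fw_version_get(dev, ver);
}

/* Example driver callback that only knows its etrack_id. */
static void
partial_fw_version_get(struct mock_dev *dev, struct fw_version *ver)
{
	(void)dev;
	ver->etrack_id = 0x800024ca;
}
```

A driver that fills only some fields, as the ixgbe patch in this series does, then leaves the remaining ones at zero instead of whatever the caller had on its stack.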



Re: [dpdk-dev] [PATCH v3 2/4] net/e1000: add firmware version get

2017-01-03 Thread Ferruh Yigit
On 12/27/2016 12:30 PM, Qiming Yang wrote:
> This patch adds a new function eth_igb_fw_version_get.
> 
> Signed-off-by: Qiming Yang 
> ---
> v3 changes:
>  * use eth_igb_fw_version_get(struct rte_eth_dev *dev, u32 *fw_major,
>u32 *fw_minor, u32 *fw_minor, u32 *fw_patch, u32 *etrack_id) instead
>of eth_igb_fw_version_get(struct rte_eth_dev *dev, char *fw_version,
>int fw_length). Add statusment in /doc/guides/nics/features/igb.ini.
> ---
> ---
>  doc/guides/nics/features/igb.ini |  1 +
>  drivers/net/e1000/igb_ethdev.c   | 43 
> 
>  2 files changed, 44 insertions(+)
> 
> diff --git a/doc/guides/nics/features/igb.ini 
> b/doc/guides/nics/features/igb.ini
> index 9fafe72..ffd87ba 100644
> --- a/doc/guides/nics/features/igb.ini
> +++ b/doc/guides/nics/features/igb.ini
> @@ -39,6 +39,7 @@ EEPROM dump  = Y
>  Registers dump   = Y
>  BSD nic_uio  = Y
>  Linux UIO= Y
> +FW version   = Y

Please keep the same location as in the default.ini file. Why are you
putting this right between the UIO and VFIO entries?

>  Linux VFIO   = Y
>  x86-32   = Y
>  x86-64   = Y
> diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
> index 4a15447..25344b7 100644
> --- a/drivers/net/e1000/igb_ethdev.c
> +++ b/drivers/net/e1000/igb_ethdev.c
> @@ -120,6 +120,8 @@ static int eth_igb_xstats_get_names(struct rte_eth_dev 
> *dev,
>   unsigned limit);
>  static void eth_igb_stats_reset(struct rte_eth_dev *dev);
>  static void eth_igb_xstats_reset(struct rte_eth_dev *dev);
> +static void eth_igb_fw_version_get(struct rte_eth_dev *dev, u32 *fw_major,
> + u32 *fw_minor, u32 *fw_patch, u32 *etrack_id);

I think you can use a struct as the parameter here. But beware, that
struct should NOT be a public struct.

>  static void eth_igb_infos_get(struct rte_eth_dev *dev,
> struct rte_eth_dev_info *dev_info);
>  static const uint32_t *eth_igb_supported_ptypes_get(struct rte_eth_dev *dev);
> @@ -389,6 +391,7 @@ static const struct eth_dev_ops eth_igb_ops = {
>   .xstats_get_names = eth_igb_xstats_get_names,
>   .stats_reset  = eth_igb_stats_reset,
>   .xstats_reset = eth_igb_xstats_reset,
> + .fw_version_get   = eth_igb_fw_version_get,
>   .dev_infos_get= eth_igb_infos_get,
>   .dev_supported_ptypes_get = eth_igb_supported_ptypes_get,
>   .mtu_set  = eth_igb_mtu_set,
> @@ -1981,6 +1984,46 @@ eth_igbvf_stats_reset(struct rte_eth_dev *dev)
>  }
>  

<...>


Re: [dpdk-dev] [PATCH v3 3/4] net/ixgbe: add firmware version get

2017-01-03 Thread Ferruh Yigit
On 12/27/2016 12:30 PM, Qiming Yang wrote:
> This patch add a new function ixgbe_fw_version_get.
> 
> Signed-off-by: Qiming Yang 

<...>

>  
>  static void
> +ixgbe_fw_version_get(struct rte_eth_dev *dev, __rte_unused u32 *fw_major,
> + __rte_unused u32 *fw_minor, __rte_unused u32 *fw_patch, u32 *etrack_id)

I think this API should provide at least the major and minor FW versions.
Isn't there any kind of FW version information for ixgbe? Providing only
etrack_id does not look good.

> +{
> + struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> + u16 eeprom_verh, eeprom_verl;
> +
> + ixgbe_read_eeprom(hw, 0x2e, &eeprom_verh);
> + ixgbe_read_eeprom(hw, 0x2d, &eeprom_verl);
> +
> + *etrack_id = (eeprom_verh << 16) | eeprom_verl;
> +}
> +
> +static void
>  ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info 
> *dev_info)
>  {
>   struct rte_pci_device *pci_dev = IXGBE_DEV_TO_PCI(dev);
> 
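
The quoted function builds the ETrack ID from two 16-bit EEPROM words.
A minimal standalone sketch of that combination follows; the helper name
etrack_id_get is hypothetical. It widens the high word before shifting,
because a 16-bit value is promoted to int and shifting 0x8000 or above
left by 16 would overflow a 32-bit int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine two 16-bit words into one 32-bit ID (hypothetical helper). */
static void
etrack_id_get(uint16_t high, uint16_t low, uint32_t *etrack_id)
{
    assert(etrack_id != NULL);
    /* Cast first so the shift is done on an unsigned 32-bit value. */
    *etrack_id = ((uint32_t)high << 16) | low;
}
```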



[dpdk-dev] [PATCH] app/test: fix aad padding size in SGL operation

2017-01-03 Thread Arek Kusztal
This commit removes the unnecessary padding of the AAD for GCM when
using a scatter-gather list.
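
To illustrate why the padding matters, here is a small sketch of how
rounding the AAD length up to 16 bytes shifts the data offset. The
helpers round_up_pow2 and data_offset are hypothetical stand-ins, not
the macros used in test_cryptodev.c.

```c
#include <assert.h>

/* Round len up to a multiple of align; align must be a power of two. */
static unsigned int
round_up_pow2(unsigned int len, unsigned int align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/* Offset of the payload when the AAD and the IV are placed before it. */
static unsigned int
data_offset(unsigned int aad_bytes, unsigned int iv_bytes)
{
    return aad_bytes + iv_bytes;
}
```

With an illustrative 12-byte AAD and 16-byte IV, the padded layout puts
the payload at offset 32, while the unpadded layout puts it at 28.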

Fixes: b71990ffa7e4 ("app/test: add SGL tests to cryptodev QAT suite")

Signed-off-by: Arek Kusztal 
---
 app/test/test_cryptodev.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c
index ba6bbb5..3eaf1b7 100644
--- a/app/test/test_cryptodev.c
+++ b/app/test/test_cryptodev.c
@@ -5983,7 +5983,7 @@ create_gcm_operation_SGL(enum rte_crypto_cipher_operation 
op,
const unsigned int iv_len = tdata->iv.len;
const unsigned int aad_len = tdata->aad.len;
 
-   unsigned int iv_pad_len = 0, aad_buffer_len = 0;
+   unsigned int iv_pad_len = 0;
 
/* Generate Crypto op data structure */
ut_params->op = rte_crypto_op_alloc(ts_params->op_mpool,
@@ -6023,17 +6023,15 @@ create_gcm_operation_SGL(enum 
rte_crypto_cipher_operation op,
 
rte_memcpy(sym_op->cipher.iv.data, tdata->iv.data, iv_pad_len);
 
-   aad_buffer_len = ALIGN_POW2_ROUNDUP(aad_len, 16);
-
sym_op->auth.aad.data = (uint8_t *)rte_pktmbuf_prepend(
-   ut_params->ibuf, aad_buffer_len);
+   ut_params->ibuf, aad_len);
TEST_ASSERT_NOT_NULL(sym_op->auth.aad.data,
"no room to prepend aad");
sym_op->auth.aad.phys_addr = rte_pktmbuf_mtophys(
ut_params->ibuf);
sym_op->auth.aad.length = aad_len;
 
-   memset(sym_op->auth.aad.data, 0, aad_buffer_len);
+   memset(sym_op->auth.aad.data, 0, aad_len);
rte_memcpy(sym_op->auth.aad.data, tdata->aad.data, aad_len);
 
TEST_HEXDUMP(stdout, "iv:", sym_op->cipher.iv.data, iv_pad_len);
@@ -6041,9 +6039,9 @@ create_gcm_operation_SGL(enum rte_crypto_cipher_operation 
op,
sym_op->auth.aad.data, aad_len);
 
sym_op->cipher.data.length = tdata->plaintext.len;
-   sym_op->cipher.data.offset = aad_buffer_len + iv_pad_len;
+   sym_op->cipher.data.offset = aad_len + iv_pad_len;
 
-   sym_op->auth.data.offset = aad_buffer_len + iv_pad_len;
+   sym_op->auth.data.offset = aad_len + iv_pad_len;
sym_op->auth.data.length = tdata->plaintext.len;
 
return 0;
-- 
2.1.0



Re: [dpdk-dev] [PATCH 23/25] net/qede/base: semantic/formatting changes

2017-01-03 Thread Ferruh Yigit
On 12/31/2016 7:41 AM, Mody, Rasesh wrote:
>> From: Ferruh Yigit [mailto:ferruh.yi...@intel.com]
>> Sent: Friday, December 23, 2016 7:42 AM
>>
>> On 12/3/2016 9:11 AM, Rasesh Mody wrote:
>>> This patch consists of semantic/formatting changes. It also includes
>>> comment additions.
>>
>> As far as I can see majority of the changes are formatting, but not all.
>>
>> Functional changes are hard to detect in this patch, what do you think
>> separating formatting/comments patches into another patch, so functional
>> changes can become more visible?
> 
> There are a few places (ecore_hw_bar_size(), ecore_get_hw_info() and
> ecore_init_cmd_*) where there is a bit of code refactoring. However, they
> are not a major change. We have tried to isolate most of the functional
> changes and made them part of separate patches as appropriate. I think we can
> include a bit of description in the commit message to cover it in this patch.
> Please let me know if you think otherwise.

I believe it is better to separate code refactoring into a different patch
if possible, instead of covering it in the commit log.

This makes functional changes easy to find in the future. In this patch
they are hard to spot.

Thanks,
ferruh

> 
>>>
>>> Signed-off-by: Rasesh Mody 
>>> ---
>> <...>



Re: [dpdk-dev] [PATCH 1/2] net/ixgbe: remove unused global variable

2017-01-03 Thread Ferruh Yigit
On 1/3/2017 1:23 PM, Ferruh Yigit wrote:
> On 12/27/2016 10:09 AM, Jerin Jacob wrote:
>> Removed unused "reg_info" global variable from ixgbe driver.
>>
>> cat build/app/testpmd.map | grep "Allocating common symbols" -A 15
>> Allocating common symbols
>> Common symbol       size              file
>> reg_info            0x18              build/lib/librte_pmd_ixgbe.a(ixgbe_ethdev.o)
>>
>> Signed-off-by: Jerin Jacob 
> 
> Acked-by: Ferruh Yigit 
> 

Applied to dpdk-next-net/master, thanks.


Re: [dpdk-dev] [RFC 00/23] Refactor eal_init to remove panic() calls

2017-01-03 Thread Aaron Conole
Thomas Monjalon  writes:

> Hi Aaron,
>
> 2016-12-30 10:25, Aaron Conole:
>> In many cases, it's enough to simply let the application know that the
>> call to initialize DPDK has failed.  A complete halt can then be
>> decided by the application based on error returned (and the app could
>> even attempt a possible re-attempt after some corrective action by the
>> user or application).
>> 
>> There is still some work left in this series.
>
> Thanks for starting the work.
> I think it is candidate for 17.05 and can be promoted in the roadmap:
>   http://dpdk.org/dev/roadmap

Okay.

> Have you checked whether these changes are modifying the API?

That'll be my last pass through.

> Some doxygen comments may need to be updated when a new error code
> is used.

Agreed;  I also want to ensure that there's a consistent set of error
codes, and a consistent place to check for them.

I'll probably prefer to put them in rte_errno.

Thanks for your thoughts, Thomas!
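
As a rough sketch of the idea discussed in this thread, the code below
shows an init routine that reports failure to its caller instead of
aborting, so the application can decide whether to retry. Everything
here is hypothetical: fake_init, last_error and app_start are stand-ins
and are not the EAL API.

```c
#include <assert.h>
#include <errno.h>

static int init_attempts;
static int last_error;    /* plays the role of an errno-style variable */

/* Hypothetical init routine: returns -1 and records a code on failure. */
static int
fake_init(int resources_ready)
{
    init_attempts++;
    if (!resources_ready) {
        last_error = ENOMEM;
        return -1;        /* report the failure instead of aborting */
    }
    last_error = 0;
    return 0;
}

/* The application decides: retry once after corrective action. */
static int
app_start(void)
{
    if (fake_init(0) == 0)
        return 0;
    assert(last_error != 0);
    /* ... corrective action by the user or application goes here ... */
    return fake_init(1);
}
```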


[dpdk-dev] [PATCH] mk: disable ICC warning 188

2017-01-03 Thread Ferruh Yigit
error #188: enumerated type mixed with another type

ICC reports this when an integer is assigned to an enum variable.

This usage is common, causes many ICC compilation errors, and is accepted
by other compilers, so disable the warning.
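
A minimal example of the pattern in question is sketched below. The
enum and struct names are made up for illustration. OR-ing a modifier
bit into an enumerator yields an integer, and storing that integer back
into an enum-typed field is valid C, but it is the kind of assignment
ICC flags as #188.

```c
#include <assert.h>

enum spec_type {
    SPEC_ETH   = 0x20,
    SPEC_IPV4  = 0x30,
    SPEC_INNER = 0x100,    /* modifier bit OR-ed into a base type */
};

struct spec {
    enum spec_type type;
};

/* The OR produces an integer; assigning it to an enum triggers #188. */
static struct spec
make_spec(unsigned int inner, enum spec_type base)
{
    /* Only the modifier bit is expected in inner. */
    assert((inner & ~(unsigned int)SPEC_INNER) == 0);
    struct spec s = { .type = inner | base };
    return s;
}
```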

Signed-off-by: Ferruh Yigit 
---
 mk/toolchain/icc/rte.vars.mk | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mk/toolchain/icc/rte.vars.mk b/mk/toolchain/icc/rte.vars.mk
index ba69f1f..86d9ef7 100644
--- a/mk/toolchain/icc/rte.vars.mk
+++ b/mk/toolchain/icc/rte.vars.mk
@@ -71,6 +71,7 @@ TOOLCHAIN_ASFLAGS =
 #   was declared "deprecated"
 WERROR_FLAGS := -Wall -w2 -diag-disable 271 -diag-warning 1478
 WERROR_FLAGS += -diag-disable 13368 -diag-disable 15527
+WERROR_FLAGS += -diag-disable 188
 
 ifeq ($(RTE_DEVEL_BUILD),y)
 WERROR_FLAGS += -Werror-all
-- 
2.9.3



Re: [dpdk-dev] [PATCH v5 0/6] net/mlx5: support flow API

2017-01-03 Thread Ferruh Yigit
On 12/29/2016 3:15 PM, Nelio Laranjeiro wrote:
> Changes in v5:
> 
>  - Fix masking when only spec is present in item structure.
>  - Fix first element of flow items array.
> 
> Changes in v4:
> 
>  - Simplify flow parsing by using a graph.
>  - Add VXLAN flow item.
>  - Add mark flow action.
>  - Extend IPv4 filter item (Type of service, Next Protocol ID).
> 
> Changes in v3:
> 
>  - Fix Ethernet ether type issue.
> 
> Changes in v2:
> 
>  - Fix several issues.
>  - Support VLAN filtering.
> 
> Nelio Laranjeiro (6):
>   net/mlx5: add preliminary flow API support
>   net/mlx5: support basic flow items and actions
>   net/mlx5: support VLAN flow item
>   net/mlx5: support VXLAN flow item
>   net/mlx5: support mark flow action
>   net/mlx5: extend IPv4 flow item

This patch is giving ICC warnings [1], but please check:
http://dpdk.org/dev/patchwork/patch/18808/



[1]
.../drivers/net/mlx5/mlx5_flow.c(550): error #188: enumerated type mixed
with another type
.type = flow->inner | IBV_EXP_FLOW_SPEC_ETH,
^

.../drivers/net/mlx5/mlx5_flow.c(626): error #188: enumerated type mixed
with another type
.type = flow->inner | IBV_EXP_FLOW_SPEC_IPV4_EXT,
^

.../drivers/net/mlx5/mlx5_flow.c(679): error #188: enumerated type mixed
with another type
.type = flow->inner | IBV_EXP_FLOW_SPEC_IPV6,
^

.../drivers/net/mlx5/mlx5_flow.c(727): error #188: enumerated type mixed
with another type
.type = flow->inner | IBV_EXP_FLOW_SPEC_UDP,
^

.../drivers/net/mlx5/mlx5_flow.c(769): error #188: enumerated type mixed
with another type
.type = flow->inner | IBV_EXP_FLOW_SPEC_TCP,
^

.../drivers/net/mlx5/mlx5_flow.c(816): error #188: enumerated type mixed
with another type
.type = flow->inner | IBV_EXP_FLOW_SPEC_VXLAN_TUNNEL,
^

> 
>  drivers/net/mlx5/Makefile   |1 +
>  drivers/net/mlx5/mlx5.h |   19 +
>  drivers/net/mlx5/mlx5_fdir.c|   15 +
>  drivers/net/mlx5/mlx5_flow.c| 1248 +++
>  drivers/net/mlx5/mlx5_prm.h |   70 ++-
>  drivers/net/mlx5/mlx5_rxtx.c|   12 +-
>  drivers/net/mlx5/mlx5_rxtx.h|3 +-
>  drivers/net/mlx5/mlx5_trigger.c |2 +
>  8 files changed, 1367 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/net/mlx5/mlx5_flow.c
> 



[dpdk-dev] [PATCH v4 2/2] net/vhost: emulate device start/stop behavior

2017-01-03 Thread Charles (Chas) Williams
.dev_start()/.dev_stop() roughly corresponds to the local device's port
being ready.  This is different from the remote client being connected
which is roughly link up or down.  Emulate the device start/stop behavior
by separately tracking the start/stop state to determine if we should
allow packets to be queued to/from the remote client.
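
The rule described above can be sketched in isolation as follows. This
is a simplified model using C11 atomics rather than the rte_atomic32
API, and it leaves out the wait for in-flight rx/tx bursts that the
patch performs. The struct port_state and update_gate names are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Two independent conditions, tracked separately. */
struct port_state {
    atomic_int started;        /* set by dev_start, cleared by dev_stop */
    atomic_int dev_attached;   /* set while the remote client is connected */
    atomic_int allow_queuing;  /* what the rx/tx burst functions consult */
};

/* Recompute the gate: packets may flow only when both conditions hold. */
static int
update_gate(struct port_state *p)
{
    assert(p != NULL);
    int allow = atomic_load(&p->started) != 0 &&
                atomic_load(&p->dev_attached) != 0;

    atomic_store(&p->allow_queuing, allow);
    return allow;
}
```

Calling the same function from the start, stop, attach and detach paths
keeps the gate consistent regardless of the order in which the events
arrive.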

Signed-off-by: Chas Williams 
---
 drivers/net/vhost/rte_eth_vhost.c | 82 ---
 1 file changed, 51 insertions(+), 31 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index c669e79..0b5b80a 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -110,9 +110,11 @@ struct vhost_queue {
 };
 
 struct pmd_internal {
+   rte_atomic32_t dev_attached;
char *dev_name;
char *iface_name;
uint16_t max_queues;
+   rte_atomic32_t started;
 };
 
 struct internal_list {
@@ -490,6 +492,38 @@ find_internal_resource(char *ifname)
return list;
 }
 
+static void
+update_queuing_status(struct rte_eth_dev *dev)
+{
+   struct pmd_internal *internal = dev->data->dev_private;
+   struct vhost_queue *vq;
+   unsigned int i;
+   int allow_queuing = 1;
+
+   if (rte_atomic32_read(&internal->started) == 0 ||
+   rte_atomic32_read(&internal->dev_attached) == 0)
+   allow_queuing = 0;
+
+   /* Wait until rx/tx_pkt_burst stops accessing vhost device */
+   for (i = 0; i < dev->data->nb_rx_queues; i++) {
+   vq = dev->data->rx_queues[i];
+   if (vq == NULL)
+   continue;
+   rte_atomic32_set(&vq->allow_queuing, allow_queuing);
+   while (rte_atomic32_read(&vq->while_queuing))
+   rte_pause();
+   }
+
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   vq = dev->data->tx_queues[i];
+   if (vq == NULL)
+   continue;
+   rte_atomic32_set(&vq->allow_queuing, allow_queuing);
+   while (rte_atomic32_read(&vq->while_queuing))
+   rte_pause();
+   }
+}
+
 static int
 new_device(int vid)
 {
@@ -541,18 +575,8 @@ new_device(int vid)
 
eth_dev->data->dev_link.link_status = ETH_LINK_UP;
 
-   for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
-   vq = eth_dev->data->rx_queues[i];
-   if (vq == NULL)
-   continue;
-   rte_atomic32_set(&vq->allow_queuing, 1);
-   }
-   for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
-   vq = eth_dev->data->tx_queues[i];
-   if (vq == NULL)
-   continue;
-   rte_atomic32_set(&vq->allow_queuing, 1);
-   }
+   rte_atomic32_set(&internal->dev_attached, 1);
+   update_queuing_status(eth_dev);
 
RTE_LOG(INFO, PMD, "New connection established\n");
 
@@ -565,6 +589,7 @@ static void
 destroy_device(int vid)
 {
struct rte_eth_dev *eth_dev;
+   struct pmd_internal *internal;
struct vhost_queue *vq;
struct internal_list *list;
char ifname[PATH_MAX];
@@ -578,24 +603,10 @@ destroy_device(int vid)
return;
}
eth_dev = list->eth_dev;
+   internal = eth_dev->data->dev_private;
 
-   /* Wait until rx/tx_pkt_burst stops accessing vhost device */
-   for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
-   vq = eth_dev->data->rx_queues[i];
-   if (vq == NULL)
-   continue;
-   rte_atomic32_set(&vq->allow_queuing, 0);
-   while (rte_atomic32_read(&vq->while_queuing))
-   rte_pause();
-   }
-   for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
-   vq = eth_dev->data->tx_queues[i];
-   if (vq == NULL)
-   continue;
-   rte_atomic32_set(&vq->allow_queuing, 0);
-   while (rte_atomic32_read(&vq->while_queuing))
-   rte_pause();
-   }
+   rte_atomic32_set(&internal->dev_attached, 0);
+   update_queuing_status(eth_dev);
 
eth_dev->data->dev_link.link_status = ETH_LINK_DOWN;
 
@@ -769,14 +780,23 @@ vhost_driver_session_stop(void)
 }
 
 static int
-eth_dev_start(struct rte_eth_dev *dev __rte_unused)
+eth_dev_start(struct rte_eth_dev *dev)
 {
+   struct pmd_internal *internal = dev->data->dev_private;
+
+   rte_atomic32_set(&internal->started, 1);
+   update_queuing_status(dev);
+
return 0;
 }
 
 static void
-eth_dev_stop(struct rte_eth_dev *dev __rte_unused)
+eth_dev_stop(struct rte_eth_dev *dev)
 {
+   struct pmd_internal *internal = dev->data->dev_private;
+
+   rte_atomic32_set(&internal->started, 0);
+   update_queuing_status(dev);
 }
 
 static int
-- 
2.1.4



[dpdk-dev] [PATCH v4 1/2] net/vhost: create datagram sockets immediately

2017-01-03 Thread Charles (Chas) Williams
If you create a vhost server device, it doesn't create the actual datagram
socket until you call .dev_start().  If you call .dev_stop(), it also
deletes those sockets.  For QEMU clients, this is a problem since QEMU
doesn't know how to re-attach to datagram sockets that have gone away.

To fix this, register and unregister the datagram sockets during device
creation and removal.
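
The "first port starts the session, last port stops it" bookkeeping
that moves to creation and removal can be sketched like this. It is a
standalone model with C11 atomics; port_created, port_removed and
session_running are hypothetical names standing in for the driver's
counter and message-handling thread.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int nb_ports;     /* ports currently registered */
static int session_running;     /* stands in for the message thread */

/* Called at device creation: the first port starts the shared session. */
static void
port_created(void)
{
    if (atomic_fetch_add(&nb_ports, 1) + 1 == 1)
        session_running = 1;
}

/* Called at device removal: the last port stops the shared session. */
static void
port_removed(void)
{
    int left = atomic_fetch_sub(&nb_ports, 1) - 1;

    assert(left >= 0);
    if (left == 0)
        session_running = 0;
}
```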

Fixes: ee584e9710b9 ("vhost: add driver on top of the library")

Signed-off-by: Chas Williams 
---
 drivers/net/vhost/rte_eth_vhost.c | 45 +++
 1 file changed, 17 insertions(+), 28 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index 60b0f51..c669e79 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -113,9 +113,6 @@ struct pmd_internal {
char *dev_name;
char *iface_name;
uint16_t max_queues;
-   uint64_t flags;
-
-   volatile uint16_t once;
 };
 
 struct internal_list {
@@ -772,35 +769,14 @@ vhost_driver_session_stop(void)
 }
 
 static int
-eth_dev_start(struct rte_eth_dev *dev)
+eth_dev_start(struct rte_eth_dev *dev __rte_unused)
 {
-   struct pmd_internal *internal = dev->data->dev_private;
-   int ret = 0;
-
-   if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
-   ret = rte_vhost_driver_register(internal->iface_name,
-   internal->flags);
-   if (ret)
-   return ret;
-   }
-
-   /* We need only one message handling thread */
-   if (rte_atomic16_add_return(&nb_started_ports, 1) == 1)
-   ret = vhost_driver_session_start();
-
-   return ret;
+   return 0;
 }
 
 static void
-eth_dev_stop(struct rte_eth_dev *dev)
+eth_dev_stop(struct rte_eth_dev *dev __rte_unused)
 {
-   struct pmd_internal *internal = dev->data->dev_private;
-
-   if (rte_atomic16_cmpset(&internal->once, 1, 0))
-   rte_vhost_driver_unregister(internal->iface_name);
-
-   if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
-   vhost_driver_session_stop();
 }
 
 static int
@@ -1043,7 +1019,6 @@ eth_dev_vhost_create(const char *name, char *iface_name, 
int16_t queues,
internal->iface_name = strdup(iface_name);
if (internal->iface_name == NULL)
goto error;
-   internal->flags = flags;
 
list->eth_dev = eth_dev;
pthread_mutex_lock(&internal_list_lock);
@@ -1078,6 +1053,15 @@ eth_dev_vhost_create(const char *name, char *iface_name, 
int16_t queues,
eth_dev->rx_pkt_burst = eth_vhost_rx;
eth_dev->tx_pkt_burst = eth_vhost_tx;
 
+   if (rte_vhost_driver_register(iface_name, flags))
+   goto error;
+
+   /* We need only one message handling thread */
+   if (rte_atomic16_add_return(&nb_started_ports, 1) == 1) {
+   if (vhost_driver_session_start())
+   goto error;
+   }
+
return data->port_id;
 
 error:
@@ -1215,6 +1199,11 @@ rte_pmd_vhost_remove(const char *name)
 
eth_dev_stop(eth_dev);
 
+   rte_vhost_driver_unregister(internal->iface_name);
+
+   if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
+   vhost_driver_session_stop();
+
rte_free(vring_states[eth_dev->data->port_id]);
vring_states[eth_dev->data->port_id] = NULL;
 
-- 
2.1.4



Re: [dpdk-dev] [dpdk-stable] [PATCH] net/mlx5: fix RSS hash result for flows

2017-01-03 Thread Ferruh Yigit
On 12/28/2016 9:58 AM, Nelio Laranjeiro wrote:
> Flows redirected to a specific queue do not have a valid RSS hash result
> and the related mbuf flag must not be set.
> 
> Fixes: ecf60761fc2a ("net/mlx5: return RSS hash result in mbuf")
> 
> CC: sta...@dpdk.org
> Signed-off-by: Nelio Laranjeiro 
> Acked-by: Adrien Mazarguil 

Applied to dpdk-next-net/master, thanks.



[dpdk-dev] [PATCH v2] Scheduler: add driver for scheduler crypto pmd

2017-01-03 Thread Fan Zhang
This patch provides the initial implementation of the scheduler poll mode
driver using DPDK cryptodev framework.

Scheduler PMD is used to schedule and enqueue the crypto ops to the
hardware and/or software crypto devices attached to it (slaves). The
dequeue operation from the slave(s), and the possible dequeued crypto op
reordering, are then carried out by the scheduler.

As the initial version, the scheduler PMD currently supports only the
Round-robin mode, which distributes the enqueued burst of crypto ops
among its slaves in a round-robin manner. This mode may help to fill
the throughput gap between the physical core and the existing cryptodevs
to increase the overall performance. Moreover, the scheduler PMD
provides APIs that let users create their own scheduler.
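
A round-robin selection of this kind could be sketched as below. This
assumes each enqueued burst is handed to one slave and the next burst
goes to the next slave; the patch may distribute work differently. The
struct rr_ctx and rr_pick_slave names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scheduler state: number of slaves and the next index. */
struct rr_ctx {
    uint32_t nb_slaves;
    uint32_t next;
};

/* Pick the slave for the next burst, then advance in round-robin order. */
static uint32_t
rr_pick_slave(struct rr_ctx *ctx)
{
    assert(ctx != NULL && ctx->nb_slaves > 0);
    uint32_t idx = ctx->next;

    ctx->next = (ctx->next + 1) % ctx->nb_slaves;
    return idx;
}
```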

Build instructions:
To build DPDK with CRYPTO_SCHEDULER_PMD the user is required to set
CONFIG_RTE_LIBRTE_PMD_CRYPTO_SCHEDULER=y in config/common_base

Notice:
- Scheduler PMD shares the same EAL command-line options as other
  cryptodevs. However, apart from socket_id, the rest of the cryptodev
  options are ignored. The scheduler PMD's max_nb_queue_pairs and
  max_nb_sessions options are set to the minimum values among the
  attached slaves. For example, if a scheduler cryptodev has 2 cryptodevs
  attached with max_nb_queue_pairs of 2 and 8, respectively, the scheduler
  cryptodev's max_nb_queue_pairs will be automatically updated to 2.

- The scheduler cryptodev cannot be started unless the scheduling mode
  is set and at least one slave is attached. Also, to configure the
  scheduler at run time (attach/detach slaves, change the scheduling
  mode, or enable/disable crypto op ordering), one should stop the
  scheduler first; otherwise an error will be returned.

Signed-off-by: Fan Zhang 
Signed-off-by: Declan Doherty 
---
 config/common_base |  10 +-
 drivers/crypto/Makefile|   1 +
 drivers/crypto/scheduler/Makefile  |  67 +++
 drivers/crypto/scheduler/rte_cryptodev_scheduler.c | 598 +
 drivers/crypto/scheduler/rte_cryptodev_scheduler.h | 183 +++
 .../scheduler/rte_cryptodev_scheduler_ioctls.h |  92 
 .../scheduler/rte_cryptodev_scheduler_operations.h |  71 +++
 .../scheduler/rte_pmd_crypto_scheduler_version.map |  12 +
 drivers/crypto/scheduler/scheduler_pmd.c   | 168 ++
 drivers/crypto/scheduler/scheduler_pmd_ops.c   | 495 +
 drivers/crypto/scheduler/scheduler_pmd_private.h   | 122 +
 drivers/crypto/scheduler/scheduler_roundrobin.c| 419 +++
 lib/librte_cryptodev/rte_cryptodev.h   |   4 +
 mk/rte.app.mk  |   3 +-
 14 files changed, 2242 insertions(+), 3 deletions(-)
 create mode 100644 drivers/crypto/scheduler/Makefile
 create mode 100644 drivers/crypto/scheduler/rte_cryptodev_scheduler.c
 create mode 100644 drivers/crypto/scheduler/rte_cryptodev_scheduler.h
 create mode 100644 drivers/crypto/scheduler/rte_cryptodev_scheduler_ioctls.h
 create mode 100644 
drivers/crypto/scheduler/rte_cryptodev_scheduler_operations.h
 create mode 100644 
drivers/crypto/scheduler/rte_pmd_crypto_scheduler_version.map
 create mode 100644 drivers/crypto/scheduler/scheduler_pmd.c
 create mode 100644 drivers/crypto/scheduler/scheduler_pmd_ops.c
 create mode 100644 drivers/crypto/scheduler/scheduler_pmd_private.h
 create mode 100644 drivers/crypto/scheduler/scheduler_roundrobin.c

diff --git a/config/common_base b/config/common_base
index 4bff83a..a3783a6 100644
--- a/config/common_base
+++ b/config/common_base
@@ -358,7 +358,7 @@ CONFIG_RTE_CRYPTODEV_NAME_LEN=64
 #
 # Compile PMD for QuickAssist based devices
 #
-CONFIG_RTE_LIBRTE_PMD_QAT=n
+CONFIG_RTE_LIBRTE_PMD_QAT=y
 CONFIG_RTE_LIBRTE_PMD_QAT_DEBUG_INIT=n
 CONFIG_RTE_LIBRTE_PMD_QAT_DEBUG_TX=n
 CONFIG_RTE_LIBRTE_PMD_QAT_DEBUG_RX=n
@@ -372,7 +372,7 @@ CONFIG_RTE_QAT_PMD_MAX_NB_SESSIONS=2048
 #
 # Compile PMD for AESNI backed device
 #
-CONFIG_RTE_LIBRTE_PMD_AESNI_MB=n
+CONFIG_RTE_LIBRTE_PMD_AESNI_MB=y
 CONFIG_RTE_LIBRTE_PMD_AESNI_MB_DEBUG=n
 
 #
@@ -400,6 +400,12 @@ CONFIG_RTE_LIBRTE_PMD_KASUMI=n
 CONFIG_RTE_LIBRTE_PMD_KASUMI_DEBUG=n
 
 #
+# Compile PMD for Crypto Scheduler device
+#
+CONFIG_RTE_LIBRTE_PMD_CRYPTO_SCHEDULER=y
+CONFIG_RTE_LIBRTE_PMD_CRYPTO_SCHEDULER_DEBUG=n
+
+#
 # Compile PMD for ZUC device
 #
 CONFIG_RTE_LIBRTE_PMD_ZUC=n
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index 745c614..cdd3c94 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -38,6 +38,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_PMD_QAT) += qat
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_SNOW3G) += snow3g
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_KASUMI) += kasumi
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_ZUC) += zuc
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_CRYPTO_SCHEDULER) += scheduler
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO) += null
 
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/crypto/scheduler/Makefile 
b/drivers/crypto/scheduler/Makefi

Re: [dpdk-dev] [PATCH] net/i40e: fix wrong return value when handling PF message

2017-01-03 Thread Ferruh Yigit
On 12/21/2016 8:29 AM, Wenzhuo Lu wrote:
> When VF receives a message from PF, it should check the return
> value. But in i40evf_execute_vf_cmd the value is ignored and not
> returned to the caller.
> 
> Fixes: 95cd21f45d1b ("i40evf: allocate virtchnl commands buffer per VF")
> 
> Signed-off-by: Wenzhuo Lu 

CC: sta...@dpdk.org

Applied to dpdk-next-net/master, thanks.



[dpdk-dev] [PATCH v3] Scheduler: add driver for scheduler crypto pmd

2017-01-03 Thread Fan Zhang
This patch provides the initial implementation of the scheduler poll mode
driver using DPDK cryptodev framework.

Scheduler PMD is used to schedule and enqueue the crypto ops to the
hardware and/or software crypto devices attached to it (slaves). The
dequeue operation from the slave(s), and the possible dequeued crypto op
reordering, are then carried out by the scheduler.

As the initial version, the scheduler PMD currently supports only the
Round-robin mode, which distributes the enqueued burst of crypto ops
among its slaves in a round-robin manner. This mode may help to fill
the throughput gap between the physical core and the existing cryptodevs
to increase the overall performance. Moreover, the scheduler PMD
provides APIs that let users create their own scheduler.

Build instructions:
To build DPDK with CRYPTO_SCHEDULER_PMD the user is required to set
CONFIG_RTE_LIBRTE_PMD_CRYPTO_SCHEDULER=y in config/common_base

Notice:
- Scheduler PMD shares the same EAL command-line options as other
  cryptodevs. However, apart from socket_id, the rest of the cryptodev
  options are ignored. The scheduler PMD's max_nb_queue_pairs and
  max_nb_sessions options are set to the minimum values among the
  attached slaves. For example, if a scheduler cryptodev has 2 cryptodevs
  attached with max_nb_queue_pairs of 2 and 8, respectively, the scheduler
  cryptodev's max_nb_queue_pairs will be automatically updated to 2.

- The scheduler cryptodev cannot be started unless the scheduling mode
  is set and at least one slave is attached. Also, to configure the
  scheduler at run time (attach/detach slaves, change the scheduling
  mode, or enable/disable crypto op ordering), one should stop the
  scheduler first; otherwise an error will be returned.
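
The capability rule in the first note, taking the minimum across the
attached slaves, could be computed as in this sketch. The helper
min_capability is a hypothetical name and not a function from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest value in caps[0..n-1]; n must be at least 1. */
static uint32_t
min_capability(const uint32_t *caps, size_t n)
{
    assert(caps != NULL && n > 0);
    uint32_t min = caps[0];

    for (size_t i = 1; i < n; i++)
        if (caps[i] < min)
            min = caps[i];
    return min;
}
```

For the example in the note, slaves with max_nb_queue_pairs of 2 and 8
give a scheduler value of 2.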

Changes in v3:
Fixed config/common_base.

Changes in v2:
New approaches in API to suit future scheduling modes.

Signed-off-by: Fan Zhang 
Signed-off-by: Declan Doherty 
---
 config/common_base |   6 +
 drivers/crypto/Makefile|   1 +
 drivers/crypto/scheduler/Makefile  |  67 +++
 drivers/crypto/scheduler/rte_cryptodev_scheduler.c | 598 +
 drivers/crypto/scheduler/rte_cryptodev_scheduler.h | 183 +++
 .../scheduler/rte_cryptodev_scheduler_ioctls.h |  92 
 .../scheduler/rte_cryptodev_scheduler_operations.h |  71 +++
 .../scheduler/rte_pmd_crypto_scheduler_version.map |  12 +
 drivers/crypto/scheduler/scheduler_pmd.c   | 168 ++
 drivers/crypto/scheduler/scheduler_pmd_ops.c   | 495 +
 drivers/crypto/scheduler/scheduler_pmd_private.h   | 122 +
 drivers/crypto/scheduler/scheduler_roundrobin.c| 419 +++
 lib/librte_cryptodev/rte_cryptodev.h   |   4 +
 mk/rte.app.mk  |   3 +-
 14 files changed, 2240 insertions(+), 1 deletion(-)
 create mode 100644 drivers/crypto/scheduler/Makefile
 create mode 100644 drivers/crypto/scheduler/rte_cryptodev_scheduler.c
 create mode 100644 drivers/crypto/scheduler/rte_cryptodev_scheduler.h
 create mode 100644 drivers/crypto/scheduler/rte_cryptodev_scheduler_ioctls.h
 create mode 100644 
drivers/crypto/scheduler/rte_cryptodev_scheduler_operations.h
 create mode 100644 
drivers/crypto/scheduler/rte_pmd_crypto_scheduler_version.map
 create mode 100644 drivers/crypto/scheduler/scheduler_pmd.c
 create mode 100644 drivers/crypto/scheduler/scheduler_pmd_ops.c
 create mode 100644 drivers/crypto/scheduler/scheduler_pmd_private.h
 create mode 100644 drivers/crypto/scheduler/scheduler_roundrobin.c

diff --git a/config/common_base b/config/common_base
index 4bff83a..79d120d 100644
--- a/config/common_base
+++ b/config/common_base
@@ -400,6 +400,12 @@ CONFIG_RTE_LIBRTE_PMD_KASUMI=n
 CONFIG_RTE_LIBRTE_PMD_KASUMI_DEBUG=n
 
 #
+# Compile PMD for Crypto Scheduler device
+#
+CONFIG_RTE_LIBRTE_PMD_CRYPTO_SCHEDULER=n
+CONFIG_RTE_LIBRTE_PMD_CRYPTO_SCHEDULER_DEBUG=n
+
+#
 # Compile PMD for ZUC device
 #
 CONFIG_RTE_LIBRTE_PMD_ZUC=n
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index 745c614..cdd3c94 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -38,6 +38,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_PMD_QAT) += qat
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_SNOW3G) += snow3g
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_KASUMI) += kasumi
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_ZUC) += zuc
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_CRYPTO_SCHEDULER) += scheduler
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO) += null
 
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/crypto/scheduler/Makefile 
b/drivers/crypto/scheduler/Makefile
new file mode 100644
index 000..976a565
--- /dev/null
+++ b/drivers/crypto/scheduler/Makefile
@@ -0,0 +1,67 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2015 Intel Corporation. All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistribut

Re: [dpdk-dev] [PATCH] mk: disable ICC warning 188

2017-01-03 Thread Adrien Mazarguil
On Tue, Jan 03, 2017 at 04:15:42PM +, Ferruh Yigit wrote:
> error #188: enumerated type mixed with another type
> 
> ICC reports this when an integer is assigned to an enum variable.
> 
> This usage is common, causes many ICC compilation errors, and is accepted
> by other compilers, so disable the warning.
> 
> Signed-off-by: Ferruh Yigit 
> ---
>  mk/toolchain/icc/rte.vars.mk | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mk/toolchain/icc/rte.vars.mk b/mk/toolchain/icc/rte.vars.mk
> index ba69f1f..86d9ef7 100644
> --- a/mk/toolchain/icc/rte.vars.mk
> +++ b/mk/toolchain/icc/rte.vars.mk
> @@ -71,6 +71,7 @@ TOOLCHAIN_ASFLAGS =
>  #   was declared "deprecated"
>  WERROR_FLAGS := -Wall -w2 -diag-disable 271 -diag-warning 1478
>  WERROR_FLAGS += -diag-disable 13368 -diag-disable 15527
> +WERROR_FLAGS += -diag-disable 188
>  
>  ifeq ($(RTE_DEVEL_BUILD),y)
>  WERROR_FLAGS += -Werror-all
> -- 
> 2.9.3

I also think this warning may be useful but is not worth the trouble in many
cases, thus:

Acked-by: Adrien Mazarguil 

-- 
Adrien Mazarguil
6WIND


Re: [dpdk-dev] [PATCH RFC 0/2] Allow vectorized Rx with 4096 desc ring size on Intel NICs.

2017-01-03 Thread Ferruh Yigit
On 1/2/2017 3:40 PM, Thomas Monjalon wrote:
> 2016-12-27 08:03, Ilya Maximets:
>> Hello.
>> Ferruh, Thomas, is there a chance for this to be accepted to 17.02?
>> Maybe I should resend this patch-set without 'RFC' tag?
> 
> Yes it should be integrated in 17.02.
> Ferruh, any news?
> 

I was waiting for a review from the driver maintainers.

Ilya,

If you don't mind, can you please send these again as patches instead of
an RFC? If the patches don't get any objections either, they can be merged.

Thanks,
ferruh


[dpdk-dev] [PATCH 1/4] net/af_packet: add iface name to internals

2017-01-03 Thread Charles (Chas) Williams
This will be used by later changes to determine the underlying linux
interface.

Signed-off-by: Chas Williams 
---
 drivers/net/af_packet/rte_eth_af_packet.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c 
b/drivers/net/af_packet/rte_eth_af_packet.c
index a1e13ff..5541fd7 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -99,6 +99,7 @@ struct pmd_internals {
unsigned nb_queues;
 
int if_index;
+   char *if_name;
struct ether_addr eth_addr;
 
struct tpacket_req req;
@@ -533,6 +534,7 @@ rte_pmd_init_internals(const char *name,
name);
goto error_early;
}
+   (*internals)->if_name = strdup(pair->value);
(*internals)->if_index = ifr.ifr_ifindex;
 
if (ioctl(sockfd, SIOCGIFHWADDR, &ifr) == -1) {
@@ -724,6 +726,7 @@ rte_pmd_init_internals(const char *name,
((*internals)->rx_queue[q].sockfd != qsockfd))
close((*internals)->rx_queue[q].sockfd);
}
+   free((*internals)->if_name);
rte_free(*internals);
 error_early:
rte_free(data);
@@ -892,6 +895,7 @@ rte_pmd_af_packet_remove(const char *name)
rte_free(internals->rx_queue[q].rd);
rte_free(internals->tx_queue[q].rd);
}
+   free(internals->if_name);
 
rte_free(eth_dev->data->dev_private);
rte_free(eth_dev->data);
-- 
2.1.4



[dpdk-dev] [PATCH 3/4] net/af_packet: promiscuous support

2017-01-03 Thread Charles (Chas) Williams
Add promiscuous support to the AF_PACKET PMD.  The underlying Linux
device's IFF_PROMISC flag is toggled to enable or disable it.

Signed-off-by: Charles (Chas) Williams 
---
 drivers/net/af_packet/rte_eth_af_packet.c | 40 +++
 1 file changed, 40 insertions(+)

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c 
b/drivers/net/af_packet/rte_eth_af_packet.c
index d8ac5c6..b01a8d1 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -441,6 +441,44 @@ eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
return 0;
 }
 
+static void
+eth_dev_change_flags(char *if_name, uint32_t flags, uint32_t mask)
+{
+   struct ifreq ifr;
+   int s;
+
+   s = socket(PF_INET, SOCK_DGRAM, 0);
+   if (s < 0)
+   return;
+
+   strncpy(ifr.ifr_name, if_name, IFNAMSIZ);
+   if (ioctl(s, SIOCGIFFLAGS, &ifr) < 0)
+   goto out;
+   ifr.ifr_flags &= mask;
+   ifr.ifr_flags |= flags;
+   if (ioctl(s, SIOCSIFFLAGS, &ifr) < 0)
+   goto out;
+out:
+   close(s);
+   return;
+}
+
+static void
+eth_dev_promiscuous_enable(struct rte_eth_dev *dev)
+{
+   struct pmd_internals *internals = dev->data->dev_private;
+
+   eth_dev_change_flags(internals->if_name, IFF_PROMISC, ~0);
+}
+
+static void
+eth_dev_promiscuous_disable(struct rte_eth_dev *dev)
+{
+   struct pmd_internals *internals = dev->data->dev_private;
+
+   eth_dev_change_flags(internals->if_name, 0, ~IFF_PROMISC);
+}
+
 static const struct eth_dev_ops ops = {
.dev_start = eth_dev_start,
.dev_stop = eth_dev_stop,
@@ -448,6 +486,8 @@ static const struct eth_dev_ops ops = {
.dev_configure = eth_dev_configure,
.dev_infos_get = eth_dev_info,
.mtu_set = eth_dev_mtu_set,
+   .promiscuous_disable = eth_dev_promiscuous_disable,
+   .promiscuous_enable = eth_dev_promiscuous_enable,
.rx_queue_setup = eth_rx_queue_setup,
.tx_queue_setup = eth_tx_queue_setup,
.rx_queue_release = eth_queue_release,
-- 
2.1.4



[dpdk-dev] [PATCH 4/4] net/af_packet: add 802.1Q (VLAN) support

2017-01-03 Thread Charles (Chas) Williams
AF_PACKET has some flags to check on the receive side for 802.1Q
information.  If present, we copy into the mbuf.  For transmit, we
insert any 802.1Q information into the packet before copying to the ring.

Signed-off-by: Charles (Chas) Williams 
---
 drivers/net/af_packet/rte_eth_af_packet.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
index b01a8d1..7f1df92 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -161,6 +161,12 @@ eth_af_packet_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
pbuf = (uint8_t *) ppd + ppd->tp_mac;
memcpy(rte_pktmbuf_mtod(mbuf, void *), pbuf, rte_pktmbuf_data_len(mbuf));
 
+   /* check for vlan info */
+   if (ppd->tp_status & TP_STATUS_VLAN_VALID) {
+   mbuf->vlan_tci = ppd->tp_vlan_tci;
+   mbuf->ol_flags |= (PKT_RX_VLAN_PKT | PKT_RX_VLAN_STRIPPED);
+   }
+
/* release incoming frame and advance ring buffer */
ppd->tp_status = TP_STATUS_KERNEL;
if (++framenum >= framecount)
@@ -214,6 +220,14 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
continue;
}
 
+   /* insert vlan info if necessary */
+   if (mbuf->ol_flags & PKT_TX_VLAN_PKT) {
+   if (rte_vlan_insert(&mbuf)) {
+   rte_pktmbuf_free(mbuf);
+   continue;
+   }
+   }
+
/* point at the next incoming frame */
if ((ppd->tp_status != TP_STATUS_AVAILABLE) &&
(poll(&pfd, 1, -1) < 0))
-- 
2.1.4



[dpdk-dev] [PATCH 2/4] net/af_packet: add support to change mtu

2017-01-03 Thread Charles (Chas) Williams
Change the underlying Linux device's MTU, subject to the frame size
limit set during device creation.

Signed-off-by: Charles (Chas) Williams 
---
 drivers/net/af_packet/rte_eth_af_packet.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
index 5541fd7..d8ac5c6 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -413,12 +413,41 @@ eth_tx_queue_setup(struct rte_eth_dev *dev,
return 0;
 }
 
+static int
+eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+   struct pmd_internals *internals = dev->data->dev_private;
+   struct ifreq ifr = { .ifr_mtu = mtu };
+   int ret;
+   int s;
+   unsigned int data_size = internals->req.tp_frame_size -
+TPACKET2_HDRLEN -
+sizeof(struct sockaddr_ll);
+
+   if (mtu > data_size)
+   return -EINVAL;
+
+   s = socket(PF_INET, SOCK_DGRAM, 0);
+   if (s < 0)
+   return -EINVAL;
+
+   strncpy(ifr.ifr_name, internals->if_name, IFNAMSIZ);
+   ret = ioctl(s, SIOCSIFMTU, &ifr);
+   close(s);
+
+   if (ret < 0)
+   return -EINVAL;
+
+   return 0;
+}
+
 static const struct eth_dev_ops ops = {
.dev_start = eth_dev_start,
.dev_stop = eth_dev_stop,
.dev_close = eth_dev_close,
.dev_configure = eth_dev_configure,
.dev_infos_get = eth_dev_info,
+   .mtu_set = eth_dev_mtu_set,
.rx_queue_setup = eth_rx_queue_setup,
.tx_queue_setup = eth_tx_queue_setup,
.rx_queue_release = eth_queue_release,
-- 
2.1.4
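
The size check in the MTU patch above can be looked at on its own: the usable
payload of one ring frame is the frame size minus TPACKET2_HDRLEN minus
sizeof(struct sockaddr_ll). Below is a rough sketch of that bound, assuming
Linux headers are available. The demo_* names are hypothetical. The guard
against a frame size smaller than the overhead is an addition in this sketch
and is not in the patch, which computes the difference directly in unsigned
arithmetic.

```c
#include <assert.h>
#include <linux/if_packet.h>	/* TPACKET2_HDRLEN, struct sockaddr_ll */

/*
 * Hypothetical helper: largest MTU that fits in one ring frame,
 * or 0 if the frame cannot even hold the per-frame overhead.
 */
unsigned int
demo_max_mtu(unsigned int tp_frame_size)
{
	const unsigned int overhead =
		(unsigned int)(TPACKET2_HDRLEN + sizeof(struct sockaddr_ll));

	if (tp_frame_size <= overhead)
		return 0;	/* avoid unsigned wrap-around */
	return tp_frame_size - overhead;
}

/* Returns non-zero when 'mtu' is acceptable for the given frame size. */
int
demo_mtu_fits(unsigned int mtu, unsigned int tp_frame_size)
{
	return mtu <= demo_max_mtu(tp_frame_size);
}

/* Usage: a standard MTU fits in a 2048-byte frame, a full-frame MTU does not. */
void
demo_mtu_usage(void)
{
	assert(demo_mtu_fits(1500, 2048));
	assert(!demo_mtu_fits(2048, 2048));
	assert(!demo_mtu_fits(1, 16));
}
```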



Re: [dpdk-dev] [PATCH v5 04/12] eal: integrate bus scan and probe with EAL

2017-01-03 Thread Thomas Monjalon
2016-12-26 18:53, Shreyansh Jain:
> --- a/lib/librte_eal/linuxapp/eal/eal.c
> +++ b/lib/librte_eal/linuxapp/eal/eal.c
> @@ -844,6 +845,9 @@ rte_eal_init(int argc, char **argv)
> if (rte_eal_intr_init() < 0)
> rte_panic("Cannot init interrupt-handling thread\n");
>  
> +   if (rte_eal_bus_scan())
> +   rte_panic("Cannot scan the buses for devices\n");

Yes, definitely. Just one scan function which scans the registered buses.

> @@ -884,6 +888,9 @@ rte_eal_init(int argc, char **argv)
> if (rte_eal_pci_probe())
> rte_panic("Cannot probe PCI\n");
>  
> +   if (rte_eal_bus_probe())
> +   rte_panic("Cannot probe devices\n");
> +
> if (rte_eal_dev_init() < 0)
> rte_panic("Cannot init pmd devices\n");

What is the benefit of initializing (probing) a device outside of the scan?
Currently, it is done in two steps, so you are keeping the same behaviour.

I imagine a model where the scan function decides whether to initialize the
device and can get help from a callback to make this decision.
The whitelist/blacklist policy could then be implemented with callbacks at
the scan level, possibly under the responsibility of the application.
Note that the callback model would be a change for a later release.
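
To make the callback idea above a bit more concrete, here is a minimal
sketch of a scan loop that asks a policy callback before initializing each
device. Everything in it is hypothetical (demo_device, demo_bus_scan, the
policy signature); it is not existing or proposed DPDK API, only one way the
model could look.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical; none of this is existing DPDK API. */
struct demo_device {
	const char *name;
	int initialized;
};

/* Policy hook: return non-zero if the scan should initialize 'dev'. */
typedef int (*demo_scan_policy_t)(const struct demo_device *dev, void *ctx);

/*
 * Walk the devices found on a bus and initialize those the policy
 * accepts. A NULL policy accepts everything. Returns the number of
 * devices initialized.
 */
int
demo_bus_scan(struct demo_device *devs, size_t n,
	      demo_scan_policy_t policy, void *ctx)
{
	size_t i;
	int count = 0;

	for (i = 0; i < n; i++) {
		if (policy != NULL && !policy(&devs[i], ctx))
			continue;		/* rejected, e.g. blacklisted */
		devs[i].initialized = 1;	/* stand-in for the real init */
		count++;
	}
	return count;
}

/* Example policy: whitelist a single device name passed through 'ctx'. */
int
demo_whitelist_one(const struct demo_device *dev, void *ctx)
{
	return strcmp(dev->name, (const char *)ctx) == 0;
}

/* Usage: only the whitelisted device gets initialized. */
void
demo_scan_usage(void)
{
	struct demo_device devs[] = {
		{ "0000:01:00.0", 0 },
		{ "0000:02:00.0", 0 },
	};
	char allowed[] = "0000:02:00.0";

	assert(demo_bus_scan(devs, 2, demo_whitelist_one, allowed) == 1);
	assert(!devs[0].initialized && devs[1].initialized);
}
```

With this shape the whitelist/blacklist decision lives entirely in the
callback, so an application could supply its own policy without the bus code
knowing about it.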


Re: [dpdk-dev] [PATCH v5 01/12] eal/bus: introduce bus abstraction

2017-01-03 Thread Thomas Monjalon
2016-12-26 18:53, Shreyansh Jain:
> +DPDK_17.02 {
> + global:
> +
> + rte_bus_list;
> + rte_eal_bus_add_device;
> + rte_eal_bus_add_driver;
> + rte_eal_bus_get;
> + rte_eal_bus_dump;
> + rte_eal_bus_register;
> + rte_eal_bus_insert_device;
> + rte_eal_bus_remove_device;
> + rte_eal_bus_remove_driver;
> + rte_eal_bus_unregister;

I think the prefix can be just rte_bus_ instead of rte_eal_bus_.

> +/** Double linked list of buses */
> +TAILQ_HEAD(rte_bus_list, rte_bus);
> +
> +/* Global Bus list */
> +extern struct rte_bus_list rte_bus_list;

Why is the bus list public?

> +/**
> + * A structure describing a generic bus.
> + */
> +struct rte_bus {
> + TAILQ_ENTRY(rte_bus) next;   /**< Next bus object in linked list */
> + struct rte_driver_list driver_list;
> +  /**< List of all drivers on bus */
> + struct rte_device_list device_list;
> +  /**< List of all devices on bus */
> + const char *name;/**< Name of the bus */
> +};

I am not convinced we should link a generic bus to drivers and devices.
What do you think of having rte_pci_bus being a rte_bus and linking
with rte_pci_device and rte_pci_driver lists?

I'm thinking of something like this:

struct rte_bus {
TAILQ_ENTRY(rte_bus) next;
const char *name;
rte_bus_scan_t scan;
rte_bus_match_t match;
};
struct rte_pci_bus {
struct rte_bus bus;
struct rte_pci_driver_list pci_drivers;
struct rte_pci_device_list pci_devices;
};

> +/** Helper for Bus registration. The constructor has higher priority than
> + * PMD constructors
> + */
> +#define RTE_REGISTER_BUS(nm, bus) \
> +static void __attribute__((constructor(101), used)) businitfn_ ##nm(void) \
> +{\
> + (bus).name = RTE_STR(nm);\
> + rte_eal_bus_register(&bus); \
> +}

By removing the lists from rte_bus as suggested above, do you still need
a priority for this constructor?

>  struct rte_device {
>   TAILQ_ENTRY(rte_device) next; /**< Next device */
> + struct rte_bus *bus;  /**< Device connected to this bus */
>   const struct rte_driver *driver;/**< Associated driver */
>   int numa_node;/**< NUMA node connection */
>   struct rte_devargs *devargs;  /**< Device user arguments */
> @@ -148,6 +149,7 @@ void rte_eal_device_remove(struct rte_device *dev);
>   */
>  struct rte_driver {
>   TAILQ_ENTRY(rte_driver) next;  /**< Next in list. */
> + struct rte_bus *bus;   /**< Bus serviced by this driver */
>   const char *name;   /**< Driver name. */
>   const char *alias;  /**< Driver alias. */
>  };

Do we need to know the bus associated with a driver in rte_driver?
Bus and driver are already associated in rte_device.



Re: [dpdk-dev] [PATCH v5 05/12] eal: add probe and remove support for rte_driver

2017-01-03 Thread Thomas Monjalon
2016-12-26 18:53, Shreyansh Jain:
> --- a/lib/librte_eal/common/include/rte_dev.h
> +++ b/lib/librte_eal/common/include/rte_dev.h
> @@ -152,6 +162,8 @@ struct rte_driver {
>   struct rte_bus *bus;   /**< Bus serviced by this driver */
>   const char *name;   /**< Driver name. */
>   const char *alias;  /**< Driver alias. */
> + driver_probe_t *probe; /**< Probe the device */
> + driver_remove_t *remove;   /**< Remove/hotplugging the device */
>  };

If I understand correctly, this probe function neither scans nor matches.
So it could be named init.

I think the probe (init) and remove ops must be specific to the bus.
We can have them in rte_bus, and as an example, the pci implementation
would call the pci probe and remove ops of rte_pci_driver.

Please use rte_ prefix in public headers.


Re: [dpdk-dev] [PATCH v5 07/12] pci: split match and probe function

2017-01-03 Thread Thomas Monjalon
2016-12-26 18:54, Shreyansh Jain:
> --- a/lib/librte_eal/common/include/rte_pci.h
> +++ b/lib/librte_eal/common/include/rte_pci.h
> @@ -373,6 +373,21 @@ rte_eal_compare_pci_addr(const struct rte_pci_addr *addr,
>  int rte_eal_pci_scan(void);
>  
>  /**
> + * Match the PCI Driver and Device using the ID Table
> + *
> + * @param pci_drv
> + *   PCI driver from which ID table would be extracted
> + * @param pci_dev
> + *   PCI device to match against the driver
> + * @return
> + *   0 for successful match
> + *   !0 for unsuccessful match
> + */
> +int
> +rte_eal_pci_match(struct rte_pci_driver *pci_drv,
> +   struct rte_pci_device *pci_dev);

Yes we definitely need this function.


Re: [dpdk-dev] [PATCH v5 08/12] eal/pci: generalize args of PCI scan/match towards RTE device/driver

2017-01-03 Thread Thomas Monjalon
2016-12-26 18:54, Shreyansh Jain:
> PCI scan and match now work on rte_device/rte_driver rather than PCI
> specific objects. These functions can now be plugged to the generic
> bus callbacks for scanning and matching devices/drivers.

These sentences look weird :)
PCI functions must work with PCI objects; it's simpler.
However, I agree with registering the PCI scan, match, init and remove
functions with the generic rte_bus. The rte_device object is then cast to
rte_pci_device inside these functions.
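
The cast described above is usually done with a container_of-style macro
when the generic object is embedded in the specific one. Below is a small
sketch under that assumption. The demo_* structures and the macro are
hypothetical stand-ins, not the real rte_device/rte_pci_device layout, and
the match here returns non-zero on success, unlike the 0-on-match convention
in the patch under review.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a generic device and a PCI device. */
struct demo_device {
	const char *name;
};

struct demo_pci_device {
	unsigned int vendor_id;
	struct demo_device device;	/* embedded generic part */
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define DEMO_CONTAINER_OF(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * A bus-level callback only sees the generic object; the PCI code
 * casts back to its own type before using PCI-specific fields.
 */
int
demo_pci_match(struct demo_device *dev, unsigned int wanted_vendor)
{
	struct demo_pci_device *pci_dev =
		DEMO_CONTAINER_OF(dev, struct demo_pci_device, device);

	return pci_dev->vendor_id == wanted_vendor;
}

/* Usage: the generic pointer is all the caller has to pass around. */
void
demo_cast_usage(void)
{
	struct demo_pci_device nic = { 0x8086, { "0000:01:00.0" } };

	assert(demo_pci_match(&nic.device, 0x8086));
	assert(!demo_pci_match(&nic.device, 0x15b3));
}
```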


Re: [dpdk-dev] [PATCH v5 12/12] eal/bus: add bus iteration macros

2017-01-03 Thread Thomas Monjalon
2016-12-26 18:54, Shreyansh Jain:
> Three macros:
>  FOREACH_BUS
>  FOREACH_DEVICE_ON_BUS
>  FOREACH_DRIVER_ON_BUS
> are introduced to make looping over bus (on global list), devices and
> drivers (on a specific bus) prettier.

Nice



Re: [dpdk-dev] [PATCH v5 00/12] Introducing EAL Bus-Device-Driver Model

2017-01-03 Thread Thomas Monjalon
2016-12-26 18:53, Shreyansh Jain:
> Link to v1: [10]
> Link to v2: [11]
> Link to v3: [13]
> Link to v4: [14]
> 
> :: Introduction ::
> 
> DPDK has been inherently a PCI inclined framework. Because of this, the
> design of device tree (or list) within DPDK is also PCI inclined. A
> non-PCI device doesn't have a way of being expressed without using hooks
> started from EAL to PMD.

This is very important work for DPDK's growth.

I am sorry for not having made many public comments before today.
I have sent some thoughts about moving some things from generic objects to
specialized ones. I think these are not very big changes to your work,
and I hope we can converge on something in the git tree really soon.

Thanks Shreyansh.

PS: reviews from others are more than welcome!


[dpdk-dev] [PATCH] eal: Support running as unprivileged user

2017-01-03 Thread Ben Walker
Since Linux kernel 4.0, unprivileged users can no longer
obtain physical page frame numbers from
/proc/self/pagemap. When an IOMMU is present, simply
choose our own DMA addresses instead.

Signed-off-by: Ben Walker 
---
 lib/librte_eal/common/eal_private.h  | 12 ++
 lib/librte_eal/linuxapp/eal/eal_memory.c | 71 +++-
 lib/librte_eal/linuxapp/eal/eal_pci.c|  6 ++-
 3 files changed, 68 insertions(+), 21 deletions(-)

diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 9e7d8f6..8b2d323 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -34,6 +34,7 @@
 #ifndef _EAL_PRIVATE_H_
 #define _EAL_PRIVATE_H_
 
+#include 
 #include 
 #include 
 
@@ -301,4 +302,15 @@ int rte_eal_hugepage_init(void);
  */
 int rte_eal_hugepage_attach(void);
 
+/**
+ * Returns true if the system is able to obtain
+ * physical addresses. Return false if using DMA
+ * addresses through an IOMMU.
+ *
+ * Drivers based on uio will not load unless physical
+ * addresses are obtainable. It is only possible to get
+ * physical addresses when running as a privileged user.
+ */
+bool rte_eal_using_phys_addrs(void);
+
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index a956bb2..33c66c1 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -64,6 +64,7 @@
 #define _FILE_OFFSET_BITS 64
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -122,26 +123,24 @@ int rte_xen_dom0_supported(void)
 
 static uint64_t baseaddr_offset;
 
-static unsigned proc_pagemap_readable;
+static bool phys_addrs_available = true;
 
 #define RANDOMIZE_VA_SPACE_FILE "/proc/sys/kernel/randomize_va_space"
 
 static void
-test_proc_pagemap_readable(void)
+test_phys_addrs_available(void)
 {
-   int fd = open("/proc/self/pagemap", O_RDONLY);
+   uint64_t tmp;
+   phys_addr_t physaddr;
 
-   if (fd < 0) {
+   physaddr = rte_mem_virt2phy(&tmp);
+   if (physaddr == RTE_BAD_PHYS_ADDR) {
RTE_LOG(ERR, EAL,
-   "Cannot open /proc/self/pagemap: %s. "
-   "virt2phys address translation will not work\n",
+   "Cannot obtain physical addresses: %s. "
+   "Only vfio will function.\n",
strerror(errno));
-   return;
+   phys_addrs_available = false;
}
-
-   /* Is readable */
-   close(fd);
-   proc_pagemap_readable = 1;
 }
 
 /* Lock page in physical memory and prevent from swapping. */
@@ -190,7 +189,7 @@ rte_mem_virt2phy(const void *virtaddr)
}
 
/* Cannot parse /proc/self/pagemap, no need to log errors everywhere */
-   if (!proc_pagemap_readable)
+   if (!phys_addrs_available)
return RTE_BAD_PHYS_ADDR;
 
/* standard page size */
@@ -229,6 +228,9 @@ rte_mem_virt2phy(const void *virtaddr)
 * the pfn (page frame number) are bits 0-54 (see
 * pagemap.txt in linux Documentation)
 */
+   if ((page & 0x7fffffffffffffULL) == 0)
+   return RTE_BAD_PHYS_ADDR;
+
physaddr = ((page & 0x7fffffffffffffULL) * page_size)
+ ((unsigned long)virtaddr % page_size);
 
@@ -255,6 +257,22 @@ find_physaddrs(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi)
 }
 
 /*
+ * For each hugepage in hugepg_tbl, fill the physaddr value sequentially.
+ */
+static int
+set_physaddrs(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi)
+{
+   unsigned i;
+   phys_addr_t addr = 0;
+
+   for (i = 0; i < hpi->num_pages[0]; i++) {
+   hugepg_tbl[i].physaddr = addr;
+   addr += hugepg_tbl[i].size;
+   }
+   return 0;
+}
+
+/*
  * Check whether address-space layout randomization is enabled in
  * the kernel. This is important for multi-process as it can prevent
  * two processes mapping data to the same virtual address
@@ -951,7 +969,7 @@ rte_eal_hugepage_init(void)
int nr_hugefiles, nr_hugepages = 0;
void *addr;
 
-   test_proc_pagemap_readable();
+   test_phys_addrs_available();
 
memset(used_hp, 0, sizeof(used_hp));
 
@@ -1043,11 +1061,20 @@ rte_eal_hugepage_init(void)
continue;
}
 
-   /* find physical addresses and sockets for each hugepage */
-   if (find_physaddrs(&tmp_hp[hp_offset], hpi) < 0){
-   RTE_LOG(DEBUG, EAL, "Failed to find phys addr for %u MB pages\n",
-   (unsigned)(hpi->hugepage_sz / 0x100000));
-   goto fail;
+   if (phys_addrs_available) {
+   /* find physical addresses for each hugepage */
+   if (find_physaddrs(&tmp_hp[hp_offse

Re: [dpdk-dev] Running DPDK as an unprivileged user

2017-01-03 Thread Walker, Benjamin
On Thu, 2016-12-29 at 17:14 -0800, Stephen Hemminger wrote:
> If kernel broke pinning of hugepages, then it is an upstream kernel bug.

The kernel, under a myriad of circumstances, will change the mapping of virtual
to physical addresses for hugepages. This behavior began somewhere around kernel
3.16 and with each release more cases where the mapping can change are
introduced. DPDK should not be relying on that mapping staying static, and
instead should be using vfio to explicitly pin the pages. I've consulted the
relevant kernel developers who write the code in this area and they are
universally in agreement that this is not a kernel bug and the mappings will get
less static over time.

On Mon, 2017-01-02 at 11:47 -0800, Stephen Hemminger wrote:
> On Mon, 02 Jan 2017 15:32:08 +0100
> Thomas Monjalon  wrote:
> 
> > 2016-12-29 17:14, Stephen Hemminger:
> > > On Thu, 29 Dec 2016 20:41:21 +
> > > "Walker, Benjamin"  wrote:  
> > > > My second question is whether the user should be allowed to
> > > > mix uio and vfio usage simultaneously. For vfio, the
> > > > physical addresses are really DMA addresses and are best
> > > > when arbitrarily chosen to appear sequential relative to
> > > > their virtual addresses. For uio, they are physical
> > > > addresses and are not chosen at all. It seems that these two
> > > > things are in conflict and that it will be difficult, ugly,
> > > > and maybe impossible to resolve the simultaneous use of
> > > > both.  
> > > 
> > > Unless application is running as privileged user (ie root), UIO
> > > is not going to work. Therefore don't worry about mixed environment.  
> > 
> > Yes, mixing UIO and VFIO is possible only as root.
> > However, what is the benefit of mixing them?
> 
> One possible case where this could be used, Hyper-V/Azure and SR-IOV.
> The VF interface will show up on an isolated PCI bus and the virtual NIC
> is on VMBUS. It is possible to use VFIO on the PCI to get MSI-X per queue
> interrupts, but there is no support for VFIO on VMBUS.

I sent out a patch a little while ago that makes DPDK work when running as an
unprivileged user with an IOMMU. I allow mixing of uio/vfio when root (I choose
the DMA address to be the physical address), but only vfio when unprivileged (I
choose the DMA addresses to start at 0).
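
As a rough illustration of the address choice described above (not the
patch itself): when physical addresses are available the DMA address is set
equal to the physical address, otherwise DMA addresses are handed out
sequentially starting at 0. The structure and function names below are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-hugepage record; not the DPDK hugepage_file layout. */
struct demo_hugepage {
	uint64_t size;	/* page size in bytes */
	uint64_t phys;	/* physical address, valid only when obtainable */
	uint64_t iova;	/* address programmed for DMA */
};

/*
 * Choose a DMA address for every page. With physical addresses
 * available (privileged, uio and vfio can be mixed) reuse them;
 * otherwise (unprivileged, vfio only) lay pages out from 0.
 */
void
demo_assign_iova(struct demo_hugepage *pages, size_t n, int phys_available)
{
	uint64_t next = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (phys_available) {
			pages[i].iova = pages[i].phys;
		} else {
			pages[i].iova = next;
			next += pages[i].size;
		}
	}
}

/* Usage: two 2 MB pages in both modes. */
void
demo_iova_usage(void)
{
	struct demo_hugepage pages[2] = {
		{ 0x200000, 0x40000000, 0 },
		{ 0x200000, 0x10000000, 0 },
	};

	demo_assign_iova(pages, 2, 0);
	assert(pages[0].iova == 0 && pages[1].iova == 0x200000);

	demo_assign_iova(pages, 2, 1);
	assert(pages[0].iova == 0x40000000 && pages[1].iova == 0x10000000);
}
```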

Unfortunately, there are a few more wrinkles for systems that do not have an
IOMMU. These systems still need to explicitly pin memory, but they need to use
physical addresses instead of DMA addresses. There are three concerns with this:

1) Physical addresses cannot be exposed to unprivileged users due to security
concerns (the fallout of rowhammer). Therefore, systems without an IOMMU can
only support privileged users. I think this is probably fine.
2) The IOCTL from vfio to pin the memory is tied to specifying the DMA address
and programming the IOMMU. This is unfortunate - systems without an IOMMU still
want to do the pinning, but they need to be given the physical address instead
of specifying a DMA address.
3) Not all device types, particularly in virtualization environments, support
vfio today. These devices have no way to explicitly pin memory.

I think this is going to take a kernel patch or two to resolve, unless someone
has a good idea.

[dpdk-dev] [PATCH v2] eal: Support running as unprivileged user

2017-01-03 Thread Ben Walker
Since Linux kernel 4.0, unprivileged users can no longer
obtain physical page frame numbers from
/proc/self/pagemap. When an IOMMU is present, simply
choose our own DMA addresses instead.

Signed-off-by: Ben Walker 
---
 lib/librte_eal/common/eal_private.h  | 12 +
 lib/librte_eal/linuxapp/eal/eal_memory.c | 75 +++-
 lib/librte_eal/linuxapp/eal/eal_pci.c|  6 ++-
 3 files changed, 71 insertions(+), 22 deletions(-)

diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 9e7d8f6..8b2d323 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -34,6 +34,7 @@
 #ifndef _EAL_PRIVATE_H_
 #define _EAL_PRIVATE_H_
 
+#include 
 #include 
 #include 
 
@@ -301,4 +302,15 @@ int rte_eal_hugepage_init(void);
  */
 int rte_eal_hugepage_attach(void);
 
+/**
+ * Returns true if the system is able to obtain
+ * physical addresses. Return false if using DMA
+ * addresses through an IOMMU.
+ *
+ * Drivers based on uio will not load unless physical
+ * addresses are obtainable. It is only possible to get
+ * physical addresses when running as a privileged user.
+ */
+bool rte_eal_using_phys_addrs(void);
+
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index a956bb2..8678ae9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -64,6 +64,7 @@
 #define _FILE_OFFSET_BITS 64
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -122,26 +123,24 @@ int rte_xen_dom0_supported(void)
 
 static uint64_t baseaddr_offset;
 
-static unsigned proc_pagemap_readable;
+static bool phys_addrs_available = true;
 
 #define RANDOMIZE_VA_SPACE_FILE "/proc/sys/kernel/randomize_va_space"
 
 static void
-test_proc_pagemap_readable(void)
+test_phys_addrs_available(void)
 {
-   int fd = open("/proc/self/pagemap", O_RDONLY);
+   uint64_t tmp;
+   phys_addr_t physaddr;
 
-   if (fd < 0) {
+   physaddr = rte_mem_virt2phy(&tmp);
+   if (physaddr == RTE_BAD_PHYS_ADDR) {
RTE_LOG(ERR, EAL,
-   "Cannot open /proc/self/pagemap: %s. "
-   "virt2phys address translation will not work\n",
+   "Cannot obtain physical addresses: %s. "
+   "Only vfio will function.\n",
strerror(errno));
-   return;
+   phys_addrs_available = false;
}
-
-   /* Is readable */
-   close(fd);
-   proc_pagemap_readable = 1;
 }
 
 /* Lock page in physical memory and prevent from swapping. */
@@ -190,7 +189,7 @@ rte_mem_virt2phy(const void *virtaddr)
}
 
/* Cannot parse /proc/self/pagemap, no need to log errors everywhere */
-   if (!proc_pagemap_readable)
+   if (!phys_addrs_available)
return RTE_BAD_PHYS_ADDR;
 
/* standard page size */
@@ -229,6 +228,9 @@ rte_mem_virt2phy(const void *virtaddr)
 * the pfn (page frame number) are bits 0-54 (see
 * pagemap.txt in linux Documentation)
 */
+   if ((page & 0x7fffffffffffffULL) == 0)
+   return RTE_BAD_PHYS_ADDR;
+
physaddr = ((page & 0x7fffffffffffffULL) * page_size)
+ ((unsigned long)virtaddr % page_size);
 
@@ -242,7 +244,7 @@ rte_mem_virt2phy(const void *virtaddr)
 static int
 find_physaddrs(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi)
 {
-   unsigned i;
+   unsigned int i;
phys_addr_t addr;
 
for (i = 0; i < hpi->num_pages[0]; i++) {
@@ -255,6 +257,22 @@ find_physaddrs(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi)
 }
 
 /*
+ * For each hugepage in hugepg_tbl, fill the physaddr value sequentially.
+ */
+static int
+set_physaddrs(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi)
+{
+   unsigned int i;
+   phys_addr_t addr = 0;
+
+   for (i = 0; i < hpi->num_pages[0]; i++) {
+   hugepg_tbl[i].physaddr = addr;
+   addr += hugepg_tbl[i].size;
+   }
+   return 0;
+}
+
+/*
  * Check whether address-space layout randomization is enabled in
  * the kernel. This is important for multi-process as it can prevent
  * two processes mapping data to the same virtual address
@@ -951,7 +969,7 @@ rte_eal_hugepage_init(void)
int nr_hugefiles, nr_hugepages = 0;
void *addr;
 
-   test_proc_pagemap_readable();
+   test_phys_addrs_available();
 
memset(used_hp, 0, sizeof(used_hp));
 
@@ -1043,11 +1061,22 @@ rte_eal_hugepage_init(void)
continue;
}
 
-   /* find physical addresses and sockets for each hugepage */
-   if (find_physaddrs(&tmp_hp[hp_offset], hpi) < 0){
-   RTE_LOG(DEBUG, EAL, "Failed to find phys addr for %u MB pages\n",
-

Re: [dpdk-dev] [PATCH v2 01/18] net/ixgbe: store SYN filter

2017-01-03 Thread Zhao1, Wei
Hi, Daiwei

> -Original Message-
> From: Dai, Wei
> Sent: Tuesday, January 3, 2017 10:33 PM
> To: Zhao1, Wei ; dev@dpdk.org
> Cc: Lu, Wenzhuo ; Zhao1, Wei
> 
> Subject: RE: [dpdk-dev] [PATCH v2 01/18] net/ixgbe: store SYN filter
> 
> Hi, Wei Zhao
> 
> I think that you had better give a cover letter for such a series of patches.
> You can give the changes between v2 and v1 in cover letter and maybe no
> need describe it in each one.
> 

OK, I will add a cover letter in v3.

> Thanks &Best Regards
> -Wei
> 
> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Wei Zhao
> > Sent: Friday, December 30, 2016 3:53 PM
> > To: dev@dpdk.org
> > Cc: Lu, Wenzhuo ; Zhao1, Wei
> > 
> > Subject: [dpdk-dev] [PATCH v2 01/18] net/ixgbe: store SYN filter
> >
> > Add support for storing SYN filter in SW.
> >
> > Signed-off-by: Wenzhuo Lu 
> > Signed-off-by: Wei Zhao 
> > ---
> >
> > v2:
> > --synqf assignment location change
> > ---
> >  drivers/net/ixgbe/ixgbe_ethdev.c | 14 +++---
> > drivers/net/ixgbe/ixgbe_ethdev.h |  2 ++
> >  2 files changed, 13 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
> > b/drivers/net/ixgbe/ixgbe_ethdev.c
> > index a25bac8..316e560 100644
> > --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> > +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> > @@ -1274,6 +1274,8 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
> > memset(filter_info->fivetuple_mask, 0,
> >sizeof(uint32_t) * IXGBE_5TUPLE_ARRAY_SIZE);
> >
> > +   /* initialize SYN filter */
> > +   filter_info->syn_info = 0;
> > return 0;
> >  }
> >
> > @@ -5580,15 +5582,18 @@ ixgbe_syn_filter_set(struct rte_eth_dev *dev,
> > bool add)
> >  {
> > struct ixgbe_hw *hw =
> > IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > +   struct ixgbe_filter_info *filter_info =
> > +   IXGBE_DEV_PRIVATE_TO_FILTER_INFO(dev->data-
> >dev_private);
> > +   uint32_t syn_info;
> > uint32_t synqf;
> >
> > if (filter->queue >= IXGBE_MAX_RX_QUEUE_NUM)
> > return -EINVAL;
> >
> > -   synqf = IXGBE_READ_REG(hw, IXGBE_SYNQF);
> > +   syn_info = filter_info->syn_info;
> >
> > if (add) {
> > -   if (synqf & IXGBE_SYN_FILTER_ENABLE)
> > +   if (syn_info & IXGBE_SYN_FILTER_ENABLE)
> > return -EINVAL;
> > synqf = (uint32_t)(((filter->queue <<
> > IXGBE_SYN_FILTER_QUEUE_SHIFT) &
> > IXGBE_SYN_FILTER_QUEUE) |
> IXGBE_SYN_FILTER_ENABLE); @@ -5598,10
> > +5603,13 @@ ixgbe_syn_filter_set(struct rte_eth_dev *dev,
> > else
> > synqf &= ~IXGBE_SYN_FILTER_SYNQFP;
> > } else {
> > -   if (!(synqf & IXGBE_SYN_FILTER_ENABLE))
> > +   synqf = IXGBE_READ_REG(hw, IXGBE_SYNQF);
> > +   if (!(syn_info & IXGBE_SYN_FILTER_ENABLE))
> > return -ENOENT;
> > synqf &= ~(IXGBE_SYN_FILTER_QUEUE |
> IXGBE_SYN_FILTER_ENABLE);
> > }
> > +
> > +   filter_info->syn_info = synqf;
> > IXGBE_WRITE_REG(hw, IXGBE_SYNQF, synqf);
> > IXGBE_WRITE_FLUSH(hw);
> > return 0;
> > diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h
> > b/drivers/net/ixgbe/ixgbe_ethdev.h
> > index 4ff6338..827026c 100644
> > --- a/drivers/net/ixgbe/ixgbe_ethdev.h
> > +++ b/drivers/net/ixgbe/ixgbe_ethdev.h
> > @@ -262,6 +262,8 @@ struct ixgbe_filter_info {
> > /* Bit mask for every used 5tuple filter */
> > uint32_t fivetuple_mask[IXGBE_5TUPLE_ARRAY_SIZE];
> > struct ixgbe_5tuple_filter_list fivetuple_list;
> > +   /* store the SYN filter info */
> > +   uint32_t syn_info;
> >  };
> >
> >  /*
> > --
> > 2.5.5



Re: [dpdk-dev] [PATCH v2 02/18] net/ixgbe: store flow director filter

2017-01-03 Thread Zhao1, Wei
Hi, weid


> -Original Message-
> From: Dai, Wei
> Sent: Tuesday, January 3, 2017 10:28 PM
> To: Zhao1, Wei ; dev@dpdk.org
> Cc: Lu, Wenzhuo ; Zhao1, Wei
> 
> Subject: RE: [dpdk-dev] [PATCH v2 02/18] net/ixgbe: store flow director filter
> 
> Hi, Wei Zhao
> 
> Would you please do git rebase master for this patch set?
> When I do git pull and then git apply this patch, following errors are 
> reported:
> [root@dpdk4 dpdk-org]# git am ../patches/bundle-488-zhaowei-ixgbe-filter-
> api-v2.mbox
> 

This patch set is based on the dpdk-next-net tree.

> Applying: net/ixgbe: store SYN filter
> Applying: net/ixgbe: store flow director filter
> error: patch failed: drivers/net/ixgbe/ixgbe_ethdev.c:1284
> error: drivers/net/ixgbe/ixgbe_ethdev.c: patch does not apply Patch failed at
> 0002 net/ixgbe: store flow director filter The copy of the patch that failed 
> is
> found in: .git/rebase-apply/patch When you have resolved this problem, run
> "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".
> 
> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Wei Zhao
> > Sent: Friday, December 30, 2016 3:53 PM
> > To: dev@dpdk.org
> > Cc: Lu, Wenzhuo ; Zhao1, Wei
> > 
> > Subject: [dpdk-dev] [PATCH v2 02/18] net/ixgbe: store flow director
> > filter
> >
> > Add support for storing flow director filter in SW.
> >
> > Signed-off-by: Wenzhuo Lu 
> > Signed-off-by: Wei Zhao 
> > ---
> >
> > v2:
> > --add a fdir initialization function in device start process
> > ---
> >  drivers/net/ixgbe/ixgbe_ethdev.c |  55 
> > drivers/net/ixgbe/ixgbe_ethdev.h |  19 ++-
> >  drivers/net/ixgbe/ixgbe_fdir.c   | 105
> > ++-
> >  3 files changed, 176 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
> > b/drivers/net/ixgbe/ixgbe_ethdev.c
> > index 316e560..de27a73 100644
> > --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> > +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> > @@ -60,6 +60,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #include "ixgbe_logs.h"
> >  #include "base/ixgbe_api.h"
> > @@ -165,6 +166,7 @@ enum ixgbevf_xcast_modes {
> >
> >  static int eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev);  static
> > int eth_ixgbe_dev_uninit(struct rte_eth_dev *eth_dev);
> > +static int ixgbe_fdir_filter_init(struct rte_eth_dev *eth_dev);
> >  static int  ixgbe_dev_configure(struct rte_eth_dev *dev);  static int
> > ixgbe_dev_start(struct rte_eth_dev *dev);  static void
> > ixgbe_dev_stop(struct rte_eth_dev *dev); @@ -1276,6 +1278,9 @@
> > eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
> >
> > /* initialize SYN filter */
> > filter_info->syn_info = 0;
> > +   /* initialize flow director filter list & hash */
> > +   ixgbe_fdir_filter_init(eth_dev);
> > +
> > return 0;
> >  }
> >
> > @@ -1284,6 +1289,9 @@ eth_ixgbe_dev_uninit(struct rte_eth_dev
> > *eth_dev) {
> > struct rte_pci_device *pci_dev;
> > struct ixgbe_hw *hw;
> > +   struct ixgbe_hw_fdir_info *fdir_info =
> > +   IXGBE_DEV_PRIVATE_TO_FDIR_INFO(eth_dev->data-
> >dev_private);
> > +   struct ixgbe_fdir_filter *fdir_filter;
> >
> > PMD_INIT_FUNC_TRACE();
> >
> > @@ -1317,9 +1325,56 @@ eth_ixgbe_dev_uninit(struct rte_eth_dev
> > *eth_dev)
> > rte_free(eth_dev->data->hash_mac_addrs);
> > eth_dev->data->hash_mac_addrs = NULL;
> >
> > +   /* remove all the fdir filters & hash */
> > +   if (fdir_info->hash_map)
> > +   rte_free(fdir_info->hash_map);
> > +   if (fdir_info->hash_handle)
> > +   rte_hash_free(fdir_info->hash_handle);
> > +
> > +   while ((fdir_filter = TAILQ_FIRST(&fdir_info->fdir_list))) {
> > +   TAILQ_REMOVE(&fdir_info->fdir_list,
> > +fdir_filter,
> > +entries);
> > +   rte_free(fdir_filter);
> > +   }
> > +
> > return 0;
> >  }
> >
> > +static int ixgbe_fdir_filter_init(struct rte_eth_dev *eth_dev) {
> > +   struct ixgbe_hw_fdir_info *fdir_info =
> > +   IXGBE_DEV_PRIVATE_TO_FDIR_INFO(eth_dev->data-
> >dev_private);
> > +   char fdir_hash_name[RTE_HASH_NAMESIZE];
> > +   struct rte_hash_parameters fdir_hash_params = {
> > +   .name = fdir_hash_name,
> > +   .entries = IXGBE_MAX_FDIR_FILTER_NUM,
> > +   .key_len = sizeof(union ixgbe_atr_input),
> > +   .hash_func = rte_hash_crc,
> > +   .hash_func_init_val = 0,
> > +   .socket_id = rte_socket_id(),
> > +   };
> > +
> > +   TAILQ_INIT(&fdir_info->fdir_list);
> > +   snprintf(fdir_hash_name, RTE_HASH_NAMESIZE,
> > +"fdir_%s", eth_dev->data->name);
> > +   fdir_info->hash_handle = rte_hash_create(&fdir_hash_params);
> > +   if (!fdir_info->hash_handle) {
> > +   PMD_INIT_LOG(ERR, "Failed to create fdir hash table!");
> > +   return -EINVAL;
> > +   }
> > +   fdir_info->hash_map = r

Re: [dpdk-dev] [PATCH v3 3/4] net/ixgbe: add firmware version get

2017-01-03 Thread Yang, Qiming
You can see it with the kernel's ethtool, using the command 'ethtool -i <devname>':
driver: ixgbe
version: 4.2.1-k
firmware-version: 0x61bf0001

Ixgbe's FW version does not have major and minor numbers, and the original 
purpose of this function is to get the FW version, so I think it's enough. 

-Original Message-
From: Yigit, Ferruh 
Sent: Tuesday, January 3, 2017 11:04 PM
To: Yang, Qiming ; dev@dpdk.org; 
thomas.monja...@6wind.com
Cc: Horton, Remy 
Subject: Re: [PATCH v3 3/4] net/ixgbe: add firmware version get

On 12/27/2016 12:30 PM, Qiming Yang wrote:
> This patch adds a new function ixgbe_fw_version_get.
> 
> Signed-off-by: Qiming Yang 

<...>

>  
>  static void
> +ixgbe_fw_version_get(struct rte_eth_dev *dev, __rte_unused u32 *fw_major,
> + __rte_unused u32 *fw_minor, __rte_unused u32 *fw_patch, u32 
> +*etrack_id)

I think this API should at least provide the major and minor FW versions. Isn't
there any kind of FW version information for ixgbe? Providing only etrack_id
does not look good.

> +{
> + struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> + u16 eeprom_verh, eeprom_verl;
> +
> + ixgbe_read_eeprom(hw, 0x2e, &eeprom_verh);
> + ixgbe_read_eeprom(hw, 0x2d, &eeprom_verl);
> +
> + *etrack_id = (eeprom_verh << 16) | eeprom_verl;
> +}
> +
> +static void
>  ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info 
> *dev_info)  {
>   struct rte_pci_device *pci_dev = IXGBE_DEV_TO_PCI(dev);
> 
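
For reference, the way the two EEPROM words in the quoted patch form the
ETrack ID can be sketched on its own. This is a minimal sketch, not the
driver code: the helper name etrack_id_from_words is made up for
illustration, and it assumes the word read from offset 0x2e is the high
half, as in the quoted hunk. The cast to uint32_t before the shift keeps
the arithmetic unsigned instead of shifting a promoted signed int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Combine two 16-bit EEPROM words into one 32-bit ETrack ID.
 * high_word and low_word stand in for the values read from the
 * EEPROM; the function name is hypothetical.
 */
static void
etrack_id_from_words(uint16_t high_word, uint16_t low_word,
		     uint32_t *etrack_id)
{
	assert(etrack_id != NULL);

	/* Widen first, then shift, so bit 15 of high_word is safe. */
	*etrack_id = ((uint32_t)high_word << 16) | low_word;
}
```

With the words 0x61bf and 0x0001 this gives 0x61bf0001, which has the
same form as the firmware-version string in the ethtool output above.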



[dpdk-dev] [PATCH] lib/librte_vhost: fix memory leak

2017-01-03 Thread Yong Wang
In function vhost_new_device(), the current code does not free 'dev'
in the "i == MAX_VHOST_DEVICE" error branch, which leads to a
memory leak.

Signed-off-by: Yong Wang 
---
 lib/librte_vhost/vhost.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 31825b8..e415093 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -250,6 +250,7 @@ struct virtio_net *
if (i == MAX_VHOST_DEVICE) {
RTE_LOG(ERR, VHOST_CONFIG,
"Failed to find a free slot for new device.\n");
+   rte_free(dev);
return -1;
}
 
-- 
1.8.3.1
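
The leak pattern fixed above can be shown in isolation. This is a
minimal standalone sketch under stated assumptions, not the vhost code:
MAX_SLOTS, struct device and new_device() are hypothetical stand-ins,
and plain calloc()/free() replace the rte_ allocator. The point is only
that an object allocated before the slot search must be released on the
"table full" path.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_SLOTS 4	/* hypothetical; stands in for MAX_VHOST_DEVICE */

struct device {
	int id;
};

static struct device *slots[MAX_SLOTS];

/* Return the slot index of the new device, or -1 on failure. */
static int
new_device(void)
{
	struct device *dev = calloc(1, sizeof(*dev));
	int i;

	if (dev == NULL)
		return -1;

	/* Look for the first free slot. */
	for (i = 0; i < MAX_SLOTS; i++) {
		if (slots[i] == NULL)
			break;
	}

	if (i == MAX_SLOTS) {
		/* No free slot: release dev, otherwise it leaks. */
		free(dev);
		return -1;
	}

	assert(slots[i] == NULL);
	slots[i] = dev;
	dev->id = i;
	return i;
}
```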




Re: [dpdk-dev] [PATCH v3 2/4] net/e1000: add firmware version get

2017-01-03 Thread Yang, Qiming
See the reply below.

-Original Message-
From: Yigit, Ferruh 
Sent: Tuesday, January 3, 2017 11:03 PM
To: Yang, Qiming ; dev@dpdk.org; 
thomas.monja...@6wind.com
Cc: Horton, Remy 
Subject: Re: [PATCH v3 2/4] net/e1000: add firmware version get

On 12/27/2016 12:30 PM, Qiming Yang wrote:
> This patch adds a new function eth_igb_fw_version_get.
> 
> Signed-off-by: Qiming Yang 
> ---
> v3 changes:
>  * use eth_igb_fw_version_get(struct rte_eth_dev *dev, u32 *fw_major,
>    u32 *fw_minor, u32 *fw_patch, u32 *etrack_id) instead
>    of eth_igb_fw_version_get(struct rte_eth_dev *dev, char *fw_version,
>    int fw_length). Add a statement in /doc/guides/nics/features/igb.ini.
> ---
> ---
>  doc/guides/nics/features/igb.ini |  1 +
>  drivers/net/e1000/igb_ethdev.c   | 43 
> 
>  2 files changed, 44 insertions(+)
> 
> diff --git a/doc/guides/nics/features/igb.ini 
> b/doc/guides/nics/features/igb.ini
> index 9fafe72..ffd87ba 100644
> --- a/doc/guides/nics/features/igb.ini
> +++ b/doc/guides/nics/features/igb.ini
> @@ -39,6 +39,7 @@ EEPROM dump  = Y
>  Registers dump   = Y
>  BSD nic_uio  = Y
>  Linux UIO= Y
> +FW version   = Y

Please keep the same location as in the default.ini file. Why are you putting
this in the middle, between UIO and VFIO?
Qiming: It's a clerical error; I meant to add this line at the end of this file.

>  Linux VFIO   = Y
>  x86-32   = Y
>  x86-64   = Y
> diff --git a/drivers/net/e1000/igb_ethdev.c 
> b/drivers/net/e1000/igb_ethdev.c index 4a15447..25344b7 100644
> --- a/drivers/net/e1000/igb_ethdev.c
> +++ b/drivers/net/e1000/igb_ethdev.c
> @@ -120,6 +120,8 @@ static int eth_igb_xstats_get_names(struct rte_eth_dev 
> *dev,
>   unsigned limit);
>  static void eth_igb_stats_reset(struct rte_eth_dev *dev);  static 
> void eth_igb_xstats_reset(struct rte_eth_dev *dev);
> +static void eth_igb_fw_version_get(struct rte_eth_dev *dev, u32 *fw_major,
> + u32 *fw_minor, u32 *fw_patch, u32 *etrack_id);

I think you can use a struct as the parameter here. But beware, that struct
should NOT be a public struct.
Qiming: I think adding a private struct only for igb is unnecessary. Keeping
the arguments consistent with rte_eth_dev_fw_info_get is better.
What do you think?

>  static void eth_igb_infos_get(struct rte_eth_dev *dev,
> struct rte_eth_dev_info *dev_info);  static const 
> uint32_t 
> *eth_igb_supported_ptypes_get(struct rte_eth_dev *dev); @@ -389,6 
> +391,7 @@ static const struct eth_dev_ops eth_igb_ops = {
>   .xstats_get_names = eth_igb_xstats_get_names,
>   .stats_reset  = eth_igb_stats_reset,
>   .xstats_reset = eth_igb_xstats_reset,
> + .fw_version_get   = eth_igb_fw_version_get,
>   .dev_infos_get= eth_igb_infos_get,
>   .dev_supported_ptypes_get = eth_igb_supported_ptypes_get,
>   .mtu_set  = eth_igb_mtu_set,
> @@ -1981,6 +1984,46 @@ eth_igbvf_stats_reset(struct rte_eth_dev *dev)  
> }
>  

<...>


[dpdk-dev] [PATCH v5 00/17] net/i40e: consistent filter API

2017-01-03 Thread Beilei Xing
The patch set depends on Adrien's Generic flow API (rte_flow).

The patches mainly implement the following functions:
1) Store and restore all kinds of filters.
2) Parse all kinds of filters.
3) Add flow validate function.
4) Add flow create function.
5) Add flow destroy function.
6) Add flow flush function.

v5 changes:
 Change some local variable name.
 Add removing i40e_flow_list during device uninit.
 Fix compile error when gcc compile option isn't '-O0'.

v4 changes:
 Change I40E_TCI_MASK with 0x to align with testpmd.
 Modify the stats shown when restoring filters.

v3 changes:
 Set the related cause pointer to a non-NULL value when error happens.
 Change return value when error happens.
 Modify filter_del parameter with key.
 Malloc filter after checking when delete a filter.
 Delete meaningless initialization.
 Add return value when there's error.
 Change global variable definition.
 Modify some function declaration.

v2 changes:
 Add i40e_flow.c, all flow ops are implemented in the file.
 Change the whole implementation of all parse flow functions.
 Update error info for all flow ops.
 Add flow_list to store flows created.

Beilei Xing (17):
  net/i40e: store ethertype filter
  net/i40e: store tunnel filter
  net/i40e: store flow director filter
  net/i40e: restore ethertype filter
  net/i40e: restore tunnel filter
  net/i40e: restore flow director filter
  net/i40e: add flow validate function
  net/i40e: parse flow director filter
  net/i40e: parse tunnel filter
  net/i40e: add flow create function
  net/i40e: add flow destroy function
  net/i40e: destroy ethertype filter
  net/i40e: destroy tunnel filter
  net/i40e: destroy flow directory filter
  net/i40e: add flow flush function
  net/i40e: flush ethertype filters
  net/i40e: flush tunnel filters

 drivers/net/i40e/Makefile  |2 +
 drivers/net/i40e/i40e_ethdev.c |  526 ++--
 drivers/net/i40e/i40e_ethdev.h |  173 
 drivers/net/i40e/i40e_fdir.c   |  140 +++-
 drivers/net/i40e/i40e_flow.c   | 1772 
 5 files changed, 2547 insertions(+), 66 deletions(-)
 create mode 100644 drivers/net/i40e/i40e_flow.c

-- 
2.5.5
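
Item 1 above depends on keeping a software copy of every filter and
refusing to store the same filter twice. A minimal sketch of that idea,
under stated assumptions: the series uses rte_hash (cuckoo hash) for
the lookup, while this sketch swaps in a linear scan over a small array
so that it stays self-contained. struct filter_key, struct filter_table,
MAX_FILTERS and both function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_FILTERS 8		/* hypothetical table size */

struct filter_key {		/* hypothetical lookup key, no padding */
	uint8_t  mac[6];
	uint16_t ether_type;
};

struct filter_table {
	struct filter_key keys[MAX_FILTERS];
	size_t count;
};

/* Return the index of key in the table, or -1 if it is not stored. */
static int
filter_lookup(const struct filter_table *t, const struct filter_key *key)
{
	size_t i;

	for (i = 0; i < t->count; i++) {
		if (memcmp(&t->keys[i], key, sizeof(*key)) == 0)
			return (int)i;
	}
	return -1;
}

/* Store key; reject duplicates and a full table. */
static int
filter_insert(struct filter_table *t, const struct filter_key *key)
{
	assert(t != NULL && key != NULL);

	if (filter_lookup(t, key) >= 0)
		return -EEXIST;	/* same filter was added before */
	if (t->count == MAX_FILTERS)
		return -ENOSPC;

	t->keys[t->count++] = *key;
	return 0;
}
```

A hash table makes the lookup cheap for thousands of filters; the
duplicate check itself is the same as in the linear version.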



[dpdk-dev] [PATCH v5 01/17] net/i40e: store ethertype filter

2017-01-03 Thread Beilei Xing
Currently no ethertype filter is stored in SW.
This patch stores ethertype filters in SW with a cuckoo
hash and adds a check for an ethertype filter that
has already been added.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/Makefile  |   1 +
 drivers/net/i40e/i40e_ethdev.c | 166 -
 drivers/net/i40e/i40e_ethdev.h |  31 
 3 files changed, 197 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/Makefile b/drivers/net/i40e/Makefile
index 66997b6..11175c4 100644
--- a/drivers/net/i40e/Makefile
+++ b/drivers/net/i40e/Makefile
@@ -117,5 +117,6 @@ DEPDIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += lib/librte_eal 
lib/librte_ether
 DEPDIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += lib/librte_mempool lib/librte_mbuf
 DEPDIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += lib/librte_net
 DEPDIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += lib/librte_hash
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 8033c35..e43b4d9 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -51,6 +51,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "i40e_logs.h"
 #include "base/i40e_prototype.h"
@@ -461,6 +462,12 @@ static void i40e_set_default_mac_addr(struct rte_eth_dev 
*dev,
 
 static int i40e_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu);
 
+static int i40e_ethertype_filter_convert(
+   const struct rte_eth_ethertype_filter *input,
+   struct i40e_ethertype_filter *filter);
+static int i40e_sw_ethertype_filter_insert(struct i40e_pf *pf,
+  struct i40e_ethertype_filter *filter);
+
 static const struct rte_pci_id pci_id_i40e_map[] = {
{ RTE_PCI_DEVICE(I40E_INTEL_VENDOR_ID, I40E_DEV_ID_SFP_XL710) },
{ RTE_PCI_DEVICE(I40E_INTEL_VENDOR_ID, I40E_DEV_ID_QEMU) },
@@ -938,9 +945,18 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
int ret;
uint32_t len;
uint8_t aq_fail = 0;
+   struct i40e_ethertype_rule *ethertype_rule = &pf->ethertype;
 
PMD_INIT_FUNC_TRACE();
 
+   char ethertype_hash_name[RTE_HASH_NAMESIZE];
+   struct rte_hash_parameters ethertype_hash_params = {
+   .name = ethertype_hash_name,
+   .entries = I40E_MAX_ETHERTYPE_FILTER_NUM,
+   .key_len = sizeof(struct i40e_ethertype_filter_input),
+   .hash_func = rte_hash_crc,
+   };
+
dev->dev_ops = &i40e_eth_dev_ops;
dev->rx_pkt_burst = i40e_recv_pkts;
dev->tx_pkt_burst = i40e_xmit_pkts;
@@ -1180,8 +1196,33 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
pf->flags &= ~I40E_FLAG_DCB;
}
 
+   /* Initialize ethertype filter rule list and hash */
+   TAILQ_INIT(&ethertype_rule->ethertype_list);
+   snprintf(ethertype_hash_name, RTE_HASH_NAMESIZE,
+"ethertype_%s", dev->data->name);
+   ethertype_rule->hash_table = rte_hash_create(&ethertype_hash_params);
+   if (!ethertype_rule->hash_table) {
+   PMD_INIT_LOG(ERR, "Failed to create ethertype hash table!");
+   ret = -EINVAL;
+   goto err_ethertype_hash_table_create;
+   }
+   ethertype_rule->hash_map = rte_zmalloc("i40e_ethertype_hash_map",
+  sizeof(struct i40e_ethertype_filter *) *
+  I40E_MAX_ETHERTYPE_FILTER_NUM,
+  0);
+   if (!ethertype_rule->hash_map) {
+   PMD_INIT_LOG(ERR,
+"Failed to allocate memory for ethertype hash map!");
+   ret = -ENOMEM;
+   goto err_ethertype_hash_map_alloc;
+   }
+
return 0;
 
+err_ethertype_hash_map_alloc:
+   rte_hash_free(ethertype_rule->hash_table);
+err_ethertype_hash_table_create:
+   rte_free(dev->data->mac_addrs);
 err_mac_alloc:
i40e_vsi_release(pf->main_vsi);
 err_setup_pf_switch:
@@ -1204,23 +1245,40 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
 static int
 eth_i40e_dev_uninit(struct rte_eth_dev *dev)
 {
+   struct i40e_pf *pf;
struct rte_pci_device *pci_dev;
struct i40e_hw *hw;
struct i40e_filter_control_settings settings;
+   struct i40e_ethertype_filter *p_ethertype;
int ret;
uint8_t aq_fail = 0;
+   struct i40e_ethertype_rule *ethertype_rule;
 
PMD_INIT_FUNC_TRACE();
 
if (rte_eal_process_type() != RTE_PROC_PRIMARY)
return 0;
 
+   pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
pci_dev = dev->pci_dev;
+   ethertype_rule = &pf->ethertype;
 
if (hw->adapter_stopped == 0)
i40e_dev_close(dev);
 
+   /* Remove all ethertype director rules and hash */
+   if (ethertype_rule->hash_map)
+   rte_free(ethertype_rule->hash_map);
+   if (ether

[dpdk-dev] [PATCH v5 02/17] net/i40e: store tunnel filter

2017-01-03 Thread Beilei Xing
Currently no tunnel filter is stored in SW.
This patch stores tunnel filters in SW with a cuckoo
hash and adds a check for a tunnel filter that has
already been added.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/i40e_ethdev.c | 169 -
 drivers/net/i40e/i40e_ethdev.h |  32 
 2 files changed, 198 insertions(+), 3 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index e43b4d9..2bdb4d6 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -468,6 +468,12 @@ static int i40e_ethertype_filter_convert(
 static int i40e_sw_ethertype_filter_insert(struct i40e_pf *pf,
   struct i40e_ethertype_filter *filter);
 
+static int i40e_tunnel_filter_convert(
+   struct i40e_aqc_add_remove_cloud_filters_element_data *cld_filter,
+   struct i40e_tunnel_filter *tunnel_filter);
+static int i40e_sw_tunnel_filter_insert(struct i40e_pf *pf,
+   struct i40e_tunnel_filter *tunnel_filter);
+
 static const struct rte_pci_id pci_id_i40e_map[] = {
{ RTE_PCI_DEVICE(I40E_INTEL_VENDOR_ID, I40E_DEV_ID_SFP_XL710) },
{ RTE_PCI_DEVICE(I40E_INTEL_VENDOR_ID, I40E_DEV_ID_QEMU) },
@@ -946,6 +952,7 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
uint32_t len;
uint8_t aq_fail = 0;
struct i40e_ethertype_rule *ethertype_rule = &pf->ethertype;
+   struct i40e_tunnel_rule *tunnel_rule = &pf->tunnel;
 
PMD_INIT_FUNC_TRACE();
 
@@ -957,6 +964,14 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
.hash_func = rte_hash_crc,
};
 
+   char tunnel_hash_name[RTE_HASH_NAMESIZE];
+   struct rte_hash_parameters tunnel_hash_params = {
+   .name = tunnel_hash_name,
+   .entries = I40E_MAX_TUNNEL_FILTER_NUM,
+   .key_len = sizeof(struct i40e_tunnel_filter_input),
+   .hash_func = rte_hash_crc,
+   };
+
dev->dev_ops = &i40e_eth_dev_ops;
dev->rx_pkt_burst = i40e_recv_pkts;
dev->tx_pkt_burst = i40e_xmit_pkts;
@@ -1217,8 +1232,33 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
goto err_ethertype_hash_map_alloc;
}
 
+   /* Initialize tunnel filter rule list and hash */
+   TAILQ_INIT(&tunnel_rule->tunnel_list);
+   snprintf(tunnel_hash_name, RTE_HASH_NAMESIZE,
+"tunnel_%s", dev->data->name);
+   tunnel_rule->hash_table = rte_hash_create(&tunnel_hash_params);
+   if (!tunnel_rule->hash_table) {
+   PMD_INIT_LOG(ERR, "Failed to create tunnel hash table!");
+   ret = -EINVAL;
+   goto err_tunnel_hash_table_create;
+   }
+   tunnel_rule->hash_map = rte_zmalloc("i40e_tunnel_hash_map",
+   sizeof(struct i40e_tunnel_filter *) *
+   I40E_MAX_TUNNEL_FILTER_NUM,
+   0);
+   if (!tunnel_rule->hash_map) {
+   PMD_INIT_LOG(ERR,
+"Failed to allocate memory for tunnel hash map!");
+   ret = -ENOMEM;
+   goto err_tunnel_hash_map_alloc;
+   }
+
return 0;
 
+err_tunnel_hash_map_alloc:
+   rte_hash_free(tunnel_rule->hash_table);
+err_tunnel_hash_table_create:
+   rte_free(ethertype_rule->hash_map);
 err_ethertype_hash_map_alloc:
rte_hash_free(ethertype_rule->hash_table);
 err_ethertype_hash_table_create:
@@ -1250,9 +1290,11 @@ eth_i40e_dev_uninit(struct rte_eth_dev *dev)
struct i40e_hw *hw;
struct i40e_filter_control_settings settings;
struct i40e_ethertype_filter *p_ethertype;
+   struct i40e_tunnel_filter *p_tunnel;
int ret;
uint8_t aq_fail = 0;
struct i40e_ethertype_rule *ethertype_rule;
+   struct i40e_tunnel_rule *tunnel_rule;
 
PMD_INIT_FUNC_TRACE();
 
@@ -1263,6 +1305,7 @@ eth_i40e_dev_uninit(struct rte_eth_dev *dev)
hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
pci_dev = dev->pci_dev;
ethertype_rule = &pf->ethertype;
+   tunnel_rule = &pf->tunnel;
 
if (hw->adapter_stopped == 0)
i40e_dev_close(dev);
@@ -1279,6 +1322,17 @@ eth_i40e_dev_uninit(struct rte_eth_dev *dev)
rte_free(p_ethertype);
}
 
+   /* Remove all tunnel director rules and hash */
+   if (tunnel_rule->hash_map)
+   rte_free(tunnel_rule->hash_map);
+   if (tunnel_rule->hash_table)
+   rte_hash_free(tunnel_rule->hash_table);
+
+   while ((p_tunnel = TAILQ_FIRST(&tunnel_rule->tunnel_list))) {
+   TAILQ_REMOVE(&tunnel_rule->tunnel_list, p_tunnel, rules);
+   rte_free(p_tunnel);
+   }
+
dev->dev_ops = NULL;
dev->rx_pkt_burst = NULL;
dev->tx_pkt_burst = NULL;
@@ -6478,6 +6532,85 @@ i40e_dev_get_filter_type(uint16_t filter_type, uint16_t 
*flag)
return 0;
 }
 
+/* 

[dpdk-dev] [PATCH v5 04/17] net/i40e: restore ethertype filter

2017-01-03 Thread Beilei Xing
Add support for restoring ethertype filters in case a filter is
dropped accidentally, since with the generic filter API all filters
need to be added and removed by the user.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/i40e_ethdev.c | 44 ++
 1 file changed, 44 insertions(+)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index fb7d794..189d110 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -474,6 +474,9 @@ static int i40e_tunnel_filter_convert(
 static int i40e_sw_tunnel_filter_insert(struct i40e_pf *pf,
struct i40e_tunnel_filter *tunnel_filter);
 
+static void i40e_ethertype_filter_restore(struct i40e_pf *pf);
+static void i40e_filter_restore(struct i40e_pf *pf);
+
 static const struct rte_pci_id pci_id_i40e_map[] = {
{ RTE_PCI_DEVICE(I40E_INTEL_VENDOR_ID, I40E_DEV_ID_SFP_XL710) },
{ RTE_PCI_DEVICE(I40E_INTEL_VENDOR_ID, I40E_DEV_ID_QEMU) },
@@ -1955,6 +1958,8 @@ i40e_dev_start(struct rte_eth_dev *dev)
/* enable uio intr after callback register */
rte_intr_enable(intr_handle);
 
+   i40e_filter_restore(pf);
+
return I40E_SUCCESS;
 
 err_up:
@@ -10071,3 +10076,42 @@ i40e_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
 
return ret;
 }
+
+/* Restore ethertype filter */
+static void
+i40e_ethertype_filter_restore(struct i40e_pf *pf)
+{
+   struct i40e_hw *hw = I40E_PF_TO_HW(pf);
+   struct i40e_ethertype_filter_list
+   *ethertype_list = &pf->ethertype.ethertype_list;
+   struct i40e_ethertype_filter *f;
+   struct i40e_control_filter_stats stats;
+   uint16_t flags;
+
+   TAILQ_FOREACH(f, ethertype_list, rules) {
+   flags = 0;
+   if (!(f->flags & RTE_ETHTYPE_FLAGS_MAC))
+   flags |= I40E_AQC_ADD_CONTROL_PACKET_FLAGS_IGNORE_MAC;
+   if (f->flags & RTE_ETHTYPE_FLAGS_DROP)
+   flags |= I40E_AQC_ADD_CONTROL_PACKET_FLAGS_DROP;
+   flags |= I40E_AQC_ADD_CONTROL_PACKET_FLAGS_TO_QUEUE;
+
+   memset(&stats, 0, sizeof(stats));
+   i40e_aq_add_rem_control_packet_filter(hw,
+   f->input.mac_addr.addr_bytes,
+   f->input.ether_type,
+   flags, pf->main_vsi->seid,
+   f->queue, 1, &stats, NULL);
+   }
+   PMD_DRV_LOG(INFO, "Ethertype filter:"
+   " mac_etype_used = %u, etype_used = %u,"
+   " mac_etype_free = %u, etype_free = %u\n",
+   stats.mac_etype_used, stats.etype_used,
+   stats.mac_etype_free, stats.etype_free);
+}
+
+static void
+i40e_filter_restore(struct i40e_pf *pf)
+{
+   i40e_ethertype_filter_restore(pf);
+}
-- 
2.5.5
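
The flag translation done inside the restore loop above can be read as
a small pure function. This is a minimal sketch, not the driver code:
the SW_FLAG_* and HW_FLAG_* values are stand-ins chosen for this sketch
(the real RTE_ETHTYPE_FLAGS_* and I40E_AQC_ADD_CONTROL_PACKET_FLAGS_*
definitions are not shown in this patch), and ethertype_hw_flags() is a
hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in values, chosen for this sketch only. */
#define SW_FLAG_MAC		0x0001u	/* filter also matches on MAC */
#define SW_FLAG_DROP		0x0002u	/* drop matching packets */
#define HW_FLAG_IGNORE_MAC	0x0001u
#define HW_FLAG_DROP		0x0002u
#define HW_FLAG_TO_QUEUE	0x0004u

/* Translate the stored software flags into admin queue flags. */
static uint16_t
ethertype_hw_flags(uint16_t sw_flags)
{
	uint16_t hw_flags = HW_FLAG_TO_QUEUE;	/* always set, as in the patch */

	/* Sanity check for this sketch: only known flags are accepted. */
	assert((sw_flags & ~(SW_FLAG_MAC | SW_FLAG_DROP)) == 0);

	if (!(sw_flags & SW_FLAG_MAC))
		hw_flags |= HW_FLAG_IGNORE_MAC;	/* no MAC match requested */
	if (sw_flags & SW_FLAG_DROP)
		hw_flags |= HW_FLAG_DROP;

	return hw_flags;
}
```

Note the inversion: the software flag says "match on MAC", while the
hardware flag says "ignore MAC", so the first test is negated.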



[dpdk-dev] [PATCH v5 03/17] net/i40e: store flow director filter

2017-01-03 Thread Beilei Xing
Currently no flow director filter is stored in SW. This
patch stores flow director filters in SW with a cuckoo hash
and adds a check for a flow director filter that has already been added.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/i40e_ethdev.c |  48 +++
 drivers/net/i40e/i40e_ethdev.h |  14 ++
 drivers/net/i40e/i40e_fdir.c   | 105 +
 3 files changed, 167 insertions(+)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 2bdb4d6..fb7d794 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -953,6 +953,7 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
uint8_t aq_fail = 0;
struct i40e_ethertype_rule *ethertype_rule = &pf->ethertype;
struct i40e_tunnel_rule *tunnel_rule = &pf->tunnel;
+   struct i40e_fdir_info *fdir_info = &pf->fdir;
 
PMD_INIT_FUNC_TRACE();
 
@@ -972,6 +973,14 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
.hash_func = rte_hash_crc,
};
 
+   char fdir_hash_name[RTE_HASH_NAMESIZE];
+   struct rte_hash_parameters fdir_hash_params = {
+   .name = fdir_hash_name,
+   .entries = I40E_MAX_FDIR_FILTER_NUM,
+   .key_len = sizeof(struct rte_eth_fdir_input),
+   .hash_func = rte_hash_crc,
+   };
+
dev->dev_ops = &i40e_eth_dev_ops;
dev->rx_pkt_burst = i40e_recv_pkts;
dev->tx_pkt_burst = i40e_xmit_pkts;
@@ -1253,8 +1262,33 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
goto err_tunnel_hash_map_alloc;
}
 
+   /* Initialize flow director filter rule list and hash */
+   TAILQ_INIT(&fdir_info->fdir_list);
+   snprintf(fdir_hash_name, RTE_HASH_NAMESIZE,
+"fdir_%s", dev->data->name);
+   fdir_info->hash_table = rte_hash_create(&fdir_hash_params);
+   if (!fdir_info->hash_table) {
+   PMD_INIT_LOG(ERR, "Failed to create fdir hash table!");
+   ret = -EINVAL;
+   goto err_fdir_hash_table_create;
+   }
+   fdir_info->hash_map = rte_zmalloc("i40e_fdir_hash_map",
+ sizeof(struct i40e_fdir_filter *) *
+ I40E_MAX_FDIR_FILTER_NUM,
+ 0);
+   if (!fdir_info->hash_map) {
+   PMD_INIT_LOG(ERR,
+"Failed to allocate memory for fdir hash map!");
+   ret = -ENOMEM;
+   goto err_fdir_hash_map_alloc;
+   }
+
return 0;
 
+err_fdir_hash_map_alloc:
+   rte_hash_free(fdir_info->hash_table);
+err_fdir_hash_table_create:
+   rte_free(tunnel_rule->hash_map);
 err_tunnel_hash_map_alloc:
rte_hash_free(tunnel_rule->hash_table);
 err_tunnel_hash_table_create:
@@ -1291,10 +1325,12 @@ eth_i40e_dev_uninit(struct rte_eth_dev *dev)
struct i40e_filter_control_settings settings;
struct i40e_ethertype_filter *p_ethertype;
struct i40e_tunnel_filter *p_tunnel;
+   struct i40e_fdir_filter *p_fdir;
int ret;
uint8_t aq_fail = 0;
struct i40e_ethertype_rule *ethertype_rule;
struct i40e_tunnel_rule *tunnel_rule;
+   struct i40e_fdir_info *fdir_info;
 
PMD_INIT_FUNC_TRACE();
 
@@ -1306,6 +1342,7 @@ eth_i40e_dev_uninit(struct rte_eth_dev *dev)
pci_dev = dev->pci_dev;
ethertype_rule = &pf->ethertype;
tunnel_rule = &pf->tunnel;
+   fdir_info = &pf->fdir;
 
if (hw->adapter_stopped == 0)
i40e_dev_close(dev);
@@ -1333,6 +1370,17 @@ eth_i40e_dev_uninit(struct rte_eth_dev *dev)
rte_free(p_tunnel);
}
 
+   /* Remove all flow director rules and hash */
+   if (fdir_info->hash_map)
+   rte_free(fdir_info->hash_map);
+   if (fdir_info->hash_table)
+   rte_hash_free(fdir_info->hash_table);
+
+   while ((p_fdir = TAILQ_FIRST(&fdir_info->fdir_list))) {
+   TAILQ_REMOVE(&fdir_info->fdir_list, p_fdir, rules);
+   rte_free(p_fdir);
+   }
+
dev->dev_ops = NULL;
dev->rx_pkt_burst = NULL;
dev->tx_pkt_burst = NULL;
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index 83f3594..b79fbd6 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -377,6 +377,14 @@ struct i40e_fdir_flex_mask {
 };
 
 #define I40E_FILTER_PCTYPE_MAX 64
+#define I40E_MAX_FDIR_FILTER_NUM (1024 * 8)
+
+struct i40e_fdir_filter {
+   TAILQ_ENTRY(i40e_fdir_filter) rules;
+   struct rte_eth_fdir_filter fdir;
+};
+
+TAILQ_HEAD(i40e_fdir_filter_list, i40e_fdir_filter);
 /*
  *  A structure used to define fields of a FDIR related info.
  */
@@ -395,6 +403,10 @@ struct i40e_fdir_info {
 */
struct i40e_fdir_flex_pit flex_set[I40E_MAX_FLXPLD_LAYER * 
I40E_MAX_FLXPLD_FIED];
struct i40e_fdir_flex_mask flex_mask[I40E_FIL

[dpdk-dev] [PATCH v5 07/17] net/i40e: add flow validate function

2017-01-03 Thread Beilei Xing
This patch adds i40e_flow_validation function to check if
a flow is valid according to the flow pattern.
i40e_parse_ethertype_filter is added first, it also gets
the ethertype info.
i40e_flow.c is added to handle all generic filter events.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/Makefile  |   1 +
 drivers/net/i40e/i40e_ethdev.c |   7 +
 drivers/net/i40e/i40e_ethdev.h |  18 ++
 drivers/net/i40e/i40e_flow.c   | 447 +
 4 files changed, 473 insertions(+)
 create mode 100644 drivers/net/i40e/i40e_flow.c

diff --git a/drivers/net/i40e/Makefile b/drivers/net/i40e/Makefile
index 11175c4..89bd85a 100644
--- a/drivers/net/i40e/Makefile
+++ b/drivers/net/i40e/Makefile
@@ -105,6 +105,7 @@ endif
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_ethdev_vf.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_pf.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_fdir.c
+SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_flow.c
 
 # vector PMD driver needs SSE4.1 support
 ifeq ($(findstring RTE_MACHINE_CPUFLAG_SSE4_1,$(CFLAGS)),)
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 153322a..edfd52b 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -8426,6 +8426,8 @@ i40e_ethertype_filter_handle(struct rte_eth_dev *dev,
return ret;
 }
 
+const struct rte_flow_ops i40e_flow_ops;
+
 static int
 i40e_dev_filter_ctrl(struct rte_eth_dev *dev,
 enum rte_filter_type filter_type,
@@ -8457,6 +8459,11 @@ i40e_dev_filter_ctrl(struct rte_eth_dev *dev,
case RTE_ETH_FILTER_FDIR:
ret = i40e_fdir_ctrl_func(dev, filter_op, arg);
break;
+   case RTE_ETH_FILTER_GENERIC:
+   if (filter_op != RTE_ETH_FILTER_GET)
+   return -EINVAL;
+   *(const void **)arg = &i40e_flow_ops;
+   break;
default:
PMD_DRV_LOG(WARNING, "Filter type (%d) not supported",
filter_type);
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index 92f6f55..23f360b 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define I40E_VLAN_TAG_SIZE4
 
@@ -629,6 +630,23 @@ struct i40e_adapter {
struct rte_timecounter tx_tstamp_tc;
 };
 
+union i40e_filter_t {
+   struct rte_eth_ethertype_filter ethertype_filter;
+   struct rte_eth_fdir_filter fdir_filter;
+   struct rte_eth_tunnel_filter_conf tunnel_filter;
+};
+
+typedef int (*parse_filter_t)(struct rte_eth_dev *dev,
+ const struct rte_flow_attr *attr,
+ const struct rte_flow_item pattern[],
+ const struct rte_flow_action actions[],
+ struct rte_flow_error *error,
+ union i40e_filter_t *filter);
+struct i40e_valid_pattern {
+   enum rte_flow_item_type *items;
+   parse_filter_t parse_filter;
+};
+
 int i40e_dev_switch_queues(struct i40e_pf *pf, bool on);
 int i40e_vsi_release(struct i40e_vsi *vsi);
 struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf,
diff --git a/drivers/net/i40e/i40e_flow.c b/drivers/net/i40e/i40e_flow.c
new file mode 100644
index 000..a9ff73f
--- /dev/null
+++ b/drivers/net/i40e/i40e_flow.c
@@ -0,0 +1,447 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) 2016 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (I

[dpdk-dev] [PATCH v5 06/17] net/i40e: restore flow director filter

2017-01-03 Thread Beilei Xing
Add support of restoring flow director filter.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/i40e_ethdev.c |  1 +
 drivers/net/i40e/i40e_ethdev.h |  1 +
 drivers/net/i40e/i40e_fdir.c   | 31 +++
 3 files changed, 33 insertions(+)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 67e1b37..153322a 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -10135,4 +10135,5 @@ i40e_filter_restore(struct i40e_pf *pf)
 {
i40e_ethertype_filter_restore(pf);
i40e_tunnel_filter_restore(pf);
+   i40e_fdir_filter_restore(pf);
 }
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index b79fbd6..92f6f55 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -670,6 +670,7 @@ int i40e_fdir_ctrl_func(struct rte_eth_dev *dev,
 int i40e_select_filter_input_set(struct i40e_hw *hw,
 struct rte_eth_input_set_conf *conf,
 enum rte_filter_type filter);
+void i40e_fdir_filter_restore(struct i40e_pf *pf);
 int i40e_hash_filter_inset_select(struct i40e_hw *hw,
 struct rte_eth_input_set_conf *conf);
 int i40e_fdir_filter_inset_select(struct i40e_pf *pf,
diff --git a/drivers/net/i40e/i40e_fdir.c b/drivers/net/i40e/i40e_fdir.c
index 4a29b37..f89dbc9 100644
--- a/drivers/net/i40e/i40e_fdir.c
+++ b/drivers/net/i40e/i40e_fdir.c
@@ -1586,3 +1586,34 @@ i40e_fdir_ctrl_func(struct rte_eth_dev *dev,
}
return ret;
 }
+
+/* Restore flow director filter */
+void
+i40e_fdir_filter_restore(struct i40e_pf *pf)
+{
+   struct rte_eth_dev *dev = I40E_VSI_TO_ETH_DEV(pf->main_vsi);
+   struct i40e_fdir_filter_list *fdir_list = &pf->fdir.fdir_list;
+   struct i40e_fdir_filter *f;
+#ifdef RTE_LIBRTE_I40E_DEBUG_DRIVER
+   struct i40e_hw *hw = I40E_PF_TO_HW(pf);
+   uint32_t fdstat;
+   uint32_t guarant_cnt;  /**< Number of filters in guaranteed spaces. */
+   uint32_t best_cnt; /**< Number of filters in best effort spaces. */
+#endif /* RTE_LIBRTE_I40E_DEBUG_DRIVER */
+
+   TAILQ_FOREACH(f, fdir_list, rules)
+   i40e_add_del_fdir_filter(dev, &f->fdir, TRUE);
+
+#ifdef RTE_LIBRTE_I40E_DEBUG_DRIVER
+   fdstat = I40E_READ_REG(hw, I40E_PFQF_FDSTAT);
+   guarant_cnt =
+   (uint32_t)((fdstat & I40E_PFQF_FDSTAT_GUARANT_CNT_MASK) >>
+  I40E_PFQF_FDSTAT_GUARANT_CNT_SHIFT);
+   best_cnt =
+   (uint32_t)((fdstat & I40E_PFQF_FDSTAT_BEST_CNT_MASK) >>
+  I40E_PFQF_FDSTAT_BEST_CNT_SHIFT);
+#endif /* RTE_LIBRTE_I40E_DEBUG_DRIVER */
+
+   PMD_DRV_LOG(INFO, "FDIR: Guarant count: %d,  Best count: %d\n",
+   guarant_cnt, best_cnt);
+}
-- 
2.5.5
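
The debug block above reads two counters out of one status register
with a mask and a shift. A minimal sketch of that extraction, under
stated assumptions: the register layout below (13-bit fields at bit 0
and bit 16) is hypothetical and only stands in for the
I40E_PFQF_FDSTAT_* definitions, and reg_field() is a made-up helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout for this sketch. */
#define STAT_GUARANT_CNT_SHIFT	0
#define STAT_GUARANT_CNT_MASK	(0x1FFFu << STAT_GUARANT_CNT_SHIFT)
#define STAT_BEST_CNT_SHIFT	16
#define STAT_BEST_CNT_MASK	(0x1FFFu << STAT_BEST_CNT_SHIFT)

/* Isolate one field: clear the other bits, then move it to bit 0. */
static uint32_t
reg_field(uint32_t reg, uint32_t mask, unsigned int shift)
{
	assert(shift < 32);
	return (reg & mask) >> shift;
}
```

Masking before shifting matters for the upper field: bits above the
field would otherwise end up in the result.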



[dpdk-dev] [PATCH v5 05/17] net/i40e: restore tunnel filter

2017-01-03 Thread Beilei Xing
Add support of restoring tunnel filter.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/i40e_ethdev.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 189d110..67e1b37 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -475,6 +475,7 @@ static int i40e_sw_tunnel_filter_insert(struct i40e_pf *pf,
struct i40e_tunnel_filter *tunnel_filter);
 
 static void i40e_ethertype_filter_restore(struct i40e_pf *pf);
+static void i40e_tunnel_filter_restore(struct i40e_pf *pf);
 static void i40e_filter_restore(struct i40e_pf *pf);
 
 static const struct rte_pci_id pci_id_i40e_map[] = {
@@ -10110,8 +10111,28 @@ i40e_ethertype_filter_restore(struct i40e_pf *pf)
stats.mac_etype_free, stats.etype_free);
 }
 
+/* Restore tunnel filter */
+static void
+i40e_tunnel_filter_restore(struct i40e_pf *pf)
+{
+   struct i40e_hw *hw = I40E_PF_TO_HW(pf);
+   struct i40e_vsi *vsi = pf->main_vsi;
+   struct i40e_tunnel_filter_list
+   *tunnel_list = &pf->tunnel.tunnel_list;
+   struct i40e_tunnel_filter *f;
+   struct i40e_aqc_add_remove_cloud_filters_element_data cld_filter;
+
+   TAILQ_FOREACH(f, tunnel_list, rules) {
+   memset(&cld_filter, 0, sizeof(cld_filter));
+   rte_memcpy(&cld_filter, &f->input, sizeof(f->input));
+   cld_filter.queue_number = f->queue;
+   i40e_aq_add_cloud_filters(hw, vsi->seid, &cld_filter, 1);
+   }
+}
+
 static void
 i40e_filter_restore(struct i40e_pf *pf)
 {
i40e_ethertype_filter_restore(pf);
+   i40e_tunnel_filter_restore(pf);
 }
-- 
2.5.5
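
The restore step above walks the software list and programs each
stored filter again. A minimal sketch of that replay loop, assuming a
system that provides <sys/queue.h>: struct tunnel_filter here is a cut
down, hypothetical version of the driver structure, and program_fn,
record_queue() and tunnel_filters_replay() are made-up names. Unlike
the patch, the sketch counts how many filters were programmed
successfully instead of ignoring the return value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

struct tunnel_filter {
	TAILQ_ENTRY(tunnel_filter) rules;
	uint32_t tenant_id;	/* hypothetical match field */
	uint16_t queue;		/* destination queue */
};

TAILQ_HEAD(tunnel_filter_list, tunnel_filter);

/* Hypothetical callback that programs one filter into hardware. */
typedef int (*program_fn)(const struct tunnel_filter *f, void *ctx);

/* Example callback: adds up the queue index of each replayed filter. */
static int
record_queue(const struct tunnel_filter *f, void *ctx)
{
	uint32_t *sum = ctx;

	*sum += f->queue;
	return 0;
}

/* Replay every stored filter; return how many were programmed. */
static unsigned int
tunnel_filters_replay(struct tunnel_filter_list *list,
		      program_fn program, void *ctx)
{
	struct tunnel_filter *f;
	unsigned int ok = 0;

	assert(list != NULL && program != NULL);

	TAILQ_FOREACH(f, list, rules) {
		if (program(f, ctx) == 0)
			ok++;
	}
	return ok;
}
```

Comparing the returned count with the number of stored filters is one
way a caller could notice that a filter failed to restore.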



[dpdk-dev] [PATCH v5 08/17] net/i40e: parse flow director filter

2017-01-03 Thread Beilei Xing
This patch adds i40e_parse_fdir_filter to check if a rule
is a flow director rule according to the flow pattern,
and the function also gets the flow director info.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/i40e_ethdev.c |  56 +---
 drivers/net/i40e/i40e_ethdev.h |  55 
 drivers/net/i40e/i40e_flow.c   | 607 +
 3 files changed, 663 insertions(+), 55 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index edfd52b..bcf28cf 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -139,60 +139,6 @@
 #define I40E_DEFAULT_DCB_APP_NUM1
 #define I40E_DEFAULT_DCB_APP_PRIO   3
 
-#define I40E_INSET_NONE0x0ULL
-
-/* bit0 ~ bit 7 */
-#define I40E_INSET_DMAC0x0001ULL
-#define I40E_INSET_SMAC0x0002ULL
-#define I40E_INSET_VLAN_OUTER  0x0004ULL
-#define I40E_INSET_VLAN_INNER  0x0008ULL
-#define I40E_INSET_VLAN_TUNNEL 0x0010ULL
-
-/* bit 8 ~ bit 15 */
-#define I40E_INSET_IPV4_SRC0x0100ULL
-#define I40E_INSET_IPV4_DST0x0200ULL
-#define I40E_INSET_IPV6_SRC0x0400ULL
-#define I40E_INSET_IPV6_DST0x0800ULL
-#define I40E_INSET_SRC_PORT0x1000ULL
-#define I40E_INSET_DST_PORT0x2000ULL
-#define I40E_INSET_SCTP_VT 0x4000ULL
-
-/* bit 16 ~ bit 31 */
-#define I40E_INSET_IPV4_TOS0x0001ULL
-#define I40E_INSET_IPV4_PROTO  0x0002ULL
-#define I40E_INSET_IPV4_TTL0x0004ULL
-#define I40E_INSET_IPV6_TC 0x0008ULL
-#define I40E_INSET_IPV6_FLOW   0x0010ULL
-#define I40E_INSET_IPV6_NEXT_HDR   0x0020ULL
-#define I40E_INSET_IPV6_HOP_LIMIT  0x0040ULL
-#define I40E_INSET_TCP_FLAGS   0x0080ULL
-
-/* bit 32 ~ bit 47, tunnel fields */
-#define I40E_INSET_TUNNEL_IPV4_DST   0x0001ULL
-#define I40E_INSET_TUNNEL_IPV6_DST   0x0002ULL
-#define I40E_INSET_TUNNEL_DMAC   0x0004ULL
-#define I40E_INSET_TUNNEL_SRC_PORT   0x0008ULL
-#define I40E_INSET_TUNNEL_DST_PORT   0x0010ULL
-#define I40E_INSET_TUNNEL_ID 0x0020ULL
-
-/* bit 48 ~ bit 55 */
-#define I40E_INSET_LAST_ETHER_TYPE 0x0001ULL
-
-/* bit 56 ~ bit 63, Flex Payload */
-#define I40E_INSET_FLEX_PAYLOAD_W1 0x0100ULL
-#define I40E_INSET_FLEX_PAYLOAD_W2 0x0200ULL
-#define I40E_INSET_FLEX_PAYLOAD_W3 0x0400ULL
-#define I40E_INSET_FLEX_PAYLOAD_W4 0x0800ULL
-#define I40E_INSET_FLEX_PAYLOAD_W5 0x1000ULL
-#define I40E_INSET_FLEX_PAYLOAD_W6 0x2000ULL
-#define I40E_INSET_FLEX_PAYLOAD_W7 0x4000ULL
-#define I40E_INSET_FLEX_PAYLOAD_W8 0x8000ULL
-#define I40E_INSET_FLEX_PAYLOAD \
-   (I40E_INSET_FLEX_PAYLOAD_W1 | I40E_INSET_FLEX_PAYLOAD_W2 | \
-   I40E_INSET_FLEX_PAYLOAD_W3 | I40E_INSET_FLEX_PAYLOAD_W4 | \
-   I40E_INSET_FLEX_PAYLOAD_W5 | I40E_INSET_FLEX_PAYLOAD_W6 | \
-   I40E_INSET_FLEX_PAYLOAD_W7 | I40E_INSET_FLEX_PAYLOAD_W8)
-
 /**
  * Below are values for writing un-exposed registers suggested
  * by silicon experts
@@ -7617,7 +7563,7 @@ i40e_validate_input_set(enum i40e_filter_pctype pctype,
 }
 
 /* default input set fields combination per pctype */
-static uint64_t
+uint64_t
 i40e_get_default_input_set(uint16_t pctype)
 {
static const uint64_t default_inset_table[] = {
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index 23f360b..9e3a48d 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -190,6 +190,60 @@ enum i40e_flxpld_layer_idx {
 #define FLOATING_VEB_SUPPORTED_FW_MAJ 5
 #define FLOATING_VEB_SUPPORTED_FW_MIN 0
 
+#define I40E_INSET_NONE0x0ULL
+
+/* bit0 ~ bit 7 */
+#define I40E_INSET_DMAC0x0001ULL
+#define I40E_INSET_SMAC0x0002ULL
+#define I40E_INSET_VLAN_OUTER  0x0004ULL
+#define I40E_INSET_VLAN_INNER  0x0008ULL
+#define I40E_INSET_VLAN_TUNNEL 0x0010ULL
+
+/* bit 8 ~ bit 15 */
+#define I40E_INSET_IPV4_SRC0x0100ULL
+#define I40E_INSET_IPV4_DST0x0200ULL
+#define I40E_INSET_IPV6_SRC0x0400ULL
+#define I40E_INSET_IPV6_DST0x0800ULL
+#define I40E_INSET_SRC_PORT0x1000ULL
+#define I40E_INSET_DST_PORT0x2000ULL
+#define I40E_INSET_SCTP_VT 0x4000ULL
+
+/* bit 16 ~ bit 31 */
+#define I40E_INSET_IPV4_TOS0x0001ULL
+#define I40E_INSET_IPV4_PROTO  0x0002ULL
+#define I40E_INSET_IPV4_TTL0x0004ULL
+#define I40E_INS

[dpdk-dev] [PATCH v5 10/17] net/i40e: add flow create function

2017-01-03 Thread Beilei Xing
This patch adds the i40e_flow_create function to create a
rule. It checks whether a flow matches an ethertype filter,
a flow director filter, or a tunnel filter; if the flow
matches one of them, the filter is programmed into HW.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/i40e_ethdev.c | 16 ++---
 drivers/net/i40e/i40e_ethdev.h | 21 
 drivers/net/i40e/i40e_fdir.c   |  2 +-
 drivers/net/i40e/i40e_flow.c   | 77 ++
 4 files changed, 110 insertions(+), 6 deletions(-)
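
As described above, creating a flow programs the matching filter into HW and records the flow. The sketch below mirrors the three fields of the `i40e_flow` bookkeeping struct added by this patch (list node, filter type, rule pointer), but every `demo_*` name is hypothetical and the hardware call is a stub.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical filter kinds; the driver uses enum rte_filter_type. */
enum demo_filter_type {
	DEMO_FILTER_NONE,
	DEMO_FILTER_ETHERTYPE,
	DEMO_FILTER_FDIR,
	DEMO_FILTER_TUNNEL,
};

/* Bookkeeping record for one created flow. */
struct demo_flow {
	TAILQ_ENTRY(demo_flow) node;
	enum demo_filter_type filter_type;
	void *rule;
};
TAILQ_HEAD(demo_flow_list, demo_flow);

/* Stub for programming a filter into hardware; rejects unknown types. */
static int demo_hw_set(enum demo_filter_type type, void *rule)
{
	(void)rule;
	return type == DEMO_FILTER_NONE ? -1 : 0;
}

/* Program the filter, then remember the flow; NULL on any failure. */
static struct demo_flow *demo_flow_create(struct demo_flow_list *list,
					  enum demo_filter_type type,
					  void *rule)
{
	struct demo_flow *flow = calloc(1, sizeof(*flow));

	if (flow == NULL)
		return NULL;
	if (demo_hw_set(type, rule) != 0) {
		free(flow);
		return NULL;
	}
	flow->filter_type = type;
	flow->rule = rule;
	TAILQ_INSERT_TAIL(list, flow, node);
	return flow;
}
```

Keeping the filter type in the record is what lets a later destroy or flush decide which removal routine to call.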

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index bcf28cf..bbc43dc 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -353,9 +353,6 @@ static int i40e_dev_udp_tunnel_port_add(struct rte_eth_dev *dev,
 static int i40e_dev_udp_tunnel_port_del(struct rte_eth_dev *dev,
struct rte_eth_udp_tunnel *udp_tunnel);
 static void i40e_filter_input_set_init(struct i40e_pf *pf);
-static int i40e_ethertype_filter_set(struct i40e_pf *pf,
-   struct rte_eth_ethertype_filter *filter,
-   bool add);
 static int i40e_ethertype_filter_handle(struct rte_eth_dev *dev,
enum rte_filter_op filter_op,
void *arg);
@@ -1233,6 +1230,8 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
goto err_fdir_hash_map_alloc;
}
 
+   TAILQ_INIT(&pf->flow_list);
+
return 0;
 
 err_fdir_hash_map_alloc:
@@ -1273,6 +1272,7 @@ eth_i40e_dev_uninit(struct rte_eth_dev *dev)
struct rte_pci_device *pci_dev;
struct i40e_hw *hw;
struct i40e_filter_control_settings settings;
+   struct i40e_flow *p_flow;
struct i40e_ethertype_filter *p_ethertype;
struct i40e_tunnel_filter *p_tunnel;
struct i40e_fdir_filter *p_fdir;
@@ -1297,6 +1297,12 @@ eth_i40e_dev_uninit(struct rte_eth_dev *dev)
if (hw->adapter_stopped == 0)
i40e_dev_close(dev);
 
+   /* Remove all flows */
+   while ((p_flow = TAILQ_FIRST(&pf->flow_list))) {
+   TAILQ_REMOVE(&pf->flow_list, p_flow, node);
+   rte_free(p_flow);
+   }
+
/* Remove all ethertype director rules and hash */
if (ethertype_rule->hash_map)
rte_free(ethertype_rule->hash_map);
@@ -6611,7 +6617,7 @@ i40e_sw_tunnel_filter_del(struct i40e_pf *pf,
return 0;
 }
 
-static int
+int
 i40e_dev_tunnel_filter_set(struct i40e_pf *pf,
struct rte_eth_tunnel_filter_conf *tunnel_filter,
uint8_t add)
@@ -8256,7 +8262,7 @@ i40e_sw_ethertype_filter_del(struct i40e_pf *pf,
  * Configure ethertype filter, which can director packet by filtering
  * with mac address and ether_type or only ether_type
  */
-static int
+int
 i40e_ethertype_filter_set(struct i40e_pf *pf,
struct rte_eth_ethertype_filter *filter,
bool add)
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index 9e3a48d..b33910d 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -536,6 +536,17 @@ struct i40e_mirror_rule {
 TAILQ_HEAD(i40e_mirror_rule_list, i40e_mirror_rule);
 
 /*
+ * Struct to store flow created.
+ */
+struct i40e_flow {
+   TAILQ_ENTRY(i40e_flow) node;
+   enum rte_filter_type filter_type;
+   void *rule;
+};
+
+TAILQ_HEAD(i40e_flow_list, i40e_flow);
+
+/*
  * Structure to store private data specific for PF instance.
  */
 struct i40e_pf {
@@ -592,6 +603,7 @@ struct i40e_pf {
bool floating_veb; /* The flag to use the floating VEB */
/* The floating enable flag for the specific VF */
bool floating_veb_list[I40E_MAX_VF];
+   struct i40e_flow_list flow_list;
 };
 
 enum pending_msg {
@@ -767,6 +779,15 @@ i40e_sw_tunnel_filter_lookup(struct i40e_tunnel_rule *tunnel_rule,
 int i40e_sw_tunnel_filter_del(struct i40e_pf *pf,
  struct i40e_tunnel_filter_input *input);
 uint64_t i40e_get_default_input_set(uint16_t pctype);
+int i40e_ethertype_filter_set(struct i40e_pf *pf,
+ struct rte_eth_ethertype_filter *filter,
+ bool add);
+int i40e_add_del_fdir_filter(struct rte_eth_dev *dev,
+const struct rte_eth_fdir_filter *filter,
+bool add);
+int i40e_dev_tunnel_filter_set(struct i40e_pf *pf,
+  struct rte_eth_tunnel_filter_conf *tunnel_filter,
+  uint8_t add);
 
 /* I40E_DEV_PRIVATE_TO */
 #define I40E_DEV_PRIVATE_TO_PF(adapter) \
diff --git a/drivers/net/i40e/i40e_fdir.c b/drivers/net/i40e/i40e_fdir.c
index f89dbc9..91d91aa 100644
--- a/drivers/net/i40e/i40e_fdir.c
+++ b/drivers/net/i40e/i40e_fdir.c
@@ -1099,7 +1099,7 @@ i40e_sw_fdir_filter_del(struct i40e_pf *pf, struct rte_eth_fdir_input *input)
  * @filte

[dpdk-dev] [PATCH v5 09/17] net/i40e: parse tunnel filter

2017-01-03 Thread Beilei Xing
This patch adds i40e_parse_tunnel_filter to check if
a rule is a tunnel rule according to items of the flow
pattern, and the function also gets the tunnel info.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/i40e_flow.c | 394 +++
 1 file changed, 394 insertions(+)
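
The tunnel parser in this patch runs three stages in order (pattern, action, attributes) and returns at the first failure. A minimal sketch of that staged validation follows. The `demo_*` functions and their checks are hypothetical simplifications; the only outside fact assumed is that a VXLAN VNI is 24 bits wide.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical result of parsing a tunnel rule. */
struct demo_tunnel_conf {
	uint32_t tenant_id;
	uint16_t queue_id;
};

/* Stage 1: the pattern supplies the tenant ID (a 24-bit VXLAN VNI). */
static int demo_parse_pattern(uint32_t vni, struct demo_tunnel_conf *conf)
{
	if (vni > 0xFFFFFF)
		return -EINVAL;
	conf->tenant_id = vni;
	return 0;
}

/* Stage 2: the action supplies the destination queue. */
static int demo_parse_action(uint16_t queue, uint16_t nb_queues,
			     struct demo_tunnel_conf *conf)
{
	if (queue >= nb_queues)
		return -EINVAL;
	conf->queue_id = queue;
	return 0;
}

/* Stage 3: only ingress rules are accepted in this sketch. */
static int demo_parse_attr(int ingress)
{
	return ingress ? 0 : -EINVAL;
}

/* Run the stages in order and stop at the first error. */
static int demo_parse_tunnel_filter(uint32_t vni, uint16_t queue,
				    uint16_t nb_queues, int ingress,
				    struct demo_tunnel_conf *conf)
{
	int ret;

	ret = demo_parse_pattern(vni, conf);
	if (ret)
		return ret;
	ret = demo_parse_action(queue, nb_queues, conf);
	if (ret)
		return ret;
	return demo_parse_attr(ingress);
}
```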

diff --git a/drivers/net/i40e/i40e_flow.c b/drivers/net/i40e/i40e_flow.c
index 64b4ab6..42ebe5e 100644
--- a/drivers/net/i40e/i40e_flow.c
+++ b/drivers/net/i40e/i40e_flow.c
@@ -54,6 +54,8 @@
 #define I40E_IPV4_TC_SHIFT 4
 #define I40E_IPV6_TC_MASK  (0x00FF << I40E_IPV4_TC_SHIFT)
 #define I40E_IPV6_FRAG_HEADER  44
+#define I40E_TENANT_ARRAY_NUM  3
+#define I40E_TCI_MASK  0x
 
 static int i40e_flow_validate(struct rte_eth_dev *dev,
  const struct rte_flow_attr *attr,
@@ -76,6 +78,14 @@ static int i40e_parse_fdir_act(struct rte_eth_dev *dev,
   const struct rte_flow_action *actions,
   struct rte_flow_error *error,
   struct rte_eth_fdir_filter *filter);
+static int i40e_parse_tunnel_pattern(__rte_unused struct rte_eth_dev *dev,
+const struct rte_flow_item *pattern,
+struct rte_flow_error *error,
+struct rte_eth_tunnel_filter_conf *filter);
+static int i40e_parse_tunnel_act(struct rte_eth_dev *dev,
+const struct rte_flow_action *actions,
+struct rte_flow_error *error,
+struct rte_eth_tunnel_filter_conf *filter);
 static int i40e_parse_attr(const struct rte_flow_attr *attr,
   struct rte_flow_error *error);
 
@@ -192,6 +202,45 @@ static enum rte_flow_item_type pattern_fdir_ipv6_sctp_ext[] = {
RTE_FLOW_ITEM_TYPE_END,
 };
 
+/* Pattern matched tunnel filter */
+static enum rte_flow_item_type pattern_vxlan_1[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_VXLAN,
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+static enum rte_flow_item_type pattern_vxlan_2[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_VXLAN,
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+static enum rte_flow_item_type pattern_vxlan_3[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_VXLAN,
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_VLAN,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+static enum rte_flow_item_type pattern_vxlan_4[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_VXLAN,
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_VLAN,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
 static int
 i40e_parse_ethertype_filter(struct rte_eth_dev *dev,
const struct rte_flow_attr *attr,
@@ -257,6 +306,33 @@ i40e_parse_fdir_filter(struct rte_eth_dev *dev,
return 0;
 }
 
+static int
+i40e_parse_tunnel_filter(struct rte_eth_dev *dev,
+const struct rte_flow_attr *attr,
+const struct rte_flow_item pattern[],
+const struct rte_flow_action actions[],
+struct rte_flow_error *error,
+union i40e_filter_t *filter)
+{
+   struct rte_eth_tunnel_filter_conf *tunnel_filter =
+   &filter->tunnel_filter;
+   int ret;
+
+   ret = i40e_parse_tunnel_pattern(dev, pattern, error, tunnel_filter);
+   if (ret)
+   return ret;
+
+   ret = i40e_parse_tunnel_act(dev, actions, error, tunnel_filter);
+   if (ret)
+   return ret;
+
+   ret = i40e_parse_attr(attr, error);
+   if (ret)
+   return ret;
+
+   return ret;
+}
+
 static struct i40e_valid_pattern i40e_supported_patterns[] = {
/* Ethertype */
{ pattern_ethertype, i40e_parse_ethertype_filter },
@@ -277,6 +353,11 @@ static struct i40e_valid_pattern i40e_supported_patterns[] = {
{ pattern_fdir_ipv6_tcp_ext, i40e_parse_fdir_filter },
{ pattern_fdir_ipv6_sctp, i40e_parse_fdir_filter },
{ pattern_fdir_ipv6_sctp_ext, i40e_parse_fdir_filter },
+   /* tunnel */
+   { pattern_vxlan_1, i40e_parse_tunnel_filter },
+   { pattern_vxlan_2, i40e_parse_tunnel_filter },
+   { pattern_vxlan_3, i40e_parse_tunnel_filter },
+   { pattern_vxlan_4, i40e_parse_tunnel_filter },
 };
 
 #define NEXT_ITEM_OF_ACTION(act, actions, index)\
@@ -991,6 +1072,319 @@ i40e_parse_fdir_act(struct rte_eth_dev *dev,
return 0;
 }
 
+/* Parse to get the action info of a tunnel filter */
+static int

[dpdk-dev] [PATCH v5 11/17] net/i40e: add flow destroy function

2017-01-03 Thread Beilei Xing
This patch adds i40e_flow_destroy function to destroy
a flow for users.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/i40e_flow.c | 33 +
 1 file changed, 33 insertions(+)
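
The destroy path in this patch dispatches on the stored filter type and only unlinks and frees the flow when removal succeeded. The sketch below shows that shape with hypothetical `demo_*` names. It assumes one supported type so the success path can be exercised; in the patch itself the switch has only a default case until later patches add the per-type calls.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical filter kinds. */
enum demo_filter_type {
	DEMO_FILTER_UNKNOWN,
	DEMO_FILTER_ETHERTYPE,
};

/* Bookkeeping record for one created flow. */
struct demo_flow {
	TAILQ_ENTRY(demo_flow) node;
	enum demo_filter_type filter_type;
};
TAILQ_HEAD(demo_flow_list, demo_flow);

/* Remove the filter behind a flow, then drop the flow record. */
static int demo_flow_destroy(struct demo_flow_list *list,
			     struct demo_flow *flow)
{
	int ret;

	switch (flow->filter_type) {
	case DEMO_FILTER_ETHERTYPE:
		/* A real driver would remove the hardware filter here. */
		ret = 0;
		break;
	default:
		ret = -EINVAL;
		break;
	}

	/* Keep the record when removal failed so the caller can retry. */
	if (ret == 0) {
		TAILQ_REMOVE(list, flow, node);
		free(flow);
	}
	return ret;
}
```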

diff --git a/drivers/net/i40e/i40e_flow.c b/drivers/net/i40e/i40e_flow.c
index 3114368..ece9f89 100644
--- a/drivers/net/i40e/i40e_flow.c
+++ b/drivers/net/i40e/i40e_flow.c
@@ -67,6 +67,9 @@ static struct rte_flow *i40e_flow_create(struct rte_eth_dev *dev,
 const struct rte_flow_item pattern[],
 const struct rte_flow_action actions[],
 struct rte_flow_error *error);
+static int i40e_flow_destroy(struct rte_eth_dev *dev,
+struct rte_flow *flow,
+struct rte_flow_error *error);
 static int i40e_parse_ethertype_pattern(__rte_unused struct rte_eth_dev *dev,
const struct rte_flow_item *pattern,
struct rte_flow_error *error,
@@ -97,6 +100,7 @@ static int i40e_parse_attr(const struct rte_flow_attr *attr,
 const struct rte_flow_ops i40e_flow_ops = {
.validate = i40e_flow_validate,
.create = i40e_flow_create,
+   .destroy = i40e_flow_destroy,
 };
 
 union i40e_filter_t cons_filter;
@@ -1523,3 +1527,32 @@ i40e_flow_create(struct rte_eth_dev *dev,
rte_free(flow);
return NULL;
 }
+
+static int
+i40e_flow_destroy(struct rte_eth_dev *dev,
+ struct rte_flow *flow,
+ struct rte_flow_error *error)
+{
+   struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+   struct i40e_flow *pmd_flow = (struct i40e_flow *)flow;
+   enum rte_filter_type filter_type = pmd_flow->filter_type;
+   int ret = 0;
+
+   switch (filter_type) {
+   default:
+   PMD_DRV_LOG(WARNING, "Filter type (%d) not supported",
+   filter_type);
+   ret = -EINVAL;
+   break;
+   }
+
+   if (!ret) {
+   TAILQ_REMOVE(&pf->flow_list, pmd_flow, node);
+   rte_free(pmd_flow);
+   } else
+   rte_flow_error_set(error, -ret,
+  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+  "Failed to destroy flow.");
+
+   return ret;
+}
-- 
2.5.5



[dpdk-dev] [PATCH v5 12/17] net/i40e: destroy ethertype filter

2017-01-03 Thread Beilei Xing
This patch adds i40e_dev_destroy_ethertype_filter function
to destroy an ethertype filter for users.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/i40e_flow.c | 42 ++
 1 file changed, 42 insertions(+)
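
Before issuing the removal command, the destroy function in this patch translates the filter's flags into admin-queue flags: the MAC is ignored unless the filter matches on it, the drop flag is carried over, and the to-queue flag is always set. The sketch below reproduces that translation. The numeric values of the `DEMO_*` macros are hypothetical and are not the real RTE_ETHTYPE_FLAGS_* or I40E_AQC_* values.

```c
#include <stdint.h>

/* Hypothetical filter-side flags (not the real RTE_ETHTYPE_FLAGS_* values). */
#define DEMO_ETHTYPE_FLAGS_MAC   0x0001
#define DEMO_ETHTYPE_FLAGS_DROP  0x0002

/* Hypothetical admin-queue flags (not the real I40E_AQC_* values). */
#define DEMO_AQ_FLAG_IGNORE_MAC  0x0001
#define DEMO_AQ_FLAG_DROP        0x0002
#define DEMO_AQ_FLAG_TO_QUEUE    0x0004

/* Translate filter flags into the flags of the hardware command. */
static uint16_t demo_ethertype_aq_flags(uint16_t filter_flags)
{
	uint16_t flags = 0;

	if (!(filter_flags & DEMO_ETHTYPE_FLAGS_MAC))
		flags |= DEMO_AQ_FLAG_IGNORE_MAC;
	if (filter_flags & DEMO_ETHTYPE_FLAGS_DROP)
		flags |= DEMO_AQ_FLAG_DROP;
	flags |= DEMO_AQ_FLAG_TO_QUEUE;

	return flags;
}
```

The removal command has to carry the same flags that were used when the filter was added, which is why the translation is repeated on the destroy path.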

diff --git a/drivers/net/i40e/i40e_flow.c b/drivers/net/i40e/i40e_flow.c
index ece9f89..2940058 100644
--- a/drivers/net/i40e/i40e_flow.c
+++ b/drivers/net/i40e/i40e_flow.c
@@ -49,6 +49,7 @@
 
 #include "i40e_logs.h"
 #include "base/i40e_type.h"
+#include "base/i40e_prototype.h"
 #include "i40e_ethdev.h"
 
 #define I40E_IPV4_TC_SHIFT 4
@@ -96,6 +97,8 @@ static int i40e_parse_tunnel_act(struct rte_eth_dev *dev,
 struct rte_eth_tunnel_filter_conf *filter);
 static int i40e_parse_attr(const struct rte_flow_attr *attr,
   struct rte_flow_error *error);
+static int i40e_dev_destroy_ethertype_filter(struct i40e_pf *pf,
+struct i40e_ethertype_filter *filter);
 
 const struct rte_flow_ops i40e_flow_ops = {
.validate = i40e_flow_validate,
@@ -1539,6 +1542,10 @@ i40e_flow_destroy(struct rte_eth_dev *dev,
int ret = 0;
 
switch (filter_type) {
+   case RTE_ETH_FILTER_ETHERTYPE:
+   ret = i40e_dev_destroy_ethertype_filter(pf,
+   (struct i40e_ethertype_filter *)pmd_flow->rule);
+   break;
default:
PMD_DRV_LOG(WARNING, "Filter type (%d) not supported",
filter_type);
@@ -1556,3 +1563,38 @@ i40e_flow_destroy(struct rte_eth_dev *dev,
 
return ret;
 }
+
+static int
+i40e_dev_destroy_ethertype_filter(struct i40e_pf *pf,
+ struct i40e_ethertype_filter *filter)
+{
+   struct i40e_hw *hw = I40E_PF_TO_HW(pf);
+   struct i40e_ethertype_rule *ethertype_rule = &pf->ethertype;
+   struct i40e_ethertype_filter *node;
+   struct i40e_control_filter_stats stats;
+   uint16_t flags = 0;
+   int ret = 0;
+
+   if (!(filter->flags & RTE_ETHTYPE_FLAGS_MAC))
+   flags |= I40E_AQC_ADD_CONTROL_PACKET_FLAGS_IGNORE_MAC;
+   if (filter->flags & RTE_ETHTYPE_FLAGS_DROP)
+   flags |= I40E_AQC_ADD_CONTROL_PACKET_FLAGS_DROP;
+   flags |= I40E_AQC_ADD_CONTROL_PACKET_FLAGS_TO_QUEUE;
+
+   memset(&stats, 0, sizeof(stats));
+   ret = i40e_aq_add_rem_control_packet_filter(hw,
+   filter->input.mac_addr.addr_bytes,
+   filter->input.ether_type,
+   flags, pf->main_vsi->seid,
+   filter->queue, 0, &stats, NULL);
+   if (ret < 0)
+   return ret;
+
+   node = i40e_sw_ethertype_filter_lookup(ethertype_rule, &filter->input);
+   if (!node)
+   return -EINVAL;
+
+   ret = i40e_sw_ethertype_filter_del(pf, &node->input);
+
+   return ret;
+}
-- 
2.5.5



[dpdk-dev] [PATCH v5 14/17] net/i40e: destroy flow directory filter

2017-01-03 Thread Beilei Xing
This patch supports destroying a flow director filter
for users.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/i40e_flow.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/i40e/i40e_flow.c b/drivers/net/i40e/i40e_flow.c
index f334844..2674c2c 100644
--- a/drivers/net/i40e/i40e_flow.c
+++ b/drivers/net/i40e/i40e_flow.c
@@ -1552,6 +1552,10 @@ i40e_flow_destroy(struct rte_eth_dev *dev,
ret = i40e_dev_destroy_tunnel_filter(pf,
 (struct i40e_tunnel_filter *)pmd_flow->rule);
break;
+   case RTE_ETH_FILTER_FDIR:
+   ret = i40e_add_del_fdir_filter(dev,
+  &((struct i40e_fdir_filter *)pmd_flow->rule)->fdir, 0);
+   break;
default:
PMD_DRV_LOG(WARNING, "Filter type (%d) not supported",
filter_type);
-- 
2.5.5



[dpdk-dev] [PATCH v5 13/17] net/i40e: destroy tunnel filter

2017-01-03 Thread Beilei Xing
This patch adds i40e_dev_destroy_tunnel_filter function
to destroy a tunnel filter for users.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/i40e_flow.c | 41 +
 1 file changed, 41 insertions(+)
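
The tunnel destroy function in this patch removes the filter from hardware first and only then looks up and deletes the matching software entry, returning -EINVAL when no entry is found. The sketch below shows that ordering with hypothetical `demo_*` names; the lookup key is reduced to a tenant ID and the hardware call is a stub.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical software entry for one tunnel filter. */
struct demo_tunnel {
	TAILQ_ENTRY(demo_tunnel) rules;
	uint32_t tenant_id;
};
TAILQ_HEAD(demo_tunnel_list, demo_tunnel);

/* Stub for the hardware removal command. */
static int demo_hw_remove(uint32_t tenant_id)
{
	(void)tenant_id;
	return 0;
}

/* Find the software entry for a tenant ID, or NULL. */
static struct demo_tunnel *demo_sw_lookup(struct demo_tunnel_list *list,
					  uint32_t tenant_id)
{
	struct demo_tunnel *t;

	TAILQ_FOREACH(t, list, rules)
		if (t->tenant_id == tenant_id)
			return t;
	return NULL;
}

/* Hardware first, then software bookkeeping. */
static int demo_tunnel_destroy(struct demo_tunnel_list *list,
			       uint32_t tenant_id)
{
	struct demo_tunnel *node;
	int ret;

	ret = demo_hw_remove(tenant_id);
	if (ret < 0)
		return ret;

	node = demo_sw_lookup(list, tenant_id);
	if (node == NULL)
		return -EINVAL;

	TAILQ_REMOVE(list, node, rules);
	free(node);
	return 0;
}
```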

diff --git a/drivers/net/i40e/i40e_flow.c b/drivers/net/i40e/i40e_flow.c
index 2940058..f334844 100644
--- a/drivers/net/i40e/i40e_flow.c
+++ b/drivers/net/i40e/i40e_flow.c
@@ -99,6 +99,8 @@ static int i40e_parse_attr(const struct rte_flow_attr *attr,
   struct rte_flow_error *error);
 static int i40e_dev_destroy_ethertype_filter(struct i40e_pf *pf,
 struct i40e_ethertype_filter *filter);
+static int i40e_dev_destroy_tunnel_filter(struct i40e_pf *pf,
+ struct i40e_tunnel_filter *filter);
 
 const struct rte_flow_ops i40e_flow_ops = {
.validate = i40e_flow_validate,
@@ -1546,6 +1548,10 @@ i40e_flow_destroy(struct rte_eth_dev *dev,
ret = i40e_dev_destroy_ethertype_filter(pf,
(struct i40e_ethertype_filter *)pmd_flow->rule);
break;
+   case RTE_ETH_FILTER_TUNNEL:
+   ret = i40e_dev_destroy_tunnel_filter(pf,
+(struct i40e_tunnel_filter *)pmd_flow->rule);
+   break;
default:
PMD_DRV_LOG(WARNING, "Filter type (%d) not supported",
filter_type);
@@ -1598,3 +1604,38 @@ i40e_dev_destroy_ethertype_filter(struct i40e_pf *pf,
 
return ret;
 }
+
+static int
+i40e_dev_destroy_tunnel_filter(struct i40e_pf *pf,
+  struct i40e_tunnel_filter *filter)
+{
+   struct i40e_hw *hw = I40E_PF_TO_HW(pf);
+   struct i40e_vsi *vsi = pf->main_vsi;
+   struct i40e_aqc_add_remove_cloud_filters_element_data cld_filter;
+   struct i40e_tunnel_rule *tunnel_rule = &pf->tunnel;
+   struct i40e_tunnel_filter *node;
+   int ret = 0;
+
+   memset(&cld_filter, 0, sizeof(cld_filter));
+   ether_addr_copy((struct ether_addr *)&filter->input.outer_mac,
+   (struct ether_addr *)&cld_filter.outer_mac);
+   ether_addr_copy((struct ether_addr *)&filter->input.inner_mac,
+   (struct ether_addr *)&cld_filter.inner_mac);
+   cld_filter.inner_vlan = filter->input.inner_vlan;
+   cld_filter.flags = filter->input.flags;
+   cld_filter.tenant_id = filter->input.tenant_id;
+   cld_filter.queue_number = filter->queue;
+
+   ret = i40e_aq_remove_cloud_filters(hw, vsi->seid,
+  &cld_filter, 1);
+   if (ret < 0)
+   return ret;
+
+   node = i40e_sw_tunnel_filter_lookup(tunnel_rule, &filter->input);
+   if (!node)
+   return -EINVAL;
+
+   ret = i40e_sw_tunnel_filter_del(pf, &node->input);
+
+   return ret;
+}
-- 
2.5.5



[dpdk-dev] [PATCH v5 15/17] net/i40e: add flow flush function

2017-01-03 Thread Beilei Xing
This patch adds the i40e_flow_flush function to flush all
filters for users. As a first step, only the flow director
flush is included.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/i40e_ethdev.h |  1 +
 drivers/net/i40e/i40e_fdir.c   |  4 +---
 drivers/net/i40e/i40e_flow.c   | 51 ++
 3 files changed, 53 insertions(+), 3 deletions(-)
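
After the hardware flush, the flush function in this patch removes every flow of the flushed type from the flow list while iterating over it. The sketch below shows one way to do that safely by saving the next pointer before unlinking. It is written out by hand because TAILQ_FOREACH_SAFE is not available in every sys/queue.h; all `demo_*` names are hypothetical.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical filter kinds. */
enum demo_filter_type {
	DEMO_FILTER_FDIR,
	DEMO_FILTER_TUNNEL,
};

/* Bookkeeping record for one created flow. */
struct demo_flow {
	TAILQ_ENTRY(demo_flow) node;
	enum demo_filter_type filter_type;
};
TAILQ_HEAD(demo_flow_list, demo_flow);

/* Remove and free every flow of one type; return how many were removed. */
static unsigned int demo_flow_flush_type(struct demo_flow_list *list,
					 enum demo_filter_type type)
{
	struct demo_flow *flow, *next;
	unsigned int removed = 0;

	for (flow = TAILQ_FIRST(list); flow != NULL; flow = next) {
		/* Save the successor before the current node is freed. */
		next = TAILQ_NEXT(flow, node);
		if (flow->filter_type == type) {
			TAILQ_REMOVE(list, flow, node);
			free(flow);
			removed++;
		}
	}
	return removed;
}
```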

diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index b33910d..57fd796 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -788,6 +788,7 @@ int i40e_add_del_fdir_filter(struct rte_eth_dev *dev,
 int i40e_dev_tunnel_filter_set(struct i40e_pf *pf,
   struct rte_eth_tunnel_filter_conf *tunnel_filter,
   uint8_t add);
+int i40e_fdir_flush(struct rte_eth_dev *dev);
 
 /* I40E_DEV_PRIVATE_TO */
 #define I40E_DEV_PRIVATE_TO_PF(adapter) \
diff --git a/drivers/net/i40e/i40e_fdir.c b/drivers/net/i40e/i40e_fdir.c
index 91d91aa..67d63ff 100644
--- a/drivers/net/i40e/i40e_fdir.c
+++ b/drivers/net/i40e/i40e_fdir.c
@@ -119,8 +119,6 @@ static int i40e_fdir_filter_programming(struct i40e_pf *pf,
enum i40e_filter_pctype pctype,
const struct rte_eth_fdir_filter *filter,
bool add);
-static int i40e_fdir_flush(struct rte_eth_dev *dev);
-
 static int i40e_fdir_filter_convert(const struct rte_eth_fdir_filter *input,
 struct i40e_fdir_filter *filter);
 static struct i40e_fdir_filter *
@@ -1325,7 +1323,7 @@ i40e_fdir_filter_programming(struct i40e_pf *pf,
  * i40e_fdir_flush - clear all filters of Flow Director table
  * @pf: board private structure
  */
-static int
+int
 i40e_fdir_flush(struct rte_eth_dev *dev)
 {
struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
diff --git a/drivers/net/i40e/i40e_flow.c b/drivers/net/i40e/i40e_flow.c
index 2674c2c..bc8a76c 100644
--- a/drivers/net/i40e/i40e_flow.c
+++ b/drivers/net/i40e/i40e_flow.c
@@ -68,6 +68,8 @@ static struct rte_flow *i40e_flow_create(struct rte_eth_dev *dev,
 const struct rte_flow_item pattern[],
 const struct rte_flow_action actions[],
 struct rte_flow_error *error);
+static int i40e_flow_flush(struct rte_eth_dev *dev,
+  struct rte_flow_error *error);
 static int i40e_flow_destroy(struct rte_eth_dev *dev,
 struct rte_flow *flow,
 struct rte_flow_error *error);
@@ -101,11 +103,13 @@ static int i40e_dev_destroy_ethertype_filter(struct i40e_pf *pf,
 struct i40e_ethertype_filter *filter);
 static int i40e_dev_destroy_tunnel_filter(struct i40e_pf *pf,
  struct i40e_tunnel_filter *filter);
+static int i40e_fdir_filter_flush(struct i40e_pf *pf);
 
 const struct rte_flow_ops i40e_flow_ops = {
.validate = i40e_flow_validate,
.create = i40e_flow_create,
.destroy = i40e_flow_destroy,
+   .flush = i40e_flow_flush,
 };
 
 union i40e_filter_t cons_filter;
@@ -1643,3 +1647,50 @@ i40e_dev_destroy_tunnel_filter(struct i40e_pf *pf,
 
return ret;
 }
+
+static int
+i40e_flow_flush(struct rte_eth_dev *dev, struct rte_flow_error *error)
+{
+   struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+   int ret;
+
+   ret = i40e_fdir_filter_flush(pf);
+   if (ret)
+   rte_flow_error_set(error, -ret,
+  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+  "Failed to flush FDIR flows.");
+
+   return ret;
+}
+
+static int
+i40e_fdir_filter_flush(struct i40e_pf *pf)
+{
+   struct rte_eth_dev *dev = pf->adapter->eth_dev;
+   struct i40e_fdir_info *fdir_info = &pf->fdir;
+   struct i40e_fdir_filter *fdir_filter;
+   struct i40e_flow *flow;
+   void *temp;
+   int ret;
+
+   ret = i40e_fdir_flush(dev);
+   if (!ret) {
+   /* Delete FDIR filters in FDIR list. */
+   while ((fdir_filter = TAILQ_FIRST(&fdir_info->fdir_list))) {
+   ret = i40e_sw_fdir_filter_del(pf,
+ &fdir_filter->fdir.input);
+   if (ret < 0)
+   return ret;
+   }
+
+   /* Delete FDIR flows in flow list. */
+   TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
+   if (flow->filter_type == RTE_ETH_FILTER_FDIR) {
+   TAILQ_REMOVE(&pf->flow_list, flow, node);
+   rte_free(flow);
+   }
+   }
+   }
+
+   return ret;
+}
-- 
2.5.5



[dpdk-dev] [PATCH v5 17/17] net/i40e: flush tunnel filters

2017-01-03 Thread Beilei Xing
This patch adds i40e_tunnel_filter_flush function
to flush all tunnel filters, including filters in
SW and HW.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/i40e_flow.c | 37 +
 1 file changed, 37 insertions(+)
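
With this patch the top-level flush runs the FDIR, ethertype, and tunnel stages in sequence and returns as soon as one of them fails. The sketch below expresses that control flow as a table of stages. The `demo_*` names, the stub stages, and the idea of reporting the failed stage by name are assumptions made for illustration; the driver calls each flush function directly and reports through rte_flow_error_set.

```c
#include <errno.h>
#include <stddef.h>

/* One flush stage: a label and the function that performs it. */
struct demo_flush_stage {
	const char *name;
	int (*flush)(void *ctx);
};

/* Stub stage that succeeds and counts how often it was called. */
static int demo_flush_ok(void *ctx)
{
	(*(unsigned int *)ctx)++;
	return 0;
}

/* Stub stage that always fails. */
static int demo_flush_fail(void *ctx)
{
	(void)ctx;
	return -EIO;
}

/*
 * Run the stages in order and stop at the first failure. On failure,
 * *failed names the stage that failed; on success it is set to NULL.
 */
static int demo_flush_all(const struct demo_flush_stage *stages, size_t n,
			  void *ctx, const char **failed)
{
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		ret = stages[i].flush(ctx);
		if (ret) {
			*failed = stages[i].name;
			return ret;
		}
	}
	*failed = NULL;
	return 0;
}
```

Stopping at the first failure means later stages are left untouched, so a caller that retries will start again from a partly flushed state.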

diff --git a/drivers/net/i40e/i40e_flow.c b/drivers/net/i40e/i40e_flow.c
index 2e696d3..c8eae4f 100644
--- a/drivers/net/i40e/i40e_flow.c
+++ b/drivers/net/i40e/i40e_flow.c
@@ -105,6 +105,7 @@ static int i40e_dev_destroy_tunnel_filter(struct i40e_pf *pf,
  struct i40e_tunnel_filter *filter);
 static int i40e_fdir_filter_flush(struct i40e_pf *pf);
 static int i40e_ethertype_filter_flush(struct i40e_pf *pf);
+static int i40e_tunnel_filter_flush(struct i40e_pf *pf);
 
 const struct rte_flow_ops i40e_flow_ops = {
.validate = i40e_flow_validate,
@@ -1671,6 +1672,14 @@ i40e_flow_flush(struct rte_eth_dev *dev, struct rte_flow_error *error)
return -rte_errno;
}
 
+   ret = i40e_tunnel_filter_flush(pf);
+   if (ret) {
+   rte_flow_error_set(error, -ret,
+  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+  "Failed to flush tunnel flows.");
+   return -rte_errno;
+   }
+
return ret;
 }
 
@@ -1733,3 +1742,31 @@ i40e_ethertype_filter_flush(struct i40e_pf *pf)
 
return ret;
 }
+
+/* Flush all tunnel filters */
+static int
+i40e_tunnel_filter_flush(struct i40e_pf *pf)
+{
+   struct i40e_tunnel_filter_list
+   *tunnel_list = &pf->tunnel.tunnel_list;
+   struct i40e_tunnel_filter *filter;
+   struct i40e_flow *flow;
+   void *temp;
+   int ret = 0;
+
+   while ((filter = TAILQ_FIRST(tunnel_list))) {
+   ret = i40e_dev_destroy_tunnel_filter(pf, filter);
+   if (ret)
+   return ret;
+   }
+
+   /* Delete tunnel flows in flow list. */
+   TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
+   if (flow->filter_type == RTE_ETH_FILTER_TUNNEL) {
+   TAILQ_REMOVE(&pf->flow_list, flow, node);
+   rte_free(flow);
+   }
+   }
+
+   return ret;
+}
-- 
2.5.5



[dpdk-dev] [PATCH v5 16/17] net/i40e: flush ethertype filters

2017-01-03 Thread Beilei Xing
This patch adds i40e_ethertype_filter_flush function
to flush all ethertype filters, including filters in
SW and HW.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/i40e_flow.c | 41 -
 1 file changed, 40 insertions(+), 1 deletion(-)
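
The ethertype flush in this patch drains the filter list by repeatedly destroying its first element. A minimal sketch of that drain loop follows, with hypothetical `demo_*` names. Note the assumption it relies on: the destroy routine must unlink the element it is given, otherwise the loop would never terminate.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical software entry for one filter. */
struct demo_filter {
	TAILQ_ENTRY(demo_filter) rules;
	unsigned short ether_type;
};
TAILQ_HEAD(demo_filter_list, demo_filter);

/* Stand-in for the per-filter destroy: unlink and free one entry. */
static int demo_filter_destroy(struct demo_filter_list *list,
			       struct demo_filter *filter)
{
	TAILQ_REMOVE(list, filter, rules);
	free(filter);
	return 0;
}

/* Destroy the head until the list is empty or a destroy fails. */
static int demo_filter_drain(struct demo_filter_list *list)
{
	struct demo_filter *filter;
	int ret;

	while ((filter = TAILQ_FIRST(list)) != NULL) {
		ret = demo_filter_destroy(list, filter);
		if (ret)
			return ret;
	}
	return 0;
}
```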

diff --git a/drivers/net/i40e/i40e_flow.c b/drivers/net/i40e/i40e_flow.c
index bc8a76c..2e696d3 100644
--- a/drivers/net/i40e/i40e_flow.c
+++ b/drivers/net/i40e/i40e_flow.c
@@ -104,6 +104,7 @@ static int i40e_dev_destroy_ethertype_filter(struct i40e_pf *pf,
 static int i40e_dev_destroy_tunnel_filter(struct i40e_pf *pf,
  struct i40e_tunnel_filter *filter);
 static int i40e_fdir_filter_flush(struct i40e_pf *pf);
+static int i40e_ethertype_filter_flush(struct i40e_pf *pf);
 
 const struct rte_flow_ops i40e_flow_ops = {
.validate = i40e_flow_validate,
@@ -1655,10 +1656,20 @@ i40e_flow_flush(struct rte_eth_dev *dev, struct rte_flow_error *error)
int ret;
 
ret = i40e_fdir_filter_flush(pf);
-   if (ret)
+   if (ret) {
rte_flow_error_set(error, -ret,
   RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
   "Failed to flush FDIR flows.");
+   return -rte_errno;
+   }
+
+   ret = i40e_ethertype_filter_flush(pf);
+   if (ret) {
+   rte_flow_error_set(error, -ret,
+  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+  "Failed to flush ethertype flows.");
+   return -rte_errno;
+   }
 
return ret;
 }
@@ -1694,3 +1705,31 @@ i40e_fdir_filter_flush(struct i40e_pf *pf)
 
return ret;
 }
+
+/* Flush all ethertype filters */
+static int
+i40e_ethertype_filter_flush(struct i40e_pf *pf)
+{
+   struct i40e_ethertype_filter_list
+   *ethertype_list = &pf->ethertype.ethertype_list;
+   struct i40e_ethertype_filter *filter;
+   struct i40e_flow *flow;
+   void *temp;
+   int ret = 0;
+
+   while ((filter = TAILQ_FIRST(ethertype_list))) {
+   ret = i40e_dev_destroy_ethertype_filter(pf, filter);
+   if (ret)
+   return ret;
+   }
+
+   /* Delete ethertype flows in flow list. */
+   TAILQ_FOREACH_SAFE(flow, &pf->flow_list, node, temp) {
+   if (flow->filter_type == RTE_ETH_FILTER_ETHERTYPE) {
+   TAILQ_REMOVE(&pf->flow_list, flow, node);
+   rte_free(flow);
+   }
+   }
+
+   return ret;
+}
-- 
2.5.5



Re: [dpdk-dev] [PATCH v3 1/4] ethdev: add firmware information get

2017-01-03 Thread Yang, Qiming
Yes, in my opinion it is. I used this name because it already exists in the
shared code from the ND team.

-Original Message-
From: Yigit, Ferruh 
Sent: Tuesday, January 3, 2017 10:49 PM
To: Yang, Qiming 
Cc: dev@dpdk.org; Horton, Remy ; Thomas Monjalon 

Subject: Re: [PATCH v3 1/4] ethdev: add firmware information get

On 1/3/2017 9:05 AM, Yang, Qiming wrote:
> Hi, Ferruh
> Please see the question below. In my opinion, etrack_id is just a name used 
> to define the ID of one NIC.
> The kernel version of ethtool prints this ID on the firmware-version 
> line.
> I know what etrack_id means, but I really don't know why it is named 
> etrack_id.

Hi Qiming,

I suggested the API based on fields you already used in your patch.

So, this API is to get FW version, is etrack_id something that defines (part 
of) firmware version?

Thanks,
ferruh


> Can you explain this question?
>  
> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monja...@6wind.com]
> Sent: Tuesday, January 3, 2017 4:40 PM
> To: Yang, Qiming 
> Subject: Re: [PATCH v3 1/4] ethdev: add firmware information get
> 
> Please reply below the question and on the mailing list.
> You'll have to explain why this name etrack_id.
> 
> 2017-01-03 03:28, Yang, Qiming:
>> Hi, Thomas
>> etrack_id is not an established term; I chose the name myself.
>> It stores the unique number of the firmware.
>> firmware-version: 5.04 0x800024ca
>> 800024ca is the etrack_id of this NIC.
>>
>> -Original Message-
>> From: Thomas Monjalon [mailto:thomas.monja...@6wind.com]
>> Sent: Monday, January 2, 2017 11:39 PM
>> To: Yang, Qiming 
>> Cc: dev@dpdk.org; Horton, Remy ; Yigit, Ferruh 
>> 
>> Subject: Re: [PATCH v3 1/4] ethdev: add firmware information get
>>
>> 2016-12-27 20:30, Qiming Yang:
>>>  /**
>>> + * Retrieve the firmware version of a device.
>>> + *
>>> + * @param port_id
>>> + *   The port identifier of the device.
>>> + * @param fw_major
>>> + *   A array pointer to store the major firmware version of a device.
>>> + * @param fw_minor
>>> + *   A array pointer to store the minor firmware version of a device.
>>> + * @param fw_patch
>>> + *   A array pointer to store the firmware patch number of a device.
>>> + * @param etrack_id
>>> + *   A array pointer to store the nvm version of a device.
>>> + */
>>> +void rte_eth_dev_fw_info_get(uint8_t port_id, uint32_t *fw_major,
>>> +   uint32_t *fw_minor, uint32_t *fw_patch, uint32_t *etrack_id);
>>
>> I have a reservation about the name etrack_id.
>> Please could you point to a document explaining this ID?
>> Is it known outside of Intel?
> 
> 
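
The thread above quotes an ethtool-style line, "firmware-version: 5.04 0x800024ca", where the trailing hexadecimal number is the etrack_id. The sketch below splits such a value into the fields discussed for the proposed API. It is only an illustration: the `demo_*` names are hypothetical, and the "major.minor 0xID" layout is assumed from that single quoted example, not from any specification.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the fields discussed in the thread. */
struct demo_fw_info {
	uint32_t major;
	uint32_t minor;
	uint32_t etrack_id;
};

/*
 * Parse a value such as "5.04 0x800024ca".
 * Returns 0 on success and -1 when the text does not have that shape.
 */
static int demo_parse_fw_version(const char *text, struct demo_fw_info *out)
{
	unsigned int major, minor, etrack;

	/* %x accepts an optional "0x" prefix. */
	if (sscanf(text, "%u.%u %x", &major, &minor, &etrack) != 3)
		return -1;

	out->major = major;
	out->minor = minor;
	out->etrack_id = etrack;
	return 0;
}
```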



[dpdk-dev] [PATCH v3 2/7] net/virtio_user: fix not properly reset device

2017-01-03 Thread Jianfeng Tan
virtio_user is not properly reset when users call vtpci_reset(),
as it ignores VIRTIO_CONFIG_STATUS_RESET status in
virtio_user_set_status().

This might lead to initialization failure as it starts to re-init
the device before sending the RESET message to the backend. Besides,
previous callfds and kickfds are not closed.

To fix it, we add support to disable virtqueues when it's set to
DRIVER OK status, and re-init fields in struct virtio_user_dev.

Fixes: e9efa4d93821 ("net/virtio-user: add new virtual PCI driver")
Fixes: 37a7eb2ae816 ("net/virtio-user: add device emulation layer")

CC: sta...@dpdk.org

Signed-off-by: Jianfeng Tan 
---
 drivers/net/virtio/virtio_user/virtio_user_dev.c | 26 
 drivers/net/virtio/virtio_user_ethdev.c  | 15 --
 2 files changed, 27 insertions(+), 14 deletions(-)
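
The fix described above initializes every kickfd and callfd to -1 and closes them when the device is stopped. The sketch below shows that descriptor lifecycle in isolation. The `demo_*` names and the fixed array size are hypothetical, and the sketch goes one step beyond the patch by closing only valid descriptors and resetting each slot to -1 afterwards, so a second stop does nothing.

```c
#include <unistd.h>

#define DEMO_MAX_QUEUES 8

/* Hypothetical device state: one kick and one call descriptor per queue. */
struct demo_dev {
	int kickfds[DEMO_MAX_QUEUES];
	int callfds[DEMO_MAX_QUEUES];
	unsigned int nr_queues;
};

/* Mark every slot as "no descriptor". */
static void demo_dev_init_fds(struct demo_dev *dev)
{
	unsigned int i;

	for (i = 0; i < DEMO_MAX_QUEUES; i++) {
		dev->kickfds[i] = -1;
		dev->callfds[i] = -1;
	}
}

/* Close every valid descriptor; return how many were closed. */
static unsigned int demo_dev_close_fds(struct demo_dev *dev)
{
	unsigned int i, closed = 0;

	for (i = 0; i < dev->nr_queues; i++) {
		if (dev->kickfds[i] >= 0) {
			close(dev->kickfds[i]);
			dev->kickfds[i] = -1;
			closed++;
		}
		if (dev->callfds[i] >= 0) {
			close(dev->callfds[i]);
			dev->callfds[i] = -1;
			closed++;
		}
	}
	return closed;
}
```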

diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index 0d7e17b..a38398b 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -182,7 +182,17 @@ virtio_user_start_device(struct virtio_user_dev *dev)
 
 int virtio_user_stop_device(struct virtio_user_dev *dev)
 {
-   return vhost_user_sock(dev->vhostfd, VHOST_USER_RESET_OWNER, NULL);
+   uint32_t i;
+
+   for (i = 0; i < dev->max_queue_pairs * 2; ++i) {
+   close(dev->callfds[i]);
+   close(dev->kickfds[i]);
+   }
+
+   for (i = 0; i < dev->max_queue_pairs; ++i)
+   vhost_user_enable_queue_pair(dev->vhostfd, i, 0);
+
+   return 0;
 }
 
 static inline void
@@ -210,6 +220,8 @@ int
 virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues,
 int cq, int queue_size, const char *mac)
 {
+   uint32_t i;
+
snprintf(dev->path, PATH_MAX, "%s", path);
dev->max_queue_pairs = queues;
dev->queue_pairs = 1; /* mq disabled by default */
@@ -218,6 +230,11 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char 
*path, int queues,
parse_mac(dev, mac);
dev->vhostfd = -1;
 
+   for (i = 0; i < VIRTIO_MAX_VIRTQUEUES * 2 + 1; ++i) {
+   dev->kickfds[i] = -1;
+   dev->callfds[i] = -1;
+   }
+
dev->vhostfd = vhost_user_setup(dev->path);
if (dev->vhostfd < 0) {
PMD_INIT_LOG(ERR, "backend set up fails");
@@ -264,13 +281,6 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char 
*path, int queues,
 void
 virtio_user_dev_uninit(struct virtio_user_dev *dev)
 {
-   uint32_t i;
-
-   for (i = 0; i < dev->max_queue_pairs * 2; ++i) {
-   close(dev->callfds[i]);
-   close(dev->kickfds[i]);
-   }
-
close(dev->vhostfd);
 }
 
diff --git a/drivers/net/virtio/virtio_user_ethdev.c 
b/drivers/net/virtio/virtio_user_ethdev.c
index 4a5a227..93f5b01 100644
--- a/drivers/net/virtio/virtio_user_ethdev.c
+++ b/drivers/net/virtio/virtio_user_ethdev.c
@@ -87,21 +87,24 @@ virtio_user_write_dev_config(struct virtio_hw *hw, size_t 
offset,
 }
 
 static void
-virtio_user_set_status(struct virtio_hw *hw, uint8_t status)
+virtio_user_reset(struct virtio_hw *hw)
 {
struct virtio_user_dev *dev = virtio_user_get_dev(hw);
 
-   if (status & VIRTIO_CONFIG_STATUS_DRIVER_OK)
-   virtio_user_start_device(dev);
-   dev->status = status;
+   if (dev->status & VIRTIO_CONFIG_STATUS_DRIVER_OK)
+   virtio_user_stop_device(dev);
 }
 
 static void
-virtio_user_reset(struct virtio_hw *hw)
+virtio_user_set_status(struct virtio_hw *hw, uint8_t status)
 {
struct virtio_user_dev *dev = virtio_user_get_dev(hw);
 
-   virtio_user_stop_device(dev);
+   if (status & VIRTIO_CONFIG_STATUS_DRIVER_OK)
+   virtio_user_start_device(dev);
+   else if (status == VIRTIO_CONFIG_STATUS_RESET)
+   virtio_user_reset(hw);
+   dev->status = status;
 }
 
 static uint8_t
-- 
2.7.4



[dpdk-dev] [PATCH v3 3/7] net/virtio_user: move vhost user specific code

2017-01-03 Thread Jianfeng Tan
To support vhost kernel as the backend of net_virtio_user in coming
patches, we move vhost_user specific structs and macros into
vhost_user.c, and only keep common definitions in vhost.h.

Besides, remove VHOST_USER_MQ feature check.

Signed-off-by: Jianfeng Tan 
---
 drivers/net/virtio/virtio_user/vhost.h   | 36 
 drivers/net/virtio/virtio_user/vhost_user.c  | 32 +
 drivers/net/virtio/virtio_user/virtio_user_dev.c |  9 --
 3 files changed, 32 insertions(+), 45 deletions(-)

diff --git a/drivers/net/virtio/virtio_user/vhost.h 
b/drivers/net/virtio/virtio_user/vhost.h
index 7adb55f..e54ac35 100644
--- a/drivers/net/virtio/virtio_user/vhost.h
+++ b/drivers/net/virtio/virtio_user/vhost.h
@@ -42,8 +42,6 @@
 #include "../virtio_logs.h"
 #include "../virtqueue.h"
 
-#define VHOST_MEMORY_MAX_NREGIONS 8
-
 struct vhost_vring_state {
unsigned int index;
unsigned int num;
@@ -105,40 +103,6 @@ struct vhost_memory_region {
uint64_t mmap_offset;
 };
 
-struct vhost_memory {
-   uint32_t nregions;
-   uint32_t padding;
-   struct vhost_memory_region regions[VHOST_MEMORY_MAX_NREGIONS];
-};
-
-struct vhost_user_msg {
-   enum vhost_user_request request;
-
-#define VHOST_USER_VERSION_MASK 0x3
-#define VHOST_USER_REPLY_MASK   (0x1 << 2)
-   uint32_t flags;
-   uint32_t size; /* the following payload size */
-   union {
-#define VHOST_USER_VRING_IDX_MASK   0xff
-#define VHOST_USER_VRING_NOFD_MASK  (0x1 << 8)
-   uint64_t u64;
-   struct vhost_vring_state state;
-   struct vhost_vring_addr addr;
-   struct vhost_memory memory;
-   } payload;
-   int fds[VHOST_MEMORY_MAX_NREGIONS];
-} __attribute((packed));
-
-#define VHOST_USER_HDR_SIZE offsetof(struct vhost_user_msg, payload.u64)
-#define VHOST_USER_PAYLOAD_SIZE \
-   (sizeof(struct vhost_user_msg) - VHOST_USER_HDR_SIZE)
-
-/* The version of the protocol we support */
-#define VHOST_USER_VERSION0x1
-
-#define VHOST_USER_F_PROTOCOL_FEATURES 30
-#define VHOST_USER_MQ (1ULL << VHOST_USER_F_PROTOCOL_FEATURES)
-
 int vhost_user_sock(int vhostfd, enum vhost_user_request req, void *arg);
 int vhost_user_setup(const char *path);
 int vhost_user_enable_queue_pair(int vhostfd, uint16_t pair_idx, int enable);
diff --git a/drivers/net/virtio/virtio_user/vhost_user.c 
b/drivers/net/virtio/virtio_user/vhost_user.c
index 082e821..295ce16 100644
--- a/drivers/net/virtio/virtio_user/vhost_user.c
+++ b/drivers/net/virtio/virtio_user/vhost_user.c
@@ -42,6 +42,38 @@
 
 #include "vhost.h"
 
+/* The version of the protocol we support */
+#define VHOST_USER_VERSION0x1
+
+#define VHOST_MEMORY_MAX_NREGIONS 8
+struct vhost_memory {
+   uint32_t nregions;
+   uint32_t padding;
+   struct vhost_memory_region regions[VHOST_MEMORY_MAX_NREGIONS];
+};
+
+struct vhost_user_msg {
+   enum vhost_user_request request;
+
+#define VHOST_USER_VERSION_MASK 0x3
+#define VHOST_USER_REPLY_MASK   (0x1 << 2)
+   uint32_t flags;
+   uint32_t size; /* the following payload size */
+   union {
+#define VHOST_USER_VRING_IDX_MASK   0xff
+#define VHOST_USER_VRING_NOFD_MASK  (0x1 << 8)
+   uint64_t u64;
+   struct vhost_vring_state state;
+   struct vhost_vring_addr addr;
+   struct vhost_memory memory;
+   } payload;
+   int fds[VHOST_MEMORY_MAX_NREGIONS];
+} __attribute((packed));
+
+#define VHOST_USER_HDR_SIZE offsetof(struct vhost_user_msg, payload.u64)
+#define VHOST_USER_PAYLOAD_SIZE \
+   (sizeof(struct vhost_user_msg) - VHOST_USER_HDR_SIZE)
+
 static int
 vhost_user_write(int fd, void *buf, int len, int *fds, int fd_num)
 {
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c 
b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index a38398b..8dd563a 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -151,8 +151,6 @@ virtio_user_start_device(struct virtio_user_dev *dev)
 * VIRTIO_NET_F_MAC and VIRTIO_NET_F_CTRL_VQ is stripped.
 */
features = dev->features;
-   if (dev->max_queue_pairs > 1)
-   features |= VHOST_USER_MQ;
features &= ~(1ull << VIRTIO_NET_F_MAC);
features &= ~(1ull << VIRTIO_NET_F_CTRL_VQ);
ret = vhost_user_sock(dev->vhostfd, VHOST_USER_SET_FEATURES, &features);
@@ -268,13 +266,6 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char 
*path, int queues,
dev->device_features &= ~(1ull << VIRTIO_NET_F_CTRL_MAC_ADDR);
}
 
-   if (dev->max_queue_pairs > 1) {
-   if (!(dev->features & VHOST_USER_MQ)) {
-   PMD_INIT_LOG(ERR, "MQ not supported by the backend");
-   return -1;
-   }
-   }
-
return 0;
 }
 
-- 
2.7.4



[dpdk-dev] [PATCH v3 1/7] net/virtio_user: fix wrongly set features

2017-01-03 Thread Jianfeng Tan
Before commit 86d59b21468a ("net/virtio: support LRO"), features
in the virtio PMD were decided and set at device initialization
and never changed afterwards. Since that commit, features can be changed
in virtio_dev_configure(), and are re-negotiated if they change.

In virtio_user, device features are obtained only once, at driver probe
phase, but we did not store them. So negotiating the added feature bits
during re-negotiation will fail.

To fix it, we store the device features and use them for feature
negotiation at either the device initialization or device configure phase.
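
As a rough illustration of why storing the device features helps, here
is a standalone sketch. The field names follow the patch, but struct
mock_dev and both functions are hypothetical, and the plain bitwise AND
is an assumption about how negotiation combines the two sets. The bit
numbers are the ones from the virtio specification.

```c
#include <stdint.h>

#define VIRTIO_NET_F_MAC      5
#define VIRTIO_NET_F_CTRL_VQ 17

/* Hypothetical stand-in for struct virtio_user_dev. */
struct mock_dev {
	uint64_t device_features; /* offered by the device, read once at probe */
	uint64_t features;        /* result of the latest negotiation */
};

/*
 * Negotiate against the stored device features. Because the offer is
 * kept around, this can be repeated with a different driver request.
 */
static uint64_t mock_negotiate(struct mock_dev *dev, uint64_t driver_features)
{
	dev->features = dev->device_features & driver_features;
	return dev->features;
}

/* Bits actually sent to the backend: MAC and CTRL_VQ are stripped. */
static uint64_t mock_backend_features(const struct mock_dev *dev)
{
	uint64_t f = dev->features;

	f &= ~(1ULL << VIRTIO_NET_F_MAC);
	f &= ~(1ULL << VIRTIO_NET_F_CTRL_VQ);
	return f;
}
```

A second call to mock_negotiate() with more driver bits succeeds here
only because device_features was kept separate from the negotiated set.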

Fixes: e9efa4d93821 ("net/virtio-user: add new virtual PCI driver")

CC: sta...@dpdk.org

Signed-off-by: Jianfeng Tan 
---
 drivers/net/virtio/virtio_user/virtio_user_dev.c | 34 +++-
 drivers/net/virtio/virtio_user/virtio_user_dev.h |  5 +++-
 drivers/net/virtio/virtio_user_ethdev.c  |  4 +--
 3 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c 
b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index e239e0e..0d7e17b 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -148,12 +148,13 @@ virtio_user_start_device(struct virtio_user_dev *dev)
 
/* Step 1: set features
 * Make sure VHOST_USER_F_PROTOCOL_FEATURES is added if mq is enabled,
-* and VIRTIO_NET_F_MAC is stripped.
+* VIRTIO_NET_F_MAC and VIRTIO_NET_F_CTRL_VQ is stripped.
 */
features = dev->features;
if (dev->max_queue_pairs > 1)
features |= VHOST_USER_MQ;
features &= ~(1ull << VIRTIO_NET_F_MAC);
+   features &= ~(1ull << VIRTIO_NET_F_CTRL_VQ);
ret = vhost_user_sock(dev->vhostfd, VHOST_USER_SET_FEATURES, &features);
if (ret < 0)
goto error;
@@ -228,29 +229,26 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char 
*path, int queues,
}
 
if (vhost_user_sock(dev->vhostfd, VHOST_USER_GET_FEATURES,
-   &dev->features) < 0) {
+   &dev->device_features) < 0) {
PMD_INIT_LOG(ERR, "get_features failed: %s", strerror(errno));
return -1;
}
if (dev->mac_specified)
-   dev->features |= (1ull << VIRTIO_NET_F_MAC);
+   dev->device_features |= (1ull << VIRTIO_NET_F_MAC);
 
-   if (!cq) {
-   dev->features &= ~(1ull << VIRTIO_NET_F_CTRL_VQ);
-   /* Also disable features depends on VIRTIO_NET_F_CTRL_VQ */
-   dev->features &= ~(1ull << VIRTIO_NET_F_CTRL_RX);
-   dev->features &= ~(1ull << VIRTIO_NET_F_CTRL_VLAN);
-   dev->features &= ~(1ull << VIRTIO_NET_F_GUEST_ANNOUNCE);
-   dev->features &= ~(1ull << VIRTIO_NET_F_MQ);
-   dev->features &= ~(1ull << VIRTIO_NET_F_CTRL_MAC_ADDR);
-   } else {
-   /* vhost user backend does not need to know ctrl-q, so
-* actually we need add this bit into features. However,
-* DPDK vhost-user does send features with this bit, so we
-* check it instead of OR it for now.
+   if (cq) {
+   /* device does not really need to know anything about CQ,
+* so if necessary, we just claim to support CQ
 */
-   if (!(dev->features & (1ull << VIRTIO_NET_F_CTRL_VQ)))
-   PMD_INIT_LOG(INFO, "vhost does not support ctrl-q");
+   dev->device_features |= (1ull << VIRTIO_NET_F_CTRL_VQ);
+   } else {
+   dev->device_features &= ~(1ull << VIRTIO_NET_F_CTRL_VQ);
+   /* Also disable features depends on VIRTIO_NET_F_CTRL_VQ */
+   dev->device_features &= ~(1ull << VIRTIO_NET_F_CTRL_RX);
+   dev->device_features &= ~(1ull << VIRTIO_NET_F_CTRL_VLAN);
+   dev->device_features &= ~(1ull << VIRTIO_NET_F_GUEST_ANNOUNCE);
+   dev->device_features &= ~(1ull << VIRTIO_NET_F_MQ);
+   dev->device_features &= ~(1ull << VIRTIO_NET_F_CTRL_MAC_ADDR);
}
 
if (dev->max_queue_pairs > 1) {
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.h 
b/drivers/net/virtio/virtio_user/virtio_user_dev.h
index 33690b5..28fc788 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.h
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.h
@@ -46,7 +46,10 @@ struct virtio_user_dev {
uint32_tmax_queue_pairs;
uint32_tqueue_pairs;
uint32_tqueue_size;
-   uint64_tfeatures;
+   uint64_tfeatures; /* the negotiated features with driver,
+  * and will be sync with device
+  */
+   uint64_tdevice_features; /* supported features by device */
uint8_t status;
uint8_t mac_addr[ETHER_ADDR_LEN];
char  

[dpdk-dev] [PATCH v3 0/7] virtio_user as an alternative exception path

2017-01-03 Thread Jianfeng Tan
v3:
  - Drop the patch that postponed sending the driver ok notification; it is
    superseded by a bug fix that disables all virtqueues and re-inits the
    device. (You might wonder why not just send a reset owner msg. In my
    tests, it causes a spinlock deadlock when killing the program.)
  - Avoid a compile error on 32-bit systems caused by pointer conversion.
  - Fix a bug in patch "abstract virtio user backend ops": vhostfd is
    not properly assigned.
  - Fix a "MQ cannot be used" bug in v2, which is related to stripping
    some feature bits that vhost kernel does not recognize.
  - Update release note.

v2: (Lots of them are from yuanhan's comment)
  - Add offloading feature.
  - Add multiqueue support.
  - Add a new patch to postpone the sending of driver ok notification.
  - Put fix patch ahead of the whole patch series.
  - Split original 0001 patch into 0003 and 0004 patches.
  - Remove the original vhost_internal design, just add those into
struct virtio_user_dev for simplicity.
  - Reword "control" to "send_request".
  - Reword "host_features" to "device_features". 

In v16.07, we upstreamed a virtual device, virtio_user (with vhost-user
as the backend). The path with a vhost-kernel backend was dropped at that
time, both for its worse performance compared to vhost-user and to keep
the code simple.

But on second thought, virtio_user + vhost-kernel is a good
candidate for an exception path, like KNI, which exchanges packets
with the kernel networking stack.
  - maintenance: vhost-net (kernel) is an upstreamed and extensively used
    kernel module. We don't need any out-of-tree module like KNI.
  - performance: as with KNI, this solution would use one or more
    kthreads to send/receive packets from user space DPDK applications,
    which has little impact on the user space polling thread (except that
    it might enter kernel space to wake up those kthreads if
    necessary).
  - features: vhost-net was born as a networking solution, and has
    lots of networking related features, like multiqueue, TSO, multi-seg
    mbuf, etc.

Signed-off-by: Jianfeng Tan 


Jianfeng Tan (7):
  net/virtio_user: fix wrongly set features
  net/virtio_user: fix not properly reset device
  net/virtio_user: move vhost user specific code
  net/virtio_user: abstract virtio user backend ops
  net/virtio_user: add vhost kernel support
  net/virtio_user: enable offloading
  net/virtio_user: enable multiqueue with vhost kernel

 doc/guides/rel_notes/release_17_02.rst   |  20 +
 drivers/net/virtio/Makefile  |   1 +
 drivers/net/virtio/virtio_user/vhost.h   |  51 +--
 drivers/net/virtio/virtio_user/vhost_kernel.c| 487 +++
 drivers/net/virtio/virtio_user/vhost_user.c  |  97 +++--
 drivers/net/virtio/virtio_user/virtio_user_dev.c | 138 ---
 drivers/net/virtio/virtio_user/virtio_user_dev.h |  16 +-
 drivers/net/virtio/virtio_user_ethdev.c  |  19 +-
 8 files changed, 705 insertions(+), 124 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_user/vhost_kernel.c

-- 
2.7.4



[dpdk-dev] [PATCH v3 4/7] net/virtio_user: abstract virtio user backend ops

2017-01-03 Thread Jianfeng Tan
Add a struct virtio_user_backend_ops to abstract three kinds of backend
operations:
  - setup, create the unix socket connection;
  - send_request, sync messages with backend;
  - enable_qp, enable some queue pair.
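
The idea of a backend ops table can be sketched as below. The three
callback names mirror the ones listed above; everything else (struct
mock_dev, the fake_* backend, the integer request code, the 32-pair
limit) is hypothetical and only there to make the sketch self-contained.

```c
#include <stddef.h>
#include <stdint.h>

struct mock_dev;

/* Same three hooks as the commit message, with simplified argument types. */
struct mock_backend_ops {
	int (*setup)(struct mock_dev *dev);
	int (*send_request)(struct mock_dev *dev, int req, void *arg);
	int (*enable_qp)(struct mock_dev *dev, uint16_t pair_idx, int enable);
};

/* Hypothetical device: it only records what the backend was asked to do. */
struct mock_dev {
	const struct mock_backend_ops *ops;
	int fd;
	int last_req;
	uint32_t enabled_mask;
};

static int fake_setup(struct mock_dev *dev)
{
	dev->fd = 3; /* pretend a connection was opened */
	return 0;
}

static int fake_send_request(struct mock_dev *dev, int req, void *arg)
{
	(void)arg;
	dev->last_req = req;
	return 0;
}

static int fake_enable_qp(struct mock_dev *dev, uint16_t pair_idx, int enable)
{
	if (pair_idx >= 32)
		return -1;
	if (enable)
		dev->enabled_mask |= 1u << pair_idx;
	else
		dev->enabled_mask &= ~(1u << pair_idx);
	return 0;
}

static const struct mock_backend_ops fake_ops = {
	.setup = fake_setup,
	.send_request = fake_send_request,
	.enable_qp = fake_enable_qp,
};
```

Device code then calls dev->ops->setup(dev) and friends without knowing
which backend is behind the table, which is what lets a second backend
be plugged in later.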

Signed-off-by: Jianfeng Tan 
---
 drivers/net/virtio/virtio_user/vhost.h   | 17 +-
 drivers/net/virtio/virtio_user/vhost_user.c  | 65 +++-
 drivers/net/virtio/virtio_user/virtio_user_dev.c | 77 +++-
 drivers/net/virtio/virtio_user/virtio_user_dev.h |  5 ++
 4 files changed, 106 insertions(+), 58 deletions(-)

diff --git a/drivers/net/virtio/virtio_user/vhost.h 
b/drivers/net/virtio/virtio_user/vhost.h
index e54ac35..515e4fc 100644
--- a/drivers/net/virtio/virtio_user/vhost.h
+++ b/drivers/net/virtio/virtio_user/vhost.h
@@ -96,6 +96,8 @@ enum vhost_user_request {
VHOST_USER_MAX
 };
 
+const char * const vhost_msg_strings[VHOST_USER_MAX];
+
 struct vhost_memory_region {
uint64_t guest_phys_addr;
uint64_t memory_size; /* bytes */
@@ -103,8 +105,17 @@ struct vhost_memory_region {
uint64_t mmap_offset;
 };
 
-int vhost_user_sock(int vhostfd, enum vhost_user_request req, void *arg);
-int vhost_user_setup(const char *path);
-int vhost_user_enable_queue_pair(int vhostfd, uint16_t pair_idx, int enable);
+struct virtio_user_dev;
+
+struct virtio_user_backend_ops {
+   int (*setup)(struct virtio_user_dev *dev);
+   int (*send_request)(struct virtio_user_dev *dev,
+   enum vhost_user_request req,
+   void *arg);
+   int (*enable_qp)(struct virtio_user_dev *dev,
+uint16_t pair_idx,
+int enable);
+};
 
+struct virtio_user_backend_ops ops_user;
 #endif
diff --git a/drivers/net/virtio/virtio_user/vhost_user.c 
b/drivers/net/virtio/virtio_user/vhost_user.c
index 295ce16..a9ca10f 100644
--- a/drivers/net/virtio/virtio_user/vhost_user.c
+++ b/drivers/net/virtio/virtio_user/vhost_user.c
@@ -41,6 +41,7 @@
 #include 
 
 #include "vhost.h"
+#include "virtio_user_dev.h"
 
 /* The version of the protocol we support */
 #define VHOST_USER_VERSION0x1
@@ -255,24 +256,26 @@ prepare_vhost_memory_user(struct vhost_user_msg *msg, int 
fds[])
 
 static struct vhost_user_msg m;
 
-static const char * const vhost_msg_strings[] = {
-   [VHOST_USER_SET_OWNER] = "VHOST_USER_SET_OWNER",
-   [VHOST_USER_RESET_OWNER] = "VHOST_USER_RESET_OWNER",
-   [VHOST_USER_SET_FEATURES] = "VHOST_USER_SET_FEATURES",
-   [VHOST_USER_GET_FEATURES] = "VHOST_USER_GET_FEATURES",
-   [VHOST_USER_SET_VRING_CALL] = "VHOST_USER_SET_VRING_CALL",
-   [VHOST_USER_SET_VRING_NUM] = "VHOST_USER_SET_VRING_NUM",
-   [VHOST_USER_SET_VRING_BASE] = "VHOST_USER_SET_VRING_BASE",
-   [VHOST_USER_GET_VRING_BASE] = "VHOST_USER_GET_VRING_BASE",
-   [VHOST_USER_SET_VRING_ADDR] = "VHOST_USER_SET_VRING_ADDR",
-   [VHOST_USER_SET_VRING_KICK] = "VHOST_USER_SET_VRING_KICK",
-   [VHOST_USER_SET_MEM_TABLE] = "VHOST_USER_SET_MEM_TABLE",
-   [VHOST_USER_SET_VRING_ENABLE] = "VHOST_USER_SET_VRING_ENABLE",
+const char * const vhost_msg_strings[] = {
+   [VHOST_USER_SET_OWNER] = "VHOST_SET_OWNER",
+   [VHOST_USER_RESET_OWNER] = "VHOST_RESET_OWNER",
+   [VHOST_USER_SET_FEATURES] = "VHOST_SET_FEATURES",
+   [VHOST_USER_GET_FEATURES] = "VHOST_GET_FEATURES",
+   [VHOST_USER_SET_VRING_CALL] = "VHOST_SET_VRING_CALL",
+   [VHOST_USER_SET_VRING_NUM] = "VHOST_SET_VRING_NUM",
+   [VHOST_USER_SET_VRING_BASE] = "VHOST_SET_VRING_BASE",
+   [VHOST_USER_GET_VRING_BASE] = "VHOST_GET_VRING_BASE",
+   [VHOST_USER_SET_VRING_ADDR] = "VHOST_SET_VRING_ADDR",
+   [VHOST_USER_SET_VRING_KICK] = "VHOST_SET_VRING_KICK",
+   [VHOST_USER_SET_MEM_TABLE] = "VHOST_SET_MEM_TABLE",
+   [VHOST_USER_SET_VRING_ENABLE] = "VHOST_SET_VRING_ENABLE",
NULL,
 };
 
-int
-vhost_user_sock(int vhostfd, enum vhost_user_request req, void *arg)
+static int
+vhost_user_sock(struct virtio_user_dev *dev,
+   enum vhost_user_request req,
+   void *arg)
 {
struct vhost_user_msg msg;
struct vhost_vring_file *file = 0;
@@ -280,9 +283,9 @@ vhost_user_sock(int vhostfd, enum vhost_user_request req, 
void *arg)
int fds[VHOST_MEMORY_MAX_NREGIONS];
int fd_num = 0;
int i, len;
+   int vhostfd = dev->vhostfd;
 
RTE_SET_USED(m);
-   RTE_SET_USED(vhost_msg_strings);
 
PMD_DRV_LOG(INFO, "%s", vhost_msg_strings[req]);
 
@@ -403,15 +406,13 @@ vhost_user_sock(int vhostfd, enum vhost_user_request req, 
void *arg)
 
 /**
  * Set up environment to talk with a vhost user backend.
- * @param path
- *   - The path to vhost user unix socket file.
  *
  * @return
- *   - (-1) if fail to set up;
- *   - (>=0) if successful, and it is the fd to vhostfd.
+ *   - (-1) if fail;
+ *   - (0) if succeed.
  */
-int
-vhost_user_setup(const char *path)
+static int
+vhost_user_setup(struct virtio_

[dpdk-dev] [PATCH v3 6/7] net/virtio_user: enable offloading

2017-01-03 Thread Jianfeng Tan
When used with the vhost kernel backend, we can offload in both directions.
  - From vhost kernel to virtio_user, the offload is enabled so that the
    DPDK app can trust the flow is checksum-correct; if the DPDK app
    sends it through another port, the checksum needs to be
    recalculated or offloaded. The same applies to TSO.
  - From virtio_user to vhost_kernel, the offload is enabled so that the
    kernel can trust the flow is L4-checksum-correct, with no need to verify
    it; if the kernel will consume it, the DPDK app should make sure the
    L3 checksum is correctly set.
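
On the feature-bit side, the patch hides the offload bits from vhost
kernel on SET_FEATURES and adds them back on GET_FEATURES. A minimal
standalone sketch of that filtering follows. The two mask names come
from the patch and the bit numbers from the virtio specification; the
two helper functions are hypothetical and operate on plain values
rather than on ioctl arguments.

```c
#include <stdint.h>

/* Feature bit numbers from the virtio specification. */
#define VIRTIO_NET_F_CSUM        0
#define VIRTIO_NET_F_GUEST_CSUM  1
#define VIRTIO_NET_F_GUEST_TSO4  7
#define VIRTIO_NET_F_GUEST_TSO6  8
#define VIRTIO_NET_F_GUEST_ECN   9
#define VIRTIO_NET_F_GUEST_UFO  10
#define VIRTIO_NET_F_HOST_TSO4  11
#define VIRTIO_NET_F_HOST_TSO6  12

#define VHOST_KERNEL_GUEST_OFFLOADS_MASK \
	((1ULL << VIRTIO_NET_F_GUEST_CSUM) | \
	 (1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
	 (1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
	 (1ULL << VIRTIO_NET_F_GUEST_ECN)  | \
	 (1ULL << VIRTIO_NET_F_GUEST_UFO))

#define VHOST_KERNEL_HOST_OFFLOADS_MASK \
	((1ULL << VIRTIO_NET_F_HOST_TSO4) | \
	 (1ULL << VIRTIO_NET_F_HOST_TSO6) | \
	 (1ULL << VIRTIO_NET_F_CSUM))

/* Bits to hand to vhost kernel: it does not know the offload bits. */
static uint64_t mock_filter_set_features(uint64_t features)
{
	features &= ~VHOST_KERNEL_GUEST_OFFLOADS_MASK;
	features &= ~VHOST_KERNEL_HOST_OFFLOADS_MASK;
	return features;
}

/* Bits to report upward: tap supports the offloads, so add them back. */
static uint64_t mock_augment_get_features(uint64_t features)
{
	features |= VHOST_KERNEL_GUEST_OFFLOADS_MASK;
	features |= VHOST_KERNEL_HOST_OFFLOADS_MASK;
	return features;
}
```

Bits outside the two masks pass through both helpers unchanged.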

Signed-off-by: Jianfeng Tan 
---
 drivers/net/virtio/virtio_user/vhost_kernel.c | 61 ++-
 1 file changed, 59 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c 
b/drivers/net/virtio/virtio_user/vhost_kernel.c
index 1e7cdef..bdb4af2 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -91,6 +91,13 @@ struct vhost_memory_kernel {
 #define IFF_ATTACH_QUEUE 0x0200
 #define IFF_DETACH_QUEUE 0x0400
 
+/* Features for GSO (TUNSETOFFLOAD). */
+#define TUN_F_CSUM 0x01/* You can hand me unchecksummed packets. */
+#define TUN_F_TSO4 0x02/* I can handle TSO for IPv4 packets */
+#define TUN_F_TSO6 0x04/* I can handle TSO for IPv6 packets */
+#define TUN_F_TSO_ECN  0x08/* I can handle TSO with ECN bits. */
+#define TUN_F_UFO  0x10/* I can handle UFO packets */
+
 /* Constants */
 #define TUN_DEF_SNDBUF (1ull << 20)
 #define PATH_NET_TUN   "/dev/net/tun"
@@ -176,6 +183,28 @@ prepare_vhost_memory_kernel(void)
return vm;
 }
 
+/* with below features, vhost kernel does not need to do the checksum and TSO,
+ * these info will be passed to virtio_user through virtio net header.
+ */
+#define VHOST_KERNEL_GUEST_OFFLOADS_MASK   \
+   ((1ULL << VIRTIO_NET_F_GUEST_CSUM) |\
+(1ULL << VIRTIO_NET_F_GUEST_TSO4) |\
+(1ULL << VIRTIO_NET_F_GUEST_TSO6) |\
+(1ULL << VIRTIO_NET_F_GUEST_ECN)  |\
+(1ULL << VIRTIO_NET_F_GUEST_UFO))
+
+/* with below features, when flows from virtio_user to vhost kernel
+ * (1) if flows goes up through the kernel networking stack, it does not need
+ * to verify checksum, which can save CPU cycles;
+ * (2) if flows goes through a Linux bridge and outside from an interface
+ * (kernel driver), checksum and TSO will be done by GSO in kernel or even
+ * offloaded into real physical device.
+ */
+#define VHOST_KERNEL_HOST_OFFLOADS_MASK\
+   ((1ULL << VIRTIO_NET_F_HOST_TSO4) | \
+(1ULL << VIRTIO_NET_F_HOST_TSO6) | \
+(1ULL << VIRTIO_NET_F_CSUM))
+
 static int
 vhost_kernel_ioctl(struct virtio_user_dev *dev,
   enum vhost_user_request req,
@@ -196,10 +225,15 @@ vhost_kernel_ioctl(struct virtio_user_dev *dev,
arg = (void *)vm;
}
 
-   /* Does not work when VIRTIO_F_IOMMU_PLATFORM now, why? */
-   if (req_kernel == VHOST_SET_FEATURES)
+   if (req_kernel == VHOST_SET_FEATURES) {
+   /* Does not work when VIRTIO_F_IOMMU_PLATFORM now, why? */
*(uint64_t *)arg &= ~(1ULL << VIRTIO_F_IOMMU_PLATFORM);
 
+   /* VHOST kernel does not know about below flags */
+   *(uint64_t *)arg &= ~VHOST_KERNEL_GUEST_OFFLOADS_MASK;
+   *(uint64_t *)arg &= ~VHOST_KERNEL_HOST_OFFLOADS_MASK;
+   }
+
for (i = 0; i < VHOST_KERNEL_MAX_QUEUES; ++i) {
if (dev->vhostfds[i] < 0)
continue;
@@ -209,6 +243,15 @@ vhost_kernel_ioctl(struct virtio_user_dev *dev,
break;
}
 
+   if (!ret && req_kernel == VHOST_GET_FEATURES) {
+   /* with tap as the backend, all these features are supported
+* but not claimed by vhost-net, so we add them back when
+* reporting to upper layer.
+*/
+   *((uint64_t *)arg) |= VHOST_KERNEL_GUEST_OFFLOADS_MASK;
+   *((uint64_t *)arg) |= VHOST_KERNEL_HOST_OFFLOADS_MASK;
+   }
+
if (vm)
free(vm);
 
@@ -280,6 +323,12 @@ vhost_kernel_enable_queue_pair(struct virtio_user_dev *dev,
int hdr_size;
int vhostfd;
int tapfd;
+   unsigned int offload =
+   TUN_F_CSUM |
+   TUN_F_TSO4 |
+   TUN_F_TSO6 |
+   TUN_F_TSO_ECN |
+   TUN_F_UFO;
 
vhostfd = dev->vhostfds[pair_idx];
 
@@ -354,6 +403,14 @@ vhost_kernel_enable_queue_pair(struct virtio_user_dev *dev,
goto error;
}
 
+   /* TODO: before set the offload capabilities, we'd better (1) check
+* negotiated features to see if necessary to offload; (2) query tap
+* to see if it supports the offload capabilities.
+*/
+   if (ioctl(tapfd, TUNSETOFFLOAD, offload) != 0)
+   PMD

[dpdk-dev] [PATCH v3 7/7] net/virtio_user: enable multiqueue with vhost kernel

2017-01-03 Thread Jianfeng Tan
With vhost kernel, to enable multiqueue, the backend device in the
kernel must support the multiqueue feature. Specifically, with tap
as the backend, as linux/Documentation/networking/tuntap.txt shows,
we check if tap supports the IFF_MULTI_QUEUE feature.

For vhost kernel, each queue pair has its own vhost fd, with a tap
fd bound to that vhost fd. All tap fds are set with the same tap
interface name.
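
Since each queue pair has its own vhost fd, a per-vring request has to
be routed to the right fd and its index rewritten to 0 or 1 within the
pair. A standalone sketch of that mapping is below. The divide and
modulo by 2 follow the patch; the struct, the function name and the
bounds check are hypothetical additions for illustration.

```c
/* Hypothetical result type: which fd to use and the index within the pair. */
struct mock_vring_target {
	int vhostfd;
	unsigned int index;
};

/*
 * Map a global virtqueue index to (vhost fd, local index). Queues 2n and
 * 2n+1 belong to pair n. Returns 0 on success, -1 if the pair is out of
 * range or has no open fd.
 */
static int mock_select_vring(const int *vhostfds, unsigned int nr_pairs,
			     unsigned int queue_sel,
			     struct mock_vring_target *out)
{
	unsigned int pair = queue_sel / 2;

	if (pair >= nr_pairs || vhostfds[pair] < 0)
		return -1;

	out->vhostfd = vhostfds[pair];
	out->index = queue_sel % 2;
	return 0;
}
```

For example, with fds {10, 11} queue 3 maps to fd 11 with local index 1.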

Signed-off-by: Jianfeng Tan 
---
 drivers/net/virtio/virtio_user/vhost_kernel.c| 69 +---
 drivers/net/virtio/virtio_user/virtio_user_dev.c |  1 +
 2 files changed, 64 insertions(+), 6 deletions(-)

diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c 
b/drivers/net/virtio/virtio_user/vhost_kernel.c
index bdb4af2..023bdf8 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -206,6 +206,29 @@ prepare_vhost_memory_kernel(void)
 (1ULL << VIRTIO_NET_F_CSUM))
 
 static int
+tap_supporte_mq(void)
+{
+   int tapfd;
+   unsigned int tap_features;
+
+   tapfd = open(PATH_NET_TUN, O_RDWR);
+   if (tapfd < 0) {
+   PMD_DRV_LOG(ERR, "fail to open %s: %s",
+   PATH_NET_TUN, strerror(errno));
+   return -1;
+   }
+
+   if (ioctl(tapfd, TUNGETFEATURES, &tap_features) == -1) {
+   PMD_DRV_LOG(ERR, "TUNGETFEATURES failed: %s", strerror(errno));
+   close(tapfd);
+   return -1;
+   }
+
+   close(tapfd);
+   return tap_features & IFF_MULTI_QUEUE;
+}
+
+static int
 vhost_kernel_ioctl(struct virtio_user_dev *dev,
   enum vhost_user_request req,
   void *arg)
@@ -213,6 +236,8 @@ vhost_kernel_ioctl(struct virtio_user_dev *dev,
int i, ret = -1;
uint64_t req_kernel;
struct vhost_memory_kernel *vm = NULL;
+   int vhostfd;
+   unsigned int queue_sel;
 
PMD_DRV_LOG(INFO, "%s", vhost_msg_strings[req]);
 
@@ -232,15 +257,37 @@ vhost_kernel_ioctl(struct virtio_user_dev *dev,
/* VHOST kernel does not know about below flags */
*(uint64_t *)arg &= ~VHOST_KERNEL_GUEST_OFFLOADS_MASK;
*(uint64_t *)arg &= ~VHOST_KERNEL_HOST_OFFLOADS_MASK;
+
+   *(uint64_t *)arg &= ~(1ULL << VIRTIO_NET_F_MQ);
}
 
-   for (i = 0; i < VHOST_KERNEL_MAX_QUEUES; ++i) {
-   if (dev->vhostfds[i] < 0)
-   continue;
+   switch (req_kernel) {
+   case VHOST_SET_VRING_NUM:
+   case VHOST_SET_VRING_ADDR:
+   case VHOST_SET_VRING_BASE:
+   case VHOST_GET_VRING_BASE:
+   case VHOST_SET_VRING_KICK:
+   case VHOST_SET_VRING_CALL:
+   queue_sel = *(unsigned int *)arg;
+   vhostfd = dev->vhostfds[queue_sel / 2];
+   *(unsigned int *)arg = queue_sel % 2;
+   PMD_DRV_LOG(DEBUG, "vhostfd=%d, index=%u",
+   vhostfd, *(unsigned int *)arg);
+   break;
+   default:
+   vhostfd = -1;
+   }
+   if (vhostfd == -1) {
+   for (i = 0; i < VHOST_KERNEL_MAX_QUEUES; ++i) {
+   if (dev->vhostfds[i] < 0)
+   continue;
 
-   ret = ioctl(dev->vhostfds[i], req_kernel, arg);
-   if (ret < 0)
-   break;
+   ret = ioctl(dev->vhostfds[i], req_kernel, arg);
+   if (ret < 0)
+   break;
+   }
+   } else {
+   ret = ioctl(vhostfd, req_kernel, arg);
}
 
if (!ret && req_kernel == VHOST_GET_FEATURES) {
@@ -250,6 +297,12 @@ vhost_kernel_ioctl(struct virtio_user_dev *dev,
 */
*((uint64_t *)arg) |= VHOST_KERNEL_GUEST_OFFLOADS_MASK;
*((uint64_t *)arg) |= VHOST_KERNEL_HOST_OFFLOADS_MASK;
+
+   /* vhost_kernel will not declare this feature, but it does
+* support multi-queue.
+*/
+   if (tap_supporte_mq())
+   *(uint64_t *)arg |= (1ull << VIRTIO_NET_F_MQ);
}
 
if (vm)
@@ -329,6 +382,7 @@ vhost_kernel_enable_queue_pair(struct virtio_user_dev *dev,
TUN_F_TSO6 |
TUN_F_TSO_ECN |
TUN_F_UFO;
+   int req_mq = (dev->max_queue_pairs > 1);
 
vhostfd = dev->vhostfds[pair_idx];
 
@@ -382,6 +436,9 @@ vhost_kernel_enable_queue_pair(struct virtio_user_dev *dev,
goto error;
}
 
+   if (req_mq)
+   ifr.ifr_flags |= IFF_MULTI_QUEUE;
+
if (dev->ifname)
strncpy(ifr.ifr_name, dev->ifname, IFNAMSIZ);
else
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c 
b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index c40b77e..2d9d989 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virt

[dpdk-dev] [PATCH v3 5/7] net/virtio_user: add vhost kernel support

2017-01-03 Thread Jianfeng Tan
This patch adds support for vhost kernel as the backend for virtio_user.
Three main hook functions are added:
  - vhost_kernel_setup() to open the char device; each vq pair needs one
    vhostfd;
  - vhost_kernel_ioctl() to exchange control messages with the vhost
    kernel module;
  - vhost_kernel_enable_queue_pair() to open the tap device and set it
    as the backend of the corresponding vhost fd (that is to say, vq pair).

Signed-off-by: Jianfeng Tan 
---
 doc/guides/rel_notes/release_17_02.rst   |  20 ++
 drivers/net/virtio/Makefile  |   1 +
 drivers/net/virtio/virtio_user/vhost.h   |   2 +
 drivers/net/virtio/virtio_user/vhost_kernel.c| 373 +++
 drivers/net/virtio/virtio_user/virtio_user_dev.c |  21 +-
 drivers/net/virtio/virtio_user/virtio_user_dev.h |   6 +
 6 files changed, 420 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_user/vhost_kernel.c

diff --git a/doc/guides/rel_notes/release_17_02.rst 
b/doc/guides/rel_notes/release_17_02.rst
index 180af82..7354df5 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -52,6 +52,26 @@ New Features
   See the :ref:`Generic flow API ` documentation for more
   information.
 
+* **virtio_user with vhost-kernel as another exceptional path.**
+
+  Previously, we upstreamed a virtual device, virtio_user with vhost-user
+  as the backend, as a way for IPC (Inter-Process Communication) and user
+  space container networking.
+
+  Virtio_user with vhost-kernel as the backend is a solution for exceptional
+  path, such as KNI, which exchanges packets with kernel networking stack.
+  This solution is very promising in:
+
+  * maintenance: vhost and vhost-net (kernel) is upstreamed and extensively
+used kernel module.
+  * features: vhost-net is born to be a networking solution, which has
+lots of networking related features, like multi queue, tso, multi-seg
+mbuf, etc.
+  * performance: similar to KNI, this solution would use one or more
+kthreads to send/receive packets from user space DPDK applications,
+which has little impact on user space polling thread (except that
+it might enter into kernel space to wake up those kthreads if
+necessary).
 
 Resolved Issues
 ---
diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 97972a6..faeffb2 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -60,6 +60,7 @@ endif
 
 ifeq ($(CONFIG_RTE_VIRTIO_USER),y)
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_user.c
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_kernel.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/virtio_user_dev.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user_ethdev.c
 endif
diff --git a/drivers/net/virtio/virtio_user/vhost.h 
b/drivers/net/virtio/virtio_user/vhost.h
index 515e4fc..5c983bd 100644
--- a/drivers/net/virtio/virtio_user/vhost.h
+++ b/drivers/net/virtio/virtio_user/vhost.h
@@ -118,4 +118,6 @@ struct virtio_user_backend_ops {
 };
 
 struct virtio_user_backend_ops ops_user;
+struct virtio_user_backend_ops ops_kernel;
+
 #endif
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c 
b/drivers/net/virtio/virtio_user/vhost_kernel.c
new file mode 100644
index 000..1e7cdef
--- /dev/null
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -0,0 +1,373 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEG

Re: [dpdk-dev] [PATCH] lib/librte_vhost: fix memory leak

2017-01-03 Thread Yuanhan Liu
On Tue, Jan 03, 2017 at 10:57:55PM -0500, Yong Wang wrote:
> In function vhost_new_device(), current code does not free 'dev'
> in "i == MAX_VHOST_DEVICE" condition statements. It will lead to a
> memory leak.

Nice catch!
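
A minimal sketch of the leak pattern and its fix, assuming a fixed-size
slot table. The names new_device(), devices[] and MAX_DEVICES are
hypothetical stand-ins for illustration, not the actual librte_vhost code:

```c
#include <stdlib.h>

#define MAX_DEVICES 4	/* hypothetical table size */

struct device {
	int id;
};

static struct device *devices[MAX_DEVICES];

/* Allocate a device and store it in the first free slot.
 * Returns the slot index, or -1 on failure. */
static int new_device(void)
{
	struct device *dev = calloc(1, sizeof(*dev));
	int i;

	if (dev == NULL)
		return -1;

	for (i = 0; i < MAX_DEVICES; i++) {
		if (devices[i] == NULL)
			break;
	}

	if (i == MAX_DEVICES) {
		/* No free slot: release 'dev' before bailing out,
		 * otherwise the allocation above is leaked. */
		free(dev);
		return -1;
	}

	devices[i] = dev;
	dev->id = i;
	return i;
}
```

The point is only that every early return after the allocation has to
release it.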

Here are a few minor things you might need to pay attention to for future
contributions:

- a fix patch needs a "Fixes:" line, like the following

  Fixes: 45ca9c6f7bc6 ("vhost: get rid of linked list for devices")

- the prefix for vhost lib is "vhost: ". And FYI, for PMD drivers, it's
  'net/PMD_NAME', say 'net/virtio'.


For your convenience, I have fixed the two while applying. And thanks
for the fix.

Applied to dpdk-next-virtio.

--yliu


Re: [dpdk-dev] [PATCH v3 2/7] net/virtio_user: fix not properly reset device

2017-01-03 Thread Yuanhan Liu
On Wed, Jan 04, 2017 at 03:59:21AM +, Jianfeng Tan wrote:
> virtio_user is not properly reset when users call vtpci_reset(),
> as it ignores VIRTIO_CONFIG_STATUS_RESET status in
> virtio_user_set_status().
> 
> This might lead to initialization failure as it starts to re-init
> the device before sending RESET message to backend. Besides, previous
> callfds and kickfds are not closed.
> 
> To fix it, we add support to disable virtqueues when it's set to
> DRIVER OK status, and re-init fields in struct virtio_user_dev.
> 
> Fixes: e9efa4d93821 ("net/virtio-user: add new virtual PCI driver")
> Fixes: 37a7eb2ae816 ("net/virtio-user: add device emulation layer")
> 
> CC: sta...@dpdk.org
> 
> Signed-off-by: Jianfeng Tan 

Note that, typically, there should be no empty line between 'Cc' and SoB.

> ---
>  drivers/net/virtio/virtio_user/virtio_user_dev.c | 26
>  drivers/net/virtio/virtio_user_ethdev.c          | 15
>  2 files changed, 27 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c
> index 0d7e17b..a38398b 100644
> --- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
> +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
> @@ -182,7 +182,17 @@ virtio_user_start_device(struct virtio_user_dev *dev)
>  
>  int virtio_user_stop_device(struct virtio_user_dev *dev)

The function doesn't seem to be well named: "dev_stop" was the first thing
that came to my mind when I saw it :/

Rename it to "xxx_reset_device"?

--yliu


Re: [dpdk-dev] [PATCH v3 3/7] net/virtio_user: move vhost user specific code

2017-01-03 Thread Yuanhan Liu
On Wed, Jan 04, 2017 at 03:59:22AM +, Jianfeng Tan wrote:
> To support vhost kernel as the backend of net_virtio_user in coming
> patches, we move vhost_user specific structs and macros into
> vhost_user.c, and only keep common definitions in vhost.h.
> 
> Besides, remove VHOST_USER_MQ feature check.

Again, I have to ask, why? You not only remove the check, you also
remove this feature setting, which seems to break the MQ support?

--yliu


Re: [dpdk-dev] [PATCH v3 4/7] net/virtio_user: abstract virtio user backend ops

2017-01-03 Thread Yuanhan Liu
On Wed, Jan 04, 2017 at 03:59:23AM +, Jianfeng Tan wrote:
> +struct virtio_user_backend_ops ops_user;

Better to qualify it with "extern const" ...
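
A minimal sketch of what that could look like, assuming a trimmed-down,
hypothetical struct backend_ops in place of struct virtio_user_backend_ops.
Without "extern", the line in the header is a tentative definition in every
file that includes it; with "extern const", the header only declares the
object and exactly one .c file defines it:

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct virtio_user_backend_ops. */
struct backend_ops {
	int (*setup)(void);
};

/* Header side: declaration only, no storage is allocated here. */
extern const struct backend_ops ops_user;

/* Source side (one .c file only): the single definition. */
static int user_setup(void)
{
	return 0;
}

const struct backend_ops ops_user = {
	.setup = user_setup,
};
```

The "const" also lets the table live in read-only data, since the ops are
not meant to change at run time.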

--yliu


Re: [dpdk-dev] [PATCH v3 5/7] net/virtio_user: add vhost kernel support

2017-01-03 Thread Yuanhan Liu
On Wed, Jan 04, 2017 at 03:59:24AM +, Jianfeng Tan wrote:
> +static int
> +vhost_kernel_ioctl(struct virtio_user_dev *dev,
> +enum vhost_user_request req,
> +void *arg)
> +{
> + int i, ret = -1;
> + uint64_t req_kernel;
> + struct vhost_memory_kernel *vm = NULL;
> +
> + PMD_DRV_LOG(INFO, "%s", vhost_msg_strings[req]);
> +
> + req_kernel = vhost_req_user_to_kernel[req];
> +
> + if (req_kernel == VHOST_SET_MEM_TABLE) {
> + vm = prepare_vhost_memory_kernel();
> + if (!vm)
> + return -1;
> + arg = (void *)vm;
> + }
> +
> + /* Does not work when VIRTIO_F_IOMMU_PLATFORM now, why? */
> + if (req_kernel == VHOST_SET_FEATURES)
> + *(uint64_t *)arg &= ~(1ULL << VIRTIO_F_IOMMU_PLATFORM);

Did you miss my comments on the last version?

--yliu


Re: [dpdk-dev] [PATCH v5 00/17] net/i40e: consistent filter API

2017-01-03 Thread Wu, Jingjing


> -----Original Message-----
> From: Xing, Beilei
> Sent: Wednesday, January 4, 2017 11:23 AM
> To: Wu, Jingjing ; Zhang, Helin 
> Cc: dev@dpdk.org
> Subject: [PATCH v5 00/17] net/i40e: consistent filter API
> 
> The patch set depends on Adrien's Generic flow API(rte_flow).
> 
> The patches mainly finish following functions:
> 1) Store and restore all kinds of filters.
> 2) Parse all kinds of filters.
> 3) Add flow validate function.
> 4) Add flow create function.
> 5) Add flow destroy function.
> 6) Add flow flush function.
> 
> v5 changes:
>  Change some local variable name.
>  Add removing i40e_flow_list during device uninit.
>  Fix compile error when gcc compile option isn't '-O0'.
> 
> v4 changes:
>  Change I40E_TCI_MASK with 0x to align with testpmd.
>  Modify the stats show when restoring filters.
> 
> v3 changes:
>  Set the related cause pointer to a non-NULL value when error happens.
>  Change return value when error happens.
>  Modify filter_del parameter with key.
>  Malloc filter after checking when delete a filter.
>  Delete meaningless initialization.
>  Add return value when there's error.
>  Change global variable definition.
>  Modify some function declaration.
> 
> v2 changes:
>  Add i40e_flow.c, all flow ops are implemented in the file.
>  Change the whole implementation of all parse flow functions.
>  Update error info for all flow ops.
>  Add flow_list to store flows created.
> 
> Beilei Xing (17):
>   net/i40e: store ethertype filter
>   net/i40e: store tunnel filter
>   net/i40e: store flow director filter
>   net/i40e: restore ethertype filter
>   net/i40e: restore tunnel filter
>   net/i40e: restore flow director filter
>   net/i40e: add flow validate function
>   net/i40e: parse flow director filter
>   net/i40e: parse tunnel filter
>   net/i40e: add flow create function
>   net/i40e: add flow destroy function
>   net/i40e: destroy ethertype filter
>   net/i40e: destroy tunnel filter
>   net/i40e: destroy flow directory filter
>   net/i40e: add flow flush function
>   net/i40e: flush ethertype filters
>   net/i40e: flush tunnel filters
> 
>  drivers/net/i40e/Makefile  |2 +
>  drivers/net/i40e/i40e_ethdev.c |  526 ++--
> drivers/net/i40e/i40e_ethdev.h |  173 
>  drivers/net/i40e/i40e_fdir.c   |  140 +++-
>  drivers/net/i40e/i40e_flow.c   | 1772
> 
>  5 files changed, 2547 insertions(+), 66 deletions(-)  create mode 100644
> drivers/net/i40e/i40e_flow.c
> 

Acked-by: Jingjing Wu 

Thanks
Jingjing


Re: [dpdk-dev] [PATCH v3 3/7] net/virtio_user: move vhost user specific code

2017-01-03 Thread Tan, Jianfeng
Hi Yuanhan,

> -----Original Message-----
> From: Yuanhan Liu [mailto:yuanhan@linux.intel.com]
> Sent: Wednesday, January 4, 2017 2:03 PM
> To: Tan, Jianfeng
> Cc: dev@dpdk.org; Yigit, Ferruh; Liang, Cunming
> Subject: Re: [PATCH v3 3/7] net/virtio_user: move vhost user specific code
> 
> On Wed, Jan 04, 2017 at 03:59:22AM +, Jianfeng Tan wrote:
> > To support vhost kernel as the backend of net_virtio_user in coming
> > patches, we move vhost_user specific structs and macros into
> > vhost_user.c, and only keep common definitions in vhost.h.
> >
> > Besides, remove VHOST_USER_MQ feature check.
> 
> Again, I have to ask, why? You not only remove the check, you also
> remove this feature setting, which seems to break the MQ support?

I have answered it here:
http://dpdk.org/ml/archives/dev/2016-December/053520.html

To be clearer, VHOST_USER_MQ is a poorly defined macro:
#define VHOST_USER_MQ (1ULL << VHOST_USER_F_PROTOCOL_FEATURES),
which is a feature bit in the vhost-user protocol.

According to QEMU's docs/specs/vhost-user.txt, "If
VHOST_USER_F_PROTOCOL_FEATURES has not been negotiated, the ring is
initialized in an enabled state."

But our DPDK vhost library does not take care of this feature bit. It just
uses this as the default: the ring is initialized in a disabled state. And
our virtio_user with vhost-user does send VHOST_USER_SET_VRING_ENABLE to
enable each queue pair.
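
The rule quoted from the spec can be sketched as below. The helper name
ring_starts_enabled() is hypothetical; the bit number 30 is the value
vhost-user.txt assigns to VHOST_USER_F_PROTOCOL_FEATURES:

```c
#include <stdbool.h>
#include <stdint.h>

#define VHOST_USER_F_PROTOCOL_FEATURES 30

/* Initial vring state as described by the vhost-user spec:
 * enabled if the protocol-features bit was NOT negotiated,
 * disabled (until VHOST_USER_SET_VRING_ENABLE) if it was. */
static bool ring_starts_enabled(uint64_t negotiated_features)
{
	uint64_t bit = 1ULL << VHOST_USER_F_PROTOCOL_FEATURES;

	return (negotiated_features & bit) == 0;
}
```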

So I think it's not necessary to add it back.

What do you think?

Thanks,
Jianfeng

> 
>   --yliu


[dpdk-dev] Reply: Re: [PATCH] lib/librte_vhost: fix memory leak

2017-01-03 Thread wang . yong19
> From: Yuanhan Liu
> Date: 2017/01/04 12:02
> To: Yong Wang
> Cc: dev@dpdk.org
> Subject: Re: [PATCH] lib/librte_vhost: fix memory leak
> 
> On Tue, Jan 03, 2017 at 10:57:55PM -0500, Yong Wang wrote:
> > In function vhost_new_device(), current code does not free 'dev'
> > in "i == MAX_VHOST_DEVICE" condition statements. It will lead to a
> > memory leak.
> 
> Nice catch!
> 
> Here are a few minor things you might need to pay attention to for future
> contributions:
> 
> - a fix patch needs a "Fixes:" line, like the following
> 
>   Fixes: 45ca9c6f7bc6 ("vhost: get rid of linked list for devices")
> 
> - the prefix for vhost lib is "vhost: ". And FYI, for PMD drivers, it's
>   'net/PMD_NAME', say 'net/virtio'.
> 
> 
> For your convenience, I have fixed the two while applying. And thanks
> for the fix.
> 
> Applied to dpdk-next-virtio.
> 
>--yliu

Thanks for your advice. 



Re: [dpdk-dev] [PATCH v2 5/9] net/virtio: setup rxq interrupts

2017-01-03 Thread Tan, Jianfeng



On 12/30/2016 2:27 PM, Yuanhan Liu wrote:
> On Thu, Dec 29, 2016 at 07:30:39AM +, Jianfeng Tan wrote:
> > This patch mainly allocates a structure to store the queue/irq mapping,
> > and configures the queue/irq mapping down through PCI ops. It also creates
> > eventfds for each Rx queue and tells the kernel about the eventfd/intr
> > binding.
> >
> > Most importantly, unlike previous NICs (which usually implement
> > this logic in dev_start()), virtio's interrupt settings should be
> > configured down to QEMU before sending the DRIVER_OK notification.
>
> Isn't it obvious we have to have all driver stuff (including interrupt
> settings) configured properly before setting DRIVER_OK? :) That said,
> it's meaningless to state the fact that virtio acts differently than other
> nics here on dev_start/stop.
>
> > Note: We only support 1:1 queue/irq mapping so far, which means each
> > rx queue has one exclusive interrupt (corresponding to irqfd in the
> > qemu/kvm) to get notified when packets are available on that queue.
>
> That means the "vectors=N" option has to be set correctly
> in QEMU, otherwise it won't work?

Yes, actually, the correct value should be "vectors>=N+1", with N 
standing for the number of queue pairs. It's due to the hard coded 
mapping logic:

0 -> config irq
1 -> rxq0
2 -> rxq1
...
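
The hard-coded mapping above can be sketched as follows. The helper names
rxq_to_vector() and min_vectors() are hypothetical, used only to make the
arithmetic explicit:

```c
#include <stdint.h>

#define CONFIG_IRQ_VECTOR 0	/* vector 0 is reserved for the config irq */

/* Vector assigned to Rx queue 'rxq_idx' under the 1:1 mapping:
 * rxq0 -> 1, rxq1 -> 2, ... */
static uint16_t rxq_to_vector(uint16_t rxq_idx)
{
	return (uint16_t)(CONFIG_IRQ_VECTOR + 1 + rxq_idx);
}

/* Smallest "vectors=" value that covers the config irq plus one
 * vector per queue pair, i.e. N + 1. */
static uint16_t min_vectors(uint16_t nr_queue_pairs)
{
	return (uint16_t)(nr_queue_pairs + 1);
}
```

For example, a device with 2 queue pairs would need "vectors" set to at
least 3 under this mapping.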


> If so, you also have to document it somewhere.


Agreed.

[...]

> > +
> > +   if (virtio_queues_bind_intr(dev) < 0) {
> > +   PMD_INIT_LOG(ERR, "Failed to bind queue/interrupt");
> > +   return -1;
>
> You have to free intr_handle->intr_vec, otherwise a memory leak occurs.


It's freed at dev_close(). Do you mean freeing and reallocating here? As
nr_rx_queues is not a changeable value, I don't see the necessity here.
Am I missing something?


Thanks,
Jianfeng


Re: [dpdk-dev] [PATCH v3 3/7] net/virtio_user: move vhost user specific code

2017-01-03 Thread Yuanhan Liu
On Wed, Jan 04, 2017 at 06:46:34AM +, Tan, Jianfeng wrote:
> Hi Yuanhan,
> 
> > -----Original Message-----
> > From: Yuanhan Liu [mailto:yuanhan@linux.intel.com]
> > Sent: Wednesday, January 4, 2017 2:03 PM
> > To: Tan, Jianfeng
> > Cc: dev@dpdk.org; Yigit, Ferruh; Liang, Cunming
> > Subject: Re: [PATCH v3 3/7] net/virtio_user: move vhost user specific code
> > 
> > On Wed, Jan 04, 2017 at 03:59:22AM +, Jianfeng Tan wrote:
> > > To support vhost kernel as the backend of net_virtio_user in coming
> > > patches, we move vhost_user specific structs and macros into
> > > vhost_user.c, and only keep common definitions in vhost.h.
> > >
> > > Besides, remove VHOST_USER_MQ feature check.
> > 
> > Again, I have to ask, why? You not only remove the check, you also
> > remove this feature setting, which seems to break the MQ support?
> 
> I have answered it here:
> http://dpdk.org/ml/archives/dev/2016-December/053520.html

I thought we had reached an agreement :/

> 
> To be clearer, VHOST_USER_MQ is a poorly defined macro:
> #define VHOST_USER_MQ (1ULL << VHOST_USER_F_PROTOCOL_FEATURES),
> which is a feature bit in the vhost-user protocol.

Yes, it's again named wrongly.

> According to QEMU's docs/specs/vhost-user.txt, "If
> VHOST_USER_F_PROTOCOL_FEATURES has not been negotiated, the ring is
> initialized in an enabled state."
> 
> But our DPDK vhost library does not take care of this feature bit.
> It just uses this as the default: the ring is initialized in a disabled state. And
> our virtio_user with vhost-user does send VHOST_USER_SET_VRING_ENABLE to
> enable each queue pair.

VHOST_USER_F_PROTOCOL_FEATURES is a fundamental feature for quite a lot of
vhost-user extended features, including MQ. If it's not set, MQ
should not work.

It may still work in your case, because you made an assumption that the
vhost backend supports the MQ feature (which is true nowadays, as
the feature has been there for quite a while). However, that's not an
assumption you could make when the vhost-user MQ support was added.
And such a feature bit (including PROTOCOL_F_MQ) makes sure
that we will not try to enable MQ with an older vhost backend that
doesn't have the support.

Put simply, this feature is needed, and as the feature name states,
it's needed only for vhost-user.
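
A minimal sketch of that gating, assuming a hypothetical helper
backend_supports_mq(). The bit numbers are the ones vhost-user.txt
assigns (30 for VHOST_USER_F_PROTOCOL_FEATURES, 0 for
VHOST_USER_PROTOCOL_F_MQ):

```c
#include <stdbool.h>
#include <stdint.h>

#define VHOST_USER_F_PROTOCOL_FEATURES	30
#define VHOST_USER_PROTOCOL_F_MQ	0

/* MQ may only be used if the backend offered protocol features at all
 * and, within those, advertised the MQ protocol feature. */
static bool backend_supports_mq(uint64_t features, uint64_t protocol_features)
{
	if ((features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES)) == 0)
		return false;

	return (protocol_features & (1ULL << VHOST_USER_PROTOCOL_F_MQ)) != 0;
}
```

With a check like this, an older backend that lacks either bit is never
asked to enable more than one queue pair.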

--yliu

> 
> So I think it's not necessary to add it back.
> 
> What do you think?
> 
> Thanks,
> Jianfeng
> 
> > 
> > --yliu

