[dpdk-dev] [PATCH] net/i40evf: fix casting between structs

2016-11-30 Thread Wu, Jingjing


> -Original Message-
> From: Yigit, Ferruh
> Sent: Wednesday, November 30, 2016 12:08 AM
> To: Wu, Jingjing ; dev at dpdk.org
> Cc: Zhang, Helin 
> Subject: Re: [dpdk-dev] [PATCH] net/i40evf: fix casting between structs
> 
> On 11/27/2016 9:35 AM, Jingjing Wu wrote:
> > Casting from structs which lay out data in typed members to structs
> > which have flat memory buffers, will cause problems if the alignment
> > of the former isn't as expected.
> > This patch removes the casting between structs.
> >
> > Fixes: ae19955e7c86 ("i40evf: support reporting PF reset")
> > Signed-off-by: Jingjing Wu 
> > ---
> >  drivers/net/i40e/i40e_ethdev_vf.c | 27 +++
> >  1 file changed, 15 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/net/i40e/i40e_ethdev_vf.c
> > b/drivers/net/i40e/i40e_ethdev_vf.c
> > index aa306d6..53d7c87 100644
> > --- a/drivers/net/i40e/i40e_ethdev_vf.c
> > +++ b/drivers/net/i40e/i40e_ethdev_vf.c
> > @@ -1336,8 +1336,9 @@ i40evf_handle_aq_msg(struct rte_eth_dev *dev)
> > struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data-
> >dev_private);
> > struct i40e_vf *vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data-
> >dev_private);
> > struct i40e_arq_event_info info;
> > -   struct i40e_virtchnl_msg *v_msg;
> > -   uint16_t pending, opcode;
> > +   uint16_t pending, aq_opc;
> > +   enum i40e_virtchnl_ops msg_opc;
> > +   enum i40e_status_code msg_ret;
> > int ret;
> >
> > info.buf_len = I40E_AQ_BUF_SZ;
> > @@ -1346,7 +1347,6 @@ i40evf_handle_aq_msg(struct rte_eth_dev *dev)
> > return;
> > }
> > info.msg_buf = vf->aq_resp;
> > -   v_msg = (struct i40e_virtchnl_msg *)&info.desc;
> >
> > pending = 1;
> > while (pending) {
> > @@ -1357,32 +1357,35 @@ i40evf_handle_aq_msg(struct rte_eth_dev *dev)
> > "ret: %d", ret);
> > break;
> > }
> > -   opcode = rte_le_to_cpu_16(info.desc.opcode);
> > -
> > -   switch (opcode) {
> > +   aq_opc = rte_le_to_cpu_16(info.desc.opcode);
> > +   msg_opc = (enum i40e_virtchnl_ops)rte_le_to_cpu_32(
> > + info.desc.cookie_high);
> > +   msg_ret = (enum i40e_status_code)rte_le_to_cpu_32(
> > + info.desc.cookie_low);
> 
> What do you think commenting cookie_high is opcode and cookie_low is
> return_value?
> 
OK. Will add some comments.

> > +   switch (aq_opc) {
> > case i40e_aqc_opc_send_msg_to_vf:
> > -   if (v_msg->v_opcode == I40E_VIRTCHNL_OP_EVENT)
> > +   if (msg_opc == I40E_VIRTCHNL_OP_EVENT)
> > /* process event*/
> > i40evf_handle_pf_event(dev, info.msg_buf,
> >info.msg_len);
> > else {
> > /* read message and it's expected one */
> > -   if (v_msg->v_opcode == vf->pend_cmd) {
> > -   vf->cmd_retval = v_msg->v_retval;
> > +   if (msg_opc == vf->pend_cmd) {
> > +   vf->cmd_retval = msg_ret;
> > /* prevent compiler reordering */
> > rte_compiler_barrier();
> > _clear_cmd(vf);
> > } else
> > PMD_DRV_LOG(ERR, "command
> mismatch,"
> > "expect %u, get %u",
> > -   vf->pend_cmd, v_msg-
> >v_opcode);
> > +   vf->pend_cmd, msg_ret);
> 
> s/msg_ret/msg_opc/ ?
Yes, should use msg_opc here. Thanks!

Will update!


Thanks
Jingjing


[dpdk-dev] [PATCH v2] i40e: Fix eth_i40e_dev_init sequence on ThunderX

2016-11-30 Thread Asok Tiyyagura
I can confirm we ran into a similar issue while using ixgbe.


-Asok


Hi
>
> i40e_asq_send_command: rd32 & wr32 under ThunderX gives unpredictable
>results. To solve this include rte memory barriers
>
> Signed-off-by: Satha Rao 
> ---
>  drivers/net/i40e/base/i40e_osdep.h | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/drivers/net/i40e/base/i40e_osdep.h 
> b/drivers/net/i40e/base/i40e_osdep.h
> index 38e7ba5..ffa3160 100644
> --- a/drivers/net/i40e/base/i40e_osdep.h
> +++ b/drivers/net/i40e/base/i40e_osdep.h
> @@ -158,7 +158,13 @@ do { 
>\
>((volatile uint32_t *)((char *)(a)->hw_addr + (reg)))
>  static inline uint32_t i40e_read_addr(volatile void *addr)
>  {
> +#if defined(RTE_ARCH_ARM64)
> + uint32_t val = rte_le_to_cpu_32(I40E_PCI_REG(addr));
> + rte_rmb();
> + return val;

If you really need an rmb/wmb with MMIO read/writes on ARM,
I think you can avoid #ifdefs here and use rte_smp_rmb/rte_smp_wmb.
BTW, I suppose if you need it for i40e, you would need it for other devices too.
Konstantin

> +#else
>return rte_le_to_cpu_32(I40E_PCI_REG(addr));
> +#endif
>  }
>  #define I40E_PCI_REG_WRITE(reg, value) \
>do { I40E_PCI_REG((reg)) = rte_cpu_to_le_32(value); } while (0)
> @@ -171,8 +177,16 @@ static inline uint32_t i40e_read_addr(volatile void 
> *addr)
>I40E_PCI_REG_WRITE(I40E_PCI_REG_ADDR((hw), (reg)), (value))
>
>  #define rd32(a, reg) i40e_read_addr(I40E_PCI_REG_ADDR((a), (reg)))
> +#if defined(RTE_ARCH_ARM64)
> +#define wr32(a, reg, value) \
> + do { \
> + I40E_PCI_REG_WRITE(I40E_PCI_REG_ADDR((a), (reg)), (value)); \
> + rte_wmb(); \
> + } while (0)
> +#else
>  #define wr32(a, reg, value) \
>I40E_PCI_REG_WRITE(I40E_PCI_REG_ADDR((a), (reg)), (value))
> +#endif
>  #define flush(a) i40e_read_addr(I40E_PCI_REG_ADDR((a), (I40E_GLGEN_STAT)))
>
>  #define ARRAY_SIZE(arr) (sizeof(arr)/sizeof(arr[0]))
> --
> 2.7.4



[dpdk-dev] [PATCH v2] net/i40evf: fix casting between structs

2016-11-30 Thread Jingjing Wu
Casting from structs which lay out data in typed members
to structs which have flat memory buffers, will cause
problems if the alignment of the former isn't as expected.
This patch removes the casting between structs.

Fixes: ae19955e7c86 ("i40evf: support reporting PF reset")
Signed-off-by: Jingjing Wu 
---
v2 change:
 correct the arguments in log.
 add more comments.

 drivers/net/i40e/i40e_ethdev_vf.c | 31 +++
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev_vf.c 
b/drivers/net/i40e/i40e_ethdev_vf.c
index aa306d6..5ec5264 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1336,8 +1336,9 @@ i40evf_handle_aq_msg(struct rte_eth_dev *dev)
struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
struct i40e_vf *vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
struct i40e_arq_event_info info;
-   struct i40e_virtchnl_msg *v_msg;
-   uint16_t pending, opcode;
+   uint16_t pending, aq_opc;
+   enum i40e_virtchnl_ops msg_opc;
+   enum i40e_status_code msg_ret;
int ret;

info.buf_len = I40E_AQ_BUF_SZ;
@@ -1346,7 +1347,6 @@ i40evf_handle_aq_msg(struct rte_eth_dev *dev)
return;
}
info.msg_buf = vf->aq_resp;
-   v_msg = (struct i40e_virtchnl_msg *)&info.desc;

pending = 1;
while (pending) {
@@ -1357,32 +1357,39 @@ i40evf_handle_aq_msg(struct rte_eth_dev *dev)
"ret: %d", ret);
break;
}
-   opcode = rte_le_to_cpu_16(info.desc.opcode);
-
-   switch (opcode) {
+   aq_opc = rte_le_to_cpu_16(info.desc.opcode);
+   /* For the message sent from pf to vf, opcode is stored in
+* cookie_high of struct i40e_aq_desc, while return error code
+* are stored in cookie_low, Which is done by
+* i40e_aq_send_msg_to_vf in PF driver.*/
+   msg_opc = (enum i40e_virtchnl_ops)rte_le_to_cpu_32(
+ info.desc.cookie_high);
+   msg_ret = (enum i40e_status_code)rte_le_to_cpu_32(
+ info.desc.cookie_low);
+   switch (aq_opc) {
case i40e_aqc_opc_send_msg_to_vf:
-   if (v_msg->v_opcode == I40E_VIRTCHNL_OP_EVENT)
+   if (msg_opc == I40E_VIRTCHNL_OP_EVENT)
/* process event*/
i40evf_handle_pf_event(dev, info.msg_buf,
   info.msg_len);
else {
/* read message and it's expected one */
-   if (v_msg->v_opcode == vf->pend_cmd) {
-   vf->cmd_retval = v_msg->v_retval;
+   if (msg_opc == vf->pend_cmd) {
+   vf->cmd_retval = msg_ret;
/* prevent compiler reordering */
rte_compiler_barrier();
_clear_cmd(vf);
} else
PMD_DRV_LOG(ERR, "command mismatch,"
"expect %u, get %u",
-   vf->pend_cmd, v_msg->v_opcode);
+   vf->pend_cmd, msg_opc);
PMD_DRV_LOG(DEBUG, "adminq response is 
received,"
-" opcode = %d\n", v_msg->v_opcode);
+" opcode = %d\n", msg_opc);
}
break;
default:
PMD_DRV_LOG(ERR, "Request %u is not supported yet",
-   opcode);
+   aq_opc);
break;
}
}
-- 
2.4.11



[dpdk-dev] [PATCH 1/1] net/i40e: enable auto link update for XXV710

2016-11-30 Thread Zhang, Qi Z
Hi Ferruh:

> -Original Message-
> From: Yigit, Ferruh
> Sent: Wednesday, November 30, 2016 1:46 AM
> To: Zhang, Qi Z ; Wu, Jingjing  intel.com>;
> Zhang, Helin 
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/1] net/i40e: enable auto link update for
> XXV710
> 
> Hi Qi,
> 
> On 11/24/2016 11:43 PM, Qi Zhang wrote:
> > This patch remove the limitation that XXV710 device does
> 
> XXV710 is 25G device, and support added in 16.11 (please correct me if this is
> wrong.), but I can't find any DPDK documentation for this device.
> 
> Can you please add some documentation, at least to http://dpdk.org/doc/nics
> and http://dpdk.org/doc/guides/nics/i40e.html
> in a different patch?

For 16.11, since XXV710 is not officially supported, so they are missing in 
document.
For 17.02 this will be updated. Thanks for remind.
> 
> > not support auto link update.
> 
> Can you please add more details that why we can remove the limitation now?
Ok, will update in v2.
> 
> >
> > Signed-off-by: Qi Zhang 
> > ---
> >  drivers/net/i40e/i40e_ethdev.c | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/i40e/i40e_ethdev.c
> > b/drivers/net/i40e/i40e_ethdev.c index 67778ba..b7a916d 100644
> > --- a/drivers/net/i40e/i40e_ethdev.c
> > +++ b/drivers/net/i40e/i40e_ethdev.c
> > @@ -1628,6 +1628,8 @@ i40e_phy_conf_link(struct i40e_hw *hw,
> >
> > /* use get_phy_abilities_resp value for the rest */
> > phy_conf.phy_type = phy_ab.phy_type;
> > +   phy_conf.phy_type_ext = phy_ab.phy_type_ext;
> > +   phy_conf.fec_config = phy_ab.mod_type_ext;
> 
> And these changes look like called for all device types, just to double 
> check, are
> these 25G specific?

Actually only XXV710 need this two lines, but base on firmware engineer's 
input, 
this could be implemented in generic way since no impact for other i40e devices.
> 
> > phy_conf.eee_capability = phy_ab.eee_capability;
> > phy_conf.eeer = phy_ab.eeer_val;
> > phy_conf.low_power_ctrl = phy_ab.d3_lpan; @@ -1653,8 +1655,7 @@
> > i40e_apply_link_speed(struct rte_eth_dev *dev)
> > struct rte_eth_conf *conf = &dev->data->dev_conf;
> >
> > speed = i40e_parse_link_speeds(conf->link_speeds);
> > -   if (!I40E_PHY_TYPE_SUPPORT_25G(hw->phy.phy_types))
> > -   abilities |= I40E_AQ_PHY_ENABLE_ATOMIC_LINK;
> > +   abilities |= I40E_AQ_PHY_ENABLE_ATOMIC_LINK;
> > if (!(conf->link_speeds & ETH_LINK_SPEED_FIXED))
> > abilities |= I40E_AQ_PHY_AN_ENABLED;
> > abilities |= I40E_AQ_PHY_LINK_ENABLED; @@ -1990,8 +1991,7 @@
> > i40e_dev_set_link_down(struct rte_eth_dev *dev)
> > uint8_t abilities = 0;
> > struct i40e_hw *hw =
> I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> >
> > -   if (!I40E_PHY_TYPE_SUPPORT_25G(hw->phy.phy_types))
> > -   abilities = I40E_AQ_PHY_ENABLE_ATOMIC_LINK;
> > +   abilities = I40E_AQ_PHY_ENABLE_ATOMIC_LINK;
> > return i40e_phy_conf_link(hw, abilities, speed);  }
> >
> >
Thanks!
Qi


[dpdk-dev] [PATCH v1] examples/ethtool: fix segfault querying non-PCI devices

2016-11-30 Thread Remy Horton
Doing a device information query on a non-PCI device such as
vhost was resulting in the dereferencing of a NULL pointer
(the absent PCI data), causing a segmentation fault.

Fixes: bda68ab9d1e7 ("examples/ethtool: add user-space ethtool sample 
application")

Signed-off-by: Remy Horton 
---
 doc/guides/rel_notes/release_17_02.rst |  3 +++
 examples/ethtool/lib/rte_ethtool.c | 13 +
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/doc/guides/rel_notes/release_17_02.rst 
b/doc/guides/rel_notes/release_17_02.rst
index 3b65038..f471e49 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -70,6 +70,9 @@ Libraries
 Examples
 

+   * **examples/ethtool Fixed crash with non-PCI devices.**
+ Querying a non-PCI device was dereferencing non-existent PCI data
+ resulting in a segmentation fault.

 Other
 ~
diff --git a/examples/ethtool/lib/rte_ethtool.c 
b/examples/ethtool/lib/rte_ethtool.c
index a1f91d4..6f0ce84 100644
--- a/examples/ethtool/lib/rte_ethtool.c
+++ b/examples/ethtool/lib/rte_ethtool.c
@@ -61,10 +61,15 @@ rte_ethtool_get_drvinfo(uint8_t port_id, struct 
ethtool_drvinfo *drvinfo)
dev_info.driver_name);
snprintf(drvinfo->version, sizeof(drvinfo->version), "%s",
rte_version());
-   snprintf(drvinfo->bus_info, sizeof(drvinfo->bus_info),
-   "%04x:%02x:%02x.%x",
-   dev_info.pci_dev->addr.domain, dev_info.pci_dev->addr.bus,
-   dev_info.pci_dev->addr.devid, dev_info.pci_dev->addr.function);
+   if (dev_info.pci_dev)
+   snprintf(drvinfo->bus_info, sizeof(drvinfo->bus_info),
+   "%04x:%02x:%02x.%x",
+   dev_info.pci_dev->addr.domain,
+   dev_info.pci_dev->addr.bus,
+   dev_info.pci_dev->addr.devid,
+   dev_info.pci_dev->addr.function);
+   else
+   snprintf(drvinfo->bus_info, sizeof(drvinfo->bus_info), "N/A");

memset(®_info, 0, sizeof(reg_info));
rte_eth_dev_get_reg_info(port_id, ®_info);
-- 
2.5.5



[dpdk-dev] [PATCH] vmxnet3: fix Rx deadlock

2016-11-30 Thread Yong Wang
> -Original Message-
> From: Stefan Puiu [mailto:stefan.puiu at gmail.com]
> Sent: Monday, November 14, 2016 2:46 AM
> To: dev at dpdk.org
> Cc: mac_leehk at yahoo.com.hk; Yong Wang ;
> Stefan Puiu 
> Subject: [PATCH] vmxnet3: fix Rx deadlock
> 
> Our use case is that we have an app that needs to keep mbufs around
> for a while. We've seen cases when calling vmxnet3_post_rx_bufs() from
> vmxet3_recv_pkts(), it might not succeed to add any mbufs to any RX
> descriptors (where it returns -err). Since there are no mbufs that the
> virtual hardware can use, and since nobody calls
> vmxnet3_post_rx_bufs() after that, no packets will be received after

The patch looks good overall.

I think a more accurate description is that the particular descriptor's 
generation bit never got flipped properly when an mbuf failed to be refilled 
which caused the rx stuck, rather than vmxnet3_post_rx_bufs() not being called 
afterwards.

> this. I call this a deadlock for lack of a better term - the virtual
> HW waits for free mbufs, while the app waits for the hardware to
> notify it for data. Note that after this, the app can't recover.
> 
> This fix is a rework of this patch by Marco Lee:
> https://urldefense.proofpoint.com/v2/url?u=http-
> 3A__dpdk.org_dev_patchwork_patch_6575_&d=CwIBAg&c=Sqcl0Ez6M0X8a
> eM67LKIiDJAXVeAw-YihVMNtXt-
> uEs&r=44mSO5N5yEs4CeCdtQE0xt0F7J0p67_mApYVAzyYms0&m=g2gi3ZErdx
> AKGY8d3wbhk2D6TLUVYBs3K-
> KMdiJwuvI&s=YLz0Wsl_kQUXPWij82nnO9ROB64AK5ZtDCyUvHuU8jA&e= . I
> had to forward port it,
> address review comments and also reverted the allocation failure
> handing to the first version of the patch

s/handing/handling

> (https://urldefense.proofpoint.com/v2/url?u=http-
> 3A__dpdk.org_ml_archives_dev_2015-
> 2DJuly_022079.html&d=CwIBAg&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-
> YihVMNtXt-
> uEs&r=44mSO5N5yEs4CeCdtQE0xt0F7J0p67_mApYVAzyYms0&m=g2gi3ZErdx
> AKGY8d3wbhk2D6TLUVYBs3K-
> KMdiJwuvI&s=5HksZV8s99b3jVV7Pea60d18hKqXxp4eRpJWjz6sWLc&e= ),
> since that's
> the only approach that seems to work, and seems to be what other
> drivers are doing (I checked ixgbe and em). Reusing the mbuf that's
> getting passed to the application doesn't seem to make sense, and it
> was causing weird issues in our app. Also, reusing rxm without
> checking if it's NULL could cause the code to crash.
> 
> Signed-off-by: Stefan Puiu 
> ---
>  drivers/net/vmxnet3/vmxnet3_rxtx.c | 38
> --
>  1 file changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c
> b/drivers/net/vmxnet3/vmxnet3_rxtx.c
> index b109168..c9d2488 100644
> --- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
> +++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
> @@ -518,6 +518,32 @@
>   return nb_tx;
>  }
> 
> +static inline void
> +vmxnet3_renew_desc(vmxnet3_rx_queue_t *rxq, uint8_t ring_id,
> + struct rte_mbuf *mbuf)

Nit: align the params here to be consistent with other functions.

> +{
> + uint32_t  val = 0;

Nit: extra space before "val"

> + struct vmxnet3_cmd_ring *ring = &rxq->cmd_ring[ring_id];
> + struct Vmxnet3_RxDesc *rxd =
> + (struct Vmxnet3_RxDesc *)(ring->base + ring->next2fill);
> + vmxnet3_buf_info_t *buf_info = &ring->buf_info[ring->next2fill];
> +
> + if (ring_id == 0)
> + val = VMXNET3_RXD_BTYPE_HEAD;
> + else
> + val = VMXNET3_RXD_BTYPE_BODY;
> +
> + buf_info->m = mbuf;
> + buf_info->len = (uint16_t)(mbuf->buf_len -
> RTE_PKTMBUF_HEADROOM);
> + buf_info->bufPA = rte_mbuf_data_dma_addr_default(mbuf);
> +
> + rxd->addr = buf_info->bufPA;
> + rxd->btype = val;
> + rxd->len = buf_info->len;
> + rxd->gen = ring->gen;
> +
> + vmxnet3_cmd_ring_adv_next2fill(ring);
> +}
>  /*
>   *  Allocates mbufs and clusters. Post rx descriptors with buffer details
>   *  so that device can receive packets in those buffers.
> @@ -657,9 +683,17 @@
>   }
> 
>   while (rcd->gen == rxq->comp_ring.gen) {
> + struct rte_mbuf *newm;

Nit: add a blank line here.

>   if (nb_rx >= nb_pkts)
>   break;
> 
> + newm = rte_mbuf_raw_alloc(rxq->mp);
> + if (unlikely(newm == NULL)) {
> + PMD_RX_LOG(ERR, "Error allocating mbuf");
> + rxq->stats.rx_buf_alloc_failure++;
> + break;
> + }
> +
>   idx = rcd->rxdIdx;
>   ring_idx = (uint8_t)((rcd->rqID == rxq->qid1) ? 0 : 1);
>   rxd = (Vmxnet3_RxDesc *)rxq->cmd_ring[ring_idx].base +
> idx;
> @@ -759,8 +793,8 @@
>   VMXNET3_INC_RING_IDX_ONLY(rxq-
> >cmd_ring[ring_idx].next2comp,
> rxq->cmd_ring[ring_idx].size);
> 
> - /* It's time to allocate some new buf and renew descriptors
> */
> - vmxnet3_post_rx_bufs(rxq, ring_idx);
> + /* It's time to  renew descriptors */

Nit: extra space before "renew"

> +   

[dpdk-dev] [PATCH v2] doc: prog_guide: fix section heading

2016-11-30 Thread Baruch Siach
This section only deals with Tx queues configuration.

Acked-by: John McNamara 
Signed-off-by: Baruch Siach 
---
v2: Add John's ack
---
 doc/guides/prog_guide/poll_mode_drv.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/guides/prog_guide/poll_mode_drv.rst 
b/doc/guides/prog_guide/poll_mode_drv.rst
index bf3ea9fde25a..d4c92ea2cf4a 100644
--- a/doc/guides/prog_guide/poll_mode_drv.rst
+++ b/doc/guides/prog_guide/poll_mode_drv.rst
@@ -200,8 +200,8 @@ Ethernet* flow control (pause frame) can be configured on 
the individual port.
 Refer to the testpmd source code for details.
 Also, L4 (UDP/TCP/ SCTP) checksum offload by the NIC can be enabled for an 
individual packet as long as the packet mbuf is set up correctly. See `Hardware 
Offload`_ for details.

-Configuration of Transmit and Receive Queues
-
+Configuration of Transmit Queues
+

 Each transmit queue is independently configured with the following information:

-- 
2.10.2



[dpdk-dev] [PATCH v12 0/6] add Tx preparation

2016-11-30 Thread John Daley (johndale)
Hi,
-john

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Monday, November 28, 2016 3:03 AM
> To: dev at dpdk.org; Rahul Lakkireddy ;
> Stephen Hurd ; Jan Medala
> ; Jakub Palider ; John Daley
> (johndale) ; Adrien Mazarguil
> ; Alejandro Lucero
> ; Harish Patil
> ; Rasesh Mody ; Jerin
> Jacob ; Yuanhan Liu
> ; Yong Wang 
> Cc: Tomasz Kulasek ;
> konstantin.ananyev at intel.com; olivier.matz at 6wind.com
> Subject: Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
> 
> We need attention of every PMD developers on this thread.
> 
> Reminder of what Konstantin suggested:
> "
> - if the PMD supports TX offloads AND
> - if to be able use any of these offloads the upper layer SW would have to:
> * modify the contents of the packet OR
> * obey HW specific restrictions
> then it is a PMD developer responsibility to provide tx_prep() that would
> implement expected modifications of the packet contents and restriction
> checks.
> Otherwise, tx_prep() implementation is not required and can be safely set to
> NULL.
> "
> 
> I copy/paste also my previous conclusion:
> 
> Before txprep, there is only one API: the application must prepare the
> packets checksum itself (get_psd_sum in testpmd).
> With txprep, the application have 2 choices: keep doing the job itself or call
> txprep which calls a PMD-specific function.
> The question is: does non-Intel drivers need a checksum preparation for
> TSO?
> Will it behave well if txprep does nothing in these drivers?
> 
> When looking at the code, most of drivers handle the TSO flags.
> But it is hard to know whether they rely on the pseudo checksum or not.
> 
> git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG'
> drivers/net/
> 
> drivers/net/bnxt/bnxt_txr.c
> drivers/net/cxgbe/sge.c
> drivers/net/e1000/em_rxtx.c
> drivers/net/e1000/igb_rxtx.c
> drivers/net/ena/ena_ethdev.c
> drivers/net/enic/enic_rxtx.c
> drivers/net/fm10k/fm10k_rxtx.c
> drivers/net/i40e/i40e_rxtx.c
> drivers/net/ixgbe/ixgbe_rxtx.c
> drivers/net/mlx4/mlx4.c
> drivers/net/mlx5/mlx5_rxtx.c
> drivers/net/nfp/nfp_net.c
> drivers/net/qede/qede_rxtx.c
> drivers/net/thunderx/nicvf_rxtx.c
> drivers/net/virtio/virtio_rxtx.c
> drivers/net/vmxnet3/vmxnet3_rxtx.c
> 
> Please, we need a comment for each driver saying "it is OK, we do not need
> any checksum preparation for TSO"
> or
> "yes we have to implement tx_prepare or TSO will not work in this mode"

I like the idea of tx prep since it should make for cleaner apps.

For enic, I believe the answer is " it is OK, we do not need any checksum 
preparation".

Prior to now, it was necessary to set IP checksum to 0 and put in a TCP/UDP 
pseudo header. But there is a hardware overwrite of checksums option which 
makes preparation in software unnecessary and it is testing out well so far. I 
plan to enable it in 17.02. TSO is also being enabled for 17.02 and it does not 
look like any prep is required. So I'm going with " txprep NULL pointer is OK 
for enic", but may have to change my mind if something comes up in testing.

-john


[dpdk-dev] [PATCH v12 0/6] add Tx preparation

2016-11-30 Thread Adrien Mazarguil
On Mon, Nov 28, 2016 at 12:03:06PM +0100, Thomas Monjalon wrote:
> We need attention of every PMD developers on this thread.

I've been following this thread from the beginning while working on rte_flow
and wanted to see where it was headed before replying. (I know, v11 was
submitted about 1 month ago but still.)

> Reminder of what Konstantin suggested:
> "
> - if the PMD supports TX offloads AND
> - if to be able use any of these offloads the upper layer SW would have to:
> * modify the contents of the packet OR
> * obey HW specific restrictions
> then it is a PMD developer responsibility to provide tx_prep() that would 
> implement
> expected modifications of the packet contents and restriction checks.
> Otherwise, tx_prep() implementation is not required and can be safely set to 
> NULL.  
> "
> 
> I copy/paste also my previous conclusion:
> 
> Before txprep, there is only one API: the application must prepare the
> packets checksum itself (get_psd_sum in testpmd).
> With txprep, the application have 2 choices: keep doing the job itself
> or call txprep which calls a PMD-specific function.

Something is definitely needed here, and only PMDs can provide it. I think
applications should not have to clear checksum fields or initialize them to
some magic value, same goes for any other offload or hardware limitation
that needs to be worked around.

tx_prep() is one possible answer to this issue, however as mentioned in the
original patch it can be very expensive if exposed by the PMD.

Another issue I'm more concerned about is the way limitations are managed
(struct rte_eth_desc_lim). While not officially tied to tx_prep(), this
structure contains new fields that are only relevant to a few devices, and I
fear it will keep growing with each new hardware quirk to manage, breaking
ABIs in the process.

What are applications supposed to do, check each of them regardless before
attempting to send a burst?

I understand tx_prep() automates this process, however I'm wondering why
isn't the TX burst function doing that itself. Using nb_mtu_seg_max as an
example, tx_prep() has an extra check in case of TSO that the TX burst
function does not perform. This ends up being much more expensive to
applications due to the additional loop doing redundant testing on each
mbuf.

If, say as a performance improvement, we decided to leave the validation
part to the TX burst function; what remains in tx_prep() is basically heavy
"preparation" requiring mbuf changes (i.e. erasing checksums, for now).

Following the same logic, why can't such a thing be made part of the TX
burst function as well (through a direct call to rte_phdr_cksum_fix()
whenever necessary). From an application standpoint, what are the advantages
of having to:

 if (tx_prep()) // iterate and update mbufs as needed
 tx_burst(); // iterate and send

Compared to:

 tx_burst(); // iterate, update as needed and send

Note that PMDs could still provide different TX callbacks depending on the
set of enabled offloads so performance is not unnecessarily impacted.

In my opinion the second approach is both faster to applications and more
friendly from a usability perspective, am I missing something obvious?

> The question is: does non-Intel drivers need a checksum preparation for TSO?
> Will it behave well if txprep does nothing in these drivers?
> 
> When looking at the code, most of drivers handle the TSO flags.
> But it is hard to know whether they rely on the pseudo checksum or not.
> 
> git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG' drivers/net/
> 
> drivers/net/bnxt/bnxt_txr.c
> drivers/net/cxgbe/sge.c
> drivers/net/e1000/em_rxtx.c
> drivers/net/e1000/igb_rxtx.c
> drivers/net/ena/ena_ethdev.c
> drivers/net/enic/enic_rxtx.c
> drivers/net/fm10k/fm10k_rxtx.c
> drivers/net/i40e/i40e_rxtx.c
> drivers/net/ixgbe/ixgbe_rxtx.c
> drivers/net/mlx4/mlx4.c
> drivers/net/mlx5/mlx5_rxtx.c
> drivers/net/nfp/nfp_net.c
> drivers/net/qede/qede_rxtx.c
> drivers/net/thunderx/nicvf_rxtx.c
> drivers/net/virtio/virtio_rxtx.c
> drivers/net/vmxnet3/vmxnet3_rxtx.c
> 
> Please, we need a comment for each driver saying
> "it is OK, we do not need any checksum preparation for TSO"
> or
> "yes we have to implement tx_prepare or TSO will not work in this mode"

For both mlx4 and mlx5 then,
"it is OK, we do not need any checksum preparation for TSO".

Actually I do not think we'll ever need tx_prep() unless we add our own
quirks to struct rte_eth_desc_lim (and friends) which are currently quietly
handled by TX burst functions.

-- 
Adrien Mazarguil
6WIND


[dpdk-dev] [PATCH v12 0/6] add Tx preparation

2016-11-30 Thread Thomas Monjalon
2016-11-30 08:40, Adrien Mazarguil:
[...]
> I understand tx_prep() automates this process, however I'm wondering why
> isn't the TX burst function doing that itself. Using nb_mtu_seg_max as an
> example, tx_prep() has an extra check in case of TSO that the TX burst
> function does not perform. This ends up being much more expensive to
> applications due to the additional loop doing redundant testing on each
> mbuf.
> 
> If, say as a performance improvement, we decided to leave the validation
> part to the TX burst function; what remains in tx_prep() is basically heavy
> "preparation" requiring mbuf changes (i.e. erasing checksums, for now).
> 
> Following the same logic, why can't such a thing be made part of the TX
> burst function as well (through a direct call to rte_phdr_cksum_fix()
> whenever necessary). From an application standpoint, what are the advantages
> of having to:
> 
>  if (tx_prep()) // iterate and update mbufs as needed
>  tx_burst(); // iterate and send
> 
> Compared to:
> 
>  tx_burst(); // iterate, update as needed and send
> 
> Note that PMDs could still provide different TX callbacks depending on the
> set of enabled offloads so performance is not unnecessarily impacted.
> 
> In my opinion the second approach is both faster to applications and more
> friendly from a usability perspective, am I missing something obvious?

I think it was not clearly explained in this patchset, but this is
my understanding:
tx_prepare and tx_burst can be called at different stages of a pipeline,
on different cores.



[dpdk-dev] [PATCH] virtio: tx with can_push when VERSION_1 is set

2016-11-30 Thread Pierre Pfister (ppfister)

> Le 22 nov. 2016 ? 14:17, Maxime Coquelin  a 
> ?crit :
> 
> Hi Pierre,
> 
> On 11/22/2016 10:54 AM, Pierre Pfister (ppfister) wrote:
>> Hello Maxime,
>> 
>>> Le 9 nov. 2016 ? 15:51, Maxime Coquelin  a 
>>> ?crit :
>>> 
>>> Hi Pierre,
>>> 
>>> On 11/09/2016 01:42 PM, Pierre Pfister (ppfister) wrote:
 Hello Maxime,
 
 Sorry for the late reply.
 
 
> Le 8 nov. 2016 ? 10:44, Maxime Coquelin  a 
> ?crit :
> 
> Hi Pierre,
> 
> On 11/08/2016 10:31 AM, Pierre Pfister (ppfister) wrote:
>> Current virtio driver advertises VERSION_1 support,
>> but does not handle device's VERSION_1 support when
>> sending packets (it looks for ANY_LAYOUT feature,
>> which is absent).
>> 
>> This patch enables 'can_push' in tx path when VERSION_1
>> is advertised by the device.
>> 
>> This significantly improves small packets forwarding rate
>> towards devices advertising VERSION_1 feature.
> I think it depends whether offloading is enabled or not.
> If no offloading enabled, I measured significant drop.
> Indeed, when no offloading is enabled, the Tx path in Virtio
> does not access the virtio header before your patch, as the header is 
> memset to zero at device init time.
> With your patch, it gets memset to zero at every transmit in the hot
> path.
 
 Right. On the virtio side that is true, but on the device side, we have to 
 access the header anyway.
>>> No more now, if no offload features have been negotiated.
>>> I have done a patch that landed in v16.11 to skip header parsing in
>>> this case.
>>> That said, we still have to access its descriptor.
>>> 
 And accessing two descriptors (with the address resolution and memory 
 fetch which comes with it)
 is a costy operation compared to a single one.
 In the case indirect descriptors are used, this is 1 desc access instead 
 or 3.
>>> I agree this is far from being optimal.
>>> 
 And in the case chained descriptors are used, this doubles the number of 
 packets that you can put in your queue.
 
 Those are the results in my PHY -> VM (testpmd) -> PHY setup
 Traffic is flowing bidirectionally. Numbers are for lossless-rates.
 
 When chained buffers are used for dpdk's TX: 2x2.13Mpps
 When indirect descriptors are used for dpdk's TX: 2x2.38Mpps
 When shallow buffers are used for dpdk's TX (with this patch): 2x2.42Mpps
>>> When I tried it, I also did PVP 0% benchmark, and I got opposite results. 
>>> Chained and indirect cases were significantly better.
>>> 
>>> My PVP setup was using a single NIC and single Virtio PMD, and NIC2VM
>>> forwarding was IO mode done with testpmd on host, and Rx->Tx forwarding
>>> was macswap mode on guest side.
>>> 
>>> I also saw some perf regression when running simple tespmd test on both
>>> ends.
>>> 
>>> Yuanhan, did you run some benchmark with your series enabling
>>> ANY_LAYOUT?
>> 
>> It was enabled. But the specs specify that VERSION_1 includes ANY_LAYOUT.
>> Therefor, Qemu removes ANY_LAYOUT when VERSION_1 is set.
>> 
>> We can keep arguing about which is fastest. I guess we have different setups 
>> and different results, so we probably are deadlocked here.
>> But in any case, the current code is inconsistent, as it uses single 
>> descriptor when ANY_LAYOUT is set, but not when VERSION_1 is set.
>> 
>> I believe it makes sense to use single-descriptor any time it is possible, 
>> but you are free to think otherwise.
>> Please make a call and make the code consistent (removes single-descriptors 
>> all together, or use them when VERSION_1 is set too). Otherwise it just 
>> creates yet-another testing headache.
> I also think it makes sense to have a single descriptor, but I had to
> highlight that I noticed a significant performance degradation when
> using single descriptor on my setup.
> 
> I'm fine we take your patch in virtio-next, so that more testing is conducted.
> 

Thanks,

I just realised there was an indentation error in the patch.
Meaning that this patch didn't make it to 16.11 ...
I will send a new version.

- Pierre


> Thanks,
> Maxime
> 
>> 
>> Thanks,
>> 
>> - Pierre
>> 
>>> 
 
 I must also note that qemu 2.5 does not seem to deal with VERSION_1 and 
 ANY_LAYOUT correctly.
 The patch I am proposing here works for qemu 2.7, but with qemu 2.5, 
 testpmd still behaves as if ANY_LAYOUT (or VERSION_1) was not available. 
 This is not catastrophic. But just note that you will not see performance 
 in some cases with qemu 2.5.
>>> 
>>> Thanks for the info.
>>> 
>>> Regards,
>>> Maxime
>> 



[dpdk-dev] [PATCH v2] virtio: tx with can_push when VERSION_1 is set

2016-11-30 Thread Pierre Pfister (ppfister)
Current virtio driver advertises VERSION_1 support,
but does not handle device's VERSION_1 support when
sending packets (it looks for ANY_LAYOUT feature,
which is absent).

This patch enables 'can_push' in tx path when VERSION_1
is advertised by the device.

This significantly improves small packets forwarding rate
towards devices advertising VERSION_1 feature.

Signed-off-by: Pierre Pfister 
---
 drivers/net/virtio/virtio_rxtx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 22d97a4..1e5a6b9 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -1015,7 +1015,8 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts, uint16_t nb_pkts)
}

/* optimize ring usage */
-   if (vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) &&
+   if ((vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) ||
+ vtpci_with_feature(hw, VIRTIO_F_VERSION_1)) &&
rte_mbuf_refcnt_read(txm) == 1 &&
RTE_MBUF_DIRECT(txm) &&
txm->nb_segs == 1 &&
-- 
2.7.4 (Apple Git-66)


[dpdk-dev] [PATCH 1/1] net/i40e: enable auto link update for XXV710

2016-11-30 Thread Ferruh Yigit
On 11/30/2016 2:26 AM, Zhang, Qi Z wrote:
> Hi Ferruh:
> 
>> -Original Message-
>> From: Yigit, Ferruh
>> Sent: Wednesday, November 30, 2016 1:46 AM
>> To: Zhang, Qi Z ; Wu, Jingjing > intel.com>;
>> Zhang, Helin 
>> Cc: dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH 1/1] net/i40e: enable auto link update for
>> XXV710
>>
>> Hi Qi,
>>
>> On 11/24/2016 11:43 PM, Qi Zhang wrote:
>>> This patch remove the limitation that XXV710 device does
>>
>> XXV710 is 25G device, and support added in 16.11 (please correct me if this 
>> is
>> wrong.), but I can't find any DPDK documentation for this device.
>>
>> Can you please add some documentation, at least to http://dpdk.org/doc/nics
>> and http://dpdk.org/doc/guides/nics/i40e.html
>> in a different patch?
> 
> For 16.11, since XXV710 is not officially supported, so they are missing in 
> document.
> For 17.02 this will be updated. Thanks for remind.

If officially will be added on 17.02, can you please update release
notes too, to announce new device support?

>>
>>> not support auto link update.
>>
>> Can you please add more details that why we can remove the limitation now?
> Ok, will update in v2.
>>
>>>
>>> Signed-off-by: Qi Zhang 
>>> ---
>>>  drivers/net/i40e/i40e_ethdev.c | 8 
>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/net/i40e/i40e_ethdev.c
>>> b/drivers/net/i40e/i40e_ethdev.c index 67778ba..b7a916d 100644
>>> --- a/drivers/net/i40e/i40e_ethdev.c
>>> +++ b/drivers/net/i40e/i40e_ethdev.c
>>> @@ -1628,6 +1628,8 @@ i40e_phy_conf_link(struct i40e_hw *hw,
>>>
>>> /* use get_phy_abilities_resp value for the rest */
>>> phy_conf.phy_type = phy_ab.phy_type;
>>> +   phy_conf.phy_type_ext = phy_ab.phy_type_ext;
>>> +   phy_conf.fec_config = phy_ab.mod_type_ext;
>>
>> And these changes look like called for all device types, just to double 
>> check, are
>> these 25G specific?
> 
> Actually only XXV710 need this two lines, but base on firmware engineer's 
> input, 
> this could be implemented in generic way since no impact for other i40e 
> devices.
>>
>>> phy_conf.eee_capability = phy_ab.eee_capability;
>>> phy_conf.eeer = phy_ab.eeer_val;
>>> phy_conf.low_power_ctrl = phy_ab.d3_lpan; @@ -1653,8 +1655,7 @@
>>> i40e_apply_link_speed(struct rte_eth_dev *dev)
>>> struct rte_eth_conf *conf = &dev->data->dev_conf;
>>>
>>> speed = i40e_parse_link_speeds(conf->link_speeds);
>>> -   if (!I40E_PHY_TYPE_SUPPORT_25G(hw->phy.phy_types))
>>> -   abilities |= I40E_AQ_PHY_ENABLE_ATOMIC_LINK;
>>> +   abilities |= I40E_AQ_PHY_ENABLE_ATOMIC_LINK;
>>> if (!(conf->link_speeds & ETH_LINK_SPEED_FIXED))
>>> abilities |= I40E_AQ_PHY_AN_ENABLED;
>>> abilities |= I40E_AQ_PHY_LINK_ENABLED; @@ -1990,8 +1991,7 @@
>>> i40e_dev_set_link_down(struct rte_eth_dev *dev)
>>> uint8_t abilities = 0;
>>> struct i40e_hw *hw =
>> I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>>>
>>> -   if (!I40E_PHY_TYPE_SUPPORT_25G(hw->phy.phy_types))
>>> -   abilities = I40E_AQ_PHY_ENABLE_ATOMIC_LINK;
>>> +   abilities = I40E_AQ_PHY_ENABLE_ATOMIC_LINK;
>>> return i40e_phy_conf_link(hw, abilities, speed);  }
>>>
>>>
> Thanks!
> Qi
> 



[dpdk-dev] Proposal for a new Committer model

2016-11-30 Thread Mcnamara, John
> -Original Message-
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Tuesday, November 29, 2016 7:12 PM
> To: Mcnamara, John 
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] Proposal for a new Committer model
> 
> > ...
> >
> > B) Designate alternates to serve as backups for the maintainer when
> > they are unavailable.  This provides high-availablility, and sounds
> > very much like your proposal, but in the interests of clarity, there
> > is still a single maintainer at any one time, it just may change to
> > ensure the continued merging of patches, if the primary maintainer isn't
> available.
> > Ideally however, those backup alternates arent needed, because most of
> > the primary maintainers work in merging pull requests, which are done
> > based on the trust of the submaintainer, and done during a very
> > limited window of time.  This also partially addreses multi-vendor
> > fairness if your subtree maintainers come from multiple participating
> companies.
> >
> > Regards
> > Neil
> >
> >
> >
> 
> Soo, I feel like we're wandering away from this thread.  Are you going to
> make a change to the comitter model?

Hi,

Yes. I think we have consensus on the main parts. I'll re-draft a proposal that 
we can discuss and then add to the contributors guide.

John




[dpdk-dev] [PATCH v2] virtio: tx with can_push when VERSION_1 is set

2016-11-30 Thread Yuanhan Liu
On Wed, Nov 30, 2016 at 09:18:42AM +, Pierre Pfister (ppfister) wrote:
> Current virtio driver advertises VERSION_1 support,
> but does not handle device's VERSION_1 support when
> sending packets (it looks for ANY_LAYOUT feature,
> which is absent).
> 
> This patch enables 'can_push' in tx path when VERSION_1
> is advertised by the device.
> 
> This significantly improves small packets forwarding rate
> towards devices advertising VERSION_1 feature.
> 
> Signed-off-by: Pierre Pfister 

>From the virtio spec point of view, that VERSION_1 implies ANY_LAYOUT,
this patch is right and I think we could apply this patch. No objections?

--yliu
> ---
>  drivers/net/virtio/virtio_rxtx.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio/virtio_rxtx.c 
> b/drivers/net/virtio/virtio_rxtx.c
> index 22d97a4..1e5a6b9 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -1015,7 +1015,8 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf 
> **tx_pkts, uint16_t nb_pkts)
>   }
>  
>   /* optimize ring usage */
> - if (vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) &&
> + if ((vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) ||
> +   vtpci_with_feature(hw, VIRTIO_F_VERSION_1)) &&
>   rte_mbuf_refcnt_read(txm) == 1 &&
>   RTE_MBUF_DIRECT(txm) &&
>   txm->nb_segs == 1 &&
> -- 
> 2.7.4 (Apple Git-66)


[dpdk-dev] [PATCH v12 0/6] add Tx preparation

2016-11-30 Thread Kulasek, TomaszX
Hi,

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, November 30, 2016 09:50
> To: Adrien Mazarguil ; Kulasek, TomaszX
> 
> Cc: dev at dpdk.org; Ananyev, Konstantin ;
> olivier.matz at 6wind.com
> Subject: Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
> 
> 2016-11-30 08:40, Adrien Mazarguil:
> [...]
> > I understand tx_prep() automates this process, however I'm wondering
> > why isn't the TX burst function doing that itself. Using
> > nb_mtu_seg_max as an example, tx_prep() has an extra check in case of
> > TSO that the TX burst function does not perform. This ends up being
> > much more expensive to applications due to the additional loop doing
> > redundant testing on each mbuf.
> >
> > If, say as a performance improvement, we decided to leave the
> > validation part to the TX burst function; what remains in tx_prep() is
> > basically heavy "preparation" requiring mbuf changes (i.e. erasing
> checksums, for now).
> >
> > Following the same logic, why can't such a thing be made part of the
> > TX burst function as well (through a direct call to
> > rte_phdr_cksum_fix() whenever necessary). From an application
> > standpoint, what are the advantages of having to:
> >
> >  if (tx_prep()) // iterate and update mbufs as needed
> >  tx_burst(); // iterate and send
> >
> > Compared to:
> >
> >  tx_burst(); // iterate, update as needed and send
> >
> > Note that PMDs could still provide different TX callbacks depending on
> > the set of enabled offloads so performance is not unnecessarily
> impacted.
> >
> > In my opinion the second approach is both faster to applications and
> > more friendly from a usability perspective, am I missing something
> obvious?
> 
> I think it was not clearly explained in this patchset, but this is my
> understanding:
> tx_prepare and tx_burst can be called at different stages of a pipeline,
> on different cores.

Yes, this API is intended to be used optionaly, not only just before tx_burst.

1. Separating both stages:
   a) We may have a control over burst (packet content, validation) when needed.
   b) For invalid packets we may restore them or do some another task if needed 
(even on early stage of processing).
   c) Tx burst keep as simple as it should be.

2. Joining the functionality of tx_prepare and tx_burst have some disadvantages:
   a) When packet is invalid it cannot be restored by application should be 
dropped.
   b) Tx burst needs to modify the content of the packet.
   c) We have no way to eliminate overhead of preparation (tx_prepare) for the 
application where performance is a key.

3. Using tx callbacks
   a) We still need to have different implementations for different devices.
   b) The overhead in performance (comparing to the pair tx_prepare/tx_burst) 
will not be better while both ways uses very similar mechanism.

In addition, tx_prepare mechanism can be turned off by compilation flag (as 
discussed with Jerin in http://dpdk.org/dev/patchwork/patch/15770/) to provide 
real NOOP functionality (e.g. for low-end CPUs, where even unnecessary memory 
dereference and check can have significant impact on performance).

Tomasz


[dpdk-dev] [PATCH v12 0/6] add Tx preparation

2016-11-30 Thread Ananyev, Konstantin
Hi Adrien,

> 
> On Mon, Nov 28, 2016 at 12:03:06PM +0100, Thomas Monjalon wrote:
> > We need attention of every PMD developers on this thread.
> 
> I've been following this thread from the beginning while working on rte_flow
> and wanted to see where it was headed before replying. (I know, v11 was
> submitted about 1 month ago but still.)
> 
> > Reminder of what Konstantin suggested:
> > "
> > - if the PMD supports TX offloads AND
> > - if to be able use any of these offloads the upper layer SW would have to:
> > * modify the contents of the packet OR
> > * obey HW specific restrictions
> > then it is a PMD developer responsibility to provide tx_prep() that would 
> > implement
> > expected modifications of the packet contents and restriction checks.
> > Otherwise, tx_prep() implementation is not required and can be safely set 
> > to NULL.
> > "
> >
> > I copy/paste also my previous conclusion:
> >
> > Before txprep, there is only one API: the application must prepare the
> > packets checksum itself (get_psd_sum in testpmd).
> > With txprep, the application have 2 choices: keep doing the job itself
> > or call txprep which calls a PMD-specific function.
> 
> Something is definitely needed here, and only PMDs can provide it. I think
> applications should not have to clear checksum fields or initialize them to
> some magic value, same goes for any other offload or hardware limitation
> that needs to be worked around.
> 
> tx_prep() is one possible answer to this issue, however as mentioned in the
> original patch it can be very expensive if exposed by the PMD.
> 
> Another issue I'm more concerned about is the way limitations are managed
> (struct rte_eth_desc_lim). While not officially tied to tx_prep(), this
> structure contains new fields that are only relevant to a few devices, and I
> fear it will keep growing with each new hardware quirk to manage, breaking
> ABIs in the process.

Well, if some new HW capability/limitation would arise and we'd like to support
it in DPDK, then yes we probably would need to think how to incorporate it here.
Do you have anything particular in mind here?

> 
> What are applications supposed to do, check each of them regardless before
> attempting to send a burst?
> 
> I understand tx_prep() automates this process, however I'm wondering why
> isn't the TX burst function doing that itself. Using nb_mtu_seg_max as an
> example, tx_prep() has an extra check in case of TSO that the TX burst
> function does not perform. This ends up being much more expensive to
> applications due to the additional loop doing redundant testing on each
> mbuf.
> 
> If, say as a performance improvement, we decided to leave the validation
> part to the TX burst function; what remains in tx_prep() is basically heavy
> "preparation" requiring mbuf changes (i.e. erasing checksums, for now).
> 
> Following the same logic, why can't such a thing be made part of the TX
> burst function as well (through a direct call to rte_phdr_cksum_fix()
> whenever necessary). From an application standpoint, what are the advantages
> of having to:
> 
>  if (tx_prep()) // iterate and update mbufs as needed
>  tx_burst(); // iterate and send
> 
> Compared to:
> 
>  tx_burst(); // iterate, update as needed and send

I think that was discussed extensively quite a lot previously here:
As Thomas already replied - main motivation is to allow user
to execute them on different stages of packet TX pipeline,
and probably on different cores.
I think that provides better flexibility to the user to when/where
do these preparations and hopefully would lead to better performance.

Though, if you or any other PMD developer/maintainer would prefer
for particular PMD to combine both functionalities into tx_burst() and
keep tx_prep() as NOP - this is still possible too.  

> 
> Note that PMDs could still provide different TX callbacks depending on the
> set of enabled offloads so performance is not unnecessarily impacted.
> 
> In my opinion the second approach is both faster to applications and more
> friendly from a usability perspective, am I missing something obvious?
> 
> > The question is: does non-Intel drivers need a checksum preparation for TSO?
> > Will it behave well if txprep does nothing in these drivers?
> >
> > When looking at the code, most of drivers handle the TSO flags.
> > But it is hard to know whether they rely on the pseudo checksum or not.
> >
> > git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG' 
> > drivers/net/
> >
> > drivers/net/bnxt/bnxt_txr.c
> > drivers/net/cxgbe/sge.c
> > drivers/net/e1000/em_rxtx.c
> > drivers/net/e1000/igb_rxtx.c
> > drivers/net/ena/ena_ethdev.c
> > drivers/net/enic/enic_rxtx.c
> > drivers/net/fm10k/fm10k_rxtx.c
> > drivers/net/i40e/i40e_rxtx.c
> > drivers/net/ixgbe/ixgbe_rxtx.c
> > drivers/net/mlx4/mlx4.c
> > drivers/net/mlx5/mlx5_rxtx.c
> > drivers/net/nfp/nfp_net.c
> > drivers/net/qede/qede_rxtx.c
> > drivers/net/thunderx/nicvf_rxtx.c
> > drivers/net/virtio/

[dpdk-dev] [PATCH v12 0/6] add Tx preparation

2016-11-30 Thread Ananyev, Konstantin
Hi John,

> 
> Hi,
> -john
> 
> > -Original Message-
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > Sent: Monday, November 28, 2016 3:03 AM
> > To: dev at dpdk.org; Rahul Lakkireddy ;
> > Stephen Hurd ; Jan Medala
> > ; Jakub Palider ; John Daley
> > (johndale) ; Adrien Mazarguil
> > ; Alejandro Lucero
> > ; Harish Patil
> > ; Rasesh Mody ; Jerin
> > Jacob ; Yuanhan Liu
> > ; Yong Wang 
> > Cc: Tomasz Kulasek ;
> > konstantin.ananyev at intel.com; olivier.matz at 6wind.com
> > Subject: Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
> >
> > We need attention of every PMD developers on this thread.
> >
> > Reminder of what Konstantin suggested:
> > "
> > - if the PMD supports TX offloads AND
> > - if to be able use any of these offloads the upper layer SW would have to:
> > * modify the contents of the packet OR
> > * obey HW specific restrictions
> > then it is a PMD developer responsibility to provide tx_prep() that would
> > implement expected modifications of the packet contents and restriction
> > checks.
> > Otherwise, tx_prep() implementation is not required and can be safely set to
> > NULL.
> > "
> >
> > I copy/paste also my previous conclusion:
> >
> > Before txprep, there is only one API: the application must prepare the
> > packets checksum itself (get_psd_sum in testpmd).
> > With txprep, the application have 2 choices: keep doing the job itself or 
> > call
> > txprep which calls a PMD-specific function.
> > The question is: does non-Intel drivers need a checksum preparation for
> > TSO?
> > Will it behave well if txprep does nothing in these drivers?
> >
> > When looking at the code, most of drivers handle the TSO flags.
> > But it is hard to know whether they rely on the pseudo checksum or not.
> >
> > git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG'
> > drivers/net/
> >
> > drivers/net/bnxt/bnxt_txr.c
> > drivers/net/cxgbe/sge.c
> > drivers/net/e1000/em_rxtx.c
> > drivers/net/e1000/igb_rxtx.c
> > drivers/net/ena/ena_ethdev.c
> > drivers/net/enic/enic_rxtx.c
> > drivers/net/fm10k/fm10k_rxtx.c
> > drivers/net/i40e/i40e_rxtx.c
> > drivers/net/ixgbe/ixgbe_rxtx.c
> > drivers/net/mlx4/mlx4.c
> > drivers/net/mlx5/mlx5_rxtx.c
> > drivers/net/nfp/nfp_net.c
> > drivers/net/qede/qede_rxtx.c
> > drivers/net/thunderx/nicvf_rxtx.c
> > drivers/net/virtio/virtio_rxtx.c
> > drivers/net/vmxnet3/vmxnet3_rxtx.c
> >
> > Please, we need a comment for each driver saying "it is OK, we do not need
> > any checksum preparation for TSO"
> > or
> > "yes we have to implement tx_prepare or TSO will not work in this mode"
> 
> I like the idea of tx prep since it should make for cleaner apps.
> 
> For enic, I believe the answer is " it is OK, we do not need any checksum 
> preparation".
> 
> Prior to now, it was necessary to set IP checksum to 0 and put in a TCP/UDP 
> pseudo header. But there is a hardware overwrite of
> checksums option which makes preparation in software unnecessary and it is 
> testing out well so far. I plan to enable it in 17.02. TSO is also
> being enabled for 17.02 and it does not look like any prep is required. So 
> I'm going with " txprep NULL pointer is OK for enic", but may have
> to change my mind if something comes up in testing.

That's great thanks.
Other non-Intel PMD maintainers, please any feedback? 
Konstantin

> 
> -john


[dpdk-dev] [PATCH] net/qede: fix resource leak

2016-11-30 Thread Yong Wang
Current code does not close 'fd' on function exit, leaking resources.

Signed-off-by: Yong Wang 
---
 drivers/net/qede/qede_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/qede/qede_main.c b/drivers/net/qede/qede_main.c
index ab22409..b666e1c 100644
--- a/drivers/net/qede/qede_main.c
+++ b/drivers/net/qede/qede_main.c
@@ -137,6 +137,7 @@ static int qed_load_firmware_data(struct ecore_dev *edev)

if (fstat(fd, &st) < 0) {
DP_NOTICE(edev, false, "Can't stat firmware file\n");
+   close(fd);
return -1;
}

@@ -158,9 +159,11 @@ static int qed_load_firmware_data(struct ecore_dev *edev)
if (edev->fw_len < 104) {
DP_NOTICE(edev, false, "Invalid fw size: %" PRIu64 "\n",
  edev->fw_len);
+   close(fd);
return -EINVAL;
}

+   close(fd);
return 0;
 }
 #endif
-- 
1.8.3.1




[dpdk-dev] IXGBE VF server crash

2016-11-30 Thread Gregory Etelson
Hello,

I have a server with 8 VFs over Intel 82599 10Gb adapter
A test iterates DPDK process over each VF in turn for 10 sec:

while [ 1 ]; do
for vf in $VFS; do run_dpdk $vf; done
done

After several minutes server crashes.
The crash does not trigger kdump or leaves traces on console
The server power cycles

The crash reproduced on several different servers

Does anybody familiar with this fault ?
It there a way to stop it ?

Setup:
CentOS 6.8 x86_64
ixgbe-4.3.15
DPDK-16.11

Regards,
Gregory


[dpdk-dev] [dpdk-announce] DPDK 16.07.2 released

2016-11-30 Thread Yuanhan Liu
Hi all,

Here is a new stable release for 16.07:
http://fast.dpdk.org/rel/dpdk-16.07.2.tar.xz

Thanks everyone for making it happen!

Please note that this will be the last stable release for 16.07, and
16.11 will be our first LTS, which would have longer maintenance (2
years).

--yliu

---
 app/proc_info/main.c   |   6 +-
 app/test-pmd/cmdline.c |  70 +--
 app/test-pmd/config.c  |  61 ++-
 app/test-pmd/testpmd.c |  50 +-
 app/test-pmd/testpmd.h |   6 +
 app/test/test_hash_multiwriter.c   |  10 +-
 doc/guides/nics/enic.rst   |  53 ++-
 doc/guides/nics/i40e.rst   |  48 ++
 doc/guides/nics/mlx5.rst   |   3 +-
 doc/guides/rel_notes/release_16_07.rst |  95 +++-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst|   3 +-
 drivers/net/bnx2x/bnx2x.c  |   6 +-
 drivers/net/bnx2x/elink.c  |   2 +-
 drivers/net/bnxt/bnxt.h|   1 +
 drivers/net/bnxt/bnxt_ethdev.c |  30 +-
 drivers/net/bnxt/bnxt_vnic.c   |   2 +-
 drivers/net/bonding/rte_eth_bond_api.c |  15 -
 drivers/net/bonding/rte_eth_bond_pmd.c |  10 +
 drivers/net/ena/ena_ethdev.c   |  10 +-
 drivers/net/enic/base/vnic_rq.c|   6 +-
 drivers/net/enic/base/vnic_rq.h|   1 +
 drivers/net/enic/enic.h|   8 +-
 drivers/net/enic/enic_clsf.c   |   5 +-
 drivers/net/enic/enic_ethdev.c |  10 +-
 drivers/net/enic/enic_main.c   |  32 +-
 drivers/net/fm10k/fm10k_ethdev.c   |   5 +-
 drivers/net/fm10k/fm10k_rxtx.c |  10 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   3 +
 drivers/net/i40e/i40e_ethdev.c | 214 -
 drivers/net/i40e/i40e_ethdev.h |   4 +-
 drivers/net/i40e/i40e_ethdev_vf.c  |  81 +---
 drivers/net/i40e/i40e_pf.c |  33 +-
 drivers/net/i40e/i40e_pf.h |   3 +-
 drivers/net/i40e/i40e_rxtx_vec.c   |   4 +-
 drivers/net/ixgbe/ixgbe_fdir.c |  10 +-
 drivers/net/ixgbe/ixgbe_regs.h |  40 +-
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c |   3 +
 drivers/net/mlx5/Makefile  |  20 +
 drivers/net/mlx5/mlx5.c|   1 +
 drivers/net/mlx5/mlx5.h|   4 +
 drivers/net/mlx5/mlx5_ethdev.c | 162 ++-
 drivers/net/mlx5/mlx5_fdir.c   | 270 +++
 drivers/net/mlx5/mlx5_prm.h|  27 ++
 drivers/net/mlx5/mlx5_rxq.c|   9 +-
 drivers/net/mlx5/mlx5_rxtx.c   | 521 +
 drivers/net/mlx5/mlx5_rxtx.h   |   7 +-
 drivers/net/mlx5/mlx5_txq.c|   9 +-
 drivers/net/mlx5/mlx5_vlan.c   |   3 +-
 drivers/net/qede/Makefile  |   4 +
 drivers/net/ring/rte_eth_ring.c|   2 +-
 drivers/net/thunderx/nicvf_rxtx.c  |  11 +-
 drivers/net/virtio/virtio_ethdev.c |  14 +-
 drivers/net/vmxnet3/vmxnet3_rxtx.c |  34 +-
 examples/ip_pipeline/init.c|   2 +-
 examples/ipsec-secgw/ipsec-secgw.c |   4 +-
 examples/l2fwd-crypto/main.c   |  23 +-
 examples/qos_sched/app_thread.c|  22 +-
 examples/tep_termination/vxlan.c   |   4 +-
 lib/librte_eal/common/arch/arm/rte_cpuflags.c  |   1 +
 lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c   |   1 +
 lib/librte_eal/common/eal_common_pci.c |   2 +-
 lib/librte_eal/common/include/rte_version.h|   2 +-
 lib/librte_eal/linuxapp/kni/ethtool/igb/igb_main.c |  21 +-
 lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h  |   5 +
 .../linuxapp/kni/ethtool/ixgbe/ixgbe_main.c|   2 +-
 lib/librte_ether/rte_ethdev.c  |  15 +-
 lib/librte_hash/rte_cuckoo_hash.c  |   6 +-
 lib/librte_hash/rte_cuckoo_hash.h  |   2 +
 lib/librte_hash/rte_cuckoo_hash_x86.h  |   3 +-
 lib/librte_lpm/rte_lpm.c   |   6 +-
 lib/librte_mempool/rte_mempool.c   |   6 +-
 lib/librte_pdump/rte_pdump.c   |   4 +-
 lib/librte_vhost/vhost_rxtx.c  |  17 +-
 pkg/dpdk.spec  |   2 +-
 74 files changed, 1260 insertions(+), 941 deletions(-)

Adrien Mazarguil (1):
  net/mlx5: fix Rx VLAN offload capability report

Ajit Khaparde (1

[dpdk-dev] [PATCH v2 1/1] net/i40e: enable auto link update for 25G

2016-11-30 Thread Ferruh Yigit
On 11/29/2016 8:26 PM, Qi Zhang wrote:
> In previous patch for 25G (XXV710) enable
> 75d133dd329: ("net/i40e: enable 25G device"),
> we intend to disable the auto linke update as a work around
> for the issue that link can't be turn on when auto link update
> is enabled. Now we know the root cause, there are interface 
> changes of AQ command "set_phy_config" and "get_phy_capabilities" 
> for 25G. So, this patch remove this limitation.
> 
> Signed-off-by: Qi Zhang 

Commit log updated a little.

Applied to dpdk-next-net/master, thanks.


[dpdk-dev] [PATCH] e1000/base: announce supported NICs

2016-11-30 Thread Ferruh Yigit
On 11/27/2016 6:11 PM, Wenzhuo Lu wrote:
> Announce the support of I219 NICs. Also add all the
> other supported NICs.
> 
> Add Intel I219 NICs support in release note too.
> 
> Signed-off-by: Wenzhuo Lu 

Commit log and release notes wording updated.

Applied to dpdk-next-net/master, thanks.



[dpdk-dev] [PATCH v2] net/i40evf: fix casting between structs

2016-11-30 Thread Ferruh Yigit
On 11/30/2016 2:02 AM, Jingjing Wu wrote:
> Casting from structs which lay out data in typed members
> to structs which have flat memory buffers, will cause
> problems if the alignment of the former isn't as expected.
> This patch removes the casting between structs.
> 
> Fixes: ae19955e7c86 ("i40evf: support reporting PF reset")
> Signed-off-by: Jingjing Wu 

Applied to dpdk-next-net/master, thanks.



[dpdk-dev] [PATCH v2] maintainers: update pcap pmd maintainers

2016-11-30 Thread Thomas Monjalon
2016-11-29 14:39, John McNamara:
> Remove Nico Pernas Maradei as a PCAP PMD maintainer.
> 
> Signed-off-by: John McNamara 

Yes he is not active anymore.
Note he is welcome to come back at any time.

Acked-by: Thomas Monjalon 

Applied, thanks


[dpdk-dev] [PATCH] scripts: fix checkpatch from standard input

2016-11-30 Thread Thomas Monjalon
2016-11-28 16:21, Olivier Matz:
> On Mon, 21 Nov 2016 23:42:41 +0100, Thomas Monjalon
>  wrote:
> > When checking a valid patch from standard input,
> > the footer lines of the report are not filtered out.
> > 
> > The function check is called outside of any loop,
> > so the statement continue has no effect and the footer is printed.
> > 
> > Fixes: 8005feef421d ("scripts: add standard input to checkpatch")
> > 
> > Signed-off-by: Thomas Monjalon 
> 
> The 'continue' statement is not always without effect. On my machine
> (but it looks it's not the same everywhere):
> - with dash, the 'continue' acts like a return in that case
> - with bash, it displays an error:
>   "continue: only meaningful in a `for', `while', or `until' loop"
> - with bash --posix, the 'continue' is ignored...
> 
> In my case, checkpatches.sh was displaying "0/1 valid" although there
> was no error. This patch solves the issue, thanks.
> 
> 
> Acked-by: Olivier Matz 

I've amended with your explanations and applied, thanks


[dpdk-dev] [PATCH] scripts: check cc stable mailing list in commit

2016-11-30 Thread Ferruh Yigit
On 11/21/2016 10:43 PM, Thomas Monjalon wrote:
> Add a check for commits fixing a released bug.
> Such commits are found thanks to scripts/git-log-fixes.sh.
> They must be sent CC: stable at dpdk.org.
> In order to avoid forgetting CC, this mail header can be written
> in the git commit message.
> 
> Signed-off-by: Thomas Monjalon 

I think this is useful, thanks for the patch.

> ---
>  scripts/check-git-log.sh | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/scripts/check-git-log.sh b/scripts/check-git-log.sh
> index 5f8a9fc..4f98a7a 100755
> --- a/scripts/check-git-log.sh
> +++ b/scripts/check-git-log.sh
> @@ -47,12 +47,14 @@ if [ "$1" = '-h' -o "$1" = '--help' ] ; then
>   exit
>  fi
>  
> +selfdir=$(dirname $(readlink -e $0))
>  range=${1:-origin/master..}
>  
>  commits=$(git log --format='%h' --reverse $range)
>  headlines=$(git log --format='%s' --reverse $range)
>  bodylines=$(git log --format='%b' --reverse $range)
>  fixes=$(git log --format='%h %s' --reverse $range | grep -i ': *fix' | cut 
> -d' ' -f1)
> +stablefixes=$($selfdir/git-log-fixes.sh $range | sed '/(N\/A)$/d'  | cut -d' 
> ' -f2)

This breaks the "check-git-log.sh -N" usage, since "-N" is not a valid
range for git-log-fixes.sh.
Generates warning:
.../scripts/git-log-fixes.sh: illegal option -- 6
usage: git-log-fixes.sh [-h] 

>  tags=$(git log --format='%b' --reverse $range | grep -i -e 'by *:' -e 
> 'fix.*:')
>  bytag='\(Reported\|Suggested\|Signed-off\|Acked\|Reviewed\|Tested\)-by:'
>  
> @@ -191,3 +193,10 @@ bad=$(for fixtag in $fixtags ; do
>   printf "$fixtag" | grep -v "^$good$"
>  done | sed 's,^,\t,')
>  [ -z "$bad" ] || printf "Wrong 'Fixes' reference:\n$bad\n"
> +
> +# check CC:stable for fixes
> +bad=$(for fix in $stablefixes ; do
> + git log --format='%b' -1 $fix | grep -qi '^CC: *stable at dpdk.org' ||
> + git log --format='\t%s' -1 $fix
> +done)
> +[ -z "$bad" ] || printf "Should CC: stable at dpdk.org\n$bad\n"

This is good for developer, but since "CC: xx" tags removed when patch
applied, this will generate warnings when run against existing history.

I don't know what can be done for this.

Or should we keep CC: tags in commit log perhaps?

> 



[dpdk-dev] [PATCH] scripts: check cc stable mailing list in commit

2016-11-30 Thread Thomas Monjalon
2016-11-30 14:54, Ferruh Yigit:
> On 11/21/2016 10:43 PM, Thomas Monjalon wrote:
> > +stablefixes=$($selfdir/git-log-fixes.sh $range | sed '/(N\/A)$/d'  | cut 
> > -d' ' -f2)
> 
> This breaks the "check-git-log.sh -N" usage, since "-N" is not a valid
> range for git-log-fixes.sh.
> Generates warning:
> .../scripts/git-log-fixes.sh: illegal option -- 6
> usage: git-log-fixes.sh [-h] 

Yes, good catch.
I'm trying to fix it by converting -N to HEAD~N..

if printf -- $range | grep -q '^-[0-9]\+' ; then
range="HEAD$(printf -- $range | sed 's,^-,~,').."
fi

> > +# check CC:stable for fixes
> > +bad=$(for fix in $stablefixes ; do
> > +   git log --format='%b' -1 $fix | grep -qi '^CC: *stable at dpdk.org' ||
> > +   git log --format='\t%s' -1 $fix
> > +done)
> > +[ -z "$bad" ] || printf "Should CC: stable at dpdk.org\n$bad\n"
> 
> This is good for developer, but since "CC: xx" tags removed when patch
> applied, this will generate warnings when run against existing history.

I do not think it is a problem.
Who runs this tool against existing history?

> I don't know what can be done for this.
> 
> Or should we keep CC: tags in commit log perhaps?

I do not see the value of keeping the CC: in the git history.


[dpdk-dev] [PATCH] cryptodev: fix crash on null dereference

2016-11-30 Thread De Lara Guarch, Pablo
Hi Jerin,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> Sent: Tuesday, November 15, 2016 7:12 PM
> To: dev at dpdk.org
> Cc: Doherty, Declan; Jerin Jacob
> Subject: [dpdk-dev] [PATCH] cryptodev: fix crash on null dereference
> 
> crypodev->data->name will be null when
> rte_cryptodev_get_dev_id() invoked without a valid
> crypto device instance.
> 
> Signed-off-by: Jerin Jacob 

Could you add a "Fixes" line? 

Thanks,
Pablo


[dpdk-dev] [PATCH] test: adding AES cipher-only tests on QAT PMD

2016-11-30 Thread De Lara Guarch, Pablo


> -Original Message-
> From: Kusztal, ArkadiuszX
> Sent: Friday, November 25, 2016 2:14 PM
> To: Trahe, Fiona; dev at dpdk.org
> Cc: De Lara Guarch, Pablo; Griffin, John
> Subject: RE: [PATCH] test: adding AES cipher-only tests on QAT PMD
> 
> 
> 
> > -Original Message-
> > From: Trahe, Fiona
> > Sent: Thursday, November 24, 2016 6:29 PM
> > To: dev at dpdk.org
> > Cc: De Lara Guarch, Pablo ; Trahe, Fiona
> > ; Griffin, John ; 
> > Kusztal,
> > ArkadiuszX 
> > Subject: [PATCH] test: adding AES cipher-only tests on QAT PMD
> >
> > Extended functional AES-CBC and AES-CTR cipher-only tests to run on QAT
> > PMD.
> > Added AES_CBC cipher-only performance tests on QAT PMD.
> > No driver changes, but as now tested, QAT documentation is updated to
> > remove constraint.
> >
> > Signed-off-by: Fiona Trahe 
> > ---
> >  app/test/test_cryptodev.c  | 18 ++
> >  app/test/test_cryptodev_aes_test_vectors.h | 36 +++
> >  app/test/test_cryptodev_perf.c | 96 +++--
> 
> > -
> >  doc/guides/cryptodevs/qat.rst  |  1 -
> >  4 files changed, 102 insertions(+), 49 deletions(-)
> >
> > --
> > 2.5.0
> Acked-by: Arek Kusztal 

Applied to dpdk-next-crypto.
Thanks,

Pablo


[dpdk-dev] [PATCH] crypto: remove unused digest-appended feature

2016-11-30 Thread De Lara Guarch, Pablo


> -Original Message-
> From: Griffin, John
> Sent: Friday, November 18, 2016 6:16 PM
> To: Trahe, Fiona; dev at dpdk.org
> Cc: De Lara Guarch, Pablo; Jastrzebski, MichalX K; Kusztal, ArkadiuszX
> Subject: Re: [PATCH] crypto: remove unused digest-appended feature
> 
> On 17/11/16 17:33, Fiona Trahe wrote:
> > The cryptodev API had specified that if the digest address field was
> > left empty on an authentication operation, then the PMD would assume
> > the digest was appended to the source or destination data.
> > This case was not handled at all by most PMDs and incorrectly handled
> > by the QAT PMD.
> > As no bugs were raised, it is assumed to be not needed, so this patch
> > removes it, rather than add handling for the case on all PMDs.
> > The digest can still be appended to the data, but its
> > address must now be provided in the op.
> >
> > Signed-off-by: Fiona Trahe 
> > ---
> >   drivers/crypto/qat/qat_adf/qat_algs_build_desc.c |  2 ++
> >   drivers/crypto/qat/qat_crypto.c  | 18 +-
> >   lib/librte_cryptodev/rte_crypto_sym.h| 10 +-
> >   3 files changed, 4 insertions(+), 26 deletions(-)
> >
> 
> Acked-by: John Griffin 

Applied to dpdk-next-crypto.
Thanks,

Pablo


[dpdk-dev] [PATCH] scripts: check cc stable mailing list in commit

2016-11-30 Thread Bruce Richardson
On Wed, Nov 30, 2016 at 04:09:47PM +0100, Thomas Monjalon wrote:
> 2016-11-30 14:54, Ferruh Yigit:
> > On 11/21/2016 10:43 PM, Thomas Monjalon wrote:
> > > +stablefixes=$($selfdir/git-log-fixes.sh $range | sed '/(N\/A)$/d'  | cut 
> > > -d' ' -f2)
> > 
> > This breaks the "check-git-log.sh -N" usage, since "-N" is not a valid
> > range for git-log-fixes.sh.
> > Generates warning:
> > .../scripts/git-log-fixes.sh: illegal option -- 6
> > usage: git-log-fixes.sh [-h] 
> 
> Yes, good catch.
> I'm trying to fix it by converting -N to HEAD~N..
> 
> if printf -- $range | grep -q '^-[0-9]\+' ; then
> range="HEAD$(printf -- $range | sed 's,^-,~,').."
> fi
> 
> > > +# check CC:stable for fixes
> > > +bad=$(for fix in $stablefixes ; do
> > > + git log --format='%b' -1 $fix | grep -qi '^CC: *stable at dpdk.org' ||
> > > + git log --format='\t%s' -1 $fix
> > > +done)
> > > +[ -z "$bad" ] || printf "Should CC: stable at dpdk.org\n$bad\n"
> > 
> > This is good for developer, but since "CC: xx" tags removed when patch
> > applied, this will generate warnings when run against existing history.
> 
> I do not think it is a problem.
> Who runs this tool against existing history?
>

Me for one. I prefer to run the script against the commits in the repo
before I generate the patches, rather than manually hand-editing the
patches afterward - or having to fix the repo and then regenerate them.
Also, when I was maintaining the next-net tree, I used to use pwclient git-am
to apply a patch, and then check-got-log.sh -1 to sanity check it once
build checks had passed.

/Bruce



[dpdk-dev] [PATCH] scripts: check cc stable mailing list in commit

2016-11-30 Thread Thomas Monjalon
2016-11-30 15:26, Bruce Richardson:
> On Wed, Nov 30, 2016 at 04:09:47PM +0100, Thomas Monjalon wrote:
> > 2016-11-30 14:54, Ferruh Yigit:
> > > On 11/21/2016 10:43 PM, Thomas Monjalon wrote:
> > > > +stablefixes=$($selfdir/git-log-fixes.sh $range | sed '/(N\/A)$/d'  | 
> > > > cut -d' ' -f2)
> > > 
> > > This breaks the "check-git-log.sh -N" usage, since "-N" is not a valid
> > > range for git-log-fixes.sh.
> > > Generates warning:
> > > .../scripts/git-log-fixes.sh: illegal option -- 6
> > > usage: git-log-fixes.sh [-h] 
> > 
> > Yes, good catch.
> > I'm trying to fix it by converting -N to HEAD~N..
> > 
> > if printf -- $range | grep -q '^-[0-9]\+' ; then
> > range="HEAD$(printf -- $range | sed 's,^-,~,').."
> > fi
> > 
> > > > +# check CC:stable for fixes
> > > > +bad=$(for fix in $stablefixes ; do
> > > > +   git log --format='%b' -1 $fix | grep -qi '^CC: *stable at 
> > > > dpdk.org' ||
> > > > +   git log --format='\t%s' -1 $fix
> > > > +done)
> > > > +[ -z "$bad" ] || printf "Should CC: stable at dpdk.org\n$bad\n"
> > > 
> > > This is good for developer, but since "CC: xx" tags removed when patch
> > > applied, this will generate warnings when run against existing history.
> > 
> > I do not think it is a problem.
> > Who runs this tool against existing history?
> >
> 
> Me for one. I prefer to run the script against the commits in the repo
> before I generate the patches, rather than manually hand-editing the
> patches afterward - or having to fix the repo and then regenerate them.
> Also, when I was maintaining the next-net tree, I used to use pwclient git-am
> to apply a patch, and then check-got-log.sh -1 to sanity check it once
> build checks had passed.

I am not sure to understand.
You explain that you run the script for the commits you are going to send
or going to push. That's the normal usage.
In your cases you should have the CC: stable or you will have the warning.



[dpdk-dev] [PATCH] scripts: check cc stable mailing list in commit

2016-11-30 Thread Bruce Richardson
On Wed, Nov 30, 2016 at 04:31:46PM +0100, Thomas Monjalon wrote:
> 2016-11-30 15:26, Bruce Richardson:
> > On Wed, Nov 30, 2016 at 04:09:47PM +0100, Thomas Monjalon wrote:
> > > 2016-11-30 14:54, Ferruh Yigit:
> > > > On 11/21/2016 10:43 PM, Thomas Monjalon wrote:
> > > > > +stablefixes=$($selfdir/git-log-fixes.sh $range | sed '/(N\/A)$/d'  | 
> > > > > cut -d' ' -f2)
> > > > 
> > > > This breaks the "check-git-log.sh -N" usage, since "-N" is not a valid
> > > > range for git-log-fixes.sh.
> > > > Generates warning:
> > > > .../scripts/git-log-fixes.sh: illegal option -- 6
> > > > usage: git-log-fixes.sh [-h] 
> > > 
> > > Yes, good catch.
> > > I'm trying to fix it by converting -N to HEAD~N..
> > > 
> > > if printf -- $range | grep -q '^-[0-9]\+' ; then
> > > range="HEAD$(printf -- $range | sed 's,^-,~,').."
> > > fi
> > > 
> > > > > +# check CC:stable for fixes
> > > > > +bad=$(for fix in $stablefixes ; do
> > > > > + git log --format='%b' -1 $fix | grep -qi '^CC: *stable at 
> > > > > dpdk.org' ||
> > > > > + git log --format='\t%s' -1 $fix
> > > > > +done)
> > > > > +[ -z "$bad" ] || printf "Should CC: stable at dpdk.org\n$bad\n"
> > > > 
> > > > This is good for developer, but since "CC: xx" tags removed when patch
> > > > applied, this will generate warnings when run against existing history.
> > > 
> > > I do not think it is a problem.
> > > Who runs this tool against existing history?
> > >
> > 
> > Me for one. I prefer to run the script against the commits in the repo
> > before I generate the patches, rather than manually hand-editing the
> > patches afterward - or having to fix the repo and then regenerate them.
> > Also, when I was maintaining the next-net tree, I used to use pwclient 
> > git-am
> > to apply a patch, and then check-got-log.sh -1 to sanity check it once
> > build checks had passed.
> 
> I am not sure to understand.
> You explain that you run the script for the commits you are going to send
> or going to push. That's the normal usage.
> In your cases you should have the CC: stable or you will have the warning.
> 
Ah, yes, good point.
Never mind.

/Bruce


[dpdk-dev] [PATCH 0/3] Add scatter-gather list capability to Intel QuickAssist Technology driver

2016-11-30 Thread Arek Kusztal
This patchset adds scatter-gather list (SGL) capability to Intel QuickAssist 
Technology driver
and corresponding tests to QAT cryptodev test suite.

This patchset depends on the following patches/patchsets:

"crypto/qat: fix to avoid buffer overwrite in OOP case"
(http://dpdk.org/dev/patchwork/patch/17241)

Arek Kusztal (3):
  lib/librte_cryptodev: add private member to crypto op struct
  crypto/qat: add SGL capability to Intel QuickAssist driver
  app/test: add SGL tests to cryptodev QAT suite

 app/test/test_cryptodev.c  | 356 +
 app/test/test_cryptodev_gcm_test_vectors.h | 823 -
 doc/guides/rel_notes/release_17_02.rst |   5 +
 drivers/crypto/qat/qat_crypto.c| 145 -
 drivers/crypto/qat/qat_crypto.h|   6 +
 drivers/crypto/qat/qat_qp.c|  28 +
 lib/librte_cryptodev/rte_crypto.h  |   4 +
 7 files changed, 1353 insertions(+), 14 deletions(-)

-- 
2.1.0



[dpdk-dev] [PATCH 1/3] lib/librte_cryptodev: add private member to crypto op struct

2016-11-30 Thread Arek Kusztal
This commit adds void * _priv member to rte_crypto_op struct to be used
by internal driver operations.

Signed-off-by: Arek Kusztal 
---
 lib/librte_cryptodev/rte_crypto.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/lib/librte_cryptodev/rte_crypto.h 
b/lib/librte_cryptodev/rte_crypto.h
index 9019518..9aa09ce 100644
--- a/lib/librte_cryptodev/rte_crypto.h
+++ b/lib/librte_cryptodev/rte_crypto.h
@@ -117,6 +117,9 @@ struct rte_crypto_op {
struct rte_crypto_sym_op *sym;
/**< Symmetric operation parameters */
}; /**< operation specific parameters */
+
+   void *_priv;
+   /**< Private pointer to be used by driver */
 } __rte_cache_aligned;

 /**
@@ -146,6 +149,7 @@ __rte_crypto_op_reset(struct rte_crypto_op *op, enum 
rte_crypto_op_type type)
}

op->opaque_data = NULL;
+   op->_priv = NULL;
 }

 /**
-- 
2.1.0



[dpdk-dev] [PATCH 2/3] crypto/qat: add SGL capability to Intel QuickAssist driver

2016-11-30 Thread Arek Kusztal
This commit adds scatter-gather list capability to Intel QuickAssist
Technology driver.

Signed-off-by: Arek Kusztal 
---
 doc/guides/rel_notes/release_17_02.rst |   5 ++
 drivers/crypto/qat/qat_crypto.c| 145 ++---
 drivers/crypto/qat/qat_crypto.h|   6 ++
 drivers/crypto/qat/qat_qp.c|  28 +++
 4 files changed, 173 insertions(+), 11 deletions(-)

diff --git a/doc/guides/rel_notes/release_17_02.rst 
b/doc/guides/rel_notes/release_17_02.rst
index 3b65038..87b 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -38,6 +38,11 @@ New Features
  Also, make sure to start the actual text at the margin.
  =

+* **Updated the QAT PMD.**
+
+  The QAT PMD was updated with additional support for:
+
+  * Scatter-gather list (SGL) support.

 Resolved Issues
 ---
diff --git a/drivers/crypto/qat/qat_crypto.c b/drivers/crypto/qat/qat_crypto.c
index afce4ac..fa3c277 100644
--- a/drivers/crypto/qat/qat_crypto.c
+++ b/drivers/crypto/qat/qat_crypto.c
@@ -67,6 +67,13 @@

 #define BYTE_LENGTH8

+#define SGL_2ND_COOKIE_OFF (QAT_SGL_MAX_NUMBER \
+   * sizeof(struct qat_alg_buf) \
+   + sizeof(struct qat_alg_buf_list))
+
+#define SGL_SECOND_COOKIE_ADDR(arg, cast)  ((cast)(arg) \
+   + SGL_2ND_COOKIE_OFF)
+
 static const struct rte_cryptodev_capabilities qat_pmd_capabilities[] = {
{   /* SHA1 HMAC */
.op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
@@ -503,7 +510,8 @@ static inline uint32_t
 adf_modulo(uint32_t data, uint32_t shift);

 static inline int
-qat_write_hw_desc_entry(struct rte_crypto_op *op, uint8_t *out_msg);
+qat_write_hw_desc_entry(struct rte_crypto_op *op,
+   uint8_t *out_msg, struct qat_qp *qp);

 void qat_crypto_sym_clear_session(struct rte_cryptodev *dev,
void *session)
@@ -839,7 +847,6 @@ unsigned qat_crypto_sym_get_session_private_size(
return RTE_ALIGN_CEIL(sizeof(struct qat_session), 8);
 }

-
 uint16_t
 qat_pmd_enqueue_op_burst(void *qp, struct rte_crypto_op **ops,
uint16_t nb_ops)
@@ -873,9 +880,16 @@ qat_pmd_enqueue_op_burst(void *qp, struct rte_crypto_op 
**ops,
}

while (nb_ops_sent != nb_ops_possible) {
-   ret = qat_write_hw_desc_entry(*cur_op, base_addr + tail);
+   ret = qat_write_hw_desc_entry(*cur_op, base_addr + tail,
+   tmp_qp);
if (ret != 0) {
tmp_qp->stats.enqueue_err_count++;
+   /*
+* This message cannot be enqueued,
+* decrease number of ops that wasnt sent
+*/
+   rte_atomic16_sub(&tmp_qp->inflights16,
+   nb_ops_possible - nb_ops_sent);
if (nb_ops_sent == 0)
return 0;
goto kick_tail;
@@ -884,6 +898,7 @@ qat_pmd_enqueue_op_burst(void *qp, struct rte_crypto_op 
**ops,
tail = adf_modulo(tail + queue->msg_size, queue->modulo);
nb_ops_sent++;
cur_op++;
+
}
 kick_tail:
WRITE_CSR_RING_TAIL(tmp_qp->mmap_bar_addr, queue->hw_bundle_number,
@@ -914,7 +929,7 @@ qat_pmd_dequeue_op_burst(void *qp, struct rte_crypto_op 
**ops,

 #ifdef RTE_LIBRTE_PMD_QAT_DEBUG_RX
rte_hexdump(stdout, "qat_response:", (uint8_t *)resp_msg,
-   sizeof(struct icp_qat_fw_comn_resp));
+   sizeof(struct icp_qat_fw_comn_resp));
 #endif
if (ICP_QAT_FW_COMN_STATUS_FLAG_OK !=
ICP_QAT_FW_COMN_RESP_CRYPTO_STAT_GET(
@@ -923,6 +938,15 @@ qat_pmd_dequeue_op_burst(void *qp, struct rte_crypto_op 
**ops,
} else {
rx_op->status = RTE_CRYPTO_OP_STATUS_SUCCESS;
}
+   /*
+* _priv set for symmetric means it is SGL
+* free this address from qp mempool of SGL's
+*/
+   if (rx_op->_priv) {
+   rte_mempool_put(tmp_qp->sgl_pool, rx_op->_priv);
+   rx_op->_priv = NULL;
+   }
+
*(uint32_t *)resp_msg = ADF_RING_EMPTY_SIG;
queue->head = adf_modulo(queue->head +
queue->msg_size,
@@ -945,8 +969,51 @@ qat_pmd_dequeue_op_burst(void *qp, struct rte_crypto_op 
**ops,
 }

 static inline int
-qat_write_hw_desc_entry(struct rte_crypto_op *op, uint8_t *out_msg)
+qat_sgl_fill_array(struct rte_mbuf *buf, uint64_t buff_start,
+   void *sgl_cookie, int32_t total_pck_len)
 {
+   int nr = 0;
+   struct qat_alg_buf_list *list = sgl_cookie;
+
+   int32_t to

[dpdk-dev] [PATCH 3/3] app/test: add SGL tests to cryptodev QAT suite

2016-11-30 Thread Arek Kusztal
This commit adds GCM tests to use within scatter-gather list.
Test use direct chained mbufs created based on the input parameter
for max size for in place operations and out of place operations.

Signed-off-by: Arek Kusztal 
---
 app/test/test_cryptodev.c  | 356 +
 app/test/test_cryptodev_gcm_test_vectors.h | 823 -
 2 files changed, 1176 insertions(+), 3 deletions(-)

diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c
index 00dced5..f1f3542 100644
--- a/app/test/test_cryptodev.c
+++ b/app/test/test_cryptodev.c
@@ -5924,6 +5924,356 @@ test_authenticated_decryption_fail_when_corruption(
 }

 static int
+create_gcm_operation_SGL(enum rte_crypto_cipher_operation op,
+   const struct gcm_test_data *tdata,
+   void *digest_mem, uint64_t digest_phys)
+{
+   struct crypto_testsuite_params *ts_params = &testsuite_params;
+   struct crypto_unittest_params *ut_params = &unittest_params;
+
+   const unsigned int auth_tag_len = tdata->auth_tag.len;
+   const unsigned int iv_len = tdata->iv.len;
+   const unsigned int aad_len = tdata->aad.len;
+
+   unsigned int iv_pad_len = 0, aad_buffer_len = 0;
+
+   /* Generate Crypto op data structure */
+   ut_params->op = rte_crypto_op_alloc(ts_params->op_mpool,
+   RTE_CRYPTO_OP_TYPE_SYMMETRIC);
+   TEST_ASSERT_NOT_NULL(ut_params->op,
+   "Failed to allocate symmetric crypto operation struct");
+
+   struct rte_crypto_sym_op *sym_op = ut_params->op->sym;
+
+   sym_op->auth.digest.data = digest_mem;
+
+   TEST_ASSERT_NOT_NULL(sym_op->auth.digest.data,
+   "no room to append digest");
+
+   sym_op->auth.digest.phys_addr = digest_phys;
+   sym_op->auth.digest.length = auth_tag_len;
+
+   if (op == RTE_CRYPTO_CIPHER_OP_DECRYPT) {
+   rte_memcpy(sym_op->auth.digest.data, tdata->auth_tag.data,
+   auth_tag_len);
+   TEST_HEXDUMP(stdout, "digest:",
+   sym_op->auth.digest.data,
+   sym_op->auth.digest.length);
+   }
+
+   iv_pad_len = RTE_ALIGN_CEIL(iv_len, 16);
+
+   sym_op->cipher.iv.data = (uint8_t *)rte_pktmbuf_prepend(
+   ut_params->ibuf, iv_pad_len);
+
+   TEST_ASSERT_NOT_NULL(sym_op->cipher.iv.data,
+   "no room to prepend iv");
+
+   memset(sym_op->cipher.iv.data, 0, iv_pad_len);
+   sym_op->cipher.iv.phys_addr = rte_pktmbuf_mtophys(ut_params->ibuf);
+   sym_op->cipher.iv.length = iv_len;
+
+   rte_memcpy(sym_op->cipher.iv.data, tdata->iv.data, iv_pad_len);
+
+   aad_buffer_len = ALIGN_POW2_ROUNDUP(aad_len, 16);
+
+   sym_op->auth.aad.data = (uint8_t *)rte_pktmbuf_prepend(
+   ut_params->ibuf, aad_buffer_len);
+   TEST_ASSERT_NOT_NULL(sym_op->auth.aad.data,
+   "no room to prepend aad");
+   sym_op->auth.aad.phys_addr = rte_pktmbuf_mtophys(
+   ut_params->ibuf);
+   sym_op->auth.aad.length = aad_len;
+
+   memset(sym_op->auth.aad.data, 0, aad_buffer_len);
+   rte_memcpy(sym_op->auth.aad.data, tdata->aad.data, aad_len);
+
+   TEST_HEXDUMP(stdout, "iv:", sym_op->cipher.iv.data, iv_pad_len);
+   TEST_HEXDUMP(stdout, "aad:",
+   sym_op->auth.aad.data, aad_len);
+
+   sym_op->cipher.data.length = tdata->plaintext.len;
+   sym_op->cipher.data.offset = aad_buffer_len + iv_pad_len;
+
+   sym_op->auth.data.offset = aad_buffer_len + iv_pad_len;
+   sym_op->auth.data.length = tdata->plaintext.len;
+
+   return 0;
+}
+
+#define SGL_MAX_NO 16
+
+static int
+test_AES_GCM_authenticated_encryption_SGL(const struct gcm_test_data *tdata,
+   const int oop, uint32_t fragsz, uint32_t fragsz_oop)
+{
+   struct crypto_testsuite_params *ts_params = &testsuite_params;
+   struct crypto_unittest_params *ut_params = &unittest_params;
+   struct rte_mbuf *buf, *buf_oop = NULL, *buf_last_oop = NULL;
+   int retval;
+   int to_trn = 0;
+   int to_trn_tbl[SGL_MAX_NO];
+   int segs = 1;
+   unsigned int trn_data = 0;
+   uint8_t *plaintext, *ciphertext, *auth_tag;
+
+   if (fragsz > tdata->plaintext.len)
+   fragsz = tdata->plaintext.len;
+
+   uint16_t plaintext_len = fragsz;
+   uint16_t frag_size_oop = fragsz_oop ? fragsz_oop : fragsz;
+
+   if (fragsz_oop > tdata->plaintext.len)
+   frag_size_oop = tdata->plaintext.len;
+
+   int ecx = 0;
+   void *digest_mem = NULL;
+
+   uint32_t prepend_len = ALIGN_POW2_ROUNDUP(tdata->iv.len, 16)
+   + tdata->aad.len;
+
+   if (tdata->plaintext.len % fragsz != 0) {
+   if (tdata->plaintext.len / fragsz + 1 > SGL_MAX_NO)
+   return 1;
+   }   else {
+   if (tdata->plaintext.len

[dpdk-dev] [PATCH v12 0/6] add Tx preparation

2016-11-30 Thread Harish Patil


>We need attention of every PMD developers on this thread.
>
>Reminder of what Konstantin suggested:
>"
>- if the PMD supports TX offloads AND
>- if to be able use any of these offloads the upper layer SW would have
>to:
>* modify the contents of the packet OR
>* obey HW specific restrictions
>then it is a PMD developer responsibility to provide tx_prep() that would
>implement
>expected modifications of the packet contents and restriction checks.
>Otherwise, tx_prep() implementation is not required and can be safely set
>to NULL.  
>"
>
>I copy/paste also my previous conclusion:
>
>Before txprep, there is only one API: the application must prepare the
>packets checksum itself (get_psd_sum in testpmd).
>With txprep, the application have 2 choices: keep doing the job itself
>or call txprep which calls a PMD-specific function.
>The question is: does non-Intel drivers need a checksum preparation for
>TSO?
>Will it behave well if txprep does nothing in these drivers?
>
>When looking at the code, most of drivers handle the TSO flags.
>But it is hard to know whether they rely on the pseudo checksum or not.
>
>git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG'
>drivers/net/
>
>drivers/net/bnxt/bnxt_txr.c
>drivers/net/cxgbe/sge.c
>drivers/net/e1000/em_rxtx.c
>drivers/net/e1000/igb_rxtx.c
>drivers/net/ena/ena_ethdev.c
>drivers/net/enic/enic_rxtx.c
>drivers/net/fm10k/fm10k_rxtx.c
>drivers/net/i40e/i40e_rxtx.c
>drivers/net/ixgbe/ixgbe_rxtx.c
>drivers/net/mlx4/mlx4.c
>drivers/net/mlx5/mlx5_rxtx.c
>drivers/net/nfp/nfp_net.c
>drivers/net/qede/qede_rxtx.c
>drivers/net/thunderx/nicvf_rxtx.c
>drivers/net/virtio/virtio_rxtx.c
>drivers/net/vmxnet3/vmxnet3_rxtx.c
>
>Please, we need a comment for each driver saying
>"it is OK, we do not need any checksum preparation for TSO"
>or
>"yes we have to implement tx_prepare or TSO will not work in this mode"
>

qede PMD doesn?t currently support TSO yet, it only supports Tx TCP/UDP/IP
csum offloads.
So Tx preparation isn?t applicable. So as of now -
"it is OK, we do not need any checksum preparation for TSO"


Thanks,
Harish



[dpdk-dev] [PATCH v12 0/6] add Tx preparation

2016-11-30 Thread Ananyev, Konstantin

Hi Harish,
> 
> 
> >We need attention of every PMD developers on this thread.
> >
> >Reminder of what Konstantin suggested:
> >"
> >- if the PMD supports TX offloads AND
> >- if to be able use any of these offloads the upper layer SW would have
> >to:
> >* modify the contents of the packet OR
> >* obey HW specific restrictions
> >then it is a PMD developer responsibility to provide tx_prep() that would
> >implement
> >expected modifications of the packet contents and restriction checks.
> >Otherwise, tx_prep() implementation is not required and can be safely set
> >to NULL.
> >"
> >
> >I copy/paste also my previous conclusion:
> >
> >Before txprep, there is only one API: the application must prepare the
> >packets checksum itself (get_psd_sum in testpmd).
> >With txprep, the application have 2 choices: keep doing the job itself
> >or call txprep which calls a PMD-specific function.
> >The question is: does non-Intel drivers need a checksum preparation for
> >TSO?
> >Will it behave well if txprep does nothing in these drivers?
> >
> >When looking at the code, most of drivers handle the TSO flags.
> >But it is hard to know whether they rely on the pseudo checksum or not.
> >
> >git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG'
> >drivers/net/
> >
> >drivers/net/bnxt/bnxt_txr.c
> >drivers/net/cxgbe/sge.c
> >drivers/net/e1000/em_rxtx.c
> >drivers/net/e1000/igb_rxtx.c
> >drivers/net/ena/ena_ethdev.c
> >drivers/net/enic/enic_rxtx.c
> >drivers/net/fm10k/fm10k_rxtx.c
> >drivers/net/i40e/i40e_rxtx.c
> >drivers/net/ixgbe/ixgbe_rxtx.c
> >drivers/net/mlx4/mlx4.c
> >drivers/net/mlx5/mlx5_rxtx.c
> >drivers/net/nfp/nfp_net.c
> >drivers/net/qede/qede_rxtx.c
> >drivers/net/thunderx/nicvf_rxtx.c
> >drivers/net/virtio/virtio_rxtx.c
> >drivers/net/vmxnet3/vmxnet3_rxtx.c
> >
> >Please, we need a comment for each driver saying
> >"it is OK, we do not need any checksum preparation for TSO"
> >or
> >"yes we have to implement tx_prepare or TSO will not work in this mode"
> >
> 
> qede PMD doesn?t currently support TSO yet, it only supports Tx TCP/UDP/IP
> csum offloads.
> So Tx preparation isn?t applicable. So as of now -
> "it is OK, we do not need any checksum preparation for TSO"

Thanks for the answer.
Though please note that it not only for TSO.
This is for any TX offload for which the upper layer SW would have
to modify the contents of the packet.
Though as I can see for qede neither PKT_TX_IP_CKSUM or PKT_TX_TCP_CKSUM
exhibits any extra requirements for the user.
Is that correct?

Konstantin   


> 
> 
> Thanks,
> Harish



[dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API

2016-11-30 Thread Kevin Traynor
Hi Adrien,

On 11/16/2016 04:23 PM, Adrien Mazarguil wrote:
> This new API supersedes all the legacy filter types described in
> rte_eth_ctrl.h. It is slightly higher level and as a result relies more on
> PMDs to process and validate flow rules.
> 
> Benefits:
> 
> - A unified API is easier to program for, applications do not have to be
>   written for a specific filter type which may or may not be supported by
>   the underlying device.
> 
> - The behavior of a flow rule is the same regardless of the underlying
>   device, applications do not need to be aware of hardware quirks.
> 
> - Extensible by design, API/ABI breakage should rarely occur if at all.
> 
> - Documentation is self-standing, no need to look up elsewhere.
> 
> Existing filter types will be deprecated and removed in the near future.

I'd suggest to add a deprecation notice to deprecation.rst, ideally with
a target release.

> 
> Signed-off-by: Adrien Mazarguil 
> ---
>  MAINTAINERS|   4 +
>  lib/librte_ether/Makefile  |   3 +
>  lib/librte_ether/rte_eth_ctrl.h|   1 +
>  lib/librte_ether/rte_ether_version.map |  10 +
>  lib/librte_ether/rte_flow.c| 159 +
>  lib/librte_ether/rte_flow.h| 947 
>  lib/librte_ether/rte_flow_driver.h | 177 ++
>  7 files changed, 1301 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index d6bb8f8..3b46630 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -243,6 +243,10 @@ M: Thomas Monjalon 
>  F: lib/librte_ether/
>  F: scripts/test-null.sh
>  
> +Generic flow API
> +M: Adrien Mazarguil 
> +F: lib/librte_ether/rte_flow*
> +
>  Crypto API
>  M: Declan Doherty 
>  F: lib/librte_cryptodev/
> diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
> index efe1e5f..9335361 100644
> --- a/lib/librte_ether/Makefile
> +++ b/lib/librte_ether/Makefile
> @@ -44,6 +44,7 @@ EXPORT_MAP := rte_ether_version.map
>  LIBABIVER := 5
>  
>  SRCS-y += rte_ethdev.c
> +SRCS-y += rte_flow.c
>  
>  #
>  # Export include files
> @@ -51,6 +52,8 @@ SRCS-y += rte_ethdev.c
>  SYMLINK-y-include += rte_ethdev.h
>  SYMLINK-y-include += rte_eth_ctrl.h
>  SYMLINK-y-include += rte_dev_info.h
> +SYMLINK-y-include += rte_flow.h
> +SYMLINK-y-include += rte_flow_driver.h
>  
>  # this lib depends upon:
>  DEPDIRS-y += lib/librte_net lib/librte_eal lib/librte_mempool 
> lib/librte_ring lib/librte_mbuf
> diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
> index fe80eb0..8386904 100644
> --- a/lib/librte_ether/rte_eth_ctrl.h
> +++ b/lib/librte_ether/rte_eth_ctrl.h
> @@ -99,6 +99,7 @@ enum rte_filter_type {
>   RTE_ETH_FILTER_FDIR,
>   RTE_ETH_FILTER_HASH,
>   RTE_ETH_FILTER_L2_TUNNEL,
> + RTE_ETH_FILTER_GENERIC,
>   RTE_ETH_FILTER_MAX
>  };
>  
> diff --git a/lib/librte_ether/rte_ether_version.map 
> b/lib/librte_ether/rte_ether_version.map
> index 72be66d..b5d2547 100644
> --- a/lib/librte_ether/rte_ether_version.map
> +++ b/lib/librte_ether/rte_ether_version.map
> @@ -147,3 +147,13 @@ DPDK_16.11 {
>   rte_eth_dev_pci_remove;
>  
>  } DPDK_16.07;
> +
> +DPDK_17.02 {
> + global:
> +
> + rte_flow_validate;
> + rte_flow_create;
> + rte_flow_destroy;
> + rte_flow_query;
> +
> +} DPDK_16.11;
> diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c
> new file mode 100644
> index 000..064963d
> --- /dev/null
> +++ b/lib/librte_ether/rte_flow.c
> @@ -0,0 +1,159 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright 2016 6WIND S.A.
> + *   Copyright 2016 Mellanox.

There's Mellanox copyright but you are the only signed-off-by - is that
right?

> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + *   notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + *   notice, this list of conditions and the following disclaimer in
> + *   the documentation and/or other materials provided with the
> + *   distribution.
> + * * Neither the name of 6WIND S.A. nor the names of its
> + *   contributors may be used to endorse or promote products derived
> + *   from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 

[dpdk-dev] [PATCH v2] i40e: Fix eth_i40e_dev_init sequence on ThunderX

2016-11-30 Thread Ananyev, Konstantin
Hi Jerin,

> 
> On Tue, Nov 22, 2016 at 01:46:54PM +, Bruce Richardson wrote:
> > On Tue, Nov 22, 2016 at 03:46:38AM +0530, Jerin Jacob wrote:
> > > On Sun, Nov 20, 2016 at 11:21:43PM +, Ananyev, Konstantin wrote:
> > > > Hi
> > > > >
> > > > > i40e_asq_send_command: rd32 & wr32 under ThunderX gives unpredictable
> > > > >results. To solve this include rte memory 
> > > > > barriers
> > > > >
> > > > > Signed-off-by: Satha Rao 
> > > > > ---
> > > > >  drivers/net/i40e/base/i40e_osdep.h | 14 ++
> > > > >  1 file changed, 14 insertions(+)
> > > > >
> > > > > diff --git a/drivers/net/i40e/base/i40e_osdep.h 
> > > > > b/drivers/net/i40e/base/i40e_osdep.h
> > > > > index 38e7ba5..ffa3160 100644
> > > > > --- a/drivers/net/i40e/base/i40e_osdep.h
> > > > > +++ b/drivers/net/i40e/base/i40e_osdep.h
> > > > > @@ -158,7 +158,13 @@ do { 
> > > > >\
> > > > >   ((volatile uint32_t *)((char *)(a)->hw_addr + (reg)))
> > > > >  static inline uint32_t i40e_read_addr(volatile void *addr)
> > > > >  {
> > > > > +#if defined(RTE_ARCH_ARM64)
> > > > > + uint32_t val = rte_le_to_cpu_32(I40E_PCI_REG(addr));
> > > > > + rte_rmb();
> > > > > + return val;
> > > >
> > > > If you really need an rmb/wmb with MMIO read/writes on ARM,
> > > > I think you can avoid #ifdefs here and use rte_smp_rmb/rte_smp_wmb.
> > > > BTW, I suppose if you need it for i40e, you would need it for other 
> > > > devices too.
> > >
> > > Yes. ARM would need for all devices(typically, the devices on external 
> > > PCI bus).
> > > I guess rte_smp_rmb may not be the correct abstraction. So we need more of
> > > rte_rmb() as we need only non smp variant on IO side. I guess then it 
> > > make sense to
> > > create new abstraction in eal with following variants so that each arch
> > > gets opportunity to make what it makes sense that specific platform
> > >
> > > rte_readb_relaxed
> > > rte_readw_relaxed
> > > rte_readl_relaxed
> > > rte_readq_relaxed
> > > rte_writeb_relaxed
> > > rte_writew_relaxed
> > > rte_writel_relaxed
> > > rte_writeq_relaxed
> > > rte_readb
> > > rte_readw
> > > rte_readl
> > > rte_readq
> > > rte_writeb
> > > rte_writew
> > > rte_writel
> > > rte_writeq
> > >
> > > Thoughts ?
> > >
> >
> > That seems like a lot of API calls!
> > Perhaps you can clarify - why would the rte_smp_rmb() not work for you?
> 
> Currently arm64 mapped DMB as rte_smp_rmb() for smp case.
> 
> Ideally for io barrier and non smp case, we need to map it as DSB and it is
> bit heavier than DMB

Ok, so you need some new macro, like rte_io_(r|w)mb or so, that would expand 
into dmb
for ARM,  correct?

> 
> The linux kernel arm64 mappings
> http://lxr.free-electrons.com/source/arch/arm64/include/asm/io.h#L142
> 
> DMB vs DSB
> https://community.arm.com/thread/3833
> 
> The relaxed one are without any barriers.(the use case like accessing on
> chip peripherals may need only relaxed versions)
> 
> Thoughts on new rte EAL abstraction?

Looks like a lot of macros but if you guys think that would help - NP with that 
:)
Again, in that case we probably can get rid of driver specific pci reg 
read/write defines.

Konstantin

> 
> >
> > /Bruce


[dpdk-dev] [PATCH v4] net/kni: add KNI PMD

2016-11-30 Thread Ferruh Yigit
Add KNI PMD which wraps librte_kni for ease of use.

KNI PMD can be used as any regular PMD to send / receive packets to the
Linux networking stack.

Signed-off-by: Ferruh Yigit 
---

v4:
* allow only single queue
* use driver.name as name

v3:
* rebase on top of latest master

v2:
* updated driver name eth_kni -> net_kni
---
 config/common_base  |   1 +
 config/common_linuxapp  |   1 +
 drivers/net/Makefile|   1 +
 drivers/net/kni/Makefile|  63 +
 drivers/net/kni/rte_eth_kni.c   | 462 
 drivers/net/kni/rte_pmd_kni_version.map |   4 +
 mk/rte.app.mk   |  10 +-
 7 files changed, 537 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/kni/Makefile
 create mode 100644 drivers/net/kni/rte_eth_kni.c
 create mode 100644 drivers/net/kni/rte_pmd_kni_version.map

diff --git a/config/common_base b/config/common_base
index 4bff83a..3385879 100644
--- a/config/common_base
+++ b/config/common_base
@@ -543,6 +543,7 @@ CONFIG_RTE_PIPELINE_STATS_COLLECT=n
 # Compile librte_kni
 #
 CONFIG_RTE_LIBRTE_KNI=n
+CONFIG_RTE_LIBRTE_PMD_KNI=n
 CONFIG_RTE_KNI_KMOD=n
 CONFIG_RTE_KNI_PREEMPT_DEFAULT=y
 CONFIG_RTE_KNI_VHOST=n
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 2483dfa..2ecd510 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -39,6 +39,7 @@ CONFIG_RTE_EAL_IGB_UIO=y
 CONFIG_RTE_EAL_VFIO=y
 CONFIG_RTE_KNI_KMOD=y
 CONFIG_RTE_LIBRTE_KNI=y
+CONFIG_RTE_LIBRTE_PMD_KNI=y
 CONFIG_RTE_LIBRTE_VHOST=y
 CONFIG_RTE_LIBRTE_PMD_VHOST=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index bc93230..c4771cd 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -41,6 +41,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic
 DIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k
 DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e
 DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += kni
 DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
 DIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5
 DIRS-$(CONFIG_RTE_LIBRTE_MPIPE_PMD) += mpipe
diff --git a/drivers/net/kni/Makefile b/drivers/net/kni/Makefile
new file mode 100644
index 000..0b7cf91
--- /dev/null
+++ b/drivers/net/kni/Makefile
@@ -0,0 +1,63 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_kni.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+LDLIBS += -lpthread
+
+EXPORT_MAP := rte_pmd_kni_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += rte_eth_kni.c
+
+#
+# Export include files
+#
+SYMLINK-y-include +=
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += lib/librte_kni
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += lib/librte_mempool
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/kni/rte_eth_kni.c b/drivers/net/kni/rte_eth_kni.c
new file mode 100644
index 000..6c4df96
--- /dev/null
+++ b/drivers/net/kni/rte_eth_kni.c
@@ -0,0 +1,462 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserve

[dpdk-dev] [RFC] ethdev: abstraction layer for QoS hierarchical scheduler

2016-11-30 Thread Cristian Dumitrescu
This RFC proposes an ethdev-based abstraction layer for Quality of Service (QoS)
hierarchical scheduler. The goal of the abstraction layer is to provide a simple
generic API that is agnostic of the underlying HW, SW or mixed HW-SW complex
implementation.

Q1: What is the benefit for having an abstraction layer for QoS hierarchical
layer?
A1: There is growing interest in the industry for handling various HW-based,
SW-based or mixed hierarchical scheduler implementations using a unified DPDK
API.

Q2: Which devices are targeted by this abstraction layer?
A2: All current and future devices that expose a hierarchical scheduler feature
under DPDK, including NICs, FPGAs, ASICs, SOCs, SW libraries.

Q3: Which scheduler hierarchies are supported by the API?
A3: Hopefully any scheduler hierarchy can be described and covered by the
current API. Of course, functional correctness, accuracy and performance levels
depend on the specific implementations of this API.

Q4: Why have this abstraction layer into ethdev as opposed to a new type of
device (e.g. scheddev) similar to ethdev, cryptodev, eventdev, etc?
A4: Packets are sent to the Ethernet device using the ethdev API
rte_eth_tx_burst() function, with the hierarchical scheduling taking place
automatically (i.e. no SW intervention) in HW implementations. Basically, the
hierarchical scheduler is done as part of packet TX operation.
The hierarchical scheduler is typically the last stage before packet TX and it
is tightly integrated with the TX stage. The hierarchical scheduler is just
another offload feature of the Ethernet device, which needs to be accommodated
by the ethdev API similar to any other offload feature (such as RSS, DCB,
flow director, etc).
Once the decision to schedule a specific packet has been taken, this packet
cannot be dropped and it has to be sent over the wire as is, otherwise what
takes place on the wire is not what was planned at scheduling time, so the
scheduling is not accurate (Note: there are some devices which allow prepending
headers to the packet after the scheduling stage at the expense of sending
correction requests back to the scheduler, but this only strengthens the bond
between scheduling and TX).

Q5: Given that the packet scheduling takes place automatically for pure HW
implementations, how does packet scheduling take place for poll-mode SW
implementations?
A5: The API provided function rte_sched_run() is designed to take care of this.
For HW implementations, this function typically does nothing. For SW
implementations, this function is typically expected to perform dequeue of
packets from the hierarchical scheduler and their write to Ethernet device TX
queue, periodic flush of any buffers on enqueue-side into the hierarchical
scheduler for burst-oriented implementations, etc.

Q6: Which are the scheduling algorithms supported?
A6: The fundamental scheduling algorithms that are supported are Strict Priority
(SP) and Weighted Fair Queuing (WFQ). The SP and WFQ algorithms are supported at
the level of each node of the scheduling hierarchy, regardless of the node
level/position in the tree. The SP algorithm is used to schedule between sibling
nodes with different priority, while WFQ is used to schedule between groups of
siblings that have the same priority.
Algorithms such as Weighed Round Robin (WRR), byte-level WRR, Deficit WRR
(DWRR), etc are considered approximations of the ideal WFQ and are therefore
assimilated to WFQ, although an associated implementation-dependent accuracy,
performance and resource usage trade-off might exist.

Q7: Which are the supported congestion management algorithms?
A7: Tail drop, head drop and Weighted Random Early Detection (WRED). They are
available for every leaf node in the hierarchy, subject to the specific
implementation supporting them.

Q8: Is traffic shaping supported?
A8: Yes, there are a number of shapers (rate limiters) that can be supported for
each node in the hierarchy (built-in limit is currently set to 4 per node). Each
shaper can be private to a node (used only by that node) or shared between
multiple nodes.

Q9: What is the purpose of having shaper profiles and WRED profiles?
A9: In most implementations, many shapers typically share the same configuration
parameters, so defining shaper profiles simplifies the configuration task. Same
considerations apply to WRED contexts and profiles.

Q10: How is the scheduling hierarchy defined and created?
A10: Scheduler hierarchy tree is set up by creating new nodes and connecting
them to other existing nodes, which thus become parent nodes. The unique ID that
is assigned to each node when the node is created is further used to update the
node configuration or to connect children nodes to it. The leaf nodes of the
scheduler hierarchy are each attached to one of the Ethernet device TX queues.

Q11: Are on-the-fly changes of the scheduling hierarchy allowed by the API?
A11: Yes. The actual changes take place subject to the specific implementation
su

[dpdk-dev] [PATCH v12 0/6] add Tx preparation

2016-11-30 Thread Thomas Monjalon
2016-11-30 17:42, Ananyev, Konstantin:
> > >Please, we need a comment for each driver saying
> > >"it is OK, we do not need any checksum preparation for TSO"
> > >or
> > >"yes we have to implement tx_prepare or TSO will not work in this mode"
> > >
> > 
> > qede PMD doesn?t currently support TSO yet, it only supports Tx TCP/UDP/IP
> > csum offloads.
> > So Tx preparation isn?t applicable. So as of now -
> > "it is OK, we do not need any checksum preparation for TSO"
> 
> Thanks for the answer.
> Though please note that it not only for TSO.

Oh yes, sorry, my wording was incorrect.
We need to know if any checksum preparation is needed prior
offloading its final computation to the hardware or driver.
So the question applies to TSO and simple checksum offload.

We are still waiting answers for
bnxt, cxgbe, ena, nfp, thunderx, virtio and vmxnet3.

> This is for any TX offload for which the upper layer SW would have
> to modify the contents of the packet.
> Though as I can see for qede neither PKT_TX_IP_CKSUM or PKT_TX_TCP_CKSUM
> exhibits any extra requirements for the user.
> Is that correct?



[dpdk-dev] [PATCH v12 0/6] add Tx preparation

2016-11-30 Thread Harish Patil
>
>
>
>Hi Harish,
>> 
>> 
>> >We need attention of every PMD developers on this thread.
>> >
>> >Reminder of what Konstantin suggested:
>> >"
>> >- if the PMD supports TX offloads AND
>> >- if to be able use any of these offloads the upper layer SW would have
>> >to:
>> >* modify the contents of the packet OR
>> >* obey HW specific restrictions
>> >then it is a PMD developer responsibility to provide tx_prep() that
>>would
>> >implement
>> >expected modifications of the packet contents and restriction checks.
>> >Otherwise, tx_prep() implementation is not required and can be safely
>>set
>> >to NULL.
>> >"
>> >
>> >I copy/paste also my previous conclusion:
>> >
>> >Before txprep, there is only one API: the application must prepare the
>> >packets checksum itself (get_psd_sum in testpmd).
>> >With txprep, the application have 2 choices: keep doing the job itself
>> >or call txprep which calls a PMD-specific function.
>> >The question is: does non-Intel drivers need a checksum preparation for
>> >TSO?
>> >Will it behave well if txprep does nothing in these drivers?
>> >
>> >When looking at the code, most of drivers handle the TSO flags.
>> >But it is hard to know whether they rely on the pseudo checksum or not.
>> >
>> >git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG'
>> >drivers/net/
>> >
>> >drivers/net/bnxt/bnxt_txr.c
>> >drivers/net/cxgbe/sge.c
>> >drivers/net/e1000/em_rxtx.c
>> >drivers/net/e1000/igb_rxtx.c
>> >drivers/net/ena/ena_ethdev.c
>> >drivers/net/enic/enic_rxtx.c
>> >drivers/net/fm10k/fm10k_rxtx.c
>> >drivers/net/i40e/i40e_rxtx.c
>> >drivers/net/ixgbe/ixgbe_rxtx.c
>> >drivers/net/mlx4/mlx4.c
>> >drivers/net/mlx5/mlx5_rxtx.c
>> >drivers/net/nfp/nfp_net.c
>> >drivers/net/qede/qede_rxtx.c
>> >drivers/net/thunderx/nicvf_rxtx.c
>> >drivers/net/virtio/virtio_rxtx.c
>> >drivers/net/vmxnet3/vmxnet3_rxtx.c
>> >
>> >Please, we need a comment for each driver saying
>> >"it is OK, we do not need any checksum preparation for TSO"
>> >or
>> >"yes we have to implement tx_prepare or TSO will not work in this mode"
>> >
>> 
>> qede PMD doesn?t currently support TSO yet, it only supports Tx
>>TCP/UDP/IP
>> csum offloads.
>> So Tx preparation isn?t applicable. So as of now -
>> "it is OK, we do not need any checksum preparation for TSO"
>
>Thanks for the answer.
>Though please note that it not only for TSO.

Okay. I initially thought so. But was not sure, so I explicitly indicated
that there is no TSO support.

>This is for any TX offload for which the upper layer SW would have
>to modify the contents of the packet.
>Though as I can see for qede neither PKT_TX_IP_CKSUM or PKT_TX_TCP_CKSUM
>exhibits any extra requirements for the user.
>Is that correct?

That?s right.

>
>Konstantin   
>
>
>> 
>> 
>> Thanks,
>> Harish
>
>




[dpdk-dev] [PATCH v12 0/6] add Tx preparation

2016-11-30 Thread Ajit Khaparde
On Mon,
??
Nov 28, 2016 at 5:03 AM, Thomas Monjalon  wrote:

> We need attention of every PMD developers on this thread.
>
> Reminder of what Konstantin suggested:
> "
> - if the PMD supports TX offloads AND
> - if to be able use any of these offloads the upper layer SW would have to:
> * modify the contents of the packet OR
> * obey HW specific restrictions
> then it is a PMD developer responsibility to provide tx_prep() that would
> implement
> expected modifications of the packet contents and restriction checks.
> Otherwise, tx_prep() implementation is not required and can be safely set
> to NULL.
> "
>
> I copy/paste also my previous conclusion:
>
> Before txprep, there is only one API: the application must prepare the
> packets checksum itself (get_psd_sum in testpmd).
> With txprep, the application have 2 choices: keep doing the job itself
> or call txprep which calls a PMD-specific function.
> The question is: does non-Intel drivers need a checksum preparation for
> TSO?
> Will it behave well if txprep does nothing in these drivers?
>
> When looking at the code, most of drivers handle the TSO flags.
> But it is hard to know whether they rely on the pseudo checksum or not.
>
> git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG'
> drivers/net/
>
> drivers/net/bnxt/bnxt_txr.c
>
?::: snip:::
?


>
> Please, we need a comment for each driver saying
> "it is OK, we do not need any checksum preparation for TSO"
> or
> "yes we have to implement tx_prepare or TSO will not work in this mode"
>

?The bnxt devices
 don't need pse
??
udo header checksum in the packet for TSO or TX
checksum offload.
? So..
?
"it is OK, we do not need any checksum preparation for TSO"


[dpdk-dev] apply commit e30a0178d290a4e83dc01f9c2170d4859339c9cf "kni: support RHEL 7.3" to dpdk-stable?

2016-11-30 Thread Roberts, Lee A.
Does it make sense to apply the commit for "kni: support RHEL 7.3" 
(http://www.dpdk.org/browse/dpdk/commit/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h?id=e30a0178d290a4e83dc01f9c2170d4859339c9cf)
to the stable tree to enable clean compilation on RHEL 7.3?

 - Lee Roberts



[dpdk-dev] Hyper-v support

2016-11-30 Thread Varun
Hi,

I would like to know if the latest DPDK (16.11) supports hyper-v?

I couldn't find any conclusive evidence online or in dpdk roadmap. Is it
likely that we see it in 17.05?

-- 
Regards,
Varun