Re: [dpdk-dev] [PATCH v5 04/12] eal: integrate bus scan and probe with EAL

2017-01-08 Thread Rosen, Rami
Hi,
...
...

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c 
index 2206277..2c223de 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
...

+/* Scan all the buses for registering devices */ int
+rte_eal_bus_scan(void)
+{
+   int ret;
+   struct rte_bus *bus = NULL;
+
+   TAILQ_FOREACH(bus, &rte_bus_list, next) {
+   ret = bus->scan(bus);
+   if (ret) {
+   RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n",
+   bus->name);
+   /* TODO: Should error on a particular bus block scan
+* for all others?
+*/
+   return ret;
+   }
+   }
+
+   return 0;
+}
+

Nitpick - the return type of rte_eal_bus_scan() is int and not void:

* Scan all the buses attached to the framework.
+ *
+ * @param void
+ * @return void
+ */
+int rte_eal_bus_scan(void);


Rami Rosen


Re: [dpdk-dev] [PATCH v5 3/8] ethdev: reserve capability flags for PMD-specific API

2017-01-08 Thread Ananyev, Konstantin

Hi Adrien,

> 
> Hi Konstantin,
> 
> On Thu, Jan 05, 2017 at 11:32:38AM +, Ananyev, Konstantin wrote:
> > Hi Adrien,
> >
> > >
> > > On Thu, Jan 05, 2017 at 07:56:08AM +0800, Tiwei Bie wrote:
> > > > On Thu, Jan 05, 2017 at 01:44:18AM +0800, Ananyev, Konstantin wrote:
> > > > [...]
> > > > > > >
> > > > > > > I understand that.
> > > > > > > My question was: suppose user would like to create a bonded 
> > > > > > > device over 2 NICs.
> > > > > > > One of them is ixgbe, while other would be some other type.
> > > > > > > In future get_dev_info() for each of them might return 
> > > > > > > DEV_RX_OFFLOAD_RESERVED_0  bit as set.
> > > > > > > But it would mean completely different thing.
> > > > > > > How bonded device would know that to deal properly?
> > > > > > >
> > > > > > > Another example - user has 2 NICs of different type and would 
> > > > > > > like to send the same packet on both of them simultaneously.
> > > > > > > As PKT_TX_RESERVED might mean different things for these devices, 
> > > > > > > and user would like to use let say
> > > > > > > PKT_TX_IXGBE_MACSEC on one of them, he would need to do a copy of 
> > > > > > > them, instead just increment a refcnt.
> > > > > > >
> > > > > > > Similar issues might arise at RX handling: user got a packet with 
> > > > > > > PKT_RX_RESERVED_0 set.
> > > > > > > What does it really mean if there are different NIC types in the 
> > > > > > > system?
> > > > > > > The only way to answer that question, as I can see,  is to keep 
> > > > > > > track from what NIC that packet was received.
> > > > > > > Which I suppose, is not always convenient.
> > > > > > >
> > > > > >
> > > > > > The main purpose is to put the PMD-specific APIs in a separate
> > > > > > namespace instead of mixing the PMD-specific APIs and global APIs
> > > > > > up, and also save the bits in mbuf.ol_flags.
> > > > > >
> > > > > > There are other ways to achieve this goal, such as introducing
> > > > > > the PMD specific ol_flags in mbuf second cache line as you said.
> > > > > > I just thought defining some reserved bits seems to be the most
> > > > > > simple way which won't introduce many changes.
> > > > > >
> > > > > > What's your suggestions? Should I just revert the changes and
> > > > > > define the generic flags directly?
> > > > >
> > > > > Yes, that would be my preference.
> > > > > As I said above - spending extra bit in ol_flags  doesn't look like a 
> > > > > big problem to me.
> > > > > In return there would be no need to consider how to handle all that 
> > > > > confusing scenarios in future.
> > > >
> > > > Okay. I'll update my patches. Thanks a lot for your comments.
> > >
> > > Well, I do not agree with Konstantin (no one saw this coming eh?)
> >
> > :)
> >
> >  >and do not think you need to update your series again.
> > >
> > > PMD-specific symbols have nothing to do in the global namespace in my
> > > opinion, they are not versioned and may evolve without notice. Neither
> > > applications nor the bonding PMD can rely on them. That's the trade-off.
> >
> > Not sure I do understand your reasoning.
> > For me MACSEC offload is just one of many HW offloads that we support
> > and should be treated in exactly the same way.
> > Applications should be able to use it in a transparent and reliable way,
> > not only under some limited conditions.
> > Otherwise what is the point to introduce it at all?
> 
> Well my first reply to this thread was asking why isn't the whole API global
> from the start then?

That's good question, and my preference would always be to have the
API to configure this feature as generic one.
I guess the main reason why it is not right now we don't reach an agreement
how this API should look like: 
http://dpdk.org/ml/archives/dev/2016-September/047810.html
But I'll leave it to the author to provide the real reason here. 

> 
> Given there are valid reasons for it not to and no plan to make it so in the
> near future, applications must be aware that they are including
> rte_pmd_ixgbe.h to use it. That in itself is a limiting condition, right?

Yes, it is definitely a limiting factor.
Though even if API to configure device to use macsec would be PMD specific 
right now,
The API to query that capability and the API to use it at datapath 
(mbuf.ol_flags) still
can be (and I think should be) device independent and transparent to use.  

> 
> > Yes, right now it is supported only by ixgbe PMD, but why that should be the
> > reason to treat is as second-class citizen?
> > Let say PKT_TX_TUNNEL_* offloads also are supported only by one PMD right 
> > now.
> 
> You are right about PKT_TX_TUNNEL_*, however these flags exist on their own
> and are not tied to any API function calls, unlike in this series where
> PKT_TX_MACSEC can only be used if the DEV_TX_OFFLOAD_MACSEC_INSERT
> capability is present 

I don't think PKT_TX_TUNNEL_* 'exists on its own'.
To use it well behaving app have to:
1) Query that device does provide that capability: DEV_TX_OFFLOAD_*

[dpdk-dev] Port stats zero when using MLX5 DPDK driver

2017-01-08 Thread george . dit
Hi,

I have a simple setup with a machine that contains a dual port 10 GbE Intel
82599ES NIC and another dual port 100 GbE Mellanox ConnectX-4 NIC. The
Intel ports are 0 and 1, while the Mellanox ones are 2 and 3.

I properly compiled DPDK 16.11 and test-pmd works just fine for all 4 ports.
Then, I ran a simple primary application that forwards packets from 0 <-->
1 and 2 <--> 3 and started dpdk-procinfo -- --stats (or --xstats) as a
secondary monitoring process, while sending some traffic to all 4 ports.
The problem I see is that the statistics reported by the Mellanox NICs are
always zero (Intel ports report just fine).

What is the reason behind this behavior? Is there a bug in the driver
(maybe recently fixed by DPDK 17.02 rc?) or is it simply a lack of this
functionality?

Thanks and best regards,

-- 
   Georgios Katsikas
   Ph.D. Student and Research Assistant
   Network Systems Lab (NSL)



   *E-Mail:*  george .d...@gmail.com
   *Web Site:*  http://www.di.uoa.gr/~katsikas/



Re: [dpdk-dev] A question

2017-01-08 Thread Zhang, Helin
Hi Kanani

Within around one year, we have implemented input set to reconfigure some 
registers to select which field to be used for hash calculation.
So I think it should work quite better now with using X710 or XL710. What’s 
your issue here?
Could you help to have a try and tell me the real issue in your side? Hopefully 
I can help a little bit.

Regards,
Helin

From: kamiar kanani [mailto:kamiar.kan...@gmail.com]
Sent: Sunday, January 8, 2017 8:04 PM
To: Zhang, Helin 
Subject: A question

Hi Helin

I read a post from you in one of the DPDK mailing list about a problem with 
XL710 symmetric RSS configuration.
the problem states limitation in the XL710's capability of redirecting 
fragmented packets. the link is:
http://dpdk.org/ml/archives/dev/2015-July/022453.html
I searched through internet but I didn't find any thing better. It would be a 
great help to me if you could
give me any hint about what can be done for the problem, if there is any update.

thanks a lot



[dpdk-dev] [PATCH 1/5] net/mlx5: last WQE no room inline

2017-01-08 Thread Elad Persiko
Prior to this patch, when sending a packet and the following
conditions were reached:
1. last working queue element is used.
2. inline was requested by the user
3. no room for inline packet.
then the inline request was ignored and the packet was sent
by pointer completely.

This patch handles this scenario. In this case the last
work queue element is turned to be a null work queue element and
the packet is being sent after the wrap around.

Signed-off-by: Elad Persiko 
---
 drivers/net/mlx5/mlx5_rxtx.c | 12 
 drivers/net/mlx5/mlx5_txq.c  |  8 ++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index e0ee2f2..be38aed 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -481,6 +481,17 @@
pkt_inline_sz += copy_b;
/* Sanity check. */
assert(addr <= addr_end);
+   } else {
+   wqe->ctrl = (rte_v128u32_t){
+   htonl(txq->wqe_ci << 8),
+   htonl(txq->qp_num_8s | 1),
+   0,
+   0,
+   };
+   length = 0;
+   buf = *(pkts--);
+   ds = 1;
+   goto next_pkt_part;
}
/*
 * 2 DWORDs consumed by the WQE header + ETH segment +
@@ -577,6 +588,7 @@
0,
0,
};
+next_pkt_part:
wqe->eseg = (rte_v128u32_t){
0,
cs_flags,
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 949035b..951e50a 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -314,8 +314,12 @@
/* CQ to be associated with the receive queue. */
.recv_cq = tmpl.cq,
.cap = {
-   /* Max number of outstanding WRs. */
-   .max_send_wr = ((priv->device_attr.max_qp_wr < desc) ?
+   /*
+* Max number of outstanding WRs.
+* "+1" for null WQE place holder.
+*/
+   .max_send_wr = ((priv->device_attr.max_qp_wr <
+   (desc + 1)) ?
priv->device_attr.max_qp_wr :
desc),
/*
-- 
1.8.3.1



[dpdk-dev] [PATCH 5/5] doc: add tso capabilities feature for mlx5

2017-01-08 Thread Elad Persiko
Feature implemented at:
b007e98ccda9 ("net/mlx5: implement TSO data path")
085c4137280a ("net/mlx5: support TSO in control plane")

Signed-off-by: Elad Persiko 
---
 doc/guides/nics/features/mlx4.ini | 1 +
 doc/guides/nics/features/mlx5.ini | 1 +
 2 files changed, 2 insertions(+)

diff --git a/doc/guides/nics/features/mlx4.ini 
b/doc/guides/nics/features/mlx4.ini
index c9828f7..d74b9dd 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -10,6 +10,7 @@ Queue start/stop = Y
 MTU update   = Y
 Jumbo frame  = Y
 Scattered Rx = Y
+TSO  = Y
 Promiscuous mode = Y
 Allmulticast mode= Y
 Unicast MAC filter   = Y
diff --git a/doc/guides/nics/features/mlx5.ini 
b/doc/guides/nics/features/mlx5.ini
index f811e3f..f8a215e 100644
--- a/doc/guides/nics/features/mlx5.ini
+++ b/doc/guides/nics/features/mlx5.ini
@@ -11,6 +11,7 @@ Queue start/stop = Y
 MTU update   = Y
 Jumbo frame  = Y
 Scattered Rx = Y
+TSO  = Y
 Promiscuous mode = Y
 Allmulticast mode= Y
 Unicast MAC filter   = Y
-- 
1.8.3.1



[dpdk-dev] [PATCH 2/5] net/mlx5: remove unessecary goto label

2017-01-08 Thread Elad Persiko
use_dseg label can be deleted as it happens without goto.

Signed-off-by: Elad Persiko 
---
 drivers/net/mlx5/mlx5_rxtx.c | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index be38aed..1560530 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -505,7 +505,6 @@
if ((uintptr_t)dseg >= end)
dseg = (volatile rte_v128u32_t *)
   txq->wqes;
-   goto use_dseg;
} else if (!segs_n) {
goto next_pkt;
} else {
@@ -523,19 +522,18 @@
dseg = (volatile rte_v128u32_t *)
((uintptr_t)wqe + (3 * MLX5_WQE_DWORD_SIZE));
ds = 3;
-use_dseg:
-   /* Add the remaining packet as a simple ds. */
-   addr = htonll(addr);
-   *dseg = (rte_v128u32_t){
-   htonl(length),
-   txq_mp2mr(txq, txq_mb2mp(buf)),
-   addr,
-   addr >> 32,
-   };
-   ++ds;
-   if (!segs_n)
-   goto next_pkt;
}
+   /* Add the remaining packet as a simple ds. */
+   addr = htonll(addr);
+   *dseg = (rte_v128u32_t){
+   htonl(length),
+   txq_mp2mr(txq, txq_mb2mp(buf)),
+   addr,
+   addr >> 32,
+   };
+   ++ds;
+   if (!segs_n)
+   goto next_pkt;
 next_seg:
assert(buf);
assert(ds);
-- 
1.8.3.1



[dpdk-dev] [PATCH 3/5] net/mlx5: support TSO in control plane

2017-01-08 Thread Elad Persiko
Signed-off-by: Elad Persiko 
---
 doc/guides/nics/mlx5.rst|  6 ++
 drivers/net/mlx5/mlx5.c | 17 -
 drivers/net/mlx5/mlx5.h |  1 +
 drivers/net/mlx5/mlx5_txq.c |  4 +++-
 4 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index a41c432..816075a 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -188,6 +188,12 @@ Run-time configuration
   It is currently only supported on the ConnectX-4 Lx and ConnectX-5
   families of adapters. Enabled by default.
 
+- ``txq_lso_en`` parameter [int]
+
+  A nonzero value enables TCP Segmentation Offloading (in hardware) on tx
+  side. It saves CPU time and PCI bandwidth.
+
+  Enabled by default.
 Prerequisites
 -
 
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 6293c1f..55c5b87 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -84,6 +84,9 @@
 /* Device parameter to enable multi-packet send WQEs. */
 #define MLX5_TXQ_MPW_EN "txq_mpw_en"
 
+/* Device parameter to enable LSO. */
+#define MLX5_TXQ_LSO_EN "txq_lso_en"
+
 /**
  * Retrieve integer value from environment variable.
  *
@@ -287,6 +290,8 @@
priv->txqs_inline = tmp;
} else if (strcmp(MLX5_TXQ_MPW_EN, key) == 0) {
priv->mps &= !!tmp; /* Enable MPW only if HW supports */
+   } else if (strcmp(MLX5_TXQ_LSO_EN, key) == 0) {
+   priv->lso &= !!tmp;
} else {
WARN("%s: unknown parameter", key);
return -EINVAL;
@@ -312,6 +317,7 @@
MLX5_RXQ_CQE_COMP_EN,
MLX5_TXQ_INLINE,
MLX5_TXQS_MIN_INLINE,
+   MLX5_TXQ_LSO_EN,
MLX5_TXQ_MPW_EN,
NULL,
};
@@ -429,7 +435,7 @@
mps = 0;
}
INFO("PCI information matches, using device \"%s\""
-" (SR-IOV: %s, MPS: %s)",
+" (SR-IOV: %s, LSO: true, MPS: %s)",
 list[i]->name,
 sriov ? "true" : "false",
 mps ? "true" : "false");
@@ -474,8 +480,11 @@
IBV_EXP_DEVICE_ATTR_RX_HASH |
IBV_EXP_DEVICE_ATTR_VLAN_OFFLOADS |
IBV_EXP_DEVICE_ATTR_RX_PAD_END_ALIGN |
+   IBV_EXP_DEVICE_ATTR_TSO_CAPS |
0;
 
+   exp_device_attr.tso_caps.max_tso = 262144;
+   exp_device_attr.tso_caps.supported_qpts =  IBV_QPT_RAW_ETH;
DEBUG("using port %u (%08" PRIx32 ")", port, test);
 
ctx = ibv_open_device(ibv_dev);
@@ -525,6 +534,7 @@
priv->port = port;
priv->pd = pd;
priv->mtu = ETHER_MTU;
+   priv->lso = 1; /* Enabled by default. */
priv->mps = mps; /* Enable MPW by default if supported. */
priv->cqe_comp = 1; /* Enable compression by default. */
err = mlx5_args(priv, pci_dev->device.devargs);
@@ -580,6 +590,11 @@
err = ENOTSUP;
goto port_error;
}
+   if (priv->lso && priv->mps) {
+   ERROR("LSO and MPS can't coexists");
+   err = ENOTSUP;
+   goto port_error;
+   }
/* Allocate and register default RSS hash keys. */
priv->rss_conf = rte_calloc(__func__, hash_rxq_init_n,
sizeof((*priv->rss_conf)[0]), 0);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index ee62e04..a163983 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -116,6 +116,7 @@ struct priv {
unsigned int hw_padding:1; /* End alignment padding is supported. */
unsigned int sriov:1; /* This is a VF or PF with VF devices. */
unsigned int mps:1; /* Whether multi-packet send is supported. */
+   unsigned int lso:1; /* Whether lso is supported. */
unsigned int cqe_comp:1; /* Whether CQE compression is enabled. */
unsigned int pending_alarm:1; /* An alarm is pending. */
unsigned int txq_inline; /* Maximum packet size for inlining. */
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 951e50a..de9f494 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -337,8 +337,10 @@
.sq_sig_all = 0,
.pd = priv->pd,
.res_domain = tmpl.rd,
+   .max_tso_header = 128,  // ETH/IPv4/TCP header example
.comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
- IBV_EXP_QP_INIT_ATTR_RES_DOMAIN),
+ IBV_EXP_QP_INIT_ATTR_RES_DOMAIN |
+ IBV_EXP_QP_INIT_ATTR_MAX_TSO_HEADER),
};
if (priv->txq_inline && (priv->

[dpdk-dev] [PATCH 4/5] net/mlx5: implement TSO data path

2017-01-08 Thread Elad Persiko
Signed-off-by: Elad Persiko 
---
 drivers/net/mlx5/mlx5_ethdev.c |   2 +
 drivers/net/mlx5/mlx5_rxtx.c   | 246 +++--
 2 files changed, 187 insertions(+), 61 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index fbb1b65..ea5ab02 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -589,6 +589,8 @@ struct priv *
(priv->hw_vlan_strip ? DEV_RX_OFFLOAD_VLAN_STRIP : 0);
if (!priv->mps)
info->tx_offload_capa = DEV_TX_OFFLOAD_VLAN_INSERT;
+   if (priv->lso)
+   info->tx_offload_capa |= DEV_TX_OFFLOAD_TCP_TSO;
if (priv->hw_csum)
info->tx_offload_capa |=
(DEV_TX_OFFLOAD_IPV4_CKSUM |
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 1560530..4940dc1 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -388,6 +388,8 @@
uint16_t pkt_inline_sz = MLX5_WQE_DWORD_SIZE;
uint16_t ehdr;
uint8_t cs_flags = 0;
+   uint8_t header_sum;
+   uint8_t tso;
 #ifdef MLX5_PMD_SOFT_COUNTERS
uint32_t total_length = 0;
 #endif
@@ -429,37 +431,29 @@
pkt_addr = rte_pktmbuf_mtod(*pkts, volatile void *);
rte_prefetch0(pkt_addr);
}
-   /* Should we enable HW CKSUM offload */
-   if (buf->ol_flags &
-   (PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM | PKT_TX_UDP_CKSUM)) {
-   cs_flags = MLX5_ETH_WQE_L3_CSUM | MLX5_ETH_WQE_L4_CSUM;
-   }
-   raw = ((uint8_t *)(uintptr_t)wqe) + 2 * MLX5_WQE_DWORD_SIZE;
-   /*
-* Start by copying the Ethernet header minus the first two
-* bytes which will be appended at the end of the Ethernet
-* segment.
-*/
-   memcpy((uint8_t *)raw, ((uint8_t *)addr) + 2, 16);
-   length -= MLX5_WQE_DWORD_SIZE;
-   addr += MLX5_WQE_DWORD_SIZE;
-   /* Replace the Ethernet type by the VLAN if necessary. */
-   if (buf->ol_flags & PKT_TX_VLAN_PKT) {
-   uint32_t vlan = htonl(0x8100 | buf->vlan_tci);
-
-   memcpy((uint8_t *)(raw + MLX5_WQE_DWORD_SIZE - 2 -
-  sizeof(vlan)),
-  &vlan, sizeof(vlan));
-   addr -= sizeof(vlan);
-   length += sizeof(vlan);
-   }
-   /* Inline if enough room. */
-   if (txq->max_inline != 0) {
+   tso = buf->tso_segsz && buf->l4_len;
+   if (tso) {
+   /*
+* After copying the ETH seg we need to copy 16 bytes
+* less.
+*/
+   header_sum = buf->l2_len + buf->l3_len + buf->l4_len
+- MLX5_WQE_DWORD_SIZE;
+   raw = ((uint8_t *)(uintptr_t)wqe) +
+ 2 * MLX5_WQE_DWORD_SIZE;
+   /*
+* Start by copying the Ethernet header minus the
+* first two bytes which will be appended at the
+* end of the Ethernet segment.
+*/
+   memcpy((uint8_t *)raw, ((uint8_t *)addr) + 2,
+  MLX5_WQE_DWORD_SIZE);
+   length -= MLX5_WQE_DWORD_SIZE;
+   addr += MLX5_WQE_DWORD_SIZE;
+
uintptr_t end = (uintptr_t)
(((uintptr_t)txq->wqes) +
 (1 << txq->wqe_n) * MLX5_WQE_SIZE);
-   uint16_t max_inline =
-   txq->max_inline * RTE_CACHE_LINE_SIZE;
uint16_t room;
 
/*
@@ -468,12 +462,9 @@
 */
raw += MLX5_WQE_DWORD_SIZE - 2;
room = end - (uintptr_t)raw;
-   if (room > max_inline) {
-   uintptr_t addr_end = (addr + max_inline) &
-   ~(RTE_CACHE_LINE_SIZE - 1);
-   uint16_t copy_b = ((addr_end - addr) > length) ?
- length :
- (addr_end - addr);
+   if (room > header_sum) {
+   uintptr_t addr_end = addr + header_sum;
+   uint16_t copy_b = addr_end - addr;
 
rte_memcpy((void *)raw, (void *)addr, copy_b);
addr += copy_b;
@@ -488,10 +479,16 @@
 

[dpdk-dev] [PATCH v2] doc: add tso capabilities feature for mlx5

2017-01-08 Thread Elad Persiko
Feature implemented at:
commit b007e98ccda9 ("net/mlx5: implement TSO data path")
commit 085c4137280a ("net/mlx5: support TSO in control plane")

Signed-off-by: Elad Persiko 
---
 doc/guides/nics/features/mlx4.ini | 1 +
 doc/guides/nics/features/mlx5.ini | 1 +
 2 files changed, 2 insertions(+)

diff --git a/doc/guides/nics/features/mlx4.ini 
b/doc/guides/nics/features/mlx4.ini
index c9828f7..d74b9dd 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -10,6 +10,7 @@ Queue start/stop = Y
 MTU update   = Y
 Jumbo frame  = Y
 Scattered Rx = Y
+TSO  = Y
 Promiscuous mode = Y
 Allmulticast mode= Y
 Unicast MAC filter   = Y
diff --git a/doc/guides/nics/features/mlx5.ini 
b/doc/guides/nics/features/mlx5.ini
index f811e3f..f8a215e 100644
--- a/doc/guides/nics/features/mlx5.ini
+++ b/doc/guides/nics/features/mlx5.ini
@@ -11,6 +11,7 @@ Queue start/stop = Y
 MTU update   = Y
 Jumbo frame  = Y
 Scattered Rx = Y
+TSO  = Y
 Promiscuous mode = Y
 Allmulticast mode= Y
 Unicast MAC filter   = Y
-- 
1.8.3.1



Re: [dpdk-dev] [PATCH v5 2/5] net/e1000: add firmware version get

2017-01-08 Thread Stephen Hemminger
On Sun,  8 Jan 2017 12:11:32 +0800
Qiming Yang  wrote:

> + switch (hw->mac.type) {
> + case e1000_i210:
> + case e1000_i211:
> + if (!(e1000_get_flash_presence_i210(hw))) {
> + snprintf(fw_version, fw_length,
> +  "%2d.%2d-%d",
> +  fw.invm_major, fw.invm_minor,
> +  fw.invm_img_type);
> + break;
> + }
> + /* fall through */
> + default:
> + /* if option rom is valid, display its version too*/
> + if (fw.or_valid) {
> + snprintf(fw_version, fw_length,
> +  "%d.%d, 0x%08x, %d.%d.%d",
> +  fw.eep_major, fw.eep_minor, fw.etrack_id,
> +  fw.or_major, fw.or_build, fw.or_patch);
> + /* no option rom */
> + } else {
> + if (fw.etrack_id != 0X) {
> + snprintf(fw_version, fw_length,
> +  "%d.%d, 0x%08x",
> +  fw.eep_major, fw.eep_minor, fw.etrack_id);
> + } else {
> + snprintf(fw_version, fw_length,
> +  "%d.%d.%d",
> +  fw.eep_major, fw.eep_minor, fw.eep_build);

Indentation is incorrect here.


Re: [dpdk-dev] [PATCH v5 1/5] ethdev: add firmware version get

2017-01-08 Thread Stephen Hemminger
On Sun,  8 Jan 2017 12:11:31 +0800
Qiming Yang  wrote:

>  void
> +rte_eth_dev_fw_version_get(uint8_t port_id, char *fw_version, int fw_length)
> +{
> + struct rte_eth_dev *dev;
> +
> + RTE_ETH_VALID_PORTID_OR_RET(port_id);
> + dev = &rte_eth_devices[port_id];
> +
> + RTE_FUNC_PTR_OR_RET(*dev->dev_ops->fw_version_get);
> + (*dev->dev_ops->fw_version_get)(dev, fw_version, fw_length);
> +}

Maybe dev argument to fw_version_get should be:
   const struct rte_eth_dev *dev



Re: [dpdk-dev] [PATCH v5 4/5] net/i40e: add firmware version get

2017-01-08 Thread Stephen Hemminger
On Sun,  8 Jan 2017 12:11:34 +0800
Qiming Yang  wrote:

>  static void
> +i40e_fw_version_get(struct rte_eth_dev *dev, char *fw_version, int fw_length)
> +{
> + struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> +
> + snprintf(fw_version, fw_length,
> +  "%d.%d%d 0x%08x",
> +  ((hw->nvm.version >> 12) & 0xf),
> +  ((hw->nvm.version >> 4) & 0xff),
> +  (hw->nvm.version & 0xf), hw->nvm.eetrack);


It would be good to have same constants and format between Linux kernel
driver and DPDK.

Use %u as format specifier for unsigned values

static inline char *i40e_nvm_version_str(struct i40e_hw *hw)
{
static char buf[32];
u32 full_ver;
u8 ver, patch;
u16 build;

full_ver = hw->nvm.oem_ver;
ver = (u8)(full_ver >> I40E_OEM_VER_SHIFT);
build = (u16)((full_ver >> I40E_OEM_VER_BUILD_SHIFT) &
 I40E_OEM_VER_BUILD_MASK);
patch = (u8)(full_ver & I40E_OEM_VER_PATCH_MASK);

snprintf(buf, sizeof(buf),
 "%x.%02x 0x%x %d.%d.%d",
 (hw->nvm.version & I40E_NVM_VERSION_HI_MASK) >>
I40E_NVM_VERSION_HI_SHIFT,
 (hw->nvm.version & I40E_NVM_VERSION_LO_MASK) >>
I40E_NVM_VERSION_LO_SHIFT,
 hw->nvm.eetrack, ver, build, patch);


Re: [dpdk-dev] [PATCH v5 5/5] ethtool: display firmware version

2017-01-08 Thread Stephen Hemminger
On Sun,  8 Jan 2017 12:11:35 +0800
Qiming Yang  wrote:

> --- a/examples/ethtool/lib/rte_ethtool.c
> +++ b/examples/ethtool/lib/rte_ethtool.c
> @@ -54,6 +54,9 @@ rte_ethtool_get_drvinfo(uint8_t port_id, struct 
> ethtool_drvinfo *drvinfo)
>  
>   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>  
> + rte_eth_dev_fw_version_get(port_id, drvinfo->fw_version,
> +   sizeof(drvinfo->fw_version));
> +
>   memset(&dev_info, 0, sizeof(dev_info));
>   rte_eth_dev_info_get(port_id, &dev_info);

That memset is redundant since dev_info_get() does memset as first thing.


Re: [dpdk-dev] [PATCH v3 2/6] net/virtio: fix wrong Rx/Tx method for secondary process

2017-01-08 Thread Stephen Hemminger
On Fri,  6 Jan 2017 18:16:16 +0800
Yuanhan Liu  wrote:

> If the primary enables the vector Rx/Tx path, the current code would
> let the secondary always choose the non vector Rx/Tx path. This results
> to a Rx/Tx method mismatch between primary and secondary process. Werid
> errors then may happen, something like:
> 
> PMD: virtio_xmit_pkts() tx: virtqueue_enqueue error: -14
> 
> Fix it by choosing the correct Rx/Tx callbacks for the secondary process.
> That is, use vector path if it's given.
> 
> Fixes: 8d8393fb1861 ("virtio: pick simple Rx/Tx")
> 
> Cc: sta...@dpdk.org
> Signed-off-by: Yuanhan Liu 

This is failing on intel compile tests.


http://dpdk.org/patch/18975

_Compilation issues_

Submitter: Yuanhan Liu 
Date: Fri,  6 Jan 2017 18:16:18 +0800
DPDK git baseline: Repo:dpdk-next-virtio, Branch:master, 
CommitID:2b2669fc4c792f9a3ab73490bb93f7810a71c089

Patch18974-18975 --> compile error
Build Summary: 18 Builds Done, 0 Successful, 18 Failures

Test environment and configuration as below:
OS: RHEL7.2_64
Kernel Version:3.10.0-327.el7.x86_64
CPU info:Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
GCC Version:gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)
Clang Version:3.4.2
i686-native-linuxapp-gcc
x86_64-native-linuxapp-gcc
x86_64-native-linuxapp-gcc-shared
OS: FreeBSD10.3_64
Kernel Version:10.3-RELEASE
CPU info: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz (2194.97-MHz K8-class 
CPU)
GCC Version:gcc (FreeBSD Ports Collection) 4.8.5
Clang Version:3.4.1
x86_64-native-bsdapp-clang
x86_64-native-bsdapp-gcc
OS: FC24_64
Kernel Version:4.8.6-201.fc24.x86_64
CPU info:Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
GCC Version:gcc (GCC) 6.2.1 20160916 (Red Hat 6.2.1-2)
Clang Version:3.8.0
x86_64-native-linuxapp-gcc-debug
i686-native-linuxapp-gcc
x86_64-native-linuxapp-gcc
x86_64-native-linuxapp-gcc-shared
x86_64-native-linuxapp-clang
OS: UB1604_64
Kernel Version:4.4.0-53-generic
CPU info:Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
GCC Version:gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Clang Version:3.8.0
i686-native-linuxapp-gcc
x86_64-native-linuxapp-gcc
x86_64-native-linuxapp-gcc-shared
x86_64-native-linuxapp-clang
OS: CentOS7_64
Kernel Version:3.10.0-327.el7.x86_64
CPU info:Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
GCC Version:gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)
Clang Version:3.4.2
i686-native-linuxapp-gcc
x86_64-native-linuxapp-clang
x86_64-native-linuxapp-gcc-shared
x86_64-native-linuxapp-gcc

Failed Build #1:
OS: RHEL7.2_64
Target: i686-native-linuxapp-gcc
MKRES test_resource_c.res.o  
/home/patchWorkOrg/compilation/i686-native-linuxapp-gcc/lib/librte_ethdev.a(rte_ethdev.o):
 In function `rte_eth_dev_pci_probe':
rte_ethdev.c:(.text+0x930): undefined reference to `eth_dev_attach_secondary'
collect2: error: ld returned 1 exit status


Failed Build #2:
OS: RHEL7.2_64
Target: x86_64-native-linuxapp-gcc
MKRES test_resource_c.res.o  
/home/patchWorkOrg/compilation/x86_64-native-linuxapp-gcc/lib/librte_ethdev.a(rte_ethdev.o):
 In function `rte_eth_dev_pci_probe':
rte_ethdev.c:(.text+0x9c4): undefined reference to `eth_dev_attach_secondary'
collect2: error: ld returned 1 exit status


Failed Build #3:
OS: RHEL7.2_64
Target: x86_64-native-linuxapp-gcc-shared

/home/patchWorkOrg/compilation/lib/librte_ether/rte_ethdev.c: In function 
‘rte_eth_dev_pci_probe’:
/home/patchWorkOrg/compilation/lib/librte_ether/rte_ethdev.c:273:3: warning: 
implicit declaration of function ‘eth_dev_attach_secondary’ 
[-Wimplicit-function-declaration]
   eth_dev = eth_dev_attach_secondary(ethdev_name);
   ^
/home/patchWorkOrg/compilation/lib/librte_ether/rte_ethdev.c:273:3: warning: 
nested extern declaration of ‘eth_dev_attach_secondary’ [-Wnested-externs]
/home/patchWorkOrg/compilation/lib/librte_ether/rte_ethdev.c:273:11: warning: 
assignment makes pointer from integer without a cast [enabled by default]
   eth_dev = eth_dev_attach_secondary(ethdev_name);
   ^
/home/patchWorkOrg/compilation/lib/librte_ether/rte_ethdev.c: At top level:
/home/patchWorkOrg/compilation/lib/librte_ether/rte_ethdev.c:193:1: warning: 
‘eth_dev_init’ defined but not used [-Wunused-function]
 eth_dev_init(struct rte_eth_dev *eth_dev, uint8_t port_id, const char *name)
 ^  LD librte_ethdev.so.5.1
rte_ethdev.o: In function `rte_eth_dev_pci_probe':
rte_ethdev.c:(.text+0xa37): undefined reference to `eth_dev_attach_secondary'
collect2: error: ld returned 1 exit status


Failed Build #4:
OS: FreeBSD10.3_64
Target: x86_64-native-bsdapp-clang
MKRES test_resource_c.res.o  
/home/patchWorkOrg/compilation/x86_64-native-bsdapp-clang/lib/librte_ethdev.a(rte_ethdev.o):
 In function `rte_eth_dev_pci_probe':
/home/patchWorkOrg/compilation/lib/librte_ether/rte_ethdev.c:(.text+0x295): 
undefined reference to `eth_dev_attach_secondary'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
/ho

Re: [dpdk-dev] [PATCH] examples/ip_pipeline: check vlan and mpls params

2017-01-08 Thread Stephen Hemminger
On Fri, 6 Jan 2017 17:21:46 +
"Jyoti, Anand B"  wrote:

> +
> + /* Max MPLS label value 20 bits */
> + for (i = 0; i < data->l2.mpls.n_labels; i++)


What ever editor or mail system you are using is putting a unicode space in 
that statement,
not visible to normal mail client, but causes checkpatch failure.

> +
> + /* Max MPLS label value 20 bits */
> + for (i =3D 0; i < data->l2.mpls.n_labels; i++)

Please fix the patch and resubmit


Re: [dpdk-dev] [PATCH v7 14/27] net/i40e: set VF VLAN insertion from PF

2017-01-08 Thread Lu, Wenzhuo
Hi Bernard,


> -Original Message-
> From: Iremonger, Bernard
> Sent: Friday, January 6, 2017 7:29 PM
> To: Lu, Wenzhuo; Wu, Jingjing; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v7 14/27] net/i40e: set VF VLAN insertion from
> PF
> 
> Hi  Jinqjing, Wenzhuo,
> 
> > -Original Message-
> > From: Lu, Wenzhuo
> > Sent: Friday, January 6, 2017 8:20 AM
> > To: Wu, Jingjing ; dev@dpdk.org
> > Cc: Iremonger, Bernard 
> > Subject: RE: [dpdk-dev] [PATCH v7 14/27] net/i40e: set VF VLAN
> > insertion from PF
> >
> > Hi Jingjing, Bernard,
> >
> >
> > > -Original Message-
> > > From: Wu, Jingjing
> > > Sent: Friday, January 6, 2017 8:33 AM
> > > To: Lu, Wenzhuo; dev@dpdk.org
> > > Cc: Iremonger, Bernard
> > > Subject: RE: [dpdk-dev] [PATCH v7 14/27] net/i40e: set VF VLAN
> > > insertion from PF
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Wenzhuo Lu
> > > > Sent: Tuesday, January 3, 2017 2:55 PM
> > > > To: dev@dpdk.org
> > > > Cc: Iremonger, Bernard 
> > > > Subject: [dpdk-dev] [PATCH v7 14/27] net/i40e: set VF VLAN
> > > > insertion from PF
> > > >
> > > > From: Bernard Iremonger 
> > > >
> > > > Support inserting VF VLAN id from PF.
> > > > User can call the API on PF to insert a VLAN id to a specific VF.
> > > >
> > > > Signed-off-by: Bernard Iremonger 
> > > > ---
> > > >  drivers/net/i40e/i40e_ethdev.c| 56
> > > > +++
> > > >  drivers/net/i40e/rte_pmd_i40e.h   | 19 +++
> > > >  drivers/net/i40e/rte_pmd_i40e_version.map |  1 +
> > > >  3 files changed, 76 insertions(+)
> > > >
> > > > diff --git a/drivers/net/i40e/i40e_ethdev.c
> > > > b/drivers/net/i40e/i40e_ethdev.c index 7ab1c93..31c387d 100644
> > > > --- a/drivers/net/i40e/i40e_ethdev.c
> > > > +++ b/drivers/net/i40e/i40e_ethdev.c
> > > > @@ -10266,3 +10266,59 @@ static void
> > > > i40e_set_default_mac_addr(struct rte_eth_dev *dev,
> > > > else
> > > > return -EINVAL;
> > > >  }
> > > > +
> > > > +int rte_pmd_i40e_set_vf_vlan_insert(uint8_t port, uint16_t vf_id,
> > > > +   uint16_t vlan_id)
> > > > +{
> > > > +   struct rte_eth_dev *dev;
> > > > +   struct rte_eth_dev_info dev_info;
> > > > +   struct i40e_pf *pf;
> > > > +   struct i40e_hw *hw;
> > > > +   struct i40e_vsi *vsi;
> > > > +   struct i40e_vsi_context ctxt;
> > > > +   int ret;
> > > > +
> > > > +   RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
> > > > +
> > > > +   dev = &rte_eth_devices[port];
> > > > +   rte_eth_dev_info_get(port, &dev_info);
> > >
> > > It looks dev_info is not used in this function.
> > I'll delete it.
> 
> It was intended to use dev_info have a check on max_vfs.
> 
> if (vf_id >= dev_info.max_vfs)
>   return -EINVAL;
> 
> Should this check be added instead of deleting dev_info ?
It's not wrong to check it. But we don't think this check is necessary because 
there's the check below, "vf_id >= pf->vf_num".

> 
> > >
> > > > +   pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> > > > +   hw = I40E_PF_TO_HW(pf);
> > > > +
> > > > +   /**
> > > > +* return -ENODEV if SRIOV not enabled, VF number not configured
> > > > +* or no queue assigned.
> > > > +*/
> > > > +   if (!hw->func_caps.sr_iov_1_1 || pf->vf_num == 0 ||
> > > > +   pf->vf_nb_qps == 0)
> > > > +   return -ENODEV;
> > > > +
> > > > +   if (vf_id >= pf->vf_num || !pf->vfs)
> > > > +   return -EINVAL;
> > > > +
> > > > +   if (vlan_id > ETHER_MAX_VLAN_ID)
> > > > +   return -EINVAL;
> > > > +
> > > > +   vsi = pf->vfs[vf_id].vsi;
> > > > +   if (!vsi)
> > > > +   return -EINVAL;
> > > > +
> > > > +   vsi->info.valid_sections =
> > > > cpu_to_le16(I40E_AQ_VSI_PROP_VLAN_VALID);
> > > > +   vsi->info.pvid = vlan_id;
> > > > +   if (vlan_id > 0)
> > > > +   vsi->info.port_vlan_flags |=
> > > I40E_AQ_VSI_PVLAN_INSERT_PVID;
> > > > +   else
> > > > +   vsi->info.port_vlan_flags &=
> > > > ~I40E_AQ_VSI_PVLAN_INSERT_PVID;
> > > So, Pvid is used here for insert. Does it has any relationship with
> > > vlan anti- spoof patch?
> > > If so, it's better to consider how to deal with that.
> > It's vlan insertion not filtering. So I think not related.
> >
> > >
> > > Thanks
> > > Jingjing
> 
> Regards,
> 
> Bernard.



Re: [dpdk-dev] [PATCH v5 2/5] net/e1000: add firmware version get

2017-01-08 Thread Yang, Qiming


-Original Message-
From: Stephen Hemminger [mailto:step...@networkplumber.org] 
Sent: Monday, January 9, 2017 7:04 AM
To: Yang, Qiming 
Cc: dev@dpdk.org; Yigit, Ferruh ; Zhang, Helin 
; Horton, Remy 
Subject: Re: [dpdk-dev] [PATCH v5 2/5] net/e1000: add firmware version get

On Sun,  8 Jan 2017 12:11:32 +0800
Qiming Yang  wrote:

> + switch (hw->mac.type) {
> + case e1000_i210:
> + case e1000_i211:
> + if (!(e1000_get_flash_presence_i210(hw))) {
> + snprintf(fw_version, fw_length,
> +  "%2d.%2d-%d",
> +  fw.invm_major, fw.invm_minor,
> +  fw.invm_img_type);
> + break;
> + }
> + /* fall through */
> + default:
> + /* if option rom is valid, display its version too*/
> + if (fw.or_valid) {
> + snprintf(fw_version, fw_length,
> +  "%d.%d, 0x%08x, %d.%d.%d",
> +  fw.eep_major, fw.eep_minor, fw.etrack_id,
> +  fw.or_major, fw.or_build, fw.or_patch);
> + /* no option rom */
> + } else {
> + if (fw.etrack_id != 0X) {
> + snprintf(fw_version, fw_length,
> +  "%d.%d, 0x%08x",
> +  fw.eep_major, fw.eep_minor, fw.etrack_id);
> + } else {
> + snprintf(fw_version, fw_length,
> +  "%d.%d.%d",
> +  fw.eep_major, fw.eep_minor, fw.eep_build);

Indentation is incorrect here.
Qiming: my mistake.


Re: [dpdk-dev] [PATCH v2 1/5] eal: Set numa node value for system which not support NUMA.

2017-01-08 Thread nickcooper-zhangtonghao

> On Jan 6, 2017, at 8:01 AM, Yong Wang  wrote:
> 
> Can you add the exact steps to reproduce the vmxnet3 issues to help the 
> review and the verification. My guess is that you have stopped the device, 
> changed some ring parameters (to something larger than the previous settings) 
> and restarted the device.


Thanks for your reply. Your guess is right. I run the openvswitch+dpdk with 
vmxnet3. 
First,I set the nb_desc of rx queue to 2048, and then stop the device, change 
nb_desc to 4096.
When I start the device again, I get the openvswitch crash caused by dpdk. 
Writing a sample program based on dpdk, I also get a crash.
The e1000 dpdk driver allocates RX ring for max possible mumber of hardware 
descriptors. I guess vmxnet3 should be in the same case.
I will submit v3.

Thanks. 




Re: [dpdk-dev] [PATCH v2 1/5] eal: Set numa node value for system which not support NUMA.

2017-01-08 Thread nickcooper-zhangtonghao
Thanks for your reply. The patch you submitted is better. Thanks for your 
improvement.

My legal name is “Nick Zhang”. So,

Signed-off-by: Nick Zhang 


Thanks.
Nick

> On Jan 6, 2017, at 12:26 AM, Stephen Hemminger  
> wrote:
> 
> It is good to see more checking for valid values.  I suspect that other 
> systems
> may have the same problem.  My preference would to have the code comment 
> generic
> and to have the precise details of about where this was observed in the commit
> log. 
> 
> The following would do same thing but be simpler:
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
> b/lib/librte_eal/linuxapp/eal/eal_pci.c
> index 43501342..9f09cd98 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
> @@ -306,19 +306,12 @@ pci_scan_one(const char *dirname, uint16_t domain, 
> uint8_t bus,
>   dev->max_vfs = (uint16_t)tmp;
>   }
> 
> - /* get numa node */
> + /* get numa node, default to 0 if not present */
>   snprintf(filename, sizeof(filename), "%s/numa_node",
>dirname);
> - if (access(filename, R_OK) != 0) {
> - /* if no NUMA support, set default to 0 */
> - dev->device.numa_node = 0;
> - } else {
> - if (eal_parse_sysfs_value(filename, &tmp) < 0) {
> - free(dev);
> - return -1;
> - }
> + if (eal_parse_sysfs_value(filename, &tmp) == 0 &&
> + tmp < RTE_MAX_NUMA_NODES)
>   dev->device.numa_node = tmp;
> - }
> 
>   /* parse resources */
>   snprintf(filename, sizeof(filename), "%s/resource", dirname);



Re: [dpdk-dev] [PATCH v2 1/5] eal: Set numa node value for system which not support NUMA.

2017-01-08 Thread nickcooper-zhangtonghao
I submitted the patches for first time. 
The first one is an individual patch for eal, others is for vmxnet3 
interdependently.


Thanks.
Nick

> On Jan 5, 2017, at 10:23 PM, Ferruh Yigit  wrote:
> 
> Hi nickcooper-zhangtonghao,
> 
> The patches in the patchset are individual patches, right? Is there any
> dependency between them?
> 
> And CC'ed vmxnet3 driver maintainer: Yong Wang  >
> 
> Thanks,
> ferruh



[dpdk-dev] [PATCH v3 1/4] vmxnet3: Avoid memory leak in vmxnet3_dev_rx_queue_setup.

2017-01-08 Thread Nick Zhang
This patch will check the "nb_desc" parameter for rx queue.
Rx vmxnet rings length should be between 128-4096.
The patch will release the rxq and re-allocation it soon
for different "nb_desc".

Signed-off-by: Nick Zhang 
---
 drivers/net/vmxnet3/vmxnet3_rxtx.c | 30 ++
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c 
b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index b109168..e77374f 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -926,6 +926,21 @@
 
PMD_INIT_FUNC_TRACE();
 
+   /* Rx vmxnet rings length should be between 128-4096 */
+   if (nb_desc < VMXNET3_DEF_RX_RING_SIZE) {
+   PMD_INIT_LOG(ERR, "VMXNET3 Rx Ring Size Min: 128");
+   return -EINVAL;
+   } else if (nb_desc > VMXNET3_RX_RING_MAX_SIZE) {
+   PMD_INIT_LOG(ERR, "VMXNET3 Rx Ring Size Max: 4096");
+   return -EINVAL;
+   }
+
+   /* Free memory prior to re-allocation if needed. */
+   if (dev->data->rx_queues[queue_idx] != NULL) {
+   vmxnet3_dev_rx_queue_release(dev->data->rx_queues[queue_idx]);
+   dev->data->rx_queues[queue_idx] = NULL;
+   }
+
rxq = rte_zmalloc("ethdev_rx_queue", sizeof(struct vmxnet3_rx_queue),
  RTE_CACHE_LINE_SIZE);
if (rxq == NULL) {
@@ -946,18 +961,9 @@
ring1 = &rxq->cmd_ring[1];
comp_ring = &rxq->comp_ring;
 
-   /* Rx vmxnet rings length should be between 256-4096 */
-   if (nb_desc < VMXNET3_DEF_RX_RING_SIZE) {
-   PMD_INIT_LOG(ERR, "VMXNET3 Rx Ring Size Min: 256");
-   return -EINVAL;
-   } else if (nb_desc > VMXNET3_RX_RING_MAX_SIZE) {
-   PMD_INIT_LOG(ERR, "VMXNET3 Rx Ring Size Max: 4096");
-   return -EINVAL;
-   } else {
-   ring0->size = nb_desc;
-   ring0->size &= ~VMXNET3_RING_SIZE_MASK;
-   ring1->size = ring0->size;
-   }
+   ring0->size = nb_desc;
+   ring0->size &= ~VMXNET3_RING_SIZE_MASK;
+   ring1->size = ring0->size;
 
comp_ring->size = ring0->size + ring1->size;
 
-- 
1.8.3.1





[dpdk-dev] [PATCH v3 2/4] vmxnet3: Avoid segfault caused by vmxnet3_dev_rx_queue_setup.

2017-01-08 Thread Nick Zhang
When we config RX queue with 2048 RX queue size, and stop the
device, changed queue size to 4096 and then start the device,
there will be segment fault. We should allocate RX ring for
max possible number of hardware descriptors.

Signed-off-by: Nick Zhang 
---
 drivers/net/vmxnet3/vmxnet3_rxtx.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c 
b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index e77374f..d5d7c33 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -977,8 +977,11 @@
comp_ring->next2proc = 0;
comp_ring->gen = VMXNET3_INIT_GEN;
 
-   size = sizeof(struct Vmxnet3_RxDesc) * (ring0->size + ring1->size);
-   size += sizeof(struct Vmxnet3_RxCompDesc) * comp_ring->size;
+   /* Allocate RX ring for max possible number of hardware descriptors. */
+   size = sizeof(struct Vmxnet3_RxDesc) *
+   (VMXNET3_RX_RING_MAX_SIZE * VMXNET3_RX_CMDRING_SIZE);
+   size += sizeof(struct Vmxnet3_RxCompDesc) *
+   (VMXNET3_RX_RING_MAX_SIZE * VMXNET3_RX_CMDRING_SIZE);
 
mz = ring_dma_zone_reserve(dev, "rxdesc", queue_idx, size, socket_id);
if (mz == NULL) {
-- 
1.8.3.1





[dpdk-dev] [PATCH v3 4/4] vmxnet3: Avoid segfault caused by vmxnet3_dev_tx_queue_setup.

2017-01-08 Thread Nick Zhang
When we config TX queue with 2048 TX queue size, stop the device,
changed queue size to 4096 and then start the device, there will
be segment fault. We should allocate TX ring for max possible
number of hardware descriptors.

Signed-off-by: Nick Zhang 
---
 drivers/net/vmxnet3/vmxnet3_rxtx.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c 
b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index f00b3b9..5f35a2e 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -874,9 +874,10 @@
comp_ring->next2proc = 0;
comp_ring->gen = VMXNET3_INIT_GEN;
 
-   size = sizeof(struct Vmxnet3_TxDesc) * ring->size;
-   size += sizeof(struct Vmxnet3_TxCompDesc) * comp_ring->size;
-   size += sizeof(struct Vmxnet3_TxDataDesc) * data_ring->size;
+   /* Allocate Tx ring for max possible number of hardware descriptors. */
+   size = sizeof(struct Vmxnet3_TxDesc) * VMXNET3_TX_RING_MAX_SIZE;
+   size += sizeof(struct Vmxnet3_TxCompDesc) * VMXNET3_TX_RING_MAX_SIZE;
+   size += sizeof(struct Vmxnet3_TxDataDesc) * VMXNET3_TX_RING_MAX_SIZE;
 
mz = ring_dma_zone_reserve(dev, "txdesc", queue_idx, size, socket_id);
if (mz == NULL) {
-- 
1.8.3.1





[dpdk-dev] [PATCH v3 3/4] vmxnet3: Avoid memory leak in vmxnet3_dev_tx_queue_setup.

2017-01-08 Thread Nick Zhang
This patch will check the "nb_desc" parameter for tx queue.
Tx vmxnet rings length should be between 512-4096. The patch
will release the txq and re-allocation it soon for different "nb_desc".

Signed-off-by: Nick Zhang 
---
 drivers/net/vmxnet3/vmxnet3_rxtx.c | 32 +++-
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c 
b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index d5d7c33..f00b3b9 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -828,6 +828,23 @@
return -EINVAL;
}
 
+   /* Tx vmxnet ring length should be between 512-4096 */
+   if (nb_desc < VMXNET3_DEF_TX_RING_SIZE) {
+   PMD_INIT_LOG(ERR, "VMXNET3 Tx Ring Size Min: %u",
+   VMXNET3_DEF_TX_RING_SIZE);
+   return -EINVAL;
+   } else if (nb_desc > VMXNET3_TX_RING_MAX_SIZE) {
+   PMD_INIT_LOG(ERR, "VMXNET3 Tx Ring Size Max: %u",
+   VMXNET3_TX_RING_MAX_SIZE);
+   return -EINVAL;
+   }
+
+   /* Free memory prior to re-allocation if needed... */
+   if (dev->data->tx_queues[queue_idx] != NULL) {
+   vmxnet3_dev_tx_queue_release(dev->data->tx_queues[queue_idx]);
+   dev->data->tx_queues[queue_idx] = NULL;
+   }
+
txq = rte_zmalloc("ethdev_tx_queue", sizeof(struct vmxnet3_tx_queue),
  RTE_CACHE_LINE_SIZE);
if (txq == NULL) {
@@ -846,19 +863,8 @@
comp_ring = &txq->comp_ring;
data_ring = &txq->data_ring;
 
-   /* Tx vmxnet ring length should be between 512-4096 */
-   if (nb_desc < VMXNET3_DEF_TX_RING_SIZE) {
-   PMD_INIT_LOG(ERR, "VMXNET3 Tx Ring Size Min: %u",
-VMXNET3_DEF_TX_RING_SIZE);
-   return -EINVAL;
-   } else if (nb_desc > VMXNET3_TX_RING_MAX_SIZE) {
-   PMD_INIT_LOG(ERR, "VMXNET3 Tx Ring Size Max: %u",
-VMXNET3_TX_RING_MAX_SIZE);
-   return -EINVAL;
-   } else {
-   ring->size = nb_desc;
-   ring->size &= ~VMXNET3_RING_SIZE_MASK;
-   }
+   ring->size = nb_desc;
+   ring->size &= ~VMXNET3_RING_SIZE_MASK;
comp_ring->size = data_ring->size = ring->size;
 
/* Tx vmxnet rings structure initialization*/
-- 
1.8.3.1





[dpdk-dev] [PATCH] net/i40e: fix segment num in reassemble process

2017-01-08 Thread Chenghu Yao
When freeing up last mbuf, start->nb_segs should be decremented
by one. See also ixgbe process.

Signed-off-by: Chenghu Yao 
---
 drivers/net/i40e/i40e_rxtx_vec_common.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h 
b/drivers/net/i40e/i40e_rxtx_vec_common.h
index 6cb5dce..990520f 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
@@ -71,6 +71,7 @@
/* free up last mbuf */
struct rte_mbuf *secondlast = start;
 
+   start->nb_segs--;
while (secondlast->next != end)
secondlast = secondlast->next;
secondlast->data_len -= (rxq->crc_len -
-- 
1.8.3.1




Re: [dpdk-dev] [PATCH v5 3/8] ethdev: reserve capability flags for PMD-specific API

2017-01-08 Thread Tiwei Bie
On Sun, Jan 08, 2017 at 08:39:55PM +0800, Ananyev, Konstantin wrote:
> Hi Adrien,
> 
> > 
> > Hi Konstantin,
> > 
> > On Thu, Jan 05, 2017 at 11:32:38AM +, Ananyev, Konstantin wrote:
> > > Hi Adrien,
> > >
> > > >
> > > > On Thu, Jan 05, 2017 at 07:56:08AM +0800, Tiwei Bie wrote:
> > > > > On Thu, Jan 05, 2017 at 01:44:18AM +0800, Ananyev, Konstantin wrote:
[...]
> > Well my first reply to this thread was asking why isn't the whole API global
> > from the start then?
> 
> That's good question, and my preference would always be to have the
> API to configure this feature as generic one.
> I guess the main reason why it is not right now we don't reach an agreement
> how this API should look like: 
> http://dpdk.org/ml/archives/dev/2016-September/047810.html
> But I'll leave it to the author to provide the real reason here. 
> 

Yes, currently this work just provided a thin layer over 82599's
hardware MACsec offload support to allow users configure 82599's
MACsec offload engine. The current API may be too specific and may
need a rework to be used with other NICs.

> > 
> > Given there are valid reasons for it not to and no plan to make it so in the
> > near future, applications must be aware that they are including
> > rte_pmd_ixgbe.h to use it. That in itself is a limiting condition, right?
> 
> Yes, it is definitely a limiting factor.
> Though even if API to configure device to use macsec would be PMD specific 
> right now,
> The API to query that capability and the API to use it at datapath 
> (mbuf.ol_flags) still
> can be (and I think should be) device independent and transparent to use.  
> 
> > 
> > > Yes, right now it is supported only by ixgbe PMD, but why that should be 
> > > the
> > > reason to treat is as second-class citizen?
> > > Let say PKT_TX_TUNNEL_* offloads also are supported only by one PMD right 
> > > now.
> > 
> > You are right about PKT_TX_TUNNEL_*, however these flags exist on their own
> > and are not tied to any API function calls, unlike in this series where
> > PKT_TX_MACSEC can only be used if the DEV_TX_OFFLOAD_MACSEC_INSERT
> > capability is present 
> 
> I don't think PKT_TX_TUNNEL_* 'exists on its own'.
> To use it well behaving app have to:
> 1) Query that device does provide that capability: DEV_TX_OFFLOAD_*_TNL_TSO
> 2) configure PMD( & device) to use that capability
> 3) use that offload at run-time TX code (mb->ol_flags |= ...; mb->tx_offload 
> = ...)
> 
> For PKT_TX_TUNNEL_*  2) is pretty simple - user just need to make sure
> that full-featured TX function will be selected:
> txconf.txq_flags = 0; ...;  rte_eth_tx_queue_setup(..., &txconf);
>  
> For TX_MACSEC, as I understand 2) will be more complicated and
> right now is PMD specific, but anyway the main pattern remains the same.
> So at least 1) and 3) could be kept device neutral.
> 

Yes, the PMD-specific APIs focus on the 2nd step:

2) configure PMD( & device) to use that capability

The other parts (querying the capabilities, using the offload
in the datapath, registering the callback, getting the statistics
via xstats) are done in generic way.

Best regards,
Tiwei Bie


Re: [dpdk-dev] [PATCH v2 5/7] net/virtio_user: add vhost kernel support

2017-01-08 Thread Jason Wang



On 2016年12月23日 15:14, Jianfeng Tan wrote:

This patch add support vhost kernel as the backend for virtio_user.
Three main hook functions are added:
   - vhost_kernel_setup() to open char device, each vq pair needs one
 vhostfd;
   - vhost_kernel_ioctl() to communicate control messages with vhost
 kernel module;
   - vhost_kernel_enable_queue_pair() to open tap device and set it
 as the backend of corresonding vhost fd (that is to say, vq pair).

Signed-off-by: Jianfeng Tan 
---
  drivers/net/virtio/Makefile  |   1 +
  drivers/net/virtio/virtio_user/vhost.h   |   2 +
  drivers/net/virtio/virtio_user/vhost_kernel.c| 364 +++
  drivers/net/virtio/virtio_user/virtio_user_dev.c |  21 +-
  drivers/net/virtio/virtio_user/virtio_user_dev.h |   4 +
  5 files changed, 388 insertions(+), 4 deletions(-)
  create mode 100644 drivers/net/virtio/virtio_user/vhost_kernel.c

diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 97972a6..faeffb2 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -60,6 +60,7 @@ endif
  
  ifeq ($(CONFIG_RTE_VIRTIO_USER),y)

  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_user.c
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_kernel.c
  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/virtio_user_dev.c
  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user_ethdev.c
  endif
diff --git a/drivers/net/virtio/virtio_user/vhost.h 
b/drivers/net/virtio/virtio_user/vhost.h
index bd67133..ffab13a 100644
--- a/drivers/net/virtio/virtio_user/vhost.h
+++ b/drivers/net/virtio/virtio_user/vhost.h
@@ -120,4 +120,6 @@ struct virtio_user_backend_ops {
  };
  
  struct virtio_user_backend_ops ops_user;

+struct virtio_user_backend_ops ops_kernel;
+
  #endif
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c 
b/drivers/net/virtio/virtio_user/vhost_kernel.c
new file mode 100644
index 000..8984c5c
--- /dev/null
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -0,0 +1,364 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include "vhost.h"
+#include "virtio_user_dev.h"
+
+struct vhost_memory_kernel {
+   uint32_t nregions;
+   uint32_t padding;
+   struct vhost_memory_region regions[0];
+};
+
+/* vhost kernel ioctls */
+#define VHOST_VIRTIO 0xAF
+#define VHOST_GET_FEATURES _IOR(VHOST_VIRTIO, 0x00, __u64)
+#define VHOST_SET_FEATURES _IOW(VHOST_VIRTIO, 0x00, __u64)
+#define VHOST_SET_OWNER _IO(VHOST_VIRTIO, 0x01)
+#define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02)
+#define VHOST_SET_MEM_TABLE _IOW(VHOST_VIRTIO, 0x03, struct 
vhost_memory_kernel)
+#define VHOST_SET_LOG_BASE _IOW(VHOST_VIRTIO, 0x04, __u64)
+#define VHOST_SET_LOG_FD _IOW(VHOST_VIRTIO, 0x07, int)
+#define VHOST_SET_VRING_NUM _IOW(VHOST_VIRTIO, 0x10, struct vhost_vring_state)
+#define VHOST_SET_VRING_ADDR _IOW(VHOST_VIRTIO, 0x11, struct vhost_vring_addr)
+#define VHOST_SET_VRING_BASE _IOW(VHOST_VIRTIO, 0x12, struct vhost_vring_state)
+#define VHOST_GET_VRING_BASE _IOWR(VHOST_VIRTIO, 0x12, struct 
vhost_vring_state)
+#define VHOST_SET_VRING_KICK _IOW(VHOST_VIRTIO, 0x20, struct vhost_vring_file)
+#define VHOST_SET_VRING_CALL _IOW

Re: [dpdk-dev] [PATCH v2 5/7] net/virtio_user: add vhost kernel support

2017-01-08 Thread Jason Wang



On 2017年01月04日 15:22, Tan, Jianfeng wrote:


Sorry, I forget to reply this comment.

On 12/26/2016 3:44 PM, Yuanhan Liu wrote:

[...]

+
+/* Does not work when VIRTIO_F_IOMMU_PLATFORM now, why? */

Because this feature need the vhost IOTLB support from the device
emulation. Patches for QEMU hasn't been merged yet, but it has been
there for a while.


Yes. And it's in need of help from QEMU.



Since we don't have the support yet, for sure it won't work. But
I'm wondering why you have to disable it explicitly?


Here we do not have QEMU. Frontend driver talks with vhost-net through 
virtio_user_dev. And both ends claim to support 
VIRTIO_F_IOMMU_PLATFORM. So this feature bit will be negotiated if we 
don't explicitly disable it. In my previous test, it fails in 
vhost_init_device_iotlb() of vhost kernel module.


Interesting, vhost_init_device_iotlb() only fail when OOM. Do you meet it?


My guess is that, for this feature, without the help of QEMU, vhost 
kernel module cannot work independently.


Technically it can if your userspace supports device IOTLB APIs too.

Or if you're using static mappings, and preset all mappings through 
VHOST_IOTLB_UPADATE to make sure no IOTLB misses.


Thanks



Thanks,
Jianfeng




[dpdk-dev] [PATCH v2] examples/ip_pipeline: check VLAN and MPLS params

2017-01-08 Thread Jyoti, Anand B
This commit add to CLI command check for the following errors
1. SVLAN and CVLAN IDs greater than 12 bits
2. MPLS ID greater than 20 bits
3. max number of supported MPLS labels to avoid array overflow

It prevents running CLI commands with invalid parameters.

Signed-off-by: Anand B Jyoti 
Acked-by: Cristian Dumitrescu 
---
 examples/ip_pipeline/pipeline/pipeline_routing.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/examples/ip_pipeline/pipeline/pipeline_routing.c 
b/examples/ip_pipeline/pipeline/pipeline_routing.c
index 3aadbf9..3deaff9 100644
--- a/examples/ip_pipeline/pipeline/pipeline_routing.c
+++ b/examples/ip_pipeline/pipeline/pipeline_routing.c
@@ -494,6 +494,26 @@ app_pipeline_routing_add_route(struct app_params *app,
/* data */
if (data->port_id >= p->n_ports_out)
return -1;
+
+   /* Valid range of VLAN tags 12 bits */
+   if (data->flags & PIPELINE_ROUTING_ROUTE_QINQ)
+   if ((data->l2.qinq.svlan & 0xF000) ||
+   (data->l2.qinq.cvlan & 0xF000))
+   return -1;
+
+   /* Max number of MPLS labels supported */
+   if (data->flags & PIPELINE_ROUTING_ROUTE_MPLS) {
+   uint32_t i;
+
+   if (data->l2.mpls.n_labels >
+   PIPELINE_ROUTING_MPLS_LABELS_MAX)
+   return -1;
+
+   /* Max MPLS label value 20 bits */
+   for (i = 0; i < data->l2.mpls.n_labels; i++)
+   if (data->l2.mpls.labels[i] & 0xFFF0)
+   return -1;
+   }
}
break;
 
-- 
2.7.4



Re: [dpdk-dev] [PATCH v3 2/6] net/virtio: fix wrong Rx/Tx method for secondary process

2017-01-08 Thread Yuanhan Liu
On Sun, Jan 08, 2017 at 03:15:00PM -0800, Stephen Hemminger wrote:
> On Fri,  6 Jan 2017 18:16:16 +0800
> Yuanhan Liu  wrote:
> 
> > If the primary enables the vector Rx/Tx path, the current code would
> > let the secondary always choose the non vector Rx/Tx path. This results
> > to a Rx/Tx method mismatch between primary and secondary process. Werid
> > errors then may happen, something like:
> > 
> > PMD: virtio_xmit_pkts() tx: virtqueue_enqueue error: -14
> > 
> > Fix it by choosing the correct Rx/Tx callbacks for the secondary process.
> > That is, use vector path if it's given.
> > 
> > Fixes: 8d8393fb1861 ("virtio: pick simple Rx/Tx")
> > 
> > Cc: sta...@dpdk.org
> > Signed-off-by: Yuanhan Liu 
> 
> This is failing on intel compile tests.
> 
> 
> http://dpdk.org/patch/18975

Thanks, but it looks like a false alarm to me, for reasons below.

> Failed Build #2:
> OS: RHEL7.2_64
> Target: x86_64-native-linuxapp-gcc
> MKRES test_resource_c.res.o  
> /home/patchWorkOrg/compilation/x86_64-native-linuxapp-gcc/lib/librte_ethdev.a(rte_ethdev.o):
>  In function `rte_eth_dev_pci_probe':
> rte_ethdev.c:(.text+0x9c4): undefined reference to `eth_dev_attach_secondary'
> collect2: error: ld returned 1 exit status

- eth_dev_attach_secondary is not defined in this patch, it's defined
  (and used) in the first patch.

- eth_dev_attach_secondary is actually defined; The report even shows
  it fails to build with gcc: the gcc build passes on my dev box.

Honestly, I seldom trusted the build reports from STV.

--yliu


Re: [dpdk-dev] [PATCH v1 2/2] Test cases for rte_memcmp functions

2017-01-08 Thread Wang, Zhihong


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monja...@6wind.com]
> Sent: Tuesday, January 3, 2017 4:41 AM
> To: Wang, Zhihong ; Ravi Kerur
> 
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v1 2/2] Test cases for rte_memcmp
> functions
> 
> 2016-06-07 11:09, Wang, Zhihong:
> > From: Ravi Kerur [mailto:rke...@gmail.com]
> > > Zhilong, Thomas,
> > >
> > > If there is enough interest within DPDK community I can work on adding
> support
> > > for 'unaligned access' and 'test cases' for it. Please let me know either
> way.
> >
> > Hi Ravi,
> >
> > This rte_memcmp is proved with better performance than glibc's in aligned
> > cases, I think it has good value to DPDK lib.
> >
> > Though we don't have memcmp in critical pmd data path, it offers a better
> > choice for applications who do.
> 
> Re-thinking about this series, could it be some values to have a rte_memcmp
> implementation?

I think this series (rte_memcmp included) could help:

 1. Potentially better performance in hot paths.

 2. Agile for tuning.

 3. Avoid performance complications -- unusual but possible,
like the glibc memset issue I met while working on vhost
enqueue.

> What is the value compared to glibc one? Why not working on glibc?

As to working on glibc, wider design consideration and test
coverage might be needed, and we'll face different release
cycles, can we have the same agility? Also working with old
glibc could be a problem.



Re: [dpdk-dev] [PATCH v5 01/12] eal/bus: introduce bus abstraction

2017-01-08 Thread Shreyansh Jain

On Friday 06 January 2017 08:25 PM, Thomas Monjalon wrote:

2017-01-06 16:01, Shreyansh Jain:

On Wednesday 04 January 2017 03:22 AM, Thomas Monjalon wrote:

2016-12-26 18:53, Shreyansh Jain:

+/**
+ * A structure describing a generic bus.
+ */
+struct rte_bus {
+   TAILQ_ENTRY(rte_bus) next;   /**< Next bus object in linked list */
+   struct rte_driver_list driver_list;
+/**< List of all drivers on bus */
+   struct rte_device_list device_list;
+/**< List of all devices on bus */
+   const char *name;/**< Name of the bus */
+};


I am not convinced we should link a generic bus to drivers and devices.
What do you think of having rte_pci_bus being a rte_bus and linking
with rte_pci_device and rte_pci_driver lists?


This is different from what I had in mind.
You are saying:

  Class: rte_bus
   `-> No object instantiated for this class
  Class: rte_pci_bus inheriting rte_bus
   `-> object instantiated for this class.

Here, rte_bus is being treated as an abstract class which is only
inherited and rte_pci_bus is the base class which is instantiated.

And I was thinking:

  Class: rte_bus
   `-> Object: pci_bus (defined in */eal/eal_pci.c)

Here, rte_bus is that base class which is instantiated.

I agree that what you are suggesting is inline with current model:
  rte_device -> abstract class (no one instantiates it)
   `-> rte_pci_device -> Base class which inherits rte_device and
 is instantiated.


Yes


When I choose not to create rte_pci_bus, it was because I didn't want
another indirection in form of rte_bus->rte_pci_bus->object.
There were no 'non-generic' bus functions which were only applicable for
rte_pci_bus. Eventually, rte_pci_bus ended up being a direct inheritance
of rte_bus.


I'm thinking to something like that:

struct rte_bus {
TAILQ_ENTRY(rte_bus) next;
const char *name;
rte_bus_scan_t scan;
rte_bus_match_t match;
};
struct rte_pci_bus {
struct rte_bus bus;
struct rte_pci_driver_list pci_drivers;
struct rte_pci_device_list pci_devices;
};


if we go by rte_bus->rte_pci_bus->(instance of rte_pci_bus), above is
fine. Though, I am in favor of rte_bus->(instance of rte_bus for PCI)
because I don't see any 'non-generic' information in rte_pci_bus which
can't be put in rte_bus.


The lists of drivers and devices are specific to the bus.
Your proposal was to list them as generic rte_driver/rte_device and
cast them. I'm just saying we can directly declare them with the right type,
e.g. rte_pci_driver/rte_pci_device.


Ok. I get your point. Already changing the code to reflect this.



In the same logic, the functions probe/remove are specifics for the bus and
should be declared in rte_pci_driver instead of the generic rte_driver.


Yes, I agree with this after above argument.





+/** Helper for Bus registration. The constructor has higher priority than
+ * PMD constructors
+ */
+#define RTE_REGISTER_BUS(nm, bus) \
+static void __attribute__((constructor(101), used)) businitfn_ ##nm(void) \
+{\
+   (bus).name = RTE_STR(nm);\
+   rte_eal_bus_register(&bus); \
+}


By removing the lists from rte_bus as suggested above, do you still need
a priority for this constructor?


I think yes.
Even if we have rte_pci_bus as a class, object of rte_bus should be part
of Bus list *before* registration of a driver (because, driver
registration searches for bus by name).

(This is assuming that no global PCI/VDEV/XXX bus object is created
which is directly used within all PCI specific bus operations).

There was another suggestion on list which was to check for existence of
bus _within_ each driver registration and create/instantiate an object
in case no bus is registered. I didn't like the approach so I didn't use
it. From David [1], and me [2]

[1] http://dpdk.org/ml/archives/dev/2016-December/051689.html
[2] http://dpdk.org/ml/archives/dev/2016-December/051698.html


OK, we can keep your approach of prioritize bus registrations.
If we see an issue later, we could switch to a bus registration done
when a first driver is registered on the bus.


Thanks for confirmation.





 struct rte_device {
TAILQ_ENTRY(rte_device) next; /**< Next device */
+   struct rte_bus *bus;  /**< Device connected to this bus */
const struct rte_driver *driver;/**< Associated driver */
int numa_node;/**< NUMA node connection */
struct rte_devargs *devargs;  /**< Device user arguments */
@@ -148,6 +149,7 @@ void rte_eal_device_remove(struct rte_device *dev);
  */
 struct rte_driver {
TAILQ_ENTRY(rte_driver) next;  /**< Next in list. */
+   struct rte_bus *bus;   /**< Bus serviced by this driver */
const char *name;   /**< Driver name. */
const char *alias;  /**< Driver alias. */
 };


Do we need to know the bus asso

Re: [dpdk-dev] [PATCH v5 05/12] eal: add probe and remove support for rte_driver

2017-01-08 Thread Shreyansh Jain

On Friday 06 January 2017 08:56 PM, Thomas Monjalon wrote:

2017-01-06 17:14, Shreyansh Jain:

On Wednesday 04 January 2017 03:35 AM, Thomas Monjalon wrote:

2016-12-26 18:53, Shreyansh Jain:

--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -152,6 +162,8 @@ struct rte_driver {
struct rte_bus *bus;   /**< Bus serviced by this driver */
const char *name;   /**< Driver name. */
const char *alias;  /**< Driver alias. */
+   driver_probe_t *probe; /**< Probe the device */
+   driver_remove_t *remove;   /**< Remove/hotplugging the device */
 };


If I understand well, this probe function does neither scan nor match.
So it could be named init.


Current model is:

After scanning for devices and populating bus->device_list,
Bus probe does:
  `-> bus->match()
  `-> rte_driver->probe() for matched driver

For PCI drivers, '.probe = rte_eal_pci_probe'.

For example, igb_ethdev.c:

--->8---
static struct eth_driver rte_igb_pmd = {
 .pci_drv = {
 .driver = {
 .probe = rte_eal_pci_probe,
 .remove = rte_eal_pci_remove,
 },
...
--->8---


Yes
I'm just having some doubts about the naming "probe" compared to "init".
And yes I know I was advocating to unify naming to "probe" recently :)
I would like to be sure it is not confusing for anyone.
Do you agree that "init" refers to global driver initialization and
"probe" refers to instantiating a device?


Ok. Makes sense as a standardized way of differentiating 'init' from 
'probe'.




If yes, the comment could be changed from "Probe the device" to
"Check and instantiate a device".


Now that probe if removed from rte_driver, I think this would no longer 
be valid. [1]


[1] http://dpdk.org/ml/archives/dev/2017-January/054140.html




I think the probe (init) and remove ops must be specific to the bus.
We can have them in rte_bus, and as an example, the pci implementation
would call the pci probe and remove ops of rte_pci_driver.


I do not understand clearly what I was saying here :/


:)




So,
---
After scanning for devices (bus->scan()):
Bus probe (rte_eal_bus_probe()):
  `-> bus->match()
  `-> bus->init() - a new fn rte_bus_pci_init()


I suggest the naming bus->probe().
It is currently implemented in rte_eal_pci_probe_one_driver().


  -> which calls rte_eal_pci_probe()


Not needed here, this function is converted into the PCI match function.


  -> and rte_pci_driver->probe()


Yes, bus->probe() makes some processing and calls rte_pci_driver->probe().


I have made some changes on similar lines. Will share them soon. Then we 
can discuss again.






and remove rte_driver probe and remove callbacks because they are now
redundant. (they were added in bus patches itself)
---

Is the above correct understanding of your statement?


I think we just need to move probe/remove in rte_pci_driver.


Somehow I don't remember why I didn't do this in first place - it seems
to be better option than introducing a rte_driver->probe()/remove()
layer. I will change it (and think again why I rejected this idea in
first place). Thanks.


Thanks





Re: [dpdk-dev] [PATCH v5 04/12] eal: integrate bus scan and probe with EAL

2017-01-08 Thread Shreyansh Jain

Hello,

On Sunday 08 January 2017 05:51 PM, Rosen, Rami wrote:

Hi,



diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c 
index 2206277..2c223de 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c


+/* Scan all the buses for registering devices */ int
+rte_eal_bus_scan(void)
+{
+   int ret;
+   struct rte_bus *bus = NULL;
+
+   TAILQ_FOREACH(bus, &rte_bus_list, next) {
+   ret = bus->scan(bus);
+   if (ret) {
+   RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n",
+   bus->name);
+   /* TODO: Should error on a particular bus block scan
+* for all others?
+*/
+   return ret;
+   }
+   }
+
+   return 0;
+}
+

Nitpick - the return type of rte_eal_bus_scan() is int and not void:


Thanks for review. I will fix this before sending v6.



* Scan all the buses attached to the framework.
+ *
+ * @param void
+ * @return void
+ */
+int rte_eal_bus_scan(void);


Rami Rosen



-
Shreyansh


Re: [dpdk-dev] [PATCH v5 04/12] eal: integrate bus scan and probe with EAL

2017-01-08 Thread Shreyansh Jain

On Friday 06 January 2017 07:16 PM, Thomas Monjalon wrote:

2017-01-06 17:30, Shreyansh Jain:

On Friday 06 January 2017 04:08 PM, Shreyansh Jain wrote:

On Wednesday 04 January 2017 03:16 AM, Thomas Monjalon wrote:

2016-12-26 18:53, Shreyansh Jain:

--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -844,6 +845,9 @@ rte_eal_init(int argc, char **argv)
if (rte_eal_intr_init() < 0)
rte_panic("Cannot init interrupt-handling thread\n");

+   if (rte_eal_bus_scan())
+   rte_panic("Cannot scan the buses for devices\n");


Yes, definitely. Just one scan functions which scan registered bus.


@@ -884,6 +888,9 @@ rte_eal_init(int argc, char **argv)
if (rte_eal_pci_probe())
rte_panic("Cannot probe PCI\n");

+   if (rte_eal_bus_probe())
+   rte_panic("Cannot probe devices\n");
+
if (rte_eal_dev_init() < 0)
rte_panic("Cannot init pmd devices\n");


What is the benefit of initializing (probe) a device outside of the scan?
Currently, it is done in two steps, so you are keeping the same
behaviour.


Yes, only for compatibility to existing model of two-step process.
Ideally, only a single step is enough (init->probe).

During the discussion in [1] also this point was raised - at that time
for VDEV and applicability to PCI.

[1] http://dpdk.org/ml/archives/dev/2016-December/051306.html

If you want, I can merge these two. I postponed it because 1) it is an
independent change and should really impact bus and 2) I was not sure
of dependency of init *before* pthread_create for all workers.


I forgot _not_ in above - rephrasing:

If you want, I can merge these two. I postponed it because 1) it is an
independent change and should _not_ really impact bus and 2) I was not
sure of dependency of init *before* pthread_create for all workers.


I'm OK with your approach.


I imagine a model where the scan function decide to initialize the
device and can require some help from a callback to make this decision.
So the whitelist/blacklist policy can be implemented with callbacks at
the scan level and possibly the responsibility of the application.
Note that the callback model would be a change for a next release.


Agree. But, that is not really part of Bus patches - isn't it? Or, you
want to change that with this series?


No it is not the scope of this series.
Please could you add it in the cover letter as a next step?


Yes, I will add to cover letter as Pending Item.


Thanks





Re: [dpdk-dev] [PATCH] doc: add known issue for uio_pci_generic in XL710

2017-01-08 Thread Wu, Jingjing


> -Original Message-
> From: Guo, Jia
> Sent: Friday, December 23, 2016 10:35 AM
> To: Zhang, Helin ; Wu, Jingjing 
> Cc: dev@dpdk.org; Guo, Jia 
> Subject: [PATCH] doc: add known issue for uio_pci_generic in XL710
> 
> When bind with uio_pci_generic in XL710, the result is failed.
> uio_pci_generic is not supported by XL710.
> 
> Signed-off-by: Jeff Guo 

Acked-by: Jingjing Wu 



Re: [dpdk-dev] [PATCH v5 1/5] ethdev: add firmware version get

2017-01-08 Thread Yang, Qiming
-Original Message-
From: Stephen Hemminger [mailto:step...@networkplumber.org] 
Sent: Monday, January 9, 2017 7:05 AM
To: Yang, Qiming 
Cc: dev@dpdk.org; Yigit, Ferruh ; Zhang, Helin 
; Horton, Remy 
Subject: Re: [dpdk-dev] [PATCH v5 1/5] ethdev: add firmware version get

On Sun,  8 Jan 2017 12:11:31 +0800
Qiming Yang  wrote:

>  void
> +rte_eth_dev_fw_version_get(uint8_t port_id, char *fw_version, int 
> +fw_length) {
> + struct rte_eth_dev *dev;
> +
> + RTE_ETH_VALID_PORTID_OR_RET(port_id);
> + dev = &rte_eth_devices[port_id];
> +
> + RTE_FUNC_PTR_OR_RET(*dev->dev_ops->fw_version_get);
> + (*dev->dev_ops->fw_version_get)(dev, fw_version, fw_length); }

Maybe dev argument to fw_version_get should be:
   const struct rte_eth_dev *dev
Qiming: do you means the argument to ops fw_version_get? 
why should add 'const'? both two are OK, but we usually use struct rte_eth_dev 
*dev.


[dpdk-dev] [PATCH v4] ethdev: fix port data mismatched in multiple process model

2017-01-08 Thread Yuanhan Liu
Assume we have two virtio ports, 00:03.0 and 00:04.0. The first one is
managed by the kernel driver, while the later one is managed by DPDK.

Now we start the primary process. 00:03.0 will be skipped by DPDK virtio
PMD driver (since it's being used by the kernel). 00:04.0 would be
successfully initiated by DPDK virtio PMD (if nothing abnormal happens).
After that, we would get a port id 0, and all the related info needed
by virtio (virtio_hw) is stored at rte_eth_dev_data[0].

Then we start the secondary process. As usual, 00:03.0 will be firstly
probed. It firstly tries to get a local eth_dev structure for it (by
rte_eth_dev_allocate):

port_id = rte_eth_dev_find_free_port();
...

eth_dev = &rte_eth_devices[port_id];
eth_dev->data = &rte_eth_dev_data[port_id];
...

return eth_dev;

Since it's a first PCI device, port_id will be 0. eth_dev->data would
then point to rte_eth_dev_data[0]. And here things start going wrong,
as rte_eth_dev_data[0] actually stores the virtio_hw for 00:04.0.

That said, in the secondary process, DPDK will continue to drive PCI
device 00.03.0 (despite the fact it's been managed by kernel), with
the info from PCI device 00:04.0. Which is wrong.

The fix is to attach the port already registered by the primary process:
iterate the rte_eth_dev_data[], and get the port id who's PCI ID matches
the current PCI device.

This would let us maintain same port ID for the same PCI device, keeping
the chance of referencing to wrong data minimal.

Fixes: af75078fece3 ("first public release")

Cc: sta...@dpdk.org
Cc: Thomas Monjalon 
Cc: Bruce Richardson 
Cc: Ferruh Yigit 
Signed-off-by: Yuanhan Liu 
---

v4: - assign eth_dev in the common eth_dev init help function
  it also renamed to eth_dev_get, to not confuse with the
  eth_dev_init callback.
- move primoary process specific assignments to rte_eth_dev_allocate
- drop the virtio example in comment
- combine two code block for primary into one

v3: - do not move rte_eth_dev_data_alloc to pci_probe
- rename eth_dev_attach to eth_dev_attach_secondary
- introduce eth_dev_init() for common eth_dev struct initiation
- move comment block inside the "if" block
---
 lib/librte_ether/rte_ethdev.c | 71 +--
 1 file changed, 62 insertions(+), 9 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index fde8112..1453df1 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -189,6 +189,19 @@ struct rte_eth_dev *
return RTE_MAX_ETHPORTS;
 }
 
+static struct rte_eth_dev *
+eth_dev_get(uint8_t port_id)
+{
+   struct rte_eth_dev *eth_dev = &rte_eth_devices[port_id];
+
+   eth_dev->data = &rte_eth_dev_data[port_id];
+   eth_dev->attached = DEV_ATTACHED;
+   eth_dev_last_created_port = port_id;
+   nb_ports++;
+
+   return eth_dev;
+}
+
 struct rte_eth_dev *
 rte_eth_dev_allocate(const char *name)
 {
@@ -210,13 +223,41 @@ struct rte_eth_dev *
return NULL;
}
 
-   eth_dev = &rte_eth_devices[port_id];
-   eth_dev->data = &rte_eth_dev_data[port_id];
+   eth_dev = eth_dev_get(port_id);
snprintf(eth_dev->data->name, sizeof(eth_dev->data->name), "%s", name);
eth_dev->data->port_id = port_id;
-   eth_dev->attached = DEV_ATTACHED;
-   eth_dev_last_created_port = port_id;
-   nb_ports++;
+
+   return eth_dev;
+}
+
+/*
+ * Attach to a port already registered by the primary process, which
+ * makes sure that the same device would have the same port id both
+ * in the primary and secondary process.
+ */
+static struct rte_eth_dev *
+eth_dev_attach_secondary(const char *name)
+{
+   uint8_t i;
+   struct rte_eth_dev *eth_dev;
+
+   if (rte_eth_dev_data == NULL)
+   rte_eth_dev_data_alloc();
+
+   for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
+   if (strcmp(rte_eth_dev_data[i].name, name) == 0)
+   break;
+   }
+   if (i == RTE_MAX_ETHPORTS) {
+   RTE_PMD_DEBUG_TRACE(
+   "device %s is not driven by the primary process\n",
+   name);
+   return NULL;
+   }
+
+   eth_dev = eth_dev_get(i);
+   RTE_ASSERT(eth_dev->data->port_id == i);
+
return eth_dev;
 }
 
@@ -246,16 +287,28 @@ struct rte_eth_dev *
rte_eal_pci_device_name(&pci_dev->addr, ethdev_name,
sizeof(ethdev_name));
 
-   eth_dev = rte_eth_dev_allocate(ethdev_name);
-   if (eth_dev == NULL)
-   return -ENOMEM;
-
if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+   eth_dev = rte_eth_dev_allocate(ethdev_name);
+   if (eth_dev == NULL)
+   return -ENOMEM;
+
eth_dev->data->dev_private = rte_zmalloc("ethdev private 
structure",
  eth_drv->dev_private_size