Re: [dpdk-dev] [RFC PATCH 4/4] ethdev: add helpers to move to the new offloads API

2017-08-24 Thread Shahaf Shuler
Thursday, August 24, 2017 1:06 AM, Thomas Monjalon:
> 23/08/2017 15:13, Shahaf Shuler:
> > Wednesday, August 23, 2017 3:29 PM, Ananyev, Konstantin:
> > > From: Shahaf Shuler
> > > > In order to enable PMDs to support only one of the APIs, and
> > > > applications to avoid branching according to the underlying device
> > > > copy functions to/from the old/new APIs were added.
> 
> Looks like a good intent.
> I would prefer the word "convert" instead of "copy".
> 
> > > >  int
> > > >  rte_eth_rx_queue_setup(uint8_t port_id, uint16_t rx_queue_id,
> [...]
> > > > +   } else if ((!(dev->data->dev_flags & RTE_ETH_DEV_RXQ_OFFLOAD)) &&
> > > > +  (dev->data->dev_conf.rxmode.ignore == 1)) {
> > > > +   int ret;
> > > > +   struct rte_eth_rxmode rxmode;
> > > > +
> > > > +   rte_eth_copy_rxq_offloads(&rxmode, rx_conf);
> > > > +   if (memcmp(&rxmode, &dev->data->dev_conf.rxmode,
> > > > +  sizeof(rxmode))) {
> > > > +   /*
> > > > +* devices which work with the rxmode offloads API
> > > > +* require a re-configuration in order to apply the
> > > > +* new offloads configuration.
> > > > +*/
> > > > +   dev->data->dev_conf.rxmode = rxmode;
> > > > +   ret = rte_eth_dev_configure(port_id,
> > > > +   dev->data->nb_rx_queues,
> > > > +   dev->data->nb_tx_queues,
> > > > +   &dev->data->dev_conf);
> > >
> > > Hmm, and why we would need to reconfigure our device in the middle
> > > of rx queue setup?
> >
> > The reason is the old Rx offloads API is configured on device configure.
> > This if section is for applications which already moved to the new
> > offload API however the underlying PMD still uses the old one.
> 
> Isn't it risky to re-run configure here?
> We could also declare this case as an error.
> 
> I think applications which have migrated to the new API could use the
> convert functions themselves before calling configure to support
> non-migrated PMDs.
> The cons of my solution are:
> - discourage apps to migrate before all PMDs have migrated
> - expose a temporary function to convert the API
> I propose it anyway because there is always someone to like bad ideas ;)

Yes. I tried to make it as simple as possible for applications to move to the
new API.
Defining it as an error flow will force the application to check the PMD
offload mode and branch accordingly. The conversion functions are good
helpers, yet the code remains complex due to the different cases with the
different PMDs.

Considering that the re-configuration is risky, and lacking other ideas, I
will need to fall back to the error flow case.
Are we OK with that?


[dpdk-dev] [PATCH] net/qede: fix possible NULL pointer dereference

2017-08-24 Thread Rongqiang XIE
In function qede_rss_reta_update(), the pointer params returned from
the call to rte_zmalloc() may be NULL and would be dereferenced.
So, check whether params is NULL before using it.

Signed-off-by: Rongqiang XIE 
---
 drivers/net/qede/qede_ethdev.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/qede/qede_ethdev.c b/drivers/net/qede/qede_ethdev.c
index 0e05989..4e9e89f 100644
--- a/drivers/net/qede/qede_ethdev.c
+++ b/drivers/net/qede/qede_ethdev.c
@@ -2012,6 +2012,10 @@ int qede_rss_reta_update(struct rte_eth_dev *eth_dev,
memset(&vport_update_params, 0, sizeof(vport_update_params));
params = rte_zmalloc("qede_rss", sizeof(*params) * edev->num_hwfns,
 RTE_CACHE_LINE_SIZE);
+   if (params == NULL) {
+   DP_ERR(edev, "failed to allocate memory\n");
+   return -ENOMEM;
+   }
 
for (i = 0; i < reta_size; i++) {
idx = i / RTE_RETA_GROUP_SIZE;
-- 
1.8.3.1




Re: [dpdk-dev] [PATCH 1/2] net/mlx5: support device removal event

2017-08-24 Thread Nélio Laranjeiro
On Wed, Aug 23, 2017 at 07:44:45PM +, Matan Azrad wrote:
> Hi Nelio
> 
> > -Original Message-
> > From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com]
> > Sent: Wednesday, August 23, 2017 12:41 PM
> > To: Matan Azrad 
> > Cc: Adrien Mazarguil ; dev@dpdk.org
> > Subject: Re: [PATCH 1/2] net/mlx5: support device removal event
> > 
> > Hi Matan,
> > 
> > On Sun, Aug 13, 2017 at 03:25:11PM +0300, Matan Azrad wrote:
> > > Extend the LSC event handling to support the device removal as well.
> > > The Verbs library may send several related events, which are different
> > > from LSC event.
> > >
> > > The mlx5 event handling has been made capable of receiving and
> > > signaling several event types at once.
> > >
> > > This support includes the following:
> > > 1. Removal event detection according to the user configuration.
> > > 2. Calling to all registered mlx5 removal callbacks.
> > > 3. Capabilities extension to include removal interrupt handling.
> > >
> > > Signed-off-by: Matan Azrad 
> > > ---
> > >  drivers/net/mlx5/mlx5.c|   2 +-
> > >  drivers/net/mlx5/mlx5_ethdev.c | 100 +++--
> > >  2 files changed, 68 insertions(+), 34 deletions(-)
> > >
> > > Hi
> > > This patch is based on top of the latest Nelio mlx5 cleanup patches.
> > >
> > > diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> > > bd66a7c..1a3d7f1 100644
> > > --- a/drivers/net/mlx5/mlx5.c
> > > +++ b/drivers/net/mlx5/mlx5.c
> > > @@ -865,7 +865,7 @@ static struct rte_pci_driver mlx5_driver = {
> > >   },
> > >   .id_table = mlx5_pci_id_map,
> > >   .probe = mlx5_pci_probe,
> > > - .drv_flags = RTE_PCI_DRV_INTR_LSC,
> > > + .drv_flags = RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV,
> > >  };
> > >
> > >  /**
> > > diff --git a/drivers/net/mlx5/mlx5_ethdev.c
> > > b/drivers/net/mlx5/mlx5_ethdev.c index 57f6237..404d8f4 100644
> > > --- a/drivers/net/mlx5/mlx5_ethdev.c
> > > +++ b/drivers/net/mlx5/mlx5_ethdev.c
> > > @@ -1112,47 +1112,75 @@ mlx5_ibv_device_to_pci_addr(const struct
> > > ibv_device *device,  }
> > >
> > >  /**
> > > - * Link status handler.
> > > + * Update the link status.
> > > + * Set alarm if the device link status is inconsistent.
> > 
> > Adding such comment should also comment about the issue this alarm is
> > solving i.e. why the link is inconsistent and why the alarm help to fix the
> > issue.
> > 
I didn't see any comments about that in the old code, hence I didn't write
it.

That is normal, as the alarm is a workaround specifically necessary for the
Mellanox PMD.  Now that you explicitly announce that this function programs
an alarm, the question is: why is it necessary?

> I think you right and this could be added.(even before this patch).

No, in the current code it updates the link; if it is inconsistent, it tries
to get a correct link ASAP.  There is no need to advertise that this function
will program an alarm, it is internal cooking.

> > >   *
> > >   * @param priv
> > >   *   Pointer to private structure.
> > > - * @param dev
> > > - *   Pointer to the rte_eth_dev structure.
> > >   *
> > >   * @return
> > > - *   Nonzero if the callback process can be called immediately.
> > > + *   Zero if alarm is not set and the link status is consistent.
> > >   */
> > >  static int
> > > -priv_dev_link_status_handler(struct priv *priv, struct rte_eth_dev
> > > *dev)
> > > +priv_link_status_alarm_update(struct priv *priv)
> > 
> > The old name is more accurate; the fact that we need to program an alarm
> > is a workaround to get the correct status from ethtool.  If it were
> > possible to avoid it, this alarm would not exist.
> > 
Probably because of the git +/- format and this specific patch, you got
confused here.

No I applied your patch and read your code.  You did not understand my
comment.

>[...]

When I read:

>  void
>  mlx5_dev_link_status_handler(void *arg)
>  {
> struct rte_eth_dev *dev = arg;
> struct priv *priv = dev->data->dev_private;
> int ret;
> 
> priv_lock(priv);
> assert(priv->pending_alarm == 1);
> priv->pending_alarm = 0;
> -   ret = priv_dev_link_status_handler(priv, dev);
> +   ret = priv_link_status_alarm_update(priv);
> priv_unlock(priv);
> -   if (ret)
> +   if (!ret)
> _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC, 
> NULL,
> - NULL);
> +   NULL);
>  }

I am expecting to find something related to a link update; what I see is an
alarm update.  I don't expect to update an alarm but a link.  The names and
actions are inconsistent, i.e. mlx5_dev_link_status_handler() should handle
a link, not an alarm.

I understand there is a need to add more function levels, but the
priv_link_status_alarm_update() should be renamed to something like
priv_link_status_update().

Regards,

-- 
Nélio Laranjeiro
6WIND


[dpdk-dev] [PATCH v2 3/3] net/mlx5: add hardware timestamp

2017-08-24 Thread Raslan Darawsheh
Expose a new capability of Rx hw timestamp and
add a new device arg, hw_timestamp, to enable it.
It will add the raw hw timestamp into the packets.

It is expected to lower the performance, since using it
disables CQE compression and adds extra checks in
the vectorized Rx path.

Signed-off-by: Raslan Darawsheh 
---
 doc/guides/nics/mlx5.rst |  5 +
 drivers/net/mlx5/mlx5.c  | 23 +++
 drivers/net/mlx5/mlx5.h  |  1 +
 drivers/net/mlx5/mlx5_ethdev.c   |  3 ++-
 drivers/net/mlx5/mlx5_rxq.c  |  3 +++
 drivers/net/mlx5/mlx5_rxtx.c |  5 +
 drivers/net/mlx5/mlx5_rxtx.h |  3 ++-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.c | 13 -
 8 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index f4cb18b..7dbd844 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -268,6 +268,11 @@ Run-time configuration
 
   Enabled by default.
 
+- ``rx_timestamp`` parameter [int]
+
+  A nonzero value enables the Rx timestamp.
+  When hw timestamp is enabled, packets will carry the raw hw timestamp.
+
 Prerequisites
 -
 
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index b7e5046..4b3a3ab 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -94,6 +94,9 @@
 /* Device parameter to enable hardware TSO offload. */
 #define MLX5_TSO "tso"
 
+/* Device parameter to enable hardware timestamp offload. */
+#define MLX5_RX_TIMESTAMP "rx_timestamp"
+
 /* Device parameter to enable hardware Tx vector. */
 #define MLX5_TX_VEC_EN "tx_vec_en"
 
@@ -113,6 +116,7 @@ struct mlx5_args {
int tso;
int tx_vec_en;
int rx_vec_en;
+   int hw_timestamp;
 };
 /**
  * Retrieve integer value from environment variable.
@@ -336,6 +340,8 @@ mlx5_args_check(const char *key, const char *val, void *opaque)
args->tx_vec_en = !!tmp;
} else if (strcmp(MLX5_RX_VEC_EN, key) == 0) {
args->rx_vec_en = !!tmp;
+   } else if (strcmp(MLX5_RX_TIMESTAMP, key) == 0) {
+   args->hw_timestamp = !!tmp;
} else {
WARN("%s: unknown parameter", key);
return -EINVAL;
@@ -367,6 +373,7 @@ mlx5_args(struct mlx5_args *args, struct rte_devargs *devargs)
MLX5_TSO,
MLX5_TX_VEC_EN,
MLX5_RX_VEC_EN,
+   MLX5_RX_TIMESTAMP,
NULL,
};
struct rte_kvargs *kvlist;
@@ -426,6 +433,8 @@ mlx5_args_assign(struct priv *priv, struct mlx5_args *args)
priv->tx_vec_en = args->tx_vec_en;
if (args->rx_vec_en != MLX5_ARG_UNSET)
priv->rx_vec_en = args->rx_vec_en;
+   if (args->hw_timestamp != MLX5_ARG_UNSET)
+   priv->hw_timestamp = args->hw_timestamp;
 }
 
 /**
@@ -573,6 +582,7 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
.tso = MLX5_ARG_UNSET,
.tx_vec_en = MLX5_ARG_UNSET,
.rx_vec_en = MLX5_ARG_UNSET,
+   .hw_timestamp = MLX5_ARG_UNSET,
};
 
exp_device_attr.comp_mask =
@@ -581,6 +591,7 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
IBV_EXP_DEVICE_ATTR_VLAN_OFFLOADS |
IBV_EXP_DEVICE_ATTR_RX_PAD_END_ALIGN |
IBV_EXP_DEVICE_ATTR_TSO_CAPS |
+   IBV_EXP_DEVICE_ATTR_WITH_TIMESTAMP_MASK |
0;
 
DEBUG("using port %u (%08" PRIx32 ")", port, test);
@@ -662,6 +673,18 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 IBV_EXP_DEVICE_VXLAN_SUPPORT);
DEBUG("L2 tunnel checksum offloads are %ssupported",
  (priv->hw_csum_l2tun ? "" : "not "));
+   if (priv->hw_timestamp) {
+   priv->hw_timestamp =
+   !!(exp_device_attr.comp_mask &
+  IBV_EXP_DEVICE_ATTR_WITH_TIMESTAMP_MASK);
+   DEBUG("Timestamping offload is %ssupported",
+ (priv->hw_timestamp ? "" : "not "));
+   priv->cqe_comp = (priv->hw_timestamp ?
+ 0 : priv->cqe_comp);
+   DEBUG("%s",
+ (priv->hw_timestamp ?
+ "cqe compression is disabled" : ""));
+   }
 
	priv->ind_table_max_size = exp_device_attr.rx_hash_caps.max_rwq_indirection_table_size;
/* Remove this check once DPDK supports larger/variable
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 43c5384..4d19351 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h

[dpdk-dev] [PATCH v2 1/3] ethdev: expose Rx hardware timestamp

2017-08-24 Thread Raslan Darawsheh
Added a new capability to the list of Rx offloads for hw timestamp.

PMDs that expose this capability will always have it enabled.
But, if the following API gets accepted, applications will be able to
choose between disabling/enabling this offload:
http://dpdk.org/dev/patchwork/patch/27470/

Signed-off-by: Raslan Darawsheh 
---
 lib/librte_ether/rte_ethdev.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 0adf327..cc5d281 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -907,6 +907,8 @@ struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_QINQ_STRIP  0x0020
 #define DEV_RX_OFFLOAD_OUTER_IPV4_CKSUM 0x0040
 #define DEV_RX_OFFLOAD_MACSEC_STRIP 0x0080
+#define DEV_RX_OFFLOAD_TIMESTAMP 0x0100
+/**< Device puts raw timestamp in mbuf. */
 
 /**
  * TX offload capabilities of a device.
-- 
2.7.4



[dpdk-dev] [PATCH v2 2/3] app/testpmd: add Rx timestamp in testpmd

2017-08-24 Thread Raslan Darawsheh
Added a new print in case a PMD exposes the Rx timestamp capability.
Also, added a print of the timestamp value in rxonly mode,
in case the packet was timestamped.

Signed-off-by: Raslan Darawsheh 
---
 app/test-pmd/config.c | 3 +++
 app/test-pmd/rxonly.c | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 3ae3e1c..8a5da5d 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -598,6 +598,9 @@ port_offload_cap_display(portid_t port_id)
printf("off\n");
}
 
+   if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TIMESTAMP)
+   printf("HW timestamp:  on\n");
+
if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_QINQ_INSERT) {
printf("Double VLANs insert:   ");
if (ports[port_id].tx_ol_flags &
diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index 5ef0219..f4d35d7 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -158,6 +158,8 @@ pkt_burst_receive(struct fwd_stream *fs)
printf("hash=0x%x ID=0x%x ",
   mb->hash.fdir.hash, mb->hash.fdir.id);
}
+   if (ol_flags & PKT_RX_TIMESTAMP)
+   printf(" - timestamp %" PRIu64 " ", mb->timestamp);
if (ol_flags & PKT_RX_VLAN_STRIPPED)
printf(" - VLAN tci=0x%x", mb->vlan_tci);
if (ol_flags & PKT_RX_QINQ_STRIPPED)
-- 
2.7.4



[dpdk-dev] [PATCH v2] eal: add config option to enable asserts

2017-08-24 Thread Xueming Li
Currently, enabling assertions requires setting CONFIG_RTE_LOG_LEVEL to
RTE_LOG_DEBUG. CONFIG_RTE_LOG_LEVEL is the default log level of the control
path; RTE_LOG_DP_LEVEL is the log level of the data path. It's a little bit
hard to understand literally that assertion is decided by the control path
LOG_LEVEL, especially for assertions used on the data path.

On the other hand, DPDK needs an assertion-enabling switch that does not
impact the log output level, assuming "--log-level" is not specified.

Assertion is an important API to balance DPDK high performance and
robustness. To promote assertion usage, it's valuable to decouple
assertions from CONFIG_RTE_LOG_LEVEL.

In one word, log is log, assertion is assertion, debug is hot pot :)

The rationale of this patch is to introduce a dedicated switch for
assertions: RTE_ENABLE_ASSERT

Signed-off-by: Xueming Li 
Acked-by: Gaetan Rivet 

---
v2: Changed macro name from RTE_ASSERTION to RTE_ENABLE_ASSERT
---
 config/common_base| 1 +
 lib/librte_eal/common/include/rte_debug.h | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/config/common_base b/config/common_base
index 5e97a08b6..2cb445b6d 100644
--- a/config/common_base
+++ b/config/common_base
@@ -93,6 +93,7 @@ CONFIG_RTE_MAX_NUMA_NODES=8
 CONFIG_RTE_MAX_MEMSEG=256
 CONFIG_RTE_MAX_MEMZONE=2560
 CONFIG_RTE_MAX_TAILQ=32
+CONFIG_RTE_ENABLE_ASSERT=n
 CONFIG_RTE_LOG_LEVEL=RTE_LOG_INFO
 CONFIG_RTE_LOG_DP_LEVEL=RTE_LOG_INFO
 CONFIG_RTE_LOG_HISTORY=256
diff --git a/lib/librte_eal/common/include/rte_debug.h b/lib/librte_eal/common/include/rte_debug.h
index cab6fb4c9..79b67b3ec 100644
--- a/lib/librte_eal/common/include/rte_debug.h
+++ b/lib/librte_eal/common/include/rte_debug.h
@@ -79,7 +79,7 @@ void rte_dump_registers(void);
 #define rte_panic(...) rte_panic_(__func__, __VA_ARGS__, "dummy")
#define rte_panic_(func, format, ...) __rte_panic(func, format "%.0s", __VA_ARGS__)
 
-#if RTE_LOG_LEVEL >= RTE_LOG_DEBUG
+#ifdef RTE_ENABLE_ASSERT
 #define RTE_ASSERT(exp)RTE_VERIFY(exp)
 #else
 #define RTE_ASSERT(exp) do {} while (0)
-- 
2.13.3



Re: [dpdk-dev] [PATCH 1/2] eal/x86: use cpuid builtin

2017-08-24 Thread Sergio Gonzalez Monroy

On 23/08/2017 21:49, Thomas Monjalon wrote:
> Please could you explain why the asm code was used?

I guess we were not aware that there was a builtin for it.

> Are you sure this builtin is implemented everywhere?

Actually the builtin used in this patch is not supported in most CLANG
versions (it was just recently merged upstream), so I have reworked the
patch to use a builtin supported in both GCC and CLANG 3.4+.


Thanks,
Sergio


Re: [dpdk-dev] [PATCH 1/2] net/mlx5: replace memory barrier type

2017-08-24 Thread Bruce Richardson
On Thu, Aug 24, 2017 at 06:56:11AM +, Shahaf Shuler wrote:
> Wednesday, August 23, 2017 4:12 PM, Bruce Richardson:
> > On Wed, Aug 23, 2017 at 01:39:08PM +0200, Nélio Laranjeiro wrote:
> > > On Mon, Aug 21, 2017 at 10:47:01AM +0300, Sagi Grimberg wrote:
> > >
> > > Acked-by: Nelio Laranjeiro 
> > >
> > While a compiler barrier may do on platforms with strong ordering, I'm
> > wondering if the rte_smp_wmb() macro may be needed here to give
> > compiler barrier or actual memory barrier depending on platform?
> 
> Thanks for the catch!
> 
> However, the description of rte_smp_wmb() does not seem to fit our case here.
> We don't try to sync between different lcores, but rather between the device
> and a single lcore.
> 
> Maybe rte_io_wmb() fits better?
> 
Yep. Looks about right.

/Bruce


Re: [dpdk-dev] [PATCH 1/7] member: implement main API

2017-08-24 Thread Ferruh Yigit
On 8/22/2017 11:02 AM, Luca Boccassi wrote:
> On Mon, 2017-08-21 at 17:19 -0700, Yipeng Wang wrote:
>> Membership library is an extension and generalization of a traditional
>> filter (for example Bloom Filter) structure. In general, the Membership
>> library is a data structure that provides a "set-summary" and responds
>> to set-membership queries of whether a certain element belongs to a
>> set(s). A membership test for an element will return the set this
>> element belongs to, or not-found if the element was never inserted into
>> the set-summary.
>>
>> The results of the membership test are not 100% accurate. A certain
>> false positive or false negative probability could exist. However,
>> compared to a "full-blown" complete list of elements, a "set-summary"
>> is memory efficient and fast on lookup.
>>
>> This patch adds the main API definition.
>>
>> Signed-off-by: Yipeng Wang 
>> ---
>>  lib/Makefile |   2 +
>>  lib/librte_eal/common/eal_common_log.c   |   1 +
>>  lib/librte_eal/common/include/rte_log.h  |   1 +
>>  lib/librte_member/Makefile   |  48 +++
>>  lib/librte_member/rte_member.c   | 357 +
>>  lib/librte_member/rte_member.h   | 518
>> +++
>>  lib/librte_member/rte_member_version.map |  15 +
>>  7 files changed, 942 insertions(+)
>>  create mode 100644 lib/librte_member/Makefile
>>  create mode 100644 lib/librte_member/rte_member.c
>>  create mode 100644 lib/librte_member/rte_member.h
>>  create mode 100644 lib/librte_member/rte_member_version.map
>>
> 
>> diff --git a/lib/librte_member/Makefile b/lib/librte_member/Makefile
>> new file mode 100644
>> index 000..997c825
>> --- /dev/null
>> +++ b/lib/librte_member/Makefile
>> @@ -0,0 +1,48 @@
>> +#   BSD LICENSE
>> +#
>> +#   Copyright(c) 2017 Intel Corporation. All rights reserved.
>> +#   All rights reserved.
>> +#
>> +#   Redistribution and use in source and binary forms, with or
>> without
>> +#   modification, are permitted provided that the following
>> conditions
>> +#   are met:
>> +#
>> +# * Redistributions of source code must retain the above
>> copyright
>> +#   notice, this list of conditions and the following
>> disclaimer.
>> +# * Redistributions in binary form must reproduce the above
>> copyright
>> +#   notice, this list of conditions and the following disclaimer
>> in
>> +#   the documentation and/or other materials provided with the
>> +#   distribution.
>> +# * Neither the name of Intel Corporation nor the names of its
>> +#   contributors may be used to endorse or promote products
>> derived
>> +#   from this software without specific prior written
>> permission.
>> +#
>> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
>> CONTRIBUTORS
>> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
>> NOT
>> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
>> FITNESS FOR
>> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
>> COPYRIGHT
>> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
>> INCIDENTAL,
>> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
>> USE,
>> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
>> ON ANY
>> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
>> TORT
>> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
>> THE USE
>> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
>> DAMAGE.
>> +
>> +include $(RTE_SDK)/mk/rte.vars.mk
>> +
>> +# library name
>> +LIB = librte_member.a
>> +
>> +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
>> +
> 
> This breaks reproducibility as the output directory will be included
> before the source directory, causing a race - please do something like:
> 
> CFLAGS := -I$(SRCDIR) $(CFLAGS)
> CFLAGS += $(WERROR_FLAGS) -O3

Can we remove "-I$(SRCDIR)" completely by first installing headers and
later compiling objects, all using $(RTE_OUT) only?

Do you think this can work?

> 
>> +EXPORT_MAP := rte_member_version.map
>> +
>> +LIBABIVER := 1
>> +
>> +# all source are stored in SRCS-y
>> +SRCS-$(CONFIG_RTE_LIBRTE_MEMBER) +=  rte_member.c
>> +# install includes
>> +SYMLINK-$(CONFIG_RTE_LIBRTE_MEMBER)-include := rte_member.h
>> +
>> +include $(RTE_SDK)/mk/rte.lib.mk
> 



Re: [dpdk-dev] [PATCH 1/7] member: implement main API

2017-08-24 Thread Luca Boccassi
On Thu, 2017-08-24 at 10:35 +0100, Ferruh Yigit wrote:
> On 8/22/2017 11:02 AM, Luca Boccassi wrote:
> > On Mon, 2017-08-21 at 17:19 -0700, Yipeng Wang wrote:
> > > Membership library is an extension and generalization of a
> > > traditional
> > > filter (for example Bloom Filter) structure. In general, the
> > > Membership
> > > library is a data structure that provides a "set-summary" and
> > > responds
> > > to set-membership queries of whether a certain element belongs to
> > > a
> > > set(s). A membership test for an element will return the set this
> > > element
> > > belongs to or not-found if the element is never inserted into the
> > > set-summary.
> > > 
> > > The results of the membership test are not 100% accurate. A certain
> > > false positive or false negative probability could exist. However,
> > > compared to a "full-blown" complete list of elements, a "set-summary"
> > > is memory efficient and fast on lookup.
> > > 
> > > This patch adds the main API definition.
> > > 
> > > Signed-off-by: Yipeng Wang 
> > > ---
> > >  lib/Makefile |   2 +
> > >  lib/librte_eal/common/eal_common_log.c   |   1 +
> > >  lib/librte_eal/common/include/rte_log.h  |   1 +
> > >  lib/librte_member/Makefile   |  48 +++
> > >  lib/librte_member/rte_member.c   | 357
> > > +
> > >  lib/librte_member/rte_member.h   | 518
> > > +++
> > >  lib/librte_member/rte_member_version.map |  15 +
> > >  7 files changed, 942 insertions(+)
> > >  create mode 100644 lib/librte_member/Makefile
> > >  create mode 100644 lib/librte_member/rte_member.c
> > >  create mode 100644 lib/librte_member/rte_member.h
> > >  create mode 100644 lib/librte_member/rte_member_version.map
> > > 
> > > diff --git a/lib/librte_member/Makefile
> > > b/lib/librte_member/Makefile
> > > new file mode 100644
> > > index 000..997c825
> > > --- /dev/null
> > > +++ b/lib/librte_member/Makefile
> > > @@ -0,0 +1,48 @@
> > > +#   BSD LICENSE
> > > +#
> > > +#   Copyright(c) 2017 Intel Corporation. All rights reserved.
> > > +#   All rights reserved.
> > > +#
> > > +#   Redistribution and use in source and binary forms, with or
> > > without
> > > +#   modification, are permitted provided that the following
> > > conditions
> > > +#   are met:
> > > +#
> > > +# * Redistributions of source code must retain the above
> > > copyright
> > > +#   notice, this list of conditions and the following
> > > disclaimer.
> > > +# * Redistributions in binary form must reproduce the above
> > > copyright
> > > +#   notice, this list of conditions and the following
> > > disclaimer
> > > in
> > > +#   the documentation and/or other materials provided with
> > > the
> > > +#   distribution.
> > > +# * Neither the name of Intel Corporation nor the names of
> > > its
> > > +#   contributors may be used to endorse or promote products
> > > derived
> > > +#   from this software without specific prior written
> > > permission.
> > > +#
> > > +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
> > > CONTRIBUTORS
> > > +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING,
> > > BUT
> > > NOT
> > > +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
> > > FITNESS FOR
> > > +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> > > COPYRIGHT
> > > +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> > > INCIDENTAL,
> > > +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
> > > NOT
> > > +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
> > > LOSS OF
> > > USE,
> > > +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
> > > AND
> > > ON ANY
> > > +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
> > > OR
> > > TORT
> > > +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
> > > OF
> > > THE USE
> > > +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
> > > DAMAGE.
> > > +
> > > +include $(RTE_SDK)/mk/rte.vars.mk
> > > +
> > > +# library name
> > > +LIB = librte_member.a
> > > +
> > > +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
> > > +
> > 
> > This breaks reproducibility as the output directory will be
> > included
> > before the source directory, causing a race - please do something
> > like:
> > 
> > CFLAGS := -I$(SRCDIR) $(CFLAGS)
> > CFLAGS += $(WERROR_FLAGS) -O3
> 
> Can we remove "-I$(SRCDIR)" completely by first installing headers
> and
> later compiling objects, all using $(RTE_OUT) only?
> 
> Do you think this can work?

I'm not sure, it might be - but given Bruce's effort to port to Meson, I'm
not sure it's worth spending a lot of time on big refactoring of the
existing build system.

> > > +EXPORT_MAP := rte_member_version.map
> > > +
> > > +LIBABIVER := 1
> > > +
> > > +# all source are stored in SRCS-y
> > > +SRCS-$(CONFIG_RTE_LIBRTE_MEMBER) +=  rte_member.c
> > > +# install includes
> > > +SYMLINK

[dpdk-dev] [PATCH 0/5] net/i40e: implement dynamic mapping of flow types to pctypes

2017-08-24 Thread Kirill Rybalchenko
Implement dynamic mapping of software flow types to hardware pctypes.
This allows mapping new flow types to pctypes without changing the
API of the driver.

Kirill Rybalchenko (5):
  app/testpmd: add new commands to manipulate the pctype mapping
  net/i40e: add function to initialize pctype mapping table
  net/i40e: add new functions to manipulate the pctype mapping table
  net/i40e: change list of parameters for functions mapping flow type to
pctype and back
  net/i40e: implement dynamic mapping of sw flow types to hw pctypes

 app/test-pmd/cmdline.c| 263 
 drivers/net/i40e/i40e_ethdev.c| 313 +-
 drivers/net/i40e/i40e_ethdev.h|  16 +-
 drivers/net/i40e/i40e_ethdev_vf.c |  36 ++---
 drivers/net/i40e/i40e_fdir.c  |  10 +-
 drivers/net/i40e/i40e_flow.c  |   2 +-
 drivers/net/i40e/i40e_rxtx.c  |  57 +++
 drivers/net/i40e/i40e_rxtx.h  |   1 +
 drivers/net/i40e/rte_pmd_i40e.c   |  98 
 drivers/net/i40e/rte_pmd_i40e.h   |  60 
 10 files changed, 583 insertions(+), 273 deletions(-)

-- 
2.5.5



[dpdk-dev] [PATCH 2/5] net/i40e: add function to initialize pctype mapping table

2017-08-24 Thread Kirill Rybalchenko
Add a new function, i40e_set_default_pctype_table(), to
initialize the flow type to pctype dynamic mapping table
with default values.

Signed-off-by: Kirill Rybalchenko 
---
 drivers/net/i40e/i40e_rxtx.c | 57 
 drivers/net/i40e/i40e_rxtx.h |  1 +
 2 files changed, 58 insertions(+)

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index d42c23c..5e75567 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -2941,6 +2941,63 @@ i40e_set_default_ptype_table(struct rte_eth_dev *dev)
ad->ptype_tbl[i] = i40e_get_default_pkt_type(i);
 }
 
+void __attribute__((cold))
+i40e_set_default_pctype_table(struct rte_eth_dev *dev)
+{
+   struct i40e_adapter *ad = 
I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+   struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   int i;
+
+   for (i = 0; i < I40E_FLOW_TYPE_MAX; i++)
+   ad->pcypes_tbl[i] = 0ULL;
+   ad->flow_types_msk = 0ULL;
+   ad->pctypes_msk = 0ULL;
+
+   ad->pcypes_tbl[RTE_ETH_FLOW_FRAG_IPV4] =
+   (1ULL << I40E_FILTER_PCTYPE_FRAG_IPV4);
+   ad->pcypes_tbl[RTE_ETH_FLOW_NONFRAG_IPV4_UDP] =
+   (1ULL << I40E_FILTER_PCTYPE_NONF_IPV4_UDP);
+   ad->pcypes_tbl[RTE_ETH_FLOW_NONFRAG_IPV4_TCP] =
+   (1ULL << I40E_FILTER_PCTYPE_NONF_IPV4_TCP);
+   ad->pcypes_tbl[RTE_ETH_FLOW_NONFRAG_IPV4_SCTP] =
+   (1ULL << I40E_FILTER_PCTYPE_NONF_IPV4_SCTP);
+   ad->pcypes_tbl[RTE_ETH_FLOW_NONFRAG_IPV4_OTHER] =
+   (1ULL << I40E_FILTER_PCTYPE_NONF_IPV4_OTHER);
+   ad->pcypes_tbl[RTE_ETH_FLOW_FRAG_IPV6] =
+   (1ULL << I40E_FILTER_PCTYPE_FRAG_IPV6);
+   ad->pcypes_tbl[RTE_ETH_FLOW_NONFRAG_IPV6_UDP] =
+   (1ULL << I40E_FILTER_PCTYPE_NONF_IPV6_UDP);
+   ad->pcypes_tbl[RTE_ETH_FLOW_NONFRAG_IPV6_TCP] =
+   (1ULL << I40E_FILTER_PCTYPE_NONF_IPV6_TCP);
+   ad->pcypes_tbl[RTE_ETH_FLOW_NONFRAG_IPV6_SCTP] =
+   (1ULL << I40E_FILTER_PCTYPE_NONF_IPV6_SCTP);
+   ad->pcypes_tbl[RTE_ETH_FLOW_NONFRAG_IPV6_OTHER] =
+   (1ULL << I40E_FILTER_PCTYPE_NONF_IPV6_OTHER);
+   ad->pcypes_tbl[RTE_ETH_FLOW_L2_PAYLOAD] =
+   (1ULL << I40E_FILTER_PCTYPE_L2_PAYLOAD);
+
+   if (hw->mac.type == I40E_MAC_X722) {
+   ad->pcypes_tbl[RTE_ETH_FLOW_NONFRAG_IPV4_UDP] |=
+   (1ULL << I40E_FILTER_PCTYPE_NONF_UNICAST_IPV4_UDP);
+   ad->pcypes_tbl[RTE_ETH_FLOW_NONFRAG_IPV4_UDP] |=
+   (1ULL << I40E_FILTER_PCTYPE_NONF_MULTICAST_IPV4_UDP);
+   ad->pcypes_tbl[RTE_ETH_FLOW_NONFRAG_IPV4_TCP] |=
+   (1ULL << I40E_FILTER_PCTYPE_NONF_IPV4_TCP_SYN_NO_ACK);
+   ad->pcypes_tbl[RTE_ETH_FLOW_NONFRAG_IPV6_UDP] |=
+   (1ULL << I40E_FILTER_PCTYPE_NONF_UNICAST_IPV6_UDP);
+   ad->pcypes_tbl[RTE_ETH_FLOW_NONFRAG_IPV6_UDP] |=
+   (1ULL << I40E_FILTER_PCTYPE_NONF_MULTICAST_IPV6_UDP);
+   ad->pcypes_tbl[RTE_ETH_FLOW_NONFRAG_IPV6_TCP] |=
+   (1ULL << I40E_FILTER_PCTYPE_NONF_IPV6_TCP_SYN_NO_ACK);
+   }
+
+   for (i = 0; i < I40E_FLOW_TYPE_MAX; i++) {
+   if (ad->pcypes_tbl[i])
+   ad->flow_types_msk |= (1ULL << i);
+   ad->pctypes_msk |= ad->pcypes_tbl[i];
+   }
+}
+
 /* Stubs needed for linkage when CONFIG_RTE_I40E_INC_VECTOR is set to 'n' */
 int __attribute__((weak))
 i40e_rx_vec_dev_conf_condition_check(struct rte_eth_dev __rte_unused *dev)
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index 20084d6..2a58ced 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -255,6 +255,7 @@ void i40e_set_tx_function_flag(struct rte_eth_dev *dev,
   struct i40e_tx_queue *txq);
 void i40e_set_tx_function(struct rte_eth_dev *dev);
 void i40e_set_default_ptype_table(struct rte_eth_dev *dev);
+void i40e_set_default_pctype_table(struct rte_eth_dev *dev);
 
 /* For each value it means, datasheet of hardware can tell more details
  *
-- 
2.5.5



[dpdk-dev] [PATCH 3/5] net/i40e: add new functions to manipulate with pctype mapping table

2017-08-24 Thread Kirill Rybalchenko
Add new functions which allow modifying, retrieving, or resetting
to defaults the contents of the flow type to pctype dynamic mapping table.

Signed-off-by: Kirill Rybalchenko 
---
 drivers/net/i40e/rte_pmd_i40e.c | 98 +
 drivers/net/i40e/rte_pmd_i40e.h | 60 +
 2 files changed, 158 insertions(+)

diff --git a/drivers/net/i40e/rte_pmd_i40e.c b/drivers/net/i40e/rte_pmd_i40e.c
index 950a0d6..c91efd5 100644
--- a/drivers/net/i40e/rte_pmd_i40e.c
+++ b/drivers/net/i40e/rte_pmd_i40e.c
@@ -2117,3 +2117,101 @@ int rte_pmd_i40e_ptype_mapping_replace(uint8_t port,
 
return 0;
 }
+
+int rte_pmd_i40e_flow_type_mapping_reset(uint8_t port)
+{
+   struct rte_eth_dev *dev;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
+
+   dev = &rte_eth_devices[port];
+
+   if (!is_i40e_supported(dev))
+   return -ENOTSUP;
+
+   i40e_set_default_pctype_table(dev);
+
+   return 0;
+}
+
+int rte_pmd_i40e_flow_type_mapping_get(
+   uint8_t port,
+   struct rte_pmd_i40e_flow_type_mapping *mapping_items,
+   uint16_t size,
+   uint16_t *count,
+   uint8_t valid_only)
+{
+   struct rte_eth_dev *dev;
+   struct i40e_adapter *ad;
+   int n = 0;
+   uint16_t i;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
+
+   dev = &rte_eth_devices[port];
+
+   if (!is_i40e_supported(dev))
+   return -ENOTSUP;
+
+   ad = I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+
+   for (i = 0; i < I40E_FLOW_TYPE_MAX; i++) {
+   if (n >= size)
+   break;
+   if (valid_only && ad->pcypes_tbl[i] == 0ULL)
+   continue;
+   mapping_items[n].flow_type = i;
+   mapping_items[n].pctype = ad->pcypes_tbl[i];
+   n++;
+   }
+
+   *count = n;
+   return 0;
+}
+
+int
+rte_pmd_i40e_flow_type_mapping_update(
+   uint8_t port,
+   struct rte_pmd_i40e_flow_type_mapping *mapping_items,
+   uint16_t count,
+   uint8_t exclusive)
+{
+   struct rte_eth_dev *dev;
+   struct i40e_adapter *ad;
+   int i;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
+
+   dev = &rte_eth_devices[port];
+
+   if (!is_i40e_supported(dev))
+   return -ENOTSUP;
+
+   if (count > I40E_FLOW_TYPE_MAX)
+   return -EINVAL;
+
+   for (i = 0; i < count; i++)
+   if (mapping_items[i].flow_type >= I40E_FLOW_TYPE_MAX)
+   return -EINVAL;
+
+   ad = I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+
+   if (exclusive) {
+   for (i = 0; i < I40E_FLOW_TYPE_MAX; i++)
+   ad->pcypes_tbl[i] = 0ULL;
+   ad->flow_types_msk = 0ULL;
+   }
+
+   for (i = 0; i < count; i++) {
+   ad->pcypes_tbl[mapping_items[i].flow_type] =
+   mapping_items[i].pctype;
+   if (mapping_items[i].pctype)
+   ad->flow_types_msk |=
+   (1ULL << mapping_items[i].flow_type);
+   else
+   ad->flow_types_msk &=
+   ~(1ULL << mapping_items[i].flow_type);
+   }
+
+   for (i = 0, ad->pctypes_msk = 0ULL; i < I40E_FLOW_TYPE_MAX; i++)
+   ad->pctypes_msk |= ad->pcypes_tbl[i];
+
+   return 0;
+}
diff --git a/drivers/net/i40e/rte_pmd_i40e.h b/drivers/net/i40e/rte_pmd_i40e.h
index 356fa89..d993c89 100644
--- a/drivers/net/i40e/rte_pmd_i40e.h
+++ b/drivers/net/i40e/rte_pmd_i40e.h
@@ -637,4 +637,64 @@ int rte_pmd_i40e_ptype_mapping_replace(uint8_t port,
   uint8_t mask,
   uint32_t pkt_type);
 
+struct rte_pmd_i40e_flow_type_mapping {
+   uint8_t flow_type; /**< software defined flow type */
+   uint64_t pctype; /**< hardware defined pctype */
+};
+
+/**
+ * Update the software defined flow type to hardware defined pctype
+ * mapping table.
+ *
+ * @param port
+ *port identifier of the device.
+ * @param mapping_items
+ *the base address of the mapping items array.
+ * @param count
+ *number of mapping items.
+ * @param exclusive
+ *the flag indicating the mapping update method.
+ *-(0) only overwrite the referred flow type mappings,
+ * keep the other flow type mappings unchanged.
+ *-(!0) overwrite the referred flow type mappings,
+ * clear all the other flow type mappings.
+ */
+int rte_pmd_i40e_flow_type_mapping_update(
+   uint8_t port,
+   struct rte_pmd_i40e_flow_type_mapping *mapping_items,
+   uint16_t count,
+   uint8_t exclusive);
+
+/**
+ * Get software defined flow type to hardware defined pctype
+ * mapping items.
+ *
+ * @param port
+ *port identifier of the device.
+ * @pa

[dpdk-dev] [PATCH 1/5] app/testpmd: add new commands to manipulate with pctype mapping

2017-08-24 Thread Kirill Rybalchenko
Add new commands for manipulating the dynamic flow type to pctype
mapping table in the i40e PMD.
The commands allow printing the table, modifying it, and resetting it
to the default values.

Signed-off-by: Kirill Rybalchenko 
---
 app/test-pmd/cmdline.c | 263 +
 1 file changed, 263 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index cd8c358..6bf3a9d 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -13795,6 +13795,265 @@ cmdline_parse_inst_t cmd_clear_vf_stats = {
},
 };
 
+/* pctype mapping reset */
+
+/* Common result structure for pctype mapping reset */
+struct cmd_pctype_mapping_reset_result {
+   cmdline_fixed_string_t pctype;
+   cmdline_fixed_string_t mapping;
+   cmdline_fixed_string_t reset;
+   uint8_t port_id;
+};
+
+/* Common CLI fields for pctype mapping reset */
+cmdline_parse_token_string_t cmd_pctype_mapping_reset_pctype =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_pctype_mapping_reset_result,
+pctype, "pctype");
+cmdline_parse_token_string_t cmd_pctype_mapping_reset_mapping =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_pctype_mapping_reset_result,
+mapping, "mapping");
+cmdline_parse_token_string_t cmd_pctype_mapping_reset_reset =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_pctype_mapping_reset_result,
+reset, "reset");
+cmdline_parse_token_num_t cmd_pctype_mapping_reset_port_id =
+   TOKEN_NUM_INITIALIZER
+   (struct cmd_pctype_mapping_reset_result,
+port_id, UINT8);
+
+static void
+cmd_pctype_mapping_reset_parsed(
+   void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_pctype_mapping_reset_result *res = parsed_result;
+   int ret = -ENOTSUP;
+
+   if (port_id_is_invalid(res->port_id, ENABLED_WARN))
+   return;
+
+#ifdef RTE_LIBRTE_I40E_PMD
+   ret = rte_pmd_i40e_flow_type_mapping_reset(res->port_id);
+#endif
+
+   switch (ret) {
+   case 0:
+   break;
+   case -ENODEV:
+   printf("invalid port_id %d\n", res->port_id);
+   break;
+   case -ENOTSUP:
+   printf("function not implemented\n");
+   break;
+   default:
+   printf("programming error: (%s)\n", strerror(-ret));
+   }
+}
+
+cmdline_parse_inst_t cmd_pctype_mapping_reset = {
+   .f = cmd_pctype_mapping_reset_parsed,
+   .data = NULL,
+   .help_str = "pctype mapping reset ",
+   .tokens = {
+   (void *)&cmd_pctype_mapping_reset_pctype,
+   (void *)&cmd_pctype_mapping_reset_mapping,
+   (void *)&cmd_pctype_mapping_reset_reset,
+   (void *)&cmd_pctype_mapping_reset_port_id,
+   NULL,
+   },
+};
+
+/* pctype mapping get */
+
+/* Common result structure for pctype mapping get */
+struct cmd_pctype_mapping_get_result {
+   cmdline_fixed_string_t pctype;
+   cmdline_fixed_string_t mapping;
+   cmdline_fixed_string_t get;
+   uint8_t port_id;
+   uint8_t valid_only;
+};
+
+/* Common CLI fields for pctype mapping get */
+cmdline_parse_token_string_t cmd_pctype_mapping_get_pctype =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_pctype_mapping_get_result,
+pctype, "pctype");
+cmdline_parse_token_string_t cmd_pctype_mapping_get_mapping =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_pctype_mapping_get_result,
+mapping, "mapping");
+cmdline_parse_token_string_t cmd_pctype_mapping_get_get =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_pctype_mapping_get_result,
+get, "get");
+cmdline_parse_token_num_t cmd_pctype_mapping_get_port_id =
+   TOKEN_NUM_INITIALIZER
+   (struct cmd_pctype_mapping_get_result,
+port_id, UINT8);
+cmdline_parse_token_num_t cmd_pctype_mapping_get_valid_only =
+   TOKEN_NUM_INITIALIZER
+   (struct cmd_pctype_mapping_get_result,
+valid_only, UINT8);
+
+static void
+cmd_pctype_mapping_get_parsed(
+   void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_pctype_mapping_get_result *res = parsed_result;
+   int ret = -ENOTSUP;
+#ifdef RTE_LIBRTE_I40E_PMD
+   int max_ptype_num = 64;
+   struct rte_pmd_i40e_flow_type_mapping mapping[max_ptype_num];
+   uint16_t count;
+   int i;
+#endif
+
+   if (port_id_is_invalid(res->port_id, ENABLED_WARN))
+   return;
+
+#ifdef RTE_LIBRTE_I40E_PMD
+   ret = rte_pmd_i40e_flow_type_mapping_get(res->port_id,
+   mapping,
+   max_ptype_num,
+   &count,
+   re

[dpdk-dev] [PATCH 4/5] net/i40e: change list of parameters for functions mapping flow type to pctype and back

2017-08-24 Thread Kirill Rybalchenko
Functions i40e_pctype_to_flowtype and i40e_flowtype_to_pctype are
changed to work with the dynamic pctype to flow type mapping table.
This table is located in the private data area of the adapter
structure, so these functions need one extra parameter pointing to
the adapter data structure.

Signed-off-by: Kirill Rybalchenko 
---
 drivers/net/i40e/i40e_fdir.c | 10 +-
 drivers/net/i40e/i40e_flow.c |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/i40e/i40e_fdir.c b/drivers/net/i40e/i40e_fdir.c
index 8013add..7515941 100644
--- a/drivers/net/i40e/i40e_fdir.c
+++ b/drivers/net/i40e/i40e_fdir.c
@@ -665,10 +665,10 @@ i40e_fdir_configure(struct rte_eth_dev *dev)
pctype = (enum i40e_filter_pctype)i40e_read_rx_ctl(
hw, I40E_GLQF_FD_PCTYPES(
(int)i40e_flowtype_to_pctype(
-   conf->flex_mask[i].flow_type)));
+   conf->flex_mask[i].flow_type, pf->adapter)));
} else
pctype = i40e_flowtype_to_pctype(
-   conf->flex_mask[i].flow_type);
+   conf->flex_mask[i].flow_type, pf->adapter);
 
i40e_set_flex_mask_on_pctype(pf, pctype, &conf->flex_mask[i]);
}
@@ -1143,9 +1143,9 @@ i40e_add_del_fdir_filter(struct rte_eth_dev *dev,
pctype = (enum i40e_filter_pctype)i40e_read_rx_ctl(
hw, I40E_GLQF_FD_PCTYPES(
(int)i40e_flowtype_to_pctype(
-   filter->input.flow_type)));
+   filter->input.flow_type, pf->adapter)));
} else
-   pctype = i40e_flowtype_to_pctype(filter->input.flow_type);
+   pctype = i40e_flowtype_to_pctype(filter->input.flow_type,
+   pf->adapter);
 
ret = i40e_fdir_filter_programming(pf, pctype, filter, add);
if (ret < 0) {
@@ -1400,7 +1400,7 @@ i40e_fdir_info_get_flex_mask(struct i40e_pf *pf,
if (!I40E_VALID_PCTYPE((enum i40e_filter_pctype)i))
continue;
}
-   flow_type = i40e_pctype_to_flowtype((enum i40e_filter_pctype)i);
+   flow_type = i40e_pctype_to_flowtype(i, pf->adapter);
for (j = 0; j < I40E_FDIR_MAX_FLEXWORD_NUM; j++) {
if (mask->word_mask & I40E_FLEX_WORD_MASK(j)) {
ptr->mask[j * sizeof(uint16_t)] = UINT8_MAX;
diff --git a/drivers/net/i40e/i40e_flow.c b/drivers/net/i40e/i40e_flow.c
index b92719a..4db771c 100644
--- a/drivers/net/i40e/i40e_flow.c
+++ b/drivers/net/i40e/i40e_flow.c
@@ -2776,7 +2776,7 @@ i40e_flow_parse_fdir_pattern(struct rte_eth_dev *dev,
}
}
 
-   pctype = i40e_flowtype_to_pctype(flow_type);
+   pctype = i40e_flowtype_to_pctype(flow_type, pf->adapter);
if (pctype == 0 || pctype > I40E_FILTER_PCTYPE_L2_PAYLOAD) {
rte_flow_error_set(error, EINVAL,
   RTE_FLOW_ERROR_TYPE_ITEM, item,
-- 
2.5.5



[dpdk-dev] [PATCH 5/5] net/i40e: implement dynamic mapping of sw flow types to hw pctypes

2017-08-24 Thread Kirill Rybalchenko
Implement dynamic mapping of software flow types to hardware pctypes.
This allows new flow types and pctypes to be added for DDP without
changing the API of the driver. The mapping table is located in the
private data area of the particular network adapter and can be
modified individually with a set of appropriate functions.

Signed-off-by: Kirill Rybalchenko 
---
 drivers/net/i40e/i40e_ethdev.c| 313 +-
 drivers/net/i40e/i40e_ethdev.h|  16 +-
 drivers/net/i40e/i40e_ethdev_vf.c |  36 ++---
 3 files changed, 98 insertions(+), 267 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 4a2e3f2..d80eca9 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -1062,6 +1062,7 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
return 0;
}
i40e_set_default_ptype_table(dev);
+   i40e_set_default_pctype_table(dev);
pci_dev = RTE_ETH_DEV_TO_PCI(dev);
intr_handle = &pci_dev->intr_handle;
 
@@ -2965,7 +2966,7 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
dev_info->hash_key_size = (I40E_PFQF_HKEY_MAX_INDEX + 1) *
sizeof(uint32_t);
dev_info->reta_size = pf->hash_lut_size;
-   dev_info->flow_type_rss_offloads = I40E_RSS_OFFLOAD_ALL;
+   dev_info->flow_type_rss_offloads = pf->adapter->flow_types_msk;
 
dev_info->default_rxconf = (struct rte_eth_rxconf) {
.rx_thresh = {
@@ -6556,104 +6557,36 @@ i40e_vsi_delete_mac(struct i40e_vsi *vsi, struct 
ether_addr *addr)
 
 /* Configure hash enable flags for RSS */
 uint64_t
-i40e_config_hena(uint64_t flags, enum i40e_mac_type type)
+i40e_config_hena(uint64_t flags, struct i40e_adapter *adapter)
 {
uint64_t hena = 0;
+   int i;
 
if (!flags)
return hena;
 
-   if (flags & ETH_RSS_FRAG_IPV4)
-   hena |= 1ULL << I40E_FILTER_PCTYPE_FRAG_IPV4;
-   if (flags & ETH_RSS_NONFRAG_IPV4_TCP) {
-   if (type == I40E_MAC_X722) {
-   hena |= (1ULL << I40E_FILTER_PCTYPE_NONF_IPV4_TCP) |
-(1ULL << I40E_FILTER_PCTYPE_NONF_IPV4_TCP_SYN_NO_ACK);
-   } else
-   hena |= 1ULL << I40E_FILTER_PCTYPE_NONF_IPV4_TCP;
-   }
-   if (flags & ETH_RSS_NONFRAG_IPV4_UDP) {
-   if (type == I40E_MAC_X722) {
-   hena |= (1ULL << I40E_FILTER_PCTYPE_NONF_IPV4_UDP) |
-(1ULL << I40E_FILTER_PCTYPE_NONF_UNICAST_IPV4_UDP) |
-(1ULL << I40E_FILTER_PCTYPE_NONF_MULTICAST_IPV4_UDP);
-   } else
-   hena |= 1ULL << I40E_FILTER_PCTYPE_NONF_IPV4_UDP;
-   }
-   if (flags & ETH_RSS_NONFRAG_IPV4_SCTP)
-   hena |= 1ULL << I40E_FILTER_PCTYPE_NONF_IPV4_SCTP;
-   if (flags & ETH_RSS_NONFRAG_IPV4_OTHER)
-   hena |= 1ULL << I40E_FILTER_PCTYPE_NONF_IPV4_OTHER;
-   if (flags & ETH_RSS_FRAG_IPV6)
-   hena |= 1ULL << I40E_FILTER_PCTYPE_FRAG_IPV6;
-   if (flags & ETH_RSS_NONFRAG_IPV6_TCP) {
-   if (type == I40E_MAC_X722) {
-   hena |= (1ULL << I40E_FILTER_PCTYPE_NONF_IPV6_TCP) |
-(1ULL << I40E_FILTER_PCTYPE_NONF_IPV6_TCP_SYN_NO_ACK);
-   } else
-   hena |= 1ULL << I40E_FILTER_PCTYPE_NONF_IPV6_TCP;
-   }
-   if (flags & ETH_RSS_NONFRAG_IPV6_UDP) {
-   if (type == I40E_MAC_X722) {
-   hena |= (1ULL << I40E_FILTER_PCTYPE_NONF_IPV6_UDP) |
-(1ULL << I40E_FILTER_PCTYPE_NONF_UNICAST_IPV6_UDP) |
-(1ULL << I40E_FILTER_PCTYPE_NONF_MULTICAST_IPV6_UDP);
-   } else
-   hena |= 1ULL << I40E_FILTER_PCTYPE_NONF_IPV6_UDP;
+   for (i = 0; i < I40E_FLOW_TYPE_MAX; i++) {
+   if (flags & (1ULL << i))
+   hena |= adapter->pcypes_tbl[i];
}
-   if (flags & ETH_RSS_NONFRAG_IPV6_SCTP)
-   hena |= 1ULL << I40E_FILTER_PCTYPE_NONF_IPV6_SCTP;
-   if (flags & ETH_RSS_NONFRAG_IPV6_OTHER)
-   hena |= 1ULL << I40E_FILTER_PCTYPE_NONF_IPV6_OTHER;
-   if (flags & ETH_RSS_L2_PAYLOAD)
-   hena |= 1ULL << I40E_FILTER_PCTYPE_L2_PAYLOAD;
 
return hena;
 }
 
 /* Parse the hash enable flags */
 uint64_t
-i40e_parse_hena(uint64_t flags)
+i40e_parse_hena(uint64_t flags, struct i40e_adapter *adapter)
 {
uint64_t rss_hf = 0;
 
if (!flags)
return rss_hf;
-   if (flags & (1ULL << I40E_FILTER_PCTYPE_FRAG_IPV4))
-   rss_hf |= ETH_RSS_FRAG_IPV4;
-   if (flags & (1ULL << I40E_FILTER_PCTYPE_NONF_IPV4_TCP))
-   rss_hf |= ETH_RSS_NONFRAG_IPV4_TCP;
-   if (flags & (1ULL << I40E_FILTER_PCTYPE_NONF_IPV4_TCP_SYN_NO_ACK))
-   rss_hf |= ETH_RSS_NONF

Re: [dpdk-dev] [PATCH 1/7] member: implement main API

2017-08-24 Thread Ferruh Yigit
On 8/24/2017 10:55 AM, Luca Boccassi wrote:
> On Thu, 2017-08-24 at 10:35 +0100, Ferruh Yigit wrote:
>> On 8/22/2017 11:02 AM, Luca Boccassi wrote:
>>> On Mon, 2017-08-21 at 17:19 -0700, Yipeng Wang wrote:
 Membership library is an extension and generalization of a
 traditional
 filter (for example Bloom Filter) structure. In general, the
 Membership
 library is a data structure that provides a "set-summary" and
 responds
 to set-membership queries of whether a certain element belongs to
 a
 set(s). A membership test for an element will return the set this
 element
 belongs to or not-found if the element is never inserted into the
 set-summary.

 The results of a membership test are not 100% accurate. A certain
 false positive or false negative probability can exist. However,
 compared to a "full-blown" complete list of elements, a
 "set-summary" is memory efficient and fast on lookup.

 This patch adds the main API definition.

 Signed-off-by: Yipeng Wang 
 ---
  lib/Makefile |   2 +
  lib/librte_eal/common/eal_common_log.c   |   1 +
  lib/librte_eal/common/include/rte_log.h  |   1 +
  lib/librte_member/Makefile   |  48 +++
  lib/librte_member/rte_member.c   | 357
 +
  lib/librte_member/rte_member.h   | 518
 +++
  lib/librte_member/rte_member_version.map |  15 +
  7 files changed, 942 insertions(+)
  create mode 100644 lib/librte_member/Makefile
  create mode 100644 lib/librte_member/rte_member.c
  create mode 100644 lib/librte_member/rte_member.h
  create mode 100644 lib/librte_member/rte_member_version.map

 diff --git a/lib/librte_member/Makefile
 b/lib/librte_member/Makefile
 new file mode 100644
 index 000..997c825
 --- /dev/null
 +++ b/lib/librte_member/Makefile
 @@ -0,0 +1,48 @@
 +#   BSD LICENSE
 +#
 +#   Copyright(c) 2017 Intel Corporation. All rights reserved.
 +#   All rights reserved.
 +#
 +#   Redistribution and use in source and binary forms, with or
 without
 +#   modification, are permitted provided that the following
 conditions
 +#   are met:
 +#
 +# * Redistributions of source code must retain the above
 copyright
 +#   notice, this list of conditions and the following
 disclaimer.
 +# * Redistributions in binary form must reproduce the above
 copyright
 +#   notice, this list of conditions and the following
 disclaimer
 in
 +#   the documentation and/or other materials provided with
 the
 +#   distribution.
 +# * Neither the name of Intel Corporation nor the names of
 its
 +#   contributors may be used to endorse or promote products
 derived
 +#   from this software without specific prior written
 permission.
 +#
 +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
 CONTRIBUTORS
 +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING,
 BUT
 NOT
 +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
 FITNESS FOR
 +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
 COPYRIGHT
 +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
 INCIDENTAL,
 +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 NOT
 +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 LOSS OF
 USE,
 +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
 AND
 ON ANY
 +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 OR
 TORT
 +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 OF
 THE USE
 +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
 DAMAGE.
 +
 +include $(RTE_SDK)/mk/rte.vars.mk
 +
 +# library name
 +LIB = librte_member.a
 +
 +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
 +
>>>
>>> This breaks reproducibility as the output directory will be
>>> included
>>> before the source directory, causing a race - please do something
>>> like:
>>>
>>> CFLAGS := -I$(SRCDIR) $(CFLAGS)
>>> CFLAGS += $(WERROR_FLAGS) -O3
>>
>> Can we remove "-I$(SRCDIR)" completely by first installing headers
>> and
>> later compiling objects, all using $(RTE_OUT) only?
>>
>> Do you think can this work?
> 
> I'm not sure, it might - but given Bruce's effort to port to Meson I'm
> not sure it's worth spending a lot of time doing big refactoring of the
> existing build system

Not big refactoring; the following seems to work for me, if you would
like to test:

diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk
index 13115d146..643da47da 100644
--- a/mk/rte.lib.mk
+++ b/mk/rte.lib.mk
@@ -59,14 +59,19 @@ endif


 _BUILD = $(LIB)
-_INSTALL = $(INSTALL-FILES-y) $(SYM

[dpdk-dev] [PATCH 1/4] test-crypto-perf: add nb-desc parameter

2017-08-24 Thread Anatoly Burakov
This parameter makes the number of cryptodev descriptors adjustable;
it defaults to the previously hardcoded value of 2048.

Signed-off-by: Burakov, Anatoly 
---
 app/test-crypto-perf/cperf_options.h |  2 ++
 app/test-crypto-perf/cperf_options_parsing.c | 21 +
 app/test-crypto-perf/main.c  |  2 +-
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/app/test-crypto-perf/cperf_options.h 
b/app/test-crypto-perf/cperf_options.h
index 10cd2d8..edd6b79 100644
--- a/app/test-crypto-perf/cperf_options.h
+++ b/app/test-crypto-perf/cperf_options.h
@@ -12,6 +12,7 @@
 #define CPERF_BURST_SIZE   ("burst-sz")
 #define CPERF_BUFFER_SIZE  ("buffer-sz")
 #define CPERF_SEGMENTS_NB  ("segments-nb")
+#define CPERF_DESC_NB  ("desc-nb")
 
 #define CPERF_DEVTYPE  ("devtype")
 #define CPERF_OPTYPE   ("optype")
@@ -68,6 +69,7 @@ struct cperf_options {
uint32_t total_ops;
uint32_t segments_nb;
uint32_t test_buffer_size;
+   uint32_t nb_descriptors;
 
uint32_t sessionless:1;
uint32_t out_of_place:1;
diff --git a/app/test-crypto-perf/cperf_options_parsing.c 
b/app/test-crypto-perf/cperf_options_parsing.c
index 085aa8f..f4097d9 100644
--- a/app/test-crypto-perf/cperf_options_parsing.c
+++ b/app/test-crypto-perf/cperf_options_parsing.c
@@ -340,6 +340,24 @@ parse_segments_nb(struct cperf_options *opts, const char 
*arg)
 }
 
 static int
+parse_desc_nb(struct cperf_options *opts, const char *arg)
+{
+   int ret = parse_uint32_t(&opts->nb_descriptors, arg);
+
+   if (ret) {
+   RTE_LOG(ERR, USER1, "failed to parse descriptors number\n");
+   return -1;
+   }
+
+   if (opts->nb_descriptors == 0) {
+   RTE_LOG(ERR, USER1, "invalid descriptors number specified\n");
+   return -1;
+   }
+
+   return 0;
+}
+
+static int
 parse_device_type(struct cperf_options *opts, const char *arg)
 {
if (strlen(arg) > (sizeof(opts->device_type) - 1))
@@ -641,6 +659,7 @@ static struct option lgopts[] = {
{ CPERF_BURST_SIZE, required_argument, 0, 0 },
{ CPERF_BUFFER_SIZE, required_argument, 0, 0 },
{ CPERF_SEGMENTS_NB, required_argument, 0, 0 },
+   { CPERF_DESC_NB, required_argument, 0, 0 },
 
{ CPERF_DEVTYPE, required_argument, 0, 0 },
{ CPERF_OPTYPE, required_argument, 0, 0 },
@@ -684,6 +703,7 @@ cperf_options_default(struct cperf_options *opts)
 
opts->pool_sz = 8192;
opts->total_ops = 1000;
+   opts->nb_descriptors = 2048;
 
opts->buffer_size_list[0] = 64;
opts->buffer_size_count = 1;
@@ -740,6 +760,7 @@ cperf_opts_parse_long(int opt_idx, struct cperf_options 
*opts)
{ CPERF_BURST_SIZE, parse_burst_sz },
{ CPERF_BUFFER_SIZE,parse_buffer_sz },
{ CPERF_SEGMENTS_NB,parse_segments_nb },
+   { CPERF_DESC_NB,parse_desc_nb },
{ CPERF_DEVTYPE,parse_device_type },
{ CPERF_OPTYPE, parse_op_type },
{ CPERF_SESSIONLESS,parse_sessionless },
diff --git a/app/test-crypto-perf/main.c b/app/test-crypto-perf/main.c
index 99f5d3e..7e6ca8e 100644
--- a/app/test-crypto-perf/main.c
+++ b/app/test-crypto-perf/main.c
@@ -123,7 +123,7 @@ cperf_initialize_cryptodev(struct cperf_options *opts, 
uint8_t *enabled_cdevs,
};
 
struct rte_cryptodev_qp_conf qp_conf = {
-   .nb_descriptors = 2048
+   .nb_descriptors = opts->nb_descriptors
};
 
 
-- 
2.7.4



[dpdk-dev] [PATCH 3/4] test-crypto-perf: add new PMD benchmarking mode

2017-08-24 Thread Anatoly Burakov
This patch adds a new benchmarking mode, which is intended for
microbenchmarking individual parts of the cryptodev framework,
specifically crypto ops alloc-build-free, cryptodev PMD enqueue
and cryptodev PMD dequeue.

It works by first benchmarking crypto operation alloc-build-free
loop (no enqueues/dequeues happening), and then benchmarking
enqueue and dequeue separately, by first completely filling up the
TX queue, and then completely draining the RX queue.

Results are shown as cycle counts per alloc/build/free, PMD enqueue
and PMD dequeue.

One new test mode is added: "pmd-cyclecount"
  (called with --ptest=pmd-cyclecount)

A new command-line argument is also added:
  --pmd-cyclecount-delay-ms: this is a pmd-cyclecount-specific parameter
  that controls the delay between enqueue and dequeue. This is
  useful for benchmarking hardware acceleration, as hardware may
  not be able to keep up with enqueued packets. This parameter
  can be increased if there are large amounts of dequeue
  retries.

Signed-off-by: Burakov, Anatoly 
---
 app/test-crypto-perf/Makefile|   1 +
 app/test-crypto-perf/cperf_options.h |   9 +-
 app/test-crypto-perf/cperf_options_parsing.c |  33 ++
 app/test-crypto-perf/cperf_test_pmd_cyclecount.c | 707 +++
 app/test-crypto-perf/cperf_test_pmd_cyclecount.h |  61 ++
 app/test-crypto-perf/main.c  |   9 +-
 6 files changed, 818 insertions(+), 2 deletions(-)
 create mode 100644 app/test-crypto-perf/cperf_test_pmd_cyclecount.c
 create mode 100644 app/test-crypto-perf/cperf_test_pmd_cyclecount.h

diff --git a/app/test-crypto-perf/Makefile b/app/test-crypto-perf/Makefile
index e4a989f..821e8e5 100644
--- a/app/test-crypto-perf/Makefile
+++ b/app/test-crypto-perf/Makefile
@@ -42,6 +42,7 @@ SRCS-y += cperf_options_parsing.c
 SRCS-y += cperf_test_vectors.c
 SRCS-y += cperf_test_throughput.c
 SRCS-y += cperf_test_latency.c
+SRCS-y += cperf_test_pmd_cyclecount.c
 SRCS-y += cperf_test_verify.c
 SRCS-y += cperf_test_vector_parsing.c
 
diff --git a/app/test-crypto-perf/cperf_options.h 
b/app/test-crypto-perf/cperf_options.h
index edd6b79..b82d78b 100644
--- a/app/test-crypto-perf/cperf_options.h
+++ b/app/test-crypto-perf/cperf_options.h
@@ -41,12 +41,17 @@
 
 #define CPERF_CSV  ("csv-friendly")
 
+/* benchmark-specific options */
+#define CPERF_PMD_CYCLECOUNT_DELAY_MS \
+   ("pmd-cyclecount-delay-ms")
+
 #define MAX_LIST 32
 
 enum cperf_perf_test_type {
CPERF_TEST_TYPE_THROUGHPUT,
CPERF_TEST_TYPE_LATENCY,
-   CPERF_TEST_TYPE_VERIFY
+   CPERF_TEST_TYPE_VERIFY,
+   CPERF_TEST_TYPE_PMD_CYCLECOUNT
 };
 
 
@@ -115,6 +120,8 @@ struct cperf_options {
uint32_t min_burst_size;
uint32_t inc_burst_size;
 
+   /* pmd-cyclecount specific options */
+   uint32_t pmdcc_delay;
 };
 
 void
diff --git a/app/test-crypto-perf/cperf_options_parsing.c 
b/app/test-crypto-perf/cperf_options_parsing.c
index f4097d9..42a920f 100644
--- a/app/test-crypto-perf/cperf_options_parsing.c
+++ b/app/test-crypto-perf/cperf_options_parsing.c
@@ -76,6 +76,10 @@ parse_cperf_test_type(struct cperf_options *opts, const char 
*arg)
{
cperf_test_type_strs[CPERF_TEST_TYPE_LATENCY],
CPERF_TEST_TYPE_LATENCY
+   },
+   {
+   cperf_test_type_strs[CPERF_TEST_TYPE_PMD_CYCLECOUNT],
+   CPERF_TEST_TYPE_PMD_CYCLECOUNT
}
};
 
@@ -641,6 +645,20 @@ parse_csv_friendly(struct cperf_options *opts, const char 
*arg __rte_unused)
return 0;
 }
 
+static int
+parse_pmd_cyclecount_delay_ms(struct cperf_options *opts,
+   const char *arg)
+{
+   int ret = parse_uint32_t(&opts->pmdcc_delay, arg);
+
+   if (ret) {
+   RTE_LOG(ERR, USER1, "failed to parse pmd-cyclecount delay\n");
+   return -1;
+   }
+
+   return 0;
+}
+
 typedef int (*option_parser_t)(struct cperf_options *opts,
const char *arg);
 
@@ -693,6 +711,8 @@ static struct option lgopts[] = {
 
{ CPERF_CSV, no_argument, 0, 0},
 
+   { CPERF_PMD_CYCLECOUNT_DELAY_MS, required_argument, 0, 0 },
+
{ NULL, 0, 0, 0 }
 };
 
@@ -747,6 +767,8 @@ cperf_options_default(struct cperf_options *opts)
opts->aead_aad_sz = 0;
 
opts->digest_sz = 12;
+
+   opts->pmdcc_delay = 0;
 }
 
 static int
@@ -782,6 +804,7 @@ cperf_opts_parse_long(int opt_idx, struct cperf_options 
*opts)
{ CPERF_AEAD_AAD_SZ,parse_aead_aad_sz },
{ CPERF_DIGEST_SZ,  parse_digest_sz },
{ CPERF_CSV,parse_csv_friendly},
+   { CPERF_PMD_CYCLECOUNT_DELAY_MS, parse_pmd_cyclecount_delay_ms},
};
unsigned int i;
 
@@ -925,6 +948,14 @@ cperf_options_check(struct cperf_options *options)
return -EINVAL;
}
 
+ 

[dpdk-dev] [PATCH 2/4] doc: document new nb-desc parameter for test-crypto-perf app

2017-08-24 Thread Anatoly Burakov
Signed-off-by: Burakov, Anatoly 
---
 doc/guides/tools/cryptoperf.rst | 4 
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/tools/cryptoperf.rst b/doc/guides/tools/cryptoperf.rst
index 457f817..985848b 100644
--- a/doc/guides/tools/cryptoperf.rst
+++ b/doc/guides/tools/cryptoperf.rst
@@ -325,6 +325,10 @@ The following are the appication command-line options:
 
 Set the size of digest.
 
+* ``--desc-nb ``
+
+Set default number of descriptors in cryptodev.
+
 * ``--csv-friendly``
 
 Enable test result output CSV friendly rather than human friendly.
-- 
2.7.4



[dpdk-dev] [PATCH 0/4] New crypto acceleration benchmark mode

2017-08-24 Thread Anatoly Burakov
From: "Burakov, Anatoly" 

This patchset adds a new "PMD cyclecount" test mode to the
test-crypto-perf application. This mode is intended to measure the
cost of hardware acceleration (in terms of cycle count) more
accurately than the throughput test.

The general idea is the following:
- Measure build-alloc-free cycle separately
- Alloc and build ops
- Measure completely filling up the TX ring
- Wait until ops are processed
- Measure completely draining the RX ring
- Free all allocated ops

In order to make measurements more accurate, the enqueue/dequeue is
still done in bursts of specified size, but all of the bursts are now
part of a "superburst" of size equal to number of descriptors
configured for the device. So, if the number of descriptors configured
was 2048 (the default), then 2048 ops will be enqueued and dequeued,
in bursts of size specified by test command line.
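
As a hedged illustration, the superburst loop described above can be sketched standalone. Here stub_burst is a hypothetical stand-in for rte_cryptodev_enqueue_burst()/rte_cryptodev_dequeue_burst() (which report how many ops were actually accepted); the point of the sketch is the loop structure, not the stub:

```c
#include <stdint.h>

/* Hypothetical stand-in for rte_cryptodev_enqueue_burst() /
 * rte_cryptodev_dequeue_burst(): returns how many ops were accepted.
 * In this sketch it always accepts the full burst. */
static uint16_t stub_burst(uint16_t n)
{
	return n;
}

/* One superburst phase: move desc_nb ops through the ring in bursts of
 * at most burst_sz; returns the total number of ops processed. */
static uint32_t superburst(uint32_t desc_nb, uint16_t burst_sz)
{
	uint32_t done = 0;

	while (done < desc_nb) {
		uint32_t left = desc_nb - done;
		uint16_t n = left < burst_sz ? (uint16_t)left : burst_sz;

		/* enqueue phase shown; the dequeue phase is symmetric */
		done += stub_burst(n);
	}
	return done;
}
```

With the default of 2048 descriptors and a burst size of 32, one superburst performs 64 bursts; the optional --pmd-cyclecount-delay-ms sleep would sit between the enqueue and dequeue phases.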

The following command-line switch will run the test:
  --ptest=pmd-cyclecount

In addition to the new mode, two new command-line switches are added:
- --desc-nb - configure number of cryptodev descriptors. This value was
 previously hardcoded to 2048, but is now configurable and set
 to 2048 by default (so existing behavior is unchanged).
- --pmd-cyclecount-delay-ms - pmd-cyclecount-specific parameter that
 configures the delay (in milliseconds) between TX and RX
 superbursts, to allow hardware to process ops. Set to 0 by
 default, and it is expected that each user will tune it for
 every device. This has no effect on other benchmark modes.

PMD cyclecount mode can be used to benchmark software cryptodev drivers
as well, but the results will be far less accurate for smaller burst
sizes.

Anatoly Burakov (4):
  test-crypto-perf: add nb-desc parameter
  doc: document new nb-desc parameter for test-crypto-perf app
  test-crypto-perf: add new PMD benchmarking mode
  doc: document new pmd-cyclecount benchmarking mode in test-crypto-perf

 app/test-crypto-perf/Makefile|   1 +
 app/test-crypto-perf/cperf_options.h |  11 +-
 app/test-crypto-perf/cperf_options_parsing.c |  54 ++
 app/test-crypto-perf/cperf_test_pmd_cyclecount.c | 707 +++
 app/test-crypto-perf/cperf_test_pmd_cyclecount.h |  61 ++
 app/test-crypto-perf/main.c  |  11 +-
 doc/guides/rel_notes/release_17_11.rst   |   6 +
 doc/guides/tools/cryptoperf.rst  |  14 +-
 8 files changed, 861 insertions(+), 4 deletions(-)
 create mode 100644 app/test-crypto-perf/cperf_test_pmd_cyclecount.c
 create mode 100644 app/test-crypto-perf/cperf_test_pmd_cyclecount.h

-- 
2.7.4



[dpdk-dev] [PATCH 4/4] doc: document new pmd-cyclecount benchmarking mode in test-crypto-perf

2017-08-24 Thread Anatoly Burakov
Also, document the new pmd-cyclecount-specific flag.

Signed-off-by: Burakov, Anatoly 
---
 doc/guides/rel_notes/release_17_11.rst |  6 ++
 doc/guides/tools/cryptoperf.rst| 10 +-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_17_11.rst 
b/doc/guides/rel_notes/release_17_11.rst
index 170f4f9..860858c 100644
--- a/doc/guides/rel_notes/release_17_11.rst
+++ b/doc/guides/rel_notes/release_17_11.rst
@@ -41,6 +41,12 @@ New Features
  Also, make sure to start the actual text at the margin.
  =
 
+* **Add new benchmarking mode to dpdk-crypto-perf-test application.**
+
+  Added new "PMD cyclecount" benchmark mode to dpdk-crypto-perf-test 
application
+  that displays a more detailed breakdown of CPU cycles used by hardware
+  acceleration.
+
 
 Resolved Issues
 ---
diff --git a/doc/guides/tools/cryptoperf.rst b/doc/guides/tools/cryptoperf.rst
index 985848b..482e1cf 100644
--- a/doc/guides/tools/cryptoperf.rst
+++ b/doc/guides/tools/cryptoperf.rst
@@ -50,7 +50,8 @@ offload are still consumed by the test tool and included in 
the cycle-count.
 These cycles are consumed by retries and inefficient API calls enqueuing and
 dequeuing smaller bursts than specified by the cmdline parameter. This results
 in a larger cycle-count measurement and should not be interpreted as an offload
-cost measurement.
+cost measurement. Using "pmd-cyclecount" mode will give a better idea of
+actual costs of hardware acceleration.
 
 On hardware devices the throughput measurement is not necessarily the maximum
 possible for the device, e.g. it may be necessary to use multiple cores to keep
@@ -134,6 +135,7 @@ The following are the appication command-line options:
throughput
latency
verify
+   pmd-cyclecount
 
 * ``--silent``
 
@@ -329,6 +331,12 @@ The following are the appication command-line options:
 
 Set default number of descriptors in cryptodev.
 
+* ``--pmd-cyclecount-delay-ms ``
+
+Add a delay (in milliseconds) between enqueue and dequeue in
+pmd-cyclecount benchmarking mode (useful when benchmarking
+hardware acceleration).
+
 * ``--csv-friendly``
 
 Enable test result output CSV friendly rather than human friendly.
-- 
2.7.4



[dpdk-dev] [PATCH] app/testpmd:add bond type description

2017-08-24 Thread Rongqiang XIE
In function cmd_show_bonding_config_parsed() a number is used to represent
the bond type; to make the output more descriptive, add a bond type
description, otherwise the numeric type may be confusing.
Also, the primary port is only used in the active-backup and TLB modes,
so showing the primary port info only when the mode is active backup or
TLB is more appropriate.

Signed-off-by: Rongqiang XIE 
---
 app/test-pmd/cmdline.c | 29 +++--
 drivers/net/bonding/rte_eth_bond.h | 15 +
 drivers/net/bonding/rte_eth_bond_api.c | 39 ++
 3 files changed, 72 insertions(+), 11 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index cd8c358..c386a63 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -4593,6 +4593,7 @@ static void cmd_show_bonding_config_parsed(void 
*parsed_result,
 {
struct cmd_show_bonding_config_result *res = parsed_result;
int bonding_mode, agg_mode;
+   char bonding_str[BONDING_MODE_STRING_LEN];
uint8_t slaves[RTE_MAX_ETHPORTS];
int num_slaves, num_active_slaves;
int primary_id;
@@ -4600,13 +4601,17 @@ static void cmd_show_bonding_config_parsed(void 
*parsed_result,
portid_t port_id = res->port_id;
 
/* Display the bonding mode.*/
-   bonding_mode = rte_eth_bond_mode_get(port_id);
-   if (bonding_mode < 0) {
-   printf("\tFailed to get bonding mode for port = %d\n", port_id);
+   if (!rte_eth_bond_mode_string_get(port_id, bonding_str)) {
+   printf("\tFailed to get bonding mode string for port = %d\n", 
port_id);
return;
} else
-   printf("\tBonding mode: %d\n", bonding_mode);
+   printf("\tBonding mode: %s\n", bonding_str);
 
+   bonding_mode = rte_eth_bond_mode_get(port_id);
+   if (bonding_mode < 0) {
+   printf("\tFailed to get bonding mode for port = %d\n", port_id);
+   return; 
+   }
if (bonding_mode == BONDING_MODE_BALANCE) {
int balance_xmit_policy;
 
@@ -4685,13 +4690,15 @@ static void cmd_show_bonding_config_parsed(void 
*parsed_result,
printf("\tActive Slaves: []\n");
 
}
-
-   primary_id = rte_eth_bond_primary_get(port_id);
-   if (primary_id < 0) {
-   printf("\tFailed to get primary slave for port = %d\n", 
port_id);
-   return;
-   } else
-   printf("\tPrimary: [%d]\n", primary_id);
+   if (bonding_mode == BONDING_MODE_ACTIVE_BACKUP ||
+   bonding_mode == BONDING_MODE_TLB){
+   primary_id = rte_eth_bond_primary_get(port_id);
+   if (primary_id < 0) {
+   printf("\tFailed to get primary slave for port = %d\n", 
port_id);
+   return;
+   } else
+   printf("\tPrimary: [%d]\n", primary_id);
+   }
 
 }
 
diff --git a/drivers/net/bonding/rte_eth_bond.h 
b/drivers/net/bonding/rte_eth_bond.h
index 8efbf07..c25293a 100644
--- a/drivers/net/bonding/rte_eth_bond.h
+++ b/drivers/net/bonding/rte_eth_bond.h
@@ -117,6 +117,9 @@
 #define BALANCE_XMIT_POLICY_LAYER34(2)
 /**< Layer 3+4 (IP Addresses + UDP Ports) transmit load balancing */
 
+/* Max length size for bond mode string */
+#define BONDING_MODE_STRING_LEN  (30)
+
 /**
  * Create a bonded rte_eth_dev device
  *
@@ -189,6 +192,18 @@
 rte_eth_bond_mode_get(uint8_t bonded_port_id);
 
 /**
+ * Get link bonding mode string of bonded device
+ *
+ * @param bonded_port_id   Port ID of bonded device.
+ *
+ * @param mode  mode string
+ * @return
+ * link bonding mode on success, negative value otherwise
+ */
+int
+rte_eth_bond_mode_string_get(uint8_t bonded_port_id, char *mode);
+
+/**
  * Set slave rte_eth_dev as primary slave of bonded device
  *
  * @param bonded_port_id   Port ID of bonded device.
diff --git a/drivers/net/bonding/rte_eth_bond_api.c 
b/drivers/net/bonding/rte_eth_bond_api.c
index de1d9e0..5ba097c 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -510,6 +510,45 @@
 }
 
 int
+rte_eth_bond_mode_string_get(uint8_t bonded_port_id, char *mode)
+{
+   struct bond_dev_private *internals;
+
+   if (valid_bonded_port_id(bonded_port_id) != 0)
+   return -1;
+
+   internals = rte_eth_devices[bonded_port_id].data->dev_private;
+   
+   switch (internals->mode) {
+   case BONDING_MODE_ROUND_ROBIN:
+   memcpy(mode, "round-robin", BONDING_MODE_STRING_LEN);
+   break;
+   case BONDING_MODE_ACTIVE_BACKUP:
+   memcpy(mode, "active-backup", BONDING_MODE_STRING_LEN);
+   break;
+   case BONDING_MODE_BALANCE:
+   memcpy(mode, "link-aggregation", 
BONDING_MODE_STRING_LEN);
+   break;
+   case BOND

[dpdk-dev] Reply: Re: Reply: Re: [PATCH] app/testpmd:add bond type description

2017-08-24 Thread xie . rongqiang
Hi,
   I have made a new patch for this issue because the previous patch was
deleted when version 17.08 was released.
 
   The patch is at http://www.dpdk.org/dev/patchwork/patch/27851/. Thank
you.


Thomas Monjalon  wrote on 2017/08/24 04:22:17:

> From: Thomas Monjalon 
> To: xie.rongqi...@zte.com.cn, Declan Doherty 
> , 
> Cc: dev@dpdk.org, jingjing...@intel.com
> Date: 2017/08/24 04:23
> Subject: Re: [dpdk-dev] Reply: Re: [PATCH] app/testpmd:add bond type 
description
> 
> 16/08/2017 04:31, xie.rongqi...@zte.com.cn:
> > I am sorry to reply so late for some reason.
> > 
> > And i figure out two ways to implement this kind of things inside the 
> > bonding code,
> > 
> > First,if can the function rte_eth_bond_mode_get() return string, so we 
can 
> > print
> 
> No it is better to use integers in API.
> 
> > the bond mode straight, but in this way, we need fix the other c 
source 
> > where call the function. 
> > 
> > Second, we add an interface return bond mode string, in this way, we 
just 
> > call it in function
> 
> Yes a new function to convert integer to string seems better.
> 
> At the end, Declan should approve/decide.
> 
> > cmd_show_bonding_config_parsed().
> > 
> > Finally, which way do you agree more? 
> > 
> > Looking forward to your early reply,Thank your. 
> > 
> > 
> > Thomas Monjalon  wrote on 2017/07/03 02:11:52:
> > 
> > > From: Thomas Monjalon 
> > > To: Declan Doherty , 
> > > Cc: dev@dpdk.org, RongQiang Xie , 
> > > jingjing...@intel.com
> > > Date: 2017/07/03 02:12
> > > Subject: Re: [dpdk-dev] [PATCH] app/testpmd:add bond type description
> > > 
> > > 30/06/2017 17:39, Declan Doherty:
> > > > On 30/06/17 08:56, RongQiang Xie wrote:
> > > > > In function cmd_show_bonding_config_parsed() used number 
represent
> > > > > the bond type,in order more detailed,add bond type description
> > > > > otherwise we may confused about the number type.
> > > > > And also,the primary port just use in mode active backup and 
tlb,
> > > > > so,when the mode is active backup or tlb show the primary port 
info
> > > > > may be more appropriate.
> > > > > 
> > > > > Signed-off-by: RongQiang Xie 
> > > > > ---
> > > > >   app/test-pmd/cmdline.c | 17 +++--
> > > > >   1 file changed, 11 insertions(+), 6 deletions(-)
> > > > > 
> > > > > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> > > > > index ff8ffd2..45845a4 100644
> > > > > --- a/app/test-pmd/cmdline.c
> > > > > +++ b/app/test-pmd/cmdline.c
> > > > > @@ -4390,7 +4390,9 @@ static void cmd_show_bonding_config_parsed
> > > (void *parsed_result,
> > > > > printf("\tFailed to get bonding mode for port = %d\n", 
> > port_id);
> > > > > return;
> > > > >  } else
> > > > > -  printf("\tBonding mode: %d\n", bonding_mode);
> > > > > +  printf("\tBonding mode: %d ", bonding_mode);
> > > > > +   printf("[0:Round Robin, 1:Active Backup, 2:Balance, 
3:Broadcast, 
> > ");
> > > > > +   printf("\n\t\t\t4:802.3AD, 5:Adaptive TLB, 6:Adaptive Load 
> > > Balancing]\n");
> > > > > 
> > > > 
> > > > Good idea, but it would be clearer if we just returned the actual 
mode 
> > 
> > > > string so the user doesn't need to parse it themselves, like 
below.
> > > > 
> > > > -   } else
> > > > -   printf("\tBonding mode: %d ", bonding_mode);
> > > > -   printf("[0:Round Robin, 1:Active Backup, 2:Balance, 
> > 3:Broadcast, ");
> > > > -   printf("\n\t\t\t4:802.3AD, 5:Adaptive TLB, 6:Adaptive Load 

> > > > Balancing]\n");
> > > > +   }
> > > > +
> > > > +   printf("\tBonding mode: %d (", bonding_mode);
> > > > +   switch (bonding_mode) {
> > > > +   case BONDING_MODE_ROUND_ROBIN:
> > > > +   printf("round-robin");
> > > > +   break;
> > > > +   case BONDING_MODE_ACTIVE_BACKUP:
> > > > +   printf("active-backup");
> > > > +   break;
> > > > +   case BONDING_MODE_BALANCE:
> > > > +   printf("link-aggregation");
> > > > +   break;
> > > > +   case BONDING_MODE_BROADCAST:
> > > > +   printf("broadcast");
> > > > +   break;
> > > > +   case BONDING_MODE_8023AD:
> > > > +   printf("link-aggregation-802.3ad");
> > > > +   break;
> > > > +   case BONDING_MODE_TLB:
> > > > +   printf("transmit-load-balancing");
> > > > +   break;
> > > > +   case BONDING_MODE_ALB:
> > > > +   printf("adaptive-load-balancing");
> > > > +   break;
> > > > +   default:
> > > > +   printf("unknown-mode");
> > > > +   }
> > > > +   printf(")\n");
> > > 
> > > I would say no.
> > > Can we think how to implement this kind of things inside the bonding 

> > code?
> > > 
> > > 
> > 
> 
> 
> 
> 
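
Following the suggestion in this thread that "a new function to convert integer to string seems better", here is a minimal sketch of such a helper. The mode numbers and name strings mirror the BONDING_MODE_* values and strings quoted above; returning a pointer to a static string is an alternative design to the caller-supplied buffer and BONDING_MODE_STRING_LEN sizing used in the posted patch:

```c
/* Mode numbers mirror the BONDING_MODE_* constants (0..6). */
enum bond_mode_sketch {
	MODE_ROUND_ROBIN, MODE_ACTIVE_BACKUP, MODE_BALANCE, MODE_BROADCAST,
	MODE_8023AD, MODE_TLB, MODE_ALB
};

/* Convert a bonding mode number to a human-readable name. */
static const char *
bond_mode_name(int mode)
{
	static const char *const names[] = {
		"round-robin", "active-backup", "link-aggregation",
		"broadcast", "link-aggregation-802.3ad",
		"transmit-load-balancing", "adaptive-load-balancing",
	};

	if (mode < MODE_ROUND_ROBIN || mode > MODE_ALB)
		return "unknown-mode";
	return names[mode];
}
```

testpmd could then print both forms in one line, e.g. printf("\tBonding mode: %d (%s)\n", mode, bond_mode_name(mode));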



[dpdk-dev] [RFC v1] examples/flow_filtering: demo of simple rte flow

2017-08-24 Thread Ori Kam
This application shows a simple usage of the
rte_flow API for hardware filtering offload.

In this demo we filter packets of a specific
IP, steering them to a specific target queue,
while sending the rest of the packets to
another queue.

Included in this commit is a simple python
script file for sending custom packets.

Signed-off-by: Ori Kam 
---
 examples/flow_filtering/Makefile|  17 ++
 examples/flow_filtering/flow_blocks.c   | 122 +++
 examples/flow_filtering/main.c  | 233 
 examples/flow_filtering/test-flowy.scapy.py |  28 
 4 files changed, 400 insertions(+)
 create mode 100644 examples/flow_filtering/Makefile
 create mode 100644 examples/flow_filtering/flow_blocks.c
 create mode 100644 examples/flow_filtering/main.c
 create mode 100755 examples/flow_filtering/test-flowy.scapy.py

diff --git a/examples/flow_filtering/Makefile b/examples/flow_filtering/Makefile
new file mode 100644
index 000..6e5295d
--- /dev/null
+++ b/examples/flow_filtering/Makefile
@@ -0,0 +1,17 @@
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, can be overridden by command line or environment
+RTE_TARGET ?= x86_64-native-linuxapp-gcc
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+APP = flow
+
+SRCS-y := main.c
+
+CFLAGS += -g3
+CFLAGS += $(WERROR_FLAGS)
+
+include $(RTE_SDK)/mk/rte.extapp.mk
diff --git a/examples/flow_filtering/flow_blocks.c 
b/examples/flow_filtering/flow_blocks.c
new file mode 100644
index 000..cb8b06d
--- /dev/null
+++ b/examples/flow_filtering/flow_blocks.c
@@ -0,0 +1,122 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#define MAX_PATTERN_NUM 4
+
+
+static struct rte_flow *
+generate_ipv4_rule(uint8_t port_id, uint16_t rx_q,
+   uint32_t src_ip, uint32_t src_mask,
+   uint32_t dest_ip, uint32_t dest_mask)
+{
+   struct rte_flow_attr attr;
+   struct rte_flow_item pattern[MAX_PATTERN_NUM];
+   struct rte_flow_action action[MAX_PATTERN_NUM];
+   struct rte_flow *flow;
+   struct rte_flow_error error;
+
+   memset(pattern, 0, sizeof(pattern));
+   memset(action, 0, sizeof(action));
+
+   /*
+* create the action sequence.
+* one action only,  move packet to queue
+*/
+   struct rte_flow_action_queue queue = { .index = rx_q };
+
+   action[0].type = RTE_FLOW_ACTION_TYPE_QUEUE;
+   action[0].conf = &queue;
+
+   action[1].type = RTE_FLOW_ACTION_TYPE_END;
+
+   /*
+* set the rule attribute.
+* in this case only ingress packets will be checked.
+*/
+   memset(&attr, 0, sizeof(struct rte_flow_attr));
+   attr.ingress = 1;
+
+   /*
+* set the first level of the pattern (eth).
+* since in this example we just want to get the
+* ipv4 we set this level to allow all.
+*/
+   struct rte_flow_item_eth eth_spec;
+   struct rte_flow_item_eth eth_mask;
+   memset(&eth_spec, 0, sizeof(struct rte_flow_item_eth));
+   memset(&eth_mask, 0, sizeof(struct rte_flow_item_eth));
+   eth_spec.type = 0;
+   eth_mask.type = 0;
+   pattern[0].type = RTE_FLOW_ITEM_TYPE_ETH;
+   pattern[0].spec = &eth_spec;
+   pattern[0].mask = &eth_mask;
+
+   /*
+* setting the second level of the pattern (vlan).
+* since in this example we just 
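
The (truncated) pattern-building code above layers items from L2 upward and terminates the array with an END item. A miniature, self-contained sketch of that structure follows; the types and names here are illustrative stand-ins for the rte_flow equivalents (struct rte_flow_item, RTE_FLOW_ITEM_TYPE_*), not the real API:

```c
#include <stddef.h>

/* Illustrative stand-ins for RTE_FLOW_ITEM_TYPE_*. */
enum item_type { ITEM_END, ITEM_ETH, ITEM_IPV4 };

struct flow_item {
	enum item_type type;
	const void *spec; /* values to match */
	const void *mask; /* which bits of spec are significant */
};

#define MAX_PATTERN_NUM 4

/* Build eth (wildcard) -> ipv4 (matched) -> end, as in the example app. */
static size_t
build_ipv4_pattern(struct flow_item pattern[MAX_PATTERN_NUM],
		   const void *ip_spec, const void *ip_mask)
{
	pattern[0] = (struct flow_item){ ITEM_ETH, NULL, NULL };
	pattern[1] = (struct flow_item){ ITEM_IPV4, ip_spec, ip_mask };
	pattern[2] = (struct flow_item){ ITEM_END, NULL, NULL };
	return 3;
}
```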

Re: [dpdk-dev] [dpdk-stable] [PATCH] net/mlx5: fix xstats functions unlock missing

2017-08-24 Thread Ferruh Yigit
On 8/23/2017 4:09 PM, Nélio Laranjeiro wrote:
> On Mon, Aug 14, 2017 at 02:32:24PM +0300, Matan Azrad wrote:
>> The corrupted code didn't unlock the spinlock in xstats
>> get and reset functions error flow.
>>
>> Hence, if these errors happened, the device spinlock was
>> left locked and many mlx5 device functionalities were blocked.
>>
>> The fix unlocks the spinlock in the missed places.
>>
>> Fixes: e62bc9e70608 ("net/mlx5: fix extended statistics")
>> Cc: sta...@dpdk.org
>>
>> Signed-off-by: Matan Azrad 

> Acked-by: Nelio Laranjeiro 

Applied to dpdk-next-net/master, thanks.
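
The invariant restored by the patch above is that every error-flow return must release the lock. It can be sketched standalone; priv_lock()/priv_unlock() here are simplified stand-ins for the mlx5 private spinlock helpers:

```c
/* Simplified stand-in for the mlx5 device spinlock. */
static int locked;
static void priv_lock(void)   { locked = 1; }
static void priv_unlock(void) { locked = 0; }

/* Sketch of the fixed error flow: the lock is released on every path,
 * including the early error return that the original code missed. */
static int
xstats_get_sketch(int fail)
{
	priv_lock();
	if (fail) {
		priv_unlock(); /* the fix: unlock before the error return */
		return -1;
	}
	/* ... read the counters here ... */
	priv_unlock();
	return 0;
}
```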


[dpdk-dev] [PATCH v1] net/mlx5: support upstream rdma-core

2017-08-24 Thread Shachar Beiser
 This removes the dependency on specific Mellanox OFED libraries by
 using the upstream rdma-core library and upstream Linux kernel code.

 Minimal requirements: rdma-core v16 and Linux kernel 4.14.

Signed-off-by: Shachar Beiser 
---
 doc/guides/nics/mlx5.rst |  29 +-
 drivers/net/mlx5/Makefile|  39 +--
 drivers/net/mlx5/mlx5.c  |  93 +++--
 drivers/net/mlx5/mlx5.h  |   4 +-
 drivers/net/mlx5/mlx5.rst| 663 +++
 drivers/net/mlx5/mlx5_ethdev.c   |  10 +-
 drivers/net/mlx5/mlx5_fdir.c | 103 +++---
 drivers/net/mlx5/mlx5_flow.c | 226 ++--
 drivers/net/mlx5/mlx5_mac.c  |  16 +-
 drivers/net/mlx5/mlx5_prm.h  |  41 ++-
 drivers/net/mlx5/mlx5_rxmode.c   |  18 +-
 drivers/net/mlx5/mlx5_rxq.c  | 221 ++--
 drivers/net/mlx5/mlx5_rxtx.c |  17 +-
 drivers/net/mlx5/mlx5_rxtx.h |  35 +-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.c |   5 +-
 drivers/net/mlx5/mlx5_txq.c  |  71 ++--
 drivers/net/mlx5/mlx5_vlan.c |  12 +-
 mk/rte.app.mk|   2 +-
 18 files changed, 1145 insertions(+), 460 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5.rst

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index f4cb18b..a1b3321 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -295,6 +295,7 @@ DPDK and must be installed separately:
 
 - **Kernel modules** (mlnx-ofed-kernel)
 
+  DPDK 17.11 supports the upstream Linux kernel.
   They provide the kernel-side Verbs API and low level device drivers that
   manage actual hardware initialization and resources sharing with user
   space processes.
@@ -376,23 +377,27 @@ Supported NICs
 Quick Start Guide
 -
 
-1. Download latest Mellanox OFED. For more info check the  `prerequisites`_.
+1. Since DPDK 17.11, Mellanox DPDK runs both on top of the upstream Linux
+   kernel and on top of Mellanox OFED.
+   If your Mellanox DPDK version is older than 17.11, or it is newer but
+   you want to run on top of Mellanox OFED:
+a. Download the latest Mellanox OFED. For more info check the
+   `prerequisites`_.
+b. Install the required libraries and kernel modules either by
+   installing only the required set, or by installing the entire
+   Mellanox OFED:
 
+   .. code-block:: console
 
-2. Install the required libraries and kernel modules either by installing
-   only the required set, or by installing the entire Mellanox OFED:
-
-   .. code-block:: console
-
-./mlnxofedinstall
-
-3. Verify the firmware is the correct one:
+   ./mlnxofedinstall
+   If your Mellanox DPDK is newer than 17.11 and runs on top of the
+   upstream Linux kernel:
+a. Install upstream Linux kernel v4.14 or above.
+b. Install Mellanox rdma-core v16 or above.
+2. Verify the firmware is the correct one:
 
.. code-block:: console
 
 ibv_devinfo
 
-4. Verify all ports links are set to Ethernet:
+3. Verify all ports links are set to Ethernet:
 
.. code-block:: console
 
@@ -422,7 +427,7 @@ Quick Start Guide
 mlxconfig -d  set SRIOV_EN=1 NUM_OF_VFS=16
 mlxfwreset -d  reset
 
-5. Restart the driver:
+4. Restart the driver:
 
.. code-block:: console
 
@@ -449,7 +454,7 @@ Quick Start Guide
 
 echo [num_vfs] > /sys/class/infiniband/mlx5_0/device/sriov_numvfs
 
-6. Compile DPDK and you are ready to go. See instructions on
+5. Compile DPDK and you are ready to go. See instructions on
:ref:`Development Kit Build System `
 
 Performance tuning
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 14b739a..2de1c78 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -104,41 +104,20 @@ mlx5_autoconf.h.new: FORCE
 mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
$Q $(RM) -f -- '$@'
$Q sh -- '$<' '$@' \
-   HAVE_VERBS_IBV_EXP_CQ_COMPRESSED_CQE \
-   infiniband/verbs_exp.h \
-   enum IBV_EXP_CQ_COMPRESSED_CQE \
+   HAVE_IBV_DEVICE_VXLAN_SUPPORT \
+   infiniband/verbs.h \
+   enum IBV_DEVICE_VXLAN_SUPPORT \
$(AUTOCONF_OUTPUT)
$Q sh -- '$<' '$@' \
-   HAVE_VERBS_MLX5_ETH_VLAN_INLINE_HEADER_SIZE \
-   infiniband/mlx5_hw.h \
-   enum MLX5_ETH_VLAN_INLINE_HEADER_SIZE \
+   HAVE_IBV_WQ_FLAG_RX_END_PADDING \
+   infiniband/verbs.h \
+   enum IBV_WQ_FLAG_RX_END_PADDING \
$(AUTOCONF_OUTPUT)
$Q sh -- '$<' '$@' \
-   HAVE_VERBS_MLX5_OPCODE_TSO \
-   infiniband/mlx5_hw.h \
-   enum MLX5_OPCODE_TSO \
+   HAVE_IBV_MLX5_MOD_MPW \
+   infiniband/mlx5dv.h \
+   enum MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED \
$(AUTOCONF_OUTPUT)
-

Re: [dpdk-dev] [PATCH] net/mlx5: extend debug logs verbosity

2017-08-24 Thread Nélio Laranjeiro
On Wed, Aug 23, 2017 at 10:10:58AM +0300, Shahaf Shuler wrote:
> Extend debug logs verbosity by printing the full completion with error
> along with the entire txq in case of error. For the Rx case no logs were
> added since such errors are counted and recovered by the Rx data path.
> 
> Such prints are essential to understand the root cause for the error.
> 
> Signed-off-by: Shahaf Shuler 
> Signed-off-by: Xueming Li 
> ---
> 
> This patch should be applied only after the series:
> http://dpdk.org/dev/patchwork/patch/27367/
> 
> ---
>  drivers/net/mlx5/mlx5_rxtx.h | 20 +---
>  1 file changed, 17 insertions(+), 3 deletions(-)

Acked-by: Nelio Laranjeiro 

-- 
Nélio Laranjeiro
6WIND


Re: [dpdk-dev] [PATCH 1/2] net/mlx5: fix num seg assumption on vPMD

2017-08-24 Thread Nélio Laranjeiro
On Wed, Aug 23, 2017 at 10:33:57AM +0300, Shahaf Shuler wrote:
> vPMD Tx function assumes that after the scatter of the
> multi-segment packets the next packet will be a single segment packet.
> 
> This is not current as the function can return due to lack of resources
> without sending all of the multi-segment mbufs sequence.
> 
> Fixes: 6cb559d67b83 ("net/mlx5: add vectorized Rx/Tx burst for x86")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Shahaf Shuler 
> ---
> 
> This patch should be applied only after the series:
> http://dpdk.org/dev/patchwork/patch/27367/
> 
> ---
>  drivers/net/mlx5/mlx5_rxtx_vec_sse.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.c 
> b/drivers/net/mlx5/mlx5_rxtx_vec_sse.c
> index 8560f745a..30727e6dd 100644
> --- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.c
> +++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.c
> @@ -119,8 +119,7 @@ txq_wr_dseg_v(struct txq *txq, __m128i *dseg,
>  }
>  
>  /**
> - * Count the number of continuous single segment packets. The first packet 
> must
> - * be a single segment packet.
> + * Count the number of continuous single segment packets.
>   *
>   * @param pkts
>   *   Pointer to array of packets.
> @@ -137,7 +136,8 @@ txq_check_multiseg(struct rte_mbuf **pkts, uint16_t 
> pkts_n)
>  
>   if (!pkts_n)
>   return 0;
> - assert(NB_SEGS(pkts[0]) == 1);
> + if (NB_SEGS(pkts[0]) > 1)
> + return 0;
>   /* Count the number of continuous single segment packets. */
>   for (pos = 1; pos < pkts_n; ++pos)
>   if (NB_SEGS(pkts[pos]) > 1)
> @@ -502,6 +502,8 @@ mlx5_tx_burst_vec(void *dpdk_txq, struct rte_mbuf **pkts, 
> uint16_t pkts_n)
>   n = RTE_MIN((uint16_t)(pkts_n - nb_tx), MLX5_VPMD_TX_MAX_BURST);
>   if (!(txq->flags & ETH_TXQ_FLAGS_NOMULTSEGS))
>   n = txq_check_multiseg(&pkts[nb_tx], n);
> + if (!n)
> + break;
>   if (!(txq->flags & ETH_TXQ_FLAGS_NOOFFLOADS))
>   n = txq_calc_offload(txq, &pkts[nb_tx], n, &cs_flags);
>   ret = txq_burst_v(txq, &pkts[nb_tx], n, cs_flags);
> -- 
> 2.12.0
 
Acked-by: Nelio Laranjeiro 

-- 
Nélio Laranjeiro
6WIND
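
The corrected txq_check_multiseg() logic from the patch above can be sketched standalone. Here nb_segs[i] stands in for NB_SEGS(pkts[i]); the fix is that a multi-segment packet at the head of the array now yields 0 instead of tripping an assert:

```c
#include <stdint.h>

/* Count leading continuous single-segment packets; nb_segs[i] stands in
 * for NB_SEGS(pkts[i]) in the real vPMD code. */
static uint16_t
count_single_seg(const uint16_t *nb_segs, uint16_t pkts_n)
{
	uint16_t pos;

	if (!pkts_n)
		return 0;
	if (nb_segs[0] > 1)
		return 0; /* the fix: no assert(NB_SEGS(pkts[0]) == 1) */
	/* Count the number of continuous single segment packets. */
	for (pos = 1; pos < pkts_n; ++pos)
		if (nb_segs[pos] > 1)
			break;
	return pos;
}
```

The caller (mlx5_tx_burst_vec) can then break out of its loop when 0 is returned, as the second hunk of the patch does.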


Re: [dpdk-dev] [PATCH 2/2] net/mlx5: enforce Tx num of segments limitation

2017-08-24 Thread Nélio Laranjeiro
On Wed, Aug 23, 2017 at 10:33:58AM +0300, Shahaf Shuler wrote:
> Mellanox NICs has a limitation on the number of mbuf segments a multi
> segment mbuf can have. The max number depends on the Tx offloads requested.
> 
> The current code does not enforce this limitation, which might cause
> malformed WQEs to be written to the device.

Avoid acronyms in the commit message (at least on first occurrence), not all
people knows what a WQE is and getting such information is not easy.

> This commit adds verification for the number of mbuf segments posted
> to the device. In case of overflow the packet will not be sent.
> Debug prints were added to help application identify the cause for such
> case.
> 
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Shahaf Shuler 
> ---
> 
> This patch should be applied only after the series:
> http://dpdk.org/dev/patchwork/patch/27367/
> 
> ---
>  drivers/net/mlx5/mlx5_defs.h |  3 ++-
>  drivers/net/mlx5/mlx5_prm.h  |  3 +++
>  drivers/net/mlx5/mlx5_rxtx.c | 30 +++---
>  drivers/net/mlx5/mlx5_rxtx_vec_sse.c |  8 
>  drivers/net/mlx5/mlx5_txq.c  | 27 +++
>  5 files changed, 67 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
> index 608072f7e..87244e7db 100644
> --- a/drivers/net/mlx5/mlx5_prm.h
> +++ b/drivers/net/mlx5/mlx5_prm.h
> @@ -154,6 +154,9 @@
>  /* Default mark value used when none is provided. */
>  #define MLX5_FLOW_MARK_DEFAULT 0xff
>  
> +/* Maximum number of DS in WQE. */
> +#define MLX5_MAX_DS (63)
> +

Why the parenthesis?

Thanks,

-- 
Nélio Laranjeiro
6WIND


[dpdk-dev] [PATCH] net/i40e: fix vsi vlan stripping

2017-08-24 Thread Zang MingJie
Function i40e_vsi_config_vlan_stripping doesn't strip the vlan tag for a VF.
This patch fixes the problem.

Signed-off-by: Zang MingJie 
---
 drivers/net/i40e/i40e_ethdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 5f26e24a3..cd48ebbc1 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -5189,7 +5189,7 @@ i40e_vsi_config_vlan_stripping(struct i40e_vsi *vsi, bool 
on)
}
 
if (on)
-   vlan_flags = I40E_AQ_VSI_PVLAN_EMOD_STR_BOTH;
+   vlan_flags = I40E_AQ_VSI_PVLAN_EMOD_STR;
else
vlan_flags = I40E_AQ_VSI_PVLAN_EMOD_NOTHING;
vsi->info.valid_sections =
-- 
2.11.0



[dpdk-dev] [PATCH v1 00/11] Cavium Octeontx external mempool driver

2017-08-24 Thread Santosh Shukla
Patch implements the HW mempool offload driver for packets buffer.
This HW mempool offload driver has dependency on:
- IOVA infrastructure [1].
- Dynamically configure mempool handle (ie.. --mbuf-pool-ops eal arg) [2].
- Infrastructure to support octeontx HW mempool manager [3]. 

Mempool driver based on v17.11-rc0. Series has dependency
on upstream patches [1],[2],[3].

A new pool handle called "octeontx_fpavf" is introduced and is configured
using the eal arg mbuf-pool-ops="octeontx_fpavf". Note that this --eal arg is
still under review.
Or
Can be configured statically like below:
CONFIG_RTE_MBUF_DEFAULT_MEMPOOL_OPS="octeontx_fpavf"

A new mempool driver specific CONFIG_RTE_LIBRTE_OCTEONTX_MEMPOOL config
is introduced.

Refer doc patch [10/11] for build and run steps.

Patch summary:
- [0/11] : add mempool offload HW block definition.
- [1/11] : support for build and log infra, needed for pmd driver.
- [2/11] : probe mempool PCIe vf device
- [3/11] : support pool alloc
- [4/11] : support pool free
- [5/11] : support pool enq and deq
- [6/11] : support pool get count
- [7/11] : support pool get capability
- [8/11] : support pool update range
- [9/11] : translate pool handle to pool index
- [10/11] : doc and release info

Checkpatch status:
- Noticed a false-positive "line over 80 characters" warning on debug lines
- False-positive error on asm_.

Thanks.

[1] http://dpdk.org/ml/archives/dev/2017-August/072871.html
[2] http://dpdk.org/ml/archives/dev/2017-August/072910.html
[3] http://dpdk.org/ml/archives/dev/2017-August/072892.html


Santosh Shukla (11):
  mempool/octeontx: add HW constants
  mempool/octeontx: add build and log infrastructure
  mempool/octeontx: probe fpavf pcie devices
  mempool/octeontx: implement pool alloc
  mempool/octeontx: implement pool free
  mempool/octeontx: implement pool enq and deq
  mempool/octeontx: implement pool get count
  mempool/octeontx: implement pool get capability
  mempool/octeontx: implement pool update range
  mempool/octeontx: translate handle to pool
  doc: add mempool and octeontx mempool device

 MAINTAINERS|   6 +
 config/common_base |   6 +
 doc/guides/index.rst   |   1 +
 doc/guides/mempool/index.rst   |  40 +
 doc/guides/mempool/octeontx.rst| 127 
 drivers/Makefile   |   5 +-
 drivers/mempool/Makefile   |   2 +
 drivers/mempool/octeontx/Makefile  |  74 ++
 drivers/mempool/octeontx/octeontx_fpavf.c  | 835 +
 drivers/mempool/octeontx/octeontx_fpavf.h  | 145 
 drivers/mempool/octeontx/rte_mempool_octeontx.c| 246 ++
 .../octeontx/rte_mempool_octeontx_version.map  |   7 +
 mk/rte.app.mk  |   1 +
 13 files changed, 1493 insertions(+), 2 deletions(-)
 create mode 100644 doc/guides/mempool/index.rst
 create mode 100644 doc/guides/mempool/octeontx.rst
 create mode 100644 drivers/mempool/octeontx/Makefile
 create mode 100644 drivers/mempool/octeontx/octeontx_fpavf.c
 create mode 100644 drivers/mempool/octeontx/octeontx_fpavf.h
 create mode 100644 drivers/mempool/octeontx/rte_mempool_octeontx.c
 create mode 100644 drivers/mempool/octeontx/rte_mempool_octeontx_version.map

-- 
2.11.0



[dpdk-dev] [PATCH v1 01/11] mempool/octeontx: add HW constants

2017-08-24 Thread Santosh Shukla
Add HW constants of the octeontx fpa mempool device.

Signed-off-by: Santosh Shukla 
Signed-off-by: Jerin Jacob 
---
 drivers/mempool/octeontx/octeontx_fpavf.h | 71 +++
 1 file changed, 71 insertions(+)
 create mode 100644 drivers/mempool/octeontx/octeontx_fpavf.h

diff --git a/drivers/mempool/octeontx/octeontx_fpavf.h 
b/drivers/mempool/octeontx/octeontx_fpavf.h
new file mode 100644
index 0..5c4ee04f7
--- /dev/null
+++ b/drivers/mempool/octeontx/octeontx_fpavf.h
@@ -0,0 +1,71 @@
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright (C) 2017 Cavium Inc. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Cavium networks nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __OCTEONTX_FPAVF_H__
+#define __OCTEONTX_FPAVF_H__
+
+/* fpa pool Vendor ID and Device ID */
+#define PCI_VENDOR_ID_CAVIUM   0x177D
+#define PCI_DEVICE_ID_OCTEONTX_FPA_VF  0xA053
+
+#define FPA_VF_MAX                     32
+
+/* FPA VF register offsets */
+#define FPA_VF_INT(x)  (0x200ULL | ((x) << 22))
+#define FPA_VF_INT_W1S(x)  (0x210ULL | ((x) << 22))
+#define FPA_VF_INT_ENA_W1S(x)  (0x220ULL | ((x) << 22))
+#define FPA_VF_INT_ENA_W1C(x)  (0x230ULL | ((x) << 22))
+
+#define FPA_VF_VHPOOL_AVAILABLE(vhpool)    (0x04150 | ((vhpool)&0x0))
+#define FPA_VF_VHPOOL_THRESHOLD(vhpool)    (0x04160 | ((vhpool)&0x0))
+#define FPA_VF_VHPOOL_START_ADDR(vhpool)   (0x04200 | ((vhpool)&0x0))
+#define FPA_VF_VHPOOL_END_ADDR(vhpool)     (0x04210 | ((vhpool)&0x0))
+
+#define FPA_VF_VHAURA_CNT(vaura)           (0x20120 | ((vaura)&0xf)<<18)
+#define FPA_VF_VHAURA_CNT_ADD(vaura)       (0x20128 | ((vaura)&0xf)<<18)
+#define FPA_VF_VHAURA_CNT_LIMIT(vaura)     (0x20130 | ((vaura)&0xf)<<18)
+#define FPA_VF_VHAURA_CNT_THRESHOLD(vaura) (0x20140 | ((vaura)&0xf)<<18)
+#define FPA_VF_VHAURA_OP_ALLOC(vaura)      (0x30000 | ((vaura)&0xf)<<18)
+#define FPA_VF_VHAURA_OP_FREE(vaura)       (0x38000 | ((vaura)&0xf)<<18)
+
+#define FPA_VF_FREE_ADDRS_S(x, y, z)   \
+   ((x) | (((y) & 0x1ff) << 3) | ((((z) & 1)) << 14))
+
+/* FPA VF register offsets from VF_BAR4, size 2 MByte */
+#define FPA_VF_MSIX_VEC_ADDR           0x0
+#define FPA_VF_MSIX_VEC_CTL            0x8
+#define FPA_VF_MSIX_PBA                0xF0000
+
+#define FPA_VF0_APERTURE_SHIFT         22
+#define FPA_AURA_SET_SIZE  16
+
+#endif /* __OCTEONTX_FPAVF_H__ */
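The offset macros above encode the target in the high bits of the MMIO offset: the VF aperture index lands at bit 22 (`FPA_VF0_APERTURE_SHIFT`) and the aura id at bit 18. A minimal sketch recomputing two of the offsets (local `DEMO_` copies of the macros, mirroring the header, with a hypothetical helper for the final address):

```c
#include <stdint.h>

/* Local copies of two offset macros from the header above (sketch). */
#define DEMO_FPA_VF_INT(x)            (0x200ULL | ((uint64_t)(x) << 22))
#define DEMO_FPA_VF_VHAURA_CNT(vaura) (0x20120ULL | (((uint64_t)(vaura) & 0xf) << 18))

/* Absolute MMIO address of a register, given a VF's BAR0 base. */
static inline uint64_t demo_fpavf_reg(uint64_t bar0, uint64_t off)
{
	return bar0 + off;
}
```

The aura id occupying bits 18..21 is what limits each VF to 16 auras (`FPA_AURA_SET_SIZE`).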
-- 
2.11.0



[dpdk-dev] [PATCH v1 02/11] mempool/octeontx: add build and log infrastructure

2017-08-24 Thread Santosh Shukla
Signed-off-by: Santosh Shukla 
Signed-off-by: Jerin Jacob 
---
 config/common_base |  6 +++
 drivers/Makefile   |  5 +-
 drivers/mempool/Makefile   |  2 +
 drivers/mempool/octeontx/Makefile  | 60 ++
 drivers/mempool/octeontx/octeontx_fpavf.c  | 31 +++
 drivers/mempool/octeontx/octeontx_fpavf.h  | 19 +++
 .../octeontx/rte_mempool_octeontx_version.map  |  4 ++
 mk/rte.app.mk  |  1 +
 8 files changed, 126 insertions(+), 2 deletions(-)
 create mode 100644 drivers/mempool/octeontx/Makefile
 create mode 100644 drivers/mempool/octeontx/octeontx_fpavf.c
 create mode 100644 drivers/mempool/octeontx/rte_mempool_octeontx_version.map

diff --git a/config/common_base b/config/common_base
index 5e97a08b6..21ef8b1d2 100644
--- a/config/common_base
+++ b/config/common_base
@@ -540,6 +540,12 @@ CONFIG_RTE_LIBRTE_PMD_OCTEONTX_SSOVF=y
 CONFIG_RTE_LIBRTE_PMD_OCTEONTX_SSOVF_DEBUG=n
 
 #
+# Compile PMD for octeontx fpa mempool device
+#
+CONFIG_RTE_LIBRTE_OCTEONTX_MEMPOOL=y
+CONFIG_RTE_LIBRTE_OCTEONTX_MEMPOOL_DEBUG=n
+
+#
 # Compile librte_ring
 #
 CONFIG_RTE_LIBRTE_RING=y
diff --git a/drivers/Makefile b/drivers/Makefile
index 7fef66d71..c4483faa7 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -32,13 +32,14 @@
 include $(RTE_SDK)/mk/rte.vars.mk
 
 DIRS-y += bus
+DEPDIRS-event := bus
+DIRS-$(CONFIG_RTE_LIBRTE_EVENTDEV) += event
 DIRS-y += mempool
 DEPDIRS-mempool := bus
+DEPDIRS-mempool += event
 DIRS-y += net
 DEPDIRS-net := bus mempool
 DIRS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += crypto
 DEPDIRS-crypto := mempool
-DIRS-$(CONFIG_RTE_LIBRTE_EVENTDEV) += event
-DEPDIRS-event := bus
 
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/mempool/Makefile b/drivers/mempool/Makefile
index efd55f23e..e2a701089 100644
--- a/drivers/mempool/Makefile
+++ b/drivers/mempool/Makefile
@@ -38,5 +38,7 @@ DIRS-$(CONFIG_RTE_DRIVER_MEMPOOL_RING) += ring
 DEPDIRS-ring = $(core-libs)
 DIRS-$(CONFIG_RTE_DRIVER_MEMPOOL_STACK) += stack
 DEPDIRS-stack = $(core-libs)
+DIRS-$(CONFIG_RTE_LIBRTE_OCTEONTX_MEMPOOL) += octeontx
+DEPDIRS-octeontx = $(core-libs) librte_pmd_octeontx_ssovf
 
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/mempool/octeontx/Makefile 
b/drivers/mempool/octeontx/Makefile
new file mode 100644
index 0..55ca1d944
--- /dev/null
+++ b/drivers/mempool/octeontx/Makefile
@@ -0,0 +1,60 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2017 Cavium Inc. All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Cavium Networks nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+#
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_mempool_octeontx.a
+
+ifeq ($(CONFIG_RTE_LIBRTE_OCTEONTX_MEMPOOL_DEBUG),y)
+CFLAGS += -O0 -g
+else
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+endif
+
+EXPORT_MAP := rte_mempool_octeontx_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_OCTEONTX_MEMPOOL) += octeontx_fpavf.c
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_OCTEONTX_MEMPOOL) += lib/librte_mbuf
+
+LDLIBS += -lrte_pmd_octeontx_ssovf
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/mempool/octeontx/octeontx_fpavf.c 
b/drivers/mempool/octeontx/octeontx_fpavf.c
new file mode 100644
index 0..9bb7759c0
--- /dev/null
+++ b/drivers/mempool/octeontx/octeontx_fpavf.c
@@ -0,0 +1,31 @@
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright (C) 

[dpdk-dev] [PATCH v1 03/11] mempool/octeontx: probe fpavf pcie devices

2017-08-24 Thread Santosh Shukla
A mempool device is a set of PCIe VFs.
On OcteonTx HW, each mempool device is enumerated as a
separate SR-IOV VF PCIe device.

In order to expose it as a mempool device:
on PCIe probe, the driver stores the information associated with the
PCIe device, and later, upon an application pool request
(e.g. rte_mempool_create_empty), the infrastructure creates a pool device
from the earlier probed PCIe VF devices.
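The identify step in this patch reads a scratch register whose value packs the domain and VF ids. A hedged sketch of that decode (the `demo_` helper names are illustrative, not from the patch):

```c
#include <stdint.h>
#include <stdbool.h>

#define DEMO_FPA_VF_MAX 32

/* Decode the fields the probe path extracts from the
 * VHAURA_CNT_THRESHOLD(0) register value (sketch). */
static inline uint16_t demo_domain_id(uint64_t val)
{
	return (val >> 8) & 0xffff;
}

static inline uint16_t demo_vf_id(uint64_t val)
{
	return (val >> 24) & 0xffff;
}

/* A probed VF is usable only if its id fits the fixed-size table. */
static inline bool demo_vf_valid(uint64_t val)
{
	return demo_vf_id(val) < DEMO_FPA_VF_MAX;
}
```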

Signed-off-by: Santosh Shukla 
Signed-off-by: Jerin Jacob 
---
 drivers/mempool/octeontx/octeontx_fpavf.c | 151 ++
 drivers/mempool/octeontx/octeontx_fpavf.h |  39 
 2 files changed, 190 insertions(+)

diff --git a/drivers/mempool/octeontx/octeontx_fpavf.c 
b/drivers/mempool/octeontx/octeontx_fpavf.c
index 9bb7759c0..0b4a9357f 100644
--- a/drivers/mempool/octeontx/octeontx_fpavf.c
+++ b/drivers/mempool/octeontx/octeontx_fpavf.c
@@ -29,3 +29,154 @@
  *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "octeontx_fpavf.h"
+
+struct fpavf_res {
+   void*pool_stack_base;
+   void*bar0;
+   uint64_tstack_ln_ptr;
+   uint16_tdomain_id;
+   uint16_tvf_id;  /* gpool_id */
+   uint16_tsz128;  /* Block size in cache lines */
+   boolis_inuse;
+};
+
+struct octeontx_fpadev {
+   rte_spinlock_t lock;
+   uint8_t total_gpool_cnt;
+   struct fpavf_res pool[FPA_VF_MAX];
+};
+
+static struct octeontx_fpadev fpadev;
+
+static void
+octeontx_fpavf_setup(void)
+{
+   uint8_t i;
+   static bool init_once;
+
+   if (!init_once) {
+   rte_spinlock_init(&fpadev.lock);
+   fpadev.total_gpool_cnt = 0;
+
+   for (i = 0; i < FPA_VF_MAX; i++) {
+
+   fpadev.pool[i].domain_id = ~0;
+   fpadev.pool[i].stack_ln_ptr = 0;
+   fpadev.pool[i].sz128 = 0;
+   fpadev.pool[i].bar0 = NULL;
+   fpadev.pool[i].pool_stack_base = NULL;
+   fpadev.pool[i].is_inuse = false;
+   }
+   init_once = 1;
+   }
+}
+
+static int
+octeontx_fpavf_identify(void *bar0)
+{
+   uint64_t val;
+   uint16_t domain_id;
+   uint16_t vf_id;
+   uint64_t stack_ln_ptr;
+
+   val = fpavf_read64((void *)((uintptr_t)bar0 +
+   FPA_VF_VHAURA_CNT_THRESHOLD(0)));
+
+   domain_id = (val >> 8) & 0xffff;
+   vf_id = (val >> 24) & 0xffff;
+
+   stack_ln_ptr = fpavf_read64((void *)((uintptr_t)bar0 +
+   FPA_VF_VHPOOL_THRESHOLD(0)));
+   if (vf_id >= FPA_VF_MAX) {
+   fpavf_log_err("vf_id(%d) greater than max vf (32)\n", vf_id);
+   return -1;
+   }
+
+   if (fpadev.pool[vf_id].is_inuse) {
+   fpavf_log_err("vf_id %d is_inuse\n", vf_id);
+   return -1;
+   }
+
+   fpadev.pool[vf_id].domain_id = domain_id;
+   fpadev.pool[vf_id].vf_id = vf_id;
+   fpadev.pool[vf_id].bar0 = bar0;
+   fpadev.pool[vf_id].stack_ln_ptr = stack_ln_ptr;
+
+   /* SUCCESS */
+   return vf_id;
+}
+
+/* FPAVF pcie device aka mempool probe */
+static int
+fpavf_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
+{
+   uint8_t *idreg;
+   int res;
+   struct fpavf_res *fpa;
+
+   RTE_SET_USED(pci_drv);
+   RTE_SET_USED(fpa);
+
+   /* For secondary processes, the primary has done all the work */
+   if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+   return 0;
+
+   if (pci_dev->mem_resource[0].addr == NULL) {
+   fpavf_log_err("Empty bars %p ", pci_dev->mem_resource[0].addr);
+   return -ENODEV;
+   }
+   idreg = pci_dev->mem_resource[0].addr;
+
+   octeontx_fpavf_setup();
+
+   res = octeontx_fpavf_identify(idreg);
+   if (res < 0)
+   return -1;
+
+   fpa = &fpadev.pool[res];
+   fpadev.total_gpool_cnt++;
+   rte_wmb();
+
+   fpavf_log_dbg("total_fpavfs %d bar0 %p domain %d vf %d stk_ln_ptr 0x%x",
+  fpadev.total_gpool_cnt, fpa->bar0, fpa->domain_id,
+  fpa->vf_id, (unsigned int)fpa->stack_ln_ptr);
+
+   return 0;
+}
+
+static const struct rte_pci_id pci_fpavf_map[] = {
+   {
+   RTE_PCI_DEVICE(PCI_VENDOR_ID_CAVIUM,
+   PCI_DEVICE_ID_OCTEONTX_FPA_VF)
+   },
+   {
+   .vendor_id = 0,
+   },
+};
+
+static struct rte_pci_driver pci_fpavf = {
+   .id_table = pci_fpavf_map,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA,
+   .probe = fpavf_probe,
+};
+
+RTE_PMD_REGISTE

[dpdk-dev] [PATCH v1 04/11] mempool/octeontx: implement pool alloc

2017-08-24 Thread Santosh Shukla
Upon a pool allocation request from the application, the OcteonTx FPA
alloc does the following:
- Gets a free pool from the PCIe fpavf array.
- Uses the mbox to tell the fpapf driver:
  * gpool-id
  * pool block_sz
  * alignment
- Programs the fpavf pool boundary.
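The block size sent over the mbox is expressed in 128-byte cache lines. A sketch of that conversion and of packing it into a pool-config word (macro field layout copied from the patch; the packed word is illustrative, not the complete register):

```c
#include <stdint.h>

/* Object size <-> number of 128-byte cache lines, as in the patch. */
#define FPA_OBJSZ_2_CACHE_LINE(sz)  (((sz) + 127) >> 7)
#define FPA_CACHE_LINE_2_OBJSZ(sz)  ((sz) << 7)

/* A few POOL_CFG fields from the patch. */
#define POOL_ENA            (0x1ULL << 0)
#define POOL_SET_NAT_ALIGN  (0x1ULL << 1)
#define POOL_BUF_SIZE(x)    (((uint64_t)(x) & 0x7ffULL) << 32)

/* Sketch: build the config word for a pool of the given object size. */
static inline uint64_t demo_pool_cfg(unsigned int object_size)
{
	unsigned int sz128 = FPA_OBJSZ_2_CACHE_LINE(object_size);

	return POOL_ENA | POOL_SET_NAT_ALIGN | POOL_BUF_SIZE(sz128);
}
```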

Signed-off-by: Santosh Shukla 
Signed-off-by: Jerin Jacob 
---
 drivers/mempool/octeontx/Makefile   |   1 +
 drivers/mempool/octeontx/octeontx_fpavf.c   | 515 
 drivers/mempool/octeontx/octeontx_fpavf.h   |  10 +
 drivers/mempool/octeontx/rte_mempool_octeontx.c |  88 
 4 files changed, 614 insertions(+)
 create mode 100644 drivers/mempool/octeontx/rte_mempool_octeontx.c

diff --git a/drivers/mempool/octeontx/Makefile 
b/drivers/mempool/octeontx/Makefile
index 55ca1d944..9c3389608 100644
--- a/drivers/mempool/octeontx/Makefile
+++ b/drivers/mempool/octeontx/Makefile
@@ -51,6 +51,7 @@ LIBABIVER := 1
 # all source are stored in SRCS-y
 #
 SRCS-$(CONFIG_RTE_LIBRTE_OCTEONTX_MEMPOOL) += octeontx_fpavf.c
+SRCS-$(CONFIG_RTE_LIBRTE_OCTEONTX_MEMPOOL) += rte_mempool_octeontx.c
 
 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_OCTEONTX_MEMPOOL) += lib/librte_mbuf
diff --git a/drivers/mempool/octeontx/octeontx_fpavf.c 
b/drivers/mempool/octeontx/octeontx_fpavf.c
index 0b4a9357f..85ddf0a03 100644
--- a/drivers/mempool/octeontx/octeontx_fpavf.c
+++ b/drivers/mempool/octeontx/octeontx_fpavf.c
@@ -46,9 +46,75 @@
 #include 
 #include 
 #include 
+#include 
 
+#include 
 #include "octeontx_fpavf.h"
 
+/* FPA Mbox Message */
+#define IDENTIFY   0x0
+
+#define FPA_CONFIGSET  0x1
+#define FPA_CONFIGGET  0x2
+#define FPA_START_COUNT0x3
+#define FPA_STOP_COUNT 0x4
+#define FPA_ATTACHAURA 0x5
+#define FPA_DETACHAURA 0x6
+#define FPA_SETAURALVL 0x7
+#define FPA_GETAURALVL 0x8
+
+#define FPA_COPROC 0x1
+
+/* fpa mbox struct */
+struct octeontx_mbox_fpa_cfg {
+   int aid;
+   uint64_tpool_cfg;
+   uint64_tpool_stack_base;
+   uint64_tpool_stack_end;
+   uint64_taura_cfg;
+};
+
+struct __attribute__((__packed__)) gen_req {
+   uint32_tvalue;
+};
+
+struct __attribute__((__packed__)) idn_req {
+   uint8_t domain_id;
+};
+
+struct __attribute__((__packed__)) gen_resp {
+   uint16_tdomain_id;
+   uint16_tvfid;
+};
+
+struct __attribute__((__packed__)) dcfg_resp {
+   uint8_t sso_count;
+   uint8_t ssow_count;
+   uint8_t fpa_count;
+   uint8_t pko_count;
+   uint8_t tim_count;
+   uint8_t net_port_count;
+   uint8_t virt_port_count;
+};
+
+#define FPA_MAX_POOL   32
+#define FPA_PF_PAGE_SZ 4096
+
+#define FPA_LN_SIZE128
+#define FPA_ROUND_UP(x, size) \
+	((((unsigned long)(x)) + size-1) & (~(size-1)))
+#define FPA_OBJSZ_2_CACHE_LINE(sz) (((sz) + RTE_CACHE_LINE_MASK) >> 7)
+#define FPA_CACHE_LINE_2_OBJSZ(sz) ((sz) << 7)
+
+#define POOL_ENA   (0x1 << 0)
+#define POOL_DIS   (0x0 << 0)
+#define POOL_SET_NAT_ALIGN (0x1 << 1)
+#define POOL_DIS_NAT_ALIGN (0x0 << 1)
+#define POOL_STYPE(x)  (((x) & 0x1) << 2)
+#define POOL_LTYPE(x)  (((x) & 0x3) << 3)
+#define POOL_BUF_OFFSET(x) (((x) & 0x7fffULL) << 16)
+#define POOL_BUF_SIZE(x)   (((x) & 0x7ffULL) << 32)
+
 struct fpavf_res {
void*pool_stack_base;
void*bar0;
@@ -67,6 +133,455 @@ struct octeontx_fpadev {
 
 static struct octeontx_fpadev fpadev;
 
+/* lock is taken by caller */
+static int
+octeontx_fpa_gpool_alloc(unsigned int object_size)
+{
+   struct fpavf_res *res = NULL;
+   uint16_t gpool;
+   unsigned int sz128;
+
+   sz128 = FPA_OBJSZ_2_CACHE_LINE(object_size);
+
+   for (gpool = 0; gpool < FPA_VF_MAX; gpool++) {
+
+   /* Skip VF that is not mapped Or _inuse */
+   if ((fpadev.pool[gpool].bar0 == NULL) ||
+   (fpadev.pool[gpool].is_inuse == true))
+   continue;
+
+   res = &fpadev.pool[gpool];
+
+   RTE_ASSERT(res->domain_id != (uint16_t)~0);
+   RTE_ASSERT(res->vf_id != (uint16_t)~0);
+   RTE_ASSERT(res->stack_ln_ptr != 0);
+
+   if (res->sz128 == 0) {
+   res->sz128 = sz128;
+
+   fpavf_log_dbg("gpool %d blk_sz %d\n", gpool, sz128);
+   return gpool;
+   }
+   }
+
+   return -ENOSPC;
+}
+
+/* lock is taken by caller */
+static __rte_always_inline uintptr_t
+octeontx_fpa_gpool2handle(uint16_t gpool)
+{
+   struct fpavf_res *res = NULL;
+
+   RTE_ASSERT(gpool < FPA_VF_MAX);
+
+   res = &fpadev.pool[gpool];
+   if (unlikely(res == NULL))
+   return 0;
+
+   return (uintptr_t)res->bar0;
+}
+
+/* lock is taken by caller */
+static __rte_always_inline in

[dpdk-dev] [PATCH v1 05/11] mempool/octeontx: implement pool free

2017-08-24 Thread Santosh Shukla
Upon a pool free request from the application, the OcteonTx FPA free
does the following:
- Uses the mbox to reset the fpapf pool setup.
- Frees the fpavf resources.
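The destroy path in this patch drains every buffer out of the hardware pool and threads each one onto an address-ordered list, using the buffer's first word as the link, so a broken series can be detected afterwards. A sketch of that sorted insert (`demo_` name is illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* Insert a freed buffer into an address-ordered singly linked list,
 * storing the link in the buffer's first word -- the same trick the
 * destroy loop in this patch uses while emptying the pool. */
static void demo_sorted_insert(void **head, void *buf)
{
	void **node = buf;
	void **curr = head;

	/* Walk until the next node's address is >= buf (or list ends). */
	while (*curr != NULL && (uintptr_t)buf > (uintptr_t)*curr)
		curr = (void **)*curr;

	node[0] = *curr;
	*curr = node;
}
```

After the drain, consecutive list nodes should differ by exactly the block size; any gap means buffers went missing.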

Signed-off-by: Santosh Shukla 
Signed-off-by: Jerin Jacob 
---
 drivers/mempool/octeontx/octeontx_fpavf.c   | 107 
 drivers/mempool/octeontx/octeontx_fpavf.h   |   2 +
 drivers/mempool/octeontx/rte_mempool_octeontx.c |  12 ++-
 3 files changed, 120 insertions(+), 1 deletion(-)

diff --git a/drivers/mempool/octeontx/octeontx_fpavf.c 
b/drivers/mempool/octeontx/octeontx_fpavf.c
index 85ddf0a03..bcbbefd7d 100644
--- a/drivers/mempool/octeontx/octeontx_fpavf.c
+++ b/drivers/mempool/octeontx/octeontx_fpavf.c
@@ -582,6 +582,113 @@ octeontx_fpa_bufpool_create(unsigned int object_size, 
unsigned int object_count,
return (uintptr_t)NULL;
 }
 
+/*
+ * Destroy a buffer pool.
+ */
+int
+octeontx_fpa_bufpool_destroy(uintptr_t handle, int node_id)
+{
+   void **node, **curr, *head = NULL;
+   uint64_t sz;
+   uint64_t cnt, avail;
+   unsigned int gpool;
+   int ret;
+
+   RTE_SET_USED(node_id);
+
+   /* Wait for all outstanding writes to be committed */
+   rte_smp_wmb();
+
+   if (unlikely(!octeontx_fpa_handle_valid(handle)))
+   return -EINVAL;
+
+   /* get pool */
+   gpool = octeontx_fpa_handle2gpool(handle);
+
+/* Check for no outstanding buffers */
+   cnt = fpavf_read64((void *)((uintptr_t)handle +
+   FPA_VF_VHAURA_CNT(gpool)));
+   if (cnt) {
+   fpavf_log_dbg("buffer exist in pool cnt %ld\n", cnt);
+   return -EBUSY;
+   }
+
+   rte_spinlock_lock(&fpadev.lock);
+
+   avail = fpavf_read64((void *)((uintptr_t)handle +
+   FPA_VF_VHPOOL_AVAILABLE(gpool)));
+
+   /* Prepare to empty the entire POOL */
+   fpavf_write64(avail, (void *)((uintptr_t)handle +
+FPA_VF_VHAURA_CNT_LIMIT(gpool)));
+   fpavf_write64(avail + 1, (void *)((uintptr_t)handle +
+FPA_VF_VHAURA_CNT_THRESHOLD(gpool)));
+
+   /* Empty the pool */
+   /* Invalidate the POOL */
+   octeontx_gpool_free(gpool);
+
+   /* Process all buffers in the pool */
+   while (avail--) {
+
+   /* Yank a buffer from the pool */
+   node = (void *)(uintptr_t)
+   fpavf_read64((void *)
+(handle + FPA_VF_VHAURA_OP_ALLOC(gpool)));
+
+   if (node == NULL) {
+   fpavf_log_err("ERROR: GAURA[%u] missing %lu buffers\n",
+ gpool, avail);
+   break;
+   }
+
+   /* Insert it into an ordered linked list */
+   for (curr = &head; curr[0] != NULL; curr = curr[0]) {
+   if ((uintptr_t)node <= (uintptr_t)curr[0])
+   break;
+   }
+   node[0] = curr[0];
+   curr[0] = node;
+   }
+
+   /* Verify the linked list to be a perfect series */
+   sz = octeontx_fpa_bufpool_block_size(handle) << 7;
+   for (curr = head; curr != NULL && curr[0] != NULL;
+   curr = curr[0]) {
+   if (curr == curr[0] ||
+   (curr != ((void *)((uintptr_t)curr[0] - sz)))) {
+   fpavf_log_err("POOL# %u buf sequence err (%p vs. %p)\n",
+ gpool, curr, curr[0]);
+   }
+   }
+
+   /* Disable pool operation */
+   fpavf_write64(~0ul, (void *)((uintptr_t)handle +
+FPA_VF_VHPOOL_START_ADDR(gpool)));
+   fpavf_write64(~0ul, (void *)((uintptr_t)handle +
+   FPA_VF_VHPOOL_END_ADDR(gpool)));
+
+   (void)octeontx_fpapf_pool_destroy(gpool);
+
+   /* Deactivate the AURA */
+   fpavf_write64(0, (void *)((uintptr_t)handle +
+   FPA_VF_VHAURA_CNT_LIMIT(gpool)));
+   fpavf_write64(0, (void *)((uintptr_t)handle +
+   FPA_VF_VHAURA_CNT_THRESHOLD(gpool)));
+
+   ret = octeontx_fpapf_aura_detach(gpool);
+   if (ret) {
+   fpavf_log_err("Failed to detach gaura %u. error code=%d\n",
+ gpool, ret);
+   }
+
+   /* Free VF */
+   (void)octeontx_fpavf_free(gpool);
+
+   rte_spinlock_unlock(&fpadev.lock);
+   return 0;
+}
+
 static void
 octeontx_fpavf_setup(void)
 {
diff --git a/drivers/mempool/octeontx/octeontx_fpavf.h 
b/drivers/mempool/octeontx/octeontx_fpavf.h
index 3e8a2682f..936276715 100644
--- a/drivers/mempool/octeontx/octeontx_fpavf.h
+++ b/drivers/mempool/octeontx/octeontx_fpavf.h
@@ -135,5 +135,7 @@ octeontx_fpa_bufpool_create(unsigned int object_size, 
unsigned int object_count,
unsigned int buf_offset, char **va_start,
int node);
 int
+octeontx_fpa_bufpool_

[dpdk-dev] [PATCH v1 06/11] mempool/octeontx: implement pool enq and deq

2017-08-24 Thread Santosh Shukla
Signed-off-by: Santosh Shukla 
Signed-off-by: Jerin Jacob 
---
 drivers/mempool/octeontx/Makefile   | 13 +
 drivers/mempool/octeontx/rte_mempool_octeontx.c | 65 -
 2 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/drivers/mempool/octeontx/Makefile 
b/drivers/mempool/octeontx/Makefile
index 9c3389608..0b2043842 100644
--- a/drivers/mempool/octeontx/Makefile
+++ b/drivers/mempool/octeontx/Makefile
@@ -53,6 +53,19 @@ LIBABIVER := 1
 SRCS-$(CONFIG_RTE_LIBRTE_OCTEONTX_MEMPOOL) += octeontx_fpavf.c
 SRCS-$(CONFIG_RTE_LIBRTE_OCTEONTX_MEMPOOL) += rte_mempool_octeontx.c
 
+ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
+CFLAGS_rte_mempool_octeontx.o += -fno-prefetch-loop-arrays
+
+ifeq ($(shell test $(GCC_VERSION) -ge 46 && echo 1), 1)
+CFLAGS_rte_mempool_octeontx.o += -Ofast
+else
+CFLAGS_rte_mempool_octeontx.o += -O3 -ffast-math
+endif
+
+else
+CFLAGS_rte_mempool_octeontx.o += -Ofast
+endif
+
 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_OCTEONTX_MEMPOOL) += lib/librte_mbuf
 
diff --git a/drivers/mempool/octeontx/rte_mempool_octeontx.c 
b/drivers/mempool/octeontx/rte_mempool_octeontx.c
index 6754a78c0..47d16cb8f 100644
--- a/drivers/mempool/octeontx/rte_mempool_octeontx.c
+++ b/drivers/mempool/octeontx/rte_mempool_octeontx.c
@@ -84,12 +84,73 @@ octeontx_fpavf_free(struct rte_mempool *mp)
octeontx_fpa_bufpool_destroy(pool, mp->socket_id);
 }
 
+static __rte_always_inline void *
+octeontx_fpa_bufpool_alloc(uintptr_t handle)
+{
+   return (void *)fpavf_read64((void *)(handle +
+   FPA_VF_VHAURA_OP_ALLOC(0)));
+}
+
+static __rte_always_inline void
+octeontx_fpa_bufpool_free(uintptr_t handle, void *buf)
+{
+   uint64_t free_addr = FPA_VF_FREE_ADDRS_S(FPA_VF_VHAURA_OP_FREE(0),
+0 /* DWB */, 1 /* FABS */);
+
+   fpavf_write64((uintptr_t)buf, (void *)(handle + free_addr));
+}
+
+static int
+octeontx_fpavf_enqueue(struct rte_mempool *mp, void * const *obj_table,
+   unsigned int n)
+{
+   uintptr_t pool;
+   unsigned int index;
+
+   pool = (uintptr_t)mp->pool_id;
+   for (index = 0; index < n; index++, obj_table++)
+   octeontx_fpa_bufpool_free(pool, *obj_table);
+
+   return 0;
+}
+
+static int
+octeontx_fpavf_dequeue(struct rte_mempool *mp, void **obj_table,
+   unsigned int n)
+{
+   unsigned int index;
+   uintptr_t pool;
+   void *obj;
+
+   pool = (uintptr_t)mp->pool_id;
+   for (index = 0; index < n; index++, obj_table++) {
+   obj = octeontx_fpa_bufpool_alloc(pool);
+   if (obj == NULL) {
+   /*
+* Failed to allocate the requested number of objects
+* from the pool. Current pool implementation requires
+* completing the entire request or returning error
+* otherwise.
+* Free already allocated buffers to the pool.
+*/
+   for (; index > 0; index--) {
+   obj_table--;
+   octeontx_fpa_bufpool_free(pool, *obj_table);
+   }
+   return -ENOMEM;
+   }
+   *obj_table = obj;
+   }
+
+   return 0;
+}
+
 static struct rte_mempool_ops octeontx_fpavf_ops = {
.name = "octeontx_fpavf",
.alloc = octeontx_fpavf_alloc,
.free = octeontx_fpavf_free,
-   .enqueue = NULL,
-   .dequeue = NULL,
+   .enqueue = octeontx_fpavf_enqueue,
+   .dequeue = octeontx_fpavf_dequeue,
.get_count = NULL,
.get_capabilities = NULL,
.update_range = NULL,
-- 
2.11.0
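The dequeue op in the patch above is all-or-nothing: a mid-batch allocation failure returns every buffer already taken before reporting -ENOMEM. A generic sketch of that rollback pattern, with a toy stack standing in for the hardware pool (all names hypothetical):

```c
#include <stddef.h>
#include <errno.h>

/* Callbacks standing in for the hardware alloc/free. */
typedef void *(*alloc_fn)(void *ctx);
typedef void (*free_fn)(void *ctx, void *buf);

/* All-or-nothing batch dequeue: on a mid-batch failure, free every
 * buffer already taken, then report -ENOMEM -- the same contract as
 * the driver's dequeue op. */
static int batch_dequeue(void *ctx, alloc_fn alloc_one, free_fn free_one,
			 void **obj_table, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		obj_table[i] = alloc_one(ctx);
		if (obj_table[i] == NULL) {
			while (i-- > 0)
				free_one(ctx, obj_table[i]);
			return -ENOMEM;
		}
	}
	return 0;
}

/* Toy backend: a fixed stack of buffers. */
struct demo_pool {
	void *bufs[8];
	unsigned int top;
};

static void *demo_alloc(void *ctx)
{
	struct demo_pool *p = ctx;

	return p->top ? p->bufs[--p->top] : NULL;
}

static void demo_free(void *ctx, void *buf)
{
	struct demo_pool *p = ctx;

	p->bufs[p->top++] = buf;
}
```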



[dpdk-dev] [PATCH v1 07/11] mempool/octeontx: implement pool get count

2017-08-24 Thread Santosh Shukla
Signed-off-by: Santosh Shukla 
Signed-off-by: Jerin Jacob 
---
 drivers/mempool/octeontx/octeontx_fpavf.c   | 22 ++
 drivers/mempool/octeontx/octeontx_fpavf.h   |  2 ++
 drivers/mempool/octeontx/rte_mempool_octeontx.c | 12 +++-
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/drivers/mempool/octeontx/octeontx_fpavf.c 
b/drivers/mempool/octeontx/octeontx_fpavf.c
index bcbbefd7d..adee744db 100644
--- a/drivers/mempool/octeontx/octeontx_fpavf.c
+++ b/drivers/mempool/octeontx/octeontx_fpavf.c
@@ -488,6 +488,28 @@ octeontx_fpa_bufpool_block_size(uintptr_t handle)
return FPA_CACHE_LINE_2_OBJSZ(res->sz128);
 }
 
+int
+octeontx_fpa_bufpool_free_count(uintptr_t handle)
+{
+   uint64_t cnt, limit, avail;
+   int gpool;
+
+   if (unlikely(!octeontx_fpa_handle_valid(handle)))
+   return -EINVAL;
+
+   gpool = octeontx_fpa_handle2gpool(handle);
+
+   cnt = fpavf_read64((void *)((uintptr_t)handle +
+   FPA_VF_VHAURA_CNT(gpool)));
+   limit = fpavf_read64((void *)((uintptr_t)handle +
+   FPA_VF_VHAURA_CNT_LIMIT(gpool)));
+
+   avail = fpavf_read64((void *)((uintptr_t)handle +
+   FPA_VF_VHPOOL_AVAILABLE(gpool)));
+
+   return RTE_MIN(avail, (limit - cnt));
+}
+
 uintptr_t
 octeontx_fpa_bufpool_create(unsigned int object_size, unsigned int 
object_count,
unsigned int buf_offset, char **va_start,
diff --git a/drivers/mempool/octeontx/octeontx_fpavf.h 
b/drivers/mempool/octeontx/octeontx_fpavf.h
index 936276715..9c601e0f8 100644
--- a/drivers/mempool/octeontx/octeontx_fpavf.h
+++ b/drivers/mempool/octeontx/octeontx_fpavf.h
@@ -138,4 +138,6 @@ int
 octeontx_fpa_bufpool_destroy(uintptr_t handle, int node);
 int
 octeontx_fpa_bufpool_block_size(uintptr_t handle);
+int
+octeontx_fpa_bufpool_free_count(uintptr_t handle);
 #endif /* __OCTEONTX_FPAVF_H__ */
diff --git a/drivers/mempool/octeontx/rte_mempool_octeontx.c 
b/drivers/mempool/octeontx/rte_mempool_octeontx.c
index 47d16cb8f..e56ea43c7 100644
--- a/drivers/mempool/octeontx/rte_mempool_octeontx.c
+++ b/drivers/mempool/octeontx/rte_mempool_octeontx.c
@@ -145,13 +145,23 @@ octeontx_fpavf_dequeue(struct rte_mempool *mp, void 
**obj_table,
return 0;
 }
 
+static unsigned int
+octeontx_fpavf_get_count(const struct rte_mempool *mp)
+{
+   uintptr_t pool;
+
+   pool = (uintptr_t)mp->pool_id;
+
+   return octeontx_fpa_bufpool_free_count(pool);
+}
+
 static struct rte_mempool_ops octeontx_fpavf_ops = {
.name = "octeontx_fpavf",
.alloc = octeontx_fpavf_alloc,
.free = octeontx_fpavf_free,
.enqueue = octeontx_fpavf_enqueue,
.dequeue = octeontx_fpavf_dequeue,
-   .get_count = NULL,
+   .get_count = octeontx_fpavf_get_count,
.get_capabilities = NULL,
.update_range = NULL,
 };
-- 
2.11.0
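The get_count op above reports min(VHPOOL_AVAILABLE, VHAURA_CNT_LIMIT - VHAURA_CNT): buffers physically present in the pool, capped by how many more the aura may still hand out. A sketch with plain integers standing in for the three register reads:

```c
#include <stdint.h>

static inline uint64_t demo_min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Free count as the driver computes it; avail, limit and cnt stand in
 * for the VHPOOL_AVAILABLE, VHAURA_CNT_LIMIT and VHAURA_CNT registers. */
static inline uint64_t demo_fpa_free_count(uint64_t avail, uint64_t limit,
					   uint64_t cnt)
{
	return demo_min_u64(avail, limit - cnt);
}
```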



[dpdk-dev] [PATCH v1 08/11] mempool/octeontx: implement pool get capability

2017-08-24 Thread Santosh Shukla
Allow the mempool HW manager to advertise its pool capabilities.

Signed-off-by: Santosh Shukla 
Signed-off-by: Jerin Jacob 
---
 drivers/mempool/octeontx/rte_mempool_octeontx.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/mempool/octeontx/rte_mempool_octeontx.c 
b/drivers/mempool/octeontx/rte_mempool_octeontx.c
index e56ea43c7..cc1b101f4 100644
--- a/drivers/mempool/octeontx/rte_mempool_octeontx.c
+++ b/drivers/mempool/octeontx/rte_mempool_octeontx.c
@@ -155,6 +155,14 @@ octeontx_fpavf_get_count(const struct rte_mempool *mp)
return octeontx_fpa_bufpool_free_count(pool);
 }
 
+static int
+octeontx_fpavf_get_capabilities(struct rte_mempool *mp)
+{
+   mp->flags |= (MEMPOOL_F_CAPA_PHYS_CONTIG |
+   MEMPOOL_F_POOL_BLK_SZ_ALIGNED);
+   return 0;
+}
+
 static struct rte_mempool_ops octeontx_fpavf_ops = {
.name = "octeontx_fpavf",
.alloc = octeontx_fpavf_alloc,
@@ -162,7 +170,7 @@ static struct rte_mempool_ops octeontx_fpavf_ops = {
.enqueue = octeontx_fpavf_enqueue,
.dequeue = octeontx_fpavf_dequeue,
.get_count = octeontx_fpavf_get_count,
-   .get_capabilities = NULL,
+   .get_capabilities = octeontx_fpavf_get_capabilities,
.update_range = NULL,
 };
 
-- 
2.11.0



[dpdk-dev] [PATCH v1 09/11] mempool/octeontx: implement pool update range

2017-08-24 Thread Santosh Shukla
Add support for the update range op in the mempool driver.

Allow more than one HW pool when using the OcteonTx mempool driver,
by storing each pool's information in a list and finding the
appropriate list element by matching the rte_mempool pointers.
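This bookkeeping can be sketched with the same BSD `sys/queue.h` SLIST the patch uses; a plain pointer stands in for `struct rte_mempool` and the spinlock is omitted:

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

/* Per-pool descriptor: maps a mempool to the memzone address captured
 * in update_range (sketch). */
struct demo_pool_info {
	const void *mp;
	uintptr_t mz_addr;
	SLIST_ENTRY(demo_pool_info) link;
};

static SLIST_HEAD(demo_pool_list, demo_pool_info) demo_pool_head =
	SLIST_HEAD_INITIALIZER(demo_pool_head);

static void demo_pool_info_add(struct demo_pool_info *pi)
{
	SLIST_INSERT_HEAD(&demo_pool_head, pi, link);
}

/* Find the descriptor whose mp pointer matches, as alloc/free do. */
static struct demo_pool_info *demo_pool_info_find(const void *mp)
{
	struct demo_pool_info *pi;

	SLIST_FOREACH(pi, &demo_pool_head, link)
		if (pi->mp == mp)
			return pi;
	return NULL;
}

static void demo_pool_info_del(struct demo_pool_info *pi)
{
	SLIST_REMOVE(&demo_pool_head, pi, demo_pool_info, link);
}
```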

Signed-off-by: Santosh Shukla 
Signed-off-by: Jerin Jacob 
---
 drivers/mempool/octeontx/rte_mempool_octeontx.c | 73 -
 1 file changed, 71 insertions(+), 2 deletions(-)

diff --git a/drivers/mempool/octeontx/rte_mempool_octeontx.c 
b/drivers/mempool/octeontx/rte_mempool_octeontx.c
index cc1b101f4..7c16259ea 100644
--- a/drivers/mempool/octeontx/rte_mempool_octeontx.c
+++ b/drivers/mempool/octeontx/rte_mempool_octeontx.c
@@ -36,17 +36,49 @@
 
 #include "octeontx_fpavf.h"
 
+/*
+ * Per-pool descriptor.
+ * Links mempool with the corresponding memzone,
+ * that provides memory under the pool's elements.
+ */
+struct octeontx_pool_info {
+   const struct rte_mempool *mp;
+   uintptr_t mz_addr;
+
+   SLIST_ENTRY(octeontx_pool_info) link;
+};
+
+SLIST_HEAD(octeontx_pool_list, octeontx_pool_info);
+
+/* List of the allocated pools */
+static struct octeontx_pool_list octeontx_pool_head =
+   SLIST_HEAD_INITIALIZER(octeontx_pool_head);
+/* Spinlock to protect pool list */
+static rte_spinlock_t pool_list_lock = RTE_SPINLOCK_INITIALIZER;
+
 static int
 octeontx_fpavf_alloc(struct rte_mempool *mp)
 {
uintptr_t pool;
+   struct octeontx_pool_info *pool_info;
uint32_t memseg_count = mp->size;
uint32_t object_size;
uintptr_t va_start;
int rc = 0;
 
+   rte_spinlock_lock(&pool_list_lock);
+   SLIST_FOREACH(pool_info, &octeontx_pool_head, link) {
+   if (pool_info->mp == mp)
+   break;
+   }
+   if (pool_info == NULL) {
+   rte_spinlock_unlock(&pool_list_lock);
+   return -ENXIO;
+   }
+
/* virtual hugepage mapped addr */
-   va_start = ~(uint64_t)0;
+   va_start = pool_info->mz_addr;
+   rte_spinlock_unlock(&pool_list_lock);
 
object_size = mp->elt_size + mp->header_size + mp->trailer_size;
 
@@ -77,10 +109,27 @@ octeontx_fpavf_alloc(struct rte_mempool *mp)
 static void
 octeontx_fpavf_free(struct rte_mempool *mp)
 {
+   struct octeontx_pool_info *pool_info;
uintptr_t pool;
 
pool = (uintptr_t)mp->pool_id;
 
+   rte_spinlock_lock(&pool_list_lock);
+   SLIST_FOREACH(pool_info, &octeontx_pool_head, link) {
+   if (pool_info->mp == mp)
+   break;
+   }
+
+   if (pool_info == NULL) {
+   rte_spinlock_unlock(&pool_list_lock);
+   rte_panic("%s: trying to free pool with no valid metadata",
+   __func__);
+   }
+
+   SLIST_REMOVE(&octeontx_pool_head, pool_info, octeontx_pool_info, link);
+   rte_spinlock_unlock(&pool_list_lock);
+
+   rte_free(pool_info);
octeontx_fpa_bufpool_destroy(pool, mp->socket_id);
 }
 
@@ -163,6 +212,26 @@ octeontx_fpavf_get_capabilities(struct rte_mempool *mp)
return 0;
 }
 
+static void
+octeontx_fpavf_update_range(const struct rte_mempool *mp,
+   char *vaddr, phys_addr_t paddr, size_t len)
+{
+   struct octeontx_pool_info *pool_info;
+
+   RTE_SET_USED(paddr);
+   RTE_SET_USED(len);
+
+   pool_info = rte_malloc("octeontx_pool_info", sizeof(*pool_info), 0);
+   if (pool_info == NULL)
+   return;
+
+   pool_info->mp = mp;
+   pool_info->mz_addr = (uintptr_t)vaddr;
+   rte_spinlock_lock(&pool_list_lock);
+   SLIST_INSERT_HEAD(&octeontx_pool_head, pool_info, link);
+   rte_spinlock_unlock(&pool_list_lock);
+}
+
 static struct rte_mempool_ops octeontx_fpavf_ops = {
.name = "octeontx_fpavf",
.alloc = octeontx_fpavf_alloc,
@@ -171,7 +240,7 @@ static struct rte_mempool_ops octeontx_fpavf_ops = {
.dequeue = octeontx_fpavf_dequeue,
.get_count = octeontx_fpavf_get_count,
.get_capabilities = octeontx_fpavf_get_capabilities,
-   .update_range = NULL,
+   .update_range = octeontx_fpavf_update_range,
 };
 
 MEMPOOL_REGISTER_OPS(octeontx_fpavf_ops);
-- 
2.11.0
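The alloc/free/update_range hooks in the patch above share a global, spinlock-protected SLIST that maps each mempool to the virtual address of its memzone. A minimal, self-contained sketch of that registry pattern follows; it uses a pthread mutex and a hypothetical `struct mempool` stand-in instead of the DPDK spinlock and mempool types, so it illustrates the technique rather than the driver's actual code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct rte_mempool. */
struct mempool { int dummy; };

struct pool_info {
	SLIST_ENTRY(pool_info) link;
	const struct mempool *mp;
	uintptr_t mz_addr;
};

static SLIST_HEAD(, pool_info) pool_head = SLIST_HEAD_INITIALIZER(pool_head);
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* Record the virtual base address for a pool (update_range analogue). */
static int pool_register(const struct mempool *mp, uintptr_t vaddr)
{
	struct pool_info *pi = malloc(sizeof(*pi));

	if (pi == NULL)
		return -1;
	pi->mp = mp;
	pi->mz_addr = vaddr;
	pthread_mutex_lock(&pool_lock);
	SLIST_INSERT_HEAD(&pool_head, pi, link);
	pthread_mutex_unlock(&pool_lock);
	return 0;
}

/* Look up the recorded address (alloc-path analogue); 0 on hit, -1 on miss. */
static int pool_lookup(const struct mempool *mp, uintptr_t *vaddr)
{
	struct pool_info *pi;

	pthread_mutex_lock(&pool_lock);
	SLIST_FOREACH(pi, &pool_head, link) {
		if (pi->mp == mp)
			break;
	}
	if (pi == NULL) {
		/* No metadata registered: the alloc path returns -ENXIO. */
		pthread_mutex_unlock(&pool_lock);
		return -1;
	}
	*vaddr = pi->mz_addr;
	pthread_mutex_unlock(&pool_lock);
	return 0;
}
```

The lock must be dropped before returning on the miss path, exactly as the patch does before returning -ENXIO or panicking.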



[dpdk-dev] [PATCH v1 10/11] mempool/octeontx: translate handle to pool

2017-08-24 Thread Santosh Shukla
Add a global API to translate a handle to a pool,
needed by the octeontx PMD.

Signed-off-by: Santosh Shukla 
Signed-off-by: Jerin Jacob 
---
 drivers/mempool/octeontx/octeontx_fpavf.c | 9 +
 drivers/mempool/octeontx/octeontx_fpavf.h | 2 ++
 drivers/mempool/octeontx/rte_mempool_octeontx_version.map | 3 +++
 3 files changed, 14 insertions(+)

diff --git a/drivers/mempool/octeontx/octeontx_fpavf.c 
b/drivers/mempool/octeontx/octeontx_fpavf.c
index adee744db..b8dc56c37 100644
--- a/drivers/mempool/octeontx/octeontx_fpavf.c
+++ b/drivers/mempool/octeontx/octeontx_fpavf.c
@@ -472,6 +472,15 @@ octeontx_gpool_free(uint16_t gpool)
 }
 
 /*
+ * Return gaura for a given pool
+ */
+int
+rte_octeontx_fpa_bufpool_gaura(uintptr_t handle)
+{
+   return octeontx_fpa_handle2gpool(handle);
+}
+
+/*
  * Return buffer size for a given pool
  */
 int
diff --git a/drivers/mempool/octeontx/octeontx_fpavf.h 
b/drivers/mempool/octeontx/octeontx_fpavf.h
index 9c601e0f8..df0c9a8f2 100644
--- a/drivers/mempool/octeontx/octeontx_fpavf.h
+++ b/drivers/mempool/octeontx/octeontx_fpavf.h
@@ -140,4 +140,6 @@ int
 octeontx_fpa_bufpool_block_size(uintptr_t handle);
 int
 octeontx_fpa_bufpool_free_count(uintptr_t handle);
+int
+rte_octeontx_fpa_bufpool_gaura(uintptr_t handle);
 #endif /* __OCTEONTX_FPAVF_H__ */
diff --git a/drivers/mempool/octeontx/rte_mempool_octeontx_version.map 
b/drivers/mempool/octeontx/rte_mempool_octeontx_version.map
index a70bd197b..377333f31 100644
--- a/drivers/mempool/octeontx/rte_mempool_octeontx_version.map
+++ b/drivers/mempool/octeontx/rte_mempool_octeontx_version.map
@@ -1,4 +1,7 @@
 DPDK_17.11 {
+   global:
+
+   rte_octeontx_fpa_bufpool_gaura;
 
local: *;
 };
-- 
2.11.0



[dpdk-dev] [PATCH v1 11/11] doc: add mempool and octeontx mempool device

2017-08-24 Thread Santosh Shukla
This commit adds a section to the docs listing the available mempool
device PMDs.

It then adds the octeontx fpavf mempool PMD to the listed mempool
devices.

Signed-off-by: Santosh Shukla 
Signed-off-by: Jerin Jacob 
---
 MAINTAINERS |   6 ++
 doc/guides/index.rst|   1 +
 doc/guides/mempool/index.rst|  40 +
 doc/guides/mempool/octeontx.rst | 127 
 4 files changed, 174 insertions(+)
 create mode 100644 doc/guides/mempool/index.rst
 create mode 100644 doc/guides/mempool/octeontx.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index a0cd75e15..4122c8099 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -337,6 +337,12 @@ F: drivers/net/liquidio/
 F: doc/guides/nics/liquidio.rst
 F: doc/guides/nics/features/liquidio.ini
 
+Cavium Octeontx Mempool
+M: Santosh Shukla 
+F: drivers/mempool/octeontx
+F: doc/guides/mempool/index.rst
+F: doc/guides/mempool/octeontx.rst
+
 Chelsio cxgbe
 M: Rahul Lakkireddy 
 F: drivers/net/cxgbe/
diff --git a/doc/guides/index.rst b/doc/guides/index.rst
index 63716b095..98f4b7aab 100644
--- a/doc/guides/index.rst
+++ b/doc/guides/index.rst
@@ -44,6 +44,7 @@ DPDK documentation
nics/index
cryptodevs/index
eventdevs/index
+   mempool/index
xen/index
contributing/index
rel_notes/index
diff --git a/doc/guides/mempool/index.rst b/doc/guides/mempool/index.rst
new file mode 100644
index 0..38bbca1c4
--- /dev/null
+++ b/doc/guides/mempool/index.rst
@@ -0,0 +1,40 @@
+..  BSD LICENSE
+Copyright(c) 2017 Cavium Inc. All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of Cavium, Inc nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Mempool Device Driver
+=====================
+
+The following is a list of mempool PMDs, which can be used from an
+application through the mempool API.
+
+.. toctree::
+:maxdepth: 2
+:numbered:
+
+octeontx
diff --git a/doc/guides/mempool/octeontx.rst b/doc/guides/mempool/octeontx.rst
new file mode 100644
index 0..74999009c
--- /dev/null
+++ b/doc/guides/mempool/octeontx.rst
@@ -0,0 +1,127 @@
+..  BSD LICENSE
+Copyright (C) Cavium, Inc. 2017. All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of Cavium, Inc nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Re: [dpdk-dev] [PATCH v1] net/mlx5: support upstream rdma-core

2017-08-24 Thread Ferruh Yigit
On 8/24/2017 1:23 PM, Shachar Beiser wrote:
>  This removes the dependency on specific Mellanox OFED libraries by
>  using the upstream rdma-core and linux upstream community code.
> 
>  Minimal requirements: rdma-core v16 and Kernel Linux 4.14.

4.14?

Is 4.13 released?

> 
> Signed-off-by: Shachar Beiser 

<...>



Re: [dpdk-dev] [PATCH v2 1/3] ethdev: expose Rx hardware timestamp

2017-08-24 Thread Nélio Laranjeiro
On Thu, Aug 24, 2017 at 10:46:31AM +0300, Raslan Darawsheh wrote:
> Added new capability to the list of rx offloads for hw timestamp
> 
> The PMDs who expose this capability will always have it enabled.

Not sure this comment is accurate; Rx offloads are an application request,
and the PMD does not have to enable them by default.

> But if the following API gets accepted, applications can choose
> whether to disable/enable this offload.
> http://dpdk.org/dev/patchwork/patch/27470/
> 
> Signed-off-by: Raslan Darawsheh 
> ---
>  lib/librte_ether/rte_ethdev.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 0adf327..cc5d281 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -907,6 +907,8 @@ struct rte_eth_conf {
>  #define DEV_RX_OFFLOAD_QINQ_STRIP  0x0020
>  #define DEV_RX_OFFLOAD_OUTER_IPV4_CKSUM 0x0040
>  #define DEV_RX_OFFLOAD_MACSEC_STRIP 0x0080
> +#define DEV_RX_OFFLOAD_TIMESTAMP 0x0100
> +/**< Device puts raw timestamp in mbuf. */
>  
>  /**
>   * TX offload capabilities of a device.
> -- 
> 2.7.4

-- 
Nélio Laranjeiro
6WIND


[dpdk-dev] [PATCH] rte_sched: don't count RED-drops as tail-drops

2017-08-24 Thread Alan Dewar
Every time the rte_sched_port_update_subport_stats_on_drop or
rte_sched_port_update_queue_stats_on_drop functions are called the
n_pkts_dropped counter is incremented.

The n_pkts_red_dropped counter is only incremented when the function
argument red is non-zero.

Packets that are RED-dropped are not Tail-dropped, so the n_pkts_dropped
counter should not be incremented when the n_pkts_red_dropped counter is.

Signed-off-by: Alan Dewar 
---
 lib/librte_sched/rte_sched.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index b7cba11..c10c266 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -1108,7 +1108,7 @@ rte_sched_port_update_subport_stats_on_drop(struct 
rte_sched_port *port,
uint32_t tc_index = (qindex >> 2) & 0x3;
uint32_t pkt_len = pkt->pkt_len;
 
-   s->stats.n_pkts_tc_dropped[tc_index] += 1;
+   s->stats.n_pkts_tc_dropped[tc_index] += !red;
s->stats.n_bytes_tc_dropped[tc_index] += pkt_len;
 #ifdef RTE_SCHED_RED
s->stats.n_pkts_red_dropped[tc_index] += red;
@@ -1140,7 +1140,7 @@ rte_sched_port_update_queue_stats_on_drop(struct 
rte_sched_port *port,
struct rte_sched_queue_extra *qe = port->queue_extra + qindex;
uint32_t pkt_len = pkt->pkt_len;
 
-   qe->stats.n_pkts_dropped += 1;
+   qe->stats.n_pkts_dropped += !red;
qe->stats.n_bytes_dropped += pkt_len;
 #ifdef RTE_SCHED_RED
qe->stats.n_pkts_red_dropped += red;
-- 
2.1.4
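The `+= !red` idiom in the patch above counts each drop exactly once: as a RED drop when `red` is non-zero, and as a tail drop otherwise. A minimal sketch with a hypothetical, simplified stats structure (not the actual rte_sched layout) shows the behavior:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the rte_sched per-queue stats. */
struct queue_stats {
	uint32_t n_pkts_dropped;      /* tail drops only */
	uint32_t n_pkts_red_dropped;  /* RED drops only */
};

/* A dropped packet increments exactly one counter: the RED counter when
 * red != 0, the tail-drop counter otherwise (the `+= !red` idiom). */
static void stats_on_drop(struct queue_stats *st, uint32_t red)
{
	st->n_pkts_dropped += !red;
	st->n_pkts_red_dropped += red;
}
```

With this change, the sum of the two counters equals the total number of drops, instead of double-counting RED drops.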



Re: [dpdk-dev] [PATCH v2 2/3] app/testpmd: add Rx timestamp in testpmd

2017-08-24 Thread Nélio Laranjeiro
On Thu, Aug 24, 2017 at 10:46:32AM +0300, Raslan Darawsheh wrote:
> Added new print in case a PMD exposes Rx timestamp.
> Also, added a print for timestamp value in rxonly mode
> in case the packet was timestamped.
> 
> Signed-off-by: Raslan Darawsheh 
> ---
>  app/test-pmd/config.c | 3 +++
>  app/test-pmd/rxonly.c | 2 ++
>  2 files changed, 5 insertions(+)
> 
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> index 3ae3e1c..8a5da5d 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -598,6 +598,9 @@ port_offload_cap_display(portid_t port_id)
>   printf("off\n");
>   }
>  
> + if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TIMESTAMP)
> + printf("HW timestamp:  on\n");
> +
>   if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_QINQ_INSERT) {
>   printf("Double VLANs insert:   ");
>   if (ports[port_id].tx_ol_flags &
> diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
> index 5ef0219..f4d35d7 100644
> --- a/app/test-pmd/rxonly.c
> +++ b/app/test-pmd/rxonly.c
> @@ -158,6 +158,8 @@ pkt_burst_receive(struct fwd_stream *fs)
>   printf("hash=0x%x ID=0x%x ",
>  mb->hash.fdir.hash, mb->hash.fdir.id);
>   }
> + if (ol_flags & PKT_RX_TIMESTAMP)
> + printf(" - timestamp %lu ", mb->timestamp);
>   if (ol_flags & PKT_RX_VLAN_STRIPPED)
>   printf(" - VLAN tci=0x%x", mb->vlan_tci);
>   if (ol_flags & PKT_RX_QINQ_STRIPPED)
> -- 
> 2.7.4
 
How can we enable this Rx offload?

Thanks,

-- 
Nélio Laranjeiro
6WIND


[dpdk-dev] [PATCH 0/5] table: add key mask for hash tables

2017-08-24 Thread Cristian Dumitrescu
Main changes:

1. The key_mask parameter is added to all the hash tables that were
   previously missing it, as well as to the hash compute function. This was
   first started in DPDK 2.0, but was only implemented for a couple of
   hash tables. The benefit of this approach is that it allows for better
   performance for large keys (bigger than 16 bytes), while it preserves
   the same performance for small keys [Q&A1].

2. The precomputed key signature (i.e. non-"do-sig") versions have been
   removed for all the hash tables, so now the key signature is always
   computed on every lookup. Note that this approach also allows for the
   precomputed key signature scheme [Q&A2].

3. API cleanup: single parameter structure common for all hash tables.

Q&A:

Q1: How is better lookup performance achieved by using key mask approach
   for hash tables?
A1: This approach eliminates the need to consolidate the lookup key in a
   single contiguous buffer where the relevant packet fields are written
   one by one, which is a very costly operation that also has strong data
   dependencies.

Q2: How can the pre-computed key signature scheme be implemented with
current approach?
A2: The application can implement a straightforward custom hash function
that simply reads the pre-computed key signature from a given offset
in the input key buffer where it has been stored prior to the lookup
operation.
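The scheme described in A2 can be sketched as a custom hash callback matching the new four-argument prototype described in point 1. The buffer layout below (a 64-bit signature stored at offset 0 of the key buffer before lookup) is an assumption for illustration, not part of the librte_table API:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: the application writes the precomputed 64-bit key
 * signature at this offset in the key buffer prior to the lookup. */
#define SIG_OFFSET 0

/* Custom hash callback matching the new rte_table hash prototype
 * (key, key_mask, key_size, seed): it simply reads back the
 * precomputed signature instead of hashing the key. */
static uint64_t
precomputed_sig_hash(void *key, void *key_mask, uint32_t key_size,
		     uint64_t seed)
{
	uint64_t sig;

	(void)key_mask;  /* unused: the signature is already computed */
	(void)key_size;
	(void)seed;
	memcpy(&sig, (uint8_t *)key + SIG_OFFSET, sizeof(sig));
	return sig;
}
```

Such a function would be passed as the `.f_hash` member of `struct rte_table_hash_params`, restoring the precomputed-signature behavior on top of the unified API.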

Cristian Dumitrescu (5):
  table: add key mask for hash tables
  test: update due to api changes in librte_table
  test-pipeline: update due to api changes in librte_table
  ip_pipeline: update due to api changes in librte_table
  deprecation: removed the librte_table notice

 doc/guides/rel_notes/deprecation.rst   |   8 -
 .../pipeline/pipeline_flow_classification.c|   9 +-
 .../pipeline/pipeline_flow_classification_be.c |  54 +-
 .../ip_pipeline/pipeline/pipeline_passthrough_be.c |  18 +-
 .../ip_pipeline/pipeline/pipeline_routing_be.c |  18 +-
 lib/librte_table/rte_table_hash.h  | 302 +
 lib/librte_table/rte_table_hash_cuckoo.c   | 200 +++---
 lib/librte_table/rte_table_hash_ext.c  | 407 
 lib/librte_table/rte_table_hash_key16.c| 691 +
 lib/librte_table/rte_table_hash_key32.c| 381 +++-
 lib/librte_table/rte_table_hash_key8.c | 654 +--
 lib/librte_table/rte_table_hash_lru.c  | 506 ++-
 test/test-pipeline/main.h  |   5 +-
 test/test-pipeline/pipeline_hash.c | 107 +---
 test/test/test_table.c |   1 +
 test/test/test_table.h |   3 +-
 test/test/test_table_combined.c| 140 ++---
 test/test/test_table_tables.c  | 148 ++---
 18 files changed, 1160 insertions(+), 2492 deletions(-)

-- 
2.7.4



[dpdk-dev] [PATCH 3/5] test-pipeline: update due to api changes in librte_table

2017-08-24 Thread Cristian Dumitrescu
Signed-off-by: Cristian Dumitrescu 
---
 test/test-pipeline/main.h  |   5 +-
 test/test-pipeline/pipeline_hash.c | 107 +
 2 files changed, 18 insertions(+), 94 deletions(-)

diff --git a/test/test-pipeline/main.h b/test/test-pipeline/main.h
index 3685849..26395a3 100644
--- a/test/test-pipeline/main.h
+++ b/test/test-pipeline/main.h
@@ -131,7 +131,10 @@ enum {
 
 void app_main_loop_rx(void);
 void app_main_loop_rx_metadata(void);
-uint64_t test_hash(void *key, uint32_t key_size, uint64_t seed);
+uint64_t test_hash(void *key,
+   void *key_mask,
+   uint32_t key_size,
+   uint64_t seed);
 
 void app_main_loop_worker(void);
 void app_main_loop_worker_pipeline_stub(void);
diff --git a/test/test-pipeline/pipeline_hash.c 
b/test/test-pipeline/pipeline_hash.c
index 991e381..edc1663 100644
--- a/test/test-pipeline/pipeline_hash.c
+++ b/test/test-pipeline/pipeline_hash.c
@@ -169,23 +169,23 @@ app_main_loop_worker_pipeline_hash(void) {
"ring %d\n", i);
}
 
+   struct rte_table_hash_params table_hash_params = {
+   .name = "TABLE",
+   .key_size = key_size,
+   .key_offset = APP_METADATA_OFFSET(32),
+   .key_mask = NULL,
+   .n_keys = 1 << 24,
+   .n_buckets = 1 << 22,
+   .f_hash = test_hash,
+   .seed = 0,
+   };
+
/* Table configuration */
switch (app.pipeline_type) {
case e_APP_PIPELINE_HASH_KEY8_EXT:
case e_APP_PIPELINE_HASH_KEY16_EXT:
case e_APP_PIPELINE_HASH_KEY32_EXT:
{
-   struct rte_table_hash_ext_params table_hash_params = {
-   .key_size = key_size,
-   .n_keys = 1 << 24,
-   .n_buckets = 1 << 22,
-   .n_buckets_ext = 1 << 21,
-   .f_hash = test_hash,
-   .seed = 0,
-   .signature_offset = APP_METADATA_OFFSET(0),
-   .key_offset = APP_METADATA_OFFSET(32),
-   };
-
struct rte_pipeline_table_params table_params = {
.ops = &rte_table_hash_ext_ops,
.arg_create = &table_hash_params,
@@ -204,16 +204,6 @@ app_main_loop_worker_pipeline_hash(void) {
case e_APP_PIPELINE_HASH_KEY16_LRU:
case e_APP_PIPELINE_HASH_KEY32_LRU:
{
-   struct rte_table_hash_lru_params table_hash_params = {
-   .key_size = key_size,
-   .n_keys = 1 << 24,
-   .n_buckets = 1 << 22,
-   .f_hash = test_hash,
-   .seed = 0,
-   .signature_offset = APP_METADATA_OFFSET(0),
-   .key_offset = APP_METADATA_OFFSET(32),
-   };
-
struct rte_pipeline_table_params table_params = {
.ops = &rte_table_hash_lru_ops,
.arg_create = &table_hash_params,
@@ -230,16 +220,6 @@ app_main_loop_worker_pipeline_hash(void) {
 
case e_APP_PIPELINE_HASH_SPEC_KEY8_EXT:
{
-   struct rte_table_hash_key8_ext_params table_hash_params = {
-   .n_entries = 1 << 24,
-   .n_entries_ext = 1 << 23,
-   .signature_offset = APP_METADATA_OFFSET(0),
-   .key_offset = APP_METADATA_OFFSET(32),
-   .key_mask = NULL,
-   .f_hash = test_hash,
-   .seed = 0,
-   };
-
struct rte_pipeline_table_params table_params = {
.ops = &rte_table_hash_key8_ext_ops,
.arg_create = &table_hash_params,
@@ -256,15 +236,6 @@ app_main_loop_worker_pipeline_hash(void) {
 
case e_APP_PIPELINE_HASH_SPEC_KEY8_LRU:
{
-   struct rte_table_hash_key8_lru_params table_hash_params = {
-   .n_entries = 1 << 24,
-   .signature_offset = APP_METADATA_OFFSET(0),
-   .key_offset = APP_METADATA_OFFSET(32),
-   .key_mask = NULL,
-   .f_hash = test_hash,
-   .seed = 0,
-   };
-
struct rte_pipeline_table_params table_params = {
.ops = &rte_table_hash_key8_lru_ops,
.arg_create = &table_hash_params,
@@ -281,16 +252,6 @@ app_main_loop_worker_pipeline_hash(void) {
 
case e_APP_PIPELINE_HASH_SPEC_KEY16_EXT:
{
-   struct rte_table_hash_key16_ext_params table_hash_params = {
-   .n_entries = 1 << 24,
-   .n_entries_ext = 1 << 23,
-   .signature_offset = APP_METADATA_OFFSET(0),
-   .key_offset = APP_METADATA_OFFSET(32),
-  

[dpdk-dev] [PATCH 2/5] test: update due to api changes in librte_table

2017-08-24 Thread Cristian Dumitrescu
Signed-off-by: Cristian Dumitrescu 
---
 test/test/test_table.c  |   1 +
 test/test/test_table.h  |   3 +-
 test/test/test_table_combined.c | 140 +
 test/test/test_table_tables.c   | 148 +---
 4 files changed, 128 insertions(+), 164 deletions(-)

diff --git a/test/test/test_table.c b/test/test/test_table.c
index 9e9eed8..db7d4e6 100644
--- a/test/test/test_table.c
+++ b/test/test/test_table.c
@@ -72,6 +72,7 @@ static void app_init_rings(void);
 static void app_init_mbuf_pools(void);
 
 uint64_t pipeline_test_hash(void *key,
+   __attribute__((unused)) void *key_mask,
__attribute__((unused)) uint32_t key_size,
__attribute__((unused)) uint64_t seed)
 {
diff --git a/test/test/test_table.h b/test/test/test_table.h
index 84d1845..8c1df33 100644
--- a/test/test/test_table.h
+++ b/test/test/test_table.h
@@ -94,7 +94,7 @@
APP_METADATA_OFFSET(32));   \
k32 = (uint32_t *) key; \
k32[0] = (value);   \
-   *signature = pipeline_test_hash(key, 0, 0); \
+   *signature = pipeline_test_hash(key, NULL, 0, 0);   \
rte_ring_enqueue((ring), m);\
 } while (0)
 
@@ -131,6 +131,7 @@
 /* Function definitions */
 uint64_t pipeline_test_hash(
void *key,
+   __attribute__((unused)) void *key_mask,
__attribute__((unused)) uint32_t key_size,
__attribute__((unused)) uint64_t seed);
 
diff --git a/test/test/test_table_combined.c b/test/test/test_table_combined.c
index a2d19a1..417bc42 100644
--- a/test/test/test_table_combined.c
+++ b/test/test/test_table_combined.c
@@ -200,8 +200,6 @@ test_table_type(struct rte_table_ops *table_ops, void 
*table_args,
return -CHECK_TABLE_CONSISTENCY;
}
 
-
-
/* Flow test - All hits */
if (table_packets->n_hit_packets) {
for (i = 0; i < table_packets->n_hit_packets; i++)
@@ -248,7 +246,6 @@ test_table_type(struct rte_table_ops *table_ops, void 
*table_args,
VERIFY_TRAFFIC(RING_TX, table_packets->n_miss_packets, 0);
}
 
-
/* Change table entry action */
printf("Change entry action\n");
table_entry.table_id = ring_out_2_id;
@@ -441,12 +438,15 @@ test_table_hash8lru(void)
int status, i;
 
/* Traffic flow */
-   struct rte_table_hash_key8_lru_params key8lru_params = {
-   .n_entries = 1<<24,
-   .f_hash = pipeline_test_hash,
-   .signature_offset = APP_METADATA_OFFSET(0),
+   struct rte_table_hash_params key8lru_params = {
+   .name = "TABLE",
+   .key_size = 8,
.key_offset = APP_METADATA_OFFSET(32),
.key_mask = NULL,
+   .n_keys = 1 << 16,
+   .n_buckets = 1 << 16,
+   .f_hash = pipeline_test_hash,
+   .seed = 0,
};
 
uint8_t key8lru[8];
@@ -475,14 +475,14 @@ test_table_hash8lru(void)
VERIFY(status, CHECK_TABLE_OK);
 
/* Invalid parameters */
-   key8lru_params.n_entries = 0;
+   key8lru_params.n_keys = 0;
 
status = test_table_type(&rte_table_hash_key8_lru_ops,
(void *)&key8lru_params, (void *)key8lru, &table_packets,
NULL, 0);
VERIFY(status, CHECK_TABLE_TABLE_CONFIG);
 
-   key8lru_params.n_entries = 1<<16;
+   key8lru_params.n_keys = 1<<16;
key8lru_params.f_hash = NULL;
 
status = test_table_type(&rte_table_hash_key8_lru_ops,
@@ -499,13 +499,15 @@ test_table_hash16lru(void)
int status, i;
 
/* Traffic flow */
-   struct rte_table_hash_key16_lru_params key16lru_params = {
-   .n_entries = 1<<16,
-   .f_hash = pipeline_test_hash,
-   .seed = 0,
-   .signature_offset = APP_METADATA_OFFSET(0),
+   struct rte_table_hash_params key16lru_params = {
+   .name = "TABLE",
+   .key_size = 16,
.key_offset = APP_METADATA_OFFSET(32),
.key_mask = NULL,
+   .n_keys = 1 << 16,
+   .n_buckets = 1 << 16,
+   .f_hash = pipeline_test_hash,
+   .seed = 0,
};
 
uint8_t key16lru[16];
@@ -534,14 +536,14 @@ test_table_hash16lru(void)
VERIFY(status, CHECK_TABLE_OK);
 
/* Invalid parameters */
-   key16lru_params.n_entries = 0;
+   key16lru_params.n_keys = 0;
 
status = test_table_type(&rte_table_hash_key16_lru_ops,
(void *)&key16lru_params, (void *)key16lru, &table_packets,
NULL, 0);
VERIFY(status, CHECK_TABLE_TABLE_CONFIG);
 
-   key16lru_params.n_entries = 1<<16;
+   key16lru_params.n_keys = 1<<16;
 

[dpdk-dev] [PATCH 4/5] ip_pipeline: update due to api changes in librte_table

2017-08-24 Thread Cristian Dumitrescu
Signed-off-by: Cristian Dumitrescu 
---
 .../pipeline/pipeline_flow_classification.c|  9 ++--
 .../pipeline/pipeline_flow_classification_be.c | 54 --
 .../ip_pipeline/pipeline/pipeline_passthrough_be.c | 18 
 .../ip_pipeline/pipeline/pipeline_routing_be.c | 18 
 4 files changed, 33 insertions(+), 66 deletions(-)

diff --git a/examples/ip_pipeline/pipeline/pipeline_flow_classification.c 
b/examples/ip_pipeline/pipeline/pipeline_flow_classification.c
index 9ef50cc..a8fae1e 100644
--- a/examples/ip_pipeline/pipeline/pipeline_flow_classification.c
+++ b/examples/ip_pipeline/pipeline/pipeline_flow_classification.c
@@ -88,8 +88,11 @@ app_pipeline_fc_key_convert(struct pipeline_fc_key *key_in,
uint32_t *signature)
 {
uint8_t buffer[PIPELINE_FC_FLOW_KEY_MAX_SIZE];
+   uint8_t m[PIPELINE_FC_FLOW_KEY_MAX_SIZE]; /* key mask */
void *key_buffer = (key_out) ? key_out : buffer;
 
+   memset(m, 0xFF, sizeof(m));
+
switch (key_in->type) {
case FLOW_KEY_QINQ:
{
@@ -101,7 +104,7 @@ app_pipeline_fc_key_convert(struct pipeline_fc_key *key_in,
qinq->cvlan = rte_cpu_to_be_16(key_in->key.qinq.cvlan);
 
if (signature)
-   *signature = (uint32_t) hash_default_key8(qinq, 8, 0);
+   *signature = (uint32_t) hash_default_key8(qinq, m, 8, 
0);
return 0;
}
 
@@ -118,7 +121,7 @@ app_pipeline_fc_key_convert(struct pipeline_fc_key *key_in,
ipv4->port_dst = 
rte_cpu_to_be_16(key_in->key.ipv4_5tuple.port_dst);
 
if (signature)
-   *signature = (uint32_t) hash_default_key16(ipv4, 16, 0);
+   *signature = (uint32_t) hash_default_key16(ipv4, m, 16, 
0);
return 0;
}
 
@@ -136,7 +139,7 @@ app_pipeline_fc_key_convert(struct pipeline_fc_key *key_in,
ipv6->port_dst = 
rte_cpu_to_be_16(key_in->key.ipv6_5tuple.port_dst);
 
if (signature)
-   *signature = (uint32_t) hash_default_key64(ipv6, 64, 0);
+   *signature = (uint32_t) hash_default_key64(ipv6, m, 64, 
0);
return 0;
}
 
diff --git a/examples/ip_pipeline/pipeline/pipeline_flow_classification_be.c 
b/examples/ip_pipeline/pipeline/pipeline_flow_classification_be.c
index 026f00c..e489957 100644
--- a/examples/ip_pipeline/pipeline/pipeline_flow_classification_be.c
+++ b/examples/ip_pipeline/pipeline/pipeline_flow_classification_be.c
@@ -492,40 +492,15 @@ static void *pipeline_fc_init(struct pipeline_params 
*params,
/* Tables */
p->n_tables = 1;
{
-   struct rte_table_hash_key8_ext_params
-   table_hash_key8_params = {
-   .n_entries = p_fc->n_flows,
-   .n_entries_ext = p_fc->n_flows,
-   .signature_offset = p_fc->hash_offset,
-   .key_offset = p_fc->key_offset,
-   .f_hash = hash_func[(p_fc->key_size / 8) - 1],
-   .key_mask = (p_fc->key_mask_present) ?
-   p_fc->key_mask : NULL,
-   .seed = 0,
-   };
-
-   struct rte_table_hash_key16_ext_params
-   table_hash_key16_params = {
-   .n_entries = p_fc->n_flows,
-   .n_entries_ext = p_fc->n_flows,
-   .signature_offset = p_fc->hash_offset,
-   .key_offset = p_fc->key_offset,
-   .f_hash = hash_func[(p_fc->key_size / 8) - 1],
-   .key_mask = (p_fc->key_mask_present) ?
-   p_fc->key_mask : NULL,
-   .seed = 0,
-   };
-
-   struct rte_table_hash_ext_params
-   table_hash_params = {
+   struct rte_table_hash_params table_hash_params = {
+   .name = p->name,
.key_size = p_fc->key_size,
+   .key_offset = p_fc->key_offset,
+   .key_mask = (p_fc->key_mask_present)? p_fc->key_mask : 
NULL,
.n_keys = p_fc->n_flows,
.n_buckets = p_fc->n_flows / 4,
-   .n_buckets_ext = p_fc->n_flows / 4,
.f_hash = hash_func[(p_fc->key_size / 8) - 1],
.seed = 0,
-   .signature_offset = p_fc->hash_offset,
-   .key_offset = p_fc->key_offset,
};
 
struct rte_pipeline_table_params table_params = {
@@ -542,32 +517,19 @@ static void *pipeline_fc_init(struct pipeline_params 
*params,
 
switch (p_fc->key_size) {
case 8:
-   if (p_fc->hash_offset != 0) {
-   table_params.ops 

[dpdk-dev] [PATCH 5/5] deprecation: removed the librte_table notice

2017-08-24 Thread Cristian Dumitrescu
Signed-off-by: Cristian Dumitrescu 
---
 doc/guides/rel_notes/deprecation.rst | 8 
 1 file changed, 8 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 3362f33..e181a61 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -112,11 +112,3 @@ Deprecation Notices
 
   - ``rte_cryptodev_allocate_driver``
 
-* librte_table: The ``key_mask`` parameter will be added to all the hash tables
-  that currently do not have it, as well as to the hash compute function 
prototype.
-  The non-"do-sig" versions of the hash tables will be removed
-  (including the ``signature_offset`` parameter)
-  and the "do-sig" versions renamed accordingly.
-- 
2.7.4



Re: [dpdk-dev] [PATCH v2 3/3] net/mlx5: add hardware timestamp

2017-08-24 Thread Nélio Laranjeiro
On Thu, Aug 24, 2017 at 10:46:33AM +0300, Raslan Darawsheh wrote:
> Expose a new capability of Rx HW timestamp and
> add a new device arg, hw_timestamp, to enable it.
> It will add the raw HW timestamp into the packets.
> 
> It is expected that this will lower performance, since using it
> will disable CQE compression and will add extra checks in
> the vectorized Rx path.
> 
> Signed-off-by: Raslan Darawsheh 
> ---
>  doc/guides/nics/mlx5.rst |  5 +
>  drivers/net/mlx5/mlx5.c  | 23 +++
>  drivers/net/mlx5/mlx5.h  |  1 +
>  drivers/net/mlx5/mlx5_ethdev.c   |  3 ++-
>  drivers/net/mlx5/mlx5_rxq.c  |  3 +++
>  drivers/net/mlx5/mlx5_rxtx.c |  5 +
>  drivers/net/mlx5/mlx5_rxtx.h |  3 ++-
>  drivers/net/mlx5/mlx5_rxtx_vec_sse.c | 13 -
>  8 files changed, 53 insertions(+), 3 deletions(-)
 
It is strange to enable/disable this single offload against the application
request.  Why do you need such behavior?

Another point: I don't see any code to retrieve this offload request from the
port configuration, as for the others.  Is it expected in the next revision?

Thanks,

-- 
Nélio Laranjeiro
6WIND


Re: [dpdk-dev] [RFC] net/mlx5: support count flow action

2017-08-24 Thread Ori Kam
Hi Nelio,

Please see my comments in line.

Ori

> -Original Message-
> From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com]
> Sent: Thursday, August 24, 2017 9:54 AM
> To: Ori Kam 
> Cc: adrien.mazar...@6wind.com; dev@dpdk.org
> Subject: Re: [RFC] net/mlx5: support count flow action
> 
> Hi Ori,
> 
> Please keep the coding style of the file, and pass checkpatch before
> submitting a patch on the mailing list.  It helps the review by having a 
> correct
> patch respecting the coding style of the file.
> I won't spot out here all the coding style issues, if you need some help, feel
> free to ask.
> 
Sorry won't happen again.

> On Mon, Aug 21, 2017 at 03:35:41PM +0300, Ori Kam wrote:
> > Support count flow action.
> 
> Why copy/paste the title into the commit message?
> 
I was under the impression that the main function of the RFC should also be in
the message body.

> > This patch is basic design only, do to missing features on the verbs
> > driver. As soon as the features will be implemented on the verbs
> > driver this will be updated and rebased on top of
> > dpdk.org/ml/archives/dev/2017-August/072351.html
> > (The verbs driver should be ready starting September)
> >
> > This RFC should be applied on top of
> > dpdk.org/ml/archives/dev/2017-August/072351.html
> 
> Last two comments should be after '---' line.
> 
Those two lines are part of the commit message; anyway, I will move them.

> > Signed-off-by: Ori Kam 
> > ---
> >  drivers/net/mlx5/mlx5.h  |   4 ++
> >  drivers/net/mlx5/mlx5_flow.c | 163
> > ++-
> 
> There are missing changes in the Makefile to have the
> HAVE_VERBS_IBV_EXP_FLOW_SPEC_ACTION_COUNT and the include of the
> mlx5_autoconf.h in mlx5_flow.c.
> 
I haven't added them since this feature is not supported yet, and
I don't want anybody trying to activate it.
When the feature is supported in the verbs driver, I will update
those files.

> >  2 files changed, 166 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> > e89aba8..434e848 100644
> > --- a/drivers/net/mlx5/mlx5.h
> > +++ b/drivers/net/mlx5/mlx5.h
> >[...]
> > +/**
> > + * Query an existing flow rule.
> > + *
> > + * @see rte_flow_query()
> > + * @see rte_flow_ops
> > + */
> > +int
> > +mlx5_flow_query(struct rte_eth_dev *dev,
> > +   struct rte_flow *flow,
> > +   enum rte_flow_action_type type,
> > +   void *res,
> > +   struct rte_flow_error *error)
> > +{
> > +
> > +   int res_value = 0;
> > +   switch (type){
> > +   case RTE_FLOW_ACTION_TYPE_COUNT:
> > +   if (!flow->counter) {
> > +   rte_flow_error_set(error, EINVAL,
> > +
> RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> > +  NULL,
> > +  "No counter
> is set for this flow");
> > +   return -1;
> 
> Wrong returned value, read the rte_flow_query API allowed values.
> 
Will be fixed
> > +   }
> > +#ifdef HAVE_VERBS_IBV_EXP_FLOW_SPEC_ACTION_COUNT
> > +   res_value =
> priv_flow_query_counter(mlx5_get_priv(dev), flow->counter,
> > +   (struct rte_flow_query_count*)res,
> > +   error);
> > +#else
> > +   rte_flow_error_set(error, ENOTSUP,
> RTE_FLOW_ERROR_TYPE_ACTION,
> > +   NULL,
> "Flow count unsupported");
> > +   (void)dev;
> > +   (void)flow;
> > +   (void)type;
> > +   (void)res;
> > +   (void)error;
> > +   return -1;
> 
> Same here.
> 
Will be fixed.

> > +#endif
> 
> I'll suggest to have a dedicated function here to handle this situation, like 
> a
> mlx5_flow_query_counters() and call it from this case.  It will clearly ease 
> the
> readability and maintenance.
> 
Will be updated according to your suggestion.

> Thanks,
> 
> --
> Nélio Laranjeiro
> 6WIND
Thanks,
Ori Kam


[dpdk-dev] [PATCH v1 2/2] net/mlx5: add multiple process support

2017-08-24 Thread Xueming Li
The PMD uses Verbs objects which were not available in shared memory; in
addition, due to IO pages, it was not possible to use the primary
process Tx queues from a secondary process.

This patch modifies the location where Verbs objects are allocated (from
the process memory address space to the shared memory address space),
thus allowing a secondary process to use those objects by mapping this
shared memory space into its own address space.
For Tx IO pages, it uses a Unix socket to retrieve the communication
channel with the kernel driver from the primary process; this is
necessary to remap those pages into the secondary process memory space
and thus use the same Tx queues.

This is only supported from Linux kernel (v4.14) and rdma-core (v14).

Cc: Nelio Laranjeiro 
Signed-off-by: Xueming Li 
---
 doc/guides/nics/mlx5.rst   |   3 +-
 drivers/net/mlx5/Makefile  |   1 +
 drivers/net/mlx5/mlx5.c| 132 --
 drivers/net/mlx5/mlx5.h|  18 +--
 drivers/net/mlx5/mlx5_ethdev.c | 215 ++
 drivers/net/mlx5/mlx5_rxq.c|  41 --
 drivers/net/mlx5/mlx5_rxtx.h   |   5 +-
 drivers/net/mlx5/mlx5_socket.c | 294 +
 drivers/net/mlx5/mlx5_txq.c|  89 -
 9 files changed, 501 insertions(+), 297 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_socket.c

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index a68b7ad..9eeada4 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -87,7 +87,7 @@ Features
 - Flow director (RTE_FDIR_MODE_PERFECT, RTE_FDIR_MODE_PERFECT_MAC_VLAN and
   RTE_ETH_FDIR_REJECT).
 - Flow API.
-- Secondary process TX is supported.
+- Secondary process.
 - KVM and VMware ESX SR-IOV modes are supported.
 - RSS hash result is supported.
 - Hardware TSO.
@@ -99,7 +99,6 @@ Limitations
 - Inner RSS for VXLAN frames is not supported yet.
 - Port statistics through software counters only.
 - Hardware checksum RX offloads for VXLAN inner header are not supported yet.
-- Secondary process RX is not supported.
 
 Configuration
 -------------
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 0feed4c..6c8f404 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -52,6 +52,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rss.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_fdir.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mr.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_socket.c
 
 # Basic CFLAGS.
 CFLAGS += -O3
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 39a159c..3002e7e 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -126,6 +126,52 @@ struct mlx5_args {
 }
 
 /**
+ * Verbs callback to allocate memory. This function should allocate space
+ * of the size provided, residing inside a huge page.
+ *
+ * @param[in] size
+ *   The size in bytes of the memory to allocate.
+ * @param[in] data
+ *   A pointer to the callback data.
+ *
+ * @return
+ *   A pointer to the allocated space.
+ */
+static void *
+mlx5_extern_alloc_buf(size_t size, void *data)
+{
+   struct priv *priv = data;
+   void *ret;
+   size_t alignment = sysconf(_SC_PAGESIZE);
+
+   assert(data != NULL);
+   assert(!mlx5_is_secondary());
+
+   ret = rte_malloc_socket(__func__, size, alignment,
+   priv->dev->device->numa_node);
+   DEBUG("Extern alloc size: %zu, align: %zu: %p", size, alignment, ret);
+   return ret;
+}
+
+/**
+ * Verbs callback to free memory.
+ *
+ * @param[in] ptr
+ *   A pointer to the memory to free.
+ * @param[in] data
+ *   A pointer to the callback data.
+ */
+static void
+mlx5_extern_free_buf(void *ptr, void *data __rte_unused)
+{
+   assert(data != NULL);
+   assert(!mlx5_is_secondary());
+
+   DEBUG("Extern free request: %p", ptr);
+   rte_free(ptr);
+}
+
+/**
  * DPDK callback to close the device.
  *
  * Destroy all queues and objects, free memory.
@@ -203,6 +249,7 @@ struct mlx5_args {
}
if (priv->reta_idx != NULL)
rte_free(priv->reta_idx);
+   priv_socket_uninit(priv);
priv_unlock(priv);
memset(priv, 0, sizeof(*priv));
 }
@@ -526,6 +573,7 @@ struct mlx5_args {
assert(err > 0);
return -err;
}
+   err = 0; /* previous errors are handled if attr_ctx is NULL. */
ibv_dev = list[i];
 
DEBUG("device opened");
@@ -555,6 +603,40 @@ struct mlx5_args {
.tso = MLX5_ARG_UNSET,
};
 
+   mlx5_dev[idx].ports |= test;
+   if (mlx5_is_secondary()) {
+   /* from rte_ethdev.c */
+   char name[RTE_ETH_NAME_MAX_LEN];
+
+   snprintf(name, sizeof(name), "%s port %u",
+ibv_get_device_name(ibv_dev), port);
+   eth_d

[dpdk-dev] [PATCH v1 1/2] net/mlx5: change eth device reference for secondary process

2017-08-24 Thread Xueming Li
The rte_eth_dev objects created by the primary process were not available
in the secondary process; it was not possible to use the primary process's
local memory objects from a secondary process.

This patch modifies the references to the primary process's rte_eth_dev
object, using the secondary process's local rte_eth_dev instead.

Cc: Nelio Laranjeiro 
Signed-off-by: Xueming Li 
---
 drivers/net/mlx5/mlx5.h |  6 ++---
 drivers/net/mlx5/mlx5_ethdev.c  | 52 ++---
 drivers/net/mlx5/mlx5_fdir.c|  3 +++
 drivers/net/mlx5/mlx5_rss.c |  3 +++
 drivers/net/mlx5/mlx5_rxq.c |  2 ++
 drivers/net/mlx5/mlx5_trigger.c |  4 ++--
 6 files changed, 41 insertions(+), 29 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 684a603..2dee07c 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -95,7 +95,7 @@ struct mlx5_xstats_ctrl {
 };
 
 struct priv {
-   struct rte_eth_dev *dev; /* Ethernet device. */
+   struct rte_eth_dev *dev; /* Ethernet device of master process. */
struct ibv_context *ctx; /* Verbs context. */
struct ibv_device_attr_ex device_attr; /* Device properties. */
struct ibv_pd *pd; /* Protection Domain. */
@@ -223,8 +223,8 @@ int mlx5_ibv_device_to_pci_addr(const struct ibv_device *,
 int mlx5_set_link_down(struct rte_eth_dev *dev);
 int mlx5_set_link_up(struct rte_eth_dev *dev);
 struct priv *mlx5_secondary_data_setup(struct priv *priv);
-void priv_select_tx_function(struct priv *);
-void priv_select_rx_function(struct priv *);
+void mlx5_dev_select_tx_function(struct rte_eth_dev *dev);
+void mlx5_dev_select_rx_function(struct rte_eth_dev *dev);
 
 /* mlx5_mac.c */
 
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index f5167e0..fce7dd5 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1038,7 +1038,7 @@ struct priv *
 * burst function again.
 */
if (!ret)
-   priv_select_rx_function(priv);
+   mlx5_dev_select_rx_function(dev);
 out:
priv_unlock(priv);
assert(ret >= 0);
@@ -1347,7 +1347,7 @@ struct priv *
 /**
  * Change the link state (UP / DOWN).
  *
- * @param priv
+ * @param dev
  *   Pointer to Ethernet device structure.
  * @param up
  *   Nonzero for link up, otherwise link down.
@@ -1356,17 +1356,17 @@ struct priv *
  *   0 on success, errno value on failure.
  */
 static int
-priv_set_link(struct priv *priv, int up)
+mlx5_dev_set_link(struct rte_eth_dev *dev, int up)
 {
-   struct rte_eth_dev *dev = priv->dev;
+   struct priv *priv = dev->data->dev_private;
int err;
 
if (up) {
err = priv_set_flags(priv, ~IFF_UP, IFF_UP);
if (err)
return err;
-   priv_select_tx_function(priv);
-   priv_select_rx_function(priv);
+   mlx5_dev_select_tx_function(dev);
+   mlx5_dev_select_rx_function(dev);
} else {
err = priv_set_flags(priv, ~IFF_UP, ~IFF_UP);
if (err)
@@ -1393,7 +1393,7 @@ struct priv *
int err;
 
priv_lock(priv);
-   err = priv_set_link(priv, 0);
+   err = mlx5_dev_set_link(dev, 0);
priv_unlock(priv);
return err;
 }
@@ -1414,7 +1414,7 @@ struct priv *
int err;
 
priv_lock(priv);
-   err = priv_set_link(priv, 1);
+   err = mlx5_dev_set_link(dev, 1);
priv_unlock(priv);
return err;
 }
@@ -1560,8 +1560,8 @@ struct priv *
rte_mb();
priv->dev->data = &sd->data;
rte_mb();
-   priv_select_tx_function(priv);
-   priv_select_rx_function(priv);
+   mlx5_dev_select_tx_function(priv->dev);
+   mlx5_dev_select_rx_function(priv->dev);
priv_unlock(priv);
 end:
/* More sanity checks. */
@@ -1579,30 +1579,32 @@ struct priv *
 /**
  * Configure the TX function to use.
  *
- * @param priv
- *   Pointer to private structure.
+ * @param dev
+ *   Pointer to device structure.
  */
 void
-priv_select_tx_function(struct priv *priv)
+mlx5_dev_select_tx_function(struct rte_eth_dev *dev)
 {
-   priv->dev->tx_pkt_burst = mlx5_tx_burst;
+   struct priv *priv = dev->data->dev_private;
+
+   dev->tx_pkt_burst = mlx5_tx_burst;
/* Select appropriate TX function. */
if (priv->mps == MLX5_MPW_ENHANCED) {
if (priv_check_vec_tx_support(priv) > 0) {
if (priv_check_raw_vec_tx_support(priv) > 0)
-   priv->dev->tx_pkt_burst = mlx5_tx_burst_raw_vec;
+   dev->tx_pkt_burst = mlx5_tx_burst_raw_vec;
else
-   priv->dev->tx_pkt_burst = mlx5_tx_burst_vec;
+   dev->tx_pkt_burst = mlx5_tx_burst_vec;
DEBUG("selected Enhanced MPW TX vectorized function");
} else {
-   priv->dev->tx_pk

Re: [dpdk-dev] [RFC] remove redundant file header note

2017-08-24 Thread Ferruh Yigit
On 8/23/2017 9:40 PM, Thomas Monjalon wrote:
> 21/08/2017 15:51, Ferruh Yigit:
>> Some of the "All rights reserved." notes look like duplicates: there is
>> one for each copyright note on the same line, and an extra one after the
>> copyright notes. A sample is in the patch below.
>>
>> Although this looks like a duplication, I am not sure if this is a legal
>> requirement, or legally has a meaning.
> 
> I think the whole sentence "All rights reserved" has no meaning
> and could be removed in both lines, but I am not a lawyer :)
> I'm afraid lawyers won't take the risk to change these lines.

I also don't dare to take that risk :)
Let me send a patch just to remove standalone "All rights reserved." lines.

> 
> [...]
>>  #   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
>> -#   All rights reserved.
> 
> You could also remove "(c)" after Copyright.
> 



Re: [dpdk-dev] [PATCH 0/3] *** timer library enhancements ***

2017-08-24 Thread Carrillo, Erik G


> -Original Message-
> From: Wiles, Keith
> Sent: Wednesday, August 23, 2017 4:05 PM
> To: Carrillo, Erik G 
> Cc: rsanf...@akamai.com; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 0/3] *** timer library enhancements ***
> 
> 
> > On Aug 23, 2017, at 2:28 PM, Carrillo, Erik G 
> wrote:
> >
> >>
> >> -Original Message-
> >> From: Wiles, Keith
> >> Sent: Wednesday, August 23, 2017 11:50 AM
> >> To: Carrillo, Erik G 
> >> Cc: rsanf...@akamai.com; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH 0/3] *** timer library enhancements
> >> ***
> >>
> >>
> >>> On Aug 23, 2017, at 11:19 AM, Carrillo, Erik G
> >>> 
> >> wrote:
> >>>
> >>>
> >>>
>  -Original Message-
>  From: Wiles, Keith
>  Sent: Wednesday, August 23, 2017 10:02 AM
>  To: Carrillo, Erik G 
>  Cc: rsanf...@akamai.com; dev@dpdk.org
>  Subject: Re: [dpdk-dev] [PATCH 0/3] *** timer library enhancements
>  ***
> 
> 
> > On Aug 23, 2017, at 9:47 AM, Gabriel Carrillo
> > 
>  wrote:
> >
> > In the current implementation of the DPDK timer library, timers
> > can be created and set to be handled by a target lcore by adding
> > it to a skiplist that corresponds to that lcore.  However, if an
> > application enables multiple lcores, and each of these lcores
> > repeatedly attempts to install timers on the same target lcore,
> > overall application throughput will be reduced as all lcores
> > contend to acquire the lock guarding the single skiplist of pending
> timers.
> >
> > This patchset addresses this scenario by adding an array of
> > skiplists to each lcore's priv_timer struct, such that when lcore
> > i installs a timer on lcore k, the timer will be added to the ith
> > skiplist for lcore k.  If lcore j installs a timer on lcore k
> > simultaneously, lcores i and j can both proceed since they will be
> > acquiring different locks for different lists.
> >
> > When lcore k processes its pending timers, it will traverse each
> > skiplist in its array and acquire a skiplist's lock while a run
> > list is broken out; meanwhile, all other lists can continue to be
> modified.
> > Then, all run lists for lcore k are collected and traversed
> > together so timers are executed in their global order.
> 
>  What is the performance and/or latency added to the timeout now?
> 
>  I worry about the case when just about all of the cores are
>  enabled, which could be as high was 128 or more now.
> >>>
> >>> There is a case in the timer_perf_autotest that runs
> >>> rte_timer_manage
> >> with zero timers that can give a sense of the added latency.   When run
> with
> >> one lcore, it completes in around 25 cycles.  When run with 43 lcores
> >> (the highest I have access to at the moment), rte_timer_manage
> >> completes in around 155 cycles.  So it looks like each added lcore
> >> adds around 3 cycles of overhead for checking empty lists in my testing.
> >>
> >> Does this mean we have only 25 cycles on the current design or is the
> >> 25 cycles for the new design?
> >>
> >
> > Both - when run with one lcore, the new design becomes equivalent to the
> original one.  I tested the current design to confirm.
> 
> Good thanks
> 
> >
> >> If for the new design, then what is the old design cost compared to
> >> the new cost.
> >>
> >> I also think we need the call to a timer function in the calculation,
> >> just to make sure we have at least one timer in the list and we
> >> account for any short cuts in the code for no timers active.
> >>
> >
> > Looking at the numbers for non-empty lists in timer_perf_autotest, the
> overhead appears to fall away.  Here are some representative runs for
> timer_perf_autotest:
> >
> > 43 lcores enabled, installing 1M timers on an lcore and processing them
> with current design:
> >
> > <...snipped...>
> > Appending 1000000 timers
> > Time for 1000000 timers: 424066294 (193ms), Time per timer: 424 (0us)
> > Time for 1000000 callbacks: 73124504 (33ms), Time per callback: 73 (0us)
> > Resetting 1000000 timers
> > Time for 1000000 timers: 1406756396 (641ms), Time per timer: 1406 (1us)
> > <...snipped...>
> >
> > 43 lcores enabled, installing 1M timers on an lcore and processing them
> with proposed design:
> >
> > <...snipped...>
> > Appending 1000000 timers
> > Time for 1000000 timers: 382912762 (174ms), Time per timer: 382 (0us)
> > Time for 1000000 callbacks: 79194418 (36ms), Time per callback: 79 (0us)
> > Resetting 1000000 timers
> > Time for 1000000 timers: 1427189116 (650ms), Time per timer: 1427 (1us)
> > <...snipped...>
> 
> It looks OK then. The main concern I had was the timers in Pktgen and
> someone reporting increased jitter, latency, or reduced performance. I guess
> I will just have to wait and see.
> 
> >
> > The above are not averages, so the numbers don't really indicate which is
> faster, but they show that the overhead of the proposed design should not
> be appreciable.

[dpdk-dev] [PATCH 1/5] lib: add Generic Segmentation Offload API framework

2017-08-24 Thread Jiayu Hu
Generic Segmentation Offload (GSO) is a SW technique to split large
packets into small ones. Akin to TSO, GSO enables applications to
operate on large packets, thus reducing per-packet processing overhead.

To enable more flexibility to applications, DPDK GSO is implemented
as a standalone library. Applications explicitly use the GSO library
to segment packets. This patch introduces the GSO API framework to DPDK.

The GSO library provides a segmentation API, rte_gso_segment(), for
applications. It splits an input packet into small ones in each
invocation. The GSO library refers to these small packets generated
by rte_gso_segment() as GSO segments. When all GSO segments are freed,
the input packet is freed automatically.

Signed-off-by: Jiayu Hu 
Signed-off-by: Mark Kavanagh 
---
 config/common_base |   5 ++
 lib/Makefile   |   2 +
 lib/librte_gso/Makefile|  49 
 lib/librte_gso/rte_gso.c   |  47 
 lib/librte_gso/rte_gso.h   | 111 +
 lib/librte_gso/rte_gso_version.map |   7 +++
 mk/rte.app.mk  |   1 +
 7 files changed, 222 insertions(+)
 create mode 100644 lib/librte_gso/Makefile
 create mode 100644 lib/librte_gso/rte_gso.c
 create mode 100644 lib/librte_gso/rte_gso.h
 create mode 100644 lib/librte_gso/rte_gso_version.map

diff --git a/config/common_base b/config/common_base
index 5e97a08..603e340 100644
--- a/config/common_base
+++ b/config/common_base
@@ -652,6 +652,11 @@ CONFIG_RTE_LIBRTE_IP_FRAG_TBL_STAT=n
 CONFIG_RTE_LIBRTE_GRO=y
 
 #
+# Compile GSO library
+#
+CONFIG_RTE_LIBRTE_GSO=y
+
+#
 # Compile librte_meter
 #
 CONFIG_RTE_LIBRTE_METER=y
diff --git a/lib/Makefile b/lib/Makefile
index 86caba1..3d123f4 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -108,6 +108,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
+DIRS-$(CONFIG_RTE_LIBRTE_GSO) += librte_gso
+DEPDIRS-librte_gso := librte_eal librte_mbuf librte_ether librte_net
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_gso/Makefile b/lib/librte_gso/Makefile
new file mode 100644
index 000..aeaacbc
--- /dev/null
+++ b/lib/librte_gso/Makefile
@@ -0,0 +1,49 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2017 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_gso.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
+
+EXPORT_MAP := rte_gso_version.map
+
+LIBABIVER := 1
+
+#source files
+SRCS-$(CONFIG_RTE_LIBRTE_GSO) += rte_gso.c
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_GSO)-include += rte_gso.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c
new file mode 100644
index 000..b81afce
--- /dev/null
+++ b/lib/librte_gso/rte_gso.c
@@ -0,0 +1,47 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source c

[dpdk-dev] [PATCH 0/5] Support TCP/IPv4, VxLAN and GRE GSO in DPDK

2017-08-24 Thread Jiayu Hu
Generic Segmentation Offload (GSO) is a SW technique to split large
packets into small ones. Akin to TSO, GSO enables applications to
operate on large packets, thus reducing per-packet processing overhead.

To enable more flexibility to applications, DPDK GSO is implemented
as a standalone library. Applications explicitly use the GSO library
to segment packets. This patch adds GSO support to DPDK for specific
packet types: specifically, TCP/IPv4, VxLAN, and GRE.

The first patch introduces the GSO API framework. The second patch
adds GSO support for TCP/IPv4 packets (containing an optional VLAN
tag). The third patch adds GSO support for VxLAN packets that contain
outer IPv4, and inner TCP/IPv4 headers (plus optional inner and/or 
outer VLAN tags). The fourth patch adds GSO support for GRE packets
that contain outer IPv4, and inner TCP/IPv4 headers (with optional 
outer VLAN tag). The last patch in the series enables TCP/IPv4, VxLAN,
and GRE GSO in testpmd's checksum forwarding engine.

The performance of TCP/IPv4 GSO on a 10Gbps link is demonstrated using
iperf. Setup for the test is described as follows:

a. Connect 2 x 10Gbps physical ports (P0, P1), together physically.
b. Launch testpmd with P0 and a vhost-user port, and use csum
   forwarding engine.
c. Select IP and TCP HW checksum calculation for P0; select TCP HW
   checksum calculation for vhost-user port.
d. Launch a VM with csum and tso offloading enabled.
e. Run iperf-client on virtio-net port in the VM to send TCP packets.

With GSO enabled for P0 in testpmd, observed iperf throughput is ~9Gbps.
The experimental data of VxLAN and GRE will be shown later.

Jiayu Hu (3):
  lib: add Generic Segmentation Offload API framework
  gso/lib: add TCP/IPv4 GSO support
  app/testpmd: enable TCP/IPv4, VxLAN and GRE GSO

Mark Kavanagh (2):
  lib/gso: add VxLAN GSO support
  lib/gso: add GRE GSO support

 app/test-pmd/cmdline.c  | 121 +
 app/test-pmd/config.c   |  25 ++
 app/test-pmd/csumonly.c |  68 -
 app/test-pmd/testpmd.c  |   9 +
 app/test-pmd/testpmd.h  |  10 +
 config/common_base  |   5 +
 lib/Makefile|   2 +
 lib/librte_eal/common/include/rte_log.h |   1 +
 lib/librte_gso/Makefile |  52 
 lib/librte_gso/gso_common.c | 431 
 lib/librte_gso/gso_common.h | 180 +
 lib/librte_gso/gso_tcp.c|  82 ++
 lib/librte_gso/gso_tcp.h|  73 ++
 lib/librte_gso/gso_tunnel.c |  62 +
 lib/librte_gso/gso_tunnel.h |  46 
 lib/librte_gso/rte_gso.c| 100 
 lib/librte_gso/rte_gso.h| 122 +
 lib/librte_gso/rte_gso_version.map  |   7 +
 mk/rte.app.mk   |   1 +
 19 files changed, 1392 insertions(+), 5 deletions(-)
 create mode 100644 lib/librte_gso/Makefile
 create mode 100644 lib/librte_gso/gso_common.c
 create mode 100644 lib/librte_gso/gso_common.h
 create mode 100644 lib/librte_gso/gso_tcp.c
 create mode 100644 lib/librte_gso/gso_tcp.h
 create mode 100644 lib/librte_gso/gso_tunnel.c
 create mode 100644 lib/librte_gso/gso_tunnel.h
 create mode 100644 lib/librte_gso/rte_gso.c
 create mode 100644 lib/librte_gso/rte_gso.h
 create mode 100644 lib/librte_gso/rte_gso_version.map

-- 
2.7.4



[dpdk-dev] [PATCH 3/5] lib/gso: add VxLAN GSO support

2017-08-24 Thread Jiayu Hu
From: Mark Kavanagh 

This patch adds GSO support for VxLAN-encapsulated packets. Supported
VxLAN packets must have an outer IPv4 header (prepended by an optional
VLAN tag), and contain an inner TCP/IPv4 packet (with an optional inner
VLAN tag).

VxLAN GSO assumes that all input packets have correct checksums and
doesn't update checksums for output packets. Additionally, it doesn't
process IP fragmented packets.

As with TCP/IPv4 GSO, VxLAN GSO uses a two-segment MBUF to organize each
output packet, which mandates support for multi-segment mbufs in the TX
functions of the NIC driver. Also, if a packet is GSOed, VxLAN GSO
reduces its MBUF refcnt by 1. As a result, when all of its GSOed
segments are freed, the packet is freed automatically.

Signed-off-by: Mark Kavanagh 
Signed-off-by: Jiayu Hu 
---
 lib/librte_gso/Makefile |   1 +
 lib/librte_gso/gso_common.c | 109 ++--
 lib/librte_gso/gso_common.h |  41 -
 lib/librte_gso/gso_tunnel.c |  62 +
 lib/librte_gso/gso_tunnel.h |  46 +++
 lib/librte_gso/rte_gso.c|  12 -
 lib/librte_gso/rte_gso.h|   4 ++
 7 files changed, 268 insertions(+), 7 deletions(-)
 create mode 100644 lib/librte_gso/gso_tunnel.c
 create mode 100644 lib/librte_gso/gso_tunnel.h

diff --git a/lib/librte_gso/Makefile b/lib/librte_gso/Makefile
index 0f8e38f..a4d1a81 100644
--- a/lib/librte_gso/Makefile
+++ b/lib/librte_gso/Makefile
@@ -44,6 +44,7 @@ LIBABIVER := 1
 SRCS-$(CONFIG_RTE_LIBRTE_GSO) += rte_gso.c
 SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_common.c
 SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tcp.c
+SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tunnel.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_GSO)-include += rte_gso.h
diff --git a/lib/librte_gso/gso_common.c b/lib/librte_gso/gso_common.c
index 2b54fbd..65cec44 100644
--- a/lib/librte_gso/gso_common.c
+++ b/lib/librte_gso/gso_common.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "gso_common.h"
 
@@ -156,18 +157,60 @@ gso_do_segment(struct rte_mbuf *pkt,
return nb_segs;
 }
 
+static inline void parse_ethernet(struct ether_hdr *eth_hdr,
+   struct rte_mbuf *pkt);
+
+static inline void
+parse_vxlan(struct udp_hdr *udp_hdr, struct rte_mbuf *pkt)
+{
+   struct ether_hdr *eth_hdr;
+
+   eth_hdr = (struct ether_hdr *)((char *)udp_hdr +
+   sizeof(struct udp_hdr) +
+   sizeof(struct vxlan_hdr));
+
+   pkt->packet_type |= RTE_PTYPE_TUNNEL_VXLAN;
+   pkt->outer_l2_len = pkt->l2_len;
+   parse_ethernet(eth_hdr, pkt);
+   pkt->l2_len += ETHER_VXLAN_HLEN; /* add udp + vxlan */
+}
+
+static inline void
+parse_udp(struct udp_hdr *udp_hdr, struct rte_mbuf *pkt)
+{
+   /* Outer UDP header of VxLAN packet */
+   if (udp_hdr->dst_port == rte_cpu_to_be_16(VXLAN_DEFAULT_PORT)) {
+   pkt->packet_type |= RTE_PTYPE_L4_UDP;
+   parse_vxlan(udp_hdr, pkt);
+   } else {
+   /* IPv4/UDP packet */
+   pkt->l4_len = sizeof(struct udp_hdr);
+   pkt->packet_type |= RTE_PTYPE_L4_UDP;
+   }
+}
+
 static inline void
 parse_ipv4(struct ipv4_hdr *ipv4_hdr, struct rte_mbuf *pkt)
 {
struct tcp_hdr *tcp_hdr;
+   struct udp_hdr *udp_hdr;
 
switch (ipv4_hdr->next_proto_id) {
case IPPROTO_TCP:
-   pkt->packet_type |= RTE_PTYPE_L4_TCP;
+   if (IS_VXLAN_PKT(pkt)) {
+   pkt->outer_l3_len = pkt->l3_len;
+   pkt->packet_type |= RTE_PTYPE_INNER_L4_TCP;
+   } else
+   pkt->packet_type |= RTE_PTYPE_L4_TCP;
pkt->l3_len = IPv4_HDR_LEN(ipv4_hdr);
tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
pkt->l4_len = TCP_HDR_LEN(tcp_hdr);
break;
+   case IPPROTO_UDP:
+   pkt->l3_len = IPv4_HDR_LEN(ipv4_hdr);
+   udp_hdr = (struct udp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
+   parse_udp(udp_hdr, pkt);
+   break;
}
 }
 
@@ -182,13 +225,21 @@ parse_ethernet(struct ether_hdr *eth_hdr, struct rte_mbuf 
*pkt)
if (ethertype == ETHER_TYPE_VLAN) {
vlan_hdr = (struct vlan_hdr *)(eth_hdr + 1);
pkt->l2_len = sizeof(struct vlan_hdr);
-   pkt->packet_type |= RTE_PTYPE_L2_ETHER_VLAN;
+   if (IS_VXLAN_PKT(pkt))
+   pkt->packet_type |= RTE_PTYPE_INNER_L2_ETHER_VLAN;
+   else
+   pkt->packet_type |= RTE_PTYPE_L2_ETHER_VLAN;
ethertype = rte_be_to_cpu_16(vlan_hdr->eth_proto);
-   }
+   } else
+   pkt->l2_len = 0;
 
switch (ethertype) {
case ETHER_TYPE_IPv4:
-   if (IS_VLAN_PKT(pkt)) {
+   if (IS_VXLAN_PKT(pkt)) {
+   if (!IS_INNER_VLAN_PKT(pkt))
+   

[dpdk-dev] [PATCH 2/5] gso/lib: add TCP/IPv4 GSO support

2017-08-24 Thread Jiayu Hu
This patch adds GSO support for TCP/IPv4 packets. Supported packets
may include a single VLAN tag. TCP/IPv4 GSO assumes that all input
packets have correct checksums, and doesn't update checksums for output
packets (the responsibility for this lies with the application).
Additionally, TCP/IPv4 GSO doesn't process IP fragmented packets.

TCP/IPv4 GSO uses two chained MBUFs, one direct MBUF and one indirect
MBUF, to organize an output packet. Note that we refer to these two
chained MBUFs as a two-segment MBUF. The direct MBUF stores the packet
header, while the indirect mbuf simply points to a location within the
original packet's payload. Consequently, use of the GSO library requires
multi-segment MBUF support in the TX functions of the NIC driver.

If a packet is GSOed, TCP/IPv4 GSO reduces its MBUF refcnt by 1. As a
result, when all of its GSOed segments are freed, the packet is freed
automatically.

Signed-off-by: Jiayu Hu 
Signed-off-by: Mark Kavanagh 
---
 lib/librte_eal/common/include/rte_log.h |   1 +
 lib/librte_gso/Makefile |   2 +
 lib/librte_gso/gso_common.c | 270 
 lib/librte_gso/gso_common.h | 120 ++
 lib/librte_gso/gso_tcp.c|  82 ++
 lib/librte_gso/gso_tcp.h|  73 +
 lib/librte_gso/rte_gso.c|  44 +-
 lib/librte_gso/rte_gso.h|   3 +
 8 files changed, 593 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_gso/gso_common.c
 create mode 100644 lib/librte_gso/gso_common.h
 create mode 100644 lib/librte_gso/gso_tcp.c
 create mode 100644 lib/librte_gso/gso_tcp.h

diff --git a/lib/librte_eal/common/include/rte_log.h 
b/lib/librte_eal/common/include/rte_log.h
index ec8dba7..2fa1199 100644
--- a/lib/librte_eal/common/include/rte_log.h
+++ b/lib/librte_eal/common/include/rte_log.h
@@ -87,6 +87,7 @@ extern struct rte_logs rte_logs;
 #define RTE_LOGTYPE_CRYPTODEV 17 /**< Log related to cryptodev. */
 #define RTE_LOGTYPE_EFD   18 /**< Log related to EFD. */
 #define RTE_LOGTYPE_EVENTDEV  19 /**< Log related to eventdev. */
+#define RTE_LOGTYPE_GSO   20 /**< Log related to GSO. */
 
 /* these log types can be used in an application */
 #define RTE_LOGTYPE_USER1 24 /**< User-defined log type 1. */
diff --git a/lib/librte_gso/Makefile b/lib/librte_gso/Makefile
index aeaacbc..0f8e38f 100644
--- a/lib/librte_gso/Makefile
+++ b/lib/librte_gso/Makefile
@@ -42,6 +42,8 @@ LIBABIVER := 1
 
 #source files
 SRCS-$(CONFIG_RTE_LIBRTE_GSO) += rte_gso.c
+SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_common.c
+SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tcp.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_GSO)-include += rte_gso.h
diff --git a/lib/librte_gso/gso_common.c b/lib/librte_gso/gso_common.c
new file mode 100644
index 000..2b54fbd
--- /dev/null
+++ b/lib/librte_gso/gso_common.c
@@ -0,0 +1,270 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+
+#include 
+
+#include 
+#include 
+#include 
+
+#include "gso_common.h"
+
+static inline void
+hdr_segment_init(struct rte_mbuf *hdr_segment, struct rte_mbuf *pkt,
+   uint16_t pkt_hdr_offset)
+{
+   /* copy mbuf metadata */
+   hdr_segment->nb_segs = 1;
+   hdr_segment->port = pkt->port;
+   hdr_se

[dpdk-dev] [PATCH 4/5] lib/gso: add GRE GSO support

2017-08-24 Thread Jiayu Hu
From: Mark Kavanagh 

This patch adds GSO support for GRE-tunneled packets. Supported GRE
packets must contain an outer IPv4 header, and inner TCP/IPv4 headers.
They may also contain a single VLAN tag. GRE GSO assumes that all input
packets have correct checksums and doesn't update checksums for output
packets. Additionally, it doesn't process IP fragmented packets.

As with VxLAN GSO, GRE GSO uses a two-segment MBUF to organize each
output packet, which requires multi-segment mbuf support in the TX
functions of the NIC driver. Also, if a packet is GSOed, GRE GSO reduces
its MBUF refcnt by 1. As a result, when all of its GSOed segments are
freed, the packet is freed automatically.

Signed-off-by: Mark Kavanagh 
Signed-off-by: Jiayu Hu 
---
 lib/librte_gso/gso_common.c | 66 +++--
 lib/librte_gso/gso_common.h | 21 +++
 lib/librte_gso/rte_gso.c|  5 +++-
 lib/librte_gso/rte_gso.h|  4 +++
 4 files changed, 93 insertions(+), 3 deletions(-)

diff --git a/lib/librte_gso/gso_common.c b/lib/librte_gso/gso_common.c
index 65cec44..b3e7f9d 100644
--- a/lib/librte_gso/gso_common.c
+++ b/lib/librte_gso/gso_common.c
@@ -37,6 +37,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -159,6 +160,8 @@ gso_do_segment(struct rte_mbuf *pkt,
 
 static inline void parse_ethernet(struct ether_hdr *eth_hdr,
struct rte_mbuf *pkt);
+static inline void parse_ipv4(struct ipv4_hdr *ipv4_hdr,
+   struct rte_mbuf *pkt);
 
 static inline void
 parse_vxlan(struct udp_hdr *udp_hdr, struct rte_mbuf *pkt)
@@ -190,15 +193,29 @@ parse_udp(struct udp_hdr *udp_hdr, struct rte_mbuf *pkt)
 }
 
 static inline void
+parse_gre(struct gre_hdr *gre_hdr, struct rte_mbuf *pkt)
+{
+   struct ipv4_hdr *ipv4_hdr;
+
+   if (gre_hdr->proto == rte_cpu_to_be_16(ETHER_TYPE_IPv4)) {
+   ipv4_hdr = (struct ipv4_hdr *)(gre_hdr + 1);
+   pkt->packet_type |= RTE_PTYPE_INNER_L3_IPV4;
+   parse_ipv4(ipv4_hdr, pkt);
+   }
+}
+
+static inline void
 parse_ipv4(struct ipv4_hdr *ipv4_hdr, struct rte_mbuf *pkt)
 {
+   struct gre_hdr *gre_hdr;
struct tcp_hdr *tcp_hdr;
struct udp_hdr *udp_hdr;
 
switch (ipv4_hdr->next_proto_id) {
case IPPROTO_TCP:
-   if (IS_VXLAN_PKT(pkt)) {
-   pkt->outer_l3_len = pkt->l3_len;
+   if (IS_TUNNEL_PKT(pkt)) {
+   if (IS_VXLAN_PKT(pkt))
+   pkt->outer_l3_len = pkt->l3_len;
pkt->packet_type |= RTE_PTYPE_INNER_L4_TCP;
} else
pkt->packet_type |= RTE_PTYPE_L4_TCP;
@@ -211,6 +228,14 @@ parse_ipv4(struct ipv4_hdr *ipv4_hdr, struct rte_mbuf *pkt)
udp_hdr = (struct udp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
parse_udp(udp_hdr, pkt);
break;
+   case IPPROTO_GRE:
+   gre_hdr = (struct gre_hdr *)(ipv4_hdr + 1);
+   pkt->outer_l2_len = pkt->l2_len;
+   pkt->outer_l3_len = IPv4_HDR_LEN(ipv4_hdr);
+   pkt->l2_len = sizeof(*gre_hdr);
+   pkt->packet_type |= RTE_PTYPE_TUNNEL_GRE;
+   parse_gre(gre_hdr, pkt);
+   break;
}
 }
 
@@ -343,6 +368,43 @@ gso_update_pkt_headers(struct rte_mbuf *pkt, uint16_t 
nb_segments,
sent_seq += seg->next->data_len;
}
break;
+   case ETHER_VLAN_IPv4_GRE_IPv4_TCP_PKT:
+   case ETHER_IPv4_GRE_IPv4_TCP_PKT:
+   outer_ipv4_hdr =
+   (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
+   pkt->outer_l2_len);
+   ipv4_hdr = (struct ipv4_hdr *)((char *)outer_ipv4_hdr +
+   pkt->outer_l3_len + pkt->l2_len);
+   tcp_hdr = (struct tcp_hdr *)(ipv4_hdr + 1);
+
+   /* Retrieve values from original packet */
+   id = rte_be_to_cpu_16(ipv4_hdr->packet_id);
+   outer_id = rte_be_to_cpu_16(outer_ipv4_hdr->packet_id);
+   sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
+
+   for (i = 0; i < nb_segments; i++) {
+   seg = out_segments[i];
+
+   /* Update outer IPv4 header */
+   offset = seg->outer_l2_len;
+   update_ipv4_header(rte_pktmbuf_mtod(seg, char *),
+   offset, seg->pkt_len, outer_id);
+   outer_id++;
+
+   /* Update inner IPv4 header */
+   offset += seg->outer_l3_len + seg->l2_len;
+   update_ipv4_header(rte_pktmbuf_mtod(seg, char *),
+   offset, seg->pkt_len, id);
+   id++;
+
+   /* Update inner TCP header */
+   offset += seg->l3_len;
+ 

[dpdk-dev] [PATCH 5/5] app/testpmd: enable TCP/IPv4, VxLAN and GRE GSO

2017-08-24 Thread Jiayu Hu
This patch adds GSO support to the csum forwarding engine. Oversized
packets transmitted over a GSO-enabled port will undergo segmentation
(with the exception of packet types unsupported by the GSO library).
GSO support is disabled by default.

GSO support may be toggled on a per-port basis, using the command

"set port <port_id> gso on|off".

The maximum packet length for GSO segments may be set with the command

"set port <port_id> gso_segsz <length>"
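Putting the two commands together, a csum-engine session might look like the following (port number 0 and segment size 1500 are only examples):

```
testpmd> set fwd csum
testpmd> set port 0 gso on
testpmd> set port 0 gso_segsz 1500
testpmd> start
```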

Signed-off-by: Jiayu Hu 
Signed-off-by: Mark Kavanagh 
---
 app/test-pmd/cmdline.c  | 121 
 app/test-pmd/config.c   |  25 ++
 app/test-pmd/csumonly.c |  68 +--
 app/test-pmd/testpmd.c  |   9 
 app/test-pmd/testpmd.h  |  10 
 5 files changed, 228 insertions(+), 5 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index cd8c358..754e249 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -431,6 +431,13 @@ static void cmd_help_long_parsed(void *parsed_result,
"Set max flow number and max packet number per-flow"
" for GRO.\n\n"
 
+   "set port (port_id) gso (on|off)\n"
+   "Enable or disable Generic Segmentation Offload in"
+   " csum forwarding engine.\n\n"
+
+   "set port <port_id> gso_segsz <length>\n"
+   "Set max packet length for GSO segment.\n\n"
+
"set fwd (%s)\n"
"Set packet forwarding mode.\n\n"
 
@@ -3963,6 +3970,118 @@ cmdline_parse_inst_t cmd_gro_set = {
},
 };
 
+/* *** ENABLE/DISABLE GSO FOR PORTS *** */
+struct cmd_gso_enable_result {
+   cmdline_fixed_string_t cmd_set;
+   cmdline_fixed_string_t cmd_port;
+   cmdline_fixed_string_t cmd_keyword;
+   cmdline_fixed_string_t cmd_mode;
+   uint8_t cmd_pid;
+};
+
+static void
+cmd_gso_enable_parsed(void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_gso_enable_result *res;
+
+   res = parsed_result;
+   setup_gso(res->cmd_mode, res->cmd_pid);
+}
+
+cmdline_parse_token_string_t cmd_gso_enable_set =
+   TOKEN_STRING_INITIALIZER(struct cmd_gso_enable_result,
+   cmd_set, "set");
+cmdline_parse_token_string_t cmd_gso_enable_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_gso_enable_result,
+   cmd_port, "port");
+cmdline_parse_token_string_t cmd_gso_enable_keyword =
+   TOKEN_STRING_INITIALIZER(struct cmd_gso_enable_result,
+   cmd_keyword, "gso");
+cmdline_parse_token_string_t cmd_gso_enable_mode =
+   TOKEN_STRING_INITIALIZER(struct cmd_gso_enable_result,
+   cmd_mode, "on#off");
+cmdline_parse_token_num_t cmd_gso_enable_pid =
+   TOKEN_NUM_INITIALIZER(struct cmd_gso_enable_result,
+   cmd_pid, UINT8);
+
+cmdline_parse_inst_t cmd_gso_enable = {
+   .f = cmd_gso_enable_parsed,
+   .data = NULL,
+   .help_str = "set port <port_id> gso on|off",
+   .tokens = {
+   (void *)&cmd_gso_enable_set,
+   (void *)&cmd_gso_enable_port,
+   (void *)&cmd_gso_enable_pid,
+   (void *)&cmd_gso_enable_keyword,
+   (void *)&cmd_gso_enable_mode,
+   NULL,
+   },
+};
+
+/* *** SET MAX PACKET LENGTH FOR GSO SEGMENT *** */
+struct cmd_gso_size_result {
+   cmdline_fixed_string_t cmd_set;
+   cmdline_fixed_string_t cmd_port;
+   cmdline_fixed_string_t cmd_keyword;
+   uint16_t cmd_segsz;
+   uint8_t cmd_pid;
+};
+
+static void
+cmd_gso_size_parsed(void *parsed_result,
+  __attribute__((unused)) struct cmdline *cl,
+  __attribute__((unused)) void *data)
+{
+   struct cmd_gso_size_result *res = parsed_result;
+
+   if (port_id_is_invalid(res->cmd_pid, ENABLED_WARN))
+   return;
+
+   if (!strcmp(res->cmd_keyword, "gso_segsz")) {
+   if (res->cmd_segsz == 0) {
+   gso_ports[res->cmd_pid].enable = 0;
+   gso_ports[res->cmd_pid].gso_segsz = 0;
+   printf("Input gso_segsz is 0. Disable GSO for"
+   " port %u\n", res->cmd_pid);
+   } else
+   gso_ports[res->cmd_pid].gso_segsz = res->cmd_segsz;
+
+   }
+}
+
+cmdline_parse_token_string_t cmd_gso_size_set =
+   TOKEN_STRING_INITIALIZER(struct cmd_gso_size_result,
+   cmd_set, "set");
+cmdline_parse_token_string_t cmd_gso_size_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_gso_size_result,
+   cmd_port, "port");
+cmdline_parse_token_string_t cmd_gso_size_keyword =
+   TOKEN_STRING_INITIALIZER(struct cmd_gso_size_result,
+   cmd_k

Re: [dpdk-dev] Byte order of vlan_tci of rte_mbuf is different on different source

2017-08-24 Thread Roger B Melton

Hi folks,

Resurrecting this old thread.  I ran across this issue recently in DPDK 
17.05, and it's also present in 17.08.  It appears no fix was committed.  
I'm working on a solution, but if anyone has a fix in flight please let 
me know.


Regards,
Roger


On 4/29/16 3:29 AM, Olivier Matz wrote:

Hi,

On 04/25/2016 04:35 AM, zhang.xingh...@zte.com.cn wrote:

When using I350 working on SR-IOV mode, we got confused that byte order
of vlan_tci in the VF received packet descriptor is different when the
packet source is different.

1) Packets from VF to VF, the byte order is big-endian. (e.g. 0xF00)
2) Packets from PC to VF, the byte order is little-endian. (e.g. 0xF)

Below is the testing net-work:
 VM0VM1 PC
 VF0VF1  |
   | |   |
   +--+--+   |
  |  |
  PF |
  hypervisor |
  SR-IOV NIC |
  |  |
  |VLAN 15   |
  +-switch---+


We set a breakpoint at the following line of eth_igb_recv_pkts; this is the
vlan_tci value we observed each time.

uint16_t
eth_igb_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts)

 /* Only valid if PKT_RX_VLAN_PKT set in pkt_flags */
 rxm->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan);

In rte_mbuf.h, it is specified that these values (vlan_tci and
vlan_tci_outer) must be stored in CPU order.

It's probably a driver or hardware issue. Note that in linux there is
something that looks similar to your issue:

http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/igb/igb_main.c#L1278

	/* On i350, i354, i210, and i211, loopback VLAN packets
	 * have the tag byte-swapped.
	 */
	if (adapter->hw.mac.type >= e1000_i350)
		set_bit(IGB_RING_FLAG_RX_LB_VLAN_BSWAP, &ring->flags);

I think you could check if the same thing is done in the
dpdk driver.




ZTE Information Security Notice: The information contained in this mail (and 
any attachment transmitted herewith) is privileged and confidential and is 
intended for the exclusive use of the addressee(s).  If you are not an intended 
recipient, any disclosure, reproduction, distribution or other dissemination or 
use of the information contained is strictly prohibited.  If you have received 
this mail in error, please delete it and notify us immediately.


This notice should be removed in public emails.

Regards,
Olivier
.



Re: [dpdk-dev] [PATCH 1/2] net/mlx5: support device removal event

2017-08-24 Thread Matan Azrad
Hi Nelio

> -Original Message-
> From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com]
> Sent: Thursday, August 24, 2017 10:38 AM
> To: Matan Azrad 
> Cc: Adrien Mazarguil ; dev@dpdk.org
> Subject: Re: [PATCH 1/2] net/mlx5: support device removal event
> 
> On Wed, Aug 23, 2017 at 07:44:45PM +, Matan Azrad wrote:
> > Hi Nelio
> >
> > > -Original Message-
> > > From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com]
> > > Sent: Wednesday, August 23, 2017 12:41 PM
> > > To: Matan Azrad 
> > > Cc: Adrien Mazarguil ; dev@dpdk.org
> > > Subject: Re: [PATCH 1/2] net/mlx5: support device removal event
> > >
> > > Hi Matan,
> > >
> > > On Sun, Aug 13, 2017 at 03:25:11PM +0300, Matan Azrad wrote:
> > > > Extend the LSC event handling to support the device removal as well.
> > > > The Verbs library may send several related events, which are
> > > > different from LSC event.
> > > >
> > > > The mlx5 event handling has been made capable of receiving and
> > > > signaling several event types at once.
> > > >
> > > > This support includes next:
> > > > 1. Removal event detection according to the user configuration.
> > > > 2. Calling to all registered mlx5 removal callbacks.
> > > > 3. Capabilities extension to include removal interrupt handling.
> > > >
> > > > Signed-off-by: Matan Azrad 
> > > > ---
> > > >  drivers/net/mlx5/mlx5.c|   2 +-
> > > >  drivers/net/mlx5/mlx5_ethdev.c | 100
> > > > +++--
> > > >  2 files changed, 68 insertions(+), 34 deletions(-)
> > > >
> > > > Hi
> > > > This patch based on top of last Nelio mlx5 cleanup patches.
> > > >
> > > > diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> > > > index
> > > > bd66a7c..1a3d7f1 100644
> > > > --- a/drivers/net/mlx5/mlx5.c
> > > > +++ b/drivers/net/mlx5/mlx5.c
> > > > @@ -865,7 +865,7 @@ static struct rte_pci_driver mlx5_driver = {
> > > > },
> > > > .id_table = mlx5_pci_id_map,
> > > > .probe = mlx5_pci_probe,
> > > > -   .drv_flags = RTE_PCI_DRV_INTR_LSC,
> > > > +   .drv_flags = RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV,
> > > >  };
> > > >
> > > >  /**
> > > > diff --git a/drivers/net/mlx5/mlx5_ethdev.c
> > > > b/drivers/net/mlx5/mlx5_ethdev.c index 57f6237..404d8f4 100644
> > > > --- a/drivers/net/mlx5/mlx5_ethdev.c
> > > > +++ b/drivers/net/mlx5/mlx5_ethdev.c
> > > > @@ -1112,47 +1112,75 @@ mlx5_ibv_device_to_pci_addr(const struct
> > > > ibv_device *device,  }
> > > >
> > > >  /**
> > > > - * Link status handler.
> > > > + * Update the link status.
> > > > + * Set alarm if the device link status is inconsistent.
> > >
> > > Adding such comment should also comment about the issue this alarm
> > > is solving i.e. why the link is inconsistent and why the alarm help
> > > to fix the issue.
> > >
> > I didn't see any comments about that in the old code, hence I didn't
> > write it.
> 
> That's normal, as the alarm is a workaround specifically necessary for the
> Mellanox PMD.
> Now you explicitly announce that this function programs an alarm; the
> question is why it is necessary.
> 

> > I think you right and this could be added.(even before this patch).
> 
> No, in the current code it updates the link; if it is inconsistent it tries
> to have a correct link ASAP.  There is no need to state that this function
> will program an alarm; it is internal cooking.
> 
> > > >   *
> > > >   * @param priv
> > > >   *   Pointer to private structure.
> > > > - * @param dev
> > > > - *   Pointer to the rte_eth_dev structure.
> > > >   *
> > > >   * @return
> > > > - *   Nonzero if the callback process can be called immediately.
> > > > + *   Zero if alarm is not set and the link status is consistent.
> > > >   */
> > > >  static int
> > > > -priv_dev_link_status_handler(struct priv *priv, struct
> > > > rte_eth_dev
> > > > *dev)
> > > > +priv_link_status_alarm_update(struct priv *priv)
> > >
> > > The old name is more accurate; the fact we need to program an alarm
> > > is a workaround to get the correct status from ethtool.  If it was
> > > possible to avoid it, this alarm would not exist.
> > >
> > Probably because of the git +- format and this specific patch you got
> confused here.
> 
> No I applied your patch and read your code.  You did not understand my
> comment.
>
I thought that because you said "old name" in relation to a new function name :) 
 
> >[...]
> 
> When I read:
> 
> >  void
> >  mlx5_dev_link_status_handler(void *arg)  {
> > struct rte_eth_dev *dev = arg;
> > struct priv *priv = dev->data->dev_private;
> > int ret;
> >
> > priv_lock(priv);
> > assert(priv->pending_alarm == 1);
> > priv->pending_alarm = 0;
> > -   ret = priv_dev_link_status_handler(priv, dev);
> > +   ret = priv_link_status_alarm_update(priv);
> > priv_unlock(priv);
> > -   if (ret)
> > +   if (!ret)
> > _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC,
> NULL,
> > - 

[dpdk-dev] 17.11 Mellanox Roadmap

2017-08-24 Thread Shahaf Shuler
[Apologies for the late notice]

Below are the features that we're planning to submit for the 17.11 release:



- new ethdev Rx/Tx offloads API

- new IPsec inline offload API

- rework mlx drivers to support upstream ibverbs from 
linux-rdma.org

- support secondary process in mlx5

- support plug out in mlx5

- support Raw Rx timestamp in mlx5

- support flow counter in mlx5 and mlx4

- new example for rte_flow API

- Isolate mode rework


Preliminary patches for above work are already on Mailing list.


--Shahaf



Re: [dpdk-dev] cuckoo hash in dpdk

2017-08-24 Thread Andriy Berestovskyy
Hey Pragash,
I am not the author of the code, but I guess it is done that way
because modern compilers recognize power-of-two constants and
substitute division and modulo operations with the corresponding bit
manipulations.

Just try to compile a small program like the following:

#include <stdio.h>

volatile unsigned a = 123, b, c;
int main(void)
{
	b = a / 4;
	c = a % 4;
	printf("%x %x %x\n", a, b, c);
	return 0;
}


and then disassemble it with gdb:

(gdb) disassemble /s main
[...]
13 b = a / 4;
   0x00400464 <+20>: shr    $0x2,%eax
   0x00400467 <+23>: mov    %eax,0x200bd3(%rip)        # 0x601040 <b>

14 c = a % 4;
   0x0040046d <+29>: mov    0x200bc5(%rip),%eax        # 0x601038 <a>
   0x00400473 <+35>: and    $0x3,%eax
   0x00400476 <+38>: mov    %eax,0x200bc8(%rip)        # 0x601044 <c>
[...]

As you can see, both the division and the modulo were substituted with "shr" and "and".

So basically nowadays there is no need to worry about that and
complicate code with explicit low-level optimizations. Hope that
answers your question.

Regards,
Andriy


On Wed, Aug 23, 2017 at 4:15 PM, Pragash Vijayaragavan  wrote:
> Hi,
>
> I got the chance to look at the cuckoo hash used in dpdk and have a query.
>
> would using division and modulo operations be slower than bitwise
> operations on RTE_HASH_BUCKET_ENTRIES, specially since
> RTE_HASH_BUCKET_ENTRIES is a power of 2.
> For example, to do a modulo we can do a "AND" operation on
> (RTE_HASH_BUCKET_ENTRIES - 1), which might be faster. We did a cuckoo
> filter for VPP and doing this gave a slight improvement in speed.
> Is there any particular reason its done this way.
>
> Sorry if i am being wrong in any way, i was just curious.
>
> Thanks,
>
> Pragash Vijayaragavan
> Grad Student at Rochester Institute of Technology
> email : pxv3...@rit.edu
> ph : 585 764 4662



-- 
Andriy Berestovskyy


Re: [dpdk-dev] [PATCH v1] net/mlx5: support upstream rdma-core

2017-08-24 Thread Nélio Laranjeiro
On Thu, Aug 24, 2017 at 12:23:10PM +, Shachar Beiser wrote:
>  This removes the dependency on specific Mellanox OFED libraries by
>  using the upstream rdma-core and linux upstream community code.
> 
>  Minimal requirements: rdma-core v16 and Kernel Linux 4.14.

Isn't it also supposed to keep working with previous kernels if the user
installs Mellanox OFED?

> Signed-off-by: Shachar Beiser 
> [...]
> diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst

Isn't it better to split this documentation into two subparts: one for
people with a new kernel and rdma-core, and the other for old kernel versions
and Mellanox OFED?

> diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
> index 14b739a..2de1c78 100644
> --- a/drivers/net/mlx5/Makefile
> +++ b/drivers/net/mlx5/Makefile
> @@ -104,41 +104,20 @@ mlx5_autoconf.h.new: FORCE
>  mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
>   $Q $(RM) -f -- '$@'
>[...]
>   $Q sh -- '$<' '$@' \
> - HAVE_VERBS_MLX5_OPCODE_TSO \
> - infiniband/mlx5_hw.h \
> - enum MLX5_OPCODE_TSO \
> + HAVE_IBV_MLX5_MOD_MPW \
> + infiniband/mlx5dv.h \
> + enum MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED \
>   $(AUTOCONF_OUTPUT)
> - $Q sh -- '$<' '$@' \
> - HAVE_ETHTOOL_LINK_MODE_25G \
> - /usr/include/linux/ethtool.h \
> - enum ETHTOOL_LINK_MODE_25000baseCR_Full_BIT \
> - $(AUTOCONF_OUTPUT)
> - $Q sh -- '$<' '$@' \
> - HAVE_ETHTOOL_LINK_MODE_50G \
> - /usr/include/linux/ethtool.h \
> - enum ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT \
> - $(AUTOCONF_OUTPUT)
> - $Q sh -- '$<' '$@' \
> - HAVE_ETHTOOL_LINK_MODE_100G \
> - /usr/include/linux/ethtool.h \
> - enum ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT \
> - $(AUTOCONF_OUTPUT)
> - $Q sh -- '$<' '$@' \
> - HAVE_UPDATE_CQ_CI \
> - infiniband/mlx5_hw.h \
> - func ibv_mlx5_exp_update_cq_ci \
> - $(AUTOCONF_OUTPUT)
> -
>  # Create mlx5_autoconf.h or update it in case it differs from the new one.

Keep the ETHTOOL_LINK_MODE_* macros, it is still necessary for previous kernel
versions.

>  mlx5_autoconf.h: mlx5_autoconf.h.new
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index bd66a7c..c2e37a3 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -247,10 +247,8 @@ struct mlx5_args {
>   .filter_ctrl = mlx5_dev_filter_ctrl,
>   .rx_descriptor_status = mlx5_rx_descriptor_status,
>   .tx_descriptor_status = mlx5_tx_descriptor_status,
> -#ifdef HAVE_UPDATE_CQ_CI
>   .rx_queue_intr_enable = mlx5_rx_intr_enable,
>   .rx_queue_intr_disable = mlx5_rx_intr_disable,
> -#endif
>  };
>  
>  static struct {
> @@ -442,7 +440,7 @@ struct mlx5_args {
>   struct ibv_device *ibv_dev;
>   int err = 0;
>   struct ibv_context *attr_ctx = NULL;
> - struct ibv_device_attr device_attr;
> + struct ibv_device_attr_ex device_attr;
>   unsigned int sriov;
>   unsigned int mps;
>   unsigned int tunnel_en;
> @@ -493,34 +491,24 @@ struct mlx5_args {
>  PCI_DEVICE_ID_MELLANOX_CONNECTX5VF) ||
> (pci_dev->id.device_id ==
>  PCI_DEVICE_ID_MELLANOX_CONNECTX5EXVF));
> - /*
> -  * Multi-packet send is supported by ConnectX-4 Lx PF as well
> -  * as all ConnectX-5 devices.
> -  */

This comment should be kept below.

>[...]
> @@ -539,13 +527,29 @@ struct mlx5_args {
>   return -err;
>   }
>   ibv_dev = list[i];
> -
>   DEBUG("device opened");
> - if (ibv_query_device(attr_ctx, &device_attr))
> +#ifdef HAVE_IBV_MLX5_MOD_MPW
> + struct mlx5dv_context attrs_out;
> + mlx5dv_query_device(attr_ctx, &attrs_out);
> + if (attrs_out.flags & (MLX5DV_CONTEXT_FLAGS_ENHANCED_MPW |
> +MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED)) {
> + INFO("Enhanced MPW is detected\n");
> + mps = MLX5_MPW_ENHANCED;
> + } else if (attrs_out.flags & MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED) {
> + INFO("MPW is detected\n");
> + mps = MLX5_MPW;
> + } else {
> + INFO("MPW is disabled\n");
> + mps = MLX5_MPW_DISABLED;
> + }
> +#else
> + mps = MLX5_MPW_DISABLED;
> +#endif

This does not guarantee you won't fall into the faulty kernel case.

Take in consideration the following point, I have a kernel 4.13 and a
rdma-core v20, this rdma-core library version embed the enum defined for the
autoconf i.e. enum MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED in mlx5dv.h.
This code will be available and executed on a faulty kernel version.
Won't I face the issue?

> @@ -664,29 +660,32 @@ struct mlx5_args {
>   priv->ind_table_max_size = ETH_RSS_RETA_SIZE_512;
>   DE

Re: [dpdk-dev] [RFC] net/mlx5: support count flow action

2017-08-24 Thread Nélio Laranjeiro
Hi Ori,

On Thu, Aug 24, 2017 at 02:04:32PM +, Ori Kam wrote:
> Hi Nelio,
> 
> Please see my comments in line.
> 
> Ori
> 
> > -Original Message-
> > From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com]
> > Sent: Thursday, August 24, 2017 9:54 AM
> > To: Ori Kam 
> > Cc: adrien.mazar...@6wind.com; dev@dpdk.org
> > Subject: Re: [RFC] net/mlx5: support count flow action
> > 
> > Hi Ori,
> > 
> > Please keep the coding style of the file, and pass checkpatch before
> > submitting a patch on the mailing list.  It helps the review by having a 
> > correct
> > patch respecting the coding style of the file.
> > I won't spot out here all the coding style issues, if you need some help, 
> > feel
> > free to ask.
> > 
> Sorry won't happen again.

No problem, a first contribution is always complicated.

> > On Mon, Aug 21, 2017 at 03:35:41PM +0300, Ori Kam wrote:
> > > Support count flow action.
> > 
> > Why copy/paste the title into the commit message?
> > 
> I was under the impression that the main function of the RFC should also be
> in the message body.

No, it is not necessary; the commit message should bring useful information
while still being short and precise.

>[...]
> > > ---
> > >  drivers/net/mlx5/mlx5.h  |   4 ++
> > >  drivers/net/mlx5/mlx5_flow.c | 163
> > > ++-
> > 
> > There are missing changes in the Makefile to have the
> > HAVE_VERBS_IBV_EXP_FLOW_SPEC_ACTION_COUNT and the include of the
> > mlx5_autoconf.h in mlx5_flow.c.
> > 
> I haven't added them since this feature is not supported yet, and
> I don't want anybody trying to activate them.
> When the feature is supported in Verbs, I will update
> those files.

Ok, so a new version should be sent soon :)

>[...]
> > 
> Will be updated according to your suggestion.
 
Thanks,

-- 
Nélio Laranjeiro
6WIND


[dpdk-dev] [PATCH 0/2] ethdev: add support for raw flow type for flow director

2017-08-24 Thread Kirill Rybalchenko
For complex packets, use the raw flow type with a pre-constructed packet
buffer instead of creating the packet internally in the PMD.

Kirill Rybalchenko (2):
  ethdev: add support for raw flow type for flow director
  net/i40e: add support for raw flow type for flow director

 drivers/net/i40e/i40e_fdir.c| 27 +++
 lib/librte_ether/rte_eth_ctrl.h | 10 ++
 2 files changed, 29 insertions(+), 8 deletions(-)

-- 
2.5.5



[dpdk-dev] [PATCH 2/2] net/i40e: add support for raw flow type for flow director

2017-08-24 Thread Kirill Rybalchenko
When adding a flow director filter for the raw flow type, instead of
constructing the packet internally, use a buffer holding a pre-constructed
packet.

Signed-off-by: Kirill Rybalchenko 
---
 drivers/net/i40e/i40e_fdir.c | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/net/i40e/i40e_fdir.c b/drivers/net/i40e/i40e_fdir.c
index 8013add..0558914 100644
--- a/drivers/net/i40e/i40e_fdir.c
+++ b/drivers/net/i40e/i40e_fdir.c
@@ -1093,6 +1093,7 @@ i40e_add_del_fdir_filter(struct rte_eth_dev *dev,
struct i40e_fdir_filter *fdir_filter, *node;
struct i40e_fdir_filter check_filter; /* Check if the filter exists */
int ret = 0;
+   uint16_t flow_type = filter->input.flow_type;
 
if (dev->data->dev_conf.fdir_conf.mode != RTE_FDIR_MODE_PERFECT) {
PMD_DRV_LOG(ERR, "FDIR is not enabled, please"
@@ -1100,7 +1101,7 @@ i40e_add_del_fdir_filter(struct rte_eth_dev *dev,
return -ENOTSUP;
}
 
-   if (!I40E_VALID_FLOW(filter->input.flow_type)) {
+   if (flow_type != RTE_ETH_FLOW_RAW && !I40E_VALID_FLOW(flow_type)) {
PMD_DRV_LOG(ERR, "invalid flow_type input.");
return -EINVAL;
}
@@ -1132,20 +1133,30 @@ i40e_add_del_fdir_filter(struct rte_eth_dev *dev,
 
memset(pkt, 0, I40E_FDIR_PKT_LEN);
 
-   ret = i40e_fdir_construct_pkt(pf, &filter->input, pkt);
-   if (ret < 0) {
-   PMD_DRV_LOG(ERR, "construct packet for fdir fails.");
-   return ret;
+   if (flow_type == RTE_ETH_FLOW_RAW) {
+   if (filter->input.flow.raw_flow.length > I40E_FDIR_PKT_LEN ||
+   !filter->input.flow.raw_flow.packet ||
+   !I40E_VALID_FLOW(filter->input.flow.raw_flow.flow)) {
+   PMD_DRV_LOG(ERR, "Invalid raw flow filter parameters!");
+   }
+   memcpy(pkt, filter->input.flow.raw_flow.packet,
+  filter->input.flow.raw_flow.length);
+   flow_type = filter->input.flow.raw_flow.flow;
+   } else {
+   ret = i40e_fdir_construct_pkt(pf, &filter->input, pkt);
+   if (ret < 0) {
+   PMD_DRV_LOG(ERR, "construct packet for fdir fails.");
+   return ret;
+   }
}
 
if (hw->mac.type == I40E_MAC_X722) {
/* get translated pctype value in fd pctype register */
pctype = (enum i40e_filter_pctype)i40e_read_rx_ctl(
hw, I40E_GLQF_FD_PCTYPES(
-   (int)i40e_flowtype_to_pctype(
-   filter->input.flow_type)));
+   (int)i40e_flowtype_to_pctype(flow_type)));
} else
-   pctype = i40e_flowtype_to_pctype(filter->input.flow_type);
+   pctype = i40e_flowtype_to_pctype(flow_type);
 
ret = i40e_fdir_filter_programming(pf, pctype, filter, add);
if (ret < 0) {
-- 
2.5.5



[dpdk-dev] [PATCH 1/2] ethdev: add support for raw flow type for flow director

2017-08-24 Thread Kirill Rybalchenko
Add a new structure, rte_eth_raw_flow, to the union rte_eth_fdir_flow
to support filters for the raw flow type.

Signed-off-by: Kirill Rybalchenko 
---
 lib/librte_ether/rte_eth_ctrl.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index 8386904..22d9640 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -525,6 +525,15 @@ struct rte_eth_tunnel_flow {
 };
 
 /**
+ * A structure used to define the input for raw flow
+ */
+struct rte_eth_raw_flow {
+   uint16_t flow; /**< flow type. */
+   void *packet; /**< pre-constructed packet buffer. */
+   uint16_t length; /**< buffer length. */
+};
+
+/**
  * An union contains the inputs for all types of flow
  * Items in flows need to be in big endian
  */
@@ -540,6 +549,7 @@ union rte_eth_fdir_flow {
struct rte_eth_ipv6_flow   ipv6_flow;
struct rte_eth_mac_vlan_flow mac_vlan_flow;
struct rte_eth_tunnel_flow   tunnel_flow;
+   struct rte_eth_raw_flowraw_flow;
 };
 
 /**
-- 
2.5.5



[dpdk-dev] [PATCH 0/5] new mlx4 Tx datapath bypassing ibverbs

2017-08-24 Thread Moti Haimovsky
This series of patches implements the mlx4 PMD with a Tx data path that
directly accesses the device queues for transmitting packets, bypassing the
ibverbs Tx data path altogether.
Using this scheme allows the PMD to work with the upstream rdma-core package
instead of the Mellanox OFED one without sacrificing Tx functionality.

These patches should be applied in the order listed below as each depends on
its predecessor to work.

This implementation allows rapid deployment of new features without the need to
update the underlying OFED.

This work depends on
http://dpdk.org/ml/archives/dev/2017-August/072281.html
[dpdk-dev] [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD
by Adrien Mazarguil

It has been built and tested using rdma-core-15-1 from
 https://github.com/linux-rdma/rdma-core
and kernel-ml-4.12.0-1.el7.elrepo.x86_64

Moti Haimovsky (5):
  net/mlx4: add simple Tx bypassing ibverbs
  net/mlx4: support multi-segments Tx
  net/mlx4: refine setting Tx completion flag
  net/mlx4: add Tx checksum offloads
  net/mlx4: add loopback Tx from VF

 drivers/net/mlx4/mlx4.c|   7 +
 drivers/net/mlx4/mlx4.h|   2 +
 drivers/net/mlx4/mlx4_ethdev.c |   6 +
 drivers/net/mlx4/mlx4_prm.h| 249 ++
 drivers/net/mlx4/mlx4_rxtx.c   | 456 +
 drivers/net/mlx4/mlx4_rxtx.h   |  39 +++-
 drivers/net/mlx4/mlx4_txq.c|  66 +-
 mk/rte.app.mk  |   2 +-
 8 files changed, 734 insertions(+), 93 deletions(-)
 create mode 100644 drivers/net/mlx4/mlx4_prm.h

-- 
1.8.3.1



[dpdk-dev] [PATCH 1/5] net/mlx4: add simple Tx bypassing ibverbs

2017-08-24 Thread Moti Haimovsky
PMD now sends the single-buffer packets directly to the device
bypassing the ibv Tx post and poll routines.

Signed-off-by: Moti Haimovsky 
---
 drivers/net/mlx4/mlx4_prm.h  | 253 +
 drivers/net/mlx4/mlx4_rxtx.c | 260 +++
 drivers/net/mlx4/mlx4_rxtx.h |  30 -
 drivers/net/mlx4/mlx4_txq.c  |  52 -
 mk/rte.app.mk|   2 +-
 5 files changed, 546 insertions(+), 51 deletions(-)
 create mode 100644 drivers/net/mlx4/mlx4_prm.h

diff --git a/drivers/net/mlx4/mlx4_prm.h b/drivers/net/mlx4/mlx4_prm.h
new file mode 100644
index 000..c5ce33b
--- /dev/null
+++ b/drivers/net/mlx4/mlx4_prm.h
@@ -0,0 +1,253 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_PMD_MLX4_MLX4_PRM_H_
+#define RTE_PMD_MLX4_MLX4_PRM_H_
+
+/* Verbs headers do not support -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include 
+#include 
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+/* Basic TxQ building block */
+#define TXBB_SHIFT 6
+#define TXBB_SIZE (1 << TXBB_SHIFT)
+
+/* Typical TSO descriptor with 16 gather entries is 352 bytes... */
+#define MAX_WQE_SIZE   512
+#define MAX_WQE_TXBBS  (MAX_WQE_SIZE / TXBB_SIZE)
+
+/* Send Queue Stamping/Invalidating info */
+#define SQ_STAMP_STRIDE 64
+#define SQ_STAMP_DWORDS (SQ_STAMP_STRIDE / 4)
+#define SQ_STAMP_SHIFT 31
+#define SQ_STAMP_VAL 0x7fffffff
+
+/* WQE flags */
+#define MLX4_OPCODE_SEND 0x0a
+#define MLX4_EN_BIT_WQE_OWN 0x80000000
+
+#define SIZE_TO_TXBBS(size) (RTE_ALIGN((size), (TXBB_SIZE)) / (TXBB_SIZE))
+
+/**
+ * Update the HW with the new CQ consumer value.
+ *
+ * @param cq
+ *   Pointer to the cq structure.
+ */
+static inline void
+mlx4_cq_set_ci(struct mlx4_cq *cq)
+{
*cq->set_ci_db = rte_cpu_to_be_32(cq->cons_index & 0xffffff);
+}
+
+/**
+ * Returns a pointer to the cqe in position n.
+ *
+ * @param cq
+ *   Pointer to the cq structure.
+ * @param n
+ *   Index of the entry whose address we seek.
+ *
+ * @return
+ *   pointer to the cqe.
+ */
+static inline struct mlx4_cqe
+*mlx4_get_cqe(struct mlx4_cq *cq, int n)
+{
+   return (struct mlx4_cqe *)(cq->buf + n * cq->cqe_size);
+}
+
+/**
+ * Returns a pointer to the cqe in position n if it is owned by SW.
+ *
+ * @param cq
+ *   Pointer to the cq structure.
+ * @param n
+ *   Index of the entry whose address we seek.
+ *
+ * @return
+ *   pointer to the cqe if owned by SW, otherwise returns NULL.
+ */
+static inline void
+*mlx4_get_sw_cqe(struct mlx4_cq *cq, int n)
+{
+   struct mlx4_cqe *cqe = mlx4_get_cqe(cq, n & (cq->cqe_cnt - 1));
+   struct mlx4_cqe *tcqe = cq->cqe_size == 64 ? cqe + 1 : cqe;
+
+   return (!!(tcqe->owner_sr_opcode & MLX4_CQE_OWNER_MASK) ^
+   !!(n & cq->cqe_cnt)) ? NULL : cqe;
+}
+
+/**
+ * Returns a pointer to the WQE at position n.
+ *
+ * @param sq
+ *   Pointer to the sq.
+ * @param n
+ *   The entry number in the queue.
+ *
+ * @return
+ *   A pointer to the required entry.
+ */
+static inline void
+*mlx4_get_send_wqe(struct mlx4_sq *sq, unsigned int n)
+{
+   return sq->buf + n * TXBB_SIZE;
+}
+
+/**
+ * Returns the size in bytes of this WQE.
+ *
+ * @param wqe
+ *   Pointer to the WQE we want to interrogate.
+ *
+ * @return
+ *   WQE size in bytes.

[dpdk-dev] [PATCH 2/5] net/mlx4: support multi-segments Tx

2017-08-24 Thread Moti Haimovsky
The PMD now supports transmitting packets that span an arbitrary
number of buffers.

Signed-off-by: Moti Haimovsky 
---
 drivers/net/mlx4/mlx4_prm.h  |  16 +---
 drivers/net/mlx4/mlx4_rxtx.c | 213 +++
 drivers/net/mlx4/mlx4_rxtx.h |   3 +-
 drivers/net/mlx4/mlx4_txq.c  |  12 ++-
 4 files changed, 170 insertions(+), 74 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_prm.h b/drivers/net/mlx4/mlx4_prm.h
index c5ce33b..8b0248a 100644
--- a/drivers/net/mlx4/mlx4_prm.h
+++ b/drivers/net/mlx4/mlx4_prm.h
@@ -61,7 +61,7 @@
 #define MLX4_OPCODE_SEND   0x0a
 #define MLX4_EN_BIT_WQE_OWN 0x80000000
 
-#define SIZE_TO_TXBBS(size) (RTE_ALIGN((size), (TXBB_SIZE)) / (TXBB_SIZE))
+#define SIZE_TO_TXBBS(size)(RTE_ALIGN((size), (TXBB_SIZE)) / (TXBB_SIZE))
 
 /**
  * Update the HW with the new CQ consumer value.
@@ -148,6 +148,7 @@
 
 /**
  * Fills the ctrl segment of a WQE with info needed for transmitting the packet.
+ * Owner field is filled later.
  *
  * @param seg
  *   Pointer to the control structure in the WQE.
@@ -161,8 +162,8 @@
  *   Immediate data/Invalidation key..
  */
 static inline void
-mlx4_set_ctrl_seg(struct mlx4_wqe_ctrl_seg *seg, uint32_t owner,
-uint8_t fence_size, uint32_t srcrb_flags, uint32_t imm)
+mlx4_set_ctrl_seg(struct mlx4_wqe_ctrl_seg *seg, uint8_t fence_size,
+ uint32_t srcrb_flags, uint32_t imm)
 {
seg->fence_size = fence_size;
seg->srcrb_flags = rte_cpu_to_be_32(srcrb_flags);
@@ -173,13 +174,6 @@
 * For the IBV_WR_SEND_WITH_INV, it should be htobe32(imm).
 */
seg->imm = imm;
-   /*
-* Make sure descriptor is fully written before
-* setting ownership bit (because HW can start
-* executing as soon as we do).
-*/
-   rte_wmb();
-   seg->owner_opcode = rte_cpu_to_be_32(owner);
 }
 
 /**
@@ -241,7 +235,7 @@
  *   The number of data-segments the WQE contains.
  *
  * @return
- *   WQE size in bytes.
+ *   The calculated WQE size in bytes.
  */
 static inline int
 mlx4_wqe_calc_real_size(unsigned int count)
diff --git a/drivers/net/mlx4/mlx4_rxtx.c b/drivers/net/mlx4/mlx4_rxtx.c
index 0720e34..e41ea9e 100644
--- a/drivers/net/mlx4/mlx4_rxtx.c
+++ b/drivers/net/mlx4/mlx4_rxtx.c
@@ -309,6 +309,101 @@
 }
 
 /**
+ * Copy a WQE written in the bounce buffer back to the SQ.
+ * This routine is used when a WQE wraps around the SQ and therefore needs
+ * special attention. Note that the WQE is written backward into the SQ.
+ *
+ * @param txq
+ *   Pointer to mlx4 Tx queue structure.
+ * @param index
+ *   First SQ TXBB index for this WQE.
+ * @param desc_size
+ *   TXBB-aligned size of the WQE.
+ *
+ * @return
+ *   A pointer to the control segment of this WQE in the SQ.
+ */
+static struct mlx4_wqe_ctrl_seg
+*mlx4_bounce_to_desc(struct txq *txq,
+uint32_t index,
+unsigned int desc_size)
+{
+   struct mlx4_sq *sq = &txq->msq;
+   uint32_t copy = (sq->txbb_cnt - index) * TXBB_SIZE;
+   int i;
+
+   for (i = desc_size - copy - 4; i >= 0; i -= 4) {
+   if ((i & (TXBB_SIZE - 1)) == 0)
+   rte_wmb();
+   *((uint32_t *)(sq->buf + i)) =
+   *((uint32_t *)(txq->bounce_buf + copy + i));
+   }
+   for (i = copy - 4; i >= 4; i -= 4) {
+   if ((i & (TXBB_SIZE - 1)) == 0)
+   rte_wmb();
+   *((uint32_t *)(sq->buf + index * TXBB_SIZE + i)) =
+   *((uint32_t *)(txq->bounce_buf + i));
+   }
+   /* Return real descriptor location */
+   return (struct mlx4_wqe_ctrl_seg *)(sq->buf + index * TXBB_SIZE);
+}
+
+/**
+ * Handle address translation of scattered buffers for mlx4_tx_burst().
+ *
+ * @param txq
+ *   TX queue structure.
+ * @param[in] buf
+ *   Buffer to process.
+ * @param[out] sges
+ *   Array filled with SGEs on success.
+ * @param segs
+ *   Number of segments in buf.
+ *
+ * @return
+ *   0 on success, -1 on failure.
+ */
+static inline int
+mlx4_tx_sg_virt_to_lkey(struct txq *txq, struct rte_mbuf *buf,
+   struct ibv_sge *sges, unsigned int segs)
+{
+   unsigned int j;
+
+   /* Register segments as SGEs. */
+   for (j = 0; (j != segs); ++j) {
+   struct ibv_sge *sge = &sges[j];
+   uint32_t lkey;
+
+   /* Retrieve Memory Region key for this memory pool. */
+   lkey = mlx4_txq_mp2mr(txq, mlx4_txq_mb2mp(buf));
+   if (unlikely(lkey == (uint32_t)-1)) {
+   /* MR does not exist. */
+   DEBUG("%p: unable to get MP <-> MR association",
+ (void *)txq);

[dpdk-dev] [PATCH 3/5] net/mlx4: refine setting Tx completion flag

2017-08-24 Thread Moti Haimovsky
The PMD now takes into consideration the number of TxQ entries a packet
occupies when choosing whether or not to set the report-completion flag
to the chip.

Signed-off-by: Moti Haimovsky 
---
 drivers/net/mlx4/mlx4_rxtx.c | 30 +++---
 drivers/net/mlx4/mlx4_rxtx.h |  2 +-
 2 files changed, 12 insertions(+), 20 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_rxtx.c b/drivers/net/mlx4/mlx4_rxtx.c
index e41ea9e..dae0e47 100644
--- a/drivers/net/mlx4/mlx4_rxtx.c
+++ b/drivers/net/mlx4/mlx4_rxtx.c
@@ -461,14 +461,16 @@
/* Fill in data from last to first */
for (i = wr->num_sge  - 1; i >= 0; --i)
mlx4_set_data_seg(dseg + i,  wr->sg_list + i);
-   /* Handle control info
-*
-* For raw eth, the SOLICIT flag is used to indicate that
-* no icrc should be calculated
-*/
-   srcrb_flags = MLX4_WQE_CTRL_SOLICIT |
- ((wr->send_flags & IBV_SEND_SIGNALED) ?
- MLX4_WQE_CTRL_CQ_UPDATE : 0);
+   /* Handle control info */
+   /* For raw eth always set the SOLICIT flag */
+   /*  Request Tx completion. */
+   txq->elts_comp_cd -= nr_txbbs;
+   if (unlikely(txq->elts_comp_cd <= 0)) {
+   srcrb_flags = MLX4_WQE_CTRL_SOLICIT | MLX4_WQE_CTRL_CQ_UPDATE;
+   txq->elts_comp_cd = txq->elts_comp_cd_init;
+   } else {
+   srcrb_flags = MLX4_WQE_CTRL_SOLICIT;
+   }
fence_size = (wr->send_flags & IBV_SEND_FENCE ?
MLX4_WQE_CTRL_FENCE : 0) | ((wqe_real_size / 16) & 0x3f);
owner_opcode = MLX4_OPCODE_SEND |
@@ -514,13 +516,12 @@
struct ibv_send_wr *wr_bad = NULL;
unsigned int elts_head = txq->elts_head;
const unsigned int elts_n = txq->elts_n;
-   unsigned int elts_comp_cd = txq->elts_comp_cd;
unsigned int elts_comp = 0;
unsigned int i;
unsigned int max;
int err;
 
-   assert(elts_comp_cd != 0);
+   assert(txq->elts_comp_cd != 0);
mlx4_txq_complete(txq);
max = (elts_n - (elts_head - txq->elts_tail));
if (max > elts_n)
@@ -560,11 +561,6 @@
tmp = next;
} while (tmp != NULL);
}
-   /* Request Tx completion. */
-   if (unlikely(--elts_comp_cd == 0)) {
-   elts_comp_cd = txq->elts_comp_cd_init;
-   send_flags |= IBV_SEND_SIGNALED;
-   }
if (buf->pkt_len <= txq->max_inline)
send_flags |= IBV_SEND_INLINE;
/* Update element. */
@@ -580,9 +576,6 @@
/* post the pkt for sending */
err = mlx4_post_send(txq, buf, wr, &wr_bad);
if (unlikely(err)) {
-   if (unlikely(wr_bad->send_flags &
-IBV_SEND_SIGNALED))
-   elts_comp_cd = 1;
elt->buf = NULL;
goto stop;
}
@@ -602,7 +595,6 @@
mlx4_send_flush(txq);
txq->elts_head = elts_head;
txq->elts_comp += elts_comp;
-   txq->elts_comp_cd = elts_comp_cd;
return i;
 }
 
diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
index 7cae7e2..35e0de7 100644
--- a/drivers/net/mlx4/mlx4_rxtx.h
+++ b/drivers/net/mlx4/mlx4_rxtx.h
@@ -140,7 +140,7 @@ struct txq {
unsigned int elts_head; /**< Current index in (*elts)[]. */
unsigned int elts_tail; /**< First element awaiting completion. */
unsigned int elts_comp; /**< Number of pkts waiting for completion. */
-   unsigned int elts_comp_cd; /**< Countdown for next completion. */
+   int elts_comp_cd; /**< Countdown for next completion. */
unsigned int elts_comp_cd_init; /**< Initial value for countdown. */
struct mlx4_txq_stats stats; /**< Tx queue counters. */
unsigned int socket; /**< CPU socket ID for allocations. */
-- 
1.8.3.1



[dpdk-dev] [PATCH 4/5] net/mlx4: add Tx checksum offloads

2017-08-24 Thread Moti Haimovsky
The PMD now supports offloading IP and TCP/UDP checksum calculation
(including for tunneled packets) to the hardware.

Signed-off-by: Moti Haimovsky 
---
 drivers/net/mlx4/mlx4.c|  7 +++
 drivers/net/mlx4/mlx4.h|  2 ++
 drivers/net/mlx4/mlx4_ethdev.c |  6 ++
 drivers/net/mlx4/mlx4_prm.h|  2 ++
 drivers/net/mlx4/mlx4_rxtx.c   | 25 +
 drivers/net/mlx4/mlx4_rxtx.h   |  2 ++
 drivers/net/mlx4/mlx4_txq.c|  4 +++-
 7 files changed, 43 insertions(+), 5 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index b084903..3149be6 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -397,6 +397,7 @@ struct mlx4_conf {
.ports.present = 0,
};
unsigned int vf;
+   unsigned int tunnel_en;
int i;
 
(void)pci_drv;
@@ -456,6 +457,9 @@ struct mlx4_conf {
rte_errno = ENODEV;
goto error;
}
+   /* Only cx3-pro supports L3 tunneling */
+   tunnel_en = (device_attr.vendor_part_id ==
+PCI_DEVICE_ID_MELLANOX_CONNECTX3PRO);
INFO("%u port(s) detected", device_attr.phys_port_cnt);
conf.ports.present |= (UINT64_C(1) << device_attr.phys_port_cnt) - 1;
if (mlx4_args(pci_dev->device.devargs, &conf)) {
@@ -529,6 +533,9 @@ struct mlx4_conf {
priv->pd = pd;
priv->mtu = ETHER_MTU;
priv->vf = vf;
+   priv->tunnel_en = tunnel_en;
+   priv->hw_csum =
+!!(device_attr.device_cap_flags & IBV_DEVICE_RAW_IP_CSUM);
/* Configure the first MAC address by default. */
if (mlx4_get_mac(priv, &mac.addr_bytes)) {
ERROR("cannot get MAC address, is mlx4_en loaded?"
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 93e5502..439a828 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -104,6 +104,8 @@ struct priv {
unsigned int vf:1; /* This is a VF device. */
unsigned int intr_alarm:1; /* An interrupt alarm is scheduled. */
unsigned int isolated:1; /* Toggle isolated mode. */
+   unsigned int hw_csum:1; /* Checksum offload is supported. */
+   unsigned int tunnel_en:1; /* Device tunneling is enabled */
struct rte_intr_handle intr_handle; /* Port interrupt handle. */
struct rte_flow_drop *flow_drop_queue; /* Flow drop queue. */
LIST_HEAD(mlx4_flows, rte_flow) flows;
diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c
index a9e8059..e4ecbfa 100644
--- a/drivers/net/mlx4/mlx4_ethdev.c
+++ b/drivers/net/mlx4/mlx4_ethdev.c
@@ -553,6 +553,12 @@
info->max_mac_addrs = 1;
info->rx_offload_capa = 0;
info->tx_offload_capa = 0;
+   if (priv->hw_csum)
+   info->tx_offload_capa |= (DEV_TX_OFFLOAD_IPV4_CKSUM |
+ DEV_TX_OFFLOAD_UDP_CKSUM  |
+ DEV_TX_OFFLOAD_TCP_CKSUM);
+   if (priv->tunnel_en)
+   info->tx_offload_capa |= DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM;
if (mlx4_get_ifname(priv, &ifname) == 0)
info->if_index = if_nametoindex(ifname);
info->speed_capa =
diff --git a/drivers/net/mlx4/mlx4_prm.h b/drivers/net/mlx4/mlx4_prm.h
index 8b0248a..38e9a45 100644
--- a/drivers/net/mlx4/mlx4_prm.h
+++ b/drivers/net/mlx4/mlx4_prm.h
@@ -60,6 +60,8 @@
 /* WQE flags */
 #define MLX4_OPCODE_SEND   0x0a
 #define MLX4_EN_BIT_WQE_OWN 0x80000000
+#define MLX4_WQE_CTRL_IIP_HDR_CSUM (1 << 28)
+#define MLX4_WQE_CTRL_IL4_HDR_CSUM (1 << 27)
 
 #define SIZE_TO_TXBBS(size)(RTE_ALIGN((size), (TXBB_SIZE)) / (TXBB_SIZE))
 
diff --git a/drivers/net/mlx4/mlx4_rxtx.c b/drivers/net/mlx4/mlx4_rxtx.c
index dae0e47..3415f63 100644
--- a/drivers/net/mlx4/mlx4_rxtx.c
+++ b/drivers/net/mlx4/mlx4_rxtx.c
@@ -475,9 +475,27 @@
MLX4_WQE_CTRL_FENCE : 0) | ((wqe_real_size / 16) & 0x3f);
owner_opcode = MLX4_OPCODE_SEND |
   ((sq->head & sq->txbb_cnt) ? MLX4_EN_BIT_WQE_OWN : 0);
+   /* Should we enable HW CKSUM offload ? */
+   if (txq->priv->hw_csum &&
+   (pkt->ol_flags &
+   (PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM | PKT_TX_UDP_CKSUM))) {
+   const uint64_t is_tunneled = pkt->ol_flags &
+(PKT_TX_TUNNEL_GRE |
+ PKT_TX_TUNNEL_VXLAN);
+
+   if (is_tunneled && txq->tunnel_en) {
+   owner_opcode |= MLX4_WQE_CTRL_IIP_HDR_CSUM |
+   MLX4_WQE_CTRL_IL4_HDR_CSUM;
+   if (pkt->ol_flags & PKT_TX_OUTER_IP_CKSUM)
+   srcrb_flags |= MLX4_WQE_CTRL_IP_HDR_CSUM;
+   } else {
+   srcrb_flags |= MLX4_WQE_CTRL_IP_HDR_CSUM |
+ MLX4_WQE_CTRL_TCP_UDP_CSUM;

[dpdk-dev] [PATCH 5/5] net/mlx4: add loopback Tx from VF

2017-08-24 Thread Moti Haimovsky
Added loopback functionality, used when the chip is a VF, in order to
enable packet transmission between VFs and between VFs and the PF.

Signed-off-by: Moti Haimovsky 
---
 drivers/net/mlx4/mlx4_prm.h  |  2 +-
 drivers/net/mlx4/mlx4_rxtx.c | 28 ++--
 drivers/net/mlx4/mlx4_rxtx.h |  2 ++
 drivers/net/mlx4/mlx4_txq.c  |  2 ++
 4 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_prm.h b/drivers/net/mlx4/mlx4_prm.h
index 38e9a45..e328cff 100644
--- a/drivers/net/mlx4/mlx4_prm.h
+++ b/drivers/net/mlx4/mlx4_prm.h
@@ -168,7 +168,7 @@
  uint32_t srcrb_flags, uint32_t imm)
 {
seg->fence_size = fence_size;
-   seg->srcrb_flags = rte_cpu_to_be_32(srcrb_flags);
+   seg->srcrb_flags = srcrb_flags;
/*
 * The caller should prepare "imm" in advance based on WR opcode.
 * For IBV_WR_SEND_WITH_IMM and IBV_WR_RDMA_WRITE_WITH_IMM,
diff --git a/drivers/net/mlx4/mlx4_rxtx.c b/drivers/net/mlx4/mlx4_rxtx.c
index 3415f63..ed19c72 100644
--- a/drivers/net/mlx4/mlx4_rxtx.c
+++ b/drivers/net/mlx4/mlx4_rxtx.c
@@ -426,7 +426,11 @@
struct mlx4_wqe_data_seg *dseg;
struct mlx4_sq *sq = &txq->msq;
struct ibv_sge sge[wr->num_sge];
-   uint32_t srcrb_flags;
+   union {
+   uint32_t flags;
+   uint16_t flags16[2];
+   } srcrb;
+   uint32_t imm = 0;
uint8_t fence_size;
uint32_t head_idx = sq->head & sq->txbb_cnt_mask;
uint32_t owner_opcode;
@@ -466,10 +470,10 @@
/*  Request Tx completion. */
txq->elts_comp_cd -= nr_txbbs;
if (unlikely(txq->elts_comp_cd <= 0)) {
-   srcrb_flags = MLX4_WQE_CTRL_SOLICIT | MLX4_WQE_CTRL_CQ_UPDATE;
+   srcrb.flags = MLX4_WQE_CTRL_SOLICIT | MLX4_WQE_CTRL_CQ_UPDATE;
txq->elts_comp_cd = txq->elts_comp_cd_init;
} else {
-   srcrb_flags = MLX4_WQE_CTRL_SOLICIT;
+   srcrb.flags = MLX4_WQE_CTRL_SOLICIT;
}
fence_size = (wr->send_flags & IBV_SEND_FENCE ?
MLX4_WQE_CTRL_FENCE : 0) | ((wqe_real_size / 16) & 0x3f);
@@ -487,14 +491,26 @@
owner_opcode |= MLX4_WQE_CTRL_IIP_HDR_CSUM |
MLX4_WQE_CTRL_IL4_HDR_CSUM;
if (pkt->ol_flags & PKT_TX_OUTER_IP_CKSUM)
-   srcrb_flags |= MLX4_WQE_CTRL_IP_HDR_CSUM;
+   srcrb.flags |= MLX4_WQE_CTRL_IP_HDR_CSUM;
} else {
-   srcrb_flags |= MLX4_WQE_CTRL_IP_HDR_CSUM |
+   srcrb.flags |= MLX4_WQE_CTRL_IP_HDR_CSUM |
  MLX4_WQE_CTRL_TCP_UDP_CSUM;
}
}
+   /* Convert flags to big-endian before adding the MAC address
+* (if any) to them.
+*/
+   srcrb.flags = rte_cpu_to_be_32(srcrb.flags);
+   /* Copy dst mac address to wqe. This allows loopback in eSwitch,
+* so that VFs and PF can communicate with each other
+*/
+   if (txq->lb) {
+   srcrb.flags16[0] = *(rte_pktmbuf_mtod(pkt, uint16_t *));
+   imm = *(rte_pktmbuf_mtod_offset(pkt, uint32_t *,
+   sizeof(uint16_t)));
+   }
/* fill in ctrl info but ownership */
-   mlx4_set_ctrl_seg(ctrl, fence_size, srcrb_flags, 0);
+   mlx4_set_ctrl_seg(ctrl, fence_size, srcrb.flags, imm);
/* If we used a bounce buffer then copy wqe back into sq */
if (unlikely(bounce))
ctrl = mlx4_bounce_to_desc(txq, head_idx, wqe_size);
diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
index b4675b7..8e407f5 100644
--- a/drivers/net/mlx4/mlx4_rxtx.h
+++ b/drivers/net/mlx4/mlx4_rxtx.h
@@ -148,6 +148,8 @@ struct txq {
struct mlx4_cq mcq; /**< Info for directly manipulating the CQ. */
uint16_t tunnel_en:1;
/* When set TX offload for tunneled packets are supported. */
+   uint16_t lb:1;
+   /* Whether pkts should be looped-back by eswitch or not */
char *bounce_buf; /**< Side memory to be used when wqe wraps around */
 };
 
diff --git a/drivers/net/mlx4/mlx4_txq.c b/drivers/net/mlx4/mlx4_txq.c
index cecd5e8..296d72d 100644
--- a/drivers/net/mlx4/mlx4_txq.c
+++ b/drivers/net/mlx4/mlx4_txq.c
@@ -410,6 +410,8 @@ struct txq_mp2mr_mbuf_check_data {
  (void *)dev, strerror(rte_errno));
goto error;
}
+   /* If a VF device - need to loopback xmitted packets */
+   tmpl.lb = !!(priv->vf);
/* Clean up txq in case we're reinitializing it. */
DEBUG("%p: cleaning-up old txq just in case", (void *)txq);
mlx4_txq_cleanup(txq);
-- 
1.8.3.1



[dpdk-dev] [PATCH 00/16] nfp: add pf support

2017-08-24 Thread Alejandro Lucero
Until now, the NFP PMD has only supported SRIOV VFs. This patchset adds
support for the PF, but just for being used as another DPDK port. No VF
management is added for now.

NFP is a programmable device and it supports virtual NICs (vNICs) through
firmware implementation. Different firmware applications implement vNICs
for PF devices and VF devices, with the number of vNICs depending on the
firmware and the NFP card available. PF vNIC (virtual) BARs are a subset
of the PF PCI device BARs, while VF vNIC BARs are the same as the VF PCI
BARs.

Working with VF vNICs requires a PF driver uploading the firmware and
doing some NFP configuration. For now, this can only be done with the
kernel NFP PF netdev driver.

Working with the PF vNIC requires the PMD to do the NFP configuration,
and a specific user space interface is created for accessing the NFP.
The NFP Service Processor Userspace (NSPU) interface allows creating
specific PCI BAR windows for accessing different parts of the NFP
device, including the Network Service Processor (NSP) itself. The NSPU
interface is implemented as the base for working with the PF.

Alejandro Lucero (16):
  nfp: add nsp user space interface
  nfp: add specific pf probe function
  nfp: add support for new pci id
  nfp: add nsp support for commands
  nfp: add nsp fw upload command
  nfp: add nsp symbol resolution command
  nfp: add fw upload logic
  nfp: add support for vnic config bar mapping
  nfp: add support for vNIC rx/tx bar mappings
  nfp: support pf devices inside pmd initialization
  nfp: allocate eth_dev from pf probe function
  nfp: support pf multiport
  nfp: add nsp support for hw link configuration
  nfp: add support for hw port link configuration
  nfp: read pf port mac addr using nsp
  doc: update nfp with pf support information

 doc/guides/nics/nfp.rst|  71 +++--
 drivers/net/nfp/Makefile   |   2 +
 drivers/net/nfp/nfp_net.c  | 377 +++--
 drivers/net/nfp/nfp_net_ctrl.h |   3 +
 drivers/net/nfp/nfp_net_eth.h  |  82 ++
 drivers/net/nfp/nfp_net_pmd.h  |   8 +
 drivers/net/nfp/nfp_nfpu.c | 103 +++
 drivers/net/nfp/nfp_nfpu.h |  55 
 drivers/net/nfp/nfp_nspu.c | 623 +
 drivers/net/nfp/nfp_nspu.h |  83 ++
 10 files changed, 1369 insertions(+), 38 deletions(-)
 create mode 100644 drivers/net/nfp/nfp_net_eth.h
 create mode 100644 drivers/net/nfp/nfp_nfpu.c
 create mode 100644 drivers/net/nfp/nfp_nfpu.h
 create mode 100644 drivers/net/nfp/nfp_nspu.c
 create mode 100644 drivers/net/nfp/nfp_nspu.h

-- 
1.9.1



[dpdk-dev] [PATCH 01/16] nfp: add nsp user space interface

2017-08-24 Thread Alejandro Lucero
Working with the PF requires access to the NFP for basic configuration.
The NSP is the NFP Service Processor, which helps with hardware and
firmware configuration. NSPU is the NSP user space interface for working
with the NSP.

Configuration through NSPU allows creating PCI BAR windows for accessing
different NFP hardware units, including the BAR window for the NSPU
interface access itself. NFP expansion bar registers are used for
creating those PCI BAR windows. NSPU uses a specific expansion bar which
is reprogrammed for each type of access.

Other expansion bars will be configured later for the PF vNIC BARs, a
subset of the PF PCI BARs.

Signed-off-by: Alejandro Lucero 
---
 drivers/net/nfp/Makefile   |   2 +
 drivers/net/nfp/nfp_nfpu.c | 103 
 drivers/net/nfp/nfp_nfpu.h |  55 +++
 drivers/net/nfp/nfp_nspu.c | 129 +
 drivers/net/nfp/nfp_nspu.h |  71 +
 5 files changed, 360 insertions(+)
 create mode 100644 drivers/net/nfp/nfp_nfpu.c
 create mode 100644 drivers/net/nfp/nfp_nfpu.h
 create mode 100644 drivers/net/nfp/nfp_nspu.c
 create mode 100644 drivers/net/nfp/nfp_nspu.h

diff --git a/drivers/net/nfp/Makefile b/drivers/net/nfp/Makefile
index 4ee2c2d..3e4c6f4 100644
--- a/drivers/net/nfp/Makefile
+++ b/drivers/net/nfp/Makefile
@@ -49,5 +49,7 @@ LIBABIVER := 1
 # all source are stored in SRCS-y
 #
 SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_net.c
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_nfpu.c
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_nspu.c
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/nfp/nfp_nfpu.c b/drivers/net/nfp/nfp_nfpu.c
new file mode 100644
index 000..556ded3
--- /dev/null
+++ b/drivers/net/nfp/nfp_nfpu.c
@@ -0,0 +1,103 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include "nfp_nfpu.h"
+
+/* PF BAR and expansion BAR for the NSP interface */
+#define NFP_CFG_PCIE_BAR 0
+#define NFP_CFG_EXP_BAR 7
+
+#define NFP_CFG_EXP_BAR_CFG_BASE   0x3
+
+/* There could be other NFP userspace tools using the NSP interface.
+ * Make sure there is no other process using it, locking the access to
+ * avoid problems.
+ */
+static int
+nspv_aquire_process_lock(nfpu_desc_t *desc)
+{
+   int rc;
+   struct flock lock;
+   char lockname[30];
+
+   memset(&lock, 0, sizeof(lock));
+
+   snprintf(lockname, sizeof(lockname), "/var/lock/nfp%d", desc->nfp);
+
+   /* Using S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH */
+   desc->lock = open(lockname, O_RDWR | O_CREAT, 0666);
+
+   if (desc->lock < 0)
+   return desc->lock;
+
+   lock.l_type = F_WRLCK;
+   lock.l_whence = SEEK_SET;
+   rc = -1;
+   while (rc != 0) {
+   rc = fcntl(desc->lock, F_SETLK, &lock);
+   if (rc < 0) {
+   if ((errno != EAGAIN) && (errno != EACCES)) {
+   close(desc->lock);
+   return rc;
+   }
+   }
+   }
+
+   return 0;
+}
+
+int
+nfpu_open(struct rte_pci_device *pci_dev, nfpu_desc_t *desc, int nfp)
+{
+   void *cfg_base, *mem_base;
+   size_t barsz;
+   int ret = 0;
+   int i = 0;
+
+   desc->nfp = nfp;
+
+   ret = nspv_aquire_process_lock(desc);
+   if (ret)
+   return -1;
+
+   barsz = pci_dev->mem_resource[0].len;
+
+   /* barsz in log2 */
+   while (barsz >>= 1)
+   i++;
+   barsz = i;
+
+   /* Getting address for NFP expansion BAR registers */
+   cfg_base = pci_dev->mem_resource[0].addr;
+   cfg_base = (uint8_t *)cfg_base + NFP_CFG_EXP_BAR_CFG_BASE;
+
+   /* Getting address for NFP NSP interface registers */
+   mem_base = pci_dev->mem_resource[0].addr;
+   mem_base = (uint8_t *)mem_base + (NFP_CFG_EXP_BAR << (barsz - 3));
+
+
+   desc->nspu = rte_malloc("nfp nspu", sizeof(nspu_desc_t), 0);
+   nfp_nspu_init(desc->nspu, desc->nfp, NFP_CFG_PCIE_BAR, barsz,
+ NFP_CFG_EXP_BAR, cfg_base, mem_base);
+
+   return ret;
+}
+
+int
+nfpu_close(nfpu_desc_t *desc)
+{
+   rte_free(desc->nspu);
+   close(desc->lock);
+   unlink("/var/lock/nfp0");
+   return 0;
+}
diff --git a/drivers/net/nfp/nfp_nfpu.h b/drivers/net/nfp/nfp_nfpu.h
new file mode 100644
index 000..31511b3
--- /dev/null
+++ b/drivers/net/nfp/nfp_nfpu.h
@@ -0,0 +1,55 @@
+/*
+ * Copyright (c) 2017 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must repr

[dpdk-dev] [PATCH 02/16] nfp: add specific pf probe function

2017-08-24 Thread Alejandro Lucero
Configuring the NFP PMD for using the PF requires access through the
NSPU interface for device configuration. This patch adds a specific
probe function for the PF which uses the NSPU interface. For now, only
basic NSPU access is performed: reading the NSPU ABI version.

No Ethernet port is created yet.

Signed-off-by: Alejandro Lucero 
---
 drivers/net/nfp/nfp_net.c | 75 +++
 1 file changed, 69 insertions(+), 6 deletions(-)

diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index a3bf5e1..e2fe83a 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -59,6 +59,8 @@
 #include "nfp_net_logs.h"
 #include "nfp_net_ctrl.h"
 
+#include "nfp_nfpu.h"
+
 /* Prototypes */
 static void nfp_net_close(struct rte_eth_dev *dev);
 static int nfp_net_configure(struct rte_eth_dev *dev);
@@ -2632,12 +2634,63 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
return 0;
 }
 
-static const struct rte_pci_id pci_id_nfp_net_map[] = {
+static int nfp_pf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
+   struct rte_pci_device *dev)
+{
+   nfpu_desc_t *nfpu_desc;
+   nspu_desc_t *nspu_desc;
+   int major, minor;
+
+   if (!dev)
+   return -ENODEV;
+
+   nfpu_desc = rte_malloc("nfp nfpu", sizeof(nfpu_desc_t), 0);
+   if (!nfpu_desc)
+   return -ENOMEM;
+
+   if (nfpu_open(dev, nfpu_desc, 0) < 0) {
+   RTE_LOG(ERR, PMD,
+   "nfpu_open failed\n");
+   goto nfpu_error;
+   }
+
+   nspu_desc = nfpu_desc->nspu;
+
+
+   /* Check NSP ABI version */
+   if (nfp_nsp_get_abi_version(nspu_desc, &major, &minor) < 0) {
+   RTE_LOG(INFO, PMD, "NFP NSP not present\n");
+   goto no_abi;
+   }
+   PMD_INIT_LOG(INFO, "nspu ABI version: %d.%d\n", major, minor);
+
+   if (minor < 20) {
+   RTE_LOG(INFO, PMD, "NFP NSP ABI version too old. Required 0.20 or higher\n");
+   goto no_abi;
+   }
+
+   /* No port is created yet */
+
+no_abi:
+   nfpu_close(nfpu_desc);
+nfpu_error:
+   rte_free(nfpu_desc);
+
+   return -ENODEV;
+}
+
+static const struct rte_pci_id pci_id_nfp_pf_net_map[] = {
{
RTE_PCI_DEVICE(PCI_VENDOR_ID_NETRONOME,
   PCI_DEVICE_ID_NFP6000_PF_NIC)
},
{
+   .vendor_id = 0,
+   },
+};
+
+static const struct rte_pci_id pci_id_nfp_vf_net_map[] = {
+   {
RTE_PCI_DEVICE(PCI_VENDOR_ID_NETRONOME,
   PCI_DEVICE_ID_NFP6000_VF_NIC)
},
@@ -2658,16 +2711,26 @@ static int eth_nfp_pci_remove(struct rte_pci_device *pci_dev)
return rte_eth_dev_pci_generic_remove(pci_dev, NULL);
 }
 
-static struct rte_pci_driver rte_nfp_net_pmd = {
-   .id_table = pci_id_nfp_net_map,
+static struct rte_pci_driver rte_nfp_net_pf_pmd = {
+   .id_table = pci_id_nfp_pf_net_map,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+   .probe = nfp_pf_pci_probe,
+   .remove = eth_nfp_pci_remove,
+};
+
+static struct rte_pci_driver rte_nfp_net_vf_pmd = {
+   .id_table = pci_id_nfp_vf_net_map,
.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
.probe = eth_nfp_pci_probe,
.remove = eth_nfp_pci_remove,
 };
 
-RTE_PMD_REGISTER_PCI(net_nfp, rte_nfp_net_pmd);
-RTE_PMD_REGISTER_PCI_TABLE(net_nfp, pci_id_nfp_net_map);
-RTE_PMD_REGISTER_KMOD_DEP(net_nfp, "* igb_uio | uio_pci_generic | vfio-pci");
+RTE_PMD_REGISTER_PCI(net_nfp_pf, rte_nfp_net_pf_pmd);
+RTE_PMD_REGISTER_PCI(net_nfp_vf, rte_nfp_net_vf_pmd);
+RTE_PMD_REGISTER_PCI_TABLE(net_nfp_pf, pci_id_nfp_pf_net_map);
+RTE_PMD_REGISTER_PCI_TABLE(net_nfp_vf, pci_id_nfp_vf_net_map);
+RTE_PMD_REGISTER_KMOD_DEP(net_nfp_pf, "* igb_uio | uio_pci_generic | vfio");
+RTE_PMD_REGISTER_KMOD_DEP(net_nfp_vf, "* igb_uio | uio_pci_generic | vfio");
 
 /*
  * Local variables:
-- 
1.9.1



[dpdk-dev] [PATCH 05/16] nfp: add nsp fw upload command

2017-08-24 Thread Alejandro Lucero
Use the NSPU interface for firmware upload. The firmware file needs to
be installed at a specific path inside the system firmware directory.

The NSPU buffer is used for writing the firmware before sending the
command.

Signed-off-by: Alejandro Lucero 
---
 drivers/net/nfp/nfp_nspu.c | 66 +-
 drivers/net/nfp/nfp_nspu.h |  1 +
 2 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/drivers/net/nfp/nfp_nspu.c b/drivers/net/nfp/nfp_nspu.c
index dbb5305..57ee45f 100644
--- a/drivers/net/nfp/nfp_nspu.c
+++ b/drivers/net/nfp/nfp_nspu.c
@@ -3,6 +3,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #include 
 
@@ -38,7 +41,8 @@
 #define NSP_STATUS_MINOR(x)  (int)(((x) >> 32) & 0xfff)
 
 /* NSP commands */
-#define NSP_CMD_RESET  1
+#define NSP_CMD_RESET  1
+#define NSP_CMD_FW_LOAD 6
 
 #define NSP_BUFFER_CFG_SIZE_MASK   (0xff)
 
@@ -304,3 +308,63 @@
 
return res;
 }
+
+#define DEFAULT_FW_PATH   "/lib/firmware/netronome"
+#define DEFAULT_FW_FILENAME   "nic_dpdk_default.nffw"
+
+int
+nfp_fw_upload(nspu_desc_t *nspu_desc)
+{
+   int fw_f;
+   char *fw_buf;
+   char filename[100];
+   struct stat file_stat;
+   off_t fsize, bytes;
+   ssize_t size;
+   int ret;
+
+   size = nspu_desc->buf_size;
+
+   sprintf(filename, "%s/%s", DEFAULT_FW_PATH, DEFAULT_FW_FILENAME);
+   fw_f = open(filename, O_RDONLY);
+   if (fw_f < 0) {
+   RTE_LOG(INFO, PMD, "Firmware file %s/%s not found.",
+   DEFAULT_FW_PATH, DEFAULT_FW_FILENAME);
+   return -ENOENT;
+   }
+
+   fstat(fw_f, &file_stat);
+
+   fsize = file_stat.st_size;
+   RTE_LOG(DEBUG, PMD, "Firmware file with size: %" PRIu64 "\n",
+   (uint64_t)fsize);
+
+   if (fsize > (off_t)size) {
+   RTE_LOG(INFO, PMD, "fw file too big: %" PRIu64
+  " bytes (%" PRIu64 " max)",
+ (uint64_t)fsize, (uint64_t)size);
+   return -EINVAL;
+   }
+
+   fw_buf = malloc((size_t)size);
+   if (!fw_buf) {
+   RTE_LOG(INFO, PMD, "malloc failed for fw buffer");
+   return -ENOMEM;
+   }
+   memset(fw_buf, 0, size);
+
+   bytes = read(fw_f, fw_buf, fsize);
+   if (bytes != fsize) {
+   RTE_LOG(INFO, PMD, "Reading fw to buffer failed.\n"
+  "Just %" PRIu64 " of %" PRIu64 " bytes read.",
+  (uint64_t)bytes, (uint64_t)fsize);
+   free(fw_buf);
+   return -EIO;
+   }
+
+   ret = nspu_command(nspu_desc, NSP_CMD_FW_LOAD, 0, 1, fw_buf, 0, bytes);
+
+   free(fw_buf);
+
+   return ret;
+}
diff --git a/drivers/net/nfp/nfp_nspu.h b/drivers/net/nfp/nfp_nspu.h
index a142eb3..6e1c25f 100644
--- a/drivers/net/nfp/nfp_nspu.h
+++ b/drivers/net/nfp/nfp_nspu.h
@@ -73,3 +73,4 @@ int nfp_nspu_init(nspu_desc_t *desc, int nfp, int pcie_bar, 
size_t pcie_barsz,
  int exp_bar, void *exp_bar_cfg_base, void *exp_bar_mmap);
 int nfp_nsp_get_abi_version(nspu_desc_t *desc, int *major, int *minor);
 int nfp_fw_reset(nspu_desc_t *nspu_desc);
+int nfp_fw_upload(nspu_desc_t *nspu_desc);
-- 
1.9.1



[dpdk-dev] [PATCH 04/16] nfp: add nsp support for commands

2017-08-24 Thread Alejandro Lucero
The NSPU interface declares a buffer controlled by the NFP NSP service
processor. It is possible to send commands to the NSP using the NSPU
and this buffer for data related to the command. A command can imply a
buffer read, a buffer write, both or none.

An initial command for resetting the firmware is added as well; it does
not require the buffer at all.

Commands will allow firmware upload, symbol resolution and ethernet
link configuration. Future commands will allow specific offloads like
flow offloads and eBPF offload.
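The buffer copy paths added below walk the NSP buffer through a
fixed-size expansion-BAR window. The window arithmetic can be sketched
as two small helpers (the function names here are mine, not part of
the patch):

```c
#include <stdint.h>

/* Split an NFP CPP offset into the expansion-BAR window base that must
 * be mapped and the offset inside that window. windowsz is a power of
 * two (2^(barsz - 3) in the NSPU descriptor). */
static inline uint64_t nspu_window_base(uint64_t off, uint64_t windowsz)
{
	return off & ~(windowsz - 1);
}

static inline uint64_t nspu_window_offset(uint64_t off, uint64_t windowsz)
{
	return off & (windowsz - 1);
}
```

base + offset always reconstructs the original CPP offset, which is
what lets the copy loops re-map the expansion bar just once per window.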

Signed-off-by: Alejandro Lucero 
---
 drivers/net/nfp/nfp_nspu.c | 181 -
 drivers/net/nfp/nfp_nspu.h |   4 +
 2 files changed, 183 insertions(+), 2 deletions(-)

diff --git a/drivers/net/nfp/nfp_nspu.c b/drivers/net/nfp/nfp_nspu.c
index a157915..dbb5305 100644
--- a/drivers/net/nfp/nfp_nspu.c
+++ b/drivers/net/nfp/nfp_nspu.c
@@ -29,14 +29,19 @@
 #define NSP_STATUS  0x00
 #define NSP_COMMAND 0x08
 #define NSP_BUFFER 0x10
-#define NSP_DEFAULT_BUFFER  0x18
-#define NSP_DEFAULT_BUFFER_CFG  0x20
+#define NSP_DEFAULT_BUF 0x18
+#define NSP_DEFAULT_BUF_CFG  0x20
 
 #define NSP_MAGIC 0xab10
 #define NSP_STATUS_MAGIC(x)  (((x) >> 48) & 0xffff)
 #define NSP_STATUS_MAJOR(x)  (int)(((x) >> 44) & 0xf)
 #define NSP_STATUS_MINOR(x)  (int)(((x) >> 32) & 0xfff)
 
+/* NSP commands */
+#define NSP_CMD_RESET  1
+
+#define NSP_BUFFER_CFG_SIZE_MASK   (0xff)
+
 #define NSP_REG_ADDR(d, off, reg) ((uint8_t *)(d)->mem_base + (off) + (reg))
 #define NSP_REG_VAL(p) (*(uint64_t *)(p))
 
@@ -118,12 +123,184 @@
 nfp_nspu_init(nspu_desc_t *desc, int nfp, int pcie_bar, size_t pcie_barsz,
  int exp_bar, void *exp_bar_cfg_base, void *exp_bar_mmap)
 {
+   uint64_t offset, buffaddr;
+   uint64_t nsp_reg;
+
desc->nfp = nfp;
desc->pcie_bar = pcie_bar;
desc->exp_bar = exp_bar;
desc->barsz = pcie_barsz;
+   desc->windowsz = 1 << (desc->barsz - 3);
desc->cfg_base = exp_bar_cfg_base;
desc->mem_base = exp_bar_mmap;
 
+   nspu_xlate(desc, NSP_BASE, &offset);
+
+   /*
+* Other NSPU clients can use other buffers. Let's tell NSPU we use the
+* default buffer.
+*/
+   buffaddr = NSP_REG_VAL(NSP_REG_ADDR(desc, offset, NSP_DEFAULT_BUF));
+   NSP_REG_VAL(NSP_REG_ADDR(desc, offset, NSP_BUFFER)) = buffaddr;
+
+   /* NFP internal addresses are 40 bits. Clean all other bits here */
+   buffaddr = buffaddr & (((uint64_t)1 << 40) - 1);
+   desc->bufaddr = buffaddr;
+
+   /* Lets get information about the buffer */
+   nsp_reg = NSP_REG_VAL(NSP_REG_ADDR(desc, offset, NSP_DEFAULT_BUF_CFG));
+
+   /* Buffer size comes in MBs. Conversion to bytes */
+   desc->buf_size = ((size_t)nsp_reg & NSP_BUFFER_CFG_SIZE_MASK) << 20;
+
return 0;
 }
+
+#define NSPU_NFP_BUF(addr, base, off) \
+   (*(uint64_t *)((uint8_t *)(addr)->mem_base + ((base) | (off))))
+
+#define NSPU_HOST_BUF(base, off) (*(uint64_t *)((uint8_t *)(base) + (off)))
+
+static int
+nspu_buff_write(nspu_desc_t *desc, void *buffer, size_t size)
+{
+   uint64_t pcie_offset, pcie_window_base, pcie_window_offset;
+   uint64_t windowsz = desc->windowsz;
+   uint64_t buffaddr, j, i = 0;
+   int ret = 0;
+
+   if (size > desc->buf_size)
+   return -1;
+
+   buffaddr = desc->bufaddr;
+   windowsz = desc->windowsz;
+
+   while (i < size) {
+   /* Expansion bar reconfiguration per window size */
+   nspu_xlate(desc, buffaddr + i, &pcie_offset);
+   pcie_window_base = pcie_offset & (~(windowsz - 1));
+   pcie_window_offset = pcie_offset & (windowsz - 1);
+   for (j = pcie_window_offset; ((j < windowsz) && (i < size));
+j += 8) {
+   NSPU_NFP_BUF(desc, pcie_window_base, j) =
+   NSPU_HOST_BUF(buffer, i);
+   i += 8;
+   }
+   }
+
+   return ret;
+}
+
+static int
+nspu_buff_read(nspu_desc_t *desc, void *buffer, size_t size)
+{
+   uint64_t pcie_offset, pcie_window_base, pcie_window_offset;
+   uint64_t windowsz, i = 0, j;
+   uint64_t buffaddr;
+   int ret = 0;
+
+   if (size > desc->buf_size)
+   return -1;
+
+   buffaddr = desc->bufaddr;
+   windowsz = desc->windowsz;
+
+   while (i < size) {
+   /* Expansion bar reconfiguration per window size */
+   nspu_xlate(desc, buffaddr + i, &pcie_offset);
+   pcie_window_base = pcie_offset & (~(windowsz - 1));
+   pcie_window_offset = pcie_offset & (windowsz - 1);
+   for (j = pcie_window_offset; ((j < windowsz) && (i < size));
+j += 8) {
+   NSPU_HOST_BUF(buffer, i) =
+   NSPU_NFP_BUF(desc, pcie_window_base, j);
+   i += 8;
+   }
+   }
+
+   return ret;
+}

[dpdk-dev] [PATCH 03/16] nfp: add support for new pci id

2017-08-24 Thread Alejandro Lucero
An NFP PF PCI device can have PCI ID 0x4000 or 0x6000.

Signed-off-by: Alejandro Lucero 
---
 drivers/net/nfp/nfp_net.c | 4 
 drivers/net/nfp/nfp_net_pmd.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index e2fe83a..1890a4a 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -2682,6 +2682,10 @@ static int nfp_pf_pci_probe(struct rte_pci_driver 
*pci_drv __rte_unused,
 static const struct rte_pci_id pci_id_nfp_pf_net_map[] = {
{
RTE_PCI_DEVICE(PCI_VENDOR_ID_NETRONOME,
+  PCI_DEVICE_ID_NFP4000_PF_NIC)
+   },
+   {
+   RTE_PCI_DEVICE(PCI_VENDOR_ID_NETRONOME,
   PCI_DEVICE_ID_NFP6000_PF_NIC)
},
{
diff --git a/drivers/net/nfp/nfp_net_pmd.h b/drivers/net/nfp/nfp_net_pmd.h
index c6bddaa..3818130 100644
--- a/drivers/net/nfp/nfp_net_pmd.h
+++ b/drivers/net/nfp/nfp_net_pmd.h
@@ -42,6 +42,7 @@
 
 #define NFP_NET_PMD_VERSION "0.1"
 #define PCI_VENDOR_ID_NETRONOME 0x19ee
+#define PCI_DEVICE_ID_NFP4000_PF_NIC 0x4000
 #define PCI_DEVICE_ID_NFP6000_PF_NIC 0x6000
 #define PCI_DEVICE_ID_NFP6000_VF_NIC 0x6003
 
-- 
1.9.1



[dpdk-dev] [PATCH 07/16] nfp: add fw upload logic

2017-08-24 Thread Alejandro Lucero
The PMD will use this function for uploading the firmware. First, a
symbol resolution is done to find out whether a firmware app is
already there. If not, an NFP reset is issued before using the NSPU
fw upload code.

The PMD PF probe function now uses this logic.
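The resulting control flow in nfp_nsp_fw_setup can be condensed into a
pure decision function (a sketch with made-up names, standing in for
the real NSPU calls; -1 stands in for -ENODEV):

```c
/* Outcome of the fw setup sequence: try the symbol first; on failure
 * reset the NFP, upload firmware and resolve the symbol again. */
static int fw_setup_result(int sym_found, int reset_ok, int upload_ok,
			   int sym_found_after)
{
	if (sym_found)
		return 0;	/* a firmware app is already there */
	if (!reset_ok)
		return -1;	/* reset failed */
	if (!upload_ok)
		return -1;	/* upload failed */
	return sym_found_after ? 0 : -1;
}
```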

Signed-off-by: Alejandro Lucero 
---
 drivers/net/nfp/nfp_net.c  |  3 +++
 drivers/net/nfp/nfp_nspu.c | 48 +++---
 drivers/net/nfp/nfp_nspu.h |  6 +-
 3 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index 1890a4a..c0d5f58 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -2639,6 +2639,7 @@ static int nfp_pf_pci_probe(struct rte_pci_driver 
*pci_drv __rte_unused,
 {
nfpu_desc_t *nfpu_desc;
nspu_desc_t *nspu_desc;
+   uint64_t offset_symbol;
int major, minor;
 
if (!dev)
@@ -2669,6 +2670,8 @@ static int nfp_pf_pci_probe(struct rte_pci_driver 
*pci_drv __rte_unused,
goto no_abi;
}
 
+   nfp_nsp_fw_setup(nspu_desc, "nfd_cfg_pf0_num_ports", &offset_symbol);
+
/* No port is created yet */
 
 no_abi:
diff --git a/drivers/net/nfp/nfp_nspu.c b/drivers/net/nfp/nfp_nspu.c
index 4296be1..f4fee71 100644
--- a/drivers/net/nfp/nfp_nspu.c
+++ b/drivers/net/nfp/nfp_nspu.c
@@ -21,6 +21,9 @@
 /* NFP target for NSP access */
 #define NFP_NSP_TARGET   7
 
+/* Expansion BARs for mapping PF vnic BARs */
+#define NFP_NET_PF_CFG_EXP_BAR 6
+
 /*
  * This is an NFP internal address used for configuring properly an NFP
  * expansion BAR.
@@ -297,7 +300,7 @@
return ret;
 }
 
-int
+static int
 nfp_fw_reset(nspu_desc_t *nspu_desc)
 {
int res;
@@ -313,7 +316,7 @@
 #define DEFAULT_FW_PATH   "/lib/firmware/netronome"
 #define DEFAULT_FW_FILENAME   "nic_dpdk_default.nffw"
 
-int
+static int
 nfp_fw_upload(nspu_desc_t *nspu_desc)
 {
int fw_f;
@@ -391,7 +394,7 @@
  * a PCI BAR window. NFP expansion BARs are used in this regard through
  * the NSPU interface.
  */
-int
+static int
 nfp_nspu_set_bar_from_symbl(nspu_desc_t *desc, const char *symbl,
uint32_t expbar, uint64_t *pcie_offset,
ssize_t *size)
@@ -454,3 +457,42 @@
free(sym_buf);
return ret;
 }
+
+int
+nfp_nsp_fw_setup(nspu_desc_t *desc, const char *sym, uint64_t *pcie_offset)
+{
+   ssize_t bar0_sym_size;
+
+   /* If the symbol resolution works, it implies a firmware app
+* is already there.
+*/
+   if (!nfp_nspu_set_bar_from_symbl(desc, sym, NFP_NET_PF_CFG_EXP_BAR,
+pcie_offset, &bar0_sym_size))
+   return 0;
+
+   /* No firmware app detected or not the right one */
+   RTE_LOG(INFO, PMD, "No firmware detected. Resetting NFP...\n");
+   if (nfp_fw_reset(desc) < 0) {
+   RTE_LOG(ERR, PMD, "nfp fw reset failed\n");
+   return -ENODEV;
+   }
+
+   RTE_LOG(INFO, PMD, "Reset done.\n");
+   RTE_LOG(INFO, PMD, "Uploading firmware...\n");
+
+   if (nfp_fw_upload(desc) < 0) {
+   RTE_LOG(ERR, PMD, "nfp fw upload failed\n");
+   return -ENODEV;
+   }
+
+   RTE_LOG(INFO, PMD, "Done.\n");
+
+   /* Now the symbol should be there */
+   if (nfp_nspu_set_bar_from_symbl(desc, sym, NFP_NET_PF_CFG_EXP_BAR,
+   pcie_offset, &bar0_sym_size)) {
+   RTE_LOG(ERR, PMD, "nfp PF BAR symbol resolution failed\n");
+   return -ENODEV;
+   }
+
+   return 0;
+}
diff --git a/drivers/net/nfp/nfp_nspu.h b/drivers/net/nfp/nfp_nspu.h
index 7734b4f..c439700 100644
--- a/drivers/net/nfp/nfp_nspu.h
+++ b/drivers/net/nfp/nfp_nspu.h
@@ -72,8 +72,4 @@
 int nfp_nspu_init(nspu_desc_t *desc, int nfp, int pcie_bar, size_t pcie_barsz,
  int exp_bar, void *exp_bar_cfg_base, void *exp_bar_mmap);
 int nfp_nsp_get_abi_version(nspu_desc_t *desc, int *major, int *minor);
-int nfp_fw_reset(nspu_desc_t *nspu_desc);
-int nfp_fw_upload(nspu_desc_t *nspu_desc);
-int nfp_nspu_set_bar_from_symbl(nspu_desc_t *desc, const char *symbl,
-   uint32_t expbar, uint64_t *pcie_offset,
-   ssize_t *size);
+int nfp_nsp_fw_setup(nspu_desc_t *desc, const char *sym, uint64_t *pcie_offset);
-- 
1.9.1



[dpdk-dev] [PATCH 06/16] nfp: add nsp symbol resolution command

2017-08-24 Thread Alejandro Lucero
The firmware has symbols that help to configure things like the number
of PF ports, vNIC BAR addresses inside NFP memories, or ethernet link
state. Different firmware apps have different things to map and likely
different internal NFP addresses to use.

Host drivers can use the NSPU interface for getting symbol data
regarding different hardware configurations. Once the driver has the
information about a specific object, a mapping is required: an NFP
expansion bar is configured, creating a device PCI bar window.
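The address adjustment applied after reading the symbol descriptor can
be isolated as below (the helper name is mine, and this reads the
patch's apparent `domain << 28` as a typo for `domain < 28`):

```c
#include <stdint.h>

/* Compose the NFP CPP bus address for a resolved symbol from its
 * target, domain and raw address, mirroring the adjustment done in
 * nfp_nspu_set_bar_from_symbl. */
static uint64_t sym_cpp_addr(int64_t target, int64_t domain, uint64_t addr)
{
	if (domain >= 24 && domain < 28 && target == 7)
		return (1ULL << 37) | addr | (((uint64_t)domain & 0x3) << 35);
	/* the caller also folds target -7 back to 7 in this branch */
	return (1ULL << 39) | addr | (((uint64_t)domain & 0x3f) << 32);
}
```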

Signed-off-by: Alejandro Lucero 
---
 drivers/net/nfp/nfp_nspu.c | 86 ++
 drivers/net/nfp/nfp_nspu.h |  3 ++
 2 files changed, 89 insertions(+)

diff --git a/drivers/net/nfp/nfp_nspu.c b/drivers/net/nfp/nfp_nspu.c
index 57ee45f..4296be1 100644
--- a/drivers/net/nfp/nfp_nspu.c
+++ b/drivers/net/nfp/nfp_nspu.c
@@ -43,6 +43,7 @@
 /* NSP commands */
 #define NSP_CMD_RESET  1
 #define NSP_CMD_FW_LOAD6
+#define NSP_CMD_GET_SYMBOL 14
 
 #define NSP_BUFFER_CFG_SIZE_MASK   (0xff)
 
@@ -368,3 +369,88 @@
 
return ret;
 }
+
+/* Firmware symbol descriptor size */
+#define NFP_SYM_DESC_LEN 40
+
+#define SYMBOL_DATA(b, off) (*(int64_t *)((b) + (off)))
+#define SYMBOL_UDATA(b, off) (*(uint64_t *)((b) + (off)))
+
+/* Firmware symbols contain information about how to access what they
+ * represent. It can be as simple as an numeric variable declared at a
+ * specific NFP memory, but it can also be more complex structures and
+ * related to specific hardware functionalities or components. Target,
+ * domain and address allow to create the BAR window for accessing such
+ * hw object and size defines the length to map.
+ *
+ * A vNIC is a network interface implemented inside the NFP and using a
+ * subset of device PCI BARs. Specific firmware symbols allow to map those
+ * vNIC bars by host drivers like the NFP PMD.
+ *
+ * Accessing what the symbol represents implies to map the access through
+ * a PCI BAR window. NFP expansion BARs are used in this regard through
+ * the NSPU interface.
+ */
+int
+nfp_nspu_set_bar_from_symbl(nspu_desc_t *desc, const char *symbl,
+   uint32_t expbar, uint64_t *pcie_offset,
+   ssize_t *size)
+{
+   int64_t type;
+   int64_t target;
+   int64_t domain;
+   uint64_t addr;
+   char *sym_buf;
+   int ret = 0;
+
+   sym_buf = malloc(desc->buf_size);
+   if (!sym_buf)
+   return -ENOMEM;
+
+   memset(sym_buf, 0, desc->buf_size);
+   strncpy(sym_buf, symbl, strlen(symbl));
+   ret = nspu_command(desc, NSP_CMD_GET_SYMBOL, 1, 1, sym_buf,
+  NFP_SYM_DESC_LEN, strlen(symbl));
+   if (ret) {
+   RTE_LOG(DEBUG, PMD, "symbol resolution (%s) failed\n", symbl);
+   goto clean;
+   }
+
+   /* Reading symbol information */
+   type = SYMBOL_DATA(sym_buf, 0);
+   target = SYMBOL_DATA(sym_buf, 8);
+   domain =  SYMBOL_DATA(sym_buf, 16);
+   addr = SYMBOL_UDATA(sym_buf, 24);
+   *size = (ssize_t)SYMBOL_UDATA(sym_buf, 32);
+
+   if (type != 1) {
+   RTE_LOG(INFO, PMD, "wrong symbol type\n");
+   ret = -EINVAL;
+   goto clean;
+   }
+   if (!(target == 7 || target == -7)) {
+   RTE_LOG(INFO, PMD, "wrong symbol target\n");
+   ret = -EINVAL;
+   goto clean;
+   }
+   if (domain == 8 || domain == 9) {
+   RTE_LOG(INFO, PMD, "wrong symbol domain\n");
+   ret = -EINVAL;
+   goto clean;
+   }
+
+   /* Adjusting address based on symbol location */
+   if (domain >= 24 && domain < 28 && target == 7) {
+   addr = 1ULL << 37 | addr | ((uint64_t)domain & 0x3) << 35;
+   } else {
+   addr = 1ULL << 39 | addr | ((uint64_t)domain & 0x3f) << 32;
+   if (target == -7)
+   target = 7;
+   }
+
+   /* Configuring NFP expansion bar for mapping specific PCI BAR window */
+   nfp_nspu_mem_bar_cfg(desc, expbar, target, addr, pcie_offset);
+
+   /* This is the PCI BAR offset to use by the host */
+   *pcie_offset |= ((expbar & 0x7) << (desc->barsz - 3));
+
+clean:
+   free(sym_buf);
+   return ret;
+}
diff --git a/drivers/net/nfp/nfp_nspu.h b/drivers/net/nfp/nfp_nspu.h
index 6e1c25f..7734b4f 100644
--- a/drivers/net/nfp/nfp_nspu.h
+++ b/drivers/net/nfp/nfp_nspu.h
@@ -74,3 +74,6 @@ int nfp_nspu_init(nspu_desc_t *desc, int nfp, int pcie_bar, 
size_t pcie_barsz,
 int nfp_nsp_get_abi_version(nspu_desc_t *desc, int *major, int *minor);
 int nfp_fw_reset(nspu_desc_t *nspu_desc);
 int nfp_fw_upload(nspu_desc_t *nspu_desc);
+int nfp_nspu_set_bar_from_symbl(nspu_desc_t *desc, const char *symbl,
+   uint32_t expbar, uint64_t *pcie_offset,
+   ssize_t *size);
-- 
1.9.1



[dpdk-dev] [PATCH 08/16] nfp: add support for vnic config bar mapping

2017-08-24 Thread Alejandro Lucero
NFP vNICs use a subset of PCI device BARs. The vNIC config bar depends
on a firmware symbol defining how to map it through an NFP expansion
bar.

This patch adds an NSPU API function for getting a vNIC config bar
mapped through an expansion bar given a firmware symbol. The PMD will
use the returned PCI bar offset for accessing the vNIC bar.

Signed-off-by: Alejandro Lucero 
---
 drivers/net/nfp/nfp_nspu.c | 13 +
 drivers/net/nfp/nfp_nspu.h |  1 +
 2 files changed, 14 insertions(+)

diff --git a/drivers/net/nfp/nfp_nspu.c b/drivers/net/nfp/nfp_nspu.c
index f4fee71..f68bae6 100644
--- a/drivers/net/nfp/nfp_nspu.c
+++ b/drivers/net/nfp/nfp_nspu.c
@@ -496,3 +496,16 @@
 
return 0;
 }
+
+int
+nfp_nsp_map_ctrl_bar(nspu_desc_t *desc, uint64_t *pcie_offset)
+{
+   ssize_t bar0_sym_size;
+
+   if (nfp_nspu_set_bar_from_symbl(desc, "_pf0_net_bar0",
+   NFP_NET_PF_CFG_EXP_BAR,
+   pcie_offset, &bar0_sym_size))
+   return -ENODEV;
+
+   return 0;
+}
diff --git a/drivers/net/nfp/nfp_nspu.h b/drivers/net/nfp/nfp_nspu.h
index c439700..8211f92 100644
--- a/drivers/net/nfp/nfp_nspu.h
+++ b/drivers/net/nfp/nfp_nspu.h
@@ -73,3 +73,4 @@ int nfp_nspu_init(nspu_desc_t *desc, int nfp, int pcie_bar, 
size_t pcie_barsz,
  int exp_bar, void *exp_bar_cfg_base, void *exp_bar_mmap);
 int nfp_nsp_get_abi_version(nspu_desc_t *desc, int *major, int *minor);
 int nfp_nsp_fw_setup(nspu_desc_t *desc, const char *sym, uint64_t *pcie_offset);
+int nfp_nsp_map_ctrl_bar(nspu_desc_t *desc, uint64_t *pcie_offset);
-- 
1.9.1



[dpdk-dev] [PATCH 10/16] nfp: support pf devices inside pmd initialization

2017-08-24 Thread Alejandro Lucero
nfp_net_init is where a DPDK port related to an eth_dev is initialized.
NFP VF vNICs use the VF PCI BARs as they come after SR-IOV is enabled,
but an NFP PF vNIC uses just a subset of the PF PCI BARs.

This patch adds support for mapping the right PCI BAR subsets for the PF
vNIC. It uses the NSPU API functions introduced previously for configuring
NFP expansion bars.

Signed-off-by: Alejandro Lucero 
---
 drivers/net/nfp/nfp_net.c | 56 ---
 drivers/net/nfp/nfp_net_pmd.h |  4 
 2 files changed, 52 insertions(+), 8 deletions(-)

diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index c0d5f58..7c23b7a 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -55,12 +55,11 @@
 #include 
 #include 
 
+#include "nfp_nfpu.h"
 #include "nfp_net_pmd.h"
 #include "nfp_net_logs.h"
 #include "nfp_net_ctrl.h"
 
-#include "nfp_nfpu.h"
-
 /* Prototypes */
 static void nfp_net_close(struct rte_eth_dev *dev);
 static int nfp_net_configure(struct rte_eth_dev *dev);
@@ -101,7 +100,7 @@ static uint16_t nfp_net_xmit_pkts(void *tx_queue, struct 
rte_mbuf **tx_pkts,
  * happen to be at the same offset on the NFP6000 and the NFP3200 so
  * we use a single macro here.
  */
-#define NFP_PCIE_QUEUE(_q) (0x8 + (0x800 * ((_q) & 0xff)))
+#define NFP_PCIE_QUEUE(_q) (0x800 * ((_q) & 0xff))
 
 /* Maximum value which can be added to a queue with one transaction */
 #define NFP_QCP_MAX_ADD 0x7f
@@ -2496,10 +2495,13 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
struct rte_pci_device *pci_dev;
struct nfp_net_hw *hw;
 
-   uint32_t tx_bar_off, rx_bar_off;
+   uint64_t tx_bar_off = 0, rx_bar_off = 0;
uint32_t start_q;
int stride = 4;
 
+   nspu_desc_t *nspu_desc = NULL;
+   uint64_t bar_offset;
+
PMD_INIT_FUNC_TRACE();
 
hw = NFP_NET_DEV_PRIVATE_TO_HW(eth_dev->data->dev_private);
@@ -2532,11 +2534,33 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
"hw->ctrl_bar is NULL. BAR0 not configured\n");
return -ENODEV;
}
+
+   /* Is this a PF device? */
+   if ((pci_dev->id.device_id == PCI_DEVICE_ID_NFP4000_PF_NIC) ||
+   (pci_dev->id.device_id == PCI_DEVICE_ID_NFP6000_PF_NIC)) {
+   nspu_desc = hw->nspu_desc;
+
+   if (nfp_nsp_map_ctrl_bar(nspu_desc, &bar_offset) != 0) {
+   /*
+* A firmware should be there after PF probe so this
+* should not happen.
+*/
+   RTE_LOG(ERR, PMD, "PF BAR symbol resolution failed\n");
+   return -ENODEV;
+   }
+
+   /* vNIC PF control BAR is a subset of PF PCI device BAR */
+   hw->ctrl_bar += bar_offset;
+   PMD_INIT_LOG(DEBUG, "ctrl bar: %p\n", hw->ctrl_bar);
+   }
+
hw->max_rx_queues = nn_cfg_readl(hw, NFP_NET_CFG_MAX_RXRINGS);
hw->max_tx_queues = nn_cfg_readl(hw, NFP_NET_CFG_MAX_TXRINGS);
 
/* Work out where in the BAR the queues start. */
switch (pci_dev->id.device_id) {
+   case PCI_DEVICE_ID_NFP4000_PF_NIC:
+   case PCI_DEVICE_ID_NFP6000_PF_NIC:
case PCI_DEVICE_ID_NFP6000_VF_NIC:
start_q = nn_cfg_readl(hw, NFP_NET_CFG_START_TXQ);
tx_bar_off = NFP_PCIE_QUEUE(start_q);
@@ -2548,11 +2572,27 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
return -ENODEV;
}
 
-   PMD_INIT_LOG(DEBUG, "tx_bar_off: 0x%08x", tx_bar_off);
-   PMD_INIT_LOG(DEBUG, "rx_bar_off: 0x%08x", rx_bar_off);
+   PMD_INIT_LOG(DEBUG, "tx_bar_off: 0x%" PRIx64 "\n", tx_bar_off);
+   PMD_INIT_LOG(DEBUG, "rx_bar_off: 0x%" PRIx64 "\n", rx_bar_off);
 
-   hw->tx_bar = (uint8_t *)pci_dev->mem_resource[2].addr + tx_bar_off;
-   hw->rx_bar = (uint8_t *)pci_dev->mem_resource[2].addr + rx_bar_off;
+   if ((pci_dev->id.device_id == PCI_DEVICE_ID_NFP4000_PF_NIC) ||
+   (pci_dev->id.device_id == PCI_DEVICE_ID_NFP6000_PF_NIC)) {
+   /* configure access to tx/rx vNIC BARs */
+   nfp_nsp_map_queues_bar(nspu_desc, &bar_offset);
+   PMD_INIT_LOG(DEBUG, "tx/rx bar_offset: %" PRIx64 "\n",
+   bar_offset);
+   hw->hw_queues = (uint8_t *)pci_dev->mem_resource[0].addr;
+
+   /* vNIC PF tx/rx BARs are a subset of PF PCI device */
+   hw->hw_queues += bar_offset;
+   hw->tx_bar = hw->hw_queues + tx_bar_off;
+   hw->rx_bar = hw->hw_queues + rx_bar_off;
+   } else {
+   hw->tx_bar = (uint8_t *)pci_dev->mem_resource[2].addr +
+tx_bar_off;
+   hw->rx_bar = (uint8_t *)pci_dev->mem_resource[2].addr +
+rx_bar_off;
+   }
 
PMD_INIT_LOG(DEBUG, "ctrl_bar: %p, tx_bar: %p, rx_bar: %p",
   

[dpdk-dev] [PATCH 09/16] nfp: add support for vNIC rx/tx bar mappings

2017-08-24 Thread Alejandro Lucero
NFP vNICs use a subset of PCI device BARs. The vNIC rx/tx bars point to
the NFP hardware queues unit. Unlike the vNIC config bar, the NFP
address is always the same, so the NFP expansion bar configuration
always uses the same hardcoded physical address.

This patch adds an NSPU API function for getting the vNIC rx/tx bars
mapped through an expansion bar using that specific physical address.

The PMD will use the PCI bar offset returned for mapping the vNIC
rx/tx bars.
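The host-side offset returned to the PMD is composed from the 3-bit
expansion bar index and the offset within the window; a sketch of that
composition (the helper name is mine):

```c
#include <stdint.h>

/* PCI BAR offset the host must use: the 3-bit expansion bar index
 * selects one of eight equal slices of the PF PCI BAR, each of size
 * 2^(barsz - 3). */
static uint64_t expbar_pcie_offset(uint64_t window_off, uint32_t expbar,
				   int barsz)
{
	return window_off | ((uint64_t)(expbar & 0x7) << (barsz - 3));
}
```

With the hw queues expansion bar (index 5) and a 128MB PCI BAR
(barsz 27), this yields the `(NFP_NET_PF_HW_QUEUES_EXP_BAR & 0x7) <<
(27 - 3)` term seen in the patch.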

Signed-off-by: Alejandro Lucero 
---
 drivers/net/nfp/nfp_nspu.c | 21 -
 drivers/net/nfp/nfp_nspu.h |  1 +
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/net/nfp/nfp_nspu.c b/drivers/net/nfp/nfp_nspu.c
index f68bae6..df5af33 100644
--- a/drivers/net/nfp/nfp_nspu.c
+++ b/drivers/net/nfp/nfp_nspu.c
@@ -22,7 +22,8 @@
 #define NFP_NSP_TARGET   7
 
 /* Expansion BARs for mapping PF vnic BARs */
-#define NFP_NET_PF_CFG_EXP_BAR 6
+#define NFP_NET_PF_CFG_EXP_BAR   6
+#define NFP_NET_PF_HW_QUEUES_EXP_BAR 5
 
 /*
  * This is an NFP internal address used for configuring properly an NFP
@@ -509,3 +510,21 @@
 
return 0;
 }
+
+/*
+ * This is a hardcoded fixed NFP internal CPP bus address for the hw queues 
unit
+ * inside the PCIE island.
+ */
+#define NFP_CPP_PCIE_QUEUES ((uint64_t)(1ULL << 39) |  0x8 | \
+((uint64_t)0x4 & 0x3f) << 32)
+
+/* Configure a specific NFP expansion bar for accessing the vNIC rx/tx BARs */
+void
+nfp_nsp_map_queues_bar(nspu_desc_t *desc, uint64_t *pcie_offset)
+{
+   nfp_nspu_mem_bar_cfg(desc, NFP_NET_PF_HW_QUEUES_EXP_BAR, 0,
+NFP_CPP_PCIE_QUEUES, pcie_offset);
+
+   /* This is the pcie offset to use by the host */
+   *pcie_offset |= ((NFP_NET_PF_HW_QUEUES_EXP_BAR & 0x7) << (27 - 3));
+}
diff --git a/drivers/net/nfp/nfp_nspu.h b/drivers/net/nfp/nfp_nspu.h
index 8211f92..4b09d4f 100644
--- a/drivers/net/nfp/nfp_nspu.h
+++ b/drivers/net/nfp/nfp_nspu.h
@@ -74,3 +74,4 @@ int nfp_nspu_init(nspu_desc_t *desc, int nfp, int pcie_bar, 
size_t pcie_barsz,
 int nfp_nsp_get_abi_version(nspu_desc_t *desc, int *major, int *minor);
 int nfp_nsp_fw_setup(nspu_desc_t *desc, const char *sym, uint64_t *pcie_offset);
 int nfp_nsp_map_ctrl_bar(nspu_desc_t *desc, uint64_t *pcie_offset);
+void nfp_nsp_map_queues_bar(nspu_desc_t *desc, uint64_t *pcie_offset);
-- 
1.9.1



[dpdk-dev] [PATCH 12/16] nfp: support pf multiport

2017-08-24 Thread Alejandro Lucero
An NFP PF PCI device can have several physical ports, up to 8. Because
the DPDK core creates one eth_dev per PCI device, the nfp pf probe
function is used. The number of PF ports is obtained from a firmware
symbol using the NSPU API. Inside the PF probe function an eth_dev per
port is created and nfp_net_init invoked for each port.

There are some limitations regarding multiport: rx interrupts and
device hotplug are not supported.

Interrupts are handled with the help of the VFIO or UIO drivers. Those
drivers just know about PCI devices, so it is not possible, without
changing how DPDK handles interrupts, to manage interrupts assigned to
different PF ports.

About hotplug, the problem is that this functionality is based on a PCI
device, and although device plugging is possible, which would add as
many ports as supported by firmware, unplugging is based on a device
name linked to an eth_dev, and the device name now has a suffix
(_portX, with X being the port index) which the DPDK core is not aware
of. While rx interrupts with multiport could likely be solved with some
layer of indirection, hotplug would require changes to the DPDK core.

Signed-off-by: Alejandro Lucero 
---
 drivers/net/nfp/nfp_net.c  | 197 -
 drivers/net/nfp/nfp_net_ctrl.h |   3 +
 drivers/net/nfp/nfp_net_pmd.h  |   2 +
 3 files changed, 162 insertions(+), 40 deletions(-)

diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index 6005e41..f9ce204 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -687,6 +687,11 @@ static void nfp_net_read_mac(struct nfp_net_hw *hw)
 
/* check and configure queue intr-vector mapping */
if (dev->data->dev_conf.intr_conf.rxq != 0) {
+   if (hw->pf_multiport_enabled) {
+   PMD_INIT_LOG(ERR, "PMD rx interrupt is not supported "
+ "with NFP multiport PF");
+   return -EINVAL;
+   }
if (intr_handle->type == RTE_INTR_HANDLE_UIO) {
/*
 * Better not to share LSC with RX interrupts.
@@ -2489,11 +2494,40 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
.rx_queue_intr_disable  = nfp_rx_queue_intr_disable,
 };
 
+/*
+ * Each eth_dev created gets its own private data, but before nfp_net_init
+ * that private data references the private data for all the PF ports. This
+ * is due to how the vNIC bars are mapped based on the first port, so all
+ * ports need info about port 0 private data. Inside nfp_net_init the private
+ * data pointer is changed to the right address for each port once the bars
+ * have been mapped.
+ */
+static int
+get_pf_port_number(char *name)
+{
+   char *pf_str = name;
+   int size = 0;
+
+   while ((*pf_str != '_') && (*pf_str != '\0') && (size++ < 30))
+   pf_str++;
+
+   if (size == 30)
+   /*
+* This should not happen at all and it would mean major
+* implementation fault.
+*/
+   rte_panic("nfp_net: problem with pf device name\n");
+
+   /* Expecting _portX with X within [0,7] */
+   pf_str += 5;
+
+   return (int)strtol(pf_str, NULL, 10);
+}
+
 static int
 nfp_net_init(struct rte_eth_dev *eth_dev)
 {
struct rte_pci_device *pci_dev;
-   struct nfp_net_hw *hw;
+   struct nfp_net_hw *hw, *hwport0;
 
uint64_t tx_bar_off = 0, rx_bar_off = 0;
uint32_t start_q;
@@ -2501,10 +2535,32 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 
nspu_desc_t *nspu_desc = NULL;
uint64_t bar_offset;
+   int port = 0;
 
PMD_INIT_FUNC_TRACE();
 
-   hw = NFP_NET_DEV_PRIVATE_TO_HW(eth_dev->data->dev_private);
+   pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
+
+   if ((pci_dev->id.device_id == PCI_DEVICE_ID_NFP4000_PF_NIC) ||
+   (pci_dev->id.device_id == PCI_DEVICE_ID_NFP6000_PF_NIC)) {
+   port = get_pf_port_number(eth_dev->data->name);
+   if (port < 0 || port > 7) {
+   RTE_LOG(ERR, PMD, "Port value is wrong\n");
+   return -ENODEV;
+   }
+
+   PMD_INIT_LOG(DEBUG, "Working with PF port value %d\n", port);
+
+   /* This points to port 0 private data */
+   hwport0 = NFP_NET_DEV_PRIVATE_TO_HW(eth_dev->data->dev_private);
+
+   /* This points to the specific port private data */
+   hw = &hwport0[port];
+   hw->pf_port_idx = port;
+   } else {
+   hw = NFP_NET_DEV_PRIVATE_TO_HW(eth_dev->data->dev_private);
+   hwport0 = 0;
+   }
 
eth_dev->dev_ops = &nfp_net_eth_dev_ops;
eth_dev->rx_pkt_burst = &nfp_net_recv_pkts;
@@ -2514,9 +2570,10 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
if (rte_eal_process_type() != RTE_PROC_PRIMARY)
return 0;
 
-   pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);

[dpdk-dev] [PATCH 13/16] nfp: add nsp support for hw link configuration

2017-08-24 Thread Alejandro Lucero
Adding a new NSPU command for being able to read and write the ethernet
port table from/to the NFP. This will allow the PMD to put the Link up
or down when a port is started or stopped. Until now, this was performed
by the firmware independently of PMD functionality.

The ethernet port table has also some other useful information that will
be used in further commits.

Usually NSPU is used at device probe time, where code execution is
sequential. However, reading and writing the NFP eth table can be done
at different times and from different cores, which implies concurrent
access could happen. A spinlock is added to the global nspu object to
protect the NFP and avoid concurrent access.
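The concurrency concern can be sketched as a lock-guarded
read-modify-write of an eth table control word (using a pthread mutex
as a stand-in for DPDK's rte_spinlock; the struct and names here are
mine, not the driver's):

```c
#include <pthread.h>
#include <stdint.h>

struct eth_entry_sketch {
	uint64_t control;
};

static pthread_mutex_t eth_table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Set or clear one control bit under the lock, so two cores bringing
 * links up/down through NSPU never interleave their table updates.
 * Returns the control word after the update. */
static uint64_t eth_set_ctrl_bit(struct eth_entry_sketch *e, uint64_t bit,
				 int enable)
{
	uint64_t v;

	pthread_mutex_lock(&eth_table_lock);
	if (enable)
		e->control |= bit;
	else
		e->control &= ~bit;
	v = e->control;
	pthread_mutex_unlock(&eth_table_lock);
	return v;
}
```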

Signed-off-by: Alejandro Lucero 
---
 drivers/net/nfp/nfp_net_eth.h | 82 +++
 drivers/net/nfp/nfp_nspu.c| 79 +++--
 drivers/net/nfp/nfp_nspu.h|  4 +++
 3 files changed, 162 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/nfp/nfp_net_eth.h

diff --git a/drivers/net/nfp/nfp_net_eth.h b/drivers/net/nfp/nfp_net_eth.h
new file mode 100644
index 000..af57f03
--- /dev/null
+++ b/drivers/net/nfp/nfp_net_eth.h
@@ -0,0 +1,82 @@
+/*
+ * Copyright (c) 2017 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * vim:shiftwidth=8:noexpandtab
+ *
+ * @file dpdk/pmd/nfp_net_eth.h
+ *
+ * Netronome NFP_NET PDM driver
+ */
+
+union eth_table_entry {
+   struct {
+   uint64_t port;
+   uint64_t state;
+   uint8_t mac_addr[6];
+   uint8_t resv[2];
+   uint64_t control;
+   };
+   uint64_t raw[4];
+};
+
+#ifndef BIT_ULL
+#define BIT_ULL(a) (1ULL << (a))
+#endif
+
+#define NSP_ETH_NBI_PORT_COUNT  24
+#define NSP_ETH_MAX_COUNT   (2 * NSP_ETH_NBI_PORT_COUNT)
+#define NSP_ETH_TABLE_SIZE   (NSP_ETH_MAX_COUNT * sizeof(union eth_table_entry))
+
+#define NSP_ETH_PORT_LANES  0xf
+#define NSP_ETH_PORT_INDEX  0xff00
+#define NSP_ETH_PORT_LABEL  0x3f
+#define NSP_ETH_PORT_PHYLABEL   0xfc0
+
+#define NSP_ETH_PORT_LANES_MASK rte_cpu_to_le_64(NSP_ETH_PORT_LANES)
+
+#define NSP_ETH_STATE_CONFIGUREDBIT_ULL(0)
+#define NSP_ETH_STATE_ENABLED   BIT_ULL(1)
+#define NSP_ETH_STATE_TX_ENABLEDBIT_ULL(2)
+#define NSP_ETH_STATE_RX_ENABLEDBIT_ULL(3)
+#define NSP_ETH_STATE_RATE  0xf00
+#define NSP_ETH_STATE_INTERFACE 0xff000
+#define NSP_ETH_STATE_MEDIA 0x30
+#define NSP_ETH_STATE_OVRD_CHNG BIT_ULL(22)
+#define NSP_ETH_STATE_ANEG  0x380
+
+#define NSP_ETH_CTRL_CONFIGURED BIT_ULL(0)
+#define NSP_ETH_CTRL_ENABLEDBIT_ULL(1)
+#define NSP_ETH_CTRL_TX_ENABLED BIT_ULL(2)
+#define NSP_ETH_CTRL_RX_ENABLED BIT_ULL(3)
+#define NSP_ETH_CTRL_SET_RATE   BIT_ULL(4)
+#define NSP_ETH_CTRL_SET_LANES  BIT_ULL(5)
+#define NSP_ETH_CTRL_SET_ANEG   BIT_ULL(6)
diff --git a/drivers/net/nfp/nfp_nspu.c b/drivers/net/nfp/nfp_nspu.c
index df5af33..29325e6 100644
--- a/drivers/net/nfp/nfp_nspu.c
+++ b/drivers/net/nfp/nfp_nspu.c
@@ -8,8 +8,10 @@
 #include 
 
 #include 
+#include 
 
 #include "nfp_nfpu.h"
+#include "nfp_net_eth.h"
 
 #define CFG_EXP_BAR_ADDR_SZ 1
 #define CFG_EXP_BAR_MAP_TYPE   1
@@ -45,9 +47,11 @@
 #define NSP_STATUS_MINOR(x)  (int)(((x) >> 32) & 0xfff)

[dpdk-dev] [PATCH 11/16] nfp: allocate eth_dev from pf probe function

2017-08-24 Thread Alejandro Lucero
NFP can support several physical ports per PF device. Depending on
firmware info, one or more eth_dev objects will need to be created.

This patch adds the call to create just one eth_dev for now; future
commits will add support for the multiport option. Once the eth_dev has
been created, the probe function invokes PMD initialization with the new
eth_dev.

Signed-off-by: Alejandro Lucero 
---
 drivers/net/nfp/nfp_net.c | 50 +++
 1 file changed, 42 insertions(+), 8 deletions(-)

diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index 7c23b7a..6005e41 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -2677,13 +2677,16 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 static int nfp_pf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
struct rte_pci_device *dev)
 {
+   struct rte_eth_dev *eth_dev;
+   struct nfp_net_hw *hw;
nfpu_desc_t *nfpu_desc;
nspu_desc_t *nspu_desc;
uint64_t offset_symbol;
int major, minor;
+   int ret = -ENODEV;
 
if (!dev)
-   return -ENODEV;
+   return ret;
 
nfpu_desc = rte_malloc("nfp nfpu", sizeof(nfpu_desc_t), 0);
if (!nfpu_desc)
@@ -2697,29 +2700,60 @@ static int nfp_pf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 
nspu_desc = nfpu_desc->nspu;
 
-
/* Check NSP ABI version */
if (nfp_nsp_get_abi_version(nspu_desc, &major, &minor) < 0) {
RTE_LOG(INFO, PMD, "NFP NSP not present\n");
-   goto no_abi;
+   goto error;
}
PMD_INIT_LOG(INFO, "nspu ABI version: %d.%d\n", major, minor);
 
if (minor < 20) {
RTE_LOG(INFO, PMD, "NFP NSP ABI version too old. Required 0.20 or higher\n");
-   goto no_abi;
+   goto error;
}
 
-   nfp_nsp_fw_setup(nspu_desc, "nfd_cfg_pf0_num_ports", &offset_symbol);
+   ret = nfp_nsp_fw_setup(nspu_desc, "nfd_cfg_pf0_num_ports",
+  &offset_symbol);
+   if (ret)
+   goto error;
 
-   /* No port is created yet */
+   eth_dev = rte_eth_dev_allocate(dev->device.name);
+   if (!eth_dev) {
+   ret = -ENODEV;
+   goto error;
+   }
 
-no_abi:
+   eth_dev->data->dev_private = rte_zmalloc("nfp_pf_port",
+sizeof(struct nfp_net_adapter),
+RTE_CACHE_LINE_SIZE);
+   if (!eth_dev->data->dev_private) {
+   rte_eth_dev_release_port(eth_dev);
+   ret = -ENODEV;
+   goto error;
+   }
+
+   hw = (struct nfp_net_hw *)(eth_dev->data->dev_private);
+   hw->nspu_desc = nspu_desc;
+   hw->nfpu_desc = nfpu_desc;
+   hw->is_pf = 1;
+
+   eth_dev->device = &dev->device;
+   rte_eth_copy_pci_info(eth_dev, dev);
+
+   ret = nfp_net_init(eth_dev);
+
+   if (!ret)
+   return 0;
+
+   /* something went wrong */
+   rte_eth_dev_release_port(eth_dev);
+
+error:
nfpu_close(nfpu_desc);
 nfpu_error:
rte_free(nfpu_desc);
 
-   return -ENODEV;
+   return ret;
 }
 
 static const struct rte_pci_id pci_id_nfp_pf_net_map[] = {
-- 
1.9.1



[dpdk-dev] [PATCH 14/16] nfp: add support for hw port link configuration

2017-08-24 Thread Alejandro Lucero
It is the PMD's task to configure the hardware port: link up when the
port is started and link down when the port is stopped. This is not
required for VFs, but it is for PF ports.

A minor refactoring of the PMD stop and close functions is done because
the link down needs to happen only when the device is stopped.

Signed-off-by: Alejandro Lucero 
---
 drivers/net/nfp/nfp_net.c | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index f9ce204..aa611e1 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -737,6 +737,10 @@ static void nfp_net_read_mac(struct nfp_net_hw *hw)
goto error;
}
 
+   if (hw->is_pf)
+   /* Configure the physical port up */
+   nfp_nsp_eth_config(hw->nspu_desc, hw->pf_port_idx, 1);
+
hw->ctrl = new_ctrl;
 
return 0;
@@ -765,9 +769,12 @@ static void nfp_net_read_mac(struct nfp_net_hw *hw)
 nfp_net_stop(struct rte_eth_dev *dev)
 {
int i;
+   struct nfp_net_hw *hw;
 
PMD_INIT_LOG(DEBUG, "Stop");
 
+   hw = NFP_NET_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
nfp_net_disable_queues(dev);
 
/* Clear queues */
@@ -780,6 +787,10 @@ static void nfp_net_read_mac(struct nfp_net_hw *hw)
nfp_net_reset_rx_queue(
(struct nfp_net_rxq *)dev->data->rx_queues[i]);
}
+
+   if (hw->is_pf)
+   /* Configure the physical port down */
+   nfp_nsp_eth_config(hw->nspu_desc, hw->pf_port_idx, 0);
 }
 
 /* Reset and stop device. The device can not be restarted. */
@@ -788,6 +799,7 @@ static void nfp_net_read_mac(struct nfp_net_hw *hw)
 {
struct nfp_net_hw *hw;
struct rte_pci_device *pci_dev;
+   int i;
 
PMD_INIT_LOG(DEBUG, "Close");
 
@@ -799,7 +811,18 @@ static void nfp_net_read_mac(struct nfp_net_hw *hw)
 * threads/queues before calling the device close function.
 */
 
-   nfp_net_stop(dev);
+   nfp_net_disable_queues(dev);
+
+   /* Clear queues */
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   nfp_net_reset_tx_queue(
+   (struct nfp_net_txq *)dev->data->tx_queues[i]);
+   }
+
+   for (i = 0; i < dev->data->nb_rx_queues; i++) {
+   nfp_net_reset_rx_queue(
+   (struct nfp_net_rxq *)dev->data->rx_queues[i]);
+   }
 
rte_intr_disable(&pci_dev->intr_handle);
nn_cfg_writeb(hw, NFP_NET_CFG_LSC, 0xff);
-- 
1.9.1



[dpdk-dev] [PATCH 15/16] nfp: read pf port mac addr using nsp

2017-08-24 Thread Alejandro Lucero
During initialization, the MAC address is read from the configuration
BAR. This is the default option when using VFs.

This patch adds support for reading the MAC address through the NSPU
interface when the PMD works with the PF.

Signed-off-by: Alejandro Lucero 
---
 drivers/net/nfp/nfp_net.c | 59 +--
 drivers/net/nfp/nfp_net_pmd.h |  1 +
 drivers/net/nfp/nfp_nspu.c| 22 +++-
 drivers/net/nfp/nfp_nspu.h|  2 ++
 4 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index aa611e1..9496b63 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -593,7 +593,55 @@ enum nfp_qcp_ptr {
hw->qcp_cfg = hw->tx_bar + NFP_QCP_QUEUE_ADDR_SZ;
 }
 
-static void nfp_net_read_mac(struct nfp_net_hw *hw)
+#define ETH_ADDR_LEN   6
+
+static void
+nfp_eth_copy_mac_reverse(uint8_t *dst, const uint8_t *src)
+{
+   int i;
+
+   for (i = 0; i < ETH_ADDR_LEN; i++)
+   dst[ETH_ADDR_LEN - i - 1] = src[i];
+}
+
+static int
+nfp_net_pf_read_mac(struct nfp_net_hw *hw, int port)
+{
+   union eth_table_entry *entry;
+   int idx, i;
+
+   idx = port;
+   entry = hw->eth_table;
+
+   /* Reading NFP ethernet table obtained before */
+   for (i = 0; i < NSP_ETH_MAX_COUNT; i++) {
+   if (!(entry->port & NSP_ETH_PORT_LANES_MASK)) {
+   /* port not in use */
+   entry++;
+   continue;
+   }
+   if (idx == 0)
+   break;
+   idx--;
+   entry++;
+   }
+
+   if (i == NSP_ETH_MAX_COUNT)
+   return -EINVAL;
+
+   /*
+* hw points to port0 private data. Make hw point to the
+* private data of the right port.
+*/
+   hw += port;
+   nfp_eth_copy_mac_reverse((uint8_t *)&hw->mac_addr,
+(uint8_t *)&entry->mac_addr);
+
+   return 0;
+}
+
+static void
+nfp_net_vf_read_mac(struct nfp_net_hw *hw)
 {
uint32_t tmp;
 
@@ -2672,6 +2720,10 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 
/* vNIC PF tx/rx BARs are a subset of PF PCI device */
hwport0->hw_queues += bar_offset;
+
+   /* Let's seize the chance to read the eth table from hw */
+   if (nfp_nsp_eth_read_table(nspu_desc, &hw->eth_table))
+   return -ENODEV;
}
 
if (hw->is_pf) {
@@ -2732,7 +2784,10 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
return -ENOMEM;
}
 
-   nfp_net_read_mac(hw);
+   if (hw->is_pf)
+   nfp_net_pf_read_mac(hwport0, port);
+   else
+   nfp_net_vf_read_mac(hw);
 
if (!is_valid_assigned_ether_addr((struct ether_addr *)&hw->mac_addr)) {
/* Using random mac addresses for VFs */
diff --git a/drivers/net/nfp/nfp_net_pmd.h b/drivers/net/nfp/nfp_net_pmd.h
index d7e38d4..20ade1a 100644
--- a/drivers/net/nfp/nfp_net_pmd.h
+++ b/drivers/net/nfp/nfp_net_pmd.h
@@ -441,6 +441,7 @@ struct nfp_net_hw {
uint8_t is_pf;
uint8_t pf_port_idx;
uint8_t pf_multiport_enabled;
+   union eth_table_entry *eth_table;
nspu_desc_t *nspu_desc;
nfpu_desc_t *nfpu_desc;
 };
diff --git a/drivers/net/nfp/nfp_nspu.c b/drivers/net/nfp/nfp_nspu.c
index 29325e6..36f2b30 100644
--- a/drivers/net/nfp/nfp_nspu.c
+++ b/drivers/net/nfp/nfp_nspu.c
@@ -11,7 +11,6 @@
 #include 
 
 #include "nfp_nfpu.h"
-#include "nfp_net_eth.h"
 
 #define CFG_EXP_BAR_ADDR_SZ 1
 #define CFG_EXP_BAR_MAP_TYPE   1
@@ -601,3 +600,24 @@
rte_spinlock_unlock(&desc->nsp_lock);
return ret;
 }
+
+int
+nfp_nsp_eth_read_table(nspu_desc_t *desc, union eth_table_entry **table)
+{
+   int ret;
+
+   RTE_LOG(INFO, PMD, "Reading hw ethernet table...\n");
+   /* port 0 allocates the eth table and reads it using NSPU */
+   *table = malloc(NSP_ETH_TABLE_SIZE);
+   if (!*table)
+   return -ENOMEM;
+
+   ret = nspu_command(desc, NSP_CMD_READ_ETH_TABLE, 1, 0, *table,
+  NSP_ETH_TABLE_SIZE, 0);
+   if (ret)
+   return ret;
+
+   RTE_LOG(INFO, PMD, "Done\n");
+
+   return 0;
+}
diff --git a/drivers/net/nfp/nfp_nspu.h b/drivers/net/nfp/nfp_nspu.h
index 4e58986..8c33835 100644
--- a/drivers/net/nfp/nfp_nspu.h
+++ b/drivers/net/nfp/nfp_nspu.h
@@ -58,6 +58,7 @@
  */
 
 #include 
+#include "nfp_net_eth.h"
 
 typedef struct {
int nfp;/* NFP device */
@@ -79,3 +80,4 @@ int nfp_nspu_init(nspu_desc_t *desc, int nfp, int pcie_bar, 
size_t pcie_barsz,
 int nfp_nsp_map_ctrl_bar(nspu_desc_t *desc, uint64_t *pcie_offset);
 void nfp_nsp_map_queues_bar(nspu_desc_t *desc, uint64_t *pcie_offset);
 int nfp_nsp_eth_config(nspu_desc_t *desc, int port, int up);
+int nfp_nsp_eth_read_table(nspu_desc_t *desc, union eth_table_entry **table);
-- 
1.9.1



[dpdk-dev] [PATCH 16/16] doc: update nfp with pf support information

2017-08-24 Thread Alejandro Lucero
The NFP PMD now has support for both PF and VFs. This patch updates
the guide and gives some information about the implications.

Signed-off-by: Alejandro Lucero 
---
 doc/guides/nics/nfp.rst | 71 -
 1 file changed, 52 insertions(+), 19 deletions(-)

diff --git a/doc/guides/nics/nfp.rst b/doc/guides/nics/nfp.rst
index c732fb1..69ae952 100644
--- a/doc/guides/nics/nfp.rst
+++ b/doc/guides/nics/nfp.rst
@@ -1,5 +1,5 @@
 ..  BSD LICENSE
-Copyright(c) 2015 Netronome Systems, Inc. All rights reserved.
+Copyright(c) 2015-2017 Netronome Systems, Inc. All rights reserved.
 All rights reserved.
 
 Redistribution and use in source and binary forms, with or without
@@ -38,31 +38,31 @@ up to 400 Gbps.
 
 This document explains how to use DPDK with the Netronome Poll Mode
 Driver (PMD) supporting Netronome's Network Flow Processor 6xxx
-(NFP-6xxx).
+(NFP-6xxx) and Netronome's Flow Processor 4xxx (NFP-4xxx).
 
-Currently the driver supports virtual functions (VFs) only.
+NFP is an SR-IOV capable device and the PMD supports both the physical
+function (PF) and virtual functions (VFs).
 
 Dependencies
 
 
-Before using the Netronome's DPDK PMD some NFP-6xxx configuration,
+Before using the Netronome's DPDK PMD some NFP configuration,
 which is not related to DPDK, is required. The system requires
-installation of **Netronome's BSP (Board Support Package)** which includes
-Linux drivers, programs and libraries.
+installation of **Netronome's BSP (Board Support Package)** along
+with some specific NFP firmware application.
 
-If you have a NFP-6xxx device you should already have the code and
-documentation for doing this configuration. Contact
+If you have a NFP device you should already have the code and
+documentation for doing all this configuration. Contact
 **supp...@netronome.com** to obtain the latest available firmware.
 
-The NFP Linux kernel drivers (including the required PF driver for the
-NFP) are available on Github at
+The NFP Linux netdev kernel driver for VFs is part of vanilla kernel
+since kernel version 4.5, and support for the PF since kernel version
+4.11. Support for older kernels can be obtained on Github at
 **https://github.com/Netronome/nfp-drv-kmods** along with build
 instructions.
 
-DPDK runs in userspace and PMDs uses the Linux kernel UIO interface to
-allow access to physical devices from userspace. The NFP PMD requires
-the **igb_uio** UIO driver, available with DPDK, to perform correct
-initialization.
+NFP PMD needs to be used along with UIO **igb_uio** or VFIO (vfio-pci)
+Linux kernel driver.
 
 Building the software
 -
@@ -71,7 +71,7 @@ Netronome's PMD code is provided in the **drivers/net/nfp** directory.
 Although NFP PMD has Netronome´s BSP dependencies, it is possible to
 compile it along with other DPDK PMDs even if no BSP was installed before.
 Of course, a DPDK app will require such a BSP installed for using the
-NFP PMD.
+NFP PMD, along with a specific NFP firmware application.
 
 Default PMD configuration is at **common_linuxapp configuration** file:
 
@@ -88,13 +88,46 @@ Refer to the document :ref:`compiling and testing a PMD for a NIC 

[dpdk-dev] [PATCH 0/8] bnxt patchset

2017-08-24 Thread Ajit Khaparde
This patchset adds:
  - support for xstats get by id
  - support for rx_queue_count
  - support for rx_descriptor_status
  - support for rx_descriptor_done
  - support for tx_descriptor_status
  - support for flow filter ops
  - new HWRM structures which are used by the flow filtering functions
It also fixes the HWRM_*() macros and locking to ensure that there is
only one outstanding command with the firmware.

Please consider including them for the upcoming release.

-- 
2.10.1 (Apple Git-78)



[dpdk-dev] [PATCH 1/8] net/bnxt: add support for xstats get by id

2017-08-24 Thread Ajit Khaparde
This patch adds support for xstats_get_by_id/xstats_get_names_by_id.
Signed-off-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt_ethdev.c |  2 ++
 drivers/net/bnxt/bnxt_stats.c  | 44 ++
 drivers/net/bnxt/bnxt_stats.h  |  5 +
 3 files changed, 51 insertions(+)

diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index c9d1122..1302710 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -1564,6 +1564,8 @@ static const struct eth_dev_ops bnxt_dev_ops = {
.txq_info_get = bnxt_txq_info_get_op,
.dev_led_on = bnxt_dev_led_on_op,
.dev_led_off = bnxt_dev_led_off_op,
+   .xstats_get_by_id = bnxt_dev_xstats_get_by_id_op,
+   .xstats_get_names_by_id = bnxt_dev_xstats_get_names_by_id_op,
 };
 
 static bool bnxt_vf_pciid(uint16_t id)
diff --git a/drivers/net/bnxt/bnxt_stats.c b/drivers/net/bnxt/bnxt_stats.c
index d7d0e35..46fc3c4 100644
--- a/drivers/net/bnxt/bnxt_stats.c
+++ b/drivers/net/bnxt/bnxt_stats.c
@@ -358,3 +358,47 @@ void bnxt_dev_xstats_reset_op(struct rte_eth_dev *eth_dev)
if (!(bp->flags & BNXT_FLAG_PORT_STATS))
RTE_LOG(ERR, PMD, "Operation not supported\n");
 }
+
+int bnxt_dev_xstats_get_by_id_op(struct rte_eth_dev *dev, const uint64_t *ids,
+   uint64_t *values, unsigned int limit)
+{
+   /* Account for the Tx drop pkts aka the Anti spoof counter */
+   const unsigned int stat_cnt = RTE_DIM(bnxt_rx_stats_strings) +
+   RTE_DIM(bnxt_tx_stats_strings) + 1;
+   struct rte_eth_xstat xstats[stat_cnt];
+   uint16_t i;
+
+   bnxt_dev_xstats_get_op(dev, xstats, limit);
+
+   for (i = 0; i < limit; i++) {
+   if (ids[i] >= stat_cnt) {
+   RTE_LOG(ERR, PMD, "id value isn't valid\n");
+   return -1;
+   }
+   values[i] = xstats[ids[i]].value;
+   }
+   return limit;
+}
+
+int bnxt_dev_xstats_get_names_by_id_op(struct rte_eth_dev *dev,
+   struct rte_eth_xstat_name *xstats_names,
+   const uint64_t *ids, unsigned int limit)
+{
+   /* Account for the Tx drop pkts aka the Anti spoof counter */
+   const unsigned int stat_cnt = RTE_DIM(bnxt_rx_stats_strings) +
+   RTE_DIM(bnxt_tx_stats_strings) + 1;
+   struct rte_eth_xstat_name xstats_names_copy[stat_cnt];
+   uint16_t i;
+
+   bnxt_dev_xstats_get_names_op(dev, xstats_names_copy, limit);
+
+   for (i = 0; i < limit; i++) {
+   if (ids[i] >= stat_cnt) {
+   RTE_LOG(ERR, PMD, "id value isn't valid\n");
+   return -1;
+   }
+   strcpy(xstats_names[i].name,
+   xstats_names_copy[ids[i]].name);
+   }
+   return limit;
+}
diff --git a/drivers/net/bnxt/bnxt_stats.h b/drivers/net/bnxt/bnxt_stats.h
index b6d133e..daeb3d9 100644
--- a/drivers/net/bnxt/bnxt_stats.h
+++ b/drivers/net/bnxt/bnxt_stats.h
@@ -46,6 +46,11 @@ int bnxt_dev_xstats_get_names_op(__rte_unused struct 
rte_eth_dev *eth_dev,
 int bnxt_dev_xstats_get_op(struct rte_eth_dev *eth_dev,
   struct rte_eth_xstat *xstats, unsigned int n);
 void bnxt_dev_xstats_reset_op(struct rte_eth_dev *eth_dev);
+int bnxt_dev_xstats_get_by_id_op(struct rte_eth_dev *dev, const uint64_t *ids,
+   uint64_t *values, unsigned int limit);
+int bnxt_dev_xstats_get_names_by_id_op(struct rte_eth_dev *dev,
+   struct rte_eth_xstat_name *xstats_names,
+   const uint64_t *ids, unsigned int limit);
 
 struct bnxt_xstats_name_off {
char name[RTE_ETH_XSTATS_NAME_SIZE];
-- 
2.10.1 (Apple Git-78)


