[dpdk-dev] [PATCH] net/mlx5: disable ConnectX-4 Lx Multi Packet Send by default

2018-08-13 Thread Shahaf Shuler
On ConnectX-4 Lx the Multi Packet Send (MPW) feature is considered
insecure, because in some cases where the application provides incorrect
mbufs in the Tx burst, the host or NIC can get stuck.

Hence, disable the feature by default for this specific NIC.
Users can still enable it and get the performance gain
(mostly with a low number of cores) by using the txq_mpw_en devarg.

This patch will impact the out-of-the-box performance of some applications
using ConnectX-4 Lx for the sake of security and robustness.

Cc: sta...@dpdk.org

Signed-off-by: Shahaf Shuler 
---
 doc/guides/nics/mlx5.rst   | 7 ++-
 drivers/net/mlx5/mlx5.c| 9 +++--
 drivers/net/mlx5/mlx5_ethdev.c | 5 +++--
 drivers/net/mlx5/mlx5_prm.h| 3 ++-
 4 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 52e1213cf5..dbdb90b59b 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -339,7 +339,12 @@ Run-time configuration
   When those offloads are requested the MPS send function will not be used.
 
   It is currently only supported on the ConnectX-4 Lx, ConnectX-5 and Bluefield
-  families of adapters. Enabled by default.
+  families of adapters.
+  On ConnectX-4 Lx the MPW is considered un-secure hence disabled by default.
+  Users which enable the MPW should be aware that application which provides incorrect
+  mbuf descriptors in the Tx burst can lead to serious errors in the host including, on some cases,
+  NIC to get stuck.
+  On ConnectX-5 and Bluefield the MPW is secure and enabled by default.
 
 - ``txq_mpw_hdr_dseg_en`` parameter [int]
 
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ec63bc6e22..2e8f906f35 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -477,7 +477,11 @@ mlx5_args_check(const char *key, const char *val, void *opaque)
} else if (strcmp(MLX5_TXQS_MIN_INLINE, key) == 0) {
config->txqs_inline = tmp;
} else if (strcmp(MLX5_TXQ_MPW_EN, key) == 0) {
-   config->mps = !!tmp ? config->mps : 0;
+   tmp = !!tmp;
+   if (tmp && config->mps == MLX5_MPW)
+   config->mps = MLX5_MPW_USER_FORCED;
+   else
+   config->mps = tmp ? config->mps : 0;
} else if (strcmp(MLX5_TXQ_MPW_HDR_DSEG_EN, key) == 0) {
config->mpw_hdr_dseg = !!tmp;
} else if (strcmp(MLX5_TXQ_MAX_INLINE_LEN, key) == 0) {
@@ -1044,7 +1048,8 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
}
DRV_LOG(INFO, "%sMPS is %s",
config.mps == MLX5_MPW_ENHANCED ? "enhanced " : "",
-   config.mps != MLX5_MPW_DISABLED ? "enabled" : "disabled");
+   (config.mps != MLX5_MPW_DISABLED && config.mps != MLX5_MPW) ?
+   "enabled" : "disabled");
if (config.cqe_comp && !cqe_comp) {
DRV_LOG(WARNING, "Rx CQE compression isn't supported");
config.cqe_comp = 0;
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 34c5b95ee6..48917b0f6b 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1224,11 +1224,12 @@ mlx5_select_tx_function(struct rte_eth_dev *dev)
"port %u selected enhanced MPW Tx function",
dev->data->port_id);
}
-   } else if (config->mps && (config->txq_inline > 0)) {
+   } else if (config->mps == MLX5_MPW_USER_FORCED &&
+  (config->txq_inline > 0)) {
tx_pkt_burst = mlx5_tx_burst_mpw_inline;
DRV_LOG(DEBUG, "port %u selected MPW inline Tx function",
dev->data->port_id);
-   } else if (config->mps) {
+   } else if (config->mps == MLX5_MPW_USER_FORCED) {
tx_pkt_burst = mlx5_tx_burst_mpw;
DRV_LOG(DEBUG, "port %u selected MPW Tx function",
dev->data->port_id);
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index 0870d32fdb..b64cd45eee 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -197,7 +197,8 @@ struct mlx5_wqe64 {
 /* MPW mode. */
 enum mlx5_mpw_mode {
MLX5_MPW_DISABLED,
-   MLX5_MPW,
+   MLX5_MPW, /* MPW is supported by the device. */
+   MLX5_MPW_USER_FORCED, /* MPW is supported and selected by the user. */
MLX5_MPW_ENHANCED, /* Enhanced Multi-Packet Send WQE, a.k.a MPWv2. */
 };
 
-- 
2.12.0
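
The mode handling in the patch above can be followed more easily as a small standalone sketch. It is only an illustration under stated assumptions: the `sketch_` names are hypothetical stand-ins for `enum mlx5_mpw_mode` and the `txq_mpw_en` branch of `mlx5_args_check()`, and the probed mode is passed in by hand instead of being read from the device.

```c
/* Hypothetical mirror of enum mlx5_mpw_mode as modified by the patch. */
enum sketch_mpw_mode {
	SKETCH_MPW_DISABLED,
	SKETCH_MPW,             /* MPW is supported by the device. */
	SKETCH_MPW_USER_FORCED, /* MPW is supported and selected by the user. */
	SKETCH_MPW_ENHANCED,    /* Enhanced MPW, a.k.a. MPWv2. */
};

/*
 * Hypothetical helper following the txq_mpw_en branch of the patch:
 * a non-zero value promotes plain MPW to "user forced", zero disables
 * MPW, and any other probed mode is kept when the value is non-zero.
 */
static enum sketch_mpw_mode
sketch_apply_txq_mpw_en(enum sketch_mpw_mode probed, int val)
{
	int on = !!val;

	if (on && probed == SKETCH_MPW)
		return SKETCH_MPW_USER_FORCED;
	return on ? probed : SKETCH_MPW_DISABLED;
}

/*
 * Hypothetical helper following the log line and Tx function selection:
 * plain SKETCH_MPW (supported but not requested) counts as disabled.
 */
static int
sketch_mpw_tx_selected(enum sketch_mpw_mode mode)
{
	return mode != SKETCH_MPW_DISABLED && mode != SKETCH_MPW;
}
```

With these helpers, a device probed as plain MPW stays on the non-MPW Tx path unless `txq_mpw_en` is set to a non-zero value, while an enhanced MPW device is unaffected by the new default.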



Re: [dpdk-dev] [PATCH v2 00/12] preparing l2fwd for eventmode additions

2018-08-13 Thread Joseph, Anoob

Hi Bruce, Pablo,

If there are no more issues about the approach, can you review the 
patches and give the feedback?


Please do note that this series doesn't add any event mode specific 
code. That will come as a different patch series after incorporating 
Jerin's comments.


Thanks,
Anoob
On 02-08-2018 13:49, Ananyev, Konstantin wrote:


Hi everyone,


In order to get this series accepted, we need more discussions
with more people involved.
So it will miss 18.08.

It can be discussed in a more global discussion about examples maintenance.
If discussion does not happen, you can request it to the technical board.


Event dev framework and various adapters enable multiple packet handling
schemes, as opposed to the traditional polling on queues. But these
features are not integrated into any established example application.
There are specific example applications for event dev etc, which can be
used to analyze an event device or a particular eventdev adapter, but
there is no standard application which can be used to compare the real
world performance for a system when it's using event device for packet
handling and when it's done via polling on queues.

The following patch submitted by Sunil was looking to address this issue
with l3fwd,
https://mails.dpdk.org/archives/dev/2018-March/093131.html

Bruce & Jerin reviewed the patch and suggested the addition of helper
functions to abstract the event mode additions in applications,
https://mails.dpdk.org/archives/dev/2018-April/096879.html

This effort of adding helper functions for eventmode was taken up
following the above suggestion. The idea is to add eventmode without
touching the existing code path. All the eventmode specific additions
would go into library so that these need not be repeated for every
application. And since there is no change in the existing code path,
performance for any vendor should not have any impact with the additions.

The scope of this effort has increased since the submission, as now we
have Tx adapter as well. Sunil & Konstantin had clarified their
concerns, and gave green flag to this approach.
https://mails.dpdk.org/archives/dev/2018-June/105730.html
https://mails.dpdk.org/archives/dev/2018-July/106453.html

I guess Bruce was opening this question to the community. For compute
intense applications like ipsec-secgw, eventmode might be the right
approach in the first place. Such complex applications would need a
scheduler to perform dynamic load balancing. Addition of eventmode in
l2fwd was to float around the idea which can then be scaled for more
complex applications.

If the maintainers don't have any objection to this, I'm fine with adding
this in the next release.

Thanks,
Anoob

It is important that DPDK has good examples of how to use existing
frameworks and libraries. These applications are what most customers
build their applications from and they provide basis for testing.

The DPDK needs to continue to support multiple usage models. This
is one of its strong points. I would rather leave existing l2fwd
and l3fwd alone and instead make new examples that use the frameworks.
If nothing else, having l2fwd and l2fwd-eventdev would allow for
performance comparisons.

Unlike in other example applications, there won't be any change in the packet
processing functions between the eventdev and poll mode cases. Only the worker
scheme will change, and that can be moved to separate files,
something like worker_poll.c and worker_event.c, both of which
use common mbuf-based packet processing functions.

The only disadvantage of having a separate application would be packet
processing code duplication, which is non-trivial for l3fwd and the IPsec
application IMO.

Personally I am OK with the original design suggestion:
keep the packet processing code common, so that it is used by both poll and event modes.
We could just have a command-line parameter selecting the mode in which the app will run.
Another alternative: generate two binaries, l2fwd-poll and l2fwd-event (or so).
Konstantin
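
The "common packet processing, mode chosen at run time" idea can be sketched roughly as below. This is not code from the series: the mode strings `"poll"` and `"event"`, the `sketch_` names and the toy forwarding rule are all assumptions made for illustration, and the real workers would poll Rx queues or dequeue events instead of taking a port number.

```c
#include <stddef.h>
#include <string.h>

/* Common processing shared by both modes (toy rule: forward to the paired port). */
static unsigned int
sketch_process(unsigned int in_port)
{
	return in_port ^ 1;
}

/* Hypothetical poll-mode worker: would poll an Rx queue, then process. */
static unsigned int
sketch_worker_poll(unsigned int in_port)
{
	return sketch_process(in_port);
}

/* Hypothetical event-mode worker: would dequeue an event, then process. */
static unsigned int
sketch_worker_event(unsigned int in_port)
{
	return sketch_process(in_port);
}

typedef unsigned int (*sketch_worker_t)(unsigned int);

/* Pick the worker from a hypothetical command-line mode string. */
static sketch_worker_t
sketch_select_worker(const char *mode)
{
	if (strcmp(mode, "poll") == 0)
		return sketch_worker_poll;
	if (strcmp(mode, "event") == 0)
		return sketch_worker_event;
	return NULL; /* Unknown mode. */
}
```

Only the worker differs between modes; the processing function is written once, which is the point made about avoiding duplication for l3fwd and IPsec.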

# Are we fine with code duplication in example application like l3fwd and
IPSec?
# if yes, Are we fine with keeping l2fwd _as is_ to reduce the
complexity and l2fwd-eventdev supports both modes wherever possible?


As the number of examples increases, we probably also need to have
a roadmap or decision chart to explain the advantages/disadvantages
of each architecture.





Re: [dpdk-dev] [PATCH] app/testpmd: support bitmask for RSS and FDIR

2018-08-13 Thread Xing, Beilei



> -----Original Message-----
> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> Sent: Friday, August 10, 2018 4:58 PM
> To: Xing, Beilei 
> Cc: dev@dpdk.org; Lu, Wenzhuo ; Wu, Jingjing
> 
> Subject: Re: [dpdk-dev] [PATCH] app/testpmd: support bitmask for RSS and
> FDIR
> 
> 10/08/2018 04:31, Xing, Beilei:
> > Hi Thomas,
> >
> > From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > > 06/08/2018 07:45, Beilei Xing:
> > > > This patch adds bitmask support for RSS, FDIR and FDIR flexible
> > > > payload.
> > > >
> > > > Signed-off-by: Beilei Xing 
> > >
> > > Flow director API is deprecated for almost 2 years:
> > >   http://git.dpdk.org/dpdk/commit/?id=7fdcde6
> > > We asked several times to stop using it in i40e.
> > > It has not been listened.
> > > We have even accepted a workaround in flow_filtering example for i40e:
> > >   http://git.dpdk.org/dpdk/commit/?id=9a93446
> > > We need to stop and move all to a common rte_flow API.
> > >
> > > As a consequence, I will be against this patch.
> > >
> >
> > It's not for deprecated flow director API, it's mainly for i40e private API
> rte_pmd_i40e_inset_set.
> > This patch is an extend of:
> > https://patches.dpdk.org/patch/33031/
> 
> It is a private API for flow director.
> I feel you play with me, it's worst.
> 

This API was designed when i40e added support for DDP (Dynamic Device Personalization),
which can support any protocol a customer wants. The protocols supported in rte_flow
are limited: if the HW supports a new protocol, we would have to change rte_flow and the
driver correspondingly each time, which is not dynamic. So although the API works
for FDIR, it was not designed for FDIR specifically but for the input set. Sorry
the title annoyed you, it's my fault.
You are right that a private API is not a good way after all. I will abandon
the patch for now and investigate whether there is another way. Thanks for the comments.



[dpdk-dev] [RFC] ethdev: support metadata as flow rule criteria

2018-08-13 Thread Dekel Peled
Current implementation of rte_flow allows match pattern of flow rule,
based on packet data or header fields.
This limits the application use of match patterns.

For example, consider a vswitch application which controls a set of VMs,
connected with virtio, in a fabric with overlay of VXLAN.
Several VMs can have the same inner tuple, while the outer tuple is
different and controlled by the vswitch (encap action).
For the vswitch to be able to offload the rule to the NIC, it must use a
unique match criterion, independent of the inner tuple, to perform the
encap action.

This RFC adds support for additional metadata to use as match pattern.
The metadata is an opaque item, fully controlled by the application.

The use of metadata is relevant for egress rules only.
It can be set in the flow rule using the RTE_FLOW_ITEM_TYPE_META item.

The application should set the packet metadata in the mbuf->metadata field,
and set the PKT_TX_METADATA flag in mbuf->ol_flags.
The NIC will use the packet metadata as a match criterion for relevant flow
rules.

For example, to do an encap action depending on the VM id, the
application needs to configure 'match on metadata' rte_flow rule with
VM id as metadata, along with desired encap action.
When preparing an egress data packet, application will set VM id data in
mbuf metadata field and set PKT_TX_METADATA flag.

PMD will send data packets to NIC, with VM id as metadata.
Egress flow on NIC will match metadata as done with other criteria.
Upon match on metadata (VM id) the appropriate encap action will be
performed.
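
A minimal sketch of the flow described above, assuming simplified stand-in types: `sketch_mbuf` and `sketch_item_meta` are hypothetical reductions of `struct rte_mbuf` and `struct rte_flow_item_meta`, byte order is ignored, and the matching that the NIC would perform is modelled in software.

```c
#include <stdint.h>

/* Same bit as PKT_TX_METADATA in this RFC; the name is a stand-in. */
#define SKETCH_PKT_TX_METADATA (1ULL << 41)

/* Hypothetical reduction of struct rte_mbuf to the two fields used here. */
struct sketch_mbuf {
	uint64_t ol_flags;
	uint64_t metadata;
};

/* Hypothetical reduction of struct rte_flow_item_meta. */
struct sketch_item_meta {
	uint64_t data;
};

/* Application side: tag an egress packet with the VM id. */
static void
sketch_set_metadata(struct sketch_mbuf *m, uint64_t vm_id)
{
	m->metadata = vm_id;
	m->ol_flags |= SKETCH_PKT_TX_METADATA;
}

/* Software model of the match: flag must be set, then compare under mask. */
static int
sketch_meta_match(const struct sketch_mbuf *m,
		  const struct sketch_item_meta *spec,
		  const struct sketch_item_meta *mask)
{
	if (!(m->ol_flags & SKETCH_PKT_TX_METADATA))
		return 0;
	return (m->metadata & mask->data) == (spec->data & mask->data);
}
```

A rule with `spec.data` set to the VM id and a full mask would then select only packets the application tagged with that id, and the encap action would be attached to that rule.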

This RFC introduces the metadata item type for rte_flow, RTE_FLOW_ITEM_TYPE_META,
along with the corresponding struct rte_flow_item_meta and ol_flag
PKT_TX_METADATA.
It also extends struct rte_mbuf with a new data item, uint64_t metadata.

Comments are welcome.

Signed-off-by: Dekel Peled 
---
 doc/guides/prog_guide/rte_flow.rst | 21 +
 lib/librte_ethdev/rte_flow.c   |  1 +
 lib/librte_ethdev/rte_flow.h   | 25 +
 lib/librte_mbuf/rte_mbuf.h | 11 +++
 4 files changed, 58 insertions(+)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index b305a72..b6e35f1 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -1191,6 +1191,27 @@ Normally preceded by any of:
 - `Item: ICMP6_ND_NS`_
 - `Item: ICMP6_ND_OPT`_
 
+Item: ``META``
+^^^^^^^^^^^^^^
+
+Matches an application specific 64 bit metadata item.
+
+- Default ``mask`` matches any 64 bit value.
+
+.. _table_rte_flow_item_meta:
+
+.. table:: META
+
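+   +----------+----------+---------------------------+
+   | Field    | Subfield | Value                     |
+   +==========+==========+===========================+
+   | ``spec`` | ``data`` | 64 bit metadata value     |
+   +----------+----------+---------------------------+
+   | ``last`` | ``data`` | upper range value         |
+   +----------+----------+---------------------------+
+   | ``mask`` | ``data`` | zeroed to match any value |
+   +----------+----------+---------------------------+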
+
 Actions
 ~~~~~~~
 
diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index cff4b52..54e5ef8 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -66,6 +66,7 @@ struct rte_flow_desc_data {
 sizeof(struct rte_flow_item_icmp6_nd_opt_sla_eth)),
MK_FLOW_ITEM(ICMP6_ND_OPT_TLA_ETH,
 sizeof(struct rte_flow_item_icmp6_nd_opt_tla_eth)),
+   MK_FLOW_ITEM(META, sizeof(struct rte_flow_item_meta)),
 };
 
 /** Generate flow_action[] entry. */
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index f8ba71c..b81c816 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -413,6 +413,15 @@ enum rte_flow_item_type {
 * See struct rte_flow_item_mark.
 */
RTE_FLOW_ITEM_TYPE_MARK,
+
+   /**
+* [META]
+*
+* Matches a metadata value specified in mbuf metadata field.
+*
+* See struct rte_flow_item_meta.
+*/
+   RTE_FLOW_ITEM_TYPE_META,
 };
 
 /**
@@ -849,6 +858,22 @@ struct rte_flow_item_gre {
 #endif
 
 /**
+ * RTE_FLOW_ITEM_TYPE_META.
+ *
+ * Matches a specified metadata value.
+ */
+struct rte_flow_item_meta {
+   uint64_t data;
+};
+
+/** Default mask for RTE_FLOW_ITEM_TYPE_META. */
+#ifndef __cplusplus
+static const struct rte_flow_item_meta rte_flow_item_meta_mask = {
+   .data = RTE_BE64(UINT64_MAX),
+};
+#endif
+
+/**
  * RTE_FLOW_ITEM_TYPE_FUZZY
  *
  * Fuzzy pattern match, expect faster than default.
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 9ce5d76..8f06a78 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -182,6 +182,11 @@
 /* add new TX flags here */
 
 /**
+ * This flag indicates that the metadata field in the mbuf is in use.
+ */
+#define PKT_TX_METADATA (1ULL << 41)
+
+/*

Re: [dpdk-dev] [RFC] mlx5: fix error unwind in device start

2018-08-13 Thread Shahaf Shuler
Hi Stephen,

Thursday, August 2, 2018 1:00 AM, Stephen Hemminger:
> Subject: [RFC] mlx5: fix error unwind in device start
> 
> The error handling in start of the mlx5 driver is buggy.
> For example, if setting up the flows fails the device driver will then get 
> stuck
> in mlx5_flow_rxq_flags_clear waiting for something that will never happen.

Looking at the code, I cannot understand why mlx5_flow_rxq_flags_clear would get
stuck, nor what it would be waiting for.
The function has a few finite loops which do not depend on anything that
happened before it during device start.

Moreover, I tried to force either mlx5_traffic_enable or mlx5_flow_start
to fail; the result was that the port failed to start, but nothing got stuck.

Can you provide more details about the issue you saw there?

> 
> The problem is that the code jumps to a common error label and does
> unwind for portions of the driver which have not been setup.
> 
> This suggested patch breaks it into different labels with each failure path 
> only
> unwinding what was done.
> 
> Also, the ethdev driver should not be manipulating the dev_started flag
> directly. That is handled by the common ethdev layer.
> 

I agree that this part of the code could perhaps be written better, but my question first
is whether there is an actual bug that this change will solve?

> The patch works for the success case, but furthur testing is needed to
> actually exercise all the error paths.
> This is left as exercise for the maintainers.
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  drivers/net/mlx5/mlx5_trigger.c | 26 +-
>  1 file changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_trigger.c
> b/drivers/net/mlx5/mlx5_trigger.c index e2a9bb703261..79a7b233986a
> 100644
> --- a/drivers/net/mlx5/mlx5_trigger.c
> +++ b/drivers/net/mlx5/mlx5_trigger.c
> @@ -171,42 +171,42 @@ mlx5_dev_start(struct rte_eth_dev *dev)
>   if (ret) {
>   DRV_LOG(ERR, "port %u Rx queue allocation failed: %s",
>   dev->data->port_id, strerror(rte_errno));
> - mlx5_txq_stop(dev);
> - return -rte_errno;
> + goto error_txq_stop;
>   }
> - dev->data->dev_started = 1;
> +
>   ret = mlx5_rx_intr_vec_enable(dev);
>   if (ret) {
>   DRV_LOG(ERR, "port %u Rx interrupt vector creation failed",
>   dev->data->port_id);
> - goto error;
> + goto error_rxq_stop;
>   }
>   mlx5_xstats_init(dev);
>   ret = mlx5_traffic_enable(dev);
>   if (ret) {
>   DRV_LOG(DEBUG, "port %u failed to set defaults flows",
>   dev->data->port_id);
> - goto error;
> + goto error_intr_vec_disable;
>   }
>   ret = mlx5_flow_start(dev, &priv->flows);
>   if (ret) {
>   DRV_LOG(DEBUG, "port %u failed to set flows",
>   dev->data->port_id);
> - goto error;
> + goto error_traffic_disable;
>   }
> +
>   dev->tx_pkt_burst = mlx5_select_tx_function(dev);
>   dev->rx_pkt_burst = mlx5_select_rx_function(dev);
>   mlx5_dev_interrupt_handler_install(dev);
>   return 0;
> -error:
> - ret = rte_errno; /* Save rte_errno before cleanup. */
> - /* Rollback. */
> - dev->data->dev_started = 0;
> - mlx5_flow_stop(dev, &priv->flows);
> +
> +error_traffic_disable:
>   mlx5_traffic_disable(dev);
> - mlx5_txq_stop(dev);
> +error_intr_vec_disable:
> + mlx5_rx_intr_vec_disable(dev);
> +error_rxq_stop:
>   mlx5_rxq_stop(dev);
> - rte_errno = ret; /* Restore rte_errno. */
> +error_txq_stop:
> + mlx5_txq_stop(dev);
>   return -rte_errno;
>  }
> 
> --
> 2.18.0
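
For readers following the discussion, the staged unwind proposed in the quoted patch can be sketched in isolation. This is an illustration only: the steps, the `sketch_` names and the `bad_stops` counter are hypothetical and stand in for the real mlx5 start/stop calls; the counter exists purely to show that no step is undone unless it was set up.

```c
enum {
	SKETCH_STEP_TXQ,
	SKETCH_STEP_RXQ,
	SKETCH_STEP_INTR,
	SKETCH_STEP_TRAFFIC,
	SKETCH_STEP_COUNT,
};

struct sketch_dev {
	int fail_at;                   /* Step that should fail, or -1 for none. */
	int active[SKETCH_STEP_COUNT]; /* 1 while the step is set up. */
	int bad_stops;                 /* Stops issued for steps never started. */
};

static int
sketch_step_start(struct sketch_dev *d, int step)
{
	if (d->fail_at == step)
		return -1;
	d->active[step] = 1;
	return 0;
}

static void
sketch_step_stop(struct sketch_dev *d, int step)
{
	if (!d->active[step])
		d->bad_stops++;
	d->active[step] = 0;
}

/* Each failure jumps to the label that undoes only the completed steps. */
static int
sketch_dev_start(struct sketch_dev *d)
{
	if (sketch_step_start(d, SKETCH_STEP_TXQ))
		return -1;
	if (sketch_step_start(d, SKETCH_STEP_RXQ))
		goto error_txq_stop;
	if (sketch_step_start(d, SKETCH_STEP_INTR))
		goto error_rxq_stop;
	if (sketch_step_start(d, SKETCH_STEP_TRAFFIC))
		goto error_intr_stop;
	return 0;

error_intr_stop:
	sketch_step_stop(d, SKETCH_STEP_INTR);
error_rxq_stop:
	sketch_step_stop(d, SKETCH_STEP_RXQ);
error_txq_stop:
	sketch_step_stop(d, SKETCH_STEP_TXQ);
	return -1;
}
```

The labels are ordered in reverse of the setup sequence, so falling through from any entry point releases exactly the resources acquired before the failing step.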



[dpdk-dev] [RFC] ethdev: add tail drop API for traffic management

2018-08-13 Thread Rosen Xu
This patch introduces new ethdev generic Tail Drop API for Traffic
Management, which is yet another standard congestion management
offload for Ethernet devices.

Tail Drop means dropping packets when they arrive at a congested
interface buffer. It is one mode of congestion management for hierarchy
leaf nodes.

There are two configuration parameters for Tail Drop:
1. Buffer Depth: determines the depth of the receive FIFO for packet RX.
2. Drop Threshold: the water line of the receive FIFO, used to decide whether
   the currently received packet is dropped or enqueued.
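
The drop decision implied by these two parameters might look like the sketch below. It is an assumption-laden illustration, not the proposed API: `sketch_tdrop_params` keeps only the `drop_th` and `packet_mode` fields of the proposed `struct rte_tm_tdrop_params`, the queue model is hypothetical, and the rule "drop once the fill level has reached the threshold" is one plausible reading of the description.

```c
#include <stdint.h>

/* Hypothetical subset of the proposed struct rte_tm_tdrop_params. */
struct sketch_tdrop_params {
	uint64_t drop_th; /* Drop threshold of the queue. */
	int packet_mode;  /* Non-zero: threshold in packets; zero: in bytes. */
};

/* Hypothetical queue fill state. */
struct sketch_queue {
	uint64_t n_pkts;
	uint64_t n_bytes;
};

/*
 * Returns 1 if the packet is tail-dropped, 0 if it is enqueued.
 * Assumption: a packet is dropped once the current fill level has
 * reached the threshold.
 */
static int
sketch_tdrop_enqueue(struct sketch_queue *q,
		     const struct sketch_tdrop_params *p,
		     uint32_t pkt_len)
{
	uint64_t fill = p->packet_mode ? q->n_pkts : q->n_bytes;

	if (fill >= p->drop_th)
		return 1;
	q->n_pkts++;
	q->n_bytes += pkt_len;
	return 0;
}
```

In packet mode the threshold counts queued packets regardless of size, while in byte mode a few large packets can reach the threshold sooner than many small ones.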

Signed-off-by: Rosen Xu 
---
 lib/librte_ethdev/rte_tm.c|  42 ++
 lib/librte_ethdev/rte_tm.h| 172 ++
 lib/librte_ethdev/rte_tm_driver.h |  35 
 3 files changed, 249 insertions(+)

diff --git a/lib/librte_ethdev/rte_tm.c b/lib/librte_ethdev/rte_tm.c
index 9709454..89a7dec 100644
--- a/lib/librte_ethdev/rte_tm.c
+++ b/lib/librte_ethdev/rte_tm.c
@@ -168,6 +168,48 @@ int rte_tm_shared_wred_context_delete(uint16_t port_id,
shared_wred_context_id, error);
 }
 
+/* Add Tail Drop profile */
+int rte_tm_tdrop_profile_add(uint16_t port_id,
+   uint32_t tdrop_profile_id,
+   struct rte_tm_tdrop_params *profile,
+   struct rte_tm_error *error)
+{
+   struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+   return RTE_TM_FUNC(port_id, tdrop_profile_add)(dev,
+   tdrop_profile_id, profile, error);
+}
+
+/* Delete Tail Drop profile */
+int rte_tm_tdrop_profile_delete(uint16_t port_id,
+   uint32_t tdrop_profile_id,
+   struct rte_tm_error *error)
+{
+   struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+   return RTE_TM_FUNC(port_id, tdrop_profile_delete)(dev,
+   tdrop_profile_id, error);
+}
+
+/* Add/update shared Tail Drop context */
+int rte_tm_shared_tdrop_context_add_update(uint16_t port_id,
+   uint32_t shared_tdrop_context_id,
+   uint32_t tdrop_profile_id,
+   struct rte_tm_error *error)
+{
+   struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+   return RTE_TM_FUNC(port_id, shared_tdrop_context_add_update)(dev,
+   shared_tdrop_context_id, tdrop_profile_id, error);
+}
+
+/* Delete shared Tail Drop context */
+int rte_tm_shared_tdrop_context_delete(uint16_t port_id,
+   uint32_t shared_tdrop_context_id,
+   struct rte_tm_error *error)
+{
+   struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+   return RTE_TM_FUNC(port_id, shared_tdrop_context_delete)(dev,
+   shared_tdrop_context_id, error);
+}
+
 /* Add shaper profile */
 int rte_tm_shaper_profile_add(uint16_t port_id,
uint32_t shaper_profile_id,
diff --git a/lib/librte_ethdev/rte_tm.h b/lib/librte_ethdev/rte_tm.h
index 955f02f..91b087d 100644
--- a/lib/librte_ethdev/rte_tm.h
+++ b/lib/librte_ethdev/rte_tm.h
@@ -93,6 +93,15 @@
 #define RTE_TM_WRED_PROFILE_ID_NONE  UINT32_MAX
 
 /**
+ * Invalid TDROP profile ID.
+ *
+ * @see struct rte_tm_node_params
+ * @see rte_tm_node_add()
+ * @see rte_tm_node_tdrop_context_update()
+ */
+#define RTE_TM_TDROP_PROFILE_ID_NONE  UINT32_MAX
+
+/**
  *Invalid shaper profile ID.
  *
  * @see struct rte_tm_node_params
@@ -871,6 +880,37 @@ struct rte_tm_wred_params {
 };
 
 /**
+ * Tail Drop (TDROP) profile
+ *
+ * Multiple TDROP contexts can share the same TDROP profile. Each leaf node with
+ * TDROP enabled as its congestion management mode has zero or one private TDROP
+ * context (only one leaf node using it) and/or zero, one or several shared
+ * TDROP contexts (multiple leaf nodes use the same TDROP context). A private
+ * TDROP context is used to perform congestion management for a single leaf
+ * node, while a shared TDROP context is used to perform congestion management
+ * for a group of leaf nodes.
+ *
+ * @see struct rte_tm_capabilities::cman_tdrop_packet_mode_supported
+ * @see struct rte_tm_capabilities::cman_tdrop_byte_mode_supported
+ */
+struct rte_tm_tdrop_params {
+   /** Committed queue length (in bytes) */
+   uint64_t committed_length;
+
+   /** Peak queue length (in bytes) */
+   uint64_t peak_length;
+
+   /** Drop threshold of queue */
+   uint64_t drop_th;
+
+   /** When non-zero, the *drop_th* threshold is specified
+* in packets (TDROP packet mode). When zero, the *drop_th*
+* threshold is specified in bytes (TDROP byte mode)
+*/
+   int packet_mode;
+};
+
+/**
  * Token bucket
  */
 struct rte_tm_token_bucket {
@@ -1000,6 +1040,32 @@ struct rte_tm_node_params {
 */
uint32_t n_shared_wred_contexts;
} wred;
+
+   /** TDROP parameters (only valid when *cman* is set to
+* TDROP).
+*/
+   struct {
+   /** TDROP profile for private TDROP context. The
+ 

Re: [dpdk-dev] [RFC] ethdev: support metadata as flow rule criteria

2018-08-13 Thread Dekel Peled
Adding relevant maintainers.
 
> -----Original Message-----
> From: Dekel Peled [mailto:dek...@mellanox.com]
> Sent: Monday, August 13, 2018 10:47 AM
> To: dev@dpdk.org
> Cc: Ori Kam ; Shahaf Shuler
> 
> Subject: [RFC] ethdev: support metadata as flow rule criteria
> 
> Current implementation of rte_flow allows match pattern of flow rule, based
> on packet data or header fields.
> This limits the application use of match patterns.
> 
> For example, consider a vswitch application which controls a set of VMs,
> connected with virtio, in a fabric with overlay of VXLAN.
> Several VMs can have the same inner tuple, while the outer tuple is different
> and controlled by the vswitch (encap action).
> For the vswtich to be able to offload the rule to the NIC, it must use a 
> unique
> match criteria, independent from the inner tuple, to perform the encap
> action.
> 
> This RFC adds support for additional metadata to use as match pattern.
> The metadata is an opaque item, fully controlled by the application.
> 
> The use of metadata is relevant for egress rules only.
> It can be set in the flow rule using the RTE_FLOW_ITEM_META.
> 
> Application should set the packet metdata in the mbuf->metadata field, and
> set the PKT_TX_METADATA flag in the mbuf->ol_flags.
> The NIC will use the packet metadata as match criteria for relevant flow 
> rules.
> 
> For example, to do an encap action depending on the VM id, the application
> needs to configure 'match on metadata' rte_flow rule with VM id as
> metadata, along with desired encap action.
> When preparing an egress data packet, application will set VM id data in
> mbuf metadata field and set PKT_TX_METADATA flag.
> 
> PMD will send data packets to NIC, with VM id as metadata.
> Egress flow on NIC will match metadata as done with other criteria.
> Upon match on metadata (VM id) the appropriate encap action will be
> performed.
> 
> This RFC introduces metadata item type for rte_flow
> RTE_FLOW_ITEM_META, along with corresponding struct
> rte_flow_item_meta and ol_flag PKT_TX_METADATA.
> It also enhances struct rte_mbuf with new data item, uint64_t metadata.
> 
> Comments are welcome.
> 
> Signed-off-by: Dekel Peled 
> ---
>  doc/guides/prog_guide/rte_flow.rst | 21 +
>  lib/librte_ethdev/rte_flow.c   |  1 +
>  lib/librte_ethdev/rte_flow.h   | 25 +
>  lib/librte_mbuf/rte_mbuf.h | 11 +++
>  4 files changed, 58 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/rte_flow.rst
> b/doc/guides/prog_guide/rte_flow.rst
> index b305a72..b6e35f1 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -1191,6 +1191,27 @@ Normally preceded by any of:
>  - `Item: ICMP6_ND_NS`_
>  - `Item: ICMP6_ND_OPT`_
> 
> +Item: ``META``
> +^^
> +
> +Matches an application specific 64 bit metadata item.
> +
> +- Default ``mask`` matches any 64 bit value.
> +
> +.. _table_rte_flow_item_meta:
> +
> +.. table:: META
> +
> +   +--+--+---+
> +   | Field| Subfield | Value |
> +   +==+==+===+
> +   | ``spec`` | ``data`` | 64 bit metadata value |
> +   +--+--+
> +   | ``last`` | ``data`` | upper range value |
> +   +--+--+---+
> +   | ``mask`` | ``data`` | zeroed to match any value |
> +   +--+--+---+
> +
>  Actions
>  ~~~
> 
> diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c index
> cff4b52..54e5ef8 100644
> --- a/lib/librte_ethdev/rte_flow.c
> +++ b/lib/librte_ethdev/rte_flow.c
> @@ -66,6 +66,7 @@ struct rte_flow_desc_data {
>sizeof(struct rte_flow_item_icmp6_nd_opt_sla_eth)),
>   MK_FLOW_ITEM(ICMP6_ND_OPT_TLA_ETH,
>sizeof(struct rte_flow_item_icmp6_nd_opt_tla_eth)),
> + MK_FLOW_ITEM(META, sizeof(struct rte_flow_item_meta)),
>  };
> 
>  /** Generate flow_action[] entry. */
> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h index
> f8ba71c..b81c816 100644
> --- a/lib/librte_ethdev/rte_flow.h
> +++ b/lib/librte_ethdev/rte_flow.h
> @@ -413,6 +413,15 @@ enum rte_flow_item_type {
>* See struct rte_flow_item_mark.
>*/
>   RTE_FLOW_ITEM_TYPE_MARK,
> +
> + /**
> +  * [META]
> +  *
> +  * Matches a metadata value specified in mbuf metadata field.
> +  *
> +  * See struct rte_flow_item_meta.
> +  */
> + RTE_FLOW_ITEM_TYPE_META,
>  };
> 
>  /**
> @@ -849,6 +858,22 @@ struct rte_flow_item_gre {  #endif
> 
>  /**
> + * RTE_FLOW_ITEM_TYPE_META.
> + *
> + * Matches a specified metadata value.
> + */
> +struct rte_flow_item_meta {
> + uint64_t data;
> +};
> +
> +/** Default mask for RTE_FLOW_ITEM_TYPE_META. */ #ifndef __cplusplus
> +static const struct rte_flow_item_meta rte_flow_item_meta_mask = {
> + .da

Re: [dpdk-dev] [PATCH v2] ethdev: fix device info getting

2018-08-13 Thread Andrew Rybchenko

On 13.08.2018 05:50, Lu, Wenzhuo wrote:

Hi Thomas,



-----Original Message-----
From: Thomas Monjalon [mailto:tho...@monjalon.net]
Sent: Wednesday, August 1, 2018 11:37 PM
To: Lu, Wenzhuo ; Andrew Rybchenko
; Yigit, Ferruh 
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH v2] ethdev: fix device info getting

16/07/2018 03:58, Lu, Wenzhuo:

Hi Andrew,


-----Original Message-----
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Lu, Wenzhuo
Sent: Monday, July 16, 2018 9:08 AM
To: Andrew Rybchenko ; dev@dpdk.org
Cc: Yigit, Ferruh ; Thomas Monjalon

Subject: Re: [dpdk-dev] [PATCH v2] ethdev: fix device info getting

Hi Andrew,


-----Original Message-----
From: Andrew Rybchenko [mailto:arybche...@solarflare.com]
Sent: Friday, July 13, 2018 4:03 PM
To: Lu, Wenzhuo ; dev@dpdk.org
Cc: Yigit, Ferruh ; Thomas Monjalon

Subject: Re: [dpdk-dev] [PATCH v2] ethdev: fix device info getting

Hi, Wenzhuo,

I'm sorry, but I have more, even harder questions than the previous one.
These questions are rather generic and mainly for the ethdev maintainers.

On 13.07.2018 05:42, Wenzhuo Lu wrote:

The device information cannot be gotten correctly before the
configuration is set. Because on some NICs the information has
dependence on the configuration.

Thinking about it I have the following question. Is it valid
behaviour of the dev_info if it changes after configuration?
I always thought that the primary goal of the dev_info is to
provide information to app about device capabilities to allow app
configure device and queues correctly. Now we see the case when
dev_info changes on configure. May be it is acceptable, but it is
really suspicious. If we accept it, it should be documented.
May be dev_info should be split into parts: part which is
persistent and part which may depend on device configuration.

As I remember, the similar discussion has happened :) I've raised
the similar suggestion like this. But we don’t make it happen.
The reason is, you see, this is the rte layer's behavior. So the
user doesn't have to know it. From APP's PoV, it inputs the
configuration, it calls this API "rte_eth_dev_configure". It doesn't
know  the configuration is copied before getting the info or not.
So, to my opinion, we can still keep the behavior. We only need to
split it into parts when we do see the case that cannot make it.

Maybe I talked too much about the patch. Think about it again. Your
comments is about how to use the APIs, rte_eth_dev_info_get,

rte_eth_dev_configure. To my opinion, rte_eth_dev_info_get is just to get
the info. It can be called anywhere, before configuration or after. It's
reasonable the info changes with the configuration changing.

But we do have something missing, like, rte_eth_dev_capability_get which

should be stable. APP can use this API to get the necessary info before
configuration.

A question, maybe a little divergent thinking, that APP should have some

intelligence to handle the capability automatically. So getting the capability
is not so good and effective, looks like we still need the human involvement.
Maybe that the reason currently we suppose APP know the capability from
the paper copies, examples...

I am not sure to understand all the sentences.
But I agree that we should take a decision about the stability of these infos.
Either infos cannot change after probing, or we must document that the app
must request infos regularly (when?).

Sorry, I missed this mail.

I have the concern that different NICs have different behavior. One info can be
stable on one NIC but dynamic on another. Considering this, we had better not
split rte_eth_dev_info_get into 2 APIs. And compared with handling this
in the rte layer, maybe we can let every NIC make its own decision.
I have an idea. Maybe we can add a parameter for potentially dynamic fields. Like
changing
	uint16_t nb_rx_queues;
to
	struct nb_rx_queues {
		uint16_t value;
		bool stable;
	};


Maybe it is just a very bad example, but as I understand it nb_rx_queues is
mainly required to configure the device properly. Or should the app
configure, get the new value, reconfigure again, get the new value and so on, and
stop when the previous value is equal to the new one? Yes, I dramatise and it
sounds really bad. In any case it would over-complicate the interface and no
single app would do it correctly.


Stable dev_info is simple. If there are real cases where something can't
be stable (and maybe the recommended Rx/Tx ring sizes are a good example), it
should at least be documented in the dev_info structure description, or maybe
moved to a separate API.



By default, stable is false. Then every NIC can maintain its own behavior.

Some fields that must be stable can be left unchanged, like driver_name,
max_rx_queues.
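
The per-field "stable" flag proposed above could be sketched roughly as follows. This is only an illustration of the idea under discussion: dev_info_u16, dev_info_sketch and needs_requery() are hypothetical names, not part of ethdev, and which fields would be wrapped is assumed for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical wrapper: a value plus a flag telling the app whether the
 * value may change after the device is configured. Not an ethdev type. */
struct dev_info_u16 {
	uint16_t value;
	bool stable; /* false by default: the PMD makes no promise */
};

/* Hypothetical subset of a device info structure. */
struct dev_info_sketch {
	uint16_t max_rx_queues;           /* must be stable, left unwrapped */
	struct dev_info_u16 nb_rx_queues; /* may be dynamic on some NICs */
};

/* Returns true when the app should read the info again after configuring. */
static bool
needs_requery(const struct dev_info_sketch *info)
{
	return !info->nb_rx_queues.stable;
}
```

With such a layout the app would re-read the info only for fields a PMD marks as not stable, which is the extra burden on applications that the reply above objects to.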

As this patch is just reverting a bad commit to fix a bug, if my idea sounds
good or worth discussing, I can send another RFC mail for it.





Re: [dpdk-dev] [PATCH v2 00/12] preparing l2fwd for eventmode additions

2018-08-13 Thread Bruce Richardson
On Mon, Aug 13, 2018 at 12:52:19PM +0530, Joseph, Anoob wrote:
> Hi Bruce, Pablo,
> 
> If there are no more issues about the approach, can you review the patches
> and give the feedback?
> 
> Please do note that this series doesn't add any event mode specific code.
> That will come as a different patch series after incorporating Jerin's
> comments.
> 
> Thanks,
> Anoob

My main concern is with l2fwd, rather than l3fwd, which is already fairly
complicated. I could see l3fwd being updated to allow an eventmode without
too many problems.

With l2fwd, the only issue I have is with the volume of code involved.
l2fwd is currently a very simple application which fits in a single file.
With these updates it's no longer such a simple, approachable app, rather
it becomes one which takes a bit of studying and switching between files to
fully understand. The data path is only a very small part of the app, so by
adding an event-based path to the same app we have very little code saving.
Therefore, I think having a separate l2fwd-eventdev would be better for
this case. Two simpler-to-understand apps are better than one more
complicated one IMHO.

My 2c.

/Bruce

> On 02-08-2018 13:49, Ananyev, Konstantin wrote:
> > Hi everyone,
> > 
> > > > > > In order to get this series accepted, we need more discussions
> > > > > > with more people involved.
> > > > > > So it will miss 18.08.
> > > > > > 
> > > > > > It can be discussed in a more global discussion about examples 
> > > > > > maintenance.
> > > > > > If discussion does not happen, you can request it to the technical 
> > > > > > board.
> > > > > > 
> > > > > Event dev framework and various adapters enable multiple packet 
> > > > > handling
> > > > > schemes, as opposed to the traditional polling on queues. But these
> > > > > features are not integrated into any established example application.
> > > > > There are specific example applications for event dev etc, which can 
> > > > > be
> > > > > used to analyze an event device or a particular eventdev adapter, but
> > > > > there is no standard application which can be used to compare the real
> > > > > world performance for a system when it's using event device for packet
> > > > > handling and when it's done via polling on queues.
> > > > > 
> > > > > The following patch submitted by Sunil was looking to address this 
> > > > > issue
> > > > > with l3fwd,
> > > > > https://mails.dpdk.org/archives/dev/2018-March/093131.html
> > > > > 
> > > > > Bruce & Jerin reviewed the patch and suggested the addition of helper
> > > > > functions to abstract the event mode additions in applications,
> > > > > https://mails.dpdk.org/archives/dev/2018-April/096879.html
> > > > > 
> > > > > This effort of adding helper functions for eventmode was taken up
> > > > > following the above suggestion. The idea is to add eventmode without
> > > > > touching the existing code path. All the eventmode specific additions
> > > > > would go into library so that these need not be repeated for every
> > > > > application. And since there is no change in the existing code path,
> > > > > performance for any vendor should not have any impact with the 
> > > > > additions.
> > > > > 
> > > > > The scope of this effort has increased since the submission, as now we
> > > > > have Tx adapter as well. Sunil & Konstantin had clarified their
> > > > > concerns, and gave green flag to this approach.
> > > > > https://mails.dpdk.org/archives/dev/2018-June/105730.html
> > > > > https://mails.dpdk.org/archives/dev/2018-July/106453.html
> > > > > 
> > > > > I guess Bruce was opening this question to the community. For compute
> > > > > intense applications like ipsec-secgw, eventmode might be the right
> > > > > approach in the first place. Such complex applications would need a
> > > > > scheduler to perform dynamic load balancing. Addition of eventmode in
> > > > > l2fwd was to float around the idea which can then be scaled for more
> > > > > complex applications.
> > > > > 
> > > > > > If maintainers don't have any objection to this, I'm fine with 
> > > > > adding
> > > > > this in the next release.
> > > > > 
> > > > > Thanks,
> > > > > Anoob
> > > > It is important that DPDK has good examples of how to use existing
> > > > frameworks and libraries. These applications are what most customers
> > > > build their applications from and they provide basis for testing.
> > > > 
> > > > The DPDK needs to continue to support multiple usage models. This
> > > > is one of its strong points. I would rather leave existing l2fwd
> > > > and l3fwd alone and instead make new examples that use the frameworks.
> > > > If nothing else, having l2fwd and l2fwd-eventdev would allow for
> > > > performance comparisons.
> > > Unlike other example applications, there won't be any change in the packet
> > > processing functions in the eventdev vs poll mode case. Only the worker
> > > schematics will change, and that can be moved to separate files.
> > > something like wo

Re: [dpdk-dev] [PATCH] version: 18.11-rc0

2018-08-13 Thread Mcnamara, John



> -Original Message-
> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> Sent: Saturday, August 11, 2018 11:12 PM
> To: dev@dpdk.org
> Cc: Yigit, Ferruh ; Mcnamara, John
> 
> Subject: [PATCH] version: 18.11-rc0
> 
> Start version numbering for a new release cycle, and introduce a template
> file for release notes.
> 
> The release notes comments have a new block to suggest the order of items,
> inspired by Ferruh's proposal.
> 
> Signed-off-by: Thomas Monjalon 
> ---
>  doc/guides/rel_notes/release_18_11.rst  | 207 
>  lib/librte_eal/common/include/rte_version.h |   6 +-

The release_18_11.rst should be included in the relevant index.rst doc.

Apart from that:

Acked-by: John McNamara 






Re: [dpdk-dev] [RFC] ethdev: add generic MAC address rewrite actions

2018-08-13 Thread Rahul Lakkireddy
On Tuesday, August 08/07/18, 2018 at 14:20:10 +, Jack Min wrote:
> There is a need to offload rewriting of both the destination and source MAC
> addresses of the matched flow.
> 
> The proposed actions make this easy to achieve.
> 

+1.

We're also looking to offload these actions. In addition, we also have
a requirement to offload an action to swap the source and destination
MAC addresses (i.e. source MAC address will get overwritten with the
destination MAC address and vice-versa).

Could you please add one more action RTE_FLOW_ACTION_TYPE_MAC_SWAP
to achieve this? This action will not take any arguments. Let us
know your thoughts.
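
To make the requested semantics concrete, here is a minimal software sketch of what a MAC swap does to an Ethernet header. The struct and function names (eth_hdr_sketch, mac_swap) are made up for illustration and are not DPDK definitions; the proposed RTE_FLOW_ACTION_TYPE_MAC_SWAP would ask the hardware to do the equivalent.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6 /* length of an Ethernet MAC address in bytes */

/* Minimal Ethernet header for illustration only (not a DPDK type). */
struct eth_hdr_sketch {
	uint8_t dst[MAC_LEN];
	uint8_t src[MAC_LEN];
	uint16_t ether_type;
};

/* Exchange source and destination MAC addresses in place. */
static void
mac_swap(struct eth_hdr_sketch *hdr)
{
	uint8_t tmp[MAC_LEN];

	assert(hdr != NULL);
	memcpy(tmp, hdr->dst, MAC_LEN);
	memcpy(hdr->dst, hdr->src, MAC_LEN);
	memcpy(hdr->src, tmp, MAC_LEN);
}
```

Since both new values come from the packet itself, the action needs no configuration structure, which matches "will not take any arguments" above.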

Thanks,
Rahul

> 
> Signed-off-by: Xiaoyu Min <jack...@mellanox.com>
> ---
> lib/librte_ethdev/rte_flow.h | 32 
> 1 file changed, 32 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> index f8ba71cdb..4a51ab2a3 100644
> --- a/lib/librte_ethdev/rte_flow.h
> +++ b/lib/librte_ethdev/rte_flow.h
> @@ -1505,6 +1505,26 @@ enum rte_flow_action_type {
>  * error.
>  */
> RTE_FLOW_ACTION_TYPE_NVGRE_DECAP,
> +
> +   /**
> +* Set source MAC address from matched flow.
> +*
> +* If flow pattern does not define a valid RTE_FLOW_ITEM_TYPE_ETH,
> +* the PMD should return a RTE_FLOW_ERROR_TYPE_ACTION error.
> +*
> +* See struct rte_flow_action_set_mac.
> +*/
> +   RTE_FLOW_ACTION_TYPE_SET_MAC_SRC,
> +
> +   /**
> +* Set destination MAC address from matched flow.
> +*
> +* If flow pattern does not define a valid RTE_FLOW_ITEM_TYPE_ETH,
> +* the PMD should return a RTE_FLOW_ERROR_TYPE_ACTION error.
> +*
> +* See struct rte_flow_action_set_mac.
> +*/
> +   RTE_FLOW_ACTION_TYPE_SET_MAC_DST,
> };
> 
> /**
> @@ -1868,6 +1888,18 @@ struct rte_flow_action_nvgre_encap {
> struct rte_flow_item *definition;
> };
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * RTE_FLOW_ACTION_TYPE_SET_MAC
> + *
> + * Set MAC address from the matched flow
> + */
> +struct rte_flow_action_set_mac {
> +   uint8_t mac_addr[ETHER_ADDR_LEN];
> +};
> +
> /*
>   * Definition of a single action.
>   *
> --
> 2.17.1


[dpdk-dev] [PATCH] app/test-pmd: add and identify shaper profile parameters

2018-08-13 Thread Rosen Xu
As struct rte_tm_shaper_params defines, the test-pmd command line
should include both committed and peak parameters, but
right now the command line doesn't identify whether a parameter is
committed or peak. This patch distinguishes the two and
adds explicit definitions for both.

Signed-off-by: Rosen Xu 
Fixes: bddc2f40b594 ("app/testpmd: add commands for shaper and wred profiles")
Cc: jasvinder.si...@intel.com
---
 app/test-pmd/cmdline_tm.c | 34 --
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/app/test-pmd/cmdline_tm.c b/app/test-pmd/cmdline_tm.c
index 631f179..8ee2785 100644
--- a/app/test-pmd/cmdline_tm.c
+++ b/app/test-pmd/cmdline_tm.c
@@ -771,8 +771,10 @@ struct cmd_add_port_tm_node_shaper_profile_result {
cmdline_fixed_string_t profile;
uint16_t port_id;
uint32_t shaper_id;
-   uint64_t tb_rate;
-   uint64_t tb_size;
+   uint64_t cmit_tb_rate;
+   uint64_t cmit_tb_size;
+   uint64_t peak_tb_rate;
+   uint64_t peak_tb_size;
uint32_t pktlen_adjust;
 };
 
@@ -807,14 +809,22 @@ struct cmd_add_port_tm_node_shaper_profile_result {
TOKEN_NUM_INITIALIZER(
struct cmd_add_port_tm_node_shaper_profile_result,
shaper_id, UINT32);
-cmdline_parse_token_num_t cmd_add_port_tm_node_shaper_profile_tb_rate =
+cmdline_parse_token_num_t cmd_add_port_tm_node_shaper_profile_cmit_tb_rate =
TOKEN_NUM_INITIALIZER(
struct cmd_add_port_tm_node_shaper_profile_result,
-   tb_rate, UINT64);
-cmdline_parse_token_num_t cmd_add_port_tm_node_shaper_profile_tb_size =
+   cmit_tb_rate, UINT64);
+cmdline_parse_token_num_t cmd_add_port_tm_node_shaper_profile_cmit_tb_size =
TOKEN_NUM_INITIALIZER(
struct cmd_add_port_tm_node_shaper_profile_result,
-   tb_size, UINT64);
+   cmit_tb_size, UINT64);
+cmdline_parse_token_num_t cmd_add_port_tm_node_shaper_profile_peak_tb_rate =
+   TOKEN_NUM_INITIALIZER(
+   struct cmd_add_port_tm_node_shaper_profile_result,
+   peak_tb_rate, UINT64);
+cmdline_parse_token_num_t cmd_add_port_tm_node_shaper_profile_peak_tb_size =
+   TOKEN_NUM_INITIALIZER(
+   struct cmd_add_port_tm_node_shaper_profile_result,
+   peak_tb_size, UINT64);
 cmdline_parse_token_num_t cmd_add_port_tm_node_shaper_profile_pktlen_adjust =
TOKEN_NUM_INITIALIZER(
struct cmd_add_port_tm_node_shaper_profile_result,
@@ -838,8 +848,10 @@ static void 
cmd_add_port_tm_node_shaper_profile_parsed(void *parsed_result,
/* Private shaper profile params */
memset(&sp, 0, sizeof(struct rte_tm_shaper_params));
memset(&error, 0, sizeof(struct rte_tm_error));
-   sp.peak.rate = res->tb_rate;
-   sp.peak.size = res->tb_size;
+   sp.committed.rate = res->cmit_tb_rate;
+   sp.committed.size = res->cmit_tb_size;
+   sp.peak.rate = res->peak_tb_rate;
+   sp.peak.size = res->peak_tb_size;
sp.pkt_length_adjust = pkt_len_adjust;
 
ret = rte_tm_shaper_profile_add(port_id, shaper_id, &sp, &error);
@@ -862,8 +874,10 @@ static void 
cmd_add_port_tm_node_shaper_profile_parsed(void *parsed_result,
(void *)&cmd_add_port_tm_node_shaper_profile_profile,
(void *)&cmd_add_port_tm_node_shaper_profile_port_id,
(void *)&cmd_add_port_tm_node_shaper_profile_shaper_id,
-   (void *)&cmd_add_port_tm_node_shaper_profile_tb_rate,
-   (void *)&cmd_add_port_tm_node_shaper_profile_tb_size,
+   (void *)&cmd_add_port_tm_node_shaper_profile_cmit_tb_rate,
+   (void *)&cmd_add_port_tm_node_shaper_profile_cmit_tb_size,
+   (void *)&cmd_add_port_tm_node_shaper_profile_peak_tb_rate,
+   (void *)&cmd_add_port_tm_node_shaper_profile_peak_tb_size,
(void *)&cmd_add_port_tm_node_shaper_profile_pktlen_adjust,
NULL,
},
-- 
1.8.3.1
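
The mapping introduced by the patch above can be sketched in isolation as follows. The two structs are local stand-ins that only mirror the member names the patch uses (sp.committed.rate, sp.peak.size, sp.pkt_length_adjust); they are not the rte_tm.h definitions, and make_shaper_profile() is a hypothetical helper.

```c
#include <stdint.h>

/* Local stand-in for one token bucket (rate and size). */
struct token_bucket_sketch {
	uint64_t rate;
	uint64_t size;
};

/* Local stand-in mirroring the members the patch assigns. */
struct shaper_params_sketch {
	struct token_bucket_sketch committed;
	struct token_bucket_sketch peak;
	int32_t pkt_length_adjust;
};

/* Build a profile from the four values the patched command parses,
 * in the order committed rate/size, then peak rate/size. */
static struct shaper_params_sketch
make_shaper_profile(uint64_t cmit_tb_rate, uint64_t cmit_tb_size,
		    uint64_t peak_tb_rate, uint64_t peak_tb_size,
		    int32_t pktlen_adjust)
{
	struct shaper_params_sketch sp = {
		.committed = { .rate = cmit_tb_rate, .size = cmit_tb_size },
		.peak = { .rate = peak_tb_rate, .size = peak_tb_size },
		.pkt_length_adjust = pktlen_adjust,
	};

	return sp;
}
```

Before the patch only the peak bucket was filled from the command line, so the committed bucket stayed zeroed by the memset.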



Re: [dpdk-dev] [PATCH] version: 18.11-rc0

2018-08-13 Thread Thomas Monjalon
13/08/2018 12:07, Mcnamara, John:
> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > 
> > Start version numbering for a new release cycle, and introduce a template
> > file for release notes.
> > 
> > The release notes comments have a new block to suggest the order of items,
> > inspired by Ferruh's proposal.
> > 
> > Signed-off-by: Thomas Monjalon 
> > ---
> >  doc/guides/rel_notes/release_18_11.rst  | 207 
> >  lib/librte_eal/common/include/rte_version.h |   6 +-
> 
> The release_18_11.rst should be included in the relevant index.rst doc.

Ah yes, thank you!
> 
> Apart from that:
> 
> Acked-by: John McNamara 

Applied, thanks




[dpdk-dev] [PATCH v2] bus/pci: check if 5-level paging is enabled when testing IOMMU address width

2018-08-13 Thread Drocula
Kernel version 4.14 was released with support for 5-level paging.
When PML5 is enabled, user-space virtual addresses use up to 56 bits.
See the kernel's Documentation/x86/x86_64/mm.txt.

Signed-off-by: ZY Qiu 
---
 drivers/bus/pci/linux/pci.c | 33 ++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04648ac..acc19df 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -552,16 +553,39 @@
 }
 
 #if defined(RTE_ARCH_X86)
+/*
+ * Try to detect whether the system uses 5-level page table.
+ */
+static bool
+system_uses_PML5(void)
+{
+#define X86_56_BIT_VA (0xfULL << 52)
+   void *page_4k;
+   page_4k = mmap((void *)X86_56_BIT_VA, 4096, PROT_READ | PROT_WRITE,
+   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+
+   if (page_4k == MAP_FAILED)
+   return false;
+   munmap(page_4k, 4096);
+
+   if ((unsigned long long)page_4k & X86_56_BIT_VA)
+   return true;
+   return false;
+}
+
 static bool
 pci_one_device_iommu_support_va(struct rte_pci_device *dev)
 {
 #define VTD_CAP_MGAW_SHIFT 16
 #define VTD_CAP_MGAW_MASK  (0x3fULL << VTD_CAP_MGAW_SHIFT)
-#define X86_VA_WIDTH 47 /* From Documentation/x86/x86_64/mm.txt */
+/*  From Documentation/x86/x86_64/mm.txt */
+#define X86_VA_WIDTH_PML4 47
+#define X86_VA_WIDTH_PML5 56
+
struct rte_pci_addr *addr = &dev->addr;
char filename[PATH_MAX];
FILE *fp;
-   uint64_t mgaw, vtd_cap_reg = 0;
+   uint64_t mgaw, vtd_cap_reg = 0, va_width = X86_VA_WIDTH_PML4;
 
snprintf(filename, sizeof(filename),
 "%s/" PCI_PRI_FMT "/iommu/intel-iommu/cap",
@@ -587,8 +611,11 @@
 
fclose(fp);
 
+   if (system_uses_PML5())
+   va_width = X86_VA_WIDTH_PML5;
+
mgaw = ((vtd_cap_reg & VTD_CAP_MGAW_MASK) >> VTD_CAP_MGAW_SHIFT) + 1;
-   if (mgaw < X86_VA_WIDTH)
+   if (mgaw < va_width)
return false;
 
return true;
-- 
1.8.3.1
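
The width comparison at the end of the patch above can be tried out on its own. This sketch reuses the macros from the diff, while vtd_mgaw() and iommu_covers_va() are hypothetical helper names and the register values in the usage note are made up rather than read from a real IOMMU.

```c
#include <stdbool.h>
#include <stdint.h>

#define VTD_CAP_MGAW_SHIFT 16
#define VTD_CAP_MGAW_MASK  (0x3fULL << VTD_CAP_MGAW_SHIFT)
#define X86_VA_WIDTH_PML4  47
#define X86_VA_WIDTH_PML5  56

/* Decode the maximum guest address width from a VT-d capability
 * register value; the patch adds 1 to the raw field, as done here. */
static uint64_t
vtd_mgaw(uint64_t vtd_cap_reg)
{
	return ((vtd_cap_reg & VTD_CAP_MGAW_MASK) >> VTD_CAP_MGAW_SHIFT) + 1;
}

/* The IOMMU can cover the VA space only if MGAW is at least va_width. */
static bool
iommu_covers_va(uint64_t vtd_cap_reg, uint64_t va_width)
{
	return vtd_mgaw(vtd_cap_reg) >= va_width;
}
```

For example, a capability value whose MGAW field holds 47 decodes to a width of 48, which passes the check against X86_VA_WIDTH_PML4 but fails it against X86_VA_WIDTH_PML5. That is the case the patch starts rejecting when 5-level paging is detected.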



Re: [dpdk-dev] Multi-thread mempool usage

2018-08-13 Thread Matteo Lanzuisi

Any suggestion? any idea about this behaviour?

On 08/08/2018 11:56, Matteo Lanzuisi wrote:

Hi all,

recently I began using the "dpdk-17.11-11.el7.x86_64" rpm (RedHat rpm) on
RedHat 7.5, kernel 3.10.0-862.6.3.el7.x86_64, while porting an
application from RH6 to RH7. On RH6 I used dpdk-2.2.0.


This application is made up of one or more threads (each one on a
different logical core) reading packets from i40e interfaces.


Each thread can call the following lines of code when receiving a
specific packet:


RTE_LCORE_FOREACH(lcore_id)
{
    /* mempools are created one for each logical core */
    result = rte_mempool_get(cea_main_lcore_conf[lcore_id].de_conf.cmd_pool,
                             (VOID_P *) &new_work);

    /* debug print: on my server it should never happen, but with
     * multi-thread it always happens on the last logical core */
    if (((uint64_t)(new_work)) < 0x7f00)
        printf("Result %d, source lcore %u, receiving lcore %u, pointer %p\n",
               result, rte_lcore_id(), lcore_id, new_work);

    if (result == 0)
    {
        /* usage of the memory gotten from the mempool...
         * <- here is where the application crashes */
        new_work->command = command;
        /* enqueues the gotten buffer on the rings of all lcores */
        result = rte_ring_enqueue(cea_main_lcore_conf[lcore_id].de_conf.cmd_ring,
                                  (VOID_P) new_work);

        // check on result value ...
    }
    else
    {
        // do something if result != 0 ...
    }
}

This code worked perfectly (never had an issue) on dpdk-2.2.0, while
if more than 1 thread does these operations on dpdk-17.11,
after some time the "new_work" pointer is not a valid
one, and the application crashes when using that pointer.


It seems that these lines cannot be used by more than one thread
simultaneously. I also tried many 2017 and 2018 dpdk versions without
success.


Is this code possible on the new dpdk versions? Or do I have to change my
application so that this code is called by just one lcore at a time?


Matteo









[dpdk-dev] [RFC v2 2/3] ethdev: add flow api actions to modify TCP/UDP port numbers

2018-08-13 Thread Rahul Lakkireddy
From: Shagun Agrawal 

Add actions:
- SET_TP_SRC - set a new TCP/UDP source port number.
- SET_TP_DST - set a new TCP/UDP destination port number.

Signed-off-by: Shagun Agrawal 
Signed-off-by: Rahul Lakkireddy 
---
v2:
- Remove OpenFlow prefix from TCP/UDP port rewrite actions.
- Re-based to tip.

 app/test-pmd/cmdline_flow.c | 50 +
 app/test-pmd/config.c   |  4 +++
 doc/guides/prog_guide/rte_flow.rst  | 30 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  8 +
 lib/librte_ethdev/rte_flow.c|  4 +++
 lib/librte_ethdev/rte_flow.h| 29 +
 6 files changed, 125 insertions(+)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index fd135da64..3935539cb 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -251,6 +251,10 @@ enum index {
ACTION_SET_IPV6_SRC_IPV6_SRC,
ACTION_SET_IPV6_DST,
ACTION_SET_IPV6_DST_IPV6_DST,
+   ACTION_SET_TP_SRC,
+   ACTION_SET_TP_SRC_TP_SRC,
+   ACTION_SET_TP_DST,
+   ACTION_SET_TP_DST_TP_DST,
 };
 
 /** Maximum size for pattern in struct rte_flow_item_raw. */
@@ -828,6 +832,8 @@ static const enum index next_action[] = {
ACTION_SET_IPV4_DST,
ACTION_SET_IPV6_SRC,
ACTION_SET_IPV6_DST,
+   ACTION_SET_TP_SRC,
+   ACTION_SET_TP_DST,
ZERO,
 };
 
@@ -954,6 +960,18 @@ static const enum index action_set_ipv6_dst[] = {
ZERO,
 };
 
+static const enum index action_set_tp_src[] = {
+   ACTION_SET_TP_SRC_TP_SRC,
+   ACTION_NEXT,
+   ZERO,
+};
+
+static const enum index action_set_tp_dst[] = {
+   ACTION_SET_TP_DST_TP_DST,
+   ACTION_NEXT,
+   ZERO,
+};
+
 static const enum index action_jump[] = {
ACTION_JUMP_GROUP,
ACTION_NEXT,
@@ -2570,6 +2588,38 @@ static const struct token token_list[] = {
(struct rte_flow_action_set_ipv6, ipv6_addr)),
.call = parse_vc_conf,
},
+   [ACTION_SET_TP_SRC] = {
+   .name = "set_tp_src",
+   .help = "set TCP/UDP source port number",
+   .priv = PRIV_ACTION(SET_TP_SRC,
+   sizeof(struct rte_flow_action_set_tp)),
+   .next = NEXT(action_set_tp_src),
+   .call = parse_vc,
+   },
+   [ACTION_SET_TP_SRC_TP_SRC] = {
+   .name = "port",
+   .help = "new source port number to set",
+   .next = NEXT(action_set_tp_src, NEXT_ENTRY(UNSIGNED)),
+   .args = ARGS(ARGS_ENTRY_HTON
+(struct rte_flow_action_set_tp, port)),
+   .call = parse_vc_conf,
+   },
+   [ACTION_SET_TP_DST] = {
+   .name = "set_tp_dst",
+   .help = "set TCP/UDP destination port number",
+   .priv = PRIV_ACTION(SET_TP_DST,
+   sizeof(struct rte_flow_action_set_tp)),
+   .next = NEXT(action_set_tp_dst),
+   .call = parse_vc,
+   },
+   [ACTION_SET_TP_DST_TP_DST] = {
+   .name = "port",
+   .help = "new destination port number to set",
+   .next = NEXT(action_set_tp_dst, NEXT_ENTRY(UNSIGNED)),
+   .args = ARGS(ARGS_ENTRY_HTON
+(struct rte_flow_action_set_tp, port)),
+   .call = parse_vc_conf,
+   },
 };
 
 /** Remove and return last entry from argument stack. */
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 401b066b3..70dd52254 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1161,6 +1161,10 @@ static const struct {
   sizeof(struct rte_flow_action_set_ipv6)),
MK_FLOW_ACTION(SET_IPV6_DST,
   sizeof(struct rte_flow_action_set_ipv6)),
+   MK_FLOW_ACTION(SET_TP_SRC,
+  sizeof(struct rte_flow_action_set_tp)),
+   MK_FLOW_ACTION(SET_TP_DST,
+  sizeof(struct rte_flow_action_set_tp)),
 };
 
 /** Compute storage space needed by action configuration and copy it. */
diff --git a/doc/guides/prog_guide/rte_flow.rst 
b/doc/guides/prog_guide/rte_flow.rst
index 8f3dcc6c1..4faf8cb40 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -2136,6 +2136,36 @@ Set a new IPv6 destination address.
| ``ipv6_addr`` | new IPv6 destination address |
+---+--+
 
+Action: ``SET_TP_SRC``
+^
+
+Set a new TCP/UDP source port number.
+
+.. _table_rte_flow_action_set_tp_src:
+
+.. table:: SET_TP_SRC
+
+   +--+-+
+   | Field| Value   |
+   +==+=+
+   | ``port`` | new TCP/UDP source port |
+   +---++
+
+Action: ``SET_TP_DST``
+^
+
+Set a new TCP/UDP destination port 

[dpdk-dev] [RFC v2 3/3] net/cxgbe: add flow actions to modify IP and TCP/UDP port address

2018-08-13 Thread Rahul Lakkireddy
From: Shagun Agrawal 

Query firmware for the new filter work request to offload flows with
actions to modify IP and TCP/UDP port addresses. When available,
translate IP and TCP/UDP port address modify actions to internal
hardware specification and offload the flow to hardware.

Signed-off-by: Shagun Agrawal 
Signed-off-by: Rahul Lakkireddy 
---
v2:
- Re-based to tip.

 drivers/net/cxgbe/base/common.h |   1 +
 drivers/net/cxgbe/base/t4fw_interface.h |  50 
 drivers/net/cxgbe/cxgbe_filter.c|  21 ++-
 drivers/net/cxgbe/cxgbe_filter.h|  23 
 drivers/net/cxgbe/cxgbe_flow.c  | 100 +++-
 drivers/net/cxgbe/cxgbe_main.c  |  10 
 6 files changed, 200 insertions(+), 5 deletions(-)

diff --git a/drivers/net/cxgbe/base/common.h b/drivers/net/cxgbe/base/common.h
index 157201da2..bbd915f29 100644
--- a/drivers/net/cxgbe/base/common.h
+++ b/drivers/net/cxgbe/base/common.h
@@ -270,6 +270,7 @@ struct adapter_params {
 
bool ulptx_memwrite_dsgl;  /* use of T5 DSGL allowed */
u8 fw_caps_support;   /* 32-bit Port Capabilities */
+   bool filter2_wr_support;/* FW support for FILTER2_WR */
 };
 
 /* Firmware Port Capabilities types.
diff --git a/drivers/net/cxgbe/base/t4fw_interface.h 
b/drivers/net/cxgbe/base/t4fw_interface.h
index e80b58a32..832d22f93 100644
--- a/drivers/net/cxgbe/base/t4fw_interface.h
+++ b/drivers/net/cxgbe/base/t4fw_interface.h
@@ -61,6 +61,7 @@ enum fw_wr_opcodes {
FW_ETH_TX_PKTS_WR   = 0x09,
FW_ETH_TX_PKT_VM_WR = 0x11,
FW_ETH_TX_PKTS_VM_WR= 0x12,
+   FW_FILTER2_WR   = 0x77,
FW_ETH_TX_PKTS2_WR  = 0x78,
 };
 
@@ -197,6 +198,51 @@ struct fw_filter_wr {
__u8   sma[6];
 };
 
+struct fw_filter2_wr {
+   __be32 op_pkd;
+   __be32 len16_pkd;
+   __be64 r3;
+   __be32 tid_to_iq;
+   __be32 del_filter_to_l2tix;
+   __be16 ethtype;
+   __be16 ethtypem;
+   __u8   frag_to_ovlan_vldm;
+   __u8   smac_sel;
+   __be16 rx_chan_rx_rpl_iq;
+   __be32 maci_to_matchtypem;
+   __u8   ptcl;
+   __u8   ptclm;
+   __u8   ttyp;
+   __u8   ttypm;
+   __be16 ivlan;
+   __be16 ivlanm;
+   __be16 ovlan;
+   __be16 ovlanm;
+   __u8   lip[16];
+   __u8   lipm[16];
+   __u8   fip[16];
+   __u8   fipm[16];
+   __be16 lp;
+   __be16 lpm;
+   __be16 fp;
+   __be16 fpm;
+   __be16 r7;
+   __u8   sma[6];
+   __be16 r8;
+   __u8   filter_type_swapmac;
+   __u8   natmode_to_ulp_type;
+   __be16 newlport;
+   __be16 newfport;
+   __u8   newlip[16];
+   __u8   newfip[16];
+   __be32 natseqcheck;
+   __be32 r9;
+   __be64 r10;
+   __be64 r11;
+   __be64 r12;
+   __be64 r13;
+};
+
 #define S_FW_FILTER_WR_TID 12
 #define V_FW_FILTER_WR_TID(x)  ((x) << S_FW_FILTER_WR_TID)
 
@@ -300,6 +346,9 @@ struct fw_filter_wr {
 #define S_FW_FILTER_WR_MATCHTYPEM  0
 #define V_FW_FILTER_WR_MATCHTYPEM(x)   ((x) << S_FW_FILTER_WR_MATCHTYPEM)
 
+#define S_FW_FILTER2_WR_NATMODE5
+#define V_FW_FILTER2_WR_NATMODE(x) ((x) << S_FW_FILTER2_WR_NATMODE)
+
 /**
  *  C O M M A N D s
  */
@@ -655,6 +704,7 @@ enum fw_params_param_dev {
FW_PARAMS_PARAM_DEV_FWREV   = 0x0B, /* fw version */
FW_PARAMS_PARAM_DEV_TPREV   = 0x0C, /* tp version */
FW_PARAMS_PARAM_DEV_ULPTX_MEMWRITE_DSGL = 0x17,
+   FW_PARAMS_PARAM_DEV_FILTER2_WR  = 0x1D,
 };
 
 /*
diff --git a/drivers/net/cxgbe/cxgbe_filter.c b/drivers/net/cxgbe/cxgbe_filter.c
index 7f0d38001..deae1c37b 100644
--- a/drivers/net/cxgbe/cxgbe_filter.c
+++ b/drivers/net/cxgbe/cxgbe_filter.c
@@ -87,6 +87,9 @@ int validate_filter(struct adapter *adapter, struct 
ch_filter_specification *fs)
if (fs->val.iport >= adapter->params.nports)
return -ERANGE;
 
+   if (!fs->cap && fs->nat_mode && !adapter->params.filter2_wr_support)
+   return -EOPNOTSUPP;
+
return 0;
 }
 
@@ -648,7 +651,7 @@ int set_filter_wr(struct rte_eth_dev *dev, unsigned int 
fidx)
struct adapter *adapter = ethdev2adap(dev);
struct filter_entry *f = &adapter->tids.ftid_tab[fidx];
struct rte_mbuf *mbuf;
-   struct fw_filter_wr *fwr;
+   struct fw_filter2_wr *fwr;
struct sge_ctrl_txq *ctrlq;
unsigned int port_id = ethdev2pinfo(dev)->port_id;
int ret;
@@ -663,13 +666,16 @@ int set_filter_wr(struct rte_eth_dev *dev, unsigned int 
fidx)
mbuf->data_len = sizeof(*fwr);
mbuf->pkt_len = mbuf->data_len;
 
-   fwr = rte_pktmbuf_mtod(mbuf, struct fw_filter_wr *);
+   fwr = rte_pktmbuf_mtod(mbuf, struct fw_filter2_wr *);
memset(fwr, 0, sizeof(*fwr));
 
/*
 * Construct the work request to set the filter.
  

[dpdk-dev] [RFC v2 1/3] ethdev: add flow api actions to modify IP addresses

2018-08-13 Thread Rahul Lakkireddy
From: Shagun Agrawal 

Add actions:
- SET_IPV4_SRC - set a new IPv4 source address.
- SET_IPV4_DST - set a new IPv4 destination address.
- SET_IPV6_SRC - set a new IPv6 source address.
- SET_IPV6_DST - set a new IPv6 destination address.

Signed-off-by: Shagun Agrawal 
Signed-off-by: Rahul Lakkireddy 
---
v2:
- Remove OpenFlow prefix from IPv4 and IPv6 rewrite actions.
- Remove Network (NW) prefix from IPv4 and IPv6 rewrite actions.
- Re-based to tip.

 app/test-pmd/cmdline_flow.c | 100 
 app/test-pmd/config.c   |   8 +++
 doc/guides/prog_guide/rte_flow.rst  |  60 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  16 +
 lib/librte_ethdev/rte_flow.c|   8 +++
 lib/librte_ethdev/rte_flow.h|  58 
 6 files changed, 250 insertions(+)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index f9260600e..fd135da64 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -243,6 +243,14 @@ enum index {
ACTION_VXLAN_DECAP,
ACTION_NVGRE_ENCAP,
ACTION_NVGRE_DECAP,
+   ACTION_SET_IPV4_SRC,
+   ACTION_SET_IPV4_SRC_IPV4_SRC,
+   ACTION_SET_IPV4_DST,
+   ACTION_SET_IPV4_DST_IPV4_DST,
+   ACTION_SET_IPV6_SRC,
+   ACTION_SET_IPV6_SRC_IPV6_SRC,
+   ACTION_SET_IPV6_DST,
+   ACTION_SET_IPV6_DST_IPV6_DST,
 };
 
 /** Maximum size for pattern in struct rte_flow_item_raw. */
@@ -816,6 +824,10 @@ static const enum index next_action[] = {
ACTION_VXLAN_DECAP,
ACTION_NVGRE_ENCAP,
ACTION_NVGRE_DECAP,
+   ACTION_SET_IPV4_SRC,
+   ACTION_SET_IPV4_DST,
+   ACTION_SET_IPV6_SRC,
+   ACTION_SET_IPV6_DST,
ZERO,
 };
 
@@ -918,6 +930,30 @@ static const enum index action_of_push_mpls[] = {
ZERO,
 };
 
+static const enum index action_set_ipv4_src[] = {
+   ACTION_SET_IPV4_SRC_IPV4_SRC,
+   ACTION_NEXT,
+   ZERO,
+};
+
+static const enum index action_set_ipv4_dst[] = {
+   ACTION_SET_IPV4_DST_IPV4_DST,
+   ACTION_NEXT,
+   ZERO,
+};
+
+static const enum index action_set_ipv6_src[] = {
+   ACTION_SET_IPV6_SRC_IPV6_SRC,
+   ACTION_NEXT,
+   ZERO,
+};
+
+static const enum index action_set_ipv6_dst[] = {
+   ACTION_SET_IPV6_DST_IPV6_DST,
+   ACTION_NEXT,
+   ZERO,
+};
+
 static const enum index action_jump[] = {
ACTION_JUMP_GROUP,
ACTION_NEXT,
@@ -2470,6 +2506,70 @@ static const struct token token_list[] = {
.next = NEXT(NEXT_ENTRY(ACTION_NEXT)),
.call = parse_vc,
},
+   [ACTION_SET_IPV4_SRC] = {
+   .name = "set_ipv4_src",
+   .help = "set IPv4 source address",
+   .priv = PRIV_ACTION(SET_IPV4_SRC,
+   sizeof(struct rte_flow_action_set_ipv4)),
+   .next = NEXT(action_set_ipv4_src),
+   .call = parse_vc,
+   },
+   [ACTION_SET_IPV4_SRC_IPV4_SRC] = {
+   .name = "ipv4_addr",
+   .help = "new IPv4 source address to set",
+   .next = NEXT(action_set_ipv4_src, NEXT_ENTRY(IPV4_ADDR)),
+   .args = ARGS(ARGS_ENTRY_HTON
+   (struct rte_flow_action_set_ipv4, ipv4_addr)),
+   .call = parse_vc_conf,
+   },
+   [ACTION_SET_IPV4_DST] = {
+   .name = "set_ipv4_dst",
+   .help = "set IPv4 destination address",
+   .priv = PRIV_ACTION(SET_IPV4_DST,
+   sizeof(struct rte_flow_action_set_ipv4)),
+   .next = NEXT(action_set_ipv4_dst),
+   .call = parse_vc,
+   },
+   [ACTION_SET_IPV4_DST_IPV4_DST] = {
+   .name = "ipv4_addr",
+   .help = "new IPv4 destination address to set",
+   .next = NEXT(action_set_ipv4_dst, NEXT_ENTRY(IPV4_ADDR)),
+   .args = ARGS(ARGS_ENTRY_HTON
+   (struct rte_flow_action_set_ipv4, ipv4_addr)),
+   .call = parse_vc_conf,
+   },
+   [ACTION_SET_IPV6_SRC] = {
+   .name = "set_ipv6_src",
+   .help = "set IPv6 source address",
+   .priv = PRIV_ACTION(SET_IPV6_SRC,
+   sizeof(struct rte_flow_action_set_ipv6)),
+   .next = NEXT(action_set_ipv6_src),
+   .call = parse_vc,
+   },
+   [ACTION_SET_IPV6_SRC_IPV6_SRC] = {
+   .name = "ipv6_addr",
+   .help = "new IPv6 source address to set",
+   .next = NEXT(action_set_ipv6_src, NEXT_ENTRY(IPV6_ADDR)),
+   .args = ARGS(ARGS_ENTRY_HTON
+   (struct rte_flow_action_set_ipv6, ipv6_addr)),
+   .call = parse_vc_conf,
+   },
+   [ACTION_SET_IPV6_DST] = {
+   .name = "set_ipv6_dst",
+   .help = "set IPv6 destination address",
+   .priv = PRIV_ACTION(SET_IPV6_DST,
+   

[dpdk-dev] [RFC v2 0/3] ethdev: add IP address and TCP/UDP port rewrite actions to flow API

2018-08-13 Thread Rahul Lakkireddy
This series of patches add support for actions:
- SET_IPV4_SRC - set a new IPv4 source address.
- SET_IPV4_DST - set a new IPv4 destination address.
- SET_IPV6_SRC - set a new IPv6 source address.
- SET_IPV6_DST - set a new IPv6 destination address.
- SET_TP_SRC - set a new TCP/UDP source port number.
- SET_TP_DST - set a new TCP/UDP destination port number.

These actions are useful in the Network Address Translation use case,
where IP addresses and TCP/UDP port numbers are rewritten before the
packets are switched out to the destination device port.
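
As a rough illustration of what a SET_IPV4_SRC rewrite amounts to, the sketch below uses stand-in types rather than the real rte_flow definitions. struct set_ipv4_conf mirrors the ipv4_addr field of struct rte_flow_action_set_ipv4 from patch 1; struct ipv4_hdr_min, make_set_ipv4() and apply_set_ipv4_src() are hypothetical names, and the address is assumed to be kept in network byte order.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <sys/socket.h>

/* Stand-in for struct rte_flow_action_set_ipv4 (assumption: the address
 * is stored in network byte order). */
struct set_ipv4_conf {
	uint32_t ipv4_addr;
};

/* Hypothetical minimal IPv4 header view: only the fields touched here. */
struct ipv4_hdr_min {
	uint32_t src_addr; /* network byte order */
	uint32_t dst_addr; /* network byte order */
};

/* Build the action configuration from dotted-quad text.
 * Returns 0 on success, -1 if the text is not a valid IPv4 address. */
static int
make_set_ipv4(const char *text, struct set_ipv4_conf *conf)
{
	struct in_addr addr;

	if (inet_pton(AF_INET, text, &addr) != 1)
		return -1;
	conf->ipv4_addr = addr.s_addr; /* already in network byte order */
	return 0;
}

/* Software equivalent of a SET_IPV4_SRC action on a matching packet. */
static void
apply_set_ipv4_src(struct ipv4_hdr_min *hdr, const struct set_ipv4_conf *conf)
{
	hdr->src_addr = conf->ipv4_addr;
}
```

The sketch leaves out checksum updates and the TCP/UDP port variants; it only shows the shape of the configuration and the single field the action overwrites.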

Patch 1 adds support for IP address rewrite to rte_flow and testpmd.

Patch 2 adds support for TCP/UDP port rewrite to rte_flow and testpmd.

Patch 3 shows CXGBE PMD example to offload these actions to hardware.

Feedback and suggestions will be much appreciated.

Thanks,
Rahul

---
v2
- Remove OpenFlow prefix.
- Remove Network (NW) prefix from IPv4 and IPv6 rewrite actions.
- Re-based to tip.

Shagun Agrawal (3):
  ethdev: add flow api actions to modify IP addresses
  ethdev: add flow api actions to modify TCP/UDP port numbers
  net/cxgbe: add flow actions to modify IP and TCP/UDP port address

 app/test-pmd/cmdline_flow.c | 150 
 app/test-pmd/config.c   |  12 +++
 doc/guides/prog_guide/rte_flow.rst  |  90 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  24 +
 drivers/net/cxgbe/base/common.h |   1 +
 drivers/net/cxgbe/base/t4fw_interface.h |  50 ++
 drivers/net/cxgbe/cxgbe_filter.c|  21 +++-
 drivers/net/cxgbe/cxgbe_filter.h|  23 +
 drivers/net/cxgbe/cxgbe_flow.c  | 100 ++-
 drivers/net/cxgbe/cxgbe_main.c  |  10 ++
 lib/librte_ethdev/rte_flow.c|  12 +++
 lib/librte_ethdev/rte_flow.h|  87 
 12 files changed, 575 insertions(+), 5 deletions(-)

-- 
2.14.1



Re: [dpdk-dev] [PATCH v3 1/6] doc/qat: add limitations to compressdev PMD

2018-08-13 Thread Jozwiak, TomaszX



-Original Message-
From: Trahe, Fiona 
Sent: Friday, August 10, 2018 4:11 PM
To: dev@dpdk.org; De Lara Guarch, Pablo ; 
Jozwiak, TomaszX ; tho...@monjalon.net
Cc: Trahe, Fiona 
Subject: [PATCH v3 1/6] doc/qat: add limitations to compressdev PMD

Add 2 missing limitations to QAT compressdev documentation

Signed-off-by: Fiona Trahe 
Acked-by: tomaszx.jozw...@intel.com


Re: [dpdk-dev] [PATCH v3 2/6] doc/qat: add information on how to test

2018-08-13 Thread Jozwiak, TomaszX



-Original Message-
From: Trahe, Fiona 
Sent: Friday, August 10, 2018 4:11 PM
To: dev@dpdk.org; De Lara Guarch, Pablo ; 
Jozwiak, TomaszX ; tho...@monjalon.net
Cc: Trahe, Fiona 
Subject: [PATCH v3 2/6] doc/qat: add information on how to test

Add section to common QAT part of doc about which tests can be used to exercise
QAT compress and crypto PMDs.

Signed-off-by: Fiona Trahe 
Acked-by: tomaszx.jozw...@intel.com


Re: [dpdk-dev] [PATCH v3 3/6] doc/qat: fix typos and make cosmetic changes

2018-08-13 Thread Jozwiak, TomaszX



-Original Message-
From: Trahe, Fiona 
Sent: Friday, August 10, 2018 4:11 PM
To: dev@dpdk.org; De Lara Guarch, Pablo ; 
Jozwiak, TomaszX ; tho...@monjalon.net
Cc: Trahe, Fiona 
Subject: [PATCH v3 3/6] doc/qat: fix typos and make cosmetic changes

Signed-off-by: Fiona Trahe 
Acked-by: tomaszx.jozw...@intel.com


Re: [dpdk-dev] [PATCH v3 4/6] doc/qat: add overview of doc sections

2018-08-13 Thread Jozwiak, TomaszX



-Original Message-
From: Trahe, Fiona 
Sent: Friday, August 10, 2018 4:11 PM
To: dev@dpdk.org; De Lara Guarch, Pablo ; 
Jozwiak, TomaszX ; tho...@monjalon.net
Cc: Trahe, Fiona 
Subject: [PATCH v3 4/6] doc/qat: add overview of doc sections

Add overview of QAT doc sections and link between them.
Indent to next level all sections within the crypto and common sections.

Signed-off-by: Fiona Trahe 
Acked-by: tomaszx.jozw...@intel.com


Re: [dpdk-dev] [PATCH v3 6/6] doc/qat: describe build config options

2018-08-13 Thread Jozwiak, TomaszX



-Original Message-
From: Trahe, Fiona 
Sent: Friday, August 10, 2018 4:11 PM
To: dev@dpdk.org; De Lara Guarch, Pablo ; 
Jozwiak, TomaszX ; tho...@monjalon.net
Cc: Trahe, Fiona 
Subject: [PATCH v3 6/6] doc/qat: describe build config options

Added description of the build configuration options for QAT.

Signed-off-by: Fiona Trahe 
Acked-by: tomaszx.jozw...@intel.com


Re: [dpdk-dev] [PATCH] compress/qat: use compression specific driver name

2018-08-13 Thread Jozwiak, TomaszX



-Original Message-
From: Trahe, Fiona 
Sent: Friday, August 10, 2018 5:18 PM
To: dev@dpdk.org; akhil.go...@nxp.com; De Lara Guarch, Pablo 
; Jozwiak, TomaszX 
Cc: Trahe, Fiona 
Subject: [PATCH] compress/qat: use compression specific driver name

The QAT compression driver was named "qat".
Rename to compress_qat for consistency with other compressdev drivers and with 
crypto_qat.

Signed-off-by: Fiona Trahe 
Acked-by: tomaszx.jozw...@intel.com


Re: [dpdk-dev] [PATCH] crypto/qat: fix typo

2018-08-13 Thread Jozwiak, TomaszX



-Original Message-
From: Trahe, Fiona 
Sent: Friday, August 10, 2018 5:20 PM
To: dev@dpdk.org; akhil.go...@nxp.com; De Lara Guarch, Pablo 
; Jozwiak, TomaszX 
Cc: Trahe, Fiona 
Subject: RE: [PATCH] crypto/qat: fix typo

Corrected subject - only 1 patch here.

> -Original Message-
> From: Trahe, Fiona
> Sent: Friday, August 10, 2018 4:19 PM
> To: dev@dpdk.org; akhil.go...@nxp.com; De Lara Guarch, Pablo 
> ; Jozwiak, TomaszX 
> 
> Cc: Trahe, Fiona 
> Subject: [PATCH 1/2] crypto/qat: fix typo
> 
> Signed-off-by: Fiona Trahe 
Acked-by: tomaszx.jozw...@intel.com


Re: [dpdk-dev] [PATCH v3 5/6] doc/qat: update build instructions for both PMDs

2018-08-13 Thread Jozwiak, TomaszX



-Original Message-
From: Trahe, Fiona 
Sent: Friday, August 10, 2018 4:11 PM
To: dev@dpdk.org; De Lara Guarch, Pablo ; 
Jozwiak, TomaszX ; tho...@monjalon.net
Cc: Trahe, Fiona 
Subject: [PATCH v3 5/6] doc/qat: update build instructions for both PMDs

Update PMD build section.
Linked to kernel dependency section and refactored text between those 2 
sections.

Signed-off-by: Fiona Trahe 
Acked-by: tomaszx.jozw...@intel.com


Re: [dpdk-dev] [PATCH v3 0/6] doc/qat: clarify build config options

2018-08-13 Thread Jozwiak, TomaszX



-Original Message-
From: Trahe, Fiona 
Sent: Friday, August 10, 2018 4:11 PM
To: dev@dpdk.org; De Lara Guarch, Pablo ; 
Jozwiak, TomaszX ; tho...@monjalon.net
Cc: Trahe, Fiona 
Subject: [PATCH v3 0/6] doc/qat: clarify build config options

Clarified documentation structure between compressdev, cryptodev and common
build parts.
Clarified build configuration options.
Added Testing section.
Fixed typos and made some cosmetic improvements.


v3 changes
 - squashed some patches
 - added more text to remaining commit msgs
 - fixed underline line length
 - fixed typo: comp_qat should be qat 

v2 changes
 - split into patchset
 - changed test path from build/build/test/test to build/app
 - use make defconfig instead of make config T=xxx
 - matched underline lengths to titles

Fiona Trahe (6):
  doc/qat: add limitations to compressdev PMD
  doc/qat: add information on how to test
  doc/qat: fix typos and make cosmetic changes
  doc/qat: add overview of doc sections
  doc/qat: update build instructions for both PMDs
  doc/qat: describe build config options

 doc/guides/compressdevs/qat_comp.rst |   6 +-
 doc/guides/cryptodevs/qat.rst| 195 +--
 2 files changed, 143 insertions(+), 58 deletions(-)

--
2.13.6


Series-acked-by: tomaszx.jozw...@intel.com


Re: [dpdk-dev] [PATCH] net/ixgbe: remove hardcoded CRC STRIP config from ixgbe

2018-08-13 Thread Ferruh Yigit
On 8/12/2018 9:46 AM, Shahaf Shuler wrote:
> Sunday, August 12, 2018 10:53 AM, Andrew Rybchenko:
>> Subject: Re: [PATCH] net/ixgbe: remove hardcoded CRC STRIP config from
>> ixgbe
>>
>> On 12.08.2018 09:28, Shahaf Shuler wrote:
>>> Thursday, August 9, 2018 11:32 AM, Ferruh Yigit:
 Subject: Re: [PATCH] net/ixgbe: remove hardcoded CRC STRIP config
 from ixgbe

 On 7/24/2018 3:36 AM, Wei Zhao wrote:
> There are CRC related ifdefs for ixgbe:
> CONFIG_RTE_LIBRTE_IXGBE_PF_DISABLE_STRIP_CRC=n
> It is used in the VF driver's ixgbevf_dev_configure() function.
> VF cannot change the CRC strip behavior and, based on what the PF
> configured, it needs to respond properly to the user's
> ixgbevf_dev_configure() request. Right now what the PF sets is defined by
> the above config option, but this method is too static.
>
> Signed-off-by: Wei Zhao 
> Signed-off-by: Wenzhuo Lu 
 <...>
> @@ -334,6 +334,7 @@ struct rte_eth_rxmode {
>* structure are allowed to be set.
>*/
>   uint64_t offloads;
> + uint64_t offloads_disable;
 Do we need a disable flag in ethdev? An offload that is not enabled is
 disabled by default, isn't it? This conflicts with the offloads flag and
 makes it confusing.
>>> +1.
>>>
>>> **all** the offloads are disabled by default.
>>>
 For igb/e1000 VF case, VF driver can't change what PF set and VF
 driver can't learn the PF setting dynamically, so this information
 needs to be passed to VF driver by application/user.
 Currently this information is passed by a compile time config option; my
 suggestion was to use devargs.

 In your implementation a testpmd parameter is added to get this
 information and pass it to the driver, but this means all applications need
 to do this; adding this support to the driver instead looks better to me.
>>
>> I think we should add fixed offloads to dev_info. I.e. if an offload is
>> supported, it could be marked as fixed (i.e. always enabled).
>> If an offload is not supported, it is always disabled (and cannot/should not be
>> marked as fixed).
>> Maybe the right name for it is not "fixed", but "always_enabled".
> 
> I think it will overcomplicate applications. Those limitations should be
> expressed as part of the "limitation" section of the corresponding PMD guide.
> 
> PMDs with such a limitation can also log a warning, or even fail the device
> configuration, if the needed permanent offload is not set by the
> application.

Some PMDs already have these warning messages for the case where a device
supports an offload (so it is in the advertised capabilities) but does not
support disabling it.

The "can't disable"/"always on" information is not available from PMDs today.
It would be nice to get it from the PMD, but I agree that it would complicate
things.

And this won't help the ixgbe VF case anyway. There, whether an offload can be
enabled or disabled in the VF depends on the PF configuration, so it is not
fixed information for the VF that can be put into driver code.

> 
>> Also it should be persistent. It should not be allowed in above ixgbe case to
>> change the offload state on PF if there are other users (drivers attached).
>> Otherwise, we need mechanism to notify apps about these changes -
>> overcomplicated.



Re: [dpdk-dev] [RFC] mlx5: fix error unwind in device start

2018-08-13 Thread Stephen Hemminger
On Mon, 13 Aug 2018 07:52:47 +
Shahaf Shuler  wrote:

> Hi Stephen,
> 
> Thursday, August 2, 2018 1:00 AM, Stephen Hemminger:
> > Subject: [RFC] mlx5: fix error unwind in device start
> > 
> > The error handling in start of the mlx5 driver is buggy.
> > For example, if setting up the flows fails the device driver will then get
> > stuck in mlx5_flow_rxq_flags_clear waiting for something that will never
> > happen.
> 
> Looking at the code, I cannot understand why mlx5_flow_rxq_flags_clear gets
> stuck, nor what it waits for.
> The function has a few finite loops which do not depend on anything that
> happened before it at device start.
> 
> Moreover, I tried to force either mlx5_traffic_enable or mlx5_flow_start to
> fail; the result was that the port failed to start, but nothing got stuck.
> 
> Can you provide more details about the issue you saw there?
> 
> > 
> > The problem is that the code jumps to a common error label and does
> > unwind for portions of the driver which have not been setup.
> > 
> > This suggested patch breaks it into different labels with each failure path
> > only unwinding what was done.
> > 
> > Also, the ethdev driver should not be manipulating the dev_started flag
> > directly. That is handled by the common ethdev layer.
> >   
> 
> I agree that this part of the code could be better written, but my earlier
> question stands: is there an actual bug that this change solves?
> 
> > The patch works for the success case, but further testing is needed to
> > actually exercise all the error paths.
> > This is left as exercise for the maintainers.
> > 
> > Signed-off-by: Stephen Hemminger 
> > ---
> >  drivers/net/mlx5/mlx5_trigger.c | 26 +-
> >  1 file changed, 13 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/net/mlx5/mlx5_trigger.c
> > b/drivers/net/mlx5/mlx5_trigger.c index e2a9bb703261..79a7b233986a
> > 100644
> > --- a/drivers/net/mlx5/mlx5_trigger.c
> > +++ b/drivers/net/mlx5/mlx5_trigger.c
> > @@ -171,42 +171,42 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > if (ret) {
> > DRV_LOG(ERR, "port %u Rx queue allocation failed: %s",
> > dev->data->port_id, strerror(rte_errno));
> > -   mlx5_txq_stop(dev);
> > -   return -rte_errno;
> > +   goto error_txq_stop;
> > }
> > -   dev->data->dev_started = 1;
> > +
> > ret = mlx5_rx_intr_vec_enable(dev);
> > if (ret) {
> > DRV_LOG(ERR, "port %u Rx interrupt vector creation failed",
> > dev->data->port_id);
> > -   goto error;
> > +   goto error_rxq_stop;
> > }
> > mlx5_xstats_init(dev);
> > ret = mlx5_traffic_enable(dev);
> > if (ret) {
> > DRV_LOG(DEBUG, "port %u failed to set defaults flows",
> > dev->data->port_id);
> > -   goto error;
> > +   goto error_intr_vec_disable;
> > }
> > ret = mlx5_flow_start(dev, &priv->flows);
> > if (ret) {
> > DRV_LOG(DEBUG, "port %u failed to set flows",
> > dev->data->port_id);
> > -   goto error;
> > +   goto error_traffic_disable;
> > }
> > +
> > dev->tx_pkt_burst = mlx5_select_tx_function(dev);
> > dev->rx_pkt_burst = mlx5_select_rx_function(dev);
> > mlx5_dev_interrupt_handler_install(dev);
> > return 0;
> > -error:
> > -   ret = rte_errno; /* Save rte_errno before cleanup. */
> > -   /* Rollback. */
> > -   dev->data->dev_started = 0;
> > -   mlx5_flow_stop(dev, &priv->flows);
> > +
> > +error_traffic_disable:
> > mlx5_traffic_disable(dev);
> > -   mlx5_txq_stop(dev);
> > +error_intr_vec_disable:
> > +   mlx5_rx_intr_vec_disable(dev);
> > +error_rxq_stop:
> > mlx5_rxq_stop(dev);
> > -   rte_errno = ret; /* Restore rte_errno. */
> > +error_txq_stop:
> > +   mlx5_txq_stop(dev);
> > return -rte_errno;
> >  }
> > 
> > --
> > 2.18.0  
> 

The issue was seen with an early version of the netvsc VF support, which forgot
to call dev_configure on the mlx5 device. In that case mlx5 would get confused
and get stuck.
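
For reference, the staged-label unwind proposed in the RFC can be sketched in isolation as below. This is not the mlx5 code: the stage names, stage_start(), stage_stop() and dev_start() are hypothetical stand-ins, and the fail_at parameter exists only so that each error path can be exercised.

```c
/* Hypothetical setup stages, standing in for txq/rxq/interrupt/flow setup. */
enum stage { STAGE_TXQ, STAGE_RXQ, STAGE_INTR, STAGE_FLOW };

static unsigned int active;    /* bit i is set while stage i is set up */
static unsigned int bad_stops; /* stops issued for stages never started */

/* Set up one stage; fails when the stage number matches fail_at. */
static int
stage_start(enum stage s, int fail_at)
{
	if ((int)s == fail_at)
		return -1;
	active |= 1u << s;
	return 0;
}

/* Tear down one stage and record teardown of a stage that was not set up. */
static void
stage_stop(enum stage s)
{
	if (!(active & (1u << s)))
		bad_stops++;
	active &= ~(1u << s);
}

/* Start all stages; on failure undo only the stages that succeeded. */
static int
dev_start(int fail_at)
{
	if (stage_start(STAGE_TXQ, fail_at))
		goto error;
	if (stage_start(STAGE_RXQ, fail_at))
		goto error_txq_stop;
	if (stage_start(STAGE_INTR, fail_at))
		goto error_rxq_stop;
	if (stage_start(STAGE_FLOW, fail_at))
		goto error_intr_stop;
	return 0;

error_intr_stop:
	stage_stop(STAGE_INTR);
error_rxq_stop:
	stage_stop(STAGE_RXQ);
error_txq_stop:
	stage_stop(STAGE_TXQ);
error:
	return -1;
}
```

Because the labels are ordered in reverse of the setup sequence, each failure falls through only the teardown steps for stages that actually completed, which is the property a single shared error label does not give.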


[dpdk-dev] [PATCH] checkpatches: don't assume bash syntax

2018-08-13 Thread Stephen Hemminger
The read -d option is a bash extension and not available in other
shells. On Debian, /bin/sh is dash and checkpatches would
fail with:
./devtools/checkpatches.sh: 52: read: Illegal option -d

Fix by using awk -e and adding necessary double backslash.

Fixes: 7413e7f2aeb3 ("devtools: alert on new calls to exit from libs")
Signed-off-by: Stephen Hemminger 
---
 devtools/checkpatches.sh | 18 ++
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh
index ba795ad1dc36..c63162678538 100755
--- a/devtools/checkpatches.sh
+++ b/devtools/checkpatches.sh
@@ -49,7 +49,7 @@ check_forbidden_additions() {
 # - No search is done inside comments
 # - Both additions and removals of the expressions are checked
 #   A positive balance of additions fails the check
-   read -d '' awk_script << 'EOF'
+awk -e '
BEGIN {
split(FOLDERS,deny_folders," ");
split(EXPRESSIONS,deny_expr," ");
@@ -70,7 +70,7 @@ check_forbidden_additions() {
# non comment code
if (in_comment == 0) {
for (i in deny_expr) {
-   forbidden_added = "^\+.*" deny_expr[i];
+   forbidden_added = "^\\+.*" deny_expr[i];
forbidden_removed="^-.*" deny_expr[i];
current = expressions[deny_expr[i]]
if ($0 ~ forbidden_added) {
@@ -90,13 +90,13 @@ check_forbidden_additions() {
}
# switch to next file , check if the balance of add/remove
# of previous filehad new additions
-   ($0 ~ "^\+\+\+ b/") {
+   ($0 ~ "^\\+\\+\\+ b/") {
in_file = 0;
if (count > 0) {
exit;
}
for (i in deny_folders) {
-   re = "^\+\+\+ b/" deny_folders[i];
+   re = "^\\+\\+\\+ b/" deny_folders[i];
if ($0 ~ deny_folders[i]) {
in_file = 1
last_file = $0
@@ -115,14 +115,8 @@ check_forbidden_additions() {
exit RET_ON_FAIL
}
}
-EOF
-   # -
-   # refrain from new additions of rte_panic() and rte_exit()
-   # multiple folders and expressions are separated by spaces
-   awk -v FOLDERS="lib drivers" \
-   -v EXPRESSIONS="rte_panic\\\( rte_exit\\\(" \
-   -v RET_ON_FAIL=1 \
-   "$awk_script" -
+' -v FOLDERS="lib drivers" -v EXPRESSIONS="rte_panic\\\( rte_exit\\\(" \
+   -v RET_ON_FAIL=1 
 }
 
 number=0
-- 
2.18.0



[dpdk-dev] [PATCH v2 0/2] netvsc: event buffer bug fixes

2018-08-13 Thread Stephen Hemminger
A couple of bugs were introduced by the way the event
buffer is handled.

Stephen Hemminger (2):
  netvsc: fix rte malloc pool corruption
  netvsc: resize event buffer as needed

v2 - split into two patches and fix whitespace

 drivers/net/netvsc/hn_rxtx.c | 49 +---
 drivers/net/netvsc/hn_var.h  |  2 +-
 2 files changed, 35 insertions(+), 16 deletions(-)

-- 
2.18.0



[dpdk-dev] [PATCH v2 2/2] netvsc: resize event buffer as needed

2018-08-13 Thread Stephen Hemminger
The event buffer was changed to be a fixed size value, but it
is not large enough for a forwarding stress test.

This version of event buffer code uses malloc/realloc to size
the event buffer as needed. Malloc is preferred over rte_malloc
because the event buffer does not need to be used for DMA
and huge page is a limited resource.

Fixes: 530af95a7849 ("bus/vmbus: avoid signalling host on read")
Signed-off-by: Stephen Hemminger 
---
 drivers/net/netvsc/hn_rxtx.c | 50 ++--
 drivers/net/netvsc/hn_var.h  |  2 +-
 2 files changed, 37 insertions(+), 15 deletions(-)

diff --git a/drivers/net/netvsc/hn_rxtx.c b/drivers/net/netvsc/hn_rxtx.c
index 3e52a328b152..79bb4d587783 100644
--- a/drivers/net/netvsc/hn_rxtx.c
+++ b/drivers/net/netvsc/hn_rxtx.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -718,16 +719,22 @@ struct hn_rx_queue *hn_rx_queue_alloc(struct hn_data *hv,
 {
struct hn_rx_queue *rxq;
 
-   rxq = rte_zmalloc_socket("HN_RXQ",
-sizeof(*rxq) + HN_RXQ_EVENT_DEFAULT,
+   rxq = rte_zmalloc_socket("HN_RXQ", sizeof(*rxq),
 RTE_CACHE_LINE_SIZE, socket_id);
-   if (rxq) {
-   rxq->hv = hv;
-   rxq->chan = hv->channels[queue_id];
-   rte_spinlock_init(&rxq->ring_lock);
-   rxq->port_id = hv->port_id;
-   rxq->queue_id = queue_id;
+   if (!rxq)
+   return NULL;
+
+   rxq->hv = hv;
+   rxq->chan = hv->channels[queue_id];
+   rte_spinlock_init(&rxq->ring_lock);
+   rxq->port_id = hv->port_id;
+   rxq->queue_id = queue_id;
+   rxq->event_buf = malloc(HN_RXQ_EVENT_DEFAULT);
+   if (!rxq->event_buf) {
+   rte_free(rxq);
+   return NULL;
}
+
return rxq;
 }
 
@@ -776,6 +783,7 @@ hn_dev_rx_queue_setup(struct rte_eth_dev *dev,
 
 fail:
rte_ring_free(rxq->rx_ring);
+   free(rxq->event_buf);
rte_free(rxq);
return -ENOMEM;
 }
@@ -794,8 +802,10 @@ hn_dev_rx_queue_release(void *arg)
rxq->rx_ring = NULL;
rxq->mb_pool = NULL;
 
-   if (rxq != rxq->hv->primary)
+   if (rxq != rxq->hv->primary) {
+   free(rxq->event_buf);
rte_free(rxq);
+   }
 }
 
 void
@@ -850,19 +860,31 @@ void hn_process_events(struct hn_data *hv, uint16_t queue_id)
 
for (;;) {
const struct vmbus_chanpkt_hdr *pkt;
-   uint32_t len = HN_RXQ_EVENT_DEFAULT;
+   uint32_t len = malloc_usable_size(rxq->event_buf);
const void *data;
 
+retry:
ret = rte_vmbus_chan_recv_raw(rxq->chan, rxq->event_buf, &len);
if (ret == -EAGAIN)
break;  /* ring is empty */
 
-   else if (ret == -ENOBUFS)
-   rte_exit(EXIT_FAILURE, "event buffer not big enough (%u < %u)",
-HN_RXQ_EVENT_DEFAULT, len);
-   else if (ret <= 0)
+   if (unlikely(ret == -ENOBUFS)) {
+   /* event buffer not large enough to read ring */
+
+   PMD_DRV_LOG(DEBUG,
+   "event buffer expansion (need %u)", len);
+   rxq->event_buf = realloc(rxq->event_buf, len);
+   if (rxq->event_buf)
+   goto retry;
+   /* out of memory, no more events now */
+   break;
+   }
+
+   if (unlikely(ret <= 0)) {
+   /* This indicates a failure to communicate (or worse) */
rte_exit(EXIT_FAILURE,
 "vmbus ring buffer error: %d", ret);
+   }
 
bytes_read += ret;
pkt = (const struct vmbus_chanpkt_hdr *)rxq->event_buf;
diff --git a/drivers/net/netvsc/hn_var.h b/drivers/net/netvsc/hn_var.h
index f7ff8585bc1c..0430f450cf37 100644
--- a/drivers/net/netvsc/hn_var.h
+++ b/drivers/net/netvsc/hn_var.h
@@ -77,7 +77,7 @@ struct hn_rx_queue {
struct hn_stats stats;
uint64_t ring_full;
 
-   uint8_t event_buf[];
+   void *event_buf;
 };
 
 
-- 
2.18.0
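
The grow-and-retry idea in this patch can be sketched on its own as follows. This is a simplified illustration, not the driver code: fake_recv() is a hypothetical stand-in for the ring read, struct event_buf and recv_resizing() are made-up names, the buffer size is tracked explicitly instead of using malloc_usable_size(), and the old pointer is kept until realloc() succeeds. It uses plain heap memory, so it is only meaningful within a single process.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a ring read: copies msg into buf if it fits,
 * otherwise stores the required size in *len and returns -ENOBUFS. */
static int
fake_recv(const void *msg, uint32_t msg_len, void *buf, uint32_t *len)
{
	if (*len < msg_len) {
		*len = msg_len;
		return -ENOBUFS;
	}
	if (msg_len != 0)
		memcpy(buf, msg, msg_len);
	*len = msg_len;
	return 0;
}

struct event_buf {
	void *data;
	uint32_t size; /* current capacity of data, in bytes */
};

/* Read one message, growing the buffer when the receiver asks for more.
 * Returns 0 on success, -ENOMEM if the buffer could not be grown. */
static int
recv_resizing(struct event_buf *eb, const void *msg, uint32_t msg_len,
	      uint32_t *out_len)
{
	for (;;) {
		uint32_t len = eb->size;
		int ret = fake_recv(msg, msg_len, eb->data, &len);
		void *bigger;

		if (ret != -ENOBUFS) {
			*out_len = len;
			return ret;
		}
		/* Keep the old pointer until realloc() is known to succeed. */
		bigger = realloc(eb->data, len);
		if (bigger == NULL)
			return -ENOMEM;
		eb->data = bigger;
		eb->size = len;
	}
}
```

The buffer only ever grows, so after the first oversized event later reads of smaller events reuse the enlarged allocation without another realloc().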



[dpdk-dev] [PATCH v2 1/2] netvsc: fix rte malloc pool corruption

2018-08-13 Thread Stephen Hemminger
The event buffer was changed to be a fixed size value, but
calls to rte_free were left. That causes bugs because
rte_free() is called on a pointer that was not set up with
rte_malloc().

Fixes: 530af95a7849 ("bus/vmbus: avoid signalling host on read")
Signed-off-by: Stephen Hemminger 
---
 drivers/net/netvsc/hn_rxtx.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/netvsc/hn_rxtx.c b/drivers/net/netvsc/hn_rxtx.c
index 02ef27e363cc..3e52a328b152 100644
--- a/drivers/net/netvsc/hn_rxtx.c
+++ b/drivers/net/netvsc/hn_rxtx.c
@@ -776,7 +776,6 @@ hn_dev_rx_queue_setup(struct rte_eth_dev *dev,
 
 fail:
rte_ring_free(rxq->rx_ring);
-   rte_free(rxq->event_buf);
rte_free(rxq);
return -ENOMEM;
 }
@@ -795,10 +794,8 @@ hn_dev_rx_queue_release(void *arg)
rxq->rx_ring = NULL;
rxq->mb_pool = NULL;
 
-   if (rxq != rxq->hv->primary) {
-   rte_free(rxq->event_buf);
+   if (rxq != rxq->hv->primary)
rte_free(rxq);
-   }
 }
 
 void
-- 
2.18.0



Re: [dpdk-dev] [PATCH v2 00/12] preparing l2fwd for eventmode additions

2018-08-13 Thread Joseph, Anoob

Hi Bruce,

The reason why l2fwd was chosen was to allow everyone to chip in their 
ideas while preparing the framework.
This framework would be extended to other applications, hence needed 
enough inputs before expanding to complex applications. If your 
suggestion is to make l3fwd event driven first, I'll start looking in 
that direction.


As for l2fwd, I'm fine with moving event-mode additions to a new app. 
But with the present approach, the app would run in both event mode and 
poll mode.


Your thoughts on renaming the existing app to l2fwd-poll and the 
proposed app as l2fwd?


Thanks,
Anoob
On 13-08-2018 14:57, Bruce Richardson wrote:

On Mon, Aug 13, 2018 at 12:52:19PM +0530, Joseph, Anoob wrote:

Hi Bruce, Pablo,

If there are no more issues about the approach, can you review the patches
and give the feedback?

Please do note that this series doesn't add any event mode specific code.
That will come as a different patch series after incorporating Jerin's
comments.

Thanks,
Anoob

My main concern is with l2fwd, rather than l3fwd, which is already fairly
complicated. I could see l3fwd being updated to allow an eventmode without
too many problems.

With l2fwd, the only issue I have is with the volume of code involved.
l2fwd is currently a very simple application which fits in a single file.
With these updates it's no longer such a simple, approachable app, rather
it becomes one which takes a bit of studying and switching between files to
fully understand. The data path is only a very small part of the app, so by
adding an event-based path to the same app we have very little code saving.
Therefore, I think having a separate l2fwd-eventdev would be better for
this case. Two simpler-to-understand apps are better than one more
complicated one, IMHO.

My 2c.

/Bruce


On 02-08-2018 13:49, Ananyev, Konstantin wrote:

Hi everyone,


In order to get this series accepted, we need more discussions
with more people involved.
So it will miss 18.08.

It can be discussed in a more global discussion about examples maintenance.
If discussion does not happen, you can request it to the technical board.


Event dev framework and various adapters enable multiple packet handling
schemes, as opposed to the traditional polling on queues. But these
features are not integrated into any established example application.
There are specific example applications for event dev etc, which can be
used to analyze an event device or a particular eventdev adapter, but
there is no standard application which can be used to compare the real
world performance for a system when it's using event device for packet
handling and when it's done via polling on queues.

The following patch submitted by Sunil was looking to address this issue
with l3fwd,
https://mails.dpdk.org/archives/dev/2018-March/093131.html

Bruce & Jerin reviewed the patch and suggested the addition of helper
functions to abstract the event mode additions in applications,
https://mails.dpdk.org/archives/dev/2018-April/096879.html

This effort of adding helper functions for eventmode was taken up
following the above suggestion. The idea is to add eventmode without
touching the existing code path. All the eventmode specific additions
would go into library so that these need not be repeated for every
application. And since there is no change in the existing code path,
performance for any vendor should not have any impact with the additions.

The scope of this effort has increased since the submission, as now we
have Tx adapter as well. Sunil & Konstantin had clarified their
concerns, and gave green flag to this approach.
https://mails.dpdk.org/archives/dev/2018-June/105730.html
https://mails.dpdk.org/archives/dev/2018-July/106453.html

I guess Bruce was opening this question to the community. For compute
intense applications like ipsec-secgw, eventmode might be the right
approach in the first place. Such complex applications would need a
scheduler to perform dynamic load balancing. Addition of eventmode in
l2fwd was to float around the idea which can then be scaled for more
complex applications.

If the maintainers don't have any objection to this, I'm fine with adding
this in the next release.

Thanks,
Anoob

It is important that DPDK has good examples of how to use existing
frameworks and libraries. These applications are what most customers
build their applications from and they provide basis for testing.

The DPDK needs to continue to support multiple usage models. This
is one of its strong points. I would rather leave existing l2fwd
and l3fwd alone and instead make new examples that use the frameworks.
If nothing else, having l2fwd and l2fwd-eventdev would allow for
performance comparisons.

Unlike other example applications, there won't be any change in the packet
processing functions between the eventdev and poll mode cases. Only the worker
schematics will change, and that can be moved to separate files.
something like worker_poll.c and worker_event.c and both of th

[dpdk-dev] [RFC 1/2] mbuf: add a sanity check on segment metadata

2018-08-13 Thread David Marchand
Add a basic check on the segment offset and length metadata:
always funny to have a < 0 tailroom cast to uint16_t ;-).

Signed-off-by: David Marchand 
---
 lib/librte_mbuf/rte_mbuf.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index e714c5a..7eeef12 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -200,6 +200,8 @@ rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header)
pkt_len = m->pkt_len;
 
do {
+   if (m->data_off + m->data_len > m->buf_len)
+   rte_panic("bad segment metadata\n");
nb_segs -= 1;
pkt_len -= m->data_len;
} while ((m = m->next) != NULL);
-- 
2.7.4
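
The wrap-around that the commit message jokes about can be shown with a small stand-alone sketch. struct seg_meta is a hypothetical stand-in holding only the three rte_mbuf fields used by the check, and seg_tailroom() and seg_meta_ok() are made-up helper names; the check itself is the same comparison the patch adds.

```c
#include <stdint.h>

/* Stand-in for the three rte_mbuf fields used by the check. */
struct seg_meta {
	uint16_t buf_len;  /* size of the data buffer */
	uint16_t data_off; /* offset of the packet data in the buffer */
	uint16_t data_len; /* amount of packet data in this segment */
};

/* Tailroom computed the naive way: wraps around when metadata is bad. */
static uint16_t
seg_tailroom(const struct seg_meta *m)
{
	return (uint16_t)(m->buf_len - m->data_off - m->data_len);
}

/* Returns 1 when the data fits inside the buffer, 0 otherwise.
 * The uint16_t operands are promoted to int before the addition, so on
 * platforms with a 32-bit int the sum cannot wrap at 16 bits. */
static int
seg_meta_ok(const struct seg_meta *m)
{
	return m->data_off + m->data_len <= m->buf_len;
}
```

With buf_len 2048, data_off 128 and data_len 2000, the real tailroom would be -80, but the 16-bit cast reports 65456 bytes of free space; the check rejects that segment before such a value can be used.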



[dpdk-dev] [RFC 2/2] ethdev: check received mbufs sanity

2018-08-13 Thread David Marchand
Let's check the mbufs given by the drivers directly in the rx handler.
The only drawback is that you need CONFIG_RTE_LIBRTE_MBUF_DEBUG to be set
for this to actually do some real checks.

Signed-off-by: David Marchand 
---
 lib/librte_ethdev/rte_ethdev.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 7070e9a..8843307 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -3803,6 +3803,7 @@ rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
 {
struct rte_eth_dev *dev = &rte_eth_devices[port_id];
uint16_t nb_rx;
+   uint16_t index;
 
 #ifdef RTE_LIBRTE_ETHDEV_DEBUG
RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
@@ -3816,6 +3817,9 @@ rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
 rx_pkts, nb_pkts);
 
+   for (index = 0; index < nb_rx; index++)
+   __rte_mbuf_sanity_check(rx_pkts[index], 1);
+
 #ifdef RTE_ETHDEV_RXTX_CALLBACKS
if (unlikely(dev->post_rx_burst_cbs[queue_id] != NULL)) {
struct rte_eth_rxtx_callback *cb =
-- 
2.7.4



Re: [dpdk-dev] [PATCH v2 2/2] netvsc: resize event buffer as needed

2018-08-13 Thread Stephen Hemminger
On Mon, 13 Aug 2018 08:51:08 -0700
Stephen Hemminger  wrote:

> The event buffer was changed to be a fixed size value, but it
> is not large enough for a forwarding stress test.
> 
> This version of event buffer code uses malloc/realloc to size
> the event buffer as needed. Malloc is preferred over rte_malloc
> because the event buffer does not need to be used for DMA
> and huge page is a limited resource.
> 
> Fixes: 530af95a7849 ("bus/vmbus: avoid signalling host on read")
> Signed-off-by: Stephen Hemminger 

Self NAK. This won't work when a secondary process needs the buffer.


[dpdk-dev] 16.11.8 (LTS) patches review and test

2018-08-13 Thread luca . boccassi
Hi all,

Here is a list of patches targeted for LTS release 16.11.8. Please
help review and test. The planned date for the final release is August
the 23rd.
Before that, please shout if anyone has objections with these
patches being applied.

Also for the companies committed to running regression tests,
please run the tests and report any issue before the release date.

A release candidate tarball can be found at:

https://dpdk.org/browse/dpdk-stable/tag/?id=v16.11.8-rc1

These patches are located at branch 16.11 of dpdk-stable repo:
https://dpdk.org/browse/dpdk-stable/

Thanks.

Luca Boccassi

---
Adrien Mazarguil (1):
  maintainers: update for Mellanox PMDs

Ajit Khaparde (6):
  net/bnxt: fix HW Tx checksum offload check
  net/bnxt: fix incorrect IO address handling in Tx
  net/bnxt: fix Rx ring count limitation
  net/bnxt: check access denied for HWRM commands
  net/bnxt: fix RETA size
  net/bnxt: fix close operation

Alejandro Lucero (1):
  net/nfp: fix field initialization in Tx descriptor

Anatoly Burakov (2):
  eal/linux: fix invalid syntax in interrupts
  test: fix EAL flags autotest on FreeBSD

Beilei Xing (1):
  net/i40e: fix shifts of 32-bit value

Bruce Richardson (2):
  examples/exception_path: fix out-of-bounds read
  mk: fix permissions when using make install

Chas Williams (1):
  net/bonding: do not clear active slave count

Damjan Marion (1):
  net/i40e: do not reset device info data

Dan Gora (1):
  kni: fix crash with null name

Daria Kolistratova (1):
  net/ena: fix SIGFPE with 0 Rx queue

Dariusz Stojaczyk (1):
  eal: fix return codes on thread naming failure

Drocula Lambda (1):
  kni: fix build on RHEL 7.5

Emma Kenny (1):
  examples/multi_process: build l2fwd_fork app

Ferruh Yigit (2):
  kni: fix build with gcc 8.1
  net/thunderx: fix build with gcc optimization on

Fiona Trahe (1):
  crypto/qat: fix checks for 3GPP algo bit params

Gage Eads (1):
  net: rename u16 to fix shadowed declaration

Gavin Hu (1):
  maintainers: claim maintainership for ARM v7 and v8

Haiyue Wang (2):
  mbuf: fix typo in IPv6 macro comment
  net/i40e: workaround performance degradation

Hemant Agrawal (1):
  test/crypto: fix device id when stopping port

Hyong Youb Kim (1):
  net/enic: do not overwrite admin Tx queue limit

Ido Goshen (1):
  net/pcap: fix multiple queues

Jerin Jacob (2):
  ethdev: fix queue statistics mapping documentation
  eal: fix bitmap documentation

Kiran Kumar (3):
  net/bonding: fix MAC address reset
  ethdev: check queue stats mapping input arguments
  net/thunderx: avoid sq door bell write on zero packet

Konstantin Ananyev (3):
  examples/ipsec-secgw: fix IPv4 checksum at Tx
  examples/ipsec-secgw: fix bypass rule processing
  app/testpmd: fix DCB config

Maxime Coquelin (1):
  vhost: fix missing increment of log cache count

Pablo de Lara (3):
  test/hash: fix multiwriter with non consecutive cores
  test/hash: fix potential memory leak
  hash: fix doxygen of return values

Radu Nicolau (2):
  test: fix uninitialized port configuration
  net/bonding: fix race condition

Rafal Kozik (4):
  net/ena: check pointer before memset
  net/ena: change memory type
  net/ena: fix GENMASK_ULL macro
  net/ena: set link speed as none

Rahul Lakkireddy (2):
  net/cxgbe/base: update flash part information
  net/cxgbe: fix init failure due to new flash parts

Rami Rosen (2):
  examples/l3fwd: remove useless include
  ethdev: fix a doxygen comment for port allocation

Rasesh Mody (3):
  net/qede: fix default extended VLAN offload config
  net/qede/base: fix GRC attention callback
  net/bnx2x: fix FW command timeout during stop

Shahed Shaikh (1):
  net/qede: fix MAC address removal failure message

Shreyansh Jain (1):
  doc: fix bonding command in testpmd

Wei Zhao (6):
  net/ixgbe: fix tunnel id format error for FDIR
  net/ixgbe: fix tunnel type set error for FDIR
  net/ixgbe: fix mask bits register set error for FDIR
  app/testpmd: fix VLAN TCI mask set error for FDIR
  net/i40e: fix check of flow director programming status
  net/i40e: revert fix of flow director check

Xiaoxin Peng (1):
  net/bnxt: fix Tx with multiple mbuf

Xiaoyun Li (1):
  net/i40e: fix link speed

Yipeng Wang (3):
  hash: fix multiwriter lock memory allocation
  hash: fix a multi-writer race condition
  hash: fix key slot size accuracy


Re: [dpdk-dev] [RFC] ethdev: add tail drop API for traffic management

2018-08-13 Thread Stephen Hemminger
On Mon, 13 Aug 2018 15:53:32 +0800
Rosen Xu  wrote:

> @@ -1028,6 +1094,8 @@ enum rte_tm_error_type {
>   RTE_TM_ERROR_TYPE_WRED_PROFILE_YELLOW,
>   RTE_TM_ERROR_TYPE_WRED_PROFILE_RED,
>   RTE_TM_ERROR_TYPE_WRED_PROFILE_ID,
> + RTE_TM_ERROR_TYPE_TDROP_PROFILE,
> + RTE_TM_ERROR_TYPE_TDROP_PROFILE_ID,
>   RTE_TM_ERROR_TYPE_SHARED_WRED_CONTEXT_ID,
>   RTE_TM_ERROR_TYPE_SHAPER_PROFILE,
>   RTE_TM_ERROR_TYPE_SHAPER_PROFILE_COMMITTED_RATE,
> @@ -1279,6 +1347,110 @@ struct rte_tm_error {

Be careful: adding a new enum value in the middle of the list will potentially
break the ABI, because every value after the insertion point is renumbered.
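
A minimal sketch of the problem, using hypothetical enums (the names below are
illustrative only and are not the real rte_tm_error_type values). Inserting a
value in the middle shifts the numbers of everything after it, so an
application built against the old header and a library built against the new
one would disagree on what a given number means. Appending at the end keeps
the existing numbers stable.

```c
#include <stdio.h>

/* Hypothetical "old" header, as seen by an already-built application. */
enum err_v1 {
	ERR_V1_WRED_PROFILE_ID,          /* 0 */
	ERR_V1_SHARED_WRED_CONTEXT_ID,   /* 1 */
	ERR_V1_SHAPER_PROFILE,           /* 2 */
};

/* New value inserted in the middle: everything after it is renumbered. */
enum err_v2_middle {
	ERR_V2M_WRED_PROFILE_ID,         /* 0 */
	ERR_V2M_TDROP_PROFILE,           /* 1, new */
	ERR_V2M_SHARED_WRED_CONTEXT_ID,  /* 2, was 1 */
	ERR_V2M_SHAPER_PROFILE,          /* 3, was 2 */
};

/* New value appended at the end: existing values are unchanged. */
enum err_v2_end {
	ERR_V2E_WRED_PROFILE_ID,         /* 0 */
	ERR_V2E_SHARED_WRED_CONTEXT_ID,  /* 1 */
	ERR_V2E_SHAPER_PROFILE,          /* 2 */
	ERR_V2E_TDROP_PROFILE,           /* 3, new */
};

/* Print the value of the "shaper profile" error under each layout. */
void print_shaper_profile_values(void)
{
	printf("old=%d middle=%d end=%d\n",
	       (int)ERR_V1_SHAPER_PROFILE,
	       (int)ERR_V2M_SHAPER_PROFILE,
	       (int)ERR_V2E_SHAPER_PROFILE);
}
```

Calling print_shaper_profile_values() shows that only the "middle" layout
changes the number that old binaries already use.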


Re: [dpdk-dev] [PATCH] net/mlx5: fix RSS flow action hash type selection

2018-08-13 Thread Yongseok Koh


> On Aug 12, 2018, at 4:14 AM, Shahaf Shuler  wrote:
> 
> On the code after the below commits, the criteria to select the IPV4 or
> IPV6 hash functions was the existence of some ETH_RSS_IPV4 RSS types on
> the flow rule.
> 
> The check is wrong. For example ETH_RSS_NONFRAG_IPV4_TCP will not select
> the IPV4 hash which will cause the packet to be spread in a bad way.
> 
> Fix it by adding the corresponding types needed for each hash selection.
> 
> Fixes: 592f05b29a25 ("net/mlx5: add RSS flow action")
> Fixes: fd0b70316bca ("net/mlx5: support inner RSS computation")
> Cc: sta...@dpdk.org
> Cc: nelio.laranje...@6wind.com
> Cc: or...@mellanox.com
> 
> Reported-by: Yaroslav Brustinov 
> Signed-off-by: Shahaf Shuler 
> ---
> 
> Few notes:
> 1. this patch should be backported to 18.08 stable
> 2. There is more work planned in 18.11 for the flow engine.
>   The work should be on top of this fix.
> 
> ---
> drivers/net/mlx5/mlx5_flow.c | 8 +++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
> index ca4625b699..da96932da5 100644
> --- a/drivers/net/mlx5/mlx5_flow.c
> +++ b/drivers/net/mlx5/mlx5_flow.c
> @@ -1053,6 +1053,8 @@ mlx5_flow_item_ipv4(const struct rte_flow_item *item, 
> struct rte_flow *flow,
>   mlx5_flow_verbs_hashfields_adjust
>   (flow, tunnel,
>(ETH_RSS_IPV4 | ETH_RSS_FRAG_IPV4 |
> +   ETH_RSS_NONFRAG_IPV4_TCP |
> +   ETH_RSS_NONFRAG_IPV4_UDP |
> ETH_RSS_NONFRAG_IPV4_OTHER),
>(IBV_RX_HASH_SRC_IPV4 | IBV_RX_HASH_DST_IPV4));
>   flow->cur_verbs->attr->priority = MLX5_PRIORITY_MAP_L3;
> @@ -1188,7 +1190,11 @@ mlx5_flow_item_ipv6(const struct rte_flow_item *item, 
> struct rte_flow *flow,
>   if (size <= flow_size) {
>   mlx5_flow_verbs_hashfields_adjust
>   (flow, tunnel,
> -  (ETH_RSS_IPV6 | ETH_RSS_NONFRAG_IPV6_OTHER),
> +  (ETH_RSS_IPV6 |
> +   ETH_RSS_NONFRAG_IPV6_TCP | ETH_RSS_NONFRAG_IPV6_UDP |
> +   ETH_RSS_NONFRAG_IPV6_OTHER | ETH_RSS_IPV6_EX |
> +   ETH_RSS_IPV6_TCP_EX | ETH_RSS_IPV6_UDP_EX |
> +   ETH_RSS_FRAG_IPV6 | ETH_RSS_NONFRAG_IPV6_OTHER),

ETH_RSS_NONFRAG_IPV6_OTHER appears twice.
Also, I'd like to see these in the same order as in rte_ethdev.h.

Thanks,
Yongseok

>(IBV_RX_HASH_SRC_IPV6 | IBV_RX_HASH_DST_IPV6));
>   flow->cur_verbs->attr->priority = MLX5_PRIORITY_MAP_L3;
>   mlx5_flow_spec_verbs_add(flow, &ipv6, size);
> -- 
> 2.12.0
> 



Re: [dpdk-dev] Multi-thread mempool usage

2018-08-13 Thread Olivier Matz
Hello Matteo,

On Mon, Aug 13, 2018 at 03:20:44PM +0200, Matteo Lanzuisi wrote:
> Any suggestion? any idea about this behaviour?
> 
> On 08/08/2018 11:56, Matteo Lanzuisi wrote:
> > Hi all,
> > 
> > recently I began using "dpdk-17.11-11.el7.x86_64" rpm (RedHat rpm) on
> > RedHat 7.5 kernel 3.10.0-862.6.3.el7.x86_64 as a porting of an
> > application from RH6 to RH7. On RH6 I used dpdk-2.2.0.
> > 
> > This application is made up by one or more threads (each one on a
> > different logical core) reading packets from i40e interfaces.
> > 
> > Each thread can call the following code lines when receiving a specific
> > packet:
> > 
> > RTE_LCORE_FOREACH(lcore_id)
> > {
> >     result =
> > rte_mempool_get(cea_main_lcore_conf[lcore_id].de_conf.cmd_pool, (VOID_P
> > *) &new_work);        // mempools are created one for each logical core
> >     if (((uint64_t)(new_work)) < 0x7f00)
> >     printf("Result %d, source lcore %u, receiving lcore
> > %u, pointer %p\n", result, rte_lcore_id(), lcore_id, new_work);    //
> > debug print, on my server it should never happen but with multi-thread
> > happens always on the last logical core

Here, checking the value of new_work before ensuring that result == 0
looks wrong to me. At the very least, new_work should be set to
NULL before calling rte_mempool_get().
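
As a rough sketch of that ordering, here is a self-contained toy version.
toy_pool, toy_pool_get() and get_work() are hypothetical stand-ins, not DPDK
APIs. The only assumption carried over is the convention that a get returns 0
on success and a negative value on failure, in which case the output pointer
must not be trusted.

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a mempool: hands out slots from a fixed array. */
struct toy_pool {
	int slots[4];
	unsigned int used;
};

/*
 * Returns 0 and sets *obj on success. Returns -ENOENT and leaves *obj
 * untouched when no object is available.
 */
int toy_pool_get(struct toy_pool *p, void **obj)
{
	if (p->used >= sizeof(p->slots) / sizeof(p->slots[0]))
		return -ENOENT;
	*obj = &p->slots[p->used++];
	return 0;
}

/* Fetch one object; returns NULL when the pool is exhausted. */
int *get_work(struct toy_pool *p)
{
	void *new_work = NULL;	/* never left uninitialized */
	int result = toy_pool_get(p, &new_work);

	if (result != 0)
		return NULL;	/* do not inspect new_work on failure */
	return new_work;	/* only now is the pointer meaningful */
}
```

With this shape, any debug print of the pointer would go after the result
check, so it can never report a stale or uninitialized value.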

> >     if (result == 0)
> >     {
> >     new_work->command = command; // usage of the memory gotten
> > from the mempool... <- here is where the application crashes

Do you know why it crashes? Is it that new_work is NULL?

Can you check how the mempool is initialized? It should be multi-consumer
and, depending on your use case, single- or multi-producer.

Another thing that could be checked: at all the places where you
return your work object to the mempool, you should add a check
that it is not NULL. Or just enabling RTE_LIBRTE_MEMPOOL_DEBUG
could do the trick: it adds some additional checks when doing
mempool operations.

> >     result =
> > rte_ring_enqueue(cea_main_lcore_conf[lcore_id].de_conf.cmd_ring,
> > (VOID_P) new_work);    // enqueues the gotten buffer on the rings of all
> > lcores
> >     // check on result value ...
> >     }
> >     else
> >     {
> >     // do something if result != 0 ...
> >     }
> > }
> > 
> > This code worked perfectly (never had an issue) on dpdk-2.2.0, while if
> > I use more than 1 thread doing these operations on dpdk-17.11 it happens
> > that after some times the "new_work" pointer is not a good one, and the
> > application crashes when using that pointer.
> > 
> > It seems that these lines cannot be used by more than one thread
> > simultaneously. I also used many 2017 and 2018 dpdk versions without
> > success.
> > 
> > Is this code possible on the new dpdk versions? Or have I to change my
> > application so that this code is called just by one lcore at a time?

Assuming the mempool is properly initialized, I don't see any reason
why it would not work. There have been a lot of changes in mempool between
dpdk-2.2.0 and dpdk-17.11, but this behavior should remain the same.

If the comments above do not help to solve the issue, it could be helpful
to try to reproduce the issue in a minimal program, so we can help to
review it.

Regards,
Olivier


[dpdk-dev] [PATCH] ethdev: fix rte_eth_dev_owner_unset

2018-08-13 Thread Stephen Hemminger
The rte_eth_dev_owner_unset function is unusable because
it always returns -EINVAL. This is because the magic (unowned)
value is flagged as not valid.

Move the validation of owner into set and unset as
separate calls.

Fixes: 5b7ba31148a8 ("ethdev: add port ownership")
Signed-off-by: Stephen Hemminger 
---
 lib/librte_ethdev/rte_ethdev.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 4c320250589a..9398550a1189 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -443,10 +443,6 @@ _rte_eth_dev_owner_set(const uint16_t port_id, const uint64_t old_owner_id,
return -ENODEV;
}
 
-   if (!rte_eth_is_valid_owner_id(new_owner->id) &&
-   !rte_eth_is_valid_owner_id(old_owner_id))
-   return -EINVAL;
-
port_owner = &rte_eth_devices[port_id].data->owner;
if (port_owner->id != old_owner_id) {
RTE_ETHDEV_LOG(ERR,
@@ -475,6 +471,9 @@ rte_eth_dev_owner_set(const uint16_t port_id,
 {
int ret;
 
+   if (!rte_eth_is_valid_owner_id(owner->id))
+   return -EINVAL;
+
rte_eth_dev_shared_data_prepare();
 
rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock);
@@ -492,6 +491,9 @@ rte_eth_dev_owner_unset(const uint16_t port_id, const uint64_t owner_id)
{.id = RTE_ETH_DEV_NO_OWNER, .name = ""};
int ret;
 
+   if (!rte_eth_is_valid_owner_id(owner_id))
+   return -EINVAL;
+
rte_eth_dev_shared_data_prepare();
 
rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock);
-- 
2.18.0



Re: [dpdk-dev] [PATCH v2] ethdev: fix device info getting

2018-08-13 Thread Lu, Wenzhuo
Hi Andrew,

> -Original Message-
> From: Andrew Rybchenko [mailto:arybche...@solarflare.com]
> Sent: Monday, August 13, 2018 4:39 PM
> To: Lu, Wenzhuo ; Thomas Monjalon
> ; Yigit, Ferruh 
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] ethdev: fix device info getting
> 
> On 13.08.2018 05:50, Lu, Wenzhuo wrote:
> > Hi Thomas,
> >
> >
> >> -Original Message-
> >> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> >> Sent: Wednesday, August 1, 2018 11:37 PM
> >> To: Lu, Wenzhuo ; Andrew Rybchenko
> >> ; Yigit, Ferruh 
> >> Cc: dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH v2] ethdev: fix device info getting
> >>
> >> 16/07/2018 03:58, Lu, Wenzhuo:
> >>> Hi Andrew,
> >>>
>  -Original Message-
>  From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Lu, Wenzhuo
>  Sent: Monday, July 16, 2018 9:08 AM
>  To: Andrew Rybchenko ; dev@dpdk.org
>  Cc: Yigit, Ferruh ; Thomas Monjalon
>  
>  Subject: Re: [dpdk-dev] [PATCH v2] ethdev: fix device info getting
> 
>  Hi Andrew,
> 
> > -Original Message-
> > From: Andrew Rybchenko [mailto:arybche...@solarflare.com]
> > Sent: Friday, July 13, 2018 4:03 PM
> > To: Lu, Wenzhuo ; dev@dpdk.org
> > Cc: Yigit, Ferruh ; Thomas Monjalon
> > 
> > Subject: Re: [dpdk-dev] [PATCH v2] ethdev: fix device info getting
> >
> > Hi, Wenzhuo,
> >
> > I'm sorry, but I have more even harder questions than the previous
> one.
> > This questions are rather generic and mainly to ethdev maintainers.
> >
> > On 13.07.2018 05:42, Wenzhuo Lu wrote:
> >> The device information cannot be gotten correctly before the
> >> configuration is set. Because on some NICs the information has
> >> dependence on the configuration.
> > Thinking about it I have the following question. Is it valid
> > behaviour of the dev_info if it changes after configuration?
> > I always thought that the primary goal of the dev_info is to
> > provide information to app about device capabilities to allow app
> > configure device and queues correctly. Now we see the case when
> > dev_info changes on configure. May be it is acceptable, but it is
> > really suspicious. If we accept it, it should be documented.
> > May be dev_info should be split into parts: part which is
> > persistent and part which may depend on device configuration.
>  As I remember, the similar discussion has happened :) I've raised
>  the similar suggestion like this. But we don’t make it happen.
>  The reason is, you see, this is the rte layer's behavior. So the
>  user doesn't have to know it. From APP's PoV, it inputs the
>  configuration, it calls this API "rte_eth_dev_configure". It
>  doesn't know  the configuration is copied before getting the info or not.
>  So, to my opinion, we can still keep the behavior. We only need to
>  split it into parts when we do see the case that cannot make it.
> >>> Maybe I talked too much about the patch. Think about it again. Your
> >>> comments is about how to use the APIs, rte_eth_dev_info_get,
> >> rte_eth_dev_configure. To my opinion, rte_eth_dev_info_get is just to
> >> get the info. It can be called anywhere, before configuration or
> >> after. It's reasonable the info changes with the configuration changing.
> >>> But we do have something missing, like, rte_eth_dev_capability_get
> >>> which
> >> should be stable. APP can use this API to get the necessary info
> >> before configuration.
> >>> A question, maybe a little divergent thinking, that APP should have
> >>> some
> >> intelligence to handle the capability automatically. So getting the
> >> capability is not so good and effective, looks like we still need the human
> involvement.
> >> Maybe that the reason currently we suppose APP know the capability
> >> from the paper copies, examples...
> >>
> >> I am not sure to understand all the sentences.
> >> But I agree that we should take a decision about the stability of these
> infos.
> >> Either infos cannot change after probing, or we must document that
> >> the app must request infos regularly (when?).
> > Sorry, I missed this mail.
> >
> > I have the concern that different NICs have different behavior. One info
> can be stable on a NIC but dynamic on another. Considering this, we may
> better not splitting the rte_eth_dev_info_get to 2 APIs. And comparing with
> handling this in rte layer, maybe we can let every NIC has its own decision.
> > I have an idea. Maybe we can add a parameter for potential dynamic
> > fields. Like, Changing uint16_t nb_rx_queues; to struct nb_rx_queues {
> > uint16_t value; bool stable; }
> 
> May be it is just very bad example, but as I understand nb_rx_queues is
> mainly required to configure the device properly. Or should app configure,
> get new value, reconfigure again, get new value and so on and stop when
> previous is equal to the new one. Yes, I dramatise and it

Re: [dpdk-dev] [PATCH v9] checkpatches.sh: Add checks for ABI symbol addition

2018-08-13 Thread Rao, Nikhil

On 6/27/2018 11:31 PM, Neil Horman wrote:

diff --git a/devtools/check-symbol-change.sh b/devtools/check-symbol-change.sh
new file mode 100755
index 0..17d123cf4
--- /dev/null
+++ b/devtools/check-symbol-change.sh
@@ -0,0 +1,159 @@
+#!/bin/sh
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Neil Horman 
+
+build_map_changes()
+{
+   local fname=$1
+   local mapdb=$2
+
+   cat $fname | awk '
+   # Initialize our variables
+   BEGIN {map="";sym="";ar="";sec=""; in_sec=0; in_map=0}
+
+   # Anything that starts with + or -, followed by an a
+   # and ends in the string .map is the name of our map file
+   # This may appear multiple times in a patch if multiple
+   # map files are altered, and all section/symbol names
+   # appearing between a triggering of this rule and the
+   # next trigger of this rule are associated with this file
+   /[-+] a\/.*\.map/ {map=$2; in_map=1}
+
+   # Same pattern as above, only it matches on anything that
+   # doesnt end in 'map', indicating we have left the map chunk.
+   # When we hit this, turn off the in_map variable, which
+   # suppresses the subordinate rules below
+   /[-+] a\/.*\.^(map)/ {in_map=0}
+
+   # Triggering this rule, which starts a line with a + and ends it
+   # with a { identifies a versioned section.  The section name is
+   # the rest of the line with the + and { symbols removed.
+   # Triggering this rule sets in_sec to 1, which activates the
+   # symbol rule below
+   /+.*{/ {gsub("+","");
+   if (in_map == 1) {
+   sec=$1; in_sec=1;
+   }
+   }
+


I am adding a symbol as shown below, however the rule above fails to 
detect that the new symbol is being added to a pre-existing EXPERIMENTAL 
block (picks up the section name as @@ instead).


Any suggestions?

diff --git a/lib/librte_eventdev/rte_eventdev_version.map b/lib/librte_eventdev/rte_eventdev_version.map

index 12835e9..4b8c55d 100644
--- a/lib/librte_eventdev/rte_eventdev_version.map
+++ b/lib/librte_eventdev/rte_eventdev_version.map
@@ -96,6 +96,7 @@ EXPERIMENTAL {
rte_event_crypto_adapter_stats_reset;
rte_event_crypto_adapter_stop;
rte_event_eth_rx_adapter_cb_register;
+   rte_event_eth_tx_adapter_caps_get;
rte_event_timer_adapter_caps_get;
rte_event_timer_adapter_create;
rte_event_timer_adapter_create_ext;

Thanks,
Nikhil


Re: [dpdk-dev] [PATCH] ethdev: fix rte_eth_dev_owner_unset

2018-08-13 Thread Matan Azrad
Hi Stephen

From: Stephen Hemminger
> The rte_eth_dev_owner_unset function is unusable because it always
> returns -EINVAL. This is because the magic (unowned) value is flagged as not
> valid.
> 

It's OK to raise an error when you call unset on an unowned device.
It means that owner unset should only be called on an owned device.

> Move the validation of owner into set and unset as separate calls.
> 
> Fixes: 5b7ba31148a8 ("ethdev: add port ownership")
> Signed-off-by: Stephen Hemminger 
> ---
>  lib/librte_ethdev/rte_ethdev.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 4c320250589a..9398550a1189 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -443,10 +443,6 @@ _rte_eth_dev_owner_set(const uint16_t port_id,
> const uint64_t old_owner_id,
>   return -ENODEV;
>   }
> 
> - if (!rte_eth_is_valid_owner_id(new_owner->id) &&
> - !rte_eth_is_valid_owner_id(old_owner_id))
> - return -EINVAL;
> -
>   port_owner = &rte_eth_devices[port_id].data->owner;
>   if (port_owner->id != old_owner_id) {
>   RTE_ETHDEV_LOG(ERR,
> @@ -475,6 +471,9 @@ rte_eth_dev_owner_set(const uint16_t port_id,  {
>   int ret;
> 
> + if (!rte_eth_is_valid_owner_id(owner->id))
> + return -EINVAL;
> +
>   rte_eth_dev_shared_data_prepare();
> 
>   rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock);
> @@ -492,6 +491,9 @@ rte_eth_dev_owner_unset(const uint16_t port_id,
> const uint64_t owner_id)
>   {.id = RTE_ETH_DEV_NO_OWNER, .name = ""};
>   int ret;
> 
> + if (!rte_eth_is_valid_owner_id(owner_id))
> + return -EINVAL;
> +
>   rte_eth_dev_shared_data_prepare();
> 
>   rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock);
> --
> 2.18.0



Re: [dpdk-dev] [RFC] ethdev: add tail drop API for traffic management

2018-08-13 Thread Jerin Jacob
-Original Message-
> Date: Mon, 13 Aug 2018 15:53:32 +0800
> From: Rosen Xu 
> To: dev@dpdk.org
> CC: cristian.dumitre...@intel.com, wenzhuo...@intel.com,
>  jasvinder.si...@intel.com, rosen...@intel.com, ferruh.yi...@intel.com
> Subject: [dpdk-dev] [RFC] ethdev: add tail drop API for traffic management
> X-Mailer: git-send-email 1.8.3.1
> 
> 
> This patch introduces new ethdev generic Tail Drop API for Traffic
> Management, which is yet another standard congestion management
> offload for Ethernet devices.
> 
> Tail Drop is about packets dropping when they arrive on a congested
> interface buffer. It's one mode of congestion management for hierarchy
> leaf nodes.
> 
> There are two configuration parameters for Tail Drop:
> 1. Buffer Depth: determine the depth of receive fifo for packet RX.

If it is for packet Rx, we should not add it to rte_tm, right?

Apart from tail drop, RED (random early detection) is also found in some HW on
the Rx side. How about creating a generic Rx congestion management API that
includes RED and tail drop, based on capability? (Other schemes could be added
in the future.)
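
Purely as an illustration of that idea, a capability-based configuration could
look roughly like the sketch below. Every name in it (toy_rx_cman_*) is
hypothetical; nothing like this exists in ethdev or in the RFC, and the RED
fields are only a guess at typical parameters.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical capability/mode bits, one per congestion management scheme. */
enum toy_rx_cman_mode {
	TOY_RX_CMAN_TAIL_DROP = 1u << 0,
	TOY_RX_CMAN_RED       = 1u << 1,
};

/* Hypothetical per-queue configuration: one mode plus its parameters. */
struct toy_rx_cman_config {
	enum toy_rx_cman_mode mode;
	union {
		struct {
			uint32_t threshold;	/* drop when fill reaches this */
		} tail_drop;
		struct {
			uint32_t min_th;	/* below this, never drop */
			uint32_t max_th;	/* at or above this, always drop */
			uint16_t maxp_inv;	/* inverse of max drop probability */
		} red;
	} u;
};

/*
 * Accept a configuration only if the device advertises the requested mode.
 * Returns 0 if supported, -ENOTSUP otherwise.
 */
int toy_rx_cman_config_check(uint32_t capa,
			     const struct toy_rx_cman_config *cfg)
{
	if ((capa & (uint32_t)cfg->mode) == 0)
		return -ENOTSUP;
	return 0;
}
```

An application would first read the capability mask from the device and then
pick one of the advertised modes, so new schemes could be added later as new
bits without changing the existing ones.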



> 2. Drop Threshold: water line of receive fifo to judge whether the
>current received packet dropped or enqueue.
> 
> Signed-off-by: Rosen Xu 
> ---
>  lib/librte_ethdev/rte_tm.c|  42 ++
>  lib/librte_ethdev/rte_tm.h| 172 
> ++
>  lib/librte_ethdev/rte_tm_driver.h |  35 
>  3 files changed, 249 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_tm.c b/lib/librte_ethdev/rte_tm.c
> index 9709454..89a7dec 100644
> --- a/lib/librte_ethdev/rte_tm.c
> +++ b/lib/librte_ethdev/rte_tm.c
> @@ -168,6 +168,48 @@ int rte_tm_shared_wred_context_delete(uint16_t port_id,
> shared_wred_context_id, error);
>  }
> 
> +/* Add Tail Drop profile */
> +int rte_tm_tdrop_profile_add(uint16_t port_id,
> +   uint32_t tdrop_profile_id,
> +   struct rte_tm_tdrop_params *profile,
> +   struct rte_tm_error *error)
> +{
> +   struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +   return RTE_TM_FUNC(port_id, tdrop_profile_add)(dev,
> +   tdrop_profile_id, profile, error);
> +}
> +
> +/* Delete Tail Drop profile */
> +int rte_tm_tdrop_profile_delete(uint16_t port_id,
> +   uint32_t tdrop_profile_id,
> +   struct rte_tm_error *error)
> +{
> +   struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +   return RTE_TM_FUNC(port_id, tdrop_profile_delete)(dev,
> +   tdrop_profile_id, error);
> +}
> +
> +/* Add/update shared Tail Drop context */
> +int rte_tm_shared_tdrop_context_add_update(uint16_t port_id,
> +   uint32_t shared_tdrop_context_id,
> +   uint32_t tdrop_profile_id,
> +   struct rte_tm_error *error)
> +{
> +   struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +   return RTE_TM_FUNC(port_id, shared_tdrop_context_add_update)(dev,
> +   shared_tdrop_context_id, tdrop_profile_id, error);
> +}
> +
> +/* Delete shared Tail Drop context */
> +int rte_tm_shared_tdrop_context_delete(uint16_t port_id,
> +   uint32_t shared_tdrop_context_id,
> +   struct rte_tm_error *error)
> +{
> +   struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +   return RTE_TM_FUNC(port_id, shared_tdrop_context_delete)(dev,
> +   shared_tdrop_context_id, error);
> +}
> +
>  /* Add shaper profile */
>  int rte_tm_shaper_profile_add(uint16_t port_id,
> uint32_t shaper_profile_id,
> diff --git a/lib/librte_ethdev/rte_tm.h b/lib/librte_ethdev/rte_tm.h
> index 955f02f..91b087d 100644
> --- a/lib/librte_ethdev/rte_tm.h
> +++ b/lib/librte_ethdev/rte_tm.h
> @@ -93,6 +93,15 @@
>  #define RTE_TM_WRED_PROFILE_ID_NONE  UINT32_MAX
> 
>  /**
> + * Invalid TDROP profile ID.
> + *
> + * @see struct rte_tm_node_params
> + * @see rte_tm_node_add()
> + * @see rte_tm_node_tdrop_context_update()
> + */
> +#define RTE_TM_TDROP_PROFILE_ID_NONE  UINT32_MAX
> +
> +/**
>   *Invalid shaper profile ID.
>   *
>   * @see struct rte_tm_node_params
> @@ -871,6 +880,37 @@ struct rte_tm_wred_params {
>  };
> 
>  /**
> + * Tail Drop (TDROP) profile
> + *
> + * Multiple TDROP contexts can share the same TDROP profile. Each leaf node 
> with
> + * TDROP enabled as its congestion management mode has zero or one private 
> TDROP
> + * context (only one leaf node using it) and/or zero, one or several shared
> + * TDROP contexts (multiple leaf nodes use the same TDROP context). A private
> + * TDROP context is used to perform congestion management for a single leaf
> + * node, while a shared TDROP context is used to perform congestion 
> management
> + * for a group of leaf nodes.
> + *
> + * @see struct rte_tm_capabilities::cman_tdrop_packet_mode_supported
> + * @see struct rte_tm_capabilities::cman_tdrop_byte_mode_supported
> + */
> +struct rte_tm_tdrop_params {
> +   /** Committed queue length (in bytes) */
> +   uint64_t committed_length;
> +
> +  

Re: [dpdk-dev] [RFC] ethdev: add tail drop API for traffic management

2018-08-13 Thread Xu, Rosen



> -Original Message-
> From: Jerin Jacob [mailto:jerin.ja...@caviumnetworks.com]
> Sent: Tuesday, August 14, 2018 14:06
> To: Xu, Rosen 
> Cc: dev@dpdk.org; Dumitrescu, Cristian ; Lu,
> Wenzhuo ; Singh, Jasvinder
> ; Yigit, Ferruh ;
> nithin.dabilpu...@cavium.com
> Subject: Re: [dpdk-dev] [RFC] ethdev: add tail drop API for traffic
> management
> 
> -Original Message-
> > Date: Mon, 13 Aug 2018 15:53:32 +0800
> > From: Rosen Xu 
> > To: dev@dpdk.org
> > CC: cristian.dumitre...@intel.com, wenzhuo...@intel.com,
> > jasvinder.si...@intel.com, rosen...@intel.com, ferruh.yi...@intel.com
> > Subject: [dpdk-dev] [RFC] ethdev: add tail drop API for traffic
> > management
> > X-Mailer: git-send-email 1.8.3.1
> >
> >
> > This patch introduces new ethdev generic Tail Drop API for Traffic
> > Management, which is yet another standard congestion management
> > offload for Ethernet devices.
> >
> > Tail Drop is about packets dropping when they arrive on a congested
> > interface buffer. It's one mode of congestion management for hierarchy
> > leaf nodes.
> >
> > There are two configuration parameters for Tail Drop:
> > 1. Buffer Depth: determine the depth of receive fifo for packet RX.
> 
> If it is for Packet Rx, We should not add it in rte_tm, Right?

From a functional perspective, it belongs to TM ingress, just like WRED, so I
added it to rte_tm.

> Apart from tail drop, RED(random early detection) also found in some HW
> on RX side. How about creating generic Rx congestion management, which
> includes RED and Tail drop based on the capability.(In future some other
> scheme also)

It's a good idea, but the configuration is different between RED and Tail Drop,
especially for FPGA IP, so I added a new API.
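
For illustration only, the two Tail Drop parameters from the RFC description
(buffer depth and drop threshold) can be modeled with a toy FIFO counter. The
struct and function names below are hypothetical and are not part of the
proposed rte_tm API. Where RED in general drops with a probability that grows
between two thresholds, tail drop needs just one comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tail drop state, mirroring the two RFC parameters. */
struct toy_tdrop {
	uint32_t depth;		/* buffer depth: capacity of the receive FIFO */
	uint32_t threshold;	/* drop threshold: fill level where drops start */
	uint32_t fill;		/* packets currently queued */
};

/* Returns true if the packet is enqueued, false if it is tail-dropped. */
bool toy_tdrop_enqueue(struct toy_tdrop *q)
{
	/* The threshold can never usefully exceed the FIFO capacity. */
	uint32_t limit = q->threshold < q->depth ? q->threshold : q->depth;

	if (q->fill >= limit)
		return false;	/* congested: drop the arriving packet */
	q->fill++;
	return true;
}

/* Remove one packet from the FIFO, if any is queued. */
void toy_tdrop_dequeue(struct toy_tdrop *q)
{
	if (q->fill > 0)
		q->fill--;
}
```

With depth 8 and threshold 3, the fourth back-to-back arrival is dropped, and
arrivals are accepted again as soon as one packet has been dequeued.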

> 
> 
> > 2. Drop Threshold: water line of receive fifo to judge whether the
> >current received packet dropped or enqueue.
> >
> > Signed-off-by: Rosen Xu 
> > ---
> >  lib/librte_ethdev/rte_tm.c|  42 ++
> >  lib/librte_ethdev/rte_tm.h| 172
> ++
> >  lib/librte_ethdev/rte_tm_driver.h |  35 
> >  3 files changed, 249 insertions(+)
> >
> > diff --git a/lib/librte_ethdev/rte_tm.c b/lib/librte_ethdev/rte_tm.c
> > index 9709454..89a7dec 100644
> > --- a/lib/librte_ethdev/rte_tm.c
> > +++ b/lib/librte_ethdev/rte_tm.c
> > @@ -168,6 +168,48 @@ int rte_tm_shared_wred_context_delete(uint16_t
> port_id,
> > shared_wred_context_id, error);  }
> >
> > +/* Add Tail Drop profile */
> > +int rte_tm_tdrop_profile_add(uint16_t port_id,
> > +   uint32_t tdrop_profile_id,
> > +   struct rte_tm_tdrop_params *profile,
> > +   struct rte_tm_error *error)
> > +{
> > +   struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > +   return RTE_TM_FUNC(port_id, tdrop_profile_add)(dev,
> > +   tdrop_profile_id, profile, error); }
> > +
> > +/* Delete Tail Drop profile */
> > +int rte_tm_tdrop_profile_delete(uint16_t port_id,
> > +   uint32_t tdrop_profile_id,
> > +   struct rte_tm_error *error)
> > +{
> > +   struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > +   return RTE_TM_FUNC(port_id, tdrop_profile_delete)(dev,
> > +   tdrop_profile_id, error); }
> > +
> > +/* Add/update shared Tail Drop context */ int
> > +rte_tm_shared_tdrop_context_add_update(uint16_t port_id,
> > +   uint32_t shared_tdrop_context_id,
> > +   uint32_t tdrop_profile_id,
> > +   struct rte_tm_error *error)
> > +{
> > +   struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > +   return RTE_TM_FUNC(port_id,
> shared_tdrop_context_add_update)(dev,
> > +   shared_tdrop_context_id, tdrop_profile_id, error); }
> > +
> > +/* Delete shared Tail Drop context */ int
> > +rte_tm_shared_tdrop_context_delete(uint16_t port_id,
> > +   uint32_t shared_tdrop_context_id,
> > +   struct rte_tm_error *error)
> > +{
> > +   struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > +   return RTE_TM_FUNC(port_id, shared_tdrop_context_delete)(dev,
> > +   shared_tdrop_context_id, error); }
> > +
> >  /* Add shaper profile */
> >  int rte_tm_shaper_profile_add(uint16_t port_id,
> > uint32_t shaper_profile_id,
> > diff --git a/lib/librte_ethdev/rte_tm.h b/lib/librte_ethdev/rte_tm.h
> > index 955f02f..91b087d 100644
> > --- a/lib/librte_ethdev/rte_tm.h
> > +++ b/lib/librte_ethdev/rte_tm.h
> > @@ -93,6 +93,15 @@
> >  #define RTE_TM_WRED_PROFILE_ID_NONE  UINT32_MAX
> >
> >  /**
> > + * Invalid TDROP profile ID.
> > + *
> > + * @see struct rte_tm_node_params
> > + * @see rte_tm_node_add()
> > + * @see rte_tm_node_tdrop_context_update()
> > + */
> > +#define RTE_TM_TDROP_PROFILE_ID_NONE  UINT32_MAX
> > +
> > +/**
> >   *Invalid shaper profile ID.
> >   *
> >   * @see struct rte_tm_node_params
> > @@ -871,6 +880,37 @@ struct rte_tm_wred_params {  };
> >
> >  /**
> > + * Tail Drop (TDROP) profile
> > + *
> > + * Multiple TDROP contexts