Re: [PATCH v2 00/20] Enable lock annotations on most libraries and drivers

2023-02-25 Thread David Marchand
Salut Gaëtan,

On Fri, Feb 24, 2023 at 4:59 PM Gaëtan Rivet  wrote:
> > FreeBSD libc pthread API has lock annotations while Linux glibc has
> > none.
> > We could simply disable the check on FreeBSD, but having this check,
> > a few issues got raised in drivers that are built with FreeBSD.
> > For now, I went with a simple #ifdef FreeBSD for pthread mutex related
> > annotations in our code.
> >
>
> Hi David,
>
> This is a great change, thanks for doing it.
> However I am not sure I understand the logic regarding the '#ifdef FREEBSD'.
>
> These annotations provide static hints to clang's thread safety analysis.
> What is the dependency on FreeBSD glibc?

FreeBSD libc added clang annotations, while glibc did not.

FreeBSD 13.1 libc:
int pthread_mutex_lock(pthread_mutex_t * __mutex)
__locks_exclusive(*__mutex);

With:

#if __has_extension(c_thread_safety_attributes)
#define __lock_annotate(x)  __attribute__((x))
#else
#define __lock_annotate(x)
#endif

/* Function acquires an exclusive or shared lock. */
#define __locks_exclusive(...) \
__lock_annotate(exclusive_lock_function(__VA_ARGS__))


glibc:
extern int pthread_mutex_lock (pthread_mutex_t *__mutex)
 __THROWNL __nonnull ((1));



Since glibc does not declare that pthread_mutex_t is lockable, adding
an annotation triggers an error for Linux (taking eal_common_proc.c
patch 14 as an example):

../lib/eal/common/eal_common_proc.c:911:2: error:
'exclusive_locks_required' attribute requires arguments whose type is
annotated with 'capability' attribute; type here is 'pthread_mutex_t'
[-Werror,-Wthread-safety-attributes]
__rte_exclusive_locks_required(pending_requests.lock)
^
../lib/eal/include/rte_lock_annotations.h:23:17: note: expanded from
macro '__rte_exclusive_locks_required'
__attribute__((exclusive_locks_required(__VA_ARGS__)))
   ^

We could wrap this annotation in a new construct (which would require
some build time check wrt pthread API state).
Not sure it is worth the effort though, for now.


-- 
David Marchand



[PATCH v4] lib/net: add MPLS insert and strip functionality

2023-02-25 Thread Tanzeel-inline
From: Tanzeel Ahmed 

This patch is new version of [PATCH] lib/net: added push MPLS header API.
I have also added the MPLS strip functionality to address the question
asked in last patch.

> You should explain why you add this function.
None of the foundational NICs currently supports MPLS insertion and
stripping, this functionality can help users who rely on MPLS in their
network application.

> I'm not sure it should be inline
I did for performance in high-traffic application.

Signed-off-by: Tanzeel Ahmed 

---
v4:
* Removed extra void cast.
* rte_pktmbuf_append/mtod now return void*.
  The memmove result is casted to rte_ether_hdr*.

v3:
* fixed patch check failure issue

v2:
* marked experimental
* coding style fixed
* changed rte_memcpy to memcpy
* mpls header marked as const in parameter
* added MPLS stripping functionality
---
 .mailmap   |  1 +
 lib/net/rte_mpls.h | 97 ++
 2 files changed, 98 insertions(+)

diff --git a/.mailmap b/.mailmap
index a9f4f28..2af4e0d 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1312,6 +1312,7 @@ Takeshi Yoshimura  

 Takuya Asada 
 Tal Avraham 
 Tal Shnaiderman  
+Tanzeel Ahmed 
 Tao Y Yang 
 Tao Zhu 
 Taripin Samuel 
diff --git a/lib/net/rte_mpls.h b/lib/net/rte_mpls.h
index 51523e7..d7e267f 100644
--- a/lib/net/rte_mpls.h
+++ b/lib/net/rte_mpls.h
@@ -13,6 +13,8 @@
 
 #include 
 #include 
+#include 
+#include 
 
 #ifdef __cplusplus
 extern "C" {
@@ -36,6 +38,101 @@ struct rte_mpls_hdr {
uint8_t  ttl;   /**< Time to live. */
 } __rte_packed;
 
+#define RTE_MPLS_HLEN 4 /**< Length of MPLS header. */
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Insert MPLS header into the packet.
+ * If it's first MPLS header to be inserted in the packet,
+ *  - Updates the ether type.
+ *  - Sets the MPLS bottom-of-stack bit to 1.
+ *
+ * @param m
+ *   The pointer to the mbuf.
+ * @param mp
+ *   The pointer to the MPLS header.
+ * @return
+ *   0 on success, -1 on error
+ */
+__rte_experimental
+static inline int
+rte_mpls_push_over_l2(struct rte_mbuf **m, const struct rte_mpls_hdr *mp)
+{
+   void *os, *ns;
+   struct rte_ether_hdr *nh;
+   struct rte_mpls_hdr *mph;
+
+   /* Can't insert header if mbuf is shared */
+   if (!RTE_MBUF_DIRECT(*m) || rte_mbuf_refcnt_read(*m) > 1)
+   return -EINVAL;
+
+   /* Can't insert header if ethernet frame doesn't exist */
+   if (rte_pktmbuf_data_len(*m) < RTE_ETHER_HDR_LEN)
+   return -EINVAL;
+
+   os = rte_pktmbuf_mtod(*m, void *);
+   ns = (void *)rte_pktmbuf_prepend(*m, sizeof(struct rte_mpls_hdr));
+   if (ns == NULL)
+   return -ENOSPC;
+
+   nh = (struct rte_ether_hdr *)memmove(ns, os, RTE_ETHER_HDR_LEN);
+
+   /* Copy the MPLS header after ethernet frame */
+   mph = rte_pktmbuf_mtod_offset(*m, struct rte_mpls_hdr*,
+   sizeof(struct rte_ether_hdr));
+   memcpy(mph, mp, RTE_MPLS_HLEN);
+
+   mph->tag_msb = rte_cpu_to_be_16(mp->tag_msb);
+
+   /* If first MPLS header, update ether type and bottom-of-stack bit */
+   if (nh->ether_type != rte_cpu_to_be_16(RTE_ETHER_TYPE_MPLS)) {
+   nh->ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_MPLS);
+   mph->bs = 1;
+   } else {
+   mph->bs = 0;
+   }
+
+   return 0;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Strips MPLS from the packet. Doesn't update the ether type
+ *
+ * @param m
+ *   The pointer to the mbuf.
+ * @return
+ *   0 on success, -1 on error
+ */
+__rte_experimental
+static inline int
+rte_mpls_strip_over_l2(struct rte_mbuf *m)
+{
+   struct rte_ether_hdr *eh = rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
+   struct rte_mpls_hdr *mph;
+   bool mpls_exist = true;
+
+   if (eh->ether_type != rte_cpu_to_be_16(RTE_ETHER_TYPE_MPLS))
+   return -1;
+
+   /* Stripping all MPLS header */
+   while (mpls_exist) {
+   mph = rte_pktmbuf_mtod_offset(m, struct rte_mpls_hdr*,
+   sizeof(struct rte_ether_hdr));
+   if (mph->bs & 1)
+   mpls_exist = false;
+   memmove(rte_pktmbuf_adj(m, sizeof(struct rte_mpls_hdr)),
+   eh, sizeof(struct rte_ether_hdr));
+   eh = rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
+   }
+
+   return 0;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.8.3.1



[PATCH] net/mlx5: reject flow template API reconfiguration

2023-02-25 Thread Dariusz Sosnowski
Flow Template API allows rte_flow_configure() to be called
more than once, to allow applications to dynamically reconfigure
the amount of resources available for flow table and flow rule creation.
PMDs can reject the change in configuration and
keep the old configuration.

Before this patch, during Flow Template API reconfiguration in mlx5 PMD,
all the allocated resources (HW/FW object pools, flow rules, pattern and
actions templates) were released and reallocated according to
the new configuration. This however leads to undefined behavior,
since all references to templates, flows or indirect actions,
which are held by the user, are now invalidated.

This patch changes the reconfiguration behavior.
Configuration provided to rte_flow_configure() is stored in port's
private data structure. On any subsequent call to rte_flow_configure(),
the provided configuration is compared against the stored configuration.
If both are equal, rte_flow_configure() reports a success and
configuration is unaffected. Otherwise, (-ENOTSUP) is returned.

Signed-off-by: Dariusz Sosnowski 
---
 doc/guides/nics/mlx5.rst|   3 +
 drivers/net/mlx5/mlx5.h |   8 +++
 drivers/net/mlx5/mlx5_flow_hw.c | 104 +++-
 3 files changed, 112 insertions(+), 3 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 6510e74fb9..2fc0399b99 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -160,6 +160,9 @@ Limitations
   - Set ``dv_flow_en`` to 2 in order to enable HW steering.
   - Async queue-based ``rte_flow_async`` APIs supported only.
   - NIC ConnectX-5 and before are not supported.
+  - Reconfiguring flow API engine is not supported.
+Any subsequent call to ``rte_flow_configure()`` with different 
configuration
+than initially provided will be rejected with ``-ENOTSUP`` error code.
   - Partial match with item template is not supported.
   - IPv6 5-tuple matching is not supported.
   - With E-Switch enabled, ports which share the E-Switch domain
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index a766fb408e..7fc75a2535 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1656,6 +1656,13 @@ struct mlx5_hw_ctrl_flow {
struct rte_flow *flow;
 };
 
+/* HW Steering port configuration passed to rte_flow_configure(). */
+struct mlx5_flow_hw_attr {
+   struct rte_flow_port_attr port_attr;
+   uint16_t nb_queue;
+   struct rte_flow_queue_attr *queue_attr;
+};
+
 struct mlx5_flow_hw_ctrl_rx;
 
 struct mlx5_priv {
@@ -1763,6 +1770,7 @@ struct mlx5_priv {
uint32_t nb_queue; /* HW steering queue number. */
struct mlx5_hws_cnt_pool *hws_cpool; /* HW steering's counter pool. */
uint32_t hws_mark_refcnt; /* HWS mark action reference counter. */
+   struct mlx5_flow_hw_attr *hw_attr; /* HW Steering port configuration. */
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
/* Item template list. */
LIST_HEAD(flow_hw_itt, rte_flow_pattern_template) flow_hw_itt;
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index a9c7045a3e..29437d8d6c 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -6792,6 +6792,86 @@ mlx5_flow_hw_cleanup_ctrl_rx_templates(struct 
rte_eth_dev *dev)
}
 }
 
+/**
+ * Copy the provided HWS configuration to a newly allocated buffer.
+ *
+ * @param[in] port_attr
+ *   Port configuration attributes.
+ * @param[in] nb_queue
+ *   Number of queue.
+ * @param[in] queue_attr
+ *   Array that holds attributes for each flow queue.
+ *
+ * @return
+ *   Pointer to copied HWS configuration is returned on success.
+ *   Otherwise, NULL is returned and rte_errno is set.
+ */
+static struct mlx5_flow_hw_attr *
+flow_hw_alloc_copy_config(const struct rte_flow_port_attr *port_attr,
+ const uint16_t nb_queue,
+ const struct rte_flow_queue_attr *queue_attr[],
+ struct rte_flow_error *error)
+{
+   struct mlx5_flow_hw_attr *hw_attr;
+   size_t hw_attr_size;
+   unsigned int i;
+
+   hw_attr_size = sizeof(*hw_attr) + nb_queue * 
sizeof(*hw_attr->queue_attr);
+   hw_attr = mlx5_malloc(MLX5_MEM_ZERO, hw_attr_size, 0, SOCKET_ID_ANY);
+   if (!hw_attr) {
+   rte_flow_error_set(error, ENOMEM, 
RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+  "Not enough memory to store configuration");
+   return NULL;
+   }
+   memcpy(&hw_attr->port_attr, port_attr, sizeof(*port_attr));
+   hw_attr->nb_queue = nb_queue;
+   /* Queue attributes are placed after the mlx5_flow_hw_attr. */
+   hw_attr->queue_attr = (struct rte_flow_queue_attr *)(hw_attr + 1);
+   for (i = 0; i < nb_queue; ++i)
+   memcpy(&hw_attr->queue_attr[i], queue_attr[i], 
sizeof(hw_attr->queue_attr[i]));
+   return hw_attr;
+}
+
+/**
+ * Compa

[PATCH 0/2] net/mlx5: fix representor matching

2023-02-25 Thread Dariusz Sosnowski
This patch series addresses a few issues with representor
matching mode available in mlx5 PMD:

- First patch fixes an issue with internal flow group
  index translation occurring when both extended metadata mode
  and representor matching was enabled.
- Second patch fixes an issue with isolated mode occurring
  when representor matching was disabled.

Dariusz Sosnowski (2):
  net/mlx5: fix egress group translation in HWS
  net/mlx5: fix isolated mode when repr matching is disabled

 doc/guides/nics/mlx5.rst |  3 +++
 drivers/net/mlx5/linux/mlx5_os.c | 16 
 drivers/net/mlx5/mlx5_flow.c |  4 
 drivers/net/mlx5/mlx5_flow_hw.c  | 14 +-
 4 files changed, 32 insertions(+), 5 deletions(-)

-- 
2.25.1



[PATCH 1/2] net/mlx5: fix egress group translation in HWS

2023-02-25 Thread Dariusz Sosnowski
With HW Steering enabled creating egress template tables and
egress flow rules on E-Switch setups is allowed.
To enable it, PMD creates a set of default egress flow rules
responsible for:

- Storing representor ID (vport tag is used) in HW register.
  This is used for traffic source identification.
- Copying software metadata to proper HW register to allow
  preserving metadata across domains.

Structure of these flow rules and whether they are inserted
depend on the device configuration.
There are the following cases:

1. repr_matching=1 and dv_xmeta_en=4
   - An egress flow rule in group 0 is created for each Tx queue;
   - Flow rule matching SQ number - fills unused REG_C_0 bits
 with vport tag, copies REG_A to REG_C_1 and jumps to group 1.
2. repr_matching=1 and dv_xmeta_en=0
   - An egress flow rule in group 0 is created for each Tx queue;
   - Flow rule matching SQ number - fills unused REG_C_0 bits
 with vport tag and jumps to group 1.
3. repr_matching=0 and dv_xmeta_en=4
   - A single egress flow rule in group 0 is created;
   - Flow rule matches all E-Switch manager TX traffic,
 copies REG_A to REG_C and jumps to group 1.
4. repr_matching=0 and dv_xmeta_en=0 - no default flow rules are added.

When default egress flow rules are required, they are inserted in
group 0 and this group is reserved for PMD purposes.
User created template tables must be created in higher groups.
As a result, on template table creation PMD is translating
the provided group (incrementing it in that case).

Before this patch, a condition used to check if translation of egress
flow group is needed was incorrect. It did not allow translation
if both representor matching AND extended metadata mode were enabled.

This patch fixes this condition - translation is allowed if and only if
representor matching OR extended metadata mode is enabled.

Fixes: 483181f7b6dd ("net/mlx5: support device control of representor matching")
Cc: sta...@dpdk.org

Signed-off-by: Dariusz Sosnowski 
Acked-by: Ori Kam 
---
 drivers/net/mlx5/mlx5_flow_hw.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index a9c7045a3e..d3d86fe24d 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -3260,14 +3260,18 @@ flow_hw_translate_group(struct rte_eth_dev *dev,
  "group index not supported");
*table_group = group + 1;
} else if (config->dv_esw_en &&
-  !(config->repr_matching && config->dv_xmeta_en == 
MLX5_XMETA_MODE_META32_HWS) &&
+  (config->repr_matching || config->dv_xmeta_en == 
MLX5_XMETA_MODE_META32_HWS) &&
   cfg->external &&
   flow_attr->egress) {
/*
-* On E-Switch setups, egress group translation is not done if 
and only if
-* representor matching is disabled and legacy metadata mode is 
selected.
-* In all other cases, egree group 0 is reserved for 
representor tagging flows
-* and metadata copy flows.
+* On E-Switch setups, default egress flow rules are inserted 
to allow
+* representor matching and/or preserving metadata across 
steering domains.
+* These flow rules are inserted in group 0 and this group is 
reserved by PMD
+* for these purposes.
+*
+* As a result, if representor matching or extended metadata 
mode is enabled,
+* group provided by the user must be incremented to avoid 
inserting flow rules
+* in group 0.
 */
if (group > MLX5_HW_MAX_EGRESS_GROUP)
return rte_flow_error_set(error, EINVAL,
-- 
2.25.1



[PATCH 2/2] net/mlx5: fix isolated mode when repr matching is disabled

2023-02-25 Thread Dariusz Sosnowski
In HW steering mode, when running on an E-Switch setup,
mlx5 PMD provides an ability to enable or disable
representor matching (through `repr_matching_en` device argument).
If representor matching is enabled, any ingress or egress flow rule,
created on any port representor will match traffic related
to that specific port.
If it is disabled, flow rule created on one of the ports,
will match traffic related to all ports.

As a result, when representor matching is disabled,
PMD cannot correctly create control flow rules for receiving
default traffic according to port configuration.
Since each port representor in the same switch domain,
can have different port configuration and flow rules
do not differentiate between ports, these flow rules cannot be
correctly applied.
In that case, each port works in de facto isolated mode.

This patch makes sure that if representor matching is disabled,
port is forced into isolated mode. Disabling flow isolated is forbidden.

Fixes: 483181f7b6dd ("net/mlx5: support device control of representor matching")
Cc: sta...@dpdk.org

Signed-off-by: Dariusz Sosnowski 
Acked-by: Ori Kam 
---
 doc/guides/nics/mlx5.rst |  3 +++
 drivers/net/mlx5/linux/mlx5_os.c | 16 
 drivers/net/mlx5/mlx5_flow.c |  4 
 3 files changed, 23 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 6510e74fb9..350fb0287e 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -1137,6 +1137,9 @@ for an additional list of options shared with other mlx5 
drivers.
   - 0. If representor matching is disabled, then there will be no implicit
 item added. As a result, ingress flow rules will match traffic
 coming to any port, not only the port on which flow rule is created.
+Because of that, default flow rules for ingress traffic cannot be created
+and port starts in isolated mode by default. Port cannot be switched back
+to non-isolated mode.
 
   - 1. If representor matching is enabled (default setting),
 then each ingress pattern template has an implicit REPRESENTED_PORT
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index a71474c90a..2cce0a7f0f 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1613,6 +1613,22 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
err = EINVAL;
goto error;
}
+   /*
+* If representor matching is disabled, PMD cannot create 
default flow rules
+* to receive traffic for all ports, since implicit source port 
match is not added.
+* Isolated mode is forced.
+*/
+   if (priv->sh->config.dv_esw_en && 
!priv->sh->config.repr_matching) {
+   err = mlx5_flow_isolate(eth_dev, 1, NULL);
+   if (err < 0) {
+   err = -err;
+   goto error;
+   }
+   DRV_LOG(WARNING, "port %u ingress traffic is restricted 
to defined "
+"flow rules (isolated mode) since 
representor "
+"matching is disabled",
+   eth_dev->data->port_id);
+   }
return eth_dev;
 #else
DRV_LOG(ERR, "DV support is missing for HWS.");
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index a6a426caf7..7ee30bdd1f 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -8077,6 +8077,10 @@ mlx5_flow_isolate(struct rte_eth_dev *dev,
   "port must be stopped first");
return -rte_errno;
}
+   if (!enable && !priv->sh->config.repr_matching)
+   return rte_flow_error_set(error, ENOTSUP, 
RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+ "isolated mode cannot be disabled 
when "
+ "representor matching is disabled");
priv->isolated = !!enable;
if (enable)
dev->dev_ops = &mlx5_dev_ops_isolate;
-- 
2.25.1



RE: [PATCH] net/mlx5: reject flow template API reconfiguration

2023-02-25 Thread Ori Kam
Hi,

> -Original Message-
> From: Dariusz Sosnowski 
> Sent: Saturday, 25 February 2023 21:58
> Subject: [PATCH] net/mlx5: reject flow template API reconfiguration
> 
> Flow Template API allows rte_flow_configure() to be called
> more than once, to allow applications to dynamically reconfigure
> the amount of resources available for flow table and flow rule creation.
> PMDs can reject the change in configuration and
> keep the old configuration.
> 
> Before this patch, during Flow Template API reconfiguration in mlx5 PMD,
> all the allocated resources (HW/FW object pools, flow rules, pattern and
> actions templates) were released and reallocated according to
> the new configuration. This however leads to undefined behavior,
> since all references to templates, flows or indirect actions,
> which are held by the user, are now invalidated.
> 
> This patch changes the reconfiguration behavior.
> Configuration provided to rte_flow_configure() is stored in port's
> private data structure. On any subsequent call to rte_flow_configure(),
> the provided configuration is compared against the stored configuration.
> If both are equal, rte_flow_configure() reports a success and
> configuration is unaffected. Otherwise, (-ENOTSUP) is returned.
> 
> Signed-off-by: Dariusz Sosnowski 
> ---
Acked-by: Ori Kam