Re: [PATCH v2 00/20] Enable lock annotations on most libraries and drivers
Salut Gaëtan, On Fri, Feb 24, 2023 at 4:59 PM Gaëtan Rivet wrote: > > FreeBSD libc pthread API has lock annotations while Linux glibc has > > none. > > We could simply disable the check on FreeBSD, but having this check, > > a few issues got raised in drivers that are built with FreeBSD. > > For now, I went with a simple #ifdef FreeBSD for pthread mutex related > > annotations in our code. > > > > Hi David, > > This is a great change, thanks for doing it. > However I am not sure I understand the logic regarding the '#ifdef FREEBSD'. > > These annotations provide static hints to clang's thread safety analysis. > What is the dependency on FreeBSD glibc? FreeBSD libc added clang annotations, while glibc did not. FreeBSD 13.1 libc: int pthread_mutex_lock(pthread_mutex_t * __mutex) __locks_exclusive(*__mutex); With: #if __has_extension(c_thread_safety_attributes) #define __lock_annotate(x) __attribute__((x)) #else #define __lock_annotate(x) #endif /* Function acquires an exclusive or shared lock. */ #define __locks_exclusive(...) \ __lock_annotate(exclusive_lock_function(__VA_ARGS__)) glibc: extern int pthread_mutex_lock (pthread_mutex_t *__mutex) __THROWNL __nonnull ((1)); Since glibc does not declare that pthread_mutex_t is lockable, adding an annotation triggers an error for Linux (taking eal_common_proc.c patch 14 as an example): ../lib/eal/common/eal_common_proc.c:911:2: error: 'exclusive_locks_required' attribute requires arguments whose type is annotated with 'capability' attribute; type here is 'pthread_mutex_t' [-Werror,-Wthread-safety-attributes] __rte_exclusive_locks_required(pending_requests.lock) ^ ../lib/eal/include/rte_lock_annotations.h:23:17: note: expanded from macro '__rte_exclusive_locks_required' __attribute__((exclusive_locks_required(__VA_ARGS__))) ^ We could wrap this annotation in a new construct (which would require some build time check wrt pthread API state). Not sure it is worth the effort though, for now. -- David Marchand
[PATCH v4] lib/net: add MPLS insert and strip functionality
From: Tanzeel Ahmed This patch is new version of [PATCH] lib/net: added push MPLS header API. I have also added the MPLS strip functionality to address the question asked in last patch. > You should explain why you add this function. None of the foundational NICs currently supports MPLS insertion and stripping, this functionality can help users who rely on MPLS in their network application. > I'm not sure it should be inline I did for performance in high-traffic application. Signed-off-by: Tanzeel Ahmed --- v4: * Removed extra void cast. * rte_pktmbuf_append/mtod now return void*. The memmove result is casted to rte_ether_hdr*. v3: * fixed patch check failure issue v2: * marked experimental * coding style fixed * changed rte_memcpy to memcpy * mpls header marked as const in parameter * added MPLS stripping functionality --- .mailmap | 1 + lib/net/rte_mpls.h | 97 ++ 2 files changed, 98 insertions(+) diff --git a/.mailmap b/.mailmap index a9f4f28..2af4e0d 100644 --- a/.mailmap +++ b/.mailmap @@ -1312,6 +1312,7 @@ Takeshi Yoshimura Takuya Asada Tal Avraham Tal Shnaiderman +Tanzeel Ahmed Tao Y Yang Tao Zhu Taripin Samuel diff --git a/lib/net/rte_mpls.h b/lib/net/rte_mpls.h index 51523e7..d7e267f 100644 --- a/lib/net/rte_mpls.h +++ b/lib/net/rte_mpls.h @@ -13,6 +13,8 @@ #include #include +#include +#include #ifdef __cplusplus extern "C" { @@ -36,6 +38,101 @@ struct rte_mpls_hdr { uint8_t ttl; /**< Time to live. */ } __rte_packed; +#define RTE_MPLS_HLEN 4 /**< Length of MPLS header. */ + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * Insert MPLS header into the packet. + * If it's first MPLS header to be inserted in the packet, + * - Updates the ether type. + * - Sets the MPLS bottom-of-stack bit to 1. + * + * @param m + * The pointer to the mbuf. + * @param mp + * The pointer to the MPLS header. + * @return + * 0 on success, -1 on error + */ +__rte_experimental +static inline int +rte_mpls_push_over_l2(struct rte_mbuf **m, const struct rte_mpls_hdr *mp) +{ + void *os, *ns; + struct rte_ether_hdr *nh; + struct rte_mpls_hdr *mph; + + /* Can't insert header if mbuf is shared */ + if (!RTE_MBUF_DIRECT(*m) || rte_mbuf_refcnt_read(*m) > 1) + return -EINVAL; + + /* Can't insert header if ethernet frame doesn't exist */ + if (rte_pktmbuf_data_len(*m) < RTE_ETHER_HDR_LEN) + return -EINVAL; + + os = rte_pktmbuf_mtod(*m, void *); + ns = (void *)rte_pktmbuf_prepend(*m, sizeof(struct rte_mpls_hdr)); + if (ns == NULL) + return -ENOSPC; + + nh = (struct rte_ether_hdr *)memmove(ns, os, RTE_ETHER_HDR_LEN); + + /* Copy the MPLS header after ethernet frame */ + mph = rte_pktmbuf_mtod_offset(*m, struct rte_mpls_hdr*, + sizeof(struct rte_ether_hdr)); + memcpy(mph, mp, RTE_MPLS_HLEN); + + mph->tag_msb = rte_cpu_to_be_16(mp->tag_msb); + + /* If first MPLS header, update ether type and bottom-of-stack bit */ + if (nh->ether_type != rte_cpu_to_be_16(RTE_ETHER_TYPE_MPLS)) { + nh->ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_MPLS); + mph->bs = 1; + } else { + mph->bs = 0; + } + + return 0; +} + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * Strips MPLS from the packet. Doesn't update the ether type + * + * @param m + * The pointer to the mbuf. + * @return + * 0 on success, -1 on error + */ +__rte_experimental +static inline int +rte_mpls_strip_over_l2(struct rte_mbuf *m) +{ + struct rte_ether_hdr *eh = rte_pktmbuf_mtod(m, struct rte_ether_hdr *); + struct rte_mpls_hdr *mph; + bool mpls_exist = true; + + if (eh->ether_type != rte_cpu_to_be_16(RTE_ETHER_TYPE_MPLS)) + return -1; + + /* Stripping all MPLS header */ + while (mpls_exist) { + mph = rte_pktmbuf_mtod_offset(m, struct rte_mpls_hdr*, + sizeof(struct rte_ether_hdr)); + if (mph->bs & 1) + mpls_exist = false; + memmove(rte_pktmbuf_adj(m, sizeof(struct rte_mpls_hdr)), + eh, sizeof(struct rte_ether_hdr)); + eh = rte_pktmbuf_mtod(m, struct rte_ether_hdr *); + } + + return 0; +} + #ifdef __cplusplus } #endif -- 1.8.3.1
[PATCH] net/mlx5: reject flow template API reconfiguration
Flow Template API allows rte_flow_configure() to be called more than once, to allow applications to dynamically reconfigure the amount of resources available for flow table and flow rule creation. PMDs can reject the change in configuration and keep the old configuration. Before this patch, during Flow Template API reconfiguration in mlx5 PMD, all the allocated resources (HW/FW object pools, flow rules, pattern and actions templates) were released and reallocated according to the new configuration. This however leads to undefined behavior, since all references to templates, flows or indirect actions, which are held by the user, are now invalidated. This patch changes the reconfiguration behavior. Configuration provided to rte_flow_configure() is stored in port's private data structure. On any subsequent call to rte_flow_configure(), the provided configuration is compared against the stored configuration. If both are equal, rte_flow_configure() reports a success and configuration is unaffected. Otherwise, (-ENOTSUP) is returned. Signed-off-by: Dariusz Sosnowski --- doc/guides/nics/mlx5.rst| 3 + drivers/net/mlx5/mlx5.h | 8 +++ drivers/net/mlx5/mlx5_flow_hw.c | 104 +++- 3 files changed, 112 insertions(+), 3 deletions(-) diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index 6510e74fb9..2fc0399b99 100644 --- a/doc/guides/nics/mlx5.rst +++ b/doc/guides/nics/mlx5.rst @@ -160,6 +160,9 @@ Limitations - Set ``dv_flow_en`` to 2 in order to enable HW steering. - Async queue-based ``rte_flow_async`` APIs supported only. - NIC ConnectX-5 and before are not supported. + - Reconfiguring flow API engine is not supported. +Any subsequent call to ``rte_flow_configure()`` with different configuration +than initially provided will be rejected with ``-ENOTSUP`` error code. - Partial match with item template is not supported. - IPv6 5-tuple matching is not supported. - With E-Switch enabled, ports which share the E-Switch domain diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index a766fb408e..7fc75a2535 100644 --- a/drivers/net/mlx5/mlx5.h +++ b/drivers/net/mlx5/mlx5.h @@ -1656,6 +1656,13 @@ struct mlx5_hw_ctrl_flow { struct rte_flow *flow; }; +/* HW Steering port configuration passed to rte_flow_configure(). */ +struct mlx5_flow_hw_attr { + struct rte_flow_port_attr port_attr; + uint16_t nb_queue; + struct rte_flow_queue_attr *queue_attr; +}; + struct mlx5_flow_hw_ctrl_rx; struct mlx5_priv { @@ -1763,6 +1770,7 @@ struct mlx5_priv { uint32_t nb_queue; /* HW steering queue number. */ struct mlx5_hws_cnt_pool *hws_cpool; /* HW steering's counter pool. */ uint32_t hws_mark_refcnt; /* HWS mark action reference counter. */ + struct mlx5_flow_hw_attr *hw_attr; /* HW Steering port configuration. */ #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H) /* Item template list. */ LIST_HEAD(flow_hw_itt, rte_flow_pattern_template) flow_hw_itt; diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c index a9c7045a3e..29437d8d6c 100644 --- a/drivers/net/mlx5/mlx5_flow_hw.c +++ b/drivers/net/mlx5/mlx5_flow_hw.c @@ -6792,6 +6792,86 @@ mlx5_flow_hw_cleanup_ctrl_rx_templates(struct rte_eth_dev *dev) } } +/** + * Copy the provided HWS configuration to a newly allocated buffer. + * + * @param[in] port_attr + * Port configuration attributes. + * @param[in] nb_queue + * Number of queue. + * @param[in] queue_attr + * Array that holds attributes for each flow queue. + * + * @return + * Pointer to copied HWS configuration is returned on success. + * Otherwise, NULL is returned and rte_errno is set. + */ +static struct mlx5_flow_hw_attr * +flow_hw_alloc_copy_config(const struct rte_flow_port_attr *port_attr, + const uint16_t nb_queue, + const struct rte_flow_queue_attr *queue_attr[], + struct rte_flow_error *error) +{ + struct mlx5_flow_hw_attr *hw_attr; + size_t hw_attr_size; + unsigned int i; + + hw_attr_size = sizeof(*hw_attr) + nb_queue * sizeof(*hw_attr->queue_attr); + hw_attr = mlx5_malloc(MLX5_MEM_ZERO, hw_attr_size, 0, SOCKET_ID_ANY); + if (!hw_attr) { + rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, + "Not enough memory to store configuration"); + return NULL; + } + memcpy(&hw_attr->port_attr, port_attr, sizeof(*port_attr)); + hw_attr->nb_queue = nb_queue; + /* Queue attributes are placed after the mlx5_flow_hw_attr. */ + hw_attr->queue_attr = (struct rte_flow_queue_attr *)(hw_attr + 1); + for (i = 0; i < nb_queue; ++i) + memcpy(&hw_attr->queue_attr[i], queue_attr[i], sizeof(hw_attr->queue_attr[i])); + return hw_attr; +} + +/** + * Compa
[PATCH 0/2] net/mlx5: fix representor matching
This patch series addresses a few issues with representor matching mode available in mlx5 PMD: - First patch fixes an issue with internal flow group index translation occurring when both extended metadata mode and representor matching was enabled. - Second patch fixes an issue with isolated mode occurring when representor matching was disabled. Dariusz Sosnowski (2): net/mlx5: fix egress group translation in HWS net/mlx5: fix isolated mode when repr matching is disabled doc/guides/nics/mlx5.rst | 3 +++ drivers/net/mlx5/linux/mlx5_os.c | 16 drivers/net/mlx5/mlx5_flow.c | 4 drivers/net/mlx5/mlx5_flow_hw.c | 14 +- 4 files changed, 32 insertions(+), 5 deletions(-) -- 2.25.1
[PATCH 1/2] net/mlx5: fix egress group translation in HWS
With HW Steering enabled creating egress template tables and egress flow rules on E-Switch setups is allowed. To enable it, PMD creates a set of default egress flow rules responsible for: - Storing representor ID (vport tag is used) in HW register. This is used for traffic source identification. - Copying software metadata to proper HW register to allow preserving metadata across domains. Structure of these flow rules and whether they are inserted depend on the device configuration. There are the following cases: 1. repr_matching=1 and dv_xmeta_en=4 - An egress flow rule in group 0 is created for each Tx queue; - Flow rule matching SQ number - fills unused REG_C_0 bits with vport tag, copies REG_A to REG_C_1 and jumps to group 1. 2. repr_matching=1 and dv_xmeta_en=0 - An egress flow rule in group 0 is created for each Tx queue; - Flow rule matching SQ number - fills unused REG_C_0 bits with vport tag and jumps to group 1. 3. repr_matching=0 and dv_xmeta_en=4 - A single egress flow rule in group 0 is created; - Flow rule matches all E-Switch manager TX traffic, copies REG_A to REG_C and jumps to group 1. 4. repr_matching=0 and dv_xmeta_en=0 - no default flow rules are added. When default egress flow rules are required, they are inserted in group 0 and this group is reserved for PMD purposes. User created template tables must be created in higher groups. As a result, on template table creation PMD is translating the provided group (incrementing it in that case). Before this patch, a condition used to check if translation of egress flow group is needed was incorrect. It did not allow translation if both representor matching AND extended metadata mode were enabled. This patch fixes this condition - translation is allowed if and only if representor matching OR extended metadata mode is enabled. Fixes: 483181f7b6dd ("net/mlx5: support device control of representor matching") Cc: sta...@dpdk.org Signed-off-by: Dariusz Sosnowski Acked-by: Ori Kam --- drivers/net/mlx5/mlx5_flow_hw.c | 14 +- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c index a9c7045a3e..d3d86fe24d 100644 --- a/drivers/net/mlx5/mlx5_flow_hw.c +++ b/drivers/net/mlx5/mlx5_flow_hw.c @@ -3260,14 +3260,18 @@ flow_hw_translate_group(struct rte_eth_dev *dev, "group index not supported"); *table_group = group + 1; } else if (config->dv_esw_en && - !(config->repr_matching && config->dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) && + (config->repr_matching || config->dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) && cfg->external && flow_attr->egress) { /* -* On E-Switch setups, egress group translation is not done if and only if -* representor matching is disabled and legacy metadata mode is selected. -* In all other cases, egree group 0 is reserved for representor tagging flows -* and metadata copy flows. +* On E-Switch setups, default egress flow rules are inserted to allow +* representor matching and/or preserving metadata across steering domains. +* These flow rules are inserted in group 0 and this group is reserved by PMD +* for these purposes. +* +* As a result, if representor matching or extended metadata mode is enabled, +* group provided by the user must be incremented to avoid inserting flow rules +* in group 0. */ if (group > MLX5_HW_MAX_EGRESS_GROUP) return rte_flow_error_set(error, EINVAL, -- 2.25.1
[PATCH 2/2] net/mlx5: fix isolated mode when repr matching is disabled
In HW steering mode, when running on an E-Switch setup, mlx5 PMD provides an ability to enable or disable representor matching (through `repr_matching_en` device argument). If representor matching is enabled, any ingress or egress flow rule, created on any port representor will match traffic related to that specific port. If it is disabled, flow rule created on one of the ports, will match traffic related to all ports. As a result, when representor matching is disabled, PMD cannot correctly create control flow rules for receiving default traffic according to port configuration. Since each port representor in the same switch domain, can have different port configuration and flow rules do not differentiate between ports, these flow rules cannot be correctly applied. In that case, each port works in de facto isolated mode. This patch makes sure that if representor matching is disabled, port is forced into isolated mode. Disabling flow isolated is forbidden. Fixes: 483181f7b6dd ("net/mlx5: support device control of representor matching") Cc: sta...@dpdk.org Signed-off-by: Dariusz Sosnowski Acked-by: Ori Kam --- doc/guides/nics/mlx5.rst | 3 +++ drivers/net/mlx5/linux/mlx5_os.c | 16 drivers/net/mlx5/mlx5_flow.c | 4 3 files changed, 23 insertions(+) diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index 6510e74fb9..350fb0287e 100644 --- a/doc/guides/nics/mlx5.rst +++ b/doc/guides/nics/mlx5.rst @@ -1137,6 +1137,9 @@ for an additional list of options shared with other mlx5 drivers. - 0. If representor matching is disabled, then there will be no implicit item added. As a result, ingress flow rules will match traffic coming to any port, not only the port on which flow rule is created. +Because of that, default flow rules for ingress traffic cannot be created +and port starts in isolated mode by default. Port cannot be switched back +to non-isolated mode. - 1. If representor matching is enabled (default setting), then each ingress pattern template has an implicit REPRESENTED_PORT diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c index a71474c90a..2cce0a7f0f 100644 --- a/drivers/net/mlx5/linux/mlx5_os.c +++ b/drivers/net/mlx5/linux/mlx5_os.c @@ -1613,6 +1613,22 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev, err = EINVAL; goto error; } + /* +* If representor matching is disabled, PMD cannot create default flow rules +* to receive traffic for all ports, since implicit source port match is not added. +* Isolated mode is forced. +*/ + if (priv->sh->config.dv_esw_en && !priv->sh->config.repr_matching) { + err = mlx5_flow_isolate(eth_dev, 1, NULL); + if (err < 0) { + err = -err; + goto error; + } + DRV_LOG(WARNING, "port %u ingress traffic is restricted to defined " +"flow rules (isolated mode) since representor " +"matching is disabled", + eth_dev->data->port_id); + } return eth_dev; #else DRV_LOG(ERR, "DV support is missing for HWS."); diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c index a6a426caf7..7ee30bdd1f 100644 --- a/drivers/net/mlx5/mlx5_flow.c +++ b/drivers/net/mlx5/mlx5_flow.c @@ -8077,6 +8077,10 @@ mlx5_flow_isolate(struct rte_eth_dev *dev, "port must be stopped first"); return -rte_errno; } + if (!enable && !priv->sh->config.repr_matching) + return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, + "isolated mode cannot be disabled when " + "representor matching is disabled"); priv->isolated = !!enable; if (enable) dev->dev_ops = &mlx5_dev_ops_isolate; -- 2.25.1
RE: [PATCH] net/mlx5: reject flow template API reconfiguration
Hi, > -Original Message- > From: Dariusz Sosnowski > Sent: Saturday, 25 February 2023 21:58 > Subject: [PATCH] net/mlx5: reject flow template API reconfiguration > > Flow Template API allows rte_flow_configure() to be called > more than once, to allow applications to dynamically reconfigure > the amount of resources available for flow table and flow rule creation. > PMDs can reject the change in configuration and > keep the old configuration. > > Before this patch, during Flow Template API reconfiguration in mlx5 PMD, > all the allocated resources (HW/FW object pools, flow rules, pattern and > actions templates) were released and reallocated according to > the new configuration. This however leads to undefined behavior, > since all references to templates, flows or indirect actions, > which are held by the user, are now invalidated. > > This patch changes the reconfiguration behavior. > Configuration provided to rte_flow_configure() is stored in port's > private data structure. On any subsequent call to rte_flow_configure(), > the provided configuration is compared against the stored configuration. > If both are equal, rte_flow_configure() reports a success and > configuration is unaffected. Otherwise, (-ENOTSUP) is returned. > > Signed-off-by: Dariusz Sosnowski > --- Acked-by: Ori Kam