RE: [PATCH v5 0/2] net/mlx5: add random item support
Hi,

> -----Original Message-----
> From: Michael Baum
> Sent: Thursday, January 25, 2024 2:37 PM
> To: dev@dpdk.org
> Cc: Matan Azrad ; Dariusz Sosnowski ; Raslan Darawsheh ; Slava Ovsiienko ; Ori Kam ; Suanming Mou
> Subject: [PATCH v5 0/2] net/mlx5: add random item support
>
> Add support for matching random value using the "rte_flow_item_random"
> structure.
>
> v2:
> - Rebase.
> - Move release notes to the new release file.
>
> v3:
> - Fix typos in commit message and release notes.
> - Update limitations.
>
> v4:
> - Fix using same value for both "MLX5_FLOW_ITEM_NSH" and
>   "MLX5_FLOW_ITEM_RANDOM".
> - Add "Acked-by" from v3.
>
> v5:
> - Rebase.
> - Remove "Depends-on" label.
>
> Erez Shitrit (1):
>   net/mlx5/hws: add support for random number match
>
> Michael Baum (1):
>   net/mlx5: add random item support
>
>  doc/guides/nics/features/mlx5.ini      |  1 +
>  doc/guides/nics/mlx5.rst               |  9 +++
>  doc/guides/rel_notes/release_24_03.rst |  1 +
>  drivers/net/mlx5/hws/mlx5dr_definer.c  | 33 ++
>  drivers/net/mlx5/hws/mlx5dr_definer.h  |  8 ++-
>  drivers/net/mlx5/mlx5_flow.h           |  3 +++
>  drivers/net/mlx5/mlx5_flow_dv.c        |  5
>  drivers/net/mlx5/mlx5_flow_hw.c        |  5
>  8 files changed, 64 insertions(+), 1 deletion(-)
>
> --
> 2.25.1

Series applied to next-net-mlx,

Kindest regards
Raslan Darawsheh
RE: [PATCH] RFC: use C11 alignas instead of GCC attribute aligned
> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > Sent: Saturday, 27 January 2024 20.15 > > On 2024-01-26 11:18, Morten Brørup wrote: > >> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > >> Sent: Friday, 26 January 2024 11.05 > >> > >> On 2024-01-25 23:53, Morten Brørup wrote: > From: Tyler Retzlaff [mailto:roret...@linux.microsoft.com] > Sent: Thursday, 25 January 2024 19.37 > > ping. > > Please review this thread if you have time, the main point of > discussion > I would like to receive consensus on the following questions. > > 1. Should we continue to expand common alignments behind an > >> __rte_macro > > i.e. what do we prefer to appear in code > > alignas(RTE_CACHE_LINE_MIN_SIZE) > > -- or -- > > __rte_cache_aligned > > One of the benefits of dropping the macro is it provides a clear > >> visual > indicator that it is not placed in the same location or get > applied > to types as is done with __attribute__((__aligned__(n))). > >>> > >>> We don't want our own proprietary variant of something that already > >> exists in the C standard. Now that we have moved to C11, the __rte > >> alignment macros should be considered obsolete. > >> > >> Making so something cache-line aligned is not in C11. > > > > We are talking about the __rte_aligned() macro, not the cache > alignment macro. > > > > OK, in that case, what is the relevance of question 1 above? With this in mind, try re-reading Tyler's clarifications in this tread. Briefly: alignas() can be attached to variables and structure fields, but not to types (like __rte_aligned()), so to align a structure: struct foo { int alignas(64) bar; /* alignas(64) must be here */ int baz; }; /* __rte_aligned(64) was here, but alignas(64) cannot be here. */ So the question is: Do we want to eliminate the __rte_aligned() macro - which relies on compiler attributes - and migrate to using the C11 standard alignas()? 
I think yes; after updating to C11, the workaround for pre-C11 not offering alignment is obsolete, and its removal should be on the roadmap. Continuously cleaning up old cruft in DPDK is important, especially for reducing the learning curve for newcomers to the project. For backwards compatibility, we can still keep the obsolete macros, but they should be removed from the project itself. > > >> > >> __rte_cache_aligned is shorter, provides a tiny bit of abstraction, > and > >> is already an established DPDK standard. So just keep the macro. If > it > >> would change, I would argue for it to be changed to > rte_cache_aligned > >> (i.e., just moving it out of __ namespace, and maybe making it > >> all-uppercase). > >> > >> Non-trivial C programs wrap things all the time, standard or not. > It's > >> not something to be overly concerned about, imo. > > > > Using the cache alignment macro was obviously a bad example for > discussing the __rte_aligned() macro. > > > > FYI, Tyler later agreed to introducing the RTE_CACHE_ALIGNAS you had > proposed in an earlier correspondence. > > > >> > >>> > >>> Note: I don't mind convenience macros for common use cases, so we > >> could also introduce the macro suggested by Mattias [1]: > >>> > >>> #define RTE_CACHE_ALIGNAS alignas(RTE_CACHE_LINE_SIZE) > >>> > >>> [1]: https://inbox.dpdk.org/dev/dc3f3131-38e6-4219-861e- > >> b31ec10c0...@lysator.liu.se/ > >>> > > 2. where should we place alignas(n) or __rte_macro (if we use a > >> macro) > > Should it be on the same line as the variable or field or on the > preceeding line? > > /* same line example struct */ > struct T { > /* alignas(64) applies to field0 *not* struct T type > >> declaration > */ > alignas(64) void *field0; > void *field1; > > ... other fields ... > > alignas(64) uint64_t field5; > uint32_t field6; > > ... more fields ... > > }; > > /* same line example array */ > alignas(64) static const uint32_t array[4] = { ... 
}; > > -- or -- > > /* preceeding line example struct */ > struct T { > /* alignas(64) applies to field0 *not* struct T type > >> declaration > */ > alignas(64) > void *field0; > void *field1; > > ... other fields ... > > alignas(64) > uint64_t field5; > uint32_t field6; > > ... more fields ... > > }; > > /* preceeding line example array */ > alignas(64) > static const uint32_t array[4] = { ... }; > > >>> > >>> Searching the net for what other projects do, I came across this > >> required placement [2]: > >>> > >>> uint64_t alignas(64) field5; > >>> > >>> [2]
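The structural difference under discussion — C11 `alignas()` attaches to an object or member declaration, while the GCC attribute behind `__rte_aligned()` can annotate the type itself — can be demonstrated standalone. A minimal sketch (hypothetical struct names; GCC/Clang assumed for the attribute form):

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* C11: alignas() goes on a member (or variable), never on the struct
 * type.  Aligning the first member to 64 raises the alignment of the
 * whole struct as a side effect. */
struct stats_alignas {
	alignas(64) unsigned long rx_pkts;
	unsigned long tx_pkts;
};

/* GCC/Clang: the attribute annotates the type declaration directly,
 * which is what __rte_aligned(64) expands to. */
struct stats_attr {
	unsigned long rx_pkts;
	unsigned long tx_pkts;
} __attribute__((__aligned__(64)));

/* Both forms yield a 64-byte aligned, 64-byte padded type. */
static_assert(alignof(struct stats_alignas) == 64, "alignas form");
static_assert(alignof(struct stats_attr) == 64, "attribute form");
static_assert(sizeof(struct stats_alignas) == sizeof(struct stats_attr),
	      "same layout");
```

The observable result is identical; the debate is purely about where the annotation may be written and how readable each placement is.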
[PATCH 0/4] introduce encap hash calculation
This patch set adds the support for encap hash calculation. It is based on RFC: https://patchwork.dpdk.org/project/dpdk/patch/20231210083100.7893-1-or...@nvidia.com/ Hamdan Igbaria (1): net/mlx5/hws: introduce encap entropy hash calculation API Ori Kam (3): ethdev: introduce encap hash calculation net/mlx5: add calc encap hash support app/testpmd: add encap hash calculation app/test-pmd/cmdline_flow.c | 57 +-- app/test-pmd/config.c | 30 app/test-pmd/testpmd.h | 3 + doc/guides/prog_guide/rte_flow.rst | 22 ++ doc/guides/rel_notes/release_24_03.rst | 4 ++ doc/guides/testpmd_app_ug/testpmd_funcs.rst | 21 +- drivers/common/mlx5/mlx5_prm.h | 8 ++- drivers/net/mlx5/hws/mlx5dr.h | 38 ++ drivers/net/mlx5/hws/mlx5dr_cmd.c | 23 ++ drivers/net/mlx5/hws/mlx5dr_cmd.h | 4 ++ drivers/net/mlx5/hws/mlx5dr_crc32.c | 78 + drivers/net/mlx5/hws/mlx5dr_crc32.h | 5 ++ drivers/net/mlx5/mlx5_flow.c| 29 drivers/net/mlx5/mlx5_flow.h| 8 +++ drivers/net/mlx5/mlx5_flow_hw.c | 66 + lib/ethdev/rte_flow.c | 25 +++ lib/ethdev/rte_flow.h | 50 + lib/ethdev/rte_flow_driver.h| 5 ++ lib/ethdev/version.map | 1 + 19 files changed, 470 insertions(+), 7 deletions(-) -- 2.34.1
[PATCH 1/4] ethdev: introduce encap hash calculation
During the encapsulation of a packet, it is expected to calculate the hash value which is based on the original packet (the outer values, which will become the inner values). The tunnel protocol defines which tunnel field should hold this hash, but it doesn't define the hash calculation algorithm. An application that uses flow offloads gets the first few packets and then decides to offload the flow. As a result, there are two different paths that a packet from a given flow may take. SW for the first few packets or HW for the rest. When the packet goes through the SW, the SW encapsulates the packet and must use the same hash calculation as the HW will do for the rest of the packets in this flow. This patch gives the SW a way to query the hash value for a given packet as if the packet was passed through the HW. Signed-off-by: Ori Kam --- doc/guides/prog_guide/rte_flow.rst | 22 doc/guides/rel_notes/release_24_03.rst | 4 +++ lib/ethdev/rte_flow.c | 25 + lib/ethdev/rte_flow.h | 50 ++ lib/ethdev/rte_flow_driver.h | 5 +++ lib/ethdev/version.map | 1 + 6 files changed, 107 insertions(+) diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst index 900fdaefb6..0435dda3c7 100644 --- a/doc/guides/prog_guide/rte_flow.rst +++ b/doc/guides/prog_guide/rte_flow.rst @@ -4211,6 +4211,28 @@ as it would be calculated in the HW. uint8_t pattern_template_index, uint32_t *hash, struct rte_flow_error *error); +Calculate encap hash + + +Calculating hash of a packet in SW as it would be calculated in HW for the encap action + +When the HW execute an encapsulation action, it may calculate an hash value which is based +on the original packet. This hash is stored depending on the encapsulation protocol, in one +of the outer fields. + +This function allows the application to calculate the hash for a given packet as if the +encapsulation was done in HW. + +.. 
code-block:: c + + int + rte_flow_calc_encap_hash(uint16_t port_id, +const struct rte_flow_item pattern[], + enum rte_flow_encap_hash_field dest_field, +uint8_t hash_len, +uint8_t *hash, + struct rte_flow_error *error); + .. _flow_isolated_mode: Flow isolated mode diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst index 5e545da867..80af380172 100644 --- a/doc/guides/rel_notes/release_24_03.rst +++ b/doc/guides/rel_notes/release_24_03.rst @@ -75,6 +75,10 @@ New Features * Added support for VXLAN-GPE matching in HW Steering flow engine (``dv_flow_en`` = 2). +* **Added ability to calculate the encap hash as done by HW.** + + * Added function that calculates the encap hash, as the HW calculates it: +``rte_flow_calc_encap_hash()`` Removed Items - diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c index 3f58d792f9..7fce754be1 100644 --- a/lib/ethdev/rte_flow.c +++ b/lib/ethdev/rte_flow.c @@ -2482,3 +2482,28 @@ rte_flow_calc_table_hash(uint16_t port_id, const struct rte_flow_template_table hash, error); return flow_err(port_id, ret, error); } + +int +rte_flow_calc_encap_hash(uint16_t port_id, const struct rte_flow_item pattern[], +enum rte_flow_encap_hash_field dest_field, uint8_t hash_len, +uint8_t *hash, struct rte_flow_error *error) +{ + int ret; + struct rte_eth_dev *dev; + const struct rte_flow_ops *ops; + + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV); + ops = rte_flow_ops_get(port_id, error); + if (!ops || !ops->flow_calc_encap_hash) + return rte_flow_error_set(error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, + "calc encap hash is not supported"); + if ((dest_field == RTE_FLOW_ENCAP_HASH_FIELD_SRC_PORT && hash_len != 2) || + (dest_field == RTE_FLOW_ENCAP_HASH_FIELD_NVGRE_FLOW_ID && hash_len != 1)) + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, + "hash len doesn't match the requested field len"); + dev = &rte_eth_devices[port_id]; + ret = 
ops->flow_calc_encap_hash(dev, pattern, dest_field, hash, error); + return flow_err(port_id, ret, error); +} diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index 1267c146e5..ffbde58245 100644 --- a/lib/ethdev/rte_flow.h +++ b/lib/ethdev/rte_flow.h @@ -6783,6 +6783,56 @@ rte_flow_calc_table_hash(uint16_t port_id, const struct rte
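The `hash_len` check added to `rte_flow_calc_encap_hash()` above encodes a simple contract: a tunnel source port is 2 bytes and an NVGRE flow-id is 1 byte. A standalone sketch of that contract, using local stand-ins for the patch's enum (not the real `rte_flow.h` definitions):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local stand-ins for the definitions this patch adds to rte_flow.h. */
enum rte_flow_encap_hash_field {
	RTE_FLOW_ENCAP_HASH_FIELD_SRC_PORT,	/* e.g. VXLAN UDP source port */
	RTE_FLOW_ENCAP_HASH_FIELD_NVGRE_FLOW_ID	/* NVGRE flow-id byte */
};

/* Mirror of the sanity check rte_flow_calc_encap_hash() performs before
 * dispatching to the PMD: source port needs 2 bytes, flow-id needs 1. */
static int
encap_hash_len_ok(enum rte_flow_encap_hash_field field, uint8_t hash_len)
{
	if (field == RTE_FLOW_ENCAP_HASH_FIELD_SRC_PORT && hash_len != 2)
		return -EINVAL;
	if (field == RTE_FLOW_ENCAP_HASH_FIELD_NVGRE_FLOW_ID && hash_len != 1)
		return -EINVAL;
	return 0;
}
```

An application would size its result buffer from the same rule before calling the API, so the EINVAL path is never hit in practice.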
[PATCH 2/4] net/mlx5/hws: introduce encap entropy hash calculation API
From: Hamdan Igbaria Add new function for encap entropy hash calculation, the function will check the device capability for the entropy hash type used by the device, and will calculate the entropy hash value of the user passed fields according this type. Signed-off-by: Hamdan Igbaria --- drivers/common/mlx5/mlx5_prm.h | 8 ++- drivers/net/mlx5/hws/mlx5dr.h | 38 ++ drivers/net/mlx5/hws/mlx5dr_cmd.c | 23 + drivers/net/mlx5/hws/mlx5dr_cmd.h | 4 ++ drivers/net/mlx5/hws/mlx5dr_crc32.c | 78 + drivers/net/mlx5/hws/mlx5dr_crc32.h | 5 ++ 6 files changed, 154 insertions(+), 2 deletions(-) diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h index 69404b5ed8..82923d5f3f 100644 --- a/drivers/common/mlx5/mlx5_prm.h +++ b/drivers/common/mlx5/mlx5_prm.h @@ -2130,7 +2130,10 @@ struct mlx5_ifc_flow_table_prop_layout_bits { struct mlx5_ifc_roce_caps_bits { u8 reserved_0[0x1e]; u8 qp_ts_format[0x2]; - u8 reserved_at_20[0x7e0]; + u8 reserved_at_20[0xa0]; + u8 r_roce_max_src_udp_port[0x10]; + u8 r_roce_min_src_udp_port[0x10]; + u8 reserved_at_e0[0x720]; }; struct mlx5_ifc_ft_fields_support_bits { @@ -2368,7 +2371,8 @@ struct mlx5_ifc_cmd_hca_cap_2_bits { u8 format_select_dw_gtpu_first_ext_dw_0[0x8]; u8 generate_wqe_type[0x20]; u8 reserved_at_2c0[0x160]; - u8 reserved_at_420[0x1c]; + u8 reserved_at_420[0x18]; + u8 encap_entropy_hash_type[0x4]; u8 flow_table_hash_type[0x4]; u8 reserved_at_440[0x3c0]; }; diff --git a/drivers/net/mlx5/hws/mlx5dr.h b/drivers/net/mlx5/hws/mlx5dr.h index d88f73ab57..321b649f8c 100644 --- a/drivers/net/mlx5/hws/mlx5dr.h +++ b/drivers/net/mlx5/hws/mlx5dr.h @@ -279,6 +279,27 @@ struct mlx5dr_action_dest_attr { } reformat; }; +union mlx5dr_crc_encap_entropy_hash_ip_field { + uint8_t ipv6_addr[16]; + struct { + uint8_t reserved[12]; + rte_be32_t ipv4_addr; + }; +}; + +struct mlx5dr_crc_encap_entropy_hash_fields { + union mlx5dr_crc_encap_entropy_hash_ip_field dst; + union mlx5dr_crc_encap_entropy_hash_ip_field src; + uint8_t next_protocol; + 
rte_be16_t dst_port; + rte_be16_t src_port; +}__rte_packed; + +enum mlx5dr_crc_encap_entropy_hash_size { + MLX5DR_CRC_ENCAP_ENTROPY_HASH_SIZE_8, + MLX5DR_CRC_ENCAP_ENTROPY_HASH_SIZE_16, +}; + /* Open a context used for direct rule insertion using hardware steering. * Each context can contain multiple tables of different types. * @@ -845,4 +866,21 @@ int mlx5dr_send_queue_action(struct mlx5dr_context *ctx, */ int mlx5dr_debug_dump(struct mlx5dr_context *ctx, FILE *f); +/* Calculate encap entropy hash value + * + * @param[in] ctx + * The context to get from it's capabilities the entropy hash type. + * @param[in] data + * The fields for the hash calculation. + * @param[in] entropy_res + * An array to store the hash value to it. + * @param[in] res_size + * The result size. + * @return zero on success non zero otherwise. + */ +int mlx5dr_crc_encap_entropy_hash_calc(struct mlx5dr_context *ctx, + struct mlx5dr_crc_encap_entropy_hash_fields *data, + uint8_t entropy_res[], + enum mlx5dr_crc_encap_entropy_hash_size res_size); + #endif diff --git a/drivers/net/mlx5/hws/mlx5dr_cmd.c b/drivers/net/mlx5/hws/mlx5dr_cmd.c index 876a47147d..f77b194708 100644 --- a/drivers/net/mlx5/hws/mlx5dr_cmd.c +++ b/drivers/net/mlx5/hws/mlx5dr_cmd.c @@ -1103,6 +1103,8 @@ int mlx5dr_cmd_query_caps(struct ibv_context *ctx, caps->ipsec_offload = MLX5_GET(query_hca_cap_out, out, capability.cmd_hca_cap.ipsec_offload); + caps->roce = MLX5_GET(query_hca_cap_out, out, capability.cmd_hca_cap.roce); + MLX5_SET(query_hca_cap_in, in, op_mod, MLX5_GET_HCA_CAP_OP_MOD_GENERAL_DEVICE_2 | MLX5_HCA_CAP_OPMOD_GET_CUR); @@ -1158,6 +1160,9 @@ int mlx5dr_cmd_query_caps(struct ibv_context *ctx, caps->flow_table_hash_type = MLX5_GET(query_hca_cap_out, out, capability.cmd_hca_cap_2.flow_table_hash_type); + caps->encap_entropy_hash_type = MLX5_GET(query_hca_cap_out, out, + capability.cmd_hca_cap_2.encap_entropy_hash_type); + MLX5_SET(query_hca_cap_in, in, op_mod, MLX5_GET_HCA_CAP_OP_MOD_NIC_FLOW_TABLE | 
MLX5_HCA_CAP_OPMOD_GET_CUR); @@ -1306,6 +1311,24 @@ int mlx5dr_cmd_query_caps(struct ibv_context *ctx, capability.esw_cap.merged_eswitch); } + if (caps->roce) { + MLX5_SET(query_hca_cap_in, in, op_mod, +MLX5_GET_HCA_CAP_OP_MOD_ROCE
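For illustration only: the entropy hash is a CRC computed over the original outer 5-tuple, folded to the width of the destination field. The device selects the actual CRC variant through the new `encap_entropy_hash_type` capability, so the polynomial and folding below (plain reflected CRC-32) are assumptions and not the device's algorithm; the struct merely mirrors the shape of `mlx5dr_crc_encap_entropy_hash_fields`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for mlx5dr_crc_encap_entropy_hash_fields:
 * the tuple the device hashes over (IPv4 case). */
struct entropy_fields {
	uint8_t dst_ip[4];
	uint8_t src_ip[4];
	uint8_t next_protocol;
	uint16_t dst_port;	/* host order here; serialized big-endian */
	uint16_t src_port;
};

/* Plain software CRC-32 (reflected, polynomial 0xEDB88320).  The real
 * hardware hash type may differ. */
static uint32_t
crc32_sw(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int b = 0; b < 8; b++)
			crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
	}
	return ~crc;
}

/* Serialize the tuple (skipping struct padding) and fold the 32-bit CRC
 * into a 16-bit value, as a VXLAN source port destination would need. */
static uint16_t
entropy_hash16(const struct entropy_fields *f)
{
	uint8_t buf[13];
	uint32_t crc;

	memcpy(buf, f->dst_ip, 4);
	memcpy(buf + 4, f->src_ip, 4);
	buf[8] = f->next_protocol;
	buf[9] = (uint8_t)(f->dst_port >> 8);
	buf[10] = (uint8_t)f->dst_port;
	buf[11] = (uint8_t)(f->src_port >> 8);
	buf[12] = (uint8_t)f->src_port;
	crc = crc32_sw(buf, sizeof(buf));
	return (uint16_t)(crc ^ (crc >> 16));
}
```

The point of the new API is exactly that software and hardware must agree on this function bit-for-bit, which is why the real implementation queries the device capability instead of hard-coding a polynomial.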
[PATCH 3/4] net/mlx5: add calc encap hash support
This commit adds support for encap hash calculation. Signed-off-by: Ori Kam --- drivers/net/mlx5/mlx5_flow.c| 29 +++ drivers/net/mlx5/mlx5_flow.h| 8 drivers/net/mlx5/mlx5_flow_hw.c | 66 + 3 files changed, 103 insertions(+) diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c index acaf34ce52..d1a7ad9234 100644 --- a/drivers/net/mlx5/mlx5_flow.c +++ b/drivers/net/mlx5/mlx5_flow.c @@ -1197,6 +1197,12 @@ mlx5_flow_calc_table_hash(struct rte_eth_dev *dev, const struct rte_flow_item pattern[], uint8_t pattern_template_index, uint32_t *hash, struct rte_flow_error *error); +static int +mlx5_flow_calc_encap_hash(struct rte_eth_dev *dev, + const struct rte_flow_item pattern[], + enum rte_flow_encap_hash_field dest_field, + uint8_t *hash, + struct rte_flow_error *error); static const struct rte_flow_ops mlx5_flow_ops = { .validate = mlx5_flow_validate, @@ -1253,6 +1259,7 @@ static const struct rte_flow_ops mlx5_flow_ops = { .async_action_list_handle_query_update = mlx5_flow_async_action_list_handle_query_update, .flow_calc_table_hash = mlx5_flow_calc_table_hash, + .flow_calc_encap_hash = mlx5_flow_calc_encap_hash, }; /* Tunnel information. 
*/ @@ -11121,6 +11128,28 @@ mlx5_flow_calc_table_hash(struct rte_eth_dev *dev, hash, error); } +static int +mlx5_flow_calc_encap_hash(struct rte_eth_dev *dev, + const struct rte_flow_item pattern[], + enum rte_flow_encap_hash_field dest_field, + uint8_t *hash, + struct rte_flow_error *error) +{ + enum mlx5_flow_drv_type drv_type = flow_get_drv_type(dev, NULL); + const struct mlx5_flow_driver_ops *fops; + + if (drv_type == MLX5_FLOW_TYPE_MIN || drv_type == MLX5_FLOW_TYPE_MAX) + return rte_flow_error_set(error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ACTION, + NULL, "invalid driver type"); + fops = flow_get_drv_ops(drv_type); + if (!fops || !fops->flow_calc_encap_hash) + return rte_flow_error_set(error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ACTION, + NULL, "no calc encap hash handler"); + return fops->flow_calc_encap_hash(dev, pattern, dest_field, hash, error); +} + /** * Destroy all indirect actions (shared RSS). * diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h index fe4f46724b..a04dde9e93 100644 --- a/drivers/net/mlx5/mlx5_flow.h +++ b/drivers/net/mlx5/mlx5_flow.h @@ -2109,6 +2109,13 @@ typedef int const struct rte_flow_item pattern[], uint8_t pattern_template_index, uint32_t *hash, struct rte_flow_error *error); +typedef int +(*mlx5_flow_calc_encap_hash_t) + (struct rte_eth_dev *dev, +const struct rte_flow_item pattern[], +enum rte_flow_encap_hash_field dest_field, +uint8_t *hash, +struct rte_flow_error *error); struct mlx5_flow_driver_ops { mlx5_flow_validate_t validate; @@ -2182,6 +2189,7 @@ struct mlx5_flow_driver_ops { mlx5_flow_async_action_list_handle_query_update_t async_action_list_handle_query_update; mlx5_flow_calc_table_hash_t flow_calc_table_hash; + mlx5_flow_calc_encap_hash_t flow_calc_encap_hash; }; /* mlx5_flow.c */ diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c index 48b70c0c29..0e4e7bd3f5 100644 --- a/drivers/net/mlx5/mlx5_flow_hw.c +++ b/drivers/net/mlx5/mlx5_flow_hw.c @@ -11556,6 +11556,71 @@ 
flow_hw_calc_table_hash(struct rte_eth_dev *dev, return 0; } +static int +flow_hw_calc_encap_hash(struct rte_eth_dev *dev, + const struct rte_flow_item pattern[], + enum rte_flow_encap_hash_field dest_field, + uint8_t *hash, + struct rte_flow_error *error) +{ + struct mlx5_priv *priv = dev->data->dev_private; + struct mlx5dr_crc_encap_entropy_hash_fields data; + enum mlx5dr_crc_encap_entropy_hash_size res_size = + dest_field == RTE_FLOW_ENCAP_HASH_FIELD_SRC_PORT ? + MLX5DR_CRC_ENCAP_ENTROPY_HASH_SIZE_16 : + MLX5DR_CRC_ENCAP_ENTROPY_HASH_SIZE_8; + int res; + + memset(&data, 0, sizeof(struct mlx5dr_crc_encap_entropy_hash_fields)); + + for (; pattern->type != RTE_FLOW_ITEM_TYPE_END; pattern++) { +
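The plumbing in this patch follows the usual two-level dispatch: the generic flow layer validates, looks up a driver-ops table, and fails with ENOTSUP when the handler is absent. A toy model of that pattern (names are illustrative stand-ins, not the mlx5 ones):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy driver-ops table: one optional handler, as in
 * struct mlx5_flow_driver_ops.flow_calc_encap_hash. */
struct flow_drv_ops {
	int (*calc_encap_hash)(const void *pattern, unsigned char *hash);
};

/* A dummy PMD implementation used for the sketch. */
static int
dummy_calc(const void *pattern, unsigned char *hash)
{
	(void)pattern;
	hash[0] = 0x5a;		/* stand-in hash result */
	return 0;
}

/* Mirror of the generic-layer dispatch: reject missing handlers with
 * -ENOTSUP (the real code reports this via rte_flow_error_set()). */
static int
flow_calc_encap_hash(const struct flow_drv_ops *ops,
		     const void *pattern, unsigned char *hash)
{
	if (ops == NULL || ops->calc_encap_hash == NULL)
		return -ENOTSUP;
	return ops->calc_encap_hash(pattern, hash);
}
```

Keeping the NULL checks in the generic layer means every PMD that has not implemented the op gets a consistent error instead of a crash.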
[PATCH 4/4] app/testpmd: add encap hash calculation
This commit adds support for calculating the encap hash.

Command structure:
flow hash {port} encap {target field} pattern {item} [/ {item} [...] ] / end

Example: calculate hash to be used by VXLAN encapsulation.
flow hash 0 encap hash_field_sport pattern ipv4 dst is 7.7.7.7 src is 8.8.8.8 / udp dst is 5678 src is 1234 / end

Signed-off-by: Ori Kam
---
 app/test-pmd/cmdline_flow.c                 | 57 +++--
 app/test-pmd/config.c                       | 30 +++
 app/test-pmd/testpmd.h                      |  3 ++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 21 +++-
 4 files changed, 106 insertions(+), 5 deletions(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 4062879552..a42f07276d 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -219,6 +219,10 @@ enum index {
 	HASH_CALC_TABLE,
 	HASH_CALC_PATTERN_INDEX,
 	HASH_CALC_PATTERN,
+	HASH_CALC_ENCAP,
+	HASH_CALC_DEST,
+	ENCAP_HASH_FIELD_SRC_PORT,
+	ENCAP_HASH_FIELD_GRE_FLOW_ID,

 	/* Tunnel arguments. */
 	TUNNEL_CREATE,
@@ -1192,6 +1196,8 @@ struct buffer {
 			uint32_t pattern_n;
 			uint32_t actions_n;
 			uint8_t *data;
+			enum rte_flow_encap_hash_field field;
+			uint8_t encap_hash;
 		} vc; /**< Validate/create arguments.
*/ struct { uint64_t *rule; @@ -2550,6 +2556,18 @@ static const enum index action_represented_port[] = { ZERO, }; +static const enum index next_hash_subcmd[] = { + HASH_CALC_TABLE, + HASH_CALC_ENCAP, + ZERO, +}; + +static const enum index next_hash_encap_dest_subcmd[] = { + ENCAP_HASH_FIELD_SRC_PORT, + ENCAP_HASH_FIELD_GRE_FLOW_ID, + ZERO, +}; + static int parse_set_raw_encap_decap(struct context *, const struct token *, const char *, unsigned int, void *, unsigned int); @@ -3789,7 +3807,7 @@ static const struct token token_list[] = { [HASH] = { .name = "hash", .help = "calculate hash for a given pattern in a given template table", - .next = NEXT(NEXT_ENTRY(HASH_CALC_TABLE), NEXT_ENTRY(COMMON_PORT_ID)), + .next = NEXT(next_hash_subcmd, NEXT_ENTRY(COMMON_PORT_ID)), .args = ARGS(ARGS_ENTRY(struct buffer, port)), .call = parse_hash, }, @@ -3803,6 +3821,12 @@ static const struct token token_list[] = { args.vc.table_id)), .call = parse_hash, }, + [HASH_CALC_ENCAP] = { + .name = "encap", + .help = "calculates encap hash", + .next = NEXT(next_hash_encap_dest_subcmd), + .call = parse_hash, + }, [HASH_CALC_PATTERN_INDEX] = { .name = "pattern_template", .help = "specify pattern template id", @@ -3812,6 +3836,18 @@ static const struct token token_list[] = { args.vc.pat_templ_id)), .call = parse_hash, }, + [ENCAP_HASH_FIELD_SRC_PORT] = { + .name = "hash_field_sport", + .help = "the encap hash field is src port", + .next = NEXT(NEXT_ENTRY(ITEM_PATTERN)), + .call = parse_hash, + }, + [ENCAP_HASH_FIELD_GRE_FLOW_ID] = { + .name = "hash_field_flow_id", + .help = "the encap hash field is NVGRE flow id", + .next = NEXT(NEXT_ENTRY(ITEM_PATTERN)), + .call = parse_hash, + }, /* Top-level command. 
*/ [INDIRECT_ACTION] = { .name = "indirect_action", @@ -10691,6 +10727,15 @@ parse_hash(struct context *ctx, const struct token *token, ctx->object = out->args.vc.pattern; ctx->objmask = NULL; return len; + case HASH_CALC_ENCAP: + out->args.vc.encap_hash = 1; + return len; + case ENCAP_HASH_FIELD_SRC_PORT: + out->args.vc.field = RTE_FLOW_ENCAP_HASH_FIELD_SRC_PORT; + return len; + case ENCAP_HASH_FIELD_GRE_FLOW_ID: + out->args.vc.field = RTE_FLOW_ENCAP_HASH_FIELD_NVGRE_FLOW_ID; + return len; default: return -1; } @@ -12651,9 +12696,13 @@ cmd_flow_parsed(const struct buffer *in) port_queue_flow_pull(in->port, in->queue); break; case HASH: - port_flow_hash_calc(in->port, in->args.vc.table_id, - in->args.vc.pat_templ_id, - in->args.vc.pattern); + if (!in->args.vc.encap_hash) + port_flow_hash_calc(in->port, in->ar
Re: [PATCH] RFC: use C11 alignas instead of GCC attribute aligned
On 2024-01-28 09:57, Morten Brørup wrote: From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] Sent: Saturday, 27 January 2024 20.15 On 2024-01-26 11:18, Morten Brørup wrote: From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] Sent: Friday, 26 January 2024 11.05 On 2024-01-25 23:53, Morten Brørup wrote: From: Tyler Retzlaff [mailto:roret...@linux.microsoft.com] Sent: Thursday, 25 January 2024 19.37 ping. Please review this thread if you have time, the main point of discussion I would like to receive consensus on the following questions. 1. Should we continue to expand common alignments behind an __rte_macro i.e. what do we prefer to appear in code alignas(RTE_CACHE_LINE_MIN_SIZE) -- or -- __rte_cache_aligned One of the benefits of dropping the macro is it provides a clear visual indicator that it is not placed in the same location or get applied to types as is done with __attribute__((__aligned__(n))). We don't want our own proprietary variant of something that already exists in the C standard. Now that we have moved to C11, the __rte alignment macros should be considered obsolete. Making so something cache-line aligned is not in C11. We are talking about the __rte_aligned() macro, not the cache alignment macro. OK, in that case, what is the relevance of question 1 above? With this in mind, try re-reading Tyler's clarifications in this tread. Briefly: alignas() can be attached to variables and structure fields, but not to types (like __rte_aligned()), so to align a structure: struct foo { int alignas(64) bar; /* alignas(64) must be here */ int baz; }; /* __rte_aligned(64) was here, but alignas(64) cannot be here. */ So the question is: Do we want to eliminate the __rte_aligned() macro - which relies on compiler attributes - and migrate to using the C11 standard alignas()? I think yes; after updating to C11, the workaround for pre-C11 not offering alignment is obsolete, and its removal should be on the roadmap. OK, thanks for the explanation. 
Interesting limitation in the standard. If the construct the standard is offering is less effective (in this case, less readable) and the non-standard-based option is possible to implement on all compilers (i.e., on MSVC too), then we should keep the custom option. Especially if it's already there, but also in cases where it isn't. In fact, one could argue *everything* related to alignment should go through something rte_, __rte_ or RTE_-prefixed. So, "int RTE_ALIGNAS(64) bar;". Maybe that would be silly, but it would be consistent with RTE_CACHE_ALIGNAS. I would worry more about allowing DPDK developers writing clean and readable code, than very slightly lowering the bar for the fraction of newcomers experienced with the latest and greatest from the C standard, and *not* familiar with age-old GCC extensions. Continuously cleaning up old cruft in DPDK is important, especially for reducing the learning curve for newcomers to the project. For backwards compatibility, we can still keep the obsolete macros, but they should be removed from the project itself. __rte_cache_aligned is shorter, provides a tiny bit of abstraction, and is already an established DPDK standard. So just keep the macro. If it would change, I would argue for it to be changed to rte_cache_aligned (i.e., just moving it out of __ namespace, and maybe making it all-uppercase). Non-trivial C programs wrap things all the time, standard or not. It's not something to be overly concerned about, imo. Using the cache alignment macro was obviously a bad example for discussing the __rte_aligned() macro. FYI, Tyler later agreed to introducing the RTE_CACHE_ALIGNAS you had proposed in an earlier correspondence. Note: I don't mind convenience macros for common use cases, so we could also introduce the macro suggested by Mattias [1]: #define RTE_CACHE_ALIGNAS alignas(RTE_CACHE_LINE_SIZE) [1]: https://inbox.dpdk.org/dev/dc3f3131-38e6-4219-861e- b31ec10c0...@lysator.liu.se/ 2. 
where should we place alignas(n) or __rte_macro (if we use a macro) Should it be on the same line as the variable or field or on the preceeding line? /* same line example struct */ struct T { /* alignas(64) applies to field0 *not* struct T type declaration */ alignas(64) void *field0; void *field1; ... other fields ... alignas(64) uint64_t field5; uint32_t field6; ... more fields ... }; /* same line example array */ alignas(64) static const uint32_t array[4] = { ... }; -- or -- /* preceeding line example struct */ struct T { /* alignas(64) applies to field0 *not* struct T type declaration */ alignas(64) void *field0; void *field1; ... other fields ... alignas(64) uint64_t field5; uint32_t field6; ... m
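The `RTE_CACHE_ALIGNAS` convenience macro agreed on in this thread can be sketched standalone; `RTE_CACHE_LINE_SIZE` is hard-coded to 64 here purely for illustration (in DPDK it comes from the build configuration):

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value for this sketch; DPDK defines it per architecture. */
#define RTE_CACHE_LINE_SIZE 64

/* The convenience macro proposed in the thread. */
#define RTE_CACHE_ALIGNAS alignas(RTE_CACHE_LINE_SIZE)

/* "Same line" placement on a variable declaration. */
static RTE_CACHE_ALIGNAS unsigned long counters[4];

/* On struct members: each annotated member starts a new cache line,
 * which also rounds the struct size up to a cache-line multiple. */
struct ring {
	RTE_CACHE_ALIGNAS unsigned long head;	/* producer cache line */
	RTE_CACHE_ALIGNAS unsigned long tail;	/* consumer cache line */
};

static_assert(offsetof(struct ring, tail) == RTE_CACHE_LINE_SIZE,
	      "tail on its own cache line");
static_assert(sizeof(struct ring) == 2 * RTE_CACHE_LINE_SIZE,
	      "two cache lines");
```

This keeps the standard `alignas()` underneath while preserving the short, greppable spelling the thread argues for.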
[PATCH v2 0/2] net/mlx5: add random compare support
Add support for compare item with "RTE_FLOW_FIELD_RANDOM". Depends-on: series-30606 ("ethdev: add RTE_FLOW_ITEM_TYPE_COMPARE") v2: - Rebase. - Add "RTE_FLOW_FIELD_RANDOM" compare support. - Reduce the "Depends-on" list. Hamdan Igbaria (1): net/mlx5/hws: add support for compare matcher Michael Baum (1): net/mlx5: add support to compare random value drivers/common/mlx5/mlx5_prm.h| 16 ++ drivers/net/mlx5/hws/mlx5dr_cmd.c | 9 +- drivers/net/mlx5/hws/mlx5dr_cmd.h | 1 + drivers/net/mlx5/hws/mlx5dr_debug.c | 4 +- drivers/net/mlx5/hws/mlx5dr_debug.h | 1 + drivers/net/mlx5/hws/mlx5dr_definer.c | 243 +- drivers/net/mlx5/hws/mlx5dr_definer.h | 33 drivers/net/mlx5/hws/mlx5dr_matcher.c | 48 + drivers/net/mlx5/hws/mlx5dr_matcher.h | 12 +- drivers/net/mlx5/mlx5_flow_hw.c | 70 ++-- 10 files changed, 410 insertions(+), 27 deletions(-) -- 2.25.1
[PATCH v2 1/2] net/mlx5/hws: add support for compare matcher
From: Hamdan Igbaria Add support for compare matcher, this matcher will allow direct comparison between two packet fields, or a packet field and a value, with fully masked DW. For now this matcher hash table is limited to size 1x1, thus it supports only 1 rule STE. Signed-off-by: Hamdan Igbaria Signed-off-by: Michael Baum --- drivers/common/mlx5/mlx5_prm.h| 16 ++ drivers/net/mlx5/hws/mlx5dr_cmd.c | 9 +- drivers/net/mlx5/hws/mlx5dr_cmd.h | 1 + drivers/net/mlx5/hws/mlx5dr_debug.c | 4 +- drivers/net/mlx5/hws/mlx5dr_debug.h | 1 + drivers/net/mlx5/hws/mlx5dr_definer.c | 243 +- drivers/net/mlx5/hws/mlx5dr_definer.h | 33 drivers/net/mlx5/hws/mlx5dr_matcher.c | 48 + drivers/net/mlx5/hws/mlx5dr_matcher.h | 12 +- 9 files changed, 358 insertions(+), 9 deletions(-) diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h index 69404b5ed8..f64944f042 100644 --- a/drivers/common/mlx5/mlx5_prm.h +++ b/drivers/common/mlx5/mlx5_prm.h @@ -3448,6 +3448,7 @@ enum mlx5_ifc_rtc_ste_format { MLX5_IFC_RTC_STE_FORMAT_8DW = 0x4, MLX5_IFC_RTC_STE_FORMAT_11DW = 0x5, MLX5_IFC_RTC_STE_FORMAT_RANGE = 0x7, + MLX5_IFC_RTC_STE_FORMAT_4DW_RANGE = 0x8, }; enum mlx5_ifc_rtc_reparse_mode { @@ -3486,6 +3487,21 @@ struct mlx5_ifc_rtc_bits { u8 reserved_at_1a0[0x260]; }; +struct mlx5_ifc_ste_match_4dw_range_ctrl_dw_bits { + u8 match[0x1]; + u8 reserved_at_1[0x2]; + u8 base1[0x1]; + u8 inverse1[0x1]; + u8 reserved_at_5[0x1]; + u8 operator1[0x2]; + u8 reserved_at_8[0x3]; + u8 base0[0x1]; + u8 inverse0[0x1]; + u8 reserved_at_a[0x1]; + u8 operator0[0x2]; + u8 compare_delta[0x10]; +}; + struct mlx5_ifc_alias_context_bits { u8 vhca_id_to_be_accessed[0x10]; u8 reserved_at_10[0xd]; diff --git a/drivers/net/mlx5/hws/mlx5dr_cmd.c b/drivers/net/mlx5/hws/mlx5dr_cmd.c index 876a47147d..702d6fadac 100644 --- a/drivers/net/mlx5/hws/mlx5dr_cmd.c +++ b/drivers/net/mlx5/hws/mlx5dr_cmd.c @@ -370,9 +370,12 @@ mlx5dr_cmd_rtc_create(struct ibv_context *ctx, attr, obj_type, MLX5_GENERAL_OBJ_TYPE_RTC); attr = 
MLX5_ADDR_OF(create_rtc_in, in, rtc); - MLX5_SET(rtc, attr, ste_format_0, rtc_attr->is_frst_jumbo ? - MLX5_IFC_RTC_STE_FORMAT_11DW : - MLX5_IFC_RTC_STE_FORMAT_8DW); + if (rtc_attr->is_compare) { + MLX5_SET(rtc, attr, ste_format_0, MLX5_IFC_RTC_STE_FORMAT_4DW_RANGE); + } else { + MLX5_SET(rtc, attr, ste_format_0, rtc_attr->is_frst_jumbo ? +MLX5_IFC_RTC_STE_FORMAT_11DW : MLX5_IFC_RTC_STE_FORMAT_8DW); + } if (rtc_attr->is_scnd_range) { MLX5_SET(rtc, attr, ste_format_1, MLX5_IFC_RTC_STE_FORMAT_RANGE); diff --git a/drivers/net/mlx5/hws/mlx5dr_cmd.h b/drivers/net/mlx5/hws/mlx5dr_cmd.h index 18c2b07fc8..073ffd9633 100644 --- a/drivers/net/mlx5/hws/mlx5dr_cmd.h +++ b/drivers/net/mlx5/hws/mlx5dr_cmd.h @@ -82,6 +82,7 @@ struct mlx5dr_cmd_rtc_create_attr { uint8_t reparse_mode; bool is_frst_jumbo; bool is_scnd_range; + bool is_compare; }; struct mlx5dr_cmd_alias_obj_create_attr { diff --git a/drivers/net/mlx5/hws/mlx5dr_debug.c b/drivers/net/mlx5/hws/mlx5dr_debug.c index 11557bcab8..a9094cd35b 100644 --- a/drivers/net/mlx5/hws/mlx5dr_debug.c +++ b/drivers/net/mlx5/hws/mlx5dr_debug.c @@ -99,6 +99,7 @@ static int mlx5dr_debug_dump_matcher_match_template(FILE *f, struct mlx5dr_matcher *matcher) { bool is_root = matcher->tbl->level == MLX5DR_ROOT_LEVEL; + bool is_compare = mlx5dr_matcher_is_compare(matcher); enum mlx5dr_debug_res_type type; int i, ret; @@ -117,7 +118,8 @@ mlx5dr_debug_dump_matcher_match_template(FILE *f, struct mlx5dr_matcher *matcher return rte_errno; } - type = MLX5DR_DEBUG_RES_TYPE_MATCHER_TEMPLATE_MATCH_DEFINER; + type = is_compare ? 
MLX5DR_DEBUG_RES_TYPE_MATCHER_TEMPLATE_COMPARE_MATCH_DEFINER : + MLX5DR_DEBUG_RES_TYPE_MATCHER_TEMPLATE_MATCH_DEFINER; ret = mlx5dr_debug_dump_matcher_template_definer(f, mt, mt->definer, type); if (ret) return ret; diff --git a/drivers/net/mlx5/hws/mlx5dr_debug.h b/drivers/net/mlx5/hws/mlx5dr_debug.h index 5cffdb10b5..a89a6a0b1d 100644 --- a/drivers/net/mlx5/hws/mlx5dr_debug.h +++ b/drivers/net/mlx5/hws/mlx5dr_debug.h @@ -24,6 +24,7 @@ enum mlx5dr_debug_res_type { MLX5DR_DEBUG_RES_TYPE_MATCHER_ACTION_TEMPLATE = 4204, MLX5DR_DEBUG_RES_TYPE_MATCHER_TEMPLATE_HASH_DEFINER = 4205, MLX5DR_DEBUG_RES_TYPE_MATCHER_TEMPLATE_RANGE_DEFINER = 4206, + MLX5DR_DEBUG_RES_TYPE_MATCHER_TEMPLATE_COMPARE_MATCH_DEFINER = 4207, }; static inline uint64_t diff --git a/drivers/net/mlx5/hws/mlx5dr_defi
[PATCH v2 2/2] net/mlx5: add support to compare random value
Add support to use "RTE_FLOW_ITEM_TYPE_COMPARE" with "RTE_FLOW_FIELD_RANDOM" as an argument. The random field is supported only when the base is an immediate value; the random field cannot be compared with another field. Signed-off-by: Michael Baum --- drivers/net/mlx5/mlx5_flow_hw.c | 70 - 1 file changed, 52 insertions(+), 18 deletions(-) diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c index 01fdcbfed9..be303ea3a8 100644 --- a/drivers/net/mlx5/mlx5_flow_hw.c +++ b/drivers/net/mlx5/mlx5_flow_hw.c @@ -6686,18 +6686,55 @@ flow_hw_prepend_item(const struct rte_flow_item *items, return copied_items; } -static inline bool -flow_hw_item_compare_field_supported(enum rte_flow_field_id field) +static int +flow_hw_item_compare_field_validate(enum rte_flow_field_id arg_field, + enum rte_flow_field_id base_field, + struct rte_flow_error *error) { - switch (field) { + switch (arg_field) { + case RTE_FLOW_FIELD_TAG: + case RTE_FLOW_FIELD_META: + break; + case RTE_FLOW_FIELD_RANDOM: + if (base_field == RTE_FLOW_FIELD_VALUE) + return 0; + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_UNSPECIFIED, + NULL, + "compare random is supported only with immediate value"); + default: + return rte_flow_error_set(error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_UNSPECIFIED, + NULL, + "compare item argument field is not supported"); + } + switch (base_field) { case RTE_FLOW_FIELD_TAG: case RTE_FLOW_FIELD_META: case RTE_FLOW_FIELD_VALUE: - return true; + break; + default: + return rte_flow_error_set(error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_UNSPECIFIED, + NULL, + "compare item base field is not supported"); + } + return 0; +} + +static inline uint32_t +flow_hw_item_compare_width_supported(enum rte_flow_field_id field) +{ + switch (field) { + case RTE_FLOW_FIELD_TAG: + case RTE_FLOW_FIELD_META: + return 32; + case RTE_FLOW_FIELD_RANDOM: + return 16; default: break; } - return false; + return 0; } static int @@ -6706,6 +6743,7 @@ flow_hw_validate_item_compare(const struct
rte_flow_item *item, { const struct rte_flow_item_compare *comp_m = item->mask; const struct rte_flow_item_compare *comp_v = item->spec; + int ret; if (unlikely(!comp_m)) return rte_flow_error_set(error, EINVAL, @@ -6717,19 +6755,13 @@ flow_hw_validate_item_compare(const struct rte_flow_item *item, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, "compare item only support full mask"); - if (!flow_hw_item_compare_field_supported(comp_m->a.field) || - !flow_hw_item_compare_field_supported(comp_m->b.field)) - return rte_flow_error_set(error, ENOTSUP, - RTE_FLOW_ERROR_TYPE_UNSPECIFIED, - NULL, - "compare item field not support"); - if (comp_m->a.field == RTE_FLOW_FIELD_VALUE && - comp_m->b.field == RTE_FLOW_FIELD_VALUE) - return rte_flow_error_set(error, EINVAL, - RTE_FLOW_ERROR_TYPE_UNSPECIFIED, - NULL, - "compare between value is not valid"); + ret = flow_hw_item_compare_field_validate(comp_m->a.field, + comp_m->b.field, error); + if (ret < 0) + return ret; if (comp_v) { + uint32_t width; + if (comp_v->operation != comp_m->operation || comp_v->a.field != comp_m->a.field || comp_v->b.field != comp_m->b.field) @@ -6737,7 +6769,9 @@ flow_hw_validate_item_compare(const struct rte_flow_item *item, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, "compare item spec/mask not matching"); - if ((comp_v->width & comp_m->width) != 32) + width = flow_hw_item_compare_width_supported(comp_v->a.field); + MLX5_ASSERT(width > 0); + if ((comp_v->width & comp_m->width) !=
Re: DTS testpmd and SCAPY integration
Hello, I've uploaded the code snapshot and functional description here: https://drive.google.com/drive/folders/1cHrPwx4fUJ6ibUCtHd4kNKsrmmvQvvOj?usp=drive_link Regards, Gregory From: Jeremy Spewock Sent: Tuesday, January 23, 2024 20:26 To: NBU-Contact-Thomas Monjalon (EXTERNAL) Cc: Honnappa Nagarahalli ; Gregory Etelson ; Juraj Linkeš ; Paul Szczepanek ; Yoan Picchi ; Patrick Robb ; Luca Vizzarro ; c...@dpdk.org ; dev@dpdk.org ; nd Subject: Re: DTS testpmd and SCAPY integration On Tue, Jan 23, 2024 at 3:50 AM Thomas Monjalon wrote: 23/01/2024 04:42, Honnappa Nagarahalli: > > 08/01/2024 13:10, Luca Vizzarro: > > > Your proposal sounds rather interesting. Certainly enabling DTS to > > > accept YAML-written tests sounds more developer-friendly and should > > > enable quicker test-writing. As this is an extra feature though – and > > > a nice-to-have, it should definitely be discussed in the DTS meetings > > > as Honnappa suggested already. > > > > I would not classify this idea as "nice-to-have". > > I would split this proposal in 2 parts: > > 1/ YAML is an implementation alternative. > > 2/ Being able to write a test with a small number of lines, > > reusing some commands from existing tools, > > should be our "must-have" common goal. > > > > Others have mentioned that YAML may not be suitable in complex cases, and > > that it would be an additional language for test writing. > > I personally think we should focus on a single path which is easy to read > > and > > maintain. > I think we are digressing from the plan we had put forward if we have to go > down this path. > We should understand what it means by going down the YAML format. > Also, what would happen if there is another innovation in 3 months? There is a misunderstanding here. I suggest taking this proposal as an example of the simplicity to reach.
But I agree with you it is more reasonable to continue with the Python path. I agree that it would be easier to develop both the framework and some of the more complex test cases in Python and think sticking with Python is a good route. I think taking this proposal and using it as an example of something we could work towards is also a great outlook because it gives a more structured goal instead of the idea being "let's just make it as simple as we can." > We already have scatter-gather test suite ported to DPDK repo and has > undergone review in the community. > > In the last meeting we went through a simple test case. Is it possible to > write the scatter-gather test case in YAML and see how they compare? After the latest CI meeting we thought about writing a simple test case in Python with some virtual functions which would abstract all the boilerplate code, so it would have the same level of simplicity as this YAML proposal. Looking at what we have for DTS currently, we have put thought into achieving this and, in some categories, have made things similar to the YAML approach. Like, for example, when it comes to things like sending, receiving, and verifying a packet, all you have to do is create a packet and call a method. All of the boilerplate regarding how to send and receive and how it can be different between types of traffic generators, as well as setting up the ability to forward the packet, is completely abstracted away from you. From the test suite's perspective, it's as simple as sending a packet and getting some packets back. The os_udp testing suite that Juraj wrote is a great example of this and how short the simple case can be. Additionally we have created centralized APIs between the SUT and TG nodes so that all of the methods that the developer would have to use can be accessed either through these nodes, or through the methods inherited from the base class for test suites.
I think that when you try to make things extremely simple there is a common tradeoff; the more simplistic it is, the more restrictive it can become. We would have to make more assumptions about use cases and how things would look in these common cases. This generally isn't a big problem in our case: if support for something that is needed is missing, we can add it to the API, and with these test cases there are things that will generally be similar every time. This tradeoff can also be offset by creating the functions to make these things simple while still exposing less restrictive underlying methods so that more niche use cases can still be fulfilled. Just something to keep in mind. > > For the configuration side, YAML is already used in DTS. > > For the test suite logic, do you think we can achieve the same simplicity > > with > > some Python code? > > > > We discussed how to progress with this proposal during the CI meeting last > > week. > > We need to check how it could look and what we can improve to reach th
RE: Potential RTE bitset RFC
> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > Sent: Saturday, 27 January 2024 19.32 > > Hi. > > The new timer RFC ("htimer") I submitted last year also included a new > bitset API. > > https://patchwork.dpdk.org/project/dpdk/patch/20230315170342.214127-2- > mattias.ronnb...@ericsson.com/ > > My experience is that multi-word bitsets are often useful. Examples > from > DPDK are rte_service.c and DSW its "service ports" bitset (both have 64 > as a hard upper limit). Small, but multi-word, bitsets are not > particularly hard to open-code, but then you end up with a lot of > duplication. > > I wanted to ask if there is an interest in seeing a bitset API (as per > my patchset) in DPDK. Absolutely! Your bitset patch seems very complete, with test cases and all. Let's standardize on this, so we can avoid variants of similar code all over the place. > > Upstreaming htimer, including having it replace today's rte_timer is > more work than I can commit to, so I think you won't get RTE bitset > that > way any time soon. Thanks for the update regarding the htimer progress. :-) I certainly don't object to a dedicated fast path library for high-volume timers, such as those in a TCP/IP (or QUIC/IP) stack. In my opinion, the existing rte_timer library can be improved at a later stage, if anybody cares. It's a shame if that requirement is holding back the addition of a new and useful library. -Morten
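Mattias's point about open-coded duplication can be made concrete. The snippet below is an illustrative sketch only — these names are hypothetical, not the proposed rte_bitset API — of the kind of hand-rolled multi-word bitset that rte_service.c and DSW currently carry, and that a common API would replace:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical open-coded multi-word bitset (not the rte_bitset API). */
#define BITSET_WORD_BITS 64
#define BITSET_WORDS(n) (((n) + BITSET_WORD_BITS - 1) / BITSET_WORD_BITS)

static inline void
bitset_set(uint64_t *bitset, size_t bit_num)
{
	/* Pick the word holding the bit, then OR in the bit's mask. */
	bitset[bit_num / BITSET_WORD_BITS] |=
		UINT64_C(1) << (bit_num % BITSET_WORD_BITS);
}

static inline void
bitset_clear(uint64_t *bitset, size_t bit_num)
{
	bitset[bit_num / BITSET_WORD_BITS] &=
		~(UINT64_C(1) << (bit_num % BITSET_WORD_BITS));
}

static inline int
bitset_test(const uint64_t *bitset, size_t bit_num)
{
	return (bitset[bit_num / BITSET_WORD_BITS] >>
		(bit_num % BITSET_WORD_BITS)) & 1;
}
```

Every user of such a bitset re-implements roughly these three helpers (plus iteration and counting), which is exactly the duplication a shared API would remove.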
Re: [PATCH v3 0/7] fix race-condition of proactive error handling mode
Hi Ferruh, Kindly ping for review. Thanks On 2024/1/15 9:44, fengchengwen wrote: > Kindly ping. > > On 2023/12/5 10:30, fengchengwen wrote: >> Hi Ferruh, >> >> I notice this patchset was delegated to you, so could you take a view? >> >> Thanks. >> >> On 2023/11/6 21:11, Chengwen Feng wrote: >>> This patch fixes race-condition of proactive error handling mode, the >>> discussion thread [1]. >>> >>> [1] >>> http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kal...@intel.com/ >>> >>> Chengwen Feng (7): >>> ethdev: fix race-condition of proactive error handling mode >>> net/hns3: replace fp ops config function >>> net/bnxt: fix race-condition when report error recovery >>> net/bnxt: use fp ops setup function >>> app/testpmd: add error recovery usage demo >>> app/testpmd: extract event handling to event.c >>> doc: testpmd support event handling section >>> >>> --- >>> v3: >>> - adjust the usage of RTE_ETH_EVENT_QUEUE_STATE in 7/7 commit. >>> - add ack-by from Konstantin Ananyev, Ajit Khaparde and Huisong Li. >>> v2: >>> - extract event handling to event.c and document it, which address >>> Ferruh's comment. >>> - add ack-by from Konstantin Ananyev and Dongdong Liu. >>> >>> app/test-pmd/event.c | 390 +++ >>> app/test-pmd/meson.build | 1 + >>> app/test-pmd/parameters.c| 36 +- >>> app/test-pmd/testpmd.c | 247 +--- >>> app/test-pmd/testpmd.h | 10 +- >>> doc/guides/prog_guide/poll_mode_drv.rst | 20 +- >>> doc/guides/testpmd_app_ug/event_handling.rst | 81 >>> doc/guides/testpmd_app_ug/index.rst | 1 + >>> drivers/net/bnxt/bnxt_cpr.c | 18 +- >>> drivers/net/bnxt/bnxt_ethdev.c | 9 +- >>> drivers/net/hns3/hns3_rxtx.c | 21 +- >>> lib/ethdev/ethdev_driver.c | 8 + >>> lib/ethdev/ethdev_driver.h | 10 + >>> lib/ethdev/rte_ethdev.h | 32 +- >>> lib/ethdev/version.map | 1 + >>> 15 files changed, 552 insertions(+), 333 deletions(-) >>> create mode 100644 app/test-pmd/event.c >>> create mode 100644 doc/guides/testpmd_app_ug/event_handling.rst >>> >> . >> > . >
Re: Potential RTE bitset RFC
Hi, On 2024/1/28 21:52, Morten Brørup wrote: >> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] >> Sent: Saturday, 27 January 2024 19.32 >> >> Hi. >> >> The new timer RFC ("htimer") I submitted last year also included a new >> bitset API. >> >> https://patchwork.dpdk.org/project/dpdk/patch/20230315170342.214127-2- >> mattias.ronnb...@ericsson.com/ >> >> My experience is that multi-word bitsets are often useful. Examples >> from >> DPDK are rte_service.c and DSW its "service ports" bitset (both have 64 >> as a hard upper limit). Small, but multi-word, bitsets are not >> particularly hard to open-code, but then you end up with a lot of >> duplication. >> >> I wanted to ask if there is an interest in seeing a bitset API (as per >> my patchset) in DPDK. > > Absolutely! > Your bitset patch seems very complete, with test cases and all. > Let's standardize on this, so we can avoid variants of similar code all over > the place. The bitmap (lib/eal/include/rte_bitmap.h) provides a subset of this bitset library. Maybe it's better to extend the bitmap library. Thanks. > >> >> Upstreaming htimer, including having it replace today's rte_timer is >> more work than I can commit to, so I think you won't get RTE bitset >> that >> way any time soon. > > Thanks for the update regarding the htimer progress. :-) > > I certainly don't object to a dedicated fast path library for high-volume > timers, such as those in a TCP/IP (or QUIC/IP) stack. > > In my opinion, the existing rte_timer library can be improved at a later > stage, if anybody cares. It's a shame if that requirement is holding back the > addition of a new and useful library. > > -Morten >
Re: [dpdk-dev] [PATCH v4] eal: refactor rte_eal_init into sub-functions
On (01/24/24 16:53), David Marchand wrote: > Date: Wed, 24 Jan 2024 16:53:33 +0100 > From: David Marchand > To: Rahul Gupta > Cc: dev@dpdk.org, tho...@monjalon.net, bruce.richard...@intel.com, > dmitry.kozl...@gmail.com, step...@networkplumber.org, > sovar...@linux.microsoft.com, ok...@kernel.org, > sujithsan...@microsoft.com, sowmini.varad...@microsoft.com, > krathina...@microsoft.com, rahulrgupt...@gmail.com, Rahul Gupta > > Subject: Re: [dpdk-dev] [PATCH v4] eal: refactor rte_eal_init into > sub-functions > > On Wed, Jan 24, 2024 at 2:45 PM Rahul Gupta > wrote: > > > > From: Rahul Gupta > > > > In continuation to the following email, I am sending this patch. > > (https://inbox.dpdk.org/dev/20231110172523.ga17...@microsoft.com/) > > > > Initialization requires rte_eal_init + rte_pktmbuf_pool_create which > > can consume a total time of 500-600 ms: > > a) For many devices FLR may take a significant chunk of time > >(200-250 ms in our use-case), this FLR is triggered during device > >probe in rte_eal_init(). > > b) rte_pktmbuf_pool_create() can consume up to 300-350 ms for > > applications that require huge memory. > > > > This cost is incurred on each restart (which happens in our use-case > > during binary updates for servicing). > > This patch provides an optimization using pthreads that applications > > can use and which can save 200-230 ms. > > > > In this patch, rte_eal_init() is refactored into two parts: > > a) The 1st part is dependent code, i.e. it’s a prerequisite of the FLR and > >mempool creation. So this code needs to be executed before any > >pthreads. It's named rte_eal_init_setup(). > > b) The 2nd part is independent code, i.e. it can execute in parallel > >to mempool creation in a pthread. It's named > > rte_eal_init_async_setup(). > > > > In existing applications no changes are required unless they wish to > > leverage > > the optimization.
> > > > If the application wants to leverage this optimization, then it needs to > > call > > rte_eal_init_async() (instead of calling rte_eal_init()), then it can create a > > thread using rte_eal_remote_launch() to schedule a task it would like to do > > in > > parallel with rte_eal_init_async_setup(); this task can be mbuf pool creation > > using rte_pktmbuf_pool_create(). > > > > After this, if the next operations require completion of the above task, then > > the user can use rte_eal_init_wait_async_setup_complete(), > > or if the user wants to just check the status of that thread, then use > > rte_eal_init_async_setup_done(). > > Looking at what this patch does.. I am under the impression all you > really need is rte_eal_init without initial probing. > Such behavior can probably be achieved with an allowlist set to a non > existing device (like for example "-a :00:00.0"), then later, use > device hotplug. The patch will be useful to all the adapters irrespective of their hotplug support. > > Some quick notes on this patch. > > - don't expose symbols externally if they are only for internal use in > the same library, done in next patch. > - current version is 24.03, not 24.01 (wrt comment in version.map), done > - other OSes are not handled by this patch, please do the work for > FreeBSD and Windows, I can send a patch to support non-Linux OSes, but due to lack of a setup, I will need help to test them. Also, I am planning to do the porting at the end (i.e. after incorporating all review comments, in order to prevent duplication of efforts). > - as a followup of the previous point, please check if we can share > code between OSes and, if so, move the shared code to lib/eal/common, The code for rte_eal_init() is different for all three OSes (even if I consider just the 1st part of rte_eal_init(), i.e. rte_eal_init_setup()). So it's difficult to move to the common dir. We will have to do it separately for each OS. > - did you test this series with multiprocess?
Yes, it works fine; I have tested it with the simple_mp app. > - why should telemetry and other parts of the current rte_eal_init() > be left in the second stage of this initialisation pipeline? Actually, the motivation of this patch was to do most of the work in parallel, i.e. in the second stage, so not only probe/FLR but telemetry and any other work that can be executed in parallel are done here. (Please refer to the link in the commit message for more details.) > > > -- > David Marchand Thanks for the review, my comments are inline. Thanks, Rahul.
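The split-init flow described in this thread can be illustrated with plain pthreads. This is a hedged sketch only: dependent_setup(), independent_setup() and create_pools() are hypothetical stand-ins for rte_eal_init_setup(), rte_eal_init_async_setup() and rte_pktmbuf_pool_create(); real code would call the proposed DPDK APIs rather than these placeholders.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int probe_done;
static atomic_int pools_done;

/* Stand-in for rte_eal_init_setup(): must run before any threads exist. */
static void dependent_setup(void) { }

/* Stand-in for rte_eal_init_async_setup(): device probe/FLR in a thread. */
static void *independent_setup(void *arg)
{
	(void)arg;
	atomic_store(&probe_done, 1);
	return NULL;
}

/* Stand-in for rte_pktmbuf_pool_create(), overlapped with the probe. */
static void create_pools(void)
{
	atomic_store(&pools_done, 1);
}

static int parallel_init(void)
{
	pthread_t t;

	dependent_setup();                      /* stage 1: prerequisite work */
	if (pthread_create(&t, NULL, independent_setup, NULL) != 0)
		return -1;
	create_pools();                         /* overlaps with stage 2 */
	pthread_join(t, NULL);                  /* analogue of
						 * rte_eal_init_wait_async_setup_complete() */
	return 0;
}
```

The claimed saving comes from overlapping the two slow phases (probe/FLR and mempool creation) instead of running them back to back.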
RE: [EXT] Re: [PATCH] app/testpmd: add command to get Tx queue used count
-- Please document this command in doc/guides/testpmd_app_ug/testpmd_funcs.rst >> Thanks, will update in next version Also why not extend "show port rxq xxx" command to support txq ? >> txq and rxq are different directions, so I extended the "show port" command to support txq, similar to the rxq command. Could you please give more details if I missed something here. Thanks, Satha On 2024/1/24 20:18, skotesh...@marvell.com wrote: > From: Satha Rao > > Fastpath API to get txq used count. > >testpmd> show port 0 txq 0 desc count > > Signed-off-by: Satha Rao > --- > app/test-pmd/cmdline.c | 78 ++ > 1 file changed, 78 insertions(+) > > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index > f704319..1d09633 100644 > --- a/app/test-pmd/cmdline.c > +++ b/app/test-pmd/cmdline.c > @@ -12638,6 +12638,83 @@ struct cmd_show_port_supported_ptypes_result { > }, > }; > > +/* *** display tx queue desc used count *** */ struct > +cmd_show_tx_queue_desc_count_result { > + cmdline_fixed_string_t cmd_show; > + cmdline_fixed_string_t cmd_port; > + cmdline_fixed_string_t cmd_txq; > + cmdline_fixed_string_t cmd_desc; > + cmdline_fixed_string_t cmd_count; > + portid_t cmd_pid; > + portid_t cmd_qid; > +}; > + > +static void > +cmd_show_tx_queue_desc_count_parsed(void *parsed_result, > + __rte_unused struct cmdline *cl, > + __rte_unused void *data) > +{ > + struct cmd_show_tx_queue_desc_count_result *res = parsed_result; > + int rc; > + > + if (rte_eth_tx_queue_is_valid(res->cmd_pid, res->cmd_qid) != 0) { > + fprintf(stderr, "Invalid input: port id = %d, queue id = %d\n", > res->cmd_pid, > + res->cmd_qid); > + return; > + } > + > + rc = rte_eth_tx_queue_count(res->cmd_pid, res->cmd_qid); > + if (rc < 0) { > + fprintf(stderr, "Tx queue count get failed rc=%d > queue_id=%d\n", rc, res->cmd_qid); > + return; > + } > + printf("TxQ %d used desc count = %d\n", res->cmd_qid, rc); } > + > +static cmdline_parse_token_string_t cmd_show_tx_queue_desc_count_show = > + TOKEN_STRING_INITIALIZER > + 
(struct cmd_show_tx_queue_desc_count_result, > + cmd_show, "show"); > +static cmdline_parse_token_string_t cmd_show_tx_queue_desc_count_port = > + TOKEN_STRING_INITIALIZER > + (struct cmd_show_tx_queue_desc_count_result, > + cmd_port, "port"); > +static cmdline_parse_token_num_t cmd_show_tx_queue_desc_count_pid = > + TOKEN_NUM_INITIALIZER > + (struct cmd_show_tx_queue_desc_count_result, > + cmd_pid, RTE_UINT16); > +static cmdline_parse_token_string_t cmd_show_tx_queue_desc_count_txq = > + TOKEN_STRING_INITIALIZER > + (struct cmd_show_tx_queue_desc_count_result, > + cmd_txq, "txq"); > +static cmdline_parse_token_num_t cmd_show_tx_queue_desc_count_qid = > + TOKEN_NUM_INITIALIZER > + (struct cmd_show_tx_queue_desc_count_result, > + cmd_qid, RTE_UINT16); > +static cmdline_parse_token_string_t cmd_show_tx_queue_desc_count_desc = > + TOKEN_STRING_INITIALIZER > + (struct cmd_show_tx_queue_desc_count_result, > + cmd_desc, "desc"); > +static cmdline_parse_token_string_t cmd_show_tx_queue_desc_count_count = > + TOKEN_STRING_INITIALIZER > + (struct cmd_show_tx_queue_desc_count_result, > + cmd_count, "count"); > +static cmdline_parse_inst_t cmd_show_tx_queue_desc_count = { > + .f = cmd_show_tx_queue_desc_count_parsed, > + .data = NULL, > + .help_str = "show port txq desc count", > + .tokens = { > + (void *)&cmd_show_tx_queue_desc_count_show, > + (void *)&cmd_show_tx_queue_desc_count_port, > + (void *)&cmd_show_tx_queue_desc_count_pid, > + (void *)&cmd_show_tx_queue_desc_count_txq, > + (void *)&cmd_show_tx_queue_desc_count_qid, > + (void *)&cmd_show_tx_queue_desc_count_desc, > + (void *)&cmd_show_tx_queue_desc_count_count, > + NULL, > + }, > +}; > + > /* *** display rx/tx descriptor status *** */ struct > cmd_show_rx_tx_desc_status_result { > cmdline_fixed_string_t cmd_show; > @@ -13346,6 +13423,7 @@ struct cmd_config_tx_affinity_map { > (cmdline_parse_inst_t *)&cmd_show_tx_metadata, > (cmdline_parse_inst_t *)&cmd_show_rx_tx_desc_status, > (cmdline_parse_inst_t 
*)&cmd_show_rx_queue_desc_used_count, > + (cmdline_parse_inst_t *)&cmd_show_tx_queue_desc_count, > (cmdline_parse_inst_t *)&cmd_set_raw, > (cmdline_parse_inst_t *)&cmd_show_set_raw, > (cmdline_parse_inst_t *)&cmd_show_set_raw_all, >
Re: Potential RTE bitset RFC
On 2024-01-29 04:02, fengchengwen wrote: Hi, On 2024/1/28 21:52, Morten Brørup wrote: From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] Sent: Saturday, 27 January 2024 19.32 Hi. The new timer RFC ("htimer") I submitted last year also included a new bitset API. https://patchwork.dpdk.org/project/dpdk/patch/20230315170342.214127-2- mattias.ronnb...@ericsson.com/ My experience is that multi-word bitsets are often useful. Examples from DPDK are rte_service.c and DSW its "service ports" bitset (both have 64 as a hard upper limit). Small, but multi-word, bitsets are not particularly hard to open-code, but then you end up with a lot of duplication. I wanted to ask if there is an interest in seeing a bitset API (as per my patchset) in DPDK. Absolutely! Your bitset patch seems very complete, with test cases and all. Let's standardize on this, so we can avoid variants of similar code all over the place. The bitmap (lib/eal/include/rte_bitmap.h) provides a subset of this bitset library. Maybe it's better to extend the bitmap library. Thanks. RTE bitmap is for large bitsets. This library is for small bitsets where performance is crucial (i.e., you can't live with the "extra" spatial and/or temporal overhead of rte_bitmap.h). Stephen Hemminger suggested supporting atomic operations in this API. That would make sense to me as well. void rte_bitset_atomic_set(uint64_t *bitset, size_t bit_num, int memory_model); or, alternatively, you could provide only a get/set with a relaxed memory model and lose the memory model parameter. Upstreaming htimer, including having it replace today's rte_timer is more work than I can commit to, so I think you won't get RTE bitset that way any time soon. Thanks for the update regarding the htimer progress. :-) I certainly don't object to a dedicated fast path library for high-volume timers, such as those in a TCP/IP (or QUIC/IP) stack. In my opinion, the existing rte_timer library can be improved at a later stage, if anybody cares.
It's a shame if that requirement is holding back the addition of a new and useful library. -Morten
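As a sketch of what the relaxed-ordering variant discussed above could look like with C11 atomics — these names are hypothetical illustrations, not the actual rte_bitset API, which may differ if such an extension is ever merged:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relaxed atomic bit operations on a multi-word bitset. */
#define BITSET_WORD_BITS 64

static inline void
bitset_atomic_set(_Atomic uint64_t *bitset, size_t bit_num)
{
	uint64_t mask = UINT64_C(1) << (bit_num % BITSET_WORD_BITS);

	/* Relaxed ordering: the bit becomes visible atomically, but no
	 * ordering is implied with respect to other memory accesses. */
	atomic_fetch_or_explicit(&bitset[bit_num / BITSET_WORD_BITS],
				 mask, memory_order_relaxed);
}

static inline int
bitset_atomic_test(_Atomic uint64_t *bitset, size_t bit_num)
{
	uint64_t word = atomic_load_explicit(
		&bitset[bit_num / BITSET_WORD_BITS], memory_order_relaxed);

	return (word >> (bit_num % BITSET_WORD_BITS)) & 1;
}
```

Dropping the memory-model parameter, as suggested, keeps the signature simple; callers needing stronger ordering would pair the relaxed operation with an explicit fence.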
Re: Potential RTE bitset RFC
On 2024-01-28 14:52, Morten Brørup wrote: From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] Sent: Saturday, 27 January 2024 19.32 Hi. The new timer RFC ("htimer") I submitted last year also included a new bitset API. https://patchwork.dpdk.org/project/dpdk/patch/20230315170342.214127-2- mattias.ronnb...@ericsson.com/ My experience is that multi-word bitsets are often useful. Examples from DPDK are rte_service.c and DSW its "service ports" bitset (both have 64 as a hard upper limit). Small, but multi-word, bitsets are not particularly hard to open-code, but then you end up with a lot of duplication. I wanted to ask if there is an interest in seeing a bitset API (as per my patchset) in DPDK. Absolutely! Your bitset patch seems very complete, with test cases and all. Let's standardize on this, so we can avoid variants of similar code all over the place. Upstreaming htimer, including having it replace today's rte_timer is more work than I can commit to, so I think you won't get RTE bitset that way any time soon. Thanks for the update regarding the htimer progress. :-) I certainly don't object to a dedicated fast path library for high-volume timers, such as those in a TCP/IP (or QUIC/IP) stack. In my opinion, the existing rte_timer library can be improved at a later stage, if anybody cares. It's a shame if that requirement is holding back the addition of a new and useful library. You could just add the core HTW parts of the htimer library to DPDK as a new library (and leave out the rest of htimer), but in that case you want to tailor this API to fit a future HTW-based rte_timer implementation. Without actually implementing such a replacement, it's hard to know exactly what properties you want from the HTW API/implementation. Therefore, I think you should do both at the same time.
RE: Potential RTE bitset RFC
> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > Sent: Monday, 29 January 2024 07.52 > > On 2024-01-28 14:52, Morten Brørup wrote: > >> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > >> Sent: Saturday, 27 January 2024 19.32 > >> > >> Hi. > >> > >> The new timer RFC ("htimer") I submitted last year also included a > new > >> bitset API. > >> > >> https://patchwork.dpdk.org/project/dpdk/patch/20230315170342.214127- > 2- > >> mattias.ronnb...@ericsson.com/ > >> > >> My experience is that multi-word bitsets are often useful. Examples > >> from > >> DPDK are rte_service.c and DSW its "service ports" bitset (both have > 64 > >> as a hard upper limit). Small, but multi-word, bitsets are not > >> particularly hard to open-code, but then you end up with a lot of > >> duplication. > >> > >> I wanted to ask if there is an interest in seeing a bitset API (as > per > >> my patchset) in DPDK. > > > > Absolutely! > > Your bitset patch seems very complete, with test cases and all. > > Let's standardize on this, so we can avoid variants of similar code > all over the place. > > > >> > >> Upstreaming htimer, including having it replace today's rte_timer is > >> more work than I can commit to, so I think you won't get RTE bitset > >> that > >> way any time soon. > > > > Thanks for the update regarding the htimer progress. :-) > > > > I certainly don't object to a dedicated fast path library for high- > volume timers, such as those in a TCP/IP (or QUIC/IP) stack. > > > > In my opinion, the existing rte_timer library can be improved at a > later stage, if anybody cares. It's a shame if that requirement is > holding back the addition of a new and useful library. > > > > You could just add the core HTW parts of the htimer library to DPDK as > a > new library (and leave out the rest of htimer), but in that case you > want to tailor this API to fit a future HTW-based rte_timer > implementation. 
Without actually implementing such a replacement, it's > hard to know exactly what properties you want from the HTW > API/implementation. > > Therefor, I think you should do both at the same time. We have other categories of libraries with separate APIs for variants, e.g. rte_hash and rte_fbk_hash. So we could also have two APIs for different timer library variants, although I might be alone in the DPDK community with this opinion regarding timer libraries. From a high level perspective, I agree that a more unified API is preferable. If you consider a long term road map leading to a unified API more of a "must have" than a "nice to have", it makes really good sense to think that through before contributing new components, and I will not press for a core HTW library. PS: If DPDK was written in C++, I would generally press for common superclass templates and be opposed to multiple standalone libraries with similar properties. But it's not. And sometimes purpose-specific variants of otherwise similar libraries do make sense, especially in the fast path, where every cycle is precious!
Re: [dpdk-dev] [PATCH v4] eal: refactor rte_eal_init into sub-functions
On Mon, Jan 29, 2024 at 6:35 AM Rahul Gupta wrote: > > Looking at what this patch does.. I am under the impression all you > > really need is rte_eal_init without initial probing. > > Such behavior can probably be achieved with a allowlist set to a non > > existing device (like for example "-a :00:00.0"), then later, use > > device hotplug. > The patch will be useful to all the adapters irrespective of their > host plug support. I did not say hotplug support is needed. If what I described already works, this patch adds nothing. -- David Marchand