[v1 1/4] lib: add optional fields in GRE header
The GRE header has optional fields (checksum/key/sequence); this patch
adds structure definitions for these optional fields.

Signed-off-by: Sean Zhang
---
 lib/net/rte_gre.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/lib/net/rte_gre.h b/lib/net/rte_gre.h
index 5897756..5113e79 100644
--- a/lib/net/rte_gre.h
+++ b/lib/net/rte_gre.h
@@ -48,6 +48,27 @@ struct rte_gre_hdr {
 	uint16_t proto;  /**< Protocol Type */
 } __rte_packed;
 
+/**
+ * Optional field checksum in GRE header
+ */
+struct rte_gre_hdr_opt_checksum {
+	rte_be16_t checksum;
+} __rte_packed;
+
+/**
+ * Optional field key in GRE header
+ */
+struct rte_gre_hdr_opt_key {
+	rte_be32_t key;
+} __rte_packed;
+
+/**
+ * Optional field sequence in GRE header
+ */
+struct rte_gre_hdr_opt_sequence {
+	rte_be32_t sequence;
+} __rte_packed;
+
 #ifdef __cplusplus
 }
 #endif
--
1.8.3.1
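[Editor's note] For context, here is a minimal sketch (not part of the patch) of how these definitions compose with the base header when parsing a received packet. It assumes the C/K/S flag bits of struct rte_gre_hdr are named `c`, `k` and `s`; per RFC 2784, two reserved bytes follow the checksum when the C bit is set:

#include <stdint.h>

#include <rte_gre.h>

/* Return a pointer to the payload following a GRE header, skipping
 * whichever optional fields the C/K/S flags announce.
 */
static inline const uint8_t *
gre_payload(const struct rte_gre_hdr *gre)
{
	const uint8_t *p = (const uint8_t *)(gre + 1);

	if (gre->c) /* checksum + 2 reserved bytes (RFC 2784) */
		p += sizeof(struct rte_gre_hdr_opt_checksum) + 2;
	if (gre->k) /* key (RFC 2890) */
		p += sizeof(struct rte_gre_hdr_opt_key);
	if (gre->s) /* sequence number (RFC 2890) */
		p += sizeof(struct rte_gre_hdr_opt_sequence);
	return p;
}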
[v1 0/4] Add support for GRE optional fields matching
This patch set adds support for matching the optional fields of the GRE
header. The optional fields are checksum, key and sequence number.

Currently, the key field can be matched with the gre_key pattern item
('.. / gre / gre_key value is xx / ..'), which maps to field gre_key in
misc, but misc does not support matching the GRE checksum and sequence
number. To match the checksum and sequence number fields, rdma-core
needs the misc5 capability and support for tunnel_header 0-3, since the
hardware matches checksum with tunnel_header1, key with tunnel_header2
and sequence with tunnel_header3. If checksum and sequence number are
not present in the pattern, misc is used for the matching as before.

Applications can still use the gre_key item ('gre_key value is xx') for
key matching; the effect is the same as using 'gre_option key is xx'.
When using the gre_option item, the flags in the gre item should be set
correspondingly. For example, when matching the checksum with
gre_option, the c_bit should be set to '1'
(.. / gre c_bit is 1 / gre_option checksum is xx / ..).

Sean Zhang (4):
  lib: add optional fields in GRE header
  ethdev: support GRE optional fields
  app/testpmd: add gre_option item command
  net/mlx5: support matching optional fields of GRE

 app/test-pmd/cmdline_flow.c                 |  59 ++++++++
 doc/guides/prog_guide/rte_flow.rst          |  17 +++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |   6 ++
 drivers/common/mlx5/mlx5_devx_cmds.c        |   3 +
 drivers/net/mlx5/linux/mlx5_os.c            |   2 +
 drivers/net/mlx5/mlx5.h                     |   1 +
 drivers/net/mlx5/mlx5_flow.c                | 106 ++++++++++++
 drivers/net/mlx5/mlx5_flow.h                |   6 ++
 drivers/net/mlx5/mlx5_flow_dv.c             | 146 ++++++++++++++++++
 lib/ethdev/rte_flow.c                       |   1 +
 lib/ethdev/rte_flow.h                       |  19 +++
 lib/net/rte_gre.h                           |  21 +++
 12 files changed, 387 insertions(+)

--
1.8.3.1
[v1 3/4] app/testpmd: add gre_option item command
Add a gre_option command for matching the optional fields
(checksum/key/sequence) in the GRE header. The item must follow the gre
item; it does not change the flags in the gre item, so the application
should set those flags correspondingly. Applications can still use the
gre_key item ('gre_key value is xx') for key matching; the effect is the
same as using 'gre_option key is xx'.

The examples for gre_option are as follows:

To match on checksum field with value 0x11:

testpmd> ... pattern / eth / gre c_bit is 1 / gre_option checksum is 0x11 / end ..

To match on checksum field with value 0x11 and any value of key:

testpmd> ... pattern / eth / gre c_bit is 1 k_bit is 1 / gre_option checksum is 0x11 / end ..

To match on checksum field with value 0x11 and no key field in packet:

testpmd> ... pattern / eth / gre c_bit is 1 k_bit is 0 / gre_option checksum is 0x11 / end ..

Invalid patterns for gre_option are as follows:

testpmd> ... pattern / eth / gre / gre_option checksum is 0x11 / end ..
(c_bit in gre item not present)

testpmd> ... pattern / eth / gre c_bit is 0 / gre_option checksum is 0x11 / end ..
(c_bit is unset in the gre item, but checksum is specified by the gre_option item)

Signed-off-by: Sean Zhang
---
 app/test-pmd/cmdline_flow.c                 | 59 +++++++++++++++++++++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  6 +++
 2 files changed, 65 insertions(+)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 5c2bba4..b7aacac 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -271,6 +271,10 @@ enum index {
 	ITEM_META_DATA,
 	ITEM_GRE_KEY,
 	ITEM_GRE_KEY_VALUE,
+	ITEM_GRE_OPTION,
+	ITEM_GRE_OPTION_CHECKSUM,
+	ITEM_GRE_OPTION_KEY,
+	ITEM_GRE_OPTION_SEQUENCE,
 	ITEM_GTP_PSC,
 	ITEM_GTP_PSC_QFI,
 	ITEM_GTP_PSC_PDU_T,
@@ -1042,6 +1046,7 @@ struct parse_action_priv {
 	ITEM_ICMP6_ND_OPT_TLA_ETH,
 	ITEM_META,
 	ITEM_GRE_KEY,
+	ITEM_GRE_OPTION,
 	ITEM_GTP_PSC,
 	ITEM_PPPOES,
 	ITEM_PPPOED,
@@ -1232,6 +1237,14 @@ struct parse_action_priv {
 	ZERO,
 };
 
+static const enum index item_gre_option[] = {
+	ITEM_GRE_OPTION_CHECKSUM,
+	ITEM_GRE_OPTION_KEY,
+	ITEM_GRE_OPTION_SEQUENCE,
+	ITEM_NEXT,
+	ZERO,
+};
+
 static const enum index item_gtp[] = {
 	ITEM_GTP_FLAGS,
 	ITEM_GTP_MSG_TYPE,
@@ -3479,6 +3492,38 @@ static int comp_set_modify_field_id(struct context *, const struct token *,
 			     item_param),
 		.args = ARGS(ARG_ENTRY_HTON(rte_be32_t)),
 	},
+	[ITEM_GRE_OPTION] = {
+		.name = "gre_option",
+		.help = "match GRE optional fields",
+		.priv = PRIV_ITEM(GRE_OPTION,
+				  sizeof(struct rte_flow_item_gre_opt)),
+		.next = NEXT(item_gre_option),
+		.call = parse_vc,
+	},
+	[ITEM_GRE_OPTION_CHECKSUM] = {
+		.name = "checksum",
+		.help = "match GRE checksum",
+		.next = NEXT(item_gre_option, NEXT_ENTRY(COMMON_UNSIGNED),
+			     item_param),
+		.args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_gre_opt,
+					     checksum)),
+	},
+	[ITEM_GRE_OPTION_KEY] = {
+		.name = "key",
+		.help = "match GRE key",
+		.next = NEXT(item_gre_option, NEXT_ENTRY(COMMON_UNSIGNED),
+			     item_param),
+		.args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_gre_opt,
+					     key)),
+	},
+	[ITEM_GRE_OPTION_SEQUENCE] = {
+		.name = "sequence",
+		.help = "match GRE sequence",
+		.next = NEXT(item_gre_option, NEXT_ENTRY(COMMON_UNSIGNED),
+			     item_param),
+		.args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_gre_opt,
+					     sequence)),
+	},
 	[ITEM_GTP_PSC] = {
 		.name = "gtp_psc",
 		.help = "match GTP extension header with type 0x85",
@@ -9235,6 +9280,20 @@ static int comp_set_modify_field_id(struct context *, const struct token *,
 			((const struct rte_flow_item_flex *)
 			item->spec)->length : 0;
 		break;
+	case RTE_FLOW_ITEM_TYPE_GRE_OPTION:
+		size = 0;
+		if (item->spec) {
+			const struct rte_flow_item_gre_opt
+				*opt = item->spec;
+			if (opt->checksum.checksum)
+				size += 4;
+			if (opt->key.key)
[v1 2/4] ethdev: support GRE optional fields
Add flow pattern items and the header format for matching the optional
fields (checksum/key/sequence) in the GRE header. The flags in the gre
item should be set correspondingly when the newly added items are used.

Signed-off-by: Sean Zhang
---
 doc/guides/prog_guide/rte_flow.rst | 17 +++++++++++++++++
 lib/ethdev/rte_flow.c              |  1 +
 lib/ethdev/rte_flow.h              | 19 +++++++++++++++++++
 3 files changed, 37 insertions(+)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index b4aa9c4..0e47501 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -1106,6 +1106,7 @@ Matches a GRE header.
 Item: ``GRE_KEY``
 ^^^^^^^^^^^^^^^^^
 
+This item is deprecated. Consider `Item: GRE_OPTION`.
 Matches a GRE key field.
 This should be preceded by item ``GRE``.
 
@@ -1113,6 +1114,22 @@ This should be preceded by item ``GRE``.
 - Value to be matched is a big-endian 32 bit integer.
 - When this item present it implicitly match K bit in default mask as "1"
 
+Item: ``GRE_OPTION``
+^^^^^^^^^^^^^^^^^^^^
+
+Matches GRE optional fields (checksum/key/sequence).
+This should be preceded by item ``GRE``.
+
+- ``checksum``: checksum.
+- ``key``: key.
+- ``sequence``: sequence.
+- The items in GRE_OPTION do not change the bit flags (c_bit/k_bit/s_bit) of
+  the GRE item; the flags need to be set in the GRE item by the application.
+  When the items are present, the corresponding bits in the GRE spec and mask
+  should be set to "1" by the application, meaning a match on the specified
+  value of the fields. When the items are not present but the corresponding
+  bits in the GRE spec and mask are "1", any value of the fields is matched.
+
 Item: ``FUZZY``
 ^^^^^^^^^^^^^^^
 
diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
index a93f68a..7f93900 100644
--- a/lib/ethdev/rte_flow.c
+++ b/lib/ethdev/rte_flow.c
@@ -139,6 +139,7 @@ struct rte_flow_desc_data {
 	MK_FLOW_ITEM(META, sizeof(struct rte_flow_item_meta)),
 	MK_FLOW_ITEM(TAG, sizeof(struct rte_flow_item_tag)),
 	MK_FLOW_ITEM(GRE_KEY, sizeof(rte_be32_t)),
+	MK_FLOW_ITEM(GRE_OPTION, sizeof(struct rte_flow_item_gre_opt)),
 	MK_FLOW_ITEM(GTP_PSC, sizeof(struct rte_flow_item_gtp_psc)),
 	MK_FLOW_ITEM(PPPOES, sizeof(struct rte_flow_item_pppoe)),
 	MK_FLOW_ITEM(PPPOED, sizeof(struct rte_flow_item_pppoe)),
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index 1031fb2..db58b47 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -37,6 +37,7 @@
 #include
 #include
 #include
+#include <rte_gre.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -660,6 +661,13 @@ enum rte_flow_item_type {
 	 * See struct rte_flow_item_ppp.
 	 */
 	RTE_FLOW_ITEM_TYPE_PPP,
+
+	/**
+	 * Matches GRE optional fields.
+	 *
+	 * See struct rte_flow_item_gre_opt.
+	 */
+	RTE_FLOW_ITEM_TYPE_GRE_OPTION,
 };
 
 /**
@@ -1196,6 +1204,17 @@ struct rte_flow_item_gre {
 #endif
 
 /**
+ * RTE_FLOW_ITEM_TYPE_GRE_OPTION.
+ *
+ * Matches GRE optional fields in header.
+ */
+struct rte_flow_item_gre_opt {
+	struct rte_gre_hdr_opt_checksum checksum;
+	struct rte_gre_hdr_opt_key key;
+	struct rte_gre_hdr_opt_sequence sequence;
+};
+
+/**
  * RTE_FLOW_ITEM_TYPE_FUZZY
  *
  * Fuzzy pattern match, expect faster than default.
--
1.8.3.1
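[Editor's note] To make the intended usage concrete, below is a hedged sketch (not part of the patch) of a rule matching a GRE checksum of 0x11 with the new item; the outer IPv4 layer and the QUEUE action are arbitrary choices for the example:

#include <rte_byteorder.h>
#include <rte_flow.h>

/* Rough equivalent of the testpmd pattern
 * ".. / gre c_bit is 1 / gre_option checksum is 0x11 / ..".
 */
static struct rte_flow *
create_gre_checksum_rule(uint16_t port_id, struct rte_flow_error *error)
{
	static const struct rte_flow_attr attr = { .ingress = 1 };
	/* The C bit is the most significant bit of c_rsvd0_ver. */
	static const struct rte_flow_item_gre gre_spec = {
		.c_rsvd0_ver = RTE_BE16(0x8000),
	};
	static const struct rte_flow_item_gre gre_mask = {
		.c_rsvd0_ver = RTE_BE16(0x8000),
	};
	static const struct rte_flow_item_gre_opt opt_spec = {
		.checksum.checksum = RTE_BE16(0x0011),
	};
	static const struct rte_flow_item_gre_opt opt_mask = {
		.checksum.checksum = RTE_BE16(0xffff),
	};
	const struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH },
		{ .type = RTE_FLOW_ITEM_TYPE_IPV4 },
		{ .type = RTE_FLOW_ITEM_TYPE_GRE,
		  .spec = &gre_spec, .mask = &gre_mask },
		{ .type = RTE_FLOW_ITEM_TYPE_GRE_OPTION,
		  .spec = &opt_spec, .mask = &opt_mask },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	static const struct rte_flow_action_queue queue = { .index = 0 };
	const struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

	return rte_flow_create(port_id, &attr, pattern, actions, error);
}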
[v1 4/4] net/mlx5: support matching optional fields of GRE
This patch adds matching on the optional fields (checksum/key/sequence)
of the GRE header. Matching the checksum and sequence fields requires
support from rdma-core, with the misc5 capability and tunnel_header 0-3.
For patterns without checksum and sequence specified, keep using misc
for the matching as before; for patterns with checksum or sequence,
validate the capability first and then use misc5 for the matching.

Signed-off-by: Sean Zhang
---
 drivers/common/mlx5/mlx5_devx_cmds.c |   3 +
 drivers/net/mlx5/linux/mlx5_os.c     |   2 +
 drivers/net/mlx5/mlx5.h              |   1 +
 drivers/net/mlx5/mlx5_flow.c         | 106 ++++++++++++++++
 drivers/net/mlx5/mlx5_flow.h         |   6 ++
 drivers/net/mlx5/mlx5_flow_dv.c      | 146 ++++++++++++++++++++
 6 files changed, 264 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 7cd3d4f..5d21480 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -1027,6 +1027,9 @@ struct mlx5_devx_obj *
 	attr->flow.tunnel_header_0_1 = MLX5_GET
 		(flow_table_nic_cap, hcattr,
 		 ft_field_support_2_nic_receive.tunnel_header_0_1);
+	attr->flow.tunnel_header_2_3 = MLX5_GET
+		(flow_table_nic_cap, hcattr,
+		 ft_field_support_2_nic_receive.tunnel_header_2_3);
 	attr->pkt_integrity_match = mlx5_devx_query_pkt_integrity_match(hcattr);
 	attr->inner_ipv4_ihl = MLX5_GET
 		(flow_table_nic_cap, hcattr,
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 36f0fbf..6dbbb1f 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1385,6 +1385,8 @@
 		}
 		if (config->hca_attr.flow.tunnel_header_0_1)
 			sh->tunnel_header_0_1 = 1;
+		if (config->hca_attr.flow.tunnel_header_2_3)
+			sh->tunnel_header_2_3 = 1;
 #endif
 #ifdef HAVE_MLX5_DR_CREATE_ACTION_ASO
 		if (config->hca_attr.flow_hit_aso &&
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index b55f581..32b1c7b 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1150,6 +1150,7 @@ struct mlx5_dev_ctx_shared {
 	uint32_t meter_aso_en:1; /* Flow Meter ASO is supported. */
 	uint32_t ct_aso_en:1; /* Connection Tracking ASO is supported. */
 	uint32_t tunnel_header_0_1:1; /* tunnel_header_0_1 is supported. */
+	uint32_t tunnel_header_2_3:1; /* tunnel_header_2_3 is supported. */
 	uint32_t misc5_cap:1; /* misc5 matcher parameter is supported. */
 	uint32_t reclaim_mode:1; /* Reclaim memory. */
 	uint32_t dr_drop_action_en:1; /* Use DR drop action. */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index b7cf414..9e608ba 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -2802,6 +2802,112 @@ struct mlx5_flow_tunnel_info {
 }
 
 /**
+ * Validate GRE optional item.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] item
+ *   Item specification.
+ * @param[in] item_flags
+ *   Bit flags to mark detected items.
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] gre_item
+ *   Pointer to gre_item.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_flow_validate_item_gre_option(struct rte_eth_dev *dev,
+				   const struct rte_flow_item *item,
+				   uint64_t item_flags,
+				   const struct rte_flow_attr *attr,
+				   const struct rte_flow_item *gre_item,
+				   struct rte_flow_error *error)
+{
+	const struct rte_flow_item_gre *gre_spec = gre_item->spec;
+	const struct rte_flow_item_gre *gre_mask = gre_item->mask;
+	const struct rte_flow_item_gre_opt *spec = item->spec;
+	const struct rte_flow_item_gre_opt *mask = item->mask;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	int ret = 0;
+
+	if (!(item_flags & MLX5_FLOW_LAYER_GRE))
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ITEM, item,
+					  "No preceding GRE header");
+	if (item_flags & MLX5_FLOW_LAYER_INNER)
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ITEM, item,
+					  "GRE option following a wrong item");
+	if (!gre_mask)
+		gre_mask = &rte_flow_item_gre_mask;
+
+	struct rte_flow_item_gre_opt gre_option_default_mask = {
+		.checksum = {
+			.checksum = 0xffff,
+		},
Re: [PATCH v2 7/9] vhost: remove multi-line logs
Hi Chenbo,

On 1/26/22 04:27, Xia, Chenbo wrote:
> Hi Maxime,
>
>> -----Original Message-----
>> From: Maxime Coquelin
>> Sent: Tuesday, January 25, 2022 7:25 PM
>> To: dev@dpdk.org; Xia, Chenbo; david.march...@redhat.com
>> Cc: Maxime Coquelin
>> Subject: [PATCH v2 7/9] vhost: remove multi-line logs
>>
>> This patch replaces multi-line logs with multiple single-line logs in
>> order to ease logs filtering based on their socket path.
>>
>> Signed-off-by: Maxime Coquelin
>> ---
>>  lib/vhost/socket.c     |  10 ++--
>>  lib/vhost/vhost.c      |   8 ++--
>>  lib/vhost/vhost_user.c | 106 +++--
>>  3 files changed, 60 insertions(+), 64 deletions(-)
>
> ...
>
>> @@ -1122,8 +1121,7 @@ vhost_user_postcopy_region_register(struct virtio_net *dev,
>>
>>  	if (ioctl(dev->postcopy_ufd, UFFDIO_REGISTER, &reg_struct)) {
>> -		VHOST_LOG_CONFIG(ERR, "(%s) failed to register ufd for region "
>> -				"%" PRIx64 " - %" PRIx64 " (ufd = %d) %s\n",
>> +		VHOST_LOG_CONFIG(ERR, "(%s) failed to register ufd for region %" PRIx64 " - %" PRIx64 " (ufd = %d) %s\n",
>
> This line is > 100 chars, and I think the original one is fine for log
> filtering or using log to search code. What do you think?

Makes sense, I will revert this change in v3.

Thanks,
Maxime
RE: [PATCH v2 01/10] ethdev: introduce flow pre-configuration hints
> -Original Message- > From: Bruce Richardson > Sent: Tuesday, January 25, 2022 8:14 PM > To: Ori Kam > Subject: Re: [PATCH v2 01/10] ethdev: introduce flow pre-configuration hints > > On Tue, Jan 25, 2022 at 06:09:42PM +, Bruce Richardson wrote: > > On Tue, Jan 25, 2022 at 03:58:45PM +, Ori Kam wrote: > > > Hi Bruce, > > > > > > > -Original Message- From: Bruce Richardson > > > > Sent: Monday, January 24, 2022 8:09 PM > > > > Subject: Re: [PATCH v2 01/10] ethdev: introduce flow > > > > pre-configuration hints > > > > > > > > On Mon, Jan 24, 2022 at 11:16:15PM +0530, Jerin Jacob wrote: > > > > > On Mon, Jan 24, 2022 at 11:05 PM Thomas Monjalon > > > > > wrote: > > > > > > > > > > > > 24/01/2022 15:36, Jerin Jacob: > > > > > > > On Tue, Jan 18, 2022 at 9:01 PM Alexander Kozyrev > > > > > > > wrote: > > > > > > > > +struct rte_flow_port_attr { + /** +* Version > > > > > > > > of the struct layout, should be 0. +*/ + > > > > > > > > uint32_t version; > > > > > > > > > > > > > > Why version number? Across DPDK, we are using dynamic function > > > > > > > versioning, I think, that would be sufficient for ABI > > > > > > > versioning > > > > > > > > > > > > Function versioning is not ideal when the structure is accessed > > > > > > in many places like many drivers and library functions. > > > > > > > > > > > > The idea of this version field (which can be a bitfield) is to > > > > > > update it when some new features are added, so the users of the > > > > > > struct can check if a feature is there before trying to use it. > > > > > > It means a bit more code in the functions, but avoid duplicating > > > > > > functions as in function versioning. > > > > > > > > > > > > Another approach was suggested by Bruce, and applied to dmadev. > > > > > > It is assuming we only add new fields at the end (no removal), > > > > > > and focus on the size of the struct. By passing sizeof as an > > > > > > extra parameter, the function knows which fields are OK to use. > > > > > > Example: > > > > > > http://code.dpdk.org/dpdk/v21.11/source/lib/dmadev/rte_dmadev.c#L476 > > > > > > > > > > + @Richardson, Bruce Either approach is fine, No strong opinion. > > > > > We can have one approach and use it across DPDK for consistency. > > > > > > > > > > > > > In general I prefer the size-based approach, mainly because of its > > > > simplicity. However, some other reasons why we may want to choose it: > > > > > > > > * It's completely hidden from the end user, and there is no need for > > > > an extra struct field that needs to be filled in > > > > > > > > * Related to that, for the version-field approach, if the field is > > > > present in a user-allocated struct, then you probably need to start > > > > preventing user error via: - having the external struct not have the > > > > field and use a separate internal struct to add in the version info > > > > after the fact in the versioned function. Alternatively, - provide a > > > > separate init function for each structure to fill in the version > > > > field appropriately > > > > > > > > * In general, using the size-based approach like in the linked > > > > example is more resilient since it's compiler-inserted, so there is > > > > reduced chance of error. > > > > > > > > * A sizeof field allows simple-enough handling in the drivers - > > > > especially since it does not allow removed fields. 
Each driver only > > > > needs to check that the size passed in is greater than that expected, > > > > thereby allowing us to have both updated and non-updated drivers > > > > co-existing simultaneously. [For a version field, the same scheme > > > > could also work if we keep the no-delete rule, but for a bitmask > > > > field, I believe things may get more complex in terms of checking] > > > > > > > > In terms of the limitations of using sizeof - requiring new fields to > > > > always go on the end, and preventing shrinking the struct - I think > > > > that the simplicity gains far outweigh the impact of these > > > > strictions. > > > > > > > > * Adding fields to struct is far more common than wanting to remove > > > > one > > > > > > > > * So long as the added field is at the end, even if the struct size > > > > doesn't change the scheme can still work as the versioned function > > > > for the old struct can ensure that the extra field is appropriately > > > > zeroed (rather than random) on entry into any driver function > > > > > > > > > > Zero can be a valid value so this is may result in an issue. > > > > > > > In this instance, I was using zero as a neutral, default-option value. If > > having zero as the default causes problems, we can always make the > > structure size change to force a new size value. > > > > > > * If we do want to remove a field, the space simply needs to be > > > > marked as reserved in the struct, until the next ABI break release, > > > > when it can be compacted. Again, function versioning can take care of > > > >
RE: [PATCH v1] raw/ifpga: fix ifpga devices cleanup function
> -----Original Message-----
> From: Xu, Rosen
> Sent: Wednesday, January 26, 2022 3:08 PM
> To: Huang, Wei; dev@dpdk.org; Zhang, Qi Z
> Cc: sta...@dpdk.org; Zhang, Tianfei; Yigit, Ferruh
> Subject: RE: [PATCH v1] raw/ifpga: fix ifpga devices cleanup function
>
> Hi,
>
> > -----Original Message-----
> > From: Huang, Wei
> > Sent: Wednesday, January 26, 2022 11:30
> > To: dev@dpdk.org; Xu, Rosen; Zhang, Qi Z
> > Cc: sta...@dpdk.org; Zhang, Tianfei; Yigit, Ferruh; Huang, Wei
> > Subject: [PATCH v1] raw/ifpga: fix ifpga devices cleanup function
> >
> > Use rte_dev_remove() to replace rte_rawdev_pmd_release() in
> > ifpga_rawdev_cleanup(), so that resources occupied by ifpga raw
> > devices, such as threads, can be released correctly.
> >
> > Fixes: f724a802 ("raw/ifpga: add miscellaneous APIs")
> >
> > Signed-off-by: Wei Huang
> > ---
> >  drivers/raw/ifpga/ifpga_rawdev.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/raw/ifpga/ifpga_rawdev.c
> > b/drivers/raw/ifpga/ifpga_rawdev.c
> > index fdf3c23..88c38aa 100644
> > --- a/drivers/raw/ifpga/ifpga_rawdev.c
> > +++ b/drivers/raw/ifpga/ifpga_rawdev.c
> > @@ -1787,12 +1787,14 @@ int ifpga_rawdev_partial_reconfigure(struct
> > rte_rawdev *dev, int port,
> >  void ifpga_rawdev_cleanup(void)
> >  {
> >  	struct ifpga_rawdev *dev;
> > +	struct rte_rawdev *rdev;
> >  	unsigned int i;
> >
> >  	for (i = 0; i < IFPGA_RAWDEV_NUM; i++) {
> >  		dev = &ifpga_rawdevices[i];
> >  		if (dev->rawdev) {
> > -			rte_rawdev_pmd_release(dev->rawdev);
> > +			rdev = dev->rawdev;
> > +			rte_dev_remove(rdev->device);
> >  			dev->rawdev = NULL;
> >  		}
> >  	}
> > --
> > 1.8.3.1
>
> Acked-by: Rosen Xu

Applied to dpdk-next-net-intel.

Thanks
Qi
[PATCH v3 0/9] vhost: improve logging
This series aims at easing Vhost logs analysis by prepending the
Vhost-user socket path to all logs and removing multi-line logs.
Doing so, filtering Vhost-user ports logs is much easier.

Changes in v3:
==============
- Fix various typos reported (Chenbo)
- Revert one multi-line log removal (Chenbo)

Changes in v2:
==============
- Add missing socket paths (David)
- Avoid identical logs in iotlb code (David)
- Use data log type when used in datapath (David)

Maxime Coquelin (9):
  vhost: improve IOTLB logs
  vhost: improve vDPA registration failure log
  vhost: improve Vhost layer logs
  vhost: improve Vhost-user layer logs
  vhost: improve socket layer logs
  vhost: improve Virtio-net layer logs
  vhost: remove multi-line logs
  vhost: differentiate IOTLB logs
  vhost: use proper logging type for data path

 lib/vhost/iotlb.c      |  30 +-
 lib/vhost/iotlb.h      |  10 +-
 lib/vhost/socket.c     | 148 ++++----
 lib/vhost/vdpa.c       |   4 +-
 lib/vhost/vhost.c      | 108 ++++---
 lib/vhost/vhost_user.c | 678 +++++++++++++++++++-----------------
 lib/vhost/vhost_user.h |   4 +-
 lib/vhost/virtio_net.c | 165 +++++-----
 8 files changed, 548 insertions(+), 599 deletions(-)

-- 
2.34.1
[PATCH v3 1/9] vhost: improve IOTLB logs
This patch adds the IOTLB mempool name when logging debug or error
messages, and also prepends the socket path to all the logs.

Signed-off-by: Maxime Coquelin
Reviewed-by: Chenbo Xia
---
 lib/vhost/iotlb.c      | 26 +++++++++++++++-----------
 lib/vhost/iotlb.h      | 10 +++++-----
 lib/vhost/vhost.c      |  2 +-
 lib/vhost/vhost_user.c |  2 +-
 4 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 82bdb84526..afa86d7c2b 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -62,7 +62,7 @@ vhost_user_iotlb_pending_miss(struct vhost_virtqueue *vq, uint64_t iova,
 }
 
 void
-vhost_user_iotlb_pending_insert(struct vhost_virtqueue *vq,
+vhost_user_iotlb_pending_insert(struct virtio_net *dev, struct vhost_virtqueue *vq,
 				uint64_t iova, uint8_t perm)
 {
 	struct vhost_iotlb_entry *node;
@@ -70,14 +70,16 @@ vhost_user_iotlb_pending_insert(struct vhost_virtqueue *vq,
 
 	ret = rte_mempool_get(vq->iotlb_pool, (void **)&node);
 	if (ret) {
-		VHOST_LOG_CONFIG(DEBUG, "IOTLB pool empty, clear entries\n");
+		VHOST_LOG_CONFIG(DEBUG, "(%s) IOTLB pool %s empty, clear entries\n",
+				dev->ifname, vq->iotlb_pool->name);
 		if (!TAILQ_EMPTY(&vq->iotlb_pending_list))
 			vhost_user_iotlb_pending_remove_all(vq);
 		else
 			vhost_user_iotlb_cache_random_evict(vq);
 		ret = rte_mempool_get(vq->iotlb_pool, (void **)&node);
 		if (ret) {
-			VHOST_LOG_CONFIG(ERR, "IOTLB pool still empty, failure\n");
+			VHOST_LOG_CONFIG(ERR, "(%s) IOTLB pool %s still empty, failure\n",
+					dev->ifname, vq->iotlb_pool->name);
 			return;
 		}
 	}
@@ -156,22 +158,25 @@ vhost_user_iotlb_cache_random_evict(struct vhost_virtqueue *vq)
 }
 
 void
-vhost_user_iotlb_cache_insert(struct vhost_virtqueue *vq, uint64_t iova,
-				uint64_t uaddr, uint64_t size, uint8_t perm)
+vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq,
+				uint64_t iova, uint64_t uaddr,
+				uint64_t size, uint8_t perm)
 {
 	struct vhost_iotlb_entry *node, *new_node;
 	int ret;
 
 	ret = rte_mempool_get(vq->iotlb_pool, (void **)&new_node);
 	if (ret) {
-		VHOST_LOG_CONFIG(DEBUG, "IOTLB pool empty, clear entries\n");
+		VHOST_LOG_CONFIG(DEBUG, "(%s) IOTLB pool %s empty, clear entries\n",
+				dev->ifname, vq->iotlb_pool->name);
 		if (!TAILQ_EMPTY(&vq->iotlb_list))
 			vhost_user_iotlb_cache_random_evict(vq);
 		else
 			vhost_user_iotlb_pending_remove_all(vq);
 		ret = rte_mempool_get(vq->iotlb_pool, (void **)&new_node);
 		if (ret) {
-			VHOST_LOG_CONFIG(ERR, "IOTLB pool still empty, failure\n");
+			VHOST_LOG_CONFIG(ERR, "(%s) IOTLB pool %s still empty, failure\n",
+					dev->ifname, vq->iotlb_pool->name);
 			return;
 		}
 	}
@@ -311,7 +316,7 @@ vhost_user_iotlb_init(struct virtio_net *dev, int vq_index)
 
 	snprintf(pool_name, sizeof(pool_name), "iotlb_%u_%d_%d",
 			getpid(), dev->vid, vq_index);
-	VHOST_LOG_CONFIG(DEBUG, "IOTLB cache name: %s\n", pool_name);
+	VHOST_LOG_CONFIG(DEBUG, "(%s) IOTLB cache name: %s\n", dev->ifname, pool_name);
 
 	/* If already created, free it and recreate */
 	vq->iotlb_pool = rte_mempool_lookup(pool_name);
@@ -324,9 +329,8 @@ vhost_user_iotlb_init(struct virtio_net *dev, int vq_index)
 			RTE_MEMPOOL_F_NO_CACHE_ALIGN |
 			RTE_MEMPOOL_F_SP_PUT);
 	if (!vq->iotlb_pool) {
-		VHOST_LOG_CONFIG(ERR,
-				"Failed to create IOTLB cache pool (%s)\n",
-				pool_name);
+		VHOST_LOG_CONFIG(ERR, "(%s) Failed to create IOTLB cache pool %s\n",
+				dev->ifname, pool_name);
 		return -1;
 	}
 
diff --git a/lib/vhost/iotlb.h b/lib/vhost/iotlb.h
index b6e0757ad6..8d0ff7473b 100644
--- a/lib/vhost/iotlb.h
+++ b/lib/vhost/iotlb.h
@@ -33,17 +33,17 @@ vhost_user_iotlb_wr_unlock(struct vhost_virtqueue *vq)
 	rte_rwlock_write_unlock(&vq->iotlb_lock);
 }
 
-void vhost_user_iotlb_cache_insert(struct vhost_virtqueue *vq, uint64_t iova,
-					uint64_t uaddr, uint64_t size,
-					uint8_t perm);
+void vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq,
+					ui
[PATCH v3 3/9] vhost: improve Vhost layer logs
This patch prepends Vhost logs with the Vhost-user socket path when available to ease filtering logs for a given port. Signed-off-by: Maxime Coquelin Reviewed-by: Chenbo Xia --- lib/vhost/vhost.c | 104 +++--- 1 file changed, 51 insertions(+), 53 deletions(-) diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c index e52d7f7bb6..3b05f17a50 100644 --- a/lib/vhost/vhost.c +++ b/lib/vhost/vhost.c @@ -58,9 +58,8 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq, vhost_user_iotlb_pending_insert(dev, vq, iova, perm); if (vhost_user_iotlb_miss(dev, iova, perm)) { - VHOST_LOG_CONFIG(ERR, - "IOTLB miss req failed for IOVA 0x%" PRIx64 "\n", - iova); + VHOST_LOG_CONFIG(ERR, "(%s) IOTLB miss req failed for IOVA 0x%" PRIx64 "\n", + dev->ifname, iova); vhost_user_iotlb_pending_remove(vq, iova, 1, perm); } @@ -126,8 +125,8 @@ __vhost_log_write_iova(struct virtio_net *dev, struct vhost_virtqueue *vq, hva = __vhost_iova_to_vva(dev, vq, iova, &map_len, VHOST_ACCESS_RW); if (map_len != len) { VHOST_LOG_DATA(ERR, - "Failed to write log for IOVA 0x%" PRIx64 ". No IOTLB entry found\n", - iova); + "(%s) failed to write log for IOVA 0x%" PRIx64 ". No IOTLB entry found\n", + dev->ifname, iova); return; } @@ -243,8 +242,8 @@ __vhost_log_cache_write_iova(struct virtio_net *dev, struct vhost_virtqueue *vq, hva = __vhost_iova_to_vva(dev, vq, iova, &map_len, VHOST_ACCESS_RW); if (map_len != len) { VHOST_LOG_DATA(ERR, - "Failed to write log for IOVA 0x%" PRIx64 ". No IOTLB entry found\n", - iova); + "(%s) failed to write log for IOVA 0x%" PRIx64 ". No IOTLB entry found\n", + dev->ifname, iova); return; } @@ -422,9 +421,9 @@ translate_log_addr(struct virtio_net *dev, struct vhost_virtqueue *vq, gpa = hva_to_gpa(dev, hva, exp_size); if (!gpa) { VHOST_LOG_CONFIG(ERR, - "VQ: Failed to find GPA for log_addr: 0x%" + "(%s) failed to find GPA for log_addr: 0x%" PRIx64 " hva: 0x%" PRIx64 "\n", - log_addr, hva); + dev->ifname, log_addr, hva); return 0; } return gpa; @@ -551,16 +550,15 @@ init_vring_queue(struct virtio_net *dev, uint32_t vring_idx) int numa_node = SOCKET_ID_ANY; if (vring_idx >= VHOST_MAX_VRING) { - VHOST_LOG_CONFIG(ERR, - "Failed not init vring, out of bound (%d)\n", - vring_idx); + VHOST_LOG_CONFIG(ERR, "(%s) failed to init vring, out of bound (%d)\n", + dev->ifname, vring_idx); return; } vq = dev->virtqueue[vring_idx]; if (!vq) { - VHOST_LOG_CONFIG(ERR, "Virtqueue not allocated (%d)\n", - vring_idx); + VHOST_LOG_CONFIG(ERR, "(%s) virtqueue not allocated (%d)\n", + dev->ifname, vring_idx); return; } @@ -572,8 +570,8 @@ init_vring_queue(struct virtio_net *dev, uint32_t vring_idx) #ifdef RTE_LIBRTE_VHOST_NUMA if (get_mempolicy(&numa_node, NULL, 0, vq, MPOL_F_NODE | MPOL_F_ADDR)) { - VHOST_LOG_CONFIG(ERR, "(%d) failed to query numa node: %s\n", - dev->vid, rte_strerror(errno)); + VHOST_LOG_CONFIG(ERR, "(%s) failed to query numa node: %s\n", + dev->ifname, rte_strerror(errno)); numa_node = SOCKET_ID_ANY; } #endif @@ -590,15 +588,15 @@ reset_vring_queue(struct virtio_net *dev, uint32_t vring_idx) if (vring_idx >= VHOST_MAX_VRING) { VHOST_LOG_CONFIG(ERR, - "Failed not init vring, out of bound (%d)\n", - vring_idx); + "(%s) failed to reset vring, out of bound (%d)\n", + dev->ifname, vring_idx); return; } vq = dev->virtqueue[vring_idx]; if (!vq) { - VHOST_LOG_CONFIG(ERR, "Virtqueue not allocated (%d)\n", - vring_idx); + VHOST_LOG_CONFIG(ERR, "(%s) failed to reset vring, virtqueue not allocated (%d)\n", + dev->ifname, vring_idx);
[PATCH v3 2/9] vhost: improve vDPA registration failure log
This patch adds the name of the device failing vDPA registration.

Signed-off-by: Maxime Coquelin
Reviewed-by: Chenbo Xia
---
 lib/vhost/vdpa.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/vhost/vdpa.c b/lib/vhost/vdpa.c
index 09ad5d866e..6df2230a67 100644
--- a/lib/vhost/vdpa.c
+++ b/lib/vhost/vdpa.c
@@ -82,8 +82,8 @@ rte_vdpa_register_device(struct rte_device *rte_dev,
 	    !ops->get_protocol_features || !ops->dev_conf ||
 	    !ops->dev_close || !ops->set_vring_state ||
 	    !ops->set_features) {
-		VHOST_LOG_CONFIG(ERR,
-				"Some mandatory vDPA ops aren't implemented\n");
+		VHOST_LOG_CONFIG(ERR, "(%s) Some mandatory vDPA ops aren't implemented\n",
+				rte_dev->name);
 		return NULL;
 	}
 
-- 
2.34.1
[PATCH v3 4/9] vhost: improve Vhost-user layer logs
This patch adds the Vhost-user socket path to Vhost-user layer logs in order to ease logs filtering. Signed-off-by: Maxime Coquelin Reviewed-by: Chenbo Xia --- lib/vhost/vhost_user.c | 573 - 1 file changed, 275 insertions(+), 298 deletions(-) diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c index 6aaac6a316..c95eef8f5d 100644 --- a/lib/vhost/vhost_user.c +++ b/lib/vhost/vhost_user.c @@ -94,7 +94,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = { }; static int send_vhost_reply(int sockfd, struct VhostUserMsg *msg); -static int read_vhost_message(int sockfd, struct VhostUserMsg *msg); +static int read_vhost_message(struct virtio_net *dev, int sockfd, struct VhostUserMsg *msg); static void close_msg_fds(struct VhostUserMsg *msg) @@ -117,14 +117,13 @@ close_msg_fds(struct VhostUserMsg *msg) * close all FDs and return an error if this is not the case. */ static int -validate_msg_fds(struct VhostUserMsg *msg, int expected_fds) +validate_msg_fds(struct virtio_net *dev, struct VhostUserMsg *msg, int expected_fds) { if (msg->fd_num == expected_fds) return 0; - VHOST_LOG_CONFIG(ERR, - " Expect %d FDs for request %s, received %d\n", - expected_fds, + VHOST_LOG_CONFIG(ERR, "(%s) expect %d FDs for request %s, received %d\n", + dev->ifname, expected_fds, vhost_message_str[msg->request.master], msg->fd_num); @@ -144,7 +143,7 @@ get_blk_size(int fd) } static int -async_dma_map(struct rte_vhost_mem_region *region, bool do_map) +async_dma_map(struct virtio_net *dev, struct rte_vhost_mem_region *region, bool do_map) { uint64_t host_iova; int ret = 0; @@ -172,7 +171,7 @@ async_dma_map(struct rte_vhost_mem_region *region, bool do_map) if (rte_errno == ENODEV || rte_errno == ENOTSUP) return 0; - VHOST_LOG_CONFIG(ERR, "DMA engine map failed\n"); + VHOST_LOG_CONFIG(ERR, "(%s) DMA engine map failed\n", dev->ifname); /* DMA mapping errors won't stop VHST_USER_SET_MEM_TABLE. */ return 0; } @@ -188,7 +187,7 @@ async_dma_map(struct rte_vhost_mem_region *region, bool do_map) if (rte_errno == EINVAL) return 0; - VHOST_LOG_CONFIG(ERR, "DMA engine unmap failed\n"); + VHOST_LOG_CONFIG(ERR, "(%s) DMA engine unmap failed\n", dev->ifname); return ret; } } @@ -209,7 +208,7 @@ free_mem_region(struct virtio_net *dev) reg = &dev->mem->regions[i]; if (reg->host_user_addr) { if (dev->async_copy && rte_vfio_is_enabled("vfio")) - async_dma_map(reg, false); + async_dma_map(dev, reg, false); munmap(reg->mmap_addr, reg->mmap_size); close(reg->fd); @@ -287,11 +286,13 @@ vhost_user_notify_queue_state(struct virtio_net *dev, uint16_t index, * the device hasn't been initialised. 
*/ static int -vhost_user_set_owner(struct virtio_net **pdev __rte_unused, +vhost_user_set_owner(struct virtio_net **pdev, struct VhostUserMsg *msg, int main_fd __rte_unused) { - if (validate_msg_fds(msg, 0) != 0) + struct virtio_net *dev = *pdev; + + if (validate_msg_fds(dev, msg, 0) != 0) return RTE_VHOST_MSG_RESULT_ERR; return RTE_VHOST_MSG_RESULT_OK; @@ -304,7 +305,7 @@ vhost_user_reset_owner(struct virtio_net **pdev, { struct virtio_net *dev = *pdev; - if (validate_msg_fds(msg, 0) != 0) + if (validate_msg_fds(dev, msg, 0) != 0) return RTE_VHOST_MSG_RESULT_ERR; vhost_destroy_device_notify(dev); @@ -324,7 +325,7 @@ vhost_user_get_features(struct virtio_net **pdev, struct VhostUserMsg *msg, struct virtio_net *dev = *pdev; uint64_t features = 0; - if (validate_msg_fds(msg, 0) != 0) + if (validate_msg_fds(dev, msg, 0) != 0) return RTE_VHOST_MSG_RESULT_ERR; rte_vhost_driver_get_features(dev->ifname, &features); @@ -346,7 +347,7 @@ vhost_user_get_queue_num(struct virtio_net **pdev, struct VhostUserMsg *msg, struct virtio_net *dev = *pdev; uint32_t queue_num = 0; - if (validate_msg_fds(msg, 0) != 0) + if (validate_msg_fds(dev, msg, 0) != 0) return RTE_VHOST_MSG_RESULT_ERR; rte_vhost_driver_get_queue_num(dev->ifname, &queue_num); @@ -370,14 +371,13 @@ vhost_user_set_features(struct virtio_net **pdev, struct VhostUserMsg *msg, uint64_t vhost_features = 0; struct rte_vdpa_device *vdpa_dev; - if (validate_msg_fds
[PATCH v3 5/9] vhost: improve socket layer logs
This patch adds the Vhost socket path whenever possible in order to make debugging possible when multiple Vhost devices are in use. Some vhost-user layer functions are modified to pass the device path down to the socket layer. Signed-off-by: Maxime Coquelin Reviewed-by: Chenbo Xia --- lib/vhost/socket.c | 146 + lib/vhost/vhost_user.c | 26 lib/vhost/vhost_user.h | 4 +- 3 files changed, 76 insertions(+), 100 deletions(-) diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c index 82963c1e6d..ad3471d6a9 100644 --- a/lib/vhost/socket.c +++ b/lib/vhost/socket.c @@ -100,7 +100,7 @@ static struct vhost_user vhost_user = { * with number of fds read. */ int -read_fd_message(int sockfd, char *buf, int buflen, int *fds, int max_fds, +read_fd_message(char *ifname, int sockfd, char *buf, int buflen, int *fds, int max_fds, int *fd_num) { struct iovec iov; @@ -124,12 +124,13 @@ read_fd_message(int sockfd, char *buf, int buflen, int *fds, int max_fds, ret = recvmsg(sockfd, &msgh, 0); if (ret <= 0) { if (ret) - VHOST_LOG_CONFIG(ERR, "recvmsg failed\n"); + VHOST_LOG_CONFIG(ERR, "(%s) recvmsg failed on fd %d (%s)\n", + ifname, sockfd, strerror(errno)); return ret; } if (msgh.msg_flags & (MSG_TRUNC | MSG_CTRUNC)) { - VHOST_LOG_CONFIG(ERR, "truncated msg\n"); + VHOST_LOG_CONFIG(ERR, "(%s) truncated msg (fd %d)\n", ifname, sockfd); return -1; } @@ -152,7 +153,7 @@ read_fd_message(int sockfd, char *buf, int buflen, int *fds, int max_fds, } int -send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num) +send_fd_message(char *ifname, int sockfd, char *buf, int buflen, int *fds, int fd_num) { struct iovec iov; @@ -174,7 +175,7 @@ send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num) msgh.msg_controllen = sizeof(control); cmsg = CMSG_FIRSTHDR(&msgh); if (cmsg == NULL) { - VHOST_LOG_CONFIG(ERR, "cmsg == NULL\n"); + VHOST_LOG_CONFIG(ERR, "(%s) cmsg == NULL\n", ifname); errno = EINVAL; return -1; } @@ -192,7 +193,8 @@ send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num) } while (ret < 0 && errno == EINTR); if (ret < 0) { - VHOST_LOG_CONFIG(ERR, "sendmsg error\n"); + VHOST_LOG_CONFIG(ERR, "(%s) sendmsg error on fd %d (%s)\n", + ifname, sockfd, strerror(errno)); return ret; } @@ -243,14 +245,14 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket) dev->async_copy = 1; } - VHOST_LOG_CONFIG(INFO, "new device, handle is %d, path is %s\n", vid, vsocket->path); + VHOST_LOG_CONFIG(INFO, "(%s) new device, handle is %d\n", vsocket->path, vid); if (vsocket->notify_ops->new_connection) { ret = vsocket->notify_ops->new_connection(vid); if (ret < 0) { VHOST_LOG_CONFIG(ERR, - "failed to add vhost user connection with fd %d\n", - fd); + "(%s) failed to add vhost user connection with fd %d\n", + vsocket->path, fd); goto err_cleanup; } } @@ -261,9 +263,8 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket) ret = fdset_add(&vhost_user.fdset, fd, vhost_user_read_cb, NULL, conn); if (ret < 0) { - VHOST_LOG_CONFIG(ERR, - "failed to add fd %d into vhost server fdset\n", - fd); + VHOST_LOG_CONFIG(ERR, "(%s) failed to add fd %d into vhost server fdset\n", + vsocket->path, fd); if (vsocket->notify_ops->destroy_connection) vsocket->notify_ops->destroy_connection(conn->vid); @@ -295,7 +296,8 @@ vhost_user_server_new_connection(int fd, void *dat, int *remove __rte_unused) if (fd < 0) return; - VHOST_LOG_CONFIG(INFO, "new vhost user connection is %d\n", fd); + VHOST_LOG_CONFIG(INFO, "(%s) new vhost user connection is %d\n", + vsocket->path, fd); 
vhost_user_add_connection(fd, vsocket); } @@ -343,13 +345,13 @@ create_unix_socket(struct vhost_user_socket *vsocket) fd = socket(AF_UNIX, SOCK_STREAM, 0); if (fd < 0) return -1; - VHOST_LOG_CONFIG(INFO, "vhost-user %s: socket created, fd: %d\n", - vsocket->is_server ? "server" : "client", fd)
[PATCH v3 6/9] vhost: improve Virtio-net layer logs
This patch standardizes logging done in Virtio-net, so that the Vhost-user socket path is always prepended to the logs. It will ease log analysis when multiple Vhost-user ports are in use. Signed-off-by: Maxime Coquelin Reviewed-by: Chenbo Xia --- lib/vhost/virtio_net.c | 165 - 1 file changed, 79 insertions(+), 86 deletions(-) diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c index b3d954aab4..f19713137c 100644 --- a/lib/vhost/virtio_net.c +++ b/lib/vhost/virtio_net.c @@ -792,12 +792,12 @@ copy_vnet_hdr_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq, } static __rte_always_inline int -async_iter_initialize(struct vhost_async *async) +async_iter_initialize(struct virtio_net *dev, struct vhost_async *async) { struct rte_vhost_iov_iter *iter; if (unlikely(async->iovec_idx >= VHOST_MAX_ASYNC_VEC)) { - VHOST_LOG_DATA(ERR, "no more async iovec available\n"); + VHOST_LOG_DATA(ERR, "(%s) no more async iovec available\n", dev->ifname); return -1; } @@ -809,7 +809,8 @@ async_iter_initialize(struct vhost_async *async) } static __rte_always_inline int -async_iter_add_iovec(struct vhost_async *async, void *src, void *dst, size_t len) +async_iter_add_iovec(struct virtio_net *dev, struct vhost_async *async, + void *src, void *dst, size_t len) { struct rte_vhost_iov_iter *iter; struct rte_vhost_iovec *iovec; @@ -818,7 +819,7 @@ async_iter_add_iovec(struct vhost_async *async, void *src, void *dst, size_t len static bool vhost_max_async_vec_log; if (!vhost_max_async_vec_log) { - VHOST_LOG_DATA(ERR, "no more async iovec available\n"); + VHOST_LOG_DATA(ERR, "(%s) no more async iovec available\n", dev->ifname); vhost_max_async_vec_log = true; } @@ -876,11 +877,11 @@ async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq, hpa = (void *)(uintptr_t)gpa_to_first_hpa(dev, buf_iova + buf_offset, cpy_len, &mapped_len); if (unlikely(!hpa)) { - VHOST_LOG_DATA(ERR, "(%d) %s: failed to get hpa.\n", dev->vid, __func__); + VHOST_LOG_DATA(ERR, "(%s) %s: failed to get hpa.\n", dev->ifname, __func__); return -1; } - if (unlikely(async_iter_add_iovec(async, + if (unlikely(async_iter_add_iovec(dev, async, (void *)(uintptr_t)rte_pktmbuf_iova_offset(m, mbuf_offset), hpa, (size_t)mapped_len))) @@ -951,8 +952,8 @@ mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq, } else hdr = (struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)hdr_addr; - VHOST_LOG_DATA(DEBUG, "(%d) RX: num merge buffers %d\n", - dev->vid, num_buffers); + VHOST_LOG_DATA(DEBUG, "(%s) RX: num merge buffers %d\n", + dev->ifname, num_buffers); if (unlikely(buf_len < dev->vhost_hlen)) { buf_offset = dev->vhost_hlen - buf_len; @@ -970,7 +971,7 @@ mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq, mbuf_offset = 0; if (is_async) { - if (async_iter_initialize(async)) + if (async_iter_initialize(dev, async)) return -1; } @@ -1133,14 +1134,14 @@ virtio_dev_rx_split(struct virtio_net *dev, struct vhost_virtqueue *vq, pkt_len, buf_vec, &num_buffers, avail_head, &nr_vec) < 0)) { VHOST_LOG_DATA(DEBUG, - "(%d) failed to get enough desc from vring\n", - dev->vid); + "(%s) failed to get enough desc from vring\n", + dev->ifname); vq->shadow_used_idx -= num_buffers; break; } - VHOST_LOG_DATA(DEBUG, "(%d) current index %d | end index %d\n", - dev->vid, vq->last_avail_idx, + VHOST_LOG_DATA(DEBUG, "(%s) current index %d | end index %d\n", + dev->ifname, vq->last_avail_idx, vq->last_avail_idx + num_buffers); if (mbuf_to_desc(dev, vq, pkts[pkt_idx], buf_vec, nr_vec, @@ -1287,14 +1288,13 @@ virtio_dev_rx_single_packed(struct 
virtio_net *dev, if (unlikely(vhost_enqueue_single_packed(dev, vq, pkt, buf_vec, &nr_descs) < 0)) { - VHOST_LOG_DATA(DEBUG, - "(%d)
[PATCH v3 7/9] vhost: remove multi-line logs
This patch replaces multi-line logs with multiple single-line logs in
order to ease logs filtering based on their socket path.

Signed-off-by: Maxime Coquelin
---
 lib/vhost/socket.c     |  10 ++--
 lib/vhost/vhost.c      |   8 ++--
 lib/vhost/vhost_user.c | 103 ++++++++++++++++++++---------------------
 3 files changed, 59 insertions(+), 62 deletions(-)

diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
index ad3471d6a9..c2f8013cd5 100644
--- a/lib/vhost/socket.c
+++ b/lib/vhost/socket.c
@@ -868,8 +868,8 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 		if (vsocket->async_copy &&
 			(flags & (RTE_VHOST_USER_IOMMU_SUPPORT |
 			RTE_VHOST_USER_POSTCOPY_SUPPORT))) {
-			VHOST_LOG_CONFIG(ERR, "(%s) enabling async copy and IOMMU "
-					"or post-copy feature simultaneously is not supported\n", path);
+			VHOST_LOG_CONFIG(ERR, "(%s) async copy with IOMMU or post-copy not supported\n",
+					path);
 			goto out_mutex;
 		}
 
@@ -908,8 +908,10 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 			(1ULL << VIRTIO_NET_F_HOST_TSO6) |
 			(1ULL << VIRTIO_NET_F_HOST_UFO);
 
-		VHOST_LOG_CONFIG(INFO, "(%s) Linear buffers requested without external buffers, "
-				"disabling host segmentation offloading support\n", path);
+		VHOST_LOG_CONFIG(INFO, "(%s) Linear buffers requested without external buffers,\n",
+				path);
+		VHOST_LOG_CONFIG(INFO, "(%s) disabling host segmentation offloading support\n",
+				path);
 		vsocket->supported_features &= ~seg_offload_features;
 		vsocket->features &= ~seg_offload_features;
 	}
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 3b05f17a50..cd62dc238b 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -1786,8 +1786,8 @@ rte_vhost_async_channel_unregister(int vid, uint16_t queue_id)
 	}
 
 	if (vq->async->pkts_inflight_n) {
-		VHOST_LOG_CONFIG(ERR, "(%s) failed to unregister async channel. "
-			"async inflight packets must be completed before unregistration.\n",
+		VHOST_LOG_CONFIG(ERR, "(%s) failed to unregister async channel.\n", dev->ifname);
+		VHOST_LOG_CONFIG(ERR, "(%s) inflight packets must be completed before unregistration.\n",
 			dev->ifname);
 		ret = -1;
 		goto out;
@@ -1821,8 +1821,8 @@ rte_vhost_async_channel_unregister_thread_unsafe(int vid, uint16_t queue_id)
 		return 0;
 
 	if (vq->async->pkts_inflight_n) {
-		VHOST_LOG_CONFIG(ERR, "(%s) failed to unregister async channel. "
-			"async inflight packets must be completed before unregistration.\n",
+		VHOST_LOG_CONFIG(ERR, "(%s) failed to unregister async channel.\n", dev->ifname);
+		VHOST_LOG_CONFIG(ERR, "(%s) inflight packets must be completed before unregistration.\n",
 			dev->ifname);
 		return -1;
 	}
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index e207ace426..d54ac075cd 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -872,13 +872,13 @@ translate_ring_addresses(struct virtio_net *dev, int vq_index)
 	}
 
 	if (vq->last_used_idx != vq->used->idx) {
-		VHOST_LOG_CONFIG(WARNING,
-			"(%s) last_used_idx (%u) and vq->used->idx (%u) mismatches; "
-			"some packets maybe resent for Tx and dropped for Rx\n",
+		VHOST_LOG_CONFIG(WARNING, "(%s) last_used_idx (%u) and vq->used->idx (%u) mismatches;\n",
 			dev->ifname, vq->last_used_idx, vq->used->idx);
 		vq->last_used_idx  = vq->used->idx;
 		vq->last_avail_idx = vq->used->idx;
+		VHOST_LOG_CONFIG(WARNING, "(%s) some packets maybe resent for Tx and dropped for Rx\n",
+			dev->ifname);
 	}
 
 	vq->access_ok = true;
@@ -1066,15 +1066,14 @@ dump_guest_pages(struct virtio_net *dev)
 	for (i = 0; i < dev->nr_guest_pages; i++) {
 		page = &dev->guest_pages[i];
 
-		VHOST_LOG_CONFIG(INFO,
-			"(%s) guest physical page region %u\n"
-			"\t guest_phys_addr: %" PRIx64 "\n"
-			"\t host_phys_addr : %" PRIx64 "\n"
-			"\t size           : %" PRIx64 "\n",
-			dev->ifname, i,
-			page->guest_phys_addr,
-			page->host_phys_addr,
-			page->size);
+		VHOST_LOG_CONFIG(INFO, "(%s) guest physical page region %u\n",
+			dev->i
[PATCH v3 9/9] vhost: use proper logging type for data path
This patch changes the logging type from config to data for functions
called in the data path.

Suggested-by: David Marchand
Signed-off-by: Maxime Coquelin
Reviewed-by: Chenbo Xia
---
 lib/vhost/vhost.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index cd62dc238b..f59ca6c157 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -58,7 +58,7 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		vhost_user_iotlb_pending_insert(dev, vq, iova, perm);
 		if (vhost_user_iotlb_miss(dev, iova, perm)) {
-			VHOST_LOG_CONFIG(ERR, "(%s) IOTLB miss req failed for IOVA 0x%" PRIx64 "\n",
+			VHOST_LOG_DATA(ERR, "(%s) IOTLB miss req failed for IOVA 0x%" PRIx64 "\n",
 				dev->ifname, iova);
 			vhost_user_iotlb_pending_remove(vq, iova, 1, perm);
 		}
@@ -420,7 +420,7 @@ translate_log_addr(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		gpa = hva_to_gpa(dev, hva, exp_size);
 		if (!gpa) {
-			VHOST_LOG_CONFIG(ERR,
+			VHOST_LOG_DATA(ERR,
 				"(%s) failed to find GPA for log_addr: 0x%" PRIx64 " hva: 0x%" PRIx64 "\n",
 				dev->ifname, log_addr, hva);
-- 
2.34.1
[PATCH v3 8/9] vhost: differentiate IOTLB logs
The same logging messages were used for both IOTLB cache insertion
failure and IOTLB pending insertion failure. This patch differentiates
them to ease log analysis.

Suggested-by: David Marchand
Signed-off-by: Maxime Coquelin
Reviewed-by: Chenbo Xia
---
 lib/vhost/iotlb.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index afa86d7c2b..b24202a7eb 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -70,7 +70,8 @@ vhost_user_iotlb_pending_insert(struct virtio_net *dev, struct vhost_virtqueue *
 
 	ret = rte_mempool_get(vq->iotlb_pool, (void **)&node);
 	if (ret) {
-		VHOST_LOG_CONFIG(DEBUG, "(%s) IOTLB pool %s empty, clear entries\n",
+		VHOST_LOG_CONFIG(DEBUG,
+			"(%s) IOTLB pool %s empty, clear entries for pending insertion\n",
 			dev->ifname, vq->iotlb_pool->name);
 		if (!TAILQ_EMPTY(&vq->iotlb_pending_list))
 			vhost_user_iotlb_pending_remove_all(vq);
@@ -78,7 +79,8 @@ vhost_user_iotlb_pending_insert(struct virtio_net *dev, struct vhost_virtqueue *
 			vhost_user_iotlb_cache_random_evict(vq);
 		ret = rte_mempool_get(vq->iotlb_pool, (void **)&node);
 		if (ret) {
-			VHOST_LOG_CONFIG(ERR, "(%s) IOTLB pool %s still empty, failure\n",
+			VHOST_LOG_CONFIG(ERR,
+				"(%s) IOTLB pool %s still empty, pending insertion failure\n",
 				dev->ifname, vq->iotlb_pool->name);
 			return;
 		}
@@ -167,7 +169,8 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq
 
 	ret = rte_mempool_get(vq->iotlb_pool, (void **)&new_node);
 	if (ret) {
-		VHOST_LOG_CONFIG(DEBUG, "(%s) IOTLB pool %s empty, clear entries\n",
+		VHOST_LOG_CONFIG(DEBUG,
+			"(%s) IOTLB pool %s empty, clear entries for cache insertion\n",
 			dev->ifname, vq->iotlb_pool->name);
 		if (!TAILQ_EMPTY(&vq->iotlb_list))
 			vhost_user_iotlb_cache_random_evict(vq);
@@ -175,7 +178,8 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq
 			vhost_user_iotlb_pending_remove_all(vq);
 		ret = rte_mempool_get(vq->iotlb_pool, (void **)&new_node);
 		if (ret) {
-			VHOST_LOG_CONFIG(ERR, "(%s) IOTLB pool %s still empty, failure\n",
+			VHOST_LOG_CONFIG(ERR,
+				"(%s) IOTLB pool %s still empty, cache insertion failed\n",
 				dev->ifname, vq->iotlb_pool->name);
 			return;
 		}
-- 
2.34.1
RE: [PATCH v1] raw/ifpga/base: fix port feature ID
> -----Original Message-----
> From: Xu, Rosen
> Sent: Wednesday, January 26, 2022 9:18 AM
> To: Huang, Wei; dev@dpdk.org; Zhang, Qi Z
> Cc: sta...@dpdk.org; Zhang, Tianfei; Yigit, Ferruh
> Subject: RE: [PATCH v1] raw/ifpga/base: fix port feature ID
>
> Hi,
>
> > -----Original Message-----
> > From: Huang, Wei
> > Sent: Tuesday, January 25, 2022 10:31
> > To: dev@dpdk.org; Xu, Rosen; Zhang, Qi Z
> > Cc: sta...@dpdk.org; Zhang, Tianfei; Yigit, Ferruh; Huang, Wei
> > Subject: [PATCH v1] raw/ifpga/base: fix port feature ID
> >
> > Fix ID value of port features to match the definition from hardware.
> >
> > Fixes: 473c88f9 ("drivers/raw: remove rawdev from directory names")
> >
> > Signed-off-by: Wei Huang
> > ---
> >  drivers/raw/ifpga/base/ifpga_defines.h | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/raw/ifpga/base/ifpga_defines.h
> > b/drivers/raw/ifpga/base/ifpga_defines.h
> > index dca1518..8f62033 100644
> > --- a/drivers/raw/ifpga/base/ifpga_defines.h
> > +++ b/drivers/raw/ifpga/base/ifpga_defines.h
> > @@ -93,9 +93,9 @@ enum fpga_id_type {
> >
> >  #define PORT_FEATURE_ID_HEADER FEATURE_ID_FIU_HEADER
> >  #define PORT_FEATURE_ID_ERROR 0x10
> > -#define PORT_FEATURE_ID_UMSG 0x12
> > -#define PORT_FEATURE_ID_UINT 0x13
> > -#define PORT_FEATURE_ID_STP 0x14
> > +#define PORT_FEATURE_ID_UMSG 0x11
> > +#define PORT_FEATURE_ID_UINT 0x12
> > +#define PORT_FEATURE_ID_STP 0x13
> >  #define PORT_FEATURE_ID_UAFU FEATURE_ID_AFU
> >
> >  /*
> > --
> > 1.8.3.1
>
> Acked-by: Rosen Xu

Acked-by: Qi Zhang

Applied to dpdk-next-net-intel.

Thanks
Qi
Re: [PATCH] vhost: add vDPA resource cleanup callback
Hi Xueming, On 11/3/21 14:49, Maxime Coquelin wrote: On 11/3/21 14:45, Xueming(Steven) Li wrote: On Wed, 2021-11-03 at 09:46 +0100, Maxime Coquelin wrote: On 11/3/21 09:41, Xia, Chenbo wrote: Hi Xueming, -Original Message- From: Xueming(Steven) Li Sent: Thursday, October 21, 2021 8:36 PM To: maxime.coque...@redhat.com; dev@dpdk.org Cc: Xia, Chenbo Subject: Re: [PATCH] vhost: add vDPA resource cleanup callback On Thu, 2021-10-21 at 14:00 +0200, Maxime Coquelin wrote: Hi Xueming, On 10/19/21 13:39, Xueming Li wrote: This patch adds vDPA device cleanup callback to release resources on vhost user connection close. Signed-off-by: Xueming Li --- lib/vhost/rte_vdpa_dev.h | 3 +++ lib/vhost/vhost_user.c | 6 ++ 2 files changed, 9 insertions(+) diff --git a/lib/vhost/rte_vdpa_dev.h b/lib/vhost/rte_vdpa_dev.h index b0f494815fa..2711004fe05 100644 --- a/lib/vhost/rte_vdpa_dev.h +++ b/lib/vhost/rte_vdpa_dev.h @@ -32,6 +32,9 @@ struct rte_vdpa_dev_ops { /** Driver close the device (Mandatory) */ int (*dev_close)(int vid); + /** Connection closed, clean up resources */ + int (*dev_cleanup)(int vid); + /** Enable/disable this vring (Mandatory) */ int (*set_vring_state)(int vid, int vring, int state); diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c index 5a894ca0cc7..032b621c86c 100644 --- a/lib/vhost/vhost_user.c +++ b/lib/vhost/vhost_user.c @@ -162,6 +162,12 @@ free_mem_region(struct virtio_net *dev) void vhost_backend_cleanup(struct virtio_net *dev) { + struct rte_vdpa_device *vdpa_dev; + + vdpa_dev = dev->vdpa_dev; + if (vdpa_dev && vdpa_dev->ops->dev_cleanup != NULL) + vdpa_dev->ops->dev_cleanup(dev->vid); + if (dev->mem) { free_mem_region(dev); rte_free(dev->mem); What will be done there that cannot be done in .dev_close()? .dev_close() mainly handles VM suspend and driver reset. If release everything inside dev_close(), the suspend and resume takes longer time if number of VQs are huge. Customer want to upgrade VM configuration using suspend and resume, pause customer VM too long can't be accepted. By saying 'upgrade VM configuration', do you mean VM memory hotplug? Or something more? Is this patch a next-step improvement of this commit? commit 127f9c6f7b78a47b73b3e1c39e021cc81a30b4c9 Author: Matan Azrad Date: Mon Jun 29 14:08:19 2020 + vhost: handle memory hotplug with vDPA devices Some vDPA drivers' basic configurations should be updated when the guest memory is hotplugged. Close vDPA device before hotplug operation and recreate it after the hotplug operation is done. Signed-off-by: Matan Azrad Reviewed-by: Maxime Coquelin Reviewed-by: Chenbo Xia So the idea is to cache and reuse resource between dev_close() and dev_conf(). Actually, the two functions looks more like dev_stop() and dev_start(). dev_cleanup hooks to vhost backend cleanup which called when socket closed for both client and server mode, a safe point to cleanup all cached resources. Having the mlx5 implementation of this callback alongside this patch may help to understand. The mlx5 implementation still a prototype, pending on internal review. So I just post the vhost part to get suggestion/comment. Let me know if the ugly code does help :) I would prefer to see the mlx implementation with this patch in the same patchset to understand the problem. A new callback is fine if the problem itself makes sense :) FYI, I'm about to apply a patch that marks the vDPA driver API as internal, when you will submit a new version please apply on top of it. 
> Haven't checked your patch yet, but it sounds good from the subject :)

The branch is now ready, you can rebase your patch on top of it:

  git://dpdk.org/next/dpdk-next-virtio main

Could you please rebase your patch if you want it in v22.03?

Thanks!
Maxime
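[Editor's note] To make the dev_close()/dev_cleanup() split discussed above more concrete, here is a purely illustrative driver-side sketch; the mlx5 prototype mentioned in the thread was not posted, and all `my_*` names are invented for the example. It assumes the proposed dev_cleanup callback has been added to struct rte_vdpa_dev_ops:

#include <rte_vdpa_dev.h>

/* Invented driver state and helpers, for illustration only. */
struct my_vdpa_priv;
struct my_vdpa_priv *my_vdpa_priv_get(int vid);
void my_vdpa_virtqs_stop(struct my_vdpa_priv *priv);
void my_vdpa_virtqs_destroy(struct my_vdpa_priv *priv);

/* dev_close(): called on device reset/suspend; keep cached HW
 * resources (e.g. pre-created virtq objects) so that suspend and
 * resume of the VM stay fast.
 */
static int
my_vdpa_dev_close(int vid)
{
	my_vdpa_virtqs_stop(my_vdpa_priv_get(vid));	/* quick: state only */
	return 0;
}

/* dev_cleanup(): called when the vhost-user connection is really
 * closed; release everything dev_close() intentionally kept around.
 */
static int
my_vdpa_dev_cleanup(int vid)
{
	my_vdpa_virtqs_destroy(my_vdpa_priv_get(vid));	/* slow: free HW objects */
	return 0;
}

static struct rte_vdpa_dev_ops my_vdpa_ops = {
	/* ... mandatory callbacks (dev_conf, set_vring_state, ...) ... */
	.dev_close = my_vdpa_dev_close,
	.dev_cleanup = my_vdpa_dev_cleanup,
};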
Re: [PATCH v2 01/10] ethdev: introduce flow pre-configuration hints
On Wed, Jan 26, 2022 at 09:45:18AM +, Ori Kam wrote: > > > > -Original Message- > > From: Bruce Richardson > > Sent: Tuesday, January 25, 2022 8:14 PM > > To: Ori Kam > > Subject: Re: [PATCH v2 01/10] ethdev: introduce flow pre-configuration hints > > > > On Tue, Jan 25, 2022 at 06:09:42PM +, Bruce Richardson wrote: > > > On Tue, Jan 25, 2022 at 03:58:45PM +, Ori Kam wrote: > > > > Hi Bruce, > > > > > > > > > -Original Message- From: Bruce Richardson > > > > > Sent: Monday, January 24, 2022 8:09 PM > > > > > Subject: Re: [PATCH v2 01/10] ethdev: introduce flow > > > > > pre-configuration hints > > > > > > > > > > On Mon, Jan 24, 2022 at 11:16:15PM +0530, Jerin Jacob wrote: > > > > > > On Mon, Jan 24, 2022 at 11:05 PM Thomas Monjalon > > > > > > wrote: > > > > > > > > > > > > > > 24/01/2022 15:36, Jerin Jacob: > > > > > > > > On Tue, Jan 18, 2022 at 9:01 PM Alexander Kozyrev > > > > > > > > wrote: > > > > > > > > > +struct rte_flow_port_attr { + /** +* Version > > > > > > > > > of the struct layout, should be 0. +*/ + > > > > > > > > > uint32_t version; > > > > > > > > > > > > > > > > Why version number? Across DPDK, we are using dynamic function > > > > > > > > versioning, I think, that would be sufficient for ABI > > > > > > > > versioning > > > > > > > > > > > > > > Function versioning is not ideal when the structure is accessed > > > > > > > in many places like many drivers and library functions. > > > > > > > > > > > > > > The idea of this version field (which can be a bitfield) is to > > > > > > > update it when some new features are added, so the users of the > > > > > > > struct can check if a feature is there before trying to use it. > > > > > > > It means a bit more code in the functions, but avoid duplicating > > > > > > > functions as in function versioning. > > > > > > > > > > > > > > Another approach was suggested by Bruce, and applied to dmadev. > > > > > > > It is assuming we only add new fields at the end (no removal), > > > > > > > and focus on the size of the struct. By passing sizeof as an > > > > > > > extra parameter, the function knows which fields are OK to use. > > > > > > > Example: > > > > > > > http://code.dpdk.org/dpdk/v21.11/source/lib/dmadev/rte_dmadev.c#L476 > > > > > > > > > > > > + @Richardson, Bruce Either approach is fine, No strong opinion. > > > > > > We can have one approach and use it across DPDK for consistency. > > > > > > > > > > > > > > > > In general I prefer the size-based approach, mainly because of its > > > > > simplicity. However, some other reasons why we may want to choose it: > > > > > > > > > > * It's completely hidden from the end user, and there is no need for > > > > > an extra struct field that needs to be filled in > > > > > > > > > > * Related to that, for the version-field approach, if the field is > > > > > present in a user-allocated struct, then you probably need to start > > > > > preventing user error via: - having the external struct not have the > > > > > field and use a separate internal struct to add in the version info > > > > > after the fact in the versioned function. Alternatively, - provide a > > > > > separate init function for each structure to fill in the version > > > > > field appropriately > > > > > > > > > > * In general, using the size-based approach like in the linked > > > > > example is more resilient since it's compiler-inserted, so there is > > > > > reduced chance of error. 
> > > > > > > > > > * A sizeof field allows simple-enough handling in the drivers - > > > > > especially since it does not allow removed fields. Each driver only > > > > > needs to check that the size passed in is greater than that expected, > > > > > thereby allowing us to have both updated and non-updated drivers > > > > > co-existing simultaneously. [For a version field, the same scheme > > > > > could also work if we keep the no-delete rule, but for a bitmask > > > > > field, I believe things may get more complex in terms of checking] > > > > > > > > > > In terms of the limitations of using sizeof - requiring new fields to > > > > > always go on the end, and preventing shrinking the struct - I think > > > > > that the simplicity gains far outweigh the impact of these > > > > > restrictions. > > > > > > > > > > * Adding fields to struct is far more common than wanting to remove > > > > > one > > > > > > > > > > * So long as the added field is at the end, even if the struct size > > > > > doesn't change the scheme can still work as the versioned function > > > > > for the old struct can ensure that the extra field is appropriately > > > > > zeroed (rather than random) on entry into any driver function > > > > > > > > > > > > > Zero can be a valid value so this may result in an issue. > > > > > > > > > > In this instance, I was using zero as a neutral, default-option value. If > > > having zero as the default causes problems, we can always make the > > > structure size change to force a n
Re: [PATCH v2 01/10] ethdev: introduce flow pre-configuration hints
26/01/2022 11:52, Bruce Richardson: > The scenario is as follows. Suppose we have the initial state as below: > > struct x_dev_cfg { >    int x; > }; > > int > x_dev_cfg(int dev_id, struct x_dev_cfg *cfg) > { >    struct x_dev *dev = x_devs[id]; >    // some setup/config may go here >    return dev->configure(cfg, sizeof(*cfg)); // sizeof(*cfg) == 4 > } > > Now, supposing we need to add in a new field into the config structure, a > very common occurrence. This will indeed break the ABI, so we need to use > ABI versioning, to ensure that apps passing in the old structure, only call > a function which expects the old structure. Therefore, we need a copy of > the old structure, and a function to work on it. This gives this result: > > struct x_dev_cfg { > int x; > bool flag; // new field; > }; > > struct x_dev_cfg_v22 { // needed for ABI-versioned function > int x; > }; > > /* this function will only be called by *newly-linked* code, which uses > * the new structure */ > int > x_dev_cfg(int dev_id, struct x_dev_cfg *cfg) > { >    struct x_dev *dev = x_devs[id]; >    // some setup/config may go here >    return dev->configure(cfg, sizeof(*cfg)); // sizeof(*cfg) is now 8 > } > > /* this function is called by apps linked against old version */ > int > x_dev_cfg_v22(int dev_id, struct x_dev_cfg_v22 *cfg) > { >    struct x_dev *dev = x_devs[id]; >    // some setup/config may go here >    return dev->configure((void *)cfg, sizeof(*cfg)); // sizeof(*cfg) is still 4 > } > > With the above library code, we have different functions using the > different structures, so ABI compatibility is preserved - apps passing in a > 4-byte struct call a function using the 4-byte struct, while newer apps can > use the 8-byte version. > > The final part of the puzzle is then how drivers react to this change. > Originally, all drivers only use "x" in the config structure because that > is all that there is. That will still continue to work fine in the above > case, as both 4-byte and 8-byte structs have the same x value at the same > offset. i.e. no driver updates for x_dev is needed. > > On the other hand, if there are drivers that do want/need the new field, > they can also get to use it, but they do need to check for its presence > before they do so, i.e they would work as below: > > if (size_param > sizeof(struct x_dev_cfg_v22)) { // or "== sizeof(struct x_dev_cfg)" > // use flags field > } > > Hope this is clear now. Yes, this is the kind of explanation we need in our guideline doc. Alternatives can be documented as well. If we can list pros/cons in the doc, it will be easier to choose the best approach and to explain the choice during code review.
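For readers following along, the size-based scheme above condenses into a self-contained, compilable sketch. All names are the hypothetical ones from the example; in a real DPDK library the old entry point would additionally be tied to the previous ABI version through the rte_function_versioning.h macros, which this sketch omits.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* hypothetical config struct, new field appended at the end */
struct x_dev_cfg {
	int x;
	bool flag;	/* added in the new ABI */
};

struct x_dev_cfg_v22 {	/* old layout, kept for the versioned function */
	int x;
};

/* driver side: the size parameter tells it which fields are safe to read */
static int
dev_do_configure(const struct x_dev_cfg *cfg, size_t size)
{
	printf("x = %d\n", cfg->x);
	if (size >= sizeof(struct x_dev_cfg))	/* new field is present */
		printf("flag = %d\n", cfg->flag);
	return 0;
}

/* called by newly-linked apps (8-byte struct) */
int
x_dev_cfg(int dev_id, struct x_dev_cfg *cfg)
{
	(void)dev_id;
	return dev_do_configure(cfg, sizeof(*cfg));
}

/* versioned entry point for apps built against the old library (4 bytes) */
int
x_dev_cfg_v22(int dev_id, struct x_dev_cfg_v22 *cfg)
{
	(void)dev_id;
	return dev_do_configure((const struct x_dev_cfg *)cfg, sizeof(*cfg));
}

int
main(void)
{
	struct x_dev_cfg new_cfg = { .x = 1, .flag = true };
	struct x_dev_cfg_v22 old_cfg = { .x = 2 };

	x_dev_cfg(0, &new_cfg);		/* prints x and flag */
	x_dev_cfg_v22(0, &old_cfg);	/* prints x only: size is 4 */
	return 0;
}

Both layouts share the leading int x at offset 0, which is why drivers unaware of the new field keep working unchanged.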
Re: [dpdk-dev] [pull-request] dpdk-next-eventdev - v22.03 - RC1
24/01/2022 10:23, Jerin Jacob Kollanukkaran: > http://dpdk.org/git/next/dpdk-next-eventdev Pulled, thanks.
RE: [PATCH v2 01/10] ethdev: introduce flow pre-configuration hints
> -Original Message- > From: Thomas Monjalon > Sent: Wednesday, January 26, 2022 1:22 PM > Subject: Re: [PATCH v2 01/10] ethdev: introduce flow pre-configuration hints > > 26/01/2022 11:52, Bruce Richardson: > > The scenario is as follows. Suppose we have the initial state as below: > > > > struct x_dev_cfg { > >    int x; > > }; > > > > int > > x_dev_cfg(int dev_id, struct x_dev_cfg *cfg) > > { > >    struct x_dev *dev = x_devs[id]; > >    // some setup/config may go here > >    return dev->configure(cfg, sizeof(*cfg)); // sizeof(*cfg) == 4 > > } > > > > Now, supposing we need to add in a new field into the config structure, a > > very common occurrence. This will indeed break the ABI, so we need to use > > ABI versioning, to ensure that apps passing in the old structure, only call > > a function which expects the old structure. Therefore, we need a copy of > > the old structure, and a function to work on it. This gives this result: > > > > struct x_dev_cfg { > > int x; > > bool flag; // new field; > > }; > > > > struct x_dev_cfg_v22 { // needed for ABI-versioned function > > int x; > > }; > > > > /* this function will only be called by *newly-linked* code, which uses > > * the new structure */ > > int > > x_dev_cfg(int dev_id, struct x_dev_cfg *cfg) > > { > >    struct x_dev *dev = x_devs[id]; > >    // some setup/config may go here > >    return dev->configure(cfg, sizeof(*cfg)); // sizeof(*cfg) is now 8 > > } > > > > /* this function is called by apps linked against old version */ > > int > > x_dev_cfg_v22(int dev_id, struct x_dev_cfg_v22 *cfg) > > { > >    struct x_dev *dev = x_devs[id]; > >    // some setup/config may go here > >    return dev->configure((void *)cfg, sizeof(*cfg)); // sizeof(*cfg) is still > > 4 > > } > > > > With the above library code, we have different functions using the > > different structures, so ABI compatibility is preserved - apps passing in a > > 4-byte struct call a function using the 4-byte struct, while newer apps can > > use the 8-byte version. > > > > The final part of the puzzle is then how drivers react to this change. > > Originally, all drivers only use "x" in the config structure because that > > is all that there is. That will still continue to work fine in the above > > case, as both 4-byte and 8-byte structs have the same x value at the same > > offset. i.e. no driver updates for x_dev is needed. > > > > On the other hand, if there are drivers that do want/need the new field, > > they can also get to use it, but they do need to check for its presence > > before they do so, i.e they would work as below: > > > > if (size_param > sizeof(struct x_dev_cfg_v22)) { // or "== sizeof(struct x_dev_cfg)" > > // use flags field > > } > > > > Hope this is clear now. > > Yes, this is the kind of explanation we need in our guideline doc. > Alternatives can be documented as well. > If we can list pros/cons in the doc, it will be easier to choose > the best approach and to explain the choice during code review. > > Thank you very much for the clear explanation. The drawback is that we also need to duplicate the functions. Using the flags/version we only need to create new structures and from application point of view it knows what extra fields it gets.
(I agree that application knowledge has downsides but also advantages) In the case of flags/version your example will look like this (this is for the record and maybe other developers are interested): struct x_dev_cfg { //original struct int ver; int x; }; struct x_dev_cfg_v2 { // new struct int ver; int x; bool flag; // new field; }; The function is always the same function: x_dev_cfg(int dev_id, struct x_dev_cfg *cfg) { struct x_dev *dev = x_devs[id]; // some setup/config may go here return dev->configure(cfg); } When calling this function with old struct: x_dev_cfg(id, (struct x_dev_cfg *)cfg) When calling this function with new struct: x_dev_cfg(id, (struct x_dev_cfg *)cfg_v2) In PMD: if (cfg->ver >= 2) // version 2 logic else if (cfg->ver >= 0) // base version logic When using flags it gives even more control since pmd can tell exactly what features are required. All options have pros/cons I vote for the version one. We can have a poll 😊 Or like Thomas said list pros and cons and each subsystem can have its own selection.
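Rendered as a compilable sketch with the same hypothetical names; the burden raised later in the thread — the application has to fill in ver correctly by hand — is visible in main():

#include <stdbool.h>
#include <stdio.h>

struct x_dev_cfg {	/* original layout, ver must stay first */
	int ver;	/* 1 for the base layout */
	int x;
};

struct x_dev_cfg_v2 {	/* extended layout */
	int ver;	/* 2 when the flag field is valid */
	int x;
	bool flag;
};

/* single entry point for both layouts; the PMD dispatches on cfg->ver */
static int
x_dev_cfg(int dev_id, struct x_dev_cfg *cfg)
{
	(void)dev_id;
	printf("x = %d\n", cfg->x);
	if (cfg->ver >= 2)	/* version 2 logic: flag is safe to read */
		printf("flag = %d\n", ((struct x_dev_cfg_v2 *)cfg)->flag);
	return 0;
}

int
main(void)
{
	struct x_dev_cfg_v2 cfg = { .ver = 2, .x = 1, .flag = true };

	/* callers always cast to the base type, as in the mail above */
	return x_dev_cfg(0, (struct x_dev_cfg *)&cfg);
}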
[PATCH v4 0/2] Add config file support for l3fwd
This patchset introduces config file support for l3fwd and its lookup methods LPM, FIB, and EM, similar to that of l3fwd-acl. This allows for route rules to be defined in configuration files and edited there instead of in each lookup method's hardcoded route tables. V4: * Fix nondeterministic bug of segfault on termination of sample app. Sean Morrissey (2): examples/l3fwd: add config file support for LPM/FIB examples/l3fwd: add config file support for EM doc/guides/sample_app_ug/l3_forward.rst | 89 +++-- examples/l3fwd/em_default_v4.cfg| 17 + examples/l3fwd/em_default_v6.cfg| 17 + examples/l3fwd/l3fwd.h | 41 ++ examples/l3fwd/l3fwd_em.c | 479 ++-- examples/l3fwd/l3fwd_fib.c | 52 +-- examples/l3fwd/l3fwd_lpm.c | 278 +- examples/l3fwd/l3fwd_route.h| 49 ++- examples/l3fwd/lpm_default_v4.cfg | 17 + examples/l3fwd/lpm_default_v6.cfg | 17 + examples/l3fwd/main.c | 106 +++--- 11 files changed, 857 insertions(+), 305 deletions(-) create mode 100644 examples/l3fwd/em_default_v4.cfg create mode 100644 examples/l3fwd/em_default_v6.cfg create mode 100644 examples/l3fwd/lpm_default_v4.cfg create mode 100644 examples/l3fwd/lpm_default_v6.cfg -- 2.25.1
[PATCH v4 1/2] examples/l3fwd: add config file support for LPM/FIB
Add support to define ipv4 and ipv6 forwarding tables by reading from a config file for LPM and FIB, with a format similar to the l3fwd-acl one. With the removal of the hardcoded route tables for IPv4 and IPv6, these routes have been moved to a separate default config file for use with LPM and FIB. Signed-off-by: Sean Morrissey Signed-off-by: Ravi Kerur Acked-by: Konstantin Ananyev --- examples/l3fwd/l3fwd.h| 41 + examples/l3fwd/l3fwd_em.c | 13 ++ examples/l3fwd/l3fwd_fib.c| 52 +++--- examples/l3fwd/l3fwd_lpm.c| 278 +++--- examples/l3fwd/l3fwd_route.h | 17 +- examples/l3fwd/lpm_default_v4.cfg | 17 ++ examples/l3fwd/lpm_default_v6.cfg | 17 ++ examples/l3fwd/main.c | 106 +++- 8 files changed, 454 insertions(+), 87 deletions(-) create mode 100644 examples/l3fwd/lpm_default_v4.cfg create mode 100644 examples/l3fwd/lpm_default_v6.cfg diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h index 38ca19133c..4e1e0af033 100644 --- a/examples/l3fwd/l3fwd.h +++ b/examples/l3fwd/l3fwd.h @@ -58,6 +58,30 @@ #endif #define HASH_ENTRY_NUMBER_DEFAULT 16 +/*Log file related character defs. */ +#define COMMENT_LEAD_CHAR ('#') +#define ROUTE_LEAD_CHAR('R') + +#defineIPV6_ADDR_LEN 16 +#defineIPV6_ADDR_U16 (IPV6_ADDR_LEN / sizeof(uint16_t)) +#defineIPV6_ADDR_U32 (IPV6_ADDR_LEN / sizeof(uint32_t)) + +#define GET_CB_FIELD(in, fd, base, lim, dlm) do {\ + unsigned long val; \ + char *end; \ + errno = 0; \ + val = strtoul((in), &end, (base)); \ + if (errno != 0 || end[0] != (dlm) || val > (lim)) \ + return -EINVAL; \ + (fd) = (typeof(fd))val; \ + (in) = end + 1; \ +} while (0) + +struct parm_cfg { + const char *rule_ipv4_name; + const char *rule_ipv6_name; +}; + struct mbuf_table { uint16_t len; struct rte_mbuf *m_table[MAX_PKT_BURST]; @@ -96,6 +120,8 @@ extern xmm_t val_eth[RTE_MAX_ETHPORTS]; extern struct lcore_conf lcore_conf[RTE_MAX_LCORE]; +extern struct parm_cfg parm_config; + /* Send burst of packets on an output interface */ static inline int send_burst(struct lcore_conf *qconf, uint16_t n, uint16_t port) @@ -183,6 +209,12 @@ int init_mem(uint16_t portid, unsigned int nb_mbuf); /* Function pointers for LPM, EM or FIB functionality. */ +void +read_config_files_lpm(void); + +void +read_config_files_em(void); + void setup_lpm(const int socketid); @@ -286,4 +318,13 @@ fib_get_ipv4_l3fwd_lookup_struct(const int socketid); void * fib_get_ipv6_l3fwd_lookup_struct(const int socketid); +void +em_free_routes(void); + +void +lpm_free_routes(void); + +int +is_bypass_line(const char *buff); + #endif /* __L3_FWD_H__ */ diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c index 5cc4a4d979..9058e75a76 100644 --- a/examples/l3fwd/l3fwd_em.c +++ b/examples/l3fwd/l3fwd_em.c @@ -972,6 +972,19 @@ em_event_main_loop_tx_q_burst_vector(__rte_unused void *dummy) return 0; } +/* Load rules from the input file */ +void +read_config_files_em(void) +{ + /* Empty till config file support added to EM */ +} + +void +em_free_routes(void) +{ + /* Empty till config file support added to EM */ +} + /* Initialize exact match (hash) parameters.
8< */ void setup_hash(const int socketid) diff --git a/examples/l3fwd/l3fwd_fib.c b/examples/l3fwd/l3fwd_fib.c index 2110459cc3..003721c908 100644 --- a/examples/l3fwd/l3fwd_fib.c +++ b/examples/l3fwd/l3fwd_fib.c @@ -583,7 +583,7 @@ setup_fib(const int socketid) struct rte_eth_dev_info dev_info; struct rte_fib6_conf config; struct rte_fib_conf config_ipv4; - unsigned int i; + int i; int ret; char s[64]; char abuf[INET6_ADDRSTRLEN]; @@ -603,37 +603,39 @@ setup_fib(const int socketid) "Unable to create the l3fwd FIB table on socket %d\n", socketid); + /* Populate the fib ipv4 table. */ - for (i = 0; i < RTE_DIM(ipv4_l3fwd_route_array); i++) { + for (i = 0; i < route_num_v4; i++) { struct in_addr in; /* Skip unused ports. */ - if ((1 << ipv4_l3fwd_route_array[i].if_out & + if ((1 << route_base_v4[i].if_out & enabled_port_mask) == 0) continue; - rte_eth_dev_info_get(ipv4_l3fwd_route_array[i].if_out, + rte_eth_dev_info_get(route_base_v4[i].if_out, &dev_info); ret = rte_fib_add(ipv4_l3fwd_fib_lookup_struct[socketid], -
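As an aside for readers, the GET_CB_FIELD macro added in patch 1/2 (borrowed from l3fwd-acl) consumes one delimited numeric field per invocation and advances the input pointer past the delimiter. A minimal sketch of how a parser can use it on an "address/depth" token; parse_ipv4_net() is a hypothetical helper, not part of the patch:

#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* same macro as in the patch above; typeof is a GNU extension,
 * matching the patch */
#define GET_CB_FIELD(in, fd, base, lim, dlm) do {         \
	unsigned long val;                                \
	char *end;                                        \
	errno = 0;                                        \
	val = strtoul((in), &end, (base));                \
	if (errno != 0 || end[0] != (dlm) || val > (lim)) \
		return -EINVAL;                           \
	(fd) = (typeof(fd))val;                           \
	(in) = end + 1;                                   \
} while (0)

/* hypothetical helper: parse "a.b.c.d/depth" into address and depth */
static int
parse_ipv4_net(const char *in, uint32_t *addr, uint8_t *depth)
{
	uint8_t a, b, c, d, m;

	GET_CB_FIELD(in, a, 0, UINT8_MAX, '.');
	GET_CB_FIELD(in, b, 0, UINT8_MAX, '.');
	GET_CB_FIELD(in, c, 0, UINT8_MAX, '.');
	GET_CB_FIELD(in, d, 0, UINT8_MAX, '/');
	GET_CB_FIELD(in, m, 0, 32, 0);	/* 0 delimiter: end of token */

	*addr = ((uint32_t)a << 24) | ((uint32_t)b << 16) |
		((uint32_t)c << 8) | d;
	*depth = m;
	return 0;
}

int
main(void)
{
	uint32_t addr;
	uint8_t depth;

	if (parse_ipv4_net("198.18.0.0/24", &addr, &depth) == 0)
		printf("addr=0x%08x depth=%u\n", (unsigned)addr, (unsigned)depth);
	return 0;
}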
[PATCH v4 2/2] examples/l3fwd: add config file support for EM
Add support to define ipv4 and ipv6 forwarding tables by reading from a config file for EM with a format similar to the l3fwd-acl one. With the removal of the hardcoded route tables for IPv4 and IPv6 from 'l3fwd_em', these routes have been moved to a separate default config file for use with EM. Related l3fwd docs have been updated to reflect these changes. Signed-off-by: Sean Morrissey Signed-off-by: Ravi Kerur Acked-by: Konstantin Ananyev --- doc/guides/sample_app_ug/l3_forward.rst | 89 +++-- examples/l3fwd/em_default_v4.cfg| 17 + examples/l3fwd/em_default_v6.cfg| 17 + examples/l3fwd/l3fwd_em.c | 476 ++-- examples/l3fwd/l3fwd_route.h| 38 +- 5 files changed, 411 insertions(+), 226 deletions(-) create mode 100644 examples/l3fwd/em_default_v4.cfg create mode 100644 examples/l3fwd/em_default_v6.cfg diff --git a/doc/guides/sample_app_ug/l3_forward.rst b/doc/guides/sample_app_ug/l3_forward.rst index 6d7d7c5cc1..01d86db95d 100644 --- a/doc/guides/sample_app_ug/l3_forward.rst +++ b/doc/guides/sample_app_ug/l3_forward.rst @@ -47,6 +47,7 @@ and loaded into the LPM or FIB object at initialization time. In the sample application, hash-based and FIB-based forwarding supports both IPv4 and IPv6. LPM-based forwarding supports IPv4 only. +During the initialization phase route rules for IPv4 and IPv6 are read from rule files. Compiling the Application - @@ -61,6 +62,8 @@ Running the Application The application has a number of command line options:: ./dpdk-l3fwd [EAL options] -- -p PORTMASK + --rule_ipv4=FILE + --rule_ipv6=FILE [-P] [--lookup LOOKUP_METHOD] --config(port,queue,lcore)[,(port,queue,lcore)] @@ -82,6 +85,11 @@ Where, * ``-p PORTMASK:`` Hexadecimal bitmask of ports to configure +* ``--rule_ipv4=FILE:`` specify the ipv4 rules entries file. + Each rule occupies one line. + +* ``--rule_ipv6=FILE:`` specify the ipv6 rules entries file. + * ``-P:`` Optional, sets all ports to promiscuous mode so that packets are accepted regardless of the packet's Ethernet MAC destination address. Without this option, only packets with the Ethernet MAC destination address set to the Ethernet address of the port are accepted. @@ -135,7 +143,7 @@ To enable L3 forwarding between two ports, assuming that both ports are in the s .. code-block:: console -.//examples/dpdk-l3fwd -l 1,2 -n 4 -- -p 0x3 --config="(0,0,1),(1,0,2)" +.//examples/dpdk-l3fwd -l 1,2 -n 4 -- -p 0x3 --config="(0,0,1),(1,0,2)" --rule_ipv4="rule_ipv4.cfg" --rule_ipv6="rule_ipv6.cfg" In this command: @@ -157,19 +165,23 @@ In this command: | | | | | +--+---+---+-+ +* The -rule_ipv4 option specifies the reading of IPv4 rules sets from the rule_ipv4.cfg file + +* The -rule_ipv6 option specifies the reading of IPv6 rules sets from the rule_ipv6.cfg file. + To use eventdev mode with sync method **ordered** on above mentioned environment, Following is the sample command: .. code-block:: console -.//examples/dpdk-l3fwd -l 0-3 -n 4 -a -- -p 0x3 --eventq-sched=ordered +.//examples/dpdk-l3fwd -l 0-3 -n 4 -a -- -p 0x3 --eventq-sched=ordered --rule_ipv4="rule_ipv4.cfg" --rule_ipv6="rule_ipv6.cfg" or .. code-block:: console .//examples/dpdk-l3fwd -l 0-3 -n 4 -a \ - -- -p 0x03 --mode=eventdev --eventq-sched=ordered + -- -p 0x03 --mode=eventdev --eventq-sched=ordered --rule_ipv4="rule_ipv4.cfg" --rule_ipv6="rule_ipv6.cfg" In this command: @@ -192,7 +204,7 @@ scheduler. Following is the sample command: ..
code-block:: console -.//examples/dpdk-l3fwd -l 0-7 -s 0xf -n 4 --vdev event_sw0 -- -p 0x3 --mode=eventdev --eventq-sched=ordered +.//examples/dpdk-l3fwd -l 0-7 -s 0xf -n 4 --vdev event_sw0 -- -p 0x3 --mode=eventdev --eventq-sched=ordered --rule_ipv4="rule_ipv4.cfg" --rule_ipv6="rule_ipv6.cfg" In case of eventdev mode, *--config* option is not used for ethernet port configuration. Instead each ethernet port will be configured with mentioned @@ -216,6 +228,49 @@ The following sections provide some explanation of the sample application code. the initialization and run-time paths are very similar to those of the :doc:`l2_forward_real_virtual` and :doc:`l2_forward_event`. The following sections describe aspects that are specific to the L3 Forwarding sample application. +Parse Rules from File +~ + +The application parses the rules from the file and adds them to the appropriate route table by calling the appropriate function. +It ignores empty and comment lines, and parses and validates the rules it reads. +If errors are detected, the appl
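For illustration, a plausible IPv4 LPM rule file in the format these patches describe — '#' starts a comment, 'R' starts a route (COMMENT_LEAD_CHAR/ROUTE_LEAD_CHAR in patch 1/2), one rule per line. The exact contents of the shipped lpm_default_v4.cfg are not reproduced here:

# Hypothetical LPM rule file: R <dest_prefix/depth> <out_if>
R 198.18.0.0/24 0
R 198.18.1.0/24 1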
[PATCH] net/bonding: fix MTU set for slaves
ethdev requires device to be configured before setting MTU. In bonding PMD, while configuring slaves, bonding first sets MTU later configures them, which causes failure if slaves are not configured in advance. Fixing by changing the order in slave configure as requested in ethdev layer, configure first and set MTU later. Bugzilla ID: 864 Fixes: b26bee10ee37 ("ethdev: forbid MTU set before device configure") Cc: sta...@dpdk.org Signed-off-by: Ferruh Yigit --- Cc: Andrew Rybchenko Cc: yux.ji...@intel.com --- drivers/net/bonding/rte_eth_bond_pmd.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c index 9607141b393e..6fd52035c9a8 100644 --- a/drivers/net/bonding/rte_eth_bond_pmd.c +++ b/drivers/net/bonding/rte_eth_bond_pmd.c @@ -1741,14 +1741,6 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev, } } - errval = rte_eth_dev_set_mtu(slave_eth_dev->data->port_id, -bonded_eth_dev->data->mtu); - if (errval != 0 && errval != -ENOTSUP) { - RTE_BOND_LOG(ERR, "rte_eth_dev_set_mtu: port %u, err (%d)", - slave_eth_dev->data->port_id, errval); - return errval; - } - /* Configure device */ errval = rte_eth_dev_configure(slave_eth_dev->data->port_id, nb_rx_queues, nb_tx_queues, @@ -1759,6 +1751,14 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev, return errval; } + errval = rte_eth_dev_set_mtu(slave_eth_dev->data->port_id, +bonded_eth_dev->data->mtu); + if (errval != 0 && errval != -ENOTSUP) { + RTE_BOND_LOG(ERR, "rte_eth_dev_set_mtu: port %u, err (%d)", + slave_eth_dev->data->port_id, errval); + return errval; + } + /* Setup Rx Queues */ for (q_id = 0; q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) { bd_rx_q = (struct bond_rx_queue *)bonded_eth_dev->data->rx_queues[q_id]; -- 2.34.1
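For context, the constraint the fix honors: since the commit named in the Fixes tag, an MTU can only be applied to an already-configured device. A minimal sketch of the required ordering against the public ethdev API (queue counts and conf are illustrative, error paths shortened):

#include <errno.h>
#include <rte_ethdev.h>

static int
slave_bring_up(uint16_t port_id, uint16_t mtu)
{
	struct rte_eth_conf conf = {0};
	int ret;

	/* configure first ... */
	ret = rte_eth_dev_configure(port_id, 1, 1, &conf);
	if (ret != 0)
		return ret;

	/* ... then the MTU can be applied; doing this before configure
	 * is what b26bee10ee37 started rejecting */
	ret = rte_eth_dev_set_mtu(port_id, mtu);
	if (ret != 0 && ret != -ENOTSUP)
		return ret;

	return 0;
}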
Re: [PATCH] mbuf: delete dynamic fields copy in hdr copy
Hi, On Tue, Jan 11, 2022 at 05:45:49PM +0100, Thomas Monjalon wrote: > 14/12/2021 08:56, Gaoxiang Liu: > > Because dynamic fields are registered by the DPDK application, > > so it is up to the application to decide whether to copy the value of > > dynamic fields. > > So delete dynamic fields copy in __rte_pktmbuf_copy_hdr. > > It's more flexible for the DPDK application, > > and is useful for improving performance. > > Yes, removing operations will improve the performance, > but it looks wrong. > This is copying all dynamic fields, not matter which one is registered. > We cannot ask the application to manage dynamic fields copy, > especially if the copy is done inside a library. +1 Dynamic fields/flags can be registered by applications, libraries, drivers, ... There is no entity that is aware of which field/flag has to be copied, so the only possibility is to copy all of them.
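For readers new to the facility under discussion: dynamic fields live inside the mbuf itself and are registered at runtime by whichever component needs them, so no single entity knows their semantics — hence the conclusion above that the generic header copy must carry all of them. A minimal registration/access sketch (field name and type are illustrative):

#include <stdalign.h>
#include <stdint.h>

#include <rte_mbuf.h>
#include <rte_mbuf_dyn.h>

static int my_dynfield_offset = -1;

/* register a 64-bit private field; applications, libraries and drivers
 * can all do exactly the same, which is why no single owner exists */
static int
my_dynfield_init(void)
{
	static const struct rte_mbuf_dynfield desc = {
		.name = "example_app_field",	/* illustrative name */
		.size = sizeof(uint64_t),
		.align = alignof(uint64_t),
	};

	my_dynfield_offset = rte_mbuf_dynfield_register(&desc);
	return my_dynfield_offset < 0 ? -1 : 0;
}

/* the value is stored in the mbuf itself, so only a copy of the whole
 * dynamic-field area can preserve it for every registered component */
static void
my_dynfield_set(struct rte_mbuf *m, uint64_t v)
{
	*RTE_MBUF_DYNFIELD(m, my_dynfield_offset, uint64_t *) = v;
}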
Re: [PATCH] vdpa/mlx5: workaround queue stop with traffic
On 11/22/21 14:12, Matan Azrad wrote: When the event thread polls traffic and a virtq is stopping, the FW loses synchronization in the virtq indexes. It causes LM failure on synchronization between the HOST indexes and the GUEST indexes. Unset the event thread before the queue stop in the LM process. Fixes: 31b9c29c86af ("vdpa/mlx5: support close and config operations") Cc: sta...@dpdk.org Signed-off-by: Matan Azrad Acked-by: Xueming Li --- drivers/vdpa/mlx5/mlx5_vdpa.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c index b9e84dd9bf..8dfaba791d 100644 --- a/drivers/vdpa/mlx5/mlx5_vdpa.c +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c @@ -250,10 +250,10 @@ mlx5_vdpa_dev_close(int vid) DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name); return -1; } - if (priv->configured) - ret |= mlx5_vdpa_lm_log(priv); mlx5_vdpa_err_event_unset(priv); mlx5_vdpa_cqe_event_unset(priv); + if (priv->configured) + ret |= mlx5_vdpa_lm_log(priv); mlx5_vdpa_steer_unset(priv); mlx5_vdpa_virtqs_release(priv); mlx5_vdpa_event_qp_global_release(priv); Reviewed-by: Maxime Coquelin Thanks, Maxime
Re: [PATCH] net/virtio: fix Tx queue 0 override by queue 128
On 12/2/21 14:50, Xueming Li wrote: Both Rx queue and Tx queue are VirtQ in virtio, VQ index is 256 for Tx queue 128. Uint8 type of TxQ VQ index overflows and overrides Tx queue 0 data. This patch fixes VQ index type with uint16 type. Fixes: c1f86306a026 ("virtio: add new driver") Cc: sta...@dpdk.org Signed-off-by: Xueming Li --- drivers/net/virtio/virtio_rxtx.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 2e115ded023..f0eafd29dc1 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -814,7 +814,7 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev, unsigned int socket_id __rte_unused, const struct rte_eth_txconf *tx_conf) { - uint8_t vq_idx = 2 * queue_idx + VTNET_SQ_TQ_QUEUE_IDX; + uint16_t vq_idx = 2 * queue_idx + VTNET_SQ_TQ_QUEUE_IDX; struct virtio_hw *hw = dev->data->dev_private; struct virtqueue *vq = hw->vqs[vq_idx]; struct virtnet_tx *txvq; @@ -858,7 +858,7 @@ int virtio_dev_tx_queue_setup_finish(struct rte_eth_dev *dev, uint16_t queue_idx) { - uint8_t vq_idx = 2 * queue_idx + VTNET_SQ_TQ_QUEUE_IDX; + uint16_t vq_idx = 2 * queue_idx + VTNET_SQ_TQ_QUEUE_IDX; struct virtio_hw *hw = dev->data->dev_private; struct virtqueue *vq = hw->vqs[vq_idx]; Reviewed-by: Maxime Coquelin Thanks for the fix! Maxime
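The arithmetic behind the fix, for reference: virtio interleaves Rx and Tx virtqueues, the Tx VQ of ethdev queue N sitting at 2 * N + VTNET_SQ_TQ_QUEUE_IDX (assumed to be 1 here, per that pairing; the cover text's 256 is the 2 * 128 part). For Tx queue 128 the full index is 257, which a uint8_t silently truncates to 1 — exactly the VQ index of Tx queue 0:

#include <stdint.h>
#include <stdio.h>

#define VTNET_SQ_TQ_QUEUE_IDX 1 /* assumed Tx offset, per virtio pairing */

int
main(void)
{
	uint16_t queue_idx = 128;
	uint8_t  vq_u8  = 2 * queue_idx + VTNET_SQ_TQ_QUEUE_IDX;
	uint16_t vq_u16 = 2 * queue_idx + VTNET_SQ_TQ_QUEUE_IDX;

	/* prints "u8=1 u16=257": the narrow type aliases Tx queue 0 */
	printf("u8=%u u16=%u\n", vq_u8, vq_u16);
	return 0;
}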
Re: [PATCH v1] raw/ifpga: fix ifpga devices cleanup function
On 1/26/2022 3:29 AM, Wei Huang wrote: Use rte_dev_remove() to replace rte_rawdev_pmd_release() in ifpga_rawdev_cleanup(), resources occupied by ifpga raw devices such as threads can be released correctly. As far as I understand you are fixing an issue that not all resources are released, is this correct? What are these not released resources? And 'rte_rawdev_pmd_release()' rawdev API seems intended to do the cleanup, is it expected that some resources are not freed after this call, or should we fix that API? If the device remove API needs to be used, what is the point of 'rte_rawdev_pmd_release()' API? cc'ed rawdev maintainers for comment. Fixes: f724a802 ("raw/ifpga: add miscellaneous APIs") Signed-off-by: Wei Huang --- drivers/raw/ifpga/ifpga_rawdev.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/raw/ifpga/ifpga_rawdev.c b/drivers/raw/ifpga/ifpga_rawdev.c index fdf3c23..88c38aa 100644 --- a/drivers/raw/ifpga/ifpga_rawdev.c +++ b/drivers/raw/ifpga/ifpga_rawdev.c @@ -1787,12 +1787,14 @@ int ifpga_rawdev_partial_reconfigure(struct rte_rawdev *dev, int port, void ifpga_rawdev_cleanup(void) { struct ifpga_rawdev *dev; + struct rte_rawdev *rdev; unsigned int i; for (i = 0; i < IFPGA_RAWDEV_NUM; i++) { dev = &ifpga_rawdevices[i]; if (dev->rawdev) { - rte_rawdev_pmd_release(dev->rawdev); + rdev = dev->rawdev; + rte_dev_remove(rdev->device); dev->rawdev = NULL; } }
Re: [PATCH v2 01/10] ethdev: introduce flow pre-configuration hints
On Wed, Jan 26, 2022 at 12:19:43PM +, Ori Kam wrote: > > > > -Original Message- > > From: Thomas Monjalon > > Sent: Wednesday, January 26, 2022 1:22 PM > > Subject: Re: [PATCH v2 01/10] ethdev: introduce flow pre-configuration hints > > > > 26/01/2022 11:52, Bruce Richardson: > > > The scenario is as follows. Suppose we have the initial state as below: > > > > > > struct x_dev_cfg { > > >int x; > > > }; > > > > > > int > > > x_dev_cfg(int dev_id, struct x_dev_cfg *cfg) > > > { > > >struct x_dev *dev = x_devs[id]; > > >// some setup/config may go here > > >return dev->configure(cfg, sizeof(cfg)); // sizeof(cfg) == 4 > > > } > > > > > > Now, supposing we need to add in a new field into the config structure, a > > > very common occurance. This will indeed break the ABI, so we need to use > > > ABI versioning, to ensure that apps passing in the old structure, only > > > call > > > a function which expects the old structure. Therefore, we need a copy of > > > the old structure, and a function to work on it. This gives this result: > > > > > > struct x_dev_cfg { > > > int x; > > > bool flag; // new field; > > > }; > > > > > > struct x_dev_cfg_v22 { // needed for ABI-versioned function > > > int x; > > > }; > > > > > > /* this function will only be called by *newly-linked* code, which uses > > > * the new structure */ > > > int > > > x_dev_cfg(int dev_id, struct x_dev_cfg *cfg) > > > { > > >struct x_dev *dev = x_devs[id]; > > >// some setup/config may go here > > >return dev->configure(cfg, sizeof(cfg)); // sizeof(cfg) is now 8 > > > } > > > > > > /* this function is called by apps linked against old version */ > > > int > > > x_dev_cfg_v22(int dev_id, struct x_dev_cfg_v22 *cfg) > > > { > > >struct x_dev *dev = x_devs[id]; > > >// some setup/config may go here > > >return dev->configure((void *)cfg, sizeof(cfg)); // sizeof(cfg) is > > > still 4 > > > } > > > > > > With the above library code, we have different functions using the > > > different structures, so ABI compatibility is preserved - apps passing in > > > a > > > 4-byte struct call a function using the 4-byte struct, while newer apps > > > can > > > use the 8-byte version. > > > > > > The final part of the puzzle is then how drivers react to this change. > > > Originally, all drivers only use "x" in the config structure because that > > > is all that there is. That will still continue to work fine in the above > > > case, as both 4-byte and 8-byte structs have the same x value at the same > > > offset. i.e. no driver updates for x_dev is needed. > > > > > > On the other hand, if there are drivers that do want/need the new field, > > > they can also get to use it, but they do need to check for its presence > > > before they do so, i.e they would work as below: > > > > > > if (size_param > struct(x_dev_cfg_v22)) { // or "== struct(x_dev_cfg)" > > > // use flags field > > > } > > > > > > Hope this is clear now. > > > > Yes, this is the kind of explanation we need in our guideline doc. > > Alternatives can be documented as well. > > If we can list pros/cons in the doc, it will be easier to choose > > the best approach and to explain the choice during code review. > > > > > Thanks you very much for the clear explanation. > > The draw back is that we need also to duplicate the functions. > Using the flags/version we only need to create new structures > and from application point of view it knows what exta fields it gets. 
> (I agree that application knowledge has downsides but also advantages) > > In the case of flags/version your example will look like this (this is for > the record and may other > developers are intrested): > > struct x_dev_cfg { //original struct > int ver; > int x; > }; > > struct x_dev_cfg_v2 { // new struct > int ver; > int x; > bool flag; // new field; > }; > > > The function is always the same function: > x_dev_cfg(int dev_id, struct x_dev_cfg *cfg) > { > struct x_dev *dev = x_devs[id]; > // some setup/config may go here > return dev->configure(cfg); > } > > When calling this function with old struct: > X_dev_cfg(id, (struct x_dev_cfg *)cfg) > > When calling this function with new struct: > X_dev_cfg(id, (struct x_dev_cfg *)cfg_v2) > > In PMD: > If (cfg->ver >= 2) > // version 2 logic > Else If (cfg->v >=0) > // base version logic > > > When using flags it gives even more control since pmd can tell exactly what > features are required. > > All options have prons/cons > I vote for the version one. > > We can have a poll 😊 > Or like Thomas said list pros and cons and each subsystem can > have it own selection. The biggest issue I have with this version approach is how is the user meant to know what version number to put into the structure? When the user upgrades from one version of DPDK to the next, are they manually to update their version numbers in all their structures? If they
Re: [PATCH] net/iavf: fix null pointer dereference
On 1/25/2022 2:23 PM, Weiguo Li wrote: Check for memory allocation failure is added to avoid null pointer dereference. Fixes: 6bc987ecb860 ("net/iavf: support IPsec inline crypto") Signed-off-by: Weiguo Li --- Acked-by: Radu Nicolau
Re: [PATCH 1/3] net/enic: add support for eCPRI matching
On 1/14/2022 3:10 AM, John Daley wrote: eCPRI message can be over Ethernet layer (.1Q supported also) or over UDP layer. Message header formats are the same in these two variants. Only up through the first packet header in the PDU can be matched. RSS on the eCPRI header fields is not supported. Signed-off-by: John Daley Reviewed-by: Hyong Youb Kim --- doc/guides/rel_notes/release_22_03.rst | 1 + drivers/net/enic/enic_fm_flow.c| 65 ++ 2 files changed, 66 insertions(+) Documentation update is missing, can you please fix? $ ./devtools/check-doc-vs-code.sh rte_flow doc out of sync for enic item ecpri
Re: [PATCH 1/3] net/enic: add support for eCPRI matching
On 1/26/2022 2:00 PM, Ferruh Yigit wrote: On 1/14/2022 3:10 AM, John Daley wrote: eCPRI message can be over Ethernet layer (.1Q supported also) or over UDP layer. Message header formats are the same in these two variants. Only up through the first packet header in the PDU can be matched. RSS on the eCPRI header fields is not supported. Signed-off-by: John Daley Reviewed-by: Hyong Youb Kim --- doc/guides/rel_notes/release_22_03.rst | 1 + drivers/net/enic/enic_fm_flow.c | 65 ++ 2 files changed, 66 insertions(+) Documentation update is missing, can you please fix? $ ./devtools/check-doc-vs-code.sh rte_flow doc out of sync for enic item ecpri Hi Thomas, Can we add the './devtools/check-doc-vs-code.sh' check to CI, what do you think?
Re: [PATCH 3/3] net/enic: support max descriptors allowed by adapter
On 1/14/2022 3:10 AM, John Daley wrote: Newer VIC adapters have the max number of supported RX and TX descriptors in their configuration. Use these values as the maximums. Signed-off-by: John Daley Reviewed-by: Hyong Youb Kim <...> diff --git a/drivers/net/enic/enic_res.h b/drivers/net/enic/enic_res.h index 34f15d5a42..ae979d52be 100644 --- a/drivers/net/enic/enic_res.h +++ b/drivers/net/enic/enic_res.h @@ -12,9 +12,11 @@ #include "vnic_rq.h" #define ENIC_MIN_WQ_DESCS 64 -#define ENIC_MAX_WQ_DESCS 4096 There are still 'ENIC_MAX_WQ_DESCS' usage remaining in the code, causing build error, can you please fix. ../drivers/net/enic/enic_rxtx.c: In function ‘enic_free_wq_bufs’: ../drivers/net/enic/enic_rxtx.c:397:46: error: ‘ENIC_MAX_WQ_DESCS’ undeclared (first use in this function); did you mean ‘ENIC_MIN_WQ_DESCS’? 397 | RTE_ASSERT(nb_free < ENIC_MAX_WQ_DESCS); | ^ ../lib/eal/include/rte_branch_prediction.h:38:45: note: in definition of macro ‘unlikely’ 38 | #define unlikely(x) __builtin_expect(!!(x), 0) | ^ ../lib/eal/include/rte_debug.h:47:25: note: in expansion of macro ‘RTE_VERIFY’ 47 | #define RTE_ASSERT(exp) RTE_VERIFY(exp) | ^~ ../drivers/net/enic/enic_rxtx.c:397:25: note: in expansion of macro ‘RTE_ASSERT’ 397 | RTE_ASSERT(nb_free < ENIC_MAX_WQ_DESCS); | ^~ ../drivers/net/enic/enic_rxtx.c:397:46: note: each undeclared identifier is reported only once for each function it appears in 397 | RTE_ASSERT(nb_free < ENIC_MAX_WQ_DESCS); | ^ ../lib/eal/include/rte_branch_prediction.h:38:45: note: in definition of macro ‘unlikely’ 38 | #define unlikely(x) __builtin_expect(!!(x), 0) | ^ ../lib/eal/include/rte_debug.h:47:25: note: in expansion of macro ‘RTE_VERIFY’ 47 | #define RTE_ASSERT(exp) RTE_VERIFY(exp) | ^~ ../drivers/net/enic/enic_rxtx.c:397:25: note: in expansion of macro ‘RTE_ASSERT’ 397 | RTE_ASSERT(nb_free < ENIC_MAX_WQ_DESCS); | ^~
Re: [PATCH] vhost: fix data-plane access to released vq
Hi Yuan, On 12/3/21 17:34, Yuan Wang wrote: From: yuan wang When numa reallocation occurs, numa_realloc() on the control plane will free the old vq. If rte_vhost_dequeue_burst() on the data plane gets the vq just before release, then it will access the released vq. We need to put the vq->access_lock into struct virtio_net to ensure that it can prevent this situation. This patch is a fix, so the Fixes tag would be needed. But are you really facing this issue, or is this just based on code review? Currently NUMA reallocation is called whenever translate_ring_addresses() is called. translate_ring_addresses() is primarily called at device initialization, before the .new_device() callback is called. At that stage, there is no risk in performing a NUMA reallocation as the application is not expected to use APIs requiring vq->access_lock acquisition. But I agree there are possibilities that numa_realloc() gets called while the device is in running state. But even if that happened, I don't think it is possible that numa_realloc() ends up reallocating the virtqueue on a different NUMA node (the vring should not have moved from a physical memory standpoint). And even if it happened, we should be safe because we ensure the VQ was not ready (so not usable by the application) before proceeding with reallocation: static struct virtio_net* numa_realloc(struct virtio_net *dev, int index) { int node, dev_node; struct virtio_net *old_dev; struct vhost_virtqueue *vq; struct batch_copy_elem *bce; struct guest_page *gp; struct rte_vhost_memory *mem; size_t mem_size; int ret; old_dev = dev; vq = dev->virtqueue[index]; /* * If VQ is ready, it is too late to reallocate, it certainly already * happened anyway on VHOST_USER_SET_VRING_ADRR. */ if (vq->ready) return dev; So, if this is fixing a real issue, I would need more details on the issue in order to understand why vq->ready was not set when it should have been. On a side note, while trying to understand how you could face an issue, I noticed that translate_ring_addresses() may be called by vhost_user_iotlb_msg(). In that case, vq->access_lock is not held as this is the handler for VHOST_USER_IOTLB_MSG. We may want to protect translate_ring_addresses() calls by taking the VQ locks. I will post a fix for it. Signed-off-by: Yuan Wang --- lib/vhost/vhost.c | 26 +- lib/vhost/vhost.h | 4 +--- lib/vhost/vhost_user.c | 4 ++-- lib/vhost/virtio_net.c | 16 4 files changed, 24 insertions(+), 26 deletions(-) ... diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h index 7085e0885c..f85ce4fda5 100644 --- a/lib/vhost/vhost.h +++ b/lib/vhost/vhost.h @@ -185,9 +185,6 @@ struct vhost_virtqueue { boolaccess_ok; boolready; - rte_spinlock_t access_lock; - - union { struct vring_used_elem *shadow_used_split; struct vring_used_elem_packed *shadow_used_packed; @@ -384,6 +381,7 @@ struct virtio_net { int extbuf; int linearbuf; struct vhost_virtqueue *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2]; + rte_spinlock_t vq_access_lock[VHOST_MAX_QUEUE_PAIRS * 2]; The problem here is that you'll be introducing false sharing, so I expect performance to no longer scale with the number of queues. It also consumes unnecessary memory. struct inflight_mem_info *inflight_info; #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ) charifname[IF_NAME_SZ]; Thanks, Maxime
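To make the false-sharing point concrete: spinlocks packed into one array share cache lines, so cores serving different queues invalidate each other's lines on every lock operation, while a lock embedded in its own cache-aligned per-queue structure does not. A reduced sketch, not the actual vhost layout:

#include <rte_common.h>
#include <rte_spinlock.h>

#define MAX_QUEUES 256

/* proposed layout: many locks share each 64B cache line, so unrelated
 * queues ping-pong the line between cores (false sharing) */
struct dev_shared_locks {
	rte_spinlock_t vq_access_lock[MAX_QUEUES];
};

/* existing layout: one lock per queue structure, each queue structure
 * cache-line aligned, so queues never share a line */
struct vq_private {
	rte_spinlock_t access_lock;
	/* ... rest of the per-queue state ... */
} __rte_cache_aligned;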
Re: [PATCH] net/memif: remove unnecessary rx_intr stub
On 1/15/2022 8:15 AM, Morten Brørup wrote: From: Stephen Hemminger [mailto:step...@networkplumber.org] Sent: Friday, 14 January 2022 21.47 The code in memif driver to stub out rx_irq_enable is unnecessary and causes different error returns than other drivers. The core ethdev code will return -ENOTSUP if the driver has a null rx_queue_intr_enable callback. Fixes: 09c7e63a71f9 ("net/memif: introduce memory interface PMD") Cc: jgraj...@cisco.com Signed-off-by: Stephen Hemminger Acked-by: Morten Brørup No comment from maintainers, hence progressing, Reviewed-by: Ferruh Yigit Applied to dpdk-next-net/main, thanks.
RE: [PATCH v2] net/i40e: reduce redundant store operation
> -Original Message- > From: Feifei Wang > Sent: Tuesday, December 21, 2021 4:11 PM > To: Xing, Beilei > Cc: dev@dpdk.org; Wang, Haiyue ; n...@arm.com; > Feifei Wang ; Ruifeng Wang > > Subject: [PATCH v2] net/i40e: reduce redundant store operation > > For free buffer operation in i40e vector path, it is unnecessary to store 'NULL' > into txep.mbuf. This is because when putting mbuf into Tx queue, tx_tail is > the sentinel. And when doing tx_free, tx_next_dd is the sentinel. In all > processes, mbuf==NULL is not a condition in check. > Thus reset of mbuf is unnecessary and can be omitted. > > Signed-off-by: Feifei Wang > Reviewed-by: Ruifeng Wang > --- > > v2: remove the change for scalar path due to scalar path needs to check > whether the mbuf is 'NULL' to release and clean up (Haiyue) > > drivers/net/i40e/i40e_rxtx_vec_common.h | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h > b/drivers/net/i40e/i40e_rxtx_vec_common.h > index f9a7f46550..26deb59fc4 100644 > --- a/drivers/net/i40e/i40e_rxtx_vec_common.h > +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h > @@ -103,7 +103,6 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq) > if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) { > for (i = 0; i < n; i++) { > free[i] = txep[i].mbuf; > - txep[i].mbuf = NULL; I would suggest still adding a short comment here to explain this; it may help to avoid unnecessary confusion when someone is reading or debugging this code 😊 > } > rte_mempool_put_bulk(free[0]->pool, (void **)free, n); > goto done; > -- > 2.25.1
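Along the lines of the suggestion, the hunk could carry a comment such as this (a sketch, exact wording illustrative):

if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
	for (i = 0; i < n; i++) {
		free[i] = txep[i].mbuf;
		/* no need to reset txep[i].mbuf to NULL: tx_tail and
		 * tx_next_dd act as the sentinels, so the stale pointer
		 * is never examined before being overwritten on the
		 * next enqueue to this slot
		 */
	}
	rte_mempool_put_bulk(free[0]->pool, (void **)free, n);
	goto done;
}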
Re: [PATCH 0/3] three bugfixes for hns3 PMD
On 1/17/2022 2:42 AM, Min Hu (Connor) wrote: Three bugfixes: fix vector burst unsupported when PTP enable fix mailbox wait time uninitialization fix Rx/Tx when fast path operation introduced Huisong Li (1): net/hns3: fix mailbox wait time uninitialization Min Hu (Connor) (2): net/hns3: fix Rx/Tx when fast path operation introduced net/hns3: fix vector burst unsupported when PTP enable Series applied to dpdk-next-net/main, thanks.
DTS WG Meeting Minutes 1/19/22
Hello Everyone, Sorry about being so late in sending these out. This document will always be up to date: https://docs.google.com/document/d/1E2mkTHNQ5vyln1JvnJTil15cjacUTXP7LXb9K960Vxs/edit#heading=h.6k5h0aafbuyy Here are the minutes: DTS Working Group Meeting Minutes for 2022 January 19, 2022Attendees - Owen Hilyard - Lincoln Lavoie - Lijuan Tu - Juraj Linkes - Honnappa Nagarahalli Agenda - Review/Carry forward pending action items - Merging DTS into DPDK - Review additional work items MinutesReview/Carry forward pending action items - Honnappa - Propose changes to DPDK to address changing the constants and printfs, refer to Lijuan’s email. - Work is started to add options for configuring the RX and TX queue sizes for L3fwd sample application - Looking to remove the examples/distributor from DPDK which will end up removing the corresponding test case from DTS. - Owen - Send out the DTS contributions guide to DTS and DPDK mailing list. 2 weeks of RFC followed by a patch. - Email is sent - Owen, Lijuan - Look for merging the makefile removal patch ( http://patchwork.dpdk.org/project/dts/list/?series=20610) - No issues for CI - Add a depreciation warning for makefiles when run and in the release notes - Honnappa - Understand how the SPDX change was done for the DPDK repo - State of an automated solution is unknown - We are legally ok to change files with BSD licenses and files with no license to SPDX with BSD-3 clause - Honnappa - Talk to techboard about files with the GPL2 license in DTS. - Work in progress - Owen - Take a deeper look at the license and copyright assignments for the files. - Completed - If there is a possibility of issues, a file-by-file examination would be warranted - Juraj - Convert the makefile build for documentation to use meson build. - Juraj will look at this DTS Contribution Guide - Integrate suggestions from Juraj Merging DTS into DPDK - Try for 22.06 - Focus on framework and test cases used in CI - pf_smoke - vf_smoke - virtio_smoke - nic_single_core_perf - mtu_update - stats_checks - dynamic_config - dynamic_queue - mac_filter - queue_start_stop - scatter - unit_tests_mbuf Review additional work items - Action Items - Honnappa - Propose changes to DPDK to address changing the constants and printfs, refer to Lijuan’s email. - All - Review Owen’s DTS contributions guide - Owen, Lijuan - Look for merging the makefile removal patch ( http://patchwork.dpdk.org/project/dts/list/?series=20610) - Honnappa - Understand how the SPDX change was done for the DPDK repo - Honnappa - Talk to techboard about files with the GPL2 license in DTS. - Juraj - Convert the makefile build for documentation to use meson build. - Owen - RFC for making the python version 3.6 - Owen - Look at making DTS build using the DPDK repo it is in. Any other business - Next Meeting: January 26, 2022 - Subject for next meeting: Focus on getting framework ready to move
RE: [PATCH v2 01/10] ethdev: introduce flow pre-configuration hints
> -Original Message- > From: Bruce Richardson > Sent: Wednesday, January 26, 2022 3:41 PM > To: Ori Kam > Subject: Re: [PATCH v2 01/10] ethdev: introduce flow pre-configuration hints > > On Wed, Jan 26, 2022 at 12:19:43PM +, Ori Kam wrote: > > > > > > > -Original Message- > > > From: Thomas Monjalon > > > Sent: Wednesday, January 26, 2022 1:22 PM > > > Subject: Re: [PATCH v2 01/10] ethdev: introduce flow pre-configuration > > > hints > > > > > > 26/01/2022 11:52, Bruce Richardson: > > > > The scenario is as follows. Suppose we have the initial state as below: > > > > > > > > struct x_dev_cfg { > > > >int x; > > > > }; > > > > > > > > int > > > > x_dev_cfg(int dev_id, struct x_dev_cfg *cfg) > > > > { > > > >struct x_dev *dev = x_devs[id]; > > > >// some setup/config may go here > > > >return dev->configure(cfg, sizeof(cfg)); // sizeof(cfg) == 4 > > > > } > > > > > > > > Now, supposing we need to add in a new field into the config structure, > > > > a > > > > very common occurance. This will indeed break the ABI, so we need to use > > > > ABI versioning, to ensure that apps passing in the old structure, only > > > > call > > > > a function which expects the old structure. Therefore, we need a copy of > > > > the old structure, and a function to work on it. This gives this result: > > > > > > > > struct x_dev_cfg { > > > > int x; > > > > bool flag; // new field; > > > > }; > > > > > > > > struct x_dev_cfg_v22 { // needed for ABI-versioned function > > > > int x; > > > > }; > > > > > > > > /* this function will only be called by *newly-linked* code, which uses > > > > * the new structure */ > > > > int > > > > x_dev_cfg(int dev_id, struct x_dev_cfg *cfg) > > > > { > > > >struct x_dev *dev = x_devs[id]; > > > >// some setup/config may go here > > > >return dev->configure(cfg, sizeof(cfg)); // sizeof(cfg) is now 8 > > > > } > > > > > > > > /* this function is called by apps linked against old version */ > > > > int > > > > x_dev_cfg_v22(int dev_id, struct x_dev_cfg_v22 *cfg) > > > > { > > > >struct x_dev *dev = x_devs[id]; > > > >// some setup/config may go here > > > >return dev->configure((void *)cfg, sizeof(cfg)); // sizeof(cfg) is > > > > still 4 > > > > } > > > > > > > > With the above library code, we have different functions using the > > > > different structures, so ABI compatibility is preserved - apps passing > > > > in a > > > > 4-byte struct call a function using the 4-byte struct, while newer apps > > > > can > > > > use the 8-byte version. > > > > > > > > The final part of the puzzle is then how drivers react to this change. > > > > Originally, all drivers only use "x" in the config structure because > > > > that > > > > is all that there is. That will still continue to work fine in the above > > > > case, as both 4-byte and 8-byte structs have the same x value at the > > > > same > > > > offset. i.e. no driver updates for x_dev is needed. > > > > > > > > On the other hand, if there are drivers that do want/need the new field, > > > > they can also get to use it, but they do need to check for its presence > > > > before they do so, i.e they would work as below: > > > > > > > > if (size_param > struct(x_dev_cfg_v22)) { // or "== > > > > struct(x_dev_cfg)" > > > > // use flags field > > > > } > > > > > > > > Hope this is clear now. > > > > > > Yes, this is the kind of explanation we need in our guideline doc. > > > Alternatives can be documented as well. 
> > > If we can list pros/cons in the doc, it will be easier to choose > > > the best approach and to explain the choice during code review. > > > > > > > > Thanks you very much for the clear explanation. > > > > The draw back is that we need also to duplicate the functions. > > Using the flags/version we only need to create new structures > > and from application point of view it knows what exta fields it gets. > > (I agree that application knowledge has downsides but also advantages) > > > > In the case of flags/version your example will look like this (this is for > > the record and may other > > developers are intrested): > > > > struct x_dev_cfg { //original struct > > int ver; > > int x; > > }; > > > > struct x_dev_cfg_v2 { // new struct > > int ver; > > int x; > > bool flag; // new field; > > }; > > > > > > The function is always the same function: > > x_dev_cfg(int dev_id, struct x_dev_cfg *cfg) > > { > > struct x_dev *dev = x_devs[id]; > > // some setup/config may go here > > return dev->configure(cfg); > > } > > > > When calling this function with old struct: > > X_dev_cfg(id, (struct x_dev_cfg *)cfg) > > > > When calling this function with new struct: > > X_dev_cfg(id, (struct x_dev_cfg *)cfg_v2) > > > > In PMD: > > If (cfg->ver >= 2) > > // version 2 logic > > Else If (cfg->v >=0) > > // base version logic > > > > > > When using flags it gives even more control si
RE: [PATCH] common/mlx5: fix MR lookup for non-contiguous mempool
Hi, > -Original Message- > From: Dmitry Kozlyuk > Sent: Friday, January 14, 2022 12:52 PM > To: dev@dpdk.org > Cc: sta...@dpdk.org; Wang Yunjian ; Slava > Ovsiienko ; Matan Azrad > Subject: [PATCH] common/mlx5: fix MR lookup for non-contiguous mempool > > Memory region (MR) lookup by address inside mempool MRs was not > accounting for the upper bound of an MR. > For mempools covered by multiple MRs this could return a wrong MR LKey, > typically resulting in an unrecoverable TxQ failure: > > mlx5_net: Cannot change Tx QP state to INIT Invalid argument > > Corresponding message from /var/log/dpdk_mlx5_port_X_txq_Y_index_Z*: > > Unexpected CQE error syndrome 0x04 CQN = 128 SQN = 4848 > wqe_counter = 0 wq_ci = 9 cq_ci = 122 > > This is likely to happen with --legacy-mem and IOVA-as-PA, because EAL > intentionally maps pages at non-adjacent PA to non-adjacent VA in this > mode, and MLX5 PMD works with VA. > > Fixes: 690b2a88c2f7 ("common/mlx5: add mempool registration facilities") > Cc: sta...@dpdk.org > > Reported-by: Wang Yunjian > Signed-off-by: Dmitry Kozlyuk > Reviewed-by: Viacheslav Ovsiienko Patch applied to next-net-mlx, Kindest regards, Raslan Darawsheh
RE: [PATCH] net/mlx5: fix mark enabling for Rx datapath
Hi, > -Original Message- > From: Raja Zidane > Sent: Sunday, January 16, 2022 5:24 PM > To: dev@dpdk.org > Cc: Matan Azrad ; sta...@dpdk.org > Subject: [PATCH] net/mlx5: fix mark enabling for Rx datapath > > To optimize datapath, the mlx5 pmd checked for mark action on flow > creation, and flagged possible destination rxqs (through queue/RSS > actions), then it enabled the mark action logic only for flagged rxqs. > > Mark action didn't work if no queue/rss action was in the same flow, > even when the user use multi-group logic to manage the flows. > So, if mark action is performed in group X and the packet is moved to group > Y > X when the packet is forwarded to Rx queues, SW did not get the mark ID > to the mbuf. > > Flag Rx datapath to report mark action for any queue when the driver > detects the first mark action after dev_start operation. > > Fixes: 8e61555657b2 ("net/mlx5: fix shared RSS and mark actions > combination") > Cc: sta...@dpdk.org > > Signed-off-by: Raja Zidane > --- > Acked-by: Matan Azrad ma...@nvidia.com Patch applied to next-net-mlx, Kindest regards, Raslan Darawsheh
RE: [PATCH] common/mlx5: fix probing return value when failing
Hi, > -Original Message- > From: Bing Zhao > Sent: Monday, January 17, 2022 7:49 PM > To: Slava Ovsiienko ; Matan Azrad > > Cc: dev@dpdk.org; Raslan Darawsheh ; > Xueming(Steven) Li ; sta...@dpdk.org > Subject: [PATCH] common/mlx5: fix probing return value when failing > > While probing the device with unsupported class, the process should > fail because no appropriate driver was found. After traversing all > the drivers, an error value should be returned for the case. > > In the previous implementation, zero value indicating probing success > was wrongly returned. > > Fixes: ad435d320473 ("common/mlx5: add bus-agnostic layer") > Cc: xuemi...@nvidia.com > Cc: sta...@dpdk.org > > Signed-off-by: Bing Zhao > Acked-by: Viacheslav Ovsiienko Patch applied to next-net-mlx, Kindest regards, Raslan Darawsheh
RE: [PATCH] net/mlx5: reject jump to root table
Hi, > -Original Message- > From: Xiaoyu Min > Sent: Tuesday, January 18, 2022 1:39 PM > To: Matan Azrad ; Slava Ovsiienko > ; Dekel Peled > Cc: dev@dpdk.org; sta...@dpdk.org > Subject: [PATCH] net/mlx5: reject jump to root table > > Currently root table as destination is not supported. > The jump action which finally be translated to underlying root table in > rdma-core should be rejected. > > Fixes: f78f747f41d0 ("net/mlx5: allow jump to group lower than current") > Cc: sta...@dpdk.org > > Signed-off-by: Xiaoyu Min > Acked-by: Viacheslav Ovsiienko Patch applied to next-net-mlx, Kindest regards, Raslan Darawsheh
Re: [PATCH v2] net/bonding: fix RSS not work for bonding
On 1/18/2022 9:27 AM, Min Hu (Connor) wrote: On 2022/1/18 17:18, Yu Wenjun wrote: RSS doesn't work when bond_ethdev_configure() is called before rte_eth_bond_slave_add(). e.g.: doesn't work (examples/bond/main.c): rte_eth_bond_create() rte_eth_dev_configure() rte_eth_bond_slave_add() rte_eth_dev_start() works (testpmd): rte_eth_bond_create() rte_eth_bond_slave_add() rte_eth_dev_configure() rte_eth_dev_start() Fixes: 6b1a001ec546 ("net/bonding: fix RSS key length") Cc: sta...@dpdk.org Signed-off-by: Yu Wenjun Acked-by: Min Hu (Connor) Updated commit log to include the rss key length being 0 detail, based on discussions on the mailing list. Applied to dpdk-next-net/main, thanks.
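The two sequences from the commit message, spelled out against the bonding API; only the relative order of rte_eth_dev_configure() and rte_eth_bond_slave_add() differs. Mode and queue counts are illustrative and error handling is omitted:

#include <rte_ethdev.h>
#include <rte_eth_bond.h>

/* failing order (examples/bond/main.c): configure runs while the bond
 * still has no slaves, so per the Fixes tag the RSS key length is
 * still 0 and RSS setup is skipped */
static void
order_that_failed(uint16_t slave_port, struct rte_eth_conf *conf)
{
	int bond_port = rte_eth_bond_create("net_bonding0", BONDING_MODE_ALB, 0);

	rte_eth_dev_configure(bond_port, 1, 1, conf);
	rte_eth_bond_slave_add(bond_port, slave_port);
	rte_eth_dev_start(bond_port);
}

/* working order (testpmd): slave added before configure, so the RSS
 * parameters can be derived from the slave */
static void
order_that_worked(uint16_t slave_port, struct rte_eth_conf *conf)
{
	int bond_port = rte_eth_bond_create("net_bonding1", BONDING_MODE_ALB, 0);

	rte_eth_bond_slave_add(bond_port, slave_port);
	rte_eth_dev_configure(bond_port, 1, 1, conf);
	rte_eth_dev_start(bond_port);
}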
DTS WG Meeting Minutes 1/26/22
Hello folks, The meeting minutes can be found here: https://docs.google.com/document/d/1E2mkTHNQ5vyln1JvnJTil15cjacUTXP7LXb9K960Vxs And here are the meeting minutes from Jan 26th: Attendees * Owen Hilyard * Juraj Linkes * Honnappa Nagarahalli * Lincoln Lavoie * Lijuan Tu * Ganapati Agenda * Additions to the agenda * Review/Carry forward pending action items * Merging DTS into DPDK * Review additional work items Minutes Review/Carry forward pending action items * Honnappa - Propose changes to DPDK to address changing the constants and printfs, refer to Lijuan's email. * Patches in internal review, to be pushed next week. * More exploration (of printfs) is needed. * All - Review Owen's DTS contributions guide. * Juraj provided comments, Owen to review, then publish the final version. * Owen, Lijuan - Look for merging the makefile removal patch (http://patchwork.dpdk.org/project/dts/list/?series=20610) * Honnappa - Understand how the SPDX change was done for the DPDK repo * We understand this now, closing. * Honnappa - Talk to techboard about files with the GPL2 license in DTS. * Techboard notified, Hemant is fine with having the GPL2 license in DTS, but discussion is still continuing. * Juraj - Convert the makefile build for documentation to use meson build. * Deferred to the future, when repositories are merged. * Owen - RFC for making the python version 3.6 * Python 3.6 is EOL. Which version to use is informed by Python version EOL, UNH lab's (and other lab's) software update policy and which version is shipped with Ubuntu stable. Going with 3.8 for now. For the future, go with a flexible approach where we won't use EOL Python versions and consider how labs update their software. * Owen - Look at making DTS build using the common DPDK repo in addition to the existing tarball option. * No progress. Merging DTS into DPDK * Are we going to integrate DTS with the DPDK build system? No resolution yet. * Try for 22.07 * Focus on framework and test cases used in CI * Do a deep dive on Feb 9th - identify what needs to be done and estimate the work. Action Items * Honnappa - Patches to DPDK to address changing the constants and printfs under development. * All - Review Owen's DTS contributions guide, Owen to review. * Owen, Lijuan - Look for merging the makefile removal patch (http://patchwork.dpdk.org/project/dts/list/?series=20610) in the next release. * Owen, Lijuan - Create release notes, capture the makefile deprecation in them. Also add a notification/log to DTS runs. * Honnappa - Continue the techboard discussion about files with the GPL2 license in DTS. * Juraj - Postponed: Convert the makefile build for documentation to use meson build after merging the repositories or the directory structure is known. * Owen - RFC for making the python version 3.6 done, update with version 3.8. * Owen - Look at making DTS build using the common DPDK repo in addition to the existing tarball option. Any other business * Next Meeting: February 02, 2022 Regards, Juraj
[PATCH] net/ice: add support for setting promisc by DCF
Allow enabling/disabling VF promiscuous mode over VF0. This feature needs an updated ice kernel driver (newer than v1.8.0). Signed-off-by: Yiding Zhou --- drivers/net/ice/ice_dcf_vf_representor.c | 56 +--- 1 file changed, 39 insertions(+), 17 deletions(-) diff --git a/drivers/net/ice/ice_dcf_vf_representor.c b/drivers/net/ice/ice_dcf_vf_representor.c index b9fcfc80ad..781282f68c 100644 --- a/drivers/net/ice/ice_dcf_vf_representor.c +++ b/drivers/net/ice/ice_dcf_vf_representor.c @@ -10,6 +10,20 @@ #include "ice_dcf_ethdev.h" #include "ice_rxtx.h" +static __rte_always_inline struct ice_dcf_hw * +ice_dcf_vf_repr_hw(struct ice_dcf_vf_repr *repr) +{ + struct ice_dcf_adapter *dcf_adapter = + repr->dcf_eth_dev->data->dev_private; + + if (!dcf_adapter) { + PMD_DRV_LOG(ERR, "DCF for VF representor has been released\n"); + return NULL; + } + + return &dcf_adapter->real_hw; +} + static uint16_t ice_dcf_vf_repr_rx_burst(__rte_unused void *rxq, __rte_unused struct rte_mbuf **rx_pkts, @@ -78,15 +92,36 @@ ice_dcf_vf_repr_tx_queue_setup(__rte_unused struct rte_eth_dev *dev, } static int -ice_dcf_vf_repr_promiscuous_enable(__rte_unused struct rte_eth_dev *ethdev) +ice_dcf_vf_repr_promiscuous_enable(struct rte_eth_dev *ethdev) { - return 0; + struct ice_dcf_vf_repr *repr = ethdev->data->dev_private; + struct dcf_virtchnl_cmd args; + struct virtchnl_promisc_info promisc; + struct ice_dcf_hw *hw = ice_dcf_vf_repr_hw(repr); + memset(&args, 0, sizeof(args)); + args.v_op = VIRTCHNL_OP_CONFIG_PROMISCUOUS_MODE; + promisc.flags = 0; + promisc.vsi_id = hw->vf_vsi_map[repr->vf_id] & ~VIRTCHNL_DCF_VF_VSI_VALID; + promisc.flags |= FLAG_VF_UNICAST_PROMISC; + args.req_msg = (uint8_t *)&promisc; + args.req_msglen = sizeof(promisc); + return ice_dcf_execute_virtchnl_cmd(hw, &args); } static int -ice_dcf_vf_repr_promiscuous_disable(__rte_unused struct rte_eth_dev *ethdev) +ice_dcf_vf_repr_promiscuous_disable(struct rte_eth_dev *ethdev) { - return 0; + struct ice_dcf_vf_repr *repr = ethdev->data->dev_private; + struct dcf_virtchnl_cmd args; + struct virtchnl_promisc_info promisc; + struct ice_dcf_hw *hw = ice_dcf_vf_repr_hw(repr); + memset(&args, 0, sizeof(args)); + args.v_op = VIRTCHNL_OP_CONFIG_PROMISCUOUS_MODE; + promisc.flags = 0; + promisc.vsi_id = hw->vf_vsi_map[repr->vf_id] & ~VIRTCHNL_DCF_VF_VSI_VALID; + args.req_msg = (uint8_t *)&promisc; + args.req_msglen = sizeof(promisc); + return ice_dcf_execute_virtchnl_cmd(hw, &args); } static int @@ -108,19 +143,6 @@ ice_dcf_vf_repr_link_update(__rte_unused struct rte_eth_dev *ethdev, return 0; } -static __rte_always_inline struct ice_dcf_hw * -ice_dcf_vf_repr_hw(struct ice_dcf_vf_repr *repr) -{ - struct ice_dcf_adapter *dcf_adapter = - repr->dcf_eth_dev->data->dev_private; - - if (!dcf_adapter) { - PMD_DRV_LOG(ERR, "DCF for VF representor has been released\n"); - return NULL; - } - - return &dcf_adapter->real_hw; -} static int ice_dcf_vf_repr_dev_info_get(struct rte_eth_dev *dev, -- 2.25.1
Re: [PATCH v3 1/1] net/tap: add a check that Rx/Tx have the same num of queues
On 1/19/2022 7:43 AM, Nobuhiro MIKI wrote: Users can create the desired number of RxQ and TxQ in DPDK. For example, if the number of RxQ = 2 and the number of TxQ = 5, a total of 8 file descriptors will be created for a tap device, including RxQ, TxQ, and one for keepalive. The RxQ and TxQ with the same ID are paired by dup(2). In this scenario, the kernel will have 3 RxQs where packets are incoming but never read. The reason is that only 2 RxQs are polled by DPDK, while there are 5 queues in the kernel. This patch adds a check that DPDK has matching numbers of queues, to avoid unexpected packet drops. Signed-off-by: Nobuhiro MIKI Reviewed-by: Ferruh Yigit Applied to dpdk-next-net/main, thanks.
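For reference, the rejected configuration boils down to a queue-count comparison at configure time. A minimal sketch of such a check, assuming the tap PMD's TAP_LOG macro and the standard ethdev data fields (the actual patch hunk is not quoted in this reply):

static int
tap_check_queue_counts(struct rte_eth_dev *dev)
{
	/* Each tap queue pair shares one fd via dup(2); an Rx/Tx count
	 * mismatch leaves kernel queues that DPDK never polls. */
	if (dev->data->nb_rx_queues != dev->data->nb_tx_queues) {
		TAP_LOG(ERR, "Rx/Tx queue count mismatch: %u vs %u",
			dev->data->nb_rx_queues, dev->data->nb_tx_queues);
		return -EINVAL;
	}
	return 0;
}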
Re: [PATCH 1/4] net/ark: add device capabilities record
On 1/19/2022 7:12 PM, John Miller wrote: Add static record of supported device capabilities. Signed-off-by: John Miller --- drivers/net/ark/ark_ethdev.c | 58 +--- 1 file changed, 48 insertions(+), 10 deletions(-) diff --git a/drivers/net/ark/ark_ethdev.c b/drivers/net/ark/ark_ethdev.c index b618cba3f0..0414c78bb5 100644 --- a/drivers/net/ark/ark_ethdev.c +++ b/drivers/net/ark/ark_ethdev.c @@ -96,6 +96,26 @@ static const struct rte_pci_id pci_id_ark_map[] = { {.vendor_id = 0, /* sentinel */ }, }; +struct ark_caps { + bool rqpacing; Can you please add a comment explaining what this 'rqpacing' capability is? Either in the commit log, or as a comment in the code, or both. +}; +struct ark_dev_caps { + uint32_t device_id; + struct ark_caps caps; +}; +static const struct ark_dev_caps +ark_device_caps[] = { +{0x100d, {.rqpacing = true} }, +{0x100e, {.rqpacing = true} }, +{0x100f, {.rqpacing = true} }, +{0x1010, {.rqpacing = false} }, +{0x1017, {.rqpacing = true} }, +{0x1018, {.rqpacing = true} }, +{0x1019, {.rqpacing = true} }, +{0x101e, {.rqpacing = false} }, This device, 0x101e, is not even probed; it looks odd to keep a capability entry for it. Also, it would improve readability if you used macros for the device/vendor IDs. +{.device_id = 0,} +}; + static int eth_ark_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, struct rte_pci_device *pci_dev) @@ -256,6 +276,7 @@ eth_ark_dev_init(struct rte_eth_dev *dev) int ret; int port_count = 1; int p; + bool rqpacing = false; ark->eth_dev = dev; @@ -270,6 +291,15 @@ eth_ark_dev_init(struct rte_eth_dev *dev) rte_eth_copy_pci_info(dev, pci_dev); dev->data->dev_flags |= RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS; + p = 0; + while (ark_device_caps[p].device_id != 0) { + if (pci_dev->id.device_id == ark_device_caps[p].device_id) { + rqpacing = ark_device_caps[p].caps.rqpacing; + break; + } + p++; + } + /* Use dummy function until setup */ dev->rx_pkt_burst = &eth_ark_recv_pkts_noop; dev->tx_pkt_burst = &eth_ark_xmit_pkts_noop; @@ -288,8 +318,12 @@ eth_ark_dev_init(struct rte_eth_dev *dev) ark->pktgen.v = (void *)&ark->bar0[ARK_PKTGEN_BASE]; ark->pktchkr.v = (void *)&ark->bar0[ARK_PKTCHKR_BASE]; - ark->rqpacing = - (struct ark_rqpace_t *)(ark->bar0 + ARK_RCPACING_BASE); + if (rqpacing) { + ark->rqpacing = + (struct ark_rqpace_t *)(ark->bar0 + ARK_RCPACING_BASE); + } else { + ark->rqpacing = NULL; + } ark->started = 0; ark->pkt_dir_v = ARK_PKT_DIR_INIT_VAL; @@ -309,13 +343,15 @@ eth_ark_dev_init(struct rte_eth_dev *dev) return -1; } if (ark->sysctrl.t32[3] != 0) { - if (ark_rqp_lasped(ark->rqpacing)) { - ARK_PMD_LOG(ERR, "Arkville Evaluation System - " - "Timer has Expired\n"); - return -1; + if (ark->rqpacing) { + if (ark_rqp_lasped(ark->rqpacing)) { + ARK_PMD_LOG(ERR, "Arkville Evaluation System - " + "Timer has Expired\n"); + return -1; + } + ARK_PMD_LOG(WARNING, "Arkville Evaluation System - " + "Timer is Running\n"); } - ARK_PMD_LOG(WARNING, "Arkville Evaluation System - " - "Timer is Running\n"); } ARK_PMD_LOG(DEBUG, @@ -499,7 +535,8 @@ ark_config_device(struct rte_eth_dev *dev) ark_ddm_stats_reset(ark->ddm.v); ark_ddm_stop(ark->ddm.v, 0); - ark_rqp_stats_reset(ark->rqpacing); + if (ark->rqpacing) + ark_rqp_stats_reset(ark->rqpacing); return 0; } @@ -695,7 +732,8 @@ eth_ark_dev_close(struct rte_eth_dev *dev) /* * TODO This should only be called once for the device during shutdown */ - ark_rqp_dump(ark->rqpacing); + if (ark->rqpacing) + ark_rqp_dump(ark->rqpacing); for (i = 0; i < dev->data->nb_tx_queues; i++) { eth_ark_tx_queue_release(dev->data->tx_queues[i]);
Re: [PATCH 2/4] net/ark: support arbitrary mbuf size
On 1/19/2022 7:12 PM, John Miller wrote: diff --git a/drivers/net/ark/ark_udm.h b/drivers/net/ark/ark_udm.h index 4e51a5e82c..1cbcd94a98 100644 --- a/drivers/net/ark/ark_udm.h +++ b/drivers/net/ark/ark_udm.h @@ -33,7 +33,7 @@ struct ark_rx_meta { #define ARK_RX_WRITE_TIME_NS 2500 #define ARK_UDM_SETUP 0 #define ARK_UDM_CONST2 0xbACECACE -#define ARK_UDM_CONST3 0x334d4455 +#define ARK_UDM_CONST3 0x344d4455 #define ARK_UDM_CONST ARK_UDM_CONST3 struct ark_udm_setup_t { uint32_t r0; What is this change for? It looks unrelated to the rest of the patch. Could it have been added by mistake?
Re: [PATCH 3/4] net/ark: publish include file for external access
On 1/19/2022 7:12 PM, John Miller wrote: publish rte_pmd_ark.h for external access to the extension I have rejected the same patch 2 or 3 times since the PMD was first upstreamed; it is frustrating that it keeps coming back :( Please find the reasoning for why it was rejected in the archives. Signed-off-by: John Miller
Re: [PATCH 3/4] net/ark: publish include file for external access
On 1/26/2022 4:48 PM, Ferruh Yigit wrote: On 1/19/2022 7:12 PM, John Miller wrote: publish rte_pmd_ark.h for external access to the extension I have rejected the same patch 2 or 3 times since the PMD was first upstreamed; it is frustrating that it keeps coming back :( cc'ed Shepard & Ed. Please find the reasoning for why it was rejected in the archives. Signed-off-by: John Miller
Re: [PATCH 4/4] net/ark: support chunk DMA transfers
On 1/19/2022 7:12 PM, John Miller wrote: Add support for chunk DMA transfers. Signed-off-by: John Miller --- drivers/net/ark/ark_ddm.c | 1 + drivers/net/ark/ark_ethdev_rx.c | 16 +--- drivers/net/ark/ark_mpu.c | 1 + drivers/net/ark/ark_pktchkr.c | 2 +- drivers/net/ark/ark_pktgen.c | 2 +- drivers/net/ark/ark_udm.c | 3 +++ 6 files changed, 16 insertions(+), 9 deletions(-) diff --git a/drivers/net/ark/ark_ddm.c b/drivers/net/ark/ark_ddm.c index 2321371572..b16c739d50 100644 --- a/drivers/net/ark/ark_ddm.c +++ b/drivers/net/ark/ark_ddm.c @@ -55,6 +55,7 @@ ark_ddm_stop(struct ark_ddm_t *ddm, const int wait) int cnt = 0; ddm->cfg.command = 2; + rte_wmb(); Is the new memory barrier related to the chunk DMA transfer? while (wait && (ddm->cfg.stop_flushed & 0x01) == 0) { if (cnt++ > 1000) return 1; diff --git a/drivers/net/ark/ark_ethdev_rx.c b/drivers/net/ark/ark_ethdev_rx.c index 1000f50be0..49134ea08f 100644 --- a/drivers/net/ark/ark_ethdev_rx.c +++ b/drivers/net/ark/ark_ethdev_rx.c @@ -12,6 +12,7 @@ #define ARK_RX_META_SIZE 32 #define ARK_RX_META_OFFSET (RTE_PKTMBUF_HEADROOM - ARK_RX_META_SIZE) +#define ARK_RX_MPU_CHUNK (64U) /* Forward declarations */ struct ark_rx_queue; @@ -104,7 +105,7 @@ static inline void eth_ark_rx_update_cons_index(struct ark_rx_queue *queue, uint32_t cons_index) { queue->cons_index = cons_index; - if ((cons_index + queue->queue_size - queue->seed_index) >= 64U) { + if ((cons_index + queue->queue_size - queue->seed_index) >= ARK_RX_MPU_CHUNK) { eth_ark_rx_seed_mbufs(queue); ark_mpu_set_producer(queue->mpu, queue->seed_index); } @@ -179,12 +180,12 @@ eth_ark_dev_rx_queue_setup(struct rte_eth_dev *dev, queue->reserve_q = rte_zmalloc_socket("Ark_rx_queue mbuf", nb_desc * sizeof(struct rte_mbuf *), - 64, + 512, socket_id); queue->paddress_q = rte_zmalloc_socket("Ark_rx_queue paddr", nb_desc * sizeof(rte_iova_t), - 64, + 512, socket_id); if (queue->reserve_q == 0 || queue->paddress_q == 0) { @@ -455,7 +456,8 @@ eth_ark_rx_stop_queue(struct rte_eth_dev *dev, uint16_t queue_id) static inline int eth_ark_rx_seed_mbufs(struct ark_rx_queue *queue) { - uint32_t limit = queue->cons_index + queue->queue_size; + uint32_t limit = (queue->cons_index & ~(ARK_RX_MPU_CHUNK - 1)) + + queue->queue_size; uint32_t seed_index = queue->seed_index; uint32_t count = 0; @@ -618,14 +620,14 @@ eth_ark_udm_force_close(struct rte_eth_dev *dev) ark_mpu_start(queue->mpu); /* Add some buffers */ - index = 10 + queue->seed_index; + index = ARK_RX_MPU_CHUNK + queue->seed_index; 10 is replaced by '64' here, which is a big difference; just double-checking that this is what was intended?
ark_mpu_set_producer(queue->mpu, index); } /* Wait to allow data to pass */ usleep(100); - ARK_PMD_LOG(DEBUG, "UDM forced flush attempt, stopped = %d\n", - ark_udm_is_flushed(ark->udm.v)); + ARK_PMD_LOG(NOTICE, "UDM forced flush attempt, stopped = %d\n", + ark_udm_is_flushed(ark->udm.v)); } ark_udm_reset(ark->udm.v); } diff --git a/drivers/net/ark/ark_mpu.c b/drivers/net/ark/ark_mpu.c index 8160c1de7b..b8e94b6ed3 100644 --- a/drivers/net/ark/ark_mpu.c +++ b/drivers/net/ark/ark_mpu.c @@ -68,6 +68,7 @@ ark_mpu_reset(struct ark_mpu_t *mpu) int cnt = 0; mpu->cfg.command = MPU_CMD_RESET; + rte_wmb(); while (mpu->cfg.command != MPU_CMD_IDLE) { if (cnt++ > 1000) diff --git a/drivers/net/ark/ark_pktchkr.c b/drivers/net/ark/ark_pktchkr.c index 84bb567a41..12a5abb2f7 100644 --- a/drivers/net/ark/ark_pktchkr.c +++ b/drivers/net/ark/ark_pktchkr.c @@ -113,7 +113,7 @@ ark_pktchkr_stopped(ark_pkt_chkr_t handle) struct ark_pkt_chkr_inst *inst = (struct ark_pkt_chkr_inst *)handle; uint32_t r = inst->sregs->pkt_start_stop; - return (((r >> 16) & 1) == 1); + return (((r >> 16) & 1) == 1) || (r == 0); } void diff --git a/drivers/net/ark/ark_pktgen.c b/drivers/net/ark/ark_pktgen.c index 515bfe461c..6195ef997f 100644 --- a/drivers/net/ark/ark_pktgen.c +++ b/drivers/net/ark/ark_pktgen.c @@ -107,7 +107,7 @@ ark_pktgen_paused(ark_pkt_gen_t handle) struct ark_pkt_gen_inst *inst = (struct ark_pkt_gen_inst *)handle; uint32_t r = inst->regs->pkt_start_stop; - return (((r
Re: [dpdk-dev] [PATCH] net/nfp: free HW rings memzone on queue release
On 1/19/2022 11:48 AM, heinrich.k...@corigine.com wrote: From: Heinrich Kuhn During rx/tx queue setup, memory is reserved for the hardware rings. This memory zone should subsequently be freed in the queue release logic. This commit also adds a call to the release logic in the dev_close() callback so that the ring memzone may be freed during port close too. Fixes: b812daadad0d ("nfp: add Rx and Tx") Cc: sta...@dpdk.org Signed-off-by: Heinrich Kuhn Signed-off-by: Simon Horman Applied to dpdk-next-net/main, thanks.
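A sketch of the release logic described in the commit message, with assumed struct and field names (nfp_net_rxq, rxbufs, tz) rather than the actual nfp driver code:

static void
nfp_net_rx_queue_release(struct rte_eth_dev *dev, uint16_t queue_idx)
{
	struct nfp_net_rxq *rxq = dev->data->rx_queues[queue_idx];

	if (rxq == NULL)
		return;

	rte_free(rxq->rxbufs);          /* per-descriptor bookkeeping */
	rte_memzone_free(rxq->tz);      /* HW ring memzone, assumed field name */
	rte_free(rxq);
	dev->data->rx_queues[queue_idx] = NULL;
}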
Re: [Bug 924] Mellanox ConnectX6 DPDK dpdk-testpmd Receive error len error checksum UDP packet performance is very low!
You mention flow2 is the one with the errors, but you also show that the performance is good. Is this correct? Do you mean that when you have errors you get a performance drop? Regards, Asaf Penso From: bugzi...@dpdk.org Sent: Thursday, January 20, 2022 11:01:43 AM To: dev@dpdk.org Subject: [Bug 924] Mellanox ConnectX6 DPDK dpdk-testpmd Receive error len error checksum UDP packet performance is very low! https://bugs.dpdk.org/show_bug.cgi?id=924 Bug ID: 924 Summary: Mellanox ConnectX6 DPDK dpdk-testpmd Receive error len error checksum UDP packet performance is very low! Product: DPDK Version: 21.11 Hardware: x86 OS: Linux Status: UNCONFIRMED Severity: normal Priority: Normal Component: ethdev Assignee: dev@dpdk.org Reporter: 188989...@qq.com Target Milestone: --- I use Ixia to construct two streams: flow1, UDP 64-size small packets; flow2, UDP 64-size small packets with error length and error checksum. Send rate: 30 Gbps, 57 million pps. ./dpdk-testpmd -l 4-22 -n 8 -- -i --rxq 19 --txq 19 --nb-cores 18 --rxd 2048 --txd 2048 --portmask 0xff set fwd rxonly start NIC statistics for port 6 RX-packets: 178423861761 RX-missed: 646877 RX-bytes: 11419127182586 RX-errors: 0 RX-nombuf: 0 TX-packets: 0 TX-errors: 0 TX-bytes: 0 Throughput (since last show) Rx-pps: 5904277 Rx-bps: 3022990248 Tx-pps: 0 Tx-bps: 0 Receive rate: 3 Gbps, 5 million pps; only about one tenth was received. rx_discards_phy drop [root@localhost ~]# ethtool -S enp202s0f0 |grep dis rx_discards_phy: 48832790759 tx_discards_phy: 0 rx_prio0_discards: 48832470594 rx_prio1_discards: 0 rx_prio2_discards: 0 rx_prio3_discards: 0 rx_prio4_discards: 0 rx_prio5_discards: 0 rx_prio6_discards: 0 rx_prio7_discards: 0 When I change flow2 to valid UDP, it's all normal. NIC statistics for port 6 RX-packets: 179251451823 RX-missed: 660384 RX-bytes: 11472092947106 RX-errors: 0 RX-nombuf: 0 TX-packets: 0 TX-errors: 0 TX-bytes: 0 Throughput (since last show) Rx-pps: 57553942 Rx-bps: 29467618576 Tx-pps: 0 Tx-bps: 0 Receive rate: 29.4 Gbps, 57 million pps. [root@localhost ~]# mlxfwmanager Querying Mellanox devices firmware ... Device #1: -- Device Type: ConnectX6 Part Number: MCX653106A-ECA_Ax Description: ConnectX-6 VPI adapter card; H100Gb/s (HDR100; EDR IB and 100GbE); dual-port QSFP56; PCIe3.0 x16; tall bracket; ROHS R6 PSID: MT_000224 PCI Device Name: :ca:00.0 Base MAC: 08c0eb204e5a Versions: Current Available FW 20.30.1004 20.32.1010 PXE 3.6.0301 3.6.0502 UEFI 14.23.0017 14.25.0017 Status: Update required -- You are receiving this mail because: You are the assignee for the bug.
Re: [PATCH] eal/windows: set pthread affinity
On 1/20/2022 4:17 PM, Pallavi Kadam wrote: Sometimes OS tries to switch the core. So, bind the lcore thread to a fixed core. Implement affinity call on Windows similar to Linux. Signed-off-by: Qiao Liu Signed-off-by: Pallavi Kadam --- lib/eal/windows/eal.c | 4 1 file changed, 4 insertions(+) diff --git a/lib/eal/windows/eal.c b/lib/eal/windows/eal.c index 67db7f099a..ca3c41aaa7 100644 --- a/lib/eal/windows/eal.c +++ b/lib/eal/windows/eal.c @@ -422,6 +422,10 @@ rte_eal_init(int argc, char **argv) /* create a thread for each lcore */ if (eal_thread_create(&lcore_config[i].thread_id) != 0) rte_panic("Cannot create thread\n"); + ret = pthread_setaffinity_np(lcore_config[i].thread_id, + sizeof(rte_cpuset_t), &lcore_config[i].cpuset); + if (ret != 0) + RTE_LOG(DEBUG, EAL, "Cannot set affinity\n"); } /* Initialize services so drivers can register services during probe. */ Acked-by: Ranjit Menon
Re: [PATCH v2 03/10] ethdev: bring in async queue-based flow rules
On Tue, Jan 25, 2022 at 9:03 PM Alexander Kozyrev wrote: > > On Monday, January 24, 2022 19:00 Ivan Malov wrote: > > This series is very helpful as it draws attention to > > the problem of making the flow API efficient. That said, > > there is much room for improvement, especially in > > what comes to keeping things clear and concise. > > > > For example, the following APIs > > > > - rte_flow_q_flow_create() > > - rte_flow_q_flow_destroy() > > - rte_flow_q_action_handle_create() > > - rte_flow_q_action_handle_destroy() > > - rte_flow_q_action_handle_update() > > > > should probably be transformed into a unified one > > > > int > > rte_flow_q_submit_task(uint16_t port_id, > > uint32_t queue_id, > > const struct rte_flow_q_ops_attr *q_ops_attr, > > enum rte_flow_q_task_type task_type, > > const void *task_data, > > rte_flow_q_task_cookie_t cookie, > > struct rte_flow_error *error); > > > > with a handful of corresponding enum defines and data structures > > for these 5 operations. > We were thinking about a unified function for all queue operations. > But it has too many drawbacks in our opinion: > 1. Different operations return different results and take different parameters. > q_flow_create gives a flow handle, q_action_handle returns an indirect action > handle. > The destroy functions return the status. All these cases need to be handled > differently. > Also, the unified function is bloated with various parameters not needed for > all operations. > Both of these points result in a hard-to-understand API and messy documentation, > with > various structures for how to use it in each particular case. > 2. Performance consideration. > We designed the new API with the insertion/deletion rate in mind. > Adding if conditions to distinguish between requested operations will cause > some degradation. > It is preferred to have separate small functions that do one job and do it > efficiently. > 3. Conforming to the current API. > The idea is to have the same API as we had before and extend it with > asynchronous counterparts. > That is why we took the familiar functions and added queue-based versions > of them. > It is easier for applications to switch to the new API with this approach. > Interfaces are still the same. Alexander, I think you have made some good points here. A dedicated API is better compared to the unified function. > > > By the way, shouldn't this variety of operation types cover such > > from the tunnel offload model? Getting PMD's opaque "tunnel > > match" items and "tunnel set" actions - things like that. > I don't quite get the idea. Could you please elaborate more on this? > > > Also, I suggest that the attribute "drain" > > be replaced by "postpone" (invert the meaning). > > rte_flow_q_drain() should then be renamed to > > rte_flow_q_push_postponed(). > > > > The rationale behind my suggestion is that "drain" tricks readers into > > thinking that the enqueued operations are going to be completely purged, > > whilst the true intention of the API is to "push" them to the hardware. > I don't have a strong opinion on this naming, if you think "postpone" is > better. > Or we can name it "doorbell" to signal a NIC about some work to be done > and "rte_flow_q_doorbell" to do this explicitly after a few operations. > > > rte_flow_q_dequeue() also needs to be revisited. The name suggests that > > some "tasks" be cancelled, whereas in reality this API implies "poll" > > semantics. So why not name it "rte_flow_q_poll"? > Polling implies actively busy-waiting for the result, which is not the > case here.
> What we do is only getting results for already processed operations, hence > "dequeue" as the opposite of "queue". > What do you think? Or we can have push for drain and pull for dequeue as an > alternative. > > > I believe this function should return an array of completions, just > > like it does in the current version, but provide a "cookie" (can be > > represented by a uintptr_t value) for each completion entry. > > > > The rationale behind choosing "cookie" over "user_data" is clarity. > > Term "user_data" sounds like "flow action conf" or "counter data", > > whilst in reality it likely stands for something normally called > > a cookie. Please correct me if I've got that wrong. > I haven't heard about cookies in a context not related to web browsing. > I think that user data is more than a simple cookie; it can contain > anything that the application wants to associate with this flow rule, e.g. > a connection ID, a timestamp, anything. It is a more general term here. > > > Well, the short of it: submit_task(), push_postponed() and poll(). > > Having a unified "submit_task()" API should allow to reduce the > > amount of comment duplication. > I'm afraid it won't reduce the amount
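To make the shapes being debated concrete, the dedicated-function flow would be used roughly as follows. This is only a sketch based on the names mentioned in this thread; the signatures are simplified assumptions, not the final API:

/* Enqueue flow rule operations on a flow queue (non-blocking). */
flow = rte_flow_q_flow_create(port_id, queue_id, &q_ops_attr,
			      pattern, actions, &error);
/* ...more creates/destroys may be enqueued and postponed here... */

/* Ring the doorbell once for the whole batch (the "drain"/"push" step). */
rte_flow_q_drain(port_id, queue_id, &error);

/* Later, collect results of completed operations (the "dequeue" step);
 * res[i].user_data identifies the originating operation. */
n = rte_flow_q_dequeue(port_id, queue_id, res, RTE_DIM(res), &error);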
[PATCH v4 1/2] gpudev: expose GPU memory to CPU
From: Elena Agostini Enable the possibility to expose a GPU memory area and make it accessible from the CPU. GPU memory has to be allocated via rte_gpu_mem_alloc(). This patch allows the gpudev library to map (and unmap), through the GPU driver, a chunk of GPU memory and to return a memory pointer usable by the CPU to access the GPU memory area. Signed-off-by: Elena Agostini --- doc/guides/prog_guide/gpudev.rst | 9 + drivers/gpu/cuda/cuda.c | 2 ++ lib/gpudev/gpudev.c | 61 lib/gpudev/gpudev_driver.h | 6 lib/gpudev/rte_gpudev.h | 49 + lib/gpudev/version.map | 2 ++ 6 files changed, 129 insertions(+) diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst index ff4626812b..b774ec77f9 100644 --- a/doc/guides/prog_guide/gpudev.rst +++ b/doc/guides/prog_guide/gpudev.rst @@ -73,6 +73,15 @@ Later, it's also possible to unregister that memory with gpudev. CPU memory registered outside of the gpudev library (e.g. with GPU specific library) cannot be unregistered by the gpudev library. +CPU mapping +~~~ + +gpudev can map into the CPU address space a GPU memory address allocated with gpudev. +gpudev returns a pointer that the CPU can use to access (read or write) GPU memory. +Later, it's also possible to unmap that memory with gpudev. +GPU memory CPU mapped outside of the gpudev library (e.g. with a GPU specific library) +cannot be unmapped by the gpudev library. + Memory Barrier ~~ diff --git a/drivers/gpu/cuda/cuda.c b/drivers/gpu/cuda/cuda.c index 0ece1bb612..408b659fce 100644 --- a/drivers/gpu/cuda/cuda.c +++ b/drivers/gpu/cuda/cuda.c @@ -1177,6 +1177,8 @@ cuda_gpu_probe(__rte_unused struct rte_pci_driver *pci_drv, struct rte_pci_devic dev->ops.mem_free = cuda_mem_free; dev->ops.mem_register = cuda_mem_register; dev->ops.mem_unregister = cuda_mem_unregister; + dev->ops.mem_cpu_map = NULL; + dev->ops.mem_cpu_unmap = NULL; dev->ops.wmb = cuda_wmb; rte_gpu_complete_new(dev); diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c index 59e2169292..ce92d63257 100644 --- a/lib/gpudev/gpudev.c +++ b/lib/gpudev/gpudev.c @@ -640,6 +640,67 @@ rte_gpu_mem_unregister(int16_t dev_id, void *ptr) return GPU_DRV_RET(dev->ops.mem_unregister(dev, ptr)); } +void * +rte_gpu_mem_cpu_map(int16_t dev_id, size_t size, void *ptr) +{ + struct rte_gpu *dev; + void *ptr_out; + int ret; + + dev = gpu_get_by_id(dev_id); + if (dev == NULL) { + GPU_LOG(ERR, "mem CPU map for invalid device ID %d", dev_id); + rte_errno = ENODEV; + return NULL; + } + + if (dev->ops.mem_cpu_map == NULL) { + GPU_LOG(ERR, "mem CPU map not supported"); + rte_errno = ENOTSUP; + return NULL; + } + + if (ptr == NULL || size == 0) /* dry-run */ + return NULL; + + ret = GPU_DRV_RET(dev->ops.mem_cpu_map(dev, size, ptr, &ptr_out)); + + switch (ret) { + case 0: + return ptr_out; + case -ENOMEM: + case -E2BIG: + rte_errno = -ret; + return NULL; + default: + rte_errno = -EPERM; + return NULL; + } +} + +int +rte_gpu_mem_cpu_unmap(int16_t dev_id, void *ptr) +{ + struct rte_gpu *dev; + + dev = gpu_get_by_id(dev_id); + if (dev == NULL) { + GPU_LOG(ERR, "cpu_unmap mem for invalid device ID %d", dev_id); + rte_errno = ENODEV; + return -rte_errno; + } + + if (dev->ops.mem_cpu_unmap == NULL) { + rte_errno = ENOTSUP; + return -rte_errno; + } + + if (ptr == NULL) /* dry-run */ + return 0; + + return GPU_DRV_RET(dev->ops.mem_cpu_unmap(dev, ptr)); +} + int rte_gpu_wmb(int16_t dev_id) { diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h index 0ed7478e9b..0e55b00bfe 100644 --- a/lib/gpudev/gpudev_driver.h +++ b/lib/gpudev/gpudev_driver.h @@
-31,6 +31,8 @@ typedef int (rte_gpu_mem_alloc_t)(struct rte_gpu *dev, size_t size, unsigned int typedef int (rte_gpu_mem_free_t)(struct rte_gpu *dev, void *ptr); typedef int (rte_gpu_mem_register_t)(struct rte_gpu *dev, size_t size, void *ptr); typedef int (rte_gpu_mem_unregister_t)(struct rte_gpu *dev, void *ptr); +typedef int (rte_gpu_mem_cpu_map_t)(struct rte_gpu *dev, size_t size, void *ptr_in, void **ptr_out); +typedef int (rte_gpu_mem_cpu_unmap_t)(struct rte_gpu *dev, void *ptr); typedef int (rte_gpu_wmb_t)(struct rte_gpu *dev); struct rte_gpu_ops { @@ -46,6 +48,10 @@ struct rte_gpu_ops { rte_gpu_mem_register_t *mem_register; /* Unregister CPU memory from device. */ rte_gpu_mem_unregister_t *mem_unregister; + /* Map GPU memory for CPU visibility. */ +
[PATCH v4 2/2] app/test-gpudev: test cpu_map/cpu_unmap functions
From: Elena Agostini New test case added to test the gpudev cpu_map/cpu_unmap functions. Signed-off-by: Elena Agostini --- app/test-gpudev/main.c | 63 ++ 1 file changed, 63 insertions(+) diff --git a/app/test-gpudev/main.c b/app/test-gpudev/main.c index 4500a8660b..3d0c17d8fd 100644 --- a/app/test-gpudev/main.c +++ b/app/test-gpudev/main.c @@ -185,6 +185,68 @@ register_cpu_memory(uint16_t gpu_id) return -1; } +static int +gpu_mem_cpu_map(uint16_t gpu_id) +{ + void *ptr_gpu = NULL; + void *ptr_cpu = NULL; + size_t buf_bytes = 1024; + unsigned int align = 4096; + int ret; + + printf("\n===> TEST: Map GPU memory for CPU visibility\n\n"); + + /* Alloc memory on GPU 0 with 4kB alignment */ + ptr_gpu = rte_gpu_mem_alloc(gpu_id, buf_bytes, align); + if (ptr_gpu == NULL) { + fprintf(stderr, "rte_gpu_mem_alloc GPU memory returned error\n"); + goto error; + } + printf("GPU memory allocated at 0x%p size is %zd bytes\n", + ptr_gpu, buf_bytes); + + ptr_cpu = rte_gpu_mem_cpu_map(gpu_id, buf_bytes, ptr_gpu); + if (ptr_cpu == NULL) { + fprintf(stderr, "rte_gpu_mem_cpu_map returned error\n"); + goto error; + } + printf("GPU memory mapped for CPU access at 0x%p\n", ptr_cpu); + + ((uint8_t*)ptr_cpu)[0] = 0x4; + ((uint8_t*)ptr_cpu)[1] = 0x5; + ((uint8_t*)ptr_cpu)[2] = 0x6; + + printf("GPU memory first 3 bytes set from CPU: %x %x %x\n", + ((uint8_t*)ptr_cpu)[0], + ((uint8_t*)ptr_cpu)[1], + ((uint8_t*)ptr_cpu)[2]); + + ret = rte_gpu_mem_cpu_unmap(gpu_id, ptr_cpu); + if (ret < 0) { + fprintf(stderr, "rte_gpu_mem_cpu_unmap returned error %d\n", ret); + goto error; + } + printf("GPU memory mapped for CPU access at 0x%p\n", ptr_cpu); + + ret = rte_gpu_mem_free(gpu_id, ptr_gpu); + if (ret < 0) { + fprintf(stderr, "rte_gpu_mem_free returned error %d\n", ret); + goto error; + } + printf("GPU memory 0x%p freed\n", ptr_gpu); + + printf("\n===> TEST: PASSED\n"); + return 0; + +error: + + rte_gpu_mem_cpu_unmap(gpu_id, ptr_cpu); + rte_gpu_mem_free(gpu_id, ptr_gpu); + + printf("\n===> TEST: FAILED\n"); + return -1; +} + static int create_update_comm_flag(uint16_t gpu_id) { @@ -402,6 +464,7 @@ main(int argc, char **argv) */ alloc_gpu_memory(gpu_id); register_cpu_memory(gpu_id); + gpu_mem_cpu_map(gpu_id); /** * Communication items test -- 2.17.1
[PATCH v5 1/2] gpudev: expose GPU memory to CPU
From: Elena Agostini Enable the possibility to expose a GPU memory area and make it accessible from the CPU. GPU memory has to be allocated via rte_gpu_mem_alloc(). This patch allows the gpudev library to map (and unmap), through the GPU driver, a chunk of GPU memory and to return a memory pointer usable by the CPU to access the GPU memory area. Signed-off-by: Elena Agostini --- doc/guides/prog_guide/gpudev.rst | 9 + drivers/gpu/cuda/cuda.c | 2 ++ lib/gpudev/gpudev.c | 61 lib/gpudev/gpudev_driver.h | 6 lib/gpudev/rte_gpudev.h | 49 + lib/gpudev/version.map | 2 ++ 6 files changed, 129 insertions(+) diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst index ff4626812b..b774ec77f9 100644 --- a/doc/guides/prog_guide/gpudev.rst +++ b/doc/guides/prog_guide/gpudev.rst @@ -73,6 +73,15 @@ Later, it's also possible to unregister that memory with gpudev. CPU memory registered outside of the gpudev library (e.g. with GPU specific library) cannot be unregistered by the gpudev library. +CPU mapping +~~~ + +gpudev can map into the CPU address space a GPU memory address allocated with gpudev. +gpudev returns a pointer that the CPU can use to access (read or write) GPU memory. +Later, it's also possible to unmap that memory with gpudev. +GPU memory CPU mapped outside of the gpudev library (e.g. with a GPU specific library) +cannot be unmapped by the gpudev library. + Memory Barrier ~~ diff --git a/drivers/gpu/cuda/cuda.c b/drivers/gpu/cuda/cuda.c index 0ece1bb612..408b659fce 100644 --- a/drivers/gpu/cuda/cuda.c +++ b/drivers/gpu/cuda/cuda.c @@ -1177,6 +1177,8 @@ cuda_gpu_probe(__rte_unused struct rte_pci_driver *pci_drv, struct rte_pci_devic dev->ops.mem_free = cuda_mem_free; dev->ops.mem_register = cuda_mem_register; dev->ops.mem_unregister = cuda_mem_unregister; + dev->ops.mem_cpu_map = NULL; + dev->ops.mem_cpu_unmap = NULL; dev->ops.wmb = cuda_wmb; rte_gpu_complete_new(dev); diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c index 59e2169292..ce92d63257 100644 --- a/lib/gpudev/gpudev.c +++ b/lib/gpudev/gpudev.c @@ -640,6 +640,67 @@ rte_gpu_mem_unregister(int16_t dev_id, void *ptr) return GPU_DRV_RET(dev->ops.mem_unregister(dev, ptr)); } +void * +rte_gpu_mem_cpu_map(int16_t dev_id, size_t size, void *ptr) +{ + struct rte_gpu *dev; + void *ptr_out; + int ret; + + dev = gpu_get_by_id(dev_id); + if (dev == NULL) { + GPU_LOG(ERR, "mem CPU map for invalid device ID %d", dev_id); + rte_errno = ENODEV; + return NULL; + } + + if (dev->ops.mem_cpu_map == NULL) { + GPU_LOG(ERR, "mem CPU map not supported"); + rte_errno = ENOTSUP; + return NULL; + } + + if (ptr == NULL || size == 0) /* dry-run */ + return NULL; + + ret = GPU_DRV_RET(dev->ops.mem_cpu_map(dev, size, ptr, &ptr_out)); + + switch (ret) { + case 0: + return ptr_out; + case -ENOMEM: + case -E2BIG: + rte_errno = -ret; + return NULL; + default: + rte_errno = -EPERM; + return NULL; + } +} + +int +rte_gpu_mem_cpu_unmap(int16_t dev_id, void *ptr) +{ + struct rte_gpu *dev; + + dev = gpu_get_by_id(dev_id); + if (dev == NULL) { + GPU_LOG(ERR, "cpu_unmap mem for invalid device ID %d", dev_id); + rte_errno = ENODEV; + return -rte_errno; + } + + if (dev->ops.mem_cpu_unmap == NULL) { + rte_errno = ENOTSUP; + return -rte_errno; + } + + if (ptr == NULL) /* dry-run */ + return 0; + + return GPU_DRV_RET(dev->ops.mem_cpu_unmap(dev, ptr)); +} + int rte_gpu_wmb(int16_t dev_id) { diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h index 0ed7478e9b..0e55b00bfe 100644 --- a/lib/gpudev/gpudev_driver.h +++ b/lib/gpudev/gpudev_driver.h @@
-31,6 +31,8 @@ typedef int (rte_gpu_mem_alloc_t)(struct rte_gpu *dev, size_t size, unsigned int typedef int (rte_gpu_mem_free_t)(struct rte_gpu *dev, void *ptr); typedef int (rte_gpu_mem_register_t)(struct rte_gpu *dev, size_t size, void *ptr); typedef int (rte_gpu_mem_unregister_t)(struct rte_gpu *dev, void *ptr); +typedef int (rte_gpu_mem_cpu_map_t)(struct rte_gpu *dev, size_t size, void *ptr_in, void **ptr_out); +typedef int (rte_gpu_mem_cpu_unmap_t)(struct rte_gpu *dev, void *ptr); typedef int (rte_gpu_wmb_t)(struct rte_gpu *dev); struct rte_gpu_ops { @@ -46,6 +48,10 @@ struct rte_gpu_ops { rte_gpu_mem_register_t *mem_register; /* Unregister CPU memory from device. */ rte_gpu_mem_unregister_t *mem_unregister; + /* Map GPU memory for CPU visibility. */ +
[PATCH v5 2/2] app/test-gpudev: test cpu_map/cpu_unmap functions
From: Elena Agostini New test case added to test the gpudev cpu_map/cpu_unmap functions. Signed-off-by: Elena Agostini --- app/test-gpudev/main.c | 63 ++ 1 file changed, 63 insertions(+) diff --git a/app/test-gpudev/main.c b/app/test-gpudev/main.c index 4500a8660b..417f2d78b7 100644 --- a/app/test-gpudev/main.c +++ b/app/test-gpudev/main.c @@ -185,6 +185,68 @@ register_cpu_memory(uint16_t gpu_id) return -1; } +static int +gpu_mem_cpu_map(uint16_t gpu_id) +{ + void *ptr_gpu = NULL; + void *ptr_cpu = NULL; + size_t buf_bytes = 1024; + unsigned int align = 4096; + int ret; + + printf("\n===> TEST: Map GPU memory for CPU visibility\n\n"); + + /* Alloc memory on GPU 0 with 4kB alignment */ + ptr_gpu = rte_gpu_mem_alloc(gpu_id, buf_bytes, align); + if (ptr_gpu == NULL) { + fprintf(stderr, "rte_gpu_mem_alloc GPU memory returned error\n"); + goto error; + } + printf("GPU memory allocated at 0x%p size is %zd bytes\n", + ptr_gpu, buf_bytes); + + ptr_cpu = rte_gpu_mem_cpu_map(gpu_id, buf_bytes, ptr_gpu); + if (ptr_cpu == NULL) { + fprintf(stderr, "rte_gpu_mem_cpu_map returned error\n"); + goto error; + } + printf("GPU memory mapped for CPU access at 0x%p\n", ptr_cpu); + + ((uint8_t *)ptr_cpu)[0] = 0x4; + ((uint8_t *)ptr_cpu)[1] = 0x5; + ((uint8_t *)ptr_cpu)[2] = 0x6; + + printf("GPU memory first 3 bytes set from CPU: %x %x %x\n", + ((uint8_t *)ptr_cpu)[0], + ((uint8_t *)ptr_cpu)[1], + ((uint8_t *)ptr_cpu)[2]); + + ret = rte_gpu_mem_cpu_unmap(gpu_id, ptr_cpu); + if (ret < 0) { + fprintf(stderr, "rte_gpu_mem_cpu_unmap returned error %d\n", ret); + goto error; + } + printf("GPU memory mapped for CPU access at 0x%p\n", ptr_cpu); + + ret = rte_gpu_mem_free(gpu_id, ptr_gpu); + if (ret < 0) { + fprintf(stderr, "rte_gpu_mem_free returned error %d\n", ret); + goto error; + } + printf("GPU memory 0x%p freed\n", ptr_gpu); + + printf("\n===> TEST: PASSED\n"); + return 0; + +error: + + rte_gpu_mem_cpu_unmap(gpu_id, ptr_cpu); + rte_gpu_mem_free(gpu_id, ptr_gpu); + + printf("\n===> TEST: FAILED\n"); + return -1; +} + static int create_update_comm_flag(uint16_t gpu_id) { @@ -402,6 +464,7 @@ main(int argc, char **argv) */ alloc_gpu_memory(gpu_id); register_cpu_memory(gpu_id); + gpu_mem_cpu_map(gpu_id); /** * Communication items test -- 2.17.1
[PATCH v2] net/enic: add support for eCPRI matching
eCPRI messages can be over the Ethernet layer (.1Q also supported) or over the UDP layer. The message header formats are the same in these two variants. Only up through the first packet header in the PDU can be matched. RSS on the eCPRI header fields is not supported. Signed-off-by: John Daley Reviewed-by: Hyong Youb Kim --- v2 - include enic.ini update doc/guides/nics/features/enic.ini | 1 + doc/guides/rel_notes/release_22_03.rst | 1 + drivers/net/enic/enic_fm_flow.c | 65 ++ 3 files changed, 67 insertions(+) diff --git a/doc/guides/nics/features/enic.ini b/doc/guides/nics/features/enic.ini index c3bcead05e..88e4ef8c64 100644 --- a/doc/guides/nics/features/enic.ini +++ b/doc/guides/nics/features/enic.ini @@ -53,6 +53,7 @@ vlan = Y vxlan = Y geneve = Y geneve_opt = Y +ecpri = Y [rte_flow actions] count = Y diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst index b38dc54e62..52d1e32cf6 100644 --- a/doc/guides/rel_notes/release_22_03.rst +++ b/doc/guides/rel_notes/release_22_03.rst @@ -58,6 +58,7 @@ New Features * **Updated Cisco enic driver.** * Added rte_flow support for matching GENEVE packets. + * Added rte_flow support for matching eCPRI packets. Removed Items - diff --git a/drivers/net/enic/enic_fm_flow.c b/drivers/net/enic/enic_fm_flow.c index 752ffeb5c5..589c9253e1 100644 --- a/drivers/net/enic/enic_fm_flow.c +++ b/drivers/net/enic/enic_fm_flow.c @@ -237,6 +237,7 @@ static enic_copy_item_fn enic_fm_copy_item_vxlan; static enic_copy_item_fn enic_fm_copy_item_gtp; static enic_copy_item_fn enic_fm_copy_item_geneve; static enic_copy_item_fn enic_fm_copy_item_geneve_opt; +static enic_copy_item_fn enic_fm_copy_item_ecpri; /* Ingress actions */ static const enum rte_flow_action_type enic_fm_supported_ig_actions[] = { @@ -392,6 +393,15 @@ static const struct enic_fm_items enic_fm_items[] = { RTE_FLOW_ITEM_TYPE_END, }, }, + [RTE_FLOW_ITEM_TYPE_ECPRI] = { + .copy_item = enic_fm_copy_item_ecpri, + .valid_start_item = 1, + .prev_items = (const enum rte_flow_item_type[]) { + RTE_FLOW_ITEM_TYPE_ETH, + RTE_FLOW_ITEM_TYPE_UDP, + RTE_FLOW_ITEM_TYPE_END, + }, + }, }; static int @@ -877,6 +887,61 @@ enic_fm_copy_item_geneve_opt(struct copy_item_args *arg) return 0; } +/* Match eCPRI combined message header */ +static int +enic_fm_copy_item_ecpri(struct copy_item_args *arg) +{ + const struct rte_flow_item *item = arg->item; + const struct rte_flow_item_ecpri *spec = item->spec; + const struct rte_flow_item_ecpri *mask = item->mask; + struct fm_tcam_match_entry *entry = arg->fm_tcam_entry; + struct fm_header_set *fm_data, *fm_mask; + uint8_t *fm_data_to, *fm_mask_to; + + ENICPMD_FUNC_TRACE(); + + /* Tunneling not supported- only matching on inner eCPRI fields. */ + if (arg->header_level > 0) + return -EINVAL; + + /* Need both spec and mask */ + if (!spec || !mask) + return -EINVAL; + + fm_data = &entry->ftm_data.fk_hdrset[0]; + fm_mask = &entry->ftm_mask.fk_hdrset[0]; + + /* eCPRI can only follow L2/VLAN layer if ethernet type is 0xAEFE.
*/ + if (!(fm_data->fk_metadata & FKM_UDP) && + (fm_mask->l2.eth.fk_ethtype != UINT16_MAX || + rte_cpu_to_be_16(fm_data->l2.eth.fk_ethtype) != + RTE_ETHER_TYPE_ECPRI)) + return -EINVAL; + + if (fm_data->fk_metadata & FKM_UDP) { + /* eCPRI on UDP */ + fm_data->fk_header_select |= FKH_L4RAW; + fm_mask->fk_header_select |= FKH_L4RAW; + fm_data_to = &fm_data->l4.rawdata[sizeof(fm_data->l4.udp)]; + fm_mask_to = &fm_mask->l4.rawdata[sizeof(fm_data->l4.udp)]; + } else { + /* eCPRI directly after Ethernet header */ + fm_data->fk_header_select |= FKH_L3RAW; + fm_mask->fk_header_select |= FKH_L3RAW; + fm_data_to = &fm_data->l3.rawdata[0]; + fm_mask_to = &fm_mask->l3.rawdata[0]; + } + + /* +* Use the raw L3 or L4 buffer to match eCPRI since fm_header_set does +* not have eCPRI header. Only 1st message header of PDU can be matched. +* "C" bit ignored. +*/ + memcpy(fm_data_to, spec, sizeof(*spec)); + memcpy(fm_mask_to, mask, sizeof(*mask)); + return 0; +} + /* * Currently, raw pattern match is very limited. It is intended for matching * UDP tunnel header (e.g. vxlan or geneve). -- 2.33.1
[PATCH v2] net/enic: support max descriptors allowed by adapter
Newer VIC adapters have the max number of supported RX and TX descriptors in their configuration. Use these values as the maximums. Signed-off-by: John Daley Reviewed-by: Hyong Youb Kim --- v2: fix RTE_ASSERT drivers/net/enic/base/cq_enet_desc.h | 6 - drivers/net/enic/enic_res.c | 20 drivers/net/enic/enic_res.h | 6 +++-- drivers/net/enic/enic_rxtx.c | 35 +++- 4 files changed, 49 insertions(+), 18 deletions(-) diff --git a/drivers/net/enic/base/cq_enet_desc.h b/drivers/net/enic/base/cq_enet_desc.h index a34a4f5400..02db85b9a0 100644 --- a/drivers/net/enic/base/cq_enet_desc.h +++ b/drivers/net/enic/base/cq_enet_desc.h @@ -67,7 +67,8 @@ struct cq_enet_rq_desc_64 { uint16_t vlan; uint16_t checksum_fcoe; uint8_t flags; - uint8_t unused[48]; + uint8_t fetch_idx_flags; + uint8_t unused[47]; uint8_t type_color; }; @@ -92,6 +93,9 @@ struct cq_enet_rq_desc_64 { #define CQ_ENET_RQ_DESC_BYTES_WRITTEN_BITS 14 #define CQ_ENET_RQ_DESC_BYTES_WRITTEN_MASK \ ((1 << CQ_ENET_RQ_DESC_BYTES_WRITTEN_BITS) - 1) +#define CQ_ENET_RQ_DESC_FETCH_IDX_BITS 2 +#define CQ_ENET_RQ_DESC_FETCH_IDX_MASK \ + ((1 << CQ_ENET_RQ_DESC_FETCH_IDX_BITS) - 1) #define CQ_ENET_RQ_DESC_FLAGS_TRUNCATED (0x1 << 14) #define CQ_ENET_RQ_DESC_FLAGS_VLAN_STRIPPED (0x1 << 15) diff --git a/drivers/net/enic/enic_res.c b/drivers/net/enic/enic_res.c index 9cfb857939..caf773bab2 100644 --- a/drivers/net/enic/enic_res.c +++ b/drivers/net/enic/enic_res.c @@ -26,6 +26,7 @@ int enic_get_vnic_config(struct enic *enic) struct vnic_enet_config *c = &enic->config; int err; uint64_t sizes; + uint32_t max_rq_descs, max_wq_descs; err = vnic_dev_get_mac_addr(enic->vdev, enic->mac_addr); if (err) { @@ -57,6 +58,8 @@ int enic_get_vnic_config(struct enic *enic) GET_CONFIG(loop_tag); GET_CONFIG(num_arfs); GET_CONFIG(max_pkt_size); + GET_CONFIG(max_rq_ring); + GET_CONFIG(max_wq_ring); /* max packet size is only defined in newer VIC firmware * and will be 0 for legacy firmware and VICs @@ -101,20 +104,29 @@ int enic_get_vnic_config(struct enic *enic) ((enic->filter_actions & FILTER_ACTION_COUNTER_FLAG) ? "count " : "")); - c->wq_desc_count = RTE_MIN((uint32_t)ENIC_MAX_WQ_DESCS, + /* The max size of RQ and WQ rings are specified in 1500 series VICs and +* beyond. If they are not specified by the VIC or if 64B CQ descriptors +* are not being used, the max number of descriptors is 4096. +*/ + max_wq_descs = (enic->cq64_request && c->max_wq_ring) ? c->max_wq_ring : + ENIC_LEGACY_MAX_WQ_DESCS; + c->wq_desc_count = RTE_MIN(max_wq_descs, RTE_MAX((uint32_t)ENIC_MIN_WQ_DESCS, c->wq_desc_count)); c->wq_desc_count &= 0xffe0; /* must be aligned to groups of 32 */ - - c->rq_desc_count = RTE_MIN((uint32_t)ENIC_MAX_RQ_DESCS, + max_rq_descs = (enic->cq64_request && c->max_rq_ring) ? 
c->max_rq_ring + : ENIC_LEGACY_MAX_WQ_DESCS; + c->rq_desc_count = RTE_MIN(max_rq_descs, RTE_MAX((uint32_t)ENIC_MIN_RQ_DESCS, c->rq_desc_count)); c->rq_desc_count &= 0xffe0; /* must be aligned to groups of 32 */ + dev_debug(NULL, "Max supported VIC descriptors: WQ:%u, RQ:%u\n", + max_wq_descs, max_rq_descs); c->intr_timer_usec = RTE_MIN(c->intr_timer_usec, vnic_dev_get_intr_coal_timer_max(enic->vdev)); dev_info(enic_get_dev(enic), "vNIC MAC addr " RTE_ETHER_ADDR_PRT_FMT - "wq/rq %d/%d mtu %d, max mtu:%d\n", + " wq/rq %d/%d mtu %d, max mtu:%d\n", enic->mac_addr[0], enic->mac_addr[1], enic->mac_addr[2], enic->mac_addr[3], enic->mac_addr[4], enic->mac_addr[5], c->wq_desc_count, c->rq_desc_count, diff --git a/drivers/net/enic/enic_res.h b/drivers/net/enic/enic_res.h index 34f15d5a42..ae979d52be 100644 --- a/drivers/net/enic/enic_res.h +++ b/drivers/net/enic/enic_res.h @@ -12,9 +12,11 @@ #include "vnic_rq.h" #define ENIC_MIN_WQ_DESCS 64 -#define ENIC_MAX_WQ_DESCS 4096 #define ENIC_MIN_RQ_DESCS 64 -#define ENIC_MAX_RQ_DESCS 4096 + +/* 1400 series VICs and prior all have 4K max, after that it's in the config */ +#define ENIC_LEGACY_MAX_WQ_DESCS4096 +#define ENIC_LEGACY_MAX_RQ_DESCS4096 /* A descriptor ring has a multiple of 32 descriptors */ #define ENIC_ALIGN_DESCS 32 diff --git a/drivers/net/enic/enic_rxtx.c b/drivers/net/enic/enic_rxtx.c index c44715bfd0..6159a45907 100644 --- a/drivers/net/enic/enic_rxtx.c +++ b/drivers/net/enic/enic_rxtx.c @@ -84,6 +84,7 @@ enic
RE: [PATCH] net/enic: adjust memory check and use in proper order
Reviewed-by: John Daley Thanks, John -Original Message- From: Weiguo Li Sent: Tuesday, January 25, 2022 4:01 AM To: John Daley (johndale) Cc: dev@dpdk.org Subject: [PATCH] net/enic: adjust memory check and use in proper order Fixes: bb66d562aefc ("net/enic: share flow actions with same signature") Signed-off-by: Weiguo Li --- drivers/net/enic/enic_fm_flow.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/enic/enic_fm_flow.c b/drivers/net/enic/enic_fm_flow.c index bf04d714d0..d8718d17ef 100644 --- a/drivers/net/enic/enic_fm_flow.c +++ b/drivers/net/enic/enic_fm_flow.c @@ -2521,11 +2521,11 @@ enic_action_handle_get(struct enic_flowman *fm, struct fm_action *action_in, memcpy(fma, action_in, sizeof(*fma)); ah = calloc(1, sizeof(*ah)); - memcpy(&ah->key, action_in, sizeof(struct fm_action)); if (ah == NULL) return rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_HANDLE, NULL, "enic: calloc(fm-action)"); + memcpy(&ah->key, action_in, sizeof(struct fm_action)); args[0] = FM_ACTION_ALLOC; args[1] = fm->cmd.pa; ret = flowman_cmd(fm, args, 2); -- 2.25.1
RE: [PATCH v2 01/10] ethdev: introduce flow pre-configuration hints
On Tuesday, January 25, 2022 13:44 Jerin Jacob wrote: > On Tue, Jan 25, 2022 at 6:58 AM Alexander Kozyrev > wrote: > > > > On Monday, January 24, 2022 12:41 Ajit Khaparde > wrote: > > > On Mon, Jan 24, 2022 at 6:37 AM Jerin Jacob > > > wrote: > > > > > > > Ok, I'll adopt this wording in the v3. > > > > > > > + * > > > > > + * @param port_id > > > > > + * Port identifier of Ethernet device. > > > > > + * @param[in] port_attr > > > > > + * Port configuration attributes. > > > > > + * @param[out] error > > > > > + * Perform verbose error reporting if not NULL. > > > > > + * PMDs initialize this structure in case of error only. > > > > > + * > > > > > + * @return > > > > > + * 0 on success, a negative errno value otherwise and rte_errno is > set. > > > > > + */ > > > > > +__rte_experimental > > > > > +int > > > > > +rte_flow_configure(uint16_t port_id, > > > > > > > > Should we couple, setting resource limit hint to configure function as > > > > if we add future items in > > > > configuration, we may pain to manage all state. Instead how about, > > > > rte_flow_resource_reserve_hint_set()? > > > +1 > > Port attributes are the hints, PMD can safely ignore anything that is not > supported/deemed unreasonable. > > Having several functions to call instead of one configuration function seems > like a burden to me. > > If we add a lot of features which has different state it will be > difficult to manage. > Since it is the slow path and OPTIONAL API. IMO, it should be fine to > have a separate API for a specific purpose > to have a clean interface. This approach contradicts the DPDK way of configuring devices. If you look at the rte_eth_dev_configure or rte_eth_rx_queue_setup API you will see that the configuration is propagated via config structures. I would like to conform to this approach with my new API as well. Another question is how to deal with interdependencies between separate hints? There could be some resources that require other resources to be present. Or one resource shares the hardware registers with another one and needs to be accounted for. That is not easy to do with separate function calls. > > > > > > > > > > > > > > > > > > + const struct rte_flow_port_attr *port_attr, > > > > > + struct rte_flow_error *error); > > > > I think, we should have _get function to get those limit numbers > otherwise, > > > > we can not write portable applications as the return value is kind of > > > > boolean now if > > > > don't define exact values for rte_errno for reasons. > > > +1 > > We had this discussion in RFC. The limits will vary from NIC to NIC and from > system to > > system, depending on hardware capabilities and amount of free memory > for example. > > It is easier to reject a configuration with a clear error description as we > > do > for flow creation. > > In that case, we can return a "defined" return value or "defined" > errno to capture this case so that > the application can make forward progress to differentiate between API > failed vs dont having enough resources > and move on. I think you are right and it will be useful to provide some hardware capabilities. I'll add something like rte_flow_info_get() to obtain available flow rule resources.
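As an illustration of the config-struct approach described above, usage would look roughly like this; the structure and field names are assumptions sketched from this discussion, not the final API:

struct rte_flow_port_info info;
struct rte_flow_port_attr attr = { 0 };
struct rte_flow_error error;

/* Query what the port/PMD can offer first... */
rte_flow_info_get(port_id, &info, &error);

/* ...then hint at the resources the application plans to use. */
attr.nb_counters = RTE_MIN(info.max_nb_counters, app_max_flows);
if (rte_flow_configure(port_id, &attr, &error) != 0)
	rte_exit(EXIT_FAILURE, "flow pre-configuration rejected: %s\n",
		 error.message);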
RE: [RFC PATCH v1 0/4] Direct re-arming of buffers on receive side
Thanks Morten, appreciate your comments. A few responses inline. > -Original Message- > From: Morten Brørup > Sent: Sunday, December 26, 2021 4:25 AM > To: Feifei Wang > Cc: dev@dpdk.org; nd > Subject: RE: [RFC PATCH v1 0/4] Direct re-arming of buffers on receive side > > > From: Feifei Wang [mailto:feifei.wa...@arm.com] > > Sent: Friday, 24 December 2021 17.46 > > > > > > However, this solution poses several constraints: > > > > 1) The receive queue needs to know which transmit queue it should take > > the buffers from. The application logic decides which transmit port to > > use to send out the packets. In many use cases the NIC might have a > > single port ([1], [2], [3]), in which case a given transmit queue is > > always mapped to a single receive queue (1:1 Rx queue: Tx queue). This > > is easy to configure. > > > > If the NIC has 2 ports (there are several references), then we will > > have > > 1:2 (RX queue: TX queue) mapping which is still easy to configure. > > However, if this is generalized to 'N' ports, the configuration can be > > long. Moreover the PMD would have to scan a list of transmit queues > > to pull the buffers from. > > I disagree with the description of this constraint. > > As I understand it, it doesn't matter how many ports or queues are in a NIC or > system. > > The constraint is more narrow: > > This patch requires that all packets ingressing on some port/queue must > egress on the specific port/queue that it has been configured to re-arm its > buffers from. I.e. an application cannot route packets between multiple ports > with this patch. Agree, this patch as is has this constraint. It is not a constraint that would apply to NICs with a single port. The above text is describing some of the issues associated with generalizing the solution for N number of ports. If N is small, the configuration is small and scanning should not be bad. > > > > > > > You are missing the fourth constraint: > > 4) The application must transmit all received packets immediately, i.e. QoS > queueing and similar is prohibited. I do not understand this; can you please elaborate? Even if there is QoS queuing, there would be a steady stream of packets being transmitted. These transmitted packets will fill the buffers on the RX side. > > > > > The patch provides a significant performance improvement, but I am > wondering if any real world applications exist that would use this. Only a > "router on a stick" (i.e. a single-port router) comes to my mind, and that is > probably sufficient to call it useful in the real world. Do you have any other > examples to support the usefulness of this patch? SmartNICs are a clear and dominant use case; they typically have a single port for data plane traffic (dual ports are mostly for redundancy). This patch avoids a good amount of store operations. The smaller CPUs found in SmartNICs have smaller store buffers which can become bottlenecks. Avoiding the lcore cache saves valuable HW cache space. > > Anyway, the patch doesn't do any harm if unused, and the only performance > cost is the "if (rxq->direct_rxrearm_enable)" branch in the Ethdev driver. So > I > don't oppose it. >
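For readers following the thread, the data path difference under discussion can be pictured as below. This is purely conceptual pseudo-C; every helper name is invented for illustration and none of it is the RFC's actual code:

/* Conventional path: Tx free stores mbufs into the lcore cache,
 * then Rx refill loads them back out again. */
free_tx_mbufs_to_cache(txq, cache);                 /* stores */
n = alloc_rx_mbufs_from_cache(cache, bufs, BURST);  /* loads  */
refill_rx_descriptors(rxq, bufs, n);

/* Direct re-arming (1:1 Rx:Tx pairing): completed Tx mbufs go
 * straight into the paired Rx ring; no cache stores/loads. */
n = reap_tx_completions(rxq->paired_txq, bufs, BURST);
refill_rx_descriptors(rxq, bufs, n);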
RE: [PATCH] net/bonding: fix MTU set for slaves
> -Original Message- > From: Yigit, Ferruh > Sent: Wednesday, January 26, 2022 9:11 PM > To: Chas Williams ; Min Hu (Connor) > ; Ivan Ilchenko ; > Andrew Rybchenko > Cc: dev@dpdk.org; Yigit, Ferruh ; sta...@dpdk.org; > Andrew Rybchenko ; Jiang, YuX > > Subject: [PATCH] net/bonding: fix MTU set for slaves > > ethdev requires the device to be configured before setting the MTU. > > In the bonding PMD, while configuring slaves, bonding first sets the MTU and later > configures them, which causes a failure if the slaves are not configured in advance. > > Fix by changing the order in slave configure as required by the ethdev layer: > configure first and set the MTU later. > > Bugzilla ID: 864 > Fixes: b26bee10ee37 ("ethdev: forbid MTU set before device configure") > Cc: sta...@dpdk.org > > Signed-off-by: Ferruh Yigit > --- Tested-by: Yu Jiang > Cc: Andrew Rybchenko > Cc: yux.ji...@intel.com > --- Verified patchset https://patches.dpdk.org/project/dpdk/patch/20220126131037.2403369-1-ferruh.yi...@intel.com/ on baseline v22.03-rc0's 8a5a91401d. Tests pass on NIC Ethernet Connection X722 for 10GbE SFP+ 37d0/Ubuntu 21.10, 5.13.0-19-generic
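The required ordering, in sketch form; rte_eth_dev_configure() and rte_eth_dev_set_mtu() are the real ethdev calls, while the surrounding variables stand in for the bonding PMD's slave configure path:

/* ethdev forbids setting the MTU before the port is configured: */
ret = rte_eth_dev_configure(slave_port_id, nb_rx_q, nb_tx_q, &slave_conf);
if (ret != 0)
	return ret;

/* Only now is it valid to propagate the bonded device's MTU. */
ret = rte_eth_dev_set_mtu(slave_port_id, bonded_mtu);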
RE: [PATCH v3 7/9] vhost: remove multi-line logs
> -Original Message- > From: Maxime Coquelin > Sent: Wednesday, January 26, 2022 5:55 PM > To: dev@dpdk.org; Xia, Chenbo ; > david.march...@redhat.com > Cc: Maxime Coquelin > Subject: [PATCH v3 7/9] vhost: remove multi-line logs > > This patch replaces multi-line logs with multiple single-line > logs in order to ease log filtering based on their > socket path. > > Signed-off-by: Maxime Coquelin > --- > lib/vhost/socket.c | 10 ++-- > lib/vhost/vhost.c | 8 ++-- > lib/vhost/vhost_user.c | 103 - > 3 files changed, 59 insertions(+), 62 deletions(-) > > -- > 2.34.1 Reviewed-by: Chenbo Xia
RE: [PATCH] vdpa/mlx5: workaround queue stop with traffic
> -Original Message- > From: Matan Azrad > Sent: Monday, November 22, 2021 9:13 PM > To: Maxime Coquelin > Cc: dev@dpdk.org; sta...@dpdk.org; Xueming Li > Subject: [PATCH] vdpa/mlx5: workaround queue stop with traffic > > When the event thread polls traffic and a virtq is stopping, the FW loses > synchronization in the virtq indexes. > > It causes an LM failure in the synchronization between the HOST indexes and > the GUEST indexes. > > Unset the event thread before the queue stop in the LM process. > > Fixes: 31b9c29c86af ("vdpa/mlx5: support close and config operations") > Cc: sta...@dpdk.org > > Signed-off-by: Matan Azrad > Acked-by: Xueming Li > --- > -- > 2.25.1 Applied to next-virtio/main, thanks
RE: [PATCH] net/virtio: fix Tx queue 0 override by queue 128
> -Original Message- > From: Xueming Li > Sent: Thursday, December 2, 2021 9:51 PM > To: dev@dpdk.org > Cc: xuemi...@nvidia.com; sta...@dpdk.org; Maxime Coquelin > ; Xia, Chenbo > Subject: [PATCH] net/virtio: fix Tx queue 0 override by queue 128 > > Both the Rx queue and the Tx queue are VirtQs in virtio; the VQ index is 256 for Tx > queue 128. The uint8 type of the TxQ VQ index overflows and overrides Tx queue 0 > data. > > This patch fixes the VQ index type to uint16. > > Fixes: c1f86306a026 ("virtio: add new driver") > Cc: sta...@dpdk.org > > Signed-off-by: Xueming Li > --- > -- > 2.34.0 Applied to next-virtio/main, thanks
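The root cause is easy to reproduce in isolation: virtio assigns queue pair N the virtqueue indexes 2N (Rx) and 2N+1 (Tx), so Tx queue 128 needs a VQ index around 257, which cannot be represented in a uint8_t:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint8_t  vq8  = (uint8_t)(2 * 128 + 1); /* wraps to 1: Tx queue 0's VQ */
	uint16_t vq16 = 2 * 128 + 1;            /* 257, as intended */

	printf("uint8_t: %u, uint16_t: %u\n", vq8, vq16);
	return 0;
}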
RE: [PATCH v3] vdpa/ifc: fix log info mismatch
> -Original Message- > From: Pei, Andy > Sent: Monday, December 13, 2021 3:01 PM > To: dev@dpdk.org > Cc: Xia, Chenbo ; sta...@dpdk.org > Subject: [PATCH v3] vdpa/ifc: fix log info mismatch > > Fix log info mismatch. > > Fixes: a3f8150eac6d ("net/ifcvf: add ifcvf vDPA driver") > Cc: sta...@dpdk.org > > Signed-off-by: Andy Pei > --- > drivers/vdpa/ifc/base/ifcvf.c | 14 -- > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/drivers/vdpa/ifc/base/ifcvf.c b/drivers/vdpa/ifc/base/ifcvf.c > index 721cb1d..d10c1fd 100644 > --- a/drivers/vdpa/ifc/base/ifcvf.c > +++ b/drivers/vdpa/ifc/base/ifcvf.c > @@ -94,12 +94,14 @@ > return -1; > } > > - DEBUGOUT("capability mapping:\ncommon cfg: %p\n" > - "notify base: %p\nisr cfg: %p\ndevice cfg: %p\n" > - "multiplier: %u\n", > - hw->common_cfg, hw->dev_cfg, > - hw->isr, hw->notify_base, > - hw->notify_off_multiplier); > + DEBUGOUT("capability mapping:\n" > + "common cfg: %p\n" > + "notify base: %p\n" > + "isr cfg: %p\n" > + "device cfg: %p\n" > + "multiplier: %u\n", > + hw->common_cfg, hw->notify_base, hw->isr, hw->dev_cfg, > + hw->notify_off_multiplier); > > return 0; > } > -- > 1.8.3.1 Applied to next-virtio/main, thanks
RE: 回复: [RFC PATCH v1 0/4] Direct re-arming of buffers on receive side
> > [quick summary: ethdev API to bypass mempool] > > 18/01/2022 16:51, Ferruh Yigit: > > On 12/28/2021 6:55 AM, Feifei Wang wrote: > > > Morten Brørup : > > >> The patch provides a significant performance improvement, but I am > > >> wondering if any real world applications exist that would use this. > > >> Only a "router on a stick" (i.e. a single-port router) comes to my > > >> mind, and that is probably sufficient to call it useful in the real > > >> world. Do you have any other examples to support the usefulness of this > patch? > > >> > > > One case I have is about network security. For network firewall, all > > > packets need to ingress on the specified port and egress on the specified > port to do packet filtering. > > > In this case, we can know flow direction in advance. > > > > I also have some concerns on how useful this API will be in real life, > > and does the use case worth the complexity it brings. > > And it looks too much low level detail for the application. I think the application writer already needs to know many low level details to be able to extract performance out of PMDs, for example fast free. > > That's difficult to judge. > The use case is limited and the API has some severe limitations. The use case applies to SmartNICs, which are a major use case. In terms of limitations, it depends on how one sees it. For example, the lcore cache is not applicable to pipeline mode, but it is still accepted as it is helpful for something else. > The benefit is measured with l3fwd, which is not exactly a real app. It is funny how we treat l3fwd. When it shows performance improvement, we treat it as 'not a real application'. When it shows (even a small) performance drop, the patches are not accepted. We need to make up our mind 😊 > Do we want an API which improves performance in limited scenarios at the > cost of breaking some general design assumptions? It is not breaking any existing design assumptions. It is a very well-suited optimization for the SmartNIC use case. For this use case, it does not make sense for the same thread to copy data to a temp location (lcore cache), read it immediately and store it in another location. It is a waste of CPU cycles and memory bandwidth. > > Can we achieve the same level of performance with a mempool trick? We cannot, as this patch basically avoids memory loads and stores (which reduces the backend stalls) caused by the temporary storage in the lcore cache. >
RE: [PATCH] net/virtio: fix unreleased resource when creating virtio user dev is failed
> -Original Message- > From: Harold Huang > Sent: Thursday, December 23, 2021 12:43 PM > To: dev@dpdk.org > Cc: Maxime Coquelin ; Xia, Chenbo > > Subject: [PATCH] net/virtio: fix unreleased resource when creating virtio user > dev fails > > When eth_virtio_dev_init fails, the registered virtio-user memory event > cb is not released and the backend-created tap device is not destroyed. > This can leave residual tap devices in the host, and creating > a new vdev could fail, because the new virtio_user_dev could use the > same address pointer, and registering a memory event cb to the same address is > not allowed. > > Signed-off-by: Harold Huang > --- > Compared to patch v3, the commit log is changed because this bug could > leave residual tap devices in the host. > drivers/net/virtio/virtio_user_ethdev.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/net/virtio/virtio_user_ethdev.c > b/drivers/net/virtio/virtio_user_ethdev.c > index 0271098f0d..16eca2f940 100644 > --- a/drivers/net/virtio/virtio_user_ethdev.c > +++ b/drivers/net/virtio/virtio_user_ethdev.c > @@ -666,6 +666,7 @@ virtio_user_pmd_probe(struct rte_vdev_device *vdev) > /* previously called by pci probing for physical dev */ > if (eth_virtio_dev_init(eth_dev) < 0) { > PMD_INIT_LOG(ERR, "eth_virtio_dev_init fails"); > + virtio_user_dev_uninit(dev); > virtio_user_eth_dev_free(eth_dev); > goto end; > } > -- > 2.27.0 Applied to next-virtio/main with headline shortened and fix tag added. Thanks!
RE: Re: [RFC PATCH v1 0/4] Direct re-arming of buffers on receive side
> > From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > Sent: Tuesday, 18 January 2022 17.54
> >
> > [quick summary: ethdev API to bypass mempool]
> >
> > 18/01/2022 16:51, Ferruh Yigit:
> > > On 12/28/2021 6:55 AM, Feifei Wang wrote:
> > > > Morten Brørup :
> > > > > The patch provides a significant performance improvement, but I
> > > > > am wondering if any real world applications exist that would use
> > > > > this. Only a "router on a stick" (i.e. a single-port router)
> > > > > comes to my mind, and that is probably sufficient to call it
> > > > > useful in the real world. Do you have any other examples to
> > > > > support the usefulness of this patch?
> > > > >
> > > > One case I have is about network security. For a network firewall,
> > > > all packets need to ingress on the specified port and egress on
> > > > the specified port to do packet filtering. In this case, we can
> > > > know the flow direction in advance.
> > >
> > > I also have some concerns on how useful this API will be in real
> > > life, and whether the use case is worth the complexity it brings.
> > > And it looks like too much low level detail for the application.
> >
> > That's difficult to judge.
> > The use case is limited and the API has some severe limitations.
> > The benefit is measured with l3fwd, which is not exactly a real app.
> > Do we want an API which improves performance in limited scenarios at
> > the cost of breaking some general design assumptions?
> >
> > Can we achieve the same level of performance with a mempool trick?
>
> Perhaps the mbuf library could offer bulk functions for alloc/free of
> raw mbufs - essentially a shortcut directly to the mempool library.
>
> There might be a few more details to micro-optimize in the mempool
> library, if approached with this use case in mind. E.g. the
> rte_mempool_default_cache() could do with a few unlikely() in its
> comparisons.
>
> Also, for this use case, the mempool library adds tracing overhead,
> which this API bypasses. And considering how short the code path
> through the mempool cache is, the tracing overhead is relatively large.
> I.e.: memcpy(NIC->NIC) vs. trace() memcpy(NIC->cache) trace()
> memcpy(cache->NIC).
>
> A key optimization point could be the number of mbufs being moved
> to/from the mempool cache. If that number was fixed at compile time, a
> faster memcpy() could be used. However, it seems that different PMDs
> use bursts of either 4, 8, or in this case 32 mbufs. If only they could
> agree on such a simple detail.

This patch removes the stores and loads, which saves backend cycles. I do
not think other optimizations can do the same.

> Overall, I strongly agree that it is preferable to optimize the core
> libraries, rather than bypass them. Bypassing will eventually lead to
> "spaghetti code".

IMO, this is not "spaghetti code". There is no design rule in DPDK that
says the RX side must allocate buffers from a mempool or the TX side must
free buffers to a mempool. This patch does not break any modular
boundaries, for ex: it does not access the internal details of another
library.
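To make the round trip under discussion concrete, here is a minimal sketch
(refill_via_mempool is a hypothetical helper, not the proposed API; it
shows the conventional path that direct re-arming short-circuits):

  #include <rte_mbuf.h>
  #include <rte_mempool.h>

  /* Conventional path: Tx completions are stored into the mempool's
   * per-lcore cache and immediately loaded back out for Rx refill.
   * Direct re-arming instead hands the same mbufs straight from the Tx
   * ring to the Rx descriptor ring inside the PMD, skipping both steps. */
  static int
  refill_via_mempool(struct rte_mempool *mp, struct rte_mbuf **tx_done,
          struct rte_mbuf **rx_bufs, unsigned int n)
  {
      rte_pktmbuf_free_bulk(tx_done, n);             /* store to cache */
      return rte_pktmbuf_alloc_bulk(mp, rx_bufs, n); /* load from cache */
  }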
RE: [dpdk-dev] [PATCH] net/virtio-user: check fd flags getting failure
> -----Original Message-----
> From: Yunjian Wang
> Sent: Saturday, January 8, 2022 3:53 PM
> To: dev@dpdk.org
> Cc: maxime.coque...@redhat.com; Xia, Chenbo ; dingxiaoxi...@huawei.com;
> xudin...@huawei.com; Yunjian Wang ; sta...@dpdk.org
> Subject: [dpdk-dev] [PATCH] net/virtio-user: check fd flags getting failure
>
> The function fcntl() could return errors,
> so the return value needs to be checked.
>
> Fixes: 6a84c37e3975 ("net/virtio-user: add vhost-user adapter layer")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Yunjian Wang
> ---
>  drivers/net/virtio/virtio_user/vhost_user.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/virtio/virtio_user/vhost_user.c
> b/drivers/net/virtio/virtio_user/vhost_user.c
> index cc830a660f..0a39393c45 100644
> --- a/drivers/net/virtio/virtio_user/vhost_user.c
> +++ b/drivers/net/virtio/virtio_user/vhost_user.c
> @@ -840,8 +840,10 @@ vhost_user_setup(struct virtio_user_dev *dev)
>  	}
>
>  	flag = fcntl(fd, F_GETFD);
> -	if (fcntl(fd, F_SETFD, flag | FD_CLOEXEC) < 0)
> -		PMD_DRV_LOG(WARNING, "fcntl failed, %s", strerror(errno));
> +	if (flag == -1)
> +		PMD_DRV_LOG(WARNING, "fcntl get fd failed, %s", strerror(errno));
> +	else if (fcntl(fd, F_SETFD, flag | FD_CLOEXEC) < 0)
> +		PMD_DRV_LOG(WARNING, "fcntl set fd failed, %s", strerror(errno));
>
>  	memset(&un, 0, sizeof(un));
>  	un.sun_family = AF_UNIX;
> --
> 2.27.0

Applied to next-virtio/main with headline fixed (fd -> FD). Thanks.
RE: [dpdk-dev] [PATCH] net/virtio: fix uninitialized old_rss_key variable
> -----Original Message-----
> From: Yunjian Wang
> Sent: Saturday, January 8, 2022 4:14 PM
> To: dev@dpdk.org
> Cc: maxime.coque...@redhat.com; Xia, Chenbo ; dingxiaoxi...@huawei.com;
> xudin...@huawei.com; Yunjian Wang ; sta...@dpdk.org
> Subject: [dpdk-dev] [PATCH] net/virtio: fix uninitialized old_rss_key
> variable
>
> This patch fixes an issue where an uninitialized old_rss_key
> is used for restoring the rss_key.
>
> Coverity issue: 373866
> Fixes: 0c9d66207054 ("net/virtio: support RSS")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Yunjian Wang
> ---
>  drivers/net/virtio/virtio_ethdev.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio/virtio_ethdev.c
> b/drivers/net/virtio/virtio_ethdev.c
> index c2588369b2..7c445bfc48 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -2028,7 +2028,8 @@ virtio_dev_rss_hash_update(struct rte_eth_dev *dev,
>
>  	return 0;
>  restore_key:
> -	memcpy(hw->rss_key, old_rss_key, VIRTIO_NET_RSS_KEY_SIZE);
> +	if (rss_conf->rss_key && rss_conf->rss_key_len)
> +		memcpy(hw->rss_key, old_rss_key, VIRTIO_NET_RSS_KEY_SIZE);
>  restore_types:
>  	hw->rss_hash_types = old_hash_types;
>
> --
> 2.27.0

Applied to next-virtio/main with headline fixed, thanks.
RE: [PATCH v2] vhost: add log for VHOST_USER_SET_VRING_BASE
> -----Original Message-----
> From: Pei, Andy
> Sent: Friday, January 14, 2022 3:57 PM
> To: dev@dpdk.org
> Cc: Xia, Chenbo
> Subject: [PATCH v2] vhost: add log for VHOST_USER_SET_VRING_BASE
>
> This patch adds a log of the vring related info in the handling of the
> vhost message VHOST_USER_SET_VRING_BASE, which will be useful in the
> live migration case.
>
> Signed-off-by: Andy Pei
> ---
>  lib/vhost/vhost_user.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index a781346..cd8c7bc 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -973,6 +973,11 @@
>  		vq->last_avail_idx = msg->payload.state.num;
>  	}
>
> +	VHOST_LOG_CONFIG(INFO,
> +		"vring base idx:%u last_used_idx:%u last_avail_idx:%u.\n",
> +		msg->payload.state.index, vq->last_used_idx,
> +		vq->last_avail_idx);
> +
>  	return RTE_VHOST_MSG_RESULT_OK;
>  }
>
> --
> 1.8.3.1

Applied to next-virtio/main, thanks.
RE: [PATCH v3 0/9] vhost: improve logging
> -----Original Message-----
> From: Maxime Coquelin
> Sent: Wednesday, January 26, 2022 5:55 PM
> To: dev@dpdk.org; Xia, Chenbo ; david.march...@redhat.com
> Cc: Maxime Coquelin
> Subject: [PATCH v3 0/9] vhost: improve logging
>
> This series aims at easing Vhost log analysis by prepending the
> Vhost-user socket path to all logs and removing multi-line comments.
> Doing so, filtering a Vhost-user port's logs is much easier.
>
> Changes in v3:
> ==============
> - Fix various typos reported (Chenbo)
> - Revert one multi-line comment removal (Chenbo)
>
> Changes in v2:
> ==============
> - Add missing socket paths (David)
> - Avoid identical logs in iotlb code (David)
> - Use data log type when used in datapath (David)
>
> Maxime Coquelin (9):
>   vhost: improve IOTLB logs
>   vhost: improve vDPA registration failure log
>   vhost: improve Vhost layer logs
>   vhost: improve Vhost-user layer logs
>   vhost: improve socket layer logs
>   vhost: improve Virtio-net layer logs
>   vhost: remove multi-line logs
>   vhost: differentiate IOTLB logs
>   vhost: use proper logging type for data path
>
>  lib/vhost/iotlb.c      |  30 +-
>  lib/vhost/iotlb.h      |  10 +-
>  lib/vhost/socket.c     | 148 -
>  lib/vhost/vdpa.c       |   4 +-
>  lib/vhost/vhost.c      | 108 ---
>  lib/vhost/vhost_user.c | 678 -
>  lib/vhost/vhost_user.h |   4 +-
>  lib/vhost/virtio_net.c | 165 +-
>  8 files changed, 548 insertions(+), 599 deletions(-)
>
> --
> 2.34.1

Series applied to next-virtio/main, thanks.
RE: [PATCH v2 09/83] lib/vhost: remove unnecessary NULL checks
Hi Stephen,

> -----Original Message-----
> From: Stephen Hemminger
> Sent: Tuesday, January 25, 2022 1:46 AM
> To: dev@dpdk.org
> Cc: Stephen Hemminger ; Maxime Coquelin ; Xia, Chenbo
> Subject: [PATCH v2 09/83] lib/vhost: remove unnecessary NULL checks

Title should be: vhost: remove unnecessary NULL checks

With this fixed:

Reviewed-by: Chenbo Xia

> Remove redundant NULL pointer checks before free functions,
> found by nullfree.cocci.
>
> Signed-off-by: Stephen Hemminger
> ---
>  lib/vhost/iotlb.c        | 3 +--
>  lib/vhost/vhost_crypto.c | 6 ++----
>  lib/vhost/vhost_user.c   | 9 +++------
>  3 files changed, 6 insertions(+), 12 deletions(-)
>
> diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
> index 82bdb84526ea..d6b8c0a396b5 100644
> --- a/lib/vhost/iotlb.c
> +++ b/lib/vhost/iotlb.c
> @@ -315,8 +315,7 @@ vhost_user_iotlb_init(struct virtio_net *dev, int vq_index)
>
>  	/* If already created, free it and recreate */
>  	vq->iotlb_pool = rte_mempool_lookup(pool_name);
> -	if (vq->iotlb_pool)
> -		rte_mempool_free(vq->iotlb_pool);
> +	rte_mempool_free(vq->iotlb_pool);
>
>  	vq->iotlb_pool = rte_mempool_create(pool_name,
>  		IOTLB_CACHE_SIZE, sizeof(struct vhost_iotlb_entry), 0,
> diff --git a/lib/vhost/vhost_crypto.c b/lib/vhost/vhost_crypto.c
> index 926b5c0bd94a..012e0cefdeba 100644
> --- a/lib/vhost/vhost_crypto.c
> +++ b/lib/vhost/vhost_crypto.c
> @@ -1498,10 +1498,8 @@ rte_vhost_crypto_create(int vid, uint8_t cryptodev_id,
>  		return 0;
>
>  error_exit:
> -	if (vcrypto->session_map)
> -		rte_hash_free(vcrypto->session_map);
> -	if (vcrypto->mbuf_pool)
> -		rte_mempool_free(vcrypto->mbuf_pool);
> +	rte_hash_free(vcrypto->session_map);
> +	rte_mempool_free(vcrypto->mbuf_pool);
>
>  	rte_free(vcrypto);
>
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index 5eb1dd681231..23a0e3fb3600 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -489,8 +489,7 @@ vhost_user_set_vring_num(struct virtio_net **pdev,
>  	}
>
>  	if (vq_is_packed(dev)) {
> -		if (vq->shadow_used_packed)
> -			rte_free(vq->shadow_used_packed);
> +		rte_free(vq->shadow_used_packed);
>  		vq->shadow_used_packed = rte_malloc_socket(NULL,
>  				vq->size *
>  				sizeof(struct vring_used_elem_packed),
> @@ -502,8 +501,7 @@ vhost_user_set_vring_num(struct virtio_net **pdev,
>  		}
>
>  	} else {
> -		if (vq->shadow_used_split)
> -			rte_free(vq->shadow_used_split);
> +		rte_free(vq->shadow_used_split);
>
>  		vq->shadow_used_split = rte_malloc_socket(NULL,
>  			vq->size * sizeof(struct vring_used_elem),
> @@ -516,8 +514,7 @@ vhost_user_set_vring_num(struct virtio_net **pdev,
>  		}
>  	}
>
> -	if (vq->batch_copy_elems)
> -		rte_free(vq->batch_copy_elems);
> +	rte_free(vq->batch_copy_elems);
>  	vq->batch_copy_elems = rte_malloc_socket(NULL,
>  		vq->size * sizeof(struct batch_copy_elem),
>  		RTE_CACHE_LINE_SIZE, vq->numa_node);
> --
> 2.30.2
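As background on why these guards can simply be dropped: DPDK's free-style
functions, like libc free(), are documented to do nothing when passed
NULL. A minimal sketch (the ctx type is hypothetical, for illustration
only):

  #include <rte_malloc.h>

  struct ctx {
      char *buf;    /* may or may not have been allocated */
  };

  /* rte_free() is a documented no-op on NULL (as are rte_mempool_free()
   * and rte_hash_free()), so the "if (ptr != NULL)" guard that
   * nullfree.cocci flags adds nothing but noise. */
  static void
  ctx_release(struct ctx *c)
  {
      rte_free(c->buf);    /* safe even when c->buf == NULL */
      c->buf = NULL;
  }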
RE: [PATCH v2 10/83] examples/vhost_blk: remove unnecessary NULL checks
> -----Original Message-----
> From: Stephen Hemminger
> Sent: Tuesday, January 25, 2022 1:46 AM
> To: dev@dpdk.org
> Cc: Stephen Hemminger ; Maxime Coquelin ; Xia, Chenbo
> Subject: [PATCH v2 10/83] examples/vhost_blk: remove unnecessary NULL
> checks
>
> Remove redundant NULL pointer checks before free functions,
> found by nullfree.cocci.
>
> Signed-off-by: Stephen Hemminger
> ---
>  examples/vhost_blk/vhost_blk.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/examples/vhost_blk/vhost_blk.c b/examples/vhost_blk/vhost_blk.c
> index feadacc62ee5..2cab1e6994fe 100644
> --- a/examples/vhost_blk/vhost_blk.c
> +++ b/examples/vhost_blk/vhost_blk.c
> @@ -849,8 +849,7 @@ static void
>  vhost_blk_ctrlr_destroy(struct vhost_blk_ctrlr *ctrlr)
>  {
>  	if (ctrlr->bdev != NULL) {
> -		if (ctrlr->bdev->data != NULL)
> -			rte_free(ctrlr->bdev->data);
> +		rte_free(ctrlr->bdev->data);
>
>  		rte_free(ctrlr->bdev);
>  	}
> --
> 2.30.2

Reviewed-by: Chenbo Xia
RE: [PATCH v2 75/83] vdpa/ifc: remove unnecessary NULL checks
> -----Original Message-----
> From: Stephen Hemminger
> Sent: Tuesday, January 25, 2022 1:47 AM
> To: dev@dpdk.org
> Cc: Stephen Hemminger ; Wang, Xiao W
> Subject: [PATCH v2 75/83] vdpa/ifc: remove unnecessary NULL checks
>
> Remove redundant NULL pointer checks before free functions,
> found by nullfree.cocci.
>
> Signed-off-by: Stephen Hemminger
> ---
>  drivers/vdpa/ifc/ifcvf_vdpa.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/vdpa/ifc/ifcvf_vdpa.c b/drivers/vdpa/ifc/ifcvf_vdpa.c
> index 3853c4cf7e85..9f05595b6b34 100644
> --- a/drivers/vdpa/ifc/ifcvf_vdpa.c
> +++ b/drivers/vdpa/ifc/ifcvf_vdpa.c
> @@ -226,8 +226,7 @@ ifcvf_dma_map(struct ifcvf_internal *internal, bool do_map)
>  	}
>
>  exit:
> -	if (mem)
> -		free(mem);
> +	free(mem);
>  	return ret;
>  }
>
> @@ -253,8 +252,7 @@ hva_to_gpa(int vid, uint64_t hva)
>  	}
>
>  exit:
> -	if (mem)
> -		free(mem);
> +	free(mem);
>  	return gpa;
>  }
>
> @@ -661,8 +659,7 @@ m_ifcvf_start(struct ifcvf_internal *internal)
>
>  error:
>  	for (i = 0; i < nr_vring; i++)
> -		if (internal->m_vring[i].desc)
> -			rte_free(internal->m_vring[i].desc);
> +		rte_free(internal->m_vring[i].desc);
>
>  	return -1;
>  }
> --
> 2.30.2

Reviewed-by: Chenbo Xia
Re: [PATCH v2] net/i40e: reduce redundant store operation
> -----Original Message-----
> From: Zhang, Qi Z
> Sent: Wednesday, January 26, 2022 10:28 PM
> To: Feifei Wang ; Xing, Beilei
> Cc: dev@dpdk.org; Wang, Haiyue ; nd ; Ruifeng Wang
> Subject: RE: [PATCH v2] net/i40e: reduce redundant store operation
>
> > -----Original Message-----
> > From: Feifei Wang
> > Sent: Tuesday, December 21, 2021 4:11 PM
> > To: Xing, Beilei
> > Cc: dev@dpdk.org; Wang, Haiyue ; n...@arm.com;
> > Feifei Wang ; Ruifeng Wang
> >
> > Subject: [PATCH v2] net/i40e: reduce redundant store operation
> >
> > For the free buffer operation in the i40e vector path, it is
> > unnecessary to store 'NULL' into txep.mbuf. This is because when
> > putting mbufs into the Tx queue, tx_tail is the sentinel, and when
> > doing tx_free, tx_next_dd is the sentinel. In none of these processes
> > is mbuf == NULL used as a check condition, so the reset of mbuf is
> > unnecessary and can be omitted.
> >
> > Signed-off-by: Feifei Wang
> > Reviewed-by: Ruifeng Wang
> > ---
> >
> > v2: remove the change for the scalar path, since the scalar path needs
> > to check whether the mbuf is 'NULL' to release and clean up (Haiyue)
> >
> >  drivers/net/i40e/i40e_rxtx_vec_common.h | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h
> > b/drivers/net/i40e/i40e_rxtx_vec_common.h
> > index f9a7f46550..26deb59fc4 100644
> > --- a/drivers/net/i40e/i40e_rxtx_vec_common.h
> > +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
> > @@ -103,7 +103,6 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
> >  	if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
> >  		for (i = 0; i < n; i++) {
> >  			free[i] = txep[i].mbuf;
> > -			txep[i].mbuf = NULL;
>
> I would suggest still adding a comment here just to explain; this may
> help avoid unnecessary suspicion when someone is reading or debugging
> this code 😊

Thanks for your comments. Agree with this, and I will add a comment to
explain why this store operation is unnecessary here.

> >  		}
> >  		rte_mempool_put_bulk(free[0]->pool, (void **)free, n);
> >  		goto done;
> > --
> > 2.25.1
RE: [PATCH v4 1/2] gpudev: expose GPU memory to CPU
> -----Original Message-----
> From: eagost...@nvidia.com
> Sent: Thursday, January 27, 2022 11:47
> To: dev@dpdk.org
> Cc: Elena Agostini
> Subject: [PATCH v4 1/2] gpudev: expose GPU memory to CPU
>
> From: Elena Agostini
>
> Enable the possibility to expose a GPU memory area and make it
> accessible from the CPU.
>
> GPU memory has to be allocated via rte_gpu_mem_alloc().
>
> This patch allows the gpudev library to map (and unmap), through the
> GPU driver, a chunk of GPU memory, and to return a memory pointer the
> CPU can use to access the GPU memory area.
>
> Signed-off-by: Elena Agostini
> ---
>  doc/guides/prog_guide/gpudev.rst |  9 +
>  drivers/gpu/cuda/cuda.c          |  2 ++
>  lib/gpudev/gpudev.c              | 61
>  lib/gpudev/gpudev_driver.h       |  6
>  lib/gpudev/rte_gpudev.h          | 49 +
>  lib/gpudev/version.map           |  2 ++
>  6 files changed, 129 insertions(+)
>
> +__rte_experimental
> +void *rte_gpu_mem_cpu_map(int16_t dev_id, size_t size, void *ptr);

How about adding some direction words like "to/from" to make it more
straightforward? For this one: rte_gpu_mem_map_to_cpu?

> +__rte_experimental
> +int rte_gpu_mem_cpu_unmap(int16_t dev_id, void *ptr);
> +

And rte_gpu_mem_unmap_to_cpu?

> };
> --
> 2.17.1
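For reference, a minimal usage sketch of the two calls being reviewed,
based only on the prototypes quoted above (touch_gpu_memory is a
hypothetical example function; gpu_ptr must come from rte_gpu_mem_alloc()
on the same device):

  #include <string.h>
  #include <rte_gpudev.h>

  static int
  touch_gpu_memory(int16_t dev_id, size_t size, void *gpu_ptr)
  {
      /* Map the GPU area and get a CPU-usable pointer onto it. */
      void *cpu_ptr = rte_gpu_mem_cpu_map(dev_id, size, gpu_ptr);

      if (cpu_ptr == NULL)
          return -1;

      memset(cpu_ptr, 0, size);    /* CPU writes land in GPU memory */

      /* Drop only the CPU mapping; the GPU allocation itself stays
       * and is still released with rte_gpu_mem_free(). */
      return rte_gpu_mem_cpu_unmap(dev_id, gpu_ptr);
  }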
RE: [PATCH v2 15/15] vhost: make sure each queue callfd is configured
Hi Andy,

> -----Original Message-----
> From: Pei, Andy
> Sent: Tuesday, January 25, 2022 5:37 PM
> To: dev@dpdk.org
> Cc: Xia, Chenbo ; maxime.coque...@redhat.com; Cao, Gang ; Liu, Changpeng
> Subject: [PATCH v2 15/15] vhost: make sure each queue callfd is configured
>
> During the vhost data path building process, QEMU first creates a call
> fd, and creates another call fd in the end. The final call fd is the one
> used to relay notifications.
> In the original code, after the kick fd is set, dev_conf is called with
> the first call fd. Even though the actual call fd is set later, the data
> path will not work correctly.
>
> Signed-off-by: Andy Pei
> ---
>  lib/vhost/vhost_user.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> --
> 1.8.3.1

Please fix all the errors reported on patchwork first:
http://patchwork.dpdk.org/project/dpdk/patch/1643103437-118618-16-git-send-email-andy@intel.com/

Thanks,
Chenbo
RE: [EXT] Re: [dpdk-dev] [PATCH v2 2/2] app/testpmd: add queue based pfc CLI options
> -----Original Message-----
> From: Ferruh Yigit
> Sent: Tuesday, January 25, 2022 11:07 PM
> To: Jerin Jacob Kollanukkaran ; dev@dpdk.org; Xiaoyun Li ; Aman Singh ;
> Yuying Zhang
> Cc: tho...@monjalon.net; ajit.khapa...@broadcom.com; abo...@pensando.io;
> andrew.rybche...@oktetlabs.ru; beilei.x...@intel.com;
> bruce.richard...@intel.com; ch...@att.com; chenbo@intel.com;
> ciara.lof...@intel.com; Devendra Singh Rawat ; ed.cz...@atomicrules.com;
> evge...@amazon.com; gr...@u256.net; g.si...@nxp.com;
> zhouguoy...@huawei.com; haiyue.w...@intel.com; Harman Kalra ;
> heinrich.k...@corigine.com; hemant.agra...@nxp.com; hyon...@cisco.com;
> igo...@amazon.com; Igor Russkikh ; jgraj...@cisco.com;
> jasvinder.si...@intel.com; jianw...@trustnetic.com;
> jiawe...@trustnetic.com; jingjing...@intel.com; johnd...@cisco.com;
> john.mil...@atomicrules.com; linvi...@tuxdriver.com;
> keith.wi...@intel.com; Kiran Kumar Kokkilagadda ; ouli...@huawei.com;
> Liron Himi ; lon...@microsoft.com; m...@semihalf.com; spin...@cesnet.cz;
> ma...@nvidia.com; matt.pet...@windriver.com; maxime.coque...@redhat.com;
> m...@semihalf.com; humi...@huawei.com; Pradeep Kumar Nalla ; Nithin
> Kumar Dabilpuram ; qiming.y...@intel.com; qi.z.zh...@intel.com; Radha
> Chintakuntla ; rahul.lakkire...@chelsio.com; Rasesh Mody ;
> rosen...@intel.com; sachin.sax...@oss.nxp.com; Satha Koteswara Rao
> Kottidi ; Shahed Shaikh ; shaib...@amazon.com;
> shepard.sie...@atomicrules.com; asoma...@amd.com;
> somnath.ko...@broadcom.com; sthem...@microsoft.com;
> steven.webs...@windriver.com; Sunil Kumar Kori ; mtetsu...@gmail.com;
> Veerasenareddy Burru ; viachesl...@nvidia.com; xiao.w.w...@intel.com;
> cloud.wangxiao...@huawei.com; yisen.zhu...@huawei.com;
> yongw...@vmware.com; xuanziya...@huawei.com
> Subject: [EXT] Re: [dpdk-dev] [PATCH v2 2/2] app/testpmd: add queue based
> pfc CLI options
>
> On 1/13/2022 10:27 AM, jer...@marvell.com wrote:
> > From: Sunil Kumar Kori
> >
> > Patch adds command line options to configure queue based priority flow
> > control.
> >
> > - Syntax command is given as below:
> >
> >   set pfc_queue_ctrl <port_id> rx <on|off> <tx_qid> <tx_tc> \
> >       tx <on|off> <rx_qid> <rx_tc> <pause_time>
> >
>
> Isn't the order of the parameters odd? It is mixing the Rx/Tx config;
> what about ordering the Rx and Tx parameters?

It's been kept like this to portray the config for rx_pause and tx_pause
separately, i.e. each mode with its corresponding config.

> > - Example command to configure queue based priority flow control
> >   on rx and tx side for port 0, Rx queue 0, Tx queue 0 with pause
> >   time 2047
> >
> >   testpmd> set pfc_queue_ctrl 0 rx on 0 0 tx on 0 0 2047
> >
> > Signed-off-by: Sunil Kumar Kori
>
> <...>
[PATCH v3] net/i40e: reduce redundant reset operation
For the free buffer operation in the i40e vector path, it is unnecessary
to store 'NULL' into txep.mbuf. This is because when putting mbufs into
the Tx queue, tx_tail is the sentinel, and when doing tx_free, tx_next_dd
is the sentinel. In none of these processes is mbuf == NULL used as a
check condition, so the reset of mbuf is unnecessary and can be omitted.

Signed-off-by: Feifei Wang
Reviewed-by: Ruifeng Wang
---
v2: remove the change for the scalar path, since the scalar path needs to
check whether the mbuf is 'NULL' to release and clean up (Haiyue)

v3: add a comment to note that resetting the mbuf is unnecessary here
(Qi Zhang)

 drivers/net/i40e/i40e_rxtx_vec_common.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h
index f9a7f46550..959832ed6a 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
@@ -103,7 +103,7 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
 	if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
 		for (i = 0; i < n; i++) {
 			free[i] = txep[i].mbuf;
-			txep[i].mbuf = NULL;
+			/* no need to reset txep[i].mbuf in vector path */
 		}
 		rte_mempool_put_bulk(free[0]->pool, (void **)free, n);
 		goto done;
--
2.25.1
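To see why the removed store is dead, here is a simplified sketch of the
ring bookkeeping involved (the types and names are illustrative stand-ins,
not the actual i40e definitions):

  #include <stdint.h>

  struct rte_mbuf;    /* opaque here; only the pointer is tracked */

  struct sw_entry {
      struct rte_mbuf *mbuf;
  };

  struct tx_ring {
      struct sw_entry *sw_ring; /* software shadow of the HW ring */
      uint16_t tx_tail;         /* next slot the PMD will fill */
      uint16_t tx_next_dd;      /* next slot checked for HW completion */
  };

  /* The driver walks sw_ring purely by index: slots behind tx_next_dd
   * are logically free and get overwritten by the next transmit before
   * their .mbuf field is ever read again, so writing NULL into them
   * would be a dead store costing only memory bandwidth. */
  static void
  collect_completed(struct tx_ring *txq, struct rte_mbuf **free, uint16_t n)
  {
      struct sw_entry *txep =
          &txq->sw_ring[(uint16_t)(txq->tx_next_dd - (n - 1))];
      uint16_t i;

      for (i = 0; i < n; i++)
          free[i] = txep[i].mbuf;    /* no txep[i].mbuf = NULL needed */

      txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + n); /* move sentinel */
  }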