[Help - BBDev] Operations executions shouldn't be asynchronous from enqueuing?
Dear Developers,

I'm currently experimenting with BBDev, in particular with the decoding operations, and something is puzzling me. I'm sorry if this is just a misunderstanding on my side, but I would really appreciate some clarification.

What I expected:

Once I call `rte_bbdev_enqueue_dec_ops` with a certain number of operations and a specific queue_id, I expect the function to append the operations to a queue and return. After that, I can use `rte_bbdev_dequeue_dec_ops` to read an "output_queue" structure and obtain the operations already processed by the decoder (the number of operations returned could differ from the number enqueued). My assumption is that the decoder works asynchronously with respect to the process that enqueues operations and dequeues the completed ones (the consumer could even be a separate process from the producer). If the queues are empty, the decoder has nothing to do; if there are operations in the queue, the decoder executes them while the producer(s) and consumer(s) can do something else.

What I've understood from the source code:

Once `rte_bbdev_enqueue_dec_ops` is called, a sequence of function calls leads to `enqueue_dec_all_ops` (drivers/baseband/turbo_sw/bbdev_turbo_software.c:1724), where a for loop takes care of the operations one by one. For each operation, `enqueue_dec_one_op` loops until there is nothing more to decode, calling `process_dec_cb`. This last function executes `bblib_turbo_adapter_ul` and then `bblib_turbo_decoder`. This sequence of calls never returns control to the caller of `rte_bbdev_enqueue_dec_ops`, which has to wait for all the operations to complete before continuing. The same logic applies to both Turbo and LDPC, from what I've seen in `enqueue_ldpc_dec_all_ops`.

Now my question is: shouldn't the decoder be built with the first approach in mind? Is the asynchronous part perhaps controlled by FlexRAN? I have to admit that I didn't check more deeply. Am I missing something in the code that invalidates what I've understood?

In my opinion, an approach like the first one would be more resilient and flexible, separating the logic that controls the queue(s) from the logic that executes the actual processing. This way we could simulate more stressful situations where the arrival rate of operations is greater than the rate at which the decoder completes them.

Thank you very much for your help.

Best regards,
Mattia Milani
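To make the "first approach" concrete, here is a minimal single-threaded toy of a decoupled queue. It is only a sketch and not BBDev code: `toy_queue`, `toy_enqueue`, `toy_process_one` and `toy_dequeue` are invented names, and the "decoder" merely doubles a number. It shows only that enqueue returns before any work is done and that dequeue may return fewer operations than were enqueued.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_QUEUE_SIZE 8 /* ring capacity; arbitrary power of two for the sketch */

/* Hypothetical operation: an input payload plus a slot for the result. */
struct toy_op {
	uint32_t input;
	uint32_t output;
};

/* Two FIFO rings: work still to be done, and work already finished. */
struct toy_queue {
	struct toy_op *pending[TOY_QUEUE_SIZE];
	struct toy_op *done[TOY_QUEUE_SIZE];
	size_t pend_head, pend_tail; /* monotonically increasing counters */
	size_t done_head, done_tail;
};

/* Producer side: append ops and return immediately; nothing is processed. */
static size_t
toy_enqueue(struct toy_queue *q, struct toy_op **ops, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (q->pend_tail - q->pend_head == TOY_QUEUE_SIZE)
			break; /* ring full: the caller sees a short count */
		q->pending[q->pend_tail++ % TOY_QUEUE_SIZE] = ops[i];
	}
	return i;
}

/* Worker side: process at most one pending op; returns 1 if work was done. */
static int
toy_process_one(struct toy_queue *q)
{
	struct toy_op *op;

	if (q->pend_head == q->pend_tail)
		return 0; /* nothing to do */
	if (q->done_tail - q->done_head == TOY_QUEUE_SIZE)
		return 0; /* no room for results yet */
	op = q->pending[q->pend_head++ % TOY_QUEUE_SIZE];
	op->output = op->input * 2; /* stand-in for the real decode */
	q->done[q->done_tail++ % TOY_QUEUE_SIZE] = op;
	return 1;
}

/* Consumer side: collect whatever has finished so far. */
static size_t
toy_dequeue(struct toy_queue *q, struct toy_op **ops, size_t n)
{
	size_t i;

	for (i = 0; i < n && q->done_head != q->done_tail; i++)
		ops[i] = q->done[q->done_head++ % TOY_QUEUE_SIZE];
	return i;
}
```

In a real design the worker step would run on its own lcore or in hardware, and the rings would need to be safe for concurrent access; the toy leaves that out on purpose.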
[PATCH 1/3] examples/l3fwd: support single route file
Currently an IPv6 rule file must be specified together with an IPv4 rule
file to configure user-given rules. If the user gives only IPv4 or only
IPv6 rules, the application reports:

"Missing 1 or more rule files"

With this patch the application accepts only an IPv4 rule file, only an
IPv6 rule file, or both. Default rules are used for the address family
whose file is missing.

Signed-off-by: Gagandeep Singh
---
 examples/l3fwd/em_route_parse.c  | 18 ++
 examples/l3fwd/lpm_route_parse.c | 17 ++---
 2 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/examples/l3fwd/em_route_parse.c b/examples/l3fwd/em_route_parse.c
index 6c16832e94..da23356dd6 100644
--- a/examples/l3fwd/em_route_parse.c
+++ b/examples/l3fwd/em_route_parse.c
@@ -249,8 +249,7 @@ void
 read_config_files_em(void)
 {
 	/* ipv4 check */
-	if (parm_config.rule_ipv4_name != NULL &&
-			parm_config.rule_ipv6_name != NULL) {
+	if (parm_config.rule_ipv4_name != NULL) {
 		/* ipv4 check */
 		route_num_v4 = em_add_rules(parm_config.rule_ipv4_name,
 					&em_route_base_v4, &em_parse_v4_rule);
@@ -258,7 +257,14 @@ read_config_files_em(void)
 			em_free_routes();
 			rte_exit(EXIT_FAILURE, "Failed to add EM IPv4 rules\n");
 		}
-
+	} else {
+		RTE_LOG(INFO, L3FWD, "Missing IPv4 rule file, using default instead\n");
+		if (em_add_default_v4_rules() < 0) {
+			em_free_routes();
+			rte_exit(EXIT_FAILURE, "Failed to add default IPv4 rules\n");
+		}
+	}
+	if (parm_config.rule_ipv6_name != NULL) {
 		/* ipv6 check */
 		route_num_v6 = em_add_rules(parm_config.rule_ipv6_name,
 					&em_route_base_v6, &em_parse_v6_rule);
@@ -267,11 +273,7 @@ read_config_files_em(void)
 			rte_exit(EXIT_FAILURE, "Failed to add EM IPv6 rules\n");
 		}
 	} else {
-		RTE_LOG(INFO, L3FWD, "Missing 1 or more rule files, using default instead\n");
-		if (em_add_default_v4_rules() < 0) {
-			em_free_routes();
-			rte_exit(EXIT_FAILURE, "Failed to add default IPv4 rules\n");
-		}
+		RTE_LOG(INFO, L3FWD, "Missing IPv6 rule file, using default instead\n");
 		if (em_add_default_v6_rules() < 0) {
 			em_free_routes();
 			rte_exit(EXIT_FAILURE, "Failed to add default IPv6 rules\n");
diff --git a/examples/l3fwd/lpm_route_parse.c b/examples/l3fwd/lpm_route_parse.c
index f2028d79e1..f7d44aa2cd 100644
--- a/examples/l3fwd/lpm_route_parse.c
+++ b/examples/l3fwd/lpm_route_parse.c
@@ -271,8 +271,7 @@ lpm_free_routes(void)
 void
 read_config_files_lpm(void)
 {
-	if (parm_config.rule_ipv4_name != NULL &&
-			parm_config.rule_ipv6_name != NULL) {
+	if (parm_config.rule_ipv4_name != NULL) {
 		/* ipv4 check */
 		route_num_v4 = lpm_add_rules(parm_config.rule_ipv4_name,
 					&route_base_v4, &lpm_parse_v4_rule);
@@ -280,7 +279,15 @@ read_config_files_lpm(void)
 			lpm_free_routes();
 			rte_exit(EXIT_FAILURE, "Failed to add IPv4 rules\n");
 		}
+	} else {
+		RTE_LOG(INFO, L3FWD, "Missing IPv4 rule file, using default instead\n");
+		if (lpm_add_default_v4_rules() < 0) {
+			lpm_free_routes();
+			rte_exit(EXIT_FAILURE, "Failed to add default IPv4 rules\n");
+		}
+	}
+
+	if (parm_config.rule_ipv6_name != NULL) {
 		/* ipv6 check */
 		route_num_v6 = lpm_add_rules(parm_config.rule_ipv6_name,
 					&route_base_v6, &lpm_parse_v6_rule);
@@ -289,11 +296,7 @@ read_config_files_lpm(void)
 			rte_exit(EXIT_FAILURE, "Failed to add IPv6 rules\n");
 		}
 	} else {
-		RTE_LOG(INFO, L3FWD, "Missing 1 or more rule files, using default instead\n");
-		if (lpm_add_default_v4_rules() < 0) {
-			lpm_free_routes();
-			rte_exit(EXIT_FAILURE, "Failed to add default IPv4 rules\n");
-		}
+		RTE_LOG(INFO, L3FWD, "Missing IPv6 rule file, using default instead\n");
 		if (lpm_add_default_v6_rules() < 0) {
 			lpm_free_routes();
 			rte_exit(EXIT_FAILURE, "Failed to add default IPv6 rules\n");
--
2.25.1
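The change in control flow boils down to deciding each address family on its own. A stripped-down sketch of that decision, assuming nothing about l3fwd internals (`rule_source`, `rule_plan` and `plan_rules` are hypothetical names used only for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Where the rules for one address family come from. */
enum rule_source { RULES_FROM_FILE, RULES_DEFAULT };

struct rule_plan {
	enum rule_source v4;
	enum rule_source v6;
};

/*
 * Decide each family independently: a missing file only makes that
 * family fall back to the built-in defaults.
 */
static struct rule_plan
plan_rules(const char *ipv4_file, const char *ipv6_file)
{
	struct rule_plan plan;

	plan.v4 = (ipv4_file != NULL) ? RULES_FROM_FILE : RULES_DEFAULT;
	plan.v6 = (ipv6_file != NULL) ? RULES_FROM_FILE : RULES_DEFAULT;
	return plan;
}
```

Before the patch, the equivalent condition was a single test on both file names, so giving only one file sent both families to the defaults.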
[PATCH 2/3] examples/l3fwd: fix return value on rules add
Fix the return value when adding EM or LPM rules: propagate the error
code returned by the rule parser instead of always returning -EINVAL,
and include it in the log message.

Fixes: e7e6dd643092 ("examples/l3fwd: support config file for EM")
Fixes: 52def963fc1c ("examples/l3fwd: support config file for LPM/FIB")
Cc: sean.morris...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Gagandeep Singh
---
 examples/l3fwd/em_route_parse.c  | 11 ++-
 examples/l3fwd/lpm_route_parse.c | 11 ++-
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/examples/l3fwd/em_route_parse.c b/examples/l3fwd/em_route_parse.c
index da23356dd6..8b534de5f1 100644
--- a/examples/l3fwd/em_route_parse.c
+++ b/examples/l3fwd/em_route_parse.c
@@ -119,7 +119,7 @@ em_add_rules(const char *rule_path,
 	char buff[LINE_MAX];
 	FILE *fh;
 	unsigned int i = 0, rule_size = sizeof(*next);
-	int val;
+	int val, rc;

 	*proute_base = NULL;
 	fh = fopen(rule_path, "rb");
@@ -172,13 +172,14 @@ em_add_rules(const char *rule_path,
 			return -EINVAL;
 		}

-		if (parser(buff + 1, next) != 0) {
+		rc = parser(buff + 1, next);
+		if (rc != 0) {
 			RTE_LOG(ERR, L3FWD,
-				"%s Line %u: parse rules error\n",
-				rule_path, i);
+				"%s Line %u: parse rules error code = %d\n",
+				rule_path, i, rc);
 			fclose(fh);
 			free(route_rules);
-			return -EINVAL;
+			return rc;
 		}

 		route_cnt++;
diff --git a/examples/l3fwd/lpm_route_parse.c b/examples/l3fwd/lpm_route_parse.c
index f7d44aa2cd..f27b66e838 100644
--- a/examples/l3fwd/lpm_route_parse.c
+++ b/examples/l3fwd/lpm_route_parse.c
@@ -184,7 +184,7 @@ lpm_add_rules(const char *rule_path,
 	char buff[LINE_MAX];
 	FILE *fh;
 	unsigned int i = 0, rule_size = sizeof(*next);
-	int val;
+	int val, rc;

 	*proute_base = NULL;
 	fh = fopen(rule_path, "rb");
@@ -237,13 +237,14 @@ lpm_add_rules(const char *rule_path,
 			return -EINVAL;
 		}

-		if (parser(buff + 1, next) != 0) {
+		rc = parser(buff + 1, next);
+		if (rc != 0) {
 			RTE_LOG(ERR, L3FWD,
-				"%s Line %u: parse rules error\n",
-				rule_path, i);
+				"%s Line %u: parse rules error code = %d\n",
+				rule_path, i, rc);
 			fclose(fh);
 			free(route_rules);
-			return -EINVAL;
+			return rc;
 		}

 		route_cnt++;
--
2.25.1
[PATCH 3/3] examples/l3fwd: fix maximum acceptable port ID in routes
When parsing the given rule file, the application accepts routes with
port IDs up to UINT8_MAX for LPM and EM routes, but only up to 32 ports
can be enabled, because the variable enabled_port_mask is defined as
uint32_t.

This patch restricts the rule parsing code to accept routes for port IDs
up to 31 only, to avoid maintaining rules which will never be used.

Fixes: e7e6dd643092 ("examples/l3fwd: support config file for EM")
Fixes: 52def963fc1c ("examples/l3fwd: support config file for LPM/FIB")
Cc: sean.morris...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Gagandeep Singh
---
 examples/l3fwd/em_route_parse.c  | 6 --
 examples/l3fwd/lpm_route_parse.c | 6 --
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/examples/l3fwd/em_route_parse.c b/examples/l3fwd/em_route_parse.c
index 8b534de5f1..65c71cd1ba 100644
--- a/examples/l3fwd/em_route_parse.c
+++ b/examples/l3fwd/em_route_parse.c
@@ -65,7 +65,8 @@ em_parse_v6_rule(char *str, struct em_rule *v)
 	/* protocol. */
 	GET_CB_FIELD(in[CB_FLD_PROTO], v->v6_key.proto, 0, UINT8_MAX, 0);
 	/* out interface. */
-	GET_CB_FIELD(in[CB_FLD_IF_OUT], v->if_out, 0, UINT8_MAX, 0);
+	GET_CB_FIELD(in[CB_FLD_IF_OUT], v->if_out, 0,
+			(sizeof(enabled_port_mask) * CHAR_BIT) - 1, 0);

 	return 0;
 }
@@ -102,7 +103,8 @@ em_parse_v4_rule(char *str, struct em_rule *v)
 	/* protocol. */
 	GET_CB_FIELD(in[CB_FLD_PROTO], v->v4_key.proto, 0, UINT8_MAX, 0);
 	/* out interface. */
-	GET_CB_FIELD(in[CB_FLD_IF_OUT], v->if_out, 0, UINT8_MAX, 0);
+	GET_CB_FIELD(in[CB_FLD_IF_OUT], v->if_out, 0,
+			(sizeof(enabled_port_mask) * CHAR_BIT) - 1, 0);

 	return 0;
 }
diff --git a/examples/l3fwd/lpm_route_parse.c b/examples/l3fwd/lpm_route_parse.c
index f27b66e838..357c12d9fe 100644
--- a/examples/l3fwd/lpm_route_parse.c
+++ b/examples/l3fwd/lpm_route_parse.c
@@ -110,7 +110,8 @@ lpm_parse_v6_rule(char *str, struct lpm_route_rule *v)

 	rc = lpm_parse_v6_net(in[CB_FLD_DST_ADDR], v->ip_32, &v->depth);

-	GET_CB_FIELD(in[CB_FLD_IF_OUT], v->if_out, 0, UINT8_MAX, 0);
+	GET_CB_FIELD(in[CB_FLD_IF_OUT], v->if_out, 0,
+			(sizeof(enabled_port_mask) * CHAR_BIT) - 1, 0);

 	return rc;
 }
@@ -132,7 +133,8 @@ lpm_parse_v4_rule(char *str, struct lpm_route_rule *v)

 	rc = parse_ipv4_addr_mask(in[CB_FLD_DST_ADDR], &v->ip, &v->depth);

-	GET_CB_FIELD(in[CB_FLD_IF_OUT], v->if_out, 0, UINT8_MAX, 0);
+	GET_CB_FIELD(in[CB_FLD_IF_OUT], v->if_out, 0,
+			(sizeof(enabled_port_mask) * CHAR_BIT) - 1, 0);

 	return rc;
 }
--
2.25.1
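The new upper bound is derived from the width of the mask rather than hard-coded. A standalone sketch of that arithmetic follows; `enabled_port_mask` here is a local stand-in declared as uint32_t as the commit message describes, and `MAX_ROUTE_PORT_ID`, `route_port_id_valid` and `port_enabled` are names made up for the example:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Local stand-in for l3fwd's port mask; described above as uint32_t. */
static uint32_t enabled_port_mask;

/* Highest port ID that has a bit in the mask: 32 bits give 31. */
#define MAX_ROUTE_PORT_ID ((sizeof(enabled_port_mask) * CHAR_BIT) - 1)

/* Return 1 if a route may reference this port ID, 0 otherwise. */
static int
route_port_id_valid(unsigned long port_id)
{
	return port_id <= MAX_ROUTE_PORT_ID;
}

/* Return 1 if the port is in range and its bit is set in the mask. */
static int
port_enabled(unsigned long port_id)
{
	if (!route_port_id_valid(port_id))
		return 0;
	return (enabled_port_mask & (UINT32_C(1) << port_id)) != 0;
}
```

Tying the limit to `sizeof(enabled_port_mask)` means the bound would follow automatically if the mask type were ever widened.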
[PATCH] ethdev: fix GENEVE option item conversion
Among other things, the "rte_flow_conv()" function copies item lists.

For the GENEVE option item, the function performs only a shallow copy:
it copies the "data" pointer without copying the values it points to.

This patch adds a deep copy after the regular copy.

Fixes: 2b4c72b4d10d ("ethdev: introduce GENEVE header TLV option item")
Cc: sta...@dpdk.org

Signed-off-by: Michael Baum
---
 lib/ethdev/rte_flow.c | 29 +
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
index ca2f85c3fa..4076ae4ee1 100644
--- a/lib/ethdev/rte_flow.c
+++ b/lib/ethdev/rte_flow.c
@@ -623,6 +623,7 @@ rte_flow_conv_item_spec(void *buf, const size_t size,
 	switch (item->type) {
 		union {
 			const struct rte_flow_item_raw *raw;
+			const struct rte_flow_item_geneve_opt *geneve_opt;
 		} spec;
 		union {
 			const struct rte_flow_item_raw *raw;
@@ -632,10 +633,13 @@ rte_flow_conv_item_spec(void *buf, const size_t size,
 		} mask;
 		union {
 			const struct rte_flow_item_raw *raw;
+			const struct rte_flow_item_geneve_opt *geneve_opt;
 		} src;
 		union {
 			struct rte_flow_item_raw *raw;
+			struct rte_flow_item_geneve_opt *geneve_opt;
 		} dst;
+		void *deep_src;
 		size_t tmp;

 	case RTE_FLOW_ITEM_TYPE_RAW:
@@ -664,13 +668,30 @@ rte_flow_conv_item_spec(void *buf, const size_t size,
 		tmp = last.raw->length & mask.raw->length;
 		if (tmp) {
 			off = RTE_ALIGN_CEIL(off, sizeof(*dst.raw->pattern));
-			if (size >= off + tmp)
-				dst.raw->pattern = rte_memcpy
-					((void *)((uintptr_t)dst.raw + off),
-					 src.raw->pattern, tmp);
+			if (size >= off + tmp) {
+				deep_src = (void *)((uintptr_t)dst.raw + off);
+				dst.raw->pattern = rte_memcpy(deep_src,
+							      src.raw->pattern,
+							      tmp);
+			}
 			off += tmp;
 		}
 		break;
+	case RTE_FLOW_ITEM_TYPE_GENEVE_OPT:
+		off = rte_flow_conv_copy(buf, data, size,
+					 rte_flow_desc_item, item->type);
+		spec.geneve_opt = item->spec;
+		src.geneve_opt = data;
+		dst.geneve_opt = buf;
+		tmp = spec.geneve_opt->option_len << 2;
+		if (size > 0 && src.geneve_opt->data) {
+			deep_src = (void *)((uintptr_t)(dst.geneve_opt + 1));
+			dst.geneve_opt->data = rte_memcpy(deep_src,
+							  src.geneve_opt->data,
+							  tmp);
+		}
+		off += tmp;
+		break;
 	default:
 		off = rte_flow_conv_copy(buf, data, size,
 					 rte_flow_desc_item, item->type);
--
2.25.1
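The difference between the shallow and the deep copy can be shown with a simplified stand-in for the item. This is only a sketch: `toy_geneve_opt` and `toy_geneve_opt_conv` are hypothetical, the struct keeps just the two fields that matter here, and the up-front buffer size check is a choice of this example rather than the behaviour of rte_flow_conv().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for an item that owns out-of-line option data. */
struct toy_geneve_opt {
	uint8_t option_len; /* length of data in 4-byte words */
	uint32_t *data;     /* points outside the struct */
};

/*
 * Copy src into buf and place the option data right after the struct.
 * Returns the number of bytes needed; nothing is written when buf is
 * NULL or too small. buf must be suitably aligned for the struct.
 */
static size_t
toy_geneve_opt_conv(void *buf, size_t size, const struct toy_geneve_opt *src)
{
	size_t data_len = (size_t)src->option_len << 2; /* words to bytes */
	size_t total = sizeof(*src) + data_len;
	struct toy_geneve_opt *dst = buf;

	if (buf == NULL || size < total)
		return total; /* caller can retry with a bigger buffer */

	memcpy(dst, src, sizeof(*src)); /* shallow copy of the fixed part */
	if (src->data != NULL && data_len > 0) {
		/* Deep copy: dst->data must point into buf, not at src's storage. */
		dst->data = (uint32_t *)(dst + 1);
		memcpy(dst->data, src->data, data_len);
	} else {
		dst->data = NULL;
	}
	return total;
}
```

After the call, the copy stays valid even if the caller frees or overwrites the original option words, which is exactly what a pointer-only copy cannot guarantee.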
[PATCH v1] test/crypto: remove unused stats in test setup
Remove unused stats in test setup.

Coverity issue: 373869
Fixes: 2c6dab9cd93 ("test/crypto: add RSA and Mod tests")
Cc: sta...@dpdk.org

Signed-off-by: Gowrishankar Muthukrishnan
---
 app/test/test_cryptodev_asym.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/app/test/test_cryptodev_asym.c b/app/test/test_cryptodev_asym.c
index ef926c6229..3802cf8022 100644
--- a/app/test/test_cryptodev_asym.c
+++ b/app/test/test_cryptodev_asym.c
@@ -547,8 +547,6 @@ ut_setup_asym(void)
 			qp_id, ts_params->valid_devs[0]);
 	}

-	rte_cryptodev_stats_reset(ts_params->valid_devs[0]);
-
 	/* Start the device */
 	TEST_ASSERT_SUCCESS(rte_cryptodev_start(ts_params->valid_devs[0]),
 			"Failed to start cryptodev %u",
@@ -561,7 +559,6 @@ static void
 ut_teardown_asym(void)
 {
 	struct crypto_testsuite_params_asym *ts_params = &testsuite_params;
-	struct rte_cryptodev_stats stats;
 	uint8_t dev_id = ts_params->valid_devs[0];

 	if (self->sess != NULL)
@@ -571,8 +568,6 @@ ut_teardown_asym(void)
 	self->op = NULL;
 	self->result_op = NULL;

-	rte_cryptodev_stats_get(ts_params->valid_devs[0], &stats);
-
 	/* Stop the device */
 	rte_cryptodev_stop(ts_params->valid_devs[0]);
 }
--
2.21.0
[PATCH v1] test/crypto: fix asymmetric capability test
Fix the asymmetric capability test as follows:
 * Skip the test if the asymmetric crypto feature is not supported by
   the device.
 * Assert the return value of the RTE function that gets the asymmetric
   capability.

Coverity issue: 373365
Fixes: 2c6dab9cd93 ("test/crypto: add RSA and Mod tests")
Cc: sta...@dpdk.org

Signed-off-by: Gowrishankar Muthukrishnan
---
 app/test/test_cryptodev_asym.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/app/test/test_cryptodev_asym.c b/app/test/test_cryptodev_asym.c
index 3802cf8022..1d88832146 100644
--- a/app/test/test_cryptodev_asym.c
+++ b/app/test/test_cryptodev_asym.c
@@ -626,7 +626,7 @@ test_capability(void)
 			RTE_CRYPTODEV_FF_ASYMMETRIC_CRYPTO)) {
 		RTE_LOG(INFO, USER1,
 				"Device doesn't support asymmetric. Test Skipped\n");
-		return TEST_SUCCESS;
+		return TEST_SKIPPED;
 	}

 	/* print xform capability */
@@ -641,6 +641,7 @@ test_capability(void)
 			capa = rte_cryptodev_asym_capability_get(dev_id,
 				(const struct
 				rte_cryptodev_asym_capability_idx *) &idx);
+			TEST_ASSERT_NOT_NULL(capa, "Failed to get asymmetric capability");
 			print_asym_capa(capa);
 		}
 	}
--
2.21.0
[PATCH v1] test/crypto: fix comparison function for modex values
Fix the comparison function used by the modex test so that each value is
compared starting from its first non-zero byte.

Coverity issue: 430125
Fixes: 2162d32c1c3 ("test/crypto: validate modex from first non-zero")
Cc: sta...@dpdk.org

Signed-off-by: Gowrishankar Muthukrishnan
---
 app/test/test_cryptodev_asym.c | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/app/test/test_cryptodev_asym.c b/app/test/test_cryptodev_asym.c
index 1d88832146..f0b5d38543 100644
--- a/app/test/test_cryptodev_asym.c
+++ b/app/test/test_cryptodev_asym.c
@@ -3197,21 +3197,26 @@ static int send_one(void)
 }

 static int
-modular_cmpeq(const uint8_t *a, const uint8_t *b, size_t len)
+modular_cmpeq(const uint8_t *a, size_t a_len, const uint8_t *b, size_t b_len)
 {
-	const uint8_t *new_a = a, *new_b = b;
+	const uint8_t *new_a, *new_b;
 	size_t i, j;

 	/* Strip leading NUL bytes */
-	for (i = 0; i < len; i++)
+	for (i = 0; i < a_len; i++)
 		if (a[i] != 0)
-			new_a = &a[i];
+			break;

-	for (j = 0; j < len; j++)
+	for (j = 0; j < b_len; j++)
 		if (b[j] != 0)
-			new_b = &b[i];
+			break;
+
+	if (a_len - i != b_len - j)
+		return 1;

-	if (i != j || memcmp(new_a, new_b, len - i))
+	new_a = &a[i];
+	new_b = &b[j];
+	if (memcmp(new_a, new_b, a_len - i))
 		return 1;

 	return 0;
@@ -3251,7 +3256,7 @@ modular_exponentiation(const void *test_data)
 	TEST_ASSERT_SUCCESS(send_one(),
 		"Failed to process crypto op");

-	TEST_ASSERT_SUCCESS(modular_cmpeq(vector->reminder.data,
+	TEST_ASSERT_SUCCESS(modular_cmpeq(vector->reminder.data, vector->reminder.len,
 		self->result_op->asym->modex.result.data,
 		self->result_op->asym->modex.result.length),
 		"operation verification failed\n");
--
2.21.0
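For readers who want to try the patched logic outside the test suite, here is a standalone version of the comparison. It follows the "+" lines of the patch, lightly restructured (the temporary pointers are dropped); treat it as a sketch rather than the exact code in the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Compare two big-endian byte strings as numbers, ignoring leading zero
 * bytes. Returns 0 when they are equal and 1 otherwise.
 */
static int
modular_cmpeq(const uint8_t *a, size_t a_len, const uint8_t *b, size_t b_len)
{
	size_t i, j;

	/* Skip leading zero bytes in each operand. */
	for (i = 0; i < a_len; i++)
		if (a[i] != 0)
			break;
	for (j = 0; j < b_len; j++)
		if (b[j] != 0)
			break;

	/* Different significant lengths mean different values. */
	if (a_len - i != b_len - j)
		return 1;

	/* Same significant length: compare only those bytes. */
	if (memcmp(&a[i], &b[j], a_len - i) != 0)
		return 1;

	return 0;
}
```

With this version, {0x00, 0x00, 0x12, 0x34} and {0x12, 0x34} compare equal, while buffers that differ only in trailing bytes do not.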
[PATCH] doc: clarify mempool striding optimisation on Arm
The mempool memory channel striding optimisation is not necessary on
Arm platforms.
Update the Programmer's Guide's mempool section to clarify this.

Signed-off-by: Jack Bond-Preston
Reviewed-by: Wathsala Vithanage
---
 doc/guides/prog_guide/mempool_lib.rst | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/doc/guides/prog_guide/mempool_lib.rst b/doc/guides/prog_guide/mempool_lib.rst
index 4db577fe18..988b0e80c1 100644
--- a/doc/guides/prog_guide/mempool_lib.rst
+++ b/doc/guides/prog_guide/mempool_lib.rst
@@ -77,6 +77,12 @@ When creating a new pool, the user can specify to use this feature or not.

 .. _mempool_local_cache:

+.. note::
+
+   This feature is not present for Arm systems. Modern Arm Interconnects choose the SN-F (memory
+   channel) using a hash of memory address bits. As a result, the load is distributed evenly in all
+   cases, including the above described, rendering this feature unnecessary.
+
 Local Cache
 -----------

--
2.34.1
Re: [PATCH] app/testpmd: improve sse based macswap
On Sat, Jul 13, 2024 at 08:49:49PM +0530, Vipin Varghese wrote:
> Goal of the patch is to improve SSE macswap on x86_64 by reducing
> the stalls in backend engine. Original implementation of the SSE
> macswap makes loop call to multiple load, shuffle & store. Using
> SIMD ISA interleaving we can reduce the stalls for
>  - load SSE token exhaustion
>  - Shuffle and Load dependency
>
> Also other changes which improves packet per second are
>  - Filling access to MBUF for offload flags which is separate cacheline,
>  - using register keyword
>
> Test results:
> 
> Platform: AMD EPYC SIENA 8594P @2.3GHz, no boost
> DPDK: 24.03
>
>
> TEST IO 64B: baseline
>  - mellanox CX-7 2*200Gbps : 42.0
>  - intel E810 1*100Gbps : 82.0
>  - intel E810 2*200Gbps (2CQ-DA2): 83.0
>
> TEST MACSWAP 64B:
>  - mellanox CX-7 2*200Gbps : 31.533 : 31.90
>  - intel E810 1*100Gbps : 50.380 : 47.0
>  - intel E810 2*200Gbps (2CQ-DA2): 48.840 : 49.827
>
> TEST MACSWAP 128B:
>  - mellanox CX-7 2*200Gbps: 30.946 : 31.770
>  - intel E810 1*100Gbps: 49.386 : 46.366
>  - intel E810 2*200Gbps (2CQ-DA2): 47.979 : 49.503
>
> TEST MACSWAP 256B:
>  - mellanox CX-7 2*200Gbps: 32.480 : 33.150
>  - intel E810 1 * 100Gbps: 45.29 : 44.571
>  - intel E810 2 * 200Gbps (2CQ-DA2): 45.033 : 45.117
>

Hi,

Interesting patch. Do you know why we see regressions in some of the
cases above? For 1x100G at 64B and 128B packet sizes we see perf drops
of about 3 Mpps, vs smaller gains in the other two cases at each size
(much smaller in the 64B case).

A couple of other questions inline below too.

Thanks,
/Bruce

> using multiple queues and lcore there is linear increase in MPPs.
>
> Signed-off-by: Vipin Varghese
> ---
>  app/test-pmd/macswap_sse.h | 40 ++
>  1 file changed, 19 insertions(+), 21 deletions(-)
>
> diff --git a/app/test-pmd/macswap_sse.h b/app/test-pmd/macswap_sse.h
> index 223f87a539..a3d3a274e5 100644
> --- a/app/test-pmd/macswap_sse.h
> +++ b/app/test-pmd/macswap_sse.h
> @@ -11,21 +11,21 @@ static inline void
>  do_macswap(struct rte_mbuf *pkts[], uint16_t nb,
>  		struct rte_port *txp)
>  {
> -	struct rte_ether_hdr *eth_hdr[4];
> -	struct rte_mbuf *mb[4];
> +	register struct rte_ether_hdr *eth_hdr[8];
> +	register struct rte_mbuf *mb[8];

Does using "register" actually make a difference to the generated code?

Also, why increase the array sizes from 4 to 8, when the code below only
uses 4 elements of each array anyway? Is it for cache alignment purposes
perhaps? If so, please use explicit cache alignment attributes to
specify this rather than having it implicit in the array sizes.

>  	uint64_t ol_flags;
>  	int i;
>  	int r;
> -	__m128i addr0, addr1, addr2, addr3;
> +	register __m128i addr0, addr1, addr2, addr3;
>  	/**
>  	 * shuffle mask be used to shuffle the 16 bytes.
>  	 * byte 0-5 wills be swapped with byte 6-11.
>  	 * byte 12-15 will keep unchanged.
>  	 */
> -	__m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12,
> -					5, 4, 3, 2,
> -					1, 0, 11, 10,
> -					9, 8, 7, 6);
> +	register const __m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12,
> +					5, 4, 3, 2,
> +					1, 0, 11, 10,
> +					9, 8, 7, 6);
>
>  	ol_flags = ol_flags_init(txp->dev_conf.txmode.offloads);
>  	vlan_qinq_set(pkts, nb, ol_flags,
> @@ -44,23 +44,24 @@ do_macswap(struct rte_mbuf *pkts[], uint16_t nb,
>
>  		mb[0] = pkts[i++];
>  		eth_hdr[0] = rte_pktmbuf_mtod(mb[0], struct rte_ether_hdr *);
> -		addr0 = _mm_loadu_si128((__m128i *)eth_hdr[0]);
> -
>  		mb[1] = pkts[i++];
>  		eth_hdr[1] = rte_pktmbuf_mtod(mb[1], struct rte_ether_hdr *);
> -		addr1 = _mm_loadu_si128((__m128i *)eth_hdr[1]);
> -
> -
>  		mb[2] = pkts[i++];
>  		eth_hdr[2] = rte_pktmbuf_mtod(mb[2], struct rte_ether_hdr *);
> -		addr2 = _mm_loadu_si128((__m128i *)eth_hdr[2]);
> -
>  		mb[3] = pkts[i++];
>  		eth_hdr[3] = rte_pktmbuf_mtod(mb[3], struct rte_ether_hdr *);
> -		addr3 = _mm_loadu_si128((__m128i *)eth_hdr[3]);
>
> +		/* Interleave load, shuffle & set */
> +		addr0 = _mm_loadu_si128((__m128i *)eth_hdr[0]);
> +		mbuf_field_set(mb[0], ol_flags);
> +		addr1 = _mm_loadu_si128((__m128i *)eth_hdr[1]);
> +		mbuf_field_set(mb[1], ol_flags);
>  		addr0 = _mm_shuffle_epi8(addr0, shfl_msk);
> +
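For reference, the effect of the shuffle mask in the quoted code can be modelled in portable C without intrinsics. This is a scalar sketch, not the testpmd code: `macswap_mask`, `shuffle16` and `macswap16` are illustrative names, and the model only covers mask indices 0-15 (it ignores the zeroing behaviour of the real byte shuffle when the top bit of a mask byte is set).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Byte-index form of the mask. _mm_set_epi8() lists bytes from 15 down
 * to 0, so in memory order the mask from the patch reads:
 */
static const uint8_t macswap_mask[16] = {
	6, 7, 8, 9, 10, 11, /* bytes 0-5  take the source MAC      */
	0, 1, 2, 3, 4, 5,   /* bytes 6-11 take the destination MAC */
	12, 13, 14, 15      /* bytes 12-15 are left unchanged      */
};

/* Scalar model of a byte shuffle for indices 0-15: out[i] = in[mask[i]]. */
static void
shuffle16(uint8_t out[16], const uint8_t in[16], const uint8_t mask[16])
{
	int i;

	for (i = 0; i < 16; i++)
		out[i] = in[mask[i] & 0x0f];
}

/* Swap destination and source MAC in the first 16 bytes of a frame. */
static void
macswap16(uint8_t frame[16])
{
	uint8_t tmp[16];

	shuffle16(tmp, frame, macswap_mask);
	memcpy(frame, tmp, sizeof(tmp));
}
```

The SSE version does the same thing in one load, one shuffle and one store per packet, which is why the ordering of those three instructions across packets matters for the stalls discussed above.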
Re: [PATCH v7 01/21] net/ntnic: add ethdev and makes PMD available
Hi Min Zhou,

I am seeing that commit for next-net:
https://git.dpdk.org/next/dpdk-next-net/commit/?id=a6c3ec342ee105e322ffdb21e810cdfd38455c62

If you try to manually apply it on next-net, does it work?

Pasting the logs from our apply process below for context:

```
Trying to checkout branch: origin/next-net-for-main
Checked out to next-net-for-main (a6c3ec342ee105e322ffdb21e810cdfd38455c62)
Applying patch...

Applying: net/ntnic: add ethdev and makes PMD available
Applying: net/ntnic: add logging implementation
Applying: net/ntnic: add minimal initialization for PCI device
Applying: net/ntnic: add NT utilities implementation
Applying: net/ntnic: add VFIO module
Applying: net/ntnic: add basic eth dev ops to ntnic
Applying: net/ntnic: add core platform structures
Applying: net/ntnic: add adapter initialization
Applying: net/ntnic: add registers and FPGA model for NapaTech NIC
Applying: net/ntnic: add FPGA modules for initialization
Applying: net/ntnic: add FPGA initialization functionality
Applying: net/ntnic: add support of the NT200A0X smartNIC
Applying: net/ntnic: add startup and reset sequence for NT200A0X
Applying: net/ntnic: add clock profile for the NT200A0X smartNIC
Applying: net/ntnic: add link management skeleton
Applying: net/ntnic: add link 100G module ops
Applying: net/ntnic: add generic NIM and I2C modules
Applying: net/ntnic: add QSFP support
Applying: net/ntnic: add QSFP28 support
Applying: net/ntnic: add GPIO communication for NIMs
Applying: net/ntnic: add physical layer control module

Running test build...

The Meson build system
```
[PATCH v4 1/3] dts: add multicast set function to shell
Add functions to the testpmd shell for changing promiscuous mode on a
port and allmulticast mode on all ports.

Signed-off-by: Dean Marx
---
 dts/framework/remote_session/testpmd_shell.py | 46 +++
 1 file changed, 46 insertions(+)

diff --git a/dts/framework/remote_session/testpmd_shell.py b/dts/framework/remote_session/testpmd_shell.py
index ec22f72221..a0be0bd09d 100644
--- a/dts/framework/remote_session/testpmd_shell.py
+++ b/dts/framework/remote_session/testpmd_shell.py
@@ -806,6 +806,52 @@ def show_port_stats(self, port_id: int) -> TestPmdPortStats:

         return TestPmdPortStats.parse(output)

+    def set_promisc(self, port: int, on: bool, verify: bool = True):
+        """Turns promiscuous mode on/off for the specified port.
+
+        Args:
+            port: Port number to use, should be within 0-32.
+            on: If :data:`True`, turn promisc mode on, otherwise turn off.
+            verify: If :data:`True` an additional command will be sent to verify that promisc mode
+                is properly set. Defaults to :data:`True`.
+
+        Raises:
+            InteractiveCommandExecutionError: If `verify` is :data:`True` and promisc mode
+                is not correctly set.
+        """
+        promisc_output = self.send_command(f"set promisc {port} {'on' if on else 'off'}")
+        if verify:
+            stats = self.show_port_info(port_id=port)
+            if on ^ stats.is_promiscuous_mode_enabled:
+                self._logger.debug(f"Failed to set promisc mode on port {port}: \n{promisc_output}")
+                raise InteractiveCommandExecutionError(
+                    f"Testpmd failed to set promisc mode on port {port}."
+                )
+
+    def set_multicast_all(self, on: bool, verify: bool = True):
+        """Turns multicast mode on/off for the specified port.
+
+        Args:
+            on: If :data:`True`, turns multicast mode on, otherwise turns off.
+            verify: If :data:`True` an additional command will be sent to verify
+                that multicast mode is properly set. Defaults to :data:`True`.
+
+        Raises:
+            InteractiveCommandExecutionError: If `verify` is :data:`True` and multicast
+                mode is not properly set.
+        """
+        multicast_output = self.send_command(f"set allmulti all {'on' if on else 'off'}")
+        if verify:
+            stats0 = self.show_port_info(port_id=0)
+            stats1 = self.show_port_info(port_id=1)
+            if on ^ (stats0.is_allmulticast_mode_enabled and stats1.is_allmulticast_mode_enabled):
+                self._logger.debug(
+                    f"Failed to set multicast mode on all ports.: \n{multicast_output}"
+                )
+                raise InteractiveCommandExecutionError(
+                    "Testpmd failed to set multicast mode on all ports."
+                )
+
     def close(self) -> None:
         """Overrides :meth:`~.interactive_shell.close`."""
         self.send_command("quit", "")
--
2.44.0
[PATCH v4 2/3] dts: dynamic config conf schema
Add the dynamic_config test suite to the configuration schema so that
the dynamic configuration test suite can be run.

Signed-off-by: Dean Marx
---
 dts/framework/config/conf_yaml_schema.json | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/dts/framework/config/conf_yaml_schema.json b/dts/framework/config/conf_yaml_schema.json
index f02a310bb5..d7b4afed7d 100644
--- a/dts/framework/config/conf_yaml_schema.json
+++ b/dts/framework/config/conf_yaml_schema.json
@@ -187,7 +187,8 @@
       "enum": [
         "hello_world",
         "os_udp",
-        "pmd_buffer_scatter"
+        "pmd_buffer_scatter",
+        "dynamic_config"
       ]
     },
     "test_target": {
--
2.44.0
[PATCH v4 3/3] dts: dynamic config test suite
Suite for testing ability of Poll Mode Driver to turn promiscuous
mode on/off, allmulticast mode on/off, and show expected behavior
when sending packets with known, unknown, broadcast, and multicast
destination MAC addresses.

Depends-on: patch-1142113 ("add send_packets to test suites and rework packet addressing")

Signed-off-by: Dean Marx
---
 dts/tests/TestSuite_dynamic_config.py | 152 ++
 1 file changed, 152 insertions(+)
 create mode 100644 dts/tests/TestSuite_dynamic_config.py

diff --git a/dts/tests/TestSuite_dynamic_config.py b/dts/tests/TestSuite_dynamic_config.py
new file mode 100644
index 00..d6d26419f0
--- /dev/null
+++ b/dts/tests/TestSuite_dynamic_config.py
@@ -0,0 +1,152 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2024 University of New Hampshire
+
+"""Dynamic configuration capabilities test suite.
+
+This suite checks that it is possible to change the configuration of a port
+dynamically. The Poll Mode Driver should be able to enable and disable
+promiscuous mode on each port, as well as check the Rx and Tx packets of
+each port. Promiscuous mode in networking passes all traffic a NIC receives
+to the CPU, rather than just frames with matching MAC addresses. Each test
+case sends a packet with a matching address, and one with an unknown address,
+to ensure this behavior is shown.
+
+If packets should be received and forwarded, or received and not forwarded,
+depending on the configuration, the port info should match the expected behavior.
+"""
+
+from time import sleep
+
+from scapy.layers.inet import IP  # type: ignore[import-untyped]
+from scapy.layers.l2 import Ether  # type: ignore[import-untyped]
+from scapy.packet import Raw  # type: ignore[import-untyped]
+
+from framework.params.testpmd import SimpleForwardingModes
+from framework.remote_session.testpmd_shell import TestPmdShell
+from framework.test_suite import TestSuite
+
+
+class TestDynamicConfig(TestSuite):
+    """Dynamic config suite.
+
+    Use the show port commands to see the MAC address and promisc mode status
+    of the Rx port on the DUT. The suite will check the Rx and Tx packets
+    of each port after configuring promiscuous, multicast, and default mode
+    on the DUT to verify the expected behavior. It consists of four test cases:
+
+    1. Default mode: verify packets are received and forwarded.
+    2. Disable promiscuous mode: verify that packets are received
+    only for the packet with destination address matching the port address.
+    3. Disable promiscuous mode broadcast: verify that packets with destination
+    MAC address not matching the port are received and not forwarded, and verify
+    that broadcast packets are received and forwarded.
+    4. Disable promiscuous mode multicast: verify that packets with destination
+    MAC address not matching the port are received and not forwarded, and verify
+    that multicast packets are received and forwarded.
+    """
+
+    def set_up_suite(self) -> None:
+        """Set up the test suite.
+
+        Setup:
+            Verify that at least two ports are open for session.
+        """
+        self.verify(len(self._port_links) > 1, "Not enough ports")
+
+    def send_packet_and_verify(self, should_receive: bool, mac_address: str) -> None:
+        """Generate, send and verify packets.
+
+        Generate a packet and send to the DUT, verify that packet is forwarded from DUT to
+        traffic generator if that behavior is expected.
+
+        Args:
+            should_receive: Indicate whether the packet should be received.
+            mac_address: Destination MAC address to generate in packet.
+        """
+        packet = Ether(dst=mac_address) / IP() / Raw(load="x")
+        received = self.send_packet_and_capture(packet)
+        contains_packet = any(
+            packet.haslayer(Raw) and b"x" in packet.load for packet in received
+        )
+        self.verify(
+            should_receive == contains_packet,
+            f"Packet was {'dropped' if should_receive else 'received'}",
+        )
+
+    def disable_promisc_setup(self, port_id: int) -> TestPmdShell:
+        """Sets up testpmd shell config for cases where promisc mode is disabled.
+
+        Args:
+            port_id: Port number to disable promisc mode on.
+
+        Returns:
+            shell: interactive testpmd shell object.
+        """
+        shell = TestPmdShell(node=self.sut_node)
+        shell.start()
+        shell.set_promisc(port=port_id, on=False)
+        shell.set_forward_mode(SimpleForwardingModes.io)
+        return shell
+
+    def test_default_mode(self) -> None:
+        """Tests default configuration.
+
+        Creates a testpmd shell, verifies that promiscuous mode is enabled by default,
+        and sends two packets; one matching source MAC address and one unknown.
+        Verifies that both are received.
+        """
+        testpmd = TestPmdShell(node=self.sut_node)
+        isPromisc
[PATCH v2 1/3] net/ice: fix possible memory leak
This patch fixes a possible memory leak in ice_hash_parse_raw_pattern():
the previously allocated pkt_buf and msk_buf were not released with
rte_free() on the error paths.

Fixes: 1b9c68120a1c ("net/ice: enable protocol agnostic flow offloading in RSS")
Cc: sta...@dpdk.org

Reported-by: Michael Theodore Stolarchuk
Signed-off-by: Vladimir Medvedkin
---
 drivers/net/ice/ice_hash.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ice/ice_hash.c b/drivers/net/ice/ice_hash.c
index f923641533..913f54fca4 100644
--- a/drivers/net/ice/ice_hash.c
+++ b/drivers/net/ice/ice_hash.c
@@ -650,7 +650,7 @@ ice_hash_parse_raw_pattern(struct ice_adapter *ad,
 	uint8_t *pkt_buf, *msk_buf;
 	uint8_t tmp_val = 0;
 	uint8_t tmp_c = 0;
-	int i, j;
+	int i, j, ret = 0;

 	if (ad->psr == NULL)
 		return -rte_errno;
@@ -670,8 +670,10 @@ ice_hash_parse_raw_pattern(struct ice_adapter *ad,
 		return -ENOMEM;

 	msk_buf = rte_zmalloc(NULL, pkt_len, 0);
-	if (!msk_buf)
+	if (!msk_buf) {
+		rte_free(pkt_buf);
 		return -ENOMEM;
+	}

 	/* convert string to int array */
 	for (i = 0, j = 0; i < spec_len; i += 2, j++) {
@@ -708,17 +710,20 @@ ice_hash_parse_raw_pattern(struct ice_adapter *ad,
 			msk_buf[j] = tmp_val * 16 + tmp_c - '0';
 	}

-	if (ice_parser_run(ad->psr, pkt_buf, pkt_len, &rslt))
-		return -rte_errno;
+	ret = ice_parser_run(ad->psr, pkt_buf, pkt_len, &rslt);
+	if (ret)
+		goto free_mem;

-	if (ice_parser_profile_init(&rslt, pkt_buf, msk_buf,
-		pkt_len, ICE_BLK_RSS, true, &prof))
-		return -rte_errno;
+	ret = ice_parser_profile_init(&rslt, pkt_buf, msk_buf,
+				      pkt_len, ICE_BLK_RSS, true, &prof);
+	goto free_mem;

 	rte_memcpy(&meta->raw.prof, &prof, sizeof(prof));

+free_mem:
 	rte_free(pkt_buf);
 	rte_free(msk_buf);
+
 	return 0;
 }

--
2.34.1
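The cleanup pattern this patch moves towards (a single exit label that frees both buffers) can be sketched in isolation. The code below uses plain calloc()/free() and a made-up `toy_step()` instead of the ice parser calls. Unlike the hunk as posted, the sketch guards every `goto` with a check of `ret`, so the success-only work is reached, and it returns `ret` rather than 0; that is an assumption about the intended behaviour and not a copy of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical processing step: fails with -EINVAL when asked to. */
static int
toy_step(int should_fail)
{
	return should_fail ? -EINVAL : 0;
}

/*
 * Allocate two buffers, run two steps, and free both buffers on every
 * path. Returns 0 on success or a negative errno value.
 */
static int
toy_parse(size_t len, int fail_first, int fail_second)
{
	unsigned char *pkt_buf, *msk_buf;
	int ret;

	pkt_buf = calloc(1, len);
	if (pkt_buf == NULL)
		return -ENOMEM;

	msk_buf = calloc(1, len);
	if (msk_buf == NULL) {
		free(pkt_buf); /* the first buffer must not leak */
		return -ENOMEM;
	}

	ret = toy_step(fail_first);
	if (ret != 0)
		goto free_mem;

	ret = toy_step(fail_second);
	if (ret != 0)
		goto free_mem;

	/* Success-only work would go here, before the shared cleanup. */

free_mem:
	free(pkt_buf);
	free(msk_buf);
	return ret; /* the first failure, or 0 on success */
}
```

Keeping one label for both the success and the failure path means a later change that adds a third step only needs another guarded `goto`, not another pair of free calls.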
[PATCH v2 2/3] net/ice: refactor raw pattern parsing function
Replace strlen with more secure strnlen in ice_hash_parse_raw_pattern.

Signed-off-by: Vladimir Medvedkin
---
 drivers/net/ice/ice_hash.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ice/ice_hash.c b/drivers/net/ice/ice_hash.c
index 913f54fca4..00503d0d28 100644
--- a/drivers/net/ice/ice_hash.c
+++ b/drivers/net/ice/ice_hash.c
@@ -658,9 +658,9 @@ ice_hash_parse_raw_pattern(struct ice_adapter *ad,
 	raw_spec = item->spec;
 	raw_mask = item->mask;
 
-	spec_len = strlen((char *)(uintptr_t)raw_spec->pattern);
-	if (strlen((char *)(uintptr_t)raw_mask->pattern) !=
-		spec_len)
+	spec_len = strnlen((char *)(uintptr_t)raw_spec->pattern, raw_spec->length);
+	if (strnlen((char *)(uintptr_t)raw_mask->pattern, raw_spec->length) !=
+		spec_len)
 		return -rte_errno;
 
 	pkt_len = spec_len / 2;
--
2.34.1
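As a small illustration of why the bounded call is safer, the sketch below measures a pattern without ever reading past a caller-supplied limit, even when the buffer has no terminating NUL. The helper names bounded_pattern_len() and check_spec_mask() are hypothetical and not part of the driver. As in the patch, both strings are bounded by the same length.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>

/*
 * Return the number of characters in pattern, never reading more than
 * max_len bytes, even if no terminating NUL is present.
 */
size_t bounded_pattern_len(const char *pattern, size_t max_len)
{
	return strnlen(pattern, max_len);
}

/* Return 0 when spec and mask have equal bounded length, -1 otherwise. */
int check_spec_mask(const char *spec, const char *mask, size_t max_len)
{
	size_t spec_len = bounded_pattern_len(spec, max_len);

	if (bounded_pattern_len(mask, max_len) != spec_len)
		return -1;
	return 0;
}
```

With plain strlen() the unterminated case would read beyond the buffer. Here the result is simply capped at max_len.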
[PATCH v2 3/3] net/ice: fix return value for raw pattern parsing function
If the parser was not initialized when calling ice_hash_parse_raw_pattern(), -rte_errno was returned. Replace returning rte_errno with ENOTSUP since rte_errno is meaningless in the context of ice_hash_parse_raw_pattern().

Fixes: 1b9c68120a1c ("net/ice: enable protocol agnostic flow offloading in RSS")
Cc: sta...@dpdk.org

Signed-off-by: Vladimir Medvedkin
---
 drivers/net/ice/ice_hash.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ice/ice_hash.c b/drivers/net/ice/ice_hash.c
index 00503d0d28..13a68b8f02 100644
--- a/drivers/net/ice/ice_hash.c
+++ b/drivers/net/ice/ice_hash.c
@@ -653,7 +653,7 @@ ice_hash_parse_raw_pattern(struct ice_adapter *ad,
 	int i, j, ret = 0;
 
 	if (ad->psr == NULL)
-		return -rte_errno;
+		return -ENOTSUP;
 
 	raw_spec = item->spec;
 	raw_mask = item->mask;
--
2.34.1
[PATCH v1 0/3] dts: add test suite for dual VLANs
From: Jeremy Spewock

This series ports over the implementation of the dual_vlan test suite in old DTS and refactors it, dropping some duplicated functionality as well as some features that are specific to certain NICs.

One thing to note about this series is that it is tested and fully working on a Mellanox NIC running the mlx5_core driver, but in testing I did notice some strange behavior on a NIC running the bnxt_en driver. The Broadcom NIC worked for all test cases except for those involving VLAN insertion. In the presence of 2 VLAN headers it seems that the bnxt_en NIC drops the packet completely if you attempt to insert a 3rd. I originally thought this might be an MTU issue, but with MTUs of 2000 on the DUT and 9000 on the traffic generator the packet was still dropped. I believe VLAN insertion in the absence of other VLAN headers works on this same NIC, as that was tested by Dean Marx.

Jeremy Spewock (3):
  dts: fix Testpmd function for resetting VLAN insertion
  dts: add dual_vlan testing suite
  dts: add dual_vlan test suite to the yaml schema

 dts/framework/config/conf_yaml_schema.json    | 3 +-
 dts/framework/remote_session/testpmd_shell.py | 2 +-
 dts/tests/TestSuite_dual_vlan.py              | 281 ++
 3 files changed, 284 insertions(+), 2 deletions(-)
 create mode 100644 dts/tests/TestSuite_dual_vlan.py

--
2.45.2
[PATCH v1 1/3] dts: fix Testpmd function for resetting VLAN insertion
From: Jeremy Spewock

The previous method would send the command `tx_vlan set ` when the correct command is `tx_vlan reset `.

Fixes: a49d9da1e9a5 ("dts: add VLAN methods to testpmd shell")
Cc: dm...@iol.unh.edu

depends-on: patch-142103 ("dts: add VLAN methods to testpmd shell")

Signed-off-by: Jeremy Spewock
---
 dts/framework/remote_session/testpmd_shell.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dts/framework/remote_session/testpmd_shell.py b/dts/framework/remote_session/testpmd_shell.py
index 09d3bda5d6..a8b6a054b5 100644
--- a/dts/framework/remote_session/testpmd_shell.py
+++ b/dts/framework/remote_session/testpmd_shell.py
@@ -994,7 +994,7 @@ def tx_vlan_reset(self, port: int, verify: bool = True):
             InteractiveCommandExecutionError: If `verify` is :data:`True` and the insertion tag
                 is not reset.
         """
-        vlan_insert_output = self.send_command(f"tx_vlan set {port}")
+        vlan_insert_output = self.send_command(f"tx_vlan reset {port}")
         if verify:
             if "Please stop port" in vlan_insert_output or "Invalid port" in vlan_insert_output:
                 self._logger.debug(
--
2.45.2
[PATCH v1 2/3] dts: add dual_vlan testing suite
From: Jeremy Spewock

This patch ports over the functionality of the dual_vlan suite from old DTS to the new framework. This test suite exists to test the functionality of VLAN functions such as stripping, inserting, and filtering in the presence of two VLAN headers.

Some test cases were left out of this refactored version, including those that test the functionality of VLAN functions on a packet with only one VLAN header, as this is tested in another test suite which is currently in development. Additionally, this series does not include test cases for testing the adjustment of TPID or extended VLAN ranges, as these were included in the old test suite specifically for testing on Intel hardware and they are not universally supported on every NIC. There could be further reason to add these test cases in the future once the capabilities feature is fully implemented. Extended mode for VLANs seems to be exposed through offload capabilities of the port, but there doesn't seem to be anything as obvious for TPID modification.

depends-on: patch-142103 ("dts: add VLAN methods to testpmd shell")

Signed-off-by: Jeremy Spewock
---
 dts/tests/TestSuite_dual_vlan.py | 281 +++
 1 file changed, 281 insertions(+)
 create mode 100644 dts/tests/TestSuite_dual_vlan.py

diff --git a/dts/tests/TestSuite_dual_vlan.py b/dts/tests/TestSuite_dual_vlan.py
new file mode 100644
index 00..095e57bc56
--- /dev/null
+++ b/dts/tests/TestSuite_dual_vlan.py
@@ -0,0 +1,281 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2024 University of New Hampshire
+
+"""Dual VLAN functionality testing suite.
+
+The main objective of this test suite is to ensure that standard VLAN functions such as stripping,
+filtering, and inserting all still carry out their expected behavior in the presence of a packet
+which contains two VLAN headers.
These functions should carry out said behavior not just in +isolation, but also when other VLAN functions are configured on the same port. In addition to this, +the priority attributes of VLAN headers should be unchanged in the case of multiple VLAN headers +existing on a single packet. +""" +import time +from enum import Flag, auto +from typing import ClassVar + +from scapy.layers.l2 import Dot1Q, Ether # type: ignore[import-untyped] +from scapy.packet import Packet, Raw # type: ignore[import-untyped] + +from framework.params.testpmd import SimpleForwardingModes +from framework.remote_session.testpmd_shell import TestPmdShell +from framework.test_suite import TestSuite + + +class TestDualVlan(TestSuite): +"""DPDK Dual VLAN test suite. + +This suite tests the behavior of VLAN functions and properties in the presence of two VLAN +headers. All VLAN functions which are tested in this suite are specified using the inner class +:class:`TestCaseOptions` and should have cases for configuring them in +:meth:`configure_testpmd` as well as cases for testing their behavior in +:meth:`verify_vlan_functions`. Every combination of VLAN functions being enabled should be +tested. Additionally, attributes of VLAN headers, such as priority, are tested to ensure they +are not modified in the case of two VLAN headers. +""" + +class TestCaseOptions(Flag): +"""Flag for specifying which VLAN functions to configure.""" + +#: +VLAN_STRIP = auto() +#: +VLAN_FILTER_INNER = auto() +#: +VLAN_FILTER_OUTER = auto() +#: +VLAN_INSERT = auto() + +#: ID to set on inner VLAN tags. +inner_vlan_tag: ClassVar[int] = 2 +#: ID to set on outer VLAN tags. +outer_vlan_tag: ClassVar[int] = 1 +#: ID to use when inserting VLAN tags. +vlan_insert_tag: ClassVar[int] = 3 +#: +rx_port: ClassVar[int] = 0 +#: +tx_port: ClassVar[int] = 1 + +def is_relevant_packet(self, pkt: Packet) -> bool: +"""Check if a packet was sent by functions in this suite. 
+ +All functions in this test suite send packets with a payload that is packed with 20 "X" +characters. This method, therefore, can determine if the packet was sent by this test suite +by just checking to see if this payload exists on the received packet. + +Args: +pkt: Packet to check for relevancy. + +Returns: +:data:`True` if the packet contains the expected payload, :data:`False` otherwise. +""" +return hasattr(pkt, "load") and "X" * 20 in str(pkt.load) + +def pkt_payload_contains_layers(self, pkt: Packet, *expected_layers: Dot1Q) -> bool: +"""Verify that the payload of the packet matches `expected_layers`. + +The layers in the payload of `pkt` must match the type and the user-defined fields of the +layers in `expected_layers` in order. + +Args: +pkt: Packet to check the payload of. +*expected_layers: Layers expecte
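To make the header layout that the suite exercises concrete, here is a rough C sketch that walks consecutive 802.1Q tags in a raw Ethernet frame and extracts the VLAN ID and priority from each. It is not part of the DTS suite: parse_vlan_tags() and struct vlan_tag are hypothetical names, and the sketch assumes both tags use TPID 0x8100, which is what stacking two Scapy Dot1Q layers produces by default.

```c
#include <stddef.h>
#include <stdint.h>

#define TPID_8021Q 0x8100u

/* Decoded fields of one 802.1Q tag. */
struct vlan_tag {
	uint16_t vid;	/* 12-bit VLAN ID */
	uint8_t pcp;	/* 3-bit priority code point */
};

/*
 * Parse up to max_tags consecutive 802.1Q tags that follow the two MAC
 * addresses of an Ethernet frame. Returns the number of tags found.
 */
size_t parse_vlan_tags(const uint8_t *frame, size_t len,
		       struct vlan_tag *tags, size_t max_tags)
{
	size_t off = 12;	/* skip destination and source MAC */
	size_t n = 0;

	while (n < max_tags && off + 4 <= len) {
		uint16_t tpid = (uint16_t)(frame[off] << 8 | frame[off + 1]);
		uint16_t tci = (uint16_t)(frame[off + 2] << 8 | frame[off + 3]);

		if (tpid != TPID_8021Q)
			break;	/* reached the real EtherType */
		tags[n].vid = tci & 0x0fff;		/* low 12 bits */
		tags[n].pcp = (uint8_t)(tci >> 13);	/* top 3 bits */
		n++;
		off += 4;
	}
	return n;
}
```

A check that priority survives forwarding would then compare the pcp field of each tag before and after the packet passes through the port.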
[PATCH v1 3/3] dts: add dual_vlan test suite to the yaml schema
From: Jeremy Spewock

Adds the test suite name to the yaml schema to allow for it to be run.

Signed-off-by: Jeremy Spewock
---
 dts/framework/config/conf_yaml_schema.json | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/dts/framework/config/conf_yaml_schema.json b/dts/framework/config/conf_yaml_schema.json
index f02a310bb5..b8ad5b37b3 100644
--- a/dts/framework/config/conf_yaml_schema.json
+++ b/dts/framework/config/conf_yaml_schema.json
@@ -187,7 +187,8 @@
       "enum": [
         "hello_world",
         "os_udp",
-        "pmd_buffer_scatter"
+        "pmd_buffer_scatter",
+        "dual_vlan"
       ]
     },
     "test_target": {
--
2.45.2
Re: [PATCH v4 1/3] dts: add multicast set function to shell
On Mon, Jul 15, 2024 at 12:00 PM Dean Marx wrote: > > added set multicast function for changing allmulticast mode within testpmd. > > Signed-off-by: Dean Marx > --- I still think this patch would benefit from my above comments about modifying the method signatures and using show_port_info_all(), but it looks good to me otherwise.
Re: [PATCH v4 2/3] dts: dynamic config conf schema
On Mon, Jul 15, 2024 at 12:00 PM Dean Marx wrote: > > configuration schema to run dynamic configuration test suite. > > Signed-off-by: Dean Marx Reviewed-by: Jeremy Spewock
Re: [PATCH v4 3/3] dts: dynamic config test suite
On Mon, Jul 15, 2024 at 12:00 PM Dean Marx wrote: > > Suite for testing ability of Poll Mode Driver to turn promiscuous > mode on/off, allmulticast mode on/off, and show expected behavior > when sending packets with known, unknown, broadcast, and multicast > destination MAC addresses. > > Depends-on: patch-1142113 ("add send_packets to test suites and rework > packet addressing") > > Signed-off-by: Dean Marx Reviewed-by: Jeremy Spewock
[DPDK/DTS Bug 1489] Port over dual VLAN test suite to new DTS
https://bugs.dpdk.org/show_bug.cgi?id=1489 Bug ID: 1489 Summary: Port over dual VLAN test suite to new DTS Product: DPDK Version: unspecified Hardware: All OS: All Status: UNCONFIRMED Severity: normal Priority: Normal Component: DTS Assignee: dev@dpdk.org Reporter: jspew...@iol.unh.edu CC: juraj.lin...@pantheon.tech, pr...@iol.unh.edu Target Milestone: --- This is open on the mailing list: https://patchwork.dpdk.org/project/dpdk/list/?series=32503 . -- You are receiving this mail because: You are the assignee for the bug.
[RFC] ethdev: an API for cache stashing hints
An application provides cache stashing hints to the ethernet devices to improve memory access latencies from the CPU and the NIC. This patch introduces four distinct hints for this purpose.

The RTE_ETH_DEV_STASH_HINT_HOST_WILLNEED hint indicates that the host (CPU) requires the data written by the NIC immediately. This implies that the CPU expects to read data from its local cache rather than LLC or main memory if possible. This would improve memory access latency in the Rx path. For PCI devices with TPH capability, this hint translates into the DWHR (Device Writes Host Reads) access pattern. This hint is only valid for receive queues.

The RTE_ETH_DEV_STASH_HINT_BI_DIR_DATA hint indicates that the host and the device access the data structure equally. Rx/Tx queue descriptors fit the description of such data. This hint applies to both Rx and Tx directions. In the PCI TPH context, this hint translates into a Bi-Directional access pattern.

The RTE_ETH_DEV_STASH_HINT_DEV_ONLY hint indicates that the CPU is not involved in a given device's receive or transmit paths. This implies that only devices are involved in the IO path. Depending on the implementation, this hint may result in data getting placed in a cache close to the device or not cached at all. For PCI devices with TPH capability, this hint translates into D*D* (DWDR, DRDW, DWDW, DRDR) access patterns. This is a bidirectional hint, and it can be applied to both Rx and Tx queues.

The RTE_ETH_DEV_STASH_HINT_HOST_DONTNEED hint indicates that the device reads data written by the host (CPU) that may still be in the host's local cache but is not required by the host anytime soon. This hint is intended to prevent unnecessary cache invalidations that cause interconnect latencies when a device writes to a buffer already in host cache memory.
In DPDK, this could happen with the recycling of mbufs where a mbuf is placed in the Tx queue that then gets back into mempool and gets recycled back into the Rx queue, all while a copy is being held in the CPU's local cache unnecessarily. By using this hint on supported platforms, the mbuf will be invalidated after the device completes the buffer reading, but it will be well before the buffer gets recycled and updated in the Rx path. This hint is only valid for transmit queues. Applications use three main interfaces in the ethdev library to discover and set cache stashing hints. rte_eth_dev_stashing_hints_tx interface is used to set hints on a Tx queue. rte_eth_dev_stashing_hints_rx interface is used to set hints on an Rx queue. Both of these functions take the following parameters as inputs: a port_id (the id of the ethernet device), a cpu_id (the target CPU), a cache_level (the level of the cache hierarchy the data should be stashed into), a queue_id (the queue the hints are applied to). In addition to the above list of parameters, a type parameter indicates the type of the object the application expects to be stashed by the hardware. Depending on the hardware, these may vary. Intel E810 NICs support the stashing of Rx/Tx descriptors, packet headers, and packet payloads. These are indicated by the macros RTE_ETH_DEV_STASH_TYPE_DESC, RTE_ETH_DEV_STASH_TYPE_HEADER, RTE_ETH_DEV_STASH_TYPE_PAYLOAD. Hardware capable of stashing data at any given offset into a packet can use the RTE_ETH_DEV_STASH_TYPE_OFFSET type. When an offset is used, the offset parameter in the above two functions should be set appropriately. rte_eth_dev_stashing_hints_discover is used to discover the object types and hints supported in the platform and the device. The function takes types and hints pointers used as a bit vector to indicate hints and types supported by the NIC. 
An application that intends to use stashing hints should first discover supported hints and types and then use the functions rte_eth_dev_stashing_hints_tx and rte_eth_dev_stashing_hints_rx as required to set stashing hints accordingly. eth_dev_ops structure has been updated with two new ops that a PMD should implement to support cache stashing hints. A PMD that intends to support cache stashing hints should initialize the set_stashing_hints function pointer to a function that issues hints to the underlying hardware in compliance with platform capabilities. The same PMD should also implement a function that can return two-bit fields indicating supported types and hints and then initialize the discover_stashing_hints function pointer with it. If the NIC supports cache stashing hints, the NIC should always set the RTE_ETH_DEV_CAPA_CACHE_STASHING device capability. Signed-off-by: Wathsala Vithanage Reviewed-by: Dhruv Tripathi Jira: ENTNET-5014 Change-Id: I0a4197311a884619b03eba7c94fa0922d5f57045 --- .mailmap | 1 + lib/ethdev/ethdev_driver.h | 67 +++ lib/ethdev/rte_ethdev.c| 153 + lib/ethdev/rte_ethdev.h| 225 + lib/ethdev/version.map | 6 + 5 files changed, 452 insertions(+)
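The discover-then-set flow described in this RFC can be pictured with a small self-contained mock. This is a sketch under loud assumptions: it does not use the proposed ethdev functions, whose signatures are not shown in this message, and every name and bit value below (mock_dev, mock_discover, mock_set_rx_hints, HINT_*, TYPE_*) is hypothetical. It only illustrates the bit-vector idea and the rule that a transmit-only hint is rejected on a receive queue.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values; the text above does not show the real ones. */
#define HINT_HOST_WILLNEED 0x1u	/* Rx queues only */
#define HINT_BI_DIR_DATA   0x2u	/* Rx and Tx queues */
#define HINT_DEV_ONLY      0x4u	/* Rx and Tx queues */
#define HINT_HOST_DONTNEED 0x8u	/* Tx queues only */

#define TYPE_DESC    0x1u
#define TYPE_HEADER  0x2u
#define TYPE_PAYLOAD 0x4u
#define TYPE_OFFSET  0x8u

/* Mock capability record standing in for what a PMD would report. */
struct mock_dev {
	uint16_t types;
	uint16_t hints;
};

/* Fill the two bit vectors with everything the mock device supports. */
int mock_discover(const struct mock_dev *dev, uint16_t *types, uint16_t *hints)
{
	if (dev == NULL || types == NULL || hints == NULL)
		return -EINVAL;
	*types = dev->types;
	*hints = dev->hints;
	return 0;
}

/* Validate an Rx request against direction rules and discovered support. */
int mock_set_rx_hints(const struct mock_dev *dev, uint16_t type, uint16_t hint)
{
	uint16_t types, hints;
	int ret;

	if (hint & HINT_HOST_DONTNEED)
		return -EINVAL;	/* transmit-only hint on a receive queue */

	ret = mock_discover(dev, &types, &hints);
	if (ret != 0)
		return ret;

	if ((type & ~types) != 0 || (hint & ~hints) != 0)
		return -ENOTSUP;	/* asked for something not advertised */

	return 0;
}
```

An application following the RFC would do the equivalent with the real interfaces: discover once per port, then only request the types and hints that came back set.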
[RFC v2] ethdev: an API for cache stashing hints
An application provides cache stashing hints to the ethernet devices to improve memory access latencies from the CPU and the NIC. This patch introduces four distinct hints for this purpose.

The RTE_ETH_DEV_STASH_HINT_HOST_WILLNEED hint indicates that the host (CPU) requires the data written by the NIC immediately. This implies that the CPU expects to read data from its local cache rather than LLC or main memory if possible. This would improve memory access latency in the Rx path. For PCI devices with TPH capability, this hint translates into the DWHR (Device Writes Host Reads) access pattern. This hint is only valid for receive queues.

The RTE_ETH_DEV_STASH_HINT_BI_DIR_DATA hint indicates that the host and the device access the data structure equally. Rx/Tx queue descriptors fit the description of such data. This hint applies to both Rx and Tx directions. In the PCI TPH context, this hint translates into a Bi-Directional access pattern.

The RTE_ETH_DEV_STASH_HINT_DEV_ONLY hint indicates that the CPU is not involved in a given device's receive or transmit paths. This implies that only devices are involved in the IO path. Depending on the implementation, this hint may result in data getting placed in a cache close to the device or not cached at all. For PCI devices with TPH capability, this hint translates into D*D* (DWDR, DRDW, DWDW, DRDR) access patterns. This is a bidirectional hint, and it can be applied to both Rx and Tx queues.

The RTE_ETH_DEV_STASH_HINT_HOST_DONTNEED hint indicates that the device reads data written by the host (CPU) that may still be in the host's local cache but is not required by the host anytime soon. This hint is intended to prevent unnecessary cache invalidations that cause interconnect latencies when a device writes to a buffer already in host cache memory.
In DPDK, this could happen with the recycling of mbufs where a mbuf is placed in the Tx queue that then gets back into mempool and gets recycled back into the Rx queue, all while a copy is being held in the CPU's local cache unnecessarily. By using this hint on supported platforms, the mbuf will be invalidated after the device completes the buffer reading, but it will be well before the buffer gets recycled and updated in the Rx path. This hint is only valid for transmit queues. Applications use three main interfaces in the ethdev library to discover and set cache stashing hints. rte_eth_dev_stashing_hints_tx interface is used to set hints on a Tx queue. rte_eth_dev_stashing_hints_rx interface is used to set hints on an Rx queue. Both of these functions take the following parameters as inputs: a port_id (the id of the ethernet device), a cpu_id (the target CPU), a cache_level (the level of the cache hierarchy the data should be stashed into), a queue_id (the queue the hints are applied to). In addition to the above list of parameters, a type parameter indicates the type of the object the application expects to be stashed by the hardware. Depending on the hardware, these may vary. Intel E810 NICs support the stashing of Rx/Tx descriptors, packet headers, and packet payloads. These are indicated by the macros RTE_ETH_DEV_STASH_TYPE_DESC, RTE_ETH_DEV_STASH_TYPE_HEADER, RTE_ETH_DEV_STASH_TYPE_PAYLOAD. Hardware capable of stashing data at any given offset into a packet can use the RTE_ETH_DEV_STASH_TYPE_OFFSET type. When an offset is used, the offset parameter in the above two functions should be set appropriately. rte_eth_dev_stashing_hints_discover is used to discover the object types and hints supported in the platform and the device. The function takes types and hints pointers used as a bit vector to indicate hints and types supported by the NIC. 
An application that intends to use stashing hints should first discover supported hints and types and then use the functions rte_eth_dev_stashing_hints_tx and rte_eth_dev_stashing_hints_rx as required to set stashing hints accordingly. eth_dev_ops structure has been updated with two new ops that a PMD should implement to support cache stashing hints. A PMD that intends to support cache stashing hints should initialize the set_stashing_hints function pointer to a function that issues hints to the underlying hardware in compliance with platform capabilities. The same PMD should also implement a function that can return two-bit fields indicating supported types and hints and then initialize the discover_stashing_hints function pointer with it. If the NIC supports cache stashing hints, the NIC should always set the RTE_ETH_DEV_CAPA_CACHE_STASHING device capability. Signed-off-by: Wathsala Vithanage Reviewed-by: Dhruv Tripathi --- .mailmap | 1 + lib/ethdev/ethdev_driver.h | 67 +++ lib/ethdev/rte_ethdev.c| 153 + lib/ethdev/rte_ethdev.h| 225 + lib/ethdev/version.map | 6 + 5 files changed, 452 insertions(+) diff --git a/.mailmap b/.mailmap index f1e64286a1..9c28b74655 100644 -
[PATCH v3 1/4] eal: expand the availability of WFE and related instructions
The availability of __RTE_ARM_WFE, __RTE_ARM_SEV, __RTE_ARM_SEVL, and __RTE_ARM_LOAD_EXC_* macros for other applications, such as PMD power management, should not depend on the choice of use of these instructions in rte_wait_until_equal_N functions. Therefore, this patch moves these macros out of control of the RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED macro. Signed-off-by: Wathsala Vithanage Reviewed-by: Dhruv Tripathi --- lib/eal/arm/include/rte_pause_64.h | 4 ++-- lib/eal/arm/rte_cpuflags.c | 4 ++-- lib/eal/arm/rte_power_intrinsics.c | 9 - 3 files changed, 8 insertions(+), 9 deletions(-) diff --git a/lib/eal/arm/include/rte_pause_64.h b/lib/eal/arm/include/rte_pause_64.h index 9e2dbf3531..8224f09ba7 100644 --- a/lib/eal/arm/include/rte_pause_64.h +++ b/lib/eal/arm/include/rte_pause_64.h @@ -24,8 +24,6 @@ static inline void rte_pause(void) asm volatile("yield" ::: "memory"); } -#ifdef RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED - /* Send a local event to quit WFE. */ #define __RTE_ARM_SEVL() { asm volatile("sevl" : : : "memory"); } @@ -148,6 +146,8 @@ static inline void rte_pause(void) __RTE_ARM_LOAD_EXC_128(src, dst, memorder) \ } +#ifdef RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED + static __rte_always_inline void rte_wait_until_equal_16(volatile uint16_t *addr, uint16_t expected, rte_memory_order memorder) diff --git a/lib/eal/arm/rte_cpuflags.c b/lib/eal/arm/rte_cpuflags.c index 7ba4f8ba97..29884c285f 100644 --- a/lib/eal/arm/rte_cpuflags.c +++ b/lib/eal/arm/rte_cpuflags.c @@ -163,7 +163,7 @@ void rte_cpu_get_intrinsics_support(struct rte_cpu_intrinsics *intrinsics) { memset(intrinsics, 0, sizeof(*intrinsics)); -#ifdef RTE_ARM_USE_WFE +#ifdef RTE_ARCH_64 intrinsics->power_monitor = 1; -#endif +#endif /* RTE_ARCH_64 */ } diff --git a/lib/eal/arm/rte_power_intrinsics.c b/lib/eal/arm/rte_power_intrinsics.c index f54cf59e80..b0056cce8b 100644 --- a/lib/eal/arm/rte_power_intrinsics.c +++ b/lib/eal/arm/rte_power_intrinsics.c @@ -17,7 +17,7 @@ rte_power_monitor(const struct 
rte_power_monitor_cond *pmc, { RTE_SET_USED(tsc_timestamp); -#ifdef RTE_ARM_USE_WFE +#ifdef RTE_ARCH_64 const unsigned int lcore_id = rte_lcore_id(); uint64_t cur_value; @@ -57,7 +57,7 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc, RTE_SET_USED(pmc); return -ENOTSUP; -#endif +#endif /* RTE_ARCH_64 */ } /** @@ -81,13 +81,12 @@ rte_power_monitor_wakeup(const unsigned int lcore_id) { RTE_SET_USED(lcore_id); -#ifdef RTE_ARM_USE_WFE +#ifdef RTE_ARCH_64 __RTE_ARM_SEV() - return 0; #else return -ENOTSUP; -#endif +#endif /* RTE_ARCH_64 */ } int -- 2.34.1
[PATCH v3 2/4] config/arm: adds Arm Neoverse N3 SoC
Add Arm Neoverse N3 part number to build configuration. Signed-off-by: Wathsala Vithanage --- config/arm/meson.build | 22 +- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/config/arm/meson.build b/config/arm/meson.build index 012935d5d7..8018218b76 100644 --- a/config/arm/meson.build +++ b/config/arm/meson.build @@ -116,6 +116,18 @@ part_number_config_arm = { ['RTE_MAX_LCORE', 144], ['RTE_MAX_NUMA_NODES', 2] ] +}, +'0xd8e': { +'march': 'armv8.7-a', +'march_features': ['sve2'], +'fallback_march': 'armv8.5-a', +'flags': [ +['RTE_MACHINE', '"neoverse-n3"'], +['RTE_ARM_FEATURE_ATOMICS', true], +['RTE_ARM_FEATURE_WFXT', true], +['RTE_MAX_LCORE', 192], +['RTE_MAX_NUMA_NODES', 2] +] } } implementer_arm = { @@ -572,6 +584,13 @@ soc_n2 = { 'numa': false } +soc_n3 = { +'description': 'Arm Neoverse N3', +'implementer': '0x41', +'part_number': '0xd8e', +'numa': false +} + soc_odyssey = { 'description': 'Marvell Odyssey', 'implementer': '0x41', @@ -699,6 +718,7 @@ socs = { 'kunpeng930': soc_kunpeng930, 'n1sdp': soc_n1sdp, 'n2': soc_n2, +'n3': soc_n3, 'odyssey' : soc_odyssey, 'stingray': soc_stingray, 'thunderx2': soc_thunderx2, @@ -852,7 +872,7 @@ if update_flags if part_number_config.get('force_march', false) candidate_march = part_number_config['march'] else -supported_marchs = ['armv9-a', 'armv8.6-a', 'armv8.5-a', 'armv8.4-a', 'armv8.3-a', +supported_marchs = ['armv9-a', 'armv8.7-a', 'armv8.6-a', 'armv8.5-a', 'armv8.4-a', 'armv8.3-a', 'armv8.2-a', 'armv8.1-a', 'armv8-a'] check_compiler_support = false foreach supported_march: supported_marchs -- 2.34.1
[PATCH v3 3/4] eal: add Arm WFET in power management intrinsics
Wait for event with timeout (WFET) puts the CPU in a low power mode and stays there until an event is signalled (SEV), an exclusive monitor is lost, or a timeout expires. WFET is enabled selectively by checking FEAT_WFxT in the Linux auxiliary vector. If FEAT_WFxT is not available, power management falls back to WFE.

WFE is available on all the Arm platforms supported by DPDK. Therefore, the RTE_ARM_USE_WFE macro is not required to enable the WFE feature for PMD power monitoring. RTE_ARM_USE_WFE is used at build time to use the WFE instruction where applicable in the code at the developer's discretion rather than as an indicator of the instruction's availability.

Signed-off-by: Wathsala Vithanage
Reviewed-by: Dhruv Tripathi
Reviewed-by: Honnappa Nagarahalli
Reviewed-by: Jack Bond-Preston
Reviewed-by: Nick Connolly
Reviewed-by: Vinod Krishna
---
 .mailmap | 2 ++
 app/test/test_cpuflags.c | 3 +++
 lib/eal/arm/include/rte_cpuflags_64.h | 3 +++
 lib/eal/arm/include/rte_pause_64.h | 16 +--
 lib/eal/arm/rte_cpuflags.c | 1 +
 lib/eal/arm/rte_power_intrinsics.c | 39 ++-
 6 files changed, 50 insertions(+), 14 deletions(-)

diff --git a/.mailmap b/.mailmap
index f1e64286a1..a5c49d3702 100644
--- a/.mailmap
+++ b/.mailmap
@@ -338,6 +338,7 @@ Dexia Li
 Dexuan Cui
 Dharmik Thakkar
 Dheemanth Mallikarjun
+Dhruv Tripathi
 Diana Wang
 Didier Pallard
 Dilshod Urazov
@@ -1539,6 +1540,7 @@ Vincent Li
 Vincent S.
Cojot Vinh Tran Vipin Padmam Ramesh +Vinod Krishna Vipin Varghese Vipul Ashri Visa Hankala diff --git a/app/test/test_cpuflags.c b/app/test/test_cpuflags.c index a0ff74720c..22ab4dff0a 100644 --- a/app/test/test_cpuflags.c +++ b/app/test/test_cpuflags.c @@ -156,6 +156,9 @@ test_cpuflags(void) printf("Check for SVEBF16:\t"); CHECK_FOR_FLAG(RTE_CPUFLAG_SVEBF16); + + printf("Check for WFXT:\t"); + CHECK_FOR_FLAG(RTE_CPUFLAG_WFXT); #endif #if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_I686) diff --git a/lib/eal/arm/include/rte_cpuflags_64.h b/lib/eal/arm/include/rte_cpuflags_64.h index afe70209c3..993d980a02 100644 --- a/lib/eal/arm/include/rte_cpuflags_64.h +++ b/lib/eal/arm/include/rte_cpuflags_64.h @@ -36,6 +36,9 @@ enum rte_cpu_flag_t { RTE_CPUFLAG_SVEF64MM, RTE_CPUFLAG_SVEBF16, RTE_CPUFLAG_AARCH64, + + /* WFET and WFIT instructions */ + RTE_CPUFLAG_WFXT, }; #include "generic/rte_cpuflags.h" diff --git a/lib/eal/arm/include/rte_pause_64.h b/lib/eal/arm/include/rte_pause_64.h index 8224f09ba7..809403bffa 100644 --- a/lib/eal/arm/include/rte_pause_64.h +++ b/lib/eal/arm/include/rte_pause_64.h @@ -24,15 +24,27 @@ static inline void rte_pause(void) asm volatile("yield" ::: "memory"); } -/* Send a local event to quit WFE. */ +/* Send a local event to quit WFE/WFxT. */ #define __RTE_ARM_SEVL() { asm volatile("sevl" : : : "memory"); } -/* Send a global event to quit WFE for all cores. */ +/* Send a global event to quit WFE/WFxT for all cores. */ #define __RTE_ARM_SEV() { asm volatile("sev" : : : "memory"); } /* Put processor into low power WFE(Wait For Event) state. */ #define __RTE_ARM_WFE() { asm volatile("wfe" : : : "memory"); } +/* Put processor into low power WFET (WFE with Timeout) state. 
*/ +#ifdef RTE_ARM_FEATURE_WFXT +#define __RTE_ARM_WFET(t) { \ + asm volatile("wfet %x[to]"\ + : \ + : [to] "r" (t)\ + : "memory"); \ + } +#else +#define __RTE_ARM_WFET(t) { RTE_SET_USED(t); } +#endif + /* * Atomic exclusive load from addr, it returns the 8-bit content of * *addr while making it 'monitored', when it is written by someone diff --git a/lib/eal/arm/rte_cpuflags.c b/lib/eal/arm/rte_cpuflags.c index 29884c285f..88e10c6da0 100644 --- a/lib/eal/arm/rte_cpuflags.c +++ b/lib/eal/arm/rte_cpuflags.c @@ -115,6 +115,7 @@ const struct feature_entry rte_cpu_feature_table[] = { FEAT_DEF(SVEF32MM, REG_HWCAP2, 10) FEAT_DEF(SVEF64MM, REG_HWCAP2, 11) FEAT_DEF(SVEBF16, REG_HWCAP2, 12) + FEAT_DEF(WFXT, REG_HWCAP2, 31) FEAT_DEF(AARCH64, REG_PLATFORM, 0) }; #endif /* RTE_ARCH */ diff --git a/lib/eal/arm/rte_power_intrinsics.c b/lib/eal/arm/rte_power_intrinsics.c index b0056cce8b..6475bbca04 100644 --- a/lib/eal/arm/rte_power_intrinsics.c +++ b/lib/eal/arm/rte_power_intrinsics.c @@ -4,19 +4,32 @@ #include +#include "rte_cpuflags.h" #include "rte_power_intrinsics.h" /** - * This function uses WFE instruction to make lcore suspend + * Set wfet_en if WFET is supported + */ +#ifdef RTE_ARCH_64 +static uint8_t wfet_en; +#endif /* RTE_ARCH_64 */ + +RTE_INIT(rte_power_intrinsics_init) +{ +#ifdef RTE_ARCH_64 + if (rte_cpu_get_flag_en
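The runtime selection between WFET and WFE can be sketched in portable C as follows. This is an illustration only: select_wait_insn() and hwcap2_has_wfxt() are hypothetical helpers, not functions from the patch. The bit position is taken from the FEAT_DEF(WFXT, REG_HWCAP2, 31) entry above. On Linux the input word would typically come from getauxval(AT_HWCAP2).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 of HWCAP2 flags WFxT in the feature table of this patch. */
#define HWCAP2_WFXT_BIT 31

/* Which wait instruction the power-monitor path should use. */
enum wait_insn {
	WAIT_WFE,	/* always available fallback */
	WAIT_WFET	/* wait for event with timeout */
};

/* Return true when the HWCAP2 word advertises WFET/WFIT support. */
bool hwcap2_has_wfxt(uint64_t hwcap2)
{
	return ((hwcap2 >> HWCAP2_WFXT_BIT) & 1u) != 0;
}

/* Pick WFET when the CPU reports it, otherwise fall back to WFE. */
enum wait_insn select_wait_insn(uint64_t hwcap2)
{
	return hwcap2_has_wfxt(hwcap2) ? WAIT_WFET : WAIT_WFE;
}
```

Doing the check once at init time and caching the result, as the wfet_en flag in the patch does, keeps the decision out of the wait loop.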
[PATCH v3 4/4] eal: describe Arm CPU features including WFXT
RTE_CPUFLAG_WFXT indicates the availability of WFET and WFIT instructions. To preserve consistency across the rte_cpu_flag_t enumeration, add descriptive comments to each Arm feature listed.

Signed-off-by: Wathsala Vithanage
Reviewed-by: Dhruv Tripathi
---
 lib/eal/arm/include/rte_cpuflags_64.h | 48 +++
 1 file changed, 48 insertions(+)

diff --git a/lib/eal/arm/include/rte_cpuflags_64.h b/lib/eal/arm/include/rte_cpuflags_64.h
index 993d980a02..eed67bf6ec 100644
--- a/lib/eal/arm/include/rte_cpuflags_64.h
+++ b/lib/eal/arm/include/rte_cpuflags_64.h
@@ -13,28 +13,76 @@ extern "C" {
  * Enumeration of all CPU features supported
  */
 enum rte_cpu_flag_t {
+	/* Floating point capability */
 	RTE_CPUFLAG_FP = 0,
+
+	/* Arm Neon extension */
 	RTE_CPUFLAG_NEON,
+
+	/* Generic timer event stream */
 	RTE_CPUFLAG_EVTSTRM,
+
+	/* AES instructions */
 	RTE_CPUFLAG_AES,
+
+	/* Polynomial multiply long instruction */
 	RTE_CPUFLAG_PMULL,
+
+	/* SHA1 instructions */
 	RTE_CPUFLAG_SHA1,
+
+	/* SHA2 instructions */
 	RTE_CPUFLAG_SHA2,
+
+	/* CRC32 instruction */
 	RTE_CPUFLAG_CRC32,
+
+	/*
+	 * LDADD, LDCLR, LDEOR, LDSET, LDSMAX, LDSMIN, LDUMAX, LDUMIN, CAS,
+	 * CASP, and SWP instructions
+	 */
 	RTE_CPUFLAG_ATOMICS,
+
+	/* Arm SVE extension */
 	RTE_CPUFLAG_SVE,
+
+	/* Arm SVE2 extension */
 	RTE_CPUFLAG_SVE2,
+
+	/* SVE-AES instructions */
 	RTE_CPUFLAG_SVEAES,
+
+	/* SVE-PMULL instruction */
 	RTE_CPUFLAG_SVEPMULL,
+
+	/* SVE bit permute instructions */
 	RTE_CPUFLAG_SVEBITPERM,
+
+	/* SVE-SHA3 instructions */
 	RTE_CPUFLAG_SVESHA3,
+
+	/* SVE-SM4 instructions */
 	RTE_CPUFLAG_SVESM4,
+
+	/* CFINV, RMIF, SETF16, SETF8, AXFLAG, and XAFLAG instructions */
 	RTE_CPUFLAG_FLAGM2,
+
+	/* FRINT32Z, FRINT32X, FRINT64Z, and FRINT64X instructions */
 	RTE_CPUFLAG_FRINT,
+
+	/* SVE Int8 matrix multiplication instructions */
 	RTE_CPUFLAG_SVEI8MM,
+
+	/* SVE FP32 floating-point matrix multiplication instructions */
 	RTE_CPUFLAG_SVEF32MM,
+
+	/* SVE FP64 floating-point matrix multiplication instructions */
 	RTE_CPUFLAG_SVEF64MM,
+
+	/* SVE BFloat16 instructions */
 	RTE_CPUFLAG_SVEBF16,
+
+	/* 64 bit execution state of the Arm architecture */
 	RTE_CPUFLAG_AARCH64,
 
 	/* WFET and WFIT instructions */
--
2.34.1
Re: [PATCH v3 4/4] eal: describe Arm CPU features including WFXT
> On Jul 15, 2024, at 5:53 PM, Wathsala Vithanage wrote:
>
> RTE_CPUFLAG_WFXT indicates the availability of WFET and WFIT
> instructions. To preserve consistency across the rte_cpu_flag_t
> enumeration, add descriptive comments to each Arm feature listed.

IMO, the above can be simpler: "Add descriptive comments to each Arm feature listed in rte_cpu_flag_t"

>
> Signed-off-by: Wathsala Vithanage
> Reviewed-by: Dhruv Tripathi

Otherwise, looks good.

Reviewed-by: Honnappa Nagarahalli

> ---
> lib/eal/arm/include/rte_cpuflags_64.h | 48 +++
> 1 file changed, 48 insertions(+)
>
> diff --git a/lib/eal/arm/include/rte_cpuflags_64.h b/lib/eal/arm/include/rte_cpuflags_64.h
> index 993d980a02..eed67bf6ec 100644
> --- a/lib/eal/arm/include/rte_cpuflags_64.h
> +++ b/lib/eal/arm/include/rte_cpuflags_64.h
> @@ -13,28 +13,76 @@ extern "C" {
>   * Enumeration of all CPU features supported
>   */
>  enum rte_cpu_flag_t {
> +	/* Floating point capability */
>  	RTE_CPUFLAG_FP = 0,
> +
> +	/* Arm Neon extension */
>  	RTE_CPUFLAG_NEON,
> +
> +	/* Generic timer event stream */
>  	RTE_CPUFLAG_EVTSTRM,
> +
> +	/* AES instructions */
>  	RTE_CPUFLAG_AES,
> +
> +	/* Polynomial multiply long instruction */
>  	RTE_CPUFLAG_PMULL,
> +
> +	/* SHA1 instructions */
>  	RTE_CPUFLAG_SHA1,
> +
> +	/* SHA2 instructions */
>  	RTE_CPUFLAG_SHA2,
> +
> +	/* CRC32 instruction */
>  	RTE_CPUFLAG_CRC32,
> +
> +	/*
> +	 * LDADD, LDCLR, LDEOR, LDSET, LDSMAX, LDSMIN, LDUMAX, LDUMIN, CAS,
> +	 * CASP, and SWP instructions
> +	 */
>  	RTE_CPUFLAG_ATOMICS,
> +
> +	/* Arm SVE extension */
>  	RTE_CPUFLAG_SVE,
> +
> +	/* Arm SVE2 extension */
>  	RTE_CPUFLAG_SVE2,
> +
> +	/* SVE-AES instructions */
>  	RTE_CPUFLAG_SVEAES,
> +
> +	/* SVE-PMULL instruction */
>  	RTE_CPUFLAG_SVEPMULL,
> +
> +	/* SVE bit permute instructions */
>  	RTE_CPUFLAG_SVEBITPERM,
> +
> +	/* SVE-SHA3 instructions */
>  	RTE_CPUFLAG_SVESHA3,
> +
> +	/* SVE-SM4 instructions */
>  	RTE_CPUFLAG_SVESM4,
> +
> +	/* CFINV, RMIF, SETF16, SETF8, AXFLAG, and XAFLAG instructions */
>  	RTE_CPUFLAG_FLAGM2,
> +
> +	/* FRINT32Z, FRINT32X, FRINT64Z, and FRINT64X instructions */
>  	RTE_CPUFLAG_FRINT,
> +
> +	/* SVE Int8 matrix multiplication instructions */
>  	RTE_CPUFLAG_SVEI8MM,
> +
> +	/* SVE FP32 floating-point matrix multiplication instructions */
>  	RTE_CPUFLAG_SVEF32MM,
> +
> +	/* SVE FP64 floating-point matrix multiplication instructions */
>  	RTE_CPUFLAG_SVEF64MM,
> +
> +	/* SVE BFloat16 instructions */
>  	RTE_CPUFLAG_SVEBF16,
> +
> +	/* 64 bit execution state of the Arm architecture */
>  	RTE_CPUFLAG_AARCH64,
>
>  	/* WFET and WFIT instructions */
> --
> 2.34.1
>
Re: [PATCH v3 2/4] config/arm: adds Arm Neoverse N3 SoC
> On Jul 15, 2024, at 5:53 PM, Wathsala Vithanage wrote:
>
> Add Arm Neoverse N3 part number to build configuration.
>
> Signed-off-by: Wathsala Vithanage
> ---
> config/arm/meson.build | 22 +-
> 1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/config/arm/meson.build b/config/arm/meson.build
> index 012935d5d7..8018218b76 100644
> --- a/config/arm/meson.build
> +++ b/config/arm/meson.build
> @@ -116,6 +116,18 @@ part_number_config_arm = {
> ['RTE_MAX_LCORE', 144],
> ['RTE_MAX_NUMA_NODES', 2]
> ]
> +},
> +'0xd8e': {
> +'march': 'armv8.7-a',

Should the above be 'armv9-a'?

> +'march_features': ['sve2'],
> +'fallback_march': 'armv8.5-a',
> +'flags': [
> +['RTE_MACHINE', '"neoverse-n3"'],
> +['RTE_ARM_FEATURE_ATOMICS', true],
> +['RTE_ARM_FEATURE_WFXT', true],
> +['RTE_MAX_LCORE', 192],
> +['RTE_MAX_NUMA_NODES', 2]
> +]
> }
> }
> implementer_arm = {
> @@ -572,6 +584,13 @@ soc_n2 = {
> 'numa': false
> }
>
> +soc_n3 = {
> +'description': 'Arm Neoverse N3',
> +'implementer': '0x41',
> +'part_number': '0xd8e',
> +'numa': false
> +}
> +
> soc_odyssey = {
> 'description': 'Marvell Odyssey',
> 'implementer': '0x41',
> @@ -699,6 +718,7 @@ socs = {
> 'kunpeng930': soc_kunpeng930,
> 'n1sdp': soc_n1sdp,
> 'n2': soc_n2,
> +'n3': soc_n3,
> 'odyssey' : soc_odyssey,
> 'stingray': soc_stingray,
> 'thunderx2': soc_thunderx2,
> @@ -852,7 +872,7 @@ if update_flags
> if part_number_config.get('force_march', false)
> candidate_march = part_number_config['march']
> else
> -supported_marchs = ['armv9-a', 'armv8.6-a', 'armv8.5-a', 'armv8.4-a', 'armv8.3-a',
> +supported_marchs = ['armv9-a', 'armv8.7-a', 'armv8.6-a', 'armv8.5-a', 'armv8.4-a', 'armv8.3-a',
> 'armv8.2-a', 'armv8.1-a', 'armv8-a']
> check_compiler_support = false
> foreach supported_march: supported_marchs
> --
> 2.34.1
>
Re: [PATCH v7 01/21] net/ntnic: add ethdev and makes PMD available
Hi Patrick,

Thanks for the link to commit `a6c3ec342ee1`. However, I cannot check out this commit in the next-net repository because the commit ID does not exist in the repository's history. Could you find it? Commit `a6c3ec342ee1` seems to be nearly identical to commit `62edcfd6ea3c`, except for a minor change in the subject, as follows:
https://git.dpdk.org/next/dpdk-next-net/commit/?id=62edcfd6ea3c61639df68e4a315046d09f462e8c

I am able to check out commit `62edcfd6ea3c` in the next-net repository, but I fail to apply this series on it because the series conflicts with commit `a40ac9bcd85c`. Despite all this, I can apply the series successfully on the artifact of commit `a6c3ec342ee1`:
https://git.dpdk.org/next/dpdk-next-net/snapshot/dpdk-next-net-a6c3ec342ee105e322ffdb21e810cdfd38455c62.zip

Loongson lab always tries to update its local repository from upstream before testing a series, and then just applies the downloaded series to the local repository. So it is possible that Loongson lab skips some commits when it chooses the base to apply the series on.

Best regards,
Min Zhou

On Mon, Jul 15, 2024 at 3:39 PM, Patrick Robb wrote:

Hi Min Zhou,

I am seeing that commit for next-net:
https://git.dpdk.org/next/dpdk-next-net/commit/?id=a6c3ec342ee105e322ffdb21e810cdfd38455c62

If you try to manually apply it on next-net, does it work? Pasting the logs from our apply process below for context:

```
Trying to checkout branch: origin/next-net-for-main
Checked out to next-net-for-main (a6c3ec342ee105e322ffdb21e810cdfd38455c62)
Applying patch...
Applying: net/ntnic: add ethdev and makes PMD available
Applying: net/ntnic: add logging implementation
Applying: net/ntnic: add minimal initialization for PCI device
Applying: net/ntnic: add NT utilities implementation
Applying: net/ntnic: add VFIO module
Applying: net/ntnic: add basic eth dev ops to ntnic
Applying: net/ntnic: add core platform structures
Applying: net/ntnic: add adapter initialization
Applying: net/ntnic: add registers and FPGA model for NapaTech NIC
Applying: net/ntnic: add FPGA modules for initialization
Applying: net/ntnic: add FPGA initialization functionality
Applying: net/ntnic: add support of the NT200A0X smartNIC
Applying: net/ntnic: add startup and reset sequence for NT200A0X
Applying: net/ntnic: add clock profile for the NT200A0X smartNIC
Applying: net/ntnic: add link management skeleton
Applying: net/ntnic: add link 100G module ops
Applying: net/ntnic: add generic NIM and I2C modules
Applying: net/ntnic: add QSFP support
Applying: net/ntnic: add QSFP28 support
Applying: net/ntnic: add GPIO communication for NIMs
Applying: net/ntnic: add physical layer control module
Running test build...
The Meson build system
```
RE: [PATCH v1] test/crypto: remove unused stats in test setup
> Subject: [PATCH v1] test/crypto: remove unused stats in test setup
>
> Remove unused stats in test setup.
>
> Coverity issue: 373869
> Fixes: 2c6dab9cd93 ("test/crypto: add RSA and Mod tests")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Gowrishankar Muthukrishnan

Acked-by: Anoob Joseph
RE: [PATCH v1] test/crypto: fix asymmetric capability test
> Subject: [PATCH v1] test/crypto: fix asymmetric capability test
>
> Fix asymmetric capability test for below:
> * Skip test if asymmetric crypto feature is not supported by device.
> * Assert return value of RTE function to get asymmetric capability.
>
> Coverity issue: 373365
> Fixes: 2c6dab9cd93 ("test/crypto: add RSA and Mod tests")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Gowrishankar Muthukrishnan

Acked-by: Anoob Joseph
RE: [PATCH v1] test/crypto: fix comparison function for modex values
> Subject: [PATCH v1] test/crypto: fix comparison function for modex values
>
> Fix comparison function used by modex test to check from first non-zero value
> itself.
>
> Coverity issue: 430125
> Fixes: 2162d32c1c3 ("test/crypto: validate modex from first non-zero")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Gowrishankar Muthukrishnan

Acked-by: Anoob Joseph
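
For readers who have not seen the test, the idea of comparing "from first non-zero value" can be sketched as below. This is only an illustration under stated assumptions, not the code from the patch: it assumes the modex results are big-endian byte arrays that may carry leading zero padding, and the names `skip_leading_zeros` and `modex_result_cmp` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Advance *buf past leading zero bytes and return the remaining
 * (significant) length. Assumes a big-endian byte array.
 */
static size_t
skip_leading_zeros(const uint8_t **buf, size_t len)
{
	while (len > 0 && **buf == 0) {
		(*buf)++;
		len--;
	}
	return len;
}

/*
 * Hypothetical helper: return 0 when both buffers encode the same
 * integer once leading zeros are ignored, -1 otherwise.
 */
static int
modex_result_cmp(const uint8_t *a, size_t a_len,
		const uint8_t *b, size_t b_len)
{
	a_len = skip_leading_zeros(&a, a_len);
	b_len = skip_leading_zeros(&b, b_len);

	if (a_len != b_len)
		return -1;
	if (a_len == 0)
		return 0; /* both values are zero */

	return memcmp(a, b, a_len) == 0 ? 0 : -1;
}
```

With this sketch, `{0x00, 0x00, 0x12, 0x34}` and `{0x12, 0x34}` compare as equal, while `{0x12, 0x35}` does not.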
[PATCH] app/testpmd: improve sse based macswap
The goal of this patch is to improve SSE macswap on x86_64 by reducing stalls in the backend engine. The original implementation of SSE macswap performs multiple loads, shuffles and stores in each loop iteration. By interleaving the SIMD instructions we can reduce the stalls caused by:
 - load SSE token exhaustion
 - shuffle and load dependency

Other changes that improve packets per second are:
 - filling access to MBUF for offload flags, which is on a separate cacheline
 - using the register keyword

Build test using meson script:
 build-gcc-static
 buildtools
 build-gcc-shared
 build-mini
 build-clang-static
 build-clang-shared
 build-x86-generic

Test Results:

Platform-1: AMD EPYC SIENA 8594P @2.3GHz, no boost

TEST IO 64B: baseline
 - mellanox CX-7 2*200Gbps : 42.0
 - intel E810 1*100Gbps : 82.0
 - intel E810 2*200Gbps (2CQ-DA2): 82.45

TEST MACSWAP 64B:
 - mellanox CX-7 2*200Gbps : 31.533 : 31.90
 - intel E810 1*100Gbps : 50.380 : 47.0
 - intel E810 2*200Gbps (2CQ-DA2): 48.840 : 49.827

TEST MACSWAP 128B:
 - mellanox CX-7 2*200Gbps: 30.946 : 31.770
 - intel E810 1*100Gbps: 49.386 : 46.366
 - intel E810 2*200Gbps (2CQ-DA2): 47.979 : 49.503

TEST MACSWAP 256B:
 - mellanox CX-7 2*200Gbps: 32.480 : 33.150
 - intel E810 1 * 100Gbps: 45.29 : 44.571
 - intel E810 2 * 200Gbps (2CQ-DA2): 45.033 : 45.117

Platform-2: AMD EPYC 9554 @3.1GHz, no boost

TEST IO 64B: baseline
 - intel E810 2*200Gbps (2CQ-DA2): 82.49

TEST MACSWAP: 1Q 1C1T
 64B: : 45.0 : 45.54
 128B: : 44.48 : 44.43
 256B: : 42.0 : 41.99
+
TEST MACSWAP: 2Q 2C2T
 64B: : 59.5 : 60.55
 128B: : 56.78 : 58.1
 256B: : 41.85 : 41.99

Signed-off-by: Vipin Varghese
---
 app/test-pmd/macswap_sse.h | 37 +
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/app/test-pmd/macswap_sse.h b/app/test-pmd/macswap_sse.h
index 223f87a539..6e4ed21924 100644
--- a/app/test-pmd/macswap_sse.h
+++ b/app/test-pmd/macswap_sse.h
@@ -16,16 +16,16 @@ do_macswap(struct rte_mbuf *pkts[], uint16_t nb,
 	uint64_t ol_flags;
 	int i;
 	int r;
-	__m128i addr0, addr1, addr2, addr3;
+	register __m128i addr0, addr1, addr2, addr3;
 	/**
 	 * shuffle mask be used to shuffle the 16 bytes.
 	 * byte 0-5 wills be swapped with byte 6-11.
 	 * byte 12-15 will keep unchanged.
 	 */
-	__m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12,
-					5, 4, 3, 2,
-					1, 0, 11, 10,
-					9, 8, 7, 6);
+	register const __m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12,
+							5, 4, 3, 2,
+							1, 0, 11, 10,
+							9, 8, 7, 6);
 	ol_flags = ol_flags_init(txp->dev_conf.txmode.offloads);
 	vlan_qinq_set(pkts, nb, ol_flags,
@@ -44,23 +44,24 @@ do_macswap(struct rte_mbuf *pkts[], uint16_t nb,
 		mb[0] = pkts[i++];
 		eth_hdr[0] = rte_pktmbuf_mtod(mb[0], struct rte_ether_hdr *);
-		addr0 = _mm_loadu_si128((__m128i *)eth_hdr[0]);
-
 		mb[1] = pkts[i++];
 		eth_hdr[1] = rte_pktmbuf_mtod(mb[1], struct rte_ether_hdr *);
-		addr1 = _mm_loadu_si128((__m128i *)eth_hdr[1]);
-
-
 		mb[2] = pkts[i++];
 		eth_hdr[2] = rte_pktmbuf_mtod(mb[2], struct rte_ether_hdr *);
-		addr2 = _mm_loadu_si128((__m128i *)eth_hdr[2]);
-
 		mb[3] = pkts[i++];
 		eth_hdr[3] = rte_pktmbuf_mtod(mb[3], struct rte_ether_hdr *);
-		addr3 = _mm_loadu_si128((__m128i *)eth_hdr[3]);
+		/* Interleave loads and shuffle with field set */
+		addr0 = _mm_loadu_si128((__m128i *)eth_hdr[0]);
+		mbuf_field_set(mb[0], ol_flags);
+		addr1 = _mm_loadu_si128((__m128i *)eth_hdr[1]);
+		mbuf_field_set(mb[1], ol_flags);
 		addr0 = _mm_shuffle_epi8(addr0, shfl_msk);
+		addr2 = _mm_loadu_si128((__m128i *)eth_hdr[2]);
+		mbuf_field_set(mb[2], ol_flags);
 		addr1 = _mm_shuffle_epi8(addr1, shfl_msk);
+		addr3 = _mm_loadu_si128((__m128i *)eth_hdr[3]);
+		mbuf_field_set(mb[3], ol_flags);
 		addr2 = _mm_shuffle_epi8(addr2, shfl_msk);
 		addr3 = _mm_shuffle_epi8(addr3, shfl_msk);
@@ -69,25 +70,21 @@ do_macswap(struct rte_mbuf *pkts[], uint16_t nb,
 		_mm_storeu_si128(
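
The effect of the shuffle mask in the patch above can be illustrated with a small scalar sketch. This is not the SSE code from the patch: `shuffle16` is a hypothetical plain-C stand-in for `_mm_shuffle_epi8`, and `macswap_first16` is an illustrative name. Note that `_mm_set_epi8()` lists byte 15 first, so the mask is written here in memory order (byte 0 first).

```c
#include <stdint.h>
#include <string.h>

/*
 * The patch's mask in memory order (byte 0 first). Each entry names the
 * source byte that lands in that output position.
 */
static const uint8_t shfl_msk[16] = {
	6, 7, 8, 9, 10, 11,	/* output bytes 0-5 take the source MAC */
	0, 1, 2, 3, 4, 5,	/* output bytes 6-11 take the destination MAC */
	12, 13, 14, 15		/* bytes 12-15 stay where they are */
};

/*
 * Scalar stand-in for _mm_shuffle_epi8: dst[i] = src[msk[i] & 0x0f].
 * Valid here because every mask byte has bit 7 clear (no zeroing).
 */
static void
shuffle16(uint8_t dst[16], const uint8_t src[16], const uint8_t msk[16])
{
	for (int i = 0; i < 16; i++)
		dst[i] = src[msk[i] & 0x0f];
}

/* Swap destination and source MAC in the first 16 bytes of a frame. */
static void
macswap_first16(uint8_t hdr[16])
{
	uint8_t out[16];

	shuffle16(out, hdr, shfl_msk);
	memcpy(hdr, out, sizeof(out));
}
```

Applied to a header whose destination MAC is `d0:d1:d2:d3:d4:d5` and source MAC is `50:51:52:53:54:55`, the two addresses trade places while the EtherType and the following two bytes are untouched.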
Re: [PATCH 2/3] examples/l3fwd: fix return value on rules add
On 15-07-2024 15:44, Gagandeep Singh wrote:

fix return value on adding the EM or LPM rules.

Fixes: e7e6dd643092 ("examples/l3fwd: support config file for EM")
Fixes: 52def963fc1c ("examples/l3fwd: support config file for LPM/FIB")
Cc: sean.morris...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Gagandeep Singh

Acked-by: Hemant Agrawal