Re: [dpdk-dev] [PATCH v4 2/5] eal: add lcore accessors
On Thu, May 30, 2019 at 12:51 AM Stephen Hemminger <
step...@networkplumber.org> wrote:

> On Thu, 30 May 2019 00:46:30 +0200
> Thomas Monjalon wrote:
>
> > 23/05/2019 15:58, David Marchand:
> > > From: Stephen Hemminger
> > >
> > > The fields of the internal EAL core configuration are currently
> > > laid bare as part of the API. This is not good practice and limits
> > > fixing issues with layout and sizes.
> > >
> > > Make new accessor functions for the fields used by current drivers
> > > and examples.
> > [...]
> > > +DPDK_19.08 {
> > > +	global:
> > > +
> > > +	rte_lcore_cpuset;
> > > +	rte_lcore_index;
> > > +	rte_lcore_to_cpu_id;
> > > +	rte_lcore_to_socket_id;
> > > +
> > > +} DPDK_19.05;
> > > +
> > >  EXPERIMENTAL {
> > >  	global:
> >
> > Just to make sure, are we OK to introduce these functions
> > as non-experimental?
>
> They were in previous releases as inlines this patch converts them
> to real functions.
>

Well, yes and no.

rte_lcore_index and rte_lcore_to_socket_id already existed, so making them
part of the ABI is fine for me.

rte_lcore_to_cpu_id is new but seems quite safe in how it can be used,
adding it to the ABI is ok for me.

rte_lcore_cpuset is new too, and still a bit obscure to me. I am not really
convinced we need it until I understand why dpaa2 and fslmc bus need to
know about this.
I might need more time to look at it, so flag this as experimental sounds
fair to me.

-- 
David Marchand
Re: [dpdk-dev] [PATCH v4 2/5] eal: add lcore accessors
30/05/2019 09:31, David Marchand:
> On Thu, May 30, 2019 at 12:51 AM Stephen Hemminger <
> step...@networkplumber.org> wrote:
>
> > On Thu, 30 May 2019 00:46:30 +0200
> > Thomas Monjalon wrote:
> >
> > > 23/05/2019 15:58, David Marchand:
> > > > From: Stephen Hemminger
> > > >
> > > > The fields of the internal EAL core configuration are currently
> > > > laid bare as part of the API. This is not good practice and limits
> > > > fixing issues with layout and sizes.
> > > >
> > > > Make new accessor functions for the fields used by current drivers
> > > > and examples.
> > > [...]
> > > > +DPDK_19.08 {
> > > > +	global:
> > > > +
> > > > +	rte_lcore_cpuset;
> > > > +	rte_lcore_index;
> > > > +	rte_lcore_to_cpu_id;
> > > > +	rte_lcore_to_socket_id;
> > > > +
> > > > +} DPDK_19.05;
> > > > +
> > > >  EXPERIMENTAL {
> > > >  	global:
> > >
> > > Just to make sure, are we OK to introduce these functions
> > > as non-experimental?
> >
> > They were in previous releases as inlines this patch converts them
> > to real functions.
> >
>
> Well, yes and no.
>
> rte_lcore_index and rte_lcore_to_socket_id already existed, so making them
> part of the ABI is fine for me.
>
> rte_lcore_to_cpu_id is new but seems quite safe in how it can be used,
> adding it to the ABI is ok for me.

It is used by DPAA and some test.
I guess adding as experimental is fine too?
I'm fine with both options, I'm just trying to apply the policy
we agreed on. Does this case deserve an exception?

> rte_lcore_cpuset is new too, and still a bit obscure to me. I am not really
> convinced we need it until I understand why dpaa2 and fslmc bus need to
> know about this.
> I might need more time to look at it, so flag this as experimental sounds
> fair to me.
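
For readers following the ABI discussion in this thread, the general idea of hiding per-lcore fields behind accessor functions can be sketched roughly as follows. This is only an illustration: the `demo_` names, `DEMO_MAX_LCORE` and the table contents are assumptions made up for the example, and it is not the EAL implementation from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_LCORE 4 /* hypothetical limit, for illustration only */

/* Private layout: in a library this struct would stay out of public headers. */
struct demo_lcore_config {
	int core_index;         /* position among enabled lcores, -1 if disabled */
	unsigned int cpu_id;    /* CPU backing this lcore */
	unsigned int socket_id; /* NUMA node of that CPU */
};

/* Made-up sample data standing in for the internal configuration. */
static const struct demo_lcore_config demo_cfg[DEMO_MAX_LCORE] = {
	{ 0, 2, 0 }, { 1, 3, 0 }, { -1, 0, 0 }, { 2, 10, 1 },
};

/* Accessors: callers never touch the struct, so its layout can change freely. */
int demo_lcore_index(unsigned int lcore_id)
{
	if (lcore_id >= DEMO_MAX_LCORE)
		return -1;
	return demo_cfg[lcore_id].core_index;
}

int demo_lcore_to_cpu_id(unsigned int lcore_id)
{
	if (lcore_id >= DEMO_MAX_LCORE)
		return -1;
	return (int)demo_cfg[lcore_id].cpu_id;
}

unsigned int demo_lcore_to_socket_id(unsigned int lcore_id)
{
	if (lcore_id >= DEMO_MAX_LCORE)
		return 0; /* arbitrary fallback chosen for this sketch */
	return demo_cfg[lcore_id].socket_id;
}
```

The trade-off debated above follows from this shape: once a function is exported outside the EXPERIMENTAL block, its signature becomes part of the ABI, while the struct behind it can still be resized or reordered.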
Re: [dpdk-dev] [PATCH 25/25] eal: hide shared memory config
On 29-May-19 5:40 PM, Stephen Hemminger wrote:
> On Wed, 29 May 2019 17:31:11 +0100
> Anatoly Burakov wrote:
>
> > +static inline void
> > +rte_eal_mcfg_wait_complete(struct rte_mem_config *mcfg)
> > +{
> > +	/* wait until shared mem_config finish initialising */
> > +	while (mcfg->magic != RTE_MAGIC)
> > +		rte_pause();
> > +}
>
> Not fast path, why is this inline?

I kept existing function. Have no preference one way or the other, can
change in V2.

> > +#endif // EAL_MEMCFG_H
>
> Avoid C++ style comments.

Will fix.

-- 
Thanks,
Anatoly
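
As a rough illustration of the review comment, a wait-for-initialisation helper that is off the fast path can be an ordinary out-of-line function. The sketch below is standalone C11 and does not use the EAL: `demo_mem_config`, `DEMO_MAGIC`, `demo_pause` and the atomic accesses are all assumptions for the example, not what the patch does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_MAGIC 0x12345678u /* hypothetical marker value */

struct demo_mem_config {
	_Atomic uint32_t magic; /* written last by the initialising process */
};

/* Stand-in for a CPU pause hint; deliberately a no-op in this sketch. */
static void demo_pause(void)
{
}

/* Out-of-line wait; returns how many times it had to spin. */
unsigned long demo_mcfg_wait_complete(struct demo_mem_config *mcfg)
{
	unsigned long spins = 0;

	while (atomic_load_explicit(&mcfg->magic,
			memory_order_acquire) != DEMO_MAGIC) {
		demo_pause();
		spins++;
	}
	return spins;
}

/* Called by the initialising side once the shared config is ready. */
void demo_mcfg_complete(struct demo_mem_config *mcfg)
{
	atomic_store_explicit(&mcfg->magic, DEMO_MAGIC, memory_order_release);
}
```

Since the function runs once during startup, the call overhead of a real function is negligible, which is the point behind "not fast path".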
[dpdk-dev] [PATCH v1 0/7] add multiple cores feature to test-compress-perf
This patchset adds a multi-core feature to the compression perf tool.
All structures have been aligned and are consistent with the crypto
perf tool. Each test case has a constructor, runner and destructor,
and can use multiple cores and compression devices at the same time.

Tomasz Jozwiak (7):
  app/test-compress-perf: add weak functions for multi-cores test
  app/test-compress-perf: add ptest command line option
  app/test-compress-perf: add verification test case
  app/test-compress-perf: add benchmark test case
  doc: update dpdk-test-compress-perf description
  app/test-compress-perf: add force process termination
  doc: update release notes for 19.08

 app/test-compress-perf/Makefile                   |   1 +
 app/test-compress-perf/comp_perf.h                |  61 +++
 app/test-compress-perf/comp_perf_options.h        |  46 +-
 app/test-compress-perf/comp_perf_options_parse.c  |  58 +-
 app/test-compress-perf/comp_perf_test_benchmark.c | 152 --
 app/test-compress-perf/comp_perf_test_benchmark.h |  25 +-
 app/test-compress-perf/comp_perf_test_common.c    | 285 ++
 app/test-compress-perf/comp_perf_test_common.h    |  41 ++
 app/test-compress-perf/comp_perf_test_verify.c    | 136 +++--
 app/test-compress-perf/comp_perf_test_verify.h    |  24 +-
 app/test-compress-perf/main.c                     | 630 ++
 app/test-compress-perf/meson.build                |   3 +-
 doc/guides/rel_notes/release_19_08.rst            |   3 +
 doc/guides/tools/comp_perf.rst                    |  34 +-
 14 files changed, 1033 insertions(+), 466 deletions(-)
 create mode 100644 app/test-compress-perf/comp_perf.h
 create mode 100644 app/test-compress-perf/comp_perf_test_common.c
 create mode 100644 app/test-compress-perf/comp_perf_test_common.h

-- 
2.7.4
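
The constructor/runner/destructor arrangement described in this cover letter can be sketched as a small dispatch table. This is a standalone approximation: the `demo_` names, the context struct and the trivial runner are assumptions for illustration, and the weak-function fallback mentioned in patch 1/7 is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_options {
	uint32_t num_iter; /* hypothetical: iterations the runner performs */
};

typedef void *(*demo_constructor_t)(uint8_t dev_id, uint16_t qp_id,
		struct demo_options *opts);
typedef int (*demo_runner_t)(void *ctx);
typedef void (*demo_destructor_t)(void *ctx);

struct demo_test {
	demo_constructor_t constructor;
	demo_runner_t runner;
	demo_destructor_t destructor;
};

/* Per-(device, queue pair) state owned by one worker. */
struct demo_ctx {
	uint8_t dev_id;
	uint16_t qp_id;
	struct demo_options *opts;
	uint32_t done;
};

static void *demo_bench_constructor(uint8_t dev_id, uint16_t qp_id,
		struct demo_options *opts)
{
	struct demo_ctx *ctx = calloc(1, sizeof(*ctx));

	if (ctx != NULL) {
		ctx->dev_id = dev_id;
		ctx->qp_id = qp_id;
		ctx->opts = opts;
	}
	return ctx;
}

static int demo_bench_runner(void *arg)
{
	struct demo_ctx *ctx = arg;
	uint32_t i;

	/* Real work (enqueue/dequeue bursts) would happen here. */
	for (i = 0; i < ctx->opts->num_iter; i++)
		ctx->done++;
	return 0;
}

static void demo_bench_destructor(void *arg)
{
	free(arg);
}

enum demo_test_type { DEMO_TEST_BENCHMARK, DEMO_TEST_MAX };

static const struct demo_test demo_testmap[DEMO_TEST_MAX] = {
	[DEMO_TEST_BENCHMARK] = {
		demo_bench_constructor,
		demo_bench_runner,
		demo_bench_destructor
	},
};

/* Build, run and tear down one test instance for one queue pair. */
int demo_run_one(enum demo_test_type type, uint8_t dev_id, uint16_t qp_id,
		struct demo_options *opts, uint32_t *iterations_done)
{
	const struct demo_test *t = &demo_testmap[type];
	void *ctx = t->constructor(dev_id, qp_id, opts);
	int ret;

	if (ctx == NULL)
		return -1;
	ret = t->runner(ctx);
	if (iterations_done != NULL)
		*iterations_done = ((struct demo_ctx *)ctx)->done;
	t->destructor(ctx);
	return ret;
}
```

Keeping all per-test state in the context returned by the constructor is what lets several cores run the same test type against different queue pairs without sharing buffers.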
[dpdk-dev] [PATCH v1 1/7] app/test-compress-perf: add weak functions for multi-cores test
This patch adds template functions for the multi-core performance
version of compress-perf-tool.

Signed-off-by: Tomasz Jozwiak
---
 app/test-compress-perf/Makefile                  |   3 +-
 app/test-compress-perf/comp_perf.h               |  61 +++
 app/test-compress-perf/comp_perf_options.h       |  45 +-
 app/test-compress-perf/comp_perf_options_parse.c |  24 +-
 app/test-compress-perf/comp_perf_test_common.c   | 285 +++
 app/test-compress-perf/comp_perf_test_common.h   |  41 ++
 app/test-compress-perf/main.c                    | 624 ++-
 app/test-compress-perf/meson.build               |   3 +-
 8 files changed, 685 insertions(+), 401 deletions(-)
 create mode 100644 app/test-compress-perf/comp_perf.h
 create mode 100644 app/test-compress-perf/comp_perf_test_common.c
 create mode 100644 app/test-compress-perf/comp_perf_test_common.h

diff --git a/app/test-compress-perf/Makefile b/app/test-compress-perf/Makefile
index d20e17e..de74129 100644
--- a/app/test-compress-perf/Makefile
+++ b/app/test-compress-perf/Makefile
@@ -12,7 +12,6 @@ CFLAGS += -O3
 # all source are stored in SRCS-y
 SRCS-y := main.c
 SRCS-y += comp_perf_options_parse.c
-SRCS-y += comp_perf_test_verify.c
-SRCS-y += comp_perf_test_benchmark.c
+SRCS-y += comp_perf_test_common.c
 
 include $(RTE_SDK)/mk/rte.app.mk
diff --git a/app/test-compress-perf/comp_perf.h b/app/test-compress-perf/comp_perf.h
new file mode 100644
index 000..144ad8a
--- /dev/null
+++ b/app/test-compress-perf/comp_perf.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation
+ */
+
+#ifndef _COMP_PERF_
+#define _COMP_PERF_
+
+#include
+
+struct comp_test_data;
+
+typedef void *(*cperf_constructor_t)(
+		uint8_t dev_id,
+		uint16_t qp_id,
+		struct comp_test_data *options);
+
+typedef int (*cperf_runner_t)(void *test_ctx);
+typedef void (*cperf_destructor_t)(void *test_ctx);
+
+struct cperf_test {
+	cperf_constructor_t constructor;
+	cperf_runner_t runner;
+	cperf_destructor_t destructor;
+};
+
+/* Needed for weak functions*/
+
+void *
+cperf_benchmark_test_constructor(uint8_t dev_id __rte_unused,
+		uint16_t qp_id __rte_unused,
+		struct comp_test_data *options __rte_unused);
+
+void
+cperf_benchmark_test_destructor(void *arg __rte_unused);
+
+int
+cperf_benchmark_test_runner(void *test_ctx __rte_unused);
+
+void *
+cperf_verify_test_constructor(uint8_t dev_id __rte_unused,
+		uint16_t qp_id __rte_unused,
+		struct comp_test_data *options __rte_unused);
+
+void
+cperf_verify_test_destructor(void *arg __rte_unused);
+
+int
+cperf_verify_test_runner(void *test_ctx __rte_unused);
+
+void *
+cperf_pmd_cyclecount_test_constructor(uint8_t dev_id __rte_unused,
+		uint16_t qp_id __rte_unused,
+		struct comp_test_data *options __rte_unused);
+
+void
+cperf_pmd_cyclecount_test_destructor(void *arg __rte_unused);
+
+int
+cperf_pmd_cyclecount_test_runner(void *test_ctx __rte_unused);
+
+#endif /* _COMP_PERF_ */
diff --git a/app/test-compress-perf/comp_perf_options.h b/app/test-compress-perf/comp_perf_options.h
index f87751d..79e63d5 100644
--- a/app/test-compress-perf/comp_perf_options.h
+++ b/app/test-compress-perf/comp_perf_options.h
@@ -13,6 +13,24 @@
 #define MAX_MBUF_DATA_SIZE (UINT16_MAX - RTE_PKTMBUF_HEADROOM)
 #define MAX_SEG_SIZE ((int)(MAX_MBUF_DATA_SIZE / EXPANSE_RATIO))
 
+extern const char *cperf_test_type_strs[];
+
+/* Cleanup state machine */
+enum cleanup_st {
+	ST_CLEAR = 0,
+	ST_TEST_DATA,
+	ST_COMPDEV,
+	ST_INPUT_DATA,
+	ST_MEMORY_ALLOC,
+	ST_DURING_TEST
+};
+
+enum cperf_perf_test_type {
+	CPERF_TEST_TYPE_BENCHMARK,
+	CPERF_TEST_TYPE_VERIFY,
+	CPERF_TEST_TYPE_PMDCC
+};
+
 enum comp_operation {
 	COMPRESS_ONLY,
 	DECOMPRESS_ONLY,
@@ -30,37 +48,26 @@ struct range_list {
 struct comp_test_data {
 	char driver_name[64];
 	char input_file[64];
-	struct rte_mbuf **comp_bufs;
-	struct rte_mbuf **decomp_bufs;
-	uint32_t total_bufs;
+	enum cperf_perf_test_type test;
+
 	uint8_t *input_data;
 	size_t input_data_sz;
-	uint8_t *compressed_data;
-	uint8_t *decompressed_data;
-	struct rte_mempool *comp_buf_pool;
-	struct rte_mempool *decomp_buf_pool;
-	struct rte_mempool *op_pool;
-	int8_t cdev_id;
+	uint16_t nb_qps;
 	uint16_t seg_sz;
 	uint16_t out_seg_sz;
 	uint16_t burst_sz;
 	uint32_t pool_sz;
 	uint32_t num_iter;
 	uint16_t max_sgl_segs;
+
 	enum rte_comp_huffman huffman_enc;
 	enum comp_operation test_op;
 	int window_sz;
-	struct range_list level;
-	/* Store TSC duration for all levels (including level 0) */
-	uint64_t comp_tsc_durat
[dpdk-dev] [PATCH v1 3/7] app/test-compress-perf: add verification test case
This patch adds a verification part to compression-perf-tool as
a separate test case, which can be executed multi-threaded.

Signed-off-by: Tomasz Jozwiak
---
 app/test-compress-perf/Makefile                |   1 +
 app/test-compress-perf/comp_perf_test_verify.c | 122 ++---
 app/test-compress-perf/comp_perf_test_verify.h |  24 -
 app/test-compress-perf/main.c                  |   1 +
 app/test-compress-perf/meson.build             |   1 +
 5 files changed, 112 insertions(+), 37 deletions(-)

diff --git a/app/test-compress-perf/Makefile b/app/test-compress-perf/Makefile
index de74129..f54d9a4 100644
--- a/app/test-compress-perf/Makefile
+++ b/app/test-compress-perf/Makefile
@@ -12,6 +12,7 @@ CFLAGS += -O3
 # all source are stored in SRCS-y
 SRCS-y := main.c
 SRCS-y += comp_perf_options_parse.c
+SRCS-y += comp_perf_test_verify.c
 SRCS-y += comp_perf_test_common.c
 
 include $(RTE_SDK)/mk/rte.app.mk
diff --git a/app/test-compress-perf/comp_perf_test_verify.c b/app/test-compress-perf/comp_perf_test_verify.c
index 28a0fe8..c2aab70 100644
--- a/app/test-compress-perf/comp_perf_test_verify.c
+++ b/app/test-compress-perf/comp_perf_test_verify.c
@@ -8,14 +8,48 @@
 #include
 
 #include "comp_perf_test_verify.h"
+#include "comp_perf_test_common.h"
+
+void
+cperf_verify_test_destructor(void *arg)
+{
+	if (arg) {
+		comp_perf_free_memory(&((struct cperf_verify_ctx *)arg)->mem);
+		rte_free(arg);
+	}
+}
+
+void *
+cperf_verify_test_constructor(uint8_t dev_id, uint16_t qp_id,
+		struct comp_test_data *options)
+{
+	struct cperf_verify_ctx *ctx = NULL;
+
+	ctx = rte_malloc(NULL, sizeof(struct cperf_verify_ctx), 0);
+
+	if (ctx != NULL) {
+		ctx->mem.dev_id = dev_id;
+		ctx->mem.qp_id = qp_id;
+		ctx->options = options;
+
+		if (!comp_perf_allocate_memory(ctx->options, &ctx->mem) &&
+				!prepare_bufs(ctx->options, &ctx->mem))
+			return ctx;
+	}
+
+	cperf_verify_test_destructor(ctx);
+	return NULL;
+}
 
 static int
-main_loop(struct comp_test_data *test_data, uint8_t level,
-	enum rte_comp_xform_type type,
-	uint8_t *output_data_ptr,
-	size_t *output_data_sz)
+main_loop(struct cperf_verify_ctx *ctx, enum rte_comp_xform_type type)
 {
-	uint8_t dev_id = test_data->cdev_id;
+	struct comp_test_data *test_data = ctx->options;
+	uint8_t *output_data_ptr;
+	size_t *output_data_sz;
+	struct cperf_mem_resources *mem = &ctx->mem;
+
+	uint8_t dev_id = mem->dev_id;
 	uint32_t i, iter, num_iter;
 	struct rte_comp_op **ops, **deq_ops;
 	void *priv_xform = NULL;
@@ -33,7 +67,7 @@ main_loop(struct comp_test_data *test_data, uint8_t level,
 	}
 
 	ops = rte_zmalloc_socket(NULL,
-		2 * test_data->total_bufs * sizeof(struct rte_comp_op *),
+		2 * mem->total_bufs * sizeof(struct rte_comp_op *),
 		0, rte_socket_id());
 
 	if (ops == NULL) {
@@ -42,7 +76,7 @@ main_loop(struct comp_test_data *test_data, uint8_t level,
 		return -1;
 	}
 
-	deq_ops = &ops[test_data->total_bufs];
+	deq_ops = &ops[mem->total_bufs];
 
 	if (type == RTE_COMP_COMPRESS) {
 		xform = (struct rte_comp_xform) {
@@ -50,14 +84,16 @@ main_loop(struct comp_test_data *test_data, uint8_t level,
 			.compress = {
 				.algo = RTE_COMP_ALGO_DEFLATE,
 				.deflate.huffman = test_data->huffman_enc,
-				.level = level,
+				.level = test_data->level,
 				.window_size = test_data->window_sz,
 				.chksum = RTE_COMP_CHECKSUM_NONE,
 				.hash_algo = RTE_COMP_HASH_ALGO_NONE
 			}
 		};
-		input_bufs = test_data->decomp_bufs;
-		output_bufs = test_data->comp_bufs;
+		output_data_ptr = ctx->mem.compressed_data;
+		output_data_sz = &ctx->comp_data_sz;
+		input_bufs = mem->decomp_bufs;
+		output_bufs = mem->comp_bufs;
 		out_seg_sz = test_data->out_seg_sz;
 	} else {
 		xform = (struct rte_comp_xform) {
@@ -69,8 +105,10 @@ main_loop(struct comp_test_data *test_data, uint8_t level,
 				.hash_algo = RTE_COMP_HASH_ALGO_NONE
 			}
 		};
-		input_bufs = test_data->comp_bufs;
-		output_bufs = test_data->decomp_bufs;
+		output_data_ptr = ctx->mem.decompressed_data;
+		output_data_sz = &ctx->decomp_data_sz;
+		input_bufs = mem->comp_bufs;
+		output_bufs = mem->decomp_bufs;
 		out_
[dpdk-dev] [PATCH v1 2/7] app/test-compress-perf: add ptest command line option
This patch adds the --ptest option, which makes it possible to choose
the test case from the command line.

Signed-off-by: Tomasz Jozwiak
---
 app/test-compress-perf/comp_perf_options_parse.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/app/test-compress-perf/comp_perf_options_parse.c b/app/test-compress-perf/comp_perf_options_parse.c
index bc4b98a..07672b2 100644
--- a/app/test-compress-perf/comp_perf_options_parse.c
+++ b/app/test-compress-perf/comp_perf_options_parse.c
@@ -15,6 +15,7 @@
 
 #include "comp_perf_options.h"
 
+#define CPERF_PTEST_TYPE	("ptest")
 #define CPERF_DRIVER_NAME	("driver-name")
 #define CPERF_TEST_FILE		("input-file")
 #define CPERF_SEG_SIZE		("seg-sz")
@@ -37,6 +38,7 @@ static void
 usage(char *progname)
 {
 	printf("%s [EAL options] --\n"
+		" --ptest benchmark / verify :"
 		" --driver-name NAME: compress driver to use\n"
 		" --input-file NAME: file to compress and decompress\n"
 		" --extended-input-sz N: extend file data up to this size (default: no extension)\n"
@@ -76,6 +78,37 @@ get_str_key_id_mapping(struct name_id_map *map, unsigned int map_len,
 }
 
 static int
+parse_cperf_test_type(struct comp_test_data *test_data, const char *arg)
+{
+	struct name_id_map cperftest_namemap[] = {
+		{
+			cperf_test_type_strs[CPERF_TEST_TYPE_BENCHMARK],
+			CPERF_TEST_TYPE_BENCHMARK
+		},
+		{
+			cperf_test_type_strs[CPERF_TEST_TYPE_VERIFY],
+			CPERF_TEST_TYPE_VERIFY
+		},
+		{
+			cperf_test_type_strs[CPERF_TEST_TYPE_PMDCC],
+			CPERF_TEST_TYPE_PMDCC
+		}
+	};
+
+	int id = get_str_key_id_mapping(
+			(struct name_id_map *)cperftest_namemap,
+			RTE_DIM(cperftest_namemap), arg);
+	if (id < 0) {
+		RTE_LOG(ERR, USER1, "failed to parse test type");
+		return -1;
+	}
+
+	test_data->test = (enum cperf_perf_test_type)id;
+
+	return 0;
+}
+
+static int
 parse_uint32_t(uint32_t *value, const char *arg)
 {
 	char *end = NULL;
@@ -499,6 +532,8 @@ struct long_opt_parser {
 };
 
 static struct option lgopts[] = {
+
+	{ CPERF_PTEST_TYPE, required_argument, 0, 0 },
 	{ CPERF_DRIVER_NAME, required_argument, 0, 0 },
 	{ CPERF_TEST_FILE, required_argument, 0, 0 },
 	{ CPERF_SEG_SIZE, required_argument, 0, 0 },
@@ -517,6 +552,7 @@ static int
 comp_perf_opts_parse_long(int opt_idx, struct comp_test_data *test_data)
 {
 	struct long_opt_parser parsermap[] = {
+		{ CPERF_PTEST_TYPE, parse_cperf_test_type },
 		{ CPERF_DRIVER_NAME,	parse_driver_name },
 		{ CPERF_TEST_FILE,	parse_test_file },
 		{ CPERF_SEG_SIZE,	parse_seg_sz },
-- 
2.7.4
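
The name-to-id lookup that the --ptest parser relies on can be approximated in plain C as below. This is a simplified sketch and not the tool's code: the `demo_` identifiers are hypothetical, and only the two test names listed in the usage text ("benchmark" and "verify") are included.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum demo_test_type {
	DEMO_TEST_BENCHMARK,
	DEMO_TEST_VERIFY
};

struct demo_name_id {
	const char *name;
	int id;
};

static const struct demo_name_id demo_test_names[] = {
	{ "benchmark", DEMO_TEST_BENCHMARK },
	{ "verify", DEMO_TEST_VERIFY },
};

/* Return the id mapped to arg, or -1 when the name is unknown. */
int demo_lookup_test_type(const char *arg)
{
	size_t i;
	size_t n = sizeof(demo_test_names) / sizeof(demo_test_names[0]);

	if (arg == NULL)
		return -1;
	for (i = 0; i < n; i++) {
		if (strcmp(arg, demo_test_names[i].name) == 0)
			return demo_test_names[i].id;
	}
	return -1;
}

/* Store the parsed type in *out; leave *out untouched on failure. */
int demo_parse_test_type(enum demo_test_type *out, const char *arg)
{
	int id = demo_lookup_test_type(arg);

	if (id < 0)
		return -1;
	*out = (enum demo_test_type)id;
	return 0;
}
```

Leaving the output untouched on a failed parse keeps whatever default the caller set earlier, which matches the documented default of benchmark when the option is not given.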
[dpdk-dev] [PATCH v1 4/7] app/test-compress-perf: add benchmark test case
This patch adds a benchmark part to compression-perf-tool as
a separate test case, which can be executed multi-threaded.

Signed-off-by: Tomasz Jozwiak
---
 app/test-compress-perf/Makefile                   |   1 +
 app/test-compress-perf/comp_perf_test_benchmark.c | 139 --
 app/test-compress-perf/comp_perf_test_benchmark.h |  25 +++-
 app/test-compress-perf/main.c                     |   1 +
 app/test-compress-perf/meson.build                |   1 +
 5 files changed, 129 insertions(+), 38 deletions(-)

diff --git a/app/test-compress-perf/Makefile b/app/test-compress-perf/Makefile
index f54d9a4..d1a6820 100644
--- a/app/test-compress-perf/Makefile
+++ b/app/test-compress-perf/Makefile
@@ -13,6 +13,7 @@ CFLAGS += -O3
 SRCS-y := main.c
 SRCS-y += comp_perf_options_parse.c
 SRCS-y += comp_perf_test_verify.c
+SRCS-y += comp_perf_test_benchmark.c
 SRCS-y += comp_perf_test_common.c
 
 include $(RTE_SDK)/mk/rte.app.mk
diff --git a/app/test-compress-perf/comp_perf_test_benchmark.c b/app/test-compress-perf/comp_perf_test_benchmark.c
index 5752906..9b0b146 100644
--- a/app/test-compress-perf/comp_perf_test_benchmark.c
+++ b/app/test-compress-perf/comp_perf_test_benchmark.c
@@ -10,11 +10,45 @@
 
 #include "comp_perf_test_benchmark.h"
 
+void
+cperf_benchmark_test_destructor(void *arg)
+{
+	if (arg) {
+		comp_perf_free_memory(
+			&((struct cperf_benchmark_ctx *)arg)->ver.mem);
+		rte_free(arg);
+	}
+}
+
+void *
+cperf_benchmark_test_constructor(uint8_t dev_id, uint16_t qp_id,
+		struct comp_test_data *options)
+{
+	struct cperf_benchmark_ctx *ctx = NULL;
+
+	ctx = rte_malloc(NULL, sizeof(struct cperf_benchmark_ctx), 0);
+
+	if (ctx != NULL) {
+		ctx->ver.mem.dev_id = dev_id;
+		ctx->ver.mem.qp_id = qp_id;
+		ctx->ver.options = options;
+		ctx->ver.silent = 1; /* ver. part will be silent */
+
+		if (!comp_perf_allocate_memory(ctx->ver.options, &ctx->ver.mem)
+			&& !prepare_bufs(ctx->ver.options, &ctx->ver.mem))
+			return ctx;
+	}
+
+	cperf_benchmark_test_destructor(ctx);
+	return NULL;
+}
+
 static int
-main_loop(struct comp_test_data *test_data, uint8_t level,
-	enum rte_comp_xform_type type)
+main_loop(struct cperf_benchmark_ctx *ctx, enum rte_comp_xform_type type)
 {
-	uint8_t dev_id = test_data->cdev_id;
+	struct comp_test_data *test_data = ctx->ver.options;
+	struct cperf_mem_resources *mem = &ctx->ver.mem;
+	uint8_t dev_id = mem->dev_id;
 	uint32_t i, iter, num_iter;
 	struct rte_comp_op **ops, **deq_ops;
 	void *priv_xform = NULL;
@@ -31,7 +65,7 @@ main_loop(struct comp_test_data *test_data, uint8_t level,
 	}
 
 	ops = rte_zmalloc_socket(NULL,
-		2 * test_data->total_bufs * sizeof(struct rte_comp_op *),
+		2 * mem->total_bufs * sizeof(struct rte_comp_op *),
 		0, rte_socket_id());
 
 	if (ops == NULL) {
@@ -40,7 +74,7 @@ main_loop(struct comp_test_data *test_data, uint8_t level,
 		return -1;
 	}
 
-	deq_ops = &ops[test_data->total_bufs];
+	deq_ops = &ops[mem->total_bufs];
 
 	if (type == RTE_COMP_COMPRESS) {
 		xform = (struct rte_comp_xform) {
@@ -48,14 +82,14 @@ main_loop(struct comp_test_data *test_data, uint8_t level,
 			.compress = {
 				.algo = RTE_COMP_ALGO_DEFLATE,
 				.deflate.huffman = test_data->huffman_enc,
-				.level = level,
+				.level = test_data->level,
 				.window_size = test_data->window_sz,
 				.chksum = RTE_COMP_CHECKSUM_NONE,
 				.hash_algo = RTE_COMP_HASH_ALGO_NONE
 			}
 		};
-		input_bufs = test_data->decomp_bufs;
-		output_bufs = test_data->comp_bufs;
+		input_bufs = mem->decomp_bufs;
+		output_bufs = mem->comp_bufs;
 		out_seg_sz = test_data->out_seg_sz;
 	} else {
 		xform = (struct rte_comp_xform) {
@@ -67,8 +101,8 @@ main_loop(struct comp_test_data *test_data, uint8_t level,
 				.hash_algo = RTE_COMP_HASH_ALGO_NONE
 			}
 		};
-		input_bufs = test_data->comp_bufs;
-		output_bufs = test_data->decomp_bufs;
+		input_bufs = mem->comp_bufs;
+		output_bufs = mem->decomp_bufs;
 		out_seg_sz = test_data->seg_sz;
 	}
 
@@ -82,13 +116,13 @@ main_loop(struct comp_test_data *test_data, uint8_t level,
 
 	uint64_t tsc_start, tsc_end, tsc_duration;
 
-	tsc_start = tsc_end = tsc_duration = 0;
-	tsc_start = r
[dpdk-dev] [PATCH v1 5/7] doc: update dpdk-test-compress-perf description
This patch updates the dpdk-test-compress-perf documentation.

Signed-off-by: Tomasz Jozwiak
---
 doc/guides/tools/comp_perf.rst | 34 +++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/doc/guides/tools/comp_perf.rst b/doc/guides/tools/comp_perf.rst
index 52869c1..71eef18 100644
--- a/doc/guides/tools/comp_perf.rst
+++ b/doc/guides/tools/comp_perf.rst
@@ -6,7 +6,9 @@ dpdk-test-compress-perf Tool
 
 The ``dpdk-test-compress-perf`` tool is a Data Plane Development Kit (DPDK)
 utility that allows measuring performance parameters of PMDs available in the
-compress tree. The tool reads the data from a file (--input-file),
+compress tree. User can use multiple cores to run tests on but only
+one type of compression PMD can be measured during single application
+execution. The tool reads the data from a file (--input-file),
 dumps all the file into a buffer and fills out the data of input mbufs,
 which are passed to compress device with compression operations. Then,
 the output buffers are fed into the decompression stage, and the resulting
@@ -26,9 +28,35 @@ Limitations
 
 * Stateful operation is not supported in this version.
 
+EAL Options
+~~~~~~~~~~~
+
+The following are the EAL command-line options that can be used in conjunction
+with the ``dpdk-test-compress-perf`` application.
+See the DPDK Getting Started Guides for more information on these options.
+
+*   ``-c `` or ``-l ``
+
+    Set the hexadecimal bitmask of the cores to run on. The corelist is a
+    list cores to use.
+
+.. Note::
+
+    One lcore is needed for process admin, tests are run on all other cores.
+    To run tests on two lcores, three lcores must be passed to the tool.
+
+*   ``-w ``
+
+    Add a PCI device in white list.
+
+*   ``--vdev ``
+
+    Add a virtual device.
+
+Appication Options
+~~~~~~~~~~~~~~~~~~
 
-Command line options
---------------------
+ ``--ptest [benchmark/verify]``: set test type (default: benchmark)
 
  ``--driver-name NAME``: compress driver to use
 
-- 
2.7.4
Re: [dpdk-dev] [PATCH 00/25] Make shared memory config non-public
On 29-May-19 9:11 PM, David Marchand wrote:
> On Wed, May 29, 2019 at 6:31 PM Anatoly Burakov wrote:
>
> > This patchset removes the shared memory config from public
> > API, and replaces all usages of said config with new API
> > calls.
> >
> > The patchset is mostly a search-and-replace job and should
> > be pretty easy to review. However, the changes to ENA
>
> I went and did the same job with some scripts.
> Not sure you really need to split in all those patches. We are not going
> to backport this.

The "separate commits" thing is made for the benefit of reviewers, not
backporters. In my experience it's much easier to get a maintainer to
review a smaller patch than it is to look through a wall of irrelevant
changes. That said, for trivial changes such as these, maybe this is
indeed unnecessary.

> Some changes are mixed, the kni changes are in the hash: patch.

Oops, will fix, thanks for pointing it out!

> I spotted a missed qlock in :
> lib/librte_eal/common/eal_common_tailqs.c:
>     rte_rwlock_read_lock(&mcfg->qlock);
> lib/librte_eal/common/eal_common_tailqs.c:
>     rte_rwlock_read_unlock(&mcfg->qlock);
>
> On the names of the functions, could we have something shorter ?
> The prefix rte_eal_mcfg_ is not necessary from my pov.

I can drop the mcfg, but IMO all of these locking functions should be
kept under one namespace, and rte_eal_ is too broad.

> > driver are of particular interest, because they're using
> > the shared memory config in a way that i find confusing.
>
> I thought the same when I looked at it before.
> Hopefully the ena maintainers will enlight us :-).
>
> > I tried to implement the equivalent changes as well as i
> > could, but since the code doesn't make any sense to me, i
> > would really like to request help from ENA maintainers.
> >
> > Everything else should be pretty straightforward.
>
> We are missing the deprecation notice removal at the end of the series
> and a note in the release notes.

Will add. Making into V1 deadline was higher priority :D

> Thanks Anatoly!

-- 
Thanks,
Anatoly
[dpdk-dev] [PATCH v1 6/7] app/test-compress-perf: add force process termination
This patch adds the ability to force a controlled process termination
in response to two signals: SIGTERM and SIGINT.

Signed-off-by: Tomasz Jozwiak
---
 app/test-compress-perf/comp_perf_options.h        |  1 +
 app/test-compress-perf/comp_perf_test_benchmark.c | 13 
 app/test-compress-perf/comp_perf_test_verify.c    | 14 
 app/test-compress-perf/main.c                     | 26 +--
 4 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/app/test-compress-perf/comp_perf_options.h b/app/test-compress-perf/comp_perf_options.h
index 79e63d5..534212d 100644
--- a/app/test-compress-perf/comp_perf_options.h
+++ b/app/test-compress-perf/comp_perf_options.h
@@ -68,6 +68,7 @@ struct comp_test_data {
 	double ratio;
 
 	enum cleanup_st cleanup;
+	int perf_comp_force_stop;
 };
 
 int
diff --git a/app/test-compress-perf/comp_perf_test_benchmark.c b/app/test-compress-perf/comp_perf_test_benchmark.c
index 9b0b146..b38b33c 100644
--- a/app/test-compress-perf/comp_perf_test_benchmark.c
+++ b/app/test-compress-perf/comp_perf_test_benchmark.c
@@ -183,6 +183,9 @@ main_loop(struct cperf_benchmark_ctx *ctx, enum rte_comp_xform_type type)
 					ops[op_id]->private_xform = priv_xform;
 				}
 
+				if (unlikely(test_data->perf_comp_force_stop))
+					goto end;
+
 				num_enq = rte_compressdev_enqueue_burst(dev_id,
 								mem->qp_id, ops,
 								num_ops);
@@ -241,6 +244,9 @@ main_loop(struct cperf_benchmark_ctx *ctx, enum rte_comp_xform_type type)
 
 		/* Dequeue the last operations */
 		while (total_deq_ops < total_ops) {
+			if (unlikely(test_data->perf_comp_force_stop))
+				goto end;
+
 			num_deq = rte_compressdev_dequeue_burst(dev_id,
 							   mem->qp_id,
 							   deq_ops,
@@ -305,6 +311,13 @@ main_loop(struct cperf_benchmark_ctx *ctx, enum rte_comp_xform_type type)
 	rte_mempool_put_bulk(mem->op_pool, (void **)ops, allocated);
 	rte_compressdev_private_xform_free(dev_id, priv_xform);
 	rte_free(ops);
+
+	if (test_data->perf_comp_force_stop) {
+		RTE_LOG(ERR, USER1,
+			"lcore: %d Perf. test has been aborted by user\n",
+			mem->lcore_id);
+		res = -1;
+	}
 	return res;
 }
 
diff --git a/app/test-compress-perf/comp_perf_test_verify.c b/app/test-compress-perf/comp_perf_test_verify.c
index c2aab70..b2cd7a0 100644
--- a/app/test-compress-perf/comp_perf_test_verify.c
+++ b/app/test-compress-perf/comp_perf_test_verify.c
@@ -187,6 +187,9 @@ main_loop(struct cperf_verify_ctx *ctx, enum rte_comp_xform_type type)
 					ops[op_id]->private_xform = priv_xform;
 				}
 
+				if (unlikely(test_data->perf_comp_force_stop))
+					goto end;
+
 				num_enq = rte_compressdev_enqueue_burst(dev_id,
 								mem->qp_id, ops,
 								num_ops);
@@ -267,6 +270,9 @@ main_loop(struct cperf_verify_ctx *ctx, enum rte_comp_xform_type type)
 
 		/* Dequeue the last operations */
 		while (total_deq_ops < total_ops) {
+			if (unlikely(test_data->perf_comp_force_stop))
+				goto end;
+
 			num_deq = rte_compressdev_dequeue_burst(dev_id,
 							   mem->qp_id,
 							   deq_ops,
@@ -345,6 +351,14 @@ main_loop(struct cperf_verify_ctx *ctx, enum rte_comp_xform_type type)
 	rte_mempool_put_bulk(mem->op_pool, (void **)ops, allocated);
 	rte_compressdev_private_xform_free(dev_id, priv_xform);
 	rte_free(ops);
+
+	if (test_data->perf_comp_force_stop) {
+		RTE_LOG(ERR, USER1,
+			"lcore: %d Perf. test has been aborted by user\n",
+			mem->lcore_id);
+		res = -1;
+	}
+
 	return res;
 }
 
diff --git a/app/test-compress-perf/main.c b/app/test-compress-perf/main.c
index c8be84e..98acd02 100644
--- a/app/test-compress-perf/main.c
+++ b/app/test-compress-perf/main.c
@@ -2,6 +2,10 @@
  * Copyright(c) 2018 Intel Corporation
  */
 
+#include
+#include
+#include
+
 #include
 #include
 #include
@@ -42,6 +46,8 @@ static const struct cperf_test cperf_testmap[] = {
 	}
 };
 
+static struct comp_test_data *test_data;
+
 static int
 comp_perf_check_capabilities(struct comp_test_data *test
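
The signal-driven stop described in this patch can be illustrated with a standalone sketch. It departs from the patch in some labelled ways: the stop flag here is a file-scope `volatile sig_atomic_t` rather than an `int` field in the test data, handlers are installed with `signal()`, and every `demo_` name is made up for the example.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Set asynchronously by the handler, polled by the work loop. */
static volatile sig_atomic_t demo_force_stop;

static void demo_signal_handler(int signum)
{
	if (signum == SIGINT || signum == SIGTERM)
		demo_force_stop = 1;
}

/* Install the same handler for both termination signals. */
int demo_install_handlers(void)
{
	if (signal(SIGINT, demo_signal_handler) == SIG_ERR)
		return -1;
	if (signal(SIGTERM, demo_signal_handler) == SIG_ERR)
		return -1;
	return 0;
}

/*
 * Stand-in for a test main loop: check the flag before each burst and
 * report an aborted run with -1, so cleanup still happens in the caller.
 */
int demo_main_loop(unsigned int total_bursts, unsigned int *bursts_done)
{
	unsigned int i;

	*bursts_done = 0;
	for (i = 0; i < total_bursts; i++) {
		if (demo_force_stop)
			return -1;
		/* enqueue/dequeue of one burst would go here */
		(*bursts_done)++;
	}
	return 0;
}
```

Checking the flag only at burst boundaries keeps the handler trivial and lets the loop fall through to its normal cleanup path instead of exiting from inside the handler.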
[dpdk-dev] [PATCH v1 7/7] doc: update release notes for 19.08
Added release note entry for test-compress-perf application

Signed-off-by: Tomasz Jozwiak
---
 doc/guides/rel_notes/release_19_08.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst
index b9510f9..543e7d3 100644
--- a/doc/guides/rel_notes/release_19_08.rst
+++ b/doc/guides/rel_notes/release_19_08.rst
@@ -54,6 +54,9 @@ New Features
      Also, make sure to start the actual text at the margin.
      =
 
+* **Updated test-compress-perf tool application.**
+
+  Added multiple cores feature to compression perf tool application.
 
 Removed Items
 -------------
-- 
2.7.4
Re: [dpdk-dev] 18.11.2 (LTS) patches review and test
On 5/21/2019 3:01 PM, Kevin Traynor wrote:
> Hi all,
>
> Here is a list of patches targeted for LTS release 18.11.2.
>
> The planned date for the final release is 11th June.
>
> Please help with testing and validation of your use cases and report
> any issues/results. For the final release I will update the release
> notes with fixes and reported validations.
>
> A release candidate tarball can be found at:
>
>     https://dpdk.org/browse/dpdk-stable/tag/?id=v18.11.2-rc1
>
> These patches are located at branch 18.11 of dpdk-stable repo:
>
>     https://dpdk.org/browse/dpdk-stable/
>
> Thanks.
>
> Kevin Traynor

Hi Kevin,

I've validated with current head OVS Master and OVS 2.11.1 with VSPERF.
Tested with i40e (X710), i40eVF, ixgbe (82599ES), ixgbeVF, igb(I350) and
igbVF devices.

Following tests were conducted and passed:

* vswitch_p2p_tput: vSwitch - configure switch and execute RFC2544
  throughput test.
* vswitch_p2p_cont: vSwitch - configure switch and execute RFC2544
  continuous stream test.
* vswitch_pvp_tput: vSwitch - configure switch, vnf and execute RFC2544
  throughput test.
* vswitch_pvp_cont: vSwitch - configure switch, vnf and execute RFC2544
  continuous stream test.
* ovsdpdk_hotplug_attach: Ensure successful port-add after binding a
  device to igb_uio after ovs-vswitchd is launched.
* ovsdpdk_mq_p2p_rxqs: Setup rxqs on NIC port.
* ovsdpdk_mq_pvp_rxqs: Setup rxqs on vhost user port.
* ovsdpdk_mq_pvp_rxqs_linux_bridge: Confirm traffic received over vhost
  RXQs with Linux virtio device in guest.
* ovsdpdk_mq_pvp_rxqs_testpmd: Confirm traffic received over vhost RXQs
  with DPDK device in guest.
* ovsdpdk_vhostuser_client: Test vhost-user client mode.
* ovsdpdk_vhostuser_client_reconnect: Test vhost-user client mode
  reconnect feature.
* ovsdpdk_vhostuser_server: Test vhost-user server mode.
* ovsdpdk_vhostuser_sock_dir: Verify functionality of vhost-sock-dir flag.
* ovsdpdk_vdev_add_null_pmd: Test addition of port using the null DPDK
  PMD driver.
* ovsdpdk_vdev_del_null_pmd: Test deletion of port using the null DPDK
  PMD driver.
* ovsdpdk_vdev_add_af_packet_pmd: Test addition of port using the
  af_packet DPDK PMD driver.
* ovsdpdk_vdev_del_af_packet_pmd: Test deletion of port using the
  af_packet DPDK PMD driver.
* ovsdpdk_numa: Test vhost-user NUMA support. Vhostuser PMD threads
  should migrate to the same numa slot, where QEMU is executed.
* ovsdpdk_jumbo_p2p: Ensure that jumbo frames are received, processed
  and forwarded correctly by DPDK physical ports.
* ovsdpdk_jumbo_pvp: Ensure that jumbo frames are received, processed
  and forwarded correctly by DPDK vhost-user ports.
* ovsdpdk_jumbo_p2p_upper_bound: Ensure that jumbo frames above the
  configured Rx port's MTU are not accepted.
* ovsdpdk_jumbo_mtu_upper_bound_vport: Verify that the upper bound limit
  is enforced for OvS DPDK vhost-user ports.
* ovsdpdk_rate_p2p: Ensure when a user creates a rate limiting physical
  interface that the traffic is limited to the specified policer rate in
  a p2p setup.
* ovsdpdk_rate_pvp: Ensure when a user creates a rate limiting vHost
  User interface that the traffic is limited to the specified policer
  rate in a pvp setup.
* ovsdpdk_qos_p2p: In a p2p setup, ensure when a QoS egress policer is
  created that the traffic is limited to the specified rate.
* ovsdpdk_qos_pvp: In a pvp setup, ensure when a QoS egress policer is
  created that the traffic is limited to the specified rate.
* phy2phy_scalability: LTD.Scalability.Flows.RFC2544.0PacketLoss
* phy2phy_scalability_cont: Phy2Phy Scalability Continuous Stream
* pvp_cont: PVP Continuous Stream
* pvvp_cont: PVVP Continuous Stream
* pvpv_cont: Two VMs in parallel with Continuous Stream

Regards
Ian
[dpdk-dev] [PATCH v2 0/3] add more features for AF_XDP pmd
This patch series mainly includes 3 new features for AF_XDP pmd. They
are separate, independent features; the reason I put them in one
patchset is that they have code dependencies.

1. zero copy

This patch enables `zero copy` between af_xdp umem and mbuf by using the
external mbuf mechanism.

2. multi-queue

With multi-queue support, one AF_XDP pmd instance can use multiple netdev
queues.

3. busy-poll

With busy-poll, all processing occurs on a single core, so performance is
better from a per-core perspective.

This patch depends on busy-poll support on the kernel side, which is
currently at the RFC stage [1].

[1] https://www.spinics.net/lists/netdev/msg568337.html

V2 changes:

1. improve multi-queue support by getting the ethtool channel, so the
   driver is able to get a reasonable maximum queue number.
2. add a tiny cleanup patch to get rid of an unused struct member
3. remove the busy-poll patch as its kernel counterpart changes, will
   update the patch later

Xiaolong Ye (3):
  net/af_xdp: enable zero copy by extbuf
  net/af_xdp: add multi-queue support
  net/af_xdp: remove unused struct member

 doc/guides/nics/af_xdp.rst          |   4 +-
 drivers/net/af_xdp/rte_eth_af_xdp.c | 285
 2 files changed, 213 insertions(+), 76 deletions(-)

-- 
2.17.1
[dpdk-dev] [PATCH v2 1/3] net/af_xdp: enable zero copy by extbuf
Implement zero copy of af_xdp pmd through mbuf's external memory mechanism to achieve high performance. This patch also provides a new parameter "pmd_zero_copy" for user, so they can choose to enable zero copy of af_xdp pmd or not. To be clear, "zero copy" here is different from the "zero copy mode" of AF_XDP, it is about zero copy between af_xdp umem and mbuf used in dpdk application. Suggested-by: Varghese Vipin Suggested-by: Tummala Sivaprasad Suggested-by: Olivier Matz Signed-off-by: Xiaolong Ye --- doc/guides/nics/af_xdp.rst | 1 + drivers/net/af_xdp/rte_eth_af_xdp.c | 104 +--- 2 files changed, 79 insertions(+), 26 deletions(-) diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst index 175038153..0bd4239fe 100644 --- a/doc/guides/nics/af_xdp.rst +++ b/doc/guides/nics/af_xdp.rst @@ -28,6 +28,7 @@ The following options can be provided to set up an af_xdp port in DPDK. * ``iface`` - name of the Kernel interface to attach to (required); * ``queue`` - netdev queue id (optional, default 0); +* ``pmd_zero_copy`` - enable zero copy or not (optional, default 0); Prerequisites - diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c index 35c72272c..014cd5691 100644 --- a/drivers/net/af_xdp/rte_eth_af_xdp.c +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c @@ -70,6 +70,7 @@ struct xsk_umem_info { struct xsk_umem *umem; struct rte_ring *buf_ring; const struct rte_memzone *mz; + int pmd_zc; }; struct rx_stats { @@ -109,8 +110,8 @@ struct pmd_internals { int if_index; char if_name[IFNAMSIZ]; uint16_t queue_idx; + int pmd_zc; struct ether_addr eth_addr; - struct xsk_umem_info *umem; struct rte_mempool *mb_pool_share; struct pkt_rx_queue rx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS]; @@ -119,10 +120,12 @@ struct pmd_internals { #define ETH_AF_XDP_IFACE_ARG "iface" #define ETH_AF_XDP_QUEUE_IDX_ARG "queue" +#define ETH_AF_XDP_PMD_ZC_ARG "pmd_zero_copy" static const char * const valid_arguments[] = { ETH_AF_XDP_IFACE_ARG, ETH_AF_XDP_QUEUE_IDX_ARG, 
+ ETH_AF_XDP_PMD_ZC_ARG, NULL }; @@ -166,6 +169,15 @@ reserve_fill_queue(struct xsk_umem_info *umem, uint16_t reserve_size) return 0; } +static void +umem_buf_release_to_fq(void *addr, void *opaque) +{ + struct xsk_umem_info *umem = (struct xsk_umem_info *)opaque; + uint64_t umem_addr = (uint64_t)addr - umem->mz->addr_64; + + rte_ring_enqueue(umem->buf_ring, (void *)umem_addr); +} + static uint16_t eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) { @@ -175,6 +187,7 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) struct xsk_ring_prod *fq = &umem->fq; uint32_t idx_rx = 0; uint32_t free_thresh = fq->size >> 1; + int pmd_zc = umem->pmd_zc; struct rte_mbuf *mbufs[ETH_AF_XDP_RX_BATCH_SIZE]; unsigned long dropped = 0; unsigned long rx_bytes = 0; @@ -197,19 +210,29 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) uint64_t addr; uint32_t len; void *pkt; + uint16_t buf_len = ETH_AF_XDP_FRAME_SIZE; + struct rte_mbuf_ext_shared_info *shinfo; desc = xsk_ring_cons__rx_desc(rx, idx_rx++); addr = desc->addr; len = desc->len; pkt = xsk_umem__get_data(rxq->umem->mz->addr, addr); - rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *), pkt, len); + if (pmd_zc) { + shinfo = rte_pktmbuf_ext_shinfo_init_helper(pkt, + &buf_len, umem_buf_release_to_fq, umem); + + rte_pktmbuf_attach_extbuf(mbufs[i], pkt, 0, buf_len, + shinfo); + } else { + rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *), + pkt, len); + rte_ring_enqueue(umem->buf_ring, (void *)addr); + } rte_pktmbuf_pkt_len(mbufs[i]) = len; rte_pktmbuf_data_len(mbufs[i]) = len; rx_bytes += len; bufs[i] = mbufs[i]; - - rte_ring_enqueue(umem->buf_ring, (void *)addr); } xsk_ring_cons__release(rx, rcvd); @@ -262,12 +285,21 @@ kick_tx(struct pkt_tx_queue *txq) pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE); } +static inline bool +in_umem_range(struct xsk_umem_info *umem, uint64_t addr) +{ + uint64_t mz_base_addr = umem->mz->addr_64; + + return addr >= mz_base_addr && addr < mz_base_addr + 
umem->mz->len; +} + static uint16_t eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
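Note on the address arithmetic in this patch: the zero-copy path relies on two small calculations, converting a buffer's virtual address back to a umem offset when the mbuf is released (as in `umem_buf_release_to_fq`) and checking whether an address lies inside the umem memzone (as in `in_umem_range`). The following is only a minimal standalone sketch of that arithmetic. The `umem_region` struct and both function names are hypothetical stand-ins for the DPDK memzone and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the memzone that backs the umem. */
struct umem_region {
	uint64_t base; /* virtual address of the first umem byte */
	uint64_t len;  /* size of the umem area in bytes */
};

/* True when addr falls inside the half-open range [base, base + len). */
static inline bool
region_contains(const struct umem_region *r, uint64_t addr)
{
	return addr >= r->base && addr < r->base + r->len;
}

/* Convert a virtual address inside the region to an offset from its base. */
static inline uint64_t
region_addr_to_offset(const struct umem_region *r, uint64_t addr)
{
	assert(region_contains(r, addr));
	return addr - r->base;
}
```

The upper bound is exclusive, matching the `addr < mz_base_addr + umem->mz->len` comparison in the patch.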
[dpdk-dev] [PATCH v2 2/3] net/af_xdp: add multi-queue support
This patch adds two parameters `start_queue` and `queue_count` to specify the range of netdev queues used by AF_XDP pmd. Signed-off-by: Xiaolong Ye --- doc/guides/nics/af_xdp.rst | 3 +- drivers/net/af_xdp/rte_eth_af_xdp.c | 194 2 files changed, 141 insertions(+), 56 deletions(-) diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst index 0bd4239fe..18defcda3 100644 --- a/doc/guides/nics/af_xdp.rst +++ b/doc/guides/nics/af_xdp.rst @@ -27,7 +27,8 @@ Options The following options can be provided to set up an af_xdp port in DPDK. * ``iface`` - name of the Kernel interface to attach to (required); -* ``queue`` - netdev queue id (optional, default 0); +* ``start_queue`` - starting netdev queue id (optional, default 0); +* ``queue_count`` - total netdev queue number (optional, default 1); * ``pmd_zero_copy`` - enable zero copy or not (optional, default 0); Prerequisites diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c index 014cd5691..f56aabcae 100644 --- a/drivers/net/af_xdp/rte_eth_af_xdp.c +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c @@ -12,6 +12,8 @@ #include #include #include +#include +#include #include "af_xdp_deps.h" #include @@ -57,12 +59,12 @@ static int af_xdp_logtype; #define ETH_AF_XDP_NUM_BUFFERS 4096 #define ETH_AF_XDP_DATA_HEADROOM 0 #define ETH_AF_XDP_DFLT_NUM_DESCS XSK_RING_CONS__DEFAULT_NUM_DESCS -#define ETH_AF_XDP_DFLT_QUEUE_IDX 0 +#define ETH_AF_XDP_DFLT_START_QUEUE_IDX0 +#define ETH_AF_XDP_DFLT_QUEUE_COUNT1 #define ETH_AF_XDP_RX_BATCH_SIZE 32 #define ETH_AF_XDP_TX_BATCH_SIZE 32 -#define ETH_AF_XDP_MAX_QUEUE_PAIRS 16 struct xsk_umem_info { struct xsk_ring_prod fq; @@ -88,7 +90,7 @@ struct pkt_rx_queue { struct rx_stats stats; struct pkt_tx_queue *pair; - uint16_t queue_idx; + int xsk_queue_idx; }; struct tx_stats { @@ -103,28 +105,34 @@ struct pkt_tx_queue { struct tx_stats stats; struct pkt_rx_queue *pair; - uint16_t queue_idx; + int xsk_queue_idx; }; struct pmd_internals { int if_index; char 
if_name[IFNAMSIZ]; - uint16_t queue_idx; + int start_queue_idx; + int queue_cnt; + int max_queue_cnt; + int combined_queue_cnt; + int pmd_zc; struct ether_addr eth_addr; struct rte_mempool *mb_pool_share; - struct pkt_rx_queue rx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS]; - struct pkt_tx_queue tx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS]; + struct pkt_rx_queue *rx_queues; + struct pkt_tx_queue *tx_queues; }; #define ETH_AF_XDP_IFACE_ARG "iface" -#define ETH_AF_XDP_QUEUE_IDX_ARG "queue" +#define ETH_AF_XDP_START_QUEUE_ARG "start_queue" +#define ETH_AF_XDP_QUEUE_COUNT_ARG "queue_count" #define ETH_AF_XDP_PMD_ZC_ARG "pmd_zero_copy" static const char * const valid_arguments[] = { ETH_AF_XDP_IFACE_ARG, - ETH_AF_XDP_QUEUE_IDX_ARG, + ETH_AF_XDP_START_QUEUE_ARG, + ETH_AF_XDP_QUEUE_COUNT_ARG, ETH_AF_XDP_PMD_ZC_ARG, NULL }; @@ -394,8 +402,8 @@ eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) dev_info->if_index = internals->if_index; dev_info->max_mac_addrs = 1; dev_info->max_rx_pktlen = ETH_FRAME_LEN; - dev_info->max_rx_queues = 1; - dev_info->max_tx_queues = 1; + dev_info->max_rx_queues = internals->queue_cnt; + dev_info->max_tx_queues = internals->queue_cnt; dev_info->min_mtu = ETHER_MIN_MTU; dev_info->max_mtu = ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_DATA_HEADROOM; @@ -412,21 +420,23 @@ eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats) struct pmd_internals *internals = dev->data->dev_private; struct xdp_statistics xdp_stats; struct pkt_rx_queue *rxq; + struct pkt_tx_queue *txq; socklen_t optlen; int i, ret; for (i = 0; i < dev->data->nb_rx_queues; i++) { optlen = sizeof(struct xdp_statistics); rxq = &internals->rx_queues[i]; - stats->q_ipackets[i] = internals->rx_queues[i].stats.rx_pkts; - stats->q_ibytes[i] = internals->rx_queues[i].stats.rx_bytes; + txq = rxq->pair; + stats->q_ipackets[i] = rxq->stats.rx_pkts; + stats->q_ibytes[i] = rxq->stats.rx_bytes; - stats->q_opackets[i] = internals->tx_queues[i].stats.tx_pkts; - stats->q_obytes[i] = 
internals->tx_queues[i].stats.tx_bytes; + stats->q_opackets[i] = txq->stats.tx_pkts; + stats->q_obytes[i] = txq->stats.tx_bytes; stats->ipackets += stats->q_ipackets[i]; stats->ibytes += stats->q_ibytes[i]; - stats->imissed += internals->rx_queue
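Note on the new `start_queue` and `queue_count` parameters: the patch text above is cut off before any argument validation is visible, so the following is only a guess at the kind of range check such parameters usually need. The function name and the exact rules are assumptions, not code from the patch. It accepts a queue range only when it is non-empty and fits below a maximum queue count (which the cover letter says is obtained from the ethtool channel information).

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical check: accept a queue range only if it is non-empty and
 * lies entirely within [0, max_queue_cnt).
 * Returns 0 on success and -EINVAL otherwise.
 */
static int
check_queue_range(int start_queue_idx, int queue_cnt, int max_queue_cnt)
{
	if (start_queue_idx < 0 || queue_cnt <= 0)
		return -EINVAL;
	/* Written as a subtraction so start + count cannot overflow. */
	if (queue_cnt > max_queue_cnt - start_queue_idx)
		return -EINVAL;
	return 0;
}
```

For example, with four available queues, `start_queue=2,queue_count=2` would pass this check while `start_queue=3,queue_count=2` would not.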
[dpdk-dev] [PATCH v2 3/3] net/af_xdp: remove unused struct member
Fixes: f1debd77efaf ("net/af_xdp: introduce AF_XDP PMD") Cc: sta...@dpdk.org Signed-off-by: Xiaolong Ye --- drivers/net/af_xdp/rte_eth_af_xdp.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c index f56aabcae..fc25d245b 100644 --- a/drivers/net/af_xdp/rte_eth_af_xdp.c +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c @@ -118,7 +118,6 @@ struct pmd_internals { int pmd_zc; struct ether_addr eth_addr; - struct rte_mempool *mb_pool_share; struct pkt_rx_queue *rx_queues; struct pkt_tx_queue *tx_queues; -- 2.17.1
Re: [dpdk-dev] [PATCH 2/2] meson: make build configurable
On 29.05.2019 23:15, Michael Santana Francisco wrote: > On 5/29/19 12:39 PM, Ilya Maximets wrote: >> The first thing many developers do before start building DPDK is >> disabling all the not needed divers and libraries. This happens >> just because more than a half of DPDK dirvers and libraries are not >> needed for the particular reason. For example, you don't need >> dpaa*, octeon*, various croypto devices, eventdev, etc. if you're >> only want to build OVS for x86_64 with static linking. >> >> By disabling everything you don't need, build speeds up literally 10x >> times. This is important for CI systems. For example, TravisCI wastes >> 10 minutes for the default DPDK build just to check linking with OVS. >> >> Another thing is the binary size. Number of DPDK libraries and, >> as a result, size of resulted statically linked application decreases >> significantly. >> >> Important thing also that you're able to not install some dependencies >> if you don't have them on a target platform. Just disable libs/drivers >> that depends on it. Similar thing for the glibc version mismatch >> between build and target platforms. >> >> Also, I have to note that less code means less probability of >> failures and less number of attack vectors. >> >> This patch gives 'meson' the power of configurability that we >> have with 'make'. Using new options it's possible to enable just >> what you need and nothing more. 
>> >> For example, following cmdline could be used to build almost minimal >> set of DPDK libs and drivers to check OVS build: >> >> $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false \ >> -Ddrivers_bus=pci,vdev \ >> -Ddrivers_mempool=ring \ >> -Ddrivers_net=null,virtio,ring \ >> -Ddrivers_crypto=virtio \ >> -Ddrivers_compress=none \ >> -Ddrivers_event=none\ >> -Ddrivers_baseband=none \ >> -Ddrivers_raw=none \ >> -Ddrivers_common=none \ >> -Dlibs=kvargs,eal,cmdline,ring,mempool,mbuf,net,meter,\ >>ethdev,pci,hash,cryptodev,pdump,vhost \ >> -Dapps=none >> >> Adding a few real net drivers will give configuration that can be used >> in production environment. >> >> Looks not very pretty, but this could be moved to a script. >> >> Build details: >> >> Build targets in project: 57 >> >> $ time ninja >> real0m11,528s >> user1m4,137s >> sys 0m4,935s >> >> $ du -sh ../dpdk_meson_install/ >> 3,5M../dpdk_meson_install/ >> >> To compare with what we have without these options: >> >> $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false >> Build targets in project: 434 >> >> $ time ninja >> real1m38,963s >> user10m18,624s >> sys 0m45,478s >> >> $ du -sh ../dpdk_meson_install/ >> 27M ../dpdk_meson_install/ >> >> 10x speed up for the user time. >> 7.7 times size decrease. >> >> This is probably not much user-friendly because it's not a Kconfig >> and dependency tracking in meson is really poor, so it requires >> usually few iterations to pick correct set of libraries to satisfy >> all dependencies. However, it's not a big deal. Options intended >> for a proficient users who knows what they need. 
>> >> Signed-off-by: Ilya Maximets >> --- >> app/meson.build | 5 + >> drivers/baseband/meson.build | 5 + >> drivers/bus/meson.build | 6 ++ >> drivers/common/meson.build | 6 ++ >> drivers/compress/meson.build | 5 + >> drivers/crypto/meson.build | 5 + >> drivers/event/meson.build| 6 ++ >> drivers/mempool/meson.build | 6 ++ >> drivers/net/meson.build | 6 ++ >> drivers/raw/meson.build | 6 ++ >> lib/meson.build | 5 + >> meson_options.txt| 22 ++ >> 12 files changed, 83 insertions(+) >> >> diff --git a/app/meson.build b/app/meson.build >> index 2b9fdef74..48972954c 100644 >> --- a/app/meson.build >> +++ b/app/meson.build >> @@ -17,6 +17,11 @@ apps = [ >> 'test-pipeline', >> 'test-pmd'] >> >> +enabled_apps = get_option('apps') >> +if enabled_apps != 'all' >> +apps = (enabled_apps == 'none') ? [] : enabled_apps.split(',') >> +endif >> + >> # for BSD only >> lib_execinfo = cc.find_library('execinfo', required: false) >> >> diff --git a/drivers/baseband/meson.build b/drivers/baseband/meson.build >> index 52489df35..fabc80fc2 100644 >> --- a/drivers/baseband/meson.build >> +++ b/drivers/baseband/meson.build >> @@ -3,5 +3,10 @@ >> >> drivers = ['null'] >> >> +enabled_drivers = get_option('drivers_baseband') >> +if enabled_drivers != 'all' >> +drivers = (enabled_drivers == 'none') ? [] : enabled_drivers.split(',') >> +endif >> + >> config_flag_fmt = 'RTE_LIBRTE_@0@_PMD' >> driver_name_fmt = 'rte_pmd_@0@' >> diff -
Re: [dpdk-dev] [PATCH 2/2] meson: make build configurable
On 29.05.2019 23:37, Luca Boccassi wrote: > On Wed, 2019-05-29 at 19:39 +0300, Ilya Maximets wrote: >> The first thing many developers do before start building DPDK is >> disabling all the not needed divers and libraries. This happens >> just because more than a half of DPDK dirvers and libraries are not >> needed for the particular reason. For example, you don't need >> dpaa*, octeon*, various croypto devices, eventdev, etc. if you're >> only want to build OVS for x86_64 with static linking. >> >> By disabling everything you don't need, build speeds up literally 10x >> times. This is important for CI systems. For example, TravisCI wastes >> 10 minutes for the default DPDK build just to check linking with OVS. >> >> Another thing is the binary size. Number of DPDK libraries and, >> as a result, size of resulted statically linked application decreases >> significantly. >> >> Important thing also that you're able to not install some >> dependencies >> if you don't have them on a target platform. Just disable >> libs/drivers >> that depends on it. Similar thing for the glibc version mismatch >> between build and target platforms. >> >> Also, I have to note that less code means less probability of >> failures and less number of attack vectors. >> >> This patch gives 'meson' the power of configurability that we >> have with 'make'. Using new options it's possible to enable just >> what you need and nothing more. 
>> >> For example, following cmdline could be used to build almost minimal >> set of DPDK libs and drivers to check OVS build: >> >> $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false \ >> -Ddrivers_bus=pci,vdev \ >> -Ddrivers_mempool=ring \ >> -Ddrivers_net=null,virtio,ring \ >> -Ddrivers_crypto=virtio \ >> -Ddrivers_compress=none \ >> -Ddrivers_event=none\ >> -Ddrivers_baseband=none \ >> -Ddrivers_raw=none \ >> -Ddrivers_common=none \ >> >> -Dlibs=kvargs,eal,cmdline,ring,mempool,mbuf,net,meter,\ >>ethdev,pci,hash,cryptodev,pdump,vhost \ >> -Dapps=none >> >> Adding a few real net drivers will give configuration that can be >> used >> in production environment. >> >> Looks not very pretty, but this could be moved to a script. >> >> Build details: >> >> Build targets in project: 57 >> >> $ time ninja >> real0m11,528s >> user1m4,137s >> sys 0m4,935s >> >> $ du -sh ../dpdk_meson_install/ >> 3,5M../dpdk_meson_install/ >> >> To compare with what we have without these options: >> >> $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false >> Build targets in project: 434 >> >> $ time ninja >> real1m38,963s >> user10m18,624s >> sys 0m45,478s >> >> $ du -sh ../dpdk_meson_install/ >> 27M ../dpdk_meson_install/ >> >> 10x speed up for the user time. >> 7.7 times size decrease. >> >> This is probably not much user-friendly because it's not a Kconfig >> and dependency tracking in meson is really poor, so it requires >> usually few iterations to pick correct set of libraries to satisfy >> all dependencies. However, it's not a big deal. Options intended >> for a proficient users who knows what they need. > > Hi, > > We talked about this a few times in the past, and it was actually one > of the design goals to _avoid_ replicating the octopus-like config > system of the makefiles. That's because it makes the test matrix > insanely complicated, not to mention the harm to user friendliness, > among other things. 
> > If someone doesn't want to use a PMD, they can just avoid installing it > - it's simple enough. So how can I do this? I don't think 'ninja install' has such option. Also, if you think that it is safe to skip some libs/drivers in installation process, it must be safe to not build them at all. It's just a waste of time and computational resources to build something known to be not used. And if you're going to ship DPDK libraries separately in distros, you'll have to test their different combinations anyway. If they're so independent that you don't need to test them in various combinations, than your point about test matrix is not valid. > > Sorry, but from me it's a very strong NACK. Sorry, but let me disagree with you. For me, meson configurability is the essential thing to have in terms of deprecating the 'make' build system. DPDK was and keeps being (in most cases) the library that users statically linking to a single application built for particular platform and not using for anything else. This means that user in most cases knows which parts needed and which parts will never be used. Current meson build system doesn't allow to disable anything forcing users to link with the whole bunch of unused code. One major case is that you have to have build
[dpdk-dev] Short term stable branches/releases
Hi All, A reminder that there is no longer a default in practice of having short term stable branches/releases for xx.02/05/08 DPDK master releases. Note, this is relevant for xx.02/05/08 based short term (~3 month) stables only. DPDK LTS based off xx.11 is *not* changing. This is to allow more time for maintenance and validation of master and LTS branches/releases as it seems to be where the community are most interested. There can still be short term stable branches/releases for individual xx.02/05/08 releases if there is a particular need and a commitment from community members to maintain/validate. See http://doc.dpdk.org/guides/contributing/stable.html#stable-releases for further details. thanks, Kevin.
Re: [dpdk-dev] [PATCH v4 2/5] eal: add lcore accessors
On Thu, May 30, 2019 at 09:40:08AM +0200, Thomas Monjalon wrote: > 30/05/2019 09:31, David Marchand: > > On Thu, May 30, 2019 at 12:51 AM Stephen Hemminger < > > step...@networkplumber.org> wrote: > > > > > On Thu, 30 May 2019 00:46:30 +0200 > > > Thomas Monjalon wrote: > > > > > > > 23/05/2019 15:58, David Marchand: > > > > > From: Stephen Hemminger > > > > > > > > > > The fields of the internal EAL core configuration are currently > > > > > laid bare as part of the API. This is not good practice and limits > > > > > fixing issues with layout and sizes. > > > > > > > > > > Make new accessor functions for the fields used by current drivers > > > > > and examples. > > > > [...] > > > > > +DPDK_19.08 { > > > > > + global: > > > > > + > > > > > + rte_lcore_cpuset; > > > > > + rte_lcore_index; > > > > > + rte_lcore_to_cpu_id; > > > > > + rte_lcore_to_socket_id; > > > > > + > > > > > +} DPDK_19.05; > > > > > + > > > > > EXPERIMENTAL { > > > > > global: > > > > > > > > Just to make sure, are we OK to introduce these functions > > > > as non-experimental? > > > > > > They were in previous releases as inlines this patch converts them > > > to real functions. > > > > > > > > Well, yes and no. > > > > rte_lcore_index and rte_lcore_to_socket_id already existed, so making them > > part of the ABI is fine for me. > > > > rte_lcore_to_cpu_id is new but seems quite safe in how it can be used, > > adding it to the ABI is ok for me. > > It is used by DPAA and some test. > I guess adding as experimental is fine too? > I'm fine with both options, I'm just trying to apply the policy > we agreed on. Does this case deserve an exception? > While it may be a good candidate, I'm not sure how much making an exception for it really matters. I'd be tempted to just mark it experimental and then have it stable for the 19.11 release. What do we really lose by waiting a release to stabilize it?
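For readers following the thread, the quoted patch description is about replacing direct access to the internal lcore configuration with accessor functions, so that the struct layout is no longer part of the ABI. A minimal sketch of that pattern follows. All `demo_` names and the table contents are hypothetical illustrations and are not the EAL implementation.

```c
#include <assert.h>

#define DEMO_MAX_LCORE 4

/* Internal layout: in a real library this would sit in a private header. */
struct demo_lcore_config {
	unsigned int socket_id;
	int core_index; /* -1 when the lcore is not enabled */
};

static struct demo_lcore_config demo_lcore_table[DEMO_MAX_LCORE] = {
	{ .socket_id = 0, .core_index = 0 },
	{ .socket_id = 0, .core_index = 1 },
	{ .socket_id = 1, .core_index = -1 },
	{ .socket_id = 1, .core_index = 2 },
};

/* Accessor: callers get the value without seeing the struct layout. */
int
demo_lcore_index(unsigned int lcore_id)
{
	if (lcore_id >= DEMO_MAX_LCORE)
		return -1;
	return demo_lcore_table[lcore_id].core_index;
}

/* Accessor: out-of-range ids fall back to socket 0 in this sketch. */
unsigned int
demo_lcore_to_socket_id(unsigned int lcore_id)
{
	if (lcore_id >= DEMO_MAX_LCORE)
		return 0;
	return demo_lcore_table[lcore_id].socket_id;
}
```

Once callers use only the functions, fields can be resized or reordered without breaking them, which is why the symbol-versioning question (stable versus experimental) attaches to the functions rather than to the struct.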
Re: [dpdk-dev] [PATCH 00/25] Make shared memory config non-public
On Thu, May 30, 2019 at 09:07:44AM +0100, Burakov, Anatoly wrote: > On 29-May-19 9:11 PM, David Marchand wrote: > > On Wed, May 29, 2019 at 6:31 PM Anatoly Burakov > > wrote: > > > > > This patchset removes the shared memory config from public > > > API, and replaces all usages of said config with new API > > > calls. > > > > > > The patchset is mostly a search-and-replace job and should > > > be pretty easy to review. However, the changes to ENA > > > > > > > I went and did the same job with some scripts. > > > > Not sure you really need to split in all those patches. > > We are not going to backport this. > > The "separate commits" thing is made for the benefit of reviewers, not > backporters. In my experience it's much easier to get a maintainer to review > a smaller patch than it is to look through a wall of irrelevant changes. > > That said, for trivial changes such as these, maybe this is indeed > unnecessary. > > > Some changes are mixed, the kni changes are in the hash: patch. > > Oops, will fix, thanks for pointing it out! > > > > > > > I spotted a missed qlock in : > > lib/librte_eal/common/eal_common_tailqs.c: > > rte_rwlock_read_lock(&mcfg->qlock); > > lib/librte_eal/common/eal_common_tailqs.c: > > rte_rwlock_read_unlock(&mcfg->qlock); > > > > > > On the names of the functions, could we have something shorter ? > > The prefix rte_eal_mcfg_ is not necessary from my pov. > > I can drop the mcfg, but IMO all of these locking functions should be kept > under one namespace, and rte_eal_ is too broad. > I think most/all developers are aware that memory is part of eal, so rte_mcfg_ prefix (or rte_memcfg) might work.
[dpdk-dev] [PATCH v1 2/9] net/mlx5: add log file procedure for debug data
Add a global function in the PMD which dumps debug information to specific file. The data can be printed in hexadecimal format or as regular string. The number of debug files per PMD entity should be limited by a new PMD probe parameter called max_dump_files_num. The files will be created in the /var/log directory or in the current directory. Cc: sta...@dpdk.org Signed-off-by: Matan Azrad --- doc/guides/nics/mlx5.rst | 7 +++ drivers/net/mlx5/mlx5.c | 8 drivers/net/mlx5/mlx5.h | 1 + drivers/net/mlx5/mlx5_rxtx.c | 44 drivers/net/mlx5/mlx5_rxtx.h | 2 ++ 5 files changed, 62 insertions(+) diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index 325e9f6..aa89bd9 100644 --- a/doc/guides/nics/mlx5.rst +++ b/doc/guides/nics/mlx5.rst @@ -507,6 +507,13 @@ Run-time configuration representor=[0-2] +- ``max_dump_files_num`` parameter [int] + + The maximum number of files per PMD entity that may be created for debug information. + The files will be created in /var/log directory or in current directory. + + set to 128 by default. + Firmware configuration ~~ diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index 9f5ec97..ebb49c8 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -116,6 +116,9 @@ /* Select port representors to instantiate. */ #define MLX5_REPRESENTOR "representor" +/* Device parameter to configure the maximum number of dump files per queue. 
*/ +#define MLX5_MAX_DUMP_FILES_NUM "max_dump_files_num" + #ifndef HAVE_IBV_MLX5_MOD_MPW #define MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED (1 << 2) #define MLX5DV_CONTEXT_FLAGS_ENHANCED_MPW (1 << 3) @@ -926,6 +929,8 @@ struct mlx5_dev_spawn_data { config->dv_flow_en = !!tmp; } else if (strcmp(MLX5_MR_EXT_MEMSEG_EN, key) == 0) { config->mr_ext_memseg_en = !!tmp; + } else if (strcmp(MLX5_MAX_DUMP_FILES_NUM, key) == 0) { + config->max_dump_files_num = tmp; } else { DRV_LOG(WARNING, "%s: unknown parameter", key); rte_errno = EINVAL; @@ -970,6 +975,7 @@ struct mlx5_dev_spawn_data { MLX5_DV_FLOW_EN, MLX5_MR_EXT_MEMSEG_EN, MLX5_REPRESENTOR, + MLX5_MAX_DUMP_FILES_NUM, NULL, }; struct rte_kvargs *kvlist; @@ -1433,6 +1439,8 @@ struct mlx5_dev_spawn_data { DRV_LOG(WARNING, "Multi-Packet RQ isn't supported"); config.mprq.enabled = 0; } + if (config.max_dump_files_num == 0) + config.max_dump_files_num = 128; eth_dev = rte_eth_dev_allocate(name); if (eth_dev == NULL) { DRV_LOG(ERR, "can not allocate rte ethdev"); diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index 3eaaafd..4c339d0 100644 --- a/drivers/net/mlx5/mlx5.h +++ b/drivers/net/mlx5/mlx5.h @@ -204,6 +204,7 @@ struct mlx5_dev_config { unsigned int flow_prio; /* Number of flow priorities. */ unsigned int tso_max_payload_sz; /* Maximum TCP payload for TSO. */ unsigned int ind_table_max_size; /* Maximum indirection table size. */ + unsigned int max_dump_files_num; /* Maximum dump files per queue. */ int txq_inline; /* Maximum packet size for inlining. */ int txqs_inline; /* Queue number threshold for inlining. */ int txqs_vec; /* Queue number threshold for vectorized Tx. */ diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c index 3da3f62..2c8d066 100644 --- a/drivers/net/mlx5/mlx5_rxtx.c +++ b/drivers/net/mlx5/mlx5_rxtx.c @@ -524,6 +524,50 @@ return rx_queue_count(rxq); } +#define MLX5_SYSTEM_LOG_DIR "/var/log" +/** + * Dump debug information to log file. + * + * @param fname + * The file name. 
+ * @param hex_title + * If not NULL this string is printed as a header to the output + * and the output will be in hexadecimal view. + * @param buf + * This is the buffer address to print out. + * @param hex_len + * The number of bytes to dump out. + */ +void +mlx5_dump_debug_information(const char *fname, const char *hex_title, + const void *buf, unsigned int hex_len) +{ + FILE *fd; + + MKSTR(path, "%s/%s", MLX5_SYSTEM_LOG_DIR, fname); + fd = fopen(path, "a+"); + if (!fd) { + DRV_LOG(WARNING, "cannot open %s for debug dump\n", + path); + MKSTR(path2, "./%s", fname); + fd = fopen(path2, "a+"); + if (!fd) { + DRV_LOG(ERR, "cannot open %s for debug dump\n", + path2); + return; + } + DRV_LOG(INFO, "New debug dump in file %s\n", path2); + } else { + DRV_LOG(INFO, "New debug dump in file %s\n", path); + } + if (hex_title) +
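Note on the file-open logic in this patch: the function first tries the system log directory and falls back to the current directory when that fails. The same fallback can be sketched in plain C without the driver's `MKSTR` and `DRV_LOG` macros. The helper name `open_dump_file` and its parameters are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Try "<primary_dir>/<fname>" first, then "./<fname>".
 * Returns an append-mode stream, or NULL if both attempts fail.
 * On success, path_out holds the path that was actually opened.
 */
static FILE *
open_dump_file(const char *primary_dir, const char *fname,
	       char *path_out, size_t path_len)
{
	FILE *f;
	int n;

	n = snprintf(path_out, path_len, "%s/%s", primary_dir, fname);
	if (n > 0 && (size_t)n < path_len) {
		f = fopen(path_out, "a+");
		if (f != NULL)
			return f;
	}
	/* Primary location unavailable: fall back to the current directory. */
	n = snprintf(path_out, path_len, "./%s", fname);
	if (n < 0 || (size_t)n >= path_len)
		return NULL;
	return fopen(path_out, "a+");
}
```

Returning the chosen path to the caller makes it possible to log where the dump went, as the patch does with its INFO messages.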
[dpdk-dev] [PATCH v1 0/9] mlx5: Handle data-path completions with error
Add support for data-path Rx and Tx completions with error handling:

1. Detect the error.
2. Do not crash.
3. Report it in statistics counters.
4. Dump debug information to system log file.
5. Recover the error under the hood.
6. Add support for secondary process recovery.

No performance impact was shown.

Matan Azrad (9):
  net/mlx5: remove Rx queues indexes correlation
  net/mlx5: add log file procedure for debug data
  net/mlx5: fix device arguments error detection
  net/mlx5: mitigate Rx doorbell memory barrier
  net/mlx5: separate Rx queue initialization
  net/mlx5: extend Rx completion with error handling
  net/mlx5: handle Tx completion with error
  net/mlx5: recover secondary process Rx errors
  net/mlx5: recover secondary process Tx errors

 doc/guides/nics/mlx5.rst              |   7 +
 drivers/net/mlx5/mlx5.c               |  14 +-
 drivers/net/mlx5/mlx5.h               |  12 +
 drivers/net/mlx5/mlx5_mp.c            |  46 +++
 drivers/net/mlx5/mlx5_prm.h           |  11 +
 drivers/net/mlx5/mlx5_rxq.c           |  42 +--
 drivers/net/mlx5/mlx5_rxtx.c          | 673 --
 drivers/net/mlx5/mlx5_rxtx.h          | 193 +-
 drivers/net/mlx5/mlx5_rxtx_vec.c      |   5 +-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h |  36 +-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h  |  36 +-
 drivers/net/mlx5/mlx5_trigger.c       |   1 +
 drivers/net/mlx5/mlx5_txq.c           |   4 +-
 13 files changed, 792 insertions(+), 288 deletions(-)

-- 
1.8.3.1
[dpdk-dev] [PATCH v1 3/9] net/mlx5: fix device arguments error detection
When bad device arguments are added to the DPDK command line, the PMD
ignores all the command line arguments specified by the user and uses
the default values instead.

This behavior doesn't make sense because the user's intention is to
force some device parameters, and the user expects to get an error if
there is a problem with the arguments.

Stop probing and report an error in case of problematic command line
arguments.

Fixes: e72dd09b614e ("net/mlx5: add support for configuration through kvargs")
Cc: sta...@dpdk.org

Signed-off-by: Matan Azrad
---
 drivers/net/mlx5/mlx5.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ebb49c8..23e397e 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -986,8 +986,10 @@ struct mlx5_dev_spawn_data {
 		return 0;
 	/* Following UGLY cast is done to pass checkpatch. */
 	kvlist = rte_kvargs_parse(devargs->args, params);
-	if (kvlist == NULL)
-		return 0;
+	if (kvlist == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
 	/* Process parameters. */
 	for (i = 0; (params[i] != NULL); ++i) {
 		if (rte_kvargs_count(kvlist, params[i])) {
-- 
1.8.3.1
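Note on the behavior change in this patch: a parse failure used to be treated the same as "no arguments given" (return 0, keep the defaults), and it now becomes a hard error. The difference can be sketched with a toy parser. The `demo_parse` and `demo_args_check` functions are hypothetical and only mimic the control flow; they are not `rte_kvargs_parse` or the mlx5 code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parser: accepts only non-empty "key=value" strings. */
static int
demo_parse(const char *args)
{
	const char *eq = strchr(args, '=');

	return (eq != NULL && eq != args && eq[1] != '\0') ? 0 : -1;
}

/*
 * Returns 0 when there is nothing to parse or parsing succeeds,
 * and a negative errno value when the arguments are malformed.
 */
static int
demo_args_check(const char *args)
{
	if (args == NULL)
		return 0;       /* no arguments: defaults apply */
	if (demo_parse(args) != 0)
		return -EINVAL; /* malformed: refuse instead of ignoring */
	return 0;
}
```

With this shape, the caller can stop probing on a negative return instead of silently continuing with default values.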
[dpdk-dev] [PATCH v1 1/9] net/mlx5: remove Rx queues indexes correlation
There is a full correlation between the CQE indexes and the WQE indexes in the vectorized Rx queue management. When the RQ is moved to the reset state, the correlation may break because the HW starts the RQ polling from index 0 while the CQ polling continues regularly. In preparation for CQE error handling, where the RQ can be reset, the correlation dependence should be removed from all the Rx queue index management. Remove the aforementioned dependence from the vectorized Rx burst functions. Cc: sta...@dpdk.org Signed-off-by: Matan Azrad --- drivers/net/mlx5/mlx5_rxq.c | 1 + drivers/net/mlx5/mlx5_rxtx.h | 6 +- drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 26 +- drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 26 +- 4 files changed, 32 insertions(+), 27 deletions(-) diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c index a00cb12..b248f38 100644 --- a/drivers/net/mlx5/mlx5_rxq.c +++ b/drivers/net/mlx5/mlx5_rxq.c @@ -1006,6 +1006,7 @@ struct mlx5_rxq_ibv * rxq_data->cq_uar = cq_info.cq_uar; rxq_data->cqn = cq_info.cqn; rxq_data->cq_arm_sn = 0; + rxq_data->decompressed = 0; /* Update doorbell counter. */ rxq_data->rq_ci = wqe_n >> rxq_data->sges_n; rte_wmb(); diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h index 4339aaf..7bacdba 100644 --- a/drivers/net/mlx5/mlx5_rxtx.h +++ b/drivers/net/mlx5/mlx5_rxtx.h @@ -101,11 +101,15 @@ struct mlx5_rxq_data { uint32_t rq_pi; uint32_t cq_ci; uint16_t rq_repl_thresh; /* Threshold for buffer replenishment. */ + union { + struct rxq_zip zip; /* Compressed context. */ + uint16_t decompressed; + /* Number of ready mbufs decompressed from the CQ. */ + }; struct mlx5_mr_ctrl mr_ctrl; /* MR control descriptor. */ uint16_t mprq_max_memcpy_len; /* Maximum size of packet to memcpy. */ volatile void *wqes; volatile struct mlx5_cqe(*cqes)[]; - struct rxq_zip zip; /* Compressed context. 
*/ RTE_STD_C11 union { struct rte_mbuf *(*elts)[]; diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h index 38e915c..6a1b2bb 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h @@ -352,8 +352,11 @@ * @param elts * Pointer to SW ring to be filled. The first mbuf has to be pre-built from * the title completion descriptor to be copied to the rest of mbufs. + * + * @return + * Number of mini-CQEs successfully decompressed. */ -static inline void +static inline uint16_t rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, struct rte_mbuf **elts) { @@ -505,6 +508,7 @@ rxq->stats.ibytes += rcvd_byte; #endif rxq->cq_ci += mcqe_n; + return mcqe_n; } /** @@ -729,24 +733,17 @@ rte_prefetch_non_temporal(cq + 2); rte_prefetch_non_temporal(cq + 3); pkts_n = RTE_MIN(pkts_n, MLX5_VPMD_RX_MAX_BURST); - /* -* Order of indexes: -* rq_ci >= cq_ci >= rq_pi -* Definition of indexes: -* rq_ci - cq_ci := # of buffers owned by HW (posted). -* cq_ci - rq_pi := # of buffers not returned to app (decompressed). -* N - (rq_ci - rq_pi) := # of buffers consumed (to be replenished). -*/ repl_n = q_n - (rxq->rq_ci - rxq->rq_pi); if (repl_n >= rxq->rq_repl_thresh) mlx5_rx_replenish_bulk_mbuf(rxq, repl_n); /* See if there're unreturned mbufs from compressed CQE. */ - rcvd_pkt = rxq->cq_ci - rxq->rq_pi; + rcvd_pkt = rxq->decompressed; if (rcvd_pkt > 0) { rcvd_pkt = RTE_MIN(rcvd_pkt, pkts_n); rxq_copy_mbuf_v(rxq, pkts, rcvd_pkt); rxq->rq_pi += rcvd_pkt; pkts += rcvd_pkt; + rxq->decompressed -= rcvd_pkt; } elts_idx = rxq->rq_pi & q_mask; elts = &(*rxq->elts)[elts_idx]; @@ -754,10 +751,11 @@ pkts_n = RTE_ALIGN_FLOOR(pkts_n - rcvd_pkt, MLX5_VPMD_DESCS_PER_LOOP); /* Not to cross queue end. */ pkts_n = RTE_MIN(pkts_n, q_n - elts_idx); + pkts_n = RTE_MIN(pkts_n, q_n - cq_idx); if (!pkts_n) return rcvd_pkt; /* At this point, there shouldn't be any remained packets. 
*/ - assert(rxq->rq_pi == rxq->cq_ci); + assert(rxq->decompressed == 0); /* * Note that vectors have reverse order - {v3, v2, v1, v0}, because * there's no instruction to count trailing zeros. __builtin_clzl() is @@ -1003,15 +1001,17 @@ /* Decompress the last CQE if compressed. */ if (comp_idx < MLX5_VPMD_DESCS_PER_LOOP && comp_idx == n) { assert(comp_idx == (nocmp_n % MLX5_VPMD_DESCS_PER_LOOP)); - rxq_cq_
[dpdk-dev] [PATCH v1 4/9] net/mlx5: mitigate Rx doorbell memory barrier
The RQ WQEs must be written to memory before the HW gets the RQ doorbell,
hence a memory barrier should be triggered after the WQEs are written and
before the doorbell is written.

The current code uses the rte_wmb barrier, which ensures that all memory
stores are completed, whereas the rte_cio_wmb barrier is sufficient for
local memory stores, and the WQEs reside in local memory.

Cc: sta...@dpdk.org

Signed-off-by: Matan Azrad
---
 drivers/net/mlx5/mlx5_rxq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index b248f38..282295f 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1009,7 +1009,7 @@ struct mlx5_rxq_ibv *
 	rxq_data->decompressed = 0;
 	/* Update doorbell counter. */
 	rxq_data->rq_ci = wqe_n >> rxq_data->sges_n;
-	rte_wmb();
+	rte_cio_wmb();
 	*rxq_data->rq_db = rte_cpu_to_be_32(rxq_data->rq_ci);
 	DRV_LOG(DEBUG, "port %u rxq %u updated with %p", dev->data->port_id,
 		idx, (void *)&tmpl);
-- 
1.8.3.1
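The ordering requirement described above (descriptors first, doorbell second, with a barrier in between) can be sketched in portable C11. This is only an analogy: a C11 release fence orders stores between CPU threads and does not reproduce the device-visible semantics of rte_wmb or rte_cio_wmb. The struct rx_ring, the RING_SIZE constant and the ring_post function are hypothetical names used for the illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u /* Hypothetical ring size. */

struct rx_ring {
	uint64_t wqe_addr[RING_SIZE]; /* Stand-in for the WQE data segments. */
	_Atomic uint32_t doorbell;    /* Stand-in for the RQ doorbell record. */
};

/* Fill the first n descriptors, then publish the new producer index. */
void
ring_post(struct rx_ring *r, const uint64_t *addrs, uint32_t n)
{
	assert(n <= RING_SIZE);
	for (uint32_t i = 0; i < n; ++i)
		r->wqe_addr[i] = addrs[i];
	/* Order the descriptor stores before the doorbell store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&r->doorbell, n, memory_order_relaxed);
}
```

The point of the patch is the strength of the barrier, not its position: the fence stays between the descriptor writes and the doorbell write, and only a cheaper variant is selected.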
[dpdk-dev] [PATCH v1 5/9] net/mlx5: separate Rx queue initialization
Move the RQ WQE initialization code to a separate function so that it can be reused by the CQE error recovery. Cc: sta...@dpdk.org Signed-off-by: Matan Azrad --- drivers/net/mlx5/mlx5_rxq.c | 43 ++- drivers/net/mlx5/mlx5_rxtx.c | 53 2 files changed, 55 insertions(+), 41 deletions(-) diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c index 282295f..90e8c49 100644 --- a/drivers/net/mlx5/mlx5_rxq.c +++ b/drivers/net/mlx5/mlx5_rxq.c @@ -779,7 +779,6 @@ struct mlx5_rxq_ibv * struct mlx5_rxq_ibv *tmpl; struct mlx5dv_cq cq_info; struct mlx5dv_rwq rwq; - unsigned int i; int ret = 0; struct mlx5dv_obj obj; struct mlx5_dev_config *config = &priv->config; @@ -964,53 +963,15 @@ struct mlx5_rxq_ibv * } /* Fill the rings. */ rxq_data->wqes = rwq.buf; - for (i = 0; (i != wqe_n); ++i) { - volatile struct mlx5_wqe_data_seg *scat; - uintptr_t addr; - uint32_t byte_count; - - if (mprq_en) { - struct mlx5_mprq_buf *buf = (*rxq_data->mprq_bufs)[i]; - - scat = &((volatile struct mlx5_wqe_mprq *) -rxq_data->wqes)[i].dseg; - addr = (uintptr_t)mlx5_mprq_buf_addr(buf); - byte_count = (1 << rxq_data->strd_sz_n) * -(1 << rxq_data->strd_num_n); - } else { - struct rte_mbuf *buf = (*rxq_data->elts)[i]; - - scat = &((volatile struct mlx5_wqe_data_seg *) -rxq_data->wqes)[i]; - addr = rte_pktmbuf_mtod(buf, uintptr_t); - byte_count = DATA_LEN(buf); - } - /* scat->addr must be able to store a pointer. 
*/ - assert(sizeof(scat->addr) >= sizeof(uintptr_t)); - *scat = (struct mlx5_wqe_data_seg){ - .addr = rte_cpu_to_be_64(addr), - .byte_count = rte_cpu_to_be_32(byte_count), - .lkey = mlx5_rx_addr2mr(rxq_data, addr), - }; - } rxq_data->rq_db = rwq.dbrec; rxq_data->cqe_n = log2above(cq_info.cqe_cnt); - rxq_data->cq_ci = 0; - rxq_data->consumed_strd = 0; - rxq_data->rq_pi = 0; - rxq_data->zip = (struct rxq_zip){ - .ai = 0, - }; rxq_data->cq_db = cq_info.dbrec; rxq_data->cqes = (volatile struct mlx5_cqe (*)[])(uintptr_t)cq_info.buf; rxq_data->cq_uar = cq_info.cq_uar; rxq_data->cqn = cq_info.cqn; rxq_data->cq_arm_sn = 0; - rxq_data->decompressed = 0; - /* Update doorbell counter. */ - rxq_data->rq_ci = wqe_n >> rxq_data->sges_n; - rte_cio_wmb(); - *rxq_data->rq_db = rte_cpu_to_be_32(rxq_data->rq_ci); + mlx5_rxq_initialize(rxq_data); + rxq_data->cq_ci = 0; DRV_LOG(DEBUG, "port %u rxq %u updated with %p", dev->data->port_id, idx, (void *)&tmpl); rte_atomic32_inc(&tmpl->refcnt); diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c index 2c8d066..aec0185 100644 --- a/drivers/net/mlx5/mlx5_rxtx.c +++ b/drivers/net/mlx5/mlx5_rxtx.c @@ -1831,6 +1831,59 @@ } /** + * Initialize Rx WQ and indexes. + * + * @param[in] rxq + * Pointer to RX queue structure. 
+ */ +void +mlx5_rxq_initialize(struct mlx5_rxq_data *rxq) +{ + const unsigned int wqe_n = 1 << rxq->elts_n; + unsigned int i; + + for (i = 0; (i != wqe_n); ++i) { + volatile struct mlx5_wqe_data_seg *scat; + uintptr_t addr; + uint32_t byte_count; + + if (mlx5_rxq_mprq_enabled(rxq)) { + struct mlx5_mprq_buf *buf = (*rxq->mprq_bufs)[i]; + + scat = &((volatile struct mlx5_wqe_mprq *) + rxq->wqes)[i].dseg; + addr = (uintptr_t)mlx5_mprq_buf_addr(buf); + byte_count = (1 << rxq->strd_sz_n) * + (1 << rxq->strd_num_n); + } else { + struct rte_mbuf *buf = (*rxq->elts)[i]; + + scat = &((volatile struct mlx5_wqe_data_seg *) + rxq->wqes)[i]; + addr = rte_pktmbuf_mtod(buf, uintptr_t); + byte_count = DATA_LEN(buf); + } + /* scat->addr must be able to store a pointer. */ + assert(sizeof(scat->addr) >= sizeof(uintptr_t)); + *scat = (struct mlx5_wqe_data_seg){ + .addr = rte_cpu_to_be_64(addr), + .byte_count = rte_cpu_to_be_32(byte_count), + .lkey = mlx5_rx_addr2mr(rxq, addr), +
[dpdk-dev] [PATCH v1 8/9] net/mlx5: recover secondary process Rx errors
The RQ error recovery mechanism in the PMD invokes Verbs functions to modify the RQ state in order to reset the RQ and reactivate it. These Verbs functions must not be invoked from a secondary process, hence the PMD skips the recovery when the error is detected on a secondary process queue. Using the DPDK IPC mechanism, the secondary process can request that the primary process perform the Verbs queue state modifications synchronously. Add support for secondary process Rx error recovery. Cc: sta...@dpdk.org Signed-off-by: Matan Azrad --- drivers/net/mlx5/mlx5.h | 11 + drivers/net/mlx5/mlx5_mp.c | 46 +++ drivers/net/mlx5/mlx5_rxtx.c| 98 + drivers/net/mlx5/mlx5_rxtx.h| 3 ++ drivers/net/mlx5/mlx5_trigger.c | 1 + 5 files changed, 141 insertions(+), 18 deletions(-) diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index 4c339d0..85a6d02 100644 --- a/drivers/net/mlx5/mlx5.h +++ b/drivers/net/mlx5/mlx5.h @@ -61,6 +61,13 @@ enum mlx5_mp_req_type { MLX5_MP_REQ_CREATE_MR, MLX5_MP_REQ_START_RXTX, MLX5_MP_REQ_STOP_RXTX, + MLX5_MP_REQ_QUEUE_STATE_MODIFY, +}; + +struct mlx5_mp_arg_queue_state_modify { + uint8_t is_wq; /* Set if WQ. */ + uint16_t queue_id; /* DPDK queue ID. */ + enum ibv_wq_state state; /* WQ requested state. */ }; /* Pameters for IPC. 
*/ @@ -71,6 +78,8 @@ struct mlx5_mp_param { RTE_STD_C11 union { uintptr_t addr; /* MLX5_MP_REQ_CREATE_MR */ + struct mlx5_mp_arg_queue_state_modify state_modify; + /* MLX5_MP_REQ_QUEUE_STATE_MODIFY */ } args; }; @@ -542,6 +551,8 @@ int mlx5_ctrl_flow(struct rte_eth_dev *dev, void mlx5_mp_req_stop_rxtx(struct rte_eth_dev *dev); int mlx5_mp_req_mr_create(struct rte_eth_dev *dev, uintptr_t addr); int mlx5_mp_req_verbs_cmd_fd(struct rte_eth_dev *dev); +int mlx5_mp_req_queue_state_modify(struct rte_eth_dev *dev, + struct mlx5_mp_arg_queue_state_modify *sm); void mlx5_mp_init_primary(void); void mlx5_mp_uninit_primary(void); void mlx5_mp_init_secondary(void); diff --git a/drivers/net/mlx5/mlx5_mp.c b/drivers/net/mlx5/mlx5_mp.c index cea74ad..3ccae51 100644 --- a/drivers/net/mlx5/mlx5_mp.c +++ b/drivers/net/mlx5/mlx5_mp.c @@ -85,6 +85,12 @@ res->result = 0; ret = rte_mp_reply(&mp_res, peer); break; + case MLX5_MP_REQ_QUEUE_STATE_MODIFY: + mp_init_msg(dev, &mp_res, param->type); + res->result = mlx5_queue_state_modify_primary + (dev, &param->args.state_modify); + ret = rte_mp_reply(&mp_res, peer); + break; default: rte_errno = EINVAL; DRV_LOG(ERR, "port %u invalid mp request type", @@ -271,6 +277,46 @@ } /** + * Request Verbs queue state modification to the primary process. + * + * @param[in] dev + * Pointer to Ethernet structure. + * @param sm + * State modify parameters. + * + * @return + * 0 on success, a negative errno value otherwise and rte_errno is set. 
+ */ +int +mlx5_mp_req_queue_state_modify(struct rte_eth_dev *dev, + struct mlx5_mp_arg_queue_state_modify *sm) +{ + struct rte_mp_msg mp_req; + struct rte_mp_msg *mp_res; + struct rte_mp_reply mp_rep; + struct mlx5_mp_param *req = (struct mlx5_mp_param *)mp_req.param; + struct mlx5_mp_param *res; + struct timespec ts = {.tv_sec = MLX5_MP_REQ_TIMEOUT_SEC, .tv_nsec = 0}; + int ret; + + assert(rte_eal_process_type() == RTE_PROC_SECONDARY); + mp_init_msg(dev, &mp_req, MLX5_MP_REQ_QUEUE_STATE_MODIFY); + req->args.state_modify = *sm; + ret = rte_mp_request_sync(&mp_req, &mp_rep, &ts); + if (ret) { + DRV_LOG(ERR, "port %u request to primary process failed", + dev->data->port_id); + return -rte_errno; + } + assert(mp_rep.nb_received == 1); + mp_res = &mp_rep.msgs[0]; + res = (struct mlx5_mp_param *)mp_res->param; + ret = res->result; + free(mp_rep.msgs); + return ret; +} + +/** * Request Verbs command file descriptor for mmap to the primary process. * * @param[in] dev diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c index 36e2dd3..cb3baad 100644 --- a/drivers/net/mlx5/mlx5_rxtx.c +++ b/drivers/net/mlx5/mlx5_rxtx.c @@ -2031,6 +2031,75 @@ } /** + * Modify a Verbs queue state. + * This must be called from the primary process. + * + * @param dev + * Pointer to Ethernet device. + * @param sm + * State modify request parameters. + * + * @return + * 0 in case of success else non-zero value and rte_errno is set. + */ +int +mlx5_queue_state_modify_primary(struct rte_eth_dev *dev, + const struct mlx5_mp_arg_queue_state
[dpdk-dev] [PATCH v1 6/9] net/mlx5: extend Rx completion with error handling
When WQEs are posted to the HW to receive packets, the PMD may receive a completion report with an error from the HW, also known as an error CQE, which is associated with a bad WQE. The error may be caused by a bad address, a wrong lkey, a small buffer size, etc., any of which can be misconfigured by the PMD or by the user.

Checking for every possible mistake in order to prevent error CQEs doesn't make sense because of the performance impact. Moreover, some error CQEs can be triggered by packets coming from the wire, over which the DPDK application has no control.

Most of the error CQE types move the RQ to the error state, which causes all subsequently received packets to be dropped by the HW and completed with a CQE flush error indefinitely.

The current solution detects these error CQEs and reports them to the user through the statistics error counters, but without recovery: once the RQ enters the error state, it never returns to the ready state and all subsequent packets are dropped.

Extend the error CQE handling with recovery by moving the state back to ready and rearranging all the RQ WQEs and the management variables appropriately.

Sometimes the root cause of an error CQE is very hard to debug and may even be related to corner cases that are not easily reproducible. Hence, a dump file with debug information is created for the first several error CQEs; their number can be configured through the PMD probe parameters.
Cc: sta...@dpdk.org Signed-off-by: Matan Azrad --- drivers/net/mlx5/mlx5_rxtx.c | 328 +++ drivers/net/mlx5/mlx5_rxtx.h | 101 drivers/net/mlx5/mlx5_rxtx_vec.c | 5 +- 3 files changed, 266 insertions(+), 168 deletions(-) diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c index aec0185..5369fc1 100644 --- a/drivers/net/mlx5/mlx5_rxtx.c +++ b/drivers/net/mlx5/mlx5_rxtx.c @@ -25,6 +25,7 @@ #include #include #include +#include #include "mlx5.h" #include "mlx5_utils.h" @@ -444,7 +445,7 @@ cq_ci = rxq->cq_ci; } cqe = &(*rxq->cqes)[cq_ci & cqe_cnt]; - while (check_cqe(cqe, cqe_n, cq_ci) == 0) { + while (check_cqe(cqe, cqe_n, cq_ci) != MLX5_CQE_STATUS_HW_OWN) { int8_t op_own; unsigned int n; @@ -1884,6 +1885,130 @@ } /** + * Handle a Rx error. + * The function inserts the RQ state to reset when the first error CQE is + * shown, then drains the CQ by the caller function loop. When the CQ is empty, + * it moves the RQ state to ready and initializes the RQ. + * Next CQE identification and error counting are in the caller responsibility. + * + * @param[in] rxq + * Pointer to RX queue structure. + * @param[in] mbuf_prepare + * Whether to prepare mbufs for the RQ. + * + * @return + * -1 in case of recovery error, otherwise the CQE status. 
+ */ +int +mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t mbuf_prepare) +{ + const uint16_t cqe_n = 1 << rxq->cqe_n; + const uint16_t cqe_mask = cqe_n - 1; + const unsigned int wqe_n = 1 << rxq->elts_n; + struct mlx5_rxq_ctrl *rxq_ctrl = + container_of(rxq, struct mlx5_rxq_ctrl, rxq); + struct ibv_wq_attr mod = { + .attr_mask = IBV_WQ_ATTR_STATE, + }; + union { + volatile struct mlx5_cqe *cqe; + volatile struct mlx5_err_cqe *err_cqe; + } u = { + .cqe = &(*rxq->cqes)[rxq->cq_ci & cqe_mask], + }; + int ret; + + switch (rxq->err_state) { + case MLX5_RXQ_ERR_STATE_NO_ERROR: + rxq->err_state = MLX5_RXQ_ERR_STATE_NEED_RESET; + /* Fall-through */ + case MLX5_RXQ_ERR_STATE_NEED_RESET: + if (rte_eal_process_type() != RTE_PROC_PRIMARY) + return -1; + mod.wq_state = IBV_WQS_RESET; + ret = mlx5_glue->modify_wq(rxq_ctrl->ibv->wq, &mod); + if (ret) { + DRV_LOG(ERR, "Cannot change Rx WQ state to RESET %s\n", + strerror(errno)); + return -1; + } + if (rxq_ctrl->dump_file_n < + rxq_ctrl->priv->config.max_dump_files_num) { + MKSTR(err_str, "Unexpected CQE error syndrome " + "0x%02x CQN = %u RQN = %u wqe_counter = %u" + " rq_ci = %u cq_ci = %u", u.err_cqe->syndrome, + rxq->cqn, rxq_ctrl->ibv->wq->wq_num, + rte_be_to_cpu_16(u.err_cqe->wqe_counter), + rxq->rq_ci << rxq->sges_n, rxq->cq_ci); + MKSTR(name, "dpdk_mlx5_port_%u_rxq_%u_%u", + rxq->port_id, rxq->idx, (uint32_t)rte_rdtsc()); + mlx5_dump_debug_information(name, NULL, err_str, 0); + mlx5_dump_debug_information(name, "MLX5 Error CQ:", +
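The recovery flow described in the commit message and in the mlx5_rx_err_handle() comment (reset on the first error CQE, let the caller drain the CQ, then return to ready) can be pictured as a small state machine. The following is a standalone simulation under stated assumptions: the names rxq_err_state, rxq_sim and rxq_err_step are made up for illustration, the third state is an assumption because the quoted diff is cut off before it, and the Verbs calls are replaced by plain counters.

```c
#include <assert.h>

/* Hypothetical states; only the first two appear in the quoted diff. */
enum rxq_err_state {
	RXQ_NO_ERROR,
	RXQ_NEED_RESET,
	RXQ_NEED_READY, /* Assumed: waiting for the CQ to be drained. */
};

struct rxq_sim {
	enum rxq_err_state state;
	unsigned int pending_cqes; /* CQEs still waiting in the CQ. */
	unsigned int resets;       /* Number of simulated WQ resets. */
};

/*
 * One call per error CQE seen by the caller loop.
 * Returns 1 when the queue is back in service, 0 while still draining.
 */
int
rxq_err_step(struct rxq_sim *q)
{
	switch (q->state) {
	case RXQ_NO_ERROR:
		q->state = RXQ_NEED_RESET;
		/* Fall-through */
	case RXQ_NEED_RESET:
		/* Stand-in for moving the WQ to the reset state. */
		++q->resets;
		q->state = RXQ_NEED_READY;
		/* Fall-through */
	case RXQ_NEED_READY:
		if (q->pending_cqes > 0) {
			/* The caller consumes one more CQE and calls again. */
			--q->pending_cqes;
			return 0;
		}
		/* CQ is empty: stand-in for WQ ready plus re-initialization. */
		q->state = RXQ_NO_ERROR;
		return 1;
	}
	assert(0 && "unreachable");
	return -1;
}
```

With two CQEs pending, the first two calls return 0 and the third returns 1, with exactly one reset performed.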
[dpdk-dev] [PATCH v1 9/9] net/mlx5: recover secondary process Tx errors
The SQ error recovery mechanism in the PMD invokes Verbs functions to modify the SQ state in order to reset the SQ and reactivate it. These Verbs functions must not be invoked from a secondary process, hence the PMD skips the recovery when the error is detected on a secondary process queue. Using the DPDK IPC mechanism, the secondary process can request that the primary process perform the Verbs queue state modifications synchronously. Add support for secondary process Tx error recovery. Cc: sta...@dpdk.org Signed-off-by: Matan Azrad --- drivers/net/mlx5/mlx5_rxtx.c | 104 ++- 1 file changed, 62 insertions(+), 42 deletions(-) diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c index cb3baad..9659478 100644 --- a/drivers/net/mlx5/mlx5_rxtx.c +++ b/drivers/net/mlx5/mlx5_rxtx.c @@ -51,6 +51,10 @@ static __rte_always_inline void mprq_buf_replace(struct mlx5_rxq_data *rxq, uint16_t rq_idx); +static int +mlx5_queue_state_modify(struct rte_eth_dev *dev, + struct mlx5_mp_arg_queue_state_modify *sm); + uint32_t mlx5_ptype_table[] __rte_cache_aligned = { [0xff] = RTE_PTYPE_ALL_MASK, /* Last entry for errored packet. */ }; @@ -570,52 +574,27 @@ } /** - * Move QP from error state to running state. + * Move QP from error state to running state and initialize indexes. * - * @param txq - * Pointer to TX queue structure. - * @param qp - * The qp pointer for recovery. + * @param txq_ctrl + * Pointer to TX queue control structure. * * @return - * 0 on success, else errno value. + * 0 on success, else -1. 
*/ static int -tx_recover_qp(struct mlx5_txq_data *txq, struct ibv_qp *qp) +tx_recover_qp(struct mlx5_txq_ctrl *txq_ctrl) { - int ret; - struct ibv_qp_attr mod = { - .qp_state = IBV_QPS_RESET, - .port_num = 1, - }; - ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE); - if (ret) { - DRV_LOG(ERR, "Cannot change the Tx QP state to RESET %d\n", - ret); - return ret; - } - mod.qp_state = IBV_QPS_INIT; - ret = mlx5_glue->modify_qp(qp, &mod, - (IBV_QP_STATE | IBV_QP_PORT)); - if (ret) { - DRV_LOG(ERR, "Cannot change Tx QP state to INIT %d\n", ret); - return ret; - } - mod.qp_state = IBV_QPS_RTR; - ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE); - if (ret) { - DRV_LOG(ERR, "Cannot change Tx QP state to RTR %d\n", ret); - return ret; - } - mod.qp_state = IBV_QPS_RTS; - ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE); - if (ret) { - DRV_LOG(ERR, "Cannot change Tx QP state to RTS %d\n", ret); - return ret; - } - txq->wqe_ci = 0; - txq->wqe_pi = 0; - txq->elts_comp = 0; + struct mlx5_mp_arg_queue_state_modify sm = { + .is_wq = 0, + .queue_id = txq_ctrl->txq.idx, + }; + + if (mlx5_queue_state_modify(ETH_DEV(txq_ctrl->priv), &sm)) + return -1; + txq_ctrl->txq.wqe_ci = 0; + txq_ctrl->txq.wqe_pi = 0; + txq_ctrl->txq.elts_comp = 0; return 0; } @@ -690,8 +669,7 @@ */ txq->stats.oerrors += ((txq->wqe_ci & wqe_m) - new_wqe_pi) & wqe_m; - if ((rte_eal_process_type() == RTE_PROC_PRIMARY) && - tx_recover_qp(txq, txq_ctrl->ibv->qp) == 0) { + if (tx_recover_qp(txq_ctrl) == 0) { txq->cq_ci++; /* Release all the remaining buffers. 
*/ return txq->elts_head; @@ -2065,6 +2043,48 @@ rte_errno = errno; return ret; } + } else { + struct mlx5_txq_data *txq = (*priv->txqs)[sm->queue_id]; + struct mlx5_txq_ctrl *txq_ctrl = + container_of(txq, struct mlx5_txq_ctrl, txq); + struct ibv_qp_attr mod = { + .qp_state = IBV_QPS_RESET, + .port_num = (uint8_t)priv->ibv_port, + }; + struct ibv_qp *qp = txq_ctrl->ibv->qp; + + ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE); + if (ret) { + DRV_LOG(ERR, "Cannot change the Tx QP state to RESET " + "%s\n", strerror(errno)); + rte_errno = errno; + return ret; + } + mod.qp_state = IBV_QPS_INIT; + ret = mlx5_glue->modify_qp(qp, &mod, + (IBV_QP_STATE | IBV_QP_PORT)); +
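The consecutive modify_qp calls that this patch moves into the primary-process path follow a fixed RESET, INIT, RTR, RTS order. One possible way to express that sequence, shown here only as a sketch and not as what the patch does, is a table walked in a loop that stops at the first failure. The names qp_state, qp_modify_fn, qp_recover, mock_qp and mock_modify are hypothetical stand-ins, not Verbs or mlx5 APIs.

```c
#include <assert.h>
#include <stddef.h>

enum qp_state { QP_RESET, QP_INIT, QP_RTR, QP_RTS, QP_ERR };

/* Hypothetical callback standing in for a Verbs modify call. */
typedef int (*qp_modify_fn)(void *qp, enum qp_state next);

/*
 * Walk RESET -> INIT -> RTR -> RTS and stop at the first failure.
 * Returns 0 on success, otherwise the callback's non-zero result.
 */
int
qp_recover(void *qp, qp_modify_fn modify)
{
	static const enum qp_state ladder[] = {
		QP_RESET, QP_INIT, QP_RTR, QP_RTS
	};

	assert(modify != NULL);
	for (size_t i = 0; i < sizeof(ladder) / sizeof(ladder[0]); ++i) {
		int ret = modify(qp, ladder[i]);

		if (ret)
			return ret;
	}
	return 0;
}

/* Mock queue pair used only to exercise the sketch. */
struct mock_qp {
	enum qp_state state;   /* Last state reached. */
	enum qp_state fail_at; /* Transition to refuse; QP_ERR means none. */
};

int
mock_modify(void *qp, enum qp_state next)
{
	struct mock_qp *m = qp;

	if (next == m->fail_at)
		return -1;
	m->state = next;
	return 0;
}
```

A failure in the middle of the ladder leaves the mock in the last state that succeeded, which mirrors the early returns in the original four-step code.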
[dpdk-dev] [PATCH v1 7/9] net/mlx5: handle Tx completion with error
When WQEs are posted to the HW to send packets, the PMD may get a completion report with an error from the HW, also known as an error CQE, which is associated with a bad WQE. The error may be caused by a bad address, a wrong lkey, bad sizes, etc., any of which can be misconfigured by the PMD or by the user. Checking for every possible mistake in order to prevent error CQEs doesn't make sense because of the performance impact and the huge complexity. The error CQEs move the SQ to the error state, which causes all subsequently posted WQEs to be completed with a CQE flush error indefinitely. Currently, the PMD doesn't handle Tx error CQEs and may even crash when one of them appears. Extend the Tx data path to detect these error CQEs, report them through the statistics error counters, and recover the SQ by moving the state back to ready and adjusting the management variables appropriately. Sometimes the root cause of an error CQE is very hard to debug and may even be related to corner cases that are not easily reproducible. Hence, a dump file with debug information is created for the first several error CQEs; their number can be configured through the PMD probe parameters. Cc: sta...@dpdk.org Signed-off-by: Matan Azrad --- drivers/net/mlx5/mlx5_prm.h | 11 +++ drivers/net/mlx5/mlx5_rxtx.c | 166 -- drivers/net/mlx5/mlx5_rxtx.h | 81 ++--- drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 10 +- drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 10 +- drivers/net/mlx5/mlx5_txq.c | 4 +- 6 files changed, 231 insertions(+), 51 deletions(-) diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h index 8c42380..22db86b 100644 --- a/drivers/net/mlx5/mlx5_prm.h +++ b/drivers/net/mlx5/mlx5_prm.h @@ -153,6 +153,17 @@ /* Maximum number of DS in WQE. */ #define MLX5_DSEG_MAX 63 +/* The completion mode offset in the WQE control segment line 2. */ +#define MLX5_COMP_MODE_OFFSET 2 + +/* Completion mode. 
*/ +enum mlx5_completion_mode { + MLX5_COMP_ONLY_ERR = 0x0, + MLX5_COMP_ONLY_FIRST_ERR = 0x1, + MLX5_COMP_ALWAYS = 0x2, + MLX5_COMP_CQE_AND_EQE = 0x3, +}; + /* Subset of struct mlx5_wqe_eth_seg. */ struct mlx5_wqe_eth_seg_small { uint32_t rsvd0; diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c index 5369fc1..36e2dd3 100644 --- a/drivers/net/mlx5/mlx5_rxtx.c +++ b/drivers/net/mlx5/mlx5_rxtx.c @@ -570,6 +570,141 @@ } /** + * Move QP from error state to running state. + * + * @param txq + * Pointer to TX queue structure. + * @param qp + * The qp pointer for recovery. + * + * @return + * 0 on success, else errno value. + */ +static int +tx_recover_qp(struct mlx5_txq_data *txq, struct ibv_qp *qp) +{ + int ret; + struct ibv_qp_attr mod = { + .qp_state = IBV_QPS_RESET, + .port_num = 1, + }; + ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE); + if (ret) { + DRV_LOG(ERR, "Cannot change the Tx QP state to RESET %d\n", + ret); + return ret; + } + mod.qp_state = IBV_QPS_INIT; + ret = mlx5_glue->modify_qp(qp, &mod, + (IBV_QP_STATE | IBV_QP_PORT)); + if (ret) { + DRV_LOG(ERR, "Cannot change Tx QP state to INIT %d\n", ret); + return ret; + } + mod.qp_state = IBV_QPS_RTR; + ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE); + if (ret) { + DRV_LOG(ERR, "Cannot change Tx QP state to RTR %d\n", ret); + return ret; + } + mod.qp_state = IBV_QPS_RTS; + ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE); + if (ret) { + DRV_LOG(ERR, "Cannot change Tx QP state to RTS %d\n", ret); + return ret; + } + txq->wqe_ci = 0; + txq->wqe_pi = 0; + txq->elts_comp = 0; + return 0; +} + +/* Return 1 if the error CQE is signed otherwise, sign it and return 0. 
*/ +static int +check_err_cqe_seen(volatile struct mlx5_err_cqe *err_cqe) +{ + static const uint8_t magic[] = "seen"; + int ret = 1; + unsigned int i; + + for (i = 0; i < sizeof(magic); ++i) + if (!ret || err_cqe->rsvd1[i] != magic[i]) { + ret = 0; + err_cqe->rsvd1[i] = magic[i]; + } + return ret; +} + +/** + * Handle error CQE. + * + * @param txq + * Pointer to TX queue structure. + * @param error_cqe + * Pointer to the error CQE. + * + * @return + * The last Tx buffer element to free. + */ +uint16_t +mlx5_tx_error_cqe_handle(struct mlx5_txq_data *txq, +volatile struct mlx5_err_cqe *err_cqe) +{ + if (err_cqe->syndrome != MLX5_CQE_SYNDROME_WR_FLUSH_ERR) { + const uint16_t wqe_m = ((1 << txq->wqe_n) - 1); + struct mlx5_txq_ctrl *txq_ctrl = +
Re: [dpdk-dev] [PATCH 2/2] meson: make build configurable
On Wed, May 29, 2019 at 09:37:20PM +0100, Luca Boccassi wrote: > On Wed, 2019-05-29 at 19:39 +0300, Ilya Maximets wrote: > > The first thing many developers do before start building DPDK is > > disabling all the not needed divers and libraries. This happens > > just because more than a half of DPDK dirvers and libraries are not > > needed for the particular reason. For example, you don't need > > dpaa*, octeon*, various croypto devices, eventdev, etc. if you're > > only want to build OVS for x86_64 with static linking. > > > > By disabling everything you don't need, build speeds up literally 10x > > times. This is important for CI systems. For example, TravisCI wastes > > 10 minutes for the default DPDK build just to check linking with OVS. > > > > Another thing is the binary size. Number of DPDK libraries and, > > as a result, size of resulted statically linked application decreases > > significantly. > > > > Important thing also that you're able to not install some > > dependencies > > if you don't have them on a target platform. Just disable > > libs/drivers > > that depends on it. Similar thing for the glibc version mismatch > > between build and target platforms. > > > > Also, I have to note that less code means less probability of > > failures and less number of attack vectors. > > > > This patch gives 'meson' the power of configurability that we > > have with 'make'. Using new options it's possible to enable just > > what you need and nothing more. 
> > > > For example, following cmdline could be used to build almost minimal > > set of DPDK libs and drivers to check OVS build: > > > > $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false \ > > -Ddrivers_bus=pci,vdev \ > > -Ddrivers_mempool=ring \ > > -Ddrivers_net=null,virtio,ring \ > > -Ddrivers_crypto=virtio \ > > -Ddrivers_compress=none \ > > -Ddrivers_event=none\ > > -Ddrivers_baseband=none \ > > -Ddrivers_raw=none \ > > -Ddrivers_common=none \ > > > > -Dlibs=kvargs,eal,cmdline,ring,mempool,mbuf,net,meter,\ > >ethdev,pci,hash,cryptodev,pdump,vhost \ > > -Dapps=none > > > > Adding a few real net drivers will give configuration that can be > > used > > in production environment. > > > > Looks not very pretty, but this could be moved to a script. > > > > Build details: > > > > Build targets in project: 57 > > > > $ time ninja > > real0m11,528s > > user1m4,137s > > sys 0m4,935s > > > > $ du -sh ../dpdk_meson_install/ > > 3,5M../dpdk_meson_install/ > > > > To compare with what we have without these options: > > > > $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false > > Build targets in project: 434 > > > > $ time ninja > > real1m38,963s > > user10m18,624s > > sys 0m45,478s > > > > $ du -sh ../dpdk_meson_install/ > > 27M ../dpdk_meson_install/ > > > > 10x speed up for the user time. > > 7.7 times size decrease. > > > > This is probably not much user-friendly because it's not a Kconfig > > and dependency tracking in meson is really poor, so it requires > > usually few iterations to pick correct set of libraries to satisfy > > all dependencies. However, it's not a big deal. Options intended > > for a proficient users who knows what they need. > > Hi, > > We talked about this a few times in the past, and it was actually one > of the design goals to _avoid_ replicating the octopus-like config > system of the makefiles. 
That's because it makes the test matrix > insanely complicated, not to mention the harm to user friendliness, > among other things. > > If someone doesn't want to use a PMD, they can just avoid installing it > - it's simple enough. > > Sorry, but from me it's a very strong NACK. >

I would agree with this position - tracking the dependencies of the libraries etc. is a nightmare, and requires lots of ifdef'ery in the code for handling cases where libraries don't exist. However, I might be ok with limiting the drivers somewhat, since they don't tend to depend on each other so much, though ideally I'd still prefer to have one build of DPDK that has minimal configuration. If we say that we can disable some drivers, though, the issue then becomes whether e.g. the bus drivers could selectively be disabled, and the knock-on effects of that. I'd hate to see the case where we end up having the meson.build files for drivers becoming a massive list of conditional checks for a bunch of internal dependencies. If someone wants to do a custom build of DPDK, they can always patch out the subdirectories they don't want in the meson.build files - but because of the testing matrices for such configurations, I don't think it's something we want to explicitly support.

/Bruce
Re: [dpdk-dev] [PATCH 2/2] meson: make build configurable
On Thu, 2019-05-30 at 13:03 +0300, Ilya Maximets wrote: > On 29.05.2019 23:37, Luca Boccassi wrote: > > On Wed, 2019-05-29 at 19:39 +0300, Ilya Maximets wrote: > > > The first thing many developers do before start building DPDK is > > > disabling all the not needed divers and libraries. This happens > > > just because more than a half of DPDK dirvers and libraries are > > > not > > > needed for the particular reason. For example, you don't need > > > dpaa*, octeon*, various croypto devices, eventdev, etc. if you're > > > only want to build OVS for x86_64 with static linking. > > > > > > By disabling everything you don't need, build speeds up literally > > > 10x > > > times. This is important for CI systems. For example, TravisCI > > > wastes > > > 10 minutes for the default DPDK build just to check linking with > > > OVS. > > > > > > Another thing is the binary size. Number of DPDK libraries and, > > > as a result, size of resulted statically linked application > > > decreases > > > significantly. > > > > > > Important thing also that you're able to not install some > > > dependencies > > > if you don't have them on a target platform. Just disable > > > libs/drivers > > > that depends on it. Similar thing for the glibc version mismatch > > > between build and target platforms. > > > > > > Also, I have to note that less code means less probability of > > > failures and less number of attack vectors. > > > > > > This patch gives 'meson' the power of configurability that we > > > have with 'make'. Using new options it's possible to enable just > > > what you need and nothing more. 
> > > > > > For example, following cmdline could be used to build almost > > > minimal > > > set of DPDK libs and drivers to check OVS build: > > > > > > $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false > > > \ > > > -Ddrivers_bus=pci,vdev \ > > > -Ddrivers_mempool=ring \ > > > -Ddrivers_net=null,virtio,ring \ > > > -Ddrivers_crypto=virtio \ > > > -Ddrivers_compress=none \ > > > -Ddrivers_event=none\ > > > -Ddrivers_baseband=none \ > > > -Ddrivers_raw=none \ > > > -Ddrivers_common=none \ > > > > > > -Dlibs=kvargs,eal,cmdline,ring,mempool,mbuf,net,meter,\ > > >ethdev,pci,hash,cryptodev,pdump,vhost \ > > > -Dapps=none > > > > > > Adding a few real net drivers will give configuration that can be > > > used > > > in production environment. > > > > > > Looks not very pretty, but this could be moved to a script. > > > > > > Build details: > > > > > > Build targets in project: 57 > > > > > > $ time ninja > > > real0m11,528s > > > user1m4,137s > > > sys 0m4,935s > > > > > > $ du -sh ../dpdk_meson_install/ > > > 3,5M../dpdk_meson_install/ > > > > > > To compare with what we have without these options: > > > > > > $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false > > > Build targets in project: 434 > > > > > > $ time ninja > > > real1m38,963s > > > user10m18,624s > > > sys 0m45,478s > > > > > > $ du -sh ../dpdk_meson_install/ > > > 27M ../dpdk_meson_install/ > > > > > > 10x speed up for the user time. > > > 7.7 times size decrease. > > > > > > This is probably not much user-friendly because it's not a > > > Kconfig > > > and dependency tracking in meson is really poor, so it requires > > > usually few iterations to pick correct set of libraries to > > > satisfy > > > all dependencies. However, it's not a big deal. Options intended > > > for a proficient users who knows what they need. 
> > > > Hi, > > > > We talked about this a few times in the past, and it was actually > > one > > of the design goals to _avoid_ replicating the octopus-like config > > system of the makefiles. That's because it makes the test matrix > > insanely complicated, not to mention the harm to user friendliness, > > among other things. > > > > If someone doesn't want to use a PMD, they can just avoid > > installing it > > - it's simple enough. > > So how can I do this? I don't think 'ninja install' has such option. > Also, if you think that it is safe to skip some libs/drivers in > installation > process, it must be safe to not build them at all. It's just a waste > of > time and computational resources to build something known to be not > used. > And if you're going to ship DPDK libraries separately in distros, > you'll > have to test their different combinations anyway. If they're so > independent > that you don't need to test them in various combinations, than your > point > about test matrix is not valid. It can be done in the packaging step, or post-install if there's no packaging. An operating system vendor is free to do its own test and support plan, and decide to leave out som
Re: [dpdk-dev] [PATCH 2/2] meson: make build configurable
On 30.05.2019 13:22, Bruce Richardson wrote: > On Wed, May 29, 2019 at 09:37:20PM +0100, Luca Boccassi wrote: >> On Wed, 2019-05-29 at 19:39 +0300, Ilya Maximets wrote: >>> The first thing many developers do before start building DPDK is >>> disabling all the not needed divers and libraries. This happens >>> just because more than a half of DPDK dirvers and libraries are not >>> needed for the particular reason. For example, you don't need >>> dpaa*, octeon*, various croypto devices, eventdev, etc. if you're >>> only want to build OVS for x86_64 with static linking. >>> >>> By disabling everything you don't need, build speeds up literally 10x >>> times. This is important for CI systems. For example, TravisCI wastes >>> 10 minutes for the default DPDK build just to check linking with OVS. >>> >>> Another thing is the binary size. Number of DPDK libraries and, >>> as a result, size of resulted statically linked application decreases >>> significantly. >>> >>> Important thing also that you're able to not install some >>> dependencies >>> if you don't have them on a target platform. Just disable >>> libs/drivers >>> that depends on it. Similar thing for the glibc version mismatch >>> between build and target platforms. >>> >>> Also, I have to note that less code means less probability of >>> failures and less number of attack vectors. >>> >>> This patch gives 'meson' the power of configurability that we >>> have with 'make'. Using new options it's possible to enable just >>> what you need and nothing more. 
>>> >>> For example, following cmdline could be used to build almost minimal >>> set of DPDK libs and drivers to check OVS build: >>> >>> $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false \ >>> -Ddrivers_bus=pci,vdev \ >>> -Ddrivers_mempool=ring \ >>> -Ddrivers_net=null,virtio,ring \ >>> -Ddrivers_crypto=virtio \ >>> -Ddrivers_compress=none \ >>> -Ddrivers_event=none\ >>> -Ddrivers_baseband=none \ >>> -Ddrivers_raw=none \ >>> -Ddrivers_common=none \ >>> >>> -Dlibs=kvargs,eal,cmdline,ring,mempool,mbuf,net,meter,\ >>>ethdev,pci,hash,cryptodev,pdump,vhost \ >>> -Dapps=none >>> >>> Adding a few real net drivers will give configuration that can be >>> used >>> in production environment. >>> >>> Looks not very pretty, but this could be moved to a script. >>> >>> Build details: >>> >>> Build targets in project: 57 >>> >>> $ time ninja >>> real0m11,528s >>> user1m4,137s >>> sys 0m4,935s >>> >>> $ du -sh ../dpdk_meson_install/ >>> 3,5M../dpdk_meson_install/ >>> >>> To compare with what we have without these options: >>> >>> $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false >>> Build targets in project: 434 >>> >>> $ time ninja >>> real1m38,963s >>> user10m18,624s >>> sys 0m45,478s >>> >>> $ du -sh ../dpdk_meson_install/ >>> 27M ../dpdk_meson_install/ >>> >>> 10x speed up for the user time. >>> 7.7 times size decrease. >>> >>> This is probably not much user-friendly because it's not a Kconfig >>> and dependency tracking in meson is really poor, so it requires >>> usually few iterations to pick correct set of libraries to satisfy >>> all dependencies. However, it's not a big deal. Options intended >>> for a proficient users who knows what they need. >> >> Hi, >> >> We talked about this a few times in the past, and it was actually one >> of the design goals to _avoid_ replicating the octopus-like config >> system of the makefiles. 
That's because it makes the test matrix >> insanely complicated, not to mention the harm to user friendliness, >> among other things. >> >> If someone doesn't want to use a PMD, they can just avoid installing it >> - it's simple enough. >> >> Sorry, but from me it's a very strong NACK. >> > I would agree with this position - tracking the dependencies of the > libraries etc. is a nightmare, and requires lots of ifdef'ery in the code > for handling cases where libraries don't exist. > > However, I might be ok with limiting the drivers somewhat, since they don't > tend to depend on each other so much, though ideally I'd still prefer to > have one build of DPDK that has minimal configuration. If we say that we > can disable some drivers, though, issue then becomes whether e.g. the bus > drivers could selectively be disabled, and the knock-on effects of that. > I'd hate to see the case where we end up having the meson.build files for > drivers becoming a massive list of conditional checks for a bunch of > internal dependencies. If someone is wanting to do a custom build of DPDK, > they can always patch out the subdirectories they don't want in the > meson.build files - but because of testing matrixes for such > configurations, I don't
Re: [dpdk-dev] [PATCH 1/2] meson: don't check dependencies for tests if not required
On Wed, May 29, 2019 at 07:39:57PM +0300, Ilya Maximets wrote:
> Don't need to check dependencies if test apps will not be built anyway.
>
> Signed-off-by: Ilya Maximets
> ---
>  app/test/meson.build | 38 +++---
>  1 file changed, 19 insertions(+), 19 deletions(-)
>
Agree with the idea.

Would this work as a shorter alternative placed at the top of the file?

if not get_option('tests')
	subdir_done()
endif

/Bruce
Re: [dpdk-dev] [PATCH 2/2] meson: make build configurable
On 30.05.2019 14:06, Luca Boccassi wrote: > On Thu, 2019-05-30 at 13:03 +0300, Ilya Maximets wrote: >> On 29.05.2019 23:37, Luca Boccassi wrote: >>> On Wed, 2019-05-29 at 19:39 +0300, Ilya Maximets wrote: The first thing many developers do before start building DPDK is disabling all the not needed divers and libraries. This happens just because more than a half of DPDK dirvers and libraries are not needed for the particular reason. For example, you don't need dpaa*, octeon*, various croypto devices, eventdev, etc. if you're only want to build OVS for x86_64 with static linking. By disabling everything you don't need, build speeds up literally 10x times. This is important for CI systems. For example, TravisCI wastes 10 minutes for the default DPDK build just to check linking with OVS. Another thing is the binary size. Number of DPDK libraries and, as a result, size of resulted statically linked application decreases significantly. Important thing also that you're able to not install some dependencies if you don't have them on a target platform. Just disable libs/drivers that depends on it. Similar thing for the glibc version mismatch between build and target platforms. Also, I have to note that less code means less probability of failures and less number of attack vectors. This patch gives 'meson' the power of configurability that we have with 'make'. Using new options it's possible to enable just what you need and nothing more. 
For example, following cmdline could be used to build almost minimal set of DPDK libs and drivers to check OVS build: $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false \ -Ddrivers_bus=pci,vdev \ -Ddrivers_mempool=ring \ -Ddrivers_net=null,virtio,ring \ -Ddrivers_crypto=virtio \ -Ddrivers_compress=none \ -Ddrivers_event=none\ -Ddrivers_baseband=none \ -Ddrivers_raw=none \ -Ddrivers_common=none \ -Dlibs=kvargs,eal,cmdline,ring,mempool,mbuf,net,meter,\ ethdev,pci,hash,cryptodev,pdump,vhost \ -Dapps=none Adding a few real net drivers will give configuration that can be used in production environment. Looks not very pretty, but this could be moved to a script. Build details: Build targets in project: 57 $ time ninja real0m11,528s user1m4,137s sys 0m4,935s $ du -sh ../dpdk_meson_install/ 3,5M../dpdk_meson_install/ To compare with what we have without these options: $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false Build targets in project: 434 $ time ninja real1m38,963s user10m18,624s sys 0m45,478s $ du -sh ../dpdk_meson_install/ 27M ../dpdk_meson_install/ 10x speed up for the user time. 7.7 times size decrease. This is probably not much user-friendly because it's not a Kconfig and dependency tracking in meson is really poor, so it requires usually few iterations to pick correct set of libraries to satisfy all dependencies. However, it's not a big deal. Options intended for a proficient users who knows what they need. >>> >>> Hi, >>> >>> We talked about this a few times in the past, and it was actually >>> one >>> of the design goals to _avoid_ replicating the octopus-like config >>> system of the makefiles. That's because it makes the test matrix >>> insanely complicated, not to mention the harm to user friendliness, >>> among other things. >>> >>> If someone doesn't want to use a PMD, they can just avoid >>> installing it >>> - it's simple enough. >> >> So how can I do this? I don't think 'ninja install' has such option. 
>> Also, if you think that it is safe to skip some libs/drivers in >> installation >> process, it must be safe to not build them at all. It's just a waste >> of >> time and computational resources to build something known to be not >> used. >> And if you're going to ship DPDK libraries separately in distros, >> you'll >> have to test their different combinations anyway. If they're so >> independent >> that you don't need to test them in various combinations, than your >> point >> about test matrix is not valid. > > It can be done in the packaging step, or post-install if there's no > packaging. An operating system vendor is free to do its own test and > support plan, and decide to leave out some PMDs from it. This technically means doing this
Re: [dpdk-dev] 18.11.2 (LTS) patches review and test
> -----Original Message-----
> From: Ian Stokes
> Sent: Thursday, May 30, 2019 11:16 AM
> To: Kevin Traynor ; dpdk stable
> Cc: dev@dpdk.org; Sitong Liu ; Pei Zhang ; Raslan Darawsheh ;
> qian.q...@intel.com; Ju-Hyoung Lee ; Ali Alnubani ; David Christensen ;
> benjamin.wal...@intel.com; Thomas Monjalon ; John McNamara ;
> Luca Boccassi ; Jerin Jacob Kollanukkaran ; Hemant Agrawal ; Akhil Goyal
> Subject: Re: 18.11.2 (LTS) patches review and test
>
> On 5/21/2019 3:01 PM, Kevin Traynor wrote:
> > Hi all,
> >
> > Here is a list of patches targeted for LTS release 18.11.2.
> >
> > The planned date for the final release is 11th June.
> >
> > Please help with testing and validation of your use cases and report
> > any issues/results. For the final release I will update the release
> > notes with fixes and reported validations.
> >
> > Thanks.
> >
> > Kevin Traynor
>
> Hi Kevin,
>
> I've validated with current head OVS Master and OVS 2.11.1 with VSPERF.
> Tested with i40e (X710), i40eVF, ixgbe (82599ES), ixgbeVF, igb(I350) and
> igbVF devices.
>
> Following tests were conducted and passed:
>
> * vswitch_p2p_tput: vSwitch - configure switch and execute RFC2544 throughput test.
> * vswitch_p2p_cont: vSwitch - configure switch and execute RFC2544 continuous stream test.
> * vswitch_pvp_tput: vSwitch - configure switch, vnf and execute RFC2544 throughput test.
> * vswitch_pvp_cont: vSwitch - configure switch, vnf and execute RFC2544 continuous stream test.
> * ovsdpdk_hotplug_attach: Ensure successful port-add after binding a device to igb_uio after ovs-vswitchd is launched.
> * ovsdpdk_mq_p2p_rxqs: Setup rxqs on NIC port.
> * ovsdpdk_mq_pvp_rxqs: Setup rxqs on vhost user port.
> * ovsdpdk_mq_pvp_rxqs_linux_bridge: Confirm traffic received over vhost RXQs with Linux virtio device in guest.
> * ovsdpdk_mq_pvp_rxqs_testpmd: Confirm traffic received over vhost RXQs with DPDK device in guest.
> * ovsdpdk_vhostuser_client: Test vhost-user client mode.
> * ovsdpdk_vhostuser_client_reconnect: Test vhost-user client mode reconnect feature.
> * ovsdpdk_vhostuser_server: Test vhost-user server mode.
> * ovsdpdk_vhostuser_sock_dir: Verify functionality of vhost-sock-dir flag.
> * ovsdpdk_vdev_add_null_pmd: Test addition of port using the null DPDK PMD driver.
> * ovsdpdk_vdev_del_null_pmd: Test deletion of port using the null DPDK PMD driver.
> * ovsdpdk_vdev_add_af_packet_pmd: Test addition of port using the af_packet DPDK PMD driver.
> * ovsdpdk_vdev_del_af_packet_pmd: Test deletion of port using the af_packet DPDK PMD driver.
> * ovsdpdk_numa: Test vhost-user NUMA support. Vhostuser PMD threads should migrate to the same numa slot, where QEMU is executed.
> * ovsdpdk_jumbo_p2p: Ensure that jumbo frames are received, processed and forwarded correctly by DPDK physical ports.
> * ovsdpdk_jumbo_pvp: Ensure that jumbo frames are received, processed and forwarded correctly by DPDK vhost-user ports.
> * ovsdpdk_jumbo_p2p_upper_bound: Ensure that jumbo frames above the configured Rx port's MTU are not accepted.
> * ovsdpdk_jumbo_mtu_upper_bound_vport: Verify that the upper bound limit is enforced for OvS DPDK vhost-user ports.
> * ovsdpdk_rate_p2p: Ensure when a user creates a rate limiting physical interface that the traffic is limited to the specified policer rate in a p2p setup.
> * ovsdpdk_rate_pvp: Ensure when a user creates a rate limiting vHost User interface that the traffic is limited to the specified policer rate in a pvp setup.
> * ovsdpdk_qos_p2p: In a p2p setup, ensure when a QoS egress policer is created that the traffic is limited to the specified rate.
> * ovsdpdk_qos_pvp: In a pvp setup, ensure when a QoS egress policer is created that the traffic is limited to the specified rate.
> * phy2phy_scalability: LTD.Scalability.Flows.RFC2544.0PacketLoss
> * phy2phy_scalability_cont: Phy2Phy Scalability Continuous Stream
> * pvp_cont: PVP Continuous Stream
> * pvvp_cont: PVVP Continuous Stream
> * pvpv_cont: Two VMs in parallel with Continuous Stream
>
> Regards
> Ian

Hi,

I validated this version and sent our testing matrix:
http://mails.dpdk.org/archives/stable/2019-May/015198.html

Thanks,
Ali
Re: [dpdk-dev] [PATCH 1/2] meson: don't check dependencies for tests if not required
On 30.05.2019 14:55, Bruce Richardson wrote:
> On Wed, May 29, 2019 at 07:39:57PM +0300, Ilya Maximets wrote:
>> Don't need to check dependencies if test apps will not be built anyway.
>>
>> Signed-off-by: Ilya Maximets
>> ---
>>  app/test/meson.build | 38 +++---
>>  1 file changed, 19 insertions(+), 19 deletions(-)
>>
> Agree with the idea.
>
> Would this work as a shorter alternative placed at the top of the file?
>
> if not get_option('tests')
> 	subdir_done()
> endif

This looks good to me.
However, the resulting patch will be much larger because we'll have to
shift most of it to the left. If it's OK, I'll prepare v2 with this change.
What do you think?

Best regards, Ilya Maximets.
Re: [dpdk-dev] [PATCH 1/2] meson: don't check dependencies for tests if not required
On Thu, May 30, 2019 at 03:06:17PM +0300, Ilya Maximets wrote:
> On 30.05.2019 14:55, Bruce Richardson wrote:
> > On Wed, May 29, 2019 at 07:39:57PM +0300, Ilya Maximets wrote:
> >> Don't need to check dependencies if test apps will not be built anyway.
> >>
> >> Signed-off-by: Ilya Maximets
> >> ---
> >>  app/test/meson.build | 38 +++---
> >>  1 file changed, 19 insertions(+), 19 deletions(-)
> >>
> > Agree with the idea.
> >
> > Would this work as a shorter alternative placed at the top of the file?
> >
> > if not get_option('tests')
> > 	subdir_done()
> > endif
>
> This looks good to me.
> However, the resulting patch will be much larger because we'll have to
> shift most of it to the left. If it's OK, I'll prepare v2 with this change.
> What do you think?
>
Yes, there will be some left-shifting, but it should just be a single block
from lines 338-419, which is probably ok. The end result is better, I think.
Re: [dpdk-dev] [RFC v9] /net: memory interface (memif)
> -----Original Message-----
> From: Ferruh Yigit
> Sent: Wednesday, May 29, 2019 7:29 PM
> To: Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco) ; dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC v9] /net: memory interface (memif)
>
> > +
> > +.. csv-table:: **Memif configuration options**
> > +   :header: "Option", "Description", "Default", "Valid value"
> > +
> > +   "id=0", "Used to identify peer interface", "0", "uint32_t"
> > +   "role=master", "Set memif role", "slave", "master|slave"
> > +   "bsize=1024", "Size of single packet buffer", "2048", "uint16_t"
>
> What happens if 'bsize < mbuf size'? I didn't see any check in the code but is
> there any assumption around this?
> Or any assumption that slave and master packet should be same? Or any
> other relation?
> If there is any assumption it may be good to add checks to the code and
> document here.

There is no relation between bsize and mbuf size. Memif driver will consume
as many buffers as it needs (chaining them).

> > +#ifndef _RTE_ETH_MEMIF_H_
> > +#define _RTE_ETH_MEMIF_H_
> > +
> > +#ifndef _GNU_SOURCE
> > +#define _GNU_SOURCE
> > +#endif /* GNU_SOURCE */
>
> Why was this required?

_GNU_SOURCE is required by memfd_create().
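As a rough illustration of the chaining answer above: if a packet is larger than bsize, it simply spans several buffers. The helper below is only a sketch of that arithmetic, not the driver's code. The name sketch_buffers_needed() is hypothetical, and the rounding rule (one buffer per started bsize chunk) is an assumption about what "as many buffers as it needs" means.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the memif PMD: number of bsize-sized
 * buffers a non-empty packet of pkt_len bytes would occupy if the data is
 * split across chained buffers. Returns 0 for an invalid bsize of 0.
 */
static uint32_t
sketch_buffers_needed(uint32_t pkt_len, uint16_t bsize)
{
	if (bsize == 0)
		return 0;
	/* Round up: a partially filled last buffer still counts as one. */
	return pkt_len / bsize + (pkt_len % bsize != 0);
}
```

With the default bsize of 2048 from the table, a 3000 byte packet would take two buffers under this assumption, and a 2048 byte packet exactly one.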
[dpdk-dev] [PATCH v2] meson: don't check dependencies for tests if not required
Don't need to check dependencies if test apps will not be built anyway. Signed-off-by: Ilya Maximets --- Version 2: - 'get_option('tests')' check moved to the top. app/test/meson.build | 141 ++- 1 file changed, 72 insertions(+), 69 deletions(-) diff --git a/app/test/meson.build b/app/test/meson.build index 83391cef0..4de856f93 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -1,6 +1,10 @@ # SPDX-License-Identifier: BSD-3-Clause # Copyright(c) 2017 Intel Corporation +if not get_option('tests') + subdir_done() +endif + test_sources = files('commands.c', 'packet_burst_generator.c', 'sample_packet_forward.c', @@ -335,86 +339,85 @@ if get_option('default_library') == 'static' link_libs = dpdk_drivers endif -if get_option('tests') - dpdk_test = executable('dpdk-test', - test_sources, - link_whole: link_libs, - dependencies: test_dep_objs, - c_args: [cflags, '-DALLOW_EXPERIMENTAL_API'], - install_rpath: driver_install_path, - install: false) +dpdk_test = executable('dpdk-test', + test_sources, + link_whole: link_libs, + dependencies: test_dep_objs, + c_args: [cflags, '-DALLOW_EXPERIMENTAL_API'], + install_rpath: driver_install_path, + install: false) - # some perf tests (eg: memcpy perf autotest)take very long - # to complete, so timeout to 10 minutes - timeout_seconds = 600 - timeout_seconds_fast = 10 - - # Retrieve the number of CPU cores, defaulting to 4. - num_cores = '0-3' - if host_machine.system() == 'linux' - num_cores = run_command('cat', - '/sys/devices/system/cpu/present' - ).stdout().strip() - elif host_machine.system() == 'freebsd' - snum_cores = run_command('/sbin/sysctl', '-n', -'hw.ncpu').stdout().strip() - inum_cores = snum_cores.to_int() - 1 -num_cores = '0-@0@'.format(inum_cores) - endif +# some perf tests (eg: memcpy perf autotest)take very long +# to complete, so timeout to 10 minutes +timeout_seconds = 600 +timeout_seconds_fast = 10 - num_cores_arg = '-l ' + num_cores +# Retrieve the number of CPU cores, defaulting to 4. 
+num_cores = '0-3' +if host_machine.system() == 'linux' + num_cores = run_command('cat', + '/sys/devices/system/cpu/present' + ).stdout().strip() +elif host_machine.system() == 'freebsd' + snum_cores = run_command('/sbin/sysctl', '-n', +'hw.ncpu').stdout().strip() + inum_cores = snum_cores.to_int() - 1 +num_cores = '0-@0@'.format(inum_cores) +endif - test_args = [num_cores_arg, '-n 4'] - foreach arg : fast_parallel_test_names - if host_machine.system() == 'linux' - test(arg, dpdk_test, - env : ['DPDK_TEST=' + arg], - args : test_args + -['--file-prefix=@0@'.format(arg)], - timeout : timeout_seconds_fast, - suite : 'fast-tests') - else - test(arg, dpdk_test, - env : ['DPDK_TEST=' + arg], - args : test_args, - timeout : timeout_seconds_fast, - suite : 'fast-tests') - endif - endforeach +num_cores_arg = '-l ' + num_cores - foreach arg : fast_non_parallel_test_names +test_args = [num_cores_arg, '-n 4'] +foreach arg : fast_parallel_test_names + if host_machine.system() == 'linux' + test(arg, dpdk_test, + env : ['DPDK_TEST=' + arg], + args : test_args + +['--file-prefix=@0@'.format(arg)], + timeout : timeout_seconds_fast, + suite : 'fast-tests') + else test(arg, dpdk_test, env : ['DPDK_TEST=' + arg], args : test_args, - timeout : timeout_seconds_fast, - is_parallel : false, - suite : 'fast-tests') - endforeach + timeout : timeout_seconds_fast, + suite : 'fast-tests') + endif +endforeach - foreach arg : perf_test_names - test(arg, dpdk_test, +foreach arg : fast_non_parallel_test_names + test(arg, dpdk_test, + env : ['DPDK_TEST=' + arg], + args : test_args, + timeout : timeout_seconds_fast, + is_parallel : false, + suite : 'fast-tests') +endforeach + +foreach
Re: [dpdk-dev] [PATCH v2] meson: don't check dependencies for tests if not required
On Thu, May 30, 2019 at 03:38:36PM +0300, Ilya Maximets wrote:
> Don't need to check dependencies if test apps will not be built anyway.
>
> Signed-off-by: Ilya Maximets
> ---
>
> Version 2:
>   - 'get_option('tests')' check moved to the top.
>
>  app/test/meson.build | 141 ++-
>  1 file changed, 72 insertions(+), 69 deletions(-)
>
Acked-by: Bruce Richardson
[dpdk-dev] [PATCH 1/2] eventdev: replace mbufs with events in Rx callback
Replace the mbuf pointer array in the event eth Rx adapter callback with an event array instead of an mbuf array. Using an event array allows the application to change attributes of the events enqueued by the SW adapter. Signed-off-by: Nikhil Rao --- lib/librte_eventdev/rte_event_eth_rx_adapter.h | 57 +++--- lib/librte_eventdev/rte_event_eth_rx_adapter.c | 32 --- 2 files changed, 52 insertions(+), 37 deletions(-) This patch depends on http://patchwork.dpdk.org/patch/53614/ v1: * add implementation to RFC diff --git a/lib/librte_eventdev/rte_event_eth_rx_adapter.h b/lib/librte_eventdev/rte_event_eth_rx_adapter.h index 2314b93..a64eed0 100644 --- a/lib/librte_eventdev/rte_event_eth_rx_adapter.h +++ b/lib/librte_eventdev/rte_event_eth_rx_adapter.h @@ -66,16 +66,17 @@ * For SW based packet transfers, i.e., when the * RTE_EVENT_ETH_RX_ADAPTER_CAP_INTERNAL_PORT is not set in the adapter's * capabilities flags for a particular ethernet device, the service function - * temporarily enqueues mbufs to an event buffer before batch enqueueing these + * temporarily enqueues events to an event buffer before batch enqueueing these * to the event device. If the buffer fills up, the service function stops * dequeueing packets from the ethernet device. The application may want to * monitor the buffer fill level and instruct the service function to - * selectively buffer packets. The application may also use some other + * selectively buffer events. The application may also use some other * criteria to decide which packets should enter the event device even when - * the event buffer fill level is low. The - * rte_event_eth_rx_adapter_cb_register() function allows the - * application to register a callback that selects which packets to enqueue - * to the event device. + * the event buffer fill level is low or may want to enqueue packets to an + * internal event port. 
The rte_event_eth_rx_adapter_cb_register() function + * allows the application to register a callback that selects which packets are + * enqueued to the event device by the SW adapter. The callback interface is + * event based so the callback can also modify the event data if it needs to. */ #ifdef __cplusplus @@ -217,12 +218,23 @@ struct rte_event_eth_rx_adapter_stats { * @b EXPERIMENTAL: this API may change without prior notice * * Callback function invoked by the SW adapter before it continues - * to process packets. The callback is passed the size of the enqueue + * to process events. The callback is passed the size of the enqueue * buffer in the SW adapter and the occupancy of the buffer. The - * callback can use these values to decide which mbufs should be - * enqueued to the event device. If the return value of the callback - * is less than nb_mbuf then the SW adapter uses the return value to - * enqueue enq_mbuf[] to the event device. + * callback can use these values to decide which events are + * enqueued to the event device by the SW adapter. The callback may + * also enqueue events internally using its own event port. The SW + * adapter populates the event information based on the Rx queue + * configuration in the adapter. The callback can modify the this event + * information for the events to be enqueued by the SW adapter. + * + * The callback return value is the number of events from the + * beginning of the event array that are to be enqueued by + * the SW adapter. It is the callback's responsibility to arrange + * these events at the beginning of the array, if these events are + * not contiguous in the original array. The *nb_dropped* parameter is + * a pointer to the number of events dropped by the callback, this + * number is used by the adapter to indicate the number of dropped packets + * as part of its statistics. * * @param eth_dev_id * Port identifier of the Ethernet device. 
@@ -231,27 +243,26 @@ struct rte_event_eth_rx_adapter_stats { * @param enqueue_buf_size * Total enqueue buffer size. * @param enqueue_buf_count - * mbuf count in enqueue buffer. - * @param mbuf - * mbuf array. - * @param nb_mbuf - * mbuf count. + * Event count in enqueue buffer. + * @param[in, out] ev + * Event array. + * @param nb_event + * Event array length. * @param cb_arg * Callback argument. - * @param[out] enq_mbuf - * The adapter enqueues enq_mbuf[] if the return value of the - * callback is less than nb_mbuf + * @param[out] nb_dropped + * Packets dropped by callback. * @return - * Returns the number of mbufs should be enqueued to eventdev + * - The number of events to be enqueued by the SW adapter. */ typedef uint16_t (*rte_event_eth_rx_adapter_cb_fn)(uint16_t eth_dev_id, uint16_t queue_id, uint32_t enqueue_buf_size, uint32_t enqueue_buf_count, - struct rte_mbuf **mbuf, -
Re: [dpdk-dev] [PATCH v2] meson: don't check dependencies for tests if not required
Ilya Maximets writes:
> Don't need to check dependencies if test apps will not be built anyway.
>
> Signed-off-by: Ilya Maximets
> ---
>
> Version 2:
>   - 'get_option('tests')' check moved to the top.
>
>  app/test/meson.build | 141 ++-
>  1 file changed, 72 insertions(+), 69 deletions(-)
>

Acked-by: Aaron Conole

Thanks for this, Ilya!
[dpdk-dev] [PATCH 2/2] eventdev: add dropped count to Rx adapter stats
The application can install a callback invoked by the Rx adapter. The callback can drop packets and populate a callback argument with the number of dropped packets. Add a Rx adapter stats field to keep track of the total number of dropped packets. Signed-off-by: Nikhil Rao --- lib/librte_eventdev/rte_event_eth_rx_adapter.h | 2 ++ lib/librte_eventdev/rte_event_eth_rx_adapter.c | 3 +++ 2 files changed, 5 insertions(+) diff --git a/lib/librte_eventdev/rte_event_eth_rx_adapter.h b/lib/librte_eventdev/rte_event_eth_rx_adapter.h index a64eed0..4ea5a53 100644 --- a/lib/librte_eventdev/rte_event_eth_rx_adapter.h +++ b/lib/librte_eventdev/rte_event_eth_rx_adapter.h @@ -197,6 +197,8 @@ struct rte_event_eth_rx_adapter_stats { /**< Eventdev enqueue count */ uint64_t rx_enq_retry; /**< Eventdev enqueue retry count */ + uint64_t rx_dropped; + /**< Received packet dropped count */ uint64_t rx_enq_start_ts; /**< Rx enqueue start timestamp */ uint64_t rx_enq_block_cycles; diff --git a/lib/librte_eventdev/rte_event_eth_rx_adapter.c b/lib/librte_eventdev/rte_event_eth_rx_adapter.c index ab4e3cf..4d41aa7 100644 --- a/lib/librte_eventdev/rte_event_eth_rx_adapter.c +++ b/lib/librte_eventdev/rte_event_eth_rx_adapter.c @@ -807,6 +807,7 @@ static uint16_t rxa_gcd_u16(uint16_t a, uint16_t b) if (dev_info->cb_fn) { + dropped = 0; nb_cb = dev_info->cb_fn(eth_dev_id, rx_queue_id, ETH_EVENT_BUFFER_SIZE, @@ -820,6 +821,8 @@ static uint16_t rxa_gcd_u16(uint16_t a, uint16_t b) nb_cb, num); else num = nb_cb; + if (dropped) + rx_adapter->stats.rx_dropped += dropped; } buf->count += num; -- 1.8.3.1
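To make the interaction between the callback and the new counter concrete, here is a minimal standalone sketch. It does not include any DPDK headers: struct sketch_event and struct sketch_stats are hypothetical stand-ins for struct rte_event and the adapter stats, the parameter order simply follows the doc comment in patch 1/2, and the drop policy (drop odd flow ids once the buffer is at least half full) is invented for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_event, reduced to what is needed here. */
struct sketch_event {
	uint32_t flow_id;
	void *mbuf;
};

/* Hypothetical stand-in for the adapter statistics touched by this patch. */
struct sketch_stats {
	uint64_t rx_dropped;
};

/*
 * Example callback. Below half occupancy it keeps every event. Otherwise it
 * keeps only events with an even flow id, moves them to the front of ev[],
 * and reports the rest through *nb_dropped. A real callback would also have
 * to free the mbufs of the events it drops.
 */
static uint16_t
sketch_rx_cb(uint16_t eth_dev_id, uint16_t queue_id,
	     uint32_t enqueue_buf_size, uint32_t enqueue_buf_count,
	     struct sketch_event *ev, uint16_t nb_event,
	     void *cb_arg, uint16_t *nb_dropped)
{
	uint16_t i, kept = 0;

	(void)eth_dev_id;
	(void)queue_id;
	(void)cb_arg;

	*nb_dropped = 0;
	if (enqueue_buf_count < enqueue_buf_size / 2)
		return nb_event;

	for (i = 0; i < nb_event; i++) {
		if ((ev[i].flow_id & 1) == 0)
			ev[kept++] = ev[i];
		else
			(*nb_dropped)++;
	}
	return kept;
}

/*
 * Adapter side, mirroring the hunk above: reset the local counter, invoke
 * the callback, then fold the reported drops into the statistics.
 */
static uint16_t
sketch_adapter_run_cb(struct sketch_stats *stats, struct sketch_event *ev,
		      uint16_t num, uint32_t buf_size, uint32_t buf_count)
{
	uint16_t dropped = 0;
	uint16_t nb_cb = sketch_rx_cb(0, 0, buf_size, buf_count, ev, num,
				      NULL, &dropped);

	if (dropped)
		stats->rx_dropped += dropped;
	return nb_cb;
}
```

The return value tells the adapter how many events at the start of the array to enqueue, which is why the callback compacts the survivors to the front instead of leaving holes.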
[dpdk-dev] [PATCH 1/2] eventdev: replace mbufs with events in Rx callback
Replace the mbuf pointer array in the event eth Rx adapter callback with an event array instead of an mbuf array. Using an event array allows the application to change attributes of the events enqueued by the SW adapter. Signed-off-by: Nikhil Rao --- lib/librte_eventdev/rte_event_eth_rx_adapter.h | 57 +++--- lib/librte_eventdev/rte_event_eth_rx_adapter.c | 32 --- 2 files changed, 52 insertions(+), 37 deletions(-) This patch depends on http://patchwork.dpdk.org/patch/53614/ Resending - the previous attempt only sent the first patch. v1: * add implementation to RFC diff --git a/lib/librte_eventdev/rte_event_eth_rx_adapter.h b/lib/librte_eventdev/rte_event_eth_rx_adapter.h index 2314b93..a64eed0 100644 --- a/lib/librte_eventdev/rte_event_eth_rx_adapter.h +++ b/lib/librte_eventdev/rte_event_eth_rx_adapter.h @@ -66,16 +66,17 @@ * For SW based packet transfers, i.e., when the * RTE_EVENT_ETH_RX_ADAPTER_CAP_INTERNAL_PORT is not set in the adapter's * capabilities flags for a particular ethernet device, the service function - * temporarily enqueues mbufs to an event buffer before batch enqueueing these + * temporarily enqueues events to an event buffer before batch enqueueing these * to the event device. If the buffer fills up, the service function stops * dequeueing packets from the ethernet device. The application may want to * monitor the buffer fill level and instruct the service function to - * selectively buffer packets. The application may also use some other + * selectively buffer events. The application may also use some other * criteria to decide which packets should enter the event device even when - * the event buffer fill level is low. The - * rte_event_eth_rx_adapter_cb_register() function allows the - * application to register a callback that selects which packets to enqueue - * to the event device. + * the event buffer fill level is low or may want to enqueue packets to an + * internal event port. 
The rte_event_eth_rx_adapter_cb_register() function + * allows the application to register a callback that selects which packets are + * enqueued to the event device by the SW adapter. The callback interface is + * event based so the callback can also modify the event data if it needs to. */ #ifdef __cplusplus @@ -217,12 +218,23 @@ struct rte_event_eth_rx_adapter_stats { * @b EXPERIMENTAL: this API may change without prior notice * * Callback function invoked by the SW adapter before it continues - * to process packets. The callback is passed the size of the enqueue + * to process events. The callback is passed the size of the enqueue * buffer in the SW adapter and the occupancy of the buffer. The - * callback can use these values to decide which mbufs should be - * enqueued to the event device. If the return value of the callback - * is less than nb_mbuf then the SW adapter uses the return value to - * enqueue enq_mbuf[] to the event device. + * callback can use these values to decide which events are + * enqueued to the event device by the SW adapter. The callback may + * also enqueue events internally using its own event port. The SW + * adapter populates the event information based on the Rx queue + * configuration in the adapter. The callback can modify the this event + * information for the events to be enqueued by the SW adapter. + * + * The callback return value is the number of events from the + * beginning of the event array that are to be enqueued by + * the SW adapter. It is the callback's responsibility to arrange + * these events at the beginning of the array, if these events are + * not contiguous in the original array. The *nb_dropped* parameter is + * a pointer to the number of events dropped by the callback, this + * number is used by the adapter to indicate the number of dropped packets + * as part of its statistics. * * @param eth_dev_id * Port identifier of the Ethernet device. 
@@ -231,27 +243,26 @@ struct rte_event_eth_rx_adapter_stats { * @param enqueue_buf_size * Total enqueue buffer size. * @param enqueue_buf_count - * mbuf count in enqueue buffer. - * @param mbuf - * mbuf array. - * @param nb_mbuf - * mbuf count. + * Event count in enqueue buffer. + * @param[in, out] ev + * Event array. + * @param nb_event + * Event array length. * @param cb_arg * Callback argument. - * @param[out] enq_mbuf - * The adapter enqueues enq_mbuf[] if the return value of the - * callback is less than nb_mbuf + * @param[out] nb_dropped + * Packets dropped by callback. * @return - * Returns the number of mbufs should be enqueued to eventdev + * - The number of events to be enqueued by the SW adapter. */ typedef uint16_t (*rte_event_eth_rx_adapter_cb_fn)(uint16_t eth_dev_id, uint16_t queue_id, uint32_t enqueue_buf_size, uint32_t enqueue_buf_count, -
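The callback contract described in the patch above (keep the selected events at the front of the array, return their count, report the rest through nb_dropped) can be sketched roughly as follows. This is only an illustration and not the adapter's code: struct demo_event and demo_rx_cb() are hypothetical stand-ins, the real callback takes struct rte_event plus the port, queue and buffer arguments, and the "even flow_id" filter is an arbitrary assumption.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct rte_event; only the fields used here. */
struct demo_event {
	uint32_t flow_id;
	uint8_t priority;
};

/*
 * Keep only events with an even flow_id (arbitrary example policy),
 * moving them to the front of ev[]. The return value is the number of
 * events the adapter should enqueue; the remainder is reported through
 * nb_dropped so the adapter can account for it in its statistics.
 */
static uint16_t
demo_rx_cb(struct demo_event *ev, uint16_t nb_event, uint16_t *nb_dropped)
{
	uint16_t i, kept = 0;

	for (i = 0; i < nb_event; i++) {
		if (ev[i].flow_id % 2 != 0)
			continue;	/* dropped by the callback */
		ev[i].priority = 0;	/* the callback may edit event data */
		ev[kept++] = ev[i];	/* compact towards the front */
	}
	*nb_dropped = (uint16_t)(nb_event - kept);
	return kept;
}
```

Compacting in place keeps the interface to a single array, which seems to be the point of dropping the separate enq_mbuf[] output parameter.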
[dpdk-dev] [PATCH] eal: fix positive error codes from probe/remove
According to the API, 'rte_dev_probe()', 'rte_dev_remove()' and their 'hotplug' equivalents must return 0 or a negative error code. Bus code returns positive values if the device wasn't recognized by any driver, so the result of 'bus->plug/unplug()' must be converted. A positive value on remove means that the device was not found by the driver, so it is converted to -ENOENT. A positive value on probe means that there are no suitable buses/drivers, i.e. the device is not supported, so it is converted to -ENOTSUP. CC: sta...@dpdk.org Fixes: a3ee360f4440 ("eal: add hotplug add/remove device") Fixes: 244d5130719c ("eal: enable hotplug on multi-process") Signed-off-by: Ilya Maximets --- lib/librte_eal/common/eal_common_dev.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c index 824b8f926..f9cae8e26 100644 --- a/lib/librte_eal/common/eal_common_dev.c +++ b/lib/librte_eal/common/eal_common_dev.c @@ -233,7 +233,7 @@ rte_dev_probe(const char *devargs) * process. */ if (ret != -EEXIST) - return ret; + return (ret < 0) ? ret : -ENOTSUP; } /* primary send attach sync request to secondary. */ @@ -319,7 +319,7 @@ local_dev_remove(struct rte_device *dev) if (ret) { RTE_LOG(ERR, EAL, "Driver cannot detach the device (%s)\n", dev->name); - return ret; + return (ret < 0) ? ret : -ENOENT; } return 0; -- 2.17.1
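For illustration, the conversion done in the two hunks above could be written as one small helper. This is a sketch under assumptions, not part of the patch: normalize_bus_ret() is a hypothetical name, and the patch itself open-codes the ternary at each call site.

```c
#include <errno.h>

/*
 * Map a bus-level result onto the public API contract: 0 on success,
 * negative errno on failure. Bus callbacks may return a positive value
 * meaning "not handled"; fallback_errno is the (positive) errno to
 * report in that case, e.g. ENOTSUP on probe or ENOENT on remove.
 */
static int
normalize_bus_ret(int ret, int fallback_errno)
{
	if (ret > 0)
		return -fallback_errno;
	return ret;	/* 0 and negative values pass through unchanged */
}
```

A caller on the probe path would use normalize_bus_ret(ret, ENOTSUP), and on the remove path normalize_bus_ret(ret, ENOENT).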
Re: [dpdk-dev] [PATCH 2/2] meson: make build configurable
On Thu, 2019-05-30 at 14:59 +0300, Ilya Maximets wrote: > On 30.05.2019 14:06, Luca Boccassi wrote: > > On Thu, 2019-05-30 at 13:03 +0300, Ilya Maximets wrote: > > > On 29.05.2019 23:37, Luca Boccassi wrote: > > > > On Wed, 2019-05-29 at 19:39 +0300, Ilya Maximets wrote: > > > > > The first thing many developers do before start building DPDK > > > > > is > > > > > disabling all the not needed divers and libraries. This > > > > > happens > > > > > just because more than a half of DPDK dirvers and libraries > > > > > are > > > > > not > > > > > needed for the particular reason. For example, you don't need > > > > > dpaa*, octeon*, various croypto devices, eventdev, etc. if > > > > > you're > > > > > only want to build OVS for x86_64 with static linking. > > > > > > > > > > By disabling everything you don't need, build speeds up > > > > > literally > > > > > 10x > > > > > times. This is important for CI systems. For example, > > > > > TravisCI > > > > > wastes > > > > > 10 minutes for the default DPDK build just to check linking > > > > > with > > > > > OVS. > > > > > > > > > > Another thing is the binary size. Number of DPDK libraries > > > > > and, > > > > > as a result, size of resulted statically linked application > > > > > decreases > > > > > significantly. > > > > > > > > > > Important thing also that you're able to not install some > > > > > dependencies > > > > > if you don't have them on a target platform. Just disable > > > > > libs/drivers > > > > > that depends on it. Similar thing for the glibc version > > > > > mismatch > > > > > between build and target platforms. > > > > > > > > > > Also, I have to note that less code means less probability of > > > > > failures and less number of attack vectors. > > > > > > > > > > This patch gives 'meson' the power of configurability that we > > > > > have with 'make'. Using new options it's possible to enable > > > > > just > > > > > what you need and nothing more. 
> > > > > > > > > > For example, following cmdline could be used to build almost > > > > > minimal > > > > > set of DPDK libs and drivers to check OVS build: > > > > > > > > > > $ meson build -Dexamples='' -Dtests=false > > > > > -Denable_kmods=false > > > > > \ > > > > > -Ddrivers_bus=pci,vdev \ > > > > > -Ddrivers_mempool=ring \ > > > > > -Ddrivers_net=null,virtio,ring \ > > > > > -Ddrivers_crypto=virtio \ > > > > > -Ddrivers_compress=none \ > > > > > -Ddrivers_event=none\ > > > > > -Ddrivers_baseband=none \ > > > > > -Ddrivers_raw=none \ > > > > > -Ddrivers_common=none \ > > > > > > > > > > -Dlibs=kvargs,eal,cmdline,ring,mempool,mbuf,net,meter,\ > > > > >ethdev,pci,hash,cryptodev,pdump,vhost > > > > > \ > > > > > -Dapps=none > > > > > > > > > > Adding a few real net drivers will give configuration that > > > > > can be > > > > > used > > > > > in production environment. > > > > > > > > > > Looks not very pretty, but this could be moved to a script. > > > > > > > > > > Build details: > > > > > > > > > > Build targets in project: 57 > > > > > > > > > > $ time ninja > > > > > real0m11,528s > > > > > user1m4,137s > > > > > sys 0m4,935s > > > > > > > > > > $ du -sh ../dpdk_meson_install/ > > > > > 3,5M../dpdk_meson_install/ > > > > > > > > > > To compare with what we have without these options: > > > > > > > > > > $ meson build -Dexamples='' -Dtests=false > > > > > -Denable_kmods=false > > > > > Build targets in project: 434 > > > > > > > > > > $ time ninja > > > > > real1m38,963s > > > > > user10m18,624s > > > > > sys 0m45,478s > > > > > > > > > > $ du -sh ../dpdk_meson_install/ > > > > > 27M ../dpdk_meson_install/ > > > > > > > > > > 10x speed up for the user time. > > > > > 7.7 times size decrease. 
> > > > > > > > > > This is probably not much user-friendly because it's not a > > > > > Kconfig > > > > > and dependency tracking in meson is really poor, so it > > > > > requires > > > > > usually few iterations to pick correct set of libraries to > > > > > satisfy > > > > > all dependencies. However, it's not a big deal. Options > > > > > intended > > > > > for a proficient users who knows what they need. > > > > > > > > Hi, > > > > > > > > We talked about this a few times in the past, and it was > > > > actually > > > > one > > > > of the design goals to _avoid_ replicating the octopus-like > > > > config > > > > system of the makefiles. That's because it makes the test > > > > matrix > > > > insanely complicated, not to mention the harm to user > > > > friendliness, > > > > among other things. > > > > > > > > If someone doesn't want to use a PMD, they can just avoid > > > > installing it > > > > - it's simple enough. > > > > > > So how c
Re: [dpdk-dev] [PATCH v2] meson: don't check dependencies for tests if not required
On Thu, 2019-05-30 at 15:38 +0300, Ilya Maximets wrote: > Don't need to check dependencies if test apps will not be built > anyway. > > Signed-off-by: Ilya Maximets < > i.maxim...@samsung.com > > > --- > > Version 2: > - 'get_option('tests')' check moved to the top. > > app/test/meson.build | 141 ++--- > -- > 1 file changed, 72 insertions(+), 69 deletions(-) Acked-by: Luca Boccassi -- Kind regards, Luca Boccassi
Re: [dpdk-dev] [PATCH v4 2/5] eal: add lcore accessors
30/05/2019 12:11, Bruce Richardson: > On Thu, May 30, 2019 at 09:40:08AM +0200, Thomas Monjalon wrote: > > 30/05/2019 09:31, David Marchand: > > > On Thu, May 30, 2019 at 12:51 AM Stephen Hemminger < > > > step...@networkplumber.org> wrote: > > > > > > > On Thu, 30 May 2019 00:46:30 +0200 > > > > Thomas Monjalon wrote: > > > > > > > > > 23/05/2019 15:58, David Marchand: > > > > > > From: Stephen Hemminger > > > > > > > > > > > > The fields of the internal EAL core configuration are currently > > > > > > laid bare as part of the API. This is not good practice and limits > > > > > > fixing issues with layout and sizes. > > > > > > > > > > > > Make new accessor functions for the fields used by current drivers > > > > > > and examples. > > > > > [...] > > > > > > +DPDK_19.08 { > > > > > > + global: > > > > > > + > > > > > > + rte_lcore_cpuset; > > > > > > + rte_lcore_index; > > > > > > + rte_lcore_to_cpu_id; > > > > > > + rte_lcore_to_socket_id; > > > > > > + > > > > > > +} DPDK_19.05; > > > > > > + > > > > > > EXPERIMENTAL { > > > > > > global: > > > > > > > > > > Just to make sure, are we OK to introduce these functions > > > > > as non-experimental? > > > > > > > > They were in previous releases as inlines this patch converts them > > > > to real functions. > > > > > > > > > > > Well, yes and no. > > > > > > rte_lcore_index and rte_lcore_to_socket_id already existed, so making them > > > part of the ABI is fine for me. > > > > > > rte_lcore_to_cpu_id is new but seems quite safe in how it can be used, > > > adding it to the ABI is ok for me. > > > > It is used by DPAA and some test. > > I guess adding as experimental is fine too? > > I'm fine with both options, I'm just trying to apply the policy > > we agreed on. Does this case deserve an exception? > > > > While it may be a good candidate, I'm not sure how much making an exception > for it really matters. I'd be tempted to just mark it experimental and then > have it stable for the 19.11 release. 
What do we really lose by waiting a > release to stabilize it? I would agree Bruce. If no more comment, I will wait for a v5 of this series.
Re: [dpdk-dev] [PATCH v2 1/3] net/af_xdp: enable zero copy by extbuf
On Thu, 30 May 2019 17:07:05 +0800 Xiaolong Ye wrote: > Implement zero copy of af_xdp pmd through mbuf's external memory > mechanism to achieve high performance. > > This patch also provides a new parameter "pmd_zero_copy" for user, so they > can choose to enable zero copy of af_xdp pmd or not. > > To be clear, "zero copy" here is different from the "zero copy mode" of > AF_XDP, it is about zero copy between af_xdp umem and mbuf used in dpdk > application. > > Suggested-by: Varghese Vipin > Suggested-by: Tummala Sivaprasad > Suggested-by: Olivier Matz > Signed-off-by: Xiaolong Ye Why is this a parameter? Can it just be auto-detected? Remember, configuration is evil: it hurts usability and code coverage, and it increases complexity.
Re: [dpdk-dev] [PATCH v2 2/3] net/af_xdp: add multi-queue support
On Thu, 30 May 2019 17:07:06 +0800 Xiaolong Ye wrote: > This patch adds two parameters `start_queue` and `queue_count` to > specify the range of netdev queues used by AF_XDP pmd. > > Signed-off-by: Xiaolong Ye Why does this have to be a config option? We already have max queues and number of queues in the DPDK configuration.
Re: [dpdk-dev] [PATCH v2] doc/testpmd: update compile steps for bpf examples
Hi Thomas, Snipped > > + > > + To built other BPF examples, the compiler requires additional command- > line options. > > "To built" -> "To build" OK. > > I think this note is vague. Don't you think it may confuse user if we don't > explicit which kind of options are required? OK, the `v1` content was `In order to build t2.c and t3.c; pass DPDK targets include and library path as compiler options.`. But as users add other library functions to `examples/bpf`, these may vary too. Hence, should we state it as `To build other BPF examples, the appropriate libraries and dependencies must be passed as command-line options.`?
[dpdk-dev] [Bug 288] Target name recorded wrong when try to build dpdk with x86_64-native-linux-gcc
https://bugs.dpdk.org/show_bug.cgi?id=288 Bug ID: 288 Summary: Target name recorded wrong when try to build dpdk with x86_64-native-linux-gcc Product: DPDK Version: unspecified Hardware: All OS: All Status: CONFIRMED Severity: minor Priority: Normal Component: mk Assignee: dev@dpdk.org Reporter: jasvinder.si...@intel.com Target Milestone: --- When building DPDK with the x86_64-native-linux-gcc target, the final build message reports the wrong target name, x86_64-native-linuxapp-gcc, instead of x86_64-native-linux-gcc. Log info: $make install T=x86_64-native-linux-gcc -j Configuration done using x86_64-native-linux-gcc == Build lib == Build lib/librte_kvargs == Build lib/librte_cfgfile SYMLINK-FILE include/rte_cfgfile.h CC rte_cfgfile.o SYMLINK-FILE include/rte_kvargs.h CC rte_kvargs.o INSTALL-APP testpipeline INSTALL-MAP testpipeline.map INSTALL-APP testbbdev INSTALL-MAP testbbdev.map INSTALL-APP testpmd INSTALL-MAP testpmd.map LD test INSTALL-APP test INSTALL-MAP test.map Build complete [x86_64-native-linuxapp-gcc] <-- -- You are receiving this mail because: You are the assignee for the bug.
[dpdk-dev] DPDK Release Status Meeting 30/5/2019
Minutes 30 May 2019
-------------------

Agenda:
* Release Dates
* Subtrees
* OvS
* Conferences
* Opens

Participants:
* Debian/Microsoft
* Intel
* Marvell
* Mellanox
* Red Hat

Release Dates
-------------

* v19.08 dates:
  * Proposal/V1             Monday 03 June   2019
  * Integration/Merge/RC1   Monday 01 July   2019
  * Release                 Thurs  01 August 2019

  * Reminder to send roadmaps for the release, it helps planning
    * Intel and Arm already shared the roadmap
    * Marvell will have new PMDs and will provide a roadmap

* v19.11 proposed dates, *please comment*,
  * Proposal/V1             Friday 06 September 2019
  * Integration/Merge/RC1   Friday 11 October   2019
  * Release                 Friday 08 November  2019

  * Constraints:
    * PRC holidays on October 1-7 inclusive, rc1 shouldn't overlap with them
    * US DPDK Summit in mid-November, better to have the release before the summit

Subtrees
--------

* main
  * Nothing critical, weekly merge done, pulled from sub-trees
  * rte_ prefix patchset in master, may affect existing patches
  * KNI ethtool removal merged
  * meson fix by Bruce to fix daily Intel builds is waiting

* next-net
  * Merging patches, nothing critical
  * Two new PMDs submitted for this release, memif & hinic

* next-eventdev
  * Nothing merged yet for 19.08

* next-virtio
* next-crypto
* next-pipeline
* next-qos
  * no update received

* Stable trees
  * v18.11.2-rc1 is waiting for test
    * 11 June is the target release day
    * Red Hat and OvS (Ian) tested it
    * Waiting for tests from others such as Intel and Mellanox
    * Next week is the only full week before the target release date
    * Microsoft will test when possible

OvS
---

* 18.11.2-rc1 validation has been completed
* There is an OvS patch available to use af_xdp PMD, via 19.08.0-rc0

Conferences
-----------

* DPDK Userspace summit: DPDK Userspace · Sept. 19-20, 2019
  https://www.dpdk.org/event/dpdk-userspace-bordeaux/
  * CFP Opens: Monday, April 29
  * CFP Closes: Friday, May 31
  * Reminder that CFP closes tomorrow
* US summit dates are not fixed yet

Opens
-----

* There is a potential that 19.11 will be big, need to think about ways
  to reduce the risk of delay for the release
* New tool, public-inbox mail archive is enabled:
  * http://inbox.dpdk.org/announce/db6pr0501mb2167b67f9f92f45a8823c1a5d7...@db6pr0501mb2167.eurprd05.prod.outlook.com/

DPDK Release Status Meetings
----------------------------

The DPDK Release Status Meeting is intended for DPDK Committers to discuss
the status of the master tree and sub-trees, and for project managers to
track progress or milestone dates.

The meeting occurs on Thursdays at 8:30 UTC. If you wish to attend just
send an email to "John McNamara " for the invite.
Re: [dpdk-dev] [Bug 287] netvsc PMD/dpdk/azure: Driver lockup with multi-queue configuration
On Thu, 30 May 2019 00:10:21 + bugzi...@dpdk.org wrote: > https://bugs.dpdk.org/show_bug.cgi?id=287 > > Bug ID: 287 >Summary: netvsc PMD/dpdk/azure: Driver lockup with multi-queue > configuration >Product: DPDK >Version: 18.11 > Hardware: x86 > OS: Linux > Status: CONFIRMED > Severity: normal > Priority: Normal > Component: ethdev > Assignee: dev@dpdk.org > Reporter: mohsinmazhar_sha...@trendmicro.com > Target Milestone: --- > > I am running an app using dpdk 18.11 netvsc PMD > (https://doc.dpdk.org/guides/nics/netvsc.html) on "Ubuntu 18.04 LTS" VM on > azure running kernel 4.18.0-1018 > (https://launchpad.net/ubuntu/+source/linux-azure/4.18.0-1018.18). The app > uses > multi-queue with 2 cores doing RX/TX. The lockup only occurs when doing a > connections/second test i.e. exercising the netvsc interface. When the lockup > occurs the netvsc interface and it's corresponding mellanox slave both can't > rx/tx packets. > Thanks for the report, busy this week, may have time to address it next week.
[dpdk-dev] [PATCH 1/3] power: add new packet type for capabilities
From: Marcin Hajkowski Add new packet type and commands for capabilities query. Signed-off-by: Marcin Hajkowski --- lib/librte_power/channel_commands.h | 14 ++ 1 file changed, 14 insertions(+) diff --git a/lib/librte_power/channel_commands.h b/lib/librte_power/channel_commands.h index ce587283c..b1f5584a8 100644 --- a/lib/librte_power/channel_commands.h +++ b/lib/librte_power/channel_commands.h @@ -34,6 +34,8 @@ extern "C" { /* CPU Power Queries */ #define CPU_POWER_QUERY_FREQ_LIST 7 #define CPU_POWER_QUERY_FREQ 8 +#define CPU_POWER_QUERY_CAPS_LIST 9 +#define CPU_POWER_QUERY_CAPS 10 /* --- Outgoing messages --- */ @@ -43,6 +45,7 @@ extern "C" { /* CPU Power Query Responses */ #define CPU_POWER_FREQ_LIST 3 +#define CPU_POWER_CAPS_LIST 4 #define HOURS 24 @@ -106,6 +109,17 @@ struct channel_packet_freq_list { uint8_t num_vcpu; }; +struct channel_packet_caps_list { + uint64_t resource_id; /**< core_num, device */ + uint32_t unit;/**< scale down/up/min/max */ + uint32_t command; /**< Power, IO, etc */ + char vm_name[VM_MAX_NAME_SZ]; + + uint64_t turbo[MAX_VCPU_PER_VM]; + uint64_t priority[MAX_VCPU_PER_VM]; + uint8_t num_vcpu; +}; + #ifdef __cplusplus } -- 2.17.2
[dpdk-dev] [PATCH 2/3] examples/power_manager: send cpu capabilities on vm request
From: Marcin Hajkowski Send capabilities for requested cores. Signed-off-by: Marcin Hajkowski --- examples/vm_power_manager/channel_monitor.c | 67 + 1 file changed, 67 insertions(+) diff --git a/examples/vm_power_manager/channel_monitor.c b/examples/vm_power_manager/channel_monitor.c index bfd9cc38d..731b3b480 100644 --- a/examples/vm_power_manager/channel_monitor.c +++ b/examples/vm_power_manager/channel_monitor.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include "channel_monitor.h" @@ -704,6 +705,60 @@ send_freq(struct channel_packet *pkt, chan_info); } +static int +send_capabilities(struct channel_packet *pkt, + struct channel_info *chan_info, + bool list_requested) +{ + unsigned int vcore_id = pkt->resource_id; + struct channel_packet_caps_list channel_pkt_caps_list; + struct vm_info info; + struct rte_power_core_capabilities caps; + int ret; + + if (get_info_vm(pkt->vm_name, &info) != 0) + return -1; + + if (!list_requested && vcore_id >= MAX_VCPU_PER_VM) + return -1; + + if (!info.allow_query) + return -1; + + channel_pkt_caps_list.command = CPU_POWER_CAPS_LIST; + channel_pkt_caps_list.num_vcpu = info.num_vcpus; + + if (list_requested) { + unsigned int i; + for (i = 0; i < info.num_vcpus; i++) { + ret = rte_power_get_capabilities(info.pcpu_map[i], + &caps); + if (ret == 0) { + channel_pkt_caps_list.turbo[i] = + caps.turbo; + channel_pkt_caps_list.priority[i] = + caps.priority; + } else + return -1; + + } + } else { + ret = rte_power_get_capabilities(info.pcpu_map[vcore_id], + &caps); + if (ret == 0) { + channel_pkt_caps_list.turbo[vcore_id] = + caps.turbo; + channel_pkt_caps_list.priority[vcore_id] = + caps.priority; + } else + return -1; + } + + return write_binary_packet(&channel_pkt_caps_list, + sizeof(channel_pkt_caps_list), + chan_info); +} + static int send_ack_for_received_cmd(struct channel_packet *pkt, struct channel_info *chan_info, @@ -812,6 +867,18 @@ process_request(struct channel_packet *pkt, struct channel_info 
*chan_info) RTE_LOG(ERR, CHANNEL_MONITOR, "Error during frequency sending.\n"); } + if (pkt->command == CPU_POWER_QUERY_CAPS_LIST || + pkt->command == CPU_POWER_QUERY_CAPS) { + + RTE_LOG(INFO, CHANNEL_MONITOR, + "Capabilities for %s requested.\n", pkt->vm_name); + int ret = send_capabilities(pkt, + chan_info, + pkt->command == CPU_POWER_QUERY_CAPS_LIST); + if (ret < 0) + RTE_LOG(ERR, CHANNEL_MONITOR, "Error during sending capabilities.\n"); + } + /* * Return is not checked as channel status may have been set to DISABLED * from management thread -- 2.17.2
[dpdk-dev] [PATCH 0/3] Core capabilities query
From: Marcin Hajkowski Extend guest channel and sample apps to query CPU capabilities. Please note that these changes depend on (http://patchwork.dpdk.org/cover/52335/) and (http://patchwork.dpdk.org/cover/52213/), which should be applied first. Marcin Hajkowski (3): power: add new packet type for capabilities examples/power_manager: send cpu capabilities on vm request examples/power_guest: send request for specified core capabilities examples/vm_power_manager/channel_monitor.c | 67 ++ .../guest_cli/vm_power_cli_guest.c| 119 +- lib/librte_power/channel_commands.h | 14 +++ 3 files changed, 198 insertions(+), 2 deletions(-) -- 2.17.2
[dpdk-dev] [PATCH 3/3] examples/power_guest: send request for specified core capabilities
From: Marcin Hajkowski Send request to power manager for core id provided by user to get related capabilities. Signed-off-by: Marcin Hajkowski --- .../guest_cli/vm_power_cli_guest.c| 119 +- 1 file changed, 117 insertions(+), 2 deletions(-) diff --git a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c index 848230248..de85c1406 100644 --- a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c +++ b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c @@ -132,7 +132,7 @@ struct cmd_freq_list_result { }; static int -query_freq_list(struct channel_packet *pkt, unsigned int lcore_id) +query_data(struct channel_packet *pkt, unsigned int lcore_id) { int ret; ret = rte_power_guest_channel_send_msg(pkt, lcore_id); @@ -206,7 +206,7 @@ cmd_query_freq_list_parsed(void *parsed_result, pkt.resource_id = lcore_id; } - ret = query_freq_list(&pkt, lcore_id); + ret = query_data(&pkt, lcore_id); if (ret < 0) { cmdline_printf(cl, "Error during sending frequency list query.\n"); return; @@ -248,6 +248,120 @@ cmdline_parse_inst_t cmd_query_freq_list = { }, }; +struct cmd_query_caps_result { + cmdline_fixed_string_t query_caps; + cmdline_fixed_string_t cpu_num; +}; + +static int +receive_capabilities(struct channel_packet_caps_list *pkt_caps_list, + unsigned int lcore_id) +{ + int ret; + + ret = rte_power_guest_channel_receive_msg(pkt_caps_list, + sizeof(struct channel_packet_caps_list), + lcore_id); + if (ret < 0) { + RTE_LOG(ERR, GUEST_CLI, "Error receiving message.\n"); + return -1; + } + if (pkt_caps_list->command != CPU_POWER_CAPS_LIST) { + RTE_LOG(ERR, GUEST_CLI, "Unexpected message received.\n"); + return -1; + } + return 0; +} + +static void +cmd_query_caps_list_parsed(void *parsed_result, + __rte_unused struct cmdline *cl, + __rte_unused void *data) +{ + struct cmd_query_caps_result *res = parsed_result; + unsigned int lcore_id; + struct channel_packet_caps_list pkt_caps_list; + struct channel_packet pkt; 
+ bool query_list = false; + int ret; + char *ep; + + memset(&pkt, 0, sizeof(struct channel_packet)); + memset(&pkt_caps_list, 0, sizeof(struct channel_packet_caps_list)); + + if (!strcmp(res->cpu_num, "all")) { + + /* Get first enabled lcore. */ + lcore_id = rte_get_next_lcore(-1, + 0, + 0); + if (lcore_id == RTE_MAX_LCORE) { + cmdline_printf(cl, "Enabled core not found.\n"); + return; + } + + pkt.command = CPU_POWER_QUERY_CAPS_LIST; + strcpy(pkt.vm_name, policy.vm_name); + query_list = true; + } else { + errno = 0; + lcore_id = (unsigned int)strtol(res->cpu_num, &ep, 10); + if (errno != 0 || lcore_id >= MAX_VCPU_PER_VM || + ep == res->cpu_num) { + cmdline_printf(cl, "Invalid parameter provided.\n"); + return; + } + pkt.command = CPU_POWER_QUERY_CAPS; + strcpy(pkt.vm_name, policy.vm_name); + pkt.resource_id = lcore_id; + } + + ret = query_data(&pkt, lcore_id); + if (ret < 0) { + cmdline_printf(cl, "Error during sending capabilities query.\n"); + return; + } + + ret = receive_capabilities(&pkt_caps_list, lcore_id); + if (ret < 0) { + cmdline_printf(cl, "Error during capabilities reception.\n"); + return; + } + if (query_list) { + unsigned int i; + for (i = 0; i < pkt_caps_list.num_vcpu; ++i) + cmdline_printf(cl, "Capabilities of [%d] vcore are:" + " turbo possibility: %ld, is priority core: %ld.\n", + i, + pkt_caps_list.turbo[i], + pkt_caps_list.priority[i]); + } else { + cmdline_printf(cl, "Capabilities of [%d] vcore are:" + " turbo possibility: %ld, is priority core: %ld.\n", + lcore_id, + pkt_caps_list.turbo[lcore_id], + pkt_caps_list.priority[lcore_id]); + } +} + +cmdline_parse_token_string_t cmd_query_caps_token = + TOKEN_STRING_INITIALIZER(struct cmd_query_caps_result, query_caps, "query_cpu_caps"); +cmdline_parse_token_string_t cmd_query_cap
Re: [dpdk-dev] [PATCH v2] ipsec: include high order bytes of esn in pkt len
Hi Lukasz, > diff --git a/lib/librte_ipsec/esp_outb.c b/lib/librte_ipsec/esp_outb.c > index c798bc4..ed5974b 100644 > --- a/lib/librte_ipsec/esp_outb.c > +++ b/lib/librte_ipsec/esp_outb.c > @@ -126,11 +126,11 @@ outb_tun_pkt_prepare(struct rte_ipsec_sa *sa, > rte_be64_t sqc, > > /* pad length + esp tail */ > pdlen = clen - plen; > - tlen = pdlen + sa->icv_len; > + tlen = pdlen + sa->icv_len + sa->sqh_len; We probably don't want to increase pkt_len by sa->sqh_len for inline case. That's why I suggested to pass sqh_len as parameter to that function. Then for inline we can just pass 0. Do you see any obstacles with that approach? Same thought for transport mode. Konstantin > > /* do append and prepend */ > ml = rte_pktmbuf_lastseg(mb); > - if (tlen + sa->sqh_len + sa->aad_len > rte_pktmbuf_tailroom(ml)) > + if (tlen + sa->aad_len > rte_pktmbuf_tailroom(ml)) > return -ENOSPC; > > /* prepend header */ > @@ -152,8 +152,8 @@ outb_tun_pkt_prepare(struct rte_ipsec_sa *sa, rte_be64_t > sqc, > rte_memcpy(ph, sa->hdr, sa->hdr_len); > > /* update original and new ip header fields */ > - update_tun_l3hdr(sa, ph + sa->hdr_l3_off, mb->pkt_len, sa->hdr_l3_off, > - sqn_low16(sqc)); > + update_tun_l3hdr(sa, ph + sa->hdr_l3_off, mb->pkt_len - sa->sqh_len, > + sa->hdr_l3_off, sqn_low16(sqc)); > > /* update spi, seqn and iv */ > esph = (struct esp_hdr *)(ph + sa->hdr_len); > @@ -292,11 +292,11 @@ outb_trs_pkt_prepare(struct rte_ipsec_sa *sa, > rte_be64_t sqc, > > /* pad length + esp tail */ > pdlen = clen - plen; > - tlen = pdlen + sa->icv_len; > + tlen = pdlen + sa->icv_len + sa->sqh_len; > > /* do append and insert */ > ml = rte_pktmbuf_lastseg(mb); > - if (tlen + sa->sqh_len + sa->aad_len > rte_pktmbuf_tailroom(ml)) > + if (tlen + sa->aad_len > rte_pktmbuf_tailroom(ml)) > return -ENOSPC; > > /* prepend space for ESP header */ > @@ -314,8 +314,8 @@ outb_trs_pkt_prepare(struct rte_ipsec_sa *sa, rte_be64_t > sqc, > insert_esph(ph, ph + hlen, uhlen); > > /* update ip header fields */ > - 
np = update_trs_l3hdr(sa, ph + l2len, mb->pkt_len, l2len, l3len, > - IPPROTO_ESP); > + np = update_trs_l3hdr(sa, ph + l2len, mb->pkt_len - sa->sqh_len, l2len, > + l3len, IPPROTO_ESP); > > /* update spi, seqn and iv */ > esph = (struct esp_hdr *)(ph + uhlen); > @@ -425,6 +425,9 @@ esp_outb_sqh_process(const struct rte_ipsec_session *ss, > struct rte_mbuf *mb[], > for (i = 0; i != num; i++) { > if ((mb[i]->ol_flags & PKT_RX_SEC_OFFLOAD_FAILED) == 0) { > ml = rte_pktmbuf_lastseg(mb[i]); > + /* remove high-order 32 bits of esn from packet len */ > + mb[i]->pkt_len -= sa->sqh_len; > + ml->data_len -= sa->sqh_len; > icv = rte_pktmbuf_mtod_offset(ml, void *, > ml->data_len - icv_len); > remove_sqh(icv, icv_len);
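To make the suggestion above concrete (pass sqh_len as a parameter so that the inline path can pass 0), here is a rough sketch of the tail-length computation and tailroom check. It is an assumption about how this could look, not code from the patch: esp_tail_prepare() is a hypothetical helper and plain integers stand in for the rte_ipsec_sa fields and rte_pktmbuf_tailroom().

```c
#include <errno.h>
#include <stdint.h>

/*
 * Compute the number of bytes appended after the payload and check that
 * they, plus the AAD scratch space, fit into the last segment.
 * sqh_len is supplied by the caller: sa->sqh_len for lookaside crypto,
 * 0 for inline processing, where the high-order ESN bytes are not
 * appended to the packet.
 */
static int
esp_tail_prepare(uint32_t pdlen, uint32_t icv_len, uint32_t sqh_len,
		uint32_t aad_len, uint32_t tailroom, uint32_t *tlen)
{
	*tlen = pdlen + icv_len + sqh_len;
	if (*tlen + aad_len > tailroom)
		return -ENOSPC;
	return 0;
}
```

With this shape the pkt_len adjustment stays in one place, and the inline case needs no later subtraction of sqh_len.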
Re: [dpdk-dev] [PATCH v4 2/5] eal: add lcore accessors
On Thu, May 30, 2019 at 3:39 PM Thomas Monjalon wrote: > 30/05/2019 12:11, Bruce Richardson: > > On Thu, May 30, 2019 at 09:40:08AM +0200, Thomas Monjalon wrote: > > > 30/05/2019 09:31, David Marchand: > > > > On Thu, May 30, 2019 at 12:51 AM Stephen Hemminger < > > > > step...@networkplumber.org> wrote: > > > > > > > > > On Thu, 30 May 2019 00:46:30 +0200 > > > > > Thomas Monjalon wrote: > > > > > > > > > > > 23/05/2019 15:58, David Marchand: > > > > > > > From: Stephen Hemminger > > > > > > > > > > > > > > The fields of the internal EAL core configuration are currently > > > > > > > laid bare as part of the API. This is not good practice and > limits > > > > > > > fixing issues with layout and sizes. > > > > > > > > > > > > > > Make new accessor functions for the fields used by current > drivers > > > > > > > and examples. > > > > > > [...] > > > > > > > +DPDK_19.08 { > > > > > > > + global: > > > > > > > + > > > > > > > + rte_lcore_cpuset; > > > > > > > + rte_lcore_index; > > > > > > > + rte_lcore_to_cpu_id; > > > > > > > + rte_lcore_to_socket_id; > > > > > > > + > > > > > > > +} DPDK_19.05; > > > > > > > + > > > > > > > EXPERIMENTAL { > > > > > > > global: > > > > > > > > > > > > Just to make sure, are we OK to introduce these functions > > > > > > as non-experimental? > > > > > > > > > > They were in previous releases as inlines this patch converts them > > > > > to real functions. > > > > > > > > > > > > > > Well, yes and no. > > > > > > > > rte_lcore_index and rte_lcore_to_socket_id already existed, so > making them > > > > part of the ABI is fine for me. > > > > > > > > rte_lcore_to_cpu_id is new but seems quite safe in how it can be > used, > > > > adding it to the ABI is ok for me. > > > > > > It is used by DPAA and some test. > > > I guess adding as experimental is fine too? > > > I'm fine with both options, I'm just trying to apply the policy > > > we agreed on. Does this case deserve an exception? 
> > > > > > > While it may be a good candidate, I'm not sure how much making an > exception > > for it really matters. I'd be tempted to just mark it experimental and > then > > have it stable for the 19.11 release. What do we really lose by waiting a > > release to stabilize it? > > I would agree Bruce. > If no more comment, I will wait for a v5 of this series. > I agree that there is no reason we make an exception for those 2 new ones. But to me the existing rte_lcore_index and rte_lcore_to_socket_id must be marked as stable. This is to avoid breaking existing users that did not set ALLOW_EXPERIMENTAL_API. I will prepare a v5 later. -- David Marchand
[dpdk-dev] [PATCH] cryptodev: free memzone when releasing cryptodev
When a cryptodev is created in a primary process, rte_cryptodev_data_alloc reserves a memzone. However, this memzone was not released when the cryptodev is uninitialized. After that, new cryptodev cannot be created due to memzone name conflict. This commit frees the memzone when a cryptodev is uninitialized, fixing this bug. This approach is chosen instead of keeping and reusing the old memzone, because the new cryptodev could belong to a different NUMA socket. Also, rte_cryptodev_data pointer is now properly recorded in cryptodev_globals.data array. Bugzilla ID: 105 Signed-off-by: Junxiao Shi --- lib/librte_cryptodev/rte_cryptodev.c | 44 +++- 1 file changed, 38 insertions(+), 6 deletions(-) diff --git a/lib/librte_cryptodev/rte_cryptodev.c b/lib/librte_cryptodev/rte_cryptodev.c index 00c2cf4..666dfea 100644 --- a/lib/librte_cryptodev/rte_cryptodev.c +++ b/lib/librte_cryptodev/rte_cryptodev.c @@ -653,6 +653,31 @@ rte_cryptodev_data_alloc(uint8_t dev_id, struct rte_cryptodev_data **data, return 0; } +static inline int +rte_cryptodev_data_free(uint8_t dev_id, struct rte_cryptodev_data **data) +{ + char mz_name[RTE_CRYPTODEV_NAME_MAX_LEN]; + const struct rte_memzone *mz; + int n; + + /* generate memzone name */ + n = snprintf(mz_name, sizeof(mz_name), "rte_cryptodev_data_%u", dev_id); + if (n >= (int)sizeof(mz_name)) + return -EINVAL; + + mz = rte_memzone_lookup(mz_name); + if (mz == NULL) + return -ENOMEM; + + RTE_ASSERT(*data == mz->addr); + *data = NULL; + + if (rte_eal_process_type() == RTE_PROC_PRIMARY) + return rte_memzone_free(mz); + + return 0; +} + static uint8_t rte_cryptodev_find_free_device_index(void) { @@ -687,16 +712,16 @@ rte_cryptodev_pmd_allocate(const char *name, int socket_id) cryptodev = rte_cryptodev_pmd_get_dev(dev_id); if (cryptodev->data == NULL) { - struct rte_cryptodev_data *cryptodev_data = - cryptodev_globals.data[dev_id]; + struct rte_cryptodev_data **cryptodev_data = + &cryptodev_globals.data[dev_id]; - int retval = 
rte_cryptodev_data_alloc(dev_id, &cryptodev_data, + int retval = rte_cryptodev_data_alloc(dev_id, cryptodev_data, socket_id); - if (retval < 0 || cryptodev_data == NULL) + if (retval < 0 || *cryptodev_data == NULL) return NULL; - cryptodev->data = cryptodev_data; + cryptodev->data = *cryptodev_data; strlcpy(cryptodev->data->name, name, RTE_CRYPTODEV_NAME_MAX_LEN); @@ -724,13 +749,20 @@ rte_cryptodev_pmd_release_device(struct rte_cryptodev *cryptodev) if (cryptodev == NULL) return -EINVAL; + uint8_t dev_id = cryptodev->data->dev_id; + /* Close device only if device operations have been set */ if (cryptodev->dev_ops) { - ret = rte_cryptodev_close(cryptodev->data->dev_id); + ret = rte_cryptodev_close(dev_id); if (ret < 0) return ret; } + struct rte_cryptodev_data **cryptodev_data = &cryptodev_globals.data[dev_id]; + ret = rte_cryptodev_data_free(dev_id, cryptodev_data); + if (ret < 0) + return ret; + cryptodev->attached = RTE_CRYPTODEV_DETACHED; cryptodev_globals.nb_devs--; return 0; -- 2.7.4
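The bookkeeping side of this fix can be illustrated outside DPDK. What follows is a minimal, self-contained sketch under stated assumptions, not the library code: `slot_table`, `fake_alloc`, `fake_free`, `make_zone_name` and `ZONE_NAME_LEN` are hypothetical stand-ins for `cryptodev_globals.data`, `rte_cryptodev_data_alloc`, `rte_cryptodev_data_free`, the memzone name generation and `RTE_CRYPTODEV_NAME_MAX_LEN`. It shows why handing the allocator the address of the table slot records the pointer globally, while handing it the address of a local copy does not, and how a truncated snprintf result is detected.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_DEVS 4
#define ZONE_NAME_LEN 64 /* hypothetical; stands in for RTE_CRYPTODEV_NAME_MAX_LEN */

struct dev_data {
	unsigned int dev_id;
};

/* Hypothetical stand-in for the cryptodev_globals.data[] array. */
static struct dev_data *slot_table[MAX_DEVS];

/* Build the per-device zone name; returns -EINVAL if it would not fit. */
static int
make_zone_name(char *buf, size_t len, unsigned int dev_id)
{
	int n = snprintf(buf, len, "rte_cryptodev_data_%u", dev_id);

	if (n < 0 || (size_t)n >= len)
		return -EINVAL;
	return 0;
}

/* Hypothetical allocator: stores the new object through 'data'. */
static int
fake_alloc(unsigned int dev_id, struct dev_data **data)
{
	*data = calloc(1, sizeof(**data));
	if (*data == NULL)
		return -ENOMEM;
	(*data)->dev_id = dev_id;
	return 0;
}

/* Counterpart: release the object and clear the caller's pointer. */
static void
fake_free(struct dev_data **data)
{
	free(*data);
	*data = NULL;
}
```

Calling `fake_alloc(1, &local)` with `local` copied from `slot_table[1]` leaves the table entry NULL, which mirrors the old behaviour; calling `fake_alloc(1, &slot_table[1])` updates the table, which is what the patch switches to.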
[dpdk-dev] eal/pci: Improve automatic selection of IOVA mode
In SPDK, not all drivers are registered with DPDK at startup. Previously, that meant DPDK always set itself up in IOVA_PA mode. Instead, when the correct IOVA mode cannot be determined from the devices and drivers known to DPDK at startup, use other heuristics (such as whether /proc/self/pagemap is accessible) to make a better choice. This lets SPDK run as an unprivileged user again without requiring users to set the IOVA mode explicitly on the command line.
[dpdk-dev] [PATCH 02/12] eal/pci: Inline several functions into rte_pci_get_iommu_class
This is in preparation for future simplifications. The functions are simply inlined for now. Signed-off-by: Ben Walker Change-Id: I129992c9b44f4575a28cc05b78297e15b6be4249 --- drivers/bus/pci/linux/pci.c | 176 +++- 1 file changed, 71 insertions(+), 105 deletions(-) diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index c99d523f0..d3177916a 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -497,86 +497,6 @@ rte_pci_scan(void) return -1; } -/* - * Is pci device bound to any kdrv - */ -static inline int -pci_one_device_is_bound(void) -{ - struct rte_pci_device *dev = NULL; - int ret = 0; - - FOREACH_DEVICE_ON_PCIBUS(dev) { - if (dev->kdrv == RTE_KDRV_UNKNOWN || - dev->kdrv == RTE_KDRV_NONE) { - continue; - } else { - ret = 1; - break; - } - } - return ret; -} - -/* - * Any one of the device bound to uio - */ -static inline int -pci_one_device_bound_uio(void) -{ - struct rte_pci_device *dev = NULL; - struct rte_devargs *devargs; - int need_check; - - FOREACH_DEVICE_ON_PCIBUS(dev) { - devargs = dev->device.devargs; - - need_check = 0; - switch (rte_pci_bus.bus.conf.scan_mode) { - case RTE_BUS_SCAN_WHITELIST: - if (devargs && devargs->policy == RTE_DEV_WHITELISTED) - need_check = 1; - break; - case RTE_BUS_SCAN_UNDEFINED: - case RTE_BUS_SCAN_BLACKLIST: - if (devargs == NULL || - devargs->policy != RTE_DEV_BLACKLISTED) - need_check = 1; - break; - } - - if (!need_check) - continue; - - if (dev->kdrv == RTE_KDRV_IGB_UIO || - dev->kdrv == RTE_KDRV_UIO_GENERIC) { - return 1; - } - } - return 0; -} - -/* - * Any one of the device has iova as va - */ -static inline int -pci_one_device_has_iova_va(void) -{ - struct rte_pci_device *dev = NULL; - struct rte_pci_driver *drv = NULL; - - FOREACH_DRIVER_ON_PCIBUS(drv) { - if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) { - FOREACH_DEVICE_ON_PCIBUS(dev) { - if (dev->kdrv == RTE_KDRV_VFIO && - rte_pci_match(drv, dev)) - return 1; - } - } - } - return 0; -} - #if 
defined(RTE_ARCH_X86) static bool pci_one_device_iommu_support_va(struct rte_pci_device *dev) @@ -641,14 +561,76 @@ pci_one_device_iommu_support_va(__rte_unused struct rte_pci_device *dev) #endif /* - * All devices IOMMUs support VA as IOVA + * Get iommu class of PCI devices on the bus. */ -static bool -pci_devices_iommu_support_va(void) +enum rte_iova_mode +rte_pci_get_iommu_class(void) { + bool is_bound = false; + bool is_vfio_noiommu_enabled = true; + bool has_iova_va = false; + bool is_bound_uio = false; + bool iommu_no_va = false; + bool break_out; + bool need_check; struct rte_pci_device *dev = NULL; struct rte_pci_driver *drv = NULL; + struct rte_devargs *devargs; + + FOREACH_DEVICE_ON_PCIBUS(dev) { + if (dev->kdrv == RTE_KDRV_UNKNOWN || + dev->kdrv == RTE_KDRV_NONE) { + continue; + } else { + is_bound = true; + break; + } + } + if (!is_bound) + return RTE_IOVA_DC; + FOREACH_DRIVER_ON_PCIBUS(drv) { + if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) { + FOREACH_DEVICE_ON_PCIBUS(dev) { + if (dev->kdrv == RTE_KDRV_VFIO && + rte_pci_match(drv, dev)) { + has_iova_va = true; + break; + } + } + + if (has_iova_va) + break; + } + } + + FOREACH_DEVICE_ON_PCIBUS(dev) { + devargs = dev->device.devargs; + + need_check = false; + switch (rte_pci_bus.bus.conf.scan_mode) { + case RTE_BUS_SCAN_WHITELIST: + if (devargs && devargs->policy == RTE_DEV_WHITELISTED) + need_check = true; + break; + case RTE_BUS_SCAN_UNDEFINED: + case RTE_BUS_SCAN_BLACKLIST: + if (
[dpdk-dev] [PATCH 05/12] eal/pci: Add function pci_ignore_device
This performs a check for whether the device should be ignored due to whitelist or blacklist. This check eventually needs to apply to all of the other checks in rte_pci_get_iommu_class. Signed-off-by: Ben Walker Change-Id: I8e63e4c2e4199f34561ea1d911e13d6d74a47322 --- drivers/bus/pci/linux/pci.c | 44 + 1 file changed, 25 insertions(+), 19 deletions(-) diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index b7a66d717..f269b6a64 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -560,6 +560,29 @@ pci_one_device_iommu_support_va(__rte_unused struct rte_pci_device *dev) } #endif +static bool +pci_ignore_device(struct rte_pci_device *dev) +{ + struct rte_devargs *devargs; + + devargs = dev->device.devargs; + + switch (rte_pci_bus.bus.conf.scan_mode) { + case RTE_BUS_SCAN_WHITELIST: + if (devargs && devargs->policy == RTE_DEV_WHITELISTED) + return false; + break; + case RTE_BUS_SCAN_UNDEFINED: + case RTE_BUS_SCAN_BLACKLIST: + if (devargs == NULL || + devargs->policy != RTE_DEV_BLACKLISTED) + return false; + break; + } + + return true; +} + /* * Get iommu class of PCI devices on the bus. 
*/ @@ -571,10 +594,9 @@ rte_pci_get_iommu_class(void) bool has_iova_va = false; bool is_bound_uio = false; bool iommu_no_va = false; - bool need_check; struct rte_pci_device *dev = NULL; struct rte_pci_driver *drv = NULL; - struct rte_devargs *devargs; + FOREACH_DEVICE_ON_PCIBUS(dev) { if (dev->kdrv == RTE_KDRV_UNKNOWN || @@ -612,23 +634,7 @@ rte_pci_get_iommu_class(void) } FOREACH_DEVICE_ON_PCIBUS(dev) { - devargs = dev->device.devargs; - - need_check = false; - switch (rte_pci_bus.bus.conf.scan_mode) { - case RTE_BUS_SCAN_WHITELIST: - if (devargs && devargs->policy == RTE_DEV_WHITELISTED) - need_check = true; - break; - case RTE_BUS_SCAN_UNDEFINED: - case RTE_BUS_SCAN_BLACKLIST: - if (devargs == NULL || - devargs->policy != RTE_DEV_BLACKLISTED) - need_check = true; - break; - } - - if (!need_check) + if (pci_ignore_device(dev)) continue; if (dev->kdrv == RTE_KDRV_IGB_UIO || -- 2.20.1
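The whitelist/blacklist rule factored out by this patch can be modelled without any DPDK headers. The sketch below is an illustration only: the enums, `struct devargs` and `ignore_device` are simplified, hypothetical stand-ins for `rte_pci_bus.bus.conf.scan_mode`, `rte_devargs` and `pci_ignore_device`, and the scan mode is passed as a parameter rather than read from a global.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the DPDK enums; names are illustrative only. */
enum scan_mode { SCAN_UNDEFINED, SCAN_WHITELIST, SCAN_BLACKLIST };
enum dev_policy { POLICY_WHITELISTED, POLICY_BLACKLISTED };

struct devargs {
	enum dev_policy policy;
};

/* Return true when the device should be skipped by the IOVA checks. */
static bool
ignore_device(enum scan_mode mode, const struct devargs *args)
{
	switch (mode) {
	case SCAN_WHITELIST:
		/* Only explicitly whitelisted devices are considered. */
		return !(args != NULL && args->policy == POLICY_WHITELISTED);
	case SCAN_UNDEFINED:
	case SCAN_BLACKLIST:
		/* Everything is considered unless explicitly blacklisted. */
		return args != NULL && args->policy == POLICY_BLACKLISTED;
	}
	return true;
}
```

In whitelist mode a device without devargs is ignored, whereas in blacklist or undefined mode a device without devargs is kept; that asymmetry is the part that is easy to get wrong when the check is repeated by hand in several loops.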
[dpdk-dev] [PATCH 07/12] eal/pci: Reverse if check in rte_pci_get_iommu_class
It's simpler to reverse the if statement here, especially with an upcoming simplification. Signed-off-by: Ben Walker Change-Id: I6cff80231032304f3f865fdf38157554fad7fd07 --- drivers/bus/pci/linux/pci.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index ebe62f140..f678d2318 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -601,10 +601,8 @@ rte_pci_get_iommu_class(void) if (pci_ignore_device(dev)) continue; - if (dev->kdrv == RTE_KDRV_UNKNOWN || - dev->kdrv == RTE_KDRV_NONE) { - continue; - } else { + if (dev->kdrv != RTE_KDRV_UNKNOWN && + dev->kdrv != RTE_KDRV_NONE) { is_bound = true; break; } -- 2.20.1
[dpdk-dev] [PATCH 11/12] eal/pci: rte_pci_get_iommu_class handles no drivers
In the case where no drivers are registered with the system, rte_pci_get_iommu_class should return RTE_IOVA_DC. Signed-off-by: Ben Walker Change-Id: Ia5b0cae100cfcfe46a9e4996328f9746ce33cfd3 --- drivers/bus/pci/linux/pci.c | 79 ++--- 1 file changed, 38 insertions(+), 41 deletions(-) diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index 09af66571..abc21061f 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -589,49 +589,68 @@ pci_ignore_device(struct rte_pci_device *dev) enum rte_iova_mode rte_pci_get_iommu_class(void) { - bool is_bound = false; - bool is_vfio_noiommu_enabled = true; - bool has_iova_va = false; - bool is_bound_uio = false; - bool iommu_no_va = false; - struct rte_pci_device *dev = NULL; - struct rte_pci_driver *drv = NULL; + struct rte_pci_device *dev; + struct rte_pci_driver *drv; + struct rte_pci_addr *addr; + enum rte_iova_mode iova_mode; + + iova_mode = RTE_IOVA_DC; FOREACH_DEVICE_ON_PCIBUS(dev) { if (pci_ignore_device(dev)) continue; + addr = &dev->addr; + switch (dev->kdrv) { case RTE_KDRV_UNKNOWN: case RTE_KDRV_NONE: break; case RTE_KDRV_VFIO: - is_bound = true; FOREACH_DRIVER_ON_PCIBUS(drv) { if (!rte_pci_match(drv, dev)) continue; - /* - * just one PCI device needs to be checked out because - * the IOMMU hardware is the same for all of them. 
- */ - iommu_no_va = !pci_one_device_iommu_support_va(dev); + if ((drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) == 0) + continue; - if (drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) { - has_iova_va = true; - break; + if (!pci_one_device_iommu_support_va(dev)) { + RTE_LOG(WARNING, EAL, "Device " PCI_PRI_FMT " wanted IOVA as VA, but ", + addr->domain, addr->bus, addr->devid, addr->function); + RTE_LOG(WARNING, EAL, "IOMMU does not support it.\n"); + iova_mode = RTE_IOVA_PA; + } +#ifdef VFIO_PRESENT + else if (rte_vfio_noiommu_is_enabled()) { + RTE_LOG(WARNING, EAL, "Device " PCI_PRI_FMT " wanted IOVA as VA, but ", + addr->domain, addr->bus, addr->devid, addr->function); + RTE_LOG(WARNING, EAL, "vfio-noiommu is enabled.\n"); + iova_mode = RTE_IOVA_PA; +#endif + } else if (iova_mode == RTE_IOVA_PA) { + RTE_LOG(WARNING, EAL, "Device " PCI_PRI_FMT " wanted IOVA as VA, but ", + addr->domain, addr->bus, addr->devid, addr->function); + RTE_LOG(WARNING, EAL, "other devices require PA.\n"); + } else { + iova_mode = RTE_IOVA_VA; } } break; case RTE_KDRV_IGB_UIO: case RTE_KDRV_UIO_GENERIC: case RTE_KDRV_NIC_UIO: - is_bound = true; FOREACH_DRIVER_ON_PCIBUS(drv) { if (!rte_pci_match(drv, dev)) continue; - is_bound_uio = true; + if (iova_mode == RTE_IOVA_VA) { + RTE_LOG(WARNING, EAL, "Some devices wanted IOVA as VA, but "); + RTE_LOG(WARNING, EAL, "device " PCI_PRI_FMT " requires PA.\n", + addr->domain, addr->bus, addr->devid, addr->function); + + } + + iova_mode = RTE_IOVA_PA; break; } break; @@ -639,29 +658,7 @@ rte_pci_get_iommu_class(void) } } - if (!is_bound) - return RTE_IOVA_DC; - -#ifdef VFIO_PRESENT - is_vfio_noiommu_enabled = rte_vfio_noiommu_is_enabled() == true ? -
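The way this patch folds per-device requirements into one answer can be sketched as a small reduction. This is a simplified model under assumptions, not the patch itself: `enum iova_mode`, `merge_iova` and `bus_iova_mode` are hypothetical names, and each device is reduced beforehand to a single wish (no preference, PA, or VA) instead of being derived from its kernel driver and IOMMU capabilities.

```c
#include <stddef.h>

/* Simplified stand-in for enum rte_iova_mode. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/*
 * Merge one device's wish into the running bus-wide result:
 * "don't care" never changes the result, and PA always wins over VA.
 */
static enum iova_mode
merge_iova(enum iova_mode acc, enum iova_mode want)
{
	if (want == IOVA_DC)
		return acc;
	if (acc == IOVA_PA || want == IOVA_PA)
		return IOVA_PA;
	return IOVA_VA;
}

/* Fold all device wishes; with no devices the bus has no preference. */
static enum iova_mode
bus_iova_mode(const enum iova_mode *wants, size_t n)
{
	enum iova_mode acc = IOVA_DC;
	size_t i;

	for (i = 0; i < n; i++)
		acc = merge_iova(acc, wants[i]);
	return acc;
}
```

The point of starting from "don't care" is that an empty device or driver list yields no preference at all, which leaves the final decision to the caller.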
[dpdk-dev] [PATCH 03/12] eal/pci: Rework loops in rte_pci_get_iommu_class
Make all of the loops first iterate over devices, then drivers. This is in preparation for combining them into a single loop. Signed-off-by: Ben Walker Change-Id: Ifb2bfcc60570a5d5a13481be3da0fc74bf00ef1f --- drivers/bus/pci/linux/pci.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index d3177916a..70815e4f0 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -589,10 +589,10 @@ rte_pci_get_iommu_class(void) if (!is_bound) return RTE_IOVA_DC; - FOREACH_DRIVER_ON_PCIBUS(drv) { - if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) { - FOREACH_DEVICE_ON_PCIBUS(dev) { - if (dev->kdrv == RTE_KDRV_VFIO && + FOREACH_DEVICE_ON_PCIBUS(dev) { + if (dev->kdrv == RTE_KDRV_VFIO) { + FOREACH_DRIVER_ON_PCIBUS(drv) { + if (drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA && rte_pci_match(drv, dev)) { has_iova_va = true; break; @@ -631,8 +631,8 @@ rte_pci_get_iommu_class(void) } break_out = false; - FOREACH_DRIVER_ON_PCIBUS(drv) { - FOREACH_DEVICE_ON_PCIBUS(dev) { + FOREACH_DEVICE_ON_PCIBUS(dev) { + FOREACH_DRIVER_ON_PCIBUS(drv) { if (!rte_pci_match(drv, dev)) continue; /* -- 2.20.1
[dpdk-dev] [PATCH 01/12] eal: Make rte_eal_using_phys_addrs work sooner
This function only returned the correct answer after a call to initialize the memory subsystem. Make it work prior to that. Signed-off-by: Ben Walker Change-Id: I8f3c5128fbf5da884a956bbcc72c5a13564825d5 --- lib/librte_eal/linux/eal/eal_memory.c | 63 --- 1 file changed, 28 insertions(+), 35 deletions(-) diff --git a/lib/librte_eal/linux/eal/eal_memory.c b/lib/librte_eal/linux/eal/eal_memory.c index 416dad898..0c07bb946 100644 --- a/lib/librte_eal/linux/eal/eal_memory.c +++ b/lib/librte_eal/linux/eal/eal_memory.c @@ -66,34 +66,8 @@ * zone as well as a physical contiguous zone. */ -static bool phys_addrs_available = true; - #define RANDOMIZE_VA_SPACE_FILE "/proc/sys/kernel/randomize_va_space" -static void -test_phys_addrs_available(void) -{ - uint64_t tmp = 0; - phys_addr_t physaddr; - - if (!rte_eal_has_hugepages()) { - RTE_LOG(ERR, EAL, - "Started without hugepages support, physical addresses not available\n"); - phys_addrs_available = false; - return; - } - - physaddr = rte_mem_virt2phy(&tmp); - if (physaddr == RTE_BAD_PHYS_ADDR) { - if (rte_eal_iova_mode() == RTE_IOVA_PA) - RTE_LOG(ERR, EAL, - "Cannot obtain physical addresses: %s. " - "Only vfio will function.\n", - strerror(errno)); - phys_addrs_available = false; - } -} - /* * Get physical address of any mapped virtual address in the current process. 
*/ @@ -107,7 +81,7 @@ rte_mem_virt2phy(const void *virtaddr) off_t offset; /* Cannot parse /proc/self/pagemap, no need to log errors everywhere */ - if (!phys_addrs_available) + if (!rte_eal_using_phys_addrs()) return RTE_BAD_IOVA; /* standard page size */ @@ -1336,8 +1310,6 @@ eal_legacy_hugepage_init(void) int nr_hugefiles, nr_hugepages = 0; void *addr; - test_phys_addrs_available(); - memset(used_hp, 0, sizeof(used_hp)); /* get pointer to global configuration */ @@ -1516,7 +1488,7 @@ eal_legacy_hugepage_init(void) continue; } - if (phys_addrs_available && + if (rte_eal_using_phys_addrs() && rte_eal_iova_mode() != RTE_IOVA_VA) { /* find physical addresses for each hugepage */ if (find_physaddrs(&tmp_hp[hp_offset], hpi) < 0) { @@ -1735,8 +1707,6 @@ eal_hugepage_init(void) uint64_t memory[RTE_MAX_NUMA_NODES]; int hp_sz_idx, socket_id; - test_phys_addrs_available(); - memset(used_hp, 0, sizeof(used_hp)); for (hp_sz_idx = 0; @@ -1879,8 +1849,6 @@ eal_legacy_hugepage_attach(void) "into secondary processes\n"); } - test_phys_addrs_available(); - fd_hugepage = open(eal_hugepage_data_path(), O_RDONLY); if (fd_hugepage < 0) { RTE_LOG(ERR, EAL, "Could not open %s\n", @@ -2020,7 +1988,32 @@ rte_eal_hugepage_attach(void) int rte_eal_using_phys_addrs(void) { - return phys_addrs_available; + static int using_phys_addrs = -1; + uint64_t tmp = 0; + phys_addr_t physaddr; + + if (using_phys_addrs != -1) + return using_phys_addrs; + + /* Set the default to 1 */ + using_phys_addrs = 1; + + if (!rte_eal_has_hugepages()) { + RTE_LOG(ERR, EAL, + "Started without hugepages support, physical addresses not available\n"); + using_phys_addrs = 0; + return using_phys_addrs; + } + + physaddr = rte_mem_virt2phy(&tmp); + if (physaddr == RTE_BAD_PHYS_ADDR) { + if (rte_eal_iova_mode() == RTE_IOVA_PA) + RTE_LOG(ERR, EAL, + "Cannot obtain physical addresses. Only vfio will function.\n"); + using_phys_addrs = 0; + } + + return using_phys_addrs; } static int __rte_unused -- 2.20.1
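The lazy, cached probe introduced here has one subtle property: the probe itself calls back into the function that is caching the result. The sketch below models that pattern in isolation. It is an illustration under assumptions: `fake_virt2phy`, `fake_pagemap_ok`, `probe_count` and `cached_result` are hypothetical, and the hugepage check from the patch is left out.

```c
#include <stdbool.h>

static int cached_result = -1;      /* -1 = not probed yet, 0 = no, 1 = yes */
static bool fake_pagemap_ok = true; /* hypothetical: pretend translation works */
static int probe_count;             /* how often the expensive probe ran */

static int using_phys_addrs(void);

/*
 * Hypothetical address translation that, like rte_mem_virt2phy in the
 * patch, consults the cached flag before doing any real work.
 */
static bool
fake_virt2phy(void)
{
	if (!using_phys_addrs())
		return false;
	probe_count++;
	return fake_pagemap_ok;
}

static int
using_phys_addrs(void)
{
	if (cached_result != -1)
		return cached_result;

	/* Optimistic default: lets the re-entrant call above proceed. */
	cached_result = 1;
	if (!fake_virt2phy())
		cached_result = 0;
	return cached_result;
}
```

Setting the default to 1 before probing is what keeps the mutual recursion finite: the nested call sees a definite value and returns immediately, and the outer call then overwrites it with the real outcome.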
[dpdk-dev] [PATCH 06/12] eal/pci: Correctly test whitelist/blacklist in rte_pci_get_iommu_class
All of the checks should respect the white and black lists. Signed-off-by: Ben Walker Change-Id: Ie66176bea49987d1fc0a03dbee2638d9dd6efbc5 --- drivers/bus/pci/linux/pci.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index f269b6a64..ebe62f140 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -597,8 +597,10 @@ rte_pci_get_iommu_class(void) struct rte_pci_device *dev = NULL; struct rte_pci_driver *drv = NULL; - FOREACH_DEVICE_ON_PCIBUS(dev) { + if (pci_ignore_device(dev)) + continue; + if (dev->kdrv == RTE_KDRV_UNKNOWN || dev->kdrv == RTE_KDRV_NONE) { continue; @@ -611,6 +613,9 @@ rte_pci_get_iommu_class(void) return RTE_IOVA_DC; FOREACH_DEVICE_ON_PCIBUS(dev) { + if (pci_ignore_device(dev)) + continue; + if (dev->kdrv == RTE_KDRV_VFIO) { FOREACH_DRIVER_ON_PCIBUS(drv) { if (!rte_pci_match(drv, dev)) -- 2.20.1
[dpdk-dev] [PATCH 09/12] eal/pci: Simplify rte_pci_get_iommu class by using a switch
Take several independent if statements and convert to a switch statement. Signed-off-by: Ben Walker Change-Id: Ia77c88ea484b529e8b0c9e09e8ef22cf3210e669 --- drivers/bus/pci/linux/pci.c | 21 - 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index 11e2e4d1b..41fd82988 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -601,12 +601,12 @@ rte_pci_get_iommu_class(void) if (pci_ignore_device(dev)) continue; - if (dev->kdrv != RTE_KDRV_UNKNOWN && - dev->kdrv != RTE_KDRV_NONE) { + switch (dev->kdrv) { + case RTE_KDRV_UNKNOWN: + case RTE_KDRV_NONE: + break; + case RTE_KDRV_VFIO: is_bound = true; - } - - if (dev->kdrv == RTE_KDRV_VFIO) { FOREACH_DRIVER_ON_PCIBUS(drv) { if (!rte_pci_match(drv, dev)) continue; @@ -622,11 +622,14 @@ rte_pci_get_iommu_class(void) break; } } - } - - if (dev->kdrv == RTE_KDRV_IGB_UIO || - dev->kdrv == RTE_KDRV_UIO_GENERIC) { + break; + case RTE_KDRV_IGB_UIO: + case RTE_KDRV_UIO_GENERIC: + case RTE_KDRV_NIC_UIO: + is_bound = true; is_bound_uio = true; + break; + } } -- 2.20.1
[dpdk-dev] [PATCH 04/12] eal/pci: Collapse two loops in rte_pci_get_iommu_class
Two of these loops easily collapse into a single loop. This sets the stage for future simplifications. Signed-off-by: Ben Walker Change-Id: I3353f2e3585808cebff3f11805f96e4a1cc7fb3a --- drivers/bus/pci/linux/pci.c | 31 ++- 1 file changed, 10 insertions(+), 21 deletions(-) diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index 70815e4f0..b7a66d717 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -571,7 +571,6 @@ rte_pci_get_iommu_class(void) bool has_iova_va = false; bool is_bound_uio = false; bool iommu_no_va = false; - bool break_out; bool need_check; struct rte_pci_device *dev = NULL; struct rte_pci_driver *drv = NULL; @@ -592,8 +591,16 @@ rte_pci_get_iommu_class(void) FOREACH_DEVICE_ON_PCIBUS(dev) { if (dev->kdrv == RTE_KDRV_VFIO) { FOREACH_DRIVER_ON_PCIBUS(drv) { - if (drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA && - rte_pci_match(drv, dev)) { + if (!rte_pci_match(drv, dev)) + continue; + + /* + * just one PCI device needs to be checked out because + * the IOMMU hardware is the same for all of them. + */ + iommu_no_va = !pci_one_device_iommu_support_va(dev); + + if (drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) { has_iova_va = true; break; } @@ -630,24 +637,6 @@ rte_pci_get_iommu_class(void) } } - break_out = false; - FOREACH_DEVICE_ON_PCIBUS(dev) { - FOREACH_DRIVER_ON_PCIBUS(drv) { - if (!rte_pci_match(drv, dev)) - continue; - /* -* just one PCI device needs to be checked out because -* the IOMMU hardware is the same for all of them. -*/ - iommu_no_va = !pci_one_device_iommu_support_va(dev); - break_out = true; - break; - } - - if (break_out) - break; - } - #ifdef VFIO_PRESENT is_vfio_noiommu_enabled = rte_vfio_noiommu_is_enabled() == true ? true : false; -- 2.20.1
[dpdk-dev] [PATCH 12/12] eal: If bus can't decide PA or VA, try to access PA
If the bus can't determine a preference for IOVA_PA vs. IOVA_VA by looking at the devices and drivers, as a last resort test if physical addresses are even accessible in /proc/self/pagemap. If they are, use IOVA_PA. If they are not, use IOVA_VA. Change-Id: If1eeb723283b80b24bd973987054fdad62f59cbd --- lib/librte_eal/common/eal_common_bus.c | 4 lib/librte_eal/linux/eal/eal.c | 28 +++--- 2 files changed, 21 insertions(+), 11 deletions(-) diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c index c8f1901f0..77f1be1b4 100644 --- a/lib/librte_eal/common/eal_common_bus.c +++ b/lib/librte_eal/common/eal_common_bus.c @@ -237,10 +237,6 @@ rte_bus_get_iommu_class(void) mode |= bus->get_iommu_class(); } - if (mode != RTE_IOVA_VA) { - /* Use default IOVA mode */ - mode = RTE_IOVA_PA; - } return mode; } diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c index 161399619..283aae120 100644 --- a/lib/librte_eal/linux/eal/eal.c +++ b/lib/librte_eal/linux/eal/eal.c @@ -948,6 +948,7 @@ rte_eal_init(int argc, char **argv) static char logid[PATH_MAX]; char cpuset[RTE_CPU_AFFINITY_STR_LEN]; char thread_name[RTE_MAX_THREAD_NAME_LEN]; + enum rte_iova_mode iova_mode; /* checks if the machine is adequate */ if (!rte_cpu_is_supported()) { @@ -1037,18 +1038,31 @@ rte_eal_init(int argc, char **argv) /* if no EAL option "--iova-mode=", use bus IOVA scheme */ if (internal_config.iova_mode == RTE_IOVA_DC) { - /* autodetect the IOVA mapping mode (default is RTE_IOVA_PA) */ - rte_eal_get_configuration()->iova_mode = - rte_bus_get_iommu_class(); + /* autodetect the IOVA mapping mode */ + iova_mode = rte_bus_get_iommu_class(); /* Workaround for KNI which requires physical address to work */ - if (rte_eal_get_configuration()->iova_mode == RTE_IOVA_VA && + if (iova_mode == RTE_IOVA_VA && rte_eal_check_module("rte_kni") == 1) { - rte_eal_get_configuration()->iova_mode = RTE_IOVA_PA; + iova_mode = RTE_IOVA_PA; RTE_LOG(WARNING, EAL, - "Some 
devices want IOVA as VA but PA will be used because.. " - "KNI module inserted\n"); + "Some devices want IOVA as VA but PA will be" + " used because KNI module inserted\n"); + } + + if (iova_mode == RTE_IOVA_DC) { + /* If the bus doesn't care, check if physical addresses are +* accessible. */ + if (rte_eal_using_phys_addrs()) { + /* Physical addresses are available, so the safest +* choice is to use those. */ + iova_mode = RTE_IOVA_PA; + } else { + iova_mode = RTE_IOVA_VA; + } } + + rte_eal_get_configuration()->iova_mode = iova_mode; } else { rte_eal_get_configuration()->iova_mode = internal_config.iova_mode; -- 2.20.1
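The resulting decision order in rte_eal_init can be summarised as a pure function. This is a minimal sketch under assumptions rather than the EAL code: `choose_iova_mode` and its parameters are hypothetical, the bus result is passed in as a single value, and the KNI module check and the physical address probe are reduced to booleans.

```c
#include <stdbool.h>

/* Simplified stand-in for enum rte_iova_mode. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

static enum iova_mode
choose_iova_mode(enum iova_mode requested, enum iova_mode bus_mode,
		 bool kni_loaded, bool phys_addrs_ok)
{
	enum iova_mode mode;

	/* An explicit --iova-mode option always wins. */
	if (requested != IOVA_DC)
		return requested;

	mode = bus_mode;

	/* KNI needs physical addresses, so it overrides a VA preference. */
	if (mode == IOVA_VA && kni_loaded)
		mode = IOVA_PA;

	/* Last resort: the bus has no preference, so probe for PA access. */
	if (mode == IOVA_DC)
		mode = phys_addrs_ok ? IOVA_PA : IOVA_VA;

	return mode;
}
```

With this ordering, an unprivileged process whose buses report no preference ends up in VA mode, while a privileged one keeps the previous PA default.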
[dpdk-dev] [PATCH 10/12] eal/pci: Finding a device bound to UIO does not force PA
If a device is found that is bound to the UIO driver, only force IOVA_PA if there is a driver registered to use it. Signed-off-by: Ben Walker Change-Id: I8015f11a33ab1b7662bf374d6944eff8d7a74a07 --- drivers/bus/pci/linux/pci.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index 41fd82988..09af66571 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -627,7 +627,13 @@ rte_pci_get_iommu_class(void) case RTE_KDRV_UIO_GENERIC: case RTE_KDRV_NIC_UIO: is_bound = true; - is_bound_uio = true; + FOREACH_DRIVER_ON_PCIBUS(drv) { + if (!rte_pci_match(drv, dev)) + continue; + + is_bound_uio = true; + break; + } break; } -- 2.20.1
[dpdk-dev] [PATCH 08/12] eal/pci: Collapse loops in rte_pci_get_iommu_class
The three loops can now be easily combined into one. This is slightly less efficient than before because it doesn't break out early. But that can be addressed later. Signed-off-by: Ben Walker Change-Id: Ic97155bb478dddbcbeaa6d51947684ffef219a52 --- drivers/bus/pci/linux/pci.c | 19 +++ 1 file changed, 3 insertions(+), 16 deletions(-) diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index f678d2318..11e2e4d1b 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -604,15 +604,7 @@ rte_pci_get_iommu_class(void) if (dev->kdrv != RTE_KDRV_UNKNOWN && dev->kdrv != RTE_KDRV_NONE) { is_bound = true; - break; } - } - if (!is_bound) - return RTE_IOVA_DC; - - FOREACH_DEVICE_ON_PCIBUS(dev) { - if (pci_ignore_device(dev)) - continue; if (dev->kdrv == RTE_KDRV_VFIO) { FOREACH_DRIVER_ON_PCIBUS(drv) { @@ -630,15 +622,7 @@ rte_pci_get_iommu_class(void) break; } } - - if (has_iova_va) - break; } - } - - FOREACH_DEVICE_ON_PCIBUS(dev) { - if (pci_ignore_device(dev)) - continue; if (dev->kdrv == RTE_KDRV_IGB_UIO || dev->kdrv == RTE_KDRV_UIO_GENERIC) { @@ -646,6 +630,9 @@ rte_pci_get_iommu_class(void) } } + if (!is_bound) + return RTE_IOVA_DC; + #ifdef VFIO_PRESENT is_vfio_noiommu_enabled = rte_vfio_noiommu_is_enabled() == true ? true : false; -- 2.20.1
Re: [dpdk-dev] [PATCH 02/12] eal/pci: Inline several functions into rte_pci_get_iommu_class
On Thu, 30 May 2019 10:48:09 -0700
Ben Walker wrote:

> This is in preparation for future simplifications. The
> functions are simply inlined for now.
>
> Signed-off-by: Ben Walker
> Change-Id: I129992c9b44f4575a28cc05b78297e15b6be4249

Please don't inline any functions that are not in the fast path.
The compiler will do it anyway.
Re: [dpdk-dev] [PATCH 02/12] eal/pci: Inline several functions into rte_pci_get_iommu_class
> -----Original Message-----
> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Thursday, May 30, 2019 10:57 AM
> To: Walker, Benjamin
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 02/12] eal/pci: Inline several functions into
> rte_pci_get_iommu_class
>
> On Thu, 30 May 2019 10:48:09 -0700
> Ben Walker wrote:
>
> > This is in preparation for future simplifications. The functions are
> > simply inlined for now.
> >
> > Signed-off-by: Ben Walker
> > Change-Id: I129992c9b44f4575a28cc05b78297e15b6be4249
>
> Please don't inline any functions that are not in the fast path.
> The compiler will do it anyway.

That's not what I mean by inline. I didn't mark the functions inline - I copied
their contents into the single place they are called. This patch is a set up
patch for a later one that refactors the way this function works, and doing this
makes the diff easier to read.
Re: [dpdk-dev] [PATCH v4 2/5] eal: add lcore accessors
On Thu, May 30, 2019 at 07:00:36PM +0200, David Marchand wrote:
>On Thu, May 30, 2019 at 3:39 PM Thomas Monjalon
><tho...@monjalon.net> wrote:
>
> 30/05/2019 12:11, Bruce Richardson:
> > On Thu, May 30, 2019 at 09:40:08AM +0200, Thomas Monjalon wrote:
> > > 30/05/2019 09:31, David Marchand:
> > > > On Thu, May 30, 2019 at 12:51 AM Stephen Hemminger <
> > > > step...@networkplumber.org> wrote:
> > > >
> > > > > On Thu, 30 May 2019 00:46:30 +0200
> > > > > Thomas Monjalon <tho...@monjalon.net> wrote:
> > > > >
> > > > > > 23/05/2019 15:58, David Marchand:
> > > > > > > From: Stephen Hemminger <step...@networkplumber.org>
> > > > > > >
> > > > > > > The fields of the internal EAL core configuration are currently
> > > > > > > laid bare as part of the API. This is not good practice and limits
> > > > > > > fixing issues with layout and sizes.
> > > > > > >
> > > > > > > Make new accessor functions for the fields used by current drivers
> > > > > > > and examples.
> > > > > > [...]
> > > > > > > +DPDK_19.08 {
> > > > > > > + global:
> > > > > > > +
> > > > > > > + rte_lcore_cpuset;
> > > > > > > + rte_lcore_index;
> > > > > > > + rte_lcore_to_cpu_id;
> > > > > > > + rte_lcore_to_socket_id;
> > > > > > > +
> > > > > > > +} DPDK_19.05;
> > > > > > > +
> > > > > > > EXPERIMENTAL {
> > > > > > > global:
> > > > > >
> > > > > > Just to make sure, are we OK to introduce these functions
> > > > > > as non-experimental?
> > > > >
> > > > > They were in previous releases as inlines this patch converts them
> > > > > to real functions.
> > > > >
> > > > >
> > > > Well, yes and no.
> > > >
> > > > rte_lcore_index and rte_lcore_to_socket_id already existed, so making them
> > > > part of the ABI is fine for me.
> > > >
> > > > rte_lcore_to_cpu_id is new but seems quite safe in how it can be used,
> > > > adding it to the ABI is ok for me.
> > >
> > > It is used by DPAA and some test.
> > > I guess adding as experimental is fine too?
> > > I'm fine with both options, I'm just trying to apply the policy
> > > we agreed on. Does this case deserve an exception?
> > >
> >
> > While it may be a good candidate, I'm not sure how much making an exception
> > for it really matters. I'd be tempted to just mark it experimental and then
> > have it stable for the 19.11 release. What do we really lose by waiting a
> > release to stabilize it?
> I would agree Bruce.
> If no more comment, I will wait for a v5 of this series.
>
>I agree that there is no reason we make an exception for those 2 new
>ones.
>But to me the existing rte_lcore_index and rte_lcore_to_socket_id must
>be marked as stable.
>This is to avoid breaking existing users that did not set
>ALLOW_EXPERIMENTAL_API.
>I will prepare a v5 later.
>--

Yes, agreed. Any existing APIs that were already present as static inlines
can go straight to stable when added to the .map file.

/Bruce
[dpdk-dev] [PATCH 0/8] raw/ioat: driver for Intel QuickData Technology
This patch series adds support for the Intel QuickData Technology device,
part of the Intel I/O Acceleration Technology (Intel I/OAT). It is a raw
device for allowing hardware DMA i.e. data copies in hardware.

Bruce Richardson (8):
  raw/ioat: add initial support for ioat rawdev driver
  usertools/dpdk-devbind.py: add support for IOAT devices
  raw/ioat: add register definition file
  raw/ioat: create device on probe and destroy on release
  raw/ioat: add device info function
  raw/ioat: add configure, start and stop functions
  raw/ioat: add statistics functions
  raw/ioat: add local API to perform copies

 MAINTAINERS                                 |   7 +-
 app/test/Makefile                           |   1 +
 app/test/meson.build                        |   4 +
 app/test/test_ioat_rawdev.c                 | 269 +
 config/common_armv8a_linux                  |   1 +
 config/common_base                          |   5 +
 config/defconfig_arm-armv7a-linuxapp-gcc    |   1 +
 config/defconfig_ppc_64-power8-linuxapp-gcc |   1 +
 doc/guides/rawdevs/index.rst                |   1 +
 doc/guides/rawdevs/ioat_rawdev.rst          | 227 ++
 doc/guides/rel_notes/release_19_08.rst      |  11 +
 drivers/raw/Makefile                        |   1 +
 drivers/raw/ioat/Makefile                   |  29 ++
 drivers/raw/ioat/ioat_rawdev.c              | 310
 drivers/raw/ioat/meson.build                |   9 +
 drivers/raw/ioat/rte_ioat_rawdev.h          | 228 ++
 drivers/raw/ioat/rte_ioat_spec.h            | 301 +++
 drivers/raw/ioat/rte_pmd_ioat_version.map   |   4 +
 drivers/raw/meson.build                     |   3 +-
 mk/rte.app.mk                               |   1 +
 usertools/dpdk-devbind.py                   |  10 +
 21 files changed, 1422 insertions(+), 2 deletions(-)
 create mode 100644 app/test/test_ioat_rawdev.c
 create mode 100644 doc/guides/rawdevs/ioat_rawdev.rst
 create mode 100644 drivers/raw/ioat/Makefile
 create mode 100644 drivers/raw/ioat/ioat_rawdev.c
 create mode 100644 drivers/raw/ioat/meson.build
 create mode 100644 drivers/raw/ioat/rte_ioat_rawdev.h
 create mode 100644 drivers/raw/ioat/rte_ioat_spec.h
 create mode 100644 drivers/raw/ioat/rte_pmd_ioat_version.map

-- 
2.21.0
[dpdk-dev] [PATCH 1/8] raw/ioat: add initial support for ioat rawdev driver
Add stubs for ioat rawdev driver support in DPDK, specifically: * makefile and meson build hooks * initial public header file * rawdev main C file, with probe and release functions * release note update announcing the driver * initial documentation for the new section in the rawdev doc * unit test stubs for device unit tests Signed-off-by: Bruce Richardson --- MAINTAINERS | 7 +- app/test/Makefile | 1 + app/test/meson.build| 1 + app/test/test_ioat_rawdev.c | 22 + config/common_armv8a_linux | 1 + config/common_base | 5 ++ config/defconfig_arm-armv7a-linuxapp-gcc| 1 + config/defconfig_ppc_64-power8-linuxapp-gcc | 1 + doc/guides/rawdevs/index.rst| 1 + doc/guides/rawdevs/ioat_rawdev.rst | 25 ++ doc/guides/rel_notes/release_19_08.rst | 11 +++ drivers/raw/Makefile| 1 + drivers/raw/ioat/Makefile | 28 +++ drivers/raw/ioat/ioat_rawdev.c | 93 + drivers/raw/ioat/meson.build| 8 ++ drivers/raw/ioat/rte_ioat_rawdev.h | 24 ++ drivers/raw/ioat/rte_pmd_ioat_version.map | 4 + drivers/raw/meson.build | 3 +- mk/rte.app.mk | 1 + 19 files changed, 236 insertions(+), 2 deletions(-) create mode 100644 app/test/test_ioat_rawdev.c create mode 100644 doc/guides/rawdevs/ioat_rawdev.rst create mode 100644 drivers/raw/ioat/Makefile create mode 100644 drivers/raw/ioat/ioat_rawdev.c create mode 100644 drivers/raw/ioat/meson.build create mode 100644 drivers/raw/ioat/rte_ioat_rawdev.h create mode 100644 drivers/raw/ioat/rte_pmd_ioat_version.map diff --git a/MAINTAINERS b/MAINTAINERS index 15d0829c5..b613a1e74 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1042,6 +1042,12 @@ M: Tianfei zhang F: drivers/raw/ifpga_rawdev/ F: doc/guides/rawdevs/ifpga_rawdev.rst +IOAT Rawdev +M: Bruce Richardson +F: drivers/raw/ioat/ +F: doc/guides/rawdevs/ioat_rawdev.rst +F: app/test/test_ioat_rawdev.c + NXP DPAA2 QDMA M: Nipun Gupta F: drivers/raw/dpaa2_qdma/ @@ -1052,7 +1058,6 @@ M: Nipun Gupta F: drivers/raw/dpaa2_cmdif/ F: doc/guides/rawdevs/dpaa2_cmdif.rst - Packet processing - diff --git a/app/test/Makefile 
b/app/test/Makefile index 68d6b4fbc..7fbdd0755 100644 --- a/app/test/Makefile +++ b/app/test/Makefile @@ -212,6 +212,7 @@ endif ifeq ($(CONFIG_RTE_LIBRTE_RAWDEV),y) SRCS-y += test_rawdev.c +SRCS-$(CONFIG_RTE_LIBRTE_PMD_IOAT_RAWDEV) += test_ioat_rawdev.c endif SRCS-$(CONFIG_RTE_LIBRTE_KVARGS) += test_kvargs.c diff --git a/app/test/meson.build b/app/test/meson.build index 83391cef0..9867619d3 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -52,6 +52,7 @@ test_sources = files('commands.c', 'test_hash_perf.c', 'test_hash_readwrite_lf.c', 'test_interrupts.c', + 'test_ioat_rawdev.c', 'test_ipsec.c', 'test_kni.c', 'test_kvargs.c', diff --git a/app/test/test_ioat_rawdev.c b/app/test/test_ioat_rawdev.c new file mode 100644 index 0..bd1bb2827 --- /dev/null +++ b/app/test/test_ioat_rawdev.c @@ -0,0 +1,22 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2019 Intel Corporation + */ + +#include "test.h" + +#ifndef RTE_LIBRTE_PMD_IOAT_RAWDEV + +static int +test_ioat_rawdev(void) { return TEST_SKIPPED; } + +#else + +static int +test_ioat_rawdev(void) +{ + return 0; +} + +#endif /* RTE_LIBRTE_PMD_IOAT_RAWDEV */ + +REGISTER_TEST_COMMAND(ioat_rawdev_autotest, test_ioat_rawdev); diff --git a/config/common_armv8a_linux b/config/common_armv8a_linux index 72091de1c..481712ebc 100644 --- a/config/common_armv8a_linux +++ b/config/common_armv8a_linux @@ -34,5 +34,6 @@ CONFIG_RTE_ARCH_ARM64_MEMCPY=n CONFIG_RTE_LIBRTE_FM10K_PMD=n CONFIG_RTE_LIBRTE_SFC_EFX_PMD=n CONFIG_RTE_LIBRTE_AVP_PMD=n +CONFIG_RTE_LIBRTE_PMD_IOAT_RAWDEV=n CONFIG_RTE_SCHED_VECTOR=n diff --git a/config/common_base b/config/common_base index 6f19ad5d2..2b8db4880 100644 --- a/config/common_base +++ b/config/common_base @@ -741,6 +741,11 @@ CONFIG_RTE_LIBRTE_PMD_DPAA2_QDMA_RAWDEV=n # CONFIG_RTE_LIBRTE_PMD_IFPGA_RAWDEV=y +# +# Compile PMD for Intel IOAT raw device +# +CONFIG_RTE_LIBRTE_PMD_IOAT_RAWDEV=y + # # Compile librte_ring # diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc 
b/config/defconfig_arm-armv7a-linuxapp-gcc index c9509b274..ee158ef9d 100644 --- a/config/defconfig_arm-armv7a-linuxapp-gcc +++ b/config/defconfig_arm-armv7a-linuxapp-gcc @@ -54,3 +54,4 @@ CONFIG_RTE_LIBRTE_QEDE_PMD=n CONFIG_RTE_LIBRTE_SFC_EFX_PMD=n CONFIG_RTE_LIBRTE_AVP_PMD=n CONFIG_RTE_LIBRTE_NFP_PMD=n +CONFIG_RTE_LIBRTE_PMD_IOAT_RAWDEV=n diff --git a/config/defconfig_ppc_64-power8-linuxapp-gcc b/config/defconfig_ppc_64-power8-linuxapp-gcc index 7e248b755..9f3670ec0 100644 --- a/config/def
[dpdk-dev] [PATCH 2/8] usertools/dpdk-devbind.py: add support for IOAT devices
In order to allow binding/unbinding of devices for use by the
ioat_rawdev, we need to update the devbind script to add a new class
of device, and add device ids for the specific HW instances.

Signed-off-by: Bruce Richardson
---
 doc/guides/rawdevs/ioat_rawdev.rst | 11 +++
 usertools/dpdk-devbind.py          | 10 ++
 2 files changed, 21 insertions(+)

diff --git a/doc/guides/rawdevs/ioat_rawdev.rst b/doc/guides/rawdevs/ioat_rawdev.rst
index 40ab1b466..99e757498 100644
--- a/doc/guides/rawdevs/ioat_rawdev.rst
+++ b/doc/guides/rawdevs/ioat_rawdev.rst
@@ -23,3 +23,14 @@ configurations.
 
 For builds using ``meson`` and ``ninja``, the driver will be built when the
 target platform is x86-based.
+
+Device Setup
+------------
+
+The Intel\ |reg| QuickData Technology HW devices will need to be bound to a
+user-space IO driver for use. The script ``dpdk-devbind.py`` script
+included with DPDK can be used to view the state of the devices and to bind
+them to a suitable DPDK-supported kernel driver. When querying the
+status of the devices, they will appear under the category of "dma
+devices", i.e. the command ``dpdk-devbind.py --status-dev dma`` can be used
+to see the state of those devices alone.
diff --git a/usertools/dpdk-devbind.py b/usertools/dpdk-devbind.py
index 9e79f0d28..bd0d97df3 100755
--- a/usertools/dpdk-devbind.py
+++ b/usertools/dpdk-devbind.py
@@ -36,11 +36,17 @@
 octeontx2_npa = {'Class': '08', 'Vendor': '177d', 'Device': 'a0fb,a0fc',
               'SVendor': None, 'SDevice': None}
 
+intel_ioat_bdw = {'Class': '08', 'Vendor': '8086', 'Device': '6f20,6f21,6f22,6f23,6f24,6f25,6f26,6f27,6f2e,6f2f',
+              'SVendor': None, 'SDevice': None}
+intel_ioat_skx = {'Class': '08', 'Vendor': '8086', 'Device': '2021',
+              'SVendor': None, 'SDevice': None}
+
 network_devices = [network_class, cavium_pkx, avp_vnic, ifpga_class]
 crypto_devices = [encryption_class, intel_processor_class]
 eventdev_devices = [cavium_sso, cavium_tim, octeontx2_sso]
 mempool_devices = [cavium_fpa, octeontx2_npa]
 compress_devices = [cavium_zip]
+dma_devices = [intel_ioat_bdw, intel_ioat_skx]
 
 # global dict ethernet devices present. Dictionary indexed by PCI address.
 # Each device within this is itself a dictionary of device properties
@@ -595,6 +601,8 @@ def show_status():
     if status_dev == "compress" or status_dev == "all":
         show_device_status(compress_devices , "Compress")
 
+    if status_dev == "dma" or status_dev == "all":
+        show_device_status(dma_devices, "DMA")
 
 def parse_args():
     '''Parses the command-line arguments given by the user and takes the
@@ -670,6 +678,7 @@ def do_arg_actions():
             get_device_details(eventdev_devices)
             get_device_details(mempool_devices)
             get_device_details(compress_devices)
+            get_device_details(dma_devices)
         show_status()
 
 
@@ -690,6 +699,7 @@ def main():
     get_device_details(eventdev_devices)
     get_device_details(mempool_devices)
     get_device_details(compress_devices)
+    get_device_details(dma_devices)
     do_arg_actions()
 
 if __name__ == "__main__":
-- 
2.21.0
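
Editor's note on the device tables above: each entry pairs a PCI class and
vendor with a comma-separated list of hex device IDs. The sketch below shows
one way such an entry could be matched against a probed device. It is written
in C only to stay in the language of the driver patches; the struct and
function names (dev_match, device_id_listed, is_dma_device) are illustrative
assumptions and are not part of dpdk-devbind.py or of DPDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mirror of one devbind table entry; not a DPDK type. */
struct dev_match {
	uint8_t class_code;     /* PCI base class, e.g. 0x08 */
	uint16_t vendor;        /* PCI vendor ID, e.g. 0x8086 */
	const char *device_ids; /* comma-separated hex device IDs */
};

/* Return true if 'device' appears in the comma-separated hex list 'ids'. */
static bool
device_id_listed(const char *ids, uint16_t device)
{
	const char *p = ids;

	assert(ids != NULL);
	while (*p != '\0') {
		char *end;
		unsigned long id = strtoul(p, &end, 16);

		if (end == p)
			return false; /* no hex digits where an ID was expected */
		if (id == device)
			return true;
		if (*end == ',')
			p = end + 1;  /* step over the separator */
		else if (*end == '\0')
			p = end;      /* end of list, loop terminates */
		else
			return false; /* unexpected character in the list */
	}
	return false;
}

/* Table holding the two entries added by the patch (values from the diff). */
static const struct dev_match dma_devices[] = {
	{ 0x08, 0x8086, "6f20,6f21,6f22,6f23,6f24,6f25,6f26,6f27,6f2e,6f2f" },
	{ 0x08, 0x8086, "2021" },
};

/* Return true if the (class, vendor, device) triple matches any entry. */
static bool
is_dma_device(uint8_t class_code, uint16_t vendor, uint16_t device)
{
	size_t i;

	for (i = 0; i < sizeof(dma_devices) / sizeof(dma_devices[0]); i++) {
		const struct dev_match *m = &dma_devices[i];

		if (m->class_code == class_code && m->vendor == vendor &&
		    device_id_listed(m->device_ids, device))
			return true;
	}
	return false;
}
```

With these tables, is_dma_device(0x08, 0x8086, 0x2021) matches the second
entry, while a device ID outside both lists (for example 0x6f28) does not
match.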
[dpdk-dev] [PATCH 3/8] raw/ioat: add register definition file
Add in the list of registers for the device. File is taken from the SPDK project: https://github.com/spdk/spdk/blob/master/include/spdk/ioat_spec.h Signed-off-by: Bruce Richardson --- drivers/raw/ioat/Makefile| 1 + drivers/raw/ioat/meson.build | 3 +- drivers/raw/ioat/rte_ioat_spec.h | 301 +++ 3 files changed, 304 insertions(+), 1 deletion(-) create mode 100644 drivers/raw/ioat/rte_ioat_spec.h diff --git a/drivers/raw/ioat/Makefile b/drivers/raw/ioat/Makefile index 7726e310a..1e10938f3 100644 --- a/drivers/raw/ioat/Makefile +++ b/drivers/raw/ioat/Makefile @@ -24,5 +24,6 @@ SRCS-$(CONFIG_RTE_LIBRTE_PMD_IOAT_RAWDEV) += ioat_rawdev.c # export include files SYMLINK-y-include += rte_ioat_rawdev.h +SYMLINK-y-include += rte_ioat_spec.h include $(RTE_SDK)/mk/rte.lib.mk diff --git a/drivers/raw/ioat/meson.build b/drivers/raw/ioat/meson.build index ba7620a68..ca23e23fc 100644 --- a/drivers/raw/ioat/meson.build +++ b/drivers/raw/ioat/meson.build @@ -5,4 +5,5 @@ build = dpdk_conf.has('RTE_ARCH_X86') sources = files('ioat_rawdev.c') deps += ['rawdev', 'bus_pci'] -install_headers('rte_ioat_rawdev.h') +install_headers('rte_ioat_rawdev.h', + 'rte_ioat_spec.h') diff --git a/drivers/raw/ioat/rte_ioat_spec.h b/drivers/raw/ioat/rte_ioat_spec.h new file mode 100644 index 0..305e36ded --- /dev/null +++ b/drivers/raw/ioat/rte_ioat_spec.h @@ -0,0 +1,301 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) Intel Corporation + */ + +/** + * \file + * I/OAT specification definitions + * + * Taken from ioat_spec.h from SPDK project, with prefix renames and + * other minor changes. 
+ */ + +#ifndef RTE_IOAT_SPEC_H +#define RTE_IOAT_SPEC_H + +#ifdef __cplusplus +extern "C" { +#endif + +#include + +#define RTE_IOAT_PCI_CHANERR_INT_OFFSET0x180 + +#define RTE_IOAT_INTRCTRL_MASTER_INT_EN0x01 + +#define RTE_IOAT_VER_3_00x30 +#define RTE_IOAT_VER_3_30x33 + +/* DMA Channel Registers */ +#define RTE_IOAT_CHANCTRL_CHANNEL_PRIORITY_MASK0xF000 +#define RTE_IOAT_CHANCTRL_COMPL_DCA_EN 0x0200 +#define RTE_IOAT_CHANCTRL_CHANNEL_IN_USE 0x0100 +#define RTE_IOAT_CHANCTRL_DESCRIPTOR_ADDR_SNOOP_CONTROL0x0020 +#define RTE_IOAT_CHANCTRL_ERR_INT_EN 0x0010 +#define RTE_IOAT_CHANCTRL_ANY_ERR_ABORT_EN 0x0008 +#define RTE_IOAT_CHANCTRL_ERR_COMPLETION_EN0x0004 +#define RTE_IOAT_CHANCTRL_INT_REARM0x0001 + +/* DMA Channel Capabilities */ +#defineRTE_IOAT_DMACAP_PB (1 << 0) +#defineRTE_IOAT_DMACAP_DCA (1 << 4) +#defineRTE_IOAT_DMACAP_BFILL (1 << 6) +#defineRTE_IOAT_DMACAP_XOR (1 << 8) +#defineRTE_IOAT_DMACAP_PQ (1 << 9) +#defineRTE_IOAT_DMACAP_DMA_DIF (1 << 10) + +struct rte_ioat_registers { + uint8_t chancnt; + uint8_t xfercap; + uint8_t genctrl; + uint8_t intrctrl; + uint32_tattnstatus; + uint8_t cbver; /* 0x08 */ + uint8_t reserved4[0x3]; /* 0x09 */ + uint16_tintrdelay; /* 0x0C */ + uint16_tcs_status; /* 0x0E */ + uint32_tdmacapability; /* 0x10 */ + uint8_t reserved5[0x6C]; /* 0x14 */ + uint16_tchanctrl; /* 0x80 */ + uint8_t reserved6[0x2]; /* 0x82 */ + uint8_t chancmd;/* 0x84 */ + uint8_t reserved3[1]; /* 0x85 */ + uint16_tdmacount; /* 0x86 */ + uint64_tchansts;/* 0x88 */ + uint64_tchainaddr; /* 0x90 */ + uint64_tchancmp;/* 0x98 */ + uint8_t reserved2[0x8]; /* 0xA0 */ + uint32_tchanerr;/* 0xA8 */ + uint32_tchanerrmask;/* 0xAC */ +} __attribute__((packed)); + +#define RTE_IOAT_CHANCMD_RESET 0x20 +#define RTE_IOAT_CHANCMD_SUSPEND 0x04 + +#define RTE_IOAT_CHANSTS_STATUS0x7ULL +#define RTE_IOAT_CHANSTS_ACTIVE0x0 +#define RTE_IOAT_CHANSTS_IDLE 0x1 +#define RTE_IOAT_CHANSTS_SUSPENDED 0x2 +#define RTE_IOAT_CHANSTS_HALTED0x3 +#define RTE_IOAT_CHANSTS_ARMED 0x4 + +#define 
RTE_IOAT_CHANSTS_UNAFFILIATED_ERROR0x8ULL +#define RTE_IOAT_CHANSTS_SOFT_ERROR0x10ULL + +#define RTE_IOAT_CHANSTS_COMPLETED_DESCRIPTOR_MASK (~0x3FULL) + +#define RTE_IOAT_CHANCMP_ALIGN 8 /* CHANCMP address must be 64-bit aligned */ + +struct rte_ioat_generic_hw_desc { + uint32_t size; + union { + uint32_t control_raw; + struct { + uint32_t int_enable: 1; + uint32_t src_snoop_disable: 1; + uint32_t dest_snoop_disable: 1; +
[dpdk-dev] [PATCH 4/8] raw/ioat: create device on probe and destroy on release
Add the create/destroy driver functions so that we can actually allocate a rawdev and destroy it when done. No rawdev API functions are actually implemented at this point. Signed-off-by: Bruce Richardson --- doc/guides/rawdevs/ioat_rawdev.rst | 11 drivers/raw/ioat/ioat_rawdev.c | 93 +- drivers/raw/ioat/rte_ioat_rawdev.h | 20 +++ 3 files changed, 121 insertions(+), 3 deletions(-) diff --git a/doc/guides/rawdevs/ioat_rawdev.rst b/doc/guides/rawdevs/ioat_rawdev.rst index 99e757498..476b0503f 100644 --- a/doc/guides/rawdevs/ioat_rawdev.rst +++ b/doc/guides/rawdevs/ioat_rawdev.rst @@ -34,3 +34,14 @@ them to a suitable DPDK-supported kernel driver. When querying the status of the devices, they will appear under the category of "dma devices", i.e. the command ``dpdk-devbind.py --status-dev dma`` can be used to see the state of those devices alone. + +Device Probing and Initialization +~~ + +Once bound to a suitable kernel device driver, the HW devices will be found +as part of the PCI scan done at application initialization time. No vdev +parameters need to be passed to create or initialize the device. + +Once probed successfully, the device will appear as a ``rawdev``, that is a +"raw device type" inside DPDK, and can be accessed using APIs from the +``rte_rawdev`` library. 
diff --git a/drivers/raw/ioat/ioat_rawdev.c b/drivers/raw/ioat/ioat_rawdev.c index d9fc3091a..b6964bccd 100644 --- a/drivers/raw/ioat/ioat_rawdev.c +++ b/drivers/raw/ioat/ioat_rawdev.c @@ -2,6 +2,7 @@ * Copyright(c) 2019 Intel Corporation */ +#include #include #include @@ -26,15 +27,101 @@ static struct rte_pci_driver ioat_pmd_drv; static int ioat_rawdev_create(const char *name, struct rte_pci_device *dev) { - RTE_SET_USED(name); - RTE_SET_USED(dev); + static const struct rte_rawdev_ops ioat_rawdev_ops = { + }; + + struct rte_rawdev *rawdev = NULL; + struct rte_ioat_rawdev *ioat = NULL; + int ret = 0; + int retry = 0; + + if (!name) { + IOAT_PMD_ERR("Invalid name of the device!"); + ret = -EINVAL; + goto cleanup; + } + + /* Allocate device structure */ + rawdev = rte_rawdev_pmd_allocate(name, sizeof(struct rte_ioat_rawdev), +dev->device.numa_node); + if (rawdev == NULL) { + IOAT_PMD_ERR("Unable to allocate raw device"); + ret = -EINVAL; + goto cleanup; + } + + rawdev->dev_ops = &ioat_rawdev_ops; + rawdev->device = &dev->device; + rawdev->driver_name = dev->device.driver->name; + + ioat = rawdev->dev_private; + ioat->rawdev = rawdev; + ioat->regs = dev->mem_resource[0].addr; + ioat->ring_size = 0; + ioat->desc_ring = NULL; + ioat->status_addr = rte_malloc_virt2iova(ioat) + + offsetof(struct rte_ioat_rawdev, status); + + /* do device initialization - reset and set error behaviour */ + if (ioat->regs->chancnt != 1) + IOAT_PMD_ERR("%s: Channel count == %d\n", __func__, + ioat->regs->chancnt); + + if (ioat->regs->chanctrl & 0x100) { /* locked by someone else */ + IOAT_PMD_WARN("%s: Channel appears locked\n", __func__); + ioat->regs->chanctrl = 0; + } + + ioat->regs->chancmd = RTE_IOAT_CHANCMD_SUSPEND; + rte_delay_ms(1); + ioat->regs->chancmd = RTE_IOAT_CHANCMD_RESET; + rte_delay_ms(1); + while (ioat->regs->chancmd & RTE_IOAT_CHANCMD_RESET) { + ioat->regs->chainaddr = 0; + rte_delay_ms(1); + if (++retry >= 200) { + IOAT_PMD_ERR("%s: cannot reset device. 
CHANCMD=0x%llx, CHANSTS=0x%llx, CHANERR=0x%llx\n", + __func__, + (unsigned long long)ioat->regs->chancmd, + (unsigned long long)ioat->regs->chansts, + (unsigned long long)ioat->regs->chanerr); + ret = -EIO; + } + } + ioat->regs->chanctrl = RTE_IOAT_CHANCTRL_ANY_ERR_ABORT_EN | + RTE_IOAT_CHANCTRL_ERR_COMPLETION_EN; + return 0; + +cleanup: + if (rawdev) + rte_rawdev_pmd_release(rawdev); + + return ret; } static int ioat_rawdev_destroy(const char *name) { - RTE_SET_USED(name); + int ret; + struct rte_rawdev *rdev; + + if (!name) { + IOAT_PMD_ERR("Invalid device name"); + return -EINVAL; + } + + rdev = rte_rawdev_pmd_get_named_dev(name); + if (!rdev) { + IOAT_PMD_ERR("Invalid device name (%s)", name); + return -EINVAL; + } + + /* rte_rawdev_close is called by pmd_release *
[dpdk-dev] [PATCH 7/8] raw/ioat: add statistics functions
Add stats functions to track what is happening in the driver, and put unit tests to check those. Signed-off-by: Bruce Richardson --- app/test/test_ioat_rawdev.c| 38 ++ doc/guides/rawdevs/ioat_rawdev.rst | 14 ++ drivers/raw/ioat/ioat_rawdev.c | 44 ++ drivers/raw/ioat/rte_ioat_rawdev.h | 6 4 files changed, 102 insertions(+) diff --git a/app/test/test_ioat_rawdev.c b/app/test/test_ioat_rawdev.c index 36e97347c..7081f3365 100644 --- a/app/test/test_ioat_rawdev.c +++ b/app/test/test_ioat_rawdev.c @@ -24,6 +24,11 @@ run_ioat_tests(int dev_id) #define IOAT_TEST_RINGSIZE 512 struct rte_ioat_rawdev_config p = { .ring_size = -1 }; struct rte_rawdev_info info = { .dev_private = &p }; + struct rte_rawdev_xstats_name *snames = NULL; + uint64_t *stats = NULL; + unsigned int *ids = NULL; + unsigned int nb_xstats; + unsigned int i; rte_rawdev_info_get(dev_id, &info); if (p.ring_size != 0) { @@ -48,6 +53,39 @@ run_ioat_tests(int dev_id) printf("Error with rte_rawdev_start()\n"); return -1; } + + /* allocate memory for xstats names and values */ + nb_xstats = rte_rawdev_xstats_names_get(dev_id, NULL, 0); + + snames = malloc(sizeof(*snames) * nb_xstats); + if (snames == NULL) { + printf("Error allocating xstat names memory\n"); + return -1; + } + rte_rawdev_xstats_names_get(dev_id, snames, nb_xstats); + + ids = malloc(sizeof(*ids) * nb_xstats); + if (ids == NULL) { + printf("Error allocating xstat ids memory\n"); + return -1; + } + for (i = 0; i < nb_xstats; i++) + ids[i] = i; + + stats = malloc(sizeof(*stats) * nb_xstats); + if (stats == NULL) { + printf("Error allocating xstat memory\n"); + return -1; + } + + rte_rawdev_xstats_get(dev_id, ids, stats, nb_xstats); + for (i = 0; i < nb_xstats; i++) + printf("%s: %"PRIu64" ", snames[i].name, stats[i]); + printf("\n"); + + free(snames); + free(stats); + free(ids); return 0; } diff --git a/doc/guides/rawdevs/ioat_rawdev.rst b/doc/guides/rawdevs/ioat_rawdev.rst index b3fe79033..47f12e95c 100644 --- a/doc/guides/rawdevs/ioat_rawdev.rst +++ 
b/doc/guides/rawdevs/ioat_rawdev.rst @@ -111,3 +111,17 @@ The following code shows how the device is configured in Once configured, the device can then be made ready for use by calling the ``rte_rawdev_start()`` API. + +Querying Device Statistics +~~~ + +The statistics from the IOAT rawdev device can be got via the xstats +functions in the ``rte_rawdev`` library, i.e. +``rte_rawdev_xstats_names_get()``, ``rte_rawdev_xstats_get()`` and +``rte_rawdev_xstats_by_name_get``. The statistics returned for each device +instance are: + +* ``failed_enqueues`` +* ``successful_enqueues`` +* ``copies_started`` +* ``copies_completed`` diff --git a/drivers/raw/ioat/ioat_rawdev.c b/drivers/raw/ioat/ioat_rawdev.c index b4b70a1e6..09fbdbf9c 100644 --- a/drivers/raw/ioat/ioat_rawdev.c +++ b/drivers/raw/ioat/ioat_rawdev.c @@ -4,6 +4,7 @@ #include #include +#include #include #include "rte_ioat_rawdev.h" @@ -106,6 +107,47 @@ ioat_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info) cfg->ring_size = ioat->ring_size; } +static const char *xstat_names[] = { + "failed_enqueues", "successful_enqueues", + "copies_started", "copies_completed" +}; + +static int +ioat_xstats_get(const struct rte_rawdev *dev, const unsigned int ids[], + uint64_t values[], unsigned int n) +{ + const struct rte_ioat_rawdev *ioat = dev->dev_private; + unsigned int i; + + for (i = 0; i < n; i++) { + switch (ids[i]){ + case 0: values[i] = ioat->enqueue_failed; break; + case 1: values[i] = ioat->enqueued; break; + case 2: values[i] = ioat->started; break; + case 3: values[i] = ioat->completed; break; + default: values[i] = 0; break; + } + } + return n; +} + +static int +ioat_xstats_get_names(const struct rte_rawdev *dev, + struct rte_rawdev_xstats_name *names, + unsigned int size) +{ + unsigned int i; + + RTE_SET_USED(dev); + if (size < RTE_DIM(xstat_names)) + return RTE_DIM(xstat_names); + + for (i = 0; i < RTE_DIM(xstat_names); i++) + strlcpy(names[i].name, xstat_names[i], sizeof(names[i])); + + return 
RTE_DIM(xstat_names); +} + static int ioat_rawdev_create(const char *name, struct rte_pci_device *dev) { @@ -114,6 +156,8 @@ ioat_rawdev_create(const char *name, struct rte_pci_device *dev) .dev_start = ioat_dev_start,
[dpdk-dev] [PATCH 5/8] raw/ioat: add device info function
Add in the "info_get" function to the driver, to allow us to query the device. This allows us to have the unit test pick up the presence of supported hardware or not. Signed-off-by: Bruce Richardson --- app/test/meson.build | 3 +++ app/test/test_ioat_rawdev.c| 23 doc/guides/rawdevs/ioat_rawdev.rst | 34 ++ drivers/raw/ioat/ioat_rawdev.c | 11 ++ drivers/raw/ioat/rte_ioat_rawdev.h | 11 ++ 5 files changed, 82 insertions(+) diff --git a/app/test/meson.build b/app/test/meson.build index 9867619d3..9fe3ddc89 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -305,6 +305,9 @@ endif if dpdk_conf.has('RTE_LIBRTE_KNI') test_deps += 'kni' endif +if dpdk_conf.has('RTE_LIBRTE_PMD_IOAT_RAWDEV') + test_deps += 'pmd_ioat' +endif cflags = machine_args if cc.has_argument('-Wno-format-truncation') diff --git a/app/test/test_ioat_rawdev.c b/app/test/test_ioat_rawdev.c index bd1bb2827..ac1389f6e 100644 --- a/app/test/test_ioat_rawdev.c +++ b/app/test/test_ioat_rawdev.c @@ -11,9 +11,32 @@ test_ioat_rawdev(void) { return TEST_SKIPPED; } #else +#include +#include + +#include +#include +#include + static int test_ioat_rawdev(void) { + const int count = rte_rawdev_count(); + int i, found = 0; + + printf("Checking %d rawdevs\n", count); + for (i = 0; i < count && !found; i++) { + struct rte_rawdev_info info = { .dev_private = NULL }; + found = (rte_rawdev_info_get(i, &info) == 0 && + strcmp(info.driver_name, + IOAT_PMD_RAWDEV_NAME_STR) == 0); + } + + if (!found) { + printf("No IOAT rawdev found, skipping tests\n"); + return TEST_SKIPPED; + } + return 0; } diff --git a/doc/guides/rawdevs/ioat_rawdev.rst b/doc/guides/rawdevs/ioat_rawdev.rst index 476b0503f..b68cdffc3 100644 --- a/doc/guides/rawdevs/ioat_rawdev.rst +++ b/doc/guides/rawdevs/ioat_rawdev.rst @@ -45,3 +45,37 @@ parameters need to be passed to create or initialize the device. 
Once probed successfully, the device will appear as a ``rawdev``, that is a "raw device type" inside DPDK, and can be accessed using APIs from the ``rte_rawdev`` library. + +Using IOAT Rawdev Devices +-- + +To use the devices from an application, the rawdev API can be used, along +with definitions taken from the device-specific header file +``rte_ioat_rawdev.h``. This header is needed to get the definition of +structure parameters used by some of the rawdev APIs for IOAT rawdev +devices, as well as providing key functions for using the device for memory +copies. + +Getting Device Information +~~~ + +Basic information about each rawdev device can be got using the +``rte_rawdev_info_get()`` API. For most applications, this API will be +needed to verify that the rawdev in question is of the expected type. For +example, the following code in ``test_ioat_rawdev.c`` is used to identify +the IOAT rawdev device for use for the tests: + +.. code-block:: C + +for (i = 0; i < count && !found; i++) { +struct rte_rawdev_info info = { .dev_private = NULL }; +found = (rte_rawdev_info_get(i, &info) == 0 && +strcmp(info.driver_name, +IOAT_PMD_RAWDEV_NAME_STR) == 0); +} + +When calling the ``rte_rawdev_info_get()`` API for an IOAT rawdev device, +the ``dev_private`` field in the ``rte_rawdev_info`` struct should either +be NULL, or else be set to point to a structure of type +``rte_ioat_rawdev_config``, in which case the size of the configured device +input ring will be returned in that structure. diff --git a/drivers/raw/ioat/ioat_rawdev.c b/drivers/raw/ioat/ioat_rawdev.c index b6964bccd..90bed2810 100644 --- a/drivers/raw/ioat/ioat_rawdev.c +++ b/drivers/raw/ioat/ioat_rawdev.c @@ -24,10 +24,21 @@ static struct rte_pci_driver ioat_pmd_drv; #define IOAT_PMD_ERR(fmt, args...)IOAT_PMD_LOG(ERR, fmt, ## args) #define IOAT_PMD_WARN(fmt, args...) 
IOAT_PMD_LOG(WARNING, fmt, ## args) +static void +ioat_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info) +{ + struct rte_ioat_rawdev_config *cfg = dev_info; + struct rte_ioat_rawdev *ioat = dev->dev_private; + + if (cfg != NULL) + cfg->ring_size = ioat->ring_size; +} + static int ioat_rawdev_create(const char *name, struct rte_pci_device *dev) { static const struct rte_rawdev_ops ioat_rawdev_ops = { + .dev_info_get = ioat_dev_info_get, }; struct rte_rawdev *rawdev = NULL; diff --git a/drivers/raw/ioat/rte_ioat_rawdev.h b/drivers/raw/ioat/rte_ioat_rawdev.h index c3216a174..7e0d72ca3 100644 --- a/drivers/raw/ioat/rte_ioat_rawdev.h +++ b/drivers/raw/ioat/rte_ioat_rawdev.h @@ -2
[dpdk-dev] [PATCH 6/8] raw/ioat: add configure, start and stop functions
Allow initializing a driver instance. Signed-off-by: Bruce Richardson --- app/test/test_ioat_rawdev.c| 35 +- doc/guides/rawdevs/ioat_rawdev.rst | 32 + drivers/raw/ioat/ioat_rawdev.c | 75 ++ drivers/raw/ioat/rte_ioat_rawdev.h | 14 ++ 4 files changed, 155 insertions(+), 1 deletion(-) diff --git a/app/test/test_ioat_rawdev.c b/app/test/test_ioat_rawdev.c index ac1389f6e..36e97347c 100644 --- a/app/test/test_ioat_rawdev.c +++ b/app/test/test_ioat_rawdev.c @@ -18,6 +18,39 @@ test_ioat_rawdev(void) { return TEST_SKIPPED; } #include #include +static int +run_ioat_tests(int dev_id) +{ +#define IOAT_TEST_RINGSIZE 512 + struct rte_ioat_rawdev_config p = { .ring_size = -1 }; + struct rte_rawdev_info info = { .dev_private = &p }; + + rte_rawdev_info_get(dev_id, &info); + if (p.ring_size != 0) { + printf("Error, initial ring size is non-zero (%d)\n", + (int)p.ring_size); + return -1; + } + + p.ring_size = IOAT_TEST_RINGSIZE; + if (rte_rawdev_configure(dev_id, &info) != 0) { + printf("Error with rte_rawdev_configure()\n"); + return -1; + } + rte_rawdev_info_get(dev_id, &info); + if (p.ring_size != IOAT_TEST_RINGSIZE) { + printf("Error, ring size is not %d (%d)\n", + IOAT_TEST_RINGSIZE, (int)p.ring_size); + return -1; + } + + if (rte_rawdev_start(dev_id) != 0) { + printf("Error with rte_rawdev_start()\n"); + return -1; + } + return 0; +} + static int test_ioat_rawdev(void) { @@ -37,7 +70,7 @@ test_ioat_rawdev(void) return TEST_SKIPPED; } - return 0; + return run_ioat_tests(i); } #endif /* RTE_LIBRTE_PMD_IOAT_RAWDEV */ diff --git a/doc/guides/rawdevs/ioat_rawdev.rst b/doc/guides/rawdevs/ioat_rawdev.rst index b68cdffc3..b3fe79033 100644 --- a/doc/guides/rawdevs/ioat_rawdev.rst +++ b/doc/guides/rawdevs/ioat_rawdev.rst @@ -79,3 +79,35 @@ the ``dev_private`` field in the ``rte_rawdev_info`` struct should either be NULL, or else be set to point to a structure of type ``rte_ioat_rawdev_config``, in which case the size of the configured device input ring will be returned in that 
structure. + +Device Configuration +~ + +Configuring an IOAT rawdev device is done using the +``rte_rawdev_configure()`` API, which takes the same structure parameters +as the, previously referenced, ``rte_rawdev_info_get()`` API. The main +difference is that, because the parameter is used as input rather than +output, the ``dev_private`` structure element cannot be NULL, and must +point to a valid ``rte_ioat_rawdev_config`` structure, containing the ring +size to be used by the device. The ring size must be a power of two, +between 64 and 4096. + +The following code shows how the device is configured in +``test_ioat_rawdev.c``: + +.. code-block:: C + + #define IOAT_TEST_RINGSIZE 512 +struct rte_ioat_rawdev_config p = { .ring_size = -1 }; +struct rte_rawdev_info info = { .dev_private = &p }; + +/* ... */ + +p.ring_size = IOAT_TEST_RINGSIZE; +if (rte_rawdev_configure(dev_id, &info) != 0) { +printf("Error with rte_rawdev_configure()\n"); +return -1; +} + +Once configured, the device can then be made ready for use by calling the +``rte_rawdev_start()`` API. diff --git a/drivers/raw/ioat/ioat_rawdev.c b/drivers/raw/ioat/ioat_rawdev.c index 90bed2810..b4b70a1e6 100644 --- a/drivers/raw/ioat/ioat_rawdev.c +++ b/drivers/raw/ioat/ioat_rawdev.c @@ -24,6 +24,78 @@ static struct rte_pci_driver ioat_pmd_drv; #define IOAT_PMD_ERR(fmt, args...)IOAT_PMD_LOG(ERR, fmt, ## args) #define IOAT_PMD_WARN(fmt, args...) 
IOAT_PMD_LOG(WARNING, fmt, ## args) +#define DESC_SZ sizeof(struct rte_ioat_desc) +#define COMPLETION_SZ sizeof(__m128i) + +static int +ioat_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config) +{ + struct rte_ioat_rawdev_config *params = config; + struct rte_ioat_rawdev *ioat = dev->dev_private; + unsigned short i; + + if (dev->started) + return -EBUSY; + + if (params == NULL) + return -EINVAL; + + if (params->ring_size > 4096 || params->ring_size < 64 || + !rte_is_power_of_2(params->ring_size)) + return -EINVAL; + + ioat->ring_size = params->ring_size; + if (ioat->desc_ring != NULL) { + rte_free(ioat->desc_ring); + ioat->desc_ring = NULL; + } + + /* allocate one block of memory for both descriptors +* and completion handles. +*/ + ioat->desc_ring = rte_zmalloc_socket(NULL, + (DESC_SZ + COMPLETION_SZ) * ioat->ring_size, + 0, /* alignment, de
[dpdk-dev] [PATCH 8/8] raw/ioat: add local API to perform copies
Add local APIs to trigger data copies, and retrieve handle values once
those copies are completed. Included are unit tests to validate the data
is copied correctly.

Signed-off-by: Bruce Richardson
---
 app/test/test_ioat_rawdev.c        | 159 -
 doc/guides/rawdevs/ioat_rawdev.rst | 100 ++
 drivers/raw/ioat/rte_ioat_rawdev.h | 155 +++-
 3 files changed, 410 insertions(+), 4 deletions(-)

diff --git a/app/test/test_ioat_rawdev.c b/app/test/test_ioat_rawdev.c
index 7081f3365..f2240adec 100644
--- a/app/test/test_ioat_rawdev.c
+++ b/app/test/test_ioat_rawdev.c
@@ -18,6 +18,131 @@ test_ioat_rawdev(void) { return TEST_SKIPPED; }
 #include
 #include
 
+static struct rte_mempool *pool;
+
+static int
+test_enqueue_copies(int dev_id)
+{
+	const unsigned int length = 1024;
+	unsigned int i;
+
+	do {
+		struct rte_mbuf *src, *dst;
+		char *src_data, *dst_data;
+		struct rte_mbuf *completed[2] = {0};
+
+		/* test doing a single copy */
+		src = rte_pktmbuf_alloc(pool);
+		dst = rte_pktmbuf_alloc(pool);
+		src->data_len = src->pkt_len = length;
+		dst->data_len = dst->pkt_len = length;
+		src_data = rte_pktmbuf_mtod(src, char *);
+		dst_data = rte_pktmbuf_mtod(dst, char *);
+
+		for (i = 0; i < length; i++)
+			src_data[i] = rand() & 0xFF;
+
+		if (rte_ioat_enqueue_copy(dev_id,
+				src->buf_iova + src->data_off,
+				dst->buf_iova + dst->data_off,
+				length,
+				(uintptr_t)src,
+				(uintptr_t)dst,
+				0 /* no fence */) != 1) {
+			printf("Error with rte_ioat_enqueue_copy\n");
+			return -1;
+		}
+		rte_ioat_do_copies(dev_id);
+		usleep(10);
+
+		if (rte_ioat_completed_copies(dev_id, 1, (void *)&completed[0],
+				(void *)&completed[1]) != 1) {
+			printf("Error with rte_ioat_completed_copies\n");
+			return -1;
+		}
+		if (completed[0] != src || completed[1] != dst) {
+			printf("Error with completions: got (%p, %p), not (%p,%p)\n",
+					completed[0], completed[1], src, dst);
+			return -1;
+		}
+
+		for (i = 0; i < length; i++)
+			if (dst_data[i] != src_data[i]) {
+				printf("Data mismatch at char %u\n", i);
+				return -1;
+			}
+		rte_pktmbuf_free(src);
+		rte_pktmbuf_free(dst);
+	} while(0);
+
+	/* test doing multiple copies */
+	do {
+		struct rte_mbuf *srcs[32], *dsts[32];
+		struct rte_mbuf *completed_src[64];
+		struct rte_mbuf *completed_dst[64];
+		unsigned int j;
+
+		for (i = 0; i < RTE_DIM(srcs); i++) {
+			char *src_data;
+
+			srcs[i] = rte_pktmbuf_alloc(pool);
+			dsts[i] = rte_pktmbuf_alloc(pool);
+			srcs[i]->data_len = srcs[i]->pkt_len = length;
+			dsts[i]->data_len = dsts[i]->pkt_len = length;
+			src_data = rte_pktmbuf_mtod(srcs[i], char *);
+
+			for (j = 0; j < length; j++)
+				src_data[j] = rand() & 0xFF;
+
+			if (rte_ioat_enqueue_copy(dev_id,
+					srcs[i]->buf_iova + srcs[i]->data_off,
+					dsts[i]->buf_iova + dsts[i]->data_off,
+					length,
+					(uintptr_t)srcs[i],
+					(uintptr_t)dsts[i],
+					0 /* nofence */) != 1) {
+				printf("Error with rte_ioat_enqueue_copy for buffer %u\n",
+						i);
+				return -1;
+			}
+		}
+		rte_ioat_do_copies(dev_id);
+		usleep(100);
+
+		if (rte_ioat_completed_copies(dev_id, 64, (void *)completed_src,
+				(void *)completed_dst) != RTE_DIM(srcs)) {
+			printf("Error with rte_ioat_completed_copies\n");
+			return -1;
+		}
+		for (i = 0; i < RTE_DIM(srcs); i++) {
+			char *src_data, *dst_data;
+
+			if (completed_src[i] != srcs[i]) {
+				pri
[dpdk-dev] [PATCH v2 01/12] eal: Make rte_eal_using_phys_addrs work sooner
This function only returned the correct answer after a call to
initialize the memory subsystem. Make it work prior to that.

Signed-off-by: Ben Walker
---
 lib/librte_eal/linux/eal/eal_memory.c | 63 ---
 1 file changed, 28 insertions(+), 35 deletions(-)

diff --git a/lib/librte_eal/linux/eal/eal_memory.c b/lib/librte_eal/linux/eal/eal_memory.c
index 416dad898..0c07bb946 100644
--- a/lib/librte_eal/linux/eal/eal_memory.c
+++ b/lib/librte_eal/linux/eal/eal_memory.c
@@ -66,34 +66,8 @@
  * zone as well as a physical contiguous zone.
  */
 
-static bool phys_addrs_available = true;
-
 #define RANDOMIZE_VA_SPACE_FILE "/proc/sys/kernel/randomize_va_space"
 
-static void
-test_phys_addrs_available(void)
-{
-	uint64_t tmp = 0;
-	phys_addr_t physaddr;
-
-	if (!rte_eal_has_hugepages()) {
-		RTE_LOG(ERR, EAL,
-			"Started without hugepages support, physical addresses not available\n");
-		phys_addrs_available = false;
-		return;
-	}
-
-	physaddr = rte_mem_virt2phy(&tmp);
-	if (physaddr == RTE_BAD_PHYS_ADDR) {
-		if (rte_eal_iova_mode() == RTE_IOVA_PA)
-			RTE_LOG(ERR, EAL,
-				"Cannot obtain physical addresses: %s. "
-				"Only vfio will function.\n",
-				strerror(errno));
-		phys_addrs_available = false;
-	}
-}
-
 /*
  * Get physical address of any mapped virtual address in the current process.
  */
@@ -107,7 +81,7 @@ rte_mem_virt2phy(const void *virtaddr)
 	off_t offset;
 
 	/* Cannot parse /proc/self/pagemap, no need to log errors everywhere */
-	if (!phys_addrs_available)
+	if (!rte_eal_using_phys_addrs())
 		return RTE_BAD_IOVA;
 
 	/* standard page size */
@@ -1336,8 +1310,6 @@ eal_legacy_hugepage_init(void)
 	int nr_hugefiles, nr_hugepages = 0;
 	void *addr;
 
-	test_phys_addrs_available();
-
 	memset(used_hp, 0, sizeof(used_hp));
 
 	/* get pointer to global configuration */
@@ -1516,7 +1488,7 @@ eal_legacy_hugepage_init(void)
 			continue;
 		}
 
-		if (phys_addrs_available &&
+		if (rte_eal_using_phys_addrs() &&
 				rte_eal_iova_mode() != RTE_IOVA_VA) {
 			/* find physical addresses for each hugepage */
 			if (find_physaddrs(&tmp_hp[hp_offset], hpi) < 0) {
@@ -1735,8 +1707,6 @@ eal_hugepage_init(void)
 	uint64_t memory[RTE_MAX_NUMA_NODES];
 	int hp_sz_idx, socket_id;
 
-	test_phys_addrs_available();
-
 	memset(used_hp, 0, sizeof(used_hp));
 
 	for (hp_sz_idx = 0;
@@ -1879,8 +1849,6 @@ eal_legacy_hugepage_attach(void)
 				"into secondary processes\n");
 	}
 
-	test_phys_addrs_available();
-
 	fd_hugepage = open(eal_hugepage_data_path(), O_RDONLY);
 	if (fd_hugepage < 0) {
 		RTE_LOG(ERR, EAL, "Could not open %s\n",
@@ -2020,7 +1988,32 @@ rte_eal_hugepage_attach(void)
 int
 rte_eal_using_phys_addrs(void)
 {
-	return phys_addrs_available;
+	static int using_phys_addrs = -1;
+	uint64_t tmp = 0;
+	phys_addr_t physaddr;
+
+	if (using_phys_addrs != -1)
+		return using_phys_addrs;
+
+	/* Set the default to 1 */
+	using_phys_addrs = 1;
+
+	if (!rte_eal_has_hugepages()) {
+		RTE_LOG(ERR, EAL,
+			"Started without hugepages support, physical addresses not available\n");
+		using_phys_addrs = 0;
+		return using_phys_addrs;
+	}
+
+	physaddr = rte_mem_virt2phy(&tmp);
+	if (physaddr == RTE_BAD_PHYS_ADDR) {
+		if (rte_eal_iova_mode() == RTE_IOVA_PA)
+			RTE_LOG(ERR, EAL,
+				"Cannot obtain physical addresses. Only vfio will function.\n");
+		using_phys_addrs = 0;
+	}
+
+	return using_phys_addrs;
 }
 
 static int __rte_unused
-- 
2.20.1
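
Editor's note on the caching pattern in this patch: the new
rte_eal_using_phys_addrs() probes on first use and remembers the answer in a
function-local static, with -1 meaning "not probed yet". Because the probe
(rte_mem_virt2phy()) itself calls rte_eal_using_phys_addrs(), the cached value
is set to 1 before probing so that the nested call returns immediately. The
following is a minimal standalone sketch of that shape, not the EAL code; the
names capability_available(), lookup() and lookup_calls are hypothetical, and
like the original it makes no attempt at thread safety.

```c
#include <assert.h>

/* Counts how often the (pretend) expensive lookup really ran. */
static int lookup_calls;

int capability_available(void);

/*
 * Stand-in for a lookup that is only meaningful when the capability is
 * present, and therefore checks it first (as rte_mem_virt2phy() does).
 */
static int
lookup(void)
{
	if (!capability_available())
		return -1;
	lookup_calls++;
	return 0; /* pretend the lookup succeeded */
}

/* Probe once on first use, then return the cached answer. */
int
capability_available(void)
{
	static int cached = -1; /* -1: not probed yet */

	if (cached != -1)
		return cached;

	/*
	 * Provisional value: lookup() re-enters this function, and the
	 * re-entrant call must see something other than -1 or the two
	 * functions would recurse forever.
	 */
	cached = 1;

	if (lookup() < 0)
		cached = 0;

	assert(cached == 0 || cached == 1);
	return cached;
}
```

Calling capability_available() twice runs lookup() only once; the second call
is answered from the cached value.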
[dpdk-dev] [PATCH v2 03/12] eal/pci: Rework loops in rte_pci_get_iommu_class
Make all of the loops first iterate over devices, then drivers. This is
in preparation for combining them into a single loop.

Signed-off-by: Ben Walker
---
 drivers/bus/pci/linux/pci.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index d3177916a..70815e4f0 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -589,10 +589,10 @@ rte_pci_get_iommu_class(void)
 	if (!is_bound)
 		return RTE_IOVA_DC;
 
-	FOREACH_DRIVER_ON_PCIBUS(drv) {
-		if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
-			FOREACH_DEVICE_ON_PCIBUS(dev) {
-				if (dev->kdrv == RTE_KDRV_VFIO &&
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (dev->kdrv == RTE_KDRV_VFIO) {
+			FOREACH_DRIVER_ON_PCIBUS(drv) {
+				if (drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA &&
 				    rte_pci_match(drv, dev)) {
 					has_iova_va = true;
 					break;
@@ -631,8 +631,8 @@ rte_pci_get_iommu_class(void)
 	}
 
 	break_out = false;
-	FOREACH_DRIVER_ON_PCIBUS(drv) {
-		FOREACH_DEVICE_ON_PCIBUS(dev) {
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		FOREACH_DRIVER_ON_PCIBUS(drv) {
 			if (!rte_pci_match(drv, dev))
 				continue;
 			/*
--
2.20.1
[dpdk-dev] [PATCH v2 04/12] eal/pci: Collapse two loops in rte_pci_get_iommu_class
Two of these loops easily collapse into a single loop. This sets the
stage for future simplifications.

Signed-off-by: Ben Walker
---
 drivers/bus/pci/linux/pci.c | 31 ++-
 1 file changed, 10 insertions(+), 21 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 70815e4f0..29ffae77f 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -571,7 +571,6 @@ rte_pci_get_iommu_class(void)
 	bool has_iova_va = false;
 	bool is_bound_uio = false;
 	bool iommu_no_va = false;
-	bool break_out;
 	bool need_check;
 	struct rte_pci_device *dev = NULL;
 	struct rte_pci_driver *drv = NULL;
@@ -592,8 +591,16 @@ rte_pci_get_iommu_class(void)
 	FOREACH_DEVICE_ON_PCIBUS(dev) {
 		if (dev->kdrv == RTE_KDRV_VFIO) {
 			FOREACH_DRIVER_ON_PCIBUS(drv) {
-				if (drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA &&
-				    rte_pci_match(drv, dev)) {
+				if (!rte_pci_match(drv, dev))
+					continue;
+
+				/*
+				 * just one PCI device needs to be checked out because
+				 * the IOMMU hardware is the same for all of them.
+				 */
+				iommu_no_va = !pci_one_device_iommu_support_va(dev);
+
+				if (drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
 					has_iova_va = true;
 					break;
 				}
@@ -630,24 +637,6 @@ rte_pci_get_iommu_class(void)
 		}
 	}
 
-	break_out = false;
-	FOREACH_DEVICE_ON_PCIBUS(dev) {
-		FOREACH_DRIVER_ON_PCIBUS(drv) {
-			if (!rte_pci_match(drv, dev))
-				continue;
-			/*
-			 * just one PCI device needs to be checked out because
-			 * the IOMMU hardware is the same for all of them.
-			 */
-			iommu_no_va = !pci_one_device_iommu_support_va(dev);
-			break_out = true;
-			break;
-		}
-
-		if (break_out)
-			break;
-	}
-
 #ifdef VFIO_PRESENT
 	is_vfio_noiommu_enabled = rte_vfio_noiommu_is_enabled() == true ?
 					    true : false;
--
2.20.1
[dpdk-dev] [PATCH v2 02/12] eal/pci: Inline several functions into rte_pci_get_iommu_class
This is in preparation for future simplifications. The functions are simply inlined for now. Signed-off-by: Ben Walker --- drivers/bus/pci/linux/pci.c | 176 +++- 1 file changed, 71 insertions(+), 105 deletions(-) diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index c99d523f0..d3177916a 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -497,86 +497,6 @@ rte_pci_scan(void) return -1; } -/* - * Is pci device bound to any kdrv - */ -static inline int -pci_one_device_is_bound(void) -{ - struct rte_pci_device *dev = NULL; - int ret = 0; - - FOREACH_DEVICE_ON_PCIBUS(dev) { - if (dev->kdrv == RTE_KDRV_UNKNOWN || - dev->kdrv == RTE_KDRV_NONE) { - continue; - } else { - ret = 1; - break; - } - } - return ret; -} - -/* - * Any one of the device bound to uio - */ -static inline int -pci_one_device_bound_uio(void) -{ - struct rte_pci_device *dev = NULL; - struct rte_devargs *devargs; - int need_check; - - FOREACH_DEVICE_ON_PCIBUS(dev) { - devargs = dev->device.devargs; - - need_check = 0; - switch (rte_pci_bus.bus.conf.scan_mode) { - case RTE_BUS_SCAN_WHITELIST: - if (devargs && devargs->policy == RTE_DEV_WHITELISTED) - need_check = 1; - break; - case RTE_BUS_SCAN_UNDEFINED: - case RTE_BUS_SCAN_BLACKLIST: - if (devargs == NULL || - devargs->policy != RTE_DEV_BLACKLISTED) - need_check = 1; - break; - } - - if (!need_check) - continue; - - if (dev->kdrv == RTE_KDRV_IGB_UIO || - dev->kdrv == RTE_KDRV_UIO_GENERIC) { - return 1; - } - } - return 0; -} - -/* - * Any one of the device has iova as va - */ -static inline int -pci_one_device_has_iova_va(void) -{ - struct rte_pci_device *dev = NULL; - struct rte_pci_driver *drv = NULL; - - FOREACH_DRIVER_ON_PCIBUS(drv) { - if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) { - FOREACH_DEVICE_ON_PCIBUS(dev) { - if (dev->kdrv == RTE_KDRV_VFIO && - rte_pci_match(drv, dev)) - return 1; - } - } - } - return 0; -} - #if defined(RTE_ARCH_X86) static bool pci_one_device_iommu_support_va(struct 
rte_pci_device *dev) @@ -641,14 +561,76 @@ pci_one_device_iommu_support_va(__rte_unused struct rte_pci_device *dev) #endif /* - * All devices IOMMUs support VA as IOVA + * Get iommu class of PCI devices on the bus. */ -static bool -pci_devices_iommu_support_va(void) +enum rte_iova_mode +rte_pci_get_iommu_class(void) { + bool is_bound = false; + bool is_vfio_noiommu_enabled = true; + bool has_iova_va = false; + bool is_bound_uio = false; + bool iommu_no_va = false; + bool break_out; + bool need_check; struct rte_pci_device *dev = NULL; struct rte_pci_driver *drv = NULL; + struct rte_devargs *devargs; + + FOREACH_DEVICE_ON_PCIBUS(dev) { + if (dev->kdrv == RTE_KDRV_UNKNOWN || + dev->kdrv == RTE_KDRV_NONE) { + continue; + } else { + is_bound = true; + break; + } + } + if (!is_bound) + return RTE_IOVA_DC; + FOREACH_DRIVER_ON_PCIBUS(drv) { + if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) { + FOREACH_DEVICE_ON_PCIBUS(dev) { + if (dev->kdrv == RTE_KDRV_VFIO && + rte_pci_match(drv, dev)) { + has_iova_va = true; + break; + } + } + + if (has_iova_va) + break; + } + } + + FOREACH_DEVICE_ON_PCIBUS(dev) { + devargs = dev->device.devargs; + + need_check = false; + switch (rte_pci_bus.bus.conf.scan_mode) { + case RTE_BUS_SCAN_WHITELIST: + if (devargs && devargs->policy == RTE_DEV_WHITELISTED) + need_check = true; + break; + case RTE_BUS_SCAN_UNDEFINED: + case RTE_BUS_SCAN_BLACKLIST: + if (devargs == NULL || + devarg
[dpdk-dev] [PATCH v2 05/12] eal/pci: Add function pci_ignore_device
This performs a check for whether the device should be ignored due to
whitelist or blacklist. This check eventually needs to apply to all of
the other checks in rte_pci_get_iommu_class.

Signed-off-by: Ben Walker
---
 drivers/bus/pci/linux/pci.c | 44 +
 1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 29ffae77f..6d311f4e0 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -560,6 +560,29 @@ pci_one_device_iommu_support_va(__rte_unused struct rte_pci_device *dev)
 }
 #endif
 
+static bool
+pci_ignore_device(struct rte_pci_device *dev)
+{
+	struct rte_devargs *devargs;
+
+	devargs = dev->device.devargs;
+
+	switch (rte_pci_bus.bus.conf.scan_mode) {
+	case RTE_BUS_SCAN_WHITELIST:
+		if (devargs && devargs->policy == RTE_DEV_WHITELISTED)
+			return false;
+		break;
+	case RTE_BUS_SCAN_UNDEFINED:
+	case RTE_BUS_SCAN_BLACKLIST:
+		if (devargs == NULL ||
+		    devargs->policy != RTE_DEV_BLACKLISTED)
+			return false;
+		break;
+	}
+
+	return true;
+}
+
 /*
  * Get iommu class of PCI devices on the bus.
  */
@@ -571,10 +594,9 @@ rte_pci_get_iommu_class(void)
 	bool has_iova_va = false;
 	bool is_bound_uio = false;
 	bool iommu_no_va = false;
-	bool need_check;
 	struct rte_pci_device *dev = NULL;
 	struct rte_pci_driver *drv = NULL;
-	struct rte_devargs *devargs;
+
 
 	FOREACH_DEVICE_ON_PCIBUS(dev) {
 		if (dev->kdrv == RTE_KDRV_UNKNOWN ||
@@ -612,23 +634,7 @@ rte_pci_get_iommu_class(void)
 	}
 
 	FOREACH_DEVICE_ON_PCIBUS(dev) {
-		devargs = dev->device.devargs;
-
-		need_check = false;
-		switch (rte_pci_bus.bus.conf.scan_mode) {
-		case RTE_BUS_SCAN_WHITELIST:
-			if (devargs && devargs->policy == RTE_DEV_WHITELISTED)
-				need_check = true;
-			break;
-		case RTE_BUS_SCAN_UNDEFINED:
-		case RTE_BUS_SCAN_BLACKLIST:
-			if (devargs == NULL ||
-			    devargs->policy != RTE_DEV_BLACKLISTED)
-				need_check = true;
-			break;
-		}
-
-		if (!need_check)
+		if (pci_ignore_device(dev))
 			continue;
 
 		if (dev->kdrv == RTE_KDRV_IGB_UIO ||
--
2.20.1
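The whitelist/blacklist decision that pci_ignore_device makes can be exercised in isolation. The following is a rough sketch, not the DPDK implementation: the enums, `struct devargs_sketch` and `ignore_device_sketch` are hypothetical stand-ins for the scan modes, `struct rte_devargs` and the helper from the patch, and the scan mode is passed as a parameter instead of being read from the bus configuration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the bus scan modes and devargs policies. */
enum scan_mode { SCAN_UNDEFINED, SCAN_WHITELIST, SCAN_BLACKLIST };
enum dev_policy { POLICY_WHITELISTED, POLICY_BLACKLISTED };

struct devargs_sketch {
	enum dev_policy policy;
};

/* Returns true when the device should be skipped by later checks. */
static bool
ignore_device_sketch(enum scan_mode mode, const struct devargs_sketch *da)
{
	switch (mode) {
	case SCAN_WHITELIST:
		/* Only explicitly whitelisted devices are considered. */
		return !(da != NULL && da->policy == POLICY_WHITELISTED);
	case SCAN_UNDEFINED:
	case SCAN_BLACKLIST:
		/* Every device is considered unless explicitly blacklisted. */
		return da != NULL && da->policy == POLICY_BLACKLISTED;
	}

	return true;
}
```

In whitelist mode a device without devargs is ignored, while in blacklist or undefined mode the same device is kept. Returning "ignore" rather than "need check" lets each loop begin with a single early `continue`.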
[dpdk-dev] [PATCH v2 08/12] eal/pci: Collapse loops in rte_pci_get_iommu_class
The three loops can now be easily combined into one. This is slightly
less efficient than before because it doesn't break out early. But that
can be addressed later.

Signed-off-by: Ben Walker
---
 drivers/bus/pci/linux/pci.c | 19 +++
 1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 549d61e74..765c473e8 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -604,15 +604,7 @@ rte_pci_get_iommu_class(void)
 		if (dev->kdrv != RTE_KDRV_UNKNOWN &&
 		    dev->kdrv != RTE_KDRV_NONE) {
 			is_bound = true;
-			break;
 		}
-	}
-	if (!is_bound)
-		return RTE_IOVA_DC;
-
-	FOREACH_DEVICE_ON_PCIBUS(dev) {
-		if (pci_ignore_device(dev))
-			continue;
 
 		if (dev->kdrv == RTE_KDRV_VFIO) {
 			FOREACH_DRIVER_ON_PCIBUS(drv) {
@@ -630,15 +622,7 @@ rte_pci_get_iommu_class(void)
 					break;
 				}
 			}
-
-			if (has_iova_va)
-				break;
 		}
-	}
-
-	FOREACH_DEVICE_ON_PCIBUS(dev) {
-		if (pci_ignore_device(dev))
-			continue;
 
 		if (dev->kdrv == RTE_KDRV_IGB_UIO ||
 		    dev->kdrv == RTE_KDRV_UIO_GENERIC) {
@@ -646,6 +630,9 @@ rte_pci_get_iommu_class(void)
 		}
 	}
 
+	if (!is_bound)
+		return RTE_IOVA_DC;
+
 #ifdef VFIO_PRESENT
 	is_vfio_noiommu_enabled = rte_vfio_noiommu_is_enabled() == true ?
 					    true : false;
--
2.20.1
[dpdk-dev] [PATCH v2 07/12] eal/pci: Reverse if check in rte_pci_get_iommu_class
It's simpler to reverse the if statement here, especially with an
upcoming simplification.

Signed-off-by: Ben Walker
---
 drivers/bus/pci/linux/pci.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index d2464d2ae..549d61e74 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -601,10 +601,8 @@ rte_pci_get_iommu_class(void)
 		if (pci_ignore_device(dev))
 			continue;
 
-		if (dev->kdrv == RTE_KDRV_UNKNOWN ||
-		    dev->kdrv == RTE_KDRV_NONE) {
-			continue;
-		} else {
+		if (dev->kdrv != RTE_KDRV_UNKNOWN &&
+		    dev->kdrv != RTE_KDRV_NONE) {
 			is_bound = true;
 			break;
 		}
--
2.20.1
[dpdk-dev] [PATCH v2 09/12] eal/pci: Simplify rte_pci_get_iommu_class by using a switch
Take several independent if statements and convert to a switch
statement.

Signed-off-by: Ben Walker
---
 drivers/bus/pci/linux/pci.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 765c473e8..5e61f46c8 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -601,12 +601,12 @@ rte_pci_get_iommu_class(void)
 		if (pci_ignore_device(dev))
 			continue;
 
-		if (dev->kdrv != RTE_KDRV_UNKNOWN &&
-		    dev->kdrv != RTE_KDRV_NONE) {
+		switch (dev->kdrv) {
+		case RTE_KDRV_UNKNOWN:
+		case RTE_KDRV_NONE:
+			break;
+		case RTE_KDRV_VFIO:
 			is_bound = true;
-		}
-
-		if (dev->kdrv == RTE_KDRV_VFIO) {
 			FOREACH_DRIVER_ON_PCIBUS(drv) {
 				if (!rte_pci_match(drv, dev))
 					continue;
@@ -622,11 +622,14 @@ rte_pci_get_iommu_class(void)
 					break;
 				}
 			}
-		}
-
-		if (dev->kdrv == RTE_KDRV_IGB_UIO ||
-		    dev->kdrv == RTE_KDRV_UIO_GENERIC) {
+			break;
+		case RTE_KDRV_IGB_UIO:
+		case RTE_KDRV_UIO_GENERIC:
+		case RTE_KDRV_NIC_UIO:
+			is_bound = true;
 			is_bound_uio = true;
+			break;
+
 		}
 	}
 
--
2.20.1
[dpdk-dev] [PATCH v2 06/12] eal/pci: Correctly test whitelist/blacklist in rte_pci_get_iommu_class
All of the checks should respect the white and black lists.

Signed-off-by: Ben Walker
---
 drivers/bus/pci/linux/pci.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 6d311f4e0..d2464d2ae 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -597,8 +597,10 @@ rte_pci_get_iommu_class(void)
 	struct rte_pci_device *dev = NULL;
 	struct rte_pci_driver *drv = NULL;
 
-
 	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (pci_ignore_device(dev))
+			continue;
+
 		if (dev->kdrv == RTE_KDRV_UNKNOWN ||
 		    dev->kdrv == RTE_KDRV_NONE) {
 			continue;
@@ -611,6 +613,9 @@ rte_pci_get_iommu_class(void)
 		return RTE_IOVA_DC;
 
 	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (pci_ignore_device(dev))
+			continue;
+
 		if (dev->kdrv == RTE_KDRV_VFIO) {
 			FOREACH_DRIVER_ON_PCIBUS(drv) {
 				if (!rte_pci_match(drv, dev))
--
2.20.1
[dpdk-dev] [PATCH v2 10/12] eal/pci: Finding a device bound to UIO does not force PA
If a device is found that is bound to the UIO driver, only force IOVA_PA
if there is a driver registered to use it.

Signed-off-by: Ben Walker
---
 drivers/bus/pci/linux/pci.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 5e61f46c8..a71c66380 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -627,7 +627,13 @@ rte_pci_get_iommu_class(void)
 		case RTE_KDRV_UIO_GENERIC:
 		case RTE_KDRV_NIC_UIO:
 			is_bound = true;
-			is_bound_uio = true;
+			FOREACH_DRIVER_ON_PCIBUS(drv) {
+				if (!rte_pci_match(drv, dev))
+					continue;
+
+				is_bound_uio = true;
+				break;
+			}
 			break;
 
 		}
--
2.20.1
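The rule in the patch above (a UIO-bound device counts only when a registered driver matches it) can be illustrated with a small self-contained sketch. None of this is DPDK code: `struct dev_sketch`, `struct drv_sketch`, `match_sketch` and `uio_device_in_use` are hypothetical, and matching on a single vendor ID is a simplifying assumption standing in for rte_pci_match.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device and driver records. */
struct dev_sketch {
	int vendor_id;
	bool bound_to_uio;
};

struct drv_sketch {
	int vendor_id;
};

/* Simplified stand-in for a driver/device match test. */
static bool
match_sketch(const struct drv_sketch *drv, const struct dev_sketch *dev)
{
	return drv->vendor_id == dev->vendor_id;
}

/* True only if some UIO-bound device has a registered driver matching it. */
static bool
uio_device_in_use(const struct dev_sketch *devs, size_t ndevs,
		  const struct drv_sketch *drvs, size_t ndrvs)
{
	for (size_t i = 0; i < ndevs; i++) {
		if (!devs[i].bound_to_uio)
			continue;

		for (size_t j = 0; j < ndrvs; j++) {
			if (match_sketch(&drvs[j], &devs[i]))
				return true;
		}
	}

	return false;
}
```

With one UIO-bound device and no matching driver in the list, the function returns false, which corresponds to the case where the patch no longer sets is_bound_uio.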
[dpdk-dev] [PATCH v2 11/12] eal/pci: rte_pci_get_iommu_class handles no drivers
In the case where no drivers are registered with the system, rte_pci_get_iommu_class should return RTE_IOVA_DC. Signed-off-by: Ben Walker --- drivers/bus/pci/linux/pci.c | 91 - 1 file changed, 50 insertions(+), 41 deletions(-) diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index a71c66380..60424932e 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -589,49 +589,80 @@ pci_ignore_device(struct rte_pci_device *dev) enum rte_iova_mode rte_pci_get_iommu_class(void) { - bool is_bound = false; - bool is_vfio_noiommu_enabled = true; - bool has_iova_va = false; - bool is_bound_uio = false; - bool iommu_no_va = false; - struct rte_pci_device *dev = NULL; - struct rte_pci_driver *drv = NULL; + struct rte_pci_device *dev; + struct rte_pci_driver *drv; + struct rte_pci_addr *addr; + enum rte_iova_mode iova_mode; + + iova_mode = RTE_IOVA_DC; FOREACH_DEVICE_ON_PCIBUS(dev) { if (pci_ignore_device(dev)) continue; + addr = &dev->addr; + switch (dev->kdrv) { case RTE_KDRV_UNKNOWN: case RTE_KDRV_NONE: break; case RTE_KDRV_VFIO: - is_bound = true; FOREACH_DRIVER_ON_PCIBUS(drv) { if (!rte_pci_match(drv, dev)) continue; - /* -* just one PCI device needs to be checked out because -* the IOMMU hardware is the same for all of them. 
-*/ - iommu_no_va = !pci_one_device_iommu_support_va(dev); + if ((drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) == 0) + continue; - if (drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) { - has_iova_va = true; - break; + if (!pci_one_device_iommu_support_va(dev)) { + RTE_LOG(WARNING, EAL, "Device " PCI_PRI_FMT + " wanted IOVA as VA, but ", + addr->domain, addr->bus, addr->devid, + addr->function); + RTE_LOG(WARNING, EAL, + "IOMMU does not support it.\n"); + iova_mode = RTE_IOVA_PA; + } +#ifdef VFIO_PRESENT + else if (rte_vfio_noiommu_is_enabled()) { + RTE_LOG(WARNING, EAL, "Device " PCI_PRI_FMT + " wanted IOVA as VA, but ", + addr->domain, addr->bus, addr->devid, + addr->function); + RTE_LOG(WARNING, EAL, + "vfio-noiommu is enabled.\n"); + iova_mode = RTE_IOVA_PA; +#endif + } else if (iova_mode == RTE_IOVA_PA) { + RTE_LOG(WARNING, EAL, "Device " PCI_PRI_FMT + " wanted IOVA as VA, but ", + addr->domain, addr->bus, addr->devid, + addr->function); + RTE_LOG(WARNING, EAL, + "other devices require PA.\n"); + } else { + iova_mode = RTE_IOVA_VA; } } break; case RTE_KDRV_IGB_UIO: case RTE_KDRV_UIO_GENERIC: case RTE_KDRV_NIC_UIO: - is_bound = true; FOREACH_DRIVER_ON_PCIBUS(drv) { if (!rte_pci_match(drv, dev)) continue; - is_bound_uio = true; + if (iova_mode == RTE_IOVA_VA) { + RTE_LOG(WARNING, EAL, + "Some devices wanted IOVA as VA, but "); + RTE_LOG(WARNING, EAL, "device " PCI_PRI_FMT + " requires PA.\n", + addr->domain